Labels: chart, regex, spreadsheet, sudoku
Writing a regex-based sudoku solver with very little language-dependent code opened the door to benchmarking the various PCRE engines. Without much work, I was able to test perl, ruby, python and python-psyco.
Here are the results, unit is seconds:
sudre.pl sudre.rb sudre.py sudre.psyco empty.input 0.012 0.028 0.316 0.240 full.input 0.032 0.076 0.312 0.264 level-1.input1 0.028 0.072 0.328 0.268 level-1.input2 0.028 0.068 0.328 0.268 level-1.input3 0.024 0.052 0.328 0.248 level-1.input4 0.008 0.020 0.316 0.240 level-1.input5 0.008 0.016 0.308 0.236 level-2.input1 0.012 0.016 0.320 0.228 level-2.input2 0.112 0.300 0.400 0.356 level-2.input3 0.172 0.464 0.440 0.412 level-2.input4 0.016 0.048 0.316 0.252 level-2.input5 0.064 0.184 0.340 0.288 level-3.input1 0.212 0.552 0.484 0.424 level-3.input2 0.028 0.072 0.336 0.244 level-3.input3 1.404 3.564 1.428 1.492 level-3.input4 0.072 0.172 0.360 0.296 level-3.input5 3.676 8.725 3.044 3.332 level-4.input1 0.612 1.504 0.840 0.776 level-4.input2 1.580 4.036 1.604 1.736 level-4.input3 0.076 0.220 0.332 0.316 level-4.input4 0.052 0.140 0.352 0.284 level-4.input5 0.076 0.196 0.368 0.292 ----- ----- ----- ----- mean 0.377 0.933 0.600 0.568
Some quick observations:
About a year ago, I proposed a contest on #noise to write a sudoku solver. LIM wrote us a driver and generator. Chouser, Mr_Bones_, and owend wrote solvers. I started one in LISP but never finished it, to my shame...
One question that's always been in my mind is "How could we use a regex to solve a sudoku?" The main problem seems to be the input: Regular expressions are a language for searching, so you need the search space expressed in a way that the regex can process it. The naive approach expresses all the possible puzzles on the input, which is way too much data.
I usually think of a sudoku like this, where zeros are squares that need to be solved:
0 0 0 4 0 0 0 0 0 0 0 0 0 0 2 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 1 0 0 0 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 7 0 0 0 0 0 0 0 0 0 0 7 6 3 0 0 0 0 0
The problem with this representation is that the zeros aren't searchable. The regular expression can't tell that they mean "any number 1 through 9". So for a first step, my solver replaces them:
$puz =~ s{\d}{$& ? $&.' ' : '123456789'}ge;
So the massaged input looks like this:
123456789 123456789 123456789 4 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 2 123456789 7 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 4 1 123456789 123456789 123456789 9 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 6 7 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 7 6 3 123456789 123456789 123456789 123456789 123456789
Now each cell contains the full list of possibilities for that cell. This is searchable. And with negative lookahead assertions, it doesn't have to waste nearly as much time backtracking.
The final output, after the second substitution, is:
9 8 7 4 6 5 3 2 1 6 5 4 1 3 2 9 7 8 3 2 1 9 8 7 6 5 4 8 6 5 7 9 3 4 1 2 7 4 9 8 2 1 5 3 6 2 1 3 6 5 4 7 8 9 5 3 8 2 4 9 1 6 7 1 9 2 5 7 6 8 4 3 4 7 6 3 1 8 2 9 5
Neat, eh? So finally, here are the two scripts implementing this solution. The first script is the solver itself. It accepts a puzzle on stdin formatted like the input above, converts it internally to the searchable format, and generates on stdout the solved puzzle. The second script generates the regular expression which is inserted into the first script.
By the way, while this isn't the fastest solver out there by any means (it's just brute force, after all), it solves most puzzles in approximate 0.01 seconds. That's 1/10 of the time genre.pl takes to generate the regular expression in the first place.