

Labels: chart, regex, spreadsheet, sudoku
Writing a regex-based sudoku solver with very little language-dependent code opened the door to benchmarking the various PCRE engines. Without much work, I was able to test perl, ruby, python and python-psyco.
Here are the results, unit is seconds:
sudre.pl sudre.rb sudre.py sudre.psyco
empty.input 0.012 0.028 0.316 0.240
full.input 0.032 0.076 0.312 0.264
level-1.input1 0.028 0.072 0.328 0.268
level-1.input2 0.028 0.068 0.328 0.268
level-1.input3 0.024 0.052 0.328 0.248
level-1.input4 0.008 0.020 0.316 0.240
level-1.input5 0.008 0.016 0.308 0.236
level-2.input1 0.012 0.016 0.320 0.228
level-2.input2 0.112 0.300 0.400 0.356
level-2.input3 0.172 0.464 0.440 0.412
level-2.input4 0.016 0.048 0.316 0.252
level-2.input5 0.064 0.184 0.340 0.288
level-3.input1 0.212 0.552 0.484 0.424
level-3.input2 0.028 0.072 0.336 0.244
level-3.input3 1.404 3.564 1.428 1.492
level-3.input4 0.072 0.172 0.360 0.296
level-3.input5 3.676 8.725 3.044 3.332
level-4.input1 0.612 1.504 0.840 0.776
level-4.input2 1.580 4.036 1.604 1.736
level-4.input3 0.076 0.220 0.332 0.316
level-4.input4 0.052 0.140 0.352 0.284
level-4.input5 0.076 0.196 0.368 0.292
----- ----- ----- -----
mean 0.377 0.933 0.600 0.568
Some quick observations:
About a year ago, I proposed a contest on #noise to write a sudoku solver. LIM wrote us a driver and generator. Chouser, Mr_Bones_, and owend wrote solvers. I started one in LISP but never finished it, to my shame...
One question that's always been in my mind is "How could we use a regex to solve a sudoku?" The main problem seems to be the input: Regular expressions are a language for searching, so you need the search space expressed in a way that the regex can process it. The naive approach expresses all the possible puzzles on the input, which is way too much data.
I usually think of a sudoku like this, where zeros are squares that need to be solved:
0 0 0 4 0 0 0 0 0 0 0 0 0 0 2 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 1 0 0 0 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 7 0 0 0 0 0 0 0 0 0 0 7 6 3 0 0 0 0 0
The problem with this representation is that the zeros aren't searchable. The regular expression can't tell that they mean "any number 1 through 9". So for a first step, my solver replaces them:
$puz =~ s{\d}{$& ? $&.' ' : '123456789'}ge;
So the massaged input looks like this:
123456789 123456789 123456789 4 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 2 123456789 7 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 4 1 123456789 123456789 123456789 9 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 6 7 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 7 6 3 123456789 123456789 123456789 123456789 123456789
Now each cell contains the full list of possibilities for that cell. This is searchable. And with negative lookahead assertions, it doesn't have to waste nearly as much time backtracking.
The final output, after the second substitution, is:
9 8 7 4 6 5 3 2 1 6 5 4 1 3 2 9 7 8 3 2 1 9 8 7 6 5 4 8 6 5 7 9 3 4 1 2 7 4 9 8 2 1 5 3 6 2 1 3 6 5 4 7 8 9 5 3 8 2 4 9 1 6 7 1 9 2 5 7 6 8 4 3 4 7 6 3 1 8 2 9 5
Neat, eh? So finally, here are the two scripts implementing this solution. The first script is the solver itself. It accepts a puzzle on stdin formatted like the input above, converts it internally to the searchable format, and generates on stdout the solved puzzle. The second script generates the regular expression which is inserted into the first script.
By the way, while this isn't the fastest solver out there by any means (it's just brute force, after all), it solves most puzzles in approximate 0.01 seconds. That's 1/10 of the time genre.pl takes to generate the regular expression in the first place.