n01senet: May 2007

n01senet

regex engine timing charts

Charts by Google Spreadsheets (taller columns are worse):

Labels: chart, regex, spreadsheet, sudoku

¶ 7:15 PM Tuesday, May 08, 2007 by Chouser 1 comments

regex engine timing

Writing a regex-based sudoku solver with very little language-dependent code opened the door to benchmarking the various PCRE engines. Without much work, I was able to test perl, ruby, python and python-psyco.

Here are the results, unit is seconds:

                sudre.pl  sudre.rb  sudre.py  sudre.psyco
empty.input     0.012     0.028     0.316     0.240
full.input      0.032     0.076     0.312     0.264
level-1.input1  0.028     0.072     0.328     0.268
level-1.input2  0.028     0.068     0.328     0.268
level-1.input3  0.024     0.052     0.328     0.248
level-1.input4  0.008     0.020     0.316     0.240
level-1.input5  0.008     0.016     0.308     0.236
level-2.input1  0.012     0.016     0.320     0.228
level-2.input2  0.112     0.300     0.400     0.356
level-2.input3  0.172     0.464     0.440     0.412
level-2.input4  0.016     0.048     0.316     0.252
level-2.input5  0.064     0.184     0.340     0.288
level-3.input1  0.212     0.552     0.484     0.424
level-3.input2  0.028     0.072     0.336     0.244
level-3.input3  1.404     3.564     1.428     1.492
level-3.input4  0.072     0.172     0.360     0.296
level-3.input5  3.676     8.725     3.044     3.332
level-4.input1  0.612     1.504     0.840     0.776
level-4.input2  1.580     4.036     1.604     1.736
level-4.input3  0.076     0.220     0.332     0.316
level-4.input4  0.052     0.140     0.352     0.284
level-4.input5  0.076     0.196     0.368     0.292
                -----     -----     -----     -----
          mean  0.377     0.933     0.600     0.568

Some quick observations:

Perl is fastest on most of the puzzles, but python is far more consistent. My first guess is that python takes a lot longer to compile the regex, but may execute it faster after it's compiled.
Ruby-1.8 (current stable) crashes with a stack overflow trying to compile the regex. Ruby-1.9 was used for this test.
It would be interesting to make some graphs out of this... volunteers?

sudoku by regex

About a year ago, I proposed a contest on #noise to write a sudoku solver. LIM wrote us a driver and generator. Chouser, Mr_Bones_, and owend wrote solvers. I started one in LISP but never finished it, to my shame...

One question that's always been in my mind is "How could we use a regex to solve a sudoku?" The main problem seems to be the input: Regular expressions are a language for searching, so you need the search space expressed in a way that the regex can process it. The naive approach expresses all the possible puzzles on the input, which is way too much data.

I usually think of a sudoku like this, where zeros are squares that need to be solved:

0 0 0 4 0 0 0 0 0
0 0 0 0 0 2 0 7 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 4 1 0
0 0 9 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 6 7
0 0 0 0 0 0 0 0 0
0 7 6 3 0 0 0 0 0

The problem with this representation is that the zeros aren't searchable. The regular expression can't tell that they mean "any number 1 through 9". So for a first step, my solver replaces them:

$puz =~ s{\d}{$& ? $&.'        ' : '123456789'}ge;

So the massaged input looks like this:

123456789 123456789 123456789 4         123456789 123456789 123456789 123456789 123456789
123456789 123456789 123456789 123456789 123456789 2         123456789 7         123456789
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
123456789 123456789 123456789 123456789 123456789 123456789 4         1         123456789
123456789 123456789 9         123456789 123456789 123456789 123456789 123456789 123456789
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
123456789 123456789 123456789 123456789 123456789 123456789 123456789 6         7
123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
123456789 7         6         3         123456789 123456789 123456789 123456789 123456789

Now each cell contains the full list of possibilities for that cell. This is searchable. And with negative lookahead assertions, it doesn't have to waste nearly as much time backtracking.

The final output, after the second substitution, is:

9 8 7 4 6 5 3 2 1
6 5 4 1 3 2 9 7 8
3 2 1 9 8 7 6 5 4
8 6 5 7 9 3 4 1 2
7 4 9 8 2 1 5 3 6
2 1 3 6 5 4 7 8 9
5 3 8 2 4 9 1 6 7
1 9 2 5 7 6 8 4 3
4 7 6 3 1 8 2 9 5

Neat, eh? So finally, here are the two scripts implementing this solution. The first script is the solver itself. It accepts a puzzle on stdin formatted like the input above, converts it internally to the searchable format, and generates on stdout the solved puzzle. The second script generates the regular expression which is inserted into the first script.

By the way, while this isn't the fastest solver out there by any means (it's just brute force, after all), it solves most puzzles in approximate 0.01 seconds. That's 1/10 of the time genre.pl takes to generate the regular expression in the first place.

¶ 9:03 AM Thursday, May 03, 2007 by agriffis

Contributors