Solving Japanese Crosswords with SAT Solver
On Habré there were several articles on solving Japanese crosswords, where the authors came up with various ways to solve such crosswords. In the commentary on the article Solving Japanese Color Crosswords with the Speed of Light, I suggested that since the solution of Japanese crosswords is an NP-complete problem, they must be solved using an appropriate tool, namely the SAT solver. Since my idea was met with very skepticism, I decided to try to implement it and compare the results with other approaches. What came out of this can be found under the cut.
As you know, the Japanese crossword or nomogram is a logic puzzle in which you need to restore the image in a rectangle using the numbers located to the left in the row headers and on top in the column headers. These numbers correspond to the order and length of the blocks of shaded cells in a row or column, and there must be at least one empty cell between them when blocks are installed. Here I consider only two-color crosswords, where each cell can be either painted over or not. In fact, to solve this problem, it is necessary to find the position of each block. It should be noted that a task may have one or several solutions, as well as no solutions at all. This solver should also report.
It is easy to understand that the problem is generally over the top and the authors of numerous applications are trying to find the fastest solution algorithm bywriting their own bikes of different tricks, instead of using the well-developed methods used in the Boolean satisfiability problem. However, this statement requires proof, so I decided to reformulate the problem of solving a Japanese crossword puzzle in conjunctive-normal form and use one of the well-known SAT solvers to solve it in order to confirm it or disprove it.
To represent this problem in conjunctive-normal form, we introduce one boolean variable for each cell of the field and one variable for each possible position of each block in rows and columns. For a cell of a field, a boolean variable will mean whether this cell is colored or not, and for a block position, a boolean variable will mean whether this block is present at this position or not. We write down the necessary relations between the variables (clauses):
1. Each block declared in a row or column must appear at least in one position. This corresponds to a clause of the form (X1 V X2 V ... XN), where X1, X2 ... XN are all possible positions of a given block in a row or column.
2. Each block in a row or column should appear no more than once. This corresponds to the set of clauses of the form (not Xi) V (not Xj), where Xi, Xj (i! = J) are all possible positions of a given block in a row or column.
3. The correct order of blocks. Since it is necessary to maintain the correct order of blocks, as well as eliminate their intersection, it is necessary to add clauses, of the form (not Xi) V (not Xj), where Xi, Xj are variables corresponding to the positions of different blocks that have the wrong order or intersect.
4. The painted cell must be contained inside at least one block, the position of which includes the given cell. This corresponds to a clause of the form ((not Xk) V X1 V X2 ... XN), where Xk is the variable corresponding to the cell, and X1, X2 ... XN are the variables corresponding to the positions of the blocks containing the given cell.
5. Each empty cell must not be contained in any possible position of a single block. This corresponds to the set of clauses of the form Xi V (not Xj), where Xi is the variable corresponding to the cell, and Xj is the variable corresponding to one position of any block containing the given cell.
Thus, the problem is formulated as a conjunctive-normal form and can be solved with the help of the SAT Solver. Moreover, if there is no solution, the SAT Solver will determine that there is no solution.
Now it is the turn to confirm or deny my assumption that the SAT solver will cope with the solution of Japanese crosswords faster than other algorithms. To test this, I took examples from the Survey of Paint-by-Number Puzzle Solvers website . On this site there is a table comparing the speed of various applications for solving Japanese crosswords and a good set of examples - from the lightest, which are solved by everyone, to complex, which only one application solves. These results were obtained on a computer with a 2.6GHz AMD Phenom II X4 810 quad-core 64-bit processor Memory: 8 Gb. I used an Intel® Core (TM) i7-2600K CPU @ 3.40GHz Memory 16 Gb computer.
As a result, I got the following results:
What conclusions can be drawn from this?
In conclusion, you can add that the SAT solver can receive more than one solution, if there are any. To do this, you need to add one clause of the form ((not X1) V (not X2) V ... (not XN)), where X1, X2 ... XN are variables corresponding to the filled cells in the previous solution.
As you know, the Japanese crossword or nomogram is a logic puzzle in which you need to restore the image in a rectangle using the numbers located to the left in the row headers and on top in the column headers. These numbers correspond to the order and length of the blocks of shaded cells in a row or column, and there must be at least one empty cell between them when blocks are installed. Here I consider only two-color crosswords, where each cell can be either painted over or not. In fact, to solve this problem, it is necessary to find the position of each block. It should be noted that a task may have one or several solutions, as well as no solutions at all. This solver should also report.
It is easy to understand that the problem is generally over the top and the authors of numerous applications are trying to find the fastest solution algorithm by
To represent this problem in conjunctive-normal form, we introduce one boolean variable for each cell of the field and one variable for each possible position of each block in rows and columns. For a cell of a field, a boolean variable will mean whether this cell is colored or not, and for a block position, a boolean variable will mean whether this block is present at this position or not. We write down the necessary relations between the variables (clauses):
1. Each block declared in a row or column must appear at least in one position. This corresponds to a clause of the form (X1 V X2 V ... XN), where X1, X2 ... XN are all possible positions of a given block in a row or column.
2. Each block in a row or column should appear no more than once. This corresponds to the set of clauses of the form (not Xi) V (not Xj), where Xi, Xj (i! = J) are all possible positions of a given block in a row or column.
3. The correct order of blocks. Since it is necessary to maintain the correct order of blocks, as well as eliminate their intersection, it is necessary to add clauses, of the form (not Xi) V (not Xj), where Xi, Xj are variables corresponding to the positions of different blocks that have the wrong order or intersect.
4. The painted cell must be contained inside at least one block, the position of which includes the given cell. This corresponds to a clause of the form ((not Xk) V X1 V X2 ... XN), where Xk is the variable corresponding to the cell, and X1, X2 ... XN are the variables corresponding to the positions of the blocks containing the given cell.
5. Each empty cell must not be contained in any possible position of a single block. This corresponds to the set of clauses of the form Xi V (not Xj), where Xi is the variable corresponding to the cell, and Xj is the variable corresponding to one position of any block containing the given cell.
Thus, the problem is formulated as a conjunctive-normal form and can be solved with the help of the SAT Solver. Moreover, if there is no solution, the SAT Solver will determine that there is no solution.
Now it is the turn to confirm or deny my assumption that the SAT solver will cope with the solution of Japanese crosswords faster than other algorithms. To test this, I took examples from the Survey of Paint-by-Number Puzzle Solvers website . On this site there is a table comparing the speed of various applications for solving Japanese crosswords and a good set of examples - from the lightest, which are solved by everyone, to complex, which only one application solves. These results were obtained on a computer with a 2.6GHz AMD Phenom II X4 810 quad-core 64-bit processor Memory: 8 Gb. I used an Intel® Core (TM) i7-2600K CPU @ 3.40GHz Memory 16 Gb computer.
As a result, I got the following results:
======== sample-nin/webpbn-00001.nin ========
Start read data.
16 lines were read
Solver started. vars = 150
Clauses = 562
SATISFIABLE
apsnono finished.
real 0m0,610s
user 0m0,004s
sys 0m0,002s
========= sample-nin/webpbn-00006.nin ========
Start read data.
41 lines were read
Solver started. vars = 1168
Clauses = 10215
SATISFIABLE
apsnono finished.
real 0m0,053s
user 0m0,028s
sys 0m0,000s
========= sample-nin/webpbn-00016.nin ========
Start read data.
69 lines were read
Solver started. vars = 7484
Clauses = 191564
SATISFIABLE
apsnono finished.
real 0m0,368s
user 0m0,186s
sys 0m0,008s
========= sample-nin/webpbn-00021.nin ========
Start read data.
40 lines were read
Solver started. vars = 1240
Clauses = 11481
SATISFIABLE
apsnono finished.
real 0m0,095s
user 0m0,034s
sys 0m0,000s
========= sample-nin/webpbn-00023.nin ========
Start read data.
22 lines were read
Solver started. vars = 311
Clauses = 1498
SATISFIABLE
apsnono finished.
real 0m0,147s
user 0m0,006s
sys 0m0,000s
========= sample-nin/webpbn-00027.nin ========
Start read data.
51 lines were read
Solver started. vars = 2958
Clauses = 38258
SATISFIABLE
apsnono finished.
real 0m0,089s
user 0m0,050s
sys 0m0,010s
========= sample-nin/webpbn-00065.nin ========
Start read data.
75 lines were read
Solver started. vars = 7452
Clauses = 134010
SATISFIABLE
apsnono finished.
real 0m0,272s
user 0m0,166s
sys 0m0,009s
========= sample-nin/webpbn-00436.nin ========
Start read data.
76 lines were read
Solver started. vars = 6900
Clauses = 134480
SATISFIABLE
apsnono finished.
real 0m0,917s
user 0m0,830s
sys 0m0,005s
========= sample-nin/webpbn-00529.nin ========
Start read data.
91 lines were read
Solver started. vars = 10487
Clauses = 226237
SATISFIABLE
apsnono finished.
real 0m0,286s
user 0m0,169s
sys 0m0,005s
========= sample-nin/webpbn-00803.nin ========
Start read data.
96 lines were read
Solver started. vars = 9838
Clauses = 278533
SATISFIABLE
apsnono finished.
real 0m0,827s
user 0m0,697s
sys 0m0,008s
========= sample-nin/webpbn-01611.nin ========
Start read data.
116 lines were read
Solver started. vars = 25004
Clauses = 921246
SATISFIABLE
apsnono finished.
real 0m3,467s
user 0m3,301s
sys 0m0,084s
========= sample-nin/webpbn-01694.nin ========
Start read data.
96 lines were read
Solver started. vars = 13264
Clauses = 391427
SATISFIABLE
apsnono finished.
real 0m0,964s
user 0m0,822s
sys 0m0,016s
========= sample-nin/webpbn-02040.nin ========
Start read data.
116 lines were read
Solver started. vars = 26445
Clauses = 1182535
SATISFIABLE
apsnono finished.
real 0m7,512s
user 0m7,354s
sys 0m0,122s
========= sample-nin/webpbn-02413.nin ========
Start read data.
41 lines were read
Solver started. vars = 1682
Clauses = 15032
SATISFIABLE
apsnono finished.
real 0m0,258s
user 0m0,053s
sys 0m0,001s
========= sample-nin/webpbn-02556.nin ========
Start read data.
111 lines were read
Solver started. vars = 11041
Clauses = 340630
SATISFIABLE
apsnono finished.
real 0m0,330s
user 0m0,136s
sys 0m0,009s
========= sample-nin/webpbn-02712.nin ========
Start read data.
95 lines were read
Solver started. vars = 13212
Clauses = 364416
SATISFIABLE
apsnono finished.
real 0m6,503s
user 0m6,365s
sys 0m0,032s
========= sample-nin/webpbn-03541.nin ========
Start read data.
111 lines were read
Solver started. vars = 19249
Clauses = 676595
SATISFIABLE
apsnono finished.
real 0m5,008s
user 0m4,785s
sys 0m0,100s
========= sample-nin/webpbn-04645.nin ========
Start read data.
121 lines were read
Solver started. vars = 19159
Clauses = 793580
SATISFIABLE
apsnono finished.
real 0m4,739s
user 0m4,477s
sys 0m0,107s
========= sample-nin/webpbn-06574.nin ========
Start read data.
51 lines were read
Solver started. vars = 2932
Clauses = 33191
SATISFIABLE
apsnono finished.
real 0m0,231s
user 0m0,176s
sys 0m0,000s
========= sample-nin/webpbn-06739.nin ========
Start read data.
81 lines were read
Solver started. vars = 10900
Clauses = 256833
SATISFIABLE
apsnono finished.
real 0m0,782s
user 0m0,730s
sys 0m0,008s
========= sample-nin/webpbn-07604.nin ========
Start read data.
111 lines were read
Solver started. vars = 18296
Clauses = 478535
SATISFIABLE
apsnono finished.
real 0m1,524s
user 0m1,324s
sys 0m0,026s
========= sample-nin/webpbn-08098.nin ========
Start read data.
39 lines were read
Solver started. vars = 1255
Clauses = 10950
SATISFIABLE
apsnono finished.
real 0m0,216s
user 0m0,133s
sys 0m0,000s
========= sample-nin/webpbn-09892.nin ========
Start read data.
91 lines were read
Solver started. vars = 13887
Clauses = 419787
SATISFIABLE
apsnono finished.
real 0m12,048s
user 0m11,858s
sys 0m0,088s
========= sample-nin/webpbn-10088.nin ========
Start read data.
116 lines were read
Solver started. vars = 23483
Clauses = 1020515
SATISFIABLE
apsnono finished.
real 0m17,472s
user 0m16,795s
sys 0m0,321s
========= sample-nin/webpbn-10810.nin ========
Start read data.
121 lines were read
Solver started. vars = 25726
Clauses = 895436
SATISFIABLE
apsnono finished.
real 0m0,898s
user 0m0,562s
sys 0m0,026s
========= sample-nin/webpbn-12548.nin ========
Start read data.
88 lines were read
Solver started. vars = 13685
Clauses = 486012
SATISFIABLE
apsnono finished.
real 3m1,682s
user 2m58,537s
sys 0m1,507s
========= sample-nin/webpbn-18297.nin ========
Start read data.
79 lines were read
Solver started. vars = 9708
Clauses = 272394
SATISFIABLE
apsnono finished.
real 0m16,643s
user 0m16,326s
sys 0m0,210s
========= sample-nin/webpbn-22336.nin ========
Start read data.
159 lines were read
Solver started. vars = 67137
Clauses = 5420886
SATISFIABLE
apsnono finished.
real 1m46,555s
user 1m43,336s
sys 0m3,075s
What conclusions can be drawn from this?
- SAT Solver decided all the examples that are solved by other applications, even webpbn-22336, which is solved only by one application.
- SAT Solver easily solves many examples that cannot be solved by most applications.
- The solution time is in most cases better with SAT Solver than with other applications.
- I used a single-threaded SAT solver; if you use a multi-threaded SAT solver, the results will be even better.
- When using SAT Solver, there is no need to invent your own algorithms, which someone has invented with a high probability.
In conclusion, you can add that the SAT solver can receive more than one solution, if there are any. To do this, you need to add one clause of the form ((not X1) V (not X2) V ... (not XN)), where X1, X2 ... XN are variables corresponding to the filled cells in the previous solution.