2010 Second International Workshop on Education Technology and Computer Science
Rating and Generating Sudoku Puzzles
Shanchen Pang, Eryan Li
Shandong University of Science and Technology
Tao Song
Peng Zhang
Huazhong University of Science and Technology
the Chinese Academy of Sciences
Qingdao, China
Wuhan, China
Beijing, China
[email protected]
Abstract—The Sudoku puzzle has recently achieved worldwide popularity and has attracted great attention from the computational intelligence community. Evaluating the difficulty of a puzzle is itself a difficult problem. In this paper, the nondeterministic degree is regarded as the main factor that affects the difficulty of a Sudoku puzzle, and we obtain the difficulty by summing the entropy of each solving step. We give a difficulty level model by mapping the difficulty to integers. An algorithm is developed to generate Sudoku puzzles of a given difficulty level.

Keywords- Sudoku; Algorithm; Difficulty Level

I INTRODUCTION
Sudoku is a Japanese logic game that has recently become popular in Europe and North America [1]. The first puzzle was created in the USA in 1979, but it circulated through Japan and only recently reappeared in the West. Sudoku is popular because it is challenging yet has simple rules [2]. Because Sudoku is a good example for various reasoning approaches in Artificial Intelligence, it has attracted great attention from the computational intelligence community [3][4][5][6][7][8]. A Sudoku puzzle is a square composed of 9×9 cells. The square is divided into nine 3×3 sub-squares. A solution of a Sudoku puzzle is such that each row, column and sub-square contains each integer from 1 to 9 once and only once.

At the beginning, some numbers (givens) are already present in the puzzle. Fig. 1 shows the initial state of a Sudoku puzzle in which 23 of the 81 positions are givens.

Definition 1 [9]. A proper puzzle has a unique solution.

The meaning of a unique solution is that all the numbers to be assigned to the blank entries are necessary assignments. A minimal proper puzzle is one from which no given can be removed while leaving it a proper puzzle (with a single solution). The lowest known number of givens is 17 in general Sudoku, or 18 when the positions of the givens are constrained to be half-turn rotationally symmetric [9]. In this paper only proper puzzles are considered, and the number of givens is assumed to be more than 18.

Up to now, most studies have addressed how to solve Sudoku puzzles. Several practical reasoning algorithms based on different knowledge representations have been proposed. Ref. [12] shows that generating a Sudoku puzzle is an NP-complete problem, and several methods have been proposed to cope with this. Refs. [5][6][8] encode Sudoku as a Boolean satisfiability problem (SAT) and use a general SAT solver to obtain the solution. Refs. [6][7] represent the Sudoku puzzle as a Constraint Satisfaction Problem (CSP) and compare different propagation schemes for solving Sudoku. However, Ref. [4] reports that some hard puzzles, such as those in [10][11], cannot be solved using these methods.

Published Sudoku puzzles are often ranked in terms of difficulty. If the difficulty is known, players can choose puzzles whose difficulty suits them. However, the metrics behind these difficulty levels have not been elaborated, and sometimes they are only half-developed. A quantitative measurement of the complexity level of Sudoku puzzles based on graph structure and information theory is proposed in [13], but its difficulty level is obtained only after the puzzle has been finished. Genetic algorithm (GA) methods [14][15] have been used to generate new Sudoku puzzles. Ref. [14] uses a genetic algorithm to test the difficulty levels of new Sudoku puzzles, and uses it as a rating machine. The method in [15] seems inefficient, however, since in its example the GA needed 35700 generations to come up with a new puzzle.
In order to provide a formal basis for the study and development of Sudoku, this paper studies solving systems and difficulty evaluation. The paper is organized as follows. In Section 2, a metric for measuring the difficulty level is presented based on information theory. In Section 3, we propose an algorithm that generates a Sudoku puzzle of a given difficulty level, and we give the flow chart of the algorithm. Finally, we discuss the model and suggest future research work.

Figure 1. A starting point of the Sudoku puzzle, where 23 locations contain a given number.

978-0-7695-3987-4/10 $26.00 © 2010 IEEE DOI 10.1109/ETCS.2010.77
II DIFFICULTY LEVEL OF SUDOKU
In order to develop an extensible metric for difficulty levels, we first construct a mathematical model that defines difficulty by a computable formula. This section aims at providing a quantitative metric that assesses complexity without actually playing the puzzle. In traditional methods the metrics of difficulty have not been elaborated, and sometimes they are only half-developed. Most difficulty levels are based only on the number of givens in the initial puzzle, which assumes that the more givens there are, the easier the solution is. In fact, many factors affect the puzzle, such as the techniques used, the difficulty of the logical analysis, the cells of the initial puzzle (numbers, positions), and the number of cells that can be determined at each trial step. However, the difficulty of Sudoku mainly depends on the initial distribution of the givens [5]. That is to say, the initial distribution of the givens is the main factor that affects a Sudoku puzzle. In this part we discuss how to obtain the difficulty from the initial distribution.

A. Preparation
Shannon [16] first proposed the definition of entropy, which evaluates the uncertainty of the signals sent by an information source. This "uncertainty" concept is similar to the uncertainty of choosing the correct number for a blank. We therefore propose to evaluate the complexity of solving Sudoku puzzles using information entropy.

Definition 2 [16]. Suppose that X is a discrete random variable whose possible values occur with probabilities { p_1, p_2, ..., p_n }. The entropy of X is:

$$H(X) = -\sum_{i=1}^{n} p_i \log_b(p_i), \quad \text{where } \sum_{i=1}^{n} p_i = 1 \tag{1}$$

Generally, b takes the value 2, and H(X) denotes the uncertainty of the signals sent by X. The value of H(X) is a non-negative real number, and the larger the value, the larger the uncertainty. Therefore, this definition will be extended to the uncertainty of a Sudoku puzzle.

B. Difficulty Design
Definition 3. The number of numbers that can be filled in a certain empty cell is called the nondeterministic degree (denoted by n) of this cell.

While the puzzle is being solved, the empty cells are filled step by step. In each step there are some candidates for each empty cell. Clearly we should choose the cell that has the minimum nondeterministic degree, min(n), and fill the blank with one possible candidate. The probability of choosing any one candidate is 1/min(n) (if min(n) ≠ 0). According to the definition of entropy, the uncertainty generated in this step is as follows.

Definition 4. The entropy of step i is defined as:

$$H_i = -\left[\frac{1}{\min(n)} \log_2\!\left(\frac{1}{\min(n)}\right)\right] \cdot \min(n) = \log_2(\min(n)) \quad (\text{where } \min(n) \neq 0) \tag{2}$$

The entropy H_i measures the uncertainty of step i. Supposing that N steps are needed in all to finish the puzzle, the sum entropy in solving the puzzle is:

$$d = \sum_{i=1}^{N} H_i \tag{3}$$

The value of d describes the uncertainty of the entire process. The larger the sum entropy is, the larger the uncertainty of the puzzle is. Therefore, the difficulty of the puzzle can be defined by the sum entropy d.

C. Difficulty Level
The next task is to obtain the difficulty level D from d. A mapping D = f(d) should be set up from the real scale to the integers, so that every Sudoku puzzle falls into a certain difficulty level. The function f need not be unique; D is proper if f is reasonable. For example, with 4 difficulty levels, one choice is:

$$D = \begin{cases} 1 & \text{if } d \in [0, 1); \\ 2 & \text{if } d \in [1, 2); \\ 3 & \text{if } d \in [2, 3); \\ 4 & \text{if } d \geq 3. \end{cases} \tag{4}$$

Of course, different choices of f give different D.
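As an illustration of Definition 4 and of equations (3) and (4), the following minimal Python sketch computes the step entropy, the sum entropy d, and the four-level mapping D = f(d). The function names (step_entropy, sum_entropy, difficulty_level) and the sample list of minimum nondeterministic degrees are hypothetical and are not taken from the paper.

```python
import math


def step_entropy(min_n):
    """Entropy of one step, equation (2): H_i = log2(min(n)), min(n) != 0."""
    if min_n < 1:
        raise ValueError("min(n) must be at least 1")
    return math.log2(min_n)


def sum_entropy(min_degrees):
    """Sum entropy d of equation (3), given min(n) for each of the N steps."""
    return sum(step_entropy(n) for n in min_degrees)


def difficulty_level(d):
    """The example mapping D = f(d) of equation (4), with 4 levels."""
    if d < 0:
        raise ValueError("d must be non-negative")
    if d < 1:
        return 1
    if d < 2:
        return 2
    if d < 3:
        return 3
    return 4


# Hypothetical solving trace: min(n) observed at five successive steps.
degrees = [1, 2, 1, 4, 2]
d = sum_entropy(degrees)      # 0 + 1 + 0 + 2 + 1 = 4.0
level = difficulty_level(d)   # d >= 3, so the level is 4
```

Steps with a single candidate contribute nothing to d, so under this metric only genuine choices raise the difficulty.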
It is reasonable, however, to require that D be a monotonically increasing function of d. We can use the value of D to define the difficulty level of an arbitrary Sudoku puzzle.

III GENERATION OF THE PUZZLES
Definition 5. A candidate is a number that is suitable for an empty cell.

A Sudoku puzzle can be regarded as a tree, named the Puzzle Tree. The nodes are the puzzle states reached while solving the given puzzle. If the original puzzle has empty cells with only one candidate, fill those cells with that candidate; the resulting puzzle is the root of the Puzzle Tree. Given a father node, its sons are the puzzles (i.e. nodes) obtained by filling one empty cell of the father with a candidate. For a given difficulty level D0, we propose an algorithm, based on the Puzzle Tree, that generates a Sudoku puzzle of that level.
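The notions of candidate (Definition 5) and nondeterministic degree (Definition 3) can be made concrete with a short Python sketch. It assumes a grid stored as a 9×9 list of lists with 0 marking an empty cell; the function names and the cyclic-shift demo grid are hypothetical choices for illustration, not part of the paper.

```python
def candidates(grid, r, c):
    """Numbers 1..9 not yet used in the row, column or 3x3 sub-square of (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [v for v in range(1, 10) if v not in used]


def min_degree_cell(grid):
    """Return (row, col, candidates) for the empty cell with the smallest
    nondeterministic degree, or None when the grid has no empty cell."""
    best = None
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                cand = candidates(grid, r, c)
                if best is None or len(cand) < len(best[2]):
                    best = (r, c, cand)
    return best


# Demo data (an assumption): a valid solved grid built by cyclic shifts.
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
one_blank = [row[:] for row in solved]
one_blank[0][0] = 0                      # remove a single given
empty = [[0] * 9 for _ in range(9)]      # no givens at all
```

With one given removed, the blank cell has nondeterministic degree 1; in an empty grid every cell has degree 9.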
Step 1. If the Hash Map contains a puzzle whose difficulty level D (D = f(d)) is the requested one, output the corresponding puzzle and end. Otherwise go to Step 2.
Step 2. Generate a proper Sudoku puzzle stochastically, and let d = 0.
Step 3. Calculate the nondeterministic degrees of the empty cells of the root node.
Step 4. Choose the empty cell whose nondeterministic degree is min(n) and let d = d + log2(min(n)). Mark the current puzzle state as the father node and construct its sons.
Step 5. Choose one son node and calculate the nondeterministic degrees of the empty cells of the puzzle that the son represents.
Step 6. If min(n) = 0, cut off the branch and return to the father node; go to the next son, calculate the nondeterministic degrees of its empty cells and go to Step 4. Otherwise let the son node be the father node and go to Step 4.
Step 7. When the number of empty cells is 0, store the puzzle.
Step 8. Calculate D by D = f(d). If D and D0 are equal, save the puzzle and its sum entropy d into the Hash Map, return it and end. Otherwise clear some cells stochastically, let d = 0 and return to Step 3.

In Step 2, a proper Sudoku puzzle is generated stochastically from a given solved puzzle; this only requires removing some numbers. The Hash Map is a table that stores historical results. Steps 3 to 6 traverse a puzzle tree, which is illustrated in Fig. 2.
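Steps 3 to 7 can be sketched as a depth-first walk of the puzzle tree. The Python sketch below is one possible reading, not the authors' implementation: it assumes that d counts log2(min(n)) once for each father node on the path that reaches the completed puzzle, and it leaves out the stochastic generation and the Hash Map of Steps 1, 2 and 8. The names candidates and rate, and the demo grids, are hypothetical.

```python
import math


def candidates(grid, r, c):
    """Numbers 1..9 not yet used in the row, column or 3x3 sub-square of (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [v for v in range(1, 10) if v not in used]


def rate(grid):
    """Walk the puzzle tree; return (solved_grid, d), or None for a dead branch."""
    best = None
    for r in range(9):                       # Steps 3 and 5: degrees of empty cells
        for c in range(9):
            if grid[r][c] == 0:
                cand = candidates(grid, r, c)
                if best is None or len(cand) < len(best[2]):
                    best = (r, c, cand)
    if best is None:                         # Step 7: no empty cell is left
        return [row[:] for row in grid], 0.0
    r, c, cand = best
    if not cand:                             # Step 6: min(n) = 0, cut off the branch
        return None
    for value in cand:                       # Step 4: each candidate gives one son
        grid[r][c] = value
        result = rate(grid)
        grid[r][c] = 0                       # restore the father before the next son
        if result is not None:
            solved, d = result
            return solved, d + math.log2(len(cand))
    return None


# Demo data (an assumption): a solved grid built by cyclic shifts, first row cleared.
full = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
puzzle = [row[:] for row in full]
puzzle[0] = [0] * 9
solved, d = rate(puzzle)      # every blank has a single candidate, so d is 0.0

# A contradictory grid: cell (0, 0) is left with no candidate at all.
bad = [row[:] for row in full]
bad[0][0] = 0
bad[0][1] = 1
dead = rate(bad)              # the only branch is cut off, so this is None
```

The resulting d can then be passed to the mapping f of Section II to obtain D, which Step 8 compares with D0.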
Figure 2. Puzzle tree. [A father node branches into its sons; a son whose nondeterministic degree is 0 is cut off, and the completed puzzle is returned.]

Here we design the flowchart (Fig. 3) to illustrate the details of the algorithm.

Figure 3. Flowchart of the algorithm.

IV CONCLUSION
Rating the difficulty level of Sudoku is a difficult problem, and few people are researching it. In this paper, our model gives a metric that defines the difficulty level based on information theory and makes the difficulty easy to compute. Our algorithm can generate a Sudoku puzzle of a given difficulty, and it can be built into a platform as a puzzle-generating machine. A further problem is to study whether the difficulty ratings given to Sudoku puzzles in newspapers are consistent with their actual difficulty. Grading puzzles is said to be one of the most difficult tasks in Sudoku puzzle creation [2], so this can be a helpful tool for that purpose. New puzzles are usually generated by finding one possible Sudoku solution and then removing numbers as long as only one unique solution exists. Another research area would be to study whether it is possible to generate a fitness function based on an energy function [17]. Therefore, there is much work to do in the future.
REFERENCES
[1] Wikipedia. Sudoku. Available via WWW: http://en.wikipedia.org/wiki/Sudoku (cited 11.9.2006).
[2] I. Semeniuk, "Stuck on you," New Scientist, 24/31 December 2005, pp. 45–47.
[3] A. Caine and R. Cohen, "MITS: A Mixed-Initiative Intelligent Tutoring System for Sudoku," in Luc Lamontagne, Mario Marchand (Eds.), Advances in Artificial Intelligence, Proceedings of 19th Conference of the Canadian Society for Computational Studies of Intelligence, Canadian AI 2006, LNCS vol. 4013, 2006, pp. 550–561.
[4] M. Henz and H.M. Truong, "SUDOKUSAT-A Tool for Analyzing Difficult Sudoku Puzzles," in Proceedings of the First International Workshop on Applications with Artificial Intelligence, IWAAI 2007.
[5] I. Lynce and J. Ouaknine, "Sudoku as a SAT problem," in Electronic Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics, 2006.
[6] C.G. Reeson, K.-C. Huang, K.M. Bayer, and B.Y. Choueiry, "An Interactive Constraint-Based Approach to Sudoku," in Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence (AAAI'07), 2007, pp. 1976–1977.
[7] H. Simonis, "Sudoku as a constraint problem," in B. Hnich, P. Prosser, B. Smith (eds.), Proceedings of the 4th International Workshop on Modelling and Reformulating Constraint Satisfaction Problems, 2005, pp. 13–27.
[8] T. Weber, "A SAT-based Sudoku solver," in Geoff Sutcliffe and Andrei Voronkov (eds.), Proceedings of the 12th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning (LPAR-12), 2005, pp. 11–15.
[9] Wikipedia. Mathematics of Sudoku. http://en.wikipedia.org/wiki/Mathematics_of_Sudoku.
[10] A. Inkala, AI Escargot - The Most Difficult Sudoku Puzzle, Lulu.com Publisher, Finland, 2007.
[11] Ravel, The hardest Sudokus. http://www.sudoku.com/forums/viewtopic.php?t=4212&start=587.
[12] T. Yato and T. Seta, "Complexity and completeness of finding another solution and its application to puzzles," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 86-A(5), pp. 1052–1060, Oxford University Press, 2003.
[13] Zhe Chen (LATTIS, INSA), "Heuristic Reasoning on Graph and Game Complexity of Sudoku," arXiv:0903.1659v1 [cs.AI], 9 Mar 2009.
[14] T. Mantere and J. Koljonen, "Solving and Rating Sudoku Puzzles with Genetic Algorithms," in New Developments in Artificial Intelligence and the Semantic Web, Proceedings of the 12th Finnish Artificial Intelligence Conference STeP 2006, pp. 86–92.
[15] M. Gold, "Using Genetic Algorithms to Come up with Sudoku Puzzles," Sep 23, 2005. Available via http://www.c-sharpcorner.com/UploadFile/mgold/Sudoku09232005003323AM/Sudoku.aspx.
[16] C.E. Shannon, "A Mathematical Theory of Communication," The Bell System Technical Journal, 27, pp. 379–423, 623–656, 1948.
[17] J.T. Alander, "Potential function approach in search: analyzing a board game," in J. Alander, P. Ala-Siuru, and H. Hyötyniemi (eds.), STeP 2004, The 11th Finnish Artificial Intelligence Conference, Heureka, Vantaa, 1-3 September 2004, Vol. 3, Origin of Life and Genetic Algorithms, 2004, pp. 61–75.