
NESUG 2006

Ins & Outs

Let’s Sudoku with SAS®!
Vatsala Karwe, Patricia Seunarine and Carol Razafindrakoto

ABSTRACT
This paper discusses three SAS® programs that can be used to solve sudoku puzzles. All three pieces of code are different, from the way the sudoku puzzle problem is set up to the way that it is solved. We have named these programs symsudoku (because this program uses call symput as a primary tool), vkalg (an algorithm for SAS®), and SAS4SAS (sudoku analyzer-solver for SAS®). In this paper, we first present details of vkalg, digressing occasionally to compare certain features of our three programs; we then discuss symsudoku and SAS4SAS in the final sections. We conclude with a note on our experiences while working on this paper.

INTRODUCTION
It seems to us that everyone’s sudoku-ing. There are numerous websites filled with sudoku puzzles, and neat setups of puzzles you can do on-line. There are books on Sudoku, on Master Sudoku, on How-To Sudoku, and magazines and newspapers carry sudoku puzzles for entertaining their readers. What is it that everyone is doing? Here is an example of a sudoku puzzle:

[Figure: example 9x9 sudoku puzzle grid with its given clues]

A sudoku puzzle is a grid of n rows and n columns, in which some pre-assigned “clues” or “givens” have been entered. The size of the grid can be nxn, where n is the square of an integer; the most common is the 9x9 grid, as above, so we’ll mainly discuss this. Our program SAS4SAS is written for general n, where the size of the grid can be fed into the program as a global macro variable at the outset. Programs symsudoku and vkalg have been written for the 9x9 grid, but it is not difficult to extend these coding ideas to general n.

A “block” is determined by its row and column; there are nine blocks in the 9x9 grid, the first being the first 3x3 grid in the upper left-most corner (the intersection of rows 1,2,3 and columns 1,2,3), then the second block is the next 3x3 grid (intersection of rows 1,2,3 with columns 4,5,6) and so on. The “constraints” of the puzzle are that each row, column and block should contain each of the digits 1 through 9 exactly once. Each cell must contain exactly one digit. (From Wikipedia [4]: the name "Sudoku" is the Japanese abbreviation of a longer phrase, "Suuji wa dokushin ni kagiru", meaning "the digits must remain single".) For general n, these constraints can be stated in terms of n symbols: instead of the 9 digits, any set of n distinct symbols can be used to specify a sudoku puzzle.

As stated earlier, there is a fair amount of work, on the mathematics of sudoku as well as the programming of sudoku, that is already in place. Our web search uncovered sudoku programs written in C++, Java and Perl. However, we did not find any work on sudoku in the language we were most familiar with: SAS®¹. We try to fill this gap. While working on this paper, we explored Donald Knuth’s Dancing Links Algorithm [1]. Although it is fascinating, we do not implement it directly in SAS®; our program SAS4SAS executes a backtracking routine very efficiently.
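The block numbering and the row, column and block constraints described above can be sketched in a few lines. This is a minimal illustration in Python rather than in the authors' SAS; the function names `block_of` and `satisfies_constraints` are hypothetical and do not appear in any of the three programs. It assumes blocks are numbered left to right, top to bottom, as in the description of the first and second blocks.

```python
import math


def block_of(row: int, col: int, n: int = 9) -> int:
    """1-based block number of cell (row, col), counting blocks
    left to right, top to bottom (assumed numbering)."""
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("n must be the square of an integer")
    return ((row - 1) // k) * k + (col - 1) // k + 1


def satisfies_constraints(grid) -> bool:
    """True if every row, column and block of a completely filled
    n x n grid holds each of 1..n exactly once."""
    n = len(grid)
    wanted = set(range(1, n + 1))
    units = {}
    for r in range(1, n + 1):
        for c in range(1, n + 1):
            value = grid[r - 1][c - 1]
            # every cell belongs to one row, one column and one block
            for key in (("row", r), ("col", c), ("blk", block_of(r, c, n))):
                units.setdefault(key, []).append(value)
    return all(len(v) == n and set(v) == wanted for v in units.values())
```

For the 9x9 grid, `block_of(1, 4)` is the second block and `block_of(4, 1)` is the fourth, which agrees with the ib assignments in Appendix 1a.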

¹ It has been pointed out to us that in May 2006 there were some vigorous discussions on sudoku programming in SAS® on the listserv at http://listserv.uga.edu. Our programs are different from the ones on this listserv, and the main ideas of our programs were developed prior to May 2006. In this regard, we would also like to point out that almost all of our code is in Base SAS®, version 8.2.


There are essentially two different ways of programming sudoku puzzles. The first method, the “logical” way, is to use the logic we would apply when given a sudoku puzzle. This would mean checking for all possible entries in cells, rows, columns and blocks, and when only one possible entry is indicated by any of these checks, one would fill that cell with that one possible entry. Doing this repeatedly would solve most puzzles of easy and medium difficulty.

But wait. How do we do the “evil” and “fiendish” puzzles? One way could be to keep following the working of the human mind, and to set up several strategies (named swordfish, etc., in the literature). We tried this. The problem with this approach is that one may always come up against an even-more-evil puzzle which one may not have seen before. This would hold especially in the 16x16 grids, or larger, and there would be many strategies to think of and put into the sudoku-solver program. As we were programming, every time this did happen, we were stuck and needed to set up code to get the better of the evil puzzle in front of us. Our programs symsudoku and vkalg begin with this logical approach, then use recursive methods to pick up when straightforward logical strategies cannot get one any further.

The other programming technique would be to do it the “brute force” way: just check all possible configurations. The challenge here is efficiency. Even in the 9x9 grid, we need to set up ways of checking for consistency, and then either backtracking or moving forward, in a space- and time-conserving manner, or else the problem gets unmanageable, or one has to wait too long for a solution. Our program SAS4SAS utilizes this approach.

In this paper, we first discuss the program vkalg at length, and then we follow up with a discussion of some features of symsudoku and SAS4SAS. Appendices 1a (symsudoku), 1b (Flowchart for backtracking routine of symsudoku), 2 (vkalg), and 3 (SAS4SAS) follow. These appendices contain the actual code of our three programs².

DATA INPUT:
In program vkalg.sas the data is input as follows: each record represents a possible entry for the cell in the rth row and cth column. Since there are 9 rows and 9 columns and each cell may have 9 possible entries, the initial dataset has 729 records. The aim now is to apply the information from the given clues and delete records that are inconsistent. Towards this end, we create a clue variable (variable name clue). Each record that corresponds to an entry in the puzzle gets clue=1; clue is missing for all other records. Our final dataset will contain exactly 81 records, each a “clue”: either a given clue, or one assigned “clue” status by our program.

LOGICAL PROCESSING THE SAS® WAY:
After having set up our puzzle as in the preceding section, we continue to the next step; this is to apply the constraints to our set-up. Sorting and merging in SAS® allows us to eliminate all impossible records from our dataset. For example, since there can be only one entry in each cell, we can discard all records that have the same row and column as a given clue, but a different value for entry. Note that we do not need to proceed clue-by-clue; SAS® handles all clues, and all constraints arising from these clues, in essentially the two steps ① and ②. Row, column and block constraints are handled in much the same way. At the end of this step, we are left with exactly one record for each cell that contained a clue (that is, no other record has the same row and column), and with one or more records for each cell that was empty to begin with.
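The record-per-possibility layout and the elimination step can be illustrated outside SAS. The following is a minimal Python sketch, not the vkalg code: it holds the 729 (row, column, entry) records in a set and filters them against the clues, whereas vkalg does this by sorting and merging. The names `conflicts` and `apply_clues` and the single example clue are assumptions made for illustration.

```python
from itertools import product


def block_of(row, col):
    """1-based block number in the 9x9 grid."""
    return ((row - 1) // 3) * 3 + (col - 1) // 3 + 1


def conflicts(record, clue):
    """True if `record` cannot coexist with `clue`; both are (row, col, entry)."""
    (r, c, e), (cr, cc, ce) = record, clue
    if record == clue:
        return False
    if (r, c) == (cr, cc):
        return True                      # same cell, different entry
    shares_unit = r == cr or c == cc or block_of(r, c) == block_of(cr, cc)
    return e == ce and shares_unit       # same entry elsewhere in the row/column/block


def apply_clues(records, clues):
    """Keep only the records that are consistent with every clue."""
    return {rec for rec in records if not any(conflicts(rec, clue) for clue in clues)}


# one record per (row, column, entry): 9 * 9 * 9 = 729 records
all_records = set(product(range(1, 10), repeat=3))

# hypothetical single clue: a 5 in row 1, column 1
remaining = apply_clues(all_records, {(1, 1, 5)})
```

With that one clue, the 8 other entries for cell (1,1) and the 20 records that would put a 5 in the same row, column or block are dropped, so 701 of the 729 records remain.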

FINDING NEW CLUES:
Recall that we began with 9 records for each empty cell. Clearly if, after elimination of records, an empty cell has exactly one record, the value of the entry variable in this record must be the cell entry. So we can assign the status of clue=1 to this record. If there is exactly one record corresponding to a fixed row and entry value, then this must be a clue, and all other records for the same cell can be discarded. The same holds true for columns and blocks. To make SAS® implement this logic, we introduce the variables cellcnt, rowcnt, colcnt and blkcnt. Here cellcnt counts the records remaining for each cell, while rowcnt, colcnt and blkcnt count the records remaining for each (row, entry), (column, entry) and (block, entry) combination respectively. We obtain counts using first. and last. processing ③. We merge these counts onto the main dataset ④.

CONSISTENCY CHECKS:
Our initial processing ensures that all cells with clue=1 have a unique (row, column), (row, entry), (column, entry) and (block, entry). Hence for cells with clue=1, a cellcnt, rowcnt, colcnt or blkcnt greater than 1 indicates an inconsistent set of clues (in vkalg, this sets the macro variable chkwarn to “Inconsistent Clues”; this message is printed and processing stops). On the other hand, if for a non-clue cell any of the counts becomes zero, then clearly some records that were eliminated should not have been, and this too points to an inconsistency (this sets macro variable inconsis=1).

² Due to space limitations, some interesting aspects of data output needed to be trimmed from our programs; we might mention these in our talk in September 2006.
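The four counts and the zero-count consistency check can be sketched as follows. This is an illustrative Python version, not the vkalg code: vkalg derives the counts with first./last. processing and merges them back, while the sketch uses `collections.Counter`. The names `new_clues` and `is_inconsistent` are hypothetical, and only the zero-count check is shown.

```python
from collections import Counter
from itertools import product


def block_of(row, col):
    """1-based block number in the 9x9 grid."""
    return ((row - 1) // 3) * 3 + (col - 1) // 3 + 1


def counts(records):
    """cellcnt, rowcnt, colcnt, blkcnt for a set of (row, col, entry) records."""
    cellcnt = Counter((r, c) for r, c, e in records)
    rowcnt = Counter((r, e) for r, c, e in records)
    colcnt = Counter((c, e) for r, c, e in records)
    blkcnt = Counter((block_of(r, c), e) for r, c, e in records)
    return cellcnt, rowcnt, colcnt, blkcnt


def new_clues(records):
    """Records for which any of the four counts equals 1."""
    cellcnt, rowcnt, colcnt, blkcnt = counts(records)
    return {
        (r, c, e)
        for r, c, e in records
        if 1 in (cellcnt[r, c], rowcnt[r, e], colcnt[c, e], blkcnt[block_of(r, c), e])
    }


def is_inconsistent(records):
    """True if some cell, or some (row, entry), (column, entry) or
    (block, entry) combination, has no record left."""
    cellcnt, rowcnt, colcnt, blkcnt = counts(records)
    nine = range(1, 10)
    return any(
        0 in (cellcnt[a, b], rowcnt[a, b], colcnt[a, b], blkcnt[a, b])
        for a in nine
        for b in nine
    )


full = set(product(range(1, 10), repeat=3))
# leave only entry 5 in cell (1,1): that record now has cellcnt = 1
pruned = {rec for rec in full if rec[:2] != (1, 1) or rec[2] == 5}
```

On the untouched 729 records every count is 9, so no clue is found; on `pruned`, `new_clues` returns only the record (1, 1, 5).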

MOVING TOWARDS A SOLUTION:
If the consistency checks go through, that is, there are no inconsistencies, then we can go ahead and use the information in the “cnt” variables to add to our clue set ⑤. If any cell, row, column or block count is equal to 1, then we can identify the record that is a clue. Note that we do not need to eliminate related records at this point: the next time the do-loop executes, corresponding records are discarded based on the new set of clues. Looping continues until no more clue cells are found.

ARE WE DONE YET?
When we are done, we will have exactly 81 records in our dataset, all of which have been assigned clue=1. One way to check whether we are at the end of our solution search is to set up the code as in ⑥. The macro variable allclues should be TRUE (that is, there are exactly 81 clues), and if it is not, we need to continue. Most easy and medium-level puzzles get solved by this first do-loop process. But what if the number of clues is less than 81, and our initial processing leaves us without any further clue-possibilities? This means that we are now left with one set of identified clues, and another set of records which have minimum of (cellcnt, rowcnt, colcnt, blkcnt) greater than or equal to 2. Instead of doing a depth-first search at this point, we do a breadth-first search.

BREADTH-WISE SEARCH:
We look at the scenario in which we have at least one cell with minimum of (cellcnt, rowcnt, colcnt, blkcnt) exactly equal to 2. What we know is that one of these two records is a clue, and the other needs to be discarded. Suppose we set one of these to be a clue, and suppose this is the wrong decision. Our initial-processing do-loop will use this additional clue, and will discard records based on this information. If we then wish to backtrack and change our decision of assigning clue-status to this record, we will have to worry about re-instating all deleted records, and undoing changes made to the clue variable, because of this wrongly assigned clue.

Instead of getting into this complicated processing, we do the following: we delete one of the records from our dataset. This leaves just one of the two records in our dataset, and the initial-processing do-loop assigns clue=1 to this remaining record automatically in its next loop. Now if we made a mistake in our deletion decision, it is simple to re-instate our dataset: all we need to do is reinstate our last-correctly-processed dataset using the set statement in a SAS® datastep. Since deleting just one record may not solve the entire puzzle, we need to set up a delete sequence for all cells that have at most 2 possible entries. This is done in ⑦. Once we delete one of two records, the other becomes a clue in the next loop. This additional clue, along with the initial-processing do-loop, either leads to a solution, or we need to loop to the next record in the delete sequence.

SOLUTION:
Initially, we set the allclues macro variable to “FALSE”. When a solution is reached, the macro variable allclues is “TRUE”. This stops the do-while recursion and the solution is printed out using proc tabulate. Note that we make proc tabulate give us the mean of the variable entry, which, with one record per cell, is exactly the entry itself. We wish to mention here that a number of steps could be set up to loop through records when the minimum of (cellcnt, rowcnt, colcnt, blkcnt) is greater than 2. Even the most evil puzzles, however, seem to capitulate to the current code, so for the sake of simplicity we have left the code as is; we may come back to this in our talk in September.
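The delete-and-restore idea can be sketched end to end. This is an illustrative Python version under several assumptions, not a translation of vkalg: the function names are hypothetical, the choice of which cell to branch on uses cellcnt only, and the deletions are nested recursively, which may differ from how vkalg sequences them. Because each call works on its own copy of the record set, the caller's set plays the role of the last-correctly-processed dataset.

```python
from collections import Counter


def block_of(r, c):
    """1-based block number in the 9x9 grid."""
    return ((r - 1) // 3) * 3 + (c - 1) // 3 + 1


def conflicts(x, k):
    """True if records x and k (row, col, entry) cannot both hold."""
    (r, c, e), (kr, kc, ke) = x, k
    if x == k:
        return False
    if (r, c) == (kr, kc):
        return True
    return e == ke and (r == kr or c == kc or block_of(r, c) == block_of(kr, kc))


def propagate(records, clues):
    """Initial-processing loop: eliminate, count, promote count-1 records
    to clues, repeat. Returns (records, clues), or None if inconsistent."""
    records, clues = set(records), set(clues)
    nine = range(1, 10)
    while True:
        records = {x for x in records if not any(conflicts(x, k) for k in clues)}
        cell = Counter((r, c) for r, c, e in records)
        row = Counter((r, e) for r, c, e in records)
        col = Counter((c, e) for r, c, e in records)
        blk = Counter((block_of(r, c), e) for r, c, e in records)
        if any(0 in (cell[a, b], row[a, b], col[a, b], blk[a, b])
               for a in nine for b in nine):
            return None                       # a count dropped to zero
        found = {(r, c, e) for r, c, e in records
                 if 1 in (cell[r, c], row[r, e], col[c, e], blk[block_of(r, c), e])}
        if found <= clues:
            return records, clues             # no new clues: loop ends
        clues |= found


def solve(records, clues):
    """Return the 81 solution records, or None if there is no solution."""
    state = propagate(records, clues)
    if state is None:
        return None
    records, clues = state
    if len(clues) == 81:
        return clues
    cell = Counter((r, c) for r, c, e in records)
    # branch on an undecided cell with the fewest remaining records
    target = min((rc for rc, cnt in cell.items() if cnt > 1),
                 key=lambda rc: (cell[rc], rc))
    for victim in sorted(x for x in records if x[:2] == target):
        # delete one record; `records` itself is untouched, so a failed
        # attempt needs no explicit undo
        result = solve(records - {victim}, clues)
        if result is not None:
            return result
    return None
```

When the chosen cell has exactly two records, deleting one makes the other a clue on the next pass of `propagate`, as described above; if that attempt fails, the loop deletes the other record instead.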

SYMSUDOKU AND SAS4SAS:
Symsudoku was the first piece of code we developed. It seemed to us the most “natural” way of thinking. The input dataset contains exactly 81 records for the 9 rows and 9 columns of the sudoku 9x9 grid. That is, each record represents one cell. A cell entry is determined by the value of the variable a. For each cell (that is, record) we set up nine additional variables b1 through b9. Here b1=1 means that 1 is a possible entry and b1=0 means that 1 cannot be an entry in that cell. All the b-variables are initialized to 1. As the constraints are applied, the b-variables for entries that are ruled out are set to 0. Repeated use of call symput serves to fix the cell row, column, block and entry under consideration. Constraints are applied, and consistency checks carried out, by going into the dataset and comparing current data-vector values with the values fixed into the call symput macro variables.

Once again, as in vkalg, simply looking at cellsums, rowsums, colsums and blocksums can lead to a dead end. From that point on, one needs a different technique: either more sophisticated logic techniques, or a brute-force programming technique. We proceed to obtain a solution by applying the latter, a backtracking technique that, just as the initial processing, utilizes a few small and nifty macros.

THE BACKTRACKING ROUTINE IN SYMSUDOKU:
Please refer to the flowchart in Appendix 1b to follow along with the discussion below. What do we need for a backtracking algorithm to work? First, we need to identify all the candidate entries in each cell. Note that this is done in the first part of symsudoku by the b1-b9 variables: if, for example, b1=1 in any cell, then 1 is a possible entry for that cell. We need some mechanism to identify which entry was made and which would be the next possible entry. For this, we need to set up a sequence of possible entries (this will be bseq, as explained in the subsequent paragraph). We also need to be able to identify whether the entry was the last possible entry in a cell. We need this because if it is not the last entry we need to move to the next possible entry within that cell, and if it is the last entry in that cell, then either we need to move to the next cell, or, if in the backtracking phase, we need to move back to the previous cell.

In order to be able to move from cell to cell, or within a cell, we change the shape of our dataset ①. For each cell (one record in our original dataset) we create a record for each b-variable that has a value of 1.
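The reshaping step, together with the sequence and cell-boundary bookkeeping that the backtracking routine relies on, can be sketched as follows. This is an illustrative Python version, not the symsudoku code. The record fields bseq, nincell, sincell and entryl follow the variable names used in symsudoku; the function names and the two example cells are hypothetical, and the sketch assumes that only the still-undecided cells are passed in.

```python
def reshape(cells):
    """cells: list of (row, col, flags) in processing order, where flags is
    [b1, ..., b9] and 1 means 'still possible'. Returns one record per
    candidate entry, numbered sequentially."""
    records = []
    sincell = 0                      # candidates in this and all preceding cells
    for row, col, flags in cells:
        entries = [i + 1 for i, b in enumerate(flags) if b == 1]
        sincell += len(entries)
        for k, entry in enumerate(entries, start=1):
            records.append({
                "row": row, "col": col, "entry": entry,
                "bseq": len(records) + 1,          # position in the overall sequence
                "nincell": len(entries),           # candidates in this cell
                "sincell": sincell,
                "entryl": int(k == len(entries)),  # 1 on the cell's last candidate
            })
    return records


def next_in_cell(rec):
    """Pointer to the next candidate within the same cell."""
    return rec["bseq"] + 1


def first_of_next_cell(rec):
    """Pointer to the first candidate of the following cell."""
    return rec["sincell"] + 1


def last_of_previous_cell(rec):
    """Pointer to the last candidate of the preceding cell."""
    return rec["sincell"] - rec["nincell"]


# hypothetical example: cell (1,1) may hold 1, 3 or 9; cell (1,2) may hold 2 or 4
example = reshape([
    (1, 1, [1, 0, 1, 0, 0, 0, 0, 0, 1]),
    (1, 2, [0, 1, 0, 1, 0, 0, 0, 0, 0]),
])
```

Here the five candidate records get bseq 1 to 5; from any candidate of cell (1,1), `first_of_next_cell` gives 4, and from any candidate of cell (1,2), `last_of_previous_cell` gives 3.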
For example, if cell (row=1, column=1) has b1=1, b3=1 and b9=1, then in this new dataset we would have (row=1,column=1,entry=1), (row=1,column=1,entry=3), (row=1,column=1,entry=9) as three records for the same cell. Note that this gets us to the same form of the dataset as in vkalg.sas. In this dataset, we ensure that we have a variable, bseq, which goes sequentially from 1 to the total number of entries that are as yet only candidates for cell entry. Also, we mark the last candidate entry in a cell by EntryL ②. The variable Nincell identifies the number of candidate entries in a cell and the variable Sincell identifies the sum of candidate entries in the current and all preceding cells. We set up the macro variable allb to contain the total number of entries that are candidates.

Note that in this set-up we can identify, and fix, a candidate entry by simply entering all variable values in a record into macro variables ③. We set up ntrk to be the macro variable that points to a fixed candidate (the one with bseq=&ntrk). We can move to the next candidate in a cell by setting ntrk=ntrk+1; we can move to the next cell by setting ntrk=Sincell+1; and we can move to the previous cell by setting ntrk=Sincell-Nincell. We can check whether our candidate entry is the last entry in the cell by checking if EntryL=1. In this routine, we fix an entry by setting Clue=1 ④. To undo this while backtracking, all we need to do is to go back and set Clue=0 ⑤. When Sincell=allb and the candidate entry is consistent, we are done, and we print out the result using proc tabulate.

Note that the backtracking routine of symsudoku can be combined with the first part of vkalg.sas, where the initial clue constraints are applied. Alternatively, one may ignore the initial application of constraints altogether, and then one can simply use the backtracking routine to solve the entire puzzle.

SAS4SAS:
SAS4SAS is much more general in its approach.
It systematically looks for all possible combinations. It uses the backtracking technique to withdraw from dead-end combinations or to look for other solutions if some solutions have already been found. A puzzle is solved in one data step. Initial grid data is read into the program using input and cards statements. The program checks the original puzzle grid for inconsistencies before it proceeds to solving it ①. The puzzle grid is organized into a 2-dimensional array (puzzleg{}) ②.

The approach used is to get the list of all candidates for each of the rows, and find the right cells for these candidates one-by-one. For example, if the numbers 3, 5, 6, 8, and 9 are used in one row, the remaining numbers 1, 2, 4, and 7 are the candidates for that row. The program starts with the candidate with the lowest value, 1 in our example, and attempts to place that candidate in one of the empty cells in the row, moving from left to right. We say that 1 can be kept in a cell if there is no other 1 in the corresponding column, and no other 1 in the corresponding box (block). If 1 can be kept in the cell that we are attempting to fill, then the program takes the next candidate, 2 in our example, and attempts to place it in one of the remaining empty cells in the same row, and processing continues in a similar fashion for all the candidate digits. After all candidates have been successfully placed in one row, the program goes to the next row and processes the candidates for that row in the same way as described above. One solution to the puzzle is found when all rows are successfully filled in.

If the current candidate cannot be kept in the cell we are attempting to fill, the program attempts to place it in the next empty cell to the right in the same row, if any. If the current candidate cannot be kept in any of the empty cells of a row, the program abandons the current candidate. It then goes to the previously processed candidate and attempts to place it into the next empty cell located to its right, if any. If a solution cannot be found, that is, all candidates of a row cannot be placed within cells, the program goes back to the previous row, and continues to process the most-recently-processed candidate in that row. At the last stage, the program returns to the first row. At this point, if a solution cannot be found, there is no other row to go back to, and the program terminates. SAS4SAS has finished trying all possible combinations consistent with the given initial grid, and will at this point have found all solutions to the puzzle (all of which can be printed out).
Now, to implement the method described above, the program creates the following arrays:

1) a 2-dimensional array (candr{}) ③ to identify all candidates for a given row, where the first dimension indicates the rows, and the second dimension indicates the candidates in each row. The value of the array item is the candidate number ④, or missing if the corresponding number is not a candidate for the row ⑤.

2) a 2-dimensional array (candrc{}) ③ to keep the current column where each candidate is, where the first dimension indicates the rows, and the second dimension indicates the candidates in each row. The value of the array item is the column of the cell where the candidate is currently placed ④, or missing if the candidate is not currently placed in any cell of the row ⑤.

3) a 2-dimensional array (candc{}) ③ to identify all candidates for a given column, where the first dimension indicates the columns, and the second dimension indicates the candidates in each column. The value of the array item is the candidate number ④, or missing if the corresponding number is not a candidate for the column ⑤.

4) a 2-dimensional array (candb{}) ③ to identify all candidates for a given box, where the first dimension indicates the rows of the cells, and the second dimension indicates the columns of the cells. The value of the array item is the candidate number ④, or missing if the corresponding number is not a candidate for the box ⑤.

When attempting to place a candidate from a row into a cell, the program performs consistency checks to see whether the candidate from the row is also a candidate in the column of the cell one is attempting to fill, and also a candidate in the box (block) of this same cell. These consistency checks are done by looking at the current status of the number in the 2-dimensional array of column candidates ⑥, and the current status of the number in the 2-dimensional array of the box (block) candidates ⑦.
That is, checks are done in only two array cells rather than in all cells in the column and in all cells in the corresponding box (block). When a candidate can be placed in a cell, the program saves the column of the cell into the corresponding cell of the array that identifies current columns to which candidates belong. Then, the program updates the corresponding array items in the row candidates, column candidates, and box candidates to show that the number is no longer a candidate for the corresponding row, column, and box (block) ⑧.

When a candidate is removed from the puzzle grid because of backtracking, the program uses the number to be removed to point to the appropriate array items in the row candidate array, column candidate array, and box (block) candidate array. The program updates the row candidate array item, column candidate array item, and box (block) candidate array item to show that the number is again available as a candidate for the corresponding row, column, and box (block) ⑨. The program then identifies the next empty cell in the grid that can be filled with the current candidate of interest by using the saved column number from the array that indicates the current location of the candidates.

SAS4SAS can start from any grid status, including empty grids, and is capable of finding all possible solutions pertaining to the initial grid. After a solution has been found, the program goes on to find other solutions if any still exist. However, to look for a limited number of solutions, say 5, the user can set the macro variable limnsolutions to 5. If desired, SAS4SAS can show the intermediate steps for debugging purposes by setting the macro variable intermsteps to “yes”. SAS4SAS is capable of processing any nxn grid, whereas we humans agonize over 9x9 grids.
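The row-by-row placement of candidates, with table look-ups in place of scans of the column and box, can be sketched as follows. This is an illustrative Python version, not the SAS4SAS code: it uses recursion where SAS4SAS works inside a single data step, the availability tables `row_av`, `col_av` and `box_av` are hypothetical stand-ins for the candidate arrays, and the `limit` argument merely mimics the role of limnsolutions.

```python
import math


def solve_all(grid, limit=None):
    """grid: n x n list of lists with 0 for an empty cell. Returns every
    solution (each as a new grid), or at most `limit` of them."""
    n = len(grid)
    k = math.isqrt(n)
    g = [list(row) for row in grid]

    def box(r, c):
        return (r // k) * k + c // k

    # availability tables: True while value v is still a candidate for the unit
    row_av = [[True] * (n + 1) for _ in range(n)]
    col_av = [[True] * (n + 1) for _ in range(n)]
    box_av = [[True] * (n + 1) for _ in range(n)]

    # check the initial grid for inconsistencies before solving it
    for r in range(n):
        for c in range(n):
            v = g[r][c]
            if v:
                if not (row_av[r][v] and col_av[c][v] and box_av[box(r, c)][v]):
                    return []
                row_av[r][v] = col_av[c][v] = box_av[box(r, c)][v] = False

    solutions = []

    def place(r, v):
        """Place value v in row r, then continue with the next candidate or row."""
        if limit is not None and len(solutions) >= limit:
            return
        if r == n:                              # all rows filled: one solution
            solutions.append([list(row) for row in g])
            return
        if v > n:                               # row r is complete
            place(r + 1, 1)
            return
        if not row_av[r][v]:                    # v is already present in row r
            place(r, v + 1)
            return
        for c in range(n):                      # empty cells, left to right
            # two table look-ups instead of scanning the column and the box
            if g[r][c] == 0 and col_av[c][v] and box_av[box(r, c)][v]:
                g[r][c] = v
                row_av[r][v] = col_av[c][v] = box_av[box(r, c)][v] = False
                place(r, v + 1)
                g[r][c] = 0                     # backtrack: v is available again
                row_av[r][v] = col_av[c][v] = box_av[box(r, c)][v] = True

    place(0, 1)
    return solutions
```

Called on an empty 4x4 grid, `solve_all([[0] * 4 for _ in range(4)])` enumerates all 288 completed 4x4 grids, and passing `limit=5` stops after the first five.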


CONCLUSION: Setting up sudoku-solving routines in SAS® turned out to be a fun and challenging experience. Although we have three programs on this, we do not feel that we are done. Probably we will never be completely done with this. Having built up such a background, we will constantly be on the look-out for techniques that might improve processing methods, for algorithms that will give us aha-moments, for mathematical results that will make the whole process trivial. On a more practical plane, symsudoku and vkalg need to be generalized to the general nxn grid; perhaps macro processing can be removed from some of our programs and one can incorporate these into the data step for greater efficiency; we might also want to combine features of the three programs, and compare the CPU times, to get a feel for runtimes of these processes. Future work on this could be to look for deterministic ways to be able to tell whether a given puzzle is solvable or not and whether it has a unique solution or multiple solutions. Also, one could possibly modify our programs to produce sudoku puzzles -- grids that we humans would enjoy solving by hand. Moving forward from this point, it seems that the next sas-sudoku program should be written for the nxn grid. Also, it might be a good idea to get a standard way of reading in and outputting sudoku datasets, so that the output of one program can be fed into another; this would allow one to combine the best features of available programs and it would also make it easier to compare routines.

REFERENCES:
[1] Donald E. Knuth, “Dancing Links”, Stanford University
[2] Author, “Fu bared – Let’s dance”, http://www.jalat.com/blogs
[3] Rob Rohrbough, “RE: determining if a data set is empty (temp data set)”, www.listserv.uga.edu
[4] http://en.wikipedia.org/wiki/Sudoku

ACKNOWLEDGMENTS:
SAS® is a Registered Trademark of the SAS® Institute, Inc. of Cary, North Carolina. We are grateful to Mathematica Policy Research for providing us with some of the resources to work on this paper. We also wish to express our heartfelt thanks to our families for bearing with us during many sudoku-crazed moments.

CONTACT INFORMATION:
As we write, we are receiving requests for our code. We look forward to a lively discussion in September. We welcome questions and comments. Please feel free to contact us at:

Vatsala Karwe, Patricia Seunarine, Carol Razafindrakoto
Mathematica Policy Research
P.O. Box 2393
Princeton, NJ 08543
Work Phone: (609) 755-3535 (please ask to be connected to whomever you wish to speak to)
Email: [email protected]; [email protected]; [email protected]


Appendix 1a SYMSUDOKU

options nocenter ;

/* convenient way of reading in a sudoku puzzle */
data sdk;
  %macro mkvars;
    %do ic=1 %to 9;
      %do ir=1 %to 9;
        length evar&ir.&ic 8 ;
        evar&ir.&ic=.;
      %end;
    %end;
  %mend mkvars;
  %mkvars;
run;

data sdk;
  set sdk;
  array evars(9,9) evar11--evar99;
  do crow=1 to 9 ;
    do ccol=1 to 9 ;
      /* 1. informat: read one character per cell; '.' is read as missing */
      input evars{crow,ccol} 1. @ ;
    end ;
    input ;
  end ;
datalines ;
51.74....
.8.6....3
......9..
.26......
...3.8...
......15.
..4......
3....1.7.
....92.68
;
run;

/* rearrange data to prepare for application of constraints do-loop.
   make jc the column variable, kr the row variable and ib the block
   variable. let a be the cell entry. */
data sdk;
  set sdk;
  array evars(9,9) evar11--evar99;
  do crow=1 to 9;
    do ccol=1 to 9;
      jc=ccol;
      kr=crow;
      a=evars(crow,ccol);
      output;
    end;
  end;
  drop crow ccol evar: ;
run;

data sdk;
  set sdk;
  if (1 le kr le 3) and (1 le jc le 3) then ib=1;
  else if (1 le kr le 3) and (4 le jc le 6) then ib=2;
  else if (1 le kr le 3) and (7 le jc le 9) then ib=3;
  else if (4 le kr le 6) and (1 le jc le 3) then ib=4;
  else if (4 le kr le 6) and (4 le jc le 6) then ib=5;
  else if (4 le kr le 6) and (7 le jc le 9) then ib=6;
  else if (7 le kr le 9) and (1 le jc le 3) then ib=7;
  else if (7 le kr le 9) and (4 le jc le 6) then ib=8;
  else if (7 le kr le 9) and (7 le jc le 9) then ib=9;
run;

data sdk;
  set sdk;
  /* initialize b1-b9 to 1 for each ib jc kr */
  array b(9) b1-b9;
  do mm=1 to 9;
    b(mm)=1;
  end;
  drop mm;
run;

%global check;

/* macro setb chooses a cell with an entry, and fixes the block, column,
   row and entry values into macro variables. then sudoku constraints are
   applied, and b1-b9 are set equal to 0 according to these constraints */
%macro setb(dataset) ;
  data sdka;
    set &dataset;
    if a > 0;
  run;
  data sdka;
    set sdka;
    varn=_n_;
  run;
  %do n=1 %to 81;
    data _null_;
      set sdka (where = (varn = &n) );
      if (a > 0) then do;
        call symput("ifix", ib);
        call symput("jfix", jc);
        call symput("kfix", kr);
        call symput("afix", a);
      end;
    run;
    data &dataset;
      set &dataset;
      array b(9) b1-b9;
      /* set given clue cells */
      if ((ib=&ifix) and (jc=&jfix) and (kr=&kfix)) then do;
        do nm=1 to 9;
          b(nm)=0;
        end;
        b(&afix)=1;
      end;
      drop nm;
    run;
    data &dataset;
      set &dataset;
      array b(9) b1-b9;
      /* rule 1 -- no dups in 3x3 cells, columns or rows */
      if (a lt 1) then do;
        if ((ib=&ifix) or (jc=&jfix) or (kr=&kfix)) then b(&afix)=0;


      end;
    run;
  %end;
%mend setb;

/* macro chkit checks whether any inconsistencies come up due to recent
   updates to the dataset */
%macro chkit(dataset);
  %let check=noproblem;
  data chk&dataset;
    set &dataset;
    array b(9) b1-b9;
    /* check for consistency:: no dups in 3x3 cells, columns or rows */
    if (a ge 1) then do;
      if (^((ib=&ifix) and (jc=&jfix) and (kr=&kfix))
          and ((ib=&ifix) or (jc=&jfix) or (kr=&kfix))
          and a=&afix ) then do;
        call symput('check','problem');
        output;
      end;
    end;
  run;
  %if ("&check" = "problem") %then %do;
    data dummy;
      length message $46;
      message = "problem::inconsistency";
      label message='note note note:';
    run;
    proc print data=dummy noobs label;
      var message;
    run;
  %end;
%mend chkit;

/* macro showit displays the current state of the solution. all possible
   cell entries will get displayed in the cell */
%macro showit(dataset);
  data &dataset;
    set &dataset;
    length g1-g9 3 ;
    array b(9) b1-b9;
    array g(9) g1-g9;
    do l=1 to 9;
      if b(l)=1 then g(l)=l;
      else g(l)=0;
    end;
    /*for sas 9 * call cat(cellstat,g1,g2,g3,g4,g5,g6,g7,g8,g9); */
    /* for sas 8 */
    bcellstat = g1||g2||g3||g4||g5||g6||g7||g8||g9 ;
    cellstat=compress(trim(left(bcellstat)));
    cellstat=compress(cellstat,'0');
    /* 9. (not 8.) so that a cell with all nine candidates is not truncated */
    ncellstat=input(cellstat,9.);
    nia=(length(cellstat)=1);
    drop l bcellstat cellstat;
  run;
  proc tabulate data=&dataset;
    class jc kr;
    var ncellstat;
    tables kr * ncellstat=""*mean=""*f=9.0, jc /rts=6 row=float ;
  run;
%mend showit ;

/* macro cellsum calculates sum of b1-b9 within a cell. if only one of
   b1-b9 is one, then this fixes the cell entry */
%macro cellsum(dataset) ;
  data sdk ;
    set sdk;
    bsum=sum(of b1-b9);
    ia=(a > 0);
  run;
  data sdk ;
    set sdk;
    if ( (a lt 1) and bsum=1) then do;
      if b1=1 then a=1;
      else if b2=1 then a=2;
      else if b3=1 then a=3;
      else if b4=1 then a=4;
      else if b5=1 then a=5;
      else if b6=1 then a=6;
      else if b7=1 then a=7;
      else if b8=1 then a=8;
      else if b9=1 then a=9;
      upflag=1;
    end;
    ia=(a > 0);
  run;
%mend cellsum;

/* macro blocksum calculates the sum of (for example) b1 over all cells of
   a block -- if this sum, blocksum1, is equal to 1, then the entry in the
   cell with b1=1 gets fixed as equal to 1 */
%macro blocksum(dataset) ;
  proc sort data=&dataset;
    by ib;
  run;
  data blk&dataset (keep=ib blocksum: );
    set &dataset;
    by ib;
    if first.ib then do;
      %do mm=1 %to 9;
        blocksum&mm =0;
      %end;
    end;
    %do mm=1 %to 9;
      if b&mm=1 then blocksum&mm + 1 ;
    %end;
    if last.ib then output;
  run;
  data &dataset;
    merge &dataset blk&dataset;
    by ib;
  run;
  data &dataset;
    set &dataset;
    if blocksum1=1 and b1=1 and (a lt 1) then do; a=1; upflagb=1; end;


    else if blocksum2=1 and b2=1 and (a lt 1) then do; a=2; upflagb=1; end;
    else if blocksum3=1 and b3=1 and (a lt 1) then do; a=3; upflagb=1; end;
    else if blocksum4=1 and b4=1 and (a lt 1) then do; a=4; upflagb=1; end;
    else if blocksum5=1 and b5=1 and (a lt 1) then do; a=5; upflagb=1; end;
    else if blocksum6=1 and b6=1 and (a lt 1) then do; a=6; upflagb=1; end;
    else if blocksum7=1 and b7=1 and (a lt 1) then do; a=7; upflagb=1; end;
    else if blocksum8=1 and b8=1 and (a lt 1) then do; a=8; upflagb=1; end;
    else if blocksum9=1 and b9=1 and (a lt 1) then do; a=9; upflagb=1; end;
    drop blocksum: ;
  run;
%mend blocksum;

/* macro colsum calculates the sum of (for example) b1 over all cells of a
   column -- if this sum, colsum1, is equal to 1, then the entry in the
   cell with b1=1 gets fixed as equal to 1 */
%macro colsum(dataset);
  proc sort data=&dataset;
    by jc;
  run;
  data col&dataset (keep=jc colsum: );
    set &dataset ;
    by jc;
    if first.jc then do;
      %do mm=1 %to 9;
        colsum&mm=0;
      %end;
    end;
    %do mm=1 %to 9;
      if b&mm=1 then colsum&mm + 1 ;
    %end;
    if last.jc then output;
  run;
  data &dataset;
    merge &dataset col&dataset;
    by jc;
  run;
  data &dataset;
    set &dataset;
    if colsum1=1 and b1=1 and (a lt 1) then do; a=1; upflagc=1; end;
    else if colsum2=1 and b2=1 and (a lt 1) then do; a=2; upflagc=1; end;
    else if colsum3=1 and b3=1 and (a lt 1) then do; a=3; upflagc=1; end;
    else if colsum4=1 and b4=1 and (a lt 1) then do; a=4; upflagc=1; end;
    else if colsum5=1 and b5=1 and (a lt 1) then do; a=5; upflagc=1; end;
    else if colsum6=1 and b6=1 and (a lt 1) then do; a=6; upflagc=1; end;
    else if colsum7=1 and b7=1 and (a lt 1) then do; a=7; upflagc=1; end;
    else if colsum8=1 and b8=1 and (a lt 1) then do; a=8; upflagc=1; end;
    else if colsum9=1 and b9=1 and (a lt 1) then do; a=9; upflagc=1; end;
    drop colsum: ;
  run;
%mend colsum;

/* macro rowsum calculates the sum of (for example) b1 over all cells of a
   row -- if this sum, rowsum1, is equal to 1, then the entry in the cell
   with b1=1 gets fixed as equal to 1 */
%macro rowsum(dataset) ;
  proc sort data=&dataset;
    by kr;
  run;
  data row&dataset (keep=kr rowsum: );
    set &dataset ;
    by kr;
    if first.kr then do;
      %do mm=1 %to 9;
        rowsum&mm =0;
      %end;
    end;
    %do mm=1 %to 9;
      if b&mm=1 then rowsum&mm + 1 ;
    %end;
    if last.kr then output;
  run;
  data &dataset;
    merge &dataset row&dataset;
    by kr;
  run;
  data &dataset;
    set &dataset;
    if rowsum1=1 and b1=1 and (a lt 1) then do; a=1; upflagr=1; end;
    else if rowsum2=1 and b2=1 and (a lt 1) then do; a=2; upflagr=1; end;
    else if rowsum3=1 and b3=1 and (a lt 1) then do; a=3; upflagr=1; end;
    else if rowsum4=1 and b4=1 and (a lt 1) then do; a=4; upflagr=1; end;
    else if rowsum5=1 and b5=1 and (a lt 1) then do; a=5; upflagr=1; end;
    else if rowsum6=1 and b6=1 and (a lt 1) then do; a=6; upflagr=1; end;
    else if rowsum7=1 and b7=1 and (a lt 1) then do; a=7; upflagr=1; end;
    else if rowsum8=1 and b8=1 and (a lt 1) then do; a=8; upflagr=1; end;
    else if rowsum9=1 and b9=1 and (a lt 1) then do; a=9; upflagr=1; end;
    drop rowsum: ;
  run;
%mend rowsum;

%global datafind;
%global nrun;

data sdkout0;
  set sdk;
  if _n_ =1 ;
  a=999;
run;

/* macro allclues checks whether there is any change from one run to the
   next; if there is no change, the do-loop stops. */
%macro allclues(nrunn, dataset);

NESUG 2006

Ins & Outs

%global allb; Ø data temp (drop=b1-b9 ); set sdk; array b(9) b1-b9; do ii=1 to 9; if b(ii)=1 then do; entry=ii; output; end; end; drop ii; run; data sdka; set temp; if a > 0 ; entry=a; clue=9; nincell=1; sincell=1; entryf=1; entryl=1; entryseq=1; bseq=0; ; run; data temp; set temp; if a< 0; run; data temp ; set temp; cnter=_n_; run; proc sort data=temp; by descending cnter; run; data temp1; set temp; if _n_=1; run; proc print data=temp1; run; data temp1; set temp1; call symput("allb", cnter); run; data sdkmiss; set temp; run; proc sort data=sdkmiss; by jc kr ; run; data sdkmiss; set sdkmiss; by jc kr; entryf= first.kr; entryl= last.kr; Ù if first.kr then entryseq=0; entryseq + 1; bseq=_n_; run; proc sort data=sdkmiss; by jc kr ; run; data numposs (keep=jc kr bsum); set sdkmiss; by jc kr; if last.kr then output; run; data numposs; set numposs; by jc kr ; if _n_=1 then sincell=0; sincell + bsum; run; proc sort data=numposs (rename=(bsum=nincell)); by jc kr ; run; proc sort data=sdkmiss; by jc kr ; run; data sdkmiss; merge numposs (keep=jc kr nincell sincell) sdkmiss; by jc kr ; clue=0;run; data sdk; set sdka sdkmiss; run; /******macros used in backtracking ******************/ %let ntrk=1; %global atrk ; %global setclue ; %global entryf ; %global entryl ; %global itrk ; %global jtrk; %global ktrk ; %global cellseq ;

data sdkout&nrunn; set &dataset; %let iprev=%eval(&nrunn -1); proc compare base=sdkout&nrunn compare=sdkout&iprev out=diffsdk outnoequal noprint; run; %let datafind=false; proc print data=diffsdk; title "diffsdk ::: nrun= &nrun "; run; title; /* look for observations;*/ data _null_; set diffsdk; call symput('datafind','true'); call symput("nrun", %eval(&nrunn + 1) ); stop; run; /* if no observations, create "not found" report and redisplay data screen;*/ %if ("&datafind" ^="true") %then %do; data dummy; length message $46; message = "no change from &iprev to &nrun"; label message='notice:'; run; proc print data=dummy noobs label; var message; run; %end; %mend allclues; /*macro runit calls all the above macros in an appropriate sequence*/ %macro runit; %let datafind=true; %let nrun=1; %do %while("&datafind" = "true"); %setb(sdk); %cellsum(sdk); %setb(sdk); %blocksum(sdk); %setb(sdk); %rowsum(sdk); %setb(sdk); %colsum(sdk); %setb(sdk); %cellsum(sdk); %setb(sdk); %showit(sdk); %allclues(&nrun, sdk); %end; %mend runit; %runit; /*****************************************************/ /* _________end of "logical" way-of-thinking! */ /* backtracking algorithm begins........... */ /***change shape of data for backtracking ************/ 10

NESUG 2006

Ins & Outs

/* sets clue= 0 in fixed record of a cell */ run; %mend undorec; /*macro doneyet checks if all 81 clues have been identified*/ %macro doneyet; data sdkchk; set sdk; if clue > 0 ; run; data sdkchk; set sdkchk; chkn=_n_; run; proc sort data=sdkchk; by descending chkn; run; data chkn; set sdkchk; if _n_=1; call symput("doneyet", chkn); run; %mend doneyet; /*******backtracking algorithm begins ****************/ /*refer to flowchart (appendix 1b) ****************/ %let trak=0; %macro backtrak(dataset); Ü %retrn1: run; %if &ntrk le &allb %then %do; %fixone; %setclue; %trak; /*if done, exit */ %if &trak=0 and &sincell=&allb %then %do; %goto exitnow; %end; %if &trak=0 %then %do; %let ntrk=%eval(&sincell + 1); /* go to next cell */ %goto retrn1; %end; %else %if &trak=1 %then %do; %if &entryl=0 %then %do; %undorec; %let ntrk=%eval(&ntrk + 1); %goto retrn1; %end; %if &entryl=1 %then %do; %retrn2: run; %undorec; %let ntrk=%eval(&sincell - &nincell); %if &ntrk=0 %then %do; %put **********check---> inconsistency**************; %end;

%global nincell ; %global sincell ; %global entryseq ; /*fixes one of the many possible candidate entries in a cell to be the cell entry*/ %macro fixone ; Ú data _null_; set sdk (where = (bseq = &ntrk) ); call symput("atrk", entry); call symput("entryf", entryf); call symput("entryl", entryl); call symput("itrk", ib); call symput("jtrk", jc); call symput("ktrk", kr); call symput("nincell", nincell); call symput("sincell", sincell); call symput("entryseq", entryseq); call symput("setclue", clue); run; %mend fixone; /*identifies the ntrk record as a clue */ %macro setclue; data sdk; set sdk; /* set given clue cell */ if (bseq=&ntrk) then clue=1; run;

Û

%mend setclue; /*macro trak checks for consistency */ %macro trak; %let trak=0; data _null_; set sdk; /* check for consistency:: no dups in 3x3 cells, columns or rows */ if clue gt 0 then do; if (^((ib=&itrk) and (jc=&jtrk) and (kr=&ktrk)) and ((ib=&itrk) or (jc=&jtrk) or (kr=&ktrk)) and entry=&atrk ) then do; call symput("trak",'1'); end; end; run; %mend trak; /*macro undorec undoes what was done previously */ %macro undorec; data sdk; set sdk; /* set given clue cell */ if bseq=&ntrk then clue=0;

%fixone; %if &setclue=1 %then %do; %goto retrn2; %end; %else %do; %do %until( &setclue=1) ; 11

NESUG 2006

Ins & Outs

%let ntrk=%eval(&ntrk - 1); %fixone; %end; %end; %if &setclue=1 %then %do; %undorec; %let ntrk=%eval(&ntrk + 1); %goto retrn1; %end;

%end; %end; %end; %exitnow: run; %mend backtrak; %backtrak(sdk);

data sdk ; set sdk; if clue=0 then delete; else if clue=1 then a=entry; run; proc tabulate data=sdk; class jc kr; var a; tables kr * a=""*mean=""*f=6.0, jc /rts=6 row=float ; title "sudoku solution"; run; title; endsas;
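The macros blocksum, colsum and rowsum above differ only in the grouping variable (ib, jc or kr) and the update flag they set. As a minimal sketch, assuming the same dataset layout (candidate flags b1-b9 and the entry a, which is below 1 while a cell is unsolved), the three could be folded into one parameterized macro. The macro name unitsum, the work dataset _sum&byvar and the variables usum1-usum9 are hypothetical and not part of the original program.

```sas
/* Hypothetical generalization of blocksum/colsum/rowsum (not from the paper). */
/* Assumes &dataset holds one record per cell with b1-b9, a, and &byvar.       */
%macro unitsum(dataset, byvar, flag);
  proc sort data=&dataset; by &byvar; run;

  /* per unit (block, column or row): in how many cells is each digit possible? */
  data _sum&byvar (keep=&byvar usum1-usum9);
    set &dataset;
    by &byvar;
    array b(9) b1-b9;
    array usum(9) usum1-usum9;
    retain usum1-usum9;
    if first.&byvar then do d=1 to 9; usum(d)=0; end;
    do d=1 to 9;
      if b(d)=1 then usum(d)=usum(d)+1;
    end;
    if last.&byvar then output;
  run;

  /* a digit that is possible in exactly one cell of the unit is fixed there */
  data &dataset (drop=d usum1-usum9);
    merge &dataset _sum&byvar;
    by &byvar;
    array b(9) b1-b9;
    array usum(9) usum1-usum9;
    do d=1 to 9;
      /* once a is set, "a lt 1" is false, so only the first match is used */
      if usum(d)=1 and b(d)=1 and (a lt 1) then do;
        a=d;
        &flag=1;
      end;
    end;
  run;
%mend unitsum;

/* usage, mirroring %blocksum, %colsum and %rowsum */
/* %unitsum(sdk, ib, upflagb);                     */
/* %unitsum(sdk, jc, upflagc);                     */
/* %unitsum(sdk, kr, upflagr);                     */
```

The array loop replaces the nine-branch if/else chain; the behavior should match as long as b1-b9 and a are maintained by %setb as in the original program.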


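Appendix 1b SYMSUDOKU – FLOWCHART FOR BACKTRACKING ALGORITHM

[Flowchart not recoverable from the extracted text. The steps below are pieced together from the surviving box labels and from macro backtrak in Appendix 1.]

1. Start with ntrk=bseq of the candidate record to try. Run fixone and set clue=1 for that record.
2. Consistent? (trak)
   YES: if Sincell=Allb, go to exitnow; otherwise set ntrk=Sincell+1 and return to step 1.
   NO: go to step 3.
3. EntryL=1?
   NO: Undorec; ntrk=ntrk+1; return to step 1.
   YES: Undorec; ntrk=Sincell-Nincell; go to step 4.
4. Clue=1 for record ntrk?
   YES: Undorec; ntrk=Sincell-Nincell; repeat step 4.
   NO: set ntrk=ntrk-1 until a record with Clue=1 is reached; then Undorec; ntrk=ntrk+1; return to step 1.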
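The flow above moves forward while the tentative entries stay consistent and steps back to the most recent tentative entry when they do not. As an illustration of the same idea only, and not the authors' macro implementation, the following sketch does the backtracking inside a single DATA step with temporary arrays. The array names g and given, the helper variables, and the input layout (0 for an empty cell) are assumptions; the grid in the datalines is transcribed from the clue list in Appendix 2.

```sas
/* Sketch only: iterative backtracking in one DATA step (not the paper's code). */
data _null_;
  array g(9,9) _temporary_;      /* working grid; 0 marks an empty cell */
  array given(9,9) _temporary_;  /* 1 where the cell is a pre-assigned clue */
  length line $17;

  /* read 9 rows of 9 space-separated digits */
  do r = 1 to 9;
    do c = 1 to 9;
      input v @;
      g(r,c) = v;
      given(r,c) = (v > 0);
    end;
    input;
  end;

  pos = 1;   /* cell index 1..81, row by row */
  dir = 1;   /* +1 = moving forward, -1 = backtracking */
  do while (1 <= pos <= 81);
    r = ceil(pos/9);
    c = pos - 9*(r-1);
    if given(r,c) then pos = pos + dir;   /* skip clues in either direction */
    else do;
      found = 0;
      br = 3*floor((r-1)/3);              /* row offset of the 3x3 block */
      bc = 3*floor((c-1)/3);              /* column offset of the 3x3 block */
      /* try the digits above the one currently in the cell */
      do d = g(r,c)+1 to 9 while (not found);
        ok = 1;
        do k = 1 to 9 while (ok);
          if g(r,k)=d or g(k,c)=d then ok = 0;
        end;
        do i = 1 to 3 while (ok);
          do j = 1 to 3 while (ok);
            if g(br+i,bc+j)=d then ok = 0;
          end;
        end;
        if ok then do; g(r,c) = d; found = 1; end;
      end;
      if found then dir = 1;
      else do; g(r,c) = 0; dir = -1; end;  /* undo and step back */
      pos = pos + dir;
    end;
  end;

  if pos > 81 then do r = 1 to 9;
    line = ' ';
    do c = 1 to 9;
      line = catx(' ', line, g(r,c));
    end;
    put line;
  end;
  else put 'no solution: the clues are inconsistent';
  stop;
datalines;
0 0 0 0 0 0 8 6 2
0 0 9 0 0 0 0 0 0
4 6 2 0 0 1 0 0 0
0 0 3 4 2 6 0 0 7
0 0 0 0 0 0 0 0 0
6 0 0 9 3 7 5 0 0
0 0 0 8 0 0 4 3 1
0 0 0 0 0 0 7 0 0
9 3 7 0 0 0 0 0 0
;
run;
```

Unlike the macro version, this sketch tries every digit in every empty cell rather than only the candidates left after the logical steps, so it trades the paper's pruning for brevity.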


Appendix 2 VKALG

options nocenter ;
data new;
  do entry=1 to 9 ;      /* entry */
    do c=1 to 9 ;        /* column */
      do r=1 to 9 ;      /* row */
        if (1 le r le 3) and (1 le c le 3) then block=1;
        else if (1 le r le 3) and (4 le c le 6) then block=2;
        else if (1 le r le 3) and (7 le c le 9) then block=3;
        else if (4 le r le 6) and (1 le c le 3) then block=4;
        else if (4 le r le 6) and (4 le c le 6) then block=5;
        else if (4 le r le 6) and (7 le c le 9) then block=6;
        else if (7 le r le 9) and (1 le c le 3) then block=7;
        else if (7 le r le 9) and (4 le c le 6) then block=8;
        else if (7 le r le 9) and (7 le c le 9) then block=9;
        output;
      end;
    end;
  end;
  cellcnt=.; rowcnt=.; colcnt=.; blkcnt=.;
run;

data new;
  set new;
  /*Clues*/
  if ( r=1 and c=7 and entry=8 or r=1 and c=8 and entry=6 or r=1 and c=9 and entry=2 or
       r=2 and c=3 and entry=9 or
       r=3 and c=1 and entry=4 or r=3 and c=2 and entry=6 or r=3 and c=3 and entry=2 or
       r=3 and c=6 and entry=1 or
       r=4 and c=3 and entry=3 or r=4 and c=4 and entry=4 or r=4 and c=5 and entry=2 or
       r=4 and c=6 and entry=6 or r=4 and c=9 and entry=7 or
       r=6 and c=1 and entry=6 or r=6 and c=4 and entry=9 or r=6 and c=5 and entry=3 or
       r=6 and c=6 and entry=7 or r=6 and c=7 and entry=5 or
       r=7 and c=4 and entry=8 or r=7 and c=7 and entry=4 or r=7 and c=8 and entry=3 or
       r=7 and c=9 and entry=1 or
       r=8 and c=7 and entry=7 or
       r=9 and c=1 and entry=9 or r=9 and c=2 and entry=3 or r=9 and c=3 and entry=7 )
  then clue=1;
run;

%macro doit;
  %local dochk;
  %let dochk=1;
  %global inconsis;
  %global chkwarn;
  %let inconsis=0;
  %let CHKWARN=ok;
  %do %while (&dochk=1 and &chkwarn=ok and &inconsis=0);
    /* any of these nodupkeys produces +ve # of deletes, means there
       exists an inconsistency*/
    data cluesRC; set new; if clue=1;
    /* (1) */
    proc sort nodupkey data=cluesRC (keep=r c); by r c ; run;
    data cluesR; set new; if clue=1;
    proc sort nodupkey data=cluesR (keep=r entry); by r entry ; run;
    data cluesC; set new; if clue=1;
    proc sort nodupkey data=cluesC (keep=c entry); by c entry ; run;
    data cluesB; set new; if clue=1;
    proc sort nodupkey data=cluesB (keep=block entry); by block entry ; run;

    proc sort data=new; by r c; run;
    /* (2) */
    data new discardRC ;
      merge cluesRC (in=in1) new;
      by r c;
      Cellflag=in1;
      if ^(in1 and clue^=1) then output new;
      else output discardRC;
    run;
    proc sort data=new; by r entry ; run;
    data new discardR;
      merge cluesR (in=in1) new;
      by r entry;
      rowflag=in1;
      if ^(in1 and clue^=1) then output new;
      else output discardR;
    run;
    proc sort data=new; by c entry ; run;
    data new discardC ;
      merge cluesC (in=in1) new;
      by c entry;
      colflag=in1;
      if ^(in1 and clue^=1) then output new;
      else output discardC;
    run;
    proc sort data=new; by block entry ; run;
    data new (drop=cellcnt rowcnt colcnt blkcnt) discardB ;
      merge cluesB (in=in1) new;
      by block entry;
      blkflag=in1;
      if ^(in1 and clue^=1) then output new;
      else output discardB ;
    run;

    /*check for cell sums */
    /* (3) */
    %macro checkit(restrict, r1var, r2var);
      proc sort data=new ; by &r1var &r2var ;
      data &restrict.cnt;
        set new ;
        by &r1var &r2var;
        firstc=first.&r2var;
        lastc=last.&r2var;
        if first.&r2var then do; &restrict.cnt=0; end;
        &restrict.cnt + 1;
        if last.&r2var then output;
      run;
    %mend;
    %checkit(cell, r, c);
    %checkit(row, r, entry);
    %checkit(col, c, entry);
    %checkit(blk, block, entry);
    proc sort data=cellcnt (keep=r c cellcnt); by r c ; run;
    proc sort data=rowcnt (keep=r entry rowcnt); by r entry; run;
    proc sort data=colcnt (keep=c entry colcnt); by c entry; run;
    proc sort data=blkcnt (keep=block entry blkcnt); by block entry; run;
    proc sort data=new ; by r c ; run;
    data new ; merge cellcnt new ; by r c ; run;
    /* (4) */
    proc sort data=new; by r entry; run;
    data new; merge rowcnt new; by r entry; run;
    proc sort data=new ; by c entry; run;
    data new; merge colcnt new; by c entry; run;
    proc sort data=new; by block entry; run;
    data new; merge blkcnt new; by block entry; run;

    %let dochk=0;
    data new ;
      set new;
      if clue=1 then do;
        if (cellcnt > 1 or rowcnt >1 or colcnt > 1 or blkcnt >1) then do;
          /* (5) */
          call symput('CHKWARN','INCONSISTENT CLUES');
        end;
      end;
      if clue ^=1 then do;
        if (cellcnt=1 or rowcnt=1 or colcnt=1 or blkcnt=1) then do;
          clue=1;
          cellcnt=1; rowcnt=1; colcnt=1; blkcnt=1;
          fclue=1;
          call symput('dochk','1');
        end;
        if fclue ^=1 then do;
          if ((cellcnt le 1) or (rowcnt le 1) or (colcnt le 1) or (blkcnt le 1)) then do;
            call symput('inconsis','1');
          end;
        end;
      end;
    run;
    %put Inconsis= &inconsis ;
    %put CHKWARN===== &chkwarn;
  %end;
  %global allclues;
  %let allclues=TRUE;
  /* (6) */
  data _null_;
    set new (where =(clue ^= 1));
    call symput ('allclues','FALSE');
  run;
%mend doit;
%doit;

/* PREPARE FOR RE-LOOP METHOD*****/
/* delete records one-by-one; re-install if no solution found */
%macro prepit;
  data new; set new; delseq=. ; run;
  data newclue;
    set new (where=( clue^=1 and
                     (cellcnt=2 or rowcnt=2 or colcnt=2 or blkcnt=2 )));
    delseq=.;
  run;
  data newclue;
    set newclue (drop=delseq);
    Delseq = _N_ ;
    /* (7) */
  run;
  %global numnon;
  proc sql noprint;
    select nobs into :numnon
    from dictionary.tables
    where upcase(libname)="WORK" and upcase(memname)="NEWCLUE";
  quit;
  run;
  proc sort data=newclue; by r c entry; run;
  proc sort data=new ; by r c entry; run;
  data newfix;
    merge new newclue (keep=r c entry Delseq);
    by r c entry;
  run;
%mend prepit;
%prepit;

%let useit=*; /*toggle useit to blank if necessary*/

/* RE-LOOP BEGINS *****************************/
%macro recit;
  %let il=1;
  %do %while((&allclues=FALSE) and (&il le %eval(&numnon))) ;
    data new;
      set newfix;
      if Delseq=&il then delete;
      call symput('il',%eval(&il + 1));
    run;
    %doit;
    &useit %prepit;
  %end;
  %if &allclues=TRUE %then %do;
    proc tabulate data=new;
      class r c;
      var entry;
      tables r * entry=""*Mean=""*f=2.0, c /rts=6 row=float;
    run;
  %end;
%mend RECIT;
%RECIT;
endsas;
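VKALG starts from one record per (entry, column, row) triple, 729 in all, and repeatedly counts how many candidate records remain per cell, per row/entry, per column/entry and per block/entry. The following is a minimal sketch of that setup, assuming the same variable names r, c, entry and block. The dataset name cand, the arithmetic block formula and the use of PROC SQL in place of %checkit are illustrative choices, not the authors' code.

```sas
/* Sketch only: candidate table and counts, with hypothetical names. */
data cand;
  do entry = 1 to 9;
    do c = 1 to 9;
      do r = 1 to 9;
        /* same numbering as the if/else chain: blocks 1-3 across rows 1-3, etc. */
        block = 3*floor((r-1)/3) + floor((c-1)/3) + 1;
        output;
      end;
    end;
  end;
run;

proc sql;
  /* candidates remaining per cell (what %checkit(cell, r, c) counts) */
  create table cellcnt as
    select r, c, count(*) as cellcnt
    from cand
    group by r, c;

  /* cells still open to each entry within a block (%checkit(blk, block, entry)) */
  create table blkcnt as
    select block, entry, count(*) as blkcnt
    from cand
    group by block, entry;
quit;
```

Before any clue is applied every count is 9; a count of 1 after elimination is the situation in which doit promotes the remaining record to clue=1.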


Appendix 3 SAS4SAS

/* SAS4SAS
   SUDOKU ANALYSER/SOLVER FOR SAS
   This program analyzes and solves SUDOKU puzzles
   It is capable of finding all possible solutions to any puzzle
   It can also solve SUDOKU puzzles in dimensions other than the classic 9x9 grid
*/

*-- Define Program parameters ;
%let intermsteps = yes ;     /* show intermediate steps for debugging purposes */
%let intermsteps = no ;      /* do not show intermediate steps for debugging purposes */
%let limnsolutions = 5 ;     /* limit of number of solutions */
%let limnsolutions = 500 ;   /* limit of number of solutions */
%let limnsolutions = 1 ;     /* limit of number of solutions */

*-- Define Grid Attributes ;
%let boxsize = 2 ;           /* box size (puzzle dimension) */
%let boxsize = 4 ;           /* box size (puzzle dimension) */
%let boxsize = 3 ;           /* box size (puzzle dimension) */
%let gridsize = %eval(&boxsize.*&boxsize.) ;            /* grid size */
%let ncells = %eval(&gridsize.*&gridsize.) ;            /* number of cells */
%let cellwidth = 3 ;                                    /* cell width */
%let gridwidth = %eval((&cellwidth.+1)*&gridsize.+1) ;  /* grid width */
options ps=%sysfunc(max(%eval(2*&gridsize.+5),60))
        ls=%sysfunc(max(%eval((&cellwidth.+1)*&gridsize.+10),80)) ;

*== Define macro procedures ;
%macro genotherparam ; /* generate other parameters */
  %global debug_opt ; /* use with selected put statement */
  %if %upcase(&intermsteps)=YES %then %let debug_opt = ;
  %else %let debug_opt = * ;
%mend ;
%genotherparam ;

%macro list2dimv(vgenn,dim1,dim2) ; /* list 2-dimensional array items */
  %do idim1=1 %to &dim1. ;
    %let fidim1 = %sysfunc(putn(&idim1.,z2.)) ;
    %do idim2=1 %to &dim2. ;
      &vgenn.&fidim1.%sysfunc(putn(&idim2.,z2.))
    %end ;
  %end ;
%mend ;

*=== Main Procedure ;
title1 "********** SUDOKU ANALYZER/SOLVER FOR SAS **********" ;


data puzzlegrid (keep=nsolution crow_ ccol_ puzzlecell) ;
  file print ;
  length gridtitlef $&gridwidth. ;
  gridtitlef = repeat("*",&gridwidth.-1) ;
  *-- define puzzle grid ;
  array puzzleg{&gridsize.,&gridsize.} %list2dimv(puzzleg,&gridsize.,&gridsize.) ;
  *-- get puzzle ;
  do crow=1 to &gridsize. ;
    do ccol=1 to &gridsize. ;
      input puzzleg{crow,ccol} @ ;
    end ;
    input ;
  end ;
  *-- check validity of puzzle grid ; /* (1) */
  puzzleok = 1 ;
  do crow=1 to &gridsize.-1 while (puzzleok) ;
    do ccol=1 to &gridsize.-1 while (puzzleok) ;
      if .