Exact, Uniform Sampling of Latin Squares and Sudoku Matrices
arXiv:1502.00235v1 [math.ST] 1 Feb 2015
Stephen DeSalvo

January 31, 2015

Abstract. We provide a method for the exact, uniform random sampling of Latin squares and Sudoku matrices via probabilistic divide-and-conquer. Using an implementation in C++, we have generated i.i.d. samples of size 1000 each of Sudoku matrices and Latin squares of order n, 5 ≤ n ≤ 9.

Keywords. Random sampling, Latin square, Sudoku, probabilistic divide-and-conquer, rejection sampling

MSC classes: 60C05, 65C50, 60-04
1 Introduction
The random sampling of combinatorial structures is an active topic, with many papers devoted to applying the principles of rejection sampling, see for example [Duchon et al., 2004], or running a Markov chain, see for example [Jerrum & Sinclair, 1996]. Each method presents difficulties: rejection sampling provides exact samples in finite time, but that finite time is often too large to be useful in practice; rapidly mixing Markov chains are often easily fashioned to a problem, along with bounds on the mixing time, but they are notably inexact in finite time. And while Markov chain coupling from the past [Propp & Wilson, 1996] is an exact sampling method, it typically requires either a complete enumeration of the objects or at least some type of monotonic structure, which is often not easily accessible. Our algorithm of choice is probabilistic divide-and-conquer (PDC) [Arratia & DeSalvo, 2011], which is an exact sampling method in finite time. It is provably more efficient than hard rejection sampling, which is rejection sampling with rejection probability 0 or 1, according as the sample lies within a prescribed set of objects, which we call the target region, or outside of it. The main idea of PDC is to break up the hard rejection step into two smaller steps: the first step generates a piece of the random object, weighted according to its prevalence inside the target region, and then the remaining part of the random object is completed in proportion to its prevalence inside the target region. There are many ways to implement PDC, and in this paper we implement a version called almost deterministic second half, which requires very little to be known about the target region. When we do have detailed knowledge about the target region, in particular partial completions of random objects inside the target region, as is the case for Sudoku matrices, we show how one can design a more efficient PDC algorithm. In addition, our algorithms work
Department of Mathematics, University of California Los Angeles.
[email protected]
directly with permutations and tables of values, and do not require any sophisticated implementations of transformations other than permutations of rows and columns in a matrix, nor do they require any exceedingly large amounts of storage. There have been many notable approaches which apply specifically to the random generation of Sudoku matrices. An efficient backtracking algorithm is utilized in [Newton & DeSalvo, 2010], though no quantitative bounds are given for the bias. Another approach is to fashion a Markov chain with uniform stationary distribution; this was described in [Jacobson & Matthews, 1996]; see also [Fontana et al., 2012]. Importance sampling delivers a collection of Sudoku matrices as well as weights which allow one to calculate unbiased estimates of statistical parameters; this was done in [Ridder, 2013]. Since Sudoku matrices are in one-to-one correspondence with certain types of graphs, see for example [Fontana, 2013], we surmise that a large number of Markov chain algorithms exist which could be adapted to this problem, and, since our present goal is an algorithm which is exact and u.a.r. in finite time, we do not pursue this further. The paper is organized as follows. In Section 2 we give the definitions of a Latin square of order n and a Sudoku matrix. In Section 3.1 we present some simple rejection sampling algorithms and calculate rejection probabilities, and in Section 3.2 we state and describe our recommended algorithms for random generation of Sudoku matrices and Latin squares. In Section 4 we review the relevant combinatorial arguments which justify the various rejection probabilities used, and analyze the run–time costs of the algorithms. In Section 5 we review other approaches.
2 Definitions
Sudoku matrices are 9 × 9 matrices which satisfy the following row, column, and block constraints, which we refer to as the Sudoku conditions:

• Each row, labelled R1, . . . , R9, is a permutation of {1, 2, . . . , 9};
• Each column, labelled C1, . . . , C9, is a permutation of {1, 2, . . . , 9};
• There are nine 3 × 3 sub-blocks, labelled B1, B2, . . . , B9, each of which is a permutation of {1, 2, . . . , 9}; these blocks are arranged as indicated below.

B1 B2 B3
B4 B5 B6
B7 B8 B9
We let S denote the set of all Sudoku matrices. A Latin square of order n is an n × n matrix which satisfies the following row and column constraints, hereafter referred to as the Latin square conditions:

• Each row, labelled R1, . . . , Rn, is a permutation of {1, 2, . . . , n}.
• Each column, labelled C1, . . . , Cn, is a permutation of {1, 2, . . . , n}.

We let LSn denote the set of all Latin squares of order n. Note that S ⊂ LS9. In the calculations that follow, a ≈ b has the elementary meaning that a and/or b have been rounded to some nearby value.
3 Sampling Algorithms

3.1 Rejection Sampling Algorithms
There are various simple algorithms for obtaining a uniformly at random (u.a.r.) element of S; three approaches are below, which easily generalize to LSn for any n.

1. Sample each entry i.i.d. uniform over the set {1, 2, . . . , 9}; restart if any of the Sudoku conditions are not satisfied.

2. Sample each row (or column or block) as i.i.d. uniform over the set of all permutations of {1, 2, . . . , 9}; restart if any of the Sudoku conditions are not satisfied.

3. Let the first row be (1, 2, . . . , 9). Sample rows 2 through 8 as i.i.d. uniform over the set of all fixed–point free permutations of {1, 2, . . . , 9}; if row 9 can be completed with all Sudoku conditions satisfied, then complete it, transform the entries by a random permutation of {1, . . . , 9}, and return the matrix; otherwise, restart.

These algorithms are generally known as rejection algorithms, since the target set of objects S lies within the complete sample space Ω. Since the algorithm generates samples uniformly over all elements of Ω, any Sudoku matrix generated by such a hard rejection algorithm is also uniform over S. The expected number of times one must sample elements of Ω before obtaining a sample from S is simply the quotient |Ω|/|S|, where the notation |A| denotes the number of elements in a finite set A. It was shown in [Felgenhauer & Jarvis, 2006] that the number of Sudoku matrices is exactly |S| = 6670903752021072936960 ≈ 6.67 × 10^21. This number is too large to simply list all of the elements and select one at random.
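The rejection step in the second scheme is mechanical to code. For Sudoku matrices the expected number of restarts is astronomically large (the rates are computed below), but the same scheme terminates quickly for Latin squares of small order; the following sketch (our own illustration, not the paper's implementation) samples a Latin square by row-wise hard rejection:

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

using Square = std::vector<std::vector<int>>;

// Check the Latin square conditions: every row and every column of the
// n x n matrix is a permutation of {1, ..., n}.
bool is_latin(const Square& m, int n) {
    for (int i = 0; i < n; ++i) {
        std::vector<bool> row(n + 1, false), col(n + 1, false);
        for (int j = 0; j < n; ++j) {
            if (row[m[i][j]] || col[m[j][i]]) return false;
            row[m[i][j]] = col[m[j][i]] = true;
        }
    }
    return true;
}

// Hard rejection (scheme 2): draw each row as an independent uniform
// permutation of {1, ..., n} and restart until the result is a Latin square.
// At n = 4 this takes (4!)^4 / 576 = 576 attempts on average.
Square latin_by_rejection(int n, std::mt19937& rng) {
    Square m(n, std::vector<int>(n));
    std::vector<int> row(n);
    std::iota(row.begin(), row.end(), 1);
    do {
        for (int i = 0; i < n; ++i) {
            std::shuffle(row.begin(), row.end(), rng);
            m[i] = row;
        }
    } while (!is_latin(m, n));
    return m;
}
```

The row constraint is satisfied automatically, so only the column conditions can trigger a restart; already at n = 6 or 7 the restart count makes this approach painful, which is what motivates PDC below.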
There are 9^81 ≈ 2.0 × 10^77 matrices in Ω1 = {1, . . . , 9}^{9×9}, so for the first, most naive approach of i.i.d. entry sampling, the expected number of times one must sample from Ω1 is about 3.0 × 10^55. There are |Ω2| = (9! choose 9) ≈ 3.0 × 10^44 different ways of selecting 9 distinct permutations of {1, 2, . . . , 9}, whence |Ω2|/|S| ≈ 4.5 × 10^22. The third algorithm is the most respectable of the three, as it eliminates two of the rows and reduces the number of possible permutations from 9! to ⌊9!/e⌋. However, since we are fixing the first row, we can only sample from those elements of S which also have the same first row; thus, we must normalize by |S|/9! instead of |S|. We have

Ω3 = {the set of all collections of seven fixed–point free permutations of {1, 2, . . . , 9}},

hence,

|Ω3|/(|S|/9!) ≈ (⌊9!/e⌋)^7/(1.8 × 10^16) ≈ 4.2 × 10^19.

These rejection rates are not practical. At this stage, there are two different ways to proceed. One approach is to relax the demand that the measure on S be exactly uniform, and instead adopt algorithms which are faster but not guaranteed to be uniform. The second approach is to take advantage of more detailed properties of the elements of S, which we now explore using PDC.
3.2 PDC Sampling Algorithms
Rejection sampling algorithms can very often be improved, with very little extra information known about the target region, using PDC. One must be able to write the sample space Ω as pairs of elements (A, B), where A ∈ A and B ∈ B, for some sets of elements A and B which can be sampled independently. Then, the target distribution on S ⊂ Ω, denoted by L(S), must be such that L(S) = L((A, B) | h(A, B) = 1), where h : A × B → {0, 1} is some (measurable) functional which determines whether or not the pair (A, B) lies in the target set. The PDC Lemma [Arratia & DeSalvo, 2011, Lemma 2.1] states that we may sample first from L(X) = L(A | h(A, B) = 1), say with observation X = x, and then from L(Y | X = x) = L(B | h(x, B) = 1), whence L(S) = L(X, Y). An algorithm to perform these steps is as follows:

1. Generate a sample from L(A | h(A, B) = 1), call it x.
2. Generate a sample from L(B | h(x, B) = 1), call it y.
3. Return (x, y).
Often, however, the conditional distributions are not known, and so the more practical PDC algorithm utilizes von Neumann's rejection sampling approach [Von Neumann, 1951], which allows us to sample from the conditional distribution L(A | h(A, B) = 1) using L(A) and a biased coin.

Algorithm 1 [Arratia & DeSalvo, 2011] Probabilistic Divide-and-Conquer via von Neumann
1: Generate a sample from L(A), call it a.
2: Accept a with probability t(a), where t(a) is a function of L(B) and h; otherwise, restart.
3: Generate a sample from L(B | h(a, B) = 1), call it y.
4: Return (a, y).
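Algorithm 1 can be illustrated on a toy target of our own (not from the paper): a pair of independent fair dice (A, B) conditioned on A + B ≥ 10. Here P(h(a, B) = 1) = max(0, a − 3)/6, the optimal normalizing constant is α = 1/2 (attained at a = 6), and L(B | h(a, B) = 1) is uniform on {10 − a, . . . , 6}:

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <utility>

// PDC via von Neumann (Algorithm 1) for the toy target
// L((A, B) | A + B >= 10) with A, B independent uniform on {1,...,6}.
std::pair<int, int> pdc_dice(std::mt19937& rng) {
    std::uniform_int_distribution<int> die(1, 6);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const double alpha = 0.5;  // max_a P(h(a,B) = 1) = P(B >= 4) = 1/2, at a = 6
    for (;;) {
        int a = die(rng);                          // step 1: sample from L(A)
        double p = std::max(0, a - 3) / 6.0;       // P(h(a,B) = 1) = P(B >= 10 - a)
        if (unif(rng) < p / alpha) {               // step 2: accept w.p. t(a) = p/alpha
            // step 3: sample directly from L(B | h(a,B) = 1), uniform on {10-a,...,6}
            std::uniform_int_distribution<int> tail(10 - a, 6);
            return {a, tail(rng)};
        }
    }
}
```

A frequency check confirms exactness: each of the six admissible pairs (4,6), (5,5), (5,6), (6,4), (6,5), (6,6) occurs with probability 1/6, since the acceptance step weights each value a exactly in proportion to its prevalence inside the target region.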
In order to sample directly from the conditional distribution L(B | h(a, B) = 1), we choose L(A) such that L(B | h(a, B) = 1) is trivial, or consists of only a few elements. The acceptance probability t(a) must satisfy

t(a) = P(h(a, B) = 1)/α ≤ 1, for all a ∈ A and some 0 < α ≤ 1.
The optimal choice of α is max_a P(h(a, B) = 1), but the algorithm is still valid and unbiased for any α larger than this value. Thus, in order to apply PDC, it suffices to choose any universal upper bound of P(h(a, B) = 1), but greater efficiency is achieved by selecting the optimal value of α. The first algorithm presented is for n × n Latin squares.

Algorithm 2 PDC Uniform Latin Square Sampling algorithm
1: Let R1 = (1, . . . , n).
2: for i = 2, . . . , n − 3 do
3:   Generate Ri uniformly from the set of fixed–point free permutations starting with i.
4: end for
5: Let d denote the number of possible completions given (R1, . . . , Rn−3).
6: if U < d/(6^⌊(n−3)/3⌋ 2^⌊n/2⌋) then
7:   Select a completion uniformly at random from the d possible completions.
8: else
9:   Goto 2.
10: end if
11: Apply a random permutation to the rows of the matrix.
12: Apply a random permutation of {1, . . . , n} to the entries and return.
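As a concrete illustration, the sketch below (a hypothetical condensed re-implementation, not the paper's optimized C++ code) specializes Algorithm 2 to n = 5, where only R2 is sampled since n − 3 = 2. For transparency it uses the trivial bound d ≤ 56 (the total number of doubly reduced squares of order 5) in the acceptance step rather than the lemma-based bound; any valid upper bound on d keeps the sampler unbiased, and a sharper one only shrinks the expected number of restarts.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

using Row = std::vector<int>;
const int N5 = 5;

// All 120 permutations of {1,...,5}.
std::vector<Row> all_perms5() {
    Row p(N5);
    std::iota(p.begin(), p.end(), 1);
    std::vector<Row> out;
    do out.push_back(p); while (std::next_permutation(p.begin(), p.end()));
    return out;
}

// True if rows a and b agree in some column (violating the Latin conditions).
bool clashes(const Row& a, const Row& b) {
    for (int j = 0; j < N5; ++j) if (a[j] == b[j]) return true;
    return false;
}

// Algorithm 2 for n = 5: R1 = (1,...,5), R2 is the sampled "first half",
// and the brute-force-counted completions (R3, R4, R5) are the "second half".
std::vector<Row> pdc_latin5(std::mt19937& rng) {
    const std::vector<Row> perms = all_perms5();
    Row r1(N5);
    std::iota(r1.begin(), r1.end(), 1);
    std::vector<Row> fpf2;  // fixed-point-free permutations starting with 2
    for (const Row& p : perms)
        if (p[0] == 2 && !clashes(p, r1)) fpf2.push_back(p);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::uniform_int_distribution<size_t> pick2(0, fpf2.size() - 1);
    for (;;) {
        const Row& r2 = fpf2[pick2(rng)];
        // Count completions by brute force: row i starts with i, Latin conditions hold.
        std::vector<std::vector<Row>> comps;
        for (const Row& p3 : perms) {
            if (p3[0] != 3 || clashes(p3, r1) || clashes(p3, r2)) continue;
            for (const Row& p4 : perms) {
                if (p4[0] != 4 || clashes(p4, r1) || clashes(p4, r2) || clashes(p4, p3))
                    continue;
                for (const Row& p5 : perms)
                    if (p5[0] == 5 && !clashes(p5, r1) && !clashes(p5, r2) &&
                        !clashes(p5, p3) && !clashes(p5, p4))
                        comps.push_back({p3, p4, p5});
            }
        }
        if (unif(rng) < comps.size() / 56.0) {  // accept w.p. d/56 (any upper bound on d works)
            std::uniform_int_distribution<size_t> pickc(0, comps.size() - 1);
            std::vector<Row> sq = {r1, r2};
            for (const Row& r : comps[pickc(rng)]) sq.push_back(r);
            std::shuffle(sq.begin(), sq.end(), rng);   // random row permutation
            std::uniform_int_distribution<size_t> pickp(0, perms.size() - 1);
            const Row relabel = perms[pickp(rng)];     // random symbol permutation
            for (Row& row : sq)
                for (int& x : row) x = relabel[x - 1];
            return sq;
        }
    }
}
```

Accepting R2 with probability proportional to its completion count d is exactly the von Neumann step of Algorithm 1; the final row and symbol permutations then spread the doubly reduced square uniformly over all of LS5.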
This algorithm is simple, yet effective for uniform sampling of Latin squares for n ≤ 9. The expected number of times we must resample (R2, . . . , Rn−3) before we accept a sample is given by (see Theorem 4.8)

expected number of samples = (6^⌊(n−3)/3⌋ 2^⌊n/2⌋ (⌊n!/e⌋/(n − 1))^(n−4)) / (Ln/(n! (n − 1)!)) =: En, n ≥ 5,
where Ln denotes the number of Latin squares of order n. A table of values is below; the values for Ln were taken from the exact values on the Wikipedia page http://en.wikipedia.org/wiki/Latin_square.

n    Ln            En
5    1.6 × 10^5    4.3
6    8.1 × 10^8    63
7    6.14 × 10^13  2.1 × 10^3
8    1.08 × 10^20  6.3 × 10^5
9    5.52 × 10^27  4.6 × 10^8
10   9.98 × 10^36  3.3 × 10^12
These values indicate that n ≥ 10 is too large for Algorithm 2 to be useful on a modern personal computer. Also relevant for implementation purposes is the fact that for small n one may enumerate and store all fixed–point free permutations of {1, . . . , n}; after such a random access list is computed and stored, it is efficient to sample i.i.d. elements from it. We now describe a similar algorithm for Sudoku matrices. The notation P9′ below refers to the subset of all permutations of {1, 2, . . . , 9} which do not violate the column conditions of a Sudoku matrix based on the preceding rows R1, R2, R3. Also, by U we mean a uniform random number from the interval [0, 1], independent of all other random variables.

Algorithm 3 PDC Uniform Sudoku Sampling algorithm
1: Let B1 = (1, 2, . . . , 9).
2: Select B2, B3 in proportion to the number of completable Sudoku matrices.
3: Generate (R4, R5, R6, R7), i.i.d. permutations from P9′.
4: Let d denote the number of possible completions given (B1, B2, B3, R4, R5, R6, R7).
5: if U < d/16 then
6:   Select a completion uniformly at random from the d possible completions.
7: else
8:   Goto 3.
9: end if
10: Apply a random permutation to (C4, C5, C6).
11: Apply a random permutation to (C7, C8, C9).
12: Apply a random permutation to ((C4, C5, C6), (C7, C8, C9)).
13: Apply a random permutation of {1, 2, . . . , 9} to the entries and return.
Line 2 demands some further explanation. By considering various symmetries in the permutations of columns, it was shown in [Felgenhauer & Jarvis, 2006] that there are only 36288 essentially unique completions of blocks B2 and B3 (this number was later simplified to 71 after taking more symmetries into account); and, for each configuration of blocks B1, B2, B3, there is a certain number of completable Sudoku matrices, which was calculated via brute force. A downloadable file containing these enumerations is posted at http://www.afjarvis.staff.shef.ac.uk/sudoku/bertram.html; the first few lines look like the following:

[456789,789123,123456] => 108374976
[456789,789123,123465] => 102543168
[456789,789123,123546] => 102543168
[456789,789123,123564] => 100231616
[456789,789123,123645] => 100231616
...
Each list of three 6–tuples represents columns 4 through 9 of the first three rows, and the integer value after the arrow counts the number of completable Sudoku matrices given these first three rows. For example, the first line of the file says that the matrix with top three rows given by

1 2 3 4 5 6 7 8 9
4 5 6 7 8 9 1 2 3
7 8 9 1 2 3 4 5 6

has precisely 108374976 completions of the bottom 6 rows which are valid Sudoku matrices. Line 2 is thus a straightforward sampling of the entries in B2 and B3, in proportion to the number of completable Sudoku matrices. There are two reasonable ways to perform this sampling. Denote by xi the number of possible completions given outcome i, i = 1, . . . , 36288. One can normalize by the sum of all the completions and generate a value using the discrete probability distribution {xi / Σj xj}, i = 1, . . . , 36288; this is not so ideal in general due to floating point considerations. Instead, we apply PDC in our implementation by sampling uniformly from among the 36288 possibilities first, say we select outcome i, and we accept this sample with probability xi / maxj xj. The rest of the algorithm is straightforward.
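The accept/reject version of Line 2 is only a few lines of code; the sketch below is generic (in Algorithm 3 the weights x would be the 36288 completion counts loaded from the Felgenhauer–Jarvis file, not hard-coded values):

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Sample an index i with probability proportional to x[i], by a uniform draw
// plus a von Neumann acceptance step. No normalization by the sum of the x[i]
// is needed, which avoids the floating point concerns mentioned above.
size_t sample_proportional(const std::vector<long long>& x, std::mt19937& rng) {
    const long long xmax = *std::max_element(x.begin(), x.end());
    std::uniform_int_distribution<size_t> pick(0, x.size() - 1);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    for (;;) {
        size_t i = pick(rng);                           // uniform over all outcomes
        if (unif(rng) < double(x[i]) / xmax) return i;  // accept w.p. x_i / max_j x_j
    }
}
```

With the actual table, every xi lies between 94888576 and 108374976, so each uniform draw is accepted with probability at least 94888576/108374976 ≈ 0.8755, and about 1.1 draws are needed on average (see Theorem 4.7).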
4 Analysis of Algorithms
In this section we establish the rejection probabilities of Algorithm 2 and Algorithm 3, and calculate expected run–time costs. First, we justify the rejection probabilities.

Lemma 4.1. Assume the first seven rows of a Sudoku matrix have been filled in, and none of the Sudoku conditions are violated based on these first seven rows. Then the total number of possible completions of this matrix which also do not violate the Sudoku conditions is at most 16. This bound cannot be improved.
Proof. The proof is by elementary reasoning. WLOG, assume the last two rows of column j can be completed by {2j − 1, 2j} for j = 1, 2, 3, 4. We shall denote this by the following table:

1 3 5 7 9
2 4 6 8 a

It is obvious that we may interchange the 1 and the 2, the 3 and the 4, etc.; however, whichever element is paired with the 9, say 5, determines which row the 9 lies in. Every other pairing in further columns is also uniquely determined based on the orientation of the entries of the first four columns. Thus, there are at most 2^4 = 16 possible completions. The following matrix (in reduced form) was generated by Algorithm 3. The first seven rows of this matrix allow for 16 potential completions; thus, 16 is a tight upper bound.

1 2 3 4 5 9 6 7 8
4 5 6 7 8 2 3 1 9
7 8 9 1 6 3 2 5 4
9 4 8 2 1 7 5 6 3
5 7 2 3 4 6 8 9 1
6 3 1 5 9 8 4 2 7
3 6 7 9 2 4 1 8 5
8 9 5 6 3 1 7 4 2
2 1 4 8 7 5 9 3 6
We can continue this line of reasoning further, by sampling all but the final k rows.

Lemma 4.2. Assume the first 9 − k rows of a Sudoku matrix have been filled in, and none of the Sudoku conditions are violated. Then the total number of possible completions of this matrix which also do not violate the Sudoku conditions is at most s(k) := 2^4 · 6^3 · · · (k!)^⌊9/k⌋.

Algorithm 2 for Latin squares fixes the first column entry of the i–th row to be i, so the corresponding upper bound is slightly smaller.

Lemma 4.3. Assume the first n − k rows of a Latin square of order n have been filled in according to Algorithm 2, and none of the Latin square conditions are violated. Then the total number of possible completions of this matrix which also do not violate the Latin square conditions (and whose first coordinate is fixed) is at most ℓ(n, k) := 2^⌊n/2⌋ · 6^⌊n/3⌋ · · · (k!)^⌊(n−k)/k⌋.

These lemmas are of course trivial, though sufficient to state and prove the algorithms in this paper and still provide a more efficient means of sampling than hard rejection sampling. We strongly note, however, that should any improvement to these bounds be proved, one could then substitute it directly for s(k) or ℓ(n, k) in the denominator of the rejection probability.
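Reading the displayed product in Lemma 4.2 as s(k) = Π_{j=2..k} (j!)^⌊9/j⌋ (our interpretation of the pattern), the bounds used in the rejection steps reduce to one short function:

```cpp
#include <cassert>

// s(k) from Lemma 4.2, read as the product of (j!)^floor(9/j) for j = 2..k.
// s(2) = 2^4 = 16 recovers Lemma 4.1, and s(3) = 2^4 * 6^3 = 3456.
long long sudoku_completion_bound(int k) {
    long long s = 1;
    long long fact = 1;
    for (int j = 2; j <= k; ++j) {
        fact *= j;                                    // j!
        for (int e = 0; e < 9 / j; ++e) s *= fact;    // times (j!)^floor(9/j)
    }
    return s;
}
```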
To help facilitate such an endeavor, we give a motivating example when k = 3. The provable bound for Sudoku matrices is s(3) = 2^4 · 6^3 = 3456. However, the largest observed value in a sample of size 1000 was 288. If, in fact, 288 is the smallest upper bound, and were used instead of 3456, then the run–time of the algorithm which samples all but the final three rows would be reduced by a factor of 12. A major cost in this case, however, is that of finding the complete list of completable triplets of rows; it is this cost which makes our implementation of the case k = 2 currently faster.

Remark 4.4. For PDC algorithms in general, Hall's marriage theorem can establish a strictly positive acceptance probability, but our primary interest is an upper bound on the maximum possible number of completions. An enumerative form of Hall's marriage theorem, which provides non–trivial upper bounds on the total number of possible completions, not just the guarantee that at least one exists, would greatly add to the effectiveness of PDC as a general algorithm, as demonstrated in the example above.

We now justify Line 2 of Algorithm 3. In [Felgenhauer & Jarvis, 2006], the total number of Sudoku matrices is found by first reducing by symmetry the total number of possible completed first three rows; the number given is 36288. For each of these configurations, the authors of [Felgenhauer & Jarvis, 2006] use a brute force search for the number of possible Sudoku matrices that can be completed conditional on one of the possible first three rows.

Lemma 4.5 ([Felgenhauer & Jarvis, 2006]). The table of 36288 possible configurations of B2 and B3 contains, for each configuration, the number of possible Sudoku matrices that can be completed given the first three rows. These numbers all lie between 94888576 and 108374976.

The fact that these values are relatively constant has a number of quantitative and qualitative interpretations.
From a rejection sampling perspective, it means that we should reject each proposed set of first three rows in proportion to the number of completions, normalized by the maximum possible, i.e., 108374976. Thus, the probability of rejecting a sample is at worst 1 − 94888576/108374976 ≈ 0.1244. Practically, this means that sampling of the first three rows is efficient. Also, it says that each configuration is approximately as completable as any other configuration; i.e., accepting any completable configuration of the first three rows introduces only a small bias. Thus, the main cost of the algorithm is the rejection sampling of rows 4 through 7. Through a complete enumeration of all 36288 cases, the number of valid permutations is always between 12000 and 12096. There are thus at most 12096^4 ≈ 2 × 10^16 different combinations of permutations that can be placed in rows 4 through 7, but only about 10^8 valid completions are possible. This means that on average we must sample about 2 × 10^8 4–tuples before we get lucky and obtain a quadruplet that yields a completable Sudoku matrix. Even after we obtain a completable quadruplet, however, we must still survive the final rejection cost, which comes from the fact that different quadruplets yield different numbers of completable Sudoku matrices.

Lemma 4.6 ([Devroye, 1986]). Algorithms which utilize rejection sampling have a geometrically distributed number of generations before acceptance. Let f denote the target distribution, and g denote the sampling distribution for rejection sampling. The parameter of the geometric distribution which yields optimal rejection rates is given by

C := (max_i f(i)/g(i))^{−1}.

Theorem 4.7. Algorithm 3 samples uniformly over the set of all Sudoku matrices. The number of times to sample blocks B2 and B3, by choosing one of the 36288 possible choices uniformly at random and using rejection sampling, is geometrically distributed with expected value

36288 · 108374976 / 3546146300288 ≈ 1.1.    (1)

The number of times to sample rows R4, R5, R6, R7 is geometrically distributed with expected value at least 3.0 × 10^9 and at most 3.6 × 10^9.

Proof. The uniformity of the algorithm follows by two applications of probabilistic divide-and-conquer given by Algorithm 1. We now calculate the precise rejection probabilities. The first application of PDC is given by the division

A = (B1, B2, B3),    B = (R4, R5, R6, R7, R8, R9).
The rejection probabilities t(a) are given by the number of possible completions normalized by the maximum number of possible completions; these values are from Lemma 4.5. We index each of the possible configurations of B2 and B3 by i = 1, 2, . . . , 36288. Then we select one uniformly at random, hence g(i) = 1/36288, but the distribution desired is the one that weights each configuration in proportion to the number of possible completions, denoted by xi previously, which is contained in the table of values computed in [Felgenhauer & Jarvis, 2006]; that is,

f(i) = xi / 3546146300288, i = 1, . . . , 36288,

where 3546146300288 = Σj xj is the total number of possible completions over all elements in the table, which makes f(i) a probability distribution. The expected value of a geometric random variable is given by the inverse of its parameter, whence the expected number of times to sample from blocks B2 and B3 is given by

1/C = max_i f(i)/g(i) = 36288 · (max_i xi) / 3546146300288,

which is given by Equation (1). The second application of PDC is the division

A = (R4, R5, R6, R7),    B = (R8, R9).
The acceptance probability t(a) is given by the number of possible completions (calculated via brute force) given a = (r4, . . . , r7), normalized by 16, the optimal upper bound provided by Lemma 4.1. Rather than sample from the set of all permutations of {1, . . . , 9}, since we have already accepted the first three rows of the matrix, we can automatically discard any permutations which would violate the Sudoku conditions. Thus, we sample these four rows from the set P9′, which depends on the particular configuration of B2 and B3 accepted in the first part. By brute force calculation over all 36288 possible first three rows, we have 12000 ≤ |P9′| ≤ 12096. Thus,

1/12096^4 ≤ g(j) = 1/|P9′|^4 ≤ 1/12000^4, for all j.

Similarly, we have

16/108374976 ≤ f(j) = (# completions given first seven rows)/(# completions given first three rows) ≤ 16/94888576.

Whence,

3.0 × 10^9 ≈ 12000^4 × 16/108374976 ≤ 1/C = max_j f(j)/g(j) ≤ 12096^4 × 16/94888576 ≈ 3.6 × 10^9.
The random sampling of the first three rows in Algorithm 3 is efficient, and thus we confidently suggest the rejection scheme in lieu of forming the discrete distribution normalized by the sum of all entries, in order to avoid possible floating point errors. The rejection sampling and calculation costs of the next four rows, however, are the main cost of the algorithm. For Latin squares, our algorithm does not take into account any partial information.

Theorem 4.8. Algorithm 2 samples uniformly over the set of all Latin squares of order n. The number of times to sample rows R2, . . . , Rn−3 is geometrically distributed with expected value

En = (6^⌊(n−3)/3⌋ 2^⌊n/2⌋ (⌊n!/e⌋/(n − 1))^(n−4)) / (Ln/(n! (n − 1)!)), n ≥ 5.

Proof. We note that Ln is normalized by n!(n − 1)! since we are taking the first row and first column to be the permutation (1, . . . , n). Each of the rows Ri, i = 2, . . . , n − 3, is sampled from the set of all fixed-point-free permutations with first entry i; by symmetry, there are ⌊n!/e⌋/(n − 1) such permutations, and our sample space is the set of all (n − 4)–tuples. The proof proceeds in the same fashion as the proof of Theorem 4.7, using Lemma 4.3 and Lemma 4.6.
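Lemma 4.6 is easy to check numerically on a toy pair (f, g) of our own choosing: take f proportional to (1, 2, 3) on three points and g uniform, so that C^{−1} = max_i f(i)/g(i) = 3/2 and the average number of draws per accepted sample should be 1.5:

```cpp
#include <cassert>
#include <random>

// One rejection-sampling run for target f proportional to {1, 2, 3} with
// uniform proposal g; returns the number of draws of g until acceptance,
// which by Lemma 4.6 is geometric with mean 1/C = max_i f(i)/g(i) = 1.5.
int draws_until_accept(std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(0, 2);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const double w[3] = {1.0, 2.0, 3.0};             // f(i) proportional to w[i]
    int draws = 0;
    for (;;) {
        ++draws;
        int i = pick(rng);
        if (unif(rng) < w[i] / 3.0) return draws;    // accept w.p. f(i)/(C^{-1} g(i))
    }
}
```

Averaging over many runs recovers the mean 1.5, the same bookkeeping that yields the bounds 3.0 × 10^9 and 3.6 × 10^9 in Theorem 4.7.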
5 Other Approaches

5.1 Alternative PDC Parameterizations
There are two simple alternative approaches to Algorithm 3 that require little modification of the original algorithm. The first approach is to randomly sample R4 through R8, and, conditional on the existence of a completion to R9, we may simply accept the sample R4 through R8 and fill in the unique completion to R9; this variation is called PDC deterministic second half [Arratia & DeSalvo, 2011], see also [DeSalvo, 2014]. The expected number of times we must resample R4 through R8 is easily seen to be within the interval

[12000^5/108374976, 12096^5/94888576] ≈ [2.3, 2.73] × 10^12,

which is substantially more costly than the almost deterministic second half approach utilized in Algorithm 3. The only advantage this approach offers is that one does not need to determine the number of possible completions given the first seven rows; once the first eight rows are determined, the last row is uniquely determined, and the probability of acceptance is normalized to 1.

Alternatively, we can instead randomly sample R4 through R6, and reject these three rows in proportion to the number of possible completions of the last three rows. Absent any non-trivial information on the number of possible completions, we must select a universal upper bound over all possibilities, as provided by Lemma 4.2 using k = 3. The expected number of times we must resample R4 through R6 is then within the interval

[2^4 6^3 12000^3/108374976, 2^4 6^3 12096^3/94888576] ≈ [0.92, 1.1] × 10^7.

The reason why we cannot champion this approach presently is that our implementation for k = 2 in C++ is optimized to enumerate all possible completions using fast bit–wise operations, whereas in the more general case our brute force code does not have this option encoded. We emphasize that while we were unable to efficiently code this algorithm, should such a module be coded efficiently it could very easily beat Algorithm 3.

Algorithm 2 can be generalized so that rows R2, . . . , Rn−k are sampled in the first step of PDC. In this case, the rejection probability in Line 6 is replaced with d/ℓ(n, k), and the number of times to resample has expected value

ℓ(n, k) (⌊n!/e⌋/(n − 1))^(n−k−1) / (Ln/(n! (n − 1)!)), n ≥ 5, 1 ≤ k ≤ n − 4.
5.2 Alternative Approaches
When various symmetries are taken into account, one can reduce the total number of Sudoku matrices to 5472730538 ≈ 5.4 × 10^9 essentially different Sudoku matrices [Russell & Jarvis, 2006]. This number is certainly more practical, and a comprehensive list of such matrices could be stored for random access if the memory were available, thus offering an O(1) algorithm, assuming the transformations also take O(1) to implement. The advantage of our approach is that its memory requirements are entirely practical, even for a computer of modest means. Nevertheless, should Algorithm 3 be deemed not efficient enough, a more efficient random sampling algorithm may be achievable by taking these symmetries into account.
A recent approach to the random sampling of Latin squares and Sudoku matrices is via the random sampling of S-permutation matrices [Dahl, 2009]; see also [Fontana, 2011, Fontana, 2013, Yordzhev, 2013a, Yordzhev, 2012]. While this procedure generates i.i.d. uniform samples, the acceptance probability for Sudoku matrices was shown in [Yordzhev, 2013b] to be on the order of 10^{−14}. For Latin squares of order n, the procedure was implemented in [Fontana, 2013] and shown to be an effective method of uniform sampling for n ≤ 7.

The fact that the number of completions of partially completed Sudoku matrices is relatively constant indicates that we introduce only a small bias by accepting "anything that works." Thus, by skipping the rejection step, we introduce a small but quantifiable bias into the sample; we can counter this bias by attaching a weight to each matrix, as is done in importance sampling, to obtain unbiased estimators. While this does not represent an i.i.d. sample, the weights allow for unbiased parameter estimation; see for example [McKay et al., 1979, Ridder, 2013].

Finally, one may ask about generalizing the techniques of this paper to generalized n^2 × n^2 Sudoku grids. The efficiency of Algorithm 3 depends on having access to the table of values which determines the number of possible completions given the first three rows. If such partial information is known, then it may be possible to design an efficient, practical PDC algorithm which splits up the random sampling steps in stages in a way that makes the rejection steps more feasible. At present, we are not aware of any such known partial information, and thus a PDC algorithm analogous to the Latin square algorithm would be our default approach.
6 Final Remarks
Our motivation for these algorithms is to verify a claim from [Newton & DeSalvo, 2010], in which the authors calculated and compared the Shannon entropy of a random sample of Sudoku matrices and Latin squares of order 9 using a backtracking algorithm. That algorithm generates one row at a time, filling in entries from left to right by selecting one of the possible choices uniformly at random. If during this process a row cannot be completed, then that row and the entire previous row are deleted, and the procedure continues. This algorithm is surprisingly fast, though not necessarily unbiased, and allowed the authors to generate a sample of size 10^8 that was used in the estimation of Shannon's entropy. For Sudoku matrices, the entropy was estimated as 1.73312 ± 0.000173, and for Latin squares of order 9 the entropy was estimated as 1.73544 ± 0.0001735. Assuming the bias in the backtracking algorithm is relatively small, this calculation suggests that a statistical error of at most 10^{−3} is required in order to definitively distinguish a difference in entropies between the set of Sudoku matrices and the set of Latin squares of order 9, and so we should expect to need an unbiased sample of size at least 10^6 to effectively distinguish between the entropies of these two sets of matrices. While Algorithm 3 and Algorithm 2 are provably unbiased, they each took approximately
24 hours to generate a sample of size 1000 on a personal computer (1.8 GHz Intel Core i7). Thus, at least for personal computing, a sample of size 10^6 is impractical using our approach. Using this i.i.d. sample of size 1000, an unbiased estimate for the entropy of Sudoku matrices is 1.73356 ± 0.0383708, and for Latin squares of order 9 our estimate for the entropy is 1.73335 ± 0.0389771. These estimates are consistent with the backtracking algorithm, although we are unable to definitively resolve the question as to whether Sudoku matrices or Latin squares of order 9 have the smaller entropy. The importance sampling approach in [Ridder, 2013] is the most promising method for verifying the entropy calculations. We have supplied the C++ source code used to generate the samples and posted it at https://github.com/stephendesalvo, along with the files containing the random samples of size 1000, and scripts to load the matrices into an n × n × 1000 array in Matlab and a 1000 × n × n array in Mathematica.
7 Acknowledgements
The author would like to acknowledge helpful conversations with James Zhao and Edo Liberty.
References

[Arratia & DeSalvo, 2011] Arratia, R. & DeSalvo, S. (2011). Probabilistic divide-and-conquer: a new exact simulation method, with integer partitions as an example. arXiv preprint arXiv:1110.3856.

[Dahl, 2009] Dahl, G. (2009). Permutation matrices related to Sudoku. Linear Algebra and its Applications, 430(8), 2457–2463.

[DeSalvo, 2014] DeSalvo, S. (2014). Probabilistic divide-and-conquer: deterministic second half. arXiv preprint arXiv:1411.6698.

[Devroye, 1986] Devroye, L. (1986). Nonuniform Random Variate Generation. Springer-Verlag, New York.

[Duchon et al., 2004] Duchon, P., Flajolet, P., Louchard, G., & Schaeffer, G. (2004). Boltzmann samplers for the random generation of combinatorial structures. Combin. Probab. Comput., 13(4-5), 577–625.

[Felgenhauer & Jarvis, 2006] Felgenhauer, B. & Jarvis, F. (2006). Mathematics of Sudoku I. Mathematical Spectrum, 39(1), 15–22.
[Fontana, 2011] Fontana, R. (2011). Fractions of permutations. An application to Sudoku. Journal of Statistical Planning and Inference, 141(12), 3697–3704.

[Fontana, 2013] Fontana, R. (2013). Random Latin squares and Sudoku designs generation. arXiv preprint arXiv:1305.3697.

[Fontana et al., 2012] Fontana, R., Rapallo, F., & Rogantin, M. P. (2012). Markov bases for Sudoku grids. In Advanced Statistical Methods for the Analysis of Large Data-sets (pp. 305–315). Springer.

[Jacobson & Matthews, 1996] Jacobson, M. T. & Matthews, P. (1996). Generating uniformly distributed random Latin squares. Journal of Combinatorial Designs, 4(6), 405–437.

[Jerrum & Sinclair, 1996] Jerrum, M. & Sinclair, A. (1996). The Markov chain Monte Carlo method: an approach to approximate counting and integration. In Approximation Algorithms for NP-hard Problems (pp. 482–520).

[McKay et al., 1979] McKay, M. D., Beckman, R. J., & Conover, W. J. (1979). Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2), 239–245.

[Newton & DeSalvo, 2010] Newton, P. K. & DeSalvo, S. A. (2010). The Shannon entropy of Sudoku matrices. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Science, rspa20090522.

[Propp & Wilson, 1996] Propp, J. G. & Wilson, D. B. (1996). Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms, 9(1-2), 223–252.

[Ridder, 2013] Ridder, A. (2013). Counting the number of Sudokus by importance sampling simulation.

[Russell & Jarvis, 2006] Russell, E. & Jarvis, F. (2006). Mathematics of Sudoku II. Mathematical Spectrum, 39(2), 54–58.

[Von Neumann, 1951] Von Neumann, J. (1951). Various techniques used in connection with random digits. Applied Math Series, 12(36-38), 1.

[Yordzhev, 2012] Yordzhev, K. (2012). Bipartite graphs related to mutually disjoint S-permutation matrices. ISRN Discrete Mathematics, 2012.

[Yordzhev, 2013a] Yordzhev, K. (2013a). On the number of disjoint pairs of S-permutation matrices. Discrete Applied Mathematics, 161(18), 3072–3079.

[Yordzhev, 2013b] Yordzhev, K. (2013b). Random permutations, random Sudoku matrices and randomized algorithms. arXiv preprint arXiv:1312.0192.