Using Local Graph Topology to Model Hard Binary Constraint Satisfaction Problems
Michael J. Dent and Robert E. Mercer
Dept. of Computer Science, The University of Western Ontario, London, Ont., Canada N6A 5B7
e-mail: [email protected], [email protected]
Abstract
Smith (1994), Smith & Dyer (1995), and Prosser (1994a, 1994b) have recently shown how to model hard binary Constraint Satisfaction Problems (CSPs). Their model parameterizes CSPs with a global constraint tightness. This paper presents a new model which parameterizes CSPs with local constraint tightness parameters. The new model combines Smith's predictor of the transition point, $\hat{p}_{2crit}$, with local graph topology. We show empirically that the new model produces problems that are on average as hard or harder to search than those of the old model, produces problems with a wide range of local constraint tightness, and produces problems that are closer to the peak of the phase transition. We show for sparse constraint graphs that the new model produces problems much closer to the phase transition than the old model. Our results indicate that the underlying structure of a problem needs to be included in models of hard problems.
1 Introduction
Many problems in Artificial Intelligence and Operations Research can be expressed as Constraint Satisfaction Problems (CSPs) [Mackworth, 1992; Tsang, 1993]. A CSP is represented by a set of variables, a set of finite discrete domains for those variables, and a set of constraints over those variables. A solution to a CSP is an assignment of values to variables which satisfies all the given constraints. Although CSPs are NP-complete, worst case analysis has little to say about the actual runtime performance of backtracking CSP algorithms on any particular instance. As an average case analysis of these algorithms is extremely difficult, empirical studies must be performed to compare algorithms [Dechter and Meiri, 1994]. Many studies have used simple problems such as the n-queens problem [Haralick and Elliot, 1980; Nadel, 1989] or the zebra problem [Prosser, 1993] for comparison, or more realistic problems such as crossword puzzles [Ginsberg et al., 1990] or graph isomorphism [McGregor, 1979]. However, the results of these studies are biased toward one particular problem, leaving one to doubt whether they apply to other CSP domains.
To overcome this bias, many papers have attempted to classify the performance of CSP algorithms on randomly generated binary CSPs [Gaschnig, 1979; Haralick and Elliot, 1980; Dechter, 1990; Benson and Freuder, 1992; Dechter and Meiri, 1994; van Run, 1994]. In a binary CSP all constraints have arity 2. A binary CSP can be interpreted as a graph in which the nodes represent the variables, the edges represent the constraints, and the contents of the nodes are the domains. Random binary CSPs are parameterized by the 4-tuple $\langle n, m, p_1, p_2 \rangle$ where $n$ is the number of variables, $m$ is the (global) domain size for each variable, $p_1$ is the probability that a constraint exists between two variables (also called the constraint density), and $p_2$ is the global conditional probability that a pair of values between the two variables is inconsistent (also called the global constraint tightness). These papers may use a slightly different parameterization for these four parameters, but there is no overall difference from the above parameterization [Prosser, 1994b].
In general, unless some heuristics are used, random binary CSPs created with the above four parameters are unsatisfactory for comparing algorithms, because most random problem instances are easy to prove satisfiable or unsatisfiable. It has recently been shown, empirically [Cheeseman et al., 1991; Mitchell et al., 1992] and theoretically [Williams and Hogg, 1992; 1993], that for NP-complete and NP-hard search problems a phase transition occurs as one of the order parameters that describes gross features of the problem is varied. This phase transition occurs when problems go from being under-constrained (all problems are solvable) to over-constrained (all problems are unsolvable). The transition point is defined as the point in the phase transition where exactly half of the problems are solvable.
It was observed that problems at the transition point have the highest average search cost. For finite problems, the phase transition occurs over a range of values of the order parameter. In the limit, the phase transition becomes instantaneous and the transition point and phase transition become synonymous. Smith (1994), Smith & Dyer (1995), and Prosser (1994a, 1994b) have shown how to generate random binary CSPs that are in this phase transition using the above CSP parameterization as the "order" parameters. Smith observed that the phase transition for random binary CSPs occurs near the point where the expected number of solutions is one [Smith, 1994; Prosser, 1994a]. Using this observation, Smith predicted the location of the phase transition with a formula for the expected number of solutions, similar in form to one in Haralick & Elliot (1980). Smith showed empirically that random problems generated in the phase transition are modeled by the formula, with the exception of CSPs that have sparse graphs. These random problems are called hard problems as they are appreciably harder to search than problems that are generated outside of the phase transition. Both Smith and Prosser propose that Smith's model of hard binary CSPs be used to compare CSP algorithms empirically. This has already been done by many authors¹ [Dechter and Meiri, 1994; Dent and Mercer, 1994; Frost and Dechter, 1994a; 1994b; Prosser, 1995].
We believe that the underlying structure of a problem needs to be included in models of hard problems. Smith & Dyer (1995) also conclude that local graph topology needs to be included in order to better model the transition point. With this in mind we generalize the above CSP parameterization to allow for local constraint tightness values. We show how to model hard binary CSPs using the new generalization. The new model uses local graph topology to set the local constraint tightness values. We show empirically that the new model produces problems that are of similar hardness or harder than those generated with Smith's model, produces problems with a wide range of local constraint tightness, produces problems that are closer to, and sometimes above, the peak of the phase transition, and produces problems that are closer to having one solution. We also show for sparse graphs that the new model produces problems much closer to the phase transition.
2 Hard Random Binary CSPs
A binary CSP is represented with a set of variables $V = \{v_1, \ldots, v_n\}$, a set of finite discrete domains $D = \{d_1, \ldots, d_n\}$ with $m_i = |d_i|$, and a set of reflexive constraints $C = \{c_{i,j} \mid 1 \le i < j \le n\}$. Each $c_{i,j}$ is the subset of $d_i \times d_j$ that is regarded as the set of acceptable pairs. We first discuss Smith's (1994) model of hard random binary CSPs, which uses a global tightness parameter, and then present the new model, which uses a set of local tightness parameters.
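The representation just described can be made concrete with a small sketch. This is only an illustration under stated assumptions, not code from the paper: the function names `is_solution` and `count_solutions` are hypothetical, variables are indexed from 0 rather than 1, and each constraint is stored as the set of acceptable value pairs for a variable pair with i < j.

```python
from itertools import product


def is_solution(assignment, constraints):
    """Return True if every constrained pair takes an acceptable value pair.

    assignment:  sequence with one value per variable (0-based index).
    constraints: dict mapping (i, j), i < j, to a set of acceptable (a, b) pairs.
    """
    return all(
        (assignment[i], assignment[j]) in allowed
        for (i, j), allowed in constraints.items()
    )


def count_solutions(domains, constraints):
    """Count solutions by brute-force enumeration (only viable for tiny CSPs)."""
    return sum(1 for a in product(*domains) if is_solution(a, constraints))


# Tiny example: three 0/1 variables with v0 != v1 and v1 != v2.
domains = [[0, 1], [0, 1], [0, 1]]
not_equal = {(0, 1), (1, 0)}
constraints = {(0, 1): not_equal, (1, 2): not_equal}
```

Brute-force enumeration visits every tuple in the product of the domains, so it is useful only as a reference check on very small instances.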
2.1 The Global Constraint Tightness Model
Smith's model of hard random binary CSPs is parameterized by a 4-tuple $\langle n, m, p_1, p_2 \rangle$ where $n$ is the number of variables, $m$ is the global domain size (every domain has the same size), $p_1$ is the probability that a constraint exists between two variables (also called the constraint density), and $p_2$ is the global conditional probability that any pair of values between the two variables is inconsistent (also called the global constraint tightness).
Smith reasoned that problems which have an expected number of solutions less than one are over-constrained and are therefore more likely to be unsatisfiable and, conversely, problems with an expected number of solutions greater than one are under-constrained and are therefore more likely to be satisfiable. Smith conjectured that the phase transition for random binary CSPs occurs near the region where the expected number of solutions for a problem is one. Equation (1), derived from a similar formula in Haralick & Elliot (1980), can be used to calculate the expected number of solutions of a random CSP.
$$E(\mathit{Soln}) = m^n (1 - p_2)^{p_1 n(n-1)/2} \qquad (1)$$
¹ These authors found the crossover point independently of Smith, except for Dent & Mercer (1994), which used Smith's result.
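Equation (1) can be evaluated directly. The following is a minimal sketch, assuming the formula as stated above; the function name `expected_solutions` is hypothetical and chosen for illustration only.

```python
def expected_solutions(n, m, p1, p2):
    """Expected number of solutions of a random binary CSP, per Equation (1).

    m ** n is the number of possible tuples; p1 * n * (n - 1) / 2 is the
    expected number of constraints; (1 - p2) is the probability that a
    single constraint accepts a given pair of values.
    """
    num_constraints = p1 * n * (n - 1) / 2
    return m ** n * (1 - p2) ** num_constraints
```

For example, with two variables, a domain of size 3, a certain constraint (p1 = 1) and tightness p2 = 0.5, the 9 possible tuples are each accepted with probability 0.5, giving an expected 4.5 solutions.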
The expected number of solutions is the number of possible tuples in a CSP multiplied by the probability of satisfying all the constraints. The parameter $p_2$ can be used to vary how constrained a problem is. In Smith's papers, the value of $p_2$ for which the hardest problems occur (the transition point) is called $p_{2crit}$. Smith conjectured that an estimator of $p_{2crit}$ (denoted $\hat{p}_{2crit}$) can be derived from equation (1) using $E(\mathit{Soln}) = 1$.
$$\hat{p}_{2crit} = 1 - m^{-2/((n-1) p_1)} \qquad (2)$$
Smith (1994), Smith & Dyer (1995), and Prosser (1994a, 1994b) show empirically that $\hat{p}_{2crit}$ is a good estimator of the transition point $p_{2crit}$, especially as $n$ grows larger. The one exception is for sparse graphs, i.e., those CSPs with small values of $p_1$.
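The estimator in Equation (2) and the global tightness model can be sketched together as follows. This is an illustrative sketch, not the generator used in the cited studies: the names `p2crit_estimate` and `random_csp` are hypothetical, and the generator assumes that each variable pair is constrained independently with probability p1 and each value pair is made inconsistent independently with probability p2, which is one straightforward reading of the parameter definitions.

```python
import random


def p2crit_estimate(n, m, p1):
    """Estimate of the critical tightness from Equation (2).

    Obtained by setting E(Soln) = 1 in Equation (1) and solving for p2.
    Requires n > 1 and p1 > 0.
    """
    return 1 - m ** (-2 / ((n - 1) * p1))


def random_csp(n, m, p1, p2, seed=None):
    """Generate constraints for a random binary CSP <n, m, p1, p2>.

    Returns a dict mapping (i, j), i < j (0-based), to the set of
    acceptable value pairs. Assumption: independent sampling per
    variable pair (p1) and per value pair (p2).
    """
    rng = random.Random(seed)
    constraints = {}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p1:  # a constraint exists between v_i and v_j
                constraints[(i, j)] = {
                    (a, b)
                    for a in range(m)
                    for b in range(m)
                    if rng.random() >= p2  # pair is consistent with prob. 1 - p2
                }
    return constraints


# Example: generate an instance near the predicted transition point.
n, m, p1 = 20, 10, 0.5
instance = random_csp(n, m, p1, p2crit_estimate(n, m, p1), seed=0)
```

Substituting the estimate back into Equation (1) gives an expected number of solutions of one, which is a quick sanity check on the algebra.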
2.2 The Local Constraint Tightness Model
A model of hard random binary CSPs using local constraint tightness is parameterized by a 4-tuple $\langle n, m, p_1, P \rangle$ where $n$ is the number of variables, $m$ is the global domain size, $p_1$ is the probability that a constraint exists between two variables, and $P$ is a set $\{p_2^{i,j} \mid 1 \le i < j \le n\}$ of local constraint tightness probabilities. That is, $p_2^{i,j}$ is the local conditional probability that any pair of values between variables $v_i$ and $v_j$ is inconsistent. The expected number of solutions of a random binary CSP using the new parameterization is:
E(Soln) = mn
Y
1i