Phase Transition in Finding Multiple Solutions in Constraint Satisfaction Problems

Alvin C. M. Kwan, Edward P. K. Tsang, James E. Borrett
email address: {alvin, edward, jborrett}@essex.ac.uk
Department of Computer Science, University of Essex
Wivenhoe Park, Colchester, Essex CO4 3SQ, United Kingdom

Abstract

The study of phase transitions in constraint satisfaction problems has attracted much attention recently. Most of the research so far attempts to locate the phase transition for constraint satisfaction problems in which a single solution is required. Smith notes that the phase transition is only an interesting event if just one solution is required. Our study reveals that a phase transition also occurs when more than one solution to a CSP is required, and our findings indicate that its location is a function of the number of solutions required. These results are obtained by generalising previous work by Williams and Hogg, and by Smith.

1 Introduction

Since the publication of the paper by Cheeseman et al. on locating hard instances of NP-complete problems [Cheeseman et. al. 91], much research has studied the same issue for satisfiability problems [Mitchell et. al. 92] [Gent & Walsh 94] and constraint satisfaction problems (CSPs) [Williams & Hogg 94] [Prosser 94] [Smith 94]. This research indicates that problems far from the phase transition are easy to solve or to prove insoluble, whereas problems near the phase transition are typically hard to solve. That work focused on finding the first solution of a CSP; in this paper we focus on the phase transition in CSPs for which multiple solutions are required. Empirical results described in [Williams & Hogg 94] support the theory that the average and median search effort for solving CSPs is maximal when the problems are near the boundary separating the soluble and insoluble regions. This means that, when only the first solution is required, the phase transition in search cost corresponds to the phase transition in problem solubility. Work by Smith and Prosser supports this theory with empirical results on randomly generated binary CSPs, the main difference being that they use median search effort only [Smith 94] [Prosser 94].

Smith notes that the phase transition is only an interesting event if just one solution is required, having found that the phenomenon does not occur when all solutions of a problem are to be found [Smith 94]. However, we have found that a phase transition in search cost does occur when multiple solutions of a CSP are needed. More importantly, it is the number of solutions required that determines where the phase transition occurs.¹ Furthermore, in our experiments, some phenomena known from phase transitions for "find-one-solution" problems were also observed in phase transitions for problems that need multiple solutions.

We believe that our work is useful in practice. For example, some applications need a good but not necessarily optimal solution, perhaps because the evaluation criteria cannot easily be expressed in the problem solver. For such applications, a fixed number of solutions could be generated for subsequent evaluation by domain experts. If a problem of this kind is predicted to lie near the phase transition, end users may find it too time-consuming to let problem solving proceed that way; they may decide to reduce the number of solutions to be found, or to relax some of the problem's constraints, for the sake of efficiency.

In the next section, we suggest a generalisation of an equation from [Williams & Hogg 94] to show that the search cost of finding i solutions of a problem is maximal when that problem has exactly i solutions. By modifying Smith's phase transition predictor accordingly [Smith 94], we obtain a generalised phase transition predictor that takes the number of solutions required as a parameter. Experiments on randomly generated binary CSPs were conducted to evaluate the accuracy of the generalised predictor; the experimental design and results are described in Section 3 and Section 4 respectively. Conclusions are given in the final section.

¹ In this paper, "phase transition" refers to phase transitions in solubility (for finding the first solution), in solution adequacy (for finding multiple solutions) and in search cost. The meaning should be clear from the context.

2 Phase Transitions

In [Cheeseman et. al. 91], a phase transition is the change of state in which all problems turn from soluble to insoluble. The same paper also uses "phase transition in search cost" for the occurrence of a peak in the average computational cost, and the important point is that both phase transitions occur in the same place. Since we are interested in finding a fixed number i of solutions for CSPs, in our context the phase transition is the change from a state in which all problems have i or more solutions to a state in which they do not. We first review the terminology and equations of [Williams & Hogg 94] and then adapt them so that the number of solutions required is taken into account. Table 1, a modified form of Table 1 in [Williams & Hogg 94], defines the notation used in this paper.

Notation     Interpretation
S            set of minimised nogoods for the problem
N_soln(S)    number of solutions
C(S)         cost to find all solutions
C_1(S)       cost to find a solution or determine there are none
C_p1(S)      estimated cost for C_1(S)
C_i(S)       cost to find the first i solutions or determine there are fewer than i
C_pi(S)      estimated cost for C_i(S)

Table 1. Global measures associated with a CSP

Assuming that the solutions are distributed evenly across the search space, the cost of finding a solution is roughly the overall cost of finding all solutions divided by the number of solutions. Thus the estimated cost of finding the first solution, $C_{p1}(S)$, is given by (Equation (2) in [Williams & Hogg 94])

$$C_{p1}(S) = \begin{cases} C(S)/N_{soln}(S) & \text{if } N_{soln}(S) > 0, \\ C(S) & \text{if } N_{soln}(S) = 0, \end{cases} \tag{1}$$

which gives the approximate cost of finding the first solution if one exists, or else of exhausting the search space. Similarly, the estimated cost of finding the first i solutions of a problem is given by

$$C_{pi}(S) = \begin{cases} i \times C(S)/N_{soln}(S) & \text{if } N_{soln}(S) > i-1, \\ C(S) & \text{if } N_{soln}(S) \le i-1, \end{cases}$$

that is,

$$C_{pi}(S) = \min\left(C(S),\ \frac{i \times C(S)}{\max(1, N_{soln}(S))}\right).$$

When $N_{soln}(S) > i-1$, $\ln C_{pi}(S) = \ln i + \ln C(S) - \ln N_{soln}(S)$, so for a given $C(S)$ the estimated cost rises as the number of solutions decreases; once the problem has fewer than i solutions, the whole search space must be explored and $C_{pi}(S) = C(S)$.

Let $\langle A \rangle$ denote the average value of the numbers in A. The average value of the estimated cost $C_{pi}$ is approximately

$$\langle C_{pi}(S) \rangle \approx \min\left(\langle C(S) \rangle,\ \frac{i \times \langle C(S) \rangle}{\max(1, \langle N_{soln}(S) \rangle)}\right).$$

Provided that, as the constraints become tighter, $\langle N_{soln}(S) \rangle$ falls faster than $\langle C(S) \rangle$ and $\langle C(S) \rangle$ itself does not increase, this estimate is largest where $\langle N_{soln}(S) \rangle = i$, i.e. where problems have, on average, exactly i solutions.
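The estimate $C_{pi}(S)$ can be written either piecewise (i × C(S)/N_soln(S) when N_soln(S) > i − 1, otherwise C(S)) or as $\min(C(S), i \times C(S)/\max(1, N_{soln}(S)))$. The short Python sketch below implements both forms so that they can be compared over a range of solution counts. The function names are ours, and the cost passed in is an arbitrary placeholder rather than a measured quantity.

```python
def estimated_cost_piecewise(total_cost: float, n_soln: int, i: int) -> float:
    """C_pi(S) in piecewise form: i*C(S)/N_soln(S) if N_soln(S) > i-1, else C(S)."""
    if n_soln > i - 1:
        # Enough solutions: if they are spread evenly, the first i are
        # reached after roughly i/N_soln of the full search effort.
        return i * total_cost / n_soln
    # Fewer than i solutions: the whole search space must be explored.
    return total_cost


def estimated_cost_min(total_cost: float, n_soln: int, i: int) -> float:
    """Equivalent closed form: min(C(S), i*C(S) / max(1, N_soln(S)))."""
    return min(total_cost, i * total_cost / max(1, n_soln))


if __name__ == "__main__":
    C = 1000.0  # arbitrary placeholder for C(S), not a measured cost
    for n_soln in (0, 1, 100, 499, 500, 1000, 5000):
        print(n_soln, estimated_cost_min(C, n_soln, i=500))
```

With i = 1 both functions reduce to Equation (1).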

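For a randomly generated binary CSP with n variables, a domain size of m, constraint density p1 and probability p2 that a pair of values is permitted by a constraint, the expected number of solutions is

$$\langle N_{soln}(S) \rangle = m^n\, p_2^{\,p_1 n(n-1)/2}, \tag{5}$$

which appears in [Smith 94]. Thus, setting $\langle N_{soln}(S) \rangle = i$, the value of p2 at which the phase transition in cost takes place when finding i solutions, denoted $p_{2,\mathrm{hard}_i}$, is, for fixed values of n, m and p1,

$$p_{2,\mathrm{hard}_i} = \left(\frac{i}{m^n}\right)^{2/(p_1 n(n-1))}. \tag{6}$$

Note that the p2 used in [Smith 94] and [Prosser 94] is equivalent to our 1 − p2.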
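Equations (5) and (6) can be evaluated directly. The Python sketch below (function names ours) computes the expected number of solutions $m^n p_2^{p_1 n(n-1)/2}$ and the predicted $p_{2,\mathrm{hard}_i} = (i/m^n)^{2/(p_1 n(n-1))}$, with p2 taken as the probability that a pair of values is permitted. Where $p_1 n(n-1)/2$ is a whole number (for example p1 = 0.2 or 1.0 with n = 10, or p1 = 0.1, 0.5 or 1.0 with n = 20) the results round to the corresponding Table 2 entries; the other n = 10 columns of Table 2 use rounded p1 values (footnotes 3 and 6), so this sketch is not guaranteed to reproduce them exactly.

```python
import math


def expected_solutions(n: int, m: int, p1: float, p2: float) -> float:
    """Equation (5): expected number of solutions of a random binary CSP.

    p1 is the constraint density and p2 the probability that a pair of
    values is *permitted* (the complement of Smith's and Prosser's p2).
    """
    n_constraints = p1 * n * (n - 1) / 2
    return m ** n * p2 ** n_constraints


def p2_hard(n: int, m: int, p1: float, i: int) -> float:
    """Equation (6): the p2 at which the expected number of solutions is i."""
    return (i / m ** n) ** (2 / (p1 * n * (n - 1)))


if __name__ == "__main__":
    for n, p1 in [(10, 0.2), (10, 1.0), (20, 0.1), (20, 0.5), (20, 1.0)]:
        row = [round(p2_hard(n, 10, p1, i), 2) for i in (1, 50, 500)]
        print(f"n={n} p1={p1}: p2_hard for i = 1, 50, 500 -> {row}")
```

Because (6) is obtained by inverting (5), `expected_solutions(n, m, p1, p2_hard(n, m, p1, i))` returns i up to floating-point error, which is a convenient sanity check.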

3 Experimental Design

To evaluate the accuracy of Equation (4) in predicting the phase transition in cost, we conducted experiments on a number of randomly generated binary constraint satisfaction problems using a range of algorithm-heuristic combinations. The algorithms were backjumping (BJ) [Gaschnig 78], backmarking (BM) [Gaschnig 77], forward checking (FC) [Haralick & Elliott 80], conflict-directed backjumping (CBJ), forward checking with conflict-directed backjumping (FC-CBJ) and backmarking with conflict-directed backjumping (BM-CBJ) [Prosser 93]. The minimal width ordering heuristic (MWO) [Freuder 82] was applied to all of these algorithms; in addition, the fail-first heuristic (FF) [Haralick & Elliott 80] was applied to FC and FC-CBJ. These combinations were chosen because they represent a variety of CSP algorithms, including look-ahead and look-back algorithms and their hybrids, and we wanted to show that the phase transition is a property of the problems, not of the algorithms or heuristics, even when multiple solutions are needed. The cost measure of interest was the number of compatibility checks performed during search, which is generally accepted as an algorithm-independent measure of search cost. The CSPs used in our experiments belonged to two classes: 10 variables with a domain size of 10, and 20 variables with a domain size of 10.² All binary constraints are represented as binary matrices in which a "1" represents a permitted pair of labels and a "0" a pair that is disallowed between two variables, as in [Haralick & Elliott 80]. None of our problems contained universal constraints (which allow all pairs of values) or null constraints (which allow no pairs of values), and every generated CSP was guaranteed to have a connected constraint graph. In order to test our theory on phase transitions, we set a limit of 500 solutions for each run of an algorithm-heuristic combination.
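A minimal Python sketch of a generator for problems of the kind described above follows. The paper does not spell out its generator, so the details here are our assumptions: a fixed number of constraints (p1 times the number of variable pairs, rounded down), a fixed number of permitted value pairs per constraint (p2 × m², rounded), and rejection sampling until the constraint graph is connected. The names `random_binary_csp` and `is_connected` are hypothetical.

```python
import itertools
import math
import random


def is_connected(n, edges):
    """Depth-first search from variable 0; True if every variable is reached."""
    adjacent = {v: set() for v in range(n)}
    for x, y in edges:
        adjacent[x].add(y)
        adjacent[y].add(x)
    seen, stack = {0}, [0]
    while stack:
        for w in adjacent[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def random_binary_csp(n, m, p1, p2, rng=None, max_tries=100_000):
    """Generate a random binary CSP as {(x, y): m-by-m 0/1 matrix}, x < y.

    Assumptions (not taken from the paper): exactly floor(p1 * n(n-1)/2)
    constraints, each permitting exactly round(p2 * m * m) value pairs,
    with constraint graphs resampled until connected.
    """
    rng = rng or random.Random()
    pairs = list(itertools.combinations(range(n), 2))
    n_constraints = math.floor(p1 * len(pairs) + 1e-9)  # round p1 down
    n_allowed = round(p2 * m * m)
    if not 0 < n_allowed < m * m:
        raise ValueError("p2 would create null or universal constraints")
    for _ in range(max_tries):
        edges = rng.sample(pairs, n_constraints)
        if is_connected(n, edges):
            break
    else:
        raise RuntimeError("no connected constraint graph found")
    cells = list(itertools.product(range(m), repeat=2))
    csp = {}
    for x, y in edges:
        allowed = set(rng.sample(cells, n_allowed))  # permitted (a, b) pairs
        csp[(x, y)] = [[1 if (a, b) in allowed else 0 for b in range(m)]
                       for a in range(m)]
    return csp
```

For example, `random_binary_csp(10, 10, 0.3, 0.5, rng=random.Random(1))` yields 13 constraints, each permitting 50 of the 100 value pairs. Rejection sampling keeps the graph uniform among connected graphs with that many edges, at the price of extra retries when p1 is close to the connectivity limit.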
Our calculations predicted that a limit of 500 solutions would produce a clear shift in the phase transition, so our results were expected to give a good indication of the accuracy of our theory. The p1 values were varied in steps of 0.1 from 0.1 to 1.0.³ ⁴ For each p1, the value of p2 expected to make the generated problems difficult for finding the first 500 solutions, $p_{2,\mathrm{hard}_{500}}$, was calculated using Equation (6). p2 was then varied in increments of 0.01 around the expected phase transition, from $p_{2,\mathrm{hard}_{500}} - 0.05$ to $p_{2,\mathrm{hard}_{500}} + 0.05$. A set of 50 problem instances was generated for each combination of p1 and p2. To obtain a better picture of how phase transitions occur, additional samples were taken for 10-variable problems, with p2 varied from 0.05 to 0.95 in steps of 0.05, for p1 equal to 0.3, 0.6 and 0.9. Control experiments on these additional samples identified the approximate location of the phase transitions when only the first solution, or only the first 50 solutions, are needed.⁵ This allowed us to observe any shift of the phase transition as the number of solutions to be found changes. The expected values of $p_{2,\mathrm{hard}}$ for our experiments are given in Table 2.⁶

² The 10- and 20-variable problems are quite small, but they still allow us to observe the phenomenon. Larger problems are prohibitive in experimentation time, although investigation with larger problems is an issue for future work.
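Search cost here is the number of compatibility checks. To make that measure concrete, the Python sketch below counts checks for plain chronological backtracking (BT) that stops after i solutions or when the search space is exhausted, i.e. an instance of $C_i(S)$ from Table 1 for BT. BT is chosen only for brevity; it is not one of the algorithm-heuristic combinations used in the experiments. The constraint format (a dict from variable pairs (x, y), x < y, to m-by-m 0/1 matrices, mirroring the binary-matrix representation described above) and the function name `count_checks_bt` are our assumptions.

```python
def count_checks_bt(n, m, csp, i):
    """Find up to i solutions with chronological backtracking.

    csp maps (x, y), x < y, to an m-by-m 0/1 matrix (1 = permitted pair).
    Variables are instantiated in index order. Returns (solutions found,
    compatibility checks performed).
    """
    neighbours = {v: [] for v in range(n)}
    for x, y in csp:
        neighbours[x].append(y)
        neighbours[y].append(x)
    assignment = [None] * n
    checks = 0
    found = 0

    def consistent(var, val):
        # Check the new value against every earlier, already assigned neighbour.
        nonlocal checks
        for other in neighbours[var]:
            if other < var:
                checks += 1  # one compatibility check
                if not csp[(other, var)][assignment[other]][val]:
                    return False
        return True

    def search(var):
        nonlocal found
        if var == n:
            found += 1
            return found >= i  # stop once i solutions have been found
        for val in range(m):
            if consistent(var, val):
                assignment[var] = val
                if search(var + 1):
                    return True
        assignment[var] = None
        return False

    search(0)
    return found, checks


# Tiny example: three variables, two values, "not equal" on (0, 1) and (1, 2).
not_equal = [[0, 1], [1, 0]]
print(count_checks_bt(3, 2, {(0, 1): not_equal, (1, 2): not_equal}, 500))  # (2, 8)
```

Asking for 500 solutions of a problem that has only two forces the whole space to be searched, which is exactly the "fewer than i solutions" case of $C_i(S)$.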

n = 10 (see footnotes 3 and 4)
p1                0.20  0.29  0.40  0.49  0.60  0.69  0.80  0.89  1.00
p2,hard (i=1)     0.08  0.19  0.28  0.36  0.43  0.49  0.53  0.57  0.60
p2,hard (i=50)    0.12  0.26  0.35  0.44  0.49  0.55  0.59  0.63  0.65
p2,hard (i=500)   0.15  0.30  0.39  0.48  0.54  0.59  0.63  0.66  0.69

n = 20
p1                0.10  0.20  0.30  0.40  0.50  0.60  0.70  0.80  0.90  1.00
p2,hard (i=1)     0.09  0.30  0.45  0.55  0.62  0.67  0.71  0.74  0.76  0.78
p2,hard (i=500)   0.12  0.35  0.50  0.59  0.66  0.71  0.74  0.77  0.79  0.81

Table 2. Theoretical values of p2, for different p1, at which phase transitions for 10-variable and 20-variable CSPs occur

³ p1 is rounded down to produce an integer number of constraints whenever an increment of 0.1 is impossible.
⁴ 0.2 is the smallest p1 for which a CSP of 10 variables can have a connected constraint graph.
⁵ Only BM-CBJ with the MWO heuristic was run in the control experiments, because of its run-time efficiency. In our implementation, each compatibility check performed by FC-CBJ with the FF heuristic costs more CPU time than one performed by BM-CBJ with MWO in the region of our experiments.
⁶ p1 is approximated to two decimal places.
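Footnotes 3 and 4 rest on simple counting: a constraint graph on n variables has n(n−1)/2 possible constraints, and a connected graph needs at least n − 1 of them. The Python sketch below (function names ours) rounds p1 down to a whole number of constraints and computes the smallest density that permits a connected graph. For n = 10 it reproduces the p1 row of Table 2, and it gives 0.2 and 0.1 as the minimum densities for 10 and 20 variables.

```python
import math


def rounded_density(n, p1):
    """Round p1 down to a whole number of constraints (footnote 3).

    Returns (number_of_constraints, effective_p1). The tiny epsilon guards
    against floating-point products landing just below a whole number.
    """
    max_constraints = n * (n - 1) // 2
    k = math.floor(p1 * max_constraints + 1e-9)
    return k, k / max_constraints


def min_connected_density(n):
    """Smallest p1 allowing a connected graph: n - 1 edges (footnote 4)."""
    return (n - 1) / (n * (n - 1) // 2)


nominal = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
print([round(rounded_density(10, p)[1], 2) for p in nominal])
# [0.2, 0.29, 0.4, 0.49, 0.6, 0.69, 0.8, 0.89, 1.0]
print(min_connected_density(10), min_connected_density(20))  # 0.2 0.1
```

For n = 20 every nominal p1 already corresponds to a whole number of constraints (19, 38, ..., 190), which is why the 20-variable p1 values in Table 2 are unchanged.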

4 Results

In order to analyse our results, both the mean and median number of compatibility checks were calculated. Although the average cost of solving a set of problems with a fixed combination of p1 and p2 is usually greater than the median cost because of outliers, the mean is still a useful measure, since we are interested not in absolute values but in how they vary when a parameter is changed. The first question is whether our results are consistent with the theory, i.e. whether Equation (6) gives a reasonable prediction of where phase transitions occur. During phase transitions, the average and the median number of compatibility checks performed by an algorithm should be greatest [Williams & Hogg 94].⁷ Thus, for each algorithm-heuristic combination and each p1, we examined the value of p2 at which the average or the median number of compatibility checks is maximal, to see whether it agrees with the $p_{2,\mathrm{hard}_{500}}$ computed from Equation (6). Our results based on maximal average cost are shown in Table 3 (for 10-variable CSPs) and Table 5 (for 20-variable CSPs). Table 4 (for 10-variable CSPs) and Table 6 (for 20-variable CSPs) are constructed in the same way but use the maximum of the median cost instead of the average. The entries highlighted with a grey background in Tables 5 and 6 lie at the end of our sampled p2 values, so they may not be the true peak for the given p1 value. The algorithm-heuristic combinations in Tables 3-6 are:

alg1 = FC-CBJ+MWO; alg2 = FC+MWO; alg3 = BM-CBJ+MWO; alg4 = BM+MWO;
alg5 = CBJ+MWO; alg6 = BJ+MWO; alg7 = FC-CBJ+FF; alg8 = FC+FF.

Tables 3 to 6 show a good match between the theory and the empirical results, particularly when the constraint density is high. The accuracy of our theory appears to increase with the number of variables in the problems, except when the constraint density is low (e.g. p1 ≤ 0.2), where discrepancies exist. Figures 7-9 and 13-15 show the shape of the phase transitions near the theoretical $p_{2,\mathrm{hard}_{500}}$, based on the average number of compatibility checks, when p1 was set to 0.3, 0.6 and 0.9 for 10-variable and 20-variable problems. Figures 10-12 show the median number of compatibility checks for the same values of n, m, p1 and p2.⁸ For clarity of presentation, only four algorithm-heuristic combinations are shown; the other combinations exhibit similar behaviour.

⁷ There may be some exceptionally difficult problems in a region where most problems are easy [Smith 94a] [Gent & Walsh 94a].
⁸ Since the phase transition diagrams based on average cost and median cost are very similar, we omit the median cost diagrams for 20-variable problems.
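The peak-finding step described above (the p2 at which the mean or the median number of checks is largest, for a given p1) can be sketched in Python as follows. The function name is ours, and the data in the usage lines are made up purely to show how a single exceptionally hard instance (footnote 7) can move the mean-based peak while leaving the median-based peak in place; they are not results from the paper.

```python
from statistics import mean, median


def peak_p2(costs_by_p2):
    """Return (p2 of maximal mean cost, p2 of maximal median cost).

    costs_by_p2 maps each sampled p2 to the list of per-instance costs
    (e.g. compatibility checks) measured at that p2.
    """
    by_mean = max(costs_by_p2, key=lambda p2: mean(costs_by_p2[p2]))
    by_median = max(costs_by_p2, key=lambda p2: median(costs_by_p2[p2]))
    return by_mean, by_median


# Hypothetical costs: one exceptionally hard instance at p2 = 0.28
# pulls the mean peak away from the median peak.
example = {0.28: [100, 110, 5000], 0.29: [400, 420, 430], 0.30: [300, 310, 320]}
print(peak_p2(example))  # (0.28, 0.29)
```

Reporting both statistics, as Tables 3-6 do, guards against a peak location that depends on a few outliers.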

p1                0.20  0.29  0.40  0.49  0.60  0.69  0.80  0.89  1.00
p2,hard (i=500)   0.15  0.30  0.39  0.48  0.54  0.59  0.63  0.66  0.69
p2 where the maximal average number of checks is observed:
alg1              0.16  0.29  0.38  0.46  0.54  0.58  0.62  0.66  0.69
alg2              0.16  0.29  0.38  0.46  0.54  0.58  0.62  0.66  0.69
alg3              0.16  0.29  0.38  0.46  0.54  0.58  0.62  0.66  0.69
alg4              0.16  0.29  0.38  0.46  0.54  0.58  0.62  0.66  0.69
alg5              0.17  0.29  0.40  0.47  0.55  0.58  0.64  0.66  0.69
alg6              0.17  0.29  0.40  0.46  0.55  0.58  0.64  0.66  0.69
alg7              0.17  0.30  0.39  0.46  0.54  0.58  0.62  0.66  0.69
alg8              0.17  0.27  0.39  0.46  0.54  0.58  0.62  0.66  0.69

Table 3. Theoretical and experimental $p_{2,\mathrm{hard}_{500}}$ (avg. cost) for different algorithm-heuristic combinations in finding the first 500 solutions of 10-variable CSPs

p1                0.20  0.29  0.40  0.49  0.60  0.69  0.80  0.89  1.00
p2,hard (i=500)   0.15  0.30  0.39  0.48  0.54  0.59  0.63  0.66  0.69
p2 where the maximal median number of checks is observed:
alg1              0.18  0.27  0.38  0.46  0.54  0.58  0.62  0.66  0.69
alg2              0.18  0.27  0.38  0.46  0.54  0.58  0.62  0.66  0.69
alg3              0.18  0.27  0.38  0.46  0.54  0.58  0.62  0.66  0.69
alg4              0.16  0.28  0.38  0.46  0.54  0.58  0.62  0.66  0.69
alg5              0.19  0.28  0.40  0.48  0.54  0.58  0.62  0.66  0.69
alg6              0.19  0.28  0.40  0.48  0.54  0.58  0.62  0.66  0.69
alg7              0.17  0.27  0.39  0.46  0.54  0.59  0.62  0.66  0.69
alg8              0.17  0.27  0.39  0.46  0.54  0.59  0.62  0.66  0.69

Table 4. Theoretical and experimental $p_{2,\mathrm{hard}_{500}}$ (median cost) for different algorithm-heuristic combinations in finding the first 500 solutions of 10-variable CSPs

p1                0.10  0.20  0.30  0.40  0.50  0.60  0.70  0.80  0.90  1.00
p2,hard (i=500)   0.12  0.35  0.50  0.59  0.66  0.71  0.74  0.77  0.79  0.81
p2 where the maximal average number of checks is observed:
alg1              0.15  0.37  0.51  0.60  0.66  0.71  0.74  0.77  0.79  0.81
alg2              0.15  0.37  0.51  0.60  0.66  0.71  0.74  0.77  0.79  0.81
alg3              0.15  0.37  0.51  0.60  0.66  0.71  0.74  0.77  0.79  0.81
alg4              0.17  0.37  0.51  0.60  0.66  0.71  0.74  0.77  0.79  0.81
alg5              0.17  0.37  0.51  0.60  0.66  0.71  0.74  0.77  0.79  0.81
alg6              0.17  0.37  0.51  0.60  0.66  0.71  0.74  0.77  0.79  0.81
alg7              0.14  0.36  0.50  0.60  0.66  0.71  0.74  0.77  0.79  0.81
alg8              0.16  0.37  0.50  0.60  0.66  0.71  0.74  0.77  0.79  0.81

Table 5. Theoretical and experimental $p_{2,\mathrm{hard}_{500}}$ (avg. cost) for different algorithm-heuristic combinations in finding the first 500 solutions of 20-variable CSPs

p1                0.10  0.20  0.30  0.40  0.50  0.60  0.70  0.80  0.90  1.00
p2,hard (i=500)   0.12  0.35  0.50  0.59  0.66  0.71  0.74  0.77  0.79  0.81
p2 where the maximal median number of checks is observed:
alg1              0.15  0.37  0.50  0.59  0.66  0.71  0.74  0.77  0.79  0.81
alg2              0.15  0.35  0.50  0.59  0.66  0.71  0.74  0.77  0.79  0.81
alg3              0.15  0.35  0.50  0.59  0.66  0.71  0.74  0.77  0.79  0.81
alg4              0.16  0.35  0.50  0.59  0.66  0.71  0.74  0.77  0.79  0.81
alg5              0.17  0.37  0.50  0.59  0.66  0.71  0.74  0.77  0.79  0.81
alg6              0.17  0.37  0.50  0.59  0.66  0.71  0.74  0.77  0.79  0.81
alg7              0.16  0.36  0.50  0.59  0.66  0.71  0.74  0.77  0.79  0.81
alg8              0.16  0.36  0.50  0.59  0.66  0.71  0.74  0.77  0.79  0.81

Table 6. Theoretical and experimental $p_{2,\mathrm{hard}_{500}}$ (median cost) for different algorithm-heuristic combinations in finding the first 500 solutions of 20-variable CSPs

[Figures 7-15: number of compatibility checks (y-axis) against p2 (x-axis) for BM-CBJ+MWO, BM+MWO, FC-CBJ+FF and CBJ+MWO.]
Figure 7. Cross section for p1 = 0.29, n = m = 10 along the theoretical $p_{2,\mathrm{hard}_{500}}$ = 0.30 (considering the average number of compatibility checks)
Figure 8. Cross section for p1 = 0.60, n = m = 10 along the theoretical $p_{2,\mathrm{hard}_{500}}$ = 0.54 (considering the average number of compatibility checks)
Figure 9. Cross section for p1 = 0.89, n = m = 10 along the theoretical $p_{2,\mathrm{hard}_{500}}$ = 0.66 (considering the average number of compatibility checks)
Figure 10. Cross section for p1 = 0.29, n = m = 10 along the theoretical $p_{2,\mathrm{hard}_{500}}$ = 0.30 (considering the median number of compatibility checks)
Figure 11. Cross section for p1 = 0.60, n = m = 10 along the theoretical $p_{2,\mathrm{hard}_{500}}$ = 0.54 (considering the median number of compatibility checks)
Figure 12. Cross section for p1 = 0.89, n = m = 10 along the theoretical $p_{2,\mathrm{hard}_{500}}$ = 0.66 (considering the median number of compatibility checks)
Figure 13. Cross section for p1 = 0.30, n = 20, m = 10 along the theoretical $p_{2,\mathrm{hard}_{500}}$ = 0.50 (considering the average number of compatibility checks)
Figure 14. Cross section for p1 = 0.60, n = 20, m = 10 along the theoretical $p_{2,\mathrm{hard}_{500}}$ = 0.71 (considering the average number of compatibility checks)
Figure 15. Cross section for p1 = 0.90, n = 20, m = 10 along the theoretical $p_{2,\mathrm{hard}_{500}}$ = 0.79 (considering the average number of compatibility checks)

It can be seen from Figures 7 to 15 that as p1 increases, both the maximal average and the maximal median search effort increase, and the curves depicting the phase transitions become more sharply defined. These results are consistent with Prosser's results in his study of phase transitions in binary CSPs in which only the first solution was sought [Prosser 94].

Next we examine how increasing the number of solutions to be found affects the location of phase transitions. Figures 16, 17 and 18 show the search effort required by BM-CBJ with MWO to find the first solution, the first 50 solutions and the first 500 solutions of 10-variable CSPs as p2 varies, for p1 equal to 0.29, 0.60 and 0.89 respectively.

[Figures 16-18: number of checks (y-axis) against p2 from 0 to 1 (x-axis) for BM-CBJ+MWO, with curves for 1 solution, 50 solutions and 500 solutions.]
Figure 16. Cost of finding the first, 50, and 500 solutions for 10-variable CSPs with p1 = 0.29
Figure 17. Cost of finding the first, 50, and 500 solutions for 10-variable CSPs with p1 = 0.60
Figure 18. Cost of finding the first, 50, and 500 solutions for 10-variable CSPs with p1 = 0.89

Although the peaks of the curves in Figure 16 are less well defined because of the relatively low constraint density, it is clear from Figures 16-18 that increasing the number of solutions to be found shifts the phase transition to the right, i.e. towards looser constraints. The curve representing the phase transition also becomes more pronounced.

Finally, we investigated the correlation between $p_{2,\mathrm{hard}}$ and the location of the crossover point [Crawford & Anton 93]. The results are tabulated in Tables 19 and 20. The approximate crossover points are highlighted by shaded boxes, and the p2 values at which the peak of the average search cost for FC-CBJ with the FF heuristic was found in our experiments are marked in bold italics. The choice of FC-CBJ with the FF heuristic is arbitrary; other algorithm-heuristic combinations give similar results. Although there is no p2 at which exactly 25 CSPs have 500 or more solutions and the other 25 have fewer, Tables 19 and 20 show clearly that the p2 at which the peak in average search cost occurs and the p2 at which the crossover happens coincide or are very close (within 1% for FC-CBJ with the FF heuristic), whatever the value of p1. Accuracy could be improved by sampling p2 at a higher resolution; with a domain size of 10, each constraint has 100 value pairs, so p2 can only change in steps of 0.01, and finer steps would require larger domains. As mentioned earlier, however, our theory gives a reasonable prediction of where the phase transition happens when p1 is not low, and the accuracy of the prediction improves as n or p1 increases.
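The crossover point can be read off Tables 19 and 20 as the p2 at which half of the 50 samples have at least 500 solutions. Since no sampled p2 gives exactly 25, the Python sketch below linearly interpolates between the two neighbouring samples that bracket 25; the interpolation and the function name are our assumptions, not the authors' procedure. Applied to the p1 = 0.20 column of Table 19 it places the crossover at about 0.159.

```python
def crossover_p2(counts, total=50):
    """Estimate the p2 at which half of the samples have enough solutions.

    counts is a list of (p2, number of the `total` samples with at least
    the required number of solutions), sorted by increasing p2. The p2 is
    linearly interpolated between the two samples that bracket total/2.
    Returns None if the count never reaches total/2.
    """
    half = total / 2
    for (p_lo, c_lo), (p_hi, c_hi) in zip(counts, counts[1:]):
        if c_lo == half:
            return p_lo
        if c_lo < half <= c_hi:
            return p_lo + (half - c_lo) / (c_hi - c_lo) * (p_hi - p_lo)
    return None


# The p1 = 0.20 column of Table 19 (10-variable CSPs, 500 solutions).
column = [(0.10, 0), (0.11, 0), (0.12, 0), (0.13, 1), (0.14, 4), (0.15, 13),
          (0.16, 27), (0.17, 33), (0.18, 48), (0.19, 49), (0.20, 50)]
print(round(crossover_p2(column), 3))  # 0.159
```

Interpolation only refines the estimate within the 0.01 grid; it cannot recover structure that the sampling resolution has missed.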

p1                0.20      0.29      0.40      0.49      0.60      0.69      0.80      0.89      1.00
p2,hard (i=500)   0.15      0.30      0.39      0.48      0.54      0.59      0.63      0.66      0.69
p2 (number of CSPs which have 500 or more solutions among the 50 samples):
                  0.10 (0)  0.25 (1)  0.34 (0)  0.43 (0)  0.49 (0)  0.54 (0)  0.58 (0)  0.61 (0)  0.64 (0)
                  0.11 (0)  0.26 (5)  0.35 (0)  0.44 (0)  0.50 (0)  0.55 (0)  0.59 (0)  0.62 (0)  0.65 (0)
                  0.12 (0)  0.27 (17) 0.36 (1)  0.45 (6)  0.51 (0)  0.56 (0)  0.60 (0)  0.63 (0)  0.66 (0)
                  0.13 (1)  0.28 (30) 0.37 (2)  0.46 (9)  0.52 (3)  0.57 (2)  0.61 (0)  0.64 (0)  0.67 (0)
                  0.14 (4)  0.29 (34) 0.38 (5)  0.47 (26) 0.53 (12) 0.58 (22) 0.62 (2)  0.65 (9)  0.68 (2)
                  0.15 (13) 0.30 (44) 0.39 (14) 0.48 (38) 0.54 (27) 0.59 (39) 0.63 (31) 0.66 (33) 0.69 (27)
                  0.16 (27) 0.31 (49) 0.40 (31) 0.49 (49) 0.55 (46) 0.60 (49) 0.64 (45) 0.67 (49) 0.70 (49)
                  0.17 (33) 0.32 (49) 0.41 (43) 0.50 (50) 0.56 (48) 0.61 (50) 0.65 (50) 0.68 (50) 0.71 (50)
                  0.18 (48) 0.33 (50) 0.42 (48) 0.51 (50) 0.57 (50) 0.62 (50) 0.66 (50) 0.69 (50) 0.72 (50)
                  0.19 (49) 0.34 (50) 0.43 (50) 0.52 (50) 0.58 (50) 0.63 (50) 0.67 (50) 0.70 (50) 0.73 (50)
                  0.20 (50) 0.35 (50) 0.44 (50) 0.53 (50) 0.59 (50) 0.64 (50) 0.68 (50) 0.71 (50) 0.74 (50)

Table 19. Crossover points for finding the first 500 solutions of 10-variable CSPs

p1                0.10      0.20      0.30      0.40      0.50      0.60      0.70      0.80      0.90      1.00
p2,hard (i=500)   0.12      0.35      0.50      0.59      0.66      0.71      0.74      0.77      0.79      0.81
p2 (number of CSPs which have 500 or more solutions among the 50 samples):
                  0.07 (0)  0.30 (0)  0.45 (0)  0.54 (0)  0.61 (0)  0.66 (0)  0.69 (0)  0.72 (0)  0.74 (0)  0.76 (0)
                  0.08 (0)  0.31 (0)  0.46 (0)  0.55 (0)  0.62 (0)  0.67 (0)  0.70 (0)  0.73 (0)  0.75 (0)  0.77 (0)
                  0.09 (0)  0.32 (0)  0.47 (0)  0.56 (0)  0.63 (0)  0.68 (0)  0.71 (0)  0.74 (0)  0.76 (0)  0.78 (0)
                  0.10 (0)  0.33 (0)  0.48 (0)  0.57 (0)  0.64 (0)  0.69 (0)  0.72 (0)  0.75 (0)  0.77 (0)  0.79 (0)
                  0.11 (0)  0.34 (5)  0.49 (11) 0.58 (1)  0.65 (2)  0.70 (4)  0.73 (0)  0.76 (0)  0.78 (0)  0.80 (0)
                  0.12 (4)  0.35 (12) 0.50 (23) 0.59 (14) 0.66 (22) 0.71 (33) 0.74 (16) 0.77 (22) 0.79 (13) 0.81 (16)
                  0.13 (8)  0.36 (19) 0.51 (34) 0.60 (38) 0.67 (49) 0.72 (50) 0.75 (47) 0.78 (50) 0.80 (50) 0.82 (50)
                  0.14 (17) 0.37 (33) 0.52 (49) 0.61 (49) 0.68 (50) 0.73 (50) 0.76 (50) 0.79 (50) 0.81 (50) 0.83 (50)
                  0.15 (31) 0.38 (46) 0.53 (50) 0.62 (50) 0.69 (50) 0.74 (50) 0.77 (50) 0.80 (50) 0.82 (50) 0.84 (50)
                  0.16 (35) 0.39 (47) 0.54 (50) 0.63 (50) 0.70 (50) 0.75 (50) 0.78 (50) 0.81 (50) 0.83 (50) 0.85 (50)
                  0.17 (45) 0.40 (50) 0.55 (50) 0.64 (50) 0.71 (50) 0.76 (50) 0.79 (50) 0.82 (50) 0.84 (50) 0.86 (50)

Table 20. Crossover points for finding the first 500 solutions of 20-variable CSPs

5 Conclusions

In this work we have extended the theory developed by Williams and Hogg, and by Smith, for predicting the location of the phase transition in a particular class of binary constraint satisfaction problems. Our theory predicts the location of the phase transition in search cost when a specified number of solutions to a CSP is to be found. Empirical investigation of the accuracy of this prediction has shown that it holds well provided that the density of the constraint graph, expressed as p1, is not low. Explaining the unexpected behaviour of the model at low p1 remains a challenge. In [Prosser 94], Prosser conjectures that for low p1 the constraint graphs may be disconnected or tend to have a large variation in their degree sequence, and suggests that these may be the reasons why the predicted location of the phase transition is inaccurate. However, we used only connected constraint graphs in our experiments. We also computed the standard deviation of the node degrees of the constraint graphs and found that it has a low-high-low profile as p1 varies from 0.1 (0.2 in the 10-variable case) to 1.0, with the maximum standard deviation at around p1 = 0.5. This makes the conjecture inadequate to explain why the prediction for p1 = 0.5 is more accurate than that for a lower p1. More recently, Smith has suggested that the discrepancy could be due to the large variance of the expected number of solutions near the theoretical crossover point for sparse problems [Smith 94b]. This hypothesis has yet to be confirmed by further empirical work.
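The degree statistic discussed above can be computed with a few lines of Python. The sketch below (function names ours) takes the population standard deviation of node degrees and averages it over random connected graphs with p1·n(n−1)/2 edges, rounded down; the use of the population rather than the sample standard deviation, the uniform edge sampling and the number of samples are all assumptions. Evaluating `mean_degree_std(10, p1)` for p1 from 0.2 to 1.0 is one way to examine the low-high-low profile reported above; no numbers are quoted here because the sketch has not been run at the paper's settings.

```python
import itertools
import random
from statistics import pstdev


def degree_std(n, edges):
    """Population standard deviation of node degrees in a constraint graph."""
    degree = [0] * n
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1
    return pstdev(degree)


def is_connected(n, edges):
    """Depth-first search from node 0; True if every node is reached."""
    adjacent = {v: set() for v in range(n)}
    for x, y in edges:
        adjacent[x].add(y)
        adjacent[y].add(x)
    seen, stack = {0}, [0]
    while stack:
        for w in adjacent[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def mean_degree_std(n, p1, samples=200, seed=0):
    """Average degree std over random connected graphs with p1*n(n-1)/2 edges."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n), 2))
    k = int(p1 * len(pairs) + 1e-9)  # round the number of edges down
    if k < n - 1:
        raise ValueError("too few edges for a connected graph")
    total, kept = 0.0, 0
    while kept < samples:
        edges = rng.sample(pairs, k)
        if is_connected(n, edges):  # only connected graphs, as in the experiments
            total += degree_std(n, edges)
            kept += 1
    return total / samples
```

At p1 = 1.0 every node has degree n − 1, so the standard deviation is exactly zero, which matches the "low" end of the profile.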

Acknowledgments

We would like to thank Barbara Smith for reading an early draft of this paper and suggesting improvements to it. The research reported in this paper was supported by the University of Essex research promotion fund ref. R7027 and by the EPSRC research grant ref. GR/J/42878.

References

[Cheeseman et. al. 91] Cheeseman, P. et al. 1991. Where the Really Hard Problems Are. In Proceedings IJCAI-91, 331-337.
[Crawford & Anton 93] Crawford, J.M. & Auton, L.D. 1993. Experimental Results on the Crossover Point in Satisfiability Problems. In Proceedings AAAI-93, 21-27.
[Freuder 82] Freuder, E. 1982. A Sufficient Condition for Backtrack-Free Search. J. ACM 29, 24-32.
[Gaschnig 77] Gaschnig, J. 1977. A General Backtrack Algorithm that Eliminates Most Redundant Tests. In Proceedings IJCAI-77, 457.
[Gaschnig 78] Gaschnig, J. 1978. Experimental Case Studies of Backtrack vs. Waltz-Type vs. New Algorithms for Satisficing Assignment Problems. In Proceedings of the 2nd Biennial Conference of the Canadian Society for Computational Studies of Intelligence, 268-277.
[Gent & Walsh 94] Gent, I.P. & Walsh, T. 1994. The SAT Phase Transition. In Proceedings ECAI-94, 105-109.
[Gent & Walsh 94a] Gent, I.P. & Walsh, T. 1994. Easy Problems are Sometimes Hard. Artificial Intelligence 70, 335-345.
[Haralick & Elliott 80] Haralick, R.M. & Elliott, G.L. 1980. Increasing Tree Search Efficiency for Constraint Satisfaction Problems. Artificial Intelligence 14, 263-314.
[Mitchell et. al. 92] Mitchell, D., Selman, B. & Levesque, H. 1992. Hard and Easy Distributions of SAT Problems. In Proceedings AAAI-92, 459-465.
[Prosser 93] Prosser, P. 1993. Hybrid Algorithms for the Constraint Satisfaction Problem. Computational Intelligence 9(3), 268-299.
[Prosser 94] Prosser, P. 1994. Binary Constraint Satisfaction Problems: Some are Harder than Others. In Proceedings ECAI-94, 95-99.
[Smith 94] Smith, B. 1994. Phase Transition and the Mushy Region in Constraint Satisfaction Problems. In Proceedings ECAI-94, 100-104.
[Smith 94a] Smith, B. 1994. In Search of Exceptionally Difficult Constraint Satisfaction Problems. In Proceedings ECAI-94 Workshop on Constraint Processing, 79-86.
[Smith 94b] Smith, B. 1994. Locating the Phase Transition in Constraint Satisfaction Problems. Research Report 94.16, School of Computer Studies, University of Leeds. To appear in Artificial Intelligence.
[Williams & Hogg 94] Williams, C. & Hogg, T. 1994. Exploiting the Deep Structure of Constraint Problems. Artificial Intelligence 70(1-2), 73-117.
