Copyright 1996, American Association for Artificial Intelligence. All rights reserved. The official version of this paper has been published by the American Association for Artificial Intelligence, Proceedings of AAAI-96. (http://www.aaai.org)

Tuning Local Search for Satisfiability Testing*

Andrew J. Parkes

CIS Dept. and CIRL, 1269 University of Oregon, Eugene, OR 97403-1269, U.S.A. [email protected]

Joachim P. Walser

Programming Systems Lab, Universität des Saarlandes, Postfach 151150, 66041 Saarbrücken, Germany. [email protected]

* This work has been supported by ARPA/Rome Labs under contracts F30602-93-C-0031 and F30602-95-1-0023 and by a doctoral fellowship of the DFG to the second author (Graduiertenkolleg Kognitionswissenschaft).

Abstract

Local search algorithms, particularly Gsat and Wsat, have attracted considerable recent attention, primarily because they are the best known approaches to several hard classes of satisfiability problems. However, replicating reported results has been difficult because the setting of certain key parameters is something of an art, and because details of the algorithms, not discussed in the published papers, can have a large impact on performance. In this paper we present an efficient probabilistic method for finding the optimal setting for a critical local search parameter, Maxflips, and discuss important details of two differing versions of Wsat. We then apply the optimization method to study the performance of Wsat on satisfiable instances of Random 3SAT at the crossover point, and present extensive experimental results over a wide range of problem sizes. We find that the results are well described by having the optimal value of Maxflips scale as a simple power of the number of variables, n, and the average run time scale sub-exponentially (basically as n^{log(n)}) over the range n = 25, ..., 400.

INTRODUCTION

In recent years, a variety of local search routines have been proposed for (Boolean) satisfiability testing. It has been shown (Selman, Levesque, & Mitchell 1992; Gu 1992; Selman, Kautz, & Cohen 1994) that local search can solve a variety of realistic and randomly generated satisfiability problems much larger than conventional procedures such as Davis-Putnam. The characteristic feature of local search is that it starts from a total variable assignment and works by repeatedly changing variables that appear in violated constraints (Minton et al. 1990). The changes are typically made according to some hill-climbing heuristic strategy with the aim of maximizing the number of satisfied constraints. However, as is usual with hill-climbing, we are liable to get stuck on local maxima. There are two standard ways to overcome this problem: "noise" can be introduced (Selman, Kautz, & Cohen 1994); or, with a frequency controlled by a cutoff parameter Maxflips, we can simply give up on local moves and restart with some new assignment. Typically both of these techniques are used together, but their use raises several interesting issues:

1. What value should we assign to Maxflips? There are no known fundamental rules for how to set it, yet it can have a significant impact on performance and deserves optimization. Also, empirical comparison of different procedures should be done fairly, which involves first optimizing parameters.

2. If we cannot make an optimal choice, then how much will performance suffer?

3. How does the performance scale with the problem size? This is especially important when comparing local search to other algorithm classes.

4. Local search routines can fail to find a solution even when the problem instance is actually satisfiable. We might like an idea of how often this happens, i.e. the false failure rate under the relevant time and problem size restrictions.

At present, resolving these issues requires extensive empirical analysis, because the random noise implies that different runs can require very different runtimes even on the same problem instance. Meaningful results therefore require an average over many runs. In this paper we give a probabilistic method to reduce the computational cost involved in Maxflips optimization, and also present scaling results obtained with its help.

The paper is organized as follows. Firstly, we discuss a generic form of a local search procedure, and present details of two specific cases of the Wsat algorithm (Selman, Kautz, & Cohen 1994). Then we describe the optimization method, which we call "retrospective parameter variation" (rpv), and show how it allows data collected at one value of Maxflips to be reused to produce runtime results for a range of values. We note that the same concept of rpv can also be used to study the effects of introducing varying amounts of parallelization into local search by simulating multiple threads (Walser 1995). Finally, we present the results of experiments to study the performance of the two versions of Wsat on Random 3SAT at the crossover point (Cheeseman, Kanefsky, & Taylor 1991; Mitchell, Selman, & Levesque 1992; Crawford & Auton 1996). By making extensive use of rpv, and fast multi-processor machines, we are able to give results up to 400 variables. We find that the optimal Maxflips setting scales as a simple monomial, and the mean runtime scales subexponentially, but faster than a simple power-law.

LOCAL SEARCH IN SAT

Figure 1 gives the outline of a typical local search routine (Selman, Levesque, & Mitchell 1992) to find a satisfying assignment for a set of clauses Φ.¹

¹ A clause is a disjunction of literals. A literal is a propositional variable or its negation.

proc Local-Search-SAT
  Input: clauses Φ, Maxflips, and Maxtries
  for i := 1 to Maxtries do
    A := new total truth assignment
    for j := 1 to Maxflips do
      if A satisfies Φ then return A
      P := select-variable(Φ, A)
      A := A with P flipped
    end
  end
  return "No satisfying assignment found"
end

Figure 1: A generic local search procedure for SAT.
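For concreteness, here is a minimal executable rendering of Figure 1 in Python. The clause representation (DIMACS-style signed integers), the dictionary assignment, and the extra unsat argument to select_variable are our own illustrative choices, not details from the paper; a serious implementation would track unsatisfied clauses incrementally rather than rescanning all clauses on every flip.

import random

def satisfies(clause, assignment):
    """True iff at least one literal in the clause is true under the assignment."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def local_search_sat(clauses, n, select_variable, maxflips, maxtries):
    """Figure 1: return a satisfying assignment, or None if every try fails."""
    for _ in range(maxtries):
        # Restart: a new random total truth assignment over variables 1..n.
        a = {v: random.random() < 0.5 for v in range(1, n + 1)}
        for _ in range(maxflips):
            unsat = [c for c in clauses if not satisfies(c, a)]
            if not unsat:
                return a                  # A satisfies the clause set
            p = select_variable(clauses, a, unsat)
            a[p] = not a[p]               # flip the selected variable
    return None                           # "No satisfying assignment found"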

Here, local moves are "flips" of variables that are chosen by select-variable, usually according to a randomized greedy strategy. We refer to the sequence of flips between restarts (new total truth assignments) as a "try", and a sequence of tries finishing with a successful try as a "run". The parameter Maxtries can be used to ensure termination (though in our experiments we always set it to infinity). We also assume that the new assignments are all chosen randomly, though other methods have been considered (Gent & Walsh 1993).

Two Wsat Procedures

In our experiments, we used two variants of the Wsat ("walk" satisfiability) class of local search procedures. This class was introduced by Selman et al. (Selman, Kautz, & Cohen 1994) as: "Wsat makes flips by first randomly picking a clause that is not satisfied by the current assignment, and then picking (either at random or according to a greedy heuristic) a variable within that clause to flip." Thus Wsat is a restricted version of Figure 1, but there remains substantial freedom in the choice of heuristic. In this paper we focus on the two selection strategies given in Figure 2. Here, f_P denotes the number of clauses that are fixed (become satisfied) if variable P is flipped. Similarly, b_P is the number that break (become unsatisfied) if P is flipped. Hence, f_P - b_P is simply the net increase in the number of satisfied clauses.

Wsat/G
proc select-variable(Φ, A)
  C := a random unsatisfied clause
  with probability p:
    S := random variable in C
  with probability 1 - p:
    S := variable P in C with maximal f_P - b_P
  end
  return S
end

Wsat/SKC
proc select-variable(Φ, A)
  C := a random unsatisfied clause
  u := min over P in C of b_P
  if u = 0 then
    S := a variable P in C with b_P = 0
  else
    with probability p:
      S := random variable in C
    with probability 1 - p:
      S := variable P in C with minimal b_P
  end
  return S
end

Figure 2: Two Wsat variable selection strategies. Ties for the best variable are broken at random.

The first strategy², Wsat/G, is simple hill-climbing on the net number of satisfied clauses, but perturbed by noise: with probability p, a variable is picked randomly from the clause. The second procedure, Wsat/SKC, is that of a version of Wsat by Cohen, Kautz, and Selman.³ We give the details here because they were not present in the published paper (Selman, Kautz, & Cohen 1994), but are nonetheless rather interesting. In particular, Wsat/SKC uses a less obvious, though very effective, selection strategy. Firstly, hill-climbing is done solely on the number of clauses that break if a variable is flipped; the number of clauses that get fixed is ignored. Secondly, a random move is never made if it is possible to do a move in which no previously satisfied clauses become broken. In all, it exhibits a sort of "minimal greediness": it definitely fixes the one randomly selected clause, but otherwise merely tries to minimize the damage to the already satisfied clauses. In contrast, Wsat/G is greedier and will blithely cause lots of damage if it can get paid back by other clauses.

² Andrew Baker, personal communication.
³ Publicly available, ftp://ftp.research.att.com/dist/ai
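The two strategies of Figure 2 can then be sketched as follows (again our own unoptimized transcription, compatible with the driver above; break_fix_counts recomputes b_P and f_P from scratch, which a real solver would cache and update incrementally).

import random

def break_fix_counts(clauses, a, var):
    """Return (b_P, f_P) for flipping var: clauses broken and clauses fixed."""
    b = f = 0
    for c in clauses:
        true_lits = [l for l in c if a[abs(l)] == (l > 0)]
        if len(true_lits) == 1 and abs(true_lits[0]) == var:
            b += 1    # var is the clause's only support, so flipping breaks it
        elif not true_lits and any(abs(l) == var for l in c):
            f += 1    # clause is unsatisfied and mentions var, so flipping fixes it
    return b, f

def select_wsat_g(clauses, a, unsat, p=0.5):
    """Wsat/G: noisy hill-climbing on the net gain f_P - b_P."""
    c = random.choice(unsat)
    cand = sorted({abs(l) for l in c})
    if random.random() < p:
        return random.choice(cand)
    net = {}
    for v in cand:
        b, f = break_fix_counts(clauses, a, v)
        net[v] = f - b
    best = max(net.values())
    return random.choice([v for v in cand if net[v] == best])   # random tie-break

def select_wsat_skc(clauses, a, unsat, p=0.5):
    """Wsat/SKC: minimize breaks; never move randomly if a free move exists."""
    c = random.choice(unsat)
    cand = sorted({abs(l) for l in c})
    b = {v: break_fix_counts(clauses, a, v)[0] for v in cand}
    if min(b.values()) == 0:
        return random.choice([v for v in cand if b[v] == 0])    # free move
    if random.random() < p:
        return random.choice(cand)
    best = min(b.values())
    return random.choice([v for v in cand if b[v] == best])     # random tie-break

Passing either function as the select_variable argument of local_search_sat above reproduces the control flow of the corresponding Wsat variant, modulo efficiency.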

RETROSPECTIVE VARIATION OF MAXFLIPS

We now describe a simple probabilistic method for efficiently determining the Maxflips dependence of the mean runtime of a randomized local search procedure such as Wsat. We use the term "retrospective" because the parameter is varied after the actual experiment is over. As discussed earlier, a side-effect of the randomness introduced into local search procedures is that the runtime varies between runs. It is often the case that this "inter-run" variation is large, and to give meaningful runtime results we need an average over many runs. Furthermore, we will need to determine the mean runtime over a range of values of Maxflips. The naive way to proceed is to do totally independent sets of runs at many different Maxflips values. However, this is rather wasteful of data, because the successful try on a run often uses many fewer flips than the current Maxflips, and so we should be able to re-use it (together with the number of failed tries) to produce results for smaller values of Maxflips.

Suppose we take a single problem instance φ and make many runs of a local search procedure with Maxflips = m_D, resulting in a sample with a total of N tries. The goal is to make predictions for Maxflips = m < m_D. Label the successful tries by i, and let x_i be the number of flips it took try i to succeed. Write the bag of all successful tries as S_0 = {x_1, ..., x_l} and define a reduced bag S_0^m by removing tries that took longer than m:

$$S_0^m := \{\, x_i \in S_0 \mid x_i \le m \,\}. \tag{1}$$

We are assuming there is randomness in each try and no learning between tries, so we can consider the tries to be independent. From this it follows that the new bag provides us with information on the distribution of flips for successful tries with Maxflips = m. Also, an estimate for the probability that a try will succeed within m flips is simply

$$p_m \approx \frac{|S_0^m|}{N}. \tag{2}$$

Together, S_0^m and p_m allow us to make predictions for the behaviour at Maxflips = m. In this paper we are concerned with the expected (mean) number of flips E_{φ,m} for the instance φ under consideration. Let $\bar{S}_0^m$ be the mean of the elements in the bag S_0^m. With probability p_m, the solution will be found on the first try, in which case we expect $\bar{S}_0^m$ flips. With probability (1 - p_m) p_m, the first try will fail but the second will succeed, in which case we expect m + $\bar{S}_0^m$ flips, and so on. Hence,

$$E_{\phi,m} = \sum_{k=0}^{\infty} (1 - p_m)^k\, p_m\, \left(k\, m + \bar{S}_0^m\right) \tag{3}$$

which simplifies to give the main result of this section:

$$E_{\phi,m} = (1/p_m - 1)\, m + \bar{S}_0^m \tag{4}$$

with p_m and $\bar{S}_0^m$ estimated from the reduced bag as defined above. This is as expected, since 1/p_m is the expected number of tries. It is clearly easy to implement a system that takes a single data set obtained at Maxflips = m_D and estimates the expected number of flips for many different smaller values of m.

Note that a seemingly more direct and obvious method would be to take the bag of all runs rather than tries, and then simply discard runs in which the final try took longer than m. However, such a method would discard the information that the associated tries all took longer than m. In contrast, our method captures this information: the entire population of tries is used.

Instance Collections

To deal with a collection, C, of instances, we apply rpv to each instance individually and then proceed exactly as if this retrospectively simulated data had been obtained directly. For example, the expected mean number of flips for C is

$$E_m = \frac{1}{|C|} \sum_{\phi \in C} E_{\phi,m}. \tag{5}$$

Note that this rpv approach is not restricted to means, but also allows the investigation of other statistical measures such as standard deviation or percentiles.

Practical Application of RPV

The primary limit on the use of rpv arises from the need to ensure that the bag of successful tries does not get too small and invalidate the estimate for p_m. Since the bag size decreases as we decrease m, it follows that there will be an effective lower bound on the range over which we can safely apply rpv from a given m_D. This can be offset by collecting more runs per instance. However, there is a tradeoff to be made: if trying to make predictions at too small a value of m, it becomes more efficient to give up on trying to use the data from Maxflips = m_D and instead make a new data collection at smaller Maxflips. This problem with the bag size is exacerbated by the fact that different instances can have very different behaviours, and hence different ranges over which rpv is valid. It would certainly be possible to do some analysis of the errors arising from the rpv. The data collection system could even do such an analysis to monitor current progress and then concentrate new runs on the instances and values of Maxflips for which the results are most needed. In practice, we followed a simpler route: we made a fixed number of runs per instance and then accepted the rpv results only down to those values of m for which some fixed large fraction of instances still had a large enough bag size. Hence, rpv does not always remove the need to consider data collection at various values of Maxflips; however, it does allow us to collect data at more widely separated Maxflips values and then interpolate between the resulting "direct" data points. This saves a time-consuming fine-grained data collection, or a binary search through Maxflips values.
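Equations (1)-(5) translate directly into code. A minimal sketch, assuming success_flips holds the bag S_0 for one instance and total_tries is N (all names are our own):

def rpv_expected_flips(success_flips, total_tries, m):
    """Estimate E_{phi,m} for one instance from data collected at Maxflips = m_D.

    success_flips -- the bag S_0: flips used by each successful try (the x_i)
    total_tries   -- N, the total number of tries made (successes + failures)
    m             -- the Maxflips value to simulate, m <= m_D
    """
    bag = [x for x in success_flips if x <= m]   # S_0^m, equation (1)
    if not bag:
        raise ValueError("empty bag: m too small to estimate p_m from this sample")
    p_m = len(bag) / total_tries                 # equation (2)
    mean_bag = sum(bag) / len(bag)               # mean of S_0^m
    return (1 / p_m - 1) * m + mean_bag          # equation (4)

def rpv_collection_mean(samples, m):
    """Equation (5): E_m averaged over a collection of per-instance samples,
    where each sample is a (success_flips, total_tries) pair."""
    return sum(rpv_expected_flips(s, n, m) for s, n in samples) / len(samples)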

EXPERIMENTAL RESULTS

To evaluate the performance of satisfiability procedures, a class of randomized benchmark problems, Random 3SAT, has been studied extensively (Mitchell, Selman, & Levesque 1992; Mitchell 1993; Crawford & Auton 1996). Random 3SAT provides a ready source of hard, scalable problems. Problems in Random k-SAT with n variables and l clauses are generated as follows: a random subset of size k of the n variables is selected for each clause, and each variable is negated with probability 1/2. If instances are taken from near the crossover point (where 50% of the randomly generated problems are satisfiable), then the fastest systematic algorithms, such as Tableau (Crawford & Auton 1996), show a well-behaved increase in hardness: the time required scales as a simple exponential in n.

In the following we present results for the performance of both variants of Wsat on satisfiable Random 3SAT problems at the crossover point. We put particular emphasis on finding the Maxflips value m* at which the mean runtime, averaged over all instances, is a minimum. Note that the clause/variable ratio is not quite constant at the crossover point but tends to be slightly higher for small n. Hence, to avoid "falling off the hardness peak", we used the experimental results (Crawford & Auton 1996) for the number of clauses at crossover, rather than using a constant value such as 4.3n. To guarantee a fair sample of satisfiable instances, we used Tableau to filter out the unsatisfiable instances. At n = 400 this took about 2-4 hours per instance, so even this part was computationally nontrivial, and in fact it turned out to be the limiting factor for the maximum problem size.

For Wsat/SKC, the setting of the noise parameter p has been reported to be optimal between 0.5 and 0.6 (Selman, Kautz, & Cohen 1994). We found evidence that such values are also close to optimal for Wsat/G; hence we have produced all results here with p = 0.5 for both Wsat variants, but will discuss this further in the next section.

We present the experiments in three parts. Firstly, we compare the two variants of Wsat using a small number of problem instances but over a wide range of Maxflips values, to show the usage of rpv in determining their Maxflips dependencies. We then concentrate on the more efficient Wsat/SKC, using a medium number of instances, and investigate the scaling properties over the range n = 25, ..., 400 variables. This part represents the bulk of the data collection, and relied heavily on rpv. Finally, we look at a large data sample at n = 200 to check for outliers.
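The Random 3SAT generation procedure described above is easily reproduced; a minimal Python sketch (filtering for satisfiable instances, as done with Tableau in the paper, is omitted):

import random

def random_3sat(n, l):
    """Generate l clauses over n variables: each clause picks a random
    3-subset of the variables and negates each with probability 1/2."""
    return [[v if random.random() < 0.5 else -v
             for v in random.sample(range(1, n + 1), 3)]
            for _ in range(l)]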

Overall Maxflips Dependence

Our aim here is to show the usage of rpv and also to give a broad picture of how the mean runtime E_m varies with m for the two different Wsat variants. We took a fixed, but randomly selected, sample of 10^3 instances at (n, l) = (200, 854), for which we made 200 runs on each instance at m = 5k, 20k, 80k, 800k, 8000k, and ∞, using both variants of Wsat. At each m we directly calculated E_m. We then applied rpv to the samples to extend the results down towards the next data point. The results are plotted in Figure 3. Here the error bars are the 95% confidence intervals for the particular set of instances, i.e. they reflect only the uncertainty from the limited number of runs, and not from the limited number of instances. Note that the rpv lines from one data point do indeed match the lower, directly obtained, points. This shows that rpv does not introduce significant errors when used to extrapolate over these ranges.

Figure 3: Variation of E_m against Maxflips = m. Based on 1k instances of (200, 854) with 200 runs per instance.

Clearly, at p = 0.5, the two Wsat algorithms exhibit very different dependencies on Maxflips: selecting too large a value slows Wsat/G down by a factor of about 4, in contrast to a slowdown of about 20% for Wsat/SKC. At Maxflips = ∞, the slowest try for Wsat/G took 90 minutes, against 5 minutes for the worst effort from Wsat/SKC. As has been observed before (Gent & Walsh 1995), "random walk" can significantly reduce the Maxflips sensitivity of a local search procedure: restarts and noise fulfill a similar purpose by allowing for downhill moves and driving the algorithm around in the search space. Experimentally, we found that while the peak performance E_{m*} varies only very little with small variation of p (±0.05), the Maxflips sensitivity can vary quite remarkably. This topic warrants further study, and again rpv is useful since it effectively reduces the two-dimensional parameter space (p, m) to just p.

While the average difference in E_m between the two Wsats on Random 3SAT is typically about a factor of two, we found certain instances of circuit synthesis problems (Selman, Kautz, & Cohen 1994) where Wsat/SKC is between 15 and 50 times faster. Having identified the better Wsat, we will next be concerned with its optimized scaling.

Extensive Experiments on Wsat/SKC

We would now like to obtain accurate results for the performance of Wsat/SKC on the Random 3SAT domain. So far, we have only considered confidence intervals resulting from the inter-run variation. Unfortunately, for the previous sample of 1000 instances, the error in E_m arising from the inter-instance variation is much larger (about 25%). This means that to obtain reasonable total errors we needed to look at larger samples of instances.

We were mostly interested in the behaviour at and near m*. Preliminary experiments indicated that making the main data collection at just m = 0.5n^2 would allow us to safely use rpv to study the minimum. Hence, using Wsat/SKC at p = 0.5, we did 200 runs per instance at m = 0.5n^2, and then used rpv to find m*. The results are summarized in Table 1, and we now explain the other entries in this table. We found it useful to characterize the E_m curves by values of Maxflips which we denote by m_{k-}, defined as the largest value of m such that m < m* and E_m is k% larger than E_{m*}. Similarly, we define m_{k+} as the smallest value of m > m* with the same property. We can easily read these off from the curve produced by rpv. The rpv actually produces the set of E_{φ,m} values, and so we also sorted these and calculated various percentiles of the distribution (the 99th percentile means that we expect 99% of the instances to be solved, on average, within this number of flips). Finally, the error on E_{m*} is the 95% confidence level as obtained from the standard deviation of the E_{φ,m*} sample (inter-instance). The error from the limited number of runs was negligible in comparison. In the next section we interpret this data with respect to how m* and E_{m*} vary with n. We did not convert flips to times, because the actual flip rate varies remarkably little (from about 70k flips/sec down to about 60k flips/sec).

Figure 4: The variation of m_{5-} and m* with n. We use m_{1+} and m_{1-} as the upper and lower error bounds on m*.
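The definitions of m*, m_{k-}, and m_{k+} translate directly into code. A sketch, assuming curve is a hypothetical list of (m, E_m) pairs, sorted by m, as produced by rpv:

def characterize_curve(curve, k):
    """Return (m_star, m_k_minus, m_k_plus) as defined in the text."""
    m_star, e_star = min(curve, key=lambda point: point[1])   # optimal Maxflips
    threshold = (1 + k / 100) * e_star
    left  = [m for m, e in curve if m < m_star and e > threshold]
    right = [m for m, e in curve if m > m_star and e > threshold]
    m_k_minus = max(left) if left else None    # largest m < m* with E_m k% above E_{m*}
    m_k_plus  = min(right) if right else None  # smallest m > m* with the same property
    return m_star, m_k_minus, m_k_plus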

Scaling of Optimal Maxflips

In Figure 4 we can see how m* and m_{5-} vary with n. In order to interpret this data we fitted the function a n^b against m_{5-}, because the E_m curves are rather flat and so m* is relatively ill-defined. However, the two seem to have the same scaling, and also m* > m_{5-} by definition. We obtained a best fit⁴ with the values a = 0.02 and b = 2.39 (the resulting line is also plotted in Figure 4). Similar results for Wsat/G also indicate a very similar scaling, of n^2.36. For comparison, Hsat, a non-randomized Gsat variant that incorporates a history mechanism, has been observed to have an m* scaling of n^1.65 (Gent & Walsh 1995).

⁴ Using the Marquardt-Levenberg algorithm as implemented in Gnufit by C. Grammes at the Universität des Saarlandes, Germany, 1993.

Figure 5: The scaling behaviour of Wsat/SKC at crossover. The data points are E_{m*} together with its 95% confidence limits. The lines are best fits of the functions given in the text.

Scaling of Performance

In Figure 5 we plot the variation of E_{m*} with n. We can see that the scaling is not as fast as a simple exponential in n; however, the upward curve of the corresponding log-log plot (not presented) indicates that it is also worse than a simple monomial. Unfortunately, we know of no theoretical scaling result that could reasonably be compared with this data. For example, results are known (Koutsoupias & Papadimitriou 1992) for the case where the number of clauses is Ω(n²), but they do not apply because the crossover is at O(n). Hence we approached the scaling from a purely empirical perspective, by trying to fit functions to the data that can reveal certain characteristics of the scaling. We also find it very interesting that m* so often seems to fit a simple power law, but we are not aware of any explanation for this. However, the fits do provide an idea of the scaling that might have practical use, or maybe even lead to some theoretical arguments. We could find no good 2-parameter fit to the E_{m*} curve; however, the following functions give good fits, and provide the lines in Figure 5:

$$f(n) = f_1\, n^{\,f_2 + f_3 \lg(n)} \qquad\qquad g(n) = g_1 \exp\!\big(n^{g_2} (1 + g_3/n)\big)$$

(For f(n), we found the values f_1 = 12.5 ± 2.02, f_2 = -0.6 ± 0.07, and f_3 = 0.4 ± 0.01.) The fact that such different functions give good fits to the data illustrates the obviously very limited discriminating power of such attempts at empirical fits. We certainly do not claim that these are asymptotic complexities! However, we include them as indicative. The fact that f_3 > 0 indicates a scaling which is similar to a simple power law except that the exponent is slowly growing. Alternatively, the fact that we found g_2 ≈ 0.4 can be regarded as a further indication that the scaling is slower than a simple exponential (which would have g_2 = 1).

Vars  Cls   m*      m_{5-}  E_{m*}     95%-cnf.  Median  99-perc.
25    113   70      37      116        2         94      398
50    218   375     190     591        12        414     2,876
100   430   2,100   1,025   3,817      111       2,123   27,367
150   641   6,100   2,925   13,403     486       5,891   120,597
200   854   11,900  5,600   36,973     2,139     12,915  406,966
250   1066  15,875  8,800   92,915     8,128     25,575  1,050,104
300   1279  23,200  13,100  171,991    15,455    43,314  2,121,809
350   1491  32,000  19,300  334,361    69,850    65,574  4,258,904
400   1704  43,500  27,200  528,545    114,899   96,048  11,439,288

Table 1: Experimental results for Wsat/SKC with p = 0.5 on Random 3SAT at crossover. The results are based on 10k instances (25-250 variables), 6k instances (300 vars), 3k instances (350 vars), and 1k instances (400 vars).
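As an illustration of this kind of empirical fit, the following sketch applies scipy's curve_fit (in place of the Gnufit tool cited in footnote 4) to the E_{m*} column of Table 1; reading lg as log base 2 is our assumption:

import numpy as np
from scipy.optimize import curve_fit

def f(n, f1, f2, f3):
    # f(n) = f1 * n^(f2 + f3 * lg n); lg taken as log base 2 here (assumption).
    return f1 * n ** (f2 + f3 * np.log2(n))

n = np.array([25, 50, 100, 150, 200, 250, 300, 350, 400], dtype=float)
e = np.array([116, 591, 3817, 13403, 36973, 92915, 171991,
              334361, 528545], dtype=float)              # E_{m*} from Table 1

params, _ = curve_fit(f, n, e, p0=(12.5, -0.6, 0.4))     # start near the paper's fit
print(dict(zip(("f1", "f2", "f3"), np.round(params, 3))))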

Testing for Outliers

One immediate concern is whether the average runtimes quoted above are truly meaningful for the problem class Random 3SAT. It could easily be that the effect of outliers, instances that Wsat takes much longer to solve, eventually dominates. That is, as the sample size increases, we could encounter sufficiently hard instances with sufficient frequency that the mean would drift upwards (Mitchell 1993). To check for this effect we decided to concentrate on (n, l) = (200, 854) and took 10^5 instances. Since we only wanted to check for outliers, we did not need high-accuracy estimates, and it was sufficient to do just 20 runs per instance of Wsat/SKC at Maxflips = 80k, without using rpv. We directly calculated the mean for each instance; in Figure 6 we plot the distribution of lg(E_{φ,m}).

Figure 6: From a total of 10^5 instances, the number of instances for which the expected number of flips E_{φ,m} is in the range 2^i ≤ E_{φ,m} < 2^{i+1}, plotted against i. This is for Wsat/SKC at m = 80k, based on 20 runs per instance.

If the tail of the distribution were too long, or if there were signs of a bimodal distribution with a small secondary peak at a very large E_{φ,m}, then we would be concerned about the validity of the means quoted above. However, we see no signs of any such effects. On the contrary, the distribution seems to be quite smooth, and its mean is consistent with the previous data. We do note, though, that the distribution tail is sufficiently large that a significant fraction of the total runtime is spent on relatively few instances. This means that sample sizes are effectively reduced, which is why we used 10k samples for the scaling results where computationally feasible.
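The binning behind Figure 6 amounts to a one-line histogram; a sketch, assuming instance_means holds the per-instance means E_{φ,m} (a name of our own):

import math
from collections import Counter

def log2_histogram(instance_means):
    """Count instances whose mean falls in each range [2^i, 2^{i+1}), as in Figure 6."""
    return Counter(int(math.log2(x)) for x in instance_means)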

RELATED WORK

Optimal parameter settings for Maxflips and scaling results have been examined before by Gent and Walsh (Gent & Walsh 1993; 1995). However, for each sample point (each n), Maxflips had to be optimized experimentally. Thus, the computational cost of a systematic Maxflips optimization for randomized algorithms was too high for problems larger than 100 variables. This experimental range was not sufficient to rule out a polynomial runtime dependence of about order 3 (in the number of variables) for Gsat (Gent & Walsh 1993).

A sophisticated approach to the study of incomplete randomized algorithms is the framework of Las Vegas algorithms, which has been used by Luby, Sinclair, and Zuckerman (Luby, Sinclair, & Zuckerman 1993) to examine optimal speedup. They have shown that for a single instance there is no advantage to having Maxflips vary between tries, and they present optimal cutoff times based on the distribution of the runtimes. These results, however, are not directly applicable to average runtimes for a collection of instances.

Local search procedures also have close relations to simulated annealing (Selman & Kautz 1993). Indeed, combinations of simulated annealing with Gsat have been tried for hard SAT problems (Spears 1995). We can even look upon the restart as a short period of very high temperature that drives the variable assignment to a random value. In this case we find it

interesting that work in simulated annealing also has cases in which periodic reheating is useful (Boese & Kahng 1994). We intend to explore these connections further.

CONCLUSIONS

We have tried to address the four issues raised in the introduction empirically. In order to allow for optimizing Maxflips, we presented retrospective parameter variation (rpv), a simple resampling method that significantly reduces the amount of experimentation needed to optimize certain local search parameters. We then studied two different variants of Wsat, which we label Wsat/G and Wsat/SKC (the latter due to Selman et al.). The application of rpv revealed better performance for Wsat/SKC. An open question is whether the relative insensitivity of Wsat/SKC to restarts carries over to realistic problem domains. We should note that for general SAT problems the Maxflips scaling is of course not a simple function of the number of variables and p only, but is also affected by other properties, such as the problem constrainedness.

To study the scaling behaviour of local search, we experimented with Wsat/SKC on hard Random 3SAT problems over a wide range of problem sizes, applying rpv to optimize performance. Our experimental results strongly suggest subexponential scaling on Random 3SAT, and we can thus support previous claims (Selman, Levesque, & Mitchell 1992; Gent & Walsh 1993) that local search scales significantly better than Davis-Putnam-related procedures. Unfortunately, rpv cannot be used to directly determine the impact of the noise parameter p; however, it is still useful, since instead of varying two parameters, only p has to be varied experimentally, while different Maxflips values can be simulated.

We plan to extend this work in two further directions. Firstly, we intend to move closer to real problems. For example, investigations of binary encodings of scheduling problems reveal a strong correlation between the number of bottlenecks in the problems and the optimal Maxflips for Wsat/G. Secondly, we would like to understand the Maxflips dependence in terms of the behaviours of individual instances.

ACKNOWLEDGEMENTS

We would like to thank the members of CIRL, particularly Jimi Crawford, for many helpful discussions, and Andrew Baker for the use of his Wsat. We thank David McAllester and our anonymous reviewers (for both AAAI and ECAI) for very helpful comments on this paper. The experiments reported here were made possible by the use of three SGI (Silicon Graphics Inc.) Power Challenge systems purchased by the University of Oregon Computer Science Department through NSF grant STI-9413532.

References

Boese, K., and Kahng, A. 1994. Best-so-far vs. where-you-are: implications for optimal finite-time annealing. Systems and Control Letters 22(1):71-78.

Cheeseman, P.; Kanefsky, B.; and Taylor, W. 1991. Where the really hard problems are. In Proceedings IJCAI-91.

Crawford, J., and Auton, L. 1996. Experimental results on the crossover point in Random 3SAT. Artificial Intelligence. To appear.

Gent, I., and Walsh, T. 1993. Towards an understanding of hill-climbing procedures for SAT. In Proceedings AAAI-93, 28-33.

Gent, I., and Walsh, T. 1995. Unsatisfied variables in local search. In Hybrid Problems, Hybrid Solutions (Proceedings of AISB-95). IOS Press.

Gu, J. 1992. Efficient local search for very large-scale satisfiability problems. SIGART Bulletin 3(1):8-12.

Koutsoupias, E., and Papadimitriou, C. 1992. On the greedy algorithm for satisfiability. Information Processing Letters 43:53-55.

Luby, M.; Sinclair, A.; and Zuckerman, D. 1993. Optimal speedup of Las Vegas algorithms. Technical Report TR-93-010, International Computer Science Institute, Berkeley, CA.

Minton, S.; Johnston, M. D.; Philips, A. B.; and Laird, P. 1990. Solving large-scale constraint satisfaction and scheduling problems using a heuristic repair method. Artificial Intelligence 58:161-205.

Mitchell, D.; Selman, B.; and Levesque, H. 1992. Hard and easy distributions of SAT problems. In Proceedings AAAI-92, 459-465.

Mitchell, D. 1993. An empirical study of random SAT. Master's thesis, Simon Fraser University.

Selman, B., and Kautz, H. 1993. An empirical study of greedy local search for satisfiability testing. In Proceedings IJCAI-93.

Selman, B.; Kautz, H.; and Cohen, B. 1994. Noise strategies for improving local search. In Proceedings AAAI-94, 337-343.

Selman, B.; Levesque, H.; and Mitchell, D. 1992. A new method for solving hard satisfiability problems. In Proceedings AAAI-92, 440-446.

Spears, W. 1995. Simulated annealing for hard satisfiability problems. DIMACS Series in Discrete Mathematics and Theoretical Computer Science. To appear.

Walser, J. 1995. Retrospective analysis: Refinements of local search for satisfiability testing. Master's thesis, University of Oregon.