Variable Dependency in Local Search: Prevention is Better than Cure
Steve Prestwich

Overview
LS often scales up better than BT, but has the reputation of being inferior to BT on structured SAT instances:
• it does badly on SAT solver competition industrial benchmarks
• does LS need a boost (cf clause learning)?
• hybridising LS with propagation or explicitly handling variable dependencies helps, but BT is still unbeaten on these problems
improving LS on structured problems would have many practical applications, perhaps solving larger instances of real-world applications than is currently possible
but first we must understand the cause of its poor performance

conjecture
current modelling practices are (unintentionally) biased in favour of BT:

• in SAT modelling we often eliminate symmetry, but symmetry appears harmless or even helpful to LS

• SAT encodings of constraints sometimes improve consistency reasoning of UP, but UP is not used in most LS algorithms (and the ladder structure in some of these encodings harms LS)

• dependent variables may be introduced when aiming for compactness (eg Tseitin)

more specific conjecture
the model feature to blame for LS’s poor performance is often dependent variables:
• dependencies are known to slow down LS [Kautz, McAllester & Selman], especially in long chains [Prestwich; Wei & Selman]
I test these conjectures by remodelling 2 problems whose large instances have long resisted solution by local search: parity learning & Towers of Hanoi as STRIPS planning
I devise new encodings with reduced variable dependency (and higher solution densities) and boost LS performance by several orders of magnitude in both cases: 32-bit & 6-disk instances are solved for the first time using a standard SAT local search algorithm (RSAPS)

parity learning
this is a well-known SAT benchmark
given vectors x_i = (x_i1, . . . , x_in) (i = 1 . . . m) with each x_ij ∈ {0, 1}, a vector y = (y_1, . . . , y_m) and an error tolerance integer k, find a vector a = (a_1, . . . , a_n) s.t.
|{i : parity(a · x_i) ≠ y_i}| ≤ k
to make hard instances set m = 2n and k = 7n/8 (n is the number of bits)
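To make the acceptance test concrete, here is a minimal sketch of the check that defines a solution; the function names and the toy instance are purely illustrative (it is not a SATLIB par instance).

```python
def parity(bits):
    """Parity (XOR) of an iterable of 0/1 values."""
    p = 0
    for b in bits:
        p ^= b
    return p

def disagreements(a, xs, y):
    """Number of samples i for which parity(a . x_i) differs from y_i."""
    count = 0
    for xi, yi in zip(xs, y):
        if parity(ai & xij for ai, xij in zip(a, xi)) != yi:
            count += 1
    return count

# toy instance: n = 4 bits, m = 2n = 8 samples, tolerance k = 3
xs = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1],
      [1, 0, 0, 1], [0, 1, 0, 1], [1, 1, 1, 0], [0, 0, 0, 1]]
y = [1, 0, 1, 0, 1, 1, 0, 1]
k = 3
a = [1, 0, 1, 0]
print(disagreements(a, xs, y) <= k)   # True iff a solves this toy instance
```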


hard for both BT & LS, especially 32-bit instances:

• only quite recently solved by BT [Bailleux & Boufkhad; Baumgartner & Massacci; Li; Warners & van Maaren]

• only in IJCAI’07 solved by LS (specially-designed algorithm) [Pham, Thornton & Sattar]

I show that, after reformulation, an off-the-shelf LS (RSAPS) solves them in similar time


“standard encoding” (SATLIB parX-Y-c): 3 families of clauses:

• calculate parities of a · x_i
• compute disagreements in parities

• encode a cardinality constraint to limit disagreements

(n is a power of 2 so cardinality is easy)


my encoding
variables A_i contain the solution, P_j denote parities
each scalar product a · x_j has parity P_j:
P_j ≡ ⊕_{i ∈ τ_j} A_i   (τ_j = {i | x_ij = T})
≤ k of the m literals are true: LE(k, π_1, . . . , π_m) (π_j is ¬P_j if y_j = T and P_j if y_j = F)
new SAT encodings of cardinality and parity with very short dependency chains: better for LS

cardinality constraint
LE(k, π_1, . . . , π_m) says that ≤ k literals are T
I use a bitwise encoding from a CP’06 workshop paper that worked well on clique problems (but not yet tested against standard cardinality encodings)
first consider k = 1 (AMO): define variables b_1 . . . b_⌈log2 m⌉ and add the clause ¬π_i ∨ b_j [or ¬π_i ∨ ¬b_j] if bit j of the binary representation of i − 1 is 1 [or 0], for j = 1 . . . ⌈log2 m⌉
(O(log m) new variables and O(m log m) binary clauses)
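A sketch of the clause generation for this bitwise AMO encoding, using DIMACS-style signed integer literals; the function name and variable-numbering convention are my own assumptions, not code from the paper.

```python
import math

def bitwise_amo(lits, next_var):
    """At-most-one over `lits` via the bitwise encoding: each literal forces
    the auxiliary bits to spell out the binary representation of its index,
    so no two literals can be true simultaneously.
    Returns (clauses, next unused variable number)."""
    m = len(lits)
    if m <= 1:
        return [], next_var
    nbits = math.ceil(math.log2(m))
    bits = [next_var + j for j in range(nbits)]   # auxiliary variables b_1..b_nbits
    clauses = []
    for i, lit in enumerate(lits):                # i plays the role of i-1 in the slide
        for j, b in enumerate(bits):
            if (i >> j) & 1:
                clauses.append([-lit, b])         # bit j of i is 1
            else:
                clauses.append([-lit, -b])        # bit j of i is 0
    return clauses, next_var + nbits

# example: AMO over variables 1..5, auxiliaries numbered from 6
cls, nv = bitwise_amo([1, 2, 3, 4, 5], 6)
print(len(cls), nv)   # 15 clauses (5 * ceil(log2 5)), next free variable is 9
```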

now k > 1: suppose we have k bins, and define x_ij = T if π_i is placed in bin j
every true π_i is in a bin:
π_i → ∨_j x_ij
≤ 1 π_i may be placed in each bin: AMO(x_1j , . . . , x_mj) for each bin j, using the bitwise encoding
highly symmetric (the π_i can be permuted among bins) but symmetry and LS go well together
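A sketch of this k-bin construction on top of the bitwise AMO above: a true literal must occupy a bin and each bin holds at most one literal, so at most k literals can be true. Again the helper names and variable numbering are my assumptions.

```python
def le_k(lits, k, next_var):
    """LE(k): at most k of `lits` are true, via k bins.
    x[i][j] means "literal i is placed in bin j"; uses bitwise_amo() from
    the previous sketch for the per-bin at-most-one constraints."""
    m = len(lits)
    x = [[next_var + i * k + j for j in range(k)] for i in range(m)]
    next_var += m * k
    clauses = []
    for i, lit in enumerate(lits):
        clauses.append([-lit] + x[i])                  # pi_i -> x_i1 v ... v x_ik
    for j in range(k):
        column = [x[i][j] for i in range(m)]
        amo, next_var = bitwise_amo(column, next_var)  # at most one literal in bin j
        clauses.extend(amo)
    return clauses, next_var
```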


parity constraints
(i) we can SAT-encode ⊕_{i=1..p} P_i ≡ k by enumeration: exponential encoding
(ii) decompose via new variables:
P_1 ⊕ z_1 ≡ k   P_2 ⊕ z_2 ≡ z_1   . . .   P_{p−1} ⊕ z_{p−1} ≡ z_{p−2}   P_p ≡ z_{p−1}
and use the exponential encoding for the binary & ternary constraints: linear encoding (similar to the SATLIB encodings)
drawback: long chain of variable dependencies
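A sketch of the exponential (enumeration) encoding that the linear, bisection and shallow encodings use for their small sub-constraints: one clause forbidding each wrong-parity assignment. It is written for a constant right-hand side; a parity variable on the right can be folded in as one more literal. Function and variable names are illustrative.

```python
from itertools import product

def xor_clauses(varnums, rhs):
    """Enumeration encoding of v_1 xor ... xor v_p = rhs (rhs in {0, 1}):
    2^(p-1) clauses of length p, one forbidding each wrong-parity assignment."""
    clauses = []
    for bits in product([0, 1], repeat=len(varnums)):
        if sum(bits) % 2 != rhs:
            # clause falsified exactly by this wrong-parity assignment
            clauses.append([v if b == 0 else -v for v, b in zip(varnums, bits)])
    return clauses

print(xor_clauses([1, 2, 3], 1))   # 4 clauses forbidding the even-parity assignments
```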


(iii) bisect the constraint, solve the 2 subproblems, and merge the results by a ternary constraint: bisection encoding
replaces a chain of length p by a tree of depth log p
(iv) decompose ⊕_{i=1..p} P_i ≡ k into
⊕_{i=1..α} P_i ≡ k_1   ⊕_{i=α+1..2α} P_i ≡ k_2   . . .   ⊕_{i=p−α+1..p} P_i ≡ k_β   and   ⊕_{i=1..β} k_i ≡ k
where β = ⌈p/α⌉ and the tree branching factor α satisfies 1 < α < p; tree depth 2
exponentially encode the remaining parity constraints: shallow encoding
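As a worked illustration (the numbers are chosen for exposition, not taken from the paper): with p = 30 and branching factor α = 10 the shallow encoding gives β = 3 group constraints plus one combining constraint,
⊕_{i=1..10} P_i ≡ k_1   ⊕_{i=11..20} P_i ≡ k_2   ⊕_{i=21..30} P_i ≡ k_3   k_1 ⊕ k_2 ⊕ k_3 ≡ k
each group constraint relates 11 variables and so costs about 2^10 = 1024 clauses under the exponential encoding, whereas exponentially encoding the whole 30-variable constraint at once would cost on the order of 2^29 clauses, and no dependency chain is longer than 2.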


results
bisection and linear encodings very similar in flips and time to the SATLIB encodings (see paper): are trees as harmful as chains?
best results (β = 10, median secs to find a solution):

n         8      12     16     20     24     28       32
linear    0.00   0.02   11     408    —      —        —
shallow   0.00   0.03   0.47   11     272    3,640    49,633


extrapolation: ~2 years for n = 32 with the linear encoding
[Pham, Thornton & Sattar] report similar results in flips & time (but on an unspecified machine)
the best BT results are better: improve LS by similar preprocessing?


Towers of Hanoi
SAT-based STRIPS planning achieves very good results in competitions
ToH-as-STRIPS up to 6 discs has been solved by BT, but only up to 4 discs by LS (hardness increases rapidly)
perhaps because it has dependency chains and only 1 solution [Selman]
I design a new encoding that eliminates both these features


standard STRIPS as SAT
set an upper bound on the number of discrete times (the plan length)
define variables for (i) the state after each action, and (ii) the actions at each time
define clauses for (i) linearity (≤ 1 action at any time), (ii) actions imply their preconditions and effects, (iii) frame axioms (explanatory version: list the reasons a fluent can change at each time), (iv) the initial and goal states
a series of improvements for LS...
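As an illustration of the explanatory frame axioms (iii), here is a schematic instance written with the compact ToH fluents introduced on the next slide; it is my own rendering, not a clause quoted from the paper:
on(d, p, t) ∧ ¬on(d, p, t+1) → ∨_q move(d, p, q, t)
i.e. if disc d is on peg p at time t but not at time t+1, some action moving d off p must have occurred at time t.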


ToH domain knowledge
ToH is usually STRIPS-modelled like Blocks World: model which peg or disc each disc is on, and which pegs and discs are “clear”
instead we can just model which peg each disc is on: the order is implied (no large discs on small ones)
gives a more compact STRIPS model: no “clear” predicate, no discs “on” other discs, fewer actions
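A minimal sketch of the “action implies preconditions and effects” clauses for one move in this compact model; the helper var(), the disc numbering (1 = smallest) and the exact clause set are my assumptions about what such an encoding could look like, not the paper's encoding.

```python
def compact_move_clauses(d, p, q, t, var):
    """Clauses for action move(d, p -> q) at time t in the compact ToH model,
    where on(disc, peg, time) is the only fluent and discs are numbered from
    1 (smallest) upward; var(name) maps a proposition to a variable number."""
    a = var(('move', d, p, q, t))
    clauses = [
        [-a, var(('on', d, p, t))],       # precondition: d is currently on peg p
        [-a, var(('on', d, q, t + 1))],   # effect: d is on peg q afterwards
        [-a, -var(('on', d, p, t + 1))],  # effect: d is no longer on peg p
    ]
    for e in range(1, d):                 # every smaller disc e < d ...
        clauses.append([-a, -var(('on', e, p, t))])  # ... must not sit on top of d on peg p
        clauses.append([-a, -var(('on', e, q, t))])  # ... and must not be on the target peg q
    return clauses
```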


superparallelism
an important technique in planning: parallel plans allow more than one action at a given time
plan lengths (and SAT models) can be shorter
parallelism also increases the solution density of the SAT problem: a linear plan often corresponds to exponentially many parallel plans
there’s no parallelism in ToH, but we can create some by removing some exclusion axioms, eg...


• allow disk 1 to move from peg 1 to peg 2, and disk 2 to move from peg 3 to peg 2, at the same time

• allow disk 1 to move from peg 1 to peg 2, and disk 2 to move from peg 2 to peg 3, at the same time

these parallel plans can be transformed to linear ones in polynomial time: superparallelism (adds parallelism beyond any that is naturally present in the model)
drawback: we can’t insist on optimal plans

long-range dependencies
frame axioms create dependency chains
there seems to be no way to avoid these chains, as they are a property of the problem itself and not of the encoding
but we can break up the chain structure using the method of [Wei & Selman]: add implied clauses that create long-range dependencies between times further apart than 1 unit
I use a generalisation of explanatory frame axioms (GEF axioms) to time differences ≥ 1
adding all GEF axioms increases space complexity, but we can add a randomly-chosen subset of them
luckily Wei & Selman showed that adding a relatively small number was optimal, and I found the same (5%)
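One plausible shape for a GEF axiom over a time gap Δ ≥ 1, generalising the explanatory frame axiom sketched earlier (my own reading of the description above, not a formula quoted from the paper):
on(d, p, t) ∧ ¬on(d, p, t+Δ) → ∨_{t ≤ t′ < t+Δ} ∨_q move(d, p, q, t′)
for Δ = 1 this collapses to the ordinary explanatory frame axiom; larger Δ adds the implied, long-range clauses.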

implied clauses
I add exclusion axioms corresponding to two discs making the same move (this can never occur, because the larger disc’s preconditions are unsatisfied if the smaller one is on the same peg, so these clauses are redundant)
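Concretely (my own rendering, assuming move variables as in the earlier sketches): for two discs d_1 < d_2, the same pegs p, q and the same time t, add the clause
¬move(d_1, p, q, t) ∨ ¬move(d_2, p, q, t)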


results
execution time (seconds):

D   standard   compact   parallel   GEF
3   0.096      0.0058    0.0010     0.0010
4   —          5.8       0.0093     0.017
5   —          —         1.8        0.30
6   —          —         —          980

“—”: > 10⁹ flips
each technique greatly improves performance (> 5 orders of magnitude in flips for 4 discs)
first SAT LS results for 5 and 6 discs, comparable to the best BT results (though at the cost of reducing plan quality by superparallelism)
further improvements are possible by operator splitting (which reduces the space complexity of SAT-encoded planning problems) and by preprocessing with unit propagation and subsumption

results (continued)
also added GEF axioms to the standard model, but then could not solve 4 discs
more results: AdaptNovelty+ and VW are faster on the compact encoding than on the standard one, and even faster with superparallelism; AdaptNovelty+ is faster with GEF axioms, VW hardly affected (apart from the overhead of maintaining the additional clauses); ZChaff, SATZ & SATO are all improved by the compact encoding; ZChaff is faster with superparallelism, SATZ and SATO slower; SATO is faster with GEF axioms, ZChaff and SATZ slower
in other words: the compact encoding helps all algorithms, the other techniques mostly help LS but are erratic on BT (modelling for LS is distinct from modelling for BT)

summary
LS on hard structured problems can be hugely boosted by reformulation
reducing dependency chains and increasing solution density seem to be key techniques when modelling for LS, but not for BT [Minton, Johnston, Philips & Laird]
these differ from the aims of modelling for BT: symmetry elimination and consistency of unit propagation
modelling for LS is distinct from modelling for BT, and worth studying


aside
shouldn’t increased solution density also help BT?
not necessarily: structured SAT problems may contain clusters of solutions, and Minton et al.’s non-systematic search hypothesis is that LS benefits more than BT from high solution density
this is because LS is largely immune to clustering, while BT may start from a point far from any cluster


possible applications:
• the parity constraint shallow encoding may be useful for cryptanalysis
• the cardinality constraint encoding has many potential applications, but has not yet been tested against known encodings
• superparallelism can be applied to STRIPS models of other planning problems
• GEF axioms can be added to SAT-based planning systems
• BMC has a similar structure to planning and contains parity constraints, so it may also benefit

conclusion
LS can be hugely boosted on some structured problems by remodelling
no need for complex new algorithms or analysis? could combine the 2 approaches
