A parallel solving algorithm for quantified constraints problems
Jérémie Vautard
Arnaud Lallouet
Youssef Hamadi
University of Caen — GREYC BP 5186 - 14032 Caen
[email protected]
University of Caen — GREYC BP 5186 - 14032 Caen
[email protected]
Microsoft Research 7 J J Thomson Avenue Cambridge, United Kingdom
[email protected]
Abstract—Quantified constraint satisfaction problems have been the topic of an increasing number of studies in recent years. However, only sequential resolution algorithms have been proposed so far. This paper presents a parallel QCSP+ solving algorithm based on a problem-partition approach. It then discusses work-distribution policies and presents experimental results comparing several parameters.
I. INTRODUCTION
Quantified constraint satisfaction problems (QCSP) allow natural modeling of problems involving two adversaries (such as two-player games), or problems with uncertainty (e.g. robust scheduling, testing...), which generally lie out of reasonable reach of the constraint satisfaction framework (CSP). A CSP consists of a set of variables and a set of constraints on these variables. A solution of a CSP is an assignment of the variables such that all the constraints are satisfied. Most CSP solvers find such a solution by recursively enumerating all possible values of one variable and propagating domain reductions. For several years, parallel CSP solving procedures have been studied and tested against sequential ones. For example, a simple way to parallelize a CSP solver consists in solving in parallel every subproblem generated by the enumeration of a variable. Other frameworks have also been parallelized, such as boolean formula satisfaction (SAT) and quantified boolean formulas (QBF). However, as far as we know, no parallel approach has ever been proposed to solve QCSP. This paper introduces such a parallel algorithm for solving QCSP+, an extension of QCSP introduced in [1], and presents experiments on several instances of easy and hard problems.
II. PARALLEL SOLVING
Parallelism is widely used in artificial intelligence and operations research. In the more particular domain of CSPs, the idea of a parallel constraint solver or optimizer has been studied for years [7], [6]. Several approaches have been followed to have several processors solve one problem. A first one consists in dividing the search space of the problem into parts that the processing units will search independently. This problem partition is naturally given by the search tree: every node is itself a subproblem. In this approach, sharing the jobs can be done in several ways: a naive method consists in ensuring that the problem is split often enough to avoid
starvation. Work stealing is a more reactive solution: a subproblem being solved is split as soon as a processor needs work. Another approach is the portfolio-based one: basically, it consists in running several solvers in parallel on the same problem, so that the most efficient solver for this problem will quickly give a solution. It may use communication, each instance sending its own deductions to the others. The first approach is for example used in the Gecode CSP solver [3] since version 3. The parallel version of the solver uses a work-stealing method to keep every thread active. In boolean satisfaction (SAT), the award-winning solver ManySAT [4] is portfolio-based. It uses a Master/Slaves hierarchy among the processing units and relies widely on information sharing between instances. In the quantified domain, the QBF solver PQSOLVE [2] is based on a problem-dividing approach with a workload-balancing mechanism. More recently, Lewis et al. introduced PaQuBE in 2009 [5], which features conflict-clause sharing among processes.
III. THE QCSP+ FRAMEWORK
A. Formalism
1) CSPs: Let V be a set of variables. Each v ∈ V has a domain D_v. For a given W ⊆ V, we denote D_W the set of tuples on W, i.e. the cartesian product of the domains of all the variables of W. A constraint c is a pair (W, T), W being a set of variables and T ⊆ D_W a set of tuples. The constraint is satisfied for the values of W which form a tuple of T. If T = ∅, the constraint is empty and can never be satisfied. On the other hand, a constraint such that T = D_W is full and will be satisfied whatever values its variables take. W and T are also denoted by var(c) and sol(c). A CSP is a set C of constraints. We denote var(C) the set of its variables, i.e. ⋃_{c∈C} var(c), and sol(C) the set of its solutions, i.e. the set of tuples on var(C) satisfying all the constraints of C. The empty CSP (denoted ⊤) is true, whereas a CSP containing an empty constraint is false and denoted ⊥.
An assignment is a set of pairs (v, a), v being a variable and a a value, such that no variable appears twice. Given an assignment A of variables W, we denote C[W ← A] the CSP obtained by reducing the domains of the variables of W to the values given by the assignment A.
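To make these definitions concrete, here is a minimal, purely illustrative Python encoding (the representation and function names are ours, not the paper's): a constraint is a pair (W, T) of a variable tuple and a set of allowed value tuples, and restrict implements C[W ← A].

```python
from itertools import product

def solutions(domains, constraints):
    """Enumerate all assignments over every variable that satisfy
    every constraint (brute force, for illustration only)."""
    variables = sorted(domains)
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        # a constraint (W, T) is satisfied when the values of W form a tuple of T
        if all(tuple(assignment[v] for v in W) in T for W, T in constraints):
            yield assignment

def restrict(domains, assignment):
    """C[W <- A]: reduce the domains of the assigned variables
    to the single value given by the assignment A."""
    return {v: [assignment[v]] if v in assignment else list(d)
            for v, d in domains.items()}

# Tiny example: x, y in {0, 1, 2}, one constraint x < y.
doms = {"x": [0, 1, 2], "y": [0, 1, 2]}
cons = [(("x", "y"), {(a, b) for a in range(3) for b in range(3) if a < b})]
sols = list(solutions(doms, cons))                            # 3 solutions
restricted = list(solutions(restrict(doms, {"x": 1}), cons))  # only x=1, y=2
```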
2) Quantified problems: A quantified set of variables (or qset) is a pair (q, W) where q ∈ {∀, ∃} and W is a set of variables. We call prefix a sequence of qsets [(q_0, W_0), ..., (q_{n−1}, W_{n−1})] where i ≠ j → W_i ∩ W_j = ∅. We denote var(P) = ⋃_{i=0}^{n−1} W_i. A QCSP is a pair (P, G) where P is a prefix and G is a CSP called the goal, such that var(G) ⊆ var(P). QCSP+ has been formalized in [1] to restrict the quantifiers of a QCSP to chosen values. Such a restriction implies changing the nature of the qset: it includes a CSP whose solutions define the allowed values for the variables W_i. QCSP+ is defined as follows: a restricted quantified set of variables (or rqset) is a triple (q, W, C) where (q, W) is a qset and C a CSP. A QCSP+ is a pair Q = (P, G) where P is a prefix of rqsets such that ∀i, var(C_i) ⊆ ⋃_{j=0}^{i} W_j. Moreover, var(G) ⊆ var(P) still holds.
3) Solution: A QCSP (P, G) where P = [(∃, W_0), (∀, W_1), ..., (∃, W_n)] represents the following logic formula:
∃W_0 ∈ D_{W_0} ∀W_1 ∈ D_{W_1} ... ∃W_n ∈ D_{W_n} G
Thus, a solution cannot be a simple assignment of the variables: in fact, the goal has to be satisfied for all values the universally quantified variables may take. Intuitively, such a problem can be seen as a "game" where an ∃-player tries to satisfy G while a ∀-player aims at making it false, each player assigning the variables in turn in the order defined by the prefix. The solution must represent the strategy that the ∃-player should adopt to be sure that the goal will always be satisfied whatever its opponent does. In this paper, we represent a strategy by the set of every possible scenario (i.e. total assignment of the variables) it generates.
Definition 1 (Set of strategies of a QCSP): The set of strategies of a QCSP Q = ([(q_0, W_0), ..., (q_{n−1}, W_{n−1})], G), denoted Str(Q), is inductively defined as:
• Str([], G) = {()}
• Str([(∃, W)|P′], G) = { t ⋈ s′ | t ∈ D_W ∧ s′ ∈ Str(P′, G) }
• Str([(∀, W)|P′], G) = { ⋃_{t∈D_W} t ⋈ α(t) | α ∈ (D_W ↦ Str(P′, G)) }
where ⋈ denotes the join of assignments.
A strategy of Str(P, G) is winning iff all its scenarios satisfy G. Such a winning strategy is a solution of the problem. In the QCSP+ case, the "moves" of each player are restricted to assignments that satisfy the restrictors. The logic formula it represents is:
∃W_0 (C_0 ∧ (∀W_1 (C_1 → (∃W_2 ... G))))
A solution of a QCSP+ is still a winning strategy, except that the scenarios are restricted by the rqsets. The set of strategies of a QCSP+ is defined inductively as:
• Str([], G) = {()}
• Str([(∃, W, C)|P′], G) = { t ⋈ s | t ∈ D_W ∧ t|var(C) ∈ sol(C) ∧ s ∈ Str(P′, G) }
• Str([(∀, W, C)|P′], G) = { ⋃_{t∈D_W} α(t) | α ∈ Π_{t∈D_W} S_t }, where S_t = { t ⋈ s | s ∈ Str(P′, G) } if t|var(C) ∈ sol(C), and S_t = {()} otherwise.
We note that some scenarios may assign only a subset of the variables: if the restrictor of an rqset (∀, W_i, C_i) becomes inconsistent, the last inductive case leads to an assignment of the variables of W_1 ∪ … ∪ W_{i−1} only. In that case, this assignment is enough to always validate the end of the formula. A strategy of a QCSP+ is winning when all the scenarios assigning every variable satisfy the goal.
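The game semantics above can be checked by a small brute-force evaluator. This is an illustrative sketch under a hypothetical encoding (not a solver API): a prefix is a list of rqsets (q, W, restrictor) with q in {'E', 'A'}, and restrictors and the goal are Python predicates over a partial assignment.

```python
from itertools import product

def qcsp_true(prefix, goal, domains, assignment=None):
    """Return True iff the QCSP+ (prefix, goal) has a winning strategy."""
    assignment = dict(assignment or {})
    if not prefix:
        return goal(assignment)
    (q, W, restrictor), rest = prefix[0], prefix[1:]
    # the "moves" of the player are the tuples satisfying the restrictor
    moves = [dict(zip(W, vals))
             for vals in product(*(domains[v] for v in W))
             if restrictor({**assignment, **dict(zip(W, vals))})]
    if q == 'E':   # exists: one allowed tuple must lead to a win
        return any(qcsp_true(rest, goal, domains, {**assignment, **t})
                   for t in moves)
    # 'A': every allowed tuple must lead to a win; an inconsistent
    # restrictor (no moves) validates the rest, since all([]) is True
    return all(qcsp_true(rest, goal, domains, {**assignment, **t})
               for t in moves)

# Tiny example: exists x > 0, forall y <= x.
doms = {"x": [1, 2], "y": [0, 1, 2]}
prefix = [('E', ('x',), lambda a: a['x'] > 0),
          ('A', ('y',), lambda a: a['y'] <= a['x'])]
wins = qcsp_true(prefix, lambda a: a['x'] >= a['y'], doms)   # True
loses = qcsp_true(prefix, lambda a: a['x'] > a['y'], doms)   # False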
The set of scenarios of a winning strategy has a special structure: the inductive definition leads to the fact that if a scenario assigns the variables W_1 ∪ … ∪ W_i (among others) to values a_1 … a_i, one of these situations occurs: (1) either q_{i+1} = ∀ and every solution of C_{i+1} where W_1 = a_1 … W_i = a_i appears in at least one scenario of the winning strategy; or (2) q_{i+1} = ∃ and one solution of C_{i+1} where W_1 = a_1 … W_i = a_i appears in at least one scenario of the winning strategy. A strategy can be represented as a tree where each node at a given depth i is labeled with an assignment of W_i, such that every branch represents a unique scenario. So, given a node such that itself and the set of its ancestors give the assignment W_1 = a_1 … W_i = a_i, either q_{i+1} = ∃ and this node has exactly one child labeled with a solution of C_{i+1}, or q_{i+1} = ∀ and the node has as many children as C_{i+1} has solutions.
B. QCSP+ solving procedure
The only publicly available QCSP+ solver is QeCode [8]. It consists of a classical backtracking algorithm: rqsets are considered from left to right, and the scenarios of the winning strategy are progressively built. Given a QCSP+ Q, a depth i (pointing to (q_i, W_i, C_i) or to the goal G) and an assignment A = a_1 … a_{i−1} of the variables of W_1 ∪ … ∪ W_{i−1}, the recursive procedure returns a winning strategy for the corresponding subproblem: if q_i = ∀, it consists of a set of winning sub-strategies, one for each solution of C_i, whereas if q_i = ∃, it consists of a winning sub-strategy for one solution of C_i. When viewing the set of scenarios as a tree, each recursive call computes a sub-tree for a given node, which can be identified by the partial assignment A.
IV. MULTITHREADED ALGORITHM
The multithreaded algorithm we propose has a Master/Slaves structure: a central data structure called the manager maintains a partial strategy (i.e. a strategy with missing scenarios), and calls several (single-threaded) workers to complete it.
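The recursive strategy-building procedure can be sketched as follows. This is a hypothetical Python rendering of the backtracking scheme, not QeCode's actual (C++/Gecode-based) implementation; a strategy tree is encoded as nested dicts mapping the tuple chosen at each depth to the sub-strategy below it.

```python
from itertools import product

def solve(prefix, goal, domains, assignment=None):
    """Return a winning strategy tree, or None if none exists."""
    assignment = dict(assignment or {})
    if not prefix:
        return {} if goal(assignment) else None
    (q, W, restrictor), rest = prefix[0], prefix[1:]
    moves = [vals for vals in product(*(domains[v] for v in W))
             if restrictor({**assignment, **dict(zip(W, vals))})]
    if q == 'E':
        # exists: one solution of the restrictor with a winning sub-strategy
        for vals in moves:
            sub = solve(rest, goal, domains,
                        {**assignment, **dict(zip(W, vals))})
            if sub is not None:
                return {vals: sub}
        return None
    # forall: a winning sub-strategy for *every* solution of the restrictor
    tree = {}
    for vals in moves:
        sub = solve(rest, goal, domains,
                    {**assignment, **dict(zip(W, vals))})
        if sub is None:
            return None
        tree[vals] = sub
    return tree

# Same tiny problem as before: play x = 1, then answer any allowed y.
doms = {"x": [1, 2], "y": [0, 1, 2]}
prefix = [('E', ('x',), lambda a: a['x'] > 0),
          ('A', ('y',), lambda a: a['y'] <= a['x'])]
strategy = solve(prefix, lambda a: a['x'] >= a['y'], doms)
```

Each recursive call computes the sub-tree for one node of the strategy, exactly as the text describes: an ∃-node keeps one child, a ∀-node one child per restrictor solution.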
Those workers are just "classical" sequential QCSP+ solver instances. Because each incomplete branch yields a different subproblem, no direct communication between workers is needed: every message transits between the manager and a worker. The main communication between manager M and worker S consists in sending jobs from M to S and returning sub-strategies from S to M. Formally, a job is a quadruplet (Q, i, A, Sol) where Q is a QCSP+, i a depth, A an assignment of W_1 ∪ … ∪ W_i, and Sol a set of solutions of C_i[W_1 … W_i ← A]. It describes the location of a subproblem that needs to be solved; the remaining sub-branches yet to be explored are given by Sol. Once S has finished solving a subproblem, it returns to M the corresponding sub-strategy and asks for another job. M can also send signals to S to stop it prematurely if needed: (1) if S is solving a subproblem that is not relevant anymore (because the branch has been cut higher in the tree by another worker), M sends a signal ordering it to stop and fetch another job; (2) if M has no more jobs in its waiting list, it sends a signal to a worker to make
Fig. 1. Example of a communication sequence between a worker and the manager. The worker first asks for a job, solves it and returns the strategy. Its second job is interrupted by a Terminate signal from the manager, so it stops solving and returns nothing. At its third attempt to get a job, the manager returns a STOP signal, and the worker definitely ends its execution.
it stop and return the partial sub-strategy already computed; each incomplete branch then becomes a job available for other workers. Once the whole problem is solved, each worker is killed and the result is returned. Each worker is constructed along with a thread executing the worker's main loop. The manager's control and data-structure updates are performed when a worker calls one of its methods. A mutual-exclusion mechanism is placed on the manager's methods, so that only one thread may update the master at a time.
A. Workers
Each worker is an object having a very simple looping main procedure. This loop fetches a task from the manager and tries to solve it by calling an internal (single-threaded) Solve method. The manager may also return a WAIT pseudo-task, which causes the worker to sleep until a task becomes available, or a STOP pseudo-task, which exits the main loop and consequently stops the worker. Once the solver returns a solution, it is sent to the manager. A worker is able to catch two signals, called Terminate and Send_partial. Both indicate that the search procedure should stop, but the former means that the task has become useless, while the latter calls for returning a partial result plus the jobs left to do. The Solve method inherits from the QCSP+ search procedure presented above, modified in order to catch the signals in a reasonably short time. Figure 1 presents an example of interaction between a worker and the manager. As a worker is just a single-threaded QCSP+ solver, any tuning of a sequential quantified constraint solver also applies here. In particular, choosing search heuristics is a natural issue that might greatly impact the solving time, as it already does for CSP and sequential QCSP+ solvers.
B. Manager
The manager is an object containing a partial winning strategy of the problem, and distributes jobs to the workers to
complete it. During search, this strategy misses some scenarios, but also potentially has some in excess: in fact, several workers may at the same time be solving subproblems corresponding to several solutions of the same existential restrictor. In this case, each may return a partial sub-strategy (if asked to stop and share its work), though at most one of these will be kept in the final winning strategy. Considering the tree representation of the strategy, this corresponds to sibling existential nodes, which will eventually be cut when a worker extracts a complete sub-strategy for one of them. During search, the winning strategy is incomplete. In order to keep track of incomplete branches in the strategy, special "todo" nodes are inserted at their positions. The corresponding jobs are listed in a list called ToDo. A list Current_Workers of the workers currently solving a job, as well as a list of sleeping workers (waiting for a job), are also maintained. The fetchWork method is called by a worker and returns a task from the ToDo list. If ToDo is empty, it sends the Send_partial signal to one worker from Current_Workers and returns WAIT to the calling worker. If Current_Workers is also empty, the search has ended and the manager returns STOP. The returnWork method is called by a worker which has finished a job. It attaches the returned sub-strategy to the incomplete strategy and cuts the branches that are no longer necessary. Each worker that was solving a node on a cut branch is sent the Terminate signal. A worker calling this method may also submit an incomplete work. In this case, the incomplete sub-strategy is attached, the remaining jobs for this subproblem are queued in the ToDo list, and the workers contained in the sleeping list are awoken.
1) Manager tuning: The manager has to set various priorities that may notably affect the solving time.
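The fetchWork/returnWork protocol described above can be sketched as a small Python skeleton. This is hypothetical (the real solver is C++ on top of Gecode); strategy attachment, branch cutting and the Send_partial/Terminate signal delivery are elided.

```python
import threading
from collections import deque

WAIT, STOP = "WAIT", "STOP"   # pseudo-tasks returned to workers

class Manager:
    """Skeleton of the manager: method names follow the text."""
    def __init__(self, jobs):
        self.lock = threading.Lock()   # only one thread updates at a time
        self.todo = deque(jobs)        # jobs for incomplete branches
        self.current_workers = set()   # workers currently solving a job

    def fetch_work(self, worker):
        with self.lock:
            self.current_workers.discard(worker)
            if self.todo:
                self.current_workers.add(worker)
                return self.todo.popleft()
            if self.current_workers:
                # a busy worker would be sent Send_partial here (elided)
                return WAIT            # caller sleeps until work appears
            return STOP                # no jobs, no busy workers: done

    def return_work(self, worker, substrategy, leftover_jobs=()):
        with self.lock:
            # attaching `substrategy` and cutting branches is elided;
            # jobs left over from a partial result go back to ToDo
            self.todo.extend(leftover_jobs)
            self.current_workers.discard(worker)
```

The termination condition falls out naturally: a worker asking for work when ToDo is empty and no other worker is busy receives STOP.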
First, a priority has to be defined when choosing which job, among the ones in the ToDo list, will be assigned to a worker first: which are the most "promising" jobs? The manager not being a solver, this comparison between jobs can only consider their position in the strategy (in particular, depth and quantifier). Another choice to make is: which worker do we stop in order to share its work when needed? This choice may consider the position of the subproblem being solved (a job near the root is more likely to be cut into more pieces), as well as the execution time of the worker (it is less interesting to share an almost-finished work).
V. EXPERIMENTAL RESULTS
A. Experiments
Connect-4 is a two-player game played on a vertical board of 7 columns and 6 rows, in which the players drop a counter in turn (one player has yellow counters, and its opponent red ones). The aim of the game is to be the first to align 4 counters of one's color, horizontally, vertically or diagonally. In this game, each player chooses only the column to play, as the counter drops to the lowest unoccupied place. The game of Nim consists in a heap of counters from which each player takes in turn one, two or three elements, the aim of the game being to pick the last element. The variant we
Fig. 2. Solving time of the connect-4 problem with a depth of 14, for several orderings of jobs (FIFO, Exists-nearest, Exists-deepest, Forall-Nearest, Forall-Deepest, Deepest, Nearest). X-axis gives the number of workers. Each time represented is the average of 7 runs.
Fig. 3. Relative solving time of the parallel solver w.r.t. the sequential one, for the connect-4 game, until depths 11, 12, 13 and 14. X-axis gives the number of workers. Jobs were distributed shallowest first. Each time represented is the average of 7 runs.
modeled consists in letting each player take up to twice the number of counters picked up by their opponent at the previous turn. We solved the 6×7 connect-4 game with a depth from 11 to 14 (14 being solved by QeCode in less than 1 hour), and the variant of the Nim game with an initial number of counters from 30 to 39 (this last case taking QeCode about 1000 seconds to solve). Lower values lead to immediately solvable problems, which are not relevant for testing. The workers used appropriate branching heuristics (in particular, connect-4 uses an ad-hoc value heuristic). We used several job-priority heuristics for each instance: first-in-first-out, deepest first, shallowest first, universal first (then deepest or shallowest first), and existential first (idem). Each of these instances was run with a maximum number of workers varying from 1 to 8. The tests were run on machines equipped with two quad-core Opterons at 2.3 GHz, so every worker had a dedicated processor core. Each instance was run 7 times, in order to measure the influence of the parallelism-inherent indeterminism on computation time.
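Since jobs are only compared by their position in the strategy, the job-priority heuristics above can be expressed as plain sort keys. The dict encoding of a job ('depth', 'quantifier') is ours, for illustration only; FIFO simply means no reordering of the ToDo queue.

```python
def priority_key(heuristic):
    """Return a sort key implementing one job-ordering policy.
    Jobs are hypothetically encoded as dicts with 'depth' and
    'quantifier' ('A' for universal, 'E' for existential)."""
    keys = {
        "deepest":    lambda job: -job["depth"],
        "shallowest": lambda job: job["depth"],
        # universal nodes first, deepest first among them
        "forall-deepest": lambda job: (job["quantifier"] != "A",
                                       -job["depth"]),
        # existential nodes first, shallowest first among them
        "exists-shallowest": lambda job: (job["quantifier"] != "E",
                                          job["depth"]),
    }
    return keys[heuristic]

jobs = [{"depth": 3, "quantifier": "A"},
        {"depth": 1, "quantifier": "E"},
        {"depth": 2, "quantifier": "A"}]
by_depth = sorted(jobs, key=priority_key("deepest"))       # depths 3, 2, 1
forall_first = sorted(jobs, key=priority_key("forall-deepest"))
```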
B. Results
The most striking result is the great variation of execution time within a single set of runs: even for the hardest problems, the fastest run may complete in half the time taken by the slowest. The benefit of the job-ordering heuristics can hardly be demonstrated: average times do not give a clear advantage to one particular heuristic in any series, even if the shallowest-first one seems to be slightly more efficient. Figure 2 compares the performance of the resolution for the seven heuristics listed above. Figure 3 gives the ratios between sequential and parallel (with several numbers of workers) solving times for the connect-4 game, up to depths 11 to 14. The sequential solver takes respectively 70, 239, 852 and 2730 seconds to solve these problems. We notice an important speed-up for 2 workers. The solving time reduces a little for 3 and 4 workers, and drops again sharply for 5 workers, most likely for a hardware reason: the connect-4 models being quite big, memory access is probably the limiting factor here. From 4 to 5 workers, the threads are distributed over the two processors, so the second memory controller is used, noticeably enlarging the memory bandwidth. For the game of Nim, parallelism was less efficient: for 34 to 39 counters, 2 workers take only 20 to 40% less time (against 40 to 55% for connect-4), and adding more than 5 workers was useless. The 33-counter version of the problem was not hard enough to take advantage of parallelism, and using more than 1 worker slowed the search down.
VI. CONCLUSION
QCSP+ solving seems to benefit from parallelism quite easily with a problem-dividing approach. The first parallel solver we presented offers noticeable speed-ups for hard problems. This constitutes a first approach to parallel quantified constraint satisfaction solving, which shows noticeable benefits in terms of resolution time, and calls for further investigation of the efficiency of job-sharing policies and search heuristics in a parallel solver.
REFERENCES
[1] Marco Benedetti, Arnaud Lallouet, and Jérémie Vautard. QCSP made practical by virtue of restricted quantification. In Manuela M. Veloso, editor, IJCAI, pages 38–43, 2007.
[2] Rainer Feldmann, Burkhard Monien, and Stefan Schamberger. A distributed algorithm to evaluate quantified boolean formulae. Pages 285–290, AAAI Press, 2000.
[3] Gecode Team. Gecode: Generic constraint development environment, 2006. Available from http://www.gecode.org.
[4] Youssef Hamadi and Lakhdar Sais. ManySAT: a parallel SAT solver. Journal on Satisfiability, Boolean Modeling and Computation (JSAT), 2009.
[5] Matthew Lewis, Paolo Marin, Tobias Schubert, Massimo Narizzano, Bernd Becker, and Enrico Giunchiglia. PaQuBE: Distributed QBF solving with advanced knowledge sharing. In SAT '09: Proceedings of the 12th International Conference on Theory and Applications of Satisfiability Testing, pages 509–523, Berlin, Heidelberg, 2009. Springer-Verlag.
[6] Laurent Perron. Search procedures and parallelism in constraint programming. In CP '99: Proceedings of the 5th International Conference on Principles and Practice of Constraint Programming, pages 346–360, London, UK, 1999. Springer-Verlag.
[7] E. A. Pruul and G. L. Nemhauser. Branch-and-bound and parallel computation: A historical note. Operations Research Letters, 7(2):65–69, April 1988.
[8] QeCode Team. QeCode: An open QCSP+ solver, 2008. Available from http://www.univ-orleans.fr/lifo/software/qecode/.