
Parallel Partial Order Reduction with Topological Sort Proviso

Jiří Barnat, Luboš Brim, and Petr Ročkai
Faculty of Informatics, Masaryk University, Brno, Czech Republic
{barnat,brim,xrockai}@fi.muni.cz

Abstract: Partial order reduction and distributed-memory processing are the two essential techniques to fight the well-known state space explosion problem in explicit state model checking. Unfortunately, these two techniques have not yet been integrated to a satisfactory degree. While there are a few rather successful approaches to parallel partial order reduction for the verification of safety properties, all approaches suggested for LTL model checking are either too technically involved to be smoothly incorporated into the existing parallel algorithms, or they are weak in the sense that the achieved reduction in the size of the state space is minor. The main source of difficulties is the cycle proviso, which requires one fully expanded state on every cycle in the reduced state space graph. In the sequential case, this is easily achieved by employing a depth-first search strategy for state space generation. Unfortunately, this strategy is incompatible with parallel (hence distributed-memory) processing, which limits the application of partial order reduction to the sequential case. In this paper we suggest a new technique that guarantees correct construction of the reduced state space graph w.r.t. the cycle proviso. The new technique is fully compatible with the parallel graph traversal procedure, while at the same time it provides a reduction of the state space that is competitive with the serial case. The technique has been implemented within the parallel and distributed-memory LTL model checker DiVinE and its performance is reported in this paper.

Keywords: LTL Model Checking; Partial Order Reduction; Parallel Distributed-Memory Processing; DiVinE

I. INTRODUCTION

Model checking has developed into a mature and widely used approach for assessing correctness and other functional properties of increasingly complex computer systems. Unfortunately, the state spaces of these systems are often so large that applying sequential algorithms becomes impractical. In these cases, parallel and distributed-memory approaches can be used to provide the vast computation resources required by the model checking process. Parallel algorithms have been successfully applied to explicit state model checking [1]–[3], symbolic model checking [4]–[6], analysis of stochastic [7] and timed [8] systems, equivalence checking [9], state space generation [10], and other related problems [11], [12].

Using parallel architectures, such as compute clusters, extends the ability of model checkers to handle larger verification problems. However, it does not solve the state explosion problem as such. There is therefore still a need for parallel model checking approaches that are supported by additional state space reduction techniques such as partial order reduction or symmetry reduction.

Partial order reduction (POR) has been successfully used by sequential explicit LTL model checkers to reduce the number of states that must be explored and stored during the verification process. For details on the method and its applications see [13]–[15]. The idea is based on the observation that, for verification purposes, many of the system executions are equivalent with respect to the verified property. As a result, an exploration algorithm equipped with partial order reduction may safely avoid generating some of the system executions, provided that it explores at least one representative from each equivalence class.

The pruning of executions is technically achieved by considering only a subset of the enabled actions/transitions in a system state when generating the state space. These subsets are referred to as ample sets. An action that is enabled in a system state, but is not part of the ample set for that state, is temporarily ignored by the generation algorithm. Note that such an action could be ignored permanently if it is ignored in all states along a cycle in the reduced state space graph, which would compromise the correctness of the verification procedure. Consequently, an exploration algorithm has to guarantee that no enabled action is ignored permanently in any system execution. In practice, this is achieved by demanding at least one fully expanded state (a state whose ample set contains all enabled actions) on every cycle, the so-called cycle proviso.

For the sequential case there is an efficient algorithmic solution to the cycle proviso problem that builds upon a depth-first exploration strategy during the generation of the reduced state space graph.
Unfortunately, the depth-first exploration strategy is incompatible with parallel (and, by extension, distributed-memory) processing. One of the most challenging problems connected with the application of partial order reduction in the parallel setting is implementing the cycle proviso within the parallel generation of the reduced state space graph, i.e. preventing permanent action ignoring. In this paper, we suggest a new algorithm to identify and expand at least one state on every cycle in the reduced state space (which in turn builds on a new parallel algorithm to select a set of states covering all cycles of a directed graph).

Our new algorithm is amenable to parallel processing and as such can be used to implement the cycle proviso in a parallel and/or distributed-memory setting. Importantly, our new technique does not increase the complexity of the underlying state space generation algorithm, while at the same time it allows for roughly the same state space reduction as achieved in the sequential case with the standard depth-first based solution to the cycle proviso. The main novelty of our approach lies in employing a topological sort procedure to select vertices for full expansion, which tackles the ignoring problem and therefore ensures the correctness of the reduction. It has been shown that topological sort can be implemented efficiently in parallel.

The rest of the paper is organised as follows. In Section II we briefly introduce the automata-based approach to LTL model checking and give all the necessary definitions and lemmas related to the partial order reduction technique our approach is based on. In Section III we summarise existing approaches to parallel partial order reduction. We list some algorithmic concepts for dealing with the cycle proviso problem and discuss how they apply to parallel model checking. In Section IV we introduce our new algorithm for ensuring the cycle proviso and in Section V we report on some experiments. Finally, Section VI concludes the paper.

II. PRELIMINARIES

A. Parallel LTL Model Checking

The automata-theoretic approach to explicit-state LTL model checking [16], [17] exploits the fact that every set of executions expressible by an LTL formula can be described by a Büchi automaton. In particular, the approach suggests expressing all system executions by a system automaton and all executions not satisfying the formula by a property or negative claim automaton. These two automata are combined into a synchronous product in order to check for the presence of system executions that violate the property expressed by the formula.
The language recognised by the product automaton is empty iff no system execution is invalid.

The language emptiness problem for Büchi automata can be expressed as an accepting cycle detection problem in a graph. Each Büchi automaton can be naturally identified with an automaton graph, which is a directed graph G = (V, E, s, A) where V is the set of states (n = |V|), E is a set of edges (m = |E|), s is an initial state, and A ⊆ V is a set of accepting states. We say that a cycle in G is accepting if it contains an accepting state. Let 𝒜 be a Büchi automaton and G_𝒜 the corresponding automaton graph. Then 𝒜 recognises a nonempty language iff G_𝒜 contains an accepting cycle reachable from s. The LTL model checking problem is thus reduced to an accepting cycle detection problem in the automaton graph.

To detect the presence of an accepting cycle in the underlying graph of the product automaton, Nested DFS [18] or other DFS-based algorithms [19] are used in sequential tools, such as SPIN [20]. The efficiency of these algorithms relies on the depth-first search postorder, whose computation is known to be P-complete [21]. This implies that it is unlikely that the postorder can be computed by a scalable work-optimal parallel algorithm. Consequently, parallel algorithms for accepting cycle detection [1], [3], [22] build upon repeated reachability procedures, value propagation, and other algorithmic techniques that can be accelerated by means of parallel computing.

B. Partial Order Reduction

In this section, we briefly present the theoretical concepts (definitions and propositions) behind Peled's ample set approach to the reduction [14], [23], [24] and some heuristics that are commonly used in practice. Note that the ample set approach requires that the property to be verified is expressed in LTL_{-X}, i.e. by an LTL formula without the X (next) operator. Henceforward, whenever we speak of LTL in the context of partial order reduction, we actually mean LTL_{-X}.

The starting point is a system automaton graph augmented with a labelling function L that attaches to any state s the set of atomic propositions (taken from a set AP) that are assumed to hold in s. Such a graph is usually called a Kripke structure.

Definition 2.1: A Kripke structure is a tuple (S, T, S_0, L) where
• S is a set of states,
• T is a set of transitions (∀α ∈ T : α ⊆ S × S),
• S_0 ∈ S is an initial state,
• L : S → 2^AP is a labelling function.
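As a rough illustration of Definition 2.1, the sketch below encodes a small Kripke structure in Python. It assumes, as the text does when it writes α(s), that each transition behaves as a partial function on states. The class name `Kripke`, its field names, and the two-bit toy system are hypothetical and are not taken from the paper or from any tool.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, Optional, Set

State = Hashable


@dataclass
class Kripke:
    """Hypothetical encoding of a Kripke structure (S, T, S_0, L)."""
    states: Set[State]
    # Assumption: each transition alpha is a partial function S -> S,
    # stored as a dict from source state to target state.
    transitions: Dict[str, Dict[State, State]]
    initial: State
    labels: Dict[State, FrozenSet[str]]

    def apply(self, alpha: str, s: State) -> Optional[State]:
        """Return alpha(s), or None if alpha is not defined in s."""
        return self.transitions[alpha].get(s)

    def enabled(self, s: State) -> Set[str]:
        """Names of the transitions alpha for which alpha(s) is defined."""
        return {a for a, f in self.transitions.items() if s in f}


# Toy system: two independent bits, each set once by its own transition.
toy = Kripke(
    states={(x, y) for x in (0, 1) for y in (0, 1)},
    transitions={
        "a": {(0, y): (1, y) for y in (0, 1)},  # sets the first bit
        "b": {(x, 0): (x, 1) for x in (0, 1)},  # sets the second bit
    },
    initial=(0, 0),
    labels={(x, y): frozenset({"done"} if x and y else set())
            for x in (0, 1) for y in (0, 1)},
)
```

In this toy system both transitions are defined in the initial state, and neither is defined once both bits are set.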

We say that a transition α is enabled in a state s whenever α(s) is defined. The idea of partial order reduction is to disable some transitions in some of the states, obtaining a new structure K′ such that, for a fixed LTL formula ϕ, it holds that K |= ϕ ⇐⇒ K′ |= ϕ. The reduced system K′ is defined through the so-called ample sets. For each state s ∈ K, we define ample(s) ⊆ enabled(s) to be the set of transitions enabled in the reduced system.

To formulate conditions on the ample sets that ensure the correctness of the reduction, we need the notions of independent and invisible transitions. We say that a transition α is independent of a transition β iff:
1) ∀s : α ∈ enabled(s) =⇒ α ∈ enabled(β(s))
2) ∀s : α(β(s)) = β(α(s))

We say that a transition α is invisible iff it holds that ∀s ∈ S : L(s) ∩ AP′ = L(α(s)) ∩ AP′.

Apart from requiring correctness (the system defined through the ample sets satisfies ϕ iff the original system does), two properties are crucial for a successful application of the reduction:
1) The ample sets need to be efficiently obtainable from the description of the original system.
2) The reduction achieved needs to be reasonably large, that is, the reduced system should be significantly smaller than the original.

Traditionally, the following four conditions are used to determine a correct and suitable ample set for each state s:
C0 ample(s) = ∅ ⇐⇒ enabled(s) = ∅
C1 Along every path in the original structure K that starts in s, the following condition holds: a transition that is dependent on a transition in ample(s) cannot be executed without a transition in ample(s) occurring first.
C2 If s is not fully expanded, then every α ∈ ample(s) is invisible.
C3 (cycle proviso) A cycle in the reduced structure is not allowed if it contains a state in which some transition is enabled, but is never included in ample(s) for any state s on the cycle.

Conditions C0 and C2 are easily checked locally for a given system state and an ample set. Direct checking of C1 is expensive; therefore a safe approximation is typically used in practice, such that the correctness of the ample set with respect to (the approximation of) C1 can be checked locally for a given system state as well. As a result, all exploration algorithms (including parallel ones) can be augmented to look for a C0-, C1- and C2-correct ample set for a system state when they process the state, and to use it for the generation of the reduced state space graph. What remains to be ensured by the algorithm to achieve a correct reduction of the state space is the cycle proviso, which requires that at least one state along each cycle is fully expanded (meaning ample(s) = enabled(s)). Efficient checking of the cycle proviso is therefore the key issue to be addressed when introducing a parallel, distributed-memory algorithm for state space generation and LTL model checking with partial order reduction.

III. RELATED WORK

The partial order reduction technique has been intensively studied as a leading technique to fight the state explosion problem in explicit model checking.
As a result, a number of improvements and variants of the technique have been developed and successfully integrated into verification tools. However, these improvements are mutually exclusive in many cases and their usefulness depends on the target domain of the application. In particular, for the ample set method, there are subclasses of properties for which the formal requirements on ample sets may be safely weakened, and hence different reduction algorithms applied. For example, to prove deadlock freedom, the reduced structure does not have to fulfil condition C3 at all [25]. Similarly, if we check the system for a safety property, such as an assertion violation, it is sufficient for the states on a cycle in the reduced structure to be able to reach at least one fully expanded (not necessarily immediate) successor state.

In the following we focus on the various strategies for dealing with the C3 proviso that have been introduced in the literature so far. In particular, we discuss their applicability to distributed-memory computing.

For the purpose of parallel distributed-memory computing, the product automaton graph is divided into parts. Each parallel worker is then responsible for one part of the graph and executes its computation on the states of its part. Transitions that connect states from different parts of the graph are referred to as cross transitions.

A. Static Partial Order Reduction

Static partial order reduction [26] builds upon the fact that the system under consideration is an asynchronous product of individual system components. Since every cycle in the system graph projects to cycles of the components, it is possible to construct a priori a set of states that covers every possible cycle in the system graph. Whenever a state of the reduced structure is a member of such a covering set, it is fully expanded. The static partial order reduction technique is compatible with distributed-memory computing; however, it is generally considered to be less effective than the dynamic approaches listed below.

B. Dynamic Partial Order Reduction

In the dynamic partial order reduction approach, the decision about the full expansion of a state is made when the state is processed by the exploration algorithm. There are several nuances in checking the cycle proviso (condition C3) that depend on whether the reduced structure is used for verification of safety or liveness properties, and on whether the exploration algorithm follows a particular search order (depth-first, breadth-first, etc.).

1) Stack proviso: The classical cycle detection proviso is connected with the depth-first traversal algorithm. The depth-first search algorithm maintains a stack of states on the path from the initial state of the graph to the currently processed state.
If the currently processed state has a direct successor that is on the stack, there is a cycle in the reduced structure. In the case of verification of liveness properties, such a situation requires that the currently processed state be fully expanded. However, this is not the case when verifying safety properties, where the full expansion of the currently processed state may be safely avoided if there is at least one direct successor of the state that is outside the stack [27].

2) Local-stack proviso: When employing the stack proviso in a parallel setting a problem arises: since states of the graph are processed concurrently, multiple stacks must be maintained, which is technically involved and expensive, as parts of each stack might be in the possession of a different worker and thus inaccessible locally [28]. A solution to this problem is the so-called local-stack proviso. With the local-stack proviso, each worker participating in the parallel computation does not maintain a complete stack, but only its most recent part, made of the worker's local states. For that part of the stack, the standard stack proviso is used; however, whenever a cross transition is reached, the parallel reduction algorithm has to process it with special care. Note that a worker may select a different ample set to avoid reaching a cross transition [28], [29]. A safe strategy to cope with cross transitions is to fully expand a state whenever a cross transition emanates from it [30], [31]. This rather strong condition may be weakened so that a state is fully expanded only if the cross transition leads to a state owned by a worker with a higher id [30]. These strategies are cheap and easy to implement; however, the reduction achieved is rather modest. Another solution that deals with cross transitions and achieves a satisfactory reduction was introduced in [30]. However, the reduction algorithm presented there increases the space and time complexity of the underlying exploration algorithm, which unfortunately reduces its practicality.

3) General visited proviso: If, for whatever reason, the algorithm for exploration of the reduced structure does not follow a depth-first visiting strategy, it cannot maintain the search stack and hence cannot apply the (local) stack proviso. In such a situation an alternative cycle detection proviso must be applied. A safe strategy is to fully expand a state whenever there is a risk that the currently explored transition may complete a cycle. Since we assume an arbitrary structure of the graph, a cycle may be closed every time the graph traversal algorithm discovers a transition leading to an already visited state. Again, the visited proviso differs according to the type of the property to be analysed. For general liveness properties, a state is fully expanded whenever one of its direct successors has been visited before.
This proviso has been successfully combined with parallel and distributed-memory algorithms for cycle detection [32]. However, to detect whether a non-local successor state (a state owned by a different workstation) has been visited before, an additional message has to be sent over the network for every cross transition, which notably slows down the parallel computation.

4) Open set proviso: Quite recently, the general visited proviso has been further improved [33]. With the open set proviso, a full expansion is avoided if the transition leads to a visited, but not yet explored state (a state from the so-called open set). In such a situation the full expansion is achieved when exploring the last state from the open set. Unfortunately, the open set proviso is incompatible with parallel processing: if two parallel workers concurrently explore a state and both of them postpone its full expansion because of the other worker, no full expansion happens at all.
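For comparison with the provisos discussed in this section, the classical stack proviso of item 1) can be sketched as a sequential depth-first search. This is only a minimal sketch for the liveness case, under stated assumptions: `ample(s)` and `enabled(s)` are hypothetical callbacks returning the successor states reached through the ample set and through all enabled transitions respectively, the ample sets are assumed to already satisfy C0 to C2, and the function name `reduced_dfs` is ours.

```python
_DONE = object()  # sentinel marking an exhausted successor iterator


def reduced_dfs(initial, ample, enabled):
    """Explore the reduced graph depth-first with the stack proviso.

    Returns (visited states, states fully expanded by the proviso).
    """
    visited = {initial}
    on_stack = {initial}
    fully_expanded = set()

    def expand(s):
        succ = list(ample(s))
        # Stack proviso (liveness case): an ample successor on the DFS
        # stack closes a cycle, so s has to be fully expanded.
        if any(t in on_stack for t in succ):
            fully_expanded.add(s)
            succ = list(enabled(s))
        return iter(succ)

    stack = [(initial, expand(initial))]
    while stack:
        s, pending = stack[-1]
        t = next(pending, _DONE)
        if t is _DONE:
            stack.pop()            # s is finished, it leaves the stack
            on_stack.discard(s)
        elif t not in visited:
            visited.add(t)
            on_stack.add(t)
            stack.append((t, expand(t)))
    return visited, fully_expanded


# Toy successor tables (hypothetical): states 0 and 1 form an ample cycle.
AMPLE = {0: [1], 1: [0], 2: [], 3: []}
ENABLED = {0: [1, 2], 1: [0, 3], 2: [], 3: []}
reached, expanded = reduced_dfs(0, AMPLE.__getitem__, ENABLED.__getitem__)
```

The sketch depends on the `on_stack` set describing exactly the current search path, which is the information that is not locally available once the traversal is distributed among workers.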

IV. NEW ALGORITHM FOR CHECKING CYCLE PROVISO

In this section, we present a new time-optimal POR-enabled exploration algorithm that guarantees that along every cycle in the reduced state space graph there is at least one fully expanded state. The algorithm is based on an iterative application of a topological sort procedure and is thus straightforwardly applicable in parallel and distributed-memory settings.

Definition 4.1: Let R be the reachability relation over the vertices of a directed acyclic graph G = (V, E) such that uRv iff there is a directed path from u to v, i.e. (u, v) ∈ E^+. Then R is a partial order. A topological sort is a linear extension of this partial order, that is, a total order compatible with R.

Note that topological sort is only defined for directed acyclic graphs, simply because no such linear order exists if the graph contains a cycle.

There are two different procedures to compute a topological ordering of a directed acyclic graph. The standard procedure employs the so-called vertex finishing times as computed by a depth-first search. Since the depth-first search procedure is difficult to perform in parallel, another algorithm for topological sorting, Kahn's algorithm [34], is employed. Basically, Kahn's algorithm sweeps the directed graph in the direction of the edges and marks vertices one by one in a topological order. To preserve the topological order when marking vertices of the graph, Kahn's algorithm employs the so-called "topological in-degree", i.e. a numeric value associated with every vertex of the graph denoting the number of immediate predecessors that have not yet been marked (given a topological order). Initially the topological in-degree of a vertex corresponds to the real in-degree of the vertex, i.e. the number of its immediate predecessors. However, as the predecessors of the vertex are given a topological order, the topological in-degree associated with the vertex decreases.
The vertex can be given a topological order only if the corresponding topological in-degree drops to zero.

The idea of our new algorithm for checking the cycle proviso exploits the fact that a graph can be topologically sorted if and only if it is acyclic. Kahn's topological sort procedure, if executed on a graph with a cycle, terminates as soon as there are no vertices with zero topological in-degree. After the procedure terminates, only the leading acyclic part of the graph has been topologically sorted. However, the procedure has decreased the topological in-degree of all vertices that immediately follow the leading acyclic subgraph, i.e. all vertices v such that (u, v) ∈ E and u has been ordered by the procedure. It can be shown that these vertices either lie on a cycle or are preceded by a cycle in the directed graph. Our new algorithm for checking the cycle proviso takes all these vertices, marks them as vertices to be fully expanded, resets their topological in-degree to zero, and calls Kahn's topological sort procedure again. The whole process is repeated until all the vertices are ordered. See the pseudo-code given in Algorithm 1. Variables indegree(v) and top_ind(v) represent the original graph in-degree and the topological in-degree of the vertex v, respectively. An illustration of how the algorithm behaves is also given in Figure 1. Note that for the correctness of the algorithm we assume that all the vertices of the graph are reachable from the set of initial vertices, denoted by I in the pseudo-code.

Note that the size of the resulting set as computed by the algorithm in the pseudo-code is not necessarily minimal. Computing a minimum-size set of vertices that intersects all cycles of the graph is an NP-complete problem. A possible heuristic to decrease the size of the resulting set is to fill the set X with only a single vertex at line 12 of the algorithm instead of considering all vertices satisfying the condition.

Figure 1: An illustration of how the new algorithm detects the set of states that intersects with all cycles. States to be removed by a topological sort procedure are marked with a black dot, states to be included in the result set are marked with a little cross. The algorithm proceeds as indicated by the sequence of graphs from left to right.
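The iterative procedure described above can be sketched in Python for an explicitly given graph. The sketch is a direct, set-based transcription of the description and makes no attempt at linear running time; the function name `cover_all_cycles`, its argument layout, and the guard against unreachable vertices are our own assumptions rather than part of the paper.

```python
def cover_all_cycles(vertices, edges, initial):
    """Return a set of vertices that intersects every cycle of the graph.

    Assumes every vertex is reachable from `initial`; `edges` is an
    iterable of (u, v) pairs over `vertices`.
    """
    vertices = set(vertices)
    initial = set(initial)
    succ = {v: [] for v in vertices}
    top_ind = {v: 0 for v in vertices}   # topological in-degree
    for u, v in edges:
        succ[u].append(v)
        top_ind[v] += 1

    result = set()
    remaining = set(vertices)            # vertices not yet ordered
    while remaining:
        # One Kahn-style step: order all vertices whose topological
        # in-degree is zero and release their successors.
        ready = {u for u in remaining if top_ind[u] == 0}
        remaining -= ready
        for u in ready:
            for v in succ[u]:
                top_ind[v] -= 1
        if not ready:
            # The sweep is stuck on a cycle: pick the unordered initial
            # vertices and the unordered vertices that immediately
            # follow the already ordered part.
            ordered = vertices - remaining
            frontier = (remaining & initial) | {
                v for u in ordered for v in succ[u] if v in remaining
            }
            if not frontier:
                raise ValueError("some vertex is unreachable from initial")
            result |= frontier
            for w in frontier:
                top_ind[w] = 0           # lets the sweep continue
    return result


# Vertex 0 leads into the cycle 1 -> 2 -> 1; vertex 3 hangs off the cycle.
cover = cover_all_cycles({0, 1, 2, 3}, [(0, 1), (1, 2), (2, 1), (2, 3)], {0})
```

On this small graph the sweep orders vertex 0, gets stuck, and marks vertex 1, which is the only vertex that immediately follows the ordered part.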

Algorithm 1 CoverAllCycles
Input: Directed graph G = (V, E, I)
Output: R ⊆ V such that R intersects with all cycles in G
 1: R ← ∅; W ← V
 2: for all v ∈ V do
 3:     top_ind(v) ← indegree(v)
 4: end for
 5: while W ≠ ∅ do
 6:     Q ← {u | top_ind(u) = 0} ∩ W
 7:     W ← W ∖ Q
 8:     for all (u, v) ∈ E ∩ (Q × V) do
 9:         top_ind(v) ← top_ind(v) − 1
10:     end for
11:     if Q = ∅ then
12:         X ← W ∩ {v | v ∈ I ∨ ((u, v) ∈ E ∧ u ∉ W)}
13:         R ← R ∪ X
14:         for all w ∈ X do
15:             top_ind(w) ← 0
16:         end for
17:     end if
18: end while
19: return R

Lemma 4.2: For a given graph G = (V, E, I) the algorithm CoverAllCycles() returns a set of states that contains at least one vertex from every cycle in G.

Proof: First let us observe that the algorithm is a graph traversal algorithm. It maintains the set W of vertices that have not yet been traversed, and this set shrinks as the algorithm proceeds. To prove that every cycle in G intersects with R at the end of the execution of the algorithm we employ the relation between R and W. In particular, for any cycle c ⊆ V, either c ⊆ W or there is a state s ∈ c such that s ∈ R. We will demonstrate that this property is an invariant of the main while loop. Employing the simple observation that on a cyclic path no vertex may have topological in-degree equal to zero, we may argue that by repeated application of lines 6 to 10 we cannot remove from W a state that is a part of a cycle fully contained in W (a property of Kahn's topological sort procedure). Therefore a state on a cycle fully contained in W can be removed from W at line 7 only if its topological in-degree has been set to zero explicitly, which may happen only at line 15. However, any such update of the topological in-degree of a vertex w is preceded by inserting the vertex into R, meaning that if a cycle is not fully contained in W it has at least one state in R, which is the desired property. Therefore, when the loop terminates, W is empty and (as follows from the invariant of the loop) R contains at least one state from each cycle in G.

Lemma 4.3: For a finite graph G = (V, E, I) the algorithm CoverAllCycles() terminates.

Proof: In each iteration of the loop, either at least one state is removed from W (Q is nonempty after the assignment at line 6), or top_ind is set to zero for some of the states in W, meaning that Q will become nonempty in the next iteration of the loop. Hence, |W| necessarily decreases after at most two successive iterations of the loop.

Lemma 4.4: Let G = (V, E, I) be a directed graph. Algorithm CoverAllCycles proceeds in time O(|V| + |E|).

Proof: It is easy to see that once a vertex is removed from W it is never processed again by the algorithm, and neither are the edges leading to it. It remains to be shown that it takes a constant amount of work per edge and per vertex to remove a vertex from W. If the set of vertices with zero topological in-degree is manipulated as a list, consideration of all vertices with zero in-degree is a constant-per-vertex operation. Updates of top_ind(v) happen at most once for each edge. The assignment at line 12 is also a constant-per-edge and constant-per-vertex operation, as a vertex cannot be inserted into X a second time. (When it is inserted into X, it is removed from W in the next iteration of the loop.)

New-POR-Enabled State Space Generation

Since a model checker usually works with implicitly given graphs, i.e. graphs given by a function returning the initial state and a function enumerating the edges emanating from a given state (successor function), we have no direct way to learn the immediate predecessors of a state in the state space, nor their exact number. Accordingly, Kahn's algorithm has to be implemented in two phases. First, the algorithm generates the state space graph and computes the actual graph in-degrees; then it sweeps the generated graph and computes the topological order of the vertices. This two-phase approach applies also to our new algorithm, CoverAllCycles.
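The two-phase scheme for implicitly given graphs can be sketched as follows. This is a minimal sequential sketch under stated assumptions: the names `explore`, `kahn_order`, and `toy_successors` are hypothetical, the successor function is a plain Python callable, and the second phase is plain Kahn's algorithm, so it orders only the leading acyclic part of the graph.

```python
from collections import deque


def explore(initial_state, successors):
    """Phase 1: generate the graph and count the in-degree of each state."""
    indegree = {initial_state: 0}
    edges = []
    queue = deque([initial_state])
    while queue:
        u = queue.popleft()
        for v in successors(u):
            edges.append((u, v))
            if v not in indegree:        # first visit of v
                indegree[v] = 0
                queue.append(v)
            indegree[v] += 1             # one more immediate predecessor
    return indegree, edges


def kahn_order(indegree, edges):
    """Phase 2: sweep the generated graph in topological order."""
    succ = {v: [] for v in indegree}
    for u, v in edges:
        succ[u].append(v)
    top_ind = dict(indegree)             # topological in-degree
    ready = deque(v for v, d in top_ind.items() if d == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            top_ind[v] -= 1
            if top_ind[v] == 0:
                ready.append(v)
    return order                         # shorter than the graph if cyclic


def toy_successors(n):
    """Hypothetical implicit graph: n -> 2n and n -> 3n, bounded by 12."""
    return [m for m in (2 * n, 3 * n) if m <= 12]


indegree, edges = explore(1, toy_successors)
order = kahn_order(indegree, edges)
```

In the toy graph the states 6 and 12 each have two immediate predecessors, a fact that becomes known only after the first phase has visited both of them.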
In order to correctly generate the reduced state space graph we employ the CoverAllCycles algorithm as indicated by the pseudo-code listed as Algorithm 2. First, we generate the state space using the ample sets satisfying conditions C0, C1, and C2 (line 5).¹ Then we run the CoverAllCycles algorithm to detect which states have to be fully expanded in order to satisfy the cycle proviso. Full expansion of these states may, however, generate new states that have not been generated yet. These new states are treated as initial states for a new (not yet generated) part of the reduced state space graph (line 9). We then let the model checking algorithm generate this new part, again using ample sets satisfying conditions C0, C1, and C2. The newly generated part of the state space may contain new cycles. Hence, we have to run CoverAllCycles again for the new part of the state space graph. The whole procedure repeats until no new states to be re-expanded are discovered by the CoverAllCycles algorithm. See Figure 2.

¹ Note that the graph generated by C0, C1 and C2 is inevitably a subgraph of the resulting reduced state space graph; hence, we cannot generate states that are outside the reduced state space graph.

Algorithm 2 POR-ReducedStateSpace
Input: Directed graph G = (V, E, v_0)
       E_Amp: edges in ample sets for C0, C1 and C2
Output: P ⊆ V, the set of states of the POR-reduced state space
 1: P ← ∅
 2: I ← {v_0}
 3: while I ≠ ∅ do
 4:     E_New ← E_Amp ∖ (P × P)
 5:     V_New ← {v | ∃u ∈ I : (u, v) ∈ E_New^+}
 6:     E_New ← E_New ∩ (V_New × V)
 7:     P ← P ∪ V_New
 8:     R ← CoverAllCycles(V_New, E_New, I)
 9:     I ← {v | ∃u ∈ R : (u, v) ∈ E} ∖ P
10: end while
11: return P
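The overall construction can be sketched in Python as follows. This is a sequential sketch under stated assumptions, not the DiVinE implementation: `ample_succ` and `all_succ` are hypothetical callbacks returning the successors reached through the ample set and through all enabled transitions, the states in I are taken to belong to the newly generated part, and the cycle cover is computed only on edges that stay inside the new part. The helper `cover_all_cycles` is the same direct transcription of Algorithm 1 as before, repeated to keep the block self-contained.

```python
def cover_all_cycles(vertices, edges, initial):
    """Set-based transcription of Algorithm 1 (not linear-time)."""
    vertices, initial = set(vertices), set(initial)
    succ = {v: [] for v in vertices}
    top_ind = {v: 0 for v in vertices}
    for u, v in edges:
        succ[u].append(v)
        top_ind[v] += 1
    result, remaining = set(), set(vertices)
    while remaining:
        ready = {u for u in remaining if top_ind[u] == 0}
        remaining -= ready
        for u in ready:
            for v in succ[u]:
                top_ind[v] -= 1
        if not ready:
            frontier = (remaining & initial) | {
                v for u in vertices - remaining
                for v in succ[u] if v in remaining
            }
            if not frontier:
                raise ValueError("some vertex is unreachable from initial")
            result |= frontier
            for w in frontier:
                top_ind[w] = 0
    return result


def por_reduced_state_space(v0, ample_succ, all_succ):
    """Return (states of the reduced space, states to be fully expanded)."""
    generated = set()                    # P in Algorithm 2
    fully_expanded = set()
    roots = {v0}                         # I in Algorithm 2
    while roots:
        # Generate the part reachable from the roots via ample edges,
        # without re-entering the part that was generated earlier.
        new_part, stack = set(roots), list(roots)
        while stack:
            u = stack.pop()
            for v in ample_succ(u):
                if v not in generated and v not in new_part:
                    new_part.add(v)
                    stack.append(v)
        # Assumption: only edges inside the new part are passed on.
        new_edges = [(u, v) for u in new_part
                     for v in ample_succ(u) if v in new_part]
        generated |= new_part
        cover = cover_all_cycles(new_part, new_edges, roots)
        fully_expanded |= cover
        # Full expansion of the cover may reveal states not seen so far.
        roots = {v for u in cover for v in all_succ(u)} - generated
    return generated, fully_expanded


# Toy system (hypothetical): 0 and 1 form an ample cycle, 2 is ignored on it.
AMPLE = {0: [1], 1: [0], 2: []}
ALL = {0: [1, 2], 1: [0], 2: []}
space, expanded = por_reduced_state_space(0, AMPLE.__getitem__, ALL.__getitem__)
```

In the toy system the first round generates the cycle between states 0 and 1 and marks state 0 for full expansion; the second round then generates state 2, which the ample sets alone would never reach.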

Figure 2: An illustration of how the new POR technique proceeds when constructing a correctly POR-reduced state space. The algorithm proceeds as indicated by the sequence of pictures from left to right: a) initial state and the part of the reduced state space that is reachable from it without any full expansion, b) states to be fully expanded as computed by the CoverAllCycles procedure, c) after the full expansion, some states may be new, i.e. outside the so far generated part of the state space, d) state space reachable from the new states without any full expansion, e) states to be fully expanded in the second part of the graph; here no new states are generated, hence the algorithm terminates.

Lemma 4.5: All cycles in the state space as computed by the POR-ReducedStateSpace algorithm contain at least one fully expanded state.

Proof: We can see that the set of vertices returned by the algorithm is constructed from subgraph parts by adding these parts to the final set, see line 7. Realizing that full expansion is done for all vertices in R at line 9 of the algorithm, we can use Lemma 4.2 to argue that every cycle fully contained in a single subgraph being added to the final set contains at least one fully expanded state. It remains to be shown that all cycles that cross the boundaries of individual subgraphs contain at least one fully expanded state. However, a path can lead out of a subgraph identified with V_New only through a fully expanded state (a state in the corresponding R): this trivially guarantees the presence of fully expanded states on all cycles crossing subgraph boundaries.

Lemma 4.6: Algorithm POR-ReducedStateSpace terminates for any finite graph G.

Proof: The termination condition of the loop requires I to become empty, which demonstrably happens in finite time. First, a vertex cannot be part of the set I multiple times: once it is inserted into I, the next iteration of the loop will also insert the vertex into P, at which point it can never be included in I again. Since the total number of vertices is finite, and a new I is computed in every iteration of the loop, I has to become empty after a finite number of iterations.

Lemma 4.7: Let G = (V, E) be the POR-reduced state space graph. Algorithm POR-ReducedStateSpace generates the states of G in linear time w.r.t. the size of G, i.e. in time O(|V| + |E|).

Proof: The state space as generated by the algorithm is built iteratively from parts identified during individual iterations of the main loop and stored in V_New. Once a vertex
is inserted into VNew it is also added to P in the same iteration of the loop, and therefore it is never processed in any later iteration. Moreover, the algorithm performs linear amount of work for each subgraph (VNew , ENew ) in each iteration: two operations are executed on the subgraph – first computes VNew (line 5, using reachability procedure), second processes the subgraph using C OVER A LL C YCLES. Since both these operations are in O(|VNew | + |ENew |), the overall complexity of POR-R EDUCED S TATE S PACE is in O(|V | + |E|). New-POR-Enabled Parallel LTL Model Checking To exploit partial order reduction, our new cycle proviso algorithm needs to be combined with a suitable model checking algorithm. Here we demonstrate how to embed the new cycle proviso algorithm into the parallel model checking algorithm used by the D IVIN E model checker. The algorithm of D IVIN E is work-optimal for the majority of verification instances and works on-the-fly. The algorithm stems from algorithms OWCTY [22] and MAP [3]. It starts out with a single reachability procedure, that is augmented with a heuristic for early detection of accepting cycles [1]. After the reachability procedure, the algorithm computes an approximation set of states lying on an accepting cycle. In order to do so, it takes the product automaton graph and repeatedly eliminates states from the graph that cannot be part of an accepting cycle, employing the two following elimination rules: 1) Eliminate states unreachable from an accepting state. 2) Eliminate states with zero predecessors. The algorithm terminates as soon as there are no more states to be eliminated (fix-point has been reached). It has been shown that these two elimination rules are enough to detect the presence of an accepting cycle in the graph. In particular, there is an accepting cycle in the graph if and only if the algorithm has terminated with a non-empty set of states. For

Algorithm 3 D IVIN E M ODEL C HECKING A LGORITHM Input: Directed graph G = (V, E, v0 , A) EAmp – edges in ample sets for C0,C1 and C2 Output: Presence of an accepting cycle in G Old ← ∅ P ← POR-R EDUCED S TATE S PACE(G, EAmp ) S←P while S 6= Old do Old ← S S ← E LIMINATE U NREACHABLE(S, A, P ) S ← E LIMINATE Z ERO P REDS(S, P ) 8: end while 9: return (S 6= ∅)

1:

2: 3: 4: 5: 6: 7:

the overall structure of the algorithm see the pseudo-code as listed in Algorithm 3. Since the initial reachability procedure is explorationorder independent, it may be adapted to perform the state space generation using to the new cycle-proviso-checking algorithm, see line 2 of the pseudo-code in Algorithm 3, and to generate only the reduced state space graph. However, succeeding procedures manipulating the state space must be aware of the reduced state space graph in order to avoid exploration of transitions leading outside the reduced structure. We indicate this in the pseudo-code by passing the set of states of the reduced structure as an additional parameter to the elimination procedures. In the actual implementation, however, we keep a single bit associated with every state of the reduced state space to indicate whether the state has been fully expanded or not. Any succeeding traversal procedure then uses this bit to decide whether to employ a C0-C2 compatible ample set when expanding the state or whether to perform full expansion of the state.
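The elimination loop of Algorithm 3 can be illustrated with a minimal sequential sketch. It is not the DiVinE implementation: the function names, the representation of the per-state bit as membership in a `fully_expanded` set, and the `ample` and `full` successor callbacks are all assumptions made for illustration only.

```python
from collections import deque


def successors(state, ample, full, fully_expanded):
    """Choose the expansion recorded for `state` (the per-state bit is
    modelled here, as an assumption, by membership in a set)."""
    return full(state) if state in fully_expanded else ample(state)


def eliminate_unreachable(S, accepting, succ):
    """Keep only the states of S reachable from an accepting state of S.
    Accepting states count as reachable from themselves."""
    seen = {s for s in S if s in accepting}
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in succ(u):
            if v in S and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def eliminate_zero_preds(S, succ):
    """Repeatedly drop states that have no predecessor left in S."""
    indeg = {s: 0 for s in S}
    for u in S:
        for v in succ(u):
            if v in indeg:
                indeg[v] += 1
    alive = set(S)
    queue = deque(s for s in S if indeg[s] == 0)
    while queue:
        u = queue.popleft()
        alive.discard(u)
        for v in succ(u):
            if v in alive:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return alive


def has_accepting_cycle(P, accepting, ample, full, fully_expanded):
    """Fix-point loop of Algorithm 3 over the reduced state set P."""
    def succ(u):
        return successors(u, ample, full, fully_expanded)

    old, S = set(), set(P)
    while S != old:
        old = S
        S = eliminate_unreachable(S, accepting, succ)
        S = eliminate_zero_preds(S, succ)
    return bool(S)
```

For a toy graph with edges 0→1, 1→2 and 2→1, calling `has_accepting_cycle` with accepting set `{2}` keeps the cycle through states 1 and 2, whereas accepting set `{0}` empties S because state 0 lies on no cycle. Edges that leave S are ignored, which plays the role of the extra parameter P in the pseudo-code.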

Model                         No POR       DFS-POR                TOP-POR
                              states       states      ratio      states      ratio
peterson.1.dve                12 498       7 999       64.0 %     7 780       62.2 %
peterson.2.dve                124 704      106 949     85.7 %     102 779     82.4 %
peterson.3.dve                170 156      129 147     75.8 %     122 704     72.1 %
peterson.1.prop2.dve          22 816       17 481      76.6 %     17 098      74.9 %
peterson.2.prop2.dve          234 376      214 441     91.4 %     210 287     89.7 %
peterson.1.prop3.dve          24 985       15 907      63.6 %     15 479      61.9 %
peterson.2.prop3.dve          249 368      212 181     85.0 %     202 829     81.3 %
mcs.1.dve                     7 963        7 312       91.8 %     7 778       97.6 %
mcs.2.dve                     1 408        937         66.5 %     1 332       94.6 %
mcs.1.prop2.dve               12 206       11 545      94.5 %     12 132      99.3 %
mcs.2.prop2.dve               2 462        1 849       75.1 %     2 370       96.2 %
mcs.1.prop3.dve               15 815       14 687      92.8 %     15 610      98.7 %
mcs.2.prop3.dve               2 811        1 941       69.0 %     2 672       95.0 %
synapse.1.dve                 46 756       43 290      92.5 %     43 108      92.1 %
synapse.2.dve                 61 048       61 048      100.0 %    61 048      100.0 %
synapse.1.prop2.dve           7 226        6 758       93.5 %     6 780       93.8 %
synapse.2.prop2.dve           15 713       15 713      100.0 %    15 713      100.0 %
leader_filters.1.dve          4 966        4 810       96.8 %     4 810       96.8 %
leader_filters.2.dve          29 284       22 423      76.5 %     22 423      76.5 %
leader_filters.3.dve          91 093       87 809      96.3 %     87 809      96.3 %
leader_filters.1.prop2.dve    4 966        4 966       100.0 %    4 966       100.0 %
leader_filters.2.prop2.dve    28 804       23 239      80.6 %     23 239      80.6 %
leader_filters.3.prop2.dve    91 093       91 093      100.0 %    91 093      100.0 %
IN TOTAL                      1 262 517    1 103 525   87.4 %     1 081 839   85.7 %

Table I: Number of states in the state space graph obtained with no reduction, the traditional DFS-based reduction, and the newly suggested topological-sort-based reduction for selected BEEM models.

      No POR                                               TOP-POR
N     1 node           3 nodes           6 nodes           1 node           3 nodes          6 nodes
8     1:45  6.6 GB     3:55  10.5 GB     3:48  7.08 GB     1:06  0.9 GB     0:35  0.68 GB    0:17  0.55 GB
9     -                -                 -                 6:48  3.71 GB    3:24  3.49 GB    1:43  2.14 GB
10    -                -                 -                 -                -                8:52  11.0 GB

Table II: Leader election, verifying F(elected), with N processes. Each cluster node has 16 GB of RAM and 4 cores. The memory usage reported is per node, the times are wall-clock times. The runs that did not finish due to memory exhaustion are marked "-".

Note that our approach to checking the cycle proviso is compatible with the on-the-fly model checking approach as suggested in [1], [35]. Therefore, replacing the standard reachability with our POR-ReducedStateSpace procedure retains the on-the-fly verification quality of the algorithm.

Generally, our new algorithm for checking the cycle proviso may be incorporated into other accepting cycle detection algorithms provided that the algorithm under consideration is independent of the exploration order. Moreover, if the algorithm needs to revisit states, it must be possible to defer these revisit operations in the execution of the algorithm until after the complete reduced state space has been constructed.
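The main loop of the POR-ReducedStateSpace procedure, as it is described in the proofs of Lemmas 4.5 to 4.7 and in Figure 2, can be sketched roughly as follows. This is a sequential illustration under stated assumptions, not the authors' parallel implementation: `ample` and `full` are hypothetical successor callbacks, and `cover_all_cycles` uses a simple Kahn-style topological sort [34] that marks an arbitrary remaining vertex for full expansion whenever no vertex with zero in-degree is left. The actual selection strategy of CoverAllCycles may differ.

```python
from collections import deque


def reach(seeds, ample, generated):
    """Ample-set reachability from `seeds`, skipping already generated states.
    Returns the new vertices (in discovery order) and the edges among them."""
    order = list(dict.fromkeys(seeds))
    seen = set(order)
    queue = deque(order)
    edges = []
    while queue:
        u = queue.popleft()
        for v in ample(u):
            if v in generated:
                continue  # edge back into an earlier subgraph
            edges.append((u, v))
            if v not in seen:
                seen.add(v)
                order.append(v)
                queue.append(v)
    return order, edges


def cover_all_cycles(vertices, edges):
    """Return a set R such that the subgraph without R is acyclic."""
    succ = {v: [] for v in vertices}
    indeg = {v: 0 for v in vertices}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    remaining = dict.fromkeys(vertices)  # insertion-ordered set
    ready = deque(v for v in vertices if indeg[v] == 0)
    R = set()
    while remaining:
        if ready:
            u = ready.popleft()
        else:
            # Topological sort is stuck: every remaining vertex has a
            # remaining predecessor. Mark one vertex for full expansion.
            u = next(iter(remaining))
            R.add(u)
        del remaining[u]
        for v in succ[u]:
            if v in remaining:
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
    return R


def por_reduced_state_space(initial, ample, full):
    """Build the reduced state set P and the set of fully expanded states."""
    P, fully_expanded = set(), set()
    I = [initial]
    while I:
        v_new, e_new = reach(I, ample, P)
        P.update(v_new)
        R = cover_all_cycles(v_new, e_new)
        fully_expanded |= R
        # New seeds: full-expansion successors of R not generated so far.
        I = list(dict.fromkeys(t for r in sorted(R) for t in full(r) if t not in P))
    return P, fully_expanded
```

In this sketch states are assumed to be hashable and orderable (the `sorted` call only makes the seed order deterministic). Each vertex enters `P` in the iteration that discovers it and is never processed again, which mirrors the argument used for termination and linear complexity.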

V. EXPERIMENTS

To evaluate the reduction efficiency of our new algorithm for checking the cycle proviso, we have implemented the algorithm within our parallel and distributed model checker DiVinE. We have compared our reduction technique with the traditional approach based on the DFS-stack proviso. We have not compared our reduction against the previous "parallel" reduction techniques listed in Section III, as comparisons of these techniques to DFS-stack based reduction have in most cases been reported in the respective papers.

We ran our tool to perform the state-space exploration for a couple of verification instances of several models using first no POR reduction, and then either DFS-stack reduction (referred to as DFS-POR), or the new topological sort reduction (denoted as TOP-POR). The numbers of states generated for individual instances are given in Table I. All results in this table were obtained using a serial, single-threaded implementation (recall that DFS-stack reduction cannot be parallelised). From the experiments we can conclude that the topological sort proviso is capable of achieving reductions similar to those of the DFS-stack proviso: in total, the reduced state spaces contain 85.7 % (TOP-POR) and 87.4 % (DFS-POR) of the unreduced number of states.

Unfortunately, the C0-C2 implementation currently available for DVE models and the prototype nature of our C3 implementation limit the practical benefits derivable from POR. Nevertheless, on models where POR is particularly effective, we have achieved both interesting speedup and considerable memory savings even with the initial implementation. We have used a leader election model (as available from BEEM), instantiated with different numbers of processes and run on a number of multi-core blade servers. The results are presented in Table II.
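The percentage columns of Table I appear to give the size of the reduced state space relative to the unreduced one. Under that assumption, the totals can be rechecked with a few lines of arithmetic; the helper name `relative_size` is introduced here for illustration only.

```python
def relative_size(reduced_states, all_states):
    """Size of the reduced state space as a percentage of the full one."""
    return 100.0 * reduced_states / all_states


# Totals taken from the "IN TOTAL" row of Table I.
no_por, dfs_por, top_por = 1_262_517, 1_103_525, 1_081_839

print(f"DFS-POR: {relative_size(dfs_por, no_por):.1f} %")
print(f"TOP-POR: {relative_size(top_por, no_por):.1f} %")
```

The two printed values agree with the 87.4 % and 85.7 % reported in the table.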

VI. CONCLUSIONS

In this paper we proposed a new cycle detection proviso for partial order reduction. The proviso differs fundamentally from the known approaches as it employs topological sort, which has never been considered in combination with partial order reduction before. The proviso can be checked by parallel algorithms and does not affect the asymptotic complexity of the model checking algorithm, while it is at the same time competitive with the standard DFS-based solution to the cycle proviso with respect to the reduction achieved. The new algorithm also does not differentiate between local and cross transitions with respect to state partitioning; hence, the achieved state space reduction is independent of a particular partitioning.

Our experiments demonstrated that there are cases where partial order reduction in combination with distributed-memory processing can extend the scope of model checking to systems that are not verifiable using either POR or distributed-memory processing alone.

Furthermore, the new approach is compatible with the on-the-fly principle. Therefore, if combined with the algorithm from [1], we have an on-the-fly parallel algorithm with an efficient partial order reduction technique. For a particular class of verification problems, namely for model checking properties expressible by weak Büchi automata, this corresponds to the best sequential solution available.

ACKNOWLEDGMENTS

This work has been partially supported by the Czech Science Foundation grants No. 201/09/1389 and 201/09/P497. Petr Ročkai has been partially supported by Red Hat, Inc.

REFERENCES

[1] J. Barnat, L. Brim, and P. Ročkai, “A Time-Optimal On-the-Fly Parallel Algorithm for Model Checking of Weak LTL Properties,” in Formal Methods and Software Engineering (ICFEM 2009), ser. LNCS, vol. 5885. Springer, 2009, pp. 407–425.

[2] J. Barnat, L. Brim, and P. Ročkai, “Scalable Multi-core LTL Model-Checking,” in Model Checking Software, the 14th International SPIN Workshop, ser. LNCS, vol. 4595. Springer-Verlag, 2007, pp. 187–203.

[3] L. Brim, I. Černá, P. Moravec, and J. Šimša, “Accepting Predecessors are Better than Back Edges in Distributed LTL Model-Checking,” in 5th International Conference on Formal Methods in Computer-Aided Design (FMCAD’04), ser. LNCS, vol. 3312. Springer-Verlag, 2004, pp. 352–366.

[4] O. Grumberg, T. Heyman, N. Ifergan, and A. Schuster, “Achieving speedups in distributed symbolic reachability analysis through asynchronous computation,” in Correct Hardware Design and Verification Methods, 13th IFIP WG 10.5 Advanced Research Working Conference, CHARME 2005, ser. LNCS. Springer, 2005, pp. 129–145.

[5] O. Grumberg, T. Heyman, and A. Schuster, “Distributed Model Checking for µ-calculus,” in CAV’01, ser. LNCS, vol. 2102. Springer, 2001, pp. 350–362.

[6] O. Grumberg, T. Heyman, and A. Schuster, “A work-efficient distributed algorithm for reachability analysis,” Formal Methods in System Design, vol. 29, no. 2, pp. 157–175, 2006.

[7] B. R. Haverkort, A. Bell, and H. C. Bohnenkamp, “On the Efficient Sequential and Distributed Generation of Very Large Markov Chains From Stochastic Petri Nets,” in Proc. 8th Int. Workshop on Petri Net and Performance Models. IEEE Computer Society Press, 1999, pp. 12–21.

[8] G. Behrmann, T. S. Hune, and F. W. Vaandrager, “Distributed Timed Model Checking - How the Search Order Matters,” in CAV’00, ser. LNCS, vol. 1855. Springer, 2000, pp. 216–231.

[9] S. Blom and S. Orzan, “A Distributed Algorithm for Strong Bisimulation Reduction Of State Spaces,” Int J Softw Tools Technol Transfer, vol. 7, no. 1, pp. 74–86, 2005.

[10] G. Ciardo, Y. Zhao, and X. Jin, “Parallel symbolic state-space exploration is difficult, but what is the alternative?” CoRR, vol. abs/0912.2785, 2009.

[11] A. Bell and B. R. Haverkort, “Sequential and distributed model checking of petri net specifications,” Int J Softw Tools Technol Transfer, vol. 7, no. 1, pp. 43–60, 2005.

[12] B. Bollig, M. Leucker, and M. Weber, “Parallel Model Checking for the Alternation Free µ-Calculus,” in TACAS’01, ser. LNCS, vol. 2031. Springer, 2001, pp. 543–558.

[13] P. Godefroid and P. Wolper, “A partial approach to model checking,” Information and Computation, vol. 110, no. 2, pp. 305–326, May 1994.

[14] D. Peled, “All from one, one from all: on model checking using representatives,” in Proceedings of the 5th International Conference on Computer Aided Verification (CAV’93), ser. LNCS, vol. 697. Springer-Verlag, 1993, pp. 409–423.

[15] D. Peled, “Ten Years of Partial Order Reduction,” in Proceedings of the 10th International Conference on Computer Aided Verification (CAV’98), ser. LNCS, vol. 1427. Springer-Verlag, 1998, pp. 17–28.

[16] M. Vardi and P. Wolper, “An automata-theoretic approach to automatic program verification,” in IEEE Symposium on Logic in Computer Science. Computer Society Press, 1986, pp. 322–331.

[17] M. Y. Vardi, “Automata-Theoretic Model Checking Revisited,” in VMCAI’07, ser. LNCS, vol. 4349. Springer, 2007, pp. 137–150.

[18] C. Courcoubetis, M. Vardi, P. Wolper, and M. Yannakakis, “Memory-Efficient Algorithms for the Verification of Temporal Properties,” Formal Methods in System Design, vol. 1, pp. 275–288, 1992.

[19] S. Schwoon and J. Esparza, “A Note on On-The-Fly Verification Algorithms,” in Proceedings of the 11th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), ser. LNCS, vol. 3440. Springer, 2005, pp. 174–190.

[20] G. J. Holzmann, The Spin Model Checker: Primer and Reference Manual. Addison-Wesley, 2003.

[21] J. H. Reif, “Depth-first search is inherently sequential,” Information Processing Letters, vol. 20, no. 5, pp. 229–234, Jun. 1985.

[22] I. Černá and R. Pelánek, “Distributed Explicit Fair Cycle Detection,” in SPIN’03, ser. LNCS, vol. 2648. Springer, 2003, pp. 49–73.

[23] G. J. Holzmann and D. Peled, “An improvement in formal verification,” in FORTE’94, ser. IFIP Conference Proceedings, vol. 6. Chapman & Hall, 1995, pp. 197–211.

[24] D. Peled, “Combining partial order reductions with on-the-fly model-checking,” in Proceedings of CAV’94. Springer-Verlag, LNCS 818, 1994, pp. 377–390.

[25] P. Godefroid and P. Wolper, “Using partial orders for the efficient verification of deadlock freedom and safety properties,” Form. Methods Syst. Des., vol. 2, no. 2, pp. 149–164, 1993.

[26] R. P. Kurshan, V. Levin, M. Minea, D. Peled, and H. Yenigün, “Static Partial Order Reduction,” in Tools and Algorithms for Construction and Analysis of Systems (TACAS’98), ser. LNCS, vol. 1384. Springer, 1998, pp. 345–357.

[27] G. J. Holzmann, P. Godefroid, and D. Pirottin, “Coverage preserving reduction strategies for reachability analysis,” in 12th Int. Conf on Protocol Specification Testing and Verification (IFIP 1992), 1992, pp. 349–363.

[28] F. Lerda and R. Sisto, “Distributed-memory Model Checking with SPIN,” in Proc. of the 5th International SPIN Workshop, ser. LNCS, vol. 1680. Springer-Verlag, 1999.

[29] R. Palmer and G. Gopalakrishnan, “Partial order reduction assisted parallel model checking,” in Proc. Parallel and Distributed Model Checking (PDMC) Workshop, 2002.

[30] L. Brim, I. Černá, P. Moravec, and J. Šimša, “Distributed Partial Order Reduction of State Spaces,” Electronic Notes in Theoretical Computer Science (PDMC 2004), vol. 128, no. 3, pp. 63–74, 2005.

[31] G. J. Holzmann and D. Bosnacki, “The Design of a Multicore Extension of the SPIN Model Checker,” IEEE Trans. Software Eng., vol. 33, no. 10, pp. 659–674, 2007.

[32] J. Barnat, L. Brim, and J. Chaloupka, “From Distributed Memory Cycle Detection to Parallel LTL Model Checking,” Electronic Notes in Theoretical Computer Science (FMICS 2004), vol. 133, no. 1, pp. 21–39, 2005.

[33] D. Bosnacki, S. Leue, and A. Lluch-Lafuente, “Partial-order reduction for general state exploring algorithms,” STTT, vol. 11, no. 1, pp. 39–51, 2009.

[34] A. B. Kahn, “Topological sorting of large networks,” Communications of the ACM, vol. 5, no. 11, pp. 558–562, 1962.

[35] J. Barnat, L. Brim, and P. Ročkai, “DiVinE Multi-Core – A Parallel LTL Model-Checker,” in Automated Technology for Verification and Analysis, ser. LNCS, vol. 5311. Springer, 2008, pp. 234–239.
