A Conservative Data Flow Algorithm for Detecting All Pairs of Statements that May Happen in Parallel for Rendezvous-Based Concurrent Programs

Gleb Naumovich
Dept. of Computer and Information Science, Polytechnic University, Brooklyn, NY 11201

George S. Avrunin
Dept. of Mathematics and Statistics, University of Massachusetts, Amherst, MA 01003

Department of Computer and Information Science, Polytechnic University
Technical Report TR-CIS-2001-02, July 30, 2001

Abstract

Information about which pairs of statements in a concurrent program can execute in parallel is important for optimizing and debugging programs, for detecting anomalies, and for improving the accuracy of data flow analysis. In this paper, we describe a new data flow algorithm that finds a conservative approximation of the set of all such pairs for programs that use the rendezvous model of communication. We have carried out a comparison of the precision of our algorithm and that of the most precise of the earlier approaches, Masticola and Ryder's non-concurrency analysis [15], using a sample of 159 concurrent Ada programs that includes the collection assembled by Masticola and Ryder. For these examples, our algorithm was almost always more precise than non-concurrency analysis, in the sense that the set of pairs identified by our algorithm as possibly happening in parallel is a proper subset of the set identified by non-concurrency analysis. In 140 cases, we were able to use an exponential-time reachability analysis to compute the set of pairs of statements that may happen in parallel. For these cases, there were a total of only 25 pairs identified by our polynomial-time algorithm that were not identified by the reachability analysis.

1 Introduction

As the number and significance of parallel and concurrent programs continue to increase, so does the need for methods to provide developers with information about the possible behavior of those programs. In this paper, we address the problem of determining which pairs of statements in a concurrent program can possibly execute in parallel. Information about this aspect of the behavior of a concurrent program has applications in debugging, optimization (both manual and automatic), detection of synchronization anomalies such as data races, and improving the accuracy of data flow analysis [15].

The problem of precisely determining the pairs of statements that can execute in parallel is undecidable. Instead, we are interested in computing a conservative approximation of all such pairs of statements. In this paper, we use the term MHP information (for May Happen in Parallel) to refer to this approximation. MHP information is conservative in the sense that if there exists a real execution of the program in which two statements s1 and s2 from different threads of control happen in parallel, then the pair (s1, s2) must be included in the MHP information. In this paper, we propose a new polynomial-time data flow algorithm for computing conservative MHP information for programs with the rendezvous model of concurrency.
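Stated as a set containment (this restatement is ours, not notation taken from the paper), conservativeness requires

    MHP ⊇ { (s1, s2) | s1 and s2 belong to different threads of control, and s1 and s2 happen in parallel on some execution of the program }.

Imprecision may add extra pairs to the computed set, but it may never omit a pair that can actually occur.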


This work concentrates on languages with the rendezvous model of concurrency, such as Ada. Since much of the related work concentrated on analysis of Ada programs, the implementation of our approach and the experimental evaluation of this implementation target Ada programs. Therefore, in this paper we use Ada terminology, e.g., referring to threads of control as tasks and to communications between tasks as rendezvous. The semantics of this concurrency model are introduced in Section 3.1.

For reasons of efficiency, in this work we do not take the values of program variables into account. Thus, our computation of MHP information is based entirely on the control flow and synchronizations in the program. In general, this makes the MHP information computed by our approach less precise than if information about program variables were taken into account. Even with this simplification, the problem of computing MHP information is NP-complete [21]. A naive algorithm based on analyzing the state space of a program, without taking program variables into account, is exponential in the number of program threads and therefore impractical.

In this paper, we empirically compare the MHP information computed by our algorithm to that computed by the inefficient but more precise reachability algorithm. In addition, we compare the MHP information computed by our algorithm with that computed by the most precise of the polynomial-time algorithms proposed to date, the non-concurrency analysis of Masticola and Ryder [15]. In our experimental comparison, we use a set of 159 Ada programs that includes the programs used by Masticola and Ryder to evaluate non-concurrency analysis. On these programs, our algorithm finds all of the MHP pairs identified by non-concurrency analysis in 150 cases; in 118 cases, our algorithm finds pairs that are not found by non-concurrency analysis. In 9 cases, non-concurrency analysis identifies pairs that are not found by our algorithm but, in all of these cases, our algorithm finds many more pairs that are not identified by non-concurrency analysis. For 140 cases, we were able to run the reachability analysis. (In the remaining cases, this precise but inefficient analysis ran out of memory.) Our algorithm fails to find all the pairs of statements that cannot happen together for only six of these 140 programs, missing a total of 25 pairs.

The next section discusses related work. We describe the rendezvous-style synchronous communications of Ada and introduce the program model used by our algorithm in Section 3. Section 4 introduces our MHP algorithm. Section 5 describes Masticola and Ryder's non-concurrency analysis, investigates a relationship between the program models used in non-concurrency analysis and our algorithm, and shows the results of the empirical comparison between these two approaches. Section 6 concludes and describes future work.

2 Related Work

Previous approaches have computed the complement of MHP information about a program, namely cannot-happen-in-parallel information (called Can't Happen Together, or CHT, by Masticola and Ryder) describing the set of statements that cannot happen in parallel with a given statement. A conservative estimate of the statements that cannot happen in parallel with a given statement s is a set of statements CHT(s) such that no statement in CHT(s) can happen in parallel with s on any execution of the program. Because of imprecision in computing CHT information, there may be statements not in CHT(s) that also cannot happen in parallel with s. Thus, the complement of CHT(s) is a conservative estimate of the set of statements that may happen in parallel with s, in the sense that it contains all statements that may happen in parallel with s, possibly together with some additional statements.

Callahan and Subhlok [3] proposed a data flow algorithm that computes, for a given statement in a concurrent program, a set of statements such that all instances of those statements must execute before any instance of the given statement (B4 analysis). This approach is defined for concurrent programs with a post-wait type of synchronization (similar to the wait-notify mechanism of Java). The algorithm computes B4 relationships among pairs of statements based on control flow within individual threads of control and the pattern of post and wait commands in those threads. In the worst case, the complexity of B4 analysis is cubic in the number of program statements. Duesterwald and Soffa [5] proposed an algorithm for solving the B4 problem for Ada programs in the presence of procedures and demonstrated its usefulness for detecting data races in concurrent software. The worst-case complexity of this algorithm is also cubic in the number of statements in the program.


    ! " #%$ %%$  &'( ) +* %%$ ,)  " # %+* %!  " #%$-* %!  .*

  /10,  ! 324 5&'() 6* 6* 327)  " # %6* %! ,/108*

  / 9:  ! 324 5&'() 6* -* 327)  " # %6* %! ,/ 9+*

Figure 1: Illustration of the case where B4 analysis misses concurrency information
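The Ada text of Figure 1 is not legible here; the fragment below is a sketch of the kind of program the figure shows, based on the description in the surrounding text. The task and procedure names (Lock_Task, Client1, Client2, P1, P2) and the loop structure of the lock task are our assumptions rather than the paper's code; only the entry names acquire and release come from the text.

   procedure Lock_Example is

      task Lock_Task is
         entry Acquire;
         entry Release;
      end Lock_Task;

      task body Lock_Task is
      begin
         loop
            select
               accept Acquire;   -- admit one task into its critical region
            or
               terminate;
            end select;
            accept Release;      -- wait for that task to leave the region
         end loop;
      end Lock_Task;

      procedure P1 is begin null; end P1;   -- stand-ins for the procedure bodies in the figure
      procedure P2 is begin null; end P2;

      task Client1;
      task body Client1 is
      begin
         Lock_Task.Acquire;
         P1;                     -- cannot happen in parallel with P2 in Client2
         Lock_Task.Release;
      end Client1;

      task Client2;
      task body Client2 is
      begin
         Lock_Task.Acquire;
         P2;                     -- cannot happen in parallel with P1 in Client1
         Lock_Task.Release;
      end Client2;

   begin
      null;
   end Lock_Example;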

While computing information about program statements that cannot happen in parallel from B4 information alone is conservative, this approach will miss statements that cannot happen in parallel but are not in a B4 relationship. Figure 1 shows a fragment of a simple Ada program that illustrates this point. In this program, one task implements a shared lock used by two other tasks. The lock task ensures that only one other task in the program can execute any region of code between calls on entries acquire and release. Therefore, the procedure calls made inside these regions by the two client tasks cannot happen in parallel. B4 analysis cannot detect that these calls are in the CHT relation because, depending on which of the two client tasks calls on entry acquire first, the calls to the two procedures can occur in either order.

Masticola and Ryder [15] extend B4 analysis by deriving concurrency information to identify additional pairs of statements that can never happen in parallel. In this approach, called non-concurrency analysis, four techniques are applied repeatedly to refine the can't happen together (CHT) information about the program. One of these refinements is a version of B4 analysis; the others use patterns detected in a graph model of the program that provide sufficient conditions for concluding that the statements corresponding to two nodes may never happen in parallel. For example, one of the refinements detects the situation illustrated in Figure 1 as an instance of a commonly occurring critical-section construct and therefore determines that the two procedure calls are in the CHT relation. The refinements are applied repeatedly, until none of them produces an improvement of the CHT information. Non-concurrency analysis, like our MHP analysis, relies on inlining of subprograms. The worst-case complexity of this approach is polynomial in the number of statements in the program. This approach subsumes the previous approaches and thus computes the most precise information to date. Therefore, in this paper we compare our proposed MHP algorithm to non-concurrency analysis.

3 Program Model

In this section, we introduce the Ada concurrency mechanism and then propose a graph model that supports this mechanism. This model conservatively captures all possible executions of a program and is used by our algorithm to compute the MHP information for this program.

3.1 The Task Communication Mechanism of Ada

In Ada, a program consists of a set of threads of control, called tasks, that may run in parallel. The basic construct for communication and synchronization between tasks is the rendezvous, a form of synchronous communication. A task may call on a named entry in a second task; execution of the calling task is then blocked until the called task accepts the call and the two tasks complete the rendezvous, possibly passing information in both directions. We say that a call which has not yet been accepted is pending. A task declaring a particular entry may accept calls from other tasks on that entry by executing an accept statement; if no calls on this entry are pending, the accepting task cannot execute the accept statement and is blocked until a call on that entry is made by another task.

Figure 2 contains a code example with three tasks. One of these tasks, Buffer, models a shared buffer of size 1, into which the other two tasks, Writer1 and Writer2, write by calling on entry write declared by the Buffer task.
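As a minimal illustration of the entry-call/accept mechanism just described (this fragment is ours and is not taken from the paper):

   procedure Rendezvous_Sketch is
      task Server is
         entry Request;              -- named entry that other tasks may call
      end Server;

      task body Server is
      begin
         accept Request do           -- blocked here until a call on Request is pending
            null;                    -- work performed during the rendezvous
         end Request;
      end Server;
   begin
      Server.Request;                -- the caller blocks until Server accepts the call
   end Rendezvous_Sketch;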


Figure 24: Shared buffer example using task types
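The Ada text of Figure 24 is not legible in this copy. The sketch below shows the kind of program the figure contains, with the entry names and the initWriter procedure taken from the node labels of the PEG in Figure 26; the entry profiles, the loop in the Buffer body, and all other details are our assumptions rather than the paper's exact code.

   procedure Shared_Buffer_With_Task_Types is

      task Buffer is
         entry Lock;
         entry Write;
         entry Unlock;
      end Buffer;

      task type Writer;                   -- a task type instead of two static writer tasks
      type Writer_Ref is access Writer;

      procedure InitWriter is
         W : Writer_Ref;
      begin
         W := new Writer;                 -- the allocation activates a new Writer task
      end InitWriter;

      task body Buffer is
      begin
         loop
            select
               accept Lock;               -- "accept lock" node
            or
               terminate;
            end select;
            accept Write do               -- "accept write-start" / "write-end" nodes
               null;                      -- rendezvous body elided
            end Write;
            accept Unlock;                -- "accept unlock" node
         end loop;
      end Buffer;

      task body Writer is
      begin
         Buffer.Lock;                     -- "Buffer.lock" node
         Buffer.Write;                    -- "Buffer.write" node
         Buffer.Unlock;                   -- "Buffer.unlock" node
      end Writer;

   begin
      InitWriter;                         -- call node 27, activation node 29 (Writer1)
      InitWriter;                         -- call node 28, activation node 30 (Writer2)
   end Shared_Buffer_With_Task_Types;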

a

. . . s1

sr

b

Figure 25: General form of an activation node

In Figure 25, an activation node a has a single predecessor p; its successor nodes in the CCFG of the creating task are labeled s1, ..., sr, and the begin node of the dynamically started task is labeled b. Figure 26 shows the PEG for the example in Figure 24. The two instances of task type Writer are labeled Writer1 and Writer2. The numbers associated with the nodes are the same as in the PEG in Figure 5 that corresponds to the example with static writer tasks, except for the four new nodes. Nodes 27 and 28, both labeled initWriter, represent the calls to procedure initWriter. Each of these calls triggers creation of a Writer task. Activation nodes 29 and 30 represent this creation.

Modifications of the MHP algorithms are trivial. Since each activation node has a single predecessor, the MHP information associated with this predecessor is simply propagated into the activation node: M(a) = M(p), where a denotes an activation node and p denotes its predecessor. For the purpose of computing GEN sets, activation nodes are treated in the same way as rendezvous nodes. We conservatively assume that all activation nodes are reachable.


Figure 26: The PEG for the shared buffer example with task types. The graph contains begin and end nodes for Writer1, Buffer, and Writer2; entry-call nodes Buffer.lock, Buffer.write, and Buffer.unlock in each writer; accept nodes for lock, write-start, write-end, and unlock in Buffer; the initWriter call nodes 27 and 28; and activation nodes 29 and 30.

C The Proof of Equivalence of the Basic and Efficient Versions of the MHP Algorithm

The following lemmas and theorems prove that the efficient and basic versions of the MHP algorithm compute identical information.

Lemma 9. For all m, n ∈ N, if m ∈ INeff(n) at any point during the efficient MHP algorithm, then eventually m ∈ MHPeff(n), unless n is a rendezvous node that the efficient algorithm determines is not reachable.



Proof. Suppose that at some point during the efficient MHP algorithm in Figure 10 some m ∈ N gets into INeff(n). This can happen in lines (10), (20), or (25). First consider the case where m is added to INeff(n) in line (10). If the Reacheff flag of n is set, m will be added to NewMeff(n) in line (12) and subsequently added to Meff(n) in line (28). If the Reacheff flag of n is not set, but is set on some subsequent iteration of the algorithm, m will be placed in Meff(n) in a similar way. Finally, if the Reacheff flag of n is never set, m is not placed in Meff(n). Consider the case where m is added to INeff(n) in line (20). If m is not already in Meff(n), m is added to NewMeff in line (21) and subsequently to Meff(n) in line (28). If m is added to INeff(n) in line (25), it happens on an iteration for a node other than n; n is then placed on the worklist in line (26). Subsequently, when n is taken off the worklist, one of the first two cases applies.

Lemma 10. For all m, p ∈ N and all n ∈ LOCAL, if m is added to Floweff(p, n) at some point, then eventually m is added to Meff(n).

Proof. Node m is added to Floweff(p, n) in line (30). Since the set NewM is not empty on that iteration (it must contain m), the successors of p, including n, are added to the worklist in line (27). When n is taken off the worklist, m is taken from Floweff(p, n) and added to INeff(n) in line (20). According to Lemma 9, m will be added to Meff(n).

Lemma 11. For all m, p1, p2 ∈ N and all n ∈ REND, if m is added to Floweff(p1, n) at some point, m is added to Floweff(p2, n) at some point, and the flag Reacheff of n is set at some point, then eventually m is added to Meff(n).

Proof. Node m is added to Floweff(p1, n) and Floweff(p2, n) in line (30), on the respective iterations for nodes p1 and p2. In both cases, n is placed on the worklist in line (27). Consider line (10), which is executed every time node n is taken off the worklist. Without loss of generality we assume that m is added to Floweff(p1, n) before it is added to Floweff(p2, n). Suppose that line (10) is executed while m ∈ Floweff(p1, n), but before m is added to Floweff(p2, n). Then, if m ∉ Meff(p2), m is not added to INeff(n). Now suppose that this line is executed while m ∈ Floweff(p2, n). By this point, m ∈ Meff(p1) and therefore m is added to INeff(n). The statement of this lemma follows after using the result of Lemma 9.

Theorem 12. The efficient MHP algorithm correctly implements the basic MHP algorithm, in the sense that the MHP sets computed by the two algorithms are the same for every node in the PEG:

    for all n ∈ N, MHPeff(n) = MHP(n).    (5)

Proof. First we prove that

    for all n ∈ N, MHP(n) ⊆ MHPeff(n).    (6)

We carry out the proof by induction on the number of iterations of the basic MHP algorithm. We will prove the following statement, which implies (6).

Statement 6. If on the k-th step of the basic MHP algorithm node m is added to the M set of node n, then m is added to Meff(n) at some point in the efficient MHP algorithm.

After 0 iterations of the basic MHP algorithm, M(n) = ∅ for all n ∈ N, so Statement 6 trivially holds. Suppose that Statement 6 holds after k steps and consider the (k+1)-st iteration of the basic MHP algorithm. Suppose that m is added to M(n), for some m, n ∈ N. We have to consider two cases, based on whether n is a local or a rendezvous node.

Suppose first that n ∈ REND. Let Preds(n) = {p1, p2}. Since m ∈ M(n), according to the basic MHP algorithm Reach(n) must be set to true, which means that p1 ∈ M(p2). By the induction hypothesis, p1 ∈ Meff(p2), and so Reacheff(n) is set to true in line (6). Consider all possible ways in which m could propagate into M(n). Suppose first that m was propagated into n from its predecessors, which means m ∈ M(p1) and m ∈ M(p2). By the induction hypothesis, m ∈ Meff(p1) and m ∈ Meff(p2), and so m is placed in Floweff(p1, n) and Floweff(p2, n). According to Lemma 11, m will be placed in Meff(n). If m propagates into M(n) by symmetry, it means that n ∈ M(m), and so by the induction hypothesis n ∈ Meff(m), which means that n ∈ MHPeff(m). In this case, m is inserted in INeff(n) in line (25), and by Lemma 9, m ∈ Meff(n).

Now suppose that n ∈ LOCAL. In this case m could get into M(n) in one of three ways: via propagation from a predecessor, by symmetry, or by the GEN rule. First, suppose that there is a p ∈ Preds(n) with m ∈ M(p). By the induction hypothesis, m ∈ Meff(p) and therefore m is inserted in Floweff(p, n). According to Lemma 10, m ∈ Meff(n).

The symmetry case is handled in the same way as for rendezvous nodes. Finally, suppose that m ∈ GEN(n). Let r be the rendezvous predecessor of both m and n. For m ∈ GEN(n), it is necessary that Reach(r) is set to true. By the induction hypothesis, Reacheff(r) is set to true. When Reacheff(r) becomes true, r is inserted in ReachableComPredseff(n) in line (8). Subsequently, in line (19), m is added to GENeff(n) and then, in line (21), to NewM.

To complete the proof of this theorem, we need to show that

    for all n ∈ N, MHPeff(n) ⊆ MHP(n).    (7)

Again we use induction and formulate the following statement, which implies (7).

Statement 7. If on the k-th step of the efficient MHP algorithm node m is added to the Meff set of node n, then m ∈ M(n).

Statement 7 trivially holds before any iterations of the main loop of the efficient MHP algorithm, since M(n) = Meff(n) = ∅ for all n ∈ N. Suppose that Statement 7 holds after k iterations of the main loop of the efficient algorithm, and consider the (k+1)-st iteration. Suppose that m is added to Meff(n).

First assume that n ∈ REND. If the Reacheff flag of n was set to true on this iteration, m could be put in Meff(n) by executing line (12). Therefore, m was put in INeff(n) on either this or one of the previous iterations, by executing line (10). Assume, without loss of generality, that m ∈ Floweff(p1, n) and m ∈ Meff(p2) on that iteration. m ∈ Floweff(p1, n) means that m was also put in Meff(p1) on one of the preceding iterations, and so by the induction hypothesis, m ∈ M(p1). Similarly, since m ∈ Meff(p2), by the induction hypothesis m ∈ M(p2). Therefore, m is put in M(n) after n becomes reachable in the basic algorithm. Node m could also be put in INeff(n) by the symmetry step in line (25). This means that n ∈ Meff(m) on a previous iteration of the efficient algorithm, and so by the induction hypothesis n ∈ M(m) and the symmetry step of the basic algorithm places m in M(n).

Now consider the case of n ∈ LOCAL. Then either m ∈ GENeff(n) or m ∈ INeff(n). In the former case, a rendezvous predecessor r of both n and m must be marked as reachable and placed in the ReachableComPreds set of n. This means that on some previous iteration line (8) is executed for node r. Then the two local predecessors of r are in the Meff sets of each other on that iteration. By the induction hypothesis, these predecessors are in the M sets of each other on some iteration of the basic algorithm. Consequently, the Reach flag of r is set to true and m is put in GEN(n) and subsequently in M(n). The cases where m is propagated into INeff(n) from a predecessor of n or by symmetry are handled analogously to the case of rendezvous nodes above.

D Subsumption of Some Non-concurrency Analysis Refinements by the MHP Algorithm

D.1 Subsumption of B4 Analysis

For each node n in the sync graph, a set Reach(n) is created, which contains all nodes in this graph from which n is reachable via control flow edges. Further, a set R(n) is built as follows:

    R(n) = Reach(n) \ { p | n ∈ Reach(p) }

Intuitively, R(n) is the set of all nodes that reach n, but that are not reachable from n via paths of finite length. It is easy to see that n ∉ R(n), as there is a path from n to n of length 0. The algorithm in Figure 27 (adapted from [15]) computes B4 information on a sync graph.


Algorithm 8 (B4 analysis). Input: Sync graph (Nsync, E...
