Visual-MCM: Visualising Execution Histories on Multiple Memory Consistency Models

Alba Cristina Melo and Simone Cintra Chagas
University of Brasilia, Campus Universitario - Asa Norte, Brasilia, Brazil

Abstract. The behaviour of Distributed Shared Memory systems is dictated by the memory consistency model. In order to provide a better understanding of the semantics of the memory models, many researchers have proposed formalisms to define them. Even with formal definitions, it is still difficult to say what kind of execution histories can be produced on a particular memory model. In this paper, we propose Visual-MCM, a visualisation tool that shows which operation orderings could lead to user-defined execution histories on different memory models. We also present a prototype of Visual-MCM that analyses execution histories for two different memory consistency models.

1 Introduction

Using the shared memory programming paradigm in parallel architectures is quite complex. The main reason for this is the difficulty of making the shared memory behave exactly as if it were a uniprocessor memory. This characteristic is important since it could make uniprocessor programs automatically portable to parallel architectures. Besides, programmers are already used to the uniprocessor shared memory programming model, and using a similar model would make parallel programming easier. In [6], a memory model called Sequential Consistency (SC) was proposed. In this model, all parallel processes observe all shared memory accesses in the same order. Although SC provides a relatively easy programming model, there is a great overhead on coherence operations to guarantee the global order. To reduce coherence overhead, researchers have proposed relaxing some consistency conditions, thus creating new shared memory behaviours that are different from the traditional uniprocessor one. These memory models are called relaxed memory models because they only guarantee that some of the shared memory accesses are seen by all processors in the same order. Memory consistency models were not originally formally defined. The lack of a unique framework in which memory models can be formally defined was also observed by other researchers, and some work was indeed done towards formal memory model definitions. However, it is still difficult to say whether some undesirable orderings will be produced when using a particular memory model, even for formally defined memory consistency models. This has sometimes led to unexpected behaviours, and some undesirable execution histories have been produced.

In this article, we present Visual-MCM, a visualisation tool that assists DSM designers in the task of analysing a memory consistency model. This new tool analyses a particular execution history on a memory model and shows whether it is valid or not. Basically, we build an execution tree and traverse it, respecting the constraints imposed by the current memory model. The rest of this paper is organised as follows. Section 2 describes the formalism used to define memory models. Section 3 describes the approach used to decide the validity of an execution history on a chosen memory model. The prototype of Visual-MCM is presented in section 4. Related work in the area of DSM visualisation is presented in section 5. Finally, conclusions are presented in section 6.

2 Formalising Memory Consistency Models

To define memory models formally, we use a history-based system model that was already described in [4]. In table 1, we only review some definitions.

Table 1. System Model

System                        A finite set of processors
Processor pi                  Executes operations on M
Shared global memory M        Contains all memory addresses
Local memory mi               Caches all memory addresses of M
opi(x)v                       Operation executed by pi on address x with value v
Basic types of operations     read (r) and write (w)
Synchronisation operations    sync(x); acquire(x); release(x); user-defined functions
opi(x)v issued                Processor pi executes the instruction opi(x)v
rpi(x)v performed             The value of x returned to pi cannot be modified
wpi(x)v performed to pj       Value v is written on position x of mj
wpi(x)v performed             wpi(x)v is performed with respect to all processors
Local execution history Hpi   Ordered sequence of memory operations issued by pi
Execution history H           Union of the Hpi over all processors pi
Memory consistency model      Order relation on a set of shared memory accesses

In our definitions, we use the notion of linear sequences. If Q is a history, a linear sequence of Q contains all operations in Q exactly once. A linear sequence is legal if all read operations r(x)v return the value written by the most recent write operation on the same address in the sequence. In execution histories, some operation orderings are allowed and some orderings are forbidden. The decision of which orderings are valid is made by the memory consistency model. An execution history is valid on a memory consistency model if it respects the order relation defined by the model.
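The legality condition is mechanical to check. The sketch below is our own illustration (not part of the paper's prototype): it walks a candidate linear sequence and verifies that every read returns the most recently written value, assuming every address initially holds 0.

```python
# Illustrative legality check: a sequence is legal if every read r(x)v
# returns the value of the most recent write to x (addresses are assumed
# to start at 0). Operations are (processor, kind, address, value) tuples.

def is_legal(sequence, initial=0):
    memory = {}
    for proc, kind, addr, val in sequence:
        if kind == 'w':
            memory[addr] = val                    # track most recent write
        elif memory.get(addr, initial) != val:    # read must return it
            return False
    return True

# A legal sequence: the read of x returns the value just written.
print(is_legal([('p1', 'w', 'x', 1), ('p2', 'r', 'x', 1)]))  # True
# An illegal one: x is read as 1 before any write of 1.
print(is_legal([('p2', 'r', 'x', 1), ('p1', 'w', 'x', 1)]))  # False
```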

An order relation that is used in the definition of nearly all memory consistency models proposed so far in the literature is program order.

Program order. An operation o1 is related to an operation o2 by program order (o1 ->po o2) if:
a) both operations are issued by the same processor pi and o1 immediately precedes o2 in the code of pi, or
b) there exists an operation o3 such that o1 ->po o3 and o3 ->po o2.

The program order that we use in our definitions is thus a total order on Hpi.
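Since ->po is the transitive closure of "immediately precedes", it relates every earlier operation of a processor to every later one. This can be sketched as follows (function and variable names are ours, not the prototype's):

```python
# Derive the program-order relation ->po from the local histories Hpi:
# within each processor, every operation precedes all later ones.

def program_order(local_histories):
    pairs = set()
    for ops in local_histories.values():
        for i in range(len(ops)):
            for j in range(i + 1, len(ops)):
                pairs.add((ops[i], ops[j]))   # ops[i] ->po ops[j]
    return pairs

# p3's local history from the SC example below: w(y)2, r(x)0, r(x)1.
H = {'p3': [('p3', 'w', 'y', 2), ('p3', 'r', 'x', 0), ('p3', 'r', 'x', 1)]}
po = program_order(H)
print(len(po))  # 3 ordered pairs: a total order on Hp3
```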

2.1 Sequential Consistency

Sequential Consistency (SC) imposes an order relation on all shared memory accesses, i.e., it imposes a total order on H. Our formal definition of sequential consistency is derived from the definition presented in [1]:

Sequentially Consistent History. A history H is sequentially consistent if there is a legal linear sequence ->SC of the set of operations in H such that:
i) for all o1, o2: if o1 ->po o2 then o1 ->SC o2.

In this definition, we define a new order relation (->SC) under which all processors must perceive the same execution order of all shared memory accesses (a legal linear sequence of H). In this order, all operations performed by a process must respect the program order (i).

P1: w(x)1
P2: r(y)2
P3: w(y)2  r(x)0  r(x)1

Fig. 1. A Sequentially Consistent Execution History

The execution history shown in figure 1 is sequentially consistent, since we are able to derive at least one legal linear sequence of all memory operations on which all processors agree. In this history, the program order that must be respected is wp3(y)2 ->po rp3(x)0 ->po rp3(x)1. The following execution order is a valid execution under SC: wp3(y)2 ->SC rp2(y)2 ->SC rp3(x)0 ->SC wp1(x)1 ->SC rp3(x)1. Note, however, that this is not the only legal sequence ->SC that can be derived. The sequence wp3(y)2 ->SC rp3(x)0 ->SC wp1(x)1 ->SC rp3(x)1 ->SC rp2(y)2 is also a valid execution sequence on SC.
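For a history this small, the SC definition can be checked by brute force: search all permutations of the operations for a legal linear sequence that respects program order. The sketch below is our own illustration (exponential, not the prototype's tree-based algorithm), applied to the figure-1 history:

```python
# Brute-force SC check for the figure-1 history: H is sequentially
# consistent iff some permutation of its operations is a legal linear
# sequence that respects each processor's program order.
from itertools import permutations

H = {'p1': [('p1', 'w', 'x', 1)],
     'p2': [('p2', 'r', 'y', 2)],
     'p3': [('p3', 'w', 'y', 2), ('p3', 'r', 'x', 0), ('p3', 'r', 'x', 1)]}

def legal(seq):
    mem = {}
    for _, kind, addr, val in seq:
        if kind == 'w':
            mem[addr] = val
        elif mem.get(addr, 0) != val:   # addresses assumed to start at 0
            return False
    return True

def respects_po(seq):
    pos = {op: i for i, op in enumerate(seq)}
    return all(pos[a] < pos[b]
               for ops in H.values() for a, b in zip(ops, ops[1:]))

all_ops = [op for ops in H.values() for op in ops]
sc_valid = any(legal(s) and respects_po(s) for s in permutations(all_ops))
print(sc_valid)  # True
```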

2.2 PRAM Consistency

Pipelined RAM Consistency (or PRAM) [7] relaxes some conditions imposed by SC and requires only that writes issued by the same processor are observed by all processors in the order they were issued. In other words, among the operations issued by other processors, only write operations must obey the program order. Before defining PRAM consistency formally, we must define a new execution history:

History Hpi+w. Let H be a global execution history and pi be a processor. History Hpi+w is the history of writes with respect to processor pi; it is a sub-history of H that contains all operations issued by processor pi and all write operations issued by the other processors.

PRAM Consistent Execution History. A history H is PRAM consistent if, for every processor pi, there is a legal linear sequence ->PRAM of the set of operations in Hpi+w such that:
i) for all o1, o2: if o1 ->po o2 then o1 ->PRAM o2.

By this definition, we can see that it is no longer necessary for all processors to agree on the order of all shared memory operations. Every processor has its own view (Hpi+w) of the shared global memory M, and the order ->PRAM is defined on Hpi+w. In this view, the program order of Hpi+w must be respected (i).

P1: w(x)1
P2: r(x)1  w(x)2  r(x)2
P3: r(x)2  r(x)1

Fig. 2. A PRAM Consistent Execution History

For an execution history to be PRAM consistent, there must be a history Hpi+w for every pi where the program order is respected. Considering the execution history presented in figure 2, the following legal Hpi+w can be derived:
Hp1+w: wp1(x)1 ->PRAM wp2(x)2.
Hp2+w: wp1(x)1 ->PRAM rp2(x)1 ->PRAM wp2(x)2 ->PRAM rp2(x)2.
Hp3+w: wp2(x)2 ->PRAM rp3(x)2 ->PRAM wp1(x)1 ->PRAM rp3(x)1.
As it was possible to derive a valid Hpi+w for every pi, this execution history is valid on PRAM Consistency. Note, however, that the same history is not a valid history on Sequential Consistency, since it is impossible to derive a total order on H.
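The same brute-force idea extends to PRAM consistency, checked per processor: build Hpi+w for each pi and search it for a legal sequence respecting program order. This is an illustrative sketch of the definitions above (our own names, not the prototype's), applied to the figure-2 history:

```python
# PRAM check for the figure-2 history: for every pi, some ordering of
# Hpi+w (pi's operations plus all writes by others) must be legal and
# must respect program order. Exponential brute force, for illustration.
from itertools import permutations

H = {'p1': [('p1', 'w', 'x', 1)],
     'p2': [('p2', 'r', 'x', 1), ('p2', 'w', 'x', 2), ('p2', 'r', 'x', 2)],
     'p3': [('p3', 'r', 'x', 2), ('p3', 'r', 'x', 1)]}

def h_plus_w(pi):
    return [op for p, ops in H.items() for op in ops
            if p == pi or op[1] == 'w']

def legal(seq):
    mem = {}
    for _, kind, addr, val in seq:
        if kind == 'w':
            mem[addr] = val
        elif mem.get(addr, 0) != val:      # addresses assumed to start at 0
            return False
    return True

def respects_po(seq):
    pos = {op: i for i, op in enumerate(seq)}
    return all(pos[a] < pos[b]
               for ops in H.values() for a, b in zip(ops, ops[1:])
               if a in pos and b in pos)   # only pairs inside Hpi+w count

pram_valid = all(any(legal(s) and respects_po(s)
                     for s in permutations(h_plus_w(pi))) for pi in H)
print(pram_valid)  # True, as derived in the text
```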

3 Validating Execution Histories

Computing orderings for execution histories is a problem that has been extensively studied in [8], where it is shown to be co-NP-hard. In Visual-MCM, the task of validating a given execution history is decomposed into three main steps. First, an execution tree is constructed for the given history. Second, a memory model is chosen and some of its constraints are extracted. Third, the extracted constraints are used to traverse the execution tree and find out whether there are execution paths that are valid on the chosen model.

3.1 Step 1: Building the Execution Tree

In order to build a complete execution tree, we must include all possible interleavings of shared memory operations that appear in an execution history. Figure 3 shows an execution history and its associated execution tree.

p1: w(x)1
p2: r(x)1  w(y)2

[Execution tree: the root branches into the 3! = 6 linear orderings of the three operations, labelled ex1 to ex6 from left to right, with ex1 = w(x)1 -> r(x)1 -> w(y)2; the non-well-formed paths ex3, ex4 and ex6 are drawn dashed.]

Fig. 3. An Execution History and Its Associated Execution Tree

In figure 3, there are six possible execution paths. Each path is an ordered sequence of nodes, from the root to the leaf. For example, the path ex1 is the execution order: wp1(x)1 -> rp2(x)1 -> wp2(y)2. As can easily be seen, there are p! linear execution orders that can be derived, where p is the number of memory operations to be considered. Even for a relatively small number of operations, we observe an explosion of the number of possible paths. Thus, something must be done in order to reduce the size of the execution tree. Although all sequences in figure 3 are linear, some of them are not well-formed [9]. Sequences ex3, ex4 and ex6 are not well-formed since they read a value before this value has been written. Removing non-well-formed sequences from the execution tree will surely reduce its size, and that is done with no side effect, as there is no practical interest in allowing values to be read before they are written. In figure 3, the execution tree built by our system will not contain the dashed paths.
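Step 1 can be sketched as follows (our own illustration of the idea, not the prototype's Delphi code): enumerate the interleavings of the figure-3 history and prune the ones that are not well-formed. Note that the sketch treats any read of a never-written value as non-well-formed, which matches this example but ignores reads of initial values.

```python
# Step 1 sketch: enumerate all interleavings of the history and keep only
# the well-formed ones (no value is read before it has been written).

def interleavings(ops):
    if not ops:
        yield []
        return
    for i, op in enumerate(ops):                       # left-to-right branches
        for rest in interleavings(ops[:i] + ops[i + 1:]):
            yield [op] + rest

def well_formed(seq):
    written = set()
    for _, kind, addr, val in seq:
        if kind == 'w':
            written.add((addr, val))
        elif (addr, val) not in written:               # read before its write
            return False
    return True

# The figure-3 history: p1 issues w(x)1; p2 issues r(x)1 then w(y)2.
ops = [('p1', 'w', 'x', 1), ('p2', 'r', 'x', 1), ('p2', 'w', 'y', 2)]
paths = [s for s in interleavings(ops) if well_formed(s)]
print(len(paths))  # 3: ex3, ex4 and ex6 are pruned from the 3! = 6 paths
```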

3.2 Step 2: Extracting Constraints from the Memory Consistency Model

For every chosen memory consistency model, there are two decisions that must be made. These decisions are based on the formal definitions. As an illustration, we will consider the formal definitions of Sequential Consistency and PRAM Consistency presented in sections 2.1 and 2.2. First, we must identify the set of operations that will be validated. In the case of strong memory models, such as Sequential Consistency, all shared memory operations must be included. In the case of relaxed memory models, only a subset of the memory operations will be verified. In PRAM consistency, Hpi+w is this subset. Second, we must define which order must be respected on this set of operations. In the particular case of Sequential Consistency and PRAM Consistency, the only order to be respected is program order. For the particular history in figure 3, the program order that must be respected is rp2(x)1 ->po wp2(y)2.
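The two decisions of step 2 can be captured as a pair (operation set, order relation) per model. A minimal sketch under our own naming, covering only the two models discussed here:

```python
# Step 2 sketch: for a chosen model, extract the set of operations to
# validate and the program-order pairs to enforce on that set.

def constraints(model, H, pi=None):
    if model == 'SC':                                  # strong model: all of H
        ops = [op for hist in H.values() for op in hist]
    elif model == 'PRAM':                              # relaxed model: Hpi+w
        ops = [op for p, hist in H.items() for op in hist
               if p == pi or op[1] == 'w']
    order = [(a, b) for hist in H.values()
             for a, b in zip(hist, hist[1:])
             if a in ops and b in ops]                 # program order on ops
    return ops, order

# The figure-3 history: p1 issues w(x)1; p2 issues r(x)1 then w(y)2.
H = {'p1': [('p1', 'w', 'x', 1)],
     'p2': [('p2', 'r', 'x', 1), ('p2', 'w', 'y', 2)]}
ops, order = constraints('SC', H)
print(order)  # the single pair r_p2(x)1 ->po w_p2(y)2, as in the text
```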

3.3 Step 3: Traversing the Execution Tree

The last step in validating a history consists of examining all possible paths that belong to the execution tree and verifying whether they satisfy the constraints imposed by the model. We use a left-to-right depth-first algorithm to traverse the execution tree in search of valid execution paths. When a node is visited, there are two decisions that can be taken. If the operation in the node is valid on the memory consistency model, the depth-first search continues and the next node is examined. If the operation is not valid, the current path is abandoned and the next left-to-right path is examined. If we arrive at a valid leaf, the whole path is valid. To illustrate this procedure, the search for valid paths on Sequential Consistency will be done on the execution tree presented in figure 3. We start our traversal by examining node a. The operation wp1(x)1 is valid since there are no constraints on legality or program order. The next operation in the path is rp2(x)1 on node b. Legality is respected, since the operation rp2(x)1 reads the value written by the most recent write in the sequence. Program order of p2 is also respected: rp2(x)1 is the first operation in p2's program code. Thus, node b is also valid. The next node is node c. Operation wp2(y)2 respects program order, since it comes after rp2(x)1. For this reason, it is also a valid node. As we were able to arrive at a valid leaf, the whole path wp1(x)1 -> rp2(x)1 -> wp2(y)2 is valid on Sequential Consistency. The algorithm continues to examine the tree and chooses node d, which contains wp2(y)2. This is not a valid node on Sequential Consistency, since this order violates the program order of p2. For the same reason, the path starting at node f is not valid. As there is at least one valid execution path, the execution history is valid on SC. Validating an execution history on PRAM Consistency is much more complex. The main reason for this is that, as it is a relaxed memory model, we must be able to derive valid execution paths for every processor.
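The traversal just described can be sketched as a recursive depth-first search. The code below is our own reconstruction of the procedure, not the prototype's implementation: a node is accepted only if the read it performs is legal and all of its program-order predecessors already appear on the path; otherwise the whole subtree below it is abandoned.

```python
# Step 3 sketch: left-to-right depth-first search for valid execution paths.

def node_ok(prefix, op, order):
    _, kind, addr, val = op
    if kind == 'r':                                    # legality of the read
        writes = [v for _, k, a, v in prefix if k == 'w' and a == addr]
        if (writes[-1] if writes else 0) != val:       # 0 assumed initial
            return False
    return all(a in prefix for a, b in order if b == op)   # program order

def search(remaining, prefix, order, valid):
    if not remaining:
        valid.append(list(prefix))                     # reached a valid leaf
        return
    for i, op in enumerate(remaining):                 # left-to-right children
        if node_ok(prefix, op, order):
            prefix.append(op)
            search(remaining[:i] + remaining[i + 1:], prefix, order, valid)
            prefix.pop()
        # an invalid node abandons the whole subtree below it

# The figure-3 history under SC; program order is r_p2(x)1 ->po w_p2(y)2.
ops = [('p1', 'w', 'x', 1), ('p2', 'r', 'x', 1), ('p2', 'w', 'y', 2)]
valid = []
search(ops, [], [(ops[1], ops[2])], valid)
print(len(valid))  # 1: only ex1 survives under Sequential Consistency
```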

4 Prototype Implementation

We implemented a prototype of Visual-MCM using the Delphi 3.0 programming environment. The functionality of the prototype is shown in figure 4.

[Fig. 4 diagram: the "Construction of the execution tree" module produces an execution tree that is passed to the "MCM Analysis" module (SC Analysis, PRAM Analysis, other MCM Analysis), which outputs a "valid" or "not valid" message for each execution path.]

Fig. 4. Prototype of Visual-MCM

In our visualisation prototype, there are two main modules: Construction of the Execution Tree and Memory Consistency Model Analysis. The first module receives an execution history as input. This execution history can be selected by the user among predefined execution histories or can be provided by the user himself. The basic function of this module is to construct a tree containing the possible execution paths that could have generated the chosen execution history. Normal execution trees are generated as explained in section 3.1. Nevertheless, in order to further reduce the number of possible paths, the user can choose to generate an optimised tree. The optimised execution tree will not contain

execution paths that do not respect program order. This kind of execution tree cannot be used with memory consistency models that do not require program order to be entirely respected. If the user asks our visualisation tool to analyse histories on these kinds of memory models and an optimised execution tree has been generated, our tool forces the user to regenerate a normal execution tree.

[Fig. 5 screenshot: the memory view lists, for each processor, its local history and the values of x, y and z in its local memory (P1: r1(y)0 w1(x)2 w1(y)1; P2: r2(z)0 w2(z)1; both local memories end with x = 2, y = 1, z = 1); the ordering view shows the execution tree derived for this history. The status line reads: Current Memory Consistency Model: Sequential Consistency. Valid Sequence: r1(y)0 -> w1(x)2 -> r2(z)0 -> w2(z)1 -> w1(y)1.]

Fig. 5. Visualising Execution Orderings on Visual-MCM

After the tree construction, the user must choose which memory consistency model is to be used in the analysis. So far, we have only implemented Sequential Consistency and PRAM Consistency. The chosen MCM receives the execution history and its associated execution tree and searches the tree for valid paths, as explained in section 3.3. Our visualisation tool provides both a memory view and an ordering view. The memory view shows the values of each memory position x for every memory mi. The ordering view shows the execution path that is being examined, as well as the valid and invalid execution paths that have already been decided. The analysis of an execution history can be done automatically or on a step-by-step basis. Figure 5 shows how a particular execution history is analysed on Visual-MCM. Note that the analysis of possible execution paths depends only on the memory consistency model. No particular implementations of memory models are considered.

5 Related Work

[5] presents a visualisation tool called StormWatch. This tool is able to analyse the same application on different memory coherence protocols. StormWatch provides trace, source and communication views. The trace view is based on execution histories. Maya is a simulation platform described in [2]. It analyses the behaviour of coherence protocols that implement the following memory consistency models: Sequential Consistency, Causal Consistency, PRAM Consistency and Entry Consistency. The same parallel application is executed on several memory models and results are shown in terms of execution times. Basically, these two systems provide visualisation of the behaviour of parallel applications on some specific implementations of memory consistency models. While this kind of visualisation is very useful, some of the constraints imposed by a particular implementation of a memory consistency model can lead to bad performance results. In this case, we are not able to tell whether the problem is in the implementation or in the memory model itself. Besides, most of the proposed visualisation tools emphasise performance and do not analyse possible results of a parallel application execution. Analysing feasible event orderings has been done previously in the domain of parallel and distributed debugging, especially to detect race conditions [3]. [8] formally analysed the possible event orderings that can be generated by a parallel program and demonstrated the NP-hardness of this problem. As far as we know, this is the first work that provides a visualisation tool that can be used to analyse the effects and the potential of memory consistency models in a way that is totally independent of the coherence protocols that could implement them. We claim that this separation is necessary and that the design of a consistency mechanism should be done in two steps. First, one or more memory consistency models must be chosen. Second, for each chosen memory consistency model, a coherence protocol must be specified. Our visualisation tool will be helpful in the first step of this process.

6 Conclusions and Future Work

In this article, we presented a visualisation tool that analyses execution histories and shows graphically which valid execution paths could have led to their production. The analysis of execution histories can be done on a variety of memory consistency models. We claim that visualising the possible interleavings of shared memory accesses helps multiprocessor hardware designers and DSM-system designers in the task of choosing the most appropriate memory model. This can considerably reduce the number of unexpected results that are produced after the DSM-based machine or the DSM software is released. As future work, we intend to incorporate other memory models into Visual-MCM. We also intend to define a language for specifying memory consistency models based on formal definitions. Using this language, users can define their own memory models and interactively visualise the interleavings of memory operations that can be produced on the newly defined memory consistency model. This facility will be helpful in memory model definition as well as in the comparison of distinct memory consistency models.

References

1. Ahamad, M. et al.: The Power of Processor Consistency. Technical Report GIT-CC-92/34, GIT (1992) 21 pages.
2. Agrawal, D., Choy, M., Leong, H., Singh, A.: Evaluating Weak Memories with Maya. In Proceedings of the 8th Workshop on Parallel and Distributed Simulation (1994) 151-155.
3. Adve, S., Hill, M., Miller, B., Netzer, R.: Detecting Data Races on Weak Memory Systems. In Proceedings of the 18th ISCA (1991) 234-243.
4. Balaniuk, A.: Multiple Memory Consistency Models on a SVM Parallel Programming Environment. In Proceedings of the Int. Conf. OPODIS'97 (1997) 249-260.
5. Chilimbi, T., Ball, T., Eick, S., Larus, J.: StormWatch: a Tool for Visualizing Memory System Protocols. Supercomputing'95 (1995).
6. Lamport, L.: How to Make a Multiprocessor Computer that Correctly Executes Multiprocess Programs. IEEE Transactions on Computers (1979) 690-691.
7. Lipton, R. J., Sandberg, J. S.: PRAM: A Scalable Shared Memory. Technical Report CS-TR-180-88, Princeton University (1988).
8. Netzer, R., Miller, B.: On the Complexity of Event Ordering for Shared-Memory Parallel Program Executions. Technical Report TR-908, University of Wisconsin-Madison (1990).
9. Heddaya, A., Sinha, H.: An Overview of Mermera: a System Formalism for Non-Coherent Distributed Parallel Memory. Technical Report BU-CS-92-009, Boston University (1992) 21 pages.

This article was processed using the LaTeX macro package with LLNCS style