A Framework of Memory Consistency Models

Hu Weiwu, Shi Weisong and Tang Zhimin
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, P.R. China
E-mail: {hww, wsshi, [email protected]}

Abstract
Previous descriptions of memory consistency models in shared-memory multiprocessor systems are mainly expressed as constraints on the memory access event ordering and hence are hardware-centric. This paper presents a framework of memory consistency models which describes memory consistency models at the behavior level. Based on the understanding that the behavior of an execution is determined by the execution order of conflicting accesses, a memory consistency model is defined as an interprocessor synchronization mechanism which orders the execution of operations from different processors. The synchronization order of an execution under a given consistency model is also defined. The synchronization order, together with the program order, determines the behavior of an execution. This paper also presents criteria for correct programs and correct implementations of consistency models. Regarding an implementation of a consistency model as a set of memory event ordering constraints, this paper provides a method to prove the correctness of consistency model implementations, and the correctness of the lock based cache coherence protocol is proved with this method.

Keywords: Framework, memory consistency models, synchronization model, correct program, correct implementation.
1 Introduction

As an important aspect of shared memory architecture, memory consistency models have been extensively studied and a number of memory consistency models have been proposed. A consistency model can be viewed as a contract between software and hardware. On one hand, a consistency model specifies how memory operations of a program will appear to execute to the programmer. On the other hand, it places specific requirements on the order in which shared memory accesses are performed. Sequential consistency[13] is considered the shared memory model most commonly assumed by programmers, but it places strict restrictions on the order in which memory accesses are performed and hence is detrimental to performance.

The work of this paper is supported by the Climbing Program, the National Natural Science Foundation of China, and the President Young Creation Foundation of the Chinese Academy of Sciences.

To relax the strict restrictions that sequential consistency places on memory event ordering, many relaxed memory consistency models have been proposed. Most of them are proposed for hardware optimization purposes and are specified in terms of loosening restrictions on the allowable memory event ordering, i.e., they are hardware-centric in nature. These hardware-centric descriptions of relaxed memory consistency models facilitate hardware optimization, but come at the cost of significant disadvantages for programmers. Besides, some of the hardware-centric specifications prohibit implementations that would not otherwise violate the intended semantics of the model.

Based on the understanding that the behavior of an execution is determined by the execution order of conflicting accesses, this paper defines a memory consistency model as an interprocessor synchronization model which orders the execution of two operations from different processors. Synchronization orders for some typical consistency models are defined. The synchronization order, together with the program order, determines the behavior of an execution. With these observations, an execution of a program under a given consistency model is defined as an ordering of the synchronization operations of the program under that model. This paper also presents criteria for correct programs and correct implementations of consistency models. Regarding an implementation of a consistency model as a set of memory event ordering constraints, this paper provides a method to prove the correctness of consistency model implementations. As an example, with the framework built in this paper, we prove that the lock based cache coherence protocol we proposed in [11] is a correct implementation of scope consistency.

The rest of this paper is organized as follows. The following section reviews some memory consistency models. A new framework of memory consistency models is given in Section 3.
Section 4 describes a method to prove the correctness of consistency model implementations. Section 5 proves the correctness of the lock based cache coherence protocol. Section 6 concludes the paper.
2 Related Consistency Models

2.1 Sequential Consistency (SC)
Sequential consistency is the programming model most favoured by programmers. It defines a correct execution as one whose result is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by the program[13]. In a sequentially consistent machine, correctness is ensured by requiring a process to wait for its previous shared memory accesses to be "globally performed" before it can issue another access to a shared variable[9]. Sequential consistency places strict restrictions on the order in which memory accesses are performed. Many performance enhancement techniques that are inherent in uniprocessors, such as prefetching, pipelining, multiple issue, and write buffers, are not allowed in a sequentially consistent machine.
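The SC guarantee is visible in the classic Dekker-style litmus test: if each processor writes one flag and then reads the other, at least one processor must observe the other's write. The sketch below is our own illustration, not from the paper; it uses C11 seq_cst atomics as a stand-in for a sequentially consistent machine, and all names are hypothetical.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

/* Dekker-style litmus test. Under sequential consistency the outcome
 * r1 == 0 && r2 == 0 is impossible: in any interleaving of the four
 * operations, at least one store precedes the other thread's load.
 * C11 seq_cst atomics provide the same guarantee. */
static atomic_int x, y;
static int r1, r2;

static void *thread1(void *arg) {
    (void)arg;
    atomic_store(&x, 1);      /* seq_cst store */
    r1 = atomic_load(&y);     /* seq_cst load  */
    return NULL;
}

static void *thread2(void *arg) {
    (void)arg;
    atomic_store(&y, 1);
    r2 = atomic_load(&x);
    return NULL;
}

/* Runs the test once; SC guarantees the returned sum is at least 1. */
static int run_once(void) {
    pthread_t a, b;
    atomic_store(&x, 0);
    atomic_store(&y, 0);
    pthread_create(&a, NULL, thread1, NULL);
    pthread_create(&b, NULL, thread2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return r1 + r2;
}
```

On a weaker model (for example, one with store buffers and no fences) both loads could return 0, which is exactly the outcome SC rules out.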
2.2 Processor Consistency (PC)
Processor consistency[10] is proposed to relax some of the event orderings imposed by sequential consistency. It is more relaxed than SC in that it allows a load following a store to bypass the store. Because processor consistency is weaker than sequential consistency, it may not yield correct executions for programs written under the assumption of sequential consistency.
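One pattern that remains correct when loads may bypass earlier stores is message passing, because processor consistency still makes each processor's stores visible to others in program order. The sketch below is our illustration with hypothetical names; it models the idiom with C11 release/acquire operations, which likewise permit the store-to-load relaxation.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

/* Message passing: the producer's two stores become visible in
 * program order, so once the consumer sees flag == 1 it is
 * guaranteed to see payload == 42. This works under processor
 * consistency and does not require full sequential consistency. */
static atomic_int flag;
static int payload;

static void *producer(void *arg) {
    (void)arg;
    payload = 42;                                           /* ordinary store */
    atomic_store_explicit(&flag, 1, memory_order_release);  /* publishes it   */
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;                                 /* spin until the flag is set */
    return (void *)(long)payload;         /* always observes 42 */
}
```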
2.3 Weak Consistency (WC)
The weak consistency model[8] loosens the constraints on the allowable event ordering by making a contract between the programmer and the hardware designer. In a system with a weak ordering of events, the hardware should be able to distinguish synchronization accesses from ordinary shared accesses, and the programmer bears the responsibility of ensuring mutual exclusion, by means of hardware recognizable synchronization primitives, for accesses that may cause consistency problems. The weak consistency model imposes the following constraints on the order in which shared memory accesses are executed: synchronization accesses are sequentially consistent with respect to one another; before an ordinary load or store access is allowed to perform with respect to any processor, all previous synchronization accesses must be performed; and before a synchronization access is allowed to perform with respect to any processor, all previous ordinary load and store accesses must be performed. These conditions allow memory accesses issued by a given processor to be observed and completed out of order with respect to other processors, thus permitting multiple ordinary accesses to be pipelined.
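The contract can be sketched as follows (our illustration, not from [8]): ordinary accesses are left unordered, while a hardware-recognizable synchronization access, modeled here by a C11 seq_cst atomic, is ordered against all earlier and later ordinary accesses.

```c
#include <stdatomic.h>

/* Weak-ordering discipline: a and b are ordinary shared accesses and
 * may be reordered or pipelined freely; done is a synchronization
 * access. The seq_cst store to done cannot perform until a and b
 * have performed, and accesses after a seq_cst load of done cannot
 * perform before it, which mirrors the WC contract. */
static int a, b;             /* ordinary shared data     */
static atomic_int done;      /* synchronization variable */

static void writer(void) {
    a = 1;                   /* ordinary: may complete out of order */
    b = 2;
    atomic_store(&done, 1);  /* sync access: ordered after a and b  */
}

static int reader(void) {
    if (atomic_load(&done) == 1)  /* sync access */
        return a + b;             /* guaranteed to see 1 + 2 */
    return -1;
}
```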
2.4 Release Consistency (RC)
In a system with a release ordering of events, synchronization accesses are further divided into acquire and release operations[9]. An acquire is performed to gain permission to access a set of shared locations; a release relinquishes this right. Release consistency imposes the following constraints on the event ordering of memory accesses: synchronization accesses are sequentially consistent with respect to one another; before an ordinary load or store access is allowed to perform with respect to any processor, all previous acquire accesses must be performed; and before a release access is allowed to perform with respect to any processor, all previous ordinary load and store accesses must be performed. Hardware DSMs like DASH adopt release consistency. In DASH, write operations to shared locations are propagated to other processors without delay: when a processor issues a store operation, the coherence protocol is invoked to make the stored value visible to other processors. This allows multiple writes to shared memory to be pipelined to combat memory latency.
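In programming terms, the acquire/release pair is what a lock provides. The sketch below is ours, not from [9]; it uses POSIX mutexes, where pthread_mutex_lock acts as the acquire and pthread_mutex_unlock as the release, so ordinary accesses inside the critical section need not be ordered globally until the release.

```c
#include <pthread.h>
#include <stddef.h>

/* pthread_mutex_lock = acquire (gains permission to access the
 * guarded data); pthread_mutex_unlock = release (all ordinary
 * accesses in the critical section must perform before it). */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static long counter;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        pthread_mutex_lock(&m);     /* acquire */
        counter++;                  /* ordinary load + store */
        pthread_mutex_unlock(&m);   /* release */
    }
    return NULL;
}
```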
2.5 Eager Release Consistency (ERC)
In software DSMs it is also important to reduce the number of messages exchanged, because sending a message in a software DSM is much more expensive than in a hardware DSM. Therefore, in Munin's implementation of release consistency (so-called eager release consistency)[6], writes to shared memory are buffered until a release, at which point all writes to the same destination are merged into a single message. By merging writes instead of pipelining them, the number of messages exchanged is greatly reduced.
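The merging step can be sketched as a write buffer that is flushed once per release. This is a toy single-destination model of the idea, not Munin's code; all names are ours.

```c
/* Toy model of eager release consistency's write merging: stores
 * between acquire and release are buffered, and the release ships
 * all of them to the destination in one message instead of one
 * message per store. */
#define NADDR 8

typedef struct { int addr, value; } write_t;

static write_t wbuf[64];
static int nbuf;              /* writes buffered since the last release */
static int messages_sent;     /* one per release, not one per write     */
static int remote_mem[NADDR]; /* the destination node's copy of memory  */

static void buffered_store(int addr, int value) {
    wbuf[nbuf].addr = addr;   /* just record the write locally */
    wbuf[nbuf].value = value;
    nbuf++;
}

static void erc_release(void) {
    if (nbuf == 0) return;
    for (int i = 0; i < nbuf; i++)               /* merge: the last write */
        remote_mem[wbuf[i].addr] = wbuf[i].value; /* to an address wins   */
    messages_sent++;                             /* one single message    */
    nbuf = 0;
}
```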
2.6 Lazy Release Consistency (LRC)
TreadMarks' lazy implementation of release consistency goes further[4]. It does not propagate the modifications made in a critical section at the time of the release. Instead, modifications are buffered and are propagated only to the processor that acquires the released lock, and only at the time of that acquire. In this way, lazy release consistency reduces both the number of messages and the amount of data exchanged. In LRC, before a processor can pass an acquire operation, all modifications that have been visible to the releasing processor must also be visible to the acquiring processor.
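The laziness can be sketched as write notices that sit at the releaser until some processor acquires the lock. This is a toy model with hypothetical names, not TreadMarks' protocol.

```c
/* Toy model of lazy release consistency: a release merely records
 * which addresses were modified; the modifications travel only to
 * the processor that later acquires the lock, at acquire time. */
#define NADDR 8
#define NPROC 3

static int master[NADDR];        /* releasing processor's values   */
static int copy[NPROC][NADDR];   /* each other processor's copy    */
static int notice[NADDR];        /* write notices pending delivery */

static void lrc_write(int addr, int value) {
    master[addr] = value;
    notice[addr] = 1;            /* recorded, but not propagated   */
}

static void lrc_acquire(int p) {
    /* only p, the new lock holder, pulls the buffered updates */
    for (int a = 0; a < NADDR; a++)
        if (notice[a])
            copy[p][a] = master[a];
}
```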
2.7 Scope Consistency (ScC)
Scope consistency[12] is even lazier than lazy release consistency. In ScC, when processor P acquires lock L from processor Q, it does not need to pick up all modifications that have been visible to Q (as required in LRC). Instead, it picks up only those modifications written by Q in the critical sections protected by L. ScC imposes the following constraints on the ordering of memory accesses: before a lock acquire is allowed to perform at processor P, all writes performed with respect to that lock must also be performed with respect to P; and a memory access is allowed to perform only after all previous acquire events have been performed. A write is said to be performed with respect to a lock when the lock is released.
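The difference from LRC can be sketched by tagging each write with the lock held when it was made; an acquire of a lock then pulls only that lock's write set. This is our toy model with hypothetical names, not the protocol of [12].

```c
/* Toy model of scope consistency: Q's writes are tagged with the
 * lock it holds; when P acquires lock l it picks up only the
 * modifications made inside l's critical sections, not everything
 * Q has seen (as LRC would require). */
#define NADDR 8
#define NLOCK 2

static int mem_q[NADDR];          /* releasing processor Q          */
static int mem_p[NADDR];          /* acquiring processor P          */
static int scope[NLOCK][NADDR];   /* scope[l][a]: a written under l */
static int held = -1;             /* lock Q currently holds         */

static void q_acquire(int l) { held = l; }
static void q_release(int l) { (void)l; held = -1; }

static void q_write(int addr, int value) {
    mem_q[addr] = value;
    if (held >= 0)
        scope[held][addr] = 1;    /* tag the write with the lock    */
}

static void p_acquire(int l) {
    for (int a = 0; a < NADDR; a++)
        if (scope[l][a])          /* only lock l's scope travels    */
            mem_p[a] = mem_q[a];
}
```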
2.8 Entry Consistency (EC)
Entry consistency[5] further relaxes the consistency model by exploiting the association of shared data objects with synchronization variables. It requires an acquire operation to obtain updates only to the data that have been explicitly indicated to be guarded by the acquired synchronization variable, and hence reduces unnecessary update propagation. While EC represents the minimum extreme of communication amount in shared memory systems, it substantially increases the burden on programmers by requiring them to explicitly associate synchronization variables with data.
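The explicit association can be sketched as a binding table from shared objects to synchronization variables; an acquire fetches only the bound objects. This is a toy model with hypothetical names, not the implementation of [5].

```c
/* Toy model of entry consistency: each shared object is explicitly
 * bound to a synchronization variable, and acquiring that variable
 * fetches updates only for the objects it guards. */
#define NOBJ 4

static int home[NOBJ];    /* up-to-date home copies             */
static int local[NOBJ];   /* the acquiring processor's copies   */
static int guard[NOBJ];   /* guard[o] = sync variable guarding o */

static void bind_object(int obj, int sync_var) { guard[obj] = sync_var; }
static void remote_update(int obj, int value)  { home[obj] = value; }

static void ec_acquire(int sync_var) {
    for (int o = 0; o < NOBJ; o++)
        if (guard[o] == sync_var)   /* only explicitly guarded data */
            local[o] = home[o];
}
```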
4.2 Correct Implementation of Sequential Consistency
A widely accepted implementation of SC is as follows[9]: before a load is allowed to perform with respect to any other processor, all previous load accesses must be globally performed and all previous store accesses must be performed; and before a store is allowed to perform with respect to any other processor, all previous load accesses must be globally performed and all previous store accesses must be performed. Besides these constraints, there are also some implicit constraints on the event ordering of conflicting accesses:
Write operations to the same location are serialized, i.e., if $w_1 \xrightarrow{c} w_2$ then $w_1^i <_E w_2^j$ for all $i, j = 1, 2, \ldots, N$.
If $r$ from $P_i$ reads the value written by $w$, then $w$ is performed with respect to $P_i$ before $r$, i.e., if $w \xrightarrow{c} r$ then $w^i <_E r^i$. If $w$ writes a location after it is read by $r$ from $P_i$, then $r$ is performed with respect to $P_i$ before $w$, i.e., if $r \xrightarrow{c} w$ then $r^i <_E w^i$. The above constraints can be formalized as follows:
$$
\left.
\begin{array}{l}
u \rightarrow v \Rightarrow u <_E v \\
w \rightarrow v \Rightarrow w <_E v \\
w_1 \xrightarrow{c} w_2 \Rightarrow w_1 <_E w_2 \\
r \xrightarrow{c} w \Rightarrow r <_E w \\
w \xrightarrow{c} r \Rightarrow w <_E r
\end{array}
\right\} \quad (5)
$$
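The conflict-order constraints amount to saying that in a legal total order, every read returns the value of the most recent conflicting write. A small checker for a candidate execution order can make this concrete; the event encoding below is our own, purely illustrative.

```c
/* Checks a candidate total order of memory events against the
 * conflict constraints: every read of an address must return the
 * value of the most recent preceding write to that address
 * (memory starts as all zeros), i.e., conflicting accesses take
 * effect in list order. */
typedef enum { RD, WR } kind_t;
typedef struct { kind_t kind; int addr, value; } event_t;

static int legal_execution(const event_t *ev, int n) {
    int mem[16] = {0};
    for (int i = 0; i < n; i++) {
        if (ev[i].kind == WR)
            mem[ev[i].addr] = ev[i].value;      /* write takes effect */
        else if (mem[ev[i].addr] != ev[i].value)
            return 0;                           /* read out of conflict order */
    }
    return 1;
}
```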