Formal Semantics for Expressing Optimism: The Meaning of HOPE

Crispin Cowan
Department of Computer Science and Engineering
Oregon Graduate Institute
P.O. Box 91000
Portland, OR 97291-1000
[email protected]
Hanan Lutfiyya
Computer Science Department
Middlesex College
University of Western Ontario
London, Ontario N6A 5B7
[email protected]
May 19, 1995
ABSTRACT

Optimism is a powerful technique for increasing concurrency. A program can increase concurrency by making an optimistic assumption about its future state, and verifying the assumption in parallel with computations based on the optimistic assumption. The use of optimism has been restricted to specialized systems due to the difficulty of writing and understanding optimistic programs. In this paper, we define optimism as any computation that uses rollback. We present a formal semantics for expressing optimism by defining operations for concurrent programs to specify which optimistic computations to roll back, while automating the dependency tracking. We prove that these semantics guarantee some intuitively desirable behaviors. Finally, we describe our programming language implementation based on the semantics.

Keywords: optimism, semantics of optimism, concurrency, parallelism, distributed computing, rollback.
1 INTRODUCTION

Distributed systems can improve program response time by providing access to more computing resources than may be available on a single computer. However, distributing a program across multiple machines can also degrade program response time due to the latency introduced by remote communications. For purposes of discussion, we will refer to the delay between issuing a request to a remote machine and receiving an answer, such as in a remote procedure call (RPC), as latency. Optimism is a technique that can be used to avoid this latency. By optimistically assuming the behavior of the remote process executing the RPC, the calling process can proceed before the RPC is complete. If the remote process behaves as expected, then the latency of the RPC has been successfully avoided. If the remote process does not behave as expected, then the calling process must be rolled back and re-executed using the actual behavior instead of the assumed behavior. Avoiding RPC latency is a special case of the general technique of using optimistic assumptions to avoid latency by increasing concurrency.

Optimism increases concurrency by making an assumption about a future state, and verifying the assumption in parallel with computations based on the optimistic assumption. Any kind of assumption can be made, as long as a reliable method exists for verifying that the assumption was correct. Sometimes the new concurrency is quite obvious, such as an optimistic assumption that a concurrency lock will be granted. Sometimes it is subtle, such as the concurrency introduced between the volatile and stable-storage components of a fault-tolerant application. A program increases its concurrency by making an optimistic assumption about its future state and verifying the assumption in parallel with computations based on the optimistic assumption [10]. If a computation proceeds based on an optimistic assumption and that assumption is shown to be incorrect, then all computations predicated on that assumption must be rolled back to correct for the incorrect assumption. If, during the optimistic computation, process pi sends a message to process pj, then pj's subsequent computation becomes optimistic. If pi is forced to roll back, then pj must also roll back to undo the events triggered by the messages; otherwise an inconsistent state is produced.

Optimism has been used in various areas to enhance performance [6, 15, 24, 26]. However, optimism is mostly embedded inside systems, and not exposed to the applications programmer. Optimism is not often used in applications because optimistic programs are difficult to write. Whenever an optimistic assumption is made, all of its causal descendants (i.e., all computations dependent on the optimistic assumption) must be tracked, and rolled back if the assumption proves false. Verifying that an optimistic algorithm correctly tracks and rolls back all dependents of an optimistic assumption is tedious, at best, without automatic assistance.

* Supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and ARPA.
We believe that research into optimistic algorithms has been hindered by the lack of adequate automatic assistance that allows the application programmer to make optimistic assumptions and not worry about the details of the dependency tracking, checkpointing, and rollback. This paper presents HOPE (Hopefully Optimistic Programming Environment): a programming model for expressing optimism. HOPE provides primitives for specifying an optimistic assumption, and then later affirming or denying the optimistic assumption, possibly in parallel with ongoing computations that depend on the optimistic assumption. The primitives are general, in that any optimistic assumption can be made, and any user-programmed criteria can be used in deciding whether the optimistic assumption was correct. The verification criteria can also be selected at run time. Furthermore, affirmation and denial of optimistic assumptions, as well as making further optimistic assumptions, can all be performed by computations that are themselves optimistic.

In previous work, we have defined the HOPE programming model and its applicability [6, 10, 22]. We have also constructed a prototype HOPE system [7, 8, 11]. This paper presents the formal semantics of the HOPE programming model. The rest of this paper is organized as follows. Section 2 describes related work. Section 3 informally describes our programming model for expressing optimism, and provides an example program. Section 4 describes the notation that we use in Section 5 to formally define the semantics of the HOPE primitives. Section 6 presents some theorems that prove that what one would intuitively expect from the primitives is indeed provided. Finally, Section 7 presents our conclusions and future research.
2 RELATED WORK

Use of optimism has largely been limited to embedded systems. For instance, numerous optimistic recovery protocols have been designed [24, 18, 19], and a few have even been implemented [14]. These protocols allow separate components of a distributed system to asynchronously checkpoint their state while retaining the ability to recover the whole system to a consistent state. The basic mechanism is to optimistically assume that the sender of a message will checkpoint its state to stable storage before failure at that node occurs. Various dependency tracking schemes are used to recover to a consistent state in such an environment. HOPE subsumes these systems, because HOPE allows any optimistic assumption to be made, rather than the single non-failure assumption.

There has been a small amount of work on systems supporting optimistic programming [2, 3, 17, 20, 23, 25]. However, previous work has either restricted the type of optimistic assumption that can be made, or restricted the scope of optimistic computation [8]. In [25], computation based on an optimistic assumption is limited to the scope of an if or while statement. Randell [23] provides a similar statically scoped construct. In [2, 3], computation based on an optimistic assumption is limited to the scope of a previously defined encapsulation. All encapsulations must be defined ahead of time. This means that dependency tracking is not necessary, but it also means that the range of computation based on an optimistic assumption is statically bound. In Time Warp [17], on the other hand, the amount of computation based on an optimistic assumption is not statically bound.
However, only one kind of optimistic assumption can be made: that messages arrive at each process in time-stamp order, reflecting a presumption of a globally synchronized clock. Other optimistic assumptions must be re-cast in terms of message arrival order. HOPE can specify any optimistic assumption, including message arrival order.
3 THE HOPE OPTIMISTIC PROGRAMMING MODEL

HOPE is a set of primitives for expressing optimism, but is not a complete programming language. Rather, it is a programming model for optimism, embodied as a set of primitives designed to be embedded in some other programming language, similar to Linda [4]. There are very few restrictions on the kinds of distributed systems in which HOPE can be embedded. HOPE can be embedded in any system providing concurrent processes that communicate with messages.

Consider a distributed program composed of communicating sequential processes, p1, p2, ..., pn, that execute operations that cause events that change the state of a process. A computation is a sequence of consecutive states that have occurred in the execution of a process. Rollback returns a process to a previous state in its computation and discards the computation subsequent to that state. An optimistic assumption is an assertion about a future state that has yet to be verified. An optimistic (or speculative) computation is a computation that proceeds based on an optimistic assumption and is said to be dependent on that assumption. If the assumption is found to be true, then the optimistic computation is retained; otherwise it is rolled back. HOPE consists of one data type and four primitives, as follows:
AID x        x is an assumption identifier, or AID, used to identify
             particular optimistic assumptions.
guess(x)     The executing process makes an assumption identified by x.
             guess speculatively returns True immediately, and returns
             False if rolled back.
affirm(x)    The executing process asserts that the optimistic assumption
             identified by x is confirmed as true.
deny(x)      The executing process asserts that the optimistic assumption
             identified by x is found to be false.
free_of(x)   The executing process asserts that the current computation
             is not, and never will be, dependent on the assumption
             identified by x.

The AID is a significant novel feature of HOPE. An AID is a reference to an optimistic assumption. Using the primitives described here, dependence, precedence, and confirmation of an assumption can all be handled separately. guess(x) appears to the programmer to be a boolean function that returns True if the assumption identified by x is correct, and returns False if x's assumption is found to be incorrect. The AID is an abstraction that represents the optimistic assumption, and can be used to affirm or deny the optimistic computation. guess(x) will return True immediately, regardless of the status of the assumption. Speculative computation begins at this point, with the process "dependent" on x. Based on the returned value, the process then proceeds based on the optimistic assumption being true. If x's assumption is later discovered to be false, the process is rolled back to where it called guess(x), and False is returned instead of True. Having been informed through the return code that the assumption was incorrect, the process can then take the necessary steps to deal with the flawed assumption. Idiomatically, guess(x) is embedded in an if statement. The "true" branch of the if statement contains the optimistic algorithm, and the "false" branch of the if statement contains the pessimistic algorithm. aid_init(x) is used to initialize x ahead of time, so that a checking mechanism can be set up to verify x's assumption.[1]

    /* Worker Process */
    line = call print("Total is ", total);    /* S1 -- RPC */
    if (line > PageSize) {
        call newpage();                       /* S2 -- RPC */
    }
    call print("Summary ...");                /* S3 -- RPC */
    /* ... end process */

    Figure 1: Before Call Streaming Transformation

affirm(x) asserts that the assumption associated with the AID x is correct. Similarly, deny(x) asserts that the assumption associated with x is incorrect. If affirm(x) is executed anywhere in the system, all the speculative computations executed from guess(x) onward are retained. If deny(x) is executed anywhere in the system, the computations from guess(x) onward, including any causal descendants in other processes, are discarded and rolled back, and execution re-starts from guess(x) with a return code of False instead of True. There is no restriction on how much computation can be executed before an optimistic assumption is confirmed. There is no restriction on which process in the program may confirm an optimistic assumption. Only one affirm or deny primitive may be applied to a given assumption identifier, because multiple affirm or deny primitives are redundant, and conflicting affirm and deny primitives have no meaning. Speculative processes can execute affirm and deny primitives, and the system will transitively apply the assertions; i.e., if a speculative process is made definite, then all affirm primitives it has executed will have the same effect as definite affirm primitives. In summary, a guess(x) eventually either results in the execution of an affirm(x), and guess returns True, or of a deny(x), and guess returns False.

In addition to explicit guess primitives, processes can also become dependent on AIDs by exchanging messages. When a speculative process sends a message, the message is "tagged" with the set of AIDs that the sender currently depends on. When the message is received, the receiver implicitly applies a guess primitive to each of the AIDs in the message's tag. Execution of the free_of(x) statement means that the executing task has no causal dependencies on any event in the speculative executions dependent on x.
If any such dependency is ever detected, then x is denied. Asserting free_of(x) ensures an execution in which the asserting process is causally free of all events dependent on the AID x.
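As a concrete illustration, the message-tagging rule above can be sketched in a few lines. This is our own illustrative model (the dictionary-based message and these function names are invented for the sketch; they are not the HOPE API):

```python
# Illustrative model of dependency tagging: a speculative sender tags each
# message with its current AID set, and the receiver implicitly "guesses"
# every AID in the tag.

def send(sender_deps, payload):
    """Tag an outgoing message with the sender's current AID set."""
    return {"payload": payload, "tag": set(sender_deps)}

def receive(receiver_deps, message):
    """Receiving a tagged message implicitly guesses every AID in the
    tag, so the receiver inherits the sender's dependencies."""
    return receiver_deps | message["tag"]

worker_deps = {"PartPage"}              # Worker is speculative on PartPage
msg = send(worker_deps, "Summary ...")
remote_deps = receive(set(), msg)       # the remote process was definite
print(remote_deps)                      # it now depends on PartPage too
```

If PartPage is later denied, the remote process appears in PartPage's dependent set and is rolled back along with the Worker.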
3.1 EXAMPLE
Here we illustrate the meaning of the HOPE primitives by presenting an optimistic program and its pessimistic equivalent. The pessimistic program in this example specifies the meaning of the optimistic constructs being illustrated. Recall that if a computation proceeds based on an assumption that is shown to be incorrect, then all computations predicated on that assumption must be rolled back, including the computations of any process that received a message from the speculative computation.

Let S1, S2 be two sequential operations in process P, where both are remote procedure calls (RPCs). Because RPCs are synchronous, the calling process waits idle until a response is received from the remote machine. Even very fast networks do not significantly reduce this idleness. For example, the time required to send a photon from New York to Los Angeles and back again is 30 milliseconds. A transcontinental 100 Mb/s fiber-optic channel is capable of sending 100-byte packets 100,000 times per second, but is only capable of sending that 100-byte packet about 30 times per second if each transmission waits for a response. A 100 MIPS CPU can execute over 3 million instructions while waiting for a response from the opposite coast.

We can avoid this latency by executing S1 in parallel with S2, i.e., transforming the synchronous RPCs into asynchronous messages. If S2 does not depend on any results from S1, then S1 and S2 are completely independent and it is easy to execute S1 and S2 concurrently. However, what if S1 and S2 are not independent? The execution of S2 may be a function of the response of the RPC done by S1. For example, Figure 1 shows a program fragment in which S1 is an RPC that prints a summary total and returns the current line number of the page. S2 takes the line number and checks to see if the line number now exceeds the page size. If it does, then S2 creates a new page; otherwise execution can immediately proceed to S3. Bacon and Strom [1] present an algorithm for optimistically parallelizing two such statements.

    /* Worker Process */
    aid_t PartPage, Order;
    PartPage = aid_init();
    Order = aid_init();
    send(WorryWart, PartPage, Order, total);
    if (guess(PartPage)) {
        /* do nothing */                      /* S2 */
    } else {
        call newpage();                       /* S2 -- RPC */
    }
    guess(Order);
    call print("Summary ...");                /* S3 -- RPC */
    /* ... end process */

    /* WorryWart Process(PartPage, total) */
    aid_t PartPage, Order;
    receive(PartPage, Order, total);
    line = call print("Total is ", total);    /* S1 -- RPC */
    free_of(Order);
    if (line < PageSize) {
        affirm(PartPage);
    } else {
        deny(PartPage);
    }
    /* ... end process */

    Figure 2: After Call Streaming Transformation

[1] Although guess is applicable in modeling non-deterministic algorithms, it is a subjunctive statement, not a non-deterministic statement.
We can still parallelize S1 and S2 (and hence the statements after S2) by making the (likely) optimistic assumption that the report does not end exactly at the bottom of the page, i.e., line < PageSize. Figure 2 shows how: S1 is executed in the WorryWart process while S2 (and the statements after S2) is executed in the Worker process. The Worker process concurrently executes S1 by spawning the WorryWart process. The Worker process executes guess(PartPage), and based on the optimistic True return code, proceeds to execute S2 and S3 as if the line count were in fact less than PageSize, and prints "Summary..." without forcing a new page. However, the assumption that line < PageSize has yet to be verified, and the computation in the Worker process is now speculative. If line < PageSize is not valid, then deny(PartPage) is executed. This causes the Worker process to roll back to the point at which the guess primitive was executed. Any processes that Worker sent a message to while speculative are also rolled back. The Worker process now resumes execution with a value of False returned from guess(PartPage). In this example, this means that the Worker knows that the line value exceeded PageSize, and so calls newpage(). The execution of S2 and the statements after S2 must not interfere with S1's execution. S3's message may arrive at the remote machine ahead of the message from S1 in the WorryWart process. The remote process becomes speculative and is dependent on the assumption identifier Order, and by transitivity the WorryWart process becomes dependent on Order. Because S3 changes the line number, S1's test is invalidated. The free_of(Order) primitive is used to detect this causality violation and force rollbacks to solve the problem.
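The latency figures quoted above are easy to check. A small calculation under the stated idealized assumptions (30 ms round trip, 100 Mb/s channel, 100-byte packets, 100 MIPS CPU) yields roughly the numbers in the text: about 125,000 packets per second when streaming (the text rounds to 100,000), about 33 per second when each packet waits for a reply (rounded to 30), and 3 million instructions idle per round trip:

```python
# Back-of-the-envelope check of the latency figures quoted in the text
# (idealized assumed numbers: 30 ms coast-to-coast round trip, a 100 Mb/s
# channel, 100-byte packets, and a 100 MIPS CPU).

rtt_ms = 30                              # photon round trip, NY <-> LA
bandwidth_bps = 100e6                    # 100 Mb/s fiber-optic channel
packet_bits = 100 * 8                    # one 100-byte packet
mips = 100e6                             # instructions per second

streamed = bandwidth_bps / packet_bits   # packets/s if we never wait
stop_and_wait = 1000 / rtt_ms            # packets/s if each waits for a reply
idle_instructions = mips * rtt_ms / 1000 # instructions wasted per RPC

print(int(streamed), round(stop_and_wait, 1), int(idle_instructions))
```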
4 PRELIMINARIES

We use an operational approach for defining the semantics of the HOPE constructs. The central aim in this approach is to define an abstract machine for interpreting programs of the subject language. The machine interprets a program by passing through a sequence of discrete states. This approach requires that the structure of states and the allowable transitions from one state to another be specifically defined. Our purpose here is to formally define the meaning of our new computational model, not to break ground in techniques for formal specifications. Thus we use this basic approach because it is amenable to the problem of defining the behaviour of our optimistic primitives, which primarily operate on process states.

The approach taken here is that a distributed program, Prog, consisting of a collection of communicating sequential processes, P, Q, ..., is a generator of execution sequences or histories. Each process P generates an execution sequence of process states. We now define the components of the abstract machine, starting with the variables and states associated with a process P. Each process P consists of the following components:

V_i    A finite set of state variables. Some of these variables represent data variables, which are explicitly manipulated by the program text. Other variables are control variables, which represent, for example, the location of control in P, and will be denoted by PC. Three other variables of significance in this work are G, I, and IS, which will be discussed later.

Σ_i    A set of states. Each state S ∈ Σ_i is an interpretation of V_i, assigning to each variable in V_i a value over its domain.

The following gives a formal definition of the execution history (sequence) of a process P.
Definition 4.1 An execution history or sequence for process P is a finite or infinite sequence of states separated by events that alter the states: H_P: S0 E0 S1 E1 S2 E2 .... Each event is the execution of a statement by the process P.

We will now identify relevant data and control variables. Each assumption identifier is a data variable representing an assumption. Any process in the system can apply HOPE primitives to any assumption identifier. Each assumption identifier is associated with a tuple of control variables. We represent an assumption identifier as follows:

Definition 4.2 An assumption identifier X is associated with a control variable X.DOM (Depends On Me).

The control variable DOM is used for dependency tracking. It is invisible to the programmer in the same sense that program counters are invisible. The usage of these control variables will become clearer in the next section. An interval is a subsequence of an execution history that corresponds to the smallest granularity of rollback that may occur; i.e., a process is rolled back one or more intervals. An interval is said to be speculative if that interval may be rolled back; otherwise, the interval is said to be definite. We use the control variables I and IS to represent the current interval that the state of a process is in, and the set of intervals in a process's history that are speculative, respectively. Formally, we define intervals as follows:

Definition 4.3 A guess point in an execution history H_P is an event Ei representing the execution of a guess primitive.

Definition 4.4 An interval, in process P, is defined as a subsequence of states in H_P between two consecutive guess points, between the start of the process and the first guess point, or between the last guess point and the current state of the process. An interval A is associated with the tuple of control variables A.PS (Previous State), A.IDO (I Depend On), A.IHD (I Have Denied), A.PID (Process ID).
The control variables are considered part of the state of the process, but are transparent to the programmer. We use control variables to denote names of intervals. Like the assumption identifier variables, the usage of these control variables should become clearer in the next section. Basically, these control variables are used for the dependency tracking associated with computations. Again, all these control variables are transparent to the programmer. We will use A, B, C to denote intervals and X, Y, Z to denote assumption identifiers. The following definition of dependence is useful for discussing the relationships between intervals and assumption identifiers:

Definition 4.5 An interval A depends on an assumption identifier X if the interval A is made definite only if the assumption identifier X is definitely affirmed.

We will now define the set and sequence operations that will be used. Some of the control variables (as will be seen in the next section) are sets. All set operations are allowed. With respect to the execution sequences, we require the following: a concatenation operation (·) for adding a state to an execution sequence, and an operation last(H_P) used to denote the current state of H_P. Although we may have infinitely long execution sequences, at any one time there is exactly one current state. Del(H_P, A) deletes the interval A from the sequence H_P. Theorem 5.1 will show that this deletion only applies to a suffix of an execution sequence ending with the current state.
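To make Definitions 4.1 through 4.4 concrete, a history can be pictured as an alternating list of states and events, with guess points splitting it into intervals. The list representation below is our own illustration, not part of the formal machinery:

```python
# Concrete picture of Definitions 4.1-4.4: a history is an alternating
# list of states and events, and guess events split it into intervals.

history = ["S0", "guess(X)", "S1", "send(m)", "S2", "guess(Y)", "S3"]

def intervals(hist):
    """Split a history into intervals at its guess points."""
    result, current = [], []
    for item in hist:
        if item.startswith("guess"):    # a guess point (Definition 4.3)
            result.append(current)      # close the current interval
            current = []
        else:
            current.append(item)
    result.append(current)              # last guess point .. current state
    return result

print(intervals(history))
```

Here the middle interval would be rolled back as a unit if the assumption made at guess(X) were denied, since intervals are the smallest granularity of rollback.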
5 SEMANTICS OF THE HOPE CONSTRUCTS

This section defines the HOPE constructs in terms of the abstract machine described in the previous section by defining the effects of each HOPE primitive on the execution sequences. All sequences are initially null and all sets are initially empty.
5.1 GUESS
Process P executing a guess(X) primitive in state S_i asserts an optimistic assumption and associates it with the assumption identifier X. The creation of the new interval, A, requires that a checkpoint be taken of the current state, including the program counter. This is represented as follows:

    A.PS ← S_i                                  (1)
    A.PID ← P                                   (2)

We should note that the process identifier (in PID) is also recorded. The recording of the process identifier is a naming convenience. The interval A is speculative, and is dependent on the optimistic assumption associated with the assumption identifier X. In other words, the interval A is rolled back if the optimistic assumption associated with X is discovered to be false. If the interval preceding A is speculative, then the interval A is also dependent on the optimistic assumptions that the preceding interval depends on. For interval A, the set A.IDO is used to track A's dependencies. At the beginning of interval A (i.e., when the guess primitive is executed), the dependencies are as follows:

    A.IDO ← (S_i.I).IDO ∪ {X}                   (3)

Associated with each assumption identifier X is the set X.DOM that tracks the intervals that are dependent on X. Thus X records its new dependent interval A as follows:

    X.DOM ← X.DOM ∪ {A}                         (4)

The state S_{i+1} is constructed from S_i as follows:

    S_{i+1} ← S_i
    S_{i+1}.I ← A
    S_{i+1}.IS ← S_{i+1}.IS ∪ {A}
    S_{i+1}.G ← True                            (5)

The state variable IS is the set of all speculative intervals leading to state S. The guess primitive initially returns a value of True, as recorded in S_{i+1}.G. If rollback occurs, execution resumes by resetting the PC to the point where guess was called, and returning a G value of False. Thus the effect of the guess primitive on the program counter is merely to increment it to the next operation. The execution sequence associated with P is updated as follows:

    H_P ← H_P · S_{i+1}                         (6)
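The transition rules of Equations 1 through 6 read as straight-line assignments. The sketch below mirrors them in executable form; the classes are our own scaffolding, and only the field names (PS, PID, IDO, DOM, I, IS, G) come from the text. It is an illustration, not the HOPE prototype:

```python
# Executable sketch of the guess(X) transition (Equations 1-6).

class AID:
    """Assumption identifier (Definition 4.2)."""
    def __init__(self):
        self.DOM = set()                    # intervals that Depend On Me

class Interval:
    """Interval control variables (Definition 4.4)."""
    def __init__(self):
        self.PS = None                      # checkpointed Previous State
        self.PID = None                     # Process ID
        self.IDO = set()                    # AIDs I Depend On
        self.IHD = set()                    # AIDs I Have Denied

class State:
    def __init__(self, I=None, IS=(), G=None):
        self.I = I                          # current interval
        self.IS = set(IS)                   # speculative intervals so far
        self.G = G                          # value returned by guess

def guess(H, P, X):
    Si = H[-1]
    A = Interval()
    A.PS = Si                                           # (1)
    A.PID = P                                           # (2)
    prev = Si.I.IDO if Si.I is not None else set()      # empty if definite
    A.IDO = prev | {X}                                  # (3)
    X.DOM = X.DOM | {A}                                 # (4)
    Snext = State(I=A, IS=Si.IS | {A}, G=True)          # (5)
    H.append(Snext)                                     # (6): H_P <- H_P . S
    return A

X = AID()
H = [State()]                               # the process starts definite
A = guess(H, "P", X)
print(H[-1].G, X in A.IDO, A in X.DOM, A in H[-1].IS)
```

Note how Equation 3 makes a nested guess inherit the enclosing interval's dependencies, which is what makes speculation on top of speculation work.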
5.2 AFFIRM

The affirm(X) primitive asserts that the optimistic assumption associated with X has been determined to be true. There are two cases. In the first case, the affirm(X) was executed in process P in state S_i where S_i.I = ∅. Therefore, the execution of the affirm(X) cannot be undone (definite affirm). Thus, all the intervals that depend on X can try to decide whether to become definite, as follows:

    If S_i.I = ∅ then ∀B ∈ X.DOM:
        S ← last(H_{B.PID}); ((S.IS).B).IDO ← B.IDO \ {X};
        H_{B.PID} ← H_{B.PID} · S               (7)
        If B.IDO = ∅ then finalize(B)           (8)
        X.DOM ← X.DOM \ {B}                     (9)

finalize(B) transforms B from a speculative to a definite interval. finalize is not a part of the user's programming model; it is used here as shorthand for the effects described in Section 5.5.

In the second case, an affirm(X) was executed in a process P in state S_i where S_i.I ≠ ∅; let A = S_i.I. Interval A may be rolled back, and thus the execution of affirm(X) may be undone. This is called a speculative affirm. Each of the assumption identifiers that interval A depends on must keep track of the intervals that depend on X. This is reflected as follows:

    ∀Y ∈ A.IDO:
        Y.DOM ← Y.DOM ∪ X.DOM                   (10)

All intervals dependent on X are now dependent on the assumption identifiers that the interval A depends on:

    If A.IDO ≠ ∅ then ∀B ∈ X.DOM:               (11)
        S ← last(H_{B.PID});
        ((S.IS).B).IDO ← (B.IDO ∪ A.IDO) \ {X};
        H_{B.PID} ← H_{B.PID} · S               (12)
        If B.IDO = ∅ then finalize(B)           (13)
        X.DOM ← X.DOM \ {B}                     (14)

If A is dependent on only X at the time A asserts affirm(X) (i.e., X.DOM = {A}; called a self affirm), then Equation 12 will cause A.IDO (and possibly other IDO sets) to become null, in which case Equation 14 produces results similar to a definite affirm.

Only one affirm or deny primitive may be applied to a given assumption identifier. Multiple affirm primitives are redundant: once affirmed, it's affirmed. Similarly, multiple deny primitives are redundant. Conflicting affirm and deny primitives have no meaning: an assumption cannot be both true and false. To eliminate this kind of confusing and conflicting notation, applying more than one affirm or deny primitive to a single assumption identifier, in any combination, is a user error, and the meaning is undefined.
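A definite affirm is easy to trace by hand. The sketch below walks Equations 7 through 9 on a tiny example with two dependent intervals, using plain dictionaries for the IDO and DOM sets and a stubbed finalize that only records which intervals became definite. This is our own simplification, not the HOPE code:

```python
# Hand-traceable sketch of a definite affirm(X) (Equations 7-9).

IDO = {"A": {"X"}, "B": {"X", "Y"}}     # interval -> AIDs it depends on
DOM = {"X": {"A", "B"}, "Y": {"B"}}     # AID -> intervals depending on it
finalized = []

def affirm_definite(X):
    for B in sorted(DOM[X]):            # forall B in X.DOM
        IDO[B].discard(X)               # B.IDO <- B.IDO \ {X}      (7)
        if not IDO[B]:                  # if B.IDO = empty then
            finalized.append(B)         #     finalize(B)           (8)
    DOM[X] = set()                      # X.DOM <- X.DOM \ {B}      (9)

affirm_definite("X")
print(finalized, IDO["B"])              # A is now definite; B awaits Y
```

Interval A, which depended only on X, becomes definite; interval B merely sheds its dependence on X and remains speculative until Y is resolved.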
5.3 DENY

The deny(X) primitive asserts that the optimistic assumption associated with X has been determined to be false. There are two cases. In the first case, the deny(X) was executed in process P in state S_i of interval A where S_i.I = ∅ or X ∈ A.IDO. This implies that P is not dependent on any assumption identifiers other than perhaps X. Therefore, the execution of the deny(X) cannot be undone by another process (definite deny). Thus, all the intervals that depend on X must roll back. This is expressed as follows:

    If S_i.I = ∅ ∨ X ∈ A.IDO then
        ∀B ∈ X.DOM: rollback(B)                 (15)

In the second case, a deny(X) in an interval A dependent on assumption identifiers other than X implies that A may be rolled back; thus deny(X) may be undone. This is called a speculative deny. The set IHD records assumption identifiers that have been denied, to be applied when the interval is made definite:

    If S_i.I ≠ ∅ ∧ X ∉ A.IDO then
        S_{i+1} ← last(H_P);
        ((S_{i+1}.IS).A).IHD ← A.IHD ∪ {X};
        H_P ← H_P · S_{i+1}                     (16)

deny(X) becomes definite when the interval A is made definite, i.e., when A.IDO = ∅.
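The two cases of deny can likewise be sketched in executable form: a definite deny rolls back every interval that depends on X, while a speculative deny is only recorded in the denier's IHD set, to take effect at finalize time. The dicts, the `definite` flag (standing for the S_i.I = ∅ test), and the rollback stub are our own simplification:

```python
# Sketch of the two deny(X) cases (Equations 15-16).

DOM = {"X": {"B", "C"}}                 # AID -> intervals depending on it
IDO = {"A": {"Y"}}                      # the denier A depends only on Y
IHD = {"A": set()}                      # AIDs A has speculatively denied
rolled_back = []

def deny(A, X, definite):
    if definite or X in IDO[A]:         # definite deny             (15)
        for B in sorted(DOM[X]):
            rolled_back.append(B)       # rollback(B)
    else:                               # speculative deny          (16)
        IHD[A].add(X)                   # A.IHD <- A.IHD u {X}

deny("A", "X", definite=False)          # A depends on Y, not X: speculative
print(rolled_back, IHD["A"])            # nothing rolled back yet
```

Only when A is later made definite (or itself issues a definite deny) do the intervals B and C actually roll back.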
5.4 FREE_OF

The execution of a free_of(X) primitive in process P in state S_i of interval A means that interval A must not depend on the optimistic assumption associated with the assumption identifier X. The execution of free_of(X) in an interval A inspects the set A.IDO for the assumption identifier X. If X is not found, then the requirement is satisfied and the optimistic assumption associated with X has been confirmed: the specified ordering constraint was not violated, and the equivalent of an affirm(X) is executed. However, if the assumption identifier X is found in A.IDO, then the ordering constraint associated with X was violated, and the equivalent of a deny(X) is executed. The formal specification is as follows:

    If S_i.I = ∅ then affirm(X)                 (17)
    Else if X ∉ A.IDO then affirm(X)            (18)
    Else deny(X)                                (19)

Like affirm and deny, free_of "consumes" its argument: it is an error for more than one affirm, deny, or free_of primitive to be applied to the same assumption identifier. If S_i.I = ∅ then the interval does not depend on any assumption identifiers and the execution of affirm(X) is definite. If the interval does depend on assumption identifiers, but none of them is X, then the execution of affirm(X) is speculative. If the interval depends on X, then the free_of condition has been violated and the execution of deny(X) is speculative.
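Equations 17 through 19 reduce free_of(X) to a choice between affirm and deny. A minimal executable rendering (our own shorthand: the current interval's dependency set is passed in directly, with None standing for S_i.I = ∅):

```python
# Minimal rendering of Equations 17-19: free_of(X) reduces to affirm/deny.

def free_of(ido, X):
    """Return the primitive that free_of(X) is equivalent to."""
    if ido is None:                     # S_i.I = empty: definite affirm (17)
        return "affirm"
    if X not in ido:                    # speculative affirm             (18)
        return "affirm"
    return "deny"                       # ordering violated: deny        (19)

print(free_of(None, "X"),               # the process is definite
      free_of({"Y"}, "X"),              # speculative, but not on X
      free_of({"X", "Y"}, "X"))         # causality violation detected
```

In the Figure 2 example, the WorryWart's free_of(Order) takes the third branch exactly when S3's message overtook S1 and made the WorryWart dependent on Order.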
5.5 FINALIZE

Finalizing interval A makes A a permanent part of the history H_{A.PID} by transforming A from a speculative to a definite interval. Finalizing A has the precondition:

    A.IDO = ∅                                   (20)

The effect of finalize(A) on process A.PID is as follows:

    S ← last(H_{A.PID}); S.IS ← S.IS \ {A};
    H_{A.PID} ← H_{A.PID} · S                   (21)

    ∀X ∈ A.IHD: (∀B ∈ X.DOM: rollback(B))       (22)

    If last(H_{A.PID}).IS = ∅ then
        S ← last(H_{A.PID}); S.I ← ∅;
        H_{A.PID} ← H_{A.PID} · S               (23)

Speculative executions of HOPE primitives now become definite. In particular, the speculative execution of any deny(X) primitives now becomes definite, and so all of the intervals dependent on the assumption identifiers in A.IHD are rolled back by Equation 22. If A was the last and only interval in the history of process A.PID to date, then A.PID no longer has a current interval, and becomes definite as specified by Equation 23.
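The finalize step can be sketched on a small example: A leaves the speculative set IS, and any speculative denies recorded in A.IHD now take definite effect, rolling back every interval that depends on a denied AID. The dicts and the rollback stub are again our own simplification:

```python
# Sketch of finalize(A) (Equations 20-23).

IS = {"A", "B"}                         # speculative intervals of the process
IHD = {"A": {"X"}}                      # A speculatively denied X
DOM = {"X": {"C"}}                      # C depends on the denied AID X
rolled_back = []

def finalize(A, IDO_A):
    assert not IDO_A                    # precondition: A.IDO = empty   (20)
    IS.discard(A)                       # S.IS <- S.IS \ {A}            (21)
    for X in sorted(IHD.get(A, ())):    # speculative denies now definite
        for B in sorted(DOM[X]):
            rolled_back.append(B)       # rollback(B)                   (22)

finalize("A", set())
print(IS, rolled_back)                  # B still speculative; C rolled back
```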
5.6 ROLLBACK
Rolling back interval A truncates the history of A.PID immediately before the start of A, and all subsequent states in the history are discarded. Process A.PID then resumes from the guess primitive, but returning False instead of True:

H_{A.PID} ← Del(H_{A.PID}, A); S ← A.PS; S.G ← False; H_{A.PID} ← H_{A.PID} · S   (24)
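Equation 24 can be modeled on a toy history as truncate-then-resume. The data structures below are illustrative stand-ins (a history as a list of state names), not the HOPE prototype's representation.

```python
def rollback(history, a, pre_state):
    """Model of Equation 24: drop interval `a` and every later state
    (Del(H, A)), then append the resumed pre-state; re-execution of the
    guess primitive now returns False."""
    cut = history.index(a)                  # position of A in the history
    return history[:cut] + [pre_state], False

history = ["S0", "A", "B", "C"]
print(rollback(history, "A", "S0'"))        # (['S0', "S0'"], False)
```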
In addition, speculative executions of affirm and deny primitives must be undone. Speculative executions of deny primitives are trivially undone because they are never applied: they die with the interval inside the IHD set. Rollback of a speculative affirm(X) is considered equivalent to a deny(X).² Since a speculative affirm(X) by A added all of A.IDO to all of the intervals that depend on X, all of those intervals will also be rolled back as a result of the definite deny that caused the rollback of A, and so no further action is required to undo the effects of a speculative affirm on other intervals' IDO sets. The only remaining set manipulations to consider are the changes imposed by free_of primitives. However, the free_of primitive's effects are specified entirely in terms of speculative and definite affirm and deny primitives, and so no special treatment is required. The following lemma shows that for all intervals and assumption identifiers, if an interval A records that it is dependent on an assumption identifier X, then the assumption identifier X also records that A is dependent on X. This lemma is used in proving subsequent theorems.
Lemma 5.1 For all intervals A and assumption identifiers X, X ∈ A.IDO if and only if A ∈ X.DOM.
² Applying deny to roll back a speculative affirm is a conservative approximation, because it is difficult to roll back the effects of a speculative affirm while preserving the full expressiveness of HOPE. We hope to refine this approximation in future work.
Proof: This proof is based on demonstrating that if an assumption identifier X is inserted into A.IDO, then A is also inserted into X.DOM, and that if X is removed from A.IDO, then A is also removed from X.DOM. Equations 3 and 12 are the only operations where assumption identifiers are inserted into IDO sets. In the first case, Equation 3 inserts assumption identifier X into A.IDO, and symmetrically Equation 4 inserts A into X.DOM. In the second case, Equation 10 inserts all of the intervals that depend on X into the DOM sets of all of the assumption identifiers that A depends on, and symmetrically Equation 12 inserts all of the assumption identifiers that A depends on into the IDO sets of all of the intervals that depend on X. Thus, in all cases, if an assumption identifier is inserted into an interval's IDO set, then the interval is inserted into the assumption identifier's DOM set. Similarly, Equations 7 and 12 are the only two operations that remove assumption identifiers from IDO sets. In both cases, assumption identifier X is removed from B.IDO, and interval B is symmetrically removed from X.DOM. Therefore, since in all cases insertion into an IDO set symmetrically necessitates insertion into a DOM set, and removal from an IDO set symmetrically necessitates removal from a DOM set, we have for all intervals A and assumption identifiers X, X ∈ A.IDO if and only if A ∈ X.DOM. □

The rollback of an interval A implies that all intervals occurring after A are also rolled back; therefore rollback truncates history. This is shown by the following theorem:
Theorem 5.1 If an interval A is rolled back in a process P, then for all intervals B, where B occurs after A in H_P, B is also rolled back.
Proof: We will first show that A.IDO ⊆ B.IDO for all intervals B that occur after interval A in H_P. There are two cases to consider, corresponding to the two kinds of IDO updates: the first is the initial creation of an interval, and the second is the effect of speculative execution of affirm primitives. The first case is handled by induction on the number of intervals that occur after interval A. By definition, when the first interval B immediately following A is created, B.IDO = A.IDO ∪ {X}, where X is the assumption identifier in the guess that created B. We can immediately conclude that A.IDO ⊆ B.IDO. Assume that the first k intervals to occur after A in H_P are such that A.IDO is a subset of the IDO set associated with each of these intervals. Let B′ be the kth interval and B be the (k+1)th interval. By definition, when interval B is created, its associated IDO set is B′.IDO ∪ {X}. Thus B′.IDO ⊆ B.IDO. By the inductive hypothesis, A.IDO ⊆ B′.IDO, so we can immediately conclude that A.IDO ⊆ B.IDO. However, speculative affirm primitives also cause changes in intervals' IDO sets, so we need to show that A.IDO ⊆ B.IDO continues to hold. By Lemma 5.1, if an assumption identifier X is in both A.IDO and B.IDO, then both A and B are in X.DOM. Thus, all of the other operations that affect IDO sets (Equations 7 and 12), which always use DOM sets to select the interval IDO sets to manipulate, will affect A and all of the intervals that follow A in H_P in the same way. Thus A.IDO ⊆ B.IDO holds for all intervals B that follow A in H_P. By definition, the only primitive that can cause A to be rolled back is a definite deny of some assumption identifier X ∈ A.IDO. Since A.IDO ⊆ B.IDO for all B that follow A in H_P, we have X ∈ B.IDO for all B that follow A in H_P. Therefore, the same definite deny(X) will also cause all following intervals to be rolled back, and so rollback of A results in a truncation of H_P. □

The following theorem shows that once an interval's IDO set becomes empty, it can never be rolled back.
Theorem 5.2 For any interval A, if A.IDO = ∅ then rollback(A) will never occur.

Proof: From Equations 15 and 22, an interval A can only be rolled back if A ∈ X.DOM for some assumption identifier X and a definite deny(X) has occurred. But from Lemma 5.1, if A.IDO = ∅, then there does not exist an assumption identifier X such that A ∈ X.DOM. Therefore, no deny(X) can affect A, and A cannot be rolled back. □
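Theorems 5.1 and 5.2 together can be illustrated on toy data: because A.IDO is a subset of the IDO set of every later interval in the same history, a definite deny of any identifier in A.IDO selects A and all of its successors, while an interval with an empty IDO set is immune to every deny. The names below are illustrative only.

```python
history = ["A", "B", "C"]   # intervals in creation order
ido = {"A": {"X"}, "B": {"X", "Y"}, "C": {"X", "Y", "Z"}}   # A.IDO subset of each successor's

def victims(history, ido, denied):
    """Intervals rolled back by a definite deny of `denied`."""
    return [iv for iv in history if denied in ido[iv]]

print(victims(history, ido, "X"))          # ['A', 'B', 'C']: history truncated at A
print(victims(history, ido, "Y"))          # ['B', 'C']: truncated at B
print(victims(["D"], {"D": set()}, "X"))   # []: empty IDO set, never rolled back
```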
6 SEMANTIC IMPLICATIONS
The preceding specifications describe the meaning of the HOPE primitives in detail, but we would like some assurance that they behave in a way that the programmer would intuitively expect. If affirm is asserted for all of the assumption identifiers that an interval depends upon, then the programmer expects that the interval will be made definite. Conversely, the programmer expects that the interval will be made definite only if affirm(X) is asserted for all of the assumption identifiers X that the interval depends on. The theorems presented in this section prove that these expectations are preserved by the definitions of the HOPE primitives. Lemmas 6.1 and 6.2 are combined in Theorem 6.1 to show that if all of the optimistic assumptions associated with the assumption identifiers that an interval B depends on are confirmed to be true by intervals that are themselves eventually made definite, then B will be made definite. Lemma 6.1 below shows that the effects of a speculative affirm primitive are identical to the effects of a definite affirm if the asserting interval is eventually made definite.
Lemma 6.1 Affirm Transitivity: Let B be an interval that depends on an assumption identifier X. The effect of executing affirm(X) within a speculative interval A upon B.IDO and X.DOM, followed by A eventually being made definite, is the same as the effect of a definite affirm(X).

Proof: Equations 7 and 9 define the effect of a definite affirm(X) as removing X from B.IDO, and removing interval B from X.DOM. Let A be a speculative interval that executes affirm(X) for some X ∈ B.IDO. Equation 14 immediately removes B from X.DOM. Equation 12 will replace X ∈ B.IDO with A.IDO. Let α = B.IDO at the time that A executes affirm(X), and let β = A.IDO be the set of assumption identifiers that replace X. Since the contents of β were added to α, we have β ⊆ α. From Lemma 5.1 we have that for all assumption identifiers Y ∈ β: A ∈ Y.DOM ∧ B ∈ Y.DOM. By Equations 9, 14 and 20, if finalize(A) has occurred, then A.IDO = ∅. Since all changes applied to A.IDO are also applied to B.IDO, if A.IDO = ∅ then all of the assumption identifiers from β that were added to B.IDO will also have been removed from B.IDO. Thus the effect of A executing affirm(X) followed by finalize(A) is the same as that of a definite affirm(X). □
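The substitution at the heart of this lemma (Equation 12) is a simple set rewrite: a speculative affirm(X) by interval A replaces X in each dependent interval's IDO set with A's own IDO set. The sketch below is illustrative, with hypothetical identifier names, not the HOPE prototype's implementation.

```python
def speculative_affirm(b_ido, x, a_ido):
    """Equation 12, modeled on sets: replace x in b_ido with the
    affirming interval's IDO set, so B now depends on whatever A
    depends on instead of on x."""
    return (b_ido - {x}) | a_ido

b_ido = {"X", "Y"}
print(speculative_affirm(b_ido, "X", {"W"}))   # {'Y', 'W'}
# If A is later finalized, W is removed from B.IDO as it is removed from
# A.IDO, leaving B dependent only on Y -- the same end state as a
# definite affirm(X).
```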
Lemma 6.2 proves that if a definite affirm(X) is executed on all of the assumption identifiers X that an interval B depends on, then B will become definite.
Lemma 6.2 For any interval B, if a definite affirm(X) is applied to all assumption identifiers X ∈ B.IDO, then finalize(B) will result.
Proof: Let B be an interval with an initial set of assumption identifiers α = B.IDO. Equation 7 expresses that for each X ∈ α, each definite affirm(X) will remove X from B.IDO. Equation 9 will induce finalize(B) when the last assumption identifier in α is affirmed. □

Using Lemmas 6.1 and 6.2, we can conclude that if all of the assumption identifiers that an interval B depends on are affirmed by intervals that eventually become definite, then B will become definite.

Theorem 6.1 For any interval B, if affirm(X) is applied to all assumption identifiers X ∈ B.IDO by intervals that eventually become definite, then finalize(B) will result.

Proof: Lemma 6.2 shows that if all of the affirm(X) primitives are executed by definite intervals, then finalize(B) will result. Lemma 6.1 shows that speculative affirm primitives that are eventually made definite have the same effect as definite affirm primitives, and so finalize(B) will result in either case. □

Theorem 6.2 shows that finalize(B) will occur if and only if affirm(X) is executed on all of the assumption identifiers X that interval B depends on.

Theorem 6.2 For all intervals B, finalize(B) occurs if and only if affirm(X) is applied to all of the assumption identifiers X ∈ B.IDO by intervals that eventually become definite.

Proof: From Theorem 6.1, we have that if affirm(X) is applied to all assumption identifiers X ∈ B.IDO, then finalize(B) will result. We prove that finalize(B) implies that affirm(X) has been applied to all assumption identifiers X ∈ B.IDO by contradiction. Assume that finalize(B) has occurred, and that there exists an assumption identifier X ∈ B.IDO that has not had affirm(X) executed. Since affirm(X) has not occurred, no operation will have removed X from B.IDO. Therefore B.IDO ≠ ∅, and by Equation 20, finalize(B) cannot have occurred. This contradicts the assumption that finalize(B) has occurred.
Therefore, for all intervals B, finalize(B) occurs if and only if affirm(X) is applied to all of the assumption identifiers X ∈ B.IDO by intervals that eventually become definite. □

Lemma 6.3 establishes that if an assumption identifier X is speculatively affirmed by an interval A that depends on some other assumption identifier Y, then X depends on Y.

Lemma 6.3 If an interval A depends on an assumption identifier Y and executes affirm(X), then X will be definitely affirmed only if Y is definitely affirmed.

Proof: We use proof by contradiction. Let X be definitely affirmed, and Y not be definitely affirmed. From Section 5.2, we know that only one interval can execute affirm(X). Since X was definitely affirmed, and A executed affirm(X), A must be the only interval to have executed affirm(X), and from Lemma 6.1 we conclude that A must have been made definite. If A was made definite, then from Theorem 6.2 all assumption identifiers that A depended on must have been definitely affirmed. From the statement of our lemma, one of those assumption identifiers was Y, which means that Y must have been definitely affirmed, contradicting our assumption. □

Lemma 6.3 produces Corollary 6.1, which shows that the depends-on relation between assumption identifiers is transitive:
Corollary 6.1 If an assumption identifier X depends on another assumption identifier Y, and Y depends on a third assumption identifier Z, then X depends on Z.
Proof: We use proof by contradiction. Let X be definitely affirmed, and Z not be definitely affirmed. Given that X is definitely affirmed, and that X depends on Y, Lemma 6.3 gives us that Y must be definitely affirmed. Similarly, since Y is definitely affirmed, and Y depends on Z, Lemma 6.3 gives us that Z must be definitely affirmed, contradicting our assumption that Z was not definitely affirmed. □
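The finalize precondition (Equation 20) that drives Theorem 6.2 can be checked on toy data: an interval is finalizable exactly when every assumption identifier it depended on has been removed from its IDO set by a definite affirm. Names below are illustrative only.

```python
def finalizable(ido):
    """Equation 20: an interval may be finalized only when its IDO set
    is empty, i.e. every identifier it depended on has been affirmed."""
    return len(ido) == 0

ido = {"X", "Y"}
for x in ("X", "Y"):
    ido -= {x}                 # Equation 7: definite affirm(x) removes x
print(finalizable(ido))        # True: all identifiers affirmed
print(finalizable({"Z"}))      # False: Z never affirmed, finalize cannot occur
```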
Finally, Theorem 6.3 shows that if an interval A executes free_of(X), then A.IDO does in fact remain free of X and of all other assumption identifiers Y that depend on X.

Theorem 6.3 For any interval A that executes free_of(X), either interval A never becomes dependent on assumption identifier X, or interval A is rolled back.
Proof: We use proof by contradiction. Let A be an interval such that X ∈ A.IDO, and let the statement free_of(X) be executed within interval A. By inspection, Equation 19 will execute deny(X), which will in turn apply Equation 15, which will roll back interval A. Thus if A depends on X and executes free_of(X), then A is rolled back. If A does not depend on X at the time it executes free_of(X), then A can never become dependent on X. To become dependent on X, A would have to depend on some other assumption identifier Y that was speculatively affirmed by some interval B that depended on X. However, since the speculative execution of free_of(X) resulted in a speculative affirm(X), there cannot exist another interval B that depends on X, because B's dependence on X will have been replaced with a dependence on A.IDO. Thus A can never depend on X. □
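The two arms of this guarantee can be traced on toy data (illustrative names only, not the HOPE prototype): either A already depends on X when it executes free_of(X) and is rolled back, or the speculative affirm(X) consumes X so that no dependence on X can later be propagated into A.

```python
def run_free_of(a_ido, x):
    """Outcome of free_of(x) for an interval with IDO set a_ido,
    following the two cases of Theorem 6.3."""
    if x in a_ido:
        return "rollback(A)"            # Equation 19, then Equation 15
    return "A stays free of " + x       # Equation 18 consumes x

print(run_free_of({"X", "Y"}, "X"))     # rollback(A)
print(run_free_of({"Y"}, "X"))          # A stays free of X
```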
7 CONCLUSIONS AND FUTURE RESEARCH
In this paper, we presented a formal model for expressing optimism by defining primitives for identifying optimistic assumptions and later asserting whether the assumptions were correct, while automating the dependency tracking necessary to maintain consistency. We have also proven that some intuitively desirable behaviors of the primitives are guaranteed by their semantics.

HOPE is not just a theory: a prototype HOPE system exists [7, 8, 9], built on top of the PVM system [13]. HOPE processes are PVM tasks (UNIX processes for the most part), assumption identifiers are implemented as AID tasks, and the HOPE dependency tracking algorithms are implemented using PVM messages. Of particular note is the fact that the implementation never forces a user process to wait for a HOPE dependency tracking message before proceeding. Preliminary performance studies show that the prototype can deliver performance gains of up to 80% using the Call Streaming protocol to avoid RPC latency [11]. To facilitate practical programming, the prototype supports a richer set of dependency tracking facilities than those presented here. Dependency tracking of inter-process communications is also automated: messages passed between processes are automatically tagged with the assumption identifiers that the sender depends on, and receiving such a tagged message automatically results in the appropriate set of guess primitives being executed before the speculative message is delivered into the user-accessible state of the process. The use of such tagged messages ensures that HOPE programs remain globally consistent, even in the presence of rollback of some processes [21].

In future theoretical work, we are studying ways to specify the meaning of HOPE in a more denotational style. The operational semantics presented here were invaluable in the construction of the HOPE system, but are awkward for reasoning about optimistic programs. Future implementation work will examine ways to optimize both the HOPE dependency tracking algorithms and the checkpoint and rollback mechanism (the present checkpoint mechanism is simple and fairly portable, but not particularly efficient). The formal semantics presented here clearly define when a given process becomes dependent on a given optimistic assumption. This information could be used to optimize checkpointing, broadening the applicability of HOPE to finer-grained problems. In future applied work, we will apply HOPE to the problem of optimistic concurrency control of replicated data [6]. A local cached replica of a piece of data can greatly reduce the latency of access to that data, and optimistically assuming consistency can reduce the latency of updating replicated data. We will also extend the application of optimism beyond its traditional domains [16, 26] into new areas such as truth maintenance systems [12], numerical computation [7] and co-operative work [5].
8 ACKNOWLEDGMENTS HOPE was inspired by optimism studies at the IBM T.J. Watson Research Center. The initial inspiration for HOPE came from discussions with Robert E. Strom. Thanks go to Andy Lowry for our many constructive debates about optimism, as well as Jim Russell and Ajei Gopal for their helpful comments and suggestions. Thanks also go to Mike Bauer, Mike Bennett and Andrew Marshall for their helpful suggestions on the direction and structure of this work.
REFERENCES
[1] David F. Bacon and Robert E. Strom. Optimistic Parallelization of Communicating Sequential Processes. In Third ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, April 1991.
[2] R. G. Bubenik. Optimistic Computation. PhD thesis, Rice University, May 1990.
[3] Rick Bubenik and Willy Zwaenepoel. Semantics of Optimistic Computation. In 10th International Conference on Distributed Computing Systems, pages 20-27, 1990.
[4] N. Carriero and D. Gelernter. Linda in Context. Communications of the ACM, 32(4):444-458, April 1989.
[5] G. V. Cormack. A formalism for real-time distributed lock-free conference editing. Private communication.
[6] Crispin Cowan. Optimistic Replication in HOPE. In Proceedings of the 1992 CAS Conference, pages 269-282, Toronto, Ontario, November 1992.
[7] Crispin Cowan. Optimistic Programming in PVM. In Proceedings of the 2nd PVM User's Group Meeting, Oak Ridge, TN, May 1994.
[8] Crispin Cowan. A Programming Model for Optimism. PhD thesis, University of Western Ontario, March 1995.
[9] Crispin Cowan. HOPE: Hopefully Optimistic Programming Environment. Prototype implementation, available via FTP from ftp://ftp.csd.uwo.ca/pub/src/hope.tar.Z, January 1995.
[10] Crispin Cowan, Hanan Lutfiyya, and Mike Bauer. Increasing Concurrency Through Optimism: A Reason for HOPE. In Proceedings of the 1994 ACM Computer Science Conference, pages 218-225, Phoenix, Arizona, March 1994.
[11] Crispin Cowan, Hanan Lutfiyya, and Mike Bauer. Performance Benefits of Optimistic Programming: A Measure of HOPE. In Fourth IEEE International Symposium on High-Performance Distributed Computing (HPDC-4), August 1995.
[12] J. Doyle. A Truth Maintenance System. Artificial Intelligence, 12:231-272, 1979.
[13] Al Geist, Adam Beguelin, Jack Dongarra, Weicheng Jiang, Robert Manchek, and Vaidy Sunderam. PVM: Parallel Virtual Machine, a Users' Guide and Tutorial for Networked Parallel Computing. The MIT Press, Cambridge, Massachusetts, 1995.
[14] Arthur Goldberg, Ajei Gopal, Kong Li, Rob Strom, and David F. Bacon. Transparent Recovery of Mach Applications. In First USENIX Mach Workshop, Burlington, VT, October 1990.
[15] Arthur P. Goldberg. Optimistic Algorithms for Distributed Transparent Process Replication. PhD thesis, University of California at Los Angeles, 1991. (UCLA Tech. Report CSD-910050).
[16] D. Jefferson. Virtual Time. ACM Transactions on Programming Languages and Systems, 7(3):404-425, July 1985.
[17] D. Jefferson and A. Motro. The Time Warp Mechanism for Database Concurrency Control. Technical Report TR-84-302, University of Southern California, January 1984.
[18] D. B. Johnson and W. Zwaenepoel. Recovery in Distributed Systems using Optimistic Message Logging and Checkpointing. Journal of Algorithms, 11(3):462-491, September 1990.
[19] T. T. Y. Juang and S. Venkatesan. Efficient Algorithm for Crash Recovery in Distributed Systems. In 10th Conference on Foundations of Software Technology and Theoretical Computer Science, pages 349-361, 1990.
[20] Jonathan I. Leivent and Ronald J. Watro. Mathematical Foundations for Time Warp Systems. ACM Transactions on Programming Languages and Systems, 15(5):771-794, November 1993.
[21] Hanan Lutfiyya and Crispin Cowan. Language Support for the Application-Oriented Fault Tolerance Paradigm. To be submitted for review, 1995.
[22] Hanan Lutfiyya and Crispin Cowan. Optimistic Language Constructs. In ICSE-17 Workshop on Research Issues in the Intersection of Software Engineering and Programming Languages, Seattle, WA, April 1995.
[23] B. Randell. System Structure for Software Fault Tolerance. IEEE Transactions on Software Engineering, 1(2):226-232, June 1975.
[24] R. E. Strom and S. Yemini. Optimistic Recovery in Distributed Systems. ACM Transactions on Computer Systems, 3(3):204-226, August 1985.
[25] Thomas Strothotte. Temporal Constructs for an Algorithmic Language. PhD thesis, McGill University, 1984.
[26] P. Triantafillou and D. J. Taylor. A New Paradigm for High Availability and Efficiency in Replicated and Distributed Databases. In 2nd IEEE Symposium on Parallel and Distributed Processing, pages 136-143, December 1990.