
IEEE TRANSACTIONS ON COMPUTERS, VOL. 37, NO. 2, FEBRUARY 1988

Modular Concurrency Control and Failure Recovery

L. Sha, J. P. Lehoczky, and E. D. Jensen

Abstract—This paper presents an approach to concurrency control based on the decomposition of both the database and the individual transactions. This approach is a generalization of serializability theory in that the set of permissible transaction schedules contains all the serializable schedules. In addition to providing a higher degree of concurrency than that provided by serializability theory, this approach retains three important properties associated with serializability: the consistency of the database is preserved, the individual transactions are executed correctly, and the concurrency control approach is modular, a concept formalized in this paper. The associated failure recovery procedure is also presented, as is the concept of failure safety.

Index Terms—Concurrency control, consistency, correctness, failure recovery, modularity.

Manuscript received November 5, 1985; revised May 15, 1987. This work was supported in part by the Office of Naval Research (ONR) under Contract N00014-84-K-0734, the USAF Rome Air Development Center (RADC) under Contract F30602-84-C-0063, and the US Naval Ocean Systems Center (NOSC) under Contract N66001-83-C-0305. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of ONR, RADC, or NOSC.
L. Sha is with the Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA 15213.
J. P. Lehoczky is with the Department of Statistics, Carnegie-Mellon University, Pittsburgh, PA 15213.
E. D. Jensen is with the Departments of Computer Science and Electrical and Computer Engineering, Carnegie-Mellon University, Pittsburgh, PA 15213.
IEEE Log Number 8716220.

I. INTRODUCTION

Serializability theory has been widely accepted as the basis for concurrency control. This acceptance in the database context stems from the virtual "executing alone" environment it creates. If individual transactions are correct and preserve the consistency of the database, then a serializable concurrent execution of transactions will also lead to correct results and leave the database consistent. In addition to the properties of consistency and correctness, serializability theory also provides us with modular concurrency control protocols, which can be used to schedule a transaction without reference to the semantics of other transactions. For example, the two phase lock protocol [3] is a modular concurrency control protocol. The modularity of concurrency control protocols is important for any general purpose transaction facility in which transactions are frequently modified. We believe that the properties of consistency, correctness, and modularity account for both the popularity of serializability theory and the continuing interest in the study of various protocols that support serializable concurrency control [15], [2], [9], [7], [10], [8].

Our objective is not only to maintain these desirable properties, but also to achieve a higher degree of concurrency than that permitted by serializable schedules. To this end, we must investigate nonserializable scheduling methods whose sets of associated schedules are supersets of the serializable schedules. There have already been a number of proposed nonserializable scheduling methods [1], [11], [4], [6], [12] designed for various applications. In this paper, we present a simple decomposition approach to nonserializable concurrency control and failure recovery which provides a higher degree of concurrency than that provided by serializable schedules. In addition, it guarantees the consistency, correctness, and modularity of concurrency control. This approach is intended to support not only distributed database applications but also the reliable sharing of data in distributed operating systems, where a high degree of concurrency is a necessity.

There are fundamental differences between systems which accept only serializable schedules and systems which accept both serializable and nonserializable schedules. These differences manifest themselves in the issues of consistency, correctness, and modularity in concurrency control and failure recovery. In this paper, we assume that the database is initially consistent. Furthermore, we assume that each transaction, when executing alone, preserves the consistency of the database and satisfies the stated transaction postcondition. That is, each transaction is assumed to be consistent and correct. Under these assumptions, a schedule is said to be consistent and correct if the execution of the transactions according to this schedule leaves the database in a consistent state and satisfies the postcondition of each of the transactions. In addition, we consider a scheduling protocol to be consistent and correct if it always produces consistent and correct schedules. For example, the two phase lock protocol is a consistent and correct protocol. We consider a consistent and correct scheduling protocol to be modular if it can be used to schedule a single transaction without reference to the semantics of other transactions in the system. For example, the two phase lock protocol is a modular scheduling protocol. While the consistency and correctness of serializable schedules follow directly from our assumptions, it is not easy to determine the consistency and correctness of a nonserializable schedule. Furthermore, a nonserializable scheduling protocol may or may not be modular.

To illustrate the difficulty in using nonserializable scheduling methods, let us consider the following example.


TABLE I
EXECUTIONS OF SCHEDULES S1 AND S2

An Execution of Schedule S1          An Execution of Schedule S2
  A (50)   B (50)                      A (50)   B (50)
  t1,1:  A := A - 1;   (49)            t1,1:   A := A - 1;    (49)
  t2,1:  B := B - 2;   (48)            t*2,1:  B := B - 2;    (48)
  t2,2:  A := A + 2;   (51)            t*2,2:  A := 100 - B;  (52)
  t1,2:  B := B + 1.   (49)            t1,2:   B := B + 1.    (49)
Suppose that we have a database consisting of two variables A and B with a consistency constraint "A + B = 100." Consider two transfer transactions: T1 = (t1,1(A := A - 1); t1,2(B := B + 1)) and T2 = (t2,1(B := B - 2); t2,2(A := A + 2)), where the symbol "ti,j" denotes step j of transaction i. In addition, consider a different implementation of transaction T2, T2* = (t*2,1(B := B - 2); t*2,2(A := 100 - B)). Note that both transactions T2 and T2* transfer two units from B to A and preserve the consistency constraint "A + B = 100" when executing alone. Suppose that we are first given transactions T1 and T2. To enhance the concurrency, one may use a nonserializable scheduling method which schedules these two transactions by associating a "lock" and "unlock" operation pair with each step. That is, T1 = (Lock A, t1,1, Unlock A; Lock B, t1,2, Unlock B) and T2 = (Lock B, t2,1, Unlock B; Lock A, t2,2, Unlock A). Later, suppose that someone modifies the implementation of T2 to T2* but retains the same locking protocol, with the intuitive argument that transactions T2 and T2* have the same number of steps, use the same commutative "add" and "subtract" operations, and perform the identical computation. Now consider the two schedules generated by this scheduling method: S1 = (t1,1; t2,1; t2,2; t1,2) and S2 = (t1,1; t*2,1; t*2,2; t1,2). Schedule S2 is the same as S1, except that the steps of T2 are replaced by the steps of T2*. It is easy to check that S1 is consistent and correct, while S2 is inconsistent. Table I provides an example of executing schedules S1 and S2 when both A and B have the initial value 50. Note that S1 is consistent and correct, but at the end of the execution of S2 the sum of A and B is 101, an inconsistent result.

This example illustrates that intuition is not a reliable foundation for the development of a general atomic transaction facility which permits nonserializable concurrent execution of transactions. This is especially true when the transaction system is large and must be developed by a team of programmers. Therefore, it is important to develop scheduling methods for which the properties of consistency, correctness, and modularity have been formally proven, so that each programmer can write, modify, and schedule his own transaction independently of others. In this paper, we present a simple decomposition method which guarantees these three properties and provides a higher degree of concurrency than that provided by serializable schedules.

The rest of this paper is divided into three sections. In Section II, we provide an intuitive discussion of our approach, while Section III proves the consistency, correctness, and modularity of our approach. Section IV presents the recovery procedure for the protocols defined in Section III.
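As a cross-check of Table I, the following short Python sketch (ours, not part of the original paper) replays schedules S1 and S2 from the initial state A = B = 50 and tests the constraint A + B = 100. The step and schedule names mirror the text; the dictionary-based database is an illustrative assumption.

# Minimal sketch (not from the paper): replay schedules S1 and S2 on a toy
# database with consistency constraint A + B == 100, starting from A = B = 50.

def run(schedule, db):
    for step in schedule:
        step(db)
    return db

# Steps of T1, T2, and the re-implemented T2*.
t1_1 = lambda db: db.__setitem__('A', db['A'] - 1)
t1_2 = lambda db: db.__setitem__('B', db['B'] + 1)
t2_1 = lambda db: db.__setitem__('B', db['B'] - 2)
t2_2 = lambda db: db.__setitem__('A', db['A'] + 2)
t2s_1 = t2_1                                           # t*2,1 is the same step as t2,1
t2s_2 = lambda db: db.__setitem__('A', 100 - db['B'])  # t*2,2

s1 = [t1_1, t2_1, t2_2, t1_2]     # schedule S1
s2 = [t1_1, t2s_1, t2s_2, t1_2]   # schedule S2

for name, sched in [("S1", s1), ("S2", s2)]:
    db = {'A': 50, 'B': 50}
    run(sched, db)
    print(name, db, "consistent" if db['A'] + db['B'] == 100 else "inconsistent")
# S1 {'A': 51, 'B': 49} consistent
# S2 {'A': 52, 'B': 49} inconsistent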


II. A CONCEPTUAL FRAMEWORK

A. Overview

Both modularity and the degree of concurrency of a scheduling method are related to its utilization of the available information about the transactions and the database. For example, serializability theory utilizes only the syntax information of transactions, and it provides the highest degree of concurrency possible among scheduling protocols using only transaction syntax information [5]. To improve concurrency, additional information must be used. We observe that in order to write a consistent and correct transaction, one must be informed about the syntax of transactions, the consistency constraints of the database, and the semantics of one's own transaction. Our approach is designed to use only these three types of information, which are always available to a programmer. That is, we want a protocol that can be used by a programmer without reference to the semantics of any other transaction. If the scheduling of a transaction requires the semantic information of other transactions, then the modification of one transaction could invalidate the scheduling of other transactions. This contradicts our goal of modularity in concurrency control, where each transaction can be scheduled without reference to the semantics of others.

One may be concerned with the effect of using information about database consistency constraints to enhance concurrency. In fact, the information about the database consistency constraints is always embedded in the semantics of all the transactions, because transactions must maintain the consistency constraints of the database. When the database consistency constraints are changed, one must, in principle, examine all the transactions to see if they are still consistent, no matter which concurrency control protocol is used to schedule the transactions. Thus, database consistency constraints should be carefully designed and rarely changed. In our approach, the information about database consistency constraints is used to decompose the database.

In addition to the decomposition of the database, our approach also decomposes transactions. The two types of decomposition are as follows.

1) Based upon the knowledge of the consistency constraints of the database, the database is partitioned into atomic data sets. Such a partition is said to be consistency preserving, in the sense that an atomic data set has the property that its consistency can be preserved by a transaction without reference to the states of other atomic data sets. (Atomic data sets need not be independent entities, however; they can be closely related parts of a system. Example 2 illustrates this point.) Furthermore, if the consistency constraints of each atomic data set are satisfied individually, the consistency of the database is maintained.

2) Based upon only the local transaction semantic information, each transaction is independently decomposed into a partially ordered set of elementary transactions. Each elementary transaction must preserve the consistency of the accessed atomic data sets. Furthermore, if the postcondition of each executed elementary transaction is individually satisfied, then the postcondition of the partitioned transaction (called a compound transaction) is also satisfied.

An important result of this paper is that as long as each of the elementary transactions of a compound transaction is executed serializably with respect to each of the accessed atomic data sets, the database consistency constraints will be maintained, and the postcondition of the compound transaction will be satisfied. That is, once the database and transactions are properly decomposed, it is no longer necessary to serialize the entire transactions with respect to the entire database. Since only the elementary transactions, not the compound transactions themselves, are executed setwise serializably, we say that compound transactions are executed generalized setwise serializably. We will prove that generalized setwise serializability implies consistency, correctness, and modularity in concurrency control.

We next introduce two protocols that ensure generalized setwise serializability. An advantage of our approach is that a variety of concurrency control protocols can be used by a single transaction. The only restriction is that the same protocol must be used by all the transactions on any particular atomic data set. Two convenient protocols are the setwise two phase lock protocol and the setwise tree lock protocol. These two protocols are adapted from the two phase lock protocol [3] and the tree lock protocol [14], respectively. They can be used to enforce the setwise serializability of elementary transactions and thus the generalized setwise serializability of compound transactions. Under the setwise two phase lock protocol, an elementary transaction cannot release any lock on any data object of an atomic data set until it has obtained all its locks on that atomic data set. Once it has released a lock on a data object of an atomic data set, it cannot obtain a new lock on any data object of that atomic data set. It can, of course, obtain new locks on a different atomic data set, provided that it has not released any lock on that atomic data set.

The setwise two phase lock protocol has two important implications. First, like the two phase lock protocol, one can use it without reference to the semantics of other transactions. Once a programmer has written or modified his own transaction, he only needs to know the atomic data sets accessed by each of the elementary transactions of this compound transaction in order to use this protocol. That is, the setwise two phase lock protocol is a modular concurrency control protocol. Second, unlike the standard two phase lock protocol, under which no locks can be released until a transaction has obtained all its locks, the setwise two phase lock protocol allows locks on one atomic data set to be released even if the elementary transaction later acquires locks on a different atomic data set. In other words, the setwise two phase lock protocol provides a higher degree of concurrency by taking advantage of the decomposition of the database. By using the setwise two phase lock protocol for each of the elementary transactions of a compound transaction, one can take advantage of the decomposition of transactions as well as the decomposition of the database. Finally, if the data objects of an atomic data set are organized in the form of a tree, such as the file directories of a node, then we can use the setwise tree lock protocol, which requires each elementary transaction to follow the tree lock protocol when it accesses a tree-structured atomic data set.
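To make the locking discipline concrete, here is a minimal Python sketch of the setwise two phase rule. The class and method names are our own assumptions, not an implementation from the paper; it only checks that an elementary transaction which has released a lock in an atomic data set may not acquire another lock in that same set, while locking in different atomic data sets remains independent.

# Sketch (assumed names): per-atomic-data-set two-phase discipline. An
# elementary transaction may grow and then shrink its lock set independently
# within each atomic data set (ADS); once it has released a lock in an ADS,
# it may not acquire another lock in that same ADS.

class SetwiseTwoPhaseLockError(Exception):
    pass

class ElementaryTransactionLocks:
    def __init__(self, ads_of):
        self.ads_of = ads_of          # maps data object -> its atomic data set
        self.held = set()             # objects currently locked
        self.shrinking = set()        # ADSs in which a lock has been released

    def lock(self, obj):
        ads = self.ads_of[obj]
        if ads in self.shrinking:
            raise SetwiseTwoPhaseLockError(
                f"cannot lock {obj}: already released a lock in ADS {ads}")
        self.held.add(obj)

    def unlock(self, obj):
        self.held.remove(obj)
        self.shrinking.add(self.ads_of[obj])

# Usage: locks on Heap A may be released before locks on Heap B are acquired,
# which the ordinary two phase lock protocol would forbid.
ads_of = {"HeapA": "A", "HeapB": "B"}
et = ElementaryTransactionLocks(ads_of)
et.lock("HeapA"); et.unlock("HeapA")   # finished with atomic data set A
et.lock("HeapB"); et.unlock("HeapB")   # allowed: a different atomic data set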


B. Examples of Application

Since our approach permits nonserializable schedules, the main objective of Example 1 is to illustrate the basic structure of compound transactions and the concepts of consistency, correctness, and modularity under our nonserializable concurrency control approach. In addition, we also use this example to illustrate our approach to failure recovery. The objective of Example 2 is to discuss practical considerations in the use of this approach for a distributed file system. Given the popularity of serializability theory, the suggested use of nonserializable scheduling methods may seem unusual. However, these methods have been implicitly used by some familiar business operations, e.g., 24-h banking and credit card operations. Example 3 uses the concepts of atomic data sets and compound transactions to model these operations. We believe that as the understanding of nonserializable scheduling methods deepens, there will be more distributed systems explicitly using nonserializable concurrency control.

Example 1: Suppose that transaction Get-A-and-B needs one unit of some resource at node A and another unit at node B. This transaction requires both resource types; if it cannot have both, then it will not keep either one of them. The pseudocode of transaction Get-A-and-B is listed in Table II. In this example, Heap A (Heap B) is modeled as an atomic data set with consistency constraint "0 ≤ HeapSize ≤ MaxSize," and the transaction is written in the form of a compound transaction, Get-A-and-B. Compound transaction Get-A-and-B consists of four elementary transactions: Get-A, Get-B, Put-Back-A, and Put-Back-B. Elementary transactions Get-A and Get-B execute first and in parallel, trying to get one unit of A and one unit of B, respectively. The results of the execution of these two elementary transactions are recorded in the atomic Boolean variables Obtain-A and Obtain-B, respectively. Atomic variables are local variables of the compound transaction and are shared by its elementary transactions to pass the results of execution. (For the purpose of failure recovery, both the values of the database data objects and the values of atomic variables will be saved in stable storage when an elementary transaction commits.) After examining the values of Obtain-A and Obtain-B, elementary transaction Put-Back-A (Put-Back-B) will return the obtained unit of A (B) unless both Obtain-A and Obtain-B have the value true.

Example 1 illustrates the following characteristics of our approach. First, the database is partitioned into two atomic data sets: Heap A and Heap B. Second, the compound transaction Get-A-and-B consists of four elementary transactions, which cooperatively carry out the task of the compound transaction by passing information via atomic variables Obtain-A and Obtain-B. Each of these elementary transactions preserves the consistency of Heap A and Heap B. Third, the locking protocol used by this transaction only ensures that each elementary transaction, not the compound transaction itself, is executed serializably with respect to each atomic data set.


TABLE II
COMPOUND TRANSACTION GET-A-AND-B

CompoundTransaction Get-A-and-B;
  AtomicVariable Obtain-A, Obtain-B: Boolean;
  BeginSerial
    BeginParallel
      ElementaryTransaction Get-A;
        BeginSerial
          WriteLock Heap A;
          If Heap A is not empty
            then BeginSerial
              Take one unit from Heap A;
              Obtain-A := true;
            EndSerial;
          Commit and unlock Heap A;
        EndSerial;
      ElementaryTransaction Get-B;
        BeginSerial
          WriteLock Heap B;
          If Heap B is not empty
            then BeginSerial
              Take one unit from Heap B;
              Obtain-B := true;
            EndSerial;
          Commit and unlock Heap B;
        EndSerial;
    EndParallel;
    BeginParallel
      ElementaryTransaction Put-Back-A;
        BeginSerial
          If Obtain-A and not Obtain-B
            then BeginSerial
              WriteLock Heap A;
              Put-Back the unit of A;
              Commit and unlock Heap A;
            EndSerial;
        EndSerial;
      ElementaryTransaction Put-Back-B;
        BeginSerial
          If Obtain-B and not Obtain-A
            then BeginSerial
              WriteLock Heap B;
              Put-Back the unit of B;
              Commit and unlock Heap B;
            EndSerial;
        EndSerial;
    EndParallel;
  EndSerial.

Since a compound transaction is not a single atomic action but a package of several cooperating atomic actions, the postcondition (specification) of Get-A-and-B specifies the correct alternative sets of atomic actions. Any set of combined actions must result in the transaction getting either both units or no units. For example, one set of atomic actions consists of Get-A obtaining a unit from A and Get-B obtaining a unit from B. Another set of atomic actions consists of Get-A obtaining one unit from A, Get-B not obtaining any unit from B, and Put-Back-A returning the obtained unit to A.

To illustrate that global serializability is not enforced by our method, suppose that we have a second compound transaction, Get-A-or-B, which tries to get one unit of resource from node A and one unit of resource from node B. If it cannot get both, then it keeps whatever it has obtained. Suppose that we execute transactions Get-A-and-B and Get-A-or-B with Heap A and Heap B having initial values of one. It is possible that transaction Get-A-and-B gets the only unit of A first, while transaction Get-A-or-B gets the only unit of B first. As a result, transaction Get-A-and-B will return one unit to A, while transaction Get-A-or-B will keep the unit obtained from B. The result of this execution is not equivalent to executing these two transactions serially. When these two transactions are executed serially, one of them will get both units. Even though the transactions are not executed serializably, the consistency of each heap and the correctness of each transaction are preserved. That is, the sizes of Heap A and Heap B will remain within their bounds, transaction Get-A-and-B will either get both units or nothing, and transaction Get-A-or-B will get both units, one of the two units, or nothing.

Having discussed consistency and correctness, we now comment on the issues of modularity and concurrency. Our approach is modular, because when we schedule a compound transaction like Get-A-and-B, we make no reference to the semantics of other transactions. The avoidance of the use of global transaction semantic information also distinguishes our approach from other nonserializable concurrency control approaches, e.g., [4], [6], in which the semantics of sets of transactions are used to enhance concurrency.

Our approach provides a higher degree of concurrency than that permitted by serializable schedules. For example, compound transaction Get-A-and-B locks a heap only when it is operating upon that heap. Suppose that we implement Get-A-and-B as a nested transaction and use serializable schedules. In this case, even if nested transaction Get-A-and-B has obtained one unit of A (B), it must still keep its lock on Heap A (B) until it has obtained the other unit of B (A) or has aborted. Thus, nested transaction Get-A-and-B blocks the resource allocation activity on Heap A (B) for a longer period of time, compared to the blocking caused by compound transaction Get-A-and-B.

Having addressed the issues in concurrency control, we now illustrate our associated failure recovery method for clean and soft failures. A clean and soft failure models a temporary crash of a computer, in which the content of main memory is lost but the database in the stable storage remains intact. For the purpose of failure recovery, the values of the database data objects and the values of the atomic variables that are accessed by an elementary transaction will be indivisibly saved in stable storage when the elementary transaction commits. For example, when elementary transaction Get-A commits, both the value of Heap A and the value of atomic variable Obtain-A must be indivisibly saved in the stable storage. Suppose that compound transaction Get-A-and-B fails after its elementary transaction Get-A has committed. Since Get-A leaves the database in a consistent state, other transactions will be executed consistently and correctly despite the failure of Get-A-and-B. Since other transactions leave the database consistent, and the values of atomic variables Obtain-A and Obtain-B are saved in stable storage, when Get-A-and-B resumes its execution it will also be executed consistently and correctly. A formal treatment of failure recovery is presented in Section IV.
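The nonserializable interleaving of Get-A-and-B and Get-A-or-B described above can be replayed with a small sketch (the structure and names below are our assumptions, not the paper's code). It shows each heap staying within its bounds even though the final outcome matches no serial execution of the two transactions.

# Sketch: the interleaving discussed above, with Heap A and Heap B each
# holding one unit. Each elementary action is atomic with respect to the
# single heap it touches, so every heap stays within its bounds, yet the
# final outcome is not equivalent to any serial execution.

heaps = {"A": 1, "B": 1}

def take(heap):                      # elementary "get one unit" action
    if heaps[heap] > 0:
        heaps[heap] -= 1
        return True
    return False

def put_back(heap):                  # elementary "return one unit" action
    heaps[heap] += 1

# Interleaving: Get-A-and-B takes the unit of A, Get-A-or-B takes the unit of B.
and_got_a = take("A")                # Get-A-and-B: Get-A succeeds
or_got_b  = take("B")                # Get-A-or-B takes the only unit of B
and_got_b = take("B")                # Get-A-and-B: Get-B fails, heap empty
or_got_a  = take("A")                # Get-A-or-B tries A, fails, keeps B anyway

if and_got_a and not and_got_b:      # Get-A-and-B: Put-Back-A
    put_back("A")

print(heaps)                         # {'A': 1, 'B': 0}
# Get-A-and-B ends with nothing and Get-A-or-B keeps one unit of B; in any
# serial execution one of the two transactions would have obtained both units.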


We now introduce the second example to further illustrate the decomposition of the database and transactions. In many instances, the decomposition of the database into atomic data sets is obvious. In the previous example, the resource heap at each node is a data object with its own consistency constraints, which can be maintained without reference to the states of other data objects. Thus, each heap is the data object in a singleton atomic data set. In some applications, the decomposition requires the use of consistency constraints that are weaker than the corresponding ones in centralized systems. We illustrate this point by an example of managing the distributed directories of a distributed file system. This example also illustrates that atomic data sets need not be independent entities; they can be closely related parts of a system.

Example 2: We consider a set of shared system files distributed at different nodes. There is a local directory (LD) at each node indicating the locations of resident files. With only these local directories, a user must potentially search through all the LD's in order to locate a file, and this would be very inefficient. To increase efficiency, the system has a global directory (GD) at some node. The GD indicates which LD should be searched for each of the shared files, and it is replicated for better availability. When one needs a file, the local operating system kernel will first search through its LD, and then it will search a nearby GD if the file is not in its LD. The introduction of GD's facilitates file lookups, but in a large system the GD's can become a performance bottleneck. To further improve efficiency, the local operating system kernel at each node constructs a partial global directory (PGD) which indicates the resident nodes of the frequently used remote files.

Although GD's and PGD's help locate files, they also make the updating process, including moving files, more complicated. One could define a set of consistency constraints which require all the GD's, PGD's, and LD's always to point to the current locations of the files. If we were to do this, then all these directories would form a single atomic data set, since we could not maintain the consistency of any one of them independently of any other. This would imply that when one moves a file from one machine to another, the updating of the source and destination LD's, the GD's, and all the relevant PGD's must appear to all other transactions as an instantaneous event. This can be accomplished by following the setwise two phase lock protocol to lock the source and destination LD's, all the GD's, and all PGD's that contain an entry indicating the transferred file. However, the resulting concurrency is poor, because this atomic data set is too large for its intended application. If an atomic data set contains data objects that are infrequently used, then we do not need to concern ourselves about its size, since there is seldom any blocking. However, directories are among the most frequently used data objects in an operating system, and therefore we would like to have a finer decomposition to enhance system concurrency.

One approach which permits a higher degree of concurrency is to use "recent" historical information in lieu of current information. To this end, we first relax the consistency constraints to allow GD's and PGD's to point to any valid node location. The consistency constraints of the LD's remain unchanged, so they must point to the current locations of files. With these weakened consistency constraints, each LD, GD, and PGD is an atomic data set, because we can maintain the consistency constraints of any one of them independently of the states of other atomic data sets.


Consequently, one can readlock an LD, get the information needed, unlock it, and then update each of the two GD's. That is, we can read the LD without blocking the activities on other atomic data sets, and we can update one GD and satisfy its consistency constraints independently of how the LD's, PGD's, and the other GD are being updated by other transactions.

To help keep the information provided by GD's and PGD's "recent" historical information, we can require that each transaction which updates an LD also update the two GD's. However, we do not require the update of the global and local directories to appear as an instantaneous event to other transactions. In addition, each PGD is managed by a "fault driven" policy. When a transaction uses a PGD, it will increment the "success counter" or the "failure counter" associated with the PGD according to the result derived from using its information. The local operating system kernel will periodically compute the percentage of reference failures. Should this percentage exceed a threshold, the entries in the PGD will be updated by using information in one of the GD's.

Generally speaking, weaker consistency constraints permit a finer partition of the database and thus a higher degree of concurrency. However, once the consistency constraints are weakened, the complexity of transactions will be increased, for two reasons. First, the process of weakening the consistency constraints enlarges the number of system states that are considered to be consistent. For example, if the set of consistency constraints associated with the GD and PGD is relaxed, a transaction must be written to function correctly even though it may not be given the current file location, because transactions must have the ability to deal with all possible consistent system states. Second, strong consistency constraints generally ensure that the system will stay in a small set of favorable states. However, the enforcement of strong consistency constraints may be too expensive in terms of the loss of concurrency. When the set of permissible states is enlarged, the specifications of the transactions should be redesigned to help keep the system in favorable states. This may also increase the complexity of transactions.

To illustrate the concept of favorable states, consider the directory example when weak consistency constraints are used. The states in which most of the entries in a GD point to current locations of files are favorable states. These states are considered favorable because they allow one to locate files efficiently. The states in which only a small percentage of entries in the GD's point to current file locations are unfavorable states. To help the directory system stay in the favorable states and to help locate the files, we require file manipulation transactions to use the following three performance enhancement schemes. First, the GD will be updated whenever an LD is updated. Second, when a file is migrated, a forwarding address must be left in the source PGD. Third, we use the fault driven policy mentioned above to update the PGD's so that their information will be reasonably up to date. Note that each of these three schemes can be incorporated into the transaction specifications.
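As an illustration of the third scheme, here is a Python sketch of the fault driven PGD policy just described. The class name, counter fields, and the 20 percent threshold are illustrative assumptions rather than values from the paper.

# Sketch (assumed names and threshold): a partial global directory (PGD)
# refreshed by the fault driven policy. Lookups bump a success or failure
# counter; when the failure rate exceeds a threshold, the PGD entries are
# reloaded from a global directory (GD).

class PartialGlobalDirectory:
    def __init__(self, gd, failure_threshold=0.2):
        self.gd = gd                        # reference copy used for refresh
        self.entries = dict(gd)             # file name -> node believed to hold it
        self.successes = 0
        self.failures = 0
        self.failure_threshold = failure_threshold

    def report_lookup(self, hit):
        if hit:
            self.successes += 1
        else:
            self.failures += 1
        total = self.successes + self.failures
        if total and self.failures / total > self.failure_threshold:
            self.entries = dict(self.gd)    # refresh from a GD
            self.successes = self.failures = 0

# Usage: after enough stale lookups, the PGD refreshes itself from the GD.
gd = {"report.txt": "node3"}
pgd = PartialGlobalDirectory(gd)
pgd.entries["report.txt"] = "node1"         # stale entry
for hit in (True, False, False):
    pgd.report_lookup(hit)
print(pgd.entries)                          # {'report.txt': 'node3'}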


It is important to realize that performance enhancement schemes only increase the probability of the system staying in favorable states. For example, using the performance enhancement scheme "update the GD's if an LD is updated," a transaction just needs to update the LD and then the GD, but does not need to use a concurrency control protocol to make the transfer of the file and the updating of the LD and GD appear to be an instantaneous event to the other transactions. This certainly does not ensure that the entries in the GD's are as timely as those in the LD's. Indeed, it is possible that while transaction T1 transfers file F from node 1 to node 3 and updates the LD at node 1, another transaction T2 is reading the GD at node 2, which indicates that file F is at node 1. When transaction T1 is updating the GD at node 2, T2 may be searching the LD at node 1 to locate file F and would find that file F is no longer at node 1.

Finally, we want to point out that the specification "update the GD's if an LD is updated" does not ensure that GD's are updated in the correct order. For example, suppose that transaction T1 transfers a file from one node to another, but before it can update the GD's, it crashes. In the meantime, the transferred file is moved again by another transaction T2, and T2 successfully updates the GD's. When T1 recovers, it will actually put the GD's into an older state. This is, of course, permitted by the weaker consistency constraints. We can expect this to be an infrequent event and accept it as a cost of enhancing concurrency. We can also eliminate this problem at the cost of additional transaction complexity. For example, we can use a version control procedure. The version number of an LD directory entry will be incremented whenever the directory entry is updated. When an entry in a GD is updated, the version number of the corresponding LD entry will be copied to the GD entry. With this arrangement, the new specification becomes "update the corresponding GD entry if an LD is updated and if the version number of the LD directory entry is larger than that of the corresponding GD entry."
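The version control procedure can be sketched as follows; the dictionary layout and helper names are assumptions made for illustration.

# Sketch (assumed data layout): the version control procedure described above.
# Each LD entry carries a version number that is incremented on every update;
# a GD entry is overwritten only if the incoming LD version is newer, so a
# recovering transaction cannot roll a GD entry back to an older location.

ld_entry = {"file": "F", "node": "node3", "version": 7}     # local directory
gd_entry = {"file": "F", "node": "node3", "version": 7}     # global directory

def update_ld(entry, new_node):
    entry["node"] = new_node
    entry["version"] += 1
    return dict(entry)                       # value later propagated to the GD's

def update_gd(gd, incoming):
    # "Update the corresponding GD entry if ... the version number of the LD
    # entry is larger than that of the corresponding GD entry."
    if incoming["version"] > gd["version"]:
        gd.update(incoming)

late_update = update_ld(ld_entry, "node5")   # T1's move, propagated late
fresh_update = update_ld(ld_entry, "node2")  # T2's later move, propagated first

update_gd(gd_entry, fresh_update)            # GD now points to node2, version 9
update_gd(gd_entry, late_update)             # stale version 8 is ignored
print(gd_entry)                              # {'file': 'F', 'node': 'node2', 'version': 9}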


Example 3: A typical 24-h electronic banking network links many banks and provides service in many locations. The easiest way to provide remote service to a member bank is to let the 24-h banking machines act like terminals while letting the bank's own computer do all the processing. However, there is a problem with this approach. When a bank joins a network, the peak communication and processing demands that can be generated by the large number of 24-h banking machines in the network can easily overload both the communication and the processing capability of the bank's existing facility. It is not appropriate to simply ask member banks to buy more hardware, nor is it good business practice to let customers stand at a banking machine and wait a long time for the transaction to be completed.

A logical solution to this performance problem is, of course, to use distributed processing technology. In a distributed processing approach, the banking machines of any member bank provide full service not only to the bank's own customers but also to the customers of any other member bank. This requires each bank's computer to have account information for each customer in the network. A major design problem in this approach is the specification of the consistency constraints of the distributed copies of a customer's account information. If we require all the copies of an account to have the same up-to-date information, then the account and all of its copies become a single atomic data set. As a result, whenever there is a change in an account, both the account and all its copies must be updated in a way that appears to be an instantaneous event to all other transactions. This is costly to implement and provides poor performance. Clearly, lower cost and higher performance alternatives must be considered.

From our experience with Example 2, we would suggest the use of weaker consistency constraints. In fact, some 24-h banking networks have used this idea. In essence, each copy of an account is allowed to have either historical or current information, just like the GD's and PGD's in Example 2. It follows that each copy is an atomic data set. To keep the copies reasonably up to date, the computer of the customer's bank will periodically update all the copies of the customer's account. If a particular bank's computer is down or busy, its copies will simply be updated later, without aborting the update operations on the other banks' copies. In addition, each transaction on a 24-h banking machine can be represented by a compound transaction. For example, suppose that a customer wishes to withdraw money from a 24-h banking machine. If the local copy of his account indicates that he has enough money in his account, an elementary transaction of the "withdraw" compound transaction will commit and give him the money. Another elementary transaction will contact the computer of the customer's bank to debit his account and to transfer the money back to the bank which gave him the cash.

The use of weak consistency constraints to promote concurrency is not without its drawbacks. For example, suppose that a customer deposits $50 cash into his savings account at his bank, and immediately checks his account balance from another participating bank's 24-h banking machine. It is possible that the balance given to him would not reflect the $50 deposit he just made. However, if he returns to his bank and asks a clerk to check the current balance, it will probably show that the balance has increased by $50. The clerk will explain to him that it takes time to get all the computers in the network updated; for cash deposits, however, the money will appear on the 24-h machines no later than the next day. From a concurrency control point of view, this customer originally expected that transactions were executed serializably in the banking network. However, customers seem to have no difficulty accepting the results of nonserializable concurrent execution: a 24-h banking machine does not guarantee its information to be up to date. People often deal with systems whose information is not guaranteed to be up to date. For example, the telephone directory we use every day does not guarantee its information to be up to date either.

In addition to the minor inconvenience to customers, banks can also encounter problems when weak consistency constraints are used. The banks cannot always successfully prevent a customer from overwithdrawing. If a customer withdraws money from different banks' machines within a very short period of time, he may be able to obtain more cash than is actually available in his account. The banks' solution is a simple one: limiting the amount one can withdraw, so that if someone overwithdraws, the bank's liability is limited.
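A rough sketch of the "withdraw" compound transaction under these weak consistency constraints follows; the account layout, the daily limit, and the function names are assumptions for illustration only.

# Sketch (assumed names and limit): a "withdraw" compound transaction at a
# 24-hour banking machine. One elementary transaction commits against the
# local, possibly stale, copy of the account and dispenses cash up to a limit;
# a second elementary transaction later debits the account at the customer's
# own bank and settles with the dispensing bank.

DAILY_LIMIT = 200          # caps the bank's liability if the local copy is stale

local_copy = {"balance": 500}        # replica at the dispensing bank (may lag)
home_account = {"balance": 450}      # authoritative copy at the customer's bank

def withdraw_elementary(amount):
    """First elementary transaction: decide against the local copy, give cash."""
    if amount <= min(DAILY_LIMIT, local_copy["balance"]):
        local_copy["balance"] -= amount
        return True                  # cash dispensed; this elementary transaction commits
    return False

def settle_elementary(amount):
    """Second elementary transaction: debit the home account, settle later."""
    home_account["balance"] -= amount

if withdraw_elementary(150):
    settle_elementary(150)
print(local_copy, home_account)      # {'balance': 350} {'balance': 300}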


Despite these drawbacks, the use of weak consistency constraints to promote system concurrency in distributed processing does help realize the objectives of 24-h banking: providing fast and convenient service to customers with low operational cost to the banks.

The concepts of atomic data sets and compound transactions can also be used to model distributed credit card operations. For example, atomic data sets can be used to model the merchants' booklets which list stolen and invalidated cards, the credit card operation's centralized database, the merchant's accounting database, the merchant's bank account, and the cardholder's credit account. The checking of the customer's credit, the sales transaction, and the subsequent transfer of money between the customer's and merchant's accounts can be viewed as elementary transactions of the compound transaction "credit card purchase."

Having illustrated the ideas, we will formalize them in the next section.

III. THEORETICAL DEVELOPMENT

Having illustrated our approach with examples, we now prove the consistency, correctness, and modularity of our approach. To keep this presentation concise, we will informally describe most familiar concepts, such as the database and schedules, and will formally define only new concepts, such as atomic data sets and modularity of concurrency control. Except for the proofs of consistency, correctness, and modularity, many related results will be merely cited. Interested readers should consult [13] for a thorough treatment of our theory. This section is divided into four parts. The first three define the concepts of atomic data sets, compound transactions, and scheduling rules and their properties, respectively. The fourth defines the generalized setwise serializable scheduling rule and shows that it is consistent, correct, and modular. The problem of failure recovery will be addressed in Section IV.

A. Atomic Data Sets

We begin our discussion with the familiar concepts of data objects and the database. Conceptually, a data object O is a unit of data with an associated set of values, the domain of O. The database is the set of all the data objects. In a distributed operating system context, the concept of a database corresponds to the set of shared system data objects such as the file directories, communication ports, and resource heaps. For a database with n data objects, a state of the database is a vector of n components, each of which corresponds to a value of a data object in the database. We assume that there is a set of consistency constraints associated with the database. These constraints are specified by a set of predicates on the state of the database. A state of the database is said to be consistent if and only if the state satisfies the consistency constraints. In the previous section, we informally introduced the concept of atomic data sets. They have the property that as long as each of them is in a consistent state, the database is in a consistent state. We now give atomic data sets a precise definition using the concept of a consistency preserving partition of the database.

Notation: Let I = (1, 2, ..., n) be the index set corresponding to the elements of the database D = (O1, O2, ..., On). Let Π = (S1, ..., Sk) be a partition of the index set I. Let I* be the concatenation of the elements of the partition Π in the order S1, S2, ..., Sk. Consequently, I* is a permutation of I. Let U be the set of all the consistent states of the database D, and let Ui be the projection of U onto index set Si. Let U* be the projection of U onto index set I*. That is, the elements of U* are permutations of the elements in U. For example, if I = (1, 2, 3, 4, 5) and Π = (S1, S2, S3) where S1 = (1, 5), S2 = (3, 4), and S3 = (2), then I* = (1, 5, 3, 4, 2).

Definition 1: A partition of the database index set I, Π = (S1, S2, ..., Sk), is said to be consistency preserving if and only if

    U1 × U2 × ··· × Uk = U*.

Suppose that Π = (S1, ..., Sk) is a consistency preserving partition of the database index set I. The sets of data objects corresponding to index sets S1, ..., Sk are said to be atomic data sets. The corresponding partition of the database D, P, is said to be a consistency preserving partition of the database. For example, suppose that we have a database consisting of data objects O1, O2, and O3, and the consistent states are {(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)}. Let Π be a partition of the database index set, e.g., Π = ((1, 3), (2)). The partition Π is consistency preserving, because U* = {(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)} is the Cartesian product of U1 = {(0, 0), (1, 1)} and U2 = {(0), (1)}. It follows that the partition P = ((O1, O3), (O2)) is a consistency preserving partition of the database, and (O1, O3) and (O2) are two atomic data sets.

An immediate consequence of the definition of a consistency preserving partition is that if each atomic data set is in a consistent state, then the database is in a consistent state. To illustrate this concept concretely, consider the directory example with the weak consistency constraints described in Section II-B. The database (directory system) is in a consistent state if each local directory (LD), each global directory (GD), and each partial global directory (PGD) is in a consistent state, that is, if each LD points to the current locations of files, and each GD and each PGD point to a valid node location. Thus, each LD, GD, and PGD comprises an atomic data set.

Finally, it is important to point out that consistency preserving partitions need not be unique. These partitions are partially ordered by refinement, and a unique maximal partition always exists with respect to the given set of database consistency constraints [13]. To achieve a higher degree of concurrency than that offered by the maximal partition, one must replace the original set of consistency constraints by a weaker set. This approach was illustrated by the distributed directory example in Section II-B.
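Definition 1 can be checked mechanically. The sketch below (our illustration, using Python sets and 0-based indexes, with an assumed function name) tests the Cartesian product condition on the three-object example above and on a partition that is not consistency preserving.

# Sketch: checking Definition 1. A partition of the index set is consistency
# preserving iff the Cartesian product of the per-block projections of the
# consistent states equals the projection of the consistent states onto the
# concatenated index order I*.

from itertools import product

def is_consistency_preserving(consistent_states, partition):
    i_star = [i for block in partition for i in block]
    u_star = {tuple(state[i] for i in i_star) for state in consistent_states}
    projections = [{tuple(state[i] for i in block) for state in consistent_states}
                   for block in partition]
    cart = {tuple(v for proj in combo for v in proj)
            for combo in product(*projections)}
    return cart == u_star

# Consistent states of (O1, O2, O3) from the example (0-based indexes).
U = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
print(is_consistency_preserving(U, [(0, 2), (1,)]))   # True:  {O1, O3} and {O2}
print(is_consistency_preserving(U, [(0,), (1, 2)]))   # False: O3 depends on O1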

B. Compound Transactions

The database D is operated upon by a set of transactions {T1, ..., Tn}. We begin our discussion of transactions with the familiar straight line transaction, which is a sequence of steps, Ti = (ti,1, ..., ti,mi). The jth step of transaction Ti, ti,j, is modeled as the indivisible execution of the following instructions [5]:

    Li,j := Oi,j;
    Oi,j := fi,j(Li,j)


where the local variable Li,j is used by step ti,j to store the value read. The symbol "Oi,j" denotes the data object accessed (read or written) by step ti,j, and the symbol "fi,j" represents the computation performed by step ti,j. In this model, every step reads and then writes a data object. A read step is interpreted as writing the value read back to the data object.

The fundamental assumption we make about a transaction is that it is consistent and correct when executing alone. By "consistent" we mean that given an initially consistent state, a transaction outputs a consistent state. By "correct" we mean that given an initially consistent state, the results output by a transaction always agree with the stated postcondition. A postcondition is a predicate which asserts that the values of the data objects output by a transaction are some functions of the values of the data objects input to the transaction. Suppose that (x1, ..., xn) and (y1, ..., yn) are the values input from and output to objects (O1, ..., On) by a transaction. Postconditions are predicates of the form

    (y1 = f1(x1, ..., xn)) ∧ ··· ∧ (yn = fn(x1, ..., xn)).

We now generalize the structure of a straight line transaction to a compound transaction. A compound transaction consists of a partially ordered set of straight line transactions called elementary transactions and a set of local variables called atomic variables. The atomic variables of a compound transaction are local to the compound transaction but global to (shared by) the elementary transactions. For example, for the compound transaction Get-A-and-B described in Section II-B, the partially ordered elementary transactions are Get-A, Get-B, Put-Back-A, and Put-Back-B, and the atomic variables are Obtain-A and Obtain-B.

Definition 2: A compound transaction is a partially ordered set of straight line transactions called elementary transactions. Associated with a compound transaction, there is a set of local variables called atomic variables which are local to the compound transaction but global to (shared by) its elementary transactions. A step of an elementary transaction reads or writes either a data object or an atomic variable. Finally, the postcondition of the compound transaction is the conjunction of the postconditions of its elementary transactions.

Note that for simplicity, we have assumed that there are no conditional statements in either a compound transaction or its elementary transactions. That is, in this model every step of every elementary transaction of a compound transaction is executed. Conceptually, this syntax models one possible path of execution in a compound transaction with conditional statements. For example, in compound transaction Get-A-and-B, one possible execution path is successfully getting both resources A and B. We assume that when an elementary transaction of a compound transaction is executed serially, in an order that is consistent with the partial order defined by the compound transaction, it preserves the consistency of the database and satisfies its postcondition. In addition, we assume that the postcondition of a compound transaction is equivalent to the conjunction of all the postconditions of its elementary transactions.

C. Schedules and Scheduling Rules

We begin with the familiar concept of schedules. A transaction system is a set of transactions, and a schedule of a transaction system is a total ordering of all the steps of the transactions in the system such that the ordering of the steps of any transaction in the schedule is consistent with the ordering of steps in the transaction. The two most important properties of a schedule are consistency and correctness.

Definition 3: Given any transaction system composed entirely of consistent and correct transactions, a schedule z for this transaction system is said to be consistent and correct if and only if, under schedule z, for any initially consistent state of the database, the database ends in a consistent state and, for each transaction, the results of execution satisfy its postcondition.

When we discuss schedules, a familiar concept is the equivalence of schedules. We consider two schedules z and z* for a given transaction system to be equivalent, denoted as z = z*, if z and z* induce identical partial orderings of steps on each of the data objects in the database. That is, for any pair of steps which operate on the same object, ti,m and tk,n, if ti,m precedes tk,n under schedule z, then ti,m also precedes tk,n under schedule z*. Under two equivalent schedules z and z*, for any initial state of the database, the executions of transactions according to z and z* yield the same sequence of values for each data object in the database and the same sequence of values for the local variables of transactions. It is straightforward to show that if z is consistent and correct, then z* is also consistent and correct.

Theorem 1: If schedules z and z* for a transaction system are equivalent and schedule z is consistent and correct, then schedule z* is also consistent and correct [13].

We now address the concept of scheduling rules. Conceptually, a scheduling rule is a specification of the ways in which the steps of one transaction can be interleaved with the steps of other transactions. Formally, a scheduling rule is modeled as a function that partitions the steps of a transaction into sets of steps called transaction step segments. The schedules associated with a scheduling rule are those under which the transaction step segments of a transaction are interleaved serializably with the transaction step segments of other transactions. For example, serializability theory corresponds to a scheduling rule which takes the entire set of steps of a transaction as a single transaction step segment. Serializable schedules are those under which the transaction step segments specified by serializability theory are interleaved serializably. We now formalize these ideas.

Notation: Given a database, let Tm denote the set of all possible consistent and correct transactions with m steps. Let T denote the set of all possible consistent and correct transactions, that is, T = ∪m Tm. Consider a partition of an m-step transaction into transaction step segments, and let ξm denote the set of all such possible partitions. Let ξ be the set of all possible partitions of all consistent and correct transactions, that is, ξ = ∪m ξm.

Definition 4: A scheduling rule R is a function which takes a transaction system of any size and partitions each of the transactions in the system into transaction step segments, such that the restriction of R to Tm is Rm, where Rm maps Tm into ξm.
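Theorem 1 rests on the notion of equivalence defined above (identical per-object ordering of steps), which can be sketched as a small check; the tuple encoding of steps is an assumption for illustration.

# Sketch (assumed step encoding): the schedule equivalence test used above.
# A step is (transaction, step_index, object); two schedules are equivalent
# when, for every data object, they order the steps touching that object
# identically.

def per_object_order(schedule):
    order = {}
    for txn, idx, obj in schedule:
        order.setdefault(obj, []).append((txn, idx))
    return order

def equivalent(z, z_star):
    return per_object_order(z) == per_object_order(z_star)

# Two interleavings of T1 = (t1,1 on A; t1,2 on B) and T2 = (t2,1 on B; t2,2 on A).
z      = [(1, 1, "A"), (2, 1, "B"), (2, 2, "A"), (1, 2, "B")]
z_star = [(2, 1, "B"), (1, 1, "A"), (2, 2, "A"), (1, 2, "B")]
print(equivalent(z, z_star))   # True: same order of steps on A and on B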


D. Setwise and Generalized Setwise Serializable Scheduling Rules


Once a scheduling rule R has been introduced, we assume that each of the transactions in a transaction system is decomposed into transaction step segments according to R. We now define the scheduling of transactions.

Notation: Let T = {T1, ..., Tn} be a set of consistent and correct transactions, and let Ti be a transaction in T. Let ξR(Ti) denote the partition of the steps of Ti by R. Let D be the database and Z(T) be the set of all the possible schedules for T. Let σi denote an arbitrary transaction step segment of Ti, that is, σi ∈ ξR(Ti). Finally, let "ti,m < tj,n" denote that step ti,m is executed before step tj,n.

Definition 5: A schedule z ∈ Z(T) is said to be transaction step segment serial with respect to R if the transaction step segments specified by R which belong to different transactions do not overlap in z, i.e.,

    ∀ (Ti, Tj ∈ T, i ≠ j), ∀ (σi ∈ ξR(Ti)), ∀ (σj ∈ ξR(Tj)):
        (∀ ti,m ∈ σi, ∀ tj,n ∈ σj : ti,m < tj,n) ∨ (∀ tj,n ∈ σj, ∀ ti,m ∈ σi : tj,n < ti,m).
