Using Versions in Update Transactions: Application to Integrity Checking

Francois Llirbat, Eric Simon, Dimitri Tombro
INRIA, 78153 Le Chesnay, France
email: first name.last [email protected]

March 10, 1997
Abstract
This paper proposes an extension of the multiversion two phase locking protocol, called EMV2PL, which enables update transactions to use versions while guaranteeing the serializability of all transactions. The use of the protocol is restricted to transactions, called write-then-read transactions, that consist of two consecutive parts: a write part containing both read and write operations in some arbitrary order, and an abusively called read part, containing read operations or write operations on data items already locked in the write part of the transaction. With EMV2PL, read operations in the read part use versions, and read locks acquired in the write part can be released just before entering the read part. We prove the correctness of our protocol, and show that its implementation requires very few changes to classical implementations of MV2PL. After presenting various methods used by application developers to implement integrity checking, we show how EMV2PL can be effectively used to optimize the processing of update transactions that perform integrity checks. Finally, performance studies show the benefits of our protocol compared to a (strict) two phase locking protocol.
1 Introduction

Constraint checking is a key issue of many modern applications, which is acknowledged by recent evolutions of the SQL standard to accommodate a larger class of assertions (SQL2) and triggers (SQL3 [Mel93]). To enforce integrity constraints, update transactions may have to perform many additional read operations. This situation is also frequent in the area of data-mining and decision support applications. An interesting case is decision support applications using sophisticated statistical methods. There, statistical tests are triggered as soon as some relevant situation is detected in the database as a result of updates. Additionally, one may specify when to inform decision-makers about actual findings, e.g., when the value of some statistical test exceeds a threshold. In such applications, the computation of statistical tests as well as the alerting of decision-makers entail additional read operations. Update transactions that perform many read operations are exposed to specific performance problems, illustrated by means of the following concrete example.
Example 1.1 Consider the following example of a stock-market application. We consider a broker who is responsible for ordering shares for her clients' portfolios. The following relations are used:
Order (orderkey, portfolio_id, company_name, stockmarket, quantity, price, date) MaxRisk(stockmarket, value) MinRisk(stockmarket, value) Available(portfolio_id, amount)
Share orders are registered in the Order relation. This relation contains, for each order, the ordered share description (company name, stockmarket and price), the purchaser portfolio's identity and some information about the financial transaction (quantity, date). The MaxRisk and MinRisk relations respectively give, for each stock-market, the maximal and minimal risk that the broker should take to make a profit. These relations are frequently updated by transactions that analyse the activity of the various stock-markets. The Available relation gives, for each portfolio, the total amount of money available. This relation is incremented when shares are sold or when the broker's client decides to increase her portfolio. The application enforces two constraints:
The Risk constraint is based on a function risk(o), which, given a share order o on a stockmarket m, computes the risk taken if o is completed[1]. It forbids any insertion of o if risk(o) is not contained in the risk threshold interval of m indicated in relations MaxRisk and MinRisk. The Available constraint checks, for each portfolio p, that the total amount of share orders during the week does not exceed the amount available for p.
Let us take a transaction program, entry_order(p), that inserts tuples into relation Order for a given portfolio p. To enforce the Risk and Available constraints, the transaction has to perform additional read operations before committing. First, checking the Risk constraint requires reading, for each newly inserted share order, the corresponding items in relations MaxRisk and MinRisk. Second, checking the Available constraint requires reading both the Available tuple that contains the available amount associated with p and the Order relation, to find all the orders for p performed during the week. Suppose that an entry_order(p) transaction runs in isolation degree 3 and obeys the strict two phase locking policy (S2PL) [BHG87]. Whenever entry_order executes, the reads incurred by constraint checking may conflict with concurrent executions of update transactions on relations MaxRisk, MinRisk, Available and Order. When a conflict occurs between two transactions, one transaction is blocked and waits for the other one to release its locks (when committing or aborting). Thus, running instances of entry_order augment the lock contention and impede the transactional traffic in the database system. This situation may lead to performance thrashing, as shown in [Car83].
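To make the read pattern concrete, here is a minimal in-memory sketch of the entry_order(p) logic. The relation contents, the constant returned by risk(), and the function names are illustrative assumptions, not the paper's implementation; the point is only to show which extra reads the two constraint checks incur.

```python
# Hypothetical in-memory stand-ins for the four relations of Example 1.1.
max_risk = {"NYSE": 50}
min_risk = {"NYSE": 20}
available = {"p1": 100}
orders = []  # tuples (portfolio_id, stockmarket, amount) inserted this week

def risk(order):
    # stand-in for the paper's statistical risk function
    return 35

def entry_order(portfolio_id, stockmarket, amount):
    order = (portfolio_id, stockmarket, amount)
    orders.append(order)  # write part: insert the share order
    # Risk check: one read in MinRisk and one in MaxRisk per inserted order
    if not (min_risk[stockmarket] <= risk(order) <= max_risk[stockmarket]):
        orders.remove(order)
        return "abort: Risk violated"
    # Available check: read the Available tuple for p and scan this
    # week's orders in the Order relation
    total = sum(a for (p, _, a) in orders if p == portfolio_id)
    if total > available[portfolio_id]:
        orders.remove(order)
        return "abort: Available violated"
    return "commit"
```

Under S2PL at degree 3, every read performed by the two checks acquires a read lock held until commit, which is the source of the contention discussed above.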
1.1 Existing Solutions
To avoid thrashing, a first solution is to use a "read committed" locking level (also called locking isolation level 2). A transaction running under this level uses short duration read locks. Unfortunately, such a consistency level may lead to a constraint violation, called read skew anomaly in [BBG+95], as shown by the following example:
[1] risk(o) may use some statistical functions and read other relations not represented in our example.
Example 1.2 Let T1 be an entry_order transaction that inserts a share order o on the stockmarket m such that risk(o) = 35, and checks the Risk constraint. Let t = (m, 20) (resp. u = (m, 30)) be the tuple of MinRisk (resp. MaxRisk) that contains the minimal (resp. maximal) advised risk for the stock-market m. Suppose that a transaction T2 increments the Value attribute of tuples t and u by 20. Then executing T1 and T2 concurrently under locking isolation level 2 may yield the following history[2]:

H1: W1(o) R1(t) W2(t) W2(u) commit(T2) R1(u) commit(T1)
To check the constraint, T1 reads tuples t and u. According to the above history, T1 sees a risk interval of [20, 50] and hence accepts the insertion of o. But the interval read by T1 does not correspond to any coherent state (it should either be [20, 30] or [40, 50]). □

Another solution is to use versions to increase concurrency between transactions [BHG87]. In the multiversion scheme, an update operation on an item x does not modify x directly but creates a new version of x. A read operation on an item x, noted read(x), accesses one of the versions associated with x without taking any lock. With the Snapshot Isolation multiversion protocol[3] [BBG+95], a transaction T executing a read(x) operation always reads the most recent version of x that has been committed before the beginning of T, later called the Start Timestamp of T. Therefore, T reads a snapshot of the database as of the time it started. Updates performed by transactions that are active after T's Start Timestamp are invisible to T. When T is ready to commit, it gets a Commit Timestamp greater than any existing Start Timestamp or Commit Timestamp. Then, T successfully commits only if no other transaction with a Commit Timestamp belonging to T's interval [Start Timestamp, Commit Timestamp] wrote data that T also wrote. Otherwise, T aborts. When T commits, its changes become visible to all transactions whose Start Timestamp is larger than T's Commit Timestamp. Executing transactions under snapshot isolation avoids the "read skew" anomaly shown before. Indeed, since all the read operations of a transaction are performed using the same Start Timestamp, the transaction always sees the same state of the database. For the same reason, Snapshot Isolation has no phantoms [BBG+95]. Unfortunately, transactions running with snapshot isolation may suffer from another constraint violation, called write skew anomaly in [BBG+95], as the following example shows:
Example 1.3 Let T1 and T2 be two entry_order transactions that both perform share orders for the same portfolio p with an amount of 50, and check the Available constraint. Let t = (p, 100) be the tuple in relation Available that contains the amount available for portfolio p. We assume that the current total amount of share orders for portfolio p during the week is equal to 20. Executing T1 and T2 concurrently may yield the following history:

H2: W1(o) R1(t) W2(o') R1(Order) commit(T1) R2(t) R2(Order) commit(T2)
To check the constraint, T1 reads item t and relation Order. Since T2 has not yet committed when T1 starts, T1 reads a previous version of Order and does not see the insertion of o' by T2. For the same reason, T2 does not see the insertion of o by T1. Thus, both transactions evaluate the total amount of share orders to 70 and conclude that the Available constraint is not violated since 70 < 100, although the real amount is equal to 120, which violates the constraint. □
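The write-skew scenario above follows directly from Snapshot Isolation's commit rule described earlier. The first-committer-wins validation can be sketched as follows; the timestamp allocation and the write-set bookkeeping are simplified assumptions, not a DBMS implementation.

```python
import itertools

_clock = itertools.count(1)   # global logical clock for all timestamps
_commits = []                 # (commit_ts, write_set) of committed transactions

class SITransaction:
    """Minimal sketch of Snapshot Isolation bookkeeping."""
    def __init__(self):
        self.start_ts = next(_clock)   # reads see versions committed before this
        self.write_set = set()

    def write(self, item):
        self.write_set.add(item)

    def try_commit(self):
        commit_ts = next(_clock)
        # First-committer-wins: abort if a transaction committed inside
        # (start_ts, commit_ts) wrote an item that we also wrote.
        for ts, ws in _commits:
            if self.start_ts < ts < commit_ts and ws & self.write_set:
                return False  # abort
        _commits.append((commit_ts, self.write_set))
        return True
```

Two concurrent writers of the same item conflict (the second aborts), but two transactions with disjoint write sets, as T1 and T2 in history H2, both commit, which is exactly why write skew is possible.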
[2] Wi (resp. Ri) denotes a write (resp. read) operation performed by transaction Ti.
[3] This protocol is implemented by several commercial DBMSs such as Borland's Interbase 4, Microsoft Exchange System, and Oracle V7.
Another multiversion scheme is the 2V2PL protocol [BHG87], which guarantees serializability of transactions and therefore does not suffer from the above consistency problems. The main problem with the 2V2PL protocol is that it uses an "optimistic" approach to concurrency control, whereby read-write lock conflicts are resolved at the end of transactions by aborting transactions. This makes it difficult to compare its performance with the "pessimistic" S2PL protocol.
1.2 Contributions
Our main contribution is to propose an extension of the multiversion two phase locking (MV2PL) protocol, called EMV2PL, which enables update transactions to use versions and guarantees the serializability of all transactions. The use of our protocol is restricted to a particular class of update transactions called write-then-read transactions (henceforth noted W|R transactions). A transaction in this class consists of two consecutive parts: the first part (called the write part) contains both read and write operations in some arbitrary order, and the second part (abusively called the read part) only contains read operations or write operations on data items already written in the first part of the transaction. In our protocol, the W part of a W|R transaction T is executed under the S2PL protocol. Then, when T reaches the end of its W part, T gets a Lockpoint Timestamp and releases all its read locks. From that point, T does not take locks anymore and a read(x) operation accesses the most recent committed version of x that precedes T's Lockpoint Timestamp.
Example 1.4 Consider again the histories H1 and H2 of the previous examples. If we execute history H1 under EMV2PL, T1 gets a Lockpoint Timestamp ts1 after inserting o and before reading tuples t and u, while T2 gets a Lockpoint Timestamp ts2 after writing u. Thus, since ts1 < ts2, T1 does not see the version of u created by T2 and reads a risk interval of [20, 30]. Hence, T1 detects a Risk constraint violation. For history H2, T1 gets a Lockpoint Timestamp ts1 after inserting o and before reading t, while T2 gets a Lockpoint Timestamp ts2 after inserting o' and before reading t. Since ts1 < ts2, T1 does not see the version of o' created by T2, whereas the version of o written by T1 is seen by T2. Thus, T2 sees a total amount of 120 and detects a violation of the Available constraint. □
Our protocol increases concurrency by allowing W|R transactions to take advantage of versions in two ways: (i) they release their read locks before executing their read part, and (ii) they execute their read part without taking any lock, as read-only transactions do with MV2PL. As Snapshot Isolation does, EMV2PL performs all the read operations of the read part using the same lockpoint-timestamp and thus guarantees that no phantom occurs during the execution of the read part. Moreover, like S2PL and unlike Snapshot Isolation and 2V2PL, EMV2PL uses a "pessimistic" approach to concurrency control. This allowed us to fairly compare its performance with an S2PL protocol, and demonstrate that EMV2PL brings a significant increase in concurrency and reduces the probability of deadlocks. Finally, a notable feature of our protocol is its simplicity, as attested by the few modifications to a classical MV2PL implementation that are required to implement it. We consider this a virtue, since our intention in this research is not to invent yet another concurrency control protocol but rather to enhance existing implementations to better match application needs. A second important contribution of this paper is to show how EMV2PL can be effectively used to optimize an application's transaction throughput when update transactions execute triggers or check integrity constraints. We examine various methods, procedural and declarative, used by
Operation Invocation    Operation Execution
begin(T)                get sn(T) from TM
...
read(x)                 return x's version with largest version number <= sn(T)
...
end(T)

Figure 1: Execution of an R transaction

application developers to implement integrity checking, and analyse the consequence of each method on the pattern of transactions. Finally, we propose tuning rules that indicate how to design transactions with integrity checking under EMV2PL to achieve the best throughput.
1.3 Paper Outline
The remainder of the paper is structured as follows. Section 2 formally defines the EMV2PL protocol, proves its correctness and briefly explains how it can be implemented. Section 3 discusses the potential of EMV2PL to optimize transactions that perform integrity checking. Section 4 presents our performance study and provides simple tuning rules for designing transactions with constraint checking under EMV2PL. Section 5 presents related work and compares EMV2PL with 2V2PL, and Section 6 concludes the paper.
2 Extended Multiversion Two Phase Lock Protocol

In this section we formally define the EMV2PL protocol and prove its correctness, we explain its behavior with respect to deadlocks and external consistency, and finally we briefly explain how it can be implemented.
2.1 The EMV2PL Protocol
We now present our Extended MV2PL protocol, called EMV2PL. Figures 1 and 2 respectively show how operations issued by R and W transactions are processed by the Transaction Manager (TM). An R transaction first obtains a start number, noted sn, from the TM. Then every read(x) gets the most recent version of x having a timestamp less than or equal to sn. Reads in a W transaction follow the usual S2PL protocol, whereas a write(x) creates a new version of x (if x is written for the first time). Before committing, a W transaction obtains its transaction number (noted tn), associates this number to each of its versions, and releases all its locks. Figure 3 shows how the operations issued by a W|R transaction are processed. The write part of the transaction is processed as a W transaction. When the end of the write part is reached, the transaction signals that it has reached a lockpoint[4] to the TM and receives a transaction number tn. The transaction then releases all the S locks it has acquired so far. After that point, read and write operations are processed as follows. A read(x) operation invokes a function check_read(x) that
[4] A lockpoint of a transaction is any point in time between the last lock acquired and the first lock released.
Operation Invocation    Operation Execution
begin(T)
...
read(x)                 get read-lock on x /* may wait according to the 2PL protocol */
                        return the most recent version of x
...
write(y)                get write-lock on y /* may wait according to the 2PL protocol */
                        create a new version of y
...
end(T)                  get tn(T) from TM
                        commit(T): perform database updates with version number tn(T)
                        release locks

Figure 2: Execution of a W transaction
Operation Invocation    Operation Execution
begin(T)
...
read(x)                 get read-lock on x /* may wait according to the 2PL protocol */
                        return the most recent version of x
...
write(y)                get write-lock on y /* may wait according to the 2PL protocol */
                        create a new version of y
...
lockpoint(T)            get tn(T) from TM
                        release S locks
...
read(z)                 check_read(z) /* may wait (see EMV2PL protocol) */
                        return z's version with largest version number <= tn(T)
...
write(t)                update the last version of t /* this version was created by T before lockpoint(T) */
...
end(T)                  commit(T): perform database updates with version number tn(T)
                        release locks

Figure 3: Execution of a W|R transaction
checks if there is an uncommitted version[5] of x created by another transaction whose number is smaller than the caller's tn. In that case, check_read waits until that transaction commits. After that, the W|R transaction reads the most recent version of x with a timestamp smaller than or equal to tn. A write(x) operation only modifies a version already created in the write part. Before committing, a W|R transaction associates its tn to each version it created and releases all its locks. To maintain the tn's, the TM uses a monotonically increasing counter. Since W and W|R transactions obtain their tn after they have acquired their last locks and before committing, tn's are lockpoints. For R transactions, the TM simply guarantees that their sn is smaller than the tn of any active or forthcoming transaction. Thus, an R transaction reads only versions of committed transactions.

Theorem: The EMV2PL protocol guarantees serializability of all transactions.

Proof: See Appendix A.
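The read-part rules just described can be sketched as two small functions. This is a single-threaded illustration under stated assumptions: the dictionaries standing in for the version store and for the lock information are ours, and blocking is reduced to a boolean test rather than an actual wait.

```python
def must_wait(x, caller_tn, uncommitted_tn):
    # check_read(x): the caller must wait while an uncommitted version of x
    # was created by another transaction whose tn is smaller than its own
    # (that version is precisely the one the caller is supposed to read).
    creator_tn = uncommitted_tn.get(x)
    return creator_tn is not None and creator_tn < caller_tn

def read_version(x, caller_tn, versions):
    # Once cleared, read the most recent committed version of x whose
    # version number (the creator's tn) is <= the caller's tn.
    # versions[x] is a list of (tn, value) pairs of committed versions.
    eligible = [(tn, value) for (tn, value) in versions[x] if tn <= caller_tn]
    return max(eligible)[1]
```

If the uncommitted version's creator has a larger tn (or no tn yet), the caller does not wait: that version is invisible to it anyway, and the most recent committed version with a smaller timestamp is returned.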
2.2 Deadlocks
Clearly, EMV2PL suffers from deadlocks since it uses S2PL for serializing W transactions and the write parts of W|R transactions. However, once a W|R transaction starts executing its read part, it cannot be involved in deadlocks anymore. Proof: See Appendix A.3.
2.3 External Consistency
Although EMV2PL guarantees serializability, it does not preserve external consistency. That is, the order in which transactions commit may differ from their serialization order, as the following example shows:

Example 2.1 Let T1 be a W|R entry_order transaction that inserts a share order o on the stockmarket m such that risk(o) = 35 and checks the Risk constraint in its read part. Let t = (m, 20) (resp. u = (m, 30)) be the tuple of MinRisk (resp. MaxRisk) that contains the minimal (resp. maximal) advised risk for the stock-market m. Suppose a transaction T2 increments the Value attribute of tuple u by 10. Then executing T1 and T2 concurrently under EMV2PL may yield the following history:

H1: W1(o) R1(t) W2(u) commit(T2) R1(u) rollback(T1)
Since T1 reaches its lockpoint before T2, it is serialized before T2 by EMV2PL, and thus T1 does not see that transaction T2 made (and committed) a larger risk interval, under which the order could have been accepted. □

Such consistency "faults" are likely to occur if the read parts of W|R transactions are long. Should external consistency be critical, it may help to show the value of transaction lockpoint-timestamps to users, instead of showing the transaction commit time, since these timestamps reflect the serialization order. Intuitively, the lockpoint-timestamp of a W|R transaction indicates at which time the decision to commit or abort was taken (even though the system committed or aborted the transaction at some later time).
[5] i.e., a version created by a still active transaction.
[Figure 4 shows the lock manager data structures: a lock hash table with an exclusive semaphore on each hash chain, a transaction table, and, per lock header, a queue of lock requests protected by an exclusive semaphore on the lock queue.]

Figure 4: Lock manager data structures
2.4 Implementation Issues
The EMV2PL protocol may use the same mechanisms as MV2PL to maintain timestamps and versions. Therefore, we mainly discuss here the implementation of the check_read(x) function. All the information needed by check_read(x) is found in the lock table. Indeed, checking if there is an uncommitted version of x only requires checking if there is an exclusive lock taken on x. If check_read(x) must wait for x's creator to commit, it simply waits for the release of this lock. We show that these mechanisms are easily implemented on top of an existing MV2PL lock manager. We take a lock manager similar to the one described in [GR93]. For simplicity we only consider exclusive locks (noted X-locks) and shared locks (noted S-locks). The lock manager's data structures are summarized in Figure 4. Its two basic interfaces are lock and unlock. The only change to the lock manager's data structures consists of associating to each lock header a list of process ids, referred to as list_pid. This list is used to store the pid(s) of W|R transactions blocked by check_read. Given this, check_read(x) performs the following. It looks at the lock header associated with x and checks if x is X-locked. If so, there is an uncommitted version of x and check_read() checks if the owner of this lock has a tn smaller than the caller's. In that case, check_read() appends the caller's pid to the lock header's list_pid and waits for the release of the X-lock. After it wakes up, or if it was not blocked, check_read returns an ok message. These operations require neither traversing the lock request queue nor inserting a new lock request into it. [MHLP91] proposes a method to check that a page only contains committed data without inspecting the lock table. This method can greatly reduce the locking overhead. It can be used in our protocol as follows: check_read(x) first checks if the page containing x contains committed data, as described in [MHLP91]. If this condition is verified, the lock table need not be read. Finally, we slightly modify the unlock interface so that (i) a transaction releasing an X-lock wakes up all processes whose pids are in the lock header's list_pid and sets list_pid to null, and (ii) a transaction may release all its S-locks. These are the only modifications of the unlock routine. Note that the lock interface is unchanged. In fact, check_read has less overhead than lock (which would be invoked instead of check_read in MV2PL or S2PL) in
pattern    procedural            declarative
R|W        check-before-write    -
W|R        write-then-check      deferred checking: deferred or deferrable assertions, and deferred safe RCA triggers
(W|R)*     immediate checking    immediate checking: immediate assertions and triggers
none       others                others

Table 1: Patterns of transactions and integrity checking methods

terms of lock queue traversal and memory space, since instead of en-queuing new lock requests, it (sometimes) appends the caller's pid to a list. The number of locks in the system is thus reduced. We give in Appendix 1 the pseudo-code for check_read, assuming the implementation context described in [GR93].
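Under the assumptions above, and with `block`/`wake` callbacks standing in for process suspension, the two modified lock-manager paths might look like the following sketch (the field names mirror the description, not the actual [GR93] code):

```python
class LockHeader:
    """Per-item lock header, extended with the list_pid described above."""
    def __init__(self):
        self.x_owner_tn = None   # tn of the X-lock owner; None if unlocked
                                 # (or locked by a transaction without a tn yet)
        self.list_pid = []       # pids of W|R transactions blocked by check_read

def check_read(header, caller_pid, caller_tn, block):
    # Fast path: no lock request is enqueued. We only wait if the X-lock
    # owner already has a tn smaller than ours, i.e. it created an
    # uncommitted version that we are supposed to read.
    if header.x_owner_tn is not None and header.x_owner_tn < caller_tn:
        header.list_pid.append(caller_pid)
        block(caller_pid)        # suspend until the X-lock is released
    return "ok"

def unlock_x(header, wake):
    # Releasing an X-lock wakes every process recorded by check_read
    # and resets list_pid, as in the modified unlock interface.
    for pid in header.list_pid:
        wake(pid)
    header.list_pid = []
    header.x_owner_tn = None
```

Note that check_read never touches the queue of lock requests: it only inspects the header and, at worst, appends one pid to list_pid.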
3 Application to Integrity Checking

We showed through the examples of Section 1 that EMV2PL makes it possible to avoid constraint anomalies. However, the applicability of EMV2PL suffers from the following limitations: (i) the transactions must be write-then-read transactions, and (ii) the lockpoint (i.e., the end of the write part) must be detected. In this section, we discuss the applicability of EMV2PL to constraint checking. We consider different methods for programming integrity checks and analyze the consequences of each method on the pattern of transactions. All the resulting patterns are summarized in Table 1. Moreover, we show how the lockpoint can be automatically detected (i.e., without knowing the transactions in advance) in the case of deferred declarative triggers and assertions.
Procedural Approach: The vast majority of database applications implement integrity constraints using a procedural approach whereby integrity checks are embedded into application programs. We distinguish three classes of integrity checking methods:

The write-then-check method consists in checking constraints at the end of the transaction. Such a method is sometimes mandatory because some temporary inconsistent state is unavoidable during the execution of the transaction, or the interactive effects between two or more updates have to be controlled afterwards, or the integrity checks depend on the logic of the transaction program (especially when some conditional branching is used).

The check-before-write method consists in checking constraints at the beginning of transactions. This method is expected to bring the following advantages: (i) exclusive locks on the updated data items are held for a shorter time if the updates occur at the end of the transaction, and (ii) less work is possibly wasted when the transaction violates data integrity since there are no unnecessary writes[7].

[7] According to the experience of the authors and others ([Sha], [AD], [Coc], [Moh]), check-before-write is a frequent method used by application developers to optimize the enforcement of integrity constraints.
The immediate checking method consists in checking constraints just before or just after update operations. Such a method makes it possible to provide intermediate consistent states during the execution of the transaction.

All the resulting patterns are shown in Table 1. Among these methods, only the write-then-check method yields W|R transactions. Hence, provided that the programmer has the ability to manually insert a lockpoint (e.g., using a specific command in the transaction program) at the end of the write part, a transaction's execution can take advantage of our EMV2PL protocol.
Declarative Assertions: Declarative assertions include two forms of constraints: check constraints and referential constraints. Referential constraints are definable by means of foreign key clauses. A referential constraint involves a referencing relation and a referenced relation. One or several columns of the referencing relation are specified as a foreign key. The values of these columns must also appear in the appropriate columns of the referenced relation. The treatment of updates and deletes to the referenced relation can be specified in the foreign key clause using a foreign key action that describes the action to take in case of integrity violation. A "cascade" action propagates the update or the delete to the referencing relation, a "set null" (or "default") action sets to the null (default) value all foreign key columns whose values are no longer matched in the referenced relation, and a "no action" disallows the violating update or delete. A check constraint can be any SQL condition, including multi-table conditions that may read a large number of additional tuples from several tables[8]. If the check constraint is not verified, the system undoes the violating statement or rolls back the whole transaction. Referential constraints and check constraints can be checked either immediately after an SQL statement (immediate mode) or at the end of the transaction (deferred mode). The execution of immediate referential and check constraints yields (W|R)* transactions. The execution of deferred referential and check constraints yields W|R transactions provided that the referential constraints with "cascade" or "set null" actions are executed first. In this case, a lockpoint can be dynamically detected by the system (i.e., without knowing the transactions in advance) when all the remaining constraints are with "no action".
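The lockpoint-detection rule for deferred constraints can be sketched as follows. The constraint representation (a list of (kind, action) pairs) and the callback names are illustrative assumptions; the point is the ordering: writing referential actions run inside the write part, the lockpoint is signalled, and the remaining pure checks run lock-free on versions.

```python
def run_deferred_checks(constraints, execute, signal_lockpoint):
    # Constraints whose action writes ("cascade", "set null", "set default")
    # are executed first, still inside the write part of the transaction.
    writing = [c for c in constraints if c[1] != "no action"]
    checks = [c for c in constraints if c[1] == "no action"]
    for c in writing:
        execute(c)
    # Only "no action" constraints remain: the end of the write part is
    # reached, so the system can signal the lockpoint and release S locks.
    signal_lockpoint()
    for c in checks:
        execute(c)   # pure checks: read-only, executed on versions
```

This is exactly what turns a transaction with deferred constraints into a W|R transaction without knowing its code in advance.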
Remarks about EMV2PL applicability: Assuming that deferred referential constraints with "cascade" or "set null" actions are executed before the other deferred constraints is quite realistic. Indeed, in [Hor92], a deterministic execution model for integrity maintenance with such a characteristic is proposed. Moreover, SQL92 has a very explicit specification of constraints with respect to database modifications, which makes the fixpoint evaluation of constraints independent of the order in which rows are processed and thus allows the "cascade" and "set null" referential constraints to be executed first. Let us note that more than a dozen SQL database vendors have conforming implementations of SQL92 constraints.

Declarative Triggers: Another way to enforce integrity constraints is to use triggers. A trigger
consists of an event that causes the trigger to be activated, a condition that is checked when the trigger is activated, and an action that is executed when the trigger is activated and its condition is true. The triggering event is an insertion, deletion or update applied to a given relation, say R. We say that the trigger is defined on R. The condition is an SQL search condition over the database,
[8] SQL-92 distinguishes table check constraints and assertions: a table check constraint is attached to one table and is used to express a condition that must be true for every tuple in the table. An assertion is a stand-alone check constraint in a schema and is normally used to specify a condition that affects more than one table.
and the action is an atomic procedure that may contain SQL statements combined with other procedural constructs. Triggers can be executed either immediately before or after the triggering SQL statement (immediate mode) or at the end of the transaction (deferred mode). But unlike deferred constraints, deferred triggers do not necessarily yield W|R transactions, since the action of triggers can perform database updates. More precisely, the problem is the following: given a transaction ready to commit and a set of deferred triggers activated by the transaction, how can the database system statically detect a transaction lockpoint[9]? A simple method that can be used by the rule manager to detect a lockpoint when triggers execute at the end of transactions consists in detecting a specific class of triggers called RCA safe triggers. An RCA trigger is a trigger whose action part does not acquire new exclusive locks. RCA stands for Rollback, Compensative, Alerter trigger. Indeed, the action part of triggers that do not acquire new exclusive locks typically (i) performs a rollback, (ii) overwrites database items that have already been inserted, updated or deleted by the triggering transaction, or (iii) raises an alert. An RCA trigger is safe if it cannot transitively trigger a non-RCA trigger. When the rule manager receives the "end-of-transaction" signal from a transaction, the set S of triggers that have been activated is computed. Then, the rule manager recursively selects a trigger r from S, executes r and recomputes S. The rule manager signals the lockpoint to the TM when all the triggers in S are RCA and safe.

Remark about RCA and safe triggers: Let us note that RCA safe triggers can be detected at the time triggers are defined. The safe property can be determined using a triggering action graph (TAG), as defined in [Mel93].
The RCA property may require a complicated code analysis (except for evident cases such as alerters or rollback triggers), which is largely beyond the scope of this paper.
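The safe property reduces to a reachability test on the triggering action graph: a trigger is RCA safe if it is RCA and every trigger it can transitively activate is RCA as well. A sketch, assuming the TAG is given as an adjacency mapping and the RCA classification as a set:

```python
def is_rca_safe(trigger, tag, rca):
    # tag: triggering action graph, trigger -> set of triggers its action
    #      may activate; rca: set of triggers known to be RCA.
    # Depth-first traversal: fail as soon as a reachable trigger (including
    # the starting one) is not RCA, i.e. may acquire a new exclusive lock.
    seen, stack = set(), [trigger]
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        if t not in rca:
            return False
        stack.extend(tag.get(t, ()))
    return True
```

Since the TAG is known when triggers are defined, this test can be run once at trigger definition time; at run time the rule manager only has to check that every trigger remaining in S passed it before signalling the lockpoint.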
4 Performance Study

In this section we evaluate the performance of S2PL and EMV2PL under various transaction workloads in a centralized database. To compare the relative benefits of S2PL and EMV2PL for various transaction patterns and in a wide range of operating conditions, we have implemented a simulation model[10]. In our experiments, we consider the case of workloads where some initial W transactions are lengthened by the execution of additional read operations (e.g., implied by the execution of decision support procedures or constraint checking), producing either R|W, W|R or (W|R)* transaction patterns (see Table 1). The simulation results yield the following conclusions: (1) When the workload contains W|R transactions, EMV2PL reduces the number of read-write lock conflicts and thus decreases the lock contention. Moreover, it also significantly reduces the number of deadlocks. In particular, the results show that executing additional read operations in W|R transactions under EMV2PL prevents the lock contention thrashing possibly caused by these additional reads. (2) The performance improvement brought by EMV2PL is limited by the increased resource contention (disks) caused by the use of versions. Moreover, as we will see, performance analysis of different workloads and operating situations supports a kind of feedback tuning approach which, given a set of W transactions and a set of

[9] In fact, the problem is more complicated because check constraints, referential constraints, and triggers can be mixed together. However, considering the general framework requires a precise description of an execution model for deferred triggers and assertions, a still open problem which is largely beyond the scope of this paper.

[10] Note that our simulation model only reflects the relative benefits and costs, not exact numbers.
[Figure 5: The Simulation Model — (a) Logical Queuing Model (terminals, Transaction Manager, CC request/wait queues, CC Agent, Data Manager, Log Manager, restart delay); (b) Physical Queuing Model (CPU and data-disk servers, log disk with flush queue).]

additional read operations, suggests a suitable transaction pattern (R|W, W|R or (W|R)*) under which to profitably execute these new read operations under EMV2PL. In particular, our tuning guide is useful in the context of constraint checking in order to select an efficient constraint checking method among the procedural check-before-write and write-then-check methods and the declarative immediate and deferred methods.
4.1 The Simulation Model
Our simulation model is closely based on [CM86], [ACL87], [BC92] and [SLSV95]. It has two parts: the system model simulates the behavior of the various operating system and DBMS components, while the application model simulates the database items and the transactional workload.
4.1.1 The System Model

In our simulation, we model the concurrent execution of transactions on a single-site database. To keep the simulator simple, we simulate page-level locking. This allows us not to simulate indexes and index locking; transactions access records randomly. The system model (Figure 5a) is divided into four main components: a Transaction Manager (TM), a Concurrency Control Manager (CCM), a Data Manager (DM) and a Log Manager (LM). The TM is responsible for issuing concurrency control requests and their corresponding database operations. It also ensures the durability property by flushing all log records of committed transactions to durable memory. The CCM schedules the concurrency control requests according to either the S2PL or the EMV2PL protocol. The LM provides read and insert-flush interfaces to the log table. The DM is responsible for granting access to the physical data objects and executing the database operations.
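To make the queuing model concrete, here is a minimal discrete-event sketch of one resource unit (k = 1, one CPU server and one data-disk server, both FCFS), using the service-time parameters of Table 2. This is our own illustration, not the authors' simulator: it omits locking, logging, aborts and restarts, so it only reproduces the resource-contention side of the model.

```python
import heapq

def simulate(mpl=4, n_ops=8, record_cpu=1.0, page_io_access=7.0,
             commit_cpu=1.0, horizon=200_000.0):
    """Closed system: mpl terminals each loop forever, running
    transactions of n_ops record accesses (a CPU stage followed by a
    page I/O), then a commit CPU stage. Times are in milliseconds;
    returns throughput in committed transactions per second."""
    cpu_free = disk_free = 0.0   # next instant each FCFS server is idle
    committed = 0
    # Event = (ready_time, terminal_id, ops_left_before_commit).
    heap = [(0.0, t, n_ops) for t in range(mpl)]
    heapq.heapify(heap)
    while heap:
        now, term, left = heapq.heappop(heap)
        if now >= horizon:
            continue                      # drop events past the horizon
        if left > 0:                      # one record access
            start = max(now, cpu_free)
            cpu_free = start + record_cpu
            io_start = max(cpu_free, disk_free)
            disk_free = io_start + page_io_access
            heapq.heappush(heap, (disk_free, term, left - 1))
        else:                             # commit, then start a new tx
            cpu_free = max(now, cpu_free) + commit_cpu
            committed += 1
            heapq.heappush(heap, (cpu_free, term, n_ops))
    return committed / (horizon / 1000.0)
```

With the default values the disk is the bottleneck (8 accesses x 7 ms = 56 ms of I/O per transaction), so throughput saturates near 1000/56, i.e. just under 18 tx/s, once a few terminals keep the disk busy.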
Parameter        Description                                             Value
num_buffer       Number of pages in the buffer pool                      600
k                Resource units (k CPUs and 2k disks)                    1, 2 or 3
record_cpu       CPU time for accessing a record                         1 millisecond
page_io_access   I/O time for accessing a page                           7 milliseconds
log_disk_io      Time for issuing a log I/O access                       7 milliseconds
log_rec_io_w     I/O time for sequentially writing 1 page on log disk    1 millisecond
commit_cpu       CPU time for executing a commit                         1 millisecond
abort_cpu        CPU time for executing an abort                         1 millisecond
restart_delay    Restart delay of an aborted transaction                 5 milliseconds
cc_request_cpu   CPU time for servicing one cc request                   1 millisecond

Table 2: System Parameters Definitions and Values

The DM encapsulates the details of an LRU Buffer Manager. The number of pages in the buffer cache is num_buffer. These pages are shared by the main segment and the version pool. When a dirty version pool page is chosen for replacement by the LRU algorithm, the DM first checks whether it contains needed versions. If not (i.e., if it contains only obsolete versions), the page is considered non-dirty and simply discarded. Otherwise, it is written to disk. We have chosen to simulate the on-page version caching technique [BC92] because it is one of the most efficient techniques for maintaining versions and processing transactions: first, versions are maintained for records (instead of pages); second, a small portion of each page is used for caching previous versions of records. As a result, readers may find the adequate version without performing any additional I/Os. Also, these versions may sometimes be eliminated while still in the page, and thus need not be appended to the version pool at all. The physical queuing model is shown in Figure 5b. There are k resource units, each containing one CPU server and two I/O servers ([CM86]). The requests to the CPU queue and I/O queues are serviced FCFS (first come, first served). Parameter record_cpu is the amount of CPU time for accessing a record in a page. Parameter page_io_access is the amount of I/O time associated with accessing a data page from the disk. We added one separate I/O server dedicated to the log file. The parameter log_disk_io represents the fixed I/O time overhead associated with issuing the I/O. Parameter log_rec_io_w is the amount of I/O time associated with writing a log record on the log disk in sequential order. Parameter commit_cpu is the amount of CPU time associated with executing the commit (releasing locks, etc.). Parameter abort_cpu is the amount of CPU time associated with executing the abort statement (executing undo operations, releasing locks, etc.).
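The DM's eviction decision for a dirty version-pool page can be sketched as follows. This is our reading of the rule, not the [BC92] implementation: versions are modeled as (record, commit-timestamp) pairs, and a version is obsolete once a newer version of the same record was committed no later than the start of the oldest active transaction, so no present or future reader can ever select it.

```python
def obsolete(ts, record_version_ts, oldest_active_start):
    """A cached version committed at ts is obsolete if a newer version
    of the same record was committed no later than the start of the
    oldest active transaction: every reader then picks that newer one."""
    return any(ts < t <= oldest_active_start for t in record_version_ts)

def evict(page, versions_by_record, oldest_active_start):
    """Eviction decision for a dirty version-pool page chosen by LRU.
    page: list of (record_id, commit_ts) cached versions.
    Returns 'discard' (no I/O needed) when every cached version is
    obsolete, 'flush' (write to the on-disk version pool) otherwise."""
    if all(obsolete(ts, versions_by_record[r], oldest_active_start)
           for r, ts in page):
        return "discard"
    return "flush"
```

For example, a version of x committed at time 5 is obsolete once a version committed at time 10 exists and the oldest active transaction started at time 12; if instead the oldest active transaction started at time 7, the version at 5 may still be needed and the page must be flushed.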
Table 2 summarizes the parameters of the system model and their values for the experiments.
4.1.2 The Application Model

The database contains num_rec records. With S2PL, 18 records fit in one page (this corresponds to pages of 8K containing records of 454 bytes). With EMV2PL, the records are assumed to contain an additional 50 bytes to store the timestamp and version pointer. As a result, only 15 records fit in one page, num_cached of which are used to cache previous versions. A transaction workload contains transactions that consist of an update part extended with additional read operations. There are mpl terminals executing transactions. Parameter U_size is the average number of operations executed by the update part of each transaction (without the additional reads). Among these operations, percent_write_U are write operations. Moreover, a
Parameter        Description                                                   Value
num_rec          Number of records in the database                             150000
num_cached       Number of records cached in each page                         3
mpl              Number of terminals                                           1 to 70
U_size           Mean initial size of the update part of a tx                  30
percent_write_U  Fraction of writes in the update part of a tx                 25%
RperW            Number of additional reads per write in the update part       1 to 13
P_rollback       Probability that RperW read operations generate a rollback    0 to 10%
P_type           Pattern of the extended tx                                    W|R, R|W, (W|R)*

Table 3: Workload Parameters Definitions
[Figure 11(a): W|R tx throughput (t/s) under EMV2PL and S2PL as mpl varies. Figure 11(b): W|R tx throughput (t/s) under EMV2PL and S2PL as RperW varies.]
transaction executes RperW additional read operations per write operation occurring in its update part. Parameter P_type represents the resulting transaction pattern: R|W, W|R or (W|R)* transactions. In an R|W (resp. W|R) transaction, all the additional read operations are executed at the beginning (resp. at the end) of the transaction. If the transaction is (W|R)*, then RperW read operations are executed "on the fly" just after each write operation. We also vary the probability P_rollback of executing a rollback after RperW additional reads. When additional read operations consist of integrity checks, this makes it possible to simulate the detection of an integrity violation that leads to rejecting the transaction. Let us recall that, in our experiments, workloads only contain transactions of the same pattern. All the parameters are summarized in Table 3 (where "transaction" is abbreviated "tx"). Regarding the measurements, each simulation consisted of 3 to 5 repetitions, each consisting of 2000 seconds of simulation time. These numbers were chosen in order to achieve more than 90 percent confidence intervals for our results.
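The construction of the three patterns from an update part can be sketched as follows (our own reconstruction of the generator implied by Table 3, not the authors' code; record identifiers and timings are omitted).

```python
import random

def make_transaction(u_size=30, pct_write=0.25, rper_w=6,
                     p_type="W|R", rng=None):
    """Build an update part of u_size operations, of which a fraction
    pct_write are writes, then attach rper_w additional reads per write
    according to p_type. Operations are tagged strings: 'r'/'w' for the
    update part, 'R' for an additional read."""
    rng = rng or random.Random(0)
    update = ["w" if rng.random() < pct_write else "r"
              for _ in range(u_size)]
    extra = ["R"] * (rper_w * update.count("w"))
    if p_type == "R|W":
        return extra + update            # all additional reads first
    if p_type == "W|R":
        return update + extra            # all additional reads last
    ops = []                             # (W|R)*: reads "on the fly"
    for op in update:
        ops.append(op)
        if op == "w":
            ops += ["R"] * rper_w
    return ops
```

The three patterns contain exactly the same operations; only their placement differs, which is what the experiments below vary.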
[Figure 12(a): W|R tx response time under S2PL, divided into CPU, WAIT and I/O components, as RperW varies. Figure 12(b): the same under EMV2PL.]
4.2 Experiment 1: Benefits of EMV2PL
The goal of this experiment is to show the value of EMV2PL for applications containing W|R transactions. The workload contains only W|R transactions (P_type = W|R). In Figure 11(a), we vary the multiprogramming level mpl from 1 to 70. RperW is fixed to 6. We measure the throughput of concurrent W|R transactions running under S2PL (curve S2PL) or EMV2PL (curve EMV2PL). Figure 11(a) shows that EMV2PL always gives the best performance. With large multiprogramming levels (mpl >= 40), the throughput of W|R transactions under S2PL reaches a thrashing situation. As observed and explained in several studies [Tho91] [CKL90], the thrashing situation is caused by system under-utilization due to transaction blocking and by the wasted processing caused by transaction aborts. The curve EMV2PL shows that executing W|R transactions under EMV2PL avoids such a thrashing situation. This is because EMV2PL eliminates the read-write lock conflicts due to the read operations executed in the read part of the transactions and reduces deadlocks among W|R transactions. Indeed, with mpl = 60, the mean number of waits per transaction is 2.21 under S2PL and only 1.01 under EMV2PL. With mpl = 60, the abort rate is 20% under S2PL and only 1% under EMV2PL. In Figures 11(b), 12(a) and 12(b), we fixed mpl to 50 and varied the number of additional reads per write operation from 1 to 13. Figure 11(b) shows the throughput of W|R transactions. The throughput of W|R transactions is always better under EMV2PL than under S2PL. Moreover, the longer the W|R transactions, the bigger the performance gain of EMV2PL. Figure 12(a) (resp. 12(b)) shows the response time of W|R transactions and how it is divided into CPU, wait and I/O times under S2PL (resp. EMV2PL). EMV2PL significantly reduces the wait time of W|R transactions, while the contention on disk servers increases because transactions execute operations at a faster rate. The number k of resource units is thus an important parameter.
Figure 13 shows the gain in throughput of W|R transactions under EMV2PL relative to S2PL with one, two or three resource units and under various multiprogramming levels. It shows that EMV2PL is more efficient when there are more resource units, since the gain in concurrency is then less affected by a higher contention on disk servers.
[Figure 13: W|R — relative benefit (throughput ratio EMV2PL/S2PL) for k = 1, 2 and 3 resource units, as mpl varies. Figure 14: throughputs (t/s) of W|R under EMV2PL, R|W under S2PL and (W|R)* under S2PL, as mpl varies.]
4.3 Experiment 2: Application to Integrity Checking
In the following experiments, we evaluate workloads of transactions that perform integrity checking. We assume that one single integrity check is performed per write operation occurring in the transactions. We also assume that each integrity check requires a fixed number of read operations, indicated by the RperW parameter. The P_rollback parameter represents the probability that an integrity check detects a constraint violation and rolls back the transaction. In these experiments, we compare the performance of transactions executed with various integrity checking methods. As shown in Table 1, the choice of an integrity checking method has an impact on the pattern of the transactions. The procedural check-before-write policy at transaction granularity yields R|W transactions. Declarative deferred checking and procedural write-then-check at transaction granularity yield W|R transactions. Other policies yield (W|R)* transactions. In the following experiments, we fixed the RperW parameter to 6. We execute W|R transactions under EMV2PL. R|W and (W|R)* transactions are executed under S2PL. Figure 14 shows the total throughput when no integrity check can issue a rollback (P_rollback = 0). Curves R|W and (W|R)* show that with large multiprogramming levels the throughput of R|W or (W|R)* transactions reaches a thrashing situation caused by the additional read locks taken during constraint checking. Curve W|R shows that checking the constraints at the end of the transaction under EMV2PL avoids the thrashing situation. This is because EMV2PL eliminates all the read-write conflicts and deadlocks due to constraint checking.

Effect of rollbacks. Figures 15(a) and 15(b) show the influence of transaction rollbacks caused by integrity checks. In Figure 15(a) (resp. Figure 15(b)), the probability P_rollback that a constraint is violated and produces a rollback is set to 5% (resp. 10%). The curves show that R|W transactions provide the best throughput when the multiprogramming level is small.
Indeed, checking constraints at the beginning of transactions avoids unnecessary operations when a constraint is violated. Of course, the higher the probability of rollbacks, the better the throughput of R|W transactions (compare the R|W curves in Figures 15(a) and 15(b)). However, W|R transactions under EMV2PL outperform R|W transactions when the multiprogramming level becomes larger: W|R transactions under EMV2PL outperform R|W transactions when mpl > 30 if P_rollback = 5% (see Figure
[Figure 15(a): throughputs (t/s) of W|R under EMV2PL, R|W under S2PL and (W|R)* under S2PL as mpl varies, with P_rollback = 5%. Figure 15(b): the same with P_rollback = 10%.]
15(a)) and when mpl > 40 if P_rollback = 10% (see Figure 15(b)). This is explained as follows. When the multiprogramming level is large, the read-write lock contention is high. Since EMV2PL eliminates these lock conflicts, its effect becomes predominant.

Effect of read-write lock conflicts. To show the effect of read-write lock conflicts, we divided the database into two parts, DB1 and DB2. Transactions perform operations only on DB1. Constraint checking produces read operations only on DB2. In these experiments, RperW is fixed to 12 and P_rollback = 0. Figure 16(a) shows the throughput of W|R, R|W and (W|R)* transactions. This figure shows that checking constraints at the beginning of transactions always gives the best performance. This is explained as follows. In W|R transactions, exclusive locks taken in the write part are held for a longer time than with R|W transactions. Indeed, W|R transactions keep the exclusive locks until the whole read part is executed, while R|W transactions release their exclusive locks at the end of the write part. Moreover, since constraints only read DB2, they never conflict with the write operations performed by the transactions on DB1. Thus, EMV2PL does not eliminate any read-write conflicts and its beneficial effect becomes insignificant. Finally, Figure 16(a) also shows that executing W|R transactions under EMV2PL gives better performance than executing (W|R)* under S2PL. This is because EMV2PL allows transactions to release their read locks before executing their read part. Thus, on average, read locks on DB1 are held for a shorter time under EMV2PL. In Figure 16(b), we consider a workload where 50% of the constraints perform operations on DB1 and 50% on DB2. We consider here three constraint checking policies.
The check-before-write policy yields R|W transactions; the write-then-check (or deferred checking) policy yields W|R transactions; and a mixed policy, which consists in checking constraints on DB2 at the beginning of transactions and constraints on DB1 at the end of transactions, yields R|W|R transactions. R|W transactions are executed under S2PL. W|R and R|W|R transactions are executed under EMV2PL. Let us note that, under EMV2PL, R|W|R transactions take locks until the end of their W part; that is, the first read part of these transactions is executed as under S2PL and the second read part is executed using versions. Figure 16(b) shows the resulting throughputs under various multiprogramming levels. It shows that the mixed policy (R|W|R
[Figure 16(a): throughputs (t/s) with no read-write conflicts — R|W under S2PL, W|R under EMV2PL, (W|R)* under S2PL — as mpl varies. Figure 16(b): throughputs with 50% read-write conflicts — R|W under S2PL, W|R under EMV2PL, R|W|R under EMV2PL — as mpl varies.]
transactions) gives the best performance. This is explained as follows. First, by executing the constraints that perform read operations on DB1 at the end of transactions under EMV2PL, we eliminate all the additional read-write conflicts and avoid a thrashing situation (only the R|W curve shows a thrashing behavior). Second, by executing the constraints that perform read operations on DB2 at the beginning of transactions, we do not add any read-write lock conflict and we shorten the write lock holding time of the transactions (compare the W|R curve with the R|W|R curve).
4.4 Rule-of-Thumb Lessons from these Experiments
The simulation results show that, when the workload contains W|R transactions, EMV2PL increases performance by eliminating the read-write lock conflicts due to the R part of the W|R transactions. This performance improvement is reduced by the increased resource contention caused by versions. Moreover, the simulation results show that executing the read operations for constraint checking at the end of transactions is usually the best solution under EMV2PL, except in the following situations: (1) There is high resource contention. Once again, adding read operations at the end of transactions intensifies the use of versions and thus increases the resource contention. (2) The read operations are not involved in any read-write lock conflict. In such a case, executing the read operations using versions does not eliminate any lock conflict. Moreover, exclusive locks taken in the write part of the transaction are held for a longer time (until all the additional read operations are executed). Thus, the best solution is to execute the additional read operations at the beginning of transactions. (3) The read operations consist of integrity checks with a "high" probability of rollbacks. In such a case, checking the constraint at the end of the transaction will possibly waste work on a transaction that is eventually rolled back. Once again, the best solution is to perform the checks with a high probability of rollback at the beginning of transactions.
These considerations lead to a simple feedback method to improve the pattern of transactions that execute a set of constraints.
Guideline for tuning constraint checking under EMV2PL
(1) Evaluate the lock contention.
(2) Evaluate the system load.
(3) If the system is under-utilized because of lock contention, then:
    (a) Find the constraints that are involved in many lock conflicts. If they are not often violated, try to execute them in deferred mode.
    (b) Find the constraints that are involved in very few lock conflicts, and try to execute them at the beginning of transactions.
(4) If lock contention is low, find the constraints that are very often violated and try to execute them at the beginning of transactions.
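Read as a decision rule, the guideline can be sketched as follows. The function, its inputs (per-constraint conflict and violation rates, plus two boolean system observations) and its thresholds are our own illustrative rendering of the four steps, not part of the paper.

```python
def placement(conflict_rate, violation_rate,
              lock_contention_high, system_underutilized,
              conflict_threshold=0.10, violation_threshold=0.05):
    """Where to run a constraint's read operations: 'deferred' (end of
    the transaction, on versions under EMV2PL) or 'immediate'
    (beginning of the transaction). conflict_rate and violation_rate
    are per-constraint measurements; the threshold values are
    illustrative assumptions."""
    if lock_contention_high and system_underutilized:       # steps (1)-(3)
        if (conflict_rate >= conflict_threshold
                and violation_rate < violation_threshold):
            return "deferred"    # (3a) many conflicts, rarely violated
        if conflict_rate < conflict_threshold:
            return "immediate"   # (3b) few conflicts: check up front
    if not lock_contention_high and violation_rate >= violation_threshold:
        return "immediate"       # (4) often violated: fail fast
    return "deferred"            # otherwise keep deferred checking
```

The rule is meant to be re-evaluated periodically from measured conflict and violation statistics, which is what makes the approach a feedback loop.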
5 Related Work

An extensive literature addresses the problem of designing concurrency control algorithms that increase the performance of concurrent transactions. Our work directly builds on previous work on MV2PL protocols ([CFL+82], [BHG87], [AS89], [AK91], [BC92], [MPL92]), but differs from those by focusing on the problem of optimizing read operations in update transactions. The 2V2PL multiversion protocol authorizes the use of versions in update transactions, and guarantees serializability of transactions. In this protocol, there are three modes of locks associated with each lock unit: read locks (r), write locks (w) and certification locks (c). The corresponding compatibility matrix is the following:

      r               w               c
r     compatible      compatible      not compatible
w     compatible      not compatible  not compatible
c     not compatible  not compatible  not compatible

After each read (resp. write) operation, an r lock (resp. w lock) is taken. A read(x) operation always accesses the last committed version of x. At the end of the transaction, w locks are converted into c locks, and the transaction commits if and only if all its w locks have been successfully converted into c locks. To compare EMV2PL with 2V2PL, we compare the sets of serializable histories that they admit. In fact, EMV2PL accepts histories that are not accepted by 2V2PL and conversely, as shown by the following examples.
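The compatibility matrix translates directly into a lock-grant test; the sketch below is our own illustration (the data structures are not from the paper), and is convenient for replaying the histories of the examples that follow.

```python
# 2V2PL lock compatibility: keys are (requested mode, held mode).
# r = read lock, w = write lock, c = certification lock.
COMPAT = {
    ("r", "r"): True,  ("r", "w"): True,  ("r", "c"): False,
    ("w", "r"): True,  ("w", "w"): False, ("w", "c"): False,
    ("c", "r"): False, ("c", "w"): False, ("c", "c"): False,
}

def can_grant(requested, held_by_others):
    """A lock request is granted iff the requested mode is compatible
    with every mode already held by other transactions on the unit."""
    return all(COMPAT[(requested, h)] for h in held_by_others)
```

For instance, T2's attempt in Example 5.1 to certify y while T1 holds a read lock on it is a can_grant("c", ["r"]) request, which is refused, so under 2V2PL T2 must wait for T1 to commit.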
Example 5.1 Suppose we have two W|R transactions T1 and T2, and consider the following history:

w1(x) w2(y) r1(y) r2(y) c2 r1(z) c1
If we assume that the read part of T1 contains the r1(y) and r1(z) operations and the read part of T2 contains r2(y), then EMV2PL accepts this history. Indeed, T2 is never blocked, since T2 does not access an item already locked by another transaction, and it can commit as soon as all its operations are performed. Moreover, since T1 reaches its lock point before T2, T1 cannot be blocked by the locks taken by T2 on y. 2V2PL does not accept this history: before committing, T2 must convert its w lock on y into a c lock. To do this, T2 must wait until T1 releases its r lock on y. Thus T2 must wait until T1 commits.

Example 5.2 Consider now two transactions T3 and T4 and the following history:

w3(x) r4(x) w4(y) c4 w3(z) c3
2V2PL accepts this history. Indeed, T4 can commit since it takes a w lock on an item that is not accessed by another transaction. On the contrary, EMV2PL does not accept this history: because T4 is not a W|R transaction, the r4(x) operation must take a lock on x; since x is locked by T3, T4 must wait until T3 commits. So far, the largest body of work on the enforcement of semantic integrity constraints has focused on efficient algorithms to detect whether a constraint is violated (e.g., [HI85], [BD95], [CW90]). The problem of optimizing the execution of multiple integrity checks within a transaction was first addressed in [BP79]. This paper compares the performance of different constraint checking policies (including the check-before-write and the write-then-check methods) and shows that the check-before-write method is the most efficient (in terms of transaction response time) because it avoids redundant computations and expensive rollbacks. However, all these works do not consider the possible concurrency between transactions. Very few research papers have addressed the problem of optimizing the throughput of concurrent transactions that perform integrity checks. The Commit_LSN method, proposed in [Moh90], is used (among other things) to avoid taking a lock when checking a referential integrity constraint. More precisely, no read lock is acquired on data items involved in a "no action" referential integrity constraint11 if (i) the constraint is satisfied, and (ii) a property of the Commit_LSN is verified. In contrast, our proposal is not limited to referential constraints (verified or not) but can be applied to any integrity check as long as it does not require new write locks. However, our method only applies to W|R transactions. In [Laf82], the author proposes to use semantic integrity dependencies between data items to improve the efficiency of constraint checking.
The checking of data items which "depend" on the data items currently updated by a transaction is delayed until other operations actually need them. This gives the possibility to perform these checks in parallel, thereby improving the concurrency of transactions. In the field of active databases, a related idea has been proposed in [DHL90], which consists of executing triggers in transactions separate from the triggering transaction. When a trigger is defined, a decoupled coupling mode can be chosen, indicating that the trigger is run in a separate transaction. This mode is subdivided into dependent decoupled, where the separate transaction is not spawned unless the triggering transaction commits, and independent decoupled, where the separate transaction is spawned regardless of whether the triggering transaction commits. As shown in [CJL91], defining triggers in a decoupled mode may greatly improve the throughput of concurrent transactions. However, these two methods require some semantic analysis in order to guarantee that no transaction sees a temporarily inconsistent database state. In contrast, our solution preserves the full isolation of all transactions.

11 Immediate or deferred.
6 Conclusions

EMV2PL is a simple yet efficient extension of MV2PL which enables a W|R transaction that has acquired all its write locks to (i) release its read locks, and (ii) execute new read operations on versions without taking locks. We proved the correctness of this protocol, and showed that its implementation only requires a few changes with respect to an existing implementation of MV2PL. Performance studies show that for workloads containing W|R transactions, EMV2PL can significantly improve the overall throughput of transactions (i.e., W|R and W transactions), with a relatively small utilization of versions. We then presented a specific, yet important, application of our protocol to the problem of integrity checking. We described various possible methods for implementing integrity checking. For "read-only" integrity checks, we showed that if the probability that a transaction violates integrity is small, then checking integrity at the end of transactions run under EMV2PL is the method that most often achieves the best total transaction throughput. Hence, (declarative) deferred checking generally offers better performance than immediate checking for "read-only" integrity checks, which in our view revives the interest of implementing deferred assertions and deferred triggers in relational database systems. We foresee two directions of future work. One is to extend our simulation experiments to handle non-uniform lock conflicts between data items. An interesting application of this is given by "summary tables", which consist of materialized views computed from base relations. Relations MaxRisk and MinRisk in the example of Section 2 are two examples of summary tables. These tables are quite frequent in decision support applications and in integrity checking. We plan to investigate the performance of EMV2PL in application scenarios where summary tables are read at the end of transactions.
Another direction is to compare in more depth the performance of immediate versus deferred checking, taking into account the respective overheads associated with these two methods.
Acknowledgments

We are grateful to Anthony Tomasic for his detailed comments, which enabled us to improve this paper. We also thank Francoise Fabret, Angelika Kotz-Dittrich, C. Mohan, and Dennis Shasha for constructive discussions about the paper.
References

[ACL87] R. Agrawal, M.J. Carey, and M. Livny. Concurrency control performance modeling: Alternatives and implications. ACM Transactions on Database Systems, 12(4):609-654, December 1987.
[AD] A. Kotz-Dittrich. Private communication.
[AK91] D. Agrawal and V. Krishnaswamy. Using multiversion data for non-interfering execution of write-only transactions. Proc. ACM SIGMOD Int. Conf. on Management of Data, Denver, Colorado, 20:98-107, May 1991.
[AS89] D. Agrawal and S. Sengupta. Modular synchronization in multiversion databases: Version control and concurrency control. Proc. ACM SIGMOD Int. Conf. on Management of Data, Portland, Oregon, pages 408-417, 1989.
[BBG+95] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, and P. O'Neil. A critique of ANSI SQL isolation levels. Proc. ACM SIGMOD Int. Conf. on Management of Data, San Jose, California, pages 1-8, May 1995.
[BC92] P. M. Bober and M. J. Carey. On mixing queries and transactions via multiversion locking. Proc. Int. Conf. on Data Engineering, Tempe, Arizona, pages 535-545, February 1992.
[BD95] V. Benzaken and A. Doucet. Themis: A database programming language handling integrity constraints. The International Journal on Very Large Data Bases, 4(3):493-518, July 1995.
[BHG87] P.A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987.
[BP79] D.Z. Badal and G.J. Popek. Cost and performance analysis of semantic integrity validation methods. Proc. ACM SIGMOD Int. Conf. on Management of Data, Boston, Mass., pages 109-115, 1979.
[Car83] M.J. Carey. Modeling and Evaluation of Database Concurrency Control Algorithms. PhD thesis, University of California, Berkeley, September 1983.
[CFL+82] A. Chan, S. Fox, W.K. Lin, A. Nori, and D.R. Ries. The implementation of an integrated concurrency control and recovery scheme. Proc. ACM SIGMOD Int. Conf. on Management of Data, Orlando, Florida, pages 184-191, June 1982.
[CJL91] M. J. Carey, R. Jauhari, and M. Livny. On transaction boundaries in active databases: A performance perspective. IEEE Transactions on Knowledge and Data Engineering, 3(3), September 1991.
[CKL90] M.J. Carey, S. Krishnamurthy, and M. Livny. Load control for locking: The 'half-and-half' approach. Proc. ACM Symposium on Principles of Database Systems, 1990.
[CM86] M. J. Carey and W. A. Muhanna. The performance of multiversion concurrency control algorithms. ACM Transactions on Computer Systems, 4(4):338-378, November 1986.
[Coc] B. Cochrane. Private communication.
[CW90] S. Ceri and J. Widom. Deriving production rules for constraint maintenance. Proc. of the 16th Int. Conf. on Very Large Data Bases, Brisbane, Australia, pages 566-577, 1990.
[DHL90] U. Dayal, M. Hsu, and R. Ladin. Organizing long-running activities with triggers and transactions. Proc. ACM SIGMOD Int. Conf. on Management of Data, Atlantic City, New Jersey, pages 204-214, May 1990.
[GR93] J. Gray and A. Reuter. Transaction Processing. Morgan Kaufmann, 1993.
[HI85] A. Hsu and T. Imielinski. Integrity checking for multiple updates. Proc. ACM SIGMOD Int. Conf. on Management of Data, Austin, Texas, May 1985.
[Hor92] B.M. Horowitz. A run-time execution model for referential integrity maintenance. Proc. of the Eighth Int. Conf. on Data Engineering, Tempe, Arizona, February 1992.
[Laf82] G.M. Lafue. Semantic integrity dependencies and delayed integrity checking. Proc. of the 8th Int. Conf. on Very Large Data Bases, Mexico City, Mexico, pages 292-299, 1982.
[Mel93] J. Melton, editor. (ISO/ANSI Working Draft) Database Language SQL3. Number ANSI X3H2-90-412 and ISO DBL-YOK 003, February 1993.
[MHLP91] C. Mohan, D. Haderle, B.G. Lindsay, and H. Pirahesh. ARIES: A transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Transactions on Database Systems, 17(1):94-162, 1992.
[Moh] C. Mohan. Private communication.
[Moh90] C. Mohan. Commit_LSN: A novel and simple method for reducing locking and latching in transaction processing systems. Proc. of the 16th Int. Conf. on Very Large Data Bases, Brisbane, Australia, pages 406-418, August 1990.
[MPL92] C. Mohan, H. Pirahesh, and R. Lorie. Efficient and flexible methods for transient versioning of records to avoid locking by read-only transactions. Proc. ACM SIGMOD Int. Conf. on Management of Data, San Diego, California, pages 124-133, June 1992.
[Sha] D. Shasha. Private communication.
[SLSV95] D. Shasha, F. Llirbat, E. Simon, and P. Valduriez. Transaction chopping: Algorithms and performance studies. ACM Transactions on Database Systems, 20(3), December 1995.
[Tho91] A. Thomasian. Performance limits of two-phase locking. Proc. Int. Conf. on Data Engineering, pages 426-435, April 1991.
A Proofs

In this section we formally prove the correctness of EMV2PL.
A.1 Preliminaries
We first introduce basic notations and definitions (see [BHG87] and [AS89]). A database is a set of objects. A transaction Ti is an ordered pair (i ;