Design and Performance of an Assertional Concurrency Control System

Arthur J. Bernstein, David S. Gerstl, Wai-Hong Leung, Philip M. Lewis
Department of Computer Science, State University of New York at Stony Brook, Stony Brook, NY 11794-4400 USA
{art, gerstl, wleung, [email protected]}


Abstract

Serializability has been widely accepted as the correctness criterion for databases subject to concurrent access. Serializable execution is generally implemented using a two-phase locking algorithm that locks items in the database to delay transactions that are in danger of performing in a non-serializable fashion. Such delays are unacceptable in high-performance database systems and in systems supporting long-running transactions. A number of models have been proposed in which transactions are decomposed into smaller, atomic, interleavable steps. A shortcoming of much of this work is that little guidance is provided as to how transactions should be decomposed and what interleavings preserve correct execution. We previously proposed a new correctness criterion, weaker than serializability, that guarantees that each transaction satisfies its specification. Based on that correctness criterion, we have designed and implemented a new concurrency control. Experiments using the new concurrency control demonstrate significant improvement in performance when lock contention is high.

1 Introduction

A number of authors have proposed methods for increasing the throughput of transaction processing systems by decomposing transactions into steps and releasing locks between steps. In this paper we propose a new type of concurrency control, the assertional concurrency control (ACC), which can be used to schedule the execution of steps in decomposed transactions so as to guarantee correct executions. We have implemented the ACC within the CA-Open Ingres™ database management system and experimentally evaluated its effectiveness using the TPC-C™ Benchmark Transactions.

* This paper is based upon work supported by NSF grant CCR-9402415. The authors would like to express their gratitude to Computer Associates International™ for the donation of the copy of CA-Open Ingres™ used in these experiments.
† Contact author: (516) 632-7835 / 8334 (fax)

Copyright 1998 IEEE. Published in the Proceedings of ICDE'98, February 1998 in Orlando, Florida. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966.


Our results show up to an 80% increase in performance when lock contention is high, long-running transactions are a part of the transaction suite, and sufficient system resources are present to support the additional concurrency that the ACC makes possible.

Serializability is the most stringent isolation level and has been widely accepted as the correctness criterion for transaction processing systems. It generally implies the use of a strict two-phase locking protocol in which locks are held for long periods of time, with a resulting degradation of performance. Performance can be improved through the use of weaker isolation levels, which are implemented using fewer and/or shorter-term locks, but correctness is not assured.

Another approach to improving performance is to decompose each transaction into a sequence of steps and release locks when a step terminates. The steps of concurrent transactions are interleaved at run time. Since locks are held for shorter periods of time, more concurrency is allowed. Step decomposition is done at design time. Hence, the approach applies to applications involving preprogrammed (canned) transactions having stable (unchanging) code. Long-running transactions that access frequently used data items (hot spots) and that are themselves invoked frequently are the prime candidates for decomposition. Other transactions need not be analyzed and can be executed as single steps.

While transaction decomposition has a great deal of potential for improving performance (as we demonstrate later in this paper), the resulting schedules are not serializable. In [5] we proposed a notion of correctness, weaker than serializability, called semantic correctness. Informally, a concurrent schedule of a set of transactions

is semantically correct if, when each transaction commits, its specifications are satisfied and, when the system quiesces, the database consistency constraint is true. We described a two-level concurrency control, called an assertional concurrency control (ACC), consisting of a dispatcher on the higher level and a two-phase locking concurrency control on the lower level, that controls step interleaving in such a way as to produce only semantically correct schedules. In this paper we present the design of a one-level ACC that produces semantically correct schedules and allows more concurrency than the two-level control.

We describe semantics formally so that they can be stated concisely and analyzed. Hence, we state the specification of each transaction using pre- and postconditions drawn from a formal language. Whether or not assertions are stated formally, however, the designer of any program, particularly a concurrent program, should have an informal notion of the state at certain critical points in the program. Any schedule that guarantees the truth of these assertions at run time is correct. The ACC provides this guarantee.

When transaction steps are interleaved, a transaction, T_i, might see uncommitted results produced by another transaction, T_k. We deal with the subsequent rollback of T_k using compensation. We utilize the semantics of the transactions to determine when intermediate results that might be reversed by compensation can be exposed to other transactions. Hence, the need for compensation limits step decomposition.

2 Previous Work

Considerable research has been done on the design of high-performance concurrency controls. Many of the proposals try to increase performance above that available with strict two-phase locking while retaining serializability as a correctness criterion. Some work uses the semantics of abstract objects [25, 13, 26, 3]. [22] demonstrates how a set of transactions can be analyzed to find a decomposition in which any interleaving of the decomposed transactions is equivalent to a serial execution of the transactions. Unfortunately, the serializability requirement places stringent limits on the decomposition. [17] uses a formal approach to derive a locking protocol that ensures serializability.

Other related work considers models that do not use serializability as a correctness criterion. A Saga [10] is a long-running transaction that is decomposed into steps. Since no restrictions are placed on how the steps of concurrent Sagas can be interleaved, it is implicitly assumed that each step preserves database integrity. This limits the extent to which the Saga can be decomposed. Other examples of schemes involving

decomposition, non-serializability, and compensation are discussed in [6]. In [6], correctness is generally defined in terms of restrictions on allowable schedules. A ConTract [24, 20] is a workflow in which a script is used to schedule various transactional steps, and the flow depends on assertions made about the system state. A related approach is taken in [1].

[21] uses semantics in a more formal way. The database is decomposed into a collection of atomic data sets (ADS), where the set of consistent database states is the Cartesian product of the consistent states of the ADS's. Transactions are decomposed into segments of code that access distinct ADS's serializably. The notion of correctness in [21] is related to the concept of predicate-wise serializability [15, 19]. In both cases the database consistency constraint is used to decompose the database into subsets, and transactions are required to be serializable (although not necessarily in the same order) in each subset.

[9] introduces a model in which transactions are grouped into sets, and the steps of transactions within the same set can be arbitrarily interleaved, while transactions in different sets must be completely isolated from each other. [16] generalizes this by introducing the notion of multi-level atomicity. Transactions are decomposed into steps and a hierarchical structure of allowable interleavings is established. The steps of transactions closely related in the hierarchy can be interleaved with each other while those distantly related cannot. The hierarchical nature implies that closeness is transitive. [8] notes that [9] and [16] are not general enough, and gives a more general framework where each breakpoint (interstep point) is associated with a set of transactions whose steps may interleave at that breakpoint. For implementation purposes the assumption is made that allowable interleavings are transitive.

There are two shortcomings of these proposals. They do not give any guidance in choosing the points at which transactions should be decomposed, and (primarily because of transitivity) the technique used for specifying allowable interleavings is not flexible. The work described here generalizes many of these proposals by presenting a non-transitive, table-driven algorithm that essentially enumerates the steps that can be interleaved between two successive steps of each transaction. The concurrency control that enforces the specification has been implemented and tested as described in subsequent sections. Equally important, we describe an analysis technique, introduced in [5], that bases the decomposition and the interleaving specification on the semantic correctness criterion,

thereby guaranteeing that when a transaction is executed it will satisfy its specifications. A recent work that takes an approach similar to that of [5] is [2], but the design described there has not been implemented.

3 The Design of an ACC

We first present the theoretical basis of an assertional concurrency control (based on [5]); then we describe a simplified version of the algorithm.

3.1 Semantic Correctness

The semantics of a transaction, T_i, can be formally characterized by the triple

{I} T_i {I ∧ Q_i}    (1)

where I, the precondition¹ of T_i, is the consistency constraint of the database and Q_i asserts that T_i has performed its intended function. For example, if T_i is a buy transaction that purchases n shares of some stock, Q_i might assert that sell orders for n shares have been deleted from the database, that the sales have been recorded in a ledger, and that when each share was bought, no cheaper unbought shares existed in the database. (1) can be regarded as a formal restatement of the specification of T_i. We can demonstrate that [an implementation of] T_i is correct by proving that (1) is a theorem using a formal system such as that of [14] (although this is not generally done).

We propose a new correctness criterion, as an alternative to serializability, for the concurrent execution of a set of transactions. A schedule is semantically correct if Q_i is true when each transaction, T_i, in the set terminates and I is true of the final database state² [5]. The condition is stated in terms of assertions and is weaker than serializability: any schedule that is serializable is semantically correct³, but semantic correctness allows schedules that result in states that could not have been reached in any serial schedule. For example, in the stock trading database above, in a semantically correct schedule two concurrent transactions, T_1 and T_2, could each buy some of its

¹ The term "transaction precondition" is used in two ways in the literature. We consider the precondition of a transaction to be a predicate that (a) the transaction requires of the state from which it executes, and (b) the transaction can assume to be true when it initiates. Thus the precondition of a withdraw transaction from a bank cannot assert that sufficient funds exist, since (b) does not hold. Rather, withdraw must operate correctly from a state where sufficient funds do not exist.
² In cases where the database is never quiescent, the definition can be modified so that semantically correct schedules have the property that at any point the abort of all running transactions (using the mechanisms of Section 3.4) would result in a consistent state.
³ We consider schedules that are conflict-equivalent to semantically correct schedules to be semantically correct themselves.

n shares at $30 and some at $31 per share, even though there are n shares initially available at $30, and hence in a serializable schedule one or the other of the two transactions would have bought all of its shares at $30. First T_1 buys n/2 shares at $30; then T_2 buys n/2 shares at $30; then, since there are no more shares available at $30, T_1 buys the rest of its shares at $31; then T_2 buys the rest of its shares at $31. Note that both transactions satisfy their postconditions, since when each share was bought, no cheaper unbought shares existed in the database. The resulting state of the ledger could not have been produced by a serializable schedule.

A proof of (1) can be abbreviated by an annotated program in which each [atomic] statement of T_i, s_{i,j}, is preceded by an assertion, pre(s_{i,j}), its precondition, describing the state of the system at the time s_{i,j} begins execution. Each assertion states some condition relating the values of items in T_i's workspace and in the database. An assertion pre(s_{i,j}) is active if the statement s_{i,j} is eligible to execute. If s_{i,j} is executed starting in a state where pre(s_{i,j}) is true, the next assertion, pre(s_{i,j+1}), will be true in the state when s_{i,j} terminates. Hence, if when each statement is executed its precondition is true, the postcondition of the transaction will be true when the transaction terminates.

The major issue with respect to concurrency is invalidation: if the execution of the statements of T_i and T_k are interleaved, and if s_{k,l} is executed when pre(s_{i,j}) is active and true, it might transform the state to one in which pre(s_{i,j}) is false. Thus pre(s_{i,j}) might not be true when s_{i,j} is initiated. If this occurs, we say that s_{k,l} has invalidated pre(s_{i,j}). One way to demonstrate that invalidation cannot occur is by proving [18]:

{pre(s_{i,j}) ∧ pre(s_{k,l})} s_{k,l} {pre(s_{i,j})}    (2)

for all s_{i,j} and s_{k,l}. (2) states that if s_{i,j} and s_{k,l} are eligible to execute (and thus the preconditions of these statements are true) and if s_{k,l} is executed next, the precondition of s_{i,j} will still be true when the execution of s_{k,l} terminates. If (2) cannot be demonstrated, we say that s_{k,l} interferes with pre(s_{i,j}), and there is a possibility of invalidation at run time. Since serializable executions produce results that are identical to serial executions, the issue of invalidation does not arise with two-phase locking. Instead of two-phase locking, we use the ACC to guarantee that the precondition of each statement is true when that statement is executed, and can thus ensure semantically correct schedules. We achieve correctness using two techniques:

First, at design time, we decompose each transaction, T_i, into a sequence of steps, S_{i,1}, S_{i,2}, ..., S_{i,n}, and consider those assertions in the proof of (1) that appear at step boundaries, the interstep assertions. We denote the pre- and postconditions of S_{i,j} as pre(S_{i,j}) and post(S_{i,j}) respectively (where post(S_{i,j-1}) ⇒ pre(S_{i,j})). Hence pre(S_{i,1}) ≡ I and post(S_{i,n}) ≡ post(T_i) ≡ (I ∧ Q_i). A proof of T_i can be abbreviated

{pre(S_{i,1})} S_{i,1} {pre(S_{i,2})} S_{i,2} ... S_{i,n} {post(S_{i,n})}    (3)

The goal of the decomposition is to choose steps in such a way that the number of interstep assertions that are interfered with by some transaction step is small. We refer to the interference remaining after decomposition as residual interference. Although it would be desirable to choose steps such that there is no residual interference, such a goal is generally not consistent with a decomposition having a reasonable granularity. In the limit, as step size increases, each transaction becomes a single step and residual interference disappears entirely. In the normal case, however, some residual interference remains.

The second technique is used at run time to eliminate invalidation. The ACC uses strict two-phase locking within each step to guarantee step isolation and atomicity, and thus any schedule produced by an ACC is equivalent to a serial schedule of steps. Hence, the only invalidation we have to be concerned with is the invalidation of interstep assertions that occurs because of the residual interference. To prevent this invalidation, the ACC controls step interleaving by effectively locking interstep assertions, and by doing so ensures that the precondition of a step is true when the step is initiated.

While the interstep assertions can be obtained from a formal proof of (1), such a formal proof is often not necessary. For example, the decomposition into steps is often done in such a way that each step performs a complete subtask and returns the data structure to a consistent state or to some other state for which the interstep assertions are intuitively evident.

The stronger the interstep assertions, the more likely they are to be interfered with. It is therefore important that the proof of T_i used at design time involve the weakest assertions that are sufficient to yield Q_i as a postcondition and guarantee database consistency when the system quiesces. This issue is discussed in [5], which describes a proof, called a maximally reduced proof, using weaker assertions than those in a proof of (1). pre(S_{i,1}) is weaker than I but is sufficiently strong to demonstrate Q_i as a postcondition and to

demonstrate that any conjunct of I that is temporarily made false by a step of T_i has been restored to true by a subsequent step. The maximally reduced proof of T_i can still be abbreviated as (3), with the meaning of each assertion adjusted accordingly.
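To make (3) concrete, consider the buy transaction of Section 3.1 decomposed into two steps, one per purchase price. The following annotated outline is our own illustration, not a decomposition given in the paper; R_i is a hypothetical interstep assertion:

{pre(S_{i,1})} S_{i,1} {R_i} S_{i,2} {I ∧ Q_i}

S_{i,1}: buy as many of the n shares as are available at the current lowest price
R_i: every share bought so far was cheapest when it was bought, and r_i ≥ 0 shares remain to be bought
S_{i,2}: buy the remaining r_i shares at the (new) lowest price

A step of a concurrent buy can raise the lowest available price between S_{i,1} and S_{i,2} without invalidating R_i, which is why the interleaving of T_1 and T_2 in Section 3.1 is semantically correct even though it is not serializable.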

3.2 The Assertional Concurrency Control

In an early but conceptually simple design [5], the ACC is decomposed into two levels, where the lower level implements a conventional two-phase locking algorithm and the higher level is responsible for dispatching steps to the lower level. The design-time analysis produces tables that (a) identify steps that interfere with each interstep assertion and (b) identify transaction prefixes that interfere with each conjunct of I. At run time the upper level of the ACC uses these tables to decide whether a step should be dispatched.

A major problem with a two-level ACC is that the identity of some database items referenced by a step or an assertion might not be known at design time, and hence assertional locking must be more conservative than necessary. For example, if step S_{i,j} has a precondition asserting account.bal ≥ $1000 for the account it is accessing, and step S_{k,l} executes the statement UPDATE accounts SET bal = bal - 100 WHERE ..., a two-level ACC would delay the execution of S_{k,l} until after S_{i,j} if it cannot be determined at design time that the two transactions will access different accounts. If the two transactions access distinct accounts, but the ACC delays S_{k,l}, then T_k has been delayed due to a false conflict.

We can eliminate many false conflicts by using information available at run time. In our example, if T_i and T_k access different accounts, the ACC can allow S_{k,l} to execute despite the interference. By integrating the two levels it is possible to construct an ACC that can make such a determination efficiently. This integration is accomplished by introducing a new lock mode called an assertional lock (in addition to shared and exclusive locks) and attaching assertional locks to database items, instead of locking the assertions themselves. Steps use conventional locks in a two-phase manner and hence are isolated and atomic. Assertional locks are used to ensure that when a step executes, its precondition is true. An assertional lock on the assertion pre(S_{i,j}) is denoted A(pre(S_{i,j})) and can be attached to any database item x that can be locked with a conventional lock. An A(pre(S_{i,j})) lock on x prevents the update of x by a step that interferes with pre(S_{i,j}). We say that the assertion pre(S_{i,j}) is locked when T_i holds A(pre(S_{i,j})) locks on all items referenced by pre(S_{i,j}). Since the only way for the

database to move from a state in which pre(S_{i,j}) is true to one in which it is false is for a transaction to modify an item that pre(S_{i,j}) references, the truth of pre(S_{i,j}) can be preserved by locking pre(S_{i,j}). As in the two-level ACC, interference between steps and assertions is determined at design time and is stored in interference tables, which can be efficiently accessed at run time by the ACC. Hence, the overhead of acquiring and releasing an assertional lock is comparable to that for conventional locks.

[2] uses a similar approach, and also ensures that the precondition of a step is true when the step is initiated. A major difference is that the ACC is more proactive, since it prevents the invalidation of any active assertion. [2] allows the invalidation of an active assertion if the programmer can prove that the assertion will eventually be restored. Assertional locks also resemble predicate locks [7]. Predicate locks, however, require run-time checking of predicate intersection to determine whether a conflict has occurred, whereas with assertional locks the interference analysis is done at design time, and only a table lookup is required at run time. The locking protocol derived in [17] is embedded in the application and enforces serializable execution. In contrast, the ACC provides locks within the concurrency control and enforces semantic correctness. Assertions are also used in [20], where they are attached to objects and are checked when the object is changed. In contrast to [20], the ACC determines the potential for invalidation at design time. Thus the ACC's run-time check consists of a simple table lookup. The ACC approach is more conservative, but much faster, especially when the constraint involves a number of items. Instead of our approach, the ACC could be built using a general-purpose tool for implementing advanced concurrency control models, such as CORD [12]. This would simplify system construction at the cost of reduced efficiency.
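The run-time check can be pictured as follows. This is a minimal sketch in Python of our own devising, assuming a design-time table INTERFERES keyed by (step, assertion) pairs; it is illustrative only and not the authors' implementation.

from dataclasses import dataclass, field

# Design-time interference table: does the step's write set interfere with
# the assertion? Entries are computed by the offline analysis; the names
# below are hypothetical.
INTERFERES = {
    ("S_k_l", "pre(S_i_j)"): True,
    ("S_m_2", "pre(S_i_j)"): False,
}

@dataclass
class Item:
    name: str
    a_locks: set = field(default_factory=set)  # assertions A-locked on this item

def write_allowed(step: str, item: Item) -> bool:
    """A step may write an item only if it interferes with none of the
    assertional locks attached to the item. Unknown pairs are treated as
    interfering, the conservative choice."""
    return all(not INTERFERES.get((step, a), True) for a in item.a_locks)

acct = Item("account_17", {"pre(S_i_j)"})
print(write_allowed("S_k_l", acct))  # False: the writer is delayed
print(write_allowed("S_m_2", acct))  # True: allowed despite the A-lock

Note that, unlike a predicate lock, the check never evaluates the assertion against item values; it is a single table lookup.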

3.3 Simplified One-Level ACC Algorithm

In this section we describe a version of the one-level ACC that has a number of inefficiencies, but is simple enough to be presented concisely. The implemented ACC is considerably more efficient and is briefly discussed at the end of this section. The primary concerns of the one-level ACC are to ensure that: (a) pre(S_{i,1}) is true when T_i initiates, and (b) pre(S_{i,j}), j > 1, is true when S_{i,j} is executed. The simplified ACC algorithm ensures this with four points, which we now summarize:

- Before initiating transaction T_i: request A(pre(S_{i,1})) locks on all items referenced by pre(S_{i,1}).
- Before initiating step S_{i,j}: unconditionally grant A(pre(S_{i,j+1})) locks on all items in pre(S_{i,j+1}).
- As a step executes: acquire conventional read and write locks.
- When a step S_{i,j} terminates: unconditionally release all conventional and A(pre(S_{i,j})) locks.

We interpret the points of this algorithm as follows:

Request A(pre(S_{i,1})) locks: To ensure (a), the ACC locks pre(S_{i,1}) on behalf of T_i. pre(S_{i,1}) is implied by I and can be false only if some active transaction has modified an item, x, referenced by pre(S_{i,1}). The ACC examines the concurrent locks on x to see if pre(S_{i,1}) might have been invalidated. If T_k holds an A(pre(S_{k,l})) lock on x, then T_k has completed or is about to complete executing the sequence of steps S_{k,1}, ..., S_{k,l-1}. It must be determined (through table lookup) whether this sequence (as a whole) interferes with pre(S_{i,1}). If so, pre(S_{i,1}) might not be true and S_{i,1} is delayed.

Grant A(pre(S_{i,j+1})) locks: When this request is made, pre(S_{i,j}) is already locked, and hence true of the database. It follows from (3) that when S_{i,j} completes, pre(S_{i,j+1}) will be true of the database. We can prove that any concurrently executing step, which might be serialized by the ACC between S_{i,j} and S_{i,j+1}, does not invalidate pre(S_{i,j+1}). Thus the ACC grants A(pre(S_{i,j+1})) locks on all items in pre(S_{i,j+1}).

Acquire conventional read and write locks: Conventional locks are dealt with in the usual way. In addition, if S_{i,j} requests a write lock on an item that is assertionally locked with an A(pre(S_{k,l})) lock, an interference table is consulted to see if S_{i,j} interferes with pre(S_{k,l}). If so, S_{i,j} is delayed. Note that the locking algorithm never checks the value of an item to see if it satisfies an assertion.

The implemented one-level ACC is significantly more efficient than the simplified ACC algorithm described above. The implemented algorithm acquires assertional locks on items dynamically at the time conventional locks are acquired. It thus reduces lock holding time and additional excursions through the locking code beyond those required for conventional locking. Additionally, the implemented algorithm uses assertions that are weaker than those that appear in a maximally reduced proof, and thus more concurrency is allowed. Details are contained in [4].

The ACC allows transactions to expose the values of variables they have accessed before committing. While the ACC guarantees that active assertions will be satisfied by these exposed intermediate values, it does not directly address the relationship between these values

and the real-world state modeled by the application. For example, some transactions might require that they read only committed data to operate correctly. Or some transactions that access multiple items might require that the values they read all correspond to the same snapshot of the database. We deal with these issues in [11], where we augment interstep assertions in order to restrict undesirable interleavings.

One concern about non-isolated execution is the effect it has on legacy and ad hoc transactions (which have not been analyzed as described). The ACC, however, requires such transactions to hold assertional locks on the items they access until commit. It uses these assertional locks to ensure that these transactions do not see the intermediate results of multi-step transactions and are thus completely isolated. Thus correct legacy and ad hoc transactions will continue to operate correctly in the presence of multi-step transactions.
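The four points can be collected into a toy lock manager. The sketch below, in Python, is our own paraphrase of the simplified algorithm under assumed names (interferes is the design-time table); the implemented ACC inside Open Ingres is substantially more efficient, as described above.

class Delay(Exception):
    """Raised where the real ACC would block the requesting step."""

class SimpleACC:
    def __init__(self, interferes):
        self.interferes = interferes  # (step, assertion) -> bool, built at design time
        self.a_locks = {}             # item -> set of assertions A-locked on it
        self.x_locks = {}             # item -> step holding a conventional write lock

    def lock_assertion(self, assertion, items):
        # Points 1 and 2: attach A(assertion) locks to every item the
        # assertion references. (Point 1's check of completed transaction
        # prefixes against pre(S_i_1) is omitted for brevity.)
        for item in items:
            self.a_locks.setdefault(item, set()).add(assertion)

    def write(self, step, item):
        # Point 3: conventional two-phase write locking within the step,
        # plus the assertional check against locks attached to the item.
        for assertion in self.a_locks.get(item, ()):
            if self.interferes(step, assertion):
                raise Delay(f"{step} would invalidate {assertion}")
        holder = self.x_locks.setdefault(item, step)
        if holder != step:
            raise Delay(f"{item} is write-locked by {holder}")

    def end_step(self, step, assertion):
        # Point 4: release the step's conventional locks and its own
        # precondition locks A(assertion); A-locks already granted for the
        # next step's precondition remain attached.
        self.x_locks = {i: s for i, s in self.x_locks.items() if s != step}
        for locked in self.a_locks.values():
            locked.discard(assertion)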

3.4 Recovery

Transaction rollback is a problem with any system not enforcing isolation. Rollback cannot be handled by simply restoring the before-values of database items that the rolled-back transaction has modified, since its [intermediate] results might have been exposed to another transaction that has already committed. In contrast to a transaction, a step is atomic and isolated, and therefore it can be aborted. Thus the rollback of a transaction starts with the abort of the current step. The subsequent corrective action depends on the cause of the rollback. To roll back a transaction that has executed an abort statement, or in the case of a system crash, compensating steps are used [9]. The compensating step CS_{i,j} compensates for the forward steps S_{i,1}, ..., S_{i,j} by "semantically undoing" the behavior of the forward steps. The coding of compensating steps is the responsibility of the transaction programmer. For each step S_{i,j}, the triple

{I} S_{i,1}, ..., S_{i,j}, CS_{i,j} {I ∧ Q_i}

must be a theorem. Thus (a) compensation must restore database consistency, and (b) the result of T_i must allow for the possibility of an unsuccessful execution (where S_{i,1}, ..., S_{i,j} is followed by CS_{i,j}). Compensating steps are treated in the same way as ordinary steps in determining allowable step interleavings.

A deadlock is detected by finding a cycle in a wait-for graph and aborting the step, S_{i,j}, that completes the deadlock cycle. If the transaction waiting for S_{i,j} is waiting because of a conventional lock, the deadlock is resolved. If the transaction waiting for S_{i,j} is waiting because of an assertional lock, the deadlock might still exist, because T_i holds assertional locks between steps. If the deadlock recurs when S_{i,j} restarts, the system will roll back T_i by executing CS_{i,j-1}. This introduces the possibility of an unrecoverable deadlock, which occurs when none of the transactions involved in the deadlock can proceed or compensate. To prevent unrecoverable deadlock, the ACC ensures that whenever a compensating step is involved in a deadlock, the deadlock can be resolved. To guarantee that when CS_{i,j} executes it will not have to wait because a concurrent transaction holds an assertional lock, we introduce a new type of assertional lock. Under the assumption that CS_{i,j} modifies only items that S_{i,1}, ..., S_{i,j} has already modified, S_{i,j} requests these new locks only on these items. When a compensating step CS_{i,j} completes a deadlock cycle, it is not itself aborted; rather, the ACC aborts all steps that are delaying it. Thus the deadlock is removed. A more complete discussion of this issue can be found in [4].
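The rollback flow just described can be outlined as follows; this is our own pseudocode-level rendering in Python, in which abort_step is supplied by the system and compensating_steps holds the programmer-written CS_{i,j}:

def rollback(txn, current_step, abort_step, compensating_steps):
    # A step is atomic and isolated, so aborting the current step using
    # before-images is always safe.
    abort_step(txn, current_step)
    j = current_step - 1            # forward steps S_1..S_j have committed
    if j > 0:
        cs = compensating_steps[j]  # CS_j semantically undoes S_1..S_j
        cs(txn)                     # must re-establish I and the
                                    # "unsuccessful execution" case of Q_i;
                                    # it runs as an ordinary step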

4 Example

As an example of decomposition and interference analysis, this section decomposes a simple order processing system, loosely based on the TPC-C benchmark [23], to illustrate several features of the concurrency control. The performance measurements in Section 5 were obtained with the actual benchmark transactions.

The database tables are organized as follows (key attributes are listed first):

orders(order_id, customer_id, number_of_distinct_items, price): Each tuple encodes an order. price contains the total price of all items in the order, and number_of_distinct_items contains the number of different item_ids ordered.

stock(item_id, s_level): Each item has a tuple that contains the quantity of stock that is available to fill incoming orders.

prices(item_id, price): Each item has a tuple that contains the unit price.

orderlines(order_id, item_id, ordered, filled): Each tuple encodes an item named in some order. The ordered field is the quantity ordered. The filled field is the quantity of the item that will be shipped for this order. A tuple with the same order_id must exist in the orders table. Tuples with the same item_id must exist in the stock and prices tables.

There is also a database variable current_order_number that acts as a counter.

We deal with only two types of transactions: new_order and bill. new_order (shown in Figure 1) enters an order into the system. Each order has a single tuple in the orders table and num_items tuples in the orderlines table. new_order is divided into steps. The first step consists of the entire execution until the first orderline is to be entered, and subsequent steps consist of the addition of a single orderline.

new_order(cust_id, num_items, items[], quant[]) {
    /* arrays hold item_id and quantity */
    STEP 1;                /* STEP BOUNDARY */
    get current_order_number into o_num and
        increment current_order_number;
    insert the new order into orders;
    foreach requested item:
        STEP 2;            /* STEP BOUNDARY (within loop) */
        get the lesser of requested and in-stock;
        update stock and insert the orderline;
}

Figure 1: new_order Transaction Algorithm

The second transaction is the bill transaction. bill totals the prices of the orderlines for an order, enters a value for price in the order record, prints a packing label, and bills the customer.

We deal with a single conjunct of I, I_1:

(∀o ∈ orders, ol ∈ orderlines) (o.number_of_distinct_items = |{ol | ol.order_id = o.order_id}|)

I_1 asserts that the number of tuples in orderlines with a specific order_id must be equal to the value of the number_of_distinct_items field in the corresponding tuple in orders. While I_1 applies to the entire database, it can be rewritten as conjuncts, one per order_id. We denote the conjunct of I_1 for order o_num as I_1^{o_num}. The partial execution of new_order interferes with I_1^{o_num}, where o_num is the index of the order being added. new_order requires only key constraints and foreign key constraints as a precondition, but not I_1. bill, on the other hand, requires I_1^{o_num} as a precondition, where o_num is the index of the order being shipped. Thus the analysis [below] demonstrates that instances of new_order can be arbitrarily interleaved, but bill cannot be interleaved between the steps of a new_order acting on the same order. The one-level ACC can enforce this.

The insert into orders invalidates I_1^{o_num}. The initial step of new_order could not add a tuple to orders if a tuple with o_num = order_id already existed, and thus no tuples exist in orderlines with o_num = order_id. Each iteration adds such a tuple, and thus the precondition to each iteration can be written

(∀ol ∈ orderlines) (i = |{ol | ol.order_id = o_num}|)

where i is the loop counter. Thus I_1^{o_num} is restored by the final iteration of the loop. The result of new_order can be encoded as:

[(∃o ∈ orders)
    (o.order_id = o_num ∧
     o.number_of_distinct_items = num_items ∧
     (∀i, 0 ≤ i < num_items, ∃ol ∈ orderlines)
        (ol.order_id = o_num ∧ ol.item_id = items[i] ∧ ol.ordered = quant[i]))]
∨
[(∀o ∈ orders) (o.order_id ≠ o_num) ∧ compensation was invoked]

The meaning of this result is that either (a) an order with order_id equal to o_num and number_of_distinct_items equal to num_items is set in the orders table and entries for each ordered item appear in the orderlines table, or (b) the order was compensated for and no order with order_id of o_num is in the orders table.

In the proof (3) of new_order, no interstep assertion is interfered with by any step of another instance of new_order. Thus the steps of instances of new_order can be allowed to interleave arbitrarily. These executions might produce states not realizable in serializable executions. For example, if T_i and T_k are concurrently executing instances of new_order, and both are ordering 10 units each of televisions and VCRs, then it is possible for T_i to have the order for televisions filled, but not the order for VCRs, and for T_k to have the order for VCRs filled, but not the order for televisions. This non-serializable execution is acceptable since the specification of each new_order is met and since the database ends in a consistent state.

An abbreviated proof (3) of bill, reflecting its result (omitted for brevity), can be written with I_1^{o_num} as a precondition. Thus bill need be delayed only when the corresponding new_order is executing. bill is a single step and thus does not require compensation. The compensation for new_order consists of returning any items in orderlines with order_id equal to o_num to stock and removing the relevant tuples from orders and orderlines.

The execution of compensation for an instance, T_i, of new_order might lead to a state that could not have been reached by the serial execution of the remaining transactions. For example, another instance, T_j, of new_order, executing between the forward steps and compensating step of T_i, might have requested an item that is returned to stock by the compensation. T_j might not get the item since it was out of stock, even though the item is in stock in the final state. This is acceptable since the resulting schedule is semantically correct. A more detailed analysis of this example is contained in [4].
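The design-time output for this example can be pictured as a small interference relation, parameterized by order number; that parameterization is what lets distinct orders proceed concurrently. The Python encoding below is our own illustration with hypothetical step and assertion names, not the paper's table format.

def interferes(step, assertion):
    kind, step_order = step              # e.g. ("new_order_prefix", 17)
    name, assertion_order = assertion    # e.g. ("I1", 17)
    if kind == "new_order_prefix" and name == "I1":
        # A partially executed new_order falsifies I1 only for its own order.
        return step_order == assertion_order
    # bill writes only the price field of orders, invalidating neither I1
    # nor the interstep assertions of new_order.
    return False

# bill A-locks I1 for the order it ships, so only a new_order on the same
# order is delayed:
print(interferes(("new_order_prefix", 17), ("I1", 17)))  # True
print(interferes(("new_order_prefix", 18), ("I1", 17)))  # False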

5 Experimental Results

A one-level ACC was implemented as a modification to the concurrency control provided in the CA-Open Ingres™ database system, and the resulting system was tested using the TPC-C™ Benchmark Transactions. The same load was applied to the unmodified Open Ingres system, and the performance of the two systems was compared. These experiments and their results are described in this section.

Open Ingres version 1.1 provides serializability as the default isolation level, using two-phase locking of tables and pages with intention locks. Indices are used to allow the system to use page locks as much as possible. An optimized version of a one-level ACC was implemented by adding assertion-mode locks to conventional locks within Open Ingres. The design is an optimized version of the ACC algorithm presented in Section 3.2. In addition to the events in that algorithm, before a step terminates the ACC stores an end-of-step record, used in crash recovery, in the log, and saves some of its work area in a database table for compensation. These actions represent overhead and are included in the measured results.

5.1 The TPC-C Benchmark

The TPC-C Benchmark (designed by the Transaction Processing Performance Council) is a popular benchmark for online transaction processing systems. The benchmark simulates a simple order processing system for a geographical area served by a set of warehouses, where each warehouse is divided into districts. An order is placed by a customer over a terminal connected to a particular warehouse. There are five transaction types, new-order, payment, delivery, stock-level, and order-status, of varying frequency, response time requirements, and data access characteristics. Serializable isolation is specified for all transaction types except one, which is allowed to run at the read committed level. The standard requires that 1% of new-order transactions abort. Details can be found in [23].

As an example of the interaction between transactions, consider new-order and payment transactions. Each tuple in the district table describes a district and contains a counter used to number the orders in that district. Orders are required to be consecutively numbered. Each new-order transaction increments the counter and hence must acquire a write lock on the tuple. This tuple also contains the year-to-date total payments for orders placed within the district, and hence the new-order and payment transactions conflict. This conflict can have a substantial impact on performance, since these two transaction types together are specified to constitute approximately 86% of the transaction mix. The design-time analysis is capable of recognizing that updates to the counter and

the year-to-date payment field do not interfere, and hence allows transactions of these two types, within the same district, to interleave.

Each transaction type within the TPC-C benchmark was analyzed and decomposed into steps. The decomposition and interference analysis was similar to that in Section 4, except that the TPC-C analysis was more involved, since the number of tables and transactions is greater and since the consistency constraint, I, has twelve components. Interference tables for the benchmark application were constructed at design time, based on the decomposition, the postcondition of each transaction type, and I. Eleven distinct forward step types were defined. In addition, compensating steps were implemented, including one for the new-order transaction, since the specification forces the aborts of new-orders to occur during the ordering of the final item.
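The counter versus year-to-date observation can be made concrete with the write sets of the two steps that touch a district tuple. The sketch below is our own encoding, using the TPC-C column names d_next_o_id (the order counter) and d_ytd (year-to-date payments); the paper does not give its table format, so treat this as illustrative.

# Field-level write sets for the steps that update a district tuple.
# Tuple- or page-granularity two-phase locking would serialize these steps;
# the design-time analysis sees disjoint write sets and no interference
# with the relevant assertions, and so lets them interleave.
WRITES = {
    "new_order_step1": {"district.d_next_o_id"},  # bump the order counter
    "payment_step1":   {"district.d_ytd"},        # add to year-to-date total
}

def steps_conflict(a: str, b: str) -> bool:
    return bool(WRITES[a] & WRITES[b])

assert not steps_conflict("new_order_step1", "payment_step1")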

5.2 Experiments

We expected the performance benefits of the ACC to be most obvious when lock contention was high, when the transaction suite contained long-running transactions, and when sufficient processors were available to take advantage of the added concurrency made possible by reducing lock delays. To study these effects, we identified a number of variables whose values could be used to parameterize the experiments, including:

- Degree of Concurrency: The level of concurrency is limited by the number of terminals connected to a warehouse. The greater the number of terminals, the greater the potential for lock contention.
- Hot Spots: A variety of distributions were used in selecting the actual arguments used by particular transactions. The degree to which these distributions are skewed determines the extent to which transactions access common data items.
- Lock Duration: Lock duration affects lock contention. We used two ways to vary lock time: adding compute time between successive SQL statements and increasing the number of items in an order. In both cases the duration of new-order and delivery transactions is increased.

In order to explore the lock contention problem, some of the results presented below involved experiments whose parameters violate the benchmark specifications, and by its nature the ACC decomposition of transactions violates the benchmark isolation requirements. These parameters include the number of items per order, the extent to which a distribution is skewed, and the introduction of compute time within a transaction. In addition, some experiments did not adhere to the restriction on response time. Thus, while the experiments demonstrate the improvement possible using the ACC in an order processing system, they are not compliant with the TPC-C specifications.

5.3 Experimental Results

The experiments were run with a single warehouse serving ten districts. Results are plotted as a function of the number of terminals connected to the warehouse. The ordinate is the ratio of the average response time over all transactions measured for the unmodified system to that measured for the ACC. Thus a value of 1.5 indicates that the ACC has improved performance by 50% as measured using the criterion of that graph. Experiments were run with three database servers.

Figure 2 shows that when concurrency is low, the ACC does not perform as well as the unmodified system, due to the added overhead of the ACC. The crossover in performance occurs at about 20 terminals. With 60 terminals, the average response time of the unmodified system is more than 40% greater than with the ACC. When the district distribution is skewed, creating hotspots in the district table, the effect is exaggerated, with the response time of the unmodified system 60% greater than with the ACC.

[Figure 2: The Effect of Hotspots. Total average response time, plotted as the ratio non-ACC/ACC against the number of terminals (0 to 60), for skewed and standard district distributions.]

Figure 3 summarizes the effect of adding several milliseconds of compute time between successive SQL statements in each transaction. The dotted curves in Figures 2 and 3 are essentially the same. The solid curve in Figure 3 demonstrates the effect of the added compute time. The response time of the unmodified system is more than 80% greater when compute time is introduced.

[Figure 3: The Effect of Transaction Duration. Total average response time, plotted as the ratio non-ACC/ACC against the number of terminals (0 to 60), with and without compute time.]

Not surprisingly, there is a negative correlation between response time and throughput at a given number of terminals. Figure 4 shows not only response time, but also transaction throughput, as measured using the ratio of the number of transactions completed in the unmodified system to the number completed with the ACC. Figure 4 demonstrates that not only does the response time decrease, but the throughput achieved by the ACC increases as the number of terminals increases.

[Figure 4: Response Time and Throughput. The response time and throughput ratios non-ACC/ACC against the number of terminals (0 to 60).]

Finally, a fourth set of experiments (not shown) determined the relationship between the number of database server processes and the performance of the system. As expected, with a single server, where the server is constantly servicing requests, the server is the bottleneck, and performance for the ACC is slightly lower than that for non-ACC. When multiple servers are active, and lock contention becomes the system bottleneck, the ACC performs as shown in Figures 2-4.

6 Conclusion

Our goal in this paper has been to demonstrate that the performance of a transaction processing system can be improved by decomposing the transactions into atomic, interleavable steps and then scheduling those steps with an assertional concurrency control to achieve semantic correctness. Semantic correctness is a new correctness criterion, weaker than serializability, that guarantees only that each transaction satisfies its specifications. We designed such an ACC and implemented it within the Open Ingres database management system. We then performed experiments using the TPC-C benchmark transactions, which we decomposed into steps according to our theory. The experiments demonstrated significant performance improvements, up to 80%, when lock contention is high, when long-running transactions are a part of the transaction suite, and when sufficient system resources are present to support the additional concurrency that the ACC makes possible.

References

[1] Agrawal, D., El Abbadi, A., and Singh, A. Consistency and orderability: Semantics-based correctness criteria for databases. ACM Transactions on Database Systems, 18(3):460-486, September 1993.
[2] Ammann, P., Jajodia, S., and Ray, I. Applying formal methods to semantic-based decomposition of transactions. ACM Transactions on Database Systems, pages 215-254, 1997.
[3] Badrinath, B. R. and Ramamritham, K. Semantics-based concurrency control: Beyond commutativity. ACM Transactions on Database Systems, 17(1):163-199, March 1992.
[4] Bernstein, A., Gerstl, D., Leung, W.H., and Lewis, P. Design and performance of an assertional concurrency control system. Technical Report 97/02, SUNY at Stony Brook, 1997.
[5] Bernstein, A. and Lewis, P. High performance transaction systems using transaction semantics. Distributed and Parallel Databases, 4(1):25-47, 1996.
[6] Elmagarmid, A., editor. Database Transaction Models for Advanced Applications. Morgan Kaufmann, 1992.
[7] Eswaran, K. P., Gray, J. N., Lorie, R. A., and Traiger, I. L. The notions of consistency and predicate locks in a database system. Communications of the Association for Computing Machinery, 19(11), November 1976.
[8] Farrag, A. A. and Ozsu, M. T. Using semantic knowledge of transactions to increase concurrency. ACM Transactions on Database Systems, 14(4):503-525, December 1989.
[9] Garcia-Molina, H. Using semantic knowledge for transaction processing in a distributed database. ACM Transactions on Database Systems, 8(2):186-213, June 1983.
[10] Garcia-Molina, H. and Salem, K. Sagas. In Proceedings of the ACM-SIGMOD International Conference on Management of Data, pages 249-259, 1987.
[11] Gerstl, D., Bernstein, A., and Lewis, P. Correct non-two-phase transactions. Technical Report 96/07, SUNY at Stony Brook, 1996.

[12] Heineman, G. and Kaiser, G. The CORD approach to extensible concurrency control. In International Conference on Data Engineering, pages 562-571, 1997.
[13] Herlihy, M. Apologizing versus asking permission: optimistic concurrency control for abstract data types. ACM Transactions on Database Systems, 15(1):96-124, March 1990.
[14] Hoare, C.A.R. An axiomatic basis for computer programming. Communications of the Association for Computing Machinery, 12(10), October 1969.
[15] Korth, H., Kim, W., and Bancilhon, F. On Long-Duration CAD Transactions. Information Sciences, 46:73-107, 1988.
[16] Lynch, N. Multilevel atomicity: a new correctness criterion for database concurrency control. ACM Transactions on Database Systems, 8(4):484-502, December 1983.
[17] McCurley, E. An Assertional Characterization of Serializability and Locking. PhD thesis, Cornell University, 1988.
[18] Owicki, S. and Gries, D. An axiomatic proof technique for parallel programs I. Acta Informatica, 6:319-340, 1976.
[19] Rastogi, R., Mehrotra, S., Breitbart, Y., Korth, H., and Silberschatz, A. On correctness of non-serializable executions. In ACM Principles of Database Systems, pages 97-108, May 1993.
[20] Schwenkreis, F. A formal approach to synchronize long-lived computations. In Proceedings of the 5th Australasian Conference on Information Systems, 1994.
[21] Sha, L., Lehoczky, J. P., and Jensen, E. D. Modular concurrency control and failure recovery. IEEE Transactions on Computers, 37(2):146-159, February 1988.
[22] Shasha, D., Llirbat, F., Simon, E., and Valduriez, P. Transaction chopping: Algorithms and performance studies. ACM Transactions on Database Systems, 20(3):325-363, September 1995.
[23] Transaction Processing Performance Council (TPC). TPC Benchmark™ C, Standard Specification, Revision 3.1, June 1996.
[24] Wachter, H. and Reuter, A. The ConTract model. In Elmagarmid, A., editor, Database Transaction Models for Advanced Applications, pages 220-263. Morgan Kaufmann Publishers, 1992.
[25] Weihl, W.E. Commutativity-based concurrency control for abstract data types. IEEE Transactions on Computers, 37(12):1488-1505, December 1988.
[26] Weikum, G. Principles and realization strategies of multilevel transaction management. ACM Transactions on Database Systems, 16(1):132-180, March 1991.
