Lock Inheritance in Nested Transactions*

Laurent Daynès, Olivier Gruber and Patrick Valduriez
Projet Rodin, INRIA, Rocquencourt, France
[email protected]

* This work has been funded by ESPRIT BRA project FIDE2.

Abstract

The flexibility of nested transactions is generally provided at the expense of a more complex locking mechanism which must deal with expensive lock inheritance. In this paper, we give a solution for efficient lock inheritance. Our solution does not change the original nested transaction model but revisits its locking rules using set-oriented semantics. This allows us to trade the cost of lock propagation at sub-transaction commit for a potentially more complex conflict detection. We then propose an efficient lock implementation which keeps the overhead of lock requests comparable to the traditional overhead in flat transactions. We conducted a number of comparative measurements in order to evaluate this trade-off. Our benchmarks show a reduction of 10% to 40% in the global time spent in lock operations, including lock requests and commits.

1 Introduction

The traditional flat transaction model has long been recognised as too limited for complex data-intensive application domains like engineering and office automation. These applications exhibit new requirements in terms of distribution, cooperation and long duration which call for more flexible transaction management. An important flexible transaction model is that of nested transactions, proposed by Moss [13], which has been the basis for many extensions [8, 20, 4, 21, 22]. A nested transaction is a tree of transactions with a top-level transaction as root and sub-transactions as leaves or intermediate nodes. A sub-transaction can either commit or abort, but its commitment takes effect only when the top-level transaction commits. Thus, the top-level transaction has the ACID (Atomicity, Consistency, Isolation, Durability) properties while the sub-transactions are only atomic and isolated. The main advantage of nested transactions is to allow partial transaction abort, which is particularly useful for long or complex transactions as well as to cope with reliable distributed computing. Another advantage is to allow safe parallelism through the parallel execution of sub-transactions. The two basic forms of parallelism are parent/child parallelism, which allows a child transaction to run concurrently with its parent, and sibling parallelism, which allows several children to run concurrently. Four different combinations may then be produced [8].

This flexibility is provided at the expense of a more complex logging and locking mechanism which can incur significant overhead. Such complexity and performance problems explain why there are very few implementations. Some object database systems have only partial implementations [1]. More complete implementations can be found in some distributed operating system projects like Argus [11], LOCUS [14], Eden [17], Clouds [2] and Camelot [3] (and its commercial version, Encina). The overhead of nested transactions is due to log and lock inheritance along the transaction hierarchy. Log inheritance provides the atomicity property of nested transactions, i.e., of entire transaction trees. Sub-transactions are not durable entities but they are atomic. To provide such atomicity, logging is necessary at the sub-transaction level. Thus, the atomicity of a nested transaction requires log records to be inherited upward; that is, the log records must be accessible to the parent transaction so that it is able to redo or undo the effects of a child transaction according to its own termination. Lock inheritance provides the isolation property of nested transactions. Locks may be inherited upward or downward along the transaction hierarchy. With upward inheritance, the locks of a committing transaction are obtained by its parent transaction. That way, locks are retained until the top-level transaction commits. Downward lock inheritance allows a parent transaction to grant its locks to its children. It permits the working sets of sub-transactions to overlap the working sets of their ancestors.

Although excellent solutions exist for efficient logging [12, 18] and fast commit of nested transactions [10, 16], this is not the case for efficient lock inheritance. In fact, we are not aware of any good solution. The major cost of lock inheritance, incurred by each lock request and each sub-transaction commit, grows linearly with the number of sub-transactions and locks. In this paper, we give a solution that reduces this cost. Existing implementations of nested transaction locking respect the original formulation of the nested locking rules [8] and extend the traditional implementation of a lock manager [6]. Our solution fully respects the original nested transaction model. However, we revisit the locking rules using set-oriented semantics. This allows us to trade the cost of lock propagation at sub-transaction commit for a potentially higher cost of conflict detection. We then propose an efficient implementation of locks whose overhead for requesting locks is comparable with that of a traditional implementation for flat transactions. We restrict our discussion to sibling parallelism for the sake of simplicity, although nothing precludes our solution from being used with parent/child parallelism.

This paper is organised as follows. Section 2 recalls the original (closed) nested transaction model with a set-oriented formulation of the conflict detection rules and exhibits the cost of lock inheritance. In Section 3, we present a new design for supporting lock inheritance. In Section 4, we introduce a new implementation of locks. In Section 5, we compare the performance of the two approaches based on our implementation of both lock managers. Section 6 concludes.

2 Traditional Lock Inheritance

In this section, we first recall the nested transaction model of Moss [13] and its traditional design. We use a set-oriented formalism which will be useful in the next section for presenting our new design for lock inheritance. We focus on locking-related issues, in particular conflict detection and scheduling rules (deadlock detection is basically the same as in [19] and is outside the scope of this paper). Then, we present the classic implementation of a lock manager for nested transactions [6, 7] in order to discuss the cost of lock inheritance. We assume that nested transactions are uniquely identified in the system. Furthermore, a transaction T is fully identified within its hierarchy: its position is represented as the set of its ancestor transactions.

Ancestors(T) = {T} ∪ Superiors(T)
Superiors(T) = Parent(T) ∪ Superiors(Parent(T))

Respectively, the descendants of a transaction T can be defined as:

Descendant(T) = {T} ∪ Inferiors(T)
Inferiors(T) = Children(T) ∪ ( ∪_{Ti ∈ Children(T)} Inferiors(Ti) )

Note that only non-completed (running) children of a given transaction T belong to the set Children(T). All sets are unordered and without duplicates.

From the isolation point of view, the effects of a transaction are only visible to its inferiors. When a transaction commits, its effects are considered as effects of its parent transaction. As a consequence, these effects become visible to the inferiors of the parent transaction. These visibility rules are supported through two distinct mechanisms. The visibility of the effects of ancestors is usually implemented through the conflict detection mechanism. To express the conflict detection rules easily, the transactions owning a lock are divided into two sets, defined next.



The read owner set, denoted Rowners(l), indicates the transactions holding a read lock on l; the write owner set, denoted Wowners(l), indicates the transactions holding a write lock on l. Given these two sets, the rules are the following (because we do not distinguish between held and retained locks, these rules apply with sibling parallelism only):

Conflict detection on a read lock request:

Wowners(l) ⊆ Ancestors(T)

A transaction T requesting a read lock does not conflict with the current lock settings if all write owners of the lock are ancestors of T.

Conflict detection on a write lock request (including the upgrade case):

(Wowners(l) ∪ Rowners(l)) ⊆ Ancestors(T)

A transaction T requesting a write lock does not conflict with the current lock settings if all owners of the lock are ancestors of T.

While these rules allow a transaction to see its ancestors' effects, they do not allow it to see the effects of its child transactions that have committed. To allow this, a second mechanism is introduced. Each transaction that commits propagates all its locks to its parent transaction, i.e., it loses the ownership of its locks, which become owned by its parent transaction. In order to propagate locks upward, transactions must record all the locks they own in a bookkeeping structure which we call a lockset. Each time a transaction T acquires a lock l, it adds it to its lockset. When T commits, its lockset is scanned and all its locks are propagated up to its parent transaction. This is equivalent to a commit in a flat transaction model, except that the operation performed on each lock is more complex. Table 1 shows the operation performed in the two cases of a sub-transaction commit and a top-level transaction commit. Remark that locksets are also necessary for aborting transactions.

Given the above design, we now move on to its implementation. Figure 1 illustrates the classic implementation as described in several excellent papers or books [6, 7, 3]. A Lock Control Block (LCB) contains information about a lock, such as the name of the lock and the (read or write) mode of the lock. A fixed-size hash table is used to speed up LCB lookups. An LCB also contains two queues of transactions. One queue contains both the Rowners and Wowners, which correspond to granted requests; the other queue contains pending transactions. Queues are doubly-linked lists of Lock Request Blocks (LRBs). Each requester of a lock is assigned an LRB, which contains information about the requester, such as the requested lock mode, its state (pending or granted) and other transaction information.



Sub-transaction commit:
  ∀ l ∈ lockset(T):
    if T ∈ Wowners(l):
      Wowners(l) ← (Wowners(l) − {T}) ∪ Parent(T)
      Rowners(l) ← Rowners(l) − {T}
    else:
      Rowners(l) ← (Rowners(l) − {T}) ∪ Parent(T)
  lockset(Parent(T)) ← lockset(Parent(T)) ∪ lockset(T)

Top-level transaction commit:
  ∀ l ∈ lockset(T):
    Wowners(l) ← Wowners(l) − {T}
    Rowners(l) ← Rowners(l) − {T}

Table 1: Commit Algorithms
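For concreteness, here is a minimal runnable sketch of these rules and of the Table 1 commit algorithms. The Python class and function names are our own illustration, not the paper's implementation (which is expressed in terms of LCBs and LRBs, described next):

    # A toy model of Moss-style nested locking with explicit owner sets.
    class Txn:
        def __init__(self, parent=None):
            self.parent = parent

        def ancestors(self):
            # Ancestors(T) = {T} ∪ Superiors(T)
            t, result = self, set()
            while t is not None:
                result.add(t)
                t = t.parent
            return result

    class Lock:
        def __init__(self):
            self.rowners, self.wowners = set(), set()

    def read_ok(lock, t):
        # Wowners(l) ⊆ Ancestors(T)
        return lock.wowners <= t.ancestors()

    def write_ok(lock, t):
        # (Wowners(l) ∪ Rowners(l)) ⊆ Ancestors(T)
        return (lock.wowners | lock.rowners) <= t.ancestors()

    def commit_sub(t, lockset):
        # Table 1, sub-transaction commit: propagate ownership to the parent.
        for l in lockset[t]:
            if t in l.wowners:
                l.wowners = (l.wowners - {t}) | {t.parent}
                l.rowners -= {t}
            else:
                l.rowners = (l.rowners - {t}) | {t.parent}
        # the parent also inherits the lockset itself
        lockset.setdefault(t.parent, set()).update(lockset.pop(t))

    def commit_top(t, lockset):
        # Table 1, top-level commit: simply release every lock.
        for l in lockset.pop(t):
            l.wowners.discard(t)
            l.rowners.discard(t)

Note how commit_sub visits every lock of the committing transaction; this per-lock scan is precisely the cost that the rest of the paper sets out to eliminate.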

[Figure 1: Lock Manager Implementation. A lock hash table chains Lock Control Blocks (LCBs); each LCB records the lock name and group mode and heads a read/write queue of granted requests and a pending queue, both chaining Lock Request Blocks (LRBs); each LRB records its requester's mode and transaction information; every transaction also chains its LRBs into its own lockset (T1 lockset, T2 lockset); free LCBs come from a free LCB pool.]

To set a lock on an object, the lock manager does the following. It computes a hash value for the name of the object to be locked; the name of an object is usually the object identifier. If there is no LCB for the object, it initialises an LCB for the new lock and attaches it to the hash bucket chain. Then, it scans the LRB chain to find out whether the requester already has an LRB. If none is found, it allocates a new LRB and chains it to the right LCB queue. It determines the right LCB queue based on whether or not there is a conflict between the requested mode and the lock settings. Conflict detection is performed while the LRB chain is scanned for the requester's LRB. Additionally, the lock manager chains each new LRB to the lockset of the requester.

Given this implementation, let us now look at the overhead of nested transactions from the point of view of locking. Nesting transactions makes LRB chains longer. Where a single LRB was sufficient in a flat transaction model, several LRBs may be needed if several transactions of a same hierarchy all request the same lock. This yields an obvious space overhead and increases the time for requesting locks. The added overhead upon lock requests stems from longer lists of LRBs, which make operations on lock owner sets potentially more expensive. This is true whether conflict detection is involved or not. When the group mode in the LCB does not conflict with the requested mode, e.g., read/read, conflict detection is equivalent to that of the flat model and therefore efficient. However, the LRB chain has to be scanned to check whether the requester already has an LRB in that chain (i.e., already owns the lock). When the requested mode and the lock group mode do conflict, the earlier rules for detecting conflicts apply, which requires the lock manager to scan the entire lists of LRBs representing lock owner sets and to check, for each LRB, whether the corresponding transaction is an ancestor of the lock requester.

The significance of the above cost depends on the nature of transaction hierarchies: the more transactions overlap and conflict, the more costly lock requests become. One could consider that transactions do not conflict that much and that pending queues are rare and short, thereby arguing that the above cost is negligible. However, we believe that nested transactions in complex application domains potentially overlap their sub-transactions, mixing reads and writes, which would lead to frequent applications of the above costly rules. This will happen if nested transactions are used for supporting multiple successive interactions with a server, each with the same working set. Again, this will be the case if sub-transactions are used for a fine grain of atomicity while working on a given working set.

In all cases, and whatever the nature of the transaction hierarchies, a significant cost factor exists when using nested transactions. Upon commits, both sub-transaction and top-level transaction commits, the entire locksets of committing transactions have to be scanned for propagating locks up (sub-transaction case) or for releasing locks (top-level case), as well as for re-scheduling pending transactions. These repetitive scans hurt both response time and throughput.
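The request path just described can be sketched as follows. Again the names are ours, and the compatibility test is passed in as a callback so the sketch stays independent of the nesting rules above:

    # A toy sketch of the classic hash-table lock request path.
    class LRB:
        def __init__(self, txn, mode):
            self.txn, self.mode, self.granted = txn, mode, True

    class LCB:
        def __init__(self, name):
            self.name = name
            self.owners = []       # read/write queue: granted LRBs
            self.pending = []      # pending queue

    class LockManager:
        def __init__(self, nbuckets=1024):
            self.buckets = [[] for _ in range(nbuckets)]
            self.locksets = {}     # txn -> list of its LRBs

        def _find_lcb(self, name):
            chain = self.buckets[hash(name) % len(self.buckets)]
            for lcb in chain:
                if lcb.name == name:
                    return lcb
            lcb = LCB(name)        # no LCB yet: initialise one and
            chain.append(lcb)      # attach it to the hash bucket chain
            return lcb

        def request(self, txn, name, mode, compatible):
            lcb = self._find_lcb(name)
            for lrb in lcb.owners:        # scan: does the requester
                if lrb.txn is txn:        # already own this lock?
                    return lrb
            lrb = LRB(txn, mode)
            # conflict detection scans the same chain of owner LRBs
            if all(compatible(o.txn, o.mode, txn, mode) for o in lcb.owners):
                lcb.owners.append(lrb)
            else:
                lrb.granted = False
                lcb.pending.append(lrb)
            self.locksets.setdefault(txn, []).append(lrb)
            return lrb

Both loops in request() walk the owner chain, which is why its cost grows with the number of transactions of a hierarchy holding the same lock.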

3 Lock Inheritance Revisited

In this section, we present a new design for lock inheritance in nested transactions. The basic idea is to trade the scanning of locksets at sub-transaction commit for a potentially more costly conflict detection. Eliminating lock propagation at sub-transaction commit obviously improves both response time and throughput. Of course, this is fine as long as we do not lose much upon lock requests because of the more costly conflict detection. Here, we present our new design and postpone the discussion of the equilibrium of this trade-off to the next section.

The roots of our new design can be found in Moss's original design, although they have not been fully exploited. Looking at the conflict detection rules, one can realize that a transaction T is said to be non-conflicting with a set of transactions, this set happening to be the ancestors of T. However, if we lift this constraint of being an ancestor, a transaction can be considered as non-conflicting with an arbitrary set of transactions. Intuitively, this allows us to declare a transaction as non-conflicting with the transactions whose effects should be visible to it. For instance, a transaction could be declared non-conflicting with its ancestors, making the ancestors' effects visible. Furthermore, a transaction could be declared non-conflicting with its committed sub-transactions, making the effects of these committed sub-transactions visible.

More formally, each transaction manages two transaction identity sets: an Upward Inherited Identity Set (noted UIIS) and a Downward Inherited Identity Set (noted DIIS). The UIIS set of a transaction T holds the identities of the inferior transactions whose effects should be visible. The DIIS set of a transaction T represents the identities of all other transactions whose effects should be visible through the ancestors of T. In particular, the DIIS includes the ancestors and committed siblings of T. These sets are formally defined as follows for a transaction T:

UIIS(T) = ∪_{Ti ∈ Children(T)} ({Ti} ∪ UIIS(Ti))
DIIS(T) = Parent(T) ∪ UIIS(Parent(T)) ∪ DIIS(Parent(T))

where the Ti transactions are the committed sub-transactions of T.

The UIIS and DIIS sets have to be dynamically maintained. For a transaction T, these sets are initialized when the transaction is created and must be updated whenever a sub-transaction of T's hierarchy commits. That is, when a transaction T' commits, the UIIS of its parent transaction P must be augmented with T'. Furthermore, the DIIS of the other sub-transactions of P must also be augmented with T'. In other words, one can see the dynamic updates of the DIIS and UIIS sets as flows of identities of committed transactions along hierarchies. Figure 2 shows these flows of identities.


[Figure 2: Transaction Identity Flows upon Commit. Step 1: T has started Tp and Tq; Tq has started Tr and Ts; O's lock has Rowners = {} and Wowners = {Tp}. Step 2: after Tp commits, the owner sets of O's lock are unchanged, UIIS(T) = {Tp}, and Tp appears in the DIISs of Tq, Tr and Ts.]

On the left-hand side, we can see that transaction T started two sub-transactions, Tp and Tq, in parallel. Transaction Tp took an exclusive lock on an object O. Transaction Tq started two sub-transactions, Tr and Ts. The states of the UIISs and DIISs express that only the effects of ancestors are visible for now. On the right-hand side, we can see the effects of committing Tp. First, the owner sets of O's lock are left unchanged because locks are not propagated up to the parent transaction; thus, the owner of O's lock in exclusive mode is still Tp, now in a committed state. Second, the UIISs and DIISs are updated to reflect the new visibility requirements. The algorithmic view of these UIIS and DIIS updates is given in Table 2.

We distinguish between the DIISs and UIISs because of transaction aborts. When a transaction aborts, it must release all the locks it owns; this includes the locks it requested as well as the locks it inherited from its sub-transactions. In other words, an aborting transaction should, on the one hand, abort its running sub-transactions and, on the other hand, remove from the owner sets of all the locks it owns its own identity and the identities of its committed sub-transactions. These identities are exactly those in the UIIS of the aborting transaction. The DIIS corresponds to identities obtained from ancestors and should not be removed from lock owner sets.

Sub-transaction's commit:
  UIIS(Parent(T)) ← UIIS(Parent(T)) ∪ {T} ∪ UIIS(T)
  ∀ Ti ∈ Descendant(Parent(T)): DIIS(Ti) ← DIIS(Ti) ∪ UIIS(Parent(T))
  lockset(Parent(T)) ← lockset(Parent(T)) ∪ lockset(T)

Table 2: UIIS and DIIS Updates upon Commits

Given the UIIS and DIIS sets, one has to extend the conflict detection rules given earlier. Those rules expressed that a transaction was non-conflicting with its ancestors only. Now, a transaction must be non-conflicting with all the transactions belonging to its DIIS and UIIS. More formally, the rules become:

Conflict detection on a read lock request:

Wowners(l) ⊆ ({T} ∪ DIIS(T) ∪ UIIS(T))

Conflict detection on a write lock request (including the upgrade case):

(Wowners(l) ∪ Rowners(l)) ⊆ ({T} ∪ DIIS(T) ∪ UIIS(T))

The trade-off of our new design should now be clear. On the one hand, it makes conflict detection potentially more costly since more sets are involved and their cardinality may be higher. Indeed, the owner sets retain the identities of committed and running sub-transactions until the commit of the top-level transaction. This makes owner sets larger than in a traditional design, where owner sets only contain the identities of running transactions. On the other hand, it avoids the scans of locksets at sub-transaction commits and propagates transaction identities instead. Propagating the identities of committing transactions instead of their locks is much better in most cases because we usually have:

| Descendant(Parent(T)) | ≪ | lockset(T) |

Thus, the overall performance question turns out to be: is there a locking implementation that can support well the set-oriented operations needed for detecting conflicts? If the answer is yes, then the gain is obvious because sub-transaction commits are a lot cheaper and lock requests are not significantly slower. Otherwise, we may be trading a faster sub-transaction commit for slower lock requests, which may be either a good or a bad deal depending on the application. Before we answer that question in the next section, we need to clarify one design point.

The point relates to the scheduling of pending transactions, which was done as a side effect of propagating locks up in the traditional design. Since this scan no longer exists, one can wonder how rescheduling is achieved. Two cases have to be considered, namely sub-transaction commits and top-level commits. When a sub-transaction T commits, the only sub-transactions that could be re-scheduled are the siblings of T or the inferiors of these siblings. These potentially schedulable sub-transactions are exactly those to which the identity of T is propagated. Thus, pending transactions can be rescheduled in our new design as a side effect of updating the UIISs and DIISs. When a top-level transaction commits, locksets have to be scanned for releasing the locks, that is, effectively cleaning the lock structures (LRBs) associated with the committing hierarchy. Scheduling is achieved as a side effect of this scan, as it was done before.
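The identity flows of Table 2 and the extended rules can be sketched as follows (our naming; the DIIS update deliberately skips the parent itself, matching the text's "other sub-transactions"):

    # A toy sketch of the revisited design: commits propagate identities,
    # not locks.
    class Txn:
        def __init__(self, parent=None):
            self.parent = parent
            self.children = set()           # running children only
            self.uiis = set()               # committed inferiors
            # DIIS(T) = Parent(T) ∪ UIIS(Parent(T)) ∪ DIIS(Parent(T))
            self.diis = ({parent} | parent.uiis | parent.diis
                         if parent else set())
            if parent:
                parent.children.add(self)

        def descendants(self):
            result = {self}
            for c in self.children:
                result |= c.descendants()
            return result

    class Lock:
        def __init__(self):
            self.rowners, self.wowners = set(), set()

    def visible(t):
        return {t} | t.diis | t.uiis

    def read_ok(lock, t):
        # Wowners(l) ⊆ {T} ∪ DIIS(T) ∪ UIIS(T)
        return lock.wowners <= visible(t)

    def write_ok(lock, t):
        return (lock.wowners | lock.rowners) <= visible(t)

    def commit_sub(t, lockset):
        p = t.parent
        p.children.discard(t)
        p.uiis |= {t} | t.uiis          # UIIS(P) ← UIIS(P) ∪ {T} ∪ UIIS(T)
        for ti in p.descendants() - {p}:  # flow identities downward
            ti.diis |= p.uiis
        # locksets are merged for eventual release, but no lock is touched
        lockset.setdefault(p, set()).update(lockset.pop(t, set()))

Compared with the commit_sub of the first sketch, this one never visits a lock: it touches only the identity sets of the transactions of the hierarchy.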

4 Lock Manager Implementation

We now propose a new implementation of locks to support our design, that is, to support the added complexity upon lock requests. We first sketch a simplified view of a transaction processing system as an architectural basis and then introduce our new implementation of locks.

4.1 Architectural Model

We use the architecture described in [6] as a basis. Basically, only the lock implementation needs to be impacted; however, recalling the global architecture will prove necessary later on. The architecture fits in a distributed environment including workstations and various kinds of servers, such as multiprocessors. Despite distribution, the system provides a shared data space on which user computations run. The shared data space is supported through a set of data managers. Data managers represent a physical or logical fragmentation of the shared space. Examples of data managers are local database managers in a distributed database [15] or file systems in a transaction processing system. A data manager is local to a network node but several data managers can be mapped onto the same node.

User computations are structured as hierarchies of transactions that are mapped onto data managers. Each transaction runs on a single data manager, that is, it manipulates data from only one data manager. A data manager can execute multiple transactions from different hierarchies. In addition, transactions can be nested within a single data manager for supporting a finer grain of atomicity or parallelism. There is no assumption of co-location between transactions and their data managers, nor on how transactions access data within their associated data manager. For instance, transactions could be running at the network node of their data manager, or they could be running as clients of a remote data server.

Transactions request their locks from the lock managers associated with their corresponding data manager. A simple architecture associates a single lock manager with a data manager. However, nothing precludes multiple lock managers per data manager, where the different lock managers provide different functionalities. Typically, one may provide complex hierarchical locking modes to cope with multi-granularity locking protocols while another lock manager may provide highly optimized shared/exclusive locks. To illustrate this, let us assume that shared data are structured using objects and collections of objects (or tuples and relational tables). Moreover, let us assume that collections are fragmented and therefore span multiple data managers. In this context, data managers will share a unique lock manager for locking at the collection level using all the traditional intention locking modes, while having their own lock manager for efficient read/write locking at the object level. Our new design and new implementation of locks obviously concern the object-level lock manager, since they only provide efficient read/write locking modes.

A lock manager typically structures the locks it manages into lock spaces, which are disjoint sets of locks. Each lock is associated with a lock name which identifies the resource it locks. Lock names are unique within lock spaces. To request a lock on a resource, a transaction must therefore know the lock manager in charge of the resource to be locked, the lock name of the resource, and finally the lock space in which that lock name is meaningful. For performance reasons, a transaction must register itself with a lock manager before it can request a lock from that lock manager. Registration enables lock managers to assign a local identity to all the transactions they will be dealing with. In other words, lock managers maintain a mapping between system-wide transaction identifiers and the local transaction identifiers they use in their lock structures. The reason lies in the obvious reduction in the size of identifiers, which saves both storage and processing time. This is especially true in a nested and distributed context where transactions may have a large system-wide identity.

A lock manager is usually unaware of transaction relationships, which makes it a passive component with respect to transaction completions. It is the transaction manager which is responsible for the completion (commit or abort) of transactions, using a distributed protocol over the concerned lock managers. This point is considered a tough issue in the literature, where various protocols have been described. The solutions range from two-phase commit at all levels of transaction hierarchies to lazy propagation of information with query mechanisms for finding out the actual state of remote transactions, i.e., committed or aborted. Two-phase commit at all levels was advocated for generality and simplicity in Eden [17] and an earlier version of Clouds [9], but it proved costly. Lazy propagation of information, used in Argus [10], provides better performance but it does not cope very well with intra-hierarchy conflicts. Since the optimisations of nested locking we propose are orthogonal to which commit protocol is used, we do not detail this any further.
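As a small illustration of the registration step (the names and identifier shapes below are hypothetical, not the paper's API), a lock manager can map large system-wide transaction identifiers to compact local ones:

    # A toy registration table mapping system-wide ids to local ids.
    class Registry:
        def __init__(self):
            self.local = {}

        def register(self, system_wide_id):
            # first registration assigns the next compact local identity
            return self.local.setdefault(system_wide_id, len(self.local))

    reg = Registry()
    lid = reg.register(("node-7", "hierarchy-3", "txn-12"))   # returns 0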

4.2 A New Lock Implementation

To reduce the overhead of lock requests, it is critical to have a representation of the owner sets that makes their manipulation efficient. Our goal is to come up with an implementation that supports nested locking with roughly the same performance as flat locking. If we succeed, we will have avoided the overhead of scanning locksets at commit time without losing performance upon lock requests. We present our solution in two steps, starting from the traditional implementation of locks presented earlier.

The first step partly overcomes the time overhead of requesting locks when transactions are nested. Recall that a lock is composed of a Lock Control Block (LCB) that chains Lock Request Blocks (LRBs), which represent the transaction requests on that lock. Given that representation, the earlier complex rules for detecting conflicts must be applied in all situations. However, in many cases, a much cheaper conflict detection can be applied. The idea is to treat conflicts between hierarchies and conflicts within hierarchies in different ways. Conflicts between hierarchies can be solved as they were in a flat context: when conflicts occur between transactions belonging to different hierarchies, one can remark that the earlier conflict detection rules become equivalent to the well-known read/write compatibility rules. Therefore, if we knew the requested modes per hierarchy and not only per transaction, we could apply the simple conflict detection used in lock managers for flat transactions, which is cheap. A simple modification in the representation of owner sets allows us to do this. Instead of representing owner sets as flat lists of LRBs, we can structure them according to hierarchies. LCBs chain Hierarchy Lock Request Blocks (HLRBs), one per hierarchy. Each HLRB chains all the LRBs belonging to its corresponding hierarchy. Moreover, each HLRB contains the most restrictive mode among the requested modes of the LRBs it chains. This is shown in Figure 3.

We believe this first optimization to be insufficient because we believe that intra-hierarchy conflicts are likely. When intra-hierarchy conflicts do arise, the earlier rules have to be applied, which may prove costly. Our second step optimizes this specific case. The problem with the above implementation is that owner sets are implemented as linked lists, which is inefficient for supporting sets and set-oriented operations such as intersection or inclusion.


[Figure 3: Lock Manager Architecture. LCBs in the lock hash table chain one HLRB per hierarchy; each HLRB carries the most restrictive mode of the LRBs it chains, e.g., an HLRB(W) chaining LRB(R), LRB(R) and LRB(W).]

We advocate the use of bitmaps to represent and manipulate sets efficiently. If sets are small, bitmaps incur a very small space overhead and efficiently support the set-oriented operations we are interested in through logical operators such as AND, NOT and OR. In this new approach, each lock manager manages a bitmap structure per hierarchy it knows. Each transaction that registers at a lock manager is allocated a bit number within the bitmap of its hierarchy. In other words, each allocated bit in a bitmap structure uniquely identifies a transaction of the corresponding hierarchy. Notice that transactions from different hierarchies may be allocated an identical bit number at a same lock manager or at different lock managers, but within two different bitmap structures.

Given a bitmap structure per hierarchy, the lock implementation becomes the following. LCBs still chain HLRBs, which still correspond to hierarchies of transactions. However, HLRBs no longer chain LRBs; instead they use bitmaps. Each HLRB contains two bitmaps, one for keeping track of read requests and the other for keeping track of write requests. All HLRBs of a hierarchy share the same bitmap structure, that is, the i-th bit always identifies the same transaction. In these bitmaps, bits set to 1 indicate the transactions that currently own the lock. This is shown in Figure 4.

Given that new implementation of a lock, the interesting point is that the rest of a traditional lock manager design and implementation is left untouched. Locks of a lock space are still managed through a hash table whose key is the lock name. The hash table buckets still chain the LCBs, which in turn chain HLRBs. Pending queues are still managed at the LCB level. Only one minor part of the lock manager is impacted by our new lock implementation: locksets. In the traditional approach, locksets of transactions are implemented by chaining LRBs on a per-transaction basis. Given our bitmap-based implementation, locksets can no longer be chains of LRBs since locks do not contain LRBs anymore.


[Figure 4: Lock Implementation Based on Bitmaps. Each HLRB holds a read bitmap and a write bitmap (e.g., R = 00101, W = 00110) over a hierarchy of actions A1..A5; indices are bit numbers identifying the owning transactions.]

We propose that each transaction manages a stack that points to the HLRBs where its corresponding bit is set to 1. These stacks enable transactions to abort by selectively removing the bits of their UIIS from the right HLRB bitmaps. This lockset support is shown in Figure 5. Committing or aborting top-level transactions is in no way more complicated than before: it suffices to remove all the HLRBs of the corresponding hierarchy from the LCB chains. Finding all the HLRBs of a hierarchy is easy because each hierarchy has its own HLRB allocator, which can be scanned for that purpose. Each hierarchy has its own HLRB allocator to reduce latching and contention costs, as advocated in [5].

Our new implementation of locks seems to solve the problems introduced by nesting. However, this is only true if bitmaps can be kept small, that is, owner sets have to be kept small. Otherwise, using bitmaps could become time-consuming and/or space-consuming. Fortunately, the owner sets that are managed in locks are likely to be small in the architecture presented earlier. The reason lies in the fragmentation of the architecture. First, the number of transactions to be named at a lock manager is much smaller than the total number of transactions in the entire system. Second, transactions are named on a per-hierarchy basis at each lock manager; recall that bit numbers are allocated on a per-hierarchy basis. Therefore, the number of bits needed in HLRB bitmaps is likely small. We believe that bitmaps of 128 bits will be sufficient for most hierarchies.

However, it is still possible for bitmaps to overflow in some rare situations. To be honest, our design increases that probability because it makes the identities of all transactions (committed and running ones) accumulate within owner sets.


[Figure 5: Lockset Support. Transactions (A1, A2) of a hierarchy keep locksets pointing to the HLRBs where their bit is set.]

A first straightforward solution would be to abort the hierarchy whose bitmap overflows. A better way is to avoid the overflow crisis through a background cleaning of bitmaps. The idea is simply to free bits of committed transactions before bitmaps overflow. Bits are retained in owner sets for preserving the nested isolation of hierarchies and therefore cannot simply be freed. However, all the bits of a committed subtree of transactions can be replaced in locks by the unique bit of the parent transaction of that subtree. These bits to be replaced are exactly the UIIS of the transaction rooting the committed subtree. Doing this respects the isolation property of transactions and frees bit numbers so they can be reused, thereby avoiding bitmap overflows. It is in some sense a late propagation of locks up to the parent transaction. Given a committed subtree of transactions, the locks to be updated can easily be found through the locksets of the committed transactions. Background cleaning is still better than propagating locks at each sub-transaction commit. First, it is rarely done. Second, if it does happen, it clears multiple bits at once, while the earlier approach required one scan to clean one identity. Third, the cleaning operation is cheaper since it only consists in masking bitmaps, while the earlier approach required manipulations of doubly-linked lists.

To summarize, we have presented a new lock implementation which should be more efficient than the traditional one. We first structured the lists of LRBs according to hierarchies to limit the application of the complex rules for detecting conflicts, thereby allowing most conflicts to be solved as efficiently as in a lock manager for flat transactions. Then, we proposed to use bitmaps to represent owner sets in locks. Bitmaps only need to be small, and therefore incur a small space overhead, while efficiently supporting the set-oriented operations needed for detecting conflicts. Bitmaps can remain small because bit numbers only name a fragment of the total number of transactions in the system. Our implementation has two more advantages. First, it enhances the re-use of locks, that is, when a lock is requested several times in read mode. In the traditional design, for every compatible read request, the chain of LRBs has to be scanned to find out whether the lock is already owned by the requester; structuring the LRB chains into HLRB chains reduces their length, thereby speeding up the lock re-use case. Second, our implementation of locks factorizes allocation costs. In the traditional implementation, an LRB has to be allocated for every single lock request. In our approach, the allocation of an HLRB is done only once for each hierarchy of transactions. Hence, whenever several transactions of a same hierarchy overlap their working sets, our scheme gains.
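A minimal sketch of the bitmap-based HLRB, assuming (our choice) that Python integers stand in for the fixed-size bitmaps; the subset tests of the conflict rules become AND/NOT masking, and background cleaning collapses a committed subtree's bits into its parent's bit:

    # A toy HLRB with per-hierarchy read and write bitmaps.
    class Hierarchy:
        def __init__(self):
            self.next_bit = 0

        def register(self):
            # allocate a hierarchy-local bit number for a transaction
            bit = self.next_bit
            self.next_bit += 1
            return bit

    class HLRB:
        def __init__(self):
            self.read_bits = 0    # bitmap of read owners
            self.write_bits = 0   # bitmap of write owners

    def read_ok(hlrb, visible_bits):
        # Wowners ⊆ visible set, i.e. no write owner outside it
        return hlrb.write_bits & ~visible_bits == 0

    def write_ok(hlrb, visible_bits):
        return (hlrb.write_bits | hlrb.read_bits) & ~visible_bits == 0

    def clean_subtree(hlrbs, parent_bit, uiis_bits):
        # background cleaning: replace the bits of a committed subtree
        # (its UIIS) by the single bit of the rooting parent transaction
        for h in hlrbs:
            if h.read_bits & uiis_bits:
                h.read_bits = (h.read_bits & ~uiis_bits) | (1 << parent_bit)
            if h.write_bits & uiis_bits:
                h.write_bits = (h.write_bits & ~uiis_bits) | (1 << parent_bit)

Here visible_bits is the bitmap of {T} ∪ DIIS(T) ∪ UIIS(T) restricted to the transactions of the HLRB's own hierarchy; HLRBs of other hierarchies are checked with the plain flat read/write compatibility rule instead.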

5 Performance Measurements

We now discuss the performance of our nested locking scheme. The discussion is based on our prototype implementation of a traditional lock manager for nested transactions and its optimized version. Our optimization trades the scanning of locksets at sub-transaction commit for a more complex conflict detection. Therefore, we first show that conflict detection in our new lock implementation exhibits a performance level which equals or outperforms that of the traditional implementation, even when transactions are flat. Then, we show that our new design and lock implementation greatly reduce the global overhead of locking, in either lock requests or commits. This section starts by giving the operations of the lock manager that need to be measured. Then it gives the performance of operations on individual locks and on locksets.

5.1 Operations Measured

We measure the times of the operations provided by lock managers. These operations manipulate either individual locks (lock request) or locksets (lockset inheritance or lockset release). The specific cases we measured are the following:

Operations on individual locks:
- Acquiring a free lock.
- Acquiring a compatible lock (the lock is already owned and the lock mode does not conflict with the requested mode).
- Re-using a lock (the requested lock is already owned by the requester).
- Acquiring an incompatible lock (the lock mode conflicts with the requested mode based on the simple read/write rule for detecting conflicts; however, the rules for nested transactions indicate that there is no real conflict).

Operations on locksets:
- Lockset release (to abort any sub-transaction or commit a top-level transaction).
- Lockset inherit (to make a sub-transaction inherit the lockset of another).

Lockset release is the complete release of the locks of a lockset. This includes the cleaning of all the lock structures (e.g., LRBs or HLRBs) associated with the committing transaction. Lockset inherit means that the locks of a lockset should be made non-conflicting with a specified transaction. In the traditional design, this requires scanning the lockset and exchanging transaction identities in lock owner sets. In our new design, this is simply a flow of identities updating the UIISs and DIISs.

All measurements have been made on a 486 PC at 33 MHz running the OSF/MK operating system (OSF/MK has a separate Mach 3.0 micro-kernel). All measurements have been obtained as an average over ten runs. All times are memory-resident, single-machine measures. The rationale is that the two solutions only differ in the internals (algorithms) of the lock manager and do not impact the interactions between lock managers and the other components of a transaction processing system.

5.2 Individual Lock Operations

The cases of individual lock operations can be distinguished depending on whether their time is constant or varying. Only one case has a constant time: acquiring a free lock. Table 3 gives the times for acquiring a free lock in the traditional case (LRB) and with our optimization (HLRB). The benchmark used is a simple transaction which creates a lockset, takes 5000 locks and then commits.

Requested Mode   LRB      HLRB
Read Mode        197 µs   188 µs
Write Mode       199 µs   192 µs

Table 3: Acquiring a Free Lock.



Unsurprisingly, the times are almost the same in both implementations, since acquiring a free lock is roughly equivalent whether LRB or HLRB lists are empty. The two implementations share the hash table lookup, the creation of the LCB and the creation of either an LRB or an HLRB. The slight gain of the HLRB approach is due to minor optimizations such as the management of the lockset. However, the two implementations have different performance on compatible lock requests because the management of owner sets changes. These operations have a varying time depending on the degree of multiprogramming as well as the degree of overlap between the working sets of sub-transactions within hierarchies. The following two benchmarks measure these effects.

The first benchmark measures the time to request a read lock on an object already locked in read mode. The goal is to vary the number of readers from different hierarchies. This is equivalent to multiple-reader locking in flat transactions. This makes the length of LCB chains increase in both lock implementations, since for each owner there is either an LRB or an HLRB in the LCB chains. Naturally, the cost of a read request increases linearly with the number of parallel readers. This is shown in Figure 6(a). The traditional implementation is denoted LRB and our optimized version is denoted HLRB; these notations are retained in all the following figures and tables.


[Figure 6: Compatible Locking. (a) Flat compatible locking and (b) hierarchically compatible locking: read lock request time (µs) as a function of the number of transactions (2-10), for LRB and HLRB.]

The second benchmark measures the time for requesting a read lock on an object already locked by other transactions belonging to the hierarchy of the requestor. The benchmark used is a nested transaction whose fanout is one and whose nesting ranges from 2 to 10 sub-transactions. Each level takes the same 5000 locks (same locked objects). The times are obtained by measuring the lock requests at the deepest sub-transaction only. The deeper the sub-transaction, the larger the number of ancestors that already own the locks. Figure 6(b) shows that the time of the traditional implementation increases almost linearly: the length of LRB chains grows as the number of ancestors owning locks increases. In contrast, our implementation provides a constant time because there is a single HLRB supporting all the lock requests of a hierarchy. This shows the efficiency of using bitmaps instead of linked lists. Notice further that the same phenomenon would appear if parallel siblings overlapped their working sets, for instance when read-only parallelism is used.

These curves strongly suggest that bitmaps should fully replace linked lists. In this approach, each LCB would have two bitmaps representing the Rowners and Wowners sets. This requires that bit numbers fully identify transactions within a lock manager; in other words, bit numbers become lock-manager-wide identities and are no longer allocated on a per-hierarchy basis. We did not have time to measure such an implementation but it would obviously perform even better, although one may worry about the bitmap size in such an approach.

5.3 Lockset Operations

Locksets are manipulated to reflect the commits of transactions, which impact the visibility of effects. There are two cases to be studied. The first case corresponds to the release of a lockset, which only happens when a sub-transaction aborts or a top-level transaction commits. The underlying algorithms are identical in the two measured lock managers: the lockset is scanned and all the LRBs (respectively HLRBs) are extracted from the LCB chains and freed. As confirmed by the times given in Table 4, both lock managers perform almost equally.

release_lockset() times (in ms):

# of Locks   10   50   100   500   1000   5000   10000
LRB           2    5     9    43     82    410     823
HLRB          3    1     7    36     79    383     781

Table 4: Lockset Release Times.

The other case corresponds to the inheritance of a lockset from one transaction to another. This is used to reflect visibility changes due to sub-transaction commits within hierarchies. The two algorithms used differ greatly: the traditional approach scans the specified lockset while our new implementation only plays with transaction identities. The corresponding overheads depend on the depth and fanout of the transaction hierarchies as well as the degree of overlap between the transactions' working sets. In the following benchmarks, we play with these different parameters and discuss the resulting overheads.

First, we measure the incidence of depth and fanout. Times are obtained by measuring a single nested transaction taking 30000 locks with no overlap between the working sets of sub-transactions. Only the bottom leaf sub-transactions take locks. Different fanouts (2-5) and depths (2-3) are measured. Fixing the total number of locks requested as well as a no-overlap condition makes the overhead of locking equivalent in the two lock manager implementations. On the one hand, the no-overlap case causes only free locks to be acquired, which incurs about the same overhead in both implementations. On the other hand, fixing the total number of locks fixes the cost of the lockset release occurring upon the top-level commit; again, lockset release operations are almost equivalent in both implementations. Therefore, the only cost factor that varies is the inheritance of locks (see Figure 7).


[Figure 7: The Cost of Lock Inheritance (no overlap). (a) Response time (seconds) and (b) inheritance overhead (ms × 10³) as a function of fanout (2-5), for LRB and HLRB at depths 2 and 3.]

In Figure 7(a), we see that all the response-time curves are almost flat. In other words, the cost of inheritance is insensitive to the fanout. This is because the total number of locks has been fixed for the entire hierarchy, thereby also fixing the number of inherited locks. However, it is clear that the response time increases significantly from a depth of 2 to 3. Since all costs but that of lock inheritance are fixed, it is the repetitive inheritance of locks at each level that increases response time. Figure 7(b) shows the inheritance costs only. We can see a linear increase of that overhead with the depth of the transaction hierarchy. This is because the locks, which are requested at the deepest sub-transactions only, are not inherited at depth 1, are inherited once at depth 2 and three times at depth 3. Inheritance incurs about 10% of the total locking cost at depth 2 while it reaches almost 20% at depth 3.

Above, we played with various fanouts and depths but the overlap between the working sets of sub-transactions was null. While this is likely if nested transactions are used to remotely access multiple servers such as relational database systems, it is more likely that sub-transactions do overlap in object-oriented database systems, which are architecturally well-suited for clients working on a given working set. Therefore, it is likely that clients will nest their transactions upon a given working set for a finer atomicity grain or for cooperation. Thus, we experimented with the impact of nesting sub-transactions on a given, fixed working set.


[Figure 8: The Cost of Lock Inheritance (overlapping working sets). (a) Response time (seconds) and (b) inheritance time (ms × 10³) as a function of fanout (2-5), for LRB and HLRB at depths 2 and 3.]

Figure 8 depicts the impact of various fanouts and depths on response time. The benchmark used is a single nested transaction which takes the same 5000 locks at each level. Fanout ranges from 2 to 5 and depth ranges from 2 to 3. Figure 8(a) presents the response times while Figure 8(b) shows the inheritance time. Conversely to the previous benchmark with no overlap, the overhead of inheritance now depends on the fanout. In fact, the cost of inheritance grows as the product of the fanout and the depth. Inheritance is about 10% of the locking overhead at depth 2 and fanout 2, and it reaches 30% at depth 3 and fanout 5. It is interesting to note that our optimization can reduce the locking overhead by up to 40%. The last 10% are due to a factorization of allocation costs upon lock requests. When there is no overlap, the cost of requesting locks is about the same in both implementations (only free locks are acquired). When working sets overlap, our design incurs less lock management overhead since fewer allocations of lock request structures (HLRBs) are needed. In the traditional approach, an LRB has to be allocated every time a transaction requests a lock, regardless of the fact that its hierarchy might already own that lock. In our design, since there is a single HLRB per hierarchy, all requests on a same lock made by transactions of a single hierarchy benefit from the HLRB allocation made by the first requester.

6 Conclusion

In this paper, we have proposed a solution for optimizing the performance of lock inheritance in nested transactions. Our solution fully respects the original nested transaction model. By simply revisiting the locking rules using set-oriented semantics, we are able to trade the cost of lock propagation at sub-transaction commit for a potentially higher cost of conflict detection. Eliminating lock propagation at sub-transaction commit obviously improves both transaction response time and system throughput. However, this is fine only as long as conflict detection is kept inexpensive. By moving away from the traditional lock implementation, we were able to propose a solution which makes the cost of lock requests comparable to that of flat transactions. This optimization is made possible by organizing the lock request blocks according to the transaction hierarchies and by representing lock owner sets as bitmaps instead of linked lists. Our solution to efficient lock inheritance requires changing only the lock implementation of a traditional lock manager.

To assess the performance of our solution, we have implemented a traditional lock manager and modified its lock implementation according to our design. Thus, we were able to compare a traditional lock manager with an optimized one. The comparison demonstrates several points. First, our new implementation of locks supports locking in a nested context with performance that equals that of a traditional design in a flat transaction context. Thereby, avoiding the scans of locksets at sub-transaction commits is a pure gain, which is our second point. The more nested the transactions and the more overlapping the working sets of sub-transactions, the more significant the gain. The gain in our benchmarks ranges from 10%, with no overlap and a depth and fanout of two, up to 40%, with a 100% overlap, a depth of three and a fanout of five. Notice that the gain is on the overhead of lock operations within the lock manager of a transaction processing system, and not on the global overhead of nested transactions, which further includes the overheads for supporting the atomicity and durability properties. Notice, though, that there is little or no logging and durability overhead if read-only or read-mostly sub-transactions are used.

Measurements suggest that a complete removal of lock request blocks from locks would induce a further reduction of both the space and time overheads of locking. The open questions are clearly what the adequate bitmap size would be and what the impact of background cleaning upon performance would be.

References

[1] R. G. G. Cattell. Object Data Management: Object-Oriented and Extended Relational Database Systems. Addison-Wesley, 1991.
[2] Partha Dasgupta, Richard J. LeBlanc Jr., and William F. Appelbe. The Clouds distributed operating system: Functional description, implementation details and related work. In Proc. of the International Conference on Distributed Computing Systems, pages 2-9, San Jose, California, USA, June 1988. IEEE.
[3] J. L. Eppinger, L. B. Mummert, and A. Z. Spector. Camelot and Avalon: A Distributed Transaction Facility. Morgan Kaufmann, San Mateo, CA, 1991.
[4] Hector Garcia-Molina. Modeling long-running activities as nested sagas. Data Engineering, 14(1):249-259, March 1991.
[5] Vibby Gottemukkala and Tobin J. Lehman. Locking and latching in a memory-resident database system. In Proc. of the International Conference on Very Large Data Bases, pages 533-544, 1992.
[6] Jim Gray and Andreas Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.
[7] T. Härder, M. Profit, and H. Schöning. Supporting parallelism in engineering databases by nested transactions. Technical Report 34/92, Kaiserslautern University, December 1992.
[8] T. Härder and K. Rothermel. Concurrency control issues in nested transactions. VLDB Journal, 2(1):39-74, 1993.
[9] Gregory G. Kenley. An action management system for a decentralized operating system. Technical Report GIT-ICS-86/01, School of Information and Computer Sciences, Georgia Institute of Technology, Atlanta, Georgia, January 1986.
[10] B. Liskov, D. Curtis, P. Johnson, and R. Scheifler. Implementation of Argus. In Proc. of the ACM Symposium on Operating Systems Principles, pages 111-122. ACM, 1987.
[11] Barbara Liskov. Distributed programming in Argus. Communications of the ACM, 31(3):300-312, March 1988.
[12] J. Eliot B. Moss. Log-based recovery for nested transactions. In Proc. of the International Conference on Very Large Data Bases, pages 427-432, 1987.
[13] J. Eliot B. Moss. Nested Transactions: An Approach to Reliable Distributed Computing. PhD thesis, Massachusetts Institute of Technology, April 1981.
[14] E. T. Mueller, J. D. Moore, and G. J. Popek. A nested transaction mechanism for LOCUS. In Proc. of the ACM Symposium on Operating Systems Principles, pages 71-89, October 1983.
[15] M. T. Özsu and P. Valduriez. Principles of Distributed Database Systems. Prentice-Hall, Englewood Cliffs, NJ, 1991.
[16] Sharon E. Perl. Distributed commit protocols for nested atomic actions. Technical Report MIT/LCS/TR-431, Massachusetts Institute of Technology, November 1988.
[17] Calton Pu and Jerre D. Noe. Design and implementation of nested transactions in Eden. In Proc. of the Symposium on Reliability in Distributed Software and Database Systems, pages 126-136, March 1987.
[18] K. Rothermel and C. Mohan. ARIES/NT: A recovery method based on write-ahead logging for nested transactions. In Proc. of the International Conference on Very Large Data Bases, pages 337-346, 1989.
[19] M. Rukoz. Hierarchical deadlock detection for nested transactions. Distributed Computing, 4:123-129, 1991.
[20] Santosh K. Shrivastava and Stuart M. Wheater. Implementing fault-tolerant distributed applications using objects and multi-coloured actions. In Proc. of the International Conference on Distributed Computing Systems, pages 203-210, Paris, France, May 1990.
[21] Bernd Walter. Nested transactions with multiple commit points: An approach to the structuring of advanced database applications. In Proc. of the International Conference on Very Large Data Bases, Singapore, August 1984.
[22] Gerhard Weikum and Hans-Jörg Schek. Architectural issues of transaction management in multi-layered systems. In Proc. of the International Conference on Very Large Data Bases, Singapore, August 1984.
