Concurrent and Recoverable Restructuring Method for Database Indices
Ibrahim Jaluta Department of Computer Science and Engineering, Helsinki University of Technology, Finland
M. Abusaa Fekihal Department of Computer Science University of Sudan
Abstract
In this paper, we present a top-down restructuring method for B+-trees in which a structure modification such as a page split or merge is executed as an atomic action. Each structure modification is logged using a single redo-only log record. The execution of each structure modification involves X-latching at most three pages on two adjacent levels of the B+-tree for a short duration. A B+-tree structure modification, once executed to completion, is never undone, regardless of whether the transaction that triggered it later commits or aborts. Recoverability of the B+-tree structure is guaranteed because the redo pass of our ARIES-based recovery protocol produces a structurally consistent tree. The method improves concurrency, simplifies recovery, reduces the amount of logging, and can be used in systems that use redo-only or redo-undo recovery protocols.
1. Introduction
In the literature there are two main techniques for handling tree-structure modifications such as page splits or merges. In the first technique [1,2,3], an updating transaction T acquires a tree latch in X mode before it starts executing a structure modification bottom-up. That is, concurrent tree-structure modifications are serialized through the use of a tree latch. Moreover, to guarantee correct recovery, leaf-page updates by other concurrent transactions are prevented while a tree-structure modification is in progress. This technique limits the degree of concurrency: while a transaction T holds the tree latch in X mode to perform a structure modification, new transactions are prevented from accessing the B+-tree, and other transactions are prevented from updating leaf pages.
In the second technique [4], an updating transaction X-latches the pages along the structure-modification path top-down before it starts executing the structure modification bottom-up. Hence, two structure modifications can be executed concurrently only if they occur on completely distinct paths. Two structure modifications having at least one page in common on their paths are serialized. This technique suffers from two problems. The first is the reduction in concurrency caused by X-latching all the pages on the structure-modification path at once. The second is that, if leaf-page updates and structure modifications are allowed to execute concurrently, then during restart recovery the B+-tree could be structurally inconsistent, so that logical undo operations fail.
To save time and effort when a transaction aborts or the system fails, we would like the B+-tree structure modifications made by a transaction T to be committed regardless of whether T eventually commits or aborts. A structure modification can be executed as an atomic action either by executing it as a nested top action [1,2,3] or by generating a special transaction or system transaction [5]. In the nested-top-action approach, when an updating transaction T needs to execute a B+-tree structure modification, T saves the LSN of the last log record it has generated before starting the structure modification. Then T executes the structure modification, updates the pages involved, generates log records, and updates the Page-LSNs. When T completes the structure modification, it generates a dummy compensation log record (CLR), which is set to point to the log record whose LSN was saved previously. Therefore, if T were to roll back after completing the structure modification, the CLR lets T bypass the log records related to the B+-tree structure modification.
The above techniques suffer from the following problems. If the updating transaction T aborts or the system fails before the dummy CLR is written to disk or before the special transaction commits, then the incomplete structure modification has to be undone. Thus, time and effort are wasted. Moreover, in the special-transaction approach, an extra pass during the undo phase of the ARIES [9] restart recovery may be needed to roll back the uncommitted special transactions, i.e., to undo incomplete structure modifications.
We present a top-down restructuring method for B+-trees similar to the method in [6], but ours avoids the problems associated with the techniques presented in [1,2,3] and [4]. In our method, when an updating transaction T reaches a full (or about-to-underflow) leaf page P, T releases the X latch on P (to avoid a deadlock with another transaction traversing the tree), retraverses the B+-tree using the latch-coupling protocol with X latches, and executes the structure modifications along its search path as atomic actions. The execution of each structure modification involves X-latching at most three pages on two adjacent levels of the tree for a short duration. Each structure modification is logged using a single redo-only log record. Each successfully completed structure modification brings the B+-tree into a structurally consistent state whenever the tree was structurally consistent initially.
Once a structure modification has been executed to completion, it need never be undone during normal processing or restart recovery, regardless of whether the transaction that triggered it later commits or aborts.
2. B+-tree index
We use the B+-tree [7] as a sparse index to the database, so that the leaf pages store the database records. In addition, we assume that the B+-tree is a unique index and that the key values are of fixed length. Each interior page of the B+-tree contains a list of index records $(k_1, P_1), (k_2, P_2), \ldots, (k_n, P_n)$, where $k_1, k_2, \ldots, k_n$ are key values and $P_1, P_2, \ldots, P_n$ are page identifiers. A key value in an interior page is always greater than or equal to the highest key value in the corresponding child page. Each leaf page of the B+-tree contains a list of data records $(k_1, v_1), (k_2, v_2), \ldots, (k_n, v_n)$, where $k_1, k_2, \ldots, k_n$ are the key values of the database records and $v_1, v_2, \ldots, v_n$ are the data parts of the records. Each leaf page stores its high-key record of the form $(k_{n+1}, P_{n+1})$, where the high-key value $k_{n+1}$ of a page is the highest key value that can appear in that page, while $P_{n+1}$ (the side link) is the page-id of its right sibling page. The high-key record of the last leaf page is $(\infty, 0)$. The key value $\infty$ serves as a special key value that is larger than any key value that can be stored in the database.
We assume that each B+-tree page can hold a maximum of $M_1 \ge 8$ database records (excluding the high-key record) and a maximum of $M_2 \ge 8$ index records. Let $m_1$, $2 \le m_1 < M_1/2$, and $m_2$, $2 \le m_2 < M_2/2$, be the chosen minimum load factors for a non-root leaf page and a non-root index page, respectively. Choosing the load factors $m_1$ and $m_2$ as above lets us avoid the problems associated with merge-at-half [8]. A B+-tree is structurally consistent if it satisfies the basic definition of the B+-tree, so that each page can be accessed from the root by following child links. We say that a B+-tree page P is about to underflow if (1) P is the root page and contains only two child links, (2) P is a non-root leaf page and contains only $m_1$ database records, or (3) P is a non-root index page and contains only $m_2$ index records.
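The page-occupancy conditions above can be written down as two small predicates. The following Python sketch is illustrative only: the `Page` class, the concrete values of `M1`, `M2`, `m1`, `m2`, and the reading of "full" as "holds the maximum number of records" are assumptions made for the example, not definitions taken from the paper.

```python
from dataclasses import dataclass, field

# Assumed example parameters: capacities M1, M2 >= 8 and
# minimum load factors with 2 <= m < M/2, as required in the text.
M1, M2 = 8, 8
m1, m2 = 3, 3


@dataclass
class Page:
    """Hypothetical in-memory page: a leaf or an index (interior) page."""
    is_leaf: bool
    is_root: bool = False
    records: list = field(default_factory=list)


def is_full(p: Page) -> bool:
    # Assumption: a page is "full" when it holds the maximum number of records.
    return len(p.records) >= (M1 if p.is_leaf else M2)


def about_to_underflow(p: Page) -> bool:
    if p.is_root:
        # Case (1): a root index page with only two child links.
        return (not p.is_leaf) and len(p.records) == 2
    # Cases (2) and (3): a non-root page holding only the minimum load.
    return len(p.records) == (m1 if p.is_leaf else m2)
```

With these example values a non-root leaf page holding three records is about to underflow, while one holding four is not.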
3. B+-tree structure modification operations
The following procedures are used by an updating transaction T to execute the structure modification operations (split, merge, redistribute, increase-tree-height, and decrease-tree-height) as atomic actions.
Split(P,Q). Split the X-latched full page Q into Q and the newly allocated and X-latched page Q´, link Q´ to its parent page P, and generate a redo-only log record. When the operation is completed, the X latches on the storage-map page M, on P, and on the page (Q or Q´) that does not cover the search key value are released, while the X latch on the other page is retained.
Step 1. X-latch the storage-map page M and some page Q´ marked as unallocated in M, and mark Q´ as allocated in M.
Step 2. Move the upper half of the records from Q to Q´, chain Q´ as the right sibling of Q if Q is a leaf page, and link Q´ to its parent P.
Step 3. Generate a redo-only log record, where V is the set of records moved from Q to Q´ and n is the LSN of the previous log record generated by transaction T. Update the Page-LSNs of M, P, Q, and Q´.
Step 4. Release the X latches on M, P, and the page (Q or Q´) that does not cover the search key value, and keep the X latch on the other page.
Step 5. Set Q:= the page (Q or Q´) that covers the search key value.
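The split procedure above can be sketched as follows. This is a minimal single-threaded Python illustration under stated assumptions: latching is not modelled, the storage-map page M is reduced to a list of free page-ids (`free_pids`), and the `Page` class, the dictionary-based log record, and its field names are hypothetical choices for the example rather than the paper's record layout.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical page: records are (key, value) or (key, child page-id)."""
    pid: int
    is_leaf: bool
    records: list = field(default_factory=list)
    high_key: float = float("inf")   # used by leaf pages only
    side_link: int = 0               # used by leaf pages only
    page_lsn: int = 0


def split(pages, log, free_pids, P, Q, search_key, txn="T", prev_lsn=0):
    """Split full page Q under parent P; return the page covering search_key."""
    # Step 1: allocate the new page Q' (free_pids stands in for page M).
    Qn = Page(free_pids.pop(), Q.is_leaf)
    pages[Qn.pid] = Qn
    # Step 2: move the upper half of Q's records to Q'.
    half = len(Q.records) // 2
    moved, Q.records = Q.records[half:], Q.records[:half]
    Qn.records = moved
    i = next(i for i, (_, child) in enumerate(P.records) if child == Q.pid)
    u = P.records[i][0]          # key associated with Q in P before the split
    u_new = Q.records[-1][0]     # highest key left in Q after the split
    if Q.is_leaf:
        # Chain Q' as the right sibling of Q.
        Qn.high_key, Qn.side_link = Q.high_key, Q.side_link
        Q.high_key, Q.side_link = u_new, Qn.pid
    # Link Q' to P: (u, Q) becomes (u_new, Q) followed by (u, Q').
    P.records[i:i + 1] = [(u_new, Q.pid), (u, Qn.pid)]
    # Step 3: one redo-only log record for the whole split.
    lsn = len(log) + 1
    log.append({"lsn": lsn, "txn": txn, "type": "split", "P": P.pid,
                "Q": Q.pid, "Q_new": Qn.pid, "u": u, "u_new": u_new,
                "V": list(moved), "prev_lsn": prev_lsn})
    for page in (P, Q, Qn):
        page.page_lsn = lsn
    # Steps 4-5: continue with the page that covers the search key.
    return Q if search_key <= u_new else Qn
```

Because the whole modification is described by one log record, a later redo only needs that record and the Page-LSNs of the pages involved to decide which parts of the split must be reapplied.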
Merge(P,Q,R). Merge the X-latched child pages Q and R of the X-latched parent page P, unlink R from its parent P, deallocate R, and generate a redo-only log record. When the operation is completed, the X latches on the storage-map page M, on P, and on R are released, while the X latch on Q is retained.
Step 1. X-latch the storage-map page M.
Step 2. Move all the records from R to Q, update the side link of Q if Q is a leaf page, unlink R from its parent P, and mark R as unallocated in M.
Step 3. Generate a redo-only log record, where V is the set of records moved from R to Q and n is the LSN of the previous log record generated by transaction T. Update the Page-LSNs of M, P, Q, and R.
Step 4. Release the X latches on M, P, and R, and keep the X latch on Q.
Redistribute(P,Q,R). Redistribute the records between the X-latched child pages Q and R of the X-latched parent page P, update the child link associated with Q in P, and generate a redo-only log record. When the operation is completed, the X latches on P and on the page (Q or R) that does not cover the search key value are released, while the X latch on the other page is retained.
Step 1. Redistribute the records in Q and R, and update the child link associated with Q in the parent P.
Step 2. Generate a redo-only log record, where V is the set of records moved, Y is the page (Q or R) that received the records in V, and n is the LSN of the previous log record generated by transaction T. Update the Page-LSNs of P, Q, and R.
Step 3. Release the X latches on P and on the page (Q or R) that does not cover the search key value, and keep the X latch on the other page.
Step 4. Set Q:= the page (Q or R) that covers the search key value.
Increase-tree-height(P). Increase the tree height by distributing the records in the X-latched root page P between the newly allocated and X-latched child pages P´ and P´´, link P´ and P´´ to their parent P, and generate a redo-only log record. When the operation is completed, the X latches on the storage-map page M, on P, and on the page (P´ or P´´) that does not cover the search key value are released, while the X latch on the other page is retained.
Step 1. X-latch the storage-map page M and some pages P´ and P´´ marked as unallocated in M, and mark P´ and P´´ as allocated in M.
Step 2. Move the upper half of the records from P to P´´, move the remaining records from P to P´, and link the child pages P´ and P´´ to their parent P.
Step 3. Generate a redo-only log record, where V1 is the set of records moved from P to P´, V2 is the set of records moved from P to P´´, and n is the LSN of the previous log record generated by transaction T. Update the Page-LSNs of M, P, P´, and P´´.
Step 4. Release the X latches on M, P, and the page (P´ or P´´) that does not cover the search key value, and keep the X latch on the other page.
Step 5. Set P:= the page (P´ or P´´) that covers the search key value.
Decrease-tree-height(P,Q,R). Decrease the tree height by deleting the remaining index records in the X-latched root page P and moving the records in the X-latched child pages Q and R to P, deallocate Q and R, and generate a redo-only log record. When the operation is completed, the X latches on the storage-map page M, on Q, and on R are released, while the X latch on P is retained.
Step 1. X-latch the storage-map page M.
Step 2. Delete the remaining records in P, move all the records in Q and R to P, and mark Q and R as unallocated in M.
Step 3. Generate a redo-only log record, where V1 is the set of records moved from Q to P, V2 is the set of records moved from R to P, and n is the LSN of the previous log record generated by transaction T. Update the Page-LSNs of M, P, Q, and R.
Step 4. Release the X latches on M, Q, and R, and keep the X latch on P.
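As an illustration of the merge procedure in this section, the following Python sketch moves the records of R into its left sibling Q and unlinks R from the parent. It is a single-threaded sketch under assumptions: latches are omitted, the storage-map page M is modelled by a free list (`free_pids`), and the `Page` class and the log-record field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical page: records are (key, value) or (key, child page-id)."""
    pid: int
    is_leaf: bool
    records: list = field(default_factory=list)
    high_key: float = float("inf")   # used by leaf pages only
    side_link: int = 0               # used by leaf pages only
    page_lsn: int = 0


def merge(pages, log, free_pids, P, Q, R, txn="T", prev_lsn=0):
    """Merge R into its left sibling Q under parent P and deallocate R."""
    # Step 2: move all records from R to Q.
    moved = list(R.records)
    Q.records.extend(moved)
    R.records = []
    if Q.is_leaf:
        # Q now covers R's key range and inherits R's side link.
        Q.high_key, Q.side_link = R.high_key, R.side_link
    # Unlink R from P: drop Q's old index record and let R's key point to Q.
    P.records = [(key, Q.pid) if child == R.pid else (key, child)
                 for key, child in P.records if child != Q.pid]
    # Mark R as unallocated (free_pids stands in for the storage-map page M).
    del pages[R.pid]
    free_pids.append(R.pid)
    # Step 3: one redo-only log record for the whole merge.
    lsn = len(log) + 1
    log.append({"lsn": lsn, "txn": txn, "type": "merge", "P": P.pid,
                "Q": Q.pid, "R": R.pid, "V": moved, "prev_lsn": prev_lsn})
    for page in (P, Q, R):
        page.page_lsn = lsn
    # Step 4: the caller keeps working on Q.
    return Q
```

The sketch mirrors the split sketch's conventions in reverse: the parent ends up with one index record fewer, and the surviving page Q takes over the key of the deallocated page R.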
4. B+-tree traversal and structure modifications
A transaction T takes as input a key value k and traverses the B+-tree using the latch-coupling protocol with S latches (Algorithm 1). When the leaf page P covering the search key value k is reached, P is S-latched if T is fetching and X-latched if T is updating.
Algorithm 1. Traversal using latch coupling with S latches.
Step 1. Set P:= the page-id of the root page of the B+-tree, and S-latch P.
Step 2. If P is a leaf page, then S-latch P for fetching or X-latch P for updating, and return.
Step 3. Search P for the child page Q covering the search key value k, S-latch Q, unlatch P, set P:= Q, and go to Step 2.
If an updating transaction T traversing the B+-tree reaches a full (or about-to-underflow) leaf page P, then T releases the X latch on P, retraverses the B+-tree using the latch-coupling protocol with X latches, and executes the structure modifications (page splits or merges) along its search path as atomic actions. An updating transaction T uses Algorithm 2 to execute page splits and Algorithm 3 to execute page merges.
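The latch-coupling order of Algorithm 1 (latch the child before releasing the parent) can be made visible with a small trace-based sketch in Python. The sketch is single-threaded and records latch requests in a list instead of acquiring real latches; the `Page` class and the `trace` list are hypothetical, and, as a simplification, the latch mode of a page is chosen at the moment the page is first latched.

```python
from dataclasses import dataclass, field

INF = float("inf")  # stands for the special key larger than any stored key


@dataclass
class Page:
    """Hypothetical page: index records are (key, child page-id)."""
    pid: int
    is_leaf: bool
    records: list = field(default_factory=list)


def traverse(pages, root_pid, k, for_update, trace):
    """Descend to the leaf covering key k, recording latch operations."""
    P = pages[root_pid]
    trace.append(("X" if P.is_leaf and for_update else "S", P.pid))
    while not P.is_leaf:
        # The covering child is the first one whose key is >= k.
        child_pid = next(child for key, child in P.records if k <= key)
        Q = pages[child_pid]
        # Latch coupling: latch the child first, then release the parent.
        trace.append(("X" if Q.is_leaf and for_update else "S", Q.pid))
        trace.append(("unlatch", P.pid))
        P = Q
    return P


pages = {
    0: Page(0, False, [(20, 1), (INF, 2)]),
    1: Page(1, True, [(10, "a"), (20, "b")]),
    2: Page(2, True, [(30, "c")]),
}
trace = []
leaf = traverse(pages, 0, 25, for_update=True, trace=trace)
print(leaf.pid, trace)
```

In the trace, the latch on a child always appears before the unlatch of its parent, so at most two pages on adjacent levels are latched by the traversal at any time.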
Algorithm 2.
Step 1. Let P:= the page-id of the root page, and X-latch P.
Step 2. If P is full, then increase-tree-height(P).
Step 3. If P is a leaf page, then return.
Step 4. Search P for the child page Q covering the search key value k, and X-latch Q. If Q is full, then split(P,Q). Otherwise, unlatch P.
Step 5. Set P:= Q and go to Step 3.
Algorithm 3.
Step 1. Let P:= the page-id of the root page, and X-latch P.
Step 2. If P has only two child links, then decrease-tree-height(P).
Step 3. Search P for the child page Q covering the search key value k, and X-latch Q.
Step 4. If Q is the rightmost child of its parent, then go to Step 7.
Step 5. If Q is about to underflow, then X-latch the right sibling page R of Q. Otherwise, unlatch P, set P:= Q, and go to Step 9.
Step 6. If Q and R can be merged, then merge(P,Q,R). Otherwise, redistribute(P,Q,R). Set P:= Q, and go to Step 9.
Step 7. If Q is about to underflow, then release the X latch on Q (to avoid a deadlock), X-latch the left sibling page L of Q, and then X-latch Q. Otherwise, unlatch P, set P:= Q, and go to Step 9.
Step 8. If L and Q can be merged, then merge(P,L,Q). Otherwise, redistribute(P,L,Q). Set P:= L.
Step 9. If P is a leaf page, then return. Otherwise, go to Step 3.
5. Recovery
Our restart recovery protocol is based on ARIES [9]. Restart recovery consists of the analysis, redo, and undo passes. The analysis pass determines the starting point (Redo-LSN) of the redo pass in the log, and the list of transactions that need to be rolled back or whose rollback needs to be completed. The redo pass begins at the log record whose LSN equals Redo-LSN, and then proceeds forward to the end of the log. For each redoable log record the following is performed. If the Page-LSN of the page mentioned in the log record is less than the log record’s LSN, then the logged update is redone, that is, applied physiologically to the page, and the Page-LSN is set to the log record’s LSN. Otherwise, the logged update does not require redo. No logging is performed during the redo pass. By the end of the redo pass, the B+-tree is structurally consistent.
Example 1: Assume that the split operation is logged by a redo-only log record with LSN n. Then the split operation is redone during the redo pass of our restart recovery protocol as follows.
Step 1. If Q is not in the modified-page table or Q is there with Rec-LSN(Q) > n, then go to Step 3.
Step 2. X-latch Q. If Page-LSN(Q) < n, then delete all the records in the set V from Q, set the high-key record in Q:= (u´,Q´) when Q is a leaf page, and set Page-LSN(Q):= n. Otherwise, set Rec-LSN(Q):= Page-LSN(Q)+1. Unlatch Q.
Step 3. If the storage-map page M is not in the modified-page table or M is there with Rec-LSN(M) > n, then go to Step 5.
Step 4. X-latch M. If Page-LSN(M) < n, then mark page Q´ as allocated in M, and set Page-LSN(M):= n. Otherwise, set Rec-LSN(M):= Page-LSN(M)+1. Unlatch M.
Step 5. If Q´ is not in the modified-page table or Q´ is there with Rec-LSN(Q´) > n, then go to Step 7.
Step 6. X-latch Q´. If Page-LSN(Q´) < n, then format Q´ as an empty B+-tree page, insert all the records in the set V into Q´, and set Page-LSN(Q´):= n. Otherwise, set Rec-LSN(Q´):= Page-LSN(Q´)+1. Unlatch Q´.
Step 7. If P is not in the modified-page table or P is there with Rec-LSN(P) > n, then return.
Step 8. X-latch P. If Page-LSN(P) < n, then link the child page Q´ to its parent P by inserting the index record (u,Q´) into P and changing the index record (u,Q) in P into (u´,Q), and set Page-LSN(P):= n. Otherwise, set Rec-LSN(P):= Page-LSN(P)+1. Unlatch P.
In the undo pass, all forward-rolling transactions are aborted and rolled back, and the rollback of all backward-rolling transactions is completed. The log is scanned backward from the end of the log until all updates of such transactions are undone. When a redo-undo log record of type “insert” or “delete” is encountered, the logged update is undone. When a log record of type “compensation” (CLR) is encountered, no action is performed except that the value of the Undo-Next-LSN field of the CLR is used to determine the next log record to be processed. When a redo-only log record of type “split”, “merge”, “redistribute”, “increase-tree-height”, or “decrease-tree-height” is encountered, no action is performed except that the value of the Prev-LSN field of the log record is used to determine the next log record to be processed.
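The undo-pass rules above can be sketched for a single transaction as a walk along its backward chain of log records. The Python sketch below is illustrative only: log records are hypothetical dictionaries with assumed field names (`lsn`, `type`, `prev_lsn`, `undo_next_lsn`), LSN 0 marks the end of the chain, and the writing of CLRs for undone updates, which ARIES performs, is omitted for brevity.

```python
# Redo-only record types of structure modifications; these are never undone.
REDO_ONLY = {"split", "merge", "redistribute",
             "increase-tree-height", "decrease-tree-height"}


def rollback(log, last_lsn):
    """Return the LSNs of the updates undone when rolling back one transaction.

    `log` is a list of hypothetical log-record dictionaries; `last_lsn` is the
    LSN of the transaction's most recent log record.
    """
    by_lsn = {rec["lsn"]: rec for rec in log}
    undone = []
    lsn = last_lsn
    while lsn:
        rec = by_lsn[lsn]
        if rec["type"] in ("insert", "delete"):
            undone.append(lsn)            # redo-undo record: undo the update
            lsn = rec["prev_lsn"]
        elif rec["type"] == "clr":
            lsn = rec["undo_next_lsn"]    # skip what was already undone
        elif rec["type"] in REDO_ONLY:
            lsn = rec["prev_lsn"]         # structure modification: skip it
        else:
            raise ValueError(f"unexpected log record type: {rec['type']}")
    return undone


log = [
    {"lsn": 1, "type": "insert", "prev_lsn": 0},
    {"lsn": 2, "type": "split", "prev_lsn": 1},
    {"lsn": 3, "type": "insert", "prev_lsn": 2},
]
print(rollback(log, 3))
```

In this example the two inserts are undone in reverse order while the split between them is passed over, which is the behaviour the section describes for completed structure modifications.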
6. Conclusion
In our restructuring method for B+-trees, an updating transaction executes structure modifications as atomic actions in a top-down manner. Structure modifications are protected by X latches that are held for the duration of the structure modification. Each structure modification is logged using a single redo-only log record. In our method, a structure modification X-latches at most three pages simultaneously. A structure modification that has been executed to completion is never undone, regardless of the outcome of the transaction that triggered it.
7. References
[1] C. Mohan, ARIES/KVL: a key-value locking method for concurrency control of multi-action transactions operating on B-tree indexes. In: Proc. of the 16th VLDB Conference, 1990, pp 392-405.
[2] C. Mohan and F. Levine, ARIES/IM: an efficient and high concurrency index management method using write-ahead logging. In: Proc. of the 1992 ACM SIGMOD International Conference on Management of Data, pp 371-380.
[3] C. Mohan, Concurrency control and recovery methods for B+-tree indexes: ARIES/KVL and ARIES/IM. In: Performance of Concurrency Control Mechanisms in Centralized Database Systems (V. Kumar, ed.), Prentice Hall, 1996, pp 248-306.
[4] J. Gray and A. Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann, 1993.
[5] D. Lomet and B. Salzberg, Concurrency and recovery for index trees. The VLDB Journal 6 (1997), 224-240.
[6] Y. Mond and Y. Raz, Concurrency control in B+-tree databases using preparatory operations. In: Proc. of the 11th VLDB Conference, 1985, pp 331-334.
[7] R. Bayer and M. Schkolnick, Concurrency of operations on B-trees. Acta Informatica 9 (1977), 1-21.
[8] I. Jaluta, B-tree concurrency control and recovery in a client-server database management system. Ph.D. Thesis and report TKOA37/02, Department of Computer Science and Engineering, Helsinki University of Technology, 2002. http://lib.hut.fi/Diss/2002/isbn9512257068/.
[9] C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz, ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. Database Systems 17 (1992), 94-162.