Proceedings of the 43rd Hawaii International Conference on System Sciences - 2010
Hippocratic Databases: Extending Current Transaction Processing Approaches to Satisfy the Limited Retention Principle
Markus Kirchberg, Institute for Infocomm Research, A*STAR, Singapore
Sebastian Link, Victoria University of Wellington, New Zealand

Abstract
The concept of Hippocratic Database Systems was introduced by Agrawal et al. Based on international privacy legislation, ten privacy principles have been identified that an information system should meet in order to enforce privacy and data protection properties. Among those, the limited retention principle and its challenges to traditional database algorithms are considered in this paper. In particular, we examine not only how data records can be deleted from the database but also how their corresponding log records may be erased from the log (without violating the need to retain data recoverability properties). To the best of our knowledge, no current transaction processing or data recovery approach supports the deletion of data from log entries as desired by Hippocratic Database Systems.

1. Introduction
In 2002, Agrawal et al. [2] introduced the concept of a Hippocratic Database System (HDBS). HDBSs are a technology solution for enforcing stricter privacy policies. The proposal aims at automating and integrating privacy policies into database architectures. HDBSs shift the accountability for protecting the privacy of information from database users to the database systems themselves. As such, some core properties of traditional database systems (DBSs) must be rethought. HDBSs are based on ten Hippocratic principles: purpose specification, consent, limited collection, limited use, limited disclosure, limited retention, accuracy, safety, openness, and compliance. While each of them has its own implications and corresponding research and development challenges, we focus only on the limited retention principle in this paper. Limited retention implies that data items are erased from the database and all its corresponding structures when the designated retention period has expired. While traditional DBSs support the deletion of data from the database, HDBSs go a step further and request that the data item’s associated log records, audit trail information, checkpoint data, backup data, etc. be erased. This, however, is a non-trivial endeavor. In this paper, we focus on the impact of the limited retention principle on current traditional database transaction processing approaches. We first examine two data recovery algorithms and consider their suitability for extension to HDBSs. Having identified a suitable data recovery approach, we discuss the necessary amendments and additions to its fundamental data structures and routines in order to support the removal of a data record’s associated log entries from the transaction log while preserving fundamental data, crash and media recovery properties. Our contribution consists of an evaluation of the ARIES and C-ARIES recovery algorithms regarding their suitability for HDBSs. Having identified C-ARIES as the more suitable approach, we then propose necessary extensions to its core structures and routines. The paper is structured as follows: Section 2 provides a brief introduction to HDBSs. Section 3 discusses the two recovery algorithms of our choosing, followed (in Section 4) by a discussion of how they support HDBSs’ limited retention challenges. Section 5 proposes an augmented version of the C-ARIES algorithm in order to support the limited retention principle. Finally, Section 6 concludes our paper.
2. Hippocratic Database Systems
Based on international privacy legislation, Agrawal et al. [2] identified ten privacy principles that an information system should meet in order to enforce privacy and data protection. On the basis of these ten principles, the concept of HDBSs has been proposed: ‘Purpose’ serves as the central concept, with a corresponding attribute being maintained for each piece of information. This is realized using specially maintained privacy metadata tables, which define for each purpose and for each piece of information (i.e. for each attribute) collected for that purpose:
• The external-recipients to whom the information can be given out;
978-0-7695-3869-3/10 $26.00 © 2010 IEEE
• The retention-period for how long the information is stored; and
• The authorized-users who can access this information.
In the suggested design, the collected information is distributed across two tables: external recipients and retention period are in the privacy-policies table, and authorized users are in the privacy-authorizations table. The purpose is stored in both (refer to Table 1).

Table | Attributes
privacy-policies | purpose, table, attribute, { external-recipients }, retention
privacy-authorizations | purpose, table, attribute, { authorized-users }
Table 1: Privacy Metadata Schema [2, Fig. 2]

When a user submits a query to an HDBS, the system not only verifies that the user is authorized to access the required data items, but also answers only queries for which the purpose is equal to that for which the information has been collected. Further, HDBSs do not disclose information for purposes other than those for which the owner of the information has previously given consent, i.e. they enforce the limited use and limited disclosure principles. The limited retention principle is enforced by the DataRetentionManager, a module that deletes information that has outlived its purpose. If a certain data item was collected for a set of purposes, it is kept for the retention period of the purpose with the longest retention time. In this paper, we focus on the impact of this new module on traditional database transaction processing approaches.

3. Transaction Processing Approaches
From an internal point of view, users access databases in terms of transactions. Fast response times and a high transaction throughput are crucial issues for all DBSs. Hence, transactions are executed concurrently. The transaction management system (TMS) ensures a proper execution of concurrent transactions. It implements concurrency control and recovery mechanisms to preserve the well-known ACID principles. In a TMS, the transaction manager and the recovery manager together typically ensure these ACID principles. While the former is mainly concerned with normal processing, the latter takes over in the case of a system failure. During normal processing, concurrency control protocols are utilized to ensure the isolation and consistency properties of transactions and the data accessed. Durability is commonly ensured with the help of logging techniques. Rollback and recovery processing are concerned with the atomicity property, but also help with ensuring isolation, consistency, and durability.
To the best of our knowledge, there exists no transaction processing or data recovery approach that supports the deletion of log records from the transaction log (while preserving the ability to perform data, transaction, crash, and media recovery). Hence, we first introduce the most popular recovery approach, ARIES, and then consider a recently published, highly concurrent incarnation of ARIES.

3.1. The ARIES Recovery Algorithm
The ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) algorithm [5, 6] has had a significant impact on the current thinking on transaction logging and recovery. It has been incorporated into IBM's DB2 Universal Database, IBM's Lotus Notes and Domino, Microsoft SQL Server and NT file system, Apache Derby and a number of other systems. ARIES is based on the Write Ahead Logging (WAL) protocol that ensures recoverability of a database in the presence of a crash. All updates to all pages are logged. ARIES uses a log sequence number (LSN) stored on each page to correlate the state of the page with logged updates of that page. By examining the LSN of a page, the PageLSN, it can be easily determined which logged updates are reflected in the page. Being able to determine the state of a page with respect to logged updates is critical while repeating history, since it is essential that any update be applied once and only once. Failure to respect this requirement may result in a violation of data consistency. Updates performed during forward processing of transactions are described by Update Log Records (ULRs). However, logging is not restricted to forward processing. ARIES also logs, using Compensation Log Records (CLRs), compensations of updates of aborted/incomplete transactions performed during partial or total rollbacks. By appropriate chaining of CLR records to log records written during forward processing, a bounded amount of logging is ensured during rollbacks, even in the face of repeated failures during crash recovery. This chaining is achieved by: 1) Assigning LSNs in ascending sequence; and 2) Adding a pointer, PrevLSN, to the most recent preceding log record written by the same transaction to each log record. When the undo of a log record causes a CLR record to be written, a pointer, the UndoNextLSN, to the predecessor of the log record being undone is added to the CLR record. The UndoNextLSN keeps track of the progress of a rollback. 
It tells the system from where to
continue the rollback of the transaction if a system failure were to interrupt the completion of the rollback. When performing crash recovery, ARIES makes three passes (i.e. Analysis, Redo, Undo) over the log. During Analysis, ARIES scans the log from the most recent checkpoint to the end of the log. It determines: 1) The starting point of the Redo phase by keeping track of dirty pages; and 2) The list of transactions to be rolled back during Undo by monitoring the state of transactions. During Redo, ARIES repeats history. It is ensured that updates of all transactions have been executed once and only once. Thus, the database is returned to the state it was in immediately before the crash. Finally, Undo rolls back all updates of transactions that have been identified as active at the time the crash occurred.
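The role of the PageLSN while repeating history can be illustrated with a minimal sketch. It is not the ARIES implementation: the `Page` and `UpdateLogRecord` classes and the `redo` function are hypothetical names introduced only for illustration, and the log holds update records only. The sketch shows the "once and only once" rule: a logged update is reapplied only if the page's PageLSN is smaller than the record's LSN.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    page_id: str
    page_lsn: int = 0                      # LSN of the last update reflected in the page
    records: dict = field(default_factory=dict)


@dataclass
class UpdateLogRecord:
    lsn: int
    page_id: str
    rec_id: str
    after_image: str


def redo(log, pages):
    """Repeat history: reapply each logged update at most once per page."""
    applied = []
    for rec in sorted(log, key=lambda r: r.lsn):
        page = pages[rec.page_id]
        if page.page_lsn < rec.lsn:        # update is not yet reflected in the page
            page.records[rec.rec_id] = rec.after_image
            page.page_lsn = rec.lsn
            applied.append(rec.lsn)
    return applied


# Hypothetical state: P1 already reflects LSN 1, P2 reflects nothing.
log = [
    UpdateLogRecord(1, "P1", "R1", "a"),
    UpdateLogRecord(2, "P1", "R2", "b"),
    UpdateLogRecord(3, "P2", "R3", "c"),
]
pages = {"P1": Page("P1", page_lsn=1, records={"R1": "a"}), "P2": Page("P2")}
first = redo(log, pages)
second = redo(log, pages)                  # a repeated pass changes nothing
```

Running the pass a second time applies no further updates, which is the property that keeps repeated crash recovery from violating data consistency.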
3.2. The C-ARIES Recovery Algorithm
The C-ARIES recovery algorithm [7] is an adaptation of the original ARIES algorithm. C-ARIES extends the original algorithm with the capability to perform transaction aborts and crash recovery in a highly concurrent manner. C-ARIES preserves the desirable properties of the original ARIES algorithm. Enhancements were made to the Redo and Undo phases, which are now performed on a page-by-page basis. This results in a much higher degree of concurrency since operations that would normally be performed serially can now be performed concurrently. This page-by-page technique also provides the basis for the improved method for transaction aborts during normal processing. In C-ARIES, extensive changes are made to the CLR, both in terms of the information it contains and the way in which it is used. The rationale behind these modifications can be understood by observing the differences in how C-ARIES and ARIES perform undo operations: In ARIES, operations are undone one at a time in the reverse of the order in which they were performed by transactions. However, in order to increase concurrency, C-ARIES can perform multiple undo operations concurrently, where updates to individual pages are undone independently of each other. In C-ARIES, a new log record type, the Special Compensation Log Record (SCR), is introduced. It is almost identical to the modified version of the CLR, the only difference being the point in time at which SCR records are written: During normal rollback processing, operations are undone in the reverse of the order in which they were performed by individual transactions. However, during crash recovery rollback, operations are undone in the reverse of the order in which they were performed on individual pages. Having separate log
records for compensation during recovery and normal rollback allows C-ARIES routines to exploit this fact. In C-ARIES, a PageLastLSN pointer is added to all ULR, SCR and CLR log records. It stores the LSN of the record that last modified an object on this page. Recording these PageLastLSN pointers provides an easy method of tracing all modifications made to a particular set of objects (stored on the same page). With C-ARIES, recovery remains split into three phases: Analysis, Redo and Undo. However, recovery takes place on a page-by-page basis, where updates are reapplied (Redo phase) and removed from (Undo phase) pages independently from one another. The Redo phase reapplies changes to each page in the exact order that they were logged and the Undo phase undoes changes to each page in the exact reverse order that they were performed. Since the state of each page is accurately recorded (by use of the PageLSN) the consistency of the database will be maintained during such a process. C-ARIES has been shown to be a refinement of the original ARIES algorithm in a way that all core properties are being preserved [4].
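The page-based chaining can be sketched as follows. This is an illustrative fragment under stated assumptions, not C-ARIES code: the log is modelled as a dictionary keyed by LSN, and the `page_history` function and the `page_last_lsn` field name are hypothetical. Starting from a page's PageLSN, following PageLastLSN pointers yields all log records of that page, newest first, without scanning the rest of the log.

```python
def page_history(log_by_lsn, page_lsn):
    """Return the LSNs of all log records of one page, newest first."""
    history = []
    lsn = page_lsn
    while lsn is not None:                 # nil marks the first record of the page
        history.append(lsn)
        lsn = log_by_lsn[lsn]["page_last_lsn"]
    return history


# Hypothetical log with records of two pages interleaved.
log_by_lsn = {
    1: {"page_id": "P1", "page_last_lsn": None},
    2: {"page_id": "P2", "page_last_lsn": None},
    3: {"page_id": "P1", "page_last_lsn": 1},
    4: {"page_id": "P2", "page_last_lsn": 2},
    5: {"page_id": "P1", "page_last_lsn": 3},
}

undo_order_p1 = page_history(log_by_lsn, 5)      # reverse order, as used by Undo
redo_order_p1 = list(reversed(undo_order_p1))    # logged order, as used by Redo
```

Because each page has its own chain, the histories of different pages can be processed independently of one another.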
4. Transaction Processing in the Presence of a Limited Retention Property
In this section, we discuss HDBSs’ limited retention principle in greater detail. In addition, we consider how well current data recovery algorithms, i.e. ARIES and C-ARIES, support such a property. HDBSs’ limited retention property means that personal information must only be retained as long as it is necessary for the fulfillment of the purposes for which it has been collected. Thus, data items must be deleted once the retention period has expired. However, merely deleting the respective data items from a data table is not sufficient in HDBSs. Instead, it is desired to erase the data items including any references to their prior existence. That is, data tables, checkpoint information, log files, backups, audit trails, etc. must be amended such that all references to the data items are removed [2]. However, this is a non-trivial task given that such amendments must not affect the HDBS’s ability to perform data recovery. To the best of our knowledge, this problem has not been addressed by any existing database system or transaction processing approach. In order to better understand the problem at hand, let us consider how this data deletion problem affects state-of-the-art data recovery approaches such as ARIES and C-ARIES (Table 2 summarizes the most relevant information). Note: For the purpose of this
paper, we restrict ourselves to the problem of deleting data items from data tables, checkpoint information and log files. Cleaning backups and audit trails is well beyond the scope of this paper and is only briefly addressed at the end of the paper as part of the discussion of future research issues.
Aspect | ARIES | C-ARIES
Persistent data items | Maintained in the body of a page with a single PageLSN field being added to the page’s header. | Same as ARIES.
Log records | Chained using PrevLSN and UndoNextLSN pointers; transaction-based chaining. | Chained using PageLastLSN and UndoneLSN pointers; page-based chaining.
Checkpoint data | Fuzzy checkpoints are supported, adding a snapshot of the dirty pages table as well as a snapshot of the active transaction table to the checkpoint data. | Fuzzy checkpoints are supported, adding the smallest (i.e. oldest) RecLSN value from the dirty pages table as well as a snapshot of the active transaction table to the checkpoint data.
Table 2: ARIES versus C-ARIES

From the point of view of ARIES-based recovery mechanisms, data items are maintained on pages. Each page is segmented into a header (containing a page index and other administrative information) and a body (storing the actual data records) [3]. While it is common that a single page contains multiple data records, it is also possible that one data record is distributed across multiple pages (using so-called tuple identifiers to link the individual parts [3]). Both ARIES and C-ARIES add an additional piece of information, the PageLSN, to each page’s header. This PageLSN refers to the LSN of the most recently written log record that has resulted in an update of a data item stored in the page’s body. Using PageLSN values, it can be determined easily which logged updates are reflected in the page. This is critical in particular while repeating history (as each update must be applied once and only once). However, PageLSNs do not help in determining which of the data items in a page’s body were modified, a necessary piece of information when trying to modify a page’s content. Nevertheless, the PageLSN provides access to the most recently written log record that was applied to the respective page. Log records are maintained reflecting a forward history of all updates to pages (including commit, abort, compensation and checkpoint information). By appropriate chaining of log records, a bounded amount of logging is ensured during rollbacks, even in the event of repeated failures during crash recovery. While both ARIES and C-ARIES assign a monotonically
increasing LSN to each log record, differences exist in the way individual log records are chained together. While ARIES uses PrevLSN pointers to chain log records per transaction, C-ARIES maintains:
• A chain of PrevLSN pointers for each transaction’s ULR records only (which assists transaction aborts during normal processing);
• A chain of PageLastLSN pointers for each page’s (ULR, CLR and SCR) log records (across transaction boundaries).
Recording these PageLastLSN pointers is what makes C-ARIES more suitable for supporting the limited retention property, as it provides an easy method for tracing all modifications made to a particular set of data (stored in the same page). ARIES’ PrevLSN-based chaining does not offer any means of efficiently tracing modifications made to a particular data item or page, as it was designed to support both redo and undo processing on a transaction basis only. Thus, ARIES would require a scan of the entire log in order to locate all the log records that describe modifications to a particular page. ARIES and C-ARIES both support fuzzy checkpoints, which avoid interrupting normal processing while checkpoint data is written to disk, in order to make crash recovery more efficient. While both approaches record data about the transactions that were active as well as the pages that were marked dirty (i.e. modified in main memory, but not yet made persistent) during the checkpoint execution, they differ slightly with regard to the data added to checkpoint log records. ARIES adds a copy of the Active Transaction Table (containing TransId, LastLSN, and Status for each active transaction) and the Dirty Pages Table (containing PageId and RecLSN values for each dirty page) to the checkpoint log record.
C-ARIES, however, adds the following information for each active transaction:
• TransId: An identifier of the active transaction;
• FirstLSN: The LSN of the first log record written for the transaction; and
• Status: Either ‘Active’ or ‘Commit’.
In addition, a DirtyLSN, which is the smallest (i.e. oldest) RecLSN value from the dirty pages table, is also added to the checkpoint log record. Considering our intention to remove log records from the log, such checkpoint log records have to be updated if log records referenced by LastLSN, FirstLSN, RecLSN or DirtyLSN values are affected. Comparing ARIES and C-ARIES, the latter will require significantly fewer updates to checkpoint log records, as the (rather large) dirty pages table is not stored, but only its smallest RecLSN value. Considering both approaches, ARIES’ lack of page-based chaining and, though less critical, its
inclusion of the entire dirty pages table in each checkpoint log record make it a less suitable solution for supporting HDBSs’ limited retention principle. C-ARIES, instead, is, to the best of our knowledge, the most suitable existing data recovery approach for this purpose. As such, we discuss the necessary changes to its data structures and routines in the next section.
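The difference in checkpoint contents can be made concrete with a small sketch. The two classes and the `referenced_lsns` method are hypothetical and only model the fields named above; the LSN values are invented for illustration. The sketch collects the LSNs that a checkpoint log record refers to, i.e. the references that may need attention when log records are erased.

```python
from dataclasses import dataclass


@dataclass
class AriesCheckpoint:
    active_transactions: dict   # TransId -> (LastLSN, Status)
    dirty_pages: dict           # PageId -> RecLSN

    def referenced_lsns(self):
        lsns = {last for last, _ in self.active_transactions.values()}
        return lsns | set(self.dirty_pages.values())


@dataclass
class CAriesCheckpoint:
    active_transactions: dict   # TransId -> (FirstLSN, Status)
    dirty_lsn: int              # smallest (oldest) RecLSN of the dirty pages table

    def referenced_lsns(self):
        firsts = {first for first, _ in self.active_transactions.values()}
        return firsts | {self.dirty_lsn}


# Hypothetical checkpoint: two active transactions, three dirty pages.
aries = AriesCheckpoint(
    active_transactions={2: (8, "Active"), 3: (9, "Active")},
    dirty_pages={"P1": 1, "P2": 5, "P3": 7},
)
c_aries = CAriesCheckpoint(
    active_transactions={2: (4, "Active"), 3: (5, "Active")},
    dirty_lsn=min(aries.dirty_pages.values()),
)
```

In this sketch the number of references in the ARIES record grows with the size of the dirty pages table, whereas the C-ARIES record carries a single DirtyLSN value regardless of how many pages are dirty.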
5. Augmenting C-ARIES to Support the Limited Retention Principle
In order to be able to amend information written during the execution of checkpoints, we need a chain of checkpoint log records. While such chaining is not part of the standard checkpoint procedure, it can be added easily and at minimal cost. The general fuzzy checkpoint procedure (for ARIES and C-ARIES) is as follows:
1. Append a Begin Checkpoint Log Record, which indicates the start of a new checkpoint’s execution, to the end of the log;
2. Construct an End Checkpoint Log Record and append it to the end of the log;
3. After the End Checkpoint Log Record has been written to stable storage, a special Master Record containing the LSN of the Begin Checkpoint Log Record is written to a known location on stable storage.
In case of a crash, recovery commences by locating the most recent Begin Checkpoint Log Record by reading out its LSN from the Master Record. Now, in order to chain checkpoint log records together, we simply amend this procedure as follows:
1. Append a Begin Checkpoint Log Record with an added PrevLSN value (i.e. the LSN of the previous checkpoint log record obtained by reading out the value of the Master Record on stable storage) to the end of the log;
2. Construct an End Checkpoint Log Record and append it to the end of the log;
3. After the End Checkpoint Log Record has been written to stable storage, a special Master Record containing the LSN of the Begin Checkpoint Log Record is written to a known location on stable storage.
The cost of this chaining of Begin Checkpoint Log Records consists of one additional disk access (in order to read out the value from the Master Record) and a few additional bytes of log space (i.e. those necessary to store a PrevLSN value) to accommodate the extended Begin Checkpoint Log Record. In order to ensure that the log retains its recoverability properties, it is necessary to introduce a number of new log record types to C-ARIES:
• Begin Erase Log Record: Marks the beginning of an Erase Record Ri from Page Pj operation. It contains LSN, PrevLSN := nil, TransId := nil, Type := ‘Begin Erase Log Record’, PageId := Pj, PageLastLSN := Pj.PageLSN, and RecId := Ri values. This log record is appended to the end of the log with every Erase Record operation, followed by a flush of the log tail to stable storage. This is necessary to retain C-ARIES’ ability to perform crash recovery; in particular, it ensures that during the Redo pass, an incomplete Erase Record operation (i.e. one that has no corresponding End Erase Log Record) is redone.
• End Erase Log Record: Marks the end of an Erase Record Ri from Page Pj operation. It contains LSN, PrevLSN := LSN of the corresponding Begin Erase Log Record, TransId := nil, Type := ‘End Erase Log Record’, PageId := Pj, PageLastLSN := Pj.PageLSN, and RecId := Ri values. This log record is appended to the end of the log with every Erase Record operation, followed by a flush of all modified log pages as well as the log tail to stable storage. Subsequently, the corresponding data record can be removed from the page and the page’s PageLSN value is set to the LSN value of the End Erase Log Record.
• Deleted Log Record (DLR): Identifies a previous ULR, CLR or SCR log record that has been erased due to the limited retention property. A DLR log record only contains LSN, PrevLSN, TransId, Type, PageId and RecId values. Except for the Type value (changed to ‘Deleted Log Record’), all other named values are retained from the previous ULR, CLR or SCR; in turn, the remaining log record fields (such as the actual data fields) must be deleted.
Having defined several new log record types, we next have to specify the actual Erase Record routine as well as the necessary changes to existing C-ARIES mechanisms in the event that they encounter one of the new log record types. First, let us consider the Erase Record Ri from Page Pj routine:
1. Load page Pj into main memory (if necessary) and read out its PageLSN value.
2. Append a Begin Erase Log Record to the end of the log tail and flush the log to stable storage.
3. Let lr be the log record with LSN value = Pj.PageLSN.
4. While lr identifies a valid log record do
   a. If lr is a ULR, CLR or SCR log record with lr.RecID = Ri then convert lr to a Deleted Log Record as indicated above.
   b. Follow the page’s PageLastLSN chain backwards by setting lr := lr’ with lr’.lsn = lr.PageLastLSN.
5. Append an End Erase Log Record to the end of the log tail and flush all modified log pages as well as the log tail to stable storage.
6. Delete the actual record Ri from page Pj and set Pj’s PageLSN value to the LSN value of the End Erase Log Record.
Step 4 follows the page’s log record chain (starting from the most recently written log record and moving back in time via PageLastLSN pointers, see step 4.b) and converts all log records that refer to modifications (i.e. ULR, CLR or SCR) of record Ri (see step 4.a). Step 4 terminates when a PageLastLSN value = nil is found. It remains to discuss the necessary changes to C-ARIES’ Analysis, Redo and Undo procedures. They must be amended in a way that they can cope with the new log record types; it should be noted that the actual chaining of log records has not been modified. The corresponding changes are as follows:
• Analysis pass: C-ARIES’ analysis phase comprises three steps: Initialisation, Data Collection and Completion. Only the data collection step is affected by the above modifications. During data collection, if a Begin Erase Log Record or End Erase Log Record is encountered, the same steps as for ULRs are performed. This mainly means that corresponding Page Link (PLink) list entries are added. This ensures that those log records are also revisited during the Redo phase, which navigates forwards through the log for each page and ensures that the logged actions have been reflected in the pages on stable storage. Note: The only difference between the original C-ARIES and the augmented C-ARIES versions is that Begin and End Erase Log Records are added to the respective PLink lists. Deleted Log Records may be skipped during data collection.
Such log records are exactly those that were previously included as ULR, CLR or SCR log records; the only difference is that the descriptive part of their respective modifications has been deleted. As the actual Erase Record routine is described by separate Begin and End Erase Log Records, individual Deleted Log Records need not be revisited during the Redo pass unless the Erase Record routine was not completed prior to the crash. If such a situation arises, the Redo phase will detect this and invoke the missing steps of the interrupted Erase Record routine.
• Redo pass: C-ARIES’ redo phase remains largely unchanged. Instead of only considering
SCR, CLR and ULR records as redo-able log records, the augmented version of C-ARIES now also adds Begin Erase Log Records and End Erase Log Records to that list of redo-able actions. If an End Erase Log Record is encountered, reapply the described action if necessary (i.e. if the corresponding delete action has not been made persistent prior to a crash, reapply step 6 of the Erase Record routine). If a Begin Erase Log Record is encountered, the corresponding Erase Record Ri from Page Pj routine has to be reapplied entirely, but only if there is no corresponding End Erase Log Record with a PrevLSN value pointing to the encountered Begin Erase Log Record. Otherwise, no additional actions need to be taken.
• Undo pass: No changes are necessary, as erasing a log record can never be undone. If a Deleted Log Record is encountered, it can be ignored (i.e. just follow its PageLastLSN value). Undo removes actions performed by loser transactions (i.e. transactions that were active at the time of a crash). A regular transaction, however, can never create a Deleted Log Record. Thus, if one is encountered, it must have been issued by a system routine (an Erase Record Ri from Page Pj routine), which has already been reapplied (if and as necessary) during the Redo phase as described previously.
At the end of C-ARIES crash recovery with added limited retention support, the database and log are returned to the most recent consistent state. C-ARIES also preserves ARIES’ support for media recovery. That is, the database and log can be returned to their most recent consistent state after a portion of (or the entire) database on persistent storage has been lost or damaged. In order to do so, crash recovery commences from a backup of the database (or a portion of it) and the Begin Checkpoint Log Record corresponding to the checkpoint that was considered most recent at the time the backup was taken.
While this procedure is much more time consuming, it follows the three phases of C-ARIES’ crash recovery. The limited retention modifications have retained this capability. It should be noted, however, that HDBSs also desire to erase records from the backups themselves. As backups are usually not online, such a requirement cannot be fully resolved in an automated fashion and is beyond the scope of this paper. Finally, we would like to briefly discuss a desired optimization of the data recovery procedure with limited retention support. The previously discussed approach follows PageLastLSN pointers all the way back to the oldest log record that describes the first
update to (i.e. the creation of) a page. However, the record to be erased might have been added to the page only much later in time. Thus, a lot of time is wasted examining log records that are not related to the data record Ri that is meant to be deleted. As an optimization, we suggest adding a flag that marks the creation of a page’s new data record to ULR log records; this can easily be done during normal processing. If such a flag is added, step 4 of the Erase Record Ri from Page Pj routine can terminate much earlier, namely once the creation entry has been erased (i.e. once the following condition is true: lr is a ULR log record with lr.RecID = Ri and lr.creationFlag = true). The cost of this optimization is a single byte added to all ULR log records that describe the first ever update of a data record on the particular page.

LSN | PrevLSN | TransId | Type | Additional Relevant Log Record Information
1 | Nil | 1 | ULR | PageId = P1; PageLastLSN = nil; RecId = R1; before & after-images
2 | 1 | 1 | Commit | -
3 | 2 | 1 | End | -
4 | Nil | 2 | ULR | PageId = P1; PageLastLSN = 1; RecId = R2; before & after-images
5 | Nil | 3 | ULR | PageId = P2; PageLastLSN = nil; RecId = R3; before & after-images
6 | Nil | - | Begin Checkpoint | -
7 | - | - | End Checkpoint | Transaction table copy with T2 and T3 = ‘Active’; DirtyLSN = 1
8 | 4 | 2 | ULR | PageId = P2; PageLastLSN = 5; RecId = R3; before & after-images
9 | 5 | 3 | ULR | PageId = P1; PageLastLSN = 4; RecId = R2; before & after-images
10 | 8 | 2 | Abort | -
11 | - | 2 | CLR | PageId = P2; PageLastLSN = 8; UndoNextLSN = 4; RecId = R3; after-image
12 | - | 2 | CLR | PageId = P1; PageLastLSN = 9; UndoNextLSN = nil; RecId = R2; after-image
13 | 10 | 2 | End | -
14 | 6 | - | Begin Checkpoint | -
15 | - | - | End Checkpoint | Transaction table copy with T3 = ‘Active’; DirtyLSN = 5
16 | 9 | 3 | ULR | PageId = P2; PageLastLSN = 11; RecId = R3; before & after-images
17 | 16 | 3 | ULR | PageId = P2; PageLastLSN = 16; RecId = R4; before & after-images
18 | 17 | 3 | Commit | -
19 | 18 | 3 | End | -
Assume P1 and P2 were flushed to disk after the first checkpoint (LSN 6); the Master Record value is 14; and PageLSN values on persistent storage are as follows: PageLSN (P1) = 4; PageLSN (P2) = 5.
Table 3: Log Example 1 (before the sample Erase Record routine is invoked)
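The Erase Record Ri from Page Pj routine of this section, including the optional creation-flag optimization, can be sketched as follows. This is a minimal in-memory sketch, not the authors' implementation: the `LogRecord` and `Page` classes, the `erase_record` function and all field names are hypothetical, appending to a dictionary stands in for "append and flush to stable storage", and two choices are assumptions made for the sketch: the PageLastLSN pointer is kept on converted records so that the page chain can still be followed, and the End Erase Log Record points back to the Begin Erase Log Record to keep the page chain connected. The sample data reproduces the P2 chain of Table 3.

```python
from dataclasses import dataclass, field
from typing import Optional

PAYLOAD_TYPES = {"ULR", "CLR", "SCR"}


@dataclass
class LogRecord:
    lsn: int
    type: str
    page_id: Optional[str] = None
    page_last_lsn: Optional[int] = None
    rec_id: Optional[str] = None
    prev_lsn: Optional[int] = None
    payload: Optional[str] = None      # images, UndoNextLSN, ... (erasable part)
    creation_flag: bool = False        # optional optimization flag on ULRs


@dataclass
class Page:
    page_id: str
    page_lsn: Optional[int]
    records: dict = field(default_factory=dict)


def erase_record(log, page, rec_id):
    """Erase rec_id from page and scrub its log records; returns converted LSNs."""
    def append(rec_type, prev_lsn, page_last_lsn):
        rec = LogRecord(max(log) + 1, rec_type, page.page_id,
                        page_last_lsn, rec_id, prev_lsn)
        log[rec.lsn] = rec             # stands in for "append and flush"
        return rec

    begin = append("Begin Erase", None, page.page_lsn)           # step 2
    converted = []
    lsn = page.page_lsn                                          # step 3
    while lsn is not None:                                       # step 4
        lr = log[lsn]
        if lr.type in PAYLOAD_TYPES and lr.rec_id == rec_id:     # step 4.a
            created_here = lr.type == "ULR" and lr.creation_flag
            lr.type, lr.payload = "DLR", None   # identifying fields are retained
            converted.append(lr.lsn)
            if created_here:           # optimization: creation entry reached
                break
        lsn = lr.page_last_lsn                                   # step 4.b
    end = append("End Erase", begin.lsn, begin.lsn)              # step 5 (assumed chaining)
    page.records.pop(rec_id, None)                               # step 6
    page.page_lsn = end.lsn
    return converted


# P2 chain of Table 3; LSNs 18 and 19 are included so that new LSNs start at 20.
log = {
    5: LogRecord(5, "ULR", "P2", None, "R3", payload="images"),
    8: LogRecord(8, "ULR", "P2", 5, "R3", payload="images"),
    11: LogRecord(11, "CLR", "P2", 8, "R3", payload="after-image"),
    16: LogRecord(16, "ULR", "P2", 11, "R3", payload="images"),
    17: LogRecord(17, "ULR", "P2", 16, "R4", payload="images"),
    18: LogRecord(18, "Commit"),
    19: LogRecord(19, "End"),
}
p2 = Page("P2", page_lsn=17, records={"R3": "x", "R4": "y"})
converted = erase_record(log, p2, "R3")
```

With the creation flag left unset, as here, the loop walks the whole page chain; if the record at LSN 5 carried the flag, the walk would stop there without inspecting any older records of the page.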
5.1. Examples

To demonstrate the principles of the Erase Record routine in conjunction with the augmented C-ARIES algorithm, we consider an example next. Assume that we have a transaction log, master log record and persistent page information as detailed in Table 3. It is indicated that three transactions have been executed: T1 and T3 succeeded but T2 was aborted during normal processing. In addition, two checkpoints have been performed. Let us consider how a subsequent Erase Record R3 from Page P2 routine would be executed:

1. Page P2 still resides in main memory, its PageLSN value is 17.
2. Append a Begin Erase Log Record to the end of the log with PageId = P2, PageLastLSN = 17 and RecId = R3. The PageLastLSN value is set to P2’s current PageLSN value.
3. lr := log record with LSN 17
4. In this loop, we follow P2’s PageLSN chain backwards through the log and erase all log records that describe modifications of R3 on P2.
   a. LSN 17: RecId is not R3, so, no action needs to be taken, continue looping.
   b. LSN 16: RecId = R3, so, erase the log record’s content-related information.
   c. LSN 11: RecId = R3, so, erase the log record’s content-related information.
   d. LSN 8: RecId = R3, so, erase the log record’s content-related information.
   e. LSN 5: RecId = R3, so, erase the log record’s content-related information.
   f. Now, the loop terminates.
5. Append an End Erase Log Record to the end of the log with PrevLSN = 20, PageId = P2, PageLastLSN = 20 and RecId = R3. Flush all modified log pages as well as the log tail.
6. Now, the actual data record can be deleted from Page P2.

Table 4 shows a summary of the log (from Table 3) after the routine has been executed: the log records with LSNs 5, 8, 11 and 16 have been converted to Delete Log Records, and the Begin Erase and End Erase Log Records have been appended with LSNs 20 and 21.

| LSN | PrevLSN | TransId | Type | Additional Relevant Log Record Information |
|-----|---------|---------|------|--------------------------------------------|
| 1 | Nil | 1 | ULR | PageId = P1; PageLastLSN = nil; RecId = R1; before & after-images |
| 2 | 1 | 1 | Commit | |
| 3 | 2 | 1 | End | |
| 4 | Nil | 2 | ULR | PageId = P1; PageLastLSN = 1; RecId = R2; before & after-images |
| 5 | Nil | 3 | Delete | PageId = P2; PageLastLSN = nil; RecId = R3; before & after-images |
| 6 | Nil | - | Begin Checkpoint | |
| 7 | - | - | End Checkpoint | Transaction table snapshot with T2 and T3 marked as ‘Active’; DirtyLSN = 1 |
| 8 | 4 | 2 | Delete | PageId = P2; PageLastLSN = 5; RecId = R3; before & after-images |
| 9 | 5 | 3 | ULR | PageId = P1; PageLastLSN = 4; RecId = R2; before & after-images |
| 10 | 8 | 2 | Abort | |
| 11 | - | 2 | Delete | PageId = P2; PageLastLSN = 8; UndoNextLSN = 4; RecId = R3; after-image |
| 12 | - | 2 | CLR | PageId = P1; PageLastLSN = 9; UndoNextLSN = nil; RecId = R2; after-image |
| 13 | 10 | 2 | End | |
| 14 | 6 | - | Begin Checkpoint | |
| 15 | - | - | End Checkpoint | Transaction table snapshot with T3 marked as ‘Active’; DirtyLSN = 5 |
| 16 | 9 | 3 | Delete | PageId = P2; PageLastLSN = 11; RecId = R3; before & after-images |
| 17 | 16 | 3 | ULR | PageId = P2; PageLastLSN = 16; RecId = R4; before & after-images |
| 18 | 17 | 3 | Commit | |
| 19 | 18 | 3 | End | |
| 20 | Nil | - | Begin Erase | PageId = P2; PageLastLSN = 17; RecId = R3 |
| 21 | 20 | - | End Erase | PageId = P2; PageLastLSN = 20; RecId = R3 |

Table 4: Log Example 2 (after the sample Erase Record routine is invoked)

As a second example, let us now assume that a crash occurs right after step 5 of the Erase Record routine has been executed. That is, the log on persistent storage corresponds exactly to the one depicted in Table 4, but the actual data record has never been deleted from page P2 (as step 6 of the Erase Record routine was not executed). In fact, the version of page P2 on persistent storage was last modified by the update described by the log record with LSN 5 (the one depicted in Table 3, not Table 4). Upon restart, the augmented C-ARIES crash recovery procedure must detect and resolve this as well as other inconsistencies.
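The crash point assumed in this second example (after step 5 but before step 6) can be made concrete with a small sketch of the Erase Record routine. It is only an illustration under stated assumptions: LogRecord, p2_log_from_table3, erase_record and the crash_before_step_6 switch are hypothetical names, the log is reduced to page P2's records, and flushing is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class LogRecord:
    type: str
    page_last_lsn: Optional[int]
    rec_id: str
    images: Optional[str] = None   # placeholder for the before/after images
    prev_lsn: Optional[int] = None


def p2_log_from_table3() -> Dict[int, LogRecord]:
    """Only the log records of Table 3 that belong to page P2, keyed by LSN."""
    return {
        5: LogRecord("ULR", None, "R3", "before & after"),
        8: LogRecord("ULR", 5, "R3", "before & after"),
        11: LogRecord("CLR", 8, "R3", "after"),
        16: LogRecord("ULR", 11, "R3", "before & after"),
        17: LogRecord("ULR", 16, "R4", "before & after"),
    }


def erase_record(log: Dict[int, LogRecord], page: Set[str], page_lsn: int,
                 rec_id: str, next_lsn: int,
                 crash_before_step_6: bool = False) -> None:
    """Steps 2 to 6 of the routine; the crash switch is a hypothetical test hook."""
    begin_lsn = next_lsn                                   # step 2: Begin Erase
    log[begin_lsn] = LogRecord("Begin Erase", page_lsn, rec_id)
    lsn: Optional[int] = page_lsn                          # step 3: start at PageLSN
    while lsn is not None:                                 # step 4: walk the page chain
        lr = log[lsn]
        if lr.rec_id == rec_id:
            lr.type, lr.images = "Delete", None            # erase content-related data
        lsn = lr.page_last_lsn
    log[begin_lsn + 1] = LogRecord("End Erase", begin_lsn, rec_id,
                                   prev_lsn=begin_lsn)     # step 5: End Erase
    if crash_before_step_6:
        return                                             # page is left untouched
    page.discard(rec_id)                                   # step 6: delete the record


log = p2_log_from_table3()
page_p2 = {"R3", "R4"}
erase_record(log, page_p2, page_lsn=17, rec_id="R3", next_lsn=20,
             crash_before_step_6=True)
print(sorted(lsn for lsn, lr in log.items() if lr.type == "Delete"))  # [5, 8, 11, 16]
print("R3" in page_p2)  # True
```

With the crash switch set, the sketch ends in the state described above: the log matches Table 4 for page P2, while R3 is still present on the page.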
Analysis commences from the most recent checkpoint (i.e. LSN 14). The Initialisation step examines the data contained in the end checkpoint log record: transaction T3 is marked as active and the DirtyLSN is set to LSN 5. Next, the Data Collection step commences and scans the log in forward direction starting from LSN 5 (which is the smaller value among transaction T3’s FirstLSN value and DirtyLSN). This step considers log records 5 (skip), 6 (skip), 7 (skip), 8 (skip), 9 (process), 10 (skip), 11 (skip), 12 (process), 13 (skip), 14 (skip), 15 (skip), 16 (skip), 17 (process), 18 (process), 19 (process), 20 (process), and 21 (process). Here, ‘process’ means that the respective data structures are being populated; those log records will be revisited during the Redo pass and reapplied if and as necessary. At the end of Data Collection, the transaction table is empty (i.e. there is no active transaction), and pages P1 and P2 are in the dirty pages table and thus locked exclusively; page P1 has LSNs 9 and 12 in its list of LSNs to revisit during the Redo pass, while page P2 has to revisit LSNs 17, 20 and 21. All remaining pages may be returned to normal processing.

Redo commences on a page-by-page basis. For each dirty page (i.e. P1 and P2), Redo starts from the smallest LSN identified during the Data Collection step that is larger than the respective page’s PageLSN value; that way, no redo-able action is applied more than once. As both PageLSN values for P1 and P2 are smaller than the LSNs identified during Analysis, all previously identified LSNs must be revisited. For P1, the update described by LSN 9 is reapplied as well as the compensation described by LSN 12. Subsequently, P1 is returned to normal processing as there is no transaction for which an undo operation must be executed. P2 is processed concurrently. For P2, the update described by LSN 17 is reapplied first. Next, LSN 20 is considered. As there is a corresponding End Erase Log Record, the Erase Record R3 from Page P2 routine does not need to be reapplied. However, as an End Erase Log Record is being revisited, the page’s persistent version may still contain the record R3 that is to be erased. As such, only step 6 of the Erase Record routine is re-executed. This ensures that every logged redo-able action has been executed at least once (but also no more than once). Subsequently, page P2 may also be released for normal processing and Crash Recovery terminates. The database has been returned to a consistent state.

| LSN | PrevLSN | TransId | Type | Additional Relevant Log Record Information |
|-----|---------|---------|------|--------------------------------------------|
| … | | | | LSNs 1 to 15 are identical to those from Table 3 |
| 16 | 9 | 3 | Delete | PageId = P2; PageLastLSN = 11; RecId = R3; before & after-images |
| 17 | 16 | 3 | ULR | PageId = P2; PageLastLSN = 16; RecId = R4; before & after-images |
| 18 | 17 | 3 | Commit | |
| 19 | 18 | 3 | End | |
| 20 | Nil | - | Begin Erase | PageId = P2; PageLastLSN = 17; RecId = R3 |
| 21 | Nil | 4 | ULR | PageId = P1; PageLastLSN = 12; RecId = R2; before & after-images |
| 22 | 21 | 4 | Commit | |

Assume P1 and P2 were flushed to disk after the first checkpoint (LSN 6); the Master Record value is 14; and the PageLSN values on persistent storage are PageLSN (P1) = 4 and PageLSN (P2) = 5.

Table 5: Log Example 3 (inconsistent log snapshot before crash recovery)

As a third example, we consider a crash that occurs during the execution of an Erase Record routine, before the End Erase Log Record has been written. The log snapshot depicted in Table 5 describes such a situation, in which an Erase Record R3 from Page P2 routine created the log record with LSN 20 and converted the previous ULR log record with LSN 16 to a Delete Log Record. Concurrently, transaction T4 updated a record on page P1 and committed. After commit, the depicted log tail was flushed just before the crash occurred. Again, the augmented C-ARIES algorithm must detect and resolve the corresponding inconsistencies. Analysis commences from LSN 14. Initialisation is identical to the second example. Data Collection scans the log in forward direction. This step considers log records 5 (process), 6 (skip), 7 (skip), 8 (process), 9 (process), 10 (skip), 11 (process), 12 (process), 13 (skip), 14 (skip), 15 (skip), 16 (skip), 17 (process), 18 (process), 19 (process), 20 (process), 21 (process), and
22 (process). At the end of Data Collection, the transaction table contains transaction T4 marked as committed, and pages P1 and P2 are in the dirty pages table and thus locked exclusively; page P1 has LSNs 9, 12 and 21 in its PLink list, while page P2 has to revisit LSNs 5, 8, 11, 17 and 20 during Redo.

Redo for page P1 is almost identical to the second example; the only difference is the additional ULR log record with LSN 21 that has to be reapplied. For page P2, the PageLSN value is equal to that of the first entry in its PLink list. Thus, Redo commences with LSN 8, followed by 11, 17 and 20. When the log record with LSN 20 is considered, no corresponding End Erase Log Record is found. Hence, the captured Erase Record R3 from Page P2 routine must be reapplied. During this process, it is determined that LSN 16 had already been erased, but not LSNs 11, 8 and 5. Once these log records have been erased, an End Erase Log Record is appended to the log tail, all modified log pages and the log tail are flushed, and the record is deleted from page P2. Subsequently, an End log record for transaction T4 is written and the corresponding transaction table entry is removed. Lastly, Crash Recovery terminates with the log (depicted in Table 6) and the database being returned to a consistent state.

| LSN | PrevLSN | TransId | Type | Additional Relevant Log Record Information |
|-----|---------|---------|------|--------------------------------------------|
| … | | | | LSNs 1 to 15 are identical to those from Table 4 |
| 16 | 9 | 3 | Delete | PageId = P2; PageLastLSN = 11; RecId = R3; before & after-images |
| 17 | 16 | 3 | ULR | PageId = P2; PageLastLSN = 16; RecId = R4; before & after-images |
| 18 | 17 | 3 | Commit | |
| 19 | 18 | 3 | End | |
| 20 | Nil | - | Begin Erase | PageId = P2; PageLastLSN = 17; RecId = R3 |
| 21 | Nil | 4 | ULR | PageId = P1; PageLastLSN = 12; RecId = R2; before & after-images |
| 22 | 21 | 4 | Commit | |
| 23 | 20 | - | End Erase | PageId = P2; PageLastLSN = 20; RecId = R3 |
| 24 | 22 | 4 | End | |

Table 6: Log Example 4 (consistent log snapshot after crash recovery on Table 5)
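The decision that Redo takes when it revisits a Begin Erase Log Record, as in the second and third examples, might be sketched as follows. This is an illustration only: plan_erase_redo and the dictionary layout are hypothetical, the logs are reduced to page P2's records, and the matching of End Erase to Begin Erase via PrevLSN is inferred from Tables 4 and 6.

```python
from typing import Dict, List, Optional, Tuple

Log = Dict[int, dict]


def plan_erase_redo(log: Log, begin_lsn: int) -> Tuple[str, List[int]]:
    """Return ('delete-only', []) or ('rerun', LSNs whose content is still to be erased)."""
    begin = log[begin_lsn]
    for rec in log.values():
        # An End Erase record points back to its Begin Erase record via PrevLSN.
        if rec["type"] == "End Erase" and rec.get("prev_lsn") == begin_lsn:
            return "delete-only", []          # only step 6 must be re-executed
    pending: List[int] = []
    lsn: Optional[int] = begin["page_last_lsn"]
    while lsn is not None:                    # rerun step 4 over the page chain
        rec = log[lsn]
        if rec["rec_id"] == begin["rec_id"] and rec["type"] != "Delete":
            pending.append(lsn)               # not yet converted to a Delete record
        lsn = rec["page_last_lsn"]
    return "rerun", pending


def p2_log(erased: List[int]) -> Log:
    """Page P2's records; LSNs listed in `erased` are already Delete records."""
    chain = {5: (None, "R3", "ULR"), 8: (5, "R3", "ULR"), 11: (8, "R3", "CLR"),
             16: (11, "R3", "ULR"), 17: (16, "R4", "ULR")}
    log = {lsn: {"type": "Delete" if lsn in erased else typ,
                 "page_last_lsn": pll, "rec_id": rid}
           for lsn, (pll, rid, typ) in chain.items()}
    log[20] = {"type": "Begin Erase", "page_last_lsn": 17, "rec_id": "R3"}
    return log


table5 = p2_log(erased=[16])                  # third example: no End Erase record
table4 = p2_log(erased=[5, 8, 11, 16])        # second example: End Erase at LSN 21
table4[21] = {"type": "End Erase", "page_last_lsn": 20, "rec_id": "R3", "prev_lsn": 20}

print(plan_erase_redo(table5, 20))  # ('rerun', [11, 8, 5])
print(plan_erase_redo(table4, 20))  # ('delete-only', [])
```

For the Table 5 snapshot the sketch reports LSNs 11, 8 and 5 as still to be erased, in line with the third example; for the Table 4 snapshot only the deletion of the data record remains.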
6. Conclusion

In this paper, we have discussed how selected challenges introduced by the desirable limited retention property of HDBSs can be addressed. In particular, we considered how data records can not only be deleted from DB pages but how their corresponding log records can also be erased from the log (without violating data recoverability). To the best of our knowledge, there exists no transaction processing or recovery approach that supports the deletion of data from log entries. To this end, we first examined the popular ARIES recovery algorithm and one of its incarnations, C-ARIES, which is designed for highly concurrent systems. We concluded that the C-ARIES algorithm is most suitable for the purpose at hand, mainly due to its different approach to linking log records. C-ARIES’ page-oriented chaining of log records provides a natural way of tracing particular modifications applied to a page and removing those data items’ associated log records. After identifying C-ARIES’ suitability, we discussed the necessary changes to its core data structures, log records and crash recovery routines.

The results presented in this paper can be considered a first step towards a fully-fledged data recovery solution for HDBSs. Besides deleting data records from the database and their corresponding log entries, HDBSs also require such information to be erased from audit trails and backups, which was beyond the scope of this paper. In addition, there are related issues that need to be addressed; for instance, how do we continue to support historical analysis and statistical queries without incurring privacy breaches? Agrawal et al. suggest that it may be sufficient to limit queries as proposed in the statistical database literature, e.g. [1], but this remains to be investigated.
7. References

[1] Adam, N. R., and Wortman, J. C., “Security-control Methods for Statistical Databases”, ACM Computing Surveys, 21(4), 1989, pp. 515-556.
[2] Agrawal, R., Kiernan, J., Srikant, R., and Xu, Y., “Hippocratic Databases”, Proc. 28th Int’l Conf. on Very Large Data Bases, Morgan Kaufmann, 2002, pp. 143-154.
[3] Gray, J., and Reuter, A., “Transaction Processing: Concepts and Techniques”, Morgan Kaufmann, 1993.
[4] Kirchberg, M., “Using Abstract State Machines to Model ARIES-based Transaction Processing”, Journal of Universal Computer Science, 15(1), 2009, pp. 157-194.
[5] Mohan, C., Haderle, D. J., Lindsay, B. G., Pirahesh, H., and Schwarz, P. M., “ARIES: A Transaction Recovery Method Supporting Fine-granularity Locking and Partial Rollbacks Using Write-ahead Logging”, ACM Transactions on Database Systems, 17, 1992, pp. 94-162.
[6] Mohan, C., “Repeating History beyond ARIES”, Proc. 25th Int’l Conf. on Very Large Data Bases, Morgan Kaufmann, 1999, pp. 1-17.
[7] Speer, J., and Kirchberg, M., “C-ARIES: A Multithreaded Version of the ARIES Recovery Algorithm”, Proc. 18th Int’l Conf. on Database and Expert Systems Applications, LNCS 4653, Springer, 2007, pp. 319-328.