An Implementation of Two Phase Locking (2PL) and Optimistic Concurrency Control (OCC) on Google App Engine
Puneet Lakhina
University of California, Santa Barbara
[email protected]
Neer Shay
University of California, Santa Barbara
[email protected]
ABSTRACT
Concurrency control in databases is implemented mainly through two strategies: locking and optimistic concurrency control. In this paper we compare implementations of these two strategies and analyze their relative performance. The vehicle for this analysis is a transactional datastore built on Google App Engine using the Google Datastore API. We have implemented the Strong Strict Two Phase Locking (SS2PL) variant of locking and the Parallel Validation variant of Optimistic Concurrency Control. In addition to studying the two concurrency control protocols, we highlight the nuances of implementing them on Google App Engine and of using the "new age" key-value App Engine Datastore API as our backend storage.
Categories and Subject Descriptors
H.4 [Database Management Systems]: Systems - Concurrency, Transaction Processing
Keywords
Optimistic Concurrency Control, Two Phase Locking, Google App Engine, Datastore
1.
INTRODUCTION
The purpose of this project was to implement a serializable transactional read/write data store using the Google App Engine API. There are thus two aspects to this project: a comparative implementation of two concurrency control strategies, Two Phase Locking and Optimistic Concurrency Control (OCC), and the use of Google App Engine [2], especially its key-value Datastore API. Key-value datastores are the "new age" data stores: they are more scalable than a traditional database, at the cost of some of the guarantees of a traditional relational database. Other notable key-value datastores are Dynamo [8] and PNUTS [7].
1.1
Concurrency Control
Database concurrency control is a mechanism that allows operations from different transactions to be interleaved while still appearing, from the user's point of view, to have executed serially. The two main strategies employed for concurrency control are Optimistic Concurrency Control [10] and locking. Locking protocols require transactions to acquire locks on the data items they wish to operate on before proceeding with execution. This prevents conflicting operations (operations on the same data item from two different transactions, at least one of which is a write) from accessing the data item at the same time. In contrast, Optimistic Concurrency Control relies on the assumption that most transactions do not have conflicting operations. Transactions in OCC therefore proceed concurrently and are validated for serializability only when they commit. If a non-serializable transaction is detected, it is aborted or retried.
Among the locking protocols, two phase locking is the one used in most commercial database systems. We have implemented the Strong Strict Two Phase Locking (SS2PL) variation of the two phase locking protocol. This protocol mandates that transactions release the locks they hold only after completion, i.e. after a commit or an abort. OCC has two variations, Serial Validation and Parallel Validation, which differ in whether two or more transactions can be validated simultaneously. Serial validation does not allow two transactions to validate at the same time, whereas parallel validation permits this, with an extra restriction on conflicting operations between the validating transactions. Our implementation is based on the parallel validation protocol of OCC. Section 2 further details the implementation of these two strategies, including the data structures used for locking and data storage.
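The notion of conflicting operations used by both protocols can be made concrete with a short sketch. The `Op` record and both function names are hypothetical and are not taken from our implementation; the sketch only encodes the definition of a conflict and the rule that read locks are the only locks that can be shared.

```python
from collections import namedtuple

# Hypothetical operation record: transaction id, kind ("read" or "write"), data item name.
Op = namedtuple("Op", ["txn", "kind", "item"])


def conflicts(a, b):
    """Two operations conflict when they belong to different transactions,
    touch the same data item, and at least one of them is a write."""
    return (a.txn != b.txn
            and a.item == b.item
            and "write" in (a.kind, b.kind))


def lock_compatible(held_mode, requested_mode):
    """A requested lock can share an item with a held lock only if both are read locks."""
    return held_mode == "read" and requested_mode == "read"
```

Under locking, a request that is not compatible with a held lock has to wait; under OCC, the same conflict test is applied to read and write sets at validation time.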
1.2
Google App Engine
Google App Engine (GAE) is a Platform as a Service offering from Google that allows users to run web applications on Google's infrastructure in a sandbox environment. The primary selling point of App Engine is its built-in scalability, which is provided by a platform built on tested scalable technologies such as BigTable [6] and Google File System [9]. To ensure application scalability the platform enforces some restrictions, which we discuss in the next section. The platform consists of a Python runtime and a variety of service APIs, such as the Datastore API for persistent storage and the Memcache API for caching. The Datastore is a distributed data storage service that supports querying and transactions. It allows the application to store data without having direct access to any disk.
In Section 2 we discuss the implementation details of the two strategies as well as the Google App Engine APIs and features used in the implementation. Section 3 describes the performance testing of the two implementations.
2.
IMPLEMENTATION DETAILS
Our concurrency implementation is a layer that sits on top of the datastore and regulates access to it. The Datastore API handles concurrency itself using an OCC protocol, but we have not used its transactional services to implement concurrency, except in some cases at the lowest level of granularity, such as incrementing counter values. All data structures and the data store itself are modeled as entities in the Datastore API. Fig 1 gives a high-level view of the architecture of the implementation. HTTP request handlers talk to the scheduler to perform an operation. When the scheduling strategy is SS2PL, the scheduler asks the lock manager for a lock on the data item before performing the operation. Under OCC, the scheduler performs the operation on local storage and validates only when a commit operation arrives.
Figure 1: Concurrent Data Store Implementation
2.1
SS2PL Implementation
2.1.1
Data Model
The entities utilized for the SS2PL implementation are:
• Transactions: A transaction entity holds all the properties associated with a transaction. This includes the list of operations and the start and end time of the transaction.
• Locks: The Locks entity holds the lock status of all data items. Each lock object has a data item, a lock mode (read or write) and the list of transactions holding that lock. Only read locks can be shared between transactions.
• Operations: An operation entity holds all the attributes associated with an operation. This includes the operation type (read, write, commit or abort), the data item on which the operation is to be performed, and the associated transaction. To aid scheduling, the operation entity also holds a status flag that signifies whether the operation is waiting to acquire a lock.
• DataItems: A data item entity holds a key-value pair: the name of a data item and its value. The set of data items and their values is the current state of the data store.
2.1.2
Operation Execution
The operation execution algorithm below details the steps involved in executing an operation in the SS2PL setup. An interesting difference between SS2PL and OCC is the handling of an abort operation. In SS2PL, since every operation for which a lock is acquired is performed on the persistent database, an abort needs to clean up the values written by the aborting transaction. This is implemented by performing an inverse write for each affected data item. The value used for the inverse operation is the value that was in the persistent store before the write operation was performed. Thus every write operation involves an implicit read.
if operation is not abort then
  if lock(operation) then
    performop
    if operation is commit then
      releaseLocks
      processWaitQueue
    end if
  else
    op.iswaiting = True
  end if
else
  for all writeoperations do
    performInverseOp
  end for
  performcommit
end if
The wait queue is processed by finding all waiting operations and processing them in order of submission. Processing an operation from the wait queue is similar to processing any new operation, with the added constraint that no operation is performed for a transaction that has earlier operations still waiting. This maintains the order of operations within a single transaction.
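The steps above can be illustrated with a minimal in-memory sketch. It is not the App Engine implementation: the class `SS2PLStore`, its dictionaries and the `log` list are hypothetical stand-ins for the Datastore entities, commit is assumed to need no lock of its own, and locks are assumed to be released after the inverse writes of an abort.

```python
_MISSING = object()  # marks "the item did not exist before the write"


class SS2PLStore:
    """Hypothetical in-memory stand-in for the scheduler, lock manager and data store."""

    def __init__(self):
        self.data = {}     # data item name -> value
        self.locks = {}    # data item name -> (mode, set of holder transaction ids)
        self.undo = {}     # transaction id -> [(item, previous value)], oldest first
        self.waiting = []  # queued operations, in order of submission
        self.log = []      # (txn, item, value) for every read that was performed

    def _lock(self, txn, item, mode):
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:  # sole holder: keep the lock, upgrading if needed
            if mode == "write":
                self.locks[item] = ("write", holders)
            return True
        if held_mode == "read" and mode == "read":  # only read locks are shared
            holders.add(txn)
            return True
        return False

    def submit(self, txn, kind, item=None, value=None):
        """Return True if the operation was performed, False if it was queued."""
        if kind == "abort":
            self.waiting = [w for w in self.waiting if w[0] != txn]
            # Inverse writes, newest first, restore the values seen before each write.
            for it, old in reversed(self.undo.get(txn, [])):
                if old is _MISSING:
                    self.data.pop(it, None)
                else:
                    self.data[it] = old
            self._finish(txn)
            return True
        # An operation never overtakes an earlier waiting operation of its own transaction.
        has_waiting = any(w[0] == txn for w in self.waiting)
        if has_waiting or (kind != "commit" and not self._lock(txn, item, kind)):
            self.waiting.append((txn, kind, item, value))
            return False
        if kind == "read":
            self.log.append((txn, item, self.data.get(item)))
        elif kind == "write":
            # Implicit read: remember the old value for a possible inverse write.
            self.undo.setdefault(txn, []).append((item, self.data.get(item, _MISSING)))
            self.data[item] = value
        else:  # commit
            self._finish(txn)
        return True

    def _finish(self, txn):
        """Release every lock held by txn, then retry the wait queue in order."""
        self.undo.pop(txn, None)
        for item in list(self.locks):
            mode, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]
        pending, self.waiting = self.waiting, []
        for op in pending:
            self.submit(*op)
```

In this sketch a blocked read also holds back every later operation of the same transaction, even one on a free data item, which mirrors the ordering constraint on the wait queue.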
2.2
Optimistic Concurrency Control
The implementation of Optimistic Concurrency Control (OCC) is simpler than that of SS2PL. The entities involved in OCC are as follows:
• OCCTrans: Similar to the SS2PL Transactions entity, OCCTrans holds the details of an OCC transaction. This includes the set of operations for a transaction, grouped into two properties: the read set (property rs) and the write set (property ws). In addition, an OCCTrans object holds values for fstart and fend, which are used to determine which transactions finished during the read phase of this transaction. fstart is populated when a transaction starts, i.e. when the first operation of the transaction is received, while fend is populated at the start of the validation phase. Parallel validation in OCC also requires maintaining an active set, which is a list of transactions that are performing validation at the same time.
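As an illustration of how rs, ws, fstart, fend and the active set interact, the following is a minimal in-memory sketch of parallel validation in the general style of [10]. It is not our App Engine code: `OCCManager`, `committed` and `mutex` are hypothetical names, the transaction counter tctr is assumed to count committed transactions, and fstart is set in the constructor rather than on the first operation.

```python
import threading


class OCCManager:
    """Hypothetical shared state: the store, the commit counter and the active set."""

    def __init__(self):
        self.db = {}          # persistent store: data item name -> value
        self.tctr = 0         # number of transactions committed so far
        self.committed = {}   # commit number -> set of item names written
        self.active = []      # transactions currently in validation
        self.mutex = threading.Lock()


class OCCTrans:
    def __init__(self, mgr):
        self.mgr = mgr
        self.rs = set()           # read set: item names
        self.ws = {}              # write set: item name -> new value (local storage)
        self.fstart = mgr.tctr    # commit counter at the start of the read phase

    def read(self, item):
        self.rs.add(item)
        if item in self.ws:       # read your own uncommitted write
            return self.ws[item]
        return self.mgr.db.get(item)

    def write(self, item, value):
        self.ws[item] = value     # buffered locally until commit

    def commit(self):
        mgr = self.mgr
        with mgr.mutex:           # critical section 1
            fend = mgr.tctr
            active_snapshot = list(mgr.active)
            mgr.active.append(self)
        valid = True
        # Transactions that committed during our read phase must not have
        # written anything we read.
        for n in range(self.fstart + 1, fend + 1):
            if mgr.committed[n] & self.rs:
                valid = False
        # Transactions validating concurrently must not write anything we
        # read or write.
        for other in active_snapshot:
            if set(other.ws) & (self.rs | set(self.ws)):
                valid = False
        if valid:
            mgr.db.update(self.ws)    # write phase, outside the critical sections
        with mgr.mutex:           # critical section 2
            if valid:
                mgr.tctr += 1
                mgr.committed[mgr.tctr] = set(self.ws)
            mgr.active.remove(self)
        return valid
```

Only the two short critical sections are serialized; the set comparisons and the write phase run outside them, which is what allows several transactions to validate at the same time.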
if op is read then
  if op.din in op.trans.writeset.dataitemnames then
    readFromWritesetOperation for this Data Item Name
  else
    read from persistent db
  end if
  add op to op.trans.readset
else if op is write then
  add op to op.trans.writeset
else if op is commit then
  {START CRITICAL SECTION 1}
  set fend = getvalue(tctr)
  {Clone current active set}
  op.trans.activeset = (all transactions with phase = val)
  {Add yourself to active set for other transactions}
  op.trans.phase = val
  {END CRITICAL SECTION 1}
  valid ← True
  for all transactions t with tid > fstart and tid = :1 and property2