Snapshot Isolation and Integrity Constraints in Replicated Databases

Yi Lin¹, Bettina Kemme¹
School of Computer Science, McGill University, Canada
[email protected], [email protected]

Ricardo Jiménez-Peris², Marta Patiño-Martínez²
Facultad de Informática, Universidad Politécnica de Madrid, Spain
(rjimenez, mpatino)@fi.upm.es

and

José Enrique Armendáriz-Iñigo³
Departamento de Ingeniería Matemática e Informática, Universidad Pública de Navarra, Spain
[email protected]

Database replication is widely used for fault-tolerance and performance. However, it requires replica control to keep data copies consistent despite updates. The traditional correctness criterion for the concurrent execution of transactions in a replicated database is 1-copy-serializability. It is based on serializability, the strongest isolation level in a non-replicated system. In recent years, however, snapshot isolation (SI), a slightly weaker isolation level, has become popular in commercial database systems. There exist already several replica control protocols that provide SI in a replicated system. However, most of the correctness reasoning for these protocols has been rather informal. Additionally, most of the work so far ignores the issue of integrity constraints. In this paper, we provide a formal definition of 1-copy-SI using and extending a well-established definition of SI in a non-replicated system. Our definition considers integrity constraints in a way that conforms to the way integrity constraints are handled in commercial systems. We discuss a set of necessary and sufficient conditions for a replicated history to be producible under 1-copy-SI. This makes our formalism a convenient tool to prove the correctness of replica control algorithms.

Categories and Subject Descriptors: H [Information Systems]: ; H.2 [Database Management]: ; H.2.4 [Systems]: Distributed databases
General Terms: Theory, Verification, Reliability
Additional Key Words and Phrases: Replication, Snapshot Isolation, Integrity Constraints

¹ This work was partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under its Discovery Grants Program.
² This work was supported in part by the Spanish National Science Foundation (MEC) (grant TIN2007-67353-C02), Madrid Regional Research Council under the Autonomic project (grant S0505/TIC/000285) and the EU Commission under the NEXOF-RA project (grant FP7-216446).
³ This work has been partially supported by the Spanish MEC and EU FEDER under grant TIN2006-14738-C02 and IMPIVA under grant IMAETB/2007/30.
Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee. © 2009 ACM 0362-5915/2009/0300-0001 $5.00
ACM Transactions on Database Systems, Vol. V, No. N, February 2009, Pages 1–54.
1. INTRODUCTION

Database systems are an important component in current information systems architectures. In these multi-tier architectures, the database system builds the backend tier that provides persistence and transactional properties. With businesses increasingly providing their clients and business partners online access to their services, and with the emergence of web-service standards, these information systems face immense scalability issues. Often, the database is a bottleneck, and the only commercial solution to achieve scalability is to buy expensive parallel database software. A cheaper alternative is database replication. In this case, the database system is installed on a cluster of machines each holding a copy of the database. Typically, a ROWA (read-one-write-all) approach is used. A read access can be executed by any replica while writes have to be performed by all replicas.

Database Replication. In recent years, many cluster-based replication solutions have been proposed (e.g., [Carey and Livny 1991; Chundi et al. 1996; Breitbart et al. 1999; Pacitti et al. 1999; Kemme and Alonso 2000; Pedone et al. 2003; Amza et al. 2003; Holliday et al. 2003; Plattner and Alonso 2004; Cecchet et al. 2004; Patiño-Martínez et al. 2005; Lin et al. 2005; Plattner et al. 2008]) that have been shown to provide excellent scalability for transactional workloads. Some of them integrate replica control directly into the database kernel. The clients connect to any of the database replicas and submit their requests as if it were a non-replicated database system. Other solutions implement the replication logic in a middleware layer between the client and the database replicas. The middleware provides a standard database interface such as JDBC, and controls where reads and writes are executed.

Correctness in Replicated Databases. Many of the solutions assume that the underlying database system provides the isolation level serializability using strict two phase locking (2PL).
Based on the locking mechanisms of the database system, the replication module guarantees 1-copy-serializability at the global level, i.e., the execution in the replicated environment is equivalent to a serial execution over a logical single copy of the database.

Recently, Snapshot Isolation (SI) has emerged as a new isolation level [Berenson et al. 1995]. SI is slightly weaker than serializability and has become quite popular. It requires that transactions read data from a snapshot committed at the time point when they start. Furthermore, if two transactions want to update the same data item concurrently, one will be aborted. SI has been adopted by many database vendors such as Oracle, PostgreSQL, Interbase 4 and Microsoft SQL Server 2005. Implementations of SI allow for more concurrency than strict 2PL, the standard mechanism to achieve serializability, since read operations read from a snapshot and do not need to set locks. SI avoids all isolation anomalies as defined by the industrial ANSI standard [ANSI X3.135-1992 1992]. However, it does not provide serializability as defined in the research literature. Berenson et al. [1995] provide an adjusted set of anomalies, and show that SI exhibits some anomalies that cannot occur under their definition of serializability.

Given the popularity of SI, it makes sense for a replicated database to provide
what we call 1-copy-SI, meaning that the execution in the replicated environment is equivalent to an execution over a logical single copy of the database that is possible under SI. Indeed, several replica control protocols have been proposed that provide SI at the global level (e.g., [Plattner and Alonso 2004; Plattner et al. 2008; Lin et al. 2005; Elnikety et al. 2005; Wu and Kemme 2005; Daudjee and Salem 2006]). Often, however, correctness reasoning is rather informal.

Integrity Constraints. While SI and its relationship to serializability have been discussed in depth in the research literature [Adya 1999; Berenson et al. 1995; Fekete et al. 2005], its behavior in regard to integrity constraints is not well defined. As pointed out by Berenson et al. [1995], when all operations are based on SI semantics, integrity constraints such as foreign key constraints are easily violated. Clearly, commercial database systems maintain integrity constraints even if they are based on SI. That is, they actually implement an isolation level that is stronger than SI but weaker than serializability. We are not aware of any work that formalizes this behavior. Adya [1999] discusses integrity constraints and their integration with SI. However, the author proposes to use the serializable isolation level for update transactions and SI only for read-only transactions. This is stricter than the isolation level implemented in commercial systems. In regard to replication tools, integrity constraints are generally ignored, and it is not clear whether the systems can handle them. Some might handle them, others not. However, in order to judge whether correctness is given, we need a way to express when an execution in a replicated environment provides SI at the global level and at the same time does not violate any integrity constraints.

Contribution of the Paper. This paper proposes a framework that allows us to reason about SI and integrity constraints in a replicated environment.
Our framework is based on the General Isolation Definition (GID) introduced in [Adya 1999; Adya et al. 2000]. GID is a very powerful tool and allows the definition of isolation levels in an implementation-independent manner. In particular, Adya [1999] defines SI using GID. We extend this definition and the GID framework to reason about correctness in a replicated environment. We define 1-copy-SI as a correctness level in a replicated system. Furthermore, we introduce integrity constraints and define an isolation level SI+IC that corresponds to the isolation level implemented in commercial systems. We extend this isolation level to 1-copy-SI+IC to be used in a replicated environment. We present conditions that help to decide whether a replicated history conforms to 1-copy-SI or 1-copy-SI+IC. In particular, we show that in order to be 1-copy-SI/1-copy-SI+IC a history must avoid certain cycles in its dependency graph. Our formalism is a convenient tool to prove the correctness of a given replica control algorithm. We present three example protocols and show that two provide 1-copy-SI+IC while one only provides 1-copy-SI.

The remainder of this paper is structured as follows. In Section 2, we present GID as introduced in [Adya 1999] to reason about SI. In Section 3, we define 1-copy-SI based on GID and give some necessary and sufficient conditions for a replicated execution to be 1-copy-SI. In Section 4, we extend the formalism to express integrity constraints (ICs) and define SI+IC as a new isolation level. In Section 5, we derive 1-copy-SI+IC which provides SI and proper handling of integrity constraints in a replicated environment. In Section 6, we describe several example replica control
protocols and prove their correctness. Section 7 deals with replica failures. Section 8 presents related work and Section 9 concludes the paper.

2. SNAPSHOT ISOLATION (SI)

Berenson et al. [1995] define SI by two properties. Snapshot-Read requires that a transaction $T$ reads data from a snapshot which contains all updates committed before $T$ starts (plus its own updates). Snapshot-Read is typically implemented via a multi-version system where read operations access previously committed versions. Snapshot-Write requires that no two concurrent transactions may write the same object. That is, if two concurrent transactions both want to write the same data item only one of them will be allowed to commit. Conflict detection for Snapshot-Write can be implemented via locking or via validation.

Our correctness reasoning is based on the formalism introduced in [Adya 1999; Adya et al. 2000], denoted as General Isolation Definition (GID). In his thesis [Adya 1999], Adya defines GID and uses it to reason about various isolation levels in a non-replicated environment, including snapshot isolation. In the remainder of this section, we present GID for snapshot isolation, slightly adjusted to our needs.

2.1 General Isolation Definition (GID)

2.1.1 Data Items and Transactions. A data item (object) $x$ of the database has a life time from its initial unborn version, $x_{init}$, to its dead version, $x_{dead}$, created by a transaction deleting $x$. A transaction $T_i$ starts with a start operation $s_i$, then contains a sequence of read and write operations, and terminates with a commit operation (i.e., $c_i$) or an abort operation (i.e., $a_i$). A transaction $T_i$ creates a version $x_i$ of object $x$ by performing a write operation $w_i(x_i)$. If $T_i$ reads $x$ it reads a specific version $x_j$, denoted as $r_i(x_j)$. Reads cannot read unborn or dead versions. If $T_i$ writes $x$, the version $x_i$ becomes a committed version at the time $T_i$ commits. We also say that $T_i$ installs $x_i$ at commit time. Before the commit, $x_i$ is a tentative version. If $T_i$ aborts, $x_i$ becomes an aborted version that is no longer visible. For simplicity, we assume $T_i$ does not read or write the same object twice, and if it reads and writes an object, it performs the read before the write⁴.

2.1.2 Transaction Histories. Execution is described through histories.
Definition 2.1. History. Let $\mathcal{T}$ be a set of transactions. A history $H$ over $\mathcal{T}$ describes the execution of all transactions in $\mathcal{T}$ and consists of two parts.
(1) It describes a partial order⁵, called time-precedes order $\prec_t$, over operations of transactions of $\mathcal{T}$ with the following properties:
  (a) Each transaction in $\mathcal{T}$ has a start, and either a commit or an abort operation in $H$. $H$ contains all operations of committed transactions. For aborted transactions some of the read or write operations might be missing.
  (b) $H$ includes the order in which operations within a transaction are executed. That is, for any two operations $o_{ij}$ and $o_{ik}$ of $T_i \in \mathcal{T}$, if $o_{ij}$ happens before $o_{ik}$ in the execution, then $o_{ij} \prec_t o_{ik}$. In particular $s_i \prec_t c_i$.
  (c) If $w_i(x_i)$ and $r_j(x_i)$ both occur in $H$, then $w_i(x_i) \prec_t r_j(x_i)$.
  (d) For any two committed transactions $T_i$ and $T_j$: either $c_i \prec_t s_j$ or $s_j \prec_t c_i$.
(2) $H$ includes a version order $\ll$. For each object $x$ there exists a total order on the committed versions of $x$. $x_{init}$ is the unborn version, and $x_{dead}$ (if existing) is the last version.

⁴ Extending to multiple writes on an object or to write-then-read-relationships is conceptually very simple but makes the notation and descriptions more cumbersome.
⁵ Partial order in this paper refers to an order $<$ with irreflexivity (i.e., $\neg(a < a)$) and transitivity (i.e., $(a < b) \wedge (b < c) \Rightarrow (a < c)$).

The description is very flexible and does not consider any isolation level. Different isolation levels are then defined by putting specific restrictions on the possible histories. For convenience, we present a history $H$ as a sequence of operations (i.e., start, read, write, commit, abort) with a total order (from left to right) consistent with $\prec_t$. For example, consider the history $H_{\text{example}}$:

$H_{\text{example}}$: $s_1\ s_2\ w_1(x_1)\ r_2(x_1)\ w_2(x_2)\ w_2(y_2)\ c_1\ c_2\ s_3\ r_3(x_1)\ c_3\ s_4\ w_4(y_4)\ a_4\ [x_2 \ll x_1]$

In this history, $T_2$ reads version $x_1$ although it is not yet committed. $x_2$ is ordered before $x_1$ in the version order, although in $\prec_t$, $w_1(x_1)$ is ordered before $w_2(x_2)$, and $c_1$ is ordered before $c_2$. This shows that, in general, the version order is independent of the execution or commit order. Furthermore, $T_3$ reads $x_1$ although $x_2$ was installed later. Finally, $y_4$ is not considered in the version order since it was created by an aborted transaction. Clearly, this history is not SI since it violates both Snapshot-Read ($T_2$ reads a data version that was not committed before $T_2$ started) and Snapshot-Write ($T_1$ and $T_2$ are concurrent, write the same object, and both commit). Those familiar with traditional serializability theory [Bernstein et al. 1987] will easily see that the history is actually serializable.
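As an illustration only (it is not part of the GID formalism), the two violations in the example history can be checked mechanically. The Python sketch below assumes a hypothetical encoding of a history as a left-to-right list of tuples `(operation, transaction, object, version_writer)`; the function names are ours. It tests only the informal properties stated at the beginning of Section 2: that a read version was committed before the reader started, and that no two concurrent committed transactions write the same object.

```python
# H_example, encoded as (op, txn, obj, writer_of_version); None where not applicable.
H_EXAMPLE = [
    ("s", 1, None, None), ("s", 2, None, None),
    ("w", 1, "x", 1), ("r", 2, "x", 1),
    ("w", 2, "x", 2), ("w", 2, "y", 2),
    ("c", 1, None, None), ("c", 2, None, None),
    ("s", 3, None, None), ("r", 3, "x", 1), ("c", 3, None, None),
    ("s", 4, None, None), ("w", 4, "y", 4), ("a", 4, None, None),
]

def positions(history, kind):
    """Map transaction id -> index of its operation of the given kind."""
    return {t: i for i, (op, t, _, _) in enumerate(history) if op == kind}

def snapshot_read_violations(history):
    """Reads r_i(x_j), i != j, whose writer T_j had not committed before T_i started."""
    start, commit = positions(history, "s"), positions(history, "c")
    bad = []
    for op, reader, obj, writer in history:
        if op == "r" and writer != reader:
            if writer not in commit or commit[writer] > start[reader]:
                bad.append((reader, obj, writer))
    return bad

def snapshot_write_violations(history):
    """Pairs of committed, concurrent transactions that wrote a common object."""
    start, commit = positions(history, "s"), positions(history, "c")
    writers = {}
    for op, t, obj, _ in history:
        if op == "w" and t in commit:          # aborted writers are ignored
            writers.setdefault(obj, []).append(t)
    bad = []
    for obj, ts in writers.items():
        for a in ts:
            for b in ts:
                concurrent = not (commit[a] < start[b] or commit[b] < start[a])
                if a < b and concurrent:
                    bad.append((a, b, obj))
    return bad

print(snapshot_read_violations(H_EXAMPLE))   # T2 reads x1 before T1 commits
print(snapshot_write_violations(H_EXAMPLE))  # T1 and T2 both write x
```

The sketch does not check that a transaction reads the latest committed version; that part of Snapshot-Read depends on the version order and is formalized in Section 2.2.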
In the following, our example histories often do not start with an empty database but assume that before the history $H$ over a set of transactions $\mathcal{T}$ started, a transaction $T_0$ committed and created some data versions. We assume that if $T_0$ wrote object version $x_0$, then $x_0 \ll x_i$ for any transaction $T_i \in \mathcal{T}$ that writes $x$.

2.1.3 Predicates. A database query often accesses an entire set of data items and performs a predicate evaluation. In the context of this paper we are only interested in predicate reads⁶. Adya [1999] introduces a predicate evaluation as a special read operation. We slightly enrich the formalism of [Adya 1999] to better serve our needs. A transaction $T_i$ can have a predicate read operation $r_i(F : P : Oset(P) : Iset(P))$. $P$ is a function over a set of relations defining a predicate. $Iset(P)$ contains a version for each data item of each relation specified in $P$. This can include unborn and dead versions. $P$ takes $Iset(P)$ as input and returns the versions $Oset(P) \subseteq Iset(P)$ that match the predicate. Unborn and dead versions cannot be in the return set. Function $F$ takes $Oset(P)$ as input and returns the outcome of the query. Predicate read operations are added to the history just as normal read or write operations.

For instance, assume a relation D(did, location) with two data items d1 and d2. A transaction $T_0$ has already created version $d1_0$ = (‘d1’, ‘Chicago’) while d2 still only has its unborn version $d2_{init}$. If a query of transaction $T_1$ now asks for all departments in Chicago (e.g., select * from D where location = ‘Chicago’) we can write this as a predicate read $r_1(\text{select} : \text{D.location = Chicago} : \{d1_0\} : \{d1_0, d2_{init}\})$. For simplicity, this notation does not indicate the function $P$ but only the predicate defined by $P$. The predicate is evaluated over each of the data versions $Iset(P) = \{d1_0, d2_{init}\}$. $d1_0$ is the only matching tuple in $Oset(P)$. $F$ simply returns this tuple as outcome of the query. If a query only returns the number of departments in Chicago (e.g., select count(*) from D where location = ‘Chicago’) then $F$ returns the value “1” as outcome of the query.

⁶ Predicate writes can be described in a similar way and are omitted for space reasons.

Fig. 1. $SSG(H_{\text{example}})$

2.1.4 Serialization Graph. GID uses data-flow graphs to reason about the properties of a history. In this paper, we are interested in the Start-ordered Serialization Graph (SSG). It records dependencies between two committed transactions of a given history $H$ over $\mathcal{T}$. $T_j$ start-depends on $T_i$ if $T_i$ commits before $T_j$ starts in the time-precedes order. $T_j$ directly write-depends on $T_i$ if both write a common data item $x$ and $x_i$ and $x_j$ are consecutive versions of $x$ in $H$’s version order. $T_j$ directly read-depends on $T_i$ if $T_i$ installs some object version $x_i$ and $T_j$ accesses $x_i$ in its read operation (i.e., $r_j(x_i)$ or $r_j(F : P : Oset(P) : Iset(P))$ and $x_i \in Iset(P)$). $T_j$ directly anti-depends on $T_i$ if $T_i$ accesses an object version $x_k$ in a standard or predicate read operation and $T_j$ creates $x$’s next version $x_j$ in the version order⁷.

Definition 2.2. Start-ordered Serialization Graph (SSG). The $SSG(H)$ of a history $H$ over a set of transactions $\mathcal{T}$ is a directed graph where each node in $SSG(H)$ corresponds to a committed transaction in $H$, and there is a write-, read-, anti- or start-dependency edge from $T_i$ to $T_j$ iff $T_j$ directly write-, directly read-, directly anti-, or start-depends on $T_i$, respectively.

The dependency definitions and edges are summarized in Table I and Figure 1 shows the $SSG(H_{\text{example}})$ of the above example history.
Since $T_4$ aborts it is not contained in the graph. In the following we refer to write-, read-, and anti-dependency edges also as ww-, wr- and rw-dependency edges, respectively. The particular data item $x$ that leads to a dependency does usually not need to be considered. But if it does, we say that the dependency or the dependency edge is due to data item $x$. Note that a dependency edge can be due to several data items.

In the following, given the $SSG(H)$ of a history $H$, we denote as $T_i \xrightarrow{ww+} T_j$ a path in the graph from $T_i$ to $T_j$ consisting only of write-dependency edges. Similarly, we denote as $T_i \xrightarrow{S+} T_j$ a path in $SSG(H)$ with only start-dependency edges.

Dependency Type       | SSG                          | Edge name
----------------------|------------------------------|----------------------------------------------
Directly write-depends | $T_i \xrightarrow{ww} T_j$  | write-dependency edge or ww-dependency edge
Directly read-depends  | $T_i \xrightarrow{wr} T_j$  | read-dependency edge or wr-dependency edge
Directly anti-depends  | $T_i \xrightarrow{rw} T_j$  | anti-dependency edge or rw-dependency edge
Start-depends          | $T_i \xrightarrow{S} T_j$   | start-dependency edge

Table I. Dependencies (based on Fig. 2 in Adya et al. [2000])

⁷ Adya [1999] defines anti-dependency for predicate reads to the first transaction to change the outcome of the predicate read. For SI, however, we need the anti-dependency to the next version.

2.2 Snapshot isolation in GID

Adya [1999] derives the set of histories allowable under SI by defining how Snapshot-Read and Snapshot-Write impose further restrictions on the $\prec_t$ order of certain start and commit operations.

Definition 2.3. Snapshot-Read. All read operations of transaction $T_i$ occur at $T_i$’s start point. That is, if $r_i(x_j)$, $(i \neq j)$, occurs in history $H$, then:
(1) $c_j \prec_t s_i$, and
(2) if $w_k(x_k)$ also occurs in $H$ $(j \neq k \neq i)$, then either
  (a) $s_i \prec_t c_k$, or
  (b) $c_k \prec_t s_i$ and $c_k \prec_t c_j$.

Part (1) requires that the read version was committed at start time of the reading transaction. Part (2) requires that the latest of the committed versions is read. That is, if both $x_j$ and $x_k$ were installed (committed) before $T_i$ started, and $x_k \ll x_j$, then $T_i$ does not read the “outdated” version $x_k$.

Definition 2.4. Snapshot-Write. For two committed transactions $T_i$ and $T_j$ in $H$ that modify the same object $x$:
(1) Either $c_i \prec_t s_j$ or $c_j \prec_t s_i$.
(2) If $c_i \prec_t s_j$ $(\prec_t c_j)$ then $x_i \ll x_j$, and if $c_j \prec_t s_i$ $(\prec_t c_i)$ then $x_j \ll x_i$.

That is, no concurrent committed transactions may update the same object, and the version order of an object $x$ follows the order in which the transactions that updated $x$ committed.

Similar in spirit to the ANSI definitions, GID now identifies phenomena that a history must avoid to be SI. Some of them are defined through properties of the history that are simple to verify. Others are properties of the SSG.

—G-1a: Aborted Reads. A history $H$ over $\mathcal{T}$ exhibits phenomenon G-1a if it contains an aborted transaction $T_1$ and a committed transaction $T_2$ such that $T_2$ has read some objects modified by $T_1$.
—G-1b: Intermediate Reads. A history $H$ exhibits phenomenon G-1b if it contains a committed transaction $T_2$ that has read a version of object $x$ written by transaction $T_1$ that was not $T_1$’s final modification of $x$. We do not further consider this phenomenon because our transaction model assumes that each transaction writes an object at most once.
—G-1c: Circular Information Flow. A history $H$ has phenomenon G-1c if the start-ordered serialization graph $SSG(H)$ contains a directed cycle consisting entirely of ww-dependency and wr-dependency edges. We call this a G-1c cycle.
—G-SIa: Interference. A history $H$ exhibits phenomenon G-SIa if $SSG(H)$ contains a ww- or wr-dependency edge from $T_i$ to $T_j$ without there also being a start-dependency edge from $T_i$ to $T_j$.
Fig. 2. $SSG(H_{\text{non-SI}})$ in Example 1

Fig. 3. $SSG(H_{\text{SI}})$ in Example 1
—G-SIb: Missed Effects. A history $H$ exhibits phenomenon G-SIb if $SSG(H)$ contains a directed cycle with exactly one rw-dependency edge. We refer to such a cycle as a G-SIb cycle.

GID defines an isolation level PL-SI corresponding to SI as the one in which the G-1 and G-SI phenomena are disallowed. Roughly, G-1 captures the essence of dirty read and dirty write while G-SI captures the essence of violating Snapshot-Read and Snapshot-Write⁸. For the convenience of discussion, we refer to a history as an SI-history if it avoids phenomena G-1 and G-SI.

Example 1. $H_{\text{non-SI}}$ is not an SI-history while $H_{\text{SI}}$ is an SI-history. Their SSGs are shown in Figures 2 and 3, respectively. We assume that a transaction $T_0$ installs versions $x_0$ and $y_0$ before the transactions $T_1$ to $T_3$ start.

$H_{\text{non-SI}}$: $s_1\ s_2\ w_1(x_1)\ w_1(y_1)\ c_1\ r_2(x_1)\ w_2(y_2)\ c_2\ s_3\ r_3(y_0)\ c_3\ [y_1 \ll y_2]$
$H_{\text{SI}}$: $s_1\ s_2\ s_3\ w_1(x_1)\ c_1\ r_2(x_0)\ w_2(y_2)\ c_2\ w_3(y_3)\ a_3\ s_4\ r_4(x_1)\ w_4(y_4)\ c_4\ [y_2 \ll y_4]$

In both $H_{\text{non-SI}}$ and $H_{\text{SI}}$, $T_1$ is the first to write and install $x$. In $H_{\text{non-SI}}$, $T_2$ reads the version of $x$ created by $T_1$ ($r_2(x_1)$). This violates property (1) of Snapshot-Read because $T_1$ has not committed at the time $T_2$ starts. Correspondingly we can see that there is a $T_1 \xrightarrow{wr} T_2$ edge but no $T_1 \xrightarrow{S} T_2$ edge in $SSG(H_{\text{non-SI}})$ (Figure 2). This means $H_{\text{non-SI}}$ has phenomenon G-SIa. Moreover, $T_1$ and $T_2$ both write $y$ concurrently and both are allowed to commit. This violates Snapshot-Write. Correspondingly we can see that there is a $T_1 \xrightarrow{ww} T_2$ edge but no $T_1 \xrightarrow{S} T_2$ edge in $SSG(H_{\text{non-SI}})$. Furthermore, there is a G-SIb cycle $T_1 \xrightarrow{S} T_3 \xrightarrow{rw} T_1$ in $SSG(H_{\text{non-SI}})$ having exactly one anti-dependency edge. Phenomenon G-SIb always occurs if a transaction reads an outdated version, which violates property (2) of Snapshot-Read. In $H_{\text{non-SI}}$, $T_3$ reads $y_0$, although $T_1$ wrote $y_1$ and committed before $T_3$ started. Thus, $T_3$ should have read $y_1$ and not $y_0$. This results in a G-SIb cycle between $T_1$ and $T_3$ with one start- and one rw-dependency edge.

In $H_{\text{SI}}$, $T_2$ reads $x$ from $T_0$ instead of $T_1$ ($r_2(x_0)$). This is correct, because $T_2$ started after $T_0$ committed and before $T_1$ committed. Although $T_1$ and $T_2$ are concurrent, both are able to commit because they write different objects. However, $T_3$ is aborted because it writes $y$, is concurrent to $T_2$, and $T_2$ commits (only one may commit). $T_4$ reads the last committed version as of start time. Figure 3 shows $SSG(H_{\text{SI}})$. $T_3$ does not appear in the SSG as it aborted. $H_{\text{SI}}$ avoids phenomenon G-1a since no transaction reads from $T_3$, G-1b since no transaction updates the same data item twice, and G-SIa since both the wr-dependency edge from $T_1$ to $T_4$ and the ww-dependency edge from $T_2$ to $T_4$ are accompanied by start-dependency edges. Furthermore, since $SSG(H_{\text{SI}})$ is acyclic, G-1c and G-SIb are avoided. Hence, $H_{\text{SI}}$ is an SI-history.

⁸ We refer to [Adya 1999; Adya et al. 2000] for the proofs that G-1 and G-SI are necessary and sufficient conditions for a history to provide Snapshot-Read and Snapshot-Write.
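To make Example 1 concrete, the following Python sketch builds the SSG of a history and tests for the graph-based phenomena G-1c, G-SIa and G-SIb. It is an illustrative sketch under stated assumptions, not an implementation taken from [Adya 1999]: the tuple encoding of histories, the `version_order` dictionary (with 0 standing for the initial transaction $T_0$, which is not a node of the graph) and all function names are hypothetical. G-1a and G-1b are not checked, and predicate reads are not modelled.

```python
from itertools import product

def build_ssg(history, version_order):
    """Return the labelled edges (Ti, Tj, kind) of SSG(H) over committed transactions.

    history: list of (op, txn, obj, writer_of_version), ordered consistently with
             the time-precedes order.
    version_order: dict obj -> list of writer ids, oldest first (0 stands for T0).
    Assumes every version that is read appears in version_order.
    """
    start = {t: i for i, (op, t, _, _) in enumerate(history) if op == "s"}
    commit = {t: i for i, (op, t, _, _) in enumerate(history) if op == "c"}
    edges = set()
    for a, b in product(commit, commit):           # start-dependencies
        if a != b and commit[a] < start[b]:
            edges.add((a, b, "s"))
    for versions in version_order.values():        # consecutive versions: ww
        for older, newer in zip(versions, versions[1:]):
            if older in commit and newer in commit:
                edges.add((older, newer, "ww"))
    for op, reader, obj, writer in history:
        if op != "r" or reader not in commit:
            continue
        if writer in commit and writer != reader:  # read-dependency: wr
            edges.add((writer, reader, "wr"))
        versions = version_order[obj]
        nxt = versions.index(writer) + 1           # writer of the next version: rw
        if nxt < len(versions) and versions[nxt] in commit and versions[nxt] != reader:
            edges.add((reader, versions[nxt], "rw"))
    return edges

def reachable(edges, src, dst, kinds):
    """True if a non-empty path src -> dst exists using only edges of the given kinds."""
    frontier, seen = [src], set()
    while frontier:
        node = frontier.pop()
        for a, b, kind in edges:
            if a == node and kind in kinds:
                if b == dst:
                    return True
                if b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return False

def phenomena(edges):
    """Names of the graph-based phenomena (G-1c, G-SIa, G-SIb) present in the SSG."""
    found = set()
    nodes = {n for a, b, _ in edges for n in (a, b)}
    if any(reachable(edges, n, n, {"ww", "wr"}) for n in nodes):
        found.add("G-1c")
    if any(k in ("ww", "wr") and (a, b, "s") not in edges for a, b, k in edges):
        found.add("G-SIa")
    # A cycle with exactly one rw edge: close the rw edge with a path free of rw edges.
    if any(k == "rw" and reachable(edges, b, a, {"ww", "wr", "s"}) for a, b, k in edges):
        found.add("G-SIb")
    return found

N = None
H_NON_SI = [("s", 1, N, N), ("s", 2, N, N), ("w", 1, "x", 1), ("w", 1, "y", 1),
            ("c", 1, N, N), ("r", 2, "x", 1), ("w", 2, "y", 2), ("c", 2, N, N),
            ("s", 3, N, N), ("r", 3, "y", 0), ("c", 3, N, N)]
VO_NON_SI = {"x": [0, 1], "y": [0, 1, 2]}

H_SI = [("s", 1, N, N), ("s", 2, N, N), ("s", 3, N, N), ("w", 1, "x", 1),
        ("c", 1, N, N), ("r", 2, "x", 0), ("w", 2, "y", 2), ("c", 2, N, N),
        ("w", 3, "y", 3), ("a", 3, N, N), ("s", 4, N, N), ("r", 4, "x", 1),
        ("w", 4, "y", 4), ("c", 4, N, N)]
VO_SI = {"x": [0, 1], "y": [0, 2, 4]}

print(sorted(phenomena(build_ssg(H_NON_SI, VO_NON_SI))))
print(sorted(phenomena(build_ssg(H_SI, VO_SI))))
```

The G-SIb test relies on the observation that a cycle with exactly one rw-dependency edge exists precisely when some rw edge from $T_i$ to $T_j$ can be closed by a path from $T_j$ back to $T_i$ that uses only ww-, wr- and start-dependency edges.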
2.3 Observations

This section discusses some further properties of SI-histories and general histories and their SSGs. They will be useful when we discuss SI in a replicated system. First of all, we want to point out a property that holds in the $SSG(H)$ of any history $H$. Figure 4 illustrates this property.

Proposition 2.5. Let $H$ be a history over $\mathcal{T}$. Let $T_i, T_j \in \mathcal{T}$ write $x$, and $T_k \in \mathcal{T}$ read $x$. If $T_i \xrightarrow{ww} T_j$ and $T_i \xrightarrow{wr} T_k$ are two edges in $SSG(H)$ due to $x$, then $T_k \xrightarrow{rw} T_j$ is an edge in $SSG(H)$ due to $x$.

It directly follows from the definition of dependency edges. $T_i \xrightarrow{ww} T_j$ due to $x$ means that $x_i$ and $x_j$ are consecutive versions in $x$’s version order. $T_i \xrightarrow{wr} T_k$ due to $x$ means that $T_k$ reads version $x_i$. Since $T_j$ installs the next version after $x_i$, according to the definition of direct anti-dependency edges, there must be a $T_k \xrightarrow{rw} T_j$ edge in $SSG(H)$ due to $x$.

Fig. 4. Relationship of read-, write-, and anti-dependency edge

Secondly, we want to look at the relationship between dependencies and the start and commit order of transactions. In Section 2.1, we have shown in our first example history, $H_{\text{example}}$, that there are generally very few restrictions on how operations are $\prec_t$-ordered. However, an SI-history has quite strong properties in regard to the $\prec_t$-order. Table II summarizes these ordering implications. Clearly, a start-dependency edge between $T_i$ and $T_j$ means $c_i \prec_t s_j$ for any history $H$ by definition. Furthermore, in order to avoid G-SIa, every ww- or wr-dependency edge in the $SSG(H)$ of an SI-history $H$ is accompanied by a start-dependency edge, and thus, we have $c_i \prec_t s_j$ in $H$. Finally, an anti-dependency $T_i \xrightarrow{rw} T_j$ implies $s_i \prec_t c_j$ in $H$. Assume that this were not the case. Then $c_j \prec_t s_i$ holds. Thus, there would be an edge $T_j \xrightarrow{S} T_i$ resulting in a cycle between $T_i$ and $T_j$ with exactly one rw-dependency edge. This is phenomenon G-SIb and avoided by SI-histories.

Dependency                  | Order requirement in SI-history
----------------------------|--------------------------------
$T_i \xrightarrow{S} T_j$   | $c_i \prec_t s_j$
$T_i \xrightarrow{ww} T_j$  | $c_i \prec_t s_j$
$T_i \xrightarrow{wr} T_j$  | $c_i \prec_t s_j$
$T_i \xrightarrow{rw} T_j$  | $s_i \prec_t c_j$

Table II. Order requirements for SI-histories
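Proposition 2.5 can be read as a simple derivation rule over edges. The Python sketch below is an illustration under our own assumptions: edges are encoded as hypothetical triples `(from, to, item)`, and the function name is ours. The example input is taken from the local history at replica $R^A$ in Example 2 of Section 3.2, where $T_1 \xrightarrow{ww} T_2$ and $T_1 \xrightarrow{wr} T_3$ are both due to $x$.

```python
def derive_anti_dependencies(ww_edges, wr_edges):
    """Apply Proposition 2.5.

    ww_edges, wr_edges: sets of (Ti, Tj, item) triples, each edge due to `item`.
    Returns the rw edges (Tk, Tj, item) implied by Ti -ww-> Tj and Ti -wr-> Tk.
    A transaction that reads x_i and installs the next version itself is skipped,
    since the graph has no self-edges (an assumption of this sketch).
    """
    rw_edges = set()
    for ti, tj, item in ww_edges:
        for writer, tk, read_item in wr_edges:
            if writer == ti and read_item == item and tk != tj:
                rw_edges.add((tk, tj, item))
    return rw_edges

# T1 -ww-> T2 due to x and T1 -wr-> T3 due to x imply T3 -rw-> T2 due to x.
print(derive_anti_dependencies({("T1", "T2", "x")}, {("T1", "T3", "x")}))
```

Edges due to different data items do not combine: a wr edge due to $y$ together with a ww edge due to $x$ yields no anti-dependency.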
3. SNAPSHOT ISOLATION IN A REPLICATED SYSTEM

In this section we extend the notion of SI to a replicated environment. In order for a replicated database to provide a certain level of isolation, it should behave like a non-replicated database that runs under this isolation level. The concept of 1-copy-serializability is well known and understood [Bernstein et al. 1987]. It requires the execution in the replicated system to be equivalent to a serial execution in a non-replicated system. In this section, we formally define what it means for a history to be 1-copy-snapshot-isolation (1-copy-SI), and discuss necessary and sufficient conditions for a history to be 1-copy-SI.

3.1 Transactions and histories in a replicated database

A replicated database consists of a set of replicas $\mathcal{R}$ each of which keeps a copy of the database. That is, our framework assumes full replication. Our model follows a ROWA approach in which each update transaction executes on one replica that performs all its operations. The transaction is called local at this replica, and remote at the other replicas. Only the write operations of a transaction are applied at the remote replicas. Hence, all replicas execute the same set of update transactions, but an update transaction $T_i$ has its readset $RS_i$, consisting of all its read operations, at only one replica, while it has the same writeset $WS_i$, consisting of its write operations, at all replicas. Read-only transactions, in contrast, only exist at the local replica. We express this by using a ROWA mapper function.

Definition 3.1. Mapper function. A ROWA mapper function, $rmap$, takes a set of transactions $\mathcal{T}$ and a set of replicas $\mathcal{R}$ as input, and transforms $\mathcal{T}$ into a set of transactions $\mathcal{T}' = rmap(\mathcal{T}, \mathcal{R})$. $rmap(\mathcal{T}, \mathcal{R})$ transforms each update transaction $T_i \in \mathcal{T}$ into a set $\{T_i^k \mid R_k \in \mathcal{R}\}$. In this set there is exactly one local transaction $T_i^l$ where $WS_i^l = WS_i$ and $RS_i^l = RS_i$ ($T_i$ is local at $R_l$). The rest are remote transactions $T_i^r$, where $WS_i^r = WS_i$ and $RS_i^r = \emptyset$ ($T_i$ is remote at $R_r$). A read-only transaction $T_i$ is transformed into a single local transaction $T_i^l$ with $RS_i^l = RS_i$.

We denote as $\mathcal{T}^k = \{T_i^k \mid T_i^k \in \mathcal{T}'\}$ the set of transactions executed at replica $R_k$. Executing $\mathcal{T}'$ at the replicas $\mathcal{R}$ leads to what we denote a replicated history.

Definition 3.2. Replicated history. Let $\mathcal{T}$ be a set of transactions, $\mathcal{R}$ a set of replicas and $rmap$ a ROWA mapper function generating $\mathcal{T}' = rmap(\mathcal{T}, \mathcal{R})$. Let $RH^k$ be the history over $\mathcal{T}^k$ at $R_k \in \mathcal{R}$. We denote the union over all histories $RH^k$, $R_k \in \mathcal{R}$, as a replicated history $RH$ over $rmap(\mathcal{T}, \mathcal{R})$, i.e., $RH = \bigcup_k RH^k$, $R_k \in \mathcal{R}$.
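The mapper function of Definition 3.1 can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the `Txn` record, its `local_replica` field (the definition does not prescribe how the local replica is chosen) and the tuple layout of the result are hypothetical. The sample transactions mirror Example 2 of Section 3.2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Txn:
    name: str
    readset: frozenset
    writeset: frozenset
    local_replica: str   # assumption: the replica at which the transaction is local

def rmap(transactions, replicas):
    """ROWA mapping: return dict replica -> list of (name, readset, writeset).

    An update transaction is mapped to every replica; its readset is kept only at
    its local replica. A read-only transaction is mapped to its local replica only.
    """
    per_replica = {r: [] for r in replicas}
    for t in transactions:
        if t.writeset:                                   # update transaction
            for r in replicas:
                rs = t.readset if r == t.local_replica else frozenset()
                per_replica[r].append((t.name, rs, t.writeset))
        else:                                            # read-only transaction
            per_replica[t.local_replica].append((t.name, t.readset, frozenset()))
    return per_replica

txns = [
    Txn("T1", frozenset(), frozenset({"x", "y"}), "A"),
    Txn("T2", frozenset(), frozenset({"x"}), "A"),
    Txn("T3", frozenset({"x"}), frozenset(), "A"),
    Txn("T4", frozenset({"y"}), frozenset(), "B"),
]
mapped = rmap(txns, ["A", "B"])
print([name for name, _, _ in mapped["A"]])
print([name for name, _, _ in mapped["B"]])
```

In this encoding the entry for a replica $R_k$ corresponds to the set $\mathcal{T}^k$ of transactions executed there.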
In the remainder of the paper, we assume that before the start of a replicated history $RH$, all replicas have the same state of the database, i.e., for each data item $x$, each replica $R_k$ has the same last committed data version.

3.2 1-copy-SI
We now have to define when a replicated history provides 1-copy-SI, i.e., when it is equivalent to an SI-history over a non-replicated database. We model this by requiring a replicated history over rmap(T , R) to have the same dependencies between read and write operations as a non-replicated SI-history over T . In GID, any such dependency is captured by the means of a ww-, wr- or rw-dependency edge in the SSG. Thus each history RH k at replica Rk has its own SSG(RH k ) reflecting the dependencies that occurred in this history. The union of all these SSGs reflects the sum of all dependencies. An equivalent non-replicated SI-history has to have the same dependencies, except of the start-dependency edges. We first define this set of dependencies as a graph: S Definition 3.3. Union Serialization Graph (USG). Let RH = RH k be a replicated history over rmap(T , R). We denote as U SG(RH) the following graph. ACM Transactions on Database Systems, Vol. V, No. N, February 2009.
Fig. 5. SSGs in Example 2: (a) $SSG(RH^A_{\text{exact-edge}})$, (b) $SSG(RH^B_{\text{exact-edge}})$, (c) $SSG(H_{\text{exact-edge}})$.
(1) $\forall R^k \in R$, if $SSG(RH^k)$ has node $T_i^k \in T^k$, then $USG(RH)$ has node $T_i$.
(2) $\forall R^k \in R$ and each ww-, wr-, or rw-dependency edge from $T_i^k$ to $T_j^k$ in $SSG(RH^k)$, $USG(RH)$ has a corresponding ww-, wr-, or rw-dependency edge from $T_i$ to $T_j$.
(3) There are no further edges or nodes in $USG(RH)$.

Definition 3.4. 1-copy-SI. Let $RH = \bigcup_k RH^k$ be a replicated history over $rmap(T, R)$. We say $RH$ is 1-copy-SI if
(1) $\forall R^k \in R$, $RH^k$ is an SI-history.
(2) For all update transactions $T_i \in T$ and for all $R^k, R^l \in R$: $c_i^k \Leftrightarrow c_i^l$.
(3) There exists an SI-history $H$ over $T$ such that (a) $SSG(H)$ and $USG(RH)$ have the same nodes; (b) $SSG(H)$ has exactly the same ww-, wr-, and rw-dependency edges as $USG(RH)$.

(1) means that the histories at all replicas must be SI-histories. In the following we often refer to them as the local histories. (2) means all local histories must commit the same set of update transactions. This is an obvious requirement of ROWA. Finally, (3) means an SI-history over the original set of transactions must exist with the same dependencies. We often refer to this non-replicated history over $T$ as the global history. As with GID in the general case, the data items that lead to the dependency edges do not need to be considered. We show in Appendix A that this is indeed the case and that our definition of 1-copy-SI is sufficient.

Example 2. In this example, there are two replicas $R^A$ and $R^B$. Transactions $T_1$, $T_2$, and $T_3$ are local at $R^A$ while $T_4$ is local at $R^B$. The replicated history $RH_{\text{exact-edge}}$ is the union of the local histories $RH^A_{\text{exact-edge}}$ and $RH^B_{\text{exact-edge}}$.

$RH^A_{\text{exact-edge}}$: $s_1^A\ w_1^A(x_1)\ w_1^A(y_1)\ c_1^A\ s_2^A\ w_2^A(x_2)\ s_3^A\ c_2^A\ r_3^A(x_1)\ c_3^A\ [x_1 \ll x_2]$
$RH^B_{\text{exact-edge}}$: $s_1^B\ w_1^B(x_1)\ w_1^B(y_1)\ c_1^B\ s_2^B\ w_2^B(x_2)\ s_4^B\ c_2^B\ r_4^B(y_1)\ c_4^B\ [x_1 \ll x_2]$

$SSG(RH^A_{\text{exact-edge}})$ and $SSG(RH^B_{\text{exact-edge}})$ are shown in Fig. 5. For simplicity, the superscripts A and B, which indicate replicas, are omitted from the transactions in the figure. It is easy to verify that both $RH^A$ and $RH^B$ are SI-histories. $USG(RH)$ is the union graph of all ww-, wr- and rw-dependency edges of $SSG(RH^A_{\text{exact-edge}})$ and $SSG(RH^B_{\text{exact-edge}})$.
We can show that the replicated history $RH_{\text{exact-edge}}$ is 1-copy-SI by building the following global history $H_{\text{exact-edge}}$ over $\{T_1, T_2, T_3, T_4\}$:

$H_{\text{exact-edge}}$: $s_1\ w_1(x_1)\ w_1(y_1)\ c_1\ s_2\ w_2(x_2)\ s_3\ s_4\ c_2\ r_3(x_1)\ c_3\ r_4(y_1)\ c_4\ [x_1 \ll x_2]$
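The edge comparison required by condition (3) of Definition 3.4 can be checked mechanically for Example 2. The following is a minimal sketch, assuming each SSG is given as a pair (nodes, edges) with edges encoded as (source, target, kind) triples and start-dependency edges labelled "s"; the edge sets are read off the histories above and this encoding is an assumption of the sketch, not notation from the paper.

```python
DEPENDENCY_KINDS = ("ww", "wr", "rw")


def union_serialization_graph(local_ssgs):
    """Build USG(RH): union of nodes and of ww-, wr-, rw-edges of the local SSGs."""
    nodes, edges = set(), set()
    for ssg_nodes, ssg_edges in local_ssgs:
        nodes |= ssg_nodes
        # Start-dependency edges are deliberately left out of the USG.
        edges |= {e for e in ssg_edges if e[2] in DEPENDENCY_KINDS}
    return nodes, edges


def same_dependencies(ssg, usg):
    """Condition (3) of Definition 3.4: same nodes and same ww/wr/rw edges."""
    ssg_nodes, ssg_edges = ssg
    usg_nodes, usg_edges = usg
    dependency_edges = {e for e in ssg_edges if e[2] in DEPENDENCY_KINDS}
    return ssg_nodes == usg_nodes and dependency_edges == usg_edges


# Local SSGs of Example 2 (replica superscripts omitted, as in Fig. 5).
ssg_a = ({"T1", "T2", "T3"},
         {("T1", "T2", "ww"), ("T1", "T2", "s"),
          ("T1", "T3", "wr"), ("T1", "T3", "s"),
          ("T3", "T2", "rw")})
ssg_b = ({"T1", "T2", "T4"},
         {("T1", "T2", "ww"), ("T1", "T2", "s"),
          ("T1", "T4", "wr"), ("T1", "T4", "s")})

# SSG of the global history H_exact-edge.
ssg_h = ({"T1", "T2", "T3", "T4"},
         {("T1", "T2", "ww"), ("T1", "T2", "s"),
          ("T1", "T3", "wr"), ("T1", "T3", "s"),
          ("T1", "T4", "wr"), ("T1", "T4", "s"),
          ("T3", "T2", "rw")})

usg = union_serialization_graph([ssg_a, ssg_b])
matches = same_dependencies(ssg_h, usg)
```

The check only covers condition (3); that the local histories and the global history are SI-histories still has to be established separately.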
$SSG(H_{\text{exact-edge}})$ is shown in Figure 5(c). It has exactly the same ww-, wr- and rw-dependency edges as $USG(RH_{\text{exact-edge}})$. We can also easily see that $H_{\text{exact-edge}}$ avoids G1 and G-SI. Hence, $RH_{\text{exact-edge}}$ is 1-copy-SI.

In the above example, we have shown the 1-copy-SI property by constructing a non-replicated history that fulfills the conditions of the 1-copy-SI definition. However, constructing an appropriate non-replicated global SI-history for an arbitrary replicated history that fulfills the 1-copy-SI property is not always trivial. Furthermore, if a replicated history is not 1-copy-SI, it is difficult to prove that no global SI-history with the appropriate properties exists. Thus, we need a more convenient way to determine whether a replicated history is 1-copy-SI. Bernstein et al. [1987] showed that if the union of the serialization graphs of the histories at different replicas, enhanced with certain edges, is acyclic, then the replicated history is 1-copy-serializable. It would be nice if we could use $USG(RH)$ for a similar purpose. That is, if $USG(RH)$ has certain properties, e.g., avoids certain cycles, then we know that $RH$ is 1-copy-SI. Indeed, the next two sections discuss a set of properties of $USG(RH)$ that help to determine whether the replicated history is 1-copy-SI.

3.3 Necessary conditions for a replicated history to be 1-copy-SI

It is clear that if $USG(RH)$ has a G-1c or G-SIb cycle, then $RH$ is not 1-copy-SI because it is not possible for an SI-history $H$ to have an $SSG(H)$ with the same edges. Our first question is whether any other characteristics of $USG(RH)$ can be determined that show that $RH$ is not 1-copy-SI. Let us have a look at an example.

Example 3. In this example, there are two replicas $R^A$ and $R^B$. Transactions $T_1$ and $T_2$ are local at $R^A$, $T_3$ and $T_4$ are local at $R^B$. We assume an initial transaction $T_0$ created $x_0$ and $y_0$ and committed before the following execution starts.
$RH^A_{\text{hole}}$: $s_1^A\ w_1^A(x_1)\ c_1^A\ s_2^A\ r_2^A(x_1)\ r_2^A(y_0)\ c_2^A\ s_4^A\ w_4^A(y_4)\ c_4^A$
$RH^B_{\text{hole}}$: $s_4^B\ w_4^B(y_4)\ c_4^B\ s_3^B\ r_3^B(y_4)\ r_3^B(x_0)\ c_3^B\ s_1^B\ w_1^B(x_1)\ c_1^B$

$SSG(RH^A_{\text{hole}})$ and $SSG(RH^B_{\text{hole}})$ are shown in Figures 6(a) and (b), respectively. The $USG(RH_{\text{hole}})$ shown in Figure 6(c) has no G-1c or G-SIb cycles. Still, $RH_{\text{hole}}$ is not 1-copy-SI. We show this by contradiction. Assume $RH_{\text{hole}}$ is 1-copy-SI. Then, there must be a global SI-history $H_{\text{hole}}$ which contains the same ww-, wr-, and rw-dependency edges as $USG(RH_{\text{hole}})$. Hence, based on $T_1 \xrightarrow{wr} T_2 \xrightarrow{rw} T_4$ in $USG(RH_{\text{hole}})$ and Table II, we derive for the $\prec_t$-order of $H_{\text{hole}}$:

$T_1 \xrightarrow{wr} T_2 \Rightarrow c_1 \prec_t s_2$ and $T_2 \xrightarrow{rw} T_4 \Rightarrow s_2 \prec_t c_4$, which together give $c_1 \prec_t c_4$.

Similarly, due to $T_4 \xrightarrow{wr} T_3 \xrightarrow{rw} T_1$ we derive:

$T_4 \xrightarrow{wr} T_3 \Rightarrow c_4 \prec_t s_3$ and $T_3 \xrightarrow{rw} T_1 \Rightarrow s_3 \prec_t c_1$, which together give $c_4 \prec_t c_1$.

This results in $c_1 \prec_t c_4 \prec_t c_1$, which is impossible since $\prec_t$ is irreflexive. Thus, no SI-history could have a graph with the above edges, and $RH_{\text{hole}}$ is not 1-copy-SI.
Fig. 6. SSGs in Example 3: (a) $SSG(RH^A_{\text{hole}})$, (b) $SSG(RH^B_{\text{hole}})$, (c) $USG(RH_{\text{hole}})$.
The problem of $RH_{\text{hole}}$ is that $T_2$ and $T_3$ indirectly order $T_1$ and $T_4$ although $T_1$ and $T_4$ do not conflict. In $R^A$, $T_2$ reads x and y from a snapshot after $T_1$ commits but before $T_4$ commits. This indirectly requires $T_1$ to commit before $T_4$. In contrast, in $R^B$, $T_3$ reads x and y from a snapshot after $T_4$ commits but before $T_1$ commits, indirectly ordering $T_4$ before $T_1$. In a non-replicated history, only one of the snapshots is possible, that is, either $T_1$ commits before $T_4$ or it commits after $T_4$, but not both. $USG(RH_{\text{hole}})$ (see Figure 6(c)) expresses this behavior by having a cycle with more than one rw-dependency edge. In principle, this is not explicitly forbidden by the definition of SI. But it turns out that the particular cycle above is not possible in a non-replicated history. Thus, we define a further phenomenon.

G-SIb*: rw-dependency cycle. A history $H$ exhibits phenomenon G-SIb* if $SSG(H)$ has a cycle with at least one rw-dependency edge and each rw-dependency edge is prefixed by a ww-, wr-, or start-dependency edge. We refer to such a cycle as a G-SIb* cycle.

G-SIb* refers to cycles where there are no consecutive rw-dependency edges (see footnote 9). Note that G-SIb* includes G-SIb because if there is a cycle with exactly one rw-dependency edge, then this rw-dependency edge must be prefixed with a non-rw-dependency edge. G-SIb* is a derived phenomenon, i.e., if a history avoids G-1a-c and G-SIa-b, then it automatically avoids G-SIb*.

Lemma 3.5. A (non-replicated) SI-history $H$ over a set $T$ avoids G-SIb*.

Proof Sketch. The proof follows the lines of reasoning taken in Example 3. Any cycle can be broken into $m$ ($m > 1$) sections where each section $k \in \{0, \ldots, m-1\}$ follows the pattern $T_{i_k} \xrightarrow{(ww/wr/S)^+} T_{j_k} \xrightarrow{rw} T_{i_{(k+1)\%m}}$. From there, we can derive $c_{i_k} \prec_t s_{j_k} \prec_t c_{i_{(k+1)\%m}}$ in the history, eventually leading to $c_{i_0} \prec_t c_{i_0}$, which is a contradiction. A complete proof is given in Appendix A.1.

In Example 3, as $USG(RH_{\text{hole}})$ has a G-SIb* cycle, we know that $RH_{\text{hole}}$ is not 1-copy-SI. In summary we observe the following necessary conditions.

Proposition 3.6. Necessary Conditions for 1-copy-SI. If a replicated history $RH$ is 1-copy-SI, then $USG(RH)$ has no G-1c or G-SIb* cycles.

Footnote 9. SI allows cycles with two consecutive rw-dependency edges. Fekete et al. [2005] show that all histories that are SI but not serializable contain cycles with consecutive rw-dependency edges.

3.4 Sufficient conditions for a replicated history to be 1-copy-SI

It turns out that avoiding G-1c and G-SIb* is not only necessary but also sufficient for a replicated history $RH$ to be 1-copy-SI. That is, for a replicated history $RH$, if all local histories $RH^k$ are SI, all $R^k$ commit the same update transactions, and
$USG(RH)$ has no G-1c and G-SIb* cycles, then $RH$ is 1-copy-SI. In particular, we are able to construct a global SI-history $H$ such that $SSG(H)$ has the same ww-, wr- and rw-dependency edges as $USG(RH)$. We start with some interesting properties of an $RH$ whose local histories are SI-histories.

Lemma 3.7. Let $RH$ be a replicated history over $rmap(T, R)$. At each $R^k \in R$, let $RH^k$ be an SI-history over $T^k$. Let each update transaction $T_i \in T$ commit at either all or none of the replicas. If $USG(RH)$ has no G-1c cycle, then for each $R^k, R^l \in R$ ($k \neq l$): $x_i \ll x_j$ in $RH^k$ $\Leftrightarrow$ $x_i \ll x_j$ in $RH^l$. That is, all local histories have the same version orders for all data items, and thus, the same ww-dependency edges in their $SSG(RH^k)$.

Proof Sketch. By definition of ww-dependency edges, $x_i \ll x_j$ implies a path $T_i \xrightarrow{ww^+} T_j$ in the local SSG. If there are different version orders $x_i \ll x_j$ in $RH^k$ and $x_j \ll x_i$ in $RH^l$, then $SSG(RH^k)$ has a path $T_i \xrightarrow{ww^+} T_j$ and $SSG(RH^l)$ a reverse path $T_j \xrightarrow{ww^+} T_i$. Thus, in contradiction to our assumption, $USG(RH)$ has a G-1c cycle. A complete proof is given in Appendix A.2.

As in SI the version order of an object is consistent with the commit order of the transactions updating the object, we can derive the following:

Proposition 3.8. Let $RH$ be a replicated history over $rmap(T, R)$. At each $R^k \in R$, let $RH^k$ be an SI-history over $T^k$. Let each update transaction $T_i \in T$ commit at either all or none of the replicas. If $USG(RH)$ has no G-1c cycles, then for any $T_i, T_j \in T$ writing a common data item x and for any replicas $R^A, R^B \in R$: $c_i^A \prec_t c_j^A$ in $RH^A$ if and only if $c_i^B \prec_t c_j^B$ in $RH^B$. That is, two conflicting committed transactions commit in the same order in all local histories.

Readers can verify that the replicated history $RH_{\text{exact-edge}}$ in Example 2 obeys Lemma 3.7 and Proposition 3.8. Each local history is SI, all histories commit the same set of update transactions and $USG(RH)$ is acyclic. Both histories have the same version order for x and commit $T_1$ and $T_2$ in the same order. Based on the discussion above, we can state sufficient conditions for a replicated history to be 1-copy-SI.

Theorem 3.9. Sufficient conditions for 1-copy-SI. Let $RH$ be a replicated history over $rmap(T, R)$. $RH$ is 1-copy-SI if the following holds:
(1) For each $R^k \in R$, $RH^k$ is an SI-history.
(2) For all update transactions $T_i \in T$ and for all $R^k, R^l \in R$: $c_i^k \Leftrightarrow c_i^l$.
(3) $USG(RH)$ has no G-1c or G-SIb* cycles.

Proof Sketch. To prove this, according to the definition of 1-copy-SI (Definition 3.4), it is sufficient to show that we are able to construct an SI-history $H$ over $T$ with the same ww-, wr-, and rw-dependencies as $USG(RH)$. The proof consists of three parts. First, we create a global history $H$ based on the dependency edges in $USG(RH)$. Then, we show that $H$ really has the same dependency edges as $USG(RH)$. Finally, we show that $H$ is actually an SI-history.
Fig. 7. USG and SCSG of $RH_{\text{exact-edge}}$: (a) $USG(H_{\text{exact-edge}})$, (b) incomplete $SCSG(H_{\text{exact-edge}})$, (c) complete $SCSG(H_{\text{exact-edge}})$.
We only sketch the main ideas behind each of these three parts and give a flavor of how the details can be derived. The complete proof and a detailed description of every part are given in Appendix A.3.

Part 1: We show the construction of $H$ along the history $RH_{\text{exact-edge}}$ of Example 2. We first build a total order between start and commit operations of all committed transactions. Then we fill in the read and write operations, determine the version orders, and determine the versions in the read operations.

We first order start and commit operations. Clearly, for each transaction $T_i$, $s_i \prec_t c_i$. Then, we consider the dependency edges in $USG$. The $USG(RH_{\text{exact-edge}})$ of our example is shown in Figure 7(a). Whenever there is a wr- or ww-dependency edge from $T_i$ to $T_j$, we require $c_i \prec_t s_j$; whenever there is an rw-dependency edge from $T_i$ to $T_j$, we require $s_i \prec_t c_j$. This is derived from Table II. We can present the $\prec_t$-relations built so far as a Start-Commit-Order Serialization Graph $SCSG(RH)$ where the start and commit operations are nodes, and there is an edge from node $n_i$ to node $n_j$ if $n_i \prec_t n_j$. Figure 7(b) shows $SCSG(RH_{\text{exact-edge}})$ for our example so far. As $\prec_t$ must be transitive, $n_i \prec_t n_j$ whenever there is a path from $n_i$ to $n_j$ in the graph. We can see that the graph is acyclic. Indeed, we show in Appendix A.3 that our construction rules avoid any cycle $c_i \prec_t c_i$ because any such cycle would be due to a G-1c or G-SIb* cycle in $USG$.

We now extend $SCSG$ to order any pair of start and commit operations because a history must order all start/commit pairs. For any $c_i$, $s_j$ where there is not yet a path from $c_i$ to $s_j$ or from $s_j$ to $c_i$ in $SCSG$ we set $s_j \prec_t c_i$. Figure 7(c) now shows the complete $SCSG(RH_{\text{exact-edge}})$. By construction, the graph remains acyclic, and thus, $\prec_t$ remains a partial order. The next step includes into the $\prec_t$-order of the global history $H$ the read and write operations of each committed transaction $T_i$ by simply ordering them between $s_i$ and $c_i$.
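The start/commit ordering step of Part 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction from Appendix A.3: events are encoded as ("s", name) and ("c", name) pairs, the USG is a set of (source, target, kind) triples, and `start_commit_order` is a hypothetical helper name. Unordered commit/start pairs are completed one at a time, so each added edge $s_j \prec_t c_i$ cannot close a cycle, and any topological order of the result is returned.

```python
from graphlib import TopologicalSorter, CycleError


def reachable(succ, src, dst):
    """Depth-first search: is there a path from src to dst?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(succ[node])
    return False


def start_commit_order(txns, usg_edges):
    """Return one order of start/commit events, or None if the constraints are cyclic."""
    succ = {(op, t): set() for t in txns for op in ("s", "c")}
    for t in txns:
        succ[("s", t)].add(("c", t))            # s_i before c_i
    for src, dst, kind in usg_edges:
        if kind in ("ww", "wr"):
            succ[("c", src)].add(("s", dst))    # c_i before s_j
        elif kind == "rw":
            succ[("s", src)].add(("c", dst))    # s_i before c_j
    # Completion: for a still unordered pair (c_i, s_j), set s_j before c_i.
    for ti in txns:
        for tj in txns:
            c, s = ("c", ti), ("s", tj)
            if ti != tj and not reachable(succ, c, s) and not reachable(succ, s, c):
                succ[s].add(c)
    # TopologicalSorter expects a mapping from node to its predecessors.
    preds = {node: set() for node in succ}
    for node, outs in succ.items():
        for out in outs:
            preds[out].add(node)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None


# USG of Example 2: an order exists.
usg_exact = {("T1", "T2", "ww"), ("T1", "T3", "wr"), ("T1", "T4", "wr"), ("T3", "T2", "rw")}
order_exact = start_commit_order(["T1", "T2", "T3", "T4"], usg_exact)

# USG of Example 3: the constraints c1 < s2 < c4 < s3 < c1 are cyclic.
usg_hole = {("T1", "T2", "wr"), ("T2", "T4", "rw"), ("T4", "T3", "wr"), ("T3", "T1", "rw")}
order_hole = start_commit_order(["T1", "T2", "T3", "T4"], usg_hole)
```

For Example 3 the function returns `None`, mirroring the contradiction derived there; for Example 2 it returns one of several admissible orders, since the relative order of some commits is not restricted.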
After that, we determine the version order in $H$. According to Lemma 3.7, all local histories have the same version orders for all data items. We use these version orders for $H$. Finally, each read operation of transaction $T_i$ in $H$ reads the same version that $T_i^l$ read in the local history $RH^l$ of $RH$ in which $T_i$ was local.

Coming back to our example, the global history $H_{\text{exact-edge}}$ given in Example 2 and repeated below conforms to the construction rules above. Note that there exist other possible global histories, as the commit order between $c_2$, $c_3$ and $c_4$ is not restricted. Also the order of non-conflicting operations can be varied.

$H_{\text{exact-edge}}$: $s_1\ w_1(x_1)\ w_1(y_1)\ c_1\ s_2\ w_2(x_2)\ s_3\ s_4\ c_2\ r_3(x_1)\ c_3\ r_4(y_1)\ c_4\ [x_1 \ll x_2]$

Part 2: Next, we have to show that the $SSG(H)$ of the global history $H$ has exactly the same ww-, wr- and rw-dependency edges as $USG(RH)$. This is true as the version order in $H$ is the same as in the local histories of $RH$, and transactions in $H$ read the same data versions as the corresponding local transactions do in $RH$.
We can easily confirm this property for our example history $H_{\text{exact-edge}}$.

Part 3: The final part shows that $H$ is actually an SI-history. The proof for G-1 and G-SIa can be easily derived by looking at how we constructed $H$ and the fact that $USG(RH)$ has no G-1c cycles. Showing that $H$ avoids G-SIb is slightly more complex as $SSG(H)$ has more edges than $USG(RH)$, namely start-dependency edges. The idea is to show that any G-SIb cycle in $SSG(H)$ would require $c_i \prec_t c_i$ in $H$, which is impossible because our construction of $\prec_t$ guarantees a partial order.

3.5 Observations

Proposition 3.8 indicates that all conflicting transactions must commit in the same order at all replicas. But when is a transaction allowed to commit? According to the Snapshot-Write property of SI, if two transactions have write/write conflicts and are concurrent, one of them must be aborted. This rule also needs to hold in a replicated database. But when are two transactions concurrent in a distributed system? In a non-replicated system, two transactions $T_i$ and $T_j$ are concurrent if their lifetimes overlap (i.e., $s_i \prec_t c_j \wedge s_j \prec_t c_i$). We can define the concurrency of two transactions in a replicated database according to this rule.

Definition 3.10. Concurrency. Let $RH$ be a replicated history over $rmap(T, R)$. Two transactions $T_i, T_j \in T$ are concurrent in $RH$, iff $\exists R^k, R^l \in R$: $s_i^k \prec_t c_j^k / a_j^k$ in $RH^k$ and $s_j^l \prec_t c_i^l / a_i^l$ in $RH^l$.

That is, $T_i$ and $T_j$ are concurrent if and only if $T_i$ starts before $T_j$ commits/aborts at some replica and $T_j$ starts before $T_i$ commits/aborts at some replica. Note that $R^k$ might be the same as $R^l$. In that case, $T_i$ and $T_j$ are concurrent in one local history and are therefore considered concurrent. But they are also considered concurrent if $T_i$ executes completely before $T_j$ in one history and completely after $T_j$ in another history. Based on this definition, we can derive another rule for 1-copy-SI.

Lemma 3.11. Let $RH$ be a replicated history over $rmap(T, R)$ such that $RH$ is 1-copy-SI.
If two transactions $T_i, T_j \in T$ have write/write conflicts and are concurrent in $RH$, at least one of them aborts. The proof of this lemma is given in Appendix A.4.

4. SNAPSHOT ISOLATION AND INTEGRITY CONSTRAINTS

Database systems allow the definition of a whole range of integrity constraints, such as primary keys and foreign keys. In this section, we discuss the relationship between snapshot isolation and integrity constraints in a non-replicated system. The next section extends our notions to a replicated environment.

4.1 Motivation

An integrity constraint puts constraints on the existence and values of data objects in the system. During the execution of a transaction these constraints might be violated. However, at commit time, all constraints must be obeyed. The simplest constraint is the primary key constraint that disallows the existence of two records in a table with the same value in the primary key attribute. A very common constraint is the foreign key constraint. Assume a department relation D(did, location) with identifier did as primary key, and an employee relation
Fig. 8. $SSG(H_{\text{skew}})$ in Example 4.
E(eid, ename, did) with identifier eid as primary key and the attribute did as foreign key referring to the department the employee works in. The foreign key constraint requires that if there is an employee record with did=‘d1’ in the employee table, then there is a department record in the department table with did=‘d1’. Another example is that the balance of an account may not be below zero. A bit more advanced, the constraint could require the sum of the balances of all accounts of a client to be at least zero while each individual account can be below zero.

Such constraints can be defined at database design time and are then enforced by the database system itself. In order to do so, a database system needs to perform some implicit read operations upon receiving certain update requests. For instance, upon the insert of a new tuple, the system checks whether a record with the same primary key value already exists, and if so, aborts the transaction. In the foreign key example above, upon the insert of an employee record or the update of the did field of an existing employee record, the system performs an implicit read operation on the department table to check whether a department record exists with the corresponding value in the did attribute. If it exists, the insert/update of the employee record is allowed; otherwise the transaction is aborted. Similarly, upon the delete of a department record or the change of the value of the did field, the system looks at the employee table and checks whether an employee record exists that has the same did value. If yes, the transaction is aborted; otherwise the modification is allowed (see footnote 10). In the examples with the account balances, the values of the balances are checked to determine whether the update is possible. In most cases, these read operations are predicate reads.
For instance, in the foreign key example, when inserting an employee tuple, what matters is the existence of a corresponding department tuple, which can only be expressed as a predicate read. The problem is that if these integrity read operations run under snapshot isolation, integrity constraints could be violated.

Footnote 10. In this paper we do not consider the SQL CASCADE option where the delete/update of the department tuple would automatically delete/update all corresponding employee tuples.

Example 4. In fact, the most common example given in the literature to show that SI does not provide serializability is an example of the violation of the constraint that the sum of two given accounts should be above zero. If transactions want to withdraw money from one of the accounts, the values of both accounts have to be checked. Let x and y be such accounts with primary key values id=‘a1’ and id=‘a2’. Let $T_0$ have created versions $x_0$ and $y_0$ with balances of 50 for both. A further account $z_0$ exists in the accounts table. Now assume two concurrent transactions, one withdrawing 80 from x and the other 80 from y.

$H_{\text{skew}}$: $s_1\ s_2\ r_1(\text{sum(balance)} \geq 80 : id=a1 \lor id=a2 : \{x_0, y_0\} : \{x_0, y_0, z_0\})\ r_2(\text{sum(balance)} \geq 80 : id=a1 \lor id=a2 : \{x_0, y_0\} : \{x_0, y_0, z_0\})\ w_1(x_1)\ w_2(y_2)\ c_1\ c_2$

The check is modeled as a predicate read (see Section 2.1.3). The input is $Iset(P) = \{x_0, y_0, z_0\}$, the predicate of $P$ is $id=a1 \lor id=a2$ and thus, $P$ finds the matching set $Oset(P) = \{x_0, y_0\}$. The evaluation $F$ is $\text{sum(balance)} \geq 80$; it executes over $Oset(P)$ and returns true. Both transactions perform the same predicate read over the same versions, which were the committed versions as of the start time of $T_1$ and $T_2$. At the end of execution, the sum over both balances is below zero. The SSG is shown in Figure 8. The execution is SI but integrity constraints are violated.

The problem is that reading from a snapshot is not the right thing to do for checking integrity constraints because it does not really help if the constraint holds at the beginning of the transaction. Instead, the constraint needs to hold at the time the transaction commits.

4.2 A new isolation level: SI+IC

SI-based database systems guarantee that integrity constraints are not violated by distinguishing between standard read operations, which read from a snapshot, and integrity reads that are done to check constraint violations. We model a new isolation level SI+IC based on integrity reads. It is stronger than the basic SI that we discussed in the last two sections, because it avoids integrity constraint violations. It is weaker than serializability because standard read operations continue to read from a snapshot. An SI+IC history should satisfy the following two requirements. (1) It should provide SI properties to operations not related to integrity constraints; (2) If a transaction commits, its updates do not violate the integrity of the database. We model an integrity read operation as a special form of a predicate read.

Definition 4.1. Integrity Read.
An integrity read operation of transaction $T_i$ is a special predicate read operation $ir_i(F:P:Oset(P):Iset(P)) = \{f, t\}$ where the evaluation function $F$ always returns a boolean outcome of either true ($t$) or false ($f$). Furthermore, the predicate in function $P$ may only contain single-record conditions, i.e., for any $x_j \in Iset(P)$, $x_j \in Oset(P)$ if and only if $P(\{x_j\}) = \{x_j\}$.

Requiring that the predicate be evaluated individually on each version in $Iset(P)$, without taking the other versions in $Iset(P)$ into account, disallows complex conditions such as joins. We will need this restriction to define anti-dependencies appropriately. No such restriction is needed for $F$. Note that most common integrity constraints can be checked using our definition of integrity reads. This is true, for instance, for all examples of integrity constraints discussed in this paper.

Example 5. Assume in our foreign key example tables D(did, location) and E(eid, ename, did), with D consisting of $d1_0$ = (‘d1’, ‘Chicago’) and $d2_0$ = (‘d2’, ‘New York’) inserted by transaction $T_0$. When transaction $T_1$ inserts a new employee (‘e1’, ‘Mike’, ‘d1’) it performs an integrity read $ir_1(\neq\emptyset : D.did=d1 : \{d1_0\} : \{d1_0, d2_0\})$. The predicate defined in $P$ is D.did=d1, searching for records in D with id d1. The versions accessed are $Iset(P) = \{d1_0, d2_0\}$. The function $F$ is $\neq\emptyset$. It receives the only matching version $Oset(P) = \{d1_0\}$ as input, and thus, returns true.
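The structure of an integrity read can be sketched in a few lines. This is an illustrative model only: the function name `integrity_read` and the representation of versions as dictionaries are assumptions of the sketch. The predicate is applied to each version on its own, which reflects the single-record restriction of Definition 4.1, and the evaluation function then runs over the matching set.

```python
def integrity_read(predicate, evaluate, iset):
    """Return (outcome of F, Oset(P)) for an integrity read over Iset(P)."""
    # P is applied to each version individually (single-record condition).
    oset = [version for version in iset if predicate(version)]
    return evaluate(oset), oset


# Example 5: department versions written by T0.
d1_0 = {"did": "d1", "location": "Chicago"}
d2_0 = {"did": "d2", "location": "New York"}

# T1 inserts employee ('e1', 'Mike', 'd1'):
# P is D.did = 'd1', F is "Oset(P) is not empty".
outcome, oset = integrity_read(
    predicate=lambda version: version["did"] == "d1",
    evaluate=lambda matches: len(matches) > 0,
    iset=[d1_0, d2_0],
)
```

Here `oset` contains only the version $d1_0$ and `outcome` is true, so the insert would be allowed to proceed.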
Of course, only performing integrity reads is not enough. The transaction must also perform the proper actions depending on the outcome of the integrity read.

Definition 4.2. IC-obeying. We say a transaction $T_i$ is IC-obeying if it performs the integrity reads necessary to confirm that its write operations do not violate integrity constraints and it aborts when at least one of these integrity reads returns false.

Example 6. Let us stay with the foreign key example. Assume the above tables D(did, location) and E(eid, ename, did) and the same data versions $d1_0$ and $d2_0$ in the department table. Furthermore, the employee table has two unborn data versions $e1_{init}$ and $e2_{init}$. Now assume a transaction $T_1$ inserts employee e1 and transaction $T_2$ deletes the department d1.

$T_1$: insert into E values (‘e1’, ‘Mike’, ‘d1’);
$T_2$: delete from D where did=d1;

Now assume a serial execution where $T_1$ runs before $T_2$. For simplicity, we ignore in this and all following examples that $T_1$ also needs to check a primary key constraint.

$H_{\text{IC-obey}}$: $s_1\ ir_1(\neq\emptyset : D.did=d1 : \{d1_0\} : \{d1_0, d2_0\})=t\ w_1(e1_1)\ c_1\ s_2\ ir_2(=\emptyset : E.did=d1 : \{e1_1\} : \{e1_1, e2_{init}\})=f\ a_2$

$T_1$'s integrity read determines that a department tuple with department id d1 exists and returns true. Thus, $T_1$ performs the insert and commits. After that, $T_2$'s integrity read determines that the department to be deleted already has an employee and returns false. Thus, $T_2$ aborts.

As mentioned above, it is not the transaction written by the application programmer that performs the integrity reads. Instead, the database system automatically extends application transactions with the necessary integrity reads and forces them to be IC-obeying. In commercial systems the integrity read typically takes place before the corresponding write operations or just at commit time (using deferred constraint checking). In theory, it could be any time during the execution of the transaction.
The important issue is that the integrity constraint should hold at the time the transaction commits. That is, while the read takes place sometime before the commit, it should still be valid at the time of commit. It is useless if a transaction $T$ performs an integrity read on an object x, but the object x is overwritten before $T$ commits in such a way that the integrity constraint does not hold anymore. This is exactly the problem of history $H_{\text{skew}}$ of Example 4. While $T_2$'s read finds a sufficiently large balance, the balance is too low at commit time.

The question is what it means that the integrity read is still valid at the time of commit. We can observe that the outcome of an integrity read $ir_i(F:P:Oset(P):Iset(P))$ can only be changed by a write operation if it affects $Oset(P)$, as this is the input for the evaluation function $F$. For instance, in the above foreign key example, it matters whether $T_2$ performs its integrity read before $T_1$'s insert ($Oset(P) = \{\}$ and thus evaluation $F$ returns true) or after the insert ($Oset(P) = \{e1_1\}$ and $F$ returns false). In contrast, if $T_1$ inserted (‘e1’, ‘Mike’, ‘d2’), then $T_2$'s integrity read would return true independently of when $T_1$'s insert occurs, because $Oset(P)$ would always be the empty set. We express such behavior by defining anti-dependencies for integrity reads differently than for ordinary reads.
Definition 4.3. IC-dependencies. Let $H$ be a history over transactions $T$. Let $T_i \in T$ perform an integrity read $ir_i(F:P:Oset(P):Iset(P))=t$.
1. $\forall x_j \in Iset(P)$, $T_i$ directly IC-read-depends on $T_j$.
2. $\forall x_j \in Oset(P)$, if $x_k$ immediately follows $x_j$ in the version order, $T_k$ directly IC-anti-depends on $T_i$.
3. $\forall x_j \in Iset(P) \setminus Oset(P)$ and $x_k$, $T_k$ directly IC-anti-depends on $T_i$ if the following conditions are fulfilled:
• $x_j \ll x_k$ and
• $P(\{x_k\}) = \{x_k\}$ and
• $\forall x_l$ such that $x_j \ll x_l \ll x_k$: $P(\{x_l\}) = \emptyset$

Property (1) defines IC-read-dependencies in the same way as for normal predicate reads. Property (2) indicates that $T_k$ directly IC-anti-depends on $T_i$ if there is a data item x, the version $x_j$ accessed by $T_i$'s integrity read matches the predicate and $T_k$ creates the next version for x. This IC-anti-dependency reflects that if $T_i$'s integrity read accessed $x_k$ instead of $x_j$, the outcome of its evaluation $F$ might change. Property (3) indicates that $T_k$ directly IC-anti-depends on $T_i$ if there is a data item x, the version $x_j$ accessed by $T_i$ does not match the predicate, and $T_k$ is the first transaction to create a version of x that matches the predicate while all versions $x_l$ that are in the version order after $x_j$ but before $x_k$ do not match the predicate. If $T_i$'s integrity read accessed $x_l$ instead of $x_j$, the outcome of $F$ would not change as neither $x_j$ nor $x_l$ appears in $Oset(P)$; thus $T_l$ does not IC-anti-depend on $T_i$. However, if $T_i$'s integrity read accessed $x_k$ instead of $x_j$, $Oset(P)$ would contain $x_k$ and the outcome of $F$ could change. With this, we express the following requirements for integrity reads.

Definition 4.4. IC-Consistency. Let $H$ be a history over a set of transactions $T$. An integrity read operation $ir_i(F:P:Oset(P):Iset(P))=t$ of committed transaction $T_i \in T$ is IC-consistent if the following holds.
(1) If $T_i$ directly IC-read-depends on transaction $T_j$ due to this integrity read then $c_j \prec_t c_i$.
(2) If transaction $T_k$ directly IC-anti-depends on $T_i$ due to this integrity read then $c_i \prec_t c_k$.

Property (1) guarantees that the read reflects a committed version at the time $T_i$ commits. Property (2) guarantees that any transaction that changes the outcome of the integrity read commits after $T_i$. If all integrity reads of a transaction $T$ are IC-consistent and $T$ is IC-obeying, then it is guaranteed that the integrity constraints related to $T$'s write operations hold when $T$ commits.

Example 7. Let us continue with the same setup as in Example 6 but with an interleaved execution. In the following history, although the transactions are IC-obeying, the foreign key constraint is violated at the end of the execution.

$H_{\text{IC-bad}}$: $s_1\ s_2\ ir_1(\neq\emptyset : D.did=d1 : \{d1_0\} : \{d1_0, d2_0\})=t\ ir_2(=\emptyset : E.did=d1 : \{\} : \{e1_{init}, e2_{init}\})=t\ w_1(e1_1)\ w_2(d1_{dead})\ c_2\ c_1$

$T_1$'s integrity read finds a department with id d1. Hence, $T_1$ can continue to insert the employee tuple. Similarly, $T_2$'s integrity read finds no employee associated
with the department. Hence, $T_2$ can continue to delete the department. After both commit, the employee (‘e1’, ‘Mike’, ‘d1’) refers to a non-existing department. Clearly this history does not respect foreign key constraints. A closer look reveals that $T_2$ directly IC-anti-depends on $T_1$ as $T_1$'s integrity read accesses data version $d1_0$, $d1_0$ is in $Oset(P)$ and $T_2$ creates the next version $d1_{dead}$ of d1. However, $T_1$ does not commit before $T_2$. Thus, property (2) of IC-consistency (Definition 4.4) is violated and $T_1$'s integrity read is not IC-consistent. Note that $T_1$ also directly IC-anti-depends on $T_2$ as $T_2$'s integrity read accesses $e1_{init}$ and $e2_{init}$, which both do not match $P$, and $T_1$ creates data version $e1_1$ that matches $P$. Nevertheless $T_2$'s integrity read is IC-consistent, as $T_2$ commits before $T_1$.

We now derive our new isolation level as follows.

Definition 4.5. Snapshot Isolation and Integrity Constraints (SI+IC). A history $H$ over a set of IC-obeying transactions $T$ is an SI+IC history if it fulfills the Snapshot-Read and Snapshot-Write properties (Definitions 2.3 and 2.4), and all integrity reads of committed transactions are IC-consistent.

Using this definition, the history $H_{\text{IC-bad}}$ of Example 7 is not an SI+IC history because $T_1$ has an integrity read that is not IC-consistent. If $T_1$'s integrity read did actually read the version $d1_{dead}$ (leading to $Oset(P)$ being empty and $F$ returning false) but the remaining operations remained the same, then $T_1$ would not be IC-obeying anymore. The integrity read would detect that no department exists but the insert would nevertheless occur. In contrast, $H_{\text{IC-obey}}$ of Example 6 is an SI+IC history, as the transactions are IC-obeying, and $T_1$'s integrity read is IC-consistent. Note that IC-consistency is not defined for $T_2$'s integrity read as $T_2$ does not commit.

In fact, our definition is somewhat stronger than what is needed. Let us explain this through an example.

Example 8.
In a variation of the foreign key example, T2 does not delete the department tuple but simply changes the location of the department (e.g., update D set location = ‘New York’ where did = ‘d1’). Note that T2 does not need to perform an integrity read for this update. Consider the following execution:

HIC−rename : s1 s2 ir1(≠∅:D.did=d1:{d10}:{d10, d20})=t w1(e11) w2(d12) c2 c1

T1’s integrity read has the initial version d10 matching the predicate and the write operation is executed. Then, T2 renames the department, creating d12, and commits before T1 terminates. As T2 creates a new data version d12 where the previous version d10 is an element of Oset(P) of T1’s integrity read, the integrity read is not IC-consistent, and thus HIC−rename is not considered SI+IC. However, the history does not violate integrity constraints. If the integrity read were performed on d12, the outcome would still be true. An execution with deferred integrity reads (performed at commit time) would capture this fact:

HIC−rename′ : s1 s2 w1(e11) w2(d12) c2 ir1(≠∅:D.did=d1:{d12}:{d12, d20})=t c1

If the integrity read is performed at commit time on the latest committed versions, then the true state of the database at commit time is captured. In HIC−rename′ both transactions commit and the history is SI+IC.
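The outcome of the foreign key integrity read in Examples 7 and 8 depends only on which version of d1 the read accesses. The following Python sketch makes this concrete under simplifying assumptions: the names `Version` and `foreign_key_read` are hypothetical, a version is modelled as a record with a `dead` flag, and the set of accessed versions is passed in directly rather than derived from a history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One version of a department tuple (illustrative model only)."""
    name: str           # label used in the histories, e.g. "d10"
    did: str            # department id stored in the tuple
    dead: bool = False  # True for the version installed by a delete


def foreign_key_read(accessed, did):
    """Evaluate the integrity read 'some department with D.did = did exists'.

    Returns (oset, outcome): the accessed versions that match the predicate
    and whether that set is non-empty.
    """
    oset = [v for v in accessed if not v.dead and v.did == did]
    return oset, len(oset) > 0


d10 = Version("d10", "d1")
d12 = Version("d12", "d1")                   # created by T2's update in Example 8
d1dead = Version("d1dead", "d1", dead=True)  # created by T2's delete in Example 7
d20 = Version("d20", "d2")

early = foreign_key_read([d10, d20], "d1")       # read performed before T2's write
deferred = foreign_key_read([d12, d20], "d1")    # read performed at commit time
deleted = foreign_key_read([d1dead, d20], "d1")  # read performed after a delete
```

In this sketch the early and the deferred read both return true, which matches the observation that HIC−rename does not violate the constraint, while a read that accesses d1dead returns false.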
Despite being too restrictive, i.e., some histories that do not violate integrity constraints (e.g., HIC−rename) are not considered SI+IC, we think our definition is appropriate as it is simple and, as will be shown in the next section, captures well how locking-based integrity reads and deferred integrity checking work.

4.3 Implementing Integrity Constraints

Many commercial database systems use locking for integrity reads. As integrity reads are mostly predicate reads, this is tricky. Thus, often only primary key, foreign key, and constraints on individual tuples are handled correctly, as they can be implemented through locks on the primary key index. In many cases, the integrity read takes place immediately before the corresponding write operation is executed. The read does not read from a snapshot but from the latest committed version, and it has to be guaranteed that the outcome of the evaluation does not change until commit time. Thus, long locks are set.

Example 9. Assume again T1 inserting an employee and T2 deleting the department. T1 has to get a lock on the department key d1 before inserting the employee tuple. T2 has to get a write lock on d1 as it is going to delete this record. Furthermore, it has to check for employee records with foreign key d1. It has to find all committed entries, i.e., it may not read from the snapshot. Let us denote with S (X) a shared (exclusive) lock request. Then, a possible history is:

Hlocks : s1 s2 S1(D.did=d1) X2(D.did=d1)[blocked] ir1(≠∅:D.did=d1:{d10}:{d10, d20})=t w1(e11) c1 ir2(=∅:E.did=d1:{e11}:{e11, e2init})=f a2

In Hlocks, T1 is the first to get the shared lock on d1; it then finds a department record. When T2 now tries to get an exclusive lock on d1, T2 has to wait. T1 inserts the employee tuple, commits, and releases its lock. Now T2 gets the lock, performs the integrity read over all committed versions of employee records and finds the record inserted by T1. It has to abort.
Note that if a transaction wants to delete or update d2 it can do so concurrently, as T1 and T2 only set locks on d1. If T2 got the lock first, then T1 would be blocked. T2 would check the employee table with only unborn versions, and thus delete d1, commit and release its locks. After that, T1 would get the lock on d1, find no department tuple, and abort. Two parts play a role in the correct implementation. The lock on d1 guarantees that the conflict is detected and one transaction is blocked until the other terminates. The fact that the integrity read of T2 does not access a snapshot but the latest committed versions guarantees that no updates are missed.

Integrity constraints can also be defined with the “deferred option”. A possible implementation is as follows. The write operation first executes without checking for any integrity violation. At the end of the transaction, a validation takes place, performing integrity reads on the latest committed values. The values of the transaction’s own writes can be considered. For simplicity of description, we assume validation and commit are done atomically so no locks need to be set.

Example 10. Taking again the example above, one possible history could be

Hopt : s1 s2 w1(e11) w2(d1dead) ir1(≠∅:D.did=d1:{d10}:{d10, d20})=t c1
ir2(=∅:E.did=d1:{e11}:{e11, e2init})=f a2

Both perform their writes. T1 first enters validation, reads the last committed version d10 of d1 and commits. Thanks to the atomicity of validation, T2 reads the last committed version e11 during its validation. It has to abort. If validation is performed in reverse order, the situation is similar.

The implementation of integrity reads to check primary key constraints is similar. In a locking-based approach, the primary key value would be locked. For other constraints on a single record (e.g., the balance may not be below zero), SI already disallows two transactions to concurrently perform updates. For constraints spanning more records (e.g., the sum of the balances of a set of accounts), more advanced predicate locks would be needed. Thus, many systems do not allow the specification of such constraints. In particular, only few systems support assertions. In such cases, the application has to include explicit read operations in the transactions. However, in this case, the database system is not able to recognize them as integrity reads, and thus typically lets them, incorrectly, read from a snapshot.

4.4 SI+IC in GID

We have seen in Section 2 how we can check a set of phenomena (G-1, G-SI) to determine whether a history runs under SI. In this section, we show how we can extend the list of phenomena to check whether a history runs under SI+IC. Apart from the IC-read- and IC-anti-dependencies that were already introduced in the last section, we say that Tj commit-depends on Ti if Ti commits before Tj commits. All new dependencies are summarized in Table III. We have to extend the definition of SSG to include these new dependencies.

Dependency Type             SSG                               Edge name
Directly IC-read-depends    $T_i \xrightarrow{wir} T_j$       IC-read-dependency edge or wir-edge
Directly IC-anti-depends    $T_i \xrightarrow{irw} T_j$       IC-anti-dependency edge or irw-edge
commit-depends              $T_i \xrightarrow{C} T_j$         commit-dependency edge
Table III. IC dependencies

Definition 4.6. Start-ordered Serialization Graph (SSG).
The SSG(H) of a history H over a set of IC-obeying transactions T is a directed graph where each node in SSG(H) corresponds to a committed transaction in H, and there is a ww-, wr-, rw-, wir-, irw-, start-, or commit-dependency edge from Ti to Tj if Tj directly write-, directly read-, directly anti-, directly IC-read-, directly IC-anti-, start-, or commit-depends on Ti, respectively.

Given that the graph now has more types of edges, the question is how many of the phenomena G-1 and G-SI have to be adjusted to consider the new edges, and whether we have to add new phenomena. It turns out that we have to adjust very little. G-1 and G-SIa remain as they are. We only have to adjust G-SIb and add one new phenomenon:

—G-SIb: Missed Effects. A history H over a set of IC-obeying transactions T exhibits phenomenon G-SIb if SSG(H) contains a directed cycle with exactly one
rw-dependency edge that is prefixed by a ww-, wr-, or start-dependency edge. We refer to such a cycle as a G-SIb cycle.

—G-IC: IC Violation. A history H over a set of IC-obeying transactions T exhibits phenomenon G-IC if SSG(H) contains a wir- or irw-dependency edge from Ti to Tj without there also being a commit-dependency edge from Ti to Tj.

G-IC reflects the requirements that an object version accessed in an integrity read must be installed before the reading transaction commits (wir-dependency edge accompanied by a commit-dependency edge) and that if a later version changes the outcome of an integrity read, then it is only installed after the reading transaction commits (irw-dependency edge accompanied by a commit-dependency edge). G-SIb is simply extended to reflect that the phenomenon only occurs if the rw-dependency edge in the cycle is prefixed by ww-, wr- or start-dependency edges, as SI+IC histories are allowed to have a cycle where the rw-dependency edge is prefixed by a wir- or irw-dependency edge.

Example 11. Assume a transaction T0 created versions x0, y0 and z0. Now assume the following history

Hcycle : s1 s2 r1(x0) w2(x2) w2(y2) c2 s3 r3(y2) w3(z3) c3 ir1(F:P:{z3}:{z3})=t c1

This history is SI+IC as the read operations r1(x0) and r3(y2) read committed versions as of transaction start, no conflicting writes exist, and T1’s integrity read accesses z3, which is the latest installed version at the time T1 commits. As SSG(Hcycle) (Fig. 9) contains a cycle where an rw-dependency edge is prefixed by a wir-dependency and a commit-dependency edge, such cycles need to be allowed.

We now show that the avoidance of G-1, G-SI and G-IC is sufficient and necessary for a history to be SI+IC.

Theorem 4.7. Necessary conditions for SI+IC. An SI+IC history H over a set of IC-obeying transactions T avoids G-1, G-SI and G-IC.

Proof Sketch.
As G-1 and G-SIa are not concerned with integrity reads, the main part of the proof is to show that G-IC and the new definition of G-SIb are avoided. This is straightforward for G-IC. If Tj directly IC-read-depends on Ti, then an SI+IC history orders $c_i \prec_t c_j$. Therefore, the wir-dependency edge $T_i \xrightarrow{wir} T_j$ in SSG(H) is accompanied by a $T_i \xrightarrow{C} T_j$ edge. If Tj directly IC-anti-depends on Ti, then an SI+IC history orders $c_i \prec_t c_j$. Therefore, the irw-dependency edge $T_i \xrightarrow{irw} T_j$ in SSG(H) is accompanied by a $T_i \xrightarrow{C} T_j$ edge.

Assume that G-SIb is not avoided. There will be a cycle in which the rw-dependency edge is prefixed by a ww-, wr- or start-dependency edge. Since G-SIa and G-IC are avoided, there must also be a cycle that consists only of start- and commit-dependency edges and a single rw-dependency edge. That is, the cycle has the form

$(T_i \xrightarrow{S*} T_j \xrightarrow{C*} T_k)^* \xrightarrow{S+} T_p \xrightarrow{rw} T_i$.

One can derive that this implies $(c_i \prec_t s_j \prec_t c_j \prec_t c_k) \prec_t s_p \prec_t c_i$ in H, which is impossible. The detailed and complete proof is given in Appendix B.1.
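The G-IC condition can be checked mechanically once the IC-dependency edges and the commit order of a history are known. The sketch below is only an illustration: the function name `g_ic_violations` and the edge encoding as (Ti, Tj, kind) triples are assumptions, and every transaction that appears in an edge is assumed to be committed, as is the case for nodes of SSG(H).

```python
def g_ic_violations(ic_edges, commit_order):
    """Return the wir/irw edges that lack a matching commit-dependency edge.

    ic_edges:     iterable of (Ti, Tj, kind) with kind in {"wir", "irw"},
                  meaning SSG(H) has an edge of that kind from Ti to Tj.
    commit_order: list of committed transactions, earliest commit first.
    """
    position = {txn: i for i, txn in enumerate(commit_order)}
    return [
        (src, dst, kind)
        for src, dst, kind in ic_edges
        if position[src] >= position[dst]  # src does not commit before dst
    ]


# Hcycle (Example 11): T1's integrity read accesses z3, which T3 wrote;
# the commit order is c2, c3, c1.
h_cycle = g_ic_violations([("T3", "T1", "wir")], ["T2", "T3", "T1"])

# HIC-bad (Example 7): each transaction IC-anti-depends on the other,
# and T2 commits before T1.
h_ic_bad = g_ic_violations(
    [("T1", "T2", "irw"), ("T2", "T1", "irw")], ["T2", "T1"]
)
```

Under these assumptions Hcycle yields no violating edge, whereas for HIC−bad the irw-dependency edge from T1 to T2 is reported because T1 does not commit before T2.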
Fig. 9. SSG(Hcycle) of Example 11
Fig. 10. SSG(HIC−bad) of Example 12
Theorem 4.8. Sufficient Conditions for SI+IC. If a history H over a set of IC-obeying transactions T avoids G-1, G-SI and G-IC, then it is an SI+IC history.

Proof Sketch. We have to show that H fulfills the Snapshot-Read and Snapshot-Write properties and that all its integrity reads are IC-consistent. For Snapshot-Read and Snapshot-Write we refer to Appendix B.2. Showing that there is no integrity read that is not IC-consistent is again straightforward. As the avoidance of G-IC guarantees that each IC-dependency edge from Ti to Tj has a commit-dependency edge in the same direction, the proper commit order required by the IC-consistency Definition 4.4 is always maintained. The details are given in Appendix B.2.

Example 12. Let us revisit HIC−bad of Example 7.

HIC−bad : s1 s2 ir1(≠∅:D.did=d1:{d10}:{d10, d20})=t ir2(=∅:E.did=d1:{}:{e1init, e2init})=t w1(e11) w2(d1dead) c2 c1

SSG(HIC−bad) is shown in Figure 10. In the figure, the irw-dependency edge from T2 to T1 is associated with a commit-dependency edge, but the other irw-dependency edge is not. Hence, HIC−bad exhibits the G-IC phenomenon. As discussed, it is not an SI+IC history because one of the integrity reads is not IC-consistent, and this anomaly is expressed through the G-IC phenomenon.

4.5 Observations

Theorem 4.8 states that it is sufficient to show that a history avoids G-1, G-SI and G-IC in order to know that it is an SI+IC history. Now we show that such a history avoids a further phenomenon:

—A history H over a set of IC-obeying transactions T exhibits phenomenon G-1c* if SSG(H) contains a cycle that consists entirely of wr-, ww-, wir-, and irw-dependency edges. We refer to such a cycle as a G-1c* cycle.

Lemma 4.9. An SI+IC history H avoids G-1c*.

Proof. Assume it has such a cycle. Because G-SIa and G-IC are avoided, there is also a cycle that consists only of commit- and start-dependency edges. This is impossible since each edge from Ti to Tj in the cycle implies ci ≺t cj, and thus transitively ci ≺t ci.

5. 1-COPY-SI+IC

In this section we extend our definition of 1-copy-SI to cover integrity constraints, denoting the new correctness criterion as 1-copy-SI+IC, and discuss sufficient conditions for a replicated history to be 1-copy-SI+IC.

A first issue is how to handle integrity reads in a replicated environment. Normal reads are executed at only one replica. However, an integrity read is something
tightly related to the write operation. It checks something that has to hold in order for the write operation to be allowed to execute. One possibility to assure the proper behavior of the write is to include the integrity read at all replicas. We assume this execution model. Therefore, we extend the ROWA mapper function of Definition 3.1 to include integrity reads at all replicas. We denote as $IRS_i$ the set of all integrity read operations of transaction Ti.

Definition 5.1. Mapper function. A ROWA mapper function, rmap-ic, takes a set of IC-obeying transactions T and a set of replicas R as inputs, and transforms T into a set of IC-obeying transactions T′ = rmap-ic(T, R). rmap-ic(T, R) transforms each update transaction Ti ∈ T into a set of transactions $\{T_i^k \mid R_k \in R\}$. In this set there is exactly one local transaction $T_i^l$ where $WS_i^l = WS_i$, $IRS_i^l = IRS_i$ and $RS_i^l = RS_i$ (Ti is local at Rl). The rest are remote transactions $T_i^r$, where $WS_i^r = WS_i$, $IRS_i^r = IRS_i$ and $RS_i^r = \emptyset$ (Ti is remote at Rr). A read-only transaction Ti is transformed into a single local transaction $T_i^l$ with $RS_i^l = RS_i$. We denote as $T^k = \{T_i^k \mid T_i^k \in T'\}$ the set of transactions executed at replica Rk.

From there we define 1-copy-SI+IC as below. Recall that the USG(RH) of a replicated history RH contains the ww-, wr- and rw-dependency edges of the SSG($RH^k$) of all replicas.

Definition 5.2. 1-copy-SI+IC. Let $RH = \bigcup_k RH^k$, Rk ∈ R, be a replicated history over rmap-ic(T, R). We say that RH is 1-copy-SI+IC if
(1) For each Rk ∈ R, $RH^k$ is an SI+IC history;
(2) For all update transactions Ti ∈ T and for all Rk, Rl ∈ R: $c_i^k \Leftrightarrow c_i^l$;
(3) There exists a global SI+IC history H over IC-obeying T such that
(a) SSG(H) and USG(RH) have the same nodes.
(b) SSG(H) has exactly the same ww-, wr-, and rw-dependency edges as USG(RH).

Note that IC-dependency edges are not considered in USG(RH). Thus, local histories can have different integrity reads as long as all integrity reads have the same effect, i.e., either all local histories and the global history have integrity reads that return true, and thus the transaction commits, or all local histories and the global history have integrity reads that detect a violation, and thus abort the transaction. Which version of a data item each of the histories reads is not relevant, as long as the outcome of the integrity read is the same everywhere.

Example 13. Let us revisit the example where T1 inserts an employee and T2 changes the name of the corresponding department. Again, before the execution there exist data versions d10, d20, e1init and e2init. A possible execution is

$RH^A_{rename}$ : s1^A ir1^A(≠∅:D.did=d1:{d10}:{d10, d20})=t w1^A(e11) c1^A s2^A w2^A(d12) c2^A
$RH^B_{rename}$ : s2^B w2^B(d12) c2^B s1^B ir1^B(≠∅:D.did=d1:{d12}:{d12, d20})=t w1^B(e11) c1^B

T1 performs an integrity read on d10 at RA, and on d12 at RB. In both cases, the subsequent write (insert of employee) can succeed. SSG($RH^A_{rename}$) and SSG($RH^B_{rename}$) are shown in Figure 11(a) and (b), respectively. At the commit time of any of the transactions, no integrity constraint is violated. The USG(RHrename) only contains T1 and T2 but no edges. A global history could be equivalent to either $RH^A_{rename}$ or $RH^B_{rename}$. Although T1 reads different versions at the different replicas, this does not matter as both integrity reads return true.

Fig. 11. SSGs of Example 13: (a) SSG($RH^A_{rename}$); (b) SSG($RH^B_{rename}$)
In the following we describe a set of sufficient conditions for a replicated history to be 1-copy-SI+IC. If these conditions hold then we are able to construct an SI+IC history H that has the same wr-, ww-, and rw-dependency edges as USG(RH), and for every committed transaction Ti, there exists at least one replica Rk such that the integrity reads of Ti in H access exactly the same data versions as the integrity reads of $T_i^k$ in $RH^k$. If $RH^k$ is an SI+IC history over IC-obeying transactions, we have the guarantee that these integrity reads return true. Therefore, if the corresponding transaction Ti in H performs the integrity reads over the same versions, we know that they also return true, and thus Ti is also IC-obeying. Note that we allow different transactions to have integrity reads from different replicas, e.g., Ti can have the same integrity reads as in $RH^k$, while Tj has the same integrity reads as in $RH^l$. But we require all integrity reads of an individual transaction Ti to be taken from one local history because they might be related to each other (e.g., the sum of x and y may not be below 100).

Definition 5.3. Union Serialization Graph with Integrity Dependencies (USG-IC). Let $RH = \bigcup_k RH^k$ be a replicated history over rmap-ic(T, R). We denote as USG-IC(RH) the following graph.
(1) It has the same nodes as USG(RH).
(2) It has the same ww-, wr-, and rw-dependency edges as USG(RH).
(3) For each Ti ∈ T, there exists Rk ∈ R such that each wir-dependency edge from $T_j^k$ to $T_i^k$ in SSG($RH^k$) has a corresponding wir-dependency edge from Tj to Ti in USG-IC(RH), and each irw-dependency edge from $T_i^k$ to $T_j^k$ in SSG($RH^k$) has a corresponding irw-dependency edge from Ti to Tj in USG-IC(RH).
(4) There are no further edges or nodes in USG-IC(RH).

Note that there is no unique USG-IC(RH) since there can be many combinations of choosing a local history $RH^k$ for a transaction Ti.
That is, if there are n replicas and t transactions there could be as many as $n^t$ different USG-IC(RH).

Theorem 5.4. Sufficient Conditions for 1-copy-SI+IC. Let RH be a replicated history over rmap-ic(T, R). RH is 1-copy-SI+IC if the following holds:
—For each Rk ∈ R, $RH^k$ is an SI+IC history.
—For all update transactions Ti ∈ T and for all Rk, Rl ∈ R, $c_i^k \Leftrightarrow c_i^l$.
—There exists a USG-IC(RH) that has no G-1c* or G-SIb* cycles.
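The third condition quantifies over the candidate USG-IC graphs, one per way of choosing a replica for each transaction. The following sketch enumerates these candidates and tests each for a G-1c* cycle. It is an illustration under stated assumptions: the function names, the (src, dst, kind) edge encoding and the `local_ic_edges` layout are hypothetical, and the G-SIb* check is omitted because that phenomenon is defined in a part of the paper not reproduced here.

```python
from itertools import product

G1C_STAR_KINDS = {"ww", "wr", "wir", "irw"}


def has_g1c_star_cycle(edges):
    """True if the ww/wr/wir/irw edges in `edges` contain a directed cycle."""
    graph = {}
    for src, dst, kind in edges:
        if kind in G1C_STAR_KINDS:
            graph.setdefault(src, set()).add(dst)

    state = {}  # node -> "open" (on the DFS stack) or "done"

    def visit(node):
        state[node] = "open"
        for nxt in graph.get(node, ()):
            if state.get(nxt) == "open":
                return True  # back edge closes a cycle
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(graph))


def usg_ic_candidates(transactions, replicas, usg_edges, local_ic_edges):
    """Yield (choice, edges) for every way of picking one replica per transaction.

    local_ic_edges[replica][txn] holds the wir/irw edges that txn's integrity
    reads cause in the SSG of that replica, as (src, dst, kind) triples.
    """
    for picks in product(replicas, repeat=len(transactions)):
        choice = dict(zip(transactions, picks))
        edges = set(usg_edges)
        for txn, replica in choice.items():
            edges |= set(local_ic_edges[replica].get(txn, ()))
        yield choice, edges


# RHrename of Example 13: USG(RH) has no edges; T1's integrity read yields an
# irw edge T1 -> T2 at replica A and a wir edge T2 -> T1 at replica B.
local_ic_edges = {
    "A": {"T1": {("T1", "T2", "irw")}},
    "B": {"T1": {("T2", "T1", "wir")}},
}
candidates = list(usg_ic_candidates(["T1", "T2"], ["A", "B"], set(), local_ic_edges))
acyclic = [choice for choice, edges in candidates if not has_g1c_star_cycle(edges)]
```

For Example 13 this enumerates 2^2 = 4 choices, all of which are free of G-1c* cycles. Since T2 performs no integrity read, only two distinct graphs arise.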
Fig. 12. USG-IC and SCSG of RHrename: (a) USG-IC(RHrename); (b) complete SCSG(RHrename)
Proof Sketch. The proof is similar to the one for Theorem 3.9. We have to construct an SI+IC history H with the same nodes and the same ww-, wr-, and rw-dependency edges as USG(RH). Again, we only describe the main ideas and refer to Appendix C.1 for the details. We use RHrename of Example 13 as an example to illustrate our steps. We choose for USG-IC(RHrename) the IC-dependency edge from SSG($RH^A$). Thus, USG-IC(RHrename) in Figure 12(a) consists of nodes T1 and T2 with a single irw-dependency edge from T1 to T2.

Part 1: We build a similar SCSG(RH) graph as we have done for Theorem 3.9, which provides a total order between pairs of start and commit operations. As USG-IC(RHrename) does not have any ww-, wr- or rw-dependency edges, s1 and s2 are both ordered before c1 and c2. Additionally, we also relate some commit pairs. Whenever there is a wir- or irw-dependency edge from Ti to Tj in USG-IC, we require ci ≺t cj, and thus we connect ci to cj in SCSG. This reflects that the global history must have IC-consistent integrity reads. In our example, we require c1 ≺t c2. Figure 12(b) shows the completed SCSG(RHrename), which remains acyclic. The detailed proof in Appendix C.1 shows that this construction generally provides an acyclic SCSG, and thus a partial order ≺t, if USG-IC(RH) has no G-1c* and G-SIb* cycles.

The ≺t order of read and write operations, the version order and the versions read by read operations are determined as in the proof of Theorem 3.9. Additionally, for each integrity read operation $ir_i$ of committed Ti we need to determine the set of versions accessed. When constructing USG-IC(RH), let Rk ∈ R be the replica such that the wir/irw-dependency edges for Ti were taken from SSG($RH^k$). Then we let $ir_i$ access the same versions as the corresponding $ir_i^k$ accessed in $RH^k$. In our example, T1 performs in Hrename the same integrity read as $T_1^A$ in $RH^A_{rename}$.
Thus, a possible final global history is

Hrename : s1 s2 ir1(≠∅:D.did=d1:{d10}:{d10, d20})=t w1(e11) c1 w2(d12) c2

Part 2: We have to show that the SSG(H) of the newly constructed global history H has the same ww-, wr- and rw-dependency edges as USG(RH). This part of the proof is the same as part (2) of the proof of Theorem 3.9. For our example, it is trivially true, since there are no ww-, wr- or rw-dependency edges.

Part 3: Finally, we have to show that H is an SI+IC history. As most of the proof is similar to the proof of Theorem 3.9, we only look at integrity constraints. As the global history performs the integrity reads of committed transactions on the same data versions as the corresponding integrity reads in one of the local histories, the outcome must be the same, namely true. Thus, all committed transactions are IC-obeying. Furthermore, it guarantees that G-IC is avoided because SSG(H) now has the same IC-dependency edges as USG-IC(RH). As we set ci ≺t cj in our
construction of H whenever USG-IC(RH) has an IC-dependency edge from Ti to Tj, H avoids G-IC. The details are given in Appendix C.1.

6. REPLICATION PROTOCOL

In this section, we show how our formalism can be used to prove the correctness of replica control protocols. We first present SRCA, a replica control protocol introduced in [Lin et al. 2005]. We show that this protocol provides 1-copy-SI+IC. We then extend the protocol to accommodate some extensions proposed in [Lin et al. 2005] and show that this extended protocol SRCA-Ex is still 1-copy-SI but no longer 1-copy-SI+IC. Finally, we provide a third protocol SRCA-2PC, a simple extension of SRCA-Ex, that again is 1-copy-SI+IC.

In all protocols, there is one middleware that coordinates transaction execution among the database replicas R. We assume that integrity constraints are checked in deferred mode. Furthermore, we assume that all database instances implement SI in the following way. A write of transaction Ti on data item x creates a new version; a read of transaction Ti on data item x reads the last version of x that was committed before Ti started (or its own version if it has created one). Snapshot-Write uses the first-committer-wins strategy. Transactions perform their write operations optimistically, creating new versions on-the-fly. At commit time of a transaction Ti, a validation takes place. If a transaction Tj committed after Ti started and wrote one of the objects that was written by Ti, then Ti has to abort and all versions it created are discarded. Otherwise, validation succeeds. Ti commits and its versions become the latest committed versions.

6.1 SRCA Protocol

This section presents the SRCA protocol proposed in [Lin et al. 2005]. It works as follows. The client connects to the middleware via a standard database interface such as JDBC. When the client starts a transaction Ti, the middleware chooses any database replica Rl ∈ R as local replica.
All operations of Ti are forwarded to this replica and executed within transaction $T_i^l$. At commit time, if Ti was a read-only transaction, $T_i^l$ is simply committed at its local replica. If Ti is an update transaction, the middleware extracts the records changed by $T_i^l$ from Rl. These changed records represent the writeset of Ti. It then performs a validation similar to the one within the database system described above. For that purpose it keeps track of the writesets of all committed transactions and uses a timestamp mechanism to determine whether transactions are concurrent. Validation checks whether Ti’s writeset overlaps with the writeset of any transaction Tj that already validated and is concurrent to Ti. If such an overlap exists, Ti’s validation fails and the middleware aborts $T_i^l$. Otherwise, validation succeeds and Ti’s writeset has to be applied at all replicas. For that, the middleware keeps a queue Qk for each database replica Rk. It appends Ti and its writeset to the queues of all replicas. At the local replica Rl, when Ti is the first in Ql, the middleware commits $T_i^l$ and removes Ti from Ql. At a remote replica Rr, when Ti is the first in Qr, the middleware starts $T_i^r$, applies the changes, commits $T_i^r$ and removes Ti from Qr. If a commit fails, the middleware tracks this accordingly.

Note that the protocol does not conform to our mapper functions defined in Definitions 3.1 and 5.1. An update transaction Ti that aborts does not have matching
Fig. 13. Example 14: (a) Execution; (b) SSGs
transactions Tir at remote replicas. We can simply imagine dummy transactions at these remote replicas consisting only of start and abort operations. Example 14. Figure 13.(a) shows an example execution. The set of transactions is T = {T1 = w1 (x), T2 = w2 (y), T3 = w3 (x), T4 = r4 (x), r4 (y), T5 = r5 (x), r5 (y)}. T1 and T4 are local at RA , while the rest are local at RB . We use grey boxes to identify remote transactions at each replica. The figure shows the temporal evolution of the middleware and database execution from left to right. Dash lines indicate the causal relationship of events between middleware and databases. For better readability we omit superscripts, that is, we write T1 instead of T1A , as it should become clear from the description where the transaction executes. T1 is started at RA concurrently to T2 and T3 at RB . All of them can finish execution until commit. Assume T1 is the first to submit the commit request. The middleware extracts the writeset and performs validation. T1 ’s validation succeeds and T1 is added to the queues QA and QB . Shortly after T2 wants to commit, its validation succeeds as T1 and T2 do not have a write/write conflict. T2 is also added to queues QA and QB . When now T3 wants to commit, validation fails as the middleware detects that concurrent transaction T1 has already validated and a write/write conflict with T3 . Thus, the middleware tells RB to abort T3 . T3 is not added to the queues. At RA T1 now commits and T2 starts, applies its writeset and commits. Before T2 commits, read-only transaction T4 starts at RA . As the database uses SI, it reads the version x1 created by T1 and y0 as T2 has not yet committed. At RB , T1 is started, the writeset applied and T1 committed. The execution succeeds within the database as concurrent transaction T2 has no write/write conflict and T3 is aborted. After T1 ’s commit, the middleware submits T2 ’s commit. Again it succeeds as there are no conflicts. 
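The middleware-side validation and queueing used in this example can be sketched as follows. This is a toy model rather than the SRCA implementation: the class `Middleware`, its method names and the logical counter standing in for the timestamp mechanism are assumptions made for illustration, and the interaction with the databases (writeset extraction, applying and committing queued transactions) is left out.

```python
class Middleware:
    """Toy model of SRCA validation and per-replica queues (illustrative only)."""

    def __init__(self, replicas):
        self.clock = 0       # logical timestamp, advanced on each successful validation
        self.validated = []  # (validation timestamp, transaction, writeset)
        self.queues = {replica: [] for replica in replicas}

    def begin(self):
        """Return the start timestamp for a new transaction."""
        return self.clock

    def validate(self, txn, start_ts, writeset):
        """Validate an update transaction when it requests commit.

        Fails if a transaction that validated after `start_ts` (a concurrent
        one) has an overlapping writeset; otherwise the transaction and its
        writeset are appended to the queue of every replica.
        """
        writeset = frozenset(writeset)
        for ts, _other, other_ws in self.validated:
            if ts > start_ts and other_ws & writeset:
                return False
        self.clock += 1
        self.validated.append((self.clock, txn, writeset))
        for queue in self.queues.values():
            queue.append((txn, writeset))
        return True


mw = Middleware(["A", "B"])
start = mw.begin()                     # T1, T2 and T3 start before any validation
ok1 = mw.validate("T1", start, {"x"})
ok2 = mw.validate("T2", start, {"y"})
ok3 = mw.validate("T3", start, {"x"})  # overlaps with the concurrent T1
```

With the writesets of Example 14, T1 and T2 validate successfully and are appended to both queues in that order, while T3 fails validation and is not queued.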
Before T1 commits at RB, T5 is started, thus reading data versions x0 and y0. Figure 13(b) shows the SSGs of the two local histories and the USG of the replicated history. No cycles exist.

Example 15. Our next example shows the foreign key constraint example with existing department records d10 and d20, T1 inserting the first employee e1 for this
department and T2 deleting the department. A possible execution is as follows. T1 is submitted to RA and T2 to RB. At commit time of T1 the middleware extracts the writeset containing e11. Validation succeeds and T1 is added to queues QA and QB. Now T2 submits its commit request, and the middleware extracts the writeset containing d1dead. Validation again succeeds because there is no write/write conflict on any data item (the middleware does not check integrity constraints). T2 is added to both queues. Now the middleware submits the commit of T1 to RA. The integrity read accesses d10 at RA and returns true. Thus, T1 can commit at RA. Now T2 starts at RA, and the write operation is applied. When the middleware submits the commit for T2 to RA, the integrity read accesses e11 and returns false. Thus, T2 aborts at RA. At the same time, the middleware starts T1 at replica RB, executes the insert and submits the commit. T1 performs the integrity read accessing the last committed version d10 of the department (T2 has not yet committed at RB). Thus T1 also commits at RB. When the middleware now submits the commit of T2 to RB, the integrity read accesses the last committed version e11 of the employee tuple and T2 aborts. Therefore, although the middleware validates both transactions successfully, T2 aborts at both replicas. The middleware has to track this accordingly and adjust counters, etc. Thus, the execution is as follows. The A/B superscripts on the operations are omitted for better readability.

$RH^A$ : s1 w1(e11) ir1(≠∅:D.did=d1:{d10}:{d10, d20})=t c1 s2 w2(d1dead) ir2(=∅:E.did=d1:{e11}:{e11, e2init})=f a2
$RH^B$ : s2 w2(d1dead) s1 w1(e11) ir1(≠∅:D.did=d1:{d10}:{d10, d20})=t c1 ir2(=∅:E.did=d1:{e11}:{e11, e2init})=f a2

As only T1 commits, the graphs contain only T1 and will not be shown here.

6.2 SRCA is 1-copy-SI+IC

Theorem 6.1.
SRCA provides 1-copy-SI+IC if the underlying DB replicas provide SI+IC using the first-committer-wins strategy and deferred mode for integrity constraints.

Proof. Based on Theorem 5.4, we need to prove that for any replicated history RH possible under the protocol, (i) all local histories $RH^k$ are SI+IC histories, (ii) an update transaction commits at either none or all replicas, and (iii) there exists a USG-IC(RH) with no G-1c* and G-SIb* cycles.

Property (i) is fulfilled by assumption.

Property (ii): It is clear that an update transaction T that aborts at its local replica before or at the time of validation is not even started at any remote replica. Thus, we only look at transactions that validate successfully. We show two properties. First, a validated transaction will not abort due to a write/write conflict. Second, the transaction will perform the same integrity reads at all replicas. The first part is from [Lin et al. 2005]. The middleware submits the commit of a transaction $T_i^k$ to replica Rk only if Ti is the first in queue Qk. If $T_i^k$ is a remote transaction, no other transaction commits between $T_i^k$’s start and $T_i^k$’s commit. Thus, when database instance Rk performs the validation of $T_i^k$ internally at $T_i^k$’s commit time, validation succeeds. If $T_i^k$ is a local transaction, then $T_i^k$
has already started at Rk when the commit is submitted to Rk. If $T_i^k$ had a write/write conflict with any other transaction that committed since the start of $T_i^k$, then the middleware would have detected a conflict at validation time of Ti and not appended Ti to any queue. Note that in all replicas there might be concurrent local transactions that have not yet validated. But these transactions are of no interest because they are not considered in any validation process.

For integrity reads we assume they are always checked at commit time and read the last committed version. We show that all transactions perform the same integrity reads based on the fact that commit requests are submitted in the same order at all replicas. Assuming that before the commit of the first transaction all replicas have the same state, the integrity reads of the first transaction have the same outcome at all replicas, and all will either commit or abort the transaction (as transactions are IC-obeying). By induction, assume all replicas have committed the same set of n transactions in the same order. When transaction Tn+1 now performs its commit, its integrity reads access the same data versions at all replicas, and thus the outcome is the same at all replicas.

Property (iii): We have just shown that all replicas commit the same set of write transactions in exactly the same order, and that all committed transactions perform their integrity reads on exactly the same data versions. This means that, for any two replicas Rk and Rl, if there is a ww-, wir- or irw-dependency edge from Ti to Tj in SSG($RH^k$), then there is the same edge from Ti to Tj in SSG($RH^l$). As a result, there actually exists only a single USG-IC(RH), since, independently of which replica Rk we choose for a transaction Ti, its IC-dependency edges are the same as in the other replicas. We now show that this USG-IC(RH) avoids G-1c* and G-SIb* cycles.

(1) Assume a G-1c* cycle exists in USG-IC(RH).
There can be wr-, ww-, wir- and irw-dependency edges in the cycle. Note that all transactions in the cycle must be update transactions. This is true because each transaction in the cycle is the start node of a wr-, ww-, wir-, or irw-dependency edge. The start node of a wr-, ww- or wir-dependency edge is obviously an update transaction. Being the start node of an irw-dependency edge means the transaction performed an integrity read which is followed by a successful write operation. Thus, all transactions are update transactions, and thus, executed at all replicas. Each edge $T_i \xrightarrow{wr/ww/wir/irw} T_j$ in the cycle is taken from the $SSG(RH^k)$ of at least one replica Rk. As the replicas provide SI+IC, this implies that in $RH^k$ it holds that $c_i \prec_t c_j$. As both $T_i$ and $T_j$ are update transactions they commit at all replicas in the same order. Thus $c_i \prec_t c_j$ holds at all replicas and a cycle is not possible.

(2) Assume a G-SIb* cycle exists. We break the cycle into $q$ sections of

$$T_{i_p} \xrightarrow{(wr/ww/wir/irw)^*} T_{j_p} \xrightarrow{wr/ww} T_{k_p} \xrightarrow{rw} T_{i_{(p+1)\%q}} \quad (\text{where } 0 \le p < q)$$

In section $p$, $T_{i_p}$, $T_{j_p}$, and $T_{i_{(p+1)\%q}}$ must be write transactions while $T_{k_p}$ can be a read-only transaction. $T_{j_p} \xrightarrow{wr/ww} T_{k_p} \xrightarrow{rw} T_{i_{(p+1)\%q}}$ is derived from the $SSG(RH^l)$ of $T_{k_p}$'s local history $RH^l$. This implies $c_{j_p} \prec_t s_{k_p} \prec_t c_{i_{(p+1)\%q}}$, which means that $T_{j_p}^l$ commits before $T_{i_{(p+1)\%q}}^l$ at Rl. And since they are update transactions, the corresponding transactions in the other replicas commit in the same order. Now let's consider $T_{i_p} \xrightarrow{(wr/ww/wir/irw)^*} T_{j_p}$. As they are update transactions, this
order occurs at all replicas and implies $c_{i_p} \prec_t c_{j_p}$. Therefore, for each section we have $c_{i_p} \prec_t c_{i_{(p+1)\%q}}$ in all local histories. Putting all sections together results in $c_{i_0} \prec_t c_{i_0}$, which is impossible. Hence, G-SIb* cannot happen in the $USG\text{-}IC(RH)$ of a replicated history $RH$ produced by SRCA.

The proof assumes that the database system validates transactions at commit time and checks integrity constraints at commit time. In case of a non-deferred mode for integrity constraints or if write/write conflicts are detected during runtime, the protocol above might lead to deadlocks. The protocol would need to be extended for this purpose. With this in place, correctness reasoning would be similar but accordingly more complex since more cases need to be considered.

The crucial characteristics of the protocol that are used in the proof are that all replicas provide SI+IC, and that all replicas commit successfully validated update transactions in the same order. However, they are not necessary conditions. Lemma 3.7 shows that conflicting transactions need to be committed in the same order, but this is not necessarily required for non-conflicting transactions. Nevertheless, committing all transactions in the same order makes it easy to show that all replicas decide on the same outcome of the transaction and that there are no G-1c* and G-SIb* cycles because any such cycle would appear in a local history.

6.3 SRCA-Extension

Lin et al. [2005] present an extension to SRCA that does not require committing all transactions in the same order at all database replicas. In this section we show that with this extension, the protocol provides 1-copy-SI but no longer 1-copy-SI+IC. The extended algorithm, denoted as SRCA-Ex, works as follows. Local execution of read-only and update transactions is as with SRCA. The same holds for the validation and the abort of failed transactions. The differences are as follows.
If the validation of a transaction $T_i$ succeeds, the middleware commits $T_i^l$ at $T_i$'s local replica Rl immediately. Furthermore, it appends $T_i$ to the queues of all remote replicas. The middleware then starts $T_i^r$ at remote replica Rr and applies the writeset when there is no transaction before $T_i$ in Qr that has a conflicting write operation. This means that if a previously validated transaction $T_j$ has a write/write conflict with $T_i$, then $T_i^r$ only starts after $T_j^r$ commits. However, $T_i^r$ can run concurrently with other validated transactions for which there is no write/write conflict. After all updates of $T_i^r$ have been applied, $T_i^r$ commits. As a result, it is now possible that transactions do not commit in validation order and that the commit order differs across replicas.

The algorithm furthermore puts restrictions on when to start transactions. A local transaction may only start if there is no 'hole' in the commit order. That is, if the middleware assigns new transaction $T_i$ to be local at replica Rl and $T_j$ is the last validated transaction that committed at Rl, then transaction $T_i^l$ may only start at Rl if all transactions that validated before $T_j$ have also committed at Rl (footnote 11).

It is easy to see that the protocol does not provide 1-copy-SI+IC if the underlying database replicas provide SI+IC.

Footnote 11: This might lead to starvation of transactions as there might always be holes. Lin et al. [2005] provide extensions that avoid starvation. For space reasons, we ignore them here.
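The two start rules of SRCA-Ex described above can be illustrated with a minimal Python sketch. The function names, the representation of a replica queue as a list of not-yet-committed transaction ids in validation order, and the representation of writesets as sets of item names are assumptions made for illustration; they are not taken from Lin et al. [2005].

```python
def may_start_remote(queue, txn, writesets):
    """Remote start rule (sketch): txn may start at a replica once no
    transaction queued before it has a conflicting write operation.

    queue is assumed to hold the validated transactions that have not yet
    committed at this replica, in validation order.
    """
    for other in queue:
        if other == txn:
            return True
        if writesets[other] & writesets[txn]:
            return False  # an earlier validated, conflicting txn is pending
    raise ValueError(f"{txn} is not in the queue")


def may_start_local(validation_order, committed):
    """Local start rule (sketch): no 'hole' in the commit order.

    Let Tj be the last validated transaction that has committed at this
    replica. Every transaction validated before Tj must have committed
    here as well.
    """
    positions = [i for i, t in enumerate(validation_order) if t in committed]
    if not positions:
        return True  # nothing has committed yet, hence no hole
    last = max(positions)
    return all(t in committed for t in validation_order[:last])
```

For instance, with validation order `["T1", "T2"]`, a replica that has committed only `T2` has a hole, so `may_start_local` returns `False` until `T1` commits there as well.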
Fig. 14. Example 17: (a) Execution at RA, the middleware (MW), and RB over time; (b) SSGs of the local histories and the USG.
Example 16. Let's revisit Example 15. $T_1$ executes at RA and inserts employee e1. $T_2$ executes at RB and deletes the department. Both submit their commit requests; the middleware validates first $T_1$ and then $T_2$, detecting no conflict. At RA, $T_1$ commits immediately. So does $T_2$ at RB. When $T_2$ now executes at RA, a constraint violation is detected and RA aborts $T_2$. Similarly, RB aborts $T_1$ due to an integrity violation. The replicas do not commit the same set of transactions. The execution is as follows:

$$RH^A:\ s_1\ w_1(e1_1)\ ir_1(\neq\emptyset : D.did{=}d1 : \{d1_0\} : \{d1_0, d2_0\}){=}t\ \ c_1\ \ s_2\ w_2(d1_{dead})\ ir_2({=}\emptyset : E.did{=}d1 : \{e1_1\} : \{e1_1, e2_{init}\}){=}f\ \ a_2$$

$$RH^B:\ s_2\ w_2(d1_{dead})\ ir_2({=}\emptyset : E.did{=}d1 : \{\} : \{e1_{init}, e2_{init}\}){=}t\ \ c_2\ \ s_1\ w_1(e1_1)\ ir_1(\neq\emptyset : D.did{=}d1 : \{\} : \{d1_{dead}, d2_0\}){=}f\ \ a_1$$

However, assuming that the application has not specified any integrity constraints, the execution provides 1-copy-SI.

Example 17. Let's revisit Example 14 and observe the execution under SRCA-Ex shown in Figure 14(a). $T_1$ executes first at RA, writes $x$ and validates successfully. $T_1$ is immediately committed at its local replica RA and queued in QB. $T_2$ executes first at RB, writes $y$ and also validates successfully as there is no write/write conflict with $T_1$. $T_2$ is immediately committed at its local replica RB and queued in QA. $T_3$ executes at RB and writes $x$. At validation time, the middleware detects a conflict with concurrent transaction $T_1$ and aborts $T_3$ at RB. $T_2$ is now started at RA, its writeset applied and then committed. Before $T_2$ commits at RA, read-only transaction $T_4$ starts at RA. This is correct as only $T_1$ has committed and it is the last to validate. $T_4$ reads $x_1$ and $y_0$. At RB, $T_1$ is started, its writeset applied and then committed. Before $T_1$ commits, read-only transaction $T_5$ wants to start at RB. At this timepoint only $T_2$ has committed at RB but it validated after $T_1$. There is a 'hole' in the order of committed transactions. Therefore, $T_5$'s start is delayed until $T_1$ has also committed at RB. $T_5$ reads versions $x_1$ and $y_2$.
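The dependency edges of Example 17 can be derived mechanically. The following Python sketch is a simplified illustration, not the paper's formal construction: the event encoding is hypothetical, start-dependency edges are omitted, and the USG is assumed to be the plain union of the edges found in the two local histories.

```python
from collections import defaultdict


def ssg_edges(history):
    """Derive ww-, wr- and rw-dependency edges between committed transactions.

    Events are tuples: ("w", txn, item), ("r", txn, item, writer) where
    writer is the transaction whose version was read (None for the initial
    version), ("c", txn) and ("a", txn).
    """
    committed = [e[1] for e in history if e[0] == "c"]  # in commit order
    writes = defaultdict(set)
    for e in history:
        if e[0] == "w":
            writes[e[1]].add(e[2])
    # Version order of each item = commit order of its committed writers.
    versions = defaultdict(list)
    for txn in committed:
        for item in writes[txn]:
            versions[item].append(txn)
    edges = set()
    for order in versions.values():
        for earlier, later in zip(order, order[1:]):
            edges.add((earlier, later, "ww"))
    for e in history:
        if e[0] != "r" or e[1] not in committed:
            continue
        _, reader, item, writer = e
        order = versions.get(item, [])
        if writer is not None and writer != reader:
            edges.add((writer, reader, "wr"))
        # rw: the reader precedes the writer of the next version it missed.
        nxt = 0 if writer is None else order.index(writer) + 1
        if len(order) > nxt and order[nxt] != reader:
            edges.add((reader, order[nxt], "rw"))
    return edges


def has_cycle(edges):
    """Depth-first search for a directed cycle, ignoring edge labels."""
    graph = defaultdict(set)
    for src, dst, _ in edges:
        graph[src].add(dst)
    state = {}  # node -> "open" while on the DFS stack, "done" afterwards

    def visit(node):
        state[node] = "open"
        for succ in graph.get(node, ()):
            if state.get(succ) == "open":
                return True
            if succ not in state and visit(succ):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(graph))


# Local histories of Example 17 (T3 aborts at RB).
RH_A = [("w", "T1", "x"), ("c", "T1"), ("w", "T2", "y"),
        ("r", "T4", "x", "T1"), ("r", "T4", "y", None),
        ("c", "T2"), ("c", "T4")]
RH_B = [("w", "T2", "y"), ("c", "T2"), ("w", "T3", "x"), ("a", "T3"),
        ("w", "T1", "x"), ("c", "T1"),
        ("r", "T5", "x", "T1"), ("r", "T5", "y", "T2"), ("c", "T5")]
usg = ssg_edges(RH_A) | ssg_edges(RH_B)
```

Under these assumptions `usg` contains the wr edges from T1 to T4 and T5 and from T2 to T5, plus the rw edge from T4 to T2, and `has_cycle(usg)` reports no cycle.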
Figure 14(b) shows the SSGs of the two local histories and the USG of the replicated history. No cycles exist.

Theorem 6.2. SRCA-Ex provides 1-copy-SI if the underlying DB replicas provide SI using the first-committer-wins strategy and there are no integrity constraints.

Proof. Based on Theorem 3.9, we need to prove that for any replicated history $RH$ possible under the protocol, (i) all local histories $RH^k$ are SI-histories, (ii) a write transaction commits at either none or all replicas, and (iii) $USG(RH)$ has no G-1c and G-SIb* cycles.

Property (i) is fulfilled by assumption.

Property (ii). Again, we only look at transactions that validate successfully. This time, we only have to show that a validated transaction $T_i$ will not abort due to a write/write conflict. The middleware submits the commit for $T_i^l$ to $T_i$'s local replica immediately after validation. As it has checked for write/write conflicts against concurrent validated transactions, the validation within the local database system Rl will also succeed and $T_i^l$ commits. The middleware starts a remote transaction $T_i^r$ for $T_i$ at Rr when all other transactions that validated before $T_i$ and had conflicting write operations have committed. Furthermore, for any transaction $T_j$ validating after $T_i$ and conflicting, $T_j^r$ is not started until $T_i^r$ commits. Thus, while $T_i^r$ might run concurrently with other remote or local transactions that validated before or after $T_i$, it is assured that they do not have a write/write conflict with $T_i$. Thus, independently of the order in which they actually perform their commit, validation will succeed. The only transactions that might run concurrently with $T_i^r$ and conflict are local transactions that have not yet validated (they will fail their validation).

Property (iii). We need to show that $USG(RH)$ avoids G-1c cycles and G-SIb* cycles. We first show two properties that help us in our proof.

First, we show that whenever there is an edge $T_i^k \xrightarrow{ww/wr} T_j^k$ in any $SSG(RH^k)$ (and thus $T_i \xrightarrow{ww/wr} T_j$ in $USG(RH)$), then $T_i$ validated before $T_j$. We look first at ww-dependency edges. As all histories are SI, a ww-dependency edge in $SSG(RH^k)$ implies that $RH^k$ committed $T_i^k$ before $T_j^k$ started. As validation always occurs before commit, it implies that $T_i$ validates before $T_j$ starts. Assume now the edge occurs in $T_j$'s local history $RH^l$. As $T_j^l$ starts before validation in its local history this implies $T_i$ validates before $T_j$. Now assume the edge occurs in a history $RH^r$ where $T_j^r$ is remote and $T_j$ validates before $T_i$ validates. Transactions are appended to Qr in validation order and $T_i$ may only "overtake" $T_j$ in the queue Qr, and thus $T_i^r$ commit before $T_j^r$, if they do not conflict. As there is a conflict, such an overtake may not take place. Therefore, as $T_i^r$ commits before $T_j^r$, $T_i$ must have validated before $T_j$. A wr-dependency edge $T_i \xrightarrow{wr} T_j$ in $USG(RH)$ must be derived from $T_i^l \xrightarrow{wr} T_j^l$ in $SSG(RH^l)$ of $T_j$'s local history $RH^l$. As $RH^l$ is SI, $T_i^l$ commits before $T_j^l$ starts and as $T_j^l$ is local in $RH^l$, $T_j^l$ starts in $RH^l$ before it validates. Thus, $T_i$ validates before $T_j$ validates.

Second, if $T_i$ validates before $T_j$, $T_i$ and $T_j$ have a write/write conflict and validation succeeds for both transactions (only possible if they are not concurrent), then
there is a path $T_i^k \xrightarrow{ww+} T_j^k$ in the $SSG(RH^k)$ of each local history $RH^k$. Assume there is no such path in $SSG(RH^k)$. As the transactions write a common data item, this implies there is a path $T_j^k \xrightarrow{ww+} T_i^k$. However, as we have seen above, this implies that $T_j$ validates before $T_i$, violating our assumption.

With this, it is clear that $USG(RH)$ avoids G-1c cycles (consisting only of ww- and wr-dependency edges) as this would imply a cycle in the validation order.

Now assume G-SIb* exists. We can break the cycle into $q$ sections of

$$T_{i_p} \xrightarrow{(wr/ww)^*} T_{j_p} \xrightarrow{wr/ww} T_{k_p} \xrightarrow{rw} T_{i_{(p+1)\%q}} \quad (\text{where } 0 \le p < q)$$

Let us consider one section $p$. $T_{i_p} \xrightarrow{(wr/ww)^*} T_{j_p}$ implies that $T_{i_p}$ validates before $T_{j_p}$ according to the discussion above. Now assume the next edge is $T_{j_p} \xrightarrow{ww} T_{k_p}$ and is derived from $T_{j_p}^k \xrightarrow{ww} T_{k_p}^k$ of $SSG(RH^k)$ of history $RH^k$. This implies $T_{j_p}$ validates before $T_{k_p}$, and as a result $T_{j_p} \xrightarrow{ww+} T_{k_p}$ in the SSGs of all local histories, including $T_{k_p}$'s local history $RH^l$. This means we have either $T_{j_p}^l \xrightarrow{ww+} T_{k_p}^l$ or $T_{j_p}^l \xrightarrow{wr} T_{k_p}^l$ in the $SSG(RH^l)$ of the local history $RH^l$ of $T_{k_p}$. This implies that all transactions that validated before $T_{j_p}$ also committed before $T_{k_p}^l$ started in $RH^l$ according to SRCA-Ex (no holes when a transaction starts). As the edge $T_{k_p}^l \xrightarrow{rw} T_{i_{(p+1)\%q}}^l$ also occurs in $SSG(RH^l)$ and implies $s_{k_p} \prec_t c_{i_{(p+1)\%q}}$, we can derive that $T_{j_p}$ validated before $T_{i_{(p+1)\%q}}$. Adding all sections together we can derive a cycle in the validation order, which is impossible. Hence, G-SIb* cannot happen in the $USG(RH)$ of a replicated history $RH$ produced by SRCA-Ex.

6.4 SRCA with 2-Phase-Commit

SRCA-Ex allows integrity reads to access different data versions, leading to the possibility of replicas not committing the same set of update transactions. Accessing different data versions is possible because replicas can commit non-conflicting update transactions in different orders. The question arises whether one can actually build a replica control protocol providing 1-copy-SI+IC that allows non-conflicting transactions to commit in different orders and/or integrity reads to access different data versions at the different replicas. In fact, it is easy to derive such a protocol from SRCA-Ex. We denote it as SRCA-2PC. The only change is that the middleware does not commit each $T_i^k$ individually once execution has completed at Rk. Instead, only when execution has completed at all replicas, the middleware runs a 2-Phase-Commit protocol (2PC) with all replicas being participants. Assuming deferred mode for integrity constraints, transactions perform their integrity reads upon receiving the prepare-to-commit request of the 2PC from the middleware. When the integrity read at a replica Rk evaluates to false, Rk votes to abort the transaction. Only if the integrity reads at all replicas evaluate to true do all replicas vote to commit, and the transaction can commit. Note that the different replicas might access different data versions in their integrity reads.

Theorem 6.3. SRCA-2PC provides 1-copy-SI+IC if the underlying DB replicas provide SI+IC using the first-committer-wins strategy and deferred mode for integrity constraints.

Proof. Properties (i) and (iii) hold with the same reasoning as for SRCA-Ex.
Property (ii) holds because of the properties of the 2PC.

6.5 Discussion

All the protocols discussed in this section use a middleware and assume that the underlying database systems provide SI+IC. This makes the proof of the first property (local SI+IC histories) trivial. Protocols that are implemented within the database kernel have to show explicitly that the local histories are SI(+IC)-histories. Requiring that all update transactions commit at all replicas is an obvious requirement of ROWA approaches. While this is automatically provided if 2PC is used, showing this property for protocols that do not rely on a 2PC is not trivial. In fact, SRCA-Ex does not provide 1-copy-SI+IC because the replicas might commit different sets of update transactions. The complexity of showing that $USG(RH)$ avoids G-1c* and G-SIb* cycles again depends on the replica control algorithm itself. It is likely that the more conservative and restrictive the algorithm is, the easier the proof will be.

7. FAILURES

7.1 Motivation

A ROWA approach cannot continue executing update transactions if one replica fails. Thus, replica control protocols typically implement a read-one-write-all-available (ROWAA) approach where only the available copies need to perform the update transactions. For space reasons, the following discussion excludes integrity constraints. They can be included in a straightforward way into the formalism.

We first have to define a history in the advent of failures. We assume the crash-failure model where a database system that fails simply stops execution.

Definition 7.1. History with failure. A history $H$ with failure over a set of transactions $T$ is a history according to Definition 2.1 with the following exceptions.
1. The last event in $H$ is a failure event, denoted as $f$.
2. If $c_i$ of transaction $T_i \in T$ is contained in $H$, then all operations of $T_i$ are contained in $H$. If $a_i$ of transaction $T_i \in T$ is contained in $H$, then at least $s_i$ is contained in $H$.
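The two exceptions listed in Definition 7.1 can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: events are encoded as hypothetical string labels such as "s1", "w1(x1)", "c1", "a2" and "f", the caller supplies the full operation list of each transaction at this replica, and the remaining requirements of Definition 2.1 are not checked.

```python
def is_history_with_failure(events, operations):
    """Check the two conditions of a history with failure (sketch).

    events: list of event labels in history order, with "f" denoting the
            failure event.
    operations: dict mapping a transaction id to the list of all operations
            that transaction executes at this replica (including its start).
    """
    # Condition 1: the last event is the failure event.
    if not events or events[-1] != "f":
        return False
    present = set(events)
    for txn, ops in operations.items():
        # Condition 2, first part: a committed transaction has all of its
        # operations in the history.
        if f"c{txn}" in present and not all(op in present for op in ops):
            return False
        # Condition 2, second part: an aborted transaction has at least
        # its start event in the history.
        if f"a{txn}" in present and f"s{txn}" not in present:
            return False
    return True
```

A transaction that had started but neither committed nor aborted before the failure is accepted, since the definition places no requirement on it.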
Definition 3.2 of a replicated history can then be extended by simply indicating that each local history $RH^k$ can possibly be a history with failure.

Example 18. Assume two replicas RA, RB and transactions $T_1$, $T_2$ local at RB.

$$RH^A_{fail1}:\ s_1^A\ w_1^A(x_1)\ c_1^A\ f^A$$

$$RH^B_{fail1}:\ s_1^B\ r_1^B(y_0)\ w_1^B(x_1)\ c_1^B\ s_2^B\ w_2^B(x_2)\ c_2^B$$

$T_2$ does not execute at RA since the replica fails before being able to do so. Thus, $T_2$ only commits at the local and single available replica RB. Definition 3.4 of 1-copy-SI is violated as the two histories do not commit the same set of transactions. In a ROWAA approach, however, only the available replicas should be required to commit the same set of transactions while a history with failure only needs to execute properly until it fails. With this change, the history $RH_{fail1}$ can be 1-copy-SI.

Fig. 15. $USG(RH_{fail1})$ and $USG(RH_{fail2})$.

Fig. 16. $USG(RH_{fail3})$.

$USG(RH_{fail1})$ of Figure 15 contains only a ww-dependency edge from $T_1$ to
$T_2$. A corresponding global history would be the same as $RH^B_{fail1}$. However, what exactly does it mean for a history with failure to execute properly until failure?

Example 19. Assume the same setup as in Example 18 but a different execution.

$$RH^A_{fail2}:\ s_2^A\ w_2^A(x_2)\ c_2^A\ f^A$$

$$RH^B_{fail2}:\ s_1^B\ r_1^B(y_0)\ w_1^B(x_1)\ c_1^B\ s_2^B\ w_2^B(x_2)\ c_2^B$$

This time RA executes and commits $T_2$ but not $T_1$ while RB commits both transactions. The execution again seems to be fine. $USG(RH_{fail2})$ is the same as $USG(RH_{fail1})$ (see Figure 15), having a single ww-dependency edge from $T_1$ to $T_2$. The global history could again be the same as $RH^B$. However, if RA had not failed, it would not have been possible to commit both $T_1$ and $T_2$ at RA and extend $RH^A$ so that the extended history is 1-copy-SI. The issue is that $USG(RH_{fail2})$ in Figure 15 does not have any cycle because RA failed before committing $T_1$. If $RH^A$ had completed execution, there would have been an additional ww-dependency edge from $T_2$ to $T_1$, leading to a cycle. This is in contrast to $RH_{fail1}$ where the execution at RA could have been extended to execute and commit $T_2$, and the USG would still be acyclic.

Example 20. Let us reconsider Example 3 with $T_1$ writing $x$, $T_4$ writing $y$, and $T_2$ and $T_3$ reading both $x$ and $y$. The history $RH_{hole}$ was not 1-copy-SI because the two read-only transactions implicitly ordered the non-conflicting update transactions. In the following execution we let RB fail before executing $T_1$.

$$RH^A_{fail3}:\ s_1^A\ w_1^A(x_1)\ c_1^A\ s_2^A\ r_2^A(x_1)\ r_2^A(y_0)\ c_2^A\ s_4^A\ w_4^A(y_4)\ c_4^A$$

$$RH^B_{fail3}:\ s_4^B\ w_4^B(y_4)\ c_4^B\ s_3^B\ r_3^B(y_4)\ r_3^B(x_0)\ c_3^B\ f^B$$

The $USG(RH_{fail3})$ in Figure 16 is also acyclic. However, the replicated history is not 1-copy-SI since $T_2$ and $T_3$ are reading from incompatible snapshots. Again, if RB had not crashed and had applied $T_1$'s update, then the history would not be 1-copy-SI.
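The incompatibility of the snapshots read by $T_2$ and $T_3$ can be made concrete with a small Python sketch. It relies on an observation that is an assumption of this sketch rather than a definition from the paper: in a single SI history every snapshot reflects a prefix of one global commit order, so the sets of update transactions visible to any two readers must be comparable by set inclusion. The function name and the input encoding are hypothetical.

```python
from itertools import combinations


def incompatible_snapshots(visible):
    """Return pairs of readers whose snapshots cannot both be prefixes of
    one global commit order of update transactions.

    visible maps each reader to the set of update transactions whose
    writes are contained in the snapshot it read from.
    """
    bad = []
    for r1, r2 in combinations(sorted(visible), 2):
        seen1, seen2 = visible[r1], visible[r2]
        # Two prefixes of the same order are always nested.
        if not (seen1.issubset(seen2) or seen2.issubset(seen1)):
            bad.append((r1, r2))
    return bad


# Example 20: T2 reads x1 and y0 at RA (sees T1 but not T4);
# T3 reads y4 and x0 at RB (sees T4 but not T1).
example_20 = {"T2": {"T1"}, "T3": {"T4"}}
```

For Example 20 the check reports the pair (T2, T3). This is a necessary condition only; passing it does not by itself establish 1-copy-SI.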
The issue as before lies in the fact that $RH^B$ is incomplete due to its failure, and it misses some edges needed to capture the fact that the replicated history is not 1-copy-SI. In this case, it misses an rw-dependency edge from $T_3$ to $T_1$.

7.2 Failure Completed Histories

Failures result in incomplete local histories. Taking the SSGs of these incomplete histories to build the USG might prevent us from observing violations of 1-copy-SI. Our approach is to complete these local histories with failures in order to be able to capture such violations. We can observe that the missing edges in the USG are always due to the fact that a history with failure misses the write operations of some committed transactions. By adding the missing committed update transactions, local histories become complete.

Fig. 17. Extended $USG(RH_{fail2})$.

Fig. 18. Extended $USG(RH_{fail3})$.

In $RH^A_{fail2}$, if we add the writes (and commit) of $T_1$ then the USG contains a ww-dependency edge from $T_2$ to $T_1$ (see Figure 17),
resulting in a G-1c cycle. Thus, we can detect that the history is not 1-copy-SI. Similarly, adding $T_1$ to $RH^B_{fail3}$ results in an additional rw-dependency edge from $T_3$ to $T_1$ leading to a G-SIb* cycle (see Figure 18). In contrast, adding $T_2$ to $RH^A_{fail1}$, the USG remains the same as in Figure 15, and we can see that the history is 1-copy-SI. Let us define this concept formally.

Definition 7.2. Failure Completed Replicated History. Let $RH = \bigcup_k RH^k$ be a replicated history over rmap(T, R) with at least one local history with failure. A failure completed replicated history $FCRH$ for $RH$ has the following properties:
(1) $FCRH = \bigcup_k FCRH^k$ is a replicated history over rmap(T, R) where no local history has a failure.
(2) For each local history with failure $RH^k$ in $RH$, there is a local history $FCRH^k$ without failure in $FCRH$, such that $RH^k - f^k$ is a prefix of $FCRH^k$.
(3) For each local history $RH^k$ in $RH$ without failure, there is a local history $FCRH^k$ in $FCRH$ where $RH^k = FCRH^k$.
(4) For all update transactions $T_i \in T$ and for all Rk, Rl ∈ R: $c_i^k \iff c_i^l$.

Missed update transactions can be added in different ways. A failure completed history represents one possible continuation of the execution if no failure had occurred. If at least one continuation exists that represents a 1-copy-SI history then we consider the replicated history $RH$ with failures to be 1-copy-SI.

Definition 7.3. 1-copy-SI in the advent of failures. Let $RH = \bigcup_k RH^k$ be a replicated history over rmap(T, R) with at least one local history with failure. $RH$ is 1-copy-SI if there exists a failure completed history $FCRH$ for $RH$ that is 1-copy-SI.

This means, in order to show that a replicated history $RH$ with failures is 1-copy-SI we have to find a failure completed $FCRH$ for $RH$ where each local history is an SI-history and where $USG(FCRH)$ has no G-1c or G-SIb* cycles. For our examples, we can do this for $RH_{fail1}$ but not for $RH_{fail2}$ and $RH_{fail3}$. Both $RH_{fail2}$ and $RH_{fail3}$ miss one transaction.
As the local histories in F CRH must be extensions of the local histories in RH with failures, we can only add the missing transaction at the end of the execution resulting in U SGs with cycles. 7.3 Failure Handling in the SRCA Protocols In this section, we outline how the SRCA protocols of Section 6 handle failures. We only consider the failure of any individual database replica but not the failure of the middleware. For brevity, our discussion ignores integrity reads. In SRCA, for each failed replica the sequence of committed transactions is a prefix of the sequence of transactions committed at the available replicas. The failed history up ACM Transactions on Database Systems, Vol. V, No. N, February 2009.
to the failure represents an SI-history. The history might contain some transactions that did not complete before the failure. A local transaction that had not validated before the crash can be considered aborted in the entire system because the other replicas have not applied it. A local transaction that had validated before the crash was applied by the available replicas. Thus, the failed history has to be extended by the commit of this transaction. A remote transaction has to be extended by the missing write operations and the commit. Furthermore, all committed update transactions that are completely missing in the history have to be appended, too. The missing write and commit operations are appended in the order that conforms to the order in one of the available histories. For SRCA-Ex and SRCA-2PC, the failed history is extended in a similar way. The order of missing write and commit operations should conform to the order in which these transactions were validated. The proofs that these extended histories remain 1-copy-SI+IC (for SRCA and SRCA-2PC) and 1-copy-SI (for SRCA-Ex) is relatively straightforward and omitted. One can show that all additional dependency edges that appear in U SG(F CRH) of the failure completed history F CRH would have also appeared if no local replica had failed. As the protocols provide 1-copy-SI(+IC) histories in the failure free case, the failure completed history will also be 1-copy-SI(+IC). 8. RELATED WORK 8.1 Work on Snapshot Isolation in General SI became a popular isolation level since Oracle implemented it as its highest isolation level, and other commercial solutions have followed with their own implementations. Basically all commercial systems we are aware of do not implement the first-committer-wins strategy but detect write/write conflicts during execution using locks (first-updater-wins strategy). 
Oracle [Oracle Corporation 2007] does not actually store multiple versions but reconstructs previous versions by accessing the a specific page undo log which contains the old data values of records. PostgreSQL [2007] has always had a record-based multi-version system. Microsoft SQL Server 2005 [2007] offers both serializability via locking and SI by reconstructing record versions stored in persistent storage. In the research literature, Berenson et al. [1995] defined SI by specifying a set of anomalies that SI avoids or allows. In particular, compared to serializability, SI does not avoid the anomaly “write skew”. Adya [1999] introduces the concepts of Snapshot-Read and Snapshot-Write and defined the properties of SI through GID. From there, Fekete et al. [2005] observe that the set of histories allowed by SI, but not by serializability are those that have cycles in the SSG with two consecutive anti-dependency edges. We made a similar observation and refined the G-SIb phenomenon to reflect this fact. Some work has analyzed how serializability can be achieved on top of a SI scheduler. Based on the GID formalism, Fekete et al. [2005] describe a set of tools to convert a given database application so that even if it runs on a database system providing SI, only serializable executions are produced. Depending on the kind of application, certain vulnerable edges (conflicts) between concurrent transactions have to be determined and restructured. More recently, Cahill et al. [2008] has shown how SI concurrency control can be extended within the DB kernel to enforce ACM Transactions on Database Systems, Vol. V, No. N, February 2009.
serializability in an efficient way. The idea is to keep track of anti-dependencies and whenever there are two consecutive anti-dependencies to abort one of the transactions. Similarily, Elnikety et al. [2005] show how serializability can be achieved in a replicated system even if the local databases only provide SI. For that, read- and writesets of transactions have to be monitored and their intersections determined. Our assumption is that serializability is not always needed but the execution in a replicated system should reflect the behavior of a non-replicated system using SI. As we have discussed in this paper, the definitions given for SI in [Berenson et al. 1995; Adya 1999] allow for the violation of integrity constraints. Adya [1999] explores this topic further and shows that if update transactions run under serializability and only queries use snapshot isolation, integrity constraints do not pose a problem. However, this does not reflect the behavior of commercial systems. Alomari et al. [2008] handle integrity violations in systems where integrity reads are embedded in the application programs by running those transactions in a serializable mode similar to [Fekete et al. 2005]. Our paper addresses integrity constraints in detail and extends the GID to model actions the are done to maintain database integrity. Our SI+IC guarantees the preservation of integrity constraints by avoiding G-IC and forbidding G-SIb* and G-1c* cycles. It does not require the entire transaction to be serializable. An option that could be further explored is to run integrity reads as sub-transactions that require serializability [Weikum and Vossen 2001]. It might be difficult, though, to combine this with a formalism like GID which allows a simple description of SI properties. 8.2 Snapshot Isolation in a Distributed System Snapshot Isolation in a Federated System. In a federated system, data is partitioned (not replicated) across a set of databases. 
Transactions from users are accepted by a federation layer which redirects them to underlying databases and performs any necessary pre- and post-processing. Schenkel et al. [1999] propose two algorithms to provide globally SI assuming the underlying database systems provide SI locally. The challenge in a distributed setting is that a transaction might need to read data from different databases. Using SI this means, it should read from the same snapshot at all replicas. Schenkel et al. [1999] indicate that for a schedule in a federated system to provide SI at the global level it should not have any two transactions Ti and Tj that are concurrent at one database D1 while Ti executes after Tj in database D2 and reads a data version written by Tj . The reasoning is that in this case Ti would read from a transaction Tj that is globally concurrent to Ti because there exists a database where both are concurrent. This is disallowed under SI. Snapshot Isolation in a Replicated System. In the last few years, several groups started to work concurrently on the concept of SI in a replicated system. Elnikety et al. [2005] present Generalized Snapshot Isolation (GSI) that it is a generalization of SI in the context of a centralized database. GSI is based on two definitions that are similar to Snapshot-Read and Snapshot-Write. However, it allows a transaction, instead of reading the committed snapshot at the time of transaction start, to read an older snapshot. This is equivalent to artificially setting the start point of a transaction into the past. Although the paper presents a replicated protocol no ACM Transactions on Database Systems, Vol. V, No. N, February 2009.
formalization is presented for the replicated case. A similar concept was defined by Daudjee and Salem [2006] as weak SI. GSI or weak SI are interesting in a replicated system since when a transaction starts locally, some update transactions might have already committed at other replicas but not yet at the local one. Thus, when looking at the global state, these transactions have already committed but the local transaction does not yet see the effects thus violating SI. In our framework, we are similar in concept to GSI because the start time of the transaction is set to the current state of the local database, and therefore, automatically put into the past (looking at the global state). Using GSI, however, a transaction’s start timepoint can be put arbitrarily into the past. An update transaction is more likely to abort if its start timepoint is farther in the past. Considering that the updates of a transaction are never committed at the same physical time at the different replicas it can happen that a transaction might miss important updates from an application point of view. For instance, in the protocols of Section 6 an update transaction Ti could be local at history RH l . Then a consecutive read-only transaction Tj of the same user could be local at a different replica Rk . It is possible that at the start of Tj at Rk , Rk has not yet applied and committed Ti . Thus, Tj will not see the changes of Ti . The system provides 1-copySI but from a user point of view, the execution is not correct since a user does not see its previous writes. In order to capture such dependencies between transactions, Elnikety et al. [2005] introduce prefix-consistent GSI that requires that a transaction Ti ’s snapshot needs to contain updates of transactions that committed before Ti and are related to Ti , for instance because they were submitted by the same user or because they belong to the same workflow. 
In a similar spirit, Daudjee and Salem [2006] say that a system provides strong session SI if it provides Snapshot-Read and Snapshot-Write and, for any two transactions Ti and Tj of the same user session, if Ti's commit precedes the first read/write operation of Tj, then ci ≺t sj. As far as we know, none of the formalisms developed to reason about SI in a distributed or replicated environment considers integrity constraints and their impact on the correctness of the system.
Replica Control for Snapshot Isolation. Several database replication protocols have been developed based on SI [Plattner and Alonso 2004; Plattner et al. 2008; Daudjee and Salem 2006; Kemme and Alonso 2000; Lin et al. 2005; Wu and Kemme 2005; Elnikety et al. 2005; Muñoz-Escoí et al. 2006]. For most, however, no formal proof of correctness has been given. Protocols are implemented either in the kernel of a database system or at a middleware layer. Primary copy protocols let all update transactions execute at a single primary replica while secondary replicas may only execute read-only transactions. In contrast, update anywhere protocols allow any transaction to be local at any replica. Lazy protocols send writesets to other replicas only after commit of the local transactions while eager protocols send them before commit. Several middleware approaches have one middleware instance for each database replica. They often use a total order multicast [Chockler et al. 2001] in order to allow for a distributed, yet deterministic validation. Snapshot isolation has also been used in the context of multi-tier middleware systems such as J2EE. In [Perez-Sorrosal et al. 2007] a replication protocol is presented for replicating both the application server and database tiers. It provides 1-copy-SI and cache
transparency. It guarantees that the cached data objects in the middle tier are versioned in a way consistent with SI and the underlying database.
8.3 Other correctness criteria
1-copy-serializability [Bernstein et al. 1987] was the first and strongest correctness criterion developed for replicated database systems, and is offered by many replication protocols [Carey and Livny 1991; Chundi et al. 1996; Breitbart et al. 1999; Pacitti et al. 1999; Kemme and Alonso 2000; Pedone et al. 2003; Amza et al. 2003; Holliday et al. 2003; Cecchet et al. 2004; Patiño-Martínez et al. 2005]. The proper serialization order is provided by various mechanisms, such as locking [Carey and Livny 1991; Cecchet et al. 2004], using the total order multicast of group communication systems [Kemme and Alonso 2000; Pedone et al. 2003; Patiño-Martínez et al. 2005], versioning [Amza et al. 2003], vector clocks and other timing mechanisms [Pacitti et al. 1999; Holliday et al. 2003], serialization graphs [Breitbart and Korth 1997], or by restricting how object copies can be collocated [Chundi et al. 1996; Breitbart et al. 1999]. The concept of data freshness (or staleness) levels has received considerable attention since it allows systems to provide faster response times at the cost of staler data. Röhm et al. [2002] present a lazy primary copy system where freshness of a secondary replica is based on the time between the last applied update at the secondary and the most recent update on the primary replica. Queries can indicate the minimum freshness of the data they want to see. Thus, applying writesets at secondaries can be delayed to timepoints when there is little load in the system or until secondaries are too stale. In [Gançarski et al. 2007], staleness is defined on a relation basis and reflects the number of tuple changes a replica has not yet seen. In [Plattner et al. 2008], secondary replicas can be designed to maintain an important snapshot or to load a required past snapshot. 
As in [Röhm et al. 2002], applying writesets to secondaries can be delayed to give preference to queries, which then get faster responses at the price of less accurate data. In all these approaches, global correctness is not violated, and the systems still provide 1-copy-serializability or 1-copy-SI. As with GSI, from an abstract point of view it means that the start timepoint for a query can be put into the past, limited by the freshness value. In [Bernstein et al. 2006], the concept of Relaxed Currency Serializability is introduced and applied to a distributed and replicated cache. Several constraints, such as time-bound, value-bound, or drift constraints, can be defined over a set of data items. A transaction may read from different snapshots (such as in the read-committed isolation level [Adya et al. 2000]) as long as freshness constraints are satisfied.
9. CONCLUSION
In this paper, we present a formal framework to reason about snapshot isolation in a replicated environment. Our framework is based on the General Isolation Definition (GID), which provides a graph-based way to reason about the correctness of snapshot isolation schedules. We extend GID in several ways. Firstly, we extend it to reason about replicated histories, and define what it means for a replicated history to provide SI at the global level, i.e., to provide 1-copy-SI. By extending the graph-based reasoning tool of GID, we can derive sufficient and necessary conditions for a replicated history to be 1-copy-SI by looking at the dependency graph
of the replicated history and testing for certain cycles. From there, we analyze how commercial systems that provide SI handle integrity constraints. Since basic SI can lead to violations of integrity constraints but commercial systems maintain them, we derive a new isolation level, denoted as SI+IC, which is supported by several database systems. It models the maintenance of integrity constraints by first requiring a transaction to perform special integrity read operations on relevant data items and then ensuring that at commit time the data versions read are still valid. The dependency graph of histories with integrity reads is extended and new graph-based conditions define when a history provides SI+IC. We then extend the notation to a replicated history to define 1-copy-SI+IC and identify conditions that allow us to determine whether a history is 1-copy-SI+IC. In order to handle failures, some special care has to be taken to capture all dependencies.
10. ACKNOWLEDGEMENTS
We want to thank the anonymous reviewers for their insightful reviews and many constructive suggestions. They helped us tremendously in improving this paper.
REFERENCES
Adya, A. 1999. Weak consistency: A generalized theory and optimistic implementations for distributed transactions. Ph.D. thesis, MIT, Cambridge.
Adya, A., Liskov, B., and O'Neil, P. E. 2000. Generalized isolation level definitions. In Proc. of the IEEE Int. Conf. on Data Engineering (ICDE). 67–78.
Alomari, M., Cahill, M. J., Fekete, A., and Röhm, U. 2008. The cost of serializability on platforms that use snapshot isolation. In Proc. of the IEEE Int. Conf. on Data Engineering (ICDE). 576–585.
Amza, C., Cox, A. L., and Zwaenepoel, W. 2003. Distributed versioning: Consistent replication for scaling back-end databases of dynamic content web sites. In Proc. of the ACM/IFIP/USENIX Int. Middleware Conf. 282–302.
ANSI X3.135-1992. 1992. American National Standard for Information Systems - Database Language - SQL.
Berenson, H., Bernstein, P., Gray, J., Melton, J., O'Neil, E., and O'Neil, P. 1995. A critique of ANSI SQL isolation levels. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data. 1–10.
Bernstein, P. A., Fekete, A., Guo, H., Ramakrishnan, R., and Tamma, P. 2006. Relaxed-currency serializability for middle-tier caching and replication. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data. 599–610.
Bernstein, P. A., Hadzilacos, V., and Goodman, N. 1987. Concurrency Control and Recovery in Database Systems. Addison-Wesley.
Breitbart, Y., Komondoor, R., Rastogi, R., Seshadri, S., and Silberschatz, A. 1999. Update propagation protocols for replicated databases. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data. 97–108.
Breitbart, Y. and Korth, H. F. 1997. Replication and consistency: Being lazy helps sometimes. In Proc. of the ACM Int. Symp. on Principles of Database Systems (PODS). 173–184.
Cahill, M., Röhm, U., and Fekete, A. 2008. Serializable isolation for snapshot databases. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data. 729–738.
Carey, M. J. and Livny, M. 1991. Conflict detection tradeoffs for replicated data. ACM Transactions on Database Systems (TODS) 16, 4, 703–746.
Cecchet, E., Marguerite, J., and Zwaenepoel, W. 2004. C-JDBC: Flexible database clustering middleware. In Proc. of USENIX Annual Technical Conference, FREENIX Track. 9–18.
Chockler, G., Keidar, I., and Vitenberg, R. 2001. Group communication specifications: a comprehensive study. ACM Computing Surveys 33, 4, 427–469.
Chundi, P., Rosenkrantz, D. J., and Ravi, S. S. 1996. Deferred updates and data placement in distributed databases. In Proc. of the IEEE Int. Conf. on Data Engineering (ICDE). 469–476.
Daudjee, K. and Salem, K. 2006. Lazy database replication with snapshot isolation. In Proc. of Int. Conf. on Very Large Data Bases (VLDB). 715–726.
Elnikety, S., Pedone, F., and Zwaenepoel, W. 2005. Database replication using generalized snapshot isolation. In Proc. of the Int. Symp. on Reliable Distributed Systems (SRDS). 73–84.
Fekete, A., Liarokapis, D., O'Neil, E., O'Neil, P., and Shasha, D. 2005. Making snapshot isolation serializable. ACM Transactions on Database Systems (TODS) 30, 2, 492–528.
Gançarski, S., Naacke, H., Pacitti, E., and Valduriez, P. 2007. The Leganet system: Freshness-aware transaction routing in a database cluster. Information Systems 32, 2, 320–343.
Holliday, J., Steinke, R. C., Agrawal, D., and Abbadi, A. E. 2003. Epidemic algorithms for replicated databases. IEEE Transactions on Knowledge and Data Engineering (TKDE) 15, 5, 1218–1238.
Kemme, B. and Alonso, G. 2000. A new approach to developing and implementing eager database replication protocols. ACM Transactions on Database Systems (TODS) 25, 3, 333–379.
Lin, Y., Kemme, B., Patiño-Martínez, M., and Jiménez-Peris, R. 2005. Middleware based data replication providing snapshot isolation. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data. 419–430.
Microsoft SQL Server 2005. 2007. SQL Server 2005 row versioning-based transaction isolation.
Muñoz-Escoí, F. D., Pla-Civera, J., Ruiz-Fuertes, M. I., Irún-Briz, L., Decker, H., Armendáriz-Iñigo, J. E., and González de Mendívil, J. R. 2006. Managing transaction conflicts in middleware-based database replication architectures. In Proc. of the Int. Symp. on Reliable Distributed Systems (SRDS). 401–410.
Oracle Corporation. 2007. Oracle 11g Release 1.
Pacitti, E., Minet, P., and Simon, E. 1999. Fast algorithm for maintaining replica consistency in lazy master replicated databases. In Proc. of Int. Conf. on Very Large Data Bases (VLDB). 126–137.
Patiño-Martínez, M., Jiménez-Peris, R., Kemme, B., and Alonso, G. 2005. MIDDLE-R: Consistent database replication at the middleware level. ACM Transactions on Computer Systems (TOCS) 23, 4, 375–423.
Pedone, F., Guerraoui, R., and Schiper, A. 2003. The database state machine approach. Distributed and Parallel Databases 14, 1, 71–98.
Perez-Sorrosal, F., Patiño-Martínez, M., Jiménez-Peris, R., and Kemme, B. 2007. Consistent and scalable cache replication for multi-tier J2EE applications. In Proc. of the ACM/IFIP/USENIX Int. Middleware Conf. 328–347.
Plattner, C. and Alonso, G. 2004. Ganymed: Scalable replication for transactional web applications. In Proc. of the ACM/IFIP/USENIX Int. Middleware Conf. 155–174.
Plattner, C., Alonso, G., and Özsu, M. T. 2008. Extending DBMSs with satellite databases. VLDB J. 17, 4, 657–682.
PostgreSQL. 2007. PostgreSQL, the world's most advanced open source database.
Röhm, U., Böhm, K., Schek, H.-J., and Schuldt, H. 2002. FAS - a freshness-sensitive coordination middleware for a cluster of OLAP components. In Proc. of Int. Conf. on Very Large Data Bases (VLDB). 754–765.
Schenkel, R., Weikum, G., Weißenberg, N., and Wu, X. 1999. Federated transaction management with snapshot isolation. In Int. Workshop on Foundations of Models and Languages for Data and Objects (FMLDO) - Selected Papers. 1–25.
Weikum, G. and Vossen, G. 2001. Transactional Information Systems. Morgan Kaufmann, Chapter 6.
Wu, S. and Kemme, B. 2005. Postgres-R(SI): Combining replica control with concurrency control based on snapshot isolation. In Proc. of the IEEE Int. Conf. on Data Engineering (ICDE). 422–433.
Appendix A. SERIALIZATION GRAPH DEPENDENCY EDGES
Definition 3.4 of Section 3.1 defines a replicated history RH to be 1-copy-SI if all local histories are SI-histories, if all local histories commit the same set of update transactions, and if there is a global history H with the same set of committed transactions, whose SSG(H) has the same nodes and the same ww-, wr-, and rw-dependency edges as USG(RH). In this definition, the serialization graphs (SSG or USG) do not consider the data items that lead to the dependency edges. In this section we argue that considering the individual data items is not necessary. For the non-replicated case, Adya [1999] shows that GID does not require looking at the individual data items that trigger dependencies. But in Definition 3.4 it seems feasible that a dependency edge from $T_i$ to $T_j$ in the USG(RH) of a replicated history is due to a conflict on data item x, while the SSG(H) of the "equivalent" global SI-history H has this same dependency edge due to a data item y. One option to express the requirement that dependency edges reflect conflicts on the same data items would be to tag dependency edges with the data items that cause them. For instance, if a history had operations $\ldots w_i(x_i) \ldots r_j(x_i) \ldots$, then the SSG resp. USG could have a $T_i \xrightarrow{wr_x} T_j$ dependency edge. The 1-copy-SI definition could then require the dependency edges of SSG(H) of the global history H to have the same item tags as the corresponding dependency edges in USG(RH). The reasoning we used in Section 3.1 would work equally well with this extended notation but for simplicity, we did not use it. Instead, we show in this appendix that the dependency edges in SSG(H) must be due to the same data items as the corresponding edges in USG(RH).
Lemma Appendix A.1. Let $RH = \bigcup_k RH^k$ be a replicated history over $rmap(\mathcal{T}, \mathcal{R})$ with the following properties.
(1) $\forall R^k \in \mathcal{R}$, $RH^k$ is an SI-history.
(2) For all update transactions $T_i \in \mathcal{T}$ and for all $R^k, R^l \in \mathcal{R}$: $c_i^k \iff c_i^l$.
(3) There exists a global SI-history H over $\mathcal{T}$ such that (a) SSG(H) and USG(RH) have the same nodes; (b) SSG(H) has exactly the same ww-, wr-, and rw-dependency edges as USG(RH).
Then the following holds. If a dependency edge in USG(RH) is due to data item x, then the corresponding dependency edge in SSG(H) is also due to x. If a dependency edge in SSG(H) is due to data item x, then the corresponding dependency edge in USG(RH) is also due to x.
Proof. We show for each type of dependency edge that it must be due to the same data items in both USG(RH) and SSG(H). Our proof only shows one direction; the other follows the same reasoning.
(1) Assume that USG(RH) contains $T_j \xrightarrow{wr} T_i$ due to data item x and taken from $SSG(RH^k)$ of replica $R^k$, while the corresponding edge in SSG(H) is not due to x but due to y. This means $T_j$ writes both x and y. Furthermore, there must be an edge $T_k \xrightarrow{wr} T_i$ in SSG(H) due to x, as $T_i$ must read x from some other transaction (assuming a start transaction that writes all data items before they are read). As a result, $SSG(RH^k)$ must also contain $T_k \xrightarrow{wr} T_i$ (all read-dependency edges to $T_i$ in USG(RH) are from $T_i$'s local history). As $T_k$ writes x, and $T_i$ in $RH^k$ reads x from $T_j$ and not from $T_k$, and $RH^k$ is an SI-history, it follows that $T_k \xrightarrow{s,\,ww^{+}} T_j \xrightarrow{s,\,wr} T_i$. But as SSG(H) and USG(RH) have the same ww-dependency edges and H is an SI-history, it follows that $T_k \xrightarrow{ww^{+}} T_j$ in SSG(H), and as we also have $T_j \xrightarrow{wr,\,s} T_i$ in SSG(H) due to y, $T_i$ will actually also read x from $T_j$ and not $T_k$ according to the Snapshot-Read property of reading the last committed version. Thus, $T_j \xrightarrow{wr} T_i$ in SSG(H) is due to x and our assumption is wrong.
(2) Assume that USG(RH) contains a $T_j \xrightarrow{ww} T_i$ due to data item x taken from $SSG(RH^k)$, while the corresponding edge in SSG(H) is not due to x but due to y. At the same time, as $T_j$ and $T_i$ both write x, H must order $x_j$ and $x_i$, and SSG(H) must also contain a $ww^{+}$-dependency path between $T_j$ and $T_i$. As $T_i \xrightarrow{ww^{+}} T_j$ would lead to a G-1c cycle in SSG(H), $T_j \xrightarrow{ww^{+}} T_i$ must hold. As by assumption $T_i$ cannot directly write-depend on $T_j$ due to x, there must be at least one other transaction $T_k$ with $T_j \xrightarrow{ww} T_k \xrightarrow{ww^{+}} T_i$ in SSG(H), and thus in USG(RH). As $T_k$ also writes x and we assume $T_j \xrightarrow{ww} T_i$ in $SSG(RH^k)$ due to data item x, we must either have $T_k \xrightarrow{ww^{+}} T_j$ or $T_i \xrightarrow{ww^{+}} T_k$ in $SSG(RH^k)$ due to x. But both will result in a G-1c cycle in USG(RH) since there is already $T_j \xrightarrow{ww} T_k \xrightarrow{ww^{+}} T_i$ in USG(RH).
(3) Now assume that $SSG(RH^k)$ contains a $T_i \xrightarrow{rw} T_j$ due to data item x for transaction $T_i$ local at $R^k$. Furthermore, there must be a $T_k \xrightarrow{wr} T_i$ due to x, as we assume that a start transaction writes all data items read by transactions in the history. By the definition of wr- and rw-dependency edges this means that there is also a $T_k \xrightarrow{ww} T_j$ in $SSG(RH^k)$ due to x. As we have seen already that all ww- and wr-dependency edges in USG(RH) and SSG(H) are due to the same data item, $T_k \xrightarrow{wr} T_i$ and $T_k \xrightarrow{ww} T_j$ also exist in SSG(H) due to x. Thus, due to Proposition 2.5, this means that SSG(H) also has $T_i \xrightarrow{rw} T_j$ due to x.
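The item-tagged dependency edges discussed in this appendix can be illustrated with a short sketch. The Python fragment below is a hypothetical illustration, not the paper's formalism: the input encoding (`commit_order`, `writes`, `reads`) and the function names are assumptions made for the example. It also assumes that the version order of each item follows the commit order of its writers (as Snapshot-Write requires) and that a start transaction writes every item before it is read. The rw rule it applies mirrors case (3) above: a wr edge from the writer to the reader on x, together with a ww edge from the same writer to the next writer of x, yields an rw edge from the reader to that next writer on x.

```python
from collections import defaultdict


def tagged_edges(commit_order, writes, reads):
    """Derive dependency edges tagged with the data item causing them.

    commit_order: transaction ids in commit order (assumed total).
    writes: dict txn -> set of items the transaction writes.
    reads: list of (reader, item, writer) triples, meaning `reader`
           read the version of `item` installed by `writer`.
    Returns a set of (source, target, kind, item) tuples.
    """
    # Version order per item = commit order of its writers (assumption).
    versions = defaultdict(list)
    for txn in commit_order:
        for item in sorted(writes.get(txn, ())):
            versions[item].append(txn)

    edges = set()
    # ww: consecutive versions of the same item.
    for item, order in versions.items():
        for earlier, later in zip(order, order[1:]):
            edges.add((earlier, later, "ww", item))

    for reader, item, writer in reads:
        # wr: reader read the version installed by writer.
        edges.add((writer, reader, "wr", item))
        # rw: some other transaction installs the next version of the item.
        order = versions[item]
        pos = order.index(writer)
        if pos + 1 < len(order) and order[pos + 1] != reader:
            edges.add((reader, order[pos + 1], "rw", item))
    return edges


def untagged(edges):
    """Drop the item tags, as in the plain SSG/USG of the paper."""
    return {(src, dst, kind) for src, dst, kind, _ in edges}


# T0 is the start transaction; T2 reads x from T0 while T1 overwrites x.
edges = tagged_edges(
    commit_order=["T0", "T1", "T2"],
    writes={"T0": {"x", "y"}, "T1": {"x"}, "T2": {"y"}},
    reads=[("T2", "x", "T0")],
)
for edge in sorted(edges):
    print(edge)
```

Comparing `tagged_edges` of two histories checks the stricter, item-aware notion of equality, while comparing their `untagged` projections corresponds to the comparison used in Definition 3.4. The lemma states that, under its premises, the two comparisons agree.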
Appendix A.1
Complete Proof of Lemma 3.5
Recall that the lemma states that a (non-replicated) SI-history H over a set of transactions T avoids G-SIb*.
Proof. Assume there is an SI-history H that has phenomenon G-SIb*. SSG(H) cannot have a cycle with only one rw-dependency edge because it avoids G-SIb. Thus, SSG(H) has a cycle $c$ with m (m > 1) rw-dependency edges in which each rw-dependency edge is prefixed by a ww-, wr-, or start-dependency edge. Firstly, we can easily derive that SSG(H) must have a cycle $c'$ with m (m > 1) rw-dependency edges in which all other edges are start-dependency edges. This is because whenever there is a ww- or wr-dependency edge from $T_i$ to $T_j$ there is also a start-dependency edge because of G-SIa. Thus, in the following, we only consider a cycle that consists of m rw-dependency edges, in which all other edges are start-dependency edges and each rw-dependency edge is prefixed by a start-dependency edge. We can break the cycle into m sections. Each section $k \in \{0, \ldots, m-1\}$ has the pattern
$$T_{i_k} \xrightarrow{s^{+}} T_{j_k} \xrightarrow{rw} T_{i_{(k+1)\%m}}.$$
According to Table II, we can derive for the $\prec_t$-order of H for each section k due to transitivity:
$$\left.\begin{array}{l} T_{i_k} \xrightarrow{s^{+}} T_{j_k} \;\Rightarrow\; c_{i_k} \prec_t s_{j_k} \\ T_{j_k} \xrightarrow{rw} T_{i_{(k+1)\%m}} \;\Rightarrow\; s_{j_k} \prec_t c_{i_{(k+1)\%m}} \end{array}\right\} \;\Rightarrow\; c_{i_k} \prec_t c_{i_{(k+1)\%m}}$$
If we now look at all sections, we obtain:
$$c_{i_0} \prec_t c_{i_1} \prec_t \cdots \prec_t c_{i_k} \prec_t c_{i_{k+1}} \prec_t \cdots \prec_t c_{i_{m-1}} \prec_t c_{i_0}.$$
Since $\prec_t$ is irreflexive, this results in a contradiction.
Appendix A.2
Complete Proof of Lemma 3.7
Recall that this lemma indicates that for a replicated history RH where USG(RH) has no G-1c cycles, two conflicting committed transactions commit in the same order in all local histories.
Proof. Assume two write transactions $T_i$ and $T_j$ updating the same data object, and two arbitrary replicas $R^A$ and $R^B$. Since all local histories commit the same set of update transactions, we know that if $c_i^B$ and $c_j^B$ occur in $RH^B$, so do $c_i^A$ and $c_j^A$ in $RH^A$, and vice versa. Now assume $c_i^A \prec_t c_j^A$ in $RH^A$ and $c_j^B \prec_t c_i^B$ in $RH^B$. Let x be one of the objects that $T_i$ and $T_j$ both update. As the Snapshot-Write property requires the version order to follow the commit order, $c_i^A \prec_t c_j^A$ implies that $x_i$ precedes $x_j$ in the version order of $RH^A$. By the definition of ww-dependency edges (as in Table I), if $x_i$ precedes $x_j$ in the version order, then $SSG(RH^A)$, and thus USG(RH), have a path $T_i \xrightarrow{ww^{+}} T_j$ consisting of only ww-dependency edges. Similarly, $c_j^B \prec_t c_i^B$ in $RH^B$ will lead to $T_j \xrightarrow{ww^{+}} T_i$ in USG(RH). This results in USG(RH) having a cycle consisting only of ww-dependency edges. This contradicts the assumption that USG(RH) avoids G-1c.
Appendix A.3
Complete proof of Theorem 3.9
Recall that Theorem 3.9 indicates that a replicated history RH is 1-copy-SI if the following holds.
(1) For each $R^k \in \mathcal{R}$, $RH^k$ is an SI-history.
(2) For all update transactions $T_i \in \mathcal{T}$ and for all $R^k, R^l \in \mathcal{R}$: $c_i^k \iff c_i^l$.
(3) USG(RH) has no G-1c or G-SIb* cycles.
Proof. To prove this, according to the definition of 1-copy-SI (Definition 3.4), it is sufficient to show that we are able to construct an SI-history H over $\mathcal{T}$ with the same ww-, wr-, and rw-dependencies as USG(RH). The proof consists of three parts. First, we create a global history H based on the dependency edges in USG(RH). Then, we show that H really has the same dependency edges as USG(RH). Finally, we show that H is actually an SI-history.
Part (1): To construct a history H. We first build a total order between start and commit operations of all committed transactions. Then we fill in the read and write operations, determine the version orders, and indicate the versions read by the read operations.
Step 1: Partially ordering starts and commits. In order to obtain this total order we construct a Start-Commit-Order Serialization Graph, SCSG(RH), where the vertices are the start and commit operations of all committed transactions.
1. For each $T_i$ in USG(RH), there is an edge $s_i \xrightarrow{<} c_i$ in SCSG(RH). This reflects the fact that the $\prec_t$-order requires the start of a transaction to be before its commit, i.e., $s_i \prec_t c_i$.