Wireless Networks 8, 391–402, 2002
© 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.

Disconnection Modes for Mobile Databases

JOANNE HOLLIDAY
Department of Computer Engineering, Santa Clara University, Santa Clara, CA 95053, USA

DIVYAKANT AGRAWAL and AMR EL ABBADI
Department of Computer Science, University of California at Santa Barbara, Santa Barbara, CA 93106, USA

Abstract. As mobility permeates today's computing and communication arena, we envision application infrastructures that will increasingly rely on mobile technologies. Traditional database applications and information service applications will need to integrate mobile entities: people and computers. In this paper, we develop a distributed database framework for mobile environments. A key requirement in such an environment is to support frequent connection and disconnection of database sites. We present algorithms that implement this framework in an asynchronous system.

Keywords: databases, data consistency, mobility, replication, disconnected operation

1. Introduction

As mobility permeates today's computing and communication arena, we envision application infrastructures that will increasingly rely on mobile technologies. Current mobile applications tend to rely on a large central server and use mobile platforms only as caching devices. We want to elevate mobile computers to first-class entities, in the sense that they give the mobile user the ability to work on and update data independently of a central server. In such an environment, several mobile computers may collectively form the entire distributed system of interest. These mobile computers may communicate in an ad hoc manner through networks that are formed on demand, whether wired or wireless. At any given time, a subset of the computers may be connected and will require reliable and dependable access to data of interest.

In this paper, we consider a distributed database that can be made up entirely of mobile components. These component sites are peers, and the database may be replicated for fault tolerance, availability, or both. Thus we have a distributed replicated database in which all sites must participate in the synchronization of transactions. We extend the capabilities of such a database by allowing a site, or team member, to plan a disconnection so that the remaining sites are minimally disrupted. A “check-out” mode allows a disconnected member to update a portion of the database independently; these updates are automatically synchronized and integrated into the database upon reconnection. A “relaxed check-out” mode permits increased concurrency at the expense of serializability, the traditional notion of correctness in database management systems. An “optimistic check-out” mode further increases concurrency but requires reconciliation rules. We also discuss an implementation in an asynchronous system.
Our approach emphasizes the mobile aspect of all system components, and hence the lack of a centralized server.

Walborn and Chrysanthis [27] describe the use of mobile computers in the trucking industry. Each truck has a computer with a satellite or radio link that interacts with the corporate database. Other applications, such as those in remote or disaster areas and military settings, have mobile computers forming ad hoc networks without any communication with stationary computers. Faiz and Zaslavsky [8] discuss the impact of wireless technologies and mobile hosts on a variety of replication strategies. Distributed replicated file systems such as Ficus [21] and Coda [17] have extensive experience with disconnected operation. The problem of failed sites, and the related issue of network partitioning, has been investigated extensively, and many of the results apply to disconnected operation. In this paper, we accommodate voluntary disconnection by using the notion of planned disconnection, which has been shown to be particularly useful in asynchronous systems [16].

In section 2, we present several alternative disconnection modes. In section 3, we present our model of a distributed database system. In section 4, we explain in detail how the various modes operate. In section 5, we show how to implement the modes in an asynchronous system. In section 6, we discuss related work and compare it with ours. We conclude in section 7.

2. Motivation

In the system we are considering, the components of the distributed database are laptop computers belonging to members of a small team. The team members maintain a database which may or may not be replicated at each of their laptop computers. Normal operations ensure that all team members are completely synchronized with each other's progress. Team members gather in various locations and form ad hoc networks, wired or wireless, to perform their work. Several varieties of wireless LANs are already available. Examples include NCR's WaveLAN (Lucent Technologies), Motorola's ALTAIR, Proxim's Range LAN, and Telesystem's ARLAN (see http://www.wlana.com).

The laptop computers of the team members carry the only active copies of the database. When team members gather in a remote location, they need update capabilities on their database without having to connect to a computer at a fixed location. The database system must deal with both planned and unplanned disconnections. We consider three scenarios covering ways in which the team members might interact with their database: basic sign-off, check-out, and relaxed check-out.

In the first scenario, team members occasionally disconnect for long periods of time, for instance when one or more of them are traveling. When this happens, the remaining team members need to be informed so that the database can continue to process updates, knowing that the missing processor has not failed but has voluntarily disconnected. The traveling member's database copy can continue to process read-only transactions. We call this mode basic sign-off. The idea of planned disconnection is not new [16], and it is reasonable for computers in a distributed database to go through a sign-off procedure before disconnecting, telling the remaining members not to expect acknowledgments or votes on the disposition of shared data.

Our first scenario involves Bob, the CEO of a small company with only 11 employees. The company decided to spend its “office equipment” budget on laptop computers for the three senior executives and a desktop computer for the executive secretary. The three laptops of the senior executives contain replicas of the corporate database, which is generally updated during their frequent meetings, when they connect their computers in an ad hoc network. The database is occasionally backed up onto the secretary's computer.
When Bob travels to meet with customers or give a presentation, he wants the other senior executives to be able to continue working with the corporate database in his absence. He also wants to take his laptop on his trip so that he can peruse the database while waiting at the airport or in his hotel room. For the basic sign-off mode, the database system provides a sign-off capability for Bob, or any other senior executive, who wishes to travel; without it, the distributed database system could not synchronize all of the database components and would not allow updates. A manual override can handle the case where a member has forgotten to sign off. When Bob returns and reconnects, he wants his copy of the database to be brought up to date automatically. For this, the database system provides a sign-on procedure.

In the next scenario, the traveling team member wishes to “check out” a portion of the database so that updates can be made to that portion while disconnected from the rest of the members. The check-out idea is adapted from version control software [5], and this mode is called check-out mode. This mode could be needed when the team member is traveling to a location where he will be able to gather information relevant to that portion of the database, or when that portion of the database concerns the work or output of that particular team member. When the traveling team member has checked out a portion of the database, the other team members are completely unable to access that data. It is as if the traveling team member had a write lock on the checked-out data. The traveling member can access, in read-only mode, the other portions of the database that he did not check out. When the traveling team member returns and reconnects his laptop to the distributed database, the changes he has made are automatically synchronized and integrated into the database. This contrasts with a system without check-out mode, which requires manual integration of the changes into the database or complex reconciliation rules.

An example of check-out mode involves Alice, who is part of a small sales team that currently covers the Pacific Northwest states. She meets with her team almost every day, at which time they bring their laptops together to form a network and share their information. At that time, their common database is updated with production and product availability information from the factory. Before the computers are disconnected, each team member is given a sales assignment, and a portion of the database is checked out to each of them to update as they make sales and delivery contracts. Alice can update the portion that is checked out to her as well as query the portions of the database that were not checked out to anyone. She cannot access the portions of the database that were checked out to other team members; however, she does not need them to do her work. At the next meeting, the laptops are connected and the updates to the database are integrated smoothly without manual intervention. In section 4, we explore several variations of check-out mode.

The third scenario, relaxed check-out mode, is like check-out except that the portions of the database that have been checked out to other team members can be queried, that is, accessed in read-only mode.
Unlike the two scenarios above, in which all executions are serializable, this mode can produce executions that are not serializable. However, there are many situations in which relaxing the serializability requirement is acceptable. Jane is a field researcher in a remote part of the world. She and her fellow researchers take their laptops to remote observation platforms and weather stations. They record animal migration patterns and download weather information from sensor stations. While recording animal migrations, she frequently needs to check recordings from an earlier day or a different location to help her interpret her measurements for her field notes. The database on the computers of Jane and her fellow researchers supports relaxed check-out mode. She checks out the portion of the database that pertains to the location she is visiting or the animal species she is studying. She and her fellow researchers can also query the portions checked out to other researchers, although the query results reflect old values from the last time the researchers gathered and formed a network. She knows she may be viewing stale data, but it is still useful to her. When she checks in her work at the end of the day, it is automatically integrated into the database.

3. Database model

We consider a distributed system consisting of n sites labeled S1, S2, ..., Sn. The database, DB, is fully replicated; that is, copies of the data items are stored at every site. Users interact with the database by invoking transactions at any one of the database sites. A transaction is a sequence of read and write operations on the data items that is executed atomically. The criterion for correctness in databases is the serializable execution of transactions [4]. Serializable executions are guaranteed by a concurrency control mechanism such as two-phase locking, timestamp ordering, or optimistic concurrency control [11]. Since two-phase locking is widely used, we assume that each site in our distributed system enforces two-phase locking locally. In a replicated database, multiple physical copies must appear to the user as a single logical copy. Therefore, the criterion for correctness in replicated databases is one-copy serializability [3]. We assume that some mechanism enforces one-copy serializability. It could be a synchronous control protocol or an asynchronous site management scheme such as the one discussed in section 5. A synchronous system might use Read-One-Write-All (ROWA), where read operations can read any copy and write operations must write all copies before the transaction can complete. An alternative to ROWA is a quorum system [9], where write operations write to a write quorum of copies and update their version numbers, and read operations read from a read quorum and select the most recent version. Operation conflicts are resolved consistently at all sites by having each site vote, either implicitly when it acknowledges a write operation or explicitly in an agreement or voting scheme. In the interest of generality, we do not require any particular communication style or distributed database management scheme, as long as it provides some form of global synchronization.
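The quorum alternative to ROWA can be sketched as follows. This is a minimal illustration, assuming one vote per site and a single data item; QuorumItem, Copy, read and write are hypothetical names, and the paper does not prescribe any particular quorum configuration. The constructor enforces the usual intersection conditions r + w > n and 2w > n, and ROWA is the special case r = 1, w = n.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """One physical copy of the data item, stored at one site."""
    value: object = None
    version: int = 0


class QuorumItem:
    """One logical data item replicated at n sites and accessed through quorums."""

    def __init__(self, n: int, r: int, w: int):
        # r + w > n: every read quorum meets every write quorum.
        # 2w > n: any two write quorums meet, so versions never fork.
        if not (r + w > n and 2 * w > n):
            raise ValueError("quorums must satisfy r + w > n and 2w > n")
        self.n, self.r, self.w = n, r, w
        self.copies = [Copy() for _ in range(n)]

    def read(self, sites):
        """Read a read quorum and return the most recent (value, version)."""
        if len(set(sites)) < self.r:
            raise ValueError("not a read quorum")
        newest = max((self.copies[s] for s in set(sites)), key=lambda c: c.version)
        return newest.value, newest.version

    def write(self, sites, value):
        """Write a write quorum, stamping a version above any written so far."""
        if len(set(sites)) < self.w:
            raise ValueError("not a write quorum")
        # The quorum overlaps every earlier write quorum, so its highest
        # version is the highest version written anywhere.
        version = max(self.copies[s].version for s in set(sites)) + 1
        for s in set(sites):
            self.copies[s] = Copy(value, version)
        return version


item = QuorumItem(n=5, r=3, w=3)
item.write([0, 1, 2], "v1")
item.write([2, 3, 4], "v2")
print(item.read([0, 1, 3]))  # ('v2', 2)
```

With n = 5 and r = w = 3, the read from sites 0, 1 and 3 sees the second write even though two of those sites still hold the first version, because every read quorum overlaps every write quorum.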
In this paper, we deal with a distributed database in which:
• The number of team members is fixed; however, some of those team members may be disconnected. In fact, members of the distributed system frequently disconnect, and members will need to gather and form ad hoc networks in different geographical locations.
• Database consistency is important. Serializability is enforced locally with two-phase locking. Some method of global synchronization must also be used, although to preserve generality we do not specify which method.
• The distributed system must be able to process database updates even though some of the team members are not available. It is not desirable to designate one of the members as a primary site that must always be present when a network is formed; therefore, all members should be equal participants.

4. Disconnection types and implementation

Unplanned disconnection is a disconnection without informing the distributed system of the intention to disconnect and reconnect in an orderly manner. As a result, the disconnected site is detected as a failure. Planned disconnection involves informing the distributed system of the intention to disconnect and may include the appointment of a proxy. The purpose of a planned disconnection procedure is to enable the remaining connected sites to continue processing with minimal disruption. A disconnection can affect the database in a number of different ways. In this section, we explore the possible kinds of planned disconnection and provide appropriate terminology.

In basic sign-off mode [12], a site Si decides to disconnect and informs the system, consisting of the currently connected sites, of its intention. The database of the disconnected site becomes read-only, while the access capabilities of the remaining connected sites are unaffected. This is illustrated in figure 1, in which the dotted circle indicates read-only access and the solid circle indicates full read/write access.

In check-out mode [13], the site Si wants to disconnect and be able to update a set of data items I(Si). There are three variations of this mode, which differ in the type of access allowed to items that are not checked out. The first variation is DB partition. In DB partition mode, the database is partitioned into I(Si) and DB − I(Si). The disconnected site has complete and unlimited access to I(Si) and nothing else, while the remaining system has complete access to DB − I(Si) and nothing else. This is illustrated in figure 2. The second variation is check-out with mobile read. This mode allows the disconnected site to have read access to all database items in addition to read/write access to the checked-out items, I(Si). The remaining connected sites in the system have complete (read/write) access to DB − I(Si) and no access to I(Si). This is illustrated in figure 3, where the dotted lines indicate the read-only portion of the database. The third variation is check-out with system read. This mode allows the connected sites in the system to have read access to all database items in addition to read/write access to the non-checked-out items, DB − I(Si). This is illustrated in figure 4.

In check-out mode, when Si checks out an object, either Si or the remaining connected sites are prevented from reading some of the objects in the database. Although this is necessary to preserve serializability, many database systems operate with lesser degrees of isolation [11]. Therefore, we define a relaxed check-out mode (figure 5), in which the remaining sites can read the items that other sites have checked out, while disconnected sites can read items they have not checked out. Optimistic check-out mode gives both the disconnected site and the connected system full access to the entire database. It optimistically assumes that conflicting updates will not occur and, if they do, uses predefined rules to integrate the conflicting updates when the disconnected sites reconnect. Optimistic check-out is illustrated in figure 6.

Figure 1. Basic sign-off of site Si.
Figure 2. Part of the database is checked out by site Si.
Figure 3. Check-out with mobile read.
Figure 4. Check-out with system read.
Figure 5. Relaxed check-out of site Si.
Figure 6. Optimistic check-out of site Si.

4.1. Basic sign-off

In simple planned disconnection, or basic sign-off, the database of the disconnected site becomes read-only. The connected sites continue to read and update the data items or objects. The planned disconnection is accomplished with the help of a proxy. When a site disconnects, it appoints another site to vote or acknowledge operations on its behalf, to ensure that replicas can be updated, and to take part in any other actions of the distributed system that require consensus. The power to vote on behalf of another member is called a proxy, and the site holding that power is also referred to as a proxy. In a broadcast communication scheme, all sites receive all messages, so the proxy can easily respond to another member's messages. In point-to-point communication, the address of the disconnected member must be changed to point to the proxy.

If site Si wants to disconnect from a distributed database, it carries out a disconnect dialog so that the system knows it has not failed but will merely be disconnected for a period of time. Si contacts another member Sj to be Si's proxy. During the disconnection, Si can only read its local copy of the data. When Sj sees a message for Si, it answers on behalf of Si. Sj also keeps track of the updates to the database that Si has missed because of the disconnection. Assume that site Si wishes to disconnect. The sign-off procedure is as follows:
1. Site Si selects a proxy. The proxy can be any peer that is currently connected or a special site designated to hold proxies.
2. Si gives the proxy the right to vote for Si in matters concerning updates to the data items.
3. Si ensures that the proxy is aware of Si's current state, i.e., which updates have been processed and which have not, and then disconnects.
Si must also go through a sign-on procedure when reconnecting:
1. Si reconnects and contacts the proxy before processing messages from other sites. If Si's proxy has itself disconnected, Si broadcasts messages to determine who the new proxy is.
2. With the help of the proxy, Si determines which updates must be applied to bring its database up to date and incorporates those updates.
3. Si retrieves its voting or acknowledgment rights from the proxy, releases the proxy from duty, and then resumes normal operations.
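The sign-off and sign-on steps can be sketched in a single process, assuming a ROWA-style rule in which an update commits only if every member, present or absent, is answered for by itself or by a proxy. Team, Site, answered_for and the shared update log are hypothetical simplifications of the message exchange. The proxy records the log position at sign-off as Si's “current state”, and a site that signs off while holding proxies hands them on to its own proxy. The last site to sign off keeps no proxy, and the “who was the last to sign off?” recovery problem is not modeled.

```python
class Site:
    def __init__(self, name, db):
        self.name = name
        self.db = dict(db)      # local replica; read-only while disconnected
        self.connected = True
        self.proxy = None       # site answering for us while we are away
        self.proxied = {}       # absent site name -> log position at its sign-off


class Team:
    """Hypothetical sketch of basic sign-off / sign-on with proxies."""

    def __init__(self, names, db):
        self.sites = {n: Site(n, db) for n in names}
        self.log = []           # committed updates, in commit order

    def answered_for(self):
        """Members represented in a vote: connected ones, or absent ones via a proxy."""
        return {s.name for s in self.sites.values()
                if s.connected or s.proxy is not None}

    def update(self, item, value):
        # ROWA-style rule (an assumption): every member must be answered for.
        if len(self.answered_for()) < len(self.sites):
            raise RuntimeError("a member disconnected without signing off")
        self.log.append((item, value))
        for s in self.sites.values():
            if s.connected:
                s.db[item] = value

    def sign_off(self, name):
        si = self.sites[name]
        peers = [s for s in self.sites.values() if s.connected and s is not si]
        if peers:                             # the last site to sign off needs no proxy
            sj = peers[0]                     # step 1: select a proxy
            si.proxy = sj                     # step 2: hand over voting rights
            sj.proxied[name] = len(self.log)  # step 3: record Si's current state
            for absent, pos in si.proxied.items():   # pass on proxies Si held
                sj.proxied[absent] = pos
                self.sites[absent].proxy = sj
            si.proxied.clear()
        si.connected = False

    def sign_on(self, name):
        si = self.sites[name]
        if si.proxy is not None:              # step 1: contact the proxy first
            start = si.proxy.proxied.pop(name)
            for item, value in self.log[start:]:     # step 2: apply missed updates
                si.db[item] = value
        si.proxy = None                       # step 3: retrieve voting rights
        si.connected = True


team = Team(["bob", "ann", "cy"], {"x": 0})
team.sign_off("bob")          # ann becomes Bob's proxy
team.update("x", 1)           # allowed: ann answers for Bob
print(team.sites["bob"].db)   # {'x': 0}
team.sign_on("bob")
print(team.sites["bob"].db)   # {'x': 1}
```

While Bob is away his copy stays at the snapshot taken at sign-off, which is exactly what lets his read-only transactions be serialized at the time of disconnection.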
If all other sites are disconnected and Si is the last to sign off, there is no other site for Si to give its proxy to. In this case, Si simply disconnects without assigning a proxy, since no one in the distributed database system will be performing updates, or anything else for that matter. A potential difficulty is that sites signing on may be unable to retrieve their proxies because the proxy holder has signed off. This problem of “who was the last to sign off?” is similar to the problem of “who was the last processor to fail?”, which has been explored in [22].

The basic sign-off protocol produces executions that are one-copy serializable. Briefly, since all of the transactions of the disconnected site are read-only, the values they read are those of a snapshot taken at the time of disconnection. All of the read-only transactions of the disconnected site can therefore be serialized at the time of disconnection.

4.2. Check-out mode

If the site Si wants to disconnect and be able to update a particular data object, it declares its intention to do so before disconnection and “checks out” or “takes” the object for writing. This might be accomplished by obtaining a lock on the item before disconnection. An object can only be checked out to one site at a time. To maintain serializability in check-out mode, some sites are prevented from accessing objects that do not “belong” to them. Since many database systems use two-phase locking, it makes sense to implement check-out mode using the existing locking mechanisms. The site that wishes to disconnect, say Si, acquires a write lock on the item or object it wants to update while disconnected. This write lock is like an ordinary write lock except that the “transaction” which holds it (if there is one) should not be aborted due to deadlock with ordinary transactions. The lock might be obtained via a transaction or through some other means. To distinguish these “transactions” from ordinary user transactions, we call them pseudo-transactions.

Assume that site Si wishes to disconnect and “check out” a set of items IL(Si). The disconnect procedure proceeds as follows:
1. Site Si selects a proxy as in basic sign-off mode and follows the same steps to handle voting rights.
2. At the same time, Si initiates a pseudo-transaction to obtain write locks on the items in IL(Si).
3. If the pseudo-transaction is successful, Si disconnects with update privileges on all the items in IL(Si). If the pseudo-transaction is not successful, Si tries again or disconnects with update rights on a subset of the items.
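The check-out steps can be sketched with a lock table in which pseudo-transaction locks are tagged so that deadlock resolution never chooses them as victims. This is a hypothetical single-process illustration: LockManager, check_out, check_in and may_abort_holder are invented names, check_out is all-or-nothing like the pseudo-transaction in step 2, and check_in anticipates the reconnect step, in which Si transmits its new values and releases the corresponding locks.

```python
class LockManager:
    """Hypothetical lock table for check-out pseudo-transactions."""

    def __init__(self, db):
        self.db = dict(db)
        self.write_locks = {}   # item -> holder; "pt:<site>" marks a pseudo-transaction

    def check_out(self, site, items):
        """Pseudo-transaction: write lock every item for `site`, or none of them."""
        if any(i in self.write_locks for i in items):
            return False        # some item is already locked (checked out)
        for i in items:
            self.write_locks[i] = f"pt:{site}"
        return True

    def may_abort_holder(self, item):
        """Deadlock resolution must never abort a pseudo-transaction."""
        return not self.write_locks.get(item, "").startswith("pt:")

    def check_in(self, site, new_values):
        """On reconnect: install the new values, then release the site's locks."""
        mine = {i for i, h in self.write_locks.items() if h == f"pt:{site}"}
        if not set(new_values) <= mine:
            raise ValueError("a disconnected site may only write items it checked out")
        self.db.update(new_values)
        for i in mine:
            del self.write_locks[i]


lm = LockManager({"X": 0, "Y": 0, "Z": 0})
print(lm.check_out("Si", ["X"]))        # True
print(lm.check_out("Sk", ["X", "Z"]))   # False
print(lm.check_out("Sk", ["Z"]))        # True
lm.check_in("Sk", {"Z": 3})
print(lm.write_locks)                   # {'X': 'pt:Si'}
```

The second call fails because X is already checked out to Si, and the third shows step 3: Sk retries with the subset of items that are still free.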
When Si reconnects, the reconnect procedure is the same as for basic sign-off, except that Si must complete the effects of its pseudo-transactions by transmitting the new values of all items in IL(Si) and releasing the corresponding locks. The preceding rules apply to all the variations of check-out mode. We now discuss the details of the variations: DB partition, check-out with mobile read, and check-out with system read.

DB partition is the most straightforward of the check-out modes. The site wishing to disconnect, Si, checks out the desired data items with pseudo-transactions and signs off. Si has read/write access to the checked-out items, IL(Si), and nothing else. The remaining connected sites in the system have read/write access to DB − IL(Si) and nothing else. Because the database is partitioned, the resulting executions are serializable.

Figure 7 shows the activity at three sites, Si, Sj and Sk, using check-out mode with mobile reads. Time proceeds from left to right, and an asterisk (*) indicates the disconnection and reconnection points in time. Xi indicates the version of data item X written by transaction i.

Figure 7. Check-out mode with mobile read.
  Si: *pt: wl-X & disconnect; t1: r(Y0) r(Z0) w(X1) c; t2: r(Y0) r(X1) w(X2) c; *reconnect & w(X2) c
  Sj: *pt: wl-Y & disconnect; t4: r(Y0) r(Z3) w(Y4) c; *reconnect & w(Y4) c
  Sk: t3: r(Z0) w(Z3) c

In figure 7, Si first acquires a write lock on X with a pseudo-transaction (pt) and disconnects. t1 and t2 are examples of transactions that may be executed at the disconnected site Si. Site Sj later disconnects with a write lock on Y and executes transactions that read Y and Z and update Y, e.g., t4. Site Sk remains connected and executes transaction t3. Notice that during its disconnection Si can execute transactions that read all other database items without having obtained read locks on those items before disconnecting. This is because all of Si's transactions read the versions of those data items that existed at disconnect time. To preserve correctness, it must be possible to serialize all of the transactions executed by Si during disconnection at the point in time of disconnection. This can be done if:
1. Only those items IL(Si) write locked by pseudo-transactions at disconnect time can be modified by Si during disconnect.
2. Those items write locked by pseudo-transactions at disconnect time can neither be read nor written by other sites (a consequence of holding the write lock), and the pseudo-transaction cannot be aborted in order to release the lock.
3. Items not write locked by pseudo-transactions at disconnect time are treated as read-only by Si during disconnect (unless they were locked by other transactions at the time of disconnection).
This guarantees serializability because each transaction, whether at a connected or disconnected site, respects two-phase locking. The disconnected site is required to respect locks held by ongoing transactions from other sites at the time of disconnect. Thus, in figure 7, t4 executes under the conditions that X is write locked by a pseudo-transaction and other items may be locked by transactions that were ongoing when site Sj disconnected. An equivalent, correct serial order for these transactions is t1, t2, t3, t4.
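The claimed serial order can be checked mechanically with a multiversion serialization graph, a standard tool from concurrency control theory rather than something the paper itself uses. The sketch below encodes figure 7 by recording, for each transaction, which transaction wrote each version it read (the hypothetical t0 stands for the writer of the initial versions), together with the order in which each item's versions were installed. As a simplifying assumption, the reconnection writes are left out because they only install values already written by t1, t2 and t4.

```python
from graphlib import CycleError, TopologicalSorter

INITIAL = "t0"   # hypothetical writer of the initial versions X0, Y0, Z0


def mvsg(reads, version_order):
    """Multiversion serialization graph as {transaction: set of successors}.

    reads: transaction -> {item: transaction that wrote the version it read}
    version_order: item -> writers of that item's versions, oldest first
    """
    edges = {t: set() for t in reads}
    for writers in version_order.values():
        for t in writers:
            if t != INITIAL:
                edges.setdefault(t, set())
    for ti, read_set in reads.items():
        for item, tj in read_set.items():
            if tj not in (INITIAL, ti):
                edges[tj].add(ti)          # ti read from tj
            order = version_order[item]
            for tk in order:               # every other writer of the item
                if tk in (INITIAL, ti, tj):
                    continue
                if order.index(tk) > order.index(tj):
                    edges[ti].add(tk)      # ti saw a version older than tk's
                else:
                    edges[tk].add(tj)      # tk's version precedes the one ti saw
    return edges


def serial_order(edges):
    """One equivalent serial order, or None if the graph has a cycle."""
    preds = {t: set() for t in edges}
    for a, succs in edges.items():
        for b in succs:
            preds[b].add(a)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None


def respects(order, edges):
    """True if a proposed serial order is consistent with every edge."""
    pos = {t: k for k, t in enumerate(order)}
    return all(pos[a] < pos[b] for a, succs in edges.items() for b in succs)


# Figure 7: what each transaction read, and each item's version order.
fig7_reads = {
    "t1": {"Y": "t0", "Z": "t0"},
    "t2": {"Y": "t0", "X": "t1"},
    "t3": {"Z": "t0"},
    "t4": {"Y": "t0", "Z": "t3"},
}
fig7_versions = {"X": ["t0", "t1", "t2"], "Y": ["t0", "t4"], "Z": ["t0", "t3"]}
edges7 = mvsg(fig7_reads, fig7_versions)
print(respects(["t1", "t2", "t3", "t4"], edges7))   # True
```

The graph only forces t1 first and t4 last, so t1, t3, t2, t4 is equally valid. Encoding figure 9 the same way, treating t5's browses of X0 and Y0 as reads of those versions, produces the cycle t1 → t3 → t5 → t1, and serial_order returns None.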
Check-out mode with mobile reads produces executions that are one-copy serializable. Briefly, write locks obtained by pseudo-transactions at disconnect time are held during the disconnection and released only after the new values are revealed at reconnection time. These locked items are the only data items modified by the transactions at the disconnected site, and these transactions are serialized with respect to each other by local two-phase locking. They are serialized with respect to the transactions of the rest of the system at the point in time of disconnection (this is necessary because they may have read versions of items that existed only at that time). From the point of view of the rest of the system, all of the actions of the disconnected site can be considered a single long transaction. Therefore, the algorithm is correct because two-phase locking is correct and the distributed database management system enforces global synchronization.

Let us explore in more detail what happens when several sites disconnect and reconnect. If more than one site tries to check out item X at the same time, all but one of the pseudo-transactions will fail because the item is already locked. Assume that after site Si disconnects and checks out X, another site Sk wants to disconnect and check out Z. Site Sk will observe that a pseudo-transaction from Si has a write lock on X. Site Sk now asks for and gets data item Z for check-out and disconnects. Si is unaffected, since its transactions treat Z as read-only and read the version of Z that existed when Si disconnected. If Sk reconnects first, it transmits its new value for Z and asks its proxy for any updates that it missed. It will see that X is still write locked by Si. When Si reconnects, it will not even know that Sk ever disconnected. Now consider what happens if they reconnect in the reverse order. When Si reconnects, it updates X and releases its lock on X by “committing” its pseudo-transaction. In processing the missed updates, it finds that Z is write locked by a pseudo-transaction from Sk. Sk then reconnects, transmits the new value of Z and, in processing its missed updates, finds that the pseudo-transaction that locked X has completed.

Check-out mode with system read is illustrated in figure 8. When Si and Sj disconnect, they can only access the items they have checked out. The remaining connected sites can read the old values of those checked-out items in addition to having full access to non-checked-out items.
Thus, t2 reads the old values of X and Y in addition to reading and writing Z. When t3 reads X and Y, it gets the value of X updated by the reconnection of Si; however, Sj is still disconnected, so the value of Y is the old value. The correct serial order of these transactions is t2, t1, t3, t4.

Figure 8. Check-out mode with system read.
  Si: *pt: wl-X & disconnect; t1: r(X0) w(X1) c; *reconnect & w(X1) c
  Sj: *pt: wl-Y & disconnect; t4: r(Y0) w(Y4) c; *reconnect & w(Y4) c
  Sk: t2: r(X0) r(Y0) r(Z0) w(Z2) c; t3: r(X1) r(Y0) w(Z3) c

This mode can be implemented by having the pseudo-transactions obtain read locks on the items to be checked out. It must be possible to convert these read locks to write locks when Si reconnects so that the new values can be written. To maintain correctness, it must be possible to serialize all of the transactions executed by Si during disconnection at the point in time of reconnection. This can be done if:
1. Only those items IL(Si) locked by pseudo-transactions at disconnect time can be modified by Si during disconnect.
2. Those items IL(Si) locked by pseudo-transactions at disconnect time can be read but not written by other sites (a consequence of holding the read lock), and the pseudo-transaction cannot be aborted in order to release the lock. When Si reconnects, it converts those read locks to write locks if it has updated the items, which means that ongoing transactions that have read IL(Si) must abort.
3. Items not locked by pseudo-transactions at disconnect time cannot be accessed by Si during disconnect.

Check-out mode with system reads produces executions that are one-copy serializable. The locked items I(Si) are the only data items modified by the transactions at the disconnected site Si, and these transactions are serialized with respect to each other by local two-phase locking. They are serialized with respect to the transactions of the rest of the system at the point in time of reconnection (rather than at the point of disconnection, as for check-out with mobile read).

4.3. Relaxed check-out mode

In relaxed check-out mode, all sites, whether connected or not, can read any database item. We can implement this by starting with check-out mode with mobile read and adding the capability for connected sites to access, in browse mode, the items that other sites have checked out. An item can only be checked out to one site at a time, and transactions from sites that have not checked it out are not allowed to wait for a write lock. According to Unland and Schlageter [26], a browse lock permits the holder of the lock to read in a dirty mode and does not guarantee consistent reading or permit writing.

Figure 9. Relaxed check-out mode using browse locks.
  Si: *pt: wl-X & disconnect; t1: r(X0) r(Z0) w(X1) c; t2: r(X1) w(X2) c; *reconnect & w(X2) c
  Sj: *pt: wl-Y & disconnect; t4: r(Y0) w(Y4) c; *reconnect & w(Y4) c
  Sk: t3: r(Z0) w(Z3) c; t5: b(X0) b(Y0) r(Z3) w(Z5) c

The compatibility matrix for read (r), write (w), and browse (b) locks according to [26] is (y = compatible, n = not compatible):

      r   w   b
  r   y   n   y
  w   n   n   y
  b   y   y   y
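The matrix and the way relaxed check-out uses it can be encoded in a few lines. The following Python sketch is a hypothetical illustration, not code from [26]: `COMPATIBLE` transcribes the matrix, each lock held on an item is assumed to be a (mode, held_by_pseudo_transaction) pair, and `request` applies the policy of this section for a transaction from a site that has not checked the item out. A read blocked by a pseudo-transaction's write lock falls back to a browse lock (the READ COMMITTED variant discussed below), a write blocked in the same way aborts instead of waiting, and any other conflict waits as usual.

```python
# Lock compatibility from the read/write/browse matrix (symmetric).
COMPATIBLE = {
    "r": {"r": True, "w": False, "b": True},
    "w": {"r": False, "w": False, "b": True},
    "b": {"r": True, "w": True, "b": True},
}


def grantable(requested: str, held: list) -> bool:
    """held: list of (mode, held_by_pseudo) pairs currently on the item."""
    return all(COMPATIBLE[requested][mode] for mode, _ in held)


def request(op: str, held: list) -> str:
    """Hypothetical relaxed check-out policy for a site that has not
    checked the item out. Returns 'r', 'w', 'b', 'wait' or 'abort'."""
    mode = "r" if op == "read" else "w"
    if grantable(mode, held):
        return mode
    blocked_by_pseudo = any(m == "w" and pseudo for m, pseudo in held)
    if op == "read" and blocked_by_pseudo:
        return "b"  # browse the old, committed value of a checked-out item
    if op == "write" and blocked_by_pseudo:
        return "abort"  # do not wait for a disconnected site to return
    return "wait"  # ordinary conflict with a connected transaction
```

Restricting the browse fallback to pseudo-transaction write locks is what keeps browsed values committed; allowing it on every write lock would correspond to READ UNCOMMITTED instead.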

If Si disconnects and wants to read and write X, why shouldn’t the transactions at the connected sites be able to access the “old” value of X during the disconnect? It is not really necessary to prevent transactions from reading X, since the value of X has not yet changed. The value of X changes only when the disconnected site that has taken X reconnects and sends out its new value. In relaxed check-out mode, we allow reads of the old value of X by letting transactions at other sites obtain a browse lock when they cannot obtain a read lock because the item is write-locked by a pseudo-transaction. In this way, other sites can see the old value of X, and Si can read and write X during the disconnection and then update the entire distributed database when it reconnects. Relaxed check-out mode allows some non-serializable executions but permits greater concurrency. Figure 9 shows an example execution using browse locks. In this example, transaction t5 at site Sk cannot get read locks on X or Y, so it gets browse locks on them. The non-serializable executions arise from the browse locks. In Figure 9, there is a dependency cycle involving t1, t3 and t5: t1 reads Z0 before t3 overwrites Z (t1 precedes t3), t5 reads the value Z3 written by t3 (t3 precedes t5), and t5 browses X0, which t1 overwrites (t5 precedes t1).

Relaxed check-out mode works as follows. We allow a site Si to disconnect and check out some items, IL(Si), with write locks, and also allow Si to read the items that were not locked. Other sites can choose to let their transactions get browse locks on IL(Si), realizing that they may be reading old data. The rules that must be followed are:
1. Only those items IL(Si) locked by pseudo-transactions at disconnect time can be modified by Si during disconnect.
2. Those items locked by pseudo-transactions at disconnect time can be browsed but not modified by other sites. Any transaction that browses an item in IL(Si) knows it has given up serializability. Transactions from other sites attempting to write those items, IL(Si), should be aborted rather than allowed to wait, since they might have to wait a very long time.
3. Items not locked by pseudo-transactions at disconnect time can be read by transactions executed at Si during disconnect.

The ANSI/ISO SQL-92 specification [2] defines four isolation levels for transactions: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. The relaxed check-out mode corresponds to level 2, READ COMMITTED (also called “cursor stability” [11]), if browse locks are used only on write locks held by pseudo-transactions. This is because browsing permits the reading of old data, and non-repeatable reads occur if the item is browsed again after the disconnected site reconnects. However, a browse will never return an uncommitted, and thus truly dirty, value. If browse locks are allowed on items that are write-locked by any transaction, pseudo or “real”, then the appropriate isolation level is level 1, READ UNCOMMITTED, also called “browse”.

4.4. Optimistic check-out

Optimistic check-out allows all sites to access all data items (unless locked locally), whether connected or not. The drawback is that conflicting updates have to be reconciled, and their transactions may be rolled back upon reconnection. Existing databases such as Oracle have developed application-dependent reconciliation rules for conflict resolution [18]. Additionally, there are meta-rule paradigms that specify which rules to apply, and in what order, if more than one rule is applicable. For example, Jagadish et al. [15] describe a meta-rule language. Using techniques like these, it should be possible to create an automated conflict resolution procedure.
Since a disconnecting site has the opportunity to check out database items, we can suggest rules based on that check-out that increase the ability of the system to integrate the updates automatically. Under these rules, optimistic check-out requires no pseudo-transactions to acquire locks; however, a disconnecting site, Si, can assist in the eventual reconciliation process by checking out certain parts of the database. The remaining connected sites must agree to the check-out, and items can be checked out to only one site at a time. When Si checks out the items IL(Si), the following rules apply:
1. A transaction at Si that accesses only those items IL(Si) checked out at disconnect time can be committed. When Si reconnects, its updates to IL(Si) will have priority over any updates made to IL(Si) by other sites.
2. A transaction at Si that accesses only those items DB − IL(Si) not checked out at disconnect time can be committed. When Si reconnects, any updates made to
DB − IL(Si) by other sites will have priority over the updates that Si made to DB − IL(Si).
3. A transaction at Si that accesses both items in DB − IL(Si) and items in IL(Si) can be committed only conditionally, with the understanding that the transaction may have to be rolled back and perhaps redone.
4. Transactions at the remaining connected sites are similarly constrained during a disconnection.

Optimistic check-out mode may or may not be serializable, depending on the strictness of the reconciliation and the ability of the application to undo transactions.

5. Implementation in an asynchronous system

Traditional synchronous replica management techniques require the individual read and write operations to be executed synchronously on some set of the copies before transaction commit (this is called eager update). For example, a read-one write-all locking protocol would execute each read operation by acquiring a read lock on one copy, while write operations are executed by obtaining write locks on all copies. After all the operations have been executed on all sites, the transaction is committed using two-phase commit. Similarly, the quorum locking approach [9] requires reads to obtain locks on a read quorum of copies and writes to obtain locks on a write quorum of copies, where the quorums are chosen so that every write quorum intersects every read quorum and every other write quorum. The implementation of our disconnection modes in a synchronous system with eager update is fairly straightforward, since the pseudo-transactions simply obtain locks in the usual manner. However, such a system is not practical on a distributed network experiencing congestion and delays. There is, therefore, increasing interest in asynchronous replica management protocols, in which database transactions are executed locally and the effects of these transactions are incorporated asynchronously on remote database copies. We now develop a solution for an asynchronous “epidemic” system.
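The quorum condition for eager update can be checked mechanically. The Python sketch below assumes n copies with one vote each (Gifford's scheme [9] also allows unequal weights); `sizes_are_valid` tests the usual conditions r + w > n and 2w > n, and `quorums_intersect` confirms them by brute force over all quorums, which is only practical for small n. Both function names are hypothetical. Read-one write-all is the special case r = 1, w = n.

```python
from itertools import combinations


def sizes_are_valid(n: int, r: int, w: int) -> bool:
    """Quorum-size conditions for n copies with one vote each."""
    return 1 <= r <= n and 1 <= w <= n and r + w > n and 2 * w > n


def quorums_intersect(n: int, r: int, w: int) -> bool:
    """Brute force: every read quorum meets every write quorum,
    and every pair of write quorums overlaps."""
    copies = range(n)
    reads = [set(q) for q in combinations(copies, r)]
    writes = [set(q) for q in combinations(copies, w)]
    rw_ok = all(rq & wq for rq in reads for wq in writes)
    ww_ok = all(a & b for a in writes for b in writes)
    return rw_ok and ww_ok


if __name__ == "__main__":
    for r, w in [(1, 5), (3, 3), (2, 4), (2, 3)]:
        print(r, w, sizes_are_valid(5, r, w), quorums_intersect(5, r, w))
```

With n = 5, the pairs (1, 5), (3, 3) and (2, 4) pass both checks, while (2, 3) fails both, because a read quorum and a write quorum can then be disjoint.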
As an example, we use an epidemic algorithm for replica update that allows update-anywhere capability and maintains one-copy serializability in the system [1,24]. We use the epidemic style of communication [16,19,25] because epidemic communication does not require the members of the distributed system to be continuously connected, as traditional replica management systems do. The members can individually propose updates to the system state and communicate these updates to the other members using epidemic communication. That is, periodically a member decides to contact another member and exchange information. Through this kind of pairwise communication, knowledge of the proposed updates spreads throughout the system. Since members connect for a short time to exchange epidemic messages and then disconnect, this model seems well suited for supporting users in mobile and disconnected environments.

5.1. Epidemic-based distributed database management

In an epidemic-based transactional system [1], a transaction t executes locally under a local concurrency control mechanism, such as two-phase locking. At the point of commit, a precommit record is written to an event log. That record is propagated to other system members using epidemic communication. As other members become aware of t, they perform the necessary synchronization and communicate their actions to each other via epidemic messages. Each member maintains a vector clock that captures the causal order of events (transaction precommits). In addition, each member Si keeps a two-dimensional timetable Ti (sometimes called a matrix clock), as defined by Wuu and Bernstein [28], which corresponds to Si’s most recent knowledge of the events at all members. This timing information is included in all epidemic messages. The timetable ensures the following timetable property: if Ti[k, j] = v, then Si knows that Sk has received the records of all events at Sj up to time v (which is a value of Sj’s local clock). The precommit record for a transaction from Si that is stored in the event log contains the readset, the writeset, the values written, and a precommit vector timestamp, which is the ith row of Si’s timetable, Ti. The timing information is used to determine when a given site is aware of a set of events. Based on this information, transactions can be committed or aborted, and records can be garbage collected from the event log.

5.2. Basic sign-off

We provide a sign-off and sign-on procedure for the basic sign-off mode of this protocol. Assume that team member Si wants to disconnect and wants some other team member to act as its proxy during the disconnection. Si, preparing to disconnect, stops generating new transactions and contacts another member, Sj, to be its proxy. During the disconnection, Si can only read its local copy of the data. Si signs off and disconnects with the following disconnect dialog:
1. Si contacts Sj, indicates that it is leaving, and requests Sj to act as its proxy. Si stops accepting epidemic messages from any member other than Sj.
Sj stops accepting epidemic messages from any member other than Si and does not precommit any local transactions during the disconnect dialog.
2. Sj sends Si all of the event log (precommit) records that Sj believes Si has not seen, together with its timing information. This ensures that Si knows everything Sj knows before it disconnects.
3. Si sends Sj its event log records and timing information. This ensures that Sj has an accurate picture of Si’s state, so that Si can be brought back up to date when it reconnects. When Si has sent this message and knows it has been received, Si disconnects.
4. When Sj has received and processed this information, it marks its event log with an identifier for Si to indicate that no record after this point may be garbage collected, since these records are needed to bring Si up to date when it reconnects.
5. Sj resumes sending and receiving epidemic messages and acts as a proxy for Si.

Transaction processing during the disconnect proceeds as follows. When processing a precommit record from an incoming epidemic message, the proxy Sj checks its event log for conflicts and marks the record aborted if there are conflicts. If the vector timestamps of two records are incomparable, the transactions are considered concurrent, and thus their read and write sets must be checked for conflicts. If there are no conflicts, Sj executes the transaction locally as in normal processing. Sj updates its timetable as follows. Let Tj be Sj’s timetable, and suppose the update originated at member Sk and has time tk. Then Tj[j, k] is set to tk, and Tj[i, k] is also set to tk. This last assignment means that Si is aware of the record from Sk bearing time tk. Si is actually disconnected and unaware of the update; however, Sj holds Si’s proxy and is acknowledging updates on behalf of Si. Other members will believe that Si knows about their updates and has acknowledged them, so the updates can be committed or aborted and garbage collected from their logs. Sj can also commit or abort the transactions, but it must keep the updates in its event log rather than garbage collecting them. Sj finishes processing the incoming record by adding a precommit record to its event log, thus precommitting the remote transaction. Sj handles the other records similarly and processes the received timetable information as usual, except that it does not garbage collect log records. When Sj precommits a local transaction, it increments its local clock Tj[j, j], gives the transaction the timestamp Tj[j, ∗] (that is, the jth row of Tj), and writes a precommit record to its log. While Sj holds Si’s proxy, it must also change its timetable to indicate that Si knows about the transaction, by setting Tj[i, j] to the new value of Sj’s clock Tj[j, j].
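The proxy bookkeeping described above can be sketched in Python. This is a minimal illustration, not the implementation of [1]: members are numbered 0 to n − 1, a timetable is a list of integer rows, and the class and method names are hypothetical. A set of represented members is kept because, as noted below, one member can hold proxies for several others. `may_garbage_collect` is the usual Wuu and Bernstein condition (every member is known to have the record), suspended while any proxy is held; conflict checking and the event log itself are omitted.

```python
class ProxyTimetable:
    """Hypothetical sketch of the timetable kept by member `me`.

    T[k][m] == v means: `me` knows that member k has received the records
    of all events at member m up to m's local time v.
    """

    def __init__(self, n: int, me: int):
        self.n = n
        self.me = me
        self.T = [[0] * n for _ in range(n)]
        self.represented = set()  # members whose proxy this member holds

    def accept_proxy(self, i: int) -> None:
        # End of the disconnect dialog: start acting for member i.
        self.represented.add(i)

    def on_remote_record(self, k: int, tk: int) -> None:
        # Process a precommit record that originated at member k with time tk.
        self.T[self.me][k] = tk
        for i in self.represented:
            self.T[i][k] = tk  # acknowledge on behalf of the disconnected member

    def on_local_precommit(self) -> list:
        # Increment the local clock and return row `me` as the timestamp.
        self.T[self.me][self.me] += 1
        for i in self.represented:
            self.T[i][self.me] = self.T[self.me][self.me]
        return list(self.T[self.me])

    def has_record(self, k: int, origin: int, t: int) -> bool:
        # Timetable property: k is known to hold origin's events up to time t.
        return self.T[k][origin] >= t

    def may_garbage_collect(self, origin: int, t: int) -> bool:
        # A proxy keeps its log so that represented members can catch up.
        if self.represented:
            return False
        return all(self.has_record(k, origin, t) for k in range(self.n))
```

Because the proxy writes Si's row of its own timetable, other members see Si's acknowledgements through ordinary epidemic exchanges, which is why they can commit and discard records while Si is away.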
When Si reconnects, it contacts Sj, gets the updates that it missed along with its new log and timetable, and retrieves its voting capability. The reconnect dialog is:
1. When Si reconnects, it contacts the member to which it gave its proxy, Sj, or sends out a query, “Who is acting as Si?”. The member acting as Si will respond, and the real Si can begin the reconnect dialog to retrieve its voting rights.
2. Sj stops processing transactions and incoming epidemic records and sends its entire event log to Si, followed by its timetable, Tj. Sj can now garbage collect its event log and proceed with normal operations, as it has given Si back its proxy.
3. Si processes the message in the same manner as it would any other epidemic message. Sj’s timetable Tj becomes Si’s new timetable. Si’s log and timetable are now up to date, and it begins accepting epidemic messages from other members and handling local read-only and update transactions.

Using this procedure, a member can accept proxies for several other members at once. The team member that is acting
as a proxy for other team members does not have to maintain separate logs and timetables for each such member. Rather, it must remember which members it is representing and perform a small amount of extra processing. The event log of the member with the proxy can grow large, so it needs to have the option of requiring a complete database copy upon reconnect.
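The reconnect dialog, including a member that represents several others, can be sketched in Python under simplifying assumptions: the event log is an in-memory list of (origin, time, writes) tuples, processing a record simply applies its writes, and voting, conflict checks and messaging are omitted. `Member`, `reconnect` and the other names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    ident: int
    n: int
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)        # (origin, time, writes) tuples
    represented: set = field(default_factory=set)  # members whose proxy we hold
    T: list = field(default_factory=list)          # n x n timetable

    def __post_init__(self):
        if not self.T:
            self.T = [[0] * self.n for _ in range(self.n)]

    def acting_as(self, i: int) -> bool:
        # Answer to the query "Who is acting as Si?"
        return i in self.represented

    def apply_record(self, origin: int, time: int, writes: dict) -> None:
        # Process a record as for any epidemic message; skip ones already seen.
        if time <= self.T[self.ident][origin]:
            return
        self.data.update(writes)
        self.log.append((origin, time, writes))
        self.T[self.ident][origin] = time

    def hand_back_proxy(self, i: int):
        # Reconnect step 2: send the whole log, then a copy of the timetable.
        self.represented.discard(i)
        return list(self.log), [row[:] for row in self.T]

    def may_garbage_collect(self) -> bool:
        # The log must be kept while any proxy is still held.
        return not self.represented


def reconnect(si: Member, members: list) -> Member:
    # Reconnect step 1: locate the member acting as si.
    proxy = next((m for m in members if m.acting_as(si.ident)), None)
    if proxy is None:
        raise LookupError(f"no member is acting as S{si.ident}")
    log, tj = proxy.hand_back_proxy(si.ident)
    # Reconnect step 3: process the log like any epidemic message, adopt Tj.
    for origin, time, writes in log:
        si.apply_record(origin, time, writes)
    si.T = tj
    return proxy
```

A single set of represented members, rather than one log and timetable per member, reflects the text's point that the proxy only needs to remember whom it represents; it can resume garbage collection once that set is empty.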

5.3. Check-out mode

The three variations of check-out mode have similar implementations in the epidemic system, so we describe only check-out mode with mobile reads. In this mode, Si wants to disconnect and take possession of some data. That is, Si wants more than a read-only copy of the database while it is disconnected: it wants to be able to read and write data items in IL(Si) and to read other items in the database. Other sites will not be able to read or write items in IL(Si). To accomplish this, we make use of the fact that, according to the epidemic protocol, every transaction precommit record attempting to update an item will abort if there is a concurrent conflicting transaction. We need only begin a transaction and acquire the necessary locks, but not complete the transaction while Si is disconnected. This special type of transaction is a pseudo-transaction. Before the disconnect dialog, Si precommits and sends out a record for a pseudo-transaction with IL(Si) as its writeset and with a timestamp containing a distinguished value, say ∞, for Si’s clock. Thus, the timestamp is Ti[i, ∗] except that Ti[i, i] = ∞. When sites receive this update record, they process it as they would any epidemic update record, acquiring write locks on behalf of the pseudo-transaction, except that any timestamp is incomparable with one containing ∞, and they do not update their timetables with this value. As a result, this record remains in their event logs, is not garbage collected, and causes the abort of any transaction that reads or updates items in IL(Si) (except an update from Si when Si reconnects). The database could note which data items are restricted in this way and so prevent transactions from attempting to read or update those items before precommit.

The pseudo-transaction update record for IL(Si) that was sent out by Si will eventually become known by all other sites. When this information gets back to Si, Si will know whether there were conflicts; if so, the pseudo-transaction indicated by the record failed and must abort. In that case, Si failed to get the items and can either try again or disconnect without them. If there were no conflicts, the pseudo-transaction was successful and Si can take possession of the data items in IL(Si) when it disconnects. Si can then begin the disconnect dialog and disconnect with the use of the data items that it successfully acquired. When Si reconnects, it sends out an epidemic message with IL(Si) as its writeset and the current values of those items. This record has a normal timestamp. Other sites compare it to records in their logs as usual, looking for conflicts. Since its site of origin is the same as the site of the ∞ value in the conflicting record, it is allowed to supersede that log record, making it possible to update the items in IL(Si) again. Thus, when a site Si disconnects, it can acquire write locks through the use of pseudo-transactions so that the locked items can be updated during disconnection. The pseudo-transactions are completed with a second record upon reconnect that automatically updates the items.

5.4. Relaxed check-out mode

As in check-out mode, when Si wants to disconnect, Si precommits and sends out a pseudo-transaction record with IL(Si) as its writeset and with a timestamp containing ∞ for Si’s clock. So, as in check-out mode, the timestamp is Ti[i, ∗] except that Ti[i, i] = ∞. When it reconnects, Si also sends out an epidemic message with IL(Si) as its writeset. This is recognized by the other sites, and those items become available again. In relaxed check-out mode, transactions now have the option to acquire browse locks if an item is write-locked by a pseudo-transaction. Since a transaction does all of its reads locally, the transaction and its home site can decide to acquire a browse lock when it wants to read an item that is write-locked by a pseudo-transaction. When the transaction that used the browse capability precommits, the browsed item is not included in its readset, since including it would cause the record to be detected as potentially causing a non-serializable execution, and the transaction would be aborted. In this way, Si can acquire write locks and update items during disconnection, while other sites are allowed to browse the checked-out items. The execution in Figure 9 is possible in such a system.
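The timestamp rule for pseudo-transaction records in Sections 5.3 and 5.4 can be sketched in Python. This is a hypothetical illustration under simplifying assumptions, not the protocol of [1]: vector timestamps are lists of integers, the distinguished value ∞ is `math.inf`, and each log record carries its origin, timestamp, readset and writeset. It shows only how an ∞ timestamp makes a pseudo-transaction concurrent with every record, so that conflicting records abort, how the reconnect record from the same site supersedes it, and how relaxed check-out turns a blocked read into a browse.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    origin: int      # index of the site that precommitted the record
    ts: list         # vector timestamp; math.inf marks a pseudo-transaction
    readset: set
    writeset: set

    @property
    def is_pseudo(self) -> bool:
        return math.inf in self.ts


def happened_before(a: list, b: list) -> bool:
    # Vector-clock order; a timestamp containing infinity is incomparable.
    if math.inf in a or math.inf in b:
        return False
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: Record, b: Record) -> bool:
    return not happened_before(a.ts, b.ts) and not happened_before(b.ts, a.ts)


def conflicts(a: Record, b: Record) -> bool:
    return bool(a.writeset & (b.readset | b.writeset) or b.writeset & a.readset)


def must_abort(new: Record, log: list) -> bool:
    """True if `new` conflicts with a concurrent record still in the log.
    A normal record from the site that issued a pseudo-transaction
    supersedes that pseudo-transaction instead of conflicting with it."""
    for old in log:
        if old.is_pseudo and old.origin == new.origin and not new.is_pseudo:
            continue
        if concurrent(new, old) and conflicts(new, old):
            return True
    return False


def read_mode(item: str, log: list, relaxed: bool) -> str:
    # Mobile-read check-out blocks reads of checked-out items; relaxed
    # check-out browses them, and browsed items stay out of the readset.
    if any(r.is_pseudo and item in r.writeset for r in log):
        return "browse" if relaxed else "blocked"
    return "read"
```

Keeping browsed items out of the readset is what lets a transaction such as t5 in Figure 9 precommit; if X were listed in its readset, `must_abort` would report a conflict with the pseudo-transaction record.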

6. Related work

Network partitioning is a separate but related issue to disconnected operation. “Partitioning” is a communication failure in which the network fragments into isolated subnets called partitions. If this situation is not adequately dealt with, uncoordinated updates may be applied to different copies of the database. This situation differs from disconnected operation in that, in disconnected operation, there are no subnets or non-primary partitions, just single isolated sites. Also, in the case of planned disconnection, the disconnecting site has the opportunity to inform the system that it is leaving. In [6], Davidson et al. describe the problems that network partitioning introduces for replicated databases, along with several solutions. Strategies for handling partitions can be classified as optimistic or pessimistic. Optimistic strategies allow transactions to run in partitions and then use version vectors, timestamps, or precedence graphs to resolve inconsistencies when the partitions merge. By checking out data items, our protocols, which are essentially pessimistic, predefine the rules used to resolve inconsistencies upon reconnect. Pessimistic strategies include Primary Copy and Tokens, which restrict which sites can process updates, and quorum/voting protocols such as Missing Writes, Accessible Copies, and Class Conflict Analysis. In Missing Writes, the database sites have two operational modes: normal and failure. In normal mode, transactions follow a “read-one-write-all” protocol (that is, a read
operation can read any one copy of the desired data item and a write operation must write to all copies). When a failure is detected, sites switch to failure mode and transactions must acquire quorums in order to read and write. The Accessible Copies protocol [7] requires transactions to use “read-one-write-all” with the additional restriction that data items are accessible only if the current partition contains a majority of the copies of that item. Our basic sign-off mode is an extension of the Accessible Copies protocol. Class Conflict Analysis [23] attempts to identify classes of transactions that, because of the semantics of the application or the properties of certain sites, should be allowed to execute in a partitioned environment. When a disconnecting site checks out data items, it effectively defines a transaction class consisting of the transactions that access those items. The Coda distributed file system was designed to deal with disconnected clients [17]. Files are replicated at the servers as first-class copies and, if needed, cached at clients as second-class copies. Coda deals with entire files, detects only write-write conflicts, and, not being designed for a high-concurrency environment, uses an optimistic replication scheme. Thus, it is very different from a distributed database, which deals with atomic transactions and serializability. However, many of the lessons learned from the Coda system are valuable for our distributed database. One lesson was that when users have portable workstations, they often identify and download files of interest to work on while disconnected, then copy back the modified files upon reconnect. This demonstrates that users are adept at identifying and obtaining what they will need while disconnected. Another lesson was to off-load as much processing as possible from the servers to the clients to improve scalability.
Although we do not have clients and servers in our database, it seems reasonable to put the burden of disconnect processing on the disconnecting site when possible. In Coda, voluntary disconnect is treated the same as involuntary disconnect or failure, and the replication strategy is optimistic. Our check-out ability in voluntary disconnect facilitates our pessimistic replication strategy. Pitoura and Bhargava [20] have proposed a distributed database system in which a database is replicated over static hosts and mobile or portable computers. Sites are organized into clusters of strongly connected (high-bandwidth, low-latency) groups, which usually consist of the static hosts. A disconnected site is a cluster of one. Database copies are classified as core or quasi copies, which essentially correspond to first-class and second-class copies. To support disconnected operation, they offer weak transactions, which consist of weak operations and lead to conditional updates, while strict operations and strict transactions preserve serializability. Gray et al. [10] propose a two-tiered replication scheme for databases with base nodes and mobile nodes. Tentative transactions from mobile nodes are designed to commute with other transactions to improve their chances of committing when reconciliation occurs upon reconnect. Our check-out ability gives transactions an even greater chance of committing.

7. Discussion

Mobile computers have added a lot of flexibility to distributed systems. However, distributed databases have traditionally not been very flexible because of the need to synchronize all transactions at all sites. The three proposed modes increase flexibility and will allow distributed databases to take advantage of mobility and to be used in new ways. Basic sign-off mode allows a system that normally requires complete synchronization to continue to function even though a site has disconnected, something that would otherwise be treated as a failure. Our check-out mode allows independent update of a portion of the database while guaranteeing a smooth reintegration with the distributed database. Relaxed check-out mode increases the concurrency of check-out mode at the cost of perfect transaction isolation, but the degree of isolation is more than sufficient for most applications. Our epidemic-based asynchronous implementation is particularly suitable for mobile teams, since it does not require all sites to be connected together at any given time. This implementation has been extended in [14] to allow transactions to commit with the approval of only a quorum of sites rather than requiring the votes of all sites, thus giving the protocol additional flexibility in a mobile environment.

Acknowledgements

This research was partially supported by the NSF under grant Nos. EIA98-18320, CCR97-12108, IIS98-17432 and IIS9970700. Dr. Holliday is the Clare Boothe Luce Professor of Computer Engineering at Santa Clara University.

References

[1] D. Agrawal, A. El Abbadi and R. Steinke, Epidemic algorithms in replicated databases, in: Proceedings of the ACM Symposium on Principles of Database Systems (May 1997) pp. 161–172.
[2] ANSI X3.135-1992, American National Standard for Information Systems – Database Language – SQL (November 1992).
[3] P.A. Bernstein, V. Hadzilacos and N. Goodman, Concurrency Control and Recovery in Database Systems (Addison-Wesley, Reading, MA, 1987).
[4] P.A. Bernstein and E. Newcomer, Principles of Transaction Processing (Morgan Kaufmann, 1997).
[5] M.J. Blin, J. Lisicki and I.G. Puddy, Improving configuration management for complex open systems, ICL Systems Journal 10(1) (May 1995).
[6] S. Davidson, H. Garcia-Molina and D. Skeen, Consistency in partitioned networks, ACM Computing Surveys 17(3) (1985) 341–370.
[7] A. El Abbadi, D. Skeen and F. Cristian, An efficient fault-tolerant algorithm for replicated data management, in: Proceedings, 5th SIGACT-SIGMOD Symposium on Principles of Database Systems (March 1985) pp. 215–229.
[8] M. Faiz and A. Zaslavsky, Database replica management strategies in multidatabase systems with mobile hosts, in: Proceedings of the 6th International Hong Kong Computer Society Database Workshop (March 1995).
[9] D.K. Gifford, Weighted voting for replicated data, in: Proceedings of the Seventh ACM Symposium on Operating Systems Principles (December 1979) pp. 150–159.
[10] J. Gray, P. Helland, P. O’Neil and D. Shasha, The dangers of replication and a solution, in: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (June 1996) pp. 173–182.
[11] J. Gray and A. Reuter, Transaction Processing: Concepts and Techniques (Morgan Kaufmann, 1993).
[12] J. Holliday, D. Agrawal and A. El Abbadi, Exploiting planned disconnections in mobile environments, in: Proceedings, 10th IEEE Workshop on Research Issues in Data Engineering (RIDE2000) (February 2000) pp. 25–29.
[13] J. Holliday, D. Agrawal and A. El Abbadi, Planned disconnections for mobile databases, in: Proceedings, 11th IEEE Workshop on Database and Expert Systems (DEXA 2000) (September 2000) pp. 165–169.
[14] J. Holliday, R. Steinke, D. Agrawal and A. El Abbadi, Epidemic quorums for managing replicated data, in: Proceedings of the 19th IEEE International Performance, Computing, and Communications Conference (IPCCC 2000) Phoenix, AZ (February 2000) pp. 93–100.
[15] H.V. Jagadish, A.O. Mendelzon and I.S. Mumick, Managing conflicts between rules, in: Proceedings of the 1996 ACM Symposium on Principles of Database Systems (1996) pp. 192–201.
[16] P. Keleher, Decentralized replicated-object protocols, in: Proceedings of the 18th ACM Symposium on Principles of Distributed Computing (April 1999) pp. 143–151.
[17] J.J. Kistler and M. Satyanarayanan, Disconnected operation in the Coda file system, ACM Transactions on Computer Systems 10(1) (1992) 3–25.
[18] Oracle, Oracle7 server distributed systems: Replicated data, http://www.oracle.com/products/oracle7/server/whitepapers/replication/html/index
[19] K. Petersen, M. Spreitzer, D.B. Terry, M.M. Theimer and A.J. Demers, Flexible update propagation for weakly consistent replication, in: Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles (1997) pp. 288–301.
[20] E. Pitoura and B.K. Bhargava, Data consistency in intermittently connected distributed systems, IEEE Transactions on Knowledge and Data Engineering 11(6) (1999) 896–915.
[21] P. Reiher, J.S. Heidemann, D. Ratner, G. Skinner and G.J. Popek, Resolving file conflicts in the Ficus file system, in: Proceedings, Summer USENIX Conf. (June 1994) pp. 183–195.
[22] D. Skeen, Determining the last process to fail, ACM Transactions on Computer Systems 3(1) (February 1985).
[23] D. Skeen and D. Wright, Increasing availability in partitioned networks, in: Proceedings, 3rd ACM SIGACT-SIGMOD Symposium on Principles of Database Systems (April 1984) pp. 290–299.
[24] R.C. Steinke, Epidemic transactions for replicated databases, Master’s thesis, University of California at Santa Barbara, Department of Computer Science, UCSB, Santa Barbara, CA (1997).
[25] D.B. Terry, M.M. Theimer, K. Petersen, A.J. Demers, M. Spreitzer and C.H. Hauser, Managing update conflicts in Bayou, a weakly connected replicated storage system, in: Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles (1995) pp. 172–183.
[26] R. Unland and G. Schlageter, A transaction manager development facility for non standard database systems, in: Database Transaction Models for Advanced Applications, ed. A.K. Elmagarmid (Morgan Kaufmann, 1992) pp. 400–466.
[27] D. Walborn and P.K. Chrysanthis, Pro-motion: Management of mobile transactions, in: Proceedings of the 11th ACM Symposium on Applied Computing (1997) pp. 389–398.
[28] G.T. Wuu and A.J. Bernstein, Efficient solutions to the replicated log and dictionary problems, in: Proceedings of the Third ACM Symposium on Principles of Distributed Computing (August 1984) pp. 233–242.

JoAnne Holliday is the Clare Boothe Luce Assistant Professor of Computer Engineering at Santa Clara University. Ms. Holliday received her bachelor’s degree in physics from the University of California at Berkeley in 1971 and her master’s degree in industrial engineering from Northeastern University in 1976. She worked as a computer programmer/analyst for corporations such as Raytheon, Honeywell and Unisys before deciding to return to academia. She received her PhD from the University of California at Santa Barbara in August 2000. Her research interests are in the area of distributed databases and innovative uses of the Internet. WWW: http://www.cse.scu.edu/~jholliday

Amr El Abbadi received his undergraduate degree in computer science from the Faculty of Engineering at Alexandria University and his Ph.D. in computer science from Cornell University. In 1987 he joined the Department of Computer Science at the University of California, Santa Barbara, where he is currently a Professor. He is currently an editor of Information Processing Letters (IPL). He was Vice Chair of the 1999 International Conference on Distributed Computing Systems, Vice Chair for the International Conference on Data Engineering 2002, and the Americas Program Chair for the 2000 International Conference on Very Large Data Bases (VLDB). WWW: http://www.cs.ucsb.edu/~amr

Divyakant Agrawal is a Professor of Computer Science at the University of California at Santa Barbara. He received his Ph.D. from State University of New York at Stony Brook. His research interests are in the area of distributed systems, databases, multimedia information storage and retrieval, digital libraries, and Web based technologies for building large-scale information systems. WWW: http://www.cs.ucsb.edu/~agrawal