Exploiting Planned Disconnections in Mobile Environments

JoAnne Holliday, Divyakant Agrawal, Amr El Abbadi
Department of Computer Science
University of California at Santa Barbara
Santa Barbara, CA 93106
fjoanne46,agrawal, [email protected]

Abstract

We present the notion of a distributed database made up entirely of mobile components. Since disconnections will be frequent in such an environment, we develop a disconnection and reconnection procedure to allow normal processing on the connected components. We briefly discuss a protocol based on epidemic communication to support such a system while ensuring one-copy serializability.
1 Introduction

Mobile computers and wireless networks are now being integrated into a variety of enterprises for different applications. The prevailing mode of operation, with occasional disconnection by a single user, will rapidly evolve into a situation where many if not all users are disconnecting and reconnecting in networks that are created in an ad hoc manner, e.g., a wireless network in a meeting room. This will result in mobile computers being integrated as first-class entities in distributed information systems. Such mobile computers will inevitably contain data and information that will need to be shared with other mobile units. We therefore anticipate the need for supporting distributed databases requiring transactional one-copy serializability [2] and made up entirely of mobile components.

Consider a scenario in which the components of the distributed database are laptop computers belonging to members of a small team. The team consists of three members, Alice, Bob, and Charles, normally located in Santa Barbara. The current project involving these three team members requires that they maintain a database which is replicated at each of their laptop computers. Normal operations ensure that all team members are completely synchronized with each other's progress. Now consider the situation when Alice needs to go away to do a system demonstration for her clients in New York. In order to do so, she would like to carry a read-only copy of the database but allow Bob and Charles the capability to continue their work. Subsequently, Alice runs into some design modifications and therefore asks Charles to join her in New York. Charles carries a copy of the database on his laptop but empowers Bob to continue to work on the system. After new requirements are identified, Alice and Charles summon Bob to New York to approve the changes and close the deal. As soon as Bob arrives in New York, the three team members form an ad hoc wireless network and continue their development work as they were doing in California. The above scenario, which currently seems far-fetched, will soon become a standard modus operandi.

Based on the above scenario, we propose in this paper a notion of mobile distributed databases that need to operate under the following conditions:
Members of the distributed system frequently disconnect. While they are disconnected, their copy of the database becomes read-only.
Members will need to gather and form an ad hoc network in different geographical locations.
Database consistency and transaction serializability are important.
The distributed system must be able to process database updates even though some of the team members are not available.
It is not desirable to designate one of the members as the primary site that must always be present when a network is formed; all members should therefore be equal participants.
The protocol we propose to support such a mobile system uses the epidemic style of communication [12, 9], since epidemic communication does not require the distributed system members to be continuously connected, as traditional replica management systems do. The members can individually propose updates to the system state and communicate these updates to the other members using epidemic communication. That is, periodically a member decides to contact another member and exchange information. Through this kind of pair-wise communication, the knowledge of the proposed updates spreads like an epidemic throughout the system. Thus the members connect for a short time to exchange epidemic messages, then disconnect. This model therefore seems well suited for supporting users in mobile and disconnected environments. By using the epidemic model to propagate updates, the knowledge of these updates eventually reaches all members of the team.

Although the asynchronous nature of epidemic communication is suitable for mobile environments, it still runs a risk of large delays in situations where the participation of all members is required. We intend to use the notion of planned disconnection to address this problem [5]. It is reasonable for computers in a distributed database to go through a sign-off procedure before disconnecting to tell the remaining members not to expect acknowledgments or votes on the disposition of shared data. The need to sign off before “pulling the plug” should not be strange to any modern computer user. Most operating systems (Microsoft Windows, Linux, SunOS, etc.) require a shut-down process before the computer can be turned off. Ignoring this will complicate things when the power is restored. Similarly, we can define a sign-off procedure to end a normal connection session. If it is ignored, the reconnect procedure is complicated.

In this paper we propose a way for the members of a mobile team to disconnect for a long period of time with minimal disruption to the normal database processing by using an epidemic update protocol with sign-off, disconnect, and reconnect procedures. Some protocols allowing disconnection have been proposed [6, 9, 5], but we are aware of none that provides transactional semantics and guarantees full serializability in a write-anywhere environment.
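The way knowledge spreads through pair-wise exchanges can be illustrated with a small simulation. The following Python sketch is an illustration only and is not the protocol of this paper: the function name `gossip_rounds`, the uniform random choice of partners, and the use of a plain set of update names per member are all assumptions made for the example.

```python
import random


def gossip_rounds(num_members, seed=0):
    """Count pair-wise exchanges until every member knows every update.

    Hypothetical model: member i starts out knowing only its own
    proposed update "u<i>"; partners are chosen uniformly at random.
    """
    rng = random.Random(seed)
    known = [{f"u{i}"} for i in range(num_members)]
    everything = set().union(*known)
    exchanges = 0
    while any(k != everything for k in known):
        # Two distinct members connect briefly and swap what they know.
        a, b = rng.sample(range(num_members), 2)
        merged = known[a] | known[b]
        known[a], known[b] = set(merged), set(merged)
        exchanges += 1
    return exchanges


if __name__ == "__main__":
    print(gossip_rounds(3))  # number of exchanges depends on the seed
```

With three members, at least three exchanges are needed before everyone knows all three updates, which hints at the delays that arise when the participation of every member is required.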
2 Mobile Distributed Database Management

The database exists on several mobile computers forming a mobile distributed database where data items are partially or completely replicated. We refer to the persons on the team as team members and their computers as system members or simply members. We do not require any of the members to be stationary or always connected to the network. Also, each of the members is fully capable of generating transactions (both read-only and update), although, when disconnected, a member can only process read-only transactions. No member is designated as the primary site, so any of the team members can choose to disconnect. When some or all of the members connect to form a distributed system, we assume that the network is point-to-point and that it may or may not be reliable. Several commercial products currently available allow the formation of this type of ad hoc network.

Numerous protocols have been proposed to implement epidemic propagation in distributed systems. In general, most of these protocols maintain two types of control information, timing information and a record of pending updates, in addition to the database state. The timing information is used for ensuring that non-commutative updates are incorporated in the same order at all member copies. Furthermore, in some protocols, the timing information at a copy site is used to determine the updates that must be sent to other sites during propagation. The timing information can also be used to garbage collect the update records when they are no longer needed.

Most prior applications of epidemic communication have only supported commutative operations, e.g., a dictionary with insert and delete [13], or single-item update operations [5, 7]. We have recently developed an approach that uses epidemic update to support multi-operation transactions with one-copy serializability guarantees [1, 11]. In this approach, a transaction, t, executes locally under a local concurrency control mechanism, such as Two Phase Locking. At the point of commit, a pre-commit record is propagated to other system members via epidemic communication. As other members become aware of t, they perform the necessary synchronization and communicate their action to each other via epidemic messages. Since transactions execute on a single system member independent of other members until pre-commit, we can adapt this algorithm for a mobile environment. Given a mobile distributed database, each member may execute operations unilaterally if disconnected, but updates are committed only when it is verified that no conflicting updates have been executed on any of the members.
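The life cycle of a transaction under this approach can be sketched as a small state machine. The Python below is a deliberately simplified illustration under stated assumptions: the class `PendingTransaction`, its `learn` method, and the explicit set of aware members are hypothetical names, and the explicit bookkeeping stands in for the timing information that the actual protocol uses to decide when all members know of a transaction.

```python
from dataclasses import dataclass, field


@dataclass
class PendingTransaction:
    """A pre-committed transaction waiting for all members to learn of it."""
    txn_id: str
    members: frozenset                      # every system member
    aware: set = field(default_factory=set) # members known to have seen it
    state: str = "pre-committed"

    def learn(self, member, conflict_found=False):
        """Record that `member` has processed the pre-commit record."""
        if self.state != "pre-committed":
            return self.state               # already decided
        if conflict_found:
            self.state = "aborted"          # a conflicting update exists
        else:
            self.aware.add(member)
            if self.aware == self.members:
                self.state = "committed"    # no member reported a conflict
        return self.state


if __name__ == "__main__":
    t = PendingTransaction("t1", frozenset({"alice", "bob", "charles"}))
    for name in ("alice", "bob", "charles"):
        print(name, t.learn(name))
```

The sketch makes the dependency visible: the transaction stays pre-committed until the last member has been heard from, which is exactly where an unannounced disconnection would stall progress.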
However, if a member of the team decides to disconnect, say to attend an executive meeting, while the other team members remain connected and continue to process transactions, we allow for a planned disconnection [5]. Without this planned disconnection, the other connected team members will not be able to commit updates, as they cannot get an acknowledgment or vote from the disconnected member. We add a sign-off procedure for use when one of the team members wants to disconnect for a long duration, to inform the distributed database system that it is going off-line for a while. The idea is to give another member that is still connected the right to acknowledge updates or vote in place of the disconnected member. We call this right to vote a proxy. Assume that team member Si wants to disconnect and wants some other team member to act as its proxy during disconnection. Si, preparing to disconnect, stops generating new transactions. Si contacts another member Sj to be Si's proxy. During the disconnection, Si can only read its local copy of the data. Si signs off and disconnects with the following disconnect dialog:
1. Si contacts Sj, indicates that it is leaving, and requests Sj to act as its proxy. Si stops accepting epidemic messages from any member other than Sj.
2. Sj stops accepting epidemic messages from any member other than Si and does not pre-commit any local transactions during the disconnect dialog.
3. Sj sends Si all of the event log (pre-commit) records that Sj believes Si has not seen and its timing information. This is like a normal epidemic message and ensures that Si knows everything Sj knows before it disconnects.
4. Si sends Sj its event log records and timing information. This ensures that Sj has an accurate picture of Si's state so that Si can be brought back up to date when it reconnects. When Si has sent this message and knows it has been received, Si disconnects. Si can only process read-only transactions while disconnected.
5. When Sj has received and processed this information, it marks the beginning of its event log with an identifier for Si to indicate that all records after this point should not be garbage collected since they are needed to bring Si up to date when it reconnects.
6. Sj resumes sending and receiving epidemic messages and acts as a proxy for Si.
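The six steps of the disconnect dialog can be walked through in code. The following Python sketch is an illustration under simplifying assumptions: the class `Member`, its fields, and the function `planned_disconnect` are hypothetical, event logs are modeled as plain sets of record identifiers, the timing information exchanged in steps 3 and 4 is omitted, and message delivery is assumed to succeed.

```python
class Member:
    """Hypothetical in-memory stand-in for one system member."""

    def __init__(self, name):
        self.name = name
        self.log = set()            # ids of pre-commit records known here
        self.only_peer = None       # if set, accept messages only from this member
        self.proxies = set()        # members this member votes on behalf of
        self.keep_log_for = set()   # members whose absence blocks garbage collection
        self.connected = True
        self.read_only = False


def planned_disconnect(si, sj):
    """Run the disconnect dialog: si leaves and sj becomes its proxy."""
    si.only_peer = sj.name          # step 1: si listens only to sj
    sj.only_peer = si.name          # step 2: sj listens only to si, no local pre-commits
    si.log |= sj.log                # step 3: sj sends si the records si lacks
    sj.log |= si.log                # step 4: si sends sj its records, then leaves
    si.connected = False
    si.read_only = True
    sj.keep_log_for.add(si.name)    # step 5: mark the log; keep records for si
    sj.proxies.add(si.name)         # step 6: sj resumes and acts as proxy
    sj.only_peer = None


if __name__ == "__main__":
    alice, bob = Member("alice"), Member("bob")
    alice.log, bob.log = {"a1"}, {"b1", "b2"}
    planned_disconnect(alice, bob)
    print(sorted(alice.log), bob.proxies)
```

After the dialog both logs hold the same records, which is what allows the proxy to bring the returning member up to date later.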
In order to explain the handling of transactions when a proxy has been given, we will represent the timing information with vector clocks and the time table defined by Wuu and Bernstein [13]. Time is recorded using a vector clock [8] that captures the causal order of events and ensures the following property:

\[ \forall e, f \in E:\quad e \rightarrow f \ \text{ iff } \ \mathit{Time}(e) < \mathit{Time}(f). \]

Note that if Time(e) and Time(f) are incomparable, denoted $\mathit{Time}(e) \parallel \mathit{Time}(f)$, then events e and f are concurrent. Each member Si keeps a two-dimensional time table Ti, which corresponds to Si's most recent knowledge of the events at all members. Each row of the time table is a vector clock. Each time table ensures the following timetable property: if Ti[k, j] = v then Si knows that Sk has received the records of all events at Sj up to time v (which is the value of Sj's local clock). Wuu and Bernstein define the HasRecvd predicate as:
\[ \mathit{HasRecvd}(T_i, t, S_k) \equiv T_i[k, \mathit{Site}(t)] \geq \mathit{Time}(t) \]

where t is an event, Site(t) is the site at which t occurred, and Time(t) is the local time at Site(t) when t occurred. When HasRecvd(Ti, t, Sk) is true, Si knows that Sk has received a record of event t.

When a site Si has completed the
local execution of the operations for an update transaction t, it places a pre-commit event record in the log recording the readset (RS(t)), writeset (WS(t)), the values written, and a pre-commit vector timestamp (TS(t)) which is the ith row of Si 's time-table, Ti [1]. When Si sends a message to Sk it includes all records t such that HasRecvd(Ti ; t; Sk ) is false, and it also includes its time-table Ti . When Si receives a message from Sk it applies the updates of all received log records and updates its time-table in an atomic step to reflect the new information received from Sk . When a site receives a log record it knows that the log records of all causally preceding events either were received in previous messages, or are included in the same message. This is referred to as the log property which is stated as follows with respect to a local copy of the log Li at site Si and the set of events E :
\[ \forall e, f \in E:\quad \text{if } (e \rightarrow f) \wedge (f \in L_i) \text{ then } e \in L_i. \]
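The vector-clock order, the HasRecvd predicate, and the choice of which log records to include in a message can be sketched together. The Python below is a minimal illustration, not the authors' implementation: sites are assumed to be numbered from 0, a time table is a list of rows, and a log record is a dictionary with hypothetical keys `"site"` and `"time"`.

```python
def compare(u, v):
    """Compare two vector timestamps; '||' means incomparable (concurrent)."""
    u_le_v = all(a <= b for a, b in zip(u, v))
    v_le_u = all(b <= a for a, b in zip(u, v))
    if u_le_v and v_le_u:
        return "="
    if u_le_v:
        return "<"
    if v_le_u:
        return ">"
    return "||"


def has_recvd(T, record, k):
    """HasRecvd(T, t, S_k): T[k][Site(t)] >= Time(t)."""
    return T[k][record["site"]] >= record["time"]


def records_to_send(T, log, k):
    """Records that, according to time table T, S_k has not yet received."""
    return [r for r in log if not has_recvd(T, r, k)]


if __name__ == "__main__":
    # Time table held by site 0 in a three-site system (made-up values).
    T0 = [[2, 1, 0],
          [1, 1, 0],
          [0, 0, 0]]
    log = [{"site": 0, "time": 1}, {"site": 0, "time": 2}, {"site": 1, "time": 1}]
    print(records_to_send(T0, log, 1))   # only the record (site 0, time 2)
    print(compare((1, 0, 0), (0, 1, 0))) # '||'
```

In this made-up state, site 0 believes site 1 has seen its first event but not its second, so only the second record is sent; site 2 would be sent all three.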
Transaction processing during the disconnect proceeds as follows: When processing an incoming epidemic record, the proxy Sj checks its event log for conflicts and marks the record aborted if there were conflicts. If the vector timestamps are incomparable, the transactions are considered concurrent and thus their read and write sets must be checked for conflicts. Aborting conflicting concurrent transactions ensures one-copy serializability. A site can detect if two transactions are conflicting because their log records contain their read sets, write sets and version vectors. That is, the condition for conflict between transactions t and t' is:

\[ \mathit{Conflicting}(t, t') \equiv \bigl(\mathit{TS}(t) \parallel \mathit{TS}(t')\bigr) \wedge \bigl( \mathit{RS}(t) \cap \mathit{WS}(t') \neq \emptyset \ \vee\ \mathit{WS}(t) \cap \mathit{RS}(t') \neq \emptyset \ \vee\ \mathit{WS}(t) \cap \mathit{WS}(t') \neq \emptyset \bigr) \]

where $\parallel$ denotes incomparable (concurrent) timestamps.
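The conflict condition translates almost directly into code. The following Python sketch assumes a hypothetical `PreCommit` record holding a vector timestamp and the read and write sets; the class and function names are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreCommit:
    ts: tuple             # pre-commit vector timestamp TS(t)
    read_set: frozenset   # RS(t)
    write_set: frozenset  # WS(t)


def concurrent(u, v):
    """True when neither vector timestamp is <= the other."""
    u_le_v = all(a <= b for a, b in zip(u, v))
    v_le_u = all(b <= a for a, b in zip(u, v))
    return not u_le_v and not v_le_u


def conflicting(t, t2):
    """Concurrent timestamps and an overlapping read/write or write/write set."""
    overlap = (
        (t.read_set & t2.write_set)
        or (t.write_set & t2.read_set)
        or (t.write_set & t2.write_set)
    )
    return concurrent(t.ts, t2.ts) and bool(overlap)


if __name__ == "__main__":
    a = PreCommit((1, 0, 0), frozenset({"x"}), frozenset({"y"}))
    b = PreCommit((0, 1, 0), frozenset({"y"}), frozenset({"z"}))
    print(conflicting(a, b))  # True: concurrent, and a writes y while b reads y
```

Two transactions whose timestamps are ordered never conflict under this test, because the later one was pre-committed with knowledge of the earlier one.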
If no conflicts are detected, Sj initiates a transaction to apply the results of the original transaction to that site and puts the pre-commit record in its own event log. Sj updates its timetable as follows: Let Tj be Sj's timetable, and suppose the update originated at member Sk and has time tk. Then Tj[j, k] is set to tk, and Tj[i, k] is also set to tk. This last assignment records that Si is aware of the record from Sk bearing time tk. Si is actually disconnected and unaware of the update; however, Sj holds Si's proxy and has acknowledged the update on behalf of Si. Other members will believe that Si knows about their updates and has acknowledged them, so the updates can be committed or aborted and garbage collected from their logs. A transaction is committed and the remainder of its locks released when it is not aborted and it is known that all sites have knowledge of that transaction. This information can be
determined from the time table. Sj can also commit or abort the transactions at the appropriate time, but it must keep the updates in the event log rather than garbage collecting them. Sj finishes processing the incoming record by adding a pre-commit record to its event log, thus pre-committing the remote transaction. Sj handles the other records similarly and processes the received timetable information as usual, except that it does not garbage collect log records. An alternative would be to garbage collect and require a reconnecting member to copy the entire database. The decision to take this alternative would depend on, among other things, the size of the database, the size of the log, and the length of the disconnection.

When Sj pre-commits a local transaction, it increments its local clock Tj[j, j], gives the transaction the timestamp Tj[j, *] (the jth row of its timetable), and writes a pre-commit record to its log. When Sj has Si's proxy, it must also change its timetable to indicate that Si knows about the transaction by setting Tj[i, j] to the new value of Sj's clock Tj[j, j].

When Si reconnects, it contacts Sj, gets the updates that it missed along with its new log and timetable, and retrieves its voting capability. The reconnect dialog is:

1. When Si reconnects, it tries to contact the member to which it gave its proxy, Sj. If that member is disconnected (Sj will have given its proxy voting rights along with Si's to some other member), then Si must send out a query, “Who is acting as Si?”. If a broadcast mechanism is available, it may be efficient to use it at this time. Otherwise, the query must be propagated along with other epidemic messages. The member which is acting as Si will respond and the real Si can begin the reconnect dialog to retrieve its voting rights.
2. Sj responds to Si with a message indicating whether a log exchange or complete database recovery is needed. (If Si has been disconnected too long, Sj may have decided it needs to garbage collect its event log and Si will have to read the complete database to get up to date.)
3. If a log exchange is indicated, Sj stops processing transactions and incoming epidemic records and sends its entire event log to Si followed by its timetable, Tj. As soon as this message is acknowledged, Sj can garbage collect its event log and proceed with normal operations, as it has given Si back its proxy.
4. Si processes the message like any other epidemic message, except that there is no need to update the timetable as each log record is processed because Sj's entire timetable Tj becomes Si's new timetable without modification. Si's log and timetable are now up to date and it begins accepting epidemic messages from other members and handling local read-only and update transactions.

Using this procedure, a member can accept proxies for several other members at once. The team member that is acting as a proxy for other team members does not have to maintain separate logs and timetables for each such member. Rather, it must remember, via a bit vector, which members it is representing and perform a small amount of extra processing. The event log of the member with the proxy can grow large, so it needs to have the option of requiring a complete database copy upon reconnect. Disconnection and reconnection take time, of course; however, while a member is disconnected, the additional burden on its proxy is minimal.

The disconnect/reconnect procedure preserves one-copy serializability. This is because all of the transactions of the disconnected member are read-only. The values of the data that are read are those of a snapshot taken at the time of disconnection. All of these transactions can be serialized at the time of disconnection. Transactions which occur on the connected members are one-copy serializable because the only difference between current operations and the one-copy serializable protocol in [1] is that some members are disconnected and thus not generating possibly conflicting transactions and not incorporating the updates which their proxies are acknowledging for them. However, when they reconnect, they will be given the updates in causal order to incorporate in their copy of the database.
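The timetable bookkeeping performed by a proxy can be illustrated as follows. This Python sketch is a hedged illustration, not the authors' code: members are numbered from 0, a timetable is a list of rows, all function names are hypothetical, and the element-wise maximum used to fold in a received timetable is an assumption about what "processes the received timetable information as usual" involves. The `max` in `receive_record` is likewise a defensive choice; the text simply sets the entry to tk.

```python
def merge_timetable(T, T_sender):
    """Fold a received timetable into ours by element-wise maximum (assumed rule)."""
    for a, row in enumerate(T_sender):
        for b, value in enumerate(row):
            T[a][b] = max(T[a][b], value)


def receive_record(T, j, proxies, k, tk):
    """S_j incorporates a record that originated at S_k with local time tk."""
    T[j][k] = max(T[j][k], tk)        # S_j itself now knows the record
    for i in proxies:
        T[i][k] = max(T[i][k], tk)    # acknowledged on behalf of absent S_i


def local_pre_commit(T, j, proxies):
    """S_j pre-commits a local transaction; returns its timestamp T_j[j, *]."""
    T[j][j] += 1
    for i in proxies:
        T[i][j] = T[j][j]             # record that each absent S_i "knows" it
    return tuple(T[j])


def known_by_all(T, site, time):
    """True when every row shows the event (site, time) as received."""
    return all(row[site] >= time for row in T)


def hand_back(T_proxy):
    """Reconnect by log exchange: the proxy's whole timetable becomes S_i's."""
    return [row[:] for row in T_proxy]


if __name__ == "__main__":
    ALICE, BOB, CHARLES = 0, 1, 2           # Alice is disconnected; Bob is her proxy
    T_bob = [[0] * 3 for _ in range(3)]
    T_charles = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]  # Charles pre-committed at time 1
    merge_timetable(T_bob, T_charles)
    receive_record(T_bob, BOB, {ALICE}, CHARLES, 1)
    print(known_by_all(T_bob, CHARLES, 1))  # True
```

Without the proxy entry for Alice, the same check would fail and Charles's transaction could not be committed until Alice returned.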
3 Discussion

In this paper we proposed the idea of a mobile distributed database that supports the operation of a mobile team. We developed an epidemic-based protocol for managing such a system, and integrated it with a planned disconnection procedure. The epidemic approach is especially appropriate for disconnection since the event logs maintain the record of all events necessary for execution even when some members are not available. We have assumed that any failures will be detected and corrected quickly, so that disruptions to the protocol caused by failures and network partitions can be disregarded. This is not realistic, and a possible solution is to use quorums instead of requiring a vote from all sites. Recently, we extended the epidemic approach to incorporate quorums, thus increasing the degree of fault-tolerance and disconnections allowed [11, 4]. Our approach in this paper, however, is a first look at the requirements of mobile distributed databases and ways to fill those requirements, for example, with planned disconnections. Many issues need to be explored and we briefly discuss some of these now.

Widespread use of mobile distributed databases may result in the formation of large mobile teams. The use of vector timestamps and time-tables can limit such scalability. These issues have been addressed frequently by the distributed systems community. Rabinovich et al. [10] have proposed a more scalable mechanism than vector timestamps. A variant of the log and time-table algorithm called the two-phase gossip protocol [3] reduces the size of the two-dimensional time-table from n^2 to 2n. Wuu and Bernstein [13] suggest a number of ways, including having each member keep only its own timetable row and a row for each of its neighbors (or just two other members) instead of storing the entire time-table.

Furthermore, we note that communications networks that support mobile systems are likely to be expensive. Ways to keep messages short in this protocol should be investigated. Even in the case where each member keeps an entire time-table, the member could send only its row of the time-table rather than its entire table at the end of each epidemic message. This would have the effect of slowing down the spread of update acknowledgments, thus slowing the commit response time. In exchange for this, messages would be shorter.

Security is another issue of concern in a mobile system where team members disconnect and later reconnect. When a member reconnects and wants his proxy back, how can we authenticate his request? How can we prevent an intruder from masquerading as a disconnected member of the team? Should we use passwords or some other security mechanism in this environment? Should we allow for manual assignment of proxies for a team member who is known to have disconnected without signing off, or who planned to be at a meeting and is unavoidably absent? For mobile teams to be a reality, these security issues need to be further investigated.

Finally, our proposed protocol permits the disconnected member to only read data.
If only Alice will be working on a particular area of the database after she disconnects, is there a way for her to “check-out” and “take” those data items when she disconnects so that she can update those items rather than limiting her to read-only access? What limits must we place on the connected members in order to preserve serializability? We are currently working on a disconnect protocol to allow a member to claim read/write privileges on a data item when they disconnect.
References

[1] D. Agrawal, A. El Abbadi, and R. Steinke. Epidemic Algorithms in Replicated Databases. In Proceedings of the ACM Symposium on Principles of Database Systems, pages 161–172, May 1997.
[2] P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Addison Wesley, Reading, Massachusetts, 1987.
[3] A. A. Heddaya, M. Hsu, and W. E. Weihl. Two Phase Gossip: Managing Distributed Event Histories. Information Sciences: An International Journal, 49(1,2,3):35–57, October/November/December 1989. Special issue on Databases.
[4] J. Holliday, R. Steinke, D. Agrawal, and A. El Abbadi. Epidemic Quorums for Managing Replicated Data. Technical Report TRCS 99-32, Department of Computer Science, University of California at Santa Barbara, 1999. http://www.cs.ucsb.edu/TRs/TRCS99-32.ps.
[5] P. Keleher. Decentralized Replicated-Object Protocols. In Proceedings of the 18th ACM Symposium on Principles of Distributed Computing, Apr. 1999.
[6] J. J. Kistler and M. Satyanarayanan. Disconnected operation in the Coda file system. Operating Systems Review, 25(5):213–225, Oct. 1991.
[7] R. Ladin, B. Liskov, L. Shrira, and S. Ghemawat. Providing High Availability Using Lazy Replication. ACM Transactions on Computer Systems, 10(4):360–391, Nov. 1992.
[8] F. Mattern. Virtual time and global states of distributed systems. In M. C. et al., editor, Parallel and Distributed Algorithms: proceedings of the International Workshop on Parallel & Distributed Algorithms, pages 215–226. Elsevier Science Publishers B. V., 1989.
[9] K. Petersen, M. Spreitzer, D. B. Terry, M. M. Theimer, and A. J. Demers. Flexible Update Propagation for Weakly Consistent Replication. In Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles, pages 288–301, 1997.
[10] M. Rabinovich, N. H. Gehani, and A. Kononov. Scalable update propagation in epidemic replicated databases. In Proceedings of the International Conference on Extending Data Base Technology, pages 207–222, 1996.
[11] R. C. Steinke. Epidemic Transactions for Replicated Databases. Master's thesis, University of California at Santa Barbara, Department of Computer Science, UCSB, Santa Barbara, CA 93106, 1997.
[12] D. B. Terry, M. M. Theimer, K. Petersen, A. J. Demers, M. Spreitzer, and C. H. Hauser. Managing update conflicts in Bayou, a weakly connected replicated storage system. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, pages 172–183, 1995.
[13] G. T. Wuu and A. J. Bernstein. Efficient Solutions to the Replicated Log and Dictionary Problems. In Proceedings of the Third ACM Symposium on Principles of Distributed Computing, pages 233–242, Aug. 1984.