Failure Recovery based on Quasi-Synchronous Checkpointing in Mobile Computing Systems

D. Manivannan and M. Singhal
Department of Computer and Information Science
The Ohio State University
Columbus, OH 43210
E-mail: {manivann, ...}

Abstract
Mobile computing systems are expected to revolutionize the way computers are used. Mobile hosts have small memory, relatively slow processors, and low-power batteries, and they communicate over low-bandwidth wireless links. In this paper, we address the problem of failure recovery in mobile computing systems. Any recovery method for mobile computing systems must take into account the energy and communication bandwidth constraints under which mobile computers operate. Synchronous checkpointing is not suitable for mobile systems because it incurs high communication cost over a low-bandwidth network. Asynchronous checkpointing is not suitable either, because multiple checkpoints must be kept in stable storage and some or all of the checkpoints taken may be useless for constructing consistent global checkpoints. We propose a low-overhead recovery algorithm based on a quasi-synchronous checkpointing algorithm for mobile computing systems. The checkpointing algorithm preserves process autonomy by allowing processes to take checkpoints asynchronously, and it uses communication-induced checkpoint coordination to advance the recovery line, which helps bound rollback propagation during recovery. Thus, it has the simplicity and low overhead of asynchronous checkpointing and the recovery-time advantages of synchronous checkpointing. The checkpointing algorithm ensures that, at all times, there exists a recovery line consistent with any checkpoint of any process. The recovery algorithm exploits this property to restore the system, in the event of a failure, to a state consistent with the latest checkpoint of the failed process. It uses selective pessimistic message logging at the receiver end to handle messages lost due to rollback.
Keywords: Distributed checkpointing, mobile computing, failure recovery, fault-tolerance.
1 Introduction

The design of checkpointing and recovery algorithms for distributed systems has traditionally been restricted to static hosts, usually connected by high-speed wired networks. Little work has been done on checkpointing and recovery in mobile computing systems. Mobile computing (the use of portable computers capable of wireless networking) is expected to revolutionize the way we use computers [8]. A mobile computing system is a distributed system in which hosts can move from one location to another while maintaining connectivity with other hosts. Mobile hosts, networked partly by wireless and partly by wired links and equipped with limited battery life and relatively slow processors, introduce a new set of issues that are not present in traditional distributed systems with static hosts. In this paper, we present a recovery algorithm suited to mobile computing.

Several checkpointing schemes for static distributed systems have been proposed in the literature. They can be broadly classified into two categories: asynchronous and synchronous. In asynchronous checkpointing [6], processes take checkpoints periodically without any coordination with one another. To recover from a failure, a process communicates with the other processes to determine whether their local states are causally related. If they are, the processes that received the messages responsible for the causal dependencies roll back to eliminate those dependencies. This is repeated until the local states of all the processes are free of causal dependencies. This approach gives processes maximum autonomy during checkpointing and has no message overhead. However, multiple checkpoints must be stored at each process, so the storage requirement may be large. During recovery, processes may suffer from the domino effect, in which they roll back repeatedly while trying to determine a consistent set of checkpoints; in the worst case, processes may have to roll back to their initial states, wasting all the checkpointing effort. Thus, asynchronous checkpointing is not suitable for rollback recovery in mobile computing systems. To reduce the domino effect, Kim et al. [13] and Venkatesh et al. [24] use dependency tracking and insert a checkpoint before processing a new message that introduces a dependency. Message logging [10, 11, 19, 21] and message reordering [25] have also been suggested in the literature to cope with the domino effect.

In synchronous checkpointing schemes, domino-free recovery is achieved by sacrificing process autonomy and incurring extra message overhead during checkpointing. In this approach, processes synchronize their checkpointing activities so that a globally consistent set of checkpoints is always maintained in the system [7, 14, 15]. The storage requirement for checkpoints is minimal because each process keeps only one checkpoint in stable storage at any given time. However, synchronous checkpointing schemes involve high message overhead, and any checkpointing method with high message overhead is unsuitable for mobile computing systems because of the low-bandwidth wireless channels in the network. Moreover, process execution may have to be suspended during checkpoint coordination, as in [12, 14], which degrades performance.
Paper Objectives

In this paper, we present a recovery algorithm that is suitable for mobile computing systems. We first review the quasi-synchronous checkpointing algorithm presented in [16], which has the simplicity and low overhead of asynchronous checkpointing and the recovery-time advantages of synchronous checkpointing. It ensures that, at all times, there exists a recovery line consistent with any checkpoint of any process. We then present a low-overhead recovery algorithm based on the quasi-synchronous checkpointing algorithm. The recovery algorithm is fully asynchronous: a failed process only needs to roll back to its latest checkpoint and inform the other processes of the rollback; it can then resume normal computation without waiting for the other processes to roll back to a consistent checkpoint. Messages are selectively logged at the receiver end to handle the various message abnormalities that arise due to rollback; this selective logging reduces message-logging overhead during normal operation. No extra messages are transferred during normal computation.

The rest of the paper is organized as follows. Section 2 presents the required background. Section 3 presents the quasi-synchronous checkpointing algorithm. Section 4 presents a basic recovery algorithm that rolls back the processes to a set of globally consistent checkpoints in the event of a failure. Section 5 extends the basic recovery algorithm into a comprehensive recovery algorithm that restores the system to a consistent state in the event of a failure. Section 6 analyzes the overhead of the checkpointing and recovery algorithms. Section 7 compares our algorithms with existing related work, and Section 8 concludes the paper.
2 Background
2.1 A Mobile Computing System Model

We follow the system model described in [3]. A mobile computing system consists of a large number of mobile hosts (MHs) and a relatively small number of fixed hosts called mobile support stations (MSSs). Figure 1 shows a schematic diagram of a mobile computing system. The MSSs are connected by a static wired network. A cell is the logical or geographical coverage area of an MSS. An MH can communicate directly with an MSS only if it is present in the cell serviced by that MSS. At any given time, an MH belongs to at most one cell. The static network provides reliable FIFO delivery of messages between any two MSSs, with arbitrary message latency. Similarly, the wireless network within a cell ensures FIFO delivery of messages between an MSS and a local MH. When an MH leaves a cell, it sends a leave(r) message on the MH-to-MSS channel, supplying the sequence number r of the last message it received on the MSS-to-MH channel. After sending this message, the MH neither sends nor receives any other message within the current cell. Each MSS maintains a list of the ids of the MHs that are local to its cell; on receipt of a leave() message from a local MH, the MSS deletes that MH from the list. When an MH enters a new cell, it sends a join(MH-id) message to the new MSS, which then adds it to its list of local MHs.

Figure 1: A Mobile Computing System (MSSs connected by a fixed network, each serving the MHs in its wireless cell)

Disconnected operation: An MH disconnects by sending a disconnect(r) message to its local MSS M, where r is the sequence number of the last message it received from M (as in a leave(r) message). When M receives the disconnect() message from an MH, it deletes the MH from its list of local MHs; however, it sets a "disconnected" flag for that MH-id. When some other MSS M' attempts to contact this MH while it is disconnected, M informs M' of the MH's disconnected status. Later, the MH may reconnect at an MSS N by sending a reconnect(MH-id, previous MSS-id = M) message. N informs M of the MH's reconnection; as a result, M clears the "disconnected" flag for the MH, and N adds the MH to its list of local MHs. The disconnect and leave messages, together with the FIFO nature of the wireless channel, help the MSS identify the messages that need to be retransmitted when the MH connects at the new MSS.

When an MH h1 wants to send a message to another MH h2, h1 sends the message to its local MSS M1 via the wireless network. M1 forwards it over the wired network to M2, the MSS local to h2. Finally, M2 forwards the message to h2 via the wireless network. Since MHs move from cell to cell, the location of an MH is not fixed; as a result, an MSS trying to send a message to an MH must first locate it and then send the message to the MSS local to the MH, which in turn forwards the message to the MH. Messages sent by one MH to another may have to be rerouted because the destination has disconnected from its old MSS and reconnected at a new one. Routing protocols that handle node mobility have been proposed in [2, 4, 9, 22]. We do not address routing in this paper.
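The per-MSS bookkeeping described above (the list of local MHs, the "disconnected" flag, and the sequence number r carried by leave(r) and disconnect(r)) can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not part of the model in [3]: the class and method names are hypothetical, message transport and retransmission are not modeled, and the notification from N to M on reconnection is reduced to a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class MSS:
    """Hypothetical model of one mobile support station's cell bookkeeping."""
    mss_id: str
    local_mhs: set = field(default_factory=set)     # ids of MHs currently in this cell
    disconnected: set = field(default_factory=set)  # MH-ids whose "disconnected" flag is set
    last_seq: dict = field(default_factory=dict)    # MH-id -> r from leave(r) / disconnect(r)

    def join(self, mh_id: str) -> None:
        # join(MH-id): the MH has entered this cell
        self.local_mhs.add(mh_id)

    def leave(self, mh_id: str, r: int) -> None:
        # leave(r): r is the last message the MH received on the MSS-to-MH channel
        self.local_mhs.discard(mh_id)
        self.last_seq[mh_id] = r

    def disconnect(self, mh_id: str, r: int) -> None:
        # same bookkeeping as leave(r), plus the "disconnected" flag
        self.leave(mh_id, r)
        self.disconnected.add(mh_id)

    def is_disconnected(self, mh_id: str) -> bool:
        # the answer given to another MSS M' that tries to contact the MH
        return mh_id in self.disconnected

    def reconnect(self, mh_id: str, previous: "MSS") -> int:
        # reconnect(MH-id, previous MSS-id): this MSS (N) informs the previous MSS (M),
        # which clears the flag; messages numbered above the returned r are the
        # candidates for retransmission
        previous.disconnected.discard(mh_id)
        self.local_mhs.add(mh_id)
        return previous.last_seq[mh_id]


# h1 disconnects from M after receiving message 7, then reconnects at N.
m, n = MSS("M"), MSS("N")
m.join("h1")
m.disconnect("h1", r=7)
r = n.reconnect("h1", previous=m)
```

Keeping r at the old MSS is one simple way to let it decide which buffered messages to resend; the model itself only states that leave(r), disconnect(r), and FIFO channels make this identification possible.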
2.2 An Application Model

A distributed computation in a mobile computing system consists of N sequential processes, denoted by $P_1, P_2, \ldots, P_N$, running concurrently on different mobile hosts in the network. The processes share neither a global memory nor a global physical clock. Message passing is the only way for processes to communicate with one another. The computation is asynchronous: each process evolves at its own speed, and messages are exchanged through communication channels whose transmission delays are finite but arbitrary. We assume that messages are not lost, altered, or spuriously introduced. Processes are fail-stop: all failures are detected immediately and result in halting the failed processes and initiating recovery action [21]. A process can be inactive due to failure for an arbitrarily long but finite time.
The states of the processes involved in a distributed computation depend on one another because of inter-process communication. So, when a process P rolls back after a failure, the processes whose states directly or transitively depend on P's state are forced to roll back as well. Checkpoints kept on stable storage, together with rollback-recovery protocols, are well-established techniques for dealing with process failures in a distributed system. When a failure occurs, a rollback protocol uses the checkpoints and message logs to restore the system to a consistent global state [17]. By a consistent global state we mean a state in which, if the receipt of a message has been recorded in the state of some process, then the sending of that message has also been recorded.
3 The Quasi-synchronous Checkpointing Algorithm

The activity of each process is modeled by a sequence of events (i.e., executed actions). The occurrence of an event generally changes the local state of the process. The local state of a process saved in stable storage is called a checkpoint of the process. Each process takes checkpoints independently; processes also take checkpoints as a result of receiving certain messages. Independently taken checkpoints are called basic checkpoints, and those triggered by message receptions are called forced checkpoints. The forced checkpoints help advance the recovery line.
Definitions and Notations

Each checkpoint of a process is assigned a unique sequence number. The sequence number assigned to a checkpoint C is denoted by C.sn. The sequence numbers of the checkpoints of a process increase monotonically. Each message is piggybacked with the sequence number of the latest checkpoint of the sending process; the sequence number piggybacked on message M is denoted by M.sn. The checkpoint of process $P_i$ with sequence number m is denoted by $C_{i,m}$. The send and receive events of a message M are denoted by send(M) and receive(M), respectively. We say $send(M) \in C_{i,m}$ if message M was sent by process $P_i$ before it took checkpoint $C_{i,m}$. Similarly, we say $receive(M) \in C_{i,m}$ if message M was received and processed by $P_i$ before it took checkpoint $C_{i,m}$.
Definition 1: A set $S = \{C_{1,m_1}, C_{2,m_2}, \ldots, C_{N,m_N}\}$ of N checkpoints, one from each process, is said to be a consistent global checkpoint¹ if, for any message M and for any integer i, $1 \le i \le N$:

$$receive(M) \in C_{i,m_i} \Rightarrow send(M) \in C_{j,m_j} \text{ for some } j,\ 1 \le j \le N.$$
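Definition 1 can be checked mechanically once each chosen checkpoint is summarized by the messages whose send and receive events it records. The Python sketch below assumes that representation (the Checkpoint class and the message ids are hypothetical, not part of the paper) and tests whether every recorded receive has a matching recorded send somewhere in the set; a receive without a recorded send is an orphan message, which makes the set inconsistent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    # Hypothetical summary of C_{i,m}: ids of messages whose send (resp. receive)
    # event is recorded in the checkpoint, i.e., happened before it was taken.
    sent: frozenset = frozenset()
    received: frozenset = frozenset()


def is_consistent(global_checkpoint):
    """Definition 1: receive(M) in some C_{i,m_i} implies send(M) in some C_{j,m_j}."""
    recorded_sends = set()
    for c in global_checkpoint:
        recorded_sends |= c.sent
    return all(m in recorded_sends for c in global_checkpoint for m in c.received)


# P1 sends M and P2 receives it. If P1's checkpoint predates send(M) while P2's
# checkpoint records receive(M), then M is an orphan and the set is inconsistent.
after_send = Checkpoint(sent=frozenset({"M"}))
before_send = Checkpoint()
after_receive = Checkpoint(received=frozenset({"M"}))
print(is_consistent([after_send, after_receive]))   # True
print(is_consistent([before_send, after_receive]))  # False
```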
In the checkpointing algorithm, each process $P_i$ maintains two variables, $sn_i$ and $next_i$. The value of $sn_i$ is the sequence number of the latest checkpoint taken by $P_i$. The value of $next_i$ is the sequence number to be assigned to the next basic checkpoint that $P_i$ will take. $P_i$ increments $next_i$ every x time units, where x is the smallest checkpoint interval among all processes. The main purpose of $next_i$ is to keep the sequence numbers of the latest checkpoints of the processes close to one another; as we will see later, this helps advance the recovery line. We now present the quasi-synchronous checkpointing algorithm formally.
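To see how $next_i$ keeps the latest sequence numbers close even when no messages are exchanged, the Python sketch below replays only the timer-driven part of the algorithm in discrete steps of x time units. It is an illustration under stated assumptions, not the authors' code: every process increments $next_i$ once per step, process i takes a basic checkpoint every periods[i] steps, and when both events fall on the same instant the increment is applied first. The function name and its parameters are hypothetical.

```python
def latest_sequence_numbers(periods, steps):
    """Return sn_i for each process after `steps` steps of x time units.

    periods[i] is P_i's basic-checkpoint interval in units of x. No messages are
    exchanged, so only basic checkpoints occur."""
    sn = [0] * len(periods)     # sn_i, initialized to 0
    nxt = [1] * len(periods)    # next_i, initialized to 1
    for t in range(1, steps + 1):
        for i, period in enumerate(periods):
            nxt[i] += 1                           # next_i := next_i + 1 every x units
            if t % period == 0 and nxt[i] > sn[i]:
                sn[i] = nxt[i]                    # basic checkpoint numbered next_i
    return sn


# Checkpoint intervals of 3x, 2x, and x time units.
print(latest_sequence_numbers([3, 2, 1], 5))  # [4, 5, 6]
print(latest_sequence_numbers([3, 2, 1], 6))  # [7, 7, 7]
```

Because every process draws its numbers from the same steadily advancing counter, in this replay the latest sequence numbers of any two processes differ by less than the longest checkpoint period.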
The Quasi-synchronous Checkpointing Algorithm

    Data Structures at Process P_i
        sn_i   : integer (:= 0);  {sequence number of the latest checkpoint of P_i, initialized to 0;
                                   updated every time a new checkpoint is taken}
        next_i : integer (:= 1);  {sequence number to be assigned to the next basic checkpoint,
                                   initialized to 1}

    When it is time for process P_i to increment next_i
        next_i := next_i + 1;     {next_i is incremented at periodic time intervals}

    When it is time for process P_i to take a basic checkpoint
        If next_i > sn_i then     {skip the basic checkpoint if next_i <= sn_i, i.e., if P_i has already
                                   taken a forced checkpoint C with C.sn >= next_i}
            Take checkpoint C;
            C.sn := next_i;       {assign next_i as the sequence number of the checkpoint taken}
            sn_i := C.sn;         {update sn_i}

    When process P_i sends a message M
        M.sn := sn_i;             {piggyback the sequence number of the current checkpoint on M}
        send(M);

    When process P_j receives a message M from process P_i
        If M.sn > sn_j then
            Take checkpoint C;
            C.sn := M.sn;
            sn_j := C.sn;
        Process the message.

¹ Also called a consistent global snapshot or a recovery line.
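The pseudocode above translates almost line for line into Python. The class below is a sketch under our own assumptions: checkpoints are recorded only by their sequence numbers (saving the actual state to stable storage is omitted), every process is assumed to start with an initial checkpoint numbered 0, and messages are modeled as (M.sn, payload) pairs. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class QSAProcess:
    pid: int
    sn: int = 0          # sn_i: sequence number of the latest checkpoint
    next_seq: int = 1    # next_i: number for the next basic checkpoint
    # sequence numbers of checkpoints taken so far; assumes an initial checkpoint 0
    checkpoints: list = field(default_factory=lambda: [0])

    def tick(self):
        # called every x time units: next_i := next_i + 1
        self.next_seq += 1

    def basic_checkpoint(self):
        # skipped if a forced checkpoint with C.sn >= next_i was already taken
        if self.next_seq > self.sn:
            self._take_checkpoint(self.next_seq)

    def send(self, payload):
        # piggyback sn_i on the outgoing message
        return (self.sn, payload)

    def receive(self, message):
        msg_sn, payload = message
        if msg_sn > self.sn:
            # forced checkpoint before processing the message
            self._take_checkpoint(msg_sn)
        return payload  # stands in for "process the message"

    def _take_checkpoint(self, seq):
        self.checkpoints.append(seq)
        self.sn = seq


# A hypothetical schedule echoing message M1 in Figure 2: the receiver has sn = 2
# when a message carrying M.sn = 3 arrives, so it is forced to take checkpoint 3;
# its next basic checkpoint (still numbered 2) is then skipped.
sender, receiver = QSAProcess(1), QSAProcess(2)
sender.tick(); sender.tick(); sender.basic_checkpoint()    # sender: sn = 3
receiver.tick(); receiver.basic_checkpoint()               # receiver: sn = 2
receiver.receive(sender.send("M1"))                        # forced checkpoint 3
receiver.basic_checkpoint()                                # skipped: 2 <= 3
print(receiver.checkpoints)  # [0, 2, 3]
```

The sender in this schedule ends up with checkpoints 0 and 3 only, a small instance of the gaps in sequence numbers that the algorithm allows.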
We now illustrate the quasi-synchronous checkpointing algorithm (hereafter referred to as QSA) with an example.
3
4
P [ 1
[
[
*
M2
M1 0
2
P [ 2
[
0
5
6
[
[
[
[
*
M3
M0
P [ 3
3
4
*
M 4
1
2
3
4
5
[
[
[
[
[
*
Figure 2: Example illustrating the QSA
Figure 2 shows the space-time diagram of a distributed computation consisting of three processes, $P_1$, $P_2$, and $P_3$. Basic checkpoints and forced checkpoints are drawn with different symbols, and each checkpoint is labeled with its sequence number. Each process $P_i$ increments its variable $next_i$ every x time units. Process $P_3$ takes a basic checkpoint every x time units, $P_2$ every 2x time units, and $P_1$ every 3x time units. Message $M_0$ forces $P_3$ to take a forced checkpoint with sequence number 2 before processing $M_0$; as a result, $P_3$ skips taking a basic checkpoint with sequence number 2. Message $M_1$ forces $P_2$ to take a forced checkpoint with sequence number 3 before processing $M_1$, because $M_1.sn = 3$ and $sn_2 = 2 < 3$ when $P_2$ receives the message. Similarly, message $M_2$ forces $P_1$ to take a checkpoint before processing it, and $M_4$ forces $P_2$ to take a checkpoint before processing it. However, $M_3$ does not force its receiving process to take a checkpoint before processing it. Note that there may be gaps in the sequence numbers assigned to the checkpoints of a process; for example, $P_1$ has no checkpoints with sequence numbers 1 and 2.
3.1 Storing and Retrieving Checkpoints

MHs are vulnerable to catastrophic failures, which could be caused by a physical drop, outside temperature, humidity, a water spill, or exposure to security X-rays. Thus, the hard disks of the MHs cannot be treated as stable storage for checkpoints; instead, checkpoints should be stored at the MSSs. If checkpoints are stored at the MSSs, an MH must be able to locate the MSS that holds a required checkpoint during recovery. We propose the following mechanism to solve this problem. Each process maintains a set location_info whose entries are ordered tuples of the form (pid, checknum, location), where pid denotes the process id, checknum denotes a checkpoint sequence number, and location denotes the id of the MSS at which the checkpoint with sequence number checknum of the process with id pid is stored. When a process takes a checkpoint, it stores the checkpoint in stable storage at its local MSS and adds the corresponding entry (pid, checknum, location) to location_info; it also stores a copy of location_info at the MSS, so that the contents of the set remain available if the process fails. Thus, whenever a process needs a checkpoint with a given sequence number, it can find the MSS holding that checkpoint from location_info and request the checkpoint from that MSS. To keep the recovery algorithm simple, in the rest of the paper we neither mention location_info nor show how it is updated.
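A minimal Python sketch of the location_info mechanism follows. The dictionary standing in for MSS stable storage, the class name, and the key layout are our own assumptions made for illustration; the paper does not specify how an MSS organizes the checkpoints it stores.

```python
from dataclasses import dataclass, field


@dataclass
class LocationInfo:
    """Sketch of a process's location_info set of (pid, checknum, location) tuples.

    `stable_storage` is a hypothetical stand-in for the MSSs' stable storage:
    a dict mapping an MSS id to {key: stored object}."""
    pid: int
    entries: set = field(default_factory=set)

    def store(self, checknum, state, local_mss, stable_storage):
        # save the checkpoint at the local MSS and record where it was saved
        site = stable_storage.setdefault(local_mss, {})
        site[(self.pid, checknum)] = state
        self.entries.add((self.pid, checknum, local_mss))
        # also keep a copy of location_info at the MSS, for use after a failure
        site[("location_info", self.pid)] = frozenset(self.entries)

    def fetch(self, checknum, stable_storage):
        # find the MSS holding the checkpoint, then request it from that MSS
        for pid, num, location in self.entries:
            if pid == self.pid and num == checknum:
                return stable_storage[location][(pid, num)]
        raise KeyError(f"no checkpoint {checknum} recorded for process {self.pid}")


# The MH takes checkpoint 0 in MSS-A's cell, moves, and takes checkpoint 3 in MSS-B's cell.
storage = {}
info = LocationInfo(pid=1)
info.store(0, b"state-0", "MSS-A", storage)
info.store(3, b"state-3", "MSS-B", storage)
print(info.fetch(0, storage))  # b'state-0'
```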
4 Basic Recovery Algorithm

In this section, we present a basic recovery algorithm based on the QSA presented in Section 3. The basic recovery algorithm rolls back the processes to a consistent global checkpoint when a process fails. We assume that if a process fails, no other process fails until all the processes have rolled back to a consistent global checkpoint. We now present the basic recovery algorithm (hereafter called the BRA).
The Basic Recovery Algorithm

    When process P_i fails
        Roll back to the latest checkpoint C;
        send roll_back_to(C.sn) to all the other processes;

    When process P_j receives a roll_back_to(n) message
        If sn_j >= n then
            Find the earliest checkpoint C of P_j such that C.sn >= n;
            Roll back to C;
            sn_j := C.sn;
            Discard all the checkpoints beyond C;
        Else                      {in this case the process does not roll back at all}
            Take a checkpoint C;
            C.sn := n;
            sn_j := C.sn;
4.1 An Explanation of the Basic Recovery Algorithm

The BRA works as follows. When a process $P_i$ fails, it rolls back to its latest checkpoint and sends a roll_back_to(n) message to all the other processes, where n is the sequence number of the latest checkpoint of $P_i$. On receiving this message, a process $P_j$ rolls back to its earliest checkpoint whose sequence number is $\ge n$; if there is no such checkpoint (i.e., all the existing checkpoints of $P_j$ have sequence numbers $< n$), it takes a checkpoint and assigns n as its sequence number. So, for simplicity, we can assume that all the processes (including those that merely take a new checkpoint) roll back to their earliest checkpoint whose sequence number is $\ge n$. We prove below that the checkpoints to which the processes roll back upon receiving the roll_back_to() message indeed form a consistent global checkpoint. The BRA is asynchronous: after rolling back, a process can proceed normally without waiting for the other processes to roll back.
For example, in Figure 2, if process $P_3$ fails, it rolls back to its latest checkpoint $C_{3,5}$ and sends a roll_back_to(5) message to $P_1$ and $P_2$. Upon receiving this message, $P_2$ rolls back to checkpoint $C_{2,5}$, since $C_{2,5}$ is the earliest checkpoint of $P_2$ whose sequence number is $\ge 5$. However, since $P_1$ has no checkpoint whose sequence number is $\ge 5$, it does not roll back; instead, it takes a checkpoint with sequence number 5.
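The two handlers of the BRA can be sketched in Python as follows, with each process's stable storage reduced to the list of its checkpoint sequence numbers in increasing order (an assumption for illustration; restoring the saved state itself is not modeled). The function names are hypothetical, and the checkpoint histories in the usage lines are assumed, chosen only to reproduce the outcome described in the example above.

```python
def on_failure(checkpoints):
    """Failed process P_i: roll back to its latest checkpoint C and return C.sn,
    the n it sends in roll_back_to(n) to all the other processes."""
    return checkpoints[-1]


def on_roll_back_to(checkpoints, n):
    """Process P_j's handler for roll_back_to(n).

    `checkpoints` holds P_j's checkpoint sequence numbers in increasing order;
    the last one is sn_j. Returns (remaining checkpoints, new sn_j, rolled_back)."""
    if checkpoints[-1] >= n:
        # earliest checkpoint C with C.sn >= n; discard all checkpoints beyond C
        k = next(i for i, seq in enumerate(checkpoints) if seq >= n)
        return checkpoints[: k + 1], checkpoints[k], True
    # every checkpoint is numbered < n: no rollback, take a checkpoint numbered n
    return checkpoints + [n], n, False


# Hypothetical histories loosely modeled on Figure 2.
n = on_failure([0, 1, 2, 3, 4, 5])          # P3 fails; n = 5
print(on_roll_back_to([0, 2, 3, 5, 6], n))  # P2: ([0, 2, 3, 5], 5, True)
print(on_roll_back_to([0, 3, 4], n))        # P1: ([0, 3, 4, 5], 5, False)
```

Note that neither handler waits for any reply, which mirrors the asynchronous character of the BRA described in Section 4.1.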
4.2 Correctness of the Basic Recovery Algorithm

In this section, we prove that the BRA rolls back all the processes to a consistent global checkpoint when a process fails. As mentioned earlier, when a process fails and initiates recovery, no other process fails until the recovery is complete. Before proving the correctness of the recovery algorithm, we make the following observations about the checkpointing and recovery algorithms presented above.
Observation 1: If $P_j$ ($1 \le j \le N$) rolls back to checkpoint $C_{j,m_j}$ upon receiving a roll_back_to(n) message from $P_i$, then
1. $m_j \ge n$;
2. all checkpoints taken by $P_j$ prior to checkpoint $C_{j,m_j}$ have sequence numbers less than n.
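Observation 1 follows directly from how a process selects its rollback target. As an informal sanity check (not a substitute for the proof), the Python sketch below models that selection with bisect.bisect_left on a strictly increasing list of sequence numbers, adopts the simplification of Section 4.1 that a process which merely takes a new checkpoint numbered n is treated as rolling back to it, and asserts both parts of the observation on randomly generated histories. The function names and the random-history generator are hypothetical.

```python
import bisect
import random


def rollback_target(history, n):
    """Return (history, k), where history[k] is the checkpoint P_j ends up at after
    roll_back_to(n). If every stored number is < n, P_j does not roll back and
    instead appends a new checkpoint numbered n."""
    k = bisect.bisect_left(history, n)  # first index with history[k] >= n
    if k == len(history):
        history = history + [n]
    return history, k


def check_observation_1(trials=1000, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        # strictly increasing sequence numbers, as for the checkpoints of one process
        history = sorted(rng.sample(range(50), rng.randint(1, 10)))
        n = rng.randint(0, 60)
        history, k = rollback_target(history, n)
        assert history[k] >= n                        # part 1: m_j >= n
        assert all(seq < n for seq in history[:k])    # part 2: earlier ones are < n
    return True


print(check_observation_1())  # True
```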
Observation 2: For any message M sent by $P_j$ ($1 \le j \le N$), $send(M) \in C_{j,m_j} \iff M.sn$