The 27th Intl. Conf. on Parallel Processing (ICPP’98), Aug. 1998
On the Impossibility of Min-Process Non-Blocking Checkpointing and An Efficient Checkpointing Algorithm for Mobile Computing Systems

Guohong Cao and Mukesh Singhal
Department of Computer and Information Science
The Ohio State University, Columbus, OH 43210
E-mail: {gcao, [email protected]}

Abstract
Mobile computing raises many new issues, such as lack of stable storage, low bandwidth of wireless channels, high mobility, and limited battery life. These new issues make traditional checkpointing algorithms unsuitable. Prakash and Singhal [14] proposed the first coordinated checkpointing algorithm for mobile computing systems. However, we showed that their algorithm may result in an inconsistency [3]. In this paper, we prove a more general result about coordinated checkpointing: there does not exist a non-blocking algorithm that forces only a minimum number of processes to take their checkpoints. Based on this proof, we propose an efficient algorithm for mobile computing systems which forces only a minimum number of processes to take checkpoints and dramatically reduces the blocking time during the checkpointing process. Correctness proofs and a performance analysis of the algorithm are provided.

1 Introduction

A distributed system is a collection of processes that communicate with each other by exchanging messages. A mobile computing system is a distributed system where some of the processes run on mobile hosts (MHs), whose location in the network changes with time. To communicate with MHs, a conventional distributed system is augmented with mobile support stations (MSSs) that act as access points for the MHs over wireless networks. The mobility of MHs raises new issues [11] pertinent to the design of checkpointing algorithms: locating the processes that have to take checkpoints, energy consumption constraints, lack of stable storage on MHs, and low bandwidth for communication with MHs. These features make traditional checkpointing algorithms for distributed systems unsuitable for mobile computing systems.

Coordinated checkpointing is an attractive approach for transparently adding fault tolerance to distributed applications, since it avoids the domino effect [10] and minimizes the stable storage requirement. In this approach, the state of each process in the system is periodically saved on the stable storage; the saved state is called a checkpoint of the process. To recover from a failure, the system restarts its execution from a previous consistent global checkpoint saved on the stable storage. A system state is consistent if it contains no orphan message, i.e., a message whose receive event is recorded in the state of the destination process but whose send event is lost [10, 16]. In order to record a consistent global checkpoint, processes must synchronize their checkpointing activities: when a process takes a checkpoint, it asks (by sending checkpoint requests to) all relevant processes to take checkpoints as well. Therefore, coordinated checkpointing suffers from high overhead during the checkpointing process.

Much of the previous work [5, 9, 10] in coordinated checkpointing has focused on minimizing the number of synchronization messages and the number of checkpoints taken during the checkpointing process. However, these algorithms (called blocking algorithms) force all relevant processes in the system to block their computations during checkpointing. Checkpointing includes the time to trace the dependency tree and to save the states of processes on the stable storage, which may be long. Moreover, in mobile computing systems, due to the mobility of MHs, a message may be routed several times before reaching its destination. Therefore, blocking algorithms may dramatically reduce the performance of these systems [6]. Recently, non-blocking algorithms [6, 15] have received considerable attention. In these algorithms, processes need not block during checkpointing because a checkpoint sequence number is used to identify orphan messages. However, these algorithms [6, 15] assume that a distinguished initiator decides when to take a checkpoint; they therefore suffer from the disadvantages of centralized algorithms, such as poor reliability and a single bottleneck. Moreover, these algorithms [6, 15] require all processes in the system to take checkpoints during checkpointing, even though many of these checkpoints may not be necessary. If they are modified to permit more processes to initiate checkpointing, which would make them truly distributed, the resulting algorithms suffer from another problem: to keep the checkpoint sequence number up to date, a process that takes a checkpoint has to notify all processes in the system. If every process could initiate a checkpointing, the network would be flooded with control messages and processes might waste their time taking unnecessary checkpoints.

The Prakash-Singhal algorithm [14] was the first to combine these two approaches: it forces only a minimum number of processes to take checkpoints and does not block the underlying computation during checkpointing. However, we showed that their algorithm may result in an inconsistency [3]. In this paper, we prove a more general result about coordinated checkpointing: there does not exist a non-blocking algorithm that forces only a minimum number of processes to take their checkpoints. Based on this proof, we propose an efficient checkpointing algorithm for mobile computing systems that forces only a minimum number of processes to take checkpoints and dramatically reduces the blocking time during checkpointing.

The rest of the paper is organized as follows. Section 2 presents preliminaries. In Section 3, we prove that there does not exist a non-blocking algorithm that forces only a minimum number of processes to take their checkpoints. Section 4 presents an efficient checkpointing algorithm for mobile computing systems. Performance analysis of the algorithm is provided in Section 5. Section 6 concludes the paper.
2 Preliminaries

2.1 Computation Model

A mobile computing system consists of a large number of mobile hosts (MHs) [1] and relatively fewer static hosts called mobile support stations (MSSs). The number of MSSs is denoted by Nmss and that of MHs by Nmh, with Nmh >> Nmss. The MSSs are connected by a static wired network, which provides reliable FIFO delivery of messages. A cell is the logical or geographical coverage area under an MSS. An MH can directly communicate with an MSS over a reliable FIFO wireless channel only if it is present in the cell supported by that MSS.

The distributed computation we consider consists of N sequential processes, denoted P1, P2, ..., PN, running concurrently on fail-stop MHs or MSSs. The processes do not share a common memory or a common clock; message passing is the only way for processes to communicate with each other. The computation is asynchronous: each process runs at its own speed and messages are exchanged through reliable communication channels, whose transmission delays are finite but arbitrary. Each checkpoint taken by a process is assigned a unique sequence number. The ith (i >= 0) checkpoint of process Pp is assigned sequence number i and is denoted by Cp,i. The ith checkpoint interval [12] of process Pp denotes all the computation performed between its ith and (i+1)th checkpoints, including the ith checkpoint but not the (i+1)th checkpoint.

2.2 New Issues in Mobile Computing

There are some new issues in mobile computing systems that complicate the design of checkpointing algorithms.

Locating MHs: When a mobile host MH1 wants to send a message m to another mobile host, say MH2, MH1 sends m to its local MSS, say MSSp, via a wireless link. MSSp forwards m to MH2's local MSS, say MSSq, over the static network. Finally, MSSq forwards m to MH2 via the wireless link. Since MHs move from one cell to another, the location of an MH is not fixed. As a result, MSSp needs to first locate MH2 before it can forward m to MSSq. The cost to locate an MH is referred to as the search cost [11]. Even though many routing protocols [2, 8] have been proposed to reduce the search cost, it can still be significant. Therefore, a checkpointing algorithm should try to avoid or reduce the search cost.

Energy and bandwidth requirements: The battery of an MH has a limited life. To save energy, an MH powers down individual components during periods of low activity [7]; this strategy is referred to as doze mode operation. An MH in doze mode is woken up on receiving a message. Therefore, energy conservation and low-bandwidth constraints require a checkpointing algorithm to minimize the number of synchronization messages.

Lack of stable storage: Due to the vulnerability of mobile computers to catastrophic failures, e.g., loss, theft, or physical damage, the disk storage on an MH cannot be considered stable [1]. Therefore, we utilize the stable storage at the MSSs to store the checkpoints of MHs. To take a checkpoint, an MH then has to transfer a large amount of data to its local MSS over the wireless link. Since the wireless network has low bandwidth and MHs have relatively low computation power, a checkpointing algorithm should force only a minimum number of processes to take checkpoints.
2.3 Non-blocking Algorithms

Most of the existing coordinated checkpointing algorithms [5, 10] rely on a two-phase protocol and save two kinds of checkpoints on the stable storage: tentative and permanent. In the first phase, the initiator takes a tentative checkpoint and forces all relevant processes to take tentative checkpoints. Each process informs the initiator whether it succeeded in taking a tentative checkpoint; a process may refuse to take a checkpoint depending on its underlying computation. After the initiator has received replies from all relevant processes, the algorithm enters the second phase. If the initiator learns that all processes have successfully taken tentative checkpoints, it asks them to make their tentative checkpoints permanent; otherwise, it asks them to discard their tentative checkpoints. A process, on receiving the message from the initiator, acts accordingly. Note that after a process takes a tentative checkpoint in the first phase, it remains blocked until it receives the decision from the initiator in the second phase.

A non-blocking checkpointing algorithm does not require any process to suspend its underlying computation. When processes do not suspend their computations, it is possible for a process to receive a computation message from another process that is already running in a new checkpoint interval. If this situation is not handled properly, it may result in an inconsistency. For example, in Figure 1, P2 initiates a checkpointing. After sending checkpoint requests to P1 and P3, P2 continues its computation. P1 receives the checkpoint request and takes a new checkpoint; then it sends m1 to P3. Suppose P3 processes m1 before it receives the checkpoint request from P2. When P3 receives the checkpoint request from P2, it takes a checkpoint (see Figure 1). In this case, m1 becomes an orphan.

Figure 1: Inconsistent checkpoints.

Most non-blocking algorithms [6, 15] use a checkpoint sequence number (csn) to avoid such inconsistencies. In these algorithms, a process is forced to take a checkpoint if it receives a computation message whose csn is greater than its local csn. For example, in Figure 1, P1 increases its csn after it takes a checkpoint and appends the new csn to m1. When P3 receives m1, it takes a checkpoint before processing m1, since the csn appended to m1 is larger than its local csn.

This scheme works only when every process in the computation receives every checkpoint request and can therefore keep its own csn up to date. Since the Prakash-Singhal algorithm [14] forces only a subset of the processes to take checkpoints, the csn of some processes may be out of date and may fail to prevent inconsistencies. The Prakash-Singhal algorithm attempts to solve this problem by having each process maintain an array of csns, where csn_i[i] is the expected csn of Pi. Note that Pi's csn_i[i] may differ from Pj's csn_j[i] if there has been no communication between Pi and Pj for several checkpoint intervals. Using the csn and the initiator identification number, they claim that their non-blocking algorithm avoids inconsistencies and minimizes the number of checkpoints taken during checkpointing. However, we showed that their algorithm may result in an inconsistency [3]. Next, based on a new concept called "z-dependency", we prove a more general result: there does not exist a non-blocking algorithm that forces only a minimum number of processes to take their checkpoints.

3 Proof of Impossibility

Definition 1 If Pp sends a message to Pq during its ith checkpoint interval and Pq receives the message during its jth checkpoint interval, then Pq z-depends on Pp during Pp's ith checkpoint interval and Pq's jth checkpoint interval, denoted as $P_p \rightarrow_{i}^{j} P_q$ (the subscript is the sender's checkpoint interval and the superscript is the receiver's).

Definition 2 If $P_p \rightarrow_{i}^{j} P_q$ and $P_q \rightarrow_{j}^{k} P_r$, then Pr transitively z-depends on Pp during Pr's kth checkpoint interval and Pp's ith checkpoint interval, denoted as $P_p \rightsquigarrow_{i}^{k} P_r$ (we simply say "Pr transitively z-depends on Pp" if there is no confusion).

Proposition 1
$P_p \rightarrow_{i}^{j} P_q \Longrightarrow P_p \rightsquigarrow_{i}^{j} P_q$
$P_p \rightsquigarrow_{i}^{j} P_q \wedge P_q \rightsquigarrow_{j}^{k} P_r \Longrightarrow P_p \rightsquigarrow_{i}^{k} P_r$

The definition of z-dependence is different from the concept of "causal dependency" used in the literature. We illustrate the difference using Figure 2. Since P2 sends m1 before it receives m2, there is no causal dependency between P1 and P3 due to these messages. However, these messages do establish a z-dependency between P3 and P1:
$P_3 \rightarrow_{k-1}^{j-1} P_2 \wedge P_2 \rightarrow_{j-1}^{i-1} P_1 \Longrightarrow P_3 \rightsquigarrow_{k-1}^{i-1} P_1$.

Figure 2: The difference between causal dependency and z-dependence (checkpoints C1,i-1 and C1,i of P1, C2,j-1 and C2,j of P2, C3,k-1 and C3,k of P3; messages m1 and m2; the checkpoints shown are consistent).

Definition 3 A min-process checkpointing algorithm is an algorithm satisfying the following condition: when a process Pp initiates a new checkpointing and takes a checkpoint Cp,i, a process Pq takes a checkpoint Cq,j associated with Cp,i if and only if $P_q \rightsquigarrow_{j-1}^{i-1} P_p$.
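Before moving on to the impossibility proof, a minimal sketch of the csn mechanism described in Section 2.3 may help make the forced-checkpoint rule concrete. The following Python sketch is ours; the class and method names are illustrative assumptions, not an interface from [6] or [15], and only the rule itself (take a checkpoint before processing a message carrying a larger csn) comes from the text above.

```python
class Process:
    """Sketch of the csn-based rule from Section 2.3 (names and structure assumed)."""

    def __init__(self, pid):
        self.pid = pid
        self.csn = 0              # local checkpoint sequence number

    def take_checkpoint(self):
        # Taking a checkpoint starts a new checkpoint interval.
        self.csn += 1
        # ... save local state on stable storage (omitted) ...

    def send(self, payload):
        # Every computation message piggybacks the sender's current csn.
        return (self.csn, payload)

    def receive(self, tagged_msg):
        sender_csn, payload = tagged_msg
        if sender_csn > self.csn:
            # The sender is already in a later checkpoint interval: take a forced
            # checkpoint first, so that processing the message cannot create an orphan.
            self.take_checkpoint()
        self.process(payload)

    def process(self, payload):
        pass                      # the underlying computation
```

For example, in Figure 1, P1's take_checkpoint() raises its csn to 1 before m1 is sent, so P3's receive() forces a checkpoint before m1 is processed, and m1 does not become an orphan.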
In coordinated checkpointing, to keep consistency, the initiator forces all dependent processes to take checkpoints, and each process that takes a checkpoint recursively forces its dependent processes to take checkpoints. The Koo-Toueg algorithm [10] uses this scheme, and it has been proved [10] that this algorithm forces only a minimum number of processes to take checkpoints. In the following, we prove that the Koo-Toueg algorithm is a min-process algorithm and that a min-process algorithm forces only a minimum number of processes to take checkpoints. To simplify the proof, we use $P_p \vdash_{j}^{i} P_q$ to represent the following: Pq causally depends on Pp when Pq is in its ith checkpoint interval and Pp is in its jth checkpoint interval.

Proposition 2
$P_p \vdash_{i}^{j} P_q \Longrightarrow P_p \rightarrow_{i}^{j} P_q$
$P_p \rightarrow_{i}^{j} P_q \Longrightarrow P_p \vdash_{i}^{j} P_q$

Lemma 1 An algorithm forces only a minimum number of processes to take checkpoints if and only if it is a min-process algorithm.

Proof. It has been proved [10] that the Koo-Toueg algorithm forces only a minimum number of processes to take checkpoints, so we only need to prove the following: in [10], when a process Pp initiates a new checkpointing and takes a checkpoint Cp,i, a process Pq takes a checkpoint Cq,j associated with Cp,i if and only if $P_q \rightsquigarrow_{j-1}^{i-1} P_p$.

Necessity: In [10], when a process Pp initiates a new checkpoint Cp,i, it recursively asks all dependent processes to take checkpoints; for example, Pp asks Pkm to take a checkpoint, Pkm asks Pkm-1 to take a checkpoint, and so on. If a process Pq takes a checkpoint Cq,j associated with Cp,i, then either Pp asked Pq directly, in which case
$P_q \vdash_{j-1}^{i-1} P_p \Longrightarrow P_q \rightarrow_{j-1}^{i-1} P_p \Longrightarrow P_q \rightsquigarrow_{j-1}^{i-1} P_p$,
or there must be a sequence
$P_q \vdash_{j-1}^{s_{k_1}} P_{k_1} \wedge P_{k_1} \vdash_{s_{k_1}}^{s_{k_2}} P_{k_2} \wedge \cdots \wedge P_{k_{m-1}} \vdash_{s_{k_{m-1}}}^{s_{k_m}} P_{k_m} \wedge P_{k_m} \vdash_{s_{k_m}}^{i-1} P_p \quad (1 \le m \le N)$
$\Longrightarrow P_q \rightarrow_{j-1}^{s_{k_1}} P_{k_1} \wedge P_{k_1} \rightarrow_{s_{k_1}}^{s_{k_2}} P_{k_2} \wedge \cdots \wedge P_{k_{m-1}} \rightarrow_{s_{k_{m-1}}}^{s_{k_m}} P_{k_m} \wedge P_{k_m} \rightarrow_{s_{k_m}}^{i-1} P_p$
$\Longrightarrow P_q \rightsquigarrow_{j-1}^{i-1} P_p$.

Sufficiency: If $P_q \vdash_{j-1}^{i-1} P_p$, then when Pp initiates a new checkpoint Cp,i, it asks Pq directly and Pq takes a checkpoint Cq,j associated with Cp,i. Otherwise, if $P_q \rightsquigarrow_{j-1}^{i-1} P_p$, there must be a sequence
$P_q \rightarrow_{j-1}^{s_{k_1}} P_{k_1} \wedge P_{k_1} \rightarrow_{s_{k_1}}^{s_{k_2}} P_{k_2} \wedge \cdots \wedge P_{k_{m-1}} \rightarrow_{s_{k_{m-1}}}^{s_{k_m}} P_{k_m} \wedge P_{k_m} \rightarrow_{s_{k_m}}^{i-1} P_p \quad (1 \le m \le N)$
$\Longrightarrow P_q \vdash_{j-1}^{s_{k_1}} P_{k_1} \wedge P_{k_1} \vdash_{s_{k_1}}^{s_{k_2}} P_{k_2} \wedge \cdots \wedge P_{k_{m-1}} \vdash_{s_{k_{m-1}}}^{s_{k_m}} P_{k_m} \wedge P_{k_m} \vdash_{s_{k_m}}^{i-1} P_p$.
Then, when Pp initiates the new checkpoint Cp,i, Pp asks Pkm to take a checkpoint, Pkm asks Pkm-1 to take a checkpoint, and so on; in the end, Pk1 asks Pq to take a checkpoint. Thus, Pq takes a checkpoint Cq,j associated with Cp,i.

Definition 4 A min-process non-blocking algorithm is a min-process checkpointing algorithm which does not block the underlying computation during checkpointing.

Lemma 2 In a min-process non-blocking algorithm, assume Pp initiates a new checkpointing and takes a checkpoint Cp,i. If a process Pr sends a message m to Pq after it takes a new checkpoint associated with Cp,i, then Pq takes a checkpoint Cq,j before processing m if and only if $P_q \rightsquigarrow_{j-1}^{i-1} P_p$.

Proof. According to the definition of a min-process algorithm, Pq takes a checkpoint Cq,j if and only if $P_q \rightsquigarrow_{j-1}^{i-1} P_p$. Thus, we only need to show that Pq must take Cq,j before processing m. It is easy to see that if Pq takes Cq,j after processing m, then m becomes an orphan (as in Figure 1).

From Lemma 2, in a min-process non-blocking algorithm, when a process receives a message m, it must know whether the initiator of a new checkpointing transitively z-depends on it during its previous checkpoint interval.

Lemma 3 In a min-process non-blocking algorithm, there is not enough information at the receiver of a message to decide whether the initiator of a new checkpointing transitively z-depends on the receiver.

Proof. The proof is by construction of a counterexample. In Figure 3, assume first that messages m6 and m7 do not exist. P1 initiates a checkpointing. When P4 receives m4, there is a z-dependency as follows:
$P_2 \rightsquigarrow_{0}^{0} P_4 \wedge P_4 \rightsquigarrow_{0}^{0} P_1 \Longrightarrow P_2 \rightsquigarrow_{0}^{0} P_1$.
However, P2 does not know this when it receives m5. There are two possible approaches for P2 to obtain the z-dependency information.

Approach 1: Tracing the incoming messages. In this approach, P2 gets the new z-dependency information from P1. Then P1 has to know the z-dependency information before it sends m5 and must append that information to m5. In Figure 3, P1 cannot get the new z-dependency information ($P_2 \rightsquigarrow_{0}^{0} P_1$) unless P4 notifies P1 of the new z-dependency when P4 receives m4. There are two ways for P4 to notify P1: the first is to broadcast the z-dependency information (not illustrated in the figure); the other is to send the z-dependency information in an extra message m6 to P3, which in turn sends it to P1 in m7. Both dramatically increase the message overhead. Moreover, since the algorithm does not block the underlying computation, it is possible that P1 receives m7 after it has sent out m5 (as shown in the figure); then P2 still cannot get the z-dependency information when it receives m5.

Approach 2: Tracing the outgoing messages. In this approach, since P2 sends message m3 to P5, P2 hopes to get the new z-dependency information from P5. Then P5 has to know the new z-dependency information and send an extra message (not shown in the figure) to notify P2. Similarly, P5 needs to get the new z-dependency information from P4, which comes from P3, and finally from P1. Certainly, this requires many more extra messages than Approach 1, and, as in Approach 1, P2 still cannot get the z-dependency information in time, since the computation is in progress.
Figure 3: Tracing the dependency (processes P1-P5 with initial checkpoints Ci,0, checkpoint C1,1 of P1, messages m1-m5, and extra messages m6 and m7).

Theorem 1 No min-process non-blocking algorithm exists.

Proof. From Lemma 2, in a min-process non-blocking algorithm, a receiver has to know whether the initiator of a new checkpointing transitively z-depends on it, which is impossible by Lemma 3. Therefore, no min-process non-blocking algorithm exists.

Corollary 1 No non-blocking algorithm forces only a minimum number of processes to take checkpoints.

Proof. The proof follows directly from Lemma 1 and Theorem 1.

Remarks: Netzer and Xu [12] introduced the concept of "zigzag" paths to define the necessary and sufficient condition for a set of local checkpoints to lie on a consistent global checkpoint. Our definition of z-dependence captures the essence of zigzag paths: if an initiator forces all processes on which it transitively z-depends to take checkpoints, the resulting checkpoints are consistent and no zigzag path exists among them; conversely, if the resulting checkpoints are consistent, then there is no zigzag path among them, and all processes on which the initiator transitively z-depends have taken checkpoints. However, there is a distinctive difference between a zigzag path and z-dependence. A zigzag path is used to evaluate whether existing checkpoints are consistent; thus, it is mainly used to find a consistent recovery line in uncoordinated checkpointing, and it has almost no use in coordinated checkpointing, where a consistent recovery line is guaranteed by the synchronization messages. Z-dependence, in contrast, is proposed for coordinated checkpointing and reflects the whole synchronization process of coordinated checkpointing; for example, in the proof of Lemma 1, z-dependence is used to model the checkpointing process. Based on z-dependence, we found and proved the impossibility result; it is impossible to prove this result based only on zigzag paths.

4 A Min-Process Checkpointing Algorithm

From Theorem 1, no min-process non-blocking algorithm exists. There are therefore two directions in designing efficient coordinated checkpointing algorithms: the first is to relax the non-blocking condition while keeping the min-process property; the other is to relax the min-process condition while keeping the non-blocking property. The constraints of mobile computing systems, such as the low bandwidth of wireless channels, the high search cost, and limited battery life, suggest that the checkpointing algorithm should be a min-process algorithm. Therefore, we develop an algorithm that relaxes the non-blocking condition: it is a min-process algorithm, but it minimizes the blocking time.

4.1 Handling Node Mobility

Changes in the location of an MH complicate the routing of messages. Messages sent by an MH to another MH may have to be rerouted because the destination MH has disconnected from its old MSS and is now connected to a new MSS. Many routing protocols for handling MH mobility at the network layer have been proposed [8].

An MH may be disconnected from the network for an arbitrary period of time. At the application level, the checkpointing algorithm may generate a request for the disconnected MH to take a checkpoint. Delaying the response to such a request until the MH reconnects to some MSS may significantly increase the completion time of the checkpointing algorithm. Thus, we propose the following solution for disconnections. We observe that only local events can take place at an MH during the disconnection interval; no message send or receive event occurs during this interval. Hence, no new dependencies with respect to other processes are created during this interval, and the dependency relation of the MH with the rest of the system, as reflected by its local checkpoint, is the same no matter when the local checkpoint is taken during the disconnection interval. Suppose a mobile host MHi wants to disconnect from its local MSSp. MHi takes a local checkpoint and transfers it to MSSp as disconnect_checkpoint_i. If MHi is asked to take a checkpoint during the disconnection interval, MSSp converts disconnect_checkpoint_i into MHi's new checkpoint and uses the message dependency information of MHi to propagate the checkpoint request. MHi also sends a disconnect(sn) message to MSSp on the MH-to-MSS channel, supplying the sequence number sn of the last message received on the MSS-to-MH channel. On receipt of MHi's disconnect(sn), MSSp knows the last message that MHi has received and buffers all computation messages received until the end of the disconnection interval.

Later, MHi may reconnect at an MSS, say MSSq. If MHi knows the identity of its last MSS, say MSSp, it sends a reconnect(MHi, MSSq) message to MSSp through MSSq. If MHi has lost the identity of its last MSS for some reason, its reconnect request is broadcast over the network. On receiving the reconnect request, MSSp transfers all the support information (the checkpoint, dependency vector, buffered messages, etc.) of MHi to MSSq and clears all information related to the disconnection. MSSq then forwards the buffered messages to MHi. With this, the reconnect routine terminates and the relocated mobile host MHi resumes normal communication with the other MHs and MSSs in the system.
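The following Python sketch summarizes the MSS-side bookkeeping implied by the disconnection handling above. The class, method, and field names are our own illustrative assumptions; only the behavior (storing disconnect_checkpoint_i, buffering messages after disconnect(sn), converting the stored checkpoint on a checkpoint request, and transferring state on reconnect) follows the text.

```python
class MSSState:
    """Minimal sketch of per-MSS disconnection handling (Section 4.1); names assumed."""

    def __init__(self):
        self.disconnect_ckpt = {}   # mh_id -> disconnect_checkpoint_i
        self.buffered = {}          # mh_id -> messages received while the MH is away
        self.dependency = {}        # mh_id -> dependency vector R_i kept at the MSS
        self.last_sn = {}           # mh_id -> sn of the last message the MH received

    def on_disconnect(self, mh_id, checkpoint, sn):
        # MH_i hands over a checkpoint and disconnect(sn); later messages are buffered.
        self.disconnect_ckpt[mh_id] = checkpoint
        self.buffered[mh_id] = []
        self.last_sn[mh_id] = sn

    def on_app_message(self, mh_id, msg):
        # Computation messages for a disconnected MH are buffered, not forwarded.
        if mh_id in self.buffered:
            self.buffered[mh_id].append(msg)
        else:
            self.forward_to_mh(mh_id, msg)

    def on_checkpoint_request(self, mh_id):
        # During the disconnection interval, the stored disconnect checkpoint becomes
        # MH_i's new checkpoint; the dependency vector is used to propagate the request.
        if mh_id in self.disconnect_ckpt:
            return self.disconnect_ckpt[mh_id], self.dependency.get(mh_id)
        return None   # MH_i is connected: forward the request over the wireless link

    def on_reconnect(self, mh_id, new_mss):
        # Transfer the support information to MSS_q and clear the local state.
        state = (self.disconnect_ckpt.pop(mh_id, None),
                 self.dependency.pop(mh_id, None),
                 self.buffered.pop(mh_id, []))
        new_mss.receive_support_info(mh_id, state)

    def receive_support_info(self, mh_id, state):
        ckpt, dep, msgs = state
        if dep is not None:
            self.dependency[mh_id] = dep
        for m in msgs:
            self.forward_to_mh(mh_id, m)   # deliver the buffered messages

    def forward_to_mh(self, mh_id, msg):
        pass   # placeholder for the wireless send
```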
4.2 The Checkpointing Algorithm

Data structures at all MSSs: In mobile computing systems, all communications to and from an MH pass through its local MSS. Therefore, when an MSS receives an application message to be forwarded to a local MH, it first updates the dependency information that it maintains for the MH and then forwards the message to the MH. The dependency information for process Pi is recorded in a boolean vector Ri of N bits, one per process; Ri[j] = 1 indicates that Pi depends on Pj. For every Pi, Ri is initialized to all zeros except Ri[i], which is initialized to 1. When a process Pi running on an MH, say MHp, receives a message from a process Pj, MHp's local MSS sets Ri[j] to 1.

First phase of the algorithm: When a process running on an MH initiates a checkpointing, it sends a checkpoint request to its local MSS, which becomes the proxy MSS (if the initiator runs on an MSS, that MSS is the proxy MSS). The proxy MSS sends R_request messages to all MSSs in the system (denoted by Smss) to ask for dependency vectors. In response, each MSS returns the dependency vectors it maintains for the processes running on the MHs in its cell. Having received all the dependency vectors, the proxy MSS constructs an N x N dependency matrix D, with one row per process given by that process's dependency vector. Based on D, the proxy MSS can locally calculate all the processes on which the initiator transitively depends. This is essentially finding the transitive closure of the initiator in the dependency graph constructed from the dependency vectors, which can be computed by boolean matrix multiplication [4]. After the proxy MSS has found all the processes that need to take checkpoints, it adds them to the set Sforced and broadcasts Sforced to all MSSs, which are waiting for the result. When an MSS receives Sforced, it checks whether any processes in Sforced are in its cell; if so, it sends checkpoint request messages to them. A process receiving a checkpoint request takes a checkpoint and sends a response to its local MSS. After an MSS has received responses from all processes to which it sent checkpoint request messages, it sends a response to the proxy MSS. The following is a formal description of the first phase.

Algorithm executed at the proxy MSS:
    for all i such that MSSi ∈ Smss, send an R_request message to MSSi;
    upon receiving the R vectors from every MSSi:
        construct the dependency matrix D;
        Calculate(D);
        send Sforced to all MSSi ∈ Smss;

Algorithm executed at an MSS, say MSSk:
    upon receiving R_request from the proxy MSS:
        for all i such that Location(Pi) ∈ Cellk, send Ri to the proxy MSS;
    upon receiving Sforced from the proxy MSS:
        for all i such that Location(Pi) ∈ Cellk ∧ Pi ∈ Sforced, send a checkpoint request to Pi;
        continue its computation;
    upon receiving response messages from all processes to which it sent checkpoint requests:
        send a response to the proxy MSS;

Algorithm executed at any process Pi:
    upon receiving a checkpoint request from MSSj:
        take a checkpoint;
        send a response to MSSj;

Calculate(D : N x N)
    /* Di denotes the dependency vector of process Pi. Assume Pj is the initiator. */
    A := Dj; Dj := Dj × D;
    while A ≠ Dj do { A := Dj; Dj := Dj × D; }
    Sforced := {};
    for all i such that Dj[i] = 1, Sforced := Sforced ∪ {Pi};
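As a concrete illustration of Calculate, the following small runnable Python sketch performs the boolean closure computation, using the dependency matrix of the example in Figure 4. The function name and the list-of-0/1 representation are our own choices; the iteration itself mirrors the repeated boolean multiplication Dj := Dj × D above.

```python
def calculate(D, j):
    """Return the indices of processes on which initiator P_j transitively depends.

    D is an N x N boolean dependency matrix: D[i][k] == 1 means P_i depends on P_k.
    Repeatedly multiply the initiator's row by D (boolean OR/AND) until it is stable.
    """
    n = len(D)
    dj = list(D[j])                       # initiator's dependency vector
    while True:
        # boolean row-vector times matrix: new[k] = OR_i (dj[i] AND D[i][k])
        new = [1 if any(dj[i] and D[i][k] for i in range(n)) else 0 for k in range(n)]
        if new == dj:
            break                         # fixed point reached
        dj = new
    return {i for i in range(n) if dj[i] == 1}   # the set S_forced, as indices

# Dependency matrix from the example in Figure 4 (P1..P5 mapped to indices 0..4):
D = [[1, 0, 1, 0, 0],   # R_1: P1 depends on itself and P3
     [1, 1, 0, 0, 0],   # R_2: P2 depends on P1 and itself
     [0, 1, 1, 0, 0],   # R_3: P3 depends on P2 and itself
     [0, 0, 0, 1, 0],   # R_4
     [0, 0, 0, 1, 1]]   # R_5

print(calculate(D, 0))   # {0, 1, 2}: P1, P2, and P3 must take checkpoints
```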
Second phase of the algorithm: After the proxy MSS has received a response from every MSS, the algorithm enters the second phase. If the proxy MSS learns that all processes have successfully taken tentative checkpoints, it asks them to make their tentative checkpoints permanent; otherwise, it asks them to discard their tentative checkpoints. A process, on receiving the message from the proxy MSS, acts accordingly (techniques to reduce the number of discarded checkpoints can be found in [13]).

An example: In Figure 4, Di denotes the dependency vector of process Pi. When P1 initiates a checkpointing, the proxy MSS constructs the dependency matrix D and calculates D1 × D = (1 1 1 0 0). Since (1 1 1 0 0) × D = (1 1 1 0 0), the procedure Calculate returns Sforced = {P1, P2, P3}. Thus, P1 asks P2 and P3 to take checkpoints.
Figure 4: Checkpointing and dependency information. (D1 = (1 0 1 0 0); D has rows (1 0 1 0 0), (1 1 0 0 0), (0 1 1 0 0), (0 0 0 1 0), (0 0 0 1 1); D1 × D = (1 1 1 0 0) and (1 1 1 0 0) × D = (1 1 1 0 0).)
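The two-phase structure described above (tentative checkpoints in the first phase, a commit-or-discard decision in the second) can be summarized by the following small Python sketch of the proxy MSS's decision step. The function and message names are illustrative assumptions, not part of the paper's notation.

```python
def proxy_second_phase(mss_responses, broadcast):
    """mss_responses: iterable of booleans, one per MSS, each True iff every forced
    process in that MSS's cell succeeded in taking its tentative checkpoint."""
    if all(mss_responses):
        broadcast("make_permanent")   # all tentative checkpoints become permanent
    else:
        broadcast("discard")          # the checkpointing attempt is aborted

# Example: two MSSs report success, one reports failure -> the attempt is discarded.
proxy_second_phase([True, True, False], broadcast=print)
```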
4.3 Proof of Correctness

Lemma 4 A process takes a checkpoint if and only if the initiator transitively depends on it.

Proof. In the proposed algorithm, the proxy MSS uses the procedure Calculate to find the transitive closure of the initiator. During the execution of Calculate, no new dependency relation is formed, since the MSSs are blocked. Therefore, a process Pi belongs to Sforced if and only if the initiator transitively depends on Pi. Since an MSS only sends a checkpoint request to a process in Sforced, a process Pi takes a checkpoint only if the initiator transitively depends on Pi. Thus, we only need to show that a process receives a checkpoint request and takes a checkpoint if the initiator transitively depends on it. If Pi is running on an MSS, say MSSp, a checkpoint request is sent to Pi when Sforced reaches MSSp. If Pi is running on an MH, say MHj, which is in MSSp's cell, then there are three possibilities when Sforced reaches MSSp:
Case 1: MHj is still connected to MSSp. The request is forwarded to MHj and then to Pi.
Case 2: MHj is disconnected from the network. MSSp takes a checkpoint on behalf of Pi by converting disconnect_checkpoint_i into Pi's new checkpoint.
Case 3: MHj has moved to MSSq (handoff). MSSp forwards the request to MSSq, which forwards it to MHj and then to Pi over the underlying network, as explained in Section 4.1.
Thus, if the initiator transitively depends on Pi, Pi receives a checkpoint request and takes a checkpoint.

Theorem 2 The algorithm creates a consistent global checkpoint.

Proof. The proof is by contradiction. Assume there is a pair of processes Pp and Pq such that at least one message m has been sent from Pq after Pq's last checkpoint Cq,j and has been received by Pp before Pp's last checkpoint Cp,i. We also assume that Cp,i is associated with the initiator Pr's checkpoint Cr,k. Then, based on Lemma 4 and Proposition 1:
Pp takes a checkpoint $\Longrightarrow P_p \rightsquigarrow_{i-1}^{k-1} P_r$.
Pp receives m from Pq $\Longrightarrow P_q \rightarrow_{j}^{i-1} P_p$.
$P_q \rightarrow_{j}^{i-1} P_p \wedge P_p \rightsquigarrow_{i-1}^{k-1} P_r \Longrightarrow P_q \rightsquigarrow_{j}^{k-1} P_r$.
$P_q \rightsquigarrow_{j}^{k-1} P_r \Longrightarrow$ Pq takes a checkpoint.
Thus, the sending of m is recorded at Pq. A contradiction.

5 Performance Analysis

The performance of a checkpointing algorithm is determined by three parameters: the blocking time (in the worst case), the synchronization message overhead (in the worst case), and the number of checkpoints required during checkpointing. Since Nmh >> Nmss, to simplify the analysis we assume that all processes run on MHs and that only one process runs on each MH.

Notations:
Cstatic: cost of sending a message between any two MSSs.
Cwireless: cost of sending a message from an MH to its local MSS (or vice versa).
Cbroadcast: cost of broadcasting a message in the static network.
Csearch: cost incurred to locate an MH and forward a message to its current local MSS from a source MSS.
Tstatic: average message delay in the static network.
Twireless: average message delay in the wireless network.
Tcheckpoint: average delay to save a checkpoint on the stable storage, including the time to transfer the checkpoint from an MH to its local MSS.
Tsearch: average delay incurred to locate an MH and forward a message to its current local MSS.

Performance of our algorithm:
The blocking time: After an MSS has sent all its local dependency vectors to the proxy MSS, it blocks (cannot forward messages) until it receives Sforced from the proxy MSS. Therefore, the blocking time is 2 Tstatic.
The synchronization message overhead: The message overhead includes the following. The request and reply messages between the initiator and its proxy MSS: 2 Cwireless. The proxy MSS broadcasts the R_request, Sforced, and make_permanent messages to all MSSs: 3 Cbroadcast. The MSSs send dependency vectors and response messages to the proxy MSS: 2 Nmss Cstatic. The MSSs send checkpoint request and make_permanent messages to the necessary MHs and receive response messages from them; in the worst case, this is 3 Nmh Cwireless. Therefore, the total message overhead in the worst case is 2 Cwireless + 3 Cbroadcast + 2 Nmss Cstatic + 3 Nmh Cwireless.
The number of checkpoints: As in the Koo-Toueg algorithm [10], our algorithm forces only a minimum number of processes to take checkpoints.

Comparison with other algorithms: Table 1 compares our algorithm with two representative approaches for coordinated checkpointing. The Koo-Toueg algorithm [10] has the lowest overhead (based on our three parameters) among the blocking algorithms [5, 9, 10], which try to minimize the number of synchronization messages and the number of checkpoints during checkpointing. The algorithm in [6] has the lowest overhead (based on our three parameters) among the non-blocking algorithms [6, 15]. We do not compare our algorithm with the Prakash-Singhal algorithm, since it may result in inconsistencies and there is no easy way to fix it without increasing its overhead.

Table 1: A comparison of performance (Nmh >> Nmss, Twireless >> Tstatic, Cwireless >> Cstatic, and MAX >= MIN most of the time).

Algorithm       | Blocking time                              | Messages                                                   | Checkpoints
Koo-Toueg [10]  | Nmh (4 Twireless + Tcheckpoint + Tsearch)  | Nmh (6 Cwireless + Csearch)                                | MIN
[6]             | 0                                          | Nmh (3 Cwireless + Csearch)                                | MAX
Our algorithm   | 2 Tstatic                                  | 2 Cwireless + 3 Cbroadcast + 2 Nmss Cstatic + 3 Nmh Cwireless | MIN

In general [8], the local MSS of the source MH is unaware of the current location of the target MH and has to "search" the network, i.e., query all MSSs, to discover the MSS that is local to the target MH. Then Csearch = Cbroadcast + 2 Cstatic and Tsearch = 3 Tstatic.

As shown in Table 1, compared to the Koo-Toueg algorithm, our algorithm avoids the search cost (which is significant) and dramatically reduces the blocking time from Nmh (4 Twireless + Tcheckpoint + Tsearch) to 2 Tstatic. Besides avoiding the search cost, our algorithm cuts the message overhead roughly in half compared to the Koo-Toueg algorithm. Compared to [6], our algorithm avoids the search cost and minimizes the number of checkpoints taken during checkpointing. Note that many applications may run in a system: some have higher reliability requirements than others, and different processes run at their own speeds, so some processes may need to take checkpoints more frequently than others. However, the algorithm in [6] forces all processes in the system to take checkpoints. Thus, our algorithm significantly reduces the message overhead and the checkpointing overhead compared to [6].
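For readers who want to plug numbers into Table 1, the following short Python sketch evaluates the worst-case blocking time and message overhead of the three algorithms using the expressions from Section 5. The specific parameter values at the bottom are arbitrary placeholders for illustration, not measurements from the paper.

```python
def overheads(N_mh, N_mss, C_static, C_wireless, C_broadcast,
              T_static, T_wireless, T_checkpoint):
    """Worst-case expressions from Table 1 and Section 5 (illustrative only)."""
    # Search cost and delay, assuming the source MSS must query all MSSs (Section 5):
    C_search = C_broadcast + 2 * C_static
    T_search = 3 * T_static

    return {
        "Koo-Toueg [10]": {
            "blocking": N_mh * (4 * T_wireless + T_checkpoint + T_search),
            "messages": N_mh * (6 * C_wireless + C_search),
        },
        "Elnozahy et al. [6]": {
            "blocking": 0,
            "messages": N_mh * (3 * C_wireless + C_search),
        },
        "Our algorithm": {
            "blocking": 2 * T_static,
            "messages": 2 * C_wireless + 3 * C_broadcast
                        + 2 * N_mss * C_static + 3 * N_mh * C_wireless,
        },
    }

# Placeholder parameter values (illustrative, not from the paper):
print(overheads(N_mh=100, N_mss=10, C_static=1, C_wireless=5,
                C_broadcast=10, T_static=1, T_wireless=5, T_checkpoint=50))
```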
6 Conclusions

The major contribution of this paper is not just an efficient checkpointing algorithm, but a more general result about coordinated checkpointing: there does not exist a non-blocking algorithm that forces only a minimum number of processes to take their checkpoints. This result suggests two directions for designing efficient coordinated checkpointing algorithms: the first is to relax the non-blocking condition while keeping the min-process property, and the other is to relax the min-process condition while keeping the non-blocking property. The constraints of mobile computing systems, such as the low bandwidth of wireless channels, the high search cost, and limited battery life, favor a min-process algorithm. Following this direction, we proposed a min-process algorithm which dramatically reduces the blocking time, from Nmh (4 Twireless + Tcheckpoint + Tsearch) in the Koo-Toueg algorithm to 2 Tstatic. In our algorithm, only the MSSs are blocked, and only for a duration of 2 Tstatic; more specifically, the MSSs cannot forward messages during these 2 Tstatic, but they can perform other computations and even receive messages.

References

[1] A. Acharya and B.R. Badrinath. "Checkpointing Distributed Applications on Mobile Computers". Proc. of the Third Intl. Conf. on Parallel and Distributed Information Systems, Sep. 1994.
[2] P. Bhagwat and C.E. Perkins. "A Mobile Networking System Based on Internet Protocol (IP)". USENIX Symp. on Mobile and Location Independent Computing, Aug. 1993.
[3] G. Cao and M. Singhal. "On Consistent Checkpointing in Distributed Systems". OSU Technical Report #OSU-CISRC-9/97-TR44, 1997.
[4] T. Cormen, C. Leiserson, and R. Rivest. "Introduction to Algorithms". MIT Press, 1990.
[5] Y. Deng and E.K. Park. "Checkpointing and Rollback-Recovery Algorithms in Distributed Systems". Journal of Systems and Software, pages 59-71, April 1994.
[6] E.N. Elnozahy, D.B. Johnson, and W. Zwaenepoel. "The Performance of Consistent Checkpointing". Proc. of the 11th Symp. on Reliable Distributed Systems, pages 86-95, Oct. 1992.
[7] G.H. Forman and J. Zahorjan. "The Challenges of Mobile Computing". IEEE Computer, pages 38-47, April 1994.
[8] J. Ioannidis, D. Duchamp, and G.Q. Maguire. "IP-based Protocols for Mobile Internetworking". Proc. of ACM SIGCOMM Symp. on Communication Architectures and Protocols, pages 235-245, Sep. 1991.
[9] J. Kim and T. Park. "An Efficient Protocol for Checkpointing Recovery in Distributed Systems". IEEE Trans. on Parallel and Distributed Systems, Aug. 1993.
[10] R. Koo and S. Toueg. "Checkpointing and Rollback-Recovery for Distributed Systems". IEEE Trans. on Software Engineering, pages 23-31, Jan. 1987.
[11] P. Krishna, N.H. Vaidya, and D.K. Pradhan. "Recovery in Distributed Mobile Environments". IEEE Workshop on Advances in Parallel and Distributed Systems, Oct. 1993.
[12] R. Netzer and J. Xu. "Necessary and Sufficient Conditions for Consistent Global Snapshots". IEEE Trans. on Parallel and Distributed Systems, Feb. 1995.
[13] R. Prakash and M. Singhal. "Maximal Global Snapshot with Concurrent Initiators". Proc. of the Sixth IEEE Symp. on Parallel and Distributed Processing, pages 344-351, Oct. 1994.
[14] R. Prakash and M. Singhal. "Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems". IEEE Trans. on Parallel and Distributed Systems, pages 1035-1048, Oct. 1996.
[15] L.M. Silva and J.G. Silva. "Global Checkpointing for Distributed Programs". Proc. of the 11th Symp. on Reliable Distributed Systems, pages 155-162, Oct. 1992.
[16] R.E. Strom and S.A. Yemini. "Optimistic Recovery in Distributed Systems". ACM Trans. on Computer Systems, pages 204-226, Aug. 1985.