Pervasive and Mobile Computing 4 (2008) 139–160 www.elsevier.com/locate/pmc
A fault tolerant mutual exclusion algorithm for mobile ad hoc networks

Weigang Wu a,b, Jiannong Cao a,∗, Jin Yang a

a Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
b State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
Received 6 February 2006; received in revised form 25 August 2007; accepted 27 August 2007 Available online 14 September 2007
Abstract

In this paper, we propose a permission-based, message-efficient mutual exclusion (MUTEX) algorithm for mobile ad hoc networks (MANETs). To reduce message cost, the algorithm uses the "look-ahead" technique, which enforces MUTEX only among the hosts currently competing for the critical section. We propose mechanisms to handle dozes and disconnections of mobile hosts. The assumption of FIFO channels in the original "look-ahead" technique is also relaxed. The proposed algorithm can also tolerate link and host failures, using timeout-based mechanisms. Both analytical and simulation results show that the proposed algorithm works well under various conditions, especially when mobility is high or the load level is low. To our knowledge, this is the first permission-based MUTEX algorithm for MANETs.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Distributed algorithm; Fault tolerance; MANET; Mobile computing; Mutual exclusion
∗ Corresponding author. Tel.: +852 27667275; fax: +852 27740842.

1. Introduction

Mutual exclusion (MUTEX) is one of the classical distributed coordination problems. It refers to ensuring that a group of hosts intermittently requesting to enter the Critical
E-mail addresses: [email protected] (W. Wu), [email protected] (J. Cao), [email protected] (J. Yang).
doi:10.1016/j.pmcj.2007.08.001
Section (CS) to gain exclusive access to the shared resource. Much work has been done on the MUTEX problem in distributed systems and many solutions have been presented [4,6,21]. These solutions can be categorized into two classes: token-based and permission-based. In token-based algorithms, a unique token is shared among the hosts. A host is allowed to enter the CS only if it possesses the token. In a permission-based algorithm, a host that wants to enter the CS must first obtain permission from the other hosts by exchanging messages.

Mobile computing, as a new distributed computing paradigm, has been a hot research topic in recent years. Compared with traditional wired networks, mobile networks have fundamentally different characteristics in the aspects of communication, mobility and resource constraints. These characteristics introduce new applications of distributed coordination algorithms, including MUTEX algorithms [4,5], in mobile networking environments. However, they also raise new challenges in the development of algorithms for solving distributed coordination problems [9,19]. For example, existing MUTEX algorithms for traditional distributed networks need to be adapted or even redesigned for use in the new system environments.

During the past several years, efforts have been made to solve the MUTEX problem in mobile computing systems, including both infrastructured mobile networks [22,2,15] and mobile ad hoc networks (MANETs) [4,3,24,13]. An infrastructured mobile network consists of two distinct sets of entities: a large number of mobile hosts (MHs) and relatively fewer, but more powerful, mobile support stations (MSSs). In a MANET, there is no MSS. Each MH in a MANET plays the same role, and communication between hosts is multi-hop. To achieve mutual exclusion in infrastructured networks, MSSs act on behalf of MHs; in MANETs there is no MSS, which makes the problem more difficult to solve.
All existing algorithms for MANETs are token-based algorithms, which maintain the token rotation with some kind of logical structure (topology), such as a tree or ring. Although token-based algorithms have some desirable features, such as that hosts only need to keep information about their neighbours and few messages are needed to pass the privilege to enter the CS, they suffer from the fatal problem of token loss. In MANETs, the mobility and frequent disconnections of MHs make token loss a more serious problem and the maintenance of a tree or ring topology more difficult.

In this paper, we consider another approach, the permission-based approach [17]. We propose the first permission-based MUTEX algorithm for MANETs. Compared with the token-based approach, permission-based algorithms have the following advantages: (1) there is no need to maintain a logical topology to pass the token, and (2) no message needs to be propagated if no host requests to enter the CS. These advantages make the permission-based approach well suited for MANETs, where all resources, e.g. the network bandwidth and the battery power of the MHs, are limited.

The main problem of the permission-based approach is the large number of messages exchanged between the MHs. Therefore, the main challenge in designing a permission-based algorithm for MANETs is to reduce the number of messages. A commonly used approach to reduce the message cost is to use quorums [12,6,1,11]. A host that wants to enter the CS sends requests only to the hosts in its quorum rather than to all hosts. However, constructing a quorum efficiently is non-trivial and costly, especially in MANETs. Moreover, quorum-based
algorithms are usually prone to deadlocks [21], and additional message cost is unavoidable to break deadlocks. In the proposed message-efficient algorithm, we use the "look-ahead" technique [22], which works as follows. Each host S_i maintains two sets. The Info set_i includes the IDs of the hosts which S_i needs to inform when it requests to enter the CS, and the Status set_i includes the IDs of the hosts which will inform S_i when they request to enter the CS. By adjusting the two sets according to well-defined rules during the execution, hosts can know whether others are concurrently requesting the CS and enforce MUTEX only among the hosts currently competing for the CS. This significantly reduces the number of messages exchanged among MHs.

However, the "look-ahead" technique was designed for infrastructured networks. Several issues need to be addressed in order to apply it in MANETs. First, the assumption of MSSs no longer holds: in MANETs, there is no MSS available, so the disconnections and dozes of MHs must be handled differently. To save power, MHs may enter the "doze" mode or even voluntarily disconnect from the system. In [22], when a host wants to disconnect from the network, it offloads the current values of its data structures to its serving MSS, which then acts on behalf of the host in the execution of the algorithm. In MANETs, however, mobile hosts have to deal with such challenges by themselves. Therefore, new mechanisms must be designed to adapt the original algorithm.

The second issue is fault tolerance. In a mobile environment, especially a MANET, link failures (e.g. signal shielding) and host failures (e.g. battery exhaustion) occur frequently. Link failures can lead to message loss, while host failures may result in accidental disconnections. In [22], failures of hosts and links are not considered. In our proposed algorithm, such failures are handled by exchanging messages and adjusting the Info set and the Status set.
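To make the bookkeeping concrete, the following sketch (our own illustration in Python, not the paper's pseudocode; the Host class and its methods are assumptions) shows the two complementary sets and the rule applied when a REQUEST is received:

```python
class Host:
    """Minimal model of one mobile host's look-ahead bookkeeping."""

    def __init__(self, hid, info, status):
        self.hid = hid
        self.info_set = set(info)      # hosts this host must inform (send REQUEST to)
        self.status_set = set(status)  # hosts that will inform this host

    def request_targets(self):
        # A requester contacts only its Info set, not all n - 1 hosts.
        return set(self.info_set)

    def on_request(self, sender):
        # Look-ahead rule: having heard a REQUEST from `sender`, this host
        # must inform `sender` of its own future requests, so `sender`
        # moves from the Status set to the Info set.
        self.status_set.discard(sender)
        self.info_set.add(sender)

# Complementary sets for two hosts: S1 must inform S2, not vice versa.
s1, s2 = Host(1, info={2}, status=set()), Host(2, info=set(), status={1})
assert s1.request_targets() == {2}
s2.on_request(1)             # S2 receives S1's REQUEST ...
assert s2.info_set == {1}    # ... and will now inform S1 in turn
```

The REPLY handling and the Q_req bookkeeping that complete these rules are described in Section 3.3.2.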
Third, the constraint of requiring FIFO channels in the original look-ahead technique [22] is relaxed in our proposed algorithm. In a traditional network, or even an infrastructured mobile network (where the support stations are fixed hosts), it is not difficult to guarantee the FIFO property by using mechanisms such as sequence numbers. In MANETs, however, this becomes more difficult because the overhead (e.g. buffer memory, message size, and the data structures needed to record the sequence numbers of other hosts) for a mobile host is very high. Furthermore, in a MANET, messages are forwarded by the mobile hosts themselves, rather than via fixed hosts as in an infrastructured network, so message ordering is more difficult to guarantee. In this paper, we use a very simple but efficient method to deal with non-FIFO channels.

Finally, we propose a distributed method to initialize the data structures Info set and Status set, which is absent in [22]. Our initialization method is very simple and efficient.

The rest of the paper is organized as follows. In Section 2 we present a brief overview of related work, describing existing MUTEX algorithms for MANETs and permission-based MUTEX algorithms. Section 3 first introduces the "look-ahead" technique and then describes the design of the proposed algorithm. We present the correctness proof of the proposed algorithm in Section 4. Analytical performance evaluation and the simulation study are presented in Sections 5 and 6, respectively. We discuss how to make the algorithm more robust in Section 7. Finally, Section 8 concludes the paper.
2. Related work

Several MUTEX algorithms for MANETs have been proposed in the literature. All of them make use of a token passed along a logical ring or a logical tree. Baldoni et al. [3] proposed a message-efficient MUTEX algorithm by combining the token-circulating and token-asking approaches. A host sends a request message to the coordinator host only when it wants to access the CS. Thus, the token need not be circulated if there is no request for the CS. Moreover, the logical ring, along which the token is circulated, is computed on-the-fly in order to reduce the hop cost.

A token-asking algorithm is proposed in [24], which is derived from Raymond's tree-based algorithm [16] with improvements to handle broken links caused by host mobility. Using a directed acyclic graph of token-oriented pointers, hosts maintain multiple paths leading to the token holder. Requests for the CS are forwarded to the token holder along a path in the graph. Host and link failures are handled by adapting the graph topology. In [25] a variation of the algorithm in [24] is presented to eliminate the overhead introduced by the process of searching for the requesting host: instead of the token holder searching for the requesting host, a host that detects the failure of an outgoing link resends its request.

Based on the Local-Recency (LR) token circulation mechanism proposed in [13], Chen et al. proposed a self-stabilizing MUTEX algorithm for MANETs [8]. The token is passed along a dynamic logical ring. The least recently visited neighbour of the token holder is always chosen as the successor.

As mentioned before, token-based algorithms have some desirable features. However, the fatal problem of token loss makes these algorithms less robust. What is worse, in MANETs, the mobility and frequent disconnections of MHs make token loss a more serious problem and the maintenance of a tree or ring topology more difficult. Therefore, in this paper, we consider another approach, the permission-based approach.
Two different types of permissions have been used in existing permission-based algorithms [21]. The Ricart–Agrawala type permission, which is adopted in our proposed algorithm, was proposed by Ricart and Agrawala [17]. A host that wants to enter the CS sends request messages to all other hosts. Requests for the CS are assigned globally unique priorities, e.g. Lamport-like timestamps [10]. If the receiver of a request is not requesting the CS or its priority is lower, it grants permission to the requester immediately by sending a reply. Otherwise, it grants the permission after its own execution of the CS. The semantics of such permission is "as far as I am concerned, it is OK for you to enter the CS" [21].

A variation of the Ricart–Agrawala algorithm is proposed in [7], which remembers the recent history of CS executions so as to reduce message cost. Singhal proposed a dynamic Ricart–Agrawala type algorithm [20], which dynamically changes the set of hosts to which a requesting host needs to send request messages. The algorithm proposed in [22], on which our new algorithm is based, is also a Ricart–Agrawala type algorithm. It modifies the Ricart–Agrawala algorithm so that, instead of involving all the hosts in the system, MUTEX is enforced only among the hosts currently competing for the CS. Each host S_i maintains two sets. The Info set_i includes the IDs of the hosts which S_i needs to inform when it requests to enter the CS, and the Status set_i includes the IDs of the hosts which will inform S_i when they request to enter the CS. If a host wants to enter the CS, it just sends requests to the hosts in its
Info set. When a host wants to disconnect from the network, it offloads the current values of its data structures to its serving MSS, which then acts on behalf of the host in the execution of the algorithm.

Another type of permission is the Maekawa type permission [12], where a requesting host needs to send requests only to a quorum. Therefore, algorithms using Maekawa type permissions are usually called "quorum-based" algorithms. The semantics of the Maekawa permission is "as far as all the hosts are concerned, to my knowledge, it is OK for you to enter the CS". Many quorum-based MUTEX algorithms have been proposed. Due to limited space, we do not introduce them here; a good survey can be found in [6].

3. Design of the proposed algorithm

The proposed algorithm is based on the "look-ahead" technique [22] proposed for infrastructured mobile networks. To apply the "look-ahead" technique in MANETs, the following issues have to be considered. First, and most important, there is no MSS for the MHs in a MANET, so additional steps must be taken to ensure that the algorithm can continue its execution after one or more hosts disconnect or doze. Second, the algorithm in [22] does not provide methods for some important functions: there is no method for initializing the two key data structures, the Info set and the Status set. Third, the assumption of FIFO channels becomes infeasible in MANETs. Because the route between two MHs changes from time to time due to the movements of the MHs, implementing FIFO channels in MANETs is very costly. Finally, there is no fault tolerance mechanism for handling host failures or link failures.

The problems above are all addressed in our proposed algorithm. We propose a simple and efficient method to initialize the Info set and Status set. The disconnections and dozes of hosts are also handled. When a host wants to disconnect from the network or enter the doze mode, it informs the other hosts by sending messages.
Both the sender and the receiver modify the information they maintain accordingly. When a host wakes up from the "doze" mode, it resumes the execution immediately without performing any special action. When a host reconnects to the network, it informs the other hosts and resumes the execution.

To relax the constraint of requiring FIFO channels, we add a new variable Q_req. In [22], if a channel is not FIFO, a REPLY message from a host S_i (with lower priority) to a host S_j (with higher priority) may be delivered after the REQUEST message from S_i to S_j. On the reception of the REPLY message, S_j moves S_i to Status set_j, and consequently S_i can never get a REPLY from S_j. With the help of Q_req, such a request is recorded and the host with the lower priority is not put in the Status set after the reception of the REPLY. FIFO channels are no longer necessary.

Using timeouts, a fault tolerance mechanism is developed to tolerate both link and host failures. A timeout value is set for each request message sent out. Intermittent and recoverable link and host failures are handled by resending request messages when the timeout expires.

3.1. System model and assumptions

A MANET consists of a collection of n autonomous MHs, S = {S_1, S_2, . . . , S_n}, communicating with each other through wireless channels. Whether two hosts are directly
connected is determined by the signal coverage range and the distance between the hosts. Each host is a router, and the communication between two hosts can span multiple hops. Both link and host failures can occur. The topology of the interconnection network can change dynamically due to the mobility of hosts and the failures of links and hosts.

At any moment, each MH is in one of three different states: normal, doze, and disconnected. For the disconnected state, two different cases are considered: voluntary disconnection and accidental disconnection. An "accidental disconnection" refers to a disconnection caused by failures of links or hosts. Such disconnections occur more frequently and unpredictably than in wired networks due to the unstable links and the characteristics of mobile hosts. An MH may also voluntarily disconnect from the network to save battery power [3,14]. Since the MH knows about such a disconnection in advance, it can execute predefined operations for the distributed algorithm in which it currently participates.

A distributed system built on a MANET is an asynchronous system, so there is no bound on the processing speed of hosts or on message delay. To support tolerating recoverable link and host failures, we use timeout and message retransmission mechanisms. We assume that any link or host failure is eventually recovered within the retrying period.

3.2. Data structures and message types

Following the definitions in [22,18], each host S_i maintains two sets, which are defined below:

Info set_i: an array of the IDs of the hosts to which S_i needs to send request messages when it wants to enter the CS.
Status set_i: an array of the IDs of the hosts which, upon requesting to access the CS, would send request messages to S_i.

To ensure the correctness of the algorithm, the following conditions must be satisfied:
(1) ∀S_i :: Info set_i ∪ Status set_i = S; ∀S_i :: Info set_i ∩ Status set_i = ∅
(2) ∀S_i ∀S_j :: S_i ∈ Info set_j ⇒ S_j ∈ Status set_i.
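These two conditions can be checked mechanically. The sketch below is our own illustration: it builds the sets for n hypothetical hosts from a random 0/1 matrix whose lower triangle is the complement of the upper (anticipating the initialization algorithm of Section 3.3.1), excluding a host from its own sets, and then asserts both conditions:

```python
import random

def build_sets(n, seed=42):
    # Fill the upper triangle of M randomly; derive the lower triangle
    # as the bitwise complement, m_ji = 1 - m_ij.
    rng = random.Random(seed)
    M = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            M[i][j] = rng.randint(0, 1)
            M[j][i] = 1 - M[i][j]
    # m_ij = 0 puts S_j in Info set_i; m_ij = 1 puts S_j in Status set_i.
    info = [{j for j in range(n) if j != i and M[i][j] == 0} for i in range(n)]
    status = [{j for j in range(n) if j != i and M[i][j] == 1} for i in range(n)]
    return info, status

n = 6
info, status = build_sets(n)
for i in range(n):
    # Condition (1): the two sets partition the other hosts, with no overlap.
    assert info[i] | status[i] == set(range(n)) - {i}
    assert not (info[i] & status[i])
    # Condition (2): S_j in Info set_i implies S_i in Status set_j.
    for j in info[i]:
        assert i in status[j]
```

The complement construction makes condition (2) hold by symmetry: m_ij = 0 forces m_ji = 1.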
Obviously, condition (1) guarantees that host S_i knows the request status of all the other hosts and that there is no redundant information. Condition (2) guarantees the consistency among the sets of all MHs.

In addition, each host maintains the following data structures:
ts_req: the timestamp of the request of S_i, used as the priority of the request. If S_i is not requesting the CS, it is set to NULL.
Q_req: the array of the IDs of the hosts which have sent requests to S_i but to which S_i has not yet sent back a reply.
TO_req: the array of timers, each associated with a REQUEST message sent to some host.
T_rec: the timestamp of the last reconnection. It is set to "0" initially.

The messages used in the algorithm are classified into the following types:
REQUEST: the message sent from a host requesting the CS to other hosts to obtain their permissions. The message contains the priority of the request (e.g. a unique timestamp).
REPLY: the message sent by the receiver of a REQUEST to grant the permission to access the CS.
Fig. 1. Algorithm for initialization.
DOZE: the message informing the other hosts that the sender is entering the doze mode.
DISCONNECT: the message informing the other hosts that the sender is disconnecting voluntarily.
RECONNECT: the message informing the other hosts that the sender has reconnected to the network after a voluntary or accidental disconnection. The message contains the priority (e.g. a unique timestamp) of the reconnection.

3.3. The proposed algorithm

3.3.1. Initialization of Info set and Status set

The algorithm for initializing the Info set and Status set is shown in Fig. 1. We use an n × n matrix M, where n is the number of hosts in the network, to represent the relationships among the hosts. The value of each element m_ij of M represents the relationship between the pair of hosts S_i and S_j. If m_ij = 0, S_j is in the Info set of S_i. If m_ij = 1, S_j is in the Status set of S_i. To ensure that the sets of all the hosts satisfy conditions (1) and (2) specified in Section 3.2, an arbitrary host, say S_0, is selected to act as the initiator of the algorithm. The initial value of M is determined by the initiator. S_0 generates the upper triangular matrix M_u randomly and broadcasts M_u to all the other hosts. Then all the hosts, including S_0, set the value of each element of the lower triangular matrix M_l to be the complement of the corresponding element in the upper triangle (m_ji = 1 − m_ij). Finally, according to the corresponding row of M, each host initializes its Info set and Status set. It is easy to verify that our initialization algorithm meets the specified conditions with only a few messages.

In the initialization algorithm, the failures of hosts or links are not considered. To tolerate failures, each host sets a timeout for the message carrying M_u. If the timeout expires, the host sends a query message to the initiator. This may be repeated until M_u is received from the initiator. Since a host eventually recovers from a failure, each host eventually receives M_u and initializes its sets.

3.3.2.
Normal execution (without disconnection or doze)

The pseudocode of the proposed MUTEX algorithm is shown in Fig. 2. All the hosts execute the same code. When a host wants to enter the CS, it first sets ts_req to the current time and sends REQUEST messages to all the hosts in its Info set. To tolerate link and host failures, a
Fig. 2. Body of the MUTEX algorithm.
timeout is set in TO_req for each request message sent. The host then waits for a REPLY message corresponding to each REQUEST message sent out. If the Info set is empty, it enters the CS immediately.

When a host S_i receives a REQUEST message from another host S_j, it moves S_j to Info set_i and records the request in Q_req. If S_i itself is not requesting the CS or its priority is lower, it sends a REPLY message to S_j and removes the record for S_j from Q_req. If S_j was in Status set_i before S_i received the REQUEST from S_j and S_i is requesting the CS with a lower priority, S_i sends a REQUEST to S_j.

Upon receiving a REPLY message from host S_j, S_i removes the timeout (in TO_req) associated with S_j. If S_i finds no request from S_j in its Q_req, S_j is moved to Status set_i. When the timeout for a REQUEST message expires, the requesting host sends the REQUEST again. When all the replies for the REQUEST messages have been received, the requesting host enters the CS. On exiting the CS, a host sends REPLY messages to all the hosts in its Info set.
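The REQUEST/REPLY handling just described, including the role of Q_req, can be traced on a toy two-host run. This is our own sketch mirroring the non-FIFO scenario of Fig. 3; the Host class and message tuples are assumptions, not the paper's pseudocode:

```python
class Host:
    def __init__(self, hid, info, status, ts_req=None):
        self.hid = hid
        self.info_set, self.status_set = set(info), set(status)
        self.q_req = set()      # requests received but not yet replied to
        self.ts_req = ts_req    # priority of own pending request (None = idle)

    def on_request(self, sender, ts):
        self.status_set.discard(sender)
        self.info_set.add(sender)    # sender is now a known competitor
        self.q_req.add(sender)       # record the request (non-FIFO safety)
        if self.ts_req is None or ts < self.ts_req:  # smaller ts = higher priority
            self.q_req.discard(sender)
            return ('REPLY', self.hid, sender)       # grant immediately
        return None                  # defer the reply until after own CS

    def on_reply(self, sender):
        # Move the sender to the Status set only if it has no pending
        # request recorded in Q_req; otherwise its REQUEST merely
        # overtook this REPLY on a non-FIFO channel.
        if sender not in self.q_req:
            self.info_set.discard(sender)
            self.status_set.add(sender)

si = Host('i', info={'j'}, status=set(), ts_req=1)   # higher priority
sj = Host('j', info=set(), status={'i'}, ts_req=2)   # lower priority
assert sj.on_request('i', 1) == ('REPLY', 'j', 'i')  # S_j grants at once
si.on_request('j', 2)      # but S_j's own REQUEST overtakes that REPLY
si.on_reply('j')           # the late REPLY arrives; 'j' is in Q_req ...
assert 'j' in si.info_set  # ... so S_i keeps S_j in its Info set
```

After exiting the CS, S_i would then send the deferred REPLY to every host in its Info set, letting S_j enter.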
Fig. 3. An example of normal execution.
It is worth noticing that when two hosts compete for the CS simultaneously, if we do not record the REQUEST separately in Q_req, it is possible that the host with the lower priority never gets a REPLY from the other host. This is caused by the non-FIFO property of the communication channels. The example execution shown in Fig. 3 illustrates the usage of Q_req clearly.

The example shows a scenario with only two hosts. At the beginning, host S_i is in Status set_j while S_j is in Info set_i (Fig. 3(a)). Then S_i and S_j each generate a request, with S_i having the higher priority. Since S_j is in Info set_i, S_i has to send a REQUEST message to S_j (Fig. 3(b)). Upon receiving the REQUEST message from S_i, S_j moves S_i from Status set_j to Info set_j and records the request in its Q_req. Because the priority of S_j is lower, S_j sends a REPLY and a REQUEST to S_i. However, due to the non-FIFO channel, the REQUEST arrives at S_i first, and consequently S_i records this request in Q_req (Fig. 3(c)). In Fig. 3(d), the REPLY arrives at S_i, but S_i does not move S_j into Status set_i because there is a request from S_j recorded in Q_req. After receiving replies for all REQUEST messages sent out, S_i enters the CS. Upon exiting the CS, S_i sends a REPLY message to S_j (Fig. 3(e)). (Without Q_req, S_j would be in Status set_i and could not get the REPLY.) S_j moves S_i to Status set_j and enters the CS (Fig. 3(f)).

3.3.3. Handling doze and disconnection

When a host S_i wants to enter the "doze" mode, it broadcasts a DOZE message to all other hosts and moves all the hosts in its Status set to its Info set. All the other hosts move S_i
into their Status sets after they receive the DOZE message from S_i. This ensures that the dozing host will not be disturbed. When a dozing host wakes up, it resumes the execution without any special operation. If a host S_i wants to disconnect voluntarily, the same steps are taken except that S_i broadcasts a DISCONNECT message rather than a DOZE message.

When a host S_i reconnects to the network after a disconnection, either voluntary or accidental, S_i needs to broadcast a RECONNECT message to inform the other hosts and to move all the hosts in its Status set_i to its Info set_i. When a host S_j receives a RECONNECT message from S_i, it compares its T_rec with the timestamp of the received RECONNECT message. If its T_rec is smaller, S_j moves S_i to Status set_j, and if it is waiting for a REPLY message from S_i, S_j removes the corresponding timeout in TO_req. The comparison is necessary because several hosts may send out RECONNECT messages concurrently. For example, if S_i and S_j sent out RECONNECT concurrently and did not compare the times of reconnection, they would move each other to their Status sets, violating the conditions for the Status set and Info set discussed in Section 3.2.
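The set adjustments for doze and reconnection can be sketched as follows (our own illustration; the Host class, message tuples, and timestamps are assumptions, and timeouts are omitted):

```python
class Host:
    def __init__(self, hid, info, status):
        self.hid = hid
        self.info_set, self.status_set = set(info), set(status)
        self.t_rec = 0   # timestamp of the last reconnection

    def doze(self):
        # Before dozing, take over the duty of informing: move every host
        # from the Status set to the Info set, then broadcast DOZE.
        self.info_set |= self.status_set
        self.status_set.clear()
        return ('DOZE', self.hid)

    def on_doze(self, sender):
        # Others stop expecting requests from the dozing host.
        self.info_set.discard(sender)
        self.status_set.add(sender)

    def reconnect(self, now):
        self.t_rec = now
        self.info_set |= self.status_set
        self.status_set.clear()
        return ('RECONNECT', self.hid, now)

    def on_reconnect(self, sender, ts):
        # Only the host with the older reconnection time yields, so two
        # concurrent RECONNECTs cannot move both hosts to Status sets.
        if self.t_rec < ts:
            self.info_set.discard(sender)
            self.status_set.add(sender)

a = Host('a', info=set(), status={'b'})
b = Host('b', info={'a'}, status=set())
a.doze(); b.on_doze('a')
assert a.info_set == {'b'} and b.status_set == {'a'}

a.reconnect(5); b.reconnect(7)          # concurrent reconnections
a.on_reconnect('b', 7)                  # a reconnected earlier, so a yields
b.on_reconnect('a', 5)                  # b keeps a in its Info set
assert b.info_set == {'a'} and a.status_set == {'b'}
```

Note how the timestamp comparison leaves exactly one of the two hosts holding the other in its Info set, preserving condition (2) of Section 3.2.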
4. Correctness of the proposed algorithm

A solution to the MUTEX problem must satisfy the following three correctness properties:
• Mutual exclusion (safety): at most one host is allowed to be in the CS at any moment;
• Deadlock freedom (liveness): if any host is waiting for the CS, then in a finite time some host enters the CS;
• Starvation freedom (fairness): if a host is waiting for the CS, then in a finite time that host enters the CS.

In this section we prove the correctness of the proposed algorithm by showing that these three requirements for distributed MUTEX algorithms are satisfied.

Lemma 1. Under the stated assumptions, the effect of recoverable link or host failures can be eliminated.

Argument. Without loss of generality, assume that the link between S_i and S_j fails and some message is lost. If neither of the two hosts is waiting for a reply from the other, the link failure has no effect on their executions. So, assume that S_i is waiting for the reply of S_j. Eventually, the timeout for S_j expires and the request is resent. Since we assume that a link failure is recovered within the retrying period, S_j eventually receives the request after it is resent one or more times. Similarly, when a host, e.g. S_i, fails, only the hosts waiting for a reply from S_i are affected. Since S_i recovers within the specified retrying period, it eventually receives the request after it reconnects to the network.

Lemma 2. If a host S_i wants to enter the CS, it eventually learns about all the hosts concurrently requesting the CS.
Argument. For a host S_j in Status set_i of host S_i: if S_j wants to enter the CS, it will send a REQUEST message to S_i, and S_i will receive this request even if there are failures (Lemma 1). For a host S_k in Info set_i of host S_i: S_i sends a REQUEST message to S_k, and S_k will eventually receive it (Lemma 1). If S_i receives a reply from S_k, it knows that S_k is not requesting the CS. Otherwise, S_i is blocked until S_k sends a reply.

Theorem 1. At most one host can be in the CS at any time (safety).

Argument. We prove the theorem by contradiction. Assume that two hosts S_i and S_j are executing the CS simultaneously. From Lemma 2, each of them has learned the status of the other, which implies that they sent replies to each other before they entered the CS. However, this is impossible: since no two requests have the same priority, the host with the higher priority defers its reply until after its own execution of the CS, so the two hosts cannot both have received replies before entering the CS. This is a contradiction.

Theorem 2. The algorithm is deadlock free (liveness).

Argument. A deadlock occurs when there is a circular wait and no REPLY is in transit. This means that each host in the cycle is waiting for a REPLY from its successor in the cycle. Since each request has a distinct priority, there is a host, say S_h, whose priority is the highest. Denote the successor of S_h by S_j. We claim that S_h eventually receives a REPLY from host S_j. In the cases with no failure or disconnection, this can be proved in a way similar to that in [22]. Here we only consider the cases with failures or disconnections.

Case 1: S_j fails. If S_j failed before it sent out its reply to S_h, it will send the reply after it recovers (Lemma 1). Eventually S_h receives the reply from S_j.

Case 2: S_j runs normally. In this case, S_j receives the request from S_h and handles it. This can be further divided into two sub-cases: (1) S_j has no request for the CS, or its request has a lower priority (because S_h has the highest priority); then S_j sends a reply to S_h immediately. (2) S_j is in the CS.
Then Q_req is set for S_h, and S_h is moved to Info set_j if it was in Status set_j before. After S_j exits the CS, it must send a reply to S_h. In all cases, the circular wait is eventually broken. The theorem holds.

Theorem 3. The algorithm is starvation free (fairness).

Argument. In the cases with no failure or disconnection, the fairness property can be proved in a way similar to that in [22]. Therefore we only need to consider the situations with failures and disconnections. If there are link failures or host failures, their effects are eventually eliminated by the timeout mechanism (Lemma 1). If there are disconnected hosts, all their current requests are deleted after they reconnect or wake up. So, failures and disconnections do not affect the fairness of the algorithm, and fairness is guaranteed.

5. Performance analysis

In this section, we analyse the performance of the proposed algorithm. The performance of the proposed algorithm is affected significantly by failures and disconnections in the
system, but it is hard to quantitatively study the performance under failures. Therefore, in the analysis, we consider only normal executions without failures or disconnections. The performance of the proposed algorithm under failures is evaluated using simulations, and the results are reported in the next section.

Usually, the performance of a MUTEX algorithm is analysed under two special load conditions, i.e. low load and high load. Under low-load conditions, there is seldom more than one request for the CS in the system at a time; under high-load conditions, there is always a pending request for the CS at each host. We use three commonly used measures [21] in the analysis:

Number of messages per CS entry (MPCS): the average number of messages exchanged among the hosts for each execution of the CS.
Synchronization delay (SD): the number of sequential messages exchanged after a host leaves the CS and before the next host enters the CS.
Response time (RT): the time interval a host waits to enter the CS after its request for the CS arrives.

The performance of the proposed algorithm is computed with respect to these measures as follows.

5.1. Number of messages per CS entry

The MPCS of the proposed algorithm is in fact determined by the average size of the Info set. In general, all the hosts are equally active, so a host is put in an Info set or a Status set with the same probability. Therefore, the average size of the Info set is n/2.

Under low-load conditions, there is usually at most one request in the system at any moment. When a host requesting the CS sends REQUEST messages to the hosts in its Info set, those hosts send back REPLY messages immediately. Therefore, the MPCS under low-load conditions is:

MPCS_low = 2 · n/2 = n.

Under high-load conditions, each host always has a pending request. For an arbitrary host S_i, on average, n/2 hosts will issue their current requests for the CS earlier than S_i does.
Generally, half of these n/2 hosts are in the Status set of S_i and will send REQUEST messages to S_i. On receiving these REQUEST messages, S_i sends REQUEST messages back to the senders (S_i has the lower priority). For each REQUEST message, one REPLY message must be sent. Therefore, the average number of messages for one CS entry under high-load conditions is MPCS_high = 2 × (n/2 + n/4) = 3n/2. In the above analysis, the load at each host is assumed to be the same. However, as discussed in [22], an interesting feature of the Info set is that its average size is affected by the activeness of the hosts, i.e. the Info set of a host requesting the CS is smaller if the arrival of CS requests is localized at a few hosts. In such conditions, the MPCS under low-load and high-load conditions is |Φ| and 3|Φ|/2 respectively, where Φ is the set of active hosts. This results in a substantial reduction in message cost. The effect of this feature is validated in the simulations.
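The closed-form message counts derived above can be sanity-checked with a short script (a sketch; `n` is the total number of hosts and |Φ| the number of active hosts, following the notation of the analysis):

```python
def mpcs_low(n: float) -> float:
    """Low load: a REQUEST and a REPLY for each of the n/2 hosts in the Info set."""
    return 2 * (n / 2)

def mpcs_high(n: float) -> float:
    """High load: n/2 REQUESTs to the Info set plus, on average, n/4 REQUESTs
    bounced back to earlier (higher-priority) requesters; every REQUEST is
    answered by one REPLY."""
    return 2 * (n / 2 + n / 4)

print(mpcs_low(20))   # 20.0, i.e. n
print(mpcs_high(20))  # 30.0, i.e. 3n/2
# When requests are localized at |Phi| active hosts, substitute |Phi| for n:
print(mpcs_low(4))    # 4.0 messages per CS entry with 4 active hosts
```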
It is important to note that the condition that “the arrival of CS requests is localized at a few hosts” does not mean that only those few hosts request the CS during the execution. The set Φ may vary from time to time; as long as it remains stable for a sufficiently long period, the proposed algorithm benefits.

5.2. Synchronization delay

The synchronization delay is meaningless under low-load conditions, because it measures the interval between the arrivals of two requests. Under high-load conditions, when a host S_i exits the CS, it sends REPLY messages to all the hosts in its Info set, i.e. the hosts that have pending requests. The host that issued its request earliest then enters the CS immediately after it receives the REPLY from S_i. Therefore, the synchronization delay under high-load conditions is: SD_high = 1,
i.e. the time of transferring one message.
5.3. Response time

Under low-load conditions, most of the time no more than one host competes for the CS. When a host wants to enter the CS, it sends REQUEST messages to the hosts in its Info set, and all these hosts send REPLY immediately after they receive the REQUEST. Therefore, the response time under low load is: RT_low = 2,
i.e. twice the time of transferring one message.
Under high-load conditions, there is always a pending request at each host. The hosts form a waiting chain ordered by the timestamps of their requests, i.e. the times at which they issued their requests for the CS. A host in the chain can enter the CS after its predecessor exits, so each host must wait for the hosts whose requests are earlier; on average, each host waits for n/2 such hosts. Assuming the average duration of a CS execution is E, the response time under high-load conditions is: RT_high = (E + SD_high) × n/2 = (E + 1) × n/2.
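The delay expressions above can be put together in a small sketch (message latency is normalized to one time unit; `e` and `n` are as in the analysis):

```python
def sync_delay_high() -> float:
    # One REPLY message between a CS exit and the next CS entry.
    return 1.0

def response_time_low() -> float:
    # One REQUEST/REPLY round trip: two message latencies.
    return 2.0

def response_time_high(n: int, e: float) -> float:
    """On average a host waits behind n/2 earlier requests, each costing one
    CS execution (E) plus one synchronization delay (SD_high = 1)."""
    return (e + sync_delay_high()) * (n / 2)

print(response_time_high(20, 5.0))  # (5 + 1) * 10 = 60.0 time units
```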
6. Simulation study

Simulations have been carried out to evaluate the performance of the proposed algorithm. We adopted GloMoSim [23] as the platform, which has been widely used for simulating algorithms in MANETs. The proposed algorithm is implemented as an application-level protocol.

6.1. Simulation parameters and setup

In the simulations, we set the parameters of the MANET to the same values as those used in [3]. Since the network partition problem is not considered, we chose a territory scale that minimizes the probability of network partitions while maximizing the number of hops for each application message [3].
Table 1
Parameters of simulations

Number of hosts: 4, 8, 12, 16, 20
Territory scale: 313, 443, 543, 626, 700 m
Average speed of movement: 20 m/s
Mobility model: Random-waypoint
Transmission radius: 200 m
Routing protocol: BELLMANFORD
Link bandwidth: 2 Mbits/s
Simulation time: 300 h
All the hosts are scattered over a rectangular territory. To evaluate the scalability of the algorithm, we varied the number of hosts and scaled the territory in proportion, so that the performance under different numbers of hosts is comparable. The numbers of hosts with the corresponding territory scales, together with the other main parameters, are shown in Table 1. We initially used UDP as the transport layer protocol, but the high packet loss rate (more than 50%) frequently blocked the simulation, so we switched to TCP. Even with TCP, some packets are still lost (nearly 2%), which may be caused by the movements of the hosts. The arrival of requests at a host is assumed to follow a Poisson process with rate λ, the number of requests generated by a single host per second. Simulations were carried out under three load conditions: high (λ = 1.00E-2), middle (λ = 1.00E-3) and low (λ = 1.00E-4). The simulations can be divided into three parts. In the first part, there are only link failures in the network. As mentioned above, packet loss already occurs, so we did not inject additional link failures. In the second part, we introduced host failures. The arrival of host failures at a host is also assumed to follow a Poisson process, and the duration of a host failure is exponentially distributed. To simplify the simulation, we fixed the host failure percentage at 10%, a quite high value. In these two parts, the load levels of all the hosts are the same, i.e. the arrival rates of requests at the hosts are uniform. However, as discussed in Section 5, one feature of the algorithm is that the performance is better if the arrival of CS requests is localized at a few hosts, a feature that makes the algorithm scalable to large systems. Therefore, in the third part, we conducted simulations in which different hosts have different load levels.
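The request-arrival model used in the simulations can be sketched as follows: a Poisson arrival process is equivalent to exponentially distributed inter-arrival times with rate λ (the function name and horizon parameter are illustrative, not from the paper's simulation code):

```python
import random

def request_times(lam: float, horizon: float, seed: int = 1) -> list[float]:
    """Generate CS request arrival times for one host as a Poisson process:
    exponentially distributed inter-arrival times with rate lam (requests/s)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(lam)
        if t > horizon:
            return times
        times.append(t)

# High load (lam = 1.00E-2): on average one request per 100 s per host.
arrivals = request_times(1.00e-2, horizon=3600.0)
print(len(arrivals))  # typically a few dozen requests over one simulated hour
```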
All simulations were carried out under three different mobility levels, set by adjusting the pause time so that the time a host spends moving accounts for 100%, 50% and 10% of the total simulation time respectively.

6.2. Simulation results and discussion

In the simulations, we measured the message cost using two metrics. Besides the MPCS introduced before, the number of hops per CS entry (HPCS) is also adopted. In this paper, a “message” means an application layer message, i.e. an end-to-end message, while a “hop” means a network layer message, i.e. a point-to-point message. Obviously, the latter reflects the message cost of an algorithm more precisely. Because of the resource
Fig. 4. No. of hops per application message.
constraints, HPCS is important for MANETs. As for the time cost, we measured the RT of our algorithm. However, the following discussion focuses on the MPCS and HPCS, which are at the core of the paper. The simulation results are described and discussed according to the main factors that affect the performance. To interpret the simulation results, we first need to understand the relationship between MPCS and HPCS. The difference between the two metrics depends on the number of hops per message, which is affected by the topology of the MANET. In a MANET, the topology is dynamic due to the movements and failures of hosts. Therefore, the number of hops per message is significantly affected by the mobility of the hosts, as validated in [3]. Fig. 4 depicts the average number of hops needed for each application-level message with 20 hosts in our simulation environment. When the mobility increases, the number of hops decreases. In fact, the number of hops is determined by the distance between the source and destination hosts. In a MANET, the distance between any two hosts changes from time to time; the higher the mobility, the more likely the distance between any two hosts is to be short. Therefore, under high mobility, the number of hops is small. Of course, the number of hops is at least one.

(1) Effect of system scale

Figs. 5 and 7 both show that MPCS/HPCS increases linearly as the number of hosts increases. Using the “look-ahead” technique, a host only needs to send requests to the hosts in its Info set or those competing for the CS concurrently. Since all the hosts are equally active, the average size of the Info set is proportional to the system scale, as analyzed in Section 5. Therefore, the message cost increases nearly linearly with the system scale, and the algorithm scales well. When the activeness of the hosts is not uniform, the scalability of our algorithm is much better.
See “(5) Effect of uniformity of load level” for more discussion. The effect of system scale on RT is shown in Figs. 6 and 8. As with MPCS/HPCS, the response time increases with the system scale: a larger number of hosts leads to more messages exchanged and heavier competition for the CS, so a host needs to wait longer before it can enter the CS.

(2) Effect of mobility

Figs. 5 and 6 show the effect of mobility on MPCS/HPCS and RT respectively. Under all load levels, the MPCS without host movement is always the best. This is easy to
Fig. 5. MPCS/HPCS vs no. of hosts-effect of mobility.
understand. If the hosts do not move, TCP connections can be established easily and remain stable, so all the requests are handled quickly and no message resending is needed. When hosts move during the execution, it becomes difficult to establish and maintain a connection due to the packet loss caused by host movements. As a result, the number of messages increases. However, the MPCS under low mobility is higher than that under high mobility. This can be explained by the effect of mobility on the distance between two hosts. As discussed before, the higher the mobility, the shorter the average distance between any two hosts and the higher the probability that a connection can be established. Therefore, the response time is shorter (as shown in Fig. 6) and fewer hosts compete for the CS concurrently, leading to fewer REQUEST messages. The effect of mobility on the HPCS, shown in Fig. 5(d)-(f), is similar to that on the MPCS, except that the performance under no movement is no longer the best. This is because, besides its effect on the establishment of connections, the mobility level also affects the average number of hops per message, as discussed for Fig. 4. If there is no movement, the number of hops per message becomes large, which degrades the performance.
Fig. 6. RT vs no. of hosts-effect of mobility.
As shown in Fig. 6, mobility affects the RT in much the same way as it affects the MPCS. More messages mean longer delays and more competition, so the RT follows the same trend as the MPCS. It is important to note that when the MHs do not move, the RT is much lower than when they move, because the mobility of MHs affects not only the establishment of connections among hosts but also the transmission of each message.

(3) Effect of load level

Fig. 7 shows the MPCS against the load level. In general, the MPCS increases with the load level. From the curves in Fig. 7 we can see that, under a low load level, a host needs to send request messages to nearly half of all hosts. Under a high load level, however, there are pending requests at any time, so a host that wants to enter the CS needs to send request messages not only to all the hosts in its Info set, but also to those requesting the CS. Therefore, the MPCS increases. Fig. 8 shows the effect of load level on RT, which is similar to the effect on MPCS: the higher the load level, the longer the RT.

(4) Effect of host failure

From Figs. 7 and 8, we can see that more time and messages are needed when there are host failures. The curves with host failures are labelled with the suffix “-Fail”. Under different load levels and system scales, the increase in MPCS caused by host failures varies strongly. In some cases the number of messages is doubled, but in most cases about 40% more messages are needed. Considering the high failure rate (10%), this is acceptable.

(5) Effect of uniformity of load level

As discussed in Section 5, the performance of the proposed protocol is affected by the activeness of the hosts. In this part of the simulation, we let some hosts generate requests
Fig. 7. MPCS vs no. of hosts-effect of load level and host failures.
Fig. 8. RT vs no. of hosts-effect of load level and host failures.
Fig. 9. HPCS with nonuniform load level.
more actively than others. Since mobility and the number of hosts are not relevant here, we carried out the simulation with a fixed mobility of 50% and 20 hosts. The requests for the CS from four of the 20 hosts constitute 80% of all requests, while the remaining hosts generate only 20% of the requests. Fig. 9 shows the HPCS against the load level; to measure the performance more precisely, more load levels were examined. Under a nonuniform load level, some hosts have a higher load level than others, but the total load of all the hosts is the same as under the uniform load level. The results show that, if some hosts are active while others are not, the message cost of achieving mutual exclusion is reduced significantly, which agrees with the analysis in Section 5. This feature is important for the scalability of a MUTEX algorithm: the message cost does not increase as the system scale increases if some hosts are always much more active than the others.

7. Making the algorithm more robust

The algorithm described in Section 3 does not consider permanent link or host failures. In an asynchronous system, there is no way to detect such failures precisely. However, the algorithm can be extended to handle permanent failures under the assumption that they can be suspected by some means, e.g. a timeout-based approach. Once a permanent failure or network partition is suspected, the algorithm is enhanced in the following way to handle it.

7.1. Permanent host failures

First, let us consider permanent failures during the initialization. As described in Section 3.3.1, a timeout is set for the initialization message from the initiator. To handle a permanent failure of the initiator, a host S_i sends query messages to all other hosts, rather than only to the initiator, when the timeout expires. On receiving the query, each receiver sends back a reply with the M_u it may have received.
If no host has received the initialization M_u, a new initiator is selected using some predefined order, e.g. ID_new_initiator = (ID_old_initiator + 1) mod n, and the initialization is tried again. Now, we discuss how to handle permanent failures during mutual exclusion. To handle permanent host failures, a straightforward method is to set a threshold for resending a
REQUEST message. When the number of times a REQUEST has been resent reaches the threshold, the destination host is perceived to have crashed and is moved to the Status set. However, the accuracy of detecting permanent host failures with such a method is significantly affected by the load conditions. To avoid this, an alternative is to detect such failures using a separate heartbeat-like mechanism, as commonly used in distributed systems. After the crash of a host is detected, the host is moved to the Status set.

7.2. Network partition

The execution of a host may be blocked when the network is partitioned (because of link failures, host failures or host movements) until the partitions merge. If the partitions remain disconnected for a very long time, the resulting delay may be intolerable. Whether such a long delay can be avoided depends on the property of the CS. If the CS must not be accessed by two or more hosts at any moment, even when they are in disconnected partitions, no action can be taken. Otherwise, if hosts in different partitions may access the CS simultaneously, it is possible to reduce the waiting time caused by partitions. In fact, the main difficulty in handling network partitioning is detecting it, which is out of the scope of this paper. Here, we assume that a partition detection module is available. With such a module, handling the partitioning is simple: once a partition is detected, a host moves all the hosts in other partitions to its Status set. However, handling the merging of partitions is more complex, because the relationship between the hosts in different partitions must be considered. For a pair of hosts in two different partitions, it is possible that both hosts have put each other in their Status sets, which violates the conditions specified in Section 3.2 when the partitions merge. To deal with this, at least one host has to change its Status set.
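The partition bookkeeping described above can be sketched as follows (a sketch under the paper's model; the `Host` class, integer IDs, and the specific tie-breaking rule on merge are illustrative assumptions):

```python
class Host:
    """Minimal sketch of a host's Info/Status set bookkeeping (illustrative)."""

    def __init__(self, hid: int, others: set[int]):
        self.id = hid
        self.info = set(others)  # hosts this host must request permission from
        self.status = set()      # hosts that will send requests to this host

    def on_partition(self, unreachable: set[int]) -> None:
        # Stop waiting on hosts in other partitions: move them to the Status set.
        self.info -= unreachable
        self.status |= unreachable

    def on_merge(self, peer: "Host") -> None:
        # After a merge, both hosts may hold each other in their Status sets,
        # violating the Info/Status conditions. One simple tie-breaking rule:
        # the host with the smaller ID moves the peer back to its Info set.
        if peer.id in self.status and self.id in peer.status:
            low, high = (self, peer) if self.id < peer.id else (peer, self)
            low.status.discard(high.id)
            low.info.add(high.id)
```

A two-host walk-through: after a partition separates hosts 0 and 1, each holds the other in its Status set; on merge, host 0 (the smaller ID) demotes its entry back to the Info set, restoring a consistent permission relation.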
A simple but efficient method is for the host with the smaller ID to move the other host from its Status set to its Info set.

8. Conclusions

In this paper, we have proposed an efficient and reliable permission-based MUTEX algorithm for MANETs. The algorithm does not depend on any logical topology, so it eliminates the cost of maintaining such a topology. To reduce the number of messages exchanged, the “look-ahead” technique is used. We designed a timeout-based mechanism to tolerate intermittent and recoverable link/host failures, which occur frequently in MANETs. The algorithm can also handle dozes and disconnections of hosts. The simulation results show that the algorithm performs better under a low load level and a high mobility level. Moreover, when some hosts are more active than the others, the proposed algorithm saves considerably more communication cost.

Acknowledgements

This work is supported in part by the University Grant Council of Hong Kong under the CERG grant PolyU 5076/01E and China National 973 Programme Grant 2002CB312002.
References

[1] D. Agrawal, A.E. Abbadi, An efficient and fault-tolerant solution for distributed mutual exclusion, ACM Transactions on Computer Systems (1991) 1–20.
[2] B.R. Badrinath, A. Acharya, T. Imielinski, Designing distributed algorithms for mobile computing networks, Journal of Computer Communications 19 (1996) 309–320.
[3] R. Baldoni, A. Virgillito, R. Petrassi, A distributed mutual exclusion algorithm for mobile ad-hoc networks, in: Proc. of ISCC'02, IEEE Computer Society, 2002, pp. 539–544.
[4] M. Benchaïba, A. Bouabdallah, N. Badache, M. Ahmed-Nacer, Distributed mutual exclusion algorithms in mobile ad hoc networks: An overview, ACM SIGOPS Operating Systems Review 38 (2004) 74–89.
[5] A. Boukerche, S. Hong, T. Jacob, A distributed algorithm for dynamic channel allocation, Mobile Networks and Applications 7 (2) (2002) 115–126.
[6] G. Cao, M. Singhal, A delay-optimal quorum-based mutual exclusion algorithm for distributed systems, IEEE Transactions on Parallel and Distributed Systems 12 (12) (2001) 1256–1268.
[7] O.S.F. Carvalho, G. Roucairol, On mutual exclusion in computer networks, technical correspondence, Communications of the ACM 26 (2) (1983) 146–149.
[8] Y. Chen, J. Welch, Self-stabilizing mutual exclusion using tokens in ad hoc networks, in: Proc. of DIALM'02, ACM, 2002, pp. 34–42.
[9] G.H. Forman, J. Zahorjan, The challenges of mobile computing, Computer 27 (4) (1994) 38–47.
[10] L. Lamport, Time, clocks and the ordering of events in distributed systems, Communications of the ACM 21 (7) (1978) 558–565.
[11] W.-S. Luk, T.-T. Wong, Two new quorum based algorithms for distributed mutual exclusion, in: Proc. of ICDCS'97, IEEE Computer Society, 1997, pp. 100–106.
[12] M. Maekawa, A √N algorithm for mutual exclusion in decentralized systems, ACM Transactions on Computer Systems 3 (3) (1985) 145–159.
[13] N. Malpani, N.H. Vaidya, J.L. Welch, Distributed token circulation on mobile ad hoc networks, in: Proc. of ICNP'01, IEEE Computer Society, 2001, pp. 4–13.
[14] V.K. Murthy, Mobile computing: Operational models, programming modes and software tools, in: IPDPS, IEEE Computer Society, 2001, pp. 2016–2025.
[15] L.M. Patnaik, A.K. Ramakrishna, R. Muralidharan, Distributed algorithms for mobile hosts, IEE Proceedings-Computers and Digital Techniques 144 (1997) 49–56.
[16] K. Raymond, A tree-based algorithm for distributed mutual exclusion, ACM Transactions on Computer Systems 7 (1989) 61–77.
[17] G. Ricart, A.K. Agrawala, An optimal algorithm for mutual exclusion in computer networks, Communications of the ACM 24 (1981) 9–17.
[18] B.A. Sanders, The information structure of distributed mutual exclusion algorithms, ACM Transactions on Computer Systems 5 (3) (1987) 284–299.
[19] M. Satyanarayanan, Fundamental challenges in mobile computing, in: Proc. of PODC'96, ACM Press, 1996, pp. 1–7.
[20] M. Singhal, A dynamic information structure mutual exclusion algorithm for distributed systems, IEEE Transactions on Parallel and Distributed Systems 3 (1) (1991) 121–125.
[21] M. Singhal, A taxonomy of distributed mutual exclusion, Journal of Parallel and Distributed Computing 18 (1993) 94–101.
[22] M. Singhal, D. Manivannan, A distributed mutual exclusion algorithm for mobile computing environments, in: Proc. of ICIIS'97, IEEE Computer Society, 1997, pp. 557–561.
[23] UCLA Parallel Computing Laboratory, GloMoSim Manual ver. 1.2, http://pcl.cs.ucla.edu/.
[24] J.E. Walter, S. Kini, Mutual exclusion on multihop, mobile wireless networks, Technical Report, Dept. of Computer Science, Texas A&M Univ., 1997.
[25] J. Walter, J. Welch, N. Vaidya, A mutual exclusion algorithm for ad hoc mobile networks, Wireless Networks 9 (2001) 585–600.
Weigang Wu received the Ph.D. degree in computer science from Hong Kong Polytechnic University, Hong Kong, in 2007. He received the B.Sc. degree in material science in 1998 and the M.Sc. degree in computer science in 2003, both from Xi'an Jiaotong University, China. He is currently a Postdoctoral Fellow in the Department of Computing, The Hong Kong Polytechnic University. His research interests include parallel and distributed computing, networking, mobile and wireless computing, and fault tolerance. His recent research has focused on distributed algorithms in mobile environments and mobility management in wireless mesh networks. He has published a number of papers in conferences and journals, and served as a reviewer for many international journals and conferences. Dr. Wu is a member of IEEE.
Jiannong Cao received the B.Sc. degree in computer science from Nanjing University, Nanjing, China in 1982, and the M.Sc. and the Ph.D. degrees in computer science from Washington State University, Pullman, WA, USA, in 1986 and 1990 respectively. He is currently a professor in the Department of Computing at Hong Kong Polytechnic University, Hung Hom, Hong Kong. He is also the director of the Internet and Mobile Computing Lab in the department. Before joining Hong Kong Polytechnic University, he was on the faculty of computer science at James Cook University and the University of Adelaide in Australia, and City University of Hong Kong. His research interests include parallel and distributed computing, networking, mobile and wireless computing, fault tolerance and distributed software architecture. He has published over 180 technical papers in the above areas. His recent research has focused on mobile and pervasive computing systems, developing test-beds, protocols, middleware and applications. Dr Cao is a senior member of the China Computer Federation, a member of the IEEE Computer Society, the IEEE Communication Society, IEEE, and ACM. He is also a member of the IEEE Technical Committee on Distributed Processing, the IEEE Technical Committee on Parallel Processing, and the IEEE Technical Committee on Fault Tolerant Computing. He has served as a member of the editorial boards of several international journals, a reviewer for international journals/conference proceedings, and an organizing/programme committee member for many international conferences. Jin Yang received the Ph.D. degree in computer science from Hong Kong Polytechnic University, Hong Kong, in 2006. He worked at the Lucent Technologies Optical Networks research center for three years after receiving his M.Sc. degree from Xiamen University, China. Dr. Yang's current research interests include fault tolerant distributed computing, mobile computing, failure detectors and agreement protocols. Besides academic research, Dr.
Yang also has close ties with the industrial community. Currently, Dr. Yang is focusing on implementing distributed/mobile computing in the Linux kernel to support network elements in mesh networks.