IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO. 7, JULY 2008
A Self-Stabilizing Leader Election Algorithm in Highly Dynamic Ad Hoc Mobile Networks

Abdelouahid Derhab and Nadjib Badache

Abstract—The classical definition of a self-stabilizing algorithm generally assumes that there are no faults in the system for long enough for the algorithm to stabilize. Such an assumption does not hold in ad hoc mobile networks, which are characterized by their highly dynamic topology. In this paper, we propose a self-stabilizing leader election algorithm that can tolerate multiple concurrent topological changes. By introducing the time-interval-based computation concept, the algorithm ensures that a network partition converges to a legitimate state within a finite time even if topological changes occur during the convergence time. Our simulation results show that our algorithm can ensure that each node has a leader over 99 percent of the time. We also give an upper bound on the frequency at which network components may merge while still guaranteeing convergence.

Index Terms—Mobile ad hoc networks, self-stabilizing, leader election, concurrent and disjoint computations.
1 INTRODUCTION

An ad hoc network is a collection of mobile nodes forming a temporary network without any form of centralized administration or predefined infrastructure. In such a network, each participating node acts as both a host and a router. Two nodes can communicate if they are within transmission range of each other. Due to node mobility, link breakages and link formations may occur frequently. The failure of links considered critical can split the network into several disjoint network components. Conversely, multiple components can also merge into a single connected component.

Leader election is a fundamental problem in distributed systems and a useful building block, especially in ad hoc networks, where failures are the norm rather than the exception. Leader election is required in many applications, for example, when a mutual exclusion application is blocked because of the failure of a token-holding node. It is also required in group communication services [18], key distribution and management [4], [10], and routing coordination [3], [5], [7], [8], [13], [17].

The classical specification of the leader election problem in distributed systems with a fixed number of nodes states that eventually there is a unique leader [11]. However, due to the characteristics of ad hoc networks, this property cannot be guaranteed. First, the mobility of nodes may lead to frequent network partitioning. Second, we cannot guarantee a unique leader in each network component at all times. When a network partitioning occurs, the network component is without a leader until the partitioning is detected and the leader election process terminates. In the same way, when two network components merge, there are temporarily two leaders in the resulting component. Thus, the definition of the leader election problem should be slightly modified as follows: every connected component will eventually have a unique leader.

The problem of leader election in mobile ad hoc networks is challenging and has received interest in recent years, but few solutions have been proposed. The aim of these solutions is to bring a network affected by topological changes to a state satisfying the following predicate P: for every connected component, there is a unique leader, and each node of the component is aware of this leader.

Malpani et al. [12] have adapted the Temporally Ordered Routing Algorithm (TORA) [15] to elect a unique leader in each network component. Every component creates a leader-oriented directed acyclic graph (DAG). They have presented two DAG-based leader election algorithms: the first for a single topological change and the second for concurrent topological changes. A single topological change means that a new topology change occurs only after the algorithm has terminated the execution triggered by the previous change, whereas concurrent topological changes can occur at any time. In these algorithms, a node that detects that its current leader is no longer reachable elects itself as leader and propagates this information throughout the component. It ceases to be a leader when it detects the existence of another leader of higher priority in its component. The first algorithm is proved to stabilize to a state satisfying P if the network topology stabilizes for a sufficiently long time. The second algorithm is designed to tolerate concurrent topological changes; however, no proof of its correctness has been given.

In [21], the authors have proposed a leader election algorithm based on the diffusing computation concept [6].

A. Derhab is with the Department of Computer Engineering, CEntre de Recherche sur l'Information Scientifique et Technique (CERIST), Algiers 16030, Algeria. E-mail: [email protected], [email protected].
N. Badache is with the Computer Science Department, University of Sciences and Technology Houari Boumediene (USTHB), Algiers 16111, Algeria. E-mail: [email protected].
Manuscript received 7 Aug. 2006; revised 19 Apr. 2007; accepted 3 Oct. 2007; published online 22 Oct. 2007. Recommended for acceptance by J. Hou. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPDS-0220-0806. Digital Object Identifier no. 10.1109/TPDS.2007.70792.
© 2008 IEEE
The leader sends periodic heartbeat messages to the other nodes. The absence of these messages at a node for some predefined timeout indicates disconnection from the leader and triggers a diffusing computation at that node to elect a new leader. The diffusing computation consists of constructing a spanning tree rooted at the node that started the diffusing computation. The root then informs all reachable nodes of the identity of the elected leader (for example, the node with the highest identity). Different diffusing computations can be executed concurrently. A total order on these computations is defined to determine the diffusing computation of highest priority. A node participates in only one diffusing computation at a time: it stops participating in lower-priority diffusing computations in favor of the highest-priority one. The algorithm is guaranteed to reach a state satisfying P if diffusing computations stop in the network. This algorithm also has some disadvantages. First, leader messages need to be periodically broadcast to all nodes in the network to detect partitions. Second, if the leader migrates away, multiple simultaneous computations are triggered at all other nodes in the component. Third, there is an extra overhead in terms of the Probe-Reply exchanges required to handle failures during leader election.

In this paper, we propose a self-stabilizing leader election algorithm that can work in a highly dynamic and asynchronous ad hoc network. The algorithm uses the DAG-based approach to leader election. This approach has the advantage of detecting partitions automatically using the TORA mechanism. It incurs less message overhead than the algorithm in [21]. It is localized, since the knowledge of each node is limited to one hop. We assume that nodes have synchronized clocks, obtained either through a global positioning system (GPS) [9] or through an appropriate algorithm [14], [19], [20].

The paper addresses an open issue that has been tackled by few works: how to guarantee that a self-stabilizing algorithm converges to a legitimate state in a frequently changing ad hoc network. In the literature, an algorithm is self-stabilizing if it converges to a legitimate state in finite time regardless of the initial state, and the system remains in a legitimate state until another topological change occurs.
However, convergence is guaranteed only when the network experiences no topological changes during the convergence time. The solutions presented earlier cannot work in highly dynamic environments like ad hoc networks, because they assume that topological changes stop from some point onward in order for the algorithms to stabilize. When the system experiences a new topological change before completing the convergence, the algorithms restart the convergence to the legitimate state from scratch. In this case, the algorithms might never converge to the legitimate state. Our original contribution is a leader election algorithm that can converge to the legitimate state even if topological changes occur during the convergence time. This is achieved as follows: In Malpani's algorithm, all the computations triggered in response to link failures have the same goal (that is, checking whether the component is separated from the current leader). Therefore, we propose to stop newer computations in favor of the oldest one. In addition, the merge of components triggers a computation in the component whose leader has the lower priority. We consider that an older leader always has a higher priority than a newer one. When concurrent merge computations meet, we propose to stop the computations associated with newer leaders in favor of the oldest one. We also estimate an upper bound on the frequency at which network components can merge while still guaranteeing the convergence of the algorithm to a legitimate state. The rest of the paper is organized as follows: In Section 2, we review Malpani's algorithm and its behavior in the presence of concurrent topological changes.
Section 3 defines the notion of time-interval-based computation. In Section 4, we describe our self-stabilizing leader election algorithm. Section 5 shows simulation results. The correctness proof of the algorithm is given in Section 6. Finally, Section 7 concludes the paper.
2 OVERVIEW OF MALPANI'S ALGORITHM
In [12], the network is arranged as a DAG with the leader node as the sink. A 6-tuple H_i = (lid_i, τ_i, oid_i, r_i, δ_i, i) is associated with each node i; it represents the height of node i in the DAG. Heights are compared lexicographically (where 0 < 1 < 2 ..., and A < B < C ...). Links are oriented from higher to lower heights: for each j in i's neighbors (N_i), the link (i, j) is marked outgoing if H_i is higher than H_j; otherwise, it is marked incoming. lid_i denotes the identifier of the leader of node i's component. The triple (τ_i, oid_i, r_i) denotes the reference level. δ_i and id_i induce the directions on the links among all the nodes with the same reference level.

Fig. 1a depicts a DAG constructed for the leader node L. Initially, node L sets its height to (L, −1, −1, −1, 0, L). It then propagates an Update message in the network. When a node i receives such a message from its neighbor j, it sets its height to (lid_j, 0, 0, 0, δ_j + 1, i). When a node i loses its last outgoing link (that is, becomes a local minimum) as a result of a link failure, it selects a new height such that it becomes the global maximum by defining a new reference level: τ_i is set to the time of the failure, oid_i is set to i, the originator of this reference level, and r_i is set to zero. Then, it sends an Update message containing its new height to its neighbors. A newly defined reference level is higher than any previously defined one, since it is based on the current time. This action results in link reversals that may cause other nodes to lose their last outgoing link. In this manner, the new reference level is propagated outward from the point of the original failure (redirecting links in order to reestablish routes to the leader). This propagation extends only through nodes that, as a result of the initial link failure, have lost all routes to the leader. Fig. 1b shows the state of the network after the failure of link (L, A), and Figs. 1c, 1d, and 1e show the sequence of link reversals performed by nodes A, B, C, D, and E following the failure of (L, A).

A node i changes its height in the following five cases:

Case 1 (propagate). The node has no more outgoing links due to a link reversal following the receipt of an Update packet, and the reference levels of its neighbors are not all equal (for example, nodes B and D in Fig. 1c). It sets its reference level to the largest among all its neighbors and sets δ_i to a value one lower than the minimum δ of all its neighbors with the maximum reference level. Formally,

(τ_i, oid_i, r_i) = max{(τ_j, oid_j, r_j) | j ∈ N_i},
(δ_i, i) = (min{δ_j | j ∈ N_i, (τ_j, oid_j, r_j) = (τ_i, oid_i, r_i)} − 1, i).
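To make the comparison concrete, case 1 can be sketched in Python (a hypothetical rendering; the tuple layout and function name are ours, not the paper's):

```python
# Heights are (lid, tau, oid, r, delta, id) tuples; Python's tuple
# comparison gives exactly the lexicographic order used for heights.
def propagate_case(node_id, nbr_heights):
    """Case 1 sketch: adopt the largest neighbor reference level
    (tau, oid, r) and set delta one below the smallest delta among the
    neighbors carrying that level."""
    max_rl = max((t, o, r) for (_, t, o, r, _, _) in nbr_heights)
    min_delta = min(d for (_, t, o, r, d, _) in nbr_heights
                    if (t, o, r) == max_rl)
    tau, oid, r = max_rl
    return (tau, oid, r, min_delta - 1, node_id)
```

For instance, given neighbor heights carrying reference levels (3, 4, 0) and (0, 0, 0), the node adopts (3, 4, 0) and undercuts the smallest δ seen with that level.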
Fig. 1. An execution of Malpani's algorithm. (a) t = 0. (b) t = 1. (c) t = 2. (d) t = 3. (e) t = 4. (f) t = 8. (g) t = 11. (h) t = 13. (i) t = 14. (j) t = 18. (k) t = 22.
Case 2 (reflect). The node has lost its outgoing links due to a link reversal following the receipt of an Update packet, and the reference levels of its neighbors are equal, with r equal to zero (for example, node C in Fig. 1c). The node then reflects back the reference level by setting r_i to one and δ_i to zero. The new reference level, referred to as the reflected reference level, is propagated back toward the node that originally defined the reference level.

Case 3 (detect). The node has lost its outgoing links due to a link reversal following the receipt of an Update packet, and all of i's neighbors have the same reflected reference level with oid = i. Node i has then detected a partition. In this case, node i elects itself as the leader and starts a DAG propagation to inform the other nodes in the new component about the new leader.

Case 4 (generate). The node has lost its last outgoing link due to a link reversal following the receipt of an Update packet, and all of i's neighbors have the same reflected reference level with oid ≠ i. Because the node did not define the reference level itself, this is not necessarily an indication of a partitioning of the component. Therefore, the node starts a new reference level: it sets its reference level to (t, i, 0) and δ_i to zero, where t is the current time.

Case 5 (merge). The node receives an Update packet from j such that lid_j < lid_i. Then, H_i = (lid_j, τ_j, oid_j, r_j, δ_j + 1, i).
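The four link-reversal cases can be condensed into a small decision function (our paraphrase, under the assumption that the node has just lost its last outgoing link; case 5, merge, is triggered by an Update carrying a smaller leader identifier and is omitted here):

```python
def reversal_case(i, nbr_rls):
    """nbr_rls: the (tau, oid, r) reference levels of i's neighbors,
    inspected after i has lost its last outgoing link."""
    if len(set(nbr_rls)) > 1:
        return "case 1: propagate"          # unequal reference levels
    tau, oid, r = nbr_rls[0]
    if r == 0:
        return "case 2: reflect"            # equal, not yet reflected
    if oid == i:
        return "case 3: detect partition"   # own reflected level came back
    return "case 4: generate"               # someone else's reflected level
```

On the Fig. 1 example, node C with all neighbors at (1, A, 0) falls into case 2, while node D seeing its own reflected level (8, D, 1) everywhere falls into case 3.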
Fig. 1 shows an example of the execution of Malpani's algorithm, with time increasing from Fig. 1a to Fig. 1k. In this figure, thick arrows indicate the direction of Update messages, and thin arrows represent the constructed DAG. Because of node mobility, node L becomes disconnected from the rest of the nodes, and the graph in Fig. 1a changes to that shown in Fig. 1b. Node A detects at time t_1 = 1 that it has lost its last outgoing link toward its leader node L; it then defines a new reference level ref_1 = (t_1, A, 0) and broadcasts an Update packet to its neighbors. Fig. 1c shows the state of the graph after nodes B, C, and D have received the Update message sent by A. The reference level of A, ref_1, is now higher than that of its neighbors, so the Update message has the effect of reversing the links to B, C, and D. Nodes B and D execute case 1, and node C executes case 2. At time t_2 = 2, node F loses its last outgoing link. Therefore, it defines a new reference level ref_2 = (t_2, F, 0) and sends an Update message to node G. Fig. 1d depicts the state of the graph at a later time, after E has received ref_1 from nodes B and C and after G has received ref_2 from node F. In the figure, node E changes its height according to case 1 and sends ref_1 to node G, whereas node G takes no action since it still has an outgoing link. In Fig. 1e, node G loses its last outgoing link following the reception of ref_1. Therefore, it executes case 1 and sets its reference level to the largest among all its neighbors (that is, ref_2).
Fig. 2. Time diagram of Malpani's algorithm execution.
Fig. 1f shows the state of the graph after ref_2 has been propagated from node G to node C. Nodes E, D, B, and A have executed case 1, and node C has executed case 2. At time t_3 = 8, node D loses its last outgoing link (D, A). Therefore, it defines a new reference level ref_3 = (t_3, D, 0) and sends an Update message to its neighbor E. The Update message has the effect of reversing the link (D, E), as shown in Fig. 1g. Fig. 1g also shows the state of the graph after the reflected reference level ref_2 has been propagated from node A to node E. Upon reception of ref_2, node E loses its last outgoing link. As t_3 > t_2, E sets its reference level to ref_3. In Fig. 1h, ref_3 is propagated until it reaches node F and node A. Node F executes case 2 and sets its reference level to (t_3, D, 1). In Fig. 1i, node C executes case 2 and sets its reference level to (t_3, D, 1) in response to the ref_3 propagation. Node F propagates the reflected reference level of ref_3 toward node D. Fig. 1j shows the state of the graph after the reflected reference level of ref_3 has been propagated back from nodes F and C toward node D. At time t = 18, node D finds that all its neighbors have the same reflected reference level with oid = D, so it has detected the partition. It then executes case 3, elects itself as leader, and propagates its height in the new component, which results in the creation of the D-DAG, as shown in Fig. 1k.

To detect network partitioning, a reference level must pass through all nodes of the component in two phases: 1) a forward phase, in which nodes propagate the reference level with r = 0, and 2) a backward phase, in which nodes propagate the reflected reference level with r = 1. This leads to a time complexity of O(2d), where d is the diameter of the network component. Fig. 2 shows the execution of the example of Fig. 1 on a time axis. Solid lines represent the time intervals during which the reference levels are propagated. Dotted lines represent the remaining time the algorithm would have needed to complete its execution if no further link failures had occurred. The reference level ref_1 was started at time t_1 = 1 (see Fig. 1b) and was stopped by node G at time t = 4 (see Fig. 1e) due to the existence of ref_2, which was started at time t_2 = 2. If ref_2 had not occurred, the algorithm would have detected the partitioning at time t = 7, since the diameter of the component at time t = 1 was d = 3. In the same way, the ref_2 propagation, which was supposed to terminate at time t = 13, was stopped by ref_3 at time t = 11 (Fig. 1g). The ref_3 propagation terminated at time t = 18. Node D, which detected the network partitioning, created a new DAG, which took 4 units of time to complete. From this example, we can conclude that each new reference level spoils all the effort made before detecting network partitioning. The algorithm never terminates if reference level generations do not stop. Therefore, the execution of the partition detection algorithm is not bounded, and nodes can be without a leader for an undetermined time.
3 THE SELF-STABILIZING LEADER ELECTION ALGORITHM
3.1 Definitions and Assumptions
The network is modeled as a graph G = (U, V), where U represents the set of mobile nodes and V represents the set of edges. A set of connected nodes is called a network component. The network component to which a node i belongs is denoted by C_i. We make the following assumptions about the nodes and the network:

1. Communication links are bidirectional and FIFO. However, we can use the method presented in [16], which allows protocols primarily designed for bidirectional networks to work with unidirectional links.
2. Each node has a sufficiently large receive buffer to avoid buffer overflow.
3. No assumption is made about the number of nodes or the diameter of the network.

The state of a node is defined by the values of its local variables. A configuration of a distributed system G = (U, V) is an instance of the states of its nodes. The set of configurations of G is denoted by C. A computation of a system G is an infinite sequence of configurations. Node actions change the global system configuration, and several node actions may occur at the same time. The actions of each node i are of the form <guard_i> → <command_i>. The guard <guard_i> of each action is a Boolean expression over the state of i and its neighborhood. <command_i> is a list of assignment statements and primitives such as message broadcasts. We assume that a guarded action is executed atomically: the evaluation of the guard and the execution of the command form one atomic step.
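The guarded-action model can be mimicked with a lock that makes guard evaluation and command execution one atomic step (a minimal sketch; the class and field names are ours, not the paper's):

```python
import threading

class GuardedNode:
    """Executes <guard> -> <command> pairs atomically over a local state."""
    def __init__(self, state):
        self.state = dict(state)
        self._lock = threading.Lock()   # models guard+command atomicity
        self.actions = []               # list of (guard, command) pairs

    def step(self):
        """Fire the first enabled action, if any; report whether one fired."""
        with self._lock:
            for guard, command in self.actions:
                if guard(self.state):
                    command(self.state)
                    return True
        return False
```

For example, an action whose guard is "I have no leader" and whose command elects the node itself fires exactly once and is then disabled.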
Definition 1. Legitimate configuration of leader election. A configuration is a legitimate configuration of the leader election problem for a network component C if and only if (iff) for every node i ∈ C there exists l such that lid_i = l, and C is a DAG in which every node i in C except the leader l has a directed path to l. In this case, the component is said to be in a stable state; otherwise, it is in an unstable state.

Definition 2. Self-stabilizing leader election algorithm. An algorithm is a self-stabilizing leader election algorithm if any execution starting from an arbitrary configuration eventually reaches a legitimate configuration.

Definition 3. A node in a DAG is said to be uncertain if it loses all its outgoing links due to either link failures or link reversals, because such a node does not yet know whether a new route toward the leader will be found. Otherwise, it is said to be certain.

Definition 4. The subgraph induced by a set of connected certain nodes is called a certain subgraph; the subgraph induced by a set of connected uncertain nodes is called an uncertain subgraph.

Definition 5. A node in an uncertain subgraph that is adjacent to a node in a certain subgraph is called a frontier node, and a node in a certain subgraph that is adjacent to a frontier node is called a border node.
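Definition 1 can be checked mechanically. The sketch below is our hypothetical helper: it tests that all nodes agree on one leader and that every non-leader reaches it along outgoing links (acyclicity of the orientation is assumed rather than verified):

```python
def is_legitimate(nodes, lid, out_edges):
    """nodes: node ids; lid: node -> believed leader;
    out_edges: node -> set of successors along outgoing links."""
    leaders = {lid[n] for n in nodes}
    if len(leaders) != 1:
        return False                    # not all nodes agree on a leader
    l = leaders.pop()
    for n in nodes:
        if n == l:
            continue
        seen, stack = set(), [n]        # DFS along outgoing links
        while stack:
            u = stack.pop()
            if u == l:
                break                   # a directed path to l exists
            if u not in seen:
                seen.add(u)
                stack.extend(out_edges.get(u, ()))
        else:
            return False                # exhausted without reaching l
    return True
```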
3.2 Basic Idea
As shown in Section 2, Malpani's algorithm does not stabilize if the generation of reference levels does not stop. Consider the situation where multiple reference levels propagate during the same time period. In case 1, the most recent one always has a higher priority than the others. In addition, case 4 chooses to generate a new reference level because the node does not know whether the component is partitioned. Each new reference level spoils all the effort made before to converge to a legitimate state and to recover from earlier link failures. Moreover, case 5 does not define a time-based order on the different DAG propagations that occur in a component. These situations raise the question of how the algorithm could stabilize in the presence of nonstop concurrent topology changes, or how the effect of topological changes that occur during the convergence time could be discarded.

As all the reference level propagations triggered in response to link failures have the same goal (that is, checking whether the component is separated from the current leader), we can intuitively propose to change the criterion used to order reference levels and consider the oldest reference level to have the highest priority instead of the most recent one. A reference level ref_j is higher than another ref_i iff τ_j < τ_i. Roughly speaking, a reference level is stopped when it collides with an older one. The oldest-first propagation criterion incurs less message overhead, since the surviving propagation is closer to completion than under the criterion adopted by Malpani's algorithm. However, this approach alone is insufficient, because we cannot distinguish between concurrent computations and disjoint ones, and hence we can make false decisions about network partitioning. It is not sufficient to know only the time at which a reference level was generated, because we cannot determine whether two reference levels are still propagating or whether one of them has already terminated. Therefore, we propose that each node record its knowledge about the age of each reference level it meets, that is, the time when the reference level was started and the time when the node last received it. Hence, each node can decide whether two reference levels are disjoint or concurrent. We change case 1 as follows: if the reference levels are disjoint, the node stops propagating the older reference levels; otherwise, it stops propagating the newer reference levels in favor of the oldest one. In case of a merge of components, we also propose to stop the DAG propagations associated with newer leaders in favor of the oldest one.
3.3 Time-Interval-Based Computations
In distributed systems, computations have durations: they can be specified by the instants at which they begin and end. In temporal logic, interval-based temporal logics are more expressive than instant-based ones, since they can describe events that occur in the system over time intervals. We define an interval as an ordered pair [t, u] of instants, where t < u. If I = [t, u], we write Begin(I) = t and End(I) = u. This notation allows us to refer to the instants marking the beginning and the end of any interval. A single time instant t can also be written as an interval [t, t].

Allen [1] has defined 13 basic binary relations between time intervals, six of which are inverses of the other six: before and after, overlaps and overlapped-by, starts and started-by, finishes and finished-by, during and contains, meets and met-by, and equals (see Fig. 3). Two time intervals A and B meet (or B is met-by A) iff A precedes B, there is no period between A and B, and A and B do not overlap [2]. Let Meets(A, B) be a predicate that evaluates to true if A meets B. The predicate Before(A, B) (or After(B, A)) holds true if there exists another period that spans the time between them. Two time intervals A and B are disjoint if they do not overlap in any way. The predicate Overlaps(A, B) (or Overlapped-by(B, A)) holds true if A starts before B and B ends after A. The predicate Equals(A, B) holds true if A and B both start and end at the same times. The predicate During(A, B) (or Contains(B, A)) holds true if A has a later starting point and an earlier ending point than B. The predicate Starts(A, B) (or Started-by(B, A)) holds true if A has the same starting point as B but is contained in B. The predicate Finishes(A, B) (or Finished-by(B, A)) holds true if A has the same ending point as B but is contained in B.

Fig. 3. Possible relationships between intervals and instants. (a) A before B. (b) A meets B. (c) A overlaps B. (d) A equals B. (e) A starts B. (f) A finishes B. (g) A during B.

Having presented the relations between intervals, we now define the relations that exist between an instant t and an interval I. Fig. 3h shows the five possible relations. The predicates Precedes(t, I) and Follows(t, I) hold true if t < Begin(I) and t > End(I), respectively. The predicate Divides(t, I) holds true if Begin(I) < t < End(I). Limit(t, I) holds true if either t = Begin(I) or t = End(I). In Fig. 3h, we have Begin(A) = B_2 and End(A) = B_4, and the predicates Precedes(B_1, A), Follows(B_5, A), and Divides(B_3, A) hold true.
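Written out directly, the interval and instant predicates above are one-liners over (begin, end) pairs (a straightforward transcription; intervals are assumed well-formed with begin ≤ end):

```python
# Interval-interval relations (Allen [1]); a, b are (begin, end) pairs.
def before(a, b):    return a[1] < b[0]          # a gap separates a and b
def meets(a, b):     return a[1] == b[0]         # adjacent, no gap
def overlaps(a, b):  return a[0] < b[0] < a[1] < b[1]
def equals(a, b):    return a[0] == b[0] and a[1] == b[1]
def starts(a, b):    return a[0] == b[0] and a[1] < b[1]
def finishes(a, b):  return a[1] == b[1] and b[0] < a[0]
def during(a, b):    return b[0] < a[0] and a[1] < b[1]

# Instant-interval relations; t is an instant, i an interval.
def precedes(t, i):  return t < i[0]
def follows(t, i):   return t > i[1]
def divides(t, i):   return i[0] < t < i[1]
def limit(t, i):     return t == i[0] or t == i[1]
```

The inverse relations (after, met-by, overlapped-by, started-by, finished-by, contains) are obtained by swapping the arguments.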
4 ALGORITHM DESCRIPTION
Each node i maintains two variables: the partition index and the height. The partition index (denoted PI_i) is a 3-tuple (Certain_i, Tc_i, lid_i). lid_i is the identifier of the node considered to be the leader by node i. Certain_i is a Boolean variable whose value is one if node i is in a certain state and zero otherwise. Tc_i denotes the time at which node lid_i started the creation of its DAG. A new partition index is started by node i when it detects a network partitioning. The height is H_i = (ERL_i, δ_i, id_i), where ERL_i is the extended reference level. The latter is a 3-tuple ([Tb_i, Te_i], oid_i, r_i). We call the triple (Tb_i, oid_i, r_i) a reference level, denoted RL_i. The interval [Tb_i, Te_i] is called the reference level interval of i (denoted RLI_i) and represents the knowledge of node i about the time period during which the reference level has been propagating. The definitions of oid_i, r_i, δ_i, and id_i are unchanged.
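The two per-node variables translate naturally into record types (a hypothetical rendering; the field names transliterate the paper's symbols):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class PartitionIndex:
    certain: int            # Certain_i: 1 if node i is certain, else 0
    tc: float               # Tc_i: time lid_i started creating its DAG
    lid: int                # lid_i: identifier of the believed leader

@dataclass
class Height:
    rli: Tuple[float, float]  # [Tb_i, Te_i]: reference level interval
    oid: int                  # originator of the reference level
    r: int                    # reflection bit
    delta: int                # delta_i
    node_id: int              # id_i

    @property
    def rl(self):
        """The reference level RL_i = (Tb_i, oid_i, r_i)."""
        return (self.rli[0], self.oid, self.r)
```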
Fig. 4. An example of executing the self-stabilizing leader election algorithm. (a) t = 1. (b) t = 2. (c) t = 3. (d) t = 4. (e) t = 5. (f) t = 8. (g) t = 9. (h) t = 10. (i) t = 12.
4.1 Ordering of Time Intervals and Link Orientation

Definition 6. Two time intervals I and J intersect iff I ∩ J ≠ ∅.

Definition 7. Two time intervals I and J T-intersect, and we write I T J, if there exists a chain of l intersections such that the following statement holds true:

(I ∩ I_1) ∧ (I_1 ∩ I_2) ∧ ... ∧ (I_{l−2} ∩ I_{l−1}) ∧ (I_{l−1} ∩ J).

It is easy to show that the binary relation T on a set S of time intervals is an equivalence relation.

Definition 8. For the equivalence relation T, the set of elements of S that are related to a time interval, say, a, of S is called the equivalence class of a, denoted S_a, such that S_a = {x ∈ S : x = a ∨ a T x}. The set of equivalence classes of the equivalence relation T on a set S forms a partition of S.

Definition 9. {S_{I_1}, S_{I_2}, ..., S_{I_n}} is a partition of S iff

1. S_{I_i} ≠ ∅, 1 ≤ i ≤ n.
2. S_{I_i} ∩ S_{I_j} = ∅ if S_{I_i} ≠ S_{I_j}, 1 ≤ i, j ≤ n.
3. ∪_{i=1}^{n} S_{I_i} = S.

Definition 10. An equivalence class S_a is said to be more recent than S_b iff ∀I ∈ S_a, ∀J ∈ S_b : End(J) < Begin(I).

Definition 11. Let us consider two time intervals I and J, both belonging to the same equivalence class. I is said to be older than J if the following holds:

(Meets(I, J) ∨ Overlaps(I, J) ∨ Finished-by(I, J) ∨ Contains(I, J) ∨ Limit(J, I) ∨ Divides(J, I)).

In the DAG, a link (i, j) is oriented from i to j according to the following conditions:

1. Nodes i and j are both certain and δ_i > δ_j (that is, (D, A) in Fig. 4a).
2. Both nodes are uncertain, in which case we check their respective RLIs:
   - If S_{RLI_i} is more recent than S_{RLI_j}, that is, After(RLI_i, RLI_j) holds true (for example, (B, E) in Fig. 4c).
   - If RLI_i and RLI_j belong to the same equivalence class and RLI_i is older than RLI_j (for example, (G, F) in Fig. 4e).
   - If RLI_i and RLI_j are not related to the same reference level and Started-by(RLI_i, RLI_j) holds true.
   - If RLI_i and RLI_j are related to the same reference level and (r_i = 1 ∧ r_j = 0) (for example, (E, B) in Fig. 4f).
   - If RLI_i and RLI_j are related to the same reference level, r_i = r_j, and Starts(RLI_i, RLI_j) holds true (for example, (B, E) in Fig. 4d and (F, G) in Fig. 4f).
   - If Equals(RLI_i, RLI_j) holds true and (oid_i, r_i, δ_i, id_i) > (oid_j, r_j, δ_j, id_j).
3. Node i is uncertain and node j is certain (for example, (A, D) in Fig. 4b).
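Definitions 6-11 suggest a direct computation: group intervals into classes of chained intersection, then compare classes by recency (the grouping code below is our sketch; only the predicates themselves come from the text):

```python
def intersects(i, j):
    """Def. 6: I and J intersect iff their overlap is nonempty."""
    return max(i[0], j[0]) <= min(i[1], j[1])

def equivalence_classes(intervals):
    """Defs. 7-9: merge intervals linked by a chain of intersections."""
    classes = []
    for iv in intervals:
        hit = [c for c in classes if any(intersects(iv, j) for j in c)]
        merged = [iv]
        for c in hit:               # a new interval may bridge several
            merged.extend(c)        # existing classes into one
            classes.remove(c)
        classes.append(merged)
    return classes

def more_recent(sa, sb):
    """Def. 10: every interval of sa begins after every one of sb ends."""
    return all(j[1] < i[0] for i in sa for j in sb)
```

For instance, [0, 2] and [1, 3] chain into one class, while [5, 6] forms a strictly more recent class of its own.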
4.2 Initialization
Initially, each node i whose height is null (that is, every component of H_i except the identifier i is undefined) can start the construction of a DAG. It sets PI_i to (1, t, i) and H_i to ([0, 0], 0, 0, 0, i), where t is the current local time. Then, it broadcasts a CreateDag(PI_i, H_i) message to its neighbors. Upon receiving such a message, each node j whose height is null sets PI_j to PI_i and H_j to ([0, 0], oid_i, r_i, δ_i + 1, j). Multiple DAG constructions can be triggered at the same time. DAG construction is depicted in Algorithm 1.

Algorithm 1. DAG construction
Action 1: (i wants to be a leader ∧ H_i = null) →
1: PI_i := (1, t, i);
2: H_i := ([0, 0], 0, 0, 0, i);
3: broadcast CreateDag(PI_i, H_i);
Action 2: (i receives CreateDag(PI_j, H_j) ∧ H_i = null) →
1: PI_i := PI_j;
2: H_i := ([0, 0], oid_j, r_j, δ_j + 1, i);
3: broadcast CreateDag(PI_i, H_i);
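Algorithm 1 amounts to a flood from the initiating node. The toy below simulates it with synchronous rounds (our simulation harness; the actual algorithm is asynchronous and message-driven):

```python
def build_dag(adj, leader, t=0):
    """Simulate Algorithm 1: `leader` starts a CreateDag flood over the
    adjacency map `adj`; every null-height node adopts the sender's
    partition index and a delta one larger than the sender's."""
    PI = {leader: (1, t, leader)}
    H = {leader: ((0, 0), 0, 0, 0, leader)}   # ([0,0], oid, r, delta, id)
    frontier = [leader]
    while frontier:
        nxt = []
        for j in frontier:
            (_, oid, r, delta, _) = H[j]
            for i in adj[j]:
                if i not in H:                # height still null
                    PI[i] = PI[leader]
                    H[i] = ((0, 0), oid, r, delta + 1, i)
                    nxt.append(i)
        frontier = nxt
    return PI, H
```

On a chain L - A - B, the flood assigns δ = 1 to A and δ = 2 to B, and all nodes share L's partition index.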
4.3 Concurrent Reference Level Propagation
Algorithm 2 consists of two actions. Action 3 shows the reaction of a certain and an uncertain node to link failures, and Action 4 shows the instructions performed by a node in response to link reversals. Contrary to Malpani's algorithm, we start a new reference level only if a certain node loses all its outgoing links. When an uncertain node loses all its outgoing links, it does not need to generate a new reference level that traverses uncertain subgraphs, which probably have no paths to the leader and have already been traversed by previous reference levels. Instead, it selects one of the existing reference levels to resume its propagation. This choice avoids extra message overhead and time, and it helps to quickly detect network partitioning if it has occurred. When a certain node i loses its last outgoing link, it creates a new reference level: it sets Certain_i to zero and ERL_i to ([t, t], i, 0), and then broadcasts Failure(PI_i, H_i) to its neighbors. When an uncertain node i loses its last outgoing link, it invokes the HandleUncertainReverse procedure. The procedure distinguishes four conditions. First, if i's neighbors do not all have the same reference level, it sets RL_i to the maximum reference level among its neighbors. It calculates, using the function max, the reference level interval (RLI)
with the highest priority. This function determines the most recent equivalence class S and then returns the reference level with the oldest RLI among the intervals belonging to S. Second, if all neighbors have the same reference level with r = 0, node i reflects this reference level by setting r_i to one; this means that the link from which it was supposed to receive a reflected reference level has been broken. Third, if all neighbors have the same reflected reference level with an originator other than i (that is, no route between i and the originator of the reference level is available; this situation is likely to happen when network partitioning occurs in the network component where the reflected reference level is propagating), the node elects itself as leader and propagates a CreateWeakDag(PI_i, H_i, old(lid_i), old(Tc_i)) message. This propagation constructs what we call a weak DAG, which will be explained later in this paper. Fourth, if all neighbors have the same reflected reference level with i as the originator, node i detects network partitioning and starts the creation of a new DAG. In Action 4, when node i receives a Failure message from its neighbor j (that is, a reference level is propagating), it checks whether it still has outgoing links. If so, the reference level is stopped; otherwise, it invokes the HandleUncertainReverse procedure. Fig. 4 shows an example of executing the proposed self-stabilizing leader election algorithm on the network in Fig. 1. In Fig. 4a, node A loses at time t_1 = 1 its last outgoing link toward its leader L. Therefore, it defines a new reference level ref1 = (t_1, A, 0) and broadcasts its height to its neighbors. Fig. 4b shows the state of the graph after nodes B, C, and D have received the Update message sent by A. Nodes B and D execute condition 1, and node C executes condition 2. At time t_2 = 2, node F loses its last outgoing link.
Therefore, it defines a new reference level (that is, ref2 = (t_2, F, 0)) and sends an Update message to node G. Fig. 4c depicts the state of the graph after ref1 has been sent from nodes B and C to node E. Node E changes its height according to condition 1 and sends its height to G. In Fig. 4d, node G loses its last outgoing link following the reception of ref1 from node E. Therefore, it executes condition 1 and sets its reference level to the one with the highest priority among all its neighbors, that is, ref1. In Fig. 4e, node F executes condition 2 and sets its reference level to (t_1, A, 1). Fig. 4f shows the state of the graph after the reflected reference level ref1 has been propagated from node F and received by node B. At time t_3 = 8, the uncertain node D loses its last outgoing link (D, A). Node D executes condition 3, elects itself as a leader, and creates a weak DAG. In Fig. 4g, node A finds that all its neighbors have the same reflected reference level with oid = A, so it has detected the partition. It then executes condition 4, elects itself as a leader, and propagates its height in the new component, which results in the creation of A-DAG consisting of nodes A, B, and C, as shown in Fig. 4h.
Algorithm 2. Reference level propagation
Action 3: (i loses all its outgoing links) →
1: if (N_i = ∅) then
2:   PI_i := (1, t, i);
3:   H_i := ([0, 0], 0, 0, 0, i);
4:   broadcast CreateDag(PI_i, H_i); {i constructs a new DAG}
5: else
6:   if Certain_i = 1 then
7:     Certain_i := 0;
8:     ERL_i := ([t, t], i, 0);
9:     δ_i := 0;
10:    broadcast Failure(PI_i, H_i);
11:  else
12:    HandleUncertainReverse();
13:  end if
14: end if
Action 4: (node i has no outgoing links due to link reversal following reception of Failure(PI_j, H_j)) →
1: Certain_i := 0;
2: HandleUncertainReverse();
Procedure HandleUncertainReverse()
1: if neighbors do not have the same reference level {Condition 1} then
2:   RL_i := RL_k such that RLI_k = max({RLI_j | j ∈ N_i} ∪ {RLI_i});
3:   Te_i := t;
4:   δ_i := min{δ_j | j ∈ N_i and RL_j = RL_i} − 1;
5:   broadcast Failure(PI_i, H_i);
6: else if all neighbors have the same reference level with r = 0 {Condition 2} then
7:   ERL_i := ([Tb_j, t], oid_j, 1);
8:   δ_i := 0;
9:   broadcast Failure(PI_i, H_i);
10: else if all neighbors have the same reference level with r = 1 and oid ≠ i {Condition 3} then
11:   old(lid_i) := lid_i; old(Tc_i) := Tc_i;
12:   PI_i := (1, t, i);
13:   H_i := ([0, 0], 0, 0, 0, i);
14:   broadcast CreateWeakDag(PI_i, H_i, old(lid_i), old(Tc_i)); {i constructs a weak DAG}
15: else if all neighbors have the same reference level with r = 1 and oid = i {Condition 4} then
16:   PI_i := (1, t, i);
17:   H_i := ([0, 0], 0, 0, 0, i);
18:   broadcast CreateDag(PI_i, H_i); {i constructs a new DAG}
19: end if
Algorithm 3. Merging of two network partitions
Action 5: ((node i detects a new neighbor j ∨ receives CreateDag(PI_j, H_j)) ∧ (lid_j ≠ lid_i)) →
1: if PI_j ≻ PI_i then
2:   PI_i := PI_j;
3:   H_i := ([Tb_j, Te_j], oid_j, r_j, δ_j + 1, i);
4:   broadcast CreateDag(PI_i, H_i);
5: else
6:   stop j-DAG propagation
7: end if
Action 6: ((node i receives CreateWeakDag(PI_j, H_j, old(lid_j), old(Tc_j))) ∧ (lid_j ≠ lid_i)) →
1: if ((old(lid_j) = lid_i ∧ old(Tc_j) = Tc_i) ∧ (Certain_i = 1 ∨ (Certain_i = 0 ∧ r_i = 0))) then
2:   broadcast FakeLeader(lid_j, Tc_j);
3: else if (PI_j ≻ PI_i) then
4:   PI_i := PI_j;
5:   H_i := ([Tb_j, Te_j], oid_j, r_j, δ_j + 1, i);
6:   broadcast CreateWeakDag(PI_i, H_i, old(lid_j), old(Tc_j));
7: else
8:   broadcast CreateDag(PI_i, H_i);
9: end if
Action 7: (node i receives FakeLeader(lid_k, Tc_k) from node j) →
1: if (lid_i = lid_k ∧ Tc_i = Tc_k) then
2:   PI_i := PI_j;
3:   H_i := ([Tb_j, Te_j], oid_j, r_j, δ_j + 1, i);
4:   broadcast FakeLeader(lid_k, Tc_k);
5: end if
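The four-way dispatch of HandleUncertainReverse can be summarized schematically. In this sketch a reference level is reduced to an (oid, r) pair; Condition 1's selection of the maximum-priority level over the RLIs is deliberately not modeled:

```python
from collections import namedtuple

# RL reduced to (originator id, reflection bit); illustrative only.
RL = namedtuple("RL", "oid r")

def handle_uncertain_reverse(node_id, neighbor_rls):
    rls = set(neighbor_rls)
    if len(rls) > 1:
        # Condition 1: neighbors disagree; adopt the max-priority level
        # and resume its propagation instead of starting a new one.
        return "condition1: adopt max-priority RL and keep propagating"
    rl = rls.pop()
    if rl.r == 0:
        # Condition 2: reflect the common, unreflected reference level.
        return "condition2: reflect the reference level (set r := 1)"
    if rl.oid != node_id:
        # Condition 3: reflected level from another originator; elect
        # self tentatively and build a weak DAG.
        return "condition3: elect self tentatively, build a weak DAG"
    # Condition 4: i is the originator -> partition detected, new DAG.
    return "condition4: partition detected, build a new DAG"
```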
4.4 Merging of Multiple Network Partitions
Algorithm 3 consists of actions dealing with the propagation of a new leader throughout a network partition. As explained in Section 4.3, the node detecting network partitioning propagates a CreateDag message if it is the originator of the reference level or a CreateWeakDag message otherwise. In Action 5, when a node i detects the existence of a new network partition or receives a CreateDag message from node j such that lid_i ≠ lid_j, it will adopt lid_j as a leader if the partition index of j has a higher priority than that of i, written PI_j ≻ PI_i.
Definition 12. PI_j ≻ PI_i ⟺ ((Certain_j > Certain_i) ∨ ((Certain_j = Certain_i) ∧ ((Tc_j, lid_j) ≻ (Tc_i, lid_i)))), where (Tc_j, lid_j) ≻ (Tc_i, lid_i) ⟺ (Tc_j < Tc_i) ∨ ((Tc_j = Tc_i) ∧ (lid_j > lid_i)).
By Definition 12, a node first checks the value of the Certain variable: it prefers a node that is sure to have a directed path toward its leader and is not involved in link reversal actions. It then checks the DAG generation time and the leader identifier. The condition PI_j ≻ PI_i covers three cases:
1. Node j is certain, and i is uncertain.
2. Nodes i and j have the same value of Certain, and j's partition was created earlier than i's.
3. Nodes i and j have the same value of Certain and the same DAG creation time, and lid_j > lid_i.
In Fig. 4g, node A, which detects network partitioning, sets PI_A to (1, 9, A). Then, it propagates the CreateDag message. Upon receiving such a message, the nodes of the disconnected partition, which are uncertain nodes (for example, nodes B and C in Fig. 4h), adopt A as their leader. If the condition presented in (HandleUncertainReverse: line 10) holds true, node i declares itself a leader. As i is not the originator of the reference level, this declaration may be false, because node i and the originator of the reference level may still be part of the same network partition. In this case, i creates a weak DAG by generating a CreateWeakDag message that contains, besides its partition index and its height, the identifier of its previous leader and its previous Tc (for example, node D in Fig. 4g). The weak DAG differs from the ordinary DAG in that its propagation can be stopped by an uncertain node. In Action 6, when an uncertain node receives the CreateWeakDag message from node j, it first checks whether the condition
Fig. 5. Time diagram of the execution of the proposed algorithm.
((old(lid_j) = lid_i ∧ old(Tc_j) = Tc_i) ∧ (Certain_i = 1 ∨ (Certain_i = 0 ∧ r_i = 0))) holds true. The first part of the condition means that nodes lid_i and lid_j were part of the same DAG, and the second part means that either node i still has a path toward lid_i or i has not yet reflected its reference level (that is, the reference level is in the forward phase). This indicates that network partitioning has not occurred and, hence, the decision of creating a new DAG by lid_j was not correct. Node lid_j is called in this case a fake leader. Node i then broadcasts a FakeLeader(lid_j, Tc_j) message. If lid_j is not a fake leader, node i will adopt lid_j as a leader provided that PI_j ≻ PI_i. As D-DAG is older than A-DAG, D will become the unique leader of the component, as shown in Fig. 4i. In Action 7, each node that finds out that its leader is fake, following the reception of a FakeLeader message from its neighbor j, adopts lid_j as a leader and broadcasts in turn the FakeLeader message. In this manner, a corrective action is propagated, resulting in the deletion of the fake leader from all the nodes.
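Definition 12's ordering on partition indices can be transcribed directly. In this sketch a partition index is modeled as a (Certain, Tc, lid) triple; the field layout is an assumption:

```python
# Definition 12: PI_j takes priority over PI_i.
def pi_greater(pi_j, pi_i):
    certain_j, tc_j, lid_j = pi_j
    certain_i, tc_i, lid_i = pi_i
    if certain_j != certain_i:
        return certain_j > certain_i   # a certain node wins
    if tc_j != tc_i:
        return tc_j < tc_i             # the older DAG wins
    return lid_j > lid_i               # tie-break on leader id
```

For the scenario of Figs. 4h and 4i, pi_greater((1, 8, 'D'), (1, 9, 'A')) holds because D-DAG's creation time (Tc_D = 8) is older, which is why D-DAG survives and A-DAG is deleted.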
4.5 Time Diagram of the Execution of the Self-Stabilizing Leader Election Algorithm
Fig. 5 shows a time diagram of the self-stabilizing leader election algorithm execution related to the example in Fig. 4. The topology change scenario of this example is the same as the one presented in Fig. 1. The reference level ref1 that starts at time t_1 = 1 blocks the newer reference level ref2 generated at time t_2 = 2, since their reference level intervals belong to the same equivalence class. A weak DAG is generated by node D at time t_3 = 8; the network partitioning is detected by node A at time t = 9. The new DAG construction is terminated at time t = 12. A-DAG is deleted when it detects the existence of D-DAG with Tc_D = 8. The policy that selects the DAG with the oldest
value of creation time helps to block the propagation of any newer DAG created in the same network component. In Malpani's algorithm, the same scenario needs 21 units of time to recover from three link failures, detect network partitioning, and construct a leader-oriented DAG, whereas our algorithm needs only 11 units of time. The proposed self-stabilizing leader election algorithm thus shows better performance than Malpani's algorithm in terms of convergence time and message overhead.
5 SIMULATION RESULTS
In this section, we discuss the performance of our self-stabilizing leader election algorithm compared to Malpani's algorithm by using the GloMoSim simulator [22]. In our experimental results, each plotted point represents the average of five executions, and we plot the 95 percent confidence interval on the graphs. Our simulation environment is an area of 1,000 m × 1,000 m with random initial node locations and a random waypoint mobility model, in which each mobile node selects a random destination, moves toward it at a speed uniformly distributed between zero and a certain maximum speed V_max, and then stays stationary during a pause time of 1 second before moving to a new random location. The transmission power is set to 12 dBm, which is equivalent to a transmission range of 324 m. We evaluate the performance of the algorithms using the following three metrics:
1. Fraction of stabilization time. The fraction of time during which a node i has at least a directed path to its leader lid_i and lid_i is the leader of the network component, that is, the network component C_i is in a stable state.
2. Convergence time. The mean time elapsed between the instant at which the first topology change occurs in C_i and the instant at which C_i is again in a stable state.
3. Message overhead. The number of messages generated during algorithm execution.
Every 1 ms, we monitor the status of each node (that is, stable or unstable). Thus, if the algorithm converges to a legitimate state within a time t < 1 ms, the change of status is not recorded. Figs. 6, 7, and 8 show the three metrics as a function of V_max for three levels of network density: low (20 nodes), medium (30 nodes), and high (40 nodes).
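The first metric follows directly from the 1-ms status samples; the helper below is only a sketch of that bookkeeping, since the simulator's internal accounting is not described in the paper:

```python
# Fraction of stabilization time from per-millisecond status polls.
def fraction_stable(samples):
    """samples: one boolean per 1-ms poll (True = node is in a stable state)."""
    return sum(samples) / len(samples)
```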
Fig. 6. Fraction of stabilization time. (a) Malpani’s algorithm. (b) Our algorithm.
Fig. 7. Convergence time. (a) Malpani’s algorithm. (b) Our algorithm.
Fig. 8. Message overhead. (a) Malpani’s algorithm. (b) Our algorithm.
We observe in Fig. 6a that Malpani's algorithm shows disastrous results in terms of the fraction of stabilization time. As V_max increases, the fraction of stabilization time decreases. This is because the number of reference levels triggered during the same time period and the number of merge events both increase, and hence, the algorithm takes more time to converge to a stable state. We also remark that the stabilization time of Malpani's algorithm increases as density increases. The reason is that the size of the network component increases, and hence, more nodes may trigger their reference levels during the same time period. In this case, the algorithm needs more time to recover from all the failures that occurred in the component. In Fig. 7a, Malpani's algorithm takes a longer time to converge to a stable state than the self-stabilizing leader election algorithm, for the same reasons discussed in reference to Fig. 6a. The first observation we draw from Figs. 6b and 7b is that our self-stabilizing leader election algorithm outperforms Malpani's algorithm in terms of the fraction of stabilization time and convergence time. Each node can have a path to a correct leader over 99 percent of the time, and the convergence time is less than 7 seconds. This is because reference levels that are propagated as a result of new link failures are stopped by the oldest one. In addition, when multiple components merge concurrently, the oldest DAG propagation encompasses the new ones. We can conclude that although the network topology changes at high speed and the number of nodes that trigger reference level propagation and DAG propagation is high, our algorithm is only slightly affected by the variation of mobility and node density. We observe in Figs. 6b and 7b that for high and medium density, the fraction of stabilization time and the convergence time are one and zero, respectively, when V_max is less than
15 m/s. In this case, the size of network components is large, and those components remain connected for longer durations. The occurrence of concurrent topological events is unlikely since node speed is low. The second observation from the two figures is that for N = 20 and V_max < 15 m/s, the fraction of stabilization time is below one, and the convergence time is above zero. This is because the number of components with few nodes is high, and hence, more merge operations occur, which decreases the stabilization time and increases the convergence time. We also observe that for V_max > 25 m/s, the algorithm under low density shows better results in terms of stabilization time and convergence time than it does under medium density. This can be explained as follows: when N = 30, large components split up into many components with few nodes, and hence, the number of components in the network increases, which leads to more merge operations. On the other hand, for N = 20, the number of components under low or high speed is not much affected. For N = 40, the stabilization time and the convergence time are only slightly affected by the variation of mobility because the node density becomes very high, and most of the nodes belong to a large connected component. High node density means that components remain connected for longer durations. Although nodes get disconnected from their leaders at high speeds, they are disconnected only for very short durations. We observe in Fig. 8 that our self-stabilizing algorithm outperforms Malpani's algorithm in terms of message overhead. In Malpani's algorithm, the message overhead increases with N and V_max since more topological changes are expected to occur, whereas the self-stabilizing algorithm discards the effect of new topological changes.
6 PROOF OF CORRECTNESS
Before proving the correctness of the self-stabilizing leader election algorithm, we formally state the predicates employed by the proof:
- bcast_i(m) (resp., recv_ij(m)) is a predicate that is true at the instant when node i broadcasts the message m (resp., i receives the message m). It is falsified once i terminates executing the atomic action in which the broadcast is performed.
- DeleteDAG(t, id) is a predicate that is true when the DAG propagation generated by node id at time t is stopped and then deleted.
- I_i ≡ (PI_i = null ∧ H_i = null).
- P ≡ (∀i ∈ C_i : L_i ⇒ (∀j ∈ C_i, i ≠ j : ¬L_j)), where C_i is the connected component to which i belongs. P is called the legitimate state.
- L_i ≡ (lid_i = i ∧ (∀j ∈ C_i, j ≠ i : lid_j = i ∧ j has a directed path toward i)).
An illegitimate state is defined to be either
1. a set of nodes that do not have a path to any leader, or
2. a set of nodes that have at least a path to a leader node but do not have the correct id in their lid variable.
Assuming that each message transmission takes one time unit and that each node i starts execution in a state satisfying predicate I_i, we prove the following theorem.
Theorem 1. After the occurrence of a topological change, the network component can within a finite time converge to a state satisfying predicate P, even if further topological changes occur during the convergence time.
The proof for a single topological change can be found in [12]. In this paper, we analyze the self-stabilizing leader election algorithm under concurrent topology changes as follows: for each component C_i, we start observing the algorithm execution at the time t_s at which the first topology change occurs in C_i(t_s), where C_i(t) denotes the component to which i belongs at time t. The observation ends when the algorithm terminates (that is, none of the actions is enabled at any node in C_i(t_e) at time t_e) and reaches a state satisfying the predicate P. Clearly, C_i changes as time progresses. We consider the following six cases of concurrent topology changes:
1. concurrent DAG propagations in a network component,
2. concurrent reference level propagations that do not partition the network component in which they occur,
3. concurrent reference level propagations that partition the network component in which they occur,
4. concurrent merging of network components,
5. concurrent merging of network components with concurrent DAG propagations in the components, and
6. concurrent merging of network components with concurrent reference level propagations that partition the components.
Definition 13. level_i(k) in a network component C_i is the length of the longest path between nodes k and i, and dist_i(k) is the shortest path distance from i to k.
Definition 14. The set of nodes that are l hops away from i (denoted by A_i^l) is defined as follows: A_i^l = {j | dist_i(j) = l}.
Definition 15. The set of nodes that are within l hops of i (denoted by U_i^l) is defined as follows: U_i^l = {j | dist_i(j) ≤ l}.
Case 1. Let us consider a set of nodes λ_1, …, λ_m, each of which initiates a DAG construction at T_1, …, T_m, respectively, such that (T_1, λ_1) ≻ … ≻ (T_m, λ_m).
Lemma 1. Concurrent DAG propagations in a network component will collapse in a finite time into a final one that defines a leader-oriented DAG.
Proof. A node i starts DAG propagation by performing Action 1 or HandleUncertainReverse (line 10 or 15). It sets PI_i and H_i to (1, t_i, i) and ([0, 0], 0, 0, 0, i), respectively. Then, it broadcasts a CreateDag message. Multiple DAG propagations in a network component imply that (∃j, k ∈ C_i : lid_j ≠ lid_k). Let n be the number of time units after node λ_1 has started a DAG creation. We show by induction on n that
P_n ≡ (lid_i = λ_1 ∧ (∀j ∈ U_{λ_1}^n \ {λ_1} : lid_j = λ_1 ∧ j has a directed path toward λ_1)).
Base case: n = 0. By the definition of U_i^l, U_{λ_1}^0 = {λ_1}, and lid_{λ_1} = λ_1. Thus, P_0 holds true.
Inductive step. Assume that P_n holds true for all n < d, where d is the diameter of C_i. We now prove that P_{n+1} ≡ (lid_i = λ_1 ∧ (∀j ∈ U_{λ_1}^{n+1} \ {λ_1} : lid_j = λ_1 ∧ j has a directed path toward λ_1)) holds:
P_n ⇒ (∀i ∀j : j ∈ A_{λ_1}^n ∧ i ∈ A_{λ_1}^{n+1} ∩ N_j ∧ bcast_j(CreateDag) ⇒ recv_ij(CreateDag)).   (1)
Statement (1) means that when node j joins λ_1-DAG, it broadcasts a CreateDag message; all nodes i that have a connection with j receive the message after one unit of time. From (1), it follows that
∀i ∀j : j ∈ A_{λ_1}^n ∧ i ∈ A_{λ_1}^{n+1} ∩ N_j ∧ recv_ij(CreateDag) ∧ H_i = null ⇒ (lid_i = lid_j),   (2)
∀i ∀j : j ∈ A_{λ_1}^n ∧ i ∈ A_{λ_1}^{n+1} ∩ N_j ∧ recv_ij(CreateDag) ∧ PI_j ≻ PI_i ⇒ (lid_i = lid_j ∧ DeleteDag(Tc_i, lid_i)).   (3)
By statements (2) and (3), when the PI_j-DAG propagation arrives at a node i that is not assigned to any leader, i is annexed to PI_j-DAG. If PI_j ≻ PI_i, PI_j continues its propagation; otherwise, the DAG propagation of PI_j is stopped. Nodes i and j are certain, lid_i = lid_j = λ_1, and H_i > H_j; thus, node i has an outgoing link toward j. As node j has a directed path toward λ_1, P_{n+1} holds true. Therefore, at time t_1 + max{dist_{λ_1}(x) | x ∈ C_i}, all nodes have the correct leader and a path oriented toward λ_1, and thus, P holds true. □
Case 2. Let α_1, …, α_k be a set of nodes belonging to l-DAG that lose their last outgoing links at t_1, …, t_k, respectively, such that the network component in which these failures occur remains connected.
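Lemma 1's collapse of concurrent propagations can be illustrated with a toy synchronous flood. The round model and the reduced (Tc, lid) partition index are assumptions made for illustration:

```python
# Toy synchronous flood for Case 1: concurrent CreateDag propagations
# collapse onto the highest-priority one (older Tc, then larger lid).
def collapse_dags(adj, starters):
    """adj: {node: list of neighbors}; starters: {node: Tc}. Returns {node: lid}."""
    def beats(p, q):
        # p = (Tc, lid) has priority over q; None is the bottom element.
        return q is None or p[0] < q[0] or (p[0] == q[0] and p[1] > q[1])

    pi = {v: None for v in adj}
    for v, tc in starters.items():
        pi[v] = (tc, v)
    changed = True
    while changed:                      # one pass per synchronous round
        changed = False
        for v in adj:
            for u in adj[v]:
                if pi[u] is not None and beats(pi[u], pi[v]):
                    pi[v] = pi[u]       # join the neighbor's DAG
                    changed = True
    return {v: p[1] for v, p in pi.items()}
```

Since a node's partition index only ever improves under a strict order, the loop terminates, mirroring the finite collapse argued in the proof.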
Lemma 2. All reference levels that do not partition the component in which they occur will eventually be stopped.
Proof. It is easy to show that if the network component is not partitioned after the generation of concurrent reference levels, there exists at least one border node that has a direct path to the leader, and hence, all the reference levels will eventually be stopped. □
Lemma 3. Eventually, within a finite time, network partitioning will be detected, and the disconnected component will define a new leader-oriented DAG.
Proof. A partitioned network means that there are no border nodes, and the reference level must pass through the nodes of the component in two phases. We assume that there exists a set of nodes α_p, …, α_k such that (p > 1 ∧ k > p) and their respective RLIs belong to the same equivalence class. Note that RLI_p has the highest priority in this set. We consider two cases:
Case 3a. l-DAG is partitioned at time t_p, and α_p generates a reference level. As RLI_p has the highest priority, it will terminate any new reference level generated at a time t_x > t_p. Therefore, the network partitioning will be detected at time (t_p + 2d).
Case 3b. l-DAG was not partitioned at time t_p, but the partitioning occurred at a later time t_q. If the propagation of RL_p is not stopped during its forward phase, the partition detection algorithm takes O(2d) time units; by Action 4, any new reference level RL_x generated at a time t_x > t_p will be stopped and deleted by RL_p. Otherwise, the propagation of RL_p is stopped by a border node m at time t_p + level_p(m). At a time t_q ∈ [t_p, t_p + level_p(m)], a node α_q loses its last outgoing link, which results in network partitioning. Therefore, it generates a new reference level. The reference level RL_q will terminate any reference level belonging to an older equivalence class or generated after t_q. When RL_q arrives at node m after at most (d − level_p(m)) time units, node m compares RLI_p and RLI_q. As Overlaps(RLI_p, RLI_q) holds true, node m stops RL_q and resumes the propagation of RL_p. RL_p needs (d − level_p(m)) time units to complete the forward phase and another d time units for the backward phase. During this period, any further failures will be stopped by RL_p. The network partitioning will be detected at (t_q + 3d − 2·level_p(m)). Therefore, the partition detection time is upper bounded by O(3d). By Actions 3 and 4, it may happen that, during the execution of the partition detection algorithm, other nodes that are not the originator of the reference level detect partitioning and create a weak DAG. The originator in turn creates its own DAG. The node with the highest PI will become the unique leader of the disconnected component. □
Case 4. We consider an infinite merging of components C_1, C_2, … with the component C_i. We assume that f is the frequency at which new components merge with C_i, X is the average time to send a message over a wireless link, and D is the average diameter of a network component.
Lemma 4. If C_i has the highest PI, the necessary condition for the algorithm to stabilize is X < 1/(fD).
Proof. When C_i merges with a component C_x of diameter D_x (x > 0), a CreateDag message is propagated throughout C_x. Initially, the component C_i is in a stable state. When C_1 merges with C_i at time t_1, it terminates the stable period and initiates a new unstable period. At t_1, the average remaining time required for the CreateDag message to traverse the component C_1 is (X·D_1), since it would take this period of time if no further merging occurred beyond this instant. As time progresses from t_1 and the CreateDag message propagates, the average remaining time required to cover C_1 decreases. At a time t_2 ∈ [t_1, t_1 + X·D_1], a new component C_2 merges with the component {C_i(t_1), C_1} and forces the convergence time to increase by X·D_2. The remaining time continues to decrease until it reaches the instant t_m, at which the message has traversed the whole network component C_i(t_m). This terminates the unstable period and initiates a new stable period. The stable period is terminated at time t_{m+1}, when C_{m+1} merges with C_i(t_m). The algorithm is self-stabilizing if there exists m such that the CreateDag message can traverse the component C_i(t_m) before the arrival of C_{m+1}. Formally, Σ_{j=1}^{m} (D_j·X) < t_{m+1} − t_1. The time between two mergings is 1/f, so X·(Σ_{j=1}^{m} D_j) < m/f ⇒ m·X·D < m/f ⇒ X < 1/(fD). The average convergence time in this case is upper bounded by m/f. □
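Lemma 4's condition can be read numerically; the parameter values below are invented for illustration:

```python
# Lemma 4: the CreateDag flood must outrun the merge arrivals.
def stabilizes(X, f, D):
    """X: avg per-hop message time (s); f: merge frequency (Hz); D: avg diameter."""
    return X < 1.0 / (f * D)

# Example: 10 ms per hop, one merge every 2 s (f = 0.5 Hz), diameter 8 hops.
# Covering a component takes about X*D = 80 ms, well under 1/f = 2 s.
```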
Lemma 5. Given a set of components C_1, C_2, …, C_{k−1}, C_k, C_i, C_{i+1}, … such that PI_1 ≻ PI_2 ≻ … ≻ PI_{k−1} ≻ PI_k ≻ PI_i ≻ PI_{i+1} ≻ …, the concurrent merging of C_i with some or all of these components will terminate within a finite period of time.
Proof. When a network component C_p merges with C_q and PI_p ≻ PI_q, a CreateDag message is propagated throughout C_q. To calculate the worst-case convergence time, we assume that C_i merges with the components in the order C_k, C_{k−1}, …, C_1. When C_i merges with C_k at time t_k, a CreateDag message is propagated throughout C_i, and each node in C_i sets its PI_i to PI_k. Before this propagation terminates, the component C_{k−1} merges at time t_{k−1} with the component C_i(t_k). If no further merging occurs, the component C_i will stabilize within X·D_i time units. The propagation of PI_{k−1} will take no more than X·(D_i + D_k) before the arrival of C_{k−2}, and so on. As the time interval [Tc_1, Tc_k] is finite, the number of network components that have a higher priority than C_i is also finite. Assume that C_i merges with k components that have higher partition indices than its own. The time needed for a node belonging to C_i(t_k) (that is, a node that does not leave its component) to have the highest partition index is upper bounded by X·(k·D_i + (k−1)·D_k + (k−2)·D_{k−1} + … + D_2) = X·D·(k + (k−1) + … + 1), which equals ((k(k+1)/2)·X·D). After this time interval, the component C_i will have the highest partition index. By Lemma 4, in the presence of infinite merging, the component C_i that now has the highest partition index will stabilize if X < 1/(fD), and it will take in the worst case an additional (m/f) time units. Thus, the average convergence time is upper bounded by (m/f + (k(k+1)/2)·X·D). □
merges with components of higher P I. In this case, the average convergence time is upper bounded by kðkþ1Þ ðmþ1 XDÞ. u t 2 f þ
Lemma 6. Given a set of components C1 ; C2 ; ; Ck1 ; Ck ; Ci ; Ciþ1 ; such that P I1 P I2 P Ik1 P Ik P Ii P Iiþ1 and the merging of two components occurs when a new link is formed between their uncertain nodes. Then, the concurrent merging of Ci with some or all of these components will terminate within a finite period of time.
Proof. A network component $C_x$ may consist of multiple uncertain subgraphs, denoted $U_x$, and a unique certain subgraph, denoted $S_x$. Let $\delta_x$ denote the diameter of an uncertain subgraph $U_x$; the merging of two components $C_x$ and $C_y$ occurs when a new link is formed between their respective uncertain subgraphs $U_x$ and $U_y$. Let us assume the execution of the following scenario: when the subgraph $U_x$ merges with $U_y$ at time $t_x$ and $PI_y > PI_x$, a CreateDag message is propagated in $U_x$. After traversing $\delta_x$ hops, the CreateDag message is stopped by a certain node $p \in C_x$. Node $p$ propagates in turn a CreateDag message that traverses $U_x$ and $U_y$ until it reaches a certain node $q \in C_y$. If $PI_q > PI_p$, node $q$ triggers a $lid_q$-DAG propagation through $U_y$ and $C_i$. From this scenario, the maximum time needed to complete the merge of two components $C_i$ and $C_k$ is $(2(\delta_i + \delta_k) + D_i)X$. Let $\delta$ denote the average diameter of uncertain subgraphs. By Lemma 5, a component $C_i$ gets the highest priority index after merging with $k$ components; therefore, it needs $\big(4k\delta + \frac{k(k+1)}{2}D\big)X$ time units. By Lemma 4, after $C_i$ has obtained the PI with the highest priority and in the presence of infinite merging, the algorithm will stabilize if $X < \frac{1}{fD}$, and it will take in the worst case an additional $\frac{m}{f}$ time units. Thus, the average convergence time is upper bounded by $\frac{m}{f} + \big(4k\delta + \frac{k(k+1)}{2}D\big)X$. □

Case 5. Let us consider a set of nodes $n_1, \ldots, n_m$ belonging to $C_i$, each of which initiates a DAG construction at $T_1, \ldots, T_m$, respectively, such that $(T_1, n_1) \prec \cdots \prec (T_m, n_m)$. During DAG propagation, an infinite sequence of components $C_1, C_2, \ldots$ merges with $C_i$.

Lemma 7. If $C_i$ has the highest priority index, the algorithm will stabilize if $X < \frac{1}{fD}$. Otherwise, it will take $\frac{k(k+1)}{2}DX$ additional time units.

Proof. By Lemma 1, the DAG propagation with the highest partition index (that is, $(1, T_1, n_1)$) stops all other ongoing DAG propagations. The proof for infinite mergings is similar to that presented in Lemma 4. We assume that during the $n_1$-DAG propagation, an infinite sequence of components merges with $C_i$. The algorithm is self-stabilizing if $\exists m$ such that the CreateDag message can cover the component $C_i(t_m)$ before the arrival of $C_{m+1}$. Formally,

$$\Big(D_i + \sum_{j=1}^{m} D_j\Big)X < t_{m+1} - T_1 \;\Leftrightarrow\; \Big(D_i + \sum_{j=1}^{m} D_j\Big)X < \frac{m+1}{f} \;\Rightarrow\; (m+1)XD < \frac{m+1}{f} \;\Rightarrow\; X < \frac{1}{fD}.$$

The average convergence time in this case is upper bounded by $\frac{m+1}{f}$. By Lemma 5, the algorithm needs an additional $\frac{k(k+1)}{2}DX$ units of time to stabilize if $C_i$ does not have the highest priority index. □

Case 6. We consider the case where concurrent link failures occurring at nodes $n_1, \ldots, n_k$ at times $t_1, \ldots, t_k$, respectively, lead to the disconnection of a component $C_i$ from the $l$-DAG. During reference level propagations, an infinite sequence of components $C_1, C_2, \ldots$ merges with $C_i$.
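As a numeric sanity check, the stabilization condition $X < \frac{1}{fD}$ derived in Lemma 7 can be exercised with a short script. The values of the merge frequency $f$, diameter bound $D$, and per-hop delay $X$ below are illustrative assumptions, not values taken from the paper's experiments:

```python
# Sanity check of the stabilization condition X < 1/(f*D):
# with illustrative parameters, the worst-case CreateDag coverage time
# stays below the inter-merge window for every m.
f = 0.5   # merges per second (assumed)
D = 10    # bound on component diameter, in hops (assumed)
X = 0.1   # per-hop propagation time, in seconds (assumed); X < 1/(f*D) = 0.2

for m in range(1, 100):
    # worst case: C_i and each of the m merged components has diameter D
    coverage_time = (m + 1) * D * X        # time for CreateDag to cover C_i(t_m)
    arrival_window = (m + 1) / f           # arrival time of the (m+1)th merge
    assert coverage_time < arrival_window  # coverage finishes before the next merge

print("condition X < 1/(f*D) holds:", X < 1 / (f * D))  # → True
```

The loop mirrors the implication $(m+1)XD < \frac{m+1}{f}$: both sides grow linearly in $m$, so the condition is independent of how many merges occur.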
Lemma 8. If the disconnected component $C_i$ merges only with components of lower partition index, the algorithm will stabilize in the worst case after an average time of $\frac{m+1}{f}\big(2 + \frac{1}{1-fDX} + \frac{1}{(1-fDX)^2}\big)$. Otherwise, it will take $\frac{k(k+1)}{2}DX$ additional time units.

Proof. Case 6a. $C_i$ merges only with components of lower partition index. By Lemma 3, a reference level $RL_q$ that is propagating in $C_i$ needs to traverse $D_i$ hops to complete the forward phase. If, during the forward phase, components of lower priority merge with the disconnected component $C_i$, then the new resulting components will adopt $lid_i = l$ as a leader, which is now unreachable. By Lemma 5, a reference level propagation in a component that has the highest $PI$ while infinite mergings are occurring needs at most $\frac{m+1}{f}$ units of time to cover $C_i$ before the occurrence of the $(m+1)$th merging. The backward phase of $RL_q$ in Lemma 3 will take at most $\frac{m+1}{f}$ units of time before colliding with $RL_p$. At this time, $RL_q$ is stopped, and $RL_p$ resumes its forward propagation phase. During this phase, components with low priority continue to merge with $C_i$. The average time required to cover $C_i$ is calculated as follows. The diameter of component $C_i$ increases at the rate of $fD$ hops per second. At the end of $RL_q$'s backward phase, the diameter of $C_i$ is upper bounded by $D_0 = \frac{m+1}{f} \cdot fD$. Therefore, $RL_p$'s forward phase needs $T_0 = D_0 X$ time units to traverse this new diameter. During $T_0$, $C_i$ grows by $fDT_0$ hops, which means that an additional $T_1 = fDXT_0$ units of time are needed for $RL_p$ to complete its forward phase. In the same way, during $T_1$, $C_i$'s diameter increases by $fDT_1$ hops, and so on. Formally,

$$\sum_{n=0}^{\infty} T_n = \frac{m+1}{f}\sum_{n=0}^{\infty}(fDX)^n.$$

Provided that $fDX < 1$, $\sum_{n=0}^{\infty}(fDX)^n = \frac{1}{1-fDX} < \infty$. Thus, the forward phase duration of $RL_p$ is $\sum_{n=0}^{\infty} T_n = \frac{m+1}{f} \cdot \frac{1}{1-fDX}$.

The backward phase of $RL_p$ will take the same time as the forward one. The time period to detect network partitioning is therefore upper bounded by $\frac{m+1}{f}\big(2 + \frac{1}{1-fDX}\big)$. At the end of $RL_p$'s backward phase, the diameter of $C_i$ becomes $D_0^{new} = fD\sum_{n=0}^{\infty} T_n$. Therefore, the $n_p$-DAG propagation needs $T_0^{new} = D_0^{new} X$ time units to traverse $D_0^{new}$ hops. During $T_0^{new}$, $C_i$ increases by $fDT_0^{new}$ hops. Using the same method as in $RL_p$'s forward phase, the $n_p$-DAG propagation in the component $C_i$ will terminate after $\sum_{n=0}^{\infty} T_n^{new} = \frac{m+1}{f} \cdot \frac{1}{(1-fDX)^2}$. Therefore, the average convergence time is upper bounded by $\frac{m+1}{f}\big(2 + \frac{1}{1-fDX} + \frac{1}{(1-fDX)^2}\big)$ time units.

Case 6b. $C_i$ merges with $k$ components of higher partition index. By Lemma 5, the algorithm needs an additional $\frac{k(k+1)}{2}DX$ units of time to stabilize. In this case, the average convergence time is upper bounded by

$$\frac{m+1}{f}\left(2 + \frac{1}{1-fDX} + \frac{1}{(1-fDX)^2}\right) + \frac{k(k+1)}{2}DX. \qquad \square$$

The previous eight lemmas prove Theorem 1.
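The bounds above can be collected into a small helper. This is an illustrative sketch (the parameter values are assumptions for demonstration, not measurements from the paper), which also verifies the geometric-series closed form $\sum_{n=0}^{\infty}(fDX)^n = \frac{1}{1-fDX}$ against partial sums:

```python
# Sketch of the Lemma 8 / Case 6b upper bound on average convergence time.

def convergence_bound(m, f, D, X, k=0):
    """Upper bound (m+1)/f * (2 + 1/(1-fDX) + 1/(1-fDX)^2) + k(k+1)/2 * D * X.

    m: number of mergings during propagation, f: merge frequency,
    D: diameter bound (hops), X: per-hop delay (seconds),
    k: number of higher-priority-index components merged (Case 6b).
    Requires f*D*X < 1, i.e., X < 1/(f*D), for the series to converge.
    """
    r = f * D * X
    assert r < 1, "bound diverges unless X < 1/(f*D)"
    base = ((m + 1) / f) * (2 + 1 / (1 - r) + 1 / (1 - r) ** 2)
    return base + k * (k + 1) / 2 * D * X  # extra term for k higher-PI merges

# The closed form 1/(1-r) agrees with the partial sums of sum_n (f*D*X)^n:
r = 0.5 * 10 * 0.1
partial = sum(r ** n for n in range(200))
assert abs(partial - 1 / (1 - r)) < 1e-9

print(round(convergence_bound(m=4, f=0.5, D=10, X=0.1, k=3), 3))  # → 86.0
```

With these assumed values, $fDX = 0.5$, so the base bound is $10 \times (2 + 2 + 4) = 80$ time units, plus $6$ for the three higher-priority merges.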
7 CONCLUSION
In this paper, we have proposed a self-stabilizing leader election algorithm that can converge to a legitimate state even in the presence of topological changes during the convergence time. By defining concurrent and disjoint computations and their corresponding intervals, an older reference level encompasses any newer one belonging to its equivalence class; in the same way, an older DAG propagation encompasses newer ones. Simulation results show that the proposed self-stabilizing leader election algorithm achieves very good results in terms of the fraction of stabilization time, convergence time, and message overhead compared to Malpani's algorithm. We have also provided a novel observation about self-stabilization by defining the unstable period as the time period that begins when a component C containing a node i enters an illegitimate state and terminates when this component is again in a legitimate state. By determining the frequency at which merging can occur, we have shown that the unstable period is bounded.
REFERENCES

[1] J.F. Allen, "Maintaining Knowledge about Temporal Intervals," Comm. ACM, vol. 26, no. 11, pp. 832-843, 1983.
[2] J.F. Allen and G. Ferguson, "Actions and Events in Interval Temporal Logic," Technical Report TR521, Univ. of Rochester, 1994.
[3] A.D. Amis, R. Prakash, T.H.P. Vuong, and D.T. Huynh, "Max-Min D-Cluster Formation in Wireless Ad Hoc Networks," Proc. IEEE INFOCOM '00, pp. 32-41, 2000.
[4] N. Asokan and P. Ginzboorg, "Key Agreement in Ad Hoc Networks," Computer Comm., vol. 23, no. 17, pp. 1627-1637, 2000.
[5] D.J. Baker and A. Ephremides, "The Architectural Organization of a Mobile Radio Network via a Distributed Algorithm," IEEE Trans. Comm., vol. 29, no. 11, pp. 1694-1701, 1981.
[6] E.W. Dijkstra and C.S. Scholten, "Termination Detection for Diffusing Computations," Information Processing Letters, vol. 11, no. 1, pp. 1-4, Aug. 1980.
[7] M. Gerla and J.T.-C. Tsai, "Multicluster, Mobile, Multimedia Radio Network," ACM/Baltzer Wireless Networks, vol. 1, no. 3, pp. 255-265, 1995.
[8] T.-C. Hou and T.-J. Tsai, "An Access-Based Clustering Protocol for Multihop Wireless Ad Hoc Networks," IEEE J. Selected Areas in Comm., vol. 19, no. 7, July 2001.
[9] E. Kaplan, Understanding GPS. Artech House, 1996.
[10] B. Lehane, L. Doyle, and D. O'Mahony, "Ad Hoc Key Management Infrastructure," Proc. Int'l Conf. Information Technology: Coding and Computing (ITCC '05), vol. 2, pp. 540-545, 2005.
[11] N. Lynch, Distributed Algorithms. Morgan Kaufmann, 1996.
[12] N. Malpani, J.L. Welch, and N. Vaidya, "Leader Election Algorithms for Mobile Ad Hoc Networks," Proc. Fourth Int'l Workshop Discrete Algorithms and Methods for Mobile Computing and Comm., pp. 96-103, 2000.
[13] A.B. McDonald and T.F. Znati, "A Mobility-Based Framework for Adaptive Clustering in Wireless Ad Hoc Networks," IEEE J. Selected Areas in Comm., vol. 17, no. 8, pp. 1466-1487, Aug. 1999.
[14] D. Mills, Network Time Protocol, Specification, Implementation and Analysis, RFC 1119, Sept. 1989.
[15] V.D. Park and M.S. Corson, "A Highly Adaptive Distributed Routing Algorithm for Mobile Wireless Networks," Proc. IEEE INFOCOM '97, pp. 1405-1413, Apr. 1997.
[16] V. Ramasubramanian, R. Chandra, and D. Mosse, "Providing a Bidirectional Abstraction for Unidirectional Ad Hoc Networks," Proc. IEEE INFOCOM, 2002.
[17] C.R. Lin and M. Gerla, "Adaptive Clustering for Mobile Wireless Networks," IEEE J. Selected Areas in Comm., vol. 15, no. 7, pp. 1265-1275, Sept. 1997.
[18] G.-C. Roman, Q. Huang, and A. Hazemi, "Consistent Group Membership in Ad Hoc Networks," Proc. 23rd Int'l Conf. Software Eng. (ICSE '01), pp. 381-388, 2001.
[19] K. Römer, "Time Synchronization in Ad Hoc Networks," Proc. ACM MobiHoc '01, pp. 173-182, 2001.
[20] J. So and N. Vaidya, "MTSF: A Timing Synchronization Protocol to Support Synchronous Operations in Multihop Wireless Networks," technical report, Univ. of Illinois, Urbana-Champaign, Oct. 2004.
[21] S. Vasudevan, J. Kurose, and D. Towsley, "Design and Analysis of a Leader Election Algorithm for Mobile Ad Hoc Networks," Proc. 12th IEEE Int'l Conf. Network Protocols (ICNP '04), pp. 350-360, Oct. 2004.
[22] X. Zeng, R. Bagrodia, and M. Gerla, "GloMoSim: A Library for Parallel Simulation of Large-Scale Wireless Networks," Proc. 12th Workshop Parallel and Distributed Simulation (PADS '98), pp. 154-161, 1998.

Abdelouahid Derhab received the engineer, master's, and PhD degrees in computer science from the University of Sciences and Technology Houari Boumediene (USTHB), Algiers, Algeria, in 2001, 2003, and 2007, respectively. He is currently an associate researcher in the Department of Computer Engineering, Centre de Recherche sur l'Information Scientifique et Technique (CERIST), Algiers. His research interests are distributed algorithms in mobile systems, data management, and ad hoc networks.

Nadjib Badache received the engineer degree in computer science from the University of Constantine, Algeria, in 1978 and the master's and PhD degrees from the University of Sciences and Technology Houari Boumediene (USTHB), Algiers, Algeria, in 1982 and 1998, respectively. In 1995, he joined the ADP research group at IRISA, France, where he prepared a PhD thesis on causal ordering and fault tolerance in a mobile environment. He is currently a professor in the Computer Science Department, USTHB, where he is also the head of LSI. He is the author of many papers, and he has supervised many theses and research projects. His research interests are distributed mobile systems, mobile ad hoc networks, and security.