Fault Tolerance in Sensor Networks: Performance Comparison of Some Gossip Algorithms

Marco Baldi, Franco Chiaraluce
Dipartimento di Ingegneria Biomedica, Elettronica e Telecomunicazioni, Università Politecnica delle Marche, Ancona, Italy
{m.baldi, f.chiaraluce}@univpm.it

Elma Zanaj
Departamenti i Elektronikes dhe Telekomunikacionit, Fakulteti i Teknologjise se Informacionit, Universiteti Politeknik i Tiranes, Tirana, Albania
[email protected]
Abstract— The goal of this paper is to evaluate the efficiency of three versions of the well-known gossip algorithm, namely basic gossip, push-sum, and broadcast, for the distributed solution of averaging problems. The main focus is on the impact of link failures that, by reducing the network connectivity, decrease the convergence speed. As a similar effect occurs in non fully-meshed networks, because of a limited coverage radius of the nodes, a comparison is made between these two scenarios. The considered algorithms can require the optimization of some share factors; to this purpose, we resort to simulations, but the conclusions achieved are confirmed through analytical arguments, exploiting the concept of potential function.

Keywords— gossip; wireless sensor networks; fault tolerance
I. INTRODUCTION
Ad hoc wireless sensor networks are peer-to-peer systems formed by many small and simple devices, able to measure some quantities and to transmit the measured values to neighboring nodes. In such networks, nodes often communicate in order to merge their single contributions into a common result. This occurs, for instance, in averaging problems, whose target is to calculate, in a distributed manner, the average value of a quantity of interest (e.g., temperature). Because of their features, these networks are suitable for many purposes, such as environmental monitoring applications, allowing accurate control over large areas with a favorable cost-to-benefit ratio [1]. Among these applications, however, hostile environments and scenarios of natural and man-made disasters represent great challenges, in which the network availability must be ensured in spite of a number of possible impairments.

Among the several protocols available nowadays for sensor communication, increasing attention has been devoted to simple decentralized procedures based on the gossip principle, through which the computational burden is distributed among all nodes. The gossip algorithm was originally conceived for telephone networks [2], [3]. When gossip is applied in sensor networks, denoting by xi and xj the local measures of the i-th and j-th nodes, an interaction between them updates one or both of their values, which are then used for a subsequent interaction. The communication protocol can be managed either in a synchronous or in an asynchronous way, but the latter is more practical, because of its inherent simplicity. So, in this paper, we restrict attention to an asynchronous time model, in which each node has a clock that ticks independently at the times of a rate-1 Poisson process. Therefore, the inter-tick times at each node are rate-1 exponentials, independent across nodes and over time.

Various implementations of gossip for averaging problems are possible; they aim at estimating the mean value of the sensed quantity. More precisely, let us denote by N the number of nodes and by x(k) = [x1(k), x2(k), ..., xN(k)]^T the vector of the estimates of all nodes after k clock ticks. The target of the algorithm is to find a reliable measure of the average value
$$x_{\rm ave} = \frac{1}{N}\sum_{i=1}^{N} x_i(0)$$

in the shortest possible time, that is,
maximizing the convergence speed. In a first implementation, called "basic gossip" in the following, an interaction between the i-th and j-th nodes produces as output xi(k + 1) = xj(k + 1) = xi(k)/2 + xj(k)/2, which is used by both nodes for the subsequent interaction [4]. A variant of this proposal is the so-called "push-sum" algorithm [5]. According to this protocol, a node forwards a properly defined share of its values to a randomly selected neighbor, while keeping the remaining part. The performance of the push-sum algorithm depends on the choice of the share, which therefore represents a degree of freedom to optimize. Both the basic gossip and the push-sum algorithm are point-to-point protocols. However, in a wireless network, when a node transmits, all nodes in its coverage area can receive the transmitted data. This suggests implementing a broadcast algorithm to reduce the averaging time.

Although the fundamentals of the considered protocols are well known and a number of papers on these topics have already appeared in previous literature, several issues are still open. Among them, we have mentioned above the problem of optimizing the share values in the push-sum algorithm. In [5] the authors merely noted that the choice of shares may be deterministic or randomized, and may or may not depend on time, without providing, however, numerical evidence of the impact resulting from the different choices. The same holds, to the best of our knowledge, for the subsequent literature. Only very recently, in [6], we have presented a first ensemble of simulated and theoretical results on this issue, focusing on ring and random geometric graph topologies.

Another relevant topic, rarely explored in the past, concerns evaluating the performance of gossip algorithms in the presence of link failures. Actually, when averaging algorithms are adopted in wireless sensor networks, shadow fading or other kinds of radio impairments could prevent some links from being used, due to their poor quality in terms of signal-to-noise ratio. The study of networks with link failures could seem no different from that of non fully-meshed networks where, because of a limited coverage radius, each node can directly reach only a limited set of neighbors, being linked to the others only through multiple hops (that is, passing through intermediate nodes). Actually, the two situations are rather different: failures can be modeled as a stochastic phenomenon, and when a percentage x of links fail, malfunctions are generally distributed at random, without any specific correlation between distinct failures. Obviously, in some cases, failures may be due to mechanisms involving simultaneously a number of nodes that are close to one another; but this appears to be a particular case, while the uncorrelation assumption seems more suitable to model practical situations.

In this paper, the convergence speed of the selected gossip algorithms, in the presence of random link failures throughout the network, is investigated. Our analysis is mainly based on numerical simulations, but some theoretical issues are also discussed, particularly in regard to the share optimization when the push-sum approach is applied. We develop a number of comparisons, with the aim of showing the limits and potentialities of the considered techniques. The organization of the paper is as follows.
In Section II we define the considered gossip versions. In Section III we introduce the simulation parameters and describe the graph whose performance in the presence of link failures will be investigated afterwards. In Section IV we face, from a theoretical viewpoint, the problem of share factor optimization in the push-sum algorithm; an analytical approach is developed, based on the computation of the potential function. In Section V we present a number of simulated results, first considering the various algorithms separately, and then in comparative terms. Finally, Section VI concludes the paper.

II. THE CONSIDERED GOSSIP ALGORITHMS
A. Basic Gossip
The basic gossip algorithm is very simple, and has been briefly described in the Introduction. Its main steps are as follows:
1. Node i chooses (at random) another node, j, inside its coverage area.
2. Nodes i and j split their information into two equal parts, xi(k)/2 and xj(k)/2, keeping one part and sending the other.
3. Nodes i and j calculate their new estimates by adding the received value to that already stored: xi(k+1) = xj(k+1) = [xi(k) + xj(k)]/2.
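The three steps above can be sketched in a few lines of code. This is a minimal illustration, not the authors' simulator: the node count, initial values, and helper names are hypothetical.

```python
import random

def basic_gossip_step(x, neighbors):
    # One asynchronous interaction (steps 1-3): a random node i picks a
    # random neighbor j; both replace their estimates with the average.
    i = random.randrange(len(x))
    if not neighbors[i]:
        return  # isolated node: the communication attempt is wasted
    j = random.choice(neighbors[i])
    x[i] = x[j] = (x[i] + x[j]) / 2.0

# Hypothetical example: N = 4 fully-meshed nodes, true average 5.5.
random.seed(7)
x = [10.0, 2.0, 6.0, 4.0]
neighbors = [[j for j in range(4) if j != i] for i in range(4)]
for _ in range(1000):
    basic_gossip_step(x, neighbors)
# every estimate now approximates x_ave = 5.5; the sum is preserved
```

Note that the pairwise average conserves the sum of the estimates, which is why the fixed point is the true average.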
The choice of j follows a uniform distribution, conditioned on the value of the Euclidean distance Dij between nodes i and j. In other words, the probability that node i contacts node j (≠ i) when it is selected for transmission is given by:
$$p_{ij} = \begin{cases} \dfrac{1}{d_i}, & D_{ij} \le r, \\ 0, & D_{ij} > r, \end{cases} \qquad (1)$$
where di is the number of nodes within its coverage area, which is delimited by a coverage radius r, assumed equal for all nodes. So, the coverage radius represents the maximum distance at which a node can transmit reliably. Clearly, in order to recognize the nodes inside the coverage area, a query session is required before interaction starts. We suppose that each node performs a very simple query aimed at knowing the number of reachable neighbors, di, in its coverage area. More sophisticated localization strategies [7] can be adopted, permitting more efficient versions of averaging algorithms. In [8], for example, a geographic gossip algorithm has been proposed, based on greedy routing, that is potentially able to provide remarkable gains. But the applicability of this kind of protocol, where each node must compute and compare a large number of distances from a prefixed target, seems difficult. For this reason, we have not included these gossip versions in our study. Alternatively, the selection probabilities could be optimized with the final goal of maximizing the convergence speed [4] but, once again, this would make the interaction more involved while not providing, in many cases, substantial improvements [9].

The probabilities pij can be collected in a matrix P, with N×N entries. This matrix is stochastic, i.e., each of its rows sums to 1. On the other hand, if the link between i and j fails, the corresponding pij is set equal to zero, independently of the value of Dij. In this case, matrix P is no longer stochastic, and the i-th node has a total transmission probability Σj pij < 1. In other words, if the link between i and j fails, at some clock tick the i-th node tries transmitting to the j-th node without success, thus wasting the communication attempt. Obviously, this reflects on the averaging time, which increases in a manner dependent on the number of faulty links and their distribution.

B. Push-Sum Algorithm
The push-sum protocol proceeds as follows. At the i-th node, with i = 1, 2, …, N, two quantities are stored and updated through the interaction with the other nodes: they are named si(k) and wi(k), respectively. These quantities satisfy the following mass conservation properties, for any k:
$$\sum_{i=1}^{N} s_i(k) = \sum_{i=1}^{N} x_i(0) = N x_{\rm ave}, \qquad \sum_{i=1}^{N} w_i(k) = N. \qquad (2)$$
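These conservation properties can be checked numerically on the exchange rule (3) given below. The following is a minimal sketch, with a constant share factor α and hypothetical node values, not the authors' simulator:

```python
import random

def push_sum_step(s, w, neighbors, alpha=0.5):
    # One push-sum exchange, cf. (3): the ticking node i keeps a fraction
    # alpha of (s_i, w_i) and pushes the fraction (1 - alpha) to a random
    # neighbor j; all other nodes are untouched.
    i = random.randrange(len(s))
    if not neighbors[i]:
        return
    j = random.choice(neighbors[i])
    s[j] += (1.0 - alpha) * s[i]
    w[j] += (1.0 - alpha) * w[i]
    s[i] *= alpha
    w[i] *= alpha

# Hypothetical example: N = 4 fully-meshed nodes, s_i(0) = x_i(0), w_i(0) = 1.
random.seed(7)
s = [10.0, 2.0, 6.0, 4.0]
w = [1.0] * 4
neighbors = [[j for j in range(4) if j != i] for i in range(4)]
for _ in range(2000):
    push_sum_step(s, w, neighbors)
# (2) holds at every step: sum(s) = N * x_ave = 22 and sum(w) = N = 4,
# while each estimate s_i / w_i approaches x_ave = 5.5
```

Because each exchange only moves mass between i and j, both sums in (2) are invariant, while the ratios s_i/w_i converge to the true average.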
When the protocol starts, that is, once the sensed values have been acquired, we have si(0) = xi(0) and wi(0) = 1, ∀i. Afterwards, if the clock of the i-th node ticks at the k-th time instant (recall that transmission is asynchronous in the considered system), it selects randomly one of its neighbors, say j, and sends to it a fraction (1 − α) of its parameters, while retaining the remaining fraction α. As a consequence, the parameters at nodes i and j are modified as follows:

$$\begin{aligned} s_i(k) &= \alpha s_i(k-1), & w_i(k) &= \alpha w_i(k-1), \\ s_j(k) &= s_j(k-1) + (1-\alpha)\, s_i(k-1), & w_j(k) &= w_j(k-1) + (1-\alpha)\, w_i(k-1), \end{aligned} \qquad (3)$$

while the parameters at all the other nodes remain unchanged. This way, conditions (2) are certainly satisfied. A new estimate at the interacting nodes is then derived as xm(k) = sm(k)/wm(k), with m = i, j. In [5], where, besides point-to-point communications, broadcast transmissions were also considered, a more general mechanism was applied, in which the share factor can be different for each node and even variable with time. This model, however, seems too involved for practical applications; so, we prefer to consider a single, constant α, whose value should be optimized in order to achieve the fastest convergence speed. On the other hand, a bidirectional version of the push-sum algorithm could also be adopted where, every time node i contacts node j, sending to it a fraction of its message, node j symmetrically does the same, sending to node i a share of its own message. In Appendix A we show that, at least for a fully-meshed network, the optimum share for this case is 1/2. So, under this choice, such a modified version of the push-sum algorithm practically becomes identical to the basic gossip.

C. Broadcast Algorithm
The idea of implementing a broadcast algorithm originates in the observation that, when a node transmits some information, all the other nodes in its coverage area are able to receive the transmitted data. This suggests implementing a broadcast averaging algorithm that, at the expense of a slight increase in complexity, allows the averaging time to be reduced significantly. This broadcast algorithm is unidirectional, as the information flows from a transmitting node to a number of receiving nodes (depending on the coverage radius and the random node distribution) but not in the opposite sense. Similarly to the push-sum algorithm, the i-th node maintains a sum, si(k), and a weight, wi(k).
When the algorithm starts, that is, for k = 0, we have wi(0) = 1 and si(0) = xi(0), the initial sensed value at node i. When the i-th node's clock ticks, say at step k, the node splits its information into a number of parts; it may keep the first, so that [wi(k+1), si(k+1)] = αi[wi(k), si(k)], while it sends each neighbor j one of the remaining parts: αij[wi(k), si(k)]. Node j receives the transmission and updates its values by adding the received ones, so that [wj(k+1), sj(k+1)] = [wj(k), sj(k)] + αij[wi(k), si(k)]. As these expressions show, the mechanism is ruled by the share parameters, αi and αij, which can be collected in a matrix A, having αii = αi along the main diagonal. The elements of A, satisfying the condition Σj αij = 1, can be chosen at random or following some suitable deterministic rule. Although different laws can be adopted [10], in this paper we assume αi = 0 and:
$$\alpha_{ij} = \begin{cases} \dfrac{1}{d_i}, & D_{ij} \le r,\ j \ne i, \\ 0, & D_{ij} > r. \end{cases} \qquad (4)$$
This choice is quite similar to (1) and gives good results [11]. On the other hand, by assuming αi ≠ 0, the following rule could be adopted:
⎧α i , j = i, ⎪ (1 ) − α ⎪ i , Dij ≤ r , j ≠ i, αij = ⎨ d i ⎪ ⎪0, Dij > r. ⎩
(5)
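Rule (4) with αi = 0, which we adopt in the sequel, can be sketched as follows. The node values and topology here are hypothetical, used only to show that the update conserves the sums in (2):

```python
import random

def broadcast_step(s, w, neighbors):
    # One broadcast update with rule (4) (alpha_i = 0): the ticking node i
    # sends the share 1/d_i of (w_i, s_i) to each of its d_i neighbors and
    # keeps nothing; its own estimate is undefined until it receives again.
    i = random.randrange(len(s))
    d_i = len(neighbors[i])
    if d_i == 0:
        return
    for j in neighbors[i]:
        s[j] += s[i] / d_i
        w[j] += w[i] / d_i
    s[i] = 0.0
    w[i] = 0.0

# Hypothetical example: N = 4 fully-meshed nodes, true average 5.5.
random.seed(7)
s = [10.0, 2.0, 6.0, 4.0]
w = [1.0] * 4
neighbors = [[j for j in range(4) if j != i] for i in range(4)]
for _ in range(2000):
    broadcast_step(s, w, neighbors)
# mass conservation still holds, and the ratios s_i / w_i (where w_i > 0)
# converge to the true average 5.5
```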
However, in Appendix B we verify analytically that the choice αi = 0 can be optimal in the case of fully-meshed networks without failures. Moreover, in Section V.C we report numerical simulations showing that, also in the presence of link failures, the averaging time with αi = 0.5 is greater than that with αi = 0.

III. SIMULATION PARAMETERS
The convergence speed of the considered protocols is evaluated through the computation of the normalized difference between the estimated average and the true average. More precisely, we determine in R simulations (with R sufficiently large) the random variable e(k) = ||x(k) − xave·1||/||x(0)||, where ||x|| denotes the l2 norm of the vector x and 1 is the all-one vector. This way, a set of R curves em(k) is obtained, m = 1…R, which are averaged in order to compute:

$$\bar{e}(k) = \frac{1}{R}\sum_{m=1}^{R} e_m(k) \qquad (6)$$
which has the meaning of an estimated mean curve. In order to average over possible different initial conditions, x(0) is randomly changed at the beginning of each simulation. According to probability theory, it is known that:

$$\lim_{R \to \infty} \bar{e}(k) = \langle e(k) \rangle \qquad (7)$$
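The error metric e(k) defined above can be computed with a short routine. This is a minimal sketch with hypothetical measurement values:

```python
from math import sqrt

def normalized_error(x, x0):
    # e(k) = ||x(k) - x_ave * 1|| / ||x(0)||, with the l2 norm and the
    # true average x_ave computed from the initial measurements x0.
    x_ave = sum(x0) / len(x0)
    num = sqrt(sum((xi - x_ave) ** 2 for xi in x))
    den = sqrt(sum(xi ** 2 for xi in x0))
    return num / den

x0 = [10.0, 2.0, 6.0, 4.0]                  # hypothetical initial measurements
e_start = normalized_error(x0, x0)          # error before any interaction
e_done = normalized_error([5.5] * 4, x0)    # 0.0 at perfect consensus
```

The averaging time k* is then simply the first clock tick at which this quantity, averaged over the R runs, drops below the prefixed threshold.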
In this paper, simulations are done over a random geometric graph (RGG), where nodes are randomly distributed in a unit square, according to a 2D homogeneous Poisson point process. The number of neighbors, di, the i-th node is linked to gives its nodal degree. In the case of regular graphs (like the ring, for example), di is equal for all nodes and it directly measures the connectivity level of the network. In general, instead, each node is characterized by the coverage radius r; for the RGG, even assuming that all nodes have the same r, the nodal degree is generally not unique. Moreover, for each value of r, the connectivity level can vary from graph to graph. So, an average nodal degree, d̄, must be computed for analysis purposes. An example of RGG, for N = 50 and r = 0.4, is shown in Fig. 1. The behavior of d̄, as a function of r, is shown in Fig. 2; for each value of the coverage radius, 100 RGGs have been randomly generated, and their nodal degrees have been averaged.
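The RGG construction and the degree averaging just described can be sketched as follows; the parameters N = 50 and r = 0.4 come from the text, while the implementation details (seeding, data layout) are hypothetical:

```python
import random

def random_geometric_graph(n, r, seed=None):
    # Drop n nodes uniformly at random in the unit square and connect every
    # pair at Euclidean distance <= r; returns the adjacency lists.
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    nbrs = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = pts[i][0] - pts[j][0]
            dy = pts[i][1] - pts[j][1]
            if dx * dx + dy * dy <= r * r:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs

# average nodal degree over 100 graphs with N = 50 and r = 0.4
degrees = []
for t in range(100):
    degrees.extend(len(nb) for nb in random_geometric_graph(50, 0.4, seed=t))
d_bar = sum(degrees) / len(degrees)
```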
For a fully-meshed network in which a fraction x of the links fail, the average nodal degree becomes approximately:

$$\bar{d} = (N-1)(1-x). \qquad (8)$$
The validity of (8) has been confirmed through simulation, as shown in Fig. 3.

In (7), ⟨e(k)⟩ represents the true average of e(k). Once (6) has been determined, the averaging time is defined as the number of clock ticks, say k*, that permits having a normalized difference smaller than a prefixed value, for example ē(k*) ≤ 10^-10.

Figure 3. Average value of d̄ (simulated and theoretical) for a fully-meshed network with N = 50, in the presence of a link failure rate x ranging between 0.1 and 0.9.
IV. SHARE FACTOR OPTIMIZATION
As mentioned in Section II.B, for the push-sum algorithm an important issue concerns optimization of the share factor α that appears in (3). A useful analytical tool, in this sense, is provided by the “potential function” method.
Figure 1. Example of RGG with N = 50 and r = 0.4.
In order to define the potential function for the push-sum algorithm, let us consider a vector vi(k) (that does not contain any measured quantity, but is only introduced for analysis purposes), whose components, vij(k), are such that:
$$s_i(k) = \sum_{j=1}^{N} v_{ij}(k)\, x_j(0). \qquad (9)$$
Figure 2. Average value of d̄ as a function of r, computed over 100 random geometric graphs.

As one of the objects of our study is to compare the impact of failures against that of a limited r, we suppose to start with a fully-meshed network (which implies r ≥ √2) and to eliminate, again at random, a fraction x of its links. So, while in the absence of failures the network connectivity is N − 1 (see Fig. 2), in the new scenarios the average value of d̄ is approximately that given by (8).

The following condition is satisfied:

$$w_i(k) = \sum_{j=1}^{N} v_{ij}(k). \qquad (10)$$
It is evident that, if vi(k) is nearly proportional to the all-one vector, then xi(k) = si(k)/wi(k) is close to the true average. The potential function at time k is defined as follows [5]:

$$\Phi(k) = \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ v_{ij}(k) - \frac{w_i(k)}{N} \right]^2. \qquad (11)$$
So, in the limit case of all nodes perfectly aware of the true average, the potential function is null. Based on this evidence,
evaluation of the mean potential function, for any k, should permit us to estimate the convergence speed of the algorithm. More precisely, assuming that, at instant k, node l is selected as the transmitter and node m as the receiver, the following difference between the potential functions at time instants k and k + 1 can be easily derived:

$$\delta\Phi = \Phi(k) - \Phi(k+1) = 2\alpha(1-\alpha)\sum_{j=1}^{N}\left[v_{lj}(k)-\frac{w_l(k)}{N}\right]^2 - 2(1-\alpha)\sum_{j=1}^{N}\left[v_{lj}(k)-\frac{w_l(k)}{N}\right]\cdot\left[v_{mj}(k)-\frac{w_m(k)}{N}\right]. \qquad (12)$$
In the remainder of this section (and in the appendices) we omit, for simplicity and without ambiguity, the indication that the quantities on the right-hand side are computed at time instant k. We wish to average (12) over all possible, uniformly distributed, choices of the transmitting and receiving nodes. In the case of a fully-meshed network, through simple algebra, it is possible to find:

$$\overline{\delta\Phi} = \frac{2}{N}\left[\alpha(1-\alpha) + \frac{1-\alpha}{N-1}\right]\Phi \qquad (13)$$
where Φ = Φ(k). A criterion for optimizing the value of α consists in maximizing δΦ̄/Φ: indeed, a large δΦ̄, for a given Φ, should reflect in a high convergence speed. Now, from (13), δΦ̄/Φ is maximum for

$$\alpha_{\rm opt} = \frac{N-2}{2(N-1)} \qquad (14)$$
and it is quite evident that, for N sufficiently large, such value can be approximated by 0.5. In the case of a non fully-meshed network, instead, which can occur because of a limited value of r and/or the appearance of link failures, Eq. (13) is no longer valid, and must be replaced as follows:

$$\overline{\delta\Phi} = \frac{2\alpha(1-\alpha)}{N}\Phi + \frac{2(1-\alpha)}{N}\sum_{j=1}^{N}\sum_{l=1}^{N}\frac{1}{d_l}\left(v_{lj}-\frac{w_l}{N}\right)^2 - \frac{2(1-\alpha)}{N}\sum_{j=1}^{N}\sum_{l=1}^{N}\frac{1}{d_l}\sum_{m\in C_l}\left(v_{lj}-\frac{w_l}{N}\right)\left(v_{mj}-\frac{w_m}{N}\right) \qquad (15)$$

where Cl is the subset of nodes that includes node l and the nodes it is linked to. The higher complexity of (15) with respect to (13) is evident. First of all, a new contribution (the last term) has been added, which is null in the case of a fully-meshed network because of the mass conservation property (2). Secondly, it seems not possible to factor out the potential function Φ on the right-hand side, which is a necessary step toward the maximization of δΦ̄/Φ. To circumvent the problem, we introduce the approximation 1/dl ≈ 1/d̄, ∀l = 1…N. Based on this approximation, Eq. (15) can be rewritten as:
$$\overline{\delta\Phi} \approx \frac{2}{N}\left[\alpha(1-\alpha)+\frac{1-\alpha}{\bar{d}}\right]\Phi - \frac{2(1-\alpha)}{\bar{d}N}\sum_{j=1}^{N}\sum_{l=1}^{N}\sum_{m\in C_l}\left(v_{lj}-\frac{w_l}{N}\right)\left(v_{mj}-\frac{w_m}{N}\right). \qquad (16)$$
However, the problem of evaluating the last term remains. An estimation of such a term can be obtained from the definition of the Laplacian matrix [12], as reported next. The Laplacian matrix Q(G) of a graph G(V, E), where V is the vertex set containing the N nodes and E is the edge set, is an N×N matrix whose elements are defined as follows:
$$Q_{ij} = \begin{cases} d_i & \text{if } i = j, \\ -1 & \text{if } i \ne j \text{ and } (i,j) \in E, \\ 0 & \text{otherwise.} \end{cases} \qquad (17)$$
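Definition (17) is straightforward to turn into code. A minimal sketch, using a hypothetical 4-node ring as the graph:

```python
def laplacian(n, edges):
    # Build the Laplacian Q of (17) from an undirected edge list:
    # Q_ii = d_i, Q_ij = -1 if (i, j) is an edge, 0 otherwise.
    Q = [[0] * n for _ in range(n)]
    for i, j in edges:
        Q[i][j] = Q[j][i] = -1
        Q[i][i] += 1
        Q[j][j] += 1
    return Q

# hypothetical 4-node ring 0-1-2-3-0: every node has degree 2
Q = laplacian(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
# each row sums to zero, so the all-one vector is an eigenvector of Q
# with eigenvalue 0, consistent with lambda_1 = 0
```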
The Laplacian matrix can also be written as Q = Δ − M, where Δ is the diagonal matrix with elements Δii = di, and M is the adjacency matrix of the considered graph. The eigenvalues of Q are called the Laplacian eigenvalues. They are all real and nonnegative, and satisfy the condition 0 = λ1 ≤ λ2 ≤ … ≤ λN. The second smallest eigenvalue, λ2, is also known as the algebraic connectivity, and is particularly important: λ2 is equal to zero only if G is disconnected. Other properties of matrix Q and its eigenvalues can be found in the literature (see [13], for example). Let yij = vij − wi/N, i = 1, …, N, be the components of vector yj. Through simple algebra, Eq. (16) can be rewritten as follows:
$$\overline{\delta\Phi} = -\frac{2(1-\alpha)^2}{N}\Phi + \frac{2(1-\alpha)}{\bar{d}N}\sum_{j=1}^{N}\mathbf{y}_j^T Q\,\mathbf{y}_j. \qquad (18)$$
Let us define the vector z = (y1^T, y2^T, …, yN^T)^T, having N² components; it is evident that z^T z = Φ. Moreover, let us consider a block matrix L, with size N²×N², having N repetitions of Q along the main diagonal and all the other blocks equal to the null matrix. Also L can be interpreted as a Laplacian matrix, whose eigenvalues coincide with those of Q, but each appears with multiplicity N.
Using these further definitions, Eq. (18) can be rewritten as:

$$\overline{\delta\Phi} = \left[-\frac{2(1-\alpha)^2}{N} + \frac{2(1-\alpha)}{\bar{d}N}\,\frac{\mathbf{z}^T L \mathbf{z}}{\mathbf{z}^T \mathbf{z}}\right]\Phi = \left[-\frac{2(1-\alpha)^2}{N} + \frac{2(1-\alpha)}{\bar{d}N}\,R_Q\right]\Phi \qquad (19)$$

having denoted by R_Q = z^T L z / z^T z the so-called Rayleigh quotient. So, by following the same reasoning as before, the value of α that maximizes δΦ̄/Φ results in:

$$\alpha_{\rm opt} = 1 - \frac{1}{\bar{d}}\,\frac{R_Q}{2}. \qquad (20)$$
By applying the Courant–Fischer minimax theorem [14], it is possible to say that:

$$\lambda_2 \le R_Q \le \lambda_N. \qquad (21)$$
As a confirmation of the correctness of (20), we can observe that, in the case of a fully-meshed network, all the eigenvalues λi, with i ≥ 2, are equal to N and, because of (21), Eq. (20) becomes equal to Eq. (14). In general, however, the value of RQ is not easy to determine. In our analysis, we have decided to approximate RQ by the average of the non-null eigenvalues, i.e.:

$$R_Q \approx \frac{\sum_{i=2}^{N}\lambda_i}{N-1}. \qquad (22)$$
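As a quick numerical consistency check of (20), restricted to the fully-meshed case stated above (where all nonzero Laplacian eigenvalues equal N, so (22) gives R_Q = N while d̄ = N − 1), one can verify that it reproduces (14):

```python
def alpha_opt(rq, d_bar):
    # Optimal share factor from Eq. (20): alpha_opt = 1 - (1/d_bar)(R_Q/2).
    return 1.0 - rq / (2.0 * d_bar)

# fully-meshed network: R_Q = N, d_bar = N - 1
N = 50
a = alpha_opt(N, N - 1)
# a equals (N - 2) / (2 (N - 1)), i.e. Eq. (14)
```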
In [6] we have verified that the value of αopt obtained from this assumption can be rather different from the actual optimum value when r < 0.6. On the contrary, because of the hypothesis of randomly distributed failures, approximation (22) turns out to be much more acceptable when the nodal degree reduction is due to faulty links. This remark will be confirmed in Section V. By employing the analytical (though approximate) approach, for an RGG with N = 50 and 10 < d̄ < 49, we have found 0.39 < αopt < 0.49. As Fig. 3 shows that this range of values of d̄ corresponds to x ≤ 0.8, we can expect that the optimum value of α does not change significantly, even for very large failure rates. This conclusion will be confirmed, in the following section, through numerical simulations, and differs from what occurs in the case of limited coverage radius, where, in the same range of d̄, αopt can become as low as 0.2, and the theoretical analysis, with approximation (22), is therefore much less effective.
V. RESULTS
In this section we present a number of simulated results for the RGG with N = 50, in the presence of different percentages of link failures. First we consider the selected algorithms separately, and then in comparative terms. We also discuss the difference with respect to the performance of the same RGG when the reduced connectivity follows from a limited coverage radius. Table I shows the considered values of the average connectivity level d̄, together with the coverage radius r (for a non fully-meshed network) and the link failure rate x (for a fully-meshed network affected by failures). These values are common to all the algorithms.

TABLE I. VALUES OF COVERAGE RADIUS AND FAILURE RATE THAT PRODUCE ALMOST IDENTICAL AVERAGE CONNECTIVITY.

d̄       Limited r (x = 0)    Failure rate x (r ≥ √2)
11.8     0.3                  0.76
18.6     0.4                  0.63
26.2     0.5                  0.48
32.7     0.6                  0.35
38.4     0.7                  0.23
43.1     0.8                  0.14
46.4     0.9                  0.07
The values in the table must be interpreted in the following sense: each row specifies the coverage radius (with no failures) and the failure rate (with unlimited coverage radius) that determine approximately the same average connectivity level. As an example, d̄ = 11.8 is achieved either in a non fully-meshed network with r = 0.3 or in a fully-meshed network where 76% of the links are faulty. This way, we are able to compare the impact of the two different mechanisms that may be responsible for the reduction in the network connectivity. In the following, the pairs determining the same connectivity level will be denoted by r / x.

A. Basic Gossip
The simulated curves of mean normalized error, as defined by (6), are shown in Figs. 4(a) and 4(b) for the considered scenarios. We can fix attention on a specific error value and compare the k* needed for reaching this value under a limited coverage radius or a non-null failure rate, the value of d̄ being equal. A couple of numerical examples are shown in Table II; the k* for the starting full-mesh network with no link failures is also reported as a benchmark.
Figure 4. ē(k) for different values of the coverage radius and failure rate x (basic gossip algorithm).

TABLE II. NUMBER OF CLOCK TICKS REQUIRED TO REACH ē(k*) = 10^-10.

r / x          k* (limited radius)   k* (random failures)
≥ √2 / 0       2165                  2165
0.6 / 0.35     2661                  2210
0.4 / 0.63     > 3000                2347

We notice that, as is obvious, both mechanisms increase the convergence time, but the impact of the limited radius is stronger. In other words, for a given value of d̄, achieving the target ē(k*) requires a longer time (higher k*) when the network is non fully-meshed because of the limited coverage radius, while convergence is faster when the reduced connectivity is due to failures and the missing links are distributed randomly.

B. Push-Sum Algorithm
The first problem to face for the push-sum algorithm is the optimization of the share factor α in (3). The theoretical analysis developed in Section IV gives a solid reference that, however, needs to be verified. For this purpose, we have considered 0.1 ≤ x ≤ 0.9 and, for each value of the failure rate x, we have determined αopt as the value minimizing the averaging time over a number of repetitions of the random experiment large enough to guarantee a satisfactory statistical confidence level. The result obtained is shown in Fig. 5, as a function of the average nodal degree; the relationship between x and d̄ can be obtained from Table I or, almost equivalently, from (8). The curve is rather irregular, but the optimal α lies between 0.42 and 0.48, in line with the results of the theoretical analysis in Section IV.

Figure 5. Simulated αopt for the push-sum algorithm as a function of the average nodal degree of the RGG with N = 50, in the presence of link failures and in a network where nodes have a limited coverage radius.
The theoretical approach is not able to distinguish between the case of a limited radius and that of link failures. From the figure, we see that the actual network behavior is well predicted by theory when missing links are distributed at random. When they follow from limited coverage radius, instead, numerical simulations give results that can differ significantly from theoretical expectations, particularly for low connectivity levels. Based on the simulated results, we can also say that the optimal value of α is close to 0.5 for the case of random faults, practically for any value of d (and, as observed, this is predicted by the theory), while smaller values should be adopted for α in the case of limited radius, particularly when r < 0.6. This statement is confirmed in Fig. 6, where α = 0.3 and α = 0.5 have been considered for both situations. While in case of link failures the result for α = 0.5 is better than that for α = 0.3, the opposite occurs for the non fully-meshed network, where the smaller (although not necessarily optimum) value of α reduces the convergence time.
We also see that, for both values of α, the impact of faulty links is stronger than that of limited coverage radius. This is an important difference between the behavior of the bidirectional algorithm (basic gossip) and the unidirectional one (push-sum). More will be said in Section V.D about the comparison between the two approaches.
Figure 6. ē(k) for different values of the coverage radius r and failure rate x by using the push-sum algorithm with: (a) α = 0.3 and (b) α = 0.5.

Figure 7. ē(k) for different values of the coverage radius r and failure rate x by using the broadcast algorithm with: (a) αi = 0.5, and (b) αi = 0.
C. Broadcast Algorithm
The analysis of the previous sections has been repeated for the broadcast algorithm. Fig. 7(a) shows the results, for some values of d̄, when adopting rule (5) with αi = 0.5, while Fig. 7(b) shows the results, for the same values of d̄, when adopting rule (4) (αi = 0). Although in Appendix B the optimality of αi = 0 has been proved only for the case of a fully-meshed network, simulation results demonstrate that it remains a good choice also for non fully-meshed or faulty networks. In regard to the comparison between the impact of the two causes of reduced network connectivity, we observe that the behavior of the broadcast algorithm is similar to that of the basic gossip: for the same d̄, convergence of the faulty network is usually faster than that of the non fully-meshed one.
D. Comparison between the different algorithms For the sake of comparison, the results for the various algorithms and some pairs r / x, corresponding to the same value of d , are summarized in Fig. 8. As expected, the broadcast algorithm is able to provide the fastest convergence to the average value, due to its point-tomultipoint nature. However, the basic gossip algorithm is also able to achieve good performance, though being more simple and requiring interaction only between couples of nodes. Its loss in terms of clock ticks, with respect to the broadcast algorithm, is usually limited within 30%. Moreover, there are situations, for very limited coverage radius, where the basic gossip outperforms the broadcast algorithm (see the case r = 0.3 in Fig. 8(b)).
The performance of the push-sum algorithm, instead, is worse than that of the first two algorithms. For a network with good connectivity (considered in Fig. 8(a)), the push-sum algorithm requires approximately twice as many clock ticks as the gossip algorithm to reach the same value of e(k). This can be justified by considering that, in push-sum, each interaction between two nodes is unidirectional, while in the basic gossip it is bidirectional. In other words, if the comparison were made on the basis of the number of transmissions, instead of the number of clock ticks, the efficiency of basic gossip and push-sum would become similar.
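The clock-tick versus transmission accounting discussed above can be illustrated with a small simulation. The sketch below is our own toy model, not the paper's simulator: it assumes a fully-meshed network, uniformly random node pairing, and NumPy; the function names, tick budget, and seed are arbitrary. It charges two transmissions per basic-gossip tick (bidirectional exchange) and one per push-sum tick (unidirectional shipment of a (1 − α) fraction of the (s, w) pair).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20
x0 = rng.random(N)          # hypothetical local measurements
target = x0.mean()

def basic_gossip(x0, ticks):
    # Pairwise averaging: each clock tick is bidirectional (2 transmissions).
    x = x0.copy()
    for _ in range(ticks):
        l, m = rng.choice(len(x), size=2, replace=False)
        x[l] = x[m] = 0.5 * (x[l] + x[m])
    return x, 2 * ticks

def push_sum(x0, ticks, alpha=0.5):
    # Unidirectional: the sender keeps a fraction alpha of its (s, w) pair
    # and ships the rest to a random node (1 transmission per tick).
    s, w = x0.copy(), np.ones(len(x0))
    for _ in range(ticks):
        l, m = rng.choice(len(s), size=2, replace=False)
        s[m] += (1.0 - alpha) * s[l]
        w[m] += (1.0 - alpha) * w[l]
        s[l] *= alpha
        w[l] *= alpha
    return s / w, ticks        # estimates are the ratios s_i / w_i

est_g, tx_g = basic_gossip(x0, 2000)
est_p, tx_p = push_sum(x0, 2000)
err_g = np.abs(est_g - target).max()
err_p = np.abs(est_p - target).max()
```

For the same number of clock ticks both schemes converge to the common average, but the gossip run costs twice as many transmissions, which is the normalization the comparison in the text hinges on.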
[Figure: panel (a), log-scale plot of e(k) versus k; curves for Gossip, Push-sum and Broadcast with r = 0.9 and x = 0.07.]
[Figure: panel (b), log-scale plot of e(k) versus k; curves for Gossip, Push-sum and Broadcast with r = 0.3 and x = 0.76.]

Figure 8. e(k) for the considered averaging algorithms in the case of: (a) r = 0.9 / x = 0.07 and (b) r = 0.3 / x = 0.76; optimum shares have been used, when applicable.

VI. CONCLUSION

We have developed a numerical analysis of the averaging time behavior of the basic gossip, push-sum and broadcast algorithms, taking into account the impact of link failures randomly distributed in the network. In spite of its practical importance, this topic has rarely been debated in the previous literature. Some important conclusions can be drawn from our analysis. First of all, we have demonstrated that the convergence of the algorithms is preserved, on average, regardless of the solution adopted, down to acceptably low values of the mean normalized error. Then, we have verified that the share factor in node interactions, when applicable, should be optimized so as to take the network connectivity into account. However, starting from the results known for fully-meshed networks, the optimum share factor for the push-sum algorithm turns out to be less sensitive to the failure rate than to a limited coverage radius, which is another common reason for reduced connectivity. Moreover, the assumption of randomly distributed link failures makes applicable an approximate, and inherently simple, analytical approach based on the potential function, which instead does not provide equally accurate solutions for the case of a limited coverage radius.

APPENDIX A

A modified version of the push-sum algorithm consists in adopting a bidirectional protocol in place of the unidirectional one. In practice, denoting by l and m the interacting nodes, l sends to m a fraction (1 − α) of its parameters, while retaining a fraction α, and the same is done, symmetrically, from node m to node l. So, the updating rule for nodes l and m is as follows:

$$
\begin{aligned}
s_l(k) &= \alpha s_l(k-1) + (1-\alpha)\,s_m(k-1),\\
w_l(k) &= \alpha w_l(k-1) + (1-\alpha)\,w_m(k-1),\\
s_m(k) &= \alpha s_m(k-1) + (1-\alpha)\,s_l(k-1),\\
w_m(k) &= \alpha w_m(k-1) + (1-\alpha)\,w_l(k-1),
\end{aligned}
\qquad (23)
$$

while the parameters at all the other nodes remain unchanged.

It is evident that this variant can also be seen as a modification of the basic gossip with, potentially, an asymmetric share to optimize. However, based on arguments quite similar to those presented in Section IV, we can show that the choice α = 1/2 is the best one for the case of a fully-meshed network. For this purpose, we can resort to the concept of potential function. Starting from (11), the potential function after one clock tick (which, in this case, corresponds to two transmissions, from l to m and vice versa) is as follows:
$$
\begin{aligned}
\Phi(k+1) = {} & \sum_{i \neq l,m} \sum_j \left( v_{i,j} - \frac{w_i}{N} \right)^2 \\
& + \sum_j \left[ \alpha \left( v_{l,j} - \frac{w_l}{N} \right) + (1-\alpha) \left( v_{m,j} - \frac{w_m}{N} \right) \right]^2 \\
& + \sum_j \left[ (1-\alpha) \left( v_{l,j} - \frac{w_l}{N} \right) + \alpha \left( v_{m,j} - \frac{w_m}{N} \right) \right]^2 .
\end{aligned}
\qquad (24)
$$

Through simple algebraic manipulations, this can be rewritten as:

$$
\begin{aligned}
\Phi(k+1) = {} & \sum_i \sum_j \left( v_{i,j} - \frac{w_i}{N} \right)^2 + 2\alpha(\alpha-1) \left[ \sum_j \left( v_{l,j} - \frac{w_l}{N} \right)^2 + \sum_j \left( v_{m,j} - \frac{w_m}{N} \right)^2 \right] \\
& + 4\alpha(1-\alpha) \sum_j \left( v_{l,j} - \frac{w_l}{N} \right) \left( v_{m,j} - \frac{w_m}{N} \right) .
\end{aligned}
\qquad (25)
$$

By averaging over all possible pairs (l, m) which, we remind, are chosen according to a uniform law, we obtain:

$$
\delta\Phi = \Phi(k) - \Phi(k+1) = \frac{4\alpha(1-\alpha)}{N} \Phi(k)
\qquad (26)
$$
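The average potential drop in (26) can be checked numerically. The sketch below is our own illustration, assuming NumPy and the standard bookkeeping in which v[i, j] is the share of node j's initial value held at node i (v is the identity and all weights w_i equal 1 at k = 0): it applies the bidirectional exchange (23) to every ordered pair (l, m) of a fully-meshed network and compares the exact average one-step drop of Φ with the prediction 4α(1 − α)Φ(k)/N.

```python
import numpy as np

N = 50
alpha = 0.3

# v[i, j]: share of node j's initial value held at node i; w[i]: weight.
# Initial state: v = identity, w = all ones, so that sum_i v[i, j] = 1.
v = np.eye(N)
w = np.ones(N)

def potential(v, w):
    # Phi(k) = sum_i sum_j (v_ij - w_i / N)^2
    return float(np.sum((v - w[:, None] / N) ** 2))

phi = potential(v, w)       # equals N - 1 for the initial state

# One-step potential drop, averaged over all ordered pairs (l, m):
drops = []
for l in range(N):
    for m in range(N):
        if m == l:
            continue
        v2, w2 = v.copy(), w.copy()
        # bidirectional update rule (23)
        v2[l] = alpha * v[l] + (1 - alpha) * v[m]
        w2[l] = alpha * w[l] + (1 - alpha) * w[m]
        v2[m] = alpha * v[m] + (1 - alpha) * v[l]
        w2[m] = alpha * w[m] + (1 - alpha) * w[l]
        drops.append(phi - potential(v2, w2))

mean_drop = float(np.mean(drops))
predicted = 4 * alpha * (1 - alpha) * phi / N   # right-hand side of (26)
```

For N = 50 and α = 0.3 the measured average drop agrees with the prediction to within a few percent, consistent with the approximate, large-N nature of the potential-function argument; the factor α(1 − α) also makes the maximizing role of α = 1/2 visible directly.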
The ratio δΦ/Φ(k) is therefore maximum for α = 1/2.

APPENDIX B

Let us consider rule (5) for the broadcast algorithm. In the particular case of a fully-meshed network and equal share factors for all nodes, it becomes:

$$
\alpha_{ij} =
\begin{cases}
\alpha, & j = i \\
\dfrac{1-\alpha}{N-1}, & j \neq i .
\end{cases}
\qquad (27)
$$

So, similarly to the push-sum algorithm, the value of α should be optimized. The analysis based on the potential function, however, gives α = 0 as the “optimum” value, thus confirming the validity of the choice in (4) (at least for the particular case of a fully-meshed network). The demonstration is reported next.

Using (27), the potential function at time k + 1 results in:

$$
\Phi(k+1) = \sum_j \left[ \alpha \left( v_{l,j} - \frac{w_l}{N} \right) \right]^2
+ \sum_{i \neq l} \sum_j \left[ \left( v_{i,j} - \frac{w_i}{N} \right) + \frac{1-\alpha}{N-1} \left( v_{l,j} - \frac{w_l}{N} \right) \right]^2 .
\qquad (28)
$$

Through simple algebraic manipulations, we obtain:

$$
\Phi(k+1) - \Phi(k) = \left( \alpha^2 - 1 \right) \frac{N}{N-1} \sum_j \left( v_{l,j} - \frac{w_l}{N} \right)^2
+ \frac{2(1-\alpha)}{N-1} \sum_i \sum_j \left( v_{i,j} - \frac{w_i}{N} \right) \left( v_{l,j} - \frac{w_l}{N} \right) .
\qquad (29)
$$

By averaging over all possible l, we obtain:

$$
\delta\Phi = \Phi(k) - \Phi(k+1) = \frac{1-\alpha^2}{N-1} \Phi(k)
\qquad (30)
$$

and the ratio δΦ/Φ(k) is maximum for α = 0.

REFERENCES

[1] G. Barrenetxea et al., “Demo Abstract: SensorScope, an urban environmental monitoring network,” in Proc. 4th European Conference on Wireless Sensor Networks (EWSN 2007), Delft, Netherlands, Jan. 2007.
[2] B. Baker, R. Shostak, “Gossips and telephones,” Discrete Mathematics, vol. 2, no. 3, pp. 191–193, 1972.
[3] G. Berman, “The gossip problem,” Discrete Mathematics, vol. 4, no. 1, p. 91, 1973.
[4] S. Boyd, A. Ghosh, B. Prabhakar, D. Shah, “Randomized gossip algorithms,” IEEE Trans. Inf. Theory, vol. 52, no. 6, pp. 2508–2530, June 2006.
[5] D. Kempe, A. Dobra, J. Gehrke, “Gossip-based computation of aggregate information,” in Proc. IEEE Conference on Foundations of Computer Science, Cambridge, MA, Oct. 2003, pp. 482–491.
[6] E. Zanaj, M. Baldi, F. Chiaraluce, “Optimal share factors in the push-sum algorithm for ring and random geometric graph sensor networks,” submitted for publication in the Journal of Communications Software and Systems, 2009.
[7] N. Patwari, J. N. Ash, S. Kyperountas, A. O. Hero III, R. L. Moses, N. S. Correal, “Locating the nodes: Cooperative localization in wireless sensor networks,” IEEE Signal Processing Magazine, vol. 22, no. 4, pp. 54–69, July 2005.
[8] A. G. Dimakis, A. D. Sarwate, M. J. Wainwright, “Geographic gossip: Efficient averaging for sensor networks,” IEEE Trans. Signal Processing, vol. 56, no. 3, pp. 1205–1216, March 2008.
[9] E. Zanaj, M. Baldi, F. Chiaraluce, “Efficiency of the gossip algorithm for wireless sensor networks,” in Proc. SoftCOM 2007, Split–Dubrovnik, Croatia, September 2007, Paper 7072.
[10] E. Zanaj, M. Baldi, F. Chiaraluce, “Efficiency of unicast and broadcast gossip algorithms for wireless sensor networks,” Journal of Communications Software and Systems, vol. 4, no. 2, pp. 105–112, June 2008.
[11] M. Baldi, F. Chiaraluce, E. Zanaj, “Comparison of averaging algorithms for wireless sensor networks,” in Proc. ICTTA’08, International Conference on Information and Communication Technologies, Damascus, Syria, April 2008, Paper TEL05_7.
[12] R. Merris, “A survey of graph Laplacians,” Linear and Multilinear Algebra, vol. 39, no. 1–2, pp. 19–31, July 1995.
[13] B. Mohar, “The Laplacian spectrum of graphs,” in Graph Theory, Combinatorics, and Applications, Vol. 2, Y. Alavi, G. Chartrand, O. R. Oellermann, A. J. Schwenk, Eds., Wiley, pp. 871–898, 1991.
[14] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.