Accelerating consensus gossip algorithms: Sparsifying networks can be good for you

César Asensio-Marco, Baltasar Beferull-Lozano
Group of Information and Communication Systems
Instituto de Robótica y Tecnologías de la Información & las Comunicaciones (IRTIC)
Universidad de Valencia, 46980 Paterna (Valencia), Spain
Email: {Cesar.Asensio, Baltasar.Beferull}@uv.es

Abstract—In this paper, we consider the problem of improving the convergence speed of an average consensus gossip algorithm by sparsifying a sufficiently dense network graph. Thus, instead of adding links, as usually proposed in the literature, or globally optimizing the mixing matrix of the gossip algorithm for a given network, which requires global knowledge at every node, we find a sparser network that has better spectral properties and faster convergence than the original denser one. This allows us to reduce simultaneously both the convergence time and the communication cost involved in executing the gossip algorithm. We first show why it is possible to sparsify a network while increasing its convergence rate, and that there exists an optimal fraction of links to remove. As a benchmark, we devise a centralized method that optimally selects the set of links to be removed from the original network. We then propose a low-complexity, scalable decentralized protocol requiring only local information at each node, which also generates a sparser network with a substantially better convergence rate. Simulation results are presented to verify and clearly show the efficiency of our approach.

I. INTRODUCTION

The advent of wireless sensor networks has recently attracted a great deal of research work, providing a new scenario where a decentralized way of computing with in-network processing capabilities is necessary. As an important example, the average consensus problem is an instance of a distributed problem where the goal is to compute, in a distributed way, the average of the sensor data by processing the measurements collected by the sensor nodes [1][2]. These algorithms, commonly called average consensus gossip algorithms, avoid the need to perform all the computations at a few sink nodes, thus reducing congestion around these nodes and increasing the robustness of the network.

The convergence speed and the communication cost of these distributed algorithms have been identified as the most important performance issues [3], which, in general, are determined by the topology of the network. The former can be bounded in terms of the so-called algebraic connectivity of the graph representing the network [4].

(This work was supported by the Spanish MEC Grants TEC2006-10218 “SOFIWORKS”, CONSOLIDER-INGENIO 2010 CSD2008-00010 “COMONSENS” and the European STREP Project “SENDORA”, Grant no. 216076, within the 7th European Community Framework Programme.)

The latter is approximately

determined by the number of required communications, which is also related to the topology of the network. Taking these performance issues into account, consensus algorithms have to be designed to avoid unnecessary waste of power and time. Most of the existing related research [6] focuses on redesigning the topology to improve these parameters, which in most cases [7][9] involves generating and adding some unrealistically long links to the network. However, creating links between distant nodes might not be possible due to the communication power constraints present in battery-supplied nodes. Moreover, as for the consensus itself, it is desirable to have a distributed method to find good topologies for consensus gossip algorithms.

In this paper, we focus on improving the consensus convergence behavior and power efficiency in dense, randomly deployed sensor networks. We consider the problem of improving both convergence time and power efficiency by sparsifying a sufficiently dense network graph. We show that removing some properly chosen links produces a net positive effect. We present both centralized and decentralized approaches. Finally, simulation results are presented to verify the efficiency of our approach.

The remainder of this paper is structured as follows: some background on consensus problems is presented in Section II. The motivation of our approach is given in Section III. In Section IV, we present how to tackle the problem of improving the convergence rate and power consumption by removing links from an optimization point of view. We then propose, in Section V, a deterministic principle to distributively locate suitable nodes such that, by removing links among them, it is possible to achieve better results than with the original sensor network. Section VI is devoted to validating our claims by comparing our results with existing approaches.

II. PROBLEM FORMULATION

A network of nodes can be modeled as a graph G = (V, E), consisting of a set V of N vertices and a set E of M edges. We denote an edge between vertices i and j as a pair (i, j), where the presence of an edge between two vertices indicates that they can communicate with each other.

Given a graph, we can assign an N × N adjacency matrix A, given by

Aij = 1 if (i, j) ∈ E, and Aij = 0 otherwise.

The neighborhood of a node i is defined as

Ωi = { j ∈ V : (i, j) ∈ E },  i = 1, ..., N,

and di = |Ωi| is the degree of node i. Similarly, we denote by L the N × N Laplacian matrix of the graph, which is given by

L = D − A    (1)
where D = diag(d1, ..., dN) is the so-called degree matrix.

Let us assume that the sensor nodes have some initial measurements at time slot k = 0. We collect them in a vector, which we call the initial state vector x(0); thus the average of the initial state x(0) is

xavg = (1/N) 1^t x(0)

where 1 denotes the all-ones column vector. We consider the general linear update of the state of each sensor i at time k, using only local data exchange, namely

xi(k+1) = Σ_{j=1}^{N} Wij xj(k),  i = 1, ..., N,  k = 0, 1, 2, ...

where W denotes the mixing matrix, which in this paper is assumed to be given by

W = I − αL    (2)

where α is a constant independent of time k, which we take as 1/dmax, where dmax is the maximum degree in the network. This simple choice of α ensures that, at each time slot k, we give an equal weight to every available link, and no processing is required for computing the weights. Since dmax is the maximum degree of the network, it can easily be calculated in a distributed way, and thus it is scalable.

The asymptotic convergence factor is defined as usual [5] by

r(W) = sup_{x(0) ≠ xavg·1} lim_{k→∞} ( ||x(k) − xavg·1|| / ||x(0) − xavg·1|| )^{1/k}

which has been shown to be equal to

r(W) = ρ(W − 11^t/N)

where ρ(A) = max_{1≤i≤N} |λi(A)| denotes the spectral radius of a matrix, and the associated convergence time, denoted by t(W), is given by

t(W) = −1 / log(r(W))    (3)
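To make these definitions concrete, the sketch below builds W = I − αL with α = 1/dmax for a small graph, runs the linear update x(k+1) = Wx(k), and evaluates r(W) and t(W) numerically. The 6-node topology and the initial measurements are invented purely for illustration; they are not the networks used in the paper's simulations.

```python
import numpy as np

# Hypothetical 6-node graph: a ring plus one chord (illustration only).
N = 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

d = A.sum(axis=1)
Lap = np.diag(d) - A                 # Laplacian L = D - A, Eq. (1)
alpha = 1.0 / d.max()                # simple choice alpha = 1/d_max
W = np.eye(N) - alpha * Lap          # mixing matrix W = I - alpha*L, Eq. (2)

x = np.array([4.0, 8.0, 15.0, 16.0, 23.0, 42.0])   # invented initial states x(0)
x_avg = x.mean()
for _ in range(200):                 # linear update x(k+1) = W x(k)
    x = W @ x
print(np.allclose(x, x_avg))         # every node reaches the average -> True

r = np.abs(np.linalg.eigvalsh(W - np.ones((N, N)) / N)).max()  # r(W)
t = -1.0 / np.log(r)                 # convergence time t(W), Eq. (3)
```

Since W1 = 1 (because L1 = 0), each iteration preserves the average, and for this connected, non-regular graph r(W) < 1, so the iteration converges.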

The convergence time t(W) is the main performance indicator we use in our work. In this paper, we show that it is possible to reduce the convergence time t(W) by removing some properly chosen links, thus sparsifying a given graph. We also assume a communication cost associated with every link in the network. For the sake of simplicity, we assume in this paper that this cost is constant and proportional to the squared distance of the corresponding link. Then, the total communication cost P(W) is proportional to t(W) times the power consumption of one communication iteration across the various links.

III. MOTIVATION OF OUR APPROACH

For a given graph G, the simple choice of α = 1/dmax gives slower convergence than the value of α presented in [5], which is the optimal one when (2) is assumed. If the eigenvalues of L are arranged in increasing order as 0 = λ1 ≤ λ2 ≤ ... ≤ λN, the optimal value of α is given by [5]:

α = 2 / (λ2(L) + λN(L))    (4)

where λ2(L) and λN(L) represent the second and the last eigenvalue of the Laplacian matrix. However, using this α implies having perfect global knowledge, and thus it is not scalable: every node would need to know the spectral properties of L. This value of α cannot be calculated in a practical scenario due to its inherent complexity. This motivates finding a distributed solution that keeps the simplicity of α = 1/dmax while trying to achieve a convergence time as close as possible to the one obtained using the optimal α.

It has been shown in the literature [11] that given a graph G, if we remove a link between two nodes of this graph, we get a new graph G' for which λ2(L(G')) ≤ λ2(L(G)). Moreover, the eigenvalues of W and L are related as follows:

λi(W) = 1 − αλi(L)

so if a link of the graph G is removed, we get that λ2(W(G')) ≥ λ2(W(G)), which implies slower consensus convergence for the same value of α [10]. However, if we take α as 1/dmax and the removal of a particular link gives rise to a reduction in dmax, we produce a positive effect on the value of λ2(W(G')). Therefore, by removing links (sparsifying the network), we create two opposing effects. One of the goals of this paper is to show that the second, positive effect can dominate over the first, negative one, so that sparsifying a network can increase the convergence rate. Figure 1 validates this intuition and shows how 1/dmax(G') increases faster than λ2(L(G')) decreases, which implies that λ2(W(G')) is minimized when their ratio λ2(L(G'))/dmax(G') is maximum. As stated before, there exists a performance gap between the consensus rate achieved with α = 1/dmax and the optimal one for a given graph. The aim of this paper is to close, as much as possible, the gap to the convergence rate given by the optimal α while keeping the simplicity of α = 1/dmax, which can be used in a distributed solution.
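This gap can be illustrated numerically by comparing the convergence factor r(W) = ρ(W − 11^t/N) obtained with α = 1/dmax against the optimal α of Eq. (4). The 5-node graph below is invented for illustration only:

```python
import numpy as np

# Invented 5-node connected graph for illustration.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (3, 4)]
N = 5
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
Lap = np.diag(A.sum(axis=1)) - A
lam = np.sort(np.linalg.eigvalsh(Lap))   # 0 = lam[0] <= ... <= lam[N-1]

def conv_factor(alpha):
    """r(W) = rho(W - 11^t/N) for W = I - alpha*L."""
    W = np.eye(N) - alpha * Lap
    return np.abs(np.linalg.eigvalsh(W - np.ones((N, N)) / N)).max()

r_simple = conv_factor(1.0 / A.sum(axis=1).max())   # alpha = 1/d_max
r_optimal = conv_factor(2.0 / (lam[1] + lam[-1]))   # optimal alpha, Eq. (4)
print(r_optimal <= r_simple)                        # never slower -> True
```

The optimal α minimizes max(|1 − αλ2(L)|, |1 − αλN(L)|) over α, so r_optimal ≤ r_simple always holds; the point of Sections IV and V is to shrink this gap without requiring the global spectral knowledge that Eq. (4) demands.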

Fig. 1. Average results over 1000 different randomly deployed topologies, each composed of 100 nodes. These graphs have been generated by always removing the links of the nodes with maximum degree. The parameters 1/dmax, λ2(L) and λ2(L)/dmax are shown, from left to right, as a function of the percentage of removed links.
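The trade-off behind Figure 1 can be reproduced in miniature. The sketch below (an invented 6-node graph, for illustration only) removes one link at the maximum-degree node and tracks how λ2(L) and dmax move together:

```python
import numpy as np

def spectral_pair(A):
    """Return (lambda_2(L), d_max) for adjacency matrix A."""
    d = A.sum(axis=1)
    lam = np.sort(np.linalg.eigvalsh(np.diag(d) - A))
    return lam[1], d.max()

# Invented graph: 5-node clique with a pendant node attached to node 4.
N = 6
A = np.ones((N, N)) - np.eye(N)
A[5, :] = A[:, 5] = 0.0
A[5, 4] = A[4, 5] = 1.0               # node 4 has the maximum degree (5)

l2_before, dmax_before = spectral_pair(A)

A2 = A.copy()                          # remove one link at node 4
A2[4, 3] = A2[3, 4] = 0.0
l2_after, dmax_after = spectral_pair(A2)

# lambda_2(L) can only decrease [11], while d_max drops from 5 to 4;
# lambda_2(W) = 1 - lambda_2(L)/d_max decides which effect dominates.
print(l2_after <= l2_before, dmax_after < dmax_before)
```

Whether convergence actually improves depends on which of the two effects wins, which is exactly the ratio λ2(L)/dmax plotted in the rightmost panel of Figure 1.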

IV. CENTRALIZED APPROACH

Let us first assume a centralized scheme. In order to reduce the time required to achieve global consensus, we have to focus on the spectral properties of the matrix W. As shown in [10], the averaging time increases as a function of the second largest eigenvalue of W. Thus, a natural approach is to solve the optimization problem that minimizes λ2(W). As mentioned before, α plays a crucial role when working with the matrix W. Thus, the optimization problem can be expressed as follows:

minimize_{Z,Y}  λ2(W)
s.t.  W = I − α(Z − Y)
      Yij = 0 if (i, j) ∉ E
      Yij ∈ {0, 1} if (i, j) ∈ E

Our goal is to minimize λ2(W) for a matrix W with the structure W = I − α(Z − Y) (first constraint), thus enforcing the simplicity of the choice of α, while reducing the number of links but not adding new ones (second constraint). Note that we have renamed the degree matrix D as Z and the adjacency matrix A as Y in order to clarify that they are now optimization variables. The last constraint (binary variables) is not convex, so we introduce a simple relaxation in order to convexify the problem. Relaxing the third constraint, this problem can be transformed into the following convex SDP [5]:

minimize_{s,Z,Y}  s
s.t.  W = I − α(Z − Y)
      Yij = 0 if (i, j) ∉ E
      0 ≤ Yij ≤ 1
      W − 11^t/N ≼ sI
      W1 = 1,  1^t W = 1^t

This optimization problem has the matrices Z, Y and the auxiliary variable s as optimization variables. We also restrict the values of the degrees between 1 (to ensure connectivity) and Dii (no node can increase its number of neighbors). Finally, we couple the maximum degree of the network with α, which takes values in { 1/dmax, 1/(dmax − 1), ..., 1 }. Therefore, the problem can be expressed as:

minimize_{s,Z,Y}  s
s.t.  W = I − α(Z − Y)
      Yij = 0 if (i, j) ∉ E
      0 ≤ Yij ≤ 1
      W − 11^t/N ≼ sI
      W1 = 1,  1^t W = 1^t
      1 ≤ Zii ≤ Dii  ∀i
      Zii ≥ 1/α  ∀i

where Dii is the degree of each node in the original graph G and the Zii are the variables giving the degrees in the resulting network. We solve the previous problem for the different values of α, obtaining an array of solutions from which the best one is the one with minimum λ2(W), which implies minimum convergence time.

Notice that, because of the relaxation, the result obtained directly from the relaxed optimization problem can be interpreted probabilistically. It gives a new topology that, on average, has far fewer links and a much better convergence rate than the original one; this improvement is somewhat expected because of the relaxation. However, from an implementation point of view, we need a solution where the entries of Y are zero or one and the entries of Z are integers. Given the optimized matrices Y and Z from the relaxed problem, we therefore project their values to obtain a proper solution. The method we use for this purpose treats the resulting probabilistic matrix as a random topology generator: each entry gives the probability of having a one in the corresponding entry of the generated matrix. After generating a certain number of instances (1000 in practice), we choose the topology with the minimum λ2(W).

Although this centralized approach provides an important benchmark for comparison, computing all of this in a centralized fashion is unfortunately not satisfactory in our setting. It is natural to ask whether the idea of removing links to improve the consensus algorithm can also be implemented in a decentralized fashion, in other words, whether we can design a decentralized algorithm with similar results.
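The sampling-and-selection projection step can be sketched as follows. Note that the probability matrix Y below is fabricated, standing in for the (unavailable here) output of the relaxed SDP, so this only illustrates the rounding mechanism, not the full optimization:

```python
import numpy as np

rng = np.random.default_rng(0)

def lambda2_of_W(A):
    """rho(W - 11^t/N) for W = I - (1/d_max)(D - A)."""
    N = len(A)
    d = A.sum(axis=1)
    W = np.eye(N) - (np.diag(d) - A) / d.max()
    return np.abs(np.linalg.eigvalsh(W - np.ones((N, N)) / N)).max()

# Fabricated symmetric matrix of link-retention probabilities for a 6-node
# network whose original topology is the complete graph K6 (illustration).
N = 6
Y = rng.uniform(0.3, 1.0, size=(N, N))
Y = np.triu(Y, 1)
Y = Y + Y.T

best_A, best_l2 = None, np.inf
for _ in range(1000):                       # 1000 instances, as in the paper
    U = rng.uniform(size=(N, N))
    inst = np.triu((U < Y).astype(float), 1)
    inst = inst + inst.T                    # symmetric 0/1 adjacency sample
    if inst.sum(axis=1).min() < 1:          # skip samples with isolated nodes
        continue
    l2 = lambda2_of_W(inst)                 # disconnected samples give 1,
    if l2 < best_l2:                        # so they are never selected
        best_A, best_l2 = inst, l2
```

Keeping the instance with the minimum λ2(W) mirrors the selection rule of the centralized approach; for a realistic run, Y would come from the SDP solution rather than being drawn at random.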

Fig. 2. Left: Original topology (2890 links), Middle: Resulting topology of the distributed approach (1740 links), Right: Resulting topology of the centralized approach (1148 links). The corresponding convergence times t(W) are: 12.4, 8.5, 6.2 and the corresponding power consumptions P (W) are: 6.1 × 107 , 3.5 × 107 , 2.1 × 107 respectively.

V. DECENTRALIZED APPROACH

As explained in Section I, removing links from nodes whose degree equals dmax can actually give rise to a net positive effect on the convergence time. As shown in Figure 1, the parameter λ2(W) can be reduced by removing approximately 35% of the links. Although those initial graphs were not generated using only local information, intuitively it is not necessary to know the entire set of nodes with maximum degree in order to achieve a similar solution. Therefore, our goal is to obtain similar results using only local information and distributed computations.

Let us assume a dense, randomly deployed network of N nodes inside a (sufficiently) large 2D square area of size L², where a link between two nodes is established if their internode distance is shorter than R. Since the deployment is uniformly random, the probability that there is a link between any two nodes is simply given by

p = πR²/L²

and the average number of links (average degree) of each node is approximately

E[d] = λ = Np = NπR²/L².

According to [8], the degree distribution of such a random network can be approximated by a binomial distribution or, in the limit for large-scale networks, by a Poisson distribution. Thus, assuming a large-scale network, the probability of any node having degree k is

p(k) = e^{−λ} λ^k / k!

and the degree distribution is simply N p(k).

The main idea behind this distributed approach is to find a degree γ such that all nodes with degree greater than or equal to γ are chosen to lose links. This is possible by combining the information given by Figure 1 with the degree distribution. If we assume that each node i having di ≥ γ loses (di − γ) links, and we want to remove a fraction c of the total number of links M, we can approximate this parameter using the following expression:

N [ Σ_{i=γ}^{dmax} p(i)(i − γ) ] = cM

where λ is approximated asymptotically by M/N. Then, if we split the summation into two terms, we have

Σ_{i=γ}^{dmax} p(i) i − γ Σ_{i=γ}^{dmax} p(i) = cλ.

The first summation is the expectation of a Poisson distribution with the first γ terms removed. Therefore, this summation is always lower than or equal to λ, resulting in the following inequality:

γ Σ_{i=γ}^{dmax} p(i) ≤ (1 − c)λ.

As is well known, the Poisson distribution has almost all of its probability accumulated around the parameter λ. This implies that for values of c ≥ 0.35 we have γ ≈ (1 − c)λ, since for densely deployed networks Σ_{i=γ}^{dmax} p(i) ≈ 1; in other words, both the number of nodes having low degree and the associated probability are small, p(x < 0.65λ) ≤ 0.05. Moreover, since the resulting expression for the parameter is proportional to the average degree of the network, it can easily be calculated in a distributed way (via gossiping). Figure 3 shows that using this parameter, we achieve the best convergence when γ ≈ 0.65λ, that is, by removing approximately 35% of the total number of links in the network.

In conclusion, our approach consists of removing links from nodes with high degree. These nodes are chosen using the parameter γ, which can be obtained via gossiping; the only information used is the local degree of each node.

VI. NUMERICAL RESULTS

In this section, we focus on validating our algorithms by comparing the associated convergence time of various existing approaches with the one derived from our sparse subgraphs.
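Before turning to the figures, the decentralized rule of Section V can be sketched as follows. The geometric deployment is generated here just for illustration, λ is assumed to be already known at every node (obtainable via gossiping), and the greedy choice of which neighbor to drop is one possible local policy, not necessarily the one used in the paper's simulations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical random geometric deployment: 100 nodes in an L x L square,
# with a link whenever the internode distance is below R (Section V setting).
N, L, R = 100, 1.0, 0.25
pts = rng.uniform(0, L, size=(N, 2))
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
A = ((dist < R) & (dist > 0)).astype(float)

deg = A.sum(axis=1)
lam = deg.mean()                 # average degree lambda (via gossiping)
c = 0.35                         # target fraction of links to remove
gamma = (1 - c) * lam            # threshold gamma ~ (1 - c) * lambda

# Each node with d_i >= gamma drops links toward its highest-degree
# neighbours until its degree falls to gamma (a local, greedy sketch).
for i in np.argsort(deg)[::-1]:
    while A[i].sum() > gamma:
        nbrs = np.flatnonzero(A[i])
        j = nbrs[np.argmax(A[nbrs].sum(axis=1))]   # highest-degree neighbour
        A[i, j] = A[j, i] = 0.0
```

After the pass, no node keeps a degree above γ, so roughly the target fraction c of links has been removed from the densest parts of the network.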


Fig. 3. Average values of the parameters {t(W), λ2(W), P(W)} obtained by applying our distributed method, averaged over 100 different topologies composed of 100 nodes. The parameter values are shown as a function of c, generated by removing links from nodes with degree greater than or equal to γ = (1 − c)λ. From left to right: t(W), λ2(W) and P(W). The red line marks the value of each parameter when no links are removed.

Figure 4 shows that our distributed algorithm provides much better results than using the same α without removing any links. Interestingly, it also has a convergence behavior similar to that of the optimal α applied to the original network (which requires global knowledge), and it is slower than the optimal topology obtained from the centralized approach. The numerical results for time and power consumption are summarized in Table I. Figure 3 shows the chosen value of c giving, on average, the fastest convergence when applying our distributed method while saving, at the same time, a large amount of power. Finally, Figure 2 shows an example of a network where removing links improves the convergence of the consensus algorithm.

In conclusion, we sparsify graphs so that the newly generated graphs have faster convergence behavior than the original denser ones. Moreover, fewer communication links are used for achieving consensus, which also implies a reduction in power consumption.

0.4

α = 1 / dmax distributed algorithm

max deviation

0.35

α = 1 / dmax original topology α = 1 / (λ2(L) + λN(L)) original topology

0.3 0.25 0.2 0.15 0.1 0.05 0

10

20

30

40

50

60

70

80

90

100

Fig. 4. Average convergence results (maximum deviation vs. iteration) over 100 different topologies composed of 100 nodes. The comparison is between: a) our centralized method, which removes 60% of the links (optimized topology); b) our distributed method, which removes 35% of the links (sparser topology); and the convergence with c) α = 1/dmax and d) α = 2/(λ2(L) + λN(L)) for the original topology (no links removed).

TABLE I
SIMULATION RESULTS

Method                                      | t(W)  | P(W)
α = 1/dmax, original topology               | 10.2  | 4.7 × 10^7
α = 1/dmax, distributed algorithm           |  7.6  | 2.9 × 10^7
α = 1/dmax, centralized algorithm           |  5.5  | 1.7 × 10^7
α = 2/(λ2(L)+λN(L)), original topology      |  6.2  | 2.6 × 10^7
REFERENCES

[1] L. Xiao, S. Boyd, and S. Lall, “A scheme for robust distributed sensor fusion based on average consensus,” in Proc. Fourth International Symposium on Information Processing in Sensor Networks (IPSN 2005), pp. 63-70, April 2005.
[2] R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus and cooperation in networked multi-agent systems,” Proceedings of the IEEE, vol. 95, no. 1, pp. 215-233, Jan. 2007.
[3] S. Barbarossa, G. Scutari, and A. Swami, “Achieving consensus in self-organizing wireless sensor networks: The impact of network topology on energy consumption,” in Proc. IEEE ICASSP 2007, vol. 2, pp. II-841-II-844, April 2007.
[4] R. Olfati-Saber and R. M. Murray, “Consensus problems in networks of agents with switching topology and time-delays,” IEEE Trans. Automat. Contr., vol. 49, no. 9, pp. 1520-1533, Sep. 2004.
[5] L. Xiao and S. Boyd, “Fast linear iterations for distributed averaging,” in Proc. 42nd IEEE Conference on Decision and Control, vol. 5, pp. 4997-5002, Dec. 2003.
[6] M. Cao and C. W. Wu, “Topology design for fast convergence of network consensus algorithms,” in Proc. IEEE ISCAS 2007, pp. 1029-1032, May 2007.
[7] R. Olfati-Saber, “Ultrafast consensus in small-world networks,” in Proc. 2005 American Control Conference, vol. 4, pp. 2371-2378, June 2005.
[8] M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, pp. 167-256, 2003.
[9] Z. Jin and R. M. Murray, “Random consensus protocol in large-scale networks,” in Proc. 46th IEEE Conference on Decision and Control, pp. 4227-4232, Dec. 2007.
[10] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossip algorithms,” IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2508-2530, June 2006.
[11] C. Godsil and G. Royle, Algebraic Graph Theory. Springer, 2001.
