Differential Nested Lattice Encoding for Consensus Problems ∗

Mehmet E. Yildiz

Anna Scaglione

Cornell University School of Electrical and Computer Engineering 432 Rhodes Hall Ithaca, NY 14850

Cornell University School of Electrical and Computer Engineering 325 Rhodes Hall Ithaca, NY 14853

[email protected]

[email protected]

ABSTRACT

In this paper we consider the problem of transmitting quantized data while performing an average consensus algorithm. Average consensus algorithms are protocols that compute the average value of all sensor measurements via near-neighbor communications. The main motivation for our work is the observation that consensus algorithms offer a perfect example of network communications in which the correlation between the data exchanged increases as the system updates its computations. Hence, it is possible to utilize previously exchanged data and current side information to significantly reduce the quantization bit rate required for a given precision. We analyze the case of a network whose topology is built as that of a random geometric graph and whose links are assumed to be reliable at a constant bit rate. Numerically, we show that in consensus algorithms an increasing number of iterations does not have the effect of increasing the error variance. Thus, we conclude that noisy recursions lead to a consensus if the data correlation is exploited in the message source encoders and decoders. We briefly state the theoretical results which parallel our numerical experiments.

Keywords
Consensus, average consensus, predictive coding, coding with side information, nested lattice coding

Categories and Subject Descriptors
C.2.4 [Computer-Communication Networks]: Distributed Systems—Distributed applications; E.4 [Coding and Information Theory]: Data compaction and compression

General Terms
Algorithms, Design, Performance

∗ This work was supported by the National Science Foundation under the grant number NSF CCF-0514243. The authors were visiting the Ecole Polytechnique Federale de Lausanne, Switzerland, during parts of this study.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. IPSN’07, April 25-27, 2007, Cambridge, Massachusetts, USA. Copyright 2007 ACM 978-1-59593-638-7/07/0004 ...$5.00.

1. INTRODUCTION

In [7], Saber and Murray studied a cooperative network algorithm called average consensus with synchronous updates. This algorithm allows a set of nodes (agents, sensors) to calculate the average of the initial sensor measurements by iteratively updating their states with a weighted average of their near neighbors' measurements. The authors analyzed a continuous-time update scenario with a dynamically changing topology. Moreover, they explored the necessary conditions for convergence in terms of the update weights and the network connectivity. In [6], Ren et al. studied the consensus problem in a Kalman filtering context. Rabbat et al. focused on the consensus problem with binary erasure links between neighboring nodes [5]. The motivation of our work lies in some of the results shown in [11] by Boyd et al., who derived the convergence conditions for discrete-time average consensus. The same authors explored the convergence characteristics of the problem when the state values are communicated in the presence of additive independent noise with fixed variance in [12]. The discouraging outcomes of the analysis in [12] are twofold. First, the node values may not converge to a common value. Second, the node values converge to the initial average only in mean, and the mean squared deviation from the initial average increases as the number of iterations grows. The key observation here is that consensus algorithms offer a perfect example of network communications where there is a correlation between the data exchanged, and the correlation increases as the system updates its computations. If this fact is properly accounted for in quantizing the messages, the variance of the noise should diminish, and the negative result in [12] should no longer be considered indicative of the performance that consensus algorithms can have over digital communication networks. To substantiate this claim, our paper introduces and analyzes a more detailed internode communication model, where the noise added to each computation is the result of the source encoding of the data. In particular, we assume that at every iteration a message can contain a finite number of bits, and we explore the effect of the quantization errors arising from this finite communication rate constraint.

[Figure 1: Encoding/Decoding Strategies for Consensus Problem. a) Scheme 1: Predictive Coding; b) Scheme 2: Wyner-Ziv Coding; c) Scheme 3: Predictive+WZ Coding. In each scheme, Encoder i maps z_i(k) into a message and Decoder j produces the reconstruction \tilde z_i(k), using estimates of the previous state values of the sender (sensor i) and/or the states of the receiver (sensor j).]

[Figure 2: Peer to Peer vs Broadcast. a) Peer to Peer Scheme: sensor i uses a separate encoder for each destination; b) Broadcast Scheme: a single encoder serves all receivers.]

We propose practical scalar quantizers based on predictive and/or nested lattice Wyner-Ziv encoding schemes, and we draw our conclusions based on their actual numerical implementation over random networks. More specifically, denoting by z_i(k) the unquantized message of node i at iteration k and by \tilde z_i(k) its reconstructed value at the decoder side, we consider three main strategies, represented in Fig. 1. For each of these three strategies, we consider the two possible communication scenarios in Fig. 2, called, respectively, the peer to peer case and the broadcast case. As seen in Fig. 2a, the peer to peer schemes utilize a different encoder for each particular destination, while the broadcast schemes in Fig. 2b require only one encoder for all receivers of a given sender. The reason for this distinction is that the peer to peer methods generally outperform the broadcast methods, but at the price of a more complex encoder structure and of having to send a different message to each neighbor. While sending a different message over each link is acceptable in a wired network, it is wasteful in a wireless medium, where communications are naturally broadcast and each transmission reaches all neighbors. For this reason, the peer to peer methods are proposed for wired networks and the broadcast methods for wireless networks. As mentioned above, through this study it is possible to reach a more positive conclusion than that in [12]. The paper is organized as follows: In Section 2, we review the main mathematical relationships characterizing average consensus algorithms. In Section 3, we propose a predictive coding scheme with the structure shown in Fig. 1a. In Section 4, we discuss the Wyner-Ziv encoder/decoder scheme of Fig. 1b. We combine the predictive and WZ coding strategies in Section 5 (Fig. 1c). In Section 6, we briefly discuss our theoretical results. We present simulations and the corresponding results in Section 7. We conclude the paper in Section 8.

2. MODEL

We consider a graph G = (n, A) with n nodes and adjacency matrix A defined as:

    a_{ij} > 0, if there is an edge between nodes i and j,
    a_{ij} = 0, if i = j, or if i \neq j and there is no edge.    (1)

The Laplacian¹ matrix of the graph G is defined as:

    L = \mathrm{diag}(A\mathbf{1}) - A,    (2)

where \mathbf{1} denotes the vector with all coefficients equal to one. In this study, we consider distributed average consensus algorithms updating a vector of node values x(k) = (x_1(k), \ldots, x_n(k))^T as follows:

    x_i(k+1) = x_i(k) - \sum_{j=1,\, j \neq i}^{n} \epsilon L_{ij}\bigl(x_j(k) - x_i(k)\bigr),    (3)

for i = 1, \ldots, n and k = 0, 1, \ldots. We can rewrite (3) in vector form as:

    x(k+1) = (I - \epsilon L)\, x(k) = W x(k).    (4)

It was shown by Xiao and Boyd that the above system converges to the average of any initial vector x(0) \in \mathbb{R}^n if and only if W is balanced² and \|W - \frac{1}{n}\mathbf{1}\mathbf{1}^T\| < 1, where the norm is the maximum singular value norm [12]. In other words, convergence is satisfied if 1 is an eigenvalue of W and it is also the eigenvalue with the greatest magnitude [11]. Similar results for continuous-time iterations are in [7]. For A = A^T, by constraining 0 < \epsilon < 1/\max(A\mathbf{1}), we guarantee that eq. (4) asymptotically converges to the average, as in [11]. If we decompose eq. (3) as follows:

    x(k+1) = (I - \epsilon\,\mathrm{diag}(A\mathbf{1}))\, x(k) + \epsilon A x(k),    (5)

we see that there are two different parts in the update: computing (I - \epsilon\,\mathrm{diag}(A\mathbf{1}))x(k) requires only local values, while \epsilon A x(k) uses the neighbors' values. We can therefore define

    z(k) = \epsilon x(k),    (6)

as the vector of variables which needs to be exchanged over the links available in G at each iteration k. The entries of the vector z(k), quantized with finite precision, are reconstructed as \tilde z(k) = z(k) + w(k), where w(k) is the quantization error.

¹ The Laplacian matrix is the matrix representation of a graph G.
² W is balanced if \mathbf{1}^T W = \mathbf{1}^T and (W\mathbf{1})^T = \mathbf{1}^T.
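To make the update rule (4) concrete, the following minimal Python sketch (ours, not part of the original paper; the values n = 10 and r = 0.5 are illustrative) builds a random geometric graph, forms W = I − εL, and runs the ideal, unquantized iteration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 10, 0.5                      # illustrative network size and connectivity radius

# Random geometric graph on the unit square
pts = rng.uniform(size=(n, 2))
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
A = ((dist < r) & ~np.eye(n, dtype=bool)).astype(float)   # adjacency, a_ij = 1 on edges

L = np.diag(A.sum(axis=1)) - A      # Laplacian L = diag(A 1) - A, eq. (2)
eps = 0.9 / A.sum(axis=1).max()     # 0 < eps < 1/max(A 1) guarantees convergence
W = np.eye(n) - eps * L             # update matrix of eq. (4)

x = rng.standard_normal(n)          # initial measurements x(0)
avg = x.mean()
for k in range(50):
    x = W @ x                       # x(k+1) = W x(k)

# states approach the initial average (assuming the realized graph is connected)
print(np.max(np.abs(x - avg)))
```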

While there exists a substantial body of work on average consensus protocols under infinite precision and noiseless peer to peer communications, little research has been done on introducing distortions in the message exchange, such as the noisy update assumption made in [12]. Specifically, Xiao and Boyd consider the following extension of (3):

    x_i(k+1) = x_i(k) - \sum_{j=1,\, j \neq i}^{n} \epsilon L_{ij}\bigl(x_j(k) - x_i(k)\bigr) + w_i(k),    (7)

where w_i(k), i = 1, \ldots, n, k = 0, 1, \ldots are independent, zero-mean, identically distributed random variables with fixed variance. The authors show that under these assumptions the node values converge to the initial average only in mean (i.e., we do not have mean squared convergence) and that the mean squared error (MSE) deviation from the actual average increases beyond a certain iteration. Moreover, the node values do not necessarily converge to the same value. Under a detailed internode communication model, w_i(k) can be characterized as quantization noise. If the increasing correlation among the node states is taken into account, it can be shown that the variance of the quantization noise diminishes and the nodes converge to a common value even under finite rate regimes. This is the main contribution of our study.

In the rest of the paper, we assume that the data in the initial state vector x(0) are random variables with zero mean and finite variance. Since all the variables we deal with are zero mean, the terms covariance and correlation will be used interchangeably. While our results and encoding methods can be applied in the most general case, all our numerical results consider random geometric graphs for the topology of G. A random geometric graph G(n, r) is a graph whose nodes are uniformly distributed points over a fixed area and where a link exists between any two nodes that are at a range less than r (r is called the connectivity radius). We assume that transmissions over the range r are always successful, that no channel errors are added, and that the nodes have fixed locations so that the topology of G is fixed. We also assume that the nodes are strongly connected³ and that W satisfies the convergence conditions for average consensus defined in (4).

³ A graph G is strongly connected if for every pair of vertices u and v, there is a path from u to v, and from v to u. Since we are constraining the W matrices to be symmetric, the existence of a path from u to v implies that there exists a path from v to u.

Remark 1. The methodologies introduced require knowledge of the node statistics and topology (the W matrix) to be available at each node. This assumption may not be feasible in a decentralized setting. However, our results are a starting point to analyze other suitable non-parametric encoders for this problem. In Remark 4, we briefly discuss how to relax this assumption.

3. PREDICTIVE CODING

In this section, we study the predictive encoding/decoding algorithms in Fig. 1a for the broadcast and peer to peer transmissions (Fig. 2).

[Figure 3: Differential encoder/decoder diagram. At the encoder, the prediction \hat z_i(k), produced by a predictor driven by the delayed reconstruction \tilde z_i(k), is subtracted from z_i(k) to form the prediction error d_i(k), which is quantized into \tilde d_i(k); at the decoder, \tilde d_i(k) is added to the same prediction to obtain \tilde z_i(k).]

3.1 Broadcast Predictive Coding

To exploit the local temporal correlation, the nodes can digitize z_i(k) in (6) via the differential encoding/decoding scheme in Fig. 3. For each node i and time instant k, define:

    \hat z_i(k) = prediction
    d_i(k) = prediction error
    \tilde d_i(k) = quantized prediction error
    \tilde z_i(k) = noisy reconstruction
    w_i(k) = quantization error

\tilde z_i(k) is derived through the following steps:

    \hat z_i(k) = \sum_{l=1}^{p \le k} a_i^{(k)}(l)\,\tilde z_i(k-l)    (8)

    d_i(k) = z_i(k) - \sum_{l=1}^{p \le k} a_i^{(k)}(l)\,\tilde z_i(k-l)    (9)

    \tilde d_i(k) = Q[d_i(k)] = Q[z_i(k) - \hat z_i(k)] \approx z_i(k) - \hat z_i(k) + w_i(k)    (10)

    \tilde z_i(k) = \hat z_i(k) + \tilde d_i(k)    (11)
                  = z_i(k) + w_i(k).    (12)

where in (8) \hat z_i(k) is the linear minimum mean squared error (LMMSE) estimate of z_i(k) of order p; d_i(k) in (9) is the prediction error, to be quantized and transmitted; (10) holds because the quantization noise can be modeled as additive; and (11) is the reconstruction of z_i(k) at the encoder. (12) shows that z_i(k) can be reconstructed at the receiver, since \hat z_i(k) is also known at the decoder. Assuming R_i(k) is the transmission rate of sensor i and that we use a finite-range uniform scalar quantizer⁴, at iteration k the range of the quantizer and the step size can be chosen as follows:

    \eta_i(k) = 4\,STD[d_i(k)], \qquad \Delta_i(k) = \eta_i(k)\,2^{-R_i(k)},    (13)

where STD[\cdot] is the standard deviation of a random variable. The range of the quantizer is chosen such that any realization of the random variable d_i(k) falls into this range with high probability.

⁴ Nonuniform quantizers such as Lloyd-Max may also be implemented and may result in better performance. Our choice is just for the sake of simplicity.
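As a concrete illustration of (13), the following small Python sketch (ours; the mid-tread rounding rule and the sample values are assumptions, not specified in the paper) quantizes a prediction error with a finite-range uniform quantizer:

```python
import numpy as np

def uniform_quantize(d, std_d, R):
    """Finite-range uniform scalar quantizer of eq. (13):
    range eta = 4*STD[d], step Delta = eta * 2**(-R)."""
    eta = 4.0 * std_d
    delta = eta * 2.0 ** (-R)
    d_clipped = np.clip(d, -eta / 2.0, eta / 2.0)   # values outside the range saturate
    return delta * np.round(d_clipped / delta), delta

d_quantized, step = uniform_quantize(0.37, std_d=0.5, R=4)
print(d_quantized, step)
```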

Under the high resolution assumption, we can further approximate the quantization error as a uniform random variable independent of d_i(k) and of all the above variables at time k. Hence:

    VAR[w_i(k)] \approx \frac{\Delta_i^2(k)}{12} = \frac{4\,VAR[d_i(k)]\,2^{-2R_i(k)}}{3},    (14)

where VAR[d_i(k)], the prediction error variance, is calculated below. The prediction coefficients are:

    a_i^{(k)} = v_{z_i(k)}^T M_{\tilde z_i(k-1)}^{-1},    (15)

where, for l, m = 1, \ldots, p \le k:

    [M_{\tilde z_i(k-1)}]_{lm} = E\{\tilde z_i(k-l)\,\tilde z_i(k-m)\},    (16)

    [v_{z_i(k)}]_m = E\{z_i(k)\,\tilde z_i(k-m)\}.    (17)

Hence:

    VAR[d_i(k)] = VAR[z_i(k)] - v_{z_i(k)}^T M_{\tilde z_i(k-1)}^{-1} v_{z_i(k)},    (18)

and, plugging (18) into (14), we can calculate the variance of the quantization error for a given node i and iteration k.

3.1.1 Analytical Framework

To be able to compute [M_{\tilde z_i(k-1)}] and v_{z_i(k)}, we need to calculate E\{\tilde z_i(k-l)\tilde z_i(k-m)\} and E\{z_i(k)\tilde z_i(k-l)\} for l, m \in \{1, \ldots, p\}. Either term can be obtained by taking the ii element of the cross-covariance matrices for a given k, l and m:

    E\{\tilde z_i(k-l)\tilde z_i(k-m)\} = \bigl[E\{\tilde z(k-l)\tilde z^T(k-m)\}\bigr]_{ii}
    E\{z_i(k)\tilde z_i(k-m)\} = \bigl[E\{z(k)\tilde z^T(k-m)\}\bigr]_{ii}

which are easier to calculate because the recursions are more compact to express in terms of vectors. Note that the noisy recursion has the following simple form:

    x(k) = (I - \epsilon\,\mathrm{diag}(A\mathbf{1}))\,x(k-1) + A\tilde z(k-1),    (19)

where \tilde z(k-1) = z(k-1) + w(k-1) by eq. (12). Therefore:

    z(k) = \epsilon x(k) = (I - \epsilon\,\mathrm{diag}(A\mathbf{1}))\,z(k-1) + \epsilon A\tilde z(k-1)
         = (I - \epsilon\,\mathrm{diag}(A\mathbf{1}))\,z(k-1) + \epsilon A z(k-1) + \epsilon A w(k-1)    (20)
         = W z(k-1) + \epsilon A w(k-1).    (21)

The simple expression in eq. (21) will be used to express the state covariance matrix in a recursive fashion. Further details on the calculation of E\{\tilde z(k-l)\tilde z^T(k-m)\} and E\{z(k)\tilde z^T(k-m)\} are given in Appendices A and B. We define the state and noise vector covariances as follows:

    \Sigma(k-m) \triangleq E\{z(k-m)z^T(k-m)\}, \qquad \Upsilon(k-m) \triangleq E\{w(k-m)w^T(k-m)\}.

E\{\tilde z(k-l)\tilde z^T(k-m)\} is presented in Table 1 for a given k, m, l triplet in terms of the state and noise vector covariances, W, A, and \epsilon. Similarly, E\{z(k)\tilde z^T(k-m)\} can be written in terms of these quantities as:

    E\{z(k)\tilde z^T(k-m)\} = W^m \Sigma(k-m) + \epsilon W^{m-1} A \Upsilon(k-m).    (22)

Table 1: Cross Correlations of Noisy States, E\{\tilde z(k-l)\tilde z^T(k-m)\}
    l = m:  \Sigma(k-m) + \Upsilon(k-m)
    l < m:  W^{m-l}\Sigma(k-m) + \epsilon W^{m-l-1} A \Upsilon(k-m)
    l > m:  \Sigma(k-l)(W^{l-m})^T + \epsilon\,\Upsilon(k-l)(W^{l-m-1} A)^T

The values of \Sigma(k-m) and \Upsilon(k-m), which change with the index k-m, can be calculated in an iterative fashion. In fact, using eq. (21), we can express the state covariances in terms of the known previous state covariances and noise covariances as:

    \Sigma(k) = W\Sigma(k-1)W^T + \epsilon^2 A\Upsilon(k-1)A^T.

Moreover, by plugging (18) into (14), and assuming the noises are spatially independent, the noise covariance matrix can also be written as:

    [\Upsilon(k-1)]_{ii} = \frac{4\,VAR[d_i(k-1)]\,2^{-2R_i(k-1)}}{3},    (23)

and [\Upsilon(k-1)]_{ij} = 0 if i \neq j.

3.1.2 Algorithm Summary

At iteration k, node i:
• obtains the linear predictor coefficients a_i^{(k)}, the prediction error variance VAR[d_i(k)], and the quantization interval length \Delta_i(k),
• quantizes and transmits the prediction error d_i(k),
• updates the state and noise covariance matrices for the next iteration.

Receiving the transmitted value, neighbor j:
• obtains the linear predictor coefficients a_i^{(k)},
• reconstructs z_i(k) as \tilde z_i(k),
• updates the state and noise covariance matrices for the next iteration.

Once the transmissions are complete and the neighbor values are reconstructed, the state values are updated by the recursion equation (a numerical sketch of one such iteration is given after Remark 2).

Remark 2. If the network connectivity (W), the initial distribution of the observations (\Sigma(0)), and the rate allocation (R_i(k)) are fixed, then the prediction coefficients can be calculated offline and stored at each sensor. In this case, each sensor stores n_i \times p \times K values, where n_i is the number of neighbors of sensor i, p is the predictor order, and K is the maximum number of iterations. Assuming p and K are fixed as reasonable quantities, the significant complexity factor is the number of neighbors. If the network is dense, this factor scales as \log(n). Though this results in scalability issues, if the network is dense enough it can be modeled in a more structured way. In this case, the same coefficient can be used for all neighbors, and each node stores only p \times K coefficients. A brief discussion on how to model the network in a structured way is given in Remark 4.
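To make the algorithm summary concrete, the following Python sketch (ours, not the authors' implementation) runs broadcast predictive coding with predictor order p = 1; it assumes i.i.d. zero-mean, unit-variance initial states, so \Sigma(0) = \epsilon^2 I, and that every node knows W, \epsilon and \Sigma(0), as discussed in Remark 1. All function and parameter names are illustrative.

```python
import numpy as np

def broadcast_predictive_consensus(A, eps, x0, R=5, iters=50):
    """Sketch of broadcast predictive coding with predictor order p = 1,
    following eqs. (8)-(23). Illustrative code, not the paper's implementation."""
    n = len(x0)
    D = np.diag(A.sum(axis=1))
    W = np.eye(n) - eps * (D - A)              # W = I - eps*L

    x = x0.copy()
    Sigma = eps ** 2 * np.eye(n)               # Sigma(k) = E{z(k) z(k)^T}
    Sigma_prev = None                          # Sigma(k-1)
    Upsilon_prev = None                        # Upsilon(k-1) = E{w(k-1) w(k-1)^T}
    z_tilde = np.zeros(n)                      # reconstructions \tilde z(k-1)

    for k in range(iters):
        z = eps * x                            # eq. (6)
        if k == 0:
            z_hat = np.zeros(n)                # no history yet: quantize z directly
            var_d = np.diag(Sigma).copy()
        else:
            # eq. (22) with m = 1 and Table 1 (l = m): first-order LMMSE predictor
            cross = np.diag(W @ Sigma_prev + eps * A @ Upsilon_prev)  # E{z_i(k) z~_i(k-1)}
            M = np.diag(Sigma_prev) + np.diag(Upsilon_prev)           # E{z~_i(k-1)^2}
            a = cross / M                      # eq. (15) for p = 1
            z_hat = a * z_tilde                # eq. (8)
            var_d = np.diag(Sigma) - cross ** 2 / M                   # eq. (18)

        # eqs. (9)-(13): quantize the prediction error with a uniform mid-tread quantizer
        sd = np.sqrt(np.maximum(var_d, 1e-12))
        step = 4.0 * sd * 2.0 ** (-R)
        d = np.clip(z - z_hat, -2.0 * sd, 2.0 * sd)
        z_tilde = z_hat + step * np.round(d / step)                   # eqs. (11)-(12)

        # state update with the reconstructed neighbor values, eq. (19)
        x = (np.eye(n) - eps * D) @ x + A @ z_tilde

        # covariance bookkeeping: eq. (23) and Sigma(k+1) = W Sigma(k) W^T + eps^2 A Upsilon(k) A^T
        Upsilon_prev = np.diag(4.0 * var_d * 2.0 ** (-2 * R) / 3.0)
        Sigma_prev = Sigma
        Sigma = W @ Sigma_prev @ W.T + eps ** 2 * A @ Upsilon_prev @ A.T
    return x

# Example (hypothetical parameters): a 10-node random geometric graph
rng = np.random.default_rng(1)
pts = rng.uniform(size=(10, 2))
A = ((np.linalg.norm(pts[:, None] - pts[None, :], axis=2) < 0.5)
     & ~np.eye(10, dtype=bool)).astype(float)
eps = 0.9 / A.sum(axis=1).max()
x0 = rng.standard_normal(10)
print(broadcast_predictive_consensus(A, eps, x0))   # near x0.mean() if the graph is connected
```

The covariance bookkeeping mirrors (22), (23) and Table 1, so in this sketch the encoder and the decoders compute the same predictor coefficients without exchanging them.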

3.2 Peer to Peer Predictive Coding

At a given time instant k, sensor i knows not only its own previously quantized values (\tilde z_i(k-1), \tilde z_i(k-2), \ldots), but also its neighbors' previously quantized values, i.e., \tilde z_j(k-1), \tilde z_j(k-2), \ldots, where j is a neighbor of i. If the link is bidirectional, the prediction error at node i can be further reduced by having:

    \hat z_i(k) = \sum_{l=1}^{p} a_i^{(k)}(l)\,\tilde z_i(k-l) + b_i^{(k)}(l)\,\tilde z_j(k-l),    (24)


where a_i^{(k)}(l) and b_i^{(k)}(l) are the corresponding LMMSE coefficients. The derivations related to this method follow exactly the same logical steps as in Section 3.1. Due to lack of space, the detailed calculations are given in [13].

4. CODING WITH SIDE INFORMATION

[Figure 4: An example of nested lattice codes in 1-D. A uniform grid with step \Delta, spanning -4\Delta to 4\Delta, whose intervals are alternately labeled bin 0 and bin 1; the observation x is quantized to one of the intervals (3\Delta in the example), only that interval's bin label is transmitted, and the decoder uses the side information y to produce the reconstruction (\hat x = \Delta in the example).]

In this section, we exploit the fact that a node can use its own sequence of present and past values locally, all or in part, as side information ([10], [8]) and therefore can improve the accuracy of the reconstruction of the neighbors' states in a way that is comparable to predictive coding. To this end, we note that the calculations in Section 3.1.1 provide the covariance matrices of the current states and the cross covariance of the current states with the previous states of all nodes. We propose the use of a simple scalar nested lattice code to utilize the side information. We analyze two schemes, namely the broadcast and the peer to peer versions of the strategy.

4.1 WZ scalar quantization

Coding with side information, or Wyner-Ziv (WZ) coding, is the encoding strategy that leads to a reduced rate or improved performance by relying on the fact that the decoder can make use of side information correlated with the incoming message. In the classical WZ scheme, there is a single encoder-decoder pair, and the decoder has data that can be represented as a noisy version of the data to encode. The rate distortion function for this problem was studied by Wyner and Ziv in [10]. Our implementation of WZ will be based on a nested lattice code construction (see e.g. [4] and [14]). We will use the simple encoding approach discussed in [3] and [9], which we summarize next. The decoder has access to the side information y \in \mathbb{R}. The encoder compresses its observation x \in \mathbb{R} in the following way (see Fig. 4, which shows an example of a scalar nested lattice code with rate R = 1 bit and quantization interval length \Delta):

• The encoder partitions the observation space into disjoint uniform quantization intervals of length \Delta and determines the index of the interval to which x belongs (e.g. 3\Delta in Fig. 4).

• The encoder assigns a label l to each of the quantization intervals, where l \in \{1, \ldots, 2^R\} and R represents the transmission rate.

• The encoder transmits the label (coset index) of the interval to which x belongs (e.g. bin 1 in Fig. 4).

The reader should note that there are infinitely many intervals which are assigned the same coset index by the encoder in this scheme. Assuming that the coset index is available at the

decoder error free, the decoder uses the side information (y) and the coset index (bin 1) to determine the most likely interval in which x falls among those that share the same coset index. A model often used to represent the dependency between x and y is y = x + n, where n is random noise with a unimodal zero-mean distribution. In this case a minimum distance rule can be used to decide the quantization interval where x lies. Then, the quantized value of x is recovered as the centroid of this interval (e.g. \hat x = \Delta in Fig. 4). The formal algorithm is as follows:

• Given an integer P = 2^R and a quantization step \Delta, x is encoded as:

    p = \left[\frac{x}{\Delta}\right] \bmod P,    (25)

where mod is the modulo operation that returns a number p = 0, \ldots, P-1 and the brackets indicate the rounding operation.

• The coset index p is transmitted to the decoder error free.

• Using p and the side information y, the decoder reconstructs x by the minimum distance (MD) rule [9] as:

    \hat x = \left[\frac{y/\Delta - p}{P}\right] P\Delta + p\Delta.    (26)

Since the rate is fixed by the communication constraint, the only parameter to choose in the encoder/decoder pair specified in eqs. (25) and (26) is the quantization step \Delta. The choice of the quantization step is non-trivial due to its complex effect on the actual distortion that one obtains with a finite nested lattice code, such as our scalar quantizer. In fact, the decoding rule in (26) can produce errors with a much larger range than the quantization step \Delta, and the expression of the error variance is a nonlinear function of the rate and the step size. The error variance expression for this scheme is given in [3]. It is valid under the assumptions of jointly Gaussian random variables x and y and high resolution. The encoded data x and the side information y are such that y = x + n, where n is also a Gaussian random variable

and independent of x. In the scalar case, the expression is:

    E\{(x - \hat x)^2\} = \Phi(R, \Delta, \sigma_n) = 2^{2R}\Delta^2 \sum_{k=0}^{\infty} (2k+1)\, Q\!\left(\frac{(k + \tfrac{1}{2})\, 2^R \Delta}{\sigma_n}\right) + \frac{\Delta^2}{12},    (27)

where \sigma_n is the variance of n and Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt. Although the expression is a nonlinear function of the quantization interval \Delta, it is convex and therefore has a minimum. Moreover, it can be minimized numerically with respect to \Delta for a given choice of rate and \sigma_n. Considering Gaussian random states, we propose to use the methodology in (25) and (26) with a step size chosen to minimize (27) to encode the states z_i(k), making use of the side information at the decoder side. The strategy can easily be generalized to other distributions by modifying (27) appropriately. Next, the broadcast and the peer to peer strategies are discussed in this order.
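A minimal Python sketch (ours, not the authors' implementation) of the scalar nested-lattice encoder (25), the minimum-distance decoder (26), and a brute-force choice of the step size minimizing a truncated version of (27); the truncation length, the search grid, and the toy values are arbitrary illustrative choices:

```python
import numpy as np
from math import erfc

def Q(x):
    """Gaussian tail probability."""
    return 0.5 * erfc(x / np.sqrt(2.0))

def wz_encode(x, delta, R):
    """Coset index of eq. (25)."""
    return int(np.round(x / delta)) % (2 ** R)

def wz_decode(p, y, delta, R):
    """Minimum-distance reconstruction of eq. (26)."""
    P = 2 ** R
    return np.round((y / delta - p) / P) * P * delta + p * delta

def phi(R, delta, sigma_n, terms=200):
    """Error-variance expression (27), truncated to `terms` summands."""
    k = np.arange(terms)
    args = (k + 0.5) * (2 ** R) * delta / sigma_n
    tail = np.sum((2 * k + 1) * np.array([Q(a) for a in args]))
    return (2 ** (2 * R)) * delta ** 2 * tail + delta ** 2 / 12.0

def best_delta(R, sigma_n, grid=np.linspace(1e-3, 2.0, 400)):
    """Numerical minimization of (27) over the step size (grid bounds are arbitrary)."""
    return grid[int(np.argmin([phi(R, d, sigma_n) for d in grid]))]

# Toy usage with assumed values: y = x + n, sigma_n known to both sides
rng = np.random.default_rng(0)
R, sigma_n = 3, 0.1
delta = best_delta(R, sigma_n)
x = rng.standard_normal()
y = x + sigma_n * rng.standard_normal()
print(x, wz_decode(wz_encode(x, delta, R), y, delta, R), delta)
```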

4.2 Broadcast WZ Coding

In this case, there is a single encoder and several receivers with heterogeneous side information. Recently, Feder et al. considered this problem for discrete sources in [2]. The authors make an analogy between the degraded broadcast channel capacity problem and the broadcast Slepian-Wolf problem and provide achievable broadcast rates under static and streaming scenarios⁵. In our paper we use a simpler, albeit suboptimum, encoder: a single encoding scheme optimizing the quantization interval length \Delta to obtain the minimum average error variance at the receivers. Let us denote the neighbor set of sensor i as N_i. For each receiver j \in N_i, we can map the quantities x and y = x + n of the previous section as follows:

    z_i(k) \rightarrow x    (28)

    \hat z_{ji}(k) = \sum_{l=1}^{p} a_{ji}(l)\,\tilde z_{ji}(k-l) + \sum_{l=0}^{p} b_{ji}(l)\, z_j(k-l)    (29)

    \hat z_{ji}(k) \rightarrow y    (30)

where a_{ji}(l) and b_{ji}(l) are the linear estimation coefficients, \hat z_{ji}(k) is the linear estimate of z_i(k), and \tilde z_{ji}(k-l) is the reconstruction of z_i(k-l) at sensor j. Note that, although there is only one encoder used for all neighbors, the reconstruction of the state value \tilde z_{ji}(k-l) (and its quantization noise) is different at each receiver due to the heterogeneous side information \hat z_{ji}(k). To compute the side information, the decoder uses the supervector \gamma_{ji}(k):

    \gamma_{ji}(k) = [z_j(k)\;\; z_j(k-1)\;\; \ldots\;\; z_j(k-p)\;\; \tilde z_{ji}(k-1)\;\; \ldots\;\; \tilde z_{ji}(k-p)]^T,    (31)

and the covariance matrix and cross-correlation vector:

    M_{ji}(k) = E\{\gamma_{ji}(k)\,\gamma_{ji}^T(k)\}    (32)

    v_{ji}(k) = E\{z_i(k)\,\gamma_{ji}^T(k)\}.    (33)

Then, the side information in (29) can be written in a compact form as:

    \hat z_{ji}(k) = v_{ji}^T(k)\, M_{ji}^{-1}(k)\,\gamma_{ji}(k).    (34)

⁵ At the moment no extension exists for the WZ result, and the scheme proposed in [2] requires a rather complex procedure.

The reader should note that M_{ji}(k) is a (2p+1) \times (2p+1) matrix for a given order p. The upper-left (p+1) \times (p+1) block of the matrix contains the cross correlations of the states k, \ldots, k-p at sensor j and can be calculated by (50). The lower-right p \times p block is the covariance of the noisy reconstructions of sensor i and is calculated via the recursion shown in Table 1. The upper-right and lower-left blocks of the matrix are the cross correlations between the reconstructions of sensor i and the states of sensor j and are derived in (22). The necessary equations to derive the vector v_{ji}(k) are (22) and (23). In order to choose the quantizer parameters, both the encoder and the decoders need to know the statistics of d_{ji}(k) = z_i(k) - \hat z_{ji}(k), where d_{ji}(k) is the estimation error of z_i(k) and is used as the side information noise (i.e., d_{ji}(k) \rightarrow n). The variance of d_{ji}(k) is necessary to derive \Delta:

    \sigma^2_{d_{ji}}(k) = E\{(z_i(k) - \hat z_{ji}(k))^2\} = VAR[z_i(k)] - v_{ji}^T(k)\, M_{ji}^{-1}(k)\, v_{ji}(k).    (35)

Using (35) we can derive:

    \Delta_i(k) = \arg\min_{\Delta > 0} \sum_{j \in N_i} \Phi(R, \Delta, \sigma_{d_{ji}}(k)),    (36)

which is the common quantization interval length minimizing the average error variance over all j \in N_i. Using (28), (30) and the \Delta_i(k) in (36), the encoder i computes the coset index and broadcasts it to its neighbors. Upon receiving the coset index error free, each neighbor reconstructs z_i(k) as in (26). Once all the neighbors' information is received and decoded, the sensor i performs the following state update:

    x_i(k) = \Bigl(1 - \epsilon \sum_{j=1}^{n} a_{ij}\Bigr) x_i(k-1) + \sum_{j=1}^{n} a_{ij}\,\tilde z_{ij}(k-1).    (37)
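Reusing the phi() helper from the Section 4.1 sketch (again illustrative code, not the paper's), the common step size in (36) reduces to a one-dimensional search:

```python
import numpy as np

def common_step(R, sigma_list, grid=np.linspace(1e-3, 2.0, 400)):
    # Eq. (36): one step size per sender i, minimizing the total (equivalently,
    # average) error variance over its neighbors; phi() is the helper defined in
    # the Section 4.1 sketch, and the grid bounds are arbitrary illustrative choices.
    costs = [sum(phi(R, d, s) for s in sigma_list) for d in grid]
    return grid[int(np.argmin(costs))]

# delta_i = common_step(R=3, sigma_list=[0.08, 0.12, 0.05])   # hypothetical sigma_dji values
```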

To write the network equations, we introduce \tilde Z(k) as an n \times n matrix whose ij entry is the reconstruction of z_j(k) by sensor i. The network equation is given as:

    x(k) = (I - \epsilon\,\mathrm{diag}(A\mathbf{1}))\, x(k-1) + (A \odot \tilde Z(k-1))\mathbf{1},    (38)

where \odot represents entry-by-entry matrix multiplication and \mathbf{1} is the all-ones column vector. Once all sensors update their current states as in (37), the next step requires knowing M_{ji}(k+1) and v_{ji}(k+1), defined in (32) and (33). As mentioned before, these quantities can be written in terms of the state and noise covariances and calculated recursively as indicated in Section 3.1.1. The only difference here compared to Section 3.1.1 is that, because each term \tilde z_{ij}(k-1) has its own specific quantization error w_{ij}(k-1) associated with it, the update of the covariance matrix of the states has a different form compared to (23). By eq. (6) and (38):

    z(k) = W z(k-1) + \beta(k-1),    (39)

where [\beta(k-1)]_i = \epsilon \sum_{j=1}^{n} a_{ij}\, w_{ij}(k-1). Then:

    E\{z(k)z^T(k)\} = W E\{z(k-1)z^T(k-1)\} W^T + E\{\beta(k-1)\beta^T(k-1)\},    (40)

where, approximating the w_{ij}(k-1) as being spatially uncorrelated:

    [E\{\beta(k-1)\beta^T(k-1)\}]_{ii} = \epsilon^2 \sum_{j=1}^{n} a_{ij}^2\, VAR[w_{ij}(k-1)]

and the non-diagonal entries are 0. We note that

    VAR[w_{ji}(k)] = \Phi(R, \Delta_i(k), \sigma_{d_{ji}}(k))    (41)

for a given i, j pair, where \Delta_i(k) is given in (36). The procedure described in this section is followed for all k iteratively.

4.3 Peer to Peer WZ

This strategy utilizes a different encoder for each neighbor. For each sender-receiver pair (i, j), we define the message to be sent and the side information as in (28) and (30). We follow the same procedure (31)-(35) to calculate the predictor coefficients and the prediction error. In this case, the optimum quantization interval is given as:

    \Delta_{ji}(k) = \arg\min_{\Delta > 0} \Phi(R, \Delta, \sigma_{d_{ji}}(k)),    (42)

instead of (36). The sensor i encodes z_i(k) as in (25) with \Delta = \Delta_{ji}(k). The sensor j reconstructs z_i(k) as \tilde z_{ji}(k) via (26). The state updates are given as in (38). The only difference in this scheme compared to the previous one is that each encoder has a different step size, and hence (41) should be replaced by

    VAR[w_{ji}(k)] = \Phi(R, \Delta_{ji}(k), \sigma_{d_{ji}}(k)),    (43)

for a given i, j pair, where \Delta_{ji}(k) is given in (42).

5. COMBINED SCHEME

In this section, we combine predictive coding with coding with side information to obtain the improved performance of predictive coding for finite quantizers together with the further gain from the side information that is present at the decoder only. Under the combined scheme, we only discuss the broadcast strategy. The details of the peer to peer strategy are tedious but straightforward at this point.

5.1 Combined Broadcast Scheme

This strategy makes use of the previous state estimates of the sender at the encoder and the decoder, as in Section 3.1, to generate a prediction error which is to be encoded. The difference between the combined strategy and the pure predictive strategy is that the decoder also uses its present and past values as side information to improve the performance. The encoder/decoder structure is shown in Fig. 5. This scheme is the concatenation of a predictive coder and a WZ coder. The algorithm is as follows. Let i be the sender and N_i be its neighbor set. The predictive encoder maps z_i(k) into the prediction error d_i(k) as in (8)-(9). The coefficients of the linear predictor can be calculated as in Section 3.1 via (15)-(17). Then, for each receiver j \in N_i, as we have done in Section 4.2, we can set:

    d_i(k) \rightarrow x    (44)

    \hat d_{ji}(k) = \alpha_{ji}(k)\, d_j(k) \rightarrow y \quad \text{(side information)}    (45)

where \hat d_{ji}(k) is the linear MMSE estimate of d_i(k) in terms of d_j(k).

[Figure 5: Joint predictive and nested lattice code encoder/decoder diagram. The predictive encoder at sensor i produces d_i(k), which is compressed by the WZ encoder; the WZ decoder at sensor j uses d_j(k) as side information to recover \tilde d_i(k), and the predictive decoder outputs \tilde z_i(k).]

Remark 3. One may argue that a linear predictor of d_i(k) in the form of (29) would perform better than (45). While this argument is valid, we take advantage of the fact that d_j(k) is available at the receiver for free, since it is the quantity transmitted by sensor j at iteration k. The estimator coefficient in (45) is equal to:

    \alpha_{ji}(k) = E\{d_i(k)\, d_j(k)\} / VAR[d_j(k)],    (46)

where

    E\{d_i(k)\, d_j(k)\} = [E\{d(k)\, d^T(k)\}]_{ij}

and

    VAR[d_j(k)] = [E\{d(k)\, d^T(k)\}]_{jj}

and

    E\{d(k)\, d^T(k)\} = E\{z(k)z^T(k)\} - \sum_{l=1}^{p} E\{z(k)\tilde z^T(k-l)\} K_l^T - \sum_{l=1}^{p} K_l E\{\tilde z(k-l) z^T(k)\} + \sum_{i=1}^{p}\sum_{l=1}^{p} K_i E\{\tilde z(k-i)\tilde z^T(k-l)\} K_l^T.    (47)

In (47), K_l is an n \times n matrix whose ii entry is a_i^{(k)}(l) as defined in (15), and the non-diagonal entries are approximated as 0. The derivation of the equality (47) is not given due to the space constraint. For each receiver j, the estimation error e_{ji}(k) = d_i(k) - \hat d_{ji}(k) is now the side information noise (i.e., e_{ji}(k) \rightarrow n). Its variance is:

    \sigma^2_{e_{ji}}(k) = E\{(d_i(k) - \hat d_{ji}(k))^2\} = VAR[d_i(k)] - E^2\{d_i(k)\, d_j(k)\}/VAR[d_j(k)].

For all receivers j \in N_i, the common quantization interval length is derived by using \sigma_{e_{ji}}(k) instead of \sigma_{d_{ji}}(k) in (36). The reconstruction of d_i(k) at sensor j is \tilde d_{ji}(k). Then, the predictive decoder reconstructs z_i(k) as \tilde z_{ji}(k) = \tilde d_{ji}(k) + \hat z_{ji}(k-1). To be able to encode/decode in the next iteration, the state values, the state covariances, and the noise covariances have to be updated. The corresponding equations are analogous to (38), (23), and (41), respectively.

6. THEORY OVERVIEW

Recently we have studied the convergence conditions of the proposed algorithms. We have explored the conditions on the quantization noises at each step which lead to a consensus value whose MSE deviation from the initial mean is bounded. Furthermore, we have shown that the proposed algorithms can achieve the bounded consensus with zero mean asymptotically. Due to lack of space, we only state the lemmas; the proofs are given in [13].

Lemma 1. Given a finite number of sensors (n < \infty) and a bounded A matrix (a_{li} < \infty\ \forall\, l, i \in \{1, \ldots, n\}), the mean squared deviation from the initial average is bounded if and only if \lim_{k \to \infty} \sum_{t=0}^{k-1} E\{w_i^2(t)\} converges \forall\, i \in \{1, \ldots, n\}. Therefore, a necessary and sufficient condition for the MSE deviation to be bounded is that the noise variances at each sensor form a convergent series.

Lemma 2. All rate allocations that allow the node values to converge to a common value will eventually converge to zero rate under the predictive coding scheme and the WZ coding scheme.

While we do not have an expression for the optimum distortion rate regions for a given MSE bound from the initial mean, one example of a rate allocation which achieves bounded convergence with zero asymptotic rate can be obtained from Lemmas 1 and 2 as follows. Define the ratio of consecutive noise variances of sensor i at iteration k as \sigma^2_{w_i}(k+1)/\sigma^2_{w_i}(k) = \eta_i(k+1). If R_i(k) is chosen such that \eta_i(k) = \bigl(\frac{k}{k+1}\bigr)^{\beta} with \beta > 1, then

    \sum_{k=1}^{\infty} \sigma^2_{w_i}(k) = \sigma^2_{w_i}(1)\,\zeta(\beta),

where \zeta(\cdot) is the Riemann zeta function and has a finite value.
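For completeness, a short verification of the series value above, under the convention that consecutive noise variances satisfy \sigma^2_{w_i}(k+1)/\sigma^2_{w_i}(k) = (k/(k+1))^{\beta}:

    \sigma^2_{w_i}(k) = \sigma^2_{w_i}(1)\prod_{t=1}^{k-1}\frac{\sigma^2_{w_i}(t+1)}{\sigma^2_{w_i}(t)}
                      = \sigma^2_{w_i}(1)\prod_{t=1}^{k-1}\Bigl(\frac{t}{t+1}\Bigr)^{\beta}
                      = \frac{\sigma^2_{w_i}(1)}{k^{\beta}},

    \sum_{k=1}^{\infty}\sigma^2_{w_i}(k) = \sigma^2_{w_i}(1)\sum_{k=1}^{\infty} k^{-\beta} = \sigma^2_{w_i}(1)\,\zeta(\beta) < \infty \quad (\beta > 1).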

Remark 4. To address the issues in Remarks 1 and 2, we will use a special kind of random graph, obtained by placing n nodes on the surface of a d-dimensional unit torus. We assume that d = 2 and that the node coordinates are random with a uniform distribution over the unit torus. The torus approach helps us to model the network without any boundary effects. It was shown in [1] that if the connectivity radius r is such that r^2 \geq 4\log(n)/n, then for large n each node has a fixed number of neighbors with high probability. If n and r are chosen such that this condition is satisfied, and the initial observations of the sensors are i.i.d., the W matrix can be constructed independently by each node. In other words, each node can label the network (edges and vertices) individually without any loss of performance with respect to the case where W is known globally. This is due to the fact that the reconstructed graphs are isomorphic. Moreover, the predictor coefficients for each neighbor will be the same, which addresses the complexity issues in Remark 2. More detailed discussions are given in [13].

7. NUMERICAL RESULTS

In this section, we show the numerical performance of the proposed broadcast algorithms, illustrating their convergence characteristics in the network scenario defined next. For simulation purposes, several random geometric graphs G(n, r) are generated with nodes uniformly distributed over a unit square. According to the definition of G(n, r), there exists a link between any two nodes if their range is less than r. We assume that there are no channel errors between two nodes that are connected. Moreover, we assume that the matrix A in (1) is such that:

    a_{ij} = 1, if there is an edge between nodes i and j,
    a_{ij} = 0, if i = j, or if i \neq j and there is no edge.    (48)

[Figure 6: State evolution (n = 10, R = 5, r = 0.5). Four panels show the node states versus iteration (0 to 50): a) Predictive coding p = 1; b) Predictive coding p = 2; c) WZ coding p = 0; d) Predictive+WZ coding p = 1.]

Fig. 6 shows the evolution of the sensor states as the algorithm iterates under the different methods, using R = 5 quantization bits per state. All of these methods are initialized with the same zero-mean, unit-variance Gaussian random data. There are 10 nodes in the network, and the connectivity radius is 0.5. Figs. 6a and 6b show predictive coding with orders 1 and 2. We do not observe any significant improvement by increasing the prediction order p. In fact, for p \geq 2 the covariance matrix becomes badly scaled, giving rise to numerical errors in the computation of the prediction coefficients. Figs. 6c and 6d show WZ coding with p = 0 and combined coding with p = 1. All four methods converge to a consensus which is very close to the initial average (indicated with *). Fig. 7 shows the behavior of the average mean squared error (MSE) per iteration, defined as:

    MSE(k) = \frac{1}{n}\sum_{i=1}^{n}\Bigl(z_i(k) - \frac{1}{n}\sum_{i=1}^{n} z_i(0)\Bigr)^2.

In other words, MSE(k) is the average mean squared distance of the states at iteration k from the initial mean. The algorithms are simulated over 1000 Monte Carlo realizations of the initial state values, for a maximum of 400 iterations of the average consensus algorithm, and averaged over 13 different random geometric graphs. The connectivity radius of each graph is a specific realization of a uniformly distributed random variable in the (0, 1) interval. Fig. 7 shows that quantizing without taking into account the increasing correlations among the states results in an increasing and unbounded error variance, as discussed by Boyd et al. [12] (dotted line). Instead, it appears that if we utilize the correlation structure of the data exchanged as in our paper, the error variance converges to a finite value (solid line), irrespective of the iteration number.
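As a small helper (ours, not the paper's code), MSE(k) can be computed from a stored state history as follows:

```python
import numpy as np

def mse_per_iteration(z_hist):
    """z_hist: array of shape (iterations, n) holding z_i(k).
    Returns MSE(k) = (1/n) * sum_i (z_i(k) - mean_i z_i(0))**2 for every k."""
    initial_mean = z_hist[0].mean()
    return ((z_hist - initial_mean) ** 2).mean(axis=1)
```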

[Figure 7: MSE versus iteration (n = 10, R = 0.1). MSE (dB) over 400 iterations, comparing predictive coding p = 1 with simple quantization.]

[Figure 9: Final state variance (n = 10, r = 0.5, k = 40). Empirical variance of the final states versus rate per sensor per iteration (1 to 5 bits).]

[Figure 8: Rate Distortion Curve (n = 10, r = 0.5). MSE (dB) versus rate per transmission per sensor (3 to 10 bits), for WZ coding with p = 0 and predictive coding with p = 1.]

Since it is shown numerically that the average consensus noise variance is bounded under the proposed algorithms, as in Fig. 7, one can investigate the rate-distortion trade-offs of these methods for a fixed number of iterations. Fig. 8 shows the rate-distortion performance of predictive coding (p = 1) and WZ coding (p = 0). The results are averaged over 1000 Monte Carlo runs under a single random network topology. While it is clear that the predictive coding method performs better, the WZ scheme also has reasonably good performance. This result is promising in the following way: one may argue that, due to the memory and lifetime constraints of battery-powered wireless networks, storing and processing previous values may not be feasible. In this case, one can set p = 0 in the WZ coding strategy (no previous values, only current states) and still reach reasonably good performance. Having said that, we also observe that while the nodes reach a consensus under the predictive coding strategy at all rates, they may not under the WZ scheme at lower rates, as in Fig. 9. This is partly due to the fact that the WZ quantizer performance degrades, but also due to the fact that in our algorithms the parameters are set based on an analysis that is valid under the high rate quantization assumption.

8. FINAL REMARKS

We have proposed and investigated source coding strategies for average consensus problems that make use of the temporal and spatial correlation of the data exchanged throughout the network. We have given the mathematical framework for predictive coding, nested lattice coding, and combined coding for both the peer to peer and broadcast scenarios, providing the details on how to implement each strategy. Our most significant observation is that, even with highly suboptimal encoding strategies, using the temporal correlation and the ever increasing spatial correlation leads to a non-increasing MSE as the number of iterations increases. Hence, this decreases the sensitivity of the algorithm precision to the convergence speed of the algorithm itself. On the other hand, this also highlights that the accuracy gained by increasing the number of iterations saturates with these simple approaches. Hence, future work should clarify whether optimum schemes would also have saturating benefits. Moreover, the comparison of our strategies leads to the interesting observation that the predictive coding strategy performs better, but the WZ scheme has promising performance even with p = 0.

9. REFERENCES
[1] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Mixing times for random walks on geometric random graphs. In Workshop on Analytic Algorithms and Combinatorics Proc., January 2005.
[2] M. Feder and N. Shulman. Source broadcasting with unknown amount of receiver side information. In IEEE Information Theory Workshop Proc., pages 127–130, Bangalore, India, 2005.
[3] Z. Lin, S. Cheng, A. D. Liveris, and Z. Xiong. Slepian-Wolf coded nested quantization (SWC-NQ) for Wyner-Ziv coding: Performance analysis and code design. In Data Compression Conference Proc., pages 322–331, 2004.
[4] S. S. Pradhan and K. Ramchandran. Distributed source coding using syndromes (DISCUS): Design and construction. In Data Compression Conference Proc., pages 158–167, March 1999.
[5] M. Rabbat, R. Nowak, and J. Bucklew. Generalized consensus computation in networked systems with erasure links. In IEEE Workshop on Signal Processing Advances in Wireless Communications Proc., pages 1088–1092, June 2005.
[6] W. Ren, R. W. Beard, and D. B. Kingston. Multi-agent Kalman consensus with relative uncertainty. In American Control Conference Proc., pages 1865–1870, OR, USA, June 2005.
[7] R. O. Saber and R. M. Murray. Agreement problems in networks with directed graphs and switching topology. California Institute of Technology Technical Report, CIT-CDS 03-005, 2003.
[8] D. Slepian and J. K. Wolf. Noiseless coding of correlated information sources. IEEE Trans. Inform. Theory, IT-19:471–480, July 1973.
[9] A. Vosughi and A. Scaglione. Precoding and decoding paradigms for distributed data compression. IEEE Transactions on Signal Processing, 2007.
[10] A. D. Wyner and J. Ziv. The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Inform. Theory, IT-22:1–10, Jan 1976.
[11] L. Xiao and S. Boyd. Fast linear iterations for distributed averaging. Systems and Control Letters, 53:65–78, 2004.
[12] L. Xiao, S. Boyd, and S. Kim. Distributed average consensus with least-mean-square deviation. In Mathematical Theory of Networks and Systems Proc., pages 158–167, Kyoto, Japan, July 2006.
[13] M. E. Yildiz and A. Scaglione. Practical coding algorithms for consensus problems with zero asymptotic rate. Submitted to IEEE Trans. on Signal Processing, 2007.
[14] R. Zamir and S. Shamai. Nested linear/lattice codes for Wyner-Ziv encoding. In IEEE Inform. Theory Workshop Proc., pages 158–167, Killarney, Ireland, 1998.

APPENDIX

A. CALCULATION OF E\{\tilde z_i(k-l)\,\tilde z_i(k-m)\}

We already know that E\{\tilde z_i(k-l)\tilde z_i(k-m)\} = [E\{\tilde z(k-l)\tilde z^T(k-m)\}]_{ii}. The covariance among the noisy states at time instants k-l and k-m can be written as:

    E\{\tilde z(k-l)\tilde z^T(k-m)\} = E\{z(k-l) z^T(k-m)\}
        + E\{z(k-l) w^T(k-m)\} + E\{w(k-l) z^T(k-m)\}
        + E\{w(k-l) w^T(k-m)\}.

We also note that the vector z(k) can be written in terms of z(k-1-q) as follows:

    z(k) = W z(k-1) + \epsilon A w(k-1)
         = W^{q+1} z(k-1-q) + \epsilon \sum_{j=0}^{q} W^j A\, w(k-1-j).    (49)

We will focus on the first term of the covariance equation. Suppose l \le m; by eq. (49), the correlation between the states can be expressed as:

    E\{z(k-l) z^T(k-m)\} = E\{z(k-m+(m-l))\, z^T(k-m)\}
        = W^{m-l} E\{z(k-m) z^T(k-m)\} + \epsilon \sum_{j=0}^{m-l-1} W^j A\, E\{w(k-l-1-j)\, z^T(k-m)\}
        = W^{m-l} E\{z(k-m) z^T(k-m)\}.    (50)

We note that each element of the summation term is zero, since the vector z(k-m) is independent of future noise vectors and both the state and noise vectors have zero mean. In the case of l > m, the state correlation can be written as:

    E\{z(k-l) z^T(k-m)\} = E\{z(k-l) z^T(k-l)\}\,(W^{l-m})^T.

Now we will focus on the second term. Suppose l < m. Then,

    E\{z(k-l) w^T(k-m)\} = E\{z(k-m+(m-l))\, w^T(k-m)\}
        = W^{m-l} E\{z(k-m) w^T(k-m)\} + \epsilon \sum_{j=0}^{m-l-1} W^j A\, E\{w(k-l-1-j)\, w^T(k-m)\}
        = \epsilon W^{m-l-1} A\, E\{w(k-m) w^T(k-m)\}.

Here we use the facts that z(k-m) is independent of the noise w(k-m) and that the noise vectors at different time instants are independent. In the case of l \ge m,

    E\{z(k-l) w^T(k-m)\} = 0.

In the same way, the third term is given as

    E\{w(k-l) z^T(k-m)\} = 0

if l \le m. In the case of l > m,

    E\{w(k-l) z^T(k-m)\} = \epsilon\, E\{w(k-l) w^T(k-l)\}\,(W^{l-m-1} A)^T.

The last term, which is the covariance of the error terms, is zero unless l = m, since the noises at different time instants are independent. Summarizing the formulas above in three different cases:

If l = m:
    E\{\tilde z(k-l)\tilde z^T(k-m)\} = E\{z(k-m) z^T(k-m)\} + E\{w(k-m) w^T(k-m)\}.

If l < m:
    E\{\tilde z(k-l)\tilde z^T(k-m)\} = W^{m-l} E\{z(k-m) z^T(k-m)\} + \epsilon W^{m-l-1} A\, E\{w(k-m) w^T(k-m)\}.

If l > m:
    E\{\tilde z(k-l)\tilde z^T(k-m)\} = E\{z(k-l) z^T(k-l)\}\,(W^{l-m})^T + \epsilon\, E\{w(k-l) w^T(k-l)\}\,(W^{l-m-1} A)^T.

Moreover, E\{\tilde z_i(k-l)\tilde z_i(k-m)\} can be found by taking the ii entry of the E\{\tilde z(k-l)\tilde z^T(k-m)\} matrix.

B. CALCULATION OF E\{z_i(k)\,\tilde z_i(k-m)\}

In this section, we calculate the correlation between the state vector at time k and the noisy state vector at time k-m. We know that E\{z_i(k)\tilde z_i(k-m)\} = [E\{z(k)\tilde z^T(k-m)\}]_{ii}. Then,

    E\{z(k)\tilde z^T(k-m)\} = E\{z(k)\,(z^T(k-m) + w^T(k-m))\}
        = W^m E\{z(k-m) z^T(k-m)\} + \epsilon W^{m-1} A\, E\{w(k-m) w^T(k-m)\}.

In the above derivation, we use the equations in Appendix A by setting l = 0 and m \ge 1.
