[31] Prakash Ishwar, Rohit Puri, S. Sandeep Pradhan, and Kannan Ramchandran. On rate-constrained estimation in unreliable sensor networks. In Proc. of the ...
ACCEPTED FOR PUBLICATION IN THE IEEE TRANSACTIONS ON COMMUNICATIONS, JULY 2005.

Scalable Decoding on Factor Trees: A Practical Solution for Wireless Sensor Networks

João Barros and Michael Tüchler
Abstract—We consider the problem of jointly decoding the correlated data picked up and transmitted by the nodes of a large-scale sensor network. Assuming that each sensor node uses a very simple encoder (a scalar quantizer and a modulator), we focus on decoding algorithms that exploit the correlation structure of the sensor data to produce the best possible estimates under the minimum mean square error (MMSE) criterion. Our analysis shows that a standard implementation of the optimal MMSE decoder is unfeasible for large-scale sensor networks, because its complexity grows exponentially with the number of nodes in the network. Seeking a scalable alternative, we use factor graphs to obtain a simplified model for the correlation structure of the sensor data. This model allows us to use the sum-product decoding algorithm, whose complexity can be made to grow linearly with the size of the network. Considering large sensor networks with arbitrary topologies, we focus on factor trees and give an exact characterization of the decoding complexity, as well as mathematical tools for factorizing Gaussian sources and optimization algorithms for finding optimal factor trees under the Kullback-Leibler criterion.

Index Terms—sensor networks, trees, complexity theory, quantization, MAP estimation
I. INTRODUCTION

J. Barros is with the Department of Computer Science of the University of Porto, Porto, Portugal. URL: http://www.dcc.fc.up.pt/~barros/. M. Tüchler is with the Center of Microelectronics Aargau, University of Applied Sciences of NorthWest Switzerland, Aargau, Switzerland. This work was conducted while the authors were with the Institute for Communications Engineering of the Technische Universität München, München, Germany. Parts of it have been presented at the 2004 IEEE International Conference on Communications [3], and the 2004 International Symposium on Inf. Theory and Applications [29].

Consider a large-scale sensor network in which hundreds of sensor nodes pick up samples from a physical process in a field, encode their observations, and transmit the data back to a remote location over an array of reachback channels. The task of the decoder at the remote location is then to produce the best possible estimates of the data sent by all the nodes. In [2], Barros and Servetto show that knowledge available at the receiver on the correlation between the measurements of different sensors can in general
be exploited to improve the decoding result and thus increase the reachback capacity of the network. This principle holds even when the sensor nodes themselves are not capable of eliminating the redundancy in the data prior to transmission. To fulfill this data compression task, each node would have to use complex Slepian-Wolf source codes, which might be impractical for large-scale sensor networks. In that case, the decoder can still take advantage of the remaining correlation to produce a more accurate estimate of the sent information. To study this scenario, we consider a reachback communications model in which the system complexity is shifted from the sensor nodes to the receiver, i.e., a reachback network with very simple encoders (e.g., a scalar quantizer and a modulator) and a decoder of increased yet manageable complexity. Our goal is then to devise a practical decoding algorithm for this instance of the sensor reachback problem. Our main contributions are as follows. First, we argue that a standard implementation of the optimal decoder based on minimum mean square error (MMSE) estimation is unfeasible for large-scale sensor networks, because its complexity grows exponentially with the number of sensors. To guarantee the scalability of the decoding algorithm, we propose to use factor graphs [20] that model the correlation between the sensor signals in a flexible way depending on the targeted decoding complexity and the desired reconstruction fidelity. The applied decoding algorithm is the sum-product (SP) algorithm [20]. We are able to show that by choosing the factor graph in an appropriate way we can make the overall decoding complexity grow linearly with the number of nodes. 
We also provide evidence that cycle-free factor graphs, so-called factor trees, are particularly well suited for large-scale sensor networks with arbitrary topology, because (a) they guarantee that the SP algorithm is MMSE-optimal and thus the fidelity depends on the approximation only, (b) they yield low-complexity solutions, and (c) they allow the use of well-established minimum weight spanning tree algorithms. Using the Kullback-Leibler distance (KLD) as a measure of the fidelity of the approximated correlation model, we give a detailed mathematical treatment of multivariate Gaussian sources and a set of optimization algorithms for factor trees with various degree constraints. Finally, we present numerical results that underline the performance and scalability of the proposed approach. It turns out that under reasonable assumptions on the spatial correlation of the sensor data, the performance of our decoder is very close to the optimal MMSE solution. Our results suggest a useful relationship between the KLD and the MMSE, which still requires proof.

We note that the idea of exploiting the remaining correlation in the source encoded data to enhance the decoding result was already presented in Shannon's landmark paper [27]. This principle was put effectively into practice in [13], triggering many contributions that exploit the redundancy left by suboptimal quantizers in combination with convolutional codes or turbo codes and powerful iterative decoding schemes [14]. More recently, this approach has also been successfully implemented using low-density parity-check codes (see [19] and references therein). In the context of sensor networks, early work by Slepian and Wolf [5] has inspired several contributions on the construction of distributed source codes (e.g. [30], [6], and [7]), which remove the redundancy in the data prior to transmission. Quantizers operating along these lines were proposed in [9], [10] and [11]. Joint source-channel codes for correlated sources and noisy channels were considered in [8]. Other related lines of research are focused on multi-sensor data fusion [32], [33] and rate-constrained encodings of sensor data with a given correlation structure (see e.g. [31] and references therein).

The rest of the paper is organized as follows. Sec. II sets the stage for the main decoding problem by describing the system setup and elaborating on the drawbacks of the optimal decoder. Then, Sec. III describes our approach based on factor graphs and iterative decoding. A key contribution is the set of optimization tools presented in Sec. IV. The paper concludes with some numerical results in Sec. V and some comments in Sec. VI.

II. PROBLEM SETUP
A. System Model

The basic system model that accompanies us throughout this paper is illustrated in Fig. 1. We begin with a brief explanation of our notation and a precise description of the source model, the encoding procedure, and the reachback channel.

Fig. 1. System model of a sensor network.

Notation: In the following, vectors are always considered to be column vectors and are denoted with small bold letters. Matrices are denoted with capital bold letters, unless otherwise noted. The expression 0_N is the length-N all-zero column vector, I_N is the N × N identity matrix, and |A| is the determinant of A. The covariance is defined by Cov{a, b} = E{a b^T} − E{a} E{b}^T, where E{·} is the expectation operator. An N-dimensional random variable with realizations a ∈ R^N is Gaussian distributed with mean µ = E{a} and covariance matrix Σ = Cov{a, a}, when its probability density function (PDF) p(a) is given by

  p(a) = \exp(-\tfrac{1}{2}(a - \mu)^T \Sigma^{-1} (a - \mu)) / ((2\pi)^N |\Sigma|)^{1/2}.   (1)

Such a PDF is denoted as N(µ, Σ).

Source Model: Each sensor k observes at time t continuous real-valued data samples u_k(t), with k = 1, 2, ..., M. For simplicity, we assume that the M sensor nodes are placed randomly on the unit square and consider only the spatial correlation of measurements and not their temporal dependence. Thus, we drop the time variable t and consider only one time step. However, it is worth pointing out that the discussed techniques can be easily extended to account for sources with memory. The sample vector u = (u_1 u_2 ... u_M)^T at any given time t is assumed to be one realization of an M-dimensional Gaussian random variable, whose PDF p(u) is given by N(0_M, R) with

  R = \begin{pmatrix} 1 & \rho_{1,2} & \cdots & \rho_{1,M} \\ \rho_{2,1} & 1 & \cdots & \rho_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{M,1} & \rho_{M,2} & \cdots & 1 \end{pmatrix}.

It follows that the samples u_k are distributed with N(0, 1). Gaussian models for capturing the spatial correlation between sensors at different locations are discussed in [26], whereas reasonable models for the correlation coefficients ρ_{i,j} of physical processes unfolding in a field can be found in [12]. In the following, we assume that the sensors are randomly placed in a unit square according to a uniform distribution and that the correlation ρ_{i,j} between sensor i and j decays exponentially with their Euclidean distance d_{i,j}, i.e., ρ_{i,j} = exp(−β · d_{i,j}), where β is a positive constant. Notice that this correlation
structure, which we deem to be a reasonable abstraction of the physical measurements picked up locally by a number of scattered sensors, is only one of many source models for which our algorithms apply.

Encoding: We assume that the sensors are "cheap" devices consisting of a scalar quantizer, a bit mapper, and a modulator¹. Each sensor k quantizes u_k to the index i_k ∈ L = {1, 2, ..., 2^Q}, representing Q bits, i.e., there are 2^Q reconstruction values ũ(i_k) ∈ R. The modulator maps i_k to a tuple x_k of channel symbols, which are transmitted to the remote receiver. In our examples we use binary phase shift keying (BPSK), such that in a discrete-time baseband description of our transmission scheme i_k is mapped to Q symbols x_k = (x_{k,1} ... x_{k,Q}) from the alphabet {+1, −1}.

Reachback Channel: Since in many applications sensors must transmit some data to the central receiver simultaneously, reservation-based medium access protocols such as TDMA or FDMA are a reasonable choice for this type of reachback network, as argued in [2]. Thus, we assume that the reachback channel is virtually interference-free, i.e., the joint PDF p(y_1, ..., y_M | x_1, ..., x_M) factors into \prod_{k=1}^{M} p(y_k | x_k). In addition, we model the reachback channel as an array of additive white Gaussian noise channels with noise variance σ², i.e., the channel outputs are given by y_k = x_k + n_k after demodulation, where n_k is distributed with N(0_Q, σ² I_Q).

B. Optimal Decoding

The decoder uses the channel output vector y = (y_1 y_2 ... y_M)^T and the available knowledge of the source correlation R to produce estimates û_k of the measurements u_k. Assuming that the mean square error (MSE) E{(û_k − ũ(i_k))²} between the estimate û_k and the source representation ũ(i_k), corresponding to the transmitted quantization index i_k, is the fidelity criterion to be minimized by the decoder, the conditional mean estimator (CME) [25] should be applied:

  \hat{u}_k = E\{\tilde{u}(i_k) \mid y\} = \sum_{\forall i \in L} \tilde{u}(i) \cdot p(i_k = i \mid y).   (2)

The required posterior probabilities p(i_k | y) are given by

  p(i_k = i \mid y) = \gamma \cdot \sum_{\forall i \in L^M : i_k = i} p(y \mid i) \, p(i),   (3)

where i = (i_1 i_2 ... i_M)^T and γ = 1/p(y) is a constant normalizing the sum over the product of probabilities to one. Since the AWGN channels are independent, the PDF p(y|i) factors into \prod_{k=1}^{M} p(y_k | i_k), where each p(y_k | i_k) is a Gaussian distribution given by N(x_k(i_k), σ² I_Q). The probability mass function (PMF) p(i) of the index vector i can be obtained by numerically integrating the source PDF p(u) over the quantization region indexed by i. Alternatively, one can resort to Monte Carlo simulations in order to estimate p(i), a task which needs to be carried out only once and can therefore be performed offline.

The computational complexity of the decoding process is determined by the number of additions and multiplications required to compute the estimates û_k for all k. The most demanding decoding operation is the marginalization of the indices i_k in p(i) · \prod_{k=1}^{M} p(y_k | i_k), denoted

  m_k(i) = \sum_{\forall i \in L^M : i_k = i} p(i) \cdot \prod_{l=1}^{M} p(y_l \mid i_l).   (4)

Although the calculation of the PDF p(y_k | i_k) and the estimate û_k = γ · \sum_{\forall i \in L} ũ(i) · m_k(i) for all k requires a number of additions and multiplications that is linear in M, the marginalization in (4) requires 2^{Q(M−1)} − 1 additions and M 2^{QM} multiplications per index k.

Problem Statement: We conclude that a straightforward computation of the MMSE-optimal decoder is unfeasible for networks with a large number of sensors M, the calculation in (4) being the major bottleneck: its computational complexity grows exponentially² with M. Our goal is thus to find a scalable decoding algorithm yielding the best possible trade-off between complexity and estimation error.
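To make the exponential cost of (2)-(4) concrete, the whole chain can be sketched for a toy network: simulate the Gaussian source of Sec. II-A, quantize and BPSK-modulate, estimate p(i) by Monte Carlo as suggested above, and then perform the exhaustive marginalization. This is a hedged illustration only (NumPy assumed; the parameter values M, Q, β, σ and the uniform quantizer are our own stand-ins, not the authors' implementation):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
M, Q, beta, sigma = 3, 1, 1.0, 0.5          # hypothetical toy parameters
L = 2**Q

# Sensors on the unit square; rho_ij = exp(-beta * d_ij) as in Sec. II-A
pos = rng.uniform(size=(M, 2))
d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
R = np.exp(-beta * d)

# Scalar quantizer (uniform stand-in for a PDF-optimized one) and
# rough reconstruction values u~(i) for the two cells (Q = 1)
edges = np.linspace(-2, 2, L + 1)[1:-1]
u_rec = np.array([-0.8, 0.8])

def quantize(u):
    return np.digitize(u, edges)             # index in {0, ..., L-1}

# Monte Carlo estimate of the joint index PMF p(i) (done once, offline)
samples = rng.multivariate_normal(np.zeros(M), R, size=50_000)
p_i = np.zeros((L,) * M)
for idx in map(tuple, quantize(samples)):
    p_i[idx] += 1
p_i /= p_i.sum()

# One transmission: quantize, map to BPSK (Q = 1), add AWGN
u = rng.multivariate_normal(np.zeros(M), R)
i_tx = quantize(u)
x = 1.0 - 2.0 * i_tx                         # index 0 -> +1, index 1 -> -1
y = x + sigma * rng.normal(size=M)

# Exhaustive marginalization (4) and CME (2): O(L^M) terms per index k
lik = np.exp(-(y[:, None] - (1.0 - 2.0 * np.arange(L)))**2 / (2 * sigma**2))
u_hat = np.zeros(M)
for k in range(M):
    m_k = np.zeros(L)
    for i_vec in itertools.product(range(L), repeat=M):
        m_k[i_vec[k]] += p_i[i_vec] * np.prod(lik[np.arange(M), i_vec])
    u_hat[k] = np.dot(u_rec, m_k) / m_k.sum()
```

For M = 3 the inner loop enumerates only L^M = 8 index vectors per marginal; at M = 100 the same loop would need 2^100 terms, which is exactly the scalability problem addressed in the next section.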
III. SCALABLE DECODING USING FACTOR GRAPHS
In this section we propose a scalable decoding solution based on factor graphs and the sum-product (SP) algorithm [20], in which the computational complexity of the decoding algorithm is restricted using the following two-step approach: First, an approximate model of the dependencies between the samples uk is defined yielding a factor graph that is suitable for decoding. Second, the SP algorithm is performed on the factor graph defined
Notice that for PDF-optimized quantizers this estimator also minimizes the MSE E{(û_k − u_k)²} between the estimate û_k and the originally observed value u_k [16].

¹This model for the encoder may seem too simple, yet it allows us to focus on the essential aspects of the problem and highlight the key features of our decoding algorithm. The latter can be easily extended to include, for example, more sophisticated channel coding.

²Notice that although this is certainly true for a straightforward implementation of the decoding algorithm, it remains to be seen whether a more efficient implementation exists.
in the first step, delivering the desired data estimates with complexity growing linearly with M, the number of sensors in the reachback network.

A. Factor Graphs and the Sum-Product Algorithm

A factor graph depicts a function of typically many variables, which factors with respect to a suitable operation such as multiplication. There are two types of nodes in a factor graph: variable nodes representing the variables and function nodes representing the factors. The dependencies between variables and factors are indicated by edges connecting some of the nodes. The degree of a node is the number of incident edges. The function that needs to be factorized in our decoding problem is p(i) · \prod_{k=1}^{M} p(y_k | i_k) contained in (4). The corresponding factor graph, illustrated in Fig. 2 for M = 9 sensors, consists of M variable nodes for each i_k, and M function nodes for each p(y_k | i_k), as well as one degree-M function node for p(i).

Fig. 2. Factor graph of the function p(i) · \prod_{k=1}^{M} p(y_k | i_k) for a sensor network consisting of 9 sensors (numbered circles). We have nine variable nodes for each index i_k (circles), nine function nodes for each factor p(y_k | i_k) (empty boxes), and one function node for the factor p(i) (filled box).

The marginals m_k(i) in (4) can be computed by running the SP algorithm [20] on the factor graph in Fig. 2, which lets the nodes pass "messages" to their neighbors along the edges of the graph. As long as the factor graph is cycle-free, the SP algorithm yields the correct marginals m_k(i) for the M variable nodes, and we know that the estimation error is due to the approximation of the correlation structure only. Otherwise, it becomes iterative (the messages circulate forever) and, in general, the marginals m_k(i) cannot be computed exactly, which possibly leads to errors beyond those imposed by the chosen approximation.

Cycle-free factor graphs also allow us to determine the exact number of additions and multiplications required by the SP algorithm directly from the node degrees [1]. For our factor graph, in which the variables are drawn from the size-2^Q alphabet L, a degree-d variable node requires d(d−2) 2^Q multiplications to compute d outgoing messages (a message consists of 2^Q values, d−2 multiplications per value). A degree-d function node requires d(d−1) 2^{Qd} multiplications and d(2^{Q(d−1)} − 1) additions to compute d outgoing messages (d summations over 2^{Q(d−1)} − 1 values, d−1 multiplications per value). These complexity counts hold for graphs with cycles as well, but the number of operations scales with the number of iterations performed during message passing. Thus, to bound the complexity in this case, we must bound the number of iterations. Running the SP algorithm on the factor graph in Fig. 2, which yields the exact marginals m_k(i), requires M 2^Q multiplications in the M variable nodes for i_k and M(2^{Q(M−1)} − 1) additions and M(M−1) 2^{QM} multiplications in the function node for p(i). Combining these numbers yields the same count as that below equation (4).

B. Scalable Decoding on Factor Trees

This complexity count can decrease tremendously if p(i) factors into functions with small numbers of arguments (indices i_k) yielding function nodes with small degree. There are general ways to factorize p(i) such as the chain rule, e.g. p(i) = p(i_1) p(i_2 | i_1) ... p(i_M | i_1, ..., i_{M−1}). However, some factors in this factorization still evidence a large degree (up to degree M) and the factor graph contains cycles, so that the SP algorithm cannot be exact. To overcome this drawback, we propose factorizations yielding fully connected cycle-free factor graphs, which we name factor trees. Running the SP algorithm on a factor tree yields the correct marginals m_k(i). Moreover, we can restrict the connectivity of the factor tree by limiting the function nodes to have a prescribed degree. For example, if we factorize according to p(i) = g_1(i_1) g_2(i_2, i_1) ... g_M(i_M, i_{M−1}) for some functions g_k(·), we get a chain-like factor tree, whose function nodes have a degree of at most 2. Obviously, in most cases the PMF p(i) derived from p(u) will not have a structure leading to such a factorization. Consequently, we must seek an approximate source distribution p̂(u) that does lead to a PMF p̂(i) with the desired properties. In this paper, we consider factorizations of p̂(i) into N functions g_k(·), k = 1, 2, ..., N, where the function node degree, i.e., the number of arguments of g_k(·), is at most 1, 2, or 3. Fig. 3 depicts possible factor graphs with these constraints on g_k(·) for our example with M = 9 sensors. Running the SP algorithm on the degree-1 factor graph corresponds to scalar decoding, where no information
Fig. 3. Factor graphs of the function p̂(i) · \prod_{k=1}^{M} p(y_k | i_k) corresponding to a sensor network with M = 9 sensors, where p̂(i) = \prod_{k=1}^{N} g_k(·) is given by \prod_{k=1}^{9} g_k(i_k) (left plot, degree-1 function nodes), g(i_1, i_6) g(i_2, i_1) g(i_3, i_9) g(i_4, i_7) g(i_5, i_4) g(i_6, i_4) g(i_7, i_8) g(i_9, i_5) (middle plot, degree-2 function nodes), or g(i_1, i_2) g(i_3, i_9) g(i_1, i_4, i_6) g(i_4, i_5, i_9) g(i_4, i_7, i_8) (right plot, degree-2 and -3 function nodes).

about the correlations between sensors is taken into consideration. In this case, we have N = M. To obtain the approximate marginals m̂_k(i) = p(y_k | i_k = i) g_k(i_k = i), no operations are required on the function nodes for g_k(i_k) and only M 2^Q multiplications must be performed in the M variable nodes for i_k. Running the SP algorithm on the degree-2 or degree-3 factor graph requires 2 · 2^{2Q} multiplications and 2(2^Q − 1) additions per degree-2 function node and 6 · 2^{3Q} multiplications and 3(2^{2Q} − 1) additions per degree-3 function node. In addition, some multiplications are required in the variable nodes. The following lemma shows how many function nodes N are required to yield a factor tree:

Lemma 1: A factor graph consisting of M variable nodes and N function nodes with degree d_k, k = 1, 2, ..., N, can be a fully connected cycle-free factor graph, i.e., a factor tree, if and only if

  M + N − 1 = \sum_{k=1}^{N} d_k.

Proof: The lemma follows from elementary graph theory (see e.g. [28]).

The SP algorithm running on a factor tree that corresponds to the approximate index distribution p̂(i) is a candidate for our desired scalable decoding algorithm, whose complexity increases linearly with the number of sensors M. There are numerous other factorizations of p̂(i) yielding different complexity counts, e.g., by increasing the admissible degree of the function nodes, by clustering the variable nodes, or by allowing factor graphs with cycles. The latter yields a very large class of factor graphs, which admit an iterative SP algorithm [3], [21]. Interestingly enough, as will be shown in Sec. V, the performance of the scalable decoder based on factor trees with degree-2 or degree-3 function nodes is already very close to that of the optimal decoder. Moreover, it is possible to construct an optimal set of arguments of the functions g_k(·) for these rather simple factor graphs, as we will prove in the next section.

Remark: Since each quantization index i_k depends on a unique source symbol u_k, any factorization of the approximate probability distribution of the indices p̂(i) corresponds uniquely to a factorization of p̂(u) and vice-versa. Consequently, the optimal arguments of the functions g_k(·) of the factorization for the indices p̂(i) = \prod_{k=1}^{N} g_k(·) can be easily obtained from the optimal functions f_k(·) of the corresponding factorization for the source symbols p̂(u) = \prod_{k=1}^{N} f_k(·). Thus, to obtain the approximate source distribution p̂(u), we choose the functions g_k(·) whose arguments correspond uniquely to the arguments of the functions f_k(·). For example, from p̂(i) = p(i_1) p(i_2 | i_1) p(i_3 | i_1) we get p̂(u) = p(u_1) p(u_2 | u_1) p(u_3 | u_1).

IV. MODEL OPTIMIZATION

A. Optimization Criterion
The performance of the scalable decoding algorithm proposed in the previous section naturally depends on how well p̂(u) approximates p(u). A useful distance measure to determine how close p̂(u) is to p(u) is the Kullback-Leibler distance (KLD) measured in bit [4, Sec. 9.5]:

  D(p(u) \,\|\, p̂(u)) = \int p(u) \log_2 \frac{p(u)}{p̂(u)} \, du.   (5)

Our motivation for using the KLD as the optimization criterion comes from previous work on fixed-rate vector quantization with mismatched codebooks [23, Sec. 6], where it is shown that if the quantizer is optimized for a model probability distribution p̂(u) instead of the true source distribution p(u), the resulting excess quadratic distortion in decibels is proportional to the KLD D(p(u) || p̂(u)). Although this result does not apply directly to our reachback system, since in our case the source coding is done by an array of scalar quantizers and not by a vector quantizer, the vector quantizer approach does correspond to the case of full cooperation between the sensors, i.e. when every node knows all the source realizations observed by all the nodes. Therefore, the performance of a vector quantizer processing u can be viewed as an upper bound to the fidelity achieved by our coding scheme, and for this upper bound we know from [23] that the loss in MSE is proportional to the KLD between p̂(u) and p(u). Indeed, our numerical results (discussed in detail in Sec. V) sustain a similar useful connection between the KLD for p(u) and p̂(u) and the average MSE of our decoder. A mathematical proof that quantifies this relationship is a non-trivial matter due to the nonlinearity of the array of scalar quantizers; it remains a challenging problem for future work.
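For zero-mean Gaussians, the integral in (5) need not be evaluated numerically: the Gaussian KLD has a standard closed form. A small sketch (NumPy assumed; the helper name is ours), which also matches the scalar-decoder case of (7), where p̂(u) = N(0, I) costs −(1/2) log₂|R| bit:

```python
import numpy as np

def kld_gauss_bits(R, R_hat):
    """D(p || p_hat) in bits between N(0, R) and N(0, R_hat),
    via the standard closed form of the Gaussian KLD."""
    M = R.shape[0]
    nats = 0.5 * (np.trace(np.linalg.solve(R_hat, R)) - M
                  + np.log(np.linalg.det(R_hat) / np.linalg.det(R)))
    return nats / np.log(2)

R = np.array([[1.0, 0.5],
              [0.5, 1.0]])

# Sanity checks: zero distance to itself; the identity approximation
# (scalar decoding) costs exactly -0.5*log2|R| bits, consistent with (7)
assert abs(kld_gauss_bits(R, R)) < 1e-12
assert abs(kld_gauss_bits(R, np.eye(2)) + 0.5 * np.log2(np.linalg.det(R))) < 1e-12
```

Note that the trace term reduces to M whenever tr(R R̂⁻¹) = M, which is exactly property 3) of Lemma 3 below for CCRE-induced approximations.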
B. Constrained Chain Rule Expansions

Recall that our goal is to minimize the KLD D(p(u) || p̂(u)) subject to the constraints imposed on the functions f_k(·) of the factorization p̂(u) = \prod_{k=1}^{N} f_k(·). In this section, we introduce a few mathematical tools that are useful for this task. Consider the chain rule expansion p(u_1) p(u_2 | u_1) p(u_3, u_4 | u_1, u_2) p(u_5 | u_1, ..., u_4) of the source distribution p(u) = p(u_1, u_2, ..., u_5), which consists of N = 4 factors. A constrained chain rule expansion (CCRE) is obtained by taking the factors of the chain rule expansion and removing some of the conditioning variables, thus yielding an approximate PDF of p(u_1, ..., u_5). For example, a CCRE of p(u_1, ..., u_5) with at most one conditioning variable is given by p(u_1) p(u_2 | u_1) p(u_3, u_4 | u_2) p(u_5 | u_4). The CCRE concept can be formalized as follows:

Definition 1: Consider a PDF p̂(u), which factors into N PDFs p(a_k | b_k) according to

  p̂(u) = \prod_{k=1}^{N} p(a_k \mid b_k),   (6)

where a_k and b_k are subsets of the elements in u. This PDF is a constrained chain rule expansion (CCRE) of the source distribution p(u), if the following constraints are met:
1) All pairs of subsets a_k and a_l, k ≠ l, are disjoint: a_k ∩ a_l = ∅.
2) The elements in b_k are connected: b_k ⊆ \bigcup_{l=1}^{k−1} a_l.
3) All elements u_k of u are connected: \bigcup_{k=1}^{N} a_k = u.

Thus, the set b_1 is always empty. A special case is the usual chain rule expansion, where b_k = \bigcup_{l=1}^{k−1} a_l holds. In our example, the CCRE p(u_1) p(u_2 | u_1) p(u_3, u_4 | u_2) p(u_5 | u_4) of the PDF p(u_1, ..., u_5) is specified by a_1 = {u_1}, b_1 = ∅, a_2 = {u_2}, b_2 = {u_1}, a_3 = {u_3, u_4}, b_3 = {u_2}, a_4 = {u_5}, and b_4 = {u_4}. The next definition introduces another useful property:

Definition 2: The CCRE p̂(u) = \prod_{k=1}^{N} p(a_k | b_k) is said to be symmetric, if any b_k, k = 2, 3, ..., N, is a subset of (a_l, b_l) for some l < k.

The CCRE p(u_1) p(u_2 | u_1) p(u_3, u_4 | u_2) p(u_5 | u_4) of the PDF p(u_1, ..., u_5) is symmetric, because the properties b_2 ⊂ a_1, b_3 ⊂ a_2, and b_4 ⊂ a_3 hold. The CCRE p(u_1, u_2) p(u_3, u_4 | u_2) p(u_5 | u_4, u_1) of p(u_1, ..., u_5) is not symmetric, since b_3 = {u_1, u_4} is not contained in {a_1, b_1} = {u_1, u_2} or {a_2, b_2} = {u_2, u_3, u_4}. It turns out that symmetric CCREs of the source distribution p(u) yield the factor graphs of interest in this paper, i.e., the factor trees specified in Sec. III-B. Consider the following lemma:

Lemma 2: If a CCRE p̂(u) = \prod_{k=1}^{N} p(a_k | b_k) for the source distribution p(u) has at most one conditioning variable in every factor, i.e., all b_k are either empty or contain a single element, then (1) the CCRE is symmetric, and (2) the factor graph corresponding to p̂(u) is a tree.

Proof: See the appendix.

From this lemma it follows, for example, that a CCRE p̂(u) = \prod_{k=1}^{N} p(a_k | b_k) in which both the a_k and the b_k consist of single elements yields a factor tree where all function nodes have degree 2. Due to the chosen source model, we are particularly interested in CCREs of multivariate Gaussian distributions. Next, we present a useful lemma, which requires the following definition: let P be an M × M indicator matrix, whose entry in the l-th row and l'-th column is 1 if both u_l and u_{l'} are contained in one of the N factors p(a_k | b_k) and 0 otherwise. For example, for the CCRE p(u_1) p(u_2 | u_1) p(u_3, u_4 | u_2) p(u_5 | u_4) of p(u_1, ..., u_5) we find

  P = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}.
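The constraints of Definitions 1 and 2 are mechanical enough to check in code. A small sketch (the function names and the set-based encoding of the a_k, b_k are ours) that validates both running examples:

```python
def is_ccre(factors, universe):
    """Check Definition 1 for a factorization given as a list of
    (a_k, b_k) pairs of variable-index sets."""
    seen = set()
    for a, b in factors:
        if a & seen:                # 1) the a_k must be pairwise disjoint
            return False
        if not b <= seen:           # 2) b_k must be drawn from a_1, ..., a_{k-1}
            return False
        seen |= a
    return seen == universe         # 3) every element of u must be covered

def is_symmetric(factors):
    """Check Definition 2: each b_k (k >= 2) lies in some earlier (a_l, b_l)."""
    return all(any(b <= (factors[l][0] | factors[l][1]) for l in range(k))
               for k, (a, b) in enumerate(factors) if k > 0)

# CCRE p(u1)p(u2|u1)p(u3,u4|u2)p(u5|u4): valid and symmetric
ccre = [({1}, set()), ({2}, {1}), ({3, 4}, {2}), ({5}, {4})]
assert is_ccre(ccre, {1, 2, 3, 4, 5}) and is_symmetric(ccre)

# p(u1,u2)p(u3,u4|u2)p(u5|u4,u1): a CCRE, but not symmetric (b_3 = {u1, u4})
ccre2 = [({1, 2}, set()), ({3, 4}, {2}), ({5}, {4, 1})]
assert is_ccre(ccre2, {1, 2, 3, 4, 5}) and not is_symmetric(ccre2)
```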
The lemma can now be stated as follows.

Lemma 3: Let p̂(u) = \prod_{k=1}^{N} p(a_k | b_k) be a CCRE of a Gaussian PDF p(u) given by N(0_M, R). The following holds:
1) The PDF p̂(u) is a zero-mean Gaussian PDF with covariance matrix R̂, i.e., it is given by N(0_M, R̂).
2) The entries of R̂^{−1} are zero for all zero-positions in P.
3) The trace of R R̂^{−1} equals M, i.e., tr(R R̂^{−1}) = M.
4) If the CCRE is symmetric, then the entries of R̂ are equal to those in R for all one-positions in P.

Proof: See the appendix.

Based on Lemma 3, we can prove the following connection between symmetric CCREs and the KLD-optimal functions f_k(·) of the factorization p̂(u) = \prod_{k=1}^{N} f_k(·), which minimize the KLD D(p(u) || p̂(u)):

Theorem 1: Consider the Gaussian source distribution p(u) given by N(0_M, R) and the PDF p̂(u) = \prod_{k=1}^{N} f_k(u_k), which factors into N functions f_k(u_k) with subsets u_k of u as argument. If the latter factorization admits a symmetric CCRE, i.e., all u_k can be split into pairs {a_k, b_k} satisfying the constraints in Definitions 1 and 2, then the KLD-optimal functions f_k(u_k) minimizing the KLD D(p(u) || p̂(u)) are equal to the Gaussian PDFs p(a_k | b_k) = p(a_k, b_k)/p(b_k), and the corresponding minimal KLD is given by

  D(p(u) \,\|\, p̂(u)) = -\frac{1}{2} \log_2 |R| + \frac{1}{2} \sum_{k=1}^{N} \log_2 \frac{|R_{a_k,b_k}|}{|R_{b_k}|},   (7)

where R_{a_k,b_k} and R_{b_k} are the covariance matrices of the zero-mean Gaussian PDFs p(a_k, b_k) and p(b_k), respectively.

Proof: See the appendix.

This theorem considerably simplifies our search for KLD-optimal approximate source distributions p̂(u) = \prod_{k=1}^{N} f_k(u_k) that yield factor trees with function nodes of degree at most 1, 2, or 3, by allowing us to restrict our attention to the set of symmetric CCREs and determine step by step the factor arguments a_k and b_k that minimize the KLD. Moreover, it follows from (7) that each factor p(a_k | b_k) of p̂(u) reduces the KLD D(p(u) || p̂(u)) by the amount \log_2 |R_{a_k,b_k}|/|R_{b_k}|, which is strictly negative, because in general |R_{a_k,b_k}| < |R_{b_k}| holds.

C. Optimization Algorithms

In the previous section, we proved for Gaussian sources that symmetric CCREs of the source distribution p(u) yield the KLD-optimal functions f_k(·) of the factorization p̂(u) = \prod_{k=1}^{N} f_k(u_k), provided that the arguments u_k admit a symmetric CCRE. We also showed that factorizations yielding a factor tree with function nodes of degree 1, 2, or 3 always admit symmetric CCREs. Nevertheless, there exist many factor trees that connect the M variable nodes for each sensor in the network, and so the problem becomes finding the factor tree for which the underlying symmetric CCRE p̂(u) = \prod_{k=1}^{N} p(a_k | b_k) yields the smallest KLD D(p(u) || p̂(u)).

Let l_a and l_b denote the allowed maximal number of elements in the sets a_k and b_k of the CCRE p̂(u) = \prod_{k=1}^{N} p(a_k | b_k), respectively. Recall that the algorithmic complexity of scalable decoding based on sum-product decoding on a factor tree grows exponentially with the degree d_f = l_a + l_b of the function nodes, which is why we consider factor trees with d_f ≤ 3 only, as specified in Sec. III-B. Besides the trivial scalar decoder corresponding to the symmetric CCRE p̂(u) = \prod_{k=1}^{N} p(u_k), i.e., (l_a, l_b) = (1, 0), we consider decoders based on the choice (l_a, l_b) = (1, 1) or (l_a, l_b) = (2, 1), which are based on factor trees with function node degrees of at most 2 or 3. From Lemma 2 it follows that symmetric CCREs generate such factor trees when l_b = 1, i.e. when the factors p(a_k | b_k) of the CCRE contain only a single conditioning variable. Next, we provide optimization algorithms for these two classes of scalable decoders.

1) Factor Tree with Degree-2 Function Nodes: A symmetric CCRE p̂(u) = \prod_{k=1}^{N} p(a_k | b_k) yields a degree-2 factor tree if (l_a, l_b) = (1, 1), i.e. N = M − 1, as stated in Lemma 1. Starting with the trivial factorization p̂(u) = \prod_{k=0}^{M−1} p(u_{r_k}), where {r_0, ..., r_{M−1}} is a permutation of the index set {1, ..., M}, admissible CCREs are constructed by adding conditioning variables to M − 1 of these factors, i.e.,

  p̂(u) = p(u_{r_0}) \prod_{k=1}^{M-1} p(u_{r_k} \mid u_{s_k}),

where s_1 is necessarily equal to r_0 and all other s_k are chosen from the set {1, ..., M}. Combining the PDFs p(u_{r_0}) = p(u_{s_1}) and p(u_{r_1} | u_{s_1}) yields the CCRE

  p̂(u) = p(u_{r_1}, u_{s_1}) \prod_{k=2}^{M-1} p(u_{r_k} \mid u_{s_k}),

consisting of N = M − 1 factors with two arguments. The corresponding index factorization p̂(i) = \prod_{k=1}^{N} g_k(·) is given by

  p̂(i) = p(i_{r_1}, i_{s_1}) \prod_{k=2}^{M-1} p(i_{r_k} \mid i_{s_k}).

The calculation of the KLD D(p(u) || p̂(u)) via (7) requires the local covariance matrices

  R_{u_{r_k}, u_{s_k}} = \begin{pmatrix} 1 & \rho_{r_k,s_k} \\ \rho_{r_k,s_k} & 1 \end{pmatrix} \quad \text{and} \quad R_{u_{s_k}} = 1,   (8)
which follow from the entries ρk,k0 of the covariance matrix R of the source distribution p(u), such that
2) Degree-3 Factor Trees: The optimization procedure for the previous case turned out to be relatively simple, because degree-2 factor trees can be interpreted as classical graphs and we could exploit well-established graph-theoretic techniques. Unfortunately, this is not true for degree-3 factor trees, forcing us to seek an alternative solution. In analogy with the previous case, we begin by rewriting (6) specifically for degree-3 factor trees according to
1 1 D(p(u)||ˆ p(u)) = − log 2 |R| + log2 |Rur1 ,s1 | 2 2 M −1 |Rurk ,usk | 1 X + log 2 2 |Rusk | k=2
1 = − log 2 |R| 2 M −1 1 X log 2 (1 − ρ2rk ,sk ). + 2
(M −1)/2
pˆ(u) = p(ur0 )
k=1
p(urk , usk |utk ),
k=2
Notice that a function node connecting the variable nodes irk and isk decreases the KLD by
where a1 = ur0 , ak = [urk−1 usk−1 ] (for k > 1) and bk = urk−1 . In practice, it is not always possible or useful to construct a degree-3 factor tree that consists solely of degree-3 function nodes, however to simplify the explanation we will neglect the additional degree-2 function nodes and assume that (M − 1)/2 is a natural number. Once again, we require the local covariance matrices
1 log 2 (1 − ρ2rk ,sk ), (9) 2 corresponding to the factor p(urk |usk ). We denote the first decrease due to the factor p(ur1 , us1 ) as ∆D0 = 1 2 2 log 2 (1 − ρr1 ,s1 ). The function nodes corresponding to p(ur1 , us1 ) or p(urk |usk ) can be regarded as vertices in a classical graph connecting the (variable) nodes irk and isk , which have the undirected weight 21 log2 (1 − ρ2rk ,sk ). Our optimization task — finding the factor tree arguments ak and bk for the factors p(ak |bk ) yielding a minimal KLD — can thus be formulated as a minimum weight spanning tree problem where the undirected weight of an edge between two nodes irk and isk is given by 12 log 2 (1 − ρ2rk ,sk ). To find this tree, we adapted Prim’s minimum weight spanning tree algorithm [28]. The algorithm finds the optimal tree with a very low complexity. Fig. 4 shows the outcomes of the proposed algorithm for a sensor network with M = 100 nodes using the source model outlined in Sec. II-A. ∆D1|1 =
1
Ruk+1 = Rrk ,sk ,tk = ρrk ,sk ρrk ,tk
ρrk ,sk 1 ρsk ,tk
ρrk ,tk ρsk ,tk 1
(10)
and Rbk+1 = Rsk = 1, where ρrk ,sk = R(rk , sk ) denotes the covariance between urk and usk . Now, we can calculate the KLD using (7), which results in 1 D(p(u)||ˆ p(u)) = − log2 |R| 2 (M −1)/2 1 X |Rrk ,sk ,tk | + log2 2 |Rsk | k=1
1 = − log 2 |R| 2 (M −1)/2 1 X log2 |Rrk ,sk ,tk |. + 2
1 0.9
k=1
0.8
Since the degree-3 factor tree cannot be described as a classical graph, we cannot apply a minimum weight spanning tree algorithm. Closer inspection reveals that our optimization problem is equivalent to the problem of finding the minimum spanning hypertree in a hypergraph, which is known to be NP-hard in general [17]. Thus, we propose a suboptimal greedy algorithm that constructs a degree-3 factor tree based on the optimal degree-2 factor tree: First, we try to replace a pair of degree-2 function nodes with one degree-3 function node that reduces the KLD without changing the original structure of the tree; then, we repeat this procedure over and over again until it is no longer possible to replace
0.7 0.6 0.5 0.4 0.3 0.2 0.1 0
Y
0
0.2
0.4
0.6
0.8
1
Fig. 4. Degree-2 factor trees for M sensors placed randomly on the unit square, according to the source model described in Sec. II-A.
8
∆D2|1 − ∆D2×1|1
1 = log2 2
1 + 2 · ρrk ,sk · ρrk ,tk · ρsk ,tk − ρ2rk ,sk − ρ2rk ,tk − ρ2sk ,tk 1 + ρ2rk ,sk · ρ2rk ,tk − ρ2rk ,sk − ρ2rk ,tk
!
(11)
ing the factors p(usk |urk ) and p(urk |utk ) is
any function nodes. Assume, for example, that we want to compute the KLD decrement that results from the substitution of two function nodes connecting, say, the variable nodes i rk and isk and the variable nodes irk and itk , by a new function node connecting irk , isk and itk .
1 1 log2 |Rsk ,rk | + log 2 |Rrk ,tk | 2 2 1 2 = log2 (1 + ρrk ,sk · ρ2rk ,tk − ρ2rk ,sk − ρ2rk ,tk ) 2 Consequently, the overall reduction in KLD that results from the substitution is given by (11) shown on top of the page. This quantity, which is used by the algorithm to choose the appropriate substitutions, has the property that ρrk ,tk · ρrk ,sk ∆D2|1 − ∆D2×1|1 ≤ 0 if ≥0 ρsk ,tk (12) with equality when ρrk ,tk · ρrk ,sk = ρsk ,tk . It follows that a degree-3 function node always leads to a smaller KLD, except when the variables itk , irk and isk form a Markov chain and ρtk ,rk ρrk ,sk = ρtk ,sk . In this case, the two degree-2 factors translate the connection between the variables in an optimal way [15]. ∆D2×1|1 =
TABLE I A LGORITHM : F IND THE OPTIMIZED DEGREE -3 FACTOR TREE
Initialization Construct the optimal degree-2 factor tree T2 with Algorithm 1 Make a list of all combinations of three variable nodes which are neighbours in T2 |Rs ,r ,t | Calculate ∆D2|1 − ∆D2x1|1 = 21 log2 R k kR k | sk ,rk || rk ,tk | for every list entry sort the list in order of increasing ∆D2|1 − ∆D2x1|1 Function node counter: k ← 0 Main Loop repeat read next row [irk isk itk ∆D2|1 − ∆D2x1|1 ] of list if connection of irk , isk and itk does not form a cycle then k ←k+1 remove the two function nodes connecting irk , isk and i tk connect irk , isk and itk by a new function node with function gk = p(irk , isk |itk ), where itk represents the only conditioning argument in the previous functions end if until end of list
V. N UMERICAL E XAMPLES To evaluate the decoder performance, we measure the output signal-to-noise ratio (SNR) given by kuk2 Output SNR = 10 · log 10 in dB ˆ k2 ku − u versus the channel SNR ES /N0 averaged over a sufficient amount of sample transmissions. We consider two networks with M = 9 or M = 100 sensors. In our implementation, MMSE-optimal decoding can be simulated for the network with 9 sensors, only. Naturally, the results are highly dependent on the chosen source model. As outlined in Sec. II, we assume that the correlation between the sensors ui and uj is given by ρi,j = exp(−β · di,j ). Notice that if we keep increasing the number of sensors in the unit square without altering β , the sensor measurements would become increasingly correlated. Therefore, to obtain a fair result, we set β = 1.05 and β = 4.2 for the simulations with M = 9 sensors and M = 100 sensors, yielding correlation values ρi,j between 0.217 and 0.930 and between 0 and 0.945, respectively. Each sensor node uses a Lloyd-Max quantizer to map uk to ik , which is then transmitted in accordance with the system setup described in Sec. II. The decoder performance for the network with M = 9 sensors from Figs. 2 and 3 is illustrated in Fig. 5 for
In terms of the underlying CCRE, this step is equivalent to replacing the factors p(usk |urk ) and p(urk |utk ) with the factor p(usk , urk |utk ). Denoting the correlations between the sensors as ρrk ,sk , ρrk ,tk and ρsk ,tk , we can write the KLD decrement ∆D2|1 associated with the degree-3 function node representing the factor p(usk , urk , utk ) or p(usk , urk |utk ) as 1 log 2 |Rsk ,rk ,tk | 2 1 = log 2 (1 + 2 · ρrk ,sk · ρrk ,tk · ρsk ,tk 2 −ρ2rk ,sk − ρ2rk ,tk − ρ2sk ,tk )
∆D2|1 =
On the other hand, the KLD decrement ∆D2×1|1 associated with the pair of degree-2 function nodes represent9
1-bit quantization (Q = 1). Clearly, the factor-tree-based decoders (degree-2 and degree-3 tree) are nearly as good as the MMSE-optimal decoder. Note also that there is a direct correspondence between the decoding performance and the KLD. As expected, the scalar decoder loses a lot of performance, since it does not exploit any information about the source correlations. For this choice of source model, the improvement of the degree-3 tree over the degree-2 tree is barely noticeable, due to the fact that the correlation between measurements decays very quickly with the distance between the nodes. Fig. 6 depicts the performance results for the network with M = 100 sensors with multiple quantizers. The KLD-optimal degree-2 factor tree for this network is depicted in Fig. 4. Again, the KLD of the degree-2 tree is nearly as good that of the degree-3 tree, which finds a correspondence in their SNR performance. We recall that for this network size (M > 100) the optimal MMSE decoder is unfeasible.
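The degree-2 tree construction used above (Prim's algorithm on edge weights (1/2) log2(1 - ρ_{i,j}^2)) can be sketched in a few lines. The following minimal example uses hypothetical parameters (30 sensors placed at random, β = 4.2 as in the M = 100 simulations) and a naive O(M^3) Prim scan for clarity; it is a sketch, not the authors' implementation:

```python
import math
import random

def degree2_tree(pos, beta):
    """Prim's algorithm for the minimum weight spanning tree whose edge
    weights are the KLD decrements of (9): 0.5*log2(1 - rho_ij^2)."""
    M = len(pos)

    def weight(i, j):
        rho = math.exp(-beta * math.dist(pos[i], pos[j]))
        return 0.5 * math.log2(1.0 - rho * rho)  # negative: every edge lowers the KLD

    in_tree = {0}
    edges = []
    while len(in_tree) < M:
        # cheapest edge leaving the current tree (naive O(M^2) scan per step)
        i, j = min(((i, j) for i in in_tree for j in range(M) if j not in in_tree),
                   key=lambda e: weight(*e))
        edges.append((i, j))
        in_tree.add(j)
    return edges

random.seed(7)
pos = [(random.random(), random.random()) for _ in range(30)]
edges = degree2_tree(pos, beta=4.2)
print(len(edges))  # a spanning tree on 30 nodes has 29 edges
```

Because every weight is negative, the minimum weight spanning tree greedily accumulates the largest KLD decrements (9) while keeping the factor graph a tree.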
[Fig. 5 here: output SNR in dB versus channel SNR in dB; curves for the CME, SP tree df=3, SP tree df=2, and no a priori decoders; t = 10000 samples, M = 9 sensors, Q = 1.]

Fig. 5. Performance of the MMSE-optimal decoder (CME) and three decoders applying the SP algorithm on the factor graphs in Fig. 3 for a network with M = 9 sensors and Q = 1-bit quantization. The correlation factor between any two sensor measurements varies between ρ = 0.217 and ρ = 0.930. We consider the following cases: (1) scalar decoder (cf. Fig. 3(left), D(p(u)||p̂(u)) = 3.92 bits), (2) optimal degree-2 factor tree (cf. Fig. 3(middle), D(p(u)||p̂(u)) = 0.43 bits), (3) optimized degree-3 factor tree (cf. Fig. 3(right), D(p(u)||p̂(u)) = 0.40 bits), (4) optimal MMSE decoder.

[Fig. 6 here: output SNR in dB versus channel SNR in dB; curves for the SP tree df=2 and df=3 decoders and the no a priori decoder, each with Q = 1, 2, 3; t = 10000 samples, M = 100 sensors.]

Fig. 6. Performance of three decoders based on optimized factor graphs for a network with M = 100 sensors using various quantizers (1, 2, or 3-bit quantization). The correlation factor between any two sensor measurements varies between ρ = 0 and ρ = 0.945. We consider the following cases: (1) scalar decoder (trivial factor graph, D(p(u)||p̂(u)) = 45.37 bits), (2) KLD-optimal degree-2 factor tree (D(p(u)||p̂(u)) = 6.13 bits), (3) optimized degree-3 factor tree (D(p(u)||p̂(u)) = 5.40 bits).

VI. SUMMARY AND CONCLUSIONS

We studied the problem of jointly decoding the correlated measurements picked up by a sensor reachback network. First, we showed that the complexity of the optimal MMSE decoder grows exponentially with the number of nodes in the network, thus motivating the search for scalable solutions offering a trade-off between complexity and end-to-end distortion. Then, we presented a scalable decoding scheme for the sensor reachback problem, which uses a simplified factor graph model of the dependencies between the sensor measurements, such that a sum-product algorithm can produce the required estimates efficiently. Focusing on factor trees, for which we know that the SP algorithm delivers optimal estimates, we introduced the concept of constrained chain rule expansions and provided two optimization algorithms for the Gaussian case. The analysis tools we presented can be equally applied to many other factorization models, yielding decoders with various complexities. Our analyses and simulation results indicate that the proposed approach is well suited for large-scale sensor networks. Natural extensions could include (a) adapting the factor graph to account for sensor nodes that have more complex features, such as entropy coding, channel coding or higher-order modulations, and (b) reducing the complexity further by running linear message updates in the nodes of the factor graph based on a Gaussian approximation of the message distributions [24].
APPENDIX

A) Proof of Lemma 2: The proof of part 1) is straightforward: if b_k consists of at most a single element, this element must be contained in a_l for some l < k, according to Definition 1.

To prove part 2), we start with the CCRE p̂_0(u) = Π_{k=1}^{N} p(a_k), which is derived from p̂(u) by removing all conditions b_k. The factor graph of this CCRE is a tree (more precisely, a forest), since the subsets a_k are pairwise disjoint, again according to Definition 1. The N subtrees corresponding to the factors p(a_k) are connected to a complete tree by adding exactly N-1 extra edges to the graph, such that each edge starts in the function node of a p(a_k). This results from adding the conditions b_k to the factors p(a_k) for all k = 1, ..., N, such that b_1 is empty and all other b_k consist of exactly one element, as stated in the lemma. This construction also serves to prove part 2) of the Theorem.

B) Proof of Lemma 3: Let [R_{a_k}]_K denote the expansion of R_{a_k} to a K × K matrix, where the non-zero entries correspond to the positions of the elements of a_k in u, e.g.,

R_{u_1,u_3} = [ a  b ; c  d ]  →  [R_{u_1,u_3}]_5 = [ a 0 b 0 0 ; 0 0 0 0 0 ; c 0 d 0 0 ; 0 0 0 0 0 ; 0 0 0 0 0 ].

Using this notation, the inverse covariance matrix R̂^{-1} of the PDF p̂(u) = Π_{k=1}^{N} p(a_k|b_k) can be written as

R̂^{-1} = Σ_{k=1}^{N} ( [R_{u_k}^{-1}]_K - [R_{b_k}^{-1}]_K ),   (13)

where u_k = (a_k, b_k), and R_{u_k} and R_{b_k} are the covariance matrices of the zero-mean Gaussian PDFs p(u_k) = p(a_k, b_k) and p(b_k), respectively. This follows from the equivalence p(a_k|b_k) = p(u_k)/p(b_k) and the definition of a Gaussian PDF in (1). It is easy to see that p̂(u) is a zero-mean Gaussian PDF given by N(0_M, R̂) and that the elements of R̂^{-1} are zero at the zero-positions of P, which proves parts 1) and 2) of the lemma. The proof of part 3) follows trivially from [18, Corollary 1.2]; for details see [21].

To prove part 4), assume that the factor p(a_k|b_k) in p̂(u) is replaced by p(u_k)/p(b_k) while 1/p(b_k) cancels with the argument u_l of another factor p(a_l|b_l), l ≠ k, of p̂(u). This is possible for symmetric CCREs, since b_k is contained in (a_l, b_l) for some l < k, which yields

p(a_l|b_l)/p(b_k) = p(u_l)/(p(b_k) p(b_l)) = p(u'_l|b_k)/p(b_l),

where u'_l contains the remaining elements of u_l after taking out all those in b_k. This replacement can be repeated recursively to cancel p(b_l) with p(a_m|b_m) for some m < l, and so forth, until the empty set b_1 is reached. Thus, with a symmetric CCRE it is possible to factor p̂(u) into p(u_k) times a product of PDFs in which all elements of u_k appear only in the conditioning part. The true source distribution p(u) can always be factored into p(u_k) times a PDF where u_k is in the conditioning part, using a suitable chain rule expansion. It follows that the variables in u_k are Gaussian distributed with zero mean and covariance matrix R_{u_k} according to either p̂(u) or p(u), i.e., R̂ and R must have identical entries for all variable pairs (u_l, u_{l'}) in u_k.

C) Proof of Theorem 1: The first step of the proof is to show that the KLD-optimal functions f_k(u_k), and thus the PDF p̂(u), must be Gaussian given that p(u) is zero-mean Gaussian; this is shown in [21]. The second step is to show that the factors f_k(u_k) = p(a_k|b_k) are the KLD-optimal functions. Let S be the set of all positive definite M × M matrices whose entries are equal to those in R at all one-positions of P, whereas the other entries are arbitrary. Let S' be the set of all positive definite M × M matrices whose inverse has zero entries at all zero-positions of P, whereas the other entries are arbitrary. From Theorem 2 in [22], it follows that for any A ∈ S and any B ∈ S' the inequality

D(N(0_M, A)||N(0_M, B̃)) ≤ D(N(0_M, A)||N(0_M, B))

holds, where B̃ is the unique matrix from S' whose entries are equal to those in R at all one-positions of P, i.e., B̃ ∈ S. The covariance matrix R̂ of a PDF p̂(u) constructed from a symmetric CCRE is an element of both S (part 4 of Lemma 3) and S' (part 2 of Lemma 3), i.e., R̂ is equal to B̃. Since the covariance matrix R of the true source distribution is an element of S, it follows that D(p(u)||p̂(u)), given by D(N(0_M, R)||N(0_M, R̂)), is the smallest KLD among all Gaussian PDFs whose covariance matrix is an element of S'. Finally, the elements of S' represent the admissible factorizations Π_{k=1}^{N} f_k(u_k) of p̂(u), i.e., a Gaussian PDF p̂(u) constructed from a symmetric CCRE yields the KLD-optimal factors f_k(u_k), given by p(a_k|b_k).

Since p(u) and p̂(u) are Gaussian, computing the KLD D(p(u)||p̂(u)) simplifies to

D(p(u)||p̂(u)) = (1/2) ( -log2(|R| |R̂^{-1}|) + tr(R R̂^{-1}) - M ) = -(1/2) log2(|R| |R̂^{-1}|),

as shown in [18], [21], where the last equality follows from part 3) of Lemma 3. Applying the factorization (13) to R̂^{-1} yields formula (7) in the theorem. □
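To double-check (13) and (7) numerically, the following self-contained sketch (a hypothetical 4-sensor example with ρ_{i,j} = exp(-d_{i,j}); the positions are ours, not the paper's) assembles R̂^{-1} via (13) for the symmetric CCRE p̂(u) = p(u1, u2) p(u3, u4|u2), and confirms that tr(R R̂^{-1}) = M (part 3 of Lemma 3) and that -(1/2) log2(|R| |R̂^{-1}|) agrees with (7):

```python
import math

def det(A):
    # determinant by Gaussian elimination with partial pivoting
    n, M, d = len(A), [row[:] for row in A], 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return d

def inv(A):
    # inverse by Gauss-Jordan elimination on [A | I]
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def embed(A, idx, K):
    # [A]_K: place the entries of A at positions idx of a K x K zero matrix
    E = [[0.0] * K for _ in range(K)]
    for a, i in enumerate(idx):
        for b, j in enumerate(idx):
            E[i][j] = A[a][b]
    return E

# hypothetical source: 4 sensors on the unit square, rho_ij = exp(-d_ij)
pos = [(0.0, 0.0), (0.3, 0.1), (0.5, 0.6), (0.9, 0.4)]
rho = lambda i, j: math.exp(-math.dist(pos[i], pos[j]))
K = 4
R = [[rho(i, j) if i != j else 1.0 for j in range(K)] for i in range(K)]

sub = lambda idx: [[R[i][j] for j in idx] for i in idx]  # local covariance matrix
# CCRE p^(u) = p(u1,u2) p(u3,u4|u2): u_1 = a_1 = (u1,u2), u_2 = (a_2,b_2) = (u3,u4,u2)
R_u1, R_u2, R_b2 = sub([0, 1]), sub([2, 3, 1]), sub([1])
# Eq. (13): Rhat^-1 = [R_u1^-1]_K + ([R_u2^-1]_K - [R_b2^-1]_K)
Rinv_hat = [[x + y - z for x, y, z in zip(r1, r2, r3)]
            for r1, r2, r3 in zip(embed(inv(R_u1), [0, 1], K),
                                  embed(inv(R_u2), [2, 3, 1], K),
                                  embed(inv(R_b2), [1], K))]

trace = sum(sum(R[i][j] * Rinv_hat[j][i] for j in range(K)) for i in range(K))
D_direct = -0.5 * math.log2(det(R) * det(Rinv_hat))
D_eq7 = -0.5 * math.log2(det(R)) \
        + 0.5 * (math.log2(det(R_u1)) + math.log2(det(R_u2) / det(R_b2)))
print(trace, D_direct, D_eq7)
```

The trace equals M = 4 regardless of the chosen correlations, which is exactly why the trace term drops out of the Gaussian KLD above.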
ACKNOWLEDGEMENTS

The authors most gratefully acknowledge discussions with Seong Per Lee and Christoph Hausl, who also ran the simulations while finishing their master/diploma theses at the Technische Universität München. The first author would also like to thank Sergio D. Servetto and J. Cardinal for their insightful comments.

REFERENCES

[1] S. M. Aji and R. J. McEliece. The generalized distributive law. IEEE Trans. on Inf. Theory, 46(2):325-343, March 2000.
[2] J. Barros and S. D. Servetto. Reachback capacity with non-interfering nodes. In Proc. of the IEEE International Symposium on Information Theory (ISIT 2003), Yokohama, Japan, July 2003.
[3] J. Barros, M. Tüchler, and S. P. Lee. Scalable source/channel decoding for large-scale sensor networks. In Proc. of the IEEE International Conference on Communications (ICC 2004), Paris, France, June 2004.
[4] T. M. Cover and J. Thomas. Elements of Information Theory. John Wiley and Sons, Inc., 1991.
[5] D. Slepian and J. K. Wolf. A coding theorem for multiple access channels with correlated sources. Bell Syst. Tech. J., 52(7):1037-1076, 1973.
[6] V. Stankovic, A. Liveris, Z. Xiong, and C. Georghiades. Design of Slepian-Wolf codes by channel code partitioning. In Proc. of the IEEE Data Compression Conference (DCC), Snowbird, UT, USA, March 2004.
[7] J. Garcia-Frias and Y. Zhao. Compression of correlated binary sources using turbo codes. IEEE Communications Letters, pages 417-419, 2001.
[8] W. Zhong, H. Lou, and J. Garcia-Frias. LDGM codes for joint source-channel coding of correlated sources. In Proc. of the IEEE International Conference on Image Processing, Barcelona, Spain, September 2003.
[9] T. J. Flynn and R. M. Gray. Encoding of correlated observations. IEEE Trans. on Inf. Theory, IT-33:773-787, 1987.
[10] J. Cardinal and G. Van Assche. Joint entropy-constrained multiterminal quantization. In Proc. of the IEEE International Symposium on Information Theory (ISIT 2002), Lausanne, Switzerland, June-July 2002.
[11] G. Maierbacher and J. Barros. Low-complexity coding for the CEO problem with many encoders. In Proc. of the 26th Symposium on Information Theory in the Benelux, Brussels, Belgium, May 2005.
[12] C. R. Dietrich and G. N. Newsam. Fast and exact simulation of stationary Gaussian processes through circulant embedding of the covariance matrix. SIAM Journal on Scientific Computing, 18(4):1088-1107, 1997.
[13] J. Hagenauer. Source-controlled channel decoding. IEEE Trans. on Communications, 43(9):2449-2457, September 1995.
[14] J. Hagenauer, E. Offer, and L. Papke. Iterative decoding of binary block and convolutional codes. IEEE Trans. on Inf. Theory, 42(2):429-445, March 1996.
[15] C. Hausl. Scalable decoding for large-scale sensor networks. Diploma thesis, Lehrstuhl für Nachrichtentechnik, Technische Universität München, München, Germany, April 2004.
[16] N. Jayant and P. Noll. Digital Coding of Waveforms. Prentice Hall, 1984.
[17] D. M. Warme. Spanning Trees in Hypergraphs with Applications to Steiner Trees. PhD thesis, University of Virginia, May 1998.
[18] A. Kavcic and J. Moura. Matrices with banded inverses: Inversion algorithms and factorization of Gauss-Markov processes. IEEE Trans. on Inf. Theory, 46:1495-1509, July 2000.
[19] I. Kozintsev, R. Koetter, and K. Ramchandran. A framework for joint source-channel coding using factor graphs. In Proc. of the 33rd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, October 1999.
[20] F. R. Kschischang, B. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Trans. on Inf. Theory, 47(2):498-519, 2001.
[21] S. P. Lee. Iterative decoding of correlated sensor data. Diploma thesis, Lehrstuhl für Nachrichtentechnik, Technische Universität München, München, Germany, October 2003.
[22] H. Lev-Ari, S. Parker, and T. Kailath. Multidimensional maximum-entropy covariance extension. IEEE Trans. on Inf. Theory, 35:497-508, May 1989.
[23] J. Li, N. Chaddha, and R. Gray. Asymptotic performance of vector quantizers with the perceptual distortion measure. IEEE Trans. on Inf. Theory, 45(4):1082-1091, May 1999.
[24] H. Loeliger. Least squares and Kalman filtering on Forney graphs. In Codes, Graphs, and Systems, R. E. Blahut and R. Koetter, eds., Kluwer, 2002.
[25] H. V. Poor. An Introduction to Signal Detection and Estimation. Springer-Verlag, 1994.
[26] A. Scaglione and S. D. Servetto. On the interdependence of routing and data compression in multi-hop sensor networks. In Proc. of ACM MobiCom, Atlanta, GA, USA, 2002.
[27] C. E. Shannon. A mathematical theory of communication. Bell Syst. Tech. J., 27:379-423 and 623-656, 1948.
[28] K. Thulasiraman and M. N. S. Swamy. Graphs: Theory and Algorithms. John Wiley and Sons, Inc., 1992.
[29] M. Tüchler, J. Barros, and C. Hausl. Joint source-channel decoding on factor trees: A scalable solution for large-scale sensor networks. In Proc. of the 2004 International Symposium on Information Theory and its Applications (ISITA 2004), Parma, Italy, October 2004.
[30] S. Pradhan, J. Kusuma, and K. Ramchandran. Distributed compression in a dense sensor network. IEEE Signal Processing Magazine, March 2002.
[31] P. Ishwar, R. Puri, S. S. Pradhan, and K. Ramchandran. On rate-constrained estimation in unreliable sensor networks. In Proc. of the Second International Workshop on Information Processing in Sensor Networks (IPSN), Palo Alto, CA, USA, April 2003.
[32] D. L. Hall and J. Llinas. An introduction to multisensor data fusion. Proceedings of the IEEE, 85(1):6-23, 1997.
[33] H. F. Durrant-Whyte, M. Stevens, and E. Nettleton. Data fusion in decentralised sensing networks. In Proc. of Fusion 2001, pp. 302-307, Montreal, Canada, July 2001.