IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 54, NO. 8, AUGUST 2008
Group Testing for Binary Markov Sources: Data-Driven Group Queries for Cooperative Sensor Networks Yao-Win Peter Hong, Member, IEEE, and Anna Scaglione, Member, IEEE
Abstract—Group testing has been used in many applications to efficiently identify rare events in a large population. In this paper, the concept of group testing is generalized to applications with correlated source models to derive scheduling policies for sensors adopting cooperative transmissions. The tenet of our work is that in a wireless sensor network it is advantageous to allocate the same channel dimensions to all sensor sources that have the same response to a sequence of queries or tests. That is, nodes that have the same data attributes should transmit as a cooperative supersource. Specifically, we consider the case where the sensors' data are modeled spatially as a one-dimensional Markov chain. Two strategies are considered: the recursive algorithm and the tree-based algorithm. The recursive scheme allows us to illustrate the performance of group testing for finite populations, while the tree-based algorithm is used to derive the achievable scaling performance of the class of group testing strategies as the number of sensors increases. We show that the total number of queries required to gather all sensors' data scales in the order of the joint entropy. A further generalization of this concept provides the basis for deriving efficient data-gathering algorithms for correlated sources.
Index Terms—Cooperative communications, data gathering in sensor networks, distributed source coding, group testing, multiple access.
I. INTRODUCTION
Group testing was first introduced by Dorfman [1] to improve the efficiency of blood tests on a large population of blood samples. The idea is to pool together multiple blood samples and test them simultaneously instead of performing the test separately on each sample. If the collective outcome of the group test is positive, we gain the knowledge that at least one blood sample in the group is infected and additional tests must be performed on smaller subgroups to single out the infected samples. On the other hand, if the test is negative, we know that all samples in the group are clear of infection and the states of all these samples are identified with only one test. Consequently, the expected number of tests required to classify a large population of blood samples is significantly reduced if the event of an infection is rare.
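The saving from pooling can be illustrated with a small numerical sketch. It assumes the simplest two-stage variant (one pooled test, followed by individual tests of every sample in a positive pool) rather than the repeated subgrouping described above, and it assumes independent samples with infection probability `p`. The function names and parameters are hypothetical and chosen only for this illustration.

```python
def dorfman_tests_per_item(p: float, k: int) -> float:
    """Expected tests per sample for two-stage pooling with pool size k.

    One pooled test is always spent (1/k per sample). With probability
    1 - (1 - p)**k the pool is positive and all k samples are retested.
    Assumes independent samples, which is NOT the correlated model of this paper.
    """
    if k < 2:
        raise ValueError("pool size must be at least 2")
    return 1.0 / k + 1.0 - (1.0 - p) ** k


def best_pool_size(p: float, max_k: int = 100) -> int:
    """Pool size in [2, max_k] that minimizes the expected tests per sample."""
    return min(range(2, max_k + 1), key=lambda k: dorfman_tests_per_item(p, k))


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.2, 0.5):
        k = best_pool_size(p)
        print(p, k, round(dorfman_tests_per_item(p, k), 4))
```

When infections are rare the expected cost per sample falls well below one test, whereas for `p = 0.5` pooling costs more than testing each sample individually.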
Manuscript received December 4, 2006; revised December 19, 2007. This work was supported in part by the National Science Council (Taiwan) under Grant NSC-95-2221-E-007-043-MY3 and the National Science Foundation under Grant CCF-0514243. Y.-W. P. Hong is with the Institute of Communications Engineering, National Tsing Hua University, 30013 Hsinchu, Taiwan, R.O.C. (e-mail: ywhong@ee.nthu.edu.tw). A. Scaglione is with the Department of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853 USA (e-mail:
[email protected]). Communicated by A. Høst-Madsen, Associate Editor for Detection and Estimation. Digital Object Identifier 10.1109/TIT.2008.926363
The (group) query and (group) response (Q&R) architecture of the group testing methodology serves as the basis of numerous data-dependent retrieval or classification strategies. Specifically, it has been applied to areas such as DNA library screening [2], random access in computer networks [3], [4], image compression [5], and industrial quality control [6]. The Q&R strategies proposed for these applications exploit the knowledge of the data statistics as well as the ability to test groups of data in unison to reduce the cost of data retrieval. Despite the large number of applications, these works consider solely the case where the states of the tested items can be modeled as independent and identically distributed (i.i.d.) Bernoulli random variables, and numerous strategies have been derived for this specific source model.
In this paper, we consider the Q&R data retrieval structure for sensor networks and adopt the group testing methodologies to derive transmission scheduling policies for correlated sensors. The key observation that leads to our proposed strategy is that, in dense sensor networks, closely located sensors often contain the same data to transmit due to the high spatial correlation of the measurements. Therefore, it would be inefficient to allocate an individual channel to each sensor, since the local data is redundant and/or contains a quantity of information that may be much less than the capacity of the channel. In fact, under the Q&R architecture, sensors that contain the same data should be queried simultaneously and then respond cooperatively through a single channel use.
Following the ideas of classical group testing, we devise data-dependent scheduling policies for the cooperative transmission of highly redundant sensors. However, our strategy differs from the classical schemes in two aspects: 1) we assume a correlated data model among sensors as opposed to the i.i.d.
Bernoulli data model considered in the literature; and 2) we exploit the sensors’ ability to respond to different types of queries. (In blood testing applications, the blood samples are passively tested objects and the series of queries are of the same type that can only test the existence of an infected sample in the group.) Our main contribution is to extend the group testing methodology beyond the i.i.d. Bernoulli model and to derive Q&R strategies for the retrieval of correlated data in sensor networks. This work differs from most data gathering methods proposed in the literature where the sensors are queried individually and their responses are sent through separate channels [7], [8]. Specifically, we consider the case where the sensors’ data are modeled as a sequence of spatially Markov random variables. As in the conventional i.i.d. Bernoulli case, there is no general
0018-9448/$25.00 © 2008 IEEE
HONG AND SCAGLIONE: DATA-DRIVEN GROUP QUERIES FOR COOPERATIVE SENSOR NETWORKS
approach to obtain the optimal group testing strategy (i.e., one that requires the minimum number of tests) and, thus, two suboptimal algorithms, i.e., the recursive algorithm and the tree-based algorithm, are proposed and analyzed in this paper. The analysis of these two strategies is sufficient to show the effectiveness of collaborative group transmissions of correlated data. The former scheme derives the queries through a set of recursive equations that are shown to achieve near-optimal performance for systems with a finite number of sensors. The dependence of the performance on the data statistics cannot be derived explicitly in this scheme but can be obtained from the analysis of the tree-based algorithm. More specifically, by analyzing the tree-based algorithm, we obtain an upper bound on the performance of the optimal group testing strategy with respect to the parameters of the stochastic data model. Let $N$ be the number of sensors in the network. As $N$ increases, we show that, for the special cases of interest, the expected number of queries scales in the same order as the entropy.
Preliminary results of our work were presented in [9]–[14]. We think that the idea of allocating a cooperative source and/or channel code to the response of a group query is a new concept. We consider this paper as a starting point to develop more general techniques based on this tenet. Since only the noiseless channel is considered in this work, the results are more akin to the cooperative source coding framework. Cases that consider noise on the transmission link are beyond the scope of this paper, but possible approaches are discussed in Section VII and [15].
The paper is organized as follows. In Sections II and III, we describe the binary Markov model and the Q&R strategy that we use to retrieve the sensors' data. We propose and analyze two algorithms, the recursive scheme and the tree-based scheme, in Sections IV and V, respectively.
In Section VII, a generalized formulation of group testing is given and the relations to the information-theoretic literature are discussed. This work has implications for many fields of research, which we summarize in Section VIII. Finally, we conclude in Section IX.
II. SYSTEM MODEL
A. Binary Markov Source Model
Consider a network of $N$ sensors, indexed by $i = 1, \ldots, N$, and a random vector $\mathbf{X} = [X_1, \ldots, X_N]$ that represents the sensors' data, i.e., $X_i$ is the data at sensor $i$. To consider correlation among sensors, we model $\mathbf{X}$ as a first-order shift-invariant Markov sequence with $X_i \in \{0, 1\}$ for all $i$. That is, the probability of $\mathbf{X}$ can be expressed as
$$\Pr(\mathbf{X} = \mathbf{x}) = \Pr(X_1 = x_1) \prod_{i=2}^{N} \Pr(X_i = x_i \mid X_{i-1} = x_{i-1}).$$
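The Markov factorization of the joint probability can be sketched numerically. The parameter names `alpha` (probability of moving from 0 to 1) and `beta` (probability of moving from 1 to 0) are assumptions made for this illustration, and the first sensor is assumed to follow the stationary distribution of the chain.

```python
from itertools import product


def stationary_p1(alpha: float, beta: float) -> float:
    """Stationary probability of the bit 1 for a two-state chain.

    alpha = Pr(next = 1 | current = 0), beta = Pr(next = 0 | current = 1);
    both names are hypothetical labels used only in this sketch.
    """
    return alpha / (alpha + beta)


def joint_probability(x, alpha: float, beta: float) -> float:
    """Pr(X = x) as the first-sensor term times one transition term per neighbor pair."""
    p1 = stationary_p1(alpha, beta)
    prob = p1 if x[0] == 1 else 1.0 - p1
    for prev, cur in zip(x, x[1:]):
        if prev == 0:
            prob *= alpha if cur == 1 else 1.0 - alpha
        else:
            prob *= beta if cur == 0 else 1.0 - beta
    return prob


if __name__ == "__main__":
    alpha, beta = 0.1, 0.3
    total = sum(joint_probability(x, alpha, beta) for x in product((0, 1), repeat=4))
    print("sum over all length-4 sequences:", total)
    print("Pr(0000) =", joint_probability((0, 0, 0, 0), alpha, beta))
```

Long runs of identical bits carry most of the probability mass when both transition probabilities are small, which is the property the group queries exploit.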
Let $\alpha = \Pr(X_{i+1} = 1 \mid X_i = 0)$ and $\beta = \Pr(X_{i+1} = 0 \mid X_i = 1)$ be the transition probabilities as illustrated in the state diagram shown in Fig. 1(a). In sensor network applications, the Markov sequence may be used to model approximately the data observed by a line of sensors that quantize a continuous random field with 1 bit at each local sensor, e.g., Fig. 1(b). When the number of sensors increases in a fixed interval, the transition probabilities decrease, which increases the correlation among neighboring sensors.

Fig. 1. Illustration of the binary Markov data model. (a) The two-state Markov chain. (b) The line network.

We further assume that the process is stationary¹ with
$$\Pr(X_i = 1) = \frac{\alpha}{\alpha + \beta} \triangleq p \qquad (1)$$
and
$$\Pr(X_i = 0) = \frac{\beta}{\alpha + \beta} = 1 - p. \qquad (2)$$
The correlation coefficient between adjacent nodes is given by
$$\rho = \frac{\mathrm{Cov}(X_i, X_{i+1})}{\sigma^2} = 1 - \alpha - \beta \qquad (3)$$
where $\sigma^2 = p(1-p)$ is the variance of $X_i$ and $\mathrm{Cov}(X_i, X_{i+1})$ is the covariance between $X_i$ and $X_{i+1}$. We note that each pair $(p, \rho)$ uniquely specifies a pair of transition probabilities $(\alpha, \beta)$. In this work, we consider the cases where $\rho$ takes on values within the interval $[0, 1)$. When $\rho = 0$, the problem reduces to the case of i.i.d. Bernoulli random variables, which was the model considered in most classical group testing problems [4], [6].

¹Under the stationary assumption, the Markov chain becomes "time reversible" such that $\Pr(X_i = a \mid X_{i+1} = b) = \Pr(X_{i+1} = a \mid X_i = b)$ for all $a, b \in \{0, 1\}$.

B. Query-and-Response (Q&R) Communication Architecture
To retrieve the sensors' data, we adopt the Q&R communication architecture where the sensors' transmissions are triggered by the query from the data gathering node. The sensors are treated as a distributed database that records the natural phenomenon occurring at each local point. To efficiently retrieve the sensors' data, each query imposed by the data gathering node may involve multiple sensors, which respond simultaneously through cooperative transmissions. The coordination among the distributed sensors that is needed to achieve cooperative transmissions is provided by the queries. These cooperative transmissions are easily synchronized since they are triggered simultaneously by the query.
Borrowing from the concept of group testing and considering the distributed source model in Section II-A, we choose in each query a group of sensors that are likely to contain the
same binary data. For example, when the sensors in a group are likely to contain the bit , the receiver polls the sensors in along with a query asking whether or not the sensors actually contain the bit . This query is denoted by (where in this example). In fact, this is a generalization to the classical group testing, e.g., blood testing, where the queries are restricted to asking only one question “Is the blood sample infected?”. In our case, the two and can be used arbitrarily and selected questions with respect to the data statistics. A generalization of the queries are discussed in Section VII and in [13], [15]. Suppose that each sensor is equipped with a simple transmitter that responds only with an on–off keying signal. Specif, sensor transically, as the response to the query , if and only if and (i.e., a mits a pulse, pulse is emitted in protest of the wrong guess imposed by the data gathering node); otherwise, the sensor remains silent. Let us partition the group into two distinct subgroups and such that . The signal arriving at the receiver is denoted by (4)
is the pulse emitted when the guess was incorrect, where is the channel response between sensor and the data gathis the additive white Gaussian noise. ering node, and Let us also consider a simple receiver structure where the receiver only detects for the existence of a pulse, i.e., the receiver decides between the two hypotheses
for (5) Let represent the collective response to the query, which takes if is detected and , otherwise. The on the value optimum pulse design and the optimum maximum a posteriori probability (MAP) receiver for this binary test can be derived through standard approaches [16], [17] based on the available knowledge of the channel state at either the transmitter or the receiver. Even though more complicated receivers can be used to acquire more information from each query, such as to estimate the number of pulses embedded in the aggregate response, the increase of the alphabet size of makes the system more susceptible to noise and, thus, may not necessarily result in a more efficient data gathering method. In addition, to further improve the reliability of the responses, beamforming gains can also be utilized with group transmissions by adopting the cooperative time-reversal communications proposed in [18], [19]. While it is important to discuss the optimal physical layer design and/or the cooperative channel coding problem, we do not concentrate on these issues in this paper. Instead, we focus on optimizing the Q&R procedure to schedule the channel access for correlated sources while considering only the simple physical layer model proposed above to support the Q&R methodology.
To focus on the efficiency of the methodology, we assume that the response from the sensors are noiseless.2 In the absence of noise, the sensors determine the existence of a pulse with , the receiver knows that all the sensors no error. When specified in the query contain the same bit and, therefore, has resolved the set using only one query. On the other hand, when , the receiver knows that there exists at least one sensor in the group that does not possess the bit but no information is given on the specific identity or the total number of these sensors. In this case, smaller subsets of the group must be queried in subsequent time slots in order to identify the sensors possessing the opposite message. By choosing appropriately the sequence of queries, one can eventually resolve the entire set of sensors’ data . III. QUERY-AND-RESPONSE (Q&R) STRATEGIES Based on the data model and transceiver assumptions given above, our goal is to design Q&R strategies that reduce the total number of queries needed to obtain a lossless reconstruction of at the data gathering node. To achieve this task, the group and the question chosen for each query must depend on the statistics of the data, which is assumed to be known at the data gathering node. Since the sensors’ observations are correlated, the data not yet retrieved can be inferred by the data gathered via the previous queries. Hence, the statistics of the data are updated dynamically as more information is collected from the sensors. Suppose that is the total number of queries needed to reconstruct the data . Let be the sequence be the seof queries and let is the response corresponding quence of responses, where . Each response gives way to forto the th query, i.e., mulate the subsequent queries in an adaptive fashion. 
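The noiseless query-and-response mechanism can be sketched in a few lines. The sketch assumes a collective response of 1 when at least one queried sensor protests the guessed bit and 0 when the group stays silent; this encoding and the names `group_response` and `resolved_by_query` are hypothetical choices for illustration.

```python
def group_response(x, group, b):
    """Collective noiseless response to the query "do all sensors in group hold bit b?".

    x     : list of sensor bits (ground truth, unknown to the data gathering node)
    group : iterable of sensor indices polled by this query
    b     : the guessed bit (0 or 1)
    Returns 1 if at least one polled sensor protests (holds the other bit), else 0.
    """
    return int(any(x[i] != b for i in group))


def resolved_by_query(x, group, b):
    """A silent group is fully resolved: every polled sensor must hold bit b."""
    return group_response(x, group, b) == 0


if __name__ == "__main__":
    x = [0, 0, 1, 0]
    print(group_response(x, range(0, 2), 0))  # sensors 0 and 1 both hold 0
    print(group_response(x, range(0, 4), 0))  # sensor 2 protests
    print(resolved_by_query(x, [2], 1))       # single-sensor query with the right guess
```

A protest reveals only that some sensor in the group disagrees with the guess, so the receiver must follow up with queries on smaller subsets, as described above.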
Initially, may depend only on the prior distribution the query but, as more data is gathered from the sensors, later queries can be generated by exploiting the information provided by the precan be formulated vious responses, e.g., the th query . By based on the conditional distribution saying that is the total number of queries needed to reconstruct , we mean that the data gathering node is able to identify the value of with probability after queries. That is, after ob, we have taining the responses if is the actual value of . Notice that always exists and is upper-bounded by , which is obtained by querying the sensors one-by-one. In the absence of noise, can be viewed and, as a binary data representation that uniquely identifies is lower-bounded by the entropy thus, the expected length [20], i.e., of (6) Conventionally, the communication of the sensors’ data are achieved with a separation between the source and channel coding, in which case the communication cost scales linearly with the number of sensors (since each sensor must communicate through a separate channel). In the following, we show 2When noise is present one must optimize the queries to take into consideration the statistics of the noise. The treatment of noise is beyond the scope of this paper and it is the subject of our future work. Preliminary studies can be found in [13], [15].
that, with proper design of data-dependent group queries, the total number of channel uses needed to gather the sensors' data scales with the entropy of the data instead of the number of sensors in the network. However, as in classical group testing, there is no tractable method for obtaining the optimal set of queries to minimize the number of channel uses. Nonetheless, the advantage of the Q&R architecture can be illustrated through the analysis of suboptimal algorithms. Specifically, we study the performance of two suboptimal methods: 1) the optimized recursive algorithm, where the optimal set of sensors to be queried in each time slot is selected by solving a set of recursive equations, and 2) the tree-splitting algorithm, where the groups of sensors are chosen through a series of binary partitions of the network. These methods follow the same approach as those used to analyze the performance of conventional group testing problems in [4], [21]. However, the analysis and the intuitions that follow differ from those given in the literature. The first scheme allows us to illustrate the effectiveness of this strategy with a finite number of sensors, while the second provides an analytical study of the scaling of the performance as $N$ increases. More importantly, we note that, with the Q&R communication architecture, the system requires only extremely simple sensor transmissions and no complex operations are performed at the sensors.
Remark 1: It is worthwhile to note that the Q&R communication architecture is applicable to the multihop scenario as well. Suppose we have a multihop network where each link has a fixed capacity of 1 bit and that there exists a data-gathering tree with the data gathering node as its root. To apply the Q&R strategy, a query is first sent through multihop broadcasting to the sensors and the responses from the sensors are then sent through the reverse paths to the data gathering node.
When only a binary response is required, each intermediate sensor on the multihop path can aggregate the responses from its downstream sensors³ (instead of routing the data directly) by performing the OR operation on the received symbols along with the local response. That is, an intermediate node responds with the bit 1 if and only if either itself or one of its downstream sensors responds with a 1 in protest to the guess imposed by the query. The cost of responding to each query is 1 bit per node and the delay of retrieving the response to the query is bounded by the longest path length.

³Downstream sensors refer to those that transmit data through the sensor of interest en route to the data-gathering node.

IV. OPTIMIZED RECURSIVE GROUP TESTING ALGORITHM
As described in the previous sections, the total number of queries can be reduced if we select a group of sensors that are likely to contain the same data in each query. Under the binary Markov model, this event is most likely to occur among contiguous sensors if the transition probabilities are sufficiently small. Therefore, in this algorithm, we reduce the complexity of the search for optimal queries by imposing two restrictions on the group selection. First, we only allow groups of contiguous sensors, i.e., sensors with consecutive labels, to be queried simultaneously. Second, a node must be included in the current query if it has the smallest index among the set of unresolved sensors.⁴ The method is suboptimal due to the restrictions on the group selection, but the computational complexity is reduced since we now only need to determine the number of contiguous sensors in each group (i.e., the cardinality of the group for each query) and the best question to ask in each query. Our search for the best group under these restrictions is reduced from a set of maximum size $2^N$ (i.e., the power set of the set of sensors) to a set of maximum size $N$. The best query under these group selection constraints is obtained by solving a set of recursive equations that are derived below. The derivation follows similar approaches as in [4].
Consider the minimum number of queries needed to resolve the data $\mathbf{X}$ using the optimized recursive strategy. Our goal is to find the expected value of this number, along with the Q&R strategy that achieves this expected value. To initialize the querying process, we start by gathering data from the first sensor in the network, which is achieved by allocating a query to sensor 1 alone and by asking one of the two questions. If we know from the initial query that $X_1 = b$, where $b \in \{0, 1\}$, then the expected number of queries that are still needed to resolve the remaining sensors' data is a conditional expectation given $X_1 = b$. Since each value of $X_1$ occurs with its stationary probability, we can derive the expected number of queries under the optimized recursive scheme as
(7) where (8) is the expected number of queries needed to gather data given . Notice that is invariant to the index due to the spatial homogeneity of the Markov chain, i.e.,
for all positive integers . in (7), we must first solve for To obtain . Specifically, after acquiring the knowledge that from the initial query, we go on to choose the number of sensors and the question to ask in the next query. Suppose , we choose to query the next sensors, i.e., sensors to and to ask the question . If all sensors contain the bit (i.e., ), no sensor will respond to the query and the data-gathering node gains knowledge that , that is, the data are resolved. The expected number of queries needed to resolve the remaining sensors’ data becomes
4It is worthwhile to note that the Markov chain is “time-reversible” under the stationary assumption and, thus, the query can be initiated from both directions.
which follows from the Markov property. On the other hand, if there exists a sensor that does not contain (but contains instead, where is the complement of ), the sensor will respond in protest to the question. In this case, we are not yet , but we gain knowledge able to resolve the data such that (s.t.) of the fact that there exists . Therefore, the expected number of remaining queries needed becomes s.t . Hence, we have
the query, then we know that there exists such that and the expected number of queries remaining becomes s.t.
If we instead choose and that all sensors contain the , the expected number of bit , i.e., queries remaining becomes
If a sensor responds in protest to the query, then we know that contain the at least one sensor among the set bit and the expected number of queries remaining becomes where is the -dimensional vector with all- entries. Hence, the number of sensors and the question is chosen to minimize the number of queries needed to resolve the remaining sensors’ data. More generally
s.t.
(9)
for , be the expected number of queries needed to gather given that and that there the data such that . With (9), we exists as can also express more generally
(11) which is the expected number of queries needed to gather the given that: i) , ii) there exists data such that , and iii) there exsuch that . Notice that ists by definition. Similarly, and are chosen to minimize the expected number of remaining can be solved as (12), shown queries. Therefore, at the bottom of the page, where s.t.
(10) . where To solve for in (9), let us start by querying data from the next sensors with the question . Since we know that and that there exists such that , therefore, if we choose and that all sensors indeed contain the bit (i.e., ), the expected number of queries remaining will become
s.t.
Please note that, for , it is only meaningful to have since we know that there exists such that . On the other hand, if a sensor responds in protest to
Given that i) , ii) there exists such that , and iii) there exists such , we can also derive following similar that procedures as before. Suppose we choose to query the next sensors and ask the question . If (which is meaningful ) and that all sensors indeed contain the only when bit , then the expected number of remaining queries becomes
s.t.
If a sensor responds in protest to the question, the expected number of remaining queries becomes
s.t.
s.t.
for for (12)
On the other hand, if (which is meaningful only when ) and that all sensors indeed contain the bit , then the expected number of queries remaining becomes
s.t.
If a sensor responds in protest to the query, the expected number of queries remaining becomes
s.t.
Hence, page, where
s.t.
can be given by (13) at the bottom of the
Fig. 2. For $N = 36$ and parameter values from 0.15 to 0.9, we show the performance of the optimized recursive algorithm (solid line) and the entropy lower bound of (6) (dashed line).
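An entropy lower bound of the kind plotted as the dashed line can be evaluated in closed form for a stationary binary Markov chain. The sketch below is a hedged illustration rather than the authors' code. It assumes the chain is parametrized by the probability `p` of a 1 and the adjacent correlation coefficient `rho`, so that the transition probabilities are `p * (1 - rho)` from 0 to 1 and `(1 - p) * (1 - rho)` from 1 to 0. The function names are hypothetical.

```python
import math


def binary_entropy(q: float) -> float:
    """Entropy in bits of a Bernoulli(q) variable."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def markov_joint_entropy(n: int, p: float, rho: float) -> float:
    """Joint entropy H(X_1, ..., X_n) in bits of a stationary two-state chain.

    Uses the chain rule: H(X_1) + (n - 1) * H(X_2 | X_1).
    Assumed parametrization (not taken from the source):
      alpha = Pr(1 | 0) = p * (1 - rho), beta = Pr(0 | 1) = (1 - p) * (1 - rho).
    """
    alpha = p * (1.0 - rho)
    beta = (1.0 - p) * (1.0 - rho)
    conditional = (1.0 - p) * binary_entropy(alpha) + p * binary_entropy(beta)
    return binary_entropy(p) + (n - 1) * conditional


if __name__ == "__main__":
    for rho in (0.15, 0.45, 0.9):
        print(rho, round(markov_joint_entropy(36, 0.5, rho), 3))
```

With `rho = 0` the expression reduces to `n` times the binary entropy of `p`, the i.i.d. Bernoulli case, and it shrinks as the correlation grows.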
The functions in (10), (12), and (13) can be evaluated recursively from their initial conditions.
In Fig. 2, we show the performance of the optimized recursive algorithm for a network of $N = 36$ sensors and for various values of the model parameters. The dashed lines represent the entropy lower bound of (6) for each case and the solid lines represent the performance of the proposed algorithm. We observe that the proposed querying strategy closely approximates the optimal performance (i.e., the entropy lower bound that can be achieved asymptotically with Huffman coding). More importantly, the expected number of channel accesses varies with the entropy of the data, as opposed to consuming a fixed number of channel accesses proportional to the number of sensors, which is the case for user-oriented multiple-access channel (MAC) protocols such as conventional time-division multiple-access (TDMA) schemes. The advantage of group queries is most pronounced when the probability of a 1 is close to 0 or 1 and when the correlation coefficient is close to 1, i.e., the cases where the sensors' data have low aggregate entropy.
With the optimized recursive strategy, we show that group testing over correlated sources yields performance comparable to that of the optimal centralized compression scheme, such as Huffman coding, for a finite number of sources (since it is shown to closely approximate the entropy lower bound, cf. Fig. 2). However, the recursive formula does not show explicitly the relation between the number of queries and the model parameters, which govern the entropy of the data. In the following section, we exploit the simple structure of the tree-splitting
algorithm to obtain explicitly the asymptotic relation between the performance of group testing and the entropy of the data.
V. BINARY TREE-SPLITTING ALGORITHM
In this section, we consider a specific group selection strategy that chooses the queried groups through a binary tree-splitting algorithm. This has also been used in the context of collision resolution in [21]. Specifically, in this strategy, we initially split the network into two subgroups of equal size and query each group individually. If the group is resolved through the query, then no further action is needed on the sensors in that group; however, if the group is not resolved through the query, we again partition the group into two groups of equal size and query them in the subsequent time slots. The process continues until all the sensors are resolved. For simplicity of our illustration, we consider the case where $N = 2^K$ for some positive integer $K$. In practice, the network size need not be a power of 2 and the partitioning can be done based on the location of the sensors rather than the index, cf. [14]. This method can be applied to sensor network architectures such as those proposed in [11], [22]. Also, in this case we assume that the queried groups contain contiguous sensors.
To simplify our analysis, we adopt a suboptimal querying scheme where each group of sensors chosen through the tree-splitting algorithm is queried twice in consecutive time slots, asking two different questions. More specifically, for each chosen group, we impose the two queries (one for each bit value) on the same group in consecutive time slots. This approach yields a pair of outputs and provides us with ternary information on the group, namely, whether all of its sensors contain the bit 0, all of its sensors contain the bit 1, or both bits are present. The first outcome is identified because no sensor responded (i.e., protested) to the corresponding question and, similarly, for the second outcome. In both of these cases, the group of sensors is resolved and there is no need to partition the group any further. When both queries draw a protest, the data gathering node knows that both 0 and 1 are possessed by some sensor in the group, but it is not able to identify which sensors contain which bit. This is referred to as the erasure case. When an erasure is received, the group is partitioned into two subgroups of equal size as specified by the binary tree-splitting algorithm.
For example, we consider a network of 16 nodes that are partitioned into a binary tree, as shown in Fig. 3. In level $k$ of the tree, the network is partitioned into $2^k$ groups of equal size $N/2^k$, and each group consists of all sensors within its subtree. In the binary tree-splitting algorithm, the sequence of queries starts from the two subgroups of the root and continues splitting and querying the smaller subgroups each time the larger group results in an erasure. If the query on a group resolves it as all 0 or all 1, the process goes on to query the smallest group that is not yet resolved. However, if the test results in an erasure, the vertex branches into two subgroups, the first of which is queried next. For the data vector shown at the bottom of the tree in Fig. 3, the queries follow this order over the groups of the tree. Even though the value of the data in some groups can be inferred from the results of earlier queries (e.g., two sibling groups must contain different bits when both are single sensors and the query on their parent results in an erasure), we do not consider this improvement in our analysis.

Fig. 3. Example of the realization of a sensor field with the binary sequence 0100111100001100.

As a result of the binary splitting and the use of two queries on each group, the performance of this algorithm yields a loose upper bound on the optimal achievable performance of the Q&R strategy. However, the simple structure allows us to derive the explicit relations with the model parameters and the asymptotic scaling of the performance with respect to $N$.
A. Performance Evaluation of the Tree-Splitting Algorithm
Considering the Markov model in Section II, we can compute the number of queries needed to resolve the sensors' data using the binary tree-splitting algorithm. The proof is provided in Appendix A.
Theorem 1: Consider a network of $N = 2^K$ sensors and the binary observations modeled by the two-state Markov chain with the given transition probabilities. The expected number of queries needed to retrieve the data $\mathbf{X}$ with the binary tree-splitting algorithm is given by (14).
and . When the correlation is high, i.e., can be approximated as bility
, the proba-
(15) By using the approximation in (15), we can show that
(16) Notice from (16) that the expected number of queries reduces as or decreases to and as increases to , which are also the cases where the entropy of decreases. The relation with and is consistent with that observed in the optimized recursive scheme (see Fig. 2). Consider the case where sensors are increasingly correlated with respect to the sensor density in the sense that for some constant and large. This occurs, for example, when the sensors are deployed in a fixed interval as shown in Fig. 1(b) and each sensor takes a binary quantization of a continuous random field that has zero crossings at Poisson points on the line (see random telegraph signal in [23], [24]). In this case, it is easy to show that the expected number of queries scales as . In contrast, the number of transmissions increases linearly with if each sensor transmits separately in different channels, such as that in TDMA. Even though the scaling of the binary tree-splitting algorithm may exceed that of when is fixed, the Q&R strategies can produce gains that depend on the data structure, therefore achieving a better performance in certain regimes of for finite . This advantage is not enjoyed by the conventional transmission protocols.
Fig. 4. The network is initially split into $2^K$ groups where each functions as the original binary tree algorithm.
B. Optimized Tree-Splitting Strategy
To avoid the bad scaling laws for the case where is fixed, we can improve upon the original tree-splitting scheme by optimizing over the initial partitioning of the network. In the binary tree splitting algorithm, we initiate the process by splitting the network into two subgroups and then proceed with the binary splitting within each group if necessary. In this case, large groups of sensors are queried initially and may result in an erasure with high probability if the sensors are not highly correlated. To avoid this cause of inefficiency, we can increase the number of initial partitions such that the network is split directly into subgroups at the beginning of the process. The exponent can be optimally chosen with respect to the data statistics, where . The binary partitioning of the groups continues independently in each subgroup. When the network is partitioned initially into groups, the problem is equivalent to having subgroups that follow the binary tree-splitting algorithm described in the previous subsection, where each subgroup is initially split into only two subgroups. This is illustrated in Fig. 4. Following the approach given in [21], we find the optimal value of that minimizes the expected number of queries for each . This value of is denoted by .
Let be the expected number of queries with initial partitions. The expected number of queries for the binary tree-splitting algorithm, as introduced in the previous section, is . To derive , we first prove the following properties on the continuity and the behavior of and with respect to the parameters (the proof is given in Appendix B).
Lemma 1: Let be the value of that achieves the minimum number of queries for the values , i.e., for all .
1) For fixed , the minimum expected number of queries is continuous, monotonically decreasing and concave down with respect to .
2) For fixed , the minimum expected number of queries is continuous, concave down and symmetric around .
Fig. 5. For N = 128, we show the optimal K that achieves the minimum expected number of tests.
3) For fixed , the function is monotonically nonincreasing with respect to ; and, for fixed , it is symmetric around .
An example of is plotted in Fig. 5 for a network of . From this figure, we can see that is monotonically nonincreasing with respect to and is symmetric around . In the following, we derive for two cases: i) for fixed and ii) for close to . The proof is provided in Appendix C.
Theorem 2:
Case I: For fixed and an arbitrary value of , it is optimal to split the root node initially into branches where (17)
Case II: For close to (18)
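As a rough numerical companion to the optimization over the initial partitioning, the sketch below evaluates the expected number of queries for each initial exponent and picks the smallest. It is not a transcription of (14), (17), or (28): it assumes a stationary binary Markov chain parameterized by the probability `p` of a one and the adjacent-sensor correlation coefficient `rho` (with hypothetical transition probabilities `alpha = p(1 - rho)` and `beta = (1 - p)(1 - rho)`), two queries on every visited group, and four additional queries per erasure, as described in the text. All function names are illustrative.

```python
def erasure_prob(m, p, rho):
    """Probability that m consecutive sensors do not all hold the same bit."""
    alpha = p * (1.0 - rho)          # assumed P(0 -> 1) between neighbours
    beta = (1.0 - p) * (1.0 - rho)   # assumed P(1 -> 0) between neighbours
    all_zero = (1.0 - p) * (1.0 - alpha) ** (m - 1)
    all_one = p * (1.0 - beta) ** (m - 1)
    return 1.0 - all_zero - all_one


def expected_queries(n_sensors, k, p, rho):
    """Expected queries when the network is split directly into 2**k groups."""
    levels = n_sensors.bit_length() - 1   # log2 of a power-of-two network size
    total = 2.0 * 2 ** k                  # two queries on each initial group
    for j in range(k, levels):
        group_size = n_sensors // 2 ** j
        # Each of the 2**j groups at level j costs four more queries if it erases.
        total += 4.0 * 2 ** j * erasure_prob(group_size, p, rho)
    return total


def best_initial_exponent(n_sensors, p, rho):
    """Exponent k in 1..log2(N) minimizing the expected number of queries."""
    levels = n_sensors.bit_length() - 1
    return min(range(1, levels + 1),
               key=lambda k: expected_queries(n_sensors, k, p, rho))
```

Under these assumptions, a highly correlated field (`rho` close to 1) favors the plain binary split (`k = 1`), while weakly correlated data pushes the best exponent toward querying small groups or individual sensors, which is consistent with the qualitative behavior stated in Lemma 1.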
From (17), we can see that the optimal splitting of the network for is equivalent to querying each individual sensor separately, since in this case. In the sensor model shown in Section II, the information lower bound increases linearly with when the transition probabilities are fixed. However, when we do not optimize over the initial partitioning of the network, the average number of tests of the binary tree splitting algorithm may increase faster than . Interestingly, with the optimal splitting, the worst case scenario is to choose an initial partitioning of , which then achieves an increase consistent with the entropy, i.e., the increase of .
Remark 2: We note that the binary tree-splitting algorithm can be applied to many different scenarios without relying on the statistical knowledge of the underlying sensor field. In fact, in most sensor applications where the statistics of the sensor field are not available beforehand, one can utilize the data-driven property of the binary tree-splitting scheme to efficiently retrieve data from the sensor field while estimating the desired
statistics on the fly. As proposed in [12], the estimated statistics can then be used to choose the optimal splitting of the initial group. This results in an adaptive strategy to efficiently retrieve distributed data without the initial knowledge of the sensor field statistics. Interested readers should refer to [12] for further details.
VI. OPTIMAL SCALING PERFORMANCES
In the preceding sections, we introduced and analyzed two suboptimal algorithms: 1) the optimized recursive algorithm and 2) the tree-splitting algorithm. For finite values of , we are able to show through numerical evaluation that the optimized recursive algorithm closely approximates the optimal Q&R strategy since there is little difference between the expected number of queries and the entropy lower bound. However, the relationship between and the parameters and is not explicit. Although the tree-splitting algorithm is a loose upper bound on the optimal scheme, the dependence with respect to and is explicitly shown and it is consistent with what we observe in the optimal recursive scheme. With the tree splitting as the upper bound and the entropy as the lower bound, we are able to derive the asymptotic behavior of the optimal group-testing-based Q&R strategy as increases, even though the construction of the optimal Q&R strategy is still unknown.
Let be the number of queries needed for the optimal Q&R strategy to resolve the data set . Since the tree-splitting algorithms are special cases of the Q&R strategies, they are obvious upper bounds to the optimal scheme. Therefore, we have (19) The entropy serves as a lower bound since the binary channel outputs uniquely represent the observation , similar to the source coding problem. Using these bounds, we derive the asymptotic performances of the optimal scheme for two cases: i) the case where the number of sensors increases while the density remains constant and ii) the case where the density increases linearly with the number of sensors.
As mentioned previously, the binary data at each sensor can be viewed as the binary quantization of a spatially continuous random process,5 similar to the sampling of a Random Telegraph process [24]. In the first scenario, where the distance between sensors remains the same, the correlation coefficient remains constant as the size of the network increases. In the second case, the distance between sensors decreases as the number of sensors increases and, thus, increases the correlation between sensors. Suppose that the sensors are placed uniformly in a fixed interval , as shown in Fig. 1(b). The distance between sensors will then be . It is easy to show that the correlation coefficient between adjacent sensors satisfies , for some constant .
5 Suppose that we have a continuous one-dimensional random process that crosses the value 0 at Poisson points on the line with parameter $\lambda$. Assume that sensor $i$ is located at position i and the data at sensor $i$ is a binary quantized value of the continuous random process such that $X_i = 1$ if the sample of the process is greater than 0 and $X_i = 0$, otherwise. In this case, we can treat the transitions between the binary quantized states as a Markov chain, similar to the sampling of the standard Random Telegraph Signal [24]. The correlation coefficient between adjacent sensors can be computed as
$$\rho = \frac{E[X_i X_{i+1}] - p^2}{p(1-p)} \approx 1 - \frac{1}{1-p}\,\frac{\lambda D}{N}.$$
Therefore, we can model the correlation coefficient as $1 - \rho = c/N$, for some positive constant $c$.
We first prove the following lemma on the scaling of the entropy lower bound of (19). The proof is given in Appendix D.
Lemma 2: 1) For fixed
, we have (20)
2) for , for some , and a fixed value of , we have (21)
Secondly, we show the scaling of the upper bound as follows (the proof is given in Appendix E).
Lemma 3:
1) For any , we have (22)
2) for , for some , and a fixed value of (23)
From Lemmas 2 and 3, we have shown that the upper and lower bounds of (19) scale in the same order. Therefore, it follows that the best group-testing-based Q&R strategy is asymptotically optimal in the sense that it achieves the same scaling as the entropy of .
Theorem 3: Let be the expected number of queries needed by the optimal Q&R strategy.
1) For fixed (24)
2) for fixed and for some , (25)
From this result, we have shown that the Q&R methods can significantly reduce the total number of transmissions necessary for the sensors to convey their information to a central processor. These strategies capitalize on the statistical knowledge of the sensors’ data to design group queries that are able to retrieve the data from multiple sensors simultaneously. This method compresses the distributed sensors’ information while scheduling the cooperative transmission of multiple sensors. However, we note that the class of Q&R strategies is not restricted to binary sources. A trivial extension to -ary sources is given in [10], where each query is to ask questions corresponding to the symbols in the -ary alphabet. In fact, the Q&R strategy can be generalized for different data statistics and for more sophisticated receiver structures, e.g., a receiver that gives an estimate of the number of sensors transmitting in response to each query. In the following section, we provide a generalized formulation of the Q&R problem.
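To give a feel for the entropy lower bound in the two regimes considered above, the following sketch computes the joint entropy of a stationary binary Markov chain through the chain rule, H(X_1) + (N - 1) H(X_2 | X_1). It is only an illustration: the parameterization by `p` and `rho` (with assumed transition probabilities `alpha = p(1 - rho)` and `beta = (1 - p)(1 - rho)`) and the function names are not taken from the paper, and the closed-form scalings of (20)-(25) are not reproduced here.

```python
import math


def binary_entropy(x):
    """Binary entropy function in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def markov_joint_entropy(n_sensors, p, rho):
    """Joint entropy (bits) of n_sensors samples of a stationary binary chain."""
    alpha = p * (1.0 - rho)          # assumed P(0 -> 1)
    beta = (1.0 - p) * (1.0 - rho)   # assumed P(1 -> 0)
    # Conditional entropy of one sensor given its left neighbour.
    conditional = (1.0 - p) * binary_entropy(alpha) + p * binary_entropy(beta)
    return binary_entropy(p) + (n_sensors - 1) * conditional


if __name__ == "__main__":
    for n in (64, 256, 1024):
        fixed = markov_joint_entropy(n, 0.5, 0.9)           # fixed correlation
        dense = markov_joint_entropy(n, 0.5, 1.0 - 1.0 / n)  # 1 - rho = c/N, c = 1
        print(n, round(fixed, 2), round(dense, 2))
```

With a fixed correlation coefficient the entropy grows in proportion to the number of sensors, whereas with `1 - rho = c/N` it grows far more slowly, which is the regime where group queries are expected to pay off most.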
Fig. 6. Cooperative MAC with feedback communications.
VII. GENERALIZATION OF GROUP TESTING TO Q&R STRATEGIES
Consider the set of sensors and the random vector defined on the probability space that represents the observations made by the sensors. In general, the value of may belong to an arbitrary alphabet , which can be either discrete or continuous. Our goal is to rapidly identify the data at all sensors up to a certain accuracy through a sequence of queries and responses. The problem has implications for both lossless and lossy distributed source coding. As in the binary case, the data gathering node first imposes a query on the distributed sensors after which the sensors respond by transmitting a signal simultaneously and cooperatively over a MAC. A Q&R strategy must rely on the knowledge of the data statistics, the transmission channel, and all the previous responses in order to minimize the number of queries needed to retrieve the data. Because of the use of sensors’ queries, the Q&R strategy can be modeled as a MAC with feedback [25]–[27], as shown in Fig. 6.
Let be the th query and let be the symbol transmitted by sensor in response to this query. The transmissions of the sensors form a signal at the output of the MAC, which is described with the conditional probability function where . The sequence of channel outputs is used to form a query or feedback symbol through the function , i.e., . When the query is received, each sensor performs symbol-by-symbol encoding [28] and transmits the symbol , which depends on and the query . Thus, we define the encoding function . With the information obtained through channel outputs , the receiver computes an estimate of the observation with the decoding function . The estimate obtained after the th transmission is denoted by and the initial estimate obtained before any transmission occurs is denoted by .
In the binary case considered in this paper, we defined the set of queries to be and utilized these queries to notify the sensors implicitly about their partners’ data and to coordinate the sensors’ transmissions. By definition, we can express the encoding function as , where is the indicator function and is the complement of . That is, sensor responds with a if it belongs to the queried group and contains the bit that is opposite to the question. Ignoring the possible errors, the MAC outputs a if at least one sensor transmits a symbol . Therefore, it is equivalent to the binary OR
channel [6] where if there exists such that and , otherwise.
Let be the random variable representing the total number of channel accesses used to retrieve the data . Given the distortion measure and the constraint , our goal is to minimize the expected number of queries, i.e., (through the design of and ), such that the estimate achieves the distortion , i.e., . In the binary case, we considered the lossless reconstruction of where is set to . We note that, in general, the symbol-by-symbol encoding considered above may not reach the distortion constraint most efficiently. However, we expect that the joint source-and-channel coding over correlated sources [26] and the feedback structure [25] can improve upon the efficiency. This is the form of cooperative advantage that we exploit with the Q&R strategy.
A drawback of this scheme is the complexity involved in computing the optimal sequence of queries since it requires optimization over all possible groups of sensors and over all time slots. In fact, this is the reason for the two suboptimal strategies proposed in the binary Markov problem. A standard approach to reducing the complexity is to reduce the size of the search over a reasonably large set or to reduce the problem to a step-by-step optimization where the optimization is performed separately for each time slot. For example, in the binary Markov case, we restrict the search over sets of consecutive sensors, which reduces the problem to require only polynomial complexity and experiences little loss in performance. The advantage of this scheme is overwhelming when the aggregate entropy of the data is low; therefore, suboptimal or heuristic schemes may often be sufficient to achieve a desirable performance. The study of heuristic algorithms for general cases is beyond the scope of this paper but can be found in [13], [15].
VIII. DISCUSSIONS AND RELATED WORK
The problem of efficiently retrieving correlated information from a large number of distributed sources appears in many areas of engineering, such as querying from a distributed database [29], content delivery in peer-to-peer networks [30], or the exchange of local information in a sensor network [8], [31], [32]. In this paper, we focused on the application of large-scale wireless sensor networks due to its need for an efficient data gathering strategy for correlated sources; however, the methodology extends to other applications as well, such as file comparison, fault detection, computer networks, etc. It is difficult to do complete justice to the several original contributions in the area. Therefore, rather than providing an overview of the state of the art, we provide some indication of how our work relates to the different avenues in the following three areas: 1) distributed compression and communication for sensor networks; 2) joint source–channel coding; and 3) guessing strategies.
A. Distributed Compression and Communication for Correlated Sensors
Distributed compression techniques have been widely studied in the sensor network literature, along with data-driven communication protocols, to reduce the cost of communicating the sensors’ local measurements to each other or to a fusion center. These strategies exploit the high dependency between closely
located sensors to eliminate the redundancy of the transmissions. Three main approaches have been taken in the literature: 1) data aggregation [33]–[35]; 2) distributed source coding [36], [37]; and 3) spatial sampling [38], [39]. Data aggregation reduces the communication cost by allowing sensors to compress their local messages with the messages that they are relaying and, thereby, reducing the length of the messages enroute to its destination. Since the aggregation can only be performed by the relaying sensors, the cooperating sensors must be assigned to a single route toward the destination. Distributed source coding (DSC), however, exploits the Slepian–Wolf theory to derive compression schemes for distributed terminals. In fact, the Slepian–Wolf DSC theory shows that the optimal centralized compression efficiency can be achieved even when the sensors do not communicate explicitly their local data. Both of these schemes possess a sequential encoding structure that results in a tradeoff in reliability and efficiency [32]. Another data-gathering method for correlated sources is proposed based on spatial sampling. In this scheme, data is only gathered from sensors that are sufficiently uncorrelated, which is similar to sampling the data at sufficiently close points in space. This technique reduces significantly the energy consumption for moderate distortion constraints in data retrieval applications. However, the energy consumption increases rapidly under strict distortion constraints. In the limit where the lossless reconstruction of the sensors data is desired, data from all sensors must be collected and the cost increases linearly with the total number of sensors .
B. Joint Source and Channel Coding for Sensor Networks
The strategies introduced above perform compression in the application layer and consider the source and channel coding separately. However, the separation between these two operations is well known to be suboptimal in multiuser systems [20]. This is due to the fact that source coding eliminates the correlation that could be utilized to improve the performance through cooperative transmissions. The methodology given in this paper also falls in the general area of joint source-and-channel coding since the cooperative channel coding at the sensors depends on their local data content. However, the approach is different from those proposed in the literature [40], [41], which focus on the use of turbo codes or low-density parity-check (LDPC) codes. The strategy proposed in this paper exploits the correlation through the queries and utilizes these queries to coordinate the cooperative transmissions at the sensors. The key intuition is to have highly correlated sensors transmit cooperatively in the same channel or time slot, instead of transmitting separately.
The works most relevant to ours were proposed in the context of distributed detection and estimation problems [28], [42], [43]. In these works, all the sensors observe measurements of a common event and, thus, are highly correlated. The sensors act as the relay in cooperative networks that forward the information of the source to the destination [44]. In fact, as shown in [42], [43], it is sufficient to allocate a single channel to the sensors with the same measurement, similar to that in the group testing case. Therefore, the total number of channel uses becomes independent of the network size. This strategy is referred to as the type-based multiple access (TBMA) in the literature on distributed statistical inference.
C. Guessing and Entropy
In general, the Q&R methodology shown in Section VII is similar to the formulation of a guessing game [45], [46]. Specifically, the query is like imposing a guess on the sensors’ data and the collective response is the answer to that guess. For example, in blood-testing applications, we group multiple blood samples together because we guess that these samples are all clear of infection. If the statistics imply otherwise, we can then make the reverse guess (if the test is available) or reduce the number of samples in the group. The guessing game has been studied in the information theory literature where the goal is to compute the minimum number of guesses required before the correct guess is made on an unknown random value. This problem was considered by Massey [47] for the guessing of encoding keys in security applications. Performance bounds on this problem have been derived in terms of the entropy and the alphabet of the random variable. Further research on this subject can be found in [45], [46] and references therein. Our proposed strategy differs from this problem in two ways. In our scheme, the sensors’ data are guessed or queried in groups and the queries are constructed sequentially. Although the sensors’ data can be treated as a single random vector, which draws relations to the guessing problem, separating the data at distributed terminals makes the problem significantly different.
IX. CONCLUSION
The group testing methodology is extended to the case with correlation among the distributed samples and generalized as a query-and-response data retrieval strategy. Although there is no tractable approach to derive the optimal Q&R strategy in general, we are able to demonstrate the effectiveness of these strategies with two suboptimal algorithms, i.e., the optimized recursive algorithm and the tree-splitting algorithm. The optimized recursive scheme is shown to closely approximate the performance of the optimal scheme, but the dependence on the data statistics cannot be derived explicitly. With the tree-splitting algorithm, we are able to derive the relation between the statistical parameters of the data and the number of queries needed for the Q&R scheme. We also derived the asymptotic scaling of the expected number of queries with respect to . The data dependence of the queries and the cooperative transmission between sensors provide significant improvements over strategies that impose a separation between source and channel coding. The Q&R methodology is generalized to include different source models and transmission channels.
APPENDIX A
PROOF OF THEOREM 1
Proof: Following the approach given by Capetanakis in [21], we can compute the expected number of queries needed to identify the values of . Specifically, for , the number of queries is expressed as follows:
(26)
where
if the query on results in
(27)
otherwise.
The first term in (26) represents the two queries that are initially imposed on each of the two groups and . Since four additional queries are imposed each time a splitting of group occurs, i.e., when , we multiply the second term with the factor . The probability that is

which is invariant to the index due to the homogeneity of the Markov chain. Substituting this into (26), we can derive the expected number of queries as

APPENDIX B
PROOF OF LEMMA 1
Proof:
1) Let be the expected number of queries for the values and initial partitions. By splitting the network immediately into subgroups, we have equivalently binary trees, each containing sensors, since each binary tree has two initial partitions as described in Fig. 3. Therefore, the expected number of queries is given by
for (28)
where is given by (14) for a network of sensors. Since each term in (14) is continuous, monotonically decreasing, and concave down with respect to , for , it follows that has the same properties (and, thus, as well). Given , we choose the optimal that minimizes the expected number of queries. By taking the minimization over , we have
(29)
Therefore, is also continuous, monotonically decreasing and concave down with respect to .
2) The second property follows similarly.
3) For any fixed value of , we want to show that is monotonically nonincreasing with respect to , i.e.,
(30)
For fixed, let be the expected number of queries as a function of and . From (14) and (28), we can show that, if , then it follows that
. It follows that for all
(31)
We are now ready to prove (30) by contradiction. Suppose there exists such that . Then, by (31), it follows that
(32)
where follows from (31) and follows from the definition of , that is, should yield the minimum expected number of queries when the correlation coefficient is . However, (32) contradicts the fact that yields the minimum expected number of queries when the correlation coefficient is . Hence, we have proven that is monotonically nonincreasing with respect to , for a fixed value of .
Similarly, we can prove that is symmetric with respect to around . More specifically, we can show that monotonically increases with respect to for .
APPENDIX C
PROOF OF THEOREM 2
Proof: Let be the interval of values of for which is optimal. From the properties of Lemma 1, we can solve for and by
(33)
and
(34)
From (14) and (28), we have
(35)
Similarly, we can solve for through
(36)
For , we can derive from (35) and (36) that and , i.e., the initial splitting of branches is optimal for such that

Equivalently, we can say that is optimal for

Similar to that shown in (15), we can approximate (35) and (36) for and obtain the optimal splitting for as

APPENDIX D
PROOF OF LEMMA 2
Proof:
1) This property follows from the fact that .
2) We first derive the scale of as follows:

Therefore

Hence, we have shown that .
APPENDIX E
PROOF OF LEMMA 3
Proof:
1) Since the worst case is to partition the network into initial groups, i.e., querying each sensor individually, the optimal tree-splitting algorithm must achieve performance such that .
2) The optimal tree-splitting algorithm is bounded by the performance of the binary tree-splitting algorithm. In fact, the binary tree-splitting is optimal for and sufficiently large. Therefore, we derive the scaling of the binary tree-splitting algorithm for the case where . From (14), we have
(37)
where (37) follows from the Stirling’s formula [20].
ACKNOWLEDGMENT
We would like to thank Prof. Toby Berger and Prof. Pramod K. Varshney for the helpful discussions on group testing.
REFERENCES [1] R. Dorfman, “The detection of defective members of large population,” Ann. Math. Statist., vol. 14, no. 4, pp. 436–440, Dec. 1943. [2] W. Bruno, E. Knill, D. Balding, W. Bruno, N. Doggett, W. Sawhill, R. Staltings, C. Whittaker, and D. C. Torney, “Efficient probing designs for library screening,” Genomics, vol. 26, pp. 21–30, 1995. [3] J. K. Wolf, “Born again group testing: Multiaccess communications,” IEEE Trans. Inf. Theory, vol. IT-31, no. 2, pp. 185–191, Mar. 1985. [4] T. Berger, N. Mehravari, D. Towsley, and J. Wolf, “Random multipleaccess communication and group testing,” IEEE Trans. Commun., vol. COM-32, no. 7, pp. 769–779, Jul. 1984. [5] E. Hong and R. Ladner, “Group testing for image compression,” IEEE Trans. Image Process., vol. 11, no. 8, pp. 901–911, Aug. 2002. [6] M. Sobel and P. A. Groll, “Group testing to eliminate efficiently all defectives in a binomial sample,” Bell Syst. Tech. J., vol. 38, pp. 1179–1253, Sep. 1959. [7] C. Fragouli and A. Orlitsky, “Silence is golden and time is money: Power-aware communications for sensor networks,” in Proc. Allerton Conf. Communications, Control and Computing, Monticello, IL, Sep. 2005. [8] R. Cristescu, B. Beferull-Lozano, and M. Vetterli, “On network correlated data gathering,” in Proc. INFOCOM 2004, Hong Kong, Mar. 2004, vol. 4, pp. 2571–2582. [9] Y.-W. Hong and A. Scaglione, “On multiple access for correlated sources: A content-based group testing approach,” in Proc. IEEE Information Theory Workshop, San Antonio, TX, Oct. 2004, pp. 298–303. [10] Y.-W. Hong and A. Scaglione, “Content-based multiple access: Combining source and multiple access coding for sensor networks,” in Proc. IEEE Int. Workshop on Multimedia Signal Processing, Siena, Italy, Sep. 2004, pp. 103–106. [11] Y.-W. Hong, A. Scaglione, R. Manohar, and B. Sirkeci-Mergen, “Dense sensor networks that are also energy efficient: When ‘more’ is ‘less’,” in Proc. IEEE Military Communications Conf. (MILCOM), Atlantic City, NJ, Oct. 
2005, vol. 5, pp. 3127–3133. [12] Y.-W. Hong and A. Scaglione, “Group testing for sensor networks: The value of asking the right questions,” in Proc. Asilomar Conf. Signals, Systems, and Computers, Pacific Grove, CA, Nov. 2004, vol. 2, pp. 1297–1301. [13] Y.-W. Hong and A. Scaglione, “Generalized group testing for retrieving distributed information,” in Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Philadelphia, PA, Mar. 2005, vol. 3, pp. 681–684. [14] Y.-W. Hong and A. Scaglione, “A scalable communication architecture for the sensor broadcast problem,” in Proc. IEEE Int. Workshop on Signal Processing Advances for Wireless Communications (SPAWC), New York, Jun. 2005, pp. 226–230. [15] Y.-W. Hong and P. K. Varshney, “Data-centric and cooperative mac protocols for sensor networks,” in Wireless Sensor Networks: Signal Processing and Communications Perspectives, A. Swami, Q. Zhao, Y.-W. Hong, and L. Tong, Eds. New York: Wiley, 2007. [16] S. M. Kay, Fundamentals of Statistical Signal Processing: Detection Theory. Upper Saddle River, NJ: Prentice-Hall, 1998. [17] J. G. Proakis, Digital Communications, 4th ed. New York: McGraw Hill, 2001. [18] R. J. Barton and R. Zheng, “Cooperative time-reversal communication is order-optimal for data aggregation in wireless sensor networks,” in Proc. IEEE Int. Symp. Information Theory (ISIT), Seattle, WA, Jul. 2006, pp. 222–226. [19] R. J. Barton and R. Zheng, “Order-optimal data aggregation in wireless sensor networks using cooperative time-reversal communication,” in Proc. 40th Annu. Conf. Information Sciences and Systems, Princeton, NJ, Mar. 2006, pp. 1050–1055. [20] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley-Interscience, 1991. [21] J. Capetanakis, “Generalized TDMA: The multi-accessing tree protocol,” IEEE Trans. Commun., vol. COM-27, no. 10, pp. 1476–1484, Oct. 1979. [22] P. Venkitasubramaniam, S. Adireddy, and L. 
Tong, “Sensor networks with mobile access: Optimal random access and coding,” IEEE J. Sel. Areas Commun., vol. 22, no. 6, pp. 1058–1068, Aug. 2004.
[23] H. Stark and J. W. Woods, Probability and Random Processes with Applications to Signal Processing, 3rd ed. Upper Saddle River, NJ: Prentice-Hall, 2002.
[24] A. Papoulis and S. U. Pillai, Probability, Random Variables and Stochastic Processes, 4th ed. New York: McGraw-Hill, 2001.
[25] T. Cover and C. Leung, “An achievable rate region for the multiple-access channel with feedback,” IEEE Trans. Inf. Theory, vol. IT-27, no. 3, pp. 292–298, May 1981.
[26] T. Cover, A. El Gamal, and M. Salehi, “Multiple access channels with arbitrarily correlated sources,” IEEE Trans. Inf. Theory, vol. IT-26, no. 6, pp. 648–657, Nov. 1980.
[27] K. de Bruyn, V. Prelov, and E. van der Meulen, “Reliable transmission of two correlated sources over an asymmetric multiple-access channel (corresp.),” IEEE Trans. Inf. Theory, vol. IT-33, no. 5, pp. 716–718, Sep. 1987.
[28] M. Gastpar and M. Vetterli, “Source-channel communication in sensor networks,” in Proc. Int. Symp. Information Processing in Sensor Networks (IPSN), Palo Alto, CA, Apr. 2003, pp. 162–177.
[29] M. T. Ozsu and P. Valduriez, Principles of Distributed Database Systems, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 1999.
[30] BitTorrent Protocol [Online]. Available: http://bitconjurer.org/BitTorrent
[31] S. Lindsey, C. Raghavendra, and K. M. Sivalingam, “Data gathering algorithms in sensor networks using energy metrics,” IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 9, pp. 924–935, Sep. 2002.
[32] D. Marco and D. L. Neuhoff, “Reliability vs. efficiency in distributed source coding for field-gathering sensor networks,” in Proc. 3rd Int. Symp. Information Processing in Sensor Networks (IPSN), Berkeley, CA, Apr. 2004, pp. 161–168.
[33] W. R. Heinzelman, J. Kulik, and H. Balakrishnan, “Adaptive protocols for information dissemination in wireless sensor networks,” in Proc. 5th Annu. ACM/IEEE Int. Conf. Mobile Computing and Networking (MobiCom), 1999, pp. 174–185.
[34] B. Krishnamachari, D. Estrin, and S. Wicker, “The impact of data aggregation in wireless sensor networks,” in Proc. Int. Workshop on Distributed Event Based Systems (DEBS), Vienna, Austria, Jul. 2002.
[35] C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, and F. Silva, “Directed diffusion for wireless sensor networking,” IEEE/ACM Trans. Netw., vol. 11, no. 1, pp. 2–16, Feb. 2003.
[36] S. Pradhan, J. Kusuma, and K. Ramchandran, “Distributed compression in a dense microsensor network,” IEEE Signal Process. Mag., vol. 19, no. 2, pp. 51–60, Mar. 2002.
[37] Z. Xiong, A. D. Liveris, and S. Cheng, “Distributed source coding for sensor networks,” IEEE Signal Process. Mag., vol. 21, no. 5, pp. 80–94, Sep. 2004.
[38] M. C. Vuran and I. F. Akyildiz, “Spatial correlation-based collaborative medium access control in wireless sensor networks,” IEEE/ACM Trans. Netw., vol. 14, no. 2, pp. 316–329, Apr. 2006.
[39] M. Dong, L. Tong, and B. M. Sadler, “Impact of data retrieval pattern on homogeneous signal field reconstruction in dense sensor networks,” IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4352–4364, Nov. 2006.
[40] J. Garcia-Frias, “Joint source-channel decoding of correlated sources over noisy channels,” in Proc. Data Compression Conf. (DCC’01), Washington, DC, 2001, pp. 283–283.
[41] M. Sartipi and F. Fekri, “Source and channel coding in wireless sensor networks using LDPC codes,” in Proc. IEEE Conf. Sensor and Ad Hoc Communications and Networks (SECON).
[42] G. Mergen and L. Tong, “Type based estimation over multiaccess channels,” IEEE Trans. Signal Process., vol. 54, no. 2, pp. 613–626, Feb. 2006.
[43] K. Liu and A. M. Sayeed, “Optimal distributed detection strategies for wireless sensor networks,” in Proc. 42nd Annu. Allerton Conf. Communications, Control and Computing, Monticello, IL, Oct. 2004.
[44] Y.-W. Hong, W.-J. Huang, F.-H. Chiu, and C.-C. J. Kuo, “Cooperative communications in resource-constrained wireless networks,” IEEE Signal Process. Mag., vol. 24, no. 3, pp. 47–57, May 2007.
[45] A. D. Santis, A. Gaggia, and U. Vaccaro, “Bounds on entropy in a guessing game,” IEEE Trans. Inf. Theory, vol. 47, no. 1, pp. 468–473, Jan. 2001.
[46] E. Arikan and N. Merhav, “Guessing subject to distortion,” IEEE Trans. Inf. Theory, vol. 44, no. 3, pp. 1041–1056, May 1998.
[47] J. Massey, “Guessing and entropy,” in Proc. IEEE Int. Symp. Information Theory (ISIT), Trondheim, Norway, Jun./Jul. 1994, p. 204.