Distributed Multi-target Detection in Sensor Networks

Xiaoling Wang, Hairong Qi
Department of Electrical and Computer Engineering
The University of Tennessee
Knoxville, TN 37996
{xwang1,hqi}@utk.edu

Steve Beck
BAE Systems, IDS
Austin, TX 78725
[email protected]

Recent advances in micro electro mechanical systems (MEMS), wireless communication technologies, and digital electronics are responsible for the emergence of sensor networks that deploy thousands of low-cost sensor nodes integrating sensing, processing, and communication capabilities. Such sensor networks have been employed in a wide variety of applications, ranging from military surveillance to civilian and environmental monitoring. Examples include battlefield command, control, and communication [1]; target detection, localization, tracking, and classification [2, 3, 4, 5, 6]; transportation monitoring [7]; pollution monitoring in the air, soil, and water [8, 9]; and ecosystem monitoring [10].

A fundamental problem addressed by these sensor network applications is target detection in the field of interest. This problem has two levels of difficulty: single target detection and multiple target detection. The single target detection problem can be solved using statistical signal processing methods; for example, a constant false-alarm rate (CFAR) detector on the acoustic signals can declare the presence of a target whenever the signal energy exceeds an adaptive threshold. The multiple target detection problem, on the other hand, is considerably more challenging. Over the years, researchers have employed different sensing modalities, 1-D or 2-D, to tackle it. Take 2-D imagers as an example: through image segmentation, the targets of interest can be separated from the background and later identified using pattern classification methods. However, if the targets occlude each other in a single image frame, or the target pixels are mixed with background clutter, which is almost always the case, detecting them can be extremely difficult. In such situations, array processing or distributed collaborative processing of 1-D signals, such as acoustic and seismic signals, may offer advantages because of the time delays determined by the sensors' relative positions as well as the
intrinsic correlation among the target signatures. For instance, we can model the acoustic signal received at an individual sensor node as a linear/nonlinear weighted combination of the signals radiated from different targets, with the weights determined by the signal propagation model and the distances between the targets and the sensor node.

The problem of detecting multiple targets in sensor networks from their linear/nonlinear mixtures is similar to the traditional blind source separation (BSS) problem [11, 12], where the different targets in the field are considered the sources. The “blind” qualification of BSS refers to the fact that no a-priori information is available on the number of sources, the distribution of the sources, or the mixing model [13]. In general, BSS involves two aspects of research: source number estimation and source separation, where source number estimation is the process of estimating the number of targets and source separation is the process of identifying and extracting the target signals from the mixtures. Independent component analysis (ICA) [14, 15, 16, 17] has been a widely accepted technique for solving BSS problems, but it has two major drawbacks when applied in the context of sensor networks. First, for conceptual and computational simplicity, most ICA algorithms assume that the number of sources equals the number of observations, so that the mixing/unmixing matrix is square and can be easily estimated. This equality generally does not hold in sensor network applications, since thousands of sensors can be densely deployed within the sensing field and the number of sensors can easily exceed the number of sources. Hence, the number of sources has to be estimated before any further calculations can be done. In [18], the problem of source number estimation is also referred to as the problem of model order estimation; we will use these two terms interchangeably in this section. The second drawback is that most model order estimation methods developed to date require centralized processing and are derived under the assumption that sufficient data from all the involved sensors are available for estimating the most probable number of sources as well as the mixing matrix. This assumption is not feasible for real-time processing in sensor networks because of the sheer number of sensor nodes deployed in the field and the limited power supply on the battery-supported sensor nodes.

In this section, we focus our discussion on model order estimation. We develop a distributed multiple target detection framework for sensor network applications based on centralized blind source estimation techniques. The outline of this section is as follows: we first describe the problem of blind source separation in Sec. 1 and source number estimation in Sec. 2. Based on the background introduction of these two related topics, we then present in Sec. 3 a distributed source number estimation technique for multiple target detection in sensor networks. In Sec. 4, we conduct experiments to evaluate the performance of the
proposed distributed method compared to the existing centralized approach.
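To make the single-target case concrete, the following Python sketch (ours, for illustration only; the window length, guard factor, and smoothing constant are arbitrary choices rather than values from any fielded system) implements an energy detector with an adaptive threshold in the spirit of a CFAR detector:

    import numpy as np

    def energy_detect(signal, win=256, guard_factor=4.0, alpha=0.99):
        """CFAR-style energy detector (illustrative sketch).

        Declares a detection whenever the average power of the current
        window exceeds guard_factor times a recursively tracked
        noise-floor estimate.
        """
        detections = []
        noise_floor = None
        for start in range(0, len(signal) - win + 1, win):
            frame = signal[start:start + win]
            energy = float(np.mean(frame ** 2))   # average power in the window
            if noise_floor is None:
                noise_floor = energy              # initialize from the first window
            if energy > guard_factor * noise_floor:
                detections.append(start)          # adaptive threshold exceeded
            else:
                # update the noise floor only on (apparently) target-free windows
                noise_floor = alpha * noise_floor + (1 - alpha) * energy
        return detections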

1 Blind Source Separation (BSS)

The BSS problem [11, 12] considers how to extract source signals from their linear or nonlinear mixtures using a minimum of a-priori information. We start our discussion with a very intuitive example, the so-called cocktail-party problem [15]. Suppose two people are speaking simultaneously in a room, with two microphones placed at different locations in the room. Let x_1(t) and x_2(t) denote the amplitudes of the speech signals recorded at the two microphones, and s_1(t) and s_2(t) the amplitudes of the speech signals generated by the two speakers. We call x_1(t) and x_2(t) the observed signals and s_1(t) and s_2(t) the source signals. Intuitively, both observed signals are mixtures of the two source signals. If we assume the mixing process is linear, we can model it by Eq. 1, where the observed signals x_1(t) and x_2(t) are weighted sums of the source signals s_1(t) and s_2(t), and a_{11}, a_{12}, a_{21}, and a_{22} denote the weights, which normally depend on the distances between the microphones and the speakers.

x_1(t) = a_{11} s_1(t) + a_{12} s_2(t)
x_2(t) = a_{21} s_1(t) + a_{22} s_2(t)    (1)

If the a_{ij}'s are known, the linear equations in Eq. 1 can be solved directly; otherwise, the problem is considerably more difficult. A common approach is to exploit statistical properties of the source signals, e.g., the assumption that the sources s_i(t) are statistically independent at each time instant t, to help estimate the weights a_{ij}. In sensor networks, the sensor nodes are usually densely deployed in the field. If the targets are close to each other, the observation from each individual sensor node is a mixture of the source signals generated by the targets. Therefore, the basic formulation of the BSS problem and its ICA-based solution are applicable to multiple target detection in sensor networks as well. Suppose there are m targets in the sensor field generating source signals s_i(t), i = 1, ..., m, and n sensor observations x_j(t), j = 1, ..., n, recorded at the sensor nodes, where t = 1, ..., T indicates the time index of the discrete-time signals. The sources and the observed mixtures at time t can then be denoted as the vectors s(t) = [s_1(t), ..., s_m(t)]^T and x(t) = [x_1(t), ..., x_n(t)]^T, respectively. Let X_{n×p} = {x(t)} represent the sensor observation matrix and S_{m×p} = {s(t)} the unknown source matrix, where p is the number of discrete time samples.
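As a concrete illustration of this model, the short sketch below simulates Eq. 1 for two hypothetical sources observed at two sensors, with the mixing weights a_{ij} set from assumed inverse-square distances in line with the acoustic energy-decay model mentioned in Sec. 3.2; the source waveforms, distances, and sampling choices are all illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    T = 10_000                                # number of time samples p
    t = np.arange(T) / 1_000.0

    # Two illustrative, statistically independent source signals s1(t), s2(t).
    s1 = np.sign(np.sin(2 * np.pi * 3 * t))   # square wave (engine-like)
    s2 = rng.laplace(size=T)                  # super-Gaussian, noise-like
    S = np.vstack([s1, s2])                   # source matrix S, m x p

    # Mixing weights a_ij from assumed sensor-to-source distances.
    d = np.array([[1.0, 2.0],                 # sensor 1 to s1, s2
                  [2.5, 1.2]])                # sensor 2 to s1, s2
    A = 1.0 / d ** 2                          # mixing matrix A, n x m

    X = A @ S                                 # observation matrix, Eq. 2: X = AS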


If we assume the mixing process is linear, then X can be represented as

X = AS,    (2)

where A_{n×m} is the unknown non-singular scalar mixing matrix. In order to solve Eq. 2 using ICA algorithms, we need to make three assumptions. First, the mixing process is instantaneous, so that there is no time delay between the source signals and the sensor observations. Second, the source signals s(t) are mutually independent at each time instant t. This assumption is not unrealistic in many cases, since the estimation results can provide a good approximation of the real source signals [15]. In this sense, the BSS problem is to determine a constant (weight) matrix W so that Ŝ, the estimate of the source matrix, is as independent as possible:

\hat{S} = WX.    (3)

In theory, the unmixing matrix W_{m×n} can be obtained as the Moore-Penrose pseudo-inverse of the mixing matrix A,

W = (A^T A)^{-1} A^T.    (4)

Correspondingly, the estimate of one independent component (one row of Ŝ) can be denoted as ŝ_i = wX, where w is one row of the unmixing matrix W. Define z = A^T w^T; then the independent component ŝ_i = wX = wAS = z^T S, which is a linear combination of the s_i's with the weights given by z. According to the Central Limit Theorem, the distribution of a sum of independent random variables converges to a Gaussian. Thus, z^T S is more Gaussian than any of the individual components s_i and becomes least Gaussian when it in fact equals one of the s_i's, that is, when it gives the correct estimate of one of the sources [15]. Therefore, in the context of independent component analysis, nongaussianity indicates independence. Many metrics have been studied to measure the nongaussianity of the independent components, such as kurtosis [13, 19], mutual information [11, 20], and negentropy [14, 21]. The third assumption is related to the independence criterion stated above. Since a mixture of two or more Gaussian sources is still Gaussian, which makes it impossible to separate them from each other, we need to assume that at most one source signal is normally distributed for the linear mixing/unmixing model [17]. This assumption is reasonable in practice, since pure Gaussian processes are rare in real data.
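The following minimal sketch ties these pieces together using the FastICA implementation in scikit-learn (one of several ICA algorithms; any of the cited methods would serve). The mixing matrix and sources are illustrative assumptions, and the recovered components carry the usual ICA ambiguities of ordering, sign, and scale. Excess kurtosis serves as the nongaussianity check discussed above:

    import numpy as np
    from scipy.stats import kurtosis
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    T = 10_000
    t = np.arange(T) / 1_000.0
    S = np.vstack([np.sign(np.sin(2 * np.pi * 3 * t)),  # source 1: square wave
                   rng.laplace(size=T)])                # source 2: Laplacian
    A = np.array([[1.0, 0.6],
                  [0.4, 1.0]])                          # assumed mixing matrix
    X = A @ S                                           # observations, X = AS

    ica = FastICA(n_components=2, random_state=0)
    S_hat = ica.fit_transform(X.T).T                    # estimated sources, S_hat = WX

    # The separated components should be markedly non-Gaussian
    # (|excess kurtosis| well above 0); the raw mixtures less so.
    print("kurtosis of mixtures: ", kurtosis(X, axis=1))
    print("kurtosis of estimates:", kurtosis(S_hat, axis=1))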


2 Source Number Estimation

As discussed above, BSS problems involve two aspects: source number estimation and source separation. Most ICA-based algorithms assume that the number of sources equals the number of observations so as to make the mixing matrix A and the unmixing matrix W square, which simplifies the problem. However, this assumption is not feasible in sensor networks due to the sheer number of sensor nodes deployed. Hence, the number of targets has to be estimated before any further operations can be done. Several approaches have been introduced to solve the source number estimation problem, some heuristic, others based on more principled frameworks [22, 23, 24]. As discussed in [18], it has become clear that techniques of the latter category are superior, and heuristic methods may be seen at best as approximations to more detailed underlying principles. In this section, we focus on principled source number estimation algorithms, which construct multiple hypotheses corresponding to different numbers of sources. Let H_m denote the hypothesis that there are m sources. The goal of principled source number estimation is to find the m̂ whose corresponding hypothesis H_{m̂} maximizes the posterior probability given only the observation matrix X,

\hat{m} = \arg\max_{m} P(H_m | X).    (5)

A brief introduction to the different principled source number estimation methods is given next.

2.1 Bayesian Source Number Estimation

Roberts proposed a Bayesian source number estimation approach to finding the hypothesis that maximizes the posterior probability P(H_m | X); interested readers are referred to [24] for the detailed theoretical derivation. According to Bayes' theorem, the posterior probability of the hypothesis can be written as

P(H_m | X) = \frac{p(X | H_m) P(H_m)}{p(X)}.    (6)

Assuming the hypotheses H_m for the different numbers of sources m are uniformly distributed, i.e., have equal prior probability P(H_m), and noting that p(X) is a constant, the maximization of the posterior probability simplifies to the calculation of the likelihood p(X|H_m). By marginalizing the likelihood over the space of system parameters and approximating the marginal integrals using the Laplace approximation method, a log-likelihood function proportional to the posterior probability can be written as:

L(m) = \log p(x(t) | H_m)
     = \log \pi(\hat{s}(t)) + \frac{n-m}{2} \log\frac{\hat{\beta}}{2\pi} - \frac{1}{2} \log\left|\hat{A}^T \hat{A}\right| - \frac{\hat{\beta}}{2} \left(x(t) - \hat{A}\hat{s}(t)\right)^2 - \left[\frac{mn}{2} \log\frac{\hat{\beta}}{2\pi} + \frac{n}{2} \sum_{j=1}^{m} \log \hat{s}_j(t)^2 + mn \log\gamma\right]    (7)

where x(t) is the vector of sensor observations, Â is the estimate of the mixing matrix, ŝ(t) = Wx(t) is the estimate of the independent sources with W = (Â^T Â)^{-1} Â^T, β̂ is the precision (inverse variance) of the noise component, γ is a constant, and π(·) is the assumed marginal distribution of the sources. The Bayesian source number estimation method uses a set of Laplace approximations to infer the posterior probabilities of specific hypotheses. This approach has a solid theoretical foundation and an objective function that is easy to calculate; hence, it provides a practical solution to the source number estimation problem.
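A rough sketch of how Eq. 7 can be evaluated in practice is given below. We use FastICA to obtain Â and ŝ(t) for each hypothesized m, take π(s) ∝ 1/(π cosh s) as the source prior, and estimate β̂ as the reciprocal of the residual variance; these choices, along with γ = 1, are our illustrative assumptions rather than the exact procedure of [24]:

    import numpy as np
    from sklearn.decomposition import FastICA

    def bayesian_score(X, m, gamma=1.0):
        """Average of the Eq. 7 log-likelihood over time for hypothesis H_m.

        X is the n x T observation matrix. The prior pi(s) = 1/(pi*cosh(s))
        and the noise-precision estimate are illustrative assumptions.
        """
        n, T = X.shape
        ica = FastICA(n_components=m, random_state=0, max_iter=1000)
        S_hat = ica.fit_transform(X.T).T                # m x T estimated sources
        A_hat = ica.mixing_                             # n x m estimated mixing matrix
        resid = X - A_hat @ S_hat
        beta = 1.0 / max(float(np.var(resid)), 1e-12)   # noise precision estimate

        log_pi = -np.log(np.cosh(S_hat)).sum(axis=0) - m * np.log(np.pi)
        penalty = (0.5 * m * n * np.log(beta / (2 * np.pi))
                   + 0.5 * n * np.log(S_hat ** 2 + 1e-12).sum(axis=0)
                   + m * n * np.log(gamma))
        L = (log_pi
             + 0.5 * (n - m) * np.log(beta / (2 * np.pi))
             - 0.5 * np.linalg.slogdet(A_hat.T @ A_hat)[1]
             - 0.5 * beta * (resid ** 2).sum(axis=0)
             - penalty)
        return float(L.mean())

    # Eq. 5: pick the hypothesis with the highest score, e.g.
    # scores = {m: bayesian_score(X, m) for m in range(1, n_sensors + 1)}
    # m_hat = max(scores, key=scores.get)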

2.2 Sample-based Source Number Estimation

Besides the Laplace approximation method, the posterior probabilities of specific hypotheses can also be evaluated using a sample-based approach. In this approach, a reversible-jump Markov chain Monte Carlo (RJ-MCMC) method is used to estimate the joint density over the mixing matrix A, the hypothesized number of sources m, and the noise component R_n, denoted P(A, m, R_n) [18, 23]. The basic idea is to construct a Markov chain that generates samples from the hypothesis probability and to use the Monte Carlo method to estimate the posterior probability from the samples. An introduction to Monte Carlo methods can be found in [25]. RJ-MCMC is essentially a random-sweep Metropolis-Hastings method, where the transition probability of the Markov chain from state (A, m, R_n) to state (A′, m′, R′_n) is

p = \min\left\{1, \; \frac{P(A', m', R'_n | X)}{P(A, m, R_n | X)} \, \frac{q(A, m, R_n | X)}{q(A', m', R'_n | X)} \, J\right\},    (8)

where P(·) is the posterior probability of the unknown parameters of interest, q(·) is a proposal density for moving from state (A, m, R_n) to state (A′, m′, R′_n), and J is the ratio of Jacobians for the proposed transition between the two states [18]. A more detailed derivation of this method is provided in [23].
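The acceptance step of Eq. 8 is a Metropolis-Hastings test augmented with the Jacobian factor J for dimension-changing moves. A generic sketch in log space follows; the posterior and proposal evaluations are left abstract, since they depend on the mixture model of [23]:

    import numpy as np

    rng = np.random.default_rng(0)

    def rjmcmc_accept(log_post_new, log_post_old,
                      log_q_rev, log_q_fwd, log_jacobian=0.0):
        """Eq. 8 in log space: accept the move with probability
        min{1, posterior ratio * proposal ratio * J}.

        log_q_fwd is the log proposal density of the move old -> new,
        log_q_rev that of the reverse move, and log_jacobian = ln J.
        """
        log_ratio = ((log_post_new - log_post_old)
                     + (log_q_rev - log_q_fwd)
                     + log_jacobian)
        return np.log(rng.uniform()) < min(0.0, log_ratio)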


2.3 Variational Learning

In recent years, the Bayesian inference problem shown in Eq. 6 has also been tackled with another approximative method known as variational learning [26, 27]. In ICA problems, the variables are divided into two classes: the visible variables v and the hidden variables h. An example of visible variables is the observation matrix X; examples of hidden variables include the parameters of A, the noise covariance matrix, any parameters in the source density models, and all associated hyperparameters such as the number of sources m [18]. Suppose q(h) denotes the variational approximation to the posterior probability of the hidden variables, P(h|v); then the negative variational free energy, F, is defined as

F = \int q(h) \ln\left[p(v|h)\,p(h)\right] dh + H[q(h)],    (9)

where H[q(h)] is the differential entropy of q(h). The negative free energy F forms a strict lower bound on the log evidence of the model, \ln p(v) = \ln \int p(v|h) p(h) \, dh. The difference between this variational bound and the true log evidence is the Kullback-Leibler (KL) divergence between q(h) and the true posterior P(h|v) [28].

Therefore, maximizing F is equivalent to minimizing the KL divergence, and this process provides a direct method for source number estimation. Another promising source number estimation approach using variational learning is the so-called Automatic Relevance Determination (ARD) scheme [28]. The basic idea of ARD is to suppress sources that are unsupported by the data. For example, if each hypothesized source is given a Gaussian prior with its own variance, the sources that do not contribute to modeling the observations tend to have very small variances, and the corresponding source models do not move significantly from their priors [18]. After eliminating those unsupported sources, the remaining sources give the true number of sources of interest. Even though variational learning is a particularly powerful approximative approach, it has yet to be developed into a more mature form. In addition, it has difficulty estimating the true number of sources from noisy data.
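A worked toy example helps make Eq. 9 concrete. Assume (our choice, purely for illustration) a one-dimensional conjugate model h ~ N(0,1), v|h ~ N(h,1), so the true posterior is N(v/2, 1/2) and the log evidence ln p(v) = ln N(v; 0, 2) is available in closed form. For a Gaussian q(h) = N(mu_q, var_q), F can be computed exactly, and the bound is tight precisely when q equals the true posterior:

    import numpy as np

    def free_energy(mu_q, var_q, v):
        """Negative variational free energy F (Eq. 9) for the toy model
        h ~ N(0,1), v|h ~ N(h,1), with q(h) = N(mu_q, var_q).
        F = E_q[ln p(v|h) + ln p(h)] + H[q].
        """
        e_loglik   = -0.5 * np.log(2 * np.pi) - 0.5 * ((v - mu_q) ** 2 + var_q)
        e_logprior = -0.5 * np.log(2 * np.pi) - 0.5 * (mu_q ** 2 + var_q)
        entropy    = 0.5 * np.log(2 * np.pi * np.e * var_q)
        return e_loglik + e_logprior + entropy

    v = 1.3
    log_evidence = -0.5 * np.log(2 * np.pi * 2.0) - v ** 2 / 4.0  # ln N(v; 0, 2)

    # The bound is tight at the true posterior N(v/2, 1/2) ...
    print(free_energy(v / 2, 0.5, v), log_evidence)  # equal up to rounding
    # ... and strictly below ln p(v) for any other q, by exactly KL(q || P(h|v)).
    print(free_energy(0.0, 1.0, v))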

3 Distributed Source Number Estimation

The source number estimation algorithms described in Sec. 2 are all centralized processes in the sense that the observed signals from all sensor nodes are collected at a processing center and estimation is performed on the whole data set. While this works well for small sensor array applications such as speech analysis, it is not necessarily suitable for real-time applications in sensor networks, due to the large scale of the network as well as the severe resource constraints. In sensor networks, the sensor nodes are usually battery-operated and cannot be recharged in real time, which makes energy the most constrained resource. Since wireless communication consumes the most energy among all the activities conducted on a sensor node [29], the centralized scheme becomes very energy-intensive due to the large amount of data transmission and is not cost-effective for real-time sensor network applications. On the contrary, when implemented in a distributed manner, data can be processed locally on a cluster of sensor nodes that are geographically close, and only local decisions need to be transferred for further processing. In this way, the distributed target detection framework can dramatically reduce long-distance network traffic, conserve the energy consumed on data transmission, and prolong the lifetime of the sensor network.

Figure 1: An example of a clustered sensor network.

3.1 Distributed Hierarchy in Sensor Networks

In the context of the proposed distributed solution to the source number estimation problem, we assume a clustering protocol has been applied and the sensor nodes have organized themselves into clusters, with each node assigned to one and only one cluster. Nodes communicate locally within the same cluster, and different clusters communicate through a cluster head specified within each cluster. An example of a clustered sensor network is illustrated in Fig. 1.

Suppose there are m targets present in the sensor field and the sensor nodes are divided into L clusters. Each cluster l (l = 1, ..., L) senses the environment independently and generates an observation matrix X_l consisting of mixtures of the source signals generated by the m targets. The distributed estimation hierarchy includes two levels of processing. First, we estimate the posterior probability P(H_m | X_l) of each hypothesis H_m given a local observation matrix X_l; the Bayesian source number estimation approach proposed by Roberts [24] is employed in this step. Second, we develop a posterior probability fusion algorithm to integrate the decisions from the clusters. The structure of the hierarchy is illustrated in Fig. 2.

Figure 2: The structure of the distributed source number estimation hierarchy (local estimation within each cluster, followed by posterior probability fusion based on Bayes' theorem).

This hierarchy for distributed source number estimation builds on the research achievements of two areas, distributed detection and ICA model order estimation. However, it exhibits some unique features that make it advantageous for multiple target detection in sensor networks from both the theoretical and practical points of view:

• M-ary hypothesis testing. Most distributed detection algorithms are derived under the binary hypothesis assumption, where H takes on one of two possible values corresponding to the presence and absence of the target [30]. The distributed framework developed here extends the traditional binary hypothesis testing problem to the M-ary case, where the values of H correspond to the different numbers of sources.

• Fusion of detection probabilities. Instead of making a crisp decision from local cluster estimates as in the classic distributed detection algorithms, a Bayesian source number estimation algorithm is performed on the observations from each cluster, and the posterior probability for each hypothesis is estimated. These probabilities are then sent to a fusion center, where a decision regarding the source number hypothesis is made. This process is also referred to as the fusion of detection probabilities [31] or the combination of levels of significance [32]. By estimating and fusing the hypothesis probabilities from each cluster, the system can achieve a higher detection accuracy, because faults are constrained within the processing of each cluster.

• Distributed structure. Even though source number estimation is usually implemented in a centralized manner, the distributed framework presents several advantages. Data is processed locally in each cluster, and only the estimated hypothesis probabilities are transmitted through the network; hence, heavy network traffic is significantly reduced and communication energy conserved. Furthermore, since the estimation process is performed in parallel within the clusters, the computation burden is distributed and the computation time reduced.

After local source number estimation is conducted within each cluster, we develop a posterior probability fusion method based on Bayes' theorem to fuse the results from the clusters.

3.2 Posterior Probability Fusion based on Bayes' Theorem

The objective of source number estimation approaches is to find the optimal number of sources m that maximizes the posterior probability P(H_m | X). When implemented in the distributed hierarchy, the local estimation approach calculates the posterior probability corresponding to each hypothesis H_m from each cluster, P(H_m | X_1), ..., P(H_m | X_L). According to Bayes' theorem, the fused posterior probability can be written as

P(H_m | X) = \frac{p(X | H_m) P(H_m)}{p(X)}.    (10)

Assuming the clustering of sensor nodes is exclusive, that is, X = X_1 ∪ X_2 ∪ · · · ∪ X_L and X_l ∩ X_q = ∅ for any l ≠ q, l, q = 1, ..., L, the posterior probability P(H_m | X) can be represented as

P(H_m | X) = \frac{p(X_1 \cup X_2 \cup \cdots \cup X_L | H_m) P(H_m)}{p(X_1 \cup X_2 \cup \cdots \cup X_L)}.    (11)

Since the observations from different clusters are assumed to be mutually exclusive, p(X_l ∩ X_q) = 0 for any l ≠ q, and we then have

p(X_1 \cup X_2 \cup \cdots \cup X_L | H_m) = \sum_{l=1}^{L} p(X_l | H_m) - \sum_{l,q=1,\, l \neq q}^{L} p(X_l \cap X_q | H_m) = \sum_{l=1}^{L} p(X_l | H_m).    (12)

Substituting Eq. 12 into Eq. 11, the fused posterior probability can be derived as

P(H_m | X) = \frac{\sum_{l=1}^{L} p(X_l | H_m) P(H_m)}{\sum_{l=1}^{L} p(X_l)}
           = \frac{\sum_{l=1}^{L} \frac{P(H_m | X_l)\, p(X_l)}{P(H_m)} P(H_m)}{\sum_{l=1}^{L} p(X_l)}
           = \sum_{l=1}^{L} P(H_m | X_l) \frac{p(X_l)}{\sum_{q=1}^{L} p(X_q)},    (13)

where P(H_m | X_l) denotes the posterior probability calculated in cluster l, and the term p(X_l) / \sum_{q=1}^{L} p(X_q) reflects the physical characteristics of clustering in sensor networks, which are application-specific. For example, in the case of distributed multiple target detection using acoustic signals, the propagation of the acoustic signals follows an energy decay model in which the detected energy is inversely proportional to the square of the distance between the source and the sensor node, i.e., E_{sensor} ∝ (1/d^2) E_{source}. Therefore, the term p(X_l) / \sum_{q=1}^{L} p(X_q) can be considered the relative detection sensitivity of the sensor nodes in cluster l and is proportional to the average energy captured by those nodes,

\frac{p(X_l)}{\sum_{q=1}^{L} p(X_q)} \propto \frac{1}{K_l} \sum_{k=1}^{K_l} E_k \propto \frac{1}{K_l} \sum_{k=1}^{K_l} \frac{1}{d_k^2},    (14)

where K_l denotes the number of sensor nodes in cluster l.
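A minimal sketch of this fusion rule is given below; the per-cluster posteriors would come from the local Bayesian estimator of Sec. 2.1, and the cluster weights are taken directly from the average captured energy per Eq. 14 (all numbers here are hypothetical):

    import numpy as np

    def fuse_posteriors(local_posteriors, avg_cluster_energy):
        """Bayesian posterior probability fusion, Eqs. 13-14 (sketch).

        local_posteriors: L x M array; row l holds P(H_m | X_l) for the
            hypotheses m = 1..M estimated inside cluster l.
        avg_cluster_energy: length-L array of mean acoustic energy over
            the K_l nodes of each cluster, standing in for the relative
            detection sensitivity p(X_l) / sum_q p(X_q) of Eq. 14.
        """
        P = np.asarray(local_posteriors, dtype=float)
        w = np.asarray(avg_cluster_energy, dtype=float)
        w = w / w.sum()                    # normalized cluster weights
        fused = w @ P                      # Eq. 13: sum_l w_l P(H_m | X_l)
        return fused / fused.sum()         # guard against numerical drift

    # Hypothetical local posteriors for hypotheses m = 1..5 in two clusters:
    local = [[0.10, 0.55, 0.20, 0.10, 0.05],
             [0.15, 0.45, 0.25, 0.10, 0.05]]
    fused = fuse_posteriors(local, avg_cluster_energy=[2.1, 1.4])
    m_hat = 1 + int(np.argmax(fused))      # Eq. 5: most probable source count
    print(fused, m_hat)                    # hypothesis m = 2 wins here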

4 Performance Evaluation

We apply the proposed distributed source number estimation hierarchy to detect multiple civilian targets using data collected from a field demo held at BAE Systems, Austin, TX, in August 2002. We also compare the performance of the centralized Bayesian source number estimation algorithm and the distributed hierarchy using the evaluation metrics described next.

4.1 Evaluation Metrics

As mentioned before, source number estimation is basically an optimization problem in which the hypothesis H_m that maximizes the posterior probability given the observation matrix, P(H_m | X), is pursued. The optimization process is affected by the initialization condition and the update procedure of the algorithm itself. To compensate for this randomness and to stabilize the overall performance, the algorithms are repeated multiple times, 20 times in this experiment.

Detection probability (P_{detection}) is the most intuitive metric for measuring the accuracy of a detection approach. In our experiment, we define the detection probability as the ratio of correct source number estimates to the total number of estimations, i.e., P_{detection} = N_{correct} / N_{total}, where N_{correct} denotes the number of correct estimations and N_{total} the total number of estimations.

After executing the algorithm multiple times, a histogram can be generated that shows the accumulated number of estimations corresponding to the different hypothesized numbers of sources. The histogram also represents the reliability of the algorithm: the larger the difference between the histogram value at the correct hypothesis and the values at the other hypotheses, the more deterministic and reliable the algorithm. We use kurtosis (β) to extract this characteristic of the histogram. Kurtosis measures the flatness of the histogram,

\beta = \frac{1}{C} \sum_{k=1}^{N} h_k \left(\frac{k - \mu}{\theta}\right)^4 - 3,    (15)

where h_k denotes the value of the kth bin of the histogram, N is the total number of bins, C = \sum_{k=1}^{N} h_k, \mu = \frac{1}{C} \sum_{k=1}^{N} k h_k is the histogram mean, and \theta = \sqrt{\frac{1}{C} \sum_{k=1}^{N} h_k (k - \mu)^2} is its standard deviation. Intuitively, the larger the kurtosis, the more deterministic the algorithm, and the more reliable the estimation.

Since the source number estimation is designed for real-time multiple target detection in sensor networks, the computation time is also an important metric for performance evaluation.
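Both metrics can be computed directly from the estimates produced by repeated runs. The sketch below follows our reading of Eq. 15, treating the bin index as the random variable weighted by the histogram counts; the 20 sample estimates at the bottom are made up for illustration:

    import numpy as np

    def detection_metrics(estimates, true_m, num_hypotheses):
        """Detection probability and histogram kurtosis (Eq. 15), a sketch."""
        est = np.asarray(estimates)
        p_detection = float(np.mean(est == true_m))     # N_correct / N_total

        # Histogram over the hypothesized numbers of sources 1..num_hypotheses.
        h = np.array([np.sum(est == m)
                      for m in range(1, num_hypotheses + 1)], dtype=float)
        k = np.arange(1, num_hypotheses + 1)
        C = h.sum()
        mu = (k * h).sum() / C                          # histogram mean
        theta = np.sqrt((h * (k - mu) ** 2).sum() / C)  # histogram spread
        beta = (h * ((k - mu) / theta) ** 4).sum() / C - 3
        return p_detection, beta

    # Hypothetical outcomes of 20 repetitions with true source number 2:
    runs = [2, 2, 3, 2, 2, 1, 2, 2, 2, 4, 2, 2, 2, 3, 2, 2, 2, 2, 5, 2]
    print(detection_metrics(runs, true_m=2, num_hypotheses=5))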

4.2 Experimental Results

In the field demo, we let two civilian vehicles, a motorcycle and a diesel truck (shown in Fig. 3), travel along the N-S road from opposite directions and intersect at the T-junction. Fifteen nodes were deployed along the road; for this experiment, we assume two clusters of 5 sensor nodes each for the distributed processing. The sensor network setup is illustrated in Fig. 4(a). We use Sensoria WINS NG-2.0 sensor nodes (shown in Fig. 4(b)), each consisting of a dual-issue SH-4 processor running at 167 MHz with 300 MIPS of processing power, an RF modem for wireless communication, and up to four channels of sensing modalities, such as acoustic, seismic, and infrared. In this experiment, we perform the multiple target detection algorithms on the acoustic signals captured by the microphone on each sensor node. The observations from the sensor nodes are preprocessed component-wise to be zero-mean and unit-variance.

Figure 3: Vehicles used in the experiment. (a) Motorcycle. (b) Diesel truck.

Figure 4: The sensor laydown and the Sensoria sensor node used. (a) Sensor laydown. (b) Sensoria sensor node.

First, the centralized Bayesian source number estimation algorithm is performed using all ten sensor observations. Second, the distributed hierarchy is applied as shown in Fig. 2, which first calculates the corresponding posterior probabilities of the different hypotheses in the two clusters and then fuses the local results using the Bayesian posterior probability fusion method. Figure 5(a) shows the average value of the log-likelihood function in Eq. 7 corresponding to different hypothesized numbers of sources, estimated over 20 repetitions. Figure 5(b) displays the histogram of the occurrence of the most probable number of sources when the log-likelihood function is evaluated 20 times; each evaluation randomly initializes the mixing matrix A with values drawn from a zero-mean, unit-variance normal distribution. The left column of the figure corresponds to the performance of the centralized Bayesian source number estimation approach applied to all ten sensor observations; the right column shows the corresponding performance of the distributed hierarchy with the proposed Bayesian posterior probability fusion method. Based on the average log-likelihood, it is clear that in both approaches the hypothesis with the true number of sources (m = 2) has the greatest support.


Figure 5: Performance comparison. Left: centralized Bayesian approach. Right: distributed hierarchy with the Bayesian posterior probability fusion. (a) Log-likelihood function. (b) Histogram of source number estimation during 20 repetitions.

However, the two approaches exhibit different rates of correct estimation and different levels of uncertainty. Figure 6(a) shows the kurtosis calculated from the two histograms in Fig. 5(b); the kurtosis of the distributed approach is about 8 times higher than that of the centralized approach. The detection probabilities are shown in Fig. 6(b). The centralized Bayesian algorithm detects the correct number of sources 30% of the time, while the distributed approach increases the detection probability of the centralized scheme by an average of 50%. A comparison of the computation times of the centralized scheme and the distributed hierarchy is shown in Fig. 6(c); by using the distributed hierarchy, the computation time is generally reduced by a factor of 2.

Figure 6: Comparison of kurtosis, detection probability, and computation time. (a) Kurtosis. (b) Detection probability. (c) Computation time.

4.3 Discussion

As demonstrated by the experiment and the performance evaluation, the distributed hierarchy with the proposed Bayesian posterior probability fusion method performs better in the sense of providing a higher detection probability and a more deterministic, reliable response. The reasons are twofold: 1) The centralized scheme uses the observations from all the sensor nodes as inputs to the Bayesian source number estimation algorithm; the algorithm is thus sensitive to signal variations due to node failure or environmental noise in any input signal. In the distributed framework, the source number estimation algorithm is performed only within each cluster; therefore, the effect of signal variations is limited locally and might contribute less in the posterior probability fusion process. 2) The derivation of the Bayesian posterior probability fusion method takes into account the physical characteristics of sensor networks, such as the signal energy captured by each sensor node versus its geographical position, making the method more adaptive to real applications. Furthermore, the distributed hierarchy reduces network traffic by avoiding large amounts of data transmission, hence conserving energy and providing a scalable solution. The parallel implementation of the estimation algorithm in each cluster also reduces the computation time by about half.

5 Conclusions

This work addressed the problem of source number estimation in sensor networks for multiple target detection. This problem is analogous to the BSS problem in signal processing, for which ICA has traditionally been the most popular solution. The classical BSS problem includes two aspects of research, source number estimation and source separation; the multiple target detection problem in sensor networks corresponds to the former. Building on several centralized source number estimation approaches, we developed a distributed source number estimation algorithm that avoids large amounts of long-distance data transmission, which in turn reduces the network traffic and conserves energy. The distributed processing hierarchy consists of two levels: first, local source number estimation is performed in each cluster using the centralized Bayesian source number estimation approach; then a posterior probability fusion method derived from Bayes' theorem combines the local estimates into a global decision. An experiment on the detection of multiple civilian vehicles using acoustic signals was conducted to evaluate the performance of the two approaches. The distributed hierarchy with the Bayesian posterior probability fusion method is shown to provide better performance in terms of detection probability and reliability. In addition, the distributed framework can reduce the computation time by half.

References

[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “A survey on sensor networks,” IEEE Communications Magazine, vol. 40, no. 8, pp. 102–114, August 2002.

[2] S. Kumar, D. Shepherd, and F. Zhao, “Collaborative signal and information processing in micro-sensor networks,” IEEE Signal Processing Magazine, vol. 19, no. 2, pp. 13–14, March 2002.

[3] D. Li, K. D. Wong, Y. H. Hu, and A. M. Sayeed, “Detection, classification, and tracking of targets,” IEEE Signal Processing Magazine, vol. 19, no. 2, pp. 17–29, March 2002.

[4] X. Wang, H. Qi, and S. S. Iyengar, “Collaborative multi-modality target classification in distributed sensor networks,” in Proceedings of the Fifth International Conference on Information Fusion, Annapolis, MD, July 2002, vol. 1, pp. 285–290.

[5] K. Yao, J. C. Chen, and R. E. Hudson, “Maximum-likelihood acoustic source localization: experimental results,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002, vol. 3, pp. 2949–2952.

[6] F. Zhao, J. Shin, and J. Reich, “Information-driven dynamic sensor collaboration,” IEEE Signal Processing Magazine, vol. 19, no. 2, pp. 61–72, March 2002.

[7] A. N. Knaian, “A wireless sensor network for smart roadbeds and intelligent transportation systems,” M.S. thesis, Massachusetts Institute of Technology, June 2000.

[8] K. A. Delin and S. P. Jackson, “Sensor web for in situ exploration of gaseous biosignatures,” in Proceedings of 2000 IEEE Aerospace Conference, Big Sky, MT, March 2000.

[9] X. Yang, K. G. Ong, W. R. Dreschel, K. Zeng, C. S. Mungle, and C. A. Grimes, “Design of a wireless sensor network for long-term, in-situ monitoring of an aqueous environment,” Sensors, vol. 2, no. 7, pp. 455–472, 2002.

[10] A. Cerpa, J. Elson, M. Hamilton, and J. Zhao, “Habitat monitoring: application driver for wireless communications technology,” in 2001 ACM SIGCOMM Workshop on Data Communications in Latin America and the Caribbean, April 2001.

[11] A. J. Bell and T. J. Sejnowski, “An information-maximisation approach to blind separation and blind deconvolution,” Neural Computation, vol. 7, no. 6, pp. 1129–1159, 1995.

[12] J. Herault and J. Jutten, “Space or time adaptive signal processing by neural network models,” in Neural Networks for Computing: AIP Conference Proceedings 151, J. S. Denker, Ed., New York, 1986, American Institute for Physics.

[13] Y. Tan and J. Wang, “Nonlinear blind source separation using higher order statistics and a genetic algorithm,” IEEE Transactions on Evolutionary Computation, vol. 5, no. 6, pp. 600–612, 2001.

[14] P. Comon, “Independent component analysis, a new concept,” Signal Processing, vol. 36, no. 3, pp. 287–314, April 1994.

[15] A. Hyvarinen and E. Oja, “Independent component analysis: a tutorial,” http://www.cis.hut.fi/aapo/papers/IJCNN99 tutorialweb/, April 1999.

[16] J. Karhunen, “Neural approaches to independent component analysis and source separation,” in Proceedings of 4th European Symposium on Artificial Neural Networks (ESANN), 1996, pp. 249–266.

[17] T. Lee, M. Girolami, A. J. Bell, and T. J. Sejnowski, “A unifying information-theoretic framework for independent component analysis,” International Journal on Mathematical and Computer Modeling, 1999.

[18] S. Roberts and R. Everson, Eds., Independent Component Analysis: Principles and Practice, Cambridge University Press, 2001.

[19] A. Hyvarinen and E. Oja, “A fast fixed-point algorithm for independent component analysis,” Neural Computation, vol. 9, pp. 1483–1492, 1997.

[20] R. Linsker, “Local synaptic learning rules suffice to maximize mutual information in a linear network,” Neural Computation, vol. 4, pp. 691–702, 1992.

[21] A. Hyvarinen, “Fast and robust fixed-point algorithms for independent component analysis,” IEEE Transactions on Neural Networks, vol. 10, no. 3, pp. 626–634, 1999.

[22] K. H. Knuth, “A Bayesian approach to source separation,” in Proceedings of the First International Conference on Independent Component Analysis and Blind Source Separation: ICA'99, 1999, pp. 283–288.

[23] S. Richardson and P. J. Green, “On Bayesian analysis of mixtures with an unknown number of components,” Journal of the Royal Statistical Society, Series B, vol. 59, no. 4, pp. 731–758, 1997.

[24] S. J. Roberts, “Independent component analysis: source assessment & separation, a Bayesian approach,” IEE Proceedings on Vision, Image, and Signal Processing, vol. 145, no. 3, pp. 149–154, 1998.

[25] D. J. C. MacKay, “Monte Carlo methods,” in Learning in Graphical Models, M. I. Jordan, Ed., pp. 175–204, Kluwer, 1999.

[26] H. Attias, “Inferring parameters and structure of latent variable models by variational Bayes,” in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999, pp. 21–30.

[27] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.

[28] R. Choudrey, W. D. Penny, and S. J. Roberts, “An ensemble learning approach to Independent Component Analysis,” in Proceedings of Neural Networks for Signal Processing, Sydney, December 2000.

[29] V. Raghunathan, C. Schurgers, S. Park, and M. B. Srivastava, “Energy-aware wireless microsensor networks,” IEEE Signal Processing Magazine, vol. 19, no. 2, pp. 40–50, March 2002.

[30] J. Chamberland and V. V. Veeravalli, “Decentralized detection in sensor networks,” IEEE Transactions on Signal Processing, vol. 51, no. 2, pp. 407–416, February 2003.

[31] R. Krysztofowicz and D. Long, “Fusion of detection probabilities and comparison of multisensor systems,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 20, pp. 665–677, May/June 1990.

[32] V. Hedges and I. Olkin, Statistical Methods for Meta-Analysis, New York: Academic, 1985.
