Distributed Signal Detection under the Neyman-Pearson Criterion

Qing Yan and Rick S. Blum
EECS Department, Lehigh University, Bethlehem, PA 18015

Abstract -

A procedure for finding the Neyman-Pearson optimum distributed sensor detectors for cases with statistically dependent observations is described. This is the first valid procedure we have seen for this case. This procedure is based on a Theorem proven in this paper. These results clarify and correct a number of possibly misleading discussions in the existing literature. Cases with networks of sensors in fairly general configurations are considered, along with cases where the sensor detectors make multiple bit sensor decisions.

Keywords: Neyman-Pearson criterion, multisensor, correlated observations.

This paper is based on work supported by the Office of Naval Research under Grant No. N00014-97-10774 and by the National Science Foundation under Grant No. MIP-9703730.


1 Introduction

Consider the design of an $N$-sensor distributed detection scheme, which is to decide between a simple signal-present alternative hypothesis $H_1$ and a simple null hypothesis $H_0$. Each sensor has an associated processor which makes a decision based only on the observations obtained from the sensor. The sensor processors transmit their decisions to a single central fusion center where an overall decision is made. A particular value $x_k$ of the random vector $X_k$ is observed at the $k$th sensor, $k = 1, \ldots, N$, where $x_k$ consists of a set of $m_k$ real scalar observations. We consider the case where $X_1, \ldots, X_N$ may not be independent. The final binary decision in our distributed detection scheme is denoted by the random variable $U_0$, with a particular realization of $U_0$ denoted by $u_0$, where $u_0 = 0$ corresponds to a decision for $H_0$ and $u_0 = 1$ corresponds to a decision for $H_1$. $U_k$ is the random variable which describes the decision made at the $k$th sensor. A particular value for $U_k$ is denoted by $u_k$, which may take on only the values 0 or 1 (binary sensor decisions). We let $\gamma_0(u)$ denote the probability that we decide for $U_0 = 1$ for a given set of sensor decisions $u = (u_1, \ldots, u_N)$. We let $\gamma_k(x_k)$ denote the probability that we decide for $U_k = 1$ for a given observation $x_k$. A complete set of sensor rules and fusion rule is described by $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_N)$. Let us focus on the Neyman-Pearson criterion. Specifically, denote the problem of interest as NP, which is defined as finding a $\gamma$ that satisfies

$$\text{NP}: \quad \max_{\gamma} P_d(\gamma) \quad \text{subject to the constraint } P_f(\gamma) = \alpha$$

where $P_d(\gamma) = \mathrm{Prob}(U_0 = 1 | H_1)$ is the probability of detection obtained when $\gamma$ is used, $P_f(\gamma) = \mathrm{Prob}(U_0 = 1 | H_0)$ is the probability of false alarm obtained when $\gamma$ is used, and $0 \le \alpha \le 1$. Specifying the forms of NP optimum distributed detection schemes can be extremely difficult [1], especially for cases with dependent observations from sensor to sensor, where the optimum sensor test statistics are not generally likelihood ratios. In Section 2 we provide Theorems giving conditions on the optimum sensor detectors for a parallel distributed sensor topology. A discussion in Section 3 clarifies some misconceptions that have appeared in the literature. Examples illustrating the use of the theory from Section 2 are provided in Section 4. Also, a general procedure for finding the Neyman-Pearson optimum distributed sensor detectors for cases with statistically dependent observations is described in Section 4. This is the first valid procedure we have seen for this case. Extensions to general topologies using multiple bit sensor decisions without feedback are considered in Section 5. Conclusions are given in Section 6.

2 Optimum Sensor Tests

Let us assume the $X_k$, $k = 1, \ldots, N$, each have probability density functions (pdfs) $f_{X_k}(x_k | H_j)$, $j = 0, 1$. Define

$$D_{jk}(x_k) = f_{X_k}(x_k|H_j) \sum_{\tilde{u}_k} \Big[ \mathrm{Prob}(U_0 = 1 \,|\, \tilde{U}_k = \tilde{u}_k, U_k = 1) - \mathrm{Prob}(U_0 = 1 \,|\, \tilde{U}_k = \tilde{u}_k, U_k = 0) \Big] \,\mathrm{Prob}(\tilde{U}_k = \tilde{u}_k \,|\, X_k = x_k, H_j), \qquad (2.1)$$

for $j = 0, 1$ and $k = 1, \ldots, N$, where $\tilde{u}_k$ stands for a specific value of the random vector $\tilde{U}_k$ of sensor decisions excluding the $k$th, so that $\tilde{U}_k = (U_1, U_2, \ldots, U_{k-1}, U_{k+1}, \ldots, U_N)$, and $\mathrm{Prob}(U_0 = 1 | \tilde{U}_k = \tilde{u}_k, U_k = u_k) = \mathrm{Prob}(U_0 = 1 | U = u)$ describes the fusion rule $\gamma_0$. The sum in (2.1) is over all values of $\tilde{u}_k$ (for example, if $N = 2$ and $k = 1$, then $\tilde{u}_k = u_2$ and the sum is over $u_2 = 0, 1$). Note that the conditional probability $\mathrm{Prob}(\tilde{U}_k = \tilde{u}_k | X_k = x_k, H_j)$ is defined as a limit as the conditioning event shrinks to a point. Using these definitions, we present Theorem 2.1, which gives a set of necessary conditions for an optimum sensor rule given that the fusion rule and the other sensor rules are fixed.
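To make (2.1) concrete, here is a minimal numerical sketch (our illustration; the AND fusion rule, the bivariate Gaussian statistics, and the sensor-2 threshold $t_2$ are all assumptions, not taken from the paper). For the AND rule the bracketed difference in (2.1) equals $u_2$, so $D_{j1}(x_1)$ reduces to $f_{X_1}(x_1|H_j)\,\mathrm{Prob}(U_2 = 1 | X_1 = x_1, H_j)$:

```python
import numpy as np

# A minimal sketch (ours, not from the paper) of evaluating D_j1(x1) in (2.1)
# for N = 2 sensors, bivariate Gaussian observations with unit variances, an
# AND fusion rule, and a hypothetical threshold rule u2 = 1 iff x2 > t2.

rho = 0.2                                   # assumed correlation of the observations
mean = {0: (0.0, 0.0), 1: (1.0, 2.0)}       # assumed means under H0 and H1
t2 = 0.5                                    # assumed sensor-2 threshold

def D_j1(x1, j, grid=np.linspace(-8.0, 8.0, 4001)):
    m1, m2 = mean[j]
    # marginal density f_{X1}(x1 | Hj)
    f_x1 = np.exp(-0.5 * (x1 - m1) ** 2) / np.sqrt(2 * np.pi)
    # conditional density f(x2 | x1, Hj) for unit variances and correlation rho
    mu_c, var_c = m2 + rho * (x1 - m1), 1.0 - rho ** 2
    f_cond = np.exp(-0.5 * (grid - mu_c) ** 2 / var_c) / np.sqrt(2 * np.pi * var_c)
    # Prob(U2 = 1 | X1 = x1, Hj), by numerical integration over x2 > t2
    p_u2_1 = np.sum(f_cond * (grid > t2)) * (grid[1] - grid[0])
    # AND fusion: the bracket in (2.1) is u2, so the sum over u2 reduces to p_u2_1
    return f_x1 * p_u2_1

print(D_j1(0.3, 0), D_j1(0.3, 1))
```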

Theorem 2.1 Given a fusion rule, a set of sensor processor rules at all but the $k$th sensor, and a statistical description for $X_1, \ldots, X_N$ under $H_0$ and $H_1$ such that 1) $X_k$ is an $m_k$-dimensional random vector with a probability density function $f_{X_k}(x_k|H_j)$ with no point masses of probability under either hypothesis $j = 0, 1$; 2) $D_{1k}(X_k)/D_{0k}(X_k)$ is a continuous scalar random variable with a probability density function with no point masses of probability under either hypothesis; then 1) a $\gamma_k$ of the form

$$\gamma_k(x_k) = \mathrm{Prob}(U_k = 1 | X_k = x_k) = \begin{cases} 1, & \text{if } D_{1k}(x_k) > \lambda_k D_{0k}(x_k) \\ 0, & \text{if } D_{1k}(x_k) < \lambda_k D_{0k}(x_k) \end{cases} \qquad (2.2)$$

will satisfy NP for the given fusion rule and the given set of sensor processor rules, provided there exists some rule $\gamma_k'$ that will provide the required overall false alarm probability for the given fusion rule and the given set of sensor processor rules. The event $D_{1k}(x_k) = \lambda_k D_{0k}(x_k)$, which occurs with zero probability, can be assigned $\gamma_k = 0$ or $\gamma_k = 1$. 2) Any rule that satisfies NP for the given fusion rule and the given set of sensor processor rules must be of this form, except possibly on a set having zero probability under $H_0$ and $H_1$.

Theorem 2.1 was essentially proven as Theorem 1 in [2]. Here, however, we omit the third assumption made in Theorem 1 in [2]. It turns out that the second assumption can be used instead of it wherever the third assumption was used in the proof in [2]. Theorem 2.1 gives the best form of any sensor detector, given that all the other sensors and the fusion rule are fixed. Thus it gives conditions for person-by-person optimality. No better rule can be found by changing one sensor at a time. However, if two sensors are changed at the same time, it is possible that performance can be improved. Next, we will show that by considering changes in two sensors at the same time we can put further restrictions on the conditions produced by Theorem 2.1, such that $\lambda_1 = \lambda_2 = \cdots = \lambda_N \ge 0$ will produce an optimum solution. First we demonstrate, in the following Lemma, that only $\lambda_k \ge 0$ need be considered.

Lemma 2.1 Given a set of sensor rules and a fusion rule with the $k$th sensor using (2.2) with $\lambda_k < 0$, equivalent or better performance, in terms of $P_f$ and $P_d$, can be achieved with a $\lambda_k \ge 0$.

Proof: Consider a sensor processor rule $\gamma_k$ given by (2.2) with $\lambda_k < 0$. Let $P_d$ denote the detection probability and $P_f$ the false alarm probability obtained when using this rule along with the fusion rule and the other processor rules. Next, we will show that we can use another rule $\gamma_k'$ from (2.2) with $\lambda_k' = 0$ replacing $\gamma_k$ which achieves better performance. First note that it has been shown in (6)-(9) from [2] that, up to a constant, the probability of detection can be computed by integrating $D_{1k}(x_k)$ over the set $\{x_k \,|\, \gamma_k(x_k) = 1\}$. Since the constant will cancel in any computation of a change in probability of detection, we find

$$P_d' = P_d + \int_{A_k} D_{1k}(x_k)\,dx_k - \int_{B_k} D_{1k}(x_k)\,dx_k \qquad (2.3)$$

where $A_k = \{x_k \,|\, 0 < D_{1k}(x_k) < \lambda_k D_{0k}(x_k),\ D_{0k}(x_k) < 0\}$ and $B_k = \{x_k \,|\, \lambda_k D_{0k}(x_k) < D_{1k}(x_k) < 0,\ D_{0k}(x_k) > 0\}$. Notice that $D_{1k}(x_k) > 0$ for all $x_k \in A_k$ and $D_{1k}(x_k) < 0$ for all $x_k \in B_k$, so $P_d' \ge P_d$. Using similar ideas [2] we find

$$P_f' = P_f + \int_{A_k} D_{0k}(x_k)\,dx_k - \int_{B_k} D_{0k}(x_k)\,dx_k \le P_f \qquad (2.4)$$

since $D_{0k}(x_k) < 0$ for all $x_k \in A_k$ and $D_{0k}(x_k) > 0$ for all $x_k \in B_k$. □

In the sequel, we consider only $\lambda_k \ge 0$.

Theorem 2.2 Under the same assumptions as in Theorem 2.1, and if the pdf of $D_{1k}(X_k)/D_{0k}(X_k)$ under $H_j$, $j = 0, 1$, is greater than zero for $0 < D_{1k}(X_k)/D_{0k}(X_k) < \infty$ ($\lambda_k \ge 0$ assumed from Lemma 2.1), the best performance can only be obtained with a set of sensor rules $\gamma_1, \ldots, \gamma_N$ described in Theorem 2.1 with $\lambda_1 = \lambda_2 = \cdots = \lambda_N = \lambda$. Thus, under these conditions, only a set of sensor processor rules $(\gamma_1, \gamma_2, \ldots, \gamma_N)$ of the form

$$\gamma_k(x_k) = \mathrm{Prob}(U_k = 1 | X_k = x_k) = \begin{cases} 1, & \text{if } D_{1k}(x_k) > \lambda D_{0k}(x_k) \\ 0, & \text{if } D_{1k}(x_k) < \lambda D_{0k}(x_k) \end{cases} \qquad (2.5)$$

will satisfy NP for the given fusion rule. The event $D_{1k}(x_k) = \lambda D_{0k}(x_k)$, which occurs with zero probability, can be assigned $\gamma_k = 0$ or $\gamma_k = 1$. In nonsingular detection cases with the pdf of $D_{1k}(X_k)/D_{0k}(X_k)$ under $H_j$ equaling zero for some positive $D_{1k}(X_k)/D_{0k}(X_k)$, there can be other solutions which appear to be different which are also optimum.

Proof: First assume that the pdf of $D_{1k}(X_k)/D_{0k}(X_k)$ under $H_j$ is greater than zero for positive $D_{1k}(X_k)/D_{0k}(X_k)$, $j = 0, 1$. Then we only need to show that, for a fixed fusion rule, the set of sensor processor rules $(\gamma_1, \gamma_2, \ldots, \gamma_N)$ cannot be optimum if any two of them, $\gamma_m$ and $\gamma_n$ ($1 \le m, n \le N$, $m \ne n$), take different parameters $\lambda_m$ and $\lambda_n$. We prove this by contradiction. Let $A_i$ denote the decision region of sensor $i$, so $u_i = 1$ if $x_i \in A_i$. Further, let $(A_1, \ldots, A_m, \ldots, A_n, \ldots, A_N)$ denote the distributed decision scheme which results from using the sensor decision regions $A_1, \ldots, A_m, \ldots, A_n, \ldots, A_N$. Let $(\gamma_1, \gamma_2, \ldots, \gamma_N)$ denote a set of rules for which $\gamma_m$ and $\gamma_n$ ($1 \le m, n \le N$, $m \ne n$) take different parameters $\lambda_m$ and $\lambda_n$. A set of rules which is better than $(\gamma_1, \gamma_2, \ldots, \gamma_N)$ in the NP sense can be found by using the following steps. We assume $\lambda_m > \lambda_n$, as the proof for the opposite case $\lambda_m < \lambda_n$ is in fact the same. Define $A_k^{(a,b)} = \{x_k \,|\, a D_{0k}(x_k) \le D_{1k}(x_k) < b D_{0k}(x_k),\ D_{0k}(x_k) > 0\}$ and $B_k^{(a,b)} = \{x_k \,|\, b D_{0k}(x_k) \le D_{1k}(x_k) < a D_{0k}(x_k),\ D_{0k}(x_k) < 0\}$.

First we change the parameter $\lambda_m$ of the decision rule of sensor $m$ by a small amount, i.e., $\lambda_m' = \lambda_m - \varepsilon$, $\varepsilon > 0$; thus the decision region of sensor $m$ will be $A_m'$, where $A_m' = (A_m \cup A_m^{(\lambda_m-\varepsilon,\lambda_m)}) \cap (B_m^{(\lambda_m-\varepsilon,\lambda_m)})^c$. Consequently, $P_f'$ with $(A_1, \ldots, A_m', \ldots, A_N)$ is given by (see (6)-(9) in [2])

$$P_f' = P_f + \int_{A_m^{(\lambda_m-\varepsilon,\lambda_m)}} D_{0m}(x_m)\,dx_m - \int_{B_m^{(\lambda_m-\varepsilon,\lambda_m)}} D_{0m}(x_m)\,dx_m. \qquad (2.6)$$

From the definition we see that $D_{1n}'(x_n)/D_{0n}'(x_n)$ is a combination of continuous functions of $\lambda_m'$, and thus $D_{1n}'(x_n)/D_{0n}'(x_n)$ itself is a continuous function of $\lambda_m'$. Therefore, there exists a $\delta > 0$ so that

$$\left| D_{1n}'(x_n)/D_{0n}'(x_n) - D_{1n}(x_n)/D_{0n}(x_n) \right| < \delta \quad \text{for all } x_n \in A_n^{(\lambda_n,\lambda_m)} \cup B_n^{(\lambda_n,\lambda_m)}. \qquad (2.7)$$

Also, there exists a $\sigma > 0$ so that $P_f''$ with the decision regions $(A_1, \ldots, A_m', \ldots, A_n', \ldots, A_N)$ is equal to $P_f$ with $(A_1, \ldots, A_m, \ldots, A_n, \ldots, A_N)$, or $P_f'' - P_f' = P_f - P_f'$, where $A_n' = (A_n \cup B_n^{(\lambda_n+\sigma,\lambda_n+\sigma+\delta)}) \cap (A_n^{(\lambda_n+\sigma,\lambda_n+\sigma+\delta)})^c$. Hence

$$\int_{A_m^{(\lambda_m-\varepsilon,\lambda_m)}} D_{0m}(x_m)\,dx_m - \int_{B_m^{(\lambda_m-\varepsilon,\lambda_m)}} D_{0m}(x_m)\,dx_m = \int_{A_n^{(\lambda_n+\sigma,\lambda_n+\sigma+\delta)}} D_{0n}(x_n)\,dx_n - \int_{B_n^{(\lambda_n+\sigma,\lambda_n+\sigma+\delta)}} D_{0n}(x_n)\,dx_n. \qquad (2.8)$$

Similar to (2.6) we have [2]

$$P_d' = P_d + \int_{A_m^{(\lambda_m-\varepsilon,\lambda_m)}} D_{1m}(x_m)\,dx_m - \int_{B_m^{(\lambda_m-\varepsilon,\lambda_m)}} D_{1m}(x_m)\,dx_m \qquad (2.9)$$

and

$$P_d'' = P_d' - \int_{A_n^{(\lambda_n+\sigma,\lambda_n+\sigma+\delta)}} D_{1n}(x_n)\,dx_n + \int_{B_n^{(\lambda_n+\sigma,\lambda_n+\sigma+\delta)}} D_{1n}(x_n)\,dx_n. \qquad (2.10)$$

From the definition of $A_m^{(\lambda_m-\varepsilon,\lambda_m)}$ we must have

$$\int_{A_m^{(\lambda_m-\varepsilon,\lambda_m)}} D_{1m}(x_m)\,dx_m \ge (\lambda_m - \varepsilon) \int_{A_m^{(\lambda_m-\varepsilon,\lambda_m)}} D_{0m}(x_m)\,dx_m. \qquad (2.11)$$

From the definition of $B_m^{(\lambda_m-\varepsilon,\lambda_m)}$ it follows that

$$-\int_{B_m^{(\lambda_m-\varepsilon,\lambda_m)}} D_{1m}(x_m)\,dx_m \ge -(\lambda_m - \varepsilon) \int_{B_m^{(\lambda_m-\varepsilon,\lambda_m)}} D_{0m}(x_m)\,dx_m. \qquad (2.12)$$

Similarly, we obtain

$$\int_{A_n^{(\lambda_n+\sigma,\lambda_n+\sigma+\delta)}} D_{1n}(x_n)\,dx_n < (\lambda_n + \sigma + \delta) \int_{A_n^{(\lambda_n+\sigma,\lambda_n+\sigma+\delta)}} D_{0n}(x_n)\,dx_n \qquad (2.13)$$

and

$$-\int_{B_n^{(\lambda_n+\sigma,\lambda_n+\sigma+\delta)}} D_{1n}(x_n)\,dx_n < -(\lambda_n + \sigma + \delta) \int_{B_n^{(\lambda_n+\sigma,\lambda_n+\sigma+\delta)}} D_{0n}(x_n)\,dx_n. \qquad (2.14)$$

Combining the previous four equations with (2.8), we get

$$\int_{A_m^{(\lambda_m-\varepsilon,\lambda_m)}} D_{1m}(x_m)\,dx_m - \int_{B_m^{(\lambda_m-\varepsilon,\lambda_m)}} D_{1m}(x_m)\,dx_m > \frac{\lambda_m - \varepsilon}{\lambda_n + \sigma + \delta} \left( \int_{A_n^{(\lambda_n+\sigma,\lambda_n+\sigma+\delta)}} D_{1n}(x_n)\,dx_n - \int_{B_n^{(\lambda_n+\sigma,\lambda_n+\sigma+\delta)}} D_{1n}(x_n)\,dx_n \right). \qquad (2.15)$$

Since $\delta$ and $\sigma$ get monotonically smaller when $\varepsilon$ is made smaller, we can choose $\varepsilon$ small enough that $\lambda_n + \sigma + \delta < \lambda_m - \varepsilon$; thus (2.15) becomes

$$\int_{A_m^{(\lambda_m-\varepsilon,\lambda_m)}} D_{1m}(x_m)\,dx_m - \int_{B_m^{(\lambda_m-\varepsilon,\lambda_m)}} D_{1m}(x_m)\,dx_m > \int_{A_n^{(\lambda_n+\sigma,\lambda_n+\sigma+\delta)}} D_{1n}(x_n)\,dx_n - \int_{B_n^{(\lambda_n+\sigma,\lambda_n+\sigma+\delta)}} D_{1n}(x_n)\,dx_n, \qquad (2.16)$$

which can be rewritten as

$$P_d' - P_d > -(P_d'' - P_d') \qquad (2.17)$$

or

$$P_d'' > P_d. \qquad (2.18)$$

This means that the rules defined by $(A_1, \ldots, A_m', \ldots, A_n', \ldots, A_N)$ achieve a larger detection probability while maintaining the same level of false alarm. This contradicts the assumption that a scheme without $\lambda_1 = \lambda_2 = \cdots = \lambda_N$ can be optimum. If $\lambda_m$ is taken to be at $\infty$ while the $\lambda_n$, $n = 1, \ldots, N$, $n \ne m$, are taken to be finite, then a similar argument to that made above shows that performance can be improved by choosing a finite $\lambda_m$.

Now suppose that the pdf of $D_{1k}(X_k)/D_{0k}(X_k)$ under $H_j$ is zero for some positive $D_{1k}(X_k)/D_{0k}(X_k)$ and $j = 0, 1$. We do not consider cases where only one of these pdfs, either under $H_0$ or $H_1$, is zero but the other is not, since this describes a singular detection problem. If $\lambda_k$ is in any interval where the pdf of $D_{1k}(X_k)/D_{0k}(X_k)$ under $H_j$ is zero, then we can clearly move $\lambda_k$ to any point in this interval without any change in performance. Of course, such changes are not really of any significance, and if these changes are ignored the above results still hold, except that this possibility introduces cases without $\lambda_1 = \lambda_2 = \cdots = \lambda_N$ which produce exactly the same performance, instead of $\lambda_1 = \lambda_2 = \cdots = \lambda_N$ being strictly better. Note that this possibility is incorporated in the wording of the Theorem. □

3 Discussion

Necessary conditions for the optimum sensor detectors for NP have been studied in a few previous papers; see for example [3, 4], which were the first. However, the derivations provided in these papers have been questioned [1, 5] (see footnote 1). Our derivations do not leave any questions. We clearly show that our conditions, which allow $\lambda_1 \ne \lambda_2 \ne \cdots \ne \lambda_N$, are necessary to solve NP. Then we show that constraining $\lambda_1 = \lambda_2 = \cdots = \lambda_N$ is necessary if the pdfs of the sensor test statistics have support over the whole real line, and that in other cases it will not sacrifice optimality (we ignore singular detection cases).

Footnote 1: We note that for the special case of independent observations, likelihood ratio tests have been shown to be optimum in a few papers, and these results were not questioned. The first of these papers was [5].

In [1] the author demonstrates that attempting to solve NP in a distributed case by maximizing $P_d - \lambda P_f$ without constraints, which was the approach taken in [3, 4], is not generally correct. In particular, he demonstrates that this procedure will fail if the overall receiver operating curve (ROC) is not concave. This is significant since it is not true (even for cases with a fixed fusion rule and no point masses in the sensor test distributions) that the overall ROC must be concave. In fact, a counterexample is given for the case of a fixed fusion rule in [6] for a case with point masses in the sensor test distributions. In Section 4 of this paper we present a counterexample for a case with a fixed fusion rule and no point masses in the sensor test distribution. This is the first example of this type we have seen. Clearly the ROC can be non-concave if the fusion rule is not fixed; for an example see [7]. Note that our proofs of Theorem 2.1 and Theorem 2.2 did not rely on the overall ROC being concave, since we did not attempt to maximize $P_d - \lambda P_f$. We note that it seems impossible to determine whether the ROC is concave without having the optimum solutions.

Interestingly enough, even though we did not attempt to maximize $P_d - \lambda P_f$, the conditions we provide through Theorem 2.1 and Theorem 2.2 in this paper are similar in form to part of the necessary conditions produced in [3, 4] (see footnote 2). Since the formulation used in [3, 4] to obtain these necessary conditions is not correct (without a concave ROC), this is just a coincidence. To further demonstrate that such coincidences can occur, a similar circumstance where this occurs for a different problem is presented next. Consider an optimization problem where one is attempting to find a vector $t$ that maximizes a quantity $P_d$ under the constraint $P_f = \alpha$. For this example assume $P_d$ and $P_f$ are continuous functions of each component of $t$. Necessary conditions for this case are well known. For example, restating a Theorem from page 224 of [8] gives the following.

Theorem 3.1 Let $P_d$ be a real-valued function of $t = (t_1, \ldots, t_N)$, where $-\infty \le t_i \le \infty$ for $i = 1, \ldots, N$. Let $P_f$ be another real-valued function of $t$. Let $t_o$ be a local extremum of $P_d$ under the constraint $P_f = \alpha$, and assume $\nabla P_f \ne 0$ at $t_o$, where $\nabla P_f$ denotes the gradient of $P_f$ with respect to $t$. Then there exists a real-valued $\lambda$ such that at $t_o$ we have $\nabla P_d - \lambda \nabla P_f = 0$.

Footnote 2: Unlike [3, 4], we do not have an equation requiring $\lambda$ to be the threshold of the fixed fusion rule in our Theorems, and we give conditions under which our Theorems provide necessary and sufficient conditions. No conditions are given in [3, 4].

The conditions provided in Theorem 3.1, $\nabla P_d - \lambda \nabla P_f = 0$, are called first-order necessary conditions, and $\lambda$ in Theorem 3.1 is generally called a Lagrange multiplier. $\nabla P_d - \lambda \nabla P_f = 0$ says that the normal vectors to the tangent planes of the $P_d$ and $P_f$ surfaces must point in the same direction at the extremum (see the proof in [8]).
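As a simple concrete instance of Theorem 3.1 (our illustration, not from [8]), consider a single scalar Gaussian observation with signal value $s$ under $H_1$ and a threshold test deciding $H_1$ iff $x > t$, so that

$$P_f(t) = \int_t^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx, \qquad P_d(t) = \int_t^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(x-s)^2/2}\,dx.$$

With $\phi$ the standard Gaussian density, the first-order condition $P_d'(t) - \lambda P_f'(t) = -\phi(t-s) + \lambda\phi(t) = 0$ gives

$$\lambda = \frac{\phi(t-s)}{\phi(t)} = e^{st - s^2/2},$$

which is the likelihood ratio evaluated at the threshold $x = t$; the constraint $P_f(t) = \alpha$ then fixes $t$, and $\lambda$ follows.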

Theorem 3.1 does not generally imply that $t_o$ will be at an unconstrained extremum of $P_d - \lambda P_f$. In fact, a few counterexamples are presented in [9] which show this is not generally the case. However, the conditions $\nabla P_d - \lambda \nabla P_f = 0$ would also be obtained as necessary conditions for finding extrema of $P_d - \lambda P_f$ without constraints for the correct $\lambda$, the Lagrange multiplier. So in fact, the correct form of the necessary conditions is obtained using a possibly inappropriate procedure (attempting to find extrema of $P_d - \lambda P_f$ without constraints instead of using Theorem 3.1). Our situation is similar in one important way. Our correct conditions are produced by combining Theorem 2.1 and Theorem 2.2, not by attempting to maximize $P_d - \lambda P_f$. However, these conditions look similar to part of those which have been obtained as necessary conditions for maximizing $P_d - \lambda P_f$, just by coincidence. Attempting to maximize $P_d - \lambda P_f$ is generally not the correct way to solve NP, and one can find cases where this will give wrong answers. We present one such example at the end of Section 4. In this example, the overall ROC is not concave.

In the preceding discussion we bring out some ways in which our correct conditions, produced in Theorem 2.1 and Theorem 2.2, are different from the necessary conditions for maximizing $P_d - \lambda P_f$ from [3, 4]. To make sure these are noted, we repeat them here. First, our conditions do not include an equation requiring $\lambda$ to be the threshold of the fixed fusion rule, unlike those in [3, 4]. Second, we describe when our results give either necessary or sufficient conditions to solve NP. Absolutely no such description is given in [3, 4]. Further, our Theorems provide the regularity conditions needed for them to be true. There are absolutely no regularity conditions stated in [3, 4], even though some are clearly needed. As a very simple example, we require $D_{1k}(X_k)/D_{0k}(X_k)$ to be a continuous random variable. If this is the case, it is clear that randomization in the sensor rules cannot improve performance, but if this is not the case then randomization in the sensor rules might be needed for optimum performance [1]. Another difference is that our Theorems imply how to use our results. For example, there can be several different $\lambda$ which satisfy our conditions for a given false alarm probability. If this is the case then all solutions, for all the valid $\lambda$, must be tested to find an optimum solution. It is not clear that this is the case in [3, 4]. For our approach, these ideas are illustrated in Section 4, where we demonstrate the first valid algorithm we have seen for finding NP optimum solutions for distributed detection cases with dependent observations.

The discussions in [1, 5] led to a number of publications which stated that they produced counterexamples in which conditions similar to those produced by the combination of Theorem 2.1 and Theorem 2.2 do not work (in particular, we mean the conditions from [3, 4]). Since these conditions were originally produced using an incorrect methodology, we understand the motivation of these authors. However, in checking these counterexamples, we found that the results in Theorem 2.1 and Theorem 2.2 did work. Next we describe our findings in more detail.

First, consider a three-sensor scheme for detecting a Rayleigh fading signal in Gaussian noise for the case where the observations are independent from sensor to sensor [5]. In this case, likelihood ratio tests are optimum at the sensors. Thus performance is determined by the performance of the individual sensors and the fusion rule.

A detailed derivation of the performance of the individual sensors is provided in [10]. We summarize only the main results needed. Let $\eta$ be the common signal-to-noise ratio of each sensor, let $P_{Fi}$ and $P_{Di}$ be the false alarm probability and detection probability at sensor $i$, and let $t_i$ be the threshold at sensor $i$ to which the likelihood ratio is compared. From [10],

$$P_{Fi} = \begin{cases} \big(t_i(1+\eta)\big)^{-1-(1/\eta)} & \text{if } t_i > \frac{1}{1+\eta} \\ 1 & \text{if } t_i \le \frac{1}{1+\eta} \end{cases} \qquad (3.19)$$

The reason that $P_{Fi} = 1$ for any $t_i \le \frac{1}{1+\eta}$ is that the sensor likelihood ratio (see (7-32) on page 203 in [10]) is $1/(1+\eta)$ times a real exponential function with a positive argument, so that the likelihood ratio must be $1/(1+\eta)$ or larger. Further, it is shown in [10] that $P_{Di} = P_{Fi}^{1/(1+\eta)}$ if $t_i > 1/(1+\eta)$. Now assume a fusion rule $u_0 = u_2(u_1 + u_3)$. The authors of [5] first perform a direct optimization for this problem. The key finding is that the optimum solution must use $P_{F2} = P_{D2} = 1$. Then they attempt to use the necessary conditions on the sensor rules from [3, 4] to get the same solution. This leads to

$$t_1 = \lambda\, \frac{P_{F2}(1 - P_{F3})}{P_{D2}(1 - P_{D3})}, \qquad t_2 = \lambda\, \frac{1 - (1 - P_{F1})(1 - P_{F3})}{1 - (1 - P_{D1})(1 - P_{D3})}, \qquad t_3 = \lambda\, \frac{(1 - P_{F1})P_{F2}}{(1 - P_{D1})P_{D2}}, \qquad (3.20)$$

which would be the same conditions obtained from Theorem 2.1 and Theorem 2.2.

The authors then attempt to show that (3.20) gives a solution which does not use $P_{F2} = P_{D2} = 1$, so that this solution is not optimum. However, in the process, the authors use $P_{F2} = (t_2(1+\eta))^{-1-(1/\eta)}$ to eliminate $t_2$ from (3.20). Unfortunately, from (3.19), it is invalid to use $P_{F2} = (t_2(1+\eta))^{-1-(1/\eta)}$ unless it is known that $t_2 > \frac{1}{1+\eta}$, which is equivalent to $P_{F2} < 1$. Since their direct optimization found $P_{F2} = 1$, this is not the case.

To see that the solution from the direct optimization does satisfy (3.20) and (3.19), we consider the special case $\eta = 1$ to simplify matters. First, use a fact found from the direct optimization, namely $P_{F1} = P_{F3} < 1$. Define $\beta = P_{F1} = P_{F3}$. Then from (3.19) we find $\beta = (2t_1)^{-2}$, and so with $P_{F2} = P_{D2} = 1$ and $(1 - P_{F3})/(1 - P_{D3}) = (1 - \beta)/(1 - \sqrt{\beta}) = 1 + \sqrt{\beta}$, the first equation of (3.20) yields

$$\lambda = \frac{1}{2\sqrt{\beta}\,(1 + \sqrt{\beta})}. \qquad (3.21)$$

Using these same results in the other equations in (3.20) yields ($t_1 = t_3$ from (3.19))

$$t_1 = \frac{1}{2\sqrt{\beta}}, \qquad t_2 = \frac{2 - \beta}{2(2 - \beta + \sqrt{\beta})}, \qquad t_3 = \frac{1}{2\sqrt{\beta}}, \qquad (3.22)$$

and from the fusion rule we find $\beta = 1 - (1 - \alpha)^{1/2}$, where $\alpha$ is the false alarm probability. From (3.22), we see that $t_2 \le \frac{1}{2}$, and thus $P_{F2} = P_{D2} = 1$, which agrees with the results given in the direct optimization.

We note, however, that as long as $t_2 \le \frac{1}{2}$, optimum performance will be obtained. This is true even if the second equation in (3.22) is not satisfied. Thus, this example shows that the conditions in Theorem 2.1 and Theorem 2.2 are not always necessary conditions, as they are claimed to be in [3, 4]. This example shows that, exactly as we state in Theorem 2.1 and Theorem 2.2, we can obtain an optimum solution using Theorem 2.1 and Theorem 2.2, but in cases where the pdf of $D_{1k}(X_k)/D_{0k}(X_k)$ under $H_j$ is zero for some positive $D_{1k}(X_k)/D_{0k}(X_k)$ there can be other solutions which appear to be different which are also optimum. Finally, we note that a similar example is discussed in [11] which appears to be based on this example from [5].
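As a quick, machine-checkable restatement of the computation above (our addition; the loop over $\alpha$ values is purely illustrative), the following sketch plugs $\eta = 1$ into (3.19)-(3.22) and confirms that $t_2 \le 1/2$, so that (3.19) indeed forces $P_{F2} = P_{D2} = 1$:

```python
import numpy as np

# A small numerical check (our addition) of (3.19)-(3.22) for eta = 1: for a
# few target overall false alarm probabilities alpha, form beta = P_F1 = P_F3
# and the thresholds from (3.21)-(3.22), then confirm the claims in the text.

eta = 1.0
for alpha in (0.01, 0.1, 0.3):
    beta = 1.0 - np.sqrt(1.0 - alpha)      # from the fusion rule u0 = u2(u1 + u3)
    t1 = t3 = 1.0 / (2.0 * np.sqrt(beta))  # (3.22)
    t2 = (2.0 - beta) / (2.0 * (2.0 - beta + np.sqrt(beta)))
    lam = 1.0 / (2.0 * np.sqrt(beta) * (1.0 + np.sqrt(beta)))  # (3.21)
    # (3.19) at sensors 1 and 3: P_F = (t(1 + eta))^(-1 - 1/eta) recovers beta
    assert abs((t1 * (1 + eta)) ** (-1 - 1 / eta) - beta) < 1e-12
    # t2 <= 1/2 = 1/(1 + eta), so (3.19) gives P_F2 = P_D2 = 1, as claimed
    assert t2 <= 0.5 + 1e-12
    # overall false alarm with P_F2 = 1: 1 - (1 - beta)^2 = alpha
    assert abs(1.0 - (1.0 - beta) ** 2 - alpha) < 1e-12
    print(f"alpha={alpha}: beta={beta:.4f}, t1=t3={t1:.4f}, t2={t2:.4f}, lambda={lam:.4f}")
```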

4 Illustration of the Use of Theorem 2.1 and Theorem 2.2

Using Theorem 2.1 and Theorem 2.2, we can employ a Gauss-Seidel type iterative algorithm to solve a wide range of optimum distributed detection problems under the Neyman-Pearson criterion. Our approach employs the technique used in fixed fusion rule Bayesian optimization of distributed detection schemes [12, 13, 14] with a slight twist. The twist is to find the best $\lambda$ and then to apply the Gauss-Seidel procedures given in [12, 13, 14]. For a specific $\lambda$, we randomly initialize all the sensor rules. The fusion rule is initialized and fixed during the whole procedure. Then we use (2.5) to calculate the $k$th rule and update it, for $k = 1, \ldots, N$. We repeat this procedure until all the sensor rules converge. In general we must do this for all $\lambda$ that give the required false alarm probability and choose the solution that gives the best performance. While this is a very tedious procedure, frequently there is only one single solution for a given value of $\lambda$. If this is the case, then it is easy to show that the false alarm probability is monotonic with respect to $\lambda$. If it is possible to prove beforehand that there is only one single solution for a given value of $\lambda$ (frequently difficult), then instead of searching over all $\lambda$ we can greatly simplify the procedure by using Newton's algorithm in conjunction with (2.5). A sketch of the basic fixed-$\lambda$ iteration appears below.
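The following is a discretized rendering of the procedure just described (our sketch; the grid, the iteration count, the AND fusion rule, and the bivariate Gaussian statistics of the first example below are assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

# A discretized sketch (ours) of the fixed-lambda Gauss-Seidel iteration for
# two sensors with an AND fusion rule and the bivariate Gaussian statistics
# of the first example below.

rho, s1, s2 = 0.2, 1.0, 2.0
grid = np.linspace(-6.0, 6.0, 601)
dx = grid[1] - grid[0]
X1, X2 = np.meshgrid(grid, grid, indexing="ij")
pts = np.dstack((X1, X2))
f0 = multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]]).pdf(pts)
f1 = multivariate_normal([s1, s2], [[1.0, rho], [rho, 1.0]]).pdf(pts)

def iterate(lam, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    g1 = rng.integers(0, 2, grid.size).astype(float)   # random initial sensor rules
    g2 = rng.integers(0, 2, grid.size).astype(float)
    for _ in range(iters):
        # For the AND rule the bracket in (2.1) equals the other sensor's decision,
        # so D_j1(x1) = integral of f_j(x1, x2) * gamma_2(x2) dx2, and similarly
        # for sensor 2 with the roles of x1 and x2 exchanged.
        D01 = (f0 * g2[None, :]).sum(axis=1) * dx
        D11 = (f1 * g2[None, :]).sum(axis=1) * dx
        g1 = (D11 > lam * D01).astype(float)           # update sensor 1 via (2.5)
        D02 = (f0 * g1[:, None]).sum(axis=0) * dx
        D12 = (f1 * g1[:, None]).sum(axis=0) * dx
        g2 = (D12 > lam * D02).astype(float)           # update sensor 2 via (2.5)
    region = np.outer(g1, g2)                          # U0 = 1 iff both sensors say 1
    return (f0 * region).sum() * dx * dx, (f1 * region).sum() * dx * dx  # (Pf, Pd)

for lam in (0.1, 0.3, 0.5):
    print(lam, iterate(lam))
```

Sweeping $\lambda$ and retaining, for each achieved $P_f$, the converged solution with the largest $P_d$ implements the search described above; rerunning with several random initializations can also expose the multiple converged solutions discussed below.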

Next we illustrate these ideas with some examples. As a first example, consider a two-sensor problem with the binary hypotheses

$$H_0: x_1, x_2 \sim N(0, 0, 1, 1, \rho) \qquad \text{versus} \qquad H_1: x_1, x_2 \sim N(s_1, s_2, 1, 1, \rho),$$

where $N(a, b, c, d, e)$ denotes a bivariate Gaussian distribution with mean vector $E[(x_1, x_2)^T] = (a, b)^T$, $\mathrm{Var}(x_1) = c$, $\mathrm{Var}(x_2) = d$, and $E[x_1 x_2] = e\sqrt{cd}$. Assume the fixed fusion rule is the AND rule and consider the case of $s_1 = 1$, $s_2 = 2$ and $\rho = 0.2$. In this case the iterative algorithm produces the overall ROC shown in Fig. 1, which is concave. Also, in this case the false alarm probability is monotonic with respect to the value of $\lambda$, as shown in Fig. 2. The optimum thresholds on the sensor observations are shown in Fig. 3. We see that the optimum sensor decision regions are always single semi-infinite intervals in this case, which tallies with the results from [13], as expected. We have considered similar cases with $\rho = 0$, and our approach produced an optimum solution matching the results of a direct optimization in every case we tried. For example, Fig. 4 shows the optimum sensor observation thresholds for the case where $s_1 = 0.5$, $s_2 = 0.1$.

We note that for a given value of $\lambda$, several different converged solutions may result from the algorithm with different initializations. For instance, if we use the XOR fusion rule in the $s_1 = 1$, $s_2 = 2$ example described above with $\rho = 0.975$, there are at least six converged solutions for $\lambda = 0.5$ (see Fig. 5), most of which do not solve NP. It should be further noted, as illustrated in Fig. 5, that these converged solutions may correspond to different false alarm probabilities, and more than one of them may be a proper optimum solution in the NP sense for some, possibly different, $P_f$ constraints. Clearly, if a given value of $\lambda$ corresponds to more than one optimum solution, for different values of $P_f$, then $P_f$ versus $\lambda$ is no longer a function, let alone a monotone function. Our next example illustrates such a case.

Consider a distributed, 2-sensor, binary hypothesis testing problem where a fixed AND fusion rule is employed. Assume the sensor observations are independent, noise-only observations under hypothesis $H_0$, where the noise follows the generalized Gaussian distribution defined by

$$f_x(x) = \frac{k}{2A(k)\,\Gamma(1/k)}\, e^{-[|x|/A(k)]^k} \qquad (4.23)$$

where

$$A(k) = \sigma \left[ \frac{\Gamma(1/k)}{\Gamma(3/k)} \right]^{1/2} \qquad (4.24)$$

and $\Gamma$ is the gamma function. In this example, $k$ is 1.1 and $\sigma$ is 1. Under the signal-present hypothesis we assume a constant signal 1.3 is added to the noise observations (same distribution as under $H_0$) at each sensor. It is interesting that in this case the ROC curve in Fig. 6 is non-concave; however, our approach for finding optimum sensor detectors still works, as indicated by the analysis in Section 2.
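For reference, here is a direct transcription of (4.23)-(4.24) as code (our sketch; the grid-based sanity checks at the end are our additions):

```python
import numpy as np
from scipy.special import gamma as G

# A transcription of the generalized Gaussian density (4.23)-(4.24);
# k = 1.1 and sigma = 1 as in the example in the text.

def gg_pdf(x, k=1.1, sigma=1.0):
    A = sigma * np.sqrt(G(1.0 / k) / G(3.0 / k))                       # A(k) from (4.24)
    return k / (2.0 * A * G(1.0 / k)) * np.exp(-(np.abs(x) / A) ** k)  # (4.23)

# sanity checks: the density integrates to 1 and has variance sigma^2
x = np.linspace(-40.0, 40.0, 100001)
dx = x[1] - x[0]
print(np.sum(gg_pdf(x)) * dx)         # ~ 1.0
print(np.sum(x**2 * gg_pdf(x)) * dx)  # ~ 1.0
```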

With some manipulation we can see more clearly that the ROC is not concave. The key is to plot $P_d - 0.35\,P_f$ versus $P_f$, as in Fig. 7. Since adding a linear term to $P_d$ does not change its concavity, Fig. 7 clearly illustrates the nonconcavity. This example is a case where attempting to solve NP by maximizing $P_d - \lambda P_f$ is not correct for some $P_f$. This is as explained in [1]; we summarize the discussion from [1]. Consider the local minimum on the ROC with $P_f$ near 0.27. Call this point A. It is clear from Fig. 7 that no $\lambda$ exists such that we can subtract the linear term $(\lambda - 0.35)P_f$ from the curve of $P_d - 0.35\,P_f$ such that point A will be at the maximum point of the resulting curve $P_d - \lambda P_f$ versus $P_f$. This implies that we cannot find the solution to NP for the false alarm probability corresponding to point A by trying to maximize $P_d - \lambda P_f$. To see this, recall that by definition point A on the ROC must correspond to the optimum solution. We note that these same arguments hold for points on the ROC near point A. The problem can also be seen in a different way from Fig. 8. Those solutions in the lower left part of Fig. 8 (the line with less negative slope, which is below the other line for smaller $\lambda$) lie on the ROC, but it is clear they cannot maximize $P_d - \lambda P_f$ for any value of $\lambda$. This is due to the fact that there is more than one solution (each with a different false alarm probability) corresponding to the same value of $\lambda$, as illustrated in Fig. 9. Therefore, with different initializations, the iterative algorithm may converge to different solutions. This is illustrated in Fig. 10, where we plot the thresholds used on the observations at each sensor versus iteration number. Here we see that the procedure converges to three different answers.
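To make the point-A argument easy to check numerically, here is a small utility (our addition; the sampled ROC below is a hypothetical stand-in, not the paper's curve) that marks which points of a sampled ROC attain the maximum of $P_d - \lambda P_f$ for at least one $\lambda$ on a grid; a point inside a non-concave dip, like point A, is never marked:

```python
import numpy as np

# Our illustration: a sampled ROC point can be found by maximizing
# Pd - lambda*Pf only if it attains that maximum for some lambda >= 0;
# points in a non-concave dip (like point A) never do.

def reachable_by_lambda_max(pf, pd, lambdas=np.linspace(0.0, 10.0, 2001)):
    obj = pd[None, :] - lambdas[:, None] * pf[None, :]   # objective, one row per lambda
    best = obj.max(axis=1, keepdims=True)
    return np.any(np.isclose(obj, best), axis=0)         # True if some lambda selects point i

# hypothetical stand-in for a non-concave ROC (not the paper's data);
# the third point sags below the chord joining its neighbors
pf = np.array([0.10, 0.20, 0.27, 0.40, 0.55])
pd = np.array([0.55, 0.80, 0.82, 0.90, 0.92])
print(reachable_by_lambda_max(pf, pd))   # the dipped third point comes out False
```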


5 Extensions

Consider a case where an $n_k$ bit sensor decision is to be made at sensor $k$ ($k = 1, \ldots, N$) and where the multiple-bit sensor decisions from all $N$ sensors will be sent to the fusion center to generate a final decision. In this case the results in Theorem 2.1 and Theorem 2.2 can be shown to still hold with minor modification. In particular, the $l$th bit of the decision at sensor $k$ should be made using the rules from Theorem 2.1 and Theorem 2.2 with $D_{jk}(x_k)$ replaced by

$$D_{jkl}(x_k) = f_{X_k}(x_k|H_j) \sum_{\tilde{u}_{kl}} \Big[ \mathrm{Prob}(U_0 = 1 \,|\, \tilde{U}_{kl} = \tilde{u}_{kl}, U_{kl} = 1) - \mathrm{Prob}(U_0 = 1 \,|\, \tilde{U}_{kl} = \tilde{u}_{kl}, U_{kl} = 0) \Big] \,\mathrm{Prob}(\tilde{U}_{kl} = \tilde{u}_{kl} \,|\, X_k = x_k, H_j), \qquad (5.25)$$

where $\tilde{u}_{kl}$ denotes a specific value of the vector $\tilde{U}_{kl}$ of all the sensor decision bits excluding the $l$th bit at sensor $k$ (compare (2.1)). All of the rules to generate each bit at each sensor should use the same $\lambda$ if the pdf of $D_{1k}(X_k)/D_{0k}(X_k)$ under $H_j$ is greater than zero for all $D_{1k}(X_k)/D_{0k}(X_k) > 0$, $j = 0, 1$. If not, then other solutions can also be optimum. The proof is omitted since it is so similar to those for Theorem 2.1 and Theorem 2.2.

The results from Theorem 2.1 and Theorem 2.2 can also be extended to some other topologies besides the parallel topology we have focused on. Generally, consider an $N$-sensor distributed detection tree network without feedback. The interconnection structure is specified by the communication matrix $R$, as motivated in [11]. The elements of this matrix take binary values and indicate the presence or absence of directed communication links between pairs of sensors. $R$ is of dimension $N \times N$, where $N$ is the total number of sensors in a given system (including any sensors at the global decision maker). We define

the input sensor set of the $k$th sensor as follows:

$$I_k = \{i : R_{ik} = 1\} \qquad (5.26)$$

and the input decision tuple $U_{I_k}$ is $(U_{k_1}, U_{k_2}, \ldots, U_{k_n})$, where $k_1, k_2, \ldots, k_n$ are the elements of $I_k$.

Theorem 2.1 and Theorem 2.2 still hold for the case of tree networks with $D_{jk}(x_k)$ replaced by (binary sensor decisions assumed, and $\{k\}^c = \{1, \ldots, k-1, k+1, \ldots, N\}$)

$$D_{jk}(x_k, u_{I_k}) = \sum_{u_{(I_k \cup \{k\})^c}} \mathrm{Prob}(x_k, U_{\{k\}^c} = u_{\{k\}^c} \,|\, H_j) \Big[ \mathrm{Prob}(U_0 = 1 \,|\, U_{\{k\}^c} = u_{\{k\}^c}, U_k = 1, x_k, H_j) - \mathrm{Prob}(U_0 = 1 \,|\, U_{\{k\}^c} = u_{\{k\}^c}, U_k = 0, x_k, H_j) \Big], \quad \text{for } j = 0, 1 \text{ and } k = 1, \ldots, N, \qquad (5.27)$$

where we note that $D_{jk}(x_k, u_{I_k})$ now depends on the sensor decisions passed to sensor $k$, which are $u_{I_k}$ as just defined. In (5.27), $(I_k \cup \{k\})^c$ denotes the complement of $I_k \cup \{k\}$, and $u_{(I_k \cup \{k\})^c}$ denotes the decisions from all the sensors except the $k$th and those in $I_k$. $\mathrm{Prob}(x_k, U_{\{k\}^c} = u_{\{k\}^c} | H_j)\,dx_k$ is the joint probability that $x_k < X_k \le x_k + dx_k$ and that $U_{\{k\}^c} = u_{\{k\}^c}$ when $H_j$ is true. Further, $\mathrm{Prob}(U_0 = 1 | U_{\{k\}^c} = u_{\{k\}^c}, U_k = 1, x_k, H_j)$ is a conditional probability. The rest of the quantities in (5.27) are defined similarly. It is straightforward to generalize (5.27) to multiple bit sensor decision cases using (5.25). The proof that Theorem 2.1 and Theorem 2.2 still hold follows closely the proofs of Theorem 2.1 and Theorem 2.2, so it is omitted. Similar to the proof of Theorem 2.1 in [2], the key is that the integral of $D_{jk}(x_k, u_{I_k})$ over the decision region is the probability of detection conditioned on $u_{I_k}$, plus a constant term. Thus the quantity in (5.27) will lead to a $k$th sensor decision rule which will maximize the probability of detection for each possible input $u_{I_k}$ when the entire network, except for the $k$th sensor, is fixed. It is easy to show that this implies that this $k$th sensor decision rule will also maximize the unconditional probability of detection when the entire network, except for the $k$th sensor, is fixed.

In most cases, only a certain subset of the sensor decisions influence the $k$th sensor decision directly or indirectly, and (5.27) can be rewritten more explicitly in terms of the actual interconnection structure specified by the communication matrix $R$. Define the dependent matrix $M$, whose element $M_{ik}$ indicates whether or not the decisions at sensor $k$ are influenced, directly or indirectly, by the decision from sensor $i$. It is easy to see that the dependent matrix $M$ can be calculated from the communication matrix $R$ (a sketch is given below). Further, we define the dependent set $D_k$ and the indirectly dependent set $E_k$ at the $k$th sensor as follows:

$$D_k = \{i : M_{ik} = 1\}, \qquad E_k = \{i : M_{ik} = 1 \text{ and } R_{ik} = 0\}. \qquad (5.28)$$
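The following sketch (our reading of this construction; the serial network $R$ is an assumed example) computes $M$ from $R$ by transitive closure and then forms $D_k$ and $E_k$ as in (5.28):

```python
import numpy as np

# Our sketch: M is the transitive closure of R, so M[i][k] = 1 exactly when a
# directed path of communication links leads from sensor i to sensor k, i.e.
# when sensor i's decision influences sensor k directly or indirectly.

def dependent_matrix(R):
    M = np.array(R, dtype=int)
    for _ in range(len(R)):                  # Warshall-style squaring
        M = ((M + M @ M) > 0).astype(int)    # extend known paths by composition
    return M

def D_set(M, k):                             # D_k = {i : M_ik = 1}, sensors 1-indexed
    return {i + 1 for i in range(len(M)) if M[i][k - 1] == 1}

def E_set(M, R, k):                          # E_k = {i : M_ik = 1 and R_ik = 0}
    return {i + 1 for i in range(len(M)) if M[i][k - 1] == 1 and R[i][k - 1] == 0}

# assumed serial network 1 -> 2 -> 3 -> 4: R[i][k] = 1 iff sensor i+1 sends to k+1
R = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
M = dependent_matrix(R)
print(D_set(M, 4), E_set(M, R, 4))   # {1, 2, 3} and {1, 2}: D_N and E_N of the
                                     # serial case discussed below, with N = 4
```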

Using this information, (5.27) becomes

$$D_{jk}(x_k, u_{I_k}) = \sum_{u_{E_k}} \mathrm{Prob}(x_k, U_{D_k} = u_{D_k} \,|\, H_j) \Big[ \mathrm{Prob}(U_0 = 1 \,|\, U_k = 1, x_k, U_{D_k} = u_{D_k}, H_j) - \mathrm{Prob}(U_0 = 1 \,|\, U_k = 0, x_k, U_{D_k} = u_{D_k}, H_j) \Big], \quad \text{for } j = 0, 1 \text{ and all } k. \qquad (5.29)$$

Recall $u_0$ is the overall decision. Thus, for each given topology, we can find the correct form of the function $D_{jk}(x_k, u_{I_k})$ from (5.29). For example, consider the serial topology [11], where all of the sensors are connected in series and each receives direct observations of the common phenomenon. Every sensor but the first one uses the decision transmitted directly

from its upstream neighbor in conjunction with its direct observation to make its decision, which is sent to the next sensor. For simplicity, consider the binary sensor decision case where the decision of the last sensor is accepted as the overall decision. For the last sensor, sensor $N > 2$, we find $D_N = \{1, \ldots, N-1\}$ and $E_N = \{1, \ldots, N-2\}$. Further, Theorem 2.1 and Theorem 2.2 still hold for the case of serial networks with $D_{jk}(x_k, u_{I_k})$ replaced by the appropriate equation from

for sensor 1:
$$D_{j1}(x_1) = \mathrm{Prob}(x_1 | H_j) \Big[ \mathrm{Prob}(U_N = 1 \,|\, U_1 = 1, x_1, H_j) - \mathrm{Prob}(U_N = 1 \,|\, U_1 = 0, x_1, H_j) \Big];$$

for sensor 2:
$$D_{j2}(x_2, u_1) = \mathrm{Prob}(x_2, U_1 = u_1 | H_j) \Big[ \mathrm{Prob}(U_N = 1 \,|\, U_2 = 1, x_2, U_1 = u_1, H_j) - \mathrm{Prob}(U_N = 1 \,|\, U_2 = 0, x_2, U_1 = u_1, H_j) \Big];$$

for sensor $k$, $k = 3, \ldots, N-1$:
$$D_{jk}(x_k, u_{k-1}) = \sum_{u_1, \ldots, u_{k-2}} \mathrm{Prob}(x_k, U_1 = u_1, \ldots, U_{k-1} = u_{k-1} | H_j) \Big[ \mathrm{Prob}(U_N = 1 \,|\, U_k = 1, x_k, U_1 = u_1, \ldots, U_{k-1} = u_{k-1}, H_j) - \mathrm{Prob}(U_N = 1 \,|\, U_k = 0, x_k, U_1 = u_1, \ldots, U_{k-1} = u_{k-1}, H_j) \Big];$$

for sensor $N$:
$$D_{jN}(x_N, u_{N-1}) = \sum_{u_1, \ldots, u_{N-2}} \mathrm{Prob}(x_N, U_1 = u_1, \ldots, U_{N-1} = u_{N-1} | H_j). \qquad (5.30)$$

(For sensor $N$ the overall decision is $U_0 = U_N$, so the bracketed difference in (5.29) is identically one, leaving only the joint probability.)

6 Conclusion

This paper has focused on optimum Neyman-Pearson distributed signal detection. We have presented two key Theorems which we believe clarify the conditions for NP optimum sensor detectors under a fixed fusion rule. The Theorems appear to be similar to some previous results that were obtained using an inappropriate procedure. Our Theorems, however, state requirements under which they provide necessary conditions for NP optimum sensor detectors. Such requirements have been lacking in previous research, and we demonstrate that these requirements are needed by giving a counterexample where previous results do not provide necessary conditions when these requirements are not met. Our focus here was on cases with binary sensor decisions and a parallel architecture, but we explain how to extend our results in both of these regards.

References

[1] J. N. Tsitsiklis, "Decentralized detection," in Advances in Statistical Signal Processing, Vol. 2: Signal Detection, H. V. Poor and J. B. Thomas, Eds. Greenwich, CT: JAI Press, 1990.

[2] R. S. Blum, "Necessary conditions for optimum distributed sensor detectors under the Neyman-Pearson criterion," IEEE Transactions on Information Theory, vol. 42, pp. 990-994, May 1996.

[3] R. Srinivasan, "Distributed radar detection theory," IEE Proceedings, vol. 133, Part F, no. 1, pp. 55-60, Feb. 1986.

[4] R. Srinivasan, "A theory of distributed detection," Signal Processing, vol. 11, no. 1, pp. 319-327, Dec. 1986.

[5] S. C. A. Thomopoulos, R. Viswanathan, and D. Bougoulias, "Optimal distributed decision fusion," IEEE Transactions on Aerospace and Electronic Systems, vol. 25, no. 5, pp. 761-765, Sept. 1989 (see also R. Viswanathan and S. C. A. Thomopoulos, "Distributed data fusion - a singular situation in Lagrange multiplier optimization," TR-SIU-EE-87-4, Department of Electrical Engineering, Southern Illinois University, Carbondale, IL, 1987).

[6] M. Cherikh, "Optimal decision and detection in the decentralized case," Ph.D. dissertation, Department of Operations Research, Case Western Reserve University, May 1989, pp. 117-120.

[7] P. K. Willett and D. J. Warren, "The suboptimality of randomized tests in distributed and quantized detection systems," IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 355-362, March 1992.

[8] D. G. Luenberger, Introduction to Linear and Nonlinear Programming. Reading, MA: Addison-Wesley, 1973.

[9] M. R. Hestenes, Optimization Theory: The Finite Dimensional Case. New York: John Wiley & Sons, 1975.

[10] A. D. Whalen, Detection of Signals in Noise. Orlando, FL: Academic Press, 1971 (see especially Section 7-5 on pages 205-207).

[11] P. K. Varshney, Distributed Detection and Data Fusion. New York: Springer-Verlag, 1997.

[12] Z. B. Tang, K. R. Pattipati, and D. L. Kleinman, "A distributed M-ary hypothesis testing problem with correlated observations," IEEE Transactions on Automatic Control, vol. 37, no. 7, pp. 1042-1046, July 1992.

[13] P. F. Swaszek, P. Willett, and R. S. Blum, "Distributed detection of dependent data - the two sensor problem," in Proceedings of the 29th Annual Conference on Information Sciences and Systems, Princeton University, Princeton, NJ, March 1996, pp. 1077-1082.

[14] Y. Zhu, R. S. Blum, Z.-Q. Luo, and K. M. Wong, "Unexpected properties and optimum distributed sensor detectors for dependent observation cases," to appear in IEEE Transactions on Automatic Control, 2000.

[15] Q. Yan and R. S. Blum, "On some unresolved issues in finding optimum distributed detection schemes," submitted to IEEE Transactions on Signal Processing.

Figure 1: Concave ROC curve ($P_d$ versus $P_f$) of a two-sensor detection scheme for the binary hypothesis testing problem $H_0: x_1, x_2 \sim N(0, 0, 1, 1, 0.2)$ versus $H_1: x_1, x_2 \sim N(1, 2, 1, 1, 0.2)$. The fixed fusion rule is the AND rule.

Figure 2: False alarm probability versus the value of $\lambda$ (monotone in this case) for the same case considered in Fig. 1.

Figure 3: Optimum thresholds on the sensor observations versus $P_f$ for the same case considered in Fig. 1.

Figure 4: Optimum thresholds on the sensor observations versus $P_f$ for the binary hypothesis testing problem $H_0: x_1, x_2 \sim N(0, 0, 1, 1, 0)$ versus $H_1: x_1, x_2 \sim N(0.5, 0.1, 1, 1, 0)$. The fixed fusion rule is the AND rule.

Figure 5: The set of converged solutions for $\lambda = 0.5$ for the binary hypothesis testing problem $H_0: x_1, x_2 \sim N(0, 0, 1, 1, 0.975)$ versus $H_1: x_1, x_2 \sim N(1, 2, 1, 1, 0.975)$. The fixed fusion rule is the XOR rule.

Figure 6: Non-concave ROC curve of a 2-sensor, single observation per sensor, detection problem with independent observations from sensor to sensor. Under $H_0$ the observations follow a zero-mean generalized Gaussian distribution with $k = 1.1$ and $\sigma = 1$. Under $H_1$ the observations follow a similar distribution with a shift in the mean of 1.3.

Figure 7: $(P_d - 0.35\,P_f)$ versus $P_f$ for the same case considered in Fig. 6.

Figure 8: $P_d - \lambda P_f$ versus $\lambda$ for the same case considered in Fig. 6.

Figure 9: $\lambda$ versus $P_f$ for the same case considered in Fig. 6.

Figure 10: Different solutions (thresholds versus iteration number) resulting from different initial values in the Gauss-Seidel iterative procedure for the same case considered in Fig. 6.