Invariant Sequential Detection and Recognition of Targets in Distributed Systems

Alexander G. Tartakovsky*
Center for Applied Mathematical Sciences, University of Southern California, Los Angeles, CA 90089-1113, USA
[email protected]

X. Rong Li†
Department of Electrical Engineering, University of New Orleans, New Orleans, LA 70148, USA
[email protected]

George Yaralov‡
Center for Applied Mathematical Sciences, University of Southern California, Los Angeles, CA 90089-1113, USA
[email protected]

Abstract – The sequential testing of many hypotheses is important for a variety of applications, including detection and recognition of targets by multiple-resolution-element radar and infrared systems. In this paper, we consider a distributed multisensor system in which each sensor tests a finite number of composite hypotheses in a sequential manner. The decisions are then transmitted to a fusion center, which combines them to improve the performance of the system. First, we study the behavior of an invariant sequential test for many composite hypotheses, assuming that the distributions of observations under each hypothesis are not exactly known due to the presence of nuisance parameters. In particular, we show that, under certain general conditions, the proposed sequential test is asymptotically optimal in the sense that it minimizes any positive moment of the stopping time distribution when the error probabilities are small. Then, the general results are applied to the problem of target detection-identification in a multi-sensor, multi-channel system. We consider two statistical models that frequently arise in the target detection area. The first model is related to the detection of a non-fluctuating target in the presence of clutter with unknown mean and variance. The second problem is the detection of a slowly fluctuating target in white Gaussian noise with unknown variance. Finally, we use a non-sequential fusion rule for fusion of local decisions. This fusion rule waits until all the local decisions in all sensors are made and then fuses them. It is optimal in the sense of maximizing the average probability of correct target detection-identification for a given false alarm rate.

Key words: distributed decisions, optimal fusion, sequential detection, invariant sequential tests, multi-sensor, multi-resolution systems.

1 Introduction

In this paper, we continue to study the problem of fusing local decisions made sequentially by multiple sensors that was initiated in [11]. In contrast to [11], where the case of multiple simple hypotheses was considered, in the present paper we assume that each sensor sequentially tests M composite hypotheses, and the M-ary local decisions are then transmitted to a fusion sensor for further processing in order to refine the performance. Specifically, we consider the case, important for a variety of applications, where the distributions of observed data are not completely known due to nuisance parameters. We first construct an invariant multi-decision sequential procedure that guarantees the given constraints on the probabilities of errors. Then, we prove that the proposed invariant sequential test asymptotically minimizes the average sample size, or more generally, any positive moment of the stopping time distribution when the probabilities of errors are allowed to be small. The general results are applied to the problem of target detection in a distributed multi-sensor system where each sensor represents a multi-channel system. Compared to [13], where the problem of detecting targets in multi-channel systems with binary decisions was considered, here we set a more difficult problem of target detection-identification. Finally, we provide computations that show that the fusion of multiple local decisions in a fusion center is beneficial: the overall probability of error can be reduced quite substantially even for a small number of sensors.

* Supported in part by the U.S. ONR grants N00014-99-1-0068 and N00014-95-1-0229.
† Supported in part by ONR via grant N00014-00-1-0677 and by NSF via Grant ECS-9734285.
‡ Supported in part by the U.S. ONR grants N00014-99-1-0068 and N00014-95-1-0229.

2 Problem Outline

Consider a distributed multiple decision problem of testing M composite hypotheses H_0, H_1, ..., H_{M−1} (M ≥ 2) by ℓ sensors with multi-dimensional data y_1(k), ..., y_ℓ(k), observed sequentially in discrete time k = 1, 2, ..., where y_s(k) ∈ R^{N_s} is the information available to the s-th sensor at time moment t_k. The detailed structure of the hypotheses is given below in Section 3.

Each local sensor s makes a local M-ary decision u_s ∈ {0, 1, ..., M−1} based upon the information y_s(k), k = 1, 2, ..., and then transmits its decision to a fusion center. This decision is made at a random point in time ν_s, since the sensor uses a sequential test (see Section 3). The fusion center makes a final M-ary decision, F, based upon the vector of local M-ary decisions u^ℓ = (u_1(ν_1), ..., u_ℓ(ν_ℓ)) received from the local sensors. A final M-ary decision rule (generally randomized) is a probability measure on the space of all local decisions,

    δ(u^ℓ) = (δ_0(u^ℓ), δ_1(u^ℓ), ..., δ_{M−1}(u^ℓ)),   δ_0(u^ℓ) + δ_1(u^ℓ) + ··· + δ_{M−1}(u^ℓ) = 1,

where δ_i(u^ℓ) ∈ [0, 1] is the probability of making the decision in favor of H_i when the vector u^ℓ is observed.
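As a concrete illustration of such a randomized final rule, the sketch below draws an M-ary decision from a probability vector δ(u) attached to the observed vector of local decisions. The majority-vote δ and all identifiers here are our own illustrative choices, not constructions from the paper.

```python
import random

def randomized_fusion_decision(delta, u, rng=random.Random(0)):
    """Draw a final M-ary decision from the probability vector
    delta(u) = (delta_0(u), ..., delta_{M-1}(u)) attached to the
    observed vector of local decisions u."""
    probs = delta(u)
    assert abs(sum(probs) - 1.0) < 1e-9
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# Degenerate (non-randomized) example: majority vote among the sensors.
def majority_delta(u, M=3):
    counts = [u.count(i) for i in range(M)]
    best = counts.index(max(counts))
    return [1.0 if i == best else 0.0 for i in range(M)]

decision = randomized_fusion_decision(majority_delta, (1, 1, 2))
```

A genuinely randomized rule, such as the optimal fusion rule of Section 4, would put nonzero mass on more than one decision only on a boundary event; the sampling loop above covers that case as well.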

3 Sensor M-ary Sequential Decision Procedure

3.1 Preliminaries

Let Y_s = {y_s(1), y_s(2), ...} be the observed sequence of random variables with a probability measure P, which is assumed to belong to a family P. Our goal is to test M composite hypotheses "H_i: P ∈ P_i", i = 0, 1, ..., M−1, where the P_i ⊂ P are given subsets of the set P. Write Y_s^n = (y_s(1), ..., y_s(n)) for the vector of the first n observations available to the s-th sensor. In the rest of this section, we omit the index s indicating the identity of the sensor.

A sequential decision procedure includes a stopping time and a terminal decision to achieve a tradeoff between the average observation time and the quality of the decision, i.e., the number of observations is allowed to be a function of the observations. More precisely, a pair D = (τ, u) is said to be an M-ary sequential test of hypotheses if τ is a Markov stopping time with respect to the sequence {Y^n} (i.e., the event {τ ≤ n} depends on Y^n but does not depend on y(n+1), y(n+2), ...), and u = u(Y^τ) is a terminal decision function with values in the set {0, 1, ..., M−1}. In other words, {u = i} = {τ < ∞, D accepts H_i} is identified with accepting the hypothesis H_i at a finite random time moment τ. If P ∈ P_i and u ≠ i, then

    P(u ≠ i) = Pr(reject H_i | H_i true)

is the probability of wrong rejection of the hypothesis H_i when it is true (the probability of error). Let α_i be positive numbers less than one that characterize the given constraints on the probabilities of errors, and let α = (α_0, α_1, ..., α_{M−1}) denote the vector of corresponding constraints. We will be interested in the tests for which the probabilities of errors do not exceed the prespecified numbers α_i, i.e., P(u ≠ i) ≤ α_i if P ∈ P_i for all i = 0, 1, ..., M−1. Let ∆(α) denote the class of such tests.

In what follows, we restrict ourselves to hypothesis testing problems with prior uncertainty due to nuisance parameters that are not included in the hypotheses to be tested. In other words, if these parameters were known, then the hypotheses would be simple. A typical example is where we wish to test hypotheses about the mean value of a population with unknown variance based on the observations y(k) = µ + ξ(k), where ξ(k), k = 1, 2, ... are independent and identically distributed (i.i.d.) zero-mean random variables with the probability density function (pdf) σ^{−1} f(y/σ). The hypotheses to be tested are "H_i: µ = µ_i, σ > 0", i = 0, 1, ..., M−1, assuming that the variance σ² is unknown but the µ_i and f(y) are given. A natural generalization is testing composite hypotheses µ ∈ Ω_i, where the Ω_i are some regions. The latter, more general problem, however, is outside the scope of this paper and will be considered elsewhere.

In many hypothesis testing problems with nuisance parameters, and even in nonparametric problems, the composite hypotheses can be reduced to simple ones by using the principle of invariance. Suppose that the family of distributions P is invariant under the group of transformations G on the sample space X. If such a group exists and the sets P_0, ..., P_{M−1} form M distinct orbits, then the hypotheses become simple [2, 14].
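The reduction can be illustrated on the scale-group part of this example: the familiar t-type statistic below is unchanged under y → cy, c > 0, and hence is a function of the maximal invariant for the scale group. This is our own illustrative sketch, not the paper's construction.

```python
import math

def t_statistic(y):
    """Student t-type statistic: invariant under the scale group
    y -> c*y (c > 0), hence a function of the maximal invariant."""
    n = len(y)
    mean = sum(y) / n
    s2 = sum((v - mean) ** 2 for v in y) / (n - 1)  # sample variance
    return mean / math.sqrt(s2 / n)

y = [1.2, 0.7, 1.9, 1.1, 0.4]
t_scaled = t_statistic([3.0 * v for v in y])  # rescaling leaves it unchanged
```

By Ferguson's equivalence cited below, any invariant test for this scale-group problem may be expressed as a test based on such a statistic.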
In this case, it is possible to construct invariant multi-hypothesis sequential tests which are asymptotically optimal (under certain conditions) as α_i → 0 among invariant multi-hypothesis tests for which Pr(reject H_i | H_i true) ≤ α_i, i = 0, 1, ..., M−1. To be specific, let M(Y) be a maximal invariant statistic with respect to the group G, i.e., M is a function on X such that (a) M(G(x)) = M(x) for all x ∈ X and G ∈ G; (b) M(x_1) = M(x_2) implies x_1 = G(x_2) for some G ∈ G. It follows from Theorem 5.6.1 of Ferguson [2] that the class of invariant tests and the class of all tests that are functions of the maximal invariant are equivalent. That is, instead of restricting attention to the class of invariant tests, we can restrict attention to the class of all tests that are functions of the maximal invariant M. This means that, when constructing sequential tests, we can operate with the maximal invariant statistic M^n(Y^n) in place of the raw data Y^n. Thus, to find an optimal invariant decision, the observed

data y(k), k = 1, 2, ... are transformed into the maximal invariant M^k, k = 1, 2, ..., in which case the hypotheses become simple, "H_i: P = P_i", where P_i = P_i(M) is the probability measure that corresponds to the maximal invariant when P(Y) ∈ P_i. Since we assumed that the sets P_i form distinct orbits, the measures P_i are also distinct. It is also clear that if we restrict ourselves only to invariant tests, then the class ∆(α) can be defined as

    ∆(α) = {D : P_i(u ≠ i) ≤ α_i, i = 0, 1, ..., M−1}.

Hereafter P_i denotes the probability measure corresponding to the maximal invariant M = {M^n, n ≥ 1} under the hypothesis H_i.

3.2 Construction of the Invariant Sequential Test

Let F_n = σ(M^n) denote the σ-algebra generated by the maximal invariant statistic M^n, and let P_i^n be the restriction of the measure P_i to this σ-algebra. Next, by Z_ij(n) denote

    Z_ij(n) = log (dP_i^n / dP_j^n)(M^n),   i ≠ j,   (1)

the log-likelihood ratio for the hypotheses H_i and H_j built on the maximal invariant statistic M^n(Y^n) (the "most informative" invariant version of the log-likelihood ratio).

Now everything is prepared to give a construction of the invariant sequential test. For any hypothesis H_i, define the Markov rejecting time

    η_i(b_i) = min{ n ≥ 1 : min_{0 ≤ j ≤ M−1, j ≠ i} Z_ij(n) ≤ −b_i },   (2)

where the b_i are positive thresholds. Let η_(0) ≤ η_(1) ≤ ··· ≤ η_(M−2) ≤ η_(M−1) be the time-ordered sequence of the rejecting times η_0(b_0), ..., η_{M−1}(b_{M−1}). The test D* = (ν(b), u(b)) is defined as

    ν(b) = η_(M−2)(b),   u(b) = arg max_{0 ≤ i ≤ M−1} η_i(b_i),   (3)

where we emphasize that ν = ν(b) and u = u(b) depend on the set of thresholds b = (b_0, ..., b_{M−1}). Thus, the observation process is continued up to the rejection of all but one hypothesis, at which instant the remaining hypothesis is accepted. In cases where η_(M−2) = η_(M−1), the decision is made in favor of the hypothesis for which the likelihood is maximal. This test has a structure similar to a test considered in [8, 11]. The major difference is that we now apply an invariant log-likelihood ratio, and hence, the test also becomes invariant. The corresponding invariant test will be called the Invariant Rejecting Multi-Hypothesis Sequential Probability Ratio Test (IRMSPRT).

It is our goal to show that this test is asymptotically optimal among all invariant tests, under certain conditions, in the class ∆(α) when α_i → 0. Let E_i denote the operator of expectation with respect to the measure P_i. Suppose that the thresholds b_i are selected so that D* ∈ ∆(α). The test D* is asymptotically optimal in the class ∆(α) if

    inf_{D ∈ ∆(α)} E_i τ / E_i ν → 1   as max_i α_i → 0.

In cases of parametric uncertainty, the following method for computing the invariant likelihood ratio statistic (1) is useful. Let p(y) be a density of P with respect to some sigma-finite measure. Assume that P_i = {p : p(y) = f_i(y, θ), θ ∈ Θ}, where the f_i are known pdf's and θ is an unknown vector nuisance parameter belonging to a set Θ. Then, the invariant log-likelihood ratio can be represented in the form

    Z_ij(n) = log [ ∫_Θ f_i(Y^n, θ) W(dθ) / ∫_Θ f_j(Y^n, θ) W(dθ) ],   i ≠ j,   (4)

where W(dθ) is the Haar measure [14]. For instance, in the example considered above (testing the mean with an unknown variance), W(dσ) = σ^{−1} dσ on the positive half-line.

The important observation is that even if the initial model is i.i.d., f_i(Y^n, θ) = ∏_{k=1}^{n} f_i(y(k), θ), the invariant log-likelihood ratio of (4) is no longer a sum of i.i.d. random variables. At the same time, most of the results in sequential hypothesis testing, including the original results of Wald and Wolfowitz on the optimality of the SPRT, substantially exploit the random walk structure of log-likelihood ratios. To cover non-i.i.d. models and composite statistical hypotheses with nuisance parameters (invariant sequential tests), we will need a new technique. This technique is based on the strengthening of the Strong Law of Large Numbers into the so-called complete version [1], [3], [5], [6], [9]. In the next subsection, we use this technique to prove that the introduced IRMSPRT is asymptotically optimal in the class ∆(α) when the given constraints on the probabilities of errors α_i are small.
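The rejecting construction (2)-(3) can be sketched in code for any supplied statistic Z_ij(n). This is a schematic sketch of ours; in particular, the maximal-likelihood tie-breaking of the test is only approximated here.

```python
def irmsprt(zfun, M, b, max_n=10**5):
    """Sketch of the rejecting rule (2)-(3): hypothesis i is rejected at
    the first n with min_{j != i} Z_ij(n) <= -b[i]; sampling stops when
    all but one hypothesis have been rejected, and the survivor is accepted."""
    alive = set(range(M))
    eta = {}  # rejecting times eta_i(b_i)
    for n in range(1, max_n + 1):
        for i in list(alive):
            if min(zfun(n, i, j) for j in range(M) if j != i) <= -b[i]:
                eta[i] = n
                alive.discard(i)
        if len(alive) <= 1:
            # survivor (or, on simultaneous rejection, an arbitrary one of
            # the last-rejected hypotheses; the paper breaks such ties by
            # maximal likelihood) is accepted
            u = alive.pop() if alive else max(eta, key=eta.get)
            return n, u
    raise RuntimeError("no decision within max_n samples")

# toy drift model: hypothesis 1 has the largest drift and should be accepted
drifts = [0.0, 1.0, 0.5]
stop_n, accepted = irmsprt(lambda n, i, j: n * (drifts[i] - drifts[j]),
                           3, [2.0, 2.0, 2.0])
```

With the toy linear drifts above, hypothesis 0 is rejected first, then hypothesis 2, and the procedure stops accepting hypothesis 1.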

3.3 Asymptotic Optimality

Recall that P_i denotes the probability measure corresponding to the maximal invariant M = {M^n, n ≥ 1} under the hypothesis H_i, and P_i^n the restriction of this measure to the σ-algebra F_n = σ(M^n) generated by M^n.

We start with the question of how to choose the thresholds b_i in (2) to guarantee the inequalities P_i(u ≠ i) ≤ α_i. The following theorem answers this question. Note that a similar rejecting sequential test in the case of simple hypotheses has been considered in [8]. In that work, however, when deriving upper bounds for the probabilities of errors, by mistake we lost the factor M − 1. In Theorem 1, we correct this mistake.

Theorem 1. Let {y(k), k ≥ 1} be an arbitrary observation process. If

    b_i = log((M−1)/α_i)   for i = 0, 1, ..., M−1,   (5)

then D* ∈ ∆(α).

Proof. Obviously, the event {u(b) ≠ i} = {ν(b) < ∞, D* rejects H_i} implies the event {η_i(b_i) < ∞}, and hence, by the standard change-of-measure argument, for all j ≠ i,

    P_i(u ≠ i) ≤ P_i(η_i < ∞) ≤ Σ_{j≠i} E_j [ 1l{η_i < ∞, Z_ij(η_i) ≤ −b_i} e^{Z_ij(η_i)} ] ≤ (M−1) e^{−b_i},

and the choice (5) makes the right-hand side equal to α_i. □

To formulate the asymptotic optimality result, let D_ij be positive finite constants and define, for ε > 0,

    T_{i,j}(ε) = sup{ n ≥ 1 : |n^{−1} Z_ij(n) − D_ij| > ε },

the last time when the log-likelihood ratio process n^{−1} Z_ij(n) leaves the region (D_ij − ε, D_ij + ε). We always suppose that sup{∅} = 0.

Definition. The process {n^{−1} Z_ij(n)}_{n≥1} is said to converge completely under the measure P_i to the constant D_ij as n → ∞ if E_i[T_{i,j}(ε)] < ∞ for all ε > 0. If E_i[T_{i,j}(ε)]^r < ∞ for all ε > 0 and all positive r, we say that {n^{−1} Z_ij(n)} converges strongly completely to D_ij under P_i. (See also [3, 5, 6, 9] for the definitions of complete convergence and r-quick convergence.)

Theorem 2. Suppose that, for all j ≠ i, E_i[T_{i,j}(ε)] < ∞ for all ε > 0 (i.e., n^{−1} Z_ij(n) converges to D_ij completely under P_i). If b_j = log[(M−1)/α_j], then, as α_max → 0,

    inf_{D ∈ ∆(α)} E_i τ ∼ E_i ν ∼ max_{j≠i} |log α_j| / D_ij.   (9)

Proof. Putting b_j = log[(M−1)/α_j], we obtain the asymptotic upper bound

    E_i ν ≤ max_{j≠i} ( |log α_j| / D_ij ) (1 + o(1))   as α_max → 0,   (10)

where o(1) → 0. On the other hand, Theorem 2.2 from [9] applies here to show that, as α_max → 0,

    inf_{D ∈ ∆(α)} E_i τ ≥ max_{j≠i} ( |log α_j| / D_ij ) (1 + o(1))   (11)

whenever n^{−1} Z_ij(n) → D_ij almost surely under P_i. Combining the inequalities (10) and (11), we obtain the desired asymptotic equalities (9), and the proof is complete. □

Remark. We do not know whether the complete convergence condition in Theorem 2 is necessary or not. This condition, however, cannot be replaced with the almost sure convergence condition, since in general, the latter does not even guarantee finiteness of E_i ν. See a counterexample in [9].

If one is interested in the behavior of higher moments of the stopping time distribution, then the complete convergence condition in Theorem 2 should be strengthened. To be specific, suppose that E_i[T_{i,j}(ε)]^r < ∞ for all positive ε and r, i.e., n^{−1} Z_ij(n) → D_ij strongly completely under P_i as n → ∞. Then, a similar argument applies to show that for all r ≥ 1, as α_max → 0,

    inf_{D ∈ ∆(α)} E_i τ^r ∼ E_i [ν(α)]^r ∼ ( max_{j≠i} |log α_j| / D_ij )^r.   (12)

This implies that the IRMSPRT asymptotically minimizes all the positive moments of the stopping time distribution.

In [11], we already discussed the meaning of the numbers D_ij for the case of simple hypotheses. In invariant problems, the situation is quite similar. The constants D_ij characterize the asymptotic distance between the hypotheses H_i and H_j after the invariant transformation of the observed data. In many cases, D_ij = lim_{n→∞} n^{−1} E_i Z_ij(n).

3.4 Detecting targets in multi-channel systems

Suppose that each sensor represents a multi-channel system (for example, assumed velocity channels in IR/EO systems) with N channels in total, and the data from all channels are observed simultaneously.² In this case, y(k) = (y_{1,k}, ..., y_{N,k}), where the component y_{i,k} corresponds to the observation in the i-th channel at time t_k. The hypothesis H_0 means that there is no target at all, and the hypothesis H_i for i = 1, ..., N that the target is located in the i-th channel. It is necessary to decide whether the target is present or absent and to identify the channel where it is located. Clearly, there are M = N + 1 hypotheses in this problem. The probability P_0(u ≠ 0) is the probability of a false alarm, while P_i(u ≠ i), i = 1, ..., N, represents the sum of the probability of missed detection and the probability of wrong identification of the channel where the target is located. In what follows, the decision u = j that the target is located in the j-th channel, when the hypothesis H_i is true, will be identified with a missed detection. Therefore, the probability P_i(u ≠ i) will be regarded as the total probability of missed detection when the target is located in the i-th channel.

² An interesting problem of target search when not all of the channels can be observed will not be considered in this paper.

Suppose that the constraints on the probabilities of missed detection do not depend on the channel where the target is located, i.e., α_i = α_1 for all i = 1, ..., N. Using (5), we obtain that one needs only two thresholds, b_0 = log(N/α_0) and b_1 = ··· = b_N = log(N/α_1), to guarantee the inequalities P_0(u(b) ≠ 0) ≤ α_0 and P_i(u(b) ≠ i) ≤ α_1 for i = 1, ..., N, where α_0 and α_1 are the given constraints on the false alarm probability and the probabilities of missed detection, respectively.

We now consider two examples that are of particular interest for radar applications.

3.4.1 Detection of a non-fluctuating target in Gaussian noise with unknown mean and variance

Suppose that under H_0,

    y_{i,k} = ξ_{i,k},   i = 1, ..., N,

while under H_i, i = 1, ..., N,

    y_{i,k} = µ_i + ξ_{i,k}   and   y_{j,k} = ξ_{j,k}   for j ≠ i,

where the noise values in the i-th channel, ξ_{i,k}, k = 1, 2, ..., are i.i.d. random variables with a density σ^{−1} f([y − θ]/σ), which is known except for the parameters θ and σ. The mean θ and the variance σ² of the noise are supposed to be unknown. The value of µ_i characterizes the intensity of the signal from the target in the i-th channel. The target detection-identification problem is formalized as testing the N + 1 hypotheses

    "H_0: µ_j/σ = 0 for all j = 1, ..., N";
    "H_i: µ_j/σ = 0 for j ≠ i and µ_i/σ = q_i",   i = 1, ..., N,

where q_1, ..., q_N are given positive numbers that characterize the SNR in the corresponding channel. We will also suppose that the noise components in different channels are mutually independent. This problem is invariant under changes in location and scale, i.e., under the group

    G : G_{c,a}(x) = a + cx,   c > 0, |a| < ∞.
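The two thresholds and the observation model of this subsection can be sketched as follows. The particular nuisance values θ = 0.3 and σ = 1.5 are arbitrary illustrative choices of ours, not values from the paper.

```python
import math
import random

def channel_thresholds(N, alpha0, alpha1):
    """b_0 = log(N/alpha_0) and b_1 = ... = b_N = log(N/alpha_1),
    i.e. (5) specialized to the M = N + 1 channel hypotheses."""
    return math.log(N / alpha0), math.log(N / alpha1)

def snapshot(i, N, mu, theta=0.3, sigma=1.5, rng=random.Random(1)):
    """One observation vector y(k) under H_i (i = 0: no target):
    every channel carries the unknown nuisance mean theta and noise
    scale sigma; channel i additionally carries the signal mu."""
    y = [theta + sigma * rng.gauss(0.0, 1.0) for _ in range(N)]
    if i != 0:
        y[i - 1] += mu
    return y

b0, b1 = channel_thresholds(4, 1e-3, 1e-1)  # N = 4 channels
```

Note that only the ratios µ_i/σ enter the hypotheses, so any simulated performance figure should be reported against q_i = µ_i/σ rather than µ_i alone.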

The invariant log-likelihood ratio can be computed by using formula (4), where θ = (θ, σ) and the Haar measure is W(dθ, dσ) = σ^{−1} dσ dθ for σ > 0 and θ ∈ (−∞, +∞). As a result, we obtain Z_i0(n) = log[V_i(n)/V_0(n)], where

    V_0(n) = ∫_{−∞}^{∞} ∫_0^{∞} σ^{−nN} ∏_{k=1}^{n} ∏_{j=1}^{N} f( y_{j,k}/σ − v ) dσ dv,

    V_i(n) = ∫_{−∞}^{∞} ∫_0^{∞} σ^{−nN} ∏_{k=1}^{n} [ f( y_{i,k}/σ − v − q_i ) ∏_{j=1, j≠i}^{N} f( y_{j,k}/σ − v ) ] dσ dv.
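For small n, the integrals V_0 and V_i can be evaluated by brute-force quadrature. The crude Riemann-sum sketch below (our own illustration, with ad hoc truncation ranges and Gaussian f) is only meant to make the structure of the formulas concrete; it is not a practical implementation.

```python
import math

def gauss_pdf(x):
    # standard normal density; underflows harmlessly to 0.0 for large |x|
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def v_integral(y, q, i, dv=0.1, dsig=0.05):
    """Crude Riemann-sum evaluation of V_i(n) (V_0 when i == 0) for
    Gaussian f; y[j][k] is the k-th observation of channel j + 1.
    The truncation ranges for v and sigma are arbitrary choices."""
    N, n = len(y), len(y[0])
    total = 0.0
    v = -6.0
    while v <= 6.0:
        sig = dsig
        while sig <= 6.0:
            p = sig ** (-n * N)
            for j in range(N):
                shift = q if (j + 1) == i else 0.0
                for k in range(n):
                    p *= gauss_pdf(y[j][k] / sig - v - shift)
            total += p * dv * dsig
            sig += dsig
        v += dv
    return total

y = [[1.0, 1.2, 0.9, 1.1],    # channel 1: roughly one noise unit above
     [0.0, -0.1, 0.1, 0.0]]   # channel 2: around the noise mean
z10 = math.log(v_integral(y, 1.0, 1) / v_integral(y, 1.0, 0))
z20 = math.log(v_integral(y, 1.0, 2) / v_integral(y, 1.0, 0))
```

With a target-like excess in channel 1 and q = 1, the invariant statistic should favor H_1 over H_2, i.e., Z_10(n) > Z_20(n); the Laplace-method approximation derived next avoids this numerical integration altogether.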

Clearly, Z_ij(n) = Z_i0(n) − Z_j0(n) for i, j ≠ 0.

Further, assume that the noise is Gaussian, i.e., f(x) = (2π)^{−1/2} exp(−x²/2). Define

    ȳ_n = (Nn)^{−1} Σ_{k=1}^{n} Σ_{j=1}^{N} y_{j,k},   S_n² = (Nn)^{−1} Σ_{k=1}^{n} Σ_{j=1}^{N} [y_{j,k} − ȳ_n]²,

    ψ(v, z) = −v²/2 + zv + log v,   U_N(z, n) = ∫_0^{∞} v^{−2} exp[ Nn ψ(v, z) ] dv,

    T_i(n) = q_i (Nn)^{−1} Σ_{k=1}^{n} [y_{i,k} − ȳ_n] / S_n,   i ≠ 0,

where T_0(n) = 0. Using the above formula for the invariant log-likelihood ratio, we obtain

    Z_ij(n) = −[(N−1)/(2N)] (q_i² − q_j²) n + log [ U_N(T_i(n), n) / U_N(T_j(n), n) ].   (13)

The log-likelihood ratios Z_ij(n) given by (13) are too complicated for direct use. However, approximations are possible, especially for questions as general as convergence. In particular, a suitable approximation has been obtained in [9]. To be specific, by using a uniform version of the Laplace asymptotic integration method, it was shown in [9] that

    | n^{−1} Z_ij(n) − Ψ_ij(T_i(n), T_j(n)) | ≤ O(1/n),   (14)

where

    Ψ_ij(x, y) = N [φ(x) − φ(y)] − [(N−1)/(2N)] (q_i² − q_j²),
    φ(z) = z ( z + √(4 + z²) ) / 4 + log( z + √(4 + z²) ).

Exploiting (14), it can be verified [9] that the n^{−1} Z_ij(n) converge strongly completely to the constants

    Q_ij = N [φ(Λ_ii) − φ(Λ_ij)] − [(N−1)/(2N)] (q_i² − q_j²),   i, j ≠ 0;
    Q_0i = [(N−1)/(2N)] q_i²,   Q_i0 = N [φ(Λ_ii) − log 2] − [(N−1)/(2N)] q_i²,

where

    Λ_ii = (N−1) q_i² / ( N² √(1 + (N−1) q_i²/N²) ),
    Λ_ij = − q_i q_j / ( N² √(1 + (N−1) q_i²/N²) ).

An analysis of the above relations shows that Q_ij > 0, and hence, the conditions of Theorem 2 are satisfied. Thus, the IRMSPRT is asymptotically optimal. Furthermore, assuming that the SNR is the same for all channels (q_i = q) and noting that min_{j≠i} Q_ij = Q_i0 for i = 1, ..., N, we obtain the following approximations for the average sample size:

    E_i ν ≈ 2N log(N/α_0) / [ (N−1) q² (ρ_N − 1) ],   i = 1, ..., N,
    E_0 ν ≈ 2N log(N/α_1) / [ (N−1) q² ],

where ρ_N = 2N² [φ(Λ) − log 2] / [(N−1) q²] and

    Λ = (N−1) q² / ( N² √(1 + (N−1) q²/N²) ).

3.4.2 Detecting a fluctuating signal in white Gaussian noise with unknown variance

Consider the problem of detecting a signal with slowly fluctuating amplitude and phase (Rayleigh amplitude and uniform phase) in white Gaussian noise with unknown variance. In each channel, the waveform is pre-processed with the use of a matched filter and a square-law detector. The data y_{i,k}, k = 1, 2, ..., are observed at the output of this pre-processing scheme in the i-th channel. In the case considered, if the target is located in the i-th channel, y_{i,k} has an exponential distribution with the parameter 1/[2σ²(1+q)]. If there is no target, the parameter is equal to 1/(2σ²), where σ² and q are the variance of the noise and the SNR at the output of the square-law detector, respectively. It is supposed that the noise samples are mutually independent in different channels. The variance σ² is supposed to be unknown and represents a nuisance parameter. See [12] for more details.

This problem is invariant under the group of scale changes. The maximal invariant statistic is M^n = (X_1^n, ..., X_N^n), where X_i^n = (X_{i,1}, ..., X_{i,n}) and X_{i,k} = y_{i,k}/y_{1,1}.

It can be shown [9, 12] that the log-likelihood ratio for the maximal invariant is

    Z_i0(n) = −n log(1 + q) − nN log[ 1 − q M_{i,n}/(1 + q) ],

where

    M_{i,n} = Σ_{k=1}^{n} y_{i,k} / Σ_{j=1}^{N} Σ_{k=1}^{n} y_{j,k}.

Obviously, Z_ij(n) = Z_i0(n) − Z_j0(n), where Z_00(n) = 0. Also, it follows from [12] that for i = 1, ..., N,

    n^{−1} Z_i0(n) → D_1 = N log(1 + q/N) − log(1 + q)   (P_i-s.c.),
    n^{−1} Z_0i(n) → D_0 = N log( 1 − q/[(1+q)N] ) + log(1 + q)   (P_0-s.c.)

as n → ∞, where we used the abbreviation P_i-s.c. for the strong complete convergence under P_i. It is easily verified that 0 < D_0 < ∞ and 0 < D_1 < ∞ whenever 0 < q < ∞. Applying Theorem 2, we conclude that the IRMSPRT is asymptotically optimal with respect to the expected sample size. Furthermore, it minimizes not only the average sample size but also all the moments of the observation time distribution within the class ∆(α) of invariant tests (see (12)). To be specific, the asymptotic formulas (9) hold true for every positive r with the D_0 and D_1 defined above. For small α_0 and α_1, the average sample sizes are approximately equal to

    E_0 ν ≈ log(N/α_1) / [ N log(1 − q/[(1+q)N]) + log(1 + q) ],
    E_i ν ≈ log(N/α_0) / [ N log(1 + q/N) − log(1 + q) ],   i = 1, ..., N.
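The closed-form statistic Z_i0(n) and the limit D_1 are easy to check by simulation. The sketch below (our own illustration; the seed and the parameter values are arbitrary) simulates H_1 data and verifies that n^{−1} Z_10(n) settles near D_1.

```python
import math
import random

def z_i0(y, i, q):
    """Closed-form invariant LLR Z_i0(n) for the fluctuating-target
    model; y[j][k] are square-law detector outputs, channels j = 0..N-1."""
    n, N = len(y[0]), len(y)
    total = sum(sum(ch) for ch in y)
    m_i = sum(y[i - 1]) / total  # the statistic M_{i,n}
    return -n * math.log(1.0 + q) - n * N * math.log(1.0 - q * m_i / (1.0 + q))

rng = random.Random(7)
q, sigma2, N, n = 1.0, 1.0, 4, 20000
# simulate H_1: channel 1 has mean 2*sigma2*(1+q), the others 2*sigma2
y = [[rng.expovariate(1.0 / (2.0 * sigma2 * (1.0 + q) if j == 0 else 2.0 * sigma2))
      for _ in range(n)] for j in range(N)]
d1 = N * math.log(1.0 + q / N) - math.log(1.0 + q)
rate = z_i0(y, 1, q) / n  # should be close to D_1 for large n
```

The per-sample rate does not depend on the unknown σ², which is exactly the point of the invariant reduction: M_{i,n} is scale-free.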

4 Optimal fusion rule

Once the local decision u_s(ν_s) in the s-th sensor is made (at the random stopping time ν_s defined in (3)), it is transmitted to a fusion center. Recall that u_s is the M-ary decision that takes values in the set {0, 1, ..., M−1}. By F_ℓ = F_ℓ(u^ℓ) we will denote a final decision rule based on the whole vector of decisions u^ℓ = (u_1, ..., u_ℓ) obtained from all sensors, and by L_ij(ℓ) the log-likelihood ratio for the hypotheses H_i and H_j,

    L_ij(ℓ) = log [ P_i(u_1, ..., u_ℓ) / P_j(u_1, ..., u_ℓ) ],

where P_k(u_1, ..., u_ℓ) is the joint probability of the vector u^ℓ under the hypothesis H_k.

In the rest of the paper, we consider the problem of fusion of local decisions from the sensors that perform the multi-channel target detection considered in Section 3.4. Recall that, in this case, H_0 is the hypothesis that there is no target, while H_i is the hypothesis that the target is located in the i-th channel, i = 1, ..., N_s, where N_s is the number of channels in the s-th sensor. Correspondingly, u_s = 0 is the decision that there is no target, and u_s = i for i = 1, ..., N_s is the decision that the target is in the i-th channel. In what follows, we restrict ourselves to the case where N_s = N for all s = 1, ..., ℓ. Thus, M = N + 1 in this case.

Let β_i(F_ℓ) = Pr(F_ℓ rejects H_i | H_i true) denote the probability of rejecting the hypothesis H_i when it is true. The β_0(F_ℓ) is the false alarm probability, and for i = 1, ..., N the probability β_i(F_ℓ) is the sum of the probability of missed detection and the probability of wrong identification of the channel where the target is located. As before, a wrong identification of the channel will be regarded as a missed detection. Therefore, the probability β_i(F_ℓ) will be called the total probability of missed detection when the target is located in the i-th channel. A reasonable problem setting is to build a fusion algorithm that minimizes the average probability of missed detection

    β̄(F_ℓ) = N^{−1} Σ_{i=1}^{N} β_i(F_ℓ)   (15)

in the class of rules {F_ℓ : β_0(F_ℓ) = β̂} for which the false alarm probability β_0(F_ℓ) is fixed at the specified level β̂. In this case, the optimum rule is randomized, i.e., F_ℓ^0 = (δ_0^0(u^ℓ), δ_1^0(u^ℓ), ..., δ_{M−1}^0(u^ℓ)) with Σ_i δ_i^0(u^ℓ) = 1, where δ_i^0(u^ℓ) is the probability of making the decision i when the element u^ℓ is observed. More specifically, the optimal fusion rule has the following form [10]:

    δ_0^0(u^ℓ) = { 1 if Y(ℓ) < C;  1 − γ if Y(ℓ) = C;  0 if Y(ℓ) > C },   (16)

    δ_i^0(u^ℓ) = { 1/m if L_i0(ℓ) = Y(ℓ) > C;  γ/m if L_i0(ℓ) = Y(ℓ) = C;  0 otherwise },   (17)

where Y(ℓ) = max_{k ∈ {1,...,M−1}} L_k0(ℓ), m is the number of subscripts i for which L_i0(ℓ) = Y(ℓ), and the threshold C and the probability γ ∈ [0, 1] are found from the equation

    P_0{Y(ℓ) > C} + γ P_0{Y(ℓ) = C} = β̂.   (18)

For the sake of simplicity, suppose that the local decisions u_1, ..., u_ℓ are independent. Then

    L_ij(ℓ) = Σ_{s=1}^{ℓ} log( α^{(s)}_{i u_s} / α^{(s)}_{j u_s} ),   (19)

where α^{(s)}_{ik} = P_i(u_s = k). Further, consider a somewhat symmetric case, assuming that the probabilities α^{(s)}_{ij} = α_ij do not depend on s and that

    α_kj = α′_1 for k ≠ j, k = 1, ..., N, j = 0, ..., N,
    α_0j = α′_0 for j = 1, ..., N.   (20)

Since α_i = P_i(u_s ≠ i) (see Section 3.1), it follows from (20) that

    α_0 = N α′_0,   α_1 = N α′_1.   (21)

Also, define

    w_0 = log[ (1 − N α′_0) / α′_1 ],   w_1 = log( α′_1 / α′_0 ),
    w_2 = log[ (1 − (N−1) α′_1 − α′_0) / α′_0 ].
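Under the symmetric-case assumptions above, the constants w_0, w_1, w_2 and the fusion statistic Y(ℓ) = max_k L_k0(ℓ) can be computed directly from the counts of the local decisions. The sketch below uses our own identifiers and the paper's 2-channel numerical example.

```python
import math

def weights(N, a0p, a1p):
    """The constants w_0, w_1, w_2 built from the per-decision
    probabilities alpha'_0 = alpha_0/N and alpha'_1 = alpha_1/N."""
    w0 = math.log((1.0 - N * a0p) / a1p)
    w1 = math.log(a1p / a0p)
    w2 = math.log((1.0 - (N - 1) * a1p - a0p) / a0p)
    return w0, w1, w2

def fusion_statistic(u, N, a0p, a1p):
    """Y(l) = max_k L_k0(l), with each L_k0 computed from the counts
    R_j of local decisions equal to j."""
    w0, w1, w2 = weights(N, a0p, a1p)
    R = [sum(1 for d in u if d == j) for j in range(N + 1)]
    return max(w2 * R[k]
               + w1 * sum(R[j] for j in range(1, N + 1) if j != k)
               - w0 * R[0]
               for k in range(1, N + 1))

# the paper's 2-channel example: alpha_0 = 1e-3, alpha_1 = 1e-1, so
# alpha'_0 = 5e-4 and alpha'_1 = 5e-2
w0, w1, w2 = weights(2, 5e-4, 5e-2)
y_val = fusion_statistic((1, 1, 0), 2, 5e-4, 5e-2)
```

Because Y(ℓ) depends on the decision vector only through the counts R_j, its null distribution can be tabulated from the multinomial formula derived next, which is how the thresholds C and randomization probabilities γ of Table 1 are obtained.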

Using (19)–(21), we obtain

    L_k0(ℓ) = w_2 R_k + w_1 Σ_{j=1, j≠k}^{N} R_j − w_0 R_0,   (22)

where R_j is the number of elements among (u_1, ..., u_ℓ) that are equal to j (so that Σ_{j=0}^{N} R_j = ℓ). Assume that α_0 < α_1, in which case w_2 > w_1 > w_0. Since Σ_{j=0}^{N} R_j = ℓ, it follows from (22) that

    Y(ℓ) = (w_2 − w_1) R − (w_1 + w_0) R_0 + w_1 ℓ,

where R = max_{k≠0} R_k. Write r⃗ = (r_0, r_1, ..., r_N) and

    g_0(r⃗) = P_0(R_0 = r_0, R_1 = r_1, ..., R_N = r_N).

Since, under H_i, the vector (R_0, R_1, ..., R_N) has a multinomial distribution with the parameters α_ij, we obtain from (20) and (21) that

    g_0(r⃗) = [ ℓ! / (r_0! r_1! ··· r_N!) ] (1 − N α′_0)^{r_0} ∏_{j=1}^{N} (α′_0)^{r_j}.   (23)

Formula (23) allows us to find the distribution of the statistic Y(ℓ). This distribution is needed to find the optimal threshold C and the probability γ from (18), and to evaluate the performance (the probability of missed detection (15)) of the optimal fusion rule (16)–(17).

It is worth mentioning that in the symmetric case considered, the conditional probability of missed detection β_i(F_ℓ^0) (under the condition that the target is located in the i-th channel) does not depend on i, β_i(F_ℓ^0) = β(F_ℓ^0). Therefore, β̄(F_ℓ^0) = β(F_ℓ^0). Also, in this case, the fusion rule F_ℓ^0 is minimax in the sense that it minimizes max_i β_i(F_ℓ) in the class of rules for which the false alarm probability is β_0 = β̂.

The fusion rule (16)–(17) was studied in a binary case in our previous publications [11, 13]. Below, we give the results of computations in a 2-channel example where M = N + 1 = 3 (three hypotheses), α_0 = 10⁻³ and α_1 = 10⁻¹. Then, it follows from (21) that α′_0 = 5·10⁻⁴ and α′_1 = 5·10⁻². The overall false alarm probability β_0(F_ℓ^0) was fixed at the level β̂ = 10⁻⁴. The average probability of missed detection β̄(F_ℓ^0) defined in (15) is listed in the last row of Table 1. It is seen that the probability of missed detection decreases quite fast.

Table 1: The threshold C, the probability γ, and the probability of missed detection β̄(F_ℓ^0) for different numbers of sensors ℓ

    ℓ     2           3           4           5
    C     4.55        1.56        −1.44       −4.43
    γ     0.0495      0.0324      0.0236      0.0181
    β̄    1.73·10⁻²   2.11·10⁻³   2.81·10⁻⁴   3.97·10⁻⁵

References

[1] V.P. Dragalin, A.G. Tartakovsky, and V. Veeravalli, "Multihypothesis Sequential Probability Ratio Tests – Part I: Asymptotic Optimality," IEEE Trans. Inform. Theory, 45(7): 2448–2461, 1999.
[2] T.S. Ferguson, Mathematical Statistics: A Decision Theoretic Approach, Academic Press, New York, London, 1967.
[3] P.L. Hsu and H. Robbins, "Complete Convergence and the Law of Large Numbers," Proc. Nat. Acad. Sci. U.S.A., 33: 25–31, 1947.
[4] L.A. Klein, Sensor and Data Fusion Concepts and Applications, SPIE Press, 1993.
[5] T.L. Lai, "On r-Quick Convergence and a Conjecture of Strassen," Ann. Probability, 4: 612–627, 1976.
[6] T.L. Lai, "Asymptotic Optimality of Invariant Sequential Probability Ratio Tests," Ann. Statist., 9: 318–333, 1981.
[7] E.L. Lehmann, Testing Statistical Hypotheses, John Wiley & Sons, New York, 1986.
[8] A.G. Tartakovsky, "Asymptotically Optimal Sequential Tests for Nonhomogeneous Processes," Sequential Analysis, 17: 33–62, 1998.
[9] A.G. Tartakovsky, "Asymptotic Optimality of Certain Multihypothesis Sequential Tests: Non-i.i.d. Case," Statistical Inference for Stochastic Processes, 1(3): 265–295, 1998.
[10] A.G. Tartakovsky, "Minimax Invariant Regret Solution to the N-Sample Slippage Problem," Mathematical Methods of Statistics, 6(4): 491–508, 1997.
[11] A.G. Tartakovsky and X.-R. Li, "Sequential Testing of Multiple Hypotheses in Distributed Systems," Proceedings of the 3rd International Conference on Information Fusion, Paris, France, July 2000, Volume II.
[12] A. Tartakovsky, "Quickest Detection of Targets in Multiple-Element Resolution Systems: Sequential Detection Versus Non-sequential Detection," Proceedings of the Workshop on Estimation, Tracking and Fusion: A Tribute to Yaakov Bar-Shalom, Monterey, CA, pp. 326–350, May 2001.
[13] A. Tartakovsky, G. Yaralov, and X.-R. Li, "Sequential Detection of Targets in Distributed Systems," SPIE Proceedings: Signal Processing, Sensor Fusion, and Target Recognition (Aerosense 2001), vol. 4380, Orlando, FL, 16–20 April 2001.
[14] S. Zacks, The Theory of Statistical Inference, Wiley, New York, 1971.
