Aerosense Conference, SPIE Proceedings: Signal Processing, Sensor Fusion, and Target Recognition, Vol. 4380, Orlando, FL, 16–20 April 2001.
Sequential Detection of Targets in Distributed Systems

Alexander Tartakovsky^a, X. Rong Li^b, and George Yaralov^a

^a Center for Applied Mathematical Sciences, University of Southern California, 1042 Downey Way, DRB-155, Los Angeles, CA 90089-1113, USA
^b Department of Electrical Engineering, University of New Orleans, New Orleans, LA, USA

ABSTRACT
It is supposed that there is a multisensor system in which each sensor performs sequential detection of a target. The binary decisions on target presence or absence are then transmitted to a fusion center, which combines them to improve the performance of the system. We assume that the sensors are multichannel systems, each possibly with a different number of channels. Sequential detection of a target in each sensor is performed by a generalized Wald sequential probability ratio test, which is based on the maximum likelihood ratio statistic and allows one to fix the false alarm rate and the rate of missed detections at specified levels. We first show that this sequential detection procedure is asymptotically optimal for general statistical models in the sense of minimizing the expected sample size when the probabilities of errors are small. We then construct the optimal non-sequential fusion rule that waits until all the local decisions in all sensors are made and then fuses them; it is optimal in the sense of maximizing the probability of target detection for a fixed probability of false alarm. An analysis shows that the final decision can be made substantially more reliable even for a small number of sensors (3-5). The performance of the system is illustrated by the example of detecting a deterministic signal in correlated (colored) Gaussian noise. In this example, we provide both the results of a theoretical analysis and the results of Monte Carlo experiments. These results allow us to conclude that the use of the sequential detection algorithm substantially reduces the required resources of the system compared to the best non-sequential algorithm.

Keywords: sequential detection, distributed decisions, optimal fusion.
Send correspondence to Alexander Tartakovsky. E-mail: [email protected]; telephone: 213-740-2450; fax: 213-740-2424; WWW: http://www.usc.edu/dept/LAS/CAMS/usr/facmemb/tartakov

1. INTRODUCTION

Most of the research on fusion of data from multiple sensors has been done in a non-sequential setting [2,3,8,10,12,17,18,22,30,32,34], where the differences among sensor decision times and their difference from the fusion time are ignored. In many practical systems, however, sensor decisions are made sequentially at random times, depending on the data received by the sensors; the problem of detecting a target in multichannel systems is a good example. It is thus important to consider sensor decisions and their fusion in a sequential setting where the sensor decision rules are sequential in nature.

There are only a few works in which a data fusion problem has been considered from the point of view of sequential analysis [29,33,40]. For example, in [33] the problem of binary hypothesis testing was considered for a system with full feedback and local memory restricted to past decisions. It was supposed in [33] that the fusion center performs a sequential test based on the information it receives from the sensors; it was also supposed that the sensor observations were independent and identically distributed (i.i.d.). Paper [40] presents an algorithm for designing a set of sequential sensor decision rules that aims to enhance the performance of the sequential decision at the fusion center. In [29], we considered the problem of fusing local decisions made sequentially by multiple sensors, assuming that each sensor sequentially tests M hypotheses and the M-ary local decisions are transmitted to a fusion center, one by one, in the order they are made. In that paper, we built a rejecting multihypothesis sequential test which is asymptotically optimal for general, non-i.i.d. models when the probabilities of erroneous decisions are small.

In the present paper, we continue to study the problem of fusing decisions in distributed systems, assuming that each sensor represents a multichannel system. A target may appear in one of the channels and should be detected as soon as possible while controlling the rates of false alarms and missed detections. Local decisions on target presence or absence are made sequentially by a generalized Wald test, which compares the maximum likelihood ratio statistic (over all channels) with two thresholds. The thresholds are chosen to guarantee specified rates of false alarms and missed detections. A fusion center combines these local binary decisions to further test the hypotheses; as a result, the performance is enhanced. We do not assume that the observations are i.i.d.; on the contrary, the observations may be correlated and non-stationary. The proposed sequential test turns out to be asymptotically optimal for very general statistical models when the false alarm and missed detection rates are low.

The general results are illustrated by a particular example of detecting a deterministic signal in colored Gaussian noise. In this example, we not only confirm that the asymptotic theory can be applied for preliminary engineering estimates, but we also design thresholds that guarantee the given levels of false alarms and missed detections with very high accuracy. To this end, we estimate the overshoots of the decision statistics over the thresholds using results from nonlinear renewal theory [20,35].

The paper is organized as follows. In Section 2, we formulate the problem and provide basic definitions and notation.
In Section 3, we describe the sequential detection algorithm and show that it is optimal in an asymptotic setting where the rates of false alarms and missed detections are low. We also present a comprehensive analysis (theoretical and Monte Carlo) of an example of detecting a deterministic signal in correlated Gaussian noise, and show that sequential detection substantially reduces the overall time needed to reach the final decision compared to the case where the decisions are made non-sequentially. Finally, in Section 4, we discuss the algorithm for fusing the local decisions transmitted from the local sensors to the fusion center.
2. PROBLEM FORMULATION

We will be interested in a distributed binary decision problem of testing two hypotheses related to the absence or presence of a target by $\ell$ sensors with multi-dimensional data $(X_n^{(1)}, \dots, X_n^{(\ell)})$ observed sequentially in discrete time $n = 1, 2, \dots$, where $X_n^{(s)} \in \mathbb{R}^{N_s}$ is the information available to the $s$-th sensor at time moment $t_n$. In what follows, we will suppose that the $s$-th sensor represents a multichannel system with $N_s$ channels (e.g., Doppler, angle, and range channels in radars or assumed velocity channels in IR/EO systems). In this case, $X_n^{(s)} = (X_{1,n}^{(s)}, \dots, X_{N_s,n}^{(s)})$ is the vector of dimensionality $N_s$, where the $i$-th component $X_{i,n}^{(s)}$ is the observation available in the $i$-th channel at time $t_n$. Write
$$X^{t,(s)} = (X_1^{(s)}, \dots, X_t^{(s)})$$
for the concatenation of observations up to time $t$.

It is assumed that a target is either present in one of the channels or absent in all channels. The decision on target absence or presence must be made as soon as possible, controlling the rates of false alarms and missed detections. Let $p_0^{(s)}(x)$ be the probability density of $X^{t,(s)}$ when the target is absent and $p_i^{(s)}(x)$ the probability density when it is located in the $i$-th channel. The problem of detecting the target in the $s$-th sensor can be formulated as the problem of testing two hypotheses, "$H_0^{(s)}: p^{(s)} = p_0^{(s)}$" (the target is absent) and "$\bar{H}_1^{(s)}: p^{(s)} \in \cup_{i=1}^{N_s} p_i^{(s)}$" (the target is present in one of the channels; it does not matter in which one). Note that the hypothesis $\bar{H}_1^{(s)}$ is composite even when the densities $p_i^{(s)}$ are completely specified.

Each local sensor $s$ makes a local binary decision $d_s \in \{0, 1\}$ based upon the information $X_n^{(s)}$, $n = 1, 2, \dots$, available to it and then transmits its decision to a fusion center. Since the sensor uses a sequential test (see Section 3), this decision is made at a random point in time $\tau_s$, i.e. $d_s = d_s(\tau_s)$. Finally, the fusion center makes a final binary decision $D$ based upon the messages received from the local sensors. The fusion center can make this decision either based on the complete information that comes from all $\ell$ sensors (a non-sequential fusion rule) or sequentially, allowing it to stop before the last decision is made and transmitted. In the following, however, we will consider only non-sequential fusion rules.

In the non-sequential setting, a final decision rule $D_\ell$ (generally randomized) is a probability measure $\Psi_\ell(u_\ell)$ on the space of all local decisions, where $u_\ell = (d_1(\tau_1), \dots, d_\ell(\tau_\ell))$. To be specific, $\Psi_\ell = \Psi(u_\ell) \in [0, 1]$ is the probability of deciding that the target is present when the vector $u_\ell$ is observed; $1 - \Psi(u_\ell)$ is then the probability of deciding that the target is absent.
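To make the definition concrete, a simple non-randomized instance of $\Psi_\ell$ is a $k$-out-of-$\ell$ counting rule; the sketch below is purely illustrative (the threshold $k$ and the function name are this sketch's choices, not something prescribed by the paper), and the optimal rule discussed in Section 4 would generally differ:

```python
def counting_fusion(u, k):
    """One concrete (non-randomized) fusion rule Psi(u): given the vector
    u = (d_1(tau_1), ..., d_l(tau_l)) of binary local decisions, return
    the probability of deciding 'target present'.  Here Psi is degenerate:
    1.0 if at least k sensors reported a target, else 0.0.  A randomized
    rule could return any value in [0, 1]."""
    return 1.0 if sum(u) >= k else 0.0
```

For example, with three sensors and $k = 2$, the decision vector $(1, 0, 1)$ yields a final decision that the target is present.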
3. SENSOR DETECTION ALGORITHM

A sequential detection procedure includes a stopping time and a terminal decision, achieving a tradeoff between the average observation time and the quality of the decision. Problems of sequential testing of two or more hypotheses under various conditions have been studied for decades (see, e.g., Armitage [1], Chernoff [4], Dragalin [5], Dragalin, Tartakovsky, and Veeravalli [6,7], Lai [14], Lorden [16], Pavlov [19], Sosulin and Fishman [21], Tartakovsky [23-26]). Most of the results have been obtained for i.i.d. observation models, for which the log-likelihood ratio processes are random walks; see, however, [6,14,23-26] for generalizations to non-i.i.d. models.

Below, we study the behavior of a sequential detection algorithm based on maximum likelihood arguments. In particular, we show that this algorithm has asymptotically optimal properties for a large class of statistical models that are not confined to the restrictive i.i.d. assumption. This is important in a variety of detection problems where the observations are correlated and/or non-stationary. For brevity, in this section we omit the index $s$ indicating the identity of the sensor; for instance, we write $X_n$ instead of $X_n^{(s)}$.
3.1. Generalized Wald's sequential test

A sequential detection procedure (or, more generally, a sequential test of two hypotheses) is a pair $\delta = (\tau, d)$, where $\tau$ is a Markov stopping time with respect to $\{X_n\}_{n \ge 1}$, i.e. the event $\{\tau \le n\}$ depends only on $X^n$ and not on $X_k$, $k > n$, and $d = d(X^\tau)$ is a terminal decision function taking the two values 0 and 1. Therefore, $\{d = 0\} = \{\tau < \infty, \ \delta \text{ accepts } H_0\}$ and $\{d = 1\} = \{\tau < \infty, \ \delta \text{ accepts } \bar{H}_1\}$ are the decisions in favor of the hypotheses $H_0$ and $\bar{H}_1$, respectively, made at the random time $\tau$, which depends on the observations. In what follows, we use $P_i$, $i = 0, 1, \dots, N$, to denote the probability measures corresponding to the probability densities $p_i$ introduced above, and $E_i$ denotes expectation with respect to $P_i$. In other words, $P_0$ corresponds to the distribution of the observations when there is no target, and $P_i$ to the distribution when the target is located in the $i$-th channel.

It is convenient to introduce a fictitious parameter $\lambda$ taking values in the set $\{0, 1, \dots, N\}$ and to parameterize the probability density function as $p = p_\lambda$, $\lambda \in \{0, 1, \dots, N\}$. If $\lambda = i$, the target is located in the $i$-th channel; if $\lambda = 0$, there is no target at all. Obviously, in terms of the parameter $\lambda$, the hypotheses to be tested are reformulated as "$H_0: \lambda = 0$" and "$\bar{H}_1: \lambda \in \{1, 2, \dots, N\}$". The latter hypothesis can also be regarded as a union of the simple hypotheses $H_1, \dots, H_N$, i.e. $\bar{H}_1 = \cup_{i=1}^N H_i$, where "$H_i: \lambda = i$" is the hypothesis that the target is located in the $i$-th channel.

We now present the construction of the sequential detection procedure that will be called the generalized Wald test (GWT). Let
$$Z_i(n) = \log \frac{p_i(X^n)}{p_0(X^n)}, \qquad i = 1, \dots, N,$$
denote the log-likelihood ratio (LLR) for the hypotheses $H_i: \lambda = i$ and $H_0: \lambda = 0$ based on the data $X^n$ observed up to time moment $t_n$. Further, let $a_0$ and $a_1$ be two positive numbers (thresholds).
The GWT $\delta^* = (\tau^*, d^*)$ is defined as
$$\tau^* = \min\Big\{ n \ge 1 : \max_{1\le i\le N} Z_i(n) \notin (-a_0, a_1) \Big\}, \qquad d^* = \begin{cases} 1 & \text{if } \max_{1\le i\le N} Z_i(\tau^*) \ge a_1, \\ 0 & \text{if } \max_{1\le i\le N} Z_i(\tau^*) \le -a_0. \end{cases} \qquad (3.1)$$
Thus, the observation process is continued as long as the maximum LLR statistic stays between the thresholds $-a_0$ and $a_1$. We stop and make a final decision at time $\tau^*$, the first time $n$ at which the statistic $\max_{1\le i\le N} Z_i(n)$ leaves the region $(-a_0, a_1)$; by convention, $\tau^* = \infty$ if there is no such $n$. The decision $d^* = 1$ (target present) is made as soon as $\max_{1\le i\le N} Z_i(n)$ exceeds the upper threshold $a_1$, while the decision $d^* = 0$ (target absent) is made as soon as it falls below the lower threshold $-a_0$.
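As a minimal sketch, the rule (3.1) can be implemented as follows; how the cumulative LLRs $Z_i(n)$ are computed depends on the statistical model (see Section 3.5), so here they are simply supplied by the caller:

```python
def gwt(llr_rows, a0, a1):
    """Generalized Wald test (3.1).  llr_rows yields, for n = 1, 2, ...,
    the list [Z_1(n), ..., Z_N(n)] of cumulative channel LLRs.  Returns
    (tau, d): the stopping time and the terminal decision d in {0, 1},
    or (None, None) if the statistic never leaves (-a0, a1)."""
    for n, z in enumerate(llr_rows, start=1):
        m = max(z)                 # max_{1 <= i <= N} Z_i(n)
        if m >= a1:
            return n, 1            # target present
        if m <= -a0:
            return n, 0            # target absent
    return None, None
```

For example, with $a_0 = a_1 = 2$ the LLR trajectory $[[0.5, -0.2], [1.2, -0.5], [2.5, -1.0]]$ stops at $n = 3$ with $d = 1$.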
3.2. Upper bounds for the probabilities of errors and choice of thresholds

We will be interested in detection procedures that confine the false alarm probability $P_0(d = 1) \le P_{FA}$ and the probabilities of missed detection when the target is located in the $i$-th channel, $P_i(d = 0) \le P_{MS}$, $i = 1, \dots, N$, to the given levels $P_{FA}$ and $P_{MS}$, respectively. The class of such detection procedures will be denoted by $\Delta(P_{FA}, P_{MS})$. To be more specific,
$$\Delta(P_{FA}, P_{MS}) = \Big\{ \delta : P_{FA}(\delta) \le P_{FA}, \ \max_{1\le i\le N} P_{MS}^i(\delta) \le P_{MS} \Big\}, \qquad (3.2)$$
where we use the notation $P_{FA}(\delta) = P_0(d = 1)$ and $P_{MS}^i(\delta) = P_i(d = 0)$ for the false alarm probability and the probability of missed detection of the procedure $\delta$. Evidently, any detection procedure from the class (3.2) also guarantees that the average probability of missed detection $\sum_{i=1}^N \pi_i P_{MS}^i(\delta)$ does not exceed $P_{MS}$, where $\pi_i = \Pr(\lambda = i \mid \bar{H}_1)$ is the prior probability that the target appears in the $i$-th channel conditioned on $\bar{H}_1$, i.e. when it is known for sure that the target is present ($\pi_1 + \cdots + \pi_N = 1$).
We now show that the following upper bounds for the probabilities of errors hold:
$$\max_{1\le i\le N} P_{MS}^i(\delta^*) \le e^{-a_0} \quad\text{and}\quad P_{FA}(\delta^*) \le N e^{-a_1}, \qquad (3.3)$$
regardless of any assumptions on the structure of the observed process $X^n$, $n \ge 1$. Using these inequalities, we immediately obtain that
$$a_0 = \log(1/P_{MS}) \quad\text{and}\quad a_1 = \log(N/P_{FA}) \qquad \text{imply} \qquad \delta^* \in \Delta(P_{FA}, P_{MS}). \qquad (3.4)$$
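In code, the threshold choice (3.4) is immediate; a small sketch (the function name is ours):

```python
import math

def gwt_thresholds(p_fa, p_ms, n_channels):
    """Thresholds (3.4): a0 = log(1/P_MS) and a1 = log(N/P_FA).
    By the bounds (3.3), these choices guarantee that the GWT belongs
    to the class Delta(P_FA, P_MS)."""
    return math.log(1.0 / p_ms), math.log(n_channels / p_fa)
```

For instance, with $P_{FA} = P_{MS} = 10^{-2}$ and $N = 10$ channels this gives $a_0 = \log 100 \approx 4.61$ and $a_1 = \log 1000 \approx 6.91$.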
To prove the first inequality in (3.3), observe that $Z_i(\tau^*) \le -a_0$ for all $i = 1, \dots, N$ on the set $\{d^* = 0\}$. Therefore, denoting the indicator of an event $A$ by $\mathbf{1}_A$,
$$P_{MS}^i(\delta^*) = E_i \mathbf{1}_{\{d^*=0\}} = E_0\big[e^{Z_i(\tau^*)} \mathbf{1}_{\{d^*=0\}}\big] \le e^{-a_0} P_0(d^* = 0) = e^{-a_0}\big(1 - P_{FA}(\delta^*)\big) \le e^{-a_0},$$
which implies that $P_{MS}^i(\delta^*) \le P_{MS}$ if $a_0 = \log(1/P_{MS})$.
The second upper bound in (3.3) follows from the following chain of equalities and inequalities: for any $k \ne 0$,
$$P_{FA}(\delta^*) = E_0 \mathbf{1}_{\{d^*=1\}} = E_k\big[e^{-Z_k(\tau^*)} \mathbf{1}_{\{d^*=1\}}\big] \le e^{-a_1} E_k\Big[\max_{1\le i\le N} e^{Z_i(\tau^*) - Z_k(\tau^*)} \mathbf{1}_{\{d^*=1\}}\Big]$$
$$\le e^{-a_1} E_k\Big[\sum_{i=1}^N \frac{p_i(X^{\tau^*})}{p_k(X^{\tau^*})} \mathbf{1}_{\{d^*=1\}}\Big] = e^{-a_1} \sum_{i=1}^N P_i(d^* = 1) = e^{-a_1} \sum_{i=1}^N \big[1 - P_{MS}^i(\delta^*)\big] \le N e^{-a_1},$$
where we used the obvious relations
$$\{d^* = 1\} = \Big\{ \max_{1\le i\le N} Z_i(\tau^*) \ge a_1 \Big\} \quad\Longrightarrow\quad e^{-Z_k(\tau^*)} \le \exp\Big\{ \max_{1\le i\le N} Z_i(\tau^*) - Z_k(\tau^*) \Big\}\, e^{-a_1};$$
$$\exp\Big\{ \max_{1\le i\le N} Z_i(\tau^*) - Z_k(\tau^*) \Big\} = \max_{1\le i\le N} \frac{p_i(X^{\tau^*})}{p_k(X^{\tau^*})} \le \sum_{i=1}^N \frac{p_i(X^{\tau^*})}{p_k(X^{\tau^*})}.$$
It follows that $P_{FA}(\delta^*) \le P_{FA}$ if $a_1 = \log(N/P_{FA})$.
3.3. Asymptotic performance and optimality of the GWT: general, non-i.i.d. case

We now show that the sequential detection procedure of (3.1) and (3.4) asymptotically minimizes (as $P_{FA}$ and $P_{MS}$ become small) the expected sample sizes $E_i\tau$ for all $i = 0, 1, \dots, N$ among all detection procedures in the class $\Delta(P_{FA}, P_{MS})$, under mild conditions that do not confine one to the i.i.d. assumption.

Note that so far we have not imposed any constraints on the observation processes. In fact, the upper bounds for the probabilities of errors hold whenever the probability measures $P_0, P_1, \dots, P_N$ are mutually locally absolutely continuous. However, to study the behavior of the expected sample size (or, more generally, positive moments of the stopping time), some conditions must be imposed. For the sake of simplicity, assume that the vectors $X_i^n$ and $X_j^n$ are statistically independent (the channels are mutually independent), in which case
$$p_i(X_1^n, \dots, X_N^n) = p_i(X_i^n) \prod_{\substack{k=1 \\ k \ne i}}^N p_0(X_k^n) \quad\text{for } i = 1, \dots, N, \qquad p_0(X_1^n, \dots, X_N^n) = \prod_{k=1}^N p_0(X_k^n),$$
and hence
$$Z_i(n) = \log \frac{p_i(X_1^n, \dots, X_N^n)}{p_0(X_1^n, \dots, X_N^n)} = \log \frac{p_i(X_i^n)}{p_0(X_i^n)}.$$
Therefore, under this assumption, which holds in many applications, the LLR $Z_i(n)$ depends on the observation process $X^n = (X_1^n, \dots, X_N^n)$ only through the component $X_i^n$.

Further, assume that the LLR processes $Z_i(n)$ obey the strong law of large numbers,
$$\frac{1}{n} Z_i(n) \xrightarrow[n\to\infty]{P_i\text{-a.s.}} D_{i,0}, \qquad \frac{1}{n} Z_i(n) \xrightarrow[n\to\infty]{P_0\text{-a.s.}} -D_{0,i} \qquad \text{for } i = 1, \dots, N, \qquad (3.5)$$
where, as usual, the abbreviation $P_i$-a.s. stands for almost sure convergence under the measure $P_i$, and $D_{i,0}$ and $D_{0,i}$ are positive finite numbers. Note right away that in the i.i.d. case the numbers $D_{i,0}$ and $D_{0,i}$ are nothing but the Kullback-Leibler information distances, $D_{i,0} = E_i Z_i(1)$ and $D_{0,i} = E_0[-Z_i(1)]$.

Recall that "$H_i: \lambda = i$" denotes the hypothesis that the target is located in the $i$-th channel. Let $\Delta_i(P_{FA}, P_{MS}) = \{\delta : P_{FA}(\delta) \le P_{FA}, \ P_{MS}^i(\delta) \le P_{MS}\}$ be the class of one-channel detection procedures for testing the hypothesis $H_i$ against $H_0$ based on the data observed in the $i$-th channel, with the corresponding error probability constraints. It follows from [7,14,26] that under condition (3.5)
$$\inf_{\delta \in \Delta_i(P_{FA}, P_{MS})} E_i \tau \ge \frac{\log(1/P_{FA})}{D_{i,0}} (1 + o(1)) \quad\text{and}\quad \inf_{\delta \in \Delta_i(P_{FA}, P_{MS})} E_0 \tau \ge \frac{\log(1/P_{MS})}{D_{0,i}} (1 + o(1)) \qquad \text{as } P_{\max} \to 0, \qquad (3.6)$$
where $P_{\max} = \max(P_{FA}, P_{MS})$ and $o(1) \to 0$ as $P_{\max} \to 0$. Also, it is clear that
$$E_i \tau^* \ge \inf_{\delta \in \Delta(P_{FA}, P_{MS})} E_i \tau \ge \inf_{\delta \in \Delta_i(P_{FA}, P_{MS})} E_i \tau \quad \text{for every } i = 1, \dots, N;$$
$$E_0 \tau^* \ge \inf_{\delta \in \Delta(P_{FA}, P_{MS})} E_0 \tau \ge \inf_{\delta \in \Delta_i(P_{FA}, P_{MS})} E_0 \tau \quad \text{for all } i = 1, \dots, N, \qquad (3.7)$$
which along with the inequalities (3.6) yield the following asymptotic lower bounds as $P_{\max} \to 0$:
$$E_i \tau^* \ge \inf_{\delta \in \Delta(P_{FA}, P_{MS})} E_i \tau \ge \frac{\log(1/P_{FA})}{D_{i,0}} (1 + o(1)), \quad i = 1, \dots, N;$$
$$E_0 \tau^* \ge \inf_{\delta \in \Delta(P_{FA}, P_{MS})} E_0 \tau \ge \frac{\log(1/P_{MS})}{\min_{1\le i\le N} D_{0,i}} (1 + o(1)). \qquad (3.8)$$
On the other hand, $\tau^* = \min(\tau_0^*, \tau_1^*)$, where
$$\tau_0^* = \min\Big\{ n : \min_{1\le i\le N} [-Z_i(n)] \ge a_0 \Big\} \quad\text{and}\quad \tau_1^* = \min\Big\{ n : \max_{1\le i\le N} Z_i(n) \ge a_1 \Big\}. \qquad (3.9)$$
Also, $\tau_1^* \le \nu_i = \min\{n : Z_i(n) \ge a_1\}$ for any $i = 1, \dots, N$. Therefore, the Markov times $\tau_0^*$ and $\nu_i$ can be used to obtain upper bounds for $E_i \tau^*$. To this end, however, we have to strengthen the almost sure convergence condition (3.5), since in general it does not even guarantee finiteness of $E_i \tau^*$. (See Tartakovsky [26] for the corresponding discussion and an example where the strong law (3.5) is valid but $E_i \nu_i$ is infinite.)

For $\varepsilon > 0$, introduce the random variables
$$T_{i,0}(\varepsilon) = \sup\Big\{ n \ge 1 : \Big| \frac{1}{n} Z_i(n) - D_{i,0} \Big| > \varepsilon \Big\} \quad\text{and}\quad T_{0,i}(\varepsilon) = \sup\Big\{ n \ge 1 : \Big| \frac{1}{n} Z_i(n) + D_{0,i} \Big| > \varepsilon \Big\},$$
which are the last times when $Z_i(n)/n$ leaves the region $(D_{i,0} - \varepsilon, D_{i,0} + \varepsilon)$ and $-Z_i(n)/n$ leaves the region $(D_{0,i} - \varepsilon, D_{0,i} + \varepsilon)$, respectively. In terms of the random variables $T_{i,0}(\varepsilon)$ and $T_{0,i}(\varepsilon)$, the almost sure convergence (3.5) can be rewritten as $P_i(T_{i,0}(\varepsilon) < \infty) = 1$ and $P_0(T_{0,i}(\varepsilon) < \infty) = 1$ for all $\varepsilon > 0$. Let us strengthen this condition by assuming that
$$E_i T_{i,0}(\varepsilon) < \infty \quad\text{and}\quad E_0 T_{0,i}(\varepsilon) < \infty \quad \text{for all } \varepsilon > 0. \qquad (3.10)$$
In this case, $Z_i(n)/n$ is said to converge completely to $D_{i,0}$ under $P_i$ and to $-D_{0,i}$ under $P_0$ as $n \to \infty$ (see [11,13,14,26] for the definitions of complete convergence and $r$-quick convergence). In what follows, we will use the notation $\xi_n \to q$ $P$-completely for the complete convergence of the sequence $\xi_n$ to a constant $q$. Therefore, the conventional strong law of large numbers (3.5) is now strengthened into the complete version
$$\frac{1}{n} Z_i(n) \xrightarrow[n\to\infty]{P_i\text{-completely}} D_{i,0}, \qquad \frac{1}{n} Z_i(n) \xrightarrow[n\to\infty]{P_0\text{-completely}} -D_{0,i} \qquad \text{for } i = 1, \dots, N. \qquad (3.11)$$
Now, everything is prepared to derive the asymptotic performance of the detection procedure in question. Indeed, it follows from the proof of Theorem 4.2 in [6] that
$$E_i \nu_i \le \frac{a_1}{D_{i,0}} (1 + o(1)), \qquad E_0 \tau_0^* \le \frac{a_0}{\min_i D_{0,i}} (1 + o(1)) \qquad \text{as } \min(a_0, a_1) \to \infty$$
if condition (3.11) holds. Putting $a_0 = \log(1/P_{MS})$, $a_1 = \log(N/P_{FA})$ and using the facts that $E_0 \tau^* \le E_0 \tau_0^*$ and $E_i \tau^* \le E_i \nu_i$, we obtain
$$E_i \tau^* \le \frac{\log(1/P_{FA})}{D_{i,0}} (1 + o(1)), \quad i = 1, \dots, N; \qquad E_0 \tau^* \le \frac{\log(1/P_{MS})}{\min_{1\le i\le N} D_{0,i}} (1 + o(1)) \qquad \text{as } P_{\max} \to 0. \qquad (3.12)$$
Combining (3.8) and (3.12), we finally obtain the asymptotic equalities
$$E_i \tau^* \sim \inf_{\delta \in \Delta(P_{FA}, P_{MS})} E_i \tau \sim \frac{\log(1/P_{FA})}{D_{i,0}}, \quad i = 1, \dots, N; \qquad E_0 \tau^* \sim \inf_{\delta \in \Delta(P_{FA}, P_{MS})} E_0 \tau \sim \frac{\log(1/P_{MS})}{\min_{1\le i\le N} D_{0,i}} \qquad \text{as } P_{\max} \to 0, \qquad (3.13)$$
which hold whenever the complete convergence conditions (3.11) are satisfied. Throughout the paper, the notation $x_\alpha \sim y_\alpha$ means that $\lim_{\alpha\to 0}(x_\alpha/y_\alpha) = 1$, i.e. $x_\alpha = y_\alpha(1 + o(1))$, where $o(1) \to 0$.

Therefore, the proposed detection procedure has the first-order asymptotic optimality property under the very general conditions (3.10). These conditions hold for general statistical models and do not require the i.i.d. assumption, which is quite restrictive for a variety of applications. In addition, regardless of the error probability constraints, the same argument shows that for large values of the thresholds
$$E_i \tau^* \sim \frac{a_1}{D_{i,0}}, \quad i = 1, \dots, N; \qquad E_0 \tau^* \sim \frac{a_0}{\min_{1\le i\le N} D_{0,i}} \qquad \text{as } \min(a_0, a_1) \to \infty. \qquad (3.14)$$
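As a quick numerical check of (3.14) in the simplest setting, the sketch below compares the first-order approximation $a_1/D_{i,0}$ with a seeded Monte Carlo estimate of $E_i \nu_i$ for a Gaussian mean-shift model, where the LLR increments are i.i.d. $N(\theta^2/2, \theta^2)$; the specific parameter values are this illustration's choices, not quantities fixed by the paper:

```python
import random

def mc_one_sided_ess(theta, a1, runs=2000, seed=1):
    """Monte Carlo estimate of E nu = E min{n : Z(n) >= a1}, where Z(n)
    is a random walk with i.i.d. N(theta^2/2, theta^2) increments
    (the LLR of a mean shift theta in unit-variance Gaussian noise)."""
    rng = random.Random(seed)
    drift, std = theta * theta / 2.0, theta
    total = 0
    for _ in range(runs):
        z, n = 0.0, 0
        while z < a1:
            n += 1
            z += rng.gauss(drift, std)
        total += n
    return total / runs

theta, a1 = 1.0, 10.0
first_order = a1 / (theta * theta / 2.0)   # a1 / D = 20 observations
```

The Monte Carlo value comes out slightly above $a_1/D$ because of the threshold overshoot, which is exactly the correction quantified in Section 3.4.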
3.4. Asymptotic performance of the GWT in the i.i.d. case

The formulas for the asymptotic performance and the probabilities of errors can be substantially improved when the observations are i.i.d. or, more generally, when the LLR processes are well approximated by random walks. In this section, we assume that the LLRs $Z_i(n)$ are random walks, i.e. $Z_i(n) = \Delta Z_i(1) + \cdots + \Delta Z_i(n)$, where $\Delta Z_i(k)$, $k = 1, 2, \dots$, are i.i.d. random variables with $E_i \Delta Z_i(1) = D_{i,0}$ and $E_0 \Delta Z_i(1) = -D_{0,i}$. This is always the case when the observations $X_{i,n}$, $n = 1, 2, \dots$, from the $i$-th channel are i.i.d. with probability density $f_i(x)$ if $\lambda = i$ and $f_0(x)$ if $\lambda \ne i$. In the latter case, the values of $D_{i,0}$ and $D_{0,i}$ are the famous Kullback-Leibler information numbers that measure the 'distances' between the densities $f_i$ and $f_0$:
$$D_{i,0} = \int \log \frac{f_i(x)}{f_0(x)}\, f_i(x)\, dx \qquad\text{and}\qquad D_{0,i} = \int \log \frac{f_0(x)}{f_i(x)}\, f_0(x)\, dx.$$
The random walk property, however, is not limited to the i.i.d. case: there exist cases where the observations are correlated but the LLRs are random walks or 'almost' random walks (see [23,24,26] and Section 3.5).

Our first observation is that, in this case, the complete convergence conditions (3.11) are satisfied whenever $E_i|\Delta Z_i(1)|^2 < \infty$ and $E_0|\Delta Z_i(1)|^2 < \infty$. Therefore, the asymptotic relations (3.13) and (3.14) hold if the second moments of $\Delta Z_i(1)$ are finite. The second moment condition, however, can be relaxed in the i.i.d. case. In fact, using the same kind of argument as in [6], it can be shown that the GWT asymptotically minimizes any positive moment of the observation time whenever $E_i|\Delta Z_i(1)| < \infty$. More precisely, if $0 < D_{i,0} < \infty$ and $0 < D_{0,i} < \infty$ for all $i = 1, \dots, N$, then for any $m \ge 1$
$$E_i(\tau^*)^m \sim \Big(\frac{a_1}{D_{i,0}}\Big)^m, \quad i = 1, \dots, N; \qquad E_0(\tau^*)^m \sim \Big(\frac{a_0}{\min_{1\le i\le N} D_{0,i}}\Big)^m \qquad \text{as } \min(a_0, a_1) \to \infty. \qquad (3.15)$$
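For example (our illustrative choice of densities), for unit-variance Gaussians $f_1 = N(\theta, 1)$ and $f_0 = N(0, 1)$ the two integrals above both equal $\theta^2/2$, which is easy to verify numerically:

```python
import math

def kl_numeric(theta, lo=-12.0, hi=12.0, m=100000):
    """Midpoint-rule evaluation of D_{1,0} = Int log(f1/f0) f1 dx for
    f1 = N(theta, 1), f0 = N(0, 1); the exact value is theta^2 / 2."""
    h = (hi - lo) / m
    total = 0.0
    for j in range(m):
        x = lo + (j + 0.5) * h
        f1 = math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)
        llr = theta * x - 0.5 * theta * theta    # log(f1(x) / f0(x))
        total += llr * f1 * h
    return total
```

The quadrature agrees with the closed form to high accuracy, e.g. $\theta = 1$ gives $D \approx 0.5$.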
The upper bounds (3.3) for the probabilities of errors and the first-order asymptotic expansions (3.13), (3.14) for the expected sample sizes obtained in the general case can be improved in the i.i.d. case by using nonlinear renewal theory [20,35,36]. The hypotheses $H_0$ and $\bar{H}_1$ will be considered separately, since in general the performance under $H_0$ and $\bar{H}_1$ is different, especially in the symmetric case (see below). In the rest of the paper, we suppose that the second moments of the LLRs are finite: $E_i|Z_i(1)|^2 < \infty$ and $E_0|Z_i(1)|^2 < \infty$.

3.4.1. The hypothesis $\bar{H}_1$ – the target is present

In order to exploit the relevant results of nonlinear renewal theory, we rewrite the Markov time $\tau_1^*$ defined in (3.9) in the form of a random walk crossing the threshold $a_1$ plus a nonlinear term that is slowly changing (see Woodroofe [35] for the corresponding definition of slowly changing sequences). Indeed, by adding and subtracting $Z_i(n)$, the Markov time $\tau_1^*$ can be written in the following form (for any $i = 1, \dots, N$):
$$\tau_1^* = \min\{n \ge 1 : Z_i(n) \ge a_1 - Y_i(n)\}, \quad\text{where } Y_i(n) = \max\Big\{0, \max_{j\ne i} Z_{ji}(n)\Big\} \text{ with } Z_{ji}(n) = Z_j(n) - Z_i(n).$$
Clearly,
$$0 \le Y_i(n) = \log\Big(\max\Big\{1, \max_{j\ne i} e^{Z_{ji}(n)}\Big\}\Big) \le \log\Big(1 + \sum_{j\ne i} e^{Z_{ji}(n)}\Big) \qquad (3.16)$$
and hence
$$0 \le E_i Y_i(n) \le E_i \log\Big(1 + \sum_{j\ne i} e^{Z_{ji}(n)}\Big) \le \log\Big(1 + \sum_{j\ne i} E_i e^{Z_{ji}(n)}\Big) = \log N.$$
Since $E_i Z_i(n) = D_{i,0}\, n$ increases linearly with $n$ while $E_i Y_i(n)$ is bounded by $\log N$, one can expect that the behavior of $\tau_1^*$, when $a_1$ is large and the target is located in the $i$-th channel, will be approximately the same as the behavior of the one-sided stopping time $\nu_i(a) = \min\{n : Z_i(n) \ge a\}$ with $a = a_1 - E_i Y_i(\infty)$. Further, let $\kappa_i(a) = Z_i(\nu_i(a)) - a$ on $\{\nu_i < \infty\}$ denote the overshoot (excess) of $Z_i(n)$ over the boundary $a$ at the stopping time $\nu_i(a)$, and let $G_i(y) = \lim_{a\to\infty} P_i(\kappa_i(a) \le y)$ be the limiting distribution of the overshoot. It follows from [20,35] that if the second moment $E_i|\Delta Z_i(1)|^2$ is finite, then
$$E_i \nu_i(a) = \frac{1}{D_{i,0}}(a + \bar\kappa_i) + o(1) \qquad \text{as } a \to \infty, \qquad (3.17)$$
where $\bar\kappa_i = \int_0^\infty y\, dG_i(y)$ is the average limiting overshoot. Now, using the fact that $E_i Z_{ji}(1) = -D_{0,j} - D_{i,0} < 0$ for all $j \ne i$, it is not difficult to show that $Y_i(n) \to 0$ $P_i$-a.s. as $n \to \infty$, which implies that $\{Y_i(n)\}_{n\ge 1}$ is a slowly changing sequence. Applying Theorem 4.1 from [35], we conclude that the overshoot $\rho(a_1) = \max_{1\le i\le N} Z_i(\tau_1^*) - a_1$ at the stopping time $\tau_1^*$ has the same limiting distribution as $\kappa_i(a_1)$ and that
$$E_i \tau_1^* = E_i \nu_i(a_1) + o(1) \qquad \text{as } a_1 \to \infty. \qquad (3.18)$$
Finally, we show that $E_i \tau_1^* = E_i \tau^* + o(1)$ as $\min(a_0, a_1) \to \infty$, which together with (3.18) implies that the asymptotic equality (3.17) is also valid for the expected value of $\tau^*$. We have
$$E_i(\tau_1^* - \tau^*) = E_i\big[(\tau_1^* - \tau^*)\mathbf{1}_{\{\tau^* \ne \tau_1^*\}}\big] = E_i\big[(\tau_1^* - \tau^*)\mathbf{1}_{\{d^*=0\}}\big] \le 2 E_i\big[\tau_1^* \mathbf{1}_{\{d^*=0\}}\big] \le 2\sqrt{E_i(\tau_1^*)^2\, P_i(d^* = 0)},$$
where the latter inequality follows from the Schwarz inequality. Now, using the missed detection bound in (3.3) and the first asymptotic equality in (3.15) with $m = 2$, we obtain
$$E_i(\tau_1^* - \tau^*) \le C a_1 e^{-a_0/2}(1 + o(1)) \to 0 \qquad \text{as } \min(a_0, a_1) \to \infty,$$
which along with the equality (3.18) shows that E i τ ∗ = E i νi (a1 ) + o(1) as min(a0 , a1 ) → ∞. Thus, it follows from (3.17) that 1 (a1 + κ i ) + o(1) for i = 1, . . . , N as min(a0 , a1 ) → ∞. (3.19) Eiτ ∗ = Di,0 Next, we refine the inequality (3.3) for the false alarm probability by taking into account overshoots over the thresholds and using the fact that the stopping times τ ∗ , τ1∗ , and νi are close to each other when the target is present in the i-th channel (for any i = 1, . . . , N ). Obviously, for all i = 1, . . . , N h h i i ∗ ∗ PFA (δ ∗ ) = E 0 1l{τ ∗ =τ1∗ } = E i e−Zi (τ1 ) 1l{τ ∗ =τ1∗ } = e−a1 E i e−ρ+Yi (τ1 ) 1l{τ ∗ =τ1∗ } . By inequalities (3.16), for all i = 1, . . . , N h i ∗ E i e−ρ 1l{τ ∗ =τ1∗ } ≤ E i e−ρ+Yi (τ1 ) 1l{τ ∗ =τ1∗ } N X X ≤ E i e−ρ 1 + exp{Zji (τ1∗ )} 1l{τ ∗ =τ1∗ } = E i e−ρ 1l{τ ∗ =τ1∗ } . i=1
j6=i
i (δ ∗ ) → 1. Therefore, Now, as a0 , a1 → ∞, ρ(a1 ) → κi (a1 ) in distribution (under P i ) and P i (τ ∗ = τ1∗ ) = 1 − PMS E i e−ρ 1l{τ ∗ =τ1∗ } → γi as min(a0 , a1 ) → ∞,
where γi = lima→∞ E i [e−κi (a) ] =
−a1
max γi e
1≤i≤N
R∞ 0
e−y dGi (y). It follows that ∗
(1 + o(1)) ≤ PFA (δ ) ≤
N X
! γi
e−a1 (1 + o(1))
as min(a0 , a1 ) → ∞.
(3.20)
i=1
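The limiting overshoot characteristics $\bar\kappa_i$ and $\gamma_i$ entering (3.19) and (3.20) can be estimated by simulating the one-sided time $\nu_i(a)$ for a large threshold; below is a seeded Monte Carlo sketch for the Gaussian mean-shift model (the $N(\theta^2/2, \theta^2)$ increment law is this illustration's assumption, matching a unit-variance mean-shift channel):

```python
import math
import random

def overshoot_constants(theta, a=30.0, runs=4000, seed=7):
    """Estimate kappa_bar = lim E[Z(nu) - a] and gamma = lim E[exp(-(Z(nu)-a))]
    by simulating the first passage of the random walk Z(n), with i.i.d.
    N(theta^2/2, theta^2) increments, over a distant threshold a."""
    rng = random.Random(seed)
    drift, std = theta * theta / 2.0, theta
    k_sum = g_sum = 0.0
    for _ in range(runs):
        z = 0.0
        while z < a:
            z += rng.gauss(drift, std)
        over = z - a                    # overshoot kappa(a)
        k_sum += over
        g_sum += math.exp(-over)
    return k_sum / runs, g_sum / runs
```

The estimates satisfy $0 < \gamma \le 1$, consistent with the remark following (3.20).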
Since $0 < \gamma_i \le 1$, the upper bound in (3.20) gives a better approximation to $P_{FA}(\delta^*)$ than the upper bound in (3.3); Monte Carlo simulations show that (3.20) is considerably more accurate.

It is important to emphasize that the asymptotic formula (3.19) coincides with the one for Wald's procedure testing the hypothesis $H_0$ against the hypothesis $H_i$. This fact has the following explanation. Since $E_i \Delta Z_i(1) = D_{i,0} > 0$ and $E_i \Delta Z_j(1) = -D_{0,j} < 0$ for all $j \ne i$, the statistic $Z_i(n)$ has a positive drift while all the other LLRs $Z_j(n)$ have negative drifts. Therefore, when the target is located in the $i$-th channel, the major impact on the performance is due to $Z_i(n)$, and the influence of the remaining statistics can be neglected if the thresholds are large enough.

In the symmetric case where the distances $D_{i,0} = D_1$ between $H_i$ and $H_0$ are the same for all $i = 1, \dots, N$ (typical when the SNR does not depend on the channel), the formulas become especially simple, since the values of $D_{i,0} = D_1$, $\bar\kappa_i = \bar\kappa_1$, and $\gamma_i = \gamma_1$ do not depend on $i$. Therefore,
$$E_i \tau^* \approx \frac{1}{D_1}(a_1 + \bar\kappa_1) \quad \text{for } i = 1, \dots, N; \qquad \gamma_1 e^{-a_1} \lesssim P_{FA}(\delta^*) \lesssim N \gamma_1 e^{-a_1}, \qquad (3.21)$$
where $\bar\kappa_1 = \lim_{a\to\infty} E_1[Z_1(\nu_1(a)) - a]$ and $\gamma_1 = \lim_{a\to\infty} E_1\{e^{-[Z_1(\nu_1(a)) - a]}\}$. The values of $\bar\kappa_1$ and $\gamma_1$ can be computed either exactly or approximately in a variety of particular problems (see, e.g., [20,24,28,35] and Section 3.5).

3.4.2. The hypothesis $H_0$ – no target

For the hypothesis $H_0$, the performance changes dramatically depending on whether the distances $D_{0,i} = E_0[-\Delta Z_i(1)]$ are different for different channels $i$ (asymmetric case) or the same (symmetric case). Thus, these two cases must be considered separately.

Asymmetric case. First, we consider an asymmetric case where the number $i^* = \arg\min_i D_{0,i}$ at which the Kullback-Leibler information distance $D_{0,i} = E_0[-\Delta Z_i(1)]$ attains its minimum is unique. Observing that the
Markov time $\tau_0^*$ (see (3.9)) can be written in the form $\tau_0^* = \min\{n : Z_{i^*}(n) \le -a_0 - Y_{i^*}(n)\}$ and using exactly the same kind of argument as above, we obtain
$$E_0 \tau^* = \frac{1}{D_{0,i^*}}(a_0 + \bar\kappa_{0,i^*}) + o(1) \qquad \text{as } \min(a_0, a_1) \to \infty; \qquad (3.22)$$
$$P_{MS}^{i^*}(\delta^*) = \gamma_{0,i^*} e^{-a_0}(1 + o(1)), \qquad P_{MS}^i(\delta^*) \le \gamma_{0,i^*} e^{-a_0}(1 + o(1)), \quad i \ne i^*, \qquad \text{as } \min(a_0, a_1) \to \infty. \qquad (3.23)$$
Here we used the following notation:
$$\bar\kappa_{0,i^*} = \lim_{a\to\infty} E_0 \kappa_{0,i^*}(a) \qquad\text{and}\qquad \gamma_{0,i^*} = \lim_{a\to\infty} E_0 e^{-\kappa_{0,i^*}(a)},$$
where $\kappa_{0,i^*}(a) = -Z_{i^*}(\nu_{0,i^*}(a)) - a$ is the overshoot of $-Z_{i^*}(n)$ over the threshold $a$ at the stopping time $\nu_{0,i^*}(a) = \min\{n : -Z_{i^*}(n) \ge a\}$.

Symmetric case. Assume that the probability density functions $f_i(x) = f_1(x)$ are the same for all $i = 1, \dots, N$. Then, obviously, $D_{0,i} = D_0$ and $D_{i,0} = D_1$ for all $i = 1, \dots, N$. This opposite, symmetric situation is typical for many applications where the SNR is the same in all channels. In this case, the asymptotic approximation to the expected sample size under the hypothesis $H_0$ is completely different. Specifically, the second term of the expansion of $E_0 \tau^*$ is not a constant as in the asymmetric case, but is proportional to the square root of the threshold $a_0$. Thus the second term also grows with the threshold $a_0$, and the first-order approximation is usually very inaccurate for moderate values of the probabilities of errors. The reason is that, in the symmetric case, none of the LLR statistics $Z_i(n)$ plays the dominating role when the hypothesis $H_0$ is true; as a result, the sequence $Y_i(n)$ is not slowly changing. To obtain a higher-order asymptotic expansion for $E_0 \tau^*$, we use the general results of nonlinear renewal theory developed by Zhang [36]. We omit the mathematical details and give the final result without proof (some details may be found in [7]). We will need the following additional notation:
ξk = −(N − 1)D0 −
N X
∆Zi (1),
ξ˜k = ξk −
i=1 i6=k
Z
∞
CN =N
N η X ξi , 1 + η i=1
η=
p 1 + N (N − 2) − 1,
h i xϕ(x)Φ(x)N −2 (N − 1)ϕ(x)(1 − x2 ) + (x3 − 3x)Φ(x) dx
−∞
Rx where ϕ(x) = (2π)−1/2 exp(−x2 /2) and Φ(x) = −∞ ϕ(y)dy are respectively standard normal density and distribution functions and where V ar0 [·] is the variance relative to the density f0 . Also, by hN , we denote the expected value of the N -th order statistic from the standard normal distribution. If E 0 |∆Zi (1)|3 < ∞, then s " # v02 h2N 1 a0 v02 h2N CN 3 ∗ E0τ = a0 + v0 hN + + + κ 0 + 2 E 0 ξ˜1 + o(1) D0 D0 4D02 2D0 6v0
as min(a0 , a1 ) → ∞.
(3.24)
The values of the universal constants CN and hN for N = 2–1000 can be found in Ref. 7. We do not know how to improve the upper bounds (3.3) for the missed detection probabilities P^i_MS(δ∗) in the symmetric case. Note, however, that the probabilities P^i_MS(δ∗) do not depend on i, and formula (3.23) with γ0,i = γ0 works slightly better than the upper bound (3.3), at least in the example considered in the next section.
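For readers who wish to recompute the universal constants rather than look them up, both hN (the expected maximum of N i.i.d. standard normal variables) and CN are one-dimensional integrals that plain trapezoidal quadrature handles easily. The sketch below is ours (function names are not from the paper), and the CN integrand follows our reading of the garbled definition above; Ref. 7 tabulates the authoritative values.

```python
import math

def phi(x):
    # Standard normal density
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def Phi(x):
    # Standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def h_N(N, lo=-10.0, hi=10.0, steps=20001):
    # E[max of N i.i.d. standard normals] = N * int x*phi(x)*Phi(x)^(N-1) dx
    dx = (hi - lo) / (steps - 1)
    total = 0.0
    for k in range(steps):
        x = lo + k * dx
        w = 0.5 if k in (0, steps - 1) else 1.0  # trapezoid end weights
        total += w * x * phi(x) * Phi(x) ** (N - 1)
    return N * total * dx

def C_N(N, lo=-10.0, hi=10.0, steps=20001):
    # C_N = N * int x*phi(x)*Phi(x)^(N-2) * [(N-1)*phi(x)*(1-x^2) + (x^3-3x)*Phi(x)] dx
    dx = (hi - lo) / (steps - 1)
    total = 0.0
    for k in range(steps):
        x = lo + k * dx
        w = 0.5 if k in (0, steps - 1) else 1.0
        integrand = x * phi(x) * Phi(x) ** (N - 2) * (
            (N - 1) * phi(x) * (1.0 - x * x) + (x ** 3 - 3.0 * x) * Phi(x))
        total += w * integrand
    return N * total * dx

print(h_N(2))  # exact value is 1/sqrt(pi) ≈ 0.5642
print(C_N(2))
```

For N = 2 and N = 3 the expected maxima have the closed forms 1/√π and 3/(2√π), which provide a quick check of the quadrature.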
3.5. An example of target detection in colored Gaussian noise

We now consider an example where, although the observed data are dependent, the LLRs are processes with independent increments. In some cases they can even be approximated by random walks. Consider the additive model where X_{i,n} = S_{i,n} + V_{i,n} if λ = i and X_{i,n} = V_{i,n} if λ = j, j ≠ i, or λ = 0. We will suppose that the signal S_{i,n} is deterministic and that the sensor noise V_{i,n} in the i-th channel is modeled by an autoregressive Gaussian process of order p which obeys the recursion

    V_{k,t} = Σ_{l=1}^{p} ρ_l V_{k,t−l} + w_{k,t},    t ≥ 1,    V_{k,n} = 0 for n ≤ 0.
Here w_{k,1}, w_{k,2}, . . . are i.i.d. Gaussian random variables with mean zero and variance σ² (w_{i,t} and w_{j,t} are independent for i ≠ j). We assume that the parameters ρ_l and σ² are known. In radar applications, S_{i,n} usually represents the result of preprocessing (attenuation and matched filtering) of modulated pulses in the i-th channel.

Write n_p = min(n − 1, p), X̃_{i,n_p,k} = X_{i,k} − Σ_{l=1}^{n_p} ρ_l X_{i,k−l}, and S̃_{i,n_p,k} = S_{i,k} − Σ_{l=1}^{n_p} ρ_l S_{i,k−l}. It can be shown that

    Z_i(n) = (1/σ²) Σ_{k=1}^{n} S̃_{i,n_p,k} X̃_{i,n_p,k} − (1/(2σ²)) Σ_{k=1}^{n} S̃²_{i,n_p,k}.

Next, obviously, X̃_{i,n_p,n}, n = 1, 2, . . . are independent normal random variables with E_i X̃_{i,n_p,n} = S̃_{i,n_p,n}, E_i[X̃_{i,n_p,n} − S̃_{i,n_p,n}]² = σ², E₀ X̃_{i,n_p,n} = 0, and E₀ X̃²_{i,n_p,n} = σ². Therefore, Z_i(n) is a Gaussian process with independent increments ∆Z_i(n), n = 1, 2, . . . and parameters

    E_i Z_i(n) = −E₀ Z_i(n) = (1/(2σ²)) Σ_{k=1}^{n} S̃²_{i,n_p,k},    Var_i[Z_i(n)] = Var₀[Z_i(n)] = (1/σ²) Σ_{k=1}^{n} S̃²_{i,n_p,k}.

Assume that

    lim_{n→∞} (1/(σ²(n − p))) Σ_{k=p+1}^{n} S̃²_{i,p,k} = q    and    Σ_{k=p+1}^{n} |S̃_{i,p,k}|^m < ∞ for all m > 0 and n < ∞,    (3.25)

where q characterizes the average SNR, 0 < q < ∞. Since all the moments of Z_i(n) are finite,

    (1/n) Z_i(n) → q/2  P_i-completely    and    (1/n) Z_i(n) → −q/2  P₀-completely    as n → ∞,

for all i = 1, . . . , N.
Thus, according to the results obtained in Section 3.3, the detection algorithm is asymptotically optimal, and the asymptotic formulas (3.13)–(3.14) hold with D_{0,i} = D_{i,0} = q/2.

Consider the particular case where the signal is constant, S_{i,n} = θ. Then the condition (3.25) is fulfilled with q = θ²(1 − Σ_{l=1}^{p} ρ_l)²/σ². Extensive Monte Carlo simulations have been performed for different numbers of channels, SNR values, and probabilities of errors. Sample results for the two-channel system are shown in Tables 1–2. We simulated a first-order autoregressive process, in which case q = θ²(1 − ρ)²/σ², where ρ is the correlation coefficient (0 ≤ ρ < 1). In the simulations, we used the threshold values

    a₁ = log(γ₁/P_FA)    and    a₀ = log(γ₀/P_MS),    (3.26)

where the constants γ₁ and γ₀ are computed from the formula (see Ref. 35)

    γ₁ = γ₀ = γ = (2/q) exp{ −2 Σ_{k=1}^{∞} k^{−1} Φ(−√(qk/4)) }.    (3.27)

The first formula in (3.26) follows from the lower bound in (3.21), while the second one with γ₀ = 1 follows from the upper bound (3.3). Experimentation indicates that the formulas (3.26) with γ from (3.27) give quite accurate approximations to the probabilities of errors. To compute the average sample sizes (ASS) τ̄₀ = E₀τ∗ and τ̄₁ = E₁τ∗ (due to the symmetry, E_iτ∗ = E₁τ∗ for all i = 1, . . . , N), we used the following second-order (SO) approximations

    τ̄₁(P_FA, P_MS, q) ≈ (2/q)(a₁ + κ̄₁)    and    τ̄₀(P_FA, P_MS, q) ≈ (2/q)( a₀ + hN √(2a₀ + hN²) + hN² + κ̄₀ ),    (3.28)

where

    κ̄₀ = κ̄₁ = κ̄ = 1 + q/4 − Σ_{k=1}^{∞} √q k^{−1/2} [ ϕ(√(qk/4)) − √(qk/4) Φ(−√(qk/4)) ].
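Both series converge quickly (the summands decay like Gaussian tails), so γ and κ̄ are cheap to evaluate. The sketch below follows our reading of the garbled originals, with the arguments taken as √(qk/4) and a 2/q prefactor (Woodroofe's ν-function) in γ; with this reading the resulting values, e.g. γ ≈ 0.56 and κ̄ ≈ 0.72 for q = 1, are consistent with the FO and SO entries of Table 1.

```python
import math

def Phi(x):
    # Standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    # Standard normal density
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def gamma_const(q, kmax=20000):
    # gamma = (2/q) * exp{-2 * sum_{k>=1} k^{-1} * Phi(-sqrt(qk/4))}
    s = sum(Phi(-math.sqrt(q * k / 4.0)) / k for k in range(1, kmax + 1))
    return (2.0 / q) * math.exp(-2.0 * s)

def kappa_const(q, kmax=20000):
    # kappa = 1 + q/4 - sum_{k>=1} sqrt(q/k) * [phi(x_k) - x_k*Phi(-x_k)], x_k = sqrt(qk/4)
    s = 0.0
    for k in range(1, kmax + 1):
        x = math.sqrt(q * k / 4.0)
        s += math.sqrt(q / k) * (phi(x) - x * Phi(-x))
    return 1.0 + q / 4.0 - s

print(gamma_const(1.0), kappa_const(1.0))
```

For q = 1, for instance, a₁ = log(γ/P_FA) ≈ log(0.56/10⁻³) ≈ 6.33, which matches the FO ASS value 2a₁/q ≈ 12.66 reported in Table 1.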
The formulas (3.28) follow from the higher-order asymptotics (3.21) and (3.24). To apply these formulas, we first approximated the LLRs by homogeneous random walks. This is possible because, in the case considered, the LLR Z_i(n) has i.i.d. Gaussian increments with E_i[Z_i(n) − Z_i(n−1)] = q/2 and Var_i[Z_i(n) − Z_i(n−1)] = q for all n ≥ p + 1. Therefore, although for n ≤ p the increments are not i.i.d., one can expect the i.i.d. approximation to work well as long as the observation time is not very small. For the purpose of comparison, we also used the first-order (FO) approximations for the ASS (see (3.14))

    τ̄₁(P_FA, P_MS, q) ≈ 2a₁/q    and    τ̄₀(P_FA, P_MS, q) ≈ 2a₀/q.    (3.29)
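Given a₀, a₁, κ̄, and hN, the FO and SO approximations are one-liners. In this sketch (function names ours) hN is set to 1/√π, its exact value for N = 2, and the q = 1 inputs γ ≈ 0.56 and κ̄ ≈ 0.72 are the approximate values produced by the series above.

```python
import math

H2 = 1.0 / math.sqrt(math.pi)  # h_N for N = 2: expected maximum of two standard normals

def ass_fo(a, q):
    # First-order approximation (3.29): 2a/q
    return 2.0 * a / q

def ass_so_h1(a1, q, kappa):
    # Second-order approximation (3.28) under H1: (2/q)(a1 + kappa)
    return (2.0 / q) * (a1 + kappa)

def ass_so_h0(a0, q, kappa, h_n=H2):
    # Second-order approximation (3.28) under H0
    return (2.0 / q) * (a0 + h_n * math.sqrt(2.0 * a0 + h_n ** 2) + h_n ** 2 + kappa)

# Illustrative inputs for q = 1: a1 = log(gamma/PFA), a0 = log(gamma/PMS),
# with gamma ≈ 0.56 and kappa ≈ 0.72 (approximate values, ours)
a1, a0 = math.log(0.56 / 1e-3), math.log(0.56 / 1e-1)
print(ass_fo(a1, 1.0), ass_so_h1(a1, 1.0, 0.72))
print(ass_fo(a0, 1.0), ass_so_h0(a0, 1.0, 0.72))
```

As the text notes, the SO correction matters mainly under H0, where the neglected term grows like the square root of the threshold.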
Table 1. Results for GWT with the thresholds (3.26); N = 2, ρ = 0.5, P_FA = 10⁻³, P_MS = 10⁻¹.

| q   | P_MS   | MC ASS (H1) | FO ASS (H1) | SO ASS (H1) | E1    | P_FA      | MC ASS (H0) | FO ASS (H0) | SO ASS (H0) | E0    |
|-----|--------|-------------|-------------|-------------|-------|-----------|-------------|-------------|-------------|-------|
| 0.1 | 0.0677 | 123.925     | 134.473     | 138.414     | 1.542 | 1.01·10⁻³ | 65.935      | 42.369      | 69.529      | 2.899 |
| 0.2 | 0.0669 | 60.367      | 66.475      | 69.340      | 1.583 | 9.22·10⁻⁴ | 32.703      | 20.423      | 34.687      | 2.922 |
| 0.5 | 0.0680 | 22.176      | 25.988      | 27.901      | 1.724 | 9.34·10⁻⁴ | 10.995      | 7.567       | 13.868      | 3.477 |
| 1   | 0.0613 | 9.885       | 12.657      | 14.093      | 1.933 | 9.51·10⁻⁴ | 4.687       | 3.447       | 6.977       | 4.078 |
| 2   | 0.0392 | 4.045       | 6.094       | 7.197       | 2.362 | 9.03·10⁻⁴ | 2.009       | 1.488       | 3.565       | 4.757 |
| 4   | 0.0141 | 1.565       | 2.885       | 3.758       | 3.053 | 5.15·10⁻⁴ | 1.126       | 0.582       | 1.886       | 4.243 |
| 6   | 0.0041 | 1.157       | 1.843       | 2.619       | 2.753 | 2.66·10⁻⁴ | 1.032       | 0.308       | 1.340       | 3.087 |
Table 2. Results for GWT with the thresholds (3.4); N = 2, ρ = 0.5, P_FA = 10⁻³, P_MS = 10⁻¹.

| q   | P_MS   | MC ASS (H1) | FO ASS (H1) | SO ASS (H1) | P_FA      | MC ASS (H0) | FO ASS (H0) | SO ASS (H0) |
|-----|--------|-------------|-------------|-------------|-----------|-------------|-------------|-------------|
| 0.1 | 0.0548 | 140.288     | 138.155     | 142.096     | 4.00·10⁻⁴ | 73.071      | 46.052      | 74.199      |
| 0.2 | 0.0519 | 69.741      | 69.078      | 71.942      | 3.66·10⁻⁴ | 34.974      | 23.026      | 37.994      |
| 0.5 | 0.0459 | 27.572      | 27.631      | 29.543      | 3.97·10⁻⁴ | 13.066      | 9.210       | 15.964      |
| 1   | 0.0361 | 13.005      | 13.816      | 15.251      | 2.72·10⁻⁴ | 6.154       | 4.605       | 8.462       |
| 2   | 0.0217 | 5.319       | 6.908       | 8.011       | 2.11·10⁻⁴ | 2.564       | 2.302       | 4.616       |
| 4   | 0.0073 | 2.048       | 3.454       | 4.328       | 1.17·10⁻⁴ | 1.290       | 1.151       | 2.630       |
| 6   | 0.0018 | 1.327       | 2.303       | 3.078       | 5.1·10⁻⁵  | 1.058       | 0.768       | 1.947       |
In Table 1, we present the Monte Carlo (MC) estimates of the ASS τ̄₀ and τ̄₁ along with the theoretical values computed according to (3.28) and (3.29). The abbreviations MC ASS, FO ASS, and SO ASS refer to the ASS obtained by the MC experiment, by the FO approximations (3.29), and by the SO approximations (3.28), respectively. In this table, we also show the MC estimates of the probabilities of errors P_FA(δ∗) and P_MS(δ∗). The thresholds were chosen from the formulas (3.26) and (3.27), which account for overshoots. It is seen that these formulas give quite accurate approximations to the true error probabilities as long as the average sample size is not very small. The FO ASS (3.29) is quite accurate for the hypothesis H1 but is very poor for the hypothesis H0. This could be expected, since the FO approximation for τ̄₀ neglects the second term, which increases at the rate of the square root of the threshold. The SO ASS for τ̄₀ gives fairly accurate estimates in all cases where the SNR is not very large (i.e., where the ASS is not very small). For H1, however, even the FO ASS slightly overestimates the true value of τ̄₁ (for most values of SNR); for this reason, it works better than the SO approximation.

Furthermore, in the columns E1 and E0, we show the results of a comparison with the multichannel fixed sample size (FSS) detection algorithm, which is based on the maximum likelihood statistic:

    d_n = 1 if max_{1≤i≤N} Z_i(n) ≥ h    and    d_n = 0 if max_{1≤i≤N} Z_i(n) < h,

where h is a threshold. It can be shown that the sample size n₀(P_FA, P_MS, q) required to guarantee the probabilities of errors P_FA and P_MS is

    n₀(P_FA, P_MS, q) ≈ (1/q) [ Φ⁻¹((1 − P_FA)^{1/N}) + Φ⁻¹( 1 − P_MS/(1 − P_FA)^{1−1/N} ) ]²,    (3.30)

where Φ⁻¹(α) = {x : Φ(x) = α} is the quantile of the standard normal distribution. The efficiency of the sequential detection algorithm compared to the FSS algorithm is defined as E_i(P_FA, P_MS, q) = n₀(P_FA, P_MS, q)/τ̄_i(P_FA, P_MS, q). The data in the table allow us to conclude that, for the chosen probabilities of errors, the sequential algorithm requires a 3–4 times smaller sample size when there is no target and a 1.5–3 times smaller sample size when there is a target. Note that the approximation in (3.30) is related only to the fact that the distribution of ∆Z_i(1) differs from the distribution of ∆Z_i(2), ∆Z_i(3), . . . . Therefore, this approximation is very accurate in all cases where n₀ ≫ 1.

In Table 2, we present the results of the analysis of the detection procedure that uses the thresholds (3.4), which were derived from the general upper bounds (3.3).
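Formula (3.30) and the efficiency ratio are straightforward to evaluate. The sketch below (ours; the quantile is obtained by simple bisection) uses N = 2, P_FA = 10⁻³, P_MS = 10⁻¹, and q = 1 as example parameters.

```python
import math

def Phi_inv(alpha, tol=1e-12):
    # Standard normal quantile via bisection (adequate for illustration)
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def n0_fss(p_fa, p_ms, q, N):
    # Required FSS sample size, formula (3.30)
    t1 = Phi_inv((1.0 - p_fa) ** (1.0 / N))
    t2 = Phi_inv(1.0 - p_ms / (1.0 - p_fa) ** (1.0 - 1.0 / N))
    return (t1 + t2) ** 2 / q

def efficiency(p_fa, p_ms, q, N, tau_bar):
    # E_i = n0 / tau_bar_i, with tau_bar an ASS estimate (e.g. from Monte Carlo)
    return n0_fss(p_fa, p_ms, q, N) / tau_bar

print(n0_fss(1e-3, 1e-1, 1.0, 2))  # ≈ 20.9 samples for N = 2, q = 1
```

Since n₀ scales as 1/q, the efficiency of the sequential test relative to the FSS test is essentially SNR-independent through this factor.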
It is easily seen that for this last procedure the true probabilities of errors P_FA(δ∗) and P_MS(δ∗) are substantially smaller than the required values when the SNR q ≥ 0.5. This leads to an increase in the true values of the average sample size, which is highly undesirable. We also remark that the equivalent SNR q = θ²(1 − ρ)²/σ² depends on the correlation coefficient of the noise through the factor (1 − ρ)². Since the ASS grows approximately as 1/q, it increases substantially as ρ increases. In particular, the ASS τ̄₁ and τ̄₀ are about four times higher for ρ = 0.5 than in the case of uncorrelated noise (ρ = 0).
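The Monte Carlo experiment behind Tables 1–2 can be sketched using the random-walk approximation of the LLRs: i.i.d. increments N(q/2, q) in the target channel and N(−q/2, q) otherwise. The stopping rule below (stop when the largest LLR leaves the interval (−a₀, a₁)) is our illustrative reading of the multichannel test; the paper's exact GWT statistic may differ in detail, so treat this as a sketch rather than a reproduction.

```python
import random
import math

random.seed(7)

def run_once(target_present, q, a0, a1, N=2):
    # One sequential trial under the random-walk approximation of the LLRs
    z = [0.0] * N
    n = 0
    while True:
        n += 1
        for i in range(N):
            drift = q / 2.0 if (target_present and i == 0) else -q / 2.0
            z[i] += random.gauss(drift, math.sqrt(q))
        m = max(z)
        if m >= a1:
            return n, 1  # declare "target present"
        if m <= -a0:
            return n, 0  # declare "no target"

def monte_carlo(target_present, q, a0, a1, trials=2000):
    ns, hits = 0, 0
    for _ in range(trials):
        n, d = run_once(target_present, q, a0, a1)
        ns += n
        hits += d
    return ns / trials, hits / trials

# Thresholds roughly as for q = 1 in Table 1: a1 ≈ log(gamma/PFA), a0 ≈ log(gamma/PMS)
ass1, det_rate = monte_carlo(True, 1.0, 1.72, 6.33)
ass0, fa_rate = monte_carlo(False, 1.0, 1.72, 6.33)
print(ass1, det_rate, ass0, fa_rate)
```

Even this simplified rule shows the qualitative behavior reported above: the test stops much sooner under H0 than under H1, and the false alarm rate stays far below the design level.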
4. FUSION OF DECISIONS FROM MULTIPLE SENSORS

Once the local decision d_s = d_s(τ_s∗) in the s-th sensor is made (at time τ_s∗), it is transmitted to a fusion center. Consider a (generally randomized) non-sequential fusion rule Ψ_ℓ = Ψ_ℓ(u_ℓ) that fuses the whole set of decisions u_ℓ = (d₁, d₂, . . . , d_ℓ) obtained from all sensors. As already discussed in Section 2, Ψ_ℓ is identified with the probability of making a decision in favor of the hypothesis H1 that the target is present. Note that this hypothesis is now simple. The notation H1 in this section should not be confused with that of the previous sections, where H1 denoted the hypothesis that the target is located in the first channel. Write L_ℓ for the LLR of the hypotheses H1 and H0,

    L_ℓ = log [ P₁(d₁, . . . , d_ℓ) / P₀(d₁, . . . , d_ℓ) ],

where P_i(d₁, . . . , d_ℓ) is the joint probability of the vector u_ℓ under the hypothesis H_i. Recall that d_s is a Bernoulli random variable taking values in the set {0, 1}. Introduce the following randomized fusion rule:

    Ψ_ℓ∗ = 1 if L_ℓ > h,    π if L_ℓ = h,    0 if L_ℓ < h,    (4.1)

where h is a threshold and π ∈ [0, 1] is the probability of accepting the hypothesis H1 when L_ℓ = h. In other words, if L_ℓ > h, then H1 is accepted with probability 1; if L_ℓ < h, then H1 is rejected (H0 is accepted with probability 1); finally, if L_ℓ = h, then H1 is accepted with probability π and H0 with probability 1 − π.

Let the threshold h and the probability π be chosen so that the false alarm probability β₀∗ = E₀(Ψ_ℓ∗) is equal to a given value β₀. It is known (Refs. 9, 15) that the rule (4.1) is optimal in the sense that it maximizes the probability of detection among all rules for which the false alarm probability does not exceed β₀.

For the sake of simplicity, consider a symmetric fusion scenario in which the local decisions are independent and the probabilities of errors are the same for all sensors, i.e., P₀(d_s = 1) ≈ P_FA and P₁(d_s = 0) ≈ P_MS (at least for small P_FA and P_MS) for all s = 1, . . . , ℓ. Then it is easily seen that

    L_ℓ = Σ_{s=1}^{ℓ} [ d_s log((1 − P_MS)/P_FA) + (1 − d_s) log(P_MS/(1 − P_FA)) ],    d_s = 0, 1.

Thus, in this particular case the optimal rule (4.1) can be written in the form

    Ψ_ℓ∗ = 1 if S_ℓ > h,    π if S_ℓ = h,    0 if S_ℓ < h,    (4.2)

where S_ℓ = Σ_{s=1}^{ℓ} d_s and the threshold h takes integer values 0, 1, 2, . . . . The values of h and π satisfy the condition E₀Ψ_ℓ∗ = β₀, which yields the equation

    π P₀(S_ℓ = h) + P₀(S_ℓ > h) = β₀.    (4.3)
Next, the probability of missed detection β₁∗ for this rule is computed as follows:

    β₁∗ = E₁(1 − Ψ_ℓ∗) = P₁{S_ℓ < h} + (1 − π) P₁{S_ℓ = h}.    (4.4)

Evidently, under H0 the statistic S_ℓ has the binomial distribution B(P_FA, ℓ) with the parameter P_FA, while under H1 it has the binomial distribution B(1 − P_MS, ℓ) with the parameter 1 − P_MS. Therefore, using (4.3), we obtain the following equation for the numbers h = h(ℓ, P_FA) and π = π(ℓ, P_FA):

    Σ_{n=h+1}^{ℓ} C_ℓ^n P_FA^n (1 − P_FA)^{ℓ−n} + π C_ℓ^h P_FA^h (1 − P_FA)^{ℓ−h} = β₀,    (4.5)

where C_ℓ^n = ℓ!/(n!(ℓ − n)!). Also, it follows from (4.4) that

    β₁∗ = Σ_{n=0}^{h−1} C_ℓ^n (1 − P_MS)^n P_MS^{ℓ−n} + (1 − π) C_ℓ^h (1 − P_MS)^h P_MS^{ℓ−h}.    (4.6)
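Equations (4.5)–(4.6) are easy to solve numerically: take the smallest integer h with P₀(S_ℓ > h) ≤ β₀, obtain π from (4.3), and then evaluate β₁∗ from (4.6). The sketch below (function names ours) reproduces the h, π, and β₁∗ values of Table 3; for ℓ = 1 it returns h = 0, π = 0, i.e. the deterministic rule that simply repeats the single sensor's decision.

```python
from math import comb

def binom_pmf(l, p, n):
    # P(S = n) for S ~ Binomial(l, p)
    return comb(l, n) * p ** n * (1 - p) ** (l - n)

def binom_tail(l, p, h):
    # P(S > h) for S ~ Binomial(l, p)
    return sum(binom_pmf(l, p, n) for n in range(h + 1, l + 1))

def optimal_fusion(l, p_fa, p_ms, beta0):
    # Smallest integer threshold h with P0(S > h) <= beta0, randomization
    # probability pi from (4.3), and missed detection probability from (4.6).
    h = 0
    while binom_tail(l, p_fa, h) > beta0:
        h += 1
    pi = (beta0 - binom_tail(l, p_fa, h)) / binom_pmf(l, p_fa, h)
    beta1 = (sum(binom_pmf(l, 1 - p_ms, n) for n in range(h))
             + (1 - pi) * binom_pmf(l, 1 - p_ms, h))
    return h, pi, beta1

for l in range(1, 6):
    print(l, optimal_fusion(l, 1e-3, 1e-1, 1e-3))
```

The minimality of h guarantees π ∈ [0, 1], so the constraint E₀Ψ_ℓ∗ = β₀ is met exactly.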
The formulas (4.5) and (4.6) allow us to compute the parameters of the optimal fusion rule, which minimizes the missed detection probability β₁∗, and the optimal performance β₁∗(ℓ) as a function of the number of fused sensors. In Table 3, we show sample results for the case where P_FA = β₀∗ = β₀ = 10⁻³ and P_MS = 10⁻¹. The data show that the missed detection probability β₁∗(ℓ) decreases quite fast as ℓ increases. In particular, β₁∗(1) = 0.1, while after fusing the decisions from 5 sensors β₁∗(5) = 3.7·10⁻⁴.

Table 3. Missed detection probability versus number of sensors; P_FA = β₀ = 10⁻³, P_MS = 0.1.

| ℓ   | 1      | 2      | 3        | 4        | 5        |
|-----|--------|--------|----------|----------|----------|
| h   | −      | 1      | 1        | 1        | 1        |
| π   | −      | 0.5    | 0.333    | 0.249    | 0.199    |
| β₁∗ | 1·10⁻¹ | 1·10⁻¹ | 1.9·10⁻² | 2.8·10⁻³ | 3.7·10⁻⁴ |
5. SUMMARY AND CONCLUSIONS

We now summarize the results of the paper. The introduced sequential detection algorithm is based on maximum likelihood arguments and has three attractive features. First, being simple, it can readily be implemented online. Second, it allows one to upper-bound the probabilities of errors for arbitrary statistical models. Third, it minimizes the expected sample size for small error probabilities, again for arbitrary models that are not restricted to the conventional i.i.d. assumption. The results given in Section 3.5 show a substantial advantage of the proposed multichannel sequential detection procedure over the multichannel non-sequential procedure based on the maximum likelihood ratio statistic: the former requires a 2–4 times smaller sample size for error probabilities in the range 10⁻¹–10⁻³ when the SNR is not very large. The analysis performed in Section 4 shows that fusion of independent decisions from several sensors allows us to reduce the probabilities of errors. For example, fusing the decisions from 3 sensors increases the probability of detection from 0.9 to 0.98 when the false alarm probability is maintained at the level 10⁻³.
ACKNOWLEDGMENTS The research of A. Tartakovsky and G. Yaralov was supported in part by the U.S. ONR grants N00014-99-1-0068 and N00014-95-1-0229 and by the U.S. ARO grant DAAG55-98-1-0418. The research of X.-R. Li was supported in part by ONR via grant N00014-00-1-0677 and by NSF via Grant ECS-9734285.
REFERENCES

1. P. Armitage, "Sequential Analysis with More Than Two Alternative Hypotheses, and Its Relation to Discriminant Function Analysis," J. Royal Statist. Soc. B, 12: 137–144, 1950.
2. R.S. Blum, S.A. Kassam, and H.V. Poor, "Distributed Detection with Multiple Sensors: Part II–Advanced Topics," Proceedings of the IEEE, 85: 64–79, 1997.
3. Z. Chair and P.K. Varshney, "Optimal Data Fusion in Multiple Sensor Detection Systems," IEEE Trans. Aerospace and Electronic Systems, 22(1): 98–101, 1986.
4. H. Chernoff, "Sequential Design of Experiments," The Annals of Mathematical Statistics, 30: 755–770, 1959.
5. V.P. Dragalin, "Asymptotic Solution of a Problem of Detecting a Signal from k Channels," Russian Mathematical Surveys, 42: 213–214, 1987.
6. V.P. Dragalin, A.G. Tartakovsky, and V. Veeravalli, "Multihypothesis Sequential Probability Ratio Tests, I: Asymptotic Optimality," IEEE Trans. Information Theory, 45(7): 2448–2461, 1999.
7. V.P. Dragalin, A.G. Tartakovsky, and V. Veeravalli, "Multihypothesis Sequential Probability Ratio Tests, II: Accurate Asymptotic Expansions for the Expected Sample Size," IEEE Trans. Information Theory, 46(4): 1366–1383, 2000.
8. E. Drakopoulos and C.C. Lee, "Optimum Multisensor Fusion of Correlated Local Decisions," IEEE Trans. Aerospace and Electronic Systems, 27(4): 593–606, 1991.
9. T.S. Ferguson, Mathematical Statistics: A Decision Theoretic Approach, Academic Press, New York, London, 1967.
10. D.D. Freedman and P.A. Smyton, "Overview of Data Fusion Activities," Proc. American Control Conference, San Francisco, June 1993.
11. P.L. Hsu and H. Robbins, "Complete Convergence and the Law of Large Numbers," Proc. Nat. Acad. Sci. U.S.A., 33: 25–31, 1947.
12. L.A. Klein, Sensor and Data Fusion Concepts and Applications, SPIE Press, 1993.
13. T.L. Lai, "On r-Quick Convergence and a Conjecture of Strassen," The Annals of Probability, 4: 612–627, 1976.
14. T.L. Lai, "Asymptotic Optimality of Invariant Sequential Probability Ratio Tests," The Annals of Statistics, 9: 318–333, 1981.
15. E.L. Lehmann, Testing Statistical Hypotheses, John Wiley & Sons, New York, 1986.
16. G. Lorden, "Nearly-Optimal Sequential Tests for Finitely Many Parameter Values," The Annals of Statistics, 5: 1–21, 1977.
17. R.C. Luo and M.G. Kay, "Multisensor Integration and Fusion in Intelligent Systems," IEEE Trans. Systems, Man, and Cybernetics, 19: 901–931, 1989.
18. J.D. Papastavrou and M. Athans, "Distributed Detection by a Large Team of Sensors in Tandem," IEEE Trans. Aerospace and Electronic Systems, 28: 639–652, 1992.
19. I.V. Pavlov, "Sequential Procedure of Testing Composite Hypotheses with Applications to the Kiefer-Weiss Problem," Theory Prob. Appl., 35: 280–292, 1990.
20. D. Siegmund, Sequential Analysis: Tests and Confidence Intervals, Springer-Verlag, New York, 1985.
21. Yu.G. Sosulin and M.M. Fishman, Theory of Sequential Decisions and Its Applications, Radio i Svyaz', Moscow, 1985 (in Russian).
22. Z.B. Tang, K.R. Pattipati, and D.K. Kleinman, "An Algorithm for Determining the Decision Threshold in a Distributed Detection Problem," IEEE Trans. Systems, Man, and Cybernetics, 21: 231–237, 1991.
23. A.G. Tartakovskii, "Sequential Testing of Many Simple Hypotheses with Dependent Observations," Probl. Information Transmis., 24: 299–309, 1988.
24. A.G. Tartakovsky, Sequential Methods in the Theory of Information Systems, Radio i Svyaz', Moscow, 1991 (in Russian).
25. A.G. Tartakovsky, "Asymptotically Optimal Sequential Tests for Nonhomogeneous Processes," Sequential Analysis, 17: 33–62, 1998.
26. A.G. Tartakovsky, "Asymptotic Optimality of Certain Multihypothesis Sequential Tests: Non-i.i.d. Case," Statistical Inference for Stochastic Processes, 1(3): 265–295, 1998.
27. A.G. Tartakovsky, "Minimax Invariant Regret Solution to the N-Sample Slippage Problem," Mathematical Methods of Statistics, 6(4): 491–508, 1997.
28. A.G. Tartakovsky and I.A. Ivanova, "Approximations in Sequential Rules for Testing Composite Hypotheses and Their Accuracy in the Problem of Signal Detection from Post-Detector Data," Problems of Information Transmission, 28: 63–74, 1992.
29. A.G. Tartakovsky and X.-R. Li, "Sequential Testing of Multiple Hypotheses in Distributed Systems," Proceedings of the 3rd International Conference on Information Fusion, Paris, France, July 2000, Vol. II.
30. S.C.A. Thomopoulos, R. Viswanathan, and D.P. Bougoulias, "Optimal Decision Fusion in Multiple Sensor Systems," IEEE Trans. Aerospace and Electronic Systems, 23(5): 644–653, 1987.
31. J.N. Tsitsiklis, "Decentralized Detection," Advances in Statistical Signal Processing, Vol. 2: Signal Detection (H.V. Poor and J.B. Thomas, Eds.), JAI Press, Greenwich, CT, 1993.
32. P.K. Varshney, Distributed Detection and Data Fusion, Springer-Verlag, New York, 1997.
33. V. Venugopal, V. Veeravalli, T. Basar, and H.V. Poor, "Decentralized Sequential Detection with a Fusion Center Performing the Sequential Test," IEEE Trans. Information Theory, 37: 433–442, 1993.
34. V. Veeravalli, T. Basar, and H.V. Poor, "Minimax Robust Decentralized Detection," IEEE Trans. Information Theory, 38: 35–40, 1994.
35. M. Woodroofe, Nonlinear Renewal Theory in Sequential Analysis, SIAM, Philadelphia, 1982.
36. C.H. Zhang, "A Nonlinear Renewal Theory," The Annals of Probability, 16: 793–825, 1988.
37. Y.M. Zhu, "Optimum Fusion in Distributed Multisensor Network Decision Systems," Proc. 14th World Congress of the International Federation of Automatic Control, Beijing, July 1999.
38. Y. Zhu and X.R. Li, "Optimal Decision Fusion Given Sensor Rules," Proc. 1999 International Conf. on Information Fusion (FUSION'99), 1999, pp. 85–92.
39. Y.M. Zhu, C.Y. Liu, and Y. Gan, "Multisensor Distributed Neyman-Pearson Decision with Correlated Sensor Observations," Proc. 1998 International Conf. on Information Fusion (FUSION'98), 1998, pp. 35–39.
40. Y.M. Zhu, Y. Gan, K.S. Zhang, and C.Y. Liu, "Multisensor Neyman-Pearson Type Sequential Decision with Correlated Sensor Observations," Proc. 1998 International Conf. on Information Fusion (FUSION'98), 1998, pp. 787–793.