Multidecision Quickest Change-Point Detection: Previous ... - CiteSeerX

5 downloads 0 Views 594KB Size Report
Previous Achievements and Open Problems. Alexander G. Tartakovsky. Department of Mathematics, University of Southern California,. Los Angeles, California ...
Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

Sequential Analysis, 27: 201–231, 2008 Copyright © Taylor & Francis Group, LLC ISSN: 0747-4946 print/1532-4176 online DOI: 10.1080/07474940801989202

Multidecision Quickest Change-Point Detection: Previous Achievements and Open Problems Alexander G. Tartakovsky Department of Mathematics, University of Southern California, Los Angeles, California, USA

Abstract: The following multidecision quickest detection problem, which is of importance for a variety of applications, is considered. There are N populations that are either statistically identical or where the change occurs in one of them at an unknown point in time. Alternatively, there may be N “isolated” points/hypotheses associated with a change. It is necessary to detect the change in distribution as soon as possible and indicate which population is “corrupted” or which hypothesis is true after a change occurs. Both the false alarm rate and misidentification rate should be controlled by given (usually low) levels. We discuss performance of natural multihypothesis/multipopulation generalizations of the Page and Shiryaev-Roberts procedures, including certain asymptotically optimal properties of these tests when both the false alarm and the misidentification rates are low. Specifically, we show that under certain conditions the proposed multihypothesis detectionidentification procedures asymptotically minimize the trade-off between any positive moment of the detection lag and the false alarm/misclassification rates in the worst-case scenario. At the same time, the corresponding sequential detection-identification procedures are computationally simple and can be easily implemented online in a variety of applications such as rapid detection of intrusions in large-scale distributed computer networks, target detection in cluttered environment, and detection of terrorists’ malicious activity. Limitations of the existing and proposed solutions to this challenging problem are also discussed. Keywords: Asymptotic optimality; Average detection delay; False alarm rate; Misidentification rate; Moments of the detection delay; Multidecision change-point detection; Multihypothesis sequential tests; Slippage problems. Subject Classifications: 62L15; 60G40; 62F12; 62F15.

Received October 1, 2007, Revised February 7, 2008, Accepted February 10, 2008 Recommended by Nitis Mukhopadhyay Address correspondence to Alexander G. Tartakovsky, Department of Mathematics, University of Southern California, KAP-108, Los Angeles, CA 90089-2532, USA; Fax: 213740-2450; E-mail: [email protected]

202

Tartakovsky

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

1. INTRODUCTION AND FORMULATION OF THE PROBLEM Let     be a probability space and X = Xn  n ≥ 1 be a discrete-time stochastic process defined on this space. We are interested in the following multiple decision problem (multipopulation problem), which is a natural extension of the slippage problem (Ferguson, 1967; Mosteller, 1948; Tartakovsky, 1997) to the case where populations may change their properties at unknown points in time. Suppose there are N populations whose distributions are identical up to some unknown instant ,  = 1 2    , while after this instant one of the populations (and only one) changes its statistical properties. On the basis of samples from the N populations we wish to decide which population has slipped to the right of the rest as soon as possible after the change occurs when the rates of false alarms and misidentification are low. This problem is of considerable practical importance and arises across different fields of science and engineering. In particular, one of the typical scenarios that leads to such a problem is target (signal) detection in the N -channel (or multiresolution) systems (infrared, radar, etc.), which appears randomly at an unknown a priori moment. In this case it is necessary to detect the target as quickly as possible along with indication of the channel where it appears, controlling the rate of false alarms and target missing (see, e.g., Tartakovsky, 1991b). Another important application area is computer network security—rapid detection and identification of computer intrusions (see, e.g., Tartakovsky et al., 2006). To be specific, it is assumed that the observed stochastic process Xn = X1 n     XN n is an N -component process. The component Xl n, n = 1 2    corresponds to observations obtained from the lth population, and all populations can be observed simultaneously. Let 0 stand for the probability measure when the change does not occur (all populations are equal) and let i be the probability when the change occurs at time  in the ith population. Note that if  =  (the change does not occur), then i = 0 for all i. 0 and i will be used to denote the corresponding expectations. We shall also suppose that the change may occur only in one population. In mathematical terms, a sequential detection/identification procedure is a pair =   d, where is a stopping time with respect to the family of sigma-algebras n = X1     Xn, n ≥ 0, and d = dX1     X  is the terminal decision function taking values in 1     N. That is, is the moment of detection of a change ( ≤ n ∈ n ) and d = i, i = 1     N is the decision that the change occurred in the ith population (identification). For a good detection-identification procedure a detection lag  − + should be “stochastically small” while the rates of false alarms and wrong identification should be low. A reasonable measure of the detection lag is the average detection delay (ADD) that can be expressed via the conditional expected detection delay time i  −   ≥ . The false alarm rate (FAR) can be measured by the average run length to false alarm ARL2FA  = 0 and the misidentification rate by the probabilities of wrong identification i d = i  ≥ , i = 1     N . Thus, a good detection procedure is a procedure for which i  −   ≥  and i d = i  ≥  are small while 0 is large. Alternatively, the FAR can be measured by the (local) conditional probability of false alarm (PFA) in a certain time window of length ≥ 1, PFA   = supk≥1 0  < + k  ≥ k, which should be reasonably small. See Tartakovsky (2005) for a discussion of the usefulness of this approach in a variety of applications.

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

Multidecision Quickest Change-Point Detection

203

Standard measures of “as quickly as possible” that are being commonly used (Lorden, 1971; Pollak, 1985, 1987) are either D1i   = sup i  −  ≥  or ES1i   = sup ess sup i  − +  . We will use a more general measure, m Dm i   = sup i  −   ≥  for m > 0

(1.1)

1≤ 0, this means that the whole distribution i  −  ≥  is concentrated close enough to the change point and this is true for any population. The functions (1.1) along with the following modification  m   = sup i  − m  d = i  ≥  m > 0 D i

(1.2)

1≤ 0), but if j = i, then the loss is gij (possibly gij = g). In the worst-case scenario this leads to the  m   and the probabilities of a wrong optimization of the trade-off between D i  1   classification i d = i  ≥  (say). Thus, suitable performance indices are D i m 1 m   ) rather than D  . However, as we will see, the risk D is useful (generally D i i i  m . Besides, it is of interest for the proof of asymptotic optimality with respect to D i in its own right.  m   may be made arbitrary small if one does not confine Since, obviously, the D i other characteristics such as false alarm and misidentification rates, we have to introduce some constraints that will be expressed in the following class of detection procedures. Class T :   T  =  0 ≥ T sup i d = i  ≥  ≤ i for all i = 1     N

(1.3)

1≤ 1, 0 < i < 1, and  = 1      N . Thus, this class includes only procedures for which the ARL2FA exceeds the predefined number T and the probabilities of misidentification do not exceed the predefined numbers i . As an alternative, we could consider the class that confines

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

204

Tartakovsky

the local false alarm probability PFA   = supk 0  < + k ≥ k, instead of the ARL2FA. However, this interesting problem is out of the scope of this paper and will be considered elsewhere. See a remark in Section 7 regarding this latter problem. It is almost impossible to find an optimal procedure that minimizes the risk functions (1.2) in the corresponding classes for arbitrary values of T and i . For this reason, we will address an asymptotic setting when T →  and i → 0. Specifically, we will prove that the proposed below detection procedures are asymptotic solutions to the following problem:  m   as T →  and max i → 0 inf D i

∈T

1≤i≤N

(1.4)

Note that the last criterion is equivalent (or almost equivalent) to inf

sup i  − m  d = i ≥  as T →  and max1≤i≤N i → 0

∈T 1≤ n. Here p0 x and pi x, i = 1     N , are given prechange and postchange probability density functions (p.d.f.’s) and Xn = X1     Xn denotes the concatenation of all observations from all populations up to time n. The log-likelihood ratio (LLR) for the hypotheses Hi and H0 (no change) is Zi n = log

n  di n p X t  i = 1     N X  = log i i d0 p0 Xi t t=

(2.1)

while the LLR for the hypotheses Hi and Hj , i = j, is Zij n = Zi n − Zj n =

n 

log

t=

pi Xi t  i j = 1     N i = j pj Xj t

(2.2)

In what follows we write 1 N for 1     N and 0 N for 0 1     N. Procedure 1 (Recursive Matrix CUSUM Test). Define max1≤s≤n Zijs n and the Markov times

the

statistics

Lij n =

 

i = inf n ≥ 1  min Lij n − aj  ≥ 0  i = 1     N j=ij∈0N

( i =  if there is no such n), where aj are positive numbers. The first procedure ∗ =  ∗  d∗  is defined as

∗ = min i  d∗ = arg min i  i∈1N

i∈1N

Note that the stopping time ∗ can be also represented in the following form:

∗ = infn ≥ 1  Zis n ≥ a0  Zijs n ≥ aj  j = i j ∈ 1 N  for some i ∈ 1 N and s ≤ n

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

206

Tartakovsky

Procedure 2  (Recursive Matrix Shiryaev–Roberts Test). Define the s rij n = log ns=1 eZij n  and the Markov times   i = inf n ≥ 1  min rij n − bj  ≥ 0  i = 1     N

statistics

j=ij∈0N

(i =  if no such n exists), where bj are positive numbers. The second procedure ∗∗ =  ∗∗  d∗∗  has the form

∗∗ = min i  d∗∗ = arg min i  i∈1N

i∈1N

Let us first discuss the reasons why we expect the procedures ∗ and ∗∗ to be efficient in the problem considered. For i j ∈ 0 N , i = j, we use i j = 1i Zij1 1 to denote the Kullback-Leibler information distances between the hypotheses H1i and H1j (H10 = H0 ). It follows from (2.1) and (2.2) that i 0 = 1i log

pi Xi 1 p X 1  0 i = 0 log 0 i  p0 Xi 1 pi Xi 1

i j = i 0 + 0 j for i j ∈ 1 N  It is known that the stopping times

i0 = infn ≥ 1  Li0 n ≥ a0  and i0 = infn ≥ 1  ri0 n ≥ b0  are optimal procedures (at least asymptotically) to detect the change 0 → i in the ith population based on observations Xi 1 Xi 2    (see, e.g., Lorden, 1971; Moustakides, 1986; Pollak, 1985, 1987; Tartakovsky, 1991b, 1998d). In particular, if a0 = b0 = log T , then 0 i0 ≥ 0 i0 ≥ T and for any m > 0   log T m m m inf Dm   ∼ D 

 ∼ D   ∼ as T →  i0 i0 i i i  0 ≥T i 0 (cf. Tartakovsky, 1998d, 2005). On the other hand, assuming that it is known that a change has occurred (let it be from the very beginning) in the hypothesis testing problem (N -population identification problem) where for all n ≥ 1, Hi  Xk n ∼ 0 for k = i and Xi n ∼ 1i  i = 1     N the multihypothesis SPRT MH =  MH  dMH  given by

MH = min ˜ i  dMH = arg min ˜ i  ˜ i = infn  min Zij1 n − aj  ≥ 0 i∈1N

j=i

i∈1N

is asymptotically optimal when the probabilities of errors i   = 1i d = i, i = 1     N are small (see, e.g., Dragalin et al., 1999; Tartakovsky, 1998a,c). Specifically, if we choose aj = logN/j , then i  MH  ≤ i and for all m > 0 and i = 1     N ,    log j  m inf 1i m ∼ 1i  MH m ∼ max as max k → 0 1≤k≤N i j  k  ≤k k∈1N  j∈1N j=i

(cf. Dragalin et al., 1999; Tartakovsky, 1998a).

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

Multidecision Quickest Change-Point Detection

207

The proposed sequential detection-identification procedures are nothing but modifications of the multihypothesis SPRT adapted to the change-point detection and identification: since the point of change is unknown, the LLR statistics Zij1 n are replaced either with the generalized LLR statistics Lij n or with logarithms of average (over uniform prior) likelihood ratio statistics rij n. The decision d = 0 is then identified with continuation of observation. Since both the change detection procedures and the sequential test are asymptotically optimal, one can expect that so are the proposed detection-identification procedures. Intuitively, the aforementioned results allow us to expect that with an appropriate choice of thresholds,

inf

∈T

Dm i  



∗ Dm i  



∗∗ Dm i  



 log j  log T  max ∼ max i 0 j∈1N i j

 m

j=i

for all m ≥ 1 and i ∈ 1 N as T →  and maxk k → 0. Below we will show how to choose the thresholds to embed the procedures ∗ and ∗∗ into the class T  and will prove that this conjecture is indeed true. A particular version of Procedure 1 with a single threshold (i.e., a0 = a1 = · · · = aN = a) has been considered by Oskiper and Poor (2002).

3. ASYMPTOTIC PERFORMANCE FOR LARGE VALUES OF THRESHOLDS In this section we study the behavior of procedures ∗ and ∗∗ regardless of the class T  specified above when the thresholds ai and bi are large. In the following sections these results will be used to establish asymptotic optimality properties in the corresponding class as T →  and i → 0. For the sake of brevity, in the following we omit the index 1 in all variables, expectations, and probability measures when  = 1. For instance, we simply write Zij n, i , i instead of Zij1 n, 1i , 1i . The first important observation is that for the procedures ∗ and ∗∗ , ∗ ∗ m ∗∗ ∗∗ m Dm Dm i   = i  − 1  i   = i  − 1 

Indeed, the statistics Lij n = max1≤s≤n Zijs n obey the recursions n = 1 2     Lij 0 = 0 Lij n = Zij n + L+ ij n − 1

(3.1)

where Zij n = log pi Xi n/pj Xj n and y+ = max0 y denotes a nonnegative part of y. Thus, if  = 1, then one always starts from the zero point, while if  ≥ 2, then the starting point for Lij n considered from  − 1 on the set ∗ >  − 1 is Lij  − 1, i.e., random and belongs to the interval 0 aj . Since we consider the independent and identically distributed (i.i.d.) case, this is the only difference in a statistical sense between the situations  = 1 and  ≥ 2. In fact, by the renewal property, for evaluation of the expectation of  ∗ − m on the set ∗ ≥  it is sufficient to consider the behavior of statistics Lij n, n =   + 1    that again satisfy the recursion (3.1) but with the random initial condition Lij  − 1 = Lij

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

208

Tartakovsky

 − 1 ∈ 0 aj . Thus, on average, the process minj=i Lij n − aj  hits the zero level faster (not slower) when  > 1, compared to the case when  = 1, and i  ∗ − m  ≥  ≤ 1i  ∗ − 1m for all  ≥ 2 A similar argument shows that the same is true for procedure ∗∗ .  m we can concentrate on Therefore, in further calculations related to the risk D i ∗∗ m ∗ m evaluation of i  − 1  d = i and i  − 1  d = i, assuming that a change occurs in point  = 1. In what follows we always suppose that the ratios ai /aj and bi /bj are bounded away from zero and infinity, i.e., lim ai y/aj y = lim bi y/bj y = cij with 0 < cij < 

y→

y→

(3.2)

where y stands for a generic variable such that ak y →  as y → . Also, we always assume that the information numbers are strictly positive and finite: 0 < i j <  for all 0 ≤ i j ≤ N i = j

(3.3)

The condition i j > 0 holds whenever measures i and j are mutually absolutely continuous and distinct (i.e., i Zij 1 = 0 > 0). The results are trivially valid in the case i j = , but this case has no practical significance. Let ij∗  = i d∗ = j  ∗ ≥  and i∗  = i d∗ = i  ∗ ≥  denote the probabilities of errors of the procedure ∗ . Respectively, let ij∗∗  and i∗∗  denote the probabilities of errors of the procedure ∗∗ . Also, write c = c0  c1      cN  and cmin = mink∈0N ck and, for i = 1     N , define  Si n = ns=1 Zi s Si 0 = 0 ti h = infn ≥ 1  Si n ≥ h vi = limh→ i exp−Si ti h − h

(3.4)

where the constant vi can be computed using renewal-theoretic argument (see, e.g., Siegmund, 1985). In the following theorem we derive bounds for the ARL2FA and probabilities of errors. Theorem 3.1. Let 0 < i j < . (i) i ∗ m <  and i ∗∗ m <  for any positive and finite aj and bj and any m > 0. (ii) The ARL2FAs satisfy the inequalities 0 ∗ ≥ where κN a0  =

1 a0 1 e κN a0 −1  0 ∗∗ ≥ eb0 for any a0  b0 > 0 N N

(3.5)



 −1 N 1 e−1 1  max 1 + 0 i − 1 − 1/ee−a0 (3.6) N i=1 0 i e

and ea0 eb0 ∗∗ 1 + o1 

≥ 1 + o1 as a0  b0 →   0 N 2 i=1 vi i 0 i=1 vi (3.7)

0 ∗ ≥ N

Multidecision Quickest Change-Point Detection

209

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

(iii) The probabilities of errors satisfy the inequalities ij∗ 1 ≤ e−ai min i ∗ a

e e+1 i ji ∗ a − 1 +  e−1 e−1

ij∗∗ 1 ≤ e−bi i ∗∗ b

(3.8) (3.9)

To prove the theorem we need the following auxiliary result. Lemma 3.1. Let  and  be two distinct measures and the sequence Xn n≥1 be i.i.d. under both measures. Let px and fx denote the probability densities of  and  with respect to a dominating measure x. Define n =

n n  n   pXk  fXk   0 = 1 and Rn =  R0 = 0 fXk  pXk  k=1 s=1 k=s

Let be any (possibly randomized) stopping time with respect to Xn . Then F  F

 max n

0≤n≤ −1

≤ min

 −1 

 n

= P 

(3.10)

n=0

e

1 + e−1 + Ip fP  − 1 P  e−1

(3.11)

provided  <  = 1

lim P R ∧n − ∧ n < 

n→

lim P Rn − n >n  = 0

n→

(3.12)

where p f = log px/fxpxdx is the Kullback–Leibler information number and G is the operator of mathematical expectation with respect to the measure G. Proof. It is easy to see that Rn = 1/n  F

 −1 

n−1 s=0

s and, hence,

 n

= F  R  = P R

(3.13)

n=0

where the letter equality holds whenever  <  = 1. Let n = X1      Xn . Straightforward calculation shows that the process Rn − n n ≥ 1 is a zero-mean  n -martingale. By the second and third conditions in (3.12), P R −  = 0 (see Theorem 4.1 in Klass, 1988), which along with (3.13) proves (3.10). For any t > 0 write t = ∧ t − 1. To prove inequality (3.11) we first apply Doob’s inequality to the martingale  t (see Theorem 3.4, Ch. 7, in Doob, 1953) and then use the trivial inequality y log+ y ≤ e−1 + y log y

210

Tartakovsky

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

to obtain e 1 + F  t log+  t  e−1 e 1 + e−1 + F  t log  t  ≤ e−1 e = 1 + e−1 + P log  t  e−1  

t  pXj  e log 1 + e−1 + P  = e−1 fXj  j=1

F  max n  ≤ 0≤n≤ t

By the Wald identity,  P

t 

log

j=1

pXj  fXj 

 = Ip fP t 

and, consequently, F



 max n ≤

0≤n≤ t

e 1 + e−1 + Ip fP t  e−1

Since the sequence maxn≤ t  t  t ≥ 1 is monotonically nondecreasing, letting t → , by the Beppo Levi theorem we get from here   e F max n ≤ 1 + e−1 − Ip f + Ip fP  0≤n≤ −1 e−1 On the other hand, by (3.10), F



 −1    max n ≤ F n = P 

0≤n≤ −1

n=0

Combining the last two inequalities, we get (3.11) and the lemma follows. Proof of Theorem 3.1. (i) For ck ≥ 0, introduce the following stopping times: (3.14) i c = inf n ≥ 1  min Zik n − ck  ≥ 0  i = 1     N k∈0N k=i

Assertion (i) follows immediately from the fact that ∗ ≤ i ≤ i a, ∗∗ ≤ i ≤ i b, and Lemma A.1 of Dragalin et al. (1999), which states that the stopping time i c is exponentially bounded for any nonnegative finite set of ck whenever i j > 0. (ii)

We first prove the second inequality in (3.5). Let Ri0 n = eri0 n . Observe

that i ≥ i0 = infn ≥ 1  Ri0 n ≥ eb0  and ∗∗ = min i ≥ min i0 1≤i≤N

1≤i≤N

N    = inf n ≥ 1  max Ri0 n ≥ eb0 ≥  = inf n ≥ 1  N −1 Ri0 n ≥ N −1 eb0 

1≤i≤N

i=1

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

Multidecision Quickest Change-Point Detection

211

 Write GN n = N −1 Ni=1 Ri0 n. It is easily verified that the statistic GN n − n n ≥ 1 is a zero-mean 0  n -martingale and that the second and third conditions in (3.12) hold for GN with  = 0 and = . See Tartakovsky and Veeravalli (2004). Therefore, by Theorem 4.1 in Klass (1988), 0 GN  = 0  and, by the definition of the stopping time , GN  ≥ N −1 eb0 on  < . This implies that 0 ∗∗ ≥ 0  ≥ eb0 /N

(3.15)

To prove the first inequality in (3.5) we note that ∗ a ≥ ∗∗ a and, thus, by (3.15), 0 ∗ ≥ ea0 /N

(3.16)

However, this bound can be improved with the help of Lemma 3.1 as follows. Let sij n = exp Zijs n. As before, we write simply ij n instead of 1ij n. It is clear that

i ≥ i0 = infn ≥ 1  Li0 n ≥ a0  and Li0 n = Zi0 n − min Zi0 s − 1 = log i0 n + log 1≤s≤n



 max 0i s 

0≤s≤n−1

The latter relationship and the definition of i0 yield i0  i0  max 0i s ≥ ea0 on  i0 <  0≤s≤ i0 −1

which implies     0 i0  i0  max 0i s = i max 0i s ≥ ea0  0≤s≤ i0 −1

0≤s≤ i0 −1

Since by Lemma 3.1 (see equation (3.11)) e

1 + e−1 + 0 i0  i0 − 1 i max 0i s ≤ 0≤s≤ i0 −1 e−1 it follows that   1 e − 1 a0 e + 0 i − 1 − 1/e  0 i ≥ 0 i0 ≥ 0 i e Combining the latter inequality with inequality (3.16), one obtains   e − 1 a0 1 e + 0 i − 1 − 1/e  0 i ≥ max ea0  0 i e which, similarly to (3.15) after simple algebra, yields the first inequality in (3.5). To prove asymptotic inequalities (3.7) we use the results of Pollak and Tartakovsky (2008) and Tartakovsky (2005), which state that the asymptotic

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

212

Tartakovsky

distributions of the suitably standardized stopping times i0 a0  and i0 b0  are asymptotically exponential as a0  b0 → . More specifically, i0 e−a0 vi2 i 0 and i0 e−b0 vi converge weakly to Exponential(1) and the moment-generating functions to that of Exponential(1) when the thresholds go to infinity. Therefore,     ea0 eb0 0 min i0 ∼ N 2  0 min i0 ∼ N as a0  b0 →  1≤i≤N 1≤i≤N i=1 vi i 0 i=1 vi which along with the fact that 0 ∗ ≥ 0 min1≤i≤N i0  and 0 ∗∗ ≥ 0 min1≤i≤N i0  proves (3.7). The proof of (ii) is complete. (iii)

First, we observe that

Lji n = max Zjis n = Zji n − min Zji s − 1 = −Zij n + max Zij s 1≤s≤n

1≤s≤n

0≤s≤n−1

and, by the Wald likelihood ratio identity, ij∗ 1 = i  ∗ = j  = j ij  j  ∗ = j   which yields     ij∗ 1 = j exp − max Zjis  j  max ij s ∗ = j   0≤s≤ j −1

1≤s≤ j

By the definition of j , on  ∗ = j <    max Zjis  j  ≥ ai ⇒ exp − max Zjis  j  ≤ e−ai

1≤s≤ j

1≤s≤ j

and, hence, ij∗ 1 ≤ e−ai j



 max  n  ij ∗

0≤n≤ −1

Next, using Lemma 3.1 with  = i ,  = j , we obtain the upper estimate j



e −1 ∗ ∗

1 + e − i j + i ji  i  max ij n ≤ min 0≤n≤ ∗ −1 e−1 



which together with the previous inequality shows that e e e+1 ij∗ 1 ≤ e−ai min i ∗  i ji ∗ + − i j  e−1 e−1 e−1 This completes the proof of (3.8). Thus, it remains to prove (3.9). Observe that n−1 e

rji n

=

ij s ij n

s=0

(3.17)

Multidecision Quickest Change-Point Detection ∗∗ 

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

and erji 

ij∗∗ 1

213

≥ ebi on the set  ∗∗ = j . Therefore,

= i 

∗∗

= j  = j ij j  ∗∗ =j   ≤ e

−bi

j

∗∗ −1



ij s = e−bi i ∗∗ 

s=0

where the latter equality follows from Lemma 3.1 (see equation (3.10)). The proof of (iii) is complete. The following theorem establishes asymptotic performance of the sequential detection-identification procedures ∗ and ∗∗ in terms of positive moments of the detection delay in the worst-case scenario. For c = c0  c1      cn  with ci > 0, i ∈ 0 N , write    

ck ck c0 = max  max  i = 1     N i c = max i k i 0 k∈1N i k k∈0N k=i

k=i

Theorem 3.2. Assume that 0 < i j < . (i) For all i ∈ 1 N ,  m  ∗  ∼ Dm  ∗  ∼ i am as amin →  D i i

(3.18)

 m  ∗∗  ∼ Dm  ∗∗  ∼ i bm as bmin →  D i i

(3.19)

where amin = mink∈0N ak and bmin = mink∈0N bk . (ii) For all i ∈ 1 N ,    e min 1 i k 1 + o1 as amin →  (3.20) i∗ 1 ≤ e−ai i a e−1 k∈1N k=i

i∗∗ 1

≤ N − 1e−bi i b1 + o1 as bmin → 

(3.21)

This theorem will be proved through several lemmas of independent interest. To begin, we note that i a ≤ i a and ∗∗ a ≤ ∗ a

(3.22)

Thus, upper bounds for the moments of the stopping time ∗∗ that we derive below follow from that of ∗ . Recall that the stopping time i c is defined by (3.14). In addition, we will need the following Markov times:

ij cj  = infn ≥ 1  Lij n ≥ cj 

(3.23)

ij cj  = infn ≥ 1  rij n ≥ cj  i ∈ 1 N  j ∈ 0 N  i = j

(3.24)

Obviously, for j = 0 the Markov time i0 c0  corresponds to the CUSUM detection procedure and the Markov time i0 c0  to the Shiryaev-Roberts procedure, both for detecting the change 0 → i based on the observations Xi 1 Xi 2    from the ith population.

214

Tartakovsky

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

Lemma 3.2. Let 0 < i j < . (i) For all i ∈ 1 N and all m > 0, i i cm ∼ i cm as cmin → 

(3.25)

(ii) For all j = i and all m > 0,  i ij cj  ∼ i ij cj  ∼ m

m

cj i j

m as cj → 

(3.26)

Proof. (i) This follows immediately from Theorem 4.1 in Dragalin et al. (1999). (ii) For j = 0, the asymptotic relation (3.26) has been proven in Tartakovsky (1998d) (see also Tartakovsky, 2005). For j ∈ 1 N the proof is identical. Lemma 3.3. Let m ≥ 1. If 0 < i j < , then i k cm ≥ i k cm ≥ eci for any k = i i i c ∼ i i c ∼ i c m

m

m

as cmin → 

(3.27) (3.28)

Proof. Taking into account that k ≥ ki , replacing 0 by i and i by k in the proof of Theorem 3.1(ii), and applying the same argument, one obtains that i k ≥ i k ≥ i ki ≥ eci (for any k = i). Hence, (3.27) follows from Jensen’s inequality. We first note that since i ≤ i , applying Lemma A.1 of Dragalin et al. (1999), we obtain that i i cm <  for all 0 ≤ ci < . Now we use Lemma 3.2(i) and (3.22) to obtain the upper bound i i cm ≤ i i cm ≤ i cm 1 + o1 To prove (3.28) or, equivalently, the convergence of moments    

c m  c m i i → i i → 1 as cmin →  i c i c it remains to show that this upper bound is also the lower bound (asymptotically). Since i c ≥ ij cj  and i c ≥ ij cj  for any j ∈ 0 N (j = i), it follows from Lemma 3.2(ii) that m  cj m m i i c ≥ i i c ≥ 1 + o1 for all j = i as cj →  i j which completes the proof of the lemma. Now everything is prepared to prove Theorem 3.2. Proof of Theorem 3.2. First consider the detection procedure ∗ . Since, by (3.8) and (3.28), i  ∗ = i  = i∗ 1 ≤ N − 1e−ai i ∗ ≤ N − 1e−ai i i ≤ N − 1e−ai i a1 + o1 → 0

Multidecision Quickest Change-Point Detection

215

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

it is easily seen that i  ∗ m = i  i m 1 + o1 as amin →  and, hence, by (3.28), ∗ m i  ∗ m ∼ Dm i   ∼ i a 

(3.29)

which proves (3.18) for Dm . i Next, since i∗ 1 = j=i ij∗ 1, the asymptotic inequality (3.20) follows immediately from Theorem 3.1(iii) (see equation (3.8)) and (3.29). This also implies  m  ∗  ∼  m . Indeed, by (3.20), ∗ 1 → 0 as amin →  and, hence, D (3.18) for D i i i m ∗ Di  . Thus, the proof of both parts (i) and (ii) is complete for Procedure 1. For Procedure 2 the proof is identical. The details are omitted. Remark 3.1. The asymptotic upper bounds (3.20) and (3.21) obtained for the probabilities of errors in point  = 1 can be generalized for arbitrary  ≥ 1. In particular, a quite cumbersome argument shows that for Procedure 1, i∗  ≤ N − 1e−ai i aCi 1 + o1 as amin → 

(3.30)

and an analogous bound holds for Procedure 2. In general the problem of computing constants Ci  ≥ 1 is quite difficult and requires a big investment of effort. Constants Ci  are, of course, the functions of the model and are different for different forms of p.d.f.’s pi . For this reason, in the next section we replace ˜ the initial class T  with another class T  that confines the probabilities of errors 1i d = i in place of sup i d = i  ≥ . A similar approach has been used by Dragalin (1995) and Lai (2000). However, we conjecture that the upper bounds with Ci  = 1 are very conservative for i∗ 1 and in fact hold for every  ≥ 1. This conjecture is supported by Monte Carlo simulations presented in Section 6. Remark 3.2. Nikiforov (2000, 2003) suggested a different CUSUM-based detection procedure, where the statistics Li n − Lj n are used instead of the statistics Lij n, and proved the upper bound on the supremum probability. This upper bound turns out to be even more conservative. Also, an argument in Remark 4.2 and results of simulations presented in Section 6 show that the detection-identification procedures ∗ and ∗∗ perform better, especially when the detection threshold is smaller than the identification thresholds, i.e., a0 < ai and b0 < bi for i = 1     N . 4. ASYMPTOTIC OPTIMALITY In this section we prove that specially designed detection-identification procedures ∗ and ∗∗ are asymptotically optimal in the class of procedures ˜ T  =   0 ≥ T i d = i ≤ i for all i = 1     N 

(4.1)

We emphasize the difference between this class and the original class T  defined in (1.3). The class (4.1) confines the probabilities of errors i d = i  ≥  at  = 1,

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

216

Tartakovsky

while the class (1.3) confines sup i d = i  ≥ . Unfortunately, we were unable to estimate sup i∗  and sup i∗∗  with acceptable accuracy to prove asymptotic optimality in the original class. However, see Remark 3.1.  m  , i ∈ 1 N in the We first derive the asymptotic lower bounds for inf ∈T D i original class T , and then show that these bounds are sharp for Procedures 1 ˜ and 2 (but in the class T ). For i ∈ 1 N , define  

 log j  log T i T  = max  max  (4.2) i 0 j∈1N i j j=i

Also, write for brevity max = maxk∈1N k . Theorem 4.1. Let 0 < i j < . Then, for all m > 0 and i ∈ 1 N , m m inf Dm i   ≥ inf Di   ≥ i T  1 + o1 as T →  max → 0 (4.3)

∈T

∈T

The proof is divided into two stages, which are described in the following two lemmas. Lemma 4.1. Let 0 < i j < . Then, for all m > 0,    log j  m m m  1 + o1 as max → 0 inf Di   ≥ inf Di   ≥ max ∈ ∈ i j j∈1N

(4.4)

j=i

where  =   sup≥1 i d = i  ≥  ≤ i  i = 1     N detection procedures that confines only the probabilities of errors.

is

the

class

of

Proof. Write

¯ i  =

 maxj=i  log j /i j

Chebyshev’s inequality applies to obtain i ¯ i m ≥ m i ¯ i  >  for any m > 0 and  > 0 Suppose lim

inf i ¯ i  >  = 1 for every 0 <  < 1

max →0 ∈

Then it follows from the previous inequality that lim

inf i ¯ i m ≥ 1 for any m > 0

max →0 ∈

which obviously implies  inf Dm i   ≥ max

∈

j∈1N j=i

 log j  i j

m 1 + o1 as max → 0

(4.5)

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

Multidecision Quickest Change-Point Detection

217

m Since Dm i   ∼ Di   as max → 0 for any procedure ∈ , this last inequality yields (4.4). Thus, to prove (4.4) we have only to establish the validity of (4.5). Let, for some positive finite L, iL = d = i ∩  ≤ L. Obviously, for any L ≥ 1 and B > 0, j d = i = i d=i ji   ≥ i iL Zij   L, it follows that   i  > L ≥ i d = i − j d = ieB − i max Zij n ≥ B 1≤n≤L   ≥ 1 − i − j eB − i max Zij n ≥ B for any ∈  1≤n≤L

From this, assuming B = 1 + i jL and L = Lj with Lj being a greatest integer that does not exceed 1 −  log j /i j, one obtains that for any ∈  i  > 1 −  log j /i j   2 ≥ 1 − i − j − i max Zij n ≥ 1 + i jLj  1≤n≤Lj

Since this inequality holds regardless of the choice of a specific sequential procedure from the class , we may also write inf i  > 1 −  log j /i j

∈

 2

≥ 1 − i − j − i

 1 max Zij n ≥ 1 + i j  Lj n≤Lj

(4.6)

By the strong law of large numbers (which holds due to our assumption i j < ), n−1 Zij n → i j i -a.s. and, hence, for all j = i,   1 lim i max Z n ≥ 1 + i j = 0 for every  > 0 (4.7) max →0 Lj 1≤n≤Lj ij (The details of the proof of equation (4.7) may be found in the proof of Lemma 2.1 of Tartakovsky, 1998a). The relations (4.6) and (4.7) show that for any 0 <  < 1, inf i  > 1 −  log j /i j → 1 for all j ∈ 1 N  j = i as max → 0

∈

from which (4.5) follows in an obvious manner. This completes the proof of (4.4).

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

218

Tartakovsky

The following lemma is of interest on its own merit. It will be used in the next section to prove asymptotic optimality of the detection procedures ∗ and ∗∗ as well as the maximum likelihood detection procedures (defined below) in the class T =   0 ≥ T, which confines only ARL2FA  = 0 . Lemma 4.2. Let 0 < i j < . Then, for all positive m,  inf

∈T

Dm i  



log T i 0

m 1 + o1 as T → 

(4.8)

Proof. Consider the case where a change may occur only in the ith population. Then it follows from Theorem 3.1 of Tartakovsky (1998d) that the right-hand side of (4.8) is the lower bound for the minimax risk Dm i   of any detection procedure in the class T as T → . Obviously, this lower bound cannot be improved in the case where the location of the change is a priori unknown. Thus, the desired lower bound follows. An alternative, more detailed proof can be constructed based on the argument analogous to that given in Tartakovsky (2005). m Proof of Theorem 4.1. Since T  = T ∩  and Dm i   ∼ Di   as max → 0 for any ∈ T , the combination of (4.4) and (4.8) yields the desired lower bound (4.3). Now everything is prepared to prove asymptotic optimality of the procedures ˜ . ∗ and ∗∗ with specially chosen thresholds in the class T Theorem 4.2. Let m ≥ 1. Assume that 0 < i j < . (i) Let a0 be a solution of the equation a0 = logNT + log κN a0 

(4.9)

where κN a0  is defined in (3.6), and let b0 = logNT

(4.10)

ARL2FA ∗  ≥ T and ARL2FA ∗∗  ≥ T for every T > 1

(4.11)

Then

If  a0 = log T

N 

 vi2 i 0

i=1

 and b0 = log T

N 

 vi 

(4.12)

i=1

where vi is defined in (3.4), then ARL2FA ∗  ≥ T1 + o1 and ARL2FA ∗∗  ≥ T1 + o1 as T →  (4.13)

Multidecision Quickest Change-Point Detection

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

(ii)

219

Let ai and bi , i = 1     N , be solutions of the equations    N −1 e 1 −1 ai e i a = min 1 i j  i e − 1 N − 1 j∈1N

(4.14)

j=i

ebi i b−1 =

N −1  i

(4.15)

Then i∗ 1 ≤ i 1 + o1 and i∗∗ 1 ≤ i 1 + o1 as i → 0

(4.16)

(iii) Let thresholds ai and bi be chosen from (4.9), (4.14) and (4.10), (4.15), respectively. Then, for all i ∈ 1 N ,  m   ∼ D  m  ∗  ∼ D  m  ∗∗  ∼ i T m as T →  max → 0 inf D (4.17) i i i ˜ ∈T

and the same is true for Dm i . Proof. From inequalities (3.5) in Theorem 3.1 we immediately obtain inequalities (4.11) if a0 and b0 are chosen according to (4.9) and (4.10), respectively. Asymptotic inequalities (4.13) follow directly from (3.7) in Theorem 3.1. If one chooses thresholds ai and bi from equations (4.14) and (4.15), then inequalities (4.16) follow from asymptotic inequalities (3.20) and (3.21) in Theorem 3.2. Thus, it remains to prove (iii). Straightforward computation shows that if ai and bi are taken as in (4.9), (4.14) and (4.10), (4.15), then for any m > 0,

i am ∼ i bm ∼ i T m as T →  max → 0 which along with Theorem 3.2(i) yields  m  ∗  ∼ D  m  ∗∗  ∼ i T m  D i i Now (4.17) follows from Theorem 4.1. Corollary 4.1. a0 = logNT implies ARL2FA ∗  ≥ T . If maxi∈1N 0 i < e − 1/e, then   N e  a0 = log T 0 i−1 implies ARL2FA ∗  ≥ T1 − o1 as T →  e − 1 i=1 (i)

(ii)

Let, for i = 1     N ,    log j  N −1 log T ai = log max  max i i 0 j∈1N i j j=i





 e 1 min 1 i j  e − 1 N − 1 j∈1N bi = log



j=i

 log j  N −1 log T  max max i i 0 j∈1N i j j=i





(4.18)

(4.19)

220

Tartakovsky

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

Then i∗ 1 ≤ i 1 + o1 and i∗∗ 1 ≤ i 1 + o1 as i → 

(4.20)

(iii) Let a0 = b0 = logNT and let ai and bi be as in (4.18) and (4.19), respectively. Then (4.17) holds. Note also that we expect that the thresholds given by (4.12) that account for overshoots and a renewal property of the CUSUM statistic will provide much better approximation to the ARL2FA. This is confirmed by Monte Carlo simulations (see Section 6). We would like to emphasize once more that the reason we replaced the initial ˜ class T  with class T  is that in general we are unable to calculate the constants Ci  (for  ≥ 2) in (3.30) to confine sup i d = i  ≥ . However, as we mentioned in Remark 3.1, we expect that the upper bounds with Ci  = 1 are very conservative for i∗ 1 and i∗∗ 1 and hold for every  ≥ 1, i.e., for sup i d = i  ≥ , too. This conjecture is supported by Monte Carlo simulations presented below. We therefore may conclude that the asymptotic optimality result established in Theorem 4.2(iii) should hold for the initial class T . However, a rigorous proof of this fact is still an open problem. Remark 4.1. Let T ij  =   0 ≥ T ij   ≤ ij  i j ∈ 1 N  i = j denote the class of procedures for which all partial probabilities of errors ij   = sup i d = j ≥  do not exceed the predefined numbers ij . The proof of Theorem 4.1 shows that the following asymptotic lower bounds hold for all i j ∈ 1 N , i = j:    m  log ji  log T m m  inf Di   ≥ max 1 + o1 inf Di   ≥  max ∈Tij  ∈Tij  i 0 j∈1N i j j=i

as T →  max → 0

(4.21)

Procedures ∗ and ∗∗ with thresholds aji and bji (in place of aj and bj ) chosen so that ij∗ and ij∗∗ are less than or equal to ij are then asymptotically optimal. In particular, one may set 

  log ji  log T −1 aji = bji = log ji max  max  i 0 j∈1N i j j=i

For m = 1 and ij = , i j = 1     N , i = j, the lower bound (4.21) yields the lower bound for the minimax ADD in the class T  =   0 ≥ T maxij∈1N i=j sup≥1 i d = j  ≥  ≤ , inf

max sup i  −  ≥ 

∈T i∈1N 1≤ 1/. Specifically, it can be shown that in the multipopulation problem considered in the present paper, for Nikiforov’s test nik =  nik  dnik  the ADDi = i  nik − 1 is asymptotically equal to ADDi  nik  ∼

1 maxlog T  log  as T →   → 0 i 0

(4.23)

Comparing with the lower bound (4.22) shows that in the multipopulation problem considered, Nikiforov’s test is asymptotically optimal if, and only if, log T >  log . Indeed, while in the proposed multipopulation CUSUM test ∗ the statistics Lij n are used, in Nikiforov’s test these statistics are replaced with the difference of CUSUM statistics Li0 n and Li0 n, i.e., with the statistics Gij n = Li0 n − Lj0 n. Obviously, for sufficiently large n, the mean trend of the statistic Gij n under hypothesis H1i is 1i Gij n ≈ i 0n + ci − cj ≈ i 0n, since 1i Li0 n ≈ i 0n + ci and 1i Lj0 n = 0 Lj0 n ≈ cj , where ci and cj are positive constants. The latter is true since 1i Zj0 1 = −0 Z0j 1 = −0 j < 0. On the other hand, 1i Lij n ≈ i jn = i 0 + j 0n. Therefore, the optimal rate  log / minj=i i j (which is attained for the procedures ∗ and ∗∗ ) is replaced with the nonoptimal rate  log /i 0 in Nikiforov’s procedure. As a result, the latter procedure is not asymptotically optimal if T < 1/, i.e., when the detection threshold is smaller then the identification threshold. The above argument is confirmed by Monte Carlo simulations in Section 6. 5. ASYMPTOTIC OPTIMALITY IN CLASS T We now ignore the constraints related to the probabilities of misclassification and focus on the class T =   0 ≥ T, which confines only the ARL2FA. Despite the fact that absolute values of the error probabilities are not taken into account, we still consider detection procedures for which the misclassification rate is small when T → . To be specific, we consider such ∈ T, for which sup i d = i  ≥  → 0 as T → . This additional restriction is quite reasonable. Obviously, this constraint holds for the procedures ∗ and ∗∗ . In what follows, the class of such ˜ detection procedures will be denoted by T. Since we are interested in controlling only the FAR, one needs to pick only one threshold in the procedures ∗ and ∗∗ . Thus, we take a0 = aj = a and b0 = bj = b (j ∈ 1 N ), and the Markov times i and i become    

i = inf n  min Lij n ≥ a  i = inf n  min rij n ≥ b  j∈0N j=i

j∈0N j=i

With the help of Theorem 3.2(i), Theorem 4.2(i), and Lemma 4.2 it is easily shown that if we take a = b = logNT, then 0 ∗ ≥ 0 ∗∗ ≥ T ; i∗ 1 i∗∗ 1 ≤ N − 1 logNT/ NT i 01 + o1; and   log T m m m ∗ m ∗∗    inf Di   ∼ Di   ∼ Di   ∼ as T →  (5.1) ˜ i 0 ∈T

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

222

Tartakovsky

˜ i.e., these procedures are asymptotically optimal in the class T for large T . Obviously, the same is true with respect to Dm . i Along with the above detection procedures, consider the “maximum likelihood” ML ML ML ML detection procedures ML =  ML ∗ ∗  d∗  and ∗∗ =  ∗∗  d∗∗  of the form   ∗ ML = inf n  max L n ≥ h  d∗ML = i if max Lk0  ML

ML k0 ∗ ∗  = Li0  ∗  

k∈1N

k∈1N



∗∗ ML ML

ML  d∗∗ = i if max rk0  ML ∗∗ = inf n  max rk0 n ≥ h ∗∗  = ri0  ∗∗  k∈1N

k∈1N

where h∗ and h∗∗ are nonnegative thresholds. It is seen that these procedures have an entirely different structure compared to the previous procedures. Thus, it is reasonable to ask whether or not these detection procedures preserve asymptotic optimum. The following theorem shows that the result similar to (5.1) holds for the maximum likelihood detection procedures ML and ML ∗ ∗∗ . See also Tartakovsky 1 (1994) for m = 1 and the risk function Di  . Theorem 5.1. Let 0 < i j < ,  log T Ni=1 vi . Then

h∗ = log T

N i=1

vi2 i 0

and

ML ∗ ∗ 0 ML ∗ h  ∼ 0 ∗ h  ∼ T as T → 

h∗∗ =

(5.2)

and for all i ∈ 1 N and m ≥ 1,  m  ML  ∼ D  m  ML  ∼  m   ∼ D inf D i i i ∗ ∗∗

˜ ∈T



log T i 0

m as T → 

(5.3)

and the same is true with respect to Dm i . ∗ ∗ ∗∗ Proof. We first observe that ML and ML ∗ h  = mini∈1N i0 h  ∗∗ h  = ∗∗ ∗ ∗∗ mini∈1N i0 h , where i0 h  is Page’s detection procedure and i0 h  is the Shiryaev–Roberts detection procedure (see equations (3.23) and (3.24) for j = 0). By Pollak and Tartakovsky (2008) and Tartakovsky (2005), the distributions of

i0 he−h vi2 i 0 and i0 he−h vi converge weakly to Exponential(1) as h →  (along with the moment generating functions). Since the corresponding stopping times are independent for different populations i, it follows that

  0 min i0 h ∼ N 1≤i≤N

  eh eh  0 min i0 h ∼ N as h →  2 1≤i≤N i=1 vi i 0 i=1 vi

  which implies (5.2) if h∗ = log T Ni=1 vi2 i 0 and h∗∗ = log T Ni=1 vi . ML Asymptotic relations (5.3) are proved only for procedure ML ∗ . For ∗∗ the proof is identical. ∗ ∗ Since ML ∗ h  ≤ i0 h , it follows from Lemma 3.2(ii) that  ∗ m i ML ∗ h 



h∗ i 0

m 1 + o1 as h∗ → 

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

Multidecision Quickest Change-Point Detection

223

where the right side is clearly equal to log T/i 0m 1 + o1 when h∗ =  ML ML − 1m , this shows that log T Ni=1 vi2 i 0. Since Dm i  ∗  = i ∗   log T m m ML 1 + o1 as T →  Di  ∗  ≤ i 0 which along with the reverse inequality (4.8) of Lemma 4.2 yields   log T m m ML   ∼ D   ∼ 1 + o1 as T →  inf Dm i i ∗ ∈T i 0 ML m ˜ Since Dm i   ∼ Di   for any ∈ T as T → , it remains to prove that i  = ML ML i d∗ = i  ∗ ≥  → 0 as T → , which follows from Tartakovsky (1994). ML Remark 5.1. Since ML ∗ h ≥ ∗∗ h and   h ≥

ML ∗∗ h = min i0 h = inf n ≥ 1  max Ri0 n ≥ e i∈1N



i∈1N

= inf n ≥ 1  N −1



 Ri0 n ≥ eh /N 

i∈1N

and since N

N −1

i=1

Ri0 n − n is a 0 -martingale with mean 0, it follows that

−1 ≥ 0 ML 0 ML ∗ ∗∗ ≥ 0  = N

N 

Ri0  ≥ eh /N

i=1

Therefore, h∗ = h∗∗ = hT N = logNT ML 0 ∗∗ hT N ≥ T for every T > 1 and N ≥ 1.

implies

0 ML ∗ hT N ≥

for Remark 5.2. More generally, we may select different thresholds h∗i and h∗∗ i different populations, which leads to the following multichart CUSUM and Shiryaev-Roberts detection procedures: ∗ = min i0 h∗i  i0 h∗i  = infn  Li0 n ≥ h∗i  1≤i≤N

∗∗ ∗∗ = min i0 h∗∗ i0 h∗∗ i  i  = infn  ri0 n ≥ hi  1≤i≤N

The decision d = j is made for the population j for which the minimum is attained (i.e., ∗ = j0 h∗j ). Since there are N thresholds and only one constraint 0 = T , the solution is not unique and additional constraints are needed. One way is to balance ARL2FA for all populations, i.e., to require ∗∗ 0 10 h∗1  = · · · = 0 N 0 h∗N  and 0 10 h∗∗ 1  = · · · = 0 N 0 hN 

(at least approximately). Using asymptotic exponentiality of stopping times i0 h∗i  and i0 h∗∗ i , we obtain that ∗ and ∗∗ are also asymptotically exponential and that (for large thresholds) 0 ∗ ∼ N

1

2 −h∗i i=1 vi i 0e

1  −h∗∗ i i=1 vi e

 0 ∗∗ ∼ N

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

224

Tartakovsky

∗ Thus, selecting h∗i = log vi2 i 0NT and h∗∗ i = logvi NT implies 0 i0 hi  ∼ ∗ ∗∗ ∗∗ 0 i0 hi  ∼ NT and 0  ∼ 0  ∼ T . With these thresholds the multichart CUSUM and Shiryaev-Roberts tests are also asymptotically optimal (i.e., equation 5.3 holds), which can be proven in just the same way as above.

Thus, (5.1) and Theorem 5.1 show that the two entirely different structures of the detection procedures ensure the first-order asymptotic optimality property. At the same time the maximum likelihood detection procedures ML and ML ∗ ∗∗ are simpler in implementation and simulation. However, in general the procedures ML and ML ∗ ∗∗ are not optimal in the class T  and even cannot control both the misclassification rate and the rate of false alarms (only one of these rates can be controlled). The reason is that the procedures ML and ML monitor ∗∗ ∗ ∗ different populations separately, while the former procedures and ∗∗ monitor the observations from all populations in an optimal manner by using the optimal sequential test of multiple hypotheses. See Tartakovsky (1994) for further discussion.

6. MONTE CARLO SIMULATIONS As an example, consider the multipopulation version of the standard problem of detecting a change in the mean value of the Gaussian sequence. To be precise, let Xn = X1 n     XN n where, under hypothesis Hi , Xi n = n≥ + i n and Xj n = j n for all j = i. Here l nn≥1 and k nn≥1 are (mutually independent) i.i.d.  Gaussian sequences,  0 2 , and  = 0 is a known number. Write Si n = nk=1 Xi k and q = 2 / 2 . It is easily seen that Zi0 n =

  S n − /2 Zij n = 2 Si n − Sj n 2 i

i 0 = 0 i = q/2 for i ∈ 1 N  i j = q for i j ∈ 1 N  i = j Due to the symmetry it is reasonable to use only two sets of thresholds a0 , b0 (detection) and a1 , b1 (identification) in the procedures ∗ and ∗∗ . It thus follows from Theorem 3.1(ii) and Theorem 3.2 that

1 2 e−1 1 0 ∗ ≥ ea0 max 1 + q/2 − 1 − 1/ee−a0  0 ∗∗ ≥ eb0  (6.1) N q e N ea0 eb0 ∗∗

≥ 1 + o1  1 + o1 as a0  b0 →  0 Nv2 q/2 Nv     e 2a0 a1 i∗ 1 ≤ N − 1 min 1 q e−a1 max  1 + o1 as amin →  e−1 q q   2b0 b1 −b1 ∗∗ i 1 ≤ N − 1e max  1 + o1 as bmin →  q q  m   m  ∗  ∼ Dm  ∗  ∼ max 2a0  a1 D as amin →  i i q q  

2b0 b1 m m ∗∗ m ∗∗  Di   ∼ Di   ∼ max as bmin →   q q 0 ∗ ≥

(6.2) (6.3) (6.4) (6.5) (6.6)

Multidecision Quickest Change-Point Detection

225

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

Here the constant v = vq is given by vq =

 √    qn 1 2 exp −2  −  q n 2 n=1

where x is the standard normal distribution function. This constant is tabulated in Tartakovsky et al. (2003) for q = 010120. The value of vq is close to 1 only when q is very small. For example, vq = 0832 and qvq2 /2 = 00346 for q = 01 and vq = 056037 and qvq2 /2 = 0157 for q = 1. It is therefore clear that the asymptotic lower bounds (6.2) for the ARL2FA should be much more accurate than the lower bounds (6.1). Obviously, for 2a0 > a1 the major contribution to the FAR is due to the statistics Li0 n, since, with high probability, i = i0 . It is thus expected that the distribution of the stopping time ∗ will be approximately exponential at least for low FAR. The results of Monte Carlo simulations presented below support this fact. A Monte Carlo experiment was performed for N = 2 and the multidecision CUSUM (MD-CUSUM) test ∗ . We used 5 · 105 replications for estimating ARL2FA ∗  = 0 ∗ , probability of error ∗  = 1 d∗ = 2 ∗ ≥  and average detection delay ADD = 1  ∗ −  ∗ ≥ . To guarantee ARL2FA ≥ T and the probability of misidentification ≤ , detection and identification thresholds a0 and a1 were chosen according to asymptotic bounds (6.2) and (6.3), i.e., assuming that 2a0 ≥ a1 the thresholds are chosen as follows:  a0 N T = log

Nqvq2 T 2





2N − 1 log  a1 N T  = log q

 N2 vq2 T   2



Sample Monte Carlo results are given in Figures 1–4. In Figure 1 we show the ARL2FA vs. detection threshold a0 for several values of identification threshold a1 . It is seen that the theoretical lower estimate (6.2) is accurate as long as 2a0 ≥ a1 . However, when a1 increases and becomes larger than a0 the accuracy of the theoretical approximation is poor, which can be expected. The lower bound ea0 /2 is inaccurate: e.g., for a0 = 5 it gives 74 versus 480. As we discussed above, intuitively we expect that the distribution of the stopping time ∗ under the no-change hypothesis is asymptotically exponential as long as 2a0 ≥ a1 , in which case the ARL2FA is indeed a proper measure of false alarms. This also allows for evaluating PFA in a fixed window k k + : 0  ∗ < k +  ∗ ≥ k ≈ 1 − 1 − 1/ARL2FA , which is important for a variety of applications. The QQ-plots (theoretical vs. experimental quantiles) in Figure 2 show that the exponential approximation is very good whenever 2a0 ≥ a1 even for moderate FAR (ARL2FA ≈ 600 and less down to 150). On the other hand, in the opposite case of 2a0 < a1 , this approximation fails, as can be expected. Figure 3 depicts the probability of error ∗ vs. threshold a1 and change point . The plot in Figure 3(a) confirms that the false alarm and misidentification rates cannot be made completely independent—for a fixed a0 (i.e., FAR) the probabilities of misidentification cannot be made arbitrary high. Figure 3(a) demonstrates that the upper bound (6.3) is fairly inaccurate (conservative) even for high threshold values a1 . For a0 = a1 = 5 this bound gives ∗ 1 = 0067, while ∗ 1 ≈ 0002. The supremum of the probability of error is about five times higher than 1∗ 1 (sup ∗  ≈ 001) in this particular experiment, and it stabilizes quite rapidly

Tartakovsky

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

226

Figure 1. ARL2FA vs. a0 and a1 .

(for  = 80−100), which confirms the conjecture of Remark 3.1: the upper bound for 1∗ 1, which is equal to 0067, works for sup ∗  as well. Moreover, it is conservative even for sup ∗ . This result confirms the discussion in Remark 3.1 that the proposed detection-identification tests are in fact optimal in the original ˜ class T  but not only in T . Figure 4(a) shows the Monte Carlo estimate of the ADD vs. identification threshold a1 in the range 1 5 for detection threshold a0 = 5. The ADD is

Figure 2. QQ-plots (experimental vs. exponential).

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

Multidecision Quickest Change-Point Detection

227

Figure 3. Probability of errors vs. threshold a1 and change point .

approximately constant (between 9.35 and 9.45), which can be expected from the theory. The asymptotic estimate given in 6.5 (for m = 1) is quite accurate (ADD1 = 9). Figure 4(b) depicts the ADD vs change point . It is seen that a stationary regime is attained very rapidly, already for  = 20−30. As we mentioned in Remark 4.2, the detection-identification test proposed by Nikiforov (2000) is expected to be asymptotically optimal only if the detection threshold a0 is bigger than the identification threshold a1 . Indeed, since 1 0 = q/2 and 1 Z20 1 = 0 Z20 1 = −q/2 < 0, using (4.23) we obtain that the ADD of Nikiforov’s test is ADD1  nik  ≈

2 maxa0  a1  q

Comparing with (6.5) shows that for a1 > a0 the ADD of Nikiforov’s test is ADD1 ≈ 2a1 /q, while for test ∗ the average detection delay ADD1 ≈ 2a0 /q for a0 < a1 < 2a0 and ADD1 ≈ a1 /q for a1 ≥ 2a0 , i.e., in the latter case the ADD of nik is twice as large. This is confirmed by Monte Carlo simulations. Figure 5 compares the ADD for these two detection-identification tests when a0 = 5 and a1 = 15. It is seen that the ADD of the proposed MD-CUSUM test ∗ varies between 15 (for  = 1) and 12 (for large ), while the ADD of Nikiforov’s test varies between 30 (for  = 1) and 29 (for large ).

Figure 4. ADD vs. threshold a1 and change point .

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

228

Tartakovsky

Figure 5. ADD vs. change point  for a0 = 5 and a1 = 15 for MD-CUSUM ∗ and Nikiforov’s detection tests.

7. CONCLUDING REMARKS 1. The same argument applies to show that the proposed procedures are  m and Dm in the class  asymptotically optimal with respect to D T  in a “scalar” i i multihypothesis isolation problem where X is either a scalar process or a vector and all components of this vector change their properties simultaneously from 0 to i , i = 1     N , at an unknown point in time . This problem was considered by Nikiforov (1995, 2000, 2003) and Lai (2000) under slightly different constraints (see, e.g., Remark 4.1). 2. Let   =   PFA   ≤  i   ≤ i  i ∈ 1 N  be the class of detection-identification procedures for which the local probability of false alarm PFA   = supk 0  ≤ k +  ≥ k and the probabilities of errors i   = sup i  = i ≥  do not exceed tolerance levels  and i , respectively. Using the developed methods along with Tartakovsky (2005), it can be shown that both detection-identification procedures ∗ and ∗∗ minimize all positive moments of the detection delay i  − m  ≥  uniformly for all  ≥ 1 in the class  . This problem will be considered elsewhere. 3. The results of the present paper can be generalized to cover general, noni.i.d. stochastic models using the methods developed in Lai (1995, 1998, 2000), Dragalin et al. (1999), Tartakovsky (1998a,d), Tartakovsky and Veeravalli (2004, 2005), and Baron and Tartakovsky (2006). These methods rely on the strong law of large numbers for the LLR processes, rates of convergence in the strong law, and certain extensions. 4. While the proposed change detection-identification procedures are proven to be asymptotically optimal when the false alarm and misidentification rates are low, the Mone Carlo simulations presented in Section 6 show that the false alarm

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

Multidecision Quickest Change-Point Detection

229

and misidentification rates cannot be made completely independent—the detection threshold affects the misidentification rate and vice versa. The same is true for other procedures considered by Nikiforov (2000, 2003) and Lai (2000). Thus, in this respect the problem of detection-identification (isolation) is still open. A different approach is perhaps needed (maybe multistage) to make these rates independent. 5. A general Bayesian theory of change detection-identification may be developed based on the methods used in Tartakovsky and Veeravalli (2005) and Baron and Tartakovsky (2006). This interesting problem will be considered somewhere else.

ACKNOWLEDGMENTS The research was supported in part by U.S. Office of Naval Research Grants N00014-99-1-0068 and N00014-95-1-0229 and by U.S. Army Research Office MURI Grant W911NF-06-1-0094 at the University of Southern California. We would like to thank Alexey Polunchenko for the help in Monte Carlo simulations.

REFERENCES Armitage, P. (1950). Sequential Analysis with More Than Two Alternative Hypotheses, and Its Relation to Discriminant Function Analysis, Journal of Royal Statistical Society, Series B 12: 137–144. Baron, M. and Tartakovsky, A. G. (2006). Asymptotic Bayesian Change-Point Detection Theory for General Continuous-Time Models, Sequential Analysis 25: 257–296. Basseville, M. and Nikiforov, I. V. (1993). Detection of Abrupt Changes: Theory and Applications, Englewood Cliffs: Prentice Hall. Doob, J. L. (1953). Stochastic Processes, New York: Wiley. Dragalin, V. P. (1994). Optimality of a Generalized CUSUM Procedure in Quickest Detection Problem, in Statistics and Control of Random Processes: Proceedings of Steklov Institute of Mathematics, vol. 202, pp. 107–120, Providence: American Mathematical Society. Dragalin, V. P. (1995). A Multi-channel Change Point Problem, in Proceedings of 3rd UmeaWuzburg Conference in Statistics, Umea University, pp. 97–108. Dragalin, V. P., Tartakovsky, A. G., and Veeravalli, V. V. (1999). Multihypothesis Sequential Probability Ratio Tests, Part I: Asymptotic Optimality, IEEE Transactions on Information Theory 45: 2448–2461. Ferguson, T. S. (1967). Mathematical Statistics: A Decision Theoretic Approach, New York: Academic Press. Girshik, M. A. and Rubin, H. (1952). A Bayes Approach to a Quality Control Model, Annals of Mathematical Statistics 23: 114–125. Klass, M. J. (1988). A Best Possible Improvement of Wald’s Equation, Annals of Probability 16: 840–853. Lai, T. L. (1995). Sequential Changepoint Detection in Quality Control and Dynamical Systems, Journal of Royal Statistical Society, Series B 57: 613–658. Lai, T. L. (1998). Information Bounds and Quick Detection of Parameter Changes in Stochastic Systems, IEEE Transactions on Information Theory 44: 2917–2929. Lai, T. L. (2000). Sequential Multiple Hypothesis Testing and Efficient Fault DetectionIsolation in Stochastic Systems, IEEE Transactions on Information Theory 46: 595–608.

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

230

Tartakovsky

Lorden, G. (1971). Procedures for Reacting to a Change in Distribution, Annals of Mathematical Statistics 42: 1897–1908. Lorden, G. (1977). Nearly-Optimal Sequential Tests for Finitely Many Parameter Values, Annals of Statistics 5: 1–21. Mosteller, F. (1948). A k-Sample Slippage Test for an Extreme Population, Annals of Mathematical Statistics 19: 58–65. Moustakides, G. V. (1986). Optimal Stopping Times for Detecting Changes in Distributions, Annals of Statistics 14: 1379–1387. Nikiforov, I. V. (1995). A Generalized Change Detection Problem, IEEE Transactions on Information Theory 41: 171–187. Nikiforov, I. V. (2000). A Simple Recursive Algorithm for Diagnosis of Abrupt Changes in Random Signals, IEEE Transactions on Information Theory 46: 2740–2746. Nikiforov, I. V. (2003). A Lower Bound for the Detection/Isolation Delay in a Class of Sequential Tests, IEEE Transactions on Information Theory 49: 3037–3047. Oskiper, T. and Poor, H. V. (2002). Online Activity Detection in a Multiuser Environment Using the Matrix CUSUM Algorithm, IEEE Transactions on Information Theory 48: 477–493. Page, E. S. (1954). Continuous Inspection Schemes, Biometrika 41: 100–115. Pollak, M. (1985). Optimal Detection of a Change in Distribution, Annals of Statistics 13: 206–227. Pollak, M. (1987). Average Run Lengths of an Optimal Method of Detecting a Change in Distribution, Annals of Statistics 15: 749–779. Pollak, M. and Tartakovsky, A. G. (2008). Asymptotic Exponentiality of the Distribution of First Exit Times for a Class of Markov Processes with Applications to Quickest Change Detection, Theory of Probability and Its Applications, to be published. Siegmund, D. (1985). Sequential Analysis: Tests and Confidence Intervals, New York: SpringerVerlag. Shiryaev, A. N. (1961). The Detection of Spontaneous Effects, Soviet Mathematics—Doklady 2: 740–743. Shiryaev, A. N. (1963). On Optimum Methods in Quickest Detection Problems, Theory of Probability and Its Applications 8: 22–46. Shiryaev, A. N. (1978). Optimal Stopping Rules, New York: Springer-Verlag. Tartakovsky, A. G. (1988). Multi-alternative Sequential Detection and Estimation of Signals with Random Appearance Times, Statistical Control Problems 83: 216–222 (in Russian). Tartakovsky, A. G. (1991a). Asymptotically Optimal Multi-Alternative Sequential Detection of a Disorder of Information Systems, Proceedings of IEEE International Symposium on Information Theory, Budapest, p. 359. Tartakovsky, A. G. (1991b). Sequential Methods in the Theory of Information Systems, Moscow: Radio i Svyaz’ (in Russian). Tartakovsky, A. G. (1992). Efficiency of the Generalized Neyman-Pearson Test for Detecting Changes in a Multichannel System, Problems of Information Transmission 28: 341–350. Tartakovsky, A. G. (1994). Asymptotically Minimax Multialternative Sequential Rule for Disorder Detection, in Statistics and Control of Random Processes: Proceedings of Steklov Institute of Mathematics, vol. 202, pp. 229–236, Providence: American Mathematical Society. Tartakovsky, A. G. (1997). Minimax-Invariant Regret Solution to the N -Sample Slippage Problem, Mathematical Methods of Statistics 6: 491–508. Tartakovsky, A. G. (1998a). Asymptotic Optimality of Certain Multihypothesis Sequential Tests: Non-i.i.d. Case, Statistical Inference for Stochastic Processes 1: 265–295. Tartakovsky, A. G. (1998b). Asymptotic Solution to a Multi-decision Change-Point Problem, unpublished. Tartakovsky, A. G. (1998c). Asymptotically Optimal Sequential Tests for Nonhomogeneous Processes, Sequential Analysis 17: 33–62.

Downloaded By: [Tartakovsky, Alexander] At: 21:18 19 May 2008

Multidecision Quickest Change-Point Detection

231

Tartakovsky, A. G. (1998d). Extended Asymptotic Optimality of Certain Change-Point Detection Procedures, technical report/preprint, Center for Applied Mathematical Sciences, University of Southern California. Tartakovsky, A. G. (2005). Asymptotic Performance of a Multichart CUSUM Test under False Alarm Probability Constraint, in Proceedings of 44th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC’05), December 12– 15, 2005, pp. 320–325, Seville, Spain, Omnipress CD-ROM, ISBN 0-7803-9568-9. Tartakovsky, A. G. and Veeravalli, V. V. (2004). Change-Point Detection in Multichannel and Distributed Systems with Applications, in Applications of Sequential Methodologies, N. Mukhopadhyay, S. Datta, and S. Chattopadhyay, eds., pp. 339–370, New York: Marcel Dekker. Tartakovsky, A. G. and Veeravalli, V. V. (2005). General Asymptotic Bayesian Theory of Quickest Change Detection, Theory of Probability and Its Applications 49: 458–497. Tartakovsky, A. G., Li, X. R., and Yaralov, G. (2003). Sequential Detection of Targets in Multichannel Systems, IEEE Transactions on Information Theory 49: 425–445. Tartakovsky, A. G., Rozovskii, B. L., Blažek, R., and Kim, H. (2006). Detection of Intrusions in Information Systems by Sequential Change-Point Methods (with Discussion), Statistical Methodology 3: 252–340.

Suggest Documents