Detecting, Tracking and Counteracting Terrorist Networks via Hidden Markov Models

Jeffrey Allanach, Haiying Tu, Satnam Singh, Peter Willett & Krishna Pattipati
Dept. of Electrical and Computer Engineering
U-1157, University of Connecticut
Storrs, Connecticut 06269-1157
{willett,krishna}@engr.uconn.edu

Abstract—In reaction to the tragic events of September 11th, 2001, DARPA made plans to develop a Terrorism Information Awareness system with an eye to the detection and interdiction of terrorist activities. Under this program and in conjunction with Aptima, Inc., the University of Connecticut is developing its Adaptive Safety Analysis and Monitoring (ASAM) tool for assisting US intelligence analysts with: 1) identifying terrorist threats; 2) predicting possible terrorist actions; and 3) elucidating ways to counteract terrorist activities. The focus of this paper, and an important part of the ASAM tool, is modeling and detecting terrorist networks using hidden Markov models (HMMs). The HMMs used in the ASAM tool model the time evolution of suspicious patterns within the information space gathered from sources such as financial institutions, intelligence reports, newspapers, emails, etc. Here we report our software's ability to detect multiple terrorist networks within the same observation space, distinguish transaction "signatures" of terrorist activity from the ambient background of transactions of benign origin, and incorporate information relating to terrorist activity, timing and sequence.

TABLE OF CONTENTS

1 INTRODUCTION
2 ADAPTIVE SAFETY ANALYSIS AND MONITORING (ASAM)
3 DETECTING TERRORIST NETWORK HMMS
4 MULTIPLE HIDDEN MARKOV MODELS
5 PREDICTING TERRORIST ACTIVITIES
6 SIMULATIONS
7 SUMMARY

1. INTRODUCTION

Motivation

Characteristics of terrorist networks that make them difficult to observe include: low SNR (in the sense of sparse relevant observations superimposed upon a large background of benign ones); geographic distribution; and dynamic and adaptive structures. Terrorist networks are often a string of small cells, and the interconnection between these cells is sparse and therefore very difficult to detect. In order to maintain a low profile, terrorist cells can move around geographically, alter their personnel, and even change their intended target. The ASAM process considers all of these issues, and is intended to help analysts by using multi-functional transaction models (colored digraphs), dynamic Bayesian networks (DBNs), hidden Markov models (HMMs), multi-phase/dynamic fault trees, colored Petri-nets, stochastic PERT networks, and decision networks.

The focus of this paper is the modelling and detection of terrorist activities via hidden Markov models. The basic premise is that terrorist networks can be evaluated using transaction-based models. This type of model does not rely solely on the content of the information gathered, but rather on the significant links between data (people, places, things) that appear to be suspicious. For example, an unknown person withdraws money from his/her bank account, uses that money to purchase chemicals that could be used to make a chemical weapon, and then buys a plane ticket destined for the United States. This sequence of events suggests a reason to be concerned: it may or may not arise from terrorist activity, but it ought to be flagged for more careful scrutiny. In terms of transaction-based models, the most important information that can be extracted from this example is the sequence of financial transactions. A model of this event would not be concerned with who this person is, or what their objectives are: provided the model is able to detect this or an equivalent sequence of events and report it, it really does not matter what the data is. Models such as these are very efficient because, instead of expending resources interpreting what the data is, they only have to look for significant links between the data. Due to the enormous amount of data which must be analyzed for possible instances of terrorist activity, this model must be both efficient and accurate. Note that the number of instances of terrorism is (thankfully) very low and hence a "learning from data" approach is problematic: some exogenous information about the likely structure of a terrorist cell is required, and we shall discuss this further.

The underpinning goal of the ASAM process is to support strategic decision-making; to provide early warning to facilitate preemption; to increase the range of options and the probability of success; and to integrate information to deal with large volumes of data. As will be discussed shortly, the fundamental structure to be considered is a (dynamic) Bayesian network. However, in order to facilitate ASAM's implementation a hierarchical structure has been selected: at the lowest level ASAM uses hidden Markov models (HMMs). In our application of HMMs we choose to model the evolution of terrorist activities over time: specifically, the HMM is constructed from the sequence of transactions that it observes.

Plan of the Paper

The purpose of this paper is to describe ASAM's formulation of an efficient and effective model of terrorist networks. Our effort aims at increasing the resources available to US intelligence analysts and/or policy makers who investigate reports of terrorist activities. With more effective tools for analysis, policy makers will have more and better information when planning some form of counter-terrorist action. The following sections discuss the ASAM technical approach, with particular attention paid to the HMM "physical layer". Section 2 discusses the ASAM process. Section 3 provides a means of detecting suspicious transactions (implicative of terrorist activities) in the presence of many benign transactions. Section 4 summarizes a method to evaluate the progress of multiple terrorist networks using modern target tracking ideas, and section 5 discusses how to predict terrorist attacks against the US and its allies. Section 6 presents simulations, and section 7 finishes our discussion with a brief summary of our results and concluding thoughts.

[Figure 1 diagram: EELD input feeds probabilistic graph matching and the hidden Markov models; their outputs propagate upward through dynamic sub-Bayesian networks to an overall dynamic Bayesian network, which produces the Genoa II output.]

Figure 1. ASAM Hierarchy. The separate EELD program provides data that has been somewhat processed, in that it takes the form of links and transactions. The "Genoa II" output is structural intelligence data in a form useful to analysts.

2. ADAPTIVE SAFETY ANALYSIS AND MONITORING (ASAM)

The tragic events of September 11th, 2001, caught the United States unprepared to respond to significant homeland terrorism: the process of intelligence gathering and assimilation was largely unautomated, and unintegrated across the multiple agencies whose responsibilities might include responding to such events. The Government has been swift to solicit suggestions for its overhaul: our automated ASAM system is one among many that will be able to assist in the efforts of homeland security by monitoring large quantities of transactional data and reporting only significant information. The ASAM software will provide a means for gathering, sharing, understanding, and using information to analyze terrorist networks. This includes coordination, referring to the ability to handle information distributed across different government agencies. In order to facilitate these provisions, ASAM addresses the task by learning – where possible – from historical data, evaluating the most likely sequence of events, and predicting future events. Note that it is not a coincidence that these three problems are the same three problems addressed by HMMs. HMMs are a principal method for modelling stochastic processes and are therefore an ideal way to make inferences about how terrorist networks evolve. In addition to the

task discussed, the ASAM tool also provides measures for counter-terrorism. From this perspective, ASAM is a feedback control network: the inferred terrorist activities represent outputs from the system “plant”, and the counter-terrorist actions represent the “control” in the feedback loop. By combining the use of both stochastic modelling and systems theory, ASAM will be able to suggest feasible actions to counteract terrorism and optimize these counter-terrorist actions by evaluating the strategy that results in the highest probability of interdiction. As shown in Figure 1, the ASAM process is hierarchical. The lowest level (corresponding to the “physical layer” in open systems interconnect terminology [23]) is powered by HMMs, and information from these propagates upward to subordinate dynamic Bayesian networks (sub-DBNs) and eventually to an over-arching DBN structure that models all terrorist activities. Each sub-DBN contains a different terrorist model and the culmination of all of the sub-DBNs – itself a DBN – will be used to evaluate the total probability of terrorist activity. It is important to note that the HMMs function in a faster time scale than DBNs: since the HMMs model the evolution of the transaction space, they process new information every time a transaction occurs. In other words, each

HMM can be viewed as a detailed stochastic time-evolution of a particular node represented in the DBN. Data sent from the HMMs to the DBNs, also known as soft evidence, is the probability of observing the incoming data given a model of a terrorist network. The ASAM hierarchical structure was designed this way in order to support distributed decision making. US intelligence agencies can collaborate on detecting, and counteracting terrorism by providing different pieces from their own local (and not necessarily shared) data. If, for instance, intelligence agency X has a model for suicide bombing scenarios, they can concatenate their model with another model for contract killings from intelligence agency Z. The combination of two or more models will be more effective in terms of evaluating terrorist threat. These models, and the inferences derived from them can be accessed by any of these intelligence agencies for evaluation. The following paragraphs will provide a more detailed description of each of the ASAM levels. The input to the ASAM software will be a series of transactions and patterns generated by the Evidence Extraction and Link Discovery (EELD), another US government funded project. Provided that the transactional evidence is continual, the ASAM software will monitor possible terrorist activity by comparing its terrorist network models with the incoming data via model-based “probabilistic graph matching”, a method for measuring the correlation between a state model graph and an input graph (EELD data,) given their statistical descriptions. As we shall see, the state model graph represents the network model for one state of an HMM: if an HMM has ten states then it has ten different state models. Simply put, the state model graphs are snapshots of the evolving terrorist network. Much like these, the input graphs are snapshots of the evolving transaction space. A more detailed explanation of how HMMs function within the ASAM architecture can be found in section 3. The highest level of the ASAM architecture is the DBN. This is a collection of different HMMs representing different terrorist activities. We use the DBN here so that we can evaluate the total probability of a terrorist attack. Each HMM monitors the probability of a certain terrorist scenario, and this probability is reported to a sub-DBN, where it is then evaluated with all other HMMs from that sub-DBN. Finally, the probability of each sub-DBN is computed with the probability of the other sub-DBN to then provide the user (US intelligence) with the current and future probability of observing a terrorist attack. Why has the ASAM tool been structured in this way? There are alternative approaches, and here we explain, in brief, the thinking that has led to the decision. Why model according to DBNs? The Bayesian structure is natural, since the goal of ASAM is to provide intelligence operatives with current “state estimates”, these ranked ac-

cording to probabilities and likelihoods. A second goal is to suggest actions to maximize the probability of interdiction of an evolving terrorist activity: both goals require probability calculations, and hence a set of identifiable probabilistic assumptions¹. As for the dynamic nature of these Bayesian belief networks, it is also necessary that predictions be accompanied by timing information, as, for example: Will this activity take place tomorrow or next week?

Why is the structure hierarchical? Ideally there ought to be one large Bayesian structure modelling "everything", but there are two reasons to compartmentalize the functions. The first is the obvious concern of numerical load: smaller distributed DBNs at every scale are preferable for reasons of efficiency and parallelizability. The second is that intelligence agencies presently operate independently, and the hierarchical and distributed structure allows them to preserve both their present functionality and ownership of their data. Integration of their conclusions (data fusion and team decision-making [18], [19]) is the responsibility of higher level networks.

Why use sub-DBNs? This seems a natural division of topic: each sub-DBN refers to a particular terrorist network model. One can therefore afford to be profligate with models: the model that most closely matches the data is the one that "fires". Additionally, different sub-DBNs can be allotted to terrorist cells operating with different goals or in different geographic locations.

Why use HMMs? A hidden Markov model is a specific sort of DBN, one that has an efficient structure for inference based on the forward or forward-backward algorithm. An additional HMM feature is that there exist ways to detect the presence of an HMM from among ambient data. That is, there is no need to consider all sorts of terrorist activities to be in an inactive but "poised-for-action" state; some HMMs are simply "off". We shall shortly explain the means by which the HMM is designed to capture the evolution of a terrorist activity. In regards to the last item, note that due to its quantity it may be possible to estimate from the data a hidden Markov model for "ambient" (benign) transactional activities, and thereby simplify the data association task [1], [4].

Where do the probabilistic models come from? It is unlikely that there will be enough data from terrorist events extant that inferences can be drawn. Thus, we propose that organizational theory (e.g., [7], [15]) be coupled with large-scale system diagnostic tools such as Qualtech Systems' TEAMS software [11] to produce normative network models. A terrorist activity has a goal, and certain transactions must take place in order that the goal be achieved.

From figure 1, ASAM's purpose is to: 1) transform EELD's discovered transactions to state-estimation snapshots of terrorist activity; 2) provide tools for analysis; 3) make predictions about future terrorist activities; and 4) elucidate counter-terrorist strategies. In this paper we focus particularly on the HMM part of ASAM: how are the HMMs modelled, how is the presence of an active HMM detected, and how can transactional "noise" be filtered out? The other parts of the ASAM process are discussed in some detail in [24].

¹As always, a set of probabilistic assumptions, and of course the set of instantiated parameters such as transition probabilities that they require, are assailable. We believe, however, that it is vital to have the probabilistic assumptions clear and visible: if there is some debate, for example, that a certain terrorist network is modelled as evolving too slowly, then this is easily fixed by adjusting the transition probabilities. The assumptions and model are out in the open.
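The paper does not spell out the DBN-level fusion equations, so the following is only an illustrative sketch (not the ASAM implementation) of how per-HMM likelihoods, the "soft evidence" described above, might be folded into a single sub-DBN threat probability. The Bayes update per scenario and the noisy-OR combination across scenarios, as well as all numbers and names, are assumptions made for illustration.

```python
# Illustrative sketch only: folding per-HMM "soft evidence" into a sub-DBN
# threat probability.  The fusion rule (Bayes update per scenario HMM, then a
# noisy-OR across scenarios) is an assumption, not the ASAM algorithm.

def posterior_from_soft_evidence(prior, lik_active, lik_ambient):
    """Bayes update of P(scenario active) from one HMM's likelihoods."""
    num = prior * lik_active
    den = num + (1.0 - prior) * lik_ambient
    return num / den if den > 0 else prior

def sub_dbn_threat(priors, soft_evidence):
    """Noisy-OR combination of several scenario posteriors (assumed rule)."""
    posts = [posterior_from_soft_evidence(p, la, lb)
             for p, (la, lb) in zip(priors, soft_evidence)]
    not_threat = 1.0
    for q in posts:
        not_threat *= (1.0 - q)
    return 1.0 - not_threat, posts

if __name__ == "__main__":
    # Two hypothetical scenario HMMs (e.g., "contract killing", "bomb plot").
    priors = [0.01, 0.02]
    # Each pair: (likelihood of the observed transactions given the scenario
    # HMM, likelihood given only ambient/benign activity).
    soft_evidence = [(3.0e-4, 1.0e-5), (2.0e-6, 1.5e-6)]
    threat, posts = sub_dbn_threat(priors, soft_evidence)
    print("per-scenario posteriors:", posts)
    print("sub-DBN threat probability:", threat)
```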

3. DETECTING TERRORIST NETWORK HMMS

This section addresses the detection of terrorist networks modelled as hidden-Markov. By using the forward variable, defined below in (7), a detection of a change between two or more observation sequences modelled as hidden Markov models can be achieved using a procedure analogous to Page's test [17]. It is particularly important to recall that all HMMs operate in a low-SNR environment: there are many observations that are simply noise, and therefore data association (which transactions are relevant?) is a particular concern. At any rate, prior to introducing our detection algorithm, some background information on HMMs and the sequential probability ratio test (SPRT) will be given.

Background on HMMs

A hidden Markov model is a type of stochastic signal model used to evaluate the probability of a sequence of events, determine the most likely state transition path, and estimate the parameters which produce the best representation of the most likely path. An excellent tutorial on HMMs can be found in [20]. The existence of the Baum-Welch re-estimation algorithm [6], which is in fact an application of the EM algorithm [16], makes it a convenient tool for modelling dependent observations. HMMs may presently be best known for their application to speech recognition; however, here we propose their use as discrete-time finite-state representations of transactional data that may arise from terrorist activity. A discrete HMM is parameterized by

    \lambda = (A, B, \pi)    (1)

where

    A = [a_{ij}] = [p(s_{t+1} = j \mid s_t = i)]    (2)

(i, j = 1, \dots, N) is the state transition matrix of the underlying Markov chain, where

    B = [b_{ij}] = [p(x_t = j \mid s_t = i)]    (3)

(i = 1, \dots, N; j = 1, \dots, M) is the observation matrix, and where

    \pi = [\pi_i = p(s_1 = i)]    (4)

(i = 1, \dots, N) is the initial probability distribution of the underlying Markov states. Implicit to the above notation is the finite number of states (N) and the finite alphabet of observations (M). A convenient choice of the initial probability is the stationary distribution of the underlying Markov states, so that the resulting sequence can be regarded as stationary. The joint probability for an HMM sequence is

    p(s_1, \dots, s_n, x_1, \dots, x_n) = \pi_{s_1} \prod_{t=1}^{n-1} a_{s_t s_{t+1}} \prod_{t=1}^{n} b_{s_t x_t}    (5)

and this can be considered its defining property.

In terms of a terrorist network model, A, B, and \pi represent, respectively, the probability of moving from the current state of terrorist activity to another (usually denoting an increase of terrorist threat), the probability of observing a new suspicious transaction given the current state, and the initial threat.

The forward variable will be used to evaluate the probability of terrorist activity because it is an efficient way to compute the likelihood of a sequence of transactional observations. The forward variable of an HMM is defined as

    \alpha_t(i) = p(x_1, x_2, \dots, x_t, s_t = i \mid \lambda)    (6)

It is easily checked that the following recursion holds for the forward variable

    \alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) a_{ij} \right] b_{j x_{t+1}}    (7)

with initial condition

    \alpha_1(j) = \pi(j) b_{j x_1}    (8)
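Before moving on, a minimal numerical sketch of the forward recursion (7)–(8) for a discrete HMM; the three-state model, its parameters, and the observation sequence are invented purely for illustration.

```python
import numpy as np

def forward_likelihood(A, B, pi, obs):
    """Forward recursion (7)-(8): returns alpha_t for each t and the
    likelihood p(x_1..x_T | lambda) = sum_i alpha_T(i)."""
    alpha = pi * B[:, obs[0]]                 # eq. (8)
    alphas = [alpha]
    for x in obs[1:]:
        alpha = (alpha @ A) * B[:, x]         # eq. (7)
        alphas.append(alpha)
    return np.array(alphas), alphas[-1].sum()

if __name__ == "__main__":
    # Invented 3-state, 2-symbol HMM for illustration only.
    A  = np.array([[0.8, 0.2, 0.0],
                   [0.0, 0.7, 0.3],
                   [0.0, 0.0, 1.0]])
    B  = np.array([[0.9, 0.1],
                   [0.5, 0.5],
                   [0.1, 0.9]])
    pi = np.array([1.0, 0.0, 0.0])
    obs = [0, 0, 1, 1, 1]
    alphas, lik = forward_likelihood(A, B, pi, obs)
    print("likelihood of the observation sequence:", lik)
```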

This gives us an efficient way to calculate the likelihood function of an HMM given observations up to the current time.

Representing Terrorist Networks with HMMs

The input to the ASAM process will be a series of transactions between persons, places, and things of suspicious origin. A graphical illustration of this idea would suggest that the persons, places, and things represent nodes and the transactions, or relationships, represent links between the nodes. An example of a terrorist network is shown in figure 2. Each time frame, a new snapshot of this network will be received. Our task is to evaluate whether or not there exist patterns within the network that signify a possible terrorist activity. The scenario depicted in this figure is a method of performing a contract killing — although this "task" is often identified with organized crime units within the US, it nonetheless provides an easy-to-understand example. The node labelled 'Person' represents someone who has an intent to kill another person. The basic premise behind this network is that the 'Person' will hire a terrorist organization to murder someone else so that they will be dissociated from the victim. The 'Middle Man' is a person who forms a task relationship between the 'Person' and the 'Hitman'. The middle man's task is to communicate information between the 'Person' and the 'Hitman', such that the two of them remain anonymous. The middle man also relays the necessary financial resources from the person to the hitman.

  

[Figure 2 diagram: nodes for Person, Intent, Terrorist Organization, Middle Man, Hitman, and Victim, connected by relationships of the types Trust, Task, Money & Resources, and Strategy & Goals; the evolving network is summarized by state models State 1 through State 5.]

Figure 2. Example of a Terrorist Network

The transactions that occur between each of these members of the network are essentially what the ASAM process monitors. As shown in the figure, the first state of the network represents the relationship between the person and his intent (motive). The second state is a combination of the person's motive and the person's communication with a known terrorist organization; as more transactions/communications are made, the HMM state models grow to be more complicated. Each state model is essentially the concatenation of the previous transactions with the current transaction.

Let us be specific. Suppose that the culmination of a terrorist event requires the transactions in the set T. The states that can occur at time t are subsets s_t \subseteq T. The number of possible states is therefore upper bounded by 2^{|T|}; but in practice this will be far lower, since many states, such as the "hit" transaction occurring previous to that of payment for the hit, are structurally precluded. Now we have, according to (2),

    \Pr(s_t \mid s_{t-1}) = \begin{cases} \gamma(s_{t-1}, \tau) & s_t = s_{t-1} \cup \tau \\ 1 - \sum_{\tau} \gamma(s_{t-1}, \tau) & s_t = s_{t-1} \\ 0 & \text{else} \end{cases}    (9)

in which \tau can range over all single transactions (|\tau| = 1) such that s_t = s_{t-1} \cup \tau, and in which \gamma(s_{t-1}, \tau) \ge 0 and may be simply defined as constant. In a slight variation on the usual HMM structure, the emission probabilities in (3) are functions not simply of the current states, but also of the past states – this does not alter the Markov nature of the state sequence s_t. We have for the noise-free observation x_t

    \Pr(x_t \mid s_t, s_{t-1}) = \begin{cases} 1 & x_t = \emptyset \text{ and } s_t = s_{t-1} \\ 1 & x_t = \tau \text{ and } s_t = s_{t-1} \cup \tau \\ 0 & \text{else} \end{cases}    (10)

and in all cases the cardinality |\tau| = 1.

Given a probabilistic description of the network, including the probability of false alarm P_{fa} and the probability of missed detection P_{md}, the ASAM software can determine whether or not a suspicious pattern exists. We refer to this, perhaps inaccurately, as "probabilistic graph matching" (values for P_{fa} and P_{md} can be estimated from previous data). Consider that the observation at time t is

    Z(t) = \{z_1(t), \dots, z_{n_t}(t)\}    (11)

meaning that there are n_t observed transactions at time t. Then we have

    \Pr(Z(t) \mid s_t, s_{t-1}) = \begin{cases} (1 - P_{md}) P_{fa}^{n_t - 1} & \tau \in Z(t) \text{ and } s_t = s_{t-1} \cup \tau \\ P_{md} P_{fa}^{n_t} & \tau \notin Z(t) \text{ and } s_t = s_{t-1} \cup \tau \\ P_{fa}^{n_t} & s_t = s_{t-1} \end{cases}    (12)

where, as before, we understand that in all cases the cardinality |\tau| = 1. Note that, for notational simplicity, we have assumed that miss and false alarm probabilities are identical for all transactions; this can be easily generalized.
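A small sketch of the transition probability (9) and the noisy observation likelihood (12), using the case structure written above. The transaction names, the constant value of γ, and the example observation set are invented for illustration and are not taken from the paper.

```python
# Hypothetical transaction alphabet for the contract-killing example.
T = {"intent", "contact_org", "hire_middleman", "pay", "hit"}
GAMMA = 0.05           # constant gamma(s, tau); an assumption
P_FA, P_MD = 0.1, 0.1  # false-alarm / missed-detection probabilities

def p_state_transition(s_prev, s_next):
    """Eq. (9): grow by exactly one transaction, stay put, or impossible."""
    if s_next == s_prev:
        return 1.0 - GAMMA * len(T - s_prev)
    new = s_next - s_prev
    if s_prev <= s_next and len(new) == 1:
        return GAMMA
    return 0.0

def p_observation(Z, s_prev, s_next):
    """Eq. (12): likelihood of the n_t observed transactions Z at time t."""
    n_t = len(Z)
    if s_next == s_prev:
        return P_FA ** n_t
    tau = next(iter(s_next - s_prev))
    if tau in Z:
        return (1.0 - P_MD) * P_FA ** (n_t - 1)
    return P_MD * P_FA ** n_t

if __name__ == "__main__":
    s_prev = {"intent"}
    s_next = {"intent", "contact_org"}
    Z = {"contact_org", "benign_report"}   # one relevant + one benign transaction
    print(p_state_transition(s_prev, s_next))   # gamma
    print(p_observation(Z, s_prev, s_next))     # (1 - Pmd) * Pfa^(n_t - 1)
```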

Figure 3 shows the relationship between the state models and the observations Z_1^n. The emission matrix, B, is constructed by measuring the correlation of the input graphs to each state model of the HMM. Likewise, the transition matrix, A, is constructed by measuring the correlation between the state models within the HMM. To exemplify this point, if two state models of an HMM are very dissimilar, then the transition probability between the two of them will be very small. Figure 4 outlines the details of this process.

[Figure 3 diagram: HMM state models linked by transition probabilities a_{11}, a_{12}, ..., a_{nn} and connected through emission probabilities b_1, ..., b_n to the observed patterns; each state's model of the transaction space is compared with the observation of the transaction space at that time.]

Figure 3. The Connection Between Hidden Markov Models and Probabilistic Graph Matching

[Figure 4 diagram: probabilistic graph matching between the observed patterns at times t and t+1 and the state models yields example emission and transition matrices over the states 1, 2, ..., n.]

Figure 4. How to Estimate the Parameters λ = {A, B, π} for a terrorist network. Note the two observed patterns and their respective probabilities in the emission matrix.

Page's Test

Page's test [17], also known as the CUSUM procedure, is an efficient change detection scheme. A change detection problem is such that the distribution of observations is different before and after an unknown time n_0; and we want to detect the change, if it exists, as soon as possible.

Casting it into a standard inference framework, we have the following hypothesis testing problem:

    H: \; x(k) = v(k), \quad 1 \le k \le n
    K: \; x(k) = v(k), \quad 1 \le k < n_0; \qquad x(k) = z(k), \quad n_0 \le k \le n    (13)

where x(k) are observations and v(k) and z(k) are all independent identically distributed (i.i.d.), with probability density functions (pdf) denoted as f_H and f_K, respectively. Note that under K the observations are no longer a stationary random sequence: their distribution has a switch at n_0 from f_H to f_K. The Page decision rule, which can be derived from the generalized likelihood ratio (GLR) test [2], amounts to finding the stopping time

    N = \arg\min_n \left\{ \max_{1 \le k \le n} L_k^n \ge h \right\}    (14)

where L_k^n is the log likelihood ratio (LLR) of the observations \{x_k, \dots, x_n\}, and \arg\min_n f(n) denotes the value of n that achieves the minimum for f(n). Given that the observations are i.i.d., (14) can be easily reformulated as

    N = \arg\min_n \left\{ L(n) - \min_{1 \le k \le n} L(k-1) \ge h \right\}    (15)

where

    L(k) = L_1^k = \sum_{i=1}^{k} \ln \frac{f_K(x_i)}{f_H(x_i)}    (16)

with L(0) = 0. This is based on the fact that, given independence,

    L_1^n = L_1^{k-1} + L_k^n    (17)

Equation (15) allows us to write down the standard recursion for Page's test

    N = \arg\min_n \{ S_n \ge h \}    (18)

in which

    S_n = \max\{0, S_{n-1} + g(x_n)\}    (19)

and

    g(x_n) = \ln \frac{f_K(x_n)}{f_H(x_n)}    (20)

is the update nonlinearity.

Page’s recursion assures that the test statistic is “clamped” at zero; i.e., whenever the LLR of current observation would make the test statistic Sn negative (which happens more often when H is true), Page’s test restarts at zero. The procedure continues until it crosses the upper threshold h and a detection is claimed. Thus, operationally, Page’s test is equivalent to a sequence of SPRTs with upper and lower thresholds h and 0. Whenever the lower threshold 0 is crossed, a new SPRT is initiated from the next sample until the upper threshold h is crossed.
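A minimal sketch of the i.i.d. Page recursion (18)–(19), with the clamping at zero described above, for two Gaussian hypotheses; the densities, the change point, and the threshold value are invented for illustration.

```python
import random

def page_test(xs, log_lr, h):
    """Page recursion (19): S_n = max(0, S_{n-1} + g(x_n)); stop when S_n >= h."""
    S = 0.0
    for n, x in enumerate(xs, start=1):
        S = max(0.0, S + log_lr(x))   # clamped at zero under H
        if S >= h:
            return n                  # stopping time N of eq. (18)
    return None                       # no detection in this record

if __name__ == "__main__":
    random.seed(0)
    # g(x) = ln f_K(x)/f_H(x) for N(1,1) versus N(0,1) reduces to x - 0.5.
    g = lambda x: x - 0.5
    n0 = 75                           # change point, unknown to the detector
    data = [random.gauss(0.0, 1.0) for _ in range(n0)] + \
           [random.gauss(1.0, 1.0) for _ in range(60)]
    print("declared change at sample", page_test(data, g, h=10.0))
```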

In practice, the update nonlinearity g(x_i) need not be a LLR as in (20), since this might not be available, as in the case when dealing with composite hypotheses or with hypotheses involving nuisance parameters. For a nonlinearity other than the LLR, a critical requirement for the corresponding CUSUM procedure to work is the "antipodality" condition:

    E(g(x_n) \mid H) < 0, \qquad E(g(x_n) \mid K) > 0    (21)

There is no false alarm rate or probability of detection involved, since we see from the implementation that, sooner or later, a detection is always claimed as long as the test is "closed" (i.e., Pr(N < ∞) = 1 under both hypotheses). The performance of Page's test is therefore measured in terms of average run length (ARL) under K and H. It is always desired to have as small a delay to detection as possible, usually denoted as D, while keeping the average number of samples between false alarms, denoted as T, as large as possible. Analogous to the conventional hypothesis testing problem, where we wish to maximize the probability of detection while keeping the false alarm rate under a fixed level, the trade-off amounts to the choice of the upper threshold h. The relationship between h and the ARL is often calculated in an asymptotic sense using first or second order approximations, usually credited to Wald and Siegmund [25], [21]. As a final note, Page's test using the LLR nonlinearity has minimax optimality in terms of ARL, i.e., given a constraint on the average time between false alarms, Page's test minimizes the worst case delay to detection [14].

Detecting HMMs

Consider a Page's test (13) except that f_H and f_K are general non-i.i.d. probability measures. Assume that under K the observations before and after the change are independent of each other. The likelihood ratio (parameterized by n_0) is then

    \Lambda(n; n_0) = \frac{f(X_1^n \mid K)}{f(X_1^n \mid H)} = \frac{f_H(X_1^{n_0-1}) f_K(X_{n_0}^n)}{f_H(X_1^{n_0-1}) f_H(X_{n_0}^n \mid X_1^{n_0-1})} = \frac{f_K(X_{n_0}^n)}{f_H(X_{n_0}^n \mid X_1^{n_0-1})}    (22)

The log likelihood ratio is then

    L_k^n = \ln\left(\Lambda(n; k)\right) = \sum_{i=k}^{n} \ln \frac{f_K(x_i \mid x_{i-1}, \dots, x_k)}{f_H(x_i \mid x_{i-1}, \dots, x_1)}    (23)

Page's test is equivalent to a sequence of repeated sequential probability ratio tests (SPRTs) with thresholds h and 0:

• Start an SPRT with thresholds 0 and h.
• If the SPRT ends at time k with test statistic below zero, reinitiate another SPRT from k + 1 as if no previous data existed. That is, recalculate the likelihood ratio based on the stationary marginal distribution.
• Repeat the above procedure until h is crossed.

In compact form, we can write, in a manner similar to the standard Page recursion (18),

    S_n = \max\{0, S_{n-1} + g(n; k)\}    (24)

where

    g(n; k) = \ln \frac{f_K(x_n \mid x_{n-1}, \dots, x_k)}{f_H(x_n \mid x_{n-1}, \dots, x_k)}    (25)

and x_k is the first sample after the last reset, i.e., S_{k-1} = 0. The difference with (23) is that the conditional densities of both numerator and denominator in the logarithm of g(n; k) depend on the same set of random variables, which makes a Page-like recursion possible by utilizing the stationarity assumption of the hidden Markov models. Note also that such a scheme reduces to the standard Page test with a LLR nonlinearity when the observations both before and after the change are i.i.d. Further, if the observations before the change are independent, we need only replace f_H(x_n \mid x_{n-1}, \dots, x_k) with f_H(x_n) for the scheme to work.

The scheme presented here is in the same form as the sequential detector proposed in [5]. It was shown that this procedure is in fact asymptotically optimal in Lorden's sense, i.e., as h → ∞, it minimizes the worst case delay to detection, given a constraint on the average time between false alarms, among all possible sequential schemes.

So far we have proposed a CUSUM procedure that is applicable to the case of dependent observations, provided we have an efficient means to calculate the likelihood function. This is not always a reasonable assumption. Fortunately, for the hidden Markov model, the existence of the forward variable, together with its recursion formula as discussed in this section, enables efficient computation of the likelihood function of an HMM. Specifically, the likelihood function of an HMM with parameter triple λ can be written as

    f(x_1, x_2, \dots, x_t \mid \lambda) = \sum_{i=1}^{N} \alpha_t(i)    (26)

where N is the total number of states and the α_t's are the forward variables defined in (6).

Now the conditional probability in equation (23) is readily solved as

    f_j(x_t \mid x_{t-1}, \dots, x_1) = f(x_t \mid x_{t-1}, x_{t-2}, \dots, x_1, \lambda_j) = \frac{\sum_{i=1}^{N} \alpha_t(i)}{\sum_{i=1}^{N} \alpha_{t-1}(i)}    (27)

where j = H, K. Although we have followed the proposed procedure to find the conditional pdf as in equation (27), this step can in fact be avoided since the likelihood function, defined as the sum of α_t(i), can be used directly by each individual sequential likelihood ratio test. In practice, however, it is found that the direct use of the likelihood function as defined in (26) will cause numerical underflow as the number of observations increases. For discrete HMMs, it is easily seen from the definition of the forward variable that the likelihood decreases monotonically (and generally geometrically) with the number of observations. The conditional likelihood function defined in (27) does not suffer such a numerical problem. We therefore need to develop a way of recursively computing the conditional likelihood function in (27) without the direct use of the forward variable. This can be achieved by scaling. Define ᾱ_t such that ᾱ_1(i) = α_1(i), but for t > 1

    \bar{\alpha}_{t+1}(j) = \frac{\sum_{i=1}^{N} \bar{\alpha}_t(i) a_{ij} b_{j x_{t+1}}}{\sum_{i=1}^{N} \bar{\alpha}_t(i)}    (28)

It is easily checked that \sum_{i=1}^{N} \bar{\alpha}_t(i) is identical to f_j(x_t \mid x_{t-1}, \dots, x_1) with j = H, K as defined in (27). Thus, the updating nonlinearity g(n; k) can be obtained recursively without explicitly computing the exact likelihood function at each time.

To summarize, for the quickest detection of HMMs, we propose the following procedure:

1. Set t = 1, l_0 = 0, where l_t denotes the LLR at time t.
2. Initialize the (scaled) forward variable ᾱ_t using

    \bar{\alpha}_t(j) = \pi(j) b_{j x_t}    (29)

for each possible state j and for both hypotheses H and K.
3. Update the log likelihood ratio

    l_t = l_{t-1} + \ln \frac{\sum_{i=1}^{N} \bar{\alpha}_t(i \mid K)}{\sum_{i=1}^{N} \bar{\alpha}_t(i \mid H)}    (30)

4. If l_t > h, declare detection of a change and stop. If l_t < 0, set l_t = 0, set t = t + 1, and go to step 2. If 0 < l_t < h, continue.
5. Set t = t + 1; update the scaled forward variable ᾱ_t using (28); then go to step 3.

A detection of a change between the null hypothesis and the terrorist network hypothesis is shown in section 6, Figure 5. The ground truth is that both HMM1 and HMM2 begin at t = 75.
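A sketch of steps 1–5 above, using the scaled forward recursion (28)–(30). The two HMMs (an "ambient" model under H and a "terrorist network" model under K), the observation stream, and the threshold are toy assumptions for illustration only.

```python
import numpy as np

def scaled_forward_init(pi, B, x):
    return pi * B[:, x]                       # eq. (29)

def scaled_forward_step(alpha_bar, A, B, x):
    s = alpha_bar.sum()
    return (alpha_bar @ A) * B[:, x] / s      # eq. (28)

def hmm_page_detector(obs, hmm_H, hmm_K, h):
    """Steps 1-5: CUSUM on the conditional-likelihood ratio (30)."""
    l, t, restart = 0.0, 0, True
    aH = aK = None
    for x in obs:
        t += 1
        if restart:                           # step 2: (re)initialize
            aH = scaled_forward_init(hmm_H["pi"], hmm_H["B"], x)
            aK = scaled_forward_init(hmm_K["pi"], hmm_K["B"], x)
            restart = False
        else:                                 # step 5: recursive update
            aH = scaled_forward_step(aH, hmm_H["A"], hmm_H["B"], x)
            aK = scaled_forward_step(aK, hmm_K["A"], hmm_K["B"], x)
        l += np.log(aK.sum() / aH.sum())      # step 3, eq. (30)
        if l > h:
            return t                          # step 4: change declared
        if l < 0:
            l, restart = 0.0, True            # step 4: reset, restart at next sample
    return None

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "ambient" model H and "terrorist network" model K (assumptions).
    hmm_H = {"A": np.array([[0.9, 0.1], [0.1, 0.9]]),
             "B": np.array([[0.8, 0.2], [0.6, 0.4]]),
             "pi": np.array([0.5, 0.5])}
    hmm_K = {"A": np.array([[0.7, 0.3], [0.0, 1.0]]),
             "B": np.array([[0.5, 0.5], [0.1, 0.9]]),
             "pi": np.array([1.0, 0.0])}
    # Observations: ambient symbols first, then symbols biased toward model K.
    obs = list(rng.choice(2, size=75, p=[0.75, 0.25])) + \
          list(rng.choice(2, size=75, p=[0.2, 0.8]))
    print("change declared at t =", hmm_page_detector(obs, hmm_H, hmm_K, h=15.0))
```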

4. MULTIPLE HIDDEN MARKOV MODELS

The detection scheme proposed in the previous section assumes that multiple terrorist networks are independent of each other. This assumption is not always appropriate; so, in this section we propose a scheme for tracking multiple terrorist networks, where statistical independence is not assumed.

Before we begin with multiple HMMs, consider first the case of a single terrorist network HMM. The forward variables, after suitable normalization, define the posterior probability of state occupancy given observations up to the current time. Specifically, we have

    p(s_t = i \mid x_1, \dots, x_t) = \frac{\alpha_t(i)}{\sum_{j=1}^{N} \alpha_t(j)}    (31)

where N is the total number of states, and we have suppressed the dependence on the HMM parameter λ. In a sense, the HMM state is being "tracked". By extension, the direct model expansion approach suggests optimal multiple target tracking, as we now discuss.

Let us denote s_1(n) and s_2(n) as the underlying states of HMM1 and HMM2 at time n, and Z_1^n as the superimposed observations z(1) through z(n). The goal of the tracking algorithm is to obtain the likelihood function p(Z_1^n), given that both HMM1 and HMM2 are active. Assume we have obtained p(Z_1^n, s_1(n), s_2(n)), and consider the one step update of p(Z_1^{n+1}, s_1(n+1), s_2(n+1)). This can be written as

    p(Z_1^{n+1}, s_1(n+1), s_2(n+1)) = p(Z_1^n, z(n+1), s_1(n+1), s_2(n+1))
        = p(z(n+1) \mid s_1(n+1), s_2(n+1), Z_1^n) \, p(s_1(n+1), s_2(n+1), Z_1^n)
        = p(z(n+1) \mid s_1(n+1), s_2(n+1)) \, p(s_1(n+1), s_2(n+1), Z_1^n)    (32)

where the last identity follows from the fact that, given s_1(n+1) and s_2(n+1), z(n+1) is independent of the previous observations. Computation of the first term is essentially the same as obtaining the observation matrix B in the model expansion approach. The derivation of this algorithm can be found in [9], the result of which produces the posterior probability of each state:

    p(s_1(n+1) \mid Z_1^{n+1}) = \frac{p(s_1(n+1), Z_1^{n+1})}{\sum_{s_1(n+1)} p(s_1(n+1), Z_1^{n+1})}, \qquad
    p(s_2(n+1) \mid Z_1^{n+1}) = \frac{p(s_2(n+1), Z_1^{n+1})}{\sum_{s_2(n+1)} p(s_2(n+1), Z_1^{n+1})}    (33)

The derivation of the likelihood function [9], [10] of the observation (which after all is the goal here) follows from

    p(Z_1^{n+1}) = \sum_{s_1(n+1)} p(s_1(n+1), Z_1^{n+1})    (34)

    p(Z_1^{n+1}) = \sum_{s_2(n+1)} p(s_2(n+1), Z_1^{n+1})    (35)

requiring min(N_1, N_2) operations.

Algorithmically, the detector operates as follows:

1. Set t = 0, l_0 = 0, where l_t denotes the LLR at time t.
2. Set t = t + 1; t_0 = t. Under H, initialize the forward variable α_t(· | H) using (8); under K, initialize a multiple target tracker using (8). Compute the likelihood function under both hypotheses.
3. Update the log likelihood ratio

    l_t = l_{t-1} + \ln \frac{p(Z_{t_0}^{t} \mid K)}{\sum_{i=1}^{N} \alpha_t(i \mid H)}    (36)

4. If l_t > h, declare detection of a change and stop. If l_t < 0, set l_t = 0; then go to step 2. If 0 < l_t < h, continue.
5. Set t = t + 1; update the forward variable α_t(·) using (27); and update the "tracker" with either (34) or (35); then go to step 3.
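A sketch of the joint update (32) and the likelihood (34)–(35) for two superimposed HMMs observed through a single stream. The combination rule for the shared observation (here an elementwise product of the two emission probabilities) is an assumption, since the paper leaves p(z | s1, s2) to the model-expansion construction; the toy models and observations are also invented.

```python
import numpy as np

def joint_tracker_step(P_joint, A1, A2, emis_fn, z):
    """One step of (32): predict the joint state of (HMM1, HMM2) under
    independent dynamics, then weight by p(z | s1, s2)."""
    pred = A1.T @ P_joint @ A2                   # p(s1(n+1), s2(n+1), Z_1^n)
    N1, N2 = pred.shape
    like = np.array([[emis_fn(i, j, z) for j in range(N2)] for i in range(N1)])
    return pred * like                           # p(Z_1^{n+1}, s1(n+1), s2(n+1))

if __name__ == "__main__":
    # Two toy 2-state HMMs sharing one binary observation stream (assumptions).
    A1 = np.array([[0.9, 0.1], [0.0, 1.0]])
    A2 = np.array([[0.8, 0.2], [0.0, 1.0]])
    B1 = np.array([[0.7, 0.3], [0.2, 0.8]])
    B2 = np.array([[0.6, 0.4], [0.3, 0.7]])
    pi1 = np.array([1.0, 0.0])
    pi2 = np.array([1.0, 0.0])

    # Assumed superposition model: the two emissions combine multiplicatively.
    emis = lambda i, j, z: B1[i, z] * B2[j, z]

    obs = [0, 1, 1, 0, 1, 1, 1]
    P = np.outer(pi1 * B1[:, obs[0]], pi2 * B2[:, obs[0]])   # initialization
    for z in obs[1:]:
        P = joint_tracker_step(P, A1, A2, emis, z)
    print("p(Z) with both HMMs active:", P.sum())             # eq. (34)/(35)
    print("posterior of HMM1 state:", P.sum(axis=1) / P.sum())  # eq. (33)
```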

Discussion

Multiple target tracking has been studied for decades. Though new approaches continually appear, many can be categorized as based on the Joint Probabilistic Data Association Filter [1], [3] or on the Multiple Hypothesis Tracker [4]. There are some cases in which an existing target tracking algorithm, such as JPDAF or MHT, can be applied almost directly in detecting superimposed HMMs. For example, certain transient signals can be modelled as slowly varying, possibly continuous frequency lines [22]. When modelled via an HMM, the observation (in the frequency domain) is the noisy measurement of the underlying frequency, which is assumed to evolve according to a Markov model. In this case, the analogy between the HMMs' state estimates and the conventional multi-target tracking problem is obvious – different frequency lines produce different (noisy) frequency domain observations, and consequently association between tracks and observations is explicit. This direct applicability is a function of the way in which the observation process is formed from the HMM; at greater generality (as we attempt) the superposition of the HMMs does not take this form, and thus the tracking algorithm presented here is neither JPDAF nor MHT. The major reason is that, for tracking superimposed HMMs, there is one and only one measurement available at any instant. This measurement is the "superposition" of the realizations of two or more HMMs, and hence there is no data association in an explicit sense. Operationally, however, the tracking algorithm has a flavor similar to JPDAF.

Given the output of the likelihood function of the tracker as in (34) or (35), a Page-like test is easily constructed. Under H, we use forward recursion to compute the likelihood given only HMM1 is present. Under K, the target tracker is used to compute the likelihood given both HMM1 and HMM2 are present. The output likelihood functions under both H and K are used to run a sequential test, and whenever the test statistic falls below zero, it is reset to zero and the procedure restarts from the next observation.

If under H more than one HMM is present, then the above procedure may be modified such that the “tracking” approach is used under both hypotheses. It is often necessary to use scaled versions of the forward variables to avoid numerical underflow – please consult [8] or any standard HMM reference for details.

5. PREDICTING TERRORIST ACTIVITIES

The ability to detect and track terrorist networks is important because it provides US intelligence with information about where and when terrorist activities are likely to occur. More important than this is the ability to determine what the terrorist target is. These pieces of information are valuable since they can provide decision-makers with useful information when evaluating appropriate counter-terrorist reactions. The following section explains how the ASAM tool evaluates possible terrorist threats.

In order to make accurate predictions about future observations, we will use the forward variable and the HMM model parameter A. The forward variable describes the probability of each state of the HMM up to the current time. The product of the transition matrix, A, and the forward variable gives the most likely possibility of observing a new transaction in the next time frame. Assuming the state transition matrix is a relatively good representation of the real terrorist network, further predictions can be made according to

    \alpha_{T+\tau}(j) = \sum_{i=1}^{N} \alpha_{T+\tau-1}(i) \, a_{ij}    (37)

where T and τ represent the current time and the additional time to predict until, respectively. The value τ is a user input to the ASAM software. This method of prediction is useful to intelligence analysts, who need to know such things as how long will it take for terrorist network A to get to state X. Queries to the ASAM software such as these can be handled efficiently.
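A minimal sketch of the prediction step (37): the normalized forward variable is propagated τ steps through the transition matrix with no new observations. The five-state left-to-right chain and the current state distribution are invented for illustration.

```python
import numpy as np

def predict_state_probs(alpha_T, A, tau):
    """Eq. (37): propagate the current state distribution tau steps ahead,
    with no further observations."""
    p = alpha_T / alpha_T.sum()      # normalize the forward variable
    preds = []
    for _ in range(tau):
        p = p @ A                    # alpha_{T+k}(j) = sum_i alpha_{T+k-1}(i) a_ij
        preds.append(p.copy())
    return np.array(preds)

if __name__ == "__main__":
    # Invented 5-state left-to-right chain; state 5 = attack imminent.
    A = np.array([[0.7, 0.3, 0.0, 0.0, 0.0],
                  [0.0, 0.7, 0.3, 0.0, 0.0],
                  [0.0, 0.0, 0.7, 0.3, 0.0],
                  [0.0, 0.0, 0.0, 0.7, 0.3],
                  [0.0, 0.0, 0.0, 0.0, 1.0]])
    alpha_T = np.array([0.01, 0.04, 0.25, 0.60, 0.10])   # current forward variable
    preds = predict_state_probs(alpha_T, A, tau=10)
    for k, p in enumerate(preds, start=1):
        print(f"t = T+{k}: P(state 5) = {p[-1]:.2f}")
```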

A simulation using this method is shown in section 6, Figure 6. Note that the dashed line indicates the current time and each of the curves represents the probability of one state of the HMM as a function of time. For simplicity, only a five-state HMM is shown; however, an accurate model of a terrorist network could have any number of states. It is discovered in this simulation that the probability of the last state approaches unity around time slice eighteen. Had this been real data from a terrorist network rather than a simulation, this would be convincing evidence that a terrorist attack will occur in the near future.

6. SIMULATIONS

Figure 5 is an application of Page's test to detecting terrorist networks. The underlying truth is that terrorist networks HMM1 and HMM2 both begin at time t = 75. The HMMs are composed of 100 state models, which were randomly generated and superimposed over noise to produce the observations. The noise generated alters the ground truth with a probability of false alarm and a probability of missed detection, which are both set at 10%. Although randomly simulated, HMM1 and HMM2 have different temporal characteristics. HMM2 was given a longer duration period and is therefore much more difficult to detect. Since we wish to detect the terrorist network HMMs as quickly as possible, a threshold for automatic detection should be set at approximately fifty.

[Figure 5 plot: CUSUM statistic versus time for HMM1 and HMM2.]

Figure 5. Detection of two 100-state terrorist network HMMs in the presence of noise. Pfa (probability of false alarm) and Pmd (probability of missed detection) are both 10%. HMM1 and HMM2 begin at time t = 75.

Figure 6 shows the probability of a five-state terrorist network HMM as a function of time. Each curve in the graph corresponds to a different HMM state model. As the observation of the terrorist network evolves, the probability of the final state increases asymptotically to unity. The vertical dashed line indicates the current time, and the measures following it are predictions based on (37). US intelligence agents will use graphs like these to evaluate terrorist threats and to select an effective response.

[Figure 6 plot: probability of each of the five states (state 1 through state 5) versus time, with future predictions shown to the right of the dashed line marking the current time.]

Figure 6. This shows the probability of a five state terrorist network HMM. It is assumed that the last state will result in a terrorist attack. Note that the probability of attack approaches one around time slice eighteen.

7. SUMMARY

This paper addressed a method for detecting, tracking, and predicting possible terrorist activities via HMMs. Using these algorithms the ASAM tool will provide a meaningful statistical description of the current level of terrorist activity against the US and its allies. Provided that this information is reported to the appropriate decision-makers, a means of preventing a terrorist attack can quickly be determined. Further documentation of the ASAM process and its capabilities is provided in [24].

REFERENCES

[1] Y. Bar-Shalom and X. Li, Multitarget-Multisensor Tracking: Principles and Techniques, YBS Publications, 1995.
[2] M. Basseville and I. Nikiforov, Detection of Abrupt Changes, Englewood Cliffs, NJ: Prentice Hall, 1993.
[3] R. Bethel and G. Paras, "A PDF Multitarget Tracker," IEEE Transactions on Aerospace and Electronic Systems, vol. 30, pp. 386-403, April 1994.
[4] S. Blackman and R. Popoli, Design and Analysis of Modern Tracking Systems, Artech House, 1999.
[5] R. Bansal and P. Papantoni-Kazakos, "An algorithm for detecting a change in a stochastic process," IEEE Trans. Information Theory, vol. 32, pp. 227-235, March 1986.
[6] L. Baum, T. Petrie, G. Soules, and N. Weiss, "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains," Ann. of Math. Stat., vol. 41, pp. 164-171, 1970.
[7] K. Carley, "On the Evolution of Social and Organizational Networks," in Networks In and Around Organizations (Andrews & Knoke, eds.), JAI Press, 1999.
[8] B. Chen and P. Willett, "Detection of Hidden Markov Model Transient Signals," IEEE Transactions on Aerospace and Electronic Systems, vol. 36, no. 4, pp. 1253-1268, December 2000.
[9] B. Chen and P. Willett, "Superimposed HMM Transient Detection via Target Tracking Ideas," IEEE Transactions on Aerospace and Electronic Systems, pp. 946-956, July 2001.
[10] B. Chen and P. Willett, "Quickest detection of superimposed hidden Markov models using a multiple target tracker," in Proc. IEEE Aerospace Conference, Aspen, CO, 1998.
[11] S. Deb, K. Pattipati, V. Raghavan, M. Shakeri, and R. Shrestha, "Multi-Signal Flow Graphs: A Novel Approach for System Testability Analysis and Fault Diagnosis," IEEE Aerospace and Electronic Systems Magazine, May 1995.
[12] C. Han, P. Willett, and D. Abraham, "Some methods to evaluate the performance of Page's test as used to detect transient signals," IEEE Trans. Signal Processing, August 1999.
[13] B. Juang and L. Rabiner, "A probabilistic distance measure for hidden Markov models," AT&T Technical Journal, vol. 64, pp. 391-408, 1985.
[14] G. Lorden, "Procedures for reacting to a change in distribution," Ann. of Math. Stat., vol. 42, pp. 1897-1908, June 1971.
[15] J. MacMillan, M. Paley, Y. Levchuk, E. Entin, J. Freeman, and D. Serfaty, "Designing the Best Team for the Task: Optimal Organizational Structures for Military Missions," in New Trends in Cooperative Activities (McNeese, Salas, & Endsley, eds.), Human Factors and Ergonomics Society Press, 2001.
[16] T. Moon, "The Expectation-Maximization Algorithm," IEEE SP Magazine, vol. 13, pp. 47-60, Nov. 1996.
[17] E. Page, "Continuous Inspection Schemes," Biometrika, vol. 41, pp. 100-115, 1954.
[18] A. Pete, K. Pattipati, and D. Kleinman, "Distributed Detection in Teams with Partial Information: a Normative-Descriptive Model," IEEE Transactions on Systems, Man and Cybernetics, vol. 23, pp. 1626-1648, 1993.
[19] A. Pete, K. Pattipati, Y. Levchuk, and D. Kleinman, "An Overview of Decision Networks and Organizations," IEEE Trans. on SMC: Part C - Applications, vol. 28, pp. 173-192, May 1998.
[20] L. Rabiner and B. Juang, "An introduction to hidden Markov models," IEEE ASSP Magazine, pp. 4-16, January 1986.
[21] D. Siegmund, Sequential Analysis - Tests and Confidence Intervals, New York: Springer-Verlag, 1995.
[22] R. Streit and R. Barrett, "Frequency line tracking using hidden Markov models," IEEE Trans. ASSP, vol. 38, pp. 586-598, Apr. 1990.
[23] A. Tanenbaum, Computer Networks, 4th ed., Prentice-Hall, 2003.
[24] H. Tu, J. Allanach, S. Singh, K. Pattipati, P. Willett, G. Levchuk, and W. Stacy, "Information Integration among Multiple Agencies via Hierarchical Bayesian Networks," Proceedings of 2004 SPIE: Defense & Security Symposium — Technologies for Homeland Security and Law Enforcement; Sensors, Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Security and Homeland Defense, Orlando, FL, April 2004.
[25] A. Wald, Sequential Analysis, New York: Wiley, 1947.

Jeffrey Allanach Jeffrey Allanach is a graduate student of Electrical and Computer Engineering at the University of Connecticut. He received his BS from UConn in December, 2003, and expects to receive his MS in May 2005. Currently, his research interests include signal processing, and target tracking.

Haiying Tu Haiying Tu was born in 1972 in Ningbo, China. She received the BS degree in automatic control from Shanghai Institute of Railway Technology in 1993 and MS in transportation information engineering and control from Shanghai Tiedao University in 1996. She is currently a PhD student of Electrical and Computer Engineering at the University of Connecticut. Her research interests include Bayesian analysis, fault diagnosis and decision making.

Satnam Singh Satnam Singh is a PhD student at Systems Optimization Laboratory, University of Connecticut. He received his MS degree in Electrical Engineering from University of Wyoming. Currently, he is the chair of IEEEUConn Students Branch. His interests are signal processing and optimization.

Peter Willett Peter Willett is a Professor of Electrical and Computer Engineering at the University of Connecticut. Previously he was at the University of Toronto, from which he received his BASc in 1982, and at Princeton University from which he received his PhD in 1986. He has written, among other topics, about the processing of signals from volumetric arrays, decentralized detection, information theory, CDMA, learning from data, target tracking, and transient detection. He is a Fellow of the IEEE, is a member of the Board of Governors of IEEE’s AES society, and is a member of the IEEE Signal Processing Society’s SAM technical committee. He is an associate editor both for IEEE Transactions on Aerospace and Electronic Systems and for IEEE Transactions on Systems, Man, and Cybernetics. He is a track organizer for Remote Sensing at the IEEE Aerospace Conference (20012003), and was co-chair of the Diagnostics, Prognosis, and System Health Management SPIE Conference in Orlando. He also served as Program Co-Chair for the 2003 IEEE Systems, Man and Cybernetics Conference in Washington, DC.

Krishna Pattipati Krishna R. Pattipati received the B. Tech degree in Electrical Engineering with highest honors from the Indian Institute of Technology, Kharagpur, in 1975, and the MS and PhD degrees in Systems Engineering from the University of Connecticut in 1977 and 1980, respectively. From 1980-86 he was employed by ALPHATECH, Inc., Burlington, MA. Since 1986, he has been with the University of Connecticut, where he is a Professor of Electrical and Computer Engineering. His current research interests are in the areas of adaptive organizations for dynamic and uncertain environments, multi-user detection in wireless communications, signal processing and diagnosis techniques for power quality monitoring, multi-object tracking, and scheduling of parallelizable tasks on multi-processor systems. Dr. Pattipati has published over 260 articles, primarily in the application of systems theory and optimization (continuous and discrete) techniques to large-scale systems. He has served as a consultant to Alphatech, Inc. and IBM Research and Development, and is a co-founder of Qualtech Systems, Inc., a small business specializing in advanced integrated diagnostics software tools.
