Joint Tracking and Classification Based on Bayes Joint Decision and Estimation

X. Rong Li, Ming Yang
Department of Electrical Engineering, University of New Orleans, New Orleans, LA 70148
Phone: 504-280-7416, Email: {xli, myang2}@uno.edu

Jifeng Ru
Arcon Corporation, Waltham, MA 02451
Email: [email protected]
Abstract— Many problems involve both decision and estimation, where the performance of the decision and that of the estimation affect each other. Such problems are usually solved by a two-stage strategy, decision-then-estimation or estimation-then-decision, which suffers from several serious drawbacks; a more integrated solution is preferred. Such an approach was proposed in [14]. It is based on a new Bayes risk that generalizes those for decision and estimation, respectively. It is Bayes optimal and can be applied to a wide spectrum of joint decision and estimation (JDE) problems. In this paper, we apply that approach to the important problem of joint tracking and classification of targets, which has received a great deal of attention in recent years. A simple yet representative example is given and the performance of the JDE solution is compared with that of the traditional methods. Issues with the design of the parameters needed for the new approach are also addressed.
Keywords: Target tracking, target classification, estimation, decision, Bayes approach.

I. INTRODUCTION

Although some problems involve decision and/or estimation either alone or separately, good solutions to many other problems require joint decision and estimation (JDE). This is often the case for problems involving inter-dependent discrete- and continuous-valued uncertainties, where decision and estimation thus affect each other. The prevailing approach to JDE tries to make the best decision first, disregarding estimation, and then solves the estimation problem as if the decision were surely correct. This "decision-then-estimation" strategy has several serious flaws. It does not account for possible decision errors in the subsequent estimation; on the other hand, the decision is made regardless of the result of estimation. Another strategy is "estimation-then-decision." It would not work well if estimation depends significantly on decision or if estimation is not secondary. In the general case, a joint approach would be more promising than separate decision and estimation as well as decision then estimation or estimation then decision. While JDE is an old problem and has been studied before, it has received increasing attention in recent years and efforts have been made to overcome the aforementioned drawbacks (see, e.g., [2]), especially for target inference (see, e.g., [1], [3], [4], [6]–[13], [15]–[20]).

Research supported in part by ARO grant W911NF-04-1-0274, NASA/LEQSF grant (2001-4)-01, and Navy through Planning Systems Contract # N68335-05-C-0382.
In the companion paper [14], an integrated approach to JDE based on a novel Bayes risk that generalizes those for decision and estimation, respectively, was proposed. In optimization-theoretic parlance, this approach has the potential of arriving at a globally optimal solution, which is inherently superior in performance to the conventional two-stage optimization strategy or separate decision and estimation, especially for problems where decision and estimation are highly correlated. The structure of the optimal solution in this framework and a converging JDE algorithm were presented in [14]. The power of the proposed framework is illustrated by applications to several difficult JDE problems in target inference, including track fusion and joint detection and tracking.

In this paper, we apply the JDE approach of [14] to joint tracking and classification (JTC) of targets. We consider a simple yet representative JTC example, where three types of data are available: the first type is useful for both tracking and classification; the second is particularly good for classification but not directly useful for tracking; the third is particularly good for tracking but not directly useful for classification. We present optimal decision, optimal estimation, decision-then-estimation, estimation-then-decision, and our proposed JDE solution in the Bayesian setting for this example. In this way, we demonstrate how the proposed JDE solution works and contrast its performance with the existing methods.

While the JDE approach of [14] is general, it relies on several design parameters, for which only simple guidelines were presented in [14]. In this paper, a case study is conducted concerning these design parameters, which trade off decision and estimation performance. Also, we propose a general method of evaluating the performance of joint decision and estimation in a comprehensive way. To our knowledge, it is the only available method, comprehensive or not, that evaluates joint decision and estimation performance.

II. JOINT DECISION AND ESTIMATION

Numerous practical problems involve joint decision and estimation, where decision and estimation are inter-dependent. Consider a JDE problem in which x is the estimatee (i.e., the quantity to be estimated) and the decision involves M candidates: D_1, ..., D_M. The complete Bayes solution to the problem of estimating x using data z is its posterior density f(x|z). By the same token, a complete Bayesian solution
to the problem of deciding on candidates D_1, ..., D_M is the set of posterior probabilities {P{D_1|z}, ..., P{D_M|z}}. We call this set of probabilities soft decisions, although this is unorthodox in decision theory. In the same spirit, when x and D are not independent, we may be interested in inferring them jointly in terms of the set of functions {P[(D,x)_1|z], ..., P[(D,x)_M|z]}, where P[(D,x)_i|z] is the posterior mixed "probability-density." Here the hybrid quantity (D,x)_i = (x(D_i), D_i(x)) signifies the inter-dependence of x and D. In some cases, a decision-then-estimation or an estimation-then-decision method is justifiable. In the general case, however, an integrated approach would be more promising than separate decision and estimation as well as the traditional two-stage strategies. Such an approach was presented in [14]. For the convenience of the reader, its main results are presented below.

A. An Integrated Approach to JDE

The basic idea is to minimize the following Bayes risk for joint decision and estimation

$\bar{R} = \sum_{i=1}^{M} \sum_{j=1}^{N} \left(\alpha_{ij} c_{ij} + \beta_{ij} E[C(x,\hat{x})\,|\,D_i, H_j]\right) P\{D_i, H_j\}$   (1)
where D_i stands for the ith decision, which is equivalent to the event {z ∈ D_i}, where D_i is the decision region for D_i in the data space; c_ij is the cost of decision D_i when hypothesis H_j is true; C(x, x̂) is the estimation cost function; E[C(x, x̂)|D_i, H_j] is the expected cost conditioned on the case that the decision is made on D_i but H_j is true; and α_ij and β_ij are relative weights of the decision and estimation costs, which can be chosen to fit different requirements given by practical problems.

This new Bayes risk generalizes the traditional ones in that (a) it is a risk for joint decision and estimation, (b) the decision set and the hypothesis set need not be in one-to-one correspondence, (c) it can handle problems with incommensurable subproblems of estimation, and (d) the coefficients (α_ij, β_ij) provide additional flexibility. For example, if M = N and D_i = "deciding on H_i" for all i, then: (a) $\bar{R}$ with (α_ij, β_ij) = (1, 0) and (0, 1) reduces to the traditional Bayes risk for decision and estimation, respectively; (b) $\bar{R}$ is simply the sum of the traditional Bayes risks for decision and estimation, respectively, if (α_ij, β_ij) = (1, 1); (c) $\bar{R}$ is a weighted sum of the product of decision and estimation costs if (α_ij, β_ij) = (0, c_ij).

For any given E[C(x, x̂)|D_i, H_j], to minimize $\bar{R}$, the optimal decision D is

$D = D_i, \quad \text{if } C_i(z) \le C_k(z), \; \forall k$   (2)

where the posterior cost is given by

$C_i(z) = \sum_{j=1}^{N} \left(\alpha_{ij} c_{ij} + \beta_{ij} E[C(x,\hat{x})\,|\,D_i, H_j]\right) P\{H_j|z\}$
and given a set of decision regions {D_1, ..., D_M} as a partition of the data space, the optimal estimator for (1) with C(x, x̂) = x̃'x̃ is the following generalized conditional mean

$\hat{x} = \sum_i 1(z; D_i)\, \check{x}_i$   (3)

where, for z ∈ D_i,

$\check{x}_i = \bar{E}[x|z] = \sum_j E[x|z, H_j]\, \bar{P}\{H_j|z\}$

$\bar{P}\{H_j|z\} = \frac{\beta_{ij} P\{H_j|z\}}{\sum_k \beta_{ik} P\{H_k|z\}}, \qquad 1(z; D_i) = \begin{cases} 1 & z \in D_i \\ 0 & \text{else} \end{cases}$

and these quantities are undefined if z ∉ D_i. The optimal joint decision-estimate (D, x̂) is the combination of the above optimal decision and optimal estimate. It can be obtained by the following iterative algorithm, which always converges.

1) Start from an arbitrary initial decision set {D_i}_{r=0} as the initial partition {D_i}, where r denotes the iteration index.
2) Estimation update: based on {D_i}_r, calculate x̂_ij and P̄{D_i, H_j|z} to obtain the optimal estimate x̂ with the corresponding conditional mean square error.
3) Decision update: calculate C_i(z) for each D_i; then $\bar{R}_r$ is obtained.
4) Compare $\bar{R}_r$ and $\bar{R}_{r-1}$. If the relative difference is small enough, stop the iteration, so the D_i with the smallest posterior cost at iteration r is taken as the optimal decision and the corresponding x̂ is taken as the optimal estimate; otherwise change the decision regions to {D_i}_r and repeat steps 2, 3, 4.

Simplification of this algorithm was discussed in [14]. Note that (3) reduces to the conditional mean if β_ij is constant over j. Assume there is no decision error, namely, P{H_j|D_i, z} = δ_{i−j} if z ∈ D_i and undefined otherwise, where δ_{i−j} is the Kronecker delta: δ_{i−j} = 1 if i = j and 0 otherwise. Then the optimal estimator (3) reduces to

$\hat{x} = \sum_i 1(z; D_i)\, E[x|z, H_i]$   (4)
which is the estimator in the traditional "decision-then-estimation" strategy. It is the conditional mean based on a single "correct" decision, whereas (3) is a weighted sum of the conditional means that accounts for possible decision errors.

B. Performance Evaluation

While many practical problems involve JDE, their solutions have so far been evaluated only in terms of decision performance and estimation performance separately. We are not aware of any measure, comprehensive or not, for evaluating joint decision and estimation performance. In this section, we propose a systematic method and measure for evaluating a JDE algorithm comprehensively, based on a statistical distance between the original data and the mock data generated by the JDE algorithm.
A large class of JDE problems can be formulated as follows. The ground truth is that the observation data z has a distribution F(z|s, x); that is, z ∼ F(z|s, x), where x is to be estimated and s is unknown with discrete (or finitely many possible) values. Probably more often, the exact form of F(z|s, x) is not known; rather, it is known that z = h(s, x, v), where v ∼ F(v|s, x) with known F(v|s, x). A JDE problem consists of two subproblems: decide on the s value and estimate x. With the assumption that s ∈ {1, 2, ..., N}, it can be formulated as estimating x and testing the hypotheses

H_1 vs. H_2 vs. ··· vs. H_N

where H_i: z ∼ F_i(z|x) and F_i(z|x) = F(z|s, x)|_{s=i}. Let ξ = (s, x). Then the solution to a JDE problem is ξ̂ = (d, x̂), where x̂ is the estimate of x and d ∈ {1, 2, ..., N} is the decision.

The basic idea of our method for evaluating JDE performance is to measure some distance between the true distribution F(z|ξ) and the distribution F_d(z|x̂) corresponding to the JDE result if they are available, such as in a simulation-based evaluation study, or some statistical distance between the original data z and the mock data ẑ generated by the JDE algorithm if the distributions are not available. Here, the mock data ẑ is generated randomly by the JDE algorithm with the result (d, x̂) via either the distribution F_d(z|x̂) or the data model that converts a JDE result to the data, such as ẑ = h(d, x̂, v) with v ∼ F(v|d, x̂). More specifically, let

$\rho(z, \hat{z}) = \int \Delta[f(z|\xi), f_d(z|\hat{x})]\, dF(z, \xi)$

where Δ[f(z|ξ), f_d(z|x̂)] is the "distance" between f(z|ξ) and f_d(z|x̂).

• If both s and x are random, generate ξ_i ∼ f(ξ), i = 1, ..., N_i, where f(ξ) is the prior distribution of ξ for performance evaluation determined by the evaluator, which could differ from the one used in the JDE algorithm.
• For each pair ξ_i = (s_i, x_i), generate z_ij ∼ f(z|ξ_i), j = 1, ..., N_j, each of which may be a vector consisting of multiple data points.
• For each z_ij, obtain ξ̂_ij = (d_ij, x̂_ij) = g(z_ij) by the JDE algorithm to be evaluated.
• Let ρ(z_i, ẑ_ij) = Δ[f(z|ξ_i), f_{d_ij}(z|x̂_ij)] be the "distance" between f(z|ξ_i) and f_{d_ij}(z|x̂_ij). Compute the final performance metric

$\rho(z, \hat{z}) = \int \Delta[f(z|\xi), f_d(z|\hat{x})]\, dF(z, \xi) \approx \frac{1}{N_i N_j} \sum_{i=1}^{N_i} \sum_{j=1}^{N_j} \rho(z_i, \hat{z}_{ij})$

We may use the total variation distance

$\Delta_{tv}(F_t, F_d) = \frac{1}{2} \int |f_t(z) - f_d(z)|\, dz = \frac{1}{2} \sum_i |p_t(z_i) - p_d(z_i)|$

where p_t and p_d are probability mass functions (pmfs). ρ(z, ẑ) has a simple form when pmfs are involved:

$\rho_{tv}(z, \hat{z}) \approx \frac{1}{2 N_i N_j} \sum_{i=1}^{N_i} \sum_{j=1}^{N_j} \sum_k |p(z_{ik}) - \hat{p}(\hat{z}_{ijk})|$
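To make the evaluation procedure above concrete, the following is a minimal Monte Carlo sketch in Python. The interface (a JDE algorithm `jde`, samplers `sample_prior` and `sample_data`, and a `distance` function playing the role of Δ) is our own illustrative assumption and is not prescribed in [14].

```python
import numpy as np

def evaluate_jde(jde, sample_prior, sample_data, distance, Ni=200, Nj=5, rng=None):
    """Monte Carlo estimate of the JDE performance metric rho.

    jde(z) -> (d, xhat):              the JDE algorithm under evaluation
    sample_prior(rng) -> (s, x):      draws ground truth from the evaluator's prior f(xi)
    sample_data(s, x, rng) -> z:      draws one data record from f(z | s, x)
    distance(s, x, d, xhat) -> float: the "distance" Delta[f(z|xi), f_d(z|xhat)]
    """
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(Ni):                        # ground-truth draws xi_i = (s_i, x_i)
        s, x = sample_prior(rng)
        for _ in range(Nj):                    # data records z_ij under xi_i
            z = sample_data(s, x, rng)
            d, xhat = jde(z)                   # JDE result (d_ij, xhat_ij)
            total += distance(s, x, d, xhat)
    return total / (Ni * Nj)                   # approximates rho(z, zhat)

def total_variation(p_true, p_hat):
    """Total variation distance between two pmfs given over the same support."""
    return 0.5 * float(np.sum(np.abs(np.asarray(p_true) - np.asarray(p_hat))))
```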
III. TARGET TRACKING AND CLASSIFICATION

Target tracking and classification (recognition) are coupled in many cases, but are often handled independently. For instance, while many tracking algorithms are mainly based on kinematic (e.g., radar, sonar) measurements and associated models, target classification is usually handled using identity or attribute data from, e.g., electronic support measure (ESM), high resolution radar, and acoustic, passive infrared, and seismic sensing modalities. Recent years have witnessed a great deal of effort in joint tracking and classification. In this section, we apply the integrated approach of [14] to joint tracking and classification via a simple yet representative example.

A. Problem Statement

Consider three types of data z_1, z_2, z_3 having the same size from three different sensors that are perfectly synchronized. For simplicity, our presentation below is for the case in which each data point is a scalar, but the approach works for the general vector case without difficulty. The first type is obtained from infrared imagers (or other energy-selective sensing devices), modeled as

$z_{1j} = \theta x + v_j, \quad j = 1, \ldots, n$   (5)
where v_j ∼ N(0, σ²_v) is i.i.d. Gaussian noise; x denotes the target state, which has a normal prior N(x̄, σ²_x) and is independent of v; and the modulation term θ has two possible values,

$H_0: \theta = \theta_0$
$H_1: \theta = \theta_1$   (6)

which correspond to two possible classes of objects, e.g., humans and moving vehicles, respectively, since different classes of objects have different infrared features, assumed to be reflected in amplitude modulation.

The second type of data is obtained from ESM sensors (or some devices based on image, acoustic, or seismic features). The readings are of identity type and are used to indicate the different target classes with certain probabilities:

$P\{z_{2j} = \theta_0 \,|\, \theta = \theta_0\} = 1 - p_0$
$P\{z_{2j} = \theta_1 \,|\, \theta = \theta_0\} = p_0$
$P\{z_{2j} = \theta_0 \,|\, \theta = \theta_1\} = 1 - p_1$
$P\{z_{2j} = \theta_1 \,|\, \theta = \theta_1\} = p_1, \qquad j = 1, \ldots, n$   (7)

Assume z_21, ..., z_2n are independent. The third type of data is obtained from kinematic sensing devices, say, radar, modeled as

$z_{3j} = x + w_j, \quad j = 1, \ldots, n$   (8)

where w_j ∼ N(0, σ²_w) is independent Gaussian noise.
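For concreteness, a minimal simulation sketch of the measurement models (5)–(8) is given below; the function and variable names are ours, and the illustrative parameter values are those used later in Sec. IV.

```python
import numpy as np

def simulate_measurements(theta, x, n, var_v, var_w, p, theta0, theta1, rng):
    """Draw one batch of the three data types for a target of class theta and state x.

    z1: amplitude-modulated data (5),  z1j = theta*x + vj with vj ~ N(0, var_v)
    z2: identity (ESM-type) data (7),  z2j in {theta0, theta1} with P{z2j = theta1} = p
    z3: purely kinematic data (8),     z3j = x + wj with wj ~ N(0, var_w)
    """
    z1 = theta * x + rng.normal(0.0, np.sqrt(var_v), n)
    z2 = np.where(rng.random(n) < p, theta1, theta0)
    z3 = x + rng.normal(0.0, np.sqrt(var_w), n)
    return z1, z2, z3

# One draw under H1 (theta = theta1), with the parameter values used later in Sec. IV.
rng = np.random.default_rng(0)
x = rng.normal(1.0, 0.5)      # prior N(xbar, sigma_x^2), xbar = 1, sigma_x = 0.5
z1, z2, z3 = simulate_measurements(theta=2.0, x=x, n=10, var_v=1.0, var_w=0.5,
                                   p=0.65, theta0=1.0, theta1=2.0, rng=rng)
```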
The objective is not only to track the moving target x (with continuous uncertainty) but also to determine the target type θ. Note that type 2 and type 3 data do not provide direct information for tracking and classification, respectively. As a result, it is difficult to use type 2 data for tracking or type 3 data for classification without joint decision and estimation.

B. Conditional Independence

It is assumed that measurement errors from different sensors are independent and thus the three types of data are conditionally independent; that is,

$f(z_1, z_2, z_3 | H_i, x) = f(z_1|H_i, x) f(z_2|H_i, x) f(z_3|H_i, x) = f(z_1|H_i, x) f(z_2|H_i) f(z_3|x)$   (9)

It turns out that type 1 and type 2 data are independent conditioned on the hypothesis (without x):

$f(z_1, z_2|H_i) = f(z_2|H_i) f(z_1|H_i)$

which follows from

$f(z_1, z_2|H_i) = \int f(z_1, z_2|H_i, x) f(x|H_i)\, dx = \int f(z_1|H_i, x) f(z_2|H_i) f(x)\, dx = f(z_2|H_i) f(z_1|H_i)$

Similar conditional independence holds between type 3 and type 2 data and between type 2 data and type 1 and type 3 data:

$f(z_2, z_3|H_i) = f(z_2|H_i) f(z_3|H_i)$   (10)

$f(z_1, z_2, z_3|H_i) = f(z_2|H_i) f(z_1, z_3|H_i)$   (11)

but type 1 and type 3 data are not independent conditioned on the hypothesis (without x):

$f(z_1, z_3|H_i) = \int f(z_1|H_i, x) f(z_3|x) f(x)\, dx = \int N(z_1; \theta_i x \mathbf{1}, \sigma^2_v I_n)\, N(z_3; x \mathbf{1}, \sigma^2_w I_n)\, N(x; \bar{x}, \sigma^2_x)\, dx$

where N(y; ȳ, C_y) stands for the pdf of a Gaussian variable y with mean ȳ and covariance C_y, and 1 = 1_n = [1, ..., 1]'.

C. Likelihood Functions

Under each hypothesis, type 1 and type 3 data, z = [z_1', z_3']', follow a linear model z = Hx + υ, where

$H_0: H = H_0 = [\theta_0 \mathbf{1}', \mathbf{1}']'$
$H_1: H = H_1 = [\theta_1 \mathbf{1}', \mathbf{1}']'$

and υ = [v', w']' with R = cov(υ) = diag(σ²_v I_n, σ²_w I_n). Under each hypothesis, x and z are jointly Gaussian because they are two weighted sums of the jointly Gaussian random variables x and υ (since they are independent Gaussian). As such, z is Gaussian. Under H_i,

$E[[z_1', z_3']'|H_i] = [\theta_i \mathbf{1}', \mathbf{1}']'\, \bar{x} = [\bar{z}_i', \bar{z}_3']'$

$\mathrm{cov}([z_1', z_3']'|H_i) = [\theta_i \mathbf{1}', \mathbf{1}']'\, \sigma^2_x\, [\theta_i \mathbf{1}', \mathbf{1}'] + R = \begin{bmatrix} \theta^2_i \sigma^2_x \mathbf{1}\mathbf{1}' + \sigma^2_v I_n & \theta_i \sigma^2_x \mathbf{1}\mathbf{1}' \\ \theta_i \sigma^2_x \mathbf{1}\mathbf{1}' & \sigma^2_x \mathbf{1}\mathbf{1}' + \sigma^2_w I_n \end{bmatrix} = \begin{bmatrix} C_i & \theta_i \sigma^2_x \mathbf{1}\mathbf{1}' \\ \theta_i \sigma^2_x \mathbf{1}\mathbf{1}' & C_3 \end{bmatrix} = C_{i3}$

where z̄_3 = x̄1, C_3 = σ²_x 11' + σ²_w I_n, and, for i = 0, 1, z̄_i = θ_i x̄ 1, C_i = θ²_i σ²_x 11' + σ²_v I_n. It thus follows that

$f(z_1, z_3|H_i) = N([z_1', z_3']'; [\bar{z}_i', \bar{z}_3']', C_{i3})$
$f(z_1|H_i) = N(z_1; \bar{z}_i, C_i)$   (12)
$f(z_3|H_i) = N(z_3; \bar{z}_3, C_3)$

Note that clearly f(z_1, z_3|H_i) ≠ f(z_1|H_i) f(z_3|H_i). Also, we clearly have

$f(z_2|H_i) = f(z_{21}, \ldots, z_{2n}|H_i) = \prod_{j=1}^{n} [p_i \delta_{z_{2j} - \theta_1} + (1 - p_i) \delta_{z_{2j} - \theta_0}]$
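The likelihoods above can be evaluated numerically as in the following sketch (our own illustration, using SciPy's multivariate normal). The parameter dictionary `prm` and the function name are assumptions introduced here for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def likelihoods(z1, z2, z3, theta_i, p_i, prm):
    """Hypothesis-conditional likelihoods of Sec. III-C for one hypothesis Hi:
    f(z1|Hi), f(z3|Hi), f(z1, z3|Hi) from (12), and f(z2|Hi).

    prm holds theta0, theta1, p0, p1, xbar, var_x, var_v, var_w (names are ours)."""
    n = len(z1)
    one = np.ones(n)
    outer = np.outer(one, one)                          # the matrix 11'
    C_i = theta_i**2 * prm['var_x'] * outer + prm['var_v'] * np.eye(n)
    C_3 = prm['var_x'] * outer + prm['var_w'] * np.eye(n)
    H = np.concatenate([theta_i * one, one])            # [theta_i 1', 1']'
    C_i3 = prm['var_x'] * np.outer(H, H) + np.diag(
        np.concatenate([prm['var_v'] * one, prm['var_w'] * one]))
    f_z1 = multivariate_normal(theta_i * prm['xbar'] * one, C_i).pdf(z1)
    f_z3 = multivariate_normal(prm['xbar'] * one, C_3).pdf(z3)
    f_z13 = multivariate_normal(H * prm['xbar'], C_i3).pdf(np.concatenate([z1, z3]))
    gamma = int(np.sum(z2 == prm['theta1']))            # ESM readings equal to theta1
    f_z2 = p_i**gamma * (1.0 - p_i)**(n - gamma)
    return {'f_z1': f_z1, 'f_z2': f_z2, 'f_z3': f_z3, 'f_z13': f_z13}
```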
D. Classification by Bayesian Decision
The optimal Bayes decision minimizes the so-called Bayes risk

$\bar{R}_D = \sum_{i,j} c_{ij}\, P\{\text{"}H_i\text{"}|H_j\}\, P\{H_j\}$

which is a special case of (1). It decides on the hypothesis H_i with the smallest posterior cost, that is, C_i(z) ≤ C_l(z), ∀l, where $C_k(z) = \sum_j c_{kj} P\{H_j|z\}$ and z = (z_1, z_2, z_3). In our example, we choose c_00 = c_11 = 0, c_01 = c_10 = 1. With this choice, the Bayes risk $\bar{R}_D$ becomes the probability of decision error

$P_e = \sum_{i \neq j} P\{\text{"}H_i\text{"}|H_j\}\, P\{H_j\}$

and thus the optimal Bayes decision becomes the minimum decision-error decision. It is well known that this amounts to the maximum a posteriori (MAP) decision, which decides on the hypothesis having the maximum posterior probability. It follows from Bayes' theorem that the posterior probabilities are

$P\{H_0|z\} = \frac{P\{H_0\} f(z|H_0)}{P\{H_0\} f(z|H_0) + P\{H_1\} f(z|H_1)}, \qquad P\{H_1|z\} = 1 - P\{H_0|z\}$   (13)
While type 3 data z_3 is potentially helpful for decision (through some kind of estimation), it is not clear how it should be used in a purely decision setting. For example, if, conditioned on x, type 3 data is independent of the other types of data, as is the case for our problem, the likelihood ratio of H_1 to H_0 conditioned on x remains unchanged with or without type 3 data. As a result, in an implementation of
the optimal Bayes decision, type 3 data is often ignored in practice (although it can actually be used to help decision) and only the first two types of data (z_1 and z_2) are used. Now let z = (z_1, z_2) be the data used for decision. Since each element in z_2 is Bernoulli distributed, to make a decision on θ is actually to infer the Bernoulli parameter p. It is known that $\hat{z}_2 = (1/n)\sum_{j=1}^{n} z_{2j}$ is a sufficient statistic for p [5]. Note that ẑ_2 is equivalent to the number γ of z_2j's that are equal to θ_1, which has a binomial (n, p_i) distribution under H_i:

$P\{\hat{z}_2 = \gamma/n \,|\, H_i\} = \binom{n}{\gamma} p_i^{\gamma} (1 - p_i)^{n-\gamma}$

Thus, it follows from the conditional independence of type 1 and type 2 data that

$f(z|H_i) = f(z_1|H_i) f(z_2|H_i) = f(z_1|H_i)\, P\{\hat{z}_2 = \gamma/n \,|\, H_i\} = N(z_1; \bar{z}_i, C_i) \binom{n}{\gamma} p_i^{\gamma} (1 - p_i)^{n-\gamma}$

By assuming equal priors for both hypotheses, P{H_0} = P{H_1}, from (13) it follows that
$P\{H_0|z\} = \left[1 + \left(\frac{p_1}{p_0}\right)^{\gamma} \left(\frac{1-p_1}{1-p_0}\right)^{n-\gamma} \frac{|C_0|^{1/2}}{|C_1|^{1/2}} \exp\left(\frac{1}{2}S\right)\right]^{-1}, \qquad P\{H_1|z\} = 1 - P\{H_0|z\}$   (14)

where

$S = (z_1 - \bar{z}_0)' C_0^{-1} (z_1 - \bar{z}_0) - (z_1 - \bar{z}_1)' C_1^{-1} (z_1 - \bar{z}_1)$

As such, the decision rule is

$S + 2\ln\left[\left(\frac{p_1}{p_0}\right)^{\gamma} \left(\frac{1-p_1}{1-p_0}\right)^{n-\gamma} \frac{|C_0|^{1/2}}{|C_1|^{1/2}}\right] \;\overset{H_0}{\underset{H_1}{\lessgtr}}\; 2\ln\frac{c_{10} - c_{00}}{c_{01} - c_{11}}$   (15)

which, for our choice of (c_00, c_01, c_10, c_11) = (0, 1, 1, 0), is the same as the MAP decision: decide on H_i if P{H_i|z} > P{H_j|z}.

E. Tracking by Bayesian Estimation

The optimal Bayes estimator is the conditional mean x̂ = E[x|z]. By the total expectation theorem,

$\hat{x} = E[x|z] = \sum_{i=0,1} E[x|z, H_i]\, P\{H_i|z\} = \sum_{i=0,1} \hat{x}_i\, P\{H_i|z\}$

$\mathrm{MSE}(\hat{x}|z) = \sum_{i=0,1} \left[P_i + (\hat{x} - \hat{x}_i)(\hat{x} - \hat{x}_i)'\right] P\{H_i|z\}$
By a similar argument as for the Bayesian decision, in Bayesian estimation only type 1 and type 3 data, z_1 and z_3, are used directly, and thus in the practical implementation z = [z_1', z_3']'. As explained in Sec. III-C, z and x are jointly Gaussian under each hypothesis. It is well known that, for the jointly Gaussian case, the conditional mean and its MSE matrix under H_i are

$\hat{x}_i = E[x|z, H_i] = \bar{x} + C_{xz_i} C_{z_i}^{-1} (z - \bar{z}_i)$

$P_i = \mathrm{MSE}(\hat{x}_i|z, H_i) = C_x - C_{xz_i} C_{z_i}^{-1} C_{xz_i}'$

where

$C_{xz_i} = \sigma^2_x [\theta_i \mathbf{1}', \mathbf{1}']$

$C_{z_i} = \sigma^2_x [\theta_i \mathbf{1}', \mathbf{1}']' [\theta_i \mathbf{1}', \mathbf{1}'] + \mathrm{diag}(\sigma^2_v I_n, \sigma^2_w I_n) = C_{i3}$

$\bar{z}_i = [\theta_i \mathbf{1}', \mathbf{1}']' \bar{x} = [\bar{z}_i', \bar{z}_3']', \qquad C_x = \sigma^2_x$
The posterior probabilities of the hypotheses are

$P\{H_0|z\} = \frac{P\{H_0\} f(z|H_0)}{P\{H_0\} f(z|H_0) + P\{H_1\} f(z|H_1)}, \qquad P\{H_1|z\} = 1 - P\{H_0|z\}$

where f(z|H_i) was given by (12). For equal prior probabilities of the hypotheses, we have

$P\{H_0|z\} = \frac{f(z|H_0)}{f(z|H_0) + f(z|H_1)}$

F. Classification before Tracking (Decision then Estimation)

In this traditional approach to JDE, a decision (target classification) is first made concerning the hypotheses H_0 and H_1, as in Sec. III-D, and then the target state x is estimated, as in Sec. III-E. More specifically, the data space is first partitioned as {D_0, D_1} by the decision rule, and the target state estimator is given by (4):

$\hat{x} = \sum_i 1(z; D_i)\, E[x|z, H_i]$
In other words, if the decision is "H_i", which should be made following Sec. III-D, then the estimate is x̂_i = E[x|z, H_i] with MSE(x̂_i|z, H_i), given in Sec. III-E.

G. Tracking before Classification (Estimation then Decision)

In this approach to JDE, the target state x is estimated first, as in Sec. III-E, and then a decision (target classification) is made concerning the hypotheses H_0 and H_1, as in Sec. III-D. More specifically, let x̂ be the Bayes target state estimator of Sec. III-E. Then

$f(z|H_i, \hat{x}) = f(z_1|H_i, \hat{x}) f(z_2|H_i, \hat{x}) = N(z_1; \theta_i \hat{x} \mathbf{1}, \sigma^2_v I_n)\, P\{\hat{z}_2 = \gamma/n \,|\, H_i\} = \frac{1}{(2\pi\sigma^2_v)^{n/2}} \exp\left(-\frac{1}{2\sigma^2_v} \sum_{j=1}^{n} (z_{1j} - \theta_i \hat{x})^2\right) \binom{n}{\gamma} p_i^{\gamma} (1 - p_i)^{n-\gamma}$

The Bayes test then decides on H_1 if

$\frac{f(z|\hat{x}, H_1)}{f(z|\hat{x}, H_0)} > \lambda = \frac{(c_{10} - c_{00}) P\{H_0\}}{(c_{01} - c_{11}) P\{H_1\}}$

Our choice (c_00, c_01, c_10, c_11) = (0, 1, 1, 0) and equal prior probabilities of the hypotheses, P{H_0} = P{H_1}, lead to λ = 1. Then the test can be simplified as

$\frac{n}{\sigma^2_v} (\theta_1 - \theta_0)\, \hat{x}\, \left[\hat{z}_1 - (\theta_1 + \theta_0)\hat{x}/2\right] + \ln\left[\left(\frac{p_1}{p_0}\right)^{\gamma} \left(\frac{1-p_1}{1-p_0}\right)^{n-\gamma}\right] \;\overset{H_0}{\underset{H_1}{\lessgtr}}\; 0$

where the sample mean $\hat{z}_1 = (1/n)\sum_{j=1}^{n} z_{1j}$.
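To make the two traditional strategies concrete for this example, here is a minimal sketch under the equal-prior, 0–1-cost assumptions above. It reuses `likelihoods()` from the sketch after Sec. III-C, and all other interface names are ours.

```python
import numpy as np

def conditional_moments(z1, z3, theta_i, prm):
    """Gaussian conditional mean and MSE of x given (z1, z3) under Hi (Sec. III-E)."""
    n = len(z1)
    one = np.ones(n)
    H = np.concatenate([theta_i * one, one])
    C_zi = prm['var_x'] * np.outer(H, H) + np.diag(
        np.concatenate([prm['var_v'] * one, prm['var_w'] * one]))
    gain = np.linalg.solve(C_zi, prm['var_x'] * H)              # C_zi^{-1} C_xzi'
    xhat_i = prm['xbar'] + gain @ (np.concatenate([z1, z3]) - H * prm['xbar'])
    P_i = prm['var_x'] - gain @ (prm['var_x'] * H)
    return xhat_i, P_i

def decision_then_estimation(z1, z2, z3, prm):
    """Sec. III-F: MAP classification from (z1, z2), then estimation under the decided class."""
    lik = [likelihoods(z1, z2, z3, prm['theta0'], prm['p0'], prm),
           likelihoods(z1, z2, z3, prm['theta1'], prm['p1'], prm)]
    d = int(np.argmax([l['f_z1'] * l['f_z2'] for l in lik]))    # equal priors, 0-1 costs
    theta_d = prm['theta1'] if d == 1 else prm['theta0']
    xhat_d, P_d = conditional_moments(z1, z3, theta_d, prm)
    return d, xhat_d, P_d

def estimation_then_decision(z1, z2, z3, prm):
    """Sec. III-G: mixture-form Bayes estimate from (z1, z3), then the simplified test."""
    lik = [likelihoods(z1, z2, z3, prm['theta0'], prm['p0'], prm),
           likelihoods(z1, z2, z3, prm['theta1'], prm['p1'], prm)]
    post = np.array([l['f_z13'] for l in lik])
    post /= post.sum()                                          # P{Hi | z1, z3}, equal priors
    moments = [conditional_moments(z1, z3, th, prm) for th in (prm['theta0'], prm['theta1'])]
    xhat = float(post @ np.array([m[0] for m in moments]))
    n, gamma = len(z1), int(np.sum(z2 == prm['theta1']))
    th0, th1, p0, p1 = prm['theta0'], prm['theta1'], prm['p0'], prm['p1']
    stat = (n / prm['var_v']) * (th1 - th0) * xhat * (np.mean(z1) - (th1 + th0) * xhat / 2) \
           + gamma * np.log(p1 / p0) + (n - gamma) * np.log((1 - p1) / (1 - p0))
    return (1 if stat > 0 else 0), xhat
```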
H. Joint Tracking and Classification
The first type of data can serve both tracking and classification purposes directly. The second type is useful for decision but has no direct impact on estimation; the third type is useful for estimation but has no direct impact on decision. As a result, in the usual implementations of the traditional Bayes decision, Bayes estimation, and the two-stage approaches, one type of data is not used directly for either classification (decision) or tracking (estimation). However, our proposed joint approach uses all data without difficulty. Its performance is generally superior since more information is used.

The joint solution can be achieved by iteration. Although the iteration may start from any decision/estimation results, we choose the one with the smaller JDE cost from the conventional solutions. For the choice of the JDE cost weights α_ij, β_ij in (1), the values of α_ij scale the decision costs, and, generally speaking, we would like to choose β_01, β_10 < β_00, β_11 so that the estimation error costs are taken into account (otherwise the generalized Bayes risk would be dominated by the cost associated with decision errors). For simplicity, here our JDE algorithm starts from the Bayes decision results. If the decision is "H_i", i.e., z ∈ D_i, from (3) we have

$\hat{x} = \check{x}_i = \bar{E}[x|z, D_i] = \sum_j E[x|z, D_i, H_j]\, \bar{P}\{H_j|z, D_i\} = \sum_j \hat{x}_j\, \bar{P}\{H_j|z\}$

where

$\bar{P}\{H_j|z\} = \frac{\beta_{ij} P\{H_j|z\}}{\beta_{i0} P\{H_0|z\} + \beta_{i1} P\{H_1|z\}}, \qquad z \in D_i$

Then

$\check{x}_i = \hat{x}_0 \bar{P}\{H_0|z\} + \hat{x}_1 \bar{P}\{H_1|z\} = \frac{\hat{x}_0 \beta_{i0} P\{H_0|z\} + \hat{x}_1 \beta_{i1} P\{H_1|z\}}{\beta_{i0} P\{H_0|z\} + \beta_{i1} P\{H_1|z\}}$

To calculate the posterior JDE cost

$R(z) = \sum_i \sum_j \left(c_{ij} + \beta_{ij} E[\tilde{x}'\tilde{x}\,|\,D_i, H_j, z]\right) P\{D_i, H_j|z\}$
the key is to obtain the part contributed by estimation. It can be shown that

$\mathrm{mse}(\hat{x}|D_i, H_j, z) \triangleq E[\tilde{x}'\tilde{x}\,|\,D_i, H_j, z] = \mathrm{mse}(\hat{x}_{ij}|z, D_i, H_j) + (\hat{x}_{ij} - \hat{x})'(\hat{x}_{ij} - \hat{x})$

where, for z ∈ D_i,

$\hat{x}_{ij} = E[x|z, D_i, H_j] = E[x|z, H_j], \qquad \mathrm{mse}(\hat{x}_{ij}|z, D_i, H_j) = \mathrm{mse}(\hat{x}_{ij}|z, H_j)$

and x̂ is obtained by (3). Since, under z ∈ D_i, $\hat{x}_{ij} - \hat{x} = \hat{x}_j - \sum_i 1(z; D_i)\check{x}_i = \hat{x}_j - \check{x}_i$,

$E_{ij} \triangleq \mathrm{mse}(\hat{x}|D_i, H_j) = E[\mathrm{mse}(\hat{x}_{ij}|z, D_i, H_j)|D_i, H_j] + E[(\hat{x}_{ij} - \hat{x})'(\hat{x}_{ij} - \hat{x})|D_i, H_j]$
$= \mathrm{mse}(\hat{x}_{ij}|D_i, H_j) + E[(\hat{x}_j - \check{x}_i)'(\hat{x}_j - \check{x}_i)|D_i, H_j] = \mathrm{mse}(\hat{x}_j|D_i, H_j) + E[(\hat{x}_j - \check{x}_i)'(\hat{x}_j - \check{x}_i)|D_i, H_j]$

For each decision region,

$\mathrm{mse}(\hat{x}_{ij}|z, D_i, H_j) = \mathrm{mse}(\hat{x}_j|z, H_j) = \sigma^2_x - (\sigma^2_x)^2 H_j' C_{z_j}^{-1} H_j, \qquad \text{if } z \in D_i$

Since it does not depend on the observations, we have

$\mathrm{mse}(\hat{x}_{ij}|D_i, H_j) = E[\mathrm{mse}(\hat{x}_{ij}|z, D_i, H_j)] = \mathrm{mse}(\hat{x}_j|z, H_j)$

Note that under z ∈ D_i,

$\hat{x}_j - \check{x}_i = \hat{x}_j - \frac{\sum_k \beta_{ik} P\{H_k|z\}\, \hat{x}_k}{\sum_l \beta_{il} P\{H_l|z\}} = \frac{\sum_k (\hat{x}_j - \hat{x}_k)\, \beta_{ik} P\{H_k|z\}}{\sum_l \beta_{il} P\{H_l|z\}}$

Therefore

$\tilde{E}_{ij} \triangleq E[(\hat{x}_j - \check{x}_i)'(\hat{x}_j - \check{x}_i)|D_i, H_j] = \int_{z \in D_i} (\hat{x}_j - \check{x}_i)'(\hat{x}_j - \check{x}_i)\, dF(z|H_j) = E\left[\frac{\tilde{y}_i'\tilde{y}_i}{(\sum_l \beta_{il} P\{H_l|z\})^2}\,\middle|\,D_i, H_j\right]$

where $\tilde{y}_i = \sum_k (\hat{x}_j - \hat{x}_k)\, \beta_{ik} P\{H_k|z\}$. It can be obtained numerically by the Monte Carlo method:

$\tilde{E}_{ij} \approx \frac{1}{L_i} \sum_{k=1}^{L_i} \left[\hat{x}_j(z_k^{(i)}) - \check{x}_i(z_k^{(i)})\right]'\left[\hat{x}_j(z_k^{(i)}) - \check{x}_i(z_k^{(i)})\right]$
where L and L_i are the measurement counts in the Monte Carlo simulation: generate data points z_1, ..., z_L with the distribution F(z|H_j); use the decision part to collect all the points z_1^{(i)}, ..., z_{L_i}^{(i)} that fall in D_i, where Σ_i L_i = L.

Let $\bar{c}_{ij} = \alpha_{ij} c_{ij} + \beta_{ij} E_{ij}$. If $\bar{c}_{10} > \bar{c}_{00}$ and $\bar{c}_{01} > \bar{c}_{11}$, for the next iteration, decide on H_1 if

$S + 2\ln\left[\left(\frac{p_1}{p_0}\right)^{\gamma} \left(\frac{1-p_1}{1-p_0}\right)^{n-\gamma} \frac{|C_0|^{1/2}}{|C_1|^{1/2}}\right] > 2\ln\frac{\bar{c}_{10} - \bar{c}_{00}}{\bar{c}_{01} - \bar{c}_{11}}$

otherwise decide on H_1 if

$(\bar{c}_{01} - \bar{c}_{11})\left(\frac{p_1}{p_0}\right)^{\gamma} \left(\frac{1-p_1}{1-p_0}\right)^{n-\gamma} \frac{|C_0|^{1/2}}{|C_1|^{1/2}}\, e^{S/2} > (\bar{c}_{10} - \bar{c}_{00})$

The most straightforward stopping criterion is to check the difference of $\bar{R}$ in two adjacent iterations, which is not easy to calculate. Alternatively, we may stop the iteration if $\max_{i,j} |E_{ij}^{(k)} - E_{ij}^{(k+1)}|$ is below a threshold and there is no change in the decision (i.e., $z \in (D_i^{(k)} \cap D_i^{(k+1)})$), as explained in [14].
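The iteration just described can be sketched as follows. This is only one possible reading of the algorithm (in particular, we compute P{H_j|z} from all three data types via (11) and estimate the E_ij terms from a fixed Monte Carlo pool); `likelihoods()` and `conditional_moments()` are the earlier sketches, and all interface names are ours.

```python
import numpy as np

def beta_mixture(post, xhats, beta_row):
    """Generalized conditional mean (3): a beta-weighted mixture of the E[x|z, Hj]."""
    w = np.asarray(beta_row) * post
    return float(w @ xhats / w.sum())

def per_record(z1, z2, z3, prm):
    """Posterior P{Hj|z} from all three data types via (11), plus xhat_j and P_j."""
    lik = [likelihoods(z1, z2, z3, prm['theta0'], prm['p0'], prm),
           likelihoods(z1, z2, z3, prm['theta1'], prm['p1'], prm)]
    post = np.array([l['f_z2'] * l['f_z13'] for l in lik])
    post /= post.sum()
    mom = [conditional_moments(z1, z3, th, prm) for th in (prm['theta0'], prm['theta1'])]
    return post, np.array([m[0] for m in mom]), np.array([m[1] for m in mom])

def jde_solution(z_obs, prm, alpha, c, beta, sample_data, L=500, max_iter=10, tol=1e-3, rng=None):
    """Iterative JDE solution of Sec. III-H for the JTC example (a sketch).

    z_obs = (z1, z2, z3) is the observed record; alpha, c, beta are 2x2 arrays;
    sample_data(j, rng) draws one (z1, z2, z3) record under H_j, e.g., with the
    simulation sketch of Sec. III-A and a state drawn from the prior."""
    alpha, c, beta = map(np.asarray, (alpha, c, beta))
    rng = np.random.default_rng() if rng is None else rng
    pool = [[per_record(*sample_data(j, rng), prm) for _ in range(L)] for j in range(2)]
    cbar = alpha * c                               # start from the Bayes decision costs
    E = np.zeros((2, 2))
    for _ in range(max_iter):
        E_new, count = np.zeros((2, 2)), np.zeros((2, 2))
        for j in range(2):                         # Monte Carlo records generated under H_j
            for post, xhats, P in pool[j]:
                i = int(np.argmin(cbar @ post))    # record falls in decision region D_i
                err = xhats[j] - beta_mixture(post, xhats, beta[i])
                E_new[i, j] += P[j] + err**2       # mse(xhat_j|Hj) + (xhat_j - xcheck_i)^2
                count[i, j] += 1
        E_new /= np.maximum(count, 1)
        if np.max(np.abs(E_new - E)) < tol:        # stop when the E_ij stabilize
            break
        E = E_new
        cbar = alpha * c + beta * E                # updated costs cbar_ij for the next pass
    post, xhats, _ = per_record(*z_obs, prm)
    d = int(np.argmin(cbar @ post))                # smallest posterior JDE cost
    return d, beta_mixture(post, xhats, beta[d])
```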
I. Performance Evaluation

In this example,

$f(z_1|\theta_i, x) = N(z_1; \theta_i x \mathbf{1}, \sigma^2_v I)$
$f(z_2|\theta_i, x) = \prod_{j=1}^{n} [p_i \delta_{z_{2j} - \theta_1} + (1 - p_i) \delta_{z_{2j} - \theta_0}]$
$f(z_3|\theta_i, x) = N(z_3; x \mathbf{1}, \sigma^2_w I)$
$f(z_1, z_2, z_3|\theta_i, x) = f(z_1|\theta_i, x)\, f(z_2|\theta_i, x)\, f(z_3|\theta_i, x)$

Let

$\hat{f}(z|\hat{\theta}, \hat{x}) = f(z|\theta, x)|_{(\theta, x) = (\hat{\theta}, \hat{x})}$

Considering the pmf, we have

$g(z|\theta_i, x) = f_1(z_1|\theta_i, x)\, p_2(z_2|\theta_i, x)\, f_3(z_3|\theta_i, x) = N(z_1; \theta_i x \mathbf{1}, \sigma^2_v I)\, N(z_3; x \mathbf{1}, \sigma^2_w I) \prod_{j=1}^{n} [p_i \delta_{z_{2j} - \theta_1} + (1 - p_i) \delta_{z_{2j} - \theta_0}]$

and

$\hat{g}(z|\hat{\theta}, \hat{x}) = N(z_1; \hat{\theta}\hat{x} \mathbf{1}, \sigma^2_v I)\, N(z_3; \hat{x} \mathbf{1}, \sigma^2_w I) \prod_{j=1}^{n} [\hat{p}_i \delta_{z_{2j} - \theta_1} + (1 - \hat{p}_i) \delta_{z_{2j} - \theta_0}]$
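A mock-data generator for this example, as used by the evaluation procedure of Sec. II-B, may be sketched as follows (an illustrative helper with assumed names; `prm` is the parameter dictionary of the earlier sketches).

```python
import numpy as np

def mock_data(d, xhat, n, prm, rng):
    """Generate mock data zhat from the JDE result (d, xhat) via the data model of
    Sec. III-A, i.e., zhat = h(d, xhat, v)."""
    theta_hat = prm['theta1'] if d == 1 else prm['theta0']
    p_hat = prm['p1'] if d == 1 else prm['p0']
    z1 = theta_hat * xhat + rng.normal(0.0, np.sqrt(prm['var_v']), n)
    z2 = np.where(rng.random(n) < p_hat, prm['theta1'], prm['theta0'])
    z3 = xhat + rng.normal(0.0, np.sqrt(prm['var_w']), n)
    return z1, z2, z3
```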
IV. SIMULATION RESULTS

In our simulation example, the following parameter values were used:

θ_0 = 1, θ_1 = 2, p_0 = 0, p_1 = 0.65, x̄ = 1, σ²_x = 0.5², σ²_v = 1, σ²_w = 0.5

With data length n = 10, the simulation results are based on M = 500 Monte Carlo runs.

A. Scenario 1: Data generated from H_0

In this case, the simulated data are generated from H_0 (humans). The inference results are listed in Table I. To obtain these results, the weights of the JDE cost are chosen as α_ij = α = 1, i, j = 0, 1, and β_01 = β_10, β_00 = β_11. For comparison purposes, we add the constraint Σ_i β_ij = B. Noticing that the maximum possible MSE (corresponding to the case where all classification results are incorrect) is around 0.35, we chose B = 3 to balance the decision and estimation impact.

The root-mean-square error (RMSE) of tracking the target state and the probability of correct classification P_C are two conventional metrics for the tracking and classification performance, respectively, but not jointly. We are not aware of any measure in the literature for the overall joint performance. The JDE performance metric ρ proposed in Sec. II-B is used to measure the joint performance. For N_i = 200, N_j = 5, the results are shown in Table I.

Consider the tracking errors. In the ideal case (always identifying the target correctly) the root-mean-square error is RMSE_ideal = 0.1767; for tracking without classification or tracking before
classification, RMSE_E = RMSE_{E→D} = 0.2532, which equals RMSE_JDE with β_ij/β_ii = 1, as it should; for classification before tracking, RMSE_{D→E} = 0.2403. This indicates that decision (classification) helps estimation (tracking) noticeably.

Consider the classification performance now. For classification without tracking or classification before tracking, (P_C)_D = (P_C)_{D→E} = 0.798, which equals (P_C)_JDE with β_ij/β_ii = 1, as it should; for tracking before classification, P_C = 0.894. This indicates that estimation (tracking) also helps decision (classification) significantly.

These results for tracking and classification verify that in the H_0 case of this example, decision and estimation help each other. However, it is hard to compare the overall performance of the different strategies, since none of them is always better than the others in terms of both decision and estimation results. The JDE performance index ρ of Sec. II-B is extremely useful in such a case, where an overall ranking is desirable. We calculated the ρ values for the two conventional strategies: for classification before tracking, ρ_{D→E} = 0.9976 × 10^{-2}; for tracking before classification, ρ_{E→D} = 1.0105 × 10^{-2}. Note that ρ_{E→D} is not equal to ρ_JDE with β_ij/β_ii = 1, although RMSE_E = RMSE_{E→D} = RMSE_JDE(β_ij/β_ii = 1) and (P_C)_D = (P_C)_{D→E} = (P_C)_JDE(β_ij/β_ii = 1). As a distance measure, a smaller ρ value indicates better performance. Therefore we conclude that decision then estimation is indeed better than estimation then decision in terms of JDE performance. By checking the ρ values of the proposed JDE solution listed in the table, we can see clearly that the JDE solution with β_ij/β_ii = 0 has the smallest ρ value, meaning that overall it outperforms the two existing strategies, although its decision performance is worse than that of the tracking-before-classification strategy. This weight choice agrees with our intuition that β_01, β_10 should be smaller than β_00, β_11.

B. Scenario 2: Data generated from H_1

In this case, the simulated data were generated from H_1 (moving vehicles). The results are listed in Table II. The weights of the JDE cost are the same as those in Scenario 1. For the tracking errors, RMSE_ideal = 0.1280; RMSE_E = RMSE_{E→D} = 0.1470; and RMSE_{D→E} = 0.1493. This time, decision (classification) does not help estimation (tracking). For the classification results, for classification without tracking or classification before tracking, P_C = 0.884; for tracking before classification, P_C = 0.882. Estimation (tracking) does not help decision (classification), either. This indicates that in this case the conventional strategies cannot exploit the coupling between decision and estimation to improve the inference results.

By checking the RMSE and P_C values in Table II, we found that the JDE solution with β_ij/β_ii = 0 outperforms decision-then-estimation and estimation-then-decision in decision and estimation performance concurrently. Then we checked the comprehensive performance index ρ: for classification before tracking, ρ_{D→E} = 0.8122 × 10^{-2}; for tracking before classification, ρ_{E→D} = 0.8740 × 10^{-2}. By comparing the corresponding ρ values of the JDE solution listed in the table, we can also draw the same conclusion that the JDE solution with β_ij/β_ii = 0 is better than the two existing strategies in terms of the overall performance.
TABLE I
SIMULATION RESULTS IN JDE SOLUTIONS (TRUTH IS H_0)

β_ij/β_ii   RMSE     P_C     ρ (×10^-2)
0           0.1864   0.886   0.8327
10^-3       0.1865   0.886   0.8331
0.01        0.1877   0.886   0.8401
0.1         0.2042   0.872   0.8979
0.5         0.2386   0.826   1.0700
1           0.2532   0.798   1.1873
2           0.2680   0.790   1.3114
10          0.3030   0.786   1.5671
∞           0.3210   0.782   1.7051
TABLE II
SIMULATION RESULTS IN JDE SOLUTIONS (TRUTH IS H_1)

β_ij/β_ii   RMSE     P_C     ρ (×10^-2)
0           0.1372   0.904   0.7809
10^-3       0.1372   0.902   0.7809
0.01        0.1368   0.904   0.7810
0.1         0.1365   0.900   0.7950
0.5         0.1415   0.890   0.8391
1           0.1470   0.884   0.8872
2           0.1543   0.880   0.9496
10          0.1769   0.862   1.1166
∞           0.3311   0.856   1.9436
V. CONCLUSIONS AND DISCUSSION

In this paper, we applied the newly proposed JDE framework to a joint target tracking and classification problem. The performance of the JDE solution was compared with that of two other existing strategies using both conventional metrics and a joint performance index proposed by us. With an appropriate weight choice for the JDE cost, the JDE solution outperforms the other two strategies in the overall evaluation. Note that the weight design of the JDE costs is problem dependent and subject to change based on the user's preference. For instance, if we want to pay more attention to the estimation part, we should increase the values of the β_ij's (or equivalently decrease the α_ij's) to achieve better estimation performance, and vice versa. For simplicity, the idea was illustrated by batch processing. The solution to the dynamic case is under investigation.

REFERENCES

[1] D. Angelova and L. Mihaylova. Sequential Monte Carlo Algorithms for Joint Target Tracking and Classification Using Kinematic Radar Information. In Proc. 2004 International Conf. on Information Fusion, volume II, pages 709–716, Stockholm, Sweden, June 28–July 1, 2004.
[2] B. Baygun and A. O. Hero. Optimal Simultaneous Detection and Estimation Under a False Alarm Constraint. IEEE Trans. Information Theory, IT-41(3):688–703, May 1995.
[3] R. E. Bethel. Joint Detection and Estimation in a Multiple Signal Array Processing Environment. PhD thesis, George Mason University, Fairfax, VA, USA, 2002.
[4] Y. Boers and H. Driessen. Integrated Tracking and Classification: An Application of Hybrid State Estimation. In Proc. 2001 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4473, pages 198–209, 2001.
[5] G. Casella and R. L. Berger. Statistical Inference. Duxbury Press, Belmont, CA, 1990.
[6] S. Challa and G. W. Pulford. Joint Target Tracking and Classification Using Radar and ESM Sensors. IEEE Trans. Aerospace and Electronic Systems, 37(3):1039–1055, 2001.
[7] H. Chen, X. R. Li, and Y. Bar-Shalom. On Joint Track Initiation and Parameter Estimation under Measurement Origin Uncertainty. IEEE Trans. Aerospace and Electronic Systems, 40(2):675–694, Apr. 2004.
[8] N. Gordon, S. Maskell, and T. Kirubarajan. Efficient Particle Filters for Joint Tracking and Classification. In Proc. 2002 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 4728, pages 439–449, Orlando, FL, USA, April 2002.
[9] S. Herman and P. Moulin. A Particle Filtering Approach to FM-Band Passive Radar Tracking and Automatic Target Recognition. In Proc. IEEE Aerospace Conf., volume 4, pages 1789–1808, 2002.
[10] S. M. Herman. A Particle Filtering Approach to Joint Passive Radar Tracking and Target Classification. PhD thesis, Univ. of Illinois at Urbana-Champaign, 2002.
[11] D. Khosla and Y. Chen. Joint Kinematic and Feature Tracking Using Probabilistic Argumentation. In Proc. 2003 International Conf. on Information Fusion, pages 11–16, Cairns, Australia, July 2003.
[12] M. Krieg. Joint Multi-sensor Kinematic and Attribute Tracking Using Bayesian Belief Network. In Proc. 2003 International Conf. on Information Fusion, pages 17–24, Cairns, Australia, July 2003.
[13] A. D. Lanterman. Tracking and Recognition of Airborne Targets via Commercial Television and FM Radio Signals. In Proc. SPIE Conf. on Acquisition, Tracking, and Pointing XIII, volume 3692, pages 189–198, Orlando, FL, 1999.
[14] X. R. Li. Optimal Bayes Joint Decision and Estimation. In Proc. 2007 International Conf. on Information Fusion, Québec City, Canada, July 2007.
[15] W. Mei, G.-L. Shan, and X. R. Li. An Efficient Bayesian Algorithm for Joint Target Tracking and Classification. In Proc. 7th International Conf. on Signal Processing, Beijing, China, Aug. 31–Sept. 4, 2004.
[16] M. I. Miller, A. Srivastava, and U. Grenander. Conditional-Mean Estimation Via Jump-Diffusion Process in Multiple Target Tracking/Recognition. IEEE Trans. Signal Processing, 43(11):2678–2690, 1995.
[17] J. A. O'Sullivan, S. P. Jacobs, M. I. Miller, and D. L. Snyder. A Likelihood-Based Approach to Joint Target Tracking and Identification. In Proc. 27th Asilomar Conf. on Signals, Systems and Computers, pages 290–294, Nov. 1993.
[18] S. P. Jacobs and J. A. O'Sullivan. High Resolution Radar Models for Joint Target Tracking and Recognition. In IEEE International Radar Conference, Syracuse, NY, USA, May 1997.
[19] P. Smets and B. Ristic. Kalman Filter and Joint Tracking and Classification in the TBM Framework. In Proc. 7th International Conf. on Information Fusion, volume I, pages 46–53, June 28–July 1, 2004.
[20] B. Vo and W.-K. Ma. Joint Detection and Tracking of Multiple Maneuvering Targets in Clutter Using Random Finite Sets. In Proc. 2004 Control, Automation, Robotics and Vision Conference, volume 2, pages 1485–1490, Dec. 2004.