
Asymptotic Efficiency of the PHD in Multitarget/Multisensor Estimation

Paolo Braca,∗ Stefano Marano,† Vincenzo Matta† and Peter Willett‡

Abstract—Tracking an unknown number of objects is challenging, and often requires looking beyond classical statistical tools. When many sensors are available, the estimation accuracy can reasonably be expected to improve, but there is a concomitant rise in the complexity of the inference task. Nowadays, several practical algorithms are available for multitarget/multisensor estimation and tracking. In terms of current research activity one of the most popular is the probability hypothesis density, commonly referred to as the PHD, in which the goal is estimation of object locations (unlabeled estimation) without concern for object identity (which is which). While it is relatively well understood in terms of its implementation, little is known about its performance and ultimate limits. This paper is focused on the characterization of the PHD estimation performance for the static multitarget case, in the limiting regime where the number of sensors goes to infinity. It is found that the PHD asymptotically behaves as a mixture of Gaussian components, whose number is the true number of targets, and whose peaks collapse in the neighborhood of the classical maximum likelihood estimates, with a spread ruled by the Fisher information. Similar findings are obtained with reference to a naïve, two-step algorithm which first detects the number of targets, and then estimates their positions.

Index Terms—Unlabeled multi-object estimation, random finite sets, RFS, probability hypothesis density, PHD, multiple sensors.

I. INTRODUCTION

ESTIMATION and tracking of multiple objects, from a set of measurements gathered at multiple remote units, is an emerging problem of relevance to a number of engineering areas [1]. New challenges arise in designing and characterizing the inference algorithms, due to the very distinct features of such a MultiTarget-MultiSensor (MTMS) problem. As regards "multitarget," the main differences with respect to classical estimation theory are related to the unknown number of objects to be estimated, and to the fact that they are unlabeled. As regards "multisensor," we often have unlabeled reports, i.e., we don't know which target is observed via which measurement. Moreover, the complexity of the algorithms increases due to the need to track multiple objects, such that it is usually not easy to find efficient implementations with a relatively large number of sensors. In the last few years, great progress has been made at both the theoretical and practical levels. From a theoretical standpoint, many solution paradigms have emerged, among which we mention the Multiple Hypotheses Tracking (the MHT) [2],

P. Willett was supported by the U.S. Office of Naval Research under Grants N00014-09-10613 and N00014-10-10412.
∗ NATO STO Centre for Maritime Research and Experimentation, La Spezia, Italy, Email: [email protected].
† University of Salerno, Fisciano, Italy, Email: {marano/vmatta}@unisa.it.
‡ University of Connecticut, Storrs CT, Email: [email protected].

[3], Joint Probabilistic Data Association (the JPDAF) [3], [4], and the comprehensive methodological approach for multiobject stochastic systems provided by Point Process (PP) theory [5], [6], with specific emphasis on the Random Finite Set (RFS) formulation [7]–[10]. Despite the availability of many working solutions, less is known about their performance bounds or about the fundamental limits of inference for MTMS problems. It is important to recognize that the Bayesian Cramér-Rao lower bound (BCRLB) [11], [12] limits achievable estimation performance, and in the case of multiple objects – but labeled ones – the computation of the Fisher information matrix (FIM) that governs the BCRLB simply requires the appropriate joint terms. Even in the case (our case) of unlabeled measurements, often called the data association problem, there is a variant of the BCRLB, one that incorporates an information reduction factor (IRF) (e.g., [12], [13]). However, for unlabeled objects of unknown cardinality we are aware of no results. That is our interest here.

An increasing role in multitarget tracking is currently being played by the Probability Hypothesis Density (PHD) filter [14], which is roughly interpreted as a target-occupancy surface whose peaks should indicate target positions. It is attracting increasing attention, motivating different derivations, interpretations and implementations [15]–[17], extensions to the Cardinalized PHD (CPHD) [18], multisensor versions [19], [20], and so on. One main contribution of this paper is the asymptotic characterization of the PHD in the limiting regime where the number of sensors goes to infinity. As a reference, we consider the clairvoyant Maximum Likelihood (ML) estimator – which knows in advance the true number of targets – and its Fisher information. We start by studying an embarrassingly naïve target state estimator we will refer to as FIDES (FIrst Detect then EStimate).
The FIDES strategy basically amounts to implementing a two-stage maximum likelihood rule, which first detects the number of targets, and then estimates their positions. We are able to show that this estimator is asymptotically equivalent to the clairvoyant ML estimator: it is asymptotically efficient, with performance in the many-sensor case predicted by, rather than merely bounded by, the Fisher terms. We then prove that the PHD function behaves asymptotically as a mixture of as many Gaussian components as the true number of targets. The components are centered on the maximum likelihood estimates of the target positions, and have variances which shrink to zero as the number of sensors goes to infinity. Loosely speaking, the PHD becomes progressively narrower and peakier around the true target states, in accordance with the common belief that the PHD is a reasonable candidate


for multitarget estimation. But more important, we show that the way its mass becomes concentrated is ruled by the familiar Fisher information, and that, relative to the clairvoyant scheme, the PHD is asymptotically efficient.

In this paper, we focus on the static case (no tracking issues considered) – which is relevant in many different contexts, such as track initialization in dynamic problems, tracking of multiple objects when the scan frequency is substantially lower than the intrinsic system dynamics (e.g., tracking flies), and so on – but this is really a first attempt that, at least for now, avoids the complexity of the dynamic estimation problem. Particularly relevant is the multisensor static case, arising, for instance, in the single-scan multistatic scenario [21], [22], in sensor networks, or in multiple radar systems, since, as the number of sensors increases, accurate estimation is achieved even with a single scan. Our analysis accordingly focuses on a genuine multisensor concept, where the information streams gathered from multiple sensors are fused in parallel, which in principle differs from single-sensor filtering of multiple time scans.

The paper, which extends the results of [23], is organized as follows. In Sect. II we pose and formalize the estimation problem considered. Section III is devoted to the FIDES strategy, while in Sect. IV we address the asymptotic results about the PHD. Numerical experiments are presented in Sect. V.

II. THE ESTIMATION PROBLEM

In this section we introduce the model and define a proper estimation setting. It is important to stress that the multitarget/multisensor problem considered is well-known to be amenable to different formalizations. In the following, we opt for simplicity, perhaps paid in the coin of generality.
Since our goal is to show how some of the most popular MTMS inference algorithms relate to classical and fundamental quantities (such as, e.g., Fisher information), we feel that this choice helps to highlight the physical meaning of the results. On the other hand, it is certainly of interest to generalize these results to different observation and/or target models, for which a more general formalism might be necessary; this is left for future work. Nonetheless, we felt it appropriate to offer some insights and references as starting points for such generalization.

A. The measurements

In a surveillance region Θ, there are k targets, whose locations are given by the vector θ^k = [θ_1, . . . , θ_k], θ_i ∈ Θ. For simplicity and without loss of generality, the surveillance region is assumed to be one-dimensional, Θ ⊂ R, such that θ_i is the position of the i-th target on the real line. Consider a system made of N sensors sending their raw observations to a fusion center. Based on the received data, a fusion strategy is adopted in order to estimate the number of targets and their positions inside the screen. We wish to avoid confusion: here θ^k is the deterministic parameter to be estimated, and the focus is on the asymptotically achievable performance when the number of sensors N goes to infinity.

For a given θ^k, let us denote by z^m = [z_1, . . . , z_m] the random data vector collected by a sensor, whose size m (i.e., the number of measurements) is random as well. The corresponding density of z^m, parametrized by θ^k, is

    γ(z^m; θ^k) = { p(m; θ^k) f_m(z^m; θ^k),   m > 0,
                  { p(0; θ^k),                 m = 0,                    (1)

with

    γ(z^0; θ^k) + Σ_{m=1}^{∞} ∫ γ(z^m; θ^k) dz^m = 1,

where z^0 and θ^0 represent the absence of measurements and targets, respectively. Throughout the paper, when the integration domain is not specified, it is understood to correspond to the entire space of interest, e.g., in the last equation, to R^m. In the above model, the physical process generating the observations can be regarded as resulting from the following two-step procedure: first, a number of data m is generated according to the discrete probability distribution p(m; θ^k); then, the measurements are generated according to f_m(z^m; θ^k), a probability density function (pdf) on R^m. Following the formalization proposed in [5], [6], given the underlying probability space (Ω, F, P), we use as state space the countable union X = ∪_{m=0}^{∞} R^m, where R^0 means an isolated point representing the lack-of-measurements event, i.e., m = 0. The random variable z^m is accordingly defined in terms of a measurable mapping Ω → X. In addition, the observations are statistically independent and identically distributed (iid) across sensors, so that the probability density of the whole data set is

    γ(Z_N; θ^k) = Π_{n=1}^{N} γ(z^{m_n}_n; θ^k),                         (2)

where Z_N is the corresponding aggregate of measurements z^{m_1}_1, . . . , z^{m_N}_N. For the sake of concreteness, we shall refer to a classical observational model, the so-called Measurement-Origin Uncertainty (MOU) model [3] (see details in Appendix A), which incorporates the essential features of a MTMS problem.

B. Target (un-)labeling and ML (un-)identifiability

A fundamental remark must be made before formulating any estimation problem. Targets are physical objects and, as such, they can be labeled. On the other hand, in the context of this manuscript they are not directly observed, and, in many typical situations, the measurement model is insensitive to target switching, namely, having the pair of targets (1, 2) in positions (a, b) is indistinguishable from having them in positions (b, a). Reflecting this, the density γ(z^m; θ^k) is invariant to any index permutation of θ^k, k > 0. This is typically regarded as a pathology in the standard inference literature, in the sense that it leads to a non-identifiable estimation problem [24]. Assuming that the target positions are distinct, in the statistics literature this phenomenon is commonly referred to as label switching, see, e.g., [25] and references therein. Briefly, the label switching issue can be dealt

with as follows. First, let ξ^k = [ξ_1, . . . , ξ_k] = [θ_{ν(1)}, . . . , θ_{ν(k)}] be the ordered vector of the target positions,¹ where

    θ_{ν(1)} < · · · < θ_{ν(k)}.                                         (3)

The appropriate procedure is to estimate the vector of ordered target positions ξ^k, with the interpretation that ξ_i is not the position of the target with label i, but rather the i-th among the overall k targets according to the ordering rule (3). This clearly removes the non-identifiability pathology described. We note in passing that this viewpoint is consistent with the idea of estimating a target state represented by a set, instead of a vector, see, e.g., [7], [8]. Due to the (target) permutation invariance, the distribution of the observed data, for a given ordered vector ξ^k, equals the distribution of the observed data given any scrambled version thereof. Accordingly, the corresponding density is still denoted by γ(Z_N; ξ^k).

A pictorial illustration of the label switching problem, and the way of circumventing it, is given in Fig. 1. In panel (a) we display the likelihood function γ(Z_N; θ^k), with k = 2, as a function of (θ_1, θ_2), which corresponds to the original, unordered target positions. In panel (b) we display the likelihood function γ(Z_N; ξ^k), with k = 2, as a function of (ξ_1, ξ_2), which corresponds to the ordered target positions. As can be seen, the former is symmetric under vector permutations, yielding an ambiguity in the estimation problem. The latter works instead in the restricted parameter space of ordered targets, making the problem of estimating the ordered target positions meaningful.

[Figure 1 omitted.] Fig. 1. Panel (a): aggregate likelihood of the target positions θ^k = [θ_1, θ_2], k = 2. Panel (b): aggregate likelihood of the ordered target positions ξ^k = [ξ_1, ξ_2], k = 2. The true target locations are [0, −2].

C. Ideal performance: clairvoyant ML and Fisher Information

Before studying the asymptotic properties of the commonly employed MTMS estimators, it is clearly desirable to have some benchmark performance. From a statistical engineering perspective, it would be even more comfortable to deal with such classical and fundamental quantities as Fisher information. With a slight abuse of notation, we shall denote the true state of the (ordered) targets as

    ξ_0 = [ξ_{01}, . . . , ξ_{0k_0}],   k_0 > 0,                         (4)

in place of ξ_0^{k_0}, with the understanding that the cardinality of ξ_0 is k_0. By P_0[·] and E_0[·] we denote the probability and expectation operators, respectively, evaluated under the true model γ(Z_N; ξ_0). In the following we shall be concerned with convergence of random variables as N goes to infinity. The symbols ⇝ and →^{P_0} will represent weak convergence and convergence in probability [24], respectively, under γ(Z_N; ξ_0).

To start with, we refer to a Clairvoyant Maximum Likelihood (C-ML) estimator which knows the number of targets. The C-ML estimator is defined as

    ξ̂^{k_0} = arg max_{ξ^{k_0}} γ(Z_N; ξ^{k_0}),                        (5)

where the search is restricted to the ordered vectors ξ^{k_0}. The corresponding per-sensor Fisher Information Matrix (FIM) will be denoted by J_0:

    {J_0}_{ij} = − E_0 [ ∂² log γ(z^m; ξ_0) / (∂ξ_{0i} ∂ξ_{0j}) ],      (6)

in which ξ_{0i} is the i-th entry of the ordered vector ξ_0 – note that this is a joint FIM of the ordered ξ_0, and that, via γ (see (5)), this FIM accounts for MOU explicitly through the IRF (see Appendix A). We are now in the position to state the asymptotic normality property of the C-ML estimator. Throughout the paper, we suppose that the classical regularity assumptions needed for these asymptotics to hold are valid, see, e.g., [26]–[28].

PROPERTY 1 (Asymptotic normality of C-ML) Let ξ_0 be the true parameter vector, with k_0 > 0. The C-ML defined in eq. (5) satisfies

    √N ( ξ̂^{k_0} − ξ_0 ) ⇝ N(0, J_0^{−1}).                             (7)  ⋄

Note that, in the above, the estimator depends upon N. To avoid cumbersome notation, that dependence is omitted.

III. FIDES STRATEGY: FIRST DETECT THEN ESTIMATE

The clairvoyant maximum likelihood estimator is clearly an idealization. Any sensible inference strategy must unavoidably face the unknown number of targets. The most naïve solution is perhaps a separate, two-stage inference procedure that first detects the number of targets, and then computes an estimate of their positions. As a matter of fact, this separate philosophy has already been applied, see, e.g., the GMAP-I estimator proposed in [7]. We note also that separate estimation of target cardinality and location is a current research topic even from a multi-hypothesis tracker (MHT) perspective, see, for instance, the Kalman-based cardinality estimator in [29]. In the following, we exploit the two-stage concept to design a specific strategy, referred to as FIDES, which is genuinely non-Bayesian in both stages, and whose detection step may become particularly appealing from an implementation point of view.

¹ We would like to stress here that other ordering rules can be arbitrarily chosen, without affecting the whole reasoning and results. This is especially relevant when moving to higher dimensions, where the choice of a particular ordering criterion may be less immediate.
It will be shown that the FIDES strategy achieves the same asymptotic performance as the clairvoyant ML estimator introduced in Sect. II-C. In order to state and prove this claim, we first formally introduce the FIDES estimator, and then the


performance error figure under consideration. The two-stage FIDES inference procedure is now detailed. In the following, we assume that no more than kmax targets are allowed.

Detection stage. The first step is to define a detection statistic for the k_max + 1 possible hypotheses k = 0, 1, . . . , k_max. In general, a sufficient statistic for detection should rely upon the whole data set Z_N, but, thanks to the specific features of our problem, this step can be substantially lightened by using only the cardinalities of the measurements, namely, [m_1, . . . , m_N]. In fact, according to the MOU model detailed in the appendix, we have (P_D is the detection probability, and μ(·) the discrete probability distribution of the number of false alarms):

    p(m_n; θ^k) = Σ_{d=0}^{k} (k choose d) P_D^d (1 − P_D)^{k−d} μ(m_n − d),     (8)

which tells us that the distribution of m_n does not depend upon the target positions, but only upon their number k; it will accordingly be denoted by p(m_n; k). The ML detector based upon this coarser information is

    k̂ = arg max_{0 ≤ k ≤ k_max} Π_{n=1}^{N} p(m_n; k).                  (9)

For any fixed value of k_0, the error probability P_0[k̂ ≠ k_0] of the above scheme goes exponentially fast to zero, see, e.g., [30]. In order to gain further computational savings (to be paid in performance), a sample-mean detector will also be considered in this work, that is

    k̂ = min{ ⌊ (1/P_D) ( (1/N) Σ_{n=1}^{N} m_n − λ ) ⌉⁺ , k_max },      (10)

where a⁺ = max{a, 0}, and ⌊a⌉ is the nearest-integer rounding function. Since E_0[m_n] = P_D k_0 + λ, asymptotic consistency of this detection rule follows straightforwardly from the weak law of large numbers,

    (1/N) Σ_{n=1}^{N} m_n →^{P_0} E_0[m_n]   ⇔   k̂ →^{P_0} k_0,         (11)

and the continuity at integers of the rounding function [24]. The above convergence immediately implies that the error probabilities vanish asymptotically. In addition, they exhibit an exponential decay rate provided that the moment generating function of m_n exists and obeys mild regularity conditions, see, e.g., [31]. For Poisson distributed clutter, a typical assumption in MOU, these conditions are straightforwardly met.

Estimation stage. The second step is to define an estimation procedure that uses the cardinality inference made at the first stage, namely, the estimated number of targets k̂. A simple and direct procedure is to use k̂ instead of k_0 in (5), namely, for any k̂ > 0:

    ξ̂^{k̂} = arg max_{ξ^{k̂}} γ(Z_N; ξ^{k̂}),                           (12)

the search being restricted to the ordered vectors ξ^{k̂}.

The FIDES inference strategy can be summarized by the following algorithm:

FIDES INFERENCE STRATEGY
• Step i: Take the data cardinalities [m_1, . . . , m_N] and compute k̂ using either (9) or (10).
• Step ii: Take the data Z_N and run the ML (12) with the k̂ obtained at Step i.
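The sample-mean detector (10) is simple enough to sketch in a few lines. The following toy simulation uses assumed illustrative values for P_D, the clutter rate λ, and the true cardinality k_0 (none of them from the paper), generates per-sensor measurement counts as binomial detections plus Poisson clutter, and applies rule (10):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_cardinalities(k0, pd, lam, n_sensors, rng):
    """Per-sensor measurement counts m_n under an MOU-like model:
    each of the k0 targets is detected with probability pd, and a
    Poisson(lam) number of false alarms is added."""
    detections = rng.binomial(k0, pd, size=n_sensors)
    clutter = rng.poisson(lam, size=n_sensors)
    return detections + clutter

def sample_mean_detector(m, pd, lam, k_max):
    """Sample-mean cardinality detector of eq. (10):
    k_hat = min{ round((mean(m) - lam) / pd)^+, k_max }."""
    k_hat = int(np.rint((np.mean(m) - lam) / pd))
    return min(max(k_hat, 0), k_max)

k0, pd, lam = 3, 0.9, 2.0          # assumed illustrative values
m = simulate_cardinalities(k0, pd, lam, n_sensors=500, rng=rng)
k_hat = sample_mean_detector(m, pd, lam, k_max=10)
```

Since E_0[m_n] = P_D k_0 + λ, the estimate concentrates on k_0 as the number of sensors grows, in line with (11).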

A. Performance figures

The definition of an error measure accounting for the uncertainty on the number of targets is less immediate than in classical estimation problems. The error measure adopted in this work is as follows. Recall that we are interested in the estimation performance as N gets large, for a fixed true target state ξ_0. Accordingly, when the true state is k_0 = 0, we should measure a mere detection error. The first-stage detection procedures employed ensure an exponential decay of the error probability. In this paper, we content ourselves with this result (we have no interest in optimizing the error exponent), the attention being instead turned to the more challenging situation of k_0 > 0, which requires estimating the entries of ξ_0, namely, the target positions.

Let us accordingly fix k_0 > 0. Without loss of generality, assume a centered surveillance region (−V/2, V/2). When the estimated cardinality differs from the true value k_0, a pessimistic error e^{k_0}_max = [V, . . . , V] is assumed, corresponding to the worst case where the error equals the gate size V in each component of the vector. On the other hand, when the true cardinality has been correctly estimated, k̂ = k_0, the error is measured by the classical difference between the estimated (ordered) target positions and the true ones. Accordingly,

    e^{k_0} = { ξ̂^{k̂} − ξ_0,    if k̂ = k_0,
              { e^{k_0}_max,     if k̂ ≠ k_0,                            (13)

defines the error vector.

Alternative definitions of the error can certainly be adopted. One which has attracted recent interest in the context of multi-object systems is the so-called OSPA (Optimal SubPattern Assignment) metric [32]. Let us start by considering the case k̂ ≥ k_0. The OSPA metric first computes the possible (sub)vectors made of k_0 elements of ξ̂^{k̂}, and then selects the one lying at minimum distance from ξ_0. In addition, a penalty term is added to take into account an error on the cardinality. Formally, for k̂ ≥ k_0:

    d_OSPA = [ (1/k̂) ( min_π Σ_{i=1}^{k_0} ( ξ̂_{π(i)} − ξ_{0i} )² + V² (k̂ − k_0) ) ]^{1/2},     (14)

where the minimization runs over all the possible permutations². The opposite case k_0 > k̂ is easily managed by exchanging in the above formula k̂ with k_0, and ξ̂^{k̂} with ξ_0. To be precise, when k̂ = k_0 = 0, d_OSPA = 0. We shall show in the sequel that, in the asymptotic regime of large N, the (Euclidean) norm of the error ||e^{k_0}|| and the OSPA metric become essentially equivalent, up to a constant factor.
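The OSPA-type metric (14) can be sketched as follows; the brute-force search over permutations is meant only for small cardinalities, and, as in the text, the cut-off is set to the gate size V:

```python
from itertools import permutations
import numpy as np

def ospa(est, truth, V):
    """OSPA-style metric of eq. (14), with the cut-off set to the gate
    size V. est/truth are 1-D arrays of target positions. Brute-force
    over permutations: suitable only for small cardinalities."""
    est, truth = np.asarray(est, float), np.asarray(truth, float)
    if len(est) < len(truth):        # case k0 > k_hat: exchange the roles
        est, truth = truth, est
    k_big, k_small = len(est), len(truth)
    if k_big == 0:                   # k_hat = k0 = 0
        return 0.0
    # minimum squared distance over all ordered k_small-subvectors of est
    best = min(np.sum((est[list(p)] - truth) ** 2)
               for p in permutations(range(k_big), k_small)) if k_small else 0.0
    return float(np.sqrt((best + V ** 2 * (k_big - k_small)) / k_big))
```

For instance, `ospa([0.0], [0.0, -2.0], 10.0)` pays the V²|k̂ − k_0| cardinality penalty even though the surviving component matches exactly.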

We now prove the asymptotic efficiency of FIDES.

THEOREM 1 (Asymptotic properties of the FIDES strategy) Let ξ_0 be the true parameter vector, with k_0 > 0. Assuming that Property 1 holds, the FIDES strategy is asymptotically equivalent to the C-ML:

    √N e^{k_0} ⇝ N(0, J_0^{−1}).                                        (15)

Proof. When k̂ = k_0, we have ξ̂^{k̂} = ξ̂^{k_0}; otherwise the maximum error e^{k_0}_max is made. The error term √N e^{k_0} can accordingly be recast as

    √N e^{k_0}_max I(k̂ ≠ k_0) + √N ( ξ̂^{k_0} − ξ_0 ) I(k̂ = k_0),      (16)

where I(·) is the indicator function. Let us focus on the first term. For all ε > 0 we have

    P_0[ √N || e^{k_0}_max I(k̂ ≠ k_0) || > ε ] ≤ P_0[k̂ ≠ k_0].         (17)

Recalling that the selected detector is consistent, we have P_0[k̂ ≠ k_0] → 0 as N goes to infinity, such that the first term in the sum (16) vanishes in probability. By Property 1, the second term √N (ξ̂^{k_0} − ξ_0) converges in distribution to a zero-mean multivariate Gaussian with covariance matrix given by the inverse per-sample FIM. The claim now follows by direct application of Slutsky's theorem [28]. •

From the perspective of judging the quality of the proposed estimator, it is of interest to investigate the behavior of its error covariance matrix. In particular, this will help in understanding the connections with the C-ML and the impact of the unknown number of targets. Straight from (13), we can accordingly define the error covariance matrices

    R = E_0[ e^{k_0} (e^{k_0})^T ]                                       (18)

for our system, and

    R_c = E_0[ ( ξ̂^{k_0} − ξ_0 )( ξ̂^{k_0} − ξ_0 )^T ]                  (19)

for the clairvoyant. Now, it holds true that, for any vector a of cardinality k_0:

    a^T R a = E_0[ (a^T e^{k_0})² ]
            = P_0[k̂ ≠ k_0] (a^T e^{k_0}_max)² + P_0[k̂ = k_0] E_0[ (a^T (ξ̂^{k_0} − ξ_0))² | k̂ = k_0 ]
            ≤ P_0[k̂ ≠ k_0] (a^T e^{k_0}_max)² + a^T R_c a
            = a^T ( P_0[k̂ ≠ k_0] V² 11^T + R_c ) a,                     (20)

where 1 is the column vector made of all ones, and we exploit the definition of e^{k_0}_max. The above equation tells us that the matrix P_0[k̂ ≠ k_0] V² 11^T + R_c − R is positive semidefinite, that is

    P_0[k̂ ≠ k_0] V² 11^T + R_c − R ≥ 0.                                 (21)

Now, the error covariance matrix of the C-ML typically³ attains the Cramér-Rao Lower Bound (CRLB), with a scaling rate N^{−1}. In view of the exponential decay of the detection error probability P_0[k̂ ≠ k_0], this means that the asymptotic scaling of the error covariance matrix for the FIDES strategy is determined by that of the C-ML. In particular, the behavior of the system in terms of Mean Square Error (MSE) can be retrieved from eq. (21):

    E_0[ ||e^{k_0}||² ] ≤ ||e^{k_0}_max||² P_0[k̂ ≠ k_0] + E_0[ ||ξ̂^{k_0} − ξ_0||² ],     (22)
         (FIDES)                                              (C-ML)

The above findings are consistent with the simulation results to be seen later in Fig. 2, where the MSE is initially (low number of sensors) dominated by the term ||e^{k_0}_max||² P_0[k̂ ≠ k_0], and then converges to the asymptotic clairvoyant ML error.

Some remarks are now in order. In principle, detecting the number of targets and estimating their states are actually special cases of inference on multi-target states [7]. In this regard, the asymptotic equivalence shown between the two-stage FIDES strategy and the clairvoyant maximum likelihood offers new insights. The bottom line here is that, as N grows, the impact of the unknown number of targets disappears first, and the dominant remaining uncertainty is that ascribed to the target positions ξ^{k_0}. This can be interpreted in light of the difference between hypothesis testing and parameter estimation, as discussed in [34]. A discrete parameter space and an exponential error-scaling are the essential features of the former, to be contrasted with a continuous parameter space and the N^{−1} scaling law of the latter. This in the end explains the observed behavior.

² Actually, the general definition of the OSPA metric considers squared distances truncated at an arbitrary cut-off level. One possibility is clearly to set this cut-off equal to the gate size, and we directly opted for this.

³ In principle, Property 1 tells nothing about the behavior of R_c because, in general, convergence in distribution does not imply convergence of the moments. Extra conditions, such as uniform integrability, are required, see, e.g., [33, p. 52, eq. (1.74)]. On the other hand, these are very often met in practical problems.
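The two regimes discussed around eq. (22) and Fig. 2 can be illustrated by evaluating the two terms of the bound under purely hypothetical values for the gate size, the per-coordinate Fisher information, and the exponential decay rate of the detection-error probability (none of these numbers comes from the paper):

```python
import numpy as np

# Hypothetical illustrative values: gate size V, number of targets k0,
# per-coordinate per-sensor Fisher information j0, and an assumed
# exponential decay rate for P0[k_hat != k0].
V, k0, j0, rate = 10.0, 2, 4.0, 0.05

def mse_bound_terms(N):
    """The two terms on the right-hand side of eq. (22)."""
    p_err = np.exp(-rate * N)                 # P0[k_hat != k0], assumed decay
    detection_term = (V ** 2 * k0) * p_err    # ||e_max||^2 * P0[k_hat != k0]
    cml_term = k0 / (j0 * N)                  # trace(J0^{-1}) / N for the C-ML
    return detection_term, cml_term

for N in (10, 100, 1000):
    d, c = mse_bound_terms(N)
    print(N, d, c)
```

For small N the detection term dominates; as N grows it vanishes exponentially and the N^{-1} clairvoyant term takes over, which is the qualitative behavior described in the text.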


Before concluding this section, consider the system behavior in terms of the OSPA metric introduced in subsection III-A. For N large enough,

    ξ̂^{k̂} ≈ ξ̂^{k_0} ≈ ξ_0,

such that the permutation minimizing the summation in (14) is expected (asymptotically) to be the standard ordering⁴, just what we adopted in our estimation strategy. Moreover, we showed that P_0[k̂ ≠ k_0] converges to zero at an exponential rate, such that the cardinality-error term V² |k̂ − k_0| considered in (14) is expected to be asymptotically negligible, yielding

    k_0 d²_OSPA / ||e^{k_0}||² ≈ 1,                                      (23)

suggesting that, for sufficiently large N, the error norm ||e^{k_0}|| and the OSPA metric are basically equivalent, aside from a constant factor.

IV. ASYMPTOTIC EFFICIENCY OF THE PHD

As pointed out in [14], a convenient tool to address the estimation/tracking problem is the so-called probability hypothesis density, namely, a function D(θ|Z_N) whose integral over a given region corresponds to the expected number of targets in that region, namely:

    ∫_A D(θ|Z_N) dθ = E[no. of targets in A | Z_N],                      (24)

where the expectation is computed under the posterior⁵ distribution of the target state, given the available observations Z_N. Different estimators can in principle be conceived by exploiting this property. One of the most accessible amounts to finding the PHD peaks and, given that they represent sufficient mass, using their locations as estimates of the target positions. The main result we present lends support to this common practice: as a matter of fact, in the limit of a large number of sensors, the PHD behaves like a mixture of Gaussians, where i) the number of components equals the true number of underlying targets; ii) each Gaussian component is centered on the clairvoyant ML estimates; iii) each component has a spread ruled by the FIM of the clairvoyant system. In formulas:

    D(θ|Z_N) ≈ Σ_{i=1}^{k_0} √( N / (2π σ_i²) ) exp{ −N (θ − ξ̂_i)² / (2σ_i²) },     (25)

where the variances of the Gaussian terms are defined in terms of the FIM as

    σ_i² = [ J_0^{−1} ]_{ii}.                                            (26)

A preliminary, pictorial view of the above behavior can be gained by looking at Figs. 5 and 6, which are discussed in detail in Sect. V. The way the PHD can be used for inference purposes can be summarized by the following algorithm:

⁴ Actually, in the one-dimensional scenario considered, if k̂ = k_0, the optimal permutation is always the standard ordering.
⁵ Up to now, we have not treated the target state as a random quantity, such that the reader might be confused by the adoption of a posterior distribution. This will be clarified soon, see Sect. IV-A.

PHD INFERENCE STRATEGY
• Step i: Take the data Z_N and compute D(θ|Z_N).
• Step ii: Evaluate numerically k̂ = ∫ D(θ|Z_N) dθ.
• Step iii: Find the locations of the k̂ highest peaks.
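The Gaussian-mixture behavior in (25)-(26) and the three-step peak-extraction strategy can be sketched numerically. All numerical values below (the estimates ξ̂_i, the variances σ_i², the sensor count N) are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Illustrative assumed values: two targets, C-ML estimates xi_hat, and
# per-sensor Fisher variances sigma2[i] = [J0^{-1}]_{ii}.
N = 200
xi_hat = np.array([0.0, -2.0])
sigma2 = np.array([0.5, 0.8])

theta = np.linspace(-5.0, 5.0, 4001)
dtheta = theta[1] - theta[0]

# Asymptotic PHD of eq. (25): Gaussians centered on the C-ML estimates,
# each with variance sigma_i^2 / N and unit mass.
D = sum(np.sqrt(N / (2 * np.pi * s2)) * np.exp(-N * (theta - x) ** 2 / (2 * s2))
        for x, s2 in zip(xi_hat, sigma2))

# Step ii: estimated cardinality = integral of the PHD, cf. eq. (24).
k_hat = int(np.rint(np.sum(D) * dtheta))

# Step iii: locations of the k_hat highest local maxima.
interior = (D[1:-1] > D[:-2]) & (D[1:-1] > D[2:])
peaks, heights = theta[1:-1][interior], D[1:-1][interior]
estimates = np.sort(peaks[np.argsort(heights)[-k_hat:]])
```

Since each component has unit mass, the integral returns the number of mixture components, and the peak locations recover the assumed ML estimates.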

Clearly, other inference procedures can be devised starting from the PHD, and the above is just one possibility, which is commonly adopted, and provides in turn useful insights about the forthcoming results.

A. Target Priors and Posteriors

The PHD is inherently Bayesian, being indeed made of posterior densities. This notwithstanding, we are still focusing on the estimation problem as formalized in Sect. II, namely, that of deterministic parameter estimation. We wish to avoid confusion here. The standard way to perform an asymptotic study of Bayesian estimators [28], [35] is to define a suitable a-priori distribution on θ^k (which may either be grounded on sensible prior knowledge, or be artificially introduced), and to construct the corresponding Bayesian estimator. Its performance is then studied in the limit of a large number of samples, where the effect of the a-priori information is eventually "washed out." Let us accordingly define the a-priori density

    π(θ^k) = { p(k) f_k(θ^k),   k > 0,
             { p(0),            k = 0,                                   (27)

with

    π(θ^0) + Σ_{k=1}^{∞} ∫ π(θ^k) dθ^k = 1.

As usual, in the above p(k) is the a-priori distribution on the number of targets, and f_k(θ^k) is a pdf on R^k. Due to target unlabeling, the prior is invariant to any index permutation of θ^k. Furthermore, recalling that at most k_max targets are assumed, we have Σ_{k=0}^{k_max} p(k) = 1.

The posterior density of θ^k given the measurements Z_N is

    φ(θ^k|Z_N) = { p(k|Z_N) f_k(θ^k|Z_N),   k > 0,
                 { p(0|Z_N),                k = 0,                       (28)

with

    φ(θ^0|Z_N) + Σ_{k=1}^{∞} ∫ φ(θ^k|Z_N) dθ^k = 1.

In particular, for k > 0:

    f_k(θ^k|Z_N) = γ(Z_N; θ^k) f_k(θ^k) / ∫ γ(Z_N; θ^k) f_k(θ^k) dθ^k    (29)

is an ordinary posterior pdf on R^k.
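For a fixed cardinality, the posterior (29) can be approximated on a grid. A minimal sketch for an assumed single-target (k = 1) model with unit-variance Gaussian measurements and a flat prior, ignoring for simplicity missed detections and clutter:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy setup: a single target (k = 1) at theta0; each of the N
# sensors reports z_n = theta0 + unit-variance Gaussian noise.
theta0, N = 1.5, 50
z = theta0 + rng.standard_normal(N)

theta = np.linspace(-5.0, 5.0, 2001)       # grid over the surveillance region
dtheta = theta[1] - theta[0]
log_lik = -0.5 * np.sum((z[:, None] - theta[None, :]) ** 2, axis=0)
prior = np.full_like(theta, 1.0 / 10.0)    # flat f_k on (-5, 5)

# Posterior of eq. (29): likelihood times prior, normalized on the grid.
post = np.exp(log_lik - log_lik.max()) * prior
post /= np.sum(post) * dtheta

map_est = theta[np.argmax(post)]           # here: (grid-snapped) sample mean
```

As N grows the posterior mass concentrates in an O(1/√N) neighborhood of the ML estimate, which is the content of Property 2 below for the marginal posteriors.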


Priors and posteriors on $\xi^k$ are obtained by classical order statistics in terms of the distributions of the unordered vectors. In particular, we define

$$\tilde{f}_k(\xi^k | Z_N) = \begin{cases} k!\, f_k(\xi^k | Z_N) & \xi_1 < \cdots < \xi_k, \\ 0 & \text{elsewhere.} \end{cases} \qquad (30)$$

For later use, we also introduce the corresponding $i$-th marginal posterior, denoted by

$$\tilde{f}_{k,i}(\xi | Z_N) = \int \tilde{f}_k(\xi^k | Z_N)\, d\xi^{k-i}, \qquad (31)$$

where $d\xi^{k-i}$ means integration with respect to all the components of $\xi^k$ aside from the $i$-th.

The posterior distribution exhibits some interesting asymptotic properties, which are key in determining the asymptotic behavior of the PHD. Intuition suggests that the posterior pdf, as $N$ increases, becomes more and more concentrated around the true value of the parameter. This concept has been investigated and made mathematically precise in different ways; see, e.g., [35]–[38]. The neat formulation employed in [37] turns out to be particularly convenient for our purposes, and leads to the following$^6$:

PROPERTY 2 (Asymptotic Normality of the Posterior). Let $\xi_0$ be the true parameter vector, with $k_0 > 0$ and $\sigma_i^2 = \big[(J_0^{k_0})^{-1}\big]_{ii}$, and let $\hat{\xi} = [\hat{\xi}_1, \ldots, \hat{\xi}_{k_0}]$ be the C-ML estimator. Then the $i$-th marginal posterior $\tilde{f}_{k_0,i}(\xi | Z_N)$ is asymptotically normal in the following sense:

$$\int_{\hat{\xi}_i + a\,\sigma_i/\sqrt{N}}^{\hat{\xi}_i + b\,\sigma_i/\sqrt{N}} \tilde{f}_{k_0,i}(\xi \,|\, Z_N)\, d\xi \;\xrightarrow{P_0}\; \frac{1}{\sqrt{2\pi}} \int_a^b \exp\!\left(-\frac{y^2}{2}\right) dy, \qquad (32)$$

where $a, b \in \mathbb{R}$, with $a < b$. $\diamond$

Some remarks about the interconnections between Properties 1 and 2 are now in order. While asymptotic normality of the ML estimator does not necessarily imply asymptotic normality of the posterior, nor vice versa [36], it is very common that both hold for a large class of problems of practical interest. In [37], a suite of sufficient conditions for the validity of both asymptotic properties is provided.

B. PHD Asymptotics

The PHD [14] was introduced and popularized in the context of Finite Set Statistics (FISST). Given the observed data $Z_N$, the probability hypothesis density is a function of a scalar variable $\theta$, defined as [8]:

$$D(\theta | Z_N) = \psi(\theta | Z_N) + \sum_{k=2}^{\infty} \frac{1}{(k-1)!} \int \psi(\{\theta, y_1, \ldots, y_{k-1}\} \,|\, Z_N)\, dy^{(k-1)}, \qquad (33)$$

$^6$The result in [37] is offered for the simplest case where the data are real random variables. The case of an arbitrary state space can be derived from the more general setup presented in [38, Chapter 10].
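Property 2 can be checked numerically in a toy setting. The sketch below uses scalar Gaussian measurements with known $\sigma$ (no clutter, no missed detections), a flat prior on a finite interval, and grid integration; the posterior mass of the shrinking window around the ML estimate is compared against the standard-normal mass of $(a, b)$, as in eq. (32). All parameter values are illustrative.

```python
# Toy numerical check of Property 2: scalar Gaussian measurements with
# known sigma, flat prior on (-2, 2), posterior by grid integration.
# The posterior mass of (xi_hat + a*sigma/sqrt(N), xi_hat + b*sigma/sqrt(N))
# is compared with the standard-normal mass of (a, b), as in eq. (32).
import math
import random

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

random.seed(2)
sigma, N, a, b = 1.0, 200, -1.5, 1.5
data = [random.gauss(0.0, sigma) for _ in range(N)]
xi_hat = sum(data) / N                     # ML estimate (sample mean)
S1 = sum(data)                             # sufficient statistic

STEP = 1e-3
grid = [i * STEP - 2.0 for i in range(4001)]
# log-likelihood via sufficient statistics, up to a theta-free constant
logs = [-(N * t * t - 2.0 * t * S1) / (2.0 * sigma ** 2) for t in grid]
m = max(logs)
w = [math.exp(l - m) for l in logs]
norm = sum(w) * STEP
lo = xi_hat + a * sigma / math.sqrt(N)
hi = xi_hat + b * sigma / math.sqrt(N)
mass = sum(wi for wi, t in zip(w, grid) if lo < t < hi) * STEP / norm
limit = std_normal_cdf(b) - std_normal_cdf(a)  # right-hand side of (32)
```

In this Gaussian toy model the posterior is itself (nearly) normal, so the two masses agree up to discretization error even at moderate $N$.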

where $\psi(\{\theta_1, \theta_2, \ldots, \theta_k\} | Z_N)$ is the multitarget posterior as defined in the FISST framework [8]. It is formally a function of a set $\{\theta_1, \theta_2, \ldots, \theta_k\}$, which can be equivalently regarded as a function of a vector-valued argument:

$$\psi(\{\theta_1, \theta_2, \ldots, \theta_k\} | Z_N) = \psi(\theta_1, \theta_2, \ldots, \theta_k | Z_N), \qquad (34)$$

with the proviso of being invariant to any permutation of its vector-valued argument, as a consequence of the lack of ordering in a set. In our context, the probabilistic description of the adopted target/data model corresponds to the following multitarget posterior [8]:

$$\psi(\{\theta_1, \ldots, \theta_k\} | Z_N) = k!\, \varphi(\theta^k | Z_N), \qquad (35)$$

$\varphi(\theta^k | Z_N)$ being the posterior density in eq. (28), such that eq. (33) can be rewritten, with a slight abuse of notation:

$$D(\theta | Z_N) = \varphi(\theta | Z_N) + \sum_{k=2}^{\infty} k \int \varphi(\tau^k | Z_N)\, d\theta^{k-1}, \qquad (36)$$

having defined, for $k \geq 2$:

$$\tau^k = [\theta, \theta_1, \ldots, \theta_{k-1}]. \qquad (37)$$
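The structure of eq. (36) can be illustrated with synthetic ingredients: given a cardinality posterior and per-target marginal posteriors, the PHD is their cardinality-weighted sum, and its total mass equals the posterior expected number of targets. The Gaussian marginals and the cardinality posterior below are made-up stand-ins, not derived from any data.

```python
# Synthetic illustration of eq. (36): with the cardinality posterior
# p(k | Z_N) and the per-target marginal posteriors in hand, the PHD is
# their cardinality-weighted sum; its total mass is the posterior
# expected number of targets. All densities below are illustrative.
import math

def npdf(x, mu, s):
    return math.exp(-(x - mu) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

p = {1: 0.2, 2: 0.8}                       # posterior on the number of targets
marginals = {1: [(-1.0, 0.3)],             # (mean, std) of each marginal posterior
             2: [(-1.0, 0.3), (2.0, 0.4)]}

def phd(theta):
    # D(theta | Z_N) = sum_k p(k | Z_N) * sum of the k marginal posteriors
    return sum(p[k] * sum(npdf(theta, mu, s) for mu, s in marginals[k]) for k in p)

STEP = 0.01
mass = sum(phd(i * STEP - 6.0) for i in range(1201)) * STEP
expected_k = sum(k * pk for k, pk in p.items())   # posterior mean cardinality
```

Numerically, the grid mass of the PHD matches the posterior mean cardinality (here 1.8), as the theory prescribes.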

We are now in the position of stating and proving the main theorem about the asymptotic properties of the PHD.

THEOREM 2 (Asymptotic Normality of the PHD). Let $\xi_0$ be the true parameter vector, with $k_0 > 0$ and $\sigma_i^2 = \big[(J_0^{k_0})^{-1}\big]_{ii}$, and let $\hat{\xi} = [\hat{\xi}_1, \ldots, \hat{\xi}_{k_0}]$ be the C-ML estimator. Assuming that Property 2 holds, we have, for $i = 1, \ldots, k_0$:

$$\int_{\hat{\xi}_i + a\,\sigma_i/\sqrt{N}}^{\hat{\xi}_i + b\,\sigma_i/\sqrt{N}} D(\theta | Z_N)\, d\theta \;\xrightarrow{P_0}\; \frac{1}{\sqrt{2\pi}} \int_a^b \exp\!\left(-\frac{y^2}{2}\right) dy. \qquad (38)$$

Proof. Using eq. (28), eq. (36) can be recast as

$$D(\theta | Z_N) = p(1|Z_N)\, f_1(\theta | Z_N) + \sum_{k=2}^{k_{\max}} k\, p(k|Z_N) \int f_k(\tau^k | Z_N)\, d\theta^{k-1}, \qquad (39)$$

implying

$$\int_{a_N}^{b_N} D(\theta | Z_N)\, d\theta = p(1|Z_N) \int_{a_N}^{b_N} f_1(\theta | Z_N)\, d\theta + \sum_{k=2}^{k_{\max}} k\, p(k|Z_N) \int_{a_N}^{b_N} \int f_k(\tau^k | Z_N)\, d\theta^{k-1}\, d\theta, \qquad (40)$$

where, for ease of notation, we have defined

$$a_N = \hat{\xi}_i + a\frac{\sigma_i}{\sqrt{N}}, \qquad b_N = \hat{\xi}_i + b\frac{\sigma_i}{\sqrt{N}}. \qquad (41)$$

It is now possible to characterize the behavior of the single terms of the summation in eq. (40), namely, of

$$p(1|Z_N) \int_{a_N}^{b_N} f_1(\theta | Z_N)\, d\theta, \qquad k\, p(k|Z_N) \int_{a_N}^{b_N} \int f_k(\tau^k | Z_N)\, d\theta^{k-1}\, d\theta, \quad k \geq 2. \qquad (42)$$


We start by showing that, for any $k \neq k_0$, they vanish in probability. By application of the Bayes rule to the joint density of measurements and targets $\gamma(Z_N; \theta^k)\,\pi(\theta^k)$, it is straightforward to obtain

$$p(k_0 | Z_N) = \frac{\int \gamma(Z_N; \theta^{k_0})\,\pi(\theta^{k_0})\, d\theta^{k_0}}{\sum_{k=0}^{k_{\max}} \int \gamma(Z_N; \theta^k)\,\pi(\theta^k)\, d\theta^k} = \left[ 1 + \sum_{k \neq k_0} \frac{\int \gamma(Z_N; \xi^k)\,\tilde{\pi}(\xi^k)\, d\xi^k}{\int \gamma(Z_N; \xi^{k_0})\,\tilde{\pi}(\xi^{k_0})\, d\xi^{k_0}} \right]^{-1}, \qquad (43)$$

where $\int \gamma(Z_N; \theta^0)\,\pi(\theta^0)\, d\theta^0$ denotes, with abuse of notation, the quantity $\gamma(Z_N; \theta^0)\,\pi(\theta^0)$, and, in the last equality, we recast the integrals in terms of the ordered vector $\xi^k$, having introduced the pertinent prior density $\tilde{\pi}(\xi^k)$. This is convenient because it removes the redundancy implied by the permutation invariance of the model. The single terms of the summation in eq. (43) can be rewritten as

$$\frac{\int \big[ \gamma(Z_N; \xi^k) / \gamma(Z_N; \hat{\xi}^{k_0}) \big]\, \tilde{\pi}(\xi^k)\, d\xi^k}{\int \big[ \gamma(Z_N; \xi^{k_0}) / \gamma(Z_N; \hat{\xi}^{k_0}) \big]\, \tilde{\pi}(\xi^{k_0})\, d\xi^{k_0}}, \qquad (44)$$

where, we recall, $\hat{\xi}^{k_0}$ is the clairvoyant ML estimator defined by eq. (5). Consider first the denominator: by application of the standard Laplace method, the pertinent integral goes to zero as $N^{-k_0/2}$ [35]. As to the numerator, exploiting the conditional independence of the model in (2), the consistency of the ML estimator and the strong law of large numbers give:

$$\frac{1}{N} \log \frac{\gamma(Z_N; \hat{\xi}^{k_0})}{\gamma(Z_N; \xi^k)} \to D_{KL}(\xi_0 \| \xi^k), \qquad (45)$$
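The concentration of $p(k | Z_N)$ on the true cardinality, which this part of the proof establishes (cf. eq. (43) and footnote 7), can be observed in a cardinality-only simulation, where each sensor reports $m_n = \mathrm{Binomial}(k_0, P_D) + \mathrm{Poisson}(\lambda)$ measurements and the measurement values are ignored. All parameters below are illustrative.

```python
# Monte Carlo sketch of cardinality-posterior concentration (cf. eq. (43)
# and footnote 7): with cardinality-only measurements
# m_n = Binomial(k0, PD) + Poisson(lambda), the posterior p(k | Z_N)
# piles up on the true k0 as N grows. Parameters are illustrative.
import math
import random

random.seed(3)
PD, LAM, K_MAX, K0, N = 0.9, 2.0, 5, 2, 200

def count_pmf(m, k):
    # P[m measurements | k targets] = sum_d P[d detections] P[m - d clutter]
    return sum(math.comb(k, d) * PD ** d * (1 - PD) ** (k - d)
               * math.exp(-LAM) * LAM ** (m - d) / math.factorial(m - d)
               for d in range(0, min(k, m) + 1))

def draw_count(k):
    d = sum(random.random() < PD for _ in range(k))
    c = 0                                      # Poisson(LAM) by inversion
    u = random.random()
    p = acc = math.exp(-LAM)
    while u > acc:
        c += 1
        p *= LAM / c
        acc += p
    return d + c

counts = [draw_count(K0) for _ in range(N)]
loglik = [sum(math.log(count_pmf(m, k)) for m in counts) for k in range(K_MAX + 1)]
mx = max(loglik)
w = [math.exp(l - mx) for l in loglik]         # flat prior over k
post = [x / sum(w) for x in w]
k_hat = post.index(max(post))
```

With a few hundred sensors, the posterior on the cardinality is overwhelmingly concentrated on the true value.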

where $D_{KL}(\xi_0 \| \xi^k)$ denotes the Kullback–Leibler divergence between the two models $\gamma(z^m; \xi_0)$ and $\gamma(z^m; \xi^k)$, and the convergence is with probability one, under the true model $\gamma(Z_N; \xi_0)$. Assuming that the statistical models corresponding to different numbers of targets are identifiable, we have$^7$

$$\min_{\xi^k} D_{KL}(\xi_0 \| \xi^k) > 0. \qquad (46)$$

Thus, eq. (45) implies that, for any $\xi^k$ in the parameter space, the integrand at the numerator of (44) vanishes exponentially with $N$, while we have shown that the denominator vanishes only as $N^{-k_0/2}$. Under classical regularity conditions used in the theory of ML estimation [39], which allow interchanging integration and limit operators, the above is enough to conclude that the ratio (44) vanishes. This, in light of eq. (43), implies that $p(k_0|Z_N)$ converges to unity.

In order to complete the proof, it remains to characterize the behavior of the term in eq. (42) corresponding to $k = k_0$. Let us focus on $k_0 > 1$; the case $k_0 = 1$ is much simpler, and follows straightforwardly. It is expedient to introduce the set (strict inequalities suffice due to the existence of densities):

$$T_{\mathrm{ord}} = \{\theta^{k_0-1} : \theta_1 < \cdots < \theta_{k_0-1}\},$$

which, for any fixed argument $\theta$ of the PHD function, can also be written as $T_{\mathrm{ord}} = \bigcup_{j=1}^{k_0} T_j$, with

$$T_j = \{\theta^{k_0-1} : \theta_1 < \cdots < \theta_{j-1} < \theta < \theta_j < \cdots < \theta_{k_0-1}\}. \qquad (47)$$

We have the following chain of equalities:

$$k_0 \int f_{k_0}(\tau^{k_0} | Z_N)\, d\theta^{k_0-1} \stackrel{(a)}{=} k_0! \int_{T_{\mathrm{ord}}} f_{k_0}(\tau^{k_0} | Z_N)\, d\theta^{k_0-1} \stackrel{(b)}{=} k_0! \sum_{j=1}^{k_0} \int_{T_j} f_{k_0}(\tau^{k_0} | Z_N)\, d\theta^{k_0-1} \stackrel{(c)}{=} \sum_{j=1}^{k_0} \int \tilde{f}_{k_0}(\theta_1, \ldots, \theta_{j-1}, \theta, \theta_j, \ldots, \theta_{k_0-1} | Z_N)\, d\theta^{k_0-1} \stackrel{(d)}{=} \sum_{j=1}^{k_0} \tilde{f}_{k_0,j}(\theta | Z_N), \qquad (48)$$

where $(a)$ follows from the permutation invariance of the pertinent pdf; $(b)$ from the fact that the $T_j$'s are disjoint; $(c)$ from the definition of the ordered-vector pdf (30); and, finally, in $(d)$ we simply used definition (31). Summarizing, we have:

$$\int_{a_N}^{b_N} D(\theta | Z_N)\, d\theta = p(k_0|Z_N) \int_{a_N}^{b_N} \tilde{f}_{k_0,i}(\theta | Z_N)\, d\theta + p(k_0|Z_N) \sum_{j \neq i}^{k_0} \int_{a_N}^{b_N} \tilde{f}_{k_0,j}(\theta | Z_N)\, d\theta + \rho_{k \neq k_0},$$

where $\rho_{k \neq k_0}$ is the remainder corresponding to the terms with $k \neq k_0$ in eq. (40), which has already been shown to vanish in probability. By Property 2, the second term on the right-hand side converges to zero in probability (note that the $j$-th marginal posterior $\tilde{f}_{k_0,j}(\theta | Z_N)$ is integrated over a shrinking neighborhood of the $i$-th target estimate, not the $j$-th one). The claim of the theorem now follows by direct application of the same Property 2 to the first term on the right-hand side, clearly using $p(k_0|Z_N) \xrightarrow{P_0} 1$. $\bullet$

$^7$For the considered MOU model, the Kullback–Leibler divergence is easily lower bounded by the divergence between the discrete probability distributions corresponding to cardinality-only measurements, see eq. (8), which in turn depend only on the number of targets, yielding $\min_{\xi^k} D_{KL}(\xi_0 \| \xi^k) \geq D_{KL}(k_0 \| k) > 0$, where $D_{KL}(k_0 \| k)$ is the divergence corresponding to cardinality-only measurements.

REMARK 1. Equation (38) tells us that the PHD is asymptotically Gaussian in the neighborhood of each target estimate. Moreover, $\int D(\theta|Z_N)\, d\theta = \sum_{k=1}^{k_{\max}} k\, p(k|Z_N) \xrightarrow{P_0} k_0$, which reveals that the PHD mass is essentially all concentrated around these estimates. As a result, the limiting behavior is that of a mixture of Gaussian components.

REMARK 2. The proof of the theorem implicitly contains the further result that, in the particular case $k_0 = 0$, the area under
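The limiting object identified by Theorem 2 and Remark 1 — a Gaussian mixture with one unit-mass component per target, centered at the C-ML estimates with spreads $\sigma_i/\sqrt{N}$ — can be written down directly. The estimates and $\sigma_i$ values below are illustrative placeholders, not outputs of any estimator.

```python
# Sketch of the limiting PHD of Theorem 2 / Remark 1: one unit-mass
# Gaussian component per target, centred at the C-ML estimates with
# spreads sigma_i / sqrt(N) from the Fisher information. The estimates
# and sigma_i values are illustrative placeholders.
import math

def npdf(x, mu, s):
    return math.exp(-(x - mu) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def phd_asymptotic(theta, ml_estimates, sigmas, N):
    # Each target contributes a unit-mass component, so the total PHD
    # mass equals the number of targets k0.
    return sum(npdf(theta, xi, s / math.sqrt(N))
               for xi, s in zip(ml_estimates, sigmas))

xi_hat, sig, N = [-3.0, 0.0], [0.5, 0.5], 20
STEP = 0.005
mass = sum(phd_asymptotic(i * STEP - 6.0, xi_hat, sig, N) for i in range(2401)) * STEP
```

As expected, the grid mass of this limiting mixture equals the number of components (here two), and increasing $N$ only sharpens the peaks without changing the mass.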


Fig. 2. Uppermost plot: true target positions. Lowermost plot: squared norm of the error vector for the FIDES strategy, with the first stage using the ML detector in eq. (9), or the sample-mean detector in eq. (10). Also displayed are the error pertaining to the clairvoyant ML estimator, its CRLB, and the errors associated with a wrong cardinality. The relevant system parameters are $P_D = 0.99$, $V = 10$, $\sigma = 0.5$, $\lambda = 2$, and $k_{\max} = 5$.

the PHD, computed in any subset of the surveillance region, goes to zero in probability.

V. NUMERICAL EXPERIMENTS

We now report some numerical experiments, aimed at building intuition, at providing a check of the theoretical analysis, and at investigating the practical impact of a finite number of sensors. We consider as a first case study a system with detection probability $P_D = 0.99$, gate size $V = 10$, noise standard deviation $\sigma = 0.5$, average number of clutter echoes $\lambda = 2$, and maximum number of targets $k_{\max} = 5$. The true target positions are $\xi_0 = [-1, 0]$; see the uppermost plot in Fig. 2. We apply the proposed FIDES strategy, using, at the first stage, the detectors presented in Sect. III, namely, the ML detector (9) and the sample-mean detector (10). The actual error performance has been estimated by means of $10^3$ Monte Carlo runs.

The results of the analysis are summarized in the lowermost plot of Fig. 2. First, the empirical MSE of the clairvoyant system is computed, and, as seen from the figure, it attains its theoretical CRLB$^8$ for sufficiently large $N$. Then, the detection-error term $\|e^{k_0}_{\max}\|^2\, P_0[\hat{k} \neq k_0]$ in eq. (22) is displayed, and, as expected, the simplified rule is outperformed by the ML rule, although both exhibit an exponential decay. The complete picture considers the overall error $\|e^{k_0}\|^2$ for the FIDES strategy. We observe that, again, the system using a simplified rule as its first stage is slightly inferior. In the small-$N$ regime, the error is dominated by the wrong-cardinality effect, and, accordingly, it is almost coincident with the term $\|e^{k_0}_{\max}\|^2\, P_0[\hat{k} \neq k_0]$ in eq. (22). This behavior can be partly ascribed to the inefficiency of a cardinality estimator that is independent of the measurement values. Then, a knee-point in $N$ appears, such that, as predicted by Theorem 1, the FIDES errors begin to coincide with the asymptotic CRLB of the clairvoyant system: when the number of sensors $N$ exceeds this knee-point value, the scheme can be considered efficient. As can be seen, a transition region between the two regimes arises, and the sharp separation is reminiscent of the different scaling laws for detection (exponential in $N$) and estimation (hyperbolic in $N$). Finally, note that the performance loss consequent to the choice of a simplified first-stage detector appears to be tolerable.

$^8$For some words about the computation of the CRLB, see Appendix A.

It is worth investigating the dependence of this knee-point upon the main system parameters, namely $P_D$, $\sigma$, and $\lambda$. Accordingly, in Fig. 3 the values of the knee-point are computed by Monte Carlo simulation, as functions of the noise standard deviation $\sigma$, for three different values of the clutter rate $\lambda = 1, 2, 3$. The leftmost plots (a) and (c) refer to the simplified detection rule, while the rightmost ones (b) and (d) pertain to the ML detector. In the uppermost plots we have $P_D = 0.9$, while in the lowermost ones $P_D = 0.99$. As expected, all curves are monotonically decreasing with $\sigma$. This can easily be explained by considering that the higher the noise variance, the higher the CRLB. Recalling that the detectors employed at the first stage are based on cardinality information only (i.e., they do not depend upon $\sigma$), this upper-shift of the CRLB implies a left-shift of the knee-point. The curves also decrease with $P_D$, meaning that with higher detection probabilities the knee-point is encountered sooner. Finally, the curves increase with $\lambda$; that is, higher clutter rates degrade the detection performance, yielding a right-shift of the knee-point. Clearly, the results of the knee-point analysis are expected to depend on the particular first stage employed by FIDES, namely, a detector based only on the measurements' cardinalities.

Fig. 3. Knee-points as functions of $\sigma$, with $P_D = 0.9$ (uppermost panels) and $P_D = 0.99$ (lowermost panels), for three different values of $\lambda = 1, 2, 3$. Plots (a), (c) and (b), (d) refer to the sample-mean detector and to the ML detector, respectively.

A further comparison is made in terms of the OSPA metric, see Fig. 4. We repeat the analysis with $P_D = 0.9$, $V = 10$, $\sigma = 1$, $\lambda = 1$, and $k_{\max} = 3$. As a first evidence, the scaled OSPA error $k_0\, d^2_{OSPA}$ (see eq. (23)) stays below $\|e^{k_0}\|^2$. On the other hand, as expected, the two error measures reconcile when $N$ is sufficiently large: the impact of a target-cardinality error disappears, and the permutation chosen by the OSPA becomes equivalent to conventional ordering.

Fig. 4. Comparison between the scaled OSPA metric and the squared error $\|e^{k_0}\|^2$ for the FIDES strategy. The relevant system parameters are $P_D = 0.9$, $V = 10$, $\sigma = 0.5$, $\lambda = 1$, and $k_{\max} = 3$.

Let us now switch to the results pertaining to the PHD, summarized in Figs. 5 and 6. The relevant system parameters are $P_D = 0.9$, $V = 10$, $\sigma = 0.3$, $\lambda = 1$, and $k_{\max} = 3$. In Fig. 5, we consider the case that $k_0 = 2$ targets are effectively present in the system, located at $\xi_0 = [-3, 0]$, while in Fig. 6 we set $k_0 = 3$ and $\xi_0 = [-3, 0, 1.5]$. For each figure, we draw a realization of the data, corresponding to four different values of $N = 2, 5, 10, 20$, and accordingly compute the single summands in eq. (39) via Monte Carlo integration, in order to obtain the overall PHD function. The priors used in the PHD evaluation are chosen as uniform over the gate and over the number of targets. We also compute and display the theoretical curves corresponding to Gaussian mixtures centered at the C-ML estimates, with variances given by the diagonal entries of the inverse FIM. In Fig. 5 we see that very good agreement is achieved even with relatively few sensors. In the case of $N = 2$, a small spurious bump appears approximately at $\theta = 2$, while the peaks corresponding to the true targets are well pronounced, and fit in shape the asymptotic Gaussian mixture. Similar conclusions can be drawn by looking at Fig. 6. Here two sensors appear to be insufficient but, as Theorem 2 predicts, convergence is met for increasing values of $N$; note that a very good match with the asymptotic results is observed with only five sensors.

Fig. 5. Numerical check of the PHD asymptotics: the exact PHD functions have been evaluated by Monte Carlo integration. The approximating curves, given by (25), are Gaussian mixtures with the number of components corresponding to the true number of targets, locations corresponding to the C-ML estimates, and weights determined by the Fisher information matrix elements. The relevant system parameters are $P_D = 0.9$, $V = 10$, $\sigma = 0.3$, $\lambda = 1$, and $k_{\max} = 3$. The true target positions are $\xi_0 = [-3, 0]$.

VI. CONCLUSION

The static multitarget/multisensor inference problem has been addressed, with emphasis on the analytical characterization of the estimation performance in the limiting regime of a large number of sensors. We focused on two specific approaches, namely, an ML-based separate detection/estimation method (FIDES) and the classical probability hypothesis density (PHD). The basic results are summarized as follows.

• The performance of FIDES is asymptotically equivalent to that of the clairvoyant maximum likelihood estimator; that is, it is ultimately expressed in terms of the Fisher information.
• We discovered that the PHD function, as the number of sensors grows large, is approximated by a mixture of $k_0$ Gaussian densities ($k_0$ being the true number of targets), becoming ever more peaked around the maximum likelihood estimates of the target positions. The spread of these Gaussian densities is governed by the Fisher information matrix.

This basically tells us that, even if something beyond conventional estimation methodologies is needed in the multitarget case, the role of a classical and fundamental quantity such as the Fisher information is preserved. The bottom line is that two of the most popular multitarget estimation algorithms are not only reasonable: they are, in a sense, efficient. This becomes particularly relevant in modern and next-generation multitarget systems, where more and more remote sensing units are, or will be, available.
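Returning briefly to the OSPA comparison of Sect. V: the metric of [32] (order $p = 2$, cutoff $c$) admits a compact brute-force implementation for small sets, enumerating permutations for the optimal assignment. The sketch and its example values are illustrative only.

```python
# Brute-force sketch of the OSPA metric of [32] (order p, cutoff c) for
# small scalar sets; permutations are enumerated, so this only suits
# small cardinalities. Example values are illustrative.
import itertools
import math

def ospa(x, y, c=10.0, p=2):
    if len(x) > len(y):
        x, y = y, x                  # ensure len(x) <= len(y)
    n = len(y)
    if n == 0:
        return 0.0
    # optimal assignment of the smaller set into the larger one
    best = min(
        sum(min(abs(a - b), c) ** p for a, b in zip(x, perm))
        for perm in itertools.permutations(y, len(x))
    )
    # cardinality mismatch penalized at the cutoff c
    return ((best + c ** p * (n - len(x))) / n) ** (1.0 / p)

truth = [-1.0, 0.0]
est_good = [0.05, -0.98]             # right cardinality, small errors
est_bad = [-1.0]                     # cardinality error: one target missed
```

For `est_good` the OSPA is small and driven only by localization error; for `est_bad` the missed target is charged at the cutoff, dominating the distance.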

Fig. 6. Numerical check of the PHD asymptotics: the exact PHD functions have been evaluated by Monte Carlo integration. The approximating curves, given by (25), are Gaussian mixtures with the number of components corresponding to the true number of targets, locations corresponding to the C-ML estimates, and weights determined by the Fisher information matrix elements. The relevant system parameters are $P_D = 0.9$, $V = 10$, $\sigma = 0.3$, $\lambda = 1$, and $k_{\max} = 3$. The true target positions are $\xi_0 = [-3, 0, 1.5]$.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their valuable suggestions, which in particular allowed improving the derivation of Theorem 2.

APPENDIX A
MEASUREMENT-ORIGIN UNCERTAINTY MODEL

In this appendix, we summarize the assumptions of the MOU model and provide the pertinent statistical description. Let us fix the target parameter vector $\theta^k$. Owing to the identical distribution across sensors, the suffix $n$ denoting the particular sensor is omitted throughout this section.

• A generic target is observed within a sensor gate $(-V/2, V/2)$ with detection probability $P_D$, and the detection events of distinct targets are independent of each other. At most one measurement per sensor can come from a particular target, and a measurement cannot be originated by multiple targets. The number of detected targets is denoted by $d$.

• The observations $z^m = [z_1, z_2, \ldots, z_m]$ collected at the remote sensor can either be made of $c$ clutter echoes and $d$ target-originated measurements ($m = c + d$), or be all clutter-originated. The number of false alarms has a discrete probability distribution $\mu(c)$, and the average number of false alarms is denoted by $\lambda$. Given the number of false alarms, their positions (clutter returns) are independent, identically distributed random variables, also independent of the target positions. Typically, a Poisson distribution $\mu(c) = \frac{\lambda^c}{c!} e^{-\lambda}$ is assumed, and clutter echoes are (conditioned on $c$) uniform random variables spread over the gate $(-V/2, V/2)$.

• The $d$ detected targets ($d \leq k$) will be denoted by $\theta_{i_1}, \theta_{i_2}, \ldots, \theta_{i_d}$, with $i_1 < i_2 < \cdots < i_d$. Among the collected measurements $z_1, z_2, \ldots, z_m$, the observations coming from the detected targets will be labeled $z_{j_1}, z_{j_2}, \ldots, z_{j_d}$. Summarizing, the target/measurement association pairs can be compactly denoted as $(i_h, j_h)$, with $h = 1, 2, \ldots, d$. In order to give a formal MOU model, it is convenient to introduce the following events:

$$A_1 = \{\text{targets } i_1, i_2, \ldots, i_d \text{ have been detected, } c = m - d \text{ measurements are clutter}\},$$
$$A_2 = \{\text{association pairs are } (i_h, j_h),\ h = 1, 2, \ldots, d\}.$$

The intersection of these events, $A = A_1 \cap A_2$, will be referred to as an association event which, according to the MOU, cannot be observed, the system being unable to distinguish which is which. Each association event can be compactly described via a data-association matrix $A$ [3], [8]:

$$\{A\}_{ij} = \begin{cases} 1 & \text{iff target } i \text{ originates measurement } j, \\ 0 & \text{otherwise,} \end{cases} \qquad (49)$$

whence the size of $A$ is obviously $k \times m$. Given that at most one measurement can come from a particular target, and that a measurement cannot be originated by multiple targets, the matrix $A$ contains at most one "1" per row and per column. A particular association matrix $A$ determines $m(A)$ measurements, $d(A)$ detected targets, and the pairs $(i_h(A), j_h(A))$ of associated targets/measurements. To avoid cumbersome notation, the explicit dependence of these quantities upon $A$ will be omitted. The probability of getting a particular $A$ is

$$P[A] = P[A_1]\, P[A_2 | A_1], \qquad (50)$$

where

$$P[A_1] = P_D^d\, (1 - P_D)^{k-d}\, \mu(m - d), \qquad (51)$$

and

$$P[A_2 | A_1] = \frac{(m - d)!}{m!}, \qquad (52)$$

in that $\frac{m!}{(m - d)!}$ is the total number of possible associations of $d$ targets with $m$ measurements, which are equally likely in view of the unlabeling assumption.

• Given an association matrix $A$, the target-originated measurements $z_{j_h}$ are independent random variables, with pdf $G(z_{j_h} - \theta_{i_h})$ – truncated to the gate $(-V/2, V/2)$ – where $h = 1, 2, \ldots, d$. A common model for $G(\cdot)$ is a zero-mean Gaussian with variance $\sigma^2$.

The above assumptions lead to the following density at the single remote sensor ($m > 0$, $k > 0$):

$$\gamma(z^m; \theta) = \sum_{A} \frac{P[A]}{V^{m-d}} \prod_{h=1}^{d} \frac{G(z_{j_h} - \theta_{i_h})}{\int_{-V/2}^{V/2} G(z - \theta_{i_h})\, dz}\; \mathbb{I}\big(z^m \in H_m(V)\big), \qquad (53)$$

where the summation is taken over all possible association matrices $A$ of size $k \times m$, and $H_m(V)$ denotes a centered $m$-cube of side length $V$. When $A$ is such that $d = 0$, the product in the above equation should be read as 1. In practice, one has $V \gg \sigma$, such that, ruling out those situations where the targets are close to the gate boundaries, the integrals $\int_{-V/2}^{V/2} G(z - \theta_{i_h})\, dz$ are usually set to 1.

In the case of the MOU, the single-object CRLB involves an information reduction factor (IRF); see, e.g., [13]. At present, no results on a multi-object IRF under MOU are available. However, in Figs. 2 and 4 such a CRLB is nonetheless reported: in our scalar problem, this is achieved by computing (6) numerically, using (53) for $\gamma(z^m; \xi_0)$. That is, the issue of an IRF for multi-object estimation under MOU – a subject for future investigation in its own right, and well beyond the scope of this paper – is obviated.
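The generative assumptions of this appendix can be sketched in a short Monte Carlo simulation: detection with probability $P_D$, Gaussian returns truncated to the gate, Poisson clutter uniform over the gate, and a shuffle to mimic the unlabeled association. All parameter values are illustrative.

```python
# Monte Carlo sketch of the MOU measurement model: each target is
# detected with probability PD, detected targets generate a Gaussian
# return (truncated to the gate), Poisson clutter is added uniformly
# over the gate, and returns are shuffled to mimic the unlabeled
# association. Parameter values are illustrative.
import math
import random

random.seed(4)
PD, V, SIGMA, LAM = 0.9, 10.0, 0.5, 2.0
targets = [-1.0, 0.0]

def poisson(lam):
    # Poisson sampling by cdf inversion
    u, c = random.random(), 0
    p = acc = math.exp(-lam)
    while u > acc:
        c += 1
        p *= lam / c
        acc += p
    return c

def sense(thetas):
    z = [random.gauss(t, SIGMA) for t in thetas if random.random() < PD]
    z = [v for v in z if -V / 2 < v < V / 2]           # gate truncation
    z += [random.uniform(-V / 2, V / 2) for _ in range(poisson(LAM))]
    random.shuffle(z)                                  # unlabeled returns
    return z

scans = [sense(targets) for _ in range(10000)]
avg_count = sum(len(s) for s in scans) / len(scans)    # ~ k0 * PD + lambda
```

With the targets far from the gate boundaries, the average scan size is close to $k_0 P_D + \lambda$ (here about 3.8), consistent with eqs. (50)–(52).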


REFERENCES

[1] Y. Bar-Shalom and D. Blair, Multitarget/Multisensor Tracking: Applications and Advances – Volume III. Norwood, MA: Artech House, 2000.
[2] S. Blackman and R. Popoli, Design and Analysis of Modern Tracking Systems. Artech House, 1999.
[3] Y. Bar-Shalom, P. Willett, and X. Tian, Tracking and Data Fusion: A Handbook of Algorithms. Storrs, CT: YBS Publishing, 2011.
[4] Y. Bar-Shalom, F. Daum, and J. Huang, "The probabilistic data association filter," IEEE Control Syst. Mag., vol. 29, no. 6, pp. 82–100, Dec. 2009.
[5] J. Moyal, "The general theory of stochastic population processes," Acta Mathematica, vol. 108, no. 1, pp. 1–31, 1962.
[6] D. Daley and D. Vere-Jones, An Introduction to the Theory of Point Processes, Volume I: Elementary Theory and Methods. Springer, 2003.
[7] I. R. Goodman, R. P. Mahler, and H. T. Nguyen, Mathematics of Data Fusion. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1997.
[8] R. Mahler, Statistical Multisource-Multitarget Information Fusion. Artech House, 2007.
[9] B.-T. Vo, B.-N. Vo, and A. Cantoni, "Bayesian filtering with random finite set observations," IEEE Trans. Signal Process., vol. 56, no. 4, pp. 1313–1326, Apr. 2008.
[10] R. Streit, Poisson Point Processes: Imaging, Tracking, and Sensing. Springer, 2010.
[11] H. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York: John Wiley & Sons, 1968 (reprinted 2001).
[12] H. Van Trees and K. Bell, Bayesian Bounds for Parameter Estimation and Nonlinear Filtering/Tracking. New York: Wiley-Interscience/IEEE Press, 2007.
[13] R. Niu, P. Willett, and Y. Bar-Shalom, "Matrix CRLB scaling due to measurements of uncertain origin," IEEE Trans. Signal Process., vol. 49, no. 7, pp. 1325–1335, 2001.
[14] R. Mahler, "Multitarget Bayes filtering via first-order multitarget moments," IEEE Trans. Aerosp. Electron. Syst., vol. 39, no. 4, pp. 1152–1178, Oct. 2003.
[15] B.-N. Vo and W.-K. Ma, "The Gaussian mixture probability hypothesis density filter," IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4091–4104, Nov. 2006.
[16] R. Streit, "Multisensor multitarget intensity filter," in Proc. Int. Conf. on Information Fusion (FUSION), Jul. 2008, pp. 1–8.
[17] O. Erdinc, P. Willett, and Y. Bar-Shalom, "The bin-occupancy filter and its connection to the PHD filters," IEEE Trans. Signal Process., vol. 57, no. 11, pp. 4232–4246, Nov. 2009.
[18] B.-T. Vo, B.-N. Vo, and A. Cantoni, "Analytic implementations of the cardinalized probability hypothesis density filter," IEEE Trans. Signal Process., vol. 55, no. 7, pp. 3553–3567, Jul. 2007.
[19] R. Mahler, "Approximate multisensor CPHD and PHD filters," in Proc. Int. Conf. on Information Fusion (FUSION), Jul. 2010, pp. 1–8.
[20] E. Delande, E. Duflos, D. Heurguier, and P. Vanheeghe, "Multi-sensor PHD: Construction and implementation by space partitioning," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), May 2011, pp. 3632–3635.
[21] S. Coraluppi, "Multistatic sonar localization," IEEE J. Ocean. Eng., vol. 31, no. 4, pp. 964–974, Oct. 2006.
[22] R. Georgescu and P. Willett, "The GM-CPHD tracker applied to real and realistic multistatic sonar data sets," IEEE J. Ocean. Eng., vol. 37, no. 2, pp. 220–235, Apr. 2012.
[23] P. Braca, S. Marano, V. Matta, and P. Willett, "Multitarget-multisensor ML and PHD: Some asymptotics," in Proc. Int. Conf. on Information Fusion (FUSION), 2012.
[24] E. Lehmann, Elements of Large Sample Theory. Springer, 2004.
[25] R. Redner and H. Walker, "Mixture densities, maximum likelihood and the EM algorithm," SIAM Review, vol. 26, no. 2, pp. 195–239, 1984.
[26] A. Wald, "Note on the consistency of the maximum likelihood estimate," Ann. Math. Statist., vol. 20, 1949.
[27] L. Le Cam, "Asymptotics and the theory of inference," Int. Statist. Rev., vol. 58, no. 2, pp. 153–171, Aug. 1990.
[28] E. Lehmann and G. Casella, Theory of Point Estimation. Springer, 1998.
[29] S. Coraluppi and C. Carthel, "Aggregate surveillance: A cardinality tracking approach," in Proc. Int. Conf. on Information Fusion (FUSION), 2011.
[30] C. Leang and D. Johnson, "On the asymptotics of M-hypothesis Bayesian detection," IEEE Trans. Inf. Theory, vol. 43, no. 1, pp. 280–282, Jan. 1997.
[31] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications. New York: Springer-Verlag, 1998.
[32] D. Schuhmacher, B.-T. Vo, and B.-N. Vo, "A consistent metric for performance evaluation of multi-object filters," IEEE Trans. Signal Process., vol. 56, no. 8, pp. 3447–3457, Aug. 2008.
[33] J. Shao, Mathematical Statistics, 2nd ed. Springer, 2003.
[34] H. Viswanathan and T. Berger, "The quadratic CEO problem," IEEE Trans. Inf. Theory, vol. 43, no. 5, pp. 1549–1559, Sep. 1997.
[35] J. Bernardo and A. Smith, Bayesian Theory. New York: Wiley, 1994.
[36] C. Chen, "On asymptotic normality of limiting density functions with Bayesian implications," J. Roy. Statist. Soc., vol. 47, no. 3, pp. 540–546, 1985.
[37] A. Walker, "On the asymptotic behavior of posterior distributions," J. Roy. Statist. Soc., vol. 31, no. 1, pp. 80–88, 1969.
[38] A. van der Vaart, Asymptotic Statistics. New York: Cambridge University Press, 1998.
[39] W. K. Newey and D. L. McFadden, "Large sample estimation and hypothesis testing," in Handbook of Econometrics, R. F. Engle and D. L. McFadden, Eds. Elsevier Science B.V., 1994, vol. 4, pp. 2111–2245.
