COMPARATIVE THRESHOLD PERFORMANCE STUDY FOR CONDITIONAL AND UNCONDITIONAL DIRECTION-OF-ARRIVAL ESTIMATION

Yuri I. Abramovich† and Ben A. Johnson*

†DSTO, ISR Division, Edinburgh, Australia, e-mail: [email protected]
*Lockheed Martin Australia and ITR, UniSA, Mawson Lakes, SA, Australia, e-mail: [email protected]

(This work was partially conducted under DSTO/LMA Collaboration Agreement 290905.)

ABSTRACT

Comparative analysis of the threshold SNR and/or sample support values at which genuine maximum likelihood DOA estimation starts to produce "outliers" is conducted for the unconditional (stochastic) and conditional (deterministic) problem formulations. Theoretical predictions based on recent results from Random Matrix Theory (RMT) are provided and supported by simulation results.

Index Terms— Direction of arrival estimation, adaptive arrays, error analysis, maximum likelihood estimation.

1. INTRODUCTION

Currently, two major statistical models are applied to describe the problem of direction-of-arrival (DOA) estimation on an M-element array.

1. Unconditional (Stochastic) Model: This model is applied when an exhaustive (multivariate) statistical description of a set of antenna training data $x(t) \in \mathbb{C}^{M \times 1}$, $t = 1, \ldots, N$ is available, with some unknown (estimated) parameters whose number does not depend on the amount of training data N. Typically, a multivariate Gaussian model is applied, accompanied by additional assumptions on the statistical independence of the training vectors $x(t)$.

2. Conditional (Deterministic) Model: This model is applied when the emitted source waveforms, sampled at the antenna receiver sample rate, cannot be described as i.i.d. (complex) Gaussian vectors. In this case, the sampled sequences are treated as a priori unknown deterministic parameters. Obviously, the number of these "parameters" grows with the number of observations N, and therefore consistent estimation of all parameters is impossible here. Analysis conducted in [1] demonstrated that, in the conditional case, DOA estimation accuracy nevertheless improves as $N \to \infty$.

Since a particular realization of i.i.d. Gaussian signals can also be
treated as a set of unknown deterministic parameters, the conditional maximum likelihood (CML) technique can also be applied to this unconditional (i.i.d.) scenario, and will in general produce a different set of DOA estimates than unconditional maximum likelihood (UML). A detailed comparison of the asymptotic ($N \to \infty$, $M = \text{const}$) properties of UML versus CML DOA estimation for such a scenario was conducted by P. Stoica and A. Nehorai in [1]. There, it was analytically predicted that for i.i.d. Gaussian training data, the CML DOA estimate should always be inferior to the UML estimate, although a noticeable difference in those predictions was demonstrated only for highly spatially correlated sources (ρ = 0.995), relatively small antenna array dimensions (M = 5), small SNR values, and closely spaced multiple sources. While that analysis described the important region where estimation errors approach the Cramér–Rao bound (CRB), it unfortunately provides no insight into the so-called "threshold" behavior of these DOA estimation techniques, where the DOA estimates depart rapidly from the CRB due to the onset of severely erroneous "outliers" as SNR and/or sample support decreases. In this paper, we demonstrate both theoretically and empirically that this conclusion (that UML equals or outperforms CML estimation) remains true in the threshold region as well.
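To make the two formulations concrete, the following minimal numpy sketch (ours, not from the paper; the half-wavelength ULA, the DOAs, and all dimensions are assumed purely for illustration) generates one batch of snapshots that can be read under either model: the unconditional model treats the source waveform matrix S as a random draw to be averaged over, while the conditional model treats the same realization of S as a fixed unknown parameter whose size grows with N.

```python
import numpy as np

rng = np.random.default_rng(0)
M, m, N = 8, 2, 32                       # sensors, sources, snapshots (assumed values)
doas = np.deg2rad([-5.0, 5.0])           # example source directions
# Half-wavelength ULA manifold: a(theta) = exp(j*pi*k*sin(theta)), k = 0..M-1
A = np.exp(1j * np.pi * np.outer(np.arange(M), np.sin(doas)))

# Source waveforms: one i.i.d. complex Gaussian draw.
S = (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
noise = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

# Unconditional view: S is random, so x(t) ~ CN(0, R0); UML estimates (sigma_n^2, P_m, DOAs).
# Conditional view: this particular S is an unknown parameter (m*N numbers, growing with N);
# CML estimates the DOAs jointly with S itself.
X = A @ S + noise                        # snapshots x(t), t = 1, ..., N
```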
2. G-ASYMPTOTIC UML AND CML DOA ESTIMATION THRESHOLD CONDITIONS

Let us consider a set of i.i.d. complex Gaussian training samples with the true (actual) covariance matrix $R_0$:

R_0 = \sigma_n^2 I_M + A(\Theta_m^0) P_m A^H(\Theta_m^0)    (1)

where $\sigma_n^2$ is the white noise power,

A(\Theta_m) = [a(\theta_1), \ldots, a(\theta_m)] \in \mathbb{C}^{M \times m}, \quad m < M    (2)
is the (M × m)-variate “tall” matrix of antenna manifold (steering) vectors in the m source directions Θm = (θ1 , . . . , θm ), and Pm is the inter-source (spatial) correlation matrix.
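As a point of reference, here is a short numpy sketch of the covariance model (1)-(2); the half-wavelength ULA geometry, source powers, and correlation coefficient are our assumptions for illustration only.

```python
import numpy as np

M, m = 8, 2
sigma_n2 = 1.0                                      # white noise power sigma_n^2
theta0 = np.deg2rad([-5.0, 5.0])                    # true DOAs Theta_m^0 (assumed)
# (M x m) "tall" manifold matrix A(Theta_m) of eqn (2): half-wavelength ULA steering vectors
A = np.exp(1j * np.pi * np.outer(np.arange(M), np.sin(theta0)))
rho, p = 0.9, 10.0                                  # inter-source correlation and power (assumed)
P = p * np.array([[1.0, rho], [rho, 1.0]])          # inter-source correlation matrix P_m
R0 = sigma_n2 * np.eye(M) + A @ P @ A.conj().T      # true covariance matrix, eqn (1)
```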
As a result of UML DOA estimation, we reconstruct the estimated covariance matrix

\hat R(\hat\Omega_m^{ML}) = (\hat\sigma_n^{ML})^2 I_M + A(\hat\Theta_m^{ML}) \hat P_m^{ML} A^H(\hat\Theta_m^{ML})    (3)

where $\hat\Omega_m^{ML}$ is the set of UML estimates of all a priori unknown parameters; in the extreme case with no a priori knowledge of the noise power and inter-source correlation, we have $\Omega_m = \{\sigma_n^2, P_m, \theta_1, \ldots, \theta_m\}$. According to the maximum likelihood (ML) principle, we seek the parameter set that maximizes the likelihood function (or normalized likelihood ratio). $LR(\hat\Omega_m)$ is the normalized likelihood function (likelihood ratio) for multivariate Gaussian (zero-mean, covariance $R_0$) i.i.d. data $X_N$ (i.e. $x(t) \sim \mathcal{CN}(0, R_0)$, $t = 1, \ldots, N$), and its ML solution can be presented as

LR(\hat\Omega_m^{ML}) = \frac{\det[R^{-1}(\hat\Omega_m^{ML})\hat R]\,\exp(M)}{\exp \operatorname{Tr}[R^{-1}(\hat\Omega_m^{ML})\hat R]} \ge \frac{\det[R_0^{-1}\hat R]\,\exp(M)}{\exp \operatorname{Tr}[R_0^{-1}\hat R]}    (4)

where $\hat R = \frac{1}{N} X_N X_N^H$. Under the threshold condition, the likelihood ratio associated with a parameter set containing a highly erroneous DOA estimate (denoted $\hat\Omega_m^{ex}$) can be maximal across all possible parameter sets $\hat\Omega_m$, and will by definition equal or exceed the likelihood ratio of the true covariance matrix. Therefore, under the threshold condition, we colloquially consider the likelihood principle itself to be "broken": not because eqn. (4) fails to hold, but because the likelihood ratio no longer always discriminates between accurate and erroneous sets of estimates.

Since the whitened sample covariance matrix is complex Wishart distributed,

R_0^{-1/2} \hat R R_0^{-1/2} = \frac{1}{N} \hat C_M^N; \qquad \hat C_M^N \sim \mathcal{CW}(N, M, I_M), \quad N \ge M,    (5)

the UML threshold condition may be alternatively formulated as a condition where, despite a severely erroneous DOA estimate within the set $\hat\Omega_m^{ex}$, the whitened sample matrix

\hat W_M(\hat\Omega_m^{ex}) = R^{-1/2}(\hat\Omega_m^{ex}) \hat R R^{-1/2}(\hat\Omega_m^{ex})    (6)

is "statistically indistinguishable" from the white noise-only complex Wishart sample matrix $\frac{1}{N} \hat C_M^N$. Here, by "statistical similarity", we mean that the p.d.f.'s of the LR values associated with the two random matrices $\hat W_M$ and $\frac{1}{N} \hat C_M^N$ should significantly overlap. Alternatively, it means that the deterministic matrix

W_M(\hat\Omega_m^{ex}) = R^{-1/2}(\hat\Omega_m^{ex}) R_0 R^{-1/2}(\hat\Omega_m^{ex}),    (7)

which is the transform between the two random matrices, should be "sufficiently close" to the identity matrix to keep the distribution of the (random) eigenvalues of $\hat W_M(\hat\Omega_m^{ex})$ close to the similar distribution of eigenvalues of $\frac{1}{N} \hat C_M^N$.
To more carefully define the similarity of these eigenvalue distributions, we turn to tools derived from General Statistical Analysis (GSA) or Random Matrix Theory (RMT) [2]. In RMT, we consider statistical tests not under the standard asymptotic conditions ($N \to \infty$, $M = \text{const}$), but under the Kolmogorov "G-asymptotic" conditions

M, N \to \infty, \qquad \frac{M}{N} \to c > 0.    (8)
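For concreteness, the normalized likelihood ratio (4) and the whitening transform (5)-(7) on which the threshold argument rests can be evaluated in a few lines of numpy. This is our own sketch (with R_model and R_hat as defined in the text, and N >= M assumed), not code from the paper.

```python
import numpy as np

def normalized_lr(R_model, R_hat):
    """Normalized likelihood ratio of eqn (4):
    det(R^-1 R_hat) * exp(M) / exp(Tr(R^-1 R_hat)).

    Evaluated through the whitened matrix W = R^-1/2 R_hat R^-1/2 of
    eqns (6)-(7): with eigenvalues g_1..g_M of W,
    log LR = sum(log g_j) + M - sum(g_j) <= 0, with equality iff W = I_M.
    """
    lam, U = np.linalg.eigh(R_model)               # R_model = U diag(lam) U^H
    R_isqrt = (U / np.sqrt(lam)) @ U.conj().T      # R_model^{-1/2}
    g = np.linalg.eigvalsh(R_isqrt @ R_hat @ R_isqrt)
    return float(np.exp(np.sum(np.log(g)) + g.size - np.sum(g)))

# Usage sketch: R_hat = X @ X.conj().T / N with N >= M snapshots.
# At threshold breakdown, some erroneous model R(Omega_ex) satisfies
# normalized_lr(R_erroneous, R_hat) >= normalized_lr(R0, R_hat).
```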
In [3], we suggested a G-asymptotic criterion for the similarity of two eigenvalue distributions, introduced as a "single cluster" criterion. The main motivation for this criterion is that, for the white noise-only sample matrix $\frac{1}{N}\hat C_M^N$, the empirical (sample) distribution of its non-ordered eigenvalues $\hat\gamma_j$ converges almost surely to the non-random limiting distribution known as the Marchenko-Pastur law [4]. Because convergence under the Kolmogorov conditions (8) allows the antenna dimension to grow as the sample support grows, it is considered more appropriate for the analysis of "threshold" performance than the conventional asymptotic assumption. While for any finite N and M, G-asymptotic results may still not be sufficiently representative, a number of applications of the RMT methodology demonstrate surprisingly fast convergence to the asymptotic results. Not surprisingly, for a random matrix with mean equal to the identity matrix, this limiting (under (8)) distribution is unimodal, with all eigenvalues $\hat\gamma_j$ confined to the interval (contracting as $c \to 0$)

(1 - \sqrt{c})^2 < \hat\gamma_{\min} < \ldots < \hat\gamma_{\max} < (1 + \sqrt{c})^2.    (9)

When the deterministic matrix $W_M(\hat\Omega_m^{ex}) \ne I_M$ in (7) and has $\bar M$ distinct eigenvalues with multiplicities $K_j$ (the number of times each of the $\bar M$ distinct true eigenvalues is repeated), then for sufficiently large training sample support (i.e. for $c \ll 1$) the limiting eigenvalue distribution tends to concentrate around the $\bar M$ corresponding "clusters" [5, 6]. Yet, for a comparatively small deviation of the eigenvalues of $W_M(\hat\Omega_m^{ex})$ from a constant, there is a correspondingly high $c = M/N \ (< 1)$ at which these individual clusters broaden, eventually coalescing into only one cluster, as for the white noise-only covariance matrix $\frac{1}{N}\hat C_M^N$ with its Marchenko-Pastur law. The latter means that the eigenvalues of $\hat W_M(\hat\Omega_m^{ex})$ are not separable, exactly as the eigenvalues of $\frac{1}{N}\hat C_M^N$ are not. From the RMT perspective, under these conditions, scenario $\hat\Omega_m^{ex}$ may be treated as statistically indistinguishable from the true (actual) scenario $\Omega_m^0$.

For this paper's model, with i.i.d. (Gaussian) sources, this "single cluster" criterion has been suggested in [3], based on the complementary "eigenvalue splitting" condition derived by X. Mestre in [5, 6]. Specifically, for a sample matrix $\hat W_M(\hat\Omega_m^{ex})$, the limiting eigenvalue distribution is a "single cluster" if and only if

\frac{1}{c} \le \min_{1 \le j \le \bar M - 1} \beta_M(j)    (10)
where the $\beta_M(j)$ are calculated as

\beta_M(j) = \frac{1}{M} \sum_{k=1}^{\bar M} K_k \left( \frac{\gamma_k}{\gamma_k - \alpha_j} \right)^2; \qquad \beta_M(0) = \beta_M(\bar M) = 0    (11)

and $\alpha_j$ denotes the $(\bar M - 1)$ real-valued solutions of

\frac{1}{M} \sum_{k=1}^{\bar M} K_k \frac{\gamma_k^2}{(\gamma_k - f)^3} = 0    (12)

where $\gamma_j$ is the j-th distinct eigenvalue of the deterministic matrix $W_M(\hat\Omega_m^{ex})$ in (7), with multiplicity $K_j$.

Let us now delineate similar "single cluster" conditions for CML DOA estimation when applied to the same i.i.d. Gaussian data. Without loss of generality, we may consider the case with a priori known white noise power $\sigma_n^2$. According to the ML principle, similarly to the UML case in (4), we have

LR_{DET}(\hat\Theta_m^{ML}) = \frac{(M - m)\sigma_n^2}{\operatorname{Tr}[P_\perp(\hat\Theta_m^{ML}) \hat R]} \ge \frac{(M - m)\sigma_n^2}{\operatorname{Tr}[P_\perp(\Theta_m^0) \hat R]}    (13)

where

P_\perp(\Theta_m) = I_M - A(\Theta_m)[A^H(\Theta_m) A(\Theta_m)]^{-1} A^H(\Theta_m).    (14)

Since the projector $P_\perp(\Theta_m)$ may be presented as

P_\perp(\Theta_m) = U_{M-m}(\Theta_m) U_{M-m}^H(\Theta_m),    (15)
U_{M-m}^H(\Theta_m) U_{M-m}(\Theta_m) = I_{M-m},    (16)

where the columns of $U_{M-m}(\Theta_m)$ are eigenvectors spanning the subspace complementary to the signal subspace spanned by $A(\Theta_m)$, we get

\operatorname{Tr}[P_\perp(\Theta_m) \hat R] = \operatorname{Tr}[U_{M-m}^H(\Theta_m) \hat R U_{M-m}(\Theta_m)]    (17)

and for $\Theta_m = \Theta_m^0$

\operatorname{Tr}[U_{M-m}^H(\Theta_m^0) \hat R U_{M-m}(\Theta_m^0)] = \operatorname{Tr}\left[\frac{1}{N} \hat C_{M-m}\right]    (18)

where

\hat C_{M-m} \sim \mathcal{CW}(N, M-m, I_{M-m}).    (19)

While an accurate p.d.f. for $\operatorname{Tr}[\hat W_{M-m}(\hat\Theta_m^{ex})]$, where

\hat W_{M-m}(\Theta_m) = U_{M-m}^H(\Theta_m) \hat R U_{M-m}(\Theta_m),    (20)

may be found in some cases, let us compare the "single cluster" conditions for the UML matrix $\hat W_M(\hat\Omega_m^{ex})$ in (6) and the CML matrix $\hat W_{M-m}(\hat\Theta_m^{ex})$. Suppose that for UML DOA estimation, an alternative scenario $\Omega_m^{ex}$ that turns (10) into equality is found. The covariance matrix for this scenario, $R(\Omega_m^{ex})$, may be presented as

R(\Omega_m^{ex}) = \sigma_n^2 [I_M + A(\Theta_m^{ex}) \Gamma_m^{ex} A^H(\Theta_m^{ex})]; \qquad \Gamma_m^{ex} = \sigma_n^{-2} P_m^{ex}    (21)

where $P_m^{ex}$ is the inter-source (spatial) covariance matrix as per (1). Since $\Gamma_m^{ex} > 0$, from

\Gamma_m^{ex} > [A^H(\Theta_m^{ex}) A(\Theta_m^{ex})]^{-1},    (22)

we get the well-known first-order expansion for $R^{-1}(\Omega_m^{ex})$:
R^{-1}(\Omega_m^{ex}) \approx \sigma_n^{-2} \big[ P_\perp(\Theta_m^{ex}) + A(\Theta_m^{ex}) [A^H(\Theta_m^{ex}) A(\Theta_m^{ex})]^{-1} (\Gamma_m^{ex})^{-1} [A^H(\Theta_m^{ex}) A(\Theta_m^{ex})]^{-1} A^H(\Theta_m^{ex}) \big].    (23)

It is important to note that the projector $P_\perp(\Theta_m^{ex})$ and the second matrix in (23) are spanned by orthogonal subspaces. Due to this orthogonality, we get the following inequalities:

\min_j \operatorname{eig}_j [W_M(\Theta_m^{ex})] \le 1
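The splitting condition (10)-(12) can be checked numerically for any postulated deterministic matrix $W_M$. The sketch below is our own illustration (the eigenvalue profile in the usage comment is assumed, not taken from the paper); it exploits the fact that the left-hand side of (12) is strictly increasing between consecutive distinct eigenvalues, so each of the $\bar M - 1$ roots $\alpha_j$ can be bracketed in one gap and found by bisection.

```python
import numpy as np

def single_cluster(gamma, K, c, iters=200):
    """Test the "single cluster" condition (10) for a deterministic matrix W_M
    with distinct eigenvalues gamma_1 < ... < gamma_Mbar (multiplicities K_j),
    at the Kolmogorov ratio c = M/N."""
    gamma = np.asarray(gamma, dtype=float)
    K = np.asarray(K, dtype=float)
    M = K.sum()
    phi = lambda f: np.sum(K * gamma**2 / (gamma - f) ** 3) / M   # LHS of eqn (12)

    betas = []
    for lo, hi in zip(gamma[:-1], gamma[1:]):
        a, b = lo + 1e-9 * (hi - lo), hi - 1e-9 * (hi - lo)
        for _ in range(iters):            # bisection: phi runs from -inf to +inf on (lo, hi)
            mid = 0.5 * (a + b)
            if phi(mid) < 0.0:
                a = mid
            else:
                b = mid
        alpha_j = 0.5 * (a + b)           # one root of (12) per gap
        betas.append(np.sum(K * (gamma / (gamma - alpha_j)) ** 2) / M)  # eqn (11)

    return 1.0 / c <= min(betas)          # eqn (10): True -> clusters have merged

# Example (assumed eigenvalue profile): W_M with eigenvalue 1 (K=6) and 4 (K=2).
# single_cluster([1.0, 4.0], [6, 2], c=0.6)  -> True  (single cluster)
# single_cluster([1.0, 4.0], [6, 2], c=0.25) -> False (clusters split)
```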