IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 40, NO. 1, JANUARY 1992

Detection and Localization of Multiple Sources in Noise with Unknown Covariance

Mati Wax

Abstract—We present a novel technique for the detection and localization of multiple sources in the presence of noise with unknown and arbitrary covariance. The technique is applicable to coherent and noncoherent signals and to arbitrary array geometry, and is based on Rissanen's MDL principle for model selection. Its computational load is comparable to that of analogous techniques for white noise. Simulation results demonstrating the performance of this technique are included.

I. INTRODUCTION

Most high resolution techniques for the detection and localization of multiple sources by passive sensor arrays are based on modeling the noise covariance matrix as spherical or, equivalently, as having a known structure. Not surprisingly, applying detection and localization techniques "tailored" to this model in situations for which it is not adequate results in poor performance. To improve the performance in such scenarios it is necessary to apply techniques suited to arbitrary and unknown noise covariance matrices.

Several detection and localization techniques for unknown and arbitrary noise covariance have been proposed in recent years. Paulraj and Kailath [3] described a technique which exploits the rotational invariance of the noise field. Unfortunately, this rotational invariance may not always exist, and even when it does, its exploitation requires a special array geometry. Moreover, the technique is not applicable to coherent sources. Le Cadre [2] described a different technique which is applicable to arbitrary array geometry and to coherent sources, but is based on parametric ARMA modeling of the noise; hence, the technique is limited to cases where this model is valid. Another technique based on parametric ARMA modeling of the noise was presented by Tewfik [7]. His technique, however, is applicable only to the case of a uniform linear array and noncoherent sources. Finally, Reilly et al. [4] presented a different technique, which is applicable to arbitrary array geometry and to coherent and noncoherent sources and does not involve any parametric modeling of the noise. This technique, however, addresses only the localization problem.

In this correspondence, we present a novel technique for both detection and localization, based on Rissanen's minimum description length (MDL) principle, which is applicable to a general array geometry and to coherent and noncoherent sources. Its computational load is relatively modest and is comparable to that of analogous techniques for spherical noise.

The correspondence is organized as follows. In Section II we formulate the problem. In Section III we derive the MDL criterion for detection and localization. In Section IV we present simulation results verifying the performance of the proposed detection algorithm and demonstrating the superiority of the localization algorithm over that proposed by Reilly et al. Finally, in Section V, we present some concluding remarks.

Manuscript received November 1, 1989; revised November 16, 1990. The author is with RAFAEL, Haifa 31021, Israel. IEEE Log Number 9104020.

II. PROBLEM FORMULATION

Consider an array composed of p sensors with arbitrary locations and arbitrary directional characteristics. Assume that q narrow-band sources having the same center frequency impinge on the array from distinct locations θ₁, …, θ_q. For simplicity, assume that the sources and sensors are in the same plane and that the sources are located in the far field of the array. In this case, the only parameter that characterizes the location of a source is its direction-of-arrival θ. Using the complex envelope representation, the p × 1 vector received by the array can be expressed as

x(t) = A(Θ)s(t) + n(t)   (1a)

where A(Θ) is the p × q matrix of the steering vectors of the sources

A(Θ) = [a(θ₁), …, a(θ_q)].   (1b)

Suppose that the received vector x(t) is sampled at M time instants t₁, …, t_M. Our problem can be stated as follows: given the sampled data, determine the number of sources q and the directions-of-arrival θ₁, …, θ_q. The problem of determining the number of signals is referred to as the detection problem, while that of determining the directions-of-arrival of the signals is referred to as the localization problem. To guarantee the uniqueness of the solution it is required that the conditions derived in [8] hold.

As for the statistical models of the signals and noise, the following models are used in this correspondence:

M1: The noise samples {n(t_i)} are identically distributed and statistically independent complex Gaussian random vectors with zero mean and covariance matrix Q.

M2: The signal samples {s(t_i)} are identically distributed and statistically independent complex Gaussian random vectors, independent of the noise samples, with zero mean and covariance matrix C.

Notice that M1 implies that the noise covariance matrix can be arbitrary, i.e., no parametric model for the noise covariance matrix is assumed. Similarly, M2 implies that the shape of the signals and the correlation among them can be arbitrary. Specifically, the signals may even be coherent (fully correlated), thus making the model applicable also in specular multipath scenarios.
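For illustration, the data model (1) under M1 and M2 can be simulated in a few lines. The sketch below is a minimal example, not part of the paper: the half-wavelength uniform linear array and the white noise are simplifying assumptions (the technique itself allows arbitrary geometry and arbitrary noise covariance).

```python
import numpy as np

def steering_vector(theta, p, spacing=0.5):
    # Response of a uniform linear array of p sensors spaced `spacing`
    # wavelengths apart to a plane wave arriving from angle theta
    # (radians, measured from broadside).
    return np.exp(2j * np.pi * spacing * np.arange(p) * np.sin(theta))

def snapshots(thetas, p, M, rng):
    # M samples of x(t) = A(Theta) s(t) + n(t), model (1).
    # Signals and noise are zero-mean complex Gaussian (M1, M2);
    # white noise is used here only to keep the sketch short.
    q = len(thetas)
    A = np.column_stack([steering_vector(th, p) for th in thetas])
    S = (rng.standard_normal((q, M)) + 1j * rng.standard_normal((q, M))) / np.sqrt(2)
    N = (rng.standard_normal((p, M)) + 1j * rng.standard_normal((p, M))) / np.sqrt(2)
    return A @ S + N

# Two sources at 0 and 6 degrees, six sensors, fifty snapshots.
X = snapshots(np.deg2rad([0.0, 6.0]), p=6, M=50, rng=np.random.default_rng(0))
```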
III. SIMULTANEOUS DETECTION AND LOCALIZATION

Our approach is based on a simultaneous solution of the detection and localization problems via the minimum description length (MDL) principle for model selection; see Rissanen [5], [6]. A straightforward application of the MDL principle to our problem is computationally very unattractive because of the large number of unknown parameters in the model. Indeed, from (1) and M1 and M2 it follows that x(t) is a complex Gaussian random vector with zero mean and covariance matrix given by

R = A(Θ)CA^H(Θ) + Q.   (2)

Thus, since the vector Θ is real and the matrices C and Q are Hermitian, the number of unknown real parameters in Θ, C, and Q is q, q², and p², respectively, which amounts to a total of q + q² + p² parameters. The computation of the maximum likelihood estimator therefore calls for the solution of a (q + q² + p²)-dimensional nonlinear maximization problem, thus rendering a straightforward application of the MDL model selection criterion very unattractive.

1053-587X/92$03.00 © 1992 IEEE
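The parameter count for a direct parameterization of (2) can be made concrete. In the sketch below, the steering matrix and the Hermitian matrices C and Q are arbitrary placeholders (not the array of Section IV); the point is only the count q + q² + p² and the Hermitian structure of R.

```python
import numpy as np

def n_free_parameters(p, q):
    # q real angles, a Hermitian q x q matrix C (q^2 real numbers),
    # and a Hermitian p x p matrix Q (p^2 real numbers).
    return q + q**2 + p**2

p, q = 6, 2
rng = np.random.default_rng(1)

# Build a covariance of the form (2): R = A C A^H + Q, with C and Q
# Hermitian positive definite placeholders.
A = rng.standard_normal((p, q)) + 1j * rng.standard_normal((p, q))
Bc = rng.standard_normal((q, q)) + 1j * rng.standard_normal((q, q))
Bq = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
C = Bc @ Bc.conj().T
Q = Bq @ Bq.conj().T + np.eye(p)
R = A @ C @ A.conj().T + Q

assert np.allclose(R, R.conj().T)  # R is Hermitian
print(n_free_parameters(p, q))     # 2 + 4 + 36 = 42
```

Even for this small scenario the direct maximization would run over 42 real unknowns, which motivates the subspace decomposition that follows.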
Following Wax and Ziskind [9], to circumvent this difficulty we compute the description length by models based on the decomposition of x(t) into its components in the signal and noise subspaces, denoted by x_S(t) and x_N(t), respectively. These models, though suboptimal, considerably simplify the computational load. Specifically, we compute the description length by summing three terms: (1) the description length of the noise subspace components {x_N(t_i)}, assuming Θ^(k) is given; (2) the description length of the signal subspace components {x_S(t_i)}, assuming the noise subspace components {x_N(t_i)} and Θ^(k) are given; and (3) the description length of Θ̂^(k).

To compute the description length of the noise subspace components we need a probabilistic model for x_N(t) conditioned on Θ^(k). Note that from M1 it follows that, conditioned on Θ^(k), x_N(t) is a (p − k) × 1 complex Gaussian random vector with zero mean. Thus, denoting its covariance matrix by R_NN(Θ^(k)), we have

x_N(t) ~ N_{p−k}(0, R_NN(Θ^(k))).   (3)

Hence, the probability density of the noise subspace components is given by

f({x_N(t_i)} | Θ^(k)) = |π R_NN(Θ^(k))|^(−M) exp(−M tr[R_NN^(−1)(Θ^(k)) R̂_NN(Θ^(k))])   (4a)

where R̂_NN(Θ^(k)) denotes the (p − k) × (p − k) sample covariance matrix of x_N(t):

R̂_NN(Θ^(k)) = (1/M) Σ_{i=1}^{M} x_N(t_i) x_N^H(t_i).   (4b)

As can be readily verified, the maximum likelihood estimator of R_NN(Θ^(k)) is given by R̂_NN(Θ^(k)). Hence, since R_NN(Θ^(k)) contains (p − k)² real parameters, it then follows from the expression for the MDL that the code length required to encode the noise subspace components, ignoring constant terms which are independent of k, is given by

L({x_N(t_i)} | Θ^(k)) = M log |R̂_NN(Θ^(k))| + M(p − k) + ½(p − k)² log M.   (5)

Note that the second term results from the expression M tr[R̂_NN^(−1)(Θ^(k)) R̂_NN(Θ^(k))] = M(p − k) appearing in the log-likelihood after the substitution of the maximum likelihood estimator.

To compute the description length of the signal subspace components, we need a probabilistic model for x_S(t) conditioned on Θ^(k). Ignoring the dependence of the signal subspace components on the noise subspace components when conditioned on Θ^(k), so as to simplify the derivation, we model x_S(t) as a k × 1 complex Gaussian random vector with zero mean and covariance matrix R_SS(Θ^(k)):

x_S(t) ~ N_k(0, R_SS(Θ^(k))).   (6)

Hence, the probability density of the signal subspace components is given by

f({x_S(t_i)} | Θ^(k)) = |π R_SS(Θ^(k))|^(−M) exp(−M tr[R_SS^(−1)(Θ^(k)) R̂_SS(Θ^(k))])   (7a)

where

R̂_SS(Θ^(k)) = (1/M) Σ_{i=1}^{M} x_S(t_i) x_S^H(t_i).   (7b)

Analogously to (5), noting that the number of real parameters in R_SS(Θ^(k)) is k², it then follows from the expression for the MDL that the description length of the signal subspace components, ignoring constant terms which are independent of k, is given by

L({x_S(t_i)} | Θ^(k)) = M log |R̂_SS(Θ^(k))| + Mk + ½k² log M.   (8)

Notice that the second term in (8) is different from that in (5), since the dimension of the matrices involved here is k and not p − k. Combining (5) and (8) and ignoring constant terms which are independent of k, the total description length of the signal and noise subspace components is given by

M log |R̂_SS(Θ^(k))| + M log |R̂_NN(Θ^(k))| + ½((p − k)² + k²) log M.   (9)

To compute the description length of Θ̂^(k) we have first to decide on the estimator to be used. To derive this estimator, recall that our goal is to obtain the shortest code length, which we claim assures the best detection performance. Hence the optimal estimate of Θ^(k) is obtained by minimizing the right-hand side of (9), which yields

Θ̂^(k) = arg min_{Θ^(k)} |R̂_SS(Θ^(k))| |R̂_NN(Θ^(k))|.   (10)

The estimator has an interesting and intuitively appealing geometric interpretation. To present this interpretation, let Z denote an r × M matrix with columns {z_i}, Z = [z₁ ⋯ z_M]. Now, by applying the Binet–Cauchy theorem (see Gantmacher [1]) to |ZZ^H|, we have

|ZZ^H| = Σ_{1 ≤ i₁ < ⋯ < i_r ≤ M} mod|[z_{i₁} ⋯ z_{i_r}]|²   (11)

where mod| | represents the modulus of the complex-valued determinant of the bracketed matrix. Hence, by the well-known interpretation of the determinant as the volume spanned by its columns, we get

|ZZ^H| = sum of the squared volumes of all the different parallelepipeds spanned by sets of r distinct vectors from z₁, …, z_M.   (12)

This establishes the interpretation of |ZZ^H| as a measure of the volume occupied by the columns of Z. Now, since |R̂_SS(Θ^(k))| = |(1/M) X_S X_S^H|, where X_S = [x_S(t₁), …, x_S(t_M)], it then follows that |R̂_SS(Θ^(k))| represents the volume occupied by the data in the signal subspace. Similarly, |R̂_NN(Θ^(k))| represents the volume occupied by the data in the noise subspace. We can therefore interpret the estimator (10) as the value of Θ^(k) that minimizes the volume occupied by the projections of the data onto both the signal and the noise subspaces. This interpretation is intuitively pleasing, since the minimization of the volume of the projections onto the noise subspace guarantees a good fit of the signal subspace, while the minimization of the volume of the projections onto the signal subspace guarantees the exploitation of the stochastic model of the signals, or more specifically, their being Gaussian zero mean vectors. For comparison, the estimator proposed by Reilly et al. [4] can be expressed as

Θ̂^(k) = arg min_{Θ^(k)} |R̂_NN(Θ^(k))|   (13)

implying that it minimizes only the volume of the projections onto the noise subspace.
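The Binet–Cauchy identity underlying this volume interpretation is easy to verify numerically. The sketch below (an illustration, not part of the paper) checks, for a random complex 2 × 5 matrix, that the Gram determinant |ZZ^H| equals the sum of the squared moduli of the determinants over all pairs of distinct columns:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
r, M = 2, 5
Z = rng.standard_normal((r, M)) + 1j * rng.standard_normal((r, M))

# Left-hand side: Gram determinant |Z Z^H| (real and nonnegative).
gram = np.linalg.det(Z @ Z.conj().T).real

# Right-hand side: sum of squared moduli of all r x r minors of Z,
# i.e., the squared volumes of the parallelepipeds spanned by every
# set of r distinct columns.
vol2 = sum(abs(np.linalg.det(Z[:, list(c)])) ** 2
           for c in combinations(range(M), r))

assert np.isclose(gram, vol2)
```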
A more explicit form of the estimator can be obtained by casting it in terms of the eigenvalues of the matrices involved. Indeed, by the well-known invariance of the determinant under the transformation F → GFG^H when G is unitary, it follows from the analysis in Wax and Ziskind [9] that

|R̂_SS(Θ^(k))| = ∏_{i=1}^{k} l_i^S(Θ^(k))   (14a)

and

|R̂_NN(Θ^(k))| = ∏_{i=1}^{p−k} l_i^N(Θ^(k))   (14b)

where l_1^S(Θ^(k)) ≥ ⋯ ≥ l_k^S(Θ^(k)) denote the nonzero eigenvalues of the rank-k matrix P_A(Θ^(k)) R̂ P_A(Θ^(k)), and l_1^N(Θ^(k)) ≥ ⋯ ≥ l_{p−k}^N(Θ^(k)) denote the nonzero eigenvalues of the rank-(p − k) matrix P⊥_A(Θ^(k)) R̂ P⊥_A(Θ^(k)), with R̂ denoting the sample covariance matrix

R̂ = (1/M) Σ_{i=1}^{M} x(t_i) x^H(t_i)   (14c)

and P_A(Θ^(k)) denoting the orthogonal projection matrix onto the columns of the matrix A(Θ^(k)):

P_A(Θ^(k)) = A(Θ^(k)) (A^H(Θ^(k)) A(Θ^(k)))^(−1) A^H(Θ^(k))   (14d)

and

P⊥_A(Θ^(k)) = I − P_A(Θ^(k)).   (14e)

Substituting (14) into (10), we get

Θ̂^(k) = arg min_{Θ^(k)} ∏_{i=1}^{k} l_i^S(Θ^(k)) ∏_{i=1}^{p−k} l_i^N(Θ^(k)).   (15)

Using this estimator in (9) and adding ½k log M for the description length of its k real parameters, the total number of free parameters becomes k² + (p − k)² + k = p² − k(2p − 2k − 1). Hence, dropping terms which do not depend on k, the MDL criterion is given by

q̂ = arg min_k MDL(k)   (16a)

where

MDL(k) = M log [ ∏_{i=1}^{k} l_i^S(Θ̂^(k)) ∏_{i=1}^{p−k} l_i^N(Θ̂^(k)) ] − ½k(2p − 2k − 1) log M   (16b)

with Θ̂^(k) denoting the MDL estimator given by (15). Observe that the resulting form of the MDL criterion is rather unconventional. First, the second term, referred to as the complexity term, is negative and is not a monotonic function of k. Rather, it has a parabolic behavior, with its peak near k = p/2. Second, which is to be expected in light of the behavior of the second term, the first term is also not a monotonic function of k. The exact behavior of this term, however, is very difficult to analyze, since it depends also on the data. Notice that the computational load of the estimator is relatively modest, since only a k-dimensional minimization problem is involved. An efficient solution to the k-dimensional minimization is offered by the alternating maximization (AM) technique; see Ziskind and Wax [10]. This technique transforms the k-dimensional minimization into a sequence of much simpler one-dimensional minimizations that are iterated until convergence, with the iterations initialized by a simple and efficient scheme.
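The evaluation of (16b) at a fixed candidate Θ^(k) can be sketched as follows. This is an illustration only: the k-dimensional minimization over Θ^(k) (e.g., by the AM technique) is omitted, and the half-wavelength uniform linear array is an assumption used solely to produce test data.

```python
import numpy as np

def mdl_cost(R_hat, A_k, M):
    # Bracketed description length of (16b) for a candidate steering
    # matrix A_k = A(Theta^(k)) of size p x k.
    p, k = A_k.shape
    P = A_k @ np.linalg.solve(A_k.conj().T @ A_k, A_k.conj().T)  # (14d)
    Pp = np.eye(p) - P                                           # (14e)
    # Nonzero eigenvalues of the projected sample covariances, (14a)-(14b).
    ls = np.sort(np.linalg.eigvalsh(P @ R_hat @ P))[::-1][:k]
    ln = np.sort(np.linalg.eigvalsh(Pp @ R_hat @ Pp))[::-1][:p - k]
    return (M * np.sum(np.log(ls)) + M * np.sum(np.log(ln))
            - 0.5 * k * (2 * p - 2 * k - 1) * np.log(M))

# Test data: p = 6 half-wavelength ULA, two sources, M = 200 snapshots.
p, M = 6, 200
rng = np.random.default_rng(0)
a = lambda th: np.exp(1j * np.pi * np.arange(p) * np.sin(th))
A_true = np.column_stack([a(np.deg2rad(0.0)), a(np.deg2rad(20.0))])
S = rng.standard_normal((2, M)) + 1j * rng.standard_normal((2, M))
N = 0.1 * (rng.standard_normal((p, M)) + 1j * rng.standard_normal((p, M)))
X = A_true @ S + N
R_hat = X @ X.conj().T / M                                       # (14c)
cost = mdl_cost(R_hat, A_true, M)
```

In a full implementation, mdl_cost would be minimized over candidate angle sets for each k, and the k attaining the smallest value would be the detected number of sources.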
IV. SIMULATION RESULTS

To demonstrate the performance of the proposed method, we present the results of two simulated experiments. In these experiments we also compare the proposed localization algorithm with the one proposed by Reilly et al. [4]. In the experiments the array was uniform and linear with 6 sensors spaced half a wavelength apart, and the sources had equal power and impinged from 0° and 6°. The signals and noise were complex zero mean Gaussian variables, with the signals independent of the noise and the covariance matrix of the noise given by Q = {q_ik}, where q_ik = 0.9^|i−k| exp(jπ(i − k)/2). This is the same covariance matrix used by Reilly et al. in their simulations.

We carried out 100 Monte Carlo runs and computed the root-mean-square error (RMSE) of the two directions obtained by the proposed estimator and the relative frequency with which the proposed detection criterion correctly detected two sources. We also evaluated the RMSE of the estimator proposed by Reilly et al. The estimates were computed by the alternating maximization (AM) technique (Ziskind and Wax [10]). In all the experiments, the number of iterations till convergence almost never exceeded 7, with the average being between 5 and 6.

In the first experiment the sources were uncorrelated. Fig. 1 presents the results as a function of the signal-to-noise ratio (SNR), with the number of samples fixed at 50. Fig. 2 presents the results as a function of the number of samples, with the SNR fixed at 1 dB. Observe that increasing the SNR and increasing the number of samples do not have the same effect. While an increase in the SNR substantially improves the detection and localization performance, an increase in the number of samples beyond 50 hardly improves the localization performance, and the detection performance even degrades slightly. The reason for these phenomena is the asymptotic bias of the localization algorithm. Indeed, since the mean-square error is the sum of the squared bias and the variance, as soon as the bias is much larger than the variance, which in this example occurs at 50 samples, a further increase in the number of samples, though decreasing the variance, does not essentially decrease the RMSE. To understand the role of the biases in the detection performance, recall that Rissanen's derivation of the MDL detection criterion implicitly assumes that the estimation errors are inversely proportional to the square root of the number of samples. When the biases are dominant this is no longer the case, so beyond a certain number of samples the detection performance does not improve and even slightly degrades.

Comparing the proposed localization algorithm to that proposed by Reilly et al., the improved performance is evident, especially when the SNR is low or when the number of samples is small. This improvement is due to the smaller biases of the proposed algorithm.

To demonstrate the performance for coherent sources, the scenario of the second experiment was as in the first experiment, the only difference being that the sources were coherent. Fig. 3 presents the results as a function of the SNR, with the number of samples fixed at 50 and the phase difference at the array center fixed at 180°. Notice that the improved performance of the localization algorithm over that proposed by Reilly et al. is even more evident in this case.
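The noise covariance used in these experiments can be generated and sanity-checked as below. Note that the expression for q_ik is our reading of OCR-damaged text; in particular, the π/2 phase slope should be treated as an assumption.

```python
import numpy as np

p = 6
i, k = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
# q_ik = 0.9^|i-k| * exp(j*pi*(i-k)/2); the pi/2 phase slope is an
# assumption reconstructed from the damaged source text.
Q = 0.9 ** np.abs(i - k) * np.exp(1j * np.pi * (i - k) / 2)

# A valid noise covariance must be Hermitian and positive definite.
assert np.allclose(Q, Q.conj().T)
assert np.all(np.linalg.eigvalsh(Q) > 0)
```

This Q is the autocorrelation matrix of a complex first-order autoregressive process with pole 0.9 exp(jπ/2), which is why the positive-definiteness check passes.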
Fig. 1. Two uncorrelated sources with equal power, #1 located at 0° (boresight) and #2 located at 6°, impinging on a linear uniform array with 6 sensors spaced half a wavelength apart. The number of samples is 50. (a) The root-mean-square error (RMSE) of the estimated locations of the two sources obtained by the estimator of the proposed method (solid curves) and the estimator of Reilly et al. (dashed curves) as a function of the signal-to-noise ratio. (b) Probability of correct detection of the proposed method as a function of the signal-to-noise ratio.

Fig. 2. The same as in Fig. 1, only the signal-to-noise ratio is fixed at 1 dB. (a) The root-mean-square error (RMSE) of the estimated locations of the two sources obtained by the estimator of the proposed method (solid curves) and the estimator of Reilly et al. (dashed curves) as a function of the number of samples. (b) Probability of correct detection of the proposed method as a function of the number of samples.

V. CONCLUDING REMARKS

We have presented a novel technique for the simultaneous detection and localization of multiple sources in the presence of an unknown and arbitrary noise covariance matrix, which is applicable to coherent and noncoherent signals and to arbitrary array geometry. The technique was derived by applying the MDL principle for model selection to a family of competing signal subspace models defined in terms of the number of signals and their locations. The unknown elements of the noise covariance matrix and the signal covariance matrix, which are nuisance parameters in this problem, have been eliminated from the model, thus considerably reducing the number of unknown parameters and thereby the computational complexity of the estimator. The estimator was shown to have the very intuitively appealing interpretation of minimizing the volume of the projections of the sampled data onto both the signal and noise subspaces.

The results of the simulations verify the performance of the proposed detection and localization algorithms. Moreover, they also show that in the threshold region, i.e., when the signal-to-noise ratio is low or when the number of samples is small, the proposed localization algorithm significantly outperforms the localization algorithm proposed by Reilly et al. [4], especially in the case of coherent signals.
The technique presented is by no means optimal. Thus, unlike the optimal techniques for spherical noise, the proposed estimator is asymptotically biased. That is, as the number of samples grows to infinity the biases approach a nonzero limit which depends on the signal-to-noise ratio (SNR), the directions of the sources, and the structure of the noise covariance matrix Q. As a result of these biases, the detection criterion is not consistent, i.e., as the number of samples grows, the probability of correct detection does not necessarily approach one but rather stays fixed, and even degrades slightly beyond a certain number of samples.

Another peculiarity of the proposed technique is its behavior in the case that some set of k steering vectors {a(θ_i)} spans the orthogonal complement of the span of another set of p − k steering vectors {a(θ_j)}. In this case |R̂_SS(Θ^(k))| |R̂_NN(Θ^(k))| is the same for both sets, and the MDL criterion will select the set corresponding to min{k, p − k}, which may not be the right choice. Nonetheless, when considering all possible scenarios, the probability of occurrence of these pathological cases is zero.

Finally, it should be pointed out that there is an alternative to the technique we have presented for eliminating the unknown elements of the signal and noise covariance matrices. In the alternative approach, referred to as stochastic complexity (SC) (Rissanen [6]), the nuisance parameters are integrated out by putting a prior distribution on them. The question still remains as to what prior distribution to use and how to carry out the integration analytically.

Fig. 3. As in Fig. 1, only the sources are coherent with a phase difference of 180° at the array center.

ACKNOWLEDGMENT

The author would like to thank the reviewers for their helpful comments.

REFERENCES

[1] F. R. Gantmacher, The Theory of Matrices. New York: Chelsea, 1977.
[2] J. P. Le Cadre, "Parametric methods for spatial signal processing in the presence of unknown colored noise fields," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 965-983, 1989.
[3] A. Paulraj and T. Kailath, "Eigenstructure methods for direction-of-arrival estimation in the presence of unknown noise fields," IEEE Trans. Acoust., Speech, Signal Processing, vol. 34, pp. 13-20, 1986.
[4] J. P. Reilly, K. M. Wong, and P. M. Reilly, "Direction-of-arrival estimation in the presence of noise with unknown, arbitrary covariance matrices," in Proc. ICASSP 89 (Glasgow, Scotland), pp. 2609-2612.
[5] J. Rissanen, "Modeling by the shortest data description," Automatica, vol. 14, pp. 465-471, 1978.
[6] J. Rissanen, Stochastic Complexity in Statistical Inquiry (Series in Computer Science, vol. 15). World Scientific, 1989.
[7] A. Tewfik, "Harmonic retrieval in the presence of colored noise," in Proc. ICASSP 89, pp. 2069-2072.
[8] M. Wax and I. Ziskind, "On unique localization of multiple sources in passive sensor arrays," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, no. 7, pp. 996-1000, 1989.
[9] M. Wax and I. Ziskind, "Detection of the number of coherent signals by the MDL principle," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, no. 8, pp. 1190-1196, 1989.
[10] I. Ziskind and M. Wax, "Maximum likelihood localization of multiple sources by alternating projection," IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 1553-1560, 1988.

Numerical Performances of Autoregressive Spectrum Estimators Based on Three-Term Recurrences

Chi-Hsin Wu and Andrew E. Yagle

Abstract—The numerical performances of two fast algorithms for estimating the power spectral density of an autoregressive process are studied. The algorithms perform similarly to the Burg algorithm, but require only two-thirds as many multiplications as the most efficient implementation of the Burg algorithm. This allows the high resolution associated with the Burg algorithm to be obtained using many fewer computations. One algorithm is the deterministic form of the split lattice algorithm adjoined to the split Levinson recursions; however, its resolution is relatively poor. The other algorithm corrects a bias in the first algorithm, and has resolution similar to the Burg algorithm.

I. INTRODUCTION

The Burg algorithm for autoregressive (AR) process spectral estimation [1] is known to give accurate estimates of the power spectral density for truly AR processes [2]. One of the major problems of this popular algorithm is the large amount of computation it requires, as compared to nonparametric methods such as the periodogram. This correspondence studies the numerical performances of two spectral estimators that have properties very similar to the Burg algorithm, but require only two-thirds as many multiplications to implement. This allows the high resolution characteristic of the Burg algorithm to be attained using less computation.

The two algorithms are based on a three-term recurrence that replaces the lattice recursions of the Burg algorithm, as in the recently developed "split" Levinson, Schur, and lattice algorithms [3]. In these algorithms, the three-term recurrence requires only one multiplication per update at each time, a 50% reduction as compared to the two required by the lattice-based Levinson, etc., algorithms. The split algorithms remove a hidden redundancy in the lattice-based algorithms, and thus constitute a more efficient implementation of them. Instead of propagating the forward and delayed backward prediction errors separately, as does the Burg algorithm, the two algorithms propagate the sum of these errors. The reflection coefficient, which is the correlation coefficient between the forward and backward prediction errors, is replaced by a so-called potential, which is related to the correlation coefficient between the sums of the errors at successive times.

The computation of the potential from the time series data is an important issue that shows the new algorithms are nontrivial gen-

Manuscript received August 27, 1989; revised May 10, 1991. This work was supported by the National Science Foundation under Grant MIP-8858082. The authors are with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109-2122. IEEE Log Number 9104019.
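For reference, a minimal real-valued Burg recursion is sketched below. This is a textbook version, not the split three-term algorithms studied in this correspondence: each lattice stage picks the reflection coefficient minimizing the sum of the forward and backward prediction-error powers, using two multiplications per error update, which is the cost the split algorithms reduce to one.

```python
import numpy as np

def burg(x, order):
    # Burg's method: estimate the AR prediction-error filter
    # [1, a_1, ..., a_order] from the data alone.
    f = np.asarray(x, dtype=float)  # forward prediction errors
    b = f.copy()                    # backward prediction errors
    a = np.array([1.0])
    for _ in range(order):
        fp, bp = f[1:], b[:-1]      # align forward / delayed-backward errors
        k = -2.0 * (fp @ bp) / (fp @ fp + bp @ bp)  # reflection coefficient
        f, b = fp + k * bp, bp + k * fp             # lattice update
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
    return a

# Recover the pole of a synthetic AR(1) process x[n] = 0.9 x[n-1] + e[n].
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.empty_like(e)
x[0] = e[0]
for n in range(1, len(e)):
    x[n] = 0.9 * x[n - 1] + e[n]
print(burg(x, 1))  # approximately [1, -0.9]
```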