Parallel and Adaptive High Resolution Direction Finding

Marc Moonen, Filiep Vanpoucke
ESAT, Katholieke Universiteit Leuven
K. Mercierlaan 94, 3001 Heverlee, Belgium
[email protected], [email protected]

Ed Deprettere
Dept. of Electrical Engineering, Delft University of Technology
2628 CD Delft, The Netherlands
[email protected]

Abstract

Recently, the new class of so-called subspace methods for high resolution direction finding has received a great deal of attention in the literature. When a real-time implementation is aimed at, the computational complexity involved is known to represent a serious impediment. In this paper, an ESPRIT-type algorithm is developed which is fully adaptive and therefore particularly suited for real-time processing. Furthermore, a systolic array is described which allows the processing of incoming data at a rate that is independent of the problem size. The algorithm is based on orthogonal transformations only. Estimates are computed for the angles of arrival, as well as for the source signals. Our aim is not so much to develop yet another ESPRIT-type algorithm, but rather to show that it is indeed possible to develop an algorithm which is fully parallel and adaptive. This is something which has not been done before.

EDICS classification: 7.1.4 Systolic and Wavefront Architectures.

Mailing address:

Marc Moonen, ESAT-KUL, K. Mercierlaan 94, 3001 Heverlee, Belgium. Tel. 32/16/22 09 31, email: [email protected]


I. Preliminaries

The direction-of-arrival (DOA) estimation problem consists in determining the angles of arrival of a number of signals impinging on a sensor array. Recently, the ESPRIT method of Roy [11] and Paulraj et al. [10] has received a great deal of attention in the literature. It belongs to the class of so-called subspace methods. The generic measurement set-up consists of two identical sensor arrays, displaced over a distance $\Delta$, and with $m$ sensors each. The sensors are thus pairwise identical, but apart from that the sensor characteristics may be unknown. Incident on both arrays are $d$ narrow-band non-coherent signals (plane waves) $s_k(t) = \hat s_k(t)\, e^{j\omega_0 t}$, $(k = 1, \dots, d)$, each having an unknown, slowly time-varying complex amplitude $\hat s_k(t)$ and a known center frequency $\omega_0$, which is the same for all the signals¹. Noise is present at each sensor, and is assumed to be additive, stationary, and of zero mean. If $N \geq m$ time samples are available of the signals received by the sensors in the first array, then one can collect all these observations into one matrix $X$, with $N$ rows and $m$ columns. The $j$-th column contains a sampled version of the signal at the $j$-th sensor. The $i$-th row contains the samples at time $t_i$ for all the sensors. The following equation then holds:
\[
X = S \cdot A + N_x,
\]
where $S$ is the signal matrix, with $S(i,j) = s_j(t_i)$, $A$ is the array gain matrix, with $A(i,j)$ the gain of sensor $j$ in the direction of signal $s_i$, and $N_x$ is the noise matrix, with $N_x(i,j)$ the noise present at time $t_i$ for sensor $j$. All these matrices are unknown, except for $X$. Due to the displacement $\Delta$ between the two arrays, the signals measured by corresponding sensors in the two arrays are equal up to a phase shift, which for signal $s_k$ is given by $\phi_k = e^{-j \omega_0 \Delta \sin\theta_k / c}$, where $c$ is the signal propagation velocity and $\theta_k$ is the (unknown) angle of arrival². The observations at the sensors of the displaced array are similarly collected in an $N \times m$ data matrix $Y$, which then satisfies the equation
\[
Y = S \cdot \Phi \cdot A + N_y, \qquad \text{where } \Phi = \mathrm{diag}(\phi_1, \dots, \phi_d).
\]
Again, all these matrices are unknown, except for $Y$. The direction finding problem now reduces to estimating $\Phi$, from which the $\theta_k$'s can be computed directly. The signal copy problem consists in computing the source signals, i.e., the matrix $S$.

¹ The model can also be extended to include wide-band signals.
² $\theta_k = 0$ corresponds to the direction perpendicular to $\Delta$.
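As a concrete illustration, the following is a minimal simulation sketch of this data model in Python/NumPy. The array geometry, source parameters and all variable names below are illustrative assumptions of ours, not the paper's setup (its experiments are described in section V).

import numpy as np

rng = np.random.default_rng(0)
m, d, N = 4, 2, 200                      # sensors per subarray, sources, snapshots
theta = np.deg2rad([20.0, 40.0])         # angles of arrival theta_k (assumption)
wavelength = 1.0
spacing = wavelength / 5                 # inter-element spacing within a subarray
delta = wavelength / 5                   # displacement Delta between the subarrays
k0 = 2 * np.pi / wavelength              # = omega_0 / c

# Array gain matrix A (d x m): steering vectors of a uniform linear array.
A = np.exp(-1j * k0 * spacing * np.outer(np.sin(theta), np.arange(m)))
# Phase shifts Phi = diag(phi_1, ..., phi_d), phi_k = exp(-j k0 Delta sin(theta_k)).
Phi = np.diag(np.exp(-1j * k0 * delta * np.sin(theta)))
# Signal matrix S (N x d): slowly varying complex amplitudes, here simply random.
S = (rng.standard_normal((N, d)) + 1j * rng.standard_normal((N, d))) / np.sqrt(2)

sigma = 10 ** (-30 / 20)                 # 30 dB SNR (assumption)
noise = lambda: sigma * (rng.standard_normal((N, m))
                         + 1j * rng.standard_normal((N, m))) / np.sqrt(2)
X = S @ A + noise()                      # first subarray:      X = S A + N_x
Y = S @ Phi @ A + noise()                # displaced subarray:  Y = S Phi A + N_y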


In the special case when there is no noise present, i.e., $N_x = N_y = 0$, the DOAs are readily computed from the matrix pencil
\[
Y - \lambda X = S \cdot (\Phi - \lambda I) \cdot A.
\]
The $\lambda$'s for which the rank of $Y - \lambda X$ is one less than the rank of $X$ and $Y$, i.e., the non-trivial generalized eigenvalues, then correspond to the $\phi_k$'s. In other words, the $\phi_k$'s are computed as the rank reducing numbers for the given pencil. The noise, however, introduces new rank reducing numbers and also reduces the accuracy of the original ones. Therefore, it is necessary to compress the pencil $Y - \lambda X$ to a smaller pencil, where the effect of the noise is zeroed or, at least, reduced. In general, one first selects `compression matrices' $P_{row}$ and $P_{col}$, of size $d \times N$ and $m \times d$ respectively, and then computes the rank reducing numbers of the compressed $d \times d$ pencil
\[
\underbrace{P_{row}}_{d \times N} \underbrace{Y}_{N \times m} P_{col} \;-\; \lambda\, P_{row} X P_{col}
= \underbrace{P_{row}\, S \cdot (\Phi - \lambda I) \cdot A\, P_{col}}_{d \times d} \;+\; \text{noise terms}.
\]
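Note that the rank reducing numbers of the compressed $d \times d$ pencil are simply its generalized eigenvalues, so a direct (non-adaptive, non-parallel) reference computation is a single call to a generalized eigensolver. A hedged SciPy sketch, with Xc and Yc standing for the compressed matrices $P_{row} X P_{col}$ and $P_{row} Y P_{col}$ (the function name is ours):

import scipy.linalg

def pencil_rank_reducing_numbers(Xc, Yc):
    # scipy solves Yc v = lambda * Xc v, i.e. the pencil Yc - lambda * Xc.
    return scipy.linalg.eig(Yc, Xc, right=False)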

The compression matrices may be chosen in various ways. The approach in [11], e.g., employs the singular value decomposition (SVD) of the compound matrix $[X \; Y]$. The resulting algorithm, however, is difficult to turn into an adaptive algorithm and to implement on a systolic array. Hereafter an alternative choice is made for the compression matrices. We do not claim this choice is better (or worse) in terms of performance. But it turns out that one ends up with an algorithm which can indeed be made fully adaptive and parallel. This is the main contribution of the paper. Our adaptive algorithm is based on the SVD updating algorithm of [8], combined with a generalized Schur decomposition (GSD). The corresponding systolic array is based on the SVD updating array of [9]. By introducing adaptivity, the computational complexity is reduced from $O(m^3)$ to $O(m^2)$ per time update. On a parallel processor array with $O(m^2)$ processors, the throughput is then $O(m^0)$, which means that the number of measurements that are processed per time unit is independent of the problem size. The underlying computational steps are briefly outlined in section II, where it is also shown how so-called `Jacobi-type' algorithms may be employed. Such algorithms are particularly suited for adaptive and parallel implementation, which is shown in section III. In section IV, the signal copy problem is considered, where the aim is to estimate the source signals. Finally, in section V, a few simple computer experiments are shown.


II. Basic Computational Steps & Algorithms

In the algorithm below, we make use of a third `instrumental variable'-type data matrix $Z$. For the time being, we assume $Z$ is produced by a third array of $m$ sensors. Later on we will see that a time-shifted version of either $X$ or $Y$ may also be used as a $Z$ matrix. With a third array, the matrix $Z$ is given as
\[
Z = S \cdot B + N_z,
\]
where $B$ is the auxiliary array gain matrix, which need not be the same as the other gain matrix $A$³. Matrices $Z$ and $N_z$ are defined in the same way as $X, Y$ and $N_x, N_y$ respectively. The concepts of our algorithm are outlined below. The compression matrices are computed in the first step, from the SVD of $Z$. The rank reducing numbers of the compressed pencil are computed in the second step, from the generalized Schur decomposition of the compressed pencil.

³ The rank of $B$ should be equal to $d$ ($= \mathrm{rank}(A)$).

Step 1 : singular value decomposition

The first step in the procedure is a singular value decomposition of the matrix $Z$, which is written as
\[
\underbrace{Z}_{N \times m} = U_Z \cdot S_Z \cdot V_Z^H
= \underbrace{\begin{bmatrix} U_s & U_n \end{bmatrix}}_{N \times m}
\cdot \underbrace{\begin{bmatrix} S_s & 0 \\ 0 & S_n \end{bmatrix}}_{m \times m}
\cdot \underbrace{\begin{bmatrix} V_s^H \\ V_n^H \end{bmatrix}}_{m \times m},
\]
where $U_Z^H U_Z = V_Z^H V_Z = I_m$⁴ and $S_Z$ is a diagonal matrix containing the singular values of $Z$ in descending order. The superscript $H$ denotes complex conjugate transpose. Each of the matrices $U_Z$, $S_Z$ and $V_Z$ can be subdivided as shown. $S_s$ is a $d \times d$ matrix, with the `large' singular values, originating from the signals $s_k$, and $S_n$ contains the `small' singular values, corresponding to the noise, i.e., $S_n = 0$ if $N_z = 0$. The columns of $U_Z$ and $V_Z$ are divided accordingly. The subscript $s$ refers to the so-called `signal subspace', while $n$ refers to the so-called `noise subspace' [8].

⁴ $I_m$ is the $m \times m$ identity matrix, sometimes also denoted as $I_{m \times m}$.

Applying the transformation $U_Z^H$ to the matrices $X$ and $Y$ results in
\[
\begin{bmatrix} U_s^H \\ U_n^H \end{bmatrix} X \;\overset{\text{def}}{=}\; \begin{bmatrix} X_s \\ \cdot \end{bmatrix},
\qquad
\begin{bmatrix} U_s^H \\ U_n^H \end{bmatrix} Y \;\overset{\text{def}}{=}\; \begin{bmatrix} Y_s \\ \cdot \end{bmatrix},
\]


where the dots represent don't care entries. If we assume that the noise satisfies the following standard properties [2]
\[
\lim_{N \to \infty} \frac{1}{N} N_x^H N_y
= \lim_{N \to \infty} \frac{1}{N} N_y^H N_z
= \lim_{N \to \infty} \frac{1}{N} N_x^H N_z = O_{m \times m},
\]
\[
\lim_{N \to \infty} \frac{1}{N} N_x^H S
= \lim_{N \to \infty} \frac{1}{N} N_y^H S
= \lim_{N \to \infty} \frac{1}{N} N_z^H S = O_{m \times d},
\]
then it is readily verified that
\[
\lim_{N \to \infty} \frac{1}{\sqrt N} \underbrace{X_s}_{d \times m}
= \lim_{N \to \infty} \frac{1}{\sqrt N}\, U_s^H \cdot S \cdot A \;+\; \underbrace{\text{noise terms}}_{\to\, 0},
\]
\[
\lim_{N \to \infty} \frac{1}{\sqrt N} \underbrace{Y_s}_{d \times m}
= \lim_{N \to \infty} \frac{1}{\sqrt N}\, U_s^H \cdot S \cdot \Phi \cdot A \;+\; \underbrace{\text{noise terms}}_{\to\, 0}.
\]

In words, for $N \to \infty$, the (scaled) reduced matrices $X_s$ and $Y_s$ are not corrupted by noise. The matrix $Z$ is thus used as a sort of `instrumental variable'. Note that if the third sensor array is not available, but the additive noise is white, one can set $Z$ equal to a time-shifted version of either $X$ or $Y$. One can indeed show that also in this case, for $N \to \infty$, the (scaled) matrices $X_s$ and $Y_s$ are not corrupted by noise. In other words, there is no need for a third auxiliary array, at least when only the DOAs have to be estimated⁵. The row compression is thus given as
\[
P_{row} = U_s^H.
\]
As the noise is effectively canceled by this row compression, the choice for the column compression is not so important anymore. Here we make the simplest possible choice, namely
\[
P_{col} = \begin{bmatrix} I_d \\ 0 \end{bmatrix}.
\]
We only mention here that better choices for $P_{col}$ are conceivable, e.g., based on an orthogonal compression of $X_s$ and/or $Y_s$. In fact the above choice may even fail to give the right answer for contrived examples. We make a simple choice for the sake of an easy-to-follow exposition.

⁵ For the signal copy problem (see section IV), it is not possible to use a time-shifted version of $X$ or $Y$. Then $Z$ must indeed correspond to a third array, see below.


In conclusion, we have
\[
\begin{bmatrix} U_s^H \\ U_n^H \end{bmatrix} X = \begin{bmatrix} X_{ss} & \cdot \\ \cdot & \cdot \end{bmatrix},
\qquad
\begin{bmatrix} U_s^H \\ U_n^H \end{bmatrix} Y = \begin{bmatrix} Y_{ss} & \cdot \\ \cdot & \cdot \end{bmatrix},
\]
where then the $\phi_k$'s will be computed as the rank reducing numbers of the compressed pencil $Y_{ss} - \lambda X_{ss}$.
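For reference, a hedged batch (non-adaptive) sketch of exactly this compression, with $P_{row} = U_s^H$ and $P_{col} = [I_d;\, 0]$; the function name and the NumPy formulation are ours:

import numpy as np

def compress_pencil(X, Y, Z, d):
    Uz, _, _ = np.linalg.svd(Z, full_matrices=False)  # Z = U_Z S_Z V_Z^H
    Us = Uz[:, :d]                  # signal-subspace basis ('large' singular values)
    Xss = (Us.conj().T @ X)[:, :d]  # P_row X P_col
    Yss = (Us.conj().T @ Y)[:, :d]  # P_row Y P_col
    return Xss, Yss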

Step 2 : generalized Schur decomposition

In order to compute the rank reducing numbers, a generalized Schur decomposition (GSD) of the pencil is computed. This amounts to reducing the pencil to upper triangular form, by applying left and right unitary transformations:
\[
Q^H (Y_{ss} - \lambda X_{ss}) W = T_y - \lambda T_x.
\]
Here $Q$ and $W$ are unitary matrices, and $T_x$ and $T_y$ are upper triangular ($d \times d$) matrices. The rank reducing numbers result as the ratios of the corresponding diagonal entries of $T_y$ and $T_x$, i.e., $\phi_k = T_y(k,k) / T_x(k,k)$. We will not analyze the performance of the above procedure here (only a few simulation results are given in section V). Our aim is only to show that this procedure leads to an algorithm which is amenable to parallel and adaptive implementation. To this aim, we first elaborate the above procedure as a so-called Jacobi-type algorithm. Such algorithms are based on $2 \times 2$ transformations, and are particularly suited for parallel implementation. We will briefly review the Jacobi-type SVD and GSD computation. In the next section, all this is then turned into a fully adaptive and parallel algorithm.
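The same GSD can be computed directly with SciPy's QZ routine; this is a batch algorithm, whereas the paper builds the decomposition with Jacobi-type sweeps so that it can be updated adaptively. A hedged sketch (function name ours):

import numpy as np
import scipy.linalg

def gsd_rank_reducing_numbers(Xss, Yss):
    # qz returns Ty, Tx upper triangular with Yss = Q Ty W^H and Xss = Q Tx W^H,
    # so that Q^H (Yss - lambda Xss) W = Ty - lambda Tx.
    Ty, Tx, _, _ = scipy.linalg.qz(Yss, Xss, output='complex')
    return np.diag(Ty) / np.diag(Tx)   # phi_k = Ty(k,k) / Tx(k,k)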

Jacobi SVD :

The SVD of the matrix $Z$ may be computed in two steps. First, a QR decomposition is computed, given as
\[
\underbrace{Z}_{N \times m} = \underbrace{Q_Z}_{N \times m} \cdot \underbrace{R_Z}_{m \times m},
\]


where $Q_Z^H Q_Z = I_m$ and $R_Z$ is an upper triangular matrix. This is done in a constant number of steps, e.g., with a sequence of Givens transformations [3, 5]. Then an iterative procedure is applied to the triangular factor $R_Z$, transforming it into a diagonal matrix. This diagonalization procedure consists in applying a sequence of plane transformations as follows, see [6, 7] for details:

    R ← R_Z
    U ← I
    V ← I
    for j = 1, ..., ∞
        for i = 1, ..., m−1        [i is called the pivot index]
            R ← Π_[i,i+1] U_[i,j]^H · R · V_[i,j] Π_[i,i+1]
            U ← U · U_[i,j] Π_[i,i+1]
            V ← V · V_[i,j] Π_[i,i+1]
        end
    end

The matrices $U_{[i,j]}$ and $V_{[i,j]}$ are unitary transformations in the $(i,i+1)$-plane
\[
U_{[i,j]} = \begin{bmatrix} I_{i-1} & & \\ & \begin{smallmatrix} \cos\alpha & \bar r \sin\alpha \\ -r \sin\alpha & \cos\alpha \end{smallmatrix} & \\ & & I_{m-i-1} \end{bmatrix},
\qquad
V_{[i,j]} = \begin{bmatrix} I_{i-1} & & \\ & \begin{smallmatrix} \cos\gamma & \bar s \sin\gamma \\ -s \sin\gamma & \cos\gamma \end{smallmatrix} & \\ & & I_{m-i-1} \end{bmatrix}
\]
with $|r| = |s| = 1$. Here $\bar r$ and $\bar s$ denote the complex conjugates of $r$ and $s$ respectively. The matrix $\Pi_{[i,i+1]}$ is a permutation in the $(i,i+1)$-plane
\[
\Pi_{[i,i+1]} = \begin{bmatrix} I_{i-1} & & \\ & \begin{smallmatrix} 0 & 1 \\ 1 & 0 \end{smallmatrix} & \\ & & I_{m-i-1} \end{bmatrix}.
\]
The transformation parameters are chosen such that applying $U_{[i,j]}$ and $V_{[i,j]}$ to $R$ results in a zero $(i,i+1)$-entry in $R$, while $R$ still remains in upper triangular form. Each iteration thus essentially reduces to performing an SVD of a $2 \times 2$ block somewhere on the main diagonal⁶. The permutations are necessary to guarantee convergence to diagonal form. For more details, the reader is referred to [6, 7].

⁶ Here `inner rotations' are used, see [12, 7], with the rotation angles $\alpha$ and $\gamma$ always less than or equal to 45°. This also means that, e.g., $\alpha = \gamma = 0$ when the $2 \times 2$ block is already in diagonal form.

After each iteration we have

\[
R_Z = U \cdot R \cdot V^H.
\]
Eventually, $R$ converges to a diagonal matrix, equal to $S_Z$ up to a reordering on the diagonal, i.e., $R = \Pi^T S_Z \Pi$ for $j = \infty$, where $\Pi$ is a permutation matrix⁷. This results in the required SVD, up to the reordering:
\[
Z = Q_Z \cdot R_Z
= \underbrace{Q_Z \cdot U}_{U_Z \Pi} \cdot \underbrace{R}_{\Pi^T S_Z \Pi} \cdot \underbrace{V^H}_{\Pi^T V_Z^H}
\]
(for $j = \infty$). The above SVD algorithm is simple, amenable to parallel implementation [7], and may be turned into an adaptive algorithm, see section III.

⁷ The diagonal entries in $S_Z$ appear in descending order, by definition, while the diagonal entries in $R$ are, in general, unordered.
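A hedged NumPy sketch of one such sweep (pivot $i = 1, \dots, m-1$) on an upper-triangular factor R, accumulating the column transformations in V. For brevity the $2 \times 2$ subproblems are solved with a full $2 \times 2$ SVD instead of the paper's inner rotations, and the explicit permutations are omitted (NumPy already returns the two local singular values in descending order); the function name is ours.

import numpy as np

def kogbetliantz_sweep(R, V):
    m = R.shape[0]
    for i in range(m - 1):
        # SVD of the 2 x 2 diagonal block: U2^H * block * V2 is diagonal,
        # so R stays upper triangular after the two plane rotations.
        U2, _, V2h = np.linalg.svd(R[i:i + 2, i:i + 2])
        R[i:i + 2, :] = U2.conj().T @ R[i:i + 2, :]   # row rotation U_[i,j]^H
        R[:, i:i + 2] = R[:, i:i + 2] @ V2h.conj().T  # column rotation V_[i,j]
        V[:, i:i + 2] = V[:, i:i + 2] @ V2h.conj().T  # accumulate V
    return R, V

Repeating the sweep drives R towards diagonal form, which is precisely the iteration described above.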

Jacobi GSD :

By applying the orthogonal transformation $(Q_Z U)^H$ to $X$ and $Y$, one obtains
\[
(Q_Z U)^H X = \Pi^T \cdot \begin{bmatrix} X_{ss} & \cdot \\ \cdot & \cdot \end{bmatrix},
\qquad
(Q_Z U)^H Y = \Pi^T \cdot \begin{bmatrix} Y_{ss} & \cdot \\ \cdot & \cdot \end{bmatrix}.
\]
The rows with $X_{ss}$ and $Y_{ss}$ are thus scattered in the resulting matrices, due to the permutation $\Pi$. As $\Pi$ is known as the permutation that puts the diagonal elements in $R$ in descending order, it is easy to retrieve $X_{ss}$ and $Y_{ss}$. A similar Jacobi-type algorithm may then be used to compute the GSD of the pencil $Y_{ss} - \lambda X_{ss}$ [1, 13], as follows:


    X̄_ss ← X_ss
    Ȳ_ss ← Y_ss
    Q ← I
    W ← I
    for j = 1, ..., ∞
        for i = 1, ..., d−1
            X̄_ss ← Π̄_[i,i+1] Q_[i,j]^H · X̄_ss · W_[i,j] Π̄_[i,i+1]
            Ȳ_ss ← Π̄_[i,i+1] Q_[i,j]^H · Ȳ_ss · W_[i,j] Π̄_[i,i+1]
            Q ← Q · Q_[i,j] Π̄_[i,i+1]
            W ← W · W_[i,j] Π̄_[i,i+1]
        end
    end

Here
\[
\bar\Pi_{[i,i+1]} = \begin{bmatrix} I_{i-1} & & \\ & \begin{smallmatrix} 0 & 1 \\ 1 & 0 \end{smallmatrix} & \\ & & I_{d-i-1} \end{bmatrix}.
\]
We refer to [13] for the details on how to compute the unitary plane rotations $Q_{[i,j]}$ and $W_{[i,j]}$. After each iteration we have
\[
X_{ss} = Q \cdot \bar X_{ss} \cdot W^H, \qquad Y_{ss} = Q \cdot \bar Y_{ss} \cdot W^H,
\]
where $\bar X_{ss}$ and $\bar Y_{ss}$ converge to the upper triangular matrices $T_x$ and $T_y$, up to a permutation, i.e., eventually (for $j = \infty$)
\[
\bar X_{ss} = \bar\Pi^T T_x \bar\Pi, \qquad \bar Y_{ss} = \bar\Pi^T T_y \bar\Pi.
\]
The rank reducing numbers result as the ratios of the corresponding diagonal entries of $\bar X_{ss}$ and $\bar Y_{ss}$, i.e., $\phi_k = \bar Y_{ss}(k,k) / \bar X_{ss}(k,k)$ (with a reordering $\bar\Pi$).


III. A parallel and adaptive algorithm

In on-line applications, new rows are appended to the data matrices $X$, $Y$ and $Z$ at each sampling instant. As an example, $Z$ is defined in a recursive manner as follows
\[
Z(k) = \begin{bmatrix} \beta \cdot Z(k-1) \\ z(t_k)^T \end{bmatrix}.
\]
Here $k$ is the sampling time index, and $\beta$ is a weighting factor for exponential windowing ($\beta \leq 1$). Finally, $z(t_k)$ is the observation vector at time $t_k$, for the third array. Matrices $X(k)$ and $Y(k)$ are similarly defined. The aim is now to obtain new estimates for the DOAs at each sampling time instant. The procedure of the previous section uses SVD and GSD. An adaptive version is then based on SVD updating, with additional GSD computations which are carried out in a `second level'. We first focus on the SVD updating and its implementation on a systolic array, and then move on to the second level GSD computation. The resulting procedure is fully adaptive. The corresponding systolic array will be as depicted in Figure 1. New observations are processed continuously (input), and estimates for the DOAs run out as indicated (output). The square block in the middle stores and updates the triangular factor $R$, as well as $\bar X_{ss}$ and $\bar Y_{ss}$ (padded with don't care entries, see section II). The three matrices are overlaid. The upper square block stores and updates the column transformation matrices $V$ and $W$ (overlaid)⁸. The lower square block stores a permutation matrix $\Pi$ that keeps track of the ordering of the elements on the diagonal in the central block. This will be used to re-order the outputs.

SVD Updating

The following is an adaptive SVD algorithm, applied to $Z$, where the Jacobi-type SVD procedure is interlaced with QR updates, whenever a new observation $z(t_k)$ has to be worked in. The $U$-matrix (of growing size) is not computed explicitly. We refer the reader to [8] for the details.

⁸ Note that $W$ is now an $m \times m$ matrix, instead of $d \times d$.


    V ← I_{m×m}
    R ← O_{m×m}
    for k = 1, ..., ∞
        input new observation vector z(t_k)
        1. Matrix-vector multiplication
            z̃(t_k)^T ← z(t_k)^T · V
        2. QR updating
            [ R ; 0 ] ← Q_Z(k)^H · [ β·R ; z̃(t_k)^T ]
        3. SVD steps
            for i = 1, ..., m−1
                R ← Π_[i,i+1] U_[i,k]^H · R · V_[i,k] Π_[i,i+1]
                V ← V · V_[i,k] Π_[i,i+1]
            end
    end

The matrix $R$ remains upper triangular throughout. The column transformations are accumulated in $V$. Each time a new observation vector $z(t_k)$ is worked in (step 2), it is first transformed with $V$ (step 1). In this way, the new row is put in the same `basis' as the matrix $R$. In step 2, $Q_Z(k)$ is a unitary transformation which zeroes the last row in the compound matrix, see [4, 5] for details. This QR update is then followed by a sequence of SVD transformations $i = 1, \dots, m-1$ along the diagonal (step 3). Each time update requires only $O(m^2)$ operations, whereas normally a complete SVD computation would require $O(m^3)$ operations. In [8] it is shown that this $O(m)$ reduction in computational complexity is obtained at the cost of a tracking error of the order of magnitude of the `time variation' in $O(m)$ time steps. Both the tracking error and the time variation are defined in terms of the angles between the true and/or estimated signal (noise) subspaces at different time steps. For `slowly' time varying systems, the tracking error is then sufficiently small. This also means that the triangular factor $R$ will always be close to a (block) diagonal matrix, instead of being exactly equal to a diagonal matrix.
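A hedged NumPy sketch of one such time update (steps 1-3 for a single new observation z). For brevity the Givens-based systolic QR update is replaced by a dense QR, and the SVD steps reuse full $2 \times 2$ SVDs without the explicit permutations; beta is the exponential weighting factor, and the function name is ours.

import numpy as np

def svd_update(R, V, z, beta=0.99):
    m = R.shape[0]
    V = V.copy()
    z_t = z @ V                                        # 1. matrix-vector product
    R = np.linalg.qr(np.vstack([beta * R, z_t[None, :]]))[1][:m, :]  # 2. QR update
    for i in range(m - 1):                             # 3. SVD steps along the diagonal
        U2, _, V2h = np.linalg.svd(R[i:i + 2, i:i + 2])
        R[i:i + 2, :] = U2.conj().T @ R[i:i + 2, :]
        R[:, i:i + 2] = R[:, i:i + 2] @ V2h.conj().T
        V[:, i:i + 2] = V[:, i:i + 2] @ V2h.conj().T
    return R, V

Starting from R = 0 and V = I and feeding in the rows of Z one at a time tracks (an approximation of) the SVD of the exponentially windowed Z(k), in the spirit of the algorithm above.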


For slowly time varying systems, there will be a clear distinction between the noise part (small diagonal elements) and the signal part (large diagonal elements). A systolic implementation of this SVD updating algorithm is described in [9]. With $O(m^2)$ processors, an $O(m^0)$ throughput rate is attained. The array consists of a square part, where the unitary matrix $V$ is stored and updated, and a triangular part for storing and updating the triangular factor $R$, as outlined in Figure 2. In the square part the matrix-vector products are computed, and in the triangular part the QR updating and the SVD diagonalization are performed. All these operations are carried out concurrently. The data flow is depicted in Figure 3, where for the time being only the upper square part, together with the triangular part in the middle, is to be taken into account, as indicated by the trapezoidal frame. The smaller frames may be thought of as processors; dots correspond to matrix entries. Processors on the main diagonal (double frames) compute the required $2 \times 2$ transformations. Row transformation parameters are passed on to the right, while column transformation parameters are passed on upwards. Off-diagonal processors only apply and propagate these transformations to the blocks next outward. For details on this configuration, we refer to [7, 12]. The data vectors $z(t_k)$, $z(t_{k+1})$, $z(t_{k+2})$, etc. are fed in in a skewed fashion as indicated in Figures 3a, 3e, 3i, etc., respectively⁹, and are propagated to the right, in between two rotation fronts. Meanwhile, the matrix-vector product is computed, with intermediate results being propagated upwards. The resulting matrix-vector product $\tilde z(t_k)$ becomes available at the top end of the square array, Figures 3d-e-f. It is then bounced back and propagated downwards, towards the triangular array. While going downwards, the $\tilde z(t_k)$ vector crosses the upgoing transformations $V_{[\cdot]} \Pi_{[\cdot]}$, which are then applied to the $V$ matrix as well as to the available $\tilde z(t_k)$ components, in order to obtain consistent results. Finally, the QR updating is interlaced with the SVD steps in the triangular array, starting in Figure 3m. We refer to [3] for the details on systolic QR updating. The data flow is slightly different here, compatible with the Jacobi-type operations. In each $2 \times 2$ block, the column and row transformations of the SVD diagonalization are performed first, while in a second step the row transformations of the QR updating are performed. For more details, we refer to [9].

⁹ The first vector is indicated by one marker symbol in the figure, the second one by another. Subsequent vectors are not shown, for the sake of clarity.

The matrices $X_{ss}$ and $Y_{ss}$ are obtained by applying the row transformations of the SVD updating procedure to the matrices $X$ and $Y$. An algorithmic description is as follows. Note that $\bar X_{ss}$ and $\bar Y_{ss}$ are now defined


as $m \times m$ matrices, i.e., $X_{ss}$ and $Y_{ss}$ ($d \times d$) padded with don't cares, see section II.

    R ← O_{m×m}
    X̄_ss ← O_{m×m}
    Ȳ_ss ← O_{m×m}
    V ← I_{m×m}
    W ← I_{m×m}
    for k = 1, ..., ∞
        input new observation vectors x(t_k), y(t_k), z(t_k)
        1. Matrix-vector multiplications
            x̃(t_k)^T ← x(t_k)^T · W
            ỹ(t_k)^T ← y(t_k)^T · W
            z̃(t_k)^T ← z(t_k)^T · V
        2. QR updating
            [ R  X̄_ss  Ȳ_ss ; 0 ... ... ] ← Q_Z(k)^H · [ β·R  β·X̄_ss  β·Ȳ_ss ; z̃(t_k)^T  x̃(t_k)^T  ỹ(t_k)^T ]
        3. SVD steps
            for i = 1, ..., m−1
                R ← Π_[i,i+1] U_[i,k]^H · R · V_[i,k] Π_[i,i+1]
                X̄_ss ← Π_[i,i+1] U_[i,k]^H · X̄_ss · Π_[i,i+1]
                Ȳ_ss ← Π_[i,i+1] U_[i,k]^H · Ȳ_ss · Π_[i,i+1]
                V ← V · V_[i,k] Π_[i,i+1]
                W ← W · Π_[i,i+1]
            end
    end

In the above algorithm, we have already introduced a new unitary matrix $W$ which will be used later on. Here it only accumulates the permutations. The column permutations are applied to $\bar X_{ss}$ and $\bar Y_{ss}$ to keep the diagonal elements of $X_{ss}$ and $Y_{ss}$ on the diagonal, as this is where Jacobi-type transformations are initiated in a parallel implementation.


A systolic array for this algorithm is outlined in Figure 4. In the square array both $V$ and $W$ (two levels) are stored and updated. The triangular array has now become a square array too, with three levels, for $R$, $\bar X_{ss}$ and $\bar Y_{ss}$ respectively. The details of the data flow are again shown in Figure 3, where now the upper two square blocks in each figure should be taken into account. The dots now correspond to $x(t_k)$, $y(t_k)$, $z(t_k)$ or $\tilde x(t_k)$, $\tilde y(t_k)$, $\tilde z(t_k)$ components (three levels overlaid). Note that the processor configurations above and below the main diagonal differ slightly (two small inline diagrams in the original, not reproduced here). In the upper square part, vectors $x(t_k)$ and $y(t_k)$ are multiplied by $W$, and the vector $z(t_k)$ is multiplied by $V$. The row transformations (column permutations) that are generated on the main diagonal, both for the QR updating and the SVD reduction, are now propagated to the left (downwards), too.

Combined SVD and GSD updating

The key observation now is that the GSD computations can run concurrently with the adaptive SVD computations. Recall that the GSD of $Y_{ss} - \lambda X_{ss}$ needs to be computed, whereas the stored matrices $\bar X_{ss}$ and $\bar Y_{ss}$ are $m \times m$ matrices, i.e., $X_{ss}$ and $Y_{ss}$ ($d \times d$) padded with don't cares and permuted
\[
\bar X_{ss} = \Pi^T \cdot \begin{bmatrix} X_{ss} & \cdot \\ \cdot & \cdot \end{bmatrix} \cdot \Pi,
\qquad
\bar Y_{ss} = \Pi^T \cdot \begin{bmatrix} Y_{ss} & \cdot \\ \cdot & \cdot \end{bmatrix} \cdot \Pi.
\]
Both the SVD and the GSD level will now generate $2 \times 2$ transformations. Since the row transformations are not stored, they should be the same for the two levels. This means that, e.g., a row transformation generated in the GSD level should be applied to the $R$ matrix, too¹⁰.

¹⁰ Otherwise, subsequent row transformations generated in the $R$ level may give an inconsistent result when applied to $\bar X_{ss}$ and $\bar Y_{ss}$.

Briefly, each row

transformation should be applied in both levels, and either the SVD level or the GSD level should generate it, not both at the same time. Fortunately, the GSD level only generates row transformations whenever two non-don't-care rows are combined, see section II. Remember that a large element on the main diagonal of $R$ corresponds to a non-don't-care row in the matrices $\bar X_{ss}$ and $\bar Y_{ss}$. This is precisely where the corresponding SVD computations may be skipped¹¹. In other words, in the signal subspace the row transformation is dictated by the GSD level. In all other cases, the row transformation is dictated by the SVD level. The computation of the transformations $U_{[i,k]}$, $V_{[i,k]}$, $Q_{[i,k]}$ and $W_{[i,k]}$ is given below. In this description, $[M]_{i,i+1}$ denotes the $2 \times 2$ matrix at the crossing of rows $i, i+1$ and columns $i, i+1$ of a matrix $M$.

case 1 : $[R]_{i,i+1}$ has two `large'¹² diagonal elements :
a) compute $Q_{[i,k]}$ and $W_{[i,k]}$ from the GSD of $[\bar X_{ss}]_{i,i+1}$, $[\bar Y_{ss}]_{i,i+1}$
b) put $U_{[i,k]} = Q_{[i,k]}$, and compute $V_{[i,k]}$ to upper triangularize $[U_{[i,k]}^H \cdot R]_{i,i+1}$.

case 2 : $[R]_{i,i+1}$ has either one or two `small' diagonal elements :
a) compute $U_{[i,k]}$ and $V_{[i,k]}$ from the SVD of $[R]_{i,i+1}$
b) put $Q_{[i,k]} = U_{[i,k]}$, and $W_{[i,k]} = I_m$

¹¹ Such SVD transformations correspond to transformations within the signal subspace. As we only need to separate the subspaces spanned by $U_s$ and $U_n$, whereas the basis of singular vectors need not be computed explicitly, these transformations are irrelevant. In [8] it is shown that the subspace tracking capability of the updating algorithm is not affected when these transformations are skipped.
¹² We assume a threshold is given in order to define `large' and `small'.
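A hedged sketch of this selection for one pivot, where R2, Xss2, Yss2 are the $2 \times 2$ blocks at rows/columns $i, i+1$ and tol is the threshold of footnote 12. The QZ and RQ factorizations stand in for the paper's $2 \times 2$ rotation formulas; the function name is ours.

import numpy as np
import scipy.linalg

def pivot_transforms(R2, Xss2, Yss2, tol):
    if min(abs(R2[0, 0]), abs(R2[1, 1])) > tol:
        # case 1: two 'large' diagonal elements; the GSD dictates the rows
        _, _, Q2, W2 = scipy.linalg.qz(Yss2, Xss2, output='complex')
        U2 = Q2
        # choose V2 so that (U2^H R2) V2 is upper triangular again:
        # RQ gives U2^H R2 = T * Vq with T upper triangular, hence V2 = Vq^H
        _, Vq = scipy.linalg.rq(U2.conj().T @ R2)
        V2 = Vq.conj().T
    else:
        # case 2: at least one 'small' diagonal element; the SVD dictates the rows
        U2, _, V2h = np.linalg.svd(R2)
        V2 = V2h.conj().T
        Q2, W2 = U2, np.eye(2, dtype=complex)
    return U2, V2, Q2, W2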

With the above scheme for computing the transformations, the adaptive algorithm finally becomes as follows:

    R ← O_{m×m}
    X̄_ss ← O_{m×m}
    Ȳ_ss ← O_{m×m}
    V ← I_{m×m}
    W ← I_{m×m}
    Π ← I_{m×m}
    for k = 1, ..., ∞
        input new observation vectors x(t_k), y(t_k), z(t_k)
        1. Matrix-vector multiplications
            x̃(t_k)^T ← x(t_k)^T · W

            ỹ(t_k)^T ← y(t_k)^T · W
            z̃(t_k)^T ← z(t_k)^T · V
        2. QR updating
            [ R  X̄_ss  Ȳ_ss ; 0 ... ... ] ← Q_Z(k)^H · [ β·R  β·X̄_ss  β·Ȳ_ss ; z̃(t_k)^T  x̃(t_k)^T  ỹ(t_k)^T ]
        3. SVD/GSD steps
            for i = 1, ..., m−1
                [compute the transformations as outlined in the text above]
                R ← Π_[i,i+1] U_[i,k]^H · R · V_[i,k] Π_[i,i+1]
                X̄_ss ← Π_[i,i+1] Q_[i,k]^H · X̄_ss · W_[i,k] Π_[i,i+1]
                Ȳ_ss ← Π_[i,i+1] Q_[i,k]^H · Ȳ_ss · W_[i,k] Π_[i,i+1]
                V ← V · V_[i,k] Π_[i,i+1]
                W ← W · W_[i,k] Π_[i,i+1]
                Π ← Π · Π_[i,i+1]
            end
        4. Generate output
            output ← Π · [ Ȳ_ss(1,1)/X̄_ss(1,1), ..., Ȳ_ss(m,m)/X̄_ss(m,m) ]^T
    end

The SVD/GSD computations are combined and interlaced with the QR updates. The matrix $R$ will always be close to a (block) diagonal matrix. The matrices $\bar X_{ss}$ and $\bar Y_{ss}$ will always be close to the required GSD form, padded with don't cares. Estimates for the $\phi_k$'s, also padded with don't cares, are computed from the ratios of the corresponding diagonal entries of the matrices $\bar X_{ss}$ and $\bar Y_{ss}$. The multiplication with $\Pi$ in step 4 puts these estimates always in the same order. The corresponding systolic array is still given in Figures 3 and 4; only the program for the processors has changed. The lower square array keeps track of the permutation matrix $\Pi$. The column transformations in this lower array are simple permutations. The vector containing the estimates for the $\phi_k$'s (padded with don't cares) is computed on the main diagonal, and then propagated downwards, as indicated in Figure 3 (overlaid with $\tilde x(t_k)$ and $\tilde y(t_k)$). In the lower block, it is finally multiplied by the permutation


matrix $\Pi$. When the result leaves the array, it will always have the same ordering. Note that the output in Figure 3A (3D, etc.) corresponds to the input in Figure 3a (3d, etc.). The latency for one single update is $O(m)$.

IV. Signal Copy

In this section, we briefly consider the signal copy problem for the simple case when the source signals are independent, i.e., when
\[
\lim_{N \to \infty} \frac{1}{N}\, S^H \cdot S
\]
is a diagonal matrix. It turns out that, in this case, an estimate for the source signals can be obtained at each time instant, and with only a little computational overhead. We consider the noise-free case with $N \to \infty$. If there is noise, or whenever $N \neq \infty$, the equations hold approximately (see section V). In the non-adaptive scheme, one has

\[
X \begin{bmatrix} I_d \\ 0 \end{bmatrix} = S \cdot A \begin{bmatrix} I_d \\ 0 \end{bmatrix} = U_s \cdot Q \cdot T_x \cdot W^H
\]
and similarly
\[
Y \begin{bmatrix} I_d \\ 0 \end{bmatrix} = S \cdot \Phi \cdot A \begin{bmatrix} I_d \\ 0 \end{bmatrix} = U_s \cdot Q \cdot T_y \cdot W^H.
\]
From the uniqueness properties of the GSD, and with the above assumption for $S^H S$, it follows that
\[
X \begin{bmatrix} I_d \\ 0 \end{bmatrix}
= \underbrace{U_s \cdot Q}_{S \tilde\Pi D}
\cdot \underbrace{T_x \cdot W^H}_{D^{-1} \tilde\Pi^T A \left[ {I_d \atop 0} \right]},
\]
where $\tilde\Pi$ is some permutation matrix, and $D$ is some complex diagonal (scaling) matrix. In other words, the source signal matrix $S$ can be determined, up to a scaling and a permutation:
\[
S = U_s \cdot Q \cdot D^{-1} \tilde\Pi^T.
\]

In the adaptive scheme, the aim is to compute, at time instant $k$, the last row of $S(k)$. Now, the row transformations from the GSD level are interleaved with those generated by the SVD level. Therefore, the above formula now reads
\[
\tilde S(k) = \tilde U(k) \cdot D(k)^{-1} \tilde\Pi(k)^T,
\]
where $\tilde U(k)$ corresponds to the accumulated row transformations, and $\tilde S(k)$ is $S(k)$ padded with $m - d$ don't care columns. The aim is thus to compute the last row of $\tilde U(k)$, which should then be adjusted with $D(k)^{-1} \tilde\Pi(k)^T$. Note that this adjustment should be consistent from one time step to the next one. The permutation $\tilde\Pi(k)$ will be taken into account in the same way as was done for the $\phi_k$'s, i.e., by storing and updating the permutation matrix $\Pi$ explicitly (lower square block), and computing matrix-vector multiplications such that a constant re-ordering is obtained. As for $D(k)$, one can verify¹³ that consistent estimates can be obtained by choosing
\[
D(k) = [\operatorname{diag}(\bar X_{ss}(k))]^{-1} \cdot
\begin{bmatrix} \tilde u_1(k)/|\tilde u_1(k)| & & \\ & \ddots & \\ & & \tilde u_m(k)/|\tilde u_m(k)| \end{bmatrix},
\]
where $\operatorname{diag}(M)$ is a diagonal matrix with the diagonal entries of $M$, and $\tilde u_i(k)$ is the $i$-th component of $\tilde u(k)$, the first row vector of $\tilde U(k)$. It remains to compute $\tilde U(k)$ in each time step, where only the first and the last row are needed. At time $t_k$, before adding the new observation, we have

\[
Z(k-1) = \tilde U(k-1) \cdot R(k-1) \cdot V^H(k-1).
\]
When the new row is appended, we have
\[
Z(k) = \begin{bmatrix} \beta \cdot Z(k-1) \\ z(t_k)^T \end{bmatrix}
= \begin{bmatrix} \tilde U(k-1) & 0 \\ 0 \cdots 0 & 1 \end{bmatrix}
\cdot \begin{bmatrix} \beta \cdot R(k-1) \\ z(t_k)^T \cdot V(k-1) \end{bmatrix}
\cdot V^H(k-1).
\]

¹³ $\operatorname{diag}(\bar X_{ss}(k))$ compensates for the loss of amplitude information, since $\tilde U(k)$ is a unitary matrix with a growing number of rows, whereas $\operatorname{diag}(\tilde u_i/|\tilde u_i|)$ corrects the phase loss due to the fact that the QR decomposition of a complex matrix is only determined up to a unitary diagonal matrix.

The QR update is then performed as follows
\[
Z(k) =
\underbrace{\begin{bmatrix} \tilde U(k-1) & 0 \\ 0 \cdots 0 & 1 \end{bmatrix} Q_Z(k)}_{\left[\, \tilde U(k) \;\; \cdot \,\right]}
\cdot
\underbrace{Q_Z(k)^H \begin{bmatrix} \beta \cdot R(k-1) \\ z(t_k)^T V(k-1) \end{bmatrix}}_{\left[ {R(k) \atop 0 \cdots 0} \right]}
\cdot\; V^H(k).
\]
The subsequent SVD/GSD transformations correspond to a further refinement of this decomposition, which is not taken into account in the present estimate. What is important here is that the last row of the present $\tilde U(k)$ is obtained from
\[
\begin{bmatrix} 0 & \cdots & 0 & 1 \end{bmatrix} \cdot Q_Z(k).
\]
In the algorithmic description, we thus only have to extend the formula for the QR update as follows
\[
\begin{bmatrix} R & \bar X_{ss} & \bar Y_{ss} & \tilde s & \tilde u \\ 0 & \cdots & \cdots & \cdot & \cdot \end{bmatrix}
\leftarrow Q_Z(k)^H \cdot
\begin{bmatrix} \beta \cdot R & \beta \cdot \bar X_{ss} & \beta \cdot \bar Y_{ss} & 0 & \tilde u \\ \tilde z(t_k)^T & \tilde x(t_k)^T & \tilde y(t_k)^T & 1 & 0 \end{bmatrix}.
\]
The estimate for the source signals at time $t_k$ (including don't care entries) is then obtained as
\[
\Pi(k) \cdot D(k)^{-1} \cdot \tilde s,
\]
where the vector $\tilde s$ is defined by the formula above and $\Pi(k) D(k)^{-1}$ is obtained as explained before. In a systolic implementation, these additional operations are easily included in the program of the diagonal processors. The vector $\tilde u$ is stored on the diagonal. The estimate $\tilde s^T \cdot D(k)^{-1}$ is also computed on the diagonal, together with the DOAs, and then propagated downwards, towards the $\Pi$ array, where the permutation is applied. The overall result is that at each time instant, the $\phi_k$'s as well as the estimates for the corresponding signals (including don't cares) run out of the array, as indicated in Figure 1.

V. Experiments

The purpose of this section is to illustrate the behavior of the algorithm with two simple computer experiments. In a first experiment the algorithm


is used to track the directions of arrival of two sources which are moving at constant angular velocity. The second simulation shows the reconstructed signals for a stationary scenario, with two fixed sources. In both simulations the antenna is a uniform linear array, consisting of 12 sensors. The distance between the individual elements is one fifth of the wavelength. The subarrays are fully separated: the X-array consists of elements 1 to 4, the Y-array of elements 5 to 8, and the Z-array of elements 9 to 12. The signals are random complex waveforms. The algorithm uses a weighting factor of 0.99.

Case 1: tracking two crossing sources

Suppose a car is driving at 120 km/h on a highway which is at 1 km from the antenna. The rate of change of the direction of arrival then equals approximately 2 degrees/sec (at broadside, $\dot\theta \approx v/r = (33.3\ \mathrm{m/s})/(1000\ \mathrm{m}) \approx 0.033\ \mathrm{rad/s} \approx 1.9°/\mathrm{s}$). If the algorithm would update the directions of arrival at a 50 Hz rate, the angles would change at 0.04 degrees/sample. In the experiment the first source is initially at 20 degrees and moves at an angular speed of 0.05 degrees/sample. The second source is originally located at 40 degrees but moves at 0.03 degrees/sample in the opposite direction. 400 samples are taken, which amounts to a time slot of 8 seconds. The signal-to-noise ratio is set to 30 dB. Figure 5 shows the amplitude and the phase of the estimated $\phi$'s. The dotted curves indicate the exact $\phi$'s. The behavior is typical for each of the runs we did. As long as the sources are well separated, the estimates are very good. Obviously, when the sources approach each other and cross, the estimates severely degrade. Theoretically the basic algorithm cannot distinguish between two coinciding sources, since then the array matrix drops rank. The two sources are then interpreted as one single source. Our algorithm is kept quite simple, in view of adaptive and parallel implementation, and is therefore certainly susceptible to improvement.

Case 2: reconstructing the signal waves

In this scenario the two sources are located at two stationary positions. The source locations are 20 and 30 degrees and the signal-to-noise ratio is 30 dB. The signals are random complex waveforms and are approximately orthogonal to each other. Figure 6 shows the estimated waveforms on top of the exact signals. Note that, since the signals can only be reconstructed to within a complex constant, these estimated signals have been re-scaled. For each signal the amplitude and phase are plotted. Initially, the estimates are quite poor. The reason is that for a small number of samples the assumption of orthogonality


of the source signals is violated¹⁴. As the number of samples grows, the source signals become more orthogonal and the estimates improve.

¹⁴ A submatrix of an orthogonal (unitary) matrix is not orthogonal (unitary) anymore.

Acknowledgements

The authors would like to thank Prof. P. Van Dooren (University of Illinois, Urbana-Champaign) and A.-J. van der Veen (Delft University of Technology) for many helpful suggestions. This research was partially sponsored by ESPRIT Basic Research Action Nr. 3280. Marc Moonen is a senior research assistant with the Belgian N.F.W.O. (National Fund for Scientific Research). Filiep Vanpoucke is a research assistant with the Belgian N.F.W.O.

References

[1] J.-P. Charlier, P. Van Dooren, `A Jacobi-like algorithm for computing the generalized Schur form of a regular pencil'. J. Comp. Appl. Math., 27 (1989), pp. 17-36.
[2] B. De Moor, Mathematical Concepts and Techniques for Modelling Static and Dynamic Systems. PhD thesis, Katholieke Universiteit Leuven, Dept. El. Eng., 1988.
[3] W.M. Gentleman, H.T. Kung, `Matrix triangularization by systolic arrays'. Real-Time Signal Processing IV, Proc. SPIE, Vol. 298 (1982), pp. 19-26.
[4] P.E. Gill, G.H. Golub, W. Murray, M.A. Saunders, `Methods for modifying matrix factorizations'. Math. Comp., 28 (1974), No. 126, pp. 505-535.
[5] G.H. Golub, C.F. Van Loan, Matrix Computations. North Oxford Academic Publishing Co., Johns Hopkins Press, 1988.
[6] E. Kogbetliantz, `Solution of linear equations by diagonalization of coefficient matrices'. Quart. Appl. Math., 13 (1955), pp. 123-132.
[7] F.T. Luk, `A triangular processor array for computing singular values'. Lin. Alg. Appl., 77 (1986), pp. 259-273.
[8] M. Moonen, P. Van Dooren, J. Vandewalle, `An SVD updating algorithm for subspace tracking'. To appear in SIAM J. Matrix Anal. Appl., 13 (1992), No. 4.


[9] M. Moonen, P. Van Dooren, J. Vandewalle, `A systolic array for SVD updating'. To appear in SIAM J. Matrix Anal. Appl.
[10] A. Paulraj, R. Roy, T. Kailath, `A subspace rotation approach to signal parameter estimation'. Proc. IEEE, 74 (1986), pp. 1044-1045.
[11] R.H. Roy, ESPRIT: Estimation of Signal Parameters via Rotational Invariance Techniques. PhD thesis, Stanford Univ., Stanford, CA, 1987.
[12] G.W. Stewart, `A Jacobi-like algorithm for computing the Schur decomposition of a nonhermitian matrix'. SIAM J. Sci. Stat. Comp., 6 (1985), pp. 853-863.
[13] A.J. van der Veen, E.F. Deprettere, `Parallel VLSI matrix pencil algorithm for high resolution direction finding'. IEEE Trans. Signal Processing, 39 (1991), No. 2, pp. 383-394.


[Figure 1: Overall systolic array. The upper square block stores V and W (overlaid), the central block stores R, X̄_ss and Ȳ_ss (overlaid), and the lower square block stores the permutation matrix Π; input vectors enter at the top, estimates run out at the bottom.]

[Figure 2: Systolic array for SVD updating: a square part storing the unitary matrix V on top of a triangular part storing the triangular factor R; input vectors enter at the top.]

[Figure 3: Snapshots a through x and y through F of the systolic data flow: input vectors are fed in in a skewed fashion and propagate to the right through the square array, matrix-vector products propagate upwards and are bounced back down towards the central array, where QR updates are interlaced with the SVD/GSD steps; the output in snapshot A corresponds to the input in snapshot a.]

[Figure 4: Systolic array for the extended algorithm: a square array storing V and W (two levels) and a square central array with three levels for R, X̄_ss and Ȳ_ss, with the Π array below for output re-ordering.]

 W

estimated poles: amplitude

50

0.5

0 0

estimated poles: phase

0

1

phase

amplitude

1.5

theta : 20 40 omega : 0.05 -0.03 snr : 30 30 100

200

300

-50 -100 -150 0

400

sample

100

200 sample

Figure 5

29

300

400

[Figure 6: Signal copy experiment. Four panels versus time (0 to 200 samples): `amplitude: signal 1', `phase: signal 1', `amplitude: signal 2', `phase: signal 2'.]
