A systolic array for SVD updating

Marc Moonen, ESAT, Katholieke Universiteit Leuven, K. Mercierlaan 94, 3001 Heverlee, Belgium ([email protected])
Paul Van Dooren, University of Illinois at Urbana-Champaign, 1101 West Springfield Avenue, Urbana, IL 61801, USA
Joos Vandewalle, ESAT, Katholieke Universiteit Leuven, K. Mercierlaan 94, 3001 Heverlee, Belgium

ESAT-SISTA Report 1991-18

Abstract

In an earlier paper, an approximate SVD updating scheme was derived as an interlacing of a QR updating on the one hand and a Jacobi-type SVD procedure on the other hand, possibly supplemented with a certain re-orthogonalization scheme. In this paper, this updating algorithm is mapped onto a systolic array with O(n²) parallelism for O(n²) complexity, resulting in an O(n⁰), i.e. constant, throughput. Furthermore, it is shown how a square root free implementation is obtained by combining modified Givens rotations with approximate SVD schemes.

Keywords: Singular Value Decomposition, Parallel Algorithms, Recursive Least Squares.
AMS(MOS): 65F15, 65F25. CR: G.1.3.
Running head: SVD Updating

1 Introduction

The problem of continuously updating matrix decompositions as new rows are appended frequently occurs in signal processing applications. Typical examples are adaptive beamforming, direction finding, spectral analysis, pattern recognition, etc. [13]. In [12], it has been shown how an SVD updating algorithm can be derived by combining QR updating with a Jacobi-type SVD procedure applied to the triangular factor. In each time step an approximate decomposition is computed from a previous approximation at a low computational cost, namely O(n²) operations. This algorithm was shown to be particularly suited for subspace tracking problems. The tracking error at each time step is then found to be bounded by the time variation in O(n) time steps, which is sufficiently small for applications with slowly time-varying systems. Furthermore, the updating procedure was proved to be stable when supplemented with a certain re-orthogonalization scheme, which is elegantly combined with the updating. In this paper, we show how this updating algorithm can be mapped onto a systolic array with O(n²) parallelism, resulting in an O(n⁰), i.e. constant, throughput (much as is the case for mere QR updating, see [5]). Furthermore, it is shown how a square root free implementation is obtained by combining modified Givens rotations with approximate SVD schemes. In section 2, the updating algorithm is briefly reviewed. A systolic implementation is described in section 3 for the easy case, where corrective re-orthogonalizations are left out. In section 4, it is shown how to incorporate these re-orthogonalizations. Finally, a square root free implementation is derived in section 5.

2 SVD updating

The Singular Value Decomposition (SVD) of a real matrix $A_{m \times n}$ ($m \geq n$) is a factorization of $A$ into a product of three matrices

$$A_{m \times n} = U_{m \times n} \cdot \Sigma_{n \times n} \cdot V_{n \times n}^T$$

where $U$ has orthonormal columns, $V$ is an orthogonal matrix and $\Sigma$ is a diagonal matrix, with the singular values along the diagonal. Given that we have the SVD of a matrix $A$, we may need to calculate


the SVD of a matrix $\tilde A$ that is obtained after appending a new row to $A$:

$$\tilde A = \begin{bmatrix} A \\ a^T \end{bmatrix} = \tilde U_{(m+1) \times n} \cdot \tilde\Sigma_{n \times n} \cdot \tilde V_{n \times n}^T.$$

In on-line applications, a new updating is often to be performed after each sampling. The data matrix at time step $k$ is then defined in a recursive manner ($k \geq n$):

$$A^{(k)} = \begin{bmatrix} \lambda^{(k)} \cdot A^{(k-1)} \\ a^{(k)T} \end{bmatrix} = U^{(k)}_{k \times n} \cdot \Sigma^{(k)}_{n \times n} \cdot V^{(k)T}_{n \times n}.$$

The factor $\lambda^{(k)}$ is a weighting factor, and $a^{(k)}$ is the measurement vector at time instant $k$. For the sake of brevity, we consider only the case where $\lambda^{(k)}$ is a constant $\lambda$, although everything can easily be recast for the case where it is time-varying. Finally, in most cases the $U^{(k)}$ matrices (of growing size!) need not be computed explicitly, and only $V^{(k)}$ and $\Sigma^{(k)}$ are explicitly updated. An adaptive algorithm can be constructed by interlacing a Jacobi-type SVD procedure (Kogbetliantz's algorithm [9], modified for triangular matrices [8, 10]) with repeated QR updates. See [12] for further details.

Initialization:

$$V^{(0)} \leftarrow I_{n \times n}, \qquad R^{(0)} \leftarrow O_{n \times n}$$

Loop: for $k = 1, \ldots, \infty$

    input new measurement vector $a^{(k)}$

    matrix-vector product:
    $$a_\perp^{(k)T} \leftarrow a^{(k)T} \cdot V^{(k-1)}$$

    QR updating:
    $$\begin{bmatrix} R_\perp^{(k)} \\ 0 \end{bmatrix} \leftarrow Q^{(k)T} \cdot \begin{bmatrix} \lambda \cdot R^{(k-1)} \\ a_\perp^{(k)T} \end{bmatrix}, \qquad R^{(k)} \leftarrow R_\perp^{(k)}, \qquad V^{(k)} \leftarrow V^{(k-1)}$$

    SVD steps: for $i = 1, \ldots, n-1$
    $$\begin{aligned} R^{(k)} &\leftarrow \Theta_i^{(k)T} \cdot R^{(k)} \cdot \Phi_i^{(k)} \\ V^{(k)} &\leftarrow V^{(k)} \cdot \Phi_i^{(k)} \\ \{\, V^{(k)} &\leftarrow T_i^{(k)} \cdot V^{(k)} \,\} \end{aligned}$$
    end

end
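For concreteness, the interlacing can be summarized in a short numerical sketch. Here numpy's QR and 2×2 SVD routines stand in for the Givens and Kogbetliantz kernels of the array; the function name and the default weighting factor are ours, not part of the original scheme.

```python
import numpy as np

def svd_update(R, V, a, lam=0.99):
    """One updating step of section 2 (illustrative sketch):
    matrix-vector product, QR updating, then n-1 SVD steps.
    R, V are float arrays; V is updated in place."""
    n = R.shape[0]
    a_perp = a @ V                            # a_perp^T = a^T V^(k-1)
    M = np.vstack([lam * R, a_perp])          # [lam*R ; a_perp^T]
    R = np.linalg.qr(M)[1][:n, :]             # QR updating
    for i in range(n - 1):                    # SVD steps, pivot rows i, i+1
        # 2x2 SVD on the diagonal: rotations Theta (rows), Phi (columns)
        Theta, _, Phit = np.linalg.svd(R[i:i+2, i:i+2])
        R[i:i+2, :] = Theta.T @ R[i:i+2, :]   # row transformation
        R[:, i:i+2] = R[:, i:i+2] @ Phit.T    # column transformation
        R[i+1, i] = 0.0                       # clean up rounding residue
        V[:, i:i+2] = V[:, i:i+2] @ Phit.T    # accumulate in V
    return R, V
```

After each call, $R$ stays upper triangular and $V$ accumulates the column rotations; the $(i, i+1)$ elements are driven to zero one pivot at a time, as in the pipelined sequences described above.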

Matrices $\Theta_i^{(k)}$ and $\Phi_i^{(k)}$ represent plane rotations (the $i$th rotation in time step $k$) through angles $\theta_i^{(k)}$ and $\phi_i^{(k)}$ in the $(i, i+1)$-plane. The rotation angles $\theta_i^{(k)}$ and $\phi_i^{(k)}$ should be chosen such that the $(i, i+1)$ element in $R^{(k)}$ is zeroed, while $R^{(k)}$ remains in upper triangular form. Each iteration thus amounts to solving a $2 \times 2$ SVD on the main diagonal. The updating algorithm then reduces to applying sequences of $n-1$ rotations, where the pivot index $i$ repeatedly takes up all the values $i = 1, 2, \ldots, n-1$ ($n$ such sequences constitute a pipelined double sweep [14]), interlaced with QR updates. At each time step, $R^{(k)}$ will be "close" to a (block) diagonal matrix, so that in some sense $V^{(k)}$ is "close" to the exact matrix with right singular vectors, see [12]. Transformations $T_i^{(k)}$ correspond to approximate re-orthogonalizations of row vectors of $V^{(k)}$. These should be included in order to avoid round-off error build-up if the algorithm is supposed to run for, say, thousands of time steps (see [12]). For the sake of clarity, the re-orthogonalizations are left out for a while, and are dealt with only in section 4. In the sequel, the time index $k$ is often dropped for the sake of conciseness.

3 A systolic array for SVD updating

The above SVD updating algorithm (for the time being without re-orthogonalizations) can elegantly be mapped onto a systolic array, by combining systolic implementations for the matrix-vector product, the QR updating and the SVD. In particular, with $n-1$ SVD iterations after each QR update (if the number of rotations after each QR update is, for instance, halved or doubled, the array can easily be modified accordingly), an efficient parallel implementation is conceivable with O(n²) parallelism for O(n²) complexity. The SVD updating is then performed at a speed comparable to the speed of mere QR updating. The SVD updating array is similar to the triangular SVD array in [10], where the SVD diagonalization and a preliminary QR factorization are performed on the same array. As for the SVD updating algorithm, the diagonalization process and the QR updating are interlaced, so the array has to be modified accordingly. Also, from the algorithmic description it follows that the V-matrix should be stored as well. Hence we have to provide for an additional square array, which furthermore performs the matrix-vector products $a^T \cdot V$. It is shown how the matrix-vector product, the QR updating and the SVD can perfectly be pipelined at the cost of little computational overhead. Finally, it is briefly shown how, e.g., a total least squares solution can be generated at each time step, with only very few additional computations. Figure 1 gives an overview of the array. New data vectors are continuously fed in at the left-hand side of the array. The matrix-vector product is computed in the square part, and the resulting vector is passed on to the triangular array that performs the QR updating and the SVD diagonalization. Output vectors are flushed upwards in the triangular array, and become available at the right-hand side of the square array. All these operations can be carried out simultaneously, as is detailed next. The correctness of the array has also been verified by software simulation. We first briefly review the SVD array of [10], and then modify the Gentleman-Kung QR updating array accordingly. Next we show how to interlace the matrix-vector products, the QR updates and the SVD process, and additionally generate (total least squares) output vectors.

1. SVD array

Figure 2 shows the SVD array of [10]. Processors on the main diagonal perform $2 \times 2$ SVDs annihilating the available off-diagonal elements. Row transformation parameters are passed on to the right, while column transformation parameters are passed on upwards. Off-diagonal processors only apply and propagate these transformations to the blocks next outward. Column transformations are also propagated through the upper square part, containing the V-matrix (V's first row in the top row, etc.). In this parallel implementation, off-diagonal elements with odd and even row numbers are being zeroed in an alternating fashion (odd-even ordering). However, it can easily be verified that an odd-even ordering corresponds to a cyclic-by-rows or cyclic-by-columns ordering, apart from a different start-up phase [11, 14]. The $2 \times 2$ SVDs that are performed in parallel on the main diagonal can indeed be thought of as corresponding to different pipelined sequences of $n-1$ rotations, where in each sequence the pivot index successively takes up the values $i = 1, \ldots, n-1$. A series of $n$ such sequences is known to correspond to a double sweep (pipelined forward + backward) in a cyclic-by-rows ordering. In Figure 2, one such sequence is indicated with double frames (for $i = 1, \ldots, 7$), starting in Figure 2a. In a similar fashion, the next sequence starts off from the top left corner in Figure 2e. As pointed out in section 2, the QR updatings should be inserted in between two such sequences.
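As a small illustration of the ordering argument, the following sketch lists which pivot pairs are active at each parallel step of the odd-even ordering (the helper name is ours):

```python
def odd_even_schedule(n, steps):
    """Pivot indices i (1-based, pairing rows i and i+1) processed
    simultaneously at each step of the odd-even ordering."""
    return [list(range(1 + t % 2, n, 2)) for t in range(steps)]

# for n = 8: step 0 -> [1, 3, 5, 7], step 1 -> [2, 4, 6], step 2 -> [1, 3, 5, 7], ...
```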

2. A modified Gentleman-Kung QR updating array

A QR updating is performed by applying a sequence of orthogonal transformations (Givens rotations) [6]. Gentleman and Kung have shown how pipelined sequences of Givens rotations can be implemented on a systolic array, see [5]. This array should now be matched to the SVD array, such that both can be combined. Figure 3 shows a modified QR updating array. While all operations remain unaltered, the pipelining is somewhat different, so that the data vectors are now propagated through the array in a slightly different manner. The data vectors are fed into the array in a skewed fashion as indicated, and are propagated downwards while being changed by successive row transformations. On the main diagonal, elementary orthogonal row transformations are generated. Rotation parameters are propagated to the right, while the transformed data vector components are passed on downwards. Note that each $2 \times 2$ block combines its first row with the available data vector components, and pushes the resulting data vector components one step downwards. The first update starts off in Figure 3a (large filled boxes), the second one in Figure 3e (smaller filled boxes), etc. Furthermore, each update is seen to correspond to a sequence of rotations where the pivot index takes up the values $i = 1, \ldots, n$. Both the processor configuration and the pipelining turn out to be the same as for the SVD array.

3. Matrix-vector product

The matrix-vector product $a^T \cdot V$ can be combined with the SVD steps as depicted in Figure 4 (4a-4g). The data vectors $a^T$ are fed into the array in a skewed fashion as indicated, and are propagated to the right, in between two rotation fronts corresponding to the SVD diagonalization (frames). Each processor receives a-components from its left neighbour, and intermediate results from its lower neighbour. The intermediate results are then updated and passed on to the upper neighbour, while the a-component is passed on to the right. The resulting matrix-vector product becomes available at the top end of the square array. It should be stressed that a consistent matrix-vector product $a^T \cdot V$ can only be formed in between two SVD rotation fronts. That is a restriction, and it is worthwhile analyzing its implications. First, the propagation of the SVD rotation fronts dictates the direction in which a matrix-vector product can be formed. The resulting vector $a_\perp$ thus inevitably becomes available at the top end of the square array, while it should be fed into the triangular array at the bottom for the subsequent QR update. The $a_\perp$-components therefore have to be reflected at the top end and propagated downwards, towards the triangular array (Figures 4e-4p). The downward propagation of an $a_\perp$-vector is then carried out in exactly the same manner as the propagation in the modified Gentleman-Kung array (see also Figure 3). Second, the V-matrix that is used for computing $a^T \cdot V$ is in fact some older version of $V$, which we term $V^{(\natural)}$ (one can check that it is not possible to substitute a specific time index for the "$\natural$"). For a specific input vector $a^{(k)}$, this $V^{(\natural)}$ equals $V^{(k-1)}$ up to a number of column transformations, such that

$$V^{(k-1)} = V^{(\natural)} \cdot \Phi^{(\natural)}$$

where $\Phi^{(\natural)}$ denotes the accumulated column transformations. In order to obtain $a_\perp^{(k)}$, it is necessary to apply $\Phi^{(\natural)}$ to the computed matrix-vector product:

$$a_\perp^{(k)T} = a^{(k)T} \cdot V^{(k-1)} = \underbrace{a^{(k)T} \cdot V^{(\natural)}}_{a_\perp^{(\natural)T}} \cdot \, \Phi^{(\natural)}$$
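A small numerical check of this bookkeeping, with a randomly chosen stand-in for $V^{(\natural)}$ and an arbitrary product of plane rotations playing the role of $\Phi^{(\natural)}$ (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
a = rng.standard_normal(n)
V_nat = np.linalg.qr(rng.standard_normal((n, n)))[0]   # plays V^(natural)

Phi = np.eye(n)                        # accumulated column rotations
for i in range(n - 1):
    G = np.eye(n)
    c, s = np.cos(0.1 * (i + 1)), np.sin(0.1 * (i + 1))
    G[i:i+2, i:i+2] = [[c, -s], [s, c]]
    Phi = Phi @ G

V_km1 = V_nat @ Phi                    # V^(k-1) = V^(natural) . Phi^(natural)
# stale product, then the accumulated corrections, equals a^T V^(k-1):
assert np.allclose((a @ V_nat) @ Phi, a @ V_km1)
```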

These additional transformations represent a computational overhead, which is the penalty for pipelining the matrix-vector products with the SVD steps on the same array. Notice however that at the same time, the throughput improves a lot. Waiting until $V^{(k-1)}$ is formed completely before calculating the matrix-vector product would induce O(n) time lags and likewise result in an O(n⁻¹) throughput. With the additional computations, the throughput is O(n⁰). Let us now focus on the transformations in $\Phi^{(\natural)}$ and the way these can be processed. One of these transformations is, e.g., $\Phi_1^{(k-5)}$ (see section 2 for notation), which is computed on the main diagonal in Figure 4a (double frame). While propagating downwards, the $a_\perp^{(\natural)T}$ vector crosses the upgoing rotation $\Phi_1^{(k-5)}$ in Figure 4e. At this point this transformation can straightforwardly be applied to the available $a_\perp^{(\natural)T}$-components. Similarly, one can verify that $\Phi_1^{(k-4)}$, $\Phi_1^{(k-3)}$, $\Phi_1^{(k-2)}$ and $\Phi_1^{(k-1)}$ are applied in Figures 4h, 4k, 4n and 4q respectively. The transformations in the other columns can be traced similarly. In conclusion, each frame in the square array now corresponds to a column transformation that is applied to a $2 \times 2$ block of the V-matrix, and to the two available components of an $a_\perp^{(\natural)}$-vector. These components are propagated one step downwards next. A complete description for a $2 \times 2$ block in the V-matrix is then as follows. Notation is slightly modified for conciseness; two memory cells in each block are filled by the updated elements of the $a_\perp^{(\natural)}$-vector.


Internal cell, V-matrix:

Input transformation parameters:
$(c_\phi, s_\phi) \leftarrow (c_{in}, s_{in})$

Apply transformation (to the $2 \times 2$ V-block and the stored $a_\perp$-components, which are then propagated one step downwards):

$$\begin{bmatrix} a_{\perp,q} & a_{\perp,q+1} \\ v_{p,q} & v_{p,q+1} \\ v_{p+1,q} & v_{p+1,q+1} \end{bmatrix} \leftarrow \begin{bmatrix} a_{\perp,q} & a_{\perp,q+1} \\ v_{p,q} & v_{p,q+1} \\ v_{p+1,q} & v_{p+1,q+1} \end{bmatrix} \cdot \begin{bmatrix} c_\phi & -s_\phi \\ s_\phi & c_\phi \end{bmatrix}$$

Propagate transformation parameters:
$(c_{out}, s_{out}) \leftarrow (c_\phi, s_\phi)$

if $q$ even:
input $a_p$, $a_{p+1}$ and intermediate results $x_{in}$, $y_{in}$; update the intermediate results

$$x \leftarrow x + v_{p,q} \cdot a_p + v_{p+1,q} \cdot a_{p+1}, \qquad y \leftarrow y + v_{p,q+1} \cdot a_p + v_{p+1,q+1} \cdot a_{p+1};$$

propagate $a_p$, $a_{p+1}$ (to the right) and the intermediate results $x$, $y$ (upwards).
end
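A serial sketch of this cell follows; the two phases are shown one after the other, whereas in the array they are pipelined, and all function and variable names are ours:

```python
import numpy as np

def v_cell(v, a_perp, c, s, a=None, x=0.0, y=0.0):
    """2x2 internal cell of the V-array: apply the column rotation
    (c, s) to the V-block and the stored a_perp components, and
    optionally accumulate the matrix-vector-product traffic."""
    G = np.array([[c, -s], [s, c]])         # column transformation
    block = np.vstack([a_perp, v]) @ G      # rows: a_perp, v_p, v_{p+1}
    a_perp_out, v = block[0], block[1:]
    if a is not None:                       # update intermediate results
        x += v[0, 0] * a[0] + v[1, 0] * a[1]
        y += v[0, 1] * a[0] + v[1, 1] * a[1]
    return v, a_perp_out, x, y
```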

By the end of Figure 4p, the first $a_\perp$-vector leaves the square array in a form directly amenable to the (modified Gentleman-Kung) triangular array.

4. Interlaced QR updating and SVD diagonalization

Finally, the modified Gentleman-Kung array and the triangular SVD array are easily combined (Figure 4q etc.). In each frame, column and row transformations corresponding to the SVD diagonalization are performed first (see also Figure 2), while in a second step, only row transformations are performed corresponding to the modified QR updating (affecting the $a_\perp$-components and the upper part of the $2 \times 2$ blocks, see also Figure 3). Again, column transformations in the first step should be applied to the $a_\perp$-components as well. Complete descriptions for boundary cells and internal cells are as follows.

Boundary cell, R-factor:

Compute rotation angles $\theta$, $\phi$ ($2 \times 2$ SVD on the main diagonal):
$(c_\theta, s_\theta, c_\phi, s_\phi) \leftarrow (\cos\theta, \sin\theta, \cos\phi, \sin\phi)$

Apply transformation:

$$\begin{bmatrix} r_{i,i} & r_{i,i+1} \\ 0 & r_{i+1,i+1} \end{bmatrix} \leftarrow \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \cdot \begin{bmatrix} r_{i,i} & r_{i,i+1} \\ 0 & r_{i+1,i+1} \end{bmatrix} \cdot \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}$$

Compute rotation angle $\beta$ (QR updating, annihilating the incoming $a_{\perp,i}$ against $r_{i,i}$):
$(c_\beta, s_\beta) \leftarrow (\cos\beta, \sin\beta)$

Apply transformation:

$$\begin{bmatrix} r_{i,i} & r_{i,i+1} \\ 0 & a_{\perp,i+1} \end{bmatrix} \leftarrow \begin{bmatrix} \cos\beta & \sin\beta \\ -\sin\beta & \cos\beta \end{bmatrix} \cdot \begin{bmatrix} r_{i,i} & r_{i,i+1} \\ a_{\perp,i} & a_{\perp,i+1} \end{bmatrix}$$

Propagate $a_{\perp,i+1}$.

Propagate transformation parameters:
$(c_{out}^\theta, s_{out}^\theta, c_{out}^\phi, s_{out}^\phi, c_{out}^\beta, s_{out}^\beta) \leftarrow (c_\theta, s_\theta, c_\phi, s_\phi, c_\beta, s_\beta)$
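A serial sketch of the boundary cell; numpy's 2×2 SVD stands in for the angle computations $\theta$, $\phi$, the QR angle $\beta$ is computed explicitly, and the names are ours:

```python
import numpy as np

def boundary_cell(R2, a2):
    """Diagonal cell: 2x2 SVD rotations (theta, phi), then the QR
    rotation (beta) annihilating a2[0] against r_ii."""
    Theta, _, Phit = np.linalg.svd(R2)        # SVD step on the 2x2 block
    R2 = Theta.T @ R2 @ Phit.T
    h = np.hypot(R2[0, 0], a2[0])
    c, s = R2[0, 0] / h, a2[0] / h            # QR rotation angle beta
    R2[0], a2 = c * R2[0] + s * a2, -s * R2[0] + c * a2
    return R2, a2                             # a2[0] is now (numerically) zero
```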

Internal cell, R-factor:

Input transformation parameters:
$(c_\theta, s_\theta, c_\phi, s_\phi, c_\beta, s_\beta) \leftarrow (c_{in}^\theta, s_{in}^\theta, c_{in}^\phi, s_{in}^\phi, c_{in}^\beta, s_{in}^\beta)$

Apply transformation (SVD step: row rotation over $\theta$, column rotation over $\phi$):

$$\begin{bmatrix} r_{p,q} & r_{p,q+1} \\ r_{p+1,q} & r_{p+1,q+1} \end{bmatrix} \leftarrow \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \cdot \begin{bmatrix} r_{p,q} & r_{p,q+1} \\ r_{p+1,q} & r_{p+1,q+1} \end{bmatrix} \cdot \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}$$

Apply transformation (QR step: row rotation over $\beta$, combining the first row with the $a_\perp$-components):

$$\begin{bmatrix} r_{p,q} & r_{p,q+1} \\ a_{\perp,q} & a_{\perp,q+1} \end{bmatrix} \leftarrow \begin{bmatrix} \cos\beta & \sin\beta \\ -\sin\beta & \cos\beta \end{bmatrix} \cdot \begin{bmatrix} r_{p,q} & r_{p,q+1} \\ a_{\perp,q} & a_{\perp,q+1} \end{bmatrix}$$

Propagate $a_{\perp,q}$, $a_{\perp,q+1}$ (one step downwards).

Propagate transformation parameters:
$(c_{out}^\theta, s_{out}^\theta, c_{out}^\phi, s_{out}^\phi, c_{out}^\beta, s_{out}^\beta) \leftarrow (c_\theta, s_\theta, c_\phi, s_\phi, c_\beta, s_\beta)$

Without disturbing the array operations, it is possible to output particular singular vectors (e.g., total least squares solutions [15]) at regular time intervals. This is easily done by performing matrix-vector multiplications $V \cdot t$, where $t$ is a vector with all its components equal to zero, except for one component equal to 1, and which is generated on the main diagonal. The $t$-vector is propagated upwards to the square array, where the matrix-vector product $V \cdot t$ is performed, which singles out the appropriate right singular vector. While $t$ is propagated upwards, intermediate results are propagated to the right, such that the resulting vector becomes available at the right-hand side of the array. These solution vectors can be generated at the same rate as the input data vectors are fed in, and both processes can run simultaneously without interference.
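A compact sketch of the same computation in matrix form, assuming the last column of $V$ tracks the right singular vector of the smallest singular value and its final component is nonzero (the helper name is ours):

```python
import numpy as np

def tls_solution(V):
    """Single out a right singular vector via V @ t and turn it into
    a total least squares solution, in the spirit of [15]."""
    n = V.shape[1]
    t = np.zeros(n)
    t[n - 1] = 1.0                  # the single "1", generated on the diagonal
    v_min = V @ t                   # last right singular vector
    return -v_min[:-1] / v_min[-1]  # TLS solution (assumes v_min[-1] != 0)
```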


4 Including re-orthogonalizations

In [12], it was shown how additional re-orthogonalizations stabilize the overall round-off error propagation in the updating scheme. In the algorithmic description of section 2, $T_i^{(k)}$ is an approximate re-orthogonalization and normalization of rows $p$ and $q$ in the V-matrix. The row indices $p$ and $q$ are chosen as a function of $k$ and $i$, in a cyclic manner. Furthermore, the re-orthogonalization scheme was shown to converge quadratically. In view of efficient parallel implementation, we first re-organize this re-orthogonalization scheme. The modified scheme is then easily mapped onto the systolic array. The computational overhead turns out to be negligible, as the square part of the array (V-matrix) so far remained under-loaded compared to the triangular part; see below for figures. First of all, as the re-orthogonalization scheme cyclically adjusts the row vectors in the V-matrix, it is straightforward to introduce additional row permutations in the square part of the array. The $2 \times 2$ blocks in the square part then correspond to column transformations (SVD scheme) and row permutations (re-orthogonalization scheme). Orthogonal column transformations clearly do not affect the norms and inner products of the rows, except for local rounding errors assumed smaller than the accumulated errors. Hence, the column transformations are assumed not to interfere with the re-orthogonalization, and thus need not be considered anymore. As for the row permutations, subsequent positions for the elements in the first column of V are indicated in Figure 5, for a (fairly) arbitrary initial row numbering (as an example, the $2 \times 2$ block in the upper left corner in Figure 5a interchanges elements 4 and 5, etc.). Let us now focus on one single row (row 1), and see how it can (approximately) be normalized and orthogonalized with respect to all other rows. Later on we will use this in an overall procedure.

1.- In a first step, the norm (squared) and inner products are computed as a matrix-vector product

$$x_1 \stackrel{def}{=} V \cdot v_1 = \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{bmatrix} \cdot v_1 \stackrel{def}{=} \begin{bmatrix} \gamma_{11} \\ \gamma_{12} \\ \vdots \\ \gamma_{1n} \end{bmatrix}$$

where $v_i^T$ is the $i$th row in $V$. On the systolic array, where $v_1$ initially resides


in the bottom row, it suffices to propagate the $v_1$-components upwards, and accumulate the inner products from the left to the right (Figures 5a-5h, where pq is shorthand for $\gamma_{pq}$). The resulting $x_1$-vector components run out at the right-hand side.

2.- In a second step, this $x_1$-vector is "back-propagated" to the left (Figures 5e-5t). Due to the permutations along the way, the $x_1$-components reach the left-hand side of the array at the right time and in the right place, such that

3.- in a third step, a correction vector can be computed as a matrix-vector product

$$y_1 \stackrel{def}{=} \begin{bmatrix} \frac{1}{2}(\gamma_{11} - 1) & \gamma_{12} & \ldots & \gamma_{1n} \end{bmatrix} \cdot \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{bmatrix}$$

where the term $\frac{1}{2}(\gamma_{11} - 1)$ corresponds to the first order term in the Taylor series expansion for the normalization of $v_1$. The $y_1$-vector components are accumulated from the bottom to the top, while $x_1$ is again being propagated to the right (Figures 5q-5w). Finally,

4.- in a fourth step, $v_1$, which meanwhile moved on to the top row of the array, is adjusted with $y_1$:

$$v_1^\perp \leftarrow v_1 - y_1$$

These operations are performed in the top row of the array (Figures 5t-5w). One can check that if

$$v_1 \cdot v_1 = 1 + O(\epsilon), \qquad v_1 \cdot v_p = O(\epsilon), \quad p = 2, \ldots, n$$

for some small $\epsilon \ll 1$, then

$$v_1^\perp \cdot v_1^\perp = 1 + O(\epsilon^2), \qquad v_1^\perp \cdot v_p = O(\epsilon^2), \quad p = 2, \ldots, n.$$
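The four steps are easy to check numerically; the following sketch applies them to one row of a nearly orthogonal $V$, without the systolic pipelining and tags (function name ours):

```python
import numpy as np

def reorthogonalize_row(V, i):
    """Steps 1-4 for row i: inner products, first-order normalization
    term, correction vector, and update v_i <- v_i - y_i."""
    x = V @ V[i]                  # step 1: gamma_i1, ..., gamma_in
    x[i] = 0.5 * (x[i] - 1.0)     # first-order Taylor term for the norm
    V[i] = V[i] - x @ V           # steps 3-4
    return V

rng = np.random.default_rng(0)
V = np.linalg.qr(rng.standard_normal((8, 8)))[0]
V += 1e-4 * rng.standard_normal((8, 8))          # orthogonality error O(eps)
err0 = np.abs(V @ V[0] - np.eye(8)[0]).max()
V = reorthogonalize_row(V, 0)
err1 = np.abs(V @ V[0] - np.eye(8)[0]).max()     # roughly err0**2-sized
```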

The above procedure for $v_1$ should now be repeated for rows 2, 3, etc., and furthermore everything should be pipelined. Obviously, one could start a similar procedure for $v_2$ in Figure 5e, for $v_3$ in Figure 5i, etc. The pipelining of such a scheme would be remarkably simple, but unfortunately there is something wrong with it. A slight modification is needed to make things work properly. As for the processing of $v_2$, one easily checks that the computed inner product $\gamma_{21}$ equals $v_2 \cdot v_1 = v_1 \cdot v_2 = \gamma_{12}$, while it should equal $v_2 \cdot v_1^\perp$. A similar problem occurs with $\gamma_{31}$ and $\gamma_{32}$, etc. In general, problems occur when computing inner products with ascending rows, which still have to be adjusted in the top row of the array before the relevant inner product can be computed. This problem is readily solved as follows. Instead of computing inner products with all other rows, we only take descending rows into account. This is easily done by assigning tags to the rows, where, e.g., a 0-tag indicates an ascending row and a 1-tag indicates a descending row. Tags are reset at the top and the bottom of the array. Computing an $x_i$-vector is then done as follows:

$$x_i \stackrel{def}{=} V \cdot v_i = \begin{bmatrix} TAG_1 \cdot v_1^T \\ TAG_2 \cdot v_2^T \\ \vdots \\ TAG_n \cdot v_n^T \end{bmatrix} \cdot v_i \stackrel{def}{=} \begin{bmatrix} \gamma_{i1} \\ \gamma_{i2} \\ \vdots \\ \gamma_{in} \end{bmatrix}$$

For the $8 \times 8$ example of Figure 5, one can check that the result of this is that $v_1$ is orthogonalized onto $v_2$, $v_3$, $v_4$ and $v_5$ ($\gamma_{16} = \gamma_{17} = \gamma_{18} = 0$, because rows 6, 7 and 8 are ascending rows at that time, i.e. $TAG_6 = TAG_7 = TAG_8 = 0$). Similarly $v_2$ is orthogonalized onto $v_3$, $v_4$, $v_5$ and $v_6$ ($\gamma_{27} = \gamma_{28} = \gamma_{21} = 0$), etc. The resulting ordering is recast as follows (pp refers to the normalization of row p, whereas pq for $p \neq q$ refers to the orthogonalization of row p onto row q):

11 12 13 14 15
22 23 24 25 26
33 34 35 36 37
44 45 46 47 48
55 56 57 58 51
66 67 68 61 62
77 78 71 72 73
88 81 82 83 84
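A small sketch that generates this sweep ordering for general (even) n; the helper name is ours:

```python
def sweep_pairs(n):
    """(p, q) pairs of one re-orthogonalization sweep: row p is
    normalized (q == p) and orthogonalized onto the n/2 cyclically
    succeeding rows, as in the 8x8 table above (1-based indices)."""
    return [(p, (p + j - 1) % n + 1) for p in range(1, n + 1)
            for j in range(n // 2 + 1)]

# sweep_pairs(8)[:5] -> [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5)]
```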

In such a sweep, each combination appears at least once (whereby $4 = n/2$ combinations appear twice, e.g., 15 and 51, etc.). In the general case, the rows are cyclically normalized and orthogonalized onto the $n/2$ succeeding rows. One can easily prove that if $\|V V^T - I\|_F = O(\epsilon)$ before a particular sweep, then $\|V V^T - I\|_F = O(\epsilon^2)$ after the sweep. In other words, the quadratic convergence rate is maintained. On the other hand, it is seen that one single sweep takes twice the computation time of a sweep in a "normal" cyclic-by-rows or odd-even ordering. This is hardly an objection, as the re-orthogonalization is only meant to keep $V$ reasonably close to orthogonal. The error analysis in [12] thus still applies (with slightly adjusted constant coefficients). Complete processor descriptions are left out for the sake of brevity. Let it suffice to give an operation count for the different kinds of processors; see Table 1, from which it follows that the re-orthogonalizations do not increase the load of the critical cells.

                          SVD updating       Re-orthog.   Total
Internal cell V-matrix    16× 10+            8× 8+        24× 18+
Internal cell R-matrix    28× 14+            -            28× 14+
Diagonal cell R-matrix    25× 13+ 9÷ 4√      -            25× 13+ 9÷ 4√

Table 1

Finally, as the rows of $V$ continuously interchange, each input vector $a$ in Figure 1 should be permuted accordingly before multiplication (see section 2: $a^T \cdot V = (a^T \cdot P^T) \cdot (P \cdot V)$, where $P$ is a permutation matrix). One can straightforwardly design a kind of "preprocessor" for $a$, that outputs the right components of $a$ at the right time. For the sake of brevity, we will not go into details here.

5 Square root free algorithms

The throughput in the parallel SVD updating array is essentially determined by the computation times for the processors on the main diagonal to calculate the rotation angles, both for the QR updatings and the SVD steps. In general these computations require respectively one and three square roots, which appears to be the main computational bottleneck. Gentleman developed a square root free procedure for QR updating [1, 4, 7], where use is made of a (one-sided) factorization of the R-matrix. The SVD schemes as such, however, do not lend themselves to square root free implementation. Still, in [3] a few alternative SVD schemes have been investigated, based on approximate formulas for the computation of either $\tan\theta$ or $\tan\phi$. When combined with a (generalized) Gentleman procedure with a two-sided factorization of the R-factor, these schemes eventually yield square root free SVD updating algorithms. Implementation on a systolic array hardly imposes any changes when compared to the conventional algorithm. Furthermore, as the approximate formulas for the rotation angles are in fact (at least) first order approximations, the (first order) performance analysis in [12] still applies. In other words, the same upper bounds for the tracking error are valid, even when approximate formulas are used.

1. Square root free SVD computations

The SVD procedure is seen to reduce to solving elementary $2 \times 2$ SVDs on the main diagonal (section 2). For an approximate SVD computation, the relevant transformation formula becomes

$$\begin{bmatrix} r_{i,i}^* & r_{i,i+1}^* \\ 0 & r_{i+1,i+1}^* \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \cdot \begin{bmatrix} r_{i,i} & r_{i,i+1} \\ 0 & r_{i+1,i+1} \end{bmatrix} \cdot \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}$$

where $r_{i,i+1}$ is only approximately being annihilated ($|r_{i,i+1}^*| \ll |r_{i,i+1}|$). In particular, the following approximate schemes from [3] turn out to be very useful for our purpose. (For details, we refer to [3].)

If $|r_{i,i}| \leq |r_{i+1,i+1}|$:

$$\rho = \frac{r_{i+1,i+1} \, r_{i,i+1}}{r_{i,i}^2 - r_{i+1,i+1}^2 + r_{i,i+1}^2}$$

approximation 1: $\tan\phi = \rho$
approximation 2: $\tan\phi = \dfrac{\rho}{1 + \rho^2}$

and in both cases

$$\tan\theta = \frac{r_{i+1,i+1} \tan\phi + r_{i,i+1}}{r_{i,i}}.$$

If $|r_{i,i}| \geq |r_{i+1,i+1}|$:

$$\rho = \frac{r_{i,i} \, r_{i,i+1}}{r_{i+1,i+1}^2 - r_{i,i}^2 + r_{i,i+1}^2}$$

approximation 1: $\tan\theta = \rho$
approximation 2: $\tan\theta = \dfrac{\rho}{1 + \rho^2}$

and in both cases

$$\tan\phi = \frac{r_{i,i} \tan\theta - r_{i,i+1}}{r_{i+1,i+1}}.$$
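A sketch of the approximate angle computation for the case $|r_{i,i}| \leq |r_{i+1,i+1}|$, following the formulas as reconstructed above (the exact constants should be checked against [3]; the function name is ours):

```python
import numpy as np

def approx_rotations(rii, rij, rjj, scheme=1):
    """Approximate (tan(phi), tan(theta)) for the 2x2 triangular SVD
    step with |rii| <= |rjj|; no square roots are needed."""
    rho = rjj * rij / (rii**2 - rjj**2 + rij**2)
    tan_phi = rho if scheme == 1 else rho / (1.0 + rho**2)
    tan_theta = (rjj * tan_phi + rij) / rii
    return tan_phi, tan_theta
```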

In the sequel, we consider only $|r_{i,i}| \leq |r_{i+1,i+1}|$, as the derived formulas can straightforwardly be adapted for the other case. These approximate schemes still require two square roots, for the computation of $\cos\theta$ and $\cos\phi$. The above approximate formulas can however be combined with a (generalized) Gentleman procedure, where use is made of a two-sided factorization of the R-matrix

$$R = D_{row}^{1/2} \cdot \bar R \cdot D_{col}^{1/2}$$

and where only $\bar R$, $D_{row}$ and $D_{col}$ are stored (in the sequel, an overbar always refers to a factorization). Let us first rewrite the first approximate formula for $\tan\phi$:

$$\tan\phi = \underbrace{\left( \frac{r_{i+1,i+1}^2}{r_{i,i}^2 - r_{i+1,i+1}^2 + r_{i,i+1}^2} \right)}_{\nu} \cdot \frac{r_{i,i+1}}{r_{i+1,i+1}}$$

As $\nu$ contains only squared values, it can be computed from the factorization of $R$ as well:

$$\nu = \frac{d_{i+1}^{row} \, d_{i+1}^{col} \, \bar r_{i+1,i+1}^2}{d_i^{row} \, d_i^{col} \, \bar r_{i,i}^2 - d_{i+1}^{row} \, d_{i+1}^{col} \, \bar r_{i+1,i+1}^2 + d_i^{row} \, d_{i+1}^{col} \, \bar r_{i,i+1}^2}$$

Obviously, a similar formula for $\nu$ can be derived from the second approximate formula for $\tan\phi$ (which has better convergence properties, see [3]). Applying Gentleman's procedure to the row transformation, with a row permutation absorbed into the rotation, then gives

$$\begin{bmatrix} -\sin\theta & \cos\theta \\ \cos\theta & \sin\theta \end{bmatrix} \cdot \begin{bmatrix} \sqrt{d_i^{row}} & 0 \\ 0 & \sqrt{d_{i+1}^{row}} \end{bmatrix} = \begin{bmatrix} \cos\theta \sqrt{d_{i+1}^{row}} & 0 \\ 0 & \cos\theta \sqrt{d_i^{row}} \end{bmatrix} \cdot \begin{bmatrix} -\tan\bar\theta & 1 \\ 1 & \tan\bar\theta \cdot \frac{d_{i+1}^{row}}{d_i^{row}} \end{bmatrix}$$

with

$$\tan\bar\theta = \tan\theta \cdot \sqrt{\frac{d_i^{row}}{d_{i+1}^{row}}}.$$

This leads to row transformation formulas

$$\begin{bmatrix} \bar r_{i,i}^* & \bar r_{i,i+1}^* \\ \bar r_{i+1,i}^* & \bar r_{i+1,i+1}^* \end{bmatrix} = \begin{bmatrix} -\tan\bar\theta & 1 \\ 1 & \tan\bar\theta \cdot \frac{d_{i+1}^{row}}{d_i^{row}} \end{bmatrix} \cdot \begin{bmatrix} \bar r_{i,i} & \bar r_{i,i+1} \\ 0 & \bar r_{i+1,i+1} \end{bmatrix}$$

with scale factor updating (notice the implicit row permutation)

$$d_i^{row*} = \cos^2\theta \cdot d_{i+1}^{row}, \qquad d_{i+1}^{row*} = \cos^2\theta \cdot d_i^{row}, \qquad \cos^2\theta = \frac{1}{1 + \tan^2\bar\theta \cdot \frac{d_{i+1}^{row}}{d_i^{row}}}.$$
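A quick numerical check that the permuted rotation, the unit-entry transformation matrix and the swapped scale factors above are mutually consistent; this is our verification of the reconstructed formulas, not part of the original text:

```python
import numpy as np

theta, d = 0.3, np.array([2.0, 5.0])            # d_i^row, d_{i+1}^row
t_bar = np.tan(theta) * np.sqrt(d[0] / d[1])    # modified tangent
M = np.array([[-t_bar, 1.0],
              [1.0, t_bar * d[1] / d[0]]])      # unit-entry transformation
d_new = np.cos(theta)**2 * d[::-1]              # swapped, scaled factors
G = np.array([[-np.sin(theta), np.cos(theta)],  # permuted row rotation
              [ np.cos(theta), np.sin(theta)]])
# D_new^(1/2) . M equals the permuted rotation applied to D^(1/2):
assert np.allclose(np.diag(np.sqrt(d_new)) @ M, G @ np.diag(np.sqrt(d)))
```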

Note that due to the 1's in the first transformation formula, a 50% saving in the number of multiplications is obtained in the off-diagonal processors. The column transformation should then annihilate $r_{i+1,i}$ in order to preserve the triangular structure. With

$$\tan\phi = \frac{\bar r_{i+1,i+1}}{\bar r_{i+1,i}} \cdot \sqrt{\frac{d_{i+1}^{col}}{d_i^{col}}}$$

we can again apply Gentleman's procedure, as follows:

$$\begin{bmatrix} \sqrt{d_i^{col}} & 0 \\ 0 & \sqrt{d_{i+1}^{col}} \end{bmatrix} \cdot \begin{bmatrix} -\sin\phi & \cos\phi \\ \cos\phi & \sin\phi \end{bmatrix} = \begin{bmatrix} -\tan\bar\phi & 1 \\ 1 & \tan\bar\phi \cdot \frac{d_{i+1}^{col}}{d_i^{col}} \end{bmatrix} \cdot \begin{bmatrix} \cos\phi \sqrt{d_{i+1}^{col}} & 0 \\ 0 & \cos\phi \sqrt{d_i^{col}} \end{bmatrix}$$

with

$$\tan\bar\phi = \tan\phi \cdot \sqrt{\frac{d_i^{col}}{d_{i+1}^{col}}} = \frac{\bar r_{i+1,i+1}}{\bar r_{i+1,i}},$$

which then leads to column transformation formulas

$$\begin{bmatrix} \bar r_{i,i}^* & \bar r_{i,i+1}^* \\ 0 & \bar r_{i+1,i+1}^* \end{bmatrix} = \begin{bmatrix} \bar r_{i,i} & \bar r_{i,i+1} \\ \bar r_{i+1,i} & \bar r_{i+1,i+1} \end{bmatrix} \cdot \begin{bmatrix} -\frac{\bar r_{i+1,i+1}}{\bar r_{i+1,i}} & 1 \\ 1 & \frac{\bar r_{i+1,i+1}}{\bar r_{i+1,i}} \cdot \frac{d_{i+1}^{col}}{d_i^{col}} \end{bmatrix}$$

with scale factor updating (again with an implicit permutation)

$$d_i^{col*} = \cos^2\phi \cdot d_{i+1}^{col}, \qquad d_{i+1}^{col*} = \cos^2\phi \cdot d_i^{col}, \qquad \cos^2\phi = \frac{1}{1 + \frac{\bar r_{i+1,i+1}^2 \, d_{i+1}^{col}}{\bar r_{i+1,i}^2 \, d_i^{col}}}.$$

2. Square root free SVD updating

The above approximate SVD schemes straightforwardly combine with the square root free QR updating procedure into a square root free SVD updating procedure. At a certain time step, the data matrix is reduced to $R$, triangular and stored in factorized form

$$R = D_{row}^{1/2} \cdot \bar R \cdot D_{col}^{1/2}.$$

Furthermore, the same column scaling is applied to the V-matrix,

$$\bar V = V \cdot D_{col}^{-1/2},$$

where $\bar V$ is stored instead of $V$. The reason for this is twofold. First, the column rotations to be applied to the V-matrix are computed as modified Givens rotations. Explicitly applying these transformations to an unfactorized $V$ would then necessarily require square roots. Second, a new row vector $a^T$ to be updated immediately gets the correct column scaling from the matrix-vector product $a^T \cdot \bar V$, so that the QR updating can then be carried out as if there were no column scaling at all. The updating can indeed be described as follows:

$$\begin{bmatrix} \lambda A \\ a^T \end{bmatrix} = \begin{bmatrix} U & 0 \\ 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} \lambda \cdot D_{row}^{1/2} \cdot \bar R \cdot D_{col}^{1/2} \\ a^T \cdot (\bar V \cdot D_{col}^{1/2}) \end{bmatrix} \cdot V^T = \begin{bmatrix} U & 0 \\ 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} \lambda \cdot D_{row}^{1/2} \cdot \bar R \\ a^T \cdot \bar V \end{bmatrix} \cdot D_{col}^{1/2} \cdot V^T = \text{etc.}$$
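A bookkeeping sketch of the factored storage; the sign convention for $\bar V$ is the one reconstructed above and should be read as an assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
R = np.triu(rng.standard_normal((n, n)))
V = np.linalg.qr(rng.standard_normal((n, n)))[0]
d_row, d_col = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n)

R_bar = np.diag(d_row**-0.5) @ R @ np.diag(d_col**-0.5)   # stored factor
V_bar = V @ np.diag(d_col**-0.5)                          # column-scaled V

# R is recovered from the factorization, and a^T V_bar carries exactly
# the column scaling of R_bar, so the QR update needs no extra scaling:
assert np.allclose(np.diag(np.sqrt(d_row)) @ R_bar @ np.diag(np.sqrt(d_col)), R)
a = rng.standard_normal(n)
assert np.allclose(a @ V_bar, (a @ V) @ np.diag(d_col**-0.5))
```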

The factor in the midst of this expression can then be reduced to a triangular factor by QR updating, making use of modified Givens rotations. The further reduction of the resulting triangular factor can be carried out next, as detailed in the previous section. From the above explanation, it follows that on a systolic array a square root free updating algorithm imposes hardly any changes. The diagonal matrices $D_{row}$ and $D_{col}$ are obviously stored in the processor elements on the main diagonal, and $\bar R$ and $\bar V$ are stored instead of $R$ and $V$. The matrix-vector product $\bar a_\perp^T = a^T \cdot \bar V$ is computed in the square part, and the $\bar R$-factor is updated with $\bar a_\perp^T$ next, much like the R-factor was updated with $a_\perp^T = a^T \cdot V$ in the original algorithm. All other operations are carried out much the same way, albeit that modified rotations are used throughout. When re-orthogonalizations are included, it is necessary to propagate the scale factors to the square array, along with the column transformations, such that the norms and inner products can be computed consistently. The rest is straightforward. Finally, an operation count for a square root free implementation is exhibited in Table 2. Note that the operation count for the diagonal processors heavily depends on the specific implementation. We refer to the literature for various (more efficient) implementations [1, 7]. The operation count for V-processors remains unchanged (as compared to Table 1), while the computational load for the diagonal processors is reduced to roughly the same level, apart from the divisions. The internal R-processors are seen to be under-loaded this time, due to the reduction in the number of multiplications.

                          SVD updating     Re-orthog.   Total
Internal cell V-matrix    10× 10+          12× 8+       22× 18+
Internal cell R-matrix    14× 14+          -            14× 14+
Diagonal cell R-matrix    35× 17+ 9÷       -            35× 17+ 9÷

Table 2

6 Conclusion

An approximate SVD updating procedure was mapped onto a systolic array with O(n²) parallelism for O(n²) complexity. By combining modified Givens rotations with approximate schemes for the computation of rotation angles in the SVD steps, all square roots can be avoided. In this way, a main computational bottleneck for the array implementation can be overcome.

Acknowledgement: Marc Moonen is a senior research assistant with the Belgian N.F.W.O. (National Fund for Scientific Research). This work was sponsored in part by the BRA 3280 project of the EC.

References

[1] BARLOW J.L., IPSEN I.C.F., 1987. Scaled Givens rotations for the solution of linear least squares problems on systolic arrays. SIAM J. Sci. Stat. Comp., Vol. 8, No. 5, pp. 716-733.

[2] BRENT R.P., LUK F.T., 1985. The solution of singular value and symmetric eigenvalue problems on multiprocessor arrays. SIAM J. Sci. Stat. Comp., Vol. 6, No. 1, pp. 69-84.

[3] CHARLIER J.P., VANBEGIN M., VAN DOOREN P., 1988. On efficient implementations of Kogbetliantz's algorithm for computing the singular value decomposition. Num. Math., Vol. 52, pp. 279-300.

[4] GENTLEMAN W.M., 1973. Least squares computations by Givens transformations without square roots. J. Inst. Maths Applics., Vol. 12, pp. 329-336.

[5] GENTLEMAN W.M., KUNG H.T., 1981. Matrix triangularization by systolic arrays. Real-Time Signal Processing IV, Proc. SPIE, Vol. 298, pp. 19-26.

[6] GILL P.E., GOLUB G.H., MURRAY W., SAUNDERS M.A., 1974. Methods for modifying matrix factorizations. Math. Comp., Vol. 28, No. 126, pp. 505-535.

[7] HAMMARLING S., 1974. A note on modifications to the Givens plane rotations. J. Inst. Maths Applics., Vol. 13, pp. 215-218.

[8] HEATH M.T., LAUB A.J., PAIGE C.C., WARD R.C., 1986. Computing the singular value decomposition of a product of two matrices. SIAM J. Sci. Stat. Comp., Vol. 7, No. 4, pp. 1147-1159.

[9] KOGBETLIANTZ E., 1955. Solution of linear equations by diagonalization of coefficient matrices. Quart. Appl. Math., Vol. 13, pp. 123-132.

[10] LUK F.T., 1986. A triangular processor array for computing singular values. Lin. Alg. Appl., Vol. 77, pp. 259-273.

[11] LUK F.T., PARK H., 1987. On the equivalence and convergence of parallel Jacobi SVD algorithms. Proc. SPIE, Vol. 826, Advanced Algorithms and Architectures for Signal Processing II, F.T. Luk, Ed., pp. 152-159.

[12] MOONEN M., VAN DOOREN P., VANDEWALLE J. An SVD updating algorithm for subspace tracking. To appear in SIAM J. Matrix Anal. Appl.

[13] SPEISER J.M., 1986. Signal processing computational needs. Proc. SPIE, Vol. 696, Advanced Algorithms and Architectures for Signal Processing, J.M. Speiser, Ed., pp. 2-6.

[14] STEWART G.W., 1985. A Jacobi-like algorithm for computing the Schur decomposition of a nonhermitian matrix. SIAM J. Sci. Stat. Comp., Vol. 6, No. 4, pp. 853-863.

[15] VAN HUFFEL S., VANDEWALLE J., 1987. Algebraic relations between classical regression and total least squares estimation. Lin. Alg. Appl., Vol. 93, pp. 149-162.


Captions

Figure 1: Overview of the SVD updating array.
Figure 2: SVD array.
Figure 3: Modified Gentleman-Kung array.
Figure 4: SVD updating array.
Figure 5: Re-orthogonalizations.


[Figure 1: block diagram; input vectors enter the square array (V-matrix) on the left, the square array is coupled to the triangular array (R-factor) below, and output vectors leave at the right-hand side of the square array.]

[Figure 2: snapshots (a)-(g) of the SVD array (8×8 example); pivot indices i = 1, ..., 4 are active in parallel, and one pipelined rotation sequence is marked with double frames.]

[Figure 3: snapshots (a)-(g) of the modified Gentleman-Kung QR updating array; skewed data vectors propagate downwards while the pivot index runs through i = 1, ..., 4.]

[Figure 4: snapshots (a)-(x) of the combined SVD updating array; the column transformations Φ₁^(k−5) through Φ₁^(k) are generated on the main diagonal and later applied to the downward-propagating a⊥-components.]

[Figure 5: snapshots (a)-(x) of the re-orthogonalization scheme; each panel lists the current positions of the elements of the first column of V (initial ordering 4 5 6 3 7 8 2 1 in panel (a)) together with the inner products γ_pq, shown as pq, accumulating across the array.]
