Algorithmic Engineering Applied to the QR-RLS Adaptive Algorithm

M Harteneck, R W Stewart
Signal Processing Division, Department of Electrical & Electronic Engineering, University of Strathclyde, Glasgow, G1 1XW, Scotland, U.K.
J G McWhirter, I K Proudler
DRA Malvern, St Andrews Road, Great Malvern, Worcestershire, WR14 3PS, U.K.

Abstract

In this paper the technique of algorithmic engineering is used to reduce the computational complexity of a QR-RLS adaptive filtering algorithm with parallel weight extraction. As simple transformations of a signal-flow-graph representation are used for the derivation, complex mathematics is almost completely avoided.
1 Introduction

Adaptive finite impulse response (FIR) filters are used in many real-world applications where the environment in which the filter has to operate is time-varying or not known a priori. Such problems include acoustic echo cancellation, acoustic noise control and on-line modeling of unknown or time-varying plants. An Nth-order adaptive FIR filter aims to linearly combine N delayed samples of an input signal x(k), which could originate from an input to an unknown plant, in such a way that the corresponding output ŷ(k) matches as closely as possible a desired signal y(k), which could be the output of the unknown plant. Adaptive system identification tries to determine the parameters of an unknown and/or time-changing transfer function and, once these parameters are specified, gives a complete characterization of the plant. These parameters can then be used for on/off-line controller design or as models of acoustic transfer paths in active noise control systems [1]. Usually, the gradient-based least mean squares (LMS) algorithm [2] is used because of its simplicity and low computational complexity of O(N). However, if rapid convergence is required then a recursive least squares (RLS) algorithm [3] is more advisable. Some drawbacks of the standard RLS algorithm are its high computational complexity of O(N²) and that it operates on the autocorrelation
matrix of the input signal x(k), and therefore needs a high dynamic range, which can be problematic in a fixed-point environment. Another algorithm to minimise the least squares criterion is the QR-RLS algorithm [3], which performs a QR decomposition [4] of the input data matrix and is therefore better conditioned for use in a fixed-point environment, but still needs about the same computational complexity. One approach to reduce the computational complexity is to develop a "fast" algorithm with the use of algorithmic engineering techniques [5], i.e. by simple transformations of a signal-flow-graph representation of the algorithm, as done for the single- and multi-channel QR-RLS algorithm in [6]. In this paper algorithmic engineering techniques are used to reduce the computational complexity of the QR-RLS algorithm with parallel weight extraction [3]. The savings are achieved by simple transformations which make the triangular postprocessor superfluous while maintaining the same input-output relationship. Therefore, in Section 2, the QR-RLS algorithm is reviewed and the properties important for the derivation are explained. Finally, in Section 3, the derivation is presented.
2 Review of the Adaptive QR-RLS Algorithm

An adaptive algorithm for finite impulse response filters tries to predict a desired signal y(k), where k is the discrete time, with a linear combination of delayed versions of the input signal x(k). The prediction ŷ(k) is therefore formed as

$$\hat{y}(k) = \sum_{i=0}^{N-1} w_i(k)\,x(k-i), \qquad (2.1)$$

where {w_i(k)} is the set of adaptive weights and N is the number of taps. The prediction problem can now be stated as the requirement to find the optimum set of weights such that the least-squares performance criterion ξ(k), which is defined as
$$\xi(k) = \sum_{i=0}^{k-1} \lambda^i e^2(k-i), \qquad (2.2)$$

is minimized at every time step k, with the error e(k) being defined as

$$e(k) = y(k) - \hat{y}(k) \qquad (2.3)$$

and λ being a forgetting factor which allows the algorithm to adapt to changing environments and ensures convergence in a fixed-point environment. The forgetting factor is usually chosen to be slightly less than 1. One possibility to solve this problem is to write it as an Nth-order linear regression and then solve it by applying a QR decomposition [4].
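As a concrete illustration of (2.1)–(2.3), the prediction and the exponentially weighted cost can be written in a few lines of Python. This sketch is ours and not part of the original paper; the function names are hypothetical.

```python
import numpy as np

def fir_predict(w, x_hist):
    """Prediction y_hat(k) of (2.1); x_hist = [x(k), x(k-1), ..., x(k-N+1)]."""
    return float(np.dot(w, x_hist))

def ls_cost(errors, lam):
    """Exponentially weighted criterion xi(k) of (2.2).

    errors = [e(1), ..., e(k)] in time order; lam is the forgetting factor.
    """
    errors = np.asarray(errors, dtype=float)
    k = len(errors)
    weights = lam ** np.arange(k)       # lambda^i weights e(k-i), i = 0 .. k-1
    return float(np.sum(weights * errors[::-1] ** 2))
```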
To write the above problem in matrix-vector notation, the following definitions are necessary:

$$\mathbf{w}(k) = [w_0(k)\;\, w_1(k)\;\, \dots\;\, w_{N-1}(k)]^T, \qquad (2.4a)$$

$$\mathbf{x}(k) = [x(k)\;\, x(k-1)\;\, \dots\;\, x(k-N+1)]^T, \qquad (2.4b)$$

$$\mathbf{y}(k) = [y(1)\;\, y(2)\;\, \dots\;\, y(k)]^T, \qquad (2.4c)$$

$$\mathbf{e}(k) = [e(1)\;\, e(2)\;\, \dots\;\, e(k)]^T, \qquad (2.4d)$$

$$\Lambda_k = \mathrm{diagonal}\!\left(\lambda^{k-1},\, \lambda^{k-2},\, \dots,\, \lambda,\, 1\right), \qquad (2.4e)$$

where "T" denotes matrix transpose. Using these definitions, (2.1) can be written as

$$\hat{y}(k) = \mathbf{x}^T(k)\,\mathbf{w}(k) \qquad (2.5)$$

and (2.2) can be rewritten as

$$\xi(k) = \sum_{i=0}^{k-1} \lambda^i e^2(k-i) = \mathbf{e}^T(k)\,\Lambda_k\,\mathbf{e}(k) = \left\|\Lambda_k^{\frac{1}{2}}\,\mathbf{e}(k)\right\|^2 = \left\|\,\Lambda_k^{\frac{1}{2}} \underbrace{\begin{bmatrix} y(1)\\ y(2)\\ \vdots\\ y(k) \end{bmatrix}}_{\mathbf{y}(k)} - \Lambda_k^{\frac{1}{2}} \underbrace{\begin{bmatrix} \mathbf{x}^T(1)\\ \mathbf{x}^T(2)\\ \vdots\\ \mathbf{x}^T(k) \end{bmatrix}}_{\mathbf{A}(k)} \mathbf{w}(k)\,\right\|^2 = \left\|\Lambda_k^{\frac{1}{2}}\,\mathbf{y}(k) - \Lambda_k^{\frac{1}{2}}\,\mathbf{A}(k)\,\mathbf{w}(k)\right\|^2. \qquad (2.6)$$

To minimize (2.6), the vector inside the Euclidean norm (‖·‖) is premultiplied with an orthogonal rotation matrix Q(k) which is calculated in such a way that

$$Q(k)\,\Lambda_k^{\frac{1}{2}}\,\mathbf{A}(k) = \begin{bmatrix} \mathbf{R}(k) \\ \mathbf{0}(k) \end{bmatrix}, \qquad (2.7)$$

where R(k) is an N×N upper-triangular matrix and 0(k) is a (k−N)×N zero matrix, i.e. the QR decomposition [4]. Note that the norm of a vector does not change if it is premultiplied with an orthogonal matrix. By premultiplying the vector with Q(k) we get

$$\xi(k) = \left\| Q(k)\,\Lambda_k^{\frac{1}{2}}\,\mathbf{y}(k) - Q(k)\,\Lambda_k^{\frac{1}{2}}\,\mathbf{A}(k)\,\mathbf{w}(k) \right\|^2 = \left\| \begin{bmatrix} \mathbf{p}(k) \\ \mathbf{v}(k) \end{bmatrix} - \begin{bmatrix} \mathbf{R}(k) \\ \mathbf{0}(k) \end{bmatrix}\mathbf{w}(k) \right\|^2, \qquad (2.8)$$

where the vector Q(k)Λ_k^{1/2} y(k) is partitioned in a suitable way into the vectors p(k) and v(k).
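For reference, the weighted least-squares problem of (2.6)–(2.8) can be checked in block (non-recursive) form with a standard QR routine. This sketch is ours, uses NumPy, and assumes k ≥ N; the function name is hypothetical.

```python
import numpy as np

def qr_ls_weights(A, y, lam):
    """Block solution of min_w ||Lambda_k^(1/2) (y - A w)||^2, cf. (2.6)-(2.8).

    A is the k x N data matrix A(k) (k >= N), y the stacked desired
    signal y(k), and lam the forgetting factor lambda.
    """
    k = len(y)
    # Diagonal of Lambda_k^(1/2): lambda^((k-1)/2), ..., lambda^(1/2), 1.
    d = lam ** (np.arange(k - 1, -1, -1) / 2.0)
    Q, R = np.linalg.qr(d[:, None] * A)   # R(k) is N x N upper-triangular, (2.7)
    p = Q.T @ (d * y)                     # p(k) from the partition in (2.8)
    return np.linalg.solve(R, p)          # w(k) = R^(-1)(k) p(k)
```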
Figure 1. Signal-Flow-Graph representation of the canonical least squares processor with parallel weight extraction for 4 weights.

One common approach is to construct the orthogonal rotation matrix Q(k) from a series of Givens rotations [7, 4]. These rotations are defined as
$$\begin{bmatrix} a'_{0,0} & a'_{0,1} \\ 0 & a'_{1,1} \end{bmatrix} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix} \begin{bmatrix} a_{0,0} & a_{0,1} \\ a_{1,0} & a_{1,1} \end{bmatrix}, \qquad (2.9)$$

where c and s are the rotation parameters, defined as

$$c = \frac{a_{0,0}}{\sqrt{a_{0,0}^2 + a_{1,0}^2}}, \qquad (2.10a)$$

$$s = \frac{a_{1,0}}{\sqrt{a_{0,0}^2 + a_{1,0}^2}}. \qquad (2.10b)$$
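A direct, unoptimised rendering of (2.9) and (2.10) in Python might look as follows. This is our illustrative sketch; the zero-input guard mirrors the boundary-cell convention of Fig. 2 and the function names are ours.

```python
import numpy as np

def givens(a00, a10):
    """Rotation parameters c, s of (2.10), chosen to annihilate a10 in (2.9)."""
    norm = np.hypot(a00, a10)     # sqrt(a00^2 + a10^2) without overflow
    if norm == 0.0:
        return 1.0, 0.0           # nothing to rotate
    return a00 / norm, a10 / norm

def apply_givens(c, s, row0, row1):
    """Apply [[c, s], [-s, c]] to two stacked rows, as in (2.9)."""
    return c * row0 + s * row1, -s * row0 + c * row1
```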
If required, the optimal weight vector can now be calculated from (2.8) via backsubstitution as
R(k)w(k) = p(k) , w(k) = R? (k)p(k): 1
(2.11)
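Backsubstitution in (2.11) exploits the upper-triangular structure of R(k); a minimal sketch (ours, not from the paper):

```python
import numpy as np

def back_substitute(R, p):
    """Solve R w = p for upper-triangular R, cf. (2.11)."""
    N = len(p)
    w = np.zeros(N)
    for i in range(N - 1, -1, -1):
        w[i] = (p[i] - R[i, i + 1:] @ w[i + 1:]) / R[i, i]
    return w
```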
As shown in [3], it is possible to perform the decomposition of (2.7) in a time-recursive fashion and to extract the weight vector w(k), i.e. perform the backsubstitution, by using the canonical least squares processor, as shown in Fig. 1 for the case of 4 adaptive weights (N = 4). The processor consists of the interconnection of two equidimensional components with distinct but coupled functions. The first is the standard triangular array processor (left of the line A–B), consisting of boundary and internal cells of the types (a) and (d) shown in Fig. 2, whose input signals are x(k) and y(k), and which calculates

$$\begin{bmatrix} \mathbf{R}(k) & \mathbf{p}(k) \\ \mathbf{0}^T & r(k) \\ \mathbf{0}^T & 0 \end{bmatrix} = Q(k) \begin{bmatrix} \lambda^{\frac{1}{2}}\,\mathbf{R}(k-1) & \lambda^{\frac{1}{2}}\,\mathbf{p}(k-1) \\ \mathbf{0}^T & \lambda^{\frac{1}{2}}\,r(k-1) \\ \mathbf{x}^T(k) & y(k) \end{bmatrix}. \qquad (2.12)$$

The processing cells of Fig. 2 are defined as follows, where u_i and u_o denote a cell's input and output and r its stored value:

(a) Internal Cell 1: u_o = c u_i − s λ^{1/2} r;  r ← s u_i + c λ^{1/2} r.

(b) Internal Cell 2: u_o = c u_i − s λ^{−1/2} r;  r ← s u_i + c λ^{−1/2} r.

(c) Downdating Cell: u_i = (1/c)(u_o + s λ^{1/2} r);  r ← s u_i + c λ^{1/2} r.

(d) Boundary Cell: if u_i = 0 then c = 1, s = 0, r ← λ^{1/2} r, δ_o = c δ_i; otherwise c = λ^{1/2} r (λ r² + u_i²)^{−1/2}, s = u_i (λ r² + u_i²)^{−1/2}, r ← (λ r² + u_i²)^{1/2}, δ_o = c δ_i.

Figure 2. Processing Cells.
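To make the recursion (2.12) concrete, the following sketch (ours, a sequential rendering rather than the systolic array) performs one time step in NumPy. Each pass through the loop plays the role of one boundary cell (d) computing c and s, followed by a row of internal cells (a) applying them.

```python
import numpy as np

def qr_rls_update(R, p, x, y, lam):
    """One time step of (2.12): annihilate the new data row [x^T(k), y(k)]
    against [lam^(1/2) R(k-1), lam^(1/2) p(k-1)] with N Givens rotations."""
    N = len(x)
    R = np.sqrt(lam) * R.astype(float)       # lam^(1/2) R(k-1)
    p = np.sqrt(lam) * p.astype(float)       # lam^(1/2) p(k-1)
    x = np.array(x, dtype=float)
    y = float(y)
    for i in range(N):
        norm = np.hypot(R[i, i], x[i])       # boundary cell (d)
        if norm == 0.0:
            continue                         # u_i = 0: c = 1, s = 0
        c, s = R[i, i] / norm, x[i] / norm
        Ri = R[i, i:].copy()                 # internal cells (a) across row i
        R[i, i:] = c * Ri + s * x[i:]
        x[i:] = -s * Ri + c * x[i:]          # leading element becomes zero
        pi = p[i]
        p[i] = c * pi + s * y
        y = -s * pi + c * y
    return R, p                              # R(k) and p(k); w(k) via (2.11)
```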
The second part is a triangular postprocessor (right of the line A–B), consisting of internal cells of type (b) only. The rotation calculated by this part is

$$\begin{bmatrix} \mathbf{R}^{-T}(k) \\ -r^{-1}(k)\,\mathbf{w}^T(k) \\ \mathbf{g}^T \end{bmatrix} = Q(k) \begin{bmatrix} \lambda^{-\frac{1}{2}}\,\mathbf{R}^{-T}(k-1) \\ -\lambda^{-\frac{1}{2}}\,r^{-1}(k-1)\,\mathbf{w}^T(k-1) \\ \mathbf{0}^T \end{bmatrix}, \qquad (2.13)$$

where g is a vector related to the Kalman gain vector. Note that in the postprocessor the matrix R^{−1}(k−1) is multiplied by λ^{−1/2}, which is a value greater than 1 and therefore gives rise to instability, as errors in the matrices and vectors are amplified. Note also that in Fig. 1 the value −r^{−1}(k) w_i(k) is substituted by w̃_i.

One observation which is necessary for the following derivation of the fast algorithm in Section 3 is that if the elements of the input vector x(k) are permuted by a permutation matrix Π, then the elements of the resulting weight vector w(k) will be permuted in the same way, i.e.

$$\mathbf{x}(k) \rightarrow \mathbf{w}(k) \;\Leftrightarrow\; \Pi\,\mathbf{x}(k) \rightarrow \Pi\,\mathbf{w}(k), \qquad (2.14)$$

where "→" reads as "leads to" and Π is an N×N permutation matrix.
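The permutation property (2.14) is easy to verify numerically. The following check (ours) uses a block least-squares solve with λ = 1 in place of the recursive array; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 4, 200
X = rng.standard_normal((k, N))              # rows are x^T(i)
w_true = np.array([0.5, -1.0, 2.0, 0.25])
y = X @ w_true + 0.01 * rng.standard_normal(k)

perm = np.array([1, 2, 3, 0])                # a permutation Pi of the inputs
w_orig = np.linalg.lstsq(X, y, rcond=None)[0]
w_perm = np.linalg.lstsq(X[:, perm], y, rcond=None)[0]

# The weights follow the same reordering as the inputs, cf. (2.14).
assert np.allclose(w_perm, w_orig[perm])
```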
3 Fast Algorithm
In this section a derivation is presented that makes the triangular postprocessor (right of the line A–B) of the processor shown in Fig. 1 redundant and thereby reduces the computational complexity considerably, while maintaining the same input-output relationship. The derivation consists of three algorithmic engineering transformations, starting from the structure shown in Fig. 1; these transformations are shown in Figs. 3 to 5. The derivation is demonstrated on this low-order example for simplicity but can easily be extended to a higher-order or a multichannel processor.

In Fig. 3 the original structure of Fig. 1 is taken and an additional column of internal cells is inserted between the triangular processor and the postprocessor. This column, which does not change the result of the processor as the rotation parameters c and s are only passed through the internal cells, is fed by x(k−4) and calculates the associated backward prediction problem. In the last row of internal cells of type (b) the index i of the corresponding weight is marked to show the ordering of the weight vector w(k) (0-1-2-3). In the following step of this iteration, the input vector x(k) is reordered, which reorders in the same manner the resulting weights in the bottom row of the internal cells (1-2-3-0). Finally, the last row of the processor, i.e. below the line A1–B1, is separated to form an additional stage. The remaining processor, above the line A1–B1, is very similar to the processor at the beginning of this iteration: it is a 3×3 triangular processor plus two additional columns of internal cells which calculate the forward and backward prediction problems, and its input signal x(k) is delayed by one sample. Note that some inputs to the additional stage are labeled 1-2-3, as the order is important in the following derivation.

In Fig. 4 the remaining processor, above the line A1–B1 of Fig. 3, is taken and transformed further. In the first step, the triangular array is duplicated and the forward prediction problem is separated from the backward prediction problem and the desired signal. Then, in the left column, the input vector x(k) is reordered, which reorders the output to the stage of the next processor from 1-2-3 to 2-3-1, and the delay which appears at all input signals of the processor is moved to all output signals. Finally, the redundant parts are combined, creating a 2×2 triangular processor with two additional columns and a postprocessor (above the line A2–B2), plus an additional stage.

The next step of the derivation is to make the postprocessor redundant. To this end, the additional stage (below the line A2–B2) is transformed, as shown in Fig. 5, such that the output signals from the postprocessor are no longer needed. To obtain the transformation, first note that the output signals labeled 2-3-1 and 1-2-3 represent the same signals and are present twice. Therefore, if the cells of type (b) on the right-hand side could be "inverted" such that they calculate the input signal u_i given the output signal u_o and the rotation parameters c and s, then the input could be fed back in a loop into the cells on the left-hand side, and the signals from the postprocessor above the line A2–B2 would no longer be necessary. These "inverse" cells are easily obtained and are shown in Fig. 2 as cells of type (c).
Figure 3. 1st Step of the Derivation: from a 4×4 array to a 3×3 array.
Figure 4. 2nd Step of the Derivation: from a 3×3 array to a 2×2 array.
Figure 5. 3rd Step of the Derivation: Transformation of the Additional Stage.

In Fig. 6, finally, the processor equivalent to the processor of Fig. 1 is shown; note that the resulting weight vectors of Figs. 1 and 6 are ordered differently. The part above the line A2–B2 is now only the triangular processor consisting of the cells of type (a) and (d) of Fig. 2, and no longer the processor consisting of cells of type (a), (b) and (d) as in Fig. 1. To finally obtain an algorithm with a computational complexity of O(N), note first that all the additional stages of Figs. 3 to 5 have a computational complexity of O(N). The only remaining part with a computational complexity of O(N²) is therefore the processor above the line A2–B2 in Fig. 5, which can easily be transformed into an equivalent "fast" structure as shown in [6].

Figure 6. Final Structure without Postprocessor above Line A2–B2.
4 Conclusions

In this paper we have shown, via algorithmic engineering techniques applied to a signal-flow-graph representation, how to transform a QR-RLS algorithm with parallel weight extraction into an equivalent algorithm with a lower computational complexity. The derivation is easily extendable to a higher order N of the triangular array and even to multichannel systems. One drawback of the algorithm is its inherent instability: the Cholesky factor R(k) of the autocorrelation matrix and the inverse Cholesky factor R^{−1}(k) are propagated independently, and errors in the inverse factor are amplified by the inverse of the forgetting factor, which will cause the algorithm to diverge.
Bibliography

1. S J Elliott and P A Nelson. Active Noise Control. IEE Communications and Electrical Engineering Journal, 10(4):12–35, October 1993.
2. B Widrow and S D Stearns. Adaptive Signal Processing. Prentice Hall, Englewood Cliffs, 1985.
3. S Haykin. Adaptive Filter Theory. Prentice Hall, 2nd edition, 1991. ISBN 0-13-012236-5.
4. G H Golub and C F Van Loan. Matrix Computations. Johns Hopkins University Press, 1989.
5. J G McWhirter. Algorithmic Engineering in Adaptive Signal Processing. IEE Proceedings-F, 139(3):226–232, June 1992.
6. I K Proudler and J G McWhirter. Algorithmic Engineering in Adaptive Signal Processing: Worked Examples. IEE Proceedings - Vis. Image Signal Process., 141(1):19–26, February 1994.
7. S Haykin. Adaptive Filter Theory. Prentice Hall Information and Systems Sciences Series, 3rd edition, 1996.