A Partially Decoupled RLS Algorithm for Volterra Filters

David W. Griffith, Jr. and Gonzalo R. Arce
Department of Electrical Engineering, University of Delaware, Newark, Delaware 19716
Tel: (302) 831-8030
E-mail: [email protected] and [email protected]

Abstract

In this paper we consider a partially decoupled variation of the RLS algorithm that is based on a constrained optimization of the cumulative filter error, using the higher-order sets of filter weights to improve on the performance of the lower-order weight sets, whose values are already established. From this constrained optimization, a recursive algorithm is developed whose form closely resembles the classic RLS algorithm, but with some structural differences. It is shown that these differences lead to a reduction in computational complexity and an increase in the rate of convergence.

Submitted to the IEEE Transactions on Signal Processing
EDICS Paper Category: SP 2.7.3 (Non-linear Filters)
Permission to publish this abstract separately is granted.
1 Introduction

Nonlinear digital filtering techniques have been applied to a wide variety of problems in signal estimation and equalization for which linear filters are not adequate. Examples of such situations include modeling nonlinear dynamical systems [1], echo cancellation, performance analysis of data transmission systems, and adaptive noise cancellation [4]. One of the earliest nonlinear architectures to be developed is the Volterra filter, which forms an estimate of a desired process by taking higher-order products of a related observed process, which are then weighted and summed. The RLS algorithm for Volterra filters has been well studied; a detailed treatment may be found in [4], for instance. The computational complexity of the algorithm can be prohibitive, however, especially for large observation window sizes or high filter orders. One way of reducing the complexity of the algorithm is to introduce a partial decoupling of the sets of filter weights. This leads to both a reduction in computational complexity and more rapid convergence.

The paper is organized as follows. In Section 2, we first briefly review the formulation of the Volterra filter and the Volterra RLS algorithm. We then introduce the partially decoupled RLS algorithm and examine its computational complexity relative to that of the fully coupled RLS algorithm in Section 3. Finally, in Section 4, we compare the performance of the two algorithms by applying them to the problem of equalization in a nonlinear communications channel.
2 The Volterra RLS Algorithm

The output of an FIR Volterra filter of order p is computed using a Volterra series truncated to p summations [4], and is given by

    y(n) = h_0 + \sum_{i_1=1}^{N} h_1(i_1) x_{i_1} + \sum_{i_1=1}^{N} \sum_{i_2=1}^{N} h_2(i_1, i_2) x_{i_1} x_{i_2} + \cdots + \sum_{i_1=1}^{N} \cdots \sum_{i_p=1}^{N} h_p(i_1, \ldots, i_p) x_{i_1} \cdots x_{i_p},   (1)
where the vector x = [x_1, x_2, \ldots, x_N]^T contains the N elements of the observed process {x(n)} collected in the vector x(n) = [x(n), x(n-1), \ldots, x(n-N+1)]^T. The Volterra filter coefficients are contained in the kernels h_k, k = 1, 2, \ldots, p, of the p summations in (1). Often, the filter output is written as a pseudo-linear operation by concatenating the elements of the kernels into a single vector h_V, which is partitioned as h_V = [h_1^T | h_2^T | \cdots | h_p^T]^T, where h_k contains the elements of the kth kernel. If we concatenate the products of the observations into a similar vector x_V(n), which we shall call the Volterra observation vector, we can write the filter output as

    y(n) = h_V^T x_V(n).   (2)
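As an illustration of how the Volterra observation vector can be formed, the following is a minimal sketch assuming NumPy; the helper name volterra_observation is ours, not the paper's. It stacks the full product set of (1):

```python
import numpy as np

def volterra_observation(x, p):
    """Build x_V(n) by stacking the k-th order products of the
    window x(n), k = 1, ..., p, as in (1)-(2). The full (redundant)
    product set is used here; a symmetric-kernel implementation
    would keep only the i1 <= i2 <= ... <= ik terms."""
    blocks = [x]          # first-order block x_1(n)
    block = x
    for _ in range(2, p + 1):
        block = np.outer(block, x).ravel()  # all k-th order products
        blocks.append(block)
    return np.concatenate(blocks)

# Example: an N = 3 window and a second-order filter give 3 + 9 = 12 entries.
x_window = np.array([0.5, -1.0, 0.25])
print(volterra_observation(x_window, p=2).shape)  # (12,)
```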
The quantity that we wish to minimize in order to optimize the filter performance is the cumulative time-series error E(n), which for a pth-order Volterra filter is

    E(n) = \sum_{i=1}^{n} \lambda^{n-i} e_V^2(i),   (3)

where 0 < \lambda \le 1 and e_V^2(n) = |d(n) - h_V^T x_V(n)|^2 is the squared filter error when we are trying to estimate a desired signal d(n) from a related observed process {x(n)}. This function, when minimized with respect to the filter weights, gives us a normal equation for the Volterra filter, which is

    \Phi_V(n) h_V(n) = \theta_V(n),   (4)
where \Phi_V(n) = \sum_{i=1}^{n} \lambda^{n-i} x_V(i) x_V^T(i) is the Volterra correlation matrix and \theta_V(n) = \sum_{i=1}^{n} \lambda^{n-i} d(i) x_V(i) is the cross-correlation vector between the desired response and the Volterra tap inputs. \Phi_V(n) may be partitioned as

    \Phi_V(n) = \begin{bmatrix} \Phi_{1,1}(n) & \cdots & \Phi_{1,p}(n) \\ \vdots & \ddots & \vdots \\ \Phi_{p,1}(n) & \cdots & \Phi_{p,p}(n) \end{bmatrix},   (5)
where \Phi_{k_1,k_2}(n) = \sum_{i=1}^{n} \lambda^{n-i} x_{k_1}(i) x_{k_2}^T(i). \theta_V(n) may be similarly partitioned. As noted in Haykin [3] for the linear filtering case, the RLS correlation matrix \Phi_V(n) is different from the Volterra autocorrelation matrix R_V in two important respects. First, the outer product of Volterra observation vectors x_{k_1}(i) x_{k_2}^T(i) at the ith iteration is weighted by the forgetting factor \lambda^{n-i}. Second, we assume that the data is pre-windowed, i.e., all input data prior to time i = 1 are equal to zero. If we carry out the ensuing analysis in a fashion analogous to that given by Haykin in [3], we obtain the Volterra RLS algorithm. The convergence properties of this algorithm have been thoroughly studied, and a detailed treatment may be found in [4].
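The exponential weighting admits the familiar rank-one recursion \Phi_V(n) = \lambda \Phi_V(n-1) + x_V(n) x_V^T(n). A throwaway numerical check of this equivalence, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, M, n = 0.99, 4, 200
xs = [rng.standard_normal(M) for _ in range(n)]

# Batch definition: Phi(n) = sum_{i=1}^{n} lam^(n-i) x(i) x(i)^T.
phi_batch = sum(lam ** (n - i) * np.outer(x, x)
                for i, x in enumerate(xs, start=1))

# Rank-one recursion: Phi(i) = lam * Phi(i-1) + x(i) x(i)^T.
phi_rec = np.zeros((M, M))
for x in xs:
    phi_rec = lam * phi_rec + np.outer(x, x)

print(np.allclose(phi_batch, phi_rec))  # True
```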
3 The Partially Decoupled Volterra RLS Algorithm

We now develop the partially decoupled version of the Volterra RLS algorithm. The optimal Volterra filter coefficients are given by the Volterra normal equation R_V h_V = p_V. It has been shown in [2] that if we first optimize only the linear weights, then optimize the set of second-order weights with the linear weights fixed at their optimal values, and proceed to the higher-order weights in a similar fashion, we arrive at a slightly different normal equation for the Volterra filter, which is

    R_V^\triangle h_V = p_V,   (6)
where R_V^\triangle is the partitioned block-lower-triangular matrix

    R_V^\triangle = \begin{bmatrix} R_{1,1} & & & 0 \\ R_{2,1} & R_{2,2} & & \\ \vdots & \vdots & \ddots & \\ R_{p,1} & R_{p,2} & \cdots & R_{p,p} \end{bmatrix}.   (7)
We can apply this technique to the problem of optimizing h_V(n) with respect to E(n); if we do this we arrive at an analogous normal equation, which is

    \Phi_V^\triangle(n) h_V(n) = \theta_V(n),   (8)

where \Phi_V^\triangle(n) is the block-lower-triangular matrix

    \Phi_V^\triangle(n) = \begin{bmatrix} \Phi_{1,1}(n) & & 0 \\ \vdots & \ddots & \\ \Phi_{p,1}(n) & \cdots & \Phi_{p,p}(n) \end{bmatrix}.   (9)
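The value of the triangular structure in (9) is that (8) can be solved by block forward substitution: h_1(n) follows from the (1,1) block alone, and each higher-order block then treats the lower-order solutions as fixed, mirroring the constrained optimization of [2]. A sketch of that solve, assuming NumPy; the nested-list block storage is our choice, not the paper's:

```python
import numpy as np

def solve_block_lower_triangular(Phi, theta):
    """Solve the partially decoupled normal equation (8).
    Phi[k][j] (j <= k) are the blocks of the matrix in (9) and
    theta[k] the matching partitions of the cross-correlation vector."""
    h = []
    for k in range(len(theta)):
        # Remove the contribution of the already-solved lower orders.
        rhs = theta[k] - sum(Phi[k][j] @ h[j] for j in range(k))
        h.append(np.linalg.solve(Phi[k][k], rhs))
    return h
```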
We may use the definitions of the RLS correlation arrays to produce difference equations that will allow us to recursively determine \Phi_V^\triangle(n) and \theta_V(n), in a manner analogous to Haykin, given initial values for these arrays. We first examine each of the non-null Volterra blocks of \Phi_V^\triangle(n) and notice that they may be written as

    \Phi_{k_1,k_2}(n) = \lambda \left[ \sum_{i=1}^{n-1} \lambda^{n-1-i} x_{k_1}(i) x_{k_2}^T(i) \right] + x_{k_1}(n) x_{k_2}^T(n),   (10)

which gives us the general recursion for \Phi_V^\triangle(n), which is

    \Phi_V^\triangle(n) = \lambda \Phi_V^\triangle(n-1) + X_V^\triangle(n),   (11)

where we define X_V^\triangle(n) to be
    X_V^\triangle(n) = \begin{bmatrix} x_1(n) x_1^T(n) & & & 0 \\ x_2(n) x_1^T(n) & x_2(n) x_2^T(n) & & \\ \vdots & \vdots & \ddots & \\ x_p(n) x_1^T(n) & x_p(n) x_2^T(n) & \cdots & x_p(n) x_p^T(n) \end{bmatrix}.   (12)
In similar fashion we may write the recursion for \theta_V(n) as

    \theta_V(n) = \lambda \theta_V(n-1) + d(n) x_V(n).   (13)
By defining recursive update equations for the Volterra correlation arrays, we have developed a means for iteratively optimizing the Volterra filter's tap weights. We simply use the partially decoupled Volterra normal equation in conjunction with (11) and (13). We thus have a preliminary form for the RLS algorithm:

    \Phi_V^\triangle(n) = \lambda \Phi_V^\triangle(n-1) + X_V^\triangle(n),   (14)
    \theta_V(n) = \lambda \theta_V(n-1) + d(n) x_V(n),   (15)
    h_V(n) = [\Phi_V^\triangle(n)]^{-1} \theta_V(n).   (16)
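A direct implementation of (14)-(16) would update the block arrays and re-solve the triangular system at every step, for example with the solve_block_lower_triangular helper sketched above. The following sketch is illustrative only, under the same NumPy assumptions; avoiding the repeated solve is what motivates the recursions developed next.

```python
import numpy as np

def preliminary_pd_rls_step(Phi, theta, xs, d, lam):
    """One iteration of (14)-(16). xs[k] is the order-(k+1)
    observation block x_k(n); d is the desired sample d(n)."""
    p = len(xs)
    for k in range(p):
        for j in range(k + 1):            # non-null blocks of (12)
            Phi[k][j] = lam * Phi[k][j] + np.outer(xs[k], xs[j])
        theta[k] = lam * theta[k] + d * xs[k]
    # Re-solving the system at every iteration is the expensive part.
    return solve_block_lower_triangular(Phi, theta)
```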
Because \Phi_V^\triangle(n) is often quite large, it is usually impractical to invert the correlation matrix directly at each iteration. Haykin [3] and Mathews [4] at this point use the Matrix Inversion Lemma, relying on the symmetric nature of the fully coupled correlation arrays in the linear and general Volterra cases, respectively. In our case, however, we are faced with a non-symmetric correlation array, which prevents our directly applying the inversion lemma. A solution to this problem is to consider only the diagonal Volterra blocks of \Phi_V^\triangle, which are symmetric. We may apply the inversion lemma to each of these in a fashion analogous to that employed by Haykin in his derivation of the RLS algorithm for linear filters. The result is a set of recursions for each of the inverses of the diagonal blocks:

    P_k(n) = \lambda^{-1} P_k(n-1) - \lambda^{-1} k_k(n) x_k^T(n) P_k(n-1), \quad k = 1, 2, \ldots, p,   (17)

where P_k(n) \Phi_{k,k}(n) = I_{M_k} for all n and

    k_k(n) = \frac{\lambda^{-1} P_k(n-1) x_k(n)}{1 + \lambda^{-1} x_k^T(n) P_k(n-1) x_k(n)},   (18)

and the initial value of P_k(n) is set to P_k(0) = \delta^{-1} I_{M_k}, where \delta is a small positive number. We are also able to discern from the above results that k_k(n) = P_k(n) x_k(n). Since h_1(n) is independent of all higher-order weight sets, the linear RLS update equation is identical to that which follows from Haykin's analysis. We have

    h_1(n) = h_1(n-1) + k_1(n) \alpha_1(n),   (19)
where \alpha_1(n) = d(n) - h_1^T(n-1) x_1(n) is the first-order innovation.

We now consider the case of the second-order filter weights, h_2(n). The derivation of the update equation is very similar to that employed for the decoupled linear weights. From the partially decoupled normal equation,

    \Phi_{2,1}(n) h_1(n) + \Phi_{2,2}(n) h_2(n) = \theta_2(n).   (20)

Solving for h_2(n) and applying the same techniques that were used in the linear case results in

    h_2(n) = P_2(n-1) \theta_2(n-1) + k_2(n) [d(n) - h_2^T(n-1) x_2(n)] - P_2(n) \Phi_{2,1}(n) h_1(n).   (21)

Inserting the recursion for h_1(n) given in (19) into the above expression and simplifying produces the final result

    h_2(n) = h_2(n-1) + k_2(n) \alpha_2(n) - P_2(n) \Phi_{2,1}(n) k_1(n) \alpha_1(n).   (22)
It is at this point that we realize we must modify our algorithm, for the algorithm as presently conceived will require us to update the off-diagonal correlation blocks, such as \Phi_{2,1}(n), along with the other arrays. This can be shown to produce an unacceptably large increase in the computational complexity of the algorithm. We therefore make the following modification to the algorithm. We develop the recursion for h_1(n) in the same fashion as was done above. When we then turn our attention to the adaptation of the second-order weights, we perform the analysis using the assumption that the linear weights have already reached steady state, i.e., that h_1(n) \approx h_1(n-1) and P_1(n) \approx P_1(n-1). This forces the linear Kalman gain term in (22) to zero, causing it to collapse to

    h_2(n) = h_2(n-1) + k_2(n) \alpha_2(n).   (23)
k
k
T k
(n) = d(n) ?
k X
k
P (n) = k
j
=1
k
k
h (n)x (n) T j
(25)
j
?1 P (n ? 1) ? ?1 (n)x (n)P (n ? 1); k
k
T k
k
k = 1; 2; : : : ; p;
(26) (27)
h (n) = h (n ? 1) + (n) (n); k = 1; 2; : : : ; p: k
k
k
k
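A compact sketch of one iteration of (24)-(27) follows, assuming NumPy; the initialization P_k(0) = \delta^{-1} I_{M_k} follows the derivation above, while the convention of using the already-updated lower-order weights h_j(n), j < k, and the prior h_k(n-1) inside the innovation (25) is our reading of the inductive scheme.

```python
import numpy as np

def pd_volterra_rls_init(dims, delta=0.01):
    """h_k(0) = 0 and P_k(0) = (1/delta) I_{M_k}, delta small."""
    h = [np.zeros(M) for M in dims]
    P = [np.eye(M) / delta for M in dims]
    return h, P

def pd_volterra_rls_step(h, P, xs, d, lam):
    """One iteration of the partially decoupled Volterra RLS,
    eqs. (24)-(27), updating the weight sets in increasing order.
    xs[k] is the order-(k+1) observation block x_k(n)."""
    for k in range(len(h)):
        x = xs[k]
        Px = P[k] @ x
        g = Px / (lam + x @ Px)                 # Kalman gain, (24)
        # Innovation (25): lower orders already updated at time n,
        # the k-th order weights still hold their time n-1 values.
        alpha = d - sum(h[j] @ xs[j] for j in range(k + 1))
        P[k] = (P[k] - np.outer(g, Px)) / lam   # inverse update, (26)
        h[k] = h[k] + alpha * g                 # weight update, (27)
    return h, P
```

Because each P_k acts only within its own block, the inverted matrices are of size M_k rather than the full dimension of h_V, which is the source of the savings quantified next.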
We finally compare the computational complexity of this algorithm to that of the fully coupled Volterra RLS algorithm. By analogy to the case of linear RLS adaptation, each iteration of the fully coupled algorithm requires 4M_p^2 + 4M_p + 1 complex multiplications and 3M_p^2 + M_p complex additions, where M_p is the number of elements of h_V. We can show that the partially decoupled RLS algorithm requires 4 \sum_{k=1}^{p} M_k^2 + 4M_p + 1 complex multiplications and 3 \sum_{k=1}^{p} M_k^2 + M_p complex additions per iteration, where M_k is the number of elements in h_k, for k = 1, 2, \ldots, p. Clearly, the partially decoupled algorithm represents a savings in the number of operations that must be performed per iteration. This fact is demonstrated in Figure 1, which plots computational complexity as a function of observation window size and filter order for both the fully coupled and partially decoupled RLS adaptive algorithms.
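These counts are easy to tabulate. The sketch below assumes symmetric kernels, so that each kernel has M_k = C(N + k - 1, k) distinct coefficients; this is our assumption for reproducing the scale of Figure 1, and with redundant kernels one would use M_k = N^k instead.

```python
from math import comb

def ops_per_iteration(N, p):
    """Per-iteration complex multiplications and additions for the
    fully coupled and partially decoupled RLS algorithms."""
    Mk = [comb(N + k - 1, k) for k in range(1, p + 1)]
    M = sum(Mk)   # total number of elements of h_V (M_p in the text)
    fully = (4 * M ** 2 + 4 * M + 1, 3 * M ** 2 + M)
    partial = (4 * sum(m ** 2 for m in Mk) + 4 * M + 1,
               3 * sum(m ** 2 for m in Mk) + M)
    return fully, partial

# Example: N = 10, p = 3.
print(ops_per_iteration(10, 3))
```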
Figure 1: Required Complex Multiplications and Additions per Iteration versus Window Size N for Filter Orders p = 1 through 5, for the Fully Coupled and Partially Decoupled RLS Algorithms; Solid Line = Fully Coupled, Dashed Line = Partially Decoupled
4 Example of Application

We now consider a simple example that will allow us to demonstrate the difference in performance of the fully coupled and partially decoupled RLS adaptation schemes. The situation that we will examine is one that arises in satellite communication channels. We have a single channel passing through a satellite transponder, which here consists of a traveling wave tube amplifier (TWTA). Such devices, if driven with an excessive amount of input signal power, are pushed into a saturation mode and their transfer characteristic becomes nonlinear. A detailed description of their operation is given in [1]. To simulate this effect, we will use a memoryless transfer function developed by Saleh [6], where the input and output voltage magnitudes are related as

    x(n) = \frac{\alpha d(n)}{1 + \beta d^2(n)} + w(n),   (28)

where d(n) is the input signal, x(n) is the corrupted output process, w(n) is an additive white Gaussian noise process with variance \sigma_w^2, and we will take \alpha = \beta = 4. The input signal is a Markov
process with state transition matrix whose elements are given by

    p_{ij} = \begin{cases} 0, & |i - j| \ge 0.2, \\ 0.6, & i = j, \\ 0.4, & i = j \pm 0.1 \text{ and } j = 0.1 \text{ or } j = 0.9, \\ 0.2, & \text{otherwise}. \end{cases}   (29)

It is possible, using this model, to compute optimal weights for Volterra filters of various orders by estimating the joint statistics of d(n) and x(n). The corresponding MSEs for fully coupled and partially decoupled Volterra filters of orders 1 through 5 are given in Table 1 for a signal-to-noise ratio of 10 dB.

Table 1: Mean Squared Error Figures for Fully Coupled and Partially Decoupled Volterra Filters

    p    Fully Coupled    Partially Decoupled
    1    0.1023           0.1023
    2    0.1022           0.1022
    3    0.0923           0.0992
    4    0.0916           0.0984
    5    0.0881           0.0964
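For concreteness, a simulation sketch of this channel follows, assuming NumPy. The nine-state chain on {0.1, ..., 0.9} encodes our reconstruction of (29), \alpha = \beta = 4 follows the Saleh parameters quoted above, and setting the SNR on the amplifier output is our modeling choice.

```python
import numpy as np

rng = np.random.default_rng(1)
states = np.round(np.arange(0.1, 1.0, 0.1), 1)  # nine-state source

def transition_prob(cur, nxt):
    """Transition rule of (29): stay w.p. 0.6, step by +/-0.1, with
    the full 0.4 step probability folded inward at the edge states."""
    if abs(nxt - cur) > 0.15:                 # |i - j| >= 0.2 -> 0
        return 0.0
    if abs(nxt - cur) < 0.05:                 # i = j
        return 0.6
    if abs(cur - 0.1) < 0.05 or abs(cur - 0.9) < 0.05:
        return 0.4                            # edge state, one neighbour
    return 0.2

T = np.array([[transition_prob(c, nx) for nx in states] for c in states])
assert np.allclose(T.sum(axis=1), 1.0)        # each row is a distribution

def saleh(d, alpha=4.0, beta=4.0):
    """Memoryless TWTA AM/AM characteristic, eq. (28)."""
    return alpha * d / (1.0 + beta * d ** 2)

# Desired process d(n) and corrupted observation x(n) at 10 dB SNR.
n_samples, s = 2000, 4
d = np.empty(n_samples)
for t in range(n_samples):
    s = rng.choice(len(states), p=T[s])
    d[t] = states[s]
y = saleh(d)
sigma_w = np.sqrt(np.mean(y ** 2) / 10.0)     # 10 dB = factor of 10
x = y + sigma_w * rng.standard_normal(n_samples)
```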
Figure 2: Ensemble-Averaged Square Innovation Error for Second Order Filters
Figure 3: Ensemble-Averaged Square Innovation Error for Third Order Filters

These results show that the minimum mean squared error (MMSE) operating point of the partially decoupled Volterra filter is very close to that of the optimal, fully coupled design. Time histories of the innovation errors for second- and third-order filters are plotted in Figures 2 and 3, respectively. As can be seen from the figures, there is little difference in the steady-state operation of the filters. It is in the initial stages of the adaptation that we see the difference between the two schemes. The partially decoupled innovation, which is dominated by the linear component of the filter, exhibits much less initial deviation than the fully coupled innovation. As a result, the partially decoupled filter exhibits a faster settling time.
5 Conclusions

We have seen how the principle of constrained optimization of the Volterra filter, originally applied to the MMSE filtering problem, can be applied to the minimum cumulative error filtering problem with similar results. Furthermore, we have shown that it is possible to develop a partially decoupled version of the Volterra RLS algorithm, just as a partially decoupled Volterra LMS algorithm was developed. The partially decoupled RLS algorithm is advantageous in that it requires fewer computations per iteration to implement than the standard RLS algorithm does. The price paid is a loss in performance, but as the example shows, the penalty is not severe. In addition, the partially decoupled algorithm exhibits a slightly faster settling time, and experiences a smaller jump in the innovation error in the early stages of adaptation than is encountered when the fully coupled algorithm is run.
References

[1] S. Benedetto, E. Biglieri, and R. Daffara, "Modeling and Performance Evaluation of Nonlinear Satellite Links: A Volterra Series Approach," IEEE Transactions on Aerospace and Electronic Systems, vol. 15, no. 4, pp. 494-507, July 1979.

[2] D. W. Griffith, Partially Decoupled Volterra Filters: Formulation and Adaptive Algorithms, M.S. Thesis, University of Delaware, 1994.

[3] S. Haykin, Adaptive Filter Theory, Englewood Cliffs, NJ: Prentice Hall, 1991.

[4] V. J. Mathews, "Adaptive Polynomial Filters," IEEE Signal Processing Magazine, pp. 10-25, July 1991.

[5] W. J. Rugh, Nonlinear System Theory, Baltimore: The Johns Hopkins University Press, 1981.

[6] A. A. M. Saleh, "Frequency-Independent and Frequency-Dependent Nonlinear Models of TWT Amplifiers," IEEE Transactions on Communications, vol. 29, no. 11, pp. 1715-1720, November 1981.