LETTER
Special Section on Advances in Adaptive Signal Processing and Applications
Regularization of the RLS Algorithm∗
Jacob BENESTY†a), Nonmember, Constantin PALEOLOGU††b), Member, and Silviu CIOCHINĂ††c), Nonmember
SUMMARY Regularization plays a fundamental role in adaptive filtering. There are, very likely, many different ways to regularize an adaptive filter. In this letter, we propose one possible way to do it, based on a condition that makes intuitive sense. From this condition, we show how to regularize the recursive least-squares (RLS) algorithm.
key words: echo cancellation, adaptive filters, regularization, recursive least-squares (RLS) algorithm
1. Introduction
It is well known that regularization is a must in all problems where a noisy linear system of equations needs to be solved [1]. Any adaptive filter has a linear system of equations to solve, explicitly or implicitly, so regularization is required in order for the algorithm to converge smoothly and consistently to the optimal Wiener solution, especially in the presence of additive noise. In many adaptive filters [2], [3], the regularization parameter is chosen as $\delta = \beta\sigma_x^2$, where $\sigma_x^2 = E[x^2(n)]$ is the variance of the zero-mean input signal $x(n)$, with $E[\cdot]$ denoting mathematical expectation, and $\beta$ is a positive constant. In practice, though, $\beta$ is more a variable that depends on the level of the additive noise. This type of regularization seems to work well in practice (with, of course, a good choice of $\beta$), since the misalignment, which is a distance measure between the true impulse response and the one estimated with an adaptive algorithm, decreases smoothly with time and converges to a stable and small value. However, even popular books [4], [5] discuss the regularization problem in a very superficial way and refer to it as a "small positive number," which is not really true. Indeed, in our experience, the regularization parameter can vary from very small to very large, depending on the level of the additive noise. In this letter, we focus on the regularized recursive least-squares (RLS) algorithm and show how to find $\delta$ or $\beta$ in order that the algorithm behaves properly in all noise conditions.

Manuscript received November 5, 2010. Manuscript revised December 13, 2010.
† The author is with the INRS-EMT, University of Quebec, Montreal, Quebec, H5A 1K6, Canada.
†† The authors are with University Politehnica of Bucharest, Romania.
∗ This work was supported under the Grant POSDRU/89/1.5/S/62557.
a) E-mail: [email protected]
b) E-mail: [email protected]
c) E-mail: [email protected]
DOI: 10.1587/transfun.E94.A.1628
2. Signal Model and the Regularized RLS Algorithm
We have the observed signal
$$d(n) = h^T x(n) + w(n) = y(n) + w(n),$$
where $n$ is the discrete-time index, $h = [h_0, h_1, \ldots, h_{L-1}]^T$ is the impulse response (of length $L$) of the system that we need to identify, superscript $T$ denotes transposition, $x(n) = [x(n), x(n-1), \ldots, x(n-L+1)]^T$ is a vector containing the most recent $L$ samples of the zero-mean input signal $x(n)$, and $w(n)$ is a zero-mean additive noise, which is independent of $x(n)$. The signal $y(n)$ is called the echo in the context of echo cancellation. We can define the echo-to-noise ratio (ENR) [2] as $\mathrm{ENR} = \sigma_y^2/\sigma_w^2$, where $\sigma_y^2 = E[y^2(n)]$ and $\sigma_w^2 = E[w^2(n)]$ are the variances of $y(n)$ and $w(n)$, respectively. The goal is to estimate $h$ with an adaptive filter $\hat{h}(n) = [\hat{h}_0(n), \hat{h}_1(n), \ldots, \hat{h}_{L-1}(n)]^T$, such that for a reasonable value of $n$, the (normalized) misalignment satisfies $\|\hat{h}(n) - h\|_2^2 / \|h\|_2^2 \leq \iota$, where $\iota$ is a predetermined small positive number and $\|\cdot\|_2$ is the $\ell_2$ norm.

From the regularized least-squares criterion
$$J(n) = \sum_{i=1}^{n} \lambda^{n-i} \left[ d(i) - \hat{h}^T(n) x(i) \right]^2 + \delta \hat{h}^T(n) \hat{h}(n),$$
where $\lambda$ ($0 < \lambda < 1$) is the exponential forgetting factor, we easily derive the regularized RLS algorithm:
$$\hat{h}(n) = \hat{h}(n-1) + \left[ \hat{R}_x(n) + \delta I \right]^{-1} x(n) e(n), \quad (1)$$
where $\hat{R}_x(n) = \sum_{i=1}^{n} \lambda^{n-i} x(i) x^T(i) = \lambda \hat{R}_x(n-1) + x(n) x^T(n)$ is an estimate of the correlation matrix of $x(n)$ at time $n$, $I$ is the identity matrix of size $L \times L$, and $e(n) = d(n) - x^T(n) \hat{h}(n-1) = d(n) - \hat{y}(n)$ is the a priori error signal. We will assume that the matrix $\hat{R}_x(n)$ is full rank, although it can be very ill conditioned. As a result, if there is no noise, regularization is not required and the linear system involved in (1) can be solved exactly; however, the more noise there is, the larger the value of $\delta$ should be. In the following, we assume that $L \gg 1$ and we limit the effect of the exponential window to the length $L$, i.e., $\sum_{i=0}^{\infty} \lambda^i = 1/(1-\lambda) = L$, which implies that $\lambda = 1 - 1/L$. The question is how to choose $\delta$. Next, we propose one reasonable way to find this important parameter.
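To make the recursion (1) concrete, here is a minimal sketch of one regularized RLS iteration in Python with NumPy. It is an illustration only, with our own naming (`rls_update` and its arguments are not from the letter): the regularized system in (1) is solved directly with `np.linalg.solve` at $O(L^3)$ cost per step, rather than with the matrix-inversion-lemma recursion a production implementation would use.

```python
import numpy as np

def rls_update(h_hat, R_hat, x_vec, d, lam, delta):
    """One step of the regularized RLS recursion in Eq. (1).

    h_hat : current filter estimate h(n-1), shape (L,)
    R_hat : correlation-matrix estimate R_x(n-1), shape (L, L)
    x_vec : input vector [x(n), ..., x(n-L+1)], shape (L,)
    d     : observed sample d(n)
    lam   : exponential forgetting factor, 0 < lam < 1
    delta : regularization parameter
    """
    L = h_hat.shape[0]
    # Recursive correlation estimate: R_x(n) = lam * R_x(n-1) + x(n) x^T(n)
    R_hat = lam * R_hat + np.outer(x_vec, x_vec)
    # A priori error: e(n) = d(n) - x^T(n) h(n-1)
    e = d - x_vec @ h_hat
    # Regularized update: h(n) = h(n-1) + [R_x(n) + delta*I]^{-1} x(n) e(n)
    h_hat = h_hat + np.linalg.solve(R_hat + delta * np.eye(L), x_vec * e)
    return h_hat, R_hat, e
```

The direct solve keeps the correspondence with (1) explicit; only the choice of `delta` is at issue in this letter.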
3. Choice of the Regularization Parameter
The update equation of the regularized RLS can be rewritten as
$$\hat{h}(n) = P(n) \hat{h}(n-1) + \tilde{h}(n), \quad (2)$$
where $P(n) = I - \left[ \hat{R}_x(n) + \delta I \right]^{-1} x(n) x^T(n)$ and
$$\tilde{h}(n) = \left[ \hat{R}_x(n) + \delta I \right]^{-1} x(n) d(n) \quad (3)$$
is the correctiveness component of the regularized RLS algorithm, which depends on the new observation $d(n)$. But $P(n)$ does not depend on the noise signal, and $P(n)\hat{h}(n-1)$ in (2) can be seen as a good initialization of the adaptive filter. Obviously, (3) is the solution of the noisy linear system of $L$ equations $\left[ \hat{R}_x(n) + \delta I \right] \tilde{h}(n) = x(n) d(n)$.

Since $\tilde{e}(n) = d(n) - x^T(n) \tilde{h}(n)$ is the error signal between the desired signal and the estimated signal (which depends on the new observation) obtained from the filter optimized in (3), we should find $\delta$ in such a way that the expected value of $\tilde{e}^2(n)$ is equal to the variance of the noise, i.e.,
$$E\left[ \tilde{e}^2(n) \right] = \sigma_w^2. \quad (4)$$
This is reasonable if we want to attenuate the effects of the noise in the estimator $\tilde{h}(n)$.

In order to easily derive a simplified and optimal $\delta$ according to (4), we assume in the rest that $x(n)$ is stationary and white. In this case and for $n$ large enough, we have
$$\hat{R}_x(n) + \delta I \approx \left( \frac{\sigma_x^2}{1-\lambda} + \delta \right) I \approx \left( L\sigma_x^2 + \delta \right) I. \quad (5)$$
We also have $x^T(n) x(n) \approx L\sigma_x^2$. Developing (4) and using (5), we easily derive a quadratic equation [indeed, (5) in (3) gives $\tilde{e}(n) \approx \delta\, d(n)/(L\sigma_x^2 + \delta)$, and $E[d^2(n)] = \sigma_y^2 + \sigma_w^2$]:
$$\delta^2 - 2 \frac{L\sigma_x^2}{\mathrm{ENR}} \delta - \frac{\left( L\sigma_x^2 \right)^2}{\mathrm{ENR}} = 0, \quad (6)$$
from which we deduce the obvious (positive) solution:
$$\delta = \frac{L \left( 1 + \sqrt{1 + \mathrm{ENR}} \right)}{\mathrm{ENR}} \sigma_x^2 = \beta_{\mathrm{RLS}} \sigma_x^2, \quad (7)$$
where $\beta_{\mathrm{RLS}} = L \left( 1 + \sqrt{1 + \mathrm{ENR}} \right) / \mathrm{ENR}$ is the normalized (with respect to the variance of the input signal) regularization parameter of the RLS. We see that $\delta$ depends on three elements: the length, $L$, of the adaptive filter, the variance, $\sigma_x^2$, of the input signal, and the ENR. In echo cancellation, the first two elements ($L$ and $\sigma_x^2$) are known, while the ENR is often roughly known or can be estimated. Therefore, it is not hard to find a good value for $\delta$ in these applications. Furthermore, we clearly have $\lim_{\mathrm{ENR} \to 0} \delta = \infty$ and $\lim_{\mathrm{ENR} \to \infty} \delta = 0$, which is what we desire.
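Formula (7) itself is a one-liner to compute. The following Python helper is a sketch (the function name and the dB-input convention are our own, not from the letter); it returns $\beta_{\mathrm{RLS}}$ and $\delta$ given the filter length, the input-signal variance, and the ENR in dB.

```python
import math

def rls_regularization(L, sigma_x2, enr_db):
    """Optimal RLS regularization from Eq. (7).

    L        : adaptive filter length
    sigma_x2 : variance of the input signal x(n)
    enr_db   : echo-to-noise ratio, in dB
    Returns (beta_rls, delta), with delta = beta_rls * sigma_x2.
    """
    enr = 10.0 ** (enr_db / 10.0)          # dB -> linear scale
    beta_rls = L * (1.0 + math.sqrt(1.0 + enr)) / enr
    return beta_rls, beta_rls * sigma_x2
```

One can verify that the returned $\delta$ satisfies the quadratic (6) exactly.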
4. Simulations
Simulations were performed in the context of network echo cancellation. The echo path has 128 coefficients (the sampling frequency is 8 kHz); it is the fourth impulse response from the ITU-T G.168 Recommendation [6]. The same length is used for the adaptive filter, i.e., $L = 128$. The input signal is a speech sequence. The output of the echo path is corrupted by an independent white Gaussian noise with different values of the ENR. An abrupt change of the echo path was introduced at time 2.5 s, by shifting the impulse response to the right by 12 samples. The performance measure is the normalized misalignment (in dB). The exponential forgetting factor is $\lambda = 1 - 1/L \approx 0.99$.

The RLS algorithm using the optimal regularization $\beta_{\mathrm{RLS}}$ [see (7)] is compared with its counterpart using a constant regularization $\beta = 20$ (which is a "classical" ad-hoc choice in many scenarios). The results are given in Fig. 1. As expected, the importance of the "optimal" regularization becomes more apparent for low ENRs, where it outperforms the "ad-hoc" regularization.

Fig. 1  Misalignment of the RLS algorithm using $\beta = 20$ (dotted) and $\beta_{\mathrm{RLS}}$ (solid). (a) ENR = 20 dB; (b) ENR = 10 dB; (c) ENR = 0 dB. The input signal is speech, $L = 128$, $\lambda = 1 - 1/L$.
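As a quick numerical check (ours, not reported in the letter), evaluating (7) for the simulated setup with the `rls_regularization` helper above explains the gap at low ENR:

```python
# Evaluate beta_RLS from Eq. (7) for the three simulated ENRs (L = 128).
for enr_db in (20, 10, 0):
    beta, _ = rls_regularization(L=128, sigma_x2=1.0, enr_db=enr_db)
    print(f"ENR = {enr_db:2d} dB -> beta_RLS ~ {beta:.1f}")
# ENR = 20 dB -> beta_RLS ~ 14.1
# ENR = 10 dB -> beta_RLS ~ 55.3
# ENR =  0 dB -> beta_RLS ~ 309.0
```

The fixed choice $\beta = 20$ is thus near-optimal around ENR = 20 dB but under-regularizes by more than an order of magnitude at ENR = 0 dB, which is consistent with the behavior seen in Fig. 1.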
5. Conclusions
In this letter, we have proposed a simple condition, one that makes intuitive sense, for the derivation of an optimal regularization parameter for the RLS algorithm. Simulations have shown that, with the proposed regularization, the adaptive algorithm behaves very well at different ENR levels.

References
[1] P.C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM, Philadelphia, PA, 1998.
[2] J. Benesty, T. Gaensler, D.R. Morgan, M.M. Sondhi, and S.L. Gay, Advances in Network and Acoustic Echo Cancellation, Springer-Verlag, Berlin, Germany, 2001.
[3] C. Paleologu, J. Benesty, and S. Ciochină, Sparse Adaptive Filters for Echo Cancellation, Morgan & Claypool Publishers, 2010.
[4] S. Haykin, Adaptive Filter Theory, Fourth Edition, Prentice-Hall, Upper Saddle River, NJ, 2002.
[5] A.H. Sayed, Fundamentals of Adaptive Filtering, John Wiley & Sons, Hoboken, NJ, 2003.
[6] Digital Network Echo Cancellers, ITU-T Rec. G.168, 2002.