Fig. 3. Norm of coefficient-error vector for the SOV filter (white Gaussian input of variance 1, SNR = 20 and 40 dB).
to estimate the SOV system (treated in [9] and [11])

$$y_k = -0.78x_{k-1} - 1.48x_{k-2} + 1.39x_{k-3} + 0.04x_{k-4} + 0.54x_{k-1}^2 + 3.72x_{k-1}x_{k-2} + 1.86x_{k-1}x_{k-3} - 0.76x_{k-1}x_{k-4} - 1.62x_{k-2}^2 + 0.76x_{k-2}x_{k-3} - 0.12x_{k-2}x_{k-4} + 1.41x_{k-3}^2 - 1.52x_{k-3}x_{k-4} - 0.13x_{k-4}^2 + n_k.$$

We then have $N = 4$, $M = 5$, $2M = 10$, and $n = 14$. Fig. 3
illustrates the evolution of the performance criterion for two values of the SNR. We note that no numerical stability problem occurred during the simulations.
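For concreteness, here is a minimal Python sketch of the simulated system, with the kernel values read off the equation above; the input and SNR setup follow the caption of Fig. 3, while the function name, seeding, and noise-scaling convention are our own illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sov_output(x):
    """Noise-free output of the SOV system used in the identification example."""
    h1 = np.array([-0.78, -1.48, 1.39, 0.04])        # linear kernel (lags 1..4)
    # Quadratic kernel: h2[i, j] multiplies x_{k-1-i} * x_{k-1-j}
    h2 = np.array([[0.54,  3.72,  1.86, -0.76],
                   [0.00, -1.62,  0.76, -0.12],
                   [0.00,  0.00,  1.41, -1.52],
                   [0.00,  0.00,  0.00, -0.13]])
    y = np.zeros(len(x))
    for k in range(4, len(x)):
        u = x[k-4:k][::-1]            # [x_{k-1}, x_{k-2}, x_{k-3}, x_{k-4}]
        y[k] = h1 @ u + u @ h2 @ u    # 4 linear + 10 quadratic terms (n = 14)
    return y

x = rng.standard_normal(10_000)       # white Gaussian input, variance 1
y = sov_output(x)
snr_db = 20                           # the experiment uses SNR = 20 and 40 dB
noise = rng.standard_normal(len(y)) * np.sqrt(y.var() * 10 ** (-snr_db / 10))
d = y + noise                         # noisy desired signal for the adaptive filter
```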
VIII. CONCLUDING REMARKS

In this work, we have presented a fast algorithm for the multichannel linear and SOV filtering problem based on the fast Chandrasekhar equations. The technique extends the single-channel structure and can handle a different number of delay elements per input channel. The method has good numerical properties because, instead of propagating the overall covariance matrix through the recursions (a matrix that can lose its positive definiteness), it propagates a factorized, reduced-rank increment of this matrix. The adaptive filter was successfully applied to multichannel linear and nonlinear system identification, and its convergence and numerical stability are illustrated by simulations.

ACKNOWLEDGMENT

The authors wish to thank the anonymous referees for their valuable comments and constructive suggestions.

REFERENCES

[1] M. G. Bellanger, Adaptive Digital Filters and Signal Analysis. New York: Marcel Dekker, 1987.
[2] J. Cioffi and T. Kailath, "Fast RLS transversal filters for adaptive filtering," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, Apr. 1984.
[3] G. Demoment and R. Reynaud, "Fast RLS adaptive algorithms and Chandrasekhar equations," in Proc. SPIE Adaptive Signal Process., S. Haykin, Ed., July 1991, vol. P-1565, pp. 357–367.
[4] X. C. Du, A. Brella, and R. Longchamp, "Fast algorithm of Chandrasekhar type for ARMA model identification," Automatica, vol. 24, no. 5, pp. 927–934, 1992.
[5] D. D. Falconer and L. Ljung, "Application of fast Kalman estimation to adaptive equalization," IEEE Trans. Commun., vol. COM-26, pp. 1436–1446, Oct. 1978.
[6] F. Fnaiech, S. Guillon, and M. Najim, "A fast M-D recursive least square adaptive second-order Volterra filter," in Proc. IEEE Workshop Nonlinear Signal Image Process., Neos-Marmaras, Greece, June 22–24, 1995, pp. 408–411.
[7] A. Houacine, "Regularized fast recursive least squares algorithms for adaptive filtering," IEEE Trans. Signal Processing, vol. 39, pp. 860–870, Apr. 1991.
[8] B. H. Khalaj, A. H. Sayed, and T. Kailath, "A unified derivation of square-root multichannel least-squares filtering algorithms," in Proc. IEEE ICASSP, 1993, pp. 523–526.
[9] J. Lee and V. J. Mathews, "A fast recursive least square adaptive second-order Volterra filter and its performance analysis," IEEE Trans. Signal Processing, vol. 41, pp. 1087–1102, Mar. 1993.
[10] L. Ljung and T. Kailath, "Efficient change of initial conditions, dual Chandrasekhar equations, and some applications," IEEE Trans. Automat. Contr., vol. AC-22, pp. 443–447, June 1977.
[11] V. J. Mathews, "Adaptive polynomial filters," IEEE Signal Processing Mag., pp. 10–26, July 1991.
[12] M. Morf, G. S. Sidhu, and T. Kailath, "Some new algorithms for recursive estimation in constant, linear systems," IEEE Trans. Automat. Contr., vol. AC-19, pp. 315–323, Aug. 1974.
[13] M. Sayadi, A. Chaari, F. Fnaiech, and M. Najim, "A fast M-D Chandrasekhar algorithm for second-order Volterra adaptive filtering," in Proc. IEEE ICASSP, Atlanta, GA, May 7–10, 1996, pp. 1339–1342.
[14] A. H. Sayed and T. Kailath, "Extended Chandrasekhar recursions," IEEE Trans. Automat. Contr., vol. 39, pp. 619–620, Mar. 1994.
[15] G. L. Sicuranza, "Quadratic filters for signal processing," Proc. IEEE, vol. 80, pp. 1263–1285, Aug. 1992.
[16] D. T. M. Slock and T. Kailath, "Numerically stable fast transversal filters for recursive least squares adaptive filtering," IEEE Trans. Signal Processing, vol. 39, pp. 92–114, Jan. 1991.
A Novel Kurtosis Driven Variable Step-Size Adaptive Algorithm

Dimitrios I. Pazaitis and A. G. Constantinides
Abstract—In this correspondence, a new variable step-size LMS filter is introduced. The time-varying step-size sequence is adjusted using the kurtosis of the estimation error, thereby reducing the performance degradation caused by strong noise. The convergence properties of the algorithm are analyzed, and an adaptive kurtosis estimator that takes noise statistics into account and optimally adapts itself is also presented. Simulation results confirm the algorithm's improved performance and flexibility.
I. INTRODUCTION

Stochastic gradient adaptive filters using the least mean square (LMS) algorithm [12] enjoy great popularity due to their inherent simplicity and robustness. The LMS algorithm has found applications in numerous and diverse fields, such as noise canceling, adaptive line enhancement, telecommunications, and geophysics, and its performance has been thoroughly investigated in the literature (e.g., [3], [5], [12]).

Manuscript received March 31, 1996; revised May 1, 1998. This work was supported by a scholarship from the State Scholarship Foundation (I.K.Y.) of the Hellenic Republic. The associate editor coordinating the review of this paper and approving it for publication was Dr. Akihiko Sugiyama.

D. I. Pazaitis is with IMEC-VSDM, Leuven, Belgium (e-mail: [email protected]).

A. G. Constantinides is with the Signal Processing Section, Department of Electrical and Electronic Engineering, Imperial College, London, U.K. (e-mail: [email protected]).
A constant step size $\mu$, which is also known as the convergence factor, governs the stability of the algorithm, as well as the rate of convergence and the steady-state excess mean square error relative to the optimal Wiener solution. The convergence time is inversely proportional to $\mu$, whereas the misadjustment is proportional to $\mu N$, where $N$ equals the number of weights of the adaptive filter [12]. When a nonstationary environment is considered, the "lag misadjustment" is proportional to $\mu^{-1}$. Consequently, the optimal selection of the step size is dictated by a tradeoff between convergence rate and steady-state excess error and has been frequently addressed in the literature (e.g., [1], [12]).

To meet the conflicting requirements of fast convergence, good tracking ability, and low misadjustment, time-varying step-size sequences have been proposed (e.g., [4], [7]-[9]). The LMS+F algorithm [10] may also be regarded as a variable step-size algorithm. The main rationale behind these approaches is to somehow sense the distance from the optimum and adapt the step-size value correspondingly. Various approaches have been proposed based, among others, on the magnitude of the squared error (e.g., [8], [9]) and on the gradient of the error surface (e.g., [4], [7]). It seems to us, however, that the performance of most of the proposed algorithms is susceptible to degradation in the presence of strong noise. Furthermore, very fine tuning of the algorithm parameters is usually required to achieve satisfactory performance.

In this correspondence, a new time-varying step-size selection method is proposed that provides noise immunity under Gaussian noise conditions. To achieve this, we resort to higher order statistics and, more precisely, to the fourth-order cumulant, which is known as the kurtosis. The reason for using cumulants is twofold. First, cumulants are additive in their arguments, i.e., cumulants of sums equal sums of cumulants (hence the name cumulant). Second, all cumulants are blind to any kind of Gaussian process, i.e., all cumulants of order greater than two of a Gaussian process equal zero. Thus, by introducing cumulants, we immunize, in a way, the step-size selection algorithm against Gaussian or Gaussian-like noise. In a real environment, even if the noise is not Gaussian, the existence of a large number of additive interferences "moves" the probability density function of the noise, according to the central limit theorem, toward Gaussianity. Next, we formulate the adaptive system identification problem and introduce our variable step-size LMS algorithm.

Fig. 1. System identification setup.

II. A VARIABLE STEP-SIZE LMS ALGORITHM

The structure of the system identification setup is depicted in Fig. 1. The input signal is denoted as $x_n$, and $w_n$ is the observation noise. We assume that $w_n$ is symmetrically distributed and statistically independent of $x_n$. The identification (estimation) error at time $n$ is denoted by $e(n)$. All signals are real and zero mean, and $n$ is the discrete time index. The adaptive FIR filter coefficient vector is denoted by $\mathbf{H}_n = [h_n, \ldots, h_{n-N+1}]^T$. The vector containing the parameters of the unknown system at time $n$, $\mathbf{H}_n^{\mathrm{opt}}$, is assumed to have the same dimensions as $\mathbf{H}_n$. The error signal is then defined as

$$e(n) = e_n = w_n - \mathbf{X}_n^T \mathbf{V}_n \tag{1}$$

where $\mathbf{V}_n = \mathbf{H}_n - \mathbf{H}_n^{\mathrm{opt}}$ is the weight error vector, the symbol $(\cdot)^T$ stands for the vector transposition operation, $\mathbf{X}_n = [x_n, \ldots, x_{n-N+1}]^T$ is the input vector, and $N$ is the number of adaptive coefficients.

The popular LMS adaptive algorithm is a steepest-descent gradient search algorithm that uses, at each iteration, the instantaneous value of the gradient and seeks to minimize the mean square error $E\{e^2(n)\}$. The weight vector is updated using

$$\mathbf{H}_{n+1} = \mathbf{H}_n + \mu_n \mathbf{X}_n e(n) \tag{2}$$
where $\mu_n$ is the time-varying step size. In its original form [12], the step size assumes a constant value. In our proposed algorithm, the step size is a function of the kurtosis of the error. The following two types of functions are proposed for adjusting the step size of the LMS algorithm:
$$\mu_n = \alpha\, C_{4e}(n) \tag{3}$$

and

$$\mu_n = \mu_{\max}\left(1 - e^{-\alpha |C_{4e}(n)|}\right) \tag{4}$$

where

$$C_{4e}(n) = E\{e^4(n)\} - 3E^2\{e^2(n)\} \tag{5}$$
is the kurtosis of the error, $\alpha$ is a positive constant, and $|\cdot|$ denotes the absolute value. The constant $\mu_{\max}$ in (4) can be chosen as the maximum allowable value of $\mu_n$ that ensures convergence. Each of the above functions exhibits certain advantages. Update (4) is self-restricted to values lower than or equal to $\mu_{\max}$ and therefore requires no supervision. On the other hand, update (3) is simpler to use, but to ensure convergence, an upper bound $\mu_{\max}$ on $\mu_n$ may have to be imposed. It is also noted that the constant $\mu_{\max}$ in update (4) not only represents the maximum allowable value but also scales the term in the parentheses. Thus, in certain environments, values smaller than the stability bound may prove more beneficial. If desired, a lower bound on the step-size sequence may be applied in order to ensure a minimum level of tracking speed.

From (1), we see that the kurtosis of the error equals the sum of two terms: the noise kurtosis and the kurtosis of the convolution of the weight error vector with the input signal. Under the Gaussian noise assumption, the first term equals zero, whereas the second one approaches zero as the filter converges and the weight error vector $\mathbf{V}_n$ tends to zero. Thus, the algorithm starts with a large value of $\mu_n$, which decreases as it approaches the optimum, permitting, theoretically, exact convergence to the optimal solution.

A question that might arise is whether it would be more beneficial to use the third-order cumulant (skewness) of the error signal. The reason for preferring the fourth-order cumulant (kurtosis) lies in the fact that the third-order cumulant of a symmetric distribution always equals zero. Thus, under certain conditions, the third-order cumulant of the second term in (1) might be zero, thereby freezing the adaptation process of the filter weights. Furthermore, experimental results will confirm the
improved performance characteristics of the kurtosis-driven step size compared with step-size updates based on statistics of order lower than four.

The use of the exponential function to control $\mu_n$ was first proposed by Karni and Zeng in [7], where the norm of the instantaneous gradient, rather than the kurtosis, was used. Their algorithm can be interpreted as a new type of algorithm that considers a performance function containing an additional exponential term in comparison with that of the LMS [6].
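To make the update concrete, here is a minimal Python sketch of the kurtosis-driven variable step-size LMS of (2)-(5), using update (4). The sliding-window moment estimator and all parameter values are our illustrative choices; the paper's own adaptive kurtosis estimator, which accounts for the noise statistics, is developed later in the correspondence:

```python
import numpy as np

def kurtosis_vss_lms(x, d, N, mu_max=0.05, alpha=50.0, window=200):
    """Kurtosis-driven variable step-size LMS (sketch of updates (2), (4), (5)).

    x: input signal, d: desired signal, N: number of adaptive taps.
    C4e(n) = E{e^4} - 3 E^2{e^2} is estimated over a sliding window of
    recent errors (an illustrative choice, not the paper's estimator).
    """
    h = np.zeros(N)                      # adaptive weight vector H_n
    e_hist = np.zeros(window)            # recent errors for the moment estimates
    errors = np.zeros(len(x))
    for n in range(N, len(x)):
        u = x[n - N + 1:n + 1][::-1]     # X_n = [x_n, ..., x_{n-N+1}]^T
        e = d[n] - h @ u                 # estimation error e(n)
        e_hist = np.roll(e_hist, 1)
        e_hist[0] = e
        c4e = np.mean(e_hist**4) - 3.0 * np.mean(e_hist**2)**2   # eq. (5)
        mu = mu_max * (1.0 - np.exp(-alpha * abs(c4e)))          # eq. (4)
        h = h + mu * u * e               # LMS update, eq. (2)
        errors[n] = e
    return h, errors
```

Update (3) would instead set `mu = alpha * c4e`, with an upper bound imposed if needed, as the text notes.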
III. CONVERGENCE ANALYSIS

In this section, the behavior of the proposed variable step-size adaptive algorithm is investigated. Two types of convergence are examined, namely, convergence in the mean and convergence in the mean square sense. The first type of convergence suffices for the mean weight error to equal zero, whereas the second one ensures a steady-state error with finite variance.

To facilitate the analysis, a number of assumptions and approximations are introduced. The most common assumption is that the input vectors come from mutually independent, zero-mean, Gaussian-distributed sequences [3], [5], [12]. Although this is hardly true, especially in tapped-delay-line filters, where consecutive input vectors share $N-1$ entries, it is extensively used in the literature to simplify the analysis and has produced reliable results. The unknown system (optimal weight vector) $\mathbf{H}_n^{\mathrm{opt}}$ is considered time varying, and a random walk model is adopted to model its evolution

$$\mathbf{H}_{n+1}^{\mathrm{opt}} = \mathbf{H}_n^{\mathrm{opt}} + \mathbf{A}(n) \tag{6}$$

where $\mathbf{A}(n)$ is a white, zero-mean vector disturbance process with a covariance matrix equal to $\sigma_A^2 \mathbf{I}$. We assume that the triplet $\{\mathbf{X}_n, \mathbf{A}(n), w_n\}$ are statistically independent random processes, and we also introduce the assumption that the step-size sequence $\mu_n$ is statistically independent of $\mathbf{X}_n$, $\mathbf{H}_n$, and $e(n)$ [8], [9]. In the ideal version of the algorithm, where the actual kurtosis is utilized, this assumption is valid. In practical applications, however, where the error sequence $\{e(n)\}$ is considered ergodic and a finite-length window is used to estimate the kurtosis, this assumption does not hold. Experimental results have shown, nevertheless, that under slow convergence conditions or by increasing the time span of the window, the assumption tends to hold.

Since the input covariance matrix $\mathbf{R}$ is symmetric, it may be rotated into a diagonal matrix by a unitary transformation

$$\mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{-1} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{T}$$

where $\boldsymbol{\Lambda}$ is the diagonal matrix of the eigenvalues of $\mathbf{R}$. For the algorithm to converge in the mean sense, the mean weight error vector should converge to the zero vector. A sufficient condition for this is [12]

$$0 < E\{\mu_n\} < \frac{2}{\lambda_{\max}}$$

where $\lambda_{\max}$ is the largest eigenvalue of $\mathbf{R}$.
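This condition follows from a standard argument that the excerpt does not spell out; a sketch under the independence assumptions above (following the usual LMS mean analysis, e.g., [12]):

```latex
% Subtract the random-walk model (6) from the LMS update (2) and use
% e(n) = w_n - X_n^T V_n from (1):
%   V_{n+1} = (I - mu_n X_n X_n^T) V_n + mu_n X_n w_n - A(n).
% Taking expectations, with mu_n independent of X_n and V_n, and
% E{X_n w_n} = E{A(n)} = 0:
\[
  E\{\mathbf{V}_{n+1}\}
    = \left(\mathbf{I} - E\{\mu_n\}\,\mathbf{R}\right) E\{\mathbf{V}_n\},
  \qquad
  \mathbf{R} = E\{\mathbf{X}_n \mathbf{X}_n^T\}.
\]
% Rotating with Q decouples this recursion along the eigenvalues of R,
% so E{V_n} -> 0 iff |1 - E{mu_n} lambda_i| < 1 for every eigenvalue
% lambda_i, which is exactly 0 < E{mu_n} < 2 / lambda_max.
```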