IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 5, MAY 2007
Variable Explicit Regularization in Affine Projection Algorithm: Robustness Issues and Optimal Choice

Hernán Rey, Student Member, IEEE, Leonardo Rey Vega, Sara Tressens, and Jacob Benesty, Senior Member, IEEE
Abstract—A variable regularized affine projection algorithm (VR-APA) is introduced, without requiring the classical step size. Its use is supported from different points of view. First, it has the property of being $H^\infty$ optimal and it satisfies certain error energy bounds. Second, the time-varying regularization parameter is obtained by maximizing the speed of convergence of the algorithm. Although we first derive the VR-APA for a linear time invariant (LTI) system, we show that the same expression holds if we consider a time-varying system following a first-order Markov model. We also find expressions for the power of the steady-state error vector for the VR-APA and the standard APA with no regularization parameter. Particularly, we obtain quite different results with and without using the independence assumption between the a priori error vector and the measurement noise vector. Simulation results are presented to test the performance of the proposed algorithm and to compare it with other schemes under different situations. An important conclusion is that the former independence assumption can lead to very inaccurate steady-state results, especially when high values of the projection order are used.
Index Terms—Adaptive filtering, affine projection algorithm (APA), $H^\infty$ filtering, regularization, steady-state analysis.
I. INTRODUCTION
An adaptive filtering problem can be understood as one of identifying an unknown system using input–output data pairs. These situations appear very frequently in engineering problems [1]. Adaptive filtering schemes not only solve such problems with low computational cost, but can also deal with time variations of the system (nonstationary environments). In this paper, we focus on the affine projection algorithm (APA) [2]. It updates the system estimate based on multiple input vectors. Traditionally, explicit regularization has been used in adaptive filtering to provide numerical stability to those algorithms that have to deal with ill-conditioned matrix inversions. On the other hand, the connection between regularization and robustness has been explored in the matrix analysis literature [3], [4]. In APA, the role of explicit regularization in noise power reduction has also been pointed out [5].
Manuscript received March 22, 2006; revised July 21, 2006. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Dominic K. C. Ho. This work was supported in part by the Universidad de Buenos Aires under Project UBACYT I005. H. Rey, L. R. Vega, and S. Tressens are with the Facultad de Ingeniería, Universidad de Buenos Aires (FIUBA), Buenos Aires C1063ACV, Argentina (e-mail: [email protected]; [email protected]; [email protected]). J. Benesty is with INRS-EMT, Université du Québec, Montréal, QC H3C 3P8, Canada (e-mail: [email protected]). Digital Object Identifier 10.1109/TSP.2007.893197
We propose an APA with a time-varying regularization, which is supported from different points of view. First, a positive sequence for the regularization parameter can control the update of the adaptive filter, so the classical step size is no longer needed. Second, we show the robustness-related results we introduced in [6]. By invoking the theory of estimation in Krein spaces [7], we prove that the APA with this modified update is $H^\infty$ optimal. As a consequence, it exhibits a robust behavior against perturbations and model uncertainties, in the sense that small perturbations lead to small estimation errors. This is an important property in real environments. However, the robustness guaranteed by the $H^\infty$ approach is only true along a certain time interval, with possibly infinite length. That is why we look for energy relations that allow local robust behavior (robust at each time instant). Following the ideas introduced in [8], we find local and global error energy bounds for the APA family using the modified update.

Several methods have been proposed in the literature for step-size control, but most of them are implemented heuristically [9]. Particularly in APA, some of these methods have many parameters [11] not linked to any expression that could turn them into design parameters (e.g., showing the steady-state error as a function of these parameters). In [10], a method for controlling a time-varying regularization parameter was introduced, but it requires extra processing for its implementation (like prewhitening and delay coefficient estimation). Although in [6] we proposed an optimal choice for the regularization factor to achieve maximum speed of convergence, it also requires the knowledge or computation of the power of the misalignment vector, i.e., $E[\|\tilde{\mathbf{w}}(n)\|^2]$ ($\tilde{\mathbf{w}}(n)$ is defined in Section II). To skip this issue, we introduce here a variant that depends only on the power of the error vector, i.e., $E[\|\mathbf{e}(n)\|^2]$ ($\mathbf{e}(n)$ is defined in Section II). We estimate this by time-averaging the observable quantity $\|\mathbf{e}(n)\|^2$, allowing us to derive a new variable regularized APA (VR-APA). The simple expression derived here gives information about the relationship between the regularization factor and the convergence behavior of the algorithm. In addition, we prove that the same optimal regularization choice holds if we consider a time-varying system following a first-order Markov model. We also find expressions for the power of the steady-state error vector for the VR-APA and the APA with step size and no regularization factor. Different results are obtained with and without using the independence assumption between the a priori error vector and the measurement noise vector. Simulation results are presented to test the performance of the proposed algorithm and to compare it with other schemes under different situations.
In Section II, we introduce the APA recursion and propose to use a variable regularized version. Section III shows the robustness implications that come from using the modified update. The optimal choice for the regularization sequence is analyzed in Section IV. Then, in Section V, we find expressions for the steady-state behavior of the power of the error vector. Finally, simulation results are presented in Section VI.

Boldface symbols are used for vectors (lower case) and matrices (upper case). Other notation is defined as follows: $(\cdot)^T$, transpose; $(\cdot)^*$, conjugate; $(\cdot)^H$, conjugate and transpose; $\mathrm{tr}(\cdot)$, trace; $E[\cdot]$, expectation; $\mathbb{C}$, set of complex numbers; $\mathbf{I}$, identity matrix.

II. APA FAMILY

Let $\mathbf{w}_T \in \mathbb{C}^M$ be an unknown linear finite-impulse response system. The input vector at time $n$, $\mathbf{x}(n) = [x(n)\ x(n-1)\ \cdots\ x(n-M+1)]^T$, passes through the system, giving an output $y(n) = \mathbf{x}^H(n)\mathbf{w}_T$. This output is observed, but it usually appears corrupted by a measurement noise $v(n)$, which will be considered as additive. Thus, each input gives an output $d(n) = y(n) + v(n)$. We want to find $\mathbf{w}(n)$ to estimate $\mathbf{w}_T$. This adaptive filter receives the same input, leading to an output error $e(n) = d(n) - \mathbf{x}^H(n)\mathbf{w}(n-1)$.

When $K$ data blocks are used, we can define the data matrix $\mathbf{X}(n) = [\mathbf{x}(n)\ \mathbf{x}(n-1)\ \cdots\ \mathbf{x}(n-K+1)]$, the desired output data vector $\mathbf{d}(n) = [d(n)\ d(n-1)\ \cdots\ d(n-K+1)]^T$, the noise vector $\mathbf{v}(n)$, the error vector $\mathbf{e}(n) = \mathbf{d}(n) - \mathbf{X}^H(n)\mathbf{w}(n-1)$, the misalignment vector $\tilde{\mathbf{w}}(n) = \mathbf{w}_T - \mathbf{w}(n)$, and the a priori error vector $\mathbf{e}_a(n) = \mathbf{X}^H(n)\tilde{\mathbf{w}}(n-1)$. The APA was first introduced in [2], and follows the recursion

$$\mathbf{w}(n) = \mathbf{w}(n-1) + \mu\,\mathbf{X}(n)\left[\mathbf{X}^H(n)\mathbf{X}(n)\right]^{-1}\mathbf{e}(n) \qquad (1)$$

where $\mu$ is a scalar known as the step size, included to control the changes along the selected direction. Moreover, setting $K = 1$ in (1) leads to the popular normalized least mean square (NLMS) algorithm. The first motivation for using APA is to make an improvement in the convergence speed. In [12], it was shown that the recursions with $K = 1$ and $K > 1$ are both stable, but the first choice has less steady-state mean square error with the same convergence speed. In [13], it was shown that the tracking ability of APA is maximized when $\mu$ is close to 1. On the other hand, when highly colored input data are presented, the matrix inversion in (1) becomes very difficult as its condition number grows critically. Using this numerical stability justification, a positive regularization term is usually added. These are some of the reasons why we propose to set $\mu = 1$ and use a time-varying regularization parameter $\delta(n)$ to control the update of the adaptive filter, so that the APA update becomes

$$\mathbf{w}(n) = \mathbf{w}(n-1) + \mathbf{X}(n)\left[\mathbf{X}^H(n)\mathbf{X}(n) + \delta(n)\mathbf{I}_K\right]^{-1}\mathbf{e}(n). \qquad (2)$$

This rule gives an "effective step size" in (0,1) for any positive $\delta(n)$, so there is no upper bound on $\delta(n)$ that could make the algorithm unstable. In the following, we define the weighting matrix

$$\mathbf{D}(n) \triangleq \mathbf{X}^H(n)\mathbf{X}(n) + \delta(n)\mathbf{I}_K. \qquad (3)$$

Although a stationary system is assumed in Section III, we will consider a nonstationary environment when we optimize the regularization parameter.
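To fix ideas, here is a minimal NumPy sketch of one iteration of (1) and (2), for real-valued data; the array layout, parameter values, and function name are illustrative choices of ours rather than the paper's:

```python
import numpy as np

def apa_step(w, X, d, mu=1.0, delta=0.0):
    """One iteration of (1); with mu = 1 and delta > 0 it becomes update (2).

    w: (M,) current filter estimate.
    X: (M, K) data matrix whose columns are the last K input vectors.
    d: (K,) desired output vector.
    """
    K = X.shape[1]
    e = d - X.T @ w                    # error vector e(n)
    D = X.T @ X + delta * np.eye(K)    # weighting matrix, cf. (3)
    return w + mu * X @ np.linalg.solve(D, e)

# For K = 1 (NLMS), the "effective step size" of update (2) is
# ||x(n)||^2 / (||x(n)||^2 + delta), which lies in (0, 1) for any delta > 0.
```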
III. ROBUST BEHAVIOR OF THE APA FAMILY

Perturbations are something that any algorithm has to deal with in a real-world implementation. They have many different sources: parameter variations with time, initial condition errors, measurement noise, modeling errors, numerical precision, etc. If we follow a deterministic framework, an algorithm is robust if it does not amplify the energy of the perturbations. We show the robust behavior of the APA family following two approaches: $H^\infty$ optimality and error energy bounds.
A. $H^\infty$ Optimality of the APA Family
In the mid 1990s, Hassibi et al. presented a relationship between $H^\infty$ optimal filters and Kalman filtering in Krein spaces [7]. We apply this theory to show that the APA family is $H^\infty$ optimal. First, we introduce the state-space model (4), in which the system state remains constant, an estimate of the desired output is formed from it, and the resulting filtered error vector is the a posteriori error. We want to find the $H^\infty$ optimal estimate from the observations $\mathbf{d}(n)$, in order to achieve
(5)
where $\delta(n)$ is a positive weighting sequence, the supremum is taken over the space of square-summable causal disturbance sequences, and the infimum is taken over the set of all possible causal estimation strategies. This formulation shows that the $H^\infty$ optimal estimate guarantees a level $\gamma_{\mathrm{opt}}$ that represents the lowest bound on the energy transfer from all possible perturbations of finite energy to the a posteriori error. The $H^\infty$ estimates are then overconservative. In this case, the disturbances are given by the noise $\mathbf{v}(n)$ and the initial estimate $\mathbf{w}(0)$. When a closed-form solution to this problem is not available, we solve a suboptimal problem, looking for a strategy that achieves some level $\gamma > \gamma_{\mathrm{opt}}$. With (4) and (5) in mind, we restate [14, Th. 1], which gives the conditions for the existence of an $H^\infty$ a posteriori filter with level $\gamma$ (a formal proof can be found in [7]).

Theorem 1: For the state-space model (4), a strategy that achieves level $\gamma$ exists if and only if
(6)
where the matrix involved satisfies the following recursion:

(7)
If this is the case, then one possible strategy that achieves level $\gamma$ is given by (8), where the gain is computed as in (9) and (10). Equation (9) gives the expression for an $H^\infty$ a posteriori filter with level $\gamma$. Starting from (7), we can get that the condition (6) for the existence of an $H^\infty$ a posteriori filter with level $\gamma$ takes the form (11). Assuming an exciting input signal, i.e., one whose accumulated energy grows unboundedly with time,
we can see that, for large $n$ and any positive weighting sequence $\delta(n)$, we must have $\gamma \ge 1$, because the second term in (11) is a positive semidefinite matrix with very large eigenvalues when $n$ is large. Then, it is easy to see that $\gamma_{\mathrm{opt}} = 1$, because with this choice (11) is guaranteed. Using (9) and (10) with $\gamma = 1$, we can show that the optimal strategy that satisfies (5) is the APA update (2). This result means that, from all possible estimation strategies, the APA as in (2) is the one that minimizes the maximum energy transfer from the perturbations to the a posteriori error. However, when we used the hypothesis of the exciting input, we assumed that the interval length over which the energy in (5) is computed is sufficiently large. As a consequence, the problem could become one of infinite horizon. We can say nothing about what happens to the energy of the error right after a noise peak appears at a precise time instant. This "local behavior" needs a different approach, and it will be developed in Section III-B.

The fact that $\delta(n)$ acts as a weighting sequence for the energy of the noise and of the initial misalignment, in order to avoid the amplification of the perturbations, is an indicator that its importance is not restricted to numerical instability issues.

Another interpretation of (5), with $\gamma_{\mathrm{opt}} = 1$, comes from defining its associated transfer operator (12). It maps the weighted noise vector sequence and the perturbation induced by the initial choice into the weighted sequence of the a posteriori error vector.

Fig. 1. Maximum singular value of the operator (12) for the APA with the update (2). The input process is an AR1(0.95), M = 32, $\delta = 20\sigma_x^2$. For clarity, the plot is the ensemble average over 100 independent trials.

Taking (5) into account, the maximum singular value of this operator is equal to 1 for large enough $n$. The maximum singular value of an operator $\mathcal{T}$ can be obtained as $\bar{\sigma}(\mathcal{T}) = \sup_{\mathbf{u}\neq\mathbf{0}} \|\mathcal{T}\mathbf{u}\| / \|\mathbf{u}\|$.
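To illustrate how this maximum singular value can be evaluated numerically over a finite horizon, the following sketch builds the matrix of the linear map from the disturbances (initial misalignment and noise sequence) to the a posteriori errors by probing it with unit vectors, and then takes an SVD. It is only an assumption-laden illustration: the dimensions are arbitrary, the $\delta$-weighting of the energies in (5) is omitted for simplicity, and the a posteriori error is taken as $\mathbf{X}^H(n)\tilde{\mathbf{w}}(n)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Small, illustrative dimensions (not the paper's setup).
M, K, N, delta = 8, 2, 40, 1.0

# One fixed realization of an AR1(0.95) input and its data matrices X(n).
x = np.zeros(M + N)
for i in range(1, len(x)):
    x[i] = 0.95 * x[i - 1] + rng.standard_normal()
X = [np.column_stack([x[n - k - M + 1 : n - k + 1][::-1] for k in range(K)])
     for n in range(M, M + N)]

def run(w0_tilde, v):
    """Map (initial misalignment, noise sequence) -> stacked a posteriori errors.

    For a fixed input realization this map is linear, which is what lets us
    build the operator matrix column by column.
    """
    wt = w0_tilde.copy()
    out = []
    for n in range(N):
        Xn, vn = X[n], v[n * K : (n + 1) * K]
        e = Xn.T @ wt + vn                      # error vector e(n)
        D = Xn.T @ Xn + delta * np.eye(K)       # weighting matrix, cf. (3)
        wt = wt - Xn @ np.linalg.solve(D, e)    # misalignment after update (2)
        out.append(Xn.T @ wt)                   # a posteriori error vector
    return np.concatenate(out)

# Probe with unit vectors to assemble the finite-horizon operator matrix.
dim_in = M + N * K
T = np.zeros((N * K, dim_in))
for j in range(dim_in):
    u = np.zeros(dim_in)
    u[j] = 1.0
    T[:, j] = run(u[:M], u[M:])
print("max singular value:", np.linalg.svd(T, compute_uv=False)[0])
```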
We derived the operator (12) for the APA with the update (2) for all (the details are not shown, because they are tedious). This operator depends only on the input signal and on . In Fig. 1, we show for various instants the maximum singular value of the mentioned operator (for successive instants, the operator increases its dimensions). In this case, we used as in [17], where is the power of the input process. We can see that the maximum singular value grows with the iteration number and it tends towards 1. In addition, having fixed , the maximum singular value increases with . This has also been done in [14] for the recursive least squares (RLS) algorithm. The authors found that the maximum singular value for the RLS is above 1 and it increases with . That result might be an explanation for the poor robust performance of that algorithm, especially when non-Gaussian perturbations are considered. B. Local and Global Error Energy Bounds for APA We wonder if the APA family guarantees that the energy of the estimation errors will never exceed the energy of the perturbations for all time instants. To do so, we follow the approach introduced by Rupp and Sayed [8] for the least-mean-square (LMS)-type algorithms. Recalling the definition of the weighting matrix in (3), if we choose for all , then is positive definite. Using the APA recursion (2), the following local bounds for the a posteriori and a priori errors can be found.
Theorem 2 (Local Error Energy Bounds): At each time instant $n$, the following energy bounds apply to the a posteriori and a priori errors, respectively:

(13)

(14)

where, in each bound, the case $\mathbf{X}(n) \neq \mathbf{0}$ gives a weighted ratio of error energy to perturbation energy that does not exceed 1, and the case $\mathbf{X}(n) = \mathbf{0}$ (no update) yields equality.
Proof: See Appendix A.

A first interpretation of these bounds is that, at all time instants $n$, the energy of the estimation errors never exceeds the energy of the perturbations. Another interpretation comes from defining the operators that map the perturbations into the a posteriori and the a priori errors, respectively.
Fig. 2. Mismatch (in decibels) for an APA. The setup is the same as the one used in [20, Fig. 4]. The input process is modeled by (40). M = 256, K = 10, $\mu$ = 1, SNR = 10 dB [see (37) for its definition]. The plot is the result of ensemble averaging over 100 independent trials.
The bounds (13) and (14) imply that these operators are contractions (in the sense that the norm of the output mapping never exceeds the norm of the input). As we did before, we can see this experimentally by looking for the maximum singular value of the operator and verifying that it is not greater than 1. With the mapping defined previously, this is another way of presenting the robustness of APA. As the local bounds are valid for all $n$, if a time interval of length $N$ is taken into account, global error bounds come as a generalization. In this case, the global counterparts (15) and (16) hold, with the same case distinction as in Theorem 2.
The $H^\infty$ and the error-energy-bound approaches are both concerned with the robustness of the APA family. In the first approach, we proved that the APA is the algorithm that minimizes the energy relation (5) over all possible estimation strategies. In the second approach, the bound (15) shows that, actually, the APA family can reach a tighter energy relation, as $\delta(n)$ is always positive. This difference should not belittle the $H^\infty$ approach. The local and global bounds are satisfied by the APA family as a consequence of its way of finding the system estimate. On the other hand, the theory of linear estimation in Krein spaces is a powerful and elegant tool for the design and implementation of $H^\infty$ optimal filters.

Another important point must be emphasized, in order to avoid a misinterpretation of the presented results. What we can predict from the robust behavior of the APA family is that the energy of the error vectors will be bounded by the energy of the
perturbations. However, if the energy of the latter increases, the energy of the errors might do so too. This means that we can predict a stable behavior of the algorithm, but we are not able to predict whether the performance will be "satisfactory" or not. In [20], Yamada et al. showed that the performance of the APA with $\mu = 1$ and small $\delta$ is strongly affected by the noise level. This is not in contradiction with the results presented here. To clarify this, in Fig. 2, we plot the mismatch, i.e., $\|\tilde{\mathbf{w}}(n)\|^2 / \|\mathbf{w}_T\|^2$, for an APA with $M = 256$, $K = 10$, a signal-to-noise ratio (SNR) of 10 dB [defined in (37)], and different constant values for $\delta$. The rest of the setup is identical to the one used in [20]. As can be seen, in all cases, there is a stable behavior, even for the unregularized case. However, the steady-state mismatch degrades as $\delta$ is reduced. Although the rule of thumb $\delta = 20\sigma_x^2$ was introduced in [17], it might not be large enough to lead to a good performance, especially in low SNR conditions. As $K$ is increased, the energy of the noise vector $E[\|\mathbf{v}(n)\|^2]$ usually increases with it. At the same time, when $K > 1$, another phenomenon takes place: the cross correlation between the noise vector and the a priori error vector is not 0 anymore (actually, it becomes more negative as $K$ increases). This means that, in the case of $\delta = 0$ or small $\delta$, the energy of the errors would increase due to the increase in the energy of the noise, but this effect might be compensated by the negative cross correlation. In fact, it can even be overcompensated, letting the mismatch increase (as in the case shown by Yamada et al. [20]) without violating the energy bounds presented here. To emphasize the difference between robust behavior and steady-state mismatch, we can study the NLMS algorithm under impulsive noise. It is well known that in these scenarios the NLMS might present a positive steady-state mismatch. However, it is still $H^\infty$ optimal, as was proved in [14]. An expression relating the steady-state mismatch with $\delta$ requires further studies. Only with it will we be able to predict when a certain regularization factor will be enough to guarantee a satisfactory performance of the APA for a certain noise level. However, as shown in Fig. 2, the larger the $\delta$, the lower the steady-state
mismatch is. The price paid for this is a decrease in the speed of convergence. In order to solve this compromise, in Section IV, we look for a time-dependent sequence $\delta(n)$ that maximizes the speed of convergence.

IV. OPTIMAL REGULARIZATION CHOICE

In this section, we propose to maximize the speed of convergence by choosing, for each $n$, the $\delta(n)$ that minimizes $E[\|\tilde{\mathbf{w}}(n)\|^2]$. We assume that the measurement noise is a zero-mean white noise with power $\sigma_v^2$, independent of the input data. We also use the following usual assumption in the APA literature [10], [12], [15], [18].

A1) The a priori error vector $\mathbf{e}_a(n)$ and the noise vector $\mathbf{v}(n)$ are statistically independent.

As a consequence, our choice for $\delta(n)$ results in a sequence that does not depend on its past values (the past values of $\delta$ appear only through $\mathbf{w}(n-1)$). It should be noted that A1) is only valid for the APA with $K = 1$, i.e., the NLMS. In the sequel, we first assume a stationary environment and then analyze the nonstationary case.

1) Stationary System: From the APA recursion (2),
(17)

As $\mathbf{e}(n) = \mathbf{e}_a(n) + \mathbf{v}(n)$, using A1) leads to
(18)

Now, we perform a singular value decomposition (SVD) of the input matrix, i.e., $\mathbf{X}(n) = \mathbf{U}(n)\boldsymbol{\Sigma}(n)\mathbf{V}^H(n)$. By differentiating (18) partially with respect to $\delta(n)$, its optimum value is the one that solves (19), where the matrices involved are diagonal and are built from the eigenvalues and the eigenvector matrix of $\mathbf{X}^H(n)\mathbf{X}(n)$.

2) Nonstationary System: Suppose now that the system is nonstationary with dynamics

$$\mathbf{w}_T(n) = \mathbf{w}_T(n-1) + \mathbf{q}(n) \qquad (20)$$

where $\mathbf{q}(n)$ is a zero-mean white noise vector independent of the input data. The misalignment vector is now $\tilde{\mathbf{w}}(n) = \mathbf{w}_T(n) - \mathbf{w}(n)$. Thus, introducing A1),
(21)
As the error vector now contains an additional contribution due to $\mathbf{q}(n)$, its replacement in (21) leads to (18), except for a term that does not depend on $\delta(n)$, which will disappear after differentiating with respect to $\delta(n)$. This result implies that solving (19) gives the same optimal regularization choice for both stationary and nonstationary environments, and it does not depend on the input statistics.

A. Choice of $\delta(n)$ Under Simplifying Assumptions
Although we have just made the usual assumptions in APA analysis, solving (19) depends on the SVD of $\mathbf{X}(n)$. As we do not have this information, we perform a heuristic approximation by replacing each eigenvalue of $\mathbf{X}^H(n)\mathbf{X}(n)$ with

$$\bar{\lambda}(n) = \frac{1}{K}\,\mathrm{tr}\!\left[\mathbf{X}^H(n)\mathbf{X}(n)\right] \qquad (22)$$

which is the average of the eigenvalues. Under this condition, the matrices in (19) can be expressed as a constant times the identity matrix for each $n$. Hence, solving (19) leads to

$$\delta(n) = \frac{M\sigma_x^2\,K\sigma_v^2}{E\left[\|\mathbf{e}(n)\|^2\right] - K\sigma_v^2}. \qquad (23)$$

The denominator is large at the beginning of the adaptation, leading to a small $\delta(n)$. As the error decreases, $\delta(n)$ grows, slowing the adaptation and allowing the APA to have a small misadjustment. Although the approximation (22) could seem inaccurate, we can arrive at (23) with different assumptions. As $E[\mathbf{X}^H(n)\mathbf{X}(n)] = M\mathbf{R}_K$, where $\mathbf{R}_K$ is the $K$th-order autocorrelation matrix of the (stationary) input signal, if $M$ is large and $K \ll M$ (usual condition in APA applications), then it is reasonable to assume

$$\mathbf{X}^H(n)\mathbf{X}(n) \approx M\mathbf{R}_K. \qquad (24)$$

Thus, the eigenvalues and eigenvectors of $\mathbf{X}^H(n)\mathbf{X}(n)$ are not stochastic anymore. In general, knowledge of $\mathbf{R}_K$ will be required and the expression for $\delta(n)$ could be quite complicated. However, we can analyze two special cases.

1) White Input: In this case, $\mathbf{R}_K = \sigma_x^2\mathbf{I}_K$. Replacing (24) in (18) gives the result

(25)

Minimizing this expression with respect to $\delta(n)$ brings the optimal regularization choice (23).

2) AR1 With Pole Close to 1: If the input is a highly correlated AR1 (first-order autoregressive process with pole close to 1), by assuming independence between $\tilde{\mathbf{w}}(n-1)$ and $\mathbf{X}(n)$, we prove that (23) is the solution to the optimization (see Appendix B).

Although the choice in (23) was first derived heuristically, the two extreme cases analyzed here encourage us to implement it. For these reasons, we propose a new VR-APA by using the update (2) with $\delta(n)$ computed from (23). The quantity $E[\|\mathbf{e}(n)\|^2]$ is estimated by time-averaging $\|\mathbf{e}(n)\|^2$. The performance of this scheme will be tested in Section VI.
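As a concrete reading of the algorithm just described, the following sketch implements the VR-APA loop: update (2) with $\delta(n)$ from (23), the expectation $E[\|\mathbf{e}(n)\|^2]$ replaced by an exponentially smoothed time average (anticipating (36) in Section VI), and the safeguards discussed there (falling back to an upper bound $\delta_{\max}$ when the denominator of (23) becomes negative, and clipping above it). The forgetting factor, the initial error-power guess, and delta_max are illustrative assumptions:

```python
import numpy as np

def vr_apa(x, d, M, K, sigma_x2, sigma_v2, lam=0.999, delta_max=1e6):
    """Sketch of the VR-APA: update (2) with delta(n) from (23).

    x: input samples, d: observed (noisy) system output.
    lam (forgetting factor) and delta_max are illustrative choices.
    """
    w = np.zeros(M)
    e_pow = M * sigma_x2       # rough initial value for E||e(n)||^2
    W = []
    for n in range(M + K - 1, len(x)):
        # Data matrix X(n): columns are the last K input vectors.
        X = np.column_stack([x[n - k - M + 1 : n - k + 1][::-1]
                             for k in range(K)])
        e = d[n - K + 1 : n + 1][::-1] - X.T @ w    # error vector e(n)
        # Time average of ||e(n)||^2, in the spirit of (36).
        e_pow = lam * e_pow + (1 - lam) * (e @ e)
        den = e_pow - K * sigma_v2
        delta = (M * sigma_x2 * K * sigma_v2) / den if den > 0 else delta_max
        delta = min(delta, delta_max)               # clipping, cf. (26)
        w = w + X @ np.linalg.solve(X.T @ X + delta * np.eye(K), e)
        W.append(w.copy())
    return np.array(W)
```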
V. STEADY-STATE POWER OF THE ERROR VECTOR

Finally, we study the steady-state behavior of the regularized APA for a stationary system. Although the theoretical analysis of nonstationary systems is important, it will not be addressed here. The following analysis does not pretend to give a complete description of the statistical characteristics of the VR-APA. The exact quantitative analysis of the statistical behavior of the algorithm requires a full stochastic approach, and it is out of the scope of this paper. Particularly, we focus on the steady-state power of the error vector, i.e., $\lim_{n\to\infty} E[\|\mathbf{e}(n)\|^2]$. Usually, when regularization has been included in other analyses, it was assumed to be small [15]. This is not valid with the VR-APA, where, according to (23), $\delta(n)$ can take very large values. Although there is no upper bound on $\delta(n)$ for assuring the stability of the algorithm, we do not let it take arbitrarily large values. This is done to avoid overflow problems in the practical implementation of the algorithm. We propose to set an upper bound on $\delta(n)$ defined by
(26)

where $\gamma$ is a positive design parameter. The particular choice of (26) allows us to derive compact results for the steady-state error power. Although the dynamics of $\delta(n)$ are given by (23), when this value is greater than $\delta_{\max}$, we set it to this upper bound. We assume that, at steady state, $\delta(n) = \delta_{\max}$ and that the error power converges (although we have not proved the last statement, it will be verified later with simulations). In general, $\lim_{n\to\infty} E[\|\mathbf{e}(n)\|^2]$ requires information about the input statistics. This is why we decided to analyze the special cases treated in Section IV. It is not difficult to see that the scalar error $e(n)$ is the first component of the error vector $\mathbf{e}(n)$.
A. White Input

Despite the fact that $\mathbf{v}(n)$ and $\mathbf{e}_a(n)$ are independent for $K = 1$, this is not true for $K > 1$ (although it is usually assumed in the APA literature). For this reason, we separately analyze the cases with and without using A1).

1) Analysis With Independence Between $\mathbf{e}_a(n)$ and $\mathbf{v}(n)$: Starting from (25), it holds that

(27)

Then, if the components of $\mathbf{e}(n)$ are identically distributed, the APA presents no misadjustment. This is in agreement with the results of [12] and [15].

2) Analysis Without Independence Between $\mathbf{e}_a(n)$ and $\mathbf{v}(n)$: Starting from (17) and substituting (24) gives

(28)

By comparing this expression with (25), it can be seen that the second term of (28) is the one that considers the cross correlation between the a priori error vector and the measurement noise vector. This term is not 0 because $\mathbf{w}(n-1)$ depends on past noise samples. From recursion (2), this dependence takes the explicit form

(29)

The first term is independent of $\mathbf{v}(n)$, given that the noise sequence is white. The second term is actually the one that has an impact on the cross correlation. Using (29) and (24), we can write

(30)

Applying (24), the terms $E[\mathbf{X}^H(n)\mathbf{X}(n-j)]$ in (30) can be approximated by $M\sigma_x^2$ times a matrix that has 0s everywhere, except in the "diagonal" starting from element $(1+j,\,1)$, where it has 1s. For example, for $K = 4$ and $j = 1$,

$$\begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$

According to this, the first cross term in (30) can be evaluated explicitly. Following the same reasoning, the other terms in (30) can be calculated. Now, considering in (28) the steady-state condition, (30) gives the result.
Rearranging terms leads to
(31)

It can be seen that, when $K = 1$ (the NLMS case), (27) and (31) are equivalent. This is reasonable since, when $K = 1$, the a priori error depends only on noise samples prior to time $n$, and the noise is temporally uncorrelated. In
Section VI, we will further analyze these predictions and test them with simulations. Although the difference between (27) and (31) could be thought of as a consequence of the regularized scheme, something similar happens with the standard APA. Consider the update (1) with a fixed step size $\mu$ and $\delta = 0$. Using the same arguments as in the VR-APA, it can be shown that, when the input is white and A1) is included,

(32)

while, when A1) is not assumed,
(33)

Here again, the results are equivalent when $K = 1$. However, there might be important differences when other values of $K$ are used. In Section VI, we will also compare these results.

B. AR1 With Pole Close to 1

The cross correlation between $\mathbf{e}_a(n)$ and $\mathbf{v}(n)$ is much more complicated than for the white input case. For this reason, we just analyze the case with the assumption A1). Using (24), the asymptotic behavior of (18) is an expression in which the eigenvalues and eigenvectors of $\mathbf{R}_K$ appear. Rearranging terms,

(34)

However, as we prove in Appendix B, the terms associated with the eigenvectors of the nearly zero eigenvalues vanish. Therefore,

(35)

When $K = 1$, the result is equivalent to the one with white input. However, differences arise when $K > 1$. We will also test these predictions in Section VI. Equations (31) and (35) allow us to define $\delta_{\max}$ to achieve a certain level of steady-state power for the error vector. Putting this together with the choice (23) provides the fastest convergence to reach that level.

VI. SIMULATION RESULTS

For implementing the proposed algorithm, some issues should be taken into account. The quantity $E[\|\mathbf{e}(n)\|^2]$ is estimated by low-pass filtering:

$$\widehat{\sigma_e^2}(n) = \lambda\,\widehat{\sigma_e^2}(n-1) + (1-\lambda)\,\|\mathbf{e}(n)\|^2 \qquad (36)$$
where $\lambda$ is the forgetting factor; it is computed from a design constant that is a natural number. Although a fixed $\lambda$ was chosen in each simulation, a simple procedure could be used to increase $\lambda$ as the algorithm gets closer to its steady state. An extra 5-dB gain in steady-state mismatch can be accomplished, but we chose not to use it because it is not a mandatory feature of the VR-APA. The fluctuations in the estimate (36) could result in a lower magnitude than $K\sigma_v^2$, especially when the algorithm is close to the steady state, and the denominator of (23) would become negative in this situation. Hence, we set $\delta(n) = \delta_{\max}$, as defined in (26), when the denominator of (23) becomes negative. As mentioned before, if $\delta(n) > \delta_{\max}$, we set it to $\delta_{\max}$. With all this, the computational cost of the VR-APA is of the same order as that of the standard APA with the update (2), because the only extra cost is associated with the computation of (36) and (23).

Fig. 3. Measured room acoustic impulse response.

The system is taken from a measured impulse response (Fig. 3). We truncate it to $M = 512$ or $M = 256$, except for $M = 32$, where samples 106–137 are used. The adaptive filter length is set equal to $M$ in each case. We use the mismatch, which was already defined in Section III, as a measure of performance. The plots are the result of ensemble averaging over 30 independent trials, except in Figs. 8–10, where 150 are used. A zero-mean Gaussian white noise is added to the system output, which has a power $\sigma_y^2$, such that the SNR is

$$\mathrm{SNR} = 10\log_{10}\!\left(\frac{\sigma_y^2}{\sigma_v^2}\right). \qquad (37)$$
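For reference, the setup just described can be reproduced in outline as follows: the noise is scaled to a target SNR according to (37), and the mismatch curve is computed against the true response. The vr_apa routine is the sketch given after Section IV, and the AR1 input, dimensions, and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

M, K, snr_db = 64, 4, 30.0
w_true = rng.standard_normal(M)          # stand-in for the measured response
N = 20000

# AR1(0.9) input, one of the excitations used in the experiments.
x = np.zeros(N)
for i in range(1, N):
    x[i] = 0.9 * x[i - 1] + rng.standard_normal()

# Clean system output and noise scaled to the target SNR, cf. (37).
y = np.convolve(x, w_true)[:N]
sigma_y2 = np.mean(y**2)
sigma_v2 = sigma_y2 / 10 ** (snr_db / 10)
d = y + np.sqrt(sigma_v2) * rng.standard_normal(N)

sigma_x2 = np.mean(x**2)
W = vr_apa(x, d, M, K, sigma_x2, sigma_v2)   # sketch from Section IV

# Mismatch (in decibels), as used throughout the figures.
mis = 10 * np.log10(np.sum((W - w_true) ** 2, axis=1) / np.sum(w_true**2))
print("final mismatch (dB):", mis[-1])
```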
The values of $\sigma_x^2$ and $\sigma_v^2$ are required for computing (23). In practice, they might be known or otherwise estimated ($\sigma_x^2$ can be obtained by averaging the square of the input signal, and $\sigma_v^2$ can be estimated, for example, during silences in an acoustic echo cancellation context). In our simulations, we consider them as known. The performance of the proposed algorithm is compared with other strategies. Although interesting work has been done for comparing the performance of different algorithms [16], in this paper, we want to emphasize how the VR-APA can solve the tradeoff between speed of convergence and steady-state error. We simulate the standard APA with $\mu = 1$ (which gives the fastest speed of convergence) and another with the $\mu$ that gives the same steady-state mismatch as for the VR-APA. In both cases, a fixed regularization factor is set to $\delta = 20\sigma_x^2$ [17], except when indicated otherwise, where a larger value is used. We also show the performance of the variable step size APA (VSS-APA) introduced by Shin et al. [18], with its update calculated as in (1), with the variable step size given by (38).
Fig. 4. Mismatch (in decibels) for white input signal. M = 512, K = 1, SNR = 30 dB; remaining algorithm parameters: 0.05 and 1.
(39)
a regularization sequence, given in (39), that can be estimated in a similar way to (36).
We use the same $\lambda$ and $\delta_{\max}$, and we perform the same control on $\delta(n)$ (with respect to negative denominators and values above the bound) as in the proposed VR-APA. It can easily be seen that both VR-APAs are equivalent when $K = 1$.

1) Stationary Systems: Figs. 4 and 5 show the simulated results for white input excitation. The proposed VR-APA converges as fast as the APA with $\mu = 1$, but with a clear improvement in the steady-state mismatch. The APA with smaller $\mu$ degrades its convergence speed notoriously in order to reach this steady-state level. The relation between $\delta(n)$ and the fixed $\delta$ is between six and eight orders of magnitude. In these cases, the POVR-APA performs similarly to the APA with $\mu = 1$. We also explore the performance for an AR1 input with pole at 0.95 in Figs. 6 and 7. The conditioning number (ratio
Fig. 5. Mismatch (in decibels) for white input signal. M = 512, K = 8, SNR = 30 dB; remaining algorithm parameters: 0.01 and 1.
between the maximum and minimum eigenvalue of the correlation matrix) can be seen in the captions. Although the POVR-APA presents a slightly faster convergence than the proposed scheme, it has a worse steady-state performance. Its regularization requires a difference of four to six orders of magnitude with respect to the one of the VR-APA in order to achieve a similar performance. The evolution of $\delta(n)$ across time is presented in Fig. 8. Initially, it is small, so fast convergence is obtained. As the estimate of $E[\|\mathbf{e}(n)\|^2]$ becomes smaller, $\delta(n)$ increases until it reaches $\delta_{\max}$ and stays stable there. This behavior of $\delta(n)$ was consistent across all the simulations performed. Although the theoretical results of Sections IV and V were developed for the cases of white and AR1 inputs with long system responses, we also test the performance of the algorithm when these hypotheses are not satisfied. In Fig. 9, we use the same setup as in [20, Fig. 4]. We want to see if the VR-APA can perform well even with an SNR of
Fig. 6. Mismatch (in decibels) for AR1(0.95). M = 512, K = 2, SNR = 30 dB; remaining algorithm parameters: 0.08 and 1; condition number 1502.
Fig. 7. Mismatch (in decibels) for AR1(0.95). M = 512, K = 8, SNR = 30 dB; remaining algorithm parameters: 0.08 and 1; condition number 1502.
Fig. 8. Variation of $\delta(n)$ across time for AR1(0.9). M = 256, K = 2, SNR = 30 dB; remaining algorithm parameters: 0.2 and 1; condition number 357.
Fig. 9. Mismatch (in decibels) for ARMA(2,2). M = 256, K = 10, SNR = 10 dB; remaining algorithm parameters: 0.1 and 5; condition number 1.0682 × 10^5.
10 dB (we have already seen in Fig. 2 that the fixed regularized APA can perform poorly in this environment). Thus, the input was generated by passing Gaussian white noise through an autoregressive moving average process ARMA(2,2) with the transfer function in (40). The interesting fact about this process is that the generated signals are highly correlated, even more than with the standard AR1. Its autocorrelation matrix can easily have a conditioning number larger than 100 000. As can be seen, the VR-APA can adapt well to this scenario. Although the APA with $\mu = 1$ presents an increasing mismatch function which stabilizes around 2 dB, when a lower $\mu$ is used, the APA shows a very good performance. The VSS-APA and the POVR-APA have worse performances.
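The conditioning numbers quoted in the figure captions can be checked directly from the input model: for a stationary AR1 input with pole $a$, the autocorrelation is $r(k) = \sigma_x^2 a^{|k|}$, so the $M \times M$ correlation matrix is Toeplitz and its eigenvalue spread is easy to compute (the pole and size below are illustrative):

```python
import numpy as np
from scipy.linalg import toeplitz

def ar1_condition_number(a, M):
    """Eigenvalue spread of the M x M correlation matrix of an AR1(a) input."""
    r = a ** np.arange(M)            # autocorrelation r(k) = a^|k| (unit power)
    eig = np.linalg.eigvalsh(toeplitz(r))
    return eig.max() / eig.min()

print(ar1_condition_number(0.95, 512))   # grows quickly as the pole nears 1
```

For a = 0.95 and large M, the spread approaches the ratio of the spectral extremes, ((1 + a)/(1 - a))^2, which is about 1.5 × 10^3, consistent with the values quoted in the captions.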
In Fig. 10, we test an $M = 32$ taps filter using a moderately correlated AR1(0.8) input process. Here, the VR-APA has a similar performance to the VSS-APA with the tuned step size, while it outperforms the other schemes, both in speed of convergence and steady-state value. This shows that, even when the optimality of the proposed $\delta(n)$ cannot be guaranteed in this experimental setup, the VR-APA has a good performance.

2) Nonstationary Systems: First, we test the recovery from a sudden change in the system. In Fig. 11, we show the result of suddenly multiplying the system response by $-1$. The VR-APA can quickly learn the new system without losing speed of convergence nor steady-state error with respect to the ones in the initial system identification. It can be seen in Fig. 12 that, after the sudden change, the increase in the power of the error vector leads to a fast decrease of $\delta(n)$. Thus, the "effective step size" is closer to 1, which, according to [13], maximizes the tracking
Fig. 10. Mismatch (in decibels) for AR1(0.8). M = 32, K = 4, SNR = 30 dB; remaining algorithm parameters: 0.03 and 1; condition number 72.19.
Fig. 11. Mismatch (in decibels) for AR1(0.9). The system is suddenly changed from $\mathbf{w}_T$ to $-\mathbf{w}_T$. M = 256, K = 8, SNR = 60 dB; remaining algorithm parameters: 0.05 and 1; condition number 357.
ability of the APA family. Although not shown here, the performance can be further improved by using a variable memory factor $\lambda$. Now, we study the performance under a first-order Markov nonstationary system. To do so, we start with the parameters of the stationary system simulated in Fig. 11. Then, we generate a Gaussian white noise vector that is added to the system according to (20). Each component of the noise vector has a power $\sigma_q^2$ chosen according to the degree of nonstationarity [21]
where $S$ denotes the degree of nonstationarity, following [21]; particularly, we use $S = 2$ in Fig. 13. The VR-APA can adapt well to these situations without increasing $\delta(n)$ very much, so that a good tracking performance is accomplished. Despite the fact that, in this case, the VR-APA has the same performance as the standard
Fig. 12. Evolution of $\delta(n)$ across time for the setup of Fig. 11.
Fig. 13. Mismatch (in decibels) for AR1(0.9). First-order Markov system. M = 256, K = 8, S = 2, SNR = 60 dB; remaining algorithm parameters: 0.05 and 1; condition number 357.
APA with $\mu = 1$ and the POVR-APA, when the same parameters are used with a stationary system, the proposed algorithm outperforms the other schemes, as can be seen in the first half of Fig. 11. The standard APA with small $\mu$ has poor performance, as previously noted in [13].

3) Steady-State Power of the Error Vector: Finally, we test the accuracy of (27), (31)–(33), and (35). The experimental points on each simulation come from the steady-state value of (36) for the VR-APA, and from an average of $\|\mathbf{e}(n)\|^2$ across the last 3000 iterations for the APA with step size $\mu$ and $\delta = 0$. We start with the standard APA, which is shown in Fig. 14. The experimental points are well fitted by the predictions that do not use A1). While (32) predicts an increasing function of $\mu$, (33) gives the same function for $K = 1$, a constant for $K = 2$, and U-shaped curves symmetric with respect to $\mu = 1$ for $K > 2$. We should not misinterpret these results by making an analogy with the known ones for $E[e^2(n)]$. In [12], the authors showed that $E[e^2(\infty)]$ grows with $\mu$ and is almost independent of $K$. On the other hand, in [15], the authors found that, when
Fig. 14. Steady-state of the square norm of the error vector for the APA with step size $\mu$ and $\delta = 0$. The theoretical expressions come from (32) and (33). White input excitation. M = 512, SNR = 30 dB ($\sigma_v^2 = 0.0001$).
the step size is small, the steady-state error is close to the noise floor, and, as the step size grows, the error increases linearly with it. It should be noticed that, in [15], the authors neglect the dependence of $\mathbf{w}(n-1)$ on past noise samples, so they are implicitly using A1).

If we look at the case $\mu = 1$, by definition of the APA, all the components of the error vector are 0 except for the first one. This means that, particularly, $E[\|\mathbf{e}(\infty)\|^2] = E[e^2(\infty)]$. Then, the result (32) that uses A1) is in agreement with the one in [15], while (33) predicts a different value. As can be observed in Fig. 14, the experimental results show that, when white input is considered, (33) is the expression that leads to the most accurate predictions. When $\mu = 1$, (33) predicts that $E[\|\mathbf{e}(\infty)\|^2]$ is constant with $K$. At $\mu = 1$, the first component of the error vector carries all the energy and, as $\mu$ decreases, the vector tends to an identically distributed state. As $\mu$ is decreased towards 0, $E[e^2(\infty)]$ might decrease (as predicted in [12] and [15]), but the energy of the second component of $\mathbf{e}(n)$ must increase so that the sum of the squares remains constant. In general, (33) predicts that, for $K > 1$, there must be a complex process of redistribution of the total energy across the components of $\mathbf{e}(n)$. To understand this process, the asymptotic behavior of $E[e^2(n)]$ as a function of $\mu$ is necessary.

In Fig. 15, the VR-APA is analyzed for white input excitation. The experimental points are in excellent agreement with the results predicted by (31), while (27) is highly inaccurate except for $K = 1$, when it is equivalent to (31). Here, (27) is also an increasing function of $K$ with a finite asymptotic value. On the other hand, (31) is constant for small $K$ and decreasing for larger $K$, but it presents a limiting value which is independent of $K$. Again, $E[e^2(\infty)]$ can still present an increasing behavior with $K$ because, for $K > 1$, the redistribution of the energy across each component of the error vector takes place. For this reason, the norm of the vector can decrease while the square value of the first component is increasing. For $K = 1$, the same phenomenon
Fig. 15. Steady-state of the square norm of the error vector for the VR-APA. The theoretical expressions come from (27) and (31). White input excitation. M = 512, SNR = 30 dB ($\sigma_v^2 = 0.0001$), and $\lambda = 0.9997$.
Fig. 16. Steady-state of the square norm of the error vector for the VR-APA. The theoretical expressions come from (35). AR1(0.95) input excitation. M = 256, SNR = 30 dB ($\sigma_v^2 = 0.0001$), and $\lambda = 0.9997$.
as for the standard APA is observed. More studies are required to understand the process of redistribution of the energy. The VR-APA with a highly correlated AR1 input process is analyzed in Fig. 16. For $K = 1$, the predictions are accurate, as A1) holds. When $K > 1$, there are differences that become larger as $K$ grows, except when the effective step size is very small, which agrees with the results in [15] for very small step sizes. Actually, the predictions by (35) are increasing functions of $K$ and, as shown in the figure, the real points are interpolated by a decreasing function for $K > 1$. The dynamics of the real data are slower than for the white input case, as can be seen by comparing the dotted lines of Figs. 15 and 16. For $K = 1$, both input processes would lead to the same constant error power. As the cross correlation is actually negative, (35) returns a quantity larger than the real one, so it is a conservative expression.
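The sign of this cross correlation is easy to probe numerically: run update (2) with a white input until steady state and form the sample mean of $\mathbf{v}^T(n)\mathbf{e}_a(n)$. In the sketch below (all values illustrative), the estimate should come out negative for K > 1, in line with the discussion above:

```python
import numpy as np

rng = np.random.default_rng(2)
M, K, N, delta, sigma_v = 32, 4, 30000, 10.0, 0.01

w_true = rng.standard_normal(M)
x = rng.standard_normal(N + M)                   # white input
w = np.zeros(M)
acc, count = 0.0, 0

for n in range(M + K - 1, N):
    X = np.column_stack([x[n - k - M + 1 : n - k + 1][::-1] for k in range(K)])
    v = sigma_v * rng.standard_normal(K)         # noise vector v(n)
    e_a = X.T @ (w_true - w)                     # a priori error vector
    e = e_a + v                                  # observed error vector e(n)
    if n > N // 2:                               # steady-state half only
        acc += v @ e_a
        count += 1
    w = w + X @ np.linalg.solve(X.T @ X + delta * np.eye(K), e)

print("sample E[v^T e_a] at steady state:", acc / count)
```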
VII. CONCLUSION

In this paper, we proposed a modified update for the APA family, which includes an explicit regularization factor. The algorithm is stable for all $\delta(n) > 0$ and, by choosing a time-dependent sequence $\delta(n)$, the classic step size is no longer needed. Particularly, the explicit regularization factor does not only help in dealing with numerical precision problems, but also allows a robust behavior against all possible perturbations. The algorithm is robust in the sense that it does not amplify the energy of the perturbations. This was justified from the $H^\infty$ optimality and the local error energy bounds obtained. However, there are no implications on the steady-state performance of the mismatch when low SNRs and different values of $\delta$ are considered (all we can say is that it will be stable).

We also performed an analysis for optimizing $\delta(n)$ to have maximum speed of convergence. The general expression depends on the input statistics. Nevertheless, we proved that the same expression holds for stationary and nonstationary (random walk) systems. A closed formula for $\delta(n)$ was derived, which is optimal (under certain assumptions) for white and highly colored AR1 inputs. Its expression depends on $E[\|\mathbf{e}(n)\|^2]$, which can be easily estimated by averaging the observable quantity $\|\mathbf{e}(n)\|^2$. This is in contrast with previously proposed variable regularized APAs that use the unobservable power of the misalignment [6], [10].

In addition, we analyzed the steady-state power of $\mathbf{e}(n)$ for white and highly colored AR1 inputs. By the nature of the VR-APA, we could not assume (as usually done in the literature) that $\delta$ was small. In particular, for the white input case, we found different expressions with and without assuming independence between the a priori error vector and the measurement noise vector. Although this difference could be thought of as a consequence of the regularized scheme, similar results were found for the standard APA with step size and no regularization factor.

The proposed VR-APA shows great performance under different scenarios, even when compared with standard and VSS APAs. The variable regularization factor can control the system update well and, at the same time, it allows a robust performance against perturbations (not only those generated by numerical instabilities). Considering the parameter of the VSS-APA [18], when it is tuned so that the algorithm's steady-state mismatch equals that of the VR-APA, both algorithms have almost the same performance. The problem is that there is no expression to compute this parameter in advance, and its range of variability under different scenarios was quite large. This lack of a good expression to compute it makes it unsuitable in practice. On the other hand, while the POVR-APA is equivalent to the proposed algorithm when $K = 1$, its performance gets worse (especially in steady state) when larger values of $K$ are considered.

Finally, the simulations for testing the steady-state power of $\mathbf{e}(n)$ showed very different results (as predicted) depending on the use of the assumption A1), which stands for the independence between the a priori error vector and the noise vector. Although we used (strong) approximations in our analysis, e.g., (24), the simulation results show that the predictions are very
good. Moreover, the assumption A1) has shown to be much stronger than (24). An important fact is that the presented results show that, for the APA and VR-APA with $K > 1$, reducing the steady-state power of the error vector does not necessarily lead to a reduction in the mismatch. This could be a barrier when someone wants to generalize certain approaches used in the NLMS literature. Further studies are required to link our results with the steady-state values of the mismatch. Particularly, the distribution of the energy across the components of the error vector as a function of $\mu$, $K$, and $\delta$ can improve our understanding of the APA family. Another conclusion is that the cross correlation should be taken into account in future analyses of the APA, especially for large $K$. Actually, the VR-APA should be derived again without using A1), even though this could be complicated (a dependence on past values of $\delta$ is expected).

APPENDIX A
PROOF OF THEOREM 2

We start from a weighted difference between the energy of the perturbations and the energy of the errors at each time instant, which has the form of
(41)

According to the definitions in Section II, the errors can be written as (42) and (43). Now, we can compute the norm of the a posteriori error using (42), which gives (44), and, using (43), the norm of the a priori error, which gives (45). Substituting (44) and (45) in (41) gives the result
which is positive for any $\mathbf{v}(n)$, as $\delta(n) > 0$ and $\mathbf{D}(n)$ is positive definite. Regrouping terms in (41) leads to the first part of (13). If $\mathbf{X}(n) = \mathbf{0}$, there is no system update, so the a posteriori and a priori errors coincide and equality holds, which completes the proof for the a posteriori bound. For the a priori bound, we start from
(46)
The last term can be written using (42) as (47). Substituting (47) and (44) in (46) leads to the corresponding bound.
With the same arguments applied to the a posteriori bound, the rest of the proof can be finished.
APPENDIX B
OPTIMIZATION OF $\delta(n)$ FOR AR1 WITH POLE CLOSE TO 1
Replacing (24) in (18) and optimizing for $\delta(n)$, we obtain an expression in which $\lambda_i$ and $\mathbf{q}_i$, the eigenvalues and eigenvectors of $\mathbf{R}_K$, appear. If the pole of the AR1 process is close to 1, one eigenvalue dominates and the other eigenvalues are close to 0. Therefore,
(48)

and, rearranging terms,

(49)

Looking at this expression, we see that, if the terms associated with the nondominant eigenvectors vanish for each $n$, the optimal choice for $\delta(n)$ follows the former result (23). This means that, for analyzing this hypothesis, we need to study
(50)

Assuming independence between $\tilde{\mathbf{w}}(n-1)$ and $\mathbf{X}(n)$, we can evaluate (50). However,
(51)

where $q_i(k)$ is the $k$th component of the eigenvector $\mathbf{q}_i$. The $(j,k)$ element of the matrix $\mathbf{R}_K$ is $\sigma_x^2 a^{|j-k|}$, with $a$ being the pole of the AR1 process. As $a$ is close to 1, all
the matrices in (51) are similar, so we can take them out of the summations. In addition,

(52)

because these eigenvectors are associated with nearly zero eigenvalues. Thus, (51) goes to zero, leading (50) to satisfy the former hypothesis.

REFERENCES

[1] S. Haykin, Adaptive Filter Theory, 4th ed. Upper Saddle River, NJ: Prentice-Hall, 2002.
[2] K. Ozeki and T. Umeda, "An adaptive filtering algorithm using an orthogonal projection to an affine subspace and its properties," Electron. Commun. Jpn., vol. 67-A, no. 5, pp. 19–27, May 1984.
[3] L. E. Ghaoui and H. Lebret, "Robust solutions to least-squares problems with uncertain data," SIAM J. Matrix Anal. Appl., vol. 18, no. 4, pp. 1035–1064, Oct. 1997.
[4] G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd ed. Baltimore, MD: Johns Hopkins Univ. Press, 1989.
[5] G. Rombouts and M. Moonen, "Avoiding explicit regularization in affine projection algorithms for acoustic echo cancellation," in Proc. ProRISC-99, Mierlo, The Netherlands, Nov. 1999, pp. 395–398.
[6] H. G. Rey, L. R. Vega, S. Tressens, and B. Cernuschi-Frías, "Analysis of explicit regularization in affine projection algorithms: Robustness and optimal choice," in Proc. EUSIPCO-04, Vienna, Austria, Sep. 2004, pp. 1089–1092.
[7] B. Hassibi, A. H. Sayed, and T. Kailath, "Linear estimation in Krein spaces. Parts I and II," IEEE Trans. Autom. Control, vol. 41, no. 1, pp. 18–49, Jan. 1996.
[8] M. Rupp and A. H. Sayed, "A time-domain feedback analysis of filtered-error adaptive gradient algorithms," IEEE Trans. Signal Process., vol. 44, no. 6, pp. 1428–1439, Jun. 1996.
[9] A. Mader, H. Puder, and G. Schmidt, "Step-size control for acoustic echo cancellation filters. An overview," Signal Process., vol. 80, no. 9, pp. 1697–1719, Sep. 2000.
[10] V. Myllylä and G. Schmidt, "Pseudo-optimal regularization for affine projection algorithms," in Proc. ICASSP-02, Orlando, FL, May 2002, pp. 1917–1920.
[11] K. Mayyas and T. Aboulnasr, "A fast weighted subband adaptive algorithm," in Proc. ICASSP-99, Phoenix, AZ, Mar. 1999, pp. 1249–1252.
[12] S. G. Sankaran and A. A. L. Beex, "Convergence behavior of affine projection algorithms," IEEE Trans. Signal Process., vol. 48, no. 4, pp. 1086–1096, Apr. 2000.
[13] ——, "Tracking analysis results for NLMS and APA," in Proc. ICASSP-02, Orlando, FL, May 2002, pp. 1105–1108.
[14] B. Hassibi, A. H. Sayed, and T. Kailath, "$H^\infty$ optimality of the LMS algorithm," IEEE Trans. Signal Process., vol. 44, no. 2, pp. 267–280, Feb. 1996.
[15] H. C. Shin and A. H. Sayed, "Mean-square performance of a family of affine projection algorithms," IEEE Trans. Signal Process., vol. 52, no. 1, pp. 90–102, Jan. 2004.
[16] S. C. Douglas and T. H. Y. Meng, "Normalized data nonlinearities for LMS adaptation," IEEE Trans. Signal Process., vol. 42, no. 6, pp. 1352–1365, Jun. 1994.
[17] S. Gay and S. Tavathia, "The fast affine projection algorithm," in Proc. ICASSP-95, Detroit, MI, 1995, pp. 3023–3026.
[18] H. C. Shin, A. H. Sayed, and W. J. Song, "Variable step-size NLMS and affine projection algorithms," IEEE Signal Process. Lett., vol. 11, no. 2, pp. 132–135, Feb. 2004.
[19] H. G. Rey, L. R. Vega, S. Tressens, and J. Benesty, "Optimum variable explicit regularized affine projection algorithm," in Proc. ICASSP-06, Toulouse, France, May 2006.
[20] I. Yamada, K. Slavakis, and K. Yamada, "An efficient robust adaptive filtering algorithm based on parallel subgradient projection techniques," IEEE Trans. Signal Process., vol. 50, no. 5, pp. 1091–1101, May 2002.
[21] S. Marcos and O. Macchi, "Tracking capability of the least mean square algorithm: Application to an asynchronous echo canceller," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-35, no. 11, pp. 1570–1578, Nov. 1987.
Hernán Rey (S'06) was born in Buenos Aires, Argentina, in 1978. He received the B.Eng. degree in electronic engineering from the University of Buenos Aires, Buenos Aires, Argentina, in 2002, where he is currently working towards the Ph.D. degree at the Institute of Biomedical Engineering. Since 2002, he has been a Research Assistant with the Department of Electronics at the University of Buenos Aires. His research interests include adaptive filter theory, neural networks, and computational neuroscience.
Leonardo Rey Vega was born in Buenos Aires, Argentina, in 1979. He received the B.Eng. degree in electronic engineering from the University of Buenos Aires, Buenos Aires, Argentina, in 2004, where he is currently working towards the Ph.D. degree at the Department of Electronics. Since 2004, he has been a Research Assistant with the Department of Electronics at the University of Buenos Aires. His research interests include adaptive filtering theory and statistical signal processing.
Sara Tressens was born in Buenos Aires, Argentina. She received the degree in electrical engineering from the University of Buenos Aires, Buenos Aires, Argentina, in 1967. From 1967 to 1982, she worked at the Institute of Biomedical Engineering at the University of Buenos Aires, where she became Assistant Professor in 1977 and worked in the areas of speech recognition, digital communication, and adaptive filtering. During 1980–1981, she worked in the Laboratoire des Signaux et Systèmes, Gif-sur-Yvette, France. From 1982 to 1993, she was with the National Research Center in Telecommunications (CNET), Issy-les-Moulineaux, France. Her research interests there were in the areas of spectral estimation and spatial array processing. Since 1994, she has been an Associate Professor at the University of Buenos Aires. Her primary research interests include adaptive filtering, communications, and spectral analysis. Mrs. Tressens received the 1990 Best Paper Award from the IEEE Signal Processing Society.
Jacob Benesty (M’98–SM’04) was born in 1963. He received the M.S. degree in microwaves from Pierre & Marie Curie University, France, in 1987 and the Ph.D. degree in control and signal processing from Orsay University, France, in April 1991. During the postgraduate study (from November 1989 to April 1991), he worked on adaptive filters and fast algorithms at the Centre National d’Etudes des Telecommunications (CNET), Paris, France. From January 1994 to July 1995, he worked at Telecom Paris University on multichannel adaptive filters and acoustic echo cancellation. From October 1995 to May 2003, he was first a Consultant and then a Member of the Technical Staff at Bell Laboratories, Murray Hill, NJ. In May 2003, he joined the University of Quebec, INRS-EMT, Montreal, Quebec, Canada, as an Associate Professor. He coauthored the books Acoustic MIMO Signal Processing (Boston, MA: Springer-Verlag, 2006) and Advances in Network and Acoustic Echo Cancellation (Berlin, Germany: Springer-Verlag, 2001). He is also a coeditor/coauthor of the books Speech Enhancement (Berlin, Germany: Springer-Verlag, 2005), Audio Signal Processing for Next Generation Multimedia Communication Systems (Boston, MA: Kluwer, 2004), Adaptive Signal Processing: Applications to Real-World Problems (Berlin, Germany: Springer-Verlag, 2003), and Acoustic Signal Processing for Telecommunication (Boston, MA: Kluwer, 2000). His research interests are in signal processing, acoustic signal processing, and multimedia communications. Dr. Benesty received the 2001 Best Paper Award from the IEEE Signal Processing Society. He was a member of the editorial board of the EURASIP Journal on Applied Signal Processing and was the Co-Chair of the 1999 International Workshop on Acoustic Echo and Noise Control.