IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 57, NO. 8, AUGUST 2009


Fast LMS/Newton Algorithms for Stereophonic Acoustic Echo Cancelation

Harsha I. K. Rao, Student Member, IEEE, and Behrouz Farhang-Boroujeny, Senior Member, IEEE

Abstract—This paper presents a new class of adaptive filtering algorithms to solve the stereophonic acoustic echo cancelation (AEC) problem in teleconferencing systems. While stereophonic AEC may be seen as a simple generalization of the well-known single-channel AEC, it is a fundamentally far more complex and challenging problem to solve. The main reason is the strong cross correlation that exists between the two input audio channels. In the past, nonlinearities have been introduced to reduce this correlation. However, nonlinearities bring with them additional harmonics that are undesirable. We propose an elegant linear technique to decorrelate the two-channel input signals and thus avoid the undesirable nonlinear distortions. We derive two low complexity adaptive algorithms based on the two-channel gradient lattice algorithm. The models assume the input sequences to the adaptive filters to be autoregressive (AR) processes whose orders are much lower than the lengths of the adaptive filters. This results in an algorithm whose complexity is only slightly higher than that of the normalized least-mean-square (NLMS) algorithm, the simplest adaptive filtering method. Simulation results show that the proposed algorithms perform favorably when compared with the state-of-the-art algorithms.

Index Terms—Adaptive filters, lattice orthogonalization, LMS/Newton, stereo acoustic echo cancellation (AEC).

I. INTRODUCTION

THE past few years have witnessed the use of multichannel audio in teleconferencing systems. In particular, stereophonic systems are desirable as they provide the listener with spatial information to help distinguish possibly simultaneous talkers [1]. Acoustic echo cancelers are a necessary component of such teleconferencing systems as they remove the undesired echoes that result from the coupling between the microphone and the loudspeakers [2]–[4]. This work proposes a new class of adaptive algorithms to solve the stereophonic acoustic echo cancelation (AEC) problem.

The setup of a typical stereophonic acoustic echo canceler as it exists in a teleconferencing system is shown in Fig. 1 [1], [5]. A transmission room is shown on the left, wherein two microphones are used to pick up the signals from a source via two acoustic channels characterized by the room impulse responses g_1(n) and g_2(n). The stereophonic signals x_1(n) and x_2(n) are transmitted to the loudspeakers in the receiving room. These loudspeakers are

Manuscript received October 17, 2008; accepted March 07, 2009. First published April 07, 2009; current version published July 15, 2009. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jonathon A. Chambers. The authors are with the Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT 84112 USA (e-mail: [email protected]; [email protected]). Digital Object Identifier 10.1109/TSP.2009.2020356

coupled to one of the microphones via the acoustic channels denoted by h_1(n) and h_2(n). A conventional acoustic echo canceler will try to model the acoustic paths in the receiving room using two finite-impulse response (FIR) adaptive filters, ĥ_1(n) and ĥ_2(n). If d(n) denotes the echo picked up by the microphone, then the two adaptive filters will use the input signals x_1(n) and x_2(n) to produce an estimate of d(n), represented by ŷ(n). The difference between d(n) and ŷ(n) should produce a residual echo signal e(n) that is close to zero. While this may seem like a straightforward extension of the single-channel AEC, we may note that the inputs x_1(n) and x_2(n) are derived from the same source and hence are highly correlated. The strong cross correlation between x_1(n) and x_2(n) can create problems in implementing adaptive algorithms.

The fundamental problem of stereophonic AEC was first addressed in [1] and more insight on this problem has been provided in [5]. An important result shown in [5] was that if L_f denotes the length of the far-end room echo paths, N is the length of the modeling adaptive filters, and L_n is the length of the near-end room echo path, then for N < L_f, there exists a unique solution. It was also shown that for N < L_n, a misalignment will exist for both the single-channel/monophonic case as well as the stereophonic system. However, the problem is much greater in the two-channel setup due to the strong cross-correlation effects. The use of nonlinearities to decorrelate the input signals was first proposed in [5] and further investigated in [6].

While the various versions of the recursive least-squares (RLS) algorithm [7], [8] can provide excellent echo cancellation, the computational requirements for their implementation prevent them from being practical algorithms. Moreover, the strong correlation among the signals in the two channels results in a highly ill-conditioned covariance matrix, and this in turn makes RLS algorithms sensitive to numerical errors.
A leaky extended least-mean-square (XLMS) algorithm was proposed in [9] that aims to reduce the interchannel correlation without the addition of nonlinearities; hence, the quality and perception of the speech signals remain unaffected. The leaky XLMS algorithm was shown to perform satisfactorily at roughly twice the complexity of a conventional least-mean-square (LMS) algorithm, the cost growing linearly with the length N of each adaptive filter. Other algorithms that have been proposed for stereophonic AEC are (i) the multichannel affine projection (AP) algorithm [10]; and (ii) exclusive maximum (XM) selective adaptation of filter taps [11]–[13]. While the XM selective-tap adaptation can be applied to the normalized least-mean-square (NLMS), AP and RLS algorithms with and without nonlinear processing, it was noticed that a combination of nonlinearities along with the selective-tap adaptation is more effective [11].

1053-587X/$25.00 © 2009 IEEE


Fig. 1. Setup of a stereophonic AEC system.

It is widely known that the backward prediction-error components of a lattice predictor are orthogonal [14]. This particular characteristic can be incorporated efficiently into a joint estimation setup to improve the convergence rate of the adaptive LMS algorithm [14], [15]. The basic cell of a two-channel lattice predictor has been described in [16], and this idea was used to derive algorithms for stereo echo cancellation using lattice orthogonalization and adaptive structures [17]. However, the lattice predictor of [17] has a complexity that grows with the filter length N at a rate far exceeding that of the NLMS algorithm, which is prohibitive for large values of N.

This paper develops a stereophonic extension of the fast LMS/Newton algorithm of [18]. This algorithm makes use of the orthogonalization property of the two-channel lattice predictor. We note that because of significant differences between the single-channel and multichannel lattice structures and their detailed properties, this extension is not straightforward. In particular, in a single-channel lattice structure, the close relationship between the forward and backward predictors greatly simplifies the development of the LMS/Newton algorithm of [18]. While the same simplifications do not exist in a multichannel setup [16], [19], it is still possible to use the properties of the two-channel lattice predictor to derive a two-channel version of the LMS/Newton algorithm. Autoregressive (AR) modeling for the purpose of linear predictive coding in speech processing has been found to be quite effective in the past. Usually, a model order in the range of 5 to 10 is more than sufficient to code speech signals [15]. This provided the rationale to model the input signals x_1(n) and x_2(n) as AR processes of order M, with M much smaller than the adaptive filter length N. Consequently, only a few stages of the lattice predictor are sufficient to decorrelate the signals in the two channels of the acoustic echo canceler.
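The adequacy of a low AR order can be illustrated with a short numerical sketch. The snippet below is our own hedged illustration, not code from the paper; the AR(2) signal model, the sample count, and the chosen order 8 are arbitrary. It fits a predictor with the standard single-channel Levinson–Durbin recursion and shows that a low order already drives the prediction-error power down to near the unit driving-noise floor.

```python
import numpy as np

def levinson_durbin(r, order):
    """Classic single-channel Levinson-Durbin recursion.
    r: autocorrelation lags r[0..order]. Returns (a, e) with
    a the prediction-error filter (a[0] = 1) and e the final
    prediction-error power."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    for m in range(1, order + 1):
        # Reflection (PARCOR) coefficient of stage m.
        k = -(r[m] + a[1:m] @ r[1:m][::-1]) / e
        prev = a[1:m].copy()
        a[1:m] = prev + k * prev[::-1]
        a[m] = k
        e *= (1.0 - k * k)
    return a, e

rng = np.random.default_rng(0)
n = 20000
w = rng.standard_normal(n)            # unit-variance driving noise
x = np.zeros(n)
for i in range(2, n):                 # AR(2): x(n) = 0.75 x(n-1) - 0.5 x(n-2) + w(n)
    x[i] = 0.75 * x[i - 1] - 0.5 * x[i - 2] + w[i]

# Biased sample autocorrelation estimates up to lag 8.
r = np.array([x[: n - k] @ x[k:] / n for k in range(9)])
a, e8 = levinson_durbin(r, 8)
print(f"input power {r[0]:.2f} -> order-8 prediction-error power {e8:.2f}")
```

For this model the input power is 16/9 ≈ 1.78 while the order-8 prediction error is close to 1, i.e., the low-order predictor has essentially whitened the signal; speech behaves similarly for orders in the 5–10 range cited above.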
The computational complexity reduces drastically and simulation results show that the two versions of the algorithm that we propose perform favorably when compared with the other existing solutions for stereophonic AEC. The use of this simple linear technique aims to achieve the decorrelation without the use of any nonlinearities, thereby fully preserving the stereophonic quality of the speech signals. The rest of this paper is organized as follows. In Section II, we briefly describe the two-channel lattice predictor. The

derivation of the two-channel gradient lattice adaptive algorithm is provided in Section III. The two versions of our LMS/Newton algorithms for the stereophonic setup will be presented in Section IV. The simulation results are discussed in Section V. Conclusions are drawn in Section VI. In what follows, we have denoted vectors and matrices using bold-faced lower-case and upper-case characters, respectively, and vectors are always in column form. The superscript T denotes vector or matrix transpose.

II. TWO-CHANNEL LATTICE PREDICTOR

The characteristics of a lattice predictor make it an attractive proposition in adaptive filtering. It forms an integral part of a class of LMS/Newton algorithms that have been derived for the single-channel case in [18]. To facilitate the extension of this class of algorithms to stereophonic AEC, in this section, we examine the properties of the two-channel lattice predictor. A gradient-type lattice predictor algorithm has been derived for the purpose of stereophonic AEC in [17]. The two-channel lattice cell used in this algorithm was first derived in the least-squares context for multichannel adaptive filtering in [16]. The structure of a basic cell of the two-channel lattice predictor is shown in Fig. 2. From the figure, we note that the equations for updating the forward and backward prediction-errors of the m-th cell can be written as

    f_m(n) = f_{m-1}(n) - K_m^T(n) b_{m-1}(n-1)    (1a)
    b_m(n) = b_{m-1}(n-1) - K_m(n) f_{m-1}(n)      (1b)

where b_m(n) = [b_{1,m}(n)  b_{2,m}(n)]^T and f_m(n) = [f_{1,m}(n)  f_{2,m}(n)]^T are the 2 × 1 backward and forward prediction-error vectors, respectively, and the 2 × 2 reflection coefficient matrix is given by K_m(n). The initialization of the lattice predictor is done as

    f_0(n) = x(n) = [x_1(n)  x_2(n)]^T    (2a)
    b_0(n) = x(n).                        (2b)
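A single cell of this recursion can be written compactly. The sketch below is our illustration with hypothetical variable names; the transpose placement follows the convention adopted in (1a) and (1b) above, and the numeric values are arbitrary.

```python
import numpy as np

def lattice_cell(f_prev, b_prev_delayed, K):
    """One two-channel lattice cell, following (1a)-(1b):
    f_prev         : 2x1 forward error of stage m-1 at time n
    b_prev_delayed : 2x1 backward error of stage m-1 at time n-1
    K              : 2x2 reflection coefficient matrix of stage m
    Returns the stage-m forward and backward errors at time n."""
    f = f_prev - K.T @ b_prev_delayed
    b = b_prev_delayed - K @ f_prev
    return f, b

# Initialization (2a)-(2b): the stage-0 errors equal the stereo input sample.
x = np.array([0.3, -1.2])
f0 = x
b0_delayed = np.array([0.5, 0.1])   # b_0(n-1), kept from the previous sample
K1 = np.zeros((2, 2))               # before adaptation, K_1 = 0
f1, b1 = lattice_cell(f0, b0_delayed, K1)
```

With K_1 = 0 the cell simply passes its inputs through, which is the expected behavior before the reflection coefficients have adapted.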


Fig. 2. Two-channel lattice cell.

A simple gradient adaptive algorithm can be used to compute K_m(n) in a recursive fashion. The reflection coefficients of the m-th cell can be chosen so as to minimize the instantaneous backward and forward prediction-errors of the corresponding cell. This leads to the following [17]

    K_m(n+1) = K_m(n) + (μ / (P_{m-1}(n) + δ)) [b_m(n) f_{m-1}^T(n) + b_{m-1}(n-1) f_m^T(n)]    (3)

where μ is the adaptation step-size parameter and δ is a constant added to prevent gradient noise amplification when any of the power terms P_{m-1}(n) are small. The powers are estimated using the following recursive equation

    P_{m-1}(n) = β P_{m-1}(n-1) + (1 - β) [‖f_{m-1}(n)‖² + ‖b_{m-1}(n-1)‖²]    (4)

where β is a constant close to but smaller than one.

It has been noted in [17] that upon convergence of the reflection coefficients, the above algorithm achieves complete orthogonalization of the two-channel backward prediction-errors, i.e., it has been assumed that E[b(n) b^T(n)] = Λ, where b(n) = [b_0^T(n) b_1^T(n) ⋯ b_{N-1}^T(n)]^T and Λ is a diagonal matrix. While this may hold good in certain scenarios, such as the case when the two input signals are uncorrelated, it is not true in general. A more rigorous examination reveals that for a stationary two-channel stereo input vector set {x(n)}, the backward prediction-error vectors obtained using a lattice filter form an orthogonal set, i.e., b_m(n) is uncorrelated with b_l(n) for m ≠ l. In other words, E[b_m(n) b_l^T(n)] = Λ_m δ_{ml}, where δ_{ml} is the Kronecker delta that takes the value of 0 for m ≠ l and 1 for m = l. However, the 2 × 2-element autocorrelation matrix Λ_m = E[b_m(n) b_m^T(n)] may be nondiagonal. This implies that Λ is a block diagonal matrix [20].

Simulations are performed to examine the contribution of the off-diagonal elements of Λ. Fig. 3 shows a plot of the magnitude of the elements of Λ. Results are obtained by averaging over 10 independent runs and the length of each adaptive filter is chosen to be 32 taps. The inputs to the adaptive filters x_1(n) and x_2(n) are generated by filtering a zero-mean, unit variance Gaussian sequence through two independent far-end room echo paths. The room echo paths are chosen to be independent, zero-mean Gaussian sequences, each having a variance that decays with the sample number n. This experiment clearly shows that the magnitude of the first off-diagonal elements in each Λ_m is around 50% of the magnitude of the main diagonal elements, and hence their contribution cannot be ignored. As expected, the magnitudes of the remaining off-diagonal elements are much smaller and tend towards zero.

Fig. 3. Plot of the covariance matrix Λ = E[b(n) b^T(n)].
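The block diagonal structure of Λ can also be checked numerically without running an adaptive lattice at all: a unit lower block triangular transformation that maps x(n) to b(n) follows from a block Cholesky factorization of the input covariance. The sketch below is our own illustration (the two-channel signal model and the stacking depth are arbitrary choices); it confirms that the resulting E[b(n) b^T(n)] is block diagonal while the 2 × 2 stage blocks remain nondiagonal.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40000
s = rng.standard_normal(n + 1)
# Two strongly cross-correlated channels derived from a common source.
x1 = s[1:]
x2 = 0.7 * s[1:] + 0.3 * s[:-1] + 0.1 * rng.standard_normal(n)

# Stacked input vectors x(n) = [x1(n) x2(n) x1(n-1) x2(n-1) ...]^T, 4 blocks.
N_blk = 4
rows = []
for k in range(N_blk):
    rows += [x1[N_blk - 1 - k : n - k], x2[N_blk - 1 - k : n - k]]
X = np.vstack(rows)                       # shape (2*N_blk, time)
R = X @ X.T / X.shape[1]                  # sample covariance of stacked input

# Unit lower block triangular T with T R T^T block diagonal:
# from R = C C^T (Cholesky), take T = blockdiag(C) C^{-1}.
C = np.linalg.cholesky(R)
Db = np.zeros_like(C)
for j in range(N_blk):
    Db[2 * j : 2 * j + 2, 2 * j : 2 * j + 2] = C[2 * j : 2 * j + 2, 2 * j : 2 * j + 2]
T = Db @ np.linalg.inv(C)
Lam = T @ R @ T.T                         # estimate of E[b b^T]

off_block = Lam.copy()
for j in range(N_blk):
    off_block[2 * j : 2 * j + 2, 2 * j : 2 * j + 2] = 0.0
print("largest cross-stage entry:", np.abs(off_block).max())
print("stage-0 block:\n", Lam[:2, :2])
```

By construction the cross-stage blocks vanish to machine precision, while the stage-0 block retains the strong interchannel correlation (off-diagonal entries comparable in magnitude to the diagonal), which mirrors the roughly 50% ratio observed above.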



III. TWO-CHANNEL GRADIENT LATTICE ADAPTIVE ALGORITHM

A transversal filter used to estimate the echo d(n) picked up by the microphone from the input sequences x_1(n) and x_2(n) can be implemented using the two-channel lattice structure. The output ŷ(n) of the transversal filters shown in Fig. 1 can be obtained as a linear combination of the backward prediction-error vector b(n). The lattice predictor is used to transform the input signals to the backward prediction-errors. The linear combiner uses these backward prediction-errors to produce an estimate of the echo. This is referred to as the lattice joint process estimator [15], [17]. In this section, a step-normalized LMS algorithm for the adaptive adjustment of the linear combiner part of a two-channel lattice joint process estimator is developed. The linear combiner coefficients represented by w(n) are updated using the following adaptive equation [15], [21]

    w(n+1) = w(n) + 2μ Λ̂^{-1}(n) b(n) e(n)    (5)

where μ is the adaptation step-size parameter and

    b(n) = [b_0^T(n)  b_1^T(n)  ⋯  b_{N-1}^T(n)]^T

is a vector containing the backward prediction-errors of both the channels. The error signal e(n) = d(n) - w^T(n) b(n), where d(n) is the desired signal as shown in Fig. 1.

In [17], Λ = E[b(n) b^T(n)] was assumed to be a diagonal matrix. Accordingly, the step-normalization matrix Λ̂(n) was chosen to be a diagonal matrix with the diagonal elements consisting of the powers P_{i,m}(n), where i = 1, 2 denotes the channel of the backward prediction-errors. The diagonal elements of Λ̂(n) are used to normalize the power of the elements along the diagonal of E[b(n) b^T(n)]. The powers of the backward prediction-errors are computed recursively as

    P_{i,m}(n) = β P_{i,m}(n-1) + (1 - β) b_{i,m}²(n),   i = 1, 2.    (6)

In this paper, we correct the above normalization process to account for the block diagonal structure of Λ. We choose the step-normalization matrix Λ̂(n) to be a block diagonal matrix with the block diagonal elements consisting of the 2 × 2 matrices

    Λ̂_m(n) = | P_{1,m}(n)    P_{12,m}(n) |
              | P_{12,m}(n)   P_{2,m}(n)  |    (7)

where P_{1,m}(n) and P_{2,m}(n) are computed using (6) and P_{12,m}(n) is computed recursively as

    P_{12,m}(n) = β P_{12,m}(n-1) + (1 - β) b_{1,m}(n) b_{2,m}(n).    (8)

We shall now briefly analyze the behavior of the mean values of the linear combiner coefficients. If w_o denotes the optimum linear combiner coefficients in a joint estimation setup, then we have [15]

    w_o = Λ^{-1} E[b(n) d(n)].    (9)

If we define a coefficient error vector as v(n) = w(n) - w_o, we note that

    e(n) = e_o(n) - v^T(n) b(n)    (10)

where

    e_o(n) = d(n) - w_o^T b(n).    (11)

According to the principle of orthogonality, we have E[b(n) e_o(n)] = 0 [15]. Then it follows from (5), (10) and the independence assumption commonly used in the adaptive filtering literature [14], [15] that

    E[v(n+1)] = (I - 2μ Λ̂^{-1} Λ) E[v(n)].    (12)

From (12), we note that the convergence behavior of the two-channel gradient lattice adaptive algorithm is controlled by the eigenvalues of the matrix I - 2μ Λ̂^{-1} Λ. Each eigenvalue will determine a particular mode of convergence in the direction defined by its associated eigenvector [15]. Thus, an appropriate choice of the normalization matrix will result in Λ̂^{-1} Λ ≈ I. Consequently, the joint process estimator will be controlled by just one mode of convergence.

In Fig. 4, we present pictorial representations of the normalized covariance matrix Λ̂^{-1} Λ for the case when (a) Λ̂ is the diagonal matrix as defined in [17]; (b) Λ̂ is the block diagonal matrix constructed using (7); and Λ is the covariance matrix as shown in Fig. 3. The inputs to the adaptive filters x_1(n) and x_2(n) are the same as the ones described in Section II while simulating Fig. 3. These plots clearly indicate that only the covariance matrix normalized using the block diagonal matrix Λ̂ will equalize the eigenvalues and hence result in equal modes of convergence. This confirms the validity of our argument that normalization should be performed using the block diagonal matrix.

A. Misadjustment of Lattice Joint Process Estimator

It is important to note that in a two-channel lattice joint process estimator, the reflection coefficients and the linear combiner coefficients are being updated simultaneously. Consequently, any change in the reflection coefficients will require readjustment of the linear combiner coefficients, and this will lead to a significant increase in the steady-state mean-square error (MSE). This particular characteristic of the lattice joint process estimator is discussed in detail for a single-channel setup in [15]. It is demonstrated in [15] that the adaptation of the reflection coefficients has to be stopped after some initial convergence to achieve a low steady-state mean-square error.
However, in the case of speech inputs, the optimum reflection coefficients are time-varying since they have to track the time-varying statistics of the inputs. Hence, this requires continuous adaptation of the reflection coefficients as well as



Fig. 4. Normalized covariance matrix Λ̂^{-1} Λ with (a) the wrong assumption that Λ is a diagonal matrix, and (b) the correct assumption that Λ is a block diagonal matrix.

the linear combiner coefficients, which will result in a further increase in the MSE.

We will demonstrate the above phenomenon by a simulation example. We will also use this example to compare the performance of the two-channel gradient lattice adaptive algorithm normalized using the diagonal matrix of [17] with the same algorithm normalized using the block diagonal matrix as defined in (7). Fig. 5 presents a pair of learning curves for the modeling problem using the same set of inputs, far-end room and near-end room conditions as described previously. Uncorrelated noise is added to d(n) such that a signal-to-noise ratio (SNR) of 40 dB is achieved in our simulations. To demonstrate the effect of the perturbations of the reflection coefficients K_m(n), we stopped adapting them from iteration 30 000 onward. It is apparent that the continuous adaptation of the reflection coefficients has a significant impact on the MSE. Once the adjustment of the reflection coefficients is stopped, the algorithm converges quickly to the noise floor level. While both pairs of learning curves eventually converge to the same steady-state MSE, we can clearly see that the slow modes of convergence are absent (or, at least, less dominant) when the block diagonal matrix is used for normalization instead of the diagonal matrix. While this simulation example provides further validation of our earlier observations, the continuous perturbation of the reflection coefficients will limit the applicability of the gradient lattice adaptive algorithm. Furthermore, the high computational complexity of the two-channel lattice joint process estimator makes its implementation impractical for very long filter lengths. In the rest of this paper, we will derive and study two low complexity adaptive algorithms that make use of the properties of the two-channel gradient lattice algorithm and at the same time are insensitive to the perturbations of the reflection coefficients.

Fig. 5. Comparison of the MSE. (Solid: the wrong assumption that Λ is a diagonal matrix; dashed: the correct assumption that Λ is a block diagonal matrix.)

IV. LMS/NEWTON ALGORITHMS BASED ON AR MODELING

Two versions of the LMS/Newton algorithm based on autoregressive modeling were proposed for the single-channel case in [18]. These were based on the fact that the input sequence (a speech signal) to the adaptive filter can be modeled as an AR process of order M, where M can be much smaller than the filter length N. This results in an efficient way of updating R̂^{-1}(n) x(n) without having to estimate R̂^{-1}(n), where R̂(n) is an estimate of the input correlation matrix R = E[x(n) x^T(n)] and, here, x(n) is the mono-channel filter input vector. In this section, we derive the LMS/Newton algorithms for the stereophonic setup. For a stereophonic system, we note that the backward prediction-error vector b(n) can be expressed as

    b(n) = T x(n)    (13)

where x(n) = [x_1(n) x_2(n) x_1(n-1) x_2(n-1) ⋯ x_1(n-N+1) x_2(n-N+1)]^T is the 2N × 1 filter input vector and T is the 2N × 2N transformation matrix [14] and has the form

    T = | I                                              |
        | -G_{1,1}       I                               |
        | -G_{2,2}      -G_{2,1}      I                  |    (14)
        |    ⋮              ⋮                 ⋱          |
        | -G_{N-1,N-1}  -G_{N-1,N-2}  ⋯  -G_{N-1,1}  I  |

where I is a 2 × 2 identity matrix, the blank entries are 2 × 2 zero matrices, and each G_{m,i} is a 2 × 2 backward predictor coefficient matrix. Equation (13) is also known as the Gram–Schmidt orthogonalization algorithm [14]. This algorithm provides a one-to-one correspondence between the input vector x(n) and the backward prediction-error vector b(n). From (13), it follows that

    R^{-1} = T^T Λ^{-1} T    (15)

where Λ = E[b(n) b^T(n)]. On the other hand, the update equation for the ideal LMS/Newton algorithm is [14], [15]

    w(n+1) = w(n) + 2μ R̂^{-1}(n) x(n) e(n)    (16)

where e(n) = d(n) - w^T(n) x(n). It is important to note that any perturbation in the reflection coefficients is only present in Λ̂^{-1}(n), which is incorporated in the update equation as a part of the step-size. The other terms in the update part of (16), x(n) and e(n), are independent of any perturbations. Consequently, based on the assumption that μ is very small, the LMS/Newton algorithm remains robust despite the continuous adaptation of the reflection coefficients. However, in (5), the linear combiner coefficients are updated using the backward prediction-error vector b(n) and the error signal e(n), which is also computed using b(n). The time-varying nature of the reflection coefficients will result in time-varying backward prediction-errors and thus adversely affect the MSE performance. Using (13) and (15), (16) can be written as

    w(n+1) = w(n) + 2μ u(n) e(n)    (17)

where

    u(n) = T^T Λ̂^{-1}(n) T x(n) = T^T Λ̂^{-1}(n) b(n).    (18)

The significance of (17) is that the computation of u(n) according to (18) can be performed at a low complexity. In the sequel, we derive two implementations of the LMS/Newton algorithms based on (17) and (18). Henceforth, we will refer to them as Algorithm 1 and Algorithm 2.

A. Algorithm 1

This algorithm will involve the direct implementation of (18) through the use of a lattice predictor. Since we are assuming the input sequences to be AR processes of order M, a lattice predictor of order M is sufficient. The matrix T then takes the banded form

    T = | I                                        |
        | -G_{1,1}    I                            |
        |    ⋮             ⋱                       |
        | -G_{M,M}    ⋯    -G_{M,1}   I            |    (19)
        |      -G_{M,M}    ⋯    -G_{M,1}   I       |
        |              ⋱                ⋱          |
        |           -G_{M,M}   ⋯   -G_{M,1}   I    |

in which every block row beyond the (M+1)-th is a shifted copy of the order-M backward predictor coefficients, and the vector b(n) takes the form of
    b(n) = [b_0^T(n)  b_1^T(n)  ⋯  b_{M-1}^T(n)  b_M^T(n)  b_M^T(n-1)  ⋯  b_M^T(n-N+M+1)]^T.    (20)

The special structure of b(n) in (20) requires us to update only the first 2(M+1) elements of b(n). The remaining elements are just the delayed versions of b_M(n).

We first consider the multiplication of b(n) by Λ̂^{-1}(n). It involves the estimation of the powers of b_0(n) through b_M(n). The powers of these backward prediction-error vectors are computed using (6) and (8). Unlike the single-channel implementation in [18], wherein the normalization matrix is diagonal, we now have to consider the block diagonal structure of Λ̂(n). Hence, the 2 × 2 normalization matrices Λ̂_m(n) are constructed as described in (7). We note that in a typical acoustic echo canceler, M can be chosen to be as small as 8, and inverting matrices of size 2 × 2 constitutes only a small percentage of the acoustic echo canceler complexity.

To complete the computation of u(n), we now have to multiply the vector Λ̂^{-1}(n) b(n) by T^T (according to (18)). A close examination of the structures of the matrix T and the vector b(n), described in (19) and (20), respectively, will reveal that in order to compute u(n), only the first 2(M+1) and the last 2M elements of u(n) need to be computed. The remaining elements of u(n) are the delayed versions of its (2M+1)-th and (2M+2)-nd elements. The elements of T can be estimated using the two-channel Levinson–Durbin algorithm [20], and we note that the coefficients of the prediction filters of order 1 to M need to be computed. Accordingly, we formulate Algorithm 1 as shown.

1) Run the lattice predictor of order M using (1)–(4) and (6)–(8) to obtain the reflection coefficients and the backward prediction-errors.
2) Run the two-channel Levinson–Durbin algorithm to convert the reflection coefficients to the backward predictor coefficients G_{m,i}. As the derivation of the two-channel version of the Levinson–Durbin algorithm

is not commonly found in the literature, we have provided a derivation of it in Appendix A.

The order-update equations (21) of this step express the order-m forward and backward predictor coefficient matrices A_{m,i} and G_{m,i} in terms of the order-(m-1) coefficients and the reflection coefficient matrix K_m(n); each A_{m,i} is a 2 × 2 forward predictor coefficient matrix.¹

3) Compute the elements of u(n) that are the delayed versions of its (2M+1)-th and (2M+2)-nd elements as follows

    u_{2m+1}(n) = u_{2M+1}(n - m + M)    (22a)
    u_{2m+2}(n) = u_{2M+2}(n - m + M),   m = M+1, …, N-M-1.    (22b)

Note that [u_{2m+1}(n)  u_{2m+2}(n)]^T will be the m-th 2 × 1 vector component of u(n).

4) Compute the first 2(M+1) elements of u(n). If b̄(n) denotes the first 2(M+1) elements of the normalized backward prediction-error vector such that

    b̄(n) = diag{Λ̂_0^{-1}(n), …, Λ̂_M^{-1}(n)} [b_0^T(n) ⋯ b_M^T(n)]^T    (23)

where each Λ̂_m(n) is as defined in (7), and T̄ denotes the top-left part of T having dimension 2(M+1) × 2(M+1), then the first 2(M+1) elements of u(n) are

    ū(n) = T̄^T b̄(n).    (24)

5) Similarly, compute the last 2M elements of u(n). Let b̃(n) denote the last 2M elements of the normalized backward prediction-error vector such that

    b̃(n) = diag{Λ̂_M^{-1}(n), …, Λ̂_M^{-1}(n)} [b_M^T(n-N+2M) ⋯ b_M^T(n-N+M+1)]^T    (25)

and T̃ is the bottom-right part of T having dimension 2M × 2M; then the last 2M elements of u(n) are

    ũ(n) = T̃^T b̃(n).    (26)

6) Finally, compute the adaptive filter output ŷ(n) = w^T(n) x(n), the error signal e(n) = d(n) - ŷ(n), and update the filter taps using (17).

To implement the lattice predictor using (1)–(4) and (6)–(8), we require a number of multiplications proportional to M. The Levinson–Durbin algorithm of (21) requires on the order of M² multiplications, and a further number of multiplications on the order of M² is needed to update u(n) using (24) and (26). Finally, a number of multiplications proportional to N is required to compute the output and adaptively update the transversal filter coefficients. Hence, the fast LMS/Newton Algorithm 1 requires a total number of multiplications that is linear in N plus terms of order M². The number of the required additions is about the same. Typically, M can take a value of 8 and the adaptive filter length N may be 1500, for a medium size office room. With these numbers, each update of u(n) would make up only 17% of the total computational complexity of the acoustic echo canceler.

B. Algorithm 2

The two-channel LMS/Newton Algorithm 1 is structurally complicated despite having reasonably low computational complexity. Manipulating the data is not all that straightforward, and hence it is more suitable for implementation in software. We now propose an alternate algorithm that is computationally less complex and can be easily implemented in hardware. If we look at the matrix T given in (19), we observe that only the first M+1 block rows of this matrix are uniquely represented. The remaining block rows are just the delayed versions of the (M+1)-th block row, i.e., of the (2M+1)-th and (2M+2)-nd rows. Now, if we are able to remove the first 2M rows of T out of our computation of u(n), then we shall be able to simplify Algorithm 1. This leads us to the development of the fast LMS/Newton Algorithm 2. This particular version of the LMS/Newton algorithm can be developed by extending the input and tap-weight vectors x(n) and w(n) to the following vectors

¹It is important to note that in the single-channel lattice, the forward and backward predictor coefficients are related according to the equation a_{m,i} = g_{m,m-i} [15]. However, such a relationship does not hold in a two-channel lattice. Thus, some simplifications that are applicable to single-channel lattice equations are inapplicable to the two-channel case. Consequently, direct mimicking of the results of [18] is not possible here and, thus, we provide a fresh derivation of Algorithms 1 and 2, independent of [18].

    x_e(n) = [x_1(n)  x_2(n)  ⋯  x_1(n-N-2M+1)  x_2(n-N-2M+1)]^T    (27)

and

    w_e(n) = [0  ⋯  0  w^T(n)  0  ⋯  0]^T    (28)

respectively, and applying (16) to update the extended tap-weight vector w_e(n). We also need to appropriately take care of the dimensions of T and Λ̂(n). Since we are only interested in the tap-weights corresponding to w(n), the first 2M and the last 2M elements of the extended tap-weight vector w_e(n) can be permanently set to zero by assigning a zero step-size parameter to all of them. This will also remove the computation of the first 2M and the last 2M elements of the extended vector u_e(n). Hence, the recursive equation is now modified to

    w(n+1) = w(n) + 2μ u(n) e(n)    (29)

where

    u(n) = F^T Λ̂^{-1}(n) B x_e(n),    (30)

B is a banded matrix built from the order-M backward predictor coefficient matrices and defined as

    B = | -G_{M,M}  ⋯  -G_{M,1}  I                       |
        |      -G_{M,M}  ⋯  -G_{M,1}  I                  |    (31)
        |               ⋱                ⋱               |
        |                  -G_{M,M}  ⋯  -G_{M,1}  I      |

and F is a banded matrix built from the order-M forward predictor coefficient matrices and defined as

    F = | I  -A_{M,1}  ⋯  -A_{M,M}                       |
        |      I  -A_{M,1}  ⋯  -A_{M,M}                  |    (32)
        |           ⋱                ⋱                   |
        |              I  -A_{M,1}  ⋯  -A_{M,M}          |

Upon examining (30), we can see that it is only necessary to update the first 2 × 1 element vector of B x_e(n) and then the first 2 × 1 element vector of the final result, u(n). The remaining elements will be the delayed versions of these first two elements. Recall that the order-M forward and backward prediction-errors are given as

    f_M(n) = x(n) - Σ_{i=1}^{M} A_{M,i} x(n-i)    (33)

and

    b_M(n) = x(n-M) - Σ_{i=1}^{M} G_{M,i} x(n-M+i)    (34)

respectively. It is well-known that in a single-channel lattice the backward predictor coefficients are the forward predictor coefficients in reverse order, and this relationship was used to derive the single-channel LMS/Newton algorithm in [18]. In a two-channel lattice, the corresponding transposed relationship holds exactly only for the last coefficient [refer to (A6a) and (A8a) in Appendix A]. But based on the perspective gained from extensive experimentation, we observed that this relationship also approximately holds true for the remaining coefficients. Hence, we introduce the approximate relationship between the forward and backward predictor coefficients as

    A_{M,i} ≅ G_{M,i}^T,   i = 1, …, M.    (35)

The main motivation behind introducing this approximation is to use the transposed backward predictor coefficients in reverse order to estimate the forward prediction-errors. Consequently, we can rewrite (33) as

    f_M(n) ≅ x(n) - Σ_{i=1}^{M} G_{M,i}^T x(n-i).    (36)

From (31) and (34), we recognize that the filtering of the input vector x_e(n) through a backward prediction-error filter is equivalent to evaluating B x_e(n). The backward prediction-error vector b_M(n) is normalized with Λ̂_M^{-1}(n), where Λ̂_M(n) is the 2 × 2 matrix constructed according to (7). This will give us an update of Λ̂^{-1}(n) B x_e(n). We then use the normalized backward prediction-error vector as an input to a filter whose coefficients are the transposed duplicates of the backward prediction-error filter in reverse order. We recognize from (32) and (36) that this filter turns out to be the forward prediction-error filter, assuming that (35) holds. As a result, the output of the forward prediction-error filter will provide us with the samples of the vector u(n). Thus, we can see that the approximation introduced in (35) facilitated the development of an algorithm that can be efficiently implemented on hardware. At the same time, this algorithm was shown to satisfactorily exhibit the fast converging characteristics of the LMS/Newton algorithm over a wide range of experiments.
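The filtering interpretation above can be sanity-checked in a few lines. The sketch below is our own construction (all names and sizes are hypothetical, the approximation (35) is forced to hold exactly by choosing the forward coefficients as the transposed backward ones, and the AR order is fixed at M = 1): it computes u(n) once directly as T^T Λ̂^{-1} T x(n) and once as a backward prediction-error filter, a 2 × 2 block normalization, and a forward prediction-error filter run anticausally.

```python
import numpy as np

rng = np.random.default_rng(2)
Nb = 6                                   # number of 2x1 blocks in x(n)
G = np.array([[0.6, 0.2], [-0.1, 0.4]])  # order-1 backward predictor matrix
x = rng.standard_normal(2 * Nb)          # stacked input [x(n); x(n-1); ...]

# Build T of (19) for M = 1: identity diagonal, -G on the block subdiagonal.
T = np.eye(2 * Nb)
for m in range(1, Nb):
    T[2 * m : 2 * m + 2, 2 * (m - 1) : 2 * m] = -G

# Block diagonal normalization matrix with SPD 2x2 blocks (cf. (7)).
Lam_blocks = []
for _ in range(Nb):
    A = rng.standard_normal((2, 2))
    Lam_blocks.append(A @ A.T + 2 * np.eye(2))
Lam_inv = np.zeros((2 * Nb, 2 * Nb))
for m, L in enumerate(Lam_blocks):
    Lam_inv[2 * m : 2 * m + 2, 2 * m : 2 * m + 2] = np.linalg.inv(L)

u_direct = T.T @ Lam_inv @ T @ x         # u(n) per (18)

# Same result as a three-stage pipeline (the Algorithm 2 view):
xb = x.reshape(Nb, 2)
b = xb.copy()
b[1:] -= xb[:-1] @ G.T                   # backward prediction-error filter
v = np.stack([np.linalg.inv(L) @ bm for L, bm in zip(Lam_blocks, b)])
u = v.copy()
u[:-1] -= v[1:] @ G                      # forward filter with A = G^T, anticausal
print(np.allclose(u.ravel(), u_direct))  # -> True
```

Because the forward coefficients are exactly the transposed backward ones here, the two computations agree to machine precision; with estimated coefficients, (35) makes the agreement approximate.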


In a practical implementation, we choose the original input to the predictor filter to be x(n) and not x_e(n). To account for this, the desired signal d(n) is delayed by M samples to be time aligned with ŷ(n). This will result in a delayed LMS algorithm whose performance is very close to its nondelayed version when μ is small. Moreover, since the power terms are assumed to be time invariant over the length of the prediction filters, the normalization block is moved to the output of the forward prediction-error filter. Accordingly, we formulate Algorithm 2 as shown.

1) Run the lattice predictor of order M using (1)–(4) and (6)–(8) to obtain the reflection coefficients and the backward prediction-error vector b_M(n).
2) Compute the elements of u(n) that are the delayed versions of its first two elements:

    u_{2m+1}(n) = u_1(n-m)    (37a)
    u_{2m+2}(n) = u_2(n-m),   m = 1, …, N-1.    (37b)

3) Run the lattice predictor of order M using (1) and (2) (the reflection coefficients have already been computed in Step 1) with the backward prediction-errors as the input to obtain the forward prediction-error vector.
4) Compute the first two elements of u(n) as the first two elements of the forward prediction-error vector premultiplied with the 2 × 2 normalization matrix Λ̂_M^{-1}(n).

This particular version of the LMS/Newton algorithm is computationally less intensive when compared to Algorithm 1. To implement the lattice predictor using (1)–(4) and (6)–(8), we require a number of multiplications proportional to M, and updating u(n) using the forward prediction-error filter requires a further number of multiplications proportional to M. If we include the adaptive transversal filter updates, Algorithm 2 requires a total number of multiplications that is linear in N plus terms linear in M. Thus, for the same typical values of M and N as before, updating u(n) constitutes only 4% of the total complexity of the acoustic echo canceler.

V. SIMULATION RESULTS

A. Experiments With a Stationary Signal

We shall now present the simulation results and compare the performance of the two versions of the LMS/Newton algorithm with the NLMS algorithm, the XM selective-tap NLMS implementation of [11] and the leaky XLMS algorithm of [9].
Though the computational complexity of the leaky XLMS algorithm is almost twice that of the NLMS algorithm [9], it does not add any signal distortion and hence provides a suitable benchmark for comparison. While Algorithm 1 is an exact implementation of the ideal two-channel LMS/Newton algorithm, an approximation was introduced in the form of (35) while deriving Algorithm 2. Hence, we will refer to them in our results as Exact Algorithm 1 and Approximate Algorithm 2, respectively. We also note that we have not added any form of nonlinearity to any of these algorithms. The room echo paths are independent, zero-mean Gaussian sequences, each having a variance that decays with the sample number. Several experiments confirmed that this particular model generates

responses that closely approximate the characteristics of a typical room echo path as depicted in [22]. The length of the modeling adaptive filters is set equal to 1024, and the length of the near-end room echo paths is selected to be the same. Upon extensive experimentation, it was observed that the algorithms exhibited satisfactory performance when the order of the AR model is chosen to be 8. The reference inputs to the adaptive filters are generated by filtering a zero-mean, unit-variance Gaussian sequence through the two far-end room echo paths, each having length equal to 2048. This particular value will satisfy the condition for the existence of a unique solution [5]. In all our simulations, the power terms in (4), (6), and (8) are initialized to one. For the XM-NLMS algorithm, the size of the tap-selection set (denoted by the corresponding parameter in [11]) is kept the same in all the experiments reported here. For the leaky XLMS algorithm, we chose the correlation coefficient to be 0.5 and set the leakage factor following [9]; the notations are the same as those used by the authors in [9]. Uncorrelated noise is added to the microphone signal such that an SNR of 40 dB is achieved, and the simulation results are obtained after averaging over 10 independent runs for each case.

Fig. 6(a) compares the MSE and Fig. 6(b) compares the normalized misalignment2 curves of the two versions of the LMS/Newton algorithm with the other algorithms. We can see that among all the implementations, our algorithms converge the fastest. We recognize that Algorithm 1 is an exact implementation of the ideal LMS/Newton algorithm, for which extensive theoretical analysis is available in [4], [14], and [15]. The ideal LMS/Newton algorithm does not suffer from any eigenvalue spread and has only one mode of convergence. Hence, it is particularly important to note that the Exact Algorithm 1, as expected, is governed by a single mode of convergence. Moreover, this also holds approximately in the case of the Approximate Algorithm 2.
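As a sketch of this experimental setup, the synthetic echo paths and the normalized misalignment metric of Fig. 6(b) can be generated and measured as follows. The decay constant of the variance is an assumed value for illustration; the paper does not state one.

```python
import numpy as np

def synth_echo_path(length=1024, decay=0.005, rng=None):
    """Zero-mean Gaussian echo path whose variance decays with the
    sample number (decay constant assumed for illustration)."""
    rng = np.random.default_rng() if rng is None else rng
    std = np.exp(-decay * np.arange(length))  # sample-wise std. deviation
    return std * rng.standard_normal(length)

def misalignment_db(h, w):
    """Normalized misalignment ||h - w||^2 / ||h||^2 in dB, where h is
    the true echo path and w the adaptive-filter estimate."""
    return 10.0 * np.log10(np.sum((h - w) ** 2) / np.sum(h ** 2))
```

For example, an estimate that has captured 90% of each tap, `misalignment_db(h, 0.9 * h)`, evaluates to -20 dB.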
We have also studied the adaptation of the algorithms when there is an abrupt change in the far-end room echo paths. The new room echo paths are also independent, zero-mean Gaussian sequences, but each of them has a variance that increases with the sample number. While this model may not describe a typical room echo path, the main intention was to observe the behavior of the algorithms in the event of a drastic change in the far-end room. Simulation results shown in Fig. 7 indicate that, when compared with the other implementations, the LMS/Newton algorithms exhibit a faster response to the echo path change. Moreover, their improvement in misalignment is more significant.

B. Experiments With Speech Signals

The algorithms are also tested to evaluate their effectiveness when speech signals are used as inputs. Training adaptive filters using speech is challenging because of the nonstationary nature of speech signals and their wide dynamic range of magnitudes. Fortunately, the presence of the normalization factor in front of the stochastic gradient vector in (16) leaves the step size independent of the power of the input signal.

2Normalized misalignment is computed as ||h(n) - w(n)||^2 / ||h(n)||^2, where h(n) is the true room echo path and w(n) is the estimated room echo path.
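The same kind of power normalization is what makes the baseline NLMS algorithm robust to the input's dynamic range. A minimal sketch of one NLMS iteration (function and parameter names are illustrative, not from the paper) is:

```python
import numpy as np

def nlms_update(w, x_vec, d, mu=0.5, delta=1e-6):
    """One NLMS iteration.  Dividing by the instantaneous input power
    makes the effective step size independent of the signal's scale."""
    e = d - w @ x_vec                                    # a priori error
    w_new = w + (mu / (x_vec @ x_vec + delta)) * e * x_vec
    return w_new, e
```

The regularization term `delta` guards against division by a vanishing input power during silent stretches, which matters for speech inputs.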



Fig. 6. (a) Comparison of the MSE. (b) Comparison of the misalignment. (Solid—Exact Algorithm 1, Dashed—Approx. Algorithm 2, Dotted—NLMS, Dash-dotted—XM-NLMS and Thick Solid—Leaky XLMS).

Fig. 7. (a) Comparison of the MSE. (b) Comparison of the misalignment. An abrupt change is made in the far-end room at iteration 150,000. (Solid—Exact Algorithm 1, Dashed—Approx. Algorithm 2, Dotted—NLMS, Dash-dotted—XM-NLMS and Thick Solid—Leaky XLMS).

Thus, the LMS/Newton algorithm resolves the problem of the dynamic range of the input process. Hence, we compare the performance of our algorithm with the NLMS algorithm, which is also known to be robust to the dynamic range of the input process. The NLMS algorithm is implemented using the XM tap-selection technique of [11]. In this particular set of experiments, the room echo paths are independent, zero-mean Gaussian sequences, each having a variance that decays with the sample number. We select the adaptive filter length to be 1024 taps. At a sampling rate of 8 kHz, this will enable us to model 128 ms of echo, which is reasonable for a medium-size office room. The length of the far-end room echo paths is set equal to 2048, and the length of the near-end room echo paths is selected to be 1024. Once again, it was observed that the algorithms exhibited satisfactory performance when the order of the AR model is chosen to

be 8. For the simulation results presented here, the power terms in (4), (6), and (8) are initialized to one. For the XM-NLMS algorithm, we once again chose the same size of the tap-selection set. A zero-mean Gaussian sequence, whose variance was set at 40 dB below the variance of the echo signal, is added to it. The simulation results are obtained after averaging over 10 independent runs. We compare the performances of the different algorithms when there is an abrupt change in the far-end room. The echo-return-loss enhancement (ERLE) [17], [18] is chosen as the metric for performance evaluation. The ERLE is defined as


ERLE = 10 log10 ( E[d^2(n)] / E[(e(n) - v(n))^2] )   (38)


Fig. 8. (a) Comparison of the ERLE. (b) Comparison of the misalignment. An abrupt change is made in the far-end room after 18.75 s. (Solid—Exact Algorithm 1, Dashed—Approx. Algorithm 2 and Dotted—XM-NLMS).

where d(n) is the echo picked up by the microphone, v(n) is the uncorrelated Gaussian noise added to d(n), and e(n) is the residual echo signal transmitted back to the far-end room. The measured ERLEs are based on averages of 1000 neighboring samples for each point of the plots. Figs. 8(a) and (b) compare the ERLE and the misalignment of the XM-NLMS algorithm with the LMS/Newton Algorithms 1 and 2, respectively. From these figures, it is clear that the results are consistent with those for the case when the input is white and highlight the superior performance of the proposed algorithms as compared with the XM-NLMS algorithm. Moreover, as expected, the Exact Algorithm 1 performs better than the Approximate Algorithm 2.
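The block-averaged ERLE of the kind plotted in Fig. 8(a) can be computed as follows. This sketch takes the echo and residual signals directly and averages over 1000-sample windows as described above; the function name and the simplification of working with the noise-free residual are assumptions of the example.

```python
import numpy as np

def erle_db(d, e, block=1000):
    """ERLE in dB per block: ratio of echo power to residual-echo power,
    averaged over `block` neighboring samples (cf. (38))."""
    nblk = min(len(d), len(e)) // block
    out = np.empty(nblk)
    for i in range(nblk):
        s = slice(i * block, (i + 1) * block)
        out[i] = 10.0 * np.log10(np.mean(d[s] ** 2) / np.mean(e[s] ** 2))
    return out
```

A canceler that attenuates the echo amplitude by a factor of 10 in every block yields a flat 20 dB ERLE curve.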

VI. CONCLUSION

We presented a two-channel gradient lattice adaptive algorithm for the problem of stereophonic AEC. The limitations of this algorithm led to the development of two new implementations of the two-channel version of the LMS/Newton algorithm and their application to stereo echo cancellation. The implementations provide for efficient realization of long adaptive filters. An exact implementation of the LMS/Newton algorithm was derived, but the structural complexity of this algorithm might limit its applicability in a hardware/custom chip. A second algorithm was proposed to overcome this limitation. The ill-conditioned nature of the stereophonic AEC problem can result in a high misalignment [5]. This will lead to adaptive algorithms being sensitive to changes in the far-end room [2], [8]. However, the fast convergence of our algorithms helps to alleviate this problem to a great extent.

APPENDIX A
DERIVATION OF THE TWO-CHANNEL LEVINSON-DURBIN ALGORITHM

If the backward prediction errors form an orthogonal basis set, then the backward prediction-error vectors are given by

(A1)

Similarly, the forward prediction-error vectors are given by

(A2)

As in the well-known single-channel case [15], we can update the two-channel backward prediction-error and forward prediction-error vectors as

(A3)

and

(A4)

respectively. Using (A1) and (A2), we can expand (A3) as

(A5)

Upon equating coefficients, we have

(A6a)
(A6b)

Similarly, we can work with the forward prediction errors and use (A1) and (A2) to expand (A4) as


(A7)


Once again, equating coefficients will lead to the following result:

(A8a)
(A8b)

Thus, (A6) and (A8) constitute the two-channel Levinson-Durbin algorithm that is used to convert the reflection coefficients to the predictor coefficients.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers, whose constructive comments and suggestions greatly helped to improve the quality of the paper.
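For reference, the single-channel step-up recursion that (A6) and (A8) generalize to 2 x 2 reflection-coefficient matrices can be sketched as follows; the function name is illustrative, not from the paper.

```python
import numpy as np

def reflection_to_predictor(k):
    """Levinson-Durbin step-up recursion (single-channel sketch):
    convert reflection coefficients k[0..M-1] into the direct-form
    prediction-error filter coefficients [1, -a_1, ..., -a_M]."""
    a = np.array([1.0])
    for km in k:
        a_ext = np.append(a, 0.0)
        # a_m(i) = a_{m-1}(i) - k_m * a_{m-1}(m - i)
        a = a_ext - km * a_ext[::-1]
    return a
```

For a single stage, `reflection_to_predictor([0.9])` returns `[1.0, -0.9]`, the prediction-error (whitening) filter for an AR(1) process with pole at 0.9.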

REFERENCES

[1] M. M. Sondhi, D. R. Morgan, and J. L. Hall, “Stereophonic acoustic echo cancellation—An overview of the fundamental problem,” IEEE Signal Process. Lett., vol. 2, no. 8, pp. 148–151, Aug. 1995.
[2] J. Benesty, T. Gänsler, D. R. Morgan, M. M. Sondhi, and S. L. Gay, Advances in Network and Acoustic Echo Cancellation. New York: Springer, 2001.
[3] E. Hänsler and G. Schmidt, Topics in Acoustic Echo and Noise Control. Berlin, Germany: Springer-Verlag, 2006.
[4] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[5] J. Benesty, D. R. Morgan, and M. M. Sondhi, “A better understanding and an improved solution to the specific problems of stereophonic acoustic echo cancellation,” IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 156–165, Mar. 1998.
[6] D. R. Morgan, J. L. Hall, and J. Benesty, “Investigation of several types of nonlinearities for use in stereo acoustic echo cancellation,” IEEE Trans. Speech Audio Process., vol. 9, no. 6, pp. 686–696, Sep. 2001.
[7] J. Benesty, F. Amand, A. Gilloire, and Y. Grenier, “Adaptive filtering algorithms for stereophonic acoustic echo cancellation,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Detroit, MI, May 1995, vol. 5, pp. 3099–3102.
[8] T. Gänsler and J. Benesty, “Stereophonic acoustic echo cancellation and two-channel adaptive filtering: An overview,” Int. J. Adapt. Control Signal Process., vol. 14, no. 6, pp. 565–586, Aug. 2000.
[9] T. Hoya, Y. Loke, J. A. Chambers, and P. A. Naylor, “Application of the leaky extended LMS (XLMS) algorithm in stereophonic acoustic echo cancellation,” Signal Process., vol. 64, no. 1, pp. 87–91, Jan. 1998.
[10] J. Benesty, P. Duhamel, and Y. Grenier, “A multichannel affine projection algorithm with applications to multichannel acoustic echo cancellation,” IEEE Signal Process. Lett., vol. 3, no. 2, pp. 35–37, Feb. 1996.
[11] A. W. H. Khong and P. A. Naylor, “Stereophonic acoustic echo cancellation employing selective-tap adaptive algorithms,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 3, pp. 785–796, May 2006.
[12] A. W. H. Khong and P. A. Naylor, “Selective-tap adaptive algorithms in the solution of the nonuniqueness problem for stereophonic acoustic echo cancellation,” IEEE Signal Process. Lett., vol. 12, no. 4, pp. 269–272, Apr. 2005.
[13] A. W. H. Khong and P. A. Naylor, “Frequency domain adaptive algorithms for stereophonic acoustic echo cancellation employing tap selection,” in Proc. IEEE Int. Workshop Acoust. Echo Noise Control, Eindhoven, The Netherlands, Sep. 2005, pp. 141–144.
[14] S. Haykin, Adaptive Filter Theory. New Delhi, India: Pearson Education, 2003.
[15] B. Farhang-Boroujeny, Adaptive Filters: Theory and Applications. Chichester, U.K.: Wiley, 1998.
[16] F. Ling and J. G. Proakis, “A generalized multichannel least squares lattice algorithm based on sequential processing stages,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 2, pp. 381–389, Apr. 1984.
[17] K. Mayyas, “Stereophonic acoustic echo cancellation using lattice orthogonalization,” IEEE Trans. Speech Audio Process., vol. 10, no. 7, pp. 517–525, Oct. 2002.
[18] B. Farhang-Boroujeny, “Fast LMS/Newton algorithms based on autoregressive modeling and their application to acoustic echo cancellation,” IEEE Trans. Signal Process., vol. 45, no. 8, pp. 1987–2000, Aug. 1997.
[19] N. Kalouptsidis and S. Theodoridis, Adaptive System Identification and Signal Processing Algorithms. London, U.K.: Prentice-Hall, 1993.
[20] V. J. Mathews and S. C. Douglas, Adaptive Filters, 2003 [Online]. Available: http://www.ece.utah.edu/mathews/ece6550/chapter3.pdf
[21] S. S. Narayan, A. M. Peterson, and M. J. Narasimha, “Transform domain LMS algorithm,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-31, no. 3, pp. 609–615, Jun. 1983.
[22] T. Gänsler and J. Benesty, “New insights into the stereophonic acoustic echo cancellation problem and an adaptive nonlinearity solution,” IEEE Trans. Speech Audio Process., vol. 10, no. 5, pp. 257–267, Jul. 2002.

Harsha I. K. Rao (S’06) received the B.E. degree in electronics and communication engineering with highest honors from the National Institute of Technology, Tiruchirappalli, India, in 2003, and the M.E. degree in electrical engineering from the University of Utah, Salt Lake City, in 2008, where he is currently pursuing the Ph.D. degree in electrical engineering. His dissertation has focused on the problems of acoustic crosstalk cancellation and stereophonic acoustic echo cancellation to achieve sound spatialization using a pair-wise loudspeaker paradigm. From 2003 to 2004, he was a design engineer at ABB in Bangalore, India. His research interests include adaptive filtering and its application in acoustic signal processing.

Behrouz Farhang-Boroujeny (M’84–SM’90) received the B.Sc. degree in electrical engineering from Teheran University, Iran, in 1976, the M.Eng. degree from the University of Wales Institute of Science and Technology, U.K., in 1977, and the Ph.D. degree from Imperial College, University of London, U.K., in 1981. From 1981 to 1989, he was with the Isfahan University of Technology, Isfahan, Iran. From 1989 to 2000, he was with the National University of Singapore. Since August 2000, he has been with the University of Utah, where he is now a Professor and Associate Chair of the department. He is an expert in the general area of signal processing. His current scientific interests are adaptive filters, multicarrier communications, detection techniques for space-time coded systems, cognitive radio, and signal processing applications to optical devices. In the past, he has worked on and made significant contributions to the areas of adaptive filter theory, acoustic echo cancellation, magnetic/optical recording, and digital subscriber line technologies. He is the author of the books Adaptive Filters: Theory and Applications (Wiley, 1998) and Signal Processing Techniques for Software Radios (self-published at Lulu publishing house, 2009). Dr. Farhang-Boroujeny received the UNESCO Regional Office of Science and Technology for South and Central Asia Young Scientists Award in 1987. He served as an associate editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING from July 2002 to July 2005 and is now serving as an associate editor of the IEEE SIGNAL PROCESSING LETTERS. He has also been involved in various IEEE activities, including the chairmanship of the Signal Processing/Communications chapter of the IEEE Utah Section in 2004 and 2005.

Authorized licensed use limited to: The University of Utah. Downloaded on August 7, 2009 at 16:36 from IEEE Xplore. Restrictions apply.
