
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 39, NO. 1, JANUARY 1991

Numerically Stable Fast Transversal Filters for Recursive Least Squares Adaptive Filtering

Dirk T. M. Slock, Member, IEEE, and Thomas Kailath, Fellow, IEEE

Abstract-In this paper, a solution is proposed to the long-standing problem of the numerical instability of fast recursive least squares transversal filter (FTF) algorithms with exponential weighting, which is an important class of algorithms for adaptive filtering. A framework for the analysis of the error propagation in FTF algorithms is first developed; within this framework, we show that the computationally most efficient 7N form is exponentially unstable. However, by introducing redundancy into this algorithm, feedback of numerical errors becomes possible; a judicious choice of the feedback gains then leads to a numerically stable FTF algorithm with a complexity of 8N multiplications and additions per time recursion. The results are presented for the complex multichannel joint-process filtering problem.

I. INTRODUCTION

The joint-process adaptive filtering problem can be described as follows. Two data sequences are available, the desired response d(·) and the input x(·). The input sequence is applied to a linear filter with finite impulse response (FIR). Then the problem is to adapt the filter response so as to make its output sequence as close as possible in some sense to the desired response sequence. In the general multichannel case, x(T) and d(T) are p × 1 and q × 1 complex vectors, respectively. The special case of d(T) = x(T + 1) is referred to as the (forward) prediction problem. Even though the basic formulation involves an FIR filter, infinite impulse response (IIR) filters may be appropriately reformulated as multichannel FIR filters [1]-[3].

Adaptive filtering has a wide variety of applications, ranging from high resolution spectrum estimation and line enhancement, speech and biomedical signal processing, adaptive differential encoding, and adaptive deconvolution to interference suppression, echo cancellation, and channel equalization (see, e.g., [4], [5]). The multichannel algorithm can furthermore accommodate such applications as black-box system identification with multiple polynomials and self-tuning control. Other multichannel applications are frequency domain adaptive filtering, multirate signal processing, fractionally spaced and decision-feedback equalizers, and image enhancement.

We shall consider a transversal (or tapped-delay-line) FIR linear filter structure; the time-varying impulse response of the filter is denoted by the q × Np row vector¹ W_{N,T}, where N is the filter order and T is the discrete time index. The error signal can be expressed as

    e_N(T) = d(T) + W_{N,T} X_N(T)                                        (1)

where X_N(T) = [x^H(T) · · · x^H(T − N + 1)]^H is the regressor.² We shall make the so-called "prewindowing" assumption that no data are available before time T = 0. In the least squares method with exponential weighting, the distant past is weighted less heavily in order to track slowly time-varying phenomena, and the filter response W_{N,T} is obtained as the solution of

    min_{W_{N,T}} tr { Σ_{t=0}^{T} λ^{T−t} e_N(t) e_N^H(t) }              (2)

where tr denotes trace. There are two much-studied methods of solving the minimum-mean-square problem, which is to minimize the variance of the error signal. One solution is a stochastic Gauss-Newton algorithm, which leads to the least squares problem described above; its solution can be computed recursively in time, leading to the recursive least squares (RLS) algorithms. In contrast, the LMS algorithms also attempt to minimize the mean square of the error signal, but do so by employing a stochastic gradient technique. This technique involves an instantaneous estimate of the gradient.

The advantages of RLS algorithms over LMS algorithms in aspects such as tracking behavior are well known. In high or medium SNR environments, the problem to be solved is essentially quadratic in nature. Since the RLS algorithm solves quadratic problems exactly, it can find the optimal solution quickly. Alternatively, it can be shown [6] that the convergence behavior of the RLS algorithm is independent of the spectral characteristics of the input signal. For the LMS algorithm, such a dependence is very strong. Furthermore, the requirement of convergence imposes a bound on the gain of the LMS algorithm, and this bound depends on the eigenvalue spread of the autocorrelation matrix of the input signal. However, in practice, the use of LMS algorithms is widespread due to their computational simplicity.

More recently, fast RLS algorithms have been introduced that circumvent the computational burden of the Riccati equation in the conventional RLS algorithm. There are two families of such fast algorithms, corresponding to two possible filter structures: the fast lattice (FLA) and the fast transversal filter (FTF) algorithms. Both families, like the LMS algorithm, lead to a computational complexity of O(N), but FTF has a significantly lower coefficient of N than FLA (FLA requires O(N) divisions, which are computationally more intensive than multiplications).

¹The term vector refers to the single channel case (p = q = 1).
²Superscript H denotes Hermitian transpose.

Manuscript received May 14, 1988; revised November 7, 1989. This work was supported in part by the Joint Services Program at Stanford University (U.S. Army, U.S. Navy, U.S. Air Force) under Contract DAAG29-85-K-0048 and the SDI/IST Program managed by the Office of Naval Research under Contract N00014-85-K-0550. D. T. M. Slock was with the Information Systems Laboratory, Stanford University, Stanford, CA 94305. He is now with the Philips Research Laboratory, B-1348 Louvain-la-Neuve, Belgium. T. Kailath is with the Department of Electrical Engineering, Information Systems Laboratory, Stanford University, Stanford, CA 94305. IEEE Log Number 9040380.


TABLE I
COMPARISON OF SOME ADAPTIVE ALGORITHMS

Algorithm | Problem Solved | Tracking Speed | Computational Complexity (×) | Numerical Stability
LMS       | LMS            | slow           | 2N                            | exponential
RLS       | RLS            | fast           | 2N² + 6N                      | exponential
FLA       | RLS            | fast           | 14N(×) + 2N(÷) = 30N(×)       | exponential
FTF       | RLS            | fast           | 7N                            | unstable
SFTF      | RLS            | fast           | 8N                            | exponential

The salient facts are summarized in Table I, which indicates that the FTF algorithm is a very attractive adaptive algorithm. However, numerical considerations disturb this picture. A recent survey of some numerical issues in adaptive algorithms is given in [7]. The LMS and the conventional RLS and FLA algorithms have been found to be exponentially numerically stable (for recursive algorithms, by numerical stability we mean the stability of the error propagation system; see below). Some pertinent early references are [8], [9] for LMS, [10]-[12] for RLS, and [10], [13] for FLA. Note that different RLS algorithms (in transversal filter form) only differ in the way they compute a certain quantity often called the Kalman gain. In [12], [14], the update of the filter solution W_{N,T} is considered, and it is assumed that a certain quantized version of the Kalman gain is available. It is claimed in [12] that this leads to results that are valid regardless of the method used to compute the Kalman gain. While strictly speaking this is true, it will be clear from this paper (and also from [10], [11]) that the numerical issues involved in computing the Kalman gain itself are far more crucial.

Even though FTF type algorithms were proposed already in [1], numerical problems were reported only several years later [15], [16]. Indeed, while an algorithm may solve a certain problem exactly, it is not until the algorithm is actually implemented that numerical issues emerge (investigation of the numerical properties is often triggered by the observation of certain phenomena in the implementation). In order to arrive at a working algorithm, certain remedies were proposed in the form of rescuing devices [17], [16]: a quantity indicative of the level of the numerical errors is observed until a threshold is reached, at which point the algorithm is restarted. Even though the restart can include the current solution as prior information by means of a soft-constrained initialization, such a restart leads to suboptimal performance because of a significant discontinuity in the sample covariance matrix in the normal equations that determine the solution.

The exponential divergence of numerical errors was demonstrated in [10] for one particular input signal, considering the original fast Kalman version [1] of the FTF algorithms. In [10], the following statement appears: "There is nothing in terms of numerical tricks and improved organization of calculations that can be done to remove the bad error propagation properties. That can only be done by changing the algorithm itself." This means that if one increases the precision of the computations by, for instance, doubling the wordlength, then numerical errors will still diverge. It will only take longer before the errors become noticeable. Stabilization of the algorithm really requires an actual change in the algorithm so that the dynamics of the error propagation are modified.

Earlier on, in [16], the following heuristic explanation for the numerical instability had been given. In the FTF algorithm, the backward prediction filter B_{N,T} and the Kalman gain C̃_{N,T} are jointly updated via a so-called hyperbolic rotation:

    [ C̃_{N,T} ]              [ C̃_{N,T−1} ]
    [          ]  =  X_{N,T}  [            ]                              (3)
    [ B_{N,T}  ]              [ B_{N,T−1}  ]

where X_{N,T} is a hyperbolic rotation, i.e., X_{N,T}^H J X_{N,T} = J with

    J = [ 1   0 ]
        [ 0  −I ]

(The hyperbolic rotation is actually only a strict hyperbolic rotation if operating on the normalized versions of the filters, but it is otherwise also a transformation with an unstable eigenvalue.) Intuitively, since the hyperbolic rotation has an eigenvalue bigger than one in magnitude, we can expect that certain error components on the filters will be amplified by the above transformation. However, this explanation is incomplete, since in the algorithms the hyperbolic rotation itself is a function of the vector that is going to be rotated. So one does not get the new filter estimates from the old ones via linear transformations, but through some more involved nonlinear transformation. Hence, the errors on the filter estimates are not simply multiplied by hyperbolic transformations. It is precisely some degree of freedom in the structure of the nonlinear transformation that will allow us to stabilize the error propagation. Another reason why the hyperbolic rotation picture is misleading is that it only focuses on part of the algorithm. Nevertheless, it is true that the presence of hyperbolic rotations is strongly indicative of possible numerical difficulties.

More recently, efforts to stabilize the FTF algorithms have intensified. In the leaky LMS algorithm, certain leakage factors are introduced in order to stabilize the LMS algorithm in case the signal covariance matrix is singular. Cioffi [18] has studied leaky FTF algorithms by considering a so-called multiexperiment version [19] and introducing an artificial experiment to control the magnitude of the filter estimates; however, he found that too much performance had to be sacrificed to achieve stability. In contrast, the stabilized FTF algorithm of this paper does not introduce leakage factors and still solves the RLS problem exactly.
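Returning for a moment to the hyperbolic-rotation picture, a small numerical sketch shows why it is a warning sign. The example below is our own toy illustration (the 2 × 2 matrix, the parameter t, and the iteration count are arbitrary choices, not quantities from the paper): any J-orthogonal matrix has an eigenvalue of magnitude larger than one, so an error component along the corresponding eigenvector grows exponentially under repeated linear application.

```python
import numpy as np

# Toy 2x2 hyperbolic rotation H = [[cosh t, sinh t], [sinh t, cosh t]].
# It satisfies H^T J H = J with J = diag(1, -1), and has eigenvalues
# e^t and e^{-t}: one of them is necessarily larger than one.
t = 0.1
H = np.array([[np.cosh(t), np.sinh(t)],
              [np.sinh(t), np.cosh(t)]])
J = np.diag([1.0, -1.0])
assert np.allclose(H.T @ J @ H, J)       # J-orthogonality
print(np.linalg.eigvals(H))              # [e^t, e^{-t}]

# An error component aligned with the unstable eigenvector [1, 1]
# grows like e^{t*T} under purely linear propagation:
err = np.array([1e-12, 1e-12])
for _ in range(500):
    err = H @ err
print(err)                               # roundoff-level error blown up
```

As the text explains, the actual FTF error propagation is nonlinear (the rotation depends on the rotated vector), so this picture is only indicative; but it captures why purely "organizational" fixes cannot help.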

A. Redundancy and Error Feedback

It has been noticed that by introducing computational redundancy in a certain sense (namely, computing certain quantities in two different ways), one can make specific measurements of the numerical errors available. These measurements can be fed back to appropriately modify the dynamics of the propagation of errors. In [20]-[24] the availability of redundancy in the FTF algorithm was recognized and utilized; however, all the algorithms therein turn out to be only partial solutions to the stabilization problem (in that they use only one redundancy instead of two and hence cannot completely stabilize the error propagation system). We shall discuss the algorithm of [20]-[22] in more detail in Section V and show that the interpretation that has been given for the error control scheme in that algorithm can be misleading.

We should emphasize that the possibility of redundancy is a key feature of the FTF algorithms considered here. If one attempted to introduce redundancy in the LMS or conventional RLS algorithms, by scheduling the order of multiplications and additions in different ways and the like, one would not accomplish any change in the dynamics of the propagation of numerical errors (due to lack of observability). The redundancy in the FTF algorithm is of a more fundamental nature. The fact of the availability of redundancy in the FTF algorithm had been recognized for a while by the first author of this paper; however, the idea of error feedback and its proper structure came only later, but opened the way to a complete solution.

Apart from a solution to the problem of numerical instability in FTF algorithms, this paper (a first version of which was presented in [25]) also provides a concrete analysis of the solution, which demonstrates among other things the instability of the original 7N algorithm and its variants. By exploiting redundancy properly, we arrive at a numerically stable FTF algorithm, whose computational complexity of 8N shows a 14% increase over the basic 7N.

One may remark that the line of thought in the proposed solution runs parallel to the basic philosophy in robust data communication. Consider the problem of transmitting a source message over a noisy channel. Because of the limited capacity of the channel, one will first apply source coding to the message to remove all redundancies in it. Then, to ensure a basically error-free transmission over the channel, one will reintroduce redundancies in the message (e.g., bits for error-correcting codes), specifically chosen so as to ensure as much robustness as possible with little overhead. Similarly, the RLS solution can be recursively computed using the well-known conventional algorithm with a complexity of O(N²) computations. However, processors only have a limited computational capacity, and therefore faster algorithms are desirable; in fact, because of the time invariance of the filter to be estimated, the complexity of RLS algorithms can be shrunk down to O(N). But then, to fortify the algorithm against numerical errors in the recursions, extra redundancy has to be reintroduced into the algorithm. The nature of this redundancy is a crucial issue, and so is its complexity. Possibly, in order to achieve sufficient robustness, we would need to add so much redundancy that we end up with an O(N²) algorithm. It turns out that this is not necessary: a redundancy of O(N) complexity can do the job.

The rest of this paper is organized as follows. In Section II, we discuss some general techniques for tackling the analysis of error propagation in recursive algorithms. In Section III, we review the conventional RLS algorithm and analyze its error propagation; this simple exercise will give us appropriate insight for the analysis and evaluation of the FTF algorithm. Section IV briefly describes the derivation of the FTF algorithm and goes on to point out the redundancies that are available. The error feedback mechanism is introduced, and the analysis of the error propagation is described in several steps. The stabilized FTF algorithm proposed here is put into perspective in Section V, and several previous variants are discussed and compared. Alternative redundancies that may be exploited to stabilize FTF algorithms are described there also. Section VI discusses in detail the choice of feedback gains to be used. The algorithm presented in this paper handles the general complex multichannel case. For simplicity, however, the analysis of algorithms assumes a single real channel until Section VII, where we discuss some multichannel issues.
Some concluding remarks are presented in Section VIII. We want to mention here also that Appendix D contains a Fortran77 program listing, providing a software implementation of the 8N stable FTF algorithm.

II. GENERAL APPROACHES FOR ANALYSIS OF ERROR PROPAGATION

The analysis of the error propagation properties of adaptive algorithms is very much related to the study of their convergence and especially their tracking properties. It will therefore be no surprise that we will see some of those techniques used here. Discrete-time adaptive algorithms are recursive in nature and thus can be viewed as discrete-time nonlinear dynamical systems. Let θ(T) (a row vector) denote the set of algorithmic quantities that determines the state of the algorithm. Then the adaptive algorithms considered here can be written in state-space form as

    θ(T) = f(θ(T − 1), d(T), X_N(T)).                                    (4)
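The following sketch illustrates this viewpoint on a stand-in recursion (a toy exponentially weighted power estimate, not the FTF equations): running the same state recursion in double and in single precision exposes the perturbed trajectory of the implemented algorithm, with the per-step rounding playing the role of the driving noise V(T) introduced below.

```python
import numpy as np

# Generic illustration of (4)-(5) on an assumed toy recursion:
# theta(T) = f(theta(T-1), x(T)). The float32 run injects rounding
# errors V(T) at every step; the difference to the float64 run is
# the accumulated numerical error Delta-theta(T).
rng = np.random.default_rng(0)
lam = 0.99
x = rng.standard_normal(2000)

def f(theta, xt):
    # exponentially weighted power estimate (stand-in for f in (4))
    return lam * theta + (1.0 - lam) * xt * xt

th64 = np.float64(1.0)
th32 = np.float32(1.0)
for xt in x:
    th64 = f(th64, xt)
    th32 = np.float32(f(th32, np.float32(xt)))   # rounding = V(T)

print(abs(np.float64(th32) - th64))              # accumulated error
```

Because this particular toy recursion is exponentially stable, the accumulated difference stays at the roundoff level; the analysis below is precisely about determining when that is the case.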

It is possible to include the regressor X_N(T) in the state vector θ(T). Then x(T) would replace X_N(T) in the argument of f(·). This would allow for the study of the effects of feedback from the algorithmic quantities to the input signal on the algorithm behavior. However, we shall not deal with such feedback in this paper. We shall also not consider the effects of signal (d, x) quantization. Now, any numerical implementation of the algorithm will generate inaccuracies due to roundoff errors and representation errors in the processor. Therefore, the algorithm will run with perturbed quantities θ̂(T) and can be described by

    θ̂(T) = f(θ̂(T − 1), d(T), X_N(T)) + V(T)                             (5)

where the driving term V(T) represents the roundoff noise. If the algorithm turns out to be robust w.r.t. numerical errors and the wordlength used is sufficiently long, then the perturbations Δθ(T) ≜ θ̂(T) − θ(T) will be small w.r.t. the infinite precision quantities θ(T). Therefore we shall consider a linearization of the system (5) around the unperturbed trajectory θ(·):

    Δθ(T) = Δθ(T − 1) F(T) + V(T)                                        (6)

where F(T) ≜ ∇_θ f(θ, d(T), X_N(T))|_{θ=θ(T−1)}. If the linearized error system (6) is exponentially stable, then the implemented algorithm (5) will be locally exponentially stable around the unperturbed trajectory. If the system (6) is unstable, however, then the algorithm will not be acceptable. A complete study of the effect of rounding errors according to (6) involves the following three steps:
1) error generation: describe the error V(T) as it is generated in the recursion at time T;
2) error propagation: how do errors generated at time T propagate in subsequent recursions (assuming no errors are made in those recursions);
3) error accumulation: how do errors generated at different times interact and accumulate.

Step 1 has been studied in [11], [26], [27] for different implementations of Kalman filters (and hence RLS algorithms) from a numerical analysis point of view, i.e., not considering V(T) as a stochastic process, but as a deterministic quantity and giving an upper bound for its magnitude. Stochastic descriptions for V(·) have been given in [28], [8], [9], [13], [29], [14], [12] using a mixture of analysis, heuristics, and simulations. Usually, V(·) is described as a zero mean i.i.d. sequence, independent of the variables {d(t), x(t), V(t), t = 0, · · · , T − 1}. The treatment in, e.g., [13], however, takes into account certain dependencies to describe the resulting bias. Note that the description of V(·) depends heavily on the implementation (fixed-point, mixed precision, floating-point, normalization, square-root forms).

Step 2 is a crucial step and has received quite a bit of attention recently [10], [11], [15]. It is highly desirable to have (6) exponentially stable, because this will lead to bounded-input bounded-output stability in step 3. If the system (6) is found to be unstable, so that the effect of a single disturbance tends to infinity, then the corresponding algorithm cannot be used in continuous operation. Hence, if stability of the error propagation system is an issue (as it is for FTF), then step 2 is by far the crucial step. It turns out that quantitatively describing the error propagation properties for the FTF algorithm is an extremely complicated task. Hence step 3 cannot be completed analytically. Therefore, except for a few qualitative remarks on step 3, here we shall concentrate on step 2. The main problem of interest is thus the analysis of the stability properties of the autonomous system

    Δθ(T) = Δθ(T − 1) F(T).                                              (7)

Note that this is a time-variant system and that F(T) depends on {x(t), d(t), t = 0, · · · , T}. There are two approaches that can be taken now, depending on whether x(·) is considered to be a stochastic process or a deterministic signal.

A. Stochastic Approach

From (7),

    Δθ(T₂) = Δθ(T₁) F(T₁ + 1) · · · F(T₂) ≜ Δθ(T₁) Φ(T₁, T₂).

If x(·) is a stationary process, F(·) will be asymptotically stationary for λ < 1, after an initial transient period of a few time constants (1/(1 − λ)). Even though this stationarity does not prevent (7) from being a time-varying system, if x(·) has certain ergodicity properties then asymptotically (for large T₂ − T₁) the state transition matrix Φ(T₁, T₂) has a certain "constant" eigendecomposition. In Appendix A some intuitive explanation is given showing why this might be true, and some known concrete results are highlighted. Now, though these results tell us something about the existence of a constant underlying eigendecomposition, little information is given as to how to compute it.

In order to get an idea of these time-invariant system dynamics, a so-called averaging technique has been frequently used in the literature [13], [29]. The averaging principle leads to studying the stability properties of the average F̄. This principle has been around for a while; see, e.g., [30], [31], where it has been justified based on an assumed separation in spectral content of F(·) and Δθ(·), or used as a first step in an iterative procedure in which deterministic (average) and purely random parts of an operator are split, the random part is considered input, and the deterministic part is taken as the system. Recently, the averaging principle has seen a revived interest in control theory; see, e.g., [32], [33]. The results presented there involve deterministic averaging. However, one can get stronger results in a stochastic framework. Indeed, deterministic averaging leads to an approximation that can only be shown to hold over some finite time span, unless one assumes periodic or quasi-periodic signals. In [34], Kushner presents a host of averaging results in a stochastic context and also analyzes approximations over an infinite time span. We will present an averaging theorem that is geared towards the system (7) and is a straightforward application of some of the ideas developed in [34]. Note that averaging techniques apply to more general linear and nonlinear stochastic systems too.

Now, assume F(T) = I + εG(T), where G(·) is such that (1/T) Σ_{t=1}^{T} G(t) → Ḡ (convergence in probability), where Ḡ is a matrix with finite elements. Under certain ergodicity assumptions, we shall have that F̄ = E F(T). The convergence of the average of G(·) to Ḡ holds, e.g., if we have F(T) measurable w.r.t. σ{x(t), d(t), t = 0, 1, · · · , T}, F(·) integrable, and {x(·), d(·)} stationary ergodic and φ-mixing with summable mixing coefficients. This last condition implies that the signals should be asymptotically uncorrelated (no sinusoidal components). Now, let θ̃(T) = θ̃(T − 1) F(T) with F(·) as described above. Then we have the following weak convergence result ("A ⇒ B" means "A converges weakly to B," or the probability distribution of A converges to the probability distribution of B, and the convergence is pointwise if the distributions are continuous).

Theorem 1 (Averaging Theorem): If θ̃(0) ⇒ θ₀, then θ̃(·) ⇒ θ̄(·) (as ε → 0), where θ̄(T) = θ̄(T − 1) F̄ and θ̄(0) = θ₀.

Hence, for small ε, the time-variant system (7) can be approximated by a time-invariant system that can be found by direct averaging of the system propagation matrix. The character of this approximation can be made more precise. Under certain additional assumptions, it can be shown that the normalized difference process (θ̃(·) − θ̄(·))/√ε converges weakly and asymptotically (T → ∞) to a stationary process with zero mean and finite covariance. In our error system analysis, we will identify ε with (1 − λ) or a power thereof. So the averaging technique will prove to be a tractable approach, leading to results that are valid for small (1 − λ).

B. Deterministic Approach

If x(·), d(·) are known deterministic signals, then the sequence F(·) can be computed in principle, and the stability properties of system (7) can be investigated for the given signals. In general, this approach will be intractable. An interesting special case, however, is the class of periodic signals. If x(·), d(·) are periodic with period T̃, it readily follows that the dynamics of system (7) will be governed by the magnitudes of the T̃th roots of the eigenvalues of Φ(t, t + T̃). This approach can lead to a manageable task for a multitude of interesting periodic signals, and an exact analysis is possible in many cases, leading to valuable insight into the stability properties of the error system.
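Before moving on, a small simulation can make Theorem 1 concrete. The sketch below uses an assumed toy error system (the 2 × 2 matrix Ḡ and the noise model are our choices, not the FTF matrices): the product of the time-varying factors F(T) = I + εG(T) is compared with powers of the averaged matrix F̄ = I + εḠ, and for small ε the two trajectories remain close.

```python
import numpy as np

# Toy check of the averaging principle: propagate an error state through
# random F(T) = I + eps*G(T) and through the averaged F_bar = I + eps*E[G].
rng = np.random.default_rng(1)
eps = 0.01                                # plays the role of (1 - lambda)
G_mean = np.array([[-1.0, 0.3],
                   [ 0.0, -0.5]])         # E[G(T)] (stable averaged modes)
F_bar = np.eye(2) + eps * G_mean

dtheta = np.array([1.0, 1.0])             # error state, row-vector convention
dtheta_avg = dtheta.copy()
for _ in range(2000):
    G = G_mean + rng.standard_normal((2, 2))     # fluctuating part of G(T)
    dtheta = dtheta @ (np.eye(2) + eps * G)
    dtheta_avg = dtheta_avg @ F_bar
print(dtheta, dtheta_avg)                 # comparable decay for small eps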

III. THE RLS ALGORITHM

In order to prevent singularities during the initial phase, and also to provide the opportunity to incorporate prior knowledge, we shall consider the minimization of an augmented cost function

    ξ_N(T) = min_{W_{N,T}} tr { λ^{T+1} μ (W_{N,T} − W₀) Λ (W_{N,T} − W₀)^H + Σ_{t=0}^{T} λ^{T−t} e_N(t) e_N^H(t) }        (8)

where W₀ represents the initial condition, Λ = diag{λ^N, λ^{N−1}, · · · , λ} ⊗ I_p, and μ is a positive scalar weighting factor. The representation of this least squares problem can be done most concisely by incorporating the initialization and the exponential weighting into the data vectors describing the time series, as has been done in [16], [35]. We will now consider the vector space spanned by these augmented data vectors d_T, x_T, x_{T−1}, · · · , x_{T−N+1}. An error vector for the linear least squares (l.l.s.) problem is introduced as

    ε_{N,T} = d_T^H + W_{N,T} X_{N,T}^H                                  (10)

where X_{N,T} ≜ [x_T · · · x_{T−N+1}] is the data matrix. The sum in (9) is recognized as the Frobenius norm (which reduces to the Euclidean norm in the single channel case) of ε_{N,T}:

    ξ_N(T) = min tr { ε_{N,T} ε_{N,T}^H }.                               (11)

The solution W_{N,T} to the l.l.s. problem satisfies the following orthogonality condition:

    (W_{N,T} X_{N,T}^H + d_T^H) X_{N,T} = 0                              (12)

and hence we find

    W_{N,T} = −d_T^H X_{N,T} (X_{N,T}^H X_{N,T})^{−1}.                   (13)

To facilitate the solution in the vector space, we introduce the sample covariance matrix R_X ≜ X^H X, the filter operator (transpose of the left inverse of X) K_X ≜ X (X^H X)^{−1} = X R_X^{−1}, the projection operator onto the column space of X, P_X ≜ X (X^H X)^{−1} X^H = X K_X^H, and its orthogonal complement P_X^⊥ ≜ I − P_X. With the notation thus introduced, we can rewrite (13) as

    W_{N,T} = −d_T^H K_{N,T}                                             (14)

and it readily follows that

    ε_{N,T} = d_T^H P_{N,T}^⊥ ,    ξ_N(T) = tr { d_T^H P_{N,T}^⊥ d_T }.   (15)

In order to arrive at a time recursion for the filter solution W_{N,T}, it is useful to introduce a pinning vector u ≜ [1 0 · · · 0]^H [36], so that we can describe the influence of the most recent data samples d(T) = d_T^H u and x(T) = x_T^H u on the old solution W_{N,T−1}. It is well known [2] that the filter recursion involves the Kalman gain, and an interesting interpretation of the Kalman gain C_{N,T} (in unnormalized form) as the optimal l.l.s. estimation filter for the pinning vector is possible, viz.,

    min_{C_{N,T}} || u + X_{N,T} C_{N,T}^H ||² = γ_N(T) = u^H P_{N,T}^⊥ u                          (16)

resulting in C_{N,T} = −u^H K_{N,T}. The actual time update can be derived in the vector space formulation [35] or by algebraic means [2] and is given by

    C̃_{N,T} = −X_N^H(T) λ^{−1} R_{N,T−1}^{−1}                            (17)
    γ_N^{−1}(T) = 1 − C̃_{N,T} X_N(T)                                     (18)
    ε_N^p(T) = d(T) + W_{N,T−1} X_N(T)                                   (19)
    ε_N(T) = ε_N^p(T) γ_N(T)                                             (20)
    W_{N,T} = W_{N,T−1} + ε_N(T) C̃_{N,T}                                 (21)
    R_{N,T}^{−1} = λ^{−1} R_{N,T−1}^{−1} − C̃_{N,T}^H γ_N(T) C̃_{N,T}.     (22)

It can be shown that C_{N,T} = C̃_{N,T} γ_N(T). The initial conditions are W_{N,−1} = W₀ and R_{N,−1}^{−1} = {μΛ}^{−1}. The state θ(T) for the RLS algorithm consists of W_{N,T} and the rows of (the upper triangular part of) R_{N,T}^{−1}.

We now turn to the error analysis of the RLS algorithm. We shall assume that the signal x(·) is persistently exciting. The notion of persistent excitation refers to the conditioning of the data matrix [37], [2], and we will call x(·) persistently exciting if X_{N,T}, and hence R_{N,T}, is full rank for all T. The error propagation for the RLS algorithm (17)-(22) can then be described by the equations

    ΔR_{N,T}^{−1} = λ^{−1} F₁^H(T) ΔR_{N,T−1}^{−1} F₁(T)                  (23)
    ΔW_{N,T} = ΔW_{N,T−1} F₁(T) − ε_N(T) X_N^H(T) ΔR_{N,T−1}^{−1} F₁(T)   (24)

where F₁(T) ≜ I + X_N(T) C̃_{N,T} = I − X_N(T) X_N^H(T) R_{N,T}^{−1} = λ R_{N,T−1} R_{N,T}^{−1}. This error system consists of two subsystems that are coupled. In order to be able to apply the averaging theorem, we introduce the following scaling: R̃_{N,T}^{−1} ≜ (1/(1 − λ)) R_{N,T}^{−1}. This leads to consideration of the following system:

    ΔW_{N,T} = ΔW_{N,T−1} F₁(T) − (1 − λ) ε_N(T) X_N^H(T) ΔR̃_{N,T−1}^{−1} F₁(T).                  (25)

Since F̄₁ = λI and the average of ε_N(T) X_N^H(T) is 0, the averaging principle of Theorem 1 leads straightforwardly to the following error system with averaged dynamics:

    ΔR̃_{N,T}^{−1} = λ ΔR̃_{N,T−1}^{−1}                                    (26)
    ΔW_{N,T} = λ ΔW_{N,T−1}.                                             (27)

Hence the RLS algorithm has an exponentially stable error propagation mechanism with base λ, as has been concluded also in [10], [27]. For λ = 1, consider the following rescaling: R̃_{N,T}^{−1} ≜ T R_{N,T}^{−1}. We have E F₁(T) = (T/(T + 1)) I ≈ I (hence the effect of the transient is stabilizing rather than destabilizing (see [38] also), but nevertheless it leads to an unstable system asymptotically), and again the average of ε_N(T) X_N^H(T) is 0. We get the following error system with averaged dynamics:

    ΔR̃_{N,T}^{−1} = ΔR̃_{N,T−1}^{−1} ,    ΔW_{N,T} = ΔW_{N,T−1}            (28)

and this leads to a random walk phenomenon for the total accumulated errors ΔW_{N,T} and ΔR_{N,T}^{−1} (whether we actually get precisely a random walk or an integrated random walk or the like will depend on the implementation (fixed or floating point), but the error system dynamics are in any case unstable and of the random walk type).

Actually, in the case of the RLS algorithm, the averaging analysis can be made even more precise. Introduce Φ₁(T₁, T₂) ≜ F₁(T₁ + 1) · · · F₁(T₂) = λ^{T₂−T₁} R_{N,T₁} R_{N,T₂}^{−1}. With λ < 1 and T₂ − T₁ >> 1/(1 − λ), R_{N,T₁} and R_{N,T₂} will be independent. Then E Φ₁(T₁, T₂) = λ^{T₂−T₁} I, since E R_{N,T}^{−1} ≈ { E R_{N,T} }^{−1} for sufficiently small (1 − λ). Or, if x(·) is a periodic deterministic signal with period T₁, then Φ₁(T, T + T₁) = λ^{T₁} I exactly! (Note that after an initial transient, the influence of the initial conditions has completely died out.)
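For reference, the recursion (17)-(22) is compact enough to exercise directly in code. The sketch below (single real channel, our own variable names, Λ taken to be I for simplicity, and an assumed noisy linear-regression data model) implements the exponentially weighted RLS algorithm with the soft-constrained initialization:

```python
import numpy as np

# Minimal single-channel RLS sketch following (17)-(22); prewindowed data,
# soft-constrained initialization W_{N,-1} = W0 = 0, R^{-1}_{N,-1} = (mu*I)^{-1}.
rng = np.random.default_rng(2)
N, lam, mu = 4, 0.99, 1e-2
w_true = rng.standard_normal(N)
W = np.zeros(N)                                  # W0
Rinv = np.eye(N) / mu
xbuf = np.zeros(N)                               # regressor X_N(T)
for T in range(500):
    xt = rng.standard_normal()
    xbuf = np.concatenate(([xt], xbuf[:-1]))     # shift in new sample
    d = -w_true @ xbuf + 0.01 * rng.standard_normal()
    Ct = -(Rinv @ xbuf) / lam                    # (17)
    gamma = 1.0 / (1.0 - Ct @ xbuf)              # (18)
    e_prio = d + W @ xbuf                        # (19) a priori error
    e_post = e_prio * gamma                      # (20) a posteriori error
    W = W + e_post * Ct                          # (21)
    Rinv = Rinv / lam - np.outer(Ct, Ct) * gamma # (22) Riccati update
print(W - w_true)                                # ~0 up to the noise level
```

Note that the O(N²) cost per step comes entirely from the Riccati update (22); the FTF algorithm of the next section removes exactly this step.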


TABLE II
MULTICHANNEL COMPLEX STABILIZED FTF ALGORITHM
(Computation | # | Cost)
[The individual entries II-(1)-II-(21) are illegible in this copy; the recoverable totals are:]
Multichannel total (3 divides): N(6p² + 2pq + p) + 23.5p² + 14.5p + q + 4
Single real channel total (3 divides): 9N + ...

IV. THE FTF ALGORITHM

Our main interest is in the fast transversal filter algorithm. The equations are more complex than for the RLS algorithm and are shown in Table II. We shall briefly comment on some of the key features. The FTF algorithm uses the same time update (21) for W_{N,T} as the RLS algorithm. This time update is a reflection of the structure X_{N,T}^H = [X_N(T)  X_{N,T−1}^H]. However, the Riccati equation is not used to update the Kalman gain C̃_{N,T}. Instead, a second time-updating mechanism is invoked that is made possible by the special structure of X_{N,T}. Indeed, we can go from X_{N,T−1} to X_{N,T} by doing an order update, followed by an order downdate, i.e.,

    [x_T  X_{N,T−1}] = X_{N+1,T} = [X_{N,T}  x_{T−N}].                   (29)
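The structure (29) is easy to verify numerically. The following sketch (scalar channel, a short assumed data record) builds the prewindowed data matrices column by column and checks both readings of X_{N+1,T}:

```python
import numpy as np

x = np.arange(1.0, 8.0)          # x(0),...,x(6); prewindowed: x(t) = 0 for t < 0

def data_matrix(N, T):
    # columns [x_T, x_{T-1}, ..., x_{T-N+1}]; row t holds x(t-k)
    X = np.zeros((T + 1, N))
    for t in range(T + 1):
        for k in range(N):
            if t - k >= 0:
                X[t, k] = x[t - k]
    return X

N, T = 3, 6
X_Np1_T = data_matrix(N + 1, T)                        # X_{N+1,T}
# order update: [x_T  X_{N,T-1}], with X_{N,T-1} embedded via a leading zero row
left = np.column_stack([data_matrix(1, T),
                        np.vstack([np.zeros((1, N)), data_matrix(N, T - 1)])])
print(np.allclose(left, X_Np1_T))                      # True
print(np.allclose(X_Np1_T[:, :N], data_matrix(N, T)))  # True: order downdate view
```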

It is exactly the existence of this second possibility that makes a fast algorithm possible. These order up and downdates lead to equations II-(3),(4) and II-(11),(12), respectively (II-(3) denotes Table II, equation 3). They involve the forward and backward prediction filters A_{N,T} and B_{N,T} and their outputs (prediction residuals). To do an order update by adding the vector x_T to the space spanned by X_{N,T−1}, it is natural to consider the prediction residual (innovation) e_{N,T} of x_T w.r.t. X_{N,T−1}, since this leads to an orthogonal decomposition of the total column space of X_{N+1,T}. This is how the forward prediction filter A_{N,T} arises, and similarly for B_{N,T} in the order downdate. Formally, the filters A_{N,T} and B_{N,T} are the solutions to the following forward and backward prediction problems:

    min_{A_{N,T}} tr { A_{N,T} R_{N+1,T} A_{N,T}^H } = tr { e_{N,T}^H e_{N,T} } = tr α_N(T) = tr x_T^H P_{N,T−1}^⊥ x_T

    min_{B_{N,T}} tr { B_{N,T} R_{N+1,T} B_{N,T}^H } = tr { r_{N,T}^H r_{N,T} } = tr β_N(T) = tr x_{T−N}^H P_{N,T}^⊥ x_{T−N}.    (30)
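Since (30) is just a pair of least squares problems, the filters and error energies can be checked against a direct solver. The sketch below (scalar channel, λ = 1 and no soft constraint, for simplicity) computes the forward and backward predictors by projection and verifies the residual orthogonality that underlies the decomposition:

```python
import numpy as np

# Forward prediction of x_T from its N delays; backward prediction of
# x_{T-N} from the N more recent delayed columns (prewindowed data).
rng = np.random.default_rng(3)
T, N = 200, 4
x = rng.standard_normal(T + 1)
delay = lambda k: np.concatenate([np.zeros(k), x[:T + 1 - k]])   # column x_{T-k}

X_fwd = np.column_stack([delay(k) for k in range(1, N + 1)])     # spans X_{N,T-1}
X_bwd = np.column_stack([delay(k) for k in range(0, N)])         # spans X_{N,T}
A_bar = -np.linalg.lstsq(X_fwd, delay(0), rcond=None)[0]         # A = [1  A_bar]
B_bar = -np.linalg.lstsq(X_bwd, delay(N), rcond=None)[0]         # B = [B_bar  1]
e = delay(0) + X_fwd @ A_bar          # forward prediction residual e_{N,T}
r = delay(N) + X_bwd @ B_bar          # backward prediction residual r_{N,T}
alpha, beta = e @ e, r @ r            # error energies alpha_N(T), beta_N(T)
print(alpha, beta,
      np.allclose(X_fwd.T @ e, 0),    # e orthogonal to the past columns
      np.allclose(X_bwd.T @ r, 0))    # r orthogonal to the recent columns
```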

The time updates for A_{N,T} and B_{N,T} progress in the same way as for W_{N,T} and lead to equations II-(15)-(17) and II-(18)-(20). We also have e_N^p(T) = A_{N,T−1} X_{N+1}(T), e_N(T) = A_{N,T} X_{N+1}(T) = e_N^p(T) γ_N(T − 1), and r_N^p(T) = B_{N,T−1} X_{N+1}(T), r_N(T) = B_{N,T} X_{N+1}(T) = r_N^p(T) γ_N(T). Note that apart from this possibility for computing r_N^p(T), which we will denote by r_N^{pf}(T) (superscript f denotes filtering), there is another possibility that follows from the last entry in the order downdate of C̃_{N+1,T}; see II-(8). We shall denote this second computation by r_N^{ps}(T) (superscript s denotes computation by manipulation of scalars). The two ways of computing C̃_{N+1,T} follow immediately from the corresponding computations of r_N^p(T). Also γ_N^{−1}(T) can be computed in several different ways, and we shall consider three of them. γ_N^{−1}(T) can be updated by using its interpretation as the inverse of the energy of the residual in estimating u; see (16) and II-(4),(12). But it can also be computed as the output γ_N^{−1f}(T) of the filter C̃_{N,T}, or in an alternative way γ_N^{−1a}(T) (see [16], [39]).

The computationally most efficient 7N form of the FTF algorithm [16], [39] uses r_N^{ps}(T), γ_N^{−1s}(T) and avoids the inner products of the filtering operations. When infinite precision is used, the different ways of computing these quantities will yield identical answers. However, the answers will differ in any practical implementation, and the difference will be purely due to numerical error. Hence, these difference signals can be regarded as output signals of the error propagation system, and one can now start thinking of using them in a feedback mechanism in order to influence the error propagation. However, an important question that arises here is: which feedback structure should one use? Indeed, these difference signals can be fed back to any point in the algorithm without disturbing the true RLS character of the algorithm, since their values would be zero if infinite precision were available. The specific structure we propose can be described as follows: feed the difference signals back into the computation of the quantities they are associated with. Or, in other words, take as the final value for those quantities a convex combination of the two ways of computing them, viz.,

    r_N^p(T) = r_N^{ps}(T) + K (r_N^{pf}(T) − r_N^{ps}(T)) = K r_N^{pf}(T) + (1 − K) r_N^{ps}(T)    (32)

and similarly for C̃_{N+1,T} (see Table II-(10)) and γ_N^{−1}(T) (see Table II-(14),(21), where γ_N^{−1'}(T) denotes an intermediate result). Now, the quantity r_N^p(T) is used in several instances in the algorithm, so if we use different values for the feedback constant K at those different places, then we shall have more freedom in affecting the error system dynamics. The motivation behind this particular choice of feedback structure is that it is intuitively appealing, and it will allow us to stabilize all unstable modes in the error propagation, which is the ultimate goal. The algorithm presented in Table II reflects all the error feedback thus introduced. Note that the original 7N algorithm is recovered by choosing all K_i = 0. Again, we emphasize that the quantities computed by the algorithm would be independent of the K_i if infinite precision were available.

A schematic picture of the state-space representation of the FTF algorithm and its associated multivariate numerical error system in Fig. 1 summarizes some of the issues just considered. The output y(T) of the error system stands for the three-component signal

    y(T) = [ r_N^{pf}(T) − r_N^{ps}(T)    γ_N^{−1f}(T) − γ_N^{−1s}(T)    γ_N^{−1a}(T) − γ_N^{−1s}(T) ].    (33)

The box labeled K is a network of gains K_i that feed the three output signals back to various inputs in the algorithm.

Fig. 1. Block diagram of the state representation of the FTF algorithm and its corresponding numerical error system. (a) Implemented algorithm; (b) roundoff error system.
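The effect of the convex combination (32) on the error dynamics can be demonstrated in isolation. The sketch below uses assumed scalar error dynamics (a growth factor a standing in for the 1/λ-type instability, and roundoff at the 1e-12 level); it is not the FTF error system itself, but it shows how feeding back the redundancy changes the propagation factor from a to (1 − K)a:

```python
import numpy as np

# Route "s" propagates the old error with factor a (|a| > 1, unstable);
# route "f" only adds fresh roundoff. Using q = K*q_f + (1-K)*q_s in the
# recursion turns the error dynamics into (1-K)*a.
rng = np.random.default_rng(4)
a = 1.02                            # unstable open-loop factor (like 1/lambda)
for K in (0.0, 0.1):
    err = 0.0                       # error carried by the recursive route
    for _ in range(2000):
        err_s = a * err + 1e-12 * rng.standard_normal()   # scalar recursion
        err_f = 1e-12 * rng.standard_normal()             # redundant filtering route
        err = K * err_f + (1.0 - K) * err_s               # error feedback (32)
    print(K, abs(err))              # K = 0 diverges; K = 0.1 stays at roundoff
```

The stabilizing gain cannot be chosen arbitrarily: here any K with |(1 − K)a| < 1 works, which is the one-dimensional analogue of the admissible intervals for the K_i derived below.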

The divisions that have to be performed are a "1/x" operation for γ_N^{−1}(T), γ_{N+1}^{−1}(T), γ_N^{−1s}(T), γ_N^{−1'}(T), plus β_N(T) in the single-channel case (only if K₄ ≠ 0), and γ_N^{(*)}(T) (see further, (59)) and det(α_N(T)) (see II-(21) and (60)) in the multichannel case. Of course, one can alternatively view the single-channel case as a special case of the multichannel algorithm. Now, since γ_N^{−1s}(T) − γ_N^{−1'}(T) is presumably very small compared to γ_N^{−1}(T), one can use a first-order Taylor series expansion for computing γ_N(T):

    γ_N(T) = γ_N'(T) [2 − γ_N^{−1s}(T) γ_N'(T)]                          (34)

and similarly for γ_N'(T) and γ_N^{(*)}(T), keeping the number of divisions to two in the multichannel case. Also note that the computation of the last coefficient in II-(3) would lead to a separate expression, but we have separated out this computation and put it as II-(7), since the last coefficient of C̃_{N+1,T} needs a separate treatment (for general K₄).
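Equation (34) is the classical division-free Newton/Taylor refinement of a reciprocal, b ← b(2 − ab); a one-line check (with arbitrary assumed values) shows the quadratic accuracy that justifies using it in place of a division:

```python
# Given a reciprocal gamma' of a nearby quantity, one multiplicative
# correction step yields the reciprocal of ginv without dividing.
ginv = 1.2500001            # gamma^{-1s}(T), the value whose inverse is wanted
gamma_prev = 1.0 / 1.25     # gamma'(T): reciprocal of the nearby intermediate
gamma = gamma_prev * (2.0 - ginv * gamma_prev)
print(gamma - 1.0 / ginv)   # residual ~ ginv*(1/ginv - gamma_prev)^2, tiny
```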

We have described above a time-updating mechanism via order up and downdating that can be depicted as X_{N,T−1} → X_{N+1,T} → X_{N,T}. However, to solve the joint-process problem (where the role of the prediction part is just to provide C̃_{N,T} and γ_N^{−1}(T)), the following four strategies are possible:

    1) X_{N,T−1} → X_{N+1,T} → X_{N,T}
    2) X_{N,T−1} → X_{N−1,T−1} → X_{N,T}
    3) X_{N−1,T−1} → X_{N,T} → X_{N−1,T}
    4) X_{N+1,T−1} → X_{N,T−1} → X_{N+1,T}                               (35)

where 1) is the strategy discussed above. But 1) and 2) are equivalent to 4) and 3), respectively, since a reversal of the order of the update and the downdate is the only difference. Hence 1) and 3) are essentially the two possibilities to be considered. The algorithm for the computation of γ_N^{−1}(T) corresponding to strategy 3) can be obtained very straightforwardly from the one corresponding to 1): just replace N by N − 1 in the prediction part of Table II. So if the purpose of the FTF algorithm is to do joint-process filtering, then this second strategy is preferable, since it leads to a slightly lower computational cost and it is closer to the RLS case in terms of dimensions of data matrices. However, if the FTF algorithm is used for spectral estimation and whitening (i.e., for prediction only, the goal of which is now to compute A_{N,T} and/or B_{N,T}), then one uses of course the order N that is desired, and hence strategy 1).

Consider now the issue of error propagation. It is clear that the joint-process extension depends on the prediction part, and that this latter part runs independently. Therefore, the error propagation dynamics for the joint-process extension are completely identical to those (for W_{N,T}) in the RLS algorithm, and hence the results of the previous section are immediately applicable here also. The prediction problem, on the other hand, needs to be studied separately, and we will now concentrate on this part only. The initialization for the prediction problem is as follows:

= [Io

[0

* * *

0I ] , /3N(-l)

= [O

* * *

O],y,'(-1)

BN,-l = IfN,-,

. . . 01, ( Y N ( - l )

=

XN/.bzp

= pIP = 1.

(36)

This brings us to the issue of defining the state vector for (the prediction part of) the FTF algorithm:

    θ(T) = [ Ā_{N,T}   α_N(T)   B̄_{N,T}   β_N(T)   C̃_{N,T}   γ_N^{−1}(T) ]    (37)

where Ā_{N,T} and B̄_{N,T} are defined by A = [I  Ā] and B = [B̄  I], respectively. The error propagation system can be described by

    Δθ(T) = Δθ(T − 1) F(T)                                               (38)

and F(T) (a (3N + 3) × (3N + 3) matrix) is given in Appendix B. The general expression for F(T) is prohibitive enough to make impossible any inference about its dynamics. Therefore we shall examine the following aspects. Each time we shall first consider the case K₆ = 0, and then examine what happens for the case K₆ = 1 (this will cover all cases of interest).

A. Limiting Case: λ → 1

In this case, an exact analysis is possible. We get asymptotically

    F(T) =
        [ I  *  0  *  0                                0      ]
        [ 0  1  0  *  0                                0      ]
        [ 0  0  I  *  0                                0      ]          (39)
        [ 0  0  0  1  0                                0      ]
        [ *  *  *  *  S^H − (1 − K₄) u_N B̄_{N,T−1}     *      ]
        [ *  *  *  *  0                                1 − K₃ ]

where u_N is the Nth unit vector and S is an (N × N) shift matrix (ones on the first subdiagonal). In what follows, we will concentrate on the block entries of F(T) and consider F(T) as a (6 × 6) block matrix. F_{ij}(T) will denote block entry (i, j). However, momentarily consider F(T) as a 2 × 2 block matrix. Then it is obvious that products of consecutive F(T) will keep the same structure as F(T), namely the upper left 4 × 4 block and the lower right 2 × 2 block will remain upper triangular, and the upper right 4 × 2 block will remain 0. So the diagonal elements of the product of the F(T) are the products of the diagonal elements. Now, the diagonal elements are the eigenvalues. Also, since λ = 1, B_{N,T} will converge to the filter B_N = E B_{N,T}. For the first four components of Δθ(T), we get a random walk phenomenon. Hence the algorithm is not stable for λ = 1! However, since the instability is not exponential, it might take very long, depending on the wordlength used, before errors become noticeable.

For ΔC̃_{N,T}, taking K₄ = 0, the eigenvalues of the companion matrix F₅₅ are the zeros of the backward prediction filter B_N(z). It was shown in [40] that the backward filter in the prewindowing method has its roots inside the unit circle. Hence, for prewindowing with exponential weighting, the roots of the backward predictor are in a circle with radius 1/λ. For λ < 1, this means that the backward prediction filter may not be minimum phase. However, B_{N,T} is an unbiased estimate of B_N, and hence, on the average, the roots of B_{N,T} will be inside the unit circle. Nevertheless, one may want to choose K₄ ≠ 0 in order to influence the eigenvalues of F₅₅. It is obvious that the optimal values are K₄ = K₃ = 1 (for λ = 1!). This makes the eigenvalues associated with ΔC̃_{N,T} and Δγ_N^{−1}(T) zero. For the case K₆ = 1, we just have the 5 × 5 principal submatrix of F(T) in (39) as system matrix.

B. Averaging Analysis

We consider an averaging analysis, assuming the input signal to be some stationary random process. We have seen that averaging can be applied to slow modes (in this context, slow modes have eigenvalues that are (1 − λ)-close to identity). From the previous analysis, it is clear that F(T) in general is slow for only part of the system. Hence, application of the averaging technique is not immediate. The details of this analysis are worked out in Appendix C. The averaging yields that under certain conditions the error process Δθ(·) converges weakly (as 1 − λ = ε → 0 and for T large) to the state of a system whose eigenvalues are given by the eigenvalues of the average diagonal entries F̄₁₁ · · · F̄₆₆ of F(T), which are, up to first order in (1 − λ),

    λ,   λ,   (1/λ)[1 − (1 − λ)K₁],   (1/λ)[1 − (1 − λ)K₂],   the eigenvalues of S^H − (1 − K₄) u_N B̄_N,   1 − K₃.    (40)

After an initial transient period, the slow modes are dominating. The above eigenvalues show that the original algorithm (all K_i = 0) is exponentially unstable, due to the 1/λ entries for ΔB̄_{N,T} and Δβ_N(T). It indicates that the algorithm can be stabilized by proper choice of the feedback parameters K_i for λ in a certain interval λ ∈ [1 − ε₀, 1]. The interval in which the above analytical results are valid is quite small, among other reasons because we had to assume that √(1 − λ) is small w.r.t. 1. Also, for any finite 1 − λ, the above expressions become worse approximations for the error system modes as the correlation sequence of the input signal becomes more slowly decaying. Note that these expressions are only correct if the slow modes are truly slow. One might be tempted to make certain eigenvalues zero by choosing values for K₁, K₂ that are O(1/(1 − λ)). However, this range of values for the feedback parameters makes the corresponding modes fast, so that the above expressions would not hold. Hence, the above expressions do not indicate that one can "zero out" certain eigenvalues by appropriate feedback.

Notice that it is exactly the unstable modes that can be modified by the error feedback considered here; the stable modes (in F₁₁, F₂₂) are unaffected (up to first order in (1 − λ)). From a standard state-space description of linear systems point of view, consider a decomposition of the state into a controllable and an uncontrollable, and an observable and an unobservable part. Since output feedback is able to stabilize the unstable modes, this means that the unstable modes of the open loop system are both controllable and observable in the feedback structure that we have chosen. Notice also that K₅ has disappeared in the above expressions because it is a higher order effect (O((1 − λ)²)). The last eigenvalue is given by 1 − K₃, up to first order in (1 − λ). The influence of K₃ on the other eigenvalues is also of higher order. Furthermore, the sixth (block-) state disappears for K₃ = 1. So for K₃ = 1, the last eigenvalue equals 1 − K₃ exactly for all λ. Therefore, 1 − K₃ is probably a good approximation for the last eigenvalue over a larger range of λ (see an experiment described further in connection with this). For K₆ = 1, the sixth (block-) state also disappears. What happens to the rest of the system is a mild coupling between the error components Δα and Δβ, which pushes their modes towards each other, starting from the values given by their diagonal entries.

An exact analysis for all λ ∈ (0, 1] can be done for simple deterministic cases. We consider two examples with N = 1. While the averaging analysis gives results that indicate how one might choose feedback coefficients that will work well in an average situation, the analysis of the deterministic examples will show how one can do (significantly) better if one has complete knowledge of the particular signal involved. The analysis will also indicate, though, that exploiting such specific knowledge makes the algorithm more vulnerable to errors in the assumption on the signal. Indeed, the optimal feedback coefficients of one example might not work at all when applied to another example. However, choices of feedback coefficients based on the averaging analysis will perform reasonably well in (almost) all cases. The deterministic examples will also give an idea of how small the largest eigenvalue can be made and how the sensitivity of the eigenvalues with respect to the feedback coefficients can increase for decreasing λ.

C. Deterministic Example 1

Let x(T) = 1 (dc signal). This is the extreme case of a perfectly predictable signal. One can easily find explicit solutions for the various algorithmic quantities and substitute these expressions in the error system matrix F(T) as given in Appendix B. We will assume λ < 1. Consider a scaling of the state vector in which ΔC̃_{N,T}, Δγ_N^{−1}(T) are replaced by λ^T ΔC̃_{N,T}, λ^T Δγ_N^{−1}(T); then we get asymptotically a rescaled system matrix F(T) that can be analyzed exactly. Let v_i denote the set of eigenvalues "associated" with F_{ii}. We see that v₁ = v₂ = v₄ = λ. For K₃ = 0, we have v₆ = 1. However, if we choose K₃ = 1, then v₆ = 0, and the following quadratic equation remains for v₃, v₅:

    v² − [2 − (1 − λ)K₁ − (2 − λ)K₄] v + 1 − (1 − λ)K₁ − (1 − λ + λ²)K₄ = 0.    (43)

We can make v₃ = v₅ = 0 with the following choices for K₁, K₄:

    K₁ = λ(1 − 2λ) / ((1 − λ²)(1 − λ)),    K₄ = 1 / (1 − λ²).            (44)

Notice that for λ close to 1 we get a large negative and positive value for K₁ and K₄, respectively. With the suboptimal choice K₄ = 0 (see further), we have v₃ = 1 − (1 − λ)K₁, v₅ = 1. Any positive value of K₁ (up to 2/(1 − λ)) will stabilize v₃. Notice that v₅ = 1, which means that the zero of the backward prediction polynomial lies on the unit circle, since we have a perfectly predictable signal. Also notice that the optimal choice of K₁ depends heavily on the choice for K₄. For K₆ = 1, there is no change in the eigenvalues, except v₆ = 0 of course.
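The role of the companion matrix in the limiting-case analysis above can be verified directly: for K₄ = 0, the eigenvalues of S^H − u_N B̄ are the zeros of the backward prediction polynomial. The sketch below uses a hypothetical B̄; the polynomial coefficient ordering is our assumption about the convention for B_N(z), chosen so that the correspondence is explicit:

```python
import numpy as np

# Companion-matrix check for the K4 = 0 case of block F_55 in (39).
N = 3
B_bar = np.array([0.2, -0.3, 0.4])        # hypothetical backward filter part
S = np.diag(np.ones(N - 1), -1)           # shift: ones on first subdiagonal
uN = np.zeros(N); uN[-1] = 1.0            # Nth unit vector
M = S.T - np.outer(uN, B_bar)             # S^H - u_N * B_bar
eig = np.sort_complex(np.linalg.eigvals(M))
# char. polynomial of this companion form: z^3 + b3*z^2 + b2*z + b1
roots = np.sort_complex(np.roots([1.0, B_bar[2], B_bar[1], B_bar[0]]))
print(np.allclose(eig, roots))            # True
```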


D. Deterministic Example 2

Let x(T) = 0.5 + 0.5(−1)^T. This is the extreme case of a perfectly unpredictable signal (at least with a first-order predictor). Indeed, we have Ā_{N,T} = B̄_{N,T} = 0 for all T. Furthermore, we have asymptotically

    T even:  e_N(T) = e_N^p(T) = 1,   α_N(T) = 1/(1 − λ²),   r_N(T) = r_N^p(T) = 0,   β_N(T) = λ/(1 − λ²)
    T odd:   e_N(T) = e_N^p(T) = 0,   α_N(T) = λ/(1 − λ²),   r_N(T) = r_N^p(T) = 1,   β_N(T) = 1/(1 − λ²),
             γ_N(T) = 1,   C̃_{N,T} = 0,   C̃_{N+1,T} = · · ·              (45)

Since the system is periodic with period two, we investigate F(T − 1)F(T). There are two decoupled sets of modes, namely {v₁, v₃} and {v₂, v₄, v₅, v₆}. Alternatively, one could consider the product F(T)F(T + 1), from which one can infer that the sets {v₁, v₃, v₅} and {v₂, v₄, v₆} are decoupled and that v₅ = 0. To investigate v₂, v₄, v₆ further, one may observe that K₅ is only of secondary importance (an effect of O((1 − λ)²)). Making the choice K₅ = 1 (or equivalently neglecting terms of O((1 − λ)²)), one can find v₂ = λ, v₄ = λ² + 2(1 − K₂)(1 − λ²), and v₆ = 1 − K₃. K₂ and K₃ allow for arbitrary assignment of v₄ and v₆, respectively.

Considering now the expression for F(T − 1)F(T), we find that a quadratic equation remains for v₁, v₃. However, the roots of this quadratic equation turn out to have quite simple expressions, and one can show that v₁ = λ and v₃ = (1/λ²)[1 − K₁(1 − λ)(1 + λ)²]. To illustrate some sensitivity issues, consider the problem of minimizing |v_max| ≜ max{|v₁|, |v₃|}. The minimum |v_max| = λ = |v₁| will be achieved if v₃ ∈ [−λ², λ²], or hence for

    K₁^opt ∈ [K₁^opt_min, K₁^opt_max] = [ (1 + λ²)/(1 + λ) ,  (1 + λ⁴)/((1 − λ)(1 + λ)²) ].    (47)

This optimal interval for K₁ as a function of (1 − λ) is indicated in Fig. 2 by dotted lines. Also shown there is the graph of |v_max| for two fixed values of K₁ (as we have found to hold more generally, the choice of a feedback gain closer to 1 works better for low filter orders). Note how the choice of a fixed K₁ ≥ 1 leads to an interval for (1 − λ) of length shorter than 1 in which stability is achieved. In Fig. 3, the sensitivity of the choice of a fixed feedback gain K₁ for a given λ is illustrated. For small (1 − λ), a wide range of values for K₁ will allow optimality. And even though optimality can be achieved for arbitrarily small λ, the choice of the feedback gain becomes extremely critical for small values of λ. For K₆ = 1, the modifications are similar to those occurring in the averaging analysis.

Fig. 2. Eigenvalue v_max for deterministic example 2: |v_max| = λ for K₁ = K₁^opt, where K₁^opt takes on any value between K₁^opt_min and K₁^opt_max, which are shown by the dashed lines. The two other solid lines represent |v_max| for fixed K₁ = 1.5 and K₁ = 1.0, respectively.

Fig. 3. Eigenvalue v_max for deterministic example 2, as a function of the feedback parameter K₁ and parameterized by λ. |v_max| = λ over a certain interval for K₁ that shrinks in length as the value of λ decreases.

V. PUTTING THE STABILIZED FILTER IN PERSPECTIVE

The influence of the different feedback parameters can be summarized as follows. K₅ is only of secondary importance. K₄ is also not necessary for stabilization (except in extreme cases), but only adds additional freedom in the shaping of the error system dynamics. K₃ = K₆ = 0 leads to an 8N algorithm, and to a random walk for Δγ_N^{−1}(T). This has been confirmed by simulations, which show that with K₃ = K₆ = 0 the error variance increases perfectly linearly with time, over millions of samples.

102

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 39, NO. I , JANUARY 1991

with computational complexity less than 8N (6N for the prediction problem). Imagine that one tried to solve the stability problem of FTF by constructing a leaky FTF algorithm, in analogy with the leaky LMS algorithm [7]. Such an algorithm would introduce leakage factors p in e.g., the update of B N , T as B N , T = p B N ,T - I . . * . It is clear from our analysis that one would need p < X to numerically stabilize the algorithm. However, such a low leakage factor would severely degrade the filtering performance by introducing a significant bias in the filters. In [16] and [20], it has been correctly pointed out that some of the numerical instability problems of the FTF algorithm are associated with the backward prediction error energy ON( T ) . In both references, it has been suggested that a "more stable" algorithm can be found by computing r;(N) as r$( T ) instead of rpNs ( T ) and that this modification will eliminate the need for propagating ON( T ) (these modifications also hold for the original fast Kalman algorithm [l], and the following comments also apply to this algorithm). From our analysis, we can see that this will lead to an 8N algorithm with ( K , , K 2 , K 3 , K4, K,, K 6 ) = ( 1 , 1 , 0, 0, 1, 0). So v4 is well behaved. However, v 3 = 1 for X very close to 1 , and is actually exponentially unstable for smaller values of A. Nevertheless, this variant leads to a situation where the instability is much closer to a random walk phenomenon than to the usual exponential instability. One can similarly analyze all other existing fast Kalman type algorithms and one will find that they are exponentially unstable (at least those algorithms that do not have a stabilization mechanism that is similar to the one proposed in this paper). For instance, to mention one last such variation, consider the 8N FTF algorithm that employs the unnormalized Kalman gain CN, [ 161. This algorithm is also exponentially unstable, similarly to the all K, = 0 case of the FTF algorithm discussed in this paper. Apart from the above modification, which takes care of the instability associated with PN ( T ) , the solution proposed in [20][22] also introduces (without analysis) the following mechanism to stabilize the instability associated with B N , p Since r$( T ) and r$$( T ) yield different computed values, one would like to modify the backward prediction filter so as to reduce this difference. In [20]-[22], a modification of B N , T - , is considered, which is a modification at the beginning of the update (from T - 1 to T ) . Equivalently, one can consider a modification a_' the end of the update, modifying B N . into some filter B = [ B I ] (this alternative would save some computation). So, one would hope that the output of the modified backward prediction filter, r f , ( T ) B X N + , ( T ) ,can be closer to the

+

0.4

-

0.2 -

1-X Fig. 2 . Eigenvalue urnax for deterministic example 2: 1 urnaxI = X for K, = KYm, where KyP'takes on any value between K:!,,," and KYPL,,, which are shown by the dashed lines. The two other solid lines represent I U,,, I for fixed K , = 1.5 and K , = 1.0, respectively

I vmaz I

0.4

-

0.2 0

"

"

"

"

'

Fig. 3 . Eigenvalue umax for deterministic example 2 , as a function of the feedback parameter K , and parameterized by A. 1 U,,, 1 = X over a certain interval for K , , that shrinks in length as the value of X decreases.

samples. Eventual blow-up may take a very long time, but will occur eventually. This also shows that, at least for V 6 , the diagonal element F66 of F dominates in making an inference about v6, even for small values of A. For the case K3 = 1 , K6 = 0 or the case K6 = 1 , the state component y i ' ( 7') is computed as a function of some other state components, and hence is no longer an actual part of the state, or equivalently, v6 = 0. Both yf,( T ) and y(: T ) offer valid ways of avoiding the random walk associated with yh( T ) . However, since the computation of y$( T ) does not involve an inner product, it is to be preferred. So this corresponds to taking K6 = 1 , which leads to an 8N algorithm. Finally, K , = K 2 = 0 leads to the original 7N algorithm with exponential divergence. Note that both forms of the 7N algorithm, the original FTF algorithm [16] and the FAEST algorithm [39], have exactly the same linearized error system and therefore the same numerical instability problems. Slight differences that are observed in practice for the actual time of "blow-up" are due to differences in nonlinearities, which become an issue when the errors reach the order of magnitude of the computed quantities. The actual algorithmic difference between those two 7N algorithms is essentially minor and concerns a difference in organization of the computations for the update of yN( T ) . In summary, from the above considerations it appears that no numerically stable FTF algorithm can exist

backward prediction error r_N(T) = γ_N(T) r_N^{ps}(T). Indeed, one can obviously choose B̄ so as to make these two values equal. On the other hand, however, B̄ should not deviate too much from its original value B_{N,T}. Therefore, consider as the optimal choice for B̄ the filter that minimizes the following composite criterion:

$\min_{\bar B}\;\Big[(\bar B - B_{N,T})\,R_{N,T}\,(\bar B - B_{N,T})^H \;+\; \rho\,\big(\bar r_N(T) - r_N(T)\big)\big(\bar r_N(T) - r_N(T)\big)^H\Big].$   (48)

Here ρ > 0 is a weighting factor that can vary the balance between the closeness of B̄ to B_{N,T} and the closeness of r̄_N(T) to r_N(T). The solution of this quadratic optimization problem can be found in closed form, with the associated gain ζ = 1/(1 - γ_N(T) ρ^{-1}) > 0. One now takes this solution B̄ as the final estimate for B_{N,T} in the current updating session.
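To make the structure of this solution explicit, here is a minimal derivation sketch in generic notation (b stands for B̄, b_0 for B_{N,T}, X for the data vector, and r for the target value; these names, and the reduction to generic regularized least squares, are our simplifications, and the identification of the resulting scalar gain with ζ above relies on FTF identities relating X^H R^{-1} X to γ_N(T) that we do not rederive here):

$\min_b\ (b-b_0)R(b-b_0)^H+\rho\,(bX-r)(bX-r)^H \;\Rightarrow\; (b-b_0)R+\rho\,(bX-r)X^H=0,$

$b=b_0+\rho\,(r-b_0X)\,X^H\big(R+\rho XX^H\big)^{-1} = b_0+(r-b_0X)\,\frac{\rho\,X^H R^{-1}}{1+\rho\,X^H R^{-1}X}\quad\text{(Sherman--Morrison)}.$

The corrected filter output bX is thus a weighted combination of b_0 X and r, and the combination weight is precisely the scalar feedback gain that the criterion (48) implicitly selects.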

Since we know that diagonal entries are dominant in the first-order error analysis, consider the error propagation from ΔB_{N,T-1} to ΔB̄. Applying the averaging technique, one arrives at an eigenvalue equal to ν_3 = (1/λ)[1 - ζ(1 - λ)]. Hence, the parameter ζ (or ρ) is related to our feedback coefficient K_1. Now, the interpretation of the criterion in (48) suggests that any positive value of ρ will do, and one might even think that this value should be as large as possible in order to stabilize well. However, the framework of this paper shows that the underlying concept is that of a feedback mechanism, and one can clearly only choose the feedback gain within a certain interval in order to stabilize the system. Also, the structure of the solution presented in [20]-[22] is not sufficient, since it fails to address the instability problem associated with γ_N(T). More recently, yet another solution was presented in [24] that can be seen to correspond approximately to the case K_3 = K_6 = 0 in our stabilized algorithm; it also leaves out the issue of γ_N(T), and simulations have confirmed the existence of a random walk associated with Δγ_N(T) in their algorithm. The failure in [24], [41] to recognize the numerical problems associated with the computation of γ_N(T) reveals the inadequacy of their analysis, which is based either on the wrong assumption that off-diagonal terms of O(ε) in F(T) can be neglected in order to conclude that the diagonal terms of the form I + O(ε) give the eigenvalues of F(T), or on the unacceptable assumption that part of the state Δθ(T) is error free. It should also be noted that, with the modification introduced in [42], another 8N FTF algorithm appears to be obtained in which the propagation of γ_N(T) can be stabilized.

The redundancies we have discussed so far are not the only possible redundancies that can be exploited while still keeping the algorithm complexity at O(N). Actually, by increasing the computation count by 3N, it is possible to eliminate the quantities α_N(T) and β_N(T) from the state vector. Indeed, these quantities can alternatively be computed as the outputs of the forward and backward prediction filters with the sample correlation lags p_k(T) = Σ_{i=0}^{T} λ^{T-i} x(i) x^H(i - k) as input [16]. However, the numerical problems that are possibly associated with these two quantities have been adequately handled in the stabilized algorithm described above, so it is not necessary to take this alternative route for numerical purposes. The main numerical issues are associated with B_{N,T} and γ_N(T). Other redundancies can be obtained by running alternative strategies in (35) in parallel. See also [43, Table III] for a few more possibilities.

Although we have found a stabilized FTF algorithm with complexity 8N (K_6 = 1), it is interesting to note that one can obtain an 8N approximation of the 9N stabilized algorithm with K_3 = 1, K_6 = 0. Since the dynamics of Δγ_N^{-1}(T) are well separated from the rest of the system, and the eigenvalue ν_6 equals 1 for the unstabilized FTF algorithm (a very mild instability), it is in practice not necessary to correct γ_N^{-1}(T) at every sample instant. Here is a time-varying feedback scheme that does a correction only once every T_0 samples:

$\gamma_N^{i}(T) = \gamma_N^{s}(T) + \tfrac{1}{4}\big[\gamma_N^{f}(T - T_0) - \gamma_N^{s}(T - T_0)\big]$ if $T/T_0$ = integer; $\gamma_N^{i}(T) = \gamma_N^{s}(T)$ otherwise.   (52)

So one needs to evaluate γ_N^{f}(T) only once every T_0 samples, spreading the cost of N operations evenly over the T_0 samples; e.g., the choice T_0 = N leads to just a constant cost per sample for the stabilization of γ_N(T). To analyze the resulting error dynamics, let us simply denote the update of γ_N^{s}(T) over T_0 samples in the original FTF algorithm as γ_N^{s}(T) = γ_N^{s}(T - T_0) + ψ(T). Then from (52), we find

$\gamma_N^{s}(T) = \gamma_N^{s}(T - T_0) + \tfrac{1}{4}\big[\gamma_N^{f}(T - 2T_0) - \gamma_N^{s}(T - 2T_0)\big] + \psi(T)$ for integer $T/T_0$,   (53)

with characteristic equation $z^{2T_0} - z^{T_0} + \tfrac{1}{4} = 0$. This means a double pole for the errors on γ_N^{s}(T) and γ_N^{-1}(T) at the stable point $z = (\tfrac{1}{2})^{1/T_0} \approx 1 - (1/T_0)\ln 2$ (for large T_0), neglecting terms of O(1 - λ). Note that the coefficient 1/4 in (52) is optimized for stability. This 8N approximation works fine with T_0 = N for a variety of examples that we have tested.
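To make the bookkeeping of (52) concrete, here is a minimal Fortran 77 fragment in the style of the Appendix D listing (our illustration, not part of the published code); the names T, T0, gams, gamf, gamsold, and gamfold are hypothetical, and the N-operation inner product behind gamf is assumed to be accumulated one term per sample between correction instants:

C     sketch of the once-every-T0 correction of (52); gamsold and
C     gamfold hold the values of gamma_N^s and gamma_N^f saved at the
C     previous correction instant T-T0, and 0.25 is the
C     stability-optimized coefficient of (52)
      if (mod(T, T0) .eq. 0) then
         gtmp = gams
         gams = gams + 0.25*(gamfold - gamsold)
         gamsold = gtmp
         gamfold = gamf
      end if

With T0 = N, this spreads the stabilization cost evenly, as described above.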

VI. FINDING THE OPTIMAL FEEDBACK GAINS K_i

The averaging analysis considered above has given us analytical results that hold for values of λ in an interval [1 - δ, 1] for some small δ > 0. For λ ∈ [1 - δ, 1], the moduli of the stabilized error system eigenvalues decrease as λ decreases. Hence, by continuity, one will also be able to stabilize the error system for λ in a certain interval below 1 - δ. However, we do not have analytical tools to investigate the eigenvalues for values of λ in this extended range. The deterministic examples we have analyzed have yielded analytical results for λ ∈ (0, 1], but these results apply only to the specific signals considered there. So the issue becomes how to determine good values for the feedback gains that will be generally applicable, with λ ranging over all values for which the error system can be stabilized. In general, the optimal K_i can be a function of λ, N, and the signal characteristics.

The optimal choice for K_4 appears to be a tricky matter. From F_33 = S^H - (1 - K_4) u_N B_N and some of the analysis we have considered before, one might think that K_4 = 1 would be the optimal choice. This option would reduce F_33 to S^H, which has only zero eigenvalues. However, for λ outside the small interval [1 - δ, 1], K_4 = 1 leads to a much poorer performance than K_4 = 0, or even to instability, for any signal for which A_N^N ≠ 0 (hence, any colored signal). Also, any value of K_4 that is significantly different from zero appears to make the optimal values for the other K_i depend heavily on the signal characteristics. Note that the maximum eigenvalue of F_33 is a very nonconvex function of K_4. The dependence of the optimal value of, for instance, K_1 on the signal characteristics occurs through the K_4 term in F_33. It is actually possible to devise a 9N algorithm in which the update equation for C̃_{N,T} is computed twice: once as is, to compute the actual updated C̃_{N,T}, and a second time with K_4 = 0, to be used solely for the


update of B_{N,T}. The matrix F corresponding to this modified algorithm can be found from the original matrix F by putting K_4 = 0 everywhere in its third column. Nevertheless, it was found that this modification made little difference for the issues considered above, namely that K_4 = 1 does not work and that K_4 > 0 makes the optimal K_1 strongly signal dependent. This fact must be explained in terms of the interaction with the off-diagonal elements in F, which is nonnegligible for the values of λ considered here. A very small positive value for K_4 appears to work best. Furthermore, for K_4 = 0, the optimal values for K_1, K_2, and K_3 are virtually signal independent. Also, for such small K_4, the relevant eigenvalues of F appear to be convex in K_1, K_2, and K_3. However, since even the optimal small value for K_4 is quite signal dependent, it was found that K_4 = 0 yields the safest solution.

A computer program has been written to search by simulation for the optimal K_i for a white noise signal. The optimality criterion used is given by (54),

where T_1 is of the order of 10^5, depending on N. K_4 = 0, K_5 = 1, and K_6 = 0 were kept fixed, and the search was done over the remaining three-dimensional space. Searches were performed for N = 2^i, i = 0, 1, ..., 7, and λ = 1 - (j/10N), j = 0, 1, ..., 10. The results indicate that stability can be achieved for all N and for λ ∈ [1 - 1/(2N), 1). This is a useful range for λ for tracking purposes in all practical applications. This lower bound on λ appears to coincide with the bound (4N + 5)/(4N + 7) derived in [24] on the basis of a specific example. If one were to optimize the numerical behavior with respect to λ, then λ = 1 - (0.4/N) is about the optimal choice.

We should note here that there are ways around the lower bound on λ mentioned above. To understand this, one should realize that the essence of the FTF algorithm is a fast mechanism to compute the Kalman gain. The details of the computation of the Kalman gain in the prediction part are very critical. However, what one does with the Kalman gain thus obtained in the joint-process part does not necessarily have to follow what the RLS algorithm dictates in order to obtain a good adaptive algorithm (see, e.g., also the improved rescue procedures in [35]). Hence, to modify the tracking capabilities of the FTF algorithm, we could very well replace Table II-(24) by

(55), where ζ(T) is an arbitrary gain factor (a similar mechanism was proposed independently in [42]). One can use averaging analysis to investigate the tracking behavior of adaptive algorithms [35] in a fashion very similar to the error propagation analysis. The tracking characteristics for the modified FTF algorithm are determined by the eigenmodes of the corresponding system matrix (assuming constant ζ), in which the first approximation holds for (1 - λ) ≪ 1/2. In summary, the eigenvalues are still roughly given by the diagonal elements (since we are mainly interested in the stability issue, and not in the details of the dynamics).
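As an illustration only, here is a minimal sketch, again in the style of the Appendix D listing, of what a ζ-modified joint-process update in the spirit of (55) could look like; the name zeta and its exact placement against the a posteriori error are our assumptions, since (55) itself is not reproduced here:

C     hypothetical zeta-modified joint-process update (a sketch of
C     the idea behind (55)); zeta = 1. recovers the pure RLS update
C     of W used in the Appendix D listing
      eps = epsp*gamma
      do 24 i = 0, N-1
   24    W(i) = W(i) + zeta*eps*C(i)

A constant zeta < 1 slows the adaptation of W without touching the critical prediction part, which is why such a modification can relax the lower bound on λ.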

3. Stabilization with K_6 = 1

The analysis above was for the case K_6 = 0. As indicated in Appendix B, for the case K_6 = 1 we only have a 5 × 5 system to consider (as in the case K_6 = 0, K_3 = 1 too, actually). The interaction with the remaining subsystem is as before, and we mainly have to focus on what happens to the 4 × 4 principal subsystem. Consideration of E F(T) as in (C.5) reveals that the modifications for columns one and three are of higher than first order in ε and hence can be neglected in our analysis here. So the analysis of the new system reduces to the analysis of a coupled system for Δα, Δβ. The 2 × 2 system matrix has a rank-one modification w.r.t. the case K_6 = 0, and when we apply averaging, we obtain a quadratic equation for the eigenvalues ν of the form

$(\lambda - \nu)\big(\lambda + 2(1 - K_2)(1 - \lambda) - \nu\big) = \text{(coupling term)},$

where the last term represents the coupling introduced, and A_N^N (the Nth-order reflection coefficient) is the last coefficient of A_N = E A_{N,T}. In terms of the signal characteristics, there is no coupling for |A_N^N| = 1 (perfectly predictable signal), and the coupling is maximal for A_N^N = 0 (white noise). In terms of the feedback coefficient K_2, there is no coupling for K_2 = 1. A bit of analysis shows that for K_2 > 1 we have |ν| ∈ [λ - 2(K_2 - 1)(1 - λ), λ]. So the coupling does not significantly alter the stable error propagation dynamics.
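For a quick sanity check of the quoted interval, note that when the coupling term vanishes (K_2 = 1 or |A_N^N| = 1) the quadratic factors directly:

$\nu_1 = \lambda, \qquad \nu_2 = \lambda + 2(1 - K_2)(1 - \lambda) = \lambda - 2(K_2 - 1)(1 - \lambda),$

which are exactly the endpoints of the interval [λ - 2(K_2 - 1)(1 - λ), λ] stated above for K_2 > 1; the coupling can then only move the eigenvalues within this range.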

APPENDIX D
FORTRAN 77 PROGRAM LISTING

      subroutine sftf(d, x, lambda, mu, Ki, N, W, eps, begin,
     &                rescue, allowrescue)
      parameter (Nmax = 100)
      integer N
      real A(0:Nmax), B(0:Nmax), C(0:Nmax-1)
      real CN1(0:Nmax), W(0:Nmax-1), epsp, eps, y1
      real X(0:Nmax), XX(0:Nmax), Ki(5)
      real d, x, ep, e, rps, rpf, rp1, rp2, rp5, r1, r2
      real alpha1, beta, gammas, gamma1s, gamma1N1
      real gamma, gamma1, CN1N, CN1Nf
      real lambda, lambda1, lambdaN, mu
      logical begin, allowrescue, rescue
      save
C
C     Stabilized FTF algorithm, for prewindowing with exponential
C     weighting.  An order N estimation error eps of d is computed
C     efficiently from the current and previous N-1 values of x via
C     the filter W.
C
C     Variable comments:
C     d           - current desired response sample
C     x           - current input sample
C     X           - array of current and last N input samples
C     XX          - as X, but not reinitialized when rescued
C     A           - forward prediction filter
C     B           - backward prediction filter
C     C           - Kalman gain (overnormalized)
C     CN1         - C of order N+1
C     W           - desired impulse response
C     e           - a posteriori forward prediction error
C     ep          - a priori forward prediction error
C     r           - a posteriori backward prediction error
C     rps         - a priori backward prediction error
C     rpf         - = B X (= rps)
C     rpi         - = rps + Ki(i)*(rpf-rps), i = 1,2,5
C     eps         - a posteriori output error
C     epsp        - a priori output error
C     alpha1      - inverse of LS sum of forward prediction errors
C     beta        - LS sum of backward prediction errors
C     gammas      - cosine of oblique angle
C     gamma1s     - = 1/gammas
C     gamma       - = lambdaN*beta*alpha1 (= gammas)
C     gamma1      - = 1/gamma
C     gamma1N1    - gamma1 of order N+1
C     begin       - flagged if the subroutine call is the first
C     rescue      - flagged if a rescue is desirable
C     allowrescue - a flag indicating whether a rescue can be
C                   executed when rescue is flagged
C     Ki          - vector of convex combination coefficients for
C                   the stabilization of the FTF algorithm;
C                   suggested: Ki(1,2,3,4,5,6) = (1.5,2.5,*,0.,1.,1.)
C     lambda      - exponential weighting factor
C     lambda1     - = 1/lambda
C     lambdaN     - = lambda**N
C     mu          - initialization constant
C
C     initialization call
C
      if (begin) then
         begin = .false.
         lambda1 = 1./lambda
         lambdaN = lambda**N
         do 2 i = 1, N
            A(i) = 0.0
            B(i-1) = 0.0
            C(i-1) = 0.0
            W(i-1) = 0.0
            XX(i-1) = 0.0
    2       X(i-1) = 0.0
         A(0) = 1.0
         B(N) = 1.0
         XX(N) = 0.0
         X(N) = 0.0
         alpha1 = 1./(mu*lambdaN)
         beta = mu
         gamma1 = 1.
         gamma = 1.
      end if
      do 4 i = N, 1, -1
    4    X(i) = X(i-1)
      X(0) = x
C
C     order update
C
      ep = x
      do 6 i = 1, N
    6    ep = ep + A(i)*X(i)
      CN1(0) = -ep*alpha1*lambda1
      do 8 i = 1, N
    8    CN1(i) = C(i-1) + CN1(0)*A(i)
      gamma1N1 = gamma1 - CN1(0)*ep
C
C     forward filter
C
      e = ep*gamma
      do 10 i = 1, N
   10    A(i) = A(i) + e*C(i-1)
      alpha1 = alpha1*lambda1 - CN1(0)*CN1(0)/gamma1N1
C
C     order downdate
C
      rps = -lambda*beta*CN1(N)
      rpf = 0.
      do 12 i = 0, N
   12    rpf = rpf + B(i)*X(i)
      y1 = rpf - rps
      rp1 = rps + Ki(1)*y1
      rp2 = rps + Ki(2)*y1
      rp5 = rps + Ki(5)*y1
      CN1Nf = -rpf/(lambda*beta)
      CN1N = CN1(N) + Ki(4)*(CN1Nf - CN1(N))
      do 14 i = 0, N-1
   14    C(i) = CN1(i) - CN1N*B(i)
      gamma1s = gamma1N1 + CN1(N)*rp5
      gammas = 1./gamma1s
C
C     backward filter
C
      r1 = rp1*gammas
      r2 = rp2*gammas
      do 18 i = 0, N-1
   18    B(i) = B(i) + r1*C(i)
      beta = lambda*beta + r2*rp2
      gamma = lambdaN*beta*alpha1
      gamma1 = 1./gamma
C
C     joint-process extension
C
      do 20 i = N, 1, -1
   20    XX(i) = XX(i-1)
      XX(0) = x
      epsp = d
      do 22 i = 0, N-1
   22    epsp = epsp + W(i)*XX(i)
      eps = epsp*gamma
      do 24 i = 0, N-1
   24    W(i) = W(i) + eps*C(i)
C
C     rescuing
C
      if (gamma .gt. 1.) then
         rescue = .true.
         if (allowrescue) then
            beta = 1./alpha1
            gamma1 = 1.
            gamma = 1.
            do 26 i = 0, N-1
               X(i) = 0.
               A(i+1) = 0.
               B(i) = 0.
   26          C(i) = 0.
         end if
      else
         rescue = .false.
      end if
      return
      end
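To show how the subroutine is intended to be called, here is a small example driver (our addition, not part of the published listing). The hypothetical FIR channel h, the crude deterministic noise-like input generator, the value of mu, and the run length are all assumptions made purely for illustration:

C     example driver: identify a hypothetical length-N FIR channel
C     h from a noise-like input; W is dimensioned to the Nmax used
C     inside sftf
      program tstftf
      parameter (Nmax = 100, N = 8)
      real W(0:Nmax-1), Ki(5), x, d, eps, lambda, mu
      real h(0:N-1), xbuf(0:N-1)
      logical begin, rescue, allowrescue
      integer T, i
      data h /1., -.5, .25, 5*0./
      do 1 i = 0, N-1
    1    xbuf(i) = 0.
      Ki(1) = 1.5
      Ki(2) = 2.5
      Ki(3) = 0.
      Ki(4) = 0.
      Ki(5) = 1.
      lambda = 1. - 1./(2.*N)
      mu = 1.
      begin = .true.
      allowrescue = .true.
      do 10 T = 1, 1000
C        crude full-period residue generator standing in for white
C        noise (an assumption; any persistently exciting input works)
         x = real(mod(T*112, 211))/105.5 - 1.
C        form the desired response d = h * x (FIR channel)
         do 2 i = N-1, 1, -1
    2       xbuf(i) = xbuf(i-1)
         xbuf(0) = x
         d = 0.
         do 3 i = 0, N-1
    3       d = d + h(i)*xbuf(i)
         call sftf(d, x, lambda, mu, Ki, N, W, eps, begin,
     &             rescue, allowrescue)
   10 continue
C     under the sign convention of the listing (epsp = d + W.XX),
C     W converges to -h
      print *, 'final a posteriori error:', eps
      end

The choice lambda = 1 - 1/(2N) sits exactly at the lower end of the stabilized range reported in Section VI.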

REFERENCES

[1] L. Ljung, M. Morf, and D. Falconer, "Fast calculation of gain matrices for recursive estimation schemes," Int. J. Contr., vol. 27, pp. 1-19, Jan. 1978.
[2] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification. Cambridge, MA: M.I.T. Press, 1983.
[3] B. Friedlander, "Lattice implementation of some recursive parameter-estimation algorithms," Int. J. Contr., vol. 37, no. 4, pp. 661-684, 1983.
[4] S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[5] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[6] E. Eleftheriou and D. Falconer, "Tracking properties and steady-state performance of RLS adaptive filter algorithms," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, no. 5, pp. 1097-1110, Oct. 1986.
[7] J. M. Cioffi, "Limited-precision effects in adaptive filtering," IEEE Trans. Circuits Syst., vol. CAS-34, no. 7, pp. 821-833, July 1987.
[8] A. Weiss and D. Mitra, "Digital adaptive filters: Conditions for convergence, rates of convergence, effects of noise and errors arising from the implementation," IEEE Trans. Inform. Theory, vol. IT-25, no. 6, pp. 637-652, Nov. 1979.
[9] C. Caraiscos and B. Liu, "A roundoff analysis of the LMS adaptive algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, no. 1, pp. 34-41, Feb. 1984.
[10] S. Ljung and L. Ljung, "Error propagation properties of recursive least-squares adaptation algorithms," Automatica, vol. 21, no. 2, pp. 157-167, 1985.
[11] M. H. Verhaegen and P. Van Dooren, "Numerical aspects of different Kalman filter implementations," IEEE Trans. Automat. Contr., vol. AC-31, no. 10, pp. 907-917, Oct. 1986.
[12] S. H. Ardalan and S. T. Alexander, "Fixed-point roundoff error analysis of the exponentially windowed RLS algorithm for time-varying systems," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, no. 6, pp. 770-783, June 1987.
[13] C. G. Samson and V. U. Reddy, "Fixed point error analysis of the normalized ladder algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, no. 5, pp. 1177-1191, Oct. 1983.
[14] S. H. Ardalan, "Floating-point error analysis of recursive least-squares and least-mean-squares adaptive filters," IEEE Trans. Circuits Syst., vol. CAS-33, no. 12, pp. 1192-1208, Dec. 1986.
[15] F. Ling and J. G. Proakis, "Numerical accuracy and stability: Two problems of adaptive estimation algorithms caused by roundoff error," in Proc. ICASSP 84 (San Diego, CA), Mar. 1984, pp. 30.3.1-30.3.4.
[16] J. M. Cioffi and T. Kailath, "Fast, recursive least squares transversal filters for adaptive filtering," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, no. 2, pp. 304-337, Apr. 1984.
[17] D. W. Lin, "On digital implementation of the fast Kalman algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, no. 5, pp. 998-1005, Oct. 1984.
[18] J. M. Cioffi, private communication, 1986.
[19] B. Porat, "Contributions to the theory and applications of lattice filters," Ph.D. dissertation, Stanford University, Stanford, CA, Aug. 1982.
[20] J.-L. Botto and G. V. Moustakides, "Stabilization of fast recursive least-squares transversal filters," INRIA, Rocquencourt, France, Tech. Rep. 570, Sept. 1986.
[21] J.-L. Botto and G. V. Moustakides, "Stabilization of fast recursive least-squares transversal filters for adaptive filtering," in Proc. ICASSP 87 (Dallas, TX), Apr. 1987, pp. 403-407.
[22] J.-L. Botto and G. V. Moustakides, "Stabilization of fast Kalman algorithms," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-37, pp. 1342-1348, Sept. 1989.
[23] M. Bellanger, "Engineering aspects of fast least squares algorithms in transversal adaptive filters," in Proc. ICASSP 87 (Dallas, TX), Apr. 1987, pp. 2149-2152.
[24] A. Benallal and A. Gilloire, "A new method to stabilize fast RLS algorithms based on a first-order model of the propagation of numerical errors," in Proc. ICASSP 88 (New York, NY), Apr. 1988, pp. 1373-1376.
[25] D. T. M. Slock and T. Kailath, "Numerically stable fast recursive least squares transversal filters," in Proc. ICASSP 88 (New York, NY), Apr. 1988, pp. 1365-1368.
[26] M. H. Verhaegen, "A new class of algorithms in linear system theory, with application to real-time aircraft model identification," Ph.D. dissertation, Catholic Univ. Leuven, Leuven, Belgium, Nov. 1985.
[27] M. H. Verhaegen, "Roundoff error propagation in four generally applicable, recursive, least squares estimation schemes," Automatica, submitted for publication.
[28] R. D. Gitlin, J. E. Mazo, and M. G. Taylor, "On the design of gradient algorithms for digitally implemented adaptive filters," IEEE Trans. Circuit Theory, vol. CT-20, pp. 125-136, Mar. 1973.
[29] S. Ljung, "Fast algorithms for integral equations and recursive identification," Ph.D. dissertation, Linköping University, Linköping, Sweden, 1983 (Dissertation No. 93).
[30] J. C. Samuels, "On the mean-square stability of random linear equations," IRE Trans. Inform. Theory, vol. PGIT-5, pp. 248-259, 1959.
[31] A. T. Bharucha-Reid, Random Equations. New York: Academic, 1966.
[32] B. D. O. Anderson et al., Stability of Adaptive Systems: Passivity and Averaging Analysis. Cambridge, MA: M.I.T. Press, 1986.
[33] R. L. Kosut, B. D. O. Anderson, and I. M. Y. Mareels, "Stability theory for adaptive systems: Method of averaging and persistency of excitation," IEEE Trans. Automat. Contr., vol. AC-32, no. 1, pp. 26-34, Jan. 1987.
[34] H. Kushner, Approximation and Weak Convergence Methods for Random Processes, with Applications to Stochastic Systems Theory. Cambridge, MA: M.I.T. Press, 1984.
[35] D. T. M. Slock and T. Kailath, "Fast transversal filters with data sequence weighting," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-37, no. 3, pp. 346-359, Mar. 1989.
[36] H. Lev-Ari, T. Kailath, and J. Cioffi, "Least-squares adaptive lattice and transversal filters: A unified geometric theory," IEEE Trans. Inform. Theory, vol. IT-30, no. 2, pp. 222-236, Mar. 1984.
[37] G. C. Goodwin and K. S. Sin, Adaptive Filtering, Prediction and Control. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[38] H. Aasnaes and T. Kailath, "Initial-condition robustness of linear least-squares filtering algorithms," IEEE Trans. Automat. Contr., vol. AC-19, no. 4, pp. 393-397, Aug. 1974.
[39] G. Carayannis, D. Manolakis, and N. Kalouptsidis, "A fast sequential algorithm for least squares filtering and prediction," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, no. 6, pp. 1394-1402, 1983.
[40] P. Delsarte, Y. Genin, and Y. Kamp, "Stability of linear predictors and numerical range of a linear operator," IEEE Trans. Inform. Theory, vol. IT-33, no. 3, pp. 412-425, May 1987.
[41] A. Benallal, "A study of fast transversal least-squares algorithms and application to the identification of acoustic impulse responses" (in French), Ph.D. dissertation, University of Rennes I, Rennes, France, Dec. 1988.
[42] A. Benallal and A. Gilloire, "Improvements of the tracking capability of the numerically stable fast RLS algorithms for adaptive filtering," in Proc. ICASSP 89 (Glasgow, Scotland), May 1989, pp. 1031-1034.
[43] J. M. Cioffi and T. Kailath, "Windowed fast transversal adaptive filter algorithms with normalization," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, no. 3, pp. 607-625, June 1985.
[44] A. Graham, Kronecker Products and Matrix Calculus: With Applications. New York: Ellis Horwood, 1981.
[45] M. H. Verhaegen, "Improved understanding of the loss-of-symmetry phenomenon in the conventional Kalman filter," IEEE Trans. Automat. Contr., vol. AC-34, no. 3, pp. 331-333, Mar. 1989.
[46] D. T. M. Slock, L. Chisci, H. Lev-Ari, and T. Kailath, "Modular and numerically stable multichannel FTF algorithms," in Proc. ICASSP 89 (Glasgow, Scotland), May 1989, pp. 1039-1041.
[47] D. T. M. Slock, L. Chisci, H. Lev-Ari, and T. Kailath, "Modular and numerically stable fast transversal filters for multichannel and multiexperiment RLS," IEEE Trans. Acoust., Speech, Signal Processing, submitted for publication.
[48] D. T. M. Slock, "Fast algorithms for fixed-order recursive least-squares parameter estimation," Ph.D. dissertation, Stanford University, Stanford, CA, Sept. 1989.
[49] S. Karlin and H. M. Taylor, A First Course in Stochastic Processes. New York: Academic, 1975.
[50] V. Wihstutz, "Ergodic theory of linear parameter-excited systems," in Stochastic Systems: The Mathematics of Filtering and Identification and Applications, M. Hazewinkel and J. C. Willems, Eds. D. Reidel, 1981, pp. 204-218; also in Proc. NATO Adv. Study Inst., Savoie, France, 1980.
[51] V. I. Oseledec, "Lyapunov characteristic numbers for dynamical systems," Trans. Moscow Math. Soc., vol. 19, pp. 197-231, 1968.
[52] M. S. Raghunathan, "A proof of Oseledec's multiplicative ergodic theorem," Israel J. Math., vol. 32, pp. 356-362, 1979.
[53] Y. Kifer, Ergodic Theory of Random Transformations. Boston, Basel, Stuttgart: Birkhäuser, 1986.
[54] R. R. Bitmead and B. D. O. Anderson, "Lyapunov techniques for the exponential stability of linear difference equations with random coefficients," IEEE Trans. Automat. Contr., vol. AC-25, no. 4, pp. 782-787, Aug. 1980.
[55] B. Porat, "Second-order equivalence of rectangular and exponential windows in least squares estimation of Gaussian autoregressive processes," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, no. 5, pp. 1209-1212, Oct. 1985.
[56] B. Widrow et al., "Stationary and nonstationary learning characteristics of the LMS adaptive filter," Proc. IEEE, vol. 64, no. 8, pp. 1151-1162, Aug. 1976.
[57] G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd ed. Baltimore, MD: The Johns Hopkins University Press, 1989.

Dirk T. M. Slock (S'85-M'88) was born in Ghent, Belgium, on October 22, 1959. He received the degree of Engineer in electrical engineering from the State University of Ghent, Ghent, Belgium, in 1982, and the M.S. degrees in electrical engineering and statistics and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, in 1986, 1989, and 1989, respectively. From 1982 to 1984, he was a Research and Teaching Assistant at the State University of Ghent on a fellowship from the National Science Foundation, Belgium. In 1984, he received a Fulbright grant and went to Stanford University, where he also held appointments as a Research and Teaching Assistant. He currently is a Member of the Scientific Staff of the Philips Research Laboratory Belgium, where he is working on the application of fast and efficient adaptive filtering algorithms to telecommunications problems. His present areas of interest include fast recursive least squares algorithms and their implementation and application to equalization, electrical and acoustic echo cancellation, and adaptive control, nonlinear adaptive filtering algorithms, system identification, and signal modeling.


Thomas Kailath (S'57-M'62-F'70) received the S.M. degree in 1959 and the Sc.D. degree in 1961 from the Massachusetts Institute of Technology. After a year at the Jet Propulsion Laboratories, Pasadena, CA, he joined Stanford University as an Associate Professor of Electrical Engineering in 1963. He was advanced to Full Professor in 1968, served as Director of the Information Systems Laboratory from 1971 through 1980 and as Associate Department Chairman from 1981 to 1987, and currently holds the Hitachi America Professorship in Engineering. He has worked in a number of areas, including information and communication theory, signal processing, linear systems, linear algebra, operator theory, and control theory; his recent research interests include array processing, fast algorithms for nonstationary signal processing, and the design of special-purpose computing arrays. He is the author of Linear Systems (Prentice-Hall, 1980) and Lectures on Wiener and Kalman Filtering (Springer-Verlag, 1981). Dr. Kailath has held Guggenheim and Churchill fellowships, among others, and received awards from the IEEE Information Theory Society, the IEEE Signal Processing Society, and the American Control Council. He served as President of the IEEE Information Theory Group in 1975. He is a Fellow of the Institute of Mathematical Statistics and a member of the National Academy of Engineering.
