Some applications of the Euclidean algorithm

Adhemar Bultheel and Marc Van Barel
K.U. Leuven, Dept. Computing Science, Celestijnenlaan 200A, B-3001 Leuven (Belgium)

Abstract. A simple algorithm like the Euclidean algorithm, which everybody knows as a method to compute the greatest common divisor of two integers, may solve some unexpected problems when it is applied in more general situations. One can trace the foundations of this algorithm in applications such as the recursive computation of Padé approximants, the theory of continued fractions, the recurrence relations of orthogonal polynomials, the partial realization problem of linear systems theory, the theory of linear predictive coding, the Lanczos method for solving large (un)symmetric linear systems or eigenvalue problems, and several others. We shall give a survey of some of these applications.

Keywords: Euclidean algorithm, Lanczos method, linear algebra, approximation.

1 Euclidean algorithm

1.1 Euclidean domain

A Euclidean domain is defined as an integral domain D with a valuation ∂ : D → N which satisfies ∂(a) = 0 iff a = 0 and ∂(ab) ≥ ∂(a) for all a, b ≠ 0, together with a division property (modulus calculus): for any two elements a, b ∈ D, b ≠ 0, there exist elements q (quotient) and r (remainder) in D such that a = bq + r with ∂(r) < ∂(b). The latter will be denoted as r = a mod b. Quotient and remainder need not be unique. For example, in Z with ∂(a) = |a|, we may write 8 = 1 · 6 + 2 or 8 = 2 · 6 − 4. When we introduce equivalence classes collecting all elements that differ only by a factor which is a unit of D (that is, an invertible element of D), and select a unique representative from each class (for example by selecting the positive remainder in the previous example), we can always think of the elements q and r as being essentially unique within any desired normalization. Another classical example of a Euclidean domain is the set of polynomials F[z] over some field F with valuation ∂(p) = 2^{deg p} (with deg 0 = −∞ by definition). In a Euclidean domain, two elements a, b ∈ D have a greatest common divisor GCD(a, b), which can be computed by the Euclidean algorithm (EA), formulated as follows: set r_{-1} = a and r_0 = b, then compute r_k = r_{k-2} mod r_{k-1} until r_k = 0; then r_{k-1} = GCD(a, b). This algorithm can be traced back to theorems 1 and 2 of book 7 of the Elements of Euclid, written ca 300 BC, but it is still today of great practical importance in many modern applications, as we want to illustrate in this paper. The proof that this algorithm gives the GCD is based on the fact that GCD(r_{k-1}, r_k) = GCD(r_k, r_{k+1}) and on the fact that the algorithm is finite, since ∂(r_k) is a strictly decreasing sequence which eventually reaches zero. Since we have some freedom in defining our GCD, we may at every stage introduce units x_k and c_k and use GCD(s_k, r_k) = GCD(s_k x_k, r_k c_k). These units can be used to meet some normalization condition at every step of the algorithm.
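To make the recursion concrete, here is a minimal sketch in Python (ours, not from the original text); the function name and the choice of normalizing the result by a unit (the sign) are illustrative assumptions.

# Minimal sketch of the Euclidean algorithm (EA); illustration only.
def gcd_euclid(a, b):
    """Return GCD(a, b) using r_k = r_{k-2} mod r_{k-1} until r_k == 0."""
    r_prev, r = a, b                 # r_{-1} = a, r_0 = b
    while r != 0:
        r_prev, r = r, r_prev % r    # one division step
    return abs(r_prev)               # normalize by a unit (sign) of Z

print(gcd_euclid(8, 6))      # 2
print(gcd_euclid(252, 198))  # 18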

1.2 EEA and continued fractions

To introduce the concept of the extended Euclidean algorithm, we make the link with continued fractions. Denote s_k = r_{k-1}; then one step of the Euclidean algorithm corresponds to the relation s_{k-1}/r_{k-1} = q_k + r_k/s_k, or, introducing the units x_k and c_k,

$$[s_k \;\; r_k] = [s_{k-1} \;\; r_{k-1}]\,V_k; \qquad V_k = \begin{bmatrix} 0 & 1 \\ 1 & -q_k \end{bmatrix}\begin{bmatrix} x_k & 0 \\ 0 & c_k \end{bmatrix} \equiv \begin{bmatrix} y_k & c_k \\ x_k & a_k \end{bmatrix},$$

where a_k = −q_k c_k, y_k = 0 and q_k is the quotient of s_{k-1}/r_{k-1}. Introducing some V_0 with x_0 = c_0 = 0 and y_0 and a_0 some units to normalize the given elements r_0 and s_0, we arrive after a small manipulation at a continued fraction corresponding to the expansion

$$-\frac{r_0}{s_0} = \frac{y_0}{a_0}\cdot\cfrac{c_1}{a_1 + x_1\cfrac{c_2}{a_2 + x_2\cfrac{c_3}{\;\ddots\; + x_{k-1}\cfrac{c_k}{a_k + x_k\left(-\dfrac{r_k}{s_k}\right)}}}}.$$

Truncating this after k terms (i.e., dropping the tail −r_k/s_k), we get a rational approximation for the left hand side of the form c_{0k}/a_{0k}. The recurrence for the numerators and denominators of the approximants, as well as for the tails (residuals), can be collected in one recurrence which reads

$$G_k = G_{-1}\underbrace{V_0 V_1 \cdots V_k}_{V_{0k}} = \begin{bmatrix} y_{0k} & c_{0k} \\ x_{0k} & a_{0k} \\ s_k & r_k \end{bmatrix}; \qquad G_{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ s_0 & r_0 \end{bmatrix}.$$

This extended recurrence is usually known as the extended Euclidean algorithm (EEA) [1, 35, 7].
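The following Python sketch (our illustration, with units x_k = c_k = 1 and V_0 = I, so a_k = −q_k) carries the full G_k recurrence along, so that on termination the first two rows deliver the cofactors with y_{0k} a + x_{0k} b = GCD(a, b).

# Sketch of the extended Euclidean algorithm (EEA) following the G_k recurrence;
# units x_k = c_k = 1 and V_0 = I are illustrative assumptions.
def eea(a, b):
    y, c = 1, 0          # first row of G_k:  (y_0k, c_0k)
    x, a0 = 0, 1         # second row:        (x_0k, a_0k)
    s, r = a, b          # third row:         (s_k, r_k)
    while r != 0:
        q = s // r                       # quotient q_k
        # multiply G_k on the right by V_k = [[0, 1], [1, -q]]
        y, c = c, y - q * c
        x, a0 = a0, x - q * a0
        s, r = r, s - q * r
    return s, y, x       # s = GCD(a, b) and y*a + x*b = s

g, u, v = eea(252, 198)
print(g, u * 252 + v * 198)   # 18 18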

1.3 Rational approximation

If we are interested in the rational approximations generated by the EEA, and not as such in the computation of the GCD, then it does not matter whether the algorithm ends or not. Thus, we can drop the condition that the EEA is finite, i.e., that the strictly decreasing sequence ∂(r_k) ends up being zero. We could as well let it go on decreasing towards −∞. If we take for example for D the set of positive reals and let the operation r = a mod b mean that r/b is the fractional part of a/b, hence a/b = q + r/b with q an integer, then the EEA can be used to compute the sequence of rational approximants, which will only be finite when a/b is rational, but may be infinite when a/b is irrational. The units in this situation are still the nonzero constants from D.

A similar extension can be imagined for the set of formal Laurent series D = F_−⟨z⟩ over a field F having only finitely many terms with positive exponents, where the division property a/b = q + r/b means that q is the polynomial part of a/b and r/b is the strictly proper part. If ∂(s) is supposed to mean the degree of a series s ∈ F_−⟨z⟩, that is, the highest power of z that gets a nonzero coefficient, then the sequence ∂(r_k) as generated by the EA will be a strictly decreasing sequence (going to −∞). Mutatis mutandis, this can be adapted for the formal Laurent series F_+⟨z⟩ with only finitely many negative exponents. The valuation ∂(s) will then indicate the order of s, that is, the smallest exponent of z that takes a nonzero coefficient. This sequence will go to +∞.

The EA as a method to find rational approximants to real numbers is well established in number theory and can be found in many classical books on number theory and continued fractions. For example, the second convergent in the continued fraction expansion of π would give 3 + 1/7 = 22/7; the next convergent is 3 + 1/(7 + 1/15) = 333/106.

An interesting geometric interpretation will illustrate the fast convergence. A positive real number ρ = r/s can be represented as a straight line ℓ_ρ through the origin with slope ρ. The positive rational numbers are then represented by grid points of the equispaced grid Ω = N² in the first quadrant, lying on a line with rational slope. Whenever the line ℓ_ρ goes through a grid point, the number ρ is rational. If it does not, rational approximants can be obtained by choosing grid points close to the line. The EA is a method to generate these successive approximating grid points: the next grid point is found as a combination of the two previous ones. Taking units x_k = c_k = 1 (all the numbers below are in N), we see that the next grid point is found as

(a_{0,k}, c_{0,k}) = (x_{0,k−1}, y_{0,k−1}) + a_k (a_{0,k−1}, c_{0,k−1}).

Note that (x_{0,k−1}, y_{0,k−1}) = (a_{0,k−2}, c_{0,k−2}) and that (a_{0,k}, c_{0,k}) and (x_{0,k−1}, y_{0,k−1}) are on the same side of the line ℓ_ρ, while (a_{0,k−1}, c_{0,k−1}) is on the other side. In fact, a_k is the largest integer multiple of (a_{0,k−1}, c_{0,k−1}) that can be added to (x_{0,k−1}, y_{0,k−1}) without crossing the line. The reader can imagine that the basis vectors (x_{0,k−1}, y_{0,k−1}) and (a_{0,k−1}, c_{0,k−1}) take more and more the same direction (viz. the direction of the line), which implies fast convergence. For a multi-dimensional generalization of the previous geometric interpretation of the number theoretic problem and the associated solution of Diophantine equations see for example [9].
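As a small illustration of the previous paragraph (ours, using floating point arithmetic and the built-in floor as the "mod" step), the following sketch reproduces the convergents of π mentioned above.

# Sketch: the EA applied to a real number yields continued fraction convergents.
# Illustration only; floating point limits how many convergents are meaningful.
from math import pi, floor

def convergents(rho, n):
    p_prev, q_prev, p, q = 1, 0, floor(rho), 1   # convergents p/q
    out = [(p, q)]
    frac = rho - floor(rho)
    for _ in range(n - 1):
        if frac == 0:
            break                                 # rho was rational: the EA terminates
        rho = 1.0 / frac
        a = floor(rho)                            # next partial quotient
        frac = rho - a
        p_prev, q_prev, p, q = p, q, a * p + p_prev, a * q + q_prev
        out.append((p, q))
    return out

print(convergents(pi, 4))   # [(3, 1), (22, 7), (333, 106), (355, 113)]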

1.4 Complexity, reliability and stable algorithms

The computational complexity of the EEA is well known: for example, for polynomials with real coefficients of degree n, it requires O(n²) floating point operations to find their GCD. However, it is explained in detail in several places (e.g., [7]) how this can be reduced to O(n (log₂ n)²) when an appropriate divide and conquer technique is applied in conjunction with fast Fourier transforms. On special architectures, the speed can be increased even further. For example, a systolic implementation is discussed in [8].

In many applications the Euclidean algorithm has been considered (without recognizing it as the EA) in its generic situation, by which we mean that typically the sequence ∂(r_k) decreases in each step of the EA by a small generic amount. For example, for the Laurent series of F_−⟨z⟩, the quotients are polynomials of degree 1 at most. When this was not the case, a singularity occurred, like a number becoming zero, which resulted in a division by zero and the algorithm broke down. When it is realized that the algorithm in question is just a special case of the Euclidean algorithm in one form or another, it is a simple matter to implement it as a natural tool to avoid breakdown and make the algorithms reliable. By reliability, we mean that the algorithm does not break down unless some number becomes exactly zero. In this way exact breakdown can be avoided. However, it is well known that these reliable algorithms can be highly unstable in a numerical environment. It will happen frequently that the numerically computed value is not zero but very small in absolute value, and when the computations are continued, this results in very large rounding errors. This situation is often called near-breakdown. It is only recently that look-ahead strategies were designed to make the algorithms function in a numerically stable fashion, and again this is mainly emerging from ideas of recursiveness in the Euclidean algorithm. We shall come back to it in a later section.

2 Approximation of formal series

One possible application is the approximation of formal Laurent series. Suppose we work in F_−⟨z⟩ as explained in section 1.3. The successive rational approximants for f = −r_0/s_0 (suppose for simplicity that s_0 = −1, so that f = r_0) satisfy f − c_{0k}/a_{0k} = r_k/a_{0k}. When the "quotients" have deg a_k(z) = α_k, then it can be shown that for a strictly proper f: deg c_{0k} < deg a_{0k} = α_{0k} = α_1 + ··· + α_k and deg(r_k/a_{0k}) = −2α_{0k} − α_{k+1} with α_{k+1} ≥ 1. Comparing the degrees of freedom in the approximant (2α_{0k}) and the number of interpolation conditions that are met, we see that we have solved here a Padé approximation problem at infinity. When f is the transfer function of a linear system, then the approximant is the one of minimal (McMillan) degree which fits the first 2α_{0k} coefficients. This is known as a solution of the minimal partial realization (MPR) problem. In this context, the EA is sometimes referred to as the Berlekamp-Massey algorithm, which was first designed for shift register synthesis of linear predictive codes. The numbers α_{0k} are known as Kronecker indices. The numbers α_k do not seem to have received a standard name; we shall call them Euclidean indices.

Similarly, the rational approximants produced by the EEA for the ratio of two elements from F_+⟨z⟩ are convergents of a continued fraction which is known as a P-fraction, a name due to Arne Magnus [36]. The P stands for principal part (plus constant term), which is indeed what is meant by the quotient here. The approximants are known as Padé approximants (at the origin, but this is what is usually meant by Padé approximation). When the EA is applied as we have explained, it will produce Padé approximants as the ratio of two polynomials in the indeterminate z^{−1}. However, we can arrange the normalization by c_k and x_k to include some powers of z, so that one immediately finds a ratio of two polynomials in z. One of the earlier papers to explicitly recognize the EA as a method for computing Padé approximants is [39].

In the literature of Padé approximation, the notion of the Padé table is a convenient concept to formulate certain results. The table is a two dimensional array which contains in the entry (i, j) a Padé approximant with numerator degree i and denominator degree j. The i-axis points down and the j-axis to the right. Excluding the possibility that the approximant is zero, the Padé table consists of square blocks of size 1 or larger. Each block contains entries that are all equal to the entry in the top left corner of the square. Above the antidiagonal, there is some overshoot in the approximation (more coefficients are approximated than required) and below it there is some defect (not enough coefficients are fitted). Such a block of size > 1 is called a singular block. A Padé table with singular blocks is called non-normal. The EA computes Padé approximants that can be found on descending diagonals of the table (in fact only on a subdiagonal of the main diagonal, but it can easily be adapted to compute other diagonals too) and each step of the EA computes the next one from the two preceding ones, excluding all the equal entries. Thus the EA jumps over the singular blocks and computes exactly one entry per block that it meets on the diagonal along which it progresses.
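Since the text identifies the EA in this context with the Berlekamp-Massey algorithm, we add a hedged sketch of the classical Berlekamp-Massey formulation over the rationals; the function name and the toy Markov sequence are our own choices, and the connection polynomial it returns plays the role of the denominator a_{0k}.

# Sketch of the Berlekamp-Massey algorithm over Q (exact fractions) for a scalar
# sequence f_1, f_2, ...; it returns the shortest LFSR / connection polynomial,
# i.e. the denominator of the minimal partial realization.  Illustration only.
from fractions import Fraction

def berlekamp_massey(seq):
    seq = [Fraction(v) for v in seq]
    C = [Fraction(1)]          # connection polynomial C(z) = 1 + c1 z + ... + cL z^L
    B = [Fraction(1)]          # copy of C before the last length change
    L, m, b = 0, 1, Fraction(1)
    for n, s in enumerate(seq):
        # discrepancy d = s_n + sum_{i=1..L} C_i s_{n-i}
        d = s + sum(C[i] * seq[n - i] for i in range(1, L + 1))
        if d == 0:
            m += 1
        elif 2 * L <= n:
            T = C[:]
            coef = d / b
            C = C + [Fraction(0)] * (len(B) + m - len(C))
            for i, Bi in enumerate(B):
                C[i + m] -= coef * Bi
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            coef = d / b
            C = C + [Fraction(0)] * (len(B) + m - len(C))
            for i, Bi in enumerate(B):
                C[i + m] -= coef * Bi
            m += 1
    return C, L

# Markov parameters of f(z) = 1/(z - 2), i.e. f_k = 2^(k-1):
C, L = berlekamp_massey([1, 2, 4, 8])
print(L, C)   # 1 [Fraction(1, 1), Fraction(-2, 1)] : s_n = 2 s_{n-1}, denominator z - 2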


3 Linear algebra and orthogonal polynomials

The interpolation conditions for the MPR problem in a linearized form can be written as a Hankel system of equations. Suppose we denote by bold face letters the vertical matrices containing the coefficients of series or polynomials; then the condition that for f(z) ∈ F_−⟨z⟩ and deg a_{0k}(z) ≤ α_{0k} there should exist a polynomial c_{0k}(z) of degree < α_{0k} such that r_k(z) = f(z)a_{0k}(z) − c_{0k}(z) is strictly proper can be written as H a_{0k} = r_k with H = H(f) the Hankel matrix with symbol f. With the down shift matrix Z we can define the blocks A_{·k} = [a_{0k} | Z a_{0k} | … | Z^{α_{k+1}−1} a_{0k}] and R_{·k} = [r_k | Z r_k | … | Z^{α_{k+1}−1} r_k], the upper triangular block matrix A = [A_{·0} | A_{·1} | …] and the lower triangular block matrix R = [R_{·0} | R_{·1} | …]. It turns out that HA = R and A^T H A = D with D a block diagonal matrix D = diag(D_{00}, D_{11}, …). This relation can be interpreted in two different ways: (1) as a block triangular factorization of H^{−1} and (2) as a block orthogonality relation for the polynomials a_{0k} whose coefficients are the columns of A. They are block orthogonal polynomials with respect to the moment matrix H. By block orthogonality we mean that a polynomial in a block is orthogonal to all the polynomials in previous blocks, but it may not be orthogonal to all the polynomials in the same block. For this concept in a more general situation see [12].

The matrix H is symmetric but not necessarily positive definite, nor need it be strongly regular, meaning that it can happen that some of its leading submatrices are singular. In fact, the sizes of the blocks in the decomposition correspond to the number of consecutive leading submatrices that are singular. In terms of Padé approximation, this corresponds to a part of the diagonal path passing through a singular block of the Padé table.

The orthogonal polynomials satisfy the recurrence relation

a_{0k}(z) = a_{0,k−2}(z) x_{k−1} c_k + a_{0,k−1}(z) a_k(z),

which follows immediately from the Euclidean algorithm. As before, x_{k−1} and c_k are units and a_k(z) = −q_k(z) c_k is the "quotient" of s_{k−1}(z) and r_{k−1}(z). It represents a generalization of the three-term recurrence relation for classical orthogonal polynomials. This recurrence relation can again be written as a matrix relation as follows:

$$ZA = AT, \qquad T = \operatorname{tridiag}\begin{pmatrix} T_{01}, & T_{12}, & \ldots \\ T_{00}, & T_{11}, & \ldots \\ T_{10}, & T_{21}, & \ldots \end{pmatrix}.$$

T is a block tridiagonal matrix which is also a sparse upper Hessenberg matrix. The block T_{k−2,k−1} of size α_{k−1} × α_k is zero except for the upper right corner, which contains −β_k = −x_{k−1}c_k; the block T_{k,k−1} of size α_{k+1} × α_k is also zero except for the upper right corner, which contains 1; and the block T_{k−1,k−1} of size α_k × α_k is the Frobenius matrix for the polynomial a_k(z) (coefficients in the last column). The tridiagonal T is a block generalization of the Jacobi matrix, which is known in the theory of orthogonal polynomials to contain the recurrence coefficients. An equivalent formulation of the previous relation is (use H(zf) = Z^T H(f) = H(f)Z)

A^T H(zf) A = DT = J

with J a symmetric block tridiagonal matrix. For a discussion of these matters see for example [19, 28, 30, 13].
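A small numerical check may help to fix ideas (our illustration, assuming a strongly regular, positive definite Hankel matrix so that every block has size 1): the inverse Cholesky factor then plays the role of A, A^T H A = D is diagonal, and A^T H(zf) A is tridiagonal.

# Our numerical illustration: H is the Hilbert matrix (moments of the uniform
# weight on [0,1]); the columns of A hold the coefficients of the (shifted,
# normalized) Legendre polynomials.  Assumes a strongly regular H.
import numpy as np

n = 5
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])  # Hankel
L = np.linalg.cholesky(H)            # H = L L^T (exists: H is positive definite)
A = np.linalg.inv(L).T               # upper triangular, so that A^T H A = I

D = A.T @ H @ A
print(np.allclose(D, np.eye(n)))     # True: orthogonality of the polynomials

# the same A also tridiagonalizes the shifted moment matrix H(zf):
Hz = np.array([[1.0 / (i + j + 2) for j in range(n)] for i in range(n)])
J = A.T @ Hz @ A
print(np.allclose(J, np.triu(np.tril(J, 1), -1)))   # True: J is tridiagonal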


4 Iterative solution of systems

During the last five years an enormous effort has been made to develop and apply the previous theory in the context of the iterative solution of linear systems with so called Krylov subspace methods. The Lanczos algorithm has long been believed to be numerically acceptable only for Hermitian positive definite systems because of the numerical difficulties one may expect when the algorithm is applied in a non-generic situation, when some numbers become zero or nearly zero. This has been indicated as a breakdown or near-breakdown of the algorithm. However, as in the previous situations, the Euclidean algorithm may also here bring a solution to overcome such a breakdown whenever possible (curable breakdown).

The situation is the following. Suppose we want to solve the system Mx = b. The size of the matrix M is m. The residual is r = b − Mx. For some starting vector x_0, successive approximants x_k = x_{k−1} + λ_k s_{k−1} are computed, essentially by using a conjugate direction method for the minimization of the residual norm. The search directions s_k = r_k + μ_k s_{k−1} are conjugate with respect to the matrix M and the corresponding residuals r_k = b − M x_k are elements from the Krylov subspaces

K_n(r_0, M) = span{r_0, M r_0, …, M^n r_0}.

In other words, there must exist a polynomial ϕ_n(z) of degree n with ϕ_n(0) = 1 such that r_n = ϕ_n(M) r_0. Let us first suppose that M is symmetric. Then minimizing ‖r_n‖² = r_n^T r_n is equivalent to minimizing ‖ϕ_n‖²_H, where the latter norm is taken with respect to the Hankel moment matrix H = H(f), where f(z) = Σ f_k z^{−k} and f_k = r_0^T M^{k−1} r_0. The comonic polynomial of minimal norm is an orthogonal one, hence we get the orthogonality relations ϕ_k ⊥ ϕ_l for k ≠ l with respect to the metric H, or equivalently r_k ⊥ r_l in the metric I. Thus in principle we can perform the iterations of the Lanczos algorithm whenever we can construct the orthogonal polynomials ϕ_k. The problem of breakdown occurs when the Hankel matrix H = H(f) is not strongly regular, which can happen when M is not positive definite. But we are then in a situation where we have to construct formal orthogonal polynomials with respect to an indefinite Hankel moment matrix, and this problem has been treated in the previous section.

The previous formulas can be translated into the present terminology as follows. First we consider for n < m the matrix Y_n = [y_0 | y_1 | … | y_n] where y_0 = r_0 and y_k = M y_{k−1} for k ≥ 1. Then, by definition, the leading submatrix of size n + 1 of H is given by H_n = Y_n^T Y_n. The Krylov space is the column space of Y_n and the problem is to construct an orthogonal basis for this space. We have to find an upper triangular matrix A_n such that the columns of Ŷ_n = Y_n A_n are orthogonal vectors, or, in the presence of singular blocks, block orthogonal vectors. The matrix A_n is a submatrix of the matrix A we had in the previous section, and it can be constructed recursively from the data f, or equivalently from the matrix Y_n, by the Euclidean algorithm. The orthogonality means that Ŷ_n^T Ŷ_n = A_n^T H_n A_n = D_n with D_n block diagonal. Moreover we have that Ŷ_n^T M Ŷ_n = A_n^T H_n(zf) A_n = D_n T_n = J_n is a block tridiagonal matrix. At first sight, the case of a nonsymmetric matrix M cannot be treated by the EA because the matrix with entries r_0^T (M^T)^i M^j r_0 is not Hankel anymore.
One possibility is to construct orthogonal polynomials for a general moment matrix, but this is computationally less efficient because the Jacobi matrix will then not be tridiagonal or block tridiagonal, but a full upper Hessenberg matrix. Algorithms of this type belong to the class of Arnoldi-like algorithms. However, it is still possible, with substantially less computational effort, to generate a Hankel matrix as before. Indeed, we consider two Krylov subspaces: the one generated by the vectors y_i, and another one, K_n(r_0, M^T), which is spanned by the columns of Z_n = [z_0 | z_1 | … | z_n] with z_0 = r_0 and z_k = M^T z_{k−1}. Clearly, the matrix H_n = H_n(f) = Z_n^T Y_n is again Hankel with symbol f(z) = z_0^T (zI − M)^{−1} y_0. Applying the EA as before to get the upper triangular factor A_n leads to the fact that Ẑ_n = Z_n A_n and Ŷ_n = Y_n A_n satisfy a block biorthogonality condition Ẑ_n^T Ŷ_n = A_n^T H_n A_n = D_n. We shall not go into the details of all the possible variants of the Lanczos algorithm that were recently developed. For a survey see [23, 30, 13]. We hope that the previous analysis explains that the essential computations done by these methods are basically the same as those found in the Euclidean algorithm.
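The following hedged sketch (ours) implements the two-sided Lanczos process in its generic form, i.e. without look-ahead, so it simply stops when an exact breakdown w^T v = 0 occurs; the normalizations and names are our own choices.

# Sketch of the two-sided (nonsymmetric) Lanczos process, generic case only.
import numpy as np

def lanczos_two_sided(M, y0, z0, n):
    m = len(y0)
    V = np.zeros((m, n)); W = np.zeros((m, n)); T = np.zeros((n, n))
    v = y0 / np.linalg.norm(y0)
    w = z0 / (z0 @ v)                                 # enforce w^T v = 1
    v_old = w_old = np.zeros(m); beta = delta = 0.0
    for j in range(n):
        V[:, j], W[:, j] = v, w
        alpha = w @ (M @ v)
        T[j, j] = alpha
        v_new = M @ v - alpha * v - beta * v_old
        w_new = M.T @ w - alpha * w - delta * w_old
        if j == n - 1:
            break
        prod = w_new @ v_new
        if prod == 0.0:                               # exact breakdown
            raise RuntimeError("breakdown: look-ahead would be needed here")
        delta = np.sqrt(abs(prod)); beta = prod / delta
        T[j, j + 1], T[j + 1, j] = beta, delta
        v_old, w_old, v, w = v, w, v_new / delta, w_new / beta
    return V, W, T

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
V, W, T = lanczos_two_sided(M, rng.standard_normal(6), rng.standard_normal(6), 6)
print(np.max(np.abs(W.T @ V - np.eye(6))))            # ~0: biorthogonality
print(np.max(np.abs(W.T @ M @ V - T)))                # ~0: T tridiagonalizes M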

5 Eigenvalue problems and linear systems

Let us consider a linear system which can be described by the state space equations

$$x_{k+1} = A x_k + B u_k, \qquad y_k = C^T x_k, \qquad A \in F^{n\times n},\ B \in F^{n\times 1},\ C \in F^{n\times 1}.$$

The vector x_k is the state, u_k is the input and y_k is the output. The triple (A, B, C) is called a state space realization of the system. When the number n (the dimension of the state space) is the smallest possible one to describe the system, the realization is called minimal. After taking z-transforms (e.g., u(z) = Σ u_k z^{−k}), we obtain y(z) = f(z)u(z) with transfer function f(z) = C^T(zI − A)^{−1}B. The coefficients in its expansion at ∞, i.e., f_k = C^T A^{k−1} B, are called Markov parameters, and the sequence (f_k) is called the impulse response. Thus the MPR problem is to match a certain number of Markov parameters by constructing a rational transfer function of the smallest possible degree. It is a Padé approximation problem at infinity for the transfer function of the system. The matrix O = [C | A^T C | (A^T)² C | …]^T is called the observability matrix and R = [B | AB | A²B | …] is called the reachability matrix. The system is called reachable when R has full rank, i.e., n = rank R, and it is called observable when O has full rank, i.e., when n = rank O. When n = rank R = rank O = rank OR, then the realization is minimal. Note that H = OR is the same as the Hankel matrix H(f), hence rank H(f) = deg f.

To see that the Lanczos algorithm can also help in reducing the eigenvalue problem for the matrix M to a simpler one, note that from Ẑ_n^T Ŷ_n = D_n (notation of the previous section), and hence Ẑ_n^T = D_n Ŷ_n^{−1}, it follows from Ẑ_n^T M Ŷ_n = D_n T_n that Ŷ_n^{−1} M Ŷ_n = T_n. This means that M is similar to T_n and thus it has the same eigenvalues. Since T_n is much simpler, being a sparse block tridiagonal matrix, its eigenvalues are easily obtained. Since the finite dimensional analog of ZA = AT reads F(a_{0,n+1}) A_n = A_n T_n, where F(a_{0,n+1}) represents the Frobenius matrix for the polynomial a_{0,n+1}(z), also T_n = A_n^{−1} F(a_{0,n+1}) A_n, and since A_n is unit upper triangular, it follows that the eigenvalues of M are eigenvalues of T_n and therefore zeros of a_{0,n+1}(z).

All this is true when there is some α_{0,n+1} which is equal to m, the size of the matrix M. This need not be true in general. First note that the input for the EA is essentially f(z) = z_0^T(zI − M)^{−1} y_0. The triple (M, y_0, z_0) is a state space realization for f. But if it is not minimal, then M may contain a larger spectrum than can be obtained as poles of f. The EA or Lanczos algorithm will end because α_{0,n+1} = r = deg f < m. It will thus be impossible to recover the whole spectrum of M from f. The Lanczos algorithm will have a breakdown which cannot be cured by the principles of the EA.

Note that nonminimality is no problem in the solution of systems, since this breakdown indicates that f is approximated exactly and the residual will be zero. For the eigenvalue problem however, it turns out that there are two possibilities to be considered. Define r_y = rank Y_∞ = dim K_∞(y_0, M) and r_z = rank Z_∞ = dim K_∞(z_0, M^T); then either (A) or (B) will happen.

(A) m > r = min{r_y, r_z}; then the system is reachable or observable, and we can recover the spectrum of M restricted to the reachability space (that is, K_∞(y_0, M), the column space of Y_∞) or the observability space (that is, K_∞(z_0, M^T), the column space of Z_∞).

(B) r < min{r_y, r_z}; then the starting vectors y_0 and/or z_0 were chosen inadequately, so that they do not have components along a basis for the maximal Jordan block that can be associated with each eigenvalue of M. The algorithm is trapped in a smaller dimensional space and only the spectral information of M restricted to these smaller spaces will be recovered. See [30, 41, 13].
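A tiny experiment (ours) illustrates the nonminimality issue: when the starting vector has no component along some eigenvector, the Krylov sequence spans a smaller space and a_{0,n+1}(z) can only deliver the corresponding part of the spectrum.

# Our small illustration: the Krylov sequence "sees" only part of the spectrum
# when the starting vector is deficient, so only those eigenvalues are recovered.
import numpy as np

M = np.diag([1.0, 2.0, 3.0, 4.0])
y0 = np.array([1.0, 1.0, 1.0, 0.0])      # no component along the eigenvalue 4
K = np.column_stack([np.linalg.matrix_power(M, k) @ y0 for k in range(4)])
r = np.linalg.matrix_rank(K)
print(r)                                  # 3, not 4: the realization is not minimal

# monic polynomial a(z) of degree r with a(M) y0 = 0 (coefficients from a
# least squares solve); its zeros are the recoverable part of the spectrum
coeffs = np.linalg.lstsq(K[:, :r], -np.linalg.matrix_power(M, r) @ y0, rcond=None)[0]
print(np.sort(np.roots(np.r_[1.0, coeffs[::-1]])))   # approximately [1. 2. 3.]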

6 Coding theory

Another classical application of the Euclidean algorithm can be found in the theory of error correcting codes, more precisely Goppa codes. Two examples are BCH codes and Reed-Solomon codes. Words are n-tuples of elements taken from an alphabet which is a Galois field. The legal messages are code words from a set C. This set C is characterized by a generator polynomial G and a syndrome S. One has c = (α_0, …, α_{n−1}) ∈ C iff S_c(x) = Σ_i α_i/(x − a_i) is zero modulo G. Since all the arithmetic is done in finite (Galois) fields, S(x) can be rewritten as a polynomial as well. When the message is transmitted, some error may be introduced, so that the code word c has been changed into r and the syndrome S_r modulo G is not zero anymore. For deg G = s, errors with a weight ⌊s/2⌋ can be detected and corrected. The EEA is used to construct two polynomials from G and S_r. One will give the location of the error committed during transmission of the code word, the other will give the value of the error. Since there are no numerical aspects to this application, we shall not go deeper into it. A comprehensive treatment of the subject can be found in [38].

The Berlekamp-Massey algorithm (BMA) [6, 37] was first developed in this context to find linear feedback shift register realizations for the associated system. In a sense, this is equivalent to the Euclidean algorithm. We take the occasion here to bring forward a slight difference between the EA and the BMA. In the EA, the recurrence for the residuals r_k is central and the polynomial quotients a_k are the gear wheels of the recurrence, while the polynomials a_{0k} and the other elements of the matrices V_{0k} are side products. In the BMA it is the recurrence for a_{0k} that is central and the residuals r_k are only a side-effect. Thus the EEA is a natural compromise between both. This conceptual difference has consequences for practical implementation though. The EA starts out with all the information available in f, splits off a part of the continued fraction and continues with the tail. That is why it is sometimes called a layer peeling method. The BMA on the other hand takes from f only just enough coefficients to find a_1 = a_{01} and hence V_1. Then the data are transformed to give [s_1 r_1] and again just enough coefficients of r_1 are computed to find V_2, etc. This corresponds to what is called a layer adjoining method. The necessary coefficients are computed by inner products. This can be seen from H(f)A = R: the necessary coefficients of R are computed by multiplying rows of H(f) with columns of A. These inner products make layer adjoining methods less suitable for a parallel implementation. See [33].

See also [20, 21] for generalizations of the BMA to multi-sequences.
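For completeness we add a hedged toy sketch (ours, not from the paper) of how the EEA solves the key equation in the Reed-Solomon/BCH special case G(x) = x^(2t); all helper names are our own, and a small prime field stands in for the Galois fields used in practice.

# Toy sketch: the EEA over GF(p)[x] solves Lambda(x) S(x) = Omega(x) (mod x^(2t))
# with deg Omega < t; Lambda locates errors, Omega evaluates them.
p = 7                                             # a small prime field GF(7)

def pmod(a): return [c % p for c in a]            # poly = coefficient list, low degree first
def deg(a):  return max((i for i, c in enumerate(a) if c % p), default=-1)

def padd(a, b, s=1):                              # a + s*b
    n = max(len(a), len(b))
    return pmod([(a[i] if i < len(a) else 0) + s * (b[i] if i < len(b) else 0)
                 for i in range(n)])

def pmul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return pmod(out)

def pdivmod(a, b):
    q, r, db = [0] * len(a), list(a), deg(b)
    inv = pow(b[db], p - 2, p)                    # inverse of the leading coefficient
    while deg(r) >= db:
        d, c = deg(r) - db, (r[deg(r)] * inv) % p
        q[d] = c
        r = padd(r, [0] * d + [c * x for x in b], -1)
    return pmod(q), r

def key_equation(S, t):
    r_prev, r = [0] * (2 * t) + [1], S            # r_{-1} = x^(2t), r_0 = S
    u_prev, u = [0], [1]                          # cofactors: u*S = r (mod x^(2t))
    while deg(r) >= t:
        q, rem = pdivmod(r_prev, r)
        r_prev, r = r, rem
        u_prev, u = u, padd(u_prev, pmul(q, u), -1)
    return u, r                                   # Lambda, Omega

S = [3, 0, 5, 2]                                  # some syndrome, deg < 2t with t = 2
Lam, Om = key_equation(S, 2)
# verify Lambda*S - Omega = 0 (mod x^4) and deg Omega < t:
print(deg(padd(pmul(Lam, S)[:4], Om, -1)) == -1, deg(Om) < 2)   # True True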

7 Matrix generalizations

Most of the previous notions concerning the EA, which was applied to elements from a field F or to constructs like polynomials or Laurent series with coefficients in F, can be generalized to the q × p dimensional situation, i.e., where we consider matrices (in general rectangular matrices) whose entries are elements, polynomials, Laurent series, etc., over the field F. There are now several possibilities for generalizing all the scalar notions, and what might be understood by an EA for this situation is not uniquely defined. Only the situation of square matrices with no singular blocks (that is, the matrices we need to invert are either zero or invertible) is a trivial generalization. The only new concept entering the picture is the non-commutativity of the product, but this does not cause a major difficulty. The general situation is however much more complex, and not all the problems have been solved.

7.1 Linear systems

The theory of linear systems was probably the first place where matrix generalizations of the previous theory were developed. The concept of state space is remarkably easy to adapt to systems with multiple inputs and multiple outputs. In the realization triple (A, B, C) one should assume that B ∈ F^{n×p} and C ∈ F^{n×q}, where p is the dimension of the input vector u_k and q is the dimension of the output vector y_k. The transfer function f(z) = C^T(zI − A)^{−1}B becomes a matrix rational function of size q × p. There are several possibilities to write a rational matrix function as the "ratio" of two polynomial matrices. One possibility is to write it as a polynomial matrix of size q × p divided by a scalar polynomial. This is however not an optimal way. It is better to write it as a left or right matrix polynomial fraction. For example f(z) = s(z)^{−1} r(z) with s and r polynomial, r ∈ F^{q×p}[z] and s ∈ F^{q×q}[z], is a left description. The Hankel matrix H(f) is replaced by a block Hankel matrix with q × p blocks.

7.2 Greatest common divisor

The definition of the greatest common left divisor (GCLD) (there is also a right version) for matrix polynomials can be extended as follows. To have g = GCLD(s, r), first of all, g should be a common left divisor, thus there should exist s̃ and r̃ such that s = g s̃ and r = g r̃; second, it should be the greatest, meaning that for any other common left divisor g′, there exists some h such that g = g′h. More details can be found in [34, p. 376-378]. You can also find there a theorem saying that g is a GCLD of s and r when there exists a unimodular matrix V such that [s r]V = [g 0]. A unimodular matrix is a polynomial matrix whose inverse is again a polynomial matrix (it is like a unit in the set of polynomial matrices). In fact, if [s r] has full row rank, then all the GCLDs differ from g only by a unimodular right factor. Thus the matrix version of the EEA will, exactly like in the scalar case, start from G_{−1} and successively generate G_k = G_{k−1} V_k until after a finite number of steps one has [s_n r_n] with r_n = 0. The matrices stand for

$$G_{-1} = \begin{bmatrix} I_q & 0 \\ 0 & I_p \\ s_0 & r_0 \end{bmatrix}, \qquad V_k = \begin{bmatrix} y_k & c_k \\ x_k & a_k \end{bmatrix}, \qquad G_k = G_{-1}\underbrace{V_0 \cdots V_k}_{V_{0k}} = \begin{bmatrix} y_{0k} & c_{0k} \\ x_{0k} & a_{0k} \\ s_k & r_k \end{bmatrix}$$

with V_k unimodular. They are a composition of elementary Gaussian-type elimination matrices which are either permutation matrices or lower triangular matrices with a constant diagonal and powers of z occurring only below the diagonal. Obviously the determinant of such a product will be a nonzero constant from F and therefore the whole matrix V_{0k} will be unimodular. How the matrices V_k are produced exactly can be found in detail in [46] for the MPR problem and in [11] for the matrix Padé problem. On a more theoretical basis, the same ideas are found in [3]. Predecessors are [18] and [2]. The general idea is that leading column coefficients in r_k are eliminated by subtracting suitable multiples of columns of s_k. This process will gradually drive r_k towards zero. Meanwhile some hierarchy in the degree structure has to be respected, so that from time to time some column of the second block column is interchanged with a column of the first block, which gives rise to the permutation matrices. Thereby it is important to keep the leading column coefficients of s_k linearly independent, since it should be possible to eliminate a leading column vector in r_k by linear combinations of the leading column coefficients in s_k. These restrictions prevent a pivoting technique from guaranteeing numerical stability.

7.3 Approximation of series

When the previous algorithm is used for matrix polynomials s_0 and r_0, it will end after a finite number of steps. In general it may be applied to matrix Laurent series, and then it may never stop, but at every stage the approximants c_{0k} a_{0k}^{−1} are matrix versions of the Padé approximants at ∞ or at 0, depending on the implementation, of the rational matrix function f = −s_0^{−1} r_0. Again we refer to the papers mentioned above for the details. A somewhat different approach giving a different matrix Euclidean algorithm can be found in [26].

7.4 Matrix continued fractions

We can also associate with the matrices V_k an elementary step of a matrix continued fraction. However, in the scalar case the left top element y_k was zero. In the matrix case this will no longer hold. This implies that the matrix generalization of the continued fraction will not only continue the division steps in the denominator, but it will also show a continuing division in the numerator. More precisely, the convergents of the continued fraction on the right in the expansion

$$-s_0^{-1} r_0 = \frac{\,y_0\!\left(c_1 + y_1\,\dfrac{c_2 + y_2\cdots}{a_2 + x_2\cdots}\right)}{\,a_0\!\left(a_1 + x_1\,\dfrac{c_2 + y_2\cdots}{a_2 + x_2\cdots}\right)}, \qquad \text{where } \frac{a}{b} = ab^{-1},$$

are the successive approximants.

8 More general rational approximation

We have mentioned in the previous sections only rational approximation for formal series at 0 or ∞. Of course, it would not be difficult to replace the indeterminate z by x − α, which would result in a Laurent series in the neighborhood of α, or one could consider simultaneous approximation of two series, one at 0 and one at ∞, or even at more points, giving multipoint rational approximations. The extreme cases are where all the interpolation points coincide, e.g., at the origin or at ∞ as we considered before, and at the other extreme the case where all the interpolation points are different (Cauchy interpolation). In the latter case, the formal series in powers of z should be replaced by formal Newton series. In general, one has several series, one for each interpolation point where function values and derivatives up to a certain order are given. However, the general idea of constructing a continued fraction by successive division of the tails, giving better and better approximants, i.e., satisfying more and more interpolation conditions, just like the Euclidean algorithm did, can be maintained. The technicalities are of course more involved, but basically the same. Whether the reader is still prepared to call this a variant of the Euclidean algorithm is of course a matter of taste.

Among all these rational interpolation problems, we especially mention the vector valued Padé approximation problem, which has a longer history. We shall not go into the details, but we note a remarkable duality that has been considered in this context. There are two types of problems that were already considered by Hermite. There is the problem of finding polynomials p_1, …, p_n with certain bounds δ_1, …, δ_n on their degrees such that they approximate given power series f_1, …, f_n up to a certain order ω in the sense that Σ_{k=1}^n p_k(z) f_k(z) = O(z^ω), where ω is as large as possible (ω = Σ_k δ_k + n − 1 at least). This is known as the Padé-Hermite approximation problem. On the other hand, there is the problem of simultaneous Padé approximation, where polynomials p_0, p_1, …, p_n are to be found such that for given series f_1, …, f_n and a given integer ω it holds that f_k(z) p_0(z) − p_k(z) = O(z^ω), and the polynomials should be as simple as possible, i.e., max{deg p_k : k = 0, …, n} should be minimal. There is an intimate relation between the two problems: solving one problem is equivalent to solving the other. This duality is vaguely expressed by saying that the residuals for one problem are the numerators for the other problem.

This duality also shows up in the problems we have considered previously. For example, the approximation of a real number by a rational one is dual to the problem of solving a linear Diophantine equation, etc. The computation of a GCD is dual to the computation of a continued fraction expansion. Walking on a downward sloping diagonal in the Padé table is dual to walking on an upward sloping diagonal. Solving nested Hankel systems growing in SE direction is dual to solving Hankel systems growing in NW direction, etc. We also mention here the duality between the matrix rational approximation problems that can be expressed as a left or a right matrix fraction. For a discussion of the general duality principle in vector Padé-Hermite problems, see [16]. By now multipoint and matrix versions have been considered [4, 5, 48, 47]. The block structure of these problems for the scalar case has been discussed recently by M. Gutknecht [29].
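The Padé-Hermite conditions of the previous paragraph translate directly into a homogeneous linear system; the following sketch (ours, a dense linear algebra illustration rather than a fast Euclidean-type recursion) computes one solution via a nullspace vector.

# Our sketch: Pade-Hermite approximation as a homogeneous system; a nullspace
# vector gives p_1,...,p_n of degrees delta_k with sum p_k f_k = O(z^omega),
# omega = sum(delta_k) + n - 1.  Coefficient lists are ordered low degree first.
import numpy as np
from math import factorial

def pade_hermite(series, degrees):
    omega = sum(degrees) + len(series) - 1
    cols = []
    for f, d in zip(series, degrees):
        for j in range(d + 1):                   # column for the z^j coefficient of p_k
            col = np.zeros(omega)
            col[j:] = f[:omega - j]              # coefficients of z^j * f_k(z)
            cols.append(col)
    C = np.column_stack(cols)                    # omega x (omega+1), so a kernel exists
    p = np.linalg.svd(C)[2][-1]                  # right singular vector spanning the kernel
    out, pos = [], 0
    for d in degrees:
        out.append(p[pos:pos + d + 1]); pos += d + 1
    return out, omega

f1 = np.array([1.0 / factorial(k) for k in range(8)])   # exp(z)
f2 = np.ones(8)                                          # 1/(1-z)
(p1, p2), omega = pade_hermite([f1, f2], [2, 2])
resid = np.convolve(p1, f1)[:omega] + np.convolve(p2, f2)[:omega]
print(omega, np.max(np.abs(resid)) < 1e-12)              # 5 True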

9 Look-ahead strategies

The latest success obtained in this circle of problems is a breakthrough in improving the numerical stability of this type of algorithms. This is the problem we mentioned before of how to make the algorithms deal with near-breakdown, and not just with exact breakdown.

Taylor was probably the first to propose a look-ahead strategy in the context of the Lanczos algorithm; see [44, 42]. These papers coined the term look-ahead. See also [10, 22, 24, 25]. The same idea has been implemented for the related problems that were discussed in this survey, like for example Padé approximation [29], the solution of Hankel [51, 40, 15] and Toeplitz systems [32, 31], and general rational (matrix) interpolation problems [31, 49].

The EA gives a reliable algorithm because it jumps over the singular blocks in the Padé table, or, in terms of Hankel matrices, it skips the singular leading submatrices. Such a singularity is detected by a pivot element being zero. Near-breakdown is detected when this pivot is small. Continuing the computations would blow up all the numbers, which, when applied in finite arithmetic, eventually results in a dramatic loss of accuracy. Thus numerically it makes more sense not to leap from a nonsingular situation to the next one, but to leap from a well conditioned situation to a well conditioned one. This then results in a stable EEA (SEEA). In a scalar situation, this pivot element is just the leading coefficient in the residual series. For the matrix extensions of these problems, the situation is similar. For example, in the MPR problem, the degrees of the columns of the residual series r_k are decreased by eliminating leading column coefficients. This elimination is obtained by subtracting polynomial multiples of columns of s_k. Thus the leading column coefficients in s_k should form a well conditioned matrix if the elimination is to be numerically stable. A similar situation holds for all the rational approximation problems we have discussed in previous sections.

Because a well-conditioned situation is a regular one by definition, the transition matrix V_k^s that we shall need in a SEEA to jump from one approximant to the next one will be a product of some of the V_k matrices of the EEA. Therefore, the matrix V_k^s will not have the same simple form as a V_k with y_k = 0, x_k and c_k units and only a_k polynomial. In general V_k^s will have polynomial entries in all four positions, but it will still be unimodular. For matrix problems however, the simple structure was lost anyway and there is not a big difference between a matrix of type V_k and a matrix of type V_k^s. It does mean that now, even in the scalar case, we shall have to consider more general continued fractions where fractions continue in the numerator as well as in the denominator.

The same idea of grouping together several steps of the simple EEA was already used in the superfast versions of the EEA. Suppose we need N steps of a classical EEA to reach a solution; then the divide and conquer strategy groups the product V_1 ⋯ V_N as V_{1n} V_{n+1,N} where n = ⌊N/2⌋. In terms of Padé approximation, this means that to have an order α_{0,N+1} approximant of f = −s_0^{−1} r_0, we may compute an order α_{0,n+1} approximant of f and an order α_{0,N+1} − α_{0,n+1} approximant of the tail of the continued fraction. These two can then be combined to give the desired approximant. By applying this strategy recursively, the superfast algorithms emerge.

In the previous discussion, a leap from a well conditioned point to a well conditioned point was always supposed to be on a path that is normally followed by the EEA, thus on a diagonal path. By introducing shift parameters [48, 47] this extends to any parallel diagonal.
By changing the number of interpolation conditions as well as the degree conditions, one can in principle walk from any regular (well conditioned) point to any other regular (well conditioned) point in the interpolation table. The idea is that the solution of an interpolation problem can be written as a system of linear equations. The space of all the solutions of this problem is a submodule which is generated by polynomial multiples of some basis of polynomial vectors. One can move from one point to another one if the basis has leading column vector coefficients that form a sufficiently well conditioned matrix.
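The following toy (ours; the tolerance and the SVD-based test are arbitrary illustrative choices) mimics the look-ahead idea on a symmetric Hankel matrix: the pivot block is enlarged until it is well conditioned, and the resulting A^T H A is block diagonal.

# Hedged toy of the look-ahead idea: block factorization of a symmetric Hankel
# matrix in which the pivot block is grown until it is well conditioned, rather
# than jumping only over exactly singular leading submatrices.
import numpy as np

def blocks_lookahead(H, tol=1e-2):
    n = H.shape[0]
    A = np.eye(n)                     # accumulated unit upper triangular transformation
    S = H.astype(float)               # running Schur complement
    k, sizes = 0, []
    while k < n:
        s = 1
        # look ahead: grow the pivot block until its smallest singular value is acceptable
        while k + s < n and np.linalg.svd(S[k:k+s, k:k+s], compute_uv=False)[-1] < tol:
            s += 1
        if k + s < n:
            U = np.eye(n)
            U[k:k+s, k+s:] = -np.linalg.solve(S[k:k+s, k:k+s], S[k:k+s, k+s:])
            S = U.T @ S @ U           # block elimination
            A = A @ U
        sizes.append(s)
        k += s
    return A, A.T @ H @ A, sizes

# Hankel matrix h_ij = f_{i+j} whose leading 1x1 "pivot" is nearly zero
f = np.array([1e-10, 3, 1, 4, 1, 5, 9, 2, 6], dtype=float)
H = np.array([[f[i + j] for j in range(5)] for i in range(5)])
A, D, sizes = blocks_lookahead(H)
print(sizes)                              # [2, 1, 1, 1]: the tiny pivot is jumped over
off = D.copy()
pos = 0
for s in sizes:
    off[pos:pos+s, pos:pos+s] = 0.0
    pos += s
print(np.max(np.abs(off)) < 1e-8)         # True: A^T H A is block diagonal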


10 Concluding remarks

We have indicated how the simple idea of the Euclidean algorithm shows up in several applications. An achievement in one of the solution methods in this circle of ideas will, of course with the appropriate adaptations, be applicable to the other problems as well. The further we move away from the simple scalar case, the more difficult it becomes to recognise the Euclidean algorithm. So maybe it is not appropriate to call it a variant or generalization of the Euclidean algorithm; we only wanted to show that the same basic idea is present in the EA and in all the other solution methods. If one is prepared to move even further away from the basic algorithm, one could for example design Lanczos type algorithms which are related to Toeplitz matrices or even general matrices, rather than to Hankel matrices [14]. One could think of matrices which are not Hankel or Toeplitz but are structured; see for example [43]. An important challenge for future research is to see how time-variant system theory can be applied to the circle of ideas we have proposed here. In time-variant systems, the matrices A, B and C also depend on k. Many results in the field of signal processing have been obtained already. See [27, 50, 17].

References

[1] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The design and analysis of computer algorithms. Addison Wesley, Reading, Mass., 1974.
[2] B.M. Anderson, F.M. Brasch, and P.V. Lopresti. The sequential construction of minimal partial realizations from finite input-output data. SIAM J. Control, 13(3):552–571, May 1975.
[3] A.C. Antoulas. On recursiveness and related topics in linear systems. IEEE Trans. Automatic Control, AC-31:1121–1134, 1986.
[4] B. Beckermann. Zur Interpolation mit polynomialen Linearkombinationen beliebiger Funktionen. PhD thesis, Universität Hannover, 1989.
[5] B. Beckermann and G. Labahn. A uniform approach for Hermite-Padé and simultaneous Padé approximants and their matrix-type generalizations. Numer. Alg., 3(1-4):45–54, 1992.
[6] E.R. Berlekamp. Algebraic coding theory. McGraw-Hill, New York, 1968.
[7] R.P. Brent, F.G. Gustavson, and D.Y.Y. Yun. Fast solution of Toeplitz systems of equations and computation of Padé approximants. J. Algorithms, 1:259–295, 1980.
[8] R.P. Brent and H.T. Kung. Systolic VLSI arrays for polynomial GCD computation. IEEE Trans. on Computers, C-33(8):731–736, 1984.
[9] A.J. Brentjes. Multi-dimensional continued fraction algorithms, volume 145. Mathematical Centre, Amsterdam, The Netherlands, 1981.
[10] C. Brezinski, M. Redivo Zaglia, and H. Sadok. Avoiding breakdown and near-breakdown in Lanczos type algorithms. Numerical Algorithms, 1(3):261–284, 1991. Addendum, vol. 2(2):133–136, 1992.
[11] A. Bultheel and M. Van Barel. A matrix Euclidean algorithm and the matrix minimal Padé approximation problem. In C. Brezinski, editor, Continued fractions and Padé approximants, pages 11–51. North-Holland, 1990.

[12] A. Bultheel and M. Van Barel. Formal orthogonal polynomials and rational approximation for arbitrary bilinear forms. Technical Report TW163, Department of Computer Science, K.U. Leuven, December 1992.
[13] A. Bultheel and M. Van Barel. Euclid, Padé and Lanczos, another golden braid. Technical Report TW188, Department of Computer Science, K.U. Leuven, April 1993.
[14] A. Bultheel and M. Van Barel. Linear algebra, rational approximation and orthogonal polynomials, 1993. (In preparation).
[15] S. Cabay and R. Meleshko. A weakly stable algorithm for Padé approximants and the inversion of Hankel matrices. SIAM J. Matrix Anal. Appl., 1993.
[16] G. De Samblanx. Vektor Padé-Hermite benaderingen (Vector Padé-Hermite approximation). Master's thesis, K.U. Leuven, Dept. Computing Sci., 1993. (In Dutch).
[17] P. Dewilde, M.A. Kaashoek, and M. Verhaegen, editors. Challenges of a generalized, nonstationary system theory. Essays of the Dutch Academy of Arts and Sciences. Dutch Acad. Arts Sci., Amsterdam, 1993.
[18] B.W. Dickinson, M. Morf, and T. Kailath. A minimal realization algorithm for matrix sequences. IEEE Trans. Autom. Control, AC-19:31–38, 1974.
[19] A. Draux. Polynômes orthogonaux formels – applications, volume 974 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 1983.
[20] Gui-Liang Feng and K.K. Tzeng. A generalized Euclidean algorithm for multisequence shift-register synthesis. IEEE Trans. Information Theory, 35(3):584–594, 1989.
[21] Gui-Liang Feng and K.K. Tzeng. A generalization of the Berlekamp-Massey algorithm for multisequence shift-register synthesis with applications to decoding cyclic codes. IEEE Trans. Information Theory, 37(5):1274–1287, 1991.
[22] R.W. Freund. The look-ahead Lanczos process for large nonsymmetric matrices and related algorithms. In M.S. Moonen, G. Golub, and B.L.R. De Moor, editors, Linear algebra for large scale and real-time applications, volume 232 of NATO-ASI Series E: Applied Sciences, pages 137–163. Kluwer Acad. Publ., Dordrecht, 1993.
[23] R.W. Freund, G.H. Golub, and N.M. Nachtigal. Iterative solution of linear systems. Acta Numerica, 1:57–100, 1992.
[24] R.W. Freund, M.H. Gutknecht, and N.M. Nachtigal. An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices, Part I. SIAM J. Sci. Stat. Comp., 14(1):137–158, 1993.
[25] R.W. Freund and N.M. Nachtigal. An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices, Part II. Technical Report 90.46, RIACS, NASA Ames Research Center, Moffett Field, 1990.
[26] P.A. Fuhrmann. A matrix Euclidean algorithm and matrix continued fraction expansion. Systems and Control Letters, 3:263–271, 1983.
[27] I. Gohberg, editor. Time-variant systems and interpolation, volume 56 of Operator Theory, Advances and Applications. Birkhäuser Verlag, 1992.
[28] W.B. Gragg and A. Lindquist. On the partial realization problem. Lin. Alg. Appl., 50:277–319, 1983.

[29] M.H. Gutknecht. Block structure and recursiveness in rational interpolation. In E.W. Cheney, C.K. Chui, and L.L. Schumaker, editors, Approximation Theory VI, pages 93–130. Academic Press, 1992.
[30] M.H. Gutknecht. A completed theory of the unsymmetric Lanczos process and related algorithms. Part I. SIAM J. Matrix Anal. Appl., 13(2):594–639, 1992.
[31] M.H. Gutknecht. Stable row recurrences for the Padé table and generically superfast look-ahead solvers for non-Hermitian Toeplitz systems. Lin. Alg. Appl., 188/189:351–422, 1993.
[32] M.H. Gutknecht and M. Hochbruck. Look-ahead Levinson and Schur algorithms for non-Hermitian Toeplitz systems. In XII Householder Symposium on Numerical Algebra, 1993.
[33] E. Jonckheere and C. Ma. A simple Hankel interpretation for the Berlekamp-Massey algorithm. Lin. Alg. Appl., 125:65–76, 1989.
[34] T. Kailath. Linear Systems. Prentice-Hall, Inc., Englewood Cliffs, N.J., 1980.
[35] D.E. Knuth. The art of computer programming, Vol. 2: Seminumerical algorithms. Addison Wesley, Reading, Mass., 1969.
[36] A. Magnus. Expansion of power series into P-fractions. Math. Z., 80:209–216, 1962.
[37] J.L. Massey. Shift-register synthesis and BCH decoding. IEEE Trans. Inf. Theory, IT-15:122–127, 1969.
[38] R.J. McEliece. The theory of information and coding, A mathematical framework for communication, volume 3 of Encyclopedia of Mathematics. Addison Wesley, Reading, Mass., 1977.
[39] R.J. McEliece and J.B. Shearer. A property of Euclid's algorithm and an application to Padé approximation. SIAM J. Appl. Math., 34:611–616, 1978.
[40] R.J. Meleshko. A stable algorithm for the computation of Padé approximants. PhD thesis, Dept. of Computing Sc., Univ. of Alberta, Canada, 1990.
[41] B.N. Parlett. Reduction to tridiagonal form and minimal realizations. SIAM J. Matr. Anal. Appl., 13:567–593, 1992.
[42] B.N. Parlett, D.R. Taylor, and Z.A. Liu. A look-ahead Lanczos algorithm for unsymmetric matrices. Mathematics of Comp., 44(169):105–124, 1985.
[43] A.H. Sayed. Displacement structure in signal processing and mathematics. PhD thesis, Stanford University, August 1992.
[44] D.R. Taylor. Analysis of the look ahead Lanczos algorithm. PhD thesis, Center for Pure and Applied Mathematics, University of California, Berkeley, 1982.
[45] M. Van Barel, B. Beckermann, A. Bultheel, and G. Labahn. A look-ahead method to compute vector Padé-Hermite approximants. 1993. (In preparation).
[46] M. Van Barel and A. Bultheel. A matrix Euclidean algorithm for minimal partial realization. Linear Algebra Appl., 121:674–682, 1989.
[47] M. Van Barel and A. Bultheel. A new approach to the rational interpolation problem: the vector case. J. Comput. Appl. Math., 33(3):331–346, 1990.
[48] M. Van Barel and A. Bultheel. A general module theoretic framework for vector M-Padé and matrix rational interpolation. Numer. Alg., 3:451–461, 1992.

[49] M. Van Barel and A. Bultheel. The "look-ahead" philosophy applied to matrix rational interpolation problems. In Proceedings MTNS, Regensburg, Germany, August 1993, 1993. Submitted.
[50] A.J. van der Veen. Time-variant system theory and computational modeling: realization, approximation and factorization. PhD thesis, Technical University Delft, The Netherlands, June 1993.
[51] P. Vyvey. Oplossen van niet sterk reguliere Toeplitz stelsels (The solution of non strongly regular Toeplitz systems). Master's thesis, K.U. Leuven, Dept. Computing Sci., 1987. (In Dutch).