Lecture Notes for Mathematics C05: Numerical Linear Analysis

Stephen Roberts
Department of Mathematics, School of Mathematical Sciences, A.N.U.
1995
Contents

1 Introduction
  1.1 Sources of Matrix Problems
  1.2 A Circuit Problem
  1.3 A Differential Equation
  1.4 Least Squares Fitting or Regression

2 Computer Arithmetic
  2.1 Floating Point Numbers
  2.2 Error Measurement
  2.3 Arithmetic Operations
  2.4 Example of Catastrophic Cancellation
  2.5 Operation Count

3 Vector and Matrix Norms
  3.1 Norms
  3.2 Examples of Common Norms
  3.3 Matrix Norms
  3.4 Operator Matrix Norms
  3.5 Singular Values

4 Conditioning
  4.1 Conditioning of Matrix Problems
  4.2 Condition of a Problem

5 Direct Solution of Linear Equations
  5.1 Gaussian Elimination
  5.2 The LU Decomposition
  5.3 Solving Linear Systems
  5.4 Breakdown and Pivoting
  5.5 An Example of Gaussian Elimination
  5.6 Calculation of Condition Number
  5.7 Error Analysis for Gaussian Elimination
  5.8 Outline of Convergence Analysis

6 Orthogonal Factorizations
  6.1 Orthogonal Factorizations
  6.2 Householder Transformations
  6.3 The QR Factorization Theorem
  6.4 Application to Least Squares Problem
  6.5 Singular Value Decomposition

7 Iterative Solution of Linear Equations
  7.1 Introduction
  7.2 The Jacobi Method
  7.3 The Gauss-Seidel Method
  7.4 Convergence Analysis
      7.4.1 Application to Gauss-Seidel
  7.5 Asymptotic Rate of Convergence
  7.6 Relaxation Methods
      7.6.1 Convergence of SOR
      7.6.2 SOR for Special Problems
  7.7 Iterative Improvement

8 The Conjugate Gradient Method
  8.1 Introduction
      8.1.1 Preliminaries
  8.2 Gradient Method: Steepest Descent
  8.3 Conjugate Gradient Method

9 The Eigenvalue Problem
  9.1 Introduction
  9.2 The Power Method
      9.2.1 Convergence Rate of Power Algorithm
      9.2.2 Shifts
      9.2.3 Aitken Δ² Algorithm
  9.3 Inverse Iteration or the Inverse Power Method
      9.3.1 Error Analysis of Inverse Iteration
  9.4 The QR Algorithm
      9.4.1 Reduction to Tridiagonal Form
      9.4.2 The QR Algorithm
      9.4.3 Stopping Criterion
      9.4.4 Example of the QR Algorithm
Chapter 1

Introduction

1.1 Sources of Matrix Problems

The numerical solution of nearly all physical problems involves to some extent the solution of a linear system of equations. In this section we give a few examples of where linear equations arise in common problems.
1.2 A Circuit Problem

There are a number of problems that are equivalent to the solution of a linear equation. Consider the problem of calculating the currents flowing through the circuit represented in figure 1.1.

Figure 1.1: A simple circuit

Using Kirchhoff's Laws (the currents through each node must balance and the potential around any loop is zero) and Ohm's Law (V = IR), we can derive a system of linear equations for the currents through each of the sections of the circuit. For instance, we have that

    I_1 - I_5 - I_6 - I_2 = 0
    I_1 R_1 + I_5 R_6 - I_4 R_3 = 0

since the current through node N is conserved and the potential around loop L is zero. Altogether we obtain a system of linear equations Ax = b where x is the vector of the currents I_i, i = 1, ..., 11. The solution of our circuit problem can be formulated as consisting of three stages: (1) formulate the mathematical model, (2) compute the specific problem (find A and
b), and (3) numerically solve the specific problem (solve Ax = b). It is in the third stage that we are particularly interested in this course.
1.3 A Differential Equation

As another example consider the differential equation

    y_xx(x) + cos(x) y(x) = log(x + 4)  on [0, 1],    y(0) = 0,  y(1) = 1,

where y_xx denotes the second derivative of y with respect to x. To approximate this problem on a computer we must discretize the equation to obtain a finite problem. Let us use the finite difference method. First we discretize the interval [0, 1] into n uniform sub-intervals (see figure 1.1). The length of each sub-interval will be h = 1/n. We will also let x_i = ih (hence x_0 = 0 and x_n = 1).

Figure 1.1: Discretization of the interval

We replace the second order derivative in the equation with the simple finite difference formula

    y_xx(x_i) ≈ (y_{i-1} - 2y_i + y_{i+1}) / h²,    i = 1, ..., n-1,

where y_i is the approximation of y(x_i). The boundary conditions are imposed by demanding that y_0 = 0 and y_n = 1. The approximate equation at the point x_i is then

    (y_{i-1} - 2y_i + y_{i+1}) / h² + cos(x_i) y_i = log(x_i + 4),    i = 1, ..., n-1,    y_0 = 0,  y_n = 1.

As a matrix equation this becomes Ay = b where A = (a_ij) with

    a_ij = -2 + h² cos(x_i)   if i = j,
           1                  if i = j + 1 or i = j - 1,
           0                  otherwise,

the approximate solution is given by the column vector y = (y_1, ..., y_{n-1})^T, and the data b = (b_i) satisfies

    b_i = h² log(x_i + 4)           for i = 1, ..., n-2,
          h² log(x_{n-1} + 4) - 1   for i = n-1.

In this case our original problem has been approximated by a finite linear system of equations. To obtain a reasonable approximation to the differential equation we will need of the order of 100 node points. The size of the corresponding matrix problem for higher dimensional problems grows with the power of the dimension. Hence it is very easy to generate very large matrix problems from practical problems.
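For a concrete picture of the resulting system, the following MATLAB fragment assembles the tridiagonal matrix and right hand side above and solves for the interior values; the function name fdsolve and the use of the backslash solver are illustrative choices made here, not part of the notes.

function y = fdsolve(n)
% FDSOLVE  Finite difference approximation of
%   y'' + cos(x) y = log(x+4) on [0,1],  y(0) = 0, y(1) = 1,
% using n uniform sub-intervals.
h = 1/n;
x = (1:n-1)'*h;                              % interior nodes x_1, ..., x_{n-1}
A = diag(-2 + h^2*cos(x)) ...
  + diag(ones(n-2,1),1) + diag(ones(n-2,1),-1);
b = h^2*log(x + 4);
b(n-1) = b(n-1) - 1;                         % boundary value y_n = 1 moved to the right hand side
y = A\b;                                     % approximations y_1, ..., y_{n-1}

For n of the order of 100 this dense assembly is already wasteful; sparse storage and the iterative methods of chapter 7 become attractive for such problems.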
1.4 Least Squares Fitting or Regression

Suppose we have n data points d_k at observational points t_k, k = 1, ..., n. The data points are assumed to be modeled by a function d(t) of the form

    d(t) = Σ_{i=1}^{m} c_i g_i(t).

The functions g_i are assumed to be known functions. For instance, it is common to suppose that the functions g_i consist of the polynomials g_i(t) = t^{i-1}. The task is to determine the coefficients c_i which best match the observations. In a perfect situation

    d_k = Σ_{i=1}^{m} c_i g_i(t_k)                                            (1.1)

for all k = 1, ..., n. But in practical situations the equations (1.1) are not exact, either because errors occur in obtaining the data or because the model is not correct. It is common to determine the coefficients which minimize the sum of squares of the residues

    r_k = d_k - Σ_{i=1}^{m} c_i g_i(t_k).

Let

    E(c_1, ..., c_m) = Σ_{k=1}^{n} r_k².

To find the coefficients which minimize E we need to solve ∂E/∂c_i = 0 for i = 1, ..., m. This implies that

    Σ_{k=1}^{n} [ d_k - Σ_{j=1}^{m} c_j g_j(t_k) ] g_i(t_k) = 0   for i = 1, ..., m.

Rearranging leads to

    Σ_{j=1}^{m} c_j [ Σ_{k=1}^{n} g_j(t_k) g_i(t_k) ] = Σ_{k=1}^{n} d_k g_i(t_k).      (1.2)

This is a linear system for the coefficients c_i. Let

    c = (c_1, ..., c_m)^T   and   d = (d_1, ..., d_n)^T

be the coefficient and data vectors. If we also let A denote the n × m matrix with components

    a_ij = g_j(t_i)   for i = 1, ..., n and j = 1, ..., m,
then the equation for the coefficient vector (Equation 1.2) can be written A^T A c = A^T d. This equation is known as the normal equation for the associated least squares problem. The computational problem reduces to solving the normal equations. We have given just three situations in which linear equations arise. The book by Rice [Ric83] describes a number of other common problems in which linear equations arise.
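As a small illustration, the MATLAB fragment below fits a quadratic (g_i(t) = t^{i-1}, m = 3) to some synthetic data by forming and solving the normal equations; the data and variable names here are invented for the example.

% Least squares fit of d(t) = c_1 + c_2 t + c_3 t^2 to data (t_k, d_k).
t = (0:0.1:1)';                              % observation points (made-up data)
d = 1 + 2*t - t.^2 + 0.01*randn(size(t));    % noisy observations
A = [ones(size(t)), t, t.^2];                % a_ij = g_j(t_i)
c = (A'*A) \ (A'*d);                         % solve the normal equations A'A c = A'd

Chapter 6 returns to this problem and shows how a QR factorization of A avoids squaring the condition number.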
Chapter 2

Computer Arithmetic

2.1 Floating Point Numbers

To actually perform numerical calculations on a computer or calculator it is usually necessary to approximate the numerical quantities so that they can be stored in the machine. The most commonly available method is to use a floating point system. This is a machine version of scientific notation where the numbers are represented with a fixed number of significant digits in a specific base. For example 3.105 × 10^8 has 4 significant figures in base 10 notation. In general we will consider sets of floating point numbers which consist of a zero element +0.00...0 and the numbers of the form

    ±0.d_1 d_2 ... d_n × β^e

where β is the base (an integer), d_1, d_2, ..., d_n are digits which satisfy 0 < d_1 < β, 0 ≤ d_i < β (i = 2, ..., n), and e is the exponent, which is required to lie in the range m ≤ e ≤ M. Note that we have normalized the floating point numbers in the sense that the first digit d_1 is non-zero. Such a system of normalized floating point numbers, together with the zero element, is denoted F(β, n, m, M).
Example 1 As an example, the floating point system F(2, 2, -1, 2) consists of the number zero and the numbers of the form ±.d_1 d_2 × 2^e where d_1 = 1, d_2 = 0 or 1, and the exponent satisfies -1 ≤ e ≤ 2. The complete set is 0 and the non-zero numbers shown in table 2.1.
    Negatives                     Positives
    -.10 × 2^{-1} = -1/4          +.10 × 2^{-1} = 1/4
    -.11 × 2^{-1} = -3/8          +.11 × 2^{-1} = 3/8
    -.10 × 2^0  = -1/2            +.10 × 2^0  = 1/2
    -.11 × 2^0  = -3/4            +.11 × 2^0  = 3/4
    -.10 × 2^1  = -1              +.10 × 2^1  = 1
    -.11 × 2^1  = -3/2            +.11 × 2^1  = 3/2
    -.10 × 2^2  = -2              +.10 × 2^2  = 2
    -.11 × 2^2  = -3              +.11 × 2^2  = 3

Table 2.1: Non-zero elements of the floating point system F(2, 2, -1, 2).

The total number of floating point numbers in the system F(β, n, m, M) is given by the formula

    2(β - 1) β^{n-1} (M - m + 1) + 1.

Here the factors are: 2 for the sign, (β - 1) for the first digit d_1, β^{n-1} for the other n - 1 digits, M - m + 1 for the exponents, and +1 for zero.

Suppose we are using a certain floating point system to represent numbers on a computer. An arbitrary real number x is usually represented by the number in the floating point system to which it is closest. Suppose that a real number has an expansion (if β = 10, we have a decimal expansion and if β = 2 a binary expansion)

    x = 0.d_1 d_2 ... d_n d_{n+1} ... × β^e

where 0 < d_1 and 0 ≤ d_i for i > 1. If x is to be represented in the system F(β, n, m, M) we first need m ≤ e ≤ M. Otherwise x will cause an underflow or an overflow. If m ≤ e ≤ M, the floating point representation of x in F(β, n, m, M) is usually obtained by either "chopping" or "rounding". The chopped representation is given by

    chop(x) = 0.d_1 d_2 ... d_n × β^e.

For rounding, the representation is given by

    round(x) = sign(x) chop(|x| + (1/2) β^{e-n})

(one half is added in the (n+1)-st significant digit). We will use the notation fl(·) to represent either round(·) or chop(·) as required. As an example of rounding and chopping, let us consider the representation of x = 8/3 = +0.266666... × 10^1 in F(10, 3, -10, 10). The representation will be fl(x) = +0.266 × 10^1 for chopping and +0.267 × 10^1 for rounding.

In F(β, n, m, M) the largest numbers in absolute value that can be represented are ±0.d_1 d_2 ... d_n × β^e where d_1 = d_2 = ... = d_n = β - 1 and e = M (which is approximately equal to β^M). If a computer tries to use a floating point system to represent numbers larger than these numbers, we say that overflow has occurred. The smallest numbers in absolute value (other than zero) that can be represented are ±0.100...0 × β^m. Underflow is said to occur if a representation of a number smaller than these smallest numbers is attempted.
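The following MATLAB sketch mimics chopping to n significant base-β digits (the function name chopfl is invented here, and the exponent range of the system is ignored); replacing fix by round gives the rounded representation.

function y = chopfl(x, beta, n)
% CHOPFL  Chopped representation of x with n significant base-beta
% digits (the exponent range m <= e <= M is not checked here).
if x == 0, y = 0; return, end
e = floor(log(abs(x))/log(beta)) + 1;   % exponent so that 1/beta <= |x|/beta^e < 1
m = x/beta^e;                           % normalized mantissa
y = fix(m*beta^n)/beta^n * beta^e;      % keep the first n digits

For example chopfl(8/3, 10, 3) returns 2.6600, in agreement with the chopped value +0.266 × 10^1 above.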
2.2 Error Measurement

What error do we make in representing a real number x by fl(x) ∈ F(β, n, m, M)?
An example shows that the error is not uniform. In F(10, 2, m, M) with chopping, x = 0.522 is represented by fl(x) = 0.52 and |x - fl(x)| = 0.002. If x = 812.0, then fl(x) = 810.0 and |x - fl(x)| = 2.0. These are absolute errors. A better measure is the relative error

    δ_r = (approx. value - true value) / true value,

provided the true value is not zero. Returning to our example we see that for x = 0.522 the relative error is about 0.004 in magnitude; for x = 812.0 it is about 0.002. It is easy to see that the absolute error depends on x, but the relative error can be shown to be bounded independently of x. First we give a definition.
Definition 1 (Unit of Roundoff) For a given system F(β, n, m, M), the unit of roundoff is defined to be

    u = (1/2) β^{1-n}   for a system that rounds,
        β^{1-n}         for a system that chops.

Theorem 1 (Wilkinson's Theorem) If a real number x is within the range of F(β, n, m, M) then fl(x) = x(1 + δ) where |δ| ≤ u. If x ≠ 0 then

    |δ_r| = |x - fl(x)| / |x| ≤ u.

Proof. We will give a proof of this result for chopping. Let x be a real number within the range of the floating point system F(β, n, m, M). Without loss of generality we assume that x is positive. Suppose that x has an infinite expansion

    x = 0.d_1 d_2 ... d_n d_{n+1} ... × β^e

where d_1 > 0. The floating point representation of x will be given by fl(x) = 0.d_1 d_2 ... d_n × β^e and so

    x - fl(x) = 0.00...0 d_{n+1} d_{n+2} ... × β^e = 0.d_{n+1} d_{n+2} ... × β^{e-n}.

The relative error is then given by

    (x - fl(x)) / x = (0.d_{n+1} d_{n+2} ... × β^{e-n}) / (0.d_1 d_2 ... d_n d_{n+1} ... × β^e)
                    = (0.d_{n+1} d_{n+2} ...) / (0.d_1 d_2 ... d_n d_{n+1} ...) × β^{-n}.

Since d_i ≤ β - 1 for all i ≥ 1, we have that 0.d_{n+1} d_{n+2} ... ≤ 1. On the other hand d_i ≥ 0 for i > 1 and d_1 ≥ 1, so 0.d_1 d_2 ... d_n d_{n+1} ... ≥ 0.1 (base β) = β^{-1}. Consequently

    (0.d_{n+1} d_{n+2} ...) / (0.d_1 d_2 ... d_n d_{n+1} ...) ≤ β

and so

    (x - fl(x)) / x ≤ β^{1-n} = u.   □
2.3 Arithmetic Operations

The result of an arithmetic operation applied to numbers in a particular floating point system may yield a result which cannot be represented exactly in that system. We must distinguish between the true algebraic operation and the machine computed operation. To each algebraic operation there is a floating point version (i.e. the computed version). If ∗ represents a generic arithmetic binary operation then we will use the notation ⊛ to represent the corresponding floating point operation. It is reasonable to assume that the machine can calculate the exact value of the arithmetic operation and then simply projects this correct value into the floating point system. Hence we will assume that

    x ⊕ y = fl(x + y)
    x ⊖ y = fl(x - y)
    x ⊗ y = fl(x × y)
    x ⊘ y = fl(x / y)

From Wilkinson's theorem we have that x ⊛ y = fl(x ∗ y) = (x ∗ y)(1 + δ) with |δ| ≤ u, provided x ∗ y lies within the range of the floating point system. In the system F(10, 3, -10, 10) with rounding, if x = 0.315 × 10^1 and y = 0.268 × 10^{-1}, then

    x + y = 0.31768 × 10^1            x ⊕ y = 0.318 × 10^1
    x - y = 0.31232 × 10^1            x ⊖ y = 0.312 × 10^1
    x × y = 0.84420 × 10^{-1}         x ⊗ y = 0.844 × 10^{-1}
    x / y = 0.11753731... × 10^3      x ⊘ y = 0.118 × 10^3

Suppose we use F(10, 3, -10, 10) and compute x ⊘ y where x = 0.300 × 10^{-6} and y = 0.400 × 10^9. The true value of x/y is 0.750 × 10^{-15}, which chops or rounds to zero, so x ⊘ y = 0. This is an example of underflow. The relative error is of order one for this example, much larger than the corresponding unit of roundoff. In the same floating point system, let x = 0.300 × 10^7 and y = 0.200 × 10^7. Then x ⊗ y = 0.600 × 10^{13}, which is outside the range of the floating point system. This is an example of overflow. If the computer detects such an event it should return an error.

It is interesting to consider whether the usual algebraic laws hold in a floating point system. It is easy to see that the commutative laws of addition and multiplication hold. On the other hand the associative laws do not hold in general. In particular x ⊕ (y ⊕ z) ≠ (x ⊕ y) ⊕ z and x ⊗ (y ⊗ z) ≠ (x ⊗ y) ⊗ z in general. For instance, in F(10, 3, -10, 10) with rounding, if we let x = 0.218 × 10^0, y = 0.411 × 10^{-3} and z = 0.302 × 10^{-3}, then
    x ⊕ (y ⊕ z) = 0.219 × 10^0   and   (x ⊕ y) ⊕ z = 0.218 × 10^0.
In general, any arithmetic law which involves more than one binary operation will not hold exactly in a floating point system.

Let us now use Theorem 1 to obtain a bound on the error obtained when calculating x ⊗ (y ⊗ z). We see that

    y ⊗ z = (y × z)(1 + δ_1)   where |δ_1| ≤ u,

and

    x ⊗ (y ⊗ z) = (x × (y ⊗ z))(1 + δ_2) = (x × y × z)(1 + δ_1)(1 + δ_2)

where |δ_2| ≤ u. Hence the relative error is equal to

    δ_r = |x × y × z - x ⊗ (y ⊗ z)| / |x × y × z| = |1 - (1 + δ_1)(1 + δ_2)| ≤ |δ_1| + |δ_2| + |δ_1 δ_2|

and so

    δ_r ≤ 2u + u².

Hence the relative error obtained when three numbers are multiplied is bounded by a term which is independent of the numbers, and is of the same order as the unit of roundoff.

With the help of a simple example, we see that the addition of a sequence of numbers is not as well behaved. In F(10, 3, -10, 10) with rounding (u = 5 × 10^{-3}), if x = 0.1 × 10^4, y = 0.1 × 10^1 and z = -0.1 × 10^4, then we have that x ⊕ y = 0.1 × 10^4 and so

    (x ⊕ y) ⊕ z = 0.

The relative error is 1, which is considerably larger than the roundoff error; this is an example of so called catastrophic cancellation. This problem is fundamental to numerical linear algebra as it especially affects the evaluation of inner products (long sequences of additions). Essentially, if the inner product of two vectors is close to zero, then the relative error in calculating the inner product can be large. Consequently, even if the computed inner product of two vectors is very close to zero, we cannot guarantee that the vectors are close to being orthogonal.
2.4 Example of Catastrophic Cancellation

Let us consider a situation in which this problem arises. Suppose we want to calculate e^x. Now we know that

    e^x = Σ_{n=0}^{∞} x^n / n!

Consider the MATLAB program shown in figure 2.1 which can be used to approximate e^x.

function ans = ex(x)
%
% EX  Simple MATLAB Program to calculate exp(x)
% using naive taylor series method
%
sum = 1; term = 1; i = 0;
while ( sum+term ~= sum )
   i = i + 1;
   term = term*(x/i);
   sum = sum + term;
end
ans = sum;

Figure 2.1: The MATLAB function ex.

Note that we stop the calculation when the addition of a new term does not make a difference to the sum. Table 2.1 compares the results from this program with the exact value of e^x. For x > 0 the results are fine, but for x < 0 the results become progressively worse as x → -∞. What is happening? For positive x we have a sum of positive terms. For negative x we have a sum consisting of large positive and negative terms which add to a number close to zero. Table 2.2 displays the terms in the series for e^{-15}. It is easy to see that in a situation like this, catastrophic cancellation can and does occur. We can avoid the situation in which catastrophic cancellation will occur by using the formula e^x = 1/e^{|x|} for x < 0. Specifically we use the MATLAB function shown in figure 2.2. As can be seen in Table 2.3 the results obtained with the new algorithm are quite satisfactory.
      x     ex(x)                        e^x
      1     2.7182818284590455e+00       2.7182818284590455e+00
     10     2.2026465794806714e+04       2.2026465794806718e+04
     20     4.8516519540979046e+08       4.8516519540979028e+08
     40     2.3538526683702006e+17       2.3538526683701997e+17
     -1     3.6787944117144245e-01       3.6787944117144233e-01
    -10     4.5399929670400209e-05       4.5399929762484847e-05
    -20     6.1475618289146260e-09       2.0611536224385579e-09
    -40     3.1169515882173582e-01       4.2483542552915889e-18
Table 2.1: Accuracy of the MATLAB ex function

function ans = newex(x)
%
% NEWEX  Simple MATLAB Program to calculate exp(x)
% using naive taylor series method for x>=0 and
% 1/exp(|x|) for x<0
%
% (body follows the formula described in the text)
if x >= 0
   ans = ex(x);
else
   ans = 1/ex(-x);
end

5.4 Breakdown and Pivoting

If i > j and σ_i is a permutation of the numbers {i, ..., n} and P_i is the corresponding permutation matrix, then P_i E_j = Ē_j P_i, where Ē_j is the matrix E_j with the multipliers permuted according to σ_i. Hence we can interchange the P_i and E_j matrices so that

    Ē_{n-1} ... Ē_1 P_{n-1} ... P_1 A = DU,

where all the Ē matrices are of similar form to the corresponding E matrices. The algorithm can accept any non-singular matrix A since a non-zero element can always be brought to the pivotal position. Two main strategies are used to choose the appropriate non-zero element for the pivot.

The most common pivoting strategy is called Partial Pivoting and involves using the largest element (in magnitude) of the column to be eliminated as the pivot. Using this method each element of L is bounded by 1 and so the elements in A^{(k)} can only grow by a factor of 2 at each step, since

    a^{(k+1)}_{i,j} = a^{(k)}_{i,j} - m_{i,k} a^{(k)}_{k,j}

and so

    max_{i,j} |a^{(k+1)}_{i,j}| ≤ 2 max_{i,j} |a^{(k)}_{i,j}|.

Hence the pivot ratio

    g_n = max_{i,j,k} |a^{(k)}_{i,j}| / max_{k,l} |a_{k,l}|,

the ratio of the largest absolute value of the elements obtained during Gaussian elimination to the largest absolute value of the elements of the original matrix A, is bounded by 2^{n-1}. The extra cost of this strategy is the column search at each step. This adds at most n² comparisons to the factorization, which is cheap when compared to the cost of about (2/3)n³ flops for the elimination.

It is also possible to use column interchanges as well as row interchanges to bring to the pivotal position the largest element in the whole reduced matrix. This method is called Complete Pivoting. The purpose of the complete pivoting strategy is to ensure that U cannot be drastically larger than A. In 1961 Wilkinson produced a clever proof to show that for this strategy

    g_n ≤ √n f(n)   where   f(n) = (2 · 3^{1/2} · 4^{1/3} ... n^{1/(n-1)})^{1/2} ~ n^{(1/4) log(n)}  as n → ∞.

This function is much smaller than the corresponding bound of 2^{n-1} for the partial pivoting strategy. In practice it is observed that partial pivoting usually produces a pivot ratio which is O(1). Hence it is only in rare instances that complete pivoting is actually used.
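To make the quantities above concrete, here is a MATLAB sketch of elimination with partial pivoting that also records the pivot ratio g_n; the function name gepp and the exact interface are choices made for this illustration.

function [L, U, p, g] = gepp(A)
% GEPP  Gaussian elimination with partial pivoting (teaching sketch).
% Returns unit lower triangular L, upper triangular U, a row
% permutation p with A(p,:) = L*U up to roundoff, and the pivot
% ratio g = max_{i,j,k} |a_ij^(k)| / max_{i,j} |a_ij|.
n = size(A,1);
p = (1:n)';
amax0 = max(abs(A(:)));                          % largest element of A
amaxk = amax0;                                   % largest element of any reduced matrix
for k = 1:n-1
    [~, r] = max(abs(A(k:n,k)));  r = r + k - 1; % choose the largest pivot in column k
    A([k r],:) = A([r k],:);  p([k r]) = p([r k]);
    A(k+1:n,k) = A(k+1:n,k) / A(k,k);            % multipliers m_{i,k} (all <= 1 in magnitude)
    A(k+1:n,k+1:n) = A(k+1:n,k+1:n) - A(k+1:n,k)*A(k,k+1:n);
    amaxk = max(amaxk, max(max(abs(A(k+1:n,k+1:n)))));
end
L = tril(A,-1) + eye(n);
U = triu(A);
g = amaxk / amax0;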
5.5 An Example of Gaussian Elimination

Let us consider the following example of Gaussian elimination taken from Kahan's article [Kah66]. Let

    A = [ 2 × 10^{-10}   -1    1
          -1              1    1
           1              1    1 ].

Then

    A^{-1} = (1/4) [  0    -2              2
                     -2     0.9999999998   1.0000000002
                      2     1.0000000002   0.9999999998 ].

Clearly A is not ill-conditioned. Suppose that we apply Gaussian elimination without pivoting to solve the equation Ax = b. The first step eliminates x_1 from equations 2 and 3 by subtracting suitable multiples of equation 1 from them. The reduced matrix would be

    [ 2 × 10^{-10}   -1             1
      0              -4999999999    5000000001
      0               5000000001   -4999999999 ].

If we use a floating point system with 8 decimal digits then the best we could do would be to approximate the reduced matrix by

    [ 2 × 10^{-10}   -1          1
      0              -5 × 10^9   5 × 10^9
      0               5 × 10^9  -5 × 10^9 ]

but this is precisely the reduced matrix we would obtain without rounding errors if A had originally been

    [ 2 × 10^{-10}   -1   1
      -1              0   0
       1              0   0 ].

In other words, the data in A's lower right hand 2 × 2 submatrix has fallen off the right hand end of our computer's 8-digit register, and been lost. The result is tantamount to distorting our original data by the amount of the loss, and in this example the result is a disaster. These disasters occur whenever abnormally large numbers are added to moderate sized numbers comprising our data.

To avoid such disasters we use the pivoting strategies already described. If in this example we choose a_{2,1} as the pivot we obtain the reduced matrix

    [ -1    1            1
       0   -1.0000000    1.0000000
       0    2            2 ]

where row 1 has been interchanged with row 2 and the calculations have been worked to 8 significant figures. This reduced matrix is what would have resulted if no rounding errors had been committed during the reduction of

    A + δA = [  0   -1   1
               -1    1   1
                1    1   1 ]

which differs negligibly from the given matrix A.
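In MATLAB's double precision (about 16 digits) the 2 × 2 block of data is not actually lost, but the same example still shows the large element growth that destroys the 8-digit computation; the short experiment below, with invented variable names, compares the growth with and without row interchanges.

A = [2e-10 -1 1; -1 1 1; 1 1 1];
[L, U, P] = lu(A);              % built-in factorization with partial pivoting
max(abs(L(:)))                  % = 1: multipliers are bounded by 1
max(abs(U(:)))                  % about 2: no significant element growth
% Elimination without interchanges:
L1 = eye(3); U1 = A;
for k = 1:2
    L1(k+1:3,k) = U1(k+1:3,k)/U1(k,k);
    U1(k+1:3,:) = U1(k+1:3,:) - L1(k+1:3,k)*U1(k,:);
end
max(abs(U1(:)))                 % about 5e9: enormous growth relative to max|A| = 1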
5.6 Calculation of Condition Number

With the use of the previous pivoting strategies, the process of Gaussian elimination provides a well conditioned method to obtain a vector that provides a small residue for the corresponding linear equation. It is still very important to obtain some measure of the conditioning of the actual linear system. In this section we will indicate a method which is used in a number of the best linear algebra software packages to obtain estimates of the condition number of a matrix.
Recall that
    κ(A) = (max_x ‖Ax‖/‖x‖) / (min_x ‖Ax‖/‖x‖) = ‖A‖ ‖A^{-1}‖.

We will work specifically with the L1 norm. In this case

    ‖A‖_1 = max_j { Σ_i |a_ij| } = max_j ‖a_j‖_1

where a_j is the j-th column of A. The main idea is to carefully choose a y, solve Az = y and use

    ‖z‖_1 / ‖y‖_1 = ‖A^{-1} y‖_1 / ‖y‖_1

as an estimate of ‖A^{-1}‖_1. The question is how to choose y. Here we give two possible methods.

(1) Choose y at random. On average

    ‖A^{-1} y‖ / ‖y‖ ≥ (1/2) ‖A^{-1}‖,

but this method can drastically underestimate ‖A^{-1}‖.

(2) A better estimate can be obtained as follows. First solve A^T y = c where c is a vector with components c_j = ±1. The signs of the components are chosen to make y as large as possible. We will indicate the method by using an example.
Example 1 Let us obtain an estimate of the condition number of the matrix

    A = [ 9.7   6.6 ] = LU = [ 1        0 ] [ 9.7   6.6    ]
        [ 4.1   2.8 ]        [ 0.4227   1 ] [ 0     0.0103 ].

We want y which solves A^T y = c for a specially chosen c. The solution is obtained via the equation U^T (L^T y) = c. Without loss of generality we may assume that c_1 = 1. We will choose c_2 = ±1 so that L^T y is as large as possible:

    (L^T y)_1 = c_1 / u_{1,1} = 0.1031,
    (L^T y)_2 = (c_2 - u_{1,2} (L^T y)_1) / u_{2,2} = (c_2 - 6.6 × 0.1031) / 0.0103.

This is larger when c_2 = -1. Then

    L^T y = [ 0.1031 ]      and so      y = [  69  ]
            [ -163   ]                      [ -163 ].

Now on solving Az = y we obtain

    z = [  12690 ]
        [ -18640 ].

This leads to the estimate

    ‖A^{-1}‖ ≈ ‖z‖_1 / ‖y‖_1 = (|12690| + |-18640|) / (|69| + |-163|) = 135.04.

Since ‖A‖_1 = 13.8 we have the estimate κ_1(A) ≈ 1863.6. The actual condition number is 2249.4, so the estimate is within about 17% of the true value. This example should only be seen as giving the flavour of the calculation to be found in good quality software. The actual methods use more elaborate techniques (see [CMSW79]).
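A bare-bones version of this estimator in MATLAB might look as follows; the name condest1 and the details are illustrative only (MATLAB's own condest and rcond, and the algorithm of [CMSW79], are more refined).

function kappa = condest1(A)
% CONDEST1  Rough estimate of the 1-norm condition number of A.
[L, U, P] = lu(A);                     % P*A = L*U
n = size(A,1);
% Solve A'y = c, choosing the signs c_j = +-1 on the fly so that the
% solution w of U'w = c is as large as possible.
w = zeros(n,1);
for j = 1:n
    s = U(1:j-1,j)' * w(1:j-1);
    if s >= 0, c = -1; else, c = 1; end     % sign that enlarges |w(j)|
    w(j) = (c - s) / U(j,j);
end
y = P' * (L' \ w);                     % then A'y = c
z = A \ y;                             % z = A^{-1} y
kappa = norm(A,1) * norm(z,1) / norm(y,1);

Applied to the 2 × 2 matrix above it essentially reproduces the hand estimate of about 1.9 × 10^3.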
5.7 Error Analysis for Gaussian Elimination

When using finite arithmetic, the process of Gaussian elimination will produce matrix factors L and U which are not the exact factors of A, but are the exact factors of a perturbed matrix A + E. That is, if L and U are the calculated lower and upper triangular factors of A then L and U are the exact factors of A + E, where E is to be thought of as a perturbation to A, so A + E = LU. If we think of A as data for the operation of producing an LU decomposition, then the computed factorization is the exact factorization of a perturbed matrix A + E. A bound on the size of E in the infinity norm is given by the following result.
Theorem 1 Let L and U be the computed triangular factors of A obtained by using Gaussian elimination with partial or complete pivoting. If floating point arithmetic with rounding unit u has been used, then there exists a matrix E satisfying

    ‖E‖_∞ ≤ n² g_n u ‖A‖_∞

such that LU = A + E. Here g_n is the pivot ratio corresponding to the pivoting strategy used.

The approximate solution obtained by Gaussian elimination can also be interpreted as the exact solution of a perturbed matrix equation. In particular we have the result:

Theorem 2 Let z denote the computed solution of Ax = b obtained by the forward substitution Ly = b and the back substitution Uz = y (L and U as in the previous theorem). Then there exists a matrix E′ (depending on b and A) satisfying

    ‖E′‖_∞ ≤ (n³ + 3n²) g_n u ‖A‖_∞

such that (A + E′)z = b.
These results are satisfactory provided the pivot ratio g_n is not too large. We can use these theorems to show that the size of the residue does not depend on the condition number of the matrix. In particular, with a reasonable pivoting strategy we can show that the residue produced by Gaussian elimination will be small even for ill-conditioned matrices, and so the problem of finding a vector with a small residue is a well conditioned problem. By the previous theorem we can assume that the computed solution z satisfies the exact equation (A + E′)z = b. The residue satisfies

    r = b - Az = E′z

and so

    ‖r‖_∞ ≤ ‖E′‖_∞ ‖z‖_∞ ≤ (n³ + 3n²) g_n u ‖A‖_∞ ‖z‖_∞.

Note that (n³ + 3n²) g_n u can be bounded independently of A. Hence for any n × n matrix we can essentially decide before the calculation on the amount of precision that will guarantee the production of a small residue.

On the other hand, an estimate of the error in the computed solution depends on the conditioning of the matrix A. To show this we start with the result that the computed solution z satisfies the equation (A + E′)z = b. The exact solution x satisfies the equation Ax = b. Hence A(x - z) = E′z and so x - z = A^{-1}E′z. This implies that

    ‖x - z‖_∞ / ‖x‖_∞ ≤ ‖A^{-1}‖_∞ ‖A‖_∞ (n³ + 3n²) g_n u ‖z‖_∞ / ‖x‖_∞.

Thus the relative error in the computed solution is bounded by an expression which depends on the condition number of A. To guarantee that the computed solution is close to the exact solution it is necessary to obtain an estimate of the condition number and then verify that the precision used is adequate.

When discussing errors it is usual to measure the error in the computed solutions. That is, it is common to ask questions about the closeness of the computed solution to the exact solution. The resulting analysis is called a forward error analysis. On the other hand, the two preceding theorems deal with the question of what problem the computed solution satisfies exactly and how close that problem is to the original problem. Such questions relate to what is called a backward error analysis. It turns out that backward error analysis of linear systems produces much more fruitful results than those obtained by forward error estimates. Results using forward error analysis seem to imply that working with matrices larger than 20 × 20 would lead to disaster, whereas our backward error analysis shows that we can confidently deal with matrices of at least size 1000 × 1000 provided that the original problem is not too ill-conditioned.
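The distinction between a small residue and a small error is easy to see numerically; in the MATLAB experiment below (using the built-in Hilbert matrix as a convenient ill-conditioned example) the scaled residue is of the order of the unit of roundoff while the error reflects the condition number.

n = 10;
A = hilb(n);                    % a notoriously ill-conditioned matrix
x = ones(n,1);
b = A*x;
z = A\b;                        % Gaussian elimination with partial pivoting
res = norm(b - A*z, inf) / (norm(A, inf)*norm(z, inf))   % tiny: of order u
err = norm(x - z, inf) / norm(x, inf)                    % far larger: about u*cond(A)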
5.8 Outline of Convergence Analysis

We will now give an outline of the proof of Theorem 1. In essence we want to show that the factors L and U obtained from the Gaussian elimination are the exact factors of a matrix A + E where the infinity norm of E is bounded as in the statement of the theorem. At the k-th step of the elimination we obtain

    m_{i,k} = a^{(k)}_{i,k} / a^{(k)}_{k,k},     a^{(k+1)}_{i,j} = a^{(k)}_{i,j} - m_{i,k} a^{(k)}_{k,j}.

If we denote the actual computed values with an overbar then we have

    m̄_{i,k} = (ā^{(k)}_{i,k} / ā^{(k)}_{k,k}) (1 + δ_1) = (ā^{(k)}_{i,k} + ε^{(k)}_{i,k}) / ā^{(k)}_{k,k},

where ε^{(k)}_{i,k} = δ_1 ā^{(k)}_{i,k} and |δ_1| ≤ u (u is the unit of roundoff). Similarly,

    ā^{(k+1)}_{i,j} = [ ā^{(k)}_{i,j} - m̄_{i,k} ā^{(k)}_{k,j} (1 + δ_2) ] (1 + δ_3).

We need to rearrange this expression into the form

    ā^{(k+1)}_{i,j} = ā^{(k)}_{i,j} + ε^{(k)}_{i,j} - m̄_{i,k} ā^{(k)}_{k,j}

for some error term ε^{(k)}_{i,j}. We will first move the (1 + δ) terms off the m̄_{i,k} ā^{(k)}_{k,j} term. By division we obtain

    ā^{(k+1)}_{i,j} (1 + δ_2)^{-1} (1 + δ_3)^{-1} = ā^{(k)}_{i,j} (1 + δ_2)^{-1} - m̄_{i,k} ā^{(k)}_{k,j}.
Consequently

    ā^{(k+1)}_{i,j} - ā^{(k+1)}_{i,j} [1 - (1 + δ_2)^{-1}(1 + δ_3)^{-1}] = ā^{(k)}_{i,j} - ā^{(k)}_{i,j} [1 - (1 + δ_2)^{-1}] - m̄_{i,k} ā^{(k)}_{k,j}.

If we let

    ε^{(k)}_{i,j} = ā^{(k+1)}_{i,j} [1 - (1 + δ_2)^{-1}(1 + δ_3)^{-1}] - ā^{(k)}_{i,j} [1 - (1 + δ_2)^{-1}]

then

    ā^{(k+1)}_{i,j} = ā^{(k)}_{i,j} + ε^{(k)}_{i,j} - m̄_{i,k} ā^{(k)}_{k,j}.                     (5.1)

Since (1 + δ)^{-1} ≈ 1 - δ for small δ, we conclude that to first order in u (the unit of roundoff)

    |ε^{(k)}_{i,k}| ≤ u |ā^{(k)}_{i,k}|,   for i > k,
    |ε^{(k)}_{i,j}| ≤ 2u |ā^{(k+1)}_{i,j}| + u |ā^{(k)}_{i,j}| ≤ 3u max_{i,j,k} |ā^{(k)}_{i,j}|,   for i, j > k.
Summing the expressions Equation 5.1 from k = 1, ..., r = min(i - 1, j) we obtain

    a_{i,j} + e_{i,j} = ā^{(1)}_{i,j} + e_{i,j} = Σ_{k=1}^{p} m̄_{i,k} ā^{(k)}_{k,j}

where p = min(i, j), m̄_{i,i} = 1 and

    e_{i,j} = Σ_{k=1}^{r} ε^{(k)}_{i,j}.

So the computed factors L and U are the exact triangular factors of the matrix A + E, where (E)_{i,j} = e_{i,j}. That is

    A + E = LU.

The sum e_{i,j} contains min(i - 1, j) quantities. Hence

    |e_{i,j}| ≤ 3u min{i - 1, j} max_k |ā^{(k)}_{i,j}|.

This holds without any assumption about the multipliers m̄_{i,k}. Hence the purpose of a pivoting strategy is to avoid growth in the size of the elements ā^{(k)}_{k,j}. In terms of the pivot ratio we have

    |e_{i,j}| ≤ 3u g_n min{i - 1, j} max_{i,j} |a_{i,j}|,

where we recall that

    g_n = max_{i,j,k} |ā^{(k)}_{i,j}| / max_{i,j} |a_{i,j}|.

The error matrix E can then be bounded by
    |E| ≤ 3 g_n u max_{i,j} |a_{i,j}| [ 0  0  0  ...  0    0
                                        1  1  1  ...  1    1
                                        1  2  2  ...  2    2
                                        1  2  3  ...  3    3
                                        .  .  .       .    .
                                        1  2  3  ...  n-1  n-1 ].

A slight refinement of the preceding argument leads to the estimate

    ‖E‖_∞ ≤ n² g_n u max_{i,j} |a_{i,j}| ≤ n² g_n u ‖A‖_∞.

The original roundoff analysis of Gaussian elimination was provided by Wilkinson in 1961 (see [Wil61]).
Chapter 6

Orthogonal Factorizations

6.1 Orthogonal Factorizations

Whereas the Gaussian elimination process leads to the LU factorization, the methods of orthogonal factorizations lead to the so called QR factorization, which consists of an orthogonal matrix Q and an upper triangular matrix R. The QR factorization of a matrix A is achieved by applying a sequence of orthogonal transformations to A which clear out the lower triangular elements of the matrix (as with elementary matrices for Gaussian elimination). The factorization can be applied to non-square as well as square matrices and plays an important role in calculating eigenvalues and eigenvectors and in the solution of least squares problems, which arise both in over determined systems of linear equations and approximation of data. We will also consider the Singular Value Decomposition, in which a real matrix A is written as UDV^T, where U and V are orthogonal and D = diag(σ_1, ..., σ_n) is the diagonal matrix listing the singular values of A.

Let us recall some standard orthogonality properties:

  Two vectors x and y are orthogonal if x^T y = 0.

  A real square matrix Q is orthogonal if Q^T Q = QQ^T = I.

  The rows and columns of an orthogonal matrix are orthonormal (they form a set of orthogonal unit vectors).

If a matrix is multiplied by an orthogonal matrix then the conditioning of the matrix is not magnified. Of course, the extra calculations will add to the roundoff error, but these errors will not be magnified by the problem becoming more ill conditioned.
6.2 Householder transformations

Householder transformations (also known as elementary reflections) are matrices of the form

    H_k = I - 2ww^T,

where w is a unit vector, i.e. ‖w‖₂² = w^T w = 1. Matrices of this form are orthogonal and symmetric. In practice we want to choose the vector w so that H_k has the same effect on a matrix as the elementary matrices E_k in the LU factorization. This means that premultiplying a matrix A by H_k will zero all elements in the k-th column of A below the diagonal. It is convenient to write the n × n Householder matrix H_k in the form

    H_k = I - 2uu^T / ‖u‖₂²,

where u ∈ ℝⁿ. Let a ∈ ℝⁿ. Our aim is to zero the entries of the vector a after the k-th entry by premultiplying a by H_k. Now

    H_k a = a - (2 u^T a / ‖u‖₂²) u.

We can choose

    u_i = 0          for i < k,
          a_k - r    for i = k,
          a_i        for i > k,

where we need to determine r such that

    2 u^T a / ‖u‖₂² = 1.

This implies that r satisfies

    r² = Σ_{i=k}^{n} a_i².

We then have H_k a = (a_1, ..., a_{k-1}, r, 0, ..., 0)^T. In practice the sign of r is chosen so as to avoid subtractive cancellation in the calculation of

    ‖u‖₂² = (a_k - r)² + Σ_{i=k+1}^{n} a_i².

Thus r should be given by

    r = -sign(a_k) ( Σ_{i=k}^{n} a_i² )^{1/2}.
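In MATLAB the construction of the vector u (and the value r) might be coded as below; the function name housevec is chosen for this sketch, and the only departure from the formulas above is a guard for a_k = 0, where either sign of r will do.

function [u, r] = housevec(a, k)
% HOUSEVEC  Householder vector u such that
%   (I - 2*u*u'/(u'*u)) * a
% has r in position k and zeros below it.
n = length(a);
s = sign(a(k));
if s == 0, s = 1; end            % either sign avoids cancellation when a(k) = 0
r = -s * norm(a(k:n));
u = zeros(n,1);
u(k) = a(k) - r;
u(k+1:n) = a(k+1:n);

For the first column of the matrix in the example below, housevec([1; -2; 2], 1) returns r = -3 and u = (4, -2, 2)^T.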
Example 1 Let us generate the Householder transformations which transform the matrix

    A = [  1   0   1
          -2   1   1
           2  -1   0 ]

to upper triangular form. For the transformation H_1, which zeros the second and third entries of the first column of A, we have

    r = -3,     u = [ 1 - (-3) ]  =  [  4 ]
                    [ -2       ]     [ -2 ]        and  ‖u‖₂² = 24.
                    [  2       ]     [  2 ]

Then

    H_1 = I - (2/24) [  4 ] [ 4  -2  2 ]  =  I - (1/12) [ 16  -8   8 ]  =  (1/3) [ -1   2  -2 ]
                     [ -2 ]                             [ -8   4  -4 ]           [  2   2   1 ]
                     [  2 ]                             [  8  -4   4 ]           [ -2   1   2 ]

and

    H_1 A = (1/3) [ -9   4   1
                     0   1   4
                     0  -1  -1 ].

For H_2, only the last entry in the second column of H_1 A needs to be zeroed, and so

    r = -√2/3,     u = (1/3) [ 0      ]        with  ‖u‖₂² = (2/9)(2 + √2).
                             [ 1 + √2 ]
                             [ -1     ]

Therefore

    H_2 = I - (1/(2 + √2)) [ 0      ] [ 0   1+√2   -1 ]  =  (1/(2 + √2)) [ 2+√2    0         0
                           [ 1 + √2 ]                                     0      -(1+√2)    1+√2
                           [ -1     ]                                     0       1+√2      1+√2 ].

We then find that

    H_2 H_1 A = (1/(3(2 + √2))) [ -9(2+√2)   4(2+√2)    2+√2
                                  0         -2(1+√2)   -5(1+√2)
                                  0          0          3(1+√2) ].
6.3 The QR Factorization Theorem

As can be seen from the previous example, Householder transformations can be used to reduce a matrix to upper triangular form in a manner similar to using elementary matrices in Gaussian elimination. We will describe the factorization of general, non-square matrices. Let A be an m × n real matrix with m ≥ n. Choose H_1 to be the Householder transformation which zeros the entries of the first column of A below the diagonal. We then form the matrix H_1 A and choose H_2 to be the Householder transformation which zeros the entries in the second column of H_1 A below the diagonal. We form the matrix H_2(H_1 A) and then repeat the process until all the entries below the diagonal are zero. This will require at most n Householder transformations and the resulting matrix will be

    H_n H_{n-1} ... H_1 A = [ R ]
                            [ 0 ],

where R is an n × n upper triangular matrix and 0 is the (m - n) × n zero matrix. Now define the orthogonal m × m matrix

    Q^T = H_n H_{n-1} ... H_1,

and observe that A = Q [ R ; 0 ]. Thus we have shown the following:

Theorem 1 (QR factorization) Let A be an m × n real matrix of rank n with m ≥ n. There is an m × m orthogonal matrix Q and an n × n upper triangular matrix R such that

    A = Q [ R ]
          [ 0 ].

In the case when m = n, at most n - 1 transformations are required to form Q, whereas for m > n, at most n are required.
Example 1 With

    A = [  1   0   1
          -2   1   1
           2  -1   0 ],

we have, from the previous example,

    R = (1/(3(2 + √2))) [ -9(2+√2)   4(2+√2)    2+√2
                           0         -2(1+√2)   -5(1+√2)
                           0          0          3(1+√2) ]

and

    Q^T = H_2 H_1 = (1/(3(2 + √2))) [ 2+√2    0         0      ] [ -1   2  -2 ]
                                    [ 0      -(1+√2)    1+√2   ] [  2   2   1 ]
                                    [ 0       1+√2      1+√2   ] [ -2   1   2 ]

        = (1/(3(2 + √2))) [ -(2+√2)    2(2+√2)    -2(2+√2)
                            -4(1+√2)   -(1+√2)     1+√2
                             0          3(1+√2)    3(1+√2) ].
The QR factorization is rarely used to solve linear systems of equations with n = m as it takes about twice as many operations (approximately 4n³/3 flops) as the LU factorization and requires extra storage.
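Putting the pieces together, a compact (and unoptimized) MATLAB version of the Householder reduction might read as follows; it reuses the housevec sketch given earlier, and the name houseqr is again only illustrative.

function [Q, R] = houseqr(A)
% HOUSEQR  QR factorization of an m x n matrix A (m >= n) by
% successive Householder transformations H_k.
[m, n] = size(A);
Q = eye(m);
for k = 1:min(n, m-1)
    [u, ~] = housevec(A(:,k), k);       % zeros column k below the diagonal
    if ~any(u), continue, end           % column already in the required form
    beta = 2/(u'*u);
    A = A - beta*u*(u'*A);              % A := H_k * A
    Q = Q - beta*(Q*u)*u';              % Q := Q * H_k, so that Q^T = H_k*...*H_1
end
R = triu(A);

For the 3 × 3 matrix of the example, houseqr reproduces (up to roundoff) the factors computed by hand above, and Q'*Q is the identity to machine precision.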
6.4 Application to Least Squares Problem

As discussed in the introduction, the least squares data fitting problem leads to a linear system

    A^T A c = A^T d                                                           (6.1)

where A is an n × m matrix. The condition number of the matrix A^T A will be the square of the condition number of A. In practical situations the conditioning of the matrix problem Equation 6.1 can be quite bad. A QR factorization of A can be used to improve the conditioning of the problem. Suppose that

    A = Q [ R ]
          [ 0 ].

Then

    A^T A = [ R^T  0 ] Q^T Q [ R ]  =  R^T R.
                             [ 0 ]

The normal equations become

    R^T R c = R^T y_1

where y_1 is the vector composed of the first m components of Q^T d. We need only find c such that

    R c = y_1.

The conditioning of this system of equations is the same as the conditioning of the matrix A.
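In MATLAB this amounts to a few lines; the sketch below assumes a data matrix A and data vector d as constructed in the regression example of chapter 1, and of course c = A\d performs an equivalent QR based solve internally.

[Q, Rfull] = qr(A);             % Householder QR of the n x m matrix A
m  = size(A, 2);
R  = Rfull(1:m, 1:m);           % the m x m upper triangular block
y  = Q' * d;
c  = R \ y(1:m);                % solve R c = y_1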
6.5 Singular Value Decomposition

Another important orthogonal decomposition is the Singular Value Decomposition (SVD).

Theorem 1 (Singular Value Decomposition) If A is a real m × n matrix, then there exist orthogonal matrices

    U = [u_1, ..., u_m] ∈ ℝ^{m×m}   and   V = [v_1, ..., v_n] ∈ ℝ^{n×n}

such that

    U^T A V = diag(σ_1, ..., σ_p) ∈ ℝ^{m×n}

where p = min(m, n) and σ_1 ≥ ... ≥ σ_p ≥ 0.
Proof. We take the proof from Golub and Van Loan [GV89][p. 71]. Let x ∈ ℝⁿ and y ∈ ℝᵐ be unit 2-norm vectors such that Ax = σ_1 y where σ_1 = ‖A‖_2. Given the unit vectors x and y we can use the Gram-Schmidt process to obtain orthonormal bases for the spaces ℝⁿ and ℝᵐ which contain the two vectors respectively. So we can find matrices V_1 ∈ ℝ^{n×(n-1)} and U_1 ∈ ℝ^{m×(m-1)} so that V = [x, V_1] and U = [y, U_1] are orthogonal. It is easy to show that the product U^T A V has the form

    U^T A V = A_1 = [ σ_1   w^T ]
                    [ 0     B   ].

Since

    ‖ A_1 [ σ_1 ; w ] ‖₂²  ≥  (σ_1² + w^T w)²,

we have

    ‖A_1‖₂² ≥ σ_1² + w^T w.

But σ_1² = ‖A‖₂² = ‖A_1‖₂². Hence w = 0 and so U^T A V actually has the form

    [ σ_1   0 ]
    [ 0     B ].

Now we can apply the same argument to B and so inductively obtain a diagonalization of our matrix (and obtain the σ_i).   □

The vectors u_i and v_i are respectively called the i-th left and right singular vectors of A. In particular

    A v_i = σ_i u_i,     A^T u_i = σ_i v_i,     i = 1, ..., min(n, m).
Useful properties of A can be read off from the SVD. If

    σ_1 ≥ σ_2 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_p = 0

then

    rank(A) = r,
    kernel(A) = span{v_{r+1}, ..., v_n},
    range(A) = span{u_1, ..., u_r}.

Using the SVD and the invariance of the L2 and Frobenius norms under orthogonal transformations, we easily see that

    ‖A‖_F² = σ_1² + ... + σ_p²,     ‖A‖₂² = σ_1².
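These properties are easy to check numerically; the MATLAB lines below use the built-in svd on the 3 × 3 matrix from the Householder example (any matrix would do) to recover the rank and the two norms.

A = [1 0 1; -2 1 1; 2 -1 0];
[U, S, V] = svd(A);                                % A = U*S*V'
sigma = diag(S);
r     = nnz(sigma > max(size(A))*eps*sigma(1));    % numerical rank
norm2 = sigma(1);                                  % equals norm(A,2)
normF = norm(sigma);                               % equals norm(A,'fro')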
Chapter 7

Iterative Solution of Linear Equations

7.1 Introduction

As mentioned in the introduction, linear systems Ax = b arise naturally in the numerical solution of differential equations. In three dimensions it is not unreasonable to have systems with n = (10²)³ = 10⁶, and n = 10⁴ is common. Recall that the cost for Gaussian elimination is approximately (1/3)n³ multiplications and n² storage locations. The matrices arising from differential equations usually have many zero elements. This sparseness reduces the cost of Gaussian elimination somewhat, but it also opens the possibility of using other methods. In this chapter we will look at the so called iterative methods.

Iterative methods for solving linear equations begin with an initial vector x⁰ and compute successive values x^k, k = 1, 2, ..., which hopefully converge to the solution vector as k → ∞. These methods can be useful when the system of equations is very large and the entries of A can be generated from a simple formula, or when the matrix is sparse, thus removing the necessity to store all n² entries of A. The principle on which many iterative methods are based can be stated in simple terms, namely: to solve Ax = b find a matrix M (often called the conditioning matrix) such that:

1. ‖A - M‖ is small.
2. Mx = b is easily solved.

Then the iterative method is defined by the following prescription. Pick a starting vector x⁰ and for k = 0, 1, 2, ... solve

    M x^{k+1} = b - (A - M) x^k.                                              (7.1)
There are a variety of iterative methods corresponding to the choice of M. We will briefly discuss some of the most common methods. It is important to realize that most iterative methods only converge if A has some special properties, and even then convergence may be slow. It should be observed that Equation 7.1 can be rewritten as an update method, namely

    x^{k+1} = x^k + M^{-1} r^k                                                (7.2)

where r^k = b - A x^k is the residue. This formulation leads to less roundoff error as it involves the addition of a small correction to a good approximation of the solution.
7.2 The Jacobi Method

This is probably the earliest and simplest of the iterative methods. Let A be an n × n matrix and write A in the form

    A = L + D + U

where L is the lower triangle (below the diagonal) of A, D is the diagonal, and U is the upper triangle (above the diagonal) of A. The system of equations Ax = b can be written as

    Dx = b - (L + U)x.

This is in the form given above for an iterative method. Provided none of the diagonal entries are zero we have

    x^{k+1} = D^{-1} b - D^{-1}(L + U) x^k = x^k + D^{-1}(b - A x^k),

which is Jacobi's Method. In terms of components this is

    x_i^{k+1} = ( b_i - Σ_{j≠i} a_ij x_j^k ) / a_ii,     i = 1, ..., n,   k = 0, 1, ...

Example 1 Consider the system Ax = b where

    A = [ 6  -2   2 ]             b = [ -1 ]
        [ -2   5   1 ]                [  8 ]
        [  2   1   4 ],               [  8 ],

which has the solution x = (-0.5, 1, 2)^T. Jacobi's method for this system is

    x_1^{k+1} = (-1 + 2x_2^k - 2x_3^k)/6,
    x_2^{k+1} = (8 + 2x_1^k - x_3^k)/5,
    x_3^{k+1} = (8 - 2x_1^k - x_2^k)/4.
Starting with x⁰ = 0, we obtain the results tabulated in table 7.1.

     k    x_1^k        x_2^k        x_3^k
     0    0            0            0
     1   -0.166667     1.6          2.0
     2   -0.3          1.133333     1.683333
     3   -0.35         1.143333     1.866667
     4   -0.407778     1.086667     1.889167
     5   -0.434167     1.059056     1.932222
    10   -0.491339     1.008028     1.990504

Table 7.1: Accuracy of the Jacobi method (5 decimal place accuracy is obtained after 23 iterations)
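A direct transcription of the update x^{k+1} = x^k + D^{-1}(b - Ax^k) into MATLAB could look like this; the function name jacobi and the stopping rule based on the scaled residue are choices made for the sketch.

function [x, k] = jacobi(A, b, x, tol, maxit)
% JACOBI  Jacobi iteration for A x = b starting from x.
D = diag(diag(A));
k = 0;
while norm(b - A*x, inf) > tol*norm(b, inf) && k < maxit
    x = x + D \ (b - A*x);        % x^{k+1} = x^k + D^{-1}(b - A x^k)
    k = k + 1;
end

Calling jacobi([6 -2 2; -2 5 1; 2 1 4], [-1; 8; 8], zeros(3,1), 1e-6, 100) generates the iterates shown in table 7.1.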
7.3 The Gauss-Seidel Method

This method is closely related to the Jacobi method, the only difference being that the improved values x^{k+1} are used as soon as they are available. With A = L + D + U as before, the system of equations can be written in the form

    (L + D)x = b - Ux.

We are led to the iterative method

    (L + D)x^{k+1} = b - (A - (L + D))x^k,     k = 0, 1, ...

Once again, provided none of the diagonal entries are zero we have

    x^{k+1} = D^{-1} b - D^{-1} L x^{k+1} - D^{-1} U x^k,

which is the Gauss-Seidel Method. In terms of components this equation is

    x_i^{k+1} = ( b_i - Σ_{j=1}^{i-1} a_ij x_j^{k+1} - Σ_{j=i+1}^{n} a_ij x_j^k ) / a_ii,     i = 1, ..., n,   k = 0, 1, ...

Example 1 For the system given in the previous example the Gauss-Seidel method is

    x_1^{k+1} = (-1 + 2x_2^k - 2x_3^k)/6,
    x_2^{k+1} = (8 + 2x_1^{k+1} - x_3^k)/5,
    x_3^{k+1} = (8 - 2x_1^{k+1} - x_2^{k+1})/4.

Starting with x⁰ = 0, we obtain the results tabulated in table 7.2.
     k    x_1^k        x_2^k        x_3^k
     0    0            0            0
     1   -0.166667     1.533333     1.7
     2   -0.222222     1.171111     1.818333
     3   -0.382407     1.083370     1.920361
     4   -0.445664     1.037662     1.963416
     5   -0.475251     1.017216     1.983322
    10   -0.499510     1.000341     1.999670

Table 7.2: Accuracy of the Gauss-Seidel method (5 decimal place accuracy is obtained after 13 iterations)
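Since the Gauss-Seidel method corresponds to the choice M = L + D in the update (7.2), a MATLAB sketch needs to change only one line of the Jacobi code above (gauss_seidel is again just an illustrative name).

function [x, k] = gauss_seidel(A, b, x, tol, maxit)
% GAUSS_SEIDEL  Gauss-Seidel iteration for A x = b starting from x.
M = tril(A);                      % M = L + D
k = 0;
while norm(b - A*x, inf) > tol*norm(b, inf) && k < maxit
    x = x + M \ (b - A*x);        % x^{k+1} = x^k + (L + D)^{-1}(b - A x^k)
    k = k + 1;
end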
7.4 Convergence Analysis

The important question for an iterative method is whether x^k converges to x (the solution of the linear system) and, if so, how fast. Consider an iterative method given by a matrix M. Let e^k = x^k - x. Then

    e^{k+1} = M^{-1}(M - A) e^k = (I - M^{-1}A) e^k.

If we denote the error matrix I - M^{-1}A by E, then

    e^k = E e^{k-1} = E^k e⁰.

Theorem 1 If ‖E‖ = γ < 1 then ‖x¹ - x‖ ≤ γ ‖x⁰ - x‖ for every x⁰, and the sequence x^k will converge to the solution of the matrix equation for every initial guess x⁰.

Remark 1 Note that this theorem states that the norm of the error will decrease monotonically if and only if the corresponding norm of the error matrix is less than one. On the other hand, it is still possible for the norm of the error to converge to zero even if the norm of E is larger than 1.

Proof. If ‖E‖ = γ < 1 then

    ‖x¹ - x‖ = ‖e¹‖ = ‖E e⁰‖ ≤ ‖E‖ ‖e⁰‖ ≤ γ ‖e⁰‖ = γ ‖x⁰ - x‖.   □
Definition 1 The spectral radius ρ(A) of a matrix is the maximum of the absolute values of the eigenvalues of the matrix. Namely

    ρ(A) = max_j |λ_j(A)|

where λ_j(A) is an eigenvalue of A.

Let us note a number of interesting results:

Proposition 1
1. There exist matrices B with ρ(B) = 0 and ‖B‖ > 0 for any norm ‖·‖.
2. ρ(A) ≤ ‖A‖ for any operator norm.
3. If A is singular then the error matrix E of the Jacobi and Gauss-Seidel methods has an eigenvalue equal to 1 and so ρ(E) ≥ 1.

Proof. The proofs of these results are left as an exercise for the reader.   □

We now have the following important result, which associates the spectral radius of the error matrix with the convergence properties of the iterative method.

Theorem 2 E^k e⁰ → 0 for every e⁰ as k → ∞ if and only if ρ(E) < 1.

Proof. We can use the Jordan Normal Form to outline a proof of this result. We will restrict ourselves to real matrices which have only real eigenvalues. For the general case we can work over the complex field and obtain the same result. Any matrix of the specified form can be decomposed into

    E = T^{-1} J T

where J has the block diagonal form

    J = [ J_1
              J_2
                  ...
                      J_m ]

and each J_i has the form

    J_i = [ λ  1
               λ  1
                  ..  ..
                      λ  1
                         λ ].
Here λ is one of the eigenvalues of E. Now E^k e⁰ = T^{-1} J^k T e⁰. It is easy to study the form of J^k. In particular, if ρ(E) < 1 then the entries in the matrix J^k converge to zero and so the resulting error must converge to zero. On the other hand, if ρ(E) ≥ 1 then it is possible to find an eigenvector corresponding to an eigenvalue with modulus greater than or equal to one. It then follows that a starting vector can be found so that the error does not converge to zero.   □

This theorem allows us to study the convergence of our iterative methods. The idea is to find conditions on the matrix A which ensure that the spectral radius of the error matrix is less than one.
7.4.1 Application to Gauss-Seidel

We will apply the previous result to the Gauss-Seidel method. First a definition.

Definition 2 A matrix A is dominated by its diagonal D if either

    ‖D^{-1}L + D^{-1}U‖_∞ < 1    or    ‖D^{-1}L + D^{-1}U‖_1 < 1.

In terms of components, A is diagonally dominated if either

    Σ_{j≠i} |a_ij| < |a_ii|,   i = 1, ..., n,

or

    Σ_{j≠i} |a_ji| < |a_ii|,   i = 1, ..., n.

Theorem 3 If A is diagonally dominated then the Gauss-Seidel method converges for all starting vectors.
Proof. Let us suppose that ‖D^{-1}L + D^{-1}U‖_∞ < 1. Now the error matrix for Gauss-Seidel is given by

    E_GS = -(L + D)^{-1} U.

We want to show that ρ(E_GS) < 1. Suppose to the contrary that there exists an eigenvalue λ of E_GS with associated eigenvector v ≠ 0 such that |λ| ≥ 1. Hence

    E_GS v = λ v

and so

    v = -(D^{-1}L + λ^{-1} D^{-1}U) v.

Thus

    ‖D^{-1}L + λ^{-1} D^{-1}U‖ ≥ 1                                            (7.1)

for any operator norm. On the other hand, since |λ| ≥ 1,

    ‖D^{-1}L + λ^{-1} D^{-1}U‖_∞ = max_i { Σ_{k=1}^{i-1} |l_ik|/|a_ii| + |λ|^{-1} Σ_{k=i+1}^{n} |u_ik|/|a_ii| }
                                 ≤ max_i { Σ_{k=1}^{i-1} |l_ik|/|a_ii| + Σ_{k=i+1}^{n} |u_ik|/|a_ii| }
                                 = ‖D^{-1}L + D^{-1}U‖_∞ < 1.

But this contradicts Equation 7.1 and so we must have |λ| < 1 for all eigenvalues of E_GS, that is ρ(E_GS) < 1.   □
7.5 Asymptotic Rate of Convergence

As we have seen, the error at the k-th step of an iterative method is given by E^k e⁰. Hence asymptotically (large k) the eigenvalue of largest modulus will dominate (i.e. the eigenvalue with modulus equal to ρ(E)). That is, the size of the k-th error will be approximately C ρ(E)^k for large k, where C is a constant. Thus, to reduce the error by a factor 10^{-m} we have to make n iterations, where n is the smallest number such that

    ρ(E)^n ≤ 10^{-m},

or

    n ≥ m / ( -log_10 ρ(E) ).

The asymptotic rate of convergence of an iterative method is defined to be

    R = -log_10( ρ(E) ).

So we obtain approximately nR extra significant figures of accuracy every n iterations of our method.
7.6 Relaxation Methods

By a simple modification of the Gauss-Seidel method it is often possible to make a substantial improvement in the rate of convergence of the method. We note that the Gauss-Seidel method can be written in the form

    x^{k+1} = x^k + D^{-1} [ b - (U + D) x^k - L x^{k+1} ] = x^k + p^k.

The iterative method

    x^{k+1} = x^k + ω p^k

is the successive over-relaxation (SOR) method. Here ω, the relaxation parameter, should be chosen so that the rate of convergence is maximized. For ω = 1 the method obviously reduces to the Gauss-Seidel method. We want to write SOR in the form M x^{k+1} = b - (A - M) x^k. Now

    x^{k+1} = x^k + ω D^{-1} [ b - (U + D) x^k - L x^{k+1} ]

so that

    ω^{-1} D x^{k+1} = ω^{-1} D x^k + b - (U + D) x^k - L x^{k+1}

and consequently

    (L + ω^{-1} D) x^{k+1} = b - [ U + D + L - (L + ω^{-1} D) ] x^k.

Hence for SOR the conditioning matrix is given by M_ω = L + ω^{-1} D and the error matrix is given by

    E_ω = I - (L + ω^{-1} D)^{-1} A = (L + ω^{-1} D)^{-1} [ (ω^{-1} - 1) D - U ].

If ω > 1 then one speaks of over-relaxation, if ω < 1 then of under-relaxation.

Example 1 For the example given earlier, the SOR method is

    x_1^{k+1} = (1 - ω) x_1^k + (ω/6)(-1 + 2x_2^k - 2x_3^k),
    x_2^{k+1} = (1 - ω) x_2^k + (ω/5)(8 + 2x_1^{k+1} - x_3^k),
    x_3^{k+1} = (1 - ω) x_3^k + (ω/4)(8 - 2x_1^{k+1} - x_2^{k+1}).

With ω = 1.15 the following results were obtained. For large systems the improvement in convergence with SOR can be even more pronounced than in this example. See the example at the end of the chapter for a situation where such an improvement is obtained.
     k    x_1^k        x_2^k        x_3^k
     0    0            0            0
     1   -0.191667     1.751833     1.906446
     2   -0.222227     1.036493     1.843806
     3   -0.467803     1.045262     1.991903
     4   -0.484375     1.002260     1.915800
     5   -0.498250     1.002404     1.999566
    10   -0.499998     1.000000     1.999999

Table 7.3: Accuracy of the SOR method (5 decimal place accuracy is obtained after 8 iterations)
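Since M_ω = L + ω^{-1}D, the SOR sweep again fits the update form (7.2); the sketch below (with the invented name sor) reduces to the Gauss-Seidel code when omega = 1.

function [x, k] = sor(A, b, x, omega, tol, maxit)
% SOR  Successive over-relaxation for A x = b with parameter omega.
M = tril(A, -1) + diag(diag(A))/omega;      % M_omega = L + D/omega
k = 0;
while norm(b - A*x, inf) > tol*norm(b, inf) && k < maxit
    x = x + M \ (b - A*x);
    k = k + 1;
end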
7.6.1 Convergence of SOR

We will now provide a simple convergence analysis to show that SOR can be applied successfully to systems involving positive definite matrices. First we will show that the relaxation parameter should be restricted to lie in the range 0 < ω < 2.

Lemma 1 If the over-relaxation algorithm converges for all initial vectors x⁰ then |1 - ω| < 1, that is 0 < ω < 2.

Proof. Suppose A is an n × n matrix. Then

    ρ(E_ω)^n ≥ | Π_i λ_i(E_ω) | = |det E_ω| = | det( (L + ω^{-1}D)^{-1} ) det( (ω^{-1} - 1)D - U ) | = |1 - ω|^n,

since L + ω^{-1}D is a lower triangular matrix and (ω^{-1} - 1)D - U is an upper triangular matrix. Hence ρ(E_ω) ≥ |1 - ω|, and if |1 - ω| ≥ 1 the iteration cannot converge for every starting vector.   □

In the sequel we assume that 0 < ω < 2.
Theorem 2 If A is symmetric with positive diagonal then the over-relaxation method converges to A^{-1}b for every x⁰ and for every b if and only if A is positive definite.

Remark 1 Recall that A is positive definite if for all x ≠ 0 we have x^T A x > 0. Corresponding to a positive definite matrix is an inner product (·,·)_A and norm ‖·‖_A given by

    (u, v)_A = u^T A v     and     ‖u‖_A = (u, u)_A^{1/2}.
The theorem is proved by showing that the errors decrease monotonically when measured in the A norm. We then show that the errors tend to zero in this norm. Since all norms on a finite dimensional vector space are equivalent, we can then conclude that the error converges to zero in any of the usual norms.
Proof. Observe that for any iterative method

    A(e^{k+1} + e^k) = (A - 2M)(e^{k+1} - e^k)                                (7.1)

and that

    ‖e^{k+1}‖_A² - ‖e^k‖_A² = e^{k+1} · A e^{k+1} - e^k · A e^k = (e^{k+1} - e^k) · (A - 2M)(e^{k+1} - e^k).

Let us calculate A - 2M for the SOR method:

    A - 2M_ω = L + D + U - 2L - 2ω^{-1}D = (U - L) + (1 - 2ω^{-1})D.

A is symmetric (A = A^T) so that U = L^T and consequently x^T(U - L)x = 0 for all x, since

    x^T(U - L)x = ( x^T(U - L)x )^T = x^T(U^T - L^T)x = -x^T(U - L)x.

So

    e^{k+1} · A e^{k+1} - e^k · A e^k = (e^{k+1} - e^k) · (1 - 2ω^{-1}) D (e^{k+1} - e^k) ≤ 0           (7.2)

since D is positive definite and (1 - 2ω^{-1})I is negative definite by the assumption 0 < ω < 2. Equality will occur if and only if e^{k+1} - e^k = 0. As we already know, if A is singular then ρ(E) ≥ 1 and this prevents convergence for some starting vectors. Hence we will assume that A is non-singular. Since

    e^k = -A^{-1} M (e^{k+1} - e^k),                                          (7.3)

there is equality in Equation 7.2 only when e^k = 0. We may now draw the conclusions. If A is positive definite, ‖e^k‖_A must decrease to a non-negative limit as k → ∞, in which case e^{k+1} - e^k → 0 and so Equation 7.3 implies that e^k → 0. If A is not positive definite then there is a v such that v^T A v < 0. For starting vectors x⁰ such that e⁰ = v, we have e^k · A e^k ≤ v^T A v < 0 and so e^k does not converge to zero.   □
7.6.2 SOR for Special Problems We state the next theorem as an example of the possible increase in the rate of convergence that can be obtained with SOR. Matrices of the form considered are typical of those obtained by discretizing partial dierential equations. The following result is due to Young [You72]:
Theorem 3 Let the real matrix $A$ be symmetric positive definite and of block tridiagonal form
$$ A = \begin{bmatrix} D_1 & U_1 & & & \\ L_2 & D_2 & U_2 & & \\ & \ddots & \ddots & \ddots & \\ & & L_{n-1} & D_{n-1} & U_{n-1} \\ & & & L_n & D_n \end{bmatrix} $$
where the $D_i$ are diagonal submatrices. Then
$$ \rho(E_{GS}) = \rho(E_J)^2 $$
and the optimal relaxation factor $\tilde\omega$ in SOR is given by
$$ \tilde\omega = \frac{2}{1 + \big(1 - \rho(E_{GS})\big)^{1/2}} \qquad (\rho(E_{GS}) < 1). $$
The optimal value of $\rho(E_{\tilde\omega})$ is
$$ \rho(E_{\tilde\omega}) = \tilde\omega - 1. $$

Let us consider the particular case of approximating Laplace's equation. For this problem on a rectangular domain the discretization of the equation leads to the solution of a linear equation of the form described in the previous theorem. For this problem the spectral radius of the error matrix for the Jacobi method is
$$ \rho(E_J) = \cos\left(\frac{\pi}{N}\right) \approx 1 - 0.5\left(\frac{\pi}{N}\right)^2 $$
where the matrix is $N \times N$. The asymptotic rate of convergence is
$$ R_J \approx -\log\left(1 - 0.5\left(\frac{\pi}{N}\right)^2\right) \approx 0.21\left(\frac{\pi}{N}\right)^2. $$
Using the previous theorem we conclude that
$$ \rho(E_{GS}) \approx 1 - \left(\frac{\pi}{N}\right)^2, \qquad \tilde\omega = \frac{2}{1 + \sin(\pi/N)}, \qquad \rho(E_{\tilde\omega}) \approx 1 - \frac{2\pi}{N}. $$
The corresponding rates of convergence are
$$ R_{GS} \approx 0.43\left(\frac{\pi}{N}\right)^2, \qquad R_{\tilde\omega} \approx 0.86\,\frac{\pi}{N}. $$
Hence
$$ \frac{R_{\tilde\omega}}{R_{GS}} \approx \frac{2N}{\pi}. $$
This means that SOR with the optimal parameter needs roughly $2N/\pi$ times fewer iterations than Gauss-Seidel to obtain the same increase in the number of accurate significant figures. In addition Gauss-Seidel needs approximately half the number of iterations of the Jacobi method.
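The asymptotic estimates above are straightforward to evaluate. The following sketch (an illustration, not part of the original notes) tabulates the spectral radii and rates for a few grid sizes N, using rho(E_J) = cos(pi/N) and the optimal relaxation factor from the theorem.

\begin{verbatim}
import numpy as np

def rate(rho):
    # asymptotic rate of convergence, R = -log10(rho)
    return -np.log10(rho)

for N in (10, 50, 100):
    rho_J = np.cos(np.pi / N)                    # Jacobi
    rho_GS = rho_J ** 2                          # Gauss-Seidel
    omega = 2.0 / (1.0 + np.sin(np.pi / N))      # optimal SOR parameter
    rho_SOR = omega - 1.0
    print(N, round(omega, 4),
          round(rate(rho_GS), 6), round(rate(rho_SOR), 6),
          round(rate(rho_SOR) / rate(rho_GS), 1))   # ratio ~ 2N/pi
\end{verbatim}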
7.7 Iterative Improvement

Iterative methods can be applied even when the basic solution method is Gaussian elimination. Since Gaussian elimination provides a factorization $LU$ which is the exact factorization of a matrix close to $A$, we can use an iterative method where $M = LU$ (the computed factorization of $A$). As the initial guess for the iterative method we use the solution obtained via Gaussian elimination. This leads to the following method:

[Iterative Improvement] Let $x^0$ be the approximate solution and $L$ and $U$ be the factors obtained by Gaussian elimination. Let $x^k$, $k = 1, 2, 3, \ldots$ be obtained by the following iterative scheme:
1. Given an approximation $x^k$, compute $b - Ax^k$ using double precision and round to single precision to obtain the residue $r^k$.
2. Find the solution $e^k$ of $LU e^k = r^k$ using forward and backward substitution.
3. Let $x^{k+1} = x^k + e^k$.
4. If the difference between $x^{k+1}$ and $x^k$ is within a given tolerance, then finish; otherwise let $k = k + 1$ and go back to step (1).

The use of double precision in the calculation of the residue is important, as catastrophic cancellation is likely to occur if single precision is used.
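A hedged Python sketch of the scheme, using SciPy's LU routines (scipy.linalg.lu_factor and lu_solve). To mimic the single/double precision split described above, the factorization is held in float32 while the residue is accumulated in float64; the function name and tolerance are illustrative only.

\begin{verbatim}
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def iterative_improvement(A, b, tol=1e-8, max_iter=20):
    # "Single precision" LU factorization and first solve
    A32 = A.astype(np.float32)
    lu, piv = lu_factor(A32)
    x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    for k in range(max_iter):
        # Residue computed in "double precision"
        r = b.astype(np.float64) - A.astype(np.float64) @ x
        # Correction from the (inexact) LU factors
        e = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
        x_new = x + e
        if np.max(np.abs(x_new - x)) < tol:
            return x_new, k + 1
        x = x_new
    return x, max_iter

# The 3 x 3 system of the example that follows
A = np.array([[0.20000, 0.16667, 0.14286],
              [0.16667, 0.14286, 0.12500],
              [0.14286, 0.12500, 0.11111]])
b = np.array([0.50953, 0.43453, 0.37897])
x, iters = iterative_improvement(A, b)
print(x, iters)   # converges towards (1, 1, 1)
\end{verbatim}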
Example 1 Consider the matrix equation
$$ \begin{bmatrix} 0.20000 & 0.16667 & 0.14286 \\ 0.16667 & 0.14286 & 0.12500 \\ 0.14286 & 0.12500 & 0.11111 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0.50953 \\ 0.43453 \\ 0.37897 \end{bmatrix}. $$
The exact solution is
$$ x = (1, 1, 1)^T. $$
If floating point arithmetic with 5 digits is used then Gaussian elimination will give the computed triangular factors
$$ L = \begin{bmatrix} 1 & 0 & 0 \\ 0.83335 & 1 & 0 \\ 0.71430 & 1.49874 & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} 0.20000 & 0.16667 & 0.14286 \\ 0 & 0.00397 & 0.00595 \\ 0 & 0 & 0.00015 \end{bmatrix} $$
and computed solution
$$ x^0 = (1.03845,\ 0.89673,\ 1.06667)^T. $$
The first step of iterative improvement involves calculating the residue $b - Ax^0$ in double precision arithmetic (in this case 10 digits):
$$ Ax^0 = \begin{bmatrix} 0.5095324653 \\ 0.4345190593 \\ 0.3789619207 \end{bmatrix} \quad\text{and so}\quad r^0 = 10^{-5}\begin{bmatrix} -0.24653 \\ 1.09407 \\ 0.80793 \end{bmatrix}. $$
Then we must solve for $e^0$ using forward and backward substitution. We get
$$ e^0 = \begin{bmatrix} -0.03709 \\ 0.09955 \\ -0.06424 \end{bmatrix}, \quad\text{and}\quad x^1 = x^0 + e^0 = \begin{bmatrix} 1.00136 \\ 0.99628 \\ 1.00243 \end{bmatrix}. $$
Note that the errors in the corrected solution $x^1$ are approximately 30 times smaller than those in $x^0$. If we continue the iterations then the approximate solutions $x^k$ converge rapidly to the exact solution. This is clearly illustrated in Table 7.1.
k     x_1^k      x_2^k      x_3^k
0     1.03845    0.89673    1.06667
1     1.00136    0.99628    1.00243
2     1.00005    0.99986    1.00009
3     1.00000    1.00000    1.00000

Table 7.1: Improved accuracy using Iterative Improvement.
Chapter 8
The Conjugate Gradient Method

8.1 Introduction

In the section on iterative methods we assumed that we could find a conditioning matrix $M$ which was close to $A$. If this was so, then we could show that the corresponding iterative method would produce a sequence of vectors which would converge to our solution. In this section we will suppose that our matrix is positive definite. In this case we can rephrase our linear equation problem as a minimization problem. Specifically, consider the matrix problem $Ax = b$ where $A$ is positive definite. Consider the quadratic form
$$ f(y) = \tfrac{1}{2}\, y^TAy - b^Ty. $$
Then we have the following theorem:
Theorem 1 The linear problem: Find $x$ such that $Ax = b$, is equivalent to the minimization problem: Find $x$ such that
$$ f(x) = \min_{y \in \mathbb{R}^n} f(y). $$
Proof. Let $y$ be an arbitrary vector. Rewrite $y = x + z$. Now
$$ f(y) = f(x + z) = f(x) + z^T(Ax - b) + \tfrac{1}{2}\, z^TAz. $$
Here we have used the assumed symmetry of $A$. If $Ax = b$, then $f(y) = f(x) + \tfrac{1}{2}z^TAz$ and so $f(y) \geq f(x)$ for all $y$. On the other hand, if $f(y) \geq f(x)$ for all $y$, then the term $z^T(Ax - b)$ must be non-negative for all choices of $z$ (replace $z$ by $tz$ and let $t \to 0^+$ to see that the linear term dominates). This is possible only if $Ax = b$. $\Box$

In this section we will use this equivalence to generate iterative schemes to solve linear equations associated with positive definite matrices. The iterative schemes we will consider will be of the form
$$ x^{k+1} = x^k + \alpha_k d^k \qquad (8.1) $$
where $d^k$ is called the search direction, and $\alpha_k > 0$ is the step length (compare this to the update formulation of standard iterative methods, Equation 7.2). The search direction and step length are chosen to ensure that the functional values $f(x^k)$ decrease to the minimum value, and hence $x^k \to x$. We will consider two particular methods, the gradient method and the Conjugate Gradient Method. Our discussion will follow chapter 7 of [Joh90].
8.1.1 Preliminaries

Let us introduce some notation. Suppose $g : \mathbb{R}^m \to \mathbb{R}$. The gradient of $g$ will be denoted $g'$, where $g' = (\partial_1 g, \ldots, \partial_m g)$, and the Hessian $g'' = (g_{ij})$ will be given by
$$ g'' = \begin{bmatrix} \partial^2_{11}g & \cdots & \partial^2_{1m}g \\ \vdots & & \vdots \\ \partial^2_{m1}g & \cdots & \partial^2_{mm}g \end{bmatrix}. $$

Example 1 Let us calculate the gradient and Hessian of the quadratic function $f(y)$ given above. In this case
$$ f'(y) = Ay - b \qquad\text{and}\qquad f'' = A. $$
Suppose we have a scheme in which $x^{k+1}$ is given by Equation 8.1. Then an application of Taylor's formula implies that
$$ g(x^{k+1}) = g(x^k) + \alpha_k\, g'(x^k)\cdot d^k + \frac{\alpha_k^2}{2}\, d^k\cdot g''(x^k)\, d^k + O(\alpha_k^3) $$
as $\alpha_k \to 0$. Suppose that $g'(x^k) \neq 0$. We can then ensure that $g(x^{k+1}) < g(x^k)$ (at least for small $\alpha_k$) if
$$ g'(x^k) \cdot d^k < 0. \qquad (8.2) $$
We say that $d^k$ is a descent direction if Equation 8.2 holds.
8.2 Gradient Method: Steepest Descent

The different methods are determined by the choice of descent direction $d^k$ and step length $\alpha_k$. For the gradient method we choose $d^k = -g'(x^k)$.
The step length $\alpha_k$ is then chosen optimally, in the sense that
$$ g(x^k + \alpha_k d^k) = \min_{\alpha > 0} g(x^k + \alpha d^k). $$
It is easy to see that $\alpha_k$ must then satisfy
$$ \left.\frac{d}{d\alpha}\, g(x^k + \alpha d^k)\right|_{\alpha = \alpha_k} = 0. $$
This is equivalent to
$$ g'(x^k + \alpha_k d^k) \cdot d^k = 0. $$
In the case $g = f$ quadratic, we have
$$ 0 = f'(x^k + \alpha_k d^k)\cdot d^k = \big(A(x^k + \alpha_k d^k) - b\big)\cdot d^k = (Ax^k - b)\cdot d^k + \alpha_k\, d^k\cdot Ad^k, $$
so
$$ \alpha_k = \frac{(b - Ax^k)\cdot d^k}{d^k\cdot Ad^k} = \frac{r^k\cdot d^k}{d^k\cdot Ad^k} \qquad (8.1) $$
where as before $r^k = b - Ax^k$ is the residue. This specifies the gradient method for the iterative solution of problems with positive definite matrices. An important question is how fast this method converges. For a positive definite matrix $A$, the condition number (with respect to the $L_2$ norm) is given by
$$ \kappa(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)} $$
where $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ are the maximum and minimum eigenvalues of $A$. It is worth noting that these eigenvalues satisfy
$$ \lambda_{\max}(A) = \max_{y \in \mathbb{R}^m,\; y \neq 0} \frac{y^TAy}{\|y\|^2}, \qquad \lambda_{\min}(A) = \min_{y \in \mathbb{R}^m,\; y \neq 0} \frac{y^TAy}{\|y\|^2}. $$
The second ratio is known as the Rayleigh Quotient. Also recall that if $B$ is a symmetric matrix with eigenvalues $\lambda_1(B), \ldots, \lambda_n(B)$ then the $L_2$ operator norm of $B$ satisfies
$$ \|B\|_2 = \max_j |\lambda_j(B)|. $$
To obtain an idea of how the gradient method converges, let us consider a steepest descent method with a constant step length. That is, let us consider the iterative method where $\alpha_k = \alpha$, a constant, and
$$ d^k = r^k = b - Ax^k. $$
Then
$$ x^{k+1} = x^k + \alpha(b - Ax^k). \qquad (8.2) $$
Now the exact solution $x$ must also satisfy
$$ x = x + \alpha(b - Ax). \qquad (8.3) $$
Subtracting Equations 8.2 and 8.3 leads to the equation
$$ x^{k+1} - x = (I - \alpha A)(x^k - x). $$
If we let $e^k$ denote the error at the $k$th step, then we have $e^{k+1} = Ee^k$ where the error matrix (see Section 7.1) is given by $E = I - \alpha A$. For convergence we need $\|I - \alpha A\|_2 < 1$. Consequently, we need $\max_j |1 - \alpha\lambda_j(A)| < 1$. Since all the eigenvalues are positive, we conclude that this implies that
$$ \alpha < \frac{2}{\lambda_{\max}(A)}. $$
Let us choose $\alpha = 1/\lambda_{\max}(A)$. It turns out that this choice is close to the best choice. Then we have
$$ \|I - \alpha A\|_2 = 1 - \frac{\lambda_{\min}(A)}{\lambda_{\max}(A)} = 1 - \frac{1}{\kappa(A)}. $$
This shows that as the condition number increases, the convergence rate decreases. In fact we see that the number of iterations required to obtain a specified accuracy will be proportional to the condition number.
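A minimal sketch of the gradient (steepest descent) method with the optimal step length of Equation 8.1. The small test system is illustrative and not taken from the notes.

\begin{verbatim}
import numpy as np

def steepest_descent(A, b, x0, tol=1e-8, max_iter=10000):
    # Gradient method for a symmetric positive definite matrix A
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        r = b - A @ x                      # residue = descent direction
        if np.linalg.norm(r) < tol:
            return x, k
        alpha = (r @ r) / (r @ (A @ r))    # optimal step length (Equation 8.1)
        x = x + alpha * r
    return x, max_iter

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x, iters = steepest_descent(A, b, np.zeros(2))
print(x, iters)   # compare with np.linalg.solve(A, b)
\end{verbatim}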
8.3 Conjugate Gradient Method

In the case of steepest descent the descent directions have a tendency to zig-zag into the solution. What we want is to ensure that the sequence of descent directions is chosen to be mutually orthogonal. This will necessarily stop the oscillations. Actually we can arrange the method so that the descent directions are conjugate, in the sense that
$$ d^i\cdot Ad^j = 0, \qquad i \neq j. $$
This means that the directions are orthogonal with respect to a new scalar product $\langle\cdot,\cdot\rangle$ given by $\langle x, y\rangle = x^TAy$.
The conjugacy condition can be written $\langle d^i, d^j\rangle = 0$, $i \neq j$. Associated with this scalar product is a norm $\|\cdot\|_A$ given by $\|x\|_A = \langle x, x\rangle^{1/2}$.

The conjugate gradient method can now be described. Essentially we choose the descent directions to be conjugate, and choose the step length to be optimal. Specifically we have the following algorithm:

[Conjugate Gradient Method] Given an initial vector $x^0$ and descent direction $d^0 = r^0 = b - Ax^0$, generate $x^k, d^k$, $k = 1, 2, \ldots$ using the following iterative scheme:
1. Set
$$ \alpha_k = \frac{r^k\cdot d^k}{\langle d^k, d^k\rangle}. $$
2. Set $x^{k+1} = x^k + \alpha_k d^k$.
3. Calculate the new residue $r^{k+1} = b - Ax^{k+1}$. If the residue is small enough, then terminate the calculation, otherwise continue.
4. Set
$$ \beta_k = \frac{\langle r^{k+1}, d^k\rangle}{\langle d^k, d^k\rangle}. $$
5. Calculate the new descent direction $d^{k+1} = r^{k+1} - \beta_k d^k$.
6. Set $k = k + 1$ and return to item (1).

A number of points should be noted. First, $\alpha_k$ has been chosen to be optimal, as in Equation 8.1. Second, the parameter $\beta_k$ has been chosen to ensure that $\langle d^{k+1}, d^k\rangle = 0$. To see this, note that
$$ \langle d^{k+1}, d^k\rangle = \langle r^{k+1} - \beta_k d^k, d^k\rangle = \langle r^{k+1}, d^k\rangle - \beta_k\langle d^k, d^k\rangle. $$
Consequently, $\langle d^{k+1}, d^k\rangle = 0$ will be satisfied if
$$ \beta_k = \frac{\langle r^{k+1}, d^k\rangle}{\langle d^k, d^k\rangle}. $$
The wonderful thing about this method is that the search direction $d^{k+1}$ turns out to be conjugate to all the other search directions. We will now show this via a number of lemmas.
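The algorithm above translates almost line for line into code. The sketch below follows the notation of the notes (residue r^k, direction d^k, parameters alpha_k and beta_k); the small test system is illustrative.

\begin{verbatim}
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    # Conjugate gradient method for symmetric positive definite A (sketch)
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x
    d = r.copy()
    for k in range(n):
        Ad = A @ d
        alpha = (r @ d) / (d @ Ad)        # optimal step length
        x = x + alpha * d
        r = b - A @ x                     # new residue
        if np.linalg.norm(r) < tol:
            return x, k + 1
        beta = (r @ Ad) / (d @ Ad)        # <r^{k+1}, d^k> / <d^k, d^k> in the A inner product
        d = r - beta * d                  # new conjugate search direction
    return x, n

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x, iters = conjugate_gradient(A, b)
print(x, iters)   # exact (up to rounding) in at most n = 3 steps
\end{verbatim}

In exact arithmetic the loop terminates after at most n steps, which is the content of Theorem 3 below.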
Lemma 1 The Conjugate gradient method produces search directions and residues which generate the following equivalent vector subspaces:
$$ \mathrm{span}[d^0, \ldots, d^m] = \mathrm{span}[r^0, \ldots, r^m] = \mathrm{span}[r^0, Ar^0, \ldots, A^m r^0]. $$
Proof. The proof follows easily from an induction argument and the observation that $d^{k+1} = r^{k+1} - \beta_k d^k$ and $r^{k+1} = r^k - \alpha_k Ad^k$. See [Joh90][p. 132]. $\Box$

Lemma 2 The search directions $d^i$ are pairwise conjugate, i.e. $\langle d^i, d^j\rangle = 0$, $i \neq j$. Further, the residues $r^i$ are orthogonal, i.e. $r^i\cdot r^j = 0$, $i \neq j$.

Proof. We will use induction. Let us suppose the statement of the lemma is true for $i, j \leq k$. That is, suppose $r^i\cdot r^j = 0$ and $\langle d^i, d^j\rangle = 0$ for $i \neq j$ and $i, j \leq k$. Now $r^k\cdot d^j = 0$ for $j = 0, \ldots, k-1$, since by the induction hypothesis $r^k$ is orthogonal to the subspace $\mathrm{span}[r^0, \ldots, r^{k-1}]$, which by Lemma 1 is equal to $\mathrm{span}[d^0, \ldots, d^{k-1}]$. Consequently,
$$ r^{k+1}\cdot d^j = r^k\cdot d^j - \alpha_k\langle d^k, d^j\rangle = 0 $$
for $j = 0, \ldots, k-1$. Substituting the exact form of $\alpha_k$ shows that $r^{k+1}\cdot d^k = 0$. Hence $r^{k+1}\cdot d^j = 0$ for $j = 0, \ldots, k$. This shows that $r^{k+1}\cdot r^j = 0$ for $j = 0, \ldots, k$. Now $Ad^j \in \mathrm{span}[r^0, \ldots, r^{j+1}]$, so
$$ \langle r^{k+1}, d^j\rangle = 0 \qquad (8.1) $$
for $j = 0, \ldots, k-1$. We can now use this to show that the search directions are conjugate. Observe that
$$ \langle d^{k+1}, d^j\rangle = \langle r^{k+1}, d^j\rangle - \beta_k\langle d^k, d^j\rangle = 0 $$
for $j = 0, \ldots, k-1$. This follows from our induction hypothesis on conjugacy and from Equation 8.1. By construction of the method $\langle d^{k+1}, d^k\rangle = 0$, so the induction step is complete. The proof follows by observing that the result is obviously true for $k = 1$. $\Box$

We are now led to the interesting result that the Conjugate gradient method is a direct method for positive definite matrices. That is, when using exact arithmetic, the method produces the exact solution in a finite number of steps. This result can be stated as follows:
Theorem 3 Consider an $n \times n$ positive definite matrix $A$. For some $m \leq n$, $Ax^m = b$, where $x^m$ is generated by the Conjugate gradient method.

Proof. Since the residues $r^j$ are mutually orthogonal, and there can be at most $n$ nonzero mutually orthogonal vectors in $\mathbb{R}^n$, we must have $r^m = b - Ax^m = 0$ for some $m \leq n$. $\Box$
The Conjugate gradient method can also be profitably viewed as an iterative method. Observe that since $x^{k+1} - x^k = \alpha_k d^k$ we have
$$ x^m - x^0 = \sum_{k=0}^{m-1} \alpha_k d^k. \qquad (8.2) $$
By conjugacy
$$ \langle x^m - x^0, d^m\rangle = \sum_{k=0}^{m-1} \alpha_k\langle d^k, d^m\rangle = 0. $$
Consequently
$$ \langle x^m, d^m\rangle = \langle x^0, d^m\rangle \qquad (8.3) $$
for $m = 0, 1, \ldots$. By Equation 8.3 we see that
$$ r^k\cdot d^k = (Ax - Ax^k)\cdot d^k = \langle x - x^k, d^k\rangle = \langle x - x^0, d^k\rangle, $$
where $x$ is the exact solution of our problem. Substitution of this relation for $r^k\cdot d^k$ into the expression for $\alpha_k$ leads to the equality
$$ \alpha_k = \frac{\langle x - x^0, d^k\rangle}{\langle d^k, d^k\rangle} $$
for $k = 0, 1, \ldots$. The previous relation for $\alpha_k$ can be substituted into Equation 8.2 to show that
$$ \langle x^k - x^0, d^j\rangle = \sum_{m=0}^{k-1} \alpha_m\langle d^m, d^j\rangle = \alpha_j\langle d^j, d^j\rangle = \langle x - x^0, d^j\rangle $$
for $j = 0, 1, \ldots, k-1$. So $x^k - x^0$ is the projection of the initial error $x - x^0$ onto the subspace
$$ W_k = \mathrm{span}[d^0, \ldots, d^{k-1}]. $$
Thus
$$ \|x - x^k\|_A \leq \|x - x^0 + y\|_A \qquad (8.4) $$
for all $y \in W_k$. By Lemma 1 we have
$$ W_k = \mathrm{span}[r^0, Ar^0, \ldots, A^{k-1}r^0] = \mathrm{span}[A(x - x^0), A^2(x - x^0), \ldots, A^k(x - x^0)]. $$
Hence any element $y \in W_k$ can be written as $q_k(A)(x - x^0)$ where $q_k(z)$ is a polynomial of the form $\sum_{j=1}^k a_j z^j$. Equation 8.4 can now be written in the form
$$ \|x - x^k\|_A \leq \|p_k(A)(x - x^0)\|_A $$
for all polynomials $p_k$ of the form $p_k(z) = 1 + \sum_{j=1}^k a_j z^j$. This leads to the following theorem:
Theorem 4
$$ \|x - x^k\|_A \leq \|p_k(A)(x - x^0)\|_A \leq \max_j |p_k(\lambda_j(A))|\, \|x - x^0\|_A $$
for all polynomials $p_k$ of the form $p_k(z) = 1 + \sum_{j=1}^k a_j z^j$.
To use this theorem we will construct a polynomial $\tilde p_k$ of the required form such that
$$ \mu_k = \max_{\lambda_{\min}(A) \leq z \leq \lambda_{\max}(A)} |\tilde p_k(z)| $$
is as small as possible. Luckily this is a classical problem. The best polynomial will be the appropriate Chebyshev polynomial, and in that case
$$ \mu_k \leq 2\left[\frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1}\right]^k. $$
Thus for a given $\epsilon > 0$, we will have
$$ \|x - x^k\|_A \leq \epsilon\, \|x - x^0\|_A $$
provided $k$ is chosen so that $\mu_k \leq \epsilon$. This is true if $k > \frac{1}{2}\sqrt{\kappa(A)}\,\log\frac{2}{\epsilon}$. For the Conjugate gradient method the number of iterations necessary to obtain a required accuracy is therefore proportional to $\sqrt{\kappa(A)}$. This should be compared to the gradient method, in which the number of iterations is proportional to $\kappa(A)$.
Chapter 9
The Eigenvalue Problem

9.1 Introduction

Let us consider the problem of finding the eigenvalues of an $n \times n$ matrix $A$. The classical method involves solving the equation $\det(A - \lambda I) = 0$ for $\lambda$, where the function $f(\lambda) = \det(A - \lambda I)$ is a polynomial of degree $n$. There are a number of disadvantages with this straightforward method. First, the evaluation of the roots of the polynomial $\det(A - \lambda I)$ is an ill-conditioned problem: a very small change in the elements of $A$ can produce a large change in the roots of the polynomial. Secondly, the number of operations required to calculate the coefficients is of the order $n!$. Finally, once the polynomial is known, it is necessary to solve for its roots. Such root finding routines are necessarily iterative in nature and are prone to stability and convergence problems. In other words, the straightforward method for calculating eigenvectors and eigenvalues is not conducive to numerical computation. Note that since this root finding aspect is a key element in the determination of eigenvalues, we can never completely avoid it. In this section we will consider the simpler problem of determining just a few special eigenvalues of a matrix. First we will consider the problem of determining the dominant eigenvalue.
9.2 The Power Method

Let us suppose that the matrix $A$ has a complete set of eigenvalues
$$ \lambda_1, \lambda_2, \ldots, \lambda_{n-1}, \lambda_n, $$
ordered so that
$$ |\lambda_1| \geq |\lambda_2| \geq \cdots \geq |\lambda_{n-1}| \geq |\lambda_n|, $$
with corresponding eigenvectors
$$ u_1, u_2, \ldots, u_{n-1}, u_n. $$
We will suppose that these eigenvectors have unit norm ($\|u_i\| = 1$). The dominant eigenvalues are the eigenvalues with the largest absolute values. Now let us make the added assumption that $|\lambda_1|$ is strictly greater than $|\lambda_2|$. That is, we assume there is a unique dominant simple eigenvalue. Suppose we are given a vector $x$ which is dependent on the first eigenvector, that is,
$$ x = c_1 u_1 + c_2 u_2 + \cdots + c_n u_n $$
where $c_1 \neq 0$. Since $Au_j = \lambda_j u_j$, it is easy to see that
$$ Ax = \lambda_1 c_1 u_1 + \lambda_2 c_2 u_2 + \cdots + \lambda_n c_n u_n $$
and so
$$ A^k x = \lambda_1^k c_1 u_1 + \lambda_2^k c_2 u_2 + \cdots + \lambda_n^k c_n u_n. $$
Therefore
$$ \frac{A^k x}{\lambda_1^k} = c_1 u_1 + \left(\frac{\lambda_2}{\lambda_1}\right)^k c_2 u_2 + \cdots + \left(\frac{\lambda_n}{\lambda_1}\right)^k c_n u_n. $$
Note that the terms
$$ \left(\frac{\lambda_i}{\lambda_1}\right)^k, \qquad i \neq 1, $$
converge to zero as $k$ tends to infinity. Hence, for large $k$ the dominant term is $c_1 u_1$ and so
$$ \lim_{k\to\infty} \frac{A^k x}{\lambda_1^k} = c_1 u_1 \qquad\text{and}\qquad \lim_{k\to\infty} \frac{\|A^k x\|}{\|A^{k-1} x\|} = |\lambda_1|. $$
This result forms the basis for a numerical method for determining the dominant eigenvalue of a matrix. The quantity $\|A^k x\|$ can grow very large if $|\lambda_1| > 1$ (very small if $|\lambda_1| < 1$) and so a sensible numerical method must scale the calculations at each step. This leads to the following algorithm, known as the Power Algorithm.

[Power Method] Let $x^0 \in \mathbb{R}^n$ be an initial non-zero vector and let $\sigma_0 = x^0_p$ where $|x^0_p| = \|x^0\|_\infty$ (the component of largest absolute value). Specify a tolerance $\epsilon > 0$. Set $k = 0$. Generate $x^k$ and $\sigma_k$, $k = 1, 2, 3, \ldots$ by using the following iterative scheme:
1. Set $x^{k+1} = \sigma_k^{-1} A x^k$.
2. Set $\sigma_{k+1} = x^{k+1}_p$ where $|x^{k+1}_p| = \|x^{k+1}\|_\infty$.
3. Set $k = k + 1$.
4. If $|\sigma_k - \sigma_{k-1}| \leq \epsilon$ then the calculation has succeeded; write out the approximate eigenvalue $\sigma_k$ and eigenvector $x^k$, and terminate the scheme.

The sequence $\sigma_k$ will converge to the absolute value of the dominant eigenvalue (provided it is distinct) and the vector $x^k$ to the corresponding eigenvector. Let us prove this. We make the same assumptions as for the analysis above, namely that the dominant eigenvalue is real and distinct and our starting vector is dependent on the dominant eigenvector. We then have that
$$ x^{k+1} = \frac{1}{\sigma_k} Ax^k = \frac{1}{\sigma_k} A\left(\frac{1}{\sigma_{k-1}} Ax^{k-1}\right) = \frac{1}{\sigma_k\sigma_{k-1}} A^2 x^{k-1} = \cdots = \frac{1}{\sigma_k\sigma_{k-1}\cdots\sigma_0} A^{k+1} x^0. $$
Taking the norm of this expression, and observing that $\|x^{k+1}\|_\infty = |\sigma_{k+1}|$, we see that
$$ |\sigma_{k+1}\sigma_k\sigma_{k-1}\cdots\sigma_0| = \|A^{k+1}x^0\|_\infty. $$
Hence
$$ |\sigma_{k+1}| = \frac{\|A^{k+1}x^0\|_\infty}{\|A^k x^0\|_\infty}. $$
Consequently (using the result obtained earlier)
$$ \lim_{k\to\infty} |\sigma_{k+1}| = |\lambda_1| $$
as required. Let us rephrase the results of the previous section.
Theorem 1 Suppose a matrix $A$ has a distinct dominant eigenvalue. Provided the starting vector $x^0$ for the power algorithm is dependent on the dominant eigenvector (which will happen almost always), the sequence $\sigma_k$ will converge to $\lambda_i$ where $|\lambda_i| = \max_j |\lambda_j|$ (the maximum of the absolute values of the eigenvalues of $A$) and the vector $x^k$ will converge to the dominant eigenvector.
Example 1 Let us use the power algorithm to estimate the dominant eigenvalue of the matrix
$$ A = \begin{bmatrix} 5 & 2 \\ 2 & 0 \end{bmatrix}. $$
Note that the exact eigenvalues can be calculated and are approximately $\lambda_1 = 5.70156212$ and $\lambda_2 = -0.70156212$.
We will use the infinity norm. As a starting vector we choose $x^0 = (1, 0)^T$. Then $\sigma_0 = 1$. The first step of the power algorithm gives
$$ x^1 = A\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \end{bmatrix}, \qquad \sigma_1 = 5. $$
Continuing the process, we obtained the following results.

k     sigma_k       x_1^k         x_2^k
0     1.00000000    1.00000000    0.00000000
1     5.00000000    5.00000000    2.00000000
2     5.80000000    5.80000000    2.00000000
3     5.68965517    5.68965517    2.00000000
4     5.70303030    5.70303030    2.00000000
5     5.70138151    5.70138151    2.00000000
6     5.70158434    5.70158434    2.00000000
7     5.70155938    5.70155938    2.00000000
8     5.70156246    5.70156246    2.00000000
9     5.70156208    5.70156208    2.00000000
10    5.70156212    5.70156212    2.00000000

Table 9.1: Accuracy of the Power method.

The dominant eigenvalue (to 9 significant figures) is then given by $5.70156212$ and the corresponding eigenvector is $(5.70156212, 2.00000000)^T$. Note that in general both coordinates of the eigenvector would change with each iteration.
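The 2 x 2 example is easy to reproduce with a short sketch of the power algorithm; sigma_k is stored as the entry of x^k of largest absolute value, as in the algorithm above. The tolerance is illustrative.

\begin{verbatim}
import numpy as np

def power_method(A, x0, tol=1e-9, max_iter=100):
    # Power method with infinity-norm scaling (sketch of the algorithm above)
    x = np.array(x0, dtype=float)
    sigma = x[np.argmax(np.abs(x))]          # sigma_0
    for k in range(max_iter):
        x = A @ x / sigma
        sigma_new = x[np.argmax(np.abs(x))]  # entry of largest magnitude
        if abs(sigma_new - sigma) <= tol:
            return sigma_new, x, k + 1
        sigma = sigma_new
    return sigma, x, max_iter

A = np.array([[5.0, 2.0],
              [2.0, 0.0]])
sigma, x, iters = power_method(A, np.array([1.0, 0.0]))
print(sigma, iters)   # approximately 5.70156212, cf. Table 9.1
\end{verbatim}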
9.2.1 Convergence Rate of the Power Algorithm

We again make the assumption that the eigenvalues of the matrix are real and that the dominant eigenvalue $\lambda_1$ is distinct. Now
$$ \frac{A^k x^0}{\lambda_1^k} = c_1 u_1 + \sum_{i=2}^n \left(\frac{\lambda_i}{\lambda_1}\right)^k c_i u_i $$
and so
$$ \frac{A^k x^0}{\lambda_1^k} = c_1 u_1 + O(\lambda^k), $$
where $\lambda = \lambda_2/\lambda_1$ (the ratio of the second largest to the largest eigenvalue). Note that $|\lambda| < 1$. Here we are using the large $O$ notation: a function $f(k) = O(\lambda^k)$ as $k$ tends to infinity if $f(k)/\lambda^k$ is bounded as $k$ tends to infinity. The estimate of the absolute value of the dominant eigenvalue can then be given as
$$ \sigma_k = \frac{\|A^k x^0\|}{\|A^{k-1} x^0\|} = |\lambda_1| + O(\lambda^k). $$
In fact we can expand the error term further to see that
$$ \sigma_k = |\lambda_1| + a\lambda^k + R_k $$
where $a$ is some constant and
$$ \lim_{k\to\infty} \frac{R_k}{\lambda^k} = 0. $$
We observe that the error $e_k = \sigma_k - |\lambda_1|$ satisfies the relation
$$ e_k \approx \lambda\, e_{k-1}. $$
This is an example of what is called linear convergence.
9.2.2 Shifts

As described, the power method can only be used to find the dominant eigenvalue. We will now describe how to find the maximum and minimum eigenvalues of a matrix. The main idea is to "shift" the matrix so that one or other of the extreme eigenvalues becomes dominant. Let us consider the following example to clarify the method. Consider
$$ A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0.5 \end{bmatrix}. $$
This matrix has eigenvalues $\lambda_1 = -1$, $\lambda_2 = 0.5$, $\lambda_3 = 1$ with eigenvectors
$$ v_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \qquad v_3 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}. $$
We observe that the power method will not work for this matrix, since the dominant eigenvalues are not distinct ($|\lambda_1| = |\lambda_3|$). Now consider the shifted matrix
$$ A - \mu I = \begin{bmatrix} -\mu & 1 & 0 \\ 1 & -\mu & 0 \\ 0 & 0 & 0.5 - \mu \end{bmatrix}. $$
This matrix has eigenvalues $-1 - \mu$, $0.5 - \mu$, $1 - \mu$. By a judicious choice of shift we can make the smallest or the largest eigenvalue dominant. Hence the power method can be applied to find either the smallest or largest eigenvalue of $A - \mu I$ and so of $A$. In general the power method applied to a shifted matrix $A - \mu I$ produces a sequence $\sigma_k$ such that
$$ \lim_{k\to\infty} \sigma_k = \begin{cases} \lambda_1 - \mu & \text{if } \mu > \tfrac{1}{2}(\lambda_1 + \lambda_n), \\ \lambda_n - \mu & \text{if } \mu < \tfrac{1}{2}(\lambda_1 + \lambda_n), \end{cases} $$
where $\lambda_1$ and $\lambda_n$ denote respectively the smallest and largest eigenvalues of $A$.
9.2.3 Aitken $\Delta^2$ Algorithm

The Aitken $\Delta^2$ algorithm is used on sequences which demonstrate linear convergence to produce a new sequence which converges to the same limit at a faster rate. Hence we can apply the Aitken algorithm to the sequence $\sigma_k$ produced by the power method to form a new sequence $\sigma'_k$ which converges to the dominant eigenvalue at a faster rate. The algorithm is based on the following theorem.
Theorem 2 Suppose we have a sequence $S_k = S_\infty + a\lambda^k + R_k$, where $|\lambda| < 1$ and the term $R_k$ satisfies $\lim_{k\to\infty} R_k/\lambda^k = 0$. Then the sequence $S'_k$ given by
$$ S'_k = S_k - \frac{(S_{k+1} - S_k)(S_k - S_{k-1})}{S_{k+1} - 2S_k + S_{k-1}} $$
satisfies
$$ \lim_{k\to\infty} S'_k = S_\infty \qquad\text{and}\qquad \lim_{k\to\infty} \frac{S'_k - S_\infty}{S_k - S_\infty} = 0. \qquad (9.1) $$
Note that Equation 9.1 implies that the sequence $S'_k$ converges to $S_\infty$ faster than does $S_k$.
Example 2 Let us apply the Aitken algorithm to the values of $\sigma_k$ in Table 9.1. Table 9.2 shows the speed up obtained by using the Aitken algorithm. Note that this speedup occurs only because the initial sequence produced by the power method displays linear convergence.
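A sketch of the acceleration step, applying the formula of Theorem 2 to the power-method estimates sigma_k for the 2 x 2 matrix above (for which, once the scaled iterate has the form (sigma_k, 2), the estimates satisfy sigma_{k+1} = 5 + 4/sigma_k). The helper name is illustrative.

\begin{verbatim}
import numpy as np

def aitken(S):
    # S'_k = S_k - (S_{k+1}-S_k)(S_k-S_{k-1}) / (S_{k+1}-2S_k+S_{k-1}),
    # defined for 1 <= k <= len(S)-2
    S = np.asarray(S, dtype=float)
    num = (S[2:] - S[1:-1]) * (S[1:-1] - S[:-2])
    den = S[2:] - 2.0 * S[1:-1] + S[:-2]
    return S[1:-1] - num / den

# Power-method estimates for A = [[5, 2], [2, 0]], as in Table 9.1
sigma = [1.0, 5.0]
for _ in range(9):
    sigma.append(5.0 + 4.0 / sigma[-1])

print(np.round(sigma, 8))
print(np.round(aitken(sigma), 8))   # accelerated estimates of the dominant eigenvalue
\end{verbatim}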
9.3 Inverse Iteration or the Inverse Power Method

By utilizing shifts we have shown how to calculate the smallest and largest eigenvalues of a matrix. By also introducing the idea of using the inverse of a matrix (or at least by solving linear equations associated with a matrix) it is possible to use the power method to obtain an estimate of the eigenvalue which is closest to any specified initial guess. Consider the power method applied to the inverse of a matrix $A$. If $A$ has eigenvalues $(\lambda_1, \lambda_2, \ldots, \lambda_{n-1}, \lambda_n)$ then $A^{-1}$ has eigenvalues
$$ (\lambda_1^{-1}, \lambda_2^{-1}, \ldots, \lambda_{n-1}^{-1}, \lambda_n^{-1}). $$
k     sigma_k       sigma'_k
0     1.00000000    undefined
1     5.00000000    4.70588235
2     5.80000000    5.68965517
3     5.68965517    5.70303030
4     5.70303030    5.70158434
5     5.70138151    5.70156246
6     5.70158434    5.70156212
7     5.70155938    5.70156212
8     5.70156246    5.70156212
9     5.70156208    5.70156212
10    5.70156212    undefined

Table 9.2: Results using the power method and Aitken's algorithm.

Hence the power method applied to $A^{-1}$ (called inverse iteration) will converge to $\lambda_k^{-1}$ where $|\lambda_k|^{-1} = \max_j |\lambda_j|^{-1} = 1/\min_j |\lambda_j|$. That is, the power method applied to $A^{-1}$ will converge to the reciprocal of the eigenvalue with the smallest absolute value.

Let us now consider inverse iteration with shifts. If $A$ has eigenvalues $(\lambda_1, \lambda_2, \ldots, \lambda_{n-1}, \lambda_n)$ then $A - \mu I$ has eigenvalues $(\lambda_1 - \mu, \lambda_2 - \mu, \ldots, \lambda_{n-1} - \mu, \lambda_n - \mu)$ and $(A - \mu I)^{-1}$ has eigenvalues
$$ \big((\lambda_1 - \mu)^{-1}, (\lambda_2 - \mu)^{-1}, \ldots, (\lambda_{n-1} - \mu)^{-1}, (\lambda_n - \mu)^{-1}\big). $$
Hence the power method applied to the matrix $(A - \mu I)^{-1}$ will converge to $(\lambda_k - \mu)^{-1}$ where $|\lambda_k - \mu| = \min_j |\lambda_j - \mu|$. This leads to the following algorithm:

[Inverse Iteration] Let $x^0 \in \mathbb{R}^n$ be an initial non-zero vector and let $\sigma_0 = x^0_p$ where $|x^0_p| = \|x^0\|_\infty$. Specify a shift $\mu$ and a tolerance $\epsilon > 0$. Initialize $k = 0$. Generate $x^k$ and $\sigma_k$, $k = 1, 2, 3, \ldots$ by using the following iterative scheme:
1. Solve for $x^{k+1}$ where $(A - \mu I)x^{k+1} = \sigma_k^{-1}x^k$.
2. Set $\sigma_{k+1} = x^{k+1}_p$ where $|x^{k+1}_p| = \|x^{k+1}\|_\infty$.
3. Set $k = k + 1$.
4. If $|\sigma_k - \sigma_{k-1}| \leq \epsilon$ then the calculation has succeeded; write out the approximate eigenvalue $\sigma_k$ and eigenvector $x^k$, and terminate the scheme.

The sequence $\sigma_k$ will converge to the reciprocal of the distance between $\mu$ and the closest eigenvalue of $A$. The vector $x^k$ will converge to the corresponding eigenvector.
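A sketch of inverse iteration with a fixed shift. Each step solves a system with A - mu*I (here via a pre-computed LU factorization) rather than forming the inverse explicitly; the matrix and shift follow the worked example below, while the tolerance is illustrative.

\begin{verbatim}
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_iteration(A, mu, x0, tol=1e-10, max_iter=100):
    # Inverse iteration with shift mu (sketch of the algorithm above)
    n = A.shape[0]
    lu, piv = lu_factor(A - mu * np.eye(n))   # factor A - mu*I once
    x = np.array(x0, dtype=float)
    sigma = x[np.argmax(np.abs(x))]
    for k in range(max_iter):
        x = lu_solve((lu, piv), x / sigma)    # (A - mu*I) x^{k+1} = x^k / sigma_k
        sigma_new = x[np.argmax(np.abs(x))]
        if abs(sigma_new - sigma) <= tol:
            break
        sigma = sigma_new
    # sigma converges to 1/(lambda - mu) for the eigenvalue lambda closest to mu
    return mu + 1.0 / sigma_new, x / np.linalg.norm(x, np.inf), k + 1

A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.5]])
lam, v, iters = inverse_iteration(A, mu=0.49, x0=np.ones(3))
print(lam, v, iters)   # approximately 0.5 and an eigenvector close to (0, 0, 1)
\end{verbatim}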
9.3.1 Error Analysis of Inverse Iteration

An integral part of the inverse iteration algorithm is the solution of the linear equation $(A - \mu I)x^{k+1} = x^k$. From our analysis of error measurement for Gaussian elimination we know that the errors expected from such a calculation will depend on the condition number of the matrix $A - \mu I$. Now if $A$ is symmetric, then
$$ \|A - \mu I\|_2 = \max_j |\lambda_j - \mu|, \qquad \|(A - \mu I)^{-1}\|_2 = \frac{1}{\min_j |\lambda_j - \mu|}. $$
Now the condition number (with respect to the 2 norm) is given by
$$ \kappa_2(A - \mu I) = \frac{\max_j |\lambda_j - \mu|}{\min_j |\lambda_j - \mu|}. $$
If the shift is close to an eigenvalue, say $\mu = \lambda + \delta$ ($\delta$ small), then
$$ \kappa(A - \mu I) \sim \frac{1}{\delta}. $$
Thus it would seem that if $\mu$ is close to an eigenvalue then the condition number of $A - \mu I$ would be large and the calculation of $x^{k+1}$ would involve a large error. While this is true, it can be shown that the error vector is almost entirely in the direction of the dominant eigenvector. Hence the direction of the vector $x^{k+1}$ will still converge to the direction of the dominant eigenvector. For this reason the method works well. In fact inverse iteration is used as a standard method for finding the eigenvector given an estimate for some eigenvalue.
Example 1 We will consider an example in which inverse iteration is used to estimate an eigenvalue given a close estimate of that eigenvalue. Suppose we have an approximation $\mu$ to an eigenvalue $\lambda$ such that $\mu = \lambda - \delta$ where $\delta$ is a small quantity. An estimate of the eigenvector corresponding to the eigenvalue $\lambda$ and a correction to the estimate can easily be obtained via one iteration of the inverse power method. The initial vector $x^0$ can be written
$$ x^0 = c_1 u_1 + \cdots + c_n u_n. $$
The vector $x^1$ can also be written as a linear combination of the eigenvectors. Since $(A - \mu I)x^1 = x^0$, it follows that
$$ x^1 = (A - \mu I)^{-1}x^0 = \frac{c_1}{\lambda_1 - \mu}u_1 + \cdots + \frac{c_n}{\lambda_n - \mu}u_n. $$
Since $\delta = \lambda - \mu$ is assumed to be a small quantity, we have that
$$ x^1 \approx \frac{c\,u}{\lambda - \mu} = \frac{c\,u}{\delta}, $$
where $u$ is the eigenvector corresponding to $\lambda$ and $c$ is the corresponding coefficient. Then the approximation to the dominant eigenvector is given by
$$ u \approx \frac{x^1}{\|x^1\|_\infty}. $$
Note that we have normalized the eigenvector using the infinity norm. To estimate the coefficient $c$ we use the fact that the eigenvectors are orthogonal ($u_\alpha^T u_\beta = 0$, $\alpha \neq \beta$). Hence
$$ c\,u^Tu \approx (x^0)^Tu \qquad\text{and so}\qquad c \approx \frac{(x^0)^Tu}{u^Tu}. $$
From this we can calculate the correction to the eigenvalue:
$$ \delta \approx \frac{c}{\|x^1\|_\infty}. $$
Let us apply this method to the matrix
$$ \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0.5 \end{bmatrix} $$
with a shift $\mu = 0.49$ and an initial vector $x^0 = (1, 1, 1)^T$. We need to solve
$$ (A - \mu I)x^1 = \begin{bmatrix} -0.49 & 1 & 0 \\ 1 & -0.49 & 0 \\ 0 & 0 & 0.01 \end{bmatrix} x^1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}. $$
Thus $x^1 = (1.97, 1.97, 100.0)^T$. The approximate eigenvector is given by
$$ u \approx \frac{x^1}{\|x^1\|_\infty} = \begin{bmatrix} 0.0197 \\ 0.0197 \\ 1.0 \end{bmatrix}. $$
From this the coefficient $c$ is approximately 1.0. Hence
$$ \delta \approx \frac{c}{\|x^1\|_\infty} \approx 0.01 $$
and so an improved estimate for the eigenvalue is given by $\mu + \delta = 0.5$.
9.4 The QR Algorithm

The power method described in the previous section provides an efficient method for calculating specific eigenvalues and their corresponding eigenvectors. To obtain further eigenvalues it is possible to use deflation methods. Unfortunately roundoff errors grow when these techniques are used. In this section we will discuss a method, the QR method, which provides an efficient way of finding all the eigenvalues of a matrix. Before we describe the method it is necessary to recall some properties of symmetric matrices. The result that underpins the method is the fact that the eigenvalues of a symmetric matrix remain unchanged under an orthogonal similarity transformation. Specifically we have the result:
Proposition 1 Let $A$ be a symmetric matrix. If $\lambda$ is an eigenvalue of $A$ with eigenvector $v$, then $\lambda$ is an eigenvalue of the matrix $Q^TAQ$ with eigenvector $Q^Tv$, where $Q$ is an arbitrary orthogonal matrix.
Proof. The result is seen via the following simple argument:
$$ Q^TAQ(Q^Tv) = Q^TAv = Q^T(\lambda v) = \lambda\,Q^Tv. \qquad\Box $$
For a symmetric $A$ it is possible to find an orthogonal $Q$ such that
$$ Q^TAQ = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n). $$
In practice though it is impossible to do so in a finite number of operations. On the other hand it is possible to find an orthogonal matrix $Q$ which transforms the matrix $A$ into a tridiagonal matrix. The number of operations needed to make such a transformation is $O(n^3)$. Just as with QR factorization, the transformation to tridiagonal form can most easily be understood as a sequence of Householder transformations, each transformation being used to "clear" the entries in the column below the matrix element $a_{i,i-1}$ and the entries in the row to the right of the matrix element $a_{i-1,i}$ for $i = 2, 3, \ldots, n$. Once a symmetric matrix has been transformed to tridiagonal form, it is possible to use the so called QR method to efficiently find approximations to the complete set of eigenvalues of $A$.
9.4.1 Reduction to Tridiagonal Form

We will use Householder matrices to generate an orthogonal similarity transformation which transforms a symmetric matrix $A$ into a tridiagonal matrix. Since we know how to use Householder matrices to "clear out" columns it is a simple matter to apply a number
of transformations to produce the required tridiagonal form. We describe the method via an example. Let
$$ A = \begin{bmatrix} 1.0 & 1.0 & 0.5 \\ 1.0 & 1.0 & 0.25 \\ 0.5 & 0.25 & 2.0 \end{bmatrix}. $$
This matrix is symmetric. To form a tridiagonal matrix we must clear out the last element in the first column (and then the last element of the first row). Hence we must calculate a Householder matrix $H_2$. We consider the first column
$$ x = \begin{bmatrix} 1.0 \\ 1.0 \\ 0.5 \end{bmatrix} $$
of $A$. We form the unit vector
$$ u = \begin{bmatrix} 0.0 \\ 0.9732 \\ 0.2297 \end{bmatrix}. $$
The Householder matrix $H_2$ is then given by
$$ H_2 = \begin{bmatrix} 1.0 & 0.0 & 0.0 \\ 0.0 & -0.8944 & -0.4472 \\ 0.0 & -0.4472 & 0.8945 \end{bmatrix}. $$
Now it is important to note that $H_2^T$, if applied on the right of $A$, will clear out the last entry of the first row of $A$ (provided $A$ is symmetric). Hence we see that $A_1 = H_2AH_2^T$ is equal to
$$ A_1 = \begin{bmatrix} 1.0 & -1.118 & 0.0 \\ -1.118 & 1.400 & -0.550 \\ 0.0 & -0.550 & 1.600 \end{bmatrix}, $$
which is tridiagonal. Of course if the dimension of the initial matrix is larger we will have to apply more Householder transformations so as to clear out subsequent rows and columns.
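A sketch of the reduction, written as an explicit sequence of Householder similarity transformations; applied to the 3 x 3 matrix above it reproduces the tridiagonal matrix A_1 (up to rounding). The function name is illustrative.

\begin{verbatim}
import numpy as np

def householder_tridiagonalize(A):
    # Reduce a symmetric matrix to tridiagonal form, T = Q^T A Q (sketch)
    A = np.array(A, dtype=float)
    n = A.shape[0]
    Q = np.eye(n)
    for i in range(n - 2):
        x = A[i+1:, i].copy()
        normx = np.linalg.norm(x)
        if normx == 0.0:
            continue
        v = x.copy()
        v[0] += normx if x[0] >= 0 else -normx   # Householder vector
        v /= np.linalg.norm(v)
        H = np.eye(n)
        H[i+1:, i+1:] -= 2.0 * np.outer(v, v)    # H = I - 2uu^T on the trailing block
        A = H @ A @ H                            # H is symmetric and orthogonal
        Q = Q @ H
    return A, Q

A = np.array([[1.0, 1.0,  0.5 ],
              [1.0, 1.0,  0.25],
              [0.5, 0.25, 2.0 ]])
T, Q = householder_tridiagonalize(A)
print(np.round(T, 4))   # tridiagonal, cf. the worked example above
\end{verbatim}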
9.4.2 The QR Algorithm

Once our matrix is in tridiagonal form we can apply the following algorithm, known as the QR algorithm. Let $A$ be an $n \times n$ symmetric tridiagonal matrix. Let $A_0 = A$ and $\nu = 0$. Generate the matrices $A_\nu$, $\nu = 1, 2, 3, \ldots$ using the following iterative scheme:
1. Choose a shift $\mu_\nu$ (to be discussed later; it will be an estimate of an eigenvalue).
2. Find the QR factorization of the matrix $A_\nu - \mu_\nu I$, that is, find $Q_\nu$ and $R_\nu$ such that $Q_\nu R_\nu = A_\nu - \mu_\nu I$ (i.e. shift then factor).
3. Set $A_{\nu+1} = R_\nu Q_\nu + \mu_\nu I$ (i.e. reverse the order of the factors and shift back).
4. Set $\nu = \nu + 1$.
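The four steps above can be sketched directly with NumPy's QR factorization, together with the Rayleigh shift and the deflation test described in the following subsections; the tolerance and function name are illustrative, and a Wilkinson shift would be a more robust choice in general.

\begin{verbatim}
import numpy as np

def qr_eigenvalues(T, tol=1e-12, max_iter=500):
    # Shifted QR iteration for a symmetric (tridiagonal) matrix (sketch)
    T = np.array(T, dtype=float)
    eigenvalues = []
    while T.shape[0] > 1:
        for _ in range(max_iter):
            n = T.shape[0]
            mu = T[-1, -1]                       # Rayleigh shift
            Q, R = np.linalg.qr(T - mu * np.eye(n))
            T = R @ Q + mu * np.eye(n)           # reverse factors, shift back
            # deflation test, cf. the stopping criterion below
            if T[-1, -2] ** 2 <= tol * abs(T[-1, -1]):
                break
        eigenvalues.append(T[-1, -1])
        T = T[:-1, :-1]                          # deflate
    eigenvalues.append(T[0, 0])
    return np.array(eigenvalues)

A = np.array([[1.0, 2.0],
              [2.0, 3.0]])
print(qr_eigenvalues(A))   # approximately 4.23607 and -0.23607
\end{verbatim}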
To help justify the algorithm we first observe that the iterates $A_\nu$ are similar, and so all have the same eigenvalues, and that the iterates keep the tridiagonal form.
Lemma 1 $A_{\nu+1}$ is similar to $A_\nu$ and consequently has the same eigenvalues.

Proof. We just need to show that $A_{\nu+1}$ is similar to $A_\nu$:
$$ A_{\nu+1} = R_\nu Q_\nu + \mu_\nu I = Q_\nu^TQ_\nu R_\nu Q_\nu + \mu_\nu I = Q_\nu^T(A_\nu - \mu_\nu I)Q_\nu + \mu_\nu I = Q_\nu^TA_\nu Q_\nu. $$
So $A_{\nu+1}$ is similar to $A_\nu$. $\Box$

Lemma 2 If $A_\nu$ is tridiagonal then $A_{\nu+1}$ is tridiagonal.

Proof. First we observe that if $R_\nu$ is upper triangular, then $R_\nu^{-1}$ is also upper triangular (consider what happens when you use Gaussian elimination to produce the inverse). Now since $A_\nu$ is assumed to be tridiagonal, and $R_\nu^{-1}$ is upper triangular, it follows that $Q_\nu = (A_\nu - \mu_\nu I)R_\nu^{-1}$ must be upper Hessenberg. An upper Hessenberg matrix is a matrix whose non-zero elements all lie on the sub-diagonal (one below the diagonal) and above. It follows that $R_\nu Q_\nu$ must also be upper Hessenberg. But $R_\nu Q_\nu$ is also symmetric and so it must be tridiagonal. Hence $A_{\nu+1}$ is tridiagonal. $\Box$

On the face of it there is no reason for this algorithm to produce anything but a sequence of similar tridiagonal matrices. The amazing fact is that the matrices $A_\nu$ almost always converge to a diagonal matrix. This result is closely related to the convergence behaviour of the power method. The interested reader is referred to Golub and Van Loan [GV89][ch. 7-8]. The basic QR algorithm ($\mu_\nu = 0$) can be shown to be linearly convergent. However, with a proper choice of shifts $\mu_\nu$ we can drive the subdiagonal elements to zero at a faster rate. We mention two such shifting strategies. (1) Rayleigh shift: $\mu_\nu = a_{n,n}$ (the shift is given by the lower right hand corner element). (2) Wilkinson shift: $\mu_\nu$ is given by the eigenvalue of the lower right hand 2 by 2 matrix
$$ \begin{bmatrix} a_{n-1,n-1} & a_{n-1,n} \\ a_{n,n-1} & a_{n,n} \end{bmatrix} $$
which is closest to $a_{n,n}$.
9.4.3 Stopping Criterion

In general
$$ A_\nu = \begin{bmatrix} a_{1,1} & a_{1,2} & 0 & \cdots & 0 \\ a_{2,1} & a_{2,2} & a_{2,3} & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & a_{n-1,n-2} & a_{n-1,n-1} & a_{n-1,n} \\ 0 & \cdots & 0 & a_{n,n-1} & a_{n,n} \end{bmatrix}. $$
Now $a_{n,n-1} \to 0$ (and $a_{n-1,n} \to 0$) and $a_{n,n} \to \lambda$ (an eigenvalue) as $\nu \to \infty$. A measure of the relative error in the estimate of the eigenvalue is
$$ \frac{|a_{n,n} - \lambda|}{a_{n,n}} \approx \frac{(a_{n-1,n})^2}{a_{n,n}}. $$
When
$$ \frac{(a_{n-1,n})^2}{a_{n,n}} \leq \text{tolerance} $$
we accept $a_{n,n}$ as an eigenvalue and then "deflate" to the $(n-1)$ by $(n-1)$ matrix consisting of the $n$ by $n$ matrix with the $n$-th row and column excluded. The algorithm is then continued with this smaller matrix.
9.4.4 Example of the QR Algorithm

Let us apply the QR algorithm using Rayleigh shifts to the symmetric matrix
$$ A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}. $$
The shift is given by $\mu_0 = 3$, so
$$ A - \mu_0 I = \begin{bmatrix} -2 & 2 \\ 2 & 0 \end{bmatrix}. $$
We find the QR factors of this matrix to be
$$ Q_0 = \begin{bmatrix} -0.7068 & 0.7069 \\ 0.7069 & 0.7072 \end{bmatrix}, \qquad R_0 = \begin{bmatrix} 2.8274 & -1.4136 \\ 0.0 & 1.4136 \end{bmatrix}. $$
Reversing the factors and shifting back gives us the matrix $A_1$:
$$ A_1 = R_0Q_0 + \mu_0 I = \begin{bmatrix} 0.0024 & 0.9989 \\ 0.9992 & 3.9997 \end{bmatrix}. $$
The shift is now given by $\mu_1 = 3.9997$, so
$$ A_1 - \mu_1 I = \begin{bmatrix} -3.9973 & 0.9989 \\ 0.9992 & 0.0 \end{bmatrix}. $$
We find the QR factors of this matrix to be
$$ Q_1 = \begin{bmatrix} -0.9701 & 0.2426 \\ 0.2426 & 0.9702 \end{bmatrix}, \qquad R_1 = \begin{bmatrix} 4.1202 & -0.9690 \\ 0.0 & 0.2422 \end{bmatrix}. $$
Reversing the factors and shifting back gives us the matrix $A_2$:
$$ A_2 = R_1Q_1 + \mu_1 I = \begin{bmatrix} -0.2324 & 0.0594 \\ 0.0588 & 4.2351 \end{bmatrix}. $$
We take $\lambda_1 = 4.2351$ and $\lambda_2 = -0.2324$ as approximations to the eigenvalues of the matrix $A$. An estimate of the relative error is given by
$$ \frac{|4.2351 - \lambda_1|}{4.2351} \approx \frac{0.05^2}{4.2351} \approx 0.00059. $$
Note that the eigenvalues are $4.23607$ and $-0.23607$ correct to 5 decimal places. Note that we have a large amount of roundoff error due to using only 4 significant figures. Hence the error estimate is a little over optimistic.
Bibliography

[CMSW79] A. K. Cline, C. B. Moler, G. W. Stewart, and J. H. Wilkinson. An estimate for the condition number of a matrix. SIAM J. Numer. Anal., 16:368-375, 1979.

[GV89] G. Golub and C. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore and London, 1989.

[Joh90] C. Johnson. Numerical Solution of Partial Differential Equations by the Finite Element Method. Cambridge University Press, Cambridge, 1990.

[Kah66] W. Kahan. Numerical linear algebra. Canadian Math. Bull., 9:757-801, 1966.

[Ric83] J. Rice. Matrix Computations and Mathematical Software. McGraw-Hill, 1983.

[Wil61] J. H. Wilkinson. Error analysis of direct methods of matrix inversion. J. ACM, 8:281-330, 1961.

[You72] D. M. Young. Iterative Solution of Large Linear Systems. Academic Press, New York, 1972.