Chapter 7

PETROV-GALERKIN METHODS

7.1 Energy Norm Minimization
7.2 Residual Norm Minimization
7.3 General Petrov-Galerkin Methods
7.1 Energy Norm Minimization
Saad, Sections 5.3.1, 5.2.1a.

7.1.1 Methods based on approximate minimization
7.1.2 Steepest descent

We assume that $A$ is symmetric positive definite:
$$A = Q \Lambda Q^T, \qquad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n), \qquad \lambda_i > 0.$$
Unlike SOR, there are no parameters to choose.
7.1.1 Methods based on approximate minimization
Minimization methods minimize some objective function $\phi(x) = \phi(x_1, x_2, \ldots, x_n)$:

[Figure: surface plot of $\phi$ over the $(x_1, x_2)$ plane.]
Possible objective functions:

(i) norm of error $\|x - A^{-1}b\|_2$,

(ii) norm of residual $\|A(x - A^{-1}b)\|_2 = \|Ax - b\|_2$,

(iii) energy norm of error $\|A^{1/2}(x - A^{-1}b)\|_2 = \sqrt{(x - A^{-1}b)^T A (x - A^{-1}b)}$, where $A^{1/2} = Q\Lambda^{1/2}Q^T$ is the unique symmetric positive definite square root of $A$.

Each of these is a weighted 2-norm of the rotated error:

$\|A(x - A^{-1}b)\|_2 = \|\Lambda Q^T (x - A^{-1}b)\|_2$, weights $\lambda_1, \lambda_2, \ldots, \lambda_n$ (least desirable);

$\|A^{1/2}(x - A^{-1}b)\|_2 = \|\Lambda^{1/2} Q^T (x - A^{-1}b)\|_2$, weights $\lambda_1^{1/2}, \lambda_2^{1/2}, \ldots, \lambda_n^{1/2}$;

$\|x - A^{-1}b\|_2 = \|Q^T (x - A^{-1}b)\|_2$, weights $1, 1, \ldots, 1$ (most desirable).
We denote the energy norm by $|||w||| := \sqrt{w^T A w}$. In some applications it measures the square root of energy or power. We would like to minimize $\|x - A^{-1}b\|_2$, but any minimization method would need to know $A^{-1}b$. However,
$$|||x - A^{-1}b|||^2 = (x - A^{-1}b)^T A (x - A^{-1}b) = x^T A x - 2 x^T b + \underbrace{b^T A^{-1} b}_{\text{constant}},$$
which attains its minimum at the same value of $x$ as does
$$\phi(x) = \frac{1}{2} x^T A x - x^T b,$$
and this is cheap to compute for given $x$. Note that the unit circle in the energy norm is
$$\{w \mid |||w||| = 1\} = \{w \mid \|\Lambda^{1/2} Q^T w\|_2 = 1\} = \{Q \Lambda^{-1/2} u \mid \|u\|_2 = 1\} = Q \Lambda^{-1/2} \cdot (\text{unit circle}),$$
using the change of variable $\Lambda^{1/2} Q^T w =: u$.

[Figure: the unit circle mapped by $Q\Lambda^{-1/2}$ into an ellipse with semiaxes $\lambda_{\min}^{-1/2}$ and $\lambda_{\max}^{-1/2}$.]

Its elongation is
$$\text{elongation} = \frac{\lambda_{\min}^{-1/2}}{\lambda_{\max}^{-1/2}} = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}} = \sqrt{\|A\|_2 \|A^{-1}\|_2} = \sqrt{\kappa_2(A)}.$$
The contours of $|||x - A^{-1}b|||$ are curves of equal error in the energy norm.

[Figure: elliptical contours of $|||x - A^{-1}b|||$ in the $(x_1, \ldots, x_n)$ coordinates.]
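As a numerical sanity check, here is a small NumPy sketch (the random matrix and vectors are arbitrary illustrations, not from the notes) verifying that the energy norm is a weighted 2-norm of the rotated error and that the elongation equals $\sqrt{\kappa_2(A)}$:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)          # arbitrary symmetric positive definite A

lam, Q = np.linalg.eigh(A)           # A = Q diag(lam) Q^T
b = rng.standard_normal(4)
x = rng.standard_normal(4)           # an arbitrary trial point
err = x - np.linalg.solve(A, b)      # x - A^{-1} b

# Energy norm two ways: by definition and as a weighted 2-norm
# of the rotated error Q^T err with weights lam^{1/2}.
energy_def = np.sqrt(err @ A @ err)
energy_rot = np.linalg.norm(np.sqrt(lam) * (Q.T @ err))
assert np.isclose(energy_def, energy_rot)

# Elongation of the energy-norm unit circle: sqrt(kappa_2(A)).
elongation = np.sqrt(lam.max() / lam.min())
assert np.isclose(elongation, np.sqrt(np.linalg.cond(A)))
\end{verbatim}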
For the 5-point difference operator on the unit square,
$$\frac{1}{h^2}\left(-u_{i+N-1} - u_{i-1} + 4u_i - u_{i+1} - u_{i-N+1}\right), \qquad h = \frac{1}{N},$$
the eigenvalues are $\left(2N \sin \frac{p\pi}{2N}\right)^2 + \left(2N \sin \frac{q\pi}{2N}\right)^2$, $p, q = 1, 2, \ldots, N-1$. Hence
$$\lambda_{\min} \approx 2\pi^2, \qquad \lambda_{\max} \approx 8N^2, \qquad \sqrt{\kappa_2(A)} \approx \frac{2}{\pi} N.$$

generic minimization method

    x_0 = initial guess;  {subscripts index iterates, not components}
    for i = 0, 1, 2, . . . do {
        choose a direction p_i;  {search direction}
        choose a distance α_i;
        x_{i+1} = x_i + α_i p_i;
    }

One could choose $\alpha_i$ to minimize $\phi(x_i + \alpha p_i)$. We have
$$\frac{d}{d\alpha} \phi(x_i + \alpha p_i) = \frac{d}{d\alpha}\left[\frac{\alpha^2}{2}\, p_i^T A p_i + \alpha\,(p_i^T A x_i - p_i^T b) + \frac{1}{2} x_i^T A x_i - x_i^T b\right] = \alpha\, p_i^T A p_i - p_i^T r_i$$
where $r_i = b - Ax_i$ is the residual. Hence
$$\alpha_i = \frac{p_i^T r_i}{p_i^T A p_i}.$$
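Below is a minimal NumPy sketch of the generic method with this exact line search, together with a Kronecker-product construction of the 5-point operator that checks the estimate $\sqrt{\kappa_2(A)} \approx 2N/\pi$; the function and variable names are my own, not from the notes:

\begin{verbatim}
import numpy as np

def minimize_energy(A, b, x0, choose_direction, num_steps):
    # Generic minimization of phi(x) = x^T A x / 2 - x^T b with
    # exact line search: alpha_i = p_i^T r_i / (p_i^T A p_i).
    x, r = x0.copy(), b - A @ x0
    for i in range(num_steps):
        p = choose_direction(i, r)
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap            # r_{i+1} = r_i - alpha_i A p_i
    return x, r

# 5-point Laplacian on the unit square with h = 1/N, built from the
# 1-D second-difference matrix T by Kronecker products.
N = 16
m = N - 1                                     # (N-1)^2 interior points
T = 2*np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
A = (np.kron(np.eye(m), T) + np.kron(T, np.eye(m))) * N**2
print(np.sqrt(np.linalg.cond(A)), 2*N/np.pi)  # sqrt(kappa_2) vs. 2N/pi
\end{verbatim}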
Note that $r_{i+1} = r_i - \alpha_i A p_i$, which is very cheap because we need to compute $Ap_i$ anyway to get $\alpha_i$. Also note
$$x_{i+1} = x_i + \underbrace{\frac{p_i p_i^T}{p_i^T A p_i}}\, r_i.$$
The matrix with the underbrace is a rank-one approximation to $A^{-1}$.

cyclic coordinate descent

Use search directions $e_1, e_2, \ldots, e_n, e_1, e_2, \ldots, e_n, \ldots$:
$$x_1 = x_0 + \frac{e_1^T r_0}{a_{11}}\, e_1, \qquad x_2 = x_1 + \frac{e_2^T r_1}{a_{22}}\, e_2, \qquad \ldots, \qquad x_{\text{new}} = x_{\text{old}} + \frac{e_i^T r_{\text{old}}}{a_{ii}}\, e_i.$$
For descent in the $i$th direction only the $i$th component of the residual is computed and only the $i$th component of $x$ is updated. This is part of a Gauss-Seidel sweep: the relaxation of the $i$th unknown. Hence

$n$ steps of cyclic coordinate descent $\equiv$ 1 sweep of Gauss-Seidel.
[Figure: path of cyclic coordinate descent on elliptical contours of $\phi$, axes labeled $(x)_1$ and $(x)_2$.]
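The equivalence can be checked numerically; the following sketch (an illustrative $3 \times 3$ symmetric positive definite system, not from the notes) runs $n$ coordinate-descent steps against one Gauss-Seidel sweep:

\begin{verbatim}
import numpy as np

A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
b = np.array([1.0, 2.0, 3.0])
n = len(b)

# n steps of cyclic coordinate descent: relax component i so as to
# zero the i-th component of the current residual.
x = np.zeros(n)
for i in range(n):
    r = b - A @ x
    x[i] += r[i] / A[i, i]

# One Gauss-Seidel sweep starting from the zero vector.
x_gs = np.zeros(n)
for i in range(n):
    x_gs[i] = (b[i] - A[i, :i] @ x_gs[:i] - A[i, i+1:] @ x_gs[i+1:]) / A[i, i]

assert np.allclose(x, x_gs)
\end{verbatim}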
You can see why Gauss-Seidel always converges if $A$ is symmetric positive definite. Also, you can see the reason for overrelaxation, i.e., for multiplying the correction by $\omega > 1$. The graph below shows that the objective function is reduced if we use $\omega \alpha_k$, $0 < \omega < 2$.

[Figure: $\phi(x_{\text{old}} + \alpha\, p_{\text{old}})$ as a quadratic in $\alpha$, with minimum at $\alpha_k$ and the same value at $0$ and $2\alpha_k$.]

Indeed, $\phi(x_{\text{old}} + \alpha\, p_{\text{old}}) - \phi(x_{\text{old}}) = \frac{1}{2}\, p_{\text{old}}^T A\, p_{\text{old}}\; \alpha (\alpha - 2\alpha_k)$, which is negative for $0 < \alpha < 2\alpha_k$.
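A quick numerical check of this claim on arbitrary symmetric positive definite data (a sketch, not from the notes):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)                  # arbitrary SPD matrix
b, x, p = (rng.standard_normal(5) for _ in range(3))

phi = lambda z: 0.5 * z @ A @ z - z @ b
alpha_k = (p @ (b - A @ x)) / (p @ A @ p)    # exact line-search minimizer
for omega in (0.5, 1.0, 1.5, 1.99):
    assert phi(x + omega * alpha_k * p) < phi(x)   # reduced for 0 < omega < 2
\end{verbatim}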
7.1.2 Steepest descent
What is the direction of steepest descent? Consider $\phi(x_{\text{old}} + \varepsilon p)$ where $\varepsilon > 0$ is infinitesimal and $\|p\|_2$ is fixed. We have
$$\phi(x_{\text{old}} + \varepsilon p) = \frac{1}{2}(x_{\text{old}} + \varepsilon p)^T A (x_{\text{old}} + \varepsilon p) - (x_{\text{old}} + \varepsilon p)^T b = \frac{1}{2} x_{\text{old}}^T A x_{\text{old}} - x_{\text{old}}^T b - \varepsilon \underbrace{p^T (b - A x_{\text{old}})} + O(\varepsilon^2).$$
We get the greatest decrease by maximizing $p^T(b - Ax_{\text{old}}) = p^T r$ for fixed $\|p\|_2$,

[Figure: the direction $p$ maximizing $p^T r$ is parallel to $r$.]
which is achieved by choosing $p = b - Ax_{\text{old}} = r_{\text{old}}$. I.e., the residual points in the direction of steepest descent:

    x_0 = initial guess;
    r_0 = b − Ax_0;
    for i = 0, 1, 2, . . . do {
        α_i = (r_i^T r_i)/(r_i^T A r_i);
        x_{i+1} = x_i + α_i r_i;
        r_{i+1} = r_i − α_i A r_i;
    }
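In NumPy the loop above reads as follows (a direct transcription of the pseudocode; the function name is mine):

\begin{verbatim}
import numpy as np

def steepest_descent(A, b, x0, num_iters):
    # Steepest descent for phi(x) = x^T A x / 2 - x^T b with A SPD;
    # one matrix-vector product A @ r per iteration.
    x, r = x0.copy(), b - A @ x0
    for _ in range(num_iters):
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)
        x += alpha * r                 # step along the residual
        r -= alpha * Ar                # cheap residual update
    return x
\end{verbatim}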
Convergence can be slow if $\kappa_2(A)$ is large. The trouble with these methods is that when we minimize in direction $p_i$, we lose the minimization in directions $p_1, p_2, \ldots, p_{i-1}$. This can be avoided if we use conjugate directions: $p_i^T A p_j = 0$ if $i \neq j$. The conjugate gradient method constructs conjugate directions¹ from the gradients $r_0, r_1, r_2, \ldots$. It is the subject of Section 8.4.
Review questions

1. Define the energy norm associated with a symmetric positive definite matrix $A$.

2. Solving $Ax = b$ where $A$ is symmetric positive definite is equivalent to minimizing what readily computable scalar function of $x$? Relate this to the energy norm of the error.

3. What is the spectral condition number $\kappa_2(A)$ of a symmetric positive definite matrix $A$?

4. Given a search direction $p_i$ and an approximation $x_i$ to the value $x^*$ that minimizes $\phi(x) = \frac{1}{2} x^T A x - b^T x$ with $A$ symmetric positive definite, what is the best choice for $x_{i+1}$?

5. How are search directions chosen in cyclic coordinate descent?

6. What known method do we get if we apply cyclic coordinate descent to the objective function $\phi(x) = \frac{1}{2} x^T A x - b^T x$ where $A$ is symmetric positive definite?

7. For the objective function $\phi(x) = \frac{1}{2} x^T A x - b^T x$ where $A$ is symmetric positive definite, what is the direction of steepest descent?

8. On what property of the matrix does the rate of convergence of steepest descent depend?

¹Often $(x, y)_A = x^T A y$ is called the energy inner product.
Exercises
1. Let
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}.$$
Plotted below are the level curves (contours) of the function $\phi(x) = \frac{1}{2} x^T A x - b^T x$ where $x = [\, x_1 \ x_2 \,]^T$ is variable. Draw the straight line
$$a_{11} x_1 + a_{12} x_2 = b_1$$
and the straight line
$$a_{21} x_1 + a_{22} x_2 = b_2.$$
Explain in words how you constructed these two lines. Recall that relaxing the $i$th variable so as to satisfy the $i$th equation is equivalent to minimizing $\phi(x)$ in the direction of the $i$th coordinate axis.
[Figure: level curves of $\phi$ in the $(x_1, x_2)$ plane with major and minor axes marked.]

2. Draw level curves for $|||x - A^{-1}b|||$ where
$$A = \begin{pmatrix} 4 & -2 \\ -2 & 4 \end{pmatrix} \qquad \text{and} \qquad b = \begin{pmatrix} 3 \\ 0 \end{pmatrix}.$$
Model your solution after page 156 of the notes. Label various key quantities and do a reasonably careful drawing.

3. Show that in the generic minimization method if $\alpha_i$ is chosen to minimize $\phi(x_i + \alpha p_i)$, then $r_{i+1}$ is orthogonal to $p_i$. Interpret this geometrically.
7.2 Residual Norm Minimization
Saad, Sections 5.3.3, 5.3.2, 5.2.1b.

7.2.1 Residual Norm Steepest Descent
7.2.2 Minimal Residual

For a general matrix only the residual norm $\|b - Ax\|$ can serve as an objective function for minimization. This attains its minimum at the same value of $x$ as does
$$\phi(x) = \frac{1}{2} x^T (A^T A) x - x^T (A^T b).$$
This is the objective function for minimizing the energy norm for the normal equations $(A^T A)x = A^T b$. A generic minimization method with an analytic line search takes the form

    x_0 = initial guess;
    r_0 = b − Ax_0;
    for i = 0, 1, 2, . . . do {
        choose a direction p_i;
        α_i = ((Ap_i)^T r_i)/((Ap_i)^T (Ap_i));
        x_{i+1} = x_i + α_i p_i;
        r_{i+1} = r_i − α_i Ap_i;
    }
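Here is the loop as a reusable NumPy sketch; packaging the choice of $p_i$ as a \verb|direction| callback is my own convention, not from the notes:

\begin{verbatim}
import numpy as np

def residual_line_search(A, b, x0, direction, num_iters):
    # Minimize ||b - A x||_2 along p_i = direction(A, r_i) with the
    # analytic step alpha_i = (A p_i)^T r_i / ((A p_i)^T (A p_i)).
    x, r = x0.copy(), b - A @ x0
    for _ in range(num_iters):
        p = direction(A, r)
        Ap = A @ p
        alpha = (Ap @ r) / (Ap @ Ap)
        x += alpha * p
        r -= alpha * Ap
    return x, r
\end{verbatim}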
7.2.1 Residual Norm Steepest Descent
At a given point $x_i$ the direction of steepest descent for the residual norm is $p_i = A^T r_i$, since the gradient of $\frac{1}{2}\|b - Ax\|_2^2$ is $-A^T(b - Ax)$. This leads to an algorithm requiring a multiplication by $A$ and by $A^T$ at each step.
7.2.2 Minimal Residual
It is often the case that the matrix $A$ is available only implicitly via a function call and it is inconvenient to request a matrix-vector product with $A^T$. This is avoided in the minimal residual method by using $p_i = r_i$ for the search direction. The disadvantage is that convergence for an arbitrary nonsingular $A$ is no longer guaranteed. Convergence, however, can be proved for the case where $A + A^T$ is symmetric positive definite.
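With the \verb|residual_line_search| sketch above, the two methods of this section differ only in the callback; the test matrix below is an arbitrary illustration chosen so that $A + A^T$ is positive definite:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n = 50
A = 4 * np.eye(n) + 0.1 * rng.standard_normal((n, n))  # A + A^T is s.p.d.
b = rng.standard_normal(n)
x0 = np.zeros(n)

# Residual norm steepest descent: p_i = A^T r_i (needs products with A^T).
x_sd, r_sd = residual_line_search(A, b, x0, lambda A, r: A.T @ r, 200)
# Minimal residual: p_i = r_i (avoids A^T).
x_mr, r_mr = residual_line_search(A, b, x0, lambda A, r: r, 200)
print(np.linalg.norm(r_sd), np.linalg.norm(r_mr))      # both should be small
\end{verbatim}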
Review questions

1. The 2-norm of the residual for $Ax = b$ is equal to the energy norm for what system of linear equations?

2. Given a search direction $p_i$ and an approximation $x_i$ to the value $x^*$ that minimizes $\|b - Ax\|_2$ with $A$ nonsingular, what is the best choice for $x_{i+1}$?

3. For the (square of the) residual for $Ax = b$, what is the direction of steepest descent?

4. How are search directions chosen in the minimal residual method?

5. What is the disadvantage of the residual norm steepest descent method compared to the minimal residual method?

6. For which nonsymmetric matrices $A$ can it be proved that the residual norm steepest descent method converges to the solution of $Ax = b$?

7. For which nonsymmetric matrices $A$ can it be proved that the minimal residual method converges to the solution of $Ax = b$?
7.3 General Petrov-Galerkin Methods
Saad, Section 5.1.

(i) The term “projection method” is unhelpful (a better term is “restriction”); it is not used in the 1997 book by Anne Greenbaum nor in the 2003 book by van der Vorst.

(ii) The statement bridging pages 130 and 131 does not seem correct except in a very contrived way.

Consider a method based on minimizing the (square of the) energy norm
$$(b - Ax)^T A^{-1} (b - Ax), \qquad x \in x_0 + K,$$
where $A$ is symmetric positive definite and $K$ is the span of one or several search directions. Let $\tilde{x}$ be the minimizer, let $r = b - A\tilde{x}$, and consider another feasible value $x' = \tilde{x} + w$, $w \in K$:
$$(b - Ax')^T A^{-1} (b - Ax') = (r - Aw)^T A^{-1} (r - Aw) = r^T A^{-1} r - 2 w^T r + w^T A w.$$
Hence, as discussed previously, $w = 0$ minimizes the energy norm of the error if and only if $w^T r = 0$ for all $w \in K$. Therefore the minimization problem can be expressed:
$$\text{find } x \in x_0 + K \text{ such that } b - Ax \perp K.$$
This alternative formulation is sometimes called the Galerkin conditions. If $K$ has a basis $U$, then $\tilde{x} = x_0 + Uy$ where $y$ is the solution of the system
$$U^T A U y = U^T r_0, \qquad r_0 = b - Ax_0.$$
Similarly, minimizing the residual norm can be expressed:
$$\text{find } x \in x_0 + K \text{ such that } b - Ax \perp AK.$$
The general Petrov-Galerkin method takes the form
$$\text{find } x \in x_0 + K \text{ such that } b - Ax \perp L$$
where $L$ is a subspace of the same dimension as $K$. If $K$ has a basis $U$ and $L$ has a basis $V$, this yields the system
$$V^T A U y = V^T r_0$$
to solve for $y$, where $r_0 = b - Ax_0$.
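A dense-matrix sketch of one Petrov-Galerkin correction (the function name and the dense solve are illustrative; practical methods construct the bases incrementally):

\begin{verbatim}
import numpy as np

def petrov_galerkin(A, b, x0, U, V):
    # Find x in x0 + range(U) with b - A x orthogonal to range(V):
    # solve the projected system V^T A U y = V^T r_0.
    r0 = b - A @ x0
    y = np.linalg.solve(V.T @ A @ U, V.T @ r0)
    return x0 + U @ y

# Galerkin case (L = K): take V = U. Residual minimization (L = A K):
# take V = A @ U, which gives the normal equations for min ||r0 - A U y||.
\end{verbatim}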
Review questions

1. Let $A$ be symmetric positive definite, and let $K$ be a linear subspace. Express the problem of minimizing the (square of the) energy norm
$$(b - Ax)^T A^{-1} (b - Ax), \qquad x \in x_0 + K,$$
as a Petrov-Galerkin (or Galerkin) method.

2. What is the general form of a Galerkin method for solving $Ax = b$?

3. Consider the Galerkin conditions
$$\text{find } x \in x_0 + K \text{ such that } b - Ax \perp K$$
for seeking an approximate solution to $Ax = b$. Express these as a linear system of equations given that $K = \mathcal{R}(U)$ where $U$ is of full rank.

4. Let $A$ be symmetric positive definite, and let $K$ be a linear subspace. Express the problem of minimizing the (square of the) residual norm
$$(b - Ax)^T (b - Ax), \qquad x \in x_0 + K,$$
as a Petrov-Galerkin (or Galerkin) method.

5. What is the general form of a Petrov-Galerkin method for solving $Ax = b$?

6. Consider the Petrov-Galerkin conditions
$$\text{find } x \in x_0 + K \text{ such that } b - Ax \perp L$$
for seeking an approximate solution to $Ax = b$. Express these as a linear system of equations given that $K = \mathcal{R}(U)$ and $L = \mathcal{R}(V)$ where $U$ and $V$ are of full and equal rank.