Numerical methods for elliptic partial differential equations

Arnold Reusken

Preface

This is a book on the numerical approximation of partial differential equations. Below we give an overview of the structure of this book:


Elliptic boundary value problems (chapter 1):
• Poisson equation: scalar, symmetric, elliptic.
• Convection-diffusion equation: scalar, nonsymmetric, singularly perturbed.
• Stokes equation: system, symmetric, indefinite.

Weak formulation (chapter 2)

Finite element method −→
• Basic principles (chapter 3); application to Poisson equation.
• Streamline-diffusion FEM (chapter 4); application to convection-diffusion equation.
• FEM for Stokes equation (chapter 5).

Iterative methods −→
• Basics on linear iterative methods (chapter 6).
• Preconditioned CG method (chapter 7); application to Poisson equation.
• Krylov subspace methods (chapter 8); application to convection-diffusion equation.
• Multigrid methods (chapter 9).
• Iterative methods for saddle-point problems (chapter 10); application to Stokes equation.

Adaptivity −→
• A posteriori error estimation (chapter ).
• Grid refinement techniques (chapter ).

Contents

1 Introduction to elliptic boundary value problems
  1.1 Preliminaries on function spaces and domains
  1.2 Scalar elliptic boundary value problems
    1.2.1 Formulation of the problem
    1.2.2 Examples
    1.2.3 Existence, uniqueness, regularity
  1.3 The Stokes equations

2 Weak formulation
  2.1 Introduction
  2.2 Sobolev spaces
    2.2.1 The spaces W^m(Ω) based on weak derivatives
    2.2.2 The spaces H^m(Ω) based on completion
    2.2.3 Properties of Sobolev spaces
  2.3 General results on variational formulations
  2.4 Minimization of functionals and saddle-point problems
  2.5 Variational formulation of scalar elliptic problems
    2.5.1 Introduction
    2.5.2 Elliptic BVP with homogeneous Dirichlet boundary conditions
    2.5.3 Other boundary conditions
    2.5.4 Regularity results
    2.5.5 Riesz-Schauder theory
  2.6 Weak formulation of the Stokes problem
    2.6.1 Proof of the inf-sup property
    2.6.2 Regularity of the Stokes problem
    2.6.3 Other boundary conditions

3 Galerkin discretization and finite element method
  3.1 Galerkin discretization
  3.2 Examples of finite element spaces
    3.2.1 Simplicial finite elements
    3.2.2 Rectangular finite elements
  3.3 Approximation properties of finite element spaces
  3.4 Finite element discretization of scalar elliptic problems
    3.4.1 Error bounds in the norm ‖·‖_1
    3.4.2 Error bounds in the norm ‖·‖_{L^2}
  3.5 Stiffness matrix
    3.5.1 Mass matrix
  3.6 Isoparametric finite elements
  3.7 Nonconforming finite elements

4 Finite element discretization of a convection-diffusion problem
  4.1 Introduction
  4.2 A variant of the Cea-lemma
  4.3 A one-dimensional hyperbolic problem and its finite element discretization
  4.4 The convection-diffusion problem reconsidered
    4.4.1 Well-posedness of the continuous problem
    4.4.2 Finite element discretization
    4.4.3 Stiffness matrix for the convection-diffusion problem

5 Finite element discretization of the Stokes problem
  5.1 Galerkin discretization of saddle-point problems
  5.2 Finite element discretization of the Stokes problem
    5.2.1 Error bounds
    5.2.2 Other finite element spaces

6 Linear iterative methods
  6.1 Introduction
  6.2 Basic linear iterative methods
  6.3 Convergence analysis in the symmetric positive definite case
  6.4 Rate of convergence of the SOR method
  6.5 Convergence analysis for regular matrix splittings
    6.5.1 Perron theory for positive matrices
    6.5.2 Regular matrix splittings
  6.6 Application to scalar elliptic problems

7 Preconditioned Conjugate Gradient method
  7.1 Introduction
  7.2 Conjugate Gradient method
  7.3 Introduction to preconditioning
  7.4 Preconditioning based on a linear iterative method
  7.5 Preconditioning based on incomplete LU factorizations
    7.5.1 LU factorization
    7.5.2 Incomplete LU factorization
    7.5.3 Modified incomplete Cholesky method
  7.6 Problem based preconditioning
  7.7 Preconditioned Conjugate Gradient Method

8 Krylov Subspace Methods
  8.1 Introduction
  8.2 The Conjugate Gradient method reconsidered
  8.3 MINRES method
  8.4 GMRES type of methods
  8.5 Bi-CG type of methods

9 Multigrid methods
  9.1 Introduction
  9.2 Multigrid for a one-dimensional model problem
  9.3 Multigrid for scalar elliptic problems
  9.4 Convergence analysis
    9.4.1 Introduction
    9.4.2 Approximation property
    9.4.3 Smoothing property
    9.4.4 Multigrid contraction number
    9.4.5 Convergence analysis for symmetric positive definite problems
  9.5 Multigrid for convection-dominated problems
  9.6 Nested Iteration
  9.7 Numerical experiments
  9.8 Algebraic multigrid methods
  9.9 Nonlinear multigrid

10 Iterative methods for saddle-point problems
  10.1 Block diagonal preconditioning
  10.2 Application to the Stokes problem

A Functional Analysis
  A.1 Different types of spaces
  A.2 Theorems from functional analysis

B Linear Algebra
  B.1 Notions from linear algebra
  B.2 Theorems from linear algebra

Chapter 1

Introduction to elliptic boundary value problems

In this chapter we introduce the classical formulation of scalar elliptic problems and of the Stokes equations. Some results known from the literature on existence and uniqueness of a classical solution will be presented. Furthermore, we briefly discuss the issue of regularity.

1.1 Preliminaries on function spaces and domains

The boundary value problems that we consider in this book will be posed on domains Ω ⊂ R^n, n = 1, 2, 3. In the remainder we always assume that Ω is open, bounded and connected.

Moreover, the boundary of Ω should satisfy certain smoothness conditions that will be introduced in this section. For this we need so-called Hölder spaces. By C^k(Ω), k ∈ N, we denote the space of functions f : Ω → R for which all (partial) derivatives

D^ν f := ∂^{|ν|} f / (∂x_1^{ν_1} · · · ∂x_n^{ν_n}),  ν = (ν_1, . . . , ν_n),  |ν| = ν_1 + . . . + ν_n,

of order |ν| ≤ k are continuous functions on Ω. The space C^k(Ω̄), k ∈ N, consists of all functions in C^k(Ω) ∩ C(Ω̄) for which all derivatives of order ≤ k have continuous extensions to Ω̄. Since Ω̄ is compact, the functional

f → max_{|ν|≤k} max_{x∈Ω̄} |D^ν f(x)| = max_{|ν|≤k} ‖D^ν f‖_{∞,Ω̄} =: ‖f‖_{C^k(Ω̄)}

defines a norm on C^k(Ω̄). The space (C^k(Ω̄), ‖·‖_{C^k(Ω̄)}) is a Banach space (cf. Appendix A.1). Note that f → max_{|ν|≤k} ‖D^ν f‖_{∞,Ω} does not define a norm on C^k(Ω).

For f : Ω → R we define its support by supp(f) := { x ∈ Ω | f(x) ≠ 0 }. The space C_0^k(Ω), k ∈ N, consists of all functions in C^k(Ω) which have a compact support in Ω, i.e., supp(f) ⊂ Ω. The functional f → max_{|ν|≤k} ‖D^ν f‖_{∞,Ω} defines a norm on C_0^k(Ω), but (C_0^k(Ω), ‖·‖_{C^k(Ω)}) is not a Banach space.

For a compact set D ⊂ R^n and λ ∈ (0,1] we introduce the quantity

[f]_{λ,D} := sup{ |f(x) − f(y)| / ‖x − y‖^λ | x, y ∈ D, x ≠ y }  for f : D → R.

We write f ∈ C^{0,λ}(Ω̄) and say that f is Hölder continuous in Ω̄ with exponent λ if [f]_{λ,Ω̄} < ∞. A norm on the space C^{0,λ}(Ω̄) is defined by f → ‖f‖_{C(Ω̄)} + [f]_{λ,Ω̄}. We write f ∈ C^{0,λ}(Ω) and say that f is Hölder continuous in Ω with exponent λ if for arbitrary compact subsets D ⊂ Ω the property [f]_{λ,D} < ∞ holds. An important special case is λ = 1: the space C^{0,1}(Ω̄) [or C^{0,1}(Ω)] consists of all Lipschitz continuous functions on Ω̄ [Ω].

The space C^{k,λ}(Ω̄) [C^{k,λ}(Ω)], k ∈ N, λ ∈ (0,1], consists of those functions in C^k(Ω̄) [C^k(Ω)] for which all derivatives D^ν f of order |ν| = k are elements of C^{0,λ}(Ω̄) [C^{0,λ}(Ω)]. On C^{k,λ}(Ω̄) we define a norm by

f → ‖f‖_{C^k(Ω̄)} + Σ_{|α|=k} [D^α f]_{λ,Ω̄}.

Note that

C^{k,λ}(Ω̄) ⊂ C^k(Ω̄) for all k ∈ N, λ ∈ (0,1],
C^{k,λ_2}(Ω̄) ⊂ C^{k,λ_1}(Ω̄) for all k ∈ N, 0 < λ_1 ≤ λ_2 ≤ 1,

and similarly with Ω̄ replaced by Ω. We use the notation C^{k,0}(Ω̄) := C^k(Ω̄) [C^{k,0}(Ω) := C^k(Ω)].

Remark 1.1.1 The inclusion C^{k+1}(Ω̄) ⊂ C^{k,λ}(Ω̄), λ ∈ (0,1], is in general not true. Consider n = 2 and Ω = { (x,y) | −1 < x < 1, −1 < y < √|x| }. The function

f(x,y) = (sign x) y^{3/2} if y > 0,  f(x,y) = 0 otherwise,

belongs to C^1(Ω̄), but f ∉ C^{0,λ}(Ω̄) if λ ∈ (3/4, 1]. □

Based on these Hölder spaces we can now characterize smoothness of the boundary ∂Ω.

Definition 1.1.2 For k ∈ N, λ ∈ [0,1] the property ∂Ω ∈ C^{k,λ} (the boundary is of class C^{k,λ}) holds if at each point x_0 ∈ ∂Ω there is a ball B = { x ∈ R^n | ‖x − x_0‖ < δ }, δ > 0, and a bijection ψ : B → E ⊂ R^n such that

ψ(B ∩ Ω) ⊂ R^n_+ := { x ∈ R^n | x_n > 0 },   (1.1a)
ψ(B ∩ ∂Ω) ⊂ ∂R^n_+,   (1.1b)
ψ ∈ C^{k,λ}(B), ψ^{−1} ∈ C^{k,λ}(E).   (1.1c) □

For the case n = 2 this is illustrated in Figure 1.

[Figure 1]

A very important special case is ∂Ω ∈ C^{0,1}. In this case all transformations ψ (and their inverses) must be Lipschitz continuous functions and we then call Ω a Lipschitz domain. This holds, for example, if ∂Ω consists of different patches which are graphs of smooth functions (e.g., polynomials) and at the interface between different patches interior angles are bounded away from zero. In Figure 2 we give an illustration for the two dimensional case.

[Figure 2]

A domain Ω is convex if for arbitrary x, y ∈ Ω the inclusion { tx + (1−t)y | t ∈ [0,1] } ⊂ Ω holds. In almost all theoretical analyses presented in this book it suffices to have ∂Ω ∈ C^{0,1}. Moreover, the domains used in practice usually satisfy this condition. Therefore, in the remainder of this book we always consider such domains, unless stated otherwise explicitly.

Assumption 1.1.3 In this book we assume that the domain Ω ⊂ R^n is such that Ω is open, connected and bounded, and ∂Ω is of class C^{0,1}.

One can show that if this assumption holds then C^{k+1}(Ω̄) ⊂ C^{k,1}(Ω̄) (cf. remark 1.1.1).

1.2 Scalar elliptic boundary value problems

1.2.1 Formulation of the problem

On C^2(Ω) we define a linear second order differential operator L as follows:

Lu = Σ_{i,j=1}^{n} a_{ij} ∂²u/(∂x_i ∂x_j) + Σ_{i=1}^{n} b_i ∂u/∂x_i + cu,   (1.2)

with a_{ij}, b_i and c given functions on Ω. Because ∂²u/(∂x_i ∂x_j) = ∂²u/(∂x_j ∂x_i) we may assume, without loss of generality, that a_{ij}(x) = a_{ji}(x) holds for all x ∈ Ω. Corresponding to the differential operator L we can define a partial differential equation

Lu = f,   (1.3)

with f a given function on Ω. In (1.2) the part containing the second derivatives only, i.e.

Σ_{i,j=1}^{n} a_{ij} ∂²u/(∂x_i ∂x_j),

is called the principal part of L. Related to this principal part we have the n × n symmetric matrix

A(x) = (a_{ij}(x))_{1≤i,j≤n}.   (1.4)

Note that due to the symmetry of A the eigenvalues are real. These eigenvalues, which may depend on x ∈ Ω, are denoted by λ_1(x) ≤ λ_2(x) ≤ . . . ≤ λ_n(x). Hyperbolicity, parabolicity, or ellipticity of the differential operator L is determined by these eigenvalues. The operator L, or the partial differential equation in (1.3), is called elliptic at the point x ∈ Ω if all eigenvalues of A(x) have the same sign. The operator L and the corresponding differential equation are called elliptic if L is elliptic at every x ∈ Ω. Note that this property is determined by the principal part of L only.

Remark 1.2.1 If the operator L is elliptic, then we may assume that all eigenvalues of the matrix A(x) in (1.4) are positive:

0 < λ_1(x) ≤ λ_2(x) ≤ . . . ≤ λ_n(x)  for all x ∈ Ω.

The operator L (and the corresponding boundary value problem) is called uniformly elliptic if inf{ λ_1(x) | x ∈ Ω } > 0 holds. Note that if the operator L is elliptic with coefficients a_{ij} ∈ C(Ω̄) then the function x → A(x) is continuous on the compact set Ω̄ and hence L is uniformly elliptic. Using

Σ_{i,j=1}^{n} a_{ij}(x) ξ_i ξ_j = ξ^T A(x) ξ ≥ λ_1(x) ξ^T ξ,

we obtain that the operator L is uniformly elliptic if and only if there exists a constant α_0 > 0 such that for all ξ ∈ R^n, we have

Σ_{i,j=1}^{n} a_{ij}(x) ξ_i ξ_j ≥ α_0 ξ^T ξ  for all x ∈ Ω. □
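Uniform ellipticity can also be examined numerically by sampling the smallest eigenvalue λ_1(x) over the domain. The following small Python sketch does this for an invented 2 × 2 coefficient matrix (the function a_matrix below is an arbitrary example of ours; a finite sample of course only gives an indication, not a proof):

import numpy as np

def a_matrix(x, y):
    # Hypothetical smooth, symmetric coefficient matrix on the unit square.
    return np.array([[2.0 + np.sin(np.pi * x), 0.5],
                     [0.5, 2.0 + np.cos(np.pi * y)]])

# Sample the smallest eigenvalue lambda_1(x) on a grid over Omega = (0,1)^2.
pts = np.linspace(0.0, 1.0, 101)
lam_min = min(np.linalg.eigvalsh(a_matrix(x, y))[0]
              for x in pts for y in pts)

# L is (numerically) uniformly elliptic if this lower bound is positive.
print(f"min of lambda_1 over sample points: {lam_min:.4f}")  # a candidate alpha_0 > 0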

We obtain a boundary value problem when we combine the partial differential equation in (1.3) with certain boundary conditions for the unknown function u. For ease we restrict ourselves to problems with Dirichlet boundary conditions, i.e., we impose: u = g on ∂Ω, with g a given function on ∂Ω. Other types of boundary conditions are the so-called Neumann boundary condition, i.e., a condition on the normal derivative ∂u/∂n on ∂Ω, and the mixed boundary condition which is a linear combination of a Dirichlet and a Neumann boundary condition. Summarizing, we consider a linear second order Dirichlet boundary value problem in Ω ⊂ R^n:

Lu = Σ_{i,j=1}^{n} a_{ij} ∂²u/(∂x_i ∂x_j) + Σ_{i=1}^{n} b_i ∂u/∂x_i + cu = f  in Ω,   (1.5a)
u = g  on ∂Ω,   (1.5b)

where (a_{ij}(x))_{1≤i,j≤n} is such that the problem is elliptic. A solution u of (1.5) is called a classical solution if u ∈ C^2(Ω) ∩ C(Ω̄). The functions (a_{ij}(x))_{1≤i,j≤n}, (b_i(x))_{1≤i≤n} and c(x) are called the coefficients of the operator L.

1.2.2 Examples

We assume n = 2, i.e. a problem with two independent variables, say x_1 = x and x_2 = y. Then the differential operator is given by

Lu = a_{11} ∂²u/∂x² + 2a_{12} ∂²u/∂x∂y + a_{22} ∂²u/∂y² + b_1 ∂u/∂x + b_2 ∂u/∂y + cu.

In this case we have λ_1(x)λ_2(x) = det(A(x)) and the ellipticity condition can be formulated as

a_{11}(x,y) a_{22}(x,y) − a_{12}(x,y)² > 0,  (x,y) ∈ Ω.

Examples of elliptic equations are the Laplace equation

∆u := ∂²u/∂x² + ∂²u/∂y² = 0  in Ω,

the Poisson equation (cf. Poisson [72])

−∆u = f  in Ω,   (1.6)

the reaction-diffusion equation

−∆u + cu = f  in Ω,   (1.7)

and the convection-diffusion equation

−ε∆u + b_1 ∂u/∂x + b_2 ∂u/∂y = f  in Ω,  ε > 0.   (1.8)

If we add Dirichlet boundary conditions to the Poisson equation in (1.6), we obtain the classical Dirichlet problem for Poisson's equation:

−∆u = f  in Ω,
u = g  on ∂Ω.   (1.9)

Remark 1.2.2 We briefly comment on the convection-diffusion equation in (1.8). If |ε/b_1| ≪ 1 or |ε/b_2| ≪ 1 (in a part of the domain) then the diffusion term −ε∆u can be seen as a perturbation of the convection term b_1 ∂u/∂x + b_2 ∂u/∂y (in a part of the domain). The convection-diffusion equation is of elliptic type. However, for ε = 0 we obtain the so-called reduced equation, which is of hyperbolic type. In view of this the convection-diffusion equation with |ε/b_1| ≪ 1 or |ε/b_2| ≪ 1 is called a singularly perturbed equation. The fact that the elliptic equation (1.8) is then in some sense close to a hyperbolic equation results in some special phenomena that do not occur in diffusion dominated problems (as (1.9)). For example, in a convection dominated problem (e.g., an equation as (1.8) with ε ≪ 1 and b_i = 1, i = 1, 2) the solution u shows a behaviour in which most of the information is transported in certain directions ("streamlines"). So we observe a behaviour as in the hyperbolic problem (ε = 0), in which the solution satisfies an ordinary differential equation along each characteristic. Another phenomenon is the occurrence of boundary layers. If we combine the equation in (1.8) with Dirichlet boundary conditions on ∂Ω then in general these boundary conditions are not appropriate for the hyperbolic problem (ε = 0). As a result, if |ε/b_1| ≪ 1 or |ε/b_2| ≪ 1 we often observe that on a part of the boundary (corresponding to the outflow boundary in the hyperbolic problem) there is a small neighbourhood in which the solution u varies very rapidly. Such a neighbourhood is called a boundary layer.

For a detailed analysis of singularly perturbed convection-diffusion equations we refer to Roos et al [76]. An illustration of the two phenomena described above is given in Section ??. Finally we note that for the numerical solution of a problem with a singularly perturbed equation special tools are needed, both with respect to the discretization of the problem and the iterative solver for the discrete problem. □
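The boundary layer phenomenon can be made concrete with a supplementary one-dimensional model problem (a standard illustration, not one of the equations above):

−εu″ + u′ = 0  in (0,1),  u(0) = 0,  u(1) = 1,

with exact solution

u(x) = (e^{x/ε} − 1) / (e^{1/ε} − 1),

which one verifies by substitution, since εu″ = u′ = (1/ε) e^{x/ε} / (e^{1/ε} − 1). For ε ≪ 1 the solution is close to 0 on most of (0,1) and rises to 1 in an O(ε) neighbourhood of the outflow boundary x = 1: a boundary layer. The reduced problem u′ = 0 with inflow condition u(0) = 0 has the solution u ≡ 0, which cannot satisfy the outflow condition u(1) = 1.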

1.2.3 Existence, uniqueness, regularity

For the elliptic boundary value problems introduced above, a first important topic that should be addressed concerns the existence and uniqueness of a solution. If a unique solution exists then another issue is the smoothness of the solution and how this smoothness depends on the data (source term, boundary condition, coefficients). Such smoothness results are called regularity properties of the problem. The topic of existence, uniqueness and regularity has been, and still is, the subject of many mathematical studies. We will not treat these topics here. We only give a few references to standard books in this field: Gilbarg and Trudinger [39], Miranda [64], Lions and Magenes [60], Hackbusch [45], [47]. We note that for the classical formulation of an elliptic boundary value problem it is often rather hard to establish satisfactory results on existence, uniqueness or regularity. In Section 2.5 we will discuss the variational (or weak) formulation of elliptic boundary value problems. In that setting, additional tools for the analysis of existence, uniqueness and regularity are available and (much) more results are known.

Example 1.2.3 The reaction-diffusion equation can be used to show that a solution of an elliptic Dirichlet boundary value problem as in (1.5) need not be unique. Consider the problem in (1.7) on Ω = (0,1)², with f = 0 and c(x,y) = −(µπ)² − (νπ)², µ, ν ∈ N, combined with zero Dirichlet boundary conditions. Then both u(x,y) ≡ 0 and u(x,y) = sin(µπx) sin(νπy) are solutions of this boundary value problem. □

Example 1.2.4 Even for very simple elliptic problems a classical solution may not exist. Consider

−a ∂²u/∂x² = 1  in (0,1),
u(0) = u(1) = 0,

with a(x) = 1 for 0 ≤ x ≤ 0.5 and a(x) = 2 for 0.5 < x ≤ 1. Clearly the second derivative of a solution u of this problem cannot be continuous at x = 0.5. □

We present a typical result from the literature on existence and uniqueness of a classical solution. For this we need a certain condition on ∂Ω. The domain Ω is said to satisfy the exterior sphere condition if for every x_0 ∈ ∂Ω there exists a ball B such that B̄ ∩ Ω̄ = {x_0}. Note that this condition is fulfilled, for example, if Ω is convex or if ∂Ω ∈ C^{2,0}.


Theorem 1.2.5 ([39], Theorem 6.13) Consider the boundary value problem (1.5) and assume that

(i) L is uniformly elliptic,
(ii) Ω satisfies the exterior sphere condition,
(iii) the coefficients of L and the function f belong to C^{0,λ}(Ω), λ ∈ (0,1),
(iv) c ≤ 0 holds,
(v) the boundary data are continuous: g ∈ C(∂Ω).

Then the problem (1.5) has a unique classical solution u ∈ C^{2,λ}(Ω) ∩ C(Ω̄).

With respect to regularity of the solution it is important to distinguish between interior smoothness (i.e., in Ω) and global smoothness (i.e., in Ω̄). A typical result on interior smoothness is given in the next theorem:

Theorem 1.2.6 ([39], Theorem 6.17) Let u ∈ C^2(Ω) be a solution of (1.5). Suppose that L is elliptic and that there are k ∈ N, λ ∈ (0,1) such that the coefficients of L and the function f are in C^{k,λ}(Ω). Then u ∈ C^{k+2,λ}(Ω) holds. If the coefficients and f are in C^∞(Ω), then u ∈ C^∞(Ω).

This result shows that the interior regularity depends on the smoothness of the coefficients and of the right-hand side f, but does not depend on the smoothness of the boundary (data). A result on global regularity is given in:

Theorem 1.2.7 ([39], Theorem 6.19) Let u ∈ C^2(Ω) ∩ C(Ω̄) be a classical solution of (1.5). Suppose that L is uniformly elliptic and that there are k ∈ N, λ ∈ (0,1) such that the coefficients of L and the function f are in C^{k,λ}(Ω̄), ∂Ω ∈ C^{k+2,λ}. Assume that g can be extended on Ω̄ such that g ∈ C^{k+2,λ}(Ω̄). Then u ∈ C^{k+2,λ}(Ω̄) holds.

For a global regularity result as in the previous theorem to hold, the smoothness of the boundary (data) is important. In practice one often has a domain with a boundary consisting of the union of straight lines (in 2D) or planes (3D). Then the previous theorem does not apply and the global regularity of the solution can be rather low, as is shown in the next example.

Example 1.2.8 [from [47], p. 13] We consider (1.9) with Ω = (0,1)×(0,1), f ≡ 0, g(x,y) = x² (so g ∈ C(∂Ω), g ∈ C^∞(Ω̄)). Then Theorem 1.2.5 guarantees the existence of a unique classical solution u ∈ C^2(Ω) ∩ C(Ω̄). However, u is not an element of C^2(Ω̄).
Proof. Assume that u ∈ C^2(Ω̄) holds. From this and −∆u = 0 in Ω it follows that ∆u = 0 in Ω̄ holds. From u = g = x² on ∂Ω we get u_xx(x,0) = 2 for x ∈ [0,1] and u_yy(0,y) = 0 for y ∈ [0,1]. It follows that ∆u(0,0) = 2 must hold, which yields a contradiction. □

1.3 The Stokes equations

The n-dimensional Navier-Stokes equations model the motion of an incompressible viscous medium. They can be derived using basic principles from continuum mechanics (cf. [43]). The unknowns are the velocity field u(x) = (u_1(x), . . . , u_n(x)) and the pressure p(x), x ∈ Ω. If one considers a steady-state situation then these Navier-Stokes equations, in dimensionless quantities, are as follows:

−ν∆u_i + Σ_{j=1}^{n} u_j ∂u_i/∂x_j + ∂p/∂x_i = f_i  in Ω, 1 ≤ i ≤ n,   (1.10a)
Σ_{j=1}^{n} ∂u_j/∂x_j = 0  in Ω,   (1.10b)

with ν > 0 a parameter that is related to the viscosity of the medium. Using the notation ∆u := (∆u_1, . . . , ∆u_n)^T, div u := Σ_{j=1}^{n} ∂u_j/∂x_j, f = (f_1, . . . , f_n)^T, we obtain the more compact formulation

−ν∆u + (u · ∇)u + ∇p = f  in Ω,   (1.11a)
div u = 0  in Ω.   (1.11b)

Note that the pressure p is determined only up to a constant by these Navier-Stokes equations. The problem has to be completed with suitable boundary conditions. One simple possibility is to take homogeneous Dirichlet boundary conditions for u, i.e., u = 0 on ∂Ω. If in the Navier-Stokes equations the nonlinear convection term (u · ∇)u is neglected, which can be justified in situations where the viscosity parameter ν is large, one obtains the Stokes equations. From a simple rescaling argument (replace u by (1/ν)u) it follows that without loss of generality in the Stokes equations we can assume ν = 1. Summarizing, we obtain the following Stokes problem:

−∆u + ∇p = f  in Ω,   (1.12a)
div u = 0  in Ω,   (1.12b)
u = 0  on ∂Ω.   (1.12c)
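The rescaling argument mentioned above can be made explicit in one line (a supplementary computation; ũ is ad hoc notation used only here). Dropping the convection term in (1.11) and substituting u = (1/ν)ũ, i.e. ũ := νu, gives

−ν∆((1/ν)ũ) + ∇p = −∆ũ + ∇p = f,  div ũ = ν div u = 0,

so (ũ, p) solves the Stokes problem (1.12) with ν = 1.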

This is a stationary boundary value problem, consisting of n + 1 coupled partial differential equations for the unknowns (u_1, . . . , u_n) and p. In [2] the notion of ellipticity is generalized to systems of partial differential equations. It can be shown (cf. [2, 47]) that the Stokes equations indeed form an elliptic system. We do not discuss existence and uniqueness of a classical solution of the Stokes problem. In chapter 2 we discuss the variational formulation of the Stokes problem. For this formulation the issue of existence, uniqueness and regularity of a solution is treated in section 2.6.


Chapter 2

Weak formulation

2.1 Introduction

For solving a boundary value problem it can be (very) advantageous to consider a generalization of the classical problem formulation, in which larger function spaces are used and a "weaker" solution (explained below) is allowed. This results in the variational formulation (also called weak formulation) of a boundary value problem. In this section we consider an introductory example which illustrates that even for a very simple boundary value problem the choice of an "appropriate solution space" is an important issue. This example also serves as a motivation for the introduction of the Sobolev spaces in section 2.2. Consider the following elliptic two-point boundary value problem:

−(au′)′ = 1  in (0,1),   (2.1a)
u(0) = u(1) = 0.   (2.1b)

We assume that the coefficient a is an element of C^1([0,1]) and that a(x) > 0 holds for all x ∈ [0,1]. This problem then has a unique solution in the space V_1 = { v ∈ C^2([0,1]) | v(0) = v(1) = 0 }. This solution is given by

u(x) = ∫_0^x (−t + c)/a(t) dt,  c := ( ∫_0^1 t/a(t) dt ) ( ∫_0^1 1/a(t) dt )^{−1},

which may be checked by substitution in (2.1). If one multiplies the equation (2.1a) by an arbitrary function v ∈ V_1, integrates both the left-hand and right-hand side and then applies partial integration, one can show that u ∈ V_1 is the solution of (2.1) if and only if

∫_0^1 a(x)u′(x)v′(x) dx = ∫_0^1 v(x) dx  for all v ∈ V_1.   (2.2)
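As a supplementary numerical check of the solution formula and of (2.2), one can evaluate the integrals by quadrature; the coefficient a(x) = 1 + x below is an arbitrary admissible choice of ours:

from scipy.integrate import quad

a = lambda t: 1.0 + t                     # arbitrary smooth coefficient, a > 0
c = quad(lambda t: t / a(t), 0, 1)[0] / quad(lambda t: 1.0 / a(t), 0, 1)[0]
u = lambda x: quad(lambda t: (-t + c) / a(t), 0, x)[0]   # solution formula
du = lambda x: (-x + c) / a(x)                           # u'(x)

print(u(0.0), u(1.0))        # boundary conditions (2.1b): both ~ 0

# Variational identity (2.2) for one test function v in V1:
v, dv = (lambda x: x * (1 - x)), (lambda x: 1 - 2 * x)
lhs = quad(lambda x: a(x) * du(x) * dv(x), 0, 1)[0]
rhs = quad(v, 0, 1)[0]
print(lhs - rhs)             # ~ 0 (up to quadrature error)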

This variational problem can be reformulated as a minimization problem. For this we introduce the notion of a bilinear form.


Definition 2.1.1 Let X be a vector space. A mapping k : X × X → R is called a bilinear form if for arbitrary α, β ∈ R and u, v, w ∈ X the following holds:

k(αu + βv, w) = α k(u,w) + β k(v,w),
k(u, αv + βw) = α k(u,v) + β k(u,w).

The bilinear form is symmetric if k(u,v) = k(v,u) holds for all u, v ∈ X. □

Lemma 2.1.2 Let X be a vector space and k : X × X → R a symmetric bilinear form which is positive, i.e., k(v,v) > 0 for all v ∈ X, v ≠ 0. Let f : X → R be a linear functional. Define J : X → R by

J(v) = (1/2) k(v,v) − f(v).

Then J(u) < J(v) for all v ∈ X, v ≠ u, holds if and only if

k(u,v) = f(v)  for all v ∈ X.   (2.3)

Moreover, there exists at most one minimizer u of J(·).

Proof. Take u, w ∈ X, t ∈ R. Note that

J(u + tw) − J(u) = t ( k(u,w) − f(w) ) + (1/2) t² k(w,w) =: g(t; u, w).

"⇒". If J(u) < J(v) for all v ∈ X, v ≠ u, then the function t → g(t; u, w) must be strictly positive for all w ∈ X \ {0} and t ∈ R \ {0}. It follows that k(u,w) − f(w) = 0 for all w ∈ X. "⇐". From (2.3) it follows that J(u + tw) − J(u) = (1/2) t² k(w,w) for all w ∈ X, t ∈ R. For v ≠ u, w = v − u, t = 1 this yields J(v) − J(u) = (1/2) k(v−u, v−u) > 0. We finally prove the uniqueness of the minimizer. Assume that k(u_i, v) = f(v) for all v ∈ X and for i = 1, 2. It follows that k(u_1 − u_2, v) = 0 for all v ∈ X. For the choice v = u_1 − u_2 we get k(u_1 − u_2, u_1 − u_2) = 0. From the property that the bilinear form k(·,·) is positive we conclude u_1 = u_2. □

Note that in this lemma we do not claim existence of a solution. For the minimizer u (if it exists) the relation

J(v) − J(u) = (1/2) k(v−u, v−u)  for all v ∈ X   (2.4)

holds. We now return to the example and take

X = V_1,  k(u,v) = ∫_0^1 a(x)u′(x)v′(x) dx,  f(v) = ∫_0^1 v(x) dx.

Note that all assumptions of lemma 2.1.2 are fulfilled. It then follows that the unique solution of (2.1) (or, equivalently, of (2.2)) is also the unique minimizer of the functional

J(v) = ∫_0^1 [ (1/2) a(x)v′(x)² − v(x) ] dx.   (2.5)
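The equivalence in lemma 2.1.2 is also the basis of the Galerkin (finite element) method of chapter 3: minimizing J over a finite dimensional subspace of V_1 amounts to solving k(u,v) = f(v) for all v in that subspace. The following small Python sketch (an illustration of ours, with constant coefficient a ≡ 1 and piecewise linear "hat" functions on a uniform grid) makes this concrete:

import numpy as np

# Minimize J(v) = int_0^1 [a/2 * v'^2 - v] dx over piecewise linears,
# i.e. solve the Galerkin system K c = F for the hat-function coefficients.
n, a = 10, 1.0                       # n subintervals, constant coefficient
h = 1.0 / n
K = np.zeros((n - 1, n - 1))         # stiffness matrix k(phi_j, phi_i)
F = np.full(n - 1, h)                # load vector f(phi_i) = int phi_i = h
for i in range(n - 1):
    K[i, i] = 2 * a / h
    if i > 0:
        K[i, i - 1] = K[i - 1, i] = -a / h
c = np.linalg.solve(K, F)            # nodal values of the discrete minimizer

x = np.linspace(h, 1 - h, n - 1)
print(np.max(np.abs(c - 0.5 * x * (1 - x))))  # exact solution for a = 1

For a ≡ 1 the discrete solution even coincides with the exact solution u(x) = x(1−x)/2 at the grid points, a special property of this one-dimensional problem.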

We consider a case in which the coefficient a is only piecewise continuous (and not differentiable at all x ∈ (0,1)). Then the problem in (2.1) is not well-defined. However, the definitions of the bilinear form k(·,·) and of the functional J(·) still make sense. We now analyze a minimization problem with a functional as in (2.5) in which the coefficient a is piecewise constant:

a(x) = 1 if x ∈ [0, 1/2],  a(x) = 2 if x ∈ (1/2, 1].

We show that for this problem the choice of an appropriate solution space is a delicate issue. Note that due to lemma 2.1.2 the minimization problem in X = V_1 has a corresponding equivalent variational formulation as in (2.3). With our choice of the coefficient a the functional J(·) takes the form

J(v) := ∫_0^{1/2} [ (1/2) v′(x)² − v(x) ] dx + ∫_{1/2}^{1} [ v′(x)² − v(x) ] dx.   (2.6)

This functional is well-defined on the space V_1. The functional J, however, is also well-defined if v is only differentiable and even if we allow v to be nondifferentiable at x = 1/2. We introduce the spaces

V_2 = { v ∈ C^1([0,1]) | v(0) = v(1) = 0 },
V_3 = { v ∈ C^1([0,1/2]) ∩ C^1([1/2,1]) ∩ C([0,1]) | v(0) = v(1) = 0 },
V_4 = { v ∈ C^1([0,1/2]) ∩ C^1([1/2,1]) | v(0) = v(1) = 0 }.

Note that V_1 ⊂ V_2 ⊂ V_3 ⊂ V_4 and that on all these spaces the functional J(·) is well-defined. Moreover, with X = V_i, i = 1, . . . , 4, and

k(u,v) = ∫_0^{1/2} u′(x)v′(x) dx + ∫_{1/2}^{1} 2u′(x)v′(x) dx,  f(v) = ∫_0^1 v(x) dx,   (2.7)

all assumptions of lemma 2.1.2 are fulfilled. We define a (natural) norm on these spaces:

|||w|||² := ∫_0^{1/2} w′(x)² dx + ∫_{1/2}^{1} w′(x)² dx.   (2.8)

One easily checks that this indeed defines a norm on the space V_4 and thus also on the subspaces V_i, i = 1, 2, 3. Furthermore, this norm is induced by the scalar product

(w, v)_1 := ∫_0^{1/2} w′(x)v′(x) dx + ∫_{1/2}^{1} w′(x)v′(x) dx   (2.9)

on V_4, and

|||w|||² ≤ k(w,w) ≤ 2|||w|||²  for all w ∈ V_4   (2.10)

holds. We show that in the space V_3 the minimization problem has a unique solution.

Lemma 2.1.3 The problem min_{v∈V_3} J(v) has a unique solution u given by

u(x) = −(1/2)x² + (5/12)x if 0 ≤ x ≤ 1/2,
u(x) = −(1/4)x² + (5/24)x + 1/24 if 1/2 ≤ x ≤ 1.   (2.11)

Proof. Note that u ∈ V_3 and even u ∈ C^∞([0,1/2]) ∩ C^∞([1/2,1]). We use the notation u′_L(1/2) = lim_{x↑1/2} u′(x) and similarly for u′_R(1/2). We apply lemma 2.1.2 with X = V_3. For arbitrary v ∈ V_3 we have

k(u,v) − f(v) = ∫_0^{1/2} ( u′(x)v′(x) − v(x) ) dx + ∫_{1/2}^{1} ( 2u′(x)v′(x) − v(x) ) dx
  = u′_L(1/2)v(1/2) − ∫_0^{1/2} (u″(x) + 1)v(x) dx − 2u′_R(1/2)v(1/2) − ∫_{1/2}^{1} (2u″(x) + 1)v(x) dx.   (2.12)

Due to u″(x) = −1 on [0,1/2], u″(x) = −1/2 on [1/2,1] and u′_L(1/2) − 2u′_R(1/2) = 0 we obtain k(u,v) = f(v) for all v ∈ V_3. From lemma 2.1.2 we conclude that u is the unique minimizer in V_3. □

Thus with X = V_3 a minimizer u exists and the relation (2.4) takes the form

J(v) − J(u) = (1/2) k(v−u, v−u),

with k(·,·) as in (2.7). Due to (2.10) the norm |||·||| can be used as a measure for the distance from the minimum (i.e. J(v) − J(u)):

(1/2) |||v − u|||² ≤ J(v) − J(u) ≤ |||v − u|||².   (2.13)

Before we turn to the minimization problems in the spaces V_1 and V_2 we first present a useful lemma.

Lemma 2.1.4 Define W := { v ∈ C^∞([0,1]) | v(0) = v(1) = 0 }. For every u ∈ V_3 there is a sequence (u_n)_{n≥1} in W such that

lim_{n→∞} |||u_n − u||| = 0.   (2.14)

Proof. Take u ∈ V_3 and define û(x) := u′(x) for all x ∈ [0,1], x ≠ 1/2, û(1/2) a fixed arbitrary value, and û(−x) = û(x) for all x ∈ [0,1]. Then û is even and û ∈ L^2(−1,1). From Fourier analysis it follows that there is a sequence

û_n(x) = Σ_{k=0}^{n} a_k cos(kπx),  n ∈ N,

such that

lim_{n→∞} ‖û − û_n‖²_{L^2} = lim_{n→∞} ∫_{−1}^{1} ( û(x) − û_n(x) )² dx = 0.

Note that due to the fact that u is continuous and u(0) = u(1) = 0 we get

a_0 = (1/2) ∫_{−1}^{1} û(x) dx = ∫_0^{1/2} u′(x) dx + ∫_{1/2}^{1} u′(x) dx = 0.

Define u_n(x) = Σ_{k=1}^{n} (a_k/(kπ)) sin(kπx) for n ≥ 1. Then u_n ∈ W, u_n′ = û_n and

|||u − u_n|||² ≤ ‖û − û_n‖²_{L^2}

holds. Thus it follows that lim_{n→∞} |||u_n − u||| = 0. □

Lemma 2.1.5 Let u ∈ V_3 be given by (2.11). For i = 1, 2 the following holds:

inf_{v∈V_i} J(v) = min_{v∈V_3} J(v) = J(u).   (2.15)

Proof. Take i = 1 or i = 2. I := inf_{v∈V_i} J(v) is defined as the greatest lower bound of J(v) for v ∈ V_i. From V_3 ⊃ V_i it follows that J(u) ≤ I holds. Suppose that J(u) < I holds, i.e. we have δ := I − J(u) > 0. Due to W ⊂ V_i and lemma 2.1.4 there is a sequence (u_n)_{n≥1} in V_i such that lim_{n→∞} |||u − u_n||| = 0 holds. Using (2.13) we obtain

J(u_n) = J(u) + ( J(u_n) − J(u) ) ≤ I − δ + |||u_n − u|||².

So for n sufficiently large we have J(u_n) < I, which on the other hand is not possible because I is a lower bound of J(v) for v ∈ V_i. We conclude that J(u) = I holds. □

The result in this lemma shows that the infimum of J(v) for v ∈ V_2 is equal to J(u) and thus, using (2.13), it follows that the minimizer u ∈ V_3 can be approximated to any accuracy, measured in the norm |||·|||, by elements from the smaller space V_2. The question arises why in the minimization problem the space V_3 is used and not the seemingly more natural space V_2. The answer to this question is formulated in the following lemma.

Lemma 2.1.6 There does not exist u ∈ V_2 such that J(u) ≤ J(v) for all v ∈ V_2.

Proof. Suppose that such a minimizer, say ū ∈ V_2, exists. From lemma 2.1.5 we then obtain

J(ū) = min_{v∈V_2} J(v) = inf_{v∈V_2} J(v) = J(u)

with u as in (2.11) the minimizer in V_3. Note that ū ∈ V_3. From lemma 2.1.2 it follows that the minimizer in V_3 is unique and thus ū = u must hold. But then ū = u ∉ V_2, which yields a contradiction. □

The same arguments as in the proof of this lemma can be used to show that in the smaller space V_1 there also does not exist a minimizer. Based on these results the function u ∈ V_3 is called the weak solution of the minimization problem in V_2. From (2.15) we see that for solving the minimization problem in the space V_2, in the sense that one wants to compute inf_{v∈V_2} J(v), it is natural to consider the minimization problem in the larger space V_3. We now consider the even larger space V_4 and show that the minimization problem still makes sense (i.e. has a unique solution). However, the minimum value does not equal inf_{v∈V_2} J(v).

Lemma 2.1.7 The problem min_{v∈V_4} J(v) has a unique solution ū given by

ū(x) = −(1/2)x(x−1) if 0 ≤ x ≤ 1/2,
ū(x) = −(1/4)x(x−1) if 1/2 < x ≤ 1.

Proof. Note that ū ∈ V_4 holds. We apply lemma 2.1.2. Recall the relation (2.12):

k(ū,v) − f(v) = ū′_L(1/2)v(1/2) − ∫_0^{1/2} (ū″(x) + 1)v(x) dx − 2ū′_R(1/2)v(1/2) − ∫_{1/2}^{1} (2ū″(x) + 1)v(x) dx.

From ū″(x) = −1 on [0,1/2], ū″(x) = −1/2 on [1/2,1] and ū′_L(1/2) = ū′_R(1/2) = 0 it follows that k(ū,v) = f(v) for all v ∈ V_4. We conclude that ū is the unique minimizer in V_4. □

A straightforward calculation yields the following values for the minima of the functional J(·) in V_3 and in V_4, respectively:

J(u) = −11/384,  J(ū) = −1/32.
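Both values are easily confirmed by quadrature (a supplementary check of ours): evaluate (2.6) at u from (2.11) and at ū from lemma 2.1.7.

from scipy.integrate import quad

# Minimizer u in V3 (eq. (2.11)) and minimizer u_bar in V4 (lemma 2.1.7),
# each given piecewise on [0, 1/2] and [1/2, 1], with their derivatives.
u_l  = lambda x: -0.5 * x**2 + 5.0 / 12.0 * x
u_r  = lambda x: -0.25 * x**2 + 5.0 / 24.0 * x + 1.0 / 24.0
ub_l = lambda x: -0.5 * x * (x - 1)
ub_r = lambda x: -0.25 * x * (x - 1)
du_l  = lambda x: -x + 5.0 / 12.0
du_r  = lambda x: -0.5 * x + 5.0 / 24.0
dub_l = lambda x: -(x - 0.5)
dub_r = lambda x: -0.5 * x + 0.25

def J(vl, vr, dvl, dvr):
    # J(v) from (2.6): a = 1 on [0,1/2], a = 2 on (1/2,1].
    left  = quad(lambda x: 0.5 * dvl(x)**2 - vl(x), 0.0, 0.5)[0]
    right = quad(lambda x: dvr(x)**2 - vr(x), 0.5, 1.0)[0]
    return left + right

print(J(u_l, u_r, du_l, du_r), -11.0 / 384.0)    # both ~ -0.0286458
print(J(ub_l, ub_r, dub_l, dub_r), -1.0 / 32.0)  # both ~ -0.03125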

From this we see that, opposite to u, we should not call ū a weak solution of the minimization problem in V_2, because for ū we have J(ū) < inf_{v∈V_2} J(v).

We conclude the discussion of this example with a few remarks on important issues that play an important role in the remainder of this book.

1) Both the theoretical analysis and the numerical solution methods treated in this book heavily rely on the variational formulation of the elliptic boundary value problem (as, for example, in (2.2)). In section 2.3 general results on existence, uniqueness and stability of variational problems are presented. In the sections 2.5 and 2.6 these are applied to variational formulations of elliptic boundary value problems. The finite element discretization method treated in chapter 3 is based on the variational formulation of the boundary value problem. The derivation of the conjugate gradient (CG) iterative method, discussed in chapter 7, is based on the assumption that the given (discrete) problem can be formulated as a minimization problem with a functional J very similar to the one in lemma 2.1.2.

2) The bilinear form used in the weak formulation often has properties similar to those of an inner product, cf. (2.7), (2.9), (2.10). To take advantage of this one should formulate the problem in an inner product space. Then the structure of the space (inner product) fits nicely to the variational problem.

3) To guarantee that a "weak solution" actually lies in the space one should use a space that is "large enough" but "not too large". This can be realized by completion of the space in which the problem is formulated. The concept of completion is explained in section 2.2.2.

The conditions discussed in the remarks 2) and 3) lead to Hilbert spaces, i.e. inner product spaces that are complete. The Hilbert spaces that are appropriate for elliptic boundary value problems are the Sobolev spaces. These are treated in section 2.2.

2.2 Sobolev spaces

The Hölder spaces C^{k,λ}(Ω̄) that are used in the classical formulation of elliptic boundary value problems in chapter 1 are Banach spaces but not Hilbert spaces. In this section we introduce Sobolev spaces. All Sobolev spaces are Banach spaces. Some of these are Hilbert spaces. In our treatment of elliptic boundary value problems we only need these Hilbert spaces and thus we restrict ourselves to the presentation of this subset of Sobolev Hilbert spaces. A very general treatment on Sobolev spaces is given in [1].

2.2.1 The spaces W^m(Ω) based on weak derivatives

We take u ∈ C^1(Ω) and φ ∈ C_0^∞(Ω). Since φ vanishes identically outside some compact subset of Ω, one obtains by partial integration in the variable x_j:

∫_Ω (∂u(x)/∂x_j) φ(x) dx = − ∫_Ω u(x) (∂φ(x)/∂x_j) dx,

and thus

∫_Ω D^α u(x) φ(x) dx = − ∫_Ω u(x) D^α φ(x) dx,  |α| = 1,

holds. Repeated application of this result yields the fundamental Green's formula

∫_Ω D^α u(x) φ(x) dx = (−1)^{|α|} ∫_Ω u(x) D^α φ(x) dx  for all φ ∈ C_0^∞(Ω), u ∈ C^k(Ω), k = 1, 2, . . ., and |α| ≤ k.   (2.16)

Based on this formula we introduce the notion of a weak derivative:

Definition 2.2.1 Consider u ∈ L^2(Ω) and |α| > 0. If there exists v ∈ L^2(Ω) such that

∫_Ω v(x)φ(x) dx = (−1)^{|α|} ∫_Ω u(x) D^α φ(x) dx  for all φ ∈ C_0^∞(Ω),   (2.17)

then v is called the αth weak derivative of u and is denoted by D^α u := v. □
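As a concrete one-dimensional illustration (our own example): on Ω = (−1,1) the function u(x) = |x| has the first weak derivative v(x) = sign(x), since partial integration on (−1,0) and (0,1) produces no boundary terms. This can be checked symbolically:

import sympy as sp

x = sp.symbols('x')
phi = (1 - x**2)**3 * (2 + x)   # smooth, vanishes (to third order) at x = -1, 1;
                                # a stand-in for a C_0^infty test function
# Left side of (2.17) with v = sign(x):
lhs = -sp.integrate(phi, (x, -1, 0)) + sp.integrate(phi, (x, 0, 1))
# Right side: (-1) * integral of |x| * phi'(x), split at the kink x = 0:
rhs = -(sp.integrate(-x * sp.diff(phi, x), (x, -1, 0))
        + sp.integrate(x * sp.diff(phi, x), (x, 0, 1)))
print(sp.simplify(lhs - rhs))   # prints 0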

Two elementary results are given in the next lemma.

Lemma 2.2.2 If for u ∈ L^2(Ω) the αth weak derivative exists then it is unique (in the usual Lebesgue sense). If u ∈ C^k(Ω) then for 0 < |α| ≤ k the αth weak derivative and the classical αth derivative coincide.

Proof. The second statement follows from the first one and Green's formula (2.16). We now prove the uniqueness. Assume that v_i ∈ L^2(Ω), i = 1, 2, both satisfy (2.17). Then it follows that

∫_Ω ( v_1(x) − v_2(x) ) φ(x) dx = ⟨v_1 − v_2, φ⟩_{L^2} = 0  for all φ ∈ C_0^∞(Ω).

Since C_0^∞(Ω) is dense in L^2(Ω) this implies that ⟨v_1 − v_2, φ⟩_{L^2} = 0 for all φ ∈ L^2(Ω) and thus v_1 − v_2 = 0 (a.e.). □

Remark 2.2.3 As a warning we note that if the classical derivative of u, say D^α u, exists almost everywhere in Ω and D^α u ∈ L^2(Ω), this does not guarantee the existence of the αth weak derivative. This is shown by the following example:

Ω = (−1,1),  u(x) = 0 if x < 0,  u(x) = 1 if x ≥ 0.

The classical first derivative of u on Ω \ {0} is u′(x) = 0. However, the weak first derivative of u as defined in 2.2.1 does not exist.

A further noticeable observation is the following: if u ∈ C^∞(Ω) ∩ C(Ω̄) then u does not always have a first weak derivative. This is shown by the example Ω = (0,1), u(x) = √x. The only candidate for the first weak derivative of u is v(x) = 1/(2√x). However, v ∉ L^2(Ω). □
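For the step function in the first example the failure can be seen directly (a short supplementary computation): for all φ ∈ C_0^∞(−1,1), partial integration gives

−∫_{−1}^{1} u(x)φ′(x) dx = −∫_0^{1} φ′(x) dx = φ(0),

so a weak derivative v would have to satisfy ∫_{−1}^{1} v(x)φ(x) dx = φ(0) for all test functions φ. No v ∈ L^2(−1,1) can reproduce such a point evaluation (take test functions with φ(0) = 1 and shrinking support around 0); the distributional derivative of u is the Dirac distribution, not a function.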

The Sobolev space W^m(Ω), m = 1, 2, . . ., consists of all functions in L^2(Ω) for which all αth weak derivatives with |α| ≤ m exist:

W^m(Ω) := { u ∈ L^2(Ω) | D^α u exists for all 0 < |α| ≤ m }.   (2.18)

Remark 2.2.4 By definition, for u ∈ W^m(Ω), Green's formula

∫_Ω D^α u(x) φ(x) dx = (−1)^{|α|} ∫_Ω u(x) D^α φ(x) dx  for all φ ∈ C_0^∞(Ω), |α| ≤ m,

holds. □

For m = 0 we define W^0(Ω) := L^2(Ω). In W^m(Ω) a natural inner product and corresponding norm are defined by

⟨u,v⟩_m := Σ_{|α|≤m} ⟨D^α u, D^α v⟩_{L^2},  ‖u‖_m := ⟨u,u⟩_m^{1/2},  u, v ∈ W^m(Ω).   (2.19)

It is easy to verify that ⟨·,·⟩_m defines an inner product on W^m(Ω). We now formulate a main result:

Theorem 2.2.5 The space (W^m(Ω), ⟨·,·⟩_m) is a Hilbert space.

Proof. We must show that the space W^m(Ω) with the norm ‖·‖_m is complete. For m = 0 this is trivial. We consider m ≥ 1. First note that for v ∈ W^m(Ω):

‖v‖²_m = Σ_{|α|≤m} ‖D^α v‖²_{L^2}.   (2.20)

Let (u_k)_{k≥1} be a Cauchy sequence in W^m(Ω). From (2.20) it follows that if ‖u_k − u_ℓ‖_m ≤ ε then ‖D^α u_k − D^α u_ℓ‖_{L^2} ≤ ε for all 0 ≤ |α| ≤ m. Hence, (D^α u_k)_{k≥1} is a Cauchy sequence in L^2(Ω) for all |α| ≤ m. Since L^2(Ω) is complete it follows that there exists a unique u^{(α)} ∈ L^2(Ω) with lim_{k→∞} D^α u_k = u^{(α)} in L^2(Ω). For |α| = 0 this yields lim_{k→∞} u_k = u^{(0)} in L^2(Ω). For 0 < |α| ≤ m and arbitrary φ ∈ C_0^∞(Ω) we obtain

⟨u^{(α)}, φ⟩_{L^2} = lim_{k→∞} ⟨D^α u_k, φ⟩_{L^2} = lim_{k→∞} (−1)^{|α|} ⟨u_k, D^α φ⟩_{L^2} = (−1)^{|α|} ⟨u^{(0)}, D^α φ⟩_{L^2}.   (2.21)

From this it follows that u^{(α)} ∈ L^2(Ω) is the αth weak derivative of u^{(0)}. We conclude that u^{(0)} ∈ W^m(Ω) and

lim_{k→∞} ‖u_k − u^{(0)}‖²_m = lim_{k→∞} Σ_{|α|≤m} ‖D^α u_k − D^α u^{(0)}‖²_{L^2} = Σ_{|α|≤m} lim_{k→∞} ‖D^α u_k − u^{(α)}‖²_{L^2} = 0.

This shows that the Cauchy sequence (u_k)_{k≥1} in W^m(Ω) has its limit point in W^m(Ω) and thus this space is complete. □

Similar constructions can be applied if we replace the Hilbert space L^2(Ω) by the Banach space L^p(Ω), 1 ≤ p < ∞, of measurable functions for which ‖u‖_p := ( ∫_Ω |u(x)|^p dx )^{1/p} is bounded. This results in Sobolev spaces which are usually denoted by W_p^m(Ω). For notational simplicity we deleted the index p = 2 in our presentation. For p ≠ 2 the Sobolev space W_p^m(Ω) is a Banach space but not a Hilbert space. In this book we only need the Sobolev space with p = 2 as defined in (2.18). For p ≠ 2 we refer to the literature, e.g. [1].

2.2.2 The spaces H^m(Ω) based on completion

In this section we introduce the Sobolev spaces using a different technique, namely based on the concept of completion. We recall a basic result (cf. Appendix A.1).

Lemma 2.2.6 Let (Z, ‖·‖) be a normed space. Then there exists a Banach space (X, ‖·‖_*) such that Z ⊂ X, ‖x‖ = ‖x‖_* for all x ∈ Z, and Z is dense in X. The space X is called the completion of Z. This space is unique, except for isometric (i.e., norm preserving) isomorphisms.

Here we consider the function space

Z_m := { u ∈ C^∞(Ω) | ‖u‖_m < ∞ },   (2.22)

endowed with the norm ‖·‖_m as defined in (2.19), i.e., we want to construct the completion of (Z_m, ‖·‖_m). For m = 0 this results in L^2(Ω), since C^∞(Ω) is dense in L^2(Ω). Hence, we only consider m ≥ 1. Note that Z_m ⊂ W^m(Ω) and that this embedding is continuous. One can apply the general result of lemma 2.2.6 which then defines the completion of the space Z_m. However, here we want to present a more constructive approach which reveals some interesting relations between this completion and the space W^m(Ω). First, note that due to (2.20) a Cauchy sequence (u_k)_{k≥1} in Z_m is also a Cauchy sequence in L^2(Ω) and thus to every such sequence there corresponds a unique u ∈ L^2(Ω) with lim_{k→∞} ‖u_k − u‖_{L^2} = 0. The space V_m ⊃ Z_m is defined as follows:

V_m := { u ∈ L^2(Ω) | lim_{k→∞} ‖u_k − u‖_{L^2} = 0 for a Cauchy sequence (u_k)_{k≥1} in Z_m }.

One easily verifies that V_m is a vector space.

Lemma 2.2.7 V_m is the closure of Z_m in the space W^m(Ω).

Proof. Take u ∈ V_m and let (u_k)_{k≥1} be a Cauchy sequence in Z_m with lim_{k→∞} ‖u_k − u‖_{L^2} = 0. From (2.20) it follows that (D^α u_k)_{k≥1}, 0 < |α| ≤ m, are Cauchy sequences in L^2(Ω). Let u^{(α)} := lim_{k→∞} D^α u_k in L^2(Ω). As in (2.21) one shows that u^{(α)} is the αth weak derivative D^α u of u. Using D^α u = lim_{k→∞} D^α u_k in L^2(Ω), for 0 < |α| ≤ m, we get

lim_{k→∞} ‖u_k − u‖²_m = lim_{k→∞} Σ_{|α|≤m} ‖D^α u_k − D^α u‖²_{L^2} = 0.

Since (u_k)_{k≥1} is a sequence in Z_m we have shown that V_m is the closure of Z_m in W^m(Ω). □

On the space V_m we can take the same inner product (and corresponding norm) as used in the space W^m(Ω) (cf. (2.19)). From lemma 2.2.7 and the fact that in the space Z_m the norm is the same as the norm of W^m(Ω) it follows that (V_m, ‖·‖_m) is the completion of (Z_m, ‖·‖_m).

Since the norm ‖·‖_m is induced by an inner product we have that (V_m, ⟨·,·⟩_m) is a Hilbert space. This defines the Sobolev space

H^m(Ω) := (V_m, ⟨·,·⟩_m) = completion of (Z_m, ⟨·,·⟩_m).

It is clear from lemma 2.2.7 that H^m(Ω) ⊂ W^m(Ω) holds. A fundamental result is the following:

Theorem 2.2.8 The equality H^m(Ω) = W^m(Ω) holds.

Proof. The first proof of this result was presented in [63]. A proof can also be found in [1, 65]. □

We see that the construction using weak derivatives (space W^m(Ω)) and the one based on completion (space H^m(Ω)) result in the same Sobolev space. In the remainder we will only use the notation H^m(Ω). The result in theorem 2.2.8 holds for arbitrary open sets Ω in R^n. If the domain satisfies certain very mild smoothness conditions (it suffices to have assumption 1.1.3) one can prove a somewhat stronger result, that we will need further on:

Theorem 2.2.9 Let H̃^m(Ω) be the completion of the space (C^∞(Ω̄), ⟨·,·⟩_m). Then

H̃^m(Ω) = H^m(Ω) = W^m(Ω)

holds.

Proof. We refer to [1]. □

Note that C^∞(Ω̄) ⊊ Z_m and thus H̃^m(Ω) results from the completion of a smaller space than H^m(Ω).

Remark 2.2.10 If assumption 1.1.3 is not satisfied then it may happen that H̃^m(Ω) ⊊ W^m(Ω) holds. Consider the example

Ω = { (x,y) ∈ R² | x ∈ (−1,0) ∪ (0,1), y ∈ (0,1) }

and take u(x,y) = 1 if x > 0, u(x,y) = 0 if x < 0. Then D^{(1,0)} u = D^{(0,1)} u = 0 on Ω and thus u ∈ W^1(Ω). However, one can verify that there does not exist a sequence (φ_k)_{k≥1} in C^1(Ω̄) such that

‖u − φ_k‖²_1 = ‖u − φ_k‖²_{L^2} + Σ_{|α|=1} ‖D^α φ_k‖²_{L^2} → 0  for k → ∞.

Hence, C^1(Ω̄) is not dense in W^1(Ω), i.e., H̃^1(Ω) ≠ W^1(Ω). The equality H^1(Ω) = W^1(Ω), however, does hold. □

The completion can also be defined if in (2.22) the space C^∞(Ω) is replaced by the smaller space C_0^∞(Ω). This yields another class of important Sobolev spaces:

H_0^m(Ω) ≡ completion of the space (C_0^∞(Ω), ⟨·,·⟩_m).   (2.23)

The space H_0^m(Ω) is a Hilbert space that is in general strictly smaller than H^m(Ω).

Remark 2.2.11 In general we have H_0^1(Ω) ⊊ H^1(Ω). Consider, as a simple example, Ω = (0,1), u(x) = x. Then u ∈ H^1(Ω) but for arbitrary φ ∈ C_0^∞(Ω) we have

‖u − φ‖²_1 = ‖u − φ‖²_{L^2} + ‖u′ − φ′‖²_{L^2}
  ≥ ∫_0^1 ( u′(x) − φ′(x) )² dx = ∫_0^1 ( 1 − 2φ′(x) + φ′(x)² ) dx
  ≥ 1 − 2 ∫_0^1 φ′(x) dx = 1 − 2( φ(1) − φ(0) ) = 1.

Hence u ∉ H_0^1(Ω) = closure of C_0^∞(Ω) in the norm ‖·‖_1. □

The technique of completion can also be applied if instead of ‖·‖_m one uses the norm ‖u‖_{m,p} = ( Σ_{|α|≤m} ‖D^α u‖_p^p )^{1/p}, 1 ≤ p < ∞. This results in Sobolev spaces denoted by H_p^m(Ω). For p = 2 we have H_2^m(Ω) = H^m(Ω). For p ≠ 2 these spaces are Banach spaces but not Hilbert spaces. A result as in theorem 2.2.8 also holds for p ≠ 2: H_p^m(Ω) = W_p^m(Ω).

We now formulate a result on a certain class of piecewise smooth functions which form a subset of the Sobolev space H^m(Ω). This subset plays an important role in the finite element method that will be presented in chapter 3.

Theorem 2.2.12 Assume that Ω can be partitioned as Ω̄ = ∪_{i=1}^{N} Ω̄_i, with Ω_i ∩ Ω_j = ∅ for all i ≠ j and for all Ω_i the assumption 1.1.3 is fulfilled. For m ∈ N, m ≥ 1, define

V_m = { u ∈ L^2(Ω) | u|_{Ω_i} ∈ C^m(Ω̄_i) for all i = 1, . . . , N }.

For u ∈ V_m the following holds:

u ∈ H^m(Ω) ⇔ u ∈ C^{m−1}(Ω̄).

Proof. First we need some notation. Let Γ_i := ∂Ω_i. The outward unit normal on Γ_i is denoted by n^{(i)}. Let Γ_{iℓ} := Γ_i ∩ Γ_ℓ (= Γ_{ℓi}) and let γ_int denote the set of all those intersections Γ_{iℓ} with meas_{n−1}(Γ_{iℓ}) > 0 (in 2D with triangles: intersections by sides are taken into account but intersections by vertices not). Similarly, Γ_{i0} := Γ_i ∩ ∂Ω and γ_b the set of all Γ_{i0} with meas_{n−1}(Γ_{i0}) > 0. For Γ_{iℓ} ∈ γ_int let n^{(iℓ)} be the unit normal pointing outward from Ω_i (thus n^{(ℓi)} = −n^{(iℓ)}). Finally, for u ∈ V_1 let

[u]_{iℓ} = lim_{t↓0} ( u(x + t n^{(iℓ)}) − u(x + t n^{(ℓi)}) ),  x ∈ Γ_{iℓ} ∈ γ_int,

be the jump of u across Γ_{iℓ}. We now consider the case m = 1, i.e., u ∈ V_1. Let v ∈ L^2(Ω) be given by v(x) = ∂u(x)/∂x_k for x ∈ Ω_i, i = 1, . . . , N. For arbitrary φ ∈ C_0^∞(Ω) we have

∫_Ω u(x) ∂φ(x)/∂x_k dx = Σ_{i=1}^{N} ∫_{Ω_i} u(x) ∂φ(x)/∂x_k dx
  = −Σ_{i=1}^{N} ∫_{Ω_i} v(x)φ(x) dx + Σ_{i=1}^{N} ∫_{Γ_i} u(x)φ(x) n_k^{(i)} dx
  = −∫_Ω v(x)φ(x) dx + Σ_{i=1}^{N} ∫_{Γ_i} u(x)φ(x) n_k^{(i)} dx.   (2.24)

For the last term in this expression we have

Σ_{i=1}^{N} ∫_{Γ_i} u(x)φ(x) n_k^{(i)} dx = Σ_{Γ_{iℓ}∈γ_int} ∫_{Γ_{iℓ}} [u]_{iℓ} φ(x) n_k^{(iℓ)} dx + Σ_{Γ_{i0}∈γ_b} ∫_{Γ_{i0}} u(x)φ(x) n_k^{(i)} dx =: R_int + R_b.

We have R_b = 0 because φ(x) = 0 on ∂Ω. If u ∈ H^1(Ω) holds then the weak derivative ∂u/∂x_k must be equal to v (for all k = 1, . . . , n). From

∫_Ω u(x) ∂φ(x)/∂x_k dx = −∫_Ω (∂u(x)/∂x_k) φ(x) dx = −∫_Ω v(x)φ(x) dx,  ∀φ ∈ C_0^∞(Ω),

it follows that R_int = 0 must hold for all φ ∈ C_0^∞(Ω). This implies that the jump of u across Γ_{iℓ} is zero and thus u ∈ C(Ω̄) holds. Conversely, if u ∈ C(Ω̄) then R_int = 0 and from the relation (2.24) it follows that the weak derivative ∂u/∂x_k exists. Since k is arbitrary we conclude u ∈ H^1(Ω). This completes the proof for the case m = 1.

For m > 1 we use an induction argument. Assume that the statement holds for m. We consider m + 1. Take u ∈ V_{m+1} and assume that u ∈ H^{m+1}(Ω) holds. Take an arbitrary multi-index α with |α| ≤ m − 1. Classical derivatives will be denoted by D̂^β and weak ones by D^β (with β a multi-index as α). From the induction hypothesis we obtain w := D̂^α u ∈ C(Ω̄). From u ∈ H^{m+1}(Ω) it follows that D^β w ∈ H^1(Ω) for |β| ≤ 1. Furthermore, for these β values we also have, due to u ∈ V_{m+1}, that D^β w = D̂^β w ∈ C^1(Ω̄_i) for i = 1, . . . , N. From the result for m = 1 it now follows that D^β w ∈ C(Ω̄) for |β| ≤ 1. Hence, D̂^β w is continuous across the internal interfaces Γ_{iℓ} and thus D̂^β w ∈ C(Ω̄) holds. We conclude that D̂^α D̂^β u ∈ C(Ω̄) for all |α| ≤ m−1, |β| ≤ 1, i.e., u ∈ C^m(Ω̄). Conversely, if u ∈ V_{m+1} and u ∈ C^m(Ω̄) then D̂^α u ∈ C(Ω̄) for |α| ≤ m. From the result for m = 1 it follows that D̂^α u ∈ H^1(Ω) for all |α| ≤ m and thus u ∈ H^{m+1}(Ω) holds. □

2.2.3 Properties of Sobolev spaces

There is an extensive literature on the theory of Sobolev spaces, see for example Adams [1], Marti [61], Nečas [65], Wloka [99], Alt [3], and the references therein. In this section we collect a few results that will be needed further on.

A first important question concerns the smoothness of functions from H^m(Ω) in the classical (i.e., C^k(Ω̄)) sense. For example, one can show that if Ω ⊂ R then all functions from H^1(Ω) must be continuous on Ω̄. This, however, is not true for the two dimensional case, as the following example shows:

Example 2.2.13 In this example we show that functions in H^1(Ω), with Ω ⊂ R², are not necessarily continuous on Ω. With r := (x_1² + x_2²)^{1/2} let B(0,α) := { (x_1,x_2) ∈ R² | r < α } for α > 0. We take Ω = B(0,1/2). Below we also use Ω_δ := Ω \ B(0,δ) with 0 < δ < 1/2. On Ω we define the function u by u(0,0) := 0, u(x_1,x_2) := ln(ln(1/r)) otherwise. Using polar coordinates one obtains

∫_Ω u(x)² dx = lim_{δ↓0} ∫_{Ω_δ} u(x)² dx = 2π lim_{δ↓0} ∫_δ^{1/2} [ln(ln(1/r))]² r dr < ∞,

∂u exist a.e. on Ω and are elements of L2 (Ω). Note, It follows that the classical first derivatives ∂x i ∞ however, remark 2.2.3. For arbitrary φ ∈ C0 (Ω) we have, using Green’s formula on Ωδ :

∂φ(x) u(x) dx = ∂x1 Ωδ

Z

Z

∂B(0,δ)

u(x)φ(x)nx1 ds −

Z

Ωδ

∂u(x) φ(x) dx. ∂x1

Note that lim | δ↓0

Z

∂B(0,δ)

u(x)φ(x)nx1 ds| ≤ lim 2πδkφk∞ | ln(ln(1/δ))| = 0. δ↓0

So we have ∂φ(x) dx = − u(x) ∂x1 Ω

Z

Z



∂u(x) φ(x) dx. ∂x1

∂u We conclude that ∂x is the generalized partial derivative with respect to the variable x1 . The 1 same argument yields an analogous result for the derivative w.r.t. x2 . We conclude that u ∈ H 1 (Ω). It is clear that u is not continuous on Ω. 
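The derivative integral above is easy to confirm numerically (a supplementary check of ours):

import numpy as np
from scipy.integrate import quad

# In polar coordinates: |grad u|^2 = 1 / (r^2 (ln r)^2) for u = ln(ln(1/r)),
# so the H^1-seminorm integral over Omega_delta is 2*pi * int 1/(r (ln r)^2) dr.
seminorm_sq = lambda d: 2 * np.pi * quad(
    lambda r: 1.0 / (r * np.log(r)**2), d, 0.5)[0]

for d in [1e-2, 1e-4, 1e-8]:
    print(d, seminorm_sq(d))     # approaches 2*pi/ln(2), slowly (like 1/|ln d|)
print(2 * np.pi / np.log(2))     # ~ 9.0647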

We now formulate an important general result which relates smoothness in the Sobolev sense (weak derivatives) to classical smoothness properties. For normed spaces X and Y a linear operator I : X → Y is called a continuous embedding if I is continuous and injective. Such a continuous embedding is denoted by X ↪ Y. Furthermore, usually for x ∈ X the corresponding element Ix ∈ Y is denoted by x, too (X ↪ Y is formally replaced by X ⊂ Y).

Theorem 2.2.14 If m − n/2 > k (recall: Ω ⊂ R^n) then there exist continuous embeddings

H^m(Ω) ↪ C^k(Ω̄),   (2.25a)
H_0^m(Ω) ↪ C^k(Ω̄).   (2.25b)

Proof. Given in [1]. □

It is trivial that for m ≥ 0 there are continuous embeddings H^{m+1}(Ω) ↪ H^m(Ω) and H_0^{m+1}(Ω) ↪ H_0^m(Ω). In the next theorem a result on compactness of embeddings (cf. Appendix A.1) is formulated. We recall that if X, Y are Banach spaces then a continuous embedding X ↪ Y is compact if and only if each bounded sequence in X has a subsequence which is convergent in Y.

Theorem 2.2.15 The continuous embeddings

H^{m+1}(Ω) ֒→ H^m(Ω)   for m ≥ 0   (2.26a)
H_0^{m+1}(Ω) ֒→ H_0^m(Ω)   for m ≥ 0   (2.26b)
H^m(Ω) ֒→ C^k(Ω̄)   for m − n/2 > k   (2.26c)
H_0^m(Ω) ֒→ C^k(Ω̄)   for m − n/2 > k   (2.26d)

are compact.
Proof. We sketch the idea of the proof. In [1] it is shown that the embeddings

H^1(Ω) ֒→ L^2(Ω),   H_0^1(Ω) ֒→ L^2(Ω)

are compact. This proves the results in (2.26a) and (2.26b) for m = 0. The results in (2.26a) and (2.26b) for m ≥ 1 can easily be derived from this as follows. Let (u_k)_{k≥1} be a bounded sequence in H^{m+1}(Ω) (m ≥ 1). Then (D^α u_k)_{k≥1} is a bounded sequence in H^1(Ω) for |α| ≤ m. Thus this sequence has a subsequence (D^α u_{k′})_{k′≥1} that converges in L^2(Ω). Hence, the subsequence (u_{k′})_{k′≥1} converges in H^m(Ω). This proves the compactness of the embedding H^{m+1}(Ω) ֒→ H^m(Ω). The result in (2.26b) for m ≥ 1 can be shown in the same way.
With a similar shift argument one can easily show that it suffices to prove the results in (2.26c) and (2.26d) for the case k = 0. The analysis for the case k = 0 is based on the following general result (which is easy to prove). If X, Y, Z are normed spaces with continuous embeddings I_1 : X ֒→ Y, I_2 : Y ֒→ Z and if at least one of these embeddings is compact then the continuous embedding I_2 I_1 : X ֒→ Z is compact. For m − n/2 > 0 there exist µ, λ ∈ (0, 1) with 0 < λ < µ < m − n/2. The following continuous embeddings exist:

H^m(Ω) ֒→ C^{0,µ}(Ω̄) ֒→ C^{0,λ}(Ω̄) ֒→ C(Ω̄).

In this sequence only the first embedding is nontrivial. This one is proved in [1], theorem 5.4. Furthermore, from [1] theorem 1.31 it follows that the second embedding is compact. We conclude that for m − n/2 > 0 the embedding H^m(Ω) ֒→ C(Ω̄) is compact. This then yields the result in (2.26c) for k = 0. The same line of reasoning can be used to show that (2.26d) holds.

The result in the following theorem is a basic inequality that will be used frequently.

Theorem 2.2.16 (Poincare-Friedrichs inequality) There exists a constant C that depends only on diam(Ω) such that

‖u‖_{L^2} ≤ C ( ∑_{|α|=1} ‖D^α u‖_{L^2}^2 )^{1/2}   for all u ∈ H_0^1(Ω).

Proof. Because C_0^∞(Ω) is dense in H_0^1(Ω) it is sufficient to prove the inequality for u ∈ C_0^∞(Ω). Without loss of generality we can assume that (0, . . . , 0) ∈ Ω. Let a > 0 be such that Ω ⊂ [−a, a]^n =: E. Take u ∈ C_0^∞(Ω) and extend this function by zero outside Ω. Note that

u(x_1, . . . , x_n) = u(−a, x_2, . . . , x_n) + ∫_{−a}^{x_1} ∂u(t, x_2, . . . , x_n)/∂t dt.

Since u(−a, x_2, . . . , x_n) = 0 we obtain, using the Cauchy-Schwarz inequality,

u(x)^2 = ( ∫_{−a}^{x_1} 1 · ∂u(t, x_2, . . . , x_n)/∂t dt )^2
≤ ∫_{−a}^{x_1} 1^2 dt ∫_{−a}^{x_1} ( ∂u(t, x_2, . . . , x_n)/∂t )^2 dt
≤ 2a ∫_{−a}^{a} ( ∂u(t, x_2, . . . , x_n)/∂t )^2 dt   for x ∈ E.

Note that the latter term does not depend on x_1. Integration with respect to the variable x_1 results in

∫_{−a}^{a} u(x_1, . . . , x_n)^2 dx_1 ≤ 4a^2 ∫_{−a}^{a} ( D^{(1,0,...,0)} u(x) )^2 dx_1

and integration with respect to the other variables gives

∫_E u(x)^2 dx ≤ 4a^2 ∫_E ( D^{(1,0,...,0)} u(x) )^2 dx ≤ 4a^2 ∑_{|α|=1} ‖D^α u‖_{L^2}^2

and thus the desired result is proved.
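The dependence of the constant on diam(Ω) can also be observed numerically. The sketch below (added for illustration; it assumes the one-dimensional case Ω = (0, L)) computes the best constant in the inequality as 1/√λ_1, where λ_1 is the smallest eigenvalue of a finite-difference Dirichlet Laplacian; the result is L/π, which grows linearly with diam(Ω) and stays below the bound proportional to diam(Ω) from the proof.

```python
# Best Poincare-Friedrichs constant on Omega = (0, L): C = 1/sqrt(lambda_1),
# with lambda_1 the smallest Dirichlet eigenvalue of -u''.  (Added sketch.)
import numpy as np

for L in [1.0, 2.0, 4.0]:
    n = 800
    h = L / n
    # tridiagonal finite-difference approximation of -u'' with u(0) = u(L) = 0
    A = (np.diag(np.full(n - 1, 2.0)) - np.diag(np.ones(n - 2), 1)
         - np.diag(np.ones(n - 2), -1)) / h**2
    lam1 = np.linalg.eigvalsh(A)[0]
    print(L, 1.0 / np.sqrt(lam1), L / np.pi)   # C grows linearly with diam(Omega)
```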

Corollary 2.2.17 For u ∈ H^m(Ω) define

|u|_m^2 := ∑_{|α|=m} ‖D^α u‖_{L^2}^2.   (2.27)

There exists a constant C such that

|u|_m ≤ ‖u‖_m ≤ C|u|_m   for all u ∈ H_0^m(Ω),

i.e., |·|_m and ‖·‖_m are equivalent norms on H_0^m(Ω).
Proof. The inequality |u|_m ≤ ‖u‖_m is trivial. For m = 1 the inequality ‖u‖_1 ≤ C|u|_1 directly follows from the Poincare-Friedrichs inequality. For u ∈ H_0^2(Ω) we obtain ‖u‖_2^2 = ‖u‖_1^2 + |u|_2^2 ≤ C^2|u|_1^2 + |u|_2^2. Application of the Poincare-Friedrichs inequality to D^α u ∈ H_0^1(Ω), |α| = 1, yields

|u|_1^2 = ∑_{|α|=1} ‖D^α u‖_{L^2}^2 ≤ c ∑_{|α|=1} ∑_{|β|=1} ‖D^β D^α u‖_{L^2}^2 ≤ c|u|_2^2.

Thus ‖u‖_2 ≤ C̃|u|_2 holds. For m > 2 the same reasoning is applicable.

In the weak formulation of the elliptic boundary value problems one must treat boundary conditions. For this the next result will be needed.

Theorem 2.2.18 (Trace operator) There exists a unique bounded linear operator

γ : H^1(Ω) → L^2(∂Ω),   ‖γ(u)‖_{L^2(∂Ω)} ≤ c‖u‖_1,   (2.28)

with the property that for all u ∈ C^1(Ω̄) the equality γ(u) = u|_{∂Ω} holds.

Proof. We define γ : C^1(Ω̄) → L^2(∂Ω) by γ(u) = u|_{∂Ω} and will show that

‖γ(u)‖_{L^2(∂Ω)} ≤ c‖u‖_1   for all u ∈ C^1(Ω̄)   (2.29)

holds. The desired result then follows from the extension theorem A.2.3. We give a proof of (2.29) for the two-dimensional case. The general case can be treated in the same way. In a neighbourhood of x ∈ ∂Ω we take a local coordinate system (ξ, η) such that locally the boundary can be represented as

Γ_loc = { (ξ, ψ(ξ)) | ξ ∈ [−a, a] }   with a > 0, ψ ∈ C^{0,1}([−a, a])

and a small strip below the graph of ψ is contained in Ω:

S := { (ξ, η) | ξ ∈ [−a, a], η ∈ [ψ(ξ) − b, ψ(ξ)) } ⊂ Ω,

for b > 0 sufficiently small. Take u ∈ C^1(Ω̄). Note that

u(ξ, ψ(ξ)) = u(ξ, t) + ∫_t^{ψ(ξ)} ∂u(ξ, η)/∂η dη   for t ∈ [ψ(ξ) − b, ψ(ξ)].

Using the inequality (α + β)^2 ≤ 2(α^2 + β^2) and application of the Cauchy-Schwarz inequality yields

u(ξ, ψ(ξ))^2 ≤ 2u(ξ, t)^2 + 2 ( ∫_t^{ψ(ξ)} 1 · ∂u(ξ, η)/∂η dη )^2
≤ 2u(ξ, t)^2 + 2(ψ(ξ) − t) ∫_t^{ψ(ξ)} ( ∂u(ξ, η)/∂η )^2 dη
≤ 2u(ξ, t)^2 + 2b ∫_{ψ(ξ)−b}^{ψ(ξ)} ( ∂u(ξ, η)/∂η )^2 dη.

In this last expression only the first term on the right-hand side depends on t. Integration over t ∈ [ψ(ξ) − b, ψ(ξ)] results in

b u(ξ, ψ(ξ))^2 ≤ 2 ∫_{ψ(ξ)−b}^{ψ(ξ)} u(ξ, t)^2 dt + 2b^2 ∫_{ψ(ξ)−b}^{ψ(ξ)} ( ∂u(ξ, η)/∂η )^2 dη
= 2 ∫_{ψ(ξ)−b}^{ψ(ξ)} ( u(ξ, η)^2 + b^2 ( ∂u(ξ, η)/∂η )^2 ) dη.

Integration over ξ ∈ [−a, a] and division by b gives

∫_{−a}^{a} u(ξ, ψ(ξ))^2 dξ ≤ 2 ∫_S ( b^{−1} u(ξ, η)^2 + b ( ∂u(ξ, η)/∂η )^2 ) dη dξ.

If ψ ∈ C^1([−a, a]) then for the local arc length variable s on Γ_loc we have

ds = ( 1 + ψ′(ξ)^2 )^{1/2} dξ.

Since 0 ≤ ψ′(ξ)^2 ≤ C for ξ ∈ [−a, a] we obtain

∫_{Γ_loc} u(s)^2 ds ≤ C ∫_{−a}^{a} u(ξ, ψ(ξ))^2 dξ ≤ C ( b^{−1}‖u‖_{L^2(S)}^2 + b|u|_{1,S}^2 ) ≤ C‖u‖_{1,S}^2.   (2.30)

If ψ is only Lipschitz continuous on [−a, a] then ψ ′ exists almost everywhere on [−a, a] and |ψ ′ (ξ)| is bounded (Rademacher’s theorem). Hence, the same argument can be applied. Finally note that ∂Ω can be covered by a finite number of local parts Γloc . Addition of the local inequalities in (2.30) then yields the result in (2.29). The operator defined in theorem 2.2.18 is called the trace operator. For u ∈ H 1 (Ω) the function γ(u) ∈ L2 (∂Ω) represents the boundary “values” of u and is called the trace of u. For γ(u) one often uses the notation u|∂Ω . For example, for u ∈ H 1 (Ω), the identity u|∂Ω = 0 means that γ(u) = 0 in the L2 (∂Ω) sense. The space range(γ) can be shown to be dense in L2 (∂Ω) but is strictly smaller than L2 (∂Ω). For a characterization of this subspace one can use a Sobolev space with a broken index:

H^{1/2}(∂Ω) = range(γ) = { v ∈ L^2(∂Ω) | ∃ u ∈ H^1(Ω) : v = γ(u) }.   (2.31)

The space H^{1/2}(∂Ω) is a Hilbert space which has similar properties as the usual Sobolev spaces. We do not discuss this topic here, since we will not need this space in the remainder. Using the trace operator one can give another natural characterization of the space H_0^1(Ω):

Theorem 2.2.19 The equality H_0^1(Ω) = { u ∈ H^1(Ω) | u|_{∂Ω} = 0 } holds.

Proof. We only prove "⊂". For a proof of "⊃" we refer to [47] theorem 6.2.42 or [1] remark 7.54. First note that { u ∈ H^1(Ω) | u|_{∂Ω} = 0 } = ker(γ). Furthermore, C_0^∞(Ω) ⊂ ker(γ) and the trace operator γ : H^1(Ω) → L^2(∂Ω) is continuous. From the latter it follows that ker(γ) is closed in H^1(Ω). This yields:

H_0^1(Ω) = closure of C_0^∞(Ω) w.r.t. ‖·‖_1 ⊂ closure of ker(γ) w.r.t. ‖·‖_1 = ker(γ)

and this proves "⊂".

Finally, we collect a few results on Green's formulas that hold in Sobolev spaces. For notational simplicity the function arguments x are deleted in the integrals, and in boundary integrals like, for example, ∫_{∂Ω} γ(u)γ(v) ds we delete the trace operator γ.

Theorem 2.2.20 The following identities hold, with n = (n_1, . . . , n_n) the outward unit normal on ∂Ω and H^m := H^m(Ω):

∫_Ω (∂u/∂x_i) v dx = − ∫_Ω u (∂v/∂x_i) dx + ∫_{∂Ω} u v n_i ds,   u, v ∈ H^1, 1 ≤ i ≤ n   (2.32a)
∫_Ω ∆u v dx = − ∫_Ω ∇u · ∇v dx + ∫_{∂Ω} (∇u · n) v ds,   u ∈ H^2, v ∈ H^1   (2.32b)
∫_Ω u div v dx = − ∫_Ω ∇u · v dx + ∫_{∂Ω} u v · n ds,   u ∈ H^1, v ∈ (H^1)^n.   (2.32c)

Proof. These results immediately follow from the corresponding formulas in C^∞(Ω̄), the continuity of the trace operator and a density argument based on theorem 2.2.9.

For the dual space of H_0^m(Ω) the following notation is used:

H^{−m}(Ω) := ( H_0^m(Ω) )′.   (2.33)

The norm on this space is denoted by ‖·‖_{−m}:

‖φ‖_{−m} := sup_{v∈H_0^m(Ω)} |φ(v)| / ‖v‖_m,   φ ∈ H^{−m}(Ω).

2.3 General results on variational formulations

In section 2.1 we already gave an example of a variational problem. In the previous section we introduced Hilbert spaces that will be used for the variational formulation of elliptic boundary value problems in the sections 2.5 and 2.6. In this section we present some general existence and uniqueness results for variational problems. These results will play a key role in the analysis of well-posedness of the weak formulations of elliptic boundary value problems. They will also be used in the discretization error analysis for the finite element method in chapter 3.
A remark on notation: For elements from a Hilbert space we use boldface notation (e.g., u), elements from the dual space (i.e., bounded linear functionals) are denoted by f, g, etc., and for linear operators between spaces we use capitals (e.g., L).
Let H_1 and H_2 be Hilbert spaces. A bilinear form k : H_1 × H_2 → R is continuous if there is a constant Γ such that for all x ∈ H_1, y ∈ H_2:

|k(x, y)| ≤ Γ‖x‖_{H_1}‖y‖_{H_2}.   (2.34)

For a continuous bilinear form k : H_1 × H_2 → R we define its norm by ‖k‖ = sup{ |k(x, y)| | ‖x‖_{H_1} = 1, ‖y‖_{H_2} = 1 }. A fundamental result is given in the following theorem:

Theorem 2.3.1 Let H_1, H_2 be Hilbert spaces and k : H_1 × H_2 → R be a continuous bilinear form. For f ∈ H_2′ consider the variational problem:

find u ∈ H_1 such that k(u, v) = f(v) for all v ∈ H_2.   (2.35)

The following two statements are equivalent:
1. For arbitrary f ∈ H_2′ the problem (2.35) has a unique solution u ∈ H_1 and ‖u‖_{H_1} ≤ c‖f‖_{H_2′} holds with a constant c independent of f.
2. The conditions (2.36) and (2.37) hold:

∃ ε > 0 :   sup_{v∈H_2} k(u, v) / ‖v‖_{H_2} ≥ ε‖u‖_{H_1}   for all u ∈ H_1,   (2.36)
∀ v ∈ H_2, v ≠ 0, ∃ u ∈ H_1 :   k(u, v) ≠ 0.   (2.37)

Moreover, for the constants c and ε one can take c = 1/ε.
Proof. We introduce the linear continuous operator L : H_1 → H_2′,

(Lu)(v) := k(u, v).   (2.38)

Note that for all u ∈ H_1

‖Lu‖_{H_2′} = sup_{v∈H_2} (Lu)(v) / ‖v‖_{H_2} = sup_{v∈H_2} k(u, v) / ‖v‖_{H_2}.   (2.39)

Furthermore, u ∈ H_1 satisfies (2.35) if and only if Lu = f holds.
"1. ⇒ 2." From 1. it follows that L : H_1 → H_2′ is bijective. For arbitrary u ∈ H_1 and f := Lu we have:

‖u‖_{H_1} ≤ c‖f‖_{H_2′} = c‖Lu‖_{H_2′} = c sup_{v∈H_2} k(u, v) / ‖v‖_{H_2}.

From this it follows that (2.36) holds with ε = 1/c. Take a fixed v ∈ H_2, v ≠ 0. The linear functional w → ⟨v, w⟩_{H_2} is an element of H_2′. There exists u ∈ H_1 such that k(u, w) = ⟨v, w⟩_{H_2} for all w ∈ H_2. Taking w = v yields k(u, v) = ‖v‖_{H_2}^2 > 0. Hence, (2.37) holds.
"1. ⇐ 2." Let u ∈ H_1 be such that Lu = 0. Then k(u, v) = (Lu)(v) = 0 for all v ∈ H_2. From condition (2.36) it follows that u = 0. We conclude that L : H_1 → H_2′ is injective. Let R(L) ⊂ H_2′ be the range of L and L^{−1} : R(L) → H_1 the inverse mapping. From (2.39) and (2.36) it follows that ‖Lu‖_{H_2′} ≥ ε‖u‖_{H_1} for all u ∈ H_1 and thus

‖L^{−1}f‖_{H_1} ≤ (1/ε)‖f‖_{H_2′}   for all f ∈ R(L).   (2.40)

Hence the inverse mapping is bounded. From corollary A.2.6 it follows that R(L) is closed in H_2′. Assume that R(L) ≠ H_2′. Then there exists g ∈ R(L)^⊥, g ≠ 0. Let J : H_2′ → H_2 be the Riesz isomorphism. For arbitrary u ∈ H_1 we get

0 = ⟨g, Lu⟩_{H_2′} = ⟨Jg, JLu⟩_{H_2} = (Lu)(Jg) = k(u, Jg).

This is a contradiction to (2.37). We conclude that R(L) = H_2′ and thus L : H_1 → H_2′ is bijective. From (2.40) we obtain, with u := L^{−1}f, ‖u‖_{H_1} ≤ (1/ε)‖f‖_{H_2′} for arbitrary f ∈ H_2′.

Remark 2.3.2 The condition (2.37) can also be formulated as follows:

( v ∈ H_2 such that k(u, v) = 0 for all u ∈ H_1 ) ⇒ v = 0.

The condition (2.36) is equivalent to

∃ ε > 0 :   inf_{u∈H_1\{0}} sup_{v∈H_2} k(u, v) / ( ‖u‖_{H_1}‖v‖_{H_2} ) ≥ ε,   (2.41)

and is often called the inf-sup condition. In the finite dimensional case with dim(H_1) = dim(H_2) < ∞ this condition implies the result in (2.37) and thus is necessary and sufficient for existence and uniqueness, as can be seen from the following. Let L : H_1 → H_2′ be as in (2.38). If dim(H_1) = dim(H_2′) < ∞ we have

L is bijective ⇔ L is injective ⇔ inf_{u≠0} ‖Lu‖_{H_2′} / ‖u‖_{H_1} > 0 ⇔ inf_{u≠0} sup_v k(u, v) / ( ‖u‖_{H_1}‖v‖_{H_2} ) > 0.   (2.42)

The latter condition seems to be weaker than the inf-sup condition in (2.41), since ε > 0 is required there. However, in the finite dimensional case it is easy to show, using a compactness argument, that these two conditions are equivalent. In infinite dimensional Hilbert spaces the inf-sup condition (2.41) is in general really stronger than the one in (2.42).
As we saw in the analysis above, it is natural to identify the bounded bilinear form k : H_1 × H_2 → R with a bounded linear operator L : H_1 → H_2′ via (Lu)(v) = k(u, v). The result in theorem 2.3.1 is a reformulation of the following result that can be found in functional analysis textbooks. Let L : H_1 → H_2′ be a bounded linear operator. Then L is an isomorphism if and only if the following two conditions hold:

(a) L is injective and R(L) is closed in H_2′,
(b) L′ : H_2 → H_1′ defined by (L′v)(u) = (Lu)(v) is injective.

These two conditions correspond to (2.36) and (2.37), respectively. □
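In the finite dimensional setting of this remark the inf-sup constant can be computed explicitly. The following Python sketch (an illustration added to the text; the matrix K is an arbitrary example) uses the fact that, for k(u, v) = u^T K v with Euclidean norms, sup_v k(u, v)/‖v‖ = ‖K^T u‖, so the constant ε in (2.41) is the smallest singular value of K, and c = 1/ε is exactly the stability constant of theorem 2.3.1.

```python
import numpy as np

rng = np.random.default_rng(0)
K = rng.standard_normal((4, 4))                 # k(u, v) = u^T K v on R^4 x R^4

eps = np.linalg.svd(K, compute_uv=False)[-1]    # inf-sup constant = sigma_min(K)

# Solving k(u, v) = f^T v for all v amounts to K^T u = f; stability bound c = 1/eps:
f = rng.standard_normal(4)
u = np.linalg.solve(K.T, f)
print(np.linalg.norm(u) <= np.linalg.norm(f) / eps + 1e-12)   # True

# Direct check of (2.41): sup_v k(u, v)/|v| = |K^T u| >= eps * |u| for any u.
for _ in range(100):
    w = rng.standard_normal(4)
    assert np.linalg.norm(K.T @ w) >= eps * np.linalg.norm(w) - 1e-12
```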

The following lemma will be used in the analysis below.

Lemma 2.3.3 Let H_1, H_2 be Hilbert spaces and k : H_1 × H_2 → R be a continuous bilinear form. For every u ∈ H_1 there exists a unique w ∈ H_2 such that

⟨w, v⟩_{H_2} = k(u, v)   for all v ∈ H_2.

Furthermore, if (2.36) is satisfied, then ‖w‖_{H_2} ≥ ε‖u‖_{H_1}, with ε > 0 as in (2.36), holds.
Proof. Take u ∈ H_1. Then g : v → k(u, v) defines a continuous linear functional on H_2. From the Riesz representation theorem it follows that there exists a unique w ∈ H_2 such that ⟨w, v⟩_{H_2} = g(v) = k(u, v) for all v ∈ H_2, and ‖g‖_{H_2′} = ‖w‖_{H_2}. Using (2.36) we obtain

‖w‖_{H_2} = ‖g‖_{H_2′} = sup_{v∈H_2} g(v) / ‖v‖_{H_2} = sup_{v∈H_2} k(u, v) / ‖v‖_{H_2} ≥ ε‖u‖_{H_1}

and thus the result is proved.

Definition 2.3.4 Let H be a Hilbert space. A bilinear form k : H × H → R is called H-elliptic if there exists a constant γ > 0 such that

k(u, u) ≥ γ‖u‖_H^2   for all u ∈ H.

As an immediate consequence of theorem 2.3.1 we obtain the following result.

Theorem 2.3.5 (Lax-Milgram) Let H be a Hilbert space and k : H × H → R a continuous H-elliptic bilinear form with ellipticity constant γ. Then for every f ∈ H′ there exists a unique u ∈ H such that

k(u, v) = f(v)   for all v ∈ H.

Furthermore, the inequality ‖u‖_H ≤ (1/γ)‖f‖_{H′} holds.
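A minimal finite dimensional analogue of the Lax-Milgram theorem (added here, not part of the original text): for k(u, v) = u^T K v, H-ellipticity means that the symmetric part of K is positive definite, and the bound ‖u‖ ≤ (1/γ)‖f‖ can be verified directly, also for a nonsymmetric K.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((5, 5))
K = 2.0 * np.eye(5) + (W - W.T)       # k(u, u) = 2|u|^2: the skew part drops out

gamma = np.linalg.eigvalsh(0.5 * (K + K.T)).min()   # ellipticity constant (= 2)
f = rng.standard_normal(5)
u = np.linalg.solve(K.T, f)           # k(u, v) = f^T v for all v
print(gamma, np.linalg.norm(u) <= np.linalg.norm(f) / gamma)   # 2.0 True
```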

This theorem will play an important role in the analysis of well-posedness of the weak formulation of scalar elliptic problems in the section 2.5.
In the remainder of this section we analyze well-posedness for a variational problem which has a special saddle point structure. This structure is such that the analysis applies to the Stokes problem. This application is treated in section 2.6. Let V and M be Hilbert spaces and

a : V × V → R,   b : V × M → R

be continuous bilinear forms. For f_1 ∈ V′, f_2 ∈ M′ we define the following variational problem: find (φ, λ) ∈ V × M such that

a(φ, ψ) + b(ψ, λ) = f_1(ψ)   for all ψ ∈ V   (2.43a)
b(φ, µ) = f_2(µ)   for all µ ∈ M.   (2.43b)

We now define H := V × M and

k : H × H → R,   k(u, v) := a(φ, ψ) + b(φ, µ) + b(ψ, λ)   with u := (φ, λ), v := (ψ, µ).   (2.44)

On H we use the product norm ‖u‖_H^2 = ‖φ‖_V^2 + ‖λ‖_M^2 for u = (φ, λ) ∈ H. If we define f ∈ H′ = V′ × M′ by f(ψ, µ) = f_1(ψ) + f_2(µ) then the problem (2.43) can be reformulated in the setting of theorem 2.3.1 as follows:

find u ∈ H such that k(u, v) = f(v) for all v ∈ H.   (2.45)

Below we will analyze the well-posedness of the problem (2.45) and thus of (2.43). In particular, we will derive conditions on the bilinear forms a(·,·) and b(·,·) that are necessary and sufficient for existence and uniqueness of a solution. The main result is presented in theorem 2.3.10. We start with a few useful results. We need the following null space:

V_0 := { φ ∈ V | b(φ, λ) = 0 for all λ ∈ M }.   (2.46)

Note that both V_0 and V_0^⊥ := { φ ∈ V | ⟨φ, ψ⟩_V = 0 for all ψ ∈ V_0 } are closed subspaces of V. These subspaces are Hilbert spaces if we use the scalar product ⟨·,·⟩_V. As usual the corresponding dual spaces are denoted by V_0′ and (V_0^⊥)′. We introduce an inf-sup condition for the bilinear form b(·,·):

∃ β > 0 :   sup_{ψ∈V} b(ψ, λ) / ‖ψ‖_V ≥ β‖λ‖_M   for all λ ∈ M.   (2.47)

Remark 2.3.6 Due to V = V_0 ⊕ V_0^⊥ and b(ψ, λ) = 0 for all ψ ∈ V_0, λ ∈ M, the condition in (2.47) is equivalent to the condition

∃ β > 0 :   sup_{ψ∈V_0^⊥} b(ψ, λ) / ‖ψ‖_V ≥ β‖λ‖_M   for all λ ∈ M.   (2.48)

Lemma 2.3.7 Assume that (2.47) holds. For g ∈ (V_0^⊥)′ the variational problem

find λ ∈ M such that b(ψ, λ) = g(ψ) for all ψ ∈ V_0^⊥   (2.49)

has a unique solution. Furthermore, ‖λ‖_M ≤ (1/β)‖g‖_{(V_0^⊥)′} holds.

Proof. We apply theorem 2.3.1 with H_1 = M, H_2 = V_0^⊥ and k(λ, ψ) = b(ψ, λ) (note the interchange of arguments). From (2.48) it follows that condition (2.36) is fulfilled. Take ψ ∈ V_0^⊥, ψ ≠ 0. Then ψ ∉ V_0 and thus there exists λ ∈ M with b(ψ, λ) ≠ 0. Hence, the second condition (2.37) is also satisfied. Application of theorem 2.3.1 yields that the problem (2.49) has a unique solution λ ∈ M and ‖λ‖_M ≤ (1/β)‖g‖_{(V_0^⊥)′} holds.

Note that, opposite to (2.36), in (2.47) we take the supremum over the first argument of the bilinear form. In the following lemma we formulate a result in which the supremum is taken over the second argument:

Lemma 2.3.8 The condition (2.47) (or (2.48)) implies:

∃ β > 0 :   sup_{λ∈M} b(ψ, λ) / ‖λ‖_M ≥ β‖ψ‖_V   for all ψ ∈ V_0^⊥.   (2.50)

Proof. Take ψ̂ ∈ V_0^⊥, ψ̂ ≠ 0, and define g ∈ (V_0^⊥)′ by g(ψ) = ⟨ψ, ψ̂⟩_V for ψ ∈ V_0^⊥. From lemma 2.3.7 it follows that there exists a unique λ̂ ∈ M such that

b(ψ, λ̂) = g(ψ)   for all ψ ∈ V_0^⊥

and ‖λ̂‖_M ≤ (1/β)‖g‖_{(V_0^⊥)′} = (1/β)‖ψ̂‖_V holds. We conclude that

sup_{λ∈M} b(ψ̂, λ) / ‖λ‖_M ≥ b(ψ̂, λ̂) / ‖λ̂‖_M = g(ψ̂) / ‖λ̂‖_M = ‖ψ̂‖_V^2 / ‖λ̂‖_M ≥ β‖ψ̂‖_V

holds for arbitrary ψ̂ ∈ V_0^⊥.

In general, (2.50) does not imply (2.47). Take, for example, the bilinear form that is identically zero, i.e., b(ψ, λ) = 0 for all ψ ∈ V, λ ∈ M. Then (2.47) does not hold. However, since V_0 = V and thus V_0^⊥ = {0} it follows that (2.50) does hold.
Application of lemma 2.3.3 in combination with the inf-sup properties in (2.47) and (2.50) yields the following corollary.

Corollary 2.3.9 Assume that (2.47) holds. For every λ ∈ M there exists a unique ξ ∈ V_0^⊥ such that

⟨ξ, ψ⟩_V = b(ψ, λ)   for all ψ ∈ V_0^⊥.

Furthermore, ‖ξ‖_V ≥ β‖λ‖_M holds. For every ψ ∈ V_0^⊥ there exists a unique ν ∈ M such that

⟨ν, λ⟩_M = b(ψ, λ)   for all λ ∈ M.

Furthermore, ‖ν‖_M ≥ β‖ψ‖_V holds.
Proof. The first part follows by applying lemma 2.3.3 with H_1 = M, H_2 = V_0^⊥, k(λ, ψ) = b(ψ, λ) in combination with (2.48). For the second part we use lemma 2.3.3 with H_1 = V_0^⊥, H_2 = M, k(ψ, λ) = b(ψ, λ) in combination with (2.50).

In the following main theorem we present necessary and sufficient conditions on the bilinear forms a(·,·) and b(·,·) such that the saddle point problem (2.43) has a unique solution which depends continuously on the data.

Theorem 2.3.10 Let V, M be Hilbert spaces and a : V × V → R, b : V × M → R be continuous bilinear forms. Define H := V × M and let k : H × H → R be the continuous bilinear form defined in (2.44). For f ∈ H′ = V′ × M′ consider the variational problem

find u ∈ H such that k(u, v) = f(v) for all v ∈ H.   (2.51)

The following two statements are equivalent:
1. For arbitrary f ∈ H′ the problem (2.51) has a unique solution u ∈ H and ‖u‖_H ≤ c‖f‖_{H′} holds with a constant c independent of f.
2. The inf-sup condition (2.47) holds and the conditions (2.52a), (2.52b) are satisfied:

∃ δ > 0 :   sup_{ψ∈V_0} a(φ, ψ) / ‖ψ‖_V ≥ δ‖φ‖_V   for all φ ∈ V_0   (2.52a)
∀ ψ ∈ V_0, ψ ≠ 0, ∃ φ ∈ V_0 :   a(φ, ψ) ≠ 0.   (2.52b)

Moreover, if the second statement holds, then for c in the first statement one can take c = (β + 2‖a‖)^2 δ^{−1} β^{−2}.
Proof. From theorem 2.3.1 (with H_1 = H_2 = H) it follows that statement 1 is equivalent to statement 1′: the conditions (2.36) and (2.37) with H_1 = H_2 = H are satisfied. We now prove that the statements 1′ and 2 are equivalent. We recall the condition (2.36):

sup_{(ψ,µ)∈H} [ a(φ, ψ) + b(φ, µ) + b(ψ, λ) ] / ( ‖ψ‖_V^2 + ‖µ‖_M^2 )^{1/2} ≥ ε ( ‖φ‖_V^2 + ‖λ‖_M^2 )^{1/2}   for all (φ, λ) ∈ H.   (2.53)

Define u = (φ, λ), v = (ψ, µ) and k(u, v) as in (2.44). We have to prove: {(2.53), (2.37)} ⇔ {(2.47), (2.52a), (2.52b)}. This is done in the following 5 steps:
a) (2.53) ⇒ (2.47), b) {(2.53), (2.47)} ⇒ (2.52a), c) {(2.47), (2.37)} ⇒ (2.52b), d) {(2.47), (2.52a)} ⇒ (2.53), e) {(2.52a), (2.52b)} ⇒ (2.37).
a). If in (2.53) we take φ = 0 we obtain

sup_{ψ∈V} b(ψ, λ) / ‖ψ‖_V = sup_{(ψ,µ)∈H} b(ψ, λ) / ( ‖ψ‖_V^2 + ‖µ‖_M^2 )^{1/2} ≥ ε‖λ‖_M   for all λ ∈ M

and thus the inf-sup condition (2.47) holds.
b). Take φ_0 ∈ V_0. The functional g : ψ → −a(φ_0, ψ), ψ ∈ V_0^⊥, is linear and bounded, i.e., g ∈ (V_0^⊥)′. Application of lemma 2.3.7 yields that there exists λ̂ ∈ M such that b(ψ, λ̂) = −a(φ_0, ψ) for all ψ ∈ V_0^⊥. In (2.53) we take (φ, λ) = (φ_0, λ̂). Every ψ ∈ V is decomposed as ψ = ψ_0 + ψ_⊥ with ψ_0 ∈ V_0, ψ_⊥ ∈ V_0^⊥. Using b(φ_0, µ) + b(ψ, λ̂) = b(ψ_⊥, λ̂) we obtain from (2.53)

sup_{(ψ,µ)∈H} [ a(φ_0, ψ_0) + a(φ_0, ψ_⊥) + b(ψ_⊥, λ̂) ] / ( ‖ψ‖_V^2 + ‖µ‖_M^2 )^{1/2} ≥ ε ( ‖φ_0‖_V^2 + ‖λ̂‖_M^2 )^{1/2}.

From this we get, using b(ψ_⊥, λ̂) = −a(φ_0, ψ_⊥) for all ψ_⊥ ∈ V_0^⊥:

sup_{ψ_0∈V_0} a(φ_0, ψ_0) / ‖ψ_0‖_V = sup_{(ψ,µ)∈H} a(φ_0, ψ_0) / ( ‖ψ‖_V^2 + ‖µ‖_M^2 )^{1/2} ≥ ε ( ‖φ_0‖_V^2 + ‖λ̂‖_M^2 )^{1/2} ≥ ε‖φ_0‖_V,

and thus the condition (2.52a) holds.
c). Take ψ_0 ∈ V_0, ψ_0 ≠ 0. The functional g : φ → −a(φ, ψ_0), φ ∈ V_0^⊥, is an element of (V_0^⊥)′. From lemma 2.3.7 it follows that there exists µ̂ ∈ M such that b(ψ, µ̂) = −a(ψ, ψ_0) for all ψ ∈ V_0^⊥. In condition (2.37) we take v = (ψ_0, µ̂). Then there exists u = (φ, λ) ∈ H such that k(u, v) ≠ 0, i.e.,

a(φ, ψ_0) + b(φ, µ̂) + b(ψ_0, λ) = a(φ, ψ_0) + b(φ, µ̂) ≠ 0.

Decompose φ as φ = φ_0 + φ_⊥, φ_0 ∈ V_0, φ_⊥ ∈ V_0^⊥, and use the definition of µ̂ to get

0 ≠ a(φ, ψ_0) + b(φ, µ̂) = a(φ_0, ψ_0) + a(φ_⊥, ψ_0) + b(φ_⊥, µ̂) = a(φ_0, ψ_0).

Hence, the result in (2.52b) holds.
d). Let u = (φ, λ) ∈ H be given. We decompose φ as φ = φ_0 + φ_⊥, φ_0 ∈ V_0, φ_⊥ ∈ V_0^⊥. We assume that λ ≠ 0, φ_⊥ ≠ 0. From corollary 2.3.9 it follows that:

∃ ξ ∈ V_0^⊥ : ⟨ξ, ψ⟩_V = b(ψ, λ) ∀ ψ ∈ V_0^⊥;   ‖ξ‖_V ≥ β‖λ‖_M,   (2.54)
∃ ν ∈ M : ⟨ν, µ⟩_M = b(φ_⊥, µ) ∀ µ ∈ M;   ‖ν‖_M ≥ β‖φ_⊥‖_V.   (2.55)

From assumption (2.52a) it follows that there exist δ > 0 and ψ_0 ∈ V_0 with ‖ψ_0‖_V = 1 such that a(φ_0, ψ_0) ≥ δ‖φ_0‖_V holds. We now introduce

ξ̃ := ξ / ‖ξ‖_V,   ψ̄ := α_1 ψ_0 + ξ̃,   ν̃ := ν / ‖ν‖_M,   µ̄ := α_2 ν̃,   α_1, α_2 ∈ R.

Note that ‖ψ̄‖_V^2 + ‖µ̄‖_M^2 = α_1^2 + 1 + α_2^2. We obtain:

sup_{v∈H} k(u, v) / ‖v‖_H ≥ sup_{α_1,α_2} [ a(φ, ψ̄) + b(φ, µ̄) + b(ψ̄, λ) ] / ( 1 + α_1^2 + α_2^2 )^{1/2}
= sup_{α_1,α_2} [ a(φ_0, ψ̄) + a(φ_⊥, ψ̄) + b(φ_⊥, µ̄) + b(ξ̃, λ) ] / ( 1 + α_1^2 + α_2^2 )^{1/2}
= sup_{α_1,α_2} [ a(φ_0, ψ̄) + a(φ_⊥, ψ̄) + ⟨ν, µ̄⟩_M + ⟨ξ, ξ̃⟩_V ] / ( 1 + α_1^2 + α_2^2 )^{1/2}
= sup_{α_1,α_2} [ α_1 a(φ_0, ψ_0) + a(φ_0, ξ̃) + a(φ_⊥, ψ̄) + α_2‖ν‖_M + ‖ξ‖_V ] / ( 1 + α_1^2 + α_2^2 )^{1/2}
≥ sup_{α_1,α_2} [ (α_1δ − ‖a‖)‖φ_0‖_V + ( α_2β − ‖a‖(α_1 + 1) )‖φ_⊥‖_V + β‖λ‖_M ] / ( 1 + α_1^2 + α_2^2 )^{1/2}.

We take α_1, α_2 such that α_1δ − ‖a‖ = β and α_2β − ‖a‖(α_1 + 1) = β. This results in

(α_1, α_2) = ( (‖a‖ + β)/(δβ) ) (β, ‖a‖ + δ).

Note that δ ≤ ‖a‖, α_1 ≥ 1, α_2 ≥ 1, and thus (1 + α_1^2 + α_2^2)^{1/2} ≤ α_1 + α_2 ≤ (β + 2‖a‖)^2/(δβ). We conclude that

sup_{v∈H} k(u, v) / ‖v‖_H ≥ δβ^2/(β + 2‖a‖)^2 ( ‖φ_0‖_V + ‖φ_⊥‖_V + ‖λ‖_M ) ≥ δβ^2/(β + 2‖a‖)^2 ‖u‖_H   (2.56)

holds. Using a continuity argument the same result holds if λ = 0 or φ_⊥ = 0. Hence condition (2.53) holds with ε = δβ^2/(β + 2‖a‖)^2.
e). Take φ ∈ V_0, φ ≠ 0. From (2.52) and theorem 2.3.1 with H_1 = H_2 = V_0, k(u, v) = a(u, v) it follows that there exists a unique ξ ∈ V_0 such that a(ξ, ψ) = ⟨φ, ψ⟩_V for all ψ ∈ V_0 and ‖ξ‖_V ≤ δ^{−1}‖φ‖_{V_0′} = δ^{−1}‖φ‖_V. If we take ψ = φ we obtain

sup_{ψ∈V_0} a(ψ, φ) / ‖ψ‖_V ≥ a(ξ, φ) / ‖ξ‖_V = ‖φ‖_V^2 / ‖ξ‖_V ≥ δ‖φ‖_V.   (2.57)

We introduce the adjoint bilinear form (with u = (φ, λ), v = (ψ, µ)):

k̂(u, v) := k(v, u) = â(φ, ψ) + b(φ, µ) + b(ψ, λ),   â(φ, ψ) := a(ψ, φ).

From (2.57) it follows that

sup_{ψ∈V_0} â(φ, ψ) / ‖ψ‖_V ≥ δ‖φ‖_V   for all φ ∈ V_0.

Using this one can prove with exactly the same arguments as in part d) that

sup_{v∈H} k̂(u, v) / ‖v‖_H ≥ δβ^2/(β + 2‖a‖)^2 ‖u‖_H   for all u ∈ H

holds. Thus for every u ∈ H, u ≠ 0 there exists v ∈ H such that k̂(u, v) ≠ 0 and thus k(u, v) ≠ 0, too. This shows that (2.37) holds and completes the proof of a)–e). The final statement in the theorem follows from the final result in theorem 2.3.1 and the choice of ε in part d).

Remark 2.3.11 The final result in theorem 2.3.10 predicts that if we scale such that ‖a‖ = 1 then the stability constant c = (β + 2‖a‖)^2 δ^{−1} β^{−2} is large when the values of the constants δ and β are much smaller than one. We now give an example with ‖a‖ = 1 in which the stability deteriorates like δ^{−1}β^{−2} for δ ↓ 0, β ↓ 0. This shows that the behaviour c ∼ δ^{−1}β^{−2} for the stability constant is sharp. Take V = R^2 with the euclidean norm ‖·‖_2, M = R and let e_1 = (1 0)^T, e_2 = (0 1)^T be the standard basis vectors in R^2. For fixed β > 0, δ > 0 we introduce the bilinear forms

b(ψ, λ) = β e_1^T ψ λ,   ψ ∈ R^2, λ ∈ R,
a(φ, ψ) = φ^T A ψ,   A := [0 1; 1 δ],   φ, ψ ∈ R^2.

We then have V_0 = span(e_2) and a simple computation yields

sup_{ψ∈V} b(ψ, λ) / ‖ψ‖_2 = β|λ|   for all λ,
sup_{ψ∈V_0} a(φ, ψ) / ‖ψ‖_2 = δ‖φ‖_2   for all φ ∈ V_0.

With u = (φ, λ), v = (ψ, µ) ∈ R^3 we have

k(u, v) = a(φ, ψ) + b(φ, µ) + b(ψ, λ) = u^T C v,   C := [0 1 β; 1 δ 0; β 0 0].

We consider the functional f(v) = f_1(ψ) + f_2(µ) = µ = (0 0 1)v with norm ‖f‖_{H′} = 1. The unique solution u ∈ V × M = R^3 such that k(u, v) = f(v) for all v ∈ V × M is the unique solution of Cu = (0 0 1)^T. Hence

u = ( 1/β, −1/(δβ), 1/(δβ^2) )^T.

From this it follows that for all 0 < β ≤ 1, 0 < δ ≤ 1 we have

(1/(δβ^2)) ‖f‖_{H′} ≤ ‖u‖_H ≤ (√3/(δβ^2)) ‖f‖_{H′}. □
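This example is easy to reproduce numerically; the short check below is an addition to the text. It confirms that ‖u‖ δβ^2 stays between 1 and √3 as δ and β decrease, i.e., the stability constant really grows like δ^{−1}β^{−2}.

```python
import numpy as np

for delta, beta in [(1e-1, 1e-1), (1e-2, 1e-1), (1e-3, 1e-2)]:
    C = np.array([[0.0,  1.0,   beta],
                  [1.0,  delta, 0.0],
                  [beta, 0.0,   0.0]])
    u = np.linalg.solve(C, [0.0, 0.0, 1.0])
    # ||u|| * delta * beta^2 = sqrt(1 + beta^2 + delta^2 beta^2) in [1, sqrt(3)]
    print(delta, beta, np.linalg.norm(u) * delta * beta**2)
```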

Important sufficient conditions for well-posedness of the problem (2.43) are formulated in the following corollary.


Corollary 2.3.12 For arbitrary f_1 ∈ V′, f_2 ∈ M′ consider the variational problem (2.43): find (φ, λ) ∈ V × M such that

a(φ, ψ) + b(ψ, λ) = f_1(ψ)   for all ψ ∈ V   (2.58a)
b(φ, µ) = f_2(µ)   for all µ ∈ M.   (2.58b)

Assume that the bilinear forms a(·,·) and b(·,·) are continuous and satisfy the following two conditions:

∃ β > 0 :   sup_{ψ∈V} b(ψ, λ) / ‖ψ‖_V ≥ β‖λ‖_M   ∀ λ ∈ M   (inf-sup condition),   (2.59a)
∃ γ > 0 :   a(φ, φ) ≥ γ‖φ‖_V^2   ∀ φ ∈ V   (V-ellipticity).   (2.59b)

Then the conditions (2.47) and (2.52) (with δ = γ) are satisfied and the problem (2.58) has a unique solution (φ, λ). Moreover, the stability bound ‖(φ, λ)‖_H ≤ (β + 2‖a‖)^2/(γβ^2) ‖(f_1, f_2)‖_{H′} holds.
Proof. Apply theorem 2.3.10.

2.4 Minimization of functionals and saddle-point problems

In the variational problems treated in the previous section we did not assume symmetry of the bilinear forms. In this section we introduce certain symmetry properties and show that in that case equivalent alternative problem formulations can be derived.
First we reconsider the case of a continuous bilinear form k : H × H → R that is H-elliptic. This situation is considered in the Lax-Milgram lemma 2.3.5. In addition we now assume that the bilinear form is symmetric: k(u, v) = k(v, u) for all u, v.

Theorem 2.4.1 Let H be a Hilbert space and k : H × H → R a continuous H-elliptic symmetric bilinear form. For f ∈ H′ let u ∈ H be the unique solution of the variational problem

k(u, v) = f(v)   for all v ∈ H.   (2.60)

Then u is the unique minimizer of the functional

J(v) := (1/2) k(v, v) − f(v).   (2.61)

Proof. From the Lax-Milgram theorem it follows that the variational problem (2.60) has a unique solution u ∈ H. For arbitrary z ∈ H, z ≠ 0 we have, with ellipticity constant γ > 0:

J(u + z) = (1/2) k(u + z, u + z) − f(u + z)
= (1/2) k(u, u) − f(u) + k(u, z) − f(z) + (1/2) k(z, z)
= J(u) + (1/2) k(z, z) ≥ J(u) + (1/2) γ‖z‖_H^2 > J(u).

This proves the desired result.

We now reconsider the variational problem (2.43) and the result formulated in corollary 2.3.12.

Theorem 2.4.2 For arbitrary f_1 ∈ V′, f_2 ∈ M′ consider the variational problem (2.43): find (φ, λ) ∈ V × M such that

a(φ, ψ) + b(ψ, λ) = f_1(ψ)   for all ψ ∈ V   (2.62a)
b(φ, µ) = f_2(µ)   for all µ ∈ M.   (2.62b)

Assume that the bilinear forms a(·,·) and b(·,·) are continuous and satisfy the conditions (2.47) and (2.52). In addition we assume that a(·,·) is symmetric. Define the functional L : V × M → R by

L(ψ, µ) = (1/2) a(ψ, ψ) + b(ψ, µ) − f_1(ψ) − f_2(µ).

Then the unique solution (φ, λ) of (2.62) is also the unique element in V × M for which

L(φ, µ) ≤ L(φ, λ) ≤ L(ψ, λ)   for all ψ ∈ V, µ ∈ M   (2.63)

holds.
Proof. From theorem 2.3.10 it follows that the problem (2.62) has a unique solution. Take a fixed element (φ̄, λ̄) ∈ V × M. We will prove:

L(φ̄, µ) ≤ L(φ̄, λ̄) ∀ µ ∈ M ⇔ b(φ̄, ν) = f_2(ν) ∀ ν ∈ M,   (2.64)
L(φ̄, λ̄) ≤ L(ψ, λ̄) ∀ ψ ∈ V ⇔ a(φ̄, ψ) + b(ψ, λ̄) = f_1(ψ) ∀ ψ ∈ V.

From this it follows that (φ̄, λ̄) satisfies (2.63) if and only if (φ̄, λ̄) is a solution of (2.62). This then proves the statement of the theorem. We now prove (2.64). Note that

L(φ̄, µ) ≤ L(φ̄, λ̄) ∀ µ ∈ M
⇔ b(φ̄, µ) − f_2(µ) ≤ b(φ̄, λ̄) − f_2(λ̄) ∀ µ ∈ M
⇔ b(φ̄, ν) ≤ f_2(ν) ∀ ν ∈ M
⇔ b(φ̄, ν) = f_2(ν) ∀ ν ∈ M.

From this the first result in (2.64) follows. For the second result we first note

L(φ̄, λ̄) ≤ L(ψ, λ̄) ∀ ψ ∈ V
⇔ (1/2) a(φ̄, φ̄) + b(φ̄, λ̄) − f_1(φ̄) ≤ (1/2) a(ψ, ψ) + b(ψ, λ̄) − f_1(ψ) ∀ ψ ∈ V
⇔ −(1/2) a(ξ, ξ) ≤ a(φ̄, ξ) + b(ξ, λ̄) − f_1(ξ) ∀ ξ ∈ V.

Now note that ξ → a(ξ, ξ) is a quadratic term and ξ → a(φ̄, ξ) + b(ξ, λ̄) − f_1(ξ) is linear. A scaling argument now yields

L(φ̄, λ̄) ≤ L(ψ, λ̄) ∀ ψ ∈ V
⇔ 0 ≤ a(φ̄, ξ) + b(ξ, λ̄) − f_1(ξ) ∀ ξ ∈ V
⇔ 0 = a(φ̄, ξ) + b(ξ, λ̄) − f_1(ξ) ∀ ξ ∈ V,

and thus the second result in (2.64) holds.

Note that if a(·,·) is symmetric then (2.52a) implies (2.52b). Due to the property (2.63) the problem (2.62) with a symmetric bilinear form a(·,·) is called a saddle-point problem.
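A small numerical illustration of theorems 2.4.1 and 2.4.2 (added to the text; the matrices are generic stand-ins for the bilinear forms): with a symmetric positive definite A and a full-rank B, the solution of the discrete saddle-point system satisfies the two-sided inequality (2.63) for arbitrary competitors.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 3
R = rng.standard_normal((n, n)); A = R @ R.T + n * np.eye(n)   # a(.,.): SPD
B = rng.standard_normal((m, n))                                # b(.,.): full rank
f1, f2 = rng.standard_normal(n), rng.standard_normal(m)

K = np.block([[A, B.T], [B, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([f1, f2]))
phi, lam = sol[:n], sol[n:]

# L(psi, mu) = 0.5 a(psi,psi) + b(psi,mu) - f1(psi) - f2(mu)
L = lambda psi, mu: 0.5 * psi @ A @ psi + mu @ B @ psi - f1 @ psi - f2 @ mu
ok = all(L(phi, rng.standard_normal(m)) <= L(phi, lam) + 1e-9
         <= L(phi + rng.standard_normal(n), lam) + 1e-9 for _ in range(1000))
print(ok)   # True: (phi, lam) is a saddle point of L
```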

2.5 Variational formulation of scalar elliptic problems

2.5.1 Introduction

We reconsider the example of section 2.1, i.e., the two-point boundary value problem

−(au′)′ = 1   in (0, 1),   (2.65a)
u(0) = u(1) = 0,   (2.65b)

with a(x) > 0 for all x ∈ [0, 1]. Let V_1, k(·,·) and f(·) be as defined in section 2.1:

V_1 = { v ∈ C^2([0, 1]) | v(0) = v(1) = 0 },
k(u, v) = ∫_0^1 a(x)u′(x)v′(x) dx,   f(v) = ∫_0^1 v(x) dx.

The two-point boundary value problem has a corresponding variational formulation:

find u ∈ V_1 such that k(u, v) = f(v) for all v ∈ V_1.   (2.66)

One easily checks that u ∈ V_1 solves (2.65) iff u is a solution of (2.66). Hence, if the problem (2.65) has a solution u ∈ C^2([0, 1]) this must also be the unique (due to lemma 2.1.2) solution of (2.66). As in section 2.1 we now consider this problem with a discontinuous piecewise constant function a. Then the classical formulation (2.65) is not well-defined, whereas the variational problem does make sense. However, in section 2.1 it is shown that the problem (2.66) has no solution (the space V_1 is "too small"). Since in the bilinear form k(·,·) only first derivatives occur, the larger space

V_2 := { v ∈ C^1([0, 1]) | v(0) = v(1) = 0 }

seems to be more appropriate. This leads to the weaker variational formulation:

find u ∈ V_2 such that k(u, v) = f(v) for all v ∈ V_2.   (2.67)

However, it is shown in section 2.1 that the problem (2.67) still has no solution. The key step is to take the completion of the space V_2 (or, equivalently, of V_1):

H_0^1((0, 1)) = closure of C_0^∞([0, 1]) w.r.t. ‖·‖_1 = closure of V_1 w.r.t. ‖·‖_1 = closure of V_2 w.r.t. ‖·‖_1.

Thus we consider

find u ∈ H_0^1((0, 1)) such that k(u, v) = f(v) for all v ∈ V_2.

Both the bilinear form k(·,·) and f(·) are continuous on H_0^1((0, 1)) and thus this problem is equivalent to the variational problem

find u ∈ H_0^1((0, 1)) such that k(u, v) = f(v) for all v ∈ H_0^1((0, 1)).   (2.68)

From the Lax-Milgram lemma 2.3.5 it follows that there exists a unique solution (which is usually called the weak solution) of the variational problem (2.68). For this existence and uniqueness result it is essential that we used the Sobolev space H_0^1((0, 1)), which is a Hilbert space. In section 2.1 we considered a space V_3 with V_2 ⊂ V_3 ⊂ H_0^1((0, 1)) and showed that the function u given in (2.11) solves the variational problem in the space V_3. Since the closure of V_3 w.r.t. ‖·‖_1 equals H_0^1((0, 1)), this function u is also the unique solution of (2.68). We summarize the fundamental steps discussed in this section in the following diagram:

(2.65) → [weaker formulation, due to reduction of order of differentiation] → (2.67) → [weaker formulation, due to completion of space] → (2.68).

A very similar approach can be applied to a large class of elliptic boundary value problems, as will be shown in the following sections.

Remark 2.5.1 For the weak formulation in (2.68) to have a unique solution it is important that the bilinear form is elliptic. The following example illustrates this. Consider (2.65) with a(x) = √x. Then the solution of this problem is given by u(x) = (2/3)√x(1 − x) (as can be checked by substitution). Note that u ∉ V_2. Since u ∈ C^2((0, 1)) ∩ C([0, 1]), this is the classical solution of (2.65), cf. section 1.2. However, due to ∫_0^1 u′(x)^2 dx = ∞ it follows that u ∉ H^1((0, 1)) and thus the weak formulation as in (2.68) does not have a solution. □
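To make the discussion concrete, the following Python sketch (an addition; a systematic treatment of the finite element method follows in chapter 3) computes a Galerkin approximation of the weak solution of (2.68) with piecewise linear finite elements for a discontinuous coefficient, here with the assumed example values a = 1 on (0, 1/2) and a = 10 on (1/2, 1). The computed solution has a continuous flux au′ but a kink in u at the interface, exactly the behaviour that rules out a classical C^2-solution.

```python
import numpy as np

n = 200                                    # elements, grid aligned with the jump
x = np.linspace(0.0, 1.0, n + 1)
h = np.diff(x)
a = np.where(0.5 * (x[:-1] + x[1:]) < 0.5, 1.0, 10.0)   # piecewise constant a

A = np.zeros((n + 1, n + 1)); rhs = np.zeros(n + 1)
for e in range(n):                         # assemble k(u, v) and f(v)
    A[e:e + 2, e:e + 2] += a[e] / h[e] * np.array([[1.0, -1.0], [-1.0, 1.0]])
    rhs[e:e + 2] += 0.5 * h[e]             # f = 1 integrated against hat functions

u = np.zeros(n + 1)
u[1:-1] = np.linalg.solve(A[1:-1, 1:-1], rhs[1:-1])     # u(0) = u(1) = 0

flux = a * np.diff(u) / h                  # elementwise flux a u'
print(flux[n // 2 - 1], flux[n // 2])      # nearly equal: the flux is continuous
print(np.diff(u)[n // 2 - 1] / h[0], np.diff(u)[n // 2] / h[0])   # u' jumps
```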

2.5.2 Elliptic BVP with homogeneous Dirichlet boundary conditions

In this section we derive and analyze variational formulations for a class of scalar elliptic boundary value problems. We consider a linear second order differential operator of the form

Lu = − ∑_{i,j=1}^n ∂/∂x_i ( a_{ij} ∂u/∂x_j ) + ∑_{i=1}^n b_i ∂u/∂x_i + cu.   (2.69)

Note that this form differs from the one in (1.2). If the coefficients a_{ij} are differentiable, then due to

∑_{i,j=1}^n ∂/∂x_i ( a_{ij} ∂u/∂x_j ) = ∑_{i,j=1}^n a_{ij} ∂^2u/(∂x_i ∂x_j) + ∑_{i,j=1}^n (∂a_{ij}/∂x_i) ∂u/∂x_j

the operator L can be written in the form as in (1.2) with the same c as in (2.69) but with a_{ij} and b_i in (1.2) replaced by −a_{ij} and b_i − ∑_{j=1}^n ∂a_{ji}/∂x_j, respectively.

As in section 1.2.1 the coefficients that determine the principal part of the operator are collected in a matrix A(x) = (a_{ij}(x))_{1≤i,j≤n}. We assume that the problem is uniformly elliptic:

∃ α_0 > 0 :   ξ^T A(x) ξ ≥ α_0 ξ^T ξ   for all ξ ∈ R^n, x ∈ Ω.   (2.70)

We use the notation b(x) = (b_1(x), . . . , b_n(x))^T. In this section we only discuss the case with homogeneous Dirichlet boundary conditions, i.e., we consider the following elliptic boundary value problem:

Lu = f   in Ω   (2.71a)
u = 0   on ∂Ω.   (2.71b)

We now derive a (weaker) variational formulation of this problem along the same lines as in the previous section. For this derivation we assume that the equation (2.71a) has a solution u in the space V := { u ∈ C^2(Ω) | u = 0 on ∂Ω }. Multiplication of (2.71a) with v ∈ C_0^∞(Ω) and using partial integration implies that u also satisfies

∫_Ω ( ∇u^T A ∇v + (b · ∇u) v + cuv ) dx = ∫_Ω f v dx.

Based on this, we introduce a bilinear form and a linear functional:

k(u, v) = ∫_Ω ( ∇u^T A ∇v + (b · ∇u) v + cuv ) dx,   f(v) = ∫_Ω f v dx.   (2.72)

We conclude that a solution u ∈ V also solves the following variational problem:

find u ∈ V such that k(u, v) = f(v) for all v ∈ C_0^∞(Ω).

Note that in the bilinear form no higher than first derivatives occur. This motivates the use of spaces obtained by completion w.r.t. the norm ‖·‖_1, which leads to the Sobolev space H_0^1(Ω) = closure of C_0^∞(Ω) w.r.t. ‖·‖_1. One may check that C_0^∞(Ω) ⊂ V ⊂ H_0^1(Ω) and thus the closure of V w.r.t. ‖·‖_1 equals H_0^1(Ω). We thus obtain the following.

The variational formulation of (2.71) is given by:

find u ∈ H_0^1(Ω) such that k(u, v) = f(v) for all v ∈ H_0^1(Ω),   (2.73)

with k(·,·) and f(·) as in (2.72).
It is easy to verify that if the problem (2.73) has a smooth solution u ∈ C^2(Ω̄) and if the coefficients are sufficiently smooth then u is also a solution of (2.71). In this sense this problem is the correct weak formulation.

Remark 2.5.2 There is a subtle reason why in the derivation of the weak formulation we used the test space C_0^∞(Ω) and not C^∞(Ω̄). The reason for this choice is closely related to the type of boundary condition. In the situation here we have prescribed boundary values which are automatically fulfilled in the space V and also (after completion) in the closure of V w.r.t. ‖·‖_1 = H_0^1(Ω). Therefore, the differential equation should be "tested" in the form ∫_Ω (Lu − f) v dx = 0 only in the interior of Ω, i.e., with functions v that are zero on the boundary. Hence we take v ∈ C_0^∞(Ω). In problems with other types of boundary conditions it may be necessary to take test functions v ∈ C^∞(Ω̄). This will be further explained in section 2.5.3. □

We now analyze existence and uniqueness of the variational problem (2.73) by means of the Lax-Milgram lemma 2.3.5. We use the following mild smoothness assumptions concerning the coefficients in the differential operator:

a_{ij} ∈ L^∞(Ω) ∀ i, j,   b_i ∈ H^1(Ω) ∩ L^∞(Ω) ∀ i,   c ∈ L^∞(Ω).   (2.74)

Theorem 2.5.3 Let (2.70) and (2.74) hold and assume that the condition

−(1/2) div b + c ≥ 0   a.e. in Ω

is fulfilled. Then for every f ∈ L^2(Ω) the variational problem (2.73) with f(v) := ∫_Ω f v dx has a unique solution u. Moreover, the inequality

it follows that f (·) defines a bounded linear functional on H01 (Ω). We now check the boundedness of the bilinear form k(·, ·) for u, v ∈ H01 (Ω): Z n Z n Z X X ∂u ∂u ∂v dx + v dx + cuv dx bi |k(u, v)| ≤ aij ∂xj ∂xi Ω Ω ∂xi Ω ≤

i=1

i,j=1 n X

i,j=1

n

kaij kL∞ k

+ kck

L∞



n X

i,j=1

X ∂u ∂v ∂u kbi kL∞ k kL2 k kL2 + k 2 kvkL2 ∂xj ∂xi ∂xi L

kukL2 kvkL2

kaij kL∞ kuk1 kvk1 +

≤ Ckuk1 kvk1 .

i=1

n X i=1

kbi kL∞ kuk1 kvkL2 + kckL∞ kukL2 kvkL2

Note that C0∞ (Ω) is dense in H01 (Ω), the bilinear form is continuous and k · k1 and | · |1 are equivalent norms. Hence, for the ellipticity, k(u, u) ≥ γkuk21 (with γ > 0), to hold it suffices to 48

show k(u, u) ≥ γ|u|21 for all u ∈ C0∞ (Ω). Take u ∈ C0∞ (Ω). From the uniform ellipticity assumption (2.70) it follows (with ξ = ∇u) that Z X n n Z X ∂u ∂u ∂u 2 aij dx ≥ α0 dx = α0 |u|21 , ∂x ∂x ∂x j i j Ω Ω j=1

i,j=1

with α0 > 0 holds. Using partial integration we obtain Z n Z n Z X ∂u ∂(u2 ) 1X 1 bi bi div b u2 dx. u dx = dx = − 2 ∂xi 2 Ω Ω ∂xi Ω i=1

i=1

Collecting these results we get k(u, u) ≥ α0 |u|21 +

Z



 1 − div b + c u2 dx. 2

The desired result follows from the assumption − 12 div b + c ≥ 0 (a.e.). We now formulate two important special cases. Corollary 2.5.4 For every f ∈ L2 (Ω) the Poisson equation (in variational formulation) ( find u ∈ H01 (Ω) such that R R 1 Ω ∇u · ∇v dx = Ω f v dx for all v ∈ H0 (Ω)

has a unique solution. Moreover, kuk1 ≤ Ckf kL2 holds with a constant C independent of f . Corollary 2.5.5 For f ∈ L2 (Ω), ε > 0 and bi ∈ H 1 (Ω) ∩ L∞ (Ω), i = 1, . . . , n, consider the convection-diffusion problem (in variational formulation) ( find u ∈ H01 (Ω) such that R R R ε Ω ∇u · ∇v dx + Ω b · ∇u v dx = Ω f v dx for all v ∈ H01 (Ω).

If div b ≤ 0 holds (a.e.), then this problem has a unique solution, and kuk1 ≤ Ckf kL2 holds with a constant C independent of f . We note that the condition div b ≤ 0 holds, for example, if all bi , i = 1, . . . , n, are constants. In the singular perturbation case it may happen that the stability constant deteriorates: C = C(ε) ↑ ∞ if ε ↓ 0.

2.5.3

Other boundary conditions

In this section we consider variational formulations of elliptic problems with boundary conditions that are not of homogeneous Dirichlet type. For simplicity we only discuss the case L = −∆. The corresponding results for general second order differential operators (L as in (2.69)) are very similar. Inhomogeneous Dirichlet boundary conditions. First we treat the Poisson equation with Dirichlet boundary data φ that are not identically zero: −∆u = f in Ω

u = φ on ∂Ω. 49

Assume that this problem has a solution in the space V = { u ∈ C 2 (Ω) | u = φ on ∂Ω }. After completion (w.r.t. k · k1 ) this will yield the space { u ∈ H 1 (Ω) | γ(u) = φ } where γ is the trace operator. As in the previous section the boundary conditions are automatically fulfilled in this space and thus we take test functions v ∈ C0∞ (Ω) (cf. remark 2.5.2). Multiplication of the differential equation with such a function v, partial integration and using completion with respect to k · k1 results in the following variational problem: ( find u ∈ { u ∈ H 1 (Ω) | u|∂Ω = φ } such that R R (2.75) 1 Ω ∇u · ∇v dx = Ω f v dx for all v ∈ H0 (Ω). R ˆ) · ∇v dx = 0 for all If u and u ˆ are solutions of (2.75) then u − u ˆ ∈ H01 (Ω) and Ω ∇(u − u 1 ˆ it follows that u = u ˆ and thus we have at most one solution. To v ∈ H0 (Ω). Taking v = u − u prove existence we introduce a transformed problem. For the identity u|∂Ω = φ (⇔ γ(u) = φ) to make sense, we assume that the boundary data φ are such that φ ∈ range(γ). Then there exists u0 ∈ H 1 (Ω) such that u0|∂Ω = φ.

Lemma 2.5.6 Assume f ∈ L2 (Ω) and φ ∈ range(γ). Take u0 ∈ H 1 (Ω) such that γ(u0 ) = φ. Then u solves the variational problem (2.75) iff w = u − u0 solves the following: ( find w ∈ H01 (Ω) such that R R R (2.76) 1 ∇w · ∇v dx = f v dx − Ω Ω Ω ∇u0 · ∇v dx for all v ∈ H0 (Ω). Proof. Trivial.

R R Note that f (v) = Ω f v dx − Ω ∇u0 · ∇v dx defines a continuous linear functional on H01 (Ω). The Lax-Milgram lemma yields existence of a solution of (2.76) and thus of (2.75). Natural boundary conditions. We now consider a problem in which also (normal) derivatives of u occur in the boundary condition: −∆u = f

∂u +βu =φ ∂n

in Ω

(2.77a)

on ∂Ω

(2.77b)

with β ∈ R a constant and ∂u/∂n = (n · ∇)u = ∇u · n the normal derivative at the boundary. For this problem the following difficulty arises related to the (normal) derivative in the boundary condition. For u ∈ H^1(Ω) the weak derivative D^α u, |α| = 1, is an element of L^2(Ω). It can be shown that it is not possible to define unambiguously v|_{∂Ω} for v ∈ L^2(Ω). In other words, there is no trace operator which in a satisfactory way defines ∂u/∂n|_{∂Ω} for u ∈ H^1(Ω). This is the reason why for the solution u we search in the space H^1(Ω), which does not take the boundary conditions into account. Due to this, for the derivation of an appropriate weak formulation, we multiply (2.77a) with test functions v ∈ C^∞(Ω̄) (and not C_0^∞(Ω), cf. remark 2.5.2). This results in

∫_Ω ∇u · ∇v dx − ∫_{∂Ω} (∇u · n) v ds = ∫_Ω f v dx   for all v ∈ C^∞(Ω̄).

We now use the boundary condition (2.77b) and then obtain

∫_Ω ∇u · ∇v dx + β ∫_{∂Ω} uv ds = ∫_Ω f v dx + ∫_{∂Ω} φv ds   for all v ∈ C^∞(Ω̄).

This results in the following variational problem:

find u ∈ H^1(Ω) such that ∫_Ω ∇u · ∇v dx + β ∫_{∂Ω} uv ds = ∫_Ω f v dx + ∫_{∂Ω} φv ds ∀ v ∈ H^1(Ω).   (2.78)

It is easy to verify that if the problem (2.78) has a smooth solution u ∈ C^2(Ω̄) and if φ is sufficiently smooth then u is also a solution of (2.77). In this sense this problem is the correct weak formulation. Note that now the space H^1(Ω) is used and not H_0^1(Ω). The space H^1(Ω) does not contain any information concerning the boundary condition (2.77b). The boundary data are part of the bilinear form used in (2.78). In the case of Dirichlet boundary conditions, as in (2.73) and (2.75), this is the other way around: The solution space is such that the boundary conditions are automatically fulfilled and the boundary data do not influence the bilinear form. The latter class of boundary conditions are called essential boundary conditions (these are a-priori fulfilled by the choice of the solution space). Boundary conditions as in (2.77b) are called natural boundary conditions (these are "automatically" fulfilled if the variational problem is solved).
We now analyze existence and uniqueness of the variational formulation. For this we need two variants of the Poincare-Friedrichs inequality:

Lemma 2.5.7 There exist constants C_1 and C_2 such that

‖u‖_1^2 ≤ C_1 ( |u|_1^2 + ∫_{∂Ω} u^2 ds )   for all u ∈ H^1(Ω),   (2.79a)
‖u‖_1^2 ≤ C_2 ( |u|_1^2 + | ∫_Ω u dx |^2 )   for all u ∈ H^1(Ω).   (2.79b)



[Note that for u ∈ H_0^1(Ω) the first result reduces to the Poincare-Friedrichs inequality.]
Proof. For i = 1, 2 we define q_i : H^1(Ω) → R by

q_1(u) = ∫_{∂Ω} u^2 ds,   q_2(u) = | ∫_Ω u dx |^2.

Then q_i is continuous on H^1(Ω) (for i = 1 this follows from the continuity of the trace operator), q_i(αu) = α^2 q_i(u) for all α ∈ R, and if u is equal to a constant, say c, then q_i(u) = q_i(c) = 0 iff c = 0.
Assume that there does not exist a constant C such that ‖u‖_1^2 ≤ C ( |u|_1^2 + q_i(u) ) for all u ∈ H^1(Ω). Then there exists a sequence (u_k)_{k≥1} in H^1(Ω) such that

1 = ‖u_k‖_1^2 ≥ k ( |u_k|_1^2 + q_i(u_k) )   for all k.   (2.80)

This sequence is bounded in H^1(Ω). Since the embedding H^1(Ω) ֒→ L^2(Ω) is compact, there exists a subsequence (u_{k(ℓ)})_{ℓ≥1} that converges in L^2(Ω):

lim_{ℓ→∞} u_{k(ℓ)} = u   in L^2(Ω).

From (2.80) it follows that lim_{ℓ→∞} |u_{k(ℓ)}|_1 = 0 and thus

lim_{ℓ→∞} D^α u_{k(ℓ)} = 0,   if |α| = 1, in L^2(Ω).

From this we conclude that

u ∈ H^1(Ω),   lim_{ℓ→∞} u_{k(ℓ)} = u in H^1(Ω),   D^α u = 0 (a.e.) if |α| = 1.

Hence, u must be constant (a.e.) on Ω, say u = c. From (2.80) we obtain

lim_{ℓ→∞} q_i(u_{k(ℓ)}) = 0.

Using the continuity of q_i it follows that q_i(c) = q_i(u) = 0 and thus c = 0. This yields a contradiction:

1 = lim_{ℓ→∞} ‖u_{k(ℓ)}‖_1^2 = ‖u‖_1^2 = ‖c‖_1^2 = 0.

Thus the results are proved.

Using this lemma we can prove the following result for the variational problem in (2.78):

Theorem 2.5.8 Consider the variational problem (2.78) with β > 0, f ∈ L^2(Ω) and φ ∈ L^2(∂Ω). This problem has a unique solution u and the inequality

‖u‖_1 ≤ C ( ‖f‖_{L^2} + ‖φ‖_{L^2(∂Ω)} )

holds with a constant C independent of f and φ.

Proof. For v ∈ H^1(Ω) define the linear functional

g : v → ∫_Ω f v dx + ∫_{∂Ω} φv ds.

Using the continuity of the trace operator we obtain

|g(v)| ≤ ‖f‖_{L^2}‖v‖_{L^2} + ‖φ‖_{L^2(∂Ω)}‖γ(v)‖_{L^2(∂Ω)} ≤ c ( ‖f‖_{L^2} + ‖φ‖_{L^2(∂Ω)} ) ‖v‖_1

and thus g ∈ (H^1(Ω))′. Define k(u, v) = ∫_Ω ∇u · ∇v dx + β ∫_{∂Ω} uv ds. The continuity of this bilinear form follows from

|k(u, v)| ≤ |u|_1|v|_1 + β‖γ(u)‖_{L^2(∂Ω)}‖γ(v)‖_{L^2(∂Ω)} ≤ |u|_1|v|_1 + C̃β‖u‖_1‖v‖_1 ≤ C‖u‖_1‖v‖_1.

The ellipticity can be concluded from the result in (2.79a):

k(u, u) = |u|_1^2 + β ∫_{∂Ω} u^2 ds ≥ C‖u‖_1^2   for all u ∈ H^1(Ω)

with C > 0. Application of the Lax-Milgram lemma completes the proof.

We now analyze the problem with pure Neumann boundary conditions, i.e., β = 0 in (2.77b). Clearly, for this problem we can not have uniqueness: if u is a solution then for any constant c the function u + c is also a solution. Moreover, for existence of a solution the data f and φ must satisfy a certain condition. Assume that u ∈ H^2(Ω) is a solution of (2.77) for the case β = 0, then

∫_Ω f dx = − ∫_Ω ∆u dx = ∫_Ω ∇u · ∇1 dx − ∫_{∂Ω} ∇u · n ds = − ∫_{∂Ω} φ ds

must hold. This motivates the introduction of the compatibility relation:

∫_Ω f dx + ∫_{∂Ω} φ ds = 0.   (2.81)

To obtain uniqueness, for the solution space we take a subspace of H^1(Ω) consisting of functions u with ⟨u, 1⟩_{L^2} = ⟨u, 1⟩_1 = 0:

H_*^1(Ω) := { u ∈ H^1(Ω) | ∫_Ω u dx = 0 }.

Since this is a closed subspace of H^1(Ω) it is a Hilbert space. Instead of (2.78) we now consider:

find u ∈ H_*^1(Ω) such that ∫_Ω ∇u · ∇v dx = ∫_Ω f v dx + ∫_{∂Ω} φv ds for all v ∈ H^1(Ω).   (2.82)

For this problem we have existence and uniqueness:

Theorem 2.5.9 Consider the variational problem (2.82) with f ∈ L^2(Ω), φ ∈ L^2(∂Ω) and assume that the compatibility relation (2.81) holds. Then this problem has a unique solution u and the inequality

‖u‖_1 ≤ C ( ‖f‖_{L^2} + ‖φ‖_{L^2(∂Ω)} )

holds with a constant C independent of f and φ.

Proof. For v ∈ H^1(Ω) define the linear functional

g : v → ∫_Ω f v dx + ∫_{∂Ω} φv ds.

The continuity of this functional is shown in the proof of theorem 2.5.8. Define k(u, v) = ∫_Ω ∇u · ∇v dx. The continuity of this bilinear form is trivial. For u ∈ H_*^1(Ω) we have ∫_Ω u dx = 0 and thus, using the result in (2.79b), we get

k(u, u) = |u|_1^2 = |u|_1^2 + | ∫_Ω u dx |^2 ≥ C‖u‖_1^2   for all u ∈ H_*^1(Ω)

with a constant C > 0. Hence, the bilinear form is H_*^1(Ω)-elliptic. From the Lax-Milgram lemma it follows that there exists a unique solution u ∈ H_*^1(Ω) such that k(u, v) = g(v) for all v ∈ H_*^1(Ω). Note that k(u, 1) = 0 and, due to the compatibility relation, g(1) = 0. It follows that for the solution u we have k(u, v) = g(v) for all v ∈ H^1(Ω).
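The construction in this proof translates directly into a computation. The sketch below (an added illustration, assuming the 1D example f(x) = x − 1/2 and φ = 0 on Ω = (0, 1), which satisfy (2.81)) imposes the zero-mean constraint of H_*^1(Ω) through a Lagrange multiplier; the resulting bordered system is regular although the Neumann stiffness matrix itself is singular.

```python
import numpy as np

n = 100
x = np.linspace(0.0, 1.0, n + 1); h = 1.0 / n
f = lambda t: t - 0.5                          # compatible data: integral is 0

A = np.zeros((n + 1, n + 1)); rhs = np.zeros(n + 1)
for e in range(n):
    A[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    rhs[e:e + 2] += 0.5 * h * f(0.5 * (x[e] + x[e + 1]))   # midpoint rule

c = np.full(n + 1, h); c[0] = c[-1] = 0.5 * h  # c_i = integral of hat function i
K = np.block([[A, c[:, None]], [c[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([rhs, [0.0]]))   # A singular, K regular
u, mult = sol[:-1], sol[-1]
print(abs(c @ u) < 1e-10, abs(mult) < 1e-10)   # zero mean; multiplier vanishes
```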

Remark 2.5.10 For the case β < 0 it may happen that the problem (2.77) has a nontrivial kernel (and thus we do not have uniqueness). Moreover, in general this kernel is not as simple as for the case β = 0. As an example, consider the problem −u′′ (x) = 0 for x ∈ (0, 1)

−u′ (0) − 2u(0) = 0

u′ (1) − 2u(1) = 0.

All functions in span(u_*) with u_*(x) = 2x − 1 are solutions of this problem. □



Mixed boundary conditions. It may happen that in a boundary value problem both natural and essential boundary values occur. We discuss a typical example. Let Γ_e and Γ_n be parts of the boundary ∂Ω such that meas_{n−1}(Γ_e) > 0, meas_{n−1}(Γ_n) > 0, Γ_e ∩ Γ_n = ∅, Γ_e ∪ Γ_n = ∂Ω. Now consider the following boundary value problem:

−∆u = f   in Ω   (2.83a)
u = 0   on Γ_e   (2.83b)
∂u/∂n = φ   on Γ_n.   (2.83c)

The Dirichlet (= essential) boundary conditions are fulfilled by the choice of the solution space:

H_{Γ_e}^1(Ω) := { u ∈ H^1(Ω) | γ(u) = 0 on Γ_e }.

The Neumann (= natural) boundary conditions will be part of the linear functional used in the variational problem. A similar derivation as for the previous examples results in the following variational problem:

find u ∈ H_{Γ_e}^1(Ω) such that ∫_Ω ∇u · ∇v dx = ∫_Ω f v dx + ∫_{Γ_n} φv ds for all v ∈ H_{Γ_e}^1(Ω).   (2.84)

¯ then u is a solution of One easily verifies that if this problem has a smooth solution u ∈ C 2 (Ω) the problem (2.83). Remark 2.5.11 For the proof of existence and uniqueness we need the following PoincareFriedrichs inequality in the space HΓ1e (Ω): ∃C>0:

kukL2 ≤ C|u|1

for all u ∈ HΓ1e (Ω).

For a proof of this result we refer to the literature, e.g. [3], Remark 5.16.

(2.85) 

Theorem 2.5.12 The variational problem (2.84) with f ∈ L2 (Ω) and φ ∈ L2 (∂Ω) has a unique solution u and the inequality  kuk1 ≤ C kf kL2 + kφkL2 (∂Ω)

holds with a constant C independent of f and φ holds. Proof. For v ∈ HΓ1e (Ω) define the linear functional: g:v→

Z

f v dx +

Z

φv ds.

∂Ω



The continuity of this linearRfunctional can be shown as in the proof of theorem 2.5.8. Define the bilinear form k(u, v) = Ω ∇u · ∇v dx. The continuity of k(·, ·) is trivial. From (2.85) it follows that k(u, u) ≥ Ckuk21 for all u ∈ HΓ1e (Ω) and with C > 0. Hence the bilinear form is HΓ1e (Ω)-elliptic. Application of the Lax-Milgram lemma completes the proof.

54

2.5.4 Regularity results

In this section we present a few results from the literature on global smoothness of the solution. First the notion of H^m-regularity is introduced. For ease of presentation we restrict ourselves to elliptic boundary value problems with homogeneous Dirichlet boundary conditions.

Definition 2.5.13 (H^m-regularity.) Let k : H_0^1(Ω) × H_0^1(Ω) → R be a continuous elliptic bilinear form and f ∈ H^{−1}(Ω) = (H_0^1(Ω))′. For the unique solution u ∈ H_0^1(Ω) of the problem

k(u, v) = f(v)   for all v ∈ H_0^1(Ω)

the inequality ‖u‖_1 ≤ C‖f‖_{−1} holds with a constant C independent of f. This property is called H^1-regularity of the variational problem. If for some m > 1 and f ∈ H^{m−2}(Ω) the unique solution u of

k(u, v) = ∫_Ω f v dx   for all v ∈ H_0^1(Ω)

satisfies ‖u‖_m ≤ C‖f‖_{m−2} with a constant C independent of f then the variational problem is said to be H^m-regular. □



The result in the next theorem is an analogue of the result in theorem 1.2.7, but now the smoothness is measured using Sobolev norms instead of Hölder norms.

Theorem 2.5.14 ([39], Theorem 8.13) Assume that u ∈ H_0^1(Ω) is a solution of (2.73) (for existence of u: see theorem 2.5.3). For some m ∈ N assume that ∂Ω ∈ C^{m+2} and:

for m = 0 : f ∈ L^2(Ω), a_{ij} ∈ C^{0,1}(Ω̄) ∀ i, j, b_i ∈ L^∞(Ω) ∀ i, c ∈ L^∞(Ω),
for m ≥ 1 : f ∈ H^m(Ω), a_{ij} ∈ C^{m,1}(Ω̄) ∀ i, j, b_i ∈ C^{m−1,1}(Ω̄) ∀ i, c ∈ C^{m−1,1}(Ω̄).

Then u ∈ H^{m+2}(Ω) holds and

‖u‖_{m+2} ≤ C ( ‖u‖_{L^2} + ‖f‖_m )   (2.86)

with a constant C independent of f and u.

Corollary 2.5.15 Assume that the assumptions of theorem 2.5.3 and of theorem 2.5.14 are fulfilled. Then the variational problem (2.73) is H^{m+2}-regular.
Proof. Due to theorem 2.5.3 the problem has a unique solution u and ‖u‖_1 ≤ C‖f‖_{L^2} holds. Now combine this with the result in (2.86):

‖u‖_{m+2} ≤ C ( ‖u‖_{L^2} + ‖f‖_m ) ≤ C_1 ( ‖f‖_{L^2} + ‖f‖_m ) ≤ 2C_1‖f‖_m

with a constant C_1 independent of f.

Note that in these regularity results there is a severe condition on the smoothness of the boundary. For example, for m = 0, i.e., H^2-regularity, we have the condition ∂Ω ∈ C^2. In practice, this assumption often does not hold. For convex domains one can prove H^2-regularity without assuming such a strong smoothness condition on ∂Ω. The following result is due to [53]:

Theorem 2.5.16 Let Ω be convex. Suppose that the assumptions of theorem 2.5.3 hold and in addition a_{ij} ∈ C^{0,1}(Ω̄) for all i, j. Then the unique solution of (2.73) satisfies

‖u‖_2 ≤ C‖f‖_{L^2}

with a constant C independent of f, i.e., the variational problem (2.73) is H^2-regular.

We note that very similar regularity results hold for elliptic problems with natural boundary conditions (as in (2.77b)). In problems with mixed boundary conditions, however, one in general has less regularity.

2.5.5 Riesz-Schauder theory

In this section we show that for the variational formulation of the convection-diffusion problem results on existence and uniqueness can be derived that avoid the condition −(1/2) div b + c ≥ 0 as used in theorem 2.5.3. The analysis is based on the so-called Riesz-Schauder theory and uses results on compact embeddings of Sobolev spaces ([47], Thm. 7.2.14). [In preparation.]

2.6 Weak formulation of the Stokes problem

We recall the classical formulation of the Stokes problem with homogeneous Dirichlet boundary conditions:

−∆u + ∇p = f   in Ω,   (2.87a)
div u = 0   in Ω   (2.87b)
u = 0   on ∂Ω.   (2.87c)

In this section we derive a variational formulation of this problem and prove existence and uniqueness of a weak solution. From the formulation of the Stokes problem it is clear that the pressure p is determined only up to a constant. In order to eliminate this degree of freedom we introduce the additional requirement

⟨p, 1⟩_{L^2} = ∫_Ω p dx = 0.

Assume that the Stokes problem has a solution u ∈ V := { u ∈ C^2(Ω̄)^n | u = 0 on ∂Ω }, p ∈ M := { p ∈ C^1(Ω̄) | ∫_Ω p dx = 0 }. Then (u, p) also solves the following variational problem: find (u, p) ∈ V × M such that

∫_Ω ∇u · ∇v dx − ∫_Ω p div v dx = ∫_Ω f · v dx   ∀ v ∈ C_0^∞(Ω)^n   (2.88)
∫_Ω q div u dx = 0   ∀ q ∈ M

with ∫_Ω ∇u · ∇v dx = ⟨∇u, ∇v⟩_{L^2} := ∑_{i=1}^n ⟨∇u_i, ∇v_i⟩_{L^2}. We introduce the bilinear forms and the linear functional

a(u, v) := ∫_Ω ∇u · ∇v dx   (2.89a)
b(v, q) := − ∫_Ω q div v dx   (2.89b)
f(v) := ∫_Ω f · v dx.   (2.89c)

Note that no derivatives of the pressure occur in (2.88). To obtain a weak formulation in appropriate Hilbert spaces we apply the completion principle. For the velocity we use completion w.r.t. ‖·‖_1 and for the pressure we use the norm ‖·‖_{L^2}:

V = closure of C_0^∞(Ω)^n w.r.t. ‖·‖_1 = H_0^1(Ω)^n,   M = L_0^2(Ω) := { p ∈ L^2(Ω) | ∫_Ω p dx = 0 }.

This results in the following weak formulation of the Stokes problem, with V := H_0^1(Ω)^n, M := L_0^2(Ω): Find (u, p) ∈ V × M such that

a(u, v) + b(v, p) = f(v)   for all v ∈ V   (2.90a)
b(u, q) = 0   for all q ∈ M.   (2.90b)

for all v ∈ C0∞ (Ω)n

R and thus −∆u + ∇p = f in Ω. Note that by Green’s formula we have that Ω div u dx = 0 for R u ∈ H01 (Ω)n . Hence in (2.90b) we can take q = div u, which yields Ω (div u)2 dx = 0 and thus div u = 0 in Ω. To show the well-posedness of the variational formulation of the Stokes problem we apply corollary 2.3.12. For this we need the following inf-sup condition, which will be proved in section 2.6.1 R Ω q div v dx (2.91) ≥ β kqkL2 ∀ q ∈ L20 (Ω). ∃β>0 : sup kvk 1 n 1 v∈H (Ω) 0

Using this property we obtain a fundamental result on well-posedness of the variational Stokes problem:

57

Theorem 2.6.2 For every f ∈ L^2(Ω)^n the Stokes problem (2.90) has a unique solution (u, p) ∈ V × M. Moreover, the inequality

‖u‖_1 + ‖p‖_{L^2} ≤ C‖f‖_{L^2}

holds with a constant C independent of f.
Proof. We can apply corollary 2.3.12 with V, M, a(·,·), b(·,·) as defined above and f_1(v) = ∫_Ω f · v dx, f_2 = 0. The continuity of b(·,·) on V × M follows from

|b(v, q)| = | ∫_Ω q div v dx | ≤ ‖q‖_{L^2}‖div v‖_{L^2} ≤ √n ‖q‖_{L^2}‖v‖_1   for (v, q) ∈ V × M.

The inf-sup condition is given in (2.91). Note that the minus sign in (2.89b) does not play a role for the inf-sup condition in (2.91). The continuity of a(·,·) on V × V is clear. The V-ellipticity follows from the Poincare-Friedrichs inequality:

a(u, u) = ∑_{i=1}^n |u_i|_1^2 ≥ c ∑_{i=1}^n ‖u_i‖_1^2 = c‖u‖_1^2   for all u ∈ V.

Application of corollary 2.3.12 yields the desired result.

Remark 2.6.3 Note that the bilinear form a(·, ·) is symmetric and thus we can apply theorem 2.4.2. This shows that the variational formulation of the Stokes problem is equivalent to a saddle-point problem. 

2.6.1

Proof of the inf-sup property

In this section we derive the fundamental inf-sup property in (2.91). First we note that a function f ∈ L2 (Ω) induces a bounded linear functional on H01 (Ω) that is given by: u→

Z

u ∈ H01 (Ω),

f (x)u(x) dx,



kf k−1 =

|hf, uiL2 | . kuk1 u∈H 1 (Ω) sup 0

We now define the first (partial) derivative of an L2 -function (in the sense of distributions). For f ∈ L2 (Ω) the mapping Z F : u → − f (x)D α u(x) dx, |α| = 1, u ∈ H01 (Ω), Ω

defines a bounded linear functional on H01 (Ω). This functional is denoted by F =: Dα f ∈ ′ H −1 (Ω) = H01 (Ω) . Its norm is defined by kD α f k−1 :=

|hf, Dα uiL2 | |(D α f )(u)| = sup . kuk1 kuk1 u∈H 1 (Ω) u∈H 1 (Ω) sup

0

0

58

Based on these partial derivatives we define ∂f  ∂f ,..., , ∂x1 ∂xn v u n sX uX ∂f =t k k2−1 = kD α f k2−1 . ∂xi

∇f = k∇f k−1

i=1

|α|=1

In the next theorem we present a rather deep result from analysis. Its proof (for the case ∂Ω ∈ C 0,1 ) is long and technical. Theorem 2.6.4 There exists a constant C such that for all p ∈ L2 (Ω):  kpkL2 ≤ C kpk−1 + k∇pk−1 . Proof. We refer to [65], lemma 7.1 or [31], remark III.3.1.

Remark 2.6.5 From the definitions of kpk−1 , k∇pk−1 it immediately follows that kpk−1 ≤ kpkL2 and k∇pk−1 ≤ kpkL2 for all p ∈ L2 (Ω). Hence, using theorem 2.6.4 it follows that k · kL2 and k·k−1 +k∇·k−1 are equivalent norms on L2 (Ω). This can be seen as a (nontrivial) extension of the (trivial) result that on H m (Ω), m ≥ 1, the norms k · km and k · km−1 + k∇ · km−1 are equivalent. From the result in theorem 2.6.4 we obtain the following: Lemma 2.6.6 There exists a constant C such that for all p ∈ L20 (Ω).

kpkL2 ≤ Ck∇pk−1

Proof. Suppose that this result does not hold. Then there exists a sequence (pk )k≥1 in L20 (Ω) such that (2.92) 1 = kpk kL2 ≥ kk∇pk k−1 for all k.

From the fact that the continuous embedding H01 (Ω) ֒→ L2 (Ω) is compact, it follows that ′ −1 2 2 ′ 1 L (Ω) = L (Ω) ֒→ H0 (Ω) = H (Ω) is a compact embedding. Hence there exists a subsequence (pk(ℓ) )ℓ≥1 that is a Cauchy sequence in H −1 (Ω). From (2.92) and theorem 2.6.4 it follows that (pk(ℓ) )ℓ≥1 is a Cauchy sequence in L2 (Ω) and thus there exists p ∈ L2 (Ω) such that lim pk(ℓ) = p

ℓ→∞

in L2 (Ω).

(2.93)

From (2.92) we get limℓ→∞ k∇pk(ℓ) k−1 = 0 and thus ∂pk(ℓ) (φ) = 0 ℓ→∞ ∂xi lim

for all φ ∈ C0∞ (Ω) and all i = 1, . . . , n.

In combination with (2.93) this yields ∂pk(ℓ) ∂φ ∂φ (φ) = − lim hpk(ℓ) , i 2 = −hp, i 2 ℓ→∞ ℓ→∞ ∂xi ∂xi L ∂xi L

0 = lim

59

for i = 1, . . . , n.

Hence p ∈ H 1 (Ω) and ∇p = 0. It follows R that p isR equal to a constant (a.e.), say p = c. From 2 (2.93) and pk(ℓ) ∈ L0 (Ω) it follows that Ω p dx = Ω c dx = 0 and thus c = 0. This results in a contradiction: 1 = lim kpk(ℓ) kL2 = kpkL2 = kckL2 = 0, ℓ→∞

and thus the proof is complete. Theorem 2.6.7 The inf-sup property (2.91) holds. Proof. From lemma 2.6.6 it follows that there exists c˜ > 0 such that k∇qk−1 ≥ c˜kqkL2 for all q ∈ L20 (Ω). Hence, for suitable k with 1 ≤ k ≤ n we have k

c˜ ∂q k−1 ≥ √ kqkL2 ∂xk n

for all q ∈ L20 (Ω).

v k1 = 1 and Thus there exists v˜ ∈ H01 (Ω) with k˜ Z 1 c˜ ∂q ∂˜ v | (˜ v )| = q dx ≥ √ kqkL2 ∂xk 2 n Ω ∂xk

for all q ∈ L20 (Ω).

˜ = (˜ For v v1 , . . . , v˜n ) ∈ H01 (Ω)n defined by v˜k = v˜, v˜i = 0 for i 6= k we have R R q div v dx q div v dx Ω Ω = sup sup kvk1 kvk1 v∈H01 (Ω)n v∈H01 (Ω)n R R q ∂ v˜ dx q div v ˜ dx 1 c˜ Ω ∂xk ≥ Ω = ≥ √ kqkL2 k˜ vk1 k˜ v k1 2 n for all q ∈ L20 (Ω). This completes the proof.

2.6.2

Regularity of the Stokes problem

We present two results from the literature concerning regularity of the Stokes problem. The first result is proved in [26, 56]: Theorem 2.6.8 Let (u, p) ∈ H01 (Ω)n × L20 (Ω) be the solution of the Stokes problem (2.90). For m ≥ 0 assume that f ∈ H0m (Ω)n and ∂Ω ∈ C m+2 . Then u ∈ H m+2 (Ω)n , p ∈ H m+1 (Ω) and the inequality kukm+2 + kpkm+1 ≤ Ckf km (2.94)

holds, with a constant C independent of f .

If the property in (2.94) holds then the Stokes problem is said to be H m+2 -regular. Note that even for H 2 -regularity (i.e., m = 0) one needs the assumption ∂Ω ∈ C 2 , which in practice is often not fulfilled. For convex domains this assumption can be avoided (as in theorem 2.5.16). The following result is presented in [54] (only n = 2) and in [30] (for n ≥ 2): Theorem 2.6.9 Let (u, p) ∈ H01 (Ω)n × L20 (Ω) be the solution of the Stokes problem (2.90). Suppose that Ω is convex. Then u ∈ H 2 (Ω)n , p ∈ H 1 (Ω) and the inequality kuk2 + kpk1 ≤ Ckf kL2 holds, with a constant C independent of f . 60

2.6.3

Other boundary conditions

For a Stokes problem with nonhomogeneous Dirichlet boundary conditions, say u = g on ∂Ω, a compatibility condition is needed: Z ∂Ω

g · n ds = 0

...other boundary conditions ... ...in preparation ....

61

(2.95)

62

Chapter 3

Galerkin discretization and finite element method 3.1

Galerkin discretization

We consider a variational problem as in theorem 2.3.1, i.e., for f ∈ H2′ the variational problem is given by: find u ∈ H1 such that k(u, v) = f (v) for all v ∈ H2 . (3.1) We assume that the bilinear form k(·, ·) is continuous ∃M:

k(u, v) ≤ M kukH1 kvkH2

for all u ∈ H1 , v ∈ H2

(3.2)

and that the conditions (2.36) and (2.37) from theorem 2.3.1 hold: ∃ε>0 :

sup v∈H2

k(u, v) ≥ ε kukH1 kvkH2

for all

∀ v ∈ H2 , v 6= 0, ∃ u ∈ H1 :

u ∈ H1 ,

k(u, v) 6= 0.

(3.3) (3.4)

From theorem 2.3.1 we know that for a continuous bilinear form the conditions (3.3) and (3.4) are necessary and sufficient for well-posedness of the variational problem in (3.1). The Galerkin discretization of the problem (3.1) is based on the following simple idea. We assume a finite dimensional subspaces H1,h ⊂ H1 , H2,h ⊂ H2 (note: in concrete cases the index h will correspond to some mesh size parameter) and consider the finite dimensional variational problem find uh ∈ H1,h such that k(uh , vh ) = f (vh ) for all vh ∈ H2,h . (3.5) This problem is called the Galerkin discretization of (3.1) (in H1,h × H2,h ). We now discuss the well-posedness of this Galerkin-discretization. First note that the continuity of k : H1,h ×H2,h → R follows from (3.2). From theorem 2.3.1 it follows that we need the conditions (3.3) and (3.4) with Hi replaced by Hi,h , i = 1, 2. However, because Hi,h is finite dimensional we only need (3.3) since this implies (3.4) (see remark 2.3.2). Thus we formulate the following (discrete) inf-sup condition in the space H1,h × H2,h : ∃ εh > 0 :

sup vh ∈H2,h

k(uh , vh ) ≥ εh kuh kH1 kvh kH2 63

for all

uh ∈ H1,h .

(3.6)

We now prove two fundamental results: Theorem 3.1.1 (Cea-lemma.) Let (3.2), (3.3), (3.4), (3.6) hold. Then the variational problem (3.1) and its Galerkin discretization (3.5) have unique solutions u and uh , respectively. Furthermore, the inequality ku − uh kH1 ≤ 1 + holds.

M inf ku − vh kH1 εh vh ∈H1,h

(3.7)

Proof. The result on existence and uniqueness follows from theorem 2.3.1 and the fact that in the finite dimensional case (3.3) implies (3.4). From (3.1) and (3.5) it follows that k(u − uh , vh ) = 0

for all vh ∈ H2,h .

(3.8)

For arbitrary vh ∈ H1,h we have, due to (3.6), (3.8), (3.2): kvh − uh kH1 ≤

1 εh

=

1 εh

sup wh ∈H2,h

sup wh ∈H2,h

k(vh − uh , wh ) kwh kH2

k(vh − u, wh ) M kvh − ukH1 . ≤ kwh kH2 εh

From this and the triangle inequality ku − uh kH1 ≤ ku − vh kH1 + kvh − uh kH1

for all vh ∈ H1,h

the result follows. The result in this theorem simplifies if we consider the important special case H1 = H1 =: H, H1,h = H2,h =: Hh and assume that the bilinear form k(·, ·) is elliptic on H. Corollary 3.1.2 Consider the case H1 = H2 =: H and H1,h = H2,h =: Hh . Assume that (3.2) holds and that the bilinear form k(·, ·) is H-elliptic with ellipticity constant γ. Then the variational problem (3.1) and its Galerkin discretization (3.5) have unique solutions u and uh , respectively. Furthermore, the inequality ku − uh kH ≤

M γ

inf ku − vh kH

vh ∈Hh

(3.9)

holds. Proof. Because k(·, ·) is H-elliptic the conditions (3.3) (with ε = γ), (3.4) and (3.6) (with εh = γ) are satisfied. From theorem 3.1.1 we conclude that unique solutions u and uh exist. Using k(u − uh , vh ) = 0 for all vh ∈ Hh and the ellipticity and continuity we get for arbitrary vh ∈ Hh : 1 1 ku − uh k2H ≤ k(u − uh , u − uh ) = k(u − uh , u − vh ) γ γ M ≤ ku − uh kH ku − vh kH . γ 64

Hence the inequality in (3.9) holds. In chapter 4 and chapter 5 we will use theorem 3.1.1 in the discretization error analysis. In the remainder of this chapter we only consider cases with H1 = H1 =: H, H1,h = H2,h =: Hh and H-elliptic bilinear forms, such that corollary 3.1.2 can be applied. An improvement of the bound in (3.9) can be obtained if k(·, ·) is symmetric: Corollary 3.1.3 Assume that the conditions as in corollary 3.1.2 are satisfied. If in addition the bilinear form k(·, ·) is symmetric, the inequality s M ku − uh kH ≤ inf ku − vh kH (3.10) γ vh ∈Hh holds. 1

Proof. Introduce the norm |||v||| := k(v, v) 2 on H. Note that √ √ γkvkH ≤ |||v||| ≤ M kvkH for all v ∈ H. The space (H, ||| · |||) is a Hilbert space and due to |||v|||2 = k(v, v), k(u, v) ≤ |||u||||||v||| the bilinear form has ellipticity constant and continuity constant w.r.t. the norm ||| · ||| both equal to 1. Application of corollary 3.1.2 in the space (H, ||| · |||) yields |||u − uh ||| ≤ inf |||u − vh ||| vh ∈Hh

and thus we obtain ku − uh kH

1 1 ≤ √ |||u − uh ||| ≤ √ inf |||u − vh ||| ≤ γ γ vh ∈Hh

s

M γ

inf ku − vh kH ,

vh ∈Hh

which completes the proof. Assume H1 = H2 = H and H1,h = H2,h = Hh . For the actual computation of the solution uh of the Galerkin discretization we need a basis of the space Hh . Let {φi }1≤i≤N be a basis of Hh , i.e., every vh ∈ Hh has a unique representation vh =

N X

vj φj

j=1

with v := (v1 , . . . , vN )T ∈ RN .

The Galerkin discretization can be reformulated as: find v ∈ RN such that

N X j=1

k(φj , φi )vj = f (φi ) ∀ i = 1, . . . , N

(3.11)

This yields the linear system of equations Kv = b ,

with Kij = k(φj , φi ), bi = f (φi ), 1 ≤ i, j ≤ N. 65

(3.12)

In the remainder of this chapter we discuss concrete choices for the space Hh , namely the so-called finite element spaces. These spaces turn out to be very suitable for the Galerkin discretization of scalar elliptic boundary value problems. Finite element spaces can also be used for the Galerkin discretization of the Stokes problem. This topic is treated in chapter 5. Once a space Hh is known one can investigate approximation properties of this space and derive bounds for inf vh ∈Hh ku − vh kH (with u the weak solution of the elliptic boundary value problem), cf. section 3.3. Due to the Cea-lemma we then have a bound for the discretization error ku − uh kH (see section 3.4). In Part III of this book (iterative methods) we discuss techniques that can be used for solving the linear system in (3.12).

3.2

Examples of finite element spaces

In this section we introduce finite element spaces that are appropriate for the Galerkin discretization of elliptic boundary value problems. We only present the main principles. An extensive treatment of finite element techniques can be found in, for example, [27], [28], [21]. To simplify the presentation we only consider finite element methods for elliptic boundary value problems in Rn with n ≤ 3. Starting point for the finite element approach is a subdivsion of the domain Ω in a finite number of subsets T . Such a subdivision is called a triangulation and is denoted by Th = {T }. For the subsets T we only allow: T is an n-simplex (i.e., interval, triangle, tetrahedron), or, T is an n-rectangle.

(3.13)

Furthermore, the triangulation Th = {T } should be such that ¯ = ∪T ∈T T , Ω h

(3.14a)

int T1 ∩ int T2 = ∅ for all T1 , T2 ∈ Th , T1 6= T2 , any edge [face] of any T1 ∈ Th is either a subset of ∂Ω or an edge [face] of another T2 ∈ Th .

Definition 3.2.1 A triangulation that satisfies (3.13) and (3.14) is called admissible.

(3.14b) (3.14c)



Note that a triangulation can be admissible only if the domain Ω is polygonal (i.e., ∂Ω consists of lines and/or planes). If the domain is not polygonal we can approximate it by a polygonal domain Ωh and construct an admissible triangulation of Ωh (see ...) or use isoparametric finite elements (section 3.6). We consider a family of admissible triangulations denoted by {Th }. Let hT := diam(T ) for T ∈ Th . The index parameter h of Th is taken such that h = max{ hT | T ∈ Th }. Furthermore, for T ∈ Th we define ρT := sup{ diam(B) | B is a ball contained in T }, hT ∈ [1, ∞). σT := ρT 66

Definition 3.2.2 A family of admissible triangulations {Th } is called regular if 1. The parameter h approaches zero: inf{ h | Th ∈ {Th } } = 0, 2. ∃ σ :

hT ρT

= σT ≤ σ for all T ∈ Th and all Th ∈ {Th }.

A family of admissible triangulations {Th } is called quasi-uniform if ∃σ ˆ :

h ≤σ ˆ ρT

for all T ∈ Th

and all Th ∈ {Th }. 

3.2.1

Simplicial finite elements

We now introduce a very important class of finite element spaces. Let {Th } be a family of admissible triangulations of Ω consisting only of n-simplices. The space of polynomials in Rn of degree less than or equal k is denoted by Pk , i.e., p ∈ Pk is of the form X p(x) = γα xα1 1 xα2 2 . . . xαnn , γα ∈ R. |α|≤k

The dimension of Pk is

dim Pk =



n+k k

The spaces of simplicial finite elements are given by



.

X0h := { v ∈ L2 (Ω) | v|T ∈ P0 for all T ∈ Th }, ¯ | v|T ∈ Pk for all T ∈ Th }, k ≥ 1. Xkh := { v ∈ C(Ω)

(3.15a) (3.15b)

Thus these spaces consist of piecewise polynomials which, for k ≥ 1, are continuous on Ω.

Remark 3.2.3 From theorem 2.2.12 it follows that Xkh ⊂ H 1 (Ω) for all k ≥ 1.



We will also need simplicial finite element spaces with functions that are zero on ∂Ω: Xkh,0 := Xkh ∩ H01 (Ω),

3.2.2

k ≥ 1.

(3.16)

Rectangular finite elements

Let {Th } be a family of admissible triangulations consisting only of n-rectangles. The space of polynomials in Rn of degree less than or equal k with respect to each of the variables is denoted by Qk , i.e., p ∈ Qk is of the form X p(x) = γα xα1 1 xα2 2 . . . xαnn , γα ∈ R. 0≤αi ≤k

The dimension of Qk is

dim Qk = (k + 1)n .

The spaces of rectangular finite elements are given by

Q0h := { v ∈ L2 (Ω) | v|T ∈ Q0 for all T ∈ Th }, ¯ | v|T ∈ Qk for all T ∈ Th }, k ≥ 1, Qkh := { v ∈ C(Ω)

Qkh,0

:=

Qkh



H01 (Ω)

, k ≥ 1.

67

(3.17a) (3.17b) (3.17c)

3.3

Approximation properties of finite element spaces

In this section, for u ∈ H 2 (Ω) we derive bounds for the approximation error inf vh ∈Hh ku − vh k1 with Hh = Xkh or Hh = Qkh (note that Hh depends on the parameter k). The main idea of the analysis is as follows. First we will introduce an interpolation operator ¯ → Hh . Recall that we assumed n ≤ 3. The Sobolev embedding theorem 2.2.14 yields Ihk : C(Ω) ¯ H m (Ω) ֒→ C(Ω) for m ≥ 2 and thus the interpolation operator is well-defined for u ∈ H m (Ω), m ≥ 2. We will prove interpolation error bounds of the form (cf. theorem 3.3.9) ku − Ihk ukt ≤ chm−t |u|m

for 2 ≤ m ≤ k + 1, t = 0, 1.

This implies (corollary 3.3.10) inf ku − vh kt ≤ chm−t |u|m

vh ∈Hh

for 2 ≤ m ≤ k + 1, t = 0, 1.

¯ → Qk . Then we ¯ → Xk and I k : C(Ω) We first introduce the interpolation operators IXk : C(Ω) h Q h formulate some useful results that will be applied to prove the main result in theorem 3.3.9. ¯ → Xk . For the descripWe start with the definition of an interpolation operator IXk : C(Ω) h tion of this operator the so-called barycentric coordinates are useful: Definition 3.3.1 Let T be a nondegenerate n-simplex and aj ∈ Rn , j = 1, . . . , n+1 its vertices. Then T can be described by T ={

n+1 X j=1

λj aj | 0 ≤ λj ≤ 1 ∀ j,

n+1 X j=1

λj = 1 }.

(3.18)

To every x ∈ T there corresponds a unique n + 1-tuple (λ1 , . . . , λn+1 ) as in (3.18). These λj , 1 ≤ j ≤ n + 1, are called the barycentric coordinates of x ∈ T . The mapping x → (λ1 , . . . , λn+1 ) is affine.  Using these barycentric coordinates we define the set Lk (T ) :=

X  n+1 j=1

n+1 X 1 k−1 λj aj | λj ∈ {0, , . . . , λj = 1 , 1} ∀ j, k k j=1

which is called the principal lattice of order k (in T ). Examples for n = 2 and n = 3 are given in figure... figure This principal lattice can be used to determine a unique polynomial p ∈ Pk : Lemma 3.3.2 Let T be a nondegenerated n-simplex. Then any polynomial p ∈ Pk is uniquely determined by its values on the principal lattice Lk (T ). Proof. For example, in [67]. ¯ Let Th = {T } be an admissible triangulation of Ω consisting only of n-simplices. For u ∈ C(Ω) 68

we define a corresponding function IXk u ∈ L2 (Ω) by piecewise polynomial interpolation on each simplex T ∈ Th : ∀ T ∈ Th : (IXk u)|T ∈ Pk such that (IXk u)(xj ) = u(xj ) ∀ xj ∈ Lk (T ).

(3.19)

The piecewise polynomial function IXk u is continuous on Ω: ¯ we have I k u ∈ Xk . Lemma 3.3.3 For k ≥ 1 and u ∈ C(Ω) h X Proof. By definition we have that (IXk u)|T ∈ Pk . Thus we only have to show that IXk u is continuous across interfaces between adjacent n-simplices T1 , T2 ∈ Th . For n = 1 this is trivial, since the endpoints a1 , a2 of a 1-simplex [a1 , a2 ] are used as interpolation points. We now consider n = 2. Define Γ := T1 ∩ T2 and pi := (IXk u)|Ti , i = 1, 2. Note that k + 1 points of the principal lattice lie on the face Γ: Lk (T1 ) ∩ Γ = Lk (T2 ) ∩ Γ =: {x1 , . . . , xk+1 } with xi 6= xj for i 6= j. Since these xj are interpolation points we have that p1 (xj ) = p2 (xj ) = u(xj ) for j = 1, . . . , k + 1. The functions (pi )|Γ are one-dimensional polynomials of degree k. We conclude that (p1 )|Γ = (p2 )|Γ holds, and thus IXk u is continuous across the interface Γ. The case n = 3 (or even n ≥ 3) can be treated similarly. ¯ → Qk For the space of rectangular finite elements, Qkh , an interpolation operator IQk : C(Ω) h can be defined in a very similar way. For this we introduce a uniform grid on a rectangle in Rn . For a given interval [a, b] a uniform grid with mesh size b−a k is given by Gk[a,b] := { a + j On an n-rectangle T =

Qn

i=1 [ai , bi ]

b−a | 0 ≤ j ≤ k }. k

we define a uniform lattice by ˜ k (T ) := L

n Y

Gk[ai ,bi ] .

i=1

Using a tensor product argument it follows that any polynomial p ∈ Qk , k ≥ 1, is uniquely ˜ k (T ). Let Th = {T } be an admissible triangulation of Ω determined by its values on the set L ¯ we define a corresponding function I k u ∈ L2 (Ω) consisting only of n-rectangles. For u ∈ C(Ω) Q by piecewise polynomial interpolation on each n-rectangle T ∈ Th : ˜ k (T ) ∀ T ∈ Th : (IQk u)|T ∈ Qk such that (IQk u)(xj ) = u(xj ) ∀ xj ∈ L

(3.20)

With similar arguments as used in the proof of lemma 3.3.3 one can show the following: ¯ we have I k u ∈ Qk . Lemma 3.3.4 For k ≥ 1 and u ∈ C(Ω) Q h For the analysis of the interpolation error we begin with two elementary lemmas. Lemma 3.3.5 Let Tˆ, T ⊂ Rn be two sets as in (3.13) and F (ˆ x) = Bˆ x + c an affine mapping such that F (Tˆ) = T . Then the following inequalities hold: kBk2 ≤

hT , ρTˆ

kB−1 k2 ≤ 69

hTˆ . ρT

Proof. We will prove the first inequality. The second one then follows from the first one by using F −1 (T ) = Tˆ with F −1 (x) = B−1 x − B−1 c. Note that 1 kBk2 = max{ kBˆ xk2 | x ˆ ∈ Rn , kˆ xk2 = ρTˆ }. (3.21) ρTˆ Let B(a; ρTˆ ) be a ball with centre a and diameter ρTˆ that is contained in Tˆ. Take x ˆ ∈ Rn with kˆ xk2 = ρTˆ . For y1 = a + 12 x ˆ ∈ Tˆ and y2 = a − 12 x ˆ ∈ Tˆ we have x ˆ = y1 − y2 , F (yi ) ∈ T, i = 1, 2, and thus kBˆ xk2 = kB(y1 − y2 )k2 = kF (y1 ) − F (y2 )k2 . ≤ hT From (3.21) and (3.22) we obtain kBk2 ≤

(3.22)

hT ρTˆ .

ˆ and K be Lipschitz domains in Rn that are affine equivalent: Lemma 3.3.6 Let K ˆ =K F (K)

with F (ˆ x) = Bˆ x + c,

det B 6= 0.

ˆ → R. Then vˆ ∈ H m (K) ˆ and there exists a For m ≥ 0, v ∈ H m (K) define vˆ := v ◦ F : K constant C such that |ˆ v |m,Kˆ |v|m,K

1

−2 |v|m,K ≤ CkBkm 2 | det B| 1 2

v |m,Kˆ ≤ CkB−1 km 2 | det B| |ˆ

for all v ∈ H m (K),

ˆ for all vˆ ∈ H m (K).

(3.23a) (3.23b)

¯ is dense in H m (Ω) it suffices to prove (3.23a) for v ∈ C ∞ (Ω). ¯ For m = 0 Proof. Since C ∞ (Ω) this result follows from Z Z v(x)2 | det B|−1 dx = | det B|−1 |v|20,K . vˆ(ˆ x)2 dˆ x= |ˆ v |20,Kˆ = ˆ K

K

¯ the For the case m ≥ 1 we need some basic results on Fr´echet derivatives. For v ∈ C ∞ (Ω) m n n Fr´echet derivative D v(x) : R × . . . × R → R is an m-linear form. Let ej be the j-th basis vector in Rn . For |α| = m and for suitable i1 , . . . , im ∈ N we have D α v(x) =

∂ |α| v(x) ∂ |α| v(x) = D m v(x)(ei1 , . . . , eim ) αn = ∂xi1 . . . ∂xim . . . ∂xn

∂xα1 1

(3.24)

(note the subtle difference in notation between Dα and D m ). Let E be an m-linear form on Rn . Then both kEk2 := max

y i ∈Rn

|E(y 1 , . . . , y m )| and kEk∗ := max |E(ei1 , . . . , eim )| 1≤ij ≤n ky 1 k2 . . . ky m k2

define norms on the space of m-linear forms on Rn . Using the norm equivalence property it follows that there exists a constant c, independent of E, such that kEk∗ ≤ kEk2 ≤ c kEk∗ . 70

If we take E = D m v(x) and use (3.24) we get max |D α v(x)| ≤ kD m v(x)k2 ≤ c max |Dα v(x)|.

|α|=m

|α|=m

(3.25)

The chain rule applied to vˆ(ˆ x) = v(Bˆ x + c), with x = Bˆ x + c, results in D m vˆ(ˆ x)(y 1 , . . . , y m ) = D m v(x)(By 1 , . . . , By m ) and thus m kD m vˆ(ˆ x)k2 ≤ kBkm 2 kD v(x)k2 .

(3.26)

Combination of (3.25) and (3.26) yields α max |D α vˆ(ˆ x)| ≤ c kBkm 2 max |D v(x)|.

|α|=m

|α|=m

Using this we finally obtain Z X Z α 2 2 D vˆ(ˆ x) dˆ x≤C |ˆ v |m,Kˆ = ˆ |α|=m K

ˆ K

2 max |D α vˆ(ˆ x)| dˆ x

|α|=m

Z

2 x max |Dα v(x)| dˆ ˆ |α|=m K Z |D α v(x)|2 | det B|−1 dx = C kBk2m max 2 |α|=m K X Z −1 2 −1 |Dα v(x)|2 dx = C kBk2m ≤ C kBk2m | det B| 2 | det B| |v|m,K . 2 ≤ C kBk2m 2

|α|=m K

ˆ This proves the result in (3.23a). The result in (3.23b) follows from (3.23a) and F −1 (K) = K −1 −1 −1 with F (x) = B x − B c. The following result is a generalization of the Poincare-Friedrichs inequality in (2.79b) and will be used in the proof of theorem 3.3.8. Lemma 3.3.7 Let K be a Lipschitz domain in Rn . There exists a constant C such that Z  X 2  2 2 for all u ∈ H m (K). Dα u dx kukm ≤ C |u|m + K

|α|≤m−1

(Here | · |m and k · km denote Sobolev (semi)norms on the domain K) Proof. For m = 1 this result is given in (2.79b). From the result in (2.79b) it also follows that Z 2  2 2 u dx for all u ∈ H 1 (K). (3.27) kukL2 ≤ C |u|1 + K

We introduce the notation (for u ∈ βℓ :=

X

|α|=ℓ

kD

α

H m (K)):

uk2L2 (K)

,

δℓ :=

X Z

|α|=ℓ

71

K

D α u dx

2

,

ℓ = 0, . . . , m.

P Note that for ℓ ≤ m − 1 we have |α|=ℓ |D α u|21 = βℓ+1 . Using this and the inequality (3.27) with u replaced by Dα u we get for ℓ ≤ m − 1: βℓ =

X

|α|=ℓ

kD

α

uk2L2 (K)

≤ C βℓ+1 +

X Z

|α|=ℓ

D α u dx K

2 

= C(βℓ+1 + δℓ ).

From this it follows that kuk2m

=

m X ℓ=0

βℓ ≤ C βm +

m−1 X ℓ=0



δℓ = C

|u|2m

+

X

|α|≤m−1

Z

D α u dx K

2 

,

which completes the proof. The next theorem, due to Bramble-Hilbert [20], is a fundamental one: Theorem 3.3.8 Let K be a Lipschitz domain in Rn and Y a Banach space. Suppose L : H m (K) → Y , m ≥ 1, is a linear bounded operator such that L(p) = 0

for all p ∈ Pm−1 .

Then there exists a constant C such that for all u ∈ H m (K).

kLukY ≤ C |u|m

(3.28)

Proof. First note that kLukY = kL(u − p)kY ≤ kLkku − pkm

for all p ∈ Pm−1 .

(3.29)

P Let p(x) = |α|≤m−1 γα xα1 1 . . . xαnn ∈ Pm−1 . For any given u ∈ H m (K) one can show that the coefficients γα can be taken such that Z

α

D p dx =

Z

D α u dx

K

K

for |α| ≤ m − 1

holds (hint: the ordering |α| = m − 1, |α| = m − 2, . . ., yields a linear system for the coefficients γα with a nonsingular lower triangular matrix). Using the result in lemma 3.3.7 we obtain ku − pk2m ≤ C |u − p|2m + = C|u −

p|2m

=

X

|α|≤m−1

Z

K

C|u|2m

Combination of (3.29) and (3.30) completes the proof. We now present a main result on the interpolation error:

72

2 D α (u − p) dx )

(3.30)

Theorem 3.3.9 Let {Th } be a regular family of triangulations of Ω consisting of n-simplices and let Xkh be the corresponding finite element space as in (3.15b). For 2 ≤ m ≤ k + 1 and t ∈ {0, 1} the following holds: ku − IXk ukt ≤ Chm−t |u|m

for all u ∈ H m (Ω).

(3.31)

Let {Th } be a regular family of triangulations of Ω consisting of n-rectangles and let Qkh be the corresponding finite element space as in (3.17b). For 2 ≤ m ≤ k + 1 and t ∈ {0, 1} the following holds: ku − IQk ukt ≤ Chm−t |u|m for all u ∈ H m (Ω). (3.32) The constants C in(3.31) and (3.32) are independent of u and of Th ∈ {Th }. Proof. We will prove the result in (3.31). Very similar arguments can be used to show that the result in (3.32) holds. Take 2 ≤ m ≤ k + 1. The constants C used below are all uniform with respect to u ∈ H m (Ω) and Th ∈ {Th }. We will show that for all ℓ ∈ {0, 1} |u − IXk u|ℓ ≤ Chm−ℓ |u|m

for all u ∈ H m (Ω)

holds, with | · |0 := k · kL2 . The result in (3.31) then follows from this and from kvk21 = |v|20 + |v|21 . Due to X |u − IXk u|2ℓ,T |u − IXk u|2ℓ = T ∈Th

it suffices to prove for ℓ ∈ {0, 1} and for arbitrary T ∈ Th : |u − IXk u|ℓ,T ≤ Chm−ℓ |u|m,T

for all u ∈ H m (Ω).

(3.33)

Let Tˆ be the unit n-simplex and F : Tˆ → T an affine transformation F (ˆ x) = Bˆ x + c such that F (Tˆ) = T . Due to the fact that the family {Th } is regular, there exists a constant C such that kBk2 kB−1 k2 ≤ c

hT ≤ C. ρT

(3.34)

P Note that kpk∗ := x∈Lk (Tˆ) |p(x)| defines a norm on Pk . Since all norms on Pk are equivalent there exists a constant C such that kpkm,Tˆ ≤ C kpk∗

for all p ∈ Pk .

(3.35)

The continuous embedding H m (Tˆ) ֒→ C(Tˆ) yields: ∃C:

kvk∞,Tˆ ≤ C kvkm,Tˆ

for all v ∈ H m (Tˆ).

(3.36)

Let IˆXk : C(Tˆ) → Pk be the interpolation operator on the unit n-simplex as defined in (3.19) (with T = Tˆ). We then have ˆ (IXk u) ◦ F = IˆXk (u ◦ F ) = IˆXk u

with u ˆ := u ◦ F.

(3.37)

Define the linear operator L := id − IˆXk : H m (Tˆ) → H m (Tˆ). For this operator we have Lp = 0 for all p ∈ Pk and thus, due to m ≤ k + 1, Lp = 0 for all p ∈ Pm−1 . Furthermore, using (3.35) 73

and (3.36) we get kLvkm,Tˆ ≤ kvkm,Tˆ + kIˆXk vkm,Tˆ ≤ kvkm,Tˆ + CkIˆXk vk∗ X ≤ kvkm,Tˆ + C |v(x)| ≤ kvkm,Tˆ + C kvk∞,Tˆ ≤ Ckvkm,Tˆ . x∈Lk (Tˆ)

Thus we can apply theorem 3.3.8, which yields kv − IˆXk vkm,Tˆ ≤ C|v|m,Tˆ

for all v ∈ H m (Tˆ).

(3.38)

For u ∈ H m (Ω) we obtain, using lemma 3.3.5, lemma 3.3.6 and the results in (3.34), (3.37), (3.38): 1

|u − IXk u|ℓ,T ≤ C kB−1 kℓ2 | det B| 2 |u ◦ F − (IXk u) ◦ F |ℓ,Tˆ 1

ˆ|ℓ,Tˆ u − IˆXk u = C kB−1 kℓ2 | det B| 2 |ˆ

1 u − IˆXk u ˆkm,Tˆ ≤ C kB−1 kℓ2 | det B| 2 kˆ 1

u|m,Tˆ ≤ C kB−1 kℓ2 kBkm ≤ C kB−1 kℓ2 | det B| 2 |ˆ 2 |u|m,T  ℓ |u|m,T ≤ C kB−1 k2 kBk2 kBkm−ℓ 2

|u|m,T ≤ C hm−ℓ |u|m,T . ≤ C kBkm−ℓ 2 This proves the result in (3.33)

Corollary 3.3.10 Under the same assumption as in theorem 3.3.9 we have inf ku − vh kt ≤ C hm−t |u|m ,

(3.39)

inf ku − vh kt ≤ C hm−t |u|m .

(3.40)

vh ∈Xkh vh ∈Qkh

Furthermore, the results in (3.39)and (3.40) hold for u ∈ H m (Ω) ∩ H01 (Ω) with Xkh , Qkh replaced by Xkh,0 and Qkh,0 , respectively. Proof. The first part is clear. The second part follows from the fact that for u ∈ H m (Ω) ∩ H01 (Ω) we have IXk u ∈ Xkh,0 and IQk u ∈ Qkh,0 . We now prove so-called local and global inverse inequalities. These results can be used to bound the H 1 -norm of a finite element function in terms of its L2 -norm. Lemma 3.3.11 (inverse inequalities) Let {Th } be a regular family of triangulations of Ω consisting of n-simplices (n-rectangles) and let Vh := Xkh (= Qkh ) be the corresponding finite element space. For m ≥ 0 there exists a constant c independent of h such that |vh |m+1,T ≤ ch−1 T |vh |m,T

for all T ∈ Th and all vh ∈ Vh .

If in addition the family of triangulations is quasi-uniform, then there exists a constant c independent of h such that |vh |1 ≤ c h−1 kvh kL2 for all vh ∈ Vh . 74

Proof. We consider the case of simplices. The other case can be treated very similar. For T ∈ Th let F (ˆ x) = BT x ˆ + c be an affine transformation such that F (Tˆ) = T , where Tˆ is the unit simplex. Note that on the finite dimensional space Pk (Tˆ) all norms are equivalent. Using lemma 3.3.6 we get, with vˆh = vh ◦ F , 1

m+1 | det BT | 2 |ˆ |vh |m+1,T ≤ ckB−1 vh |m+1,Tˆ T k2 1

vh |m+1,Tˆ | det BT | 2 |ˆ ≤ ch−m−1 T 1

vh |m,Tˆ ≤ ch−1 | det BT | 2 |ˆ ≤ ch−m−1 T |vh |m,T , T

which proves the local inverse inequality. Note that vh ∈ H01 (Ω). Thus for m = 0 we can sum up −1 these local results and using the quasi-uniformity assumption (i.e., h−1 T ≤ ch ) we then obtain |vh |21 =

X

T ∈Th

|vh |21,T ≤ c

X

T ∈Th

2 −2 h−2 T |vh |0,T ≤ c h

X

T ∈Th

|vh |20,T = ch−2 kvh k2L2

and thus the global inverse inequality is proved.

3.4

Finite element discretization of scalar elliptic problems

In this section we consider the Galerkin discretization of the scalar elliptic problem (

find u ∈ H01 (Ω) such that k(u, v) = f (v) for all v ∈ H01 (Ω)

(3.41)

with a bilinear form and right handside as in (2.72), i.e.: k(u, v) =

R



∇uT A∇v + b · ∇uv + cuv dx , with −

and ∃ α0 > 0

1 2 div b +

T

c ≥ 0 a.e. in T

ξ A(x)ξ ≥ α0 ξ ξ

f (v) = Ω,

for all ξ ∈

Rn ,

R

Ω f v dx

x ∈ Ω,

and aij ∈ L∞ (Ω) ∀ i, j, bi ∈ H 1 (Ω) ∩ L∞ (Ω) ∀ i, c ∈ L∞ (Ω).

(3.42a) (3.42b) (3.42c) (3.42d)

For the Galerkin discretization we use finite element subspaces Hh = Xkh,0 and Hh = Qkh,0 . We prove bounds for the discretization error ku − uh k1 (section 3.4.1) and ku − uh kL2 (section 3.4.2).

3.4.1

Error bounds in the norm k · k1

We first consider the Galerkin discretization of (3.41) with simplicial finite elements. Let {Th } be a regular family of triangulations of Ω consisting of n-simplices and Xkh,0 , k ≥ 1, the corresponding finite element space as in (3.16). The discrete problem is given by (

find uh ∈ Xkh,0 such that k(uh , vh ) = f (vh ) for all vh ∈ Xkh,0 . 75

(3.43)

We have the following result concerning the discretization error: Theorem 3.4.1 Assume that the conditions (3.42b)-(3.42d) are fulfilled and that the solution u ∈ H01 (Ω) of (3.41) lies in H m (Ω) with m ≥ 2. Let uh be the solution of (3.43). For 2 ≤ m ≤ k + 1 the following holds ku − uh k1 ≤ C hm−1 |u|m , with a constant C independent of u and of Th ∈ {Th }. Proof. From the proof of theorem 2.5.3 it follows that the bilinear form k(·, ·) is continuous and H01 (Ω)-elliptic. From corollary 3.1.2 it follows that the continuous and discrete problems have unique solutions and that ku − uh k1 ≤ C

inf

vh ∈Xkh,0

ku − vh k1

holds. Now apply corollary 3.3.10 with t = 1. A very similar result holds for the Galerkin discretization with rectangular finite elements. Let {Th } be a regular family of triangulations of Ω consisting of n-rectangles and Qkh,0 , k ≥ 1, the corresponding finite element space as in (3.17c). The discrete problem is given by (

find uh ∈ Qkh,0 such that k(uh , vh ) = f (vh ) for all vh ∈ Qkh,0 .

(3.44)

We have the following result concerning the discretization error: Theorem 3.4.2 Assume that the conditions in (3.42b)-(3.42d) are fulfilled and that the solution u ∈ H01 (Ω) of (3.41) lies in H m (Ω) with m ≥ 2. Let uh be the solution of (3.44). For 2 ≤ m ≤ k + 1 the following holds ku − uh k1 ≤ C hm−1 |u|m , with a constant C independent of u and of Th ∈ {Th }. Proof. The same arguments as in the proof of theorem 3.4.1 can be used. Note that in the preceding two theorems we used the smoothness assumption u ∈ H01 (Ω)∩H m (Ω) with m ≥ 2. Sufficient conditions for this to hold are given in section 2.5.4, theorem 2.5.14 and theorem 2.5.16. In the literature one can find discretization error bounds for the case when u is less regular, i.e., u ∈ H01 (Ω) but u ∈ / H 2 (Ω) (cf., for example, [?]). One simple result for the 1 case of minimal smoothness (u ∈ H0 (Ω) only) is given in: Theorem 3.4.3 Assume that the conditions of theorem 2.5.3 are fulfilled. Let uh be the solution of (3.43). Then we have: lim ku − uh k1 = 0.

h→0

76

Proof. Define V := H01 (Ω) ∩ H 2 (Ω). Note that V From corollary 3.1.2 we obtain ku − uh k1 ≤ C There exists v ∈ V such that

inf

k·k1

vh ∈Xkh,0

ku − vk1 ≤

= H01 (Ω). Take ε > 0.

ku − vh k1 .

ε 2C

(3.45)

(3.46)

From corollary 3.3.10 it follows that kv − IXk vk1 ≤ C˜ h|v|2 , and thus for h sufficiently small we have ε kv − IXk vk1 ≤ . (3.47) 2C Combination of (3.45), (3.46) and (3.47) yields  ku − uh k1 ≤ Cku − IXk vk1 ≤ C ku − vk1 + kv − IXk vk1 ≤ ε

and thus the result is proved.

Remark 3.4.4 Comment on results for cases with other boundary conditions ....

3.4.2



Error bounds in the norm k · kL2

In this section we derive a bound for the discretization error in the L2 -norm. For the analysis we will need a duality argument, i.e., an argument in which the dual problem of the given variational problem (3.41) plays a role. For k(·, ·) and f (·) as in (3.42a) we define the dual problem by (

find u ˜ ∈ H01 (Ω) such that k(v, u˜) = f (v) for all v ∈ H01 (Ω).

(3.48)

Note that if k(·, ·) is continuous and H01 (Ω)-elliptic then this dual problem has a unique solution. The dual problem is said to be H 2 -regular (cf. section 2.5.4) if ∃C:

k˜ uk2 ≤ C kf kL2

for all f ∈ L2 (Ω).

The following result concerning the finite element discretization error holds: Theorem 3.4.5 Suppose that the assumptions of theorem 3.4.1 [theorem 3.4.2] are fulfilled and that the dual problem (3.48) is H 2 -regular. For 2 ≤ m ≤ k + 1 the inequality ku − uh kL2 ≤ C hm |u|m holds, with a constant C independent of u and of Th ∈ {Th }. Proof. We give the proof for the case of simplicial finite elements. Exactly the same arguments can be used for rectangular finite elements. The bilinear form k(·, ·) is continuous and H01 (Ω)-elliptic and thus the problem (3.41), its 77

Galerkin discretization and the dual problem (3.48) are uniquely solvable. Define eh = u − uh ˜ ∈ H01 (Ω) ∩ H 2 (Ω) be the solution of the dual problem and note that eh ∈ H01 (Ω). Let u Z eh v dx for all v ∈ H01 (Ω). k(v, u˜) = Ω

Using the Galerkin orthogonality, k(eh , vh ) = 0 for all vh ∈ Xkh,0 , we get Z ˜) e2h dx = k(eh , u keh k2L2 = Ω

(3.49)

= k(eh , u ˜ − IXk u ˜) ≤ Ckeh k1 k˜ u − IXk u ˜k1 .

From corollary 3.3.10 and the H 2 -regularity of the dual problem we obtain ˜k1 ≤ C h|˜ u|2 ≤ C hkeh kL2 . k˜ u − IXk u

(3.50)

Combining (3.49) and (3.50) results in keh kL2 ≤ C hkeh k1 . Now apply theorem 3.4.1 [theorem 3.4.2].

Remark 3.4.6 Comment on sufficient conditions for H 2 -regularity of the dual problem ...

3.5



Stiffness matrix

In this section we consider the discrete problem in (3.43) with a bilinear form and right handside as in (3.42). We will discuss properties of the linear system described in (3.12). For this we need a suitable basis of the finite element space Xkh,0 . The following lemma gives a general tool for constructing a basis in some finite element space. Lemma 3.5.1 Let H be a finite dimensional vector space. Assume that for i = 1, . . . , N we have φi ∈ H and ℓi ∈ H ′ such that the following conditions are satisfied: ℓi (φi ) 6= 0 for all i,

ℓi (φj ) = 0 for all i 6= j ,

for all v ∈ H, v 6= 0 : ℓi (v) 6= 0 for some i

(3.51a) (3.51b)

Then (φi )1≤i≤N forms a basis of H. Proof. Let α1 , . . . , αN be such that 0 = ℓi

N X j=1

PN

j=1 αj φj

= 0. Using (3.51a) we get

N  X αj ℓi (φj ) = αi ℓi (φi ) αj φj =

for i = 1, . . . , N,

j=1

and thus αi = 0 for all i. This yields that φi , i = 1, . . . , N , are independent. Hence, N ≤ k := dim(H) = dim(H ′ ) holds. We now show that N ≥ k holds, too. Let v1 , . . . , vk be a basis of H. Define the matrix L ∈ RN ×k by Lij = ℓi (vj ). Let x ∈ Rk be such that Lx = 0. We then have k X j=1

k X xj vj ) = 0 for all i = 1, . . . , N ℓi (vj )xj = ℓi ( j=1

78

Using (3.51b) this yields thus N ≥ k holds.

Pk

j=1 xj vj

= 0 and thus x = 0. Hence, L has full column rank and

The set (ℓi )1≤i≤N as in (3.51) forms a basis of H ′ and is called the dual basis of (φi )1≤i≤N . We now construct a so-called nodal basis of the space of simplicial finite elements Xkh,0 . We will associate a basis function to each interpolation point in the principal lattice of T ∈ Th that lies not on ∂Ω. To make this more precise, for an admissible triangulation Th , consisting of n-simplices, we introduce the grid ∪T ∈Th { xj ∈ Lk (T ) | xj ∈ / ∂Ω } =: {x1 , . . . , xN } =: V with xi 6= xj for all i 6= j. For each xi ∈ V we define a corresponding function φi as follows: ∀ T ∈ Th : (φi )|T ∈ Pk and ∀ xj ∈ Lk (T ) : φi (xj ) =

(

0 1

if xj 6= xi if xj = xi

(3.52)

From lemma 3.3.3 it follows that for all k ≥ 1 we have φi ∈ Xkh,0 . Thus we have a collection of functions (φi )1≤i≤N with the properties: φi ∈ Xkh,0 ;

∀ xj ∈ V : φi (xj ) = δij , 1 ≤ i ≤ N

(3.53)

Lemma 3.5.2 The functions (φi )1≤i≤N form a basis of Xkh,0 . Proof. Introduce the linear functional ℓi ∈ (Xkh,0 )′ : ℓi (u) = u(xi ) for u ∈ Xkh,0 , xi ∈ V, i = 1, . . . , N One easily verifies that for φi , ℓi , i = 1, . . . , N the conditions of lemma 3.5.1 are satisfied. Due to the property φi (xj ) = δij the functions φi are called nodal basis functions. In exactly the same way one can construct nodal basis functions for other finite element spaces like, for example, Xkh , Qkh,0 , Qkh . We consider the discrete problem (3.43) and use the nodal basis (φi )1≤i≤N of Xkh,0 to reformulate this problem as explained in (3.11)-(3.12). This results in the linear system of equations Kh vh = bh , with (Kh )ij = k(φj , φi ), (bh )i = f (φi ), 1 ≤ i, j ≤ N

(3.54)

The matrix Kh is called the stiffness matrix. In the remainder of this section we derive some important properties of this matrix that will play an important role in the chapters 6-9. In these chapters we discuss iterative solution methods for the linear system in (3.54). Below we assume that for the bilinear form k(·, ·) the conditions (3.42b)-(3.42d) are satisfied. The stiffness matrix is sparse. We introduce qrow (Kh ) = maximum number of nonzero entries per row in Kh qcol (Kh ) = maximum number of nonzero entries per column in Kh 79

Lemma 3.5.3 Let {Th } be a regular family of triangulations consisting of n-simplices and for each Th let Kh be the stiffness matrix defined in (3.54). There exists a constant q independent of Th ∈ {Th } such that max{ qrow (Kh ) , qcol (Kh ) } ≤ q Proof. Take a fixed i with 1 ≤ i ≤ N . Define a neighbourhood of xi by Nxi := { T ∈ Th | xi ∈ Lk (T ) } = supp(φi ) From the assumption that we have a regular family of triangulations it follows that |Nxi | ≤ M

(3.55)

with a constant M independent of i and of Th ∈ {Th }. Assume that (Kh )ij 6= 0. Using the fact that we have a nodal basis it follows that xj ∈ Nxi , i.e., xj is a lattice point in Nxi . Using (3.55) we get that the number of lattice points in Nxi can be bounded by a constant, say q, independent of i and of Th ∈ {Th }. Hence qrow (Kh ) ≤ q holds. The same arguments apply if one interchanges i and j. Note that the constant q depends on the degree k used in the finite element space Xkh,0 . The result of this lemma shows that the number of nonzero entries in the N × N -matrix Kh is bounded by qN . If h ↓ 0 then N → ∞ and the number of nonzero entries in Kh is proportional to N . Therefore the stiffness matrix is said to be sparse. The stiffness matrix is positive definite. Lemma 3.5.4 For the stiffness matrix defined in (3.54) we have: Kh + KhT is symmetric positive definite P k Proof. Take v ∈ RN , v 6= 0 and define u = N j=1 vj φj ∈ Xh,0 . Note that u 6= 0. Using the fact that the bilinear form is elliptic we get vT (Kh + KhT )v = 2vT Kh v = 2k(u, u) > 0 and thus the symmetric matrix Kh + KhT is positive definite. As a direct consequence we have: Corollary 3.5.5 If in (3.42) we have b = 0 then the bilinear form k(·, ·) is symmetric and the stiffness matrix Kh is symmetric positive definite. The stiffness matrix is ill-conditioned. We now derive sharp bounds for the condition number of the stiffness matrix. We restrict ourselves to the case b = 0 in (3.42). Then the stiffness matrix Kh is symmetric positive definite and its spectral condition number is given by κ(Kh ) = kKh k2 kKh−1 k2 =

λmax (Kh ) λmin (Kh )

We first give a result (due to [89]) on diagonal scaling of a symmetric positive definite matrix. We use the notation DA := diag(A) for a square matrix A. 80

Lemma 3.5.6 Let A ∈ RN ×N be a symmetric positive definite matrix and let q be such that qrow (A) ≤ q. For any nonsingular diagonal matrix D ∈ RN ×N we have −1

−1

κ(DA 2 ADA 2 ) ≤ q κ(DAD) −1

−1

Proof. Define A˜ = DA 2 ADA 2 and note that this matrix is symmetric positive definite and ˜ = I. Let A˜ = LLT be the Cholesky factorization of A. ˜ Let ei be the ith standard basis diag(A) N vector in R . Then we have ˜ i , ei i = A˜ii = 1 kLT ei k22 = hLT ei , LT ei i = hAe |A˜ij | = |hLT ej , LT ei i| ≤ kLT ej k2 kLT ei k2 = 1 ˜ 2 ≤ kAk ˜ ∞ ≤ q holds. For an arbitrary nonsingular diagonal matrix D ∈ RN ×N and thus kAk we have: ˜ = kAk ˜ 2 kA˜−1 k2 ≤ q kL−T L−1 k2 = q kL−1 k2 κ(A) 2

≤ q kL−1 D−1 k22 max |Djj |2 = q kL−1 D −1 k22 max kLT Dej k22 j

j

˜ 2 = q κ(D AD) ˜ ≤ q kL−1 D−1 k22 kLT Dk22 = q kD −1 A˜−1 D −1 k2 kDADk This shows that the desired result holds. The result in this lemma shows that for the sparse symmetric positive definite stiffness matrix Kh the symmetric scaling with the diagonal matrix DKh is in a certain sense optimal. Hence, we investigate the condition number of the scaled matrix 1

1

˜ h := D− 2 Kh D− 2 K Kh Kh

(3.56)

The following result is based on the analysis presented in [9]. Theorem 3.5.7 Suppose b = 0 in (3.42). Let {Th } be a regular family of triangulations consisting of n-simplices and for each Th let Kh be the stiffness matrix defined in (3.54). Then there exists a constant C independent of Th ∈ {Th } such that ( h CN (1 + log hmin ) if n = 2 ˜h) ≤ κ(K (3.57) 2 if n = 3 CN 3 with hmin = min{ hT | T ∈ Th }. Proof. We need the following embedding results (cf. [1], theorem 5.4): H 1 (Ω) ֒→ L6 (Ω) for n = 3 1

q

H (Ω) ֒→ L (Ω)

(3.58)

for n = 2, q > 0

(3.59)

For the embedding in (3.59) one can analyze the dependence of the norm of the embedding operator on q. This results in (cf. [9]): √ kukLq ≤ C qkuk1 for all u ∈ H 1 (Ω), q > 0 , (3.60) with a constant C independent of u and q. Note that if for c1 > 0, c2 we have c1 hDKh v, vi ≤ hKh v, vi ≤ c2 hDKh v, vi 81

for all v ∈ RN

(3.61)

˜ h ) ≤ c2 holds. then κ(K c1 P For v ∈ RN we define u := N i=1 vi φi . Note that each nodal basis function φi is associated to a grid point xi such that φi (xi ) = 1, φi (xj ) = 0 for j 6= i. The set of grid points (xi )1≤i≤N is denoted by V. We have X X  DKh ii u(xi )2 = k(φi , φi )u(xi )2 (3.62) hDKh v, vi = xi ∈V

xi ∈V

There are constants d1 > 0 and d2 such that d1 |φi |21 ≤ k(φi , φi ) ≤ d2 |φi |21 for all i. Using the lemmas 3.3.5 and 3.3.6 one can show that there are constants d˜1 > 0 and d˜2 independent of Th ∈ {Th } such that X (3.63) k(φi , φi ) ≤ d˜2 h−2 d˜1 h−2 T |T | for all T ∈ Th T |T | ≤ xi ∈T

Combination of (3.62) and (3.63) yields X X 2 2 ˆ2 h−2 kuk ≤ hD v, vi ≤ d h−2 dˆ1 K 0,T h T kuk0,T T

(3.64)

T ∈Th

T ∈Th

with constants dˆ1 > 0 and dˆ2 independent of Th ∈ {Th } and of v ∈ RN . For T ∈ Th let F (ˆ x) = Bˆ x + c be an affine mapping with F (Tˆ) = T , where Tˆ is the unit n-simplex. From X |u|21,T hKh v, vi = k(u, u) ≤ C |u|21 = C X

≤C

T ∈Th

X

≤C

T ∈Th

T ∈Th

|ˆ u|21,Tˆ h−2 T | det B|

≤C

X

T ∈Th

kˆ uk20,Tˆ h−2 T | det B|

(3.65)

2 h−2 T kuk0,T ≤ ChDKh v, vi

it follows that the second inequality in (3.61) holds with a constant c2 independent of v and of Th ∈ {Th }. We now consider the first inequality in (3.61). First note that for arbitrary α > 2, β ≤ 0 and w ∈ Lα (Ω) we have, using the discrete H¨older inequality: β  α−2 X β Z X X Z 2 2 hTα wα dx α ≤ hTα−2 α wα dx α T

T ∈Th

T ∈Th

T ∈Th

β α

≤ hmin β α

T

X  α−2 1 α kwk2Lα (Ω)

(3.66)

T ∈Th

≤ Chmin N

α−2 α

kwk2Lα (Ω)

We now distinguish n = 3 and n = 2. First we treat n = 3. We use the H¨older inequality to get X Z X −2 2 2 h−2 hT kuk0,T = C hDKh v, vi ≤ C T u dx T ∈Th



X

T ∈Th

≤C

X

Z

T ∈Th

T

h−2p dx T 3−2p

hT p

Z

1

p

Z

T ∈Th

T

82

u2q dx

T

u2q dx

T

1 q

1 q

1 1 ( + = 1) p q

Now take p = 23 , q = 3 and apply (3.66) with β = 0, α = 6. This yields hDKh v, vi ≤ C

Z

X

T ∈Th

u6 dx T

1

3

2

≤ C N 3 kuk2L6 (Ω)

We use the embedding result (3.58) and thus obtain 2

2

2

hDKh v, vi ≤ C N 3 kuk21 ≤ C N 3 k(u, u) = C N 3 hKh v, vi

(3.67)

Combination of the results in (3.65) and (3.67) proves the result in (3.57) for n = 3. We consider n = 2. Using the H¨older inequality it follows that for p > 1: kuk20,T ≤

Z

u2p dx T

1

p

Z

1 dx

T

1− 1

p

2− 2p

Z

≤ C hT

u2p dx

T

1

p

Using this we get hDKh v, vi ≤ C

X

T ∈Th

2 h−2 T kuk0,T



X

Z

−2 hT p

T ∈Th

u2p dx T

1

p

We apply (3.66) with α = 2p > 2, β = −4 and use the result in (3.60). This yields −2

p N hDKh v, vi ≤ C hmin

Note that |Ω| ≤

P

T ∈Th

p−1 p

−2

p N kuk2L2p (Ω) ≤ C phmin

p−1 p

kuk21

h2T ≤ CN h2 and thus N ≥ Ch−2 . We then obtain

hDKh v, vi ≤ C p

h  p2

hmin

N kuk21 ≤ C p

h  2p

hmin

N hKh v, vi

h The constant C can be chosen independent of p. For p = max{2, hmin } we have p h C(1 + log hmin ) and thus

hDKh v, vi ≤ C(1 + log

h )N hKh v, vi hmin

h hmin

2

p



(3.68)

Combination of the results in (3.65) and (3.68) proves the result (3.57) for n = 2. Remark 3.5.8 In [9] one can find an example which shows that for n = 2 the logarithmic term h ≤ σ for a constant can not be avoided. If the family of triangulations is quasi-uniform then hmin 2

σ independent of Th ∈ {Th } and furthermore, N = O(h−2 ) for n = 2, N 3 = O(h−2 ) for n = 3. ˜ h ) ≤ Ch−2 for n = 2, n = 3. Moreover, in this Hence, for the quasi-uniform case we have κ(K case the diagonal of Kh is well-conditioned, κ(DKh ) = O(1), and thus the scaling in (3.56) is not essential. We emphasize that for the general case of a regular (possibly non quasi-uniform) family of triangulations the scaling is essential: a result as in (3.57) does in general not hold for the matrix Kh . Finally we note that for the quasi-uniform case it is not difficult to prove that there exists a constant C > 0 independent of Th ∈ {Th } such that κ(Kh ) ≥ Ch−2 holds, both for n = 2 and n = 3.  83

3.5.1

Mass matrix

Apart from the stiffness matrix the so-called mass matrix also plays an important role in finite elements. This matrix depends on the choice of the basis in the finite element space but not on the bilinear form k(·, ·). Let (φi )1≤i≤N be the nodal basis of the finite element space Xkh,0 as defined in (3.52). The mass matrix Mh ∈ RN ×N is given by Z (3.69) φi φj dx = hφi , φj iL2 (Mh )ij = Ω

Note that this matrix is symmetric positive definite. As for the stiffness matrix we use a diagonal scaling with the diagonal matrix DMh := diag(Mh ). The next result shows that the scaled mass matrix is uniformly well-conditioned: Theorem 3.5.9 Let {Th } be a regular family of triangulations consisting of n-simplices and for each Th let Mh be the mass matrix defined in (3.69). Then there exists a constant C independent of Th ∈ {Th } such that −1

−1

κ(DMh2 Mh DM2h ) ≤ C

P Proof. Take Th ∈ {Th }. For v ∈ RN we define u := N i=1 vi φi . The constants that appear in the proof are independent of Th and of v. For each T ∈ Th = {T } let F : Tˆ → T be an affine transformation between the unit simplex Tˆ and T . Furthermore, u ˆ := u ◦ F . We use the index set IT := { i | T ⊂ supp(φi ) } Note that |IT | is uniformly bounded. We have

hMh v, vi = hu, uiL2 =

X

T ∈Th

|u|20,T

(3.70)

The nodal point associated to φi is denoted by xi (1 ≤ i ≤ N ). Using lemma 3.3.6 and the norm equivalence property in the space Pk (Tˆ) it follows that there exist constants c1 > 0 and c2 such that X c1 |u|20,T ≤ |T | u ˆ(zi )2 ≤ c2 |u|20,T zi ∈Lk (Tˆ)

and thus, using u(xi ) = vi , we get c1 |u|20,T ≤ |T |

X

i∈IT

vi2 ≤ c2 |u|20,T

Define di := |supp(φi )|. For i ∈ IT the quantity |T |d−1 is uniformly (w.r.t. T ) bounded both i from below by a strictly positive constant and from above (by 1). If we combine this with the result in (3.70) we get (with different constants c1 > 0, c2 ): X X di vi2 ≤ c2 hMh v, vi c1 hMh v, vi ≤ T ∈Th i∈IT

Hence c˜1 hMh v, vi ≤

N X i=1

di vi2 ≤ c˜2 hMh v, vi

84

with c˜1 > 0. Note (DMh )ii = hMh ei , ei i =

Z

supp(φi )

φ2i dx

thus there are constants cˆ1 > 0, cˆ2 independent of i such that cˆ1 di ≤ (DMh )ii ≤ cˆ2 di . We then obtain c1 hMh v, vi ≤ hDMh v, vi ≤ c2 hMh v, vi with c1 > 0. Thus the result is proved. Corollary 3.5.10 Let {Th } be a quasi-uniform family of triangulations consisting of n-simplices and for each Th let Mh be the mass matrix defined in (3.69). Then there exists a constant C independent of Th ∈ {Th } such that κ(Mh ) ≤ C Proof. Note that (Mh )ii =

Z

supp(φi )

φ2i dx

Using this in combination with the quasi-uniformity of {Th } it follows that the spectral condition number of DMh = diag(Mh ) is uniformly bounded. Now apply theorem 3.5.9.

3.6

Isoparametric finite elements

See Handbook Ciarlet, chapter 6.

3.7

Nonconforming finite elements

85

86

Chapter 4

Finite element discretization of a convection-diffusion problem 4.1

Introduction

In this chapter we consider the convection-diffusion boundary value problem −ε∆u + b · ∇u + cu = f

in Ω

u = 0 on ∂Ω

with a constant ε ∈ (0, 1], bi ∈ H 1 (Ω) ∩ L∞ (Ω) ∀ i, c ∈ L∞ (Ω) and f ∈ L2 (Ω). Furthermore, we also assume the smoothness property div b ∈ L∞ (Ω). The weak formulation of the problem is analyzed in section 2.5.2. We introduce Z Z f v dx. (4.1) ε∇u · ∇v + b · ∇uv + cuv dx , f (v) = k(u, v) = Ω



The weak formulation of this convection-diffusion problem is as follows: ( find u ∈ H01 (Ω) such that k(u, v) = f (v) for all v ∈ H01 (Ω).

(4.2)

In theorem 2.5.3 it is shown that if we assume 1 − div b + c ≥ 0 in 2



(4.3)

then this variational problem has a unique solution. In this chapter we treat the finite element discretization of the problem (4.2) for the convection-dominated case, i.e., ε ≪ maxi kbi kL∞ . Then the problem is singularly perturbed and the standard finite element method in general yields a poor approximation of the continuous solution. A significant improvement results if one introduces suitable artificial stabilization terms in the Galerkin discretization. For such a stabilization many different techniques exist, which leads to a large class of so-called stabilized finite element methods that are known in the literature. In section 4.4.2 we will explain and analyze one very popular method from this class, namely the streamline diffusion finite element method (SDFEM). In section 4.3 we consider a simple one-dimensional convection-diffusion equation to illustrate a few basic phenomena related to standard finite element discretization. To gain a better understanding of the (poor) behaviour of the standard finite element in the convectiondominated case we reconsider its discretization error analysis in section 4.4. 87

In the remainder of this section we briefly discuss the topic of regularity of the variational problem (4.2). In section 2.5.4 regularity results of the form kukm ≤ Ckf km−2 , m = 1, 2, . . ., with a constant C independent of f , are presented (with smoothness assumptions on the coefficients and on the domain). In the convection-dominated case it is of interest to analyze the dependence of the (stability) constant C on ε. An important result of this analysis is given in the following theorem. Theorem 4.1.1 Assume that 1 − div b(x) + c(x) ≥ β0 > 0 2

a.e. in Ω.

(4.4)

Then the solution u of (4.2) satisfies 1

ε 2 kuk1 + kukL2 ≤ Ckf kL2

(4.5)

with a constant C independent of f and of ε. Furthermore, if the regularity property u ∈ H 2 (Ω) holds, then the inequality 1 ε1 2 kuk2 ≤ Ckf kL2 (4.6) holds, with a constant C independent of f and of ε. Proof. Using partial integration, (4.4) and the Poincare-Friedrichs inequaliy we get Z  1 2 − div b + c u2 dx k(u, u) = ε|u|1 + 2 Ω  2 2 ≥ ε|u|1 + β0 kukL2 ≥ c εkuk21 + kuk2L2

with c > 0 independent of ε. In combination with

k(u, u) = f (u) ≤ kf kL2 kukL2 ≤ kf kL2 εkuk21 + kuk2L2 this yields 1

ε 2 kuk1 + kukL2 ≤

1

2

√ 1 √ 2 εkuk21 + kuk2L2 2 ≤ 2 c−1 kf kL2

and thus the result in (4.5) holds. If u ∈ H 2 (Ω) then the equality −ε∆u + b · ∇u + cu = f holds (where all derivatives are weak ones). Hence, using (4.5) and ε ≤ 1, we obtain 1

εk∆ukL2 ≤ kf kL2 + kbkL∞ kuk1 + kckL∞ kukL2 ≤ cε− 2 kf kL2 with a constant c independent of f and ε. We use the following result (lemma 8.1 in [57]) ∃ cˆ : kvk2 ≤ cˆk∆vkL2 + kvkL2

for all v ∈ H 2 (Ω).

Combination of this with (4.7) and (4.5) yields  1 1 1 ε1 2 kuk2 ≤ c ε1 2 k∆uk2 + ε1 2 kukL2 ≤ ckf kL2

and thus the result (4.6) holds.

88

(4.7)

Remark 4.1.2 The constants in (4.5) and (4.6) depend on β0 in (4.4). For the analysis the assumption β0 > 0 is essential. For the case β0 ≥ 0 a slight modification of the analysis results in a stability bound 1

ε 2 kuk1 +

p  1 −1 β0 kukL2 ≤ C min ε− 2 , β0 2 kf kL2 ,

with a constant C independent of f , ε and β0 .

The results in theorem 4.1.1 indicate that derivatives of the solution u (e.g., kuk1 ) grow if ε ↓ 0. This is due to the fact that in general in such a convection-diffusion problem there are boundary and internal layers in which the solution (or some of its derivatives) can vary exponentially. For an analysis of these boundary layers we refer to the literature, e.g. [76]. In certain special cases it is possible to obtain bounds on the derivative in streamline direction which are significantly 1 better than the general bound kuk1 ≤ Cε− 2 kf kL2 in (4.5). We now present two such results. The first one is for a relatively simple one-dimensional problem, whereas the second one is related to a two-dimensional convection-diffusion problem with Neumann boundary conditions on the outflow boundary. Theorem 4.1.3 For f ∈ L2 ([0, 1]) consider the following problem (with weak derivatives): −εu′′ (x) + u′ (x) = f (x)

for x ∈ (0, 1),

u(0) = u(1) = 0

with ε ∈ (0, 1]. The unique solution u satifies: max{kukL∞ , εku′ kL∞ } ≤ (1 − e−1 )−1 kf kL1 , −1 −1



ku kL1 ≤ 2(1 − e

)

kf kL1 .

(4.8) (4.9)

Proof. We reformulate the differential equation in the equivalent form ′ 1 e−x/ε u′ (x) = − e−x/ε f (x) =: g(x). ε

In textbooks on ordinary differential equations (e.g. [95]) it is shown that the solution can be represented using a Green’s function. For this we introduce the two fundamental solutions u1 (x) = 1 − ex/ε and u2 (x) = 1 − e(x−1)/ε (note: u1 and u2 satisfy the homogeneous differential equation and u1 (0) = u2 (1) = 0). The solution is given by ( Z 1 u1 (t)u2 (x) if t ≤ x, ε u(x) = G(x, t)g(t) dt, G(x, t) := 1 − e−1/ε u1 (x)u2 (t) if t > x. 0 We use C := (1 − e−1/ε )−1 . Note that C ≤ (1 − e−1 )−1 for ε ∈ (0, 1]. Using g(t) = − 1ε e−t/ε f (t) we get (for x ∈ [0, 1]) u(x) = Cu2 (x)

Z

x 0

(1 − e−t/ε )f (t) dt − Cu1 (x)e−x/ε

Note that |u2 (x)| ≤ 1, |u1 (x)|e−x/ε ≤ 1. We obtain |u(x)| ≤ C

Z

x 0

|f (t)| dt + C 89

Z

x

Z

x

1

 e(x−t)/ε 1 − e(t−1)/ε f (t) dt.

1

|f (t)| dt = Ckf kL1 .

(4.10)

From ′

1  e−t/ε 1 − e(t−1)/ε f (t) dt (1 − e )f (t) dt − x 0 Z x Z 1  C C = − e(x−1)/ε (1 − e−t/ε )f (t) dt + e(x−t)/ε 1 − e(t−1)/ε f (t) dt ε ε x 0

u (x) =

we get

Cu′2 (x)

Z

x

−t/ε

C |u (x)| ≤ ε ′

Z

0

x

Cu′1 (x)

C |f (t)| dt + ε

Z

Z

1 x

|f (t)| dt =

C kf kL1 . ε

(4.11)

Combination of (4.10) and (4.11) proves the result in (4.8). We also have Z 1 Z Z C 1 (x−1)/ε x ′ |u (x)| dx ≤ e (1 − e−t/ε )|f (t)| dt dx ε 0 0 0 Z Z  C 1 x/ε 1 −t/ε + e e 1 − e(t−1)/ε |f (t)| dt dx. ε 0 x

For the first term on the right handside we have Z Z Z 1 1 (x−1)/ε 1 1 (x−1)/ε x −t/ε e e F (x) dx (1 − e )|f (t)| dtdx =: ε 0 ε 0 0 Z 1  e(x−1)/ε 1 − e−x/ε |f (x)| dx = F (1) − 0 Z 1 |f (t)| dt = kf kL1 . ≤ F (1) ≤ 0

The second term can be treated similarly. This then yields Z 1 |u′ (x)| dx ≤ 2Ckf kL1 0

and thus the result in (4.9). Note that in (4.9) we have a bound on the derivative measured in the L1 -norm (which is weaker than the L2 -norm) that is independent of ε. Similar results for a more general one-dimensional convection-diffusion problem are given in [76] (section 1.1.2) and [38]. We now present a result for a two-dimensional problem. Theorem 4.1.4 For f ∈ L2 (Ω), Ω := (0, 1)2 and a constant c ≥ 0, consider the convectiondiffusion problem −ε∆u + ux + cu = f ∂u =0 ∂x u=0

in



on

ΓE := { (x, y) ∈ Ω | x = 1 }

on

∂Ω \ ΓE .

(4.12)

This problem has a unique solution u ∈ H 2 (Ω) and the inequality ckukL2 + kux kL2 ≤ 2kf kL2 holds. 90

(4.13)

Proof. First note that the weak formulation of this problem has a unique solution u ∈ H 1 (Ω). Using the fact that Ω is convex it follows that u ∈ H 2 (Ω) holds and thus the problem (with weak derivatives) in (4.12) has a unique solution u ∈ H 2 (Ω). From the differential equation we get kux k2L2 = hf, ux iL2 + εhuyy , ux iL2 + εhuxx , ux iL2 − chu, ux iL2 . Using Green’s formulas and the boundary conditions for the solution u we obtain, with ΓW := { (x, y) ∈ Ω | x = 0 }: huyy , ux iL2 huxx , ux iL2 hu, ux iL2

Z 1 ∂ 1 2 = − h (uy ) , 1iL2 = − u2 dy ≤ 0 2 ∂x 2 ΓE y Z 1 1 ∂ u2 dy ≤ 0 = h (ux )2 , 1iL2 = − 2 ∂x 2 ΓW x Z u2 dy − hux , uiL2 and thus hu, ux iL2 ≥ 0. = ΓE

Hence, we have kux k2L2 ≤ hf, ux iL2 ≤ kf kL2 kux kL2 .

(4.14)

Testing the differential equation with u (instead of ux ) yields c kuk2L2 = hf, uiL2 − εk∇uk2L2 − hux , uiL2 . This yields c kuk2L2 ≤ kf kL2 kukL2 .

(4.15)

Combination of (4.14) and (4.15) completes the proof. We note that for this problem a similar ε-independent bound for the derivative uy does not 1 hold. A sharp inequality of the form kuy kL2 ≤ ε− 2 kf kL2 can be shown. Furthermore, for the uniform bound on kux kL2 in (4.13) to hold it is essential that we consider the convection-diffusion problem with Neumann boundary conditions at the outflow boundary. Due to this there is no exponential boundary layer at the outflow boundary.

4.2

A variant of the Cea-lemma

In the analysis of the finite element method in chapter 3 the basic Cea-lemma plays an important role. In the analysis of finite element methods for convection-dominated elliptic problems we will need a variant of this lemma that is presented in this section and is based on a basic lemma given in [94]: Lemma 4.2.1 Let U be a normed linear space with norm k · k and V a subspace of U . Let s(·, ·) be a continuous bilinear forms on U × V and t(·, ·) a bilinear form on U × V such that for all u ∈ U the functional v → t(u, v) is bounded on V . Define r := s + t and assume that r is V -elliptic. Let c0 > 0 and c1 be such that r(v, v) ≥ c0 kvk2

for all v ∈ V

s(u, v) ≤ c1 kuk kvk

for all u ∈ U, v ∈ V.

91

(4.16) (4.17)

t(u,v) kvk .

On U we define the semi-norm kuk∗ := supv∈V

Then the following holds:

 |r(u, v)| ≤ max{c1 , 1} kuk + kuk∗ kvk for all u ∈ U, v ∈ V  r(u, v) c0 sup for all u ∈ V. ≥ kuk + kuk∗ kvk 1 + c0 + c1 v∈V

(4.18) (4.19)

Proof. For u ∈ U, v ∈ V we have

|r(u, v)| ≤ |s(u, v)| + |t(u, v)| ≤ c1 kukkvk + kuk∗ kvk

 ≤ max{c1 , 1} kuk + kuk∗ kvk

and thus (4.18) holds. We now consider (4.19). Take a fixed u ∈ V and θ ∈ (0, 1). Then there exists vθ ∈ V such that kvθ k = 1 and θkuk∗ ≤ t(u, vθ ). Note that r(u, vθ ) = s(u, vθ ) + t(u, vθ ) ≥ θkuk∗ − c1 kuk

and thus for wθ := u +

c0 kuk 1+c1 vθ

∈ V we obtain c0 kuk r(u, vθ ) 1 + c1  c0 kuk θkuk∗ − c1 kuk ≥ c0 kuk2 + 1 + c1  c0 kuk + θkuk∗ kuk. = 1 + c1

r(u, wθ ) = r(u, u) +

Furthermore,

(4.20)

c0 kuk 1 + c0 + c1 = kuk 1 + c1 1 + c1 holds. Combination of (4.20) and (4.21) yields  c0 r(u, wθ ) ≥ kuk + θkuk∗ . kwθ k 1 + c0 + c1 kwθ k ≤ kuk +

(4.21)

Because wθ ∈ V and θ ∈ (0, 1) is arbitrary this proves the result in (4.19). We emphasize that the seminorm k · k∗ on U depends on the bilinear form t(·, ·) and on the subspace V . Also note that in (4.18) we have a boundedness result on U × V , whereas in (4.19) we have an infsup bound on V × V . Using this lemma we can derive the following variant of the Cea-lemma (theorem 3.1.1). Theorem 4.2.2 Let the conditions as in lemma 4.2.1 be satisfied. Take f ∈ U ′ and assume that there exist u ∈ U , v ∈ V such that r(u, w) = f (w) r(v, w) = f (w)

for all w ∈ U

(4.22a)

for all w ∈ V.

(4.22b)

Then the following holds: ku − vk + ku − vk∗ ≤ C inf ku − wk + ku − wk∗ w∈V

with C := 1 + max{c1 , 1}

92

1 + c0 + c1 . c0



(4.23) (4.24)

Proof. Let w ∈ V be arbitrary. Using (4.18), (4.19) and the Galerkin property r(u−v, z) = 0 for all z ∈ V we get 1 + c0 + c1 1 + c0 + c1 r(v − w, z) r(u − w, z) = sup sup c0 kzk c0 kzk z∈V z∈V  1 + c0 + c1 ≤ max{c1 , 1} ku − wk + ku − wk∗ . c0

kv − wk + kv − wk∗ ≤

Using this and the triangle inequality

ku − vk + ku − vk∗ ≤ kv − wk + kv − wk∗ + ku − wk + ku − wk∗ we obtain the result. In this theorem there are significant differences compared to the Cea-lemma. For example, in theorem 4.2.2 we do not assume that U (or V ) is a Hilbert space and we do not assume an infsup property for the bilinear form r(·, ·) on U × V (only on V × V , cf. (4.19)). On the other hand, in theorem 4.2.2 we assume existence of solutions in U and V , cf. (4.22), whereas in the Cea-lemma existence and uniqueness of solutions follows from assumptions on continuity and infsup properties of the bilinear form.

4.3

A one-dimensional hyperbolic problem and its finite element discretization

If in a convection-diffusion problem with a bilinear form as in (4.1) one formally takes ε = 0 this results in a hyperbolic differential operator. In this section we give a detailed treatment of a very simple one-dimensional hyperbolic problem. We show well-posedness of this problem and explain why a standard finite element discretization method suffers from an instability. Furthermore, a stabilization technique is introduced that results in a finite element method with much better properties. In section 4.4 essentially the same analysis is applied to the finite element discretization of the convection-diffusion problem (4.1)-(4.2). We consider the hyperbolic problem bu′ (x) + u(x) = f (x), u(0) = 0.

x ∈ I := (0, 1), b > 0 a given constant,

(4.25)

For the weak formulation we introduce the Hilbert spaces H1 = { v ∈ H 1 (I) | v(0) = 0 }, H2 = L2 (I). The norm on H1 is kvk21 = kv ′ k2L2 + kvk2L2 . We define the bilinear form k(u, v) =

Z

1

bu′ v + uv dx

0

on H1 × H2 . Theorem 4.3.1 Let f ∈ L2 (I). There exists a unique u ∈ H1 such that k(u, v) = hf, viL2

for all v ∈ H2 .

Moreover, kuk1 ≤ ckf kL2 holds with c independent of f . 93

(4.26)

Proof. We apply theorem 2.3.1. The bilinear form k(·, ·) is continuous on H1 × H2 : √ |k(u, v)| ≤ bku′ kL2 kvkL2 + kukL2 kvkL2 ≤ 2 max{1, b}kuk1 kvkL2 , u ∈ H1 , v ∈ H2 . For u ∈ H1 we have sup v∈H2

1 k(u, v) hbu′ + u, viL2 = sup = kbu′ + ukL2 = b2 ku′ k2L2 + kuk2L2 + 2bhu′ , uiL2 2 . kvkL2 kvkL2 v∈H2

Using u(0) = 0 we get hu′ , uiL2 = u(1)2 − hu, u′ iL2 and thus hu′ , uiL2 ≥ 0. Hence we get sup v∈H2

k(u, v) ≥ min{1, b}kuk1 kvkL2

for all u ∈ H1 ,

i.e., the infsup condition (2.36) in theorem 2.3.1 is satisfied. We now consider the condition (2.37) Let v ∈ H2 be such that k(u, v) = 0 for all u ∈ H1 . This implies R 1 ′ in this theorem. R1 b 0 u v dx = − 0 uv dx for all u ∈ C0∞ (I) and thus v ∈ H 1 (I) with v ′ = 1b v (weak derivative). Using this we obtain Z 1 Z 1 Z 1 Z 1 ′ ′ uv dx for all u ∈ H1 , uv dx = bu(1)v(1) − u v dx = bu(1)v(1) − b uv dx = b − 0

0

0

0

and thus u(1)v(1) = 0 for all u ∈ H1 . This implies v(1) = 0. Using this and bv ′ − v = 0 yields kvk2L2 = hv, viL2 + hbv ′ − v, viL2 = bhv ′ , viL2 =

 b b v(1)2 − v(0)2 = − v(0)2 ≤ 0. 2 2

This implies v = 0 and thus condition (2.37) is satisfied. Application of theorem 2.3.1 now yields existence and uniqueness of a solution u ∈ H1 and kuk1 ≤ c sup

v∈H2

hf, viL2 = c kf kL2 , kvkL2

which completes the proof. For the discretization of this problem we use a Galerkin method with a standard finite element space. To simplify the notation we use a uniform grid and consider only linear finite elements. Let h = N1 , xi = ih, 0 ≤ i ≤ N , and Xh = { v ∈ C(I) |v(0) = 0, v|[xi ,xi+1 ] ∈ P1 for 0 ≤ i ≤ N − 1 }. Note that Xh ⊂ H1 and Xh ⊂ H2 . The discretization is as follows: determine uh ∈ Xh such that k(uh , vh ) = hf, vh iL2

for all vh ∈ Xh .

(4.27)

For the error analysis of this method we apply the Cea-lemma (theorem 3.1.1). The conditions (3.2), (3.3), (3.4) in theorem 3.1.1 have been shown to hold in the proof of theorem 4.3.1. It remains to verify the discrete infsup condition: ∃ εh > 0 :

sup vh ∈Xh

k(uh , vh ) ≥ εh kuh k1 kvh kL2

Related this we give the following lemma: 94

for all

uh ∈ X h .

(4.28)

Lemma 4.3.2 The infsup property (4.28) holds with εh = c h, c > 0 independent of h. Proof. For uh ∈ Xh we have hu′h , uh iL2 = 21 uh (1)2 ≥ 0 and thus sup vh ∈Xh

bhu′h , uh iL2 + kuh k2L2 k(uh , uh ) k(uh , vh ) ≥ = ≥ kuh kL2 . kvh kL2 kuh kL2 kuh kL2

Now apply an inverse inequality, cf. lemma 3.3.11, kvh′ kL2 ≤ ch−1 kvh kL2 for all vh ∈ Xh , resulting in kuh kL2 ≥ 12 kuh kL2 + chku′h kL2 ≥ ch kuk1 with a constant c > 0 independent of h.

Remark 4.3.3 The result in the previous lemma is sharp in the sense that the best (i.e. largest) infsup constant εh in (4.28) in general satisfies εh ≤ c h. This can be deduced from a numerical experiment or a technical analytical derivation. Here we present results of a numerical experiment. We consider the continuous and discrete problems as in (4.26), (4.27) with b = 1. Discretization of the bilinear forms (u, v) → hu′ , viL2 , (u, v) → hu, viL2 and (u, v) → hu′ , v ′ iL2 in the finite element space Xh (with respect to the nodal basis) results in N × N -matrices 

0 1 −1 0 1 1  . . .. .. Ch =  2  ∅ −1 

2 −1 −1 2 −1 1  .. .. Ah =  . . h  ∅ −1





1 6

1

1  6  2h      , Mh =  3   0 1 −1 1  ∅ .. .

1 .. . ∅

1 6

..

1 6

.

∅ .. . 1 1 6



   ,  1

(4.29)

6 1 2

∅ .. .

   .  2 −1 −1 1

Note that −1

kMh 2 (Ch + Mh )xk2 hCh x + Mh x, yi2 k(uh , vh ) = inf sup inf sup 1 1 = inf 1 uh ∈Xh vh ∈Xh kuh k1 kvh kL2 x∈RN y∈RN x∈RN k(Ah + Mh ) 2 xk2 h(Ah + Mh )x, xi22 hMh y, yi22 −1

1

kMh 2 (Ch + Mh )(Ah + Mh )− 2 zk2 = inf kzk2 z∈RN 1

1

= k(Ah + Mh ) 2 (Ch + Mh )−1 Mh2 k−1 2 =: εh . 1 1 A (MATLAB) computation of the quantity q(h) := εh /h yields: q( 10 ) = 1.3944, q( 50 ) = 1  1.3987, q( 250 ) = 1.3988. Hence, in this case εh is proportional to h.

Using the infsup result of the previous lemma we obtain the following corollaries. Corollary 4.3.4 The discrete problem (4.27) has a unique solution uh ∈ Xh and the following stability bound holds: kuh kL2 ≤ kf kL2 . (4.30) 95

Proof. Existence and uniqueness of a solution follows from continuity and ellipticity of the bilinear form k(·, ·) on Xh × Xh . From kuh k2L2 ≤ k(uh , uh ) = hf, uh iL2 ≤ kf kL2 kuh kL2 we obtain the stability result. Note that this stability result for the discrete problem is weaker than the one for the continuous problem in theorem 4.3.1. Corollary 4.3.5 Let u ∈ H1 and uh ∈ Xh be the solutions of (4.26) and (4.27), respectively. From theorem 3.1.1 we obtain the error bound ku − uh k1 ≤ c h−1 inf ku − vh k1 , vh ∈Xh

with a constant c independent of h. If u ∈ H 2 (I) holds, we obtain ku − uh k1 ≤ cku′′ kL2 .

(4.31)

Remark 4.3.6 Experiment. Is the result in (4.31) sharp ? Expectation (for suitable f ): |u − uh |1 ∼ c

ku − uh kL2 ≤ ch  These results show that, due to the deterioration of the infsup stability constant εh for h ↓ 0, the discretization with standard linear finite elements is not satisfactory. A heuristic explanation for this instability phenomenom can be given via the matrix Ch in (4.29) that represents the finite element discretization of u → u′ . The differential equation (in strong form) is u′ = − 1b u + 1b f on (0, 1), which is a first order ordinary differential equation. The initial condition is given by  1 u(0) = 0. For discretization of u′ (xi ) we use (cf. Ch ) the central difference 2h u(xi+1 )−u(xi−1 ) . Thus for the approximation of u′ at “time” x = xi we use u at the future “time” x = xi+1 , which is an unnatural approach. We now turn to the question how a better finite element discretization for this very simple problem can be developed. One possibility is to use suitable different finite element spaces H1,h ⊂ H1 and H2,h ⊂ H2 . This leads to a so-called Petrov-Galerkin method. We do not treat such methods here, but refer to the literature, for example [76]. From an implementation point of view it is convenient to use only one finite element space instead of two different ones. We will show how a satisfactory discretization with (only) the space Xh of linear finite elements can be obtained by using the concept of stabilization. A first stabilized method is based on the following observation. If u ∈ H1 satisfies (4.26), then Z 1

0

(bu′ + u)bv ′ dx = hf, bv ′ iL2

for all v ∈ H1

(4.32)

also holds. By adding this equation to the one in (4.26) it follows that the solution u ∈ H1 satisfies (4.33) hbu′ + u, bv ′ + viL2 = hf, bv ′ + viL2 for all v ∈ H1 . 96

Based on this we introduce the notation k1 (u, v) := hbu′ + u, bv ′ + viL2 , f1 (v) := hf, bv ′ + viL2 ,

u, v ∈ H1 ,

v ∈ H1 .

The bilinear form k1 (·, ·) is continuous on H1 × H1 and f1 is continuous on H1 . Moreover, k1 (·, ·) is symmetric and using hv ′ , viL2 = 12 v(1)2 ≥ 0 for v ∈ H1 , we get k1 (v, v) = b2 kv ′ k2L2 + kvk2L2 + 2bhv ′ , viL2 ≥ min{b2 , 1}kvk21 ,

for v ∈ H1 ,

(4.34)

and thus k1 (·, ·) is elliptic on H1 . The discrete problem is as follows: determine uh ∈ Xh such that k1 (uh , vh ) = f1 (vh ) for all vh ∈ Xh .

(4.35)

Due to Xh ⊂ H1 and the H1 -ellipticity of k1 (·, ·) this problem has a unique solution uh ∈ Xh . For the discretization error we obtain the following result. Lemma 4.3.7 Let u ∈ H1 be the solution of (4.26) (or (4.33)) and uh the solution of (4.35). The following holds: ku − uh k1 ≤ c inf ku − vh k1 vh ∈Xh

with a constant c independent of h. If u ∈

H 2 (I)

then

ku − uh k1 ≤ chku′′ kL2

(4.36)

holds with a constant c independent of h. Proof. Apply corollary 3.1.2. √ From (4.34) and f1 (v) ≤ max{b, 1} 2kf kL2 kvk1 it follows that the discrete problem has the stability property kuh k1 ≤ c kf kL2 which is similar to the stability property of the continuous solution given in theorem 4.3.1 and (significantly) better than the one for the original discrete problem, cf. (4.30). This explains why the discretization in (4.35) is called a stabilized finite element method. From (4.34) one can see that k1 (·, ·) contains an (artificial) diffusion term that is not present in k(·, ·). Note that the bounds in lemma 4.3.7 are significantly better than the ones in corollary 4.3.5. If u ∈ H 2 (I) then from (4.36) we have the L2 -error bound ku − uh kL2 ≤ c hku′′ kL2 .

(4.37)

In section 3.4.2, for elliptic problems, a duality argument is used to derive an L2 -error bound of the order h2 for linear finite elements. Such a duality argument can not be applied to hyperbolic problems due to the fact that the H 2 -regularity assumption that is crucial in the analysis in section 3.4.2 is usually not satisfied for hyperbolic problems. In the following remark this is made clear for the simple hyperbolic problem that is treated in this section. −βx − 1, with a Remark 4.3.8 Consider the problem in (4.25) with b = 1 and f (x) = 1−β β e constant β ≥ 1. Substitution shows that the solution is given by u(x) = β1 (e−βx − 1). Note that u, f ∈ C ∞ (I). Further elementary computations yield 1p kf kL2 ≤ 2, ku′′ kL2 ≥ β. 4

Hence a bound ku′′ kL2 ≤ c kf kL2 with a constant c independent of f ∈ L2 (I) can not hold, i.e., this problem is not H 2 -regular.  97

We now generalize the stabilized finite element method presented above and show that using this generalization we can derive a method with an H 1 -error bound of the order h (as in (4.36)) 1 and an improved L2 -error bound of the order h1 2 . This generalization is obtained by adding δ-times, with δ a parameter in [0, 1], the equation in (4.32) to the one in (4.26). This shows that the solution u ∈ H1 of (4.26) also satisfies kδ (u, v) = fδ (v) ′

for all v ∈ H1 , with ′

(4.38a) ′

fδ (v) := hf, δbv + viL2 .

kδ (u, v) := hbu + u, δbv + viL2 ,

(4.38b)

Note that for δ = 0 we have the original variational formulation and that δ = 1 results in the problem (4.33). For δ 6= 1 the bilinear form kδ (·, ·) is not symmetric. For all δ ∈ [0, 1] we have fδ ∈ H1′ . The discrete problem is as follows: determine uh ∈ Xh such that kδ (uh , vh ) = fδ (vh ) for all vh ∈ Xh .

(4.39)

The discrete solution uh (if it exists) depends on δ. We investigate how δ can be chosen such that the discretization error (bound) is minimized. For this analysis we use the abstract results in section 4.2. We write kδ (u, v) = sδ (u, v) + tδ (u, v), u, v ∈ H1 , with

tδ (u, v) = δhu, bv ′ iL2 + hbu′ , viL2 .

sδ (u, v) = δhbu′ , bv ′ iL2 + hu, viL2 ,

The bilinear form sδ (·, ·) defines a scalar product on H1 . We introduce the norm and the seminorm (cf. lemma 4.2.1) 1

|||u|||δ := sδ (u, u) 2 , Note that

and

kuk∗,h,δ := sup

vh ∈Xh

tδ (u, vh ) , |||vh |||δ

for u ∈ H1 .

1 tδ (u, u) = b(δ + 1)hu′ , uiL2 = b(δ + 1)u(1)2 ≥ 0 2

for all u ∈ H1 ,

(4.40)

√ 1 √ √ (b δ|u|1 + kukL2 ) ≤ |||u|||δ ≤ b δ|u|1 + kukL2 2

for all u ∈ H1 .

(4.41)

Lemma 4.3.9 For all δ ∈ [0, 1] the continuous problem (4.38) and the discrete problem (4.39) have unique solutions u and uh , respectively. The discrete solution satisfies the stability bound √ b δ|uh |1 + kuh kL2 ≤ 2kf kL2 . Proof. For δ = 0 the existence and uniqueness of solutions is given in theorem 4.3.1 and corollary 4.3.4. The stability result for δ = 0 also follows from corollary 4.3.4. For δ > 0 we obtain, using (4.40), kδ (u, u) = δb2 ku′ k2L2 + kuk2L2 + tδ (u, u) ≥ γkuk21

for u ∈ H1 , with γ := min{δb2 , 1} > 0,

and kδ (u, v) ≤ (bku′ kL2 + kukL2 )(δbkv ′ kL2 + kvkL2 ) ≤ c kuk1 kvk1 98

for u, v ∈ H1 .

Hence kδ (·, ·) is elliptic and continuous on H1 . The Lax-Milgram lemma implies that both the continuous and the discrete problem have a unique solution. For v ∈ H1 we have, cf. (4.40), kδ (v, v) = sδ (v, v) + tδ (v, v) ≥ |||v|||2δ .

(4.42)

Furthermore, using δ ≤ 1 we get

√  √ fδ (v) ≤ kf kL2 (bδ|v|1 + kvkL2 ) ≤ kf kL2 b δ|v|1 + kvkL2 ≤ 2kf kL2 |||v|||δ √ This yields |||uh |||2δ ≤ kδ (uh , uh ) = fδ (uh ) ≤ 2kf kL2 |||uh |||δ and thus √ √ b δ|uh |1 + kuh kL2 ≤ 2|||uh |||δ ≤ 2kf kL2 ,

for v ∈ H1 .

which completes the proof. Lemma 4.3.10 Let u and uh be the solutions of (4.38) and (4.39), respectively. The following error bound holds:  (4.43) |||u − uh |||δ + ku − uh k∗,h,δ ≤ 4 inf |||u − vh |||δ + ku − vh k∗,h,δ . vh ∈Xh

Proof. To derive this error bound we use theorem 4.2.2 with U = H1 , V = Xh , r(·, ·) = kδ (·, ·), k · k = ||| · |||δ , k · k∗ = k · k∗,h,δ . We verify the corresponding conditions in lemma 4.2.1. The bilinear form sδ (·, ·) is continuous on U = H1 : sδ (u, v) ≤ |||u|||δ |||v|||δ . Hence (4.17) is satisfied with c1 = 1. For u ∈ H1 the functional v → tδ (u, v) is clearly continuous on V = Xh . From (4.42) it follows that condition (4.16) is satisfied with c0 = 1. Application of theorem 4.2.2 yields  |||u − uh |||δ + ku − uh k∗,h,δ ≤ C inf |||u − vh |||δ + ku − vh k∗,h,δ , vh ∈Xh

with C = 1 + max{c1 , 1} 1+cc00+c1 = 4.

For the Sobolev space H1 we have H1 ⊂ C(I) and thus the nodal interpolation IX : H1 → C(I),

(IX u)(xi ) = u(xi ), 0 ≤ i ≤ N,

is well-defined. Theorem 4.3.11 Let u ∈ H1 and uh ∈ Xh be the solutions of (4.38) and (4.39), respectively. For all δ ∈ [0, 1] the error bound   √ √ b 1 b δ|u − uh |1 + ku − uh kL2 ≤ C b δ|u − IX u|1 + (1 + min{ , √ })ku − IX ukL2 h δ holds with a constant C independent of h, δ, b and u. Proof. From lemma 4.3.10 and (4.41) we obtain  √ √  √ b δ|u − uh |1 + ku − uh kL2 ≤ 4 2 b δ|u − IX u|1 + ku − IX ukL2 + ku − IX uk∗,h,δ .

Define eh := u − IX u, and note that eh (0) = eh (1) = 0. Thus we we have he′h , vh iL2 = −heh , vh′ iL2 99

for all vh ∈ Xh .

(4.44)

Using this and the inverse inequality |vh |1 ≤ ch−1 kvh kL2 for all vh ∈ Xh we obtain b(1 − δ)heh , vh′ iL2 tδ (eh , vh ) = sup 2 12 vh ∈Xh (δb2 |vh |2 vh ∈Xh |||vh |||δ 1 + kvh kL2 )  1 b bkeh kL2 |vh |1 ≤ c min √ , ≤ sup keh kL2 , 1 δ h vh ∈Xh (δ + ch2 b−2 ) 2 b|vh |1

keh k∗,h,δ = sup

(4.45)

with c independent of δ, h and b. The result follows from combination of (4.44) and (4.45). Corollary 4.3.12 Let u and uh be as in theorem 4.3.11 and assume that u ∈ H 2 (I). Then the following error bound holds for δ ∈ [0, 1]: √ √  h  b δ|u − uh |1 + ku − uh kL2 ≤ Ch h + b δ + b min{1, √ } ku′′ kL2 b δ

(4.46)

with a constant C independent of h, δ, b and u.

The term between square brackets in (4.46) is minimal for h ≤ b if we take δ = δopt = We consider three cases: δ = 0 (no stabilization): better than the one that δ = 1 (full stabilization): lemma 4.3.7. δ = δopt (optimal value):

h . b

(4.47)

Then we get ku − uh kL2 ≤ chku′′ kL2 . This bound for the L2 -error is follows from (4.31), cf. also remark 4.3.6. Then we obtain ku − uh k1 ≤ chku′′ kL2 , which is the same bound as in This results in

|u − uh |1 ≤ chku′′ kL2 ,

1

ku − uh kL2 ≤ ch1 2 ku′′ kL2 .

(4.48)

Hence, the bound for the norm | · |1 is the same as for δ = 1, but we have an improvement in the L2 -error bound. From these discretization error results and from the stability result in lemma 4.3.9 we see that δ = 0 leads to poor accuracy and poor stability properties. The best stability property is for the case δ = 1. A somewhat weaker stability property but a better approximation property is obtained for δ = δopt . For δ = δopt we have a good compromise between sufficient stability and high approximation quality. Finding such a compromise is a topic that is important in all stabilized finite element methods. Remark 4.3.13 Experiments to show dependence of errors on δ. Is the bound in (4.46) sharp?

4.4

The convection-diffusion problem reconsidered

In chapter 3 we treated the finite element discretization of the variational problem in (4.2). Under the assumption (4.3) we have k(u, u) ≥ ε|u|21

for all u ∈ H01 (Ω),

k(u, v) ≤ M |u|1 |v|1

for all u, v ∈

100

H01 (Ω),

(4.49) (4.50)

with a constant M independent of ε. Now recall the standard Galerkin discretization with linear finite elements, i.e.: uh ∈ X1h,0 such that k(uh , vh ) = f (vh ) for all vh ∈ X1h,0 . From the analysis in chapter 3 (corollary 3.1.2 and corollary 3.3.10) we obtain the discretization error bound M M |u − uh |1 ≤ inf |u − vh |1 ≤ C h|u|2 , (4.51) ε vh ∈X1h,0 ε provided u ∈ H 2 (Ω). The constant C is independent of h and ε. We can apply the duality argument used in section 3.4.2 to derive a bound for the error in the L2 -norm. The dual problem has the same form as in(4.1)-(4.2) but with b replaced by −b. Assume that (4.4) also holds with b replaced by −b and that the solution of the dual problem lies in H 2 (Ω) (the latter is true if Ω is convex). Then a regularity result as in (4.6) holds for the solution of the dual problem. Using this we obtain 1

ku − uh kL2 ≤ Cε−2 2 h2 |u|2 ,

(4.52)

with a constant C independent of h and ε. Even if |u|2 remains bounded for ε ↓ 0 (no boundary layers) the bounds in (4.51) and (4.52) tend to infinity for ε ↓ 0. These bounds, however, are too pessimistic and do not reflect important phenomena that are observed in numerical experiments. For example, from experiments it is known that the standard linear finite element discretization yields satisfactory results if h ≈ ε and ε ↓ 0. This, however, is not predicted by the bounds in (4.51) and (4.52). Below we present a refined analysis based on the same approach as in section 4.3 which results in better bounds for the discretization error. These bounds reflect important properties of the standard Galerkin finite element discretization applied to the convection-diffusion problem in (4.2) and show the effect of introducing a stabilization. In section 4.4.1 we consider wellposedness of the problem in (4.2). In section 4.4.2 we analyze a stabilized finite element method for this problem.

4.4.1

Well-posedness of the continuous problem

The (sharp) result in (4.49) shows that in the norm | · |1 (or equivalently k · k1 ) the convectiondiffusion problem is not uniformly well-posed for ε ↓ 0. In this section we introduce other norms in which the problem is uniformly well-posed. For a better understanding of the analysis we first present results for a two-dimensional hyperbolic problem: Remark 4.4.1 We consider a two-dimensional variant of the hyperbolic problem treated in section 4.3. Let Ω := (0, 1)2 , b := (1 0)T , f ∈ L2 (Ω). The continuous problem is as follows: determine u such that b · ∇u + u = f

in Ω

(4.53)

u = 0 on ΓW := {(x, y) ∈ Ω | x = 0 }.

Let H1 be the space of functions u ∈ L2 (Ω) for which the weak derivative ux = H1 := { u ∈ L2 (Ω) | ux ∈ L2 (Ω) }. This space with the scalar product hu, viH1 = hu, viL2 + hb · ∇u, b · ∇viL2 = hu, viL2 + hux , vx iL2 101

∂u ∂x

exists: (4.54)

is a Hilbert space (follows from the same arguments as for the Sobolev space H 1 (Ω)). Take H2 := L2 (Ω). The bilinear form corresponding to the problem in (4.53) is k(u, v) = hb · ∇u, viL2 + hu, viL2 ,

u ∈ H1 , v ∈ H2 .

Using the same arguments as in the proof of theorem 4.3.1 one can show that there exists a unique u ∈ H1 such that k(u, v) = hf, viL2

for all v ∈ H2

and that kukH1 ≤ c kf kL2 holds with a constant c independent of f ∈ L2 (Ω). Thus this hyperbolic problem is well-posed in the space H1 × H2 . Note that the stability result is similar to the one for the convection-diffusion problem in theorem 4.1.4.  We now turn to the convection-diffusion problem as in (4.1)-(4.2). As in section 4.3 the analysis uses the abstract setting given in section 4.2. We will need the following assumption: 1 There are constants β0 , cb such that − div b + c ≥ β0 ≥ 0, kckL∞ ≤ cb β0 . 2

(4.55)

We take cb := 0 if β0 = 0. Note that this assumption is somewhat stronger than the one in (4.3) but still covers the important special case div b = 0, c constant and c ≥ 0. Theorem 4.4.2 Consider the variational problem in (4.2) and assume that (4.55) holds. For u ∈ H01 (Ω) define the (semi-)norms

kb · ∇uk−ε

1 |||u|||ε := ε|u|21 + β0 kuk2L2 2 , R Ω b · ∇u v dx . = kuk∗ := sup |||v|||ε v∈H 1 (Ω)

(4.56a) (4.56b)

0

Then we have the continuity bound  k(u, v) ≤ max{cb , 1} |||u|||ε + kb · ∇uk−ε |||v|||ε

for all u, v ∈ H01 (Ω),

(4.57)

for all u ∈ H01 (Ω).

(4.58)

and the infsup result

 1 k(u, v) ≥ |||u|||ε + kb · ∇uk−ε 2 + max{cb , 1} v∈H 1 (Ω) |||v|||ε sup 0

Proof. We apply lemma 4.2.1 with U = V = H01 (Ω), norm k · k = ||| · |||ε and s(u, v) =

Z



ε∇u · ∇v + cuv dx ,

t(u, v) =

Z



For given u ∈ H01 (Ω) we have c |t(u, v)| ≤ c kvkL2 ≤ c |v|1 ≤ |||v|||ε ε 102

b · ∇uv dx .

and thus v → t(u, v) is bounded on V . Note that k(u, v) = s(u, v)+ t(u, v) holds. For u ∈ H01 (Ω) we have k(u, u) =

Z

ZΩ

ε∇u · ∇u + b · ∇uu + cu2 dx

1 ε∇u · ∇u + (− div b + c)u2 dx 2 ZΩ ε∇u · ∇u + β0 u2 dx = |||u|||2ε ≥ =



and thus (4.16) is satisfied with c0 = 1. Furthermore, for all u, v ∈ H01 (Ω) we have |s(u, v)| ≤ ε|u|1 |v|1 + kckL∞ kukL2 kvkL2 1 1 ≤ ε|u|21 + cb β0 kuk2L2 2 ε|v|21 + cb β0 kvk2L2 2 ≤ max{cb , 1}|||u|||ε |||v|||ε .

Hence (4.17) holds with c1 = max{cb , 1}. The results in (4.18) and (4.19) then yield (4.57) and (4.58), respectively. The result in this theorem can be interpreted as follows. Let H1 be the space H01 (Ω) endowed with the norm ||| · |||ε + kb · ∇ · k−ε and H2 the space H01 (Ω) with the norm ||| · |||ε . Note that these norms are problem dependent. The spaces H1 and H2 are Hilbert spaces. Using the linear operator L : H1 → H2′ , L(u)(v) := k(u, v) the variational problem (4.2) can be reformulated as follows: find u ∈ H1 such that Lu = f . From the results in theorem 2.3.1 and theorem 4.4.2 it follows that L is an isomorphism and that the inequalities kLkH2′ ←H1 ≤ max{cb , 1} ,

kL−1 kH1 ←H2′ ≤ 2 + max{cb , 1}

hold. Hence, the operator L : H1 → H2′ is well-conditioned uniformly in ε. In this sense the norms ||| · |||ε + kb · ∇ · k−ε and ||| · |||ε are natural for the convection-diffusion problem if one is interested in the case ε ↓ 0. If we take β0 > 0 and in the definition of the norms in (4.56) −1

formally put ε = 0 then √ using a density argument it follows that kb · ∇uk−ε=0 = β0 2 kb · ∇ukL2 . Furthermore |||u|||ε=0 = β0 kukL2 . The resulting norms in the spaces H1 and H2 are precisely those used in the well-posedness of the hyperbolic problem in remark 4.4.1.

The infsup bound in the previous theorem implies the following stability result for the variational problem (4.2). Corollary 4.4.3 Consider the variational problem in (4.2) and assume that (4.55) holds with β0 > 0. Then the inequality √ holds.

ε |u|1 +

p

β0 kukL2 +

√ √  −1 2 kb · ∇uk−ε ≤ 2 2 + max{cb , 1} β0 2 kf kL2

103

(4.59)

Proof. From k(u, v) = hf, viL2 for all v ∈ H01 (Ω) and (4.58) we obtain  |||u|||ε + kb · ∇uk−ε ≤ 2 + max{cb , 1}  ≤ 2 + max{cb , 1} Furthermore, note that

holds.

hf, viL2 v∈H 1 (Ω) |||v|||ε sup 0

kf kL2 kvkL2

sup v∈H01 (Ω)

1

(ε|v|21 + β0 kvk2L2 ) 2

 −1 ≤ 2 + max{cb , 1} β0 2 kf kL2 .

p  1 √ ε |u|1 + β0 kukL2 |||u|||ε ≥ √ 2 1

From this corollary it follows that for the case β0 > 0 the inequality ε 2 kuk1 + kukL2 ≤ Ckf kL2 holds with a constant C independent of f and ε. This result is proved in theorem 4.1.1, too. However, from corollary 4.4.3 we also obtain kb · ∇uk−ε ≤ Ckf kL2

(4.60)

with C independent of f and ε. Hence, we have a bound on the derivative in streamline direction. Taking (formally) ε = 0 we obtain a bound kukL2 + kb · ∇ukL2 ≤ Ckf kL2 , which is (for the example b = (1 0)T ) the same as the stability bound kukH1 ≤ ckf kL2 derived in remark 4.4.1. For ε = 0 and β0 > 0 the norm k · k−ε is equivalent to the L2 -norm. A result that relates the norm k · k−ε to the more tractable L2 -norm also for ε > 0 is given in the following lemma. Lemma 4.4.4 Let {Th } be a regular quasi-uniform family of triangulations of Ω consisting of n-simplices and let Vh := Xkh,0 ⊂ H01 (Ω) be the corresponding finite element space. Let Ph : L2 (Ω) → Vh be the L2 -orthogonal projection on Vh : hPh w, vh iL2 = hw, vh iL2

for all w ∈ L2 (Ω), vh ∈ Vh .

Assume that β0 > 0. There exists a constant C > 0 independent of h and ε such that for 0 ≤ ε ≤ h2 : −1

CkPh wkL2 ≤ kwk−ε ≤ β0 2 kwkL2

for all w ∈ L2 (Ω).

(4.61)

Proof. The second inequality in (4.61) follows from kwk−ε =

sup v∈H01 (Ω)

hw, viL2

(ε|v|21 + β0 kvk2L2 )

1 2

−1

≤ β0 2 kwkL2

For the first inequality we need the global inverse inequality from lemma 3.3.11: |vh |1 ≤ ch−1 kvh kL2 for all vh ∈ Vh . Using this inequality we get kwk∗ = =

hw, Ph wiL2 hw, viL2 ≥ |||Ph w|||ε v∈H 1 (Ω) |||v|||ε sup 0

kPh wk2L2

(ε|Ph w|21 + β0 kPh wk2L2 )

1 2

≥ c2

and thus the first inequality in (4.61) holds, too.

104

1 ε + β0 )− 2 kPh wkL2 ≥ CkPh wkL2 , 2 h

4.4.2

Finite element discretization

We now analyze the Galerkin finite element discretization of the convection-diffusion problem. For ease of presentation we only consider simplicial finite elements. The case with rectangular finite elements can be treated analogously. Let {Th } be a regular family of triangulations of Ω consisting of n-simplices and let Vh := Xkh,0 ⊂ H01 (Ω),

k ≥ 1,

be the corresponding finite element space. The standard discretization is as follows: determine uh ∈ Vh such that k(uh , vh ) = hf, vh iL2

for all vh ∈ Vh .

(4.62)

We now use the same stabilization approach as in section 4.3. Assume that for the solution u ∈ H01 (Ω) of the convection-diffusion problem we have u ∈ H 2 (Ω). Then Z  − ε∆u + b · ∇u + cu v dx = hf, viL2 for all v ∈ H01 (Ω), Ω

holds, but also for arbitrary δ ∈ R: Z  − ε∆u + b · ∇u + cu δb · ∇v dx = hf, δb · ∇viL2 Ω

for all v ∈ H01 (Ω).

Adding these equations it follows that the solution u satisfies

h−ε∆u + b · ∇u + cu, v + δb · ∇viL2 = hf, v + δb · ∇iL2

for all v ∈ H01 (Ω),

or, equivalently, k(u, v) + δh−ε∆u + b · ∇u + cu, b · ∇viL2 = hf, viL2 + δhf, b · ∇viL2

for all v ∈ H01 (Ω).

This leads to the following discretization: determine uh ∈ Vh such that kδ (uh , vh ) = fδ (vh ) for all vh ∈ Vh , X δT h−ε∆uh + b · ∇uh + cuh , b · ∇vh iT , with kδ (uh , vh ) := k(uh , vh ) +

(4.63a) (4.63b)

T ∈Th

fδ (vh ) := hf, vh iL2 +

X

T ∈Th

δT hf, b · ∇vh iT .

(4.63c)

P We use the notation h·, ·iT = h·, ·iL2 (T ) . In (4.63) we consider a sum T ∈Th h·, ·iT instead of h·, ·iL2 because for uh ∈ Vh the second derivatives in ∆uh are well-defined in each T ∈ Th but ∆uh is not well-defined across edges (faces) in the triangulation. In (4.63) we use a (stabilization) parameter δT for each T ∈ Th instead of one global parameter δ. This offers the possibility to adapt the stabilization parameter to the local mesh size and thus obtain better results if the triangulation is strongly nonuniform. The continuous solution u ∈ H01 (Ω) satisfies kδ (u, v) = fδ (v)

for all v ∈ H01 (Ω),

(4.64)

provided u ∈ H 2 (T ) for all T ∈ Th . In the remainder we assume that u has this regularity property. If δT = 0 for all T we have the standard (unstabilized) method as in (4.62). In the 105

remainder of this section we present an error analysis of the discretization (4.63) along the same lines as in section 4.3. We use the abstract analysis in section 4.2 with the spaces U := { v ∈ H01 (Ω) | v|T ∈ H 2 (T ) for all T ∈ Th },

V = Vh .

Note that U depends on Th . We split kδ (·, ·) as follows: u, v ∈ U, X δT hb · ∇u, b · ∇viT , + hcu, viL2 +

kδ (u, v) = sδ (u, v) + tδ (u, v), sδ (u, v) := εh∇u, ∇viL2

tδ (u, v) := hb · ∇u, viL2 +

X

T ∈Th

T ∈Th

δT h−ε∆u + cu, b · ∇viT .

We only consider δT ≥ 0. Then k · k = ||| · |||δ defines a norm on U : X |||u|||2δ := ε|u|21 + β0 kuk2L2 + δT kb · ∇uk2T , u ∈ U. T ∈Th

We also use the seminorm kuk∗,h,δ := sup

vh ∈Vh

tδ (u, vh ) , |||vh |||δ

u ∈ U.

(4.65)

We will apply theorem 4.2.2. For this we have to verify the corresponding conditions in lemma 4.2.1. First note that vh → tδ (u, vh ) is trivially bounded on Vh and thus the seminorm in (4.65) is well-defined. The conditions (4.16)-(4.17) in lemma 4.2.1 are verified in the following two lemmas. We always assume that (4.55) holds. Lemma 4.4.5 The bilinear form sδ (·, ·) is continuous on U × U : sδ (u, v) ≤ max{cb , 1}|||u|||δ |||v|||δ

for all u, v ∈ U.

Proof. The result follows from: sδ (u, v) ≤ ε|u|1 |v|1 + cb β0 kukL2 kvkL2 + ≤ max{cb , 1}

ε|u|21

+

β0 kuk2L2

= max{cb , 1}|||u|||δ |||v|||δ .

+

X

T ∈Th

X

T ∈Th

δT kb · ∇ukT kb · ∇vkT

δT kb · ∇uk2T

1

2

ε|v|21 + β0 kvk2L2 +

X

T ∈Th

δT kb · ∇vk2T

1

2

Below we use a local inverse inequality (lemma 3.3.11): |vh |m,T ≥ µinv hT |vh |m+1,T

for all vh ∈ Vh , m = 0, 1.T ∈ Th ,

(4.66)

with a constant µinv > 0 independent of h and T . Lemma 4.4.6 If 0 ≤ δT ≤

n 1 2 o 1 2 hT , µ min inv 2 ε β0 c2b

for all T ∈ Th

holds then the bilinear form kδ (·, ·) is elliptic on Vh : kδ (vh , vh ) ≥

1 |||vh |||2δ 2

106

for all vh ∈ Vh .

(4.67)

Proof. Using hb · ∇vh , vh iL2 = − 21 hdiv b vh , vh iL2 and (4.55) we obtain X X kδ (vh , vh ) ≥ ε|vh |21 + β0 kvh k2L2 + δT kb · ∇vh k2T + δT h−ε∆vh + cvh , b · ∇vh iT =

|||vh |||2δ

+

X

T ∈Th

T ∈Th

T ∈Th

δT h−ε∆vh + cvh , b · ∇vh iT .

−1 For a bound on the last term in (4.68) we use k∆vh kT ≤ µ−1 inv hT |vh |1,T , √ −1 and δT ≤ √12 β0 2 c−1 b : X δT h−ε∆vh + cvh , b · ∇vh iT



δT ≤

(4.68)

1 √1 µinv hT ε− 2 2

T ∈T



X

T ∈Th

h   −1 −1 δT εµinv hT |vh |1,T kb · ∇vh kT + β0 cb kvh kT kb · ∇vh kT

X h √ p  1 p  i  1 p ε|vh |1,T √ δT kb · ∇vh kT + β0 kvh kT √ δT kb · ∇vh kT ≤ 2 2 T ∈Th i h 1 X ≤ ε|vh |21,T + β0 kvh k2T + δT kb · ∇vh k2T 2 T ∈Th X  1 1 = ε|vh |21 + β0 kvh k2L2 + δT kb · ∇vh k2T = |||vh |||2δ . 2 2 T ∈Th

Using this in (4.68) proves the result. In the condition (4.67) the bound β0−1 c−1 b should be taken ∞ if β0 = 0 or cb = 0. From the ellipticity result in the previous lemma we see that if we take δT = δ > 0 such that (4.67) is satisfied, a term δkb · ∇vh k2L2 is added in the ellipticity lower bound |||vh |||2δ which enhances stability. The bilinear form corresponding to this additional term is (u, v) → δhb · ∇u, b · ∇viL2 , which models diffusion in the streamline direction b. Therefore the stabilized method (4.63) with δT > 0 is called the streamline diffusion finite element method, SDFEM. Remark 4.4.7 If Vh is the space of piecewise linear finite elements then (∆vh )|T = 0. Inspection of the proof shows that the result of the lemma holds with the condition (4.67) replaced by the weaker condition 0 ≤ δT ≤ 21 β0−1 c−2  b . Corollary 4.4.8 If (4.67) is satisfied then the discrete problem (4.63a) has a unique solution uh ∈ Vh . For β0 > 0 we have the stability bound √

ε|uh |1 +

p

β0 kuh kL2 +

X

T ∈Th

δT kb · ∇uh k2T

1 2

p  √ 1 ≤ 2 3 √ + δh kf kL2 , β0

with δh := max δT . T ∈Th

Proof. The bilinear form kδ (·, ·) is trivially bounded on the finite dimensional space Vh . Lemma 4.4.6 yields Vh -ellipticity of the bilinear form. Existence of a unique solution follows from the Lax-Milgram lemma. For the left handside of the stabiliy inequality we have X p 1 √ √ ε|uh |1 + β0 kuh kL2 + δT kb · ∇uk2T 2 ≤ 3|||uh |||δ . T ∈Th

107

Lemma 4.4.6 yields Combining this with fδ (uh ) = hf, uh iL2 +

|||uh |||2δ ≤ 2kδ (uh , uh ) = 2fδ (uh ) X

T ∈Th

≤ kf kL2 kuh kL2 +

δT hf, b · ∇uh iT X

T ∈Th

δT kf k2T

1

2

X

T ∈Th

δT kb · ∇uh k2T

1 2

p p  1 1 ≤ √ kf kL2 |||uh |||δ + δh kf kL2 |||uh |||δ = √ + δh kf kL2 |||uh |||δ . β0 β0

completes the proof.

Remark 4.4.9 As an example consider the case with linear finite elements, δT = δ for all T and β0 = 1. Then the stability result of this corollary takes the form √ √  √ √ 1 ε|uh |1 + kuh kL2 + δkb · ∇uh kL2 ≤ 2 3 1 + δ kf kL2 , for δ ∈ [0, c−2 ]. (4.69) 2 b note the similarity of this result with the one in corollary 4.4.3 (for the continuous problem) and in corollary 4.3.9 (for the stabilized finite element method applied to the 1D hyperbolic problem).  From the results in corollary 4.4.3 and (4.69) we see that one obtains the strongest stability result if δT is chosen as large as possible (maximal stabilization). In section 4.3 it is shown that smaller values for the stabilization parameter may lead to smaller discretization errors. Below we give an analyis on how to chose the parameter δT such that the (bound for the) discretization error in minimized. Application of theorem 4.2.2 yields: Theorem 4.4.10 Assume that (4.67) is satisfied. For the discrete solution uh of (4.63a) we have the error bound  (4.70) |||u − uh |||δ + ku − uh k∗,h,δ ≤ C inf |||u − vh |||δ + ku − vh k∗,h,δ vh ∈Vh  with C = 1 + max{cb , 1} 3 + 2 max{cb , 1} .

Proof. From lemma 4.4.5 and lemma 4.4.6 it follows that the conditions (4.16) and (4.17) are satisfied with c0 = 12 , c1 = max{cb , 1}. Now we use theorem 4.2.2.

The norm ||| · |||δ is given in terms of the usual L2 - and H 1 -norm. For the right handside in (4.70) we need a bound on ku − vh k∗,h,δ . Such a result is given in the following lemma. We will need the assumption kdiv bkL∞ ≤ γ0 β0 . (4.71) (β0 as in (4.55)). This can always be satisfied (for suitable γ0 ) if β0 > 0. For the cases β0 = 0 this implies that div b = 0 must hold. Lemma 4.4.11 Let δT be such that (4.67) holds and assume that (4.55) and (4.71) are satisfied. For u ∈ U the following estimate holds:  X 1 p  √ kbk∞,T 2 2 min δT−1 , . kuk kuk∗,h,δ ≤ ε|u|1 + β0 (1 + γ0 )kukL2 + T ε + µ2inv h2T β0 T ∈Th

108

Proof. By definition we have kuk∗,h,δ = sup

hb · ∇u, vh iL2 +

vh ∈Vh

P

T ∈Th δT h−ε∆u

|||vh |||δ

+ cu, b · ∇vh iT

.

(4.72) 1

We first treat the second term in the nominator. Using the inverse inequality (4.66) and δT2 ≤ 1 √1 µinv hT ε− 2 , 2

1

δT2 ≤

−1 √1 β 2 c−1 b 2 0

we get

X X  δT ε|u|2,T + cb β0 kukT kb · ∇vh kT δT h−ε∆u + cu, b · ∇vh iT ≤ T ∈Th

T ∈Th

X 1 √ p p √ ε|u|1,T + β0 kukT δT kb · ∇vh kT 2 T ∈Th p  X 1 √ 2  1 2 |||v ||| ≤ ε|u|1,T + β0 kukT h δ 2



(4.73)

T ∈Th

1 ≤ ε|u|21 + β0 kuk2L2 2 |||vh |||δ p  √ ε|u|1 + β0 kukL2 |||vh |||δ . ≤

For the first term in the nominator in (4.72) we obtain, using partial integration, |hb · ∇u, vh iL2 | ≤ |hu, (div b)vh iL2 | + |hu, b · ∇vh iL2 |

We write |||vh |||2δ = (4.74) we have

P

T ∈Th

ε|vh |21,T

≤ γ0 β0 kukL2 kvh kL2 + |hu, b · ∇vh iL2 | (4.74) p ≤ γ0 β0 kukL2 |||vh |||δ + |hu, b · ∇vh iL2 |. P + β0 kvh k2T + δT kb · ∇vh k2T =: T ∈Th ξT2 . For the last term in

X X kukT kb · ∇vh kT . hu, b · ∇vh iT ≤ |hu, b · ∇vh iL2 | = T ∈Th

T ∈Th

−1

From kb · ∇vh kT ≤ δT 2 ξT and kb · ∇vh kT ≤ kbk∞,T |vh |1,T = ≤

kbk∞,T

(ε + µ2inv h2T β0 )

1 2

kbk∞,T

(ε + µ2inv h2T β0 )



X

T ∈Th

 −1 kukT min δT 2 ,

h X

T ∈Th

ε|vh |21,T + µ2inv h2T β0 |vh |21,T

ε|vh |21,T + β0 |vh |2T

we get |hu, b · ∇vh iL2 | ≤

1 2

 min δT−1 ,

1

2

kbk∞,T

(ε + µ2inv h2T β0 ) µ2inv h2T β0

1

(ε + µ2inv h2T β0 ) 2

kbk∞,T

kbk2∞,T

ε+





kuk2T

1 2



i1 2

1

ξT ,

ξT

|||vh |||δ .

Using this in (4.74) we get h X i1 p kbk2∞,T  |hb · ∇u, vh iL2 | 2 2 ≤ γ0 β0 kukL2 + . min δT−1 , kuk T |||vh |||δ ε + µ2inv h2T β0 T ∈Th

109

2

Combining this with the results in (4.72) and (4.73) completes the proof. For the estimation of the approximation error in (4.70) we use an interpolation operator (e.g., nodal interpolation) IVh : H → Vh = Xkh,0 (k ≥ 1), that satisfies

ku − IVh ukT ≤ chm T |u|m,T

|u − IVh u|1,T ≤

(4.75a)

|u|m,T , chm−1 T

(4.75b)

for u ∈ H m (Ω), with 2 ≤ m ≤ k + 1. A main discretization error result is given in the following theorem. Theorem 4.4.12 Assume that (4.55), (4.71) hold and that δT is such that (4.67) is satisfied. Let u be the solution of (4.2) and assume that u ∈ H 2 (Ω). Let uh ∈ Vh = Xkh,0 be the solution of the discrete problem (4.63). For 2 ≤ m ≤ k + 1 the discretization error bound p √ ε |u − uh |1 + β0 ku − uh kL2 + ≤C +C

n X  T ∈Th

√

εhm−1 +

p

X

T ∈Th

δT kb · ∇(u − uh )k2T

 β0 (1 + γ0 )hm |u|m

 + min δT−1 , δT kbk2∞,T h2m−2 T

kbk2∞,T

ε + µ2inv h2T β0

1 2

(4.76)

 2 o 21 h2m |u|m,T , T

(4.77)

holds, with C independent of u, h, ε, β0 , δT and b. Proof. We apply theorem 4.4.10. For the left handside in (4.70) we have X p 1 i 1 h√ |||u − uh |||δ + ku − uh k∗,h,δ ≥ √ δT kb · ∇(u − uh )k2T 2 . ε |u − uh |1 + β0 ku − uh kL2 + 3 T ∈T h

For the right handside in (4.70) we obtain, using kb · ∇(u − vh )kT ≤ kbk∞,T |u − vh |1,T and lemma 4.4.11:  inf |||u − vh |||δ + ku − vh k∗,h,δ ≤ |||u − IVh u|||δ + ku − IVh uk∗,h,δ vh ∈Vh √  p ≤ C ε|u − IVh u|1 + β0 (1 + γ0 )ku − IVh ukL2 + C

 X  T ∈Th

 δT kbk2∞,T |u − IVh u|21,T + min δT−1 ,

kbk2∞,T

ε + µ2inv h2T β0



Using the approximation error bounds in (4.75) we obtain the result.

ku − IVh uk2T

 21

.

Note that this theorem covers the cases δT = 0 for all T (i.e., no stabilization) and β0 = 0. To gain more insight we consider a special case: Remark 4.4.13 We take b = (1 0)T , c ≡ 1 (hence β0 = 1, γ0 = 0), δT = δ for all T and m = 2. Then the estimate in the previous theorem takes the form √ √ ∂ √   h √ 1 |u|2 , ε|u − uh | + ku − uh kL2 + δk (u − uh )kL2 ≤ C h ε + h + δ + min √ , p ∂x δ ε/h + 1 110

with C independent of u, h, ε and δ. For ε ↓ 0 this result is very similar to the one in corollary 4.3.12 for the one-dimensional hyperbolic problem.  For the case without stabilization we obtain the following. Corollary 4.4.14 If the assumptions of theorem 4.4.12 are fulfilled we have the following discretization error bounds for the case δ = 0 for all T : h (4.78) |u − uh |1 ≤ C 1 + hm−1 |u|m ε (4.79) ku − uh kL2 ≤ Chm−1 |u|m if β0 > 0, with a constant C independent of u, h and ε. For ε ↓ 0 these bounds are much better than the ones in (4.51) and (4.52) which resulted from the standard finite element errror analysis. Moreover, the results in (4.78), (4.79) reflect important properties of the standard Galerkin finite element discretization that are observed in practice: • If h . ε holds then (4.78) yields |u − uh |1 ≤ Chm−1 |u|m with C independent of h and ε, which is an optimal discretization error bound. This explains the fact that for h . ε the standard Galerkin finite element method applied to the convection-diffusion problem usually yields an accurate discretization. Note, however, that for ε ≪ 1 the condition h . ε is very unfavourable in view of computational costs. • For fixed h, even if u is smooth (i.e., |u|m bounded for ε ↓ 0) the H 1 -error bound in (4.78) tends to infinity for ε ↓ 0. Thus, if the analysis is sharp, we expect that an instability phenomenom can occur for ε ↓ 0. • For the case β0 > 0 we have the suboptimal bound hm−1 |u|m for the L2 -norm of the discretization error (for an optimal bound we should have hm |u|m ). If u is smooth (|u|m bounded for ε ↓ 0) the discretization error in k·kL2 will be arbitrarily small if h is sufficiently small, even if ε ≪ h. Hence, for the case β0 > 0 the L2 -norm of the discretization error can not show an instability phenomenon as described in the previous remark for the H 1 -norm. Note, however, that the L2 -norm is weaker than the H 1 -norm and in particular allows a more “oscillatory behaviour” of the error. We now turn to the question whether the results can be improved by chosing a suitable value for the stabilization parameter δT . For the term between square brackets in (4.77) we have  + min δT−1 , δT kbk2∞,T h2m−2 T

with gT (δ) := δkbk∞,T + min



1 δkbk∞,T

kbk2∞,T

ε+ ,

µ2inv h2T β0

kbk∞,T ε



2m , hT ≤ gT (δT )kbk∞,T h2m−2 T

(4.80)

h2T . For ε ≤ hT kbk−1 ∞,T the function gT attains

its minimum at δ = hT kbk−1 ∞,T . Remember the condition on parameter choice  hT hT , with ξT := min 1, δT,opt := ξT kbk∞,T ε

δT in (4.67). This leads to the

1 2 µinv kbk∞,T . 2

(4.81)

If kbk∞,T = 0 we take δT,opt = 0. Note that δT,opt ≤ hT kbk−1 ∞,T and thus for hT sufficiently 1 −1 −2 small the condition δT,opt ≤ 2 β0 cb in (4.67) is satisfied. The second condition in (4.67) is satisfied due to the definition of δT,opt in (4.81). If ξT = 1 we have gT (δT,opt ) ≤ δT,opt kbk∞,T + 111

h2T = 2hT , δT,opt kbk∞,T

and if ξT < 1 this implies

hT ε

kbk∞,T ≤ 2µ−2 inv and thus

gT (δT,opt ) ≤ δT,opt kbk∞,T +

kbk∞,T 2 hT ≤ (1 + 2µ−2 inv )hT . ε

Hence, for δT = δT,opt we obtain the following bound for the δT -dependent term in (4.77): n X  T ∈Th

 + min δT−1 , δT kbk2∞,T h2m−2 T

kbk2∞,T

ε + µ2inv h2T β0

1 2m  2 o 12 1 ≤ CkbkL2 ∞ hm− 2 |u|m . hT |u|m,T

Using this in theorem 4.4.12 we obtain the following corollary. Corollary 4.4.15 Let the assumptions be as in theorem 4.4.12. For δT = δT,opt we get the estimate X p 1 √ ε |u − uh |1 + β0 ku − uh kL2 + δT,opt kb · ∇(u − uh )k2T 2 ≤C This implies

√

T ∈Th

ε+

p

1 √  β0 (1 + γ0 )h + kbkL2 ∞ h hm−1 |u|m .

(4.82)

1 √  kbkL2 ∞ √  m−1 β0 (1 + γ0 ) √ h h |u|m , (4.83a) |u − uh |1 ≤ C 1 + h+ √ ε ε 1  √ε kbkL2 ∞ √  m−1 h h |u|m if β0 > 0, ku − uh kL2 ≤ C √ + (1 + γ0 )h + √ β0 β0 (4.83b) √  X 1 p √ 1  δT,opt kb · ∇(u − uh )k2T 2 ≤ C ε + β0 (1 + γ0 )h + kbkL2 ∞ h hm−1 |u|m . (4.83c)

T ∈Th

The constants C are independent of u, h, ε, β0 , δT and b. Some comments related to these discretization error bounds: q  • The bound in (4.83a) is of the form c 1 + hε hm−1 |u|m and thus better than the one in (4.78) if ε ≪ h. √ √ • The bound in (4.83b) is of the form c( ε + h)hm−1 |u|m and thus better than the one in 1 (4.79) if ε ≪ h. For ε ↓ 0 we have a bound of the form chm− 2 |u|m which, for m = 2, is similar to the bound in (4.48) for the one-dimensional hyperbolic problem. • The result in (4.83c) shows a control on the streamline derivative of the discretization error. Such a control is not present in the case δT = 0 (no stabilization). If ξT = 1 for all T (i.e., ε ≤ 12 µ2inv kbk∞,T hT ) and hT ≥ c0 h with c0 > 0 independent of T and h we obtain r  ε + 1 hm−1 |u|m , kb · ∇(u − uh )kL2 ≤ c h and thus an optimal bound of the form kb · ∇(u − uh )kL2 ≤ chm−1 |u|m if ε ≤ h. 112

T • In (4.82) there is a correct scaling of ε, β0 and b. Note that δT,opt = ξT kbkh∞,T has a scaling w.r.t. kbk∞,T that is the same as in the one-dimensional hyperbolic problem in (4.47).

• In case of linear finite elements the condition on δT in (4.67) can be simplified to δT ≤ 1 −1 −2 2 β0 cb , cf. remark 4.4.7. Due to this one can take ξT = 1 in (4.81) and thus δT,opt = hT kbk∞,T . In the general case (quadratic or higher order finite elements) one does not use δT,opt as in (4.81) in practice, because µinv is not known. Instead one often takes the simplified form δT,opt := ξT

hT , kbk∞,T

 hT 1 with ξT := min 1, kbk∞,T , ε 2

in which, if necessary, kbk∞,T is replaced by an approximation.

4.4.3

Stiffness matrix for the convection-diffusion problem

Stiffness matrix for SDFEM: nonsymmetric. Condition number ? (example: hyperbolic problem with SDFEM).

113

114

Chapter 5

Finite element discretization of the Stokes problem 5.1

Galerkin discretization of saddle-point problems

We recall the abstract variational formulation of a saddle-point problem as introduced in section 2.3, (2.43). Let V and M be Hilbert spaces and a : V × V → R,

b:V ×M →R

be continuous bilinear forms. For f1 ∈ V ′ , f2 ∈ M ′ we consider the following variational problem: find (φ, λ) ∈ V × M such that a(φ, ψ) + b(ψ, λ) = f1 (ψ) for all ψ ∈ V

b(φ, µ) = f2 (µ) for all µ ∈ M

(5.1a) (5.1b)

We define H := V × M and

k : H × H → R,

k(U, V) := a(φ, ψ) + b(φ, µ) + b(ψ, λ)

with U := (φ, λ), V := (ψ, µ)

(5.2)

If we define F (ψ, µ) = f1 (ψ) + f2 (µ) then the problem (5.1) can be reformulated as follows: find U ∈ H such that k(U, V) = F (V)

for all V ∈ H

(5.3)

We note that in this section we use the notation U, V, F instead of the (more natural) notation u, v, f that is used in the chapters 2 and 3. The reason for this is that the latter symbols are confusing in view of the notation used in section 5.2. For the Galerkin discretization of this variational problem we introduce finite dimensional subspaces Vh and Mh : Vh ⊂ V, Mh ⊂ M, Hh := Vh × Mh ⊂ H The Galerkin discretization is as follows:

find Uh ∈ Hh such that k(Uh , Vh ) = F (Vh ) for all Vh ∈ Hh

(5.4)

An equivalent formulation is: find (φh , λh ) ∈ Vh × Mh such that a(φh , ψ h ) + b(ψ h , λh ) = f1 (ψ h ) for all ψ h ∈ Vh

b(φh , µh ) = f2 (µh ) for all µh ∈ Mh 115

(5.5a) (5.5b)

For the discretization error we have a variant of the Cea-lemma 3.1.1: Theorem 5.1.1 Consider the variational problem (5.1) with continuous bilinear forms a(·, ·) and b(·, ·) that satisfy: ∃ β>0: ∃γ>0 :

∃ βh > 0 :

supψ∈V

b(ψ,λ) kψkV

≥ β kλkM

a(φ, φ) ≥

supψh ∈Vh

b(ψh ,λh ) kψh kV

γ kφk2V

≥ βh kλh kM

∀λ∈M ∀φ∈V

∀ λh ∈ Mh

(5.6a) (5.6b) (5.6c)

Then the problem (5.1) and its Galerkin discretization (5.5) have unique solutions (φ, λ) and (φh , λh ), respectively. Furthermore the inequality  kφ − φh kV + kλ − λh kM ≤ C inf kφ − ψ h kV + inf kλ − µh kM ψh ∈Vh

holds, with C =

µh ∈Mh

√  2 1 + γ −1 βh−2 (2kak + kbk)3 .

Proof. We shall apply lemma 3.1.1 to the variational problem (5.1) and its Galerkin discretization (5.5). Hence, we have to verify the conditions (3.2), (3.3), (3.4), (3.6). First note that  |k(U, V)| ≤ kak + kbk kUkH kVkH

holds and thus the condition (3.2) is satisfied with M = kak + kbk. Due to the assumptions (5.6a) and (5.6b) it follows from corollary 2.3.12 and theorem 2.3.10 that the conditions (3.3), (3.4) are satisfied. Due to (5.6b) and (5.6c) it follows from corollary 2.3.12 and theorem 2.3.10, with V and M replaced by Vh and Mh , respectively, that the condition (3.6) is fulfilled, too. Moreover, from the final statement in theorem 2.3.10 we obtain that (3.6) holds with −2 −2 εh = γβh2 βh + 2kak ≥ γβh2 kbk + 2kak

Application of lemma 3.1.1 yields

1 M ) inf kφ − ψ h k2V + kλ − µh k2M 2 εh (ψh ,µh )∈Vh ×Mh √ p √ From this and the inequalities α + β ≤ 2 α2 + β 2 ≤ 2(α + β), for α ≥ 0, β ≥ 0, the result follows. kφ − φh k2V + kλ − λh k2M

1

2

≤ (1 +

Remark 5.1.2 The condition (5.6c) implies dim(Vh ) ≥ dim(Mh ). This can be shown by the following argument. Let (ψ j )1≤j≤m be a basis of Vh and (λi )1≤i≤k a basis of Mh . Define the matrix B ∈ Rk×m by Bij = b(ψ j , λi ) From (5.6c) it follows that for every λh ∈ Mh , λh 6= 0, there exists ψ h ∈ Vh such that b(ψ h , λh ) 6= 0. Thus for every y ∈ Rk , y 6= 0, there exists x ∈ Rm such that yT Bx 6= 0, i.e., xT BT y 6= 0. This implies that all columns of BT , and thus all rows of B, are independent. A necessary condition for this is k ≤ m.  116

5.2

Finite element discretization of the Stokes problem

We recall the variational formulation of the Stokes problem (with homogeneous Dirichlet boundary conditions) given in section 2.6 : find (u, p) ∈ V × M such that for all v ∈ V

a(u, v) + b(v, p) = f (v)

(5.7a)

b(u, q) = 0 for all q ∈ M

(5.7b)

with V := H01 (Ω)n , M := L20 (Ω) Z ∇u · ∇v dx a(u, v) := ΩZ b(v, q) := − q div v dx Z Ω f · v dx f (v) := Ω

For the Galerkin discretization of this problem we use the simplicial finite element spaces defined in section 3.2.1, i.e., for a given family {Th } of admissible triangulations of Ω we define the pair of spaces:  (5.8) (Vh , Mh ) := (Xkh,0 )n , Xhk−1 ∩ L20 (Ω) , k ≥ 1

A short discussion concerning other finite element spaces that can be used for the Stokes problem is given in section 5.2.2. For k ≥ 2, the spaces in (5.8) are called Hood-Taylor finite elements [52]. Note that for k = 1 the pressure-space X0h ∩ L20 (Ω) contains discontinuous functions, whereas for k ≥ 2 all functions in the pressure space are continuous. The Stokes problem fits in the general setting presented in section 5.1. The discrete problem is as in (5.5): find (uh , ph ) such that a(uh , vh ) + b(vh , ph ) = f (vh )

for all vh ∈ Vh

(5.9a)

b(uh , qh ) = 0 for all qh ∈ Mh

(5.9b)

From the analysis in section 2.6 it follows that the conditions (5.6a) and (5.6b) in theorem 5.1.1 are satisfied. The following remark shows that the condition in (5.6c), which is often called the discrete inf-sup condition, needs a careful analysis: Remark 5.2.1 Take n = 2, Ω = (0, 1)2 and a uniform triangulation Th of Ω that is defined as follows. For N ∈ N and h := N1+1 the domain Ω is subdivided in squares with sides of length h and vertices in the set { (ih, jh) | 0 ≤ i, j ≤ N + 1 }. The triangulation Th is obtained by subdividing every square in two triangles by inserting a diagonal from (ih, jh) to ((i + 1)h, (j + 1)h). The spaces (Vh , Mh ) are defined as in (5.8) with k = 1. The space Vh has dimension 2N 2 and dim(Mh ) = 2(N + 1)2 − 1. From dim(Vh ) < dim(Mh ) and remark 5.1.2 it follows that the condition (5.6c) does not hold. The same argument applies to the three dimensional case with a uniform triangulation of (0, 1)3 consisting of tetrahedra (every cube is subdivided in 6 tetrahedra). In this case we have dim(Vh ) = 3N 3 and dim(Mh ) = 6(N + 1)3 − 1. We now show that also the lowest order rectangular finite elements in general do not satisfy (5.6c). For this we consider n = 2, Ω = (0, 1)2 and use a triangulation consisting of squares Tij := [ih, (i + 1)h)] × [jh, (j + 1)h], 117

0 ≤ i, j ≤ N,

h :=

1 N +1

We assume that N is odd. The corresponding lowest order rectangular finite element spaces (Vh , Mh ) = (Q1h,0 , Q0h ∩ L20 (Ω)) are defined in (3.17). We define ph ∈ Mh by (ph )|Tij = (−1)i+j

(checkerboard function)

For uh ∈ Vh we use the notation uh = (u, v), u(ih, jh) =: ui,j , v(ih, jh) =: vi,j . Then we have: Z Z ph div uh dx = (−1)i+j uh · n ds Tij

∂Tij

h = (−1)i+j (ui+1,j+1 + ui+1,j ) − (ui,j+1 + ui,j ) 2  + (vi+1,j+1 + vi,j+1 ) − (vi+1,j + vi,j )

Using (uh )|∂Ω = 0 we get, for 0 ≤ k ≤ N + 1,

N X (−1)j (uk,j+1 + uk,j ) = 0, j=0

N X (−1)i (vi+1,k + vi,k ) = 0 i=0

and thus b(uh , ph ) = −

Z

ph div uh dx =

N Z X

ph div uh dx = 0

i,j=0 Tij



for arbitrary uh ∈ Vh . We conclude that there exists ph ∈ Mh , ph 6= 0, such that supvh ∈Vh b(vh , ph ) = 0 and thus the discrete inf-sup conditon does not hold for the pair (Vh , Mh ).  Definition 5.2.2 Let {Th } be a family of admissible triangulations of Ω. Suppose that to every Th ∈ {Th } there correspond finite element spaces Vh ⊂ V and Mh ⊂ M . The pair (Vh , Mh ) is called stable if there exists a constant βˆ > 0 independent of Th ∈ {Th } such that sup vh ∈Vh

b(vh , qh ) ≥ βˆ kqh kL2 kvh k1

for all qh ∈ Mh

(5.10) 

5.2.1

Error bounds

In this section we derive error bounds for the discretization of the Stokes problem with HoodTaylor finite elements. We will prove that for k = 2 the Hood-Taylor spaces are stable. Theorem 5.1.1 is used to derive error bounds. In the analysis of this finite element method we will need an approximation operator which is applicable to functions u ∈ H 1 (Ω) and yields a “reasonable” approximation of u in the subspace of continuous piecewise linear functions. Such an operator was introduced by Cl´ement in [29] and is denoted by IXC (Cl´ement operator). Let {Th } be a regular family of triangulations of Ω consisting of n-simplices and X1h,0 the corresponding finite element space of continuous piecewise linear functions. For the definition of the Cl´ement operator we need the nodal basis of this finite element space. Let {xi }1≤i≤N be the set of vertices in Th that lie in the interior of Ω. To every xi we associate a basis function φi ∈ X1h,0 with the property φi (xi ) = 1, φi (xj ) = 0 for all j 6= i. Then {φi }1≤i≤N forms a basis of the space X1h,0 . We define a neighbourhood of the point xi by ωxi := supp(φi ) = ∪{ T ∈ Th | xi ∈ T } 118

and a neighbourhood of T ∈ Th by ωT := ∪{ ωxi | xi ∈ T } The local L2 -projection Pi : L2 (ωxi ) → P0 is defined by: Z −1 Pi v = |ωxi | v dx ωx i

The Cl´ement operator IXC : H01 (Ω) → X1h,0 is defined by IXC u =

N X (Pi u)φi

(5.11)

i=1

For this operator the following approximation properties hold: Theorem 5.2.3 (Cl´ ement operator.) There exists a constant C independent of Th ∈ {Th } such that for all u ∈ H01 (Ω) and all T ∈ Th : kIXC uk1,T

ku −

IXC uk0,T

ku − IXC uk0,∂T

≤ C kuk1,ωT

(5.12a)

≤ C hT kuk1,ωT

(5.12b)

≤ C hT kuk1,ωT

(5.12c)

1 2

Proof. We refer to [29] and [13]. Variants of this operator are discussed in [13, 81, 14]. Results as in theorem 5.2.3 also hold if H01 (Ω) and X1h,0 are replaced by H 1 (Ω) and X1h , respectively. Using the Cl´ement operator one can reformulate the stability condition (5.10) in another form that turns out to be easier to handle. This reformulation is given in [93] and applies to a large class of finite element spaces. Here we only present a simplified variant that applies to the Hood-Taylor finite element spaces. We will need the mesh-dependent norm sX h2T k∇qh k20,T , qh ∈ X1h ∩ L20 (Ω) kqh k1,h := T ∈Th

Theorem 5.2.4 Let {T_h} be a regular family of triangulations. The Hood-Taylor pair of finite element spaces (V_h, M_h) as in (5.8), k ≥ 2, is stable iff there exists a constant β* > 0 independent of T_h ∈ {T_h} such that

$$\sup_{v_h \in V_h} \frac{b(v_h, q_h)}{\|v_h\|_1} \ge \beta^* \, \|q_h\|_{1,h} \quad \text{for all } q_h \in M_h. \tag{5.13}$$

Proof. For T ∈ T_h let F(x̂) = Bx̂ + c be an affine mapping such that F(T̂) = T, where T̂ is the unit n-simplex. Using the lemmas 3.3.5 and 3.3.6 and dim(P_k) < ∞ we get, for q_h ∈ M_h and q̂_h := q_h ∘ F:

$$\|\nabla q_h\|_{0,T}^2 = |q_h|_{1,T}^2 \le C \|B^{-1}\|_2^2 \, |\det B| \, |\hat q_h|_{1,\hat T}^2 \le C \|B^{-1}\|_2^2 \, |\det B| \, |\hat q_h|_{0,\hat T}^2 \le C h_T^{-2} \|q_h\|_{0,T}^2$$

with a constant C independent of q_h and of T. This yields ‖q_h‖_{1,h} ≤ C‖q_h‖_{L²} and thus the stability property implies (5.13).

Assume that (5.13) holds. Take an arbitrary q_h ∈ M_h with ‖q_h‖_{L²} = 1. The constants C below are independent of q_h and of T_h ∈ {T_h}. From the inf-sup property of the continuous problem it follows that there exists β > 0, independent of q_h, and v ∈ H^1_0(Ω)^n such that

$$\|v\|_1 = 1, \qquad b(v, q_h) \ge \beta.$$

We apply the Clément operator to the components of v:

$$w_h := I_X^C v = \big( I_X^C v_1, \ldots, I_X^C v_n \big) \in (X^1_{h,0})^n \subset V_h.$$

From theorem 5.2.3 we get

$$\|w_h\|_1 \le C \|v\|_1 = C, \qquad \sum_{T \in \mathcal{T}_h} h_T^{-2} \|v - w_h\|_{0,T}^2 \le C \|v\|_1^2 = C. \tag{5.14}$$

From this we obtain

$$|b(w_h - v, q_h)| = \Big| \int_\Omega q_h \operatorname{div}(w_h - v) \, dx \Big| = \Big| \sum_{T \in \mathcal{T}_h} \int_T \nabla q_h \cdot (w_h - v) \, dx \Big| \le \sum_{T \in \mathcal{T}_h} \|\nabla q_h\|_{0,T} \, \|w_h - v\|_{0,T} \le \Big( \sum_{T \in \mathcal{T}_h} h_T^2 \|\nabla q_h\|_{0,T}^2 \Big)^{1/2} \Big( \sum_{T \in \mathcal{T}_h} h_T^{-2} \|w_h - v\|_{0,T}^2 \Big)^{1/2} \le C \|q_h\|_{1,h}.$$

Define ξ := sup_{v_h ∈ V_h} b(v_h, q_h)/‖v_h‖₁. From (5.13) we have ‖q_h‖_{1,h} ≤ ξ/β*. Using this in combination with the result in (5.14) we obtain

$$\xi \ge \frac{b(w_h, q_h)}{\|w_h\|_1} \ge C \, b(w_h, q_h) = C \, b(v, q_h) + C \, b(w_h - v, q_h) \ge C \big( \beta - \tilde C \|q_h\|_{1,h} \big) \ge C \Big( \beta - \tilde C \frac{\xi}{\beta^*} \Big).$$

Solving this inequality for ξ yields ξ ≥ Cβ/(1 + C\tilde C/β*), and thus ξ ≥ β̂ for a suitable constant β̂ > 0 independent of q_h and of T_h ∈ {T_h}. □

Theorem 5.2.5 Let {T_h} be a regular family of triangulations consisting of simplices. We assume that every T ∈ T_h has at least one vertex in the interior of Ω. Then the Hood-Taylor pair of finite element spaces with k = 2 is stable.

Proof. We consider only n ∈ {2, 3}. Take q_h ∈ M_h, q_h ≠ 0. The constants used in the proof are independent of T_h ∈ {T_h} and of q_h. The set of edges in T_h is denoted by E. This set is partitioned into edges which are in the interior of Ω and edges which are part of ∂Ω: E = E_int ∪ E_bnd. For every E ∈ E, m_E denotes the midpoint of the edge E. Every E ∈ E_int with endpoints a₁, a₂ ∈ ℝ^n is assigned a vector t_E := a₁ − a₂. For E ∈ E_bnd we define t_E := 0. Since q_h ∈ X^1_h the function x → t_E · ∇q_h(x) is continuous across E, for E ∈ E_int. We define

$$w_E := \big( t_E \cdot \nabla q_h(m_E) \big) \, t_E \qquad \text{for } E \in \mathcal{E}.$$

Due to lemma 3.3.2 a unique w_h ∈ (X^2_{h,0})^n is defined by

$$w_h(x_i) = \begin{cases} 0 & \text{if } x_i \text{ is a vertex of } T \in \mathcal{T}_h, \\ w_E & \text{if } x_i = m_E \text{ for } E \in \mathcal{E}. \end{cases}$$

For each T ∈ T_h the set of edges of T is denoted by E_T. By using quadrature we see that for any p ∈ P₂ which is zero at the vertices of T we have

$$\int_T p(x) \, dx = \frac{|T|}{2n-1} \sum_{E \in \mathcal{E}_T} p(m_E).$$

We obtain:

$$-\int_\Omega q_h \operatorname{div} w_h \, dx = \int_\Omega \nabla q_h \cdot w_h \, dx = \sum_{T \in \mathcal{T}_h} (\nabla q_h)_{|T} \cdot \int_T w_h \, dx = \sum_{T \in \mathcal{T}_h} \frac{|T|}{2n-1} \, (\nabla q_h)_{|T} \cdot \sum_{E \in \mathcal{E}_T} w_h(m_E) = \sum_{T \in \mathcal{T}_h} \frac{|T|}{2n-1} \sum_{E \in \mathcal{E}_T} \big( t_E \cdot \nabla q_h(m_E) \big)^2. \tag{5.15}$$

Using the fact that (∇q_h)_{|T} is constant one easily checks that

$$C \, \|\nabla q_h\|_{0,T}^2 \le \sum_{E \in \mathcal{E}_T} \big( t_E \cdot \nabla q_h(m_E) \big)^2 \le \tilde C \, \|\nabla q_h\|_{0,T}^2 \tag{5.16}$$

with C > 0 and C̃ independent of T. Combining this with (5.15) we get

$$-\int_\Omega q_h \operatorname{div} w_h \, dx \ge C \sum_{T \in \mathcal{T}_h} \frac{|T|}{2n-1} \|\nabla q_h\|_{0,T}^2 \ge C \sum_{T \in \mathcal{T}_h} h_T^2 \|\nabla q_h\|_{0,T}^2 = C \|q_h\|_{1,h}^2. \tag{5.17}$$

Let E_T̂ be the set of all edges of the unit n-simplex T̂. In the space { v̂ ∈ P₂ | v̂ is zero at the vertices of T̂ } the norms ‖v̂‖_{1,T̂} and (Σ_{E ∈ E_T̂} v̂(m_E)²)^{1/2} are equivalent. Using this componentwise for the vector-function ŵ_h := w_h ∘ F we get:

$$|w_h|_{1,T}^2 \le C h_T^{-2} |T| \, |\hat w_h|_{1,\hat T}^2 \le C h_T^{-2} |T| \, \|\hat w_h\|_{1,\hat T}^2 \le C h_T^{-2} |T| \sum_{E \in \mathcal{E}_{\hat T}} \|\hat w_h(m_E)\|_2^2 = C h_T^{-2} |T| \sum_{E \in \mathcal{E}_T} \|w_E\|_2^2.$$

Summation over all simplices T yields, using (5.16),

$$\|w_h\|_1^2 \le C |w_h|_1^2 \le C \sum_{T \in \mathcal{T}_h} |w_h|_{1,T}^2 \le C \sum_{T \in \mathcal{T}_h} h_T^{-2} |T| \sum_{E \in \mathcal{E}_T} \|w_E\|_2^2 = C \sum_{T \in \mathcal{T}_h} h_T^{-2} |T| \sum_{E \in \mathcal{E}_T} \big( t_E \cdot \nabla q_h(m_E) \big)^2 \|t_E\|_2^2 \le C \sum_{T \in \mathcal{T}_h} h_T^2 \|\nabla q_h\|_{0,T}^2 = C \|q_h\|_{1,h}^2. \tag{5.18}$$

From (5.17) and (5.18) we obtain b(w_h, q_h) ≥ C ‖q_h‖_{1,h} ‖w_h‖₁ with a constant C > 0 independent of q_h and of T_h ∈ {T_h}. Now apply theorem 5.2.4. □

One can also prove stability for higher order Hood-Taylor finite elements:

Theorem 5.2.6 Let {T_h} be a regular family of triangulations as in theorem 5.2.5. Then the Hood-Taylor pair of finite element spaces with k ≥ 3 is stable.

Proof. We refer to the literature: [15, 16, 22]. □

Remark 5.2.7 The condition that every T ∈ T_h has at least one vertex in the interior of Ω is a mild one. Let S := { T ∈ T_h | T has no vertex in the interior of Ω }. If S ≠ ∅ then a suitable bisection of each T ∈ S (and of one of the neighbours of T) results in a modified admissible triangulation for which the condition is satisfied. In certain cases the condition can be avoided (for example, for n = 2, k = 2, 3, cf. [87]) or replaced by another similar assumption on the geometry of the triangulation (cf. remark 3.2 in [16]). An example which shows that the stability result in general does not hold without an assumption on the geometry of the triangulation is given in [16], remark 3.3. □

For the discretization of the Stokes problem with Hood-Taylor finite elements we have the following bound for the discretization error:

Theorem 5.2.8 Let {T_h} be a regular family of triangulations as in theorem 5.2.5. Consider the discrete Stokes problem (5.9) with Hood-Taylor finite element spaces as in (5.8), k ≥ 2. Suppose that the continuous solution (u, p) lies in H^m(Ω)^n × H^{m-1}(Ω) with m ≥ 2. For m ≤ k + 1 the following holds:

$$\|u - u_h\|_1 + \|p - p_h\|_{L^2} \le C \, h^{m-1} \big( |u|_m + |p|_{m-1} \big)$$

with a constant C independent of T_h ∈ {T_h} and of (u, p).

Proof. We apply theorem 5.1.1 with V = H^1_0(Ω)^n, M = L^2_0(Ω) and (V_h, M_h) the pair of Hood-Taylor finite element spaces. From the analysis in section 2.6 it follows that the conditions (5.6a) and (5.6b) are satisfied. From theorem 5.2.5 or theorem 5.2.6 it follows that the discrete inf-sup property (5.6c) holds with a constant β_h independent of T_h. Hence we have

$$\|u - u_h\|_1 + \|p - p_h\|_{L^2} \le C \Big( \inf_{v_h \in V_h} \|u - v_h\|_1 + \inf_{q_h \in M_h} \|p - q_h\|_{L^2} \Big). \tag{5.19}$$

For the first term on the right-hand side we use (componentwise) the result of corollary 3.3.10. This yields:

$$\inf_{v_h \in V_h} \|u - v_h\|_1 \le C h^{m-1} |u|_m \tag{5.20}$$

with a constant C independent of u and of T_h ∈ {T_h}. Using p ∈ L^2_0(Ω) it follows that

$$\inf_{q_h \in M_h} \|p - q_h\|_{L^2} = \inf_{q_h \in X^{k-1}_h} \|p - q_h\|_{L^2}.$$

For m = 2 we can use the Clément operator of theorem 5.2.3 and for m ≥ 3 the result in corollary 3.3.10 to bound the approximation error for the pressure:

$$\inf_{q_h \in M_h} \|p - q_h\|_{L^2} \le C h^{m-1} |p|_{m-1}. \tag{5.21}$$

Combination of (5.19), (5.20) and (5.21) completes the proof. □

Sufficient conditions for (u, p) ∈ H²(Ω)^n × H¹(Ω) to hold are given in section 2.6.2. As in section 3.4.2 one can derive an L²-error bound for the velocity using a duality argument. For this we have to assume H²-regularity of the Stokes problem (cf. section 2.6.2):

Theorem 5.2.9 Consider the Stokes problem and its Hood-Taylor discretization as described in theorem 5.2.8. In addition we assume that the Stokes problem is H²-regular. Then for 2 ≤ m ≤ k + 1 the inequality

$$\|u - u_h\|_{L^2} \le C h^m \big( |u|_m + |p|_{m-1} \big)$$

holds, with C independent of T_h ∈ {T_h} and of (u, p).

Proof. The variational Stokes problem can be reformulated as in (5.3): find U ∈ H such that

$$k(U, V) = F(V) \quad \text{for all } V \in H$$

with H = H^1_0(Ω)^n × L^2_0(Ω), k(·,·) as in (5.2), U = (u, p), F(V) = F((v, q)) = ∫_Ω f · v dx. Let e_h = U − U_h = (u − u_h, p − p_h) be the discretization error. We consider the dual problem with f = u − u_h: let Ũ = (ũ, p̃) ∈ H be such that

$$k(\tilde U, V) = k(V, \tilde U) = F_e(V) := \int_\Omega (u - u_h) \cdot v \, dx \qquad \forall \, V = (v, q) \in H.$$

For V = e_h we obtain, using the Galerkin orthogonality of e_h:

$$\|u - u_h\|_{L^2}^2 = F_e(e_h) = k(e_h, \tilde U) = k(e_h, \tilde U - w_h) \qquad \forall \, w_h \in H_h = V_h \times M_h,$$

and thus

$$\|u - u_h\|_{L^2}^2 \le C \, \|e_h\|_H \inf_{w_h \in H_h} \|\tilde U - w_h\|_H \le C \, \|e_h\|_H \Big( \inf_{v_h \in V_h} \|\tilde u - v_h\|_1 + \inf_{q_h \in M_h} \|\tilde p - q_h\|_{L^2} \Big).$$

For the second factor on the right-hand side we can use approximation results as in (5.20), (5.21). In combination with the H²-regularity this yields

$$\inf_{v_h \in V_h} \|\tilde u - v_h\|_1 + \inf_{q_h \in M_h} \|\tilde p - q_h\|_{L^2} \le C \, h \big( |\tilde u|_2 + |\tilde p|_1 \big) \le C \, h \, \|u - u_h\|_{L^2}.$$

For ‖e_h‖_H ≤ ‖u − u_h‖₁ + ‖p − p_h‖_{L²} we can use the result in theorem 5.2.8. Thus we conclude that

$$\|u - u_h\|_{L^2}^2 \le C \, h^{m-1} \big( |u|_m + |p|_{m-1} \big) \, h \, \|u - u_h\|_{L^2}$$

holds. □

5.2.2 Other finite element spaces

In this section we briefly discuss some other pairs of finite element spaces that are used in practice for solving Stokes (and Navier-Stokes) problems.

Rectangular finite elements. Let {T_h} be a regular family of triangulations consisting of n-rectangles. The pair of rectangular finite element spaces is given by (cf. (3.17)):

$$(V_h, M_h) = \big( (Q^k_{h,0})^n, \; Q^{k-1}_h \cap L^2_0(\Omega) \big), \qquad k \ge 1.$$

In remark 5.2.1 it is shown that for k = 1 this pair in general will not be stable. In [11] it is proved that the pair (V_h, M_h) with k = 2 is stable both for n = 2 and n = 3. In [87] it is proved that for all k ≥ 2 the pair (V_h, M_h) is stable if n = 2. In these stable cases one can prove discretization error bounds as in theorem 5.2.8 and theorem 5.2.9. The analysis is very similar to the one presented for the case of simplicial finite elements in section 5.2.1.

Mini-element. Let {T_h} be a regular family of triangulations consisting of simplices. For every T ∈ T_h we can define a so-called "bubble" function

$$b_T(x) = \begin{cases} \prod_{i=1}^{n+1} \lambda_i(x) & \text{for } x \in T, \\ 0 & \text{otherwise,} \end{cases}$$

with λ_i(x), i = 1, ..., n+1, the barycentric coordinates of x ∈ T. Define the space of bubble functions B := span{ b_T | T ∈ T_h }. The mini-element, introduced in [4], is given by the pair

$$(V_h, M_h) = \big( (X^1_{h,0} \oplus B)^n, \; X^1_h \cap L^2_0(\Omega) \big).$$

This element is stable, cf. [40, 4]. An advantage of this element compared to, for example, the Hood-Taylor element with k = 2 is that the implementation of the former is relatively easy. This is due to the following. The unknowns associated to the bubble basis functions can be eliminated by a simple local technique (so-called static condensation) and the remaining unknowns for the velocity and pressure basis functions are associated to the same set of points, namely the vertices of the simplices. In case of Hood-Taylor elements (k = 2) one also needs the midpoints of edges for some of the velocity unknowns. Hence, the data structures for the mini-element are relatively simple. A disadvantage of the mini-element is its low accuracy (only P₁ for the velocity).

IsoP2 − P1 element. This element is a variant of the Hood-Taylor element with k = 2. Let {T_h} be a regular family of triangulations consisting of simplices. Given T_h we construct a refinement T_{h/2} by dividing each n-simplex T ∈ T_h, n = 2 or n = 3, into 2^n subsimplices by connecting the midpoints of the edges of T. Note that for n = 3 this construction is not unique. The space of continuous functions which are piecewise linear on the simplices in T_{h/2} and zero on ∂Ω is denoted by X^1_{h/2,0}. The isoP2 − P1 element consists of the pair of spaces

$$(V_h, M_h) = \big( (X^1_{h/2,0})^n, \; X^1_h \cap L^2_0(\Omega) \big).$$

Both for n = 2 and n = 3 this pair is stable. This can be shown using the analysis of section 5.2.1. The proofs of theorem 5.2.4 and of theorem 5.2.5 apply, with minor modifications,

to the isoP2 − P1 pair, too. In the discrete velocity space V_h the degrees of freedom (unknowns) are associated to the vertices and the midpoints of edges of T ∈ T_h. This is the same as for the discrete velocity space in the Hood-Taylor pair with k = 2. This explains the name isoP2 − P1. Note, however, that the accuracy for the velocity for the isoP2 − P1 element is only O(h) in the norm ‖·‖₁, whereas for the Hood-Taylor pair with k = 2 one has O(h²) in the norm ‖·‖₁ (provided the solution is sufficiently smooth).

Nonconforming Crouzeix-Raviart: in preparation.

In certain situations, if the pair (V_h, M_h) of finite element spaces is not stable, one can still successfully apply these spaces for discretization of the Stokes problem, provided one uses an appropriate stabilization technique. We do not discuss this topic here. An overview of some useful stabilization methods is given in [73], section 9.4.


Chapter 6

Linear iterative methods

The discretization of elliptic boundary value problems like the Poisson equation or the Stokes equations results in a large sparse linear system of equations. For the numerical solution of such a system iterative methods are applied. Important classes of iterative methods are treated in the chapters 7-10. In this chapter we present some basic results on linear iterative methods and discuss some classical iterative methods like, for example, the Jacobi and Gauss-Seidel method. In our applications these methods turn out to be very inefficient and thus not very suitable for practical use. However, these methods play a role in the more advanced (and more efficient) methods treated in the chapters 7-10. Furthermore, these basic iterative methods can be used to explain important notions such as convergence rate and efficiency. Standard references for a detailed analysis of basic iterative methods are Varga [92] and Young [100]. We also refer to Hackbusch [46], [48] and Axelsson [6] for an extensive analysis of these methods.

In the remainder of this chapter we consider a (large sparse) linear system of equations

$$A x = b \tag{6.1}$$

with a nonsingular matrix A ∈ ℝ^{n×n}. The solution of this system is denoted by x*.

6.1 Introduction

We consider a given iterative method, denoted by x^{k+1} = Ψ(x^k), k ≥ 0, for solving the system in (6.1). We define the error as e^k = x^k − x*, k ≥ 0. The iterative method is called a linear iterative method if there exists a matrix C (depending on the particular method but independent of k) such that for the errors we have the recursion

$$e^{k+1} = C e^k, \qquad k \ge 0. \tag{6.2}$$

The matrix C is called the iteration matrix of the method. In the next section we will see that basic iterative methods are linear. Also the multigrid methods discussed in chapter 9 are linear. The Conjugate Gradient method, however, is a nonlinear iterative method (cf. chapter 7). From (6.2) it follows that e^k = C^k e^0 for all k, and thus lim_{k→∞} e^k = 0 for arbitrary e^0 if and only if lim_{k→∞} C^k = 0. Based on this, the linear iterative method with iteration matrix C is called convergent if

$$\lim_{k \to \infty} C^k = 0. \tag{6.3}$$

127

An important characterization for convergence is related to the spectral radius of the iteration matrix. To derive this characterization we first need two lemmas. Lemma 6.1.1 For all B ∈ Rn×n and all ε > 0 there exists a matrix norm k · k∗ on Rn×n such that kBk∗ ≤ ρ(B) + ε Proof. For the given matrix B there exists a nonsingular matrix T ∈ Cn×n which transforms B to its Jordan normal form: T−1 BT = J,

J = blockdiag(Jℓ )1≤ℓ≤m ,

with Jℓ = λℓ ,  λℓ 1  ..  . or Jℓ =    ∅

..





   ,  .. . 1 λℓ .

λℓ ∈ σ(B),

1≤ℓ≤m

˜ ε := TDε , J ˜ ε := D−1 JDε . For the given ε > 0 define Dε := diag(ε, ε2 , . . . , εn ) ∈ Rn×n and T ε ˜ ε has the same form as J, only with the entries 1 on the codiagonal replaced by ε. Note that J For C ∈ Rn×n define ˜ −1 ˜ kCk∗ := kT ε CTε k∞ = max

1≤i≤n

n X j=1

˜ −1 ˜ |(T ε CTε )ij |

This defines a matrix norm on Rn×n . Furthermore, ˜ −1 BT ˜ ε k∞ = kJ ˜ ε k∞ ≤ max |λ| + ε ≤ ρ(B) + ε kBk∗ = kT ε λ∈σ(B)

This proves the result of the lemma. Lemma 6.1.2 (Stable matrices) For B ∈ Rn×n the following holds: lim Bk = 0 if and only if ρ(B) < 1.

k→∞

Proof. “⇐”. Take ε > 0 such that ρ(B) + ε < 1 holds and let k · k∗ be the matrix norm defined in lemma 6.1.1. Then kBk k∗ ≤ kBkk∗ ≤ (ρ(B) + ε)k holds. Hence, limk→∞ kBk k∗ = 0 and thus limk→∞ Bk = 0. ∞ n n×n . Take “⇒”. Let max{ kCxk kxk∞ | x ∈ C , x 6= 0 } be the complex maximum norm on C λ ∈ σ(C) and v ∈ Cn , v 6= 0, such that Cv = λv and |λ| = ρ(C). Then |λ|kvk∞ = kλvk∞ = kCvk∞ . From this it follows that ρ(C) ≤ kCk∞ holds for arbitrary C ∈ Cn×n . From limk→∞ Bk = 0 we get limk→∞ kBk k∞ = 0 and thus, due to ρ(B)k = ρ(Bk ) ≤ kBk k∞ , we have limk→∞ ρ(B)k = 0. Thus ρ(B) < 1 must hold.

128

Corollary 6.1.3 For any B ∈ ℝ^{n×n} and any matrix norm ‖·‖ on ℝ^{n×n} we have

$$\rho(B) \le \|B\|.$$

Proof. If ρ(B) = 0 then B = 0 and the result holds. For ρ(B) ≠ 0 define B̃ := ρ(B)^{-1}B. Assume that ρ(B) > ‖B‖. Then 1 = ρ(B̃) > ‖B̃‖ holds and thus lim_{k→∞} ‖B̃‖^k = 0. Using ‖B̃^k‖ ≤ ‖B̃‖^k this yields lim_{k→∞} B̃^k = 0. From lemma 6.1.2 we conclude ρ(B̃) < 1, which gives a contradiction. □

From lemma 6.1.2 we obtain the following result:

Theorem 6.1.4 A linear iterative method is convergent if and only if for the corresponding iteration matrix C we have ρ(C) < 1.

If ρ(C) < 1 then the iterative method converges and the spectral radius ρ(C) even yields a quantitative result for the rate of convergence. To see this, we first formulate a lemma:

Lemma 6.1.5 For any matrix norm ‖·‖ on ℝ^{n×n} and any B ∈ ℝ^{n×n} the following equality holds:

$$\lim_{k \to \infty} \|B^k\|^{1/k} = \rho(B).$$

Proof. From corollary 6.1.3 we get ρ(B)^k = ρ(B^k) ≤ ‖B^k‖ and thus

$$\rho(B) \le \|B^k\|^{1/k} \quad \text{for all } k \ge 1. \tag{6.4}$$

Take arbitrary ε > 0 and define B̃ := (ρ(B) + ε)^{-1}B. Then ρ(B̃) < 1 and thus lim_{k→∞} B̃^k = 0. Hence there exists k₀ such that for all k ≥ k₀, ‖B̃^k‖ ≤ 1, i.e., (ρ(B) + ε)^{-k}‖B^k‖ ≤ 1. We get

$$\|B^k\|^{1/k} \le \rho(B) + \varepsilon \quad \text{for all } k \ge k_0. \tag{6.5}$$

From (6.4) and (6.5) it follows that lim_{k→∞} ‖B^k‖^{1/k} = ρ(B). □

For the error e^k = C^k e^0 we have

$$\max_{e^0 \in \mathbb{R}^n} \frac{\|e^k\|}{\|e^0\|} \le \frac{1}{e} \;\Leftrightarrow\; \max_{e^0 \in \mathbb{R}^n} \frac{\|C^k e^0\|}{\|e^0\|} \le \frac{1}{e} \;\Leftrightarrow\; \|C^k\| \le \frac{1}{e} \;\Leftrightarrow\; \|C^k\|^{1/k} \le \Big( \frac{1}{e} \Big)^{1/k}.$$

From lemma 6.1.5 we have that, for k large enough,

$$\|C^k\|^{1/k} \approx \rho(C).$$

Hence, to reduce the norm of an arbitrary starting error ‖e^0‖ by a factor 1/e we need asymptotically (i.e. for k large enough) approximately (−ln(ρ(C)))^{-1} iterations. Based on this we call −ln(ρ(C)) the asymptotic convergence rate of the iterative method (in the literature, e.g. Hackbusch [48], sometimes ρ(C) is called the asymptotic convergence rate). The quantity ‖C‖ is the contraction number of the iterative method. Note that

$$\|e^k\| \le \|C\| \, \|e^{k-1}\| \quad \text{for all } k \ge 1$$

holds, and ρ(C) ≤ ‖C‖.

From these results we conclude that ρ(C) is a reasonable measure for the rate of convergence, provided k is large enough. For k "small" it may be better to use the contraction number as a measure for the rate of convergence. Note that the asymptotic convergence rate does not depend on the norm ‖·‖. In some situations, measuring the rate of convergence using the contraction number or using the asymptotic rate of convergence is the same. For example, if we use the Euclidean norm and if C is symmetric then

$$\rho(C) = \|C\|_2 \tag{6.6}$$

holds. However, in other situations, for example if C is strongly nonsymmetric, one can have ρ(C) ≪ ‖C‖. To measure the quality (efficiency) of an iterative method one has to consider the following two aspects:
• The arithmetic costs per iteration. This can be quantified in flops needed for one iteration.
• The rate of convergence. This can be quantified using −ln(ρ(C)) (asymptotic convergence rate) or ‖C‖ (contraction number).

To be able to compare iterative methods the notion of complexity is introduced. We assume:
• A given linear system.
• A given error reduction factor R, i.e. we wish to reduce the norm of an arbitrary starting error by a factor R.

The complexity of an iterative method is then defined as the order of magnitude of the number of flops needed to obtain an error reduction with a factor R for the given problem. In this notion the arithmetic costs per iteration and the rate of convergence are combined. The quality of different methods for a given problem (class) can be compared by means of this complexity concept. Examples of this are given in section 6.6.
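The statement of lemma 6.1.5 is easy to observe numerically. The following sketch (not from the text) estimates ρ(C) via ‖C^k‖₂^{1/k} for the (nonsymmetric) Gauss-Seidel iteration matrix of the 1D Poisson matrix; the matrix and method are introduced in the next sections and serve here only as a convenient example.

```python
# Illustration of lemma 6.1.5: ||C^k||^{1/k} -> rho(C), and the asymptotic
# convergence rate -ln(rho(C)). C is the Gauss-Seidel iteration matrix
# C = (D - L)^{-1} U for the 1D Poisson matrix A = tridiag(-1, 2, -1).
import numpy as np

n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.tril(A)                                   # D - L
N = -np.triu(A, 1)                               # U
C = np.linalg.solve(M, N)                        # iteration matrix

for k in (10, 100, 1000):
    est = np.linalg.norm(np.linalg.matrix_power(C, k), 2) ** (1.0 / k)
    print(k, est)                                # approaches rho(C) as k grows

rho = max(abs(np.linalg.eigvals(C)))
print("rho(C) =", rho, " asymptotic rate -ln(rho) =", -np.log(rho))
# roughly (-ln rho)^{-1} iterations are needed per factor 1/e of error reduction
```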

6.2 Basic linear iterative methods

In this section we introduce classical linear iterative schemes, namely the Richardson, (damped) Jacobi, Gauss-Seidel and SOR methods. For the convergence analysis that is presented in the sections 6.3 and 6.5 it is convenient to put these methods in the general framework of so-called matrix splittings (cf. Varga [92]). We first show how a splitting of the matrix A in a natural way results in a linear iterative method. We assume a splitting of the matrix A such that

$$A = M - N, \quad \text{where} \tag{6.7a}$$
$$M \text{ is nonsingular, and} \tag{6.7b}$$
$$\text{for arbitrary } y \text{ we can solve } Mx = y \text{ with relatively low costs.} \tag{6.7c}$$

For the solution x* of the system in (6.1) we have

$$M x^* = N x^* + b.$$

The splitting of A results in the following matrix splitting iterative method. For a given starting vector x^0 ∈ ℝ^n we define

$$M x^{k+1} = N x^k + b, \qquad k \ge 0. \tag{6.8}$$

This can also be written as

$$x^{k+1} = x^k - M^{-1}(A x^k - b). \tag{6.9}$$

From (6.9) it follows that for the error e^k = x^k − x* we have

$$e^{k+1} = (I - M^{-1}A) \, e^k.$$

Hence the iteration in (6.8), (6.9) is a linear iterative method with iteration matrix

$$C = I - M^{-1}A = M^{-1}N. \tag{6.10}$$

The condition in (6.7c) is introduced to obtain a method in (6.8) for which the arithmetic costs per iteration are acceptable. Below we will see that the above-mentioned classical iterative methods can be derived using a suitable matrix splitting. These methods satisfy the conditions in (6.7b), (6.7c), but unfortunately, when applied to discrete elliptic boundary value problems, the convergence rates of these methods are in general very low. This is illustrated in section 6.6.

Richardson method
The simplest linear iterative method is the Richardson method:

$$x^0 \text{ a given starting vector}, \qquad x^{k+1} = x^k - \omega(A x^k - b), \quad k \ge 0, \tag{6.11}$$

with a parameter ω ∈ ℝ, ω ≠ 0. The iteration matrix of this method is given by C_ω = I − ωA.

Jacobi method
A second classical and very simple method is due to Jacobi. We introduce the notation

$$D := \operatorname{diag}(A), \qquad A = D - L - U, \tag{6.12}$$

with L a lower triangular matrix with zero entries on the diagonal and U an upper triangular matrix with zero entries on the diagonal. We assume that A has only nonzero entries on the diagonal, so D is nonsingular. The method of Jacobi is the iterative method as in (6.8) based on the matrix splitting M := D, N := L + U. The method of Jacobi is as follows:

$$x^0 \text{ a given starting vector}, \qquad D x^{k+1} = (L + U) x^k + b, \quad k \ge 0.$$

This can also be formulated row by row:

$$x^0 \text{ a given starting vector}, \qquad a_{ii} x_i^{k+1} = -\sum_{j \ne i} a_{ij} x_j^k + b_i, \quad i = 1, 2, \ldots, n, \quad k \ge 0. \tag{6.13}$$

From this we see that in the method of Jacobi we solve the i-th equation (Σ_{j=1}^n a_{ij} x_j = b_i) for the i-th unknown (x_i) using values for the other unknowns (x_j, j ≠ i) computed in the previous iteration. The iteration can also be represented as

$$x^{k+1} = (I - D^{-1}A) x^k + D^{-1} b, \qquad k \ge 0.$$

In the Jacobi method the computational costs per iteration are "low", namely comparable to one matrix-vector multiplication Ax, i.e. cn flops (due to the sparsity of A). We introduce a variant of the Jacobi method in which a parameter is used. This method is given by

$$x^{k+1} = (I - \omega D^{-1}A) x^k + \omega D^{-1} b, \qquad k \ge 0, \tag{6.14}$$

with a given real parameter ω ≠ 0. This method corresponds to the splitting

$$M = \frac{1}{\omega} D, \qquad N = \Big( \frac{1}{\omega} - 1 \Big) D + L + U. \tag{6.15}$$

For ω = 1 we obtain the Jacobi method. For ω ≠ 1 this method is called the damped Jacobi method ("damped" due to the fact that in practice one usually takes ω ∈ (0, 1)).

Gauss-Seidel method
This method is based on the matrix splitting

$$M := D - L, \qquad N := U.$$

This results in the method:

$$x^0 \text{ a given starting vector}, \qquad (D - L) x^{k+1} = U x^k + b, \quad k \ge 0.$$

This can be formulated row-wise:

$$x^0 \text{ a given starting vector}, \qquad a_{ii} x_i^{k+1} = -\sum_{j < i} a_{ij} x_j^{k+1} - \sum_{j > i} a_{ij} x_j^k + b_i, \quad i = 1, \ldots, n. \tag{6.16}$$

For the Gauss-Seidel method to be feasible we assume that D is nonsingular. In the Jacobi method (6.13) for the computation of x_i^{k+1} (i.e. for solving the i-th equation for the i-th unknown x_i) we use the values x_j^k, j ≠ i, whereas in the Gauss-Seidel method (6.16) for the computation of x_i^{k+1} we use x_j^{k+1}, j < i, and x_j^k, j > i. The iteration matrix of the Gauss-Seidel method is given by

$$C = (D - L)^{-1} U = I - (D - L)^{-1} A.$$

SOR method
The Gauss-Seidel method in (6.16) can be rewritten as

$$x^0 \text{ a given starting vector}, \qquad x_i^{k+1} = x_i^k - \Big( \sum_{j < i} a_{ij} x_j^{k+1} + \sum_{j \ge i} a_{ij} x_j^k - b_i \Big) / a_{ii}, \quad i = 1, \ldots, n.$$

In the SOR method the correction to x_i^k in this formula is multiplied by a relaxation parameter ω > 0:

$$x^0 \text{ a given starting vector}, \qquad x_i^{k+1} = x_i^k - \omega \Big( \sum_{j < i} a_{ij} x_j^{k+1} + \sum_{j \ge i} a_{ij} x_j^k - b_i \Big) / a_{ii}, \quad i = 1, \ldots, n. \tag{6.17}$$

The name "successive overrelaxation" reflects the fact that in general the optimal value of the parameter satisfies ω > 1 (cf. theorem 6.4.3 below). For ω = 1 the SOR method results in the Gauss-Seidel method. In matrix-vector notation the SOR method is as follows:

$$D x^{k+1} = (1 - \omega) D x^k + \omega (L x^{k+1} + U x^k + b),$$

or, equivalently,

$$\Big( \frac{1}{\omega} D - L \Big) x^{k+1} = \Big[ \Big( \frac{1}{\omega} - 1 \Big) D + U \Big] x^k + b.$$

From this it is clear that the SOR method is also a matrix splitting iterative method, corresponding to the splitting (cf. (6.15))

$$M := \frac{1}{\omega} D - L, \qquad N := \Big( \frac{1}{\omega} - 1 \Big) D + U.$$

The iteration matrix is given by

$$C = C_\omega = I - M^{-1}A = I - \Big( \frac{1}{\omega} D - L \Big)^{-1} A.$$

For the SOR method the arithmetic costs per iteration are comparable to those of the Gauss-Seidel method.

The Symmetric Successive Overrelaxation method (SSOR) is a variant of the SOR method. One SSOR iteration consists of two SOR steps. In the first step we apply an SOR iteration as in (6.17) and in the second step we again apply an SOR iteration but now with the reversed ordering of the unknowns. In formulas we thus have:

$$x_i^{k+1/2} = x_i^k - \omega \Big( \sum_{j < i} a_{ij} x_j^{k+1/2} + \sum_{j \ge i} a_{ij} x_j^k - b_i \Big) / a_{ii}, \qquad i = 1, 2, \ldots, n,$$
$$x_i^{k+1} = x_i^{k+1/2} - \omega \Big( \sum_{j \le i} a_{ij} x_j^{k+1/2} + \sum_{j > i} a_{ij} x_j^{k+1} - b_i \Big) / a_{ii}, \qquad i = n, n-1, \ldots, 1.$$

This method results if we use a matrix splitting with

$$M = \frac{1}{\omega(2 - \omega)} (D - \omega L) D^{-1} (D - \omega U). \tag{6.18}$$

Although the arithmetic costs for one SSOR iteration seem to be significantly higher than for one SOR iteration, one can implement SSOR in such a way that the costs per iteration are approximately the same as for SOR (cf. [68]). In many cases the rate of convergence of both methods is about the same. Often, in the SSOR method the sensitivity of the convergence rate with respect to variation in the relaxation parameter ω is much lower than in the SOR method (cf. Axelsson and Barker [7]). Finally we note that, if A is symmetric positive definite then the matrix M in (6.18) is symmetric positive definite, too (such a property does not hold for the SOR method). Due to this property the SSOR method can be used as a preconditioner for the Conjugate Gradient method. This is further explained in chapter 7.
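The Jacobi, Gauss-Seidel and SOR methods fit into a single short sketch (not from the text) in the splitting form (6.9), x^{k+1} = x^k − M^{-1}(Ax^k − b). Dense matrices are used only for clarity; in practice A is sparse and M^{-1}r is computed by a diagonal scaling or a triangular sweep, as in (6.13), (6.16), (6.17).

```python
# Sketch: basic splitting methods x^{k+1} = x^k - M^{-1}(A x^k - b), cf. (6.9).
import numpy as np

def splitting_iteration(A, b, M, tol=1e-8, maxit=100_000):
    x = np.zeros_like(b)
    for k in range(maxit):
        r = A @ x - b
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        x = x - np.linalg.solve(M, r)   # M is diagonal or triangular
    return x, maxit

n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1D Poisson model matrix
b = np.ones(n)
D = np.diag(np.diag(A))
omega = 1.8                                            # SOR relaxation parameter

for name, M in [("Jacobi", D),
                ("Gauss-Seidel", np.tril(A)),          # M = D - L
                ("SOR", D / omega + np.tril(A, -1))]:  # M = D/omega - L
    x, its = splitting_iteration(A, b, M)
    print(f"{name:12s} iterations: {its}")
```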

6.3 Convergence analysis in the symmetric positive definite case

For the classical linear iterative methods we derive convergence results for the case that A is symmetric positive definite. Recall that for square symmetric matrices B and C we use the notation B < C (B ≤ C) if C − B is positive definite (semi-definite). We start with an elementary lemma:

Lemma 6.3.1 Let B ∈ ℝ^{n×n} be a symmetric positive definite matrix. The smallest and largest eigenvalues of B are denoted by λ_min(B) and λ_max(B), respectively. The following holds:

$$\rho(I - \omega B) < 1 \quad \text{iff} \quad 0 < \omega < \frac{2}{\lambda_{\max}(B)}.$$

Theorem 6.3.6 Let A ∈ ℝ^{n×n} be symmetric positive definite. Then the SOR method is convergent for all ω ∈ (0, 2).

Proof. For the SOR splitting M = (1/ω)D − L we have, with U = L^T,

$$M + M^T = \frac{2}{\omega} D - L - U > A \quad \text{if } \omega \in (0, 2).$$

Application of lemma 6.3.4 proves the result. □

In the following lemma we show that for every matrix A (i.e., not necessarily symmetric) with a nonsingular diagonal the SOR method with ω ∉ (0, 2) is not convergent.

Lemma 6.3.7 Let A ∈ ℝ^{n×n} be a matrix with a_ii ≠ 0 for all i. For the iteration matrix C_ω = I − ((1/ω)D − L)^{-1}A of the SOR method we have

$$\rho(C_\omega) \ge |1 - \omega| \quad \text{for all } \omega \ne 0.$$

Proof. Define L̂ := D^{-1}L and Û := D^{-1}U. Then we have

$$C_\omega = I - (I - \omega \hat L)^{-1} \omega (I - \hat L - \hat U) = (I - \omega \hat L)^{-1} \big( (1 - \omega) I + \omega \hat U \big).$$

Hence, det(C_ω) = det((I − ωL̂)^{-1}((1 − ω)I + ωÛ)) = (1 − ω)^n. Let {λ_i | 1 ≤ i ≤ n} = σ(C_ω) be the spectrum of the iteration matrix. Then, due to the fact that the determinant of a matrix equals the product of its eigenvalues, we get Π_{i=1}^n |λ_i| = |1 − ω|^n. Thus there must be an eigenvalue with modulus at least |1 − ω|. □

Jacobi for the nonsymmetric case: in preparation.

Block-Jacobi method: in preparation.

6.4 Rate of convergence of the SOR method

The result in theorem 6.3.6 shows that in the symmetric positive definite case the SOR method is convergent if we take ω ∈ (0, 2). This result, however, does not quantify the rate of convergence. Moreover, it is not clear how the rate of convergence depends on the choice of the parameter ω. It is known that for certain problems a suitable choice of the parameter ω can result in an SOR method which has a much higher rate of convergence than the Jacobi and Gauss-Seidel methods. This is illustrated in example 6.6.4. However, the relation between the rate of convergence and the parameter ω is strongly problem dependent and for most problems it is not known how a good (i.e. close to optimal) value for the parameter ω can be determined. In this section we present an analysis which, for a relatively small class of block-tridiagonal matrices, shows the dependence of the spectral radius of the SOR iteration matrix on the parameter ω. For related (more general) results we refer to the literature, e.g., Young [100], Hageman and Young [50], Varga [92]. For a more recent treatment we refer to Hackbusch [48]. We start with a technical lemma. Recall the decomposition A = D − L − U, with D = diag(A), L and U strictly lower and upper triangular matrices, respectively.

Lemma 6.4.1 Consider A = D − L − U with det(A) ≠ 0. Assume that A has the block-tridiagonal structure

$$A = \begin{pmatrix} D_{11} & A_{12} & & \emptyset \\ A_{21} & D_{22} & \ddots & \\ & \ddots & \ddots & A_{k-1,k} \\ \emptyset & & A_{k,k-1} & D_{kk} \end{pmatrix}, \qquad D_{ii} \in \mathbb{R}^{n_i \times n_i}, \; 1 \le i \le k, \tag{6.26}$$

with diag(A) = blockdiag(D₁₁, ..., D_kk). Then the eigenvalues of z D^{-1}L + (1/z) D^{-1}U are independent of z ∈ ℂ, z ≠ 0.

Proof. For z ∈ ℂ, z ≠ 0, define G_z := z D^{-1}L + (1/z) D^{-1}U. Note that

$$G_z = - \begin{pmatrix} 0 & \frac{1}{z} D_{11}^{-1} A_{12} & & \emptyset \\ z D_{22}^{-1} A_{21} & 0 & \ddots & \\ & \ddots & \ddots & \frac{1}{z} D_{k-1,k-1}^{-1} A_{k-1,k} \\ \emptyset & & z D_{kk}^{-1} A_{k,k-1} & 0 \end{pmatrix}.$$

Introduce T_z := blockdiag(I₁, zI₂, ..., z^{k-1}I_k) with I_i the n_i × n_i identity matrix. Now note that

$$T_z^{-1} G_z T_z = G_1 = D^{-1}L + D^{-1}U.$$

This similarity transformation with T_z does not change the spectrum and thus σ(G_z) = σ(D^{-1}L + D^{-1}U) holds for all z ≠ 0. The latter spectrum is independent of z. □

We collect a few properties in the following lemma.

Lemma 6.4.2 Let A be as in lemma 6.4.1. Let C_J = I − D^{-1}A and C_ω = I − ((1/ω)D − L)^{-1}A be the iteration matrices of the Jacobi and SOR method, respectively. The following holds:

(a) ξ ∈ σ(C_J) ⇔ −ξ ∈ σ(C_J).

(b) 0 ∈ σ(C_ω) ⇒ ω = 1.

(c) For λ ≠ 0, ω ≠ 0 we have

$$\lambda \in \sigma(C_\omega) \;\Leftrightarrow\; \frac{\lambda + \omega - 1}{\omega \lambda^{1/2}} \in \sigma(C_J).$$

Proof. With L̂ := D^{-1}L and Û := D^{-1}U we have C_J = L̂ + Û, C_ω = (I − ωL̂)^{-1}((1 − ω)I + ωÛ). From lemma 6.4.1 with z = 1 and z = −1 we have σ(L̂ + Û) = σ(−L̂ − Û) and thus ξ ∈ σ(L̂ + Û) ⇔ −ξ ∈ σ(−L̂ − Û) = σ(L̂ + Û) holds, which proves (a). If 0 ∈ σ(C_ω) then we have

$$0 = \det\big( (I - \omega \hat L)^{-1} ((1 - \omega)I + \omega \hat U) \big) = \det\big( (1 - \omega)I + \omega \hat U \big) = (1 - \omega)^n$$

and thus ω = 1, i.e., the result in (b) holds. For λ ∈ σ(C_ω), λ ≠ 0 and ω ≠ 0 we have

$$\det(C_\omega - \lambda I) = \det\big( (I - \omega \hat L)^{-1} [ (1 - \omega)I + \omega \hat U - \lambda (I - \omega \hat L) ] \big) = \det\Big( \omega \lambda^{1/2} \big( \lambda^{1/2} \hat L + \lambda^{-1/2} \hat U \big) - (\lambda + \omega - 1) I \Big) = \omega^n \lambda^{n/2} \det\Big( \big( \lambda^{1/2} \hat L + \lambda^{-1/2} \hat U \big) - \frac{\lambda + \omega - 1}{\omega \lambda^{1/2}} I \Big).$$

Using lemma 6.4.1 we get (for λ ≠ 0, ω ≠ 0):

$$\lambda \in \sigma(C_\omega) \;\Leftrightarrow\; \frac{\lambda + \omega - 1}{\omega \lambda^{1/2}} \in \sigma\big( \lambda^{1/2} \hat L + \lambda^{-1/2} \hat U \big) = \sigma(C_J),$$

which proves the result in (c). □

Now we can prove a main result on the rate of convergence of the SOR method.

Theorem 6.4.3 Let A be as in lemma 6.4.1 and C_J, C_ω the iteration matrices of the Jacobi and SOR method, respectively. Assume that all eigenvalues of C_J are real and that μ := ρ(C_J) < 1 (i.e. the method of Jacobi is convergent). Define

$$\omega_{opt} := \frac{2}{1 + \sqrt{1 - \mu^2}} = 1 + \Big( \frac{\mu}{1 + \sqrt{1 - \mu^2}} \Big)^2. \tag{6.27}$$

The following holds:

$$\rho(C_\omega) = \begin{cases} \dfrac{1}{4} \Big( \omega\mu + \sqrt{\omega^2\mu^2 - 4(\omega - 1)} \Big)^2 & \text{for } 0 < \omega \le \omega_{opt}, \\[1ex] \omega - 1 & \text{for } \omega_{opt} \le \omega < 2, \end{cases} \tag{6.28}$$

and

$$\omega_{opt} - 1 = \rho(C_{\omega_{opt}}) < \rho(C_\omega) < 1 \quad \text{for all } \omega \in (0, 2), \; \omega \ne \omega_{opt}, \tag{6.29}$$

holds.

Proof. We only consider ω ∈ (0, 2). Introduce L̂ := D^{-1}L and Û := D^{-1}U. First we treat the case where there exists ω̃ ∈ (0, 2) such that ρ(C_ω̃) = 0, i.e., C_ω̃ = 0. This implies ω̃ = 1, Û = 0, μ = 0 and ω_opt = 1. From Û = 0 we get C_ω = (1 − ω)(I − ωL̂)^{-1}, which yields ρ(C_ω) = |1 − ω|. One now easily verifies that for this case the results in (6.28) and (6.29) hold.

We now consider the case with ρ(C_ω) > 0 for all ω ∈ (0, 2). Take λ ∈ σ(C_ω), λ ≠ 0. From lemma 6.4.2 it follows that

$$\frac{\lambda + \omega - 1}{\omega \lambda^{1/2}} = \xi \in \sigma(C_J) \subset [-\mu, \mu].$$

A simple computation yields

$$\lambda = \frac{1}{4} \Big( \omega|\xi| \pm \sqrt{\omega^2\xi^2 - 4(\omega - 1)} \Big)^2. \tag{6.30}$$

We first consider ω with ω_opt ≤ ω < 2. Then ω²ξ² − 4(ω − 1) ≤ ω²μ² − 4(ω − 1) ≤ 0 and thus from (6.30) we obtain

$$|\lambda| = \frac{1}{4} \big( \omega^2\xi^2 - (\omega^2\xi^2 - 4(\omega - 1)) \big) = \omega - 1.$$

Hence in this case all eigenvalues of C_ω have modulus ω − 1 and this implies ρ(C_ω) = ω − 1, which proves the second part of (6.28). We now consider ω with 0 < ω ≤ ω_opt and thus ω²μ² − 4(ω − 1) ≥ 0. If ξ is such that ω²ξ² − 4(ω − 1) ≥ 0 then (6.30) yields

$$|\lambda| = \frac{1}{4} \Big( \omega|\xi| \pm \sqrt{\omega^2\xi^2 - 4(\omega - 1)} \Big)^2.$$

The maximum value is attained for the "+" sign and with the value ξ = ±μ, resulting in

$$|\lambda| = \frac{1}{4} \Big( \omega\mu + \sqrt{\omega^2\mu^2 - 4(\omega - 1)} \Big)^2. \tag{6.31}$$

There may be eigenvalues λ ∈ σ(C_ω) that correspond to ξ ∈ σ(C_J) with ω²ξ² − 4(ω − 1) < 0. As shown above, this yields corresponding λ ∈ σ(C_ω) with |λ| = ω − 1. Due to

$$\frac{1}{4} \Big( \omega\mu + \sqrt{\omega^2\mu^2 - 4(\omega - 1)} \Big)^2 \ge \frac{1}{4} \omega^2\mu^2 \ge \omega - 1$$

we conclude that the maximum value for |λ| is attained for the case (6.31) and thus

$$\rho(C_\omega) = \frac{1}{4} \Big( \omega\mu + \sqrt{\omega^2\mu^2 - 4(\omega - 1)} \Big)^2,$$

which proves the first part in (6.28). An elementary computation shows that for ω ∈ (0, 2) the function ω → ρ(C_ω) as defined in (6.28) is continuous, monotonically decreasing on (0, ω_opt] and monotonically increasing on [ω_opt, 2). Moreover, both for ω ↓ 0 and ω ↑ 2 we have the function value 1. From this the result in (6.29) follows. □

In (6.27) we see that ω_opt > 1 holds, which motivates the name "over"-relaxation. Note that we do not require symmetry of the matrix A. However, we do assume that the eigenvalues of C_J are real. A sufficient condition for the latter to hold is that A is symmetric. For different values of μ the function ω → ρ(C_ω) defined in (6.28) is shown in figure 6.1.

Figure 6.1: Function ω → ρ(C_ω) for μ = 0.6, 0.9 and 0.95 (spectral radius of the SOR iteration matrix versus ω ∈ (0, 2)).

Corollary 6.4.4 If we take ω = 1 then the SOR method is the same as the Gauss-Seidel method. Hence, if A satisfies the assumptions in theorem 6.4.3 we obtain from (6.28)

$$\rho(C_1) = \mu^2 = \rho(C_J)^2.$$

Thus −ln(ρ(C₁)) = −2 ln(ρ(C_J)), i.e., the asymptotic convergence rate of the Gauss-Seidel method is twice the one of the Jacobi method. □

Assume that for μ = ρ(C_J) we have μ = 1 − δ with δ ≪ 1. From theorem 6.4.3 we obtain (provided A fulfills the conditions of this theorem) the following estimate related to the convergence of the SOR method:

$$\rho(C_{\omega_{opt}}) = (1 - \delta)^2 \Big( 1 + \sqrt{2\delta}\sqrt{1 - \delta/2} \Big)^{-2} = 1 - 2\sqrt{2\delta} + \mathcal{O}(\delta).$$

Hence the method of Jacobi has an asymptotic convergence rate −ln(μ) = −ln(1 − δ) ≈ δ and the SOR method has an asymptotic convergence rate −ln(ρ(C_{ω_opt})) ≈ −ln(1 − 2√(2δ)) ≈ 2√(2δ). Note that for δ small we have 2√(2δ) ≫ δ, and thus the SOR method has a significantly higher rate of convergence than the method of Jacobi.
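For the discrete Poisson equation on a uniform grid one has μ = ρ(C_J) = cos(πh) (a standard result for the model problem). Under that assumption the following sketch (not from the text) evaluates (6.27) and (6.28):

```python
# Sketch: omega_opt from (6.27) and rho(C_omega) from (6.28) for the model
# Poisson problem, assuming mu = rho(C_J) = cos(pi*h) for the 5-point stencil.
import numpy as np

def rho_sor(omega, mu):
    """Spectral radius of the SOR iteration matrix according to (6.28)."""
    w_opt = 2.0 / (1.0 + np.sqrt(1.0 - mu**2))
    if omega >= w_opt:
        return omega - 1.0
    return 0.25 * (omega * mu + np.sqrt(omega**2 * mu**2 - 4*(omega - 1)))**2

for h in (1/40, 1/80, 1/160, 1/320):
    mu = np.cos(np.pi * h)
    w_opt = 2.0 / (1.0 + np.sqrt(1.0 - mu**2))
    print(f"h = {h:8.5f}  omega_opt = {w_opt:.4f}  "
          f"rho(C_1) = {rho_sor(1.0, mu):.6f}  "     # Gauss-Seidel: mu^2
          f"rho(C_opt) = {rho_sor(w_opt, mu):.6f}")
```

Note how ρ(C_{ω_opt}) is far smaller than the Gauss-Seidel value μ², in line with the 2√(2δ) versus δ comparison above.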

6.5 Convergence analysis for regular matrix splittings

We present a general convergence analysis for so-called regular matrix splitting methods, due to Varga [92]. For this analysis we need some fundamental results on the largest eigenvalue of a positive matrix and its corresponding eigenvector. These results, due to Perron [71], are presented in section 6.5.1. In this section for B, C ∈ ℝ^{n×n} we use the notation B ≥ C (B > C) iff b_ij ≥ c_ij (b_ij > c_ij) for all i, j. The same ordering notation is used for vectors. For B ∈ ℝ^{n×n} we define |B| = (|b_ij|)_{1≤i,j≤n} and similarly for vectors.

6.5.1 Perron theory for positive matrices

For a matrix A ∈ ℝ^{n×n} an eigenvalue λ ∈ σ(A) for which |λ| = ρ(A) holds is not necessarily real. If we assume A > 0 then it can be shown that ρ(A) ∈ σ(A) holds and, moreover, that the corresponding eigenvector is strictly positive. These and other related results, due to Perron [71], are given in lemma 6.5.2, theorem 6.5.3 and theorem 6.5.5. We start the analysis with an elementary lemma.

Lemma 6.5.1 For B, C ∈ ℝ^{n×n} the following holds:

$$0 \le B \le C \;\Rightarrow\; \rho(B) \le \rho(C).$$

Proof. From 0 ≤ B ≤ C we get 0 ≤ B^k ≤ C^k for all k. Hence, ‖B^k‖_∞ ≤ ‖C^k‖_∞ for all k. Recall that for arbitrary A ∈ ℝ^{n×n} we have ρ(A) = lim_{k→∞} ‖A^k‖_∞^{1/k} (cf. lemma 6.1.5). Using this we get ρ(B) ≤ ρ(C). □

Lemma 6.5.2 Take A ∈ ℝ^{n×n} with A > 0. For λ ∈ σ(A) with |λ| = ρ(A) and w ∈ ℂ^n, w ≠ 0, with Aw = λw the relation A|w| = ρ(A)|w| holds, i.e., ρ(A) is an eigenvalue of A.

Proof. With these λ and w we have

$$\rho(A)|w| = |\lambda||w| = |\lambda w| = |Aw| \le |A||w| = A|w|. \tag{6.32}$$

Assume that we have "<" in (6.32). Then there exists α > ρ(A) such that α|w| ≤ A|w|, and thus A^k|w| ≥ α^k|w| for all k ∈ ℕ. This yields

$$\|A^k\|_\infty \ge \frac{\|A^k |w|\|_\infty}{\| \, |w| \, \|_\infty} \ge \alpha^k$$

and thus ρ(A) = lim_{k→∞} ‖A^k‖_∞^{1/k} ≥ α, which is a contradiction with α > ρ(A). We conclude that in (6.32) equality must hold, i.e., A|w| = ρ(A)|w|. □

Theorem 6.5.3 (Perron) For A ∈ ℝ^{n×n} with A > 0 the following holds:

$$\rho(A) > 0 \text{ is an eigenvalue of } A \tag{6.33a}$$
$$\text{There exists a vector } v > 0 \text{ such that } Av = \rho(A)v \tag{6.33b}$$
$$\text{If } Aw = \rho(A)w \text{ holds, then } w \in \operatorname{span}(v) \text{ with } v \text{ from (6.33b)} \tag{6.33c}$$

Proof. From lemma 6.5.2 we obtain that there exists w ≠ 0 such that

$$A|w| = \rho(A)|w| \tag{6.34}$$

holds. Thus ρ(A) is an eigenvalue of A. The eigenvector |w| from (6.34) contains at least one entry that is strictly positive. Due to this and A > 0 we have that A|w| > 0, which due to (6.34) implies ρ(A) > 0 and |w| > 0. From this the results in (6.33a) and (6.33b) follow. Assume that there exists x ≠ 0 independent of v such that Ax = ρ(A)x. For arbitrary 1 ≤ k ≤ n define α = x_k/v_k and z = x − αv. Note that z_k = 0 and due to the assumption that x and v are independent we have z ≠ 0. We also have Az = ρ(A)z. From lemma 6.5.2 we get A|z| = ρ(A)|z|, which results in a contradiction, because (A|z|)_k > 0 and ρ(A)(|z|)_k = 0. Thus the result in (6.33c) is proved. □

The eigenvalue ρ(A) and corresponding eigenvector v > 0 (which is unique up to scaling) are called the Perron root and Perron vector. If instead of A > 0 we only assume A ≥ 0 then the results (6.33a) and (6.33b) hold with ">" replaced by "≥", as is shown in the following corollary. Clearly, for A ≥ 0 the result (6.33c) does not always hold (take A = 0).

Corollary 6.5.4 For A ∈ ℝ^{n×n} with A ≥ 0 the following holds:

$$\rho(A) \text{ is an eigenvalue of } A \tag{6.35a}$$
$$\text{There exists a nonzero vector } v \ge 0 \text{ such that } Av = \rho(A)v \tag{6.35b}$$

Proof. For ε ∈ (0, 1] define A_ε := (a_ij + ε)_{1≤i,j≤n}. From theorem 6.5.3 it follows that for λ_ε := ρ(A_ε) there exists a vector v_ε > 0 such that A_ε v_ε = λ_ε v_ε holds. We scale v_ε such that ‖v_ε‖_∞ = 1. Then, for all ε this vector is contained in the compact set { x ∈ ℝ^n | ‖x‖_∞ = 1 } =: S. Hence there exists a decreasing sequence 1 > ε₁ > ε₂ > ..., with lim_{j→∞} ε_j = 0 and lim_{j→∞} v_{ε_j} = v ∈ S. Thus v ≠ 0 and from v_{ε_j} > 0 for all j it follows that v ≥ 0. Note that 0 ≤ A ≤ A_{ε_i} ≤ A_{ε_j} for all i ≥ j. Using lemma 6.5.1 we get ρ(A) ≤ λ_{ε_i} ≤ λ_{ε_j} for all i ≥ j. From this it follows that

$$\lim_{i \to \infty} \lambda_{\varepsilon_i} = \lambda \ge \rho(A). \tag{6.36}$$

Taking the limit i → ∞ in the equation A_{ε_i} v_{ε_i} = λ_{ε_i} v_{ε_i} yields Av = λv and thus λ is an eigenvalue of A with v ≠ 0. This implies λ ≤ ρ(A). In combination with (6.36) this yields λ = ρ(A), which completes the proof. □

In the next theorem we present a few further results for the Perron root of a positive matrix.

Theorem 6.5.5 (Perron) For A ∈ ℝ^{n×n} with A > 0 the following holds:

$$\rho(A) \text{ is a simple eigenvalue of } A \text{ (note that this implies (6.33c))} \tag{6.37a}$$
$$\text{For all } \lambda \in \sigma(A), \lambda \ne \rho(A), \text{ we have } |\lambda| < \rho(A) \tag{6.37b}$$
$$\text{No nonnegative eigenvector belongs to any other eigenvalue than } \rho(A) \tag{6.37c}$$

Proof. We use the Jordan form A = TΛT^{-1} (cf. Appendix B) with Λ a matrix of the form Λ = blockdiag(Λ_i)_{1≤i≤s} and

$$\Lambda_i = \begin{pmatrix} \lambda_i & 1 & & \emptyset \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ \emptyset & & & \lambda_i \end{pmatrix} \in \mathbb{R}^{k_i \times k_i}, \qquad 1 \le i \le s,$$

with all λ_i ∈ σ(A). Due to (6.33c) we know that the eigenspace corresponding to the eigenvalue ρ(A) is one-dimensional. Thus there is only one block Λ_i with λ_i = ρ(A). Let the ordering of the blocks in Λ be such that the first block Λ₁ corresponds to the eigenvalue λ₁ = ρ(A). We will now show that its dimension must be k₁ = 1. Let e_j be the j-th basis vector in ℝ^n and define t := Te₁, t̂ := T^{-T}e_{k₁}. From ATe₁ = TΛe₁ we get At = ρ(A)t and thus t is the Perron vector of A. This implies t > 0. Note that A^T T^{-T} e_{k₁} = T^{-T} Λ^T e_{k₁} and thus A^T t̂ = ρ(A)t̂. Since A^T > 0 this implies that t̂ is the Perron vector of A^T and thus t̂ > 0. Using that both t and t̂ are strictly positive we get 0 < t̂^T t = e_{k₁}^T T^{-1} T e₁ = e_{k₁}^T e₁. This can only be true if k₁ = 1. We conclude that there is only one Jordan block corresponding to ρ(A) and that this block has size 1 × 1, i.e., ρ(A) is a simple eigenvalue.

We now consider (6.37b). Let w ∈ ℂ^n, w ≠ 0, λ = e^{iφ}ρ(A) (i.e., |λ| = ρ(A)) be such that Aw = λw. From lemma 6.5.2 we get that A|w| = ρ(A)|w| and from (6.33c) it follows that |w| > 0 holds. We introduce ψ_k, r_k ∈ ℝ, with r_k > 0, such that w_k = r_k e^{iψ_k}, 1 ≤ k ≤ n, and D := diag(e^{iψ_k})_{1≤k≤n}. Then D|w| = w holds and thus

$$AD|w| = Aw = \lambda w = \lambda D|w| = e^{i\varphi}\rho(A) D|w| = e^{i\varphi} D A|w|.$$

This yields

$$\big( e^{-i\varphi} D^{-1} A D - A \big) |w| = 0.$$

Consider the k-th row of this identity:

$$\sum_{j=1}^n \big( e^{-i(\varphi + \psi_k - \psi_j)} - 1 \big) a_{kj} |w_j| = 0.$$

Due to a_{kj}|w_j| > 0 for all j this can only be true if e^{-i(φ + ψ_k − ψ_j)} − 1 = 0 for all j = 1, ..., n. We take j = k and thus obtain e^{-iφ} = 1, hence λ = e^{iφ}ρ(A) = ρ(A). This shows that (6.37b) holds.

We finally prove (6.37c). Assume Aw = λw with a nonzero vector w ≥ 0 and λ ≠ ρ(A). Application of theorem 6.5.3 to A^T implies that there exists a vector x > 0 such that A^T x = ρ(A^T)x = ρ(A)x. Note that x^T A w = λ x^T w and x^T A w = w^T A^T x = ρ(A) w^T x. This implies (λ − ρ(A)) w^T x = 0 and thus, because w^T x > 0, we obtain λ = ρ(A), which contradicts λ ≠ ρ(A). This completes the proof of the theorem. □

From corollary 6.5.4 we know that for A ≥ 0 there exists an eigenvector v ≥ 0 corresponding to the eigenvalue ρ(A). Under the stronger assumption A ≥ 0 and irreducible (cf. Appendix B) this vector must be strictly positive (as for the case A > 0). This and other related results for nonnegative irreducible matrices are due to Frobenius [37].

Theorem 6.5.6 (Frobenius) Let A ∈ ℝ^{n×n} be irreducible and A ≥ 0. Then the following holds:

$$\rho(A) > 0 \text{ is a simple eigenvalue of } A \tag{6.38a}$$
$$\text{There exists a vector } v > 0 \text{ such that } Av = \rho(A)v \tag{6.38b}$$
$$\text{No nonnegative eigenvector belongs to any other eigenvalue than } \rho(A) \tag{6.38c}$$

Proof. Given in, for example, theorem 4.8 in Fiedler [34]. □
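The Perron root and vector can be computed by the power method, which converges here because by theorem 6.5.5 ρ(A) is a simple, strictly dominant eigenvalue. A minimal sketch (not from the text):

```python
# Power method applied to a positive matrix: the iterates approach the Perron
# vector (strictly positive), and the quotient approaches the Perron root.
import numpy as np

rng = np.random.default_rng(1)
A = rng.uniform(0.1, 1.0, size=(6, 6))       # A > 0 entrywise

v = np.ones(6)
for _ in range(200):
    w = A @ v
    v = w / np.linalg.norm(w, np.inf)

rho = (A @ v)[0] / v[0]                      # quotient; any entry would do
print("Perron root  :", rho)
print("Perron vector:", v)                   # strictly positive, as predicted
print("check rho(A) :", max(abs(np.linalg.eigvals(A))))
```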

6.5.2 Regular matrix splittings

A class of special matrix splittings consists of so-called regular splittings. In this section we will discuss the corresponding matrix splitting methods. In particular we show that for a regular splitting the corresponding iterative method is convergent. We also show how basic iterative methods like the Jacobi or Gauss-Seidel method fit into this setting.

Definition 6.5.7 A matrix splitting A = M − N is called a regular splitting if

$$M \text{ is regular}, \qquad M^{-1} \ge 0 \qquad \text{and} \qquad M \ge A. \tag{6.39}$$

Recall that the iteration matrix of a matrix splitting method (based on the splitting A = M − N) is given by C = I − M^{-1}A = M^{-1}N.

Theorem 6.5.8 Assume that A^{-1} ≥ 0 holds and that A = M − N is a regular splitting. Then

$$\rho(C) = \rho(M^{-1}N) = \frac{\rho(A^{-1}N)}{1 + \rho(A^{-1}N)} < 1.$$

Corollary 6.5.9 Assume that A^{-1} ≥ 0 holds and that A = M₁ − N₁ = M₂ − N₂ are two regular splittings with N₁ ≤ N₂. Then

$$\rho(M_1^{-1}N_1) \le \rho(M_2^{-1}N_2) < 1.$$

Definition 6.5.10 A matrix A ∈ ℝ^{n×n} is called an M-matrix if A is nonsingular, a_ij ≤ 0 for all i ≠ j, and A^{-1} ≥ 0.

For an M-matrix A we obtain from A A^{-1} = I that 1 = a_ii (A^{-1})_{ii} + Σ_{j≠i} a_ij (A^{-1})_{ji} ≤ a_ii (A^{-1})_{ii}, since a_ij ≤ 0 and (A^{-1})_{ji} ≥ 0 for j ≠ i; thus a_ii (A^{-1})_{ii} > 0. Hence, in an M-matrix all diagonal entries are strictly positive. Another property that we will need further on is given in the following lemma.

Lemma 6.5.11 Let A be an M-matrix. Assume that the matrix B has the properties b_ij ≤ 0 for all i ≠ j and B ≥ A. Then B is an M-matrix, too. Furthermore, the inequalities 0 ≤ B^{-1} ≤ A^{-1} hold.

Proof. Let D_A := diag(A) and D_B := diag(B). Because A is an M-matrix we have that D_A is nonsingular and from B ≥ A it follows that D_B is nonsingular, too. Note that N_A := D_A − A ≥ 0. We conclude that A = D_A − N_A is a regular splitting and from theorem 6.5.8 it follows that ρ(C_A) < 1 with C_A := I − D_A^{-1}A. Furthermore, with C_B := I − D_B^{-1}B we have 0 ≤ C_B ≤ C_A and thus ρ(C_B) ≤ ρ(C_A) < 1 holds. Thus we have the representations

$$A^{-1} = \Big( \sum_{k=0}^\infty C_A^k \Big) D_A^{-1} \qquad \text{and} \qquad B^{-1} = \Big( \sum_{k=0}^\infty C_B^k \Big) D_B^{-1}.$$

From the latter and C_B ≥ 0, D_B^{-1} ≥ 0 we obtain B^{-1} ≥ 0 and we can conclude that B is an M-matrix. The inequality B^{-1} ≤ A^{-1} follows by using C_B ≤ C_A, D_B^{-1} ≤ D_A^{-1}. □

There is an extensive literature on properties of M-matrices, cf. [12], [34]. A few results are given in the following theorem.

Theorem 6.5.12 For A ∈ ℝ^{n×n} the following results hold:
(a) If A is irreducibly diagonally dominant and a_ii > 0 for all i, a_ij ≤ 0 for all i ≠ j, then A is an M-matrix.
(b) Assume that a_ij ≤ 0 for all i ≠ j. Then A is an M-matrix if and only if all eigenvalues of A have positive real part.
(c) Assume that a_ij ≤ 0 for all i ≠ j. Then A is an M-matrix if A + A^T is positive definite (this follows from (b)).
(d) If A is symmetric positive definite and a_ij ≤ 0 for all i ≠ j, then A is an M-matrix (this follows from (b)).
(e) If A is a symmetric M-matrix then A is symmetric positive definite (this follows from (b)).
(f) If A is an M-matrix and B results from A after a Gaussian elimination step without pivoting, then B is an M-matrix, too (i.e. Gaussian elimination without pivoting preserves the M-matrix property).

Proof. A proof can be found in [12]. □

We now show that for M-matrices the Jacobi and Gauss-Seidel methods correspond to regular splittings. Recall the decomposition A = D − L − U.

Theorem 6.5.13 Let A be an M-matrix. Then both M_J := D and M_GS := D − L result in regular splittings. Furthermore,

$$\rho\big( I - (D - L)^{-1}A \big) \le \rho\big( I - D^{-1}A \big) < 1 \tag{6.46}$$

holds.

Proof. In the proof of lemma 6.5.11 it is shown that the method of Jacobi corresponds to a regular splitting. For the Gauss-Seidel method note that M_GS = D − L has only nonpositive off-diagonal entries and M_GS − A = U ≥ 0. From lemma 6.5.11 it follows that M_GS is an M-matrix, hence M_GS^{-1} ≥ 0 holds. Thus the Gauss-Seidel method corresponds to a regular splitting, too. Now note that N_GS := U ≤ N_J := L + U holds and thus corollary 6.5.9 yields the result in (6.46). □

This result shows that for an M-matrix both the Jacobi and Gauss-Seidel method are convergent. Moreover, the asymptotic convergence rate of the Gauss-Seidel method is at least as high as for the Jacobi method. If A is the result of the discretization of an elliptic boundary value problem then often the arithmetic costs per iteration are comparable for both methods. In such cases the Gauss-Seidel method is usually more efficient than the method of Jacobi.

The SOR method corresponds to a splitting A = M_ω − N with M_ω = (1/ω)D − L. If A is an M-matrix then for ω > 1 the matrix M_ω − A has strictly negative diagonal entries and thus this is not a regular splitting. For ω ∈ (0, 1] one can apply the same arguments as in the proof of theorem 6.5.13 to show that for an M-matrix A the SOR method corresponds to a regular splitting, and

$$\rho\big( I - M_{\omega_1}^{-1}A \big) \le \rho\big( I - M_{\omega_2}^{-1}A \big) < 1 \quad \text{for all } 0 < \omega_2 \le \omega_1 \le 1$$

holds.

6.6 Application to scalar elliptic problems

In this section we apply basic iterative methods to discrete scalar elliptic model problems. We recall the weak formulation of the Poisson equation and the convection-diffusion problem:

$$\text{find } u \in H^1_0(\Omega) \text{ such that } \int_\Omega \nabla u \cdot \nabla v \, dx = \int_\Omega f v \, dx \quad \text{for all } v \in H^1_0(\Omega),$$

$$\text{find } u \in H^1_0(\Omega) \text{ such that } \varepsilon \int_\Omega \nabla u \cdot \nabla v \, dx + \int_\Omega b \cdot \nabla u \, v \, dx = \int_\Omega f v \, dx \quad \text{for all } v \in H^1_0(\Omega),$$

with ε > 0 and b = (b₁, b₂) with given constants b₁ ≥ 0, b₂ ≥ 0. We take Ω = (0, 1)². We use nested uniform triangulations with mesh size parameter h = (1/20)·2^{-k}, k = 1, 2, 3, 4. These problems are discretized using the finite element method with piecewise linear finite elements. For the convection-diffusion problem, we use the streamline-diffusion stabilization technique (for

the convection-dominated case). The resulting discrete problems are denoted by (P) (Poisson problem) and (CD) (convection-diffusion problem).

Example 6.6.1 (Model problem (P)) For the Poisson equation we obtain a stiffness matrix A that is symmetric positive definite and for which κ(D^{-1}A) = O(h^{-2}) holds. In Table 6.1 we show the results for the method of Jacobi applied to this problem with different values of h. For the starting vector we take x⁰ = 0. We use the Euclidean norm ‖·‖₂. By # we denote the number of iterations needed to reduce the norm of the starting error by a factor R = 10³. We observe that when we halve the mesh size h we need approximately four times as many iterations. This is in agreement with κ(D^{-1}A) = O(h^{-2}) and the result in theorem 6.3.3. □

    h    1/40    1/80    1/160    1/320
    #    2092    8345    33332    133227

    Table 6.1: Method of Jacobi applied to problem (P).
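The experiment behind Table 6.1 is easy to imitate. The sketch below (not from the text; it uses a slightly different error measure, so the counts need not match the table exactly) exploits the fact that for piecewise linear elements on a uniform triangulation of the unit square the stiffness matrix coincides with the familiar 5-point stencil:

```python
# Reproduce the flavour of Table 6.1: Jacobi on the 5-point Poisson matrix.
import numpy as np
import scipy.sparse as sp

def jacobi_count(h, R=1e3):
    m = round(1.0 / h) - 1                       # interior grid points per row
    I = sp.identity(m)
    T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(m, m))
    A = (sp.kron(I, T) + sp.kron(T, I)).tocsr()  # 5-point Poisson matrix
    Dinv = 1.0 / A.diagonal()
    x_exact = np.ones(m * m); b = A @ x_exact
    x = np.zeros(m * m)                          # starting error = x_exact
    e0 = np.linalg.norm(x - x_exact)
    k = 0
    while np.linalg.norm(x - x_exact) > e0 / R:
        x = x - Dinv * (A @ x - b)               # Jacobi step (6.13)
        k += 1
    return k

for h in (1/40, 1/80):                           # the finer levels take long
    print(f"h = {h}: {jacobi_count(h)} Jacobi iterations")
```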


We take a reduction factor R = 10³ and consider model problem (P). Then the complexity of the method of Jacobi is cn² flops (c depends on R but is independent of n). For model problem (P) there are methods that have complexity cn^α with α < 2. In particular α = 3/2 for the SOR method, α = 5/4 for the preconditioned Conjugate Gradient method (chapter 7) and α = 1 for the multigrid method (chapter 9). It is clear that if n is large a reduction of the exponent α will result in a significant gain in efficiency; for example, for h = 1/320 we have n ≈ h^{-2} ≈ 10⁵ and n² ≈ 10^{10}. Also note that α = 1 is a lower bound, because for one matrix-vector multiplication Ax we already need cn flops.

Example 6.6.2 (Model problem (P)) In Table 6.2 we show results for the situation as described in example 6.6.1 but now for the Gauss-Seidel method instead of the method of Jacobi. For this model problem with R = 10³ the Gauss-Seidel method has a complexity cn², which is of the same order of magnitude as for the method of Jacobi. □

    h    1/40    1/80    1/160    1/320
    #    1056    4193    16706    66694

    Table 6.2: Gauss-Seidel method applied to problem (P).

Example 6.6.3 (Model problem (CD)) It is important to note that in the Gauss-Seidel method the results depend on the ordering of the unknowns, whereas in the method of Jacobi the resulting iterates are independent of the ordering. We consider model problem (CD) with b₁ = cos(π/6), b₂ = sin(π/6). We take R = 10³ and h = 1/160. Using an ordering of the grid points (and corresponding unknowns) from left to right in the domain (0, 1)² we obtain the results as in Table 6.3. When we use the reversed node ordering then we get the results shown in Table 6.4. These results illustrate a rather general phenomenon: if a problem is convection-dominated then for the Gauss-Seidel method it is advantageous to use a node ordering corresponding (as much as possible) to the direction in which information is transported. □

    ε    10⁰      10⁻²    10⁻⁴
    #    17197    856     14

    Table 6.3: Gauss-Seidel method applied to problem (CD), left-to-right ordering.

    ε    10⁰      10⁻²    10⁻⁴
    #    17220    1115    285

    Table 6.4: Gauss-Seidel method applied to problem (CD), reversed ordering.
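The ordering effect is already visible in a one-dimensional analogue. The following sketch is illustrative only (the discretization and parameters are chosen ad hoc, not taken from the text): it compares a forward and a backward Gauss-Seidel sweep for −εu'' + u' = 1 with upwind differences, so that the forward sweep follows the flow direction.

```python
# Sweep ordering for Gauss-Seidel on a convection-dominated 1D model problem.
import numpy as np

def gs_count(eps, forward, n=160, R=1e3):
    h = 1.0 / n
    # upwind FD: (eps/h^2)(-u_{i-1}+2u_i-u_{i+1}) + (1/h)(u_i-u_{i-1}) = 1
    lo, di, up = -eps / h**2 - 1.0 / h, 2 * eps / h**2 + 1.0 / h, -eps / h**2
    A = (np.diag(np.full(n - 1, di)) + np.diag(np.full(n - 2, lo), -1)
         + np.diag(np.full(n - 2, up), 1))
    b = np.ones(n - 1)
    x_exact = np.linalg.solve(A, b)
    x = np.zeros(n - 1); e0 = np.linalg.norm(x_exact)
    order = range(n - 1) if forward else range(n - 2, -1, -1)
    for k in range(1, 100_000):
        for i in order:                          # one Gauss-Seidel sweep
            x[i] = (b[i] - A[i, :] @ x + A[i, i] * x[i]) / A[i, i]
        if np.linalg.norm(x - x_exact) <= e0 / R:
            return k
    return None

for eps in (1e-2, 1e-4):
    print(eps, "forward:", gs_count(eps, True),
          "backward:", gs_count(eps, False))
```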

Example 6.6.4 We consider the model problem (P) as in example 6.6.1, with h = 1/160. In Figure 6.2 for different values of the parameter ω we show the corresponding number of SOR iterations (#) needed for an error reduction with a factor R = 10³. The same experiment is performed for the model problem (CD) as in example 6.6.3 with h = 1/160, ε = 10^{-2}. The results are shown in Figure 6.3. Note that with a suitable value for ω an enormous reduction in the number of iterations needed can be achieved. Also note the rapid change in the number of iterations (#) close to the optimal ω value. □

Figure 6.2: SOR method applied to model problem (P): number of iterations # (logarithmic scale) versus ω ∈ [1, 2].

Figure 6.3: SOR method applied to model problem (CD): number of iterations # (logarithmic scale) versus ω ∈ [0.9, 1.9].

Chapter 7

Preconditioned Conjugate Gradient method

7.1 Introduction

In this chapter we discuss the Conjugate Gradient method (CG) for the iterative solution of sparse linear systems with a symmetric positive definite matrix. In section 7.2 we introduce and analyze the CG method. This method is based on the formulation of the discrete problem as a minimization problem. The CG method is nonlinear and of a different type than the basic iterative methods discussed in chapter 6. The CG method is not suitable for solving strongly nonsymmetric problems, as for example a discretized convection-diffusion problem with a dominating convection. Many variants of CG have been developed which are applicable to linear systems with a nonsymmetric matrix. A few of these methods are treated in chapter 8. In the CG method and in the variants for nonsymmetric problems the resulting iterates are contained in a so-called Krylov subspace, which explains the terminology "Krylov subspace methods". A detailed treatment of these Krylov subspace methods is given in Saad [78]. An important concept related to all these Krylov subspace methods is the so-called preconditioning technique. This will be explained in section 7.3.

7.2 Conjugate Gradient method

In section 2.4 it is shown that to a variational problem with a symmetric elliptic bilinear form there corresponds a canonical minimization problem. Similarly, to a linear system with a symmetric positive definite matrix there corresponds a natural minimization problem. We consider a system of equations

$$A x = b \tag{7.1}$$

with A ∈ ℝ^{n×n} symmetric positive definite. The unique solution of this problem is denoted by x*. In this chapter we use the notation

$$\langle y, x \rangle = y^T x, \qquad \langle y, x \rangle_A = y^T A x, \qquad \|x\|_A = \langle x, x \rangle_A^{1/2} \qquad (x, y \in \mathbb{R}^n). \tag{7.2}$$

Since A is symmetric positive definite the bilinear form ⟨·,·⟩_A defines an inner product on ℝ^n. This inner product is called the A-inner product or energy inner product. We define the functional F : ℝ^n → ℝ by

$$F(x) = \frac{1}{2} \langle x, Ax \rangle - \langle x, b \rangle. \tag{7.3}$$

For this F we have

$$DF(x) = \nabla F(x) = Ax - b \qquad \text{and} \qquad D^2 F(x) = A.$$

So F is a quadratic functional with a second derivative (Hessian) which is positive definite. Hence F has a unique minimizer and the gradient of F is equal to zero at this minimizer. Thus we obtain:

$$\min\{ F(x) \mid x \in \mathbb{R}^n \} = F(x^*), \tag{7.4}$$

i.e. minimization of the functional F yields the unique solution x* of the system in (7.1). This result is an analogue of the one discussed for symmetric bilinear forms in section 2.4. In this section we consider two methods that are based on (7.4) and in which first certain search directions are determined and then a line search is applied. Such methods are of the following form:

$$x^0 \text{ a given starting vector}, \qquad x^{k+1} = x^k + \alpha_{opt}(x^k, p^k) \, p^k, \quad k \ge 0. \tag{7.5}$$

In (7.5), p^k ≠ 0 is the search direction at x^k and α_opt(x^k, p^k) is the optimal steplength at x^k in the direction p^k. The vector Ax^k − b = ∇F(x^k) is called the residual (at x^k) and denoted by

$$r^k := A x^k - b. \tag{7.6}$$

From the definition of F we obtain the identity

$$F(x^k + \alpha p^k) = F(x^k) + \alpha \langle p^k, Ax^k - b \rangle + \frac{1}{2} \alpha^2 \langle p^k, A p^k \rangle =: \psi(\alpha).$$

The function ψ : ℝ → ℝ is quadratic and ψ''(α) > 0 holds. So ψ has a unique minimum at α_opt iff ψ'(α_opt) = 0. This results in the following formula for α_opt:

$$\alpha_{opt}(x^k, p^k) := -\frac{\langle p^k, r^k \rangle}{\langle p^k, A p^k \rangle}. \tag{7.7}$$

For the residuals we have the recursion

$$r^{k+1} = r^k + \alpha_{opt}(x^k, p^k) \, A p^k, \qquad k \ge 0. \tag{7.8}$$
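In code, the generic method (7.5)–(7.8) may be sketched as follows (illustrative, not from the text; the function direction is a placeholder for the choice of search direction, which is discussed next):

```python
# Generic search-direction method with optimal steplength, cf. (7.5)-(7.8).
import numpy as np

def line_search_method(A, b, direction, x0, maxit=1000, tol=1e-10):
    x = x0.copy()
    r = A @ x - b                             # r^k = A x^k - b, (7.6)
    for _ in range(maxit):
        if np.linalg.norm(r) <= tol:
            break
        p = direction(r)                      # search direction p^k
        Ap = A @ p
        alpha = -(p @ r) / (p @ Ap)           # alpha_opt, (7.7)
        x = x + alpha * p                     # (7.5)
        r = r + alpha * Ap                    # (7.8)
    return x

A = np.array([[2.0, 0.5], [0.5, 1.0]]); b = np.array([1.0, 1.0])
x = line_search_method(A, b, direction=lambda r: r, x0=np.zeros(2))
print(x, A @ x - b)   # direction = residual gives Steepest Descent (below)
```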

For ψ'(0), i.e. the derivative of F at x^k in the direction p^k, we have ψ'(0) = ⟨p^k, r^k⟩. The direction p^k with ‖p^k‖₂ = 1 for which the modulus of this derivative is maximal is given by p^k = r^k/‖r^k‖₂. This follows from |ψ'(0)| = |⟨p^k, r^k⟩| ≤ ‖p^k‖₂ ‖r^k‖₂, in which we have equality only if p^k = γ r^k (γ ∈ ℝ). The sign and length of p^k are irrelevant because the "right sign" and the "optimal length" are determined by the steplength parameter α_opt. With the choice p^k = r^k we obtain the Steepest Descent method:

$$x^0 \text{ a given starting vector}, \qquad x^{k+1} = x^k - \frac{\langle r^k, r^k \rangle}{\langle r^k, A r^k \rangle} \, r^k.$$

In general the Steepest Descent method converges only slowly. The reason for this is already clear from a simple example with n = 2. We take

$$A = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}, \qquad 0 < \lambda_1 < \lambda_2, \qquad b = (0, 0)^T \quad (\text{hence, } x^* = (0, 0)^T).$$

Figure 7.1: Steepest Descent method (zig-zagging iterates x⁰, x¹, x², x³ on stretched elliptic level lines).

The function F(x₁, x₂) = ½⟨x, Ax⟩ − ⟨x, b⟩ = ½(λ₁x₁² + λ₂x₂²) has level lines N_c = {(x₁, x₂) ∈ ℝ² | F(x₁, x₂) = c} which are ellipsoids. Assume that λ₂ ≫ λ₁ holds (so κ(A) ≫ 1). Then the ellipsoids are stretched in the x₁-direction as is shown in Figure 7.1, and convergence is very slow.

We now introduce the CG method along similar lines as in Hackbusch [48]. To be able to formulate the weakness of the Steepest Descent method we introduce the following notion of optimality. Let V be a subspace of ℝ^n.

$$y \text{ is called optimal for the subspace } V \text{ if } F(y) = \min_{z \in V} F(y + z). \tag{7.9}$$

So y is optimal for V if on the hyperplane y + V the functional F is minimal at y. Assume a given y and subspace V. Let d₁, ..., d_s be a basis of V and for c ∈ ℝ^s define g(c) = F(y + Σ_{i=1}^s c_i d_i). Then y is optimal for V iff ∇g(0) = 0 holds. Note that

$$\frac{\partial g}{\partial c_i}(0) = \langle \nabla F(y), d_i \rangle = \langle Ay - b, d_i \rangle.$$

Hence we obtain the following:

$$y \text{ optimal for } V \;\Leftrightarrow\; \langle d_i, Ay - b \rangle = 0 \quad \text{for } i = 1, \ldots, s. \tag{7.10}$$

In the Steepest Descent method we have pk = rk . From (7.7) and (7.8) we obtain hrk , rk+1 i = hrk , rk i −

hrk , rk i hrk , Ark i = 0 . hrk , Ark i

Using (7.10) we conclude that in the Steepest Descent method xk+1 is optimal for the subspace span{pk }. This is also clear from Fig. 7.1 : for example, x3 is optimal for the subspace spanned by the search direction x3 − x2 . From Fig. 7.1 it is also clear that xk is not optimal for the subspace spanned by all previous search directions. For example x3 can be improved in the search direction p1 = γ(x2 − x1 ): for α = αopt (x3 , p1 ) we have F (x4 ) = F (x3 + αp1 ) < F (x3 ). Now consider a start with p0 = r0 and thus x1 = x0 + αopt (x0 , p0 )p0 (as in Steepest Descent). We assume that the second search direction p1 is chosen such that hp1 , Ap0 i = 0 holds. Due to 153

the fact that A is symmetric positive definite we have that p1 and p0 are independent. Define x2 = x1 + αopt (x1 , p1 )p1 . Note that now hp0 , b − Ax2 i = 0 and also hp1 , b − Ax2 i = 0 and thus (cf. 7.10) x2 is optimal for span{p0 , p1 }. For the special case n = 2 as in the example shown in figure 7.1 we have span{p0 , p1 } = R2 . Hence x2 is optimal for R2 which implies x2 = x∗ ! This is illustrated in Fig. 7.2. We have constructed search directions p0 , p1 and an iterand x2 such that x2 is optimal for the x0 • x1 • •

x2

Figure 7.2: Conjugate Gradient method two-dimensional subspace span{p0 , p1 }. This leads to the basic idea behind the Conjugate Gradient (CG) method: we shall use search directions such that xk is optimal for the k-dimensional subspace span{p0 , p1 , . . . , pk−1 }. In the Steepest Descent method the iterand xk is optimal for the one-dimensional subspace span{pk−1 }. This difference results in much faster convergence of the CG iterands as compared to the iterands in the Steepest Descent method. We will now show how to construct appropriate search directions such that this optimality property holds. Moreover, we derive a method for the construction of these search directions with low computational costs. As in the Steepest Descent method, we start with p0 = r0 and x1 as in (7.5). Recall that x1 is optimal for span{p0 }. Assume that for a given k with 1 ≤ k < n, linearly independent search directions p0 , ..., pk−1 are given such that xk as in (7.5) is optimal for span{p0 , ..., pk−1 }. We introduce the notation Vk = span{p0 , ..., pk−1 }

and assume that xk 6= x∗ , i.e., rk 6= 0 (if xk = x∗ we do not need a new search direction). We will show how pk can be taken such that xk+1 , defined as in (7.5), is optimal for span{p0 , p1 , ..., pk } =: Vk+1 . We choose pk such that pk ⊥A Vk ,

i.e. pk ∈ Vk⊥A

(7.11)

holds. This A-orthogonality condition does not determine a unique search direction pk . The Steepest Descent method above was based on the observation that rk = ∇F (xk ) is the direction of steepest descent at xk . Therefore we use this direction to determine the new search direction. A unique new search direction pk is given by the following: pk ∈ Vk⊥A such that kpk − rk kA = min kp − rk kA ⊥A

p∈Vk

154

(7.12)

The definition of p1 is illustrated in Fig. 7.3.

V1 6     

1 3r        ⊥A   V 1    p1  

V1 = span{p0 }.

Figure 7.3: Definition of a search direction in CG Note that pk is the A-orthogonal projection of rk on Vk⊥A . This yields the following formula for the search direction pk : pk = rk −

k−1 k−1 X X hpj , rk iA j hpj , Ark i j k p . p = r − hpj , pj iA hpj , Apj i j=0 j=0

(7.13)

We assumed that xk is optimal for Vk and that rk 6= 0. From the former we get that hpj , rk i = 0 for j = 0, . . . , k − 1, i.e., rk ⊥ Vk (note that here we have ⊥ and not ⊥A ). Using rk 6= 0 we conclude that rk ∈ / Vk and thus from (7.13) it follows that pk ∈ / Vk . Hence, pk is linearly 0 k−1 independent of p , . . . , p and Vk+1 = span{p0 , . . . , pk }

has dimension k + 1.

(7.14)

Given this new search direction the new iterand is defined by xk+1 = xk + αopt (xk , pk )pk

(7.15)

with αopt as in (7.7). Using the definition of αopt we obtain hpk , b − Axk+1 i = −hpk , rk i + αopt (xk , pk )hpk , Apk i = 0. Due to (7.11) and the optimality of xk for the subspace Vk (cf. also (7.10)) we have for j < k hpj , b − Axk+1 i = hpj , b − Axk i − αopt (xk , pk )hpj , Apk i = 0. Using (7.10) we conclude that xk+1 is optimal for the subspace Vk+1 ! The search directions pk defined as in (7.13) (p0 := r0 ) and the iterands as in (7.15) define the Conjugate Gradient method. This method is introduced in Hestenes and Stiefel [51]. We now derive some important properties of the CG method. Theorem 7.2.1 Let x0 ∈ Rn be given and m < n be such that for k = 0, 1, . . . , m we have xk 6= x∗ and pk , xk+1 as in (7.13), (7.15). Define Vk = span{p0 , . . . , pk−1 } (0 ≤ k ≤ m + 1). 155

Then the following holds for all k = 1, . . . , m + 1: dim(Vk ) = k k

(7.16a)

0

x ∈ x + Vk

(7.16b)

k

0

F (x ) = min{F (x) | x ∈ x + Vk } 0

k−1

Vk = span{r , ..., r j

k

j

k

hp , r i = 0, hr , r i = 0, k

k

0

(7.16c) 0

} = span{r , Ar , ..., A

for all j = 0, 1, ..., k − 1

for all j = 0, 1, ..., k − 1 k−1

p ∈ span{r , p

}

(for k ≤ m)

k−1 0

r }

(7.16d) (7.16e) (7.16f) (7.16g)

Proof. The result in (7.16a) is shown in the derivation of the method, cf. (7.14). The result in (7.16b) can be shown by induction using xk = xk−1 + αopt (xk−1 , pk−1 )pk−1 . The construction of the search directions and new iterands in the CG method is such that xk is optimal for Vk , i.e., F (xk ) = min{F (xk + w) | w ∈ Vk }. Using xk ∈ x0 + Vk this can be rewritten as F (xk ) = min{F (x0 + w) | w ∈ Vk } which proves the result in (7.16c). We introduce the notation Rk = span{r0 , . . . , rk−1 } and prove Vk ⊂ Rk by induction. For k = 1 this holds due to p0 = r0 . Assume that it holds for some k ≤ m. Since Vk+1 = span{Vk , pk } and Vk ⊂ Rk ⊂ Rk+1 , we only have to show pk ∈ Rk+1 . From (7.13) it follows that pk ∈ span{p0 , . . . , pk−1 , rk } = span{Vk , rk } ⊂ Rk+1 , which completes the induction argument. Using dim(Vk ) = k it follows that that Vk = Rk must hold. Hence the first equality in (7.16d) is proved. We introduce the notation Wk = span{r0 , Ar0 , ..., Ak−1 r0 } and prove Rk ⊂ Wk by induction. For k = 1 this is trivial. Assume that for some k ≤ m, Rk ⊂ Wk holds. Due to Rk+1 = span{Rk , rk } and Rk ⊂ Wk ⊂ Wk+1 we only have to show rk ∈ Wk+1 . Note that rk = rk−1 + αopt (xk−1 , pk−1 )Apk−1 and rk−1 ∈ Rk ⊂ Wk ⊂ Wk+1 , Apk−1 ∈ AVk = ARk ⊂ AWk ⊂ Wk+1 . Thus rk ∈ Wk+1 holds, which completes the induction. Due to dim(Rk ) = k it follows that Rk = Wk must hold. Hence the second equality in (7.16d) is proved. The search directions and iterands are such that xk is optimal for Vk = span{p0 , . . . , pk−1 }. From (7.10) we get hpj , rk i = 0 for j = 0, . . . , k − 1 and thus (7.16e) holds. Due to Vk = span{r0 , ..., rk−1 } this immediately yields (7.16f), too. To prove (7.16g) we use the formula (7.13). Note that rj+1 = rj + αopt (xj , pj )Apj and thus Apj ∈ span{rj+1 , rj }. From this and (7.16f) it follows that for j ≤ k − 1 we have hpj , rk iA = hApj , rk i = 0. Thus in the sum in (7.13) all terms with j ≤ k − 2 are zero. The result in (7.16g) is very important for an efficient implementation of the CG method. Combining this result with the formula given in (7.13) we immediately obtain that in the summation in (7.13) there is only one nonzero term, i.e. for pk we have the formula pk = rk −

hpk−1 , Ark i pk−1 . hpk−1 , Apk−1 i

(7.17)

From (7.17) we see that we have a simple and cheap two term recursion for the search directions in the CG method. Combination of (7.5),(7.7),(7.8) and (7.17) results in the following CG 156

algorithm:  0 x a given starting vector; r0 = Ax0 − b        for k ≥ 0 (if rk 6= 0) :     k−1 k i k−1 pk = rk − hphrk−1,Ap ( if k = 0 then p0 := r0 ) k−1 p ,Ap i    k k   k+1 = xk + α k , pk )pk k , pk ) = − hp ,r i  x (x with α (x opt opt  k hp ,Apk i     k+1 r = rk + αopt (xk , pk )Apk

(7.18)

Some manipulations result in the following alternative formulas for pk and αopt : hrk , rk i pk−1 , hrk−1 , rk−1 i hrk , rk i . αopt (xk , pk ) = hpk , Apk i pk = −rk +

(7.19)

Using these formulas in (7.18) results in a slightly more efficient algorithm. The subspace Vk = span{r0 , Ar0 , ..., Ak−1 r0 } in (7.16d) is called the Krylov subspace of dimension k corresponding to r0 , denoted by Kk (A; r0 ). The CG method is of a different type as the basic iterative methods discussed in chapter 6. One important difference is that the CG method is nonlinear. The error propagation xk+1 − x∗ = Ψ(xk − x∗ ) is determined by a nonlinear function Ψ and thus there does not exist an error iteration matrix (as in the case of basic iterative methods) which determines the convergence behaviour. Related to this, in the CG method we often observe a phenomenon called superlinear convergence. This type of convergence behaviour is illustrated in Example 7.2.3. For a detailed analysis of this phenomenon we refer to Van der Sluis and Van der Vorst [90]. Another difference between CG and basic iterative methods is that the CG method yields the exact solution x∗ in at most n iterations. This follows from the property in (7.16c). However, in practice this will not occur due the effect of rounding errors. Moreover, in practical applications n is usually very large and for efficiency reasons one does not want to apply n CG iterations. We now discuss the arithmetic costs per iteration and the rate of convergence of the CG method. If we use the CG algorithm with the formulas in (7.19) then in one iteration we have to compute one matrix-vector multiplication, two inner products and a few vector updates, i.e. (if A is a sparse matrix) we need cn flops. The costs per iteration are of the same order of magnitude as for the Jacobi, Gauss-Seidel and SOR method. With respect to the rate of convergence of the CG method we formulate the following theorem. Theorem 7.2.2 Define Pk∗ := { p ∈ Pk | p(0) = 1 }. Let xk , k ≥ 0 be the iterands of the CG method and ek = xk − x∗ . The following holds: kek kA = min∗ kpk (A)e0 kA

(7.20)

pk ∈Pk

≤ min∗ max |pk (λ)| ke0 kA pk ∈Pk λ∈σ(A)

≤2

p

p

κ(A) − 1

κ(A) + 1 157

!k

ke0 kA

(7.21) (7.22)

Proof. From (7.16b) we get ek ∈ e0 + Vk . And due to Vk = span{r0 , . . . , rk−1 } and (7.16f) we have Aek = rk ⊥ Vk and thus ek ⊥A Vk . This implies kek kA = minvk ∈Vk ke0 − vk kA . Note that vk ∈ Vk can be represented as vk =

k−1 X

ξj Aj r0 =

k−1 X

ξj Aj+1 e0

j=0

j=0

Hence, k

0

ke kA = min ke − ξ∈Rk

k−1 X j=0

ξj Aj+1 e0 kA = min∗ kpk (A)e0 kA pk ∈Pk

This proves the result in (7.20). The result in (7.21) follows from kpk (A)e0 kA ≤ kpk (A)kA ke0 kA = max |pk (λ)|ke0 kA λ∈σ(A)

Let I = [λmin , λmax ] with λmin and λmax the extreme eigenvalues of A. From the results above we have kek kA ≤ min∗ max |pk (λ)|ke0 kA pk ∈Pk λ∈I

The min-max quantity in this upper bound can be analyzed using Chebychev polynomials, defined by T0 (x) = 1, T1 (x) = x, Tm+1 (x) = 2xTm (x) − Tm−1 (x) for m ≥ 1. These polynomials have the representation p p k −k i 1h x + x2 − 1 + x + x2 − 1 Tk (x) = (7.23) 2 and for any interval [a, b] with b < 1 they have the following property max |pk (x)| = 1/Tk

min

pk ∈Pk ,pk (1)=1 x∈[a,b]

We introduce qk (x) = pk (1 − x) and then get min max |pk (λ)| = min∗

pk ∈Pk∗ λ∈I

pk ∈Pk x∈[1−λmax ,1−λmin ]

=

min

|pk (1 − x)|

max

qk ∈Pk ,qk (1)=1 x∈[1−λmax ,1−λmin ]

= 1/Tk Using the representation (7.23) we get Tk

max

2 − a − b b−a

κ(A) + 1  1  κ(A) + 1 + ≥ κ(A) − 1 2 κ(A) − 1

|qk (x)|

λmax + λmin  κ(A) + 1  = 1/Tk λmax − λmin κ(A) − 1

s

k 1  pκ(A) + 1 k κ(A) + 1 2 p −1 = κ(A) − 1 2 κ(A) − 1

which then yields the bound in (7.22). So if we measure errors using the A-norm and neglect thepfactor 2 in (7.22), it follows that p on average per iteration the error is reduced by a factor ( κ(A) − 1)/( κ(A) + 1). In this bound one can observe a clear relation between κ(A) and the rate of convergence of the CG method: a larger condition number results in a lower rate of convergence. For κ(A) ≫ 1 the p reduction factor is of the form 1 − 2/ κ(A), which is significantly better than the bounds for 158

the contraction numbers of the Richardson and (damped) Jacobi methods which are of the form 1 − c/κ(A). For the case κ(A) ∼ ch−2 the latter takes the form 1 − c˜h2 , whereas for CG we have an (average) reduction factor 1 − c˜h. Often the bound in (7.22) is rather pessimistic because the phenomenon of superlinear convergence is not expressed in this bound. For a further theoretical analysis of the CG method we refer to Axelsson and Barker [7], Golub and Van Loan [41] and Hackbusch [48]. Example 7.2.3 (Poisson model problem) We apply the CG method to the discrete Poisson equation from section 6.6. First we discuss the complexity of the CG method for this model problem. In this case we have κ(A) ≈ ch−2 . Using (7.22) it follows that (in the A-norm) the error is reduced with approximately a factor p κ(A) − 1 ≈ 1 − 2c−1 h (7.24) ρˆ := p κ(A) + 1

per iteration. The arithmetic costs are cˆn flops per iteration. So for a reduction of the error with 1 a factor R we need approximately − ln R/ ln(1 − 2c−1 h)ˆ cn ≈ c˜h−1 n ≈ c˜n1 2 flops. We conlude that the complexity is of the same order of magnitude as for the SOR method with the optimal value for the relaxation parameter. However, note that, opposite to the SOR method, in the CG method we do not have the problem of chosing a suitable parameter value. In Table 7.1 we show results which can be compared with the results in section 6.6. We use the Euclidean norm and # denotes the number of iterations needed to reduce the starting error with a factor R = 103 . h #

1/40 65

1/80 130

1/160 262

1/320 525

Table 7.1: CG method applied to Poisson equation.

In figure 7.4 we illustrate the phenomenon of superlinear convergence in the CG method. For the case h = 1/160 we show the actual error reduction in the A-norm, i.e. kxk − x∗ kA kxk−1 − x∗ kA p p in the first 250 iterations. The factor ( κ(A)−1)/( κ(A)+1) has the value ρˆ = 0.96 (horizontal line in figure 7.4). There is a clear decreasing tendency of ρˆk during the iteration process. For large values of k, ρˆk is significantly smaller than ρˆ. Finally, we note that an irregular convergence behaviour as in figure 7.4 is typical for the CG method.  ρˆk :=

7.3

Introduction to preconditioning

In this section we consider the general concept of preconditioning and discuss a few preconditioning techniques. Consider a (sparse) system Ax = b, A ∈ Rn×n (not necessarily symmetric positive definite), for which an approximation W ≈ A is available with the following properties: Wx = y can be solved with ”low” computational costs (cn flops). κ(W

−1

A) < κ(A).

(7.25a) (7.25b)

159

1

0.95

0.9

0.85

0.8

0.75

0.7

0.65

0

50

100

150

200

250

Figure 7.4: Error reduction of CG applied to Poisson problem .

An approximation W with these properties is called a preconditioner for A. In (7.25a) it is implicitly assumed that the matrix W does not contain many more nonzero entries than the matrix A, i.e. W is a sparse matrix, too. In the sections below three popular techniques for constructing preconditioners will be explained. In section 7.7 results of numerical experiments are given which show that using an appropriate preconditioner one can improve the efficiency of an iterative method significantly. The combination of a given iterative method (e.g. CG) with a preconditioner results in a so-called preconditioned iterative method (e.g. PCG in section 7.7). As an introductory example, to explain the basic idea of preconditioned iterative methods, we assume that both A and W are symmetric positive definite and show how the basic Richardson iterative method (which is not used in practice) can be combined with a preconditioner W. We consider the Richardson method with parameter value ω = 1/ρ(A), i.e.: xk+1 = xk − ω(Axk − b)

with ω :=

1 ρ(A)

(7.26)

For the iteration matrix C of this method we have ρ(C) = ρ(I − θA) = max{|1 − λ/λmax (A)| | λ ∈ σ(A)} 1 min (A) = 1 − λλmax (A) = 1 − κ(A)

(7.27)

When we apply the same method to the preconditioned system ˆ =b ˆ , Ax

ˆ := W−1 A , A

ˆ := W−1 b, b

we obtain ˆ = xk − ω ˆ k − b) ˆ xk+1 = xk − ω ˆ (Ax ˆ W−1 (Axk − b) with ω ˆ = 1/ρ(A). 160

(7.28)

This method is called the preconditioned Richardson method. Note that if we assume that ˆ is known, then we do not need the preconditioned matrix A ˆ in this (an estimate of) ρ(A) k k −1 method. In (7.28) we have to compute z := W (Ax − b), i.e., Wz = Ax − b. Due to the condition in (7.25a) z can be computed with acceptable arithmetic costs. For the spectral radius ˆ of the preconditioned method we obtain, using σ(A) ˆ = σ(W−1 A) = of the iteration matrix C − 12 − 12 σ(W AW ) ⊂ (0, ∞), ˆ = ρ(I − ω ˆ = max{|1 − λ/λmax (A)| ˆ | λ ∈ σ(A)} ˆ ρ(C) ˆ A) =1−

ˆ λmin (A) ˆ λmax (A)

=1−

1 ˆ κ(A)

.

(7.29)

ˆ ≪ ρ(C) From (7.27) and (7.29) we conclude that if κ(W−1 A) ≪ κ(A) (cf. (7.25b)), then ρ(C) and the convergence of the preconditioned method will be much faster than for the original one. Note that for W = diag(A) the preconditioned Richardson method coincides with the damped Jacobi method.

7.4

Preconditioning based on a linear iterative method

In this section we explain how a preconditioner W can be obtained from a given (basic) linear iterative method. Recall the general form of a linear iterative method: xk+1 = xk − M−1 (Axk − b).

(7.30)

If one uses this iterative method for preconditioning then W := M is taken as the preconditioner for A. If the method (7.30) converges then W is a reasonable approximation for A in the sense that ρ(I − W−1 A) < 1. The iteration in (7.30) corresponds to an iterative method and thus M−1 y (y ∈ Rn ) can be computed with acceptable arithmetic costs. Hence the condition in (7.25a), with W = M, is satisfied. Related to the implementation of such a preconditioner we note the following. In an iterative method the matrix M is usually not used in its implementation (cf. Gauss-Seidel or SOR), i.e. the iteration (7.30) can be implemented without explicitly computing M. The solution of Wx = y, i.e. of Mx = y, is the result of (7.30) with k = 0, x0 = 0, b = y. From this it follows that the computation of the solution of Wx = y can be implemented by performing one iteration of the iterative method applied to Az = y with starting vector 0. A bound for κ(W−1 A) (cf. (7.25b)) is presented in the following lemma. Lemma 7.4.1 We assume that A and M are symmetric positive definite matrices and that the method in (7.30) is convergent, i.e ρ(I − M−1 A) < 1. Then the following holds: κ(M−1 A) ≤

1 + ρ(C) 1 − ρ(C)

(7.31)

Proof. Because A and M are symmetric positive definite it follows that 1

1

σ(M−1 A) = σ(M− 2 AM− 2 ) ⊂ (0, ∞) . Using ρ(I − M−1 A) < 1 we obtain that σ(M−1 A) ⊂ (0, 2). The eigenvalues of M−1 A are denoted by µi : 0 < µ1 ≤ µ2 ≤ . . . ≤ µn < 2 . 161

Hence ρ(C) = max{|1 − µ1 |, |1 − µn |} holds and κ(M−1 A) =

µn 1 + |1 − µn | 1 + (µn − 1) ≤ . = µ1 1 − (1 − µ1 ) 1 − |1 − µ1 |

So κ(M−1 A) ≤

1 + ρ(C) 1 − ρ(C)

holds.  1+x With respect to the bound in (7.31) we note that the function x → 1−x increases monotonically on [0, 1). In the introductory example above we have seen that it is favourable to have a small value for κ(M−1 A). In (7.31) we have a bound on κ(M−1 A) that decreases if ρ(C) decreases. This indicates that the higher the convergence rate of the iterative method in (7.30), the better the quality of M as a preconditioner for A. Example 7.4.2 (Discrete Poisson equation) We consider the matrix A resulting from the finite element discretization of the Poisson equation as described in section 6.6. If we use the method of Jacobi, i.e. M = D then ρ(C) ≈ 1 − ch2 holds and (7.31) results in
k are computed. Finally note that in the kth step of the elimination process the entries amj , with 1 ≤ m ≤ k and j ≥ m do not change; these are the entries umj (1 ≤ m ≤ k, j ≥ m) of the matrix U. Another possible implementation of Gaussian elimination is based on solving the n2 equations A = LU for the n2 unknowns (lij )1≤j ˜ k = −˜rk +  k−1 (if k = 0 : p0 := z0 )  pk := −zk + 1 then lj = uj−1  j−1  (8.16) uj = αj − lj βj−1 ,     pj = u1j (qj − βj−1 pj−1 ),       xj = xj−1 + ξj pj ,       ˜ j+1 := q ˜ j+1 − αj qj , q βj := k˜ qj+1 k,      qj+1 := q ˜ j+1 /β . j

This algorithm for computing the solution xk of (8.2) (or (8.3)) has about the same computational costs as the CG algorithm presented in section 7.2.

In the derivation of the Lanczos iterative solution method (8.16) the following ingredients are important: An orthogonal basis of the Krylov subspace can be computed with low costs, using the Lanczos method (8.4).

(8.17)

As an approximation of the original system, the “projected” much smaller system Tk yk = kr0 ke1 in (8.10a) is solved.

(8.18)

The computation of the orthogonal basis in (8.17) and the solution of the projected system in (8.18) can be implemented in such a way, that we only need simple update formulas k − 1 → k.

(8.19)

In the derivation of the projected system (8.10a) the fact that we have an orthogonal basis plays a crucial role. The approach discussed above is a starting point for the development of methods which can be used in cases where A is not symmetric positive definite. In generalizing this appraoch to systems in which A is not symmetric positive definite one encounters the following two major 179

difficulties: If A is not symmetric positive definite, then an A-inner product does not exist, and thus the problem (8.2) does not make sense,

(8.20)

and if A is not symmetric, then an orthogonal basis of K k (A, r0 ) can not be computed with low computational costs.

(8.21)

In section 8.3 we consider the case that matrix A is not positive definite, but still symmetric (i.e. symmetric indefinite). Then we can still use the Lanczos method to compute, in a cheap way, an orthogonal basis of the Krylov subspace. To deal with the problem formulated in (8.20) one can replace the error minimization in the A-norm in (8.2) by a residual minimization in the euclidean norm, i.e. minimize kAx − bk over the space x0 + Kk (A; r0 ). For every nonsingular matrix A this residual minimization problem has a unique solution. Furthermore, as will be shown in section 8.3, this residual minimization problem can be solved with low computational costs if an orthogonal basis of the Krylov subspace is available. A well-known method for solving symmetric indefinite problems, which is based on using the Lanczos method (8.4) for computing the solution of the residual minimization problem is the MINRES method. In section 8.4 and Section 8.5 we assume that the matrix A is not even symmetric. Then both the problem formulated in (8.20) and the problem formulated in (8.21) arise. We can deal with the problem in (8.20) as in the MINRES method, i.e. we can use residual minimization in the euclidean norm instead of error minimization in the A-norm. It will turn out that, just as for the symmetric indefinite case, this residual minimization problem can be solved with low costs if an orthogonal basis of the Krylov subspace is available. However, due to the nonsymmetry (cf. (8.21)), for computing such an orthogonal basis we now have to use a method which is computationally much more expensive than the Lanczos method. An important method which is based on the idea of computing an orthogonal basis of the Krylov subspace and using this basis to solve the residual minimization problem is the GMRES method. We discuss this method in section 8.4. Another important class of methods for solving nonsymmetric problems is treated in section 8.5. In these methods one does not compute the solution of an error or residual minimization problem (as is done in CG, MINRES, GMRES). Instead one tries to determine xk ∈ x0 + Kk (A; r0 ) which satisfies an orthogonality condition similar to the one in (8.3). It turns out that using this approach one can avoid the expensive computation of an orthogonal basis of the Krylov subspace. The main example from this class is the Bi-CG method. The Bi-CG method has lead to many variants. A few popular variants are considered in section 8.5, too.

8.3

MINRES method

In this section we discuss the MINRES method (”Minimal Residual”) which can be used for problems with A symmetric and (possibly) indefinite. The method is introduced in Paige and Saunders [70]. For symmetric A the Lanczos method in (8.4) can be used to find, with low computational costs, an orthogonal basis q1 , q2 , ..., qk of the Krylov space Kk (A, r0 ), k = 1, 2, . . .. The recursion in (8.4) can be rewritten as Aqj = βj−1 qj−1 + αj qj + βj qj+1 , 180

(8.22)

and thus



α1

β1

  β1    AQk = Qk+1     

α2 .. .

..

.

..

.

..

.

.

..

.

..





βk−1 ∅

βk−1 αk βk

      =: Qk+1 Tk .    

(8.23)

Note that Tk is a (k + 1) × k matrix. Due to the orthogonality of the basis we have QTk Qk = Ik ,

QTk qk+1 = 0 .

(8.24)

The MINRES method, introduced in Paige and Saunders [70], is based on the following residual minimization problem: ( Given x0 ∈ Rn , determine xk ∈ x0 + Kk (A; r0 ) such that (8.25) kAxk − bk = min{ kAx − bk | x ∈ x0 + Kk (A; r0 ) } , where r0 := Ax0 − b. Note that the Euclidean norm is used and that for any regular A this minimization problem has a unique solution xk , which is illustrated in figure 8.2. Clearly, we have a projection: rk = Axk − b is the projection (with respect to h·, ·i) of r0 Ax0 − b

      

3         -



R 

Axk − b





R = A(Kk (A; r0 ))

Figure 8.2: Residual minimization on A(Kk (A; r0 )) = span{Ar0 , A2 r0 , ..., Ak r0 }. Any x ∈ Kk (A; r0 ) can be represented as x = −Qk y with y ∈ Rk and using this we obtain: kA(x0 + x) − bk = kAQk y − r0 k = kQk+1 Tk y − r0 k

= kQk+1 Tk y − Qk+1 (kr0 ke1 )k = kTk y − kr0 ke1 k .

(8.26)

So xk as in (8.25) can be obtained from kTk yk − kr0 ke1 k = min{ kTk y − kr0 ke1 k | y ∈ Rk } k

0

k

x = x − Qk y .

(8.27a) (8.27b)

From (8.27) we see that the residual minimization problem in (8.25) leads to a least squares problem with the (k + 1) × k tridiagonal matrix Tk . Due to the structure of this matrix Givens rotations are very suitable for solving the least squares problem in (8.27). Combination of the Lanczos algorithm (for computing an orthogonal basis) with a least squares solver based on 181

Givens rotations results in the MINRES algorithm. We will now derive this algorithm. First we recall that for (x, y) 6= (0, 0) a unique orthogonal Givens rotation is given by  2 2     c + s! = 1 ! c s G= such that w x −s c  with w > 0 = G 0 y

(8.28)

The least squares problem in (8.27a) is solved using an orthogonal transformation Vk ∈ R(k+1)×(k+1) such that   Rk Vk Tk = , Rk ∈ Rk×k upper triangular (8.29) ∅   ˜k b ˜ k ∈ Rk . Then the solution of the least squares problem Define bk := Vk e1 =: , with b ∗ ˜ is given by yk = R−1 k bk . We show how the matrices Rk and vectors bk , k = 1, 2, . . ., can be computed using short (and thus cheap) recursions. We introduce the notation   Ij−1 ∅ ∅ cj sj  ∈ R(j+1)×(j+1) with c2j + s2j = 1 Gj =  ∅ ∅ −sj cj Given T1 one can compute c1 , s1 , r1 such that     r α1 G1 = 1 0 β1 Given T2 and G1 one can compute c2 , s2 , r2 such that      β1 r G G2  1 α2  = 2 , 0 β2

(8.30)

r2 ∈ R2

For k ≥ 3 and for given Tk , Gk−1 , Gk−2 one can compute ck , ss , rk such that     0   Gk−1 Gk−2 βk−1  r   Gk  = k , rk ∈ Rk  αk 0 βk

(8.31)

(8.32)

Note that rk has at most three nonzero entries:

rk = (0, . . . , 0, rk,k−2 , rk,k−1 , rk,k )T

(8.33)

Using these Givens transformations Gj the orthogonal transformations Vj , j ≥ 1, are defined as follows   Vj−1 ∅ V1 := G1 , Vj := Gj , j≥2 ∅ 1

One easily checks, using induction, that  r1    r2 Rk  , Rk :=  Vk T k = ∅ ∅ ∅

..



. rk

182

   , rj as in (8.30),(8.31), (8.32) 

˜ k , bk,k+1 )T , we have the recursion For bk = Vk e1 =: (b   ˜ j−1 b    b1 = G1 e1 , bj =  cj sj bj−1,j  , 0 −sj cj

j≥2

(8.34)

(Notation: bj−1,j is the j-th entry of bj−1 .) We now derive a simple recursion for the vector xk in (8.27b). Define the matrix Qk R−1 k =: Pk = (p1 . . . pk ) with columns pj (1 ≤ j ≤ k). From Pk Rk = Qk and the nonzero structure of the columns of Rk (cf. (8.33)) it follows that p1 = q1 /r1 ,

p2 = (q2 − r2,1 p1 )/r2,2

pj = (qj − rj,j−2pj−2 − rj,j−1 pj−1 )/rj,j ,

j≥3

(8.35)

Note that using (8.34) we can rewrite (8.27b) as 0 0 ˜ ˜ xk = x0 − kr0 kQk R−1 k bk = x − kr kPk bk ˜ k−1 − kr0 kbk,k pk = xk−1 − kr0 kbk,k pk = x0 − kr0 kPk−1 b

(8.36)

This leads to the following method: MINRES algorithm. Given x0 , compute r0 = Ax0 − b. For k = 1, 2, . . . : Compute qk , αk , βk using the Lanczos method. Compute rk using (8.30), (8.31) or (8.32) (note (8.33)). Compute pk using (8.35). Compute bk,k using (8.34). Compute update: xk = xk−1 − kr0 kbk,k pk . Note that in each iteration of this method we need only one matrix-vector multiplication and a few relatively cheap operations, like scalar products and vector additions. Remark 8.3.1 If for a given symmetric regular matrix A and given starting residual r0 assumption 8.2.1 does not hold then there exists a minimal k0 hn such that AKk0 (A; r0 ) = Kk0 (A; r0 ). In the Lanczos method we then obtain (using exact arithmetic) βk0 = 0 and thus the iteration stops for k = k0 . It can be shown that xk0 computed in the MINRES algorithm satisfies Axk0 = b and thus we have solved the linear system.  We now derive the preconditioned MINRES algorithm. For this we assume a given symmetric positive definite matrix M. Let L be such that M = LLT . We consider the preconditioned system L−1 AL−T z = L−1 b , z = LT x ˜ := L−1 AL−T is symmetric. For given x0 ∈ Rn we have z0 = LT x0 and the starting Note that A ˜ 0 − L−1 b = L−1 r0 . We apply the Lanczos residual of the preconditioned problem satisfies Az ˜ L−1 r0 ). We want to avoid method to construct an orthogonal basis q1 , . . . qk of the space Kk (A; T computations with the matrices L and L . This can be achieved if we reformulate the algorithm using the transformations ˜tj := L˜ qj ,

tj := Lqj , 183

wj := L−T qj = M−1 tj

˜ Using these definitions we obtain an equivalent formulation of the algorithm (8.4) applied to A with r = L−1 r0 , which is called the preconditioned Lanczos method:  1  ˜ 0 := M−1 r0 ; krk := hw ˜ 0 , r0 i 2 t0 := 0; w      ˜ 0 /krk; β0 := 0; t1 := r0 /krk; w1 := w      for j ≥ 1 :       ˜j+1 := Awj − βj−1 tj−1 ,   t αj := h˜tj+1 , wj i,     ˜tj+1 := ˜tj+1 − αj tj ,      w  ˜ j+1 := M−1˜tj+1 ; βj := hw ˜ j+1 , ˜tj+1 i,      tj+1 := ˜tj+1 /βj ,     ˜ j+1 /βj , wj+1 := w

(8.37)

Note that for M = I we obtain the algorithm (8.4) and that in each iteration a system with the matrix M must be solved. As a consequence of theorem 8.2.2 we get: Theorem 8.3.2 The set w1 , w2 , ..., wk defined in algorithm (8.37) is orthogonal with respect to h·, ·iM and forms a basis of the Krylov subspace Kk (M−1 A; M−1 r0 ) (k ≤ n). Proof. From theorem 8.2.2 and the defintion of wj it follows that (LT wj )1≤j≤k forms an orthogonal basis of Kk (L−1 AL−T ; L−1 r0 ) with respect to the Euclidean scalar product. Note that hLT wj , LT wi i = 0 iff hwj , wi iM = 0, and LT wj ∈ Kk (L−1 AL−T ; L−1 r0 ) iff wj ∈ Kk (M−1 A; M−1 r0 ).  Define Wj := w1 w2 . . . wj ∈ Rn×j . From theorem 8.3.2 it follows that WT MW = I. From (8.23) we obtain, using LT Wk = Qk , that L−1 AL−T LT Wk = LT Wk+1 Tk holds and thus M−1 AWk = Wk+1 Tk

(8.38)

The matrix M−1 A is symmetric with respect to h·, ·iM . Instead of (8.25) we consider the following minimization problem:  0 n k 0 k −1 −1 0   Given x ∈ R , compute x ∈ x + K (M A; M r ) such that kM−1 Axk − M−1 bkM   = min{ kM−1 Ax − M−1 bkM | x ∈ x0 + Kk (M−1 A; M−1 r0 ) }

(8.39)

with r0 = Ax0 − b. Using arguments as in (8.26) it follows that the solution of the minimization problem (8.39) can be obtained from kTk yk − krke1 k = min{ kTk y − krke1 k | y ∈ Rk } k

0

k

x = x − Wk y . 1

(8.40a) (8.40b)

with krk = hM−1 r0 , r0 i 2 . This problem can be solved using Givens rotations along the same lines as for the unpreconditioned case. Thus we get the following:

184

Preconditioned MINRES algorithm. 1 ˜ 0 = M−1 r0 , krk = hw ˜ 0 , r0 i 2 . Given x0 , compute r0 = Ax0 − b, w For k = 1, 2, . . . : Compute wk , αk , βk using the preconditioned Lanczos method (8.37). Compute rk using (8.30), (8.31) or (8.32) (note (8.33)). Compute pk using (8.35) with qk replaced by wk . Compute bk,k using (8.34). Compute update: xk = xk−1 − krkbk,k pk . The minimization property (8.39) yields a convergence result for the preconditioned MINRES method: Theorem 8.3.3 Let A ∈ Rn×n be symmetric and M ∈ Rn×n symmetric positive definite. For xk , k ≥ 0, computed in the preconditioned MINRES algorithm we define ˜rk = M−1 (Axk − b). The following holds: k˜rk kM = ≤

min

pk ∈Pk ;pk (0)=1

kpk (M−1 A)˜r0 kM

min

max

pk ∈Pk ;pk (0)=1 λ∈σ(M−1 A)

|pk (λ)| k˜r0 kM

(8.41)

Proof. The equality result follows from k˜rk kM = = =

min

pk−1 ∈Pk−1

min

pk−1 ∈Pk−1

min

 kM−1 b − M−1 A x0 + pk−1 (M−1 A)˜r0 kM k˜r0 − M−1 Apk−1 (M−1 A)˜r0 kM

pk ∈Pk ;pk (0)=1

kpk (M−1 A)˜r0 kM

Note that M−1 A is symmetric with respect to h·, ·iM . And thus kpk (M−1 A)˜r0 kM ≤ kpk (M−1 A)kM k˜r0 kM =

max

λ∈σ(M−1 A)

|pk (λ)| k˜r0 kM

holds. From this result it follows that bounds on the reduction of the (preconditioned) residual can be obtained if one assumes information on the spectrum of M−1 A. We present two results that are well-known in the literature. Proofs, which are based on approximation properties of Chebyshev polynomials are given in, for example, [42]. Theorem 8.3.4 Let A, M and ˜rk be as in theorem 8.3.3. Assume that all eigenvalues of M−1 A are positive. Then

holds.

p  k˜rk kM −1 A) + 1) k , κ(M ≤ 2 1 − 2/( k˜r0 kM

k = 0, 1, . . .

We note that in this bound the dependence on the condition number κ(M−1 A) is the same as in well-known bounds for the preconditioned CG method. 185

Theorem 8.3.5 Let A, M and ˜rk be as in theorem 8.3.3. Assume that σ(M−1 A) ⊂ [a, b]∪ [c, d] with a < b < 0 < c < d and b − a = d − c. Then r [k/2] k˜rk kM ad , k = 0, 1, . . . (8.42) + 1) ≤ 2 1 − 2/( 0 k˜r kM bc holds.

In the special case a = −d, b = −c the reduction factor in (8.42) takes the form 1−2/(κ(M−1 A)+ 1). Note that here the dependence on κ(M−1 A) is different from the positive definite case in theorem 8.3.4.

8.4

GMRES type of methods

In this (and the following) section we do not assume that A is symmetric. We only assume that A is regular. In GMRES (“Generalized Minimal Residual”) type of methods one first computes an orthogonal basis of the Krylov subspace and then, using this basis, one determines the xk satisfying the minimal residual criterion in (8.25). It can be shown (cf. Faber and Manteuffel [33]) that only for a very small class of nonsymmetric matrices it is possible to compute this xk satisfying the minimal residual criterion with “low” computational costs. This is related to the fact that in general for a nonsymmetric matrix we do not have a method for computing an orthogonal basis of the Krylov subspace with low computational costs (cf. the Lanczos algorithm for the symmetric case). In GMRES the so-called Arnoldi algorithm, introduced in Arnoldi [5], is used for computing an orthogonal basis of the Krylov subspace :  1 q := r0 /kr0 k;       for j ≥ 1 :      ˜ j+1 := Aqj , q       for i = 1, . . . j : (8.43)  hij := h˜ qj+1 , qi i       ˜ j+1 := q ˜ j+1 − hij qi q       hj+1,j := k˜ qj+1 k     ˜ j+1 /hj+1,j . qj+1 := q When we put the coefficients hij (1 ≤ i ≤ j + 1 ≤ k + 1) in a matrix denoted by Hk we obtain:   h11 h12 · · · · · · h1k  h21 h22 h23 · · · h2k      . . . . . .   . . .   Hk =  (8.44)  . . . .  . . hk−1,k      ..  . hk,k  ∅ hk+1,k

This is a (k + 1) × k matrix of upper Hessenberg form. We also use the notation Qj := [q1 q2 . . . qj ] (n × j − matrix with columns qi ). 186

Using this notation, the Arnoldi algorithm results in AQk = Qk+1 Hk .

(8.45)

The result in (8.45) is similar to the result in (8.23). However, note that the matrix Hk in (8.45) contains significantly more nonzero elements than the tridiagonal matrix Tk in (8.23). Using induction it can be shown that q1 , q2 , ..., qk forms an orthogonal basis of the Krylov subspace Kk (A; r0 ). As in the derivation of (8.27a),(8.27b) for the MINRES method, using the fact that we have an orthogonal basis, we obtain that the xk that satisfies the minimal residual criterion (8.25) can be characterized by the least squares problem: kHk yk − kr0 ke1 k = min{kHk y − kr0 ke1 k | y ∈ Rk } xk = x0 − Qk yk .

(8.46) (8.47)

The GMRES algorithm has the following structure :  1. Start : choose x0 ; r0 := Ax0 − b; q1 := r0 /kr0 k       2. Arnoldi method (8.43) for the computation of an orthogonal     basis q1 , q2 , . . . , qk of Kk (A; r0 )

 3. Solve a least squares problem :     yk such that kHk yk − kr0 ke1 k = min{kHk y − kr0 ke1 k | y ∈ Rk }.      xk := x0 − Qk yk

(8.48)

The GMRES method is introduced in Saad and Schultz [80]. For a detailed discussion of implementation aspects of the GMRES method we refer to that paper. In [80] it is shown that using similar techniques as in the derivation of the MINRES method the least squares problem in step 3 in (8.48) can be solved with low computational costs. However, step 2 in (8.48) is expensive, both with respect to memory and arithmetic work. This is due to the fact that in the kth iteration we need computations involving q1 , q2 , ..., qk−1 to determine qk . To avoid computations involving all the previous basis vectors, the GMRES method with restart is often used in practice. In GMRES(m) we apply m iterations of the GMRES method as in (8.48), then we define x0 := xm and again apply m iterations of the GMRES method with this new starting vector, etc.. Note that for k > m the iterands xk do not fulfill the minimal residual criterion (8.25). In Saad and Schultz [80] it is shown that (in exact arithmetic) the GMRES method cannot break down and that (as in CG) the exact solution is obtained in at most n iterations. The minimal residual criterion implies that in GMRES the residual is reduced in every iteration . These nice properties of GMRES do not hold for the GMRES(m) algoritm; a well known difficulty with the GMRES(m) method is that it can stagnate. Example 8.4.1 (Convection-diffusion problem) We apply the GMRES(m) method to the discrete convection-diffusion problem of section 6.6 with b1 = cos( π6 ), b2 = sin( π6 ) and for several values of ε, m, h. For the starting vector we take x0 = 0. In Table 8.2 we show the number of iterations needed to reduce the Euclidean norm of the starting residual with a factor 103 . From these results we see that for this model problem the number of iterations increases significantly when h is decreased. Also we observe a certain robustness with respect to variation in ε. Based on the results in Table 8.2 we obtain that for m “small” (i.e. 1-20) the GMRES(m) method is more efficient than for m “large” (i.e. ≫ 20).  187

ε = 10−1 ε = 10−2 ε = 10−4 ε = 10−1 ε = 10−2 ε = 10−4

m: 1 h = 32 1 h = 32 1 h = 32 1 h = 64 1 h = 64 1 h = 64

10 97 68 61 270 127 119

20 72 75 59 191 134 114

40 61 80 60 146 150 114

80 56 59 59 147 160 114

Table 8.2: # iterations for GMRES(m).

There are other methods which are of GMRES type, in the sense that these methods (in exact arithmetic) yield iterands defined by the minimal residual criterion (8.25). These methods differ in the approach that is used for computing the minimal residual iterand. Examples of GMRES type of methods are the Generalized Conjugate Residual method (GCR) and Orthodir. These variants of GMRES seem to be less popular because for many problems they are at least as expense as GMRES and numerically less stable. For a further discussion and comparison of GMRES type methods we refer to Saad and Schultz [79], Barrett et al. [10] and Freund et al. [36].

8.5

Bi-CG type of methods

The GMRES method is expensive (both with respect to memory and arithmetic work) due to the fact that for the computation of an orthogonal basis of the Krylov subspace we need ”long” recursions (cf. Arnoldi method (8.43)). In this respect there is an essential difference with the symmetric case, because then we can use ”short” recursions for the computation of an orthogonal basis (cf. (8.4)). Also note that the implementation of GMRES (using Givens rotations to solve the least squares problem) is rather complicated, compared to the implementation of the CG method. The Bi-CG method which we discuss below is based on a generalized Lanczos method that is used for computing a ”reasonable” basis of the Krylov subspace. This generalized Lanczos method uses “short” recursions (as in the Lanczos method), but the resulting basis will in general not be orthogonal. The implementation of the Bi-CG method is as simple as the implementation of the CG method. The Bi-CG method is based on the bi-Lanczos (also called nonsymmetric Lanczos) method:  0 ˜ 0 := 0; v1 := v ˜ 1 := r0 /kr0 k2 ; β0 = γ0 = 0; v := v       For j ≥ 1 :      α := hA˜ vj , vj i j wj+1 := Avj − αj vj − βj−1 vj−1 ;     ˜ j − αj v ˜ j − γj−1 v ˜ j−1 ; ˜ j+1 := AT v w    j+1 j+1 j+1  γj := kw k, v := w /γj ,    j+1 j+1 ˜ ˜ j+1 /βj . βj := hv , w i, wj+1 := w 188

(8.49)

If A = AT holds, then the two recursions in (8.49) are the same and the bi-Lanczos method re˜ j+1 i = duces to the Lanczos method in (8.4). In the bi-Lanczos method it can happen that hvj+1 , w j+1 j+1 ˜ 0 , even if v 6= 0 and w 6= 0. In that case the algorithm is not executable anymore and this is called a (serious) “breakdown”. Using induction we obtain that for the two sequences of vectors generated by Bi-CG the following properties hold: span{v1 , v2 , . . . , vj } = Kj (A; r0 ) 1

2

j

j

T

(8.50)

0

˜ ,...,v ˜ } = K (A ; r ) span{˜ v ,v

(8.51)

˜ j i = 0 if i 6= j , hvi , v ˜i i = 1 . hvi , v

(8.52)

˜ j (i 6= j) bi-orthogonal. In general the vj (j = 1, 2, ...) will not Based on (8.52) we call vi and v be orthogonal. Using the notation ˜ j := [˜ ˜j ] V v1 . . . v

Vj := [v1 . . . vj ], we obtain the identities 

AVk

=

α1

β1

  γ1   Vk    

α2 .. .



..

.

..

.

..

.

..

.. ∅

.

. βk−1 γk−1 αk

=: Vk Tk + γk vk+1 eTk and

˜ T Vk = I k , V k



      + γk vk+1 0 . . . 0 1   

˜ T vk+1 = 0 . V k

(8.53) (8.54)

In Bi-CG we do not use a minimal residual criterion as in (8.25) but the following criterion based on an orthogonality condition: Determine xk ∈ x0 + Kk (A; r0 ) such that Axk − b ⊥ Kk (AT ; r0 ) .

(8.55)

The existence of an xk satisfying the criterion in (8.55) is not guaranteed ! If the criterion (8.55) cannot be fulfilled, the Bi-CG algorithm in (8.58) below will break down. For the case that A is symmetric positive definite, the criteria in (8.55) and in (8.3) are equivalent and (in exact arithmetic) the Bi-CG algorithm will yield the same iterands as the CG algorithm. Using (8.50)-(8.52) we see that the Bi-CG iterand xk , characterized in (8.55), satisfies ˜ T (AVk yk − r0 ) = 0 ; V k

xk = x0 + Vk yk

(yk ∈ Rk ).

Due to the relations in (8.53), (8.54) this yields the following characterization: Tk yk = kr0 k2 e1 k

0

(8.56) k

x = x + Vk y .

(8.57)

Note that this is very similar to the characterization of the CG iterand in (8.10a), (8.10b). However, in (8.56) the tridiagonal matrix Tk need not be symmetric positive definite and Vk in general will not be orthogonal. Using an LU-decomposition of the tridiagonal matrix Tk we can 189

compute yk , provided Tk is nonsingular, and then determine xk . An efficient implementation of this approach can be derived along the same lines as for the Lanczos iterative method in section 8.2. This then results in the Bi-CG algorithm, introduced in Lanczos [59] (cf. also Fletcher [35]):  ˜ 0 = p0 = ˜r0 = r0 = b − Ax0 ; ρ0 := kr0 k2 starting vector x0 ; p       For k ≥ 0 :       ˜ k i; αk := ρk /σk ; σk := hApk , p    xk+1 := xk + α pk k k+1 := rk − α Apk r  k    ˜k  ˜rk+1 := ˜rk − αk AT p    ρk+1 := h˜rk+1 , rk+1 i; βk+1 := ρk+1 /ρk ;      pk+1 := rk+1 + βk+1 pk   ˜ k+1 := ˜rk+1 + βk+1 p ˜k p

(8.58)

Note: here and in the remainder of this section the residual is defined by rk = b − Axk (instead of Axk − b). The Bi-CG algorithm is simple and has low computational costs per iteration (compared to GMRES type methods). A disadvantage is that a breakdown can occur (ρk = 0 or σk = 0). A “near breakdown” will result in numerical instabilities. To avoid these (near) breakdowns variants of Bi-CG have been developed that use so-called look-ahead Lanczos algorithms for computing a basis of the Krylov subspace. Also the criterion in (8.55) can be replaced by another criterion to avoid a breakdown caused by the fact that the Bi-CG iterand as in (8.55) does not exist. The combination of a look-ahead Lanczos approach and a criterion based on minimization of a “quasi-residual” is the basis of the QMR (“Quasi Minimal Residual”) method. For a discussion of the look-ahead Lanczos approach and QMR we refer to Freund et al. [36]. For the Bi-CG method there are only very few theoretical convergence results. A variant of Bi-CG is analyzed in Bank and Chan [8]. A disadvantage of the Bi-CG method is that we need a multiplication by AT which is often not easily available. Below we discuss variants of the Bi-CG method which only use multiplications with the matrix A (two per iteration). For many problems these methods have a higher rate of convergence than the Bi-CG method. We introduce the BiCGSTAB method (from Van der Vorst [91])and the CGS (“Conjugate Gradients Squared”) method (from Sonneveld [86]). These methods are derived from the Bi-CG method. We assume that the Bi-CG method does not break down. We first reformulate the Bi-CG method using a notation based on matrix polynomials. With Tk , Pk ∈ Pk defined by T0 (x) = 1,

P0 (x) = 1 ,

Pk (x) = Pk−1 (x) − αk−1 xTk−1 (x) ,

Tk (x) = Pk (x) + βk Tk−1 (x) ,

k ≥ 1,

k ≥ 1,

with αk , βk as in (8.58), we have for the search directions pk and the residuals rk resulting from the Bi-CG method: rk = Pk (A)r0 , 0

pk = Tk (A)r . 190

(8.59) (8.60)

˜ k with A replaced by AT and r0 replaced by Results as in (8.59), (8.60) also hold for ˜rk and p ˜r0 . For the sequences of residuals and search directions, generated by the Bi-CG method, we define related transformed sequences: ˆrk :=Qk (A)rk , k

k

ˆ :=Qk (A)p , p

(8.61) (8.62)

with Qk ∈ Pk . Note that corresponding to a given residual ˆrk there corresponds an iterand ˆ k := A−1 (b − ˆrk ). x

(8.63)

ˆ k corresponding to In the BiCGSTAB method and the CGS method we compute the iterands x ˆ k can be a ”suitable” polynomial Qk . These polynomials are chosen in such a way that the x k k ˆ and A. The costs per iteration computed with simple (i.e short) recursions involving ˆr , p of these algorithms will be roughly the same as the costs per iteration in Bi-CG. An important advantage is that we do not need AT . Clearly, from an efficiency point of view it is favourable to have a polynomial Qk such that kˆrk k = kQk (A)rk k ≪ krk k holds. For obtaining a hybrid Bi-CG method one can try to find a polynomial Qk such that for the corresponding transformed quantities we have low costs per iteration (short recursions) and a (much) smaller transformed residual. The first example of such a polynomial is due to Sonneveld [86]. He proposes: Qk (x) = Pk (x),

(8.64)

ˆ k corresponding to this Qk are computed in the with Pk the Bi-CG polynomial. The iterands x CGS method (cf. (8.75)). Another choice is proposed in Van der Vorst [91]: Qk (x) = (1 − ωk−1 x)(1 − ωk−2 x) · · · (1 − ω0 x).

(8.65)

ˆ k corresponding The choice of the parameters ωj is discussed below (cf. (8.73)). The iterands x to this Qk are computed in the BiCGSTAB algorithm. We now show how the BiCGSTAB algorithm can be derived. First note that for the BiCGSTAB polynomial we have Qk+1 (A) = (I − ωk A)Qk (A). ˆ k we obtain From the Bi-CG algortihm and the definition of p ˆ k+1 = Qk+1 (A)pk+1 = Qk+1 (A)rk+1 + βk+1 (I − ωk A)Qk (A)pk p = ˆrk+1 + βk+1 (ˆ pk − ωk Aˆ pk ).

(8.66)

Similarly, for the transformed residuals we obtain the recursion ˆrk+1 = (ˆrk − αk Aˆ pk ) − ωk A(ˆrk − αk Aˆ pk ).

(8.67)

For the iterands related to these transformed residuals we have ˆ k+1 − x ˆ k = A−1 (ˆrk − ˆrk+1 ) = αk p ˆ k + ωk (ˆrk − αk Aˆ x pk ), and thus we have the recursion ˆ k+1 = x ˆ k + αk p ˆ k + ωk (ˆrk − αk Aˆ x pk ). 191

(8.68)

Note that in (8.66), (8.67) and (8.68) we have simple recursions in which the scalars αk and βk defined in the Bi-CG algorithm are used. We now show that for these scalars one can derive other, more feasible, formulas. We consider ρk = h˜rk , rk i The coefficient for the highest order term of the Bi-CG polynomial Pk is equal to (−1)k α0 α1 · · · αk−1 . So we have ˜rk = Pk (AT )˜r0 = (−1)k α0 α1 · · · αk−1 (AT )k ˜r0 + w ˜ k,

˜ k ∈ Kk (AT ; ˜r0 ). Using this in the definition of ρk and the orthogonality condition for rk with w in (8.55) we obtain the relation ρk = (−1)k α0 α1 · · · αk−1 h(AT )k ˜r0 , rk i.

(8.69)

We now define the quantity ρˆk := h˜r0 , ˆrk i in which we use the transformed residual ˆrk . The coefficient for the highest order term of the BiCGSTAB polynomial Qk is equal to (−1)k ω0 ω1 · · · ωk−1 . So we have Qk (AT )˜r0 = (−1)k ω0 ω1 · · · ωk−1 (AT )k ˜r0 + wk , with wk ∈ Kk (AT ; ˜r0 ). Using this in the definition of ρˆk and the orthogonality condition for rk in (8.55) we obtain the relation ρˆk = h˜r0 , Qk (A)rk i = hQk (AT )˜r0 , rk i = (−1)k ω0 ω1 · · · ωk−1 h(AT )k ˜r0 , rk i.

(8.70)

The results in (8.69),(8.70) yield the following formula for βk+1 ρk+1 h(AT )k+1 ˜r0 , rk+1 i = − αk ρk h(AT )k ˜r0 , rk i ρˆk+1 αk )( ). = ( ρˆk ωk

βk+1 =

(8.71)

Similarly, for the scalar αk defined in the Bi-CG algorithm, the formula αk =

ρˆk hAˆ pk , ˜r0 i

(8.72)

can be derived. We finally discuss the choice of the parameters ωj in the BiCGSTAB polynomial. We use the notation 1 ˆrk+ 2 := ˆrk − αk Aˆ pk . The recursion for the transformed residuals can be rewritten as 1

1

ˆrk+1 = ˆrk+ 2 − ωk Aˆrk+ 2 . The ωk is now defined by a standard line search: 1

1

1

1

kˆrk+ 2 − ωk Aˆrk+ 2 k = min kˆrk+ 2 − ωAˆrk+ 2 k. ω

192

This results in

1

ωk =

1

hAˆrk+ 2 , ˆrk+ 2 i 1

1

hAˆrk+ 2 , Aˆrk+ 2 i

.

(8.73)

Using the recursion in (8.66), (8.67), (8.68) , the formulas for the scalars in (8.71), (8.72) and the choice for ωk as in (8.73) we obtain the following Bi-CGSTAB algorithm, where for ease of notation we dropped the “ˆ” notation for the transformed variables.  starting vector x0 ; r0 = b − Ax0 ; choose ˜r0 ( e.g. = r0 )       p−1 = c−1 = 0. α−1 = 0, ω−1 = ρ−1 = 1,       for k ≥ 0 :      k 0   ρk = hr , ˜r i, βk = (αk−1 /ωk−1 )(ρk /ρk−1 ), pk = βk pk−1 + rk − βk ωk−1 ck−1 , (8.74)  k = Apk ,  c      γk = hck , ˜r0 i, αk = ρk /γk ,    k+ 21 k − α ck , ck+ 21 = Ark+ 12 ,  = r r  k  1 1  k+ 21 k+ 21   , r i/hck+ 2 , ck+ 2 i, ω = hc k   1 1 1  xk+1 = xk + αk pk + ωk rk+ 2 , rk+1 = rk+ 2 − ωk ck+ 2 .

This Bi-CGSTAB method is introduced in Van der Vorst [91] as a variant of Bi-CG and of the CGS method. Variants of the Bi-CGSTAB method, denoted by Bi-CGSTAB(ℓ), are discussed in Sleijpen and Fokkema [84].

In the CGS method we take Qk (x) = Pk (x), with Pk the Bi-CG polynomial. This results in transformed residuals satisfying the relation ˆrk = (Pk (A))2 r0 . This explains the name Conjugate Gradients Squared. Along the same lines as above for the Bi-CGSTAB method one can derive the following CGS algorithm from the Bi-CG algorithm (cf. Sonneveld [86]):  ˜ 0 ; ˜r0 := b − A˜ ˜ −1 := 0; ρ−1 := 1; starting vector x x0 ; q0 := q       For k ≥ 0 :       ρk := h˜r0 ; ˜rk i; βk := ρk /ρk−1 ;    k k k   w := ˜r + βk q k k ˜ := w + βk (qk + βk q ˜ k−1 ) q (8.75)  k := A˜ k  v q     σk := h˜r0 , vk i; αk := ρk /σk ;      qk+1 := wk − αk vk     ˜rk+1 := ˜rk − αk A(wk + qk+1 )   ˜ k+1 := x ˜ k + αk (wk + qk+1 ) x

Note that both in the Bi-CGSTAB method and in the CGS method we have relatively low costs per iteration (two matrix vector products and a few inner products) and that we do not need AT . The fact that in the CGS polynomial we use the square of the Bi-CG polynomial often results in a rather irregular convergence behaviour (cf. Van der Vorst [91]). The Bi-CGSTAB polynomial (8.65),(8.73) is chosen such that the resulting method in general has a less irregular convergence behaviour than the CGS method. 193

Example 8.5.1 (Convection-diffusion problem) In Figure 8.3 we show the results of the CGS method and the Bi-CGSTAB method applied to the convection-diffusion equation as in 1 example 8.4.1, with ε = 10−2 , h = 32 . As a measure for the error reduction we use

10

log(σkd ) := 10 log(||Axk − b||2 /||Ax0 − b||2 ) = 10 log(||Axk − b||2 /||b||2 ).

In figure 8.3 we show the values of

10 log(σ d ) k

for k = 0, 1, 2, · · · , 60.

12

60

0

CGS

Bi-CGSTAB

-12

Figure 8.3: Convergence behaviour of CGS and of Bi-CGSTAB

Note that for both methods we first have a growth of the norm of the defect (in CGS even up to 1012 !). In this example we observe that indeed the Bi-CGSTAB has a smoother convergence behaviour than the CGS method. 

We finally note that for all these nonsymmetric Krylov subspace methods the use of a suitable preconditioner is of great importance for the efficiency of the methods. There is only very little analysis in this field, and in general the choice of the preconditioner is based on “trial and error”. Often variants of the ILU factorization are used as a preconditioner. The preconditioned Bi-CGSTAB algorithm, with preconditioner W, is as follows (cf. Sleijpen 194

and Van der Vorst [85]):  starting vector x0 ; r0 = b − Ax0 ; choose ˜r0 ( e.g. = r0 )       p−1 = c−1 = 0. α−1 = 0, ω−1 = ρ−1 = 1,       for k ≥ 0 :      ρk = hrk , ˜r0 i, βk = (αk−1 /ωk−1 )(ρk /ρk−1 ),    1 1   solve pk+ 2 from Wpk+ 2 = rk − βk ωk−1 ck−1 ,    1  pk = β pk−1 + pk+ 2 , k k = Apk , c     γk = hck , ˜r0 i, αk = ρk /γk ,    1   rk+ 2 = rk − αk ck ,   1 1 1    solve yk+ 2 from Wyk+ 2 = rk+ 2 ,   1 1   ck+ 2 = Ayk+ 2 ,    1 1 1 1   ωk = hck+ 2 , rk+ 2 i/hck+ 2 , ck+ 2 i,    1 1 1 xk+1 = xk + αk pk + ωk yk+ 2 , rk+1 = rk+ 2 − ωk ck+ 2 .

195

(8.76)

196

Chapter 9

Multigrid methods 9.1

Introduction

In this chapter we treat multigrid methods (MGM) for solving discrete scalar elliptic boundary value problems. We first briefly discuss a few important differences between multigrid methods and the iterative methods treated in the preceding chapters . The basic iterative methods and the Krylov subspace methods use the matrix A and the righthand side b which are the result of a discretization method. The fact that these data correspond to a certain underlying continuous boundary value problem is not used in the iterative method. However, the relation between the data (A and b) and the underlying problem can be useful for the development of a fast iterative solver. Due to the fact that A results from a discretization procedure we know, for example, that there are other matrices which, in a certain natural sense, are similar to the matrix A. These matrices result from the discretization of the underlying continuous boundary value problem on other grids than the grid corresponding to the given discrete problem Ax = b. The use of discretizations of the given continuous problem on several grids with different mesh sizes plays an important role in multigrid methods. We will see that for a large class of discrete elliptic boundary value problems multigrid methods have a significantly higher rate of convergence than the methods treated in the preceding chapters. Often multigrid methods even have “optimal” complexity. Due to the fact that in multigrid methods discrete problems on different grids are needed, the implementation of multigrid methods is in general (much) more involved than the implementation of, for example, Krylov subspace methods. We also note that for multigrid methods it is relatively hard to develop “black box” solvers which are applicable to a wide class of problems. In section 9.2 we explain the main ideas of the MGM using a simple one dimensional problem. In section 9.3 we introduce multigrid methods for discrete scalar elliptic boundary value problems. In section 9.4 we present a convergence analysis of these multigrid methods. Opposite to the basic iterative and Krylov subspace methods, in the convergence analysis we will need the underlying continuous problem. The standard multigrid method discussed in the sections 9.2-9.4 is efficient only for diffusion-dominated elliptic problems. In section 9.5 we consider modifications of standard multigrid methods which are used for convection-dominated problems. In section 9.6 we discuss the principle of nested iteration. In this approach we use computations on relatively coarse grids to obtain a good starting vector for an iterative method (not necessarily 197

a multigrid method). In section 9.7 we show some results of numerical experiments. In section 9.8 we discuss so-called algebraic multigrid methods. In these methods, as in basic iterative methods and Krylov subspace methods, we only use the given matrix and righthand side, but no information on an underlying grid structure. Finally, in section 9.9 we consider multigrid techniques which can be applied directly to nonlinear elliptic boundary value problems without using a linearization technique. For a thorough treatment of multigrid methods we refer to the monograph of Hackbusch [44]. For an introduction to multigrid methods requiring less knowledge of mathematics, we refer to Wesseling [96], Briggs [23], Trottenberg et al. [69]. A theoretical analysis of multigrid methods is presented in [19].

9.2

Multigrid for a one-dimensional model problem

In this section we consider a simple model situation to show the basic principle behind the multigrid approach. We consider the two-point boundary value model problem  −u′′ (x) = f (x), x ∈ Ω := (0, 1) (9.1) u(0) = u(1) = 0 . The variational formulation of this problem is: find u ∈ H01 (Ω) such that Z Z 1 ′ ′ u v dx = f v dx for all v ∈ H01 (Ω) 0

For the discretization we introduce a sequence of nested uniform grids. For ℓ = 0, 1, 2, . . . , we define hℓ = 2−ℓ−1 nℓ =

h−1 ℓ

−1

(“mesh size”) , (“number of interior grid points”) ,

ξℓ,i = ihℓ , i = 0, 1, ..., nℓ + 1 Ωint ℓ

(9.2)

= {ξℓ,i | 1 ≤ i ≤ nℓ }

(“grid points”) ,

(“interior grid”) ,

Thℓ = ∪{ [ξℓ,i , ξℓ,i+1 ] | 0 ≤ i ≤ nℓ }

(9.3) (9.4) (9.5)

(“triangulation”)

(9.6)

The space of linear finite elements corresponding to the triangulation Thℓ is given by X1hℓ ,0 = { v ∈ C(Ω) | v|[ξℓ,i ,ξℓ,i+1] ∈ P1 , i = 0, . . . , nℓ , v(0) = v(1) = 0 } The standard nodal basis in this space is denoted by (φi )1≤i≤nℓ . This basis induces an isomorphism nℓ X 1 nℓ Pℓ : R → Xhℓ ,0 , Pℓ x = xi φi (9.7) i=1

yields a linear system Z 1 Z 1 ′ ′ (Aℓ )ij = φi φj dx, (bℓ )i = f φi dx

The Galerkin discretization in the space Aℓ xℓ = bℓ ,

X1hℓ ,0

0

(9.8)

0

The solution of this discrete problem is denoted by x∗ℓ . The solution of the Galerkin discretization in the space X1hℓ ,0 is given by uℓ = Pℓ x∗ℓ . A simple computation shows that nℓ ×nℓ Aℓ = h−1 ℓ tridiag(−1, 2, −1) ∈ R

198

Note that, apart from a scaling factor, the same matrix results from a standard discretization with finite differences of the problem (9.1). Clearly, in practice one should not solve the problem in (9.8) using an iterative method (a Cholesky factorization A = LLT is stable and efficient). However, we do apply a basic iterative method here, to illustrate a certain “smoothing” property which plays an important role in multigrid methods. We consider the damped Jacobi method 1 xk+1 = xkℓ − ωhℓ (Aℓ xkℓ − bℓ ) with ℓ 2

ω ∈ (0, 1] .

(9.9)

The iteration matrix of this method is given by 1 Cℓ = Cℓ (ω) = I − ωhℓ Aℓ . 2 In this simple model problem an orthogonal eigenvector basis of Aℓ , and thus of Cℓ , too, is known. This basis is closely related to the “Fourier modes”: wν (x) = sin(νπx),

x ∈ [0, 1],

ν = 1, 2, ... .

Note that wν satisfies the boundary conditions in (9.1) and that −(wν )′′ (x) = (νπ)2 wν (x) holds, and thus wν is an eigenfunction of the problem in (9.1). We introduce vectors zνℓ ∈ Rnℓ , 1 ≤ ν ≤ nℓ , which correspond to the Fourier modes wν restricted to the interior grid Ωint ℓ : zνℓ := [wν (ξℓ,1 ), wν (ξℓ,2 ), ..., wν (ξℓ,nℓ )]T . These vectors form an orthogonal basis of Rnℓ . For ℓ = 2 we give an illustration in figure 9.1. To a vector zνℓ there corresponds a frequency ν. If ν < 12 nℓ holds then the vector zνℓ , or the o

x x

o x

x

x

x

0

x

: z12

o

: z42

x

o

o

1

o

o

o

Figure 9.1: two discrete Fourier modes. corresponding finite element function Pℓ zνℓ , is called a “low frequency mode”, and if ν ≥ 199

1 2 nℓ

holds then this vector [finite element function] is called a “high frequency mode”. These vectors zνℓ are eigenvectors of the matrix Aℓ : Aℓ zνℓ = and thus we have

π 4 sin2 (ν hℓ )zνℓ , hℓ 2

π Cℓ zνℓ = (1 − 2ω sin2 (ν hℓ ))zνℓ . 2

(9.10)

From this we obtain kCℓ k2 = max1≤ν≤nℓ |1 − 2ω sin2 (ν π2 hℓ )| = 1 − 2ω sin2 ( π2 hℓ ) = 1 − 12 ωπ 2 h2ℓ + O(h4ℓ ) .

(9.11)

From this we see that the damped Jacobi method is convergent, but that the rate of convergence will be very low for hℓ small (cf. section 6.3). Note that the eigenvalues and the eigenvectors of Cℓ are functions of νhℓ ∈ [0, 1]: π := 1 − 2ω sin2 (ν hℓ ) =: gω (νhℓ ) , with 2 2 π gω (y) = 1 − 2ω sin ( y) (y ∈ [0, 1]) . 2 λℓ,ν

(9.12a) (9.12b)

Hence, the size of the eigenvalues λℓ,ν can directly be obtained from the graph of the function gω . In figure 9.2 we show the graph of the function gω for a few values of ω. From the graphs 1

ω=

1 3 1 2

ω=

2 3

ω=

-1

ω=1

Figure 9.2: Graph of gω . in this figure we conclude that for a suitable choice of ω we have |gω (y)| ≪ 1 if y ∈ [ 21 , 1]. We choose ω = 23 (then |gω ( 21 )| = |gω (1)| holds). Then we have |g 2 (y)| ≤ 13 for y ∈ [ 21 , 1]. Using this 3 and the result in (9.12a) we obtain |λℓ,ν | ≤

1 3

1 for ν ≥ nℓ . 2

Hence: 200

the high frequency modes are strongly damped by the iteration matrix Cℓ . From figure 9.2 it is also clear that the low rate of convergence of the damped Jacobi method is caused by the low frequency modes (νhℓ ≪ 1). Summarizing, we draw the conclusion that in this example the damped Jacobi method will “smooth” the error. This elementary observation is of great importance for the two-grid method introduced below. In the setting of multigrid methods the damped Jacobi method is called a “smoother”. The smoothing property of damped Jacobi is illustrated in figure 9.3. It is impor-

0

1

0

1

Graph of the error after one damped Jacobi iteration (ω = 23 ).

Graph of a starting error.

Figure 9.3: Smoothing property of damped Jacobi. tant to note that the discussion above concerning smoothing is related to the iteration matrix Cℓ , which means that the error will be made smoother by the damped Jacobi method, but not (necessarily) the new iterand xk+1 . In multigrid methods we have to transform information from one grid to another. For that purpose we introduce so-called prolongations and restrictions. In a setting with nested finite element spaces these operators can be defined in a very natural way. Due to the nestedness the identity operator Iℓ : X1hℓ−1 ,0 → X1hℓ ,0 , Iℓ v = v is well-defined. This identity operator represents linear interpolation as is illustrated for ℓ = 2 in figure 9.4. The matrix representation of this interpolation operator is given by

x x

x

0

1

x

X1h1,0

x

I2 ?

x x x

x

x

0

x

x x

1

X1h2,0

x

Figure 9.4: Canonical prolongation.

pℓ : Rnℓ−1 → Rnℓ , 201

pℓ := Pℓ−1 Pℓ−1

(9.13)

A simple computation yields 

1 2

 1  1   2    pℓ =       



           1  2  1 

1 2

1 1 2

..



.

1 2



(9.14)

nℓ ×nℓ−1

int We can also restrict a given grid function vℓ on Ωint ℓ to a grid function on Ωℓ−1 . An obvious approach is to use a restriction r based on simple injection:

(rinj vℓ )(ξ) = vℓ (ξ)

if ξ ∈ Ωint ℓ−1 .

When used in a multigrid method then often this restriction based on injection is not satisfactory (cf. Hackbusch [44], section 3.5). A better method is obtained if a natural Galerkin property is satisfied. It can easily be verified (cf. also lemma 9.3.2) that with Aℓ , Aℓ−1 and pℓ as defined in (9.8), (9.13) we have (9.15) rℓ Aℓ pℓ = Aℓ−1 iff rℓ = pTℓ Thus the natural Galerkin condition rℓ Aℓ pℓ = Aℓ−1 implies the choice rℓ = pTℓ

(9.16)

for the restriction operator. The two-grid method is based on the idea that a smooth error, which results from the application of one or a few damped Jacobi iterations, can be approximated fairly well on a coarser grid. We now introduce this two-grid method. Consider Aℓ x∗ℓ = bℓ and let xℓ be the result of one or a few damped Jacobi iterations applied to a given starting vector x0ℓ . For the error eℓ := x∗ℓ − xℓ we have Aℓ eℓ = bℓ − Aℓ xℓ =: dℓ

( “residual” or “defect”)

(9.17)

Based on the assumption that eℓ is smooth it seems reasonable to make the approximation ˜ℓ−1 with an appropriate vector (grid function) e ˜ℓ−1 ∈ Rnℓ−1 . To determine the vector eℓ ≈ pℓ e ˜ℓ−1 we use the equation (9.17) and the Galerkin property (9.15). This results in the equation e ˜ℓ−1 = rℓ dℓ Aℓ−1 e x∗

˜ℓ−1 . Thus for the new iterand we take ˜ℓ−1 . Note that for the vector e = xℓ + eℓ ≈ xℓ + pℓ e ˜ℓ−1 . In a more compact formulation this two-grid method is as follows: xℓ := xℓ + pℓ e  procedure TGMℓ (xℓ , bℓ )     if ℓ = 0 then x0 := A−1  0 b0 else    begin     xℓ := Jℓν (xℓ , bℓ ) (∗ ν smoothing it., e.g. damped Jacobi ∗)  dℓ−1 := rℓ (bℓ − Aℓ xℓ ) (∗ restriction of defect ∗) (9.18)  −1  ˜ℓ−1 := Aℓ−1 dℓ−1 (∗ solve coarse grid problem ∗) e      ˜ℓ−1 (∗ add correction ∗) x ℓ := xℓ + pℓ e     TGM := x ℓ ℓ   end; 202

˜ℓ−1 , one or a few smoothing iterations are Often, after the coarse grid correction xℓ := xℓ + pℓ e applied again. Smoothing before/after the coarse grid correction is called pre/post-smoothing. Besides the smoothing property a second property which is of great importance for a multigrid method is the following: ˜ℓ−1 = dℓ−1 is of the same form as the system Aℓ xℓ = bℓ . The coarse grid system Aℓ−1 e ˜ℓ−1 = dℓ−1 approximately we can apply the two-grid algoThus for solving the problem Aℓ−1 e rithm in (9.18) recursively. This results in the following multigrid method for solving Aℓ x∗ℓ = bℓ : procedure MGMℓ (xℓ , bℓ ) if ℓ = 0 then x0 := A−1 0 b0 else begin xℓ := Jℓν1 (xℓ , bℓ ) (∗ presmoothing ∗) dℓ−1 := rℓ (bℓ − Aℓ xℓ ) e0ℓ−1 := 0; for i = 1 to τ do eiℓ−1 := MGMℓ−1 (ei−1 ℓ−1 , dℓ−1 ); τ xℓ := xℓ + pℓ eℓ−1 xℓ := Jℓν2 (xℓ , bℓ ) (∗ postsmoothing ∗) MGMℓ := xℓ end;

(9.19)

If one wants to solve the system on a given finest grid, say with level number ℓ, i.e. Aℓ xℓ∗ = bℓ , then we apply some iterations of MGMℓ (xℓ , bℓ ). Based on efficiency considerations (cf. section 9.3) we usually choose τ = 1 (“V -cycle”) or τ = 2 (“W -cycle”) in the recursive call in (9.19). For the case ℓ = 3 the structure of one multigrid iteration with τ ∈ {1, 2} is illustrated in figure 9.5.

ℓ=3 ℓ=2 ℓ=1 ℓ=0





B B B



•

•B

B B• B

•

 B  B





 





B B

B

  



•B

• • : smoothing

 A   A B A• B• • • • •  : solve exactly A   B A A A  A  B  A  A A A B



τ =1







τ =2

Figure 9.5: Structure of one multigrid iteration

9.3

Multigrid for scalar elliptic problems

In this section we introduce multigrid methods which can be used for solving discretized elliptic boundary value problems. Opposite to the CG method, the applicability of multigrid methods 203

is not restricted to (nearly) symmetric problems. Multigrid methods can also be used for solving problems which are strongly nonsymmetric (convection dominated). However, for such problems one usually has to modify the standard multigrid approach. These modifications are discussed in section 9.5. We will introduce the two-grid and multigrid method by generalizing the approach of section 9.2 to the higher (i.e., two and three) dimensional case. We consider the finite element discretization of scalar elliptic boundary value problems as discussed in section 3.4. Thus the continuous variational problem is of the form ( find u ∈ H01 (Ω) such that (9.20) k(u, v) = f (v) for all v ∈ H01 (Ω) with a bilinear form and righthand side as in (3.42): Z ∇uT A∇v + b · ∇uv + cuv dx , k(u, v) =

f (v) =



Z

f v dx



The coefficients A, b, c are assumed to satisfy the conditions in (3.42). For the discretization of this problem we use simplicial finite elements. The case with rectangular finite elements can be treated in a very similar way. Let {Th } be a regular family of triangulations of Ω consisting of nsimplices and Xkh,0 , k ≥ 1, the corresponding finite element space as in (3.16). The presentation and implementation of the multigrid method is greatly simplified if we assume a given sequence of nested finite element spaces. Assumption 9.3.1 In the remainder of this chapter we always assume that we have a sequence Vℓ := Xkhℓ ,0 , ℓ = 0, 1, . . ., of simplicial finite element spaces which are nested: Vℓ ⊂ Vℓ+1

for all ℓ

(9.21)

We note that this assumption is not necessary for a succesful application of multigrid methods. For a treatment of multigrid methods in case of non-nestedness we refer to [69] (?). The construction of a hierarchy of triangulations such that the corresponding finite element spaces are nested is discussed in chapter ??. In Vℓ we use the standard nodal basis (φi )1≤i≤nℓ as explained in section 3.5. This basis induces an isomorphism nℓ X nℓ xi φi Pℓ : R → Vℓ , Pℓ x = i=1

The Galerkin discretization: Find uℓ ∈ Vℓ such that k(uℓ , vℓ ) = f (vℓ )

for all vℓ ∈ Vℓ

can be represented as a linear system Aℓ xℓ = bℓ , with (Aℓ )ij = k(φj , φi ), (bℓ )i = f (φi ), 1 ≤ i, j ≤ nℓ

(9.22)

Along the same lines as in the one-dimensional case we introduce a multigrid method for solving this system of equations on an arbitrary level ℓ ≥ 0. For the smoother we use one of the basic iterative methods discussed in section 6.2. For this method we use the notation k xk+1 = Sℓ (xk , bℓ ) = xk − M−1 ℓ (Aℓ x − b) ,

204

k = 0, 1, . . .

The corresponding iteration matrix is denoted by Sℓ = I − M−1 ℓ Aℓ For the prolongation we use the matrix representation of the identity Iℓ : Vℓ−1 → Vℓ , i.e., pℓ := Pℓ−1 Pℓ−1

(9.23)

The choice of the restriction is based on the following elementary lemma: Lemma 9.3.2 Let Aℓ , ℓ ≥ 0, be the stiffness matrix defined in (9.22) and pℓ as in (9.23). Then for rℓ : Rnℓ → Rnℓ−1 we have: rℓ Aℓ pℓ = Aℓ−1

if and only if

rℓ = pTℓ

Proof. For the stiffness matrix matrix the identity hAℓ x, yi = k(Pℓ x, Pℓ y) for all x, y ∈ Rnℓ holds. From this we get rℓ Aℓ pℓ = Aℓ−1 ⇔ hAℓ pℓ x, rTℓ yi = hAℓ−1 x, yi

for all x, y ∈ Rnℓ−1

⇔ k(Pℓ−1 x, Pℓ rTℓ y) = k(Pℓ−1 x, Pℓ−1 y) for all x, y ∈ Rnℓ−1 Using the ellipticity of k(·, ·) it now follows that rℓ Aℓ pℓ = Aℓ−1 ⇔ Pℓ rTℓ y = Pℓ−1 y

for all y ∈ Rnℓ−1

⇔ rTℓ y = Pℓ−1 Pℓ−1 y = pℓ y ⇔ rTℓ = pℓ

for all y ∈ Rnℓ−1

Thus the claim is proved. Thus for the restriction we take: rℓ := pTℓ

(9.24)

Using these components we can define a multigrid method with exactly the same structure as in (9.19) procedure MGMℓ (xℓ , bℓ ) if ℓ = 0 then x0 := A−1 0 b0 else begin xℓ := Sℓν1 (xℓ , bℓ ) (∗ presmoothing ∗) dℓ−1 := rℓ (bℓ − Aℓ xℓ ) e0ℓ−1 := 0; for i = 1 to τ do eiℓ−1 := MGMℓ−1 (ei−1 ℓ−1 , dℓ−1 ); τ xℓ := xℓ + pℓ eℓ−1 xℓ := Sℓν2 (xℓ , bℓ ) (∗ postsmoothing ∗) MGMℓ := xℓ end;

205

(9.25)

We briefly comment on some important issues related to this multigrid method. Smoothers For many problems basic iterative methods provide good smoothers. In particular the GaussSeidel method is often a very effective smoother. Other smoothers used in practice are the damped Jacobi method and the ILU method. Prolongation and restriction If instead of a discretization with nested finite element spaces one uses a finite difference or a finite volume method then one can not use the approach in (9.23) to define a prolongation. However, for these cases other canonical constructions for the prolongation operator exist. We refer to Hackbusch [44], [69] or Wesseling [96] for a treatment of this topic. A general technique for the construction of a prolongation operator in case of nonnested finite element spaces is given in [17]. Arithmetic costs per iteration We discuss the arithmetic costs of one MGMℓ iteration as defined in (9.25). For this we introduce a unit of arithmetic work on level ℓ: W Uℓ := # flops needed for Aℓ xℓ − bℓ computation.

(9.26)

We assume: W Uℓ−1 . g W Uℓ

with g < 1 independent of ℓ

(9.27)

Note that if Tℓ is constructed through a uniform global grid refinement of Tℓ−1 (for n = 2: subdivision of each triangle T ∈ Tℓ−1 into four smaller triangles by connecting the midpoints of the edges) then (9.27) holds with g = ( 21 )n . Furthermore we make the following assumptions concerning the arithmetic costs of each of the substeps in the procedure MGMℓ : xℓ := Sℓ (xℓ , bℓ ) :

dℓ−1 := rℓ (bℓ − Aℓ xℓ ) xℓ := xℓ + pℓ eτℓ−1

costs . W Uℓ  total costs . 2 W Uℓ

For the amount of work in one multigrid V-cycle (τ = 1) on level ℓ, which is denoted by V M Gℓ , we get using ν := ν1 + ν2 : V M Gℓ . νW U ℓ + 2W U ℓ + V M Gℓ−1 = (ν + 2)W U ℓ + V M Gℓ−1  . (ν + 2) W U ℓ + W U ℓ−1 + . . . + W U 1 + V M G0  . (ν + 2) 1 + g + . . . + gℓ−1 W U ℓ + V M G0 ν+2 W Uℓ . 1−g

(9.28)

In the last inequality we assumed that the costs for computing x0 = A−1 0 b0 (i.e., V M G0 ) are negligible compared to W U ℓ . The result in (9.28) shows that the arithmetic costs for one V-cycle are proportional (if ℓ → ∞) to the costs of a residual computation. For example, for g = 18 (uniform refinement in 3D) the arithmetic costs of a V-cycle with ν1 = ν2 = 1 on level ℓ are comparable to 4 12 times the costs of a residual computation on level ℓ. For the W-cycle (τ = 2) the arithmetic costs on level ℓ are denoted by W M Gℓ . We have: W M Gℓ . νW U ℓ + 2W U ℓ + 2W M Gℓ−1 = (ν + 2)W U ℓ + 2W M Gℓ−1  . (ν + 2) W U ℓ + 2W U ℓ−1 + 22 W U ℓ−2 + . . . + 2ℓ−1 W U 1 + W M G0  . (ν + 2) 1 + 2g + (2g)2 + . . . + (2g)ℓ−1 W U ℓ + W M G0 206

From this we see that to obtain a bound proportional to W U ℓ we have to assume g
0 and c2 independent of ℓ such that 1

n

c1 kPℓ xkL2 ≤ hℓ2 kxk2 ≤ c2 kPℓ xkL2

for all x ∈ Rnℓ

(9.33)

Proof. Let Mℓ be the mass matrix, i.e., (Mℓ )ij = hφi , φj iL2 . Note the basic equality kPℓ xk2L2 = hMℓ x, xi

for all x ∈ Rnℓ

There are constants d1 , d2 > 0 independent of ℓ such that ( for all i, j ≤ d1 hnℓ hφi , φj iL2 for all i = j ≥ d2 hnℓ 208

(9.34)

From this and from the sparsity of Mℓ we obtain d2 hnℓ ≤ (Mℓ )ii ≤ kMℓ k2 ≤ kMℓ k∞ ≤ d˜1 hnℓ

(9.35)

Using the upper bound in (9.35) in combination with (9.34) we get kPℓ xk2L2 ≤ kMℓ k2 kxk22 ≤ d˜1 hnℓ kxk22 , which proves the first inequality in (9.33). We now use corollary 3.5.10. This yields λmin (Mℓ ) ≥ c λmax (Mℓ ) with a strictly positive constant c independent of ℓ. Thus we have λmin (Mℓ ) ≥ ckMℓ k2 ≥ c˜hnℓ ,

c˜ > 0, independent of ℓ

This yields kPℓ xk2L2 = hMℓ x, xi ≥ λmin (Mℓ )kxk22 ≥ c˜hnℓ kxk22 , which proves the second inequality in (9.33). The third preliminary result concerns the scaling of the stiffness matrix: Lemma 9.4.4 Let Aℓ be the stiffness matrix as in (9.22). Assume that the bilinear form is such that the usual conditions (3.42) are satisfied. Then there exist constants c1 > 0 and c2 independent of ℓ such that c1 hℓn−2 ≤ kAℓ k2 ≤ c2 hℓn−2 Proof. First note that kAℓ k2 = maxn x,y∈R



hAℓ x, yi kxk2 kyk2

Using the result in lemma 9.4.3, the continuity of the bilinear form and the inverse inequality we get maxn

x,y∈R



hAℓ x, yi k(vℓ , wℓ ) ≤ chnℓ max vℓ ,wℓ ∈Vℓ kvℓ kL2 kwℓ kL2 kxk2 kyk2 |vℓ |1 |wℓ |1 ≤ c hℓn−2 ≤ chnℓ max vℓ ,wℓ ∈Vℓ kvℓ kL2 kwℓ kL2

and thus the upper bound is proved. The lower bound follows from maxn

x,y∈R



hAℓ x, yi ≥ max hAℓ ei , ei i = k(φi , φi ) ≥ c|φi |21 ≥ chℓn−2 1≤i≤nℓ kxk2 kyk2

The last inequality can be shown by using for T ⊂ supp(φi ) the affine transformation from the unit simplex to T .

9.4.2

Approximation property

In this section we derive a bound for the first factor in the splitting (9.32). In the analysis we will use the adjoint operator Pℓ∗ : Vℓ → Rnℓ which satisfies hPℓ x, vℓ iL2 = hx, Pℓ∗ vℓ i for all x ∈ Rnℓ , vℓ ∈ Vℓ . As a direct consequence of lemma 9.4.3 we obtain 1

n

c1 kPℓ∗ vℓ k2 ≤ hℓ2 kvℓ kL2 ≤ c2 kPℓ∗ vℓ k2 209

for all vℓ ∈ Vℓ

(9.36)

with constants c1 > 0 and c2 independent of ℓ. We now formulate a main result for the convergence analysis of multigrid methods: Theorem 9.4.5 (Approximation property.) Consider Aℓ , pℓ , rℓ as defined in (9.22), (9.23),(9.24). Assume that the variational problem (9.20) is such that the usual conditions (3.42) are satisfied. Moreover, the problem (9.20) and the corresponding dual problem are assumed to be H 2 -regular. Then there exists a constant CA independent of ℓ such that −1 −1 kA−1 ℓ − pℓ Aℓ−1 rℓ k2 ≤ CA kAℓ k2

for ℓ = 1, 2, . . .

(9.37)

Proof. Let bℓ ∈ Rnℓ be given. The constants in the proof are independent of bℓ and of ℓ. Consider the variational problems: u ∈ H01 (Ω) : uℓ ∈ Vℓ :

uℓ−1 ∈ Vℓ−1 :

k(u, v) = h(Pℓ∗ )−1 bℓ , viL2

k(uℓ , vℓ ) = h(Pℓ∗ )−1 bℓ , vℓ iL2

for all v ∈ H01 (Ω) for all vℓ ∈ Vℓ

k(uℓ−1 , vℓ−1 ) = h(Pℓ∗ )−1 bℓ , vℓ−1 iL2

for all vℓ−1 ∈ Vℓ−1

Then −1 −1 −1 A−1 ℓ bℓ = Pℓ uℓ and Aℓ−1 rℓ bℓ = Pℓ−1 uℓ−1

hold. Hence we obtain, using lemma 9.4.3, − 12 n

−1 −1 k(A−1 ℓ − pℓ Aℓ−1 rℓ )bℓ k2 = kPℓ (uℓ − uℓ−1 )k2 ≤ c hℓ

kuℓ − uℓ−1 kL2

(9.38)

Now we apply theorem 3.4.5 and use the H 2 -regularity of the problem. This yields kuℓ − uℓ−1 kL2 ≤ kuℓ − ukL2 + kuℓ−1 − ukL2

≤ ch2ℓ |u|2 + +ch2ℓ−1 |u|2 ≤ ch2ℓ k(Pℓ∗ )−1 bℓ kL2

(9.39)

Now we combine (9.38) with (9.39) and use (9.36). Then we get −1 2−n k(A−1 kbℓ k2 ℓ − pℓ Aℓ−1 rℓ )bℓ k2 ≤ c hℓ 2−n −1 . The proof is completed if we use lemma 9.4.4. and thus kA−1 ℓ − pℓ Aℓ−1 rℓ k2 ≤ c hℓ

Note that in the proof of the approximation property we use the underlying continuous problem.

9.4.3

Smoothing property

In this section we derive inequalities of the form kAℓ Sνℓ k2 ≤ g(ν)kAℓ k2 where g(ν) is a monotonically decreasing function with limν→∞ g(ν) = 0. In the first part of this section we derive results for the case that Aℓ is symmetric positive definite. In the second part we discuss the general case. Smoothing property for the symmetric positive definite case. We start with an elementary lemma: 210

Lemma 9.4.6 Let B ∈ Rm×m be a symmetric positive definite matrix with σ(B) ⊂ (0, 1]. Then we have 1 kB(I − B)ν k2 ≤ for ν = 1, 2, . . . 2(ν + 1) Proof. Note that ν ν 1 kB(I − B)ν k2 = max x(1 − x)ν = ν+1 ν+1 x∈(0,1]  ν ν is decreasing on [1, ∞). A simple computation shows that ν → ν+1

Below for a few basic iterative methods we derive the smoothing property for the symmetric case, i.e., b = 0 in the bilinear form k(·, ·). We first consider the Richardson method:

Theorem 9.4.7 Assume that in the bilinear form we have b = 0 and that the usual conditions (3.42) are satisfied. Let Aℓ be the stiffness matrix in (9.22). For c0 ∈ (0, 1] we have the smoothing property c0 1 kAℓ (I − Aℓ )ν k2 ≤ kAℓ k2 , ν = 1, 2, . . . ρ(Aℓ ) 2c0 (ν + 1) holds.

Proof. Note that Aℓ is symmetric positive definite. Apply lemma 9.4.6 with B := ωℓ Aℓ , ωℓ := c0 ρ(Aℓ )−1 . This yields kAℓ (I − ωℓ Aℓ )ν k2 ≤ ωℓ−1

1 1 1 ≤ ρ(Aℓ ) = kAℓ k2 2(ν + 1) 2c0 (ν + 1) 2c0 (ν + 1)

and thus the result is proved. A similar result can be shown for the damped Jacobi method: Theorem 9.4.8 Assume that in the bilinear form we have b = 0 and that the usual conditions (3.42) are satisfied. Let Aℓ be the stiffness matrix in (9.22) and Dℓ := diag(Aℓ ). There exists −1 an ω ∈ (0, ρ(D−1 ℓ Aℓ ) ], independent of ℓ, such that the smoothing property ν kAℓ (I − ωD−1 ℓ Aℓ ) k2 ≤

1 kAℓ k2 , 2ω(ν + 1)

ν = 1, 2, . . .

holds. 1

1

˜ := D− 2 Aℓ D− 2 . Note that Proof. Define the symmetric positive definite matrix B ℓ ℓ (Dℓ )ii = (Aℓ )ii = k(φi , φi ) ≥ c |φi |21 ≥ c hℓn−2 ,

(9.40)

with c > 0 independent of ℓ and i. Using this in combination with lemma 9.4.4 we get ˜ 2≤ kBk

kAℓ k2 ≤c, λmin (Dℓ )

c independent of ℓ.

−1 ˜ Hence for ω ∈ (0, 1c ] ⊂ (0, ρ(D−1 ℓ Aℓ ) ] we have σ(ω B) ⊂ (0, 1]. Application of lemma 9.4.6, ˜ yields with B = ω B, 1

1

ν −1 2 2 ˜ ˜ ν kAℓ (I − ωD−1 ℓ Aℓ ) k2 ≤ ω kDℓ k2 kω B(I − ω B) k2 kDℓ k2 kDℓ k2 1 ≤ ≤ kAℓ k2 2ω(ν + 1) 2ω(ν + 1)

and thus the result is proved.

211

Remark 9.4.9 The value of the parameter ω used in theorem 9.4.8 is such that ωρ(D−1 ℓ Aℓ ) = −1

−1

ωρ(Dℓ 2 Aℓ Dℓ 2 ) ≤ 1 holds. Note that −1

−1

ρ(Dℓ 2 Aℓ Dℓ 2 ) = max n x∈R



hAℓ ei , ei i hAℓ x, xi ≥ max =1 hDℓ x, xi 1≤i≤nℓ hDℓ ei ei i

and thus we have ω ≤ 1. This is in agreement with the fact that in multigrid methods one usually use a damped Jacobi method as a smoother.  We finally consider the symmetric Gauss-Seidel method. This method is the same as the SSOR method with parameter value ω = 1. Thus it follows from (6.18) that this method has an iteration matrix T (9.41) Mℓ = (Dℓ − Lℓ )D−1 Sℓ = I − M−1 ℓ (Dℓ − Lℓ ) , ℓ Aℓ ,

where we use the decomposition Aℓ = Dℓ − Lℓ − LTℓ with Dℓ a diagonal matrix and Lℓ a strictly lower triangular matrix. Theorem 9.4.10 Assume that in the bilinear form we have b = 0 and that the usual conditions (3.42) are satisfied. Let Aℓ be the stiffness matrix in (9.22) and Mℓ as in (9.41). The smoothing property c ν kAℓ (I − M−1 ν = 1, 2, . . . ℓ Aℓ ) k2 ≤ ν + 1 kAℓ k2 , holds with a constant c independent of ν and ℓ.

T Proof. Note that Mℓ = Aℓ + Lℓ D−1 ℓ Lℓ and thus Mℓ is symmetric positive definite. Define −1

−1

the symmetric positive definite matrix B := Mℓ 2 Aℓ Mℓ 2 . From 0 < max n x∈R



hBx, xi hAℓ x, xi hAℓ x, xi ≤1 = max = max nℓ nℓ x∈R hMℓ x, xi x∈R hAℓ x, xi + hD−1 LT x, LT xi hx, xi ℓ ℓ ℓ

it follows that σ(B) ⊂ (0, 1]. Application of lemma 9.4.6 yields 1

ν ν 2 2 kAℓ (I − M−1 ℓ Aℓ ) k2 ≤ kMℓ k2 kB(I − B) k2 ≤ kMℓ k2

1 2(ν + 1)

2−n From (9.40) we have kD−1 . Using the sparsity of Aℓ we obtain ℓ k2 ≤ c hℓ

kLℓ k2 kLTℓ k2 ≤ kLℓ k∞ kLℓ k1 ≤ c(max |(Aℓ )ij |)2 ≤ ckAℓ k22 i,j

In combination with lemma 9.4.4 we then get 2−n T kAℓ k22 ≤ ckAℓ k2 kMℓ k2 ≤ kD−1 ℓ k2 kLℓ k2 kLℓ k2 ≤ c hℓ

(9.42)

and this completes the proof. For the symmetric positive definite case smoothing properties have also been proved for other iterative methods. For example, in [98, 97] a smoothing property is proved for a variant of the ILU method and in [24] it is shown that the SPAI (sparse approximate inverse) preconditioner satisfies a smoothing property. Smoothing property for the nonsymmetric case. For the analysis of the smoothing property in the general (possibly nonsymmetric) case we can not use lemma 9.4.6. Instead the analysis will be based on the following lemma (cf. [74, 75]): 212

Lemma 9.4.11 Let k · k be any induced matrix norm and assume that for B ∈ Rm×m the inequality kBk ≤ 1 holds. The we have ν

ν+1

k(I − B)(I + B) k ≤ 2

r

2 , πν

for ν = 1, 2, . . .

Proof. Note that      ν ν  X X  k ν ν ν k ν+1 B − B =I−B + (I − B)(I + B) = (I − B) k k k−1 ν

k=1

k=0

This yields     ν X ν ν − k(I − B)(I + B) k ≤ 2 + k k−1 ν

k=1









ν ν Using ≥ k k−1 down operator):

⇔ k≤

1 2 (ν



+ 1) and

    ν X ν ν − k k−1

ν k







ν ν−k



we get (with [ · ] the round

k=1

[ 21 (ν+1)]

=

X 1



ν k





  ν − + k−1

ν X

[ 12 (ν+1)]+1



   ν ν  − k−1 k

[ 21 ν]  [ 21 ν]        X  X  ν ν ν ν = − − + k m k−1 m−1 1

m=1

[ 21 ν]

=2

    X  ν   ν  ν  ν − =2 − 1 0 k k−1 [ 2 ν] k=1

An elementary analysis yields (cf., for example, [75]) 

ν 1 [ 2 ν]



ν

≤2

r

2 πν

for ν ≥ 1

Thus we have proved the bound. Corollary 9.4.12 Let k · k be any induced matrix norm. Assume that for a linear iterative method with iteration matrix I − M−1 ℓ Aℓ we have kI − M−1 ℓ Aℓ k ≤ 1 Then for Sℓ := I − 12 M−1 ℓ Aℓ the following smoothing property holds: kAℓ Sνℓ k

r

≤2

2 kMℓ k , πν 213

ν = 1, 2, . . .

(9.43)

Proof. Define B = I − M−1 ℓ Aℓ and apply lemma 9.4.11: kAℓ Sνℓ k

r 1 ν 2 ν kMℓ k k(I − B)(I + B) k ≤ 2 ≤ kMℓ k 2 πν

Remark 9.4.13 Note that in the smoother in corollary 9.4.12 we use damping with a factor 21 . Generalizations of the results in lemma 9.4.11 and corollary 9.4.12 are given in [66, 49, 32]. In [66, 32] it is shown that the damping factor 21 can be replaced by an arbitrary damping factor ω ∈ (0, 1). Also note that in the smoothing property in corollary 9.4.12 we have a ν-dependence 1 of the form ν − 2 , whereas in the symmetric case this is of the form ν −1 . It [49] it is noted that 1 this loss of a factor ν 2 when going to the nonsymmetric case is due to the fact that complex eigenvalues may occur. Assume that M−1 ℓ Aℓ is a normal matrix. The assumption (9.43) implies that σ(M−1 A ) ⊂ K := { z ∈ C | |1 − z| ≤ 1 }. We have: ℓ ℓ 1 1 1 −1 ν 2 |z(1 − z)ν |2 = max |z(1 − z)ν |2 kM−1 ℓ Aℓ (I − 2 Mℓ Aℓ ) k2 ≤ max z∈K z∈∂K 2 2 1 1 = max |1 + eiφ |2 | − eiφ |2ν 2 2 φ∈[0,2π] 1 1 1 1 = max 4( + cos φ)( − cos φ)ν 2 2 2 2 φ∈[0,2π] ν ν 4 = max 4ξ(1 − ξ)ν = ν +1 ν +1 ξ∈[0,1]

Note that the latter function of ν also occurs in the proof of lemma 9.4.6. We conclude that for the class of normal matrices M−1 ℓ Aℓ an estimate of the form 1 −1 c ν kM−1 ℓ Aℓ (I − 2 Mℓ Aℓ ) k2 ≤ √ , ν

ν = 1, 2, . . .

is sharp with respect to the ν-dependence.



To verify the condition in (9.43) we will use the following elementary result: Lemma 9.4.14 If E ∈ Rm×m is such that there exists a c > 0 with kExk22 ≤ chEx, xi

for all x ∈ Rm

then we have kI − ωEk2 ≤ 1 for all ω ∈ [0, 2c ]. Proof. Follows from: k(I − ωE)xk22 = kxk22 − 2ωhEx, xi + ω 2 kExk22 2 ≤ kxk22 − ω( − ω)kExk22 c 2 ≤ kxk22 if ω( − ω) ≥ 0 c

We now use these results to derive a smoothing property for the Richardson method. 214

Theorem 9.4.15 Assume that the bilinear form satisfies the usual conditions (3.42). Let Aℓ be the stiffness matrix in (9.22). There exist constants ω > 0 and c independent of ℓ such that the following smoothing property holds: c kAℓ (I − ωhℓ2−n Aℓ )ν k2 ≤ √ kAℓ k2 , ν

ν = 1, 2, . . .

Proof. Using lemma 9.4.3, the inverse inequality and the ellipticity of the bilinear form we get, for arbitrary x ∈ Rnℓ : 1 k(Pℓ x, vℓ ) hAℓ x, yi n ≤ c hℓ2 max vℓ ∈Vℓ kvℓ kL2 y∈R ℓ kyk2 1 1 |Pℓ x|1 |vℓ |1 n n−1 ≤ c hℓ2 max ≤ c hℓ2 |Pℓ x|1 vℓ ∈Vℓ kvℓ kL2

kAℓ xk2 = max n

1

≤ c hℓ2

n−1

1

1

k(Pℓ x, Pℓ x) 2 = c hℓ2

n−1

1

hAℓ x, xi 2

From this and lemma 9.4.14 it follows that there exists a constant ω > 0 such that Aℓ k2 ≤ 1 kI − 2ωh2−n ℓ

for all ℓ

(9.44)

1 n−2 Define Mℓ := 2ω hℓ I. From lemma 9.4.4 it follows that there exists a constant cM independent of ℓ such that kMℓ k2 ≤ cM kAℓ k2 . Application of corollary 9.4.12 proves the result of the lemma.

We now consider the damped Jacobi method. Theorem 9.4.16 Assume that the bilinear form satisfies the usual conditions (3.42). Let Aℓ be the stiffness matrix in (9.22) and Dℓ = diag(Aℓ ). There exist constants ω > 0 and c independent of ℓ such that the following smoothing property holds: c ν kAℓ (I − ωD−1 ℓ Aℓ ) k2 ≤ √ν kAℓ k2 ,

ν = 1, 2, . . . 1

Proof. We use the matrix norm induced by the vector norm kykD := kDℓ2 yk2 for y ∈ Rnℓ . 1

−1

Note that for B ∈ Rnℓ ×nℓ we have kBkD = kDℓ2 BDℓ 2 k2 . The inequalities 2−n kD−1 , ℓ k2 ≤ c1 hℓ

κ(Dℓ ) ≤ c2

(9.45)

hold with constants c1 , c2 independent of ℓ. Using this in combination with lemma 9.4.3, the inverse inequality and the ellipticity of the bilinear form we get, for arbitrary x ∈ Rnℓ : −1 −1 kDℓ 2 Aℓ Dℓ 2 xk2

−1

−1

−1

−1

k(Pℓ Dℓ 2 x, Pℓ Dℓ 2 y) hAℓ Dℓ 2 x, Dℓ 2 yi = max = max y∈Rnℓ y∈Rnℓ kyk2 kyk2 −1

−1

|Pℓ Dℓ 2 x|1 kPℓ Dℓ 2 ykL2 max ≤ c h−1 ℓ y∈Rnℓ kyk2 1

≤ c hℓ2

n−1

−1

−1

−1

|Pℓ Dℓ 2 x|1 kDℓ 2 k2 ≤ c |Pℓ Dℓ 2 x|1 −1

−1

1

−1

−1

1

≤ c k(Pℓ Dℓ 2 x, Pℓ Dℓ 2 x) 2 = c hDℓ 2 Aℓ Dℓ 2 x, xi 2 215

From this and lemma 9.4.14 it follows that there exists a constant ω > 0 such that −1

−1

2 2 kI − 2ωD−1 ℓ Aℓ kD = kI − 2ωDℓ Aℓ Dℓ k2 ≤ 1 for all ℓ

Define Mℓ := yields

1 2ω Dℓ .

Application of corollary 9.4.12 with k · k = k · kD in combination with (9.45)

1 1 −1 ν ν 2 kAℓ (I − ωhℓ D−1 A ) k ≤ κ(D 2 ℓ ℓ ) kAℓ (I − 2 Mℓ Aℓ ) kD ℓ c c c √ kDℓ k2 ≤ √ kAℓ k2 ≤ √ kMℓ kD = ν 2ω ν ν

and thus the result is proved.

9.4.4

Multigrid contraction number

In this section we prove a bound for the contraction number in the Euclidean norm of the multigrid algorithm (9.25) with τ ≥ 2. We follow the analysis in [44, 48]. Apart from the approximation and smoothing property that have been proved in the sections 9.4.2 and 9.4.3 we also need the following stability bound for the iteration matrix of the smoother: ∃ CS : kSνℓ k2 ≤ CS for all ℓ and ν (9.46) Lemma 9.4.17 Consider the Richardson method as in theorem 9.4.7 or theorem 9.4.15. In both cases (9.46) holds with CS = 1. Proof. In the symmetric case (theorem 9.4.7) we have kSℓ k2 = kI −

c0 λ Aℓ k2 = max 1 − c0 ≤1 ρ(Aℓ ) ρ(Aℓ ) λ∈σ(Aℓ )

For the general case (theorem 9.4.15) we have, using (9.44):

1 1 Aℓ )k2 kSℓ k2 = kI − ωh2−n Aℓ k2 = k I + (I − 2ωh2−n ℓ ℓ 2 2 1 1 ≤ + kI − 2ωh2−n Aℓ k2 ≤ 1 ℓ 2 2

Lemma 9.4.18 Consider the damped Jacobi method as in theorem 9.4.8 or theorem 9.4.16. In both cases (9.46) holds. Proof. Both in the symmetric and nonsymmetric case we have 1

−1

2 kSℓ kD = kDℓ2 (I − ωD−1 ℓ Aℓ )Dℓ k2 ≤ 1

and thus

−1

1

−1

1

1

1

kSνℓ k2 ≤ kDℓ 2 (Dℓ2 Sℓ Dℓ 2 )ν Dℓ2 k2 ≤ κ(Dℓ2 ) kSℓ kνD ≤ κ(Dℓ2 ) Now note that Dℓ is uniformly (w.r.t. ℓ) well-conditioned. Treatment of symmetric Gauss-Seidel method: in preparation. 216

Using lemma 9.4.3 it follows that for pℓ = Pℓ−1 Pℓ−1 we have Cp,1 kxk2 ≤ kpℓ xk2 ≤ Cp,2 kxk2

for all x ∈ Rnℓ−1

(9.47)

with constants Cp,1 > 0 and Cp,2 independent of ℓ. We now formulate a main convergence result for the multigrid method. Theorem 9.4.19 Consider the multigrid method with iteration matrix given in (9.30) and parameter values ν2 = 0, ν1 = ν > 0, τ ≥ 2. Assume that there are constants CA , CS and a monotonically decreasing function g(ν) with g(ν) → 0 for ν → ∞ such that for all ℓ: −1 −1 kA−1 ℓ − pℓ Aℓ−1 rℓ k2 ≤ CA kAℓ k2

kAℓ Sνℓ k2 kSνℓ k2

≤ g(ν) kAℓ k2 ,

≤ CS ,

ν≥1

(9.48a) ν≥1

(9.48b) (9.48c)

For any ξ ∗ ∈ (0, 1) there exists a ν ∗ such that for all ν ≥ ν ∗ kCM G,ℓ k2 ≤ ξ ∗ ,

ℓ = 0, 1, . . .

holds. Proof. For the two-grid iteration matrix we have −1 ν kCT G,ℓ k2 ≤ kA−1 ℓ − pℓ Aℓ−1 rℓ k2 kAℓ Sℓ k2 ≤ CA g(ν)

Define ξℓ = kCM G.ℓ k2 . From (9.30) we obtain ξ0 = 0 and for ℓ ≥ 1: ν τ kA−1 ξℓ ≤ CA g(ν) + kpℓ k2 ξℓ−1 ℓ−1 rℓ Aℓ Sℓ k2

−1 τ ν ≤ CA g(ν) + Cp,2 Cp,1 ξℓ−1 kpℓ A−1 ℓ−1 rℓ Aℓ Sℓ k2

 −1 τ ν ν ≤ CA g(ν) + Cp,2 Cp,1 ξℓ−1 k(I − pℓ A−1 ℓ−1 rℓ Aℓ )Sℓ k2 + kSℓ k2  −1 τ τ ≤ CA g(ν) + Cp,2 Cp,1 ξℓ−1 CA g(ν) + CS ≤ CA g(ν) + C ∗ ξℓ−1

−1 with C ∗ := Cp,2 Cp,1 (CA g(1)+ CS ). Elementary analysis shows that for τ ≥ 2 and any ξ ∗ ∈ (0, 1) the sequence x0 = 0, xi = CA g(ν)+C ∗ xτi−1 , i ≥ 1, is bounded by ξ ∗ for g(ν) sufficiently small.

Remark 9.4.20 Consider Aℓ , pℓ , rℓ as defined in (9.22), (9.23),(9.24). Assume that the variational problem (9.20) is such that the usual conditions (3.42) are satisfied. Moreover, the problem (9.20) and the corresponding dual problem are assumed to be H 2 -regular. In the multigrid method we use the Richardson or the damped Jacobi method described in section 9.4.3. Then the assumptions 9.48 are fulfilled and thus for ν2 = 0 and ν1 sufficiently large the multigrid W-cylce has a contractrion number smaller than one indpendent of ℓ.  Remark 9.4.21 Let CM G,ℓ (ν2 , ν1 ) be the iteration matrix of the multigrid method with ν1 preand ν2 postsmoothing iterations. With ν := ν1 + ν2 we have   ρ CM G,ℓ (ν2 , ν1 ) = ρ CM G,ℓ (0, ν) ≤ kCM G,ℓ (0, ν)k2

Using theorem 9.4.19 we thus get, for τ ≥ 2, a bound for the spectral radius of the iteration matrix CM G,ℓ (ν2 , ν1 ).  217

Remark 9.4.22 Note on other convergence analyses. Xu, Yserentant (quasi-uniformity not needed in BPX). Comment on regularity. Book Bramble. 

9.4.5

Convergence analysis for symmetric positive definite problems

In this section we analyze the convergence of the multigrid method for the symmetric positive definite case, i.e., the stiffness matrix Aℓ is assumed to be symmetric positive definite. This property allows a refined analysis which proves that the contraction number of the multigrid method with τ ≥ 1 (the V-cycle is included !) and ν1 = ν2 ≥ 1 pre- and postsmoothing iterations is bounded by a constant smaller than one independent of ℓ. The basic idea of this analysis is due to [18] and is further simplified in [44, 48]. Throughout this section we make the following Assumption 9.4.23 In the bilinear form k(·, ·) in (9.20) we have b = 0 and the conditions (3.42) are satisfied. Due to this the stiffness matrix Aℓ is symmetric positive definite and we can define the energy scalar product and corresponding norm: 1

hx, yiA := hAℓ x, yi ,

kxkA := hx, xiA2

x, y ∈ Rnℓ

We only consider smoothers with an iteration matrix Sℓ = I − M−1 ℓ Aℓ in which Mℓ is symmetric positive definite. Important examples are the smoothers analyzed in section 9.4.3: Richardson method : Damped Jacobi : Symm. Gauss-Seidel :

Mℓ = c−1 0 ρ(Aℓ )I , Mℓ = ω

−1

Dℓ ,

Mℓ = (Dℓ −

c0 ∈ (0, 1]

ω as in thm. 9.4.8

Lℓ )D−1 ℓ (Dℓ



LTℓ )

(9.49a) (9.49b) (9.49c)

For symmetric matrices B, C ∈ Rm×m we use the notation B ≤ C iff hBx, xi ≤ hCx, xi for all x ∈ Rm . Lemma 9.4.24 For Mℓ as in (9.49) the following properties hold:

∃ CM :

Aℓ ≤ Mℓ

kMℓ k2 ≤ CM kAℓ k2

for all ℓ

(9.50a)

for all ℓ

(9.50b)

Proof. For the Richardson method the result is trivial. For the damped Jacobi method we −1

−1

−1 −1 2 2 have ω ∈ (0, ρ(D−1 ℓ Aℓ ) ] and thus ωρ(Dℓ Aℓ Dℓ ) ≤ 1. This yields Aℓ ≤ ω Dℓ = Mℓ . The result in (9.50b) follows from kDℓ k2 ≤ kAℓ k2 . For the symmetric Gauss-Seidel method the T results (9.50a) follows from Mℓ = Aℓ + Lℓ D−1 ℓ Lℓ and the result in (9.50b) is proved in (9.42).

We introduce the following modified approximation property: ∃ C˜A :

1  21 ˜ kMℓ2 Aℓ−1 − pℓ A−1 ℓ−1 rℓ Mℓ k2 ≤ CA

for ℓ = 1, 2, . . .

(9.51)

We note that the standard approximation property (9.37) implies the result (9.51) if we consider the smoothers in (9.49): 218

Lemma 9.4.25 Consider Mℓ as in (9.49) and assume that the approximation property (9.37) holds. Then (9.51) holds with C˜A = CM CA . Proof. Trivial. One easily verifies that for the smoothers in (9.49) the modified approximation property (9.51) implies the standard approximation property (9.37) if κ(Mℓ ) is uniformly (w.r.t. ℓ) bounded. The latter property holds for the Richardson and the damped Jacobi method. We will analyze the convergence of the two-grid and multigrid method using the energy scalar product. For matrices B, C ∈ Rnℓ ×nℓ that are symmetric w.r.t. h·, ·iA we use the notation B ≤A C iff hBx, xiA ≤ hCx, xiA for all x ∈ Rnℓ . Note that B ∈ Rnℓ ×nℓ is symmetric w.r.t. h·, ·iA iff (Aℓ B)T = Aℓ B holds. We also note the following elementary property for symmetric matrices B, C ∈ Rnℓ ×nℓ : B ≤ C ⇔ BAℓ ≤A CAℓ (9.52) We now turn to the two-grid method. For the coarse grid correction we introduce the notation 1 Qℓ := I − pℓ A−1 ℓ−1 rℓ Aℓ . For symmetry reasons we only consider ν1 = ν2 = 2 ν with ν > 0 even. The iteration matrix of the two-grid method is given by 1

ν

1

CT G,ℓ = CT G,ℓ (ν) = Sℓ2 Qℓ Sℓ2

ν

Due the symmetric positive definite setting we have the following fundamental property: Theorem 9.4.26 The matrix Qℓ is an orthogonal projection w.r.t. h·, ·iA . Proof. Follows from Q2ℓ = Qℓ

and (Aℓ Qℓ )T = Aℓ Qℓ

As an direct consequence we have 0 ≤A Q ℓ ≤A I

(9.53)

The next lemma gives another characterization of the modified approximation property: Lemma 9.4.27 The property (9.51) is equivalent to 0 ≤A Qℓ ≤A C˜A M−1 ℓ Aℓ

for ℓ = 1, 2, . . .

Proof. Using (9.52) we get 1  12 −1 ˜ kMℓ2 A−1 ℓ − pℓ Aℓ−1 rℓ Mℓ k2 ≤ CA for all ℓ 1  21 −1 ⇔ − C˜A I ≤ Mℓ2 A−1 Mℓ ≤ C˜A I for all ℓ − p A r ℓ ℓ ℓ ℓ−1 ⇔ − C˜A M−1 ≤ A−1 − pℓ A−1 rℓ ≤ C˜A M−1 for all ℓ





ℓ−1

−1 ˜ ⇔ − C˜A M−1 ℓ Aℓ ≤A Qℓ ≤A CA Mℓ Aℓ



for all ℓ

In combination with (9.53) this proves the result. We now present a convergence result for the two-grid method: 219

(9.54)

Theorem 9.4.28 Assume that (9.50a) and (9.51) hold. Then we have −1 ν y) kCT G,ℓ (ν)kA ≤ max y(1 − C˜A y∈[0,1]  −1 ν  (1 − C˜A ) if ν ≤ C˜A − 1 =  C˜A ν ν if ν ≥ C˜A − 1 ν+1 ν+1

(9.55)

Proof. Define Xℓ := M−1 ℓ Aℓ . This matrix is symmetric w.r.t. the energy scalar product and from (9.50a) it follows that 0 ≤ A Xℓ ≤ A I (9.56) holds. From lemma 9.4.27 we obtain 0 ≤A Qℓ ≤A C˜A Xℓ . Note that due to this, (9.56) and the fact that Qℓ is an A-orthogonal projection which is not identically zero we get C˜A ≥ 1

(9.57)

Using (9.53) we get 0 ≤A Qℓ ≤A αC˜A Xℓ + (1 − α)I

for all α ∈ [0, 1]

(9.58)

Hence, using Sℓ = I − Xℓ we have  1 1 0 ≤A CT G,ℓ (ν) ≤A (I − Xℓ ) 2 ν αC˜A Xℓ + (1 − α)I (I − Xℓ ) 2 ν

for all α ∈ [0, 1] , and thus

 kCT G,ℓ (ν)kA ≤ min max αC˜A x + (1 − α) (1 − x)ν α∈[0,1] x∈[0,1]

A minimax result (cf., for example, [83]) shows that in the previous expression the min and max operations can be interchanged. A simple computation yields  max min αC˜A x + (1 − α) (1 − x)ν x∈[0,1] α∈[0,1]  = max max C˜A x(1 − x)ν , max (1 − x)ν ˜ −1 ,1] x∈[C A

˜ −1 ] x∈[0,C A

=

−1 ν y) max C˜A x(1 − x)ν = max y(1 − C˜A

˜ −1 ] x∈[0,C A

y∈[0,1]

This proves the inequality in (9.55). An elementary computation shows that the equality in (9.55) holds. We now show that the approach used in the convergence analysis of the two-grid method in theorem 9.4.28 can also be used for the multigrid method. We start with an elementary result concerning a fixed point iteration that will be used in theorem 9.4.30. Lemma 9.4.29 For given constants c > 1, ν ≥ 1 define g : [0, 1) → R by ( ν (1 − 1c )ν if 0 ≤ ξ < 1 − c−1   g(ξ) = ν c ν 1 ξ ν+1 ν if 1 − c−1 ≤ξ 4) smoothing iterations or more than two recursive calls in (9.25) will make a multigrid method less efficient. Stopping criterion. In general, for the discrete solution x∗ℓ (with corresponding finite element function uℓ = Pℓ x∗ℓ ) we have a discretization error , so it does not make sense to solve the discrete problem to machine accuracy. For a large class of elliptic boundary value the following estimate for the discretization error holds: ku − uℓ k ≤ ch2ℓ . If in the multigrid iteration one has an arbitrary starting vector (e.g., 0) then the error reduction factor R should be taken proportional to h−2 ℓ . Using the multigrid iteration MGMℓ one then needs approximately /| ln α| ≈ c ln nℓ /| ln α| iterations to obtain an approximation with the deln R/| ln α| ≈ ln ch−2 ℓ sired accuracy. Per iteration we need O(nℓ ) flops. Hence we conclude: When we use a multigrid method for computing an approximation uℓ of u with accuracy comparable to the discretization error in uℓ , the arithmetic costs are of the order c nℓ ln nℓ

flops .

(9.64)

Multigrid and nested iteration. For an analysis of the multigrid method used in a nested iteration we refer to Hackbusch [44]. From this analysis it follows that a small fixed number of MGMℓ iterations (i.e. k in (9.62)) on each level ℓ ≤ ℓ in the nested iteration method is sufficient to obtain an approximation xℓ of x∗ℓ with accuracy comparable to the discretization error in x∗ℓ . 225

The arithmetic costs of this combined multigrid and nested iteration method are of the order c n | ln α| ℓ

flops.

(9.65)

When we compare the costs in (9.64) with the costs in (9.65) we see that using the nested iteration approach results in a more efficient algorithm. From the work estimate in (9.65) we conclude: Using multigrid in combination with nested iteration we can compute an approximation xℓ of xℓ∗ with accuracy comparable to the discretization error in xℓ∗ and with arithmetic costs ≤ Cnℓ flops (C independent of ℓ). Example 9.7.2 To illustrate the behaviour of multigrid in combination with nested iteration we show numerical results for an example from Hackbusch [44]. In the Poisson problem as in example 9.7.1 we take boundary conditions and a righthand side such that the solution is given by u(x, y) = 12 y 3 /(x + 1), so we consider: (

−∆u = −(3y/(x + 1) + y 3 /(x + 1)3 ) in Ω = [0, 1]2 , u(x, y) = 12 y 3 /(x + 1)

on ∂Ω .

For the discretization we apply linear finite elements on a family of nested uniform triangulations with mesh size hℓ = 2−ℓ−1 . The discrete solution on level ℓ is denoted by uℓ . The discretization error, measured in a weighted Euclidean norm is given in table 9.3. From these results one can hℓ kuℓ − uk2

1/8 2.64 10−5

1/16 6.89 10−6

1/32 1.74 10−6

1/64 4.36 10−7

Table 9.3: Discretization errors. observe a ch2ℓ behaviour of the discretization error. We apply the nested iteration approach of section 9.6 in combination with the multigrid method. We start with a coarsest triangulation Th0 with mesh size h0 = 12 (this contains only one interior grid point). For the prolongation ˜ ℓ used in the nested iteration we take the prolongation pℓ as in the multigrid method (linear p interpolation). When we apply only one multigrid iteration on each level ℓ ≤ ℓ (i.e k = 1 in ˜ ℓ x1ℓ−1 ) and x1ℓ of x∗ℓ (= Pℓ−1 uℓ ) (cf. figure 9.7). The (9.63)) we obtain approximations x0ℓ (= p errors in these approximations are given in table 9.4. In that table we also give the errors for the case with two multigrid iterations on each level (i.e., k = 2 in (9.63)). Comparing the results in table 9.4 with the discretization errors given in table 9.3 we see that we only need two multigrid iterations on each grid to compute an approximation of x∗ℓ (0 ≤ ℓ ≤ ℓ) with accuracy comparable to the discretization error in x∗ℓ . 

9.8

Algebraic multigrid methods

9.9

Nonlinear multigrid

226

hℓ 1/8

kxiℓ − x∗ℓ k2 , k = 1 x02 7.24 10−3 1 x2 5.98 10−4

1/16

x03 x13

2.09 10−3 1.30 10−4

1/32

x04 x14

5.17 10−4 2.54 10−5

1/64

x05 x15

1.23 10−4 4.76 10−6

kxiℓ − x∗ℓ k2 , k = 2 x02 6.47 10−3 1 x2 4.92 10−4 x22 2.86 10−5 0 x3 1.73; 10−3 1 x3 9.91 10−5 x23 4.91 10−6 0 x4 4.43 10−4 x14 1.82 10−5 2 x4 8.52 10−7 x05 1.12 10−4 1 x5 3.25 10−6 2 x5 1.47 10−7

Table 9.4: Errors in nested iteration.

227

228

Chapter 10

Iterative methods for saddle-point problems In this chapter we discuss a class of iterative methods for solving a linear system with a matrix of the form   A BT K= B 0 (10.1) A ∈ Rm×m symmetric positive definite B ∈ Rn×m rank(B) = n < m

The so-called Schur complement matrix is given by S := BA−1 BT . Note that S is symmetric positive definite. The symmetic matrix K is (strongly) indefinite: Lemma 10.0.1 The matrix K has m strictly positive and n strictly negative eigenvalues. Proof. From the factorization K=



A 0 B I



  A−1 0 A BT 0 I 0 −S

it follows that K is congruent to the matrix blockdiag(A−1 , −S) which has m strictly positive and n strictly negative eigenvalues. Now apply Sylvester’s inertia theorem. Remark 10.0.2 Consider a linear system of the form     f v = 1 K w f2

(10.2)

with K as in (10.1). Define the functional L : Rm × Rn → R by L(v, w) = 12 hAv, vi + hBv, wi − hf1 , vi − hf2 , wi. Using the same arguments as in the proof of theorem 2.4.2 one can easily show that (v∗ , w∗ ) is a solution of the problem (10.2) iff L(v∗ , w) ≤ L(v∗ , w∗ ) ≤ L(v, w∗ ) for all v ∈ Rm , w ∈ Rn Due to this property the linear system (10.2) is called a saddle-point problem.



In section 8.3 we discussed the preconditioned MINRES method for solving a linear system with a symmetric indefinite matrix. This method can be applied to the system in (10.2). Recall that the preconditioner must be symmetric positive definite. In section 10.1 we analyze a particular preconditioning technique for the matrix K. In section 10.2 we apply these methods to the discrete Stokes problem. 229

10.1

Block diagonal preconditioning

In this section we analyze the effect of symmetric preconditioning of the matrix K in (10.1) with a block diagonal matrix   MA 0 M := 0 MS MA ∈ Rm×m , MA = MTA > 0,

MS ∈ Rn×n , MS = MTS > 0

The preconditioned matrix is given by ˜ = M− 21 KM− 12 = K 1

1



˜ B ˜T A ˜ B 0



1

1

˜ := M− 2 AM− 2 , B ˜ := M− 2 BM− 2 A A A S A We first consider a very special preconditoner, which in a certain sense is optimal: Lemma 10.1.1 For MA = A and MS = S we have √ √ ˜ = { 1 (1 − 5) , 1 , 1 (1 + 5) } σ(K) 2 2 Proof. Note that ˜ = K



˜T I B ˜ B 0



,

˜ = S− 12 BA− 12 B

    ˜ has a nontrivial kernel. For v ∈ ker(B), ˜ v 6= 0, we have K ˜ v = v and thus The matrix B 0 0 ˜ ˜ 1 ∈ σ(K). For µ ∈ σ(K), µ 6= 1, we get      ˜T I B v v =µ , w 6= 0 ˜ w w B 0 √ ˜B ˜ T ) = σ(I) = {1} and thus µ = 1 (1 ± 5). This holds iff µ(µ − 1) ∈ σ(B 2 Note that from the result in (8.41) it follows that the preconditioned MINRES method with the preconditioner as in lemma 10.1.1 yields (in exact arithmetic) the exact solution in at most three iterations. In most applications (e.g., the Stokes problem) it is very costly to solve linear systems with the matrices A and S. Hence this preconditioner is not feasible. Instead we will use approximations MA of A and MS of S. The quality of these approximations is measured by the following spectral inequalities, with γA , γS > 0: γA MA ≤ A ≤ ΓA MA γS MS ≤ S ≤ ΓS MS

(10.3)

Using an analysis as in [77, 82] we obtain a result for the eigenvalues of the preconditioned matrix: ˜ with preconditioners that satisfy (10.3) we have: Theorem 10.1.2 For the matrix K q q   ˜ ⊂ 1 (γA − γ 2 + 4ΓS ΓA , 1 (γA − γ 2 + 4γS γA σ(K) A A 2 2 q   1 ∪ γA , (ΓA + Γ2A + 4ΓS ΓA 2 230

Proof. We use the following inequalities ˜ ≤ ΓA I γA I ≤ A γA A

γS I ≤ 1

(10.4a)

−1 ≤ M−1 A ≤ ΓA A −1 −1 MS 2 SMS 2 ≤ ΓS I

−1

(10.4b) (10.4c)

1

˜B ˜ T = M− 2 BM−1 BT M− 2 . Using (10.4b) and (10.4c) we get Note that B A S S ˜B ˜ T ≤ ΓA ΓS I γA γS I ≤ B

(10.5)

˜ Then µ 6= 0 and there exists (v, w) 6= (0, 0) such that Take µ ∈ σ(K). ˜ +B ˜ T w = µv Av ˜ = µw Bv

(10.6)

˜ + From v = 0 it follows that w = 0, hence, v 6= 0 must hold. From (10.6) we obtain (A 1 ˜T ˜ 1 ˜T ˜ T T ˜ ˜ ˜ ˜ ˜ µ B B)v = µv and thus µ ∈ σ(A + µ B B). Note that σ(B B) = σ(BB ) ∪ {0}. We first consider the case µ > 0. Using (10.5) and (10.4a) we get ˜ + γA I ≤ A

1 1 ˜T ˜ B B ≤ (ΓA + ΓS ΓA )I µ µ

and thus γA ≤ µ ≤ ΓA + µ1 ΓS ΓA holds. This yields q   1 µ ∈ γA , (ΓA + Γ2A + 4ΓS ΓA ) 2

We now consider the case µ < 0. From (10.5) and (10.4a) it follows that ˜ + 1B ˜TB ˜ ≥ (γA + 1 ΓS ΓA )I A µ µ q 2 + 4Γ Γ ). Finally we derive an upper and thus µ ≥ γA + µ1 ΓS ΓA . This yields µ ≥ 21 (γA − γA S A bound for µ < 0. We introduce ν := −µ > 0. From (10.6) it follows that for µ < 0, w 6= 0 must hold. Furthermore, we have ˜ A ˜ + νI)−1 B ˜ T w = νw B( ˜ A ˜ + νI)−1 B ˜ T ). From I + ν A ˜ −1 ≤ (1 + and thus ν ∈ σ(B(

ν γA )I

and (10.4c) we obtain

˜ A ˜ + νI)−1 B ˜T = B ˜A ˜ − 12 (I + ν A ˜A ˜ −1 B ˜T ˜ −1 )−1 A ˜ − 21 B ˜ T ≥ (1 + ν )−1 B B( γA ν −1 − 12 ν −1 −1 ) MS SMS 2 ≥ (1 + ) γS I = (1 + γA γA q 2 + 4γ γ ). We conclude that ν ≥ (1+ γνA )−1 γS holds. Hence, for µ = −ν we get µ ≤ 21 (γA − γA S A Remark 10.1.3√ Note that if γA √ = ΓA = γS = ΓS = 1, i.e., MA = A and MS = S, we obtain ˜ = { 1 (1 − 5)} ∪ [ 1 , 1 (1 + 5) ], which is sharp (cf. lemma 10.1.1). σ(K)  2 2 231

10.2

Application to the Stokes problem

In this section the results of the previous sections are applied to the discretized Stokes problem that is treated in section 5.2. We consider the Galerkin discretization of the Stokes problem with Hood-Taylor finite element spaces  (Vh , Mh ) = (Xkh,0 )d , Xhk−1 ∩ L20 (Ω) ,

k≥2

Here we use the notation d for the dimension of the velocity vector (Ω ⊂ Rd ). For the bases in these spaces we use standard nodal basis functions. In the velocity space Vh = (Xkh,0 )d the set of basis functions is denoted by (ψ i )1≤i≤m . Each ψ i is a vector function in Rd with d − 1 components identically zero. The basis in the pressure space Mh = Xhk−1 ∩ L20 (Ω) is denoted by (φi )1≤i≤n . The corresponding isomorphisms are given by Ph,1 : R

m

m X

→ Vh , Ph,1 v =

Ph,2 : Rn → Mh , Ph,2 w =

vi ψ i

i=1 n X

wi φi

i=1

The stiffness matrix for the Stokes problem is given by 

A BT B 0



∈ R(m+n)×(m+n) , with Z ˜ ) dx ∀ v, v ˜ ∈ Rm ˜ i = a(Ph,1 v, Ph,1 v ˜ ) = (∇Ph,1 v) · (∇Ph,1 v hAv, v ΩZ hBv, wi = b(Ph,1 v, Ph,2 w) = − Ph,2 w div Ph,1 v dx ∀ v ∈ Rm , w ∈ Rn K=



ˆ The matrix A = blockdiag(A1 , . . . , Ad ) is symmetric positive definite and A1 = . . . = Ad =: A is the stiffness matrix corresponding to the Galerkin discretization of the Poisson equation in the space Xkh,0 of simplicial finite elements. We now discuss preconditioners for the matrix A and the Schur complement S = BA−1 BT . The preconditioner for the matrix A is based on a symmetric multigrid method applied to the ˆ Let CM G be the iteration matrix of a symmetric multigrid method applied to diagonal block A. ˆ ˆ M G is defined by CM G =: I − M ˆ −1 A. ˆ the matrix A, as defined in section 9.4.5. The matrix M MG ˆ For given This matrix, although not explicitly available, can be used as a preconditioner for A. −1 ˆ y the vector MM G y is the result of one multigrid iteration with starting vector equal to zero ˆ = y. applied to the system Av ˆ M G is symmetric and under certain reasonFrom the analysis in section 9.4.5 it follows that M −1 ˆ ˆ able assumptions we have σ(I − MM G A) ⊂ [0, ρM G ] with the contraction number ρM G < 1 independent of the mesh size parameter h. For the preconditioner MA of A we take ˆ M G) MA := blockdiag(M

(d blocks)

For this preconditioner we then have the following spectral inequalities (1 − ρM G )MA ≤ A ≤ MA ,

with ρM G < 1 independent of h 232

(10.7)

For the preconditioner MS of the Schur complement S we use the mass matrix in the pressure space, which is defined by hMS w, zi = hPh,2 w, Ph,2 ziL2

for all w, z ∈ Rn

(10.8)

This mass matrix is symmetric positive definite and (after diagonal scaling, cf. section 3.5.1) in general well-conditioned. In practice the linear systems of the form MS w = q are solved approximately by applying a few iterations of an iterative solver (for example, CG). We recall the stability property of the Hood-Taylor finite element pair (Vh , Mh ) (cf. section 5.2.1): ∃ βˆ > 0 :

sup uh ∈Vh

b(uh , qh ) ˆ h kL2 ≥ βkq kuh k1

for all qh ∈ Mh

(10.9)

with βˆ independent of h. Using this stability property we get the following spectral inequalities for the preconditioner Ms : Theorem 10.2.1 Let MS be the pressure mass matrix defined in (10.8). Assume that the stability property (10.9) holds. Then βˆ2 MS ≤ S ≤ d MS

(10.10)

holds. Proof. For w ∈ Rn we have: max

v∈Rm

1

hBA− 2 v, wi = maxm v∈R kvk

hBv, wi 1

Av, vi 2

1

hv, A− 2 BT wi = maxm v∈R kvk 1

1

= kA− 2 BT wk = hSw, wi 2

Hence, for arbritrary w ∈ Rn : 1

hSw, wi 2 = max

uh ∈Vh

b(uh , Ph,2 w) |uh |1

(10.11)

Using this and the stability bound (10.9) we get 1 1 hSw, wi 2 ≥ βˆ kPh,2 wkL2 = βˆ hMS w, wi 2

and thus the first inequality in (10.10) holds. Note that |b(uh , Ph,2 w)| ≤ kdiv uh kL2 kPh,2 wkL2 √ √ 1 ≤ d |uh |1 kPh,2 wkL2 = d |uh |1 hMS w, wi 2 holds. Combining this with (10.11) proves the second inequality in (10.10). Corollary 10.2.2 Suppose that for solving a discrete Stokes problem with stiffness matrix K we use a preconditioned MINRES method with preconditioners MA (for A) and MS (for S) as defined above. Then the inequalities (10.3) hold with constants γA , ΓA , γS , ΓS that are independent of h. From theorem 10.1.2 it follows that the spectrum of the preconditioned matrix ˜ is contained in a set [a, b] ∪ [c, d] with a < b < 0 < c < d, all independent of h, and with K b − a = d − c. From theorem 8.42 we then conclude that the residual reduction factor can be bounded by a constant smaller than one independent of h.  233

234

Appendix A

Functional Analysis A.1

Different types of spaces

Below we give some definitions of elementary notions from functional analysis (cf. for example Kreyszig [55]). We restrict ourselves to real spaces, i.e. for the scalar field we take R. Real vector space. A real vector space is a set X of elements, called vectors, together with the algebraic operations vector addition and multiplication of vectors by real scalars. Vector addition should be commutative and associative. Multiplication by scalars should be associative and distributive. Example A.1.1 Examples of real vector spaces are Rn and C([a, b]) Normed space. A normed space is a vector space X with a norm defined on it. Here a norm on a vector space X is a real-valued function on X whose value at x ∈ X is denoted by kxk and which has the properties kxk ≥ 0

kxk = 0 ⇔ x = 0

(A.1)

kαxk = |α| kxk

kx + yk ≤ kxk + kyk for arbitrary x, y ∈ X, α ∈ R. Example A.1.2 . Examples of normed spaces are (Rn , k · k∞ ) with kxk∞ = max |xi | , 1≤i≤n

(Rn , k · k2 ) with kxk22 =

n X

x2i ,

i=1

(C([a, b]), k · k∞ ) with kf k∞ = max |f (t)| , t∈[a,b]

(C([a, b]), k · kL2 ) with kf kL2

Z b 1 f (t)2 dt) 2 . =( a

Banach space. A Banach space is a complete normed space. This means that in X every Cauchy sequence, in the metric defined by the norm, has a limit which is an element of X. 235

Example A.1.3 Examples of Banach spaces are: (Rn , k · k2 ) ,

(Rn , k · k∗ ) with any norm k · k∗ on Rn , (C([a, b]), k · k∞ ).

The completeness of the space in the second example follows from the fact that on a finite dimensional space all norms are equivalent. The completeness of the space in the third example is a consequence of the following theorem: The limit of a uniformly convergent sequence of continouos functions is a continuous function. Remark A.1.4 The space (C([a, b]), k·kL2 ) is not complete. Consider for example the sequence fn ∈ C([0, 1]), n ≥ 1, defined by if t ≤ 12 if 21 ≤ t ≤ 12 + n1 if 12 + n1 ≤ t ≤ 1 .

  0 n(t − 12 ) fn (t) =  1

Then for m, n ≥ N we have kfn −

fm k2L2

=

Z

1

2

|fn (t) − fm (t)| dt ≤

0

Z

1 +1 2 N 1 2

1 dt =

1 . N

So (fn )n≥1 is a Cauchy sequence. For the limit function f we would have f (t) =



0 1

if 0 ≤ t ≤ 12 if 12 + ε ≤ t ≤ 1,

for arbitrary ε > 0. So f cannot be continuous. Inner product space. An inner product space is a (real) vector space X with an inner product defined on X. For such an inner product we need a mapping of X × X into R, i.e. with every pair of vectors x and y from X there is associated a scalar denoted by hx, yi. This mapping is called an inner product on X if for arbitrary x, y, z ∈ X and α ∈ R the following holds: hx, xi ≥ 0

Inner product space. An inner product space is a (real) vector space X with an inner product defined on X. For such an inner product we need a mapping of X × X into R, i.e. with every pair of vectors x and y from X there is associated a scalar denoted by ⟨x, y⟩. This mapping is called an inner product on X if for arbitrary x, y, z ∈ X and α ∈ R the following holds:

    ⟨x, x⟩ ≥ 0,                          (A.2)
    ⟨x, x⟩ = 0 ⇔ x = 0,                  (A.3)
    ⟨x, y⟩ = ⟨y, x⟩,                     (A.4)
    ⟨αx, y⟩ = α⟨x, y⟩,                   (A.5)
    ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.        (A.6)

An inner product defines a norm on X: ‖x‖ = √⟨x, x⟩. An inner product and the corresponding norm satisfy the Cauchy–Schwarz inequality:

    |⟨x, y⟩| ≤ ‖x‖ ‖y‖ for all x, y ∈ X.   (A.7)

Example A.1.5 Examples of inner product spaces are:

    R^n with ⟨x, y⟩ = Σ_{i=1}^n x_i y_i,
    C([a, b]) with ⟨f, g⟩ = ∫_a^b f(t)g(t) dt.
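As a small added illustration (not in the original), the Cauchy–Schwarz inequality (A.7) can be checked numerically for the integral inner product, again with a Riemann-sum approximation of the integral; the particular functions f, g are arbitrary choices.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 100_001)
dt = t[1] - t[0]
f, g = np.sin(2 * np.pi * t), t ** 2

ip = np.sum(f * g) * dt                   # <f, g> on C([0, 1])
nf = np.sqrt(np.sum(f ** 2) * dt)         # ||f||_{L^2}
ng = np.sqrt(np.sum(g ** 2) * dt)         # ||g||_{L^2}
print(abs(ip) <= nf * ng)                 # True: |<f, g>| <= ||f|| ||g||
```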

Hilbert space. An inner product space which is complete is called a Hilbert space.

Example A.1.6 Examples of Hilbert spaces are:

    R^n with ⟨x, y⟩ = Σ_{i=1}^n x_i y_i,
    L²([a, b]) with ⟨f, g⟩ = ∫_a^b f(t)g(t) dt.

We note that the space C([a, b]) with the inner product (and corresponding norm) as in Example A.1.5 results in the normed space (C([a, b]), ‖·‖_{L²}). In Remark A.1.4 it is shown that this space is not complete. Thus the inner product space C([a, b]) as in Example A.1.5 is not a Hilbert space.

Completion. Let X be a Banach space and Y a subspace of X. The closure Ȳ of Y in X is defined as the set of accumulation points of Y in X, i.e. x ∈ Ȳ if and only if there is a sequence (x_n)_{n≥1} in Y such that x_n → x. If Ȳ = Y holds, then Y is called closed (in X). The subspace Y is called dense in X if Ȳ = X. Let (Z, ‖·‖) be a given normed space. Then there exists a Banach space (X, ‖·‖_*) (which is unique, up to isometric isomorphisms) such that Z ⊂ X, ‖x‖ = ‖x‖_* for all x ∈ Z, and Z is dense in X. The space X is called the completion of Z.

The space L²(Ω). Let Ω be a domain in R^n. We denote by L²(Ω) the space of all Lebesgue measurable functions f : Ω → R for which

    ‖f‖_0 := ‖f‖_{L²} := (∫_Ω |f(x)|² dx)^{1/2} < ∞.

In this space functions are identified that are equal almost everywhere (a.e.) on Ω. The elements of L²(Ω) are thus actually equivalence classes of functions. One writes f = 0 [f = g] if f(x) = 0 [f(x) = g(x)] a.e. in Ω. The space L²(Ω) with

    ⟨f, g⟩ = ∫_Ω f(x)g(x) dx

is a Hilbert space. The space C_0^∞(Ω) (all functions in C^∞(Ω) which have a compact support in Ω) is dense in L²(Ω): the closure of C_0^∞(Ω) with respect to ‖·‖_0 equals L²(Ω). In other words, the completion of the normed space (C_0^∞(Ω), ‖·‖_0) results in the space L²(Ω).

Dual space. Let (X, ‖·‖) be a normed space. The set of all bounded linear functionals f : X → R forms a real vector space. On this space we can define the norm

    ‖f‖ := sup{ |f(x)|/‖x‖ : x ∈ X, x ≠ 0 }.

This results in a normed space, called the dual space of X and denoted by X′.

Bounded linear operators. Let (X, ‖·‖_X) and (Y, ‖·‖_Y) be normed spaces and T : X → Y a linear operator. The (operator) norm of T is defined by

    ‖T‖_{Y←X} := sup{ ‖Tx‖_Y / ‖x‖_X : x ∈ X, x ≠ 0 }.

The operator T is bounded iff ‖T‖_{Y←X} < ∞. The operator T is called an isomorphism if T is bijective (i.e., injective and surjective) and both T and T^{−1} are bounded.

Compact linear operators. Let X and Y be Banach spaces and T : X → Y a bounded linear operator. The operator T is compact if for every bounded set A in X the image set B := T(A) is precompact in Y (this means that every sequence in B contains a convergent subsequence).

Continuous embedding; compact embedding. Let X and Y be normed spaces. A linear operator I : X → Y is called a continuous embedding if I is bounded (or, equivalently, continuous, see Theorem A.2.2) and injective. The embedding is called compact if I is continuous and compact. An equivalent characterization of a compact embedding is that for every bounded sequence (x_k)_{k≥1} in X the image sequence (Ix_k)_{k≥1} has a subsequence that is a Cauchy sequence in Y.
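For a matrix operator T(x) = Ax between finite dimensional spaces the operator norm is computable. The sketch below (an added illustration, using the induced 2-norm as in Appendix B) compares NumPy's built-in value with a brute-force sampling of the supremum.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

# ||T||_{Y<-X} for T(x) = A x with the 2-norm on both spaces equals
# the largest singular value of A.
print(np.linalg.norm(A, 2))

# Brute-force estimate of sup ||Ax|| / ||x|| over random directions;
# it approaches the exact value from below.
xs = rng.standard_normal((3, 200_000))
print(np.max(np.linalg.norm(A @ xs, axis=0) / np.linalg.norm(xs, axis=0)))
```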

A.2 Theorems from functional analysis

Below we give a few classical results from functional analysis (cf. for example Kreyszig [55]).

Theorem A.2.1 (Arzelà–Ascoli.) A subset K of C(Ω̄) is precompact (i.e., every sequence has a convergent subsequence) if and only if the following two conditions hold:

    (i) ∃M : ‖f‖_∞ < M for all f ∈ K   (K is bounded),
    (ii) ∀ε > 0 ∃δ > 0 : ∀f ∈ K, |f(x) − f(y)| < ε for all x, y ∈ Ω̄ with ‖x − y‖ < δ   (K is uniformly equicontinuous).

Theorem A.2.2 (Boundedness of linear operators.) Let X and Y be normed spaces and T : X → Y a linear mapping. Then T is bounded if and only if T is continuous.


Theorem A.2.3 (Extension of operators.) Let X be a normed space and Y a Banach space. Suppose X_0 is a dense subspace of X and T : X_0 → Y a bounded linear operator. Then there exists a unique extension T̃ : X → Y with the properties

    (i) T x = T̃ x for all x ∈ X_0,
    (ii) if (x_k)_{k≥1} ⊂ X_0, x ∈ X and lim_{k→∞} x_k = x, then T̃ x = lim_{k→∞} T x_k,
    (iii) ‖T̃‖_{Y←X} = ‖T‖_{Y←X_0}.

Theorem A.2.4 (Banach fixed point theorem.) Let (X, ‖·‖) be a Banach space. Let F : X → X be a (possibly nonlinear) contraction, i.e. there is a constant γ < 1 such that for all x, y ∈ X: ‖F(x) − F(y)‖ ≤ γ‖x − y‖. Then there exists a unique x ∈ X (called a fixed point) such that F(x) = x holds.

Theorem A.2.5 (Corollary of open mapping theorem.) Let X and Y be Banach spaces and T : X → Y a bounded linear operator which is bijective. Then T^{−1} is bounded, i.e., T : X → Y is an isomorphism.

Corollary A.2.6 Let X and Y be Banach spaces and T : X → Y a bounded linear operator which is injective. Let R(T) = { Tx : x ∈ X } be the range of T. Then T^{−1} : R(T) → X is bounded if and only if R(T) is closed (in Y).

Theorem A.2.7 (Orthogonal decomposition.) Let U ⊂ H be a closed subspace of a Hilbert space H. Let U^⊥ = { x ∈ H : ⟨x, y⟩ = 0 for all y ∈ U } be the orthogonal complement of U. Then H can be decomposed as H = U ⊕ U^⊥, i.e., every x ∈ H has a unique representation x = u + v, u ∈ U, v ∈ U^⊥. Moreover, the identity ‖x‖² = ‖u‖² + ‖v‖² holds.

Theorem A.2.8 (Riesz representation theorem.) Let H be a Hilbert space with inner product denoted by ⟨·, ·⟩ and corresponding norm ‖·‖_H. Let f be an element of the dual space H′, with norm ‖f‖_{H′}. Then there exists a unique w ∈ H such that

    f(x) = ⟨w, x⟩ for all x ∈ H.

Furthermore, ‖w‖_H = ‖f‖_{H′} holds. The linear operator J_H : f ↦ w is called the Riesz isomorphism.

Corollary A.2.9 Let H be a Hilbert space with inner product denoted by ⟨·, ·⟩. The bilinear form

    ⟨f, g⟩_{H′} := ⟨J_H f, J_H g⟩,  f, g ∈ H′,

defines a scalar product on H′. The space H′ with this scalar product is a Hilbert space.
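Theorem A.2.4 is constructive: iterating F converges to the fixed point with rate γ. A minimal Python illustration (added here; the contraction F(x) = ½ cos x on R, with γ = ½, is my choice of example):

```python
import numpy as np

F = lambda x: 0.5 * np.cos(x)   # |F'(x)| = 0.5*|sin x| <= 0.5 < 1

x = 3.0                         # any starting point works
for _ in range(50):
    x = F(x)
print(x, abs(F(x) - x))         # fixed point ~0.4502, residual ~1e-16
```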


Appendix B

Linear Algebra

B.1 Notions from linear algebra

Below we give some definitions and elementary notions from linear algebra. The collection of all real n × n matrices is denoted by R^{n×n}. For A = (a_{ij})_{1≤i,j≤n} ∈ R^{n×n} the transpose A^T is defined by A^T = (a_{ji})_{1≤i,j≤n}.

Spectrum, spectral radius. By σ(A) we denote the spectrum of A, i.e. the collection of all eigenvalues of the matrix A. Note that in general σ(A) contains complex numbers (even for real A). We use the notation ρ(A) for the spectral radius of A ∈ R^{n×n}:

    ρ(A) := max{ |λ| : λ ∈ σ(A) }.

Vector norms. On R^n we can define a norm ‖·‖, i.e. a real-valued function on R^n with properties as in (A.1). Important examples of such norms are:

    ‖x‖_1 := Σ_{i=1}^n |x_i|            (1-norm),                    (B.1)
    ‖x‖_2 := (Σ_{i=1}^n x_i²)^{1/2}     (2-norm or Euclidean norm),  (B.2)
    ‖x‖_∞ := max_{1≤i≤n} |x_i|          (maximum norm).              (B.3)

Cauchy–Schwarz inequality. On R^n we can define an inner product by ⟨x, y⟩ := x^T y. The norm corresponding to this inner product is the Euclidean norm (B.2). The Cauchy–Schwarz inequality (A.7) takes the form:

    |x^T y| ≤ ‖x‖_2 ‖y‖_2 for all x, y ∈ R^n.   (B.4)

Matrix norms. A matrix norm on R^{n×n} is a real-valued function whose value at A ∈ R^{n×n} is denoted by ‖A‖ and which has the properties

    ‖A‖ ≥ 0, and ‖A‖ = 0 iff A = 0,
    ‖αA‖ = |α| ‖A‖,
    ‖A + B‖ ≤ ‖A‖ + ‖B‖,                 (B.5)
    ‖AB‖ ≤ ‖A‖ ‖B‖,

for all A, B ∈ R^{n×n} and all α ∈ R. A special class of matrix norms are those induced by a vector norm. For a given vector norm ‖·‖ on R^n we define an induced matrix norm by:

    ‖A‖ := sup{ ‖Ax‖/‖x‖ : x ∈ R^n, x ≠ 0 } for A ∈ R^{n×n}.   (B.6)

Induced by the vector norms in (B.1), (B.2), (B.3) we obtain the matrix norms ‖A‖_1, ‖A‖_2, ‖A‖_∞. From the definition of the induced matrix norm it follows that

    ‖Ax‖ ≤ ‖A‖ ‖x‖ for all A ∈ R^{n×n}, x ∈ R^n,   (B.7)

and that the properties (B.5) hold. In the same way one can define a matrix norm on C^{n×n}. In this book we will always use real induced matrix norms as defined in (B.6).

Condition number. For a nonsingular matrix A the spectral condition number is defined by κ(A) := ‖A‖_2 ‖A^{−1}‖_2. We note that condition numbers can be defined with respect to other matrix norms, too.

Below we introduce some notions related to special properties which matrices A ∈ R^{n×n} may have. The matrix A is symmetric if A = A^T holds. The matrix A is normal if the equality A^T A = A A^T holds. Note that every symmetric matrix is normal. A symmetric matrix A is positive definite if x^T A x > 0 holds for all x ≠ 0. In that case, A is said to be symmetric positive definite. A matrix A is weakly diagonally dominant if the following condition is fulfilled:

    Σ_{j≠i} |a_{ij}| ≤ |a_{ii}| for all i, with strict inequality for at least one i.

The matrix A is called irreducible if there does not exist a permutation matrix Π such that Π^T A Π is a two-by-two block matrix in which the (2,1) block is a zero block. The matrix A is called irreducibly diagonally dominant if A is irreducible and weakly diagonally dominant. A matrix A is an M-matrix if it has the following properties:

    a_{ij} ≤ 0 for all i ≠ j,
    A is nonsingular and all the entries of A^{−1} are ≥ 0.
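The standard example in the context of this book is the one-dimensional finite difference Laplacian tridiag(−1, 2, −1), which is irreducibly diagonally dominant and an M-matrix. A short check of the M-matrix properties (added for illustration):

```python
import numpy as np

n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # tridiag(-1, 2, -1)

off = A - np.diag(np.diag(A))
print(np.all(off <= 0))                       # a_ij <= 0 for i != j
print(np.all(np.linalg.inv(A) >= -1e-12))     # all entries of A^{-1} >= 0
```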

B.2 Theorems from linear algebra

In this section we give some basic results from linear algebra. For the proofs we refer to an introductory linear algebra text, e.g. Strang [88] or Lancaster and Tismenetsky [58].

Theorem B.2.1 (Results on eigenvalues and eigenvectors.) For A, B ∈ R^{n×n} the following results hold:

    E1. σ(A) = σ(A^T).
    E2. σ(AB) = σ(BA), ρ(AB) = ρ(BA).
    E3. A ∈ R^{n×n} is normal if and only if A has an orthogonal basis of eigenvectors. In general these eigenvectors and corresponding eigenvalues are complex, and orthogonality is meant with respect to the complex Euclidean inner product.
    E4. If A is symmetric then A has an orthogonal basis of real eigenvectors. Furthermore, all eigenvalues of A are real.
    E5. A is symmetric positive definite if and only if A is symmetric and σ(A) ⊂ (0, ∞).

Theorem B.2.2 (Results on matrix norms.) For A ∈ R^{n×n} the following results hold:

    N1. ‖A‖_1 = max_{1≤j≤n} Σ_{i=1}^n |a_{ij}|.
    N2. ‖A‖_∞ = max_{1≤i≤n} Σ_{j=1}^n |a_{ij}|.
    N3. ‖A‖_2 = √(ρ(A^T A)).
    N4. If A is normal then ‖A‖_2 = ρ(A).
    N5. Let ‖·‖ be an induced matrix norm. Then the inequality ρ(A) ≤ ‖A‖ holds.

Using (N4) and (E5) we obtain the following results for the spectral condition number:

    κ(A) = ρ(A) ρ(A^{−1}), if A is normal,                          (B.8)
    κ(A) = λ_max/λ_min, if A is symmetric positive definite,        (B.9)
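The formulas N1–N3 and (B.9) are easy to verify numerically. In the sketch below (an added illustration), NumPy's induced norms are compared with the explicit expressions, and κ(A) is computed for the SPD tridiagonal matrix tridiag(−1, 2, −1), whose condition number grows like n²:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())        # N1: max column sum
print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())   # N2: max row sum
print(np.linalg.norm(A, 2),
      np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A))))             # N3: sqrt(rho(A^T A))

# (B.9): kappa(A) = lambda_max / lambda_min for an SPD matrix
n = 100
T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lam = np.linalg.eigvalsh(T)          # eigenvalues 2 - 2 cos(k pi/(n+1))
print(lam[-1] / lam[0], np.linalg.cond(T, 2))
```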

with λ_max and λ_min the largest and smallest eigenvalue of A, respectively.

Theorem B.2.3 (Jordan normal form.) For every A ∈ R^{n×n} there exists a nonsingular matrix T such that A = T Λ T^{−1} with Λ a matrix of the form Λ = blockdiag(Λ_i)_{1≤i≤s},

    Λ_i := \begin{pmatrix} λ_i & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & λ_i \end{pmatrix} ∈ R^{k_i×k_i},  1 ≤ i ≤ s,

and {λ_1, . . . , λ_s} = σ(A).
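Computing the Jordan form in floating point arithmetic is unstable, but in exact arithmetic SymPy can do it. A small added illustration (the matrix below is my choice: it has a double eigenvalue 2 with only one eigenvector, hence a single 2×2 Jordan block):

```python
import sympy as sp

A = sp.Matrix([[1, 1],
               [-1, 3]])                  # char. polynomial (lambda - 2)^2, defective
T, J = A.jordan_form()                    # A = T * J * T**(-1)
print(J)                                  # Matrix([[2, 1], [0, 2]])
print(sp.simplify(T * J * T.inv() - A))   # zero matrix
```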


Bibliography

[1] R. A. Adams. Sobolev Spaces. Academic Press, 1975.
[2] S. Agmon, A. Douglis, and L. Nirenberg. Estimates near the boundary of solutions of elliptic partial differential equations satisfying general boundary conditions II. Comm. on Pure and Appl. Math., 17:35–92, 1964.
[3] H. W. Alt. Lineare Funktionalanalysis, 2nd edition. Springer, Heidelberg, 1992.
[4] D. N. Arnold, F. Brezzi, and M. Fortin. A stable finite element for the Stokes equations. Calcolo, 21:337–344, 1984.
[5] W. E. Arnoldi. The principle of minimized iterations in the solution of the matrix eigenvalue problem. Quart. Appl. Math., 9:17–29, 1951.
[6] O. Axelsson. Iterative Solution Methods. Cambridge University Press, New York, 1994.
[7] O. Axelsson and V. A. Barker. Finite Element Solution of Boundary Value Problems. Theory and Computation. Academic Press, Orlando, 1984.
[8] R. E. Bank and T. F. Chan. A composite step biconjugate gradient method. Numer. Math., 66:295–319, 1994.
[9] R. E. Bank and L. R. Scott. On the conditioning of finite element equations with highly refined meshes. SIAM J. Numer. Anal., 26:1383–1394, 1989.
[10] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, Philadelphia, 1994.
[11] M. Bercovier and O. Pironneau. Error estimates for finite element solution of the Stokes problem in primitive variables. Numer. Math., 33:211–224, 1979.
[12] A. Berman and R. J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. Academic Press, New York, 1979.
[13] C. Bernardi. Optimal finite element interpolation on curved domains. SIAM J. Numer. Anal., 26:1212–1240, 1989.
[14] C. Bernardi and V. Girault. A local regularization operator for triangular and quadrilateral finite elements. SIAM J. Numer. Anal., 35:1893–1915, 1998.
[15] D. Boffi. Stability of higher-order triangular Hood–Taylor methods for the stationary Stokes equations. Math. Models Methods Appl. Sci., 4:223–235, 1994.

[16] D. Boffi. Three-dimensional finite element methods for the Stokes problem. SIAM J. Numer. Anal., 34:664–670, 1997.
[17] D. Braess, M. Dryja, and W. Hackbusch. A multigrid method for nonconforming FE-discretisations with application to non-matching grids. Computing, 63:1–25, 1999.
[18] D. Braess and W. Hackbusch. A new convergence proof for the multigrid method including the V-cycle. SIAM J. Numer. Anal., 20:967–975, 1983.
[19] J. H. Bramble. Multigrid Methods. Longman, Harlow, 1993.
[20] J. H. Bramble and S. R. Hilbert. Estimation of linear functionals on Sobolev spaces with applications to Fourier transforms and spline interpolation. SIAM J. Numer. Anal., 7:113–124, 1970.
[21] S. C. Brenner and L. R. Scott. The Mathematical Theory of Finite Element Methods. Springer, New York, 1994.
[22] F. Brezzi and R. S. Falk. Stability of higher-order Hood–Taylor methods. SIAM J. Numer. Anal., 28:581–590, 1991.
[23] W. L. Briggs, V. E. Henson, and S. F. McCormick. A Multigrid Tutorial (2nd ed.). SIAM, Philadelphia, 2000.
[24] O. Bröker, M. Grote, C. Mayer, and A. Reusken. Robust parallel smoothing for multigrid via sparse approximate inverses. SIAM J. Sci. Comput., 32:1395–1416, 2001.
[25] A. M. Bruaset. A Survey of Preconditioned Iterative Methods. Longman, Harlow, 1995.
[26] L. Cattabriga. Su un problema al contorno relativo al sistema di equazioni di Stokes. Rend. Sem. Mat. Univ. Padova, 31:308–340, 1961.
[27] P. G. Ciarlet. The Finite Element Method for Elliptic Problems. North Holland, 1978.
[28] P. G. Ciarlet. Basic error estimates for elliptic problems. In P. G. Ciarlet and J. L. Lions, editors, Handbook of Numerical Analysis, Volume II: Finite Element Methods (Part 1). North Holland, Amsterdam, 1991.
[29] P. Clément. Approximation by finite element functions using local regularization. RAIRO Anal. Numer. (M²AN), 9(R-2):77–84, 1975.
[30] M. Dauge. Stationary Stokes and Navier–Stokes systems on two- or three-dimensional domains with corners. Part I: linearized equations. SIAM J. Math. Anal., 20:74–97, 1989.
[31] G. Duvaut and J. L. Lions. Les Inéquations en Mécanique et en Physique. Dunod, Paris, 1972.
[32] E. Ecker and W. Zulehner. On the smoothing property of multi-grid methods in the non-symmetric case. Numerical Linear Algebra with Applications, 3:161–172, 1996.
[33] V. Faber and T. Manteuffel. Orthogonal error methods. SIAM J. Numer. Anal., 20:352–362, 1984.
[34] M. Fiedler. Special Matrices and their Applications in Numerical Mathematics. Nijhoff, Dordrecht, 1986.

[35] R. Fletcher. Conjugate gradient methods for indefinite systems. In G. A. Watson, editor, Numerical Analysis Dundee 1975, Lecture Notes in Mathematics, Vol. 506, pages 73–89, Berlin, 1976. Springer.
[36] R. W. Freund, G. H. Golub, and N. M. Nachtigal. Iterative solution of linear systems. Acta Numerica, pages 57–100, 1992.
[37] G. Frobenius. Über Matrizen aus nicht negativen Elementen. Preuss. Akad. Wiss., pages 456–477, 1912.
[38] E. Gartland. Strong uniform stability and exact discretizations of a model singular perturbation problem and its finite difference approximations. Appl. Math. Comput., 31:473–485, 1989.
[39] D. Gilbarg and N. S. Trudinger. Elliptic Partial Differential Equations of Second Order. Springer, Berlin, Heidelberg, 1977.
[40] V. Girault and P.-A. Raviart. Finite Element Methods for Navier–Stokes Equations, volume 5 of Springer Series in Computational Mathematics. Springer, Berlin, Heidelberg, 1986.
[41] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, 2nd edition, 1989.
[42] A. Greenbaum. Iterative Methods for Solving Linear Systems. SIAM, Philadelphia, 1997.
[43] M. E. Gurtin. An Introduction to Continuum Mechanics, volume 158 of Mathematics in Science and Engineering. Academic Press, 1981.
[44] W. Hackbusch. Multigrid Methods and Applications, volume 4 of Springer Series in Computational Mathematics. Springer, Berlin, Heidelberg, 1985.
[45] W. Hackbusch. Theorie und Numerik elliptischer Differentialgleichungen. Teubner, Stuttgart, 1986.
[46] W. Hackbusch. Iterative Lösung Großer Schwachbesetzter Gleichungssysteme. Teubner, 1991.
[47] W. Hackbusch. Elliptic Differential Equations: Theory and Numerical Treatment, volume 18 of Springer Series in Computational Mathematics. Springer, Berlin, 1992.
[48] W. Hackbusch. Iterative Solution of Large Sparse Systems of Equations, volume 95 of Applied Mathematical Sciences. Springer, New York, 1994.
[49] W. Hackbusch. A note on Reusken's lemma. Computing, 55:181–189, 1995.
[50] L. A. Hageman and D. M. Young. Applied Iterative Methods. Academic Press, New York, 1981.
[51] M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Stand., 49:409–436, 1952.
[52] P. Hood and C. Taylor. A numerical solution of the Navier–Stokes equations using the finite element technique. Comp. and Fluids, 1:73–100, 1973.

[53] J. Kadlec. On the regularity of the solution of the Poisson problem on a domain with boundary locally similar to the boundary of a convex open set. Czechoslovak Math. J., 14(89):386–393, 1964. (Russian).
[54] R. B. Kellogg and J. E. Osborn. A regularity result for the Stokes problem in a convex polygon. J. Funct. Anal., 21:397–431, 1976.
[55] E. Kreyszig. Introductory Functional Analysis with Applications. Wiley, New York, 1978.
[56] O. A. Ladyzhenskaya. Funktionalanalytische Untersuchungen der Navier-Stokesschen Gleichungen. Akademie-Verlag, Berlin, 1965.
[57] O. A. Ladyzhenskaya and N. A. Ural'tseva. Linear and Quasilinear Elliptic Equations, volume 46 of Mathematics in Science and Engineering. Academic Press, New York, London, 1968.
[58] P. Lancaster and M. Tismenetsky. The Theory of Matrices. Academic Press, Orlando, 2nd edition, 1985.
[59] C. Lanczos. Solution of systems of linear equations by minimized iterations. J. Res. Natl. Bur. Stand., 49:33–53, 1952.
[60] J. L. Lions and E. Magenes. Non-homogeneous Boundary Value Problems and Applications, Vol. I. Springer, Berlin, 1972.
[61] J. T. Marti. Introduction to Sobolev Spaces and Finite Element Solution of Elliptic Boundary Value Problems. Academic Press, London, 1986.
[62] J. A. Meijerink and H. A. van der Vorst. An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix. Math. Comp., 31:148–162, 1977.
[63] N. Meyers and J. Serrin. H = W. Proc. Nat. Acad. Sci. USA, 51:1055–1056, 1964.
[64] C. Miranda. Partial Differential Equations of Elliptic Type. Springer, Berlin, 1970.
[65] J. Nečas. Les Méthodes Directes en Théorie des Équations Elliptiques. Masson, Paris, 1967.
[66] O. Nevanlinna. Convergence of Iterations for Linear Equations. Birkhäuser, Basel, 1993.
[67] R. A. Nicolaides. On a class of finite elements generated by Lagrange interpolation. SIAM J. Numer. Anal., 9:435–445, 1972.
[68] W. Niethammer. The SOR method on parallel computers. Numer. Math., 56:247–254, 1989.
[69] U. Trottenberg, C. W. Oosterlee, and A. Schüller. Multigrid. Academic Press, London, 2001.
[70] C. C. Paige and M. Saunders. Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal., 12:617–629, 1975.
[71] O. Perron. Zur Theorie der Matrizen. Math. Ann., 64:248–263, 1907.
[72] S. D. Poisson. Remarques sur une équation qui se présente dans la théorie des attractions des sphéroïdes. Nouveau Bull. Soc. Philomathique de Paris, 3:388–392, 1813.

[73] A. Quarteroni and A. Valli. Numerical Approximation of Partial Differential Equations, volume 23 of Springer Series in Computational Mathematics. Springer, Berlin, Heidelberg, 1994.
[74] A. Reusken. On maximum norm convergence of multigrid methods for two-point boundary value problems. SIAM J. Numer. Anal., 29:1569–1578, 1992.
[75] A. Reusken. The smoothing property for regular splittings. In W. Hackbusch and G. Wittum, editors, Incomplete Decompositions: (ILU)-Algorithms, Theory and Applications, volume 41 of Notes on Numerical Fluid Mechanics, pages 130–138, Braunschweig, 1993. Vieweg.
[76] H.-G. Roos, M. Stynes, and L. Tobiska. Numerical Methods for Singularly Perturbed Differential Equations, volume 24 of Springer Series in Computational Mathematics. Springer, Berlin, Heidelberg, 1996.
[77] T. Rusten and R. Winther. A preconditioned iterative method for saddlepoint problems. SIAM J. Matrix Anal. Appl., 13:887–904, 1992.
[78] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS Publishing Company, London, 1996.
[79] Y. Saad and M. H. Schultz. Conjugate gradient-like algorithms for solving nonsymmetric linear systems. Math. Comp., 44:417–424, 1985.
[80] Y. Saad and M. H. Schultz. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 7:856–869, 1986.
[81] L. R. Scott and S. Zhang. Finite element interpolation of nonsmooth functions satisfying boundary conditions. Math. Comp., 54:483–493, 1990.
[82] D. Silvester and A. Wathen. Fast iterative solution of stabilised Stokes systems. Part II: using general block preconditioners. SIAM J. Numer. Anal., 31:1352–1367, 1994.
[83] M. Sion. On general minimax theorems. Pacific J. of Math., 8:171–176, 1958.
[84] G. L. G. Sleijpen and D. R. Fokkema. BiCGstab(ℓ) for linear equations involving matrices with complex spectrum. ETNA, 1:11–32, 1993.
[85] G. L. G. Sleijpen and H. van der Vorst. Optimal iteration methods for large linear systems of equations. In C. B. Vreugdenhil and B. Koren, editors, Numerical Methods for Advection-Diffusion Problems, volume 45 of Notes on Numerical Fluid Mechanics, pages 291–320. Vieweg, Braunschweig, 1993.
[86] P. Sonneveld. CGS: a fast Lanczos-type solver for nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 10:36–52, 1989.
[87] R. Stenberg. Error analysis of some finite element methods for the Stokes problem. Math. Comp., 54:495–508, 1990.
[88] G. Strang. Linear Algebra and its Applications. Harcourt Brace Jovanovich, San Diego, 3rd edition, 1988.

[89] A. van der Sluis. Condition numbers and equilibration of matrices. Numer. Math., 14:14–23, 1969.
[90] A. van der Sluis and H. van der Vorst. The rate of convergence of conjugate gradients. Numer. Math., 48:543–560, 1986.
[91] H. A. van der Vorst. Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of non-symmetric linear systems. SIAM J. Sci. Statist. Comput., 13:631–644, 1992.
[92] R. S. Varga. Matrix Iterative Analysis. Prentice Hall, Englewood Cliffs, New Jersey, 1962.
[93] R. Verfürth. Error estimates for a mixed finite element approximation of the Stokes problem. RAIRO Anal. Numer., 18:175–182, 1984.
[94] R. Verfürth. Robust a posteriori error estimates for stationary convection-diffusion equations. SIAM J. Numer. Anal., 43:1766–1782, 2005.
[95] W. Walter. Gewöhnliche Differentialgleichungen. Heidelberger Taschenbücher. Springer, Berlin, 1972.
[96] P. Wesseling. An Introduction to Multigrid Methods. Wiley, Chichester, 1992.
[97] G. Wittum. Linear iterations as smoothers in multigrid methods: theory with applications to incomplete decompositions. Impact Comput. Sci. Eng., 1:180–215, 1989.
[98] G. Wittum. On the robustness of ILU-smoothing. SIAM J. Sci. Stat. Comp., 10:699–717, 1989.
[99] J. Wloka. Partial Differential Equations. Cambridge University Press, Cambridge, 1987.
[100] D. M. Young. Iterative Solution of Large Linear Systems. Academic Press, New York, 1971.
