approximation of roots of polynomial and

7 downloads 0 Views 1MB Size Report
remainder with reverse sign when f (x) is divided by f1 (x). Similarly, f3 ...... 3. 2. 3 1.6 2 3.286. 0.93. 9. 3. 9. 3. 1. 1. 1.6. 0.222 3.286 1.6. 2.213. 5. 5. 1. 26. 1. 26. 2.
Chapter 2

APPROXIMATION OF ROOTS OF POLYNOMIAL AND TRANSCENDENTAL EQUATIONS

2.1

INTRODUCTION

A very common problem in scientific and engineering computation is to find the roots of equations of the form f  x  0 (2.1) Equation (2.1) is known as a polynomial equation if the function f (x) is pure algebraic. A polynomial equation of degree n with real coefficients is defined as follows:

a0 x n  a1 x n 1  a 2 x n  2    a n 1 x  a n  0

(2.2)

where a0, a1, a2,…, an are real numbers, and a0 ≠ 0. If f (x) also contains other functions such as exponential, logarithmic and trigonometric function, then equation (2.1) is known a transcendental equation. For instance (a) x sin x – ex = 0 (b) x log10 x = 1.254 (c) x2 – ex + 4 sin x = 0 are transcendental equations. A value α of parameter x is called a root of equation (2.1) if

f ( )  0

(2.3)

A polynomial equation of degree n has exactly n roots (real or complex); however, a transcendental equation may have finite or infinite many roots. For instance, the equation cos x – x = 0 has exactly one real root near x = 0.739, while equation sin x = 0 has infinite many roots, namely x = 0, ± π, ± 2π, ± 3π and so on. Suppose that a polynomial equation f (x) = 0 is expressible in the form

 x  

m

 ( x)  0

(2.4)

where  (x) is bounded function, and  (α) ≠ 0. If m = 1, then α is called a simple root of f (x) = 0. However, if m ≠ 1 then the multiplicity of the root α is said to be m. A necessary and sufficient condition for an equation f (x) = 0 to have a root α of multiplicity m is that

Fundamentals of Numerical Methods

2.3

Rule 4: Descartes’ Rule of Signs tells the maximum number of positive and negative real roots that a polynomial equation can have. When a polynomial is arranged in standard form (i.e. highest to lowest power of x), a variation in sign occurs when the sign of a coefficient is different from the sign of the preceding coefficient. Descartes’ Rule states that: (a) The number of positive real roots of f (x) = 0 is either equal to the number of variations in sign of f (x), or less than that by an even number. (b) The number of negative real roots of f (x) = 0 is either equal to the number of variations in sign of f (− x), or less than that by an even number. For example, the coefficients of the polynomial f (x) = x5 − 4x3 + 8x2 − 3x + 1 have +, –, +, –, + signs when arranged in descending order of powers of x. Since there are four variations in sign; the equation f (x) = 0 has either four or two or zero positive roots. Now, obtain f (– x), by replacing ‘x’ by ‘− x’ in the given polynomial. That is f (– x) = (− x)5 − 4(− x)3 + 8(− x)2 − 3(− x) + 1 or

f (– x) = − x5 + 4x3 + 8x2 + 3x + 1

Since the polynomial f (– x) has only one variation in sign (the coefficients have signs –, +, +, +, +), the given polynomial equation f (x) = 0 has one negative real root. Thus, the equation f (x) = 0 may or may not have positive real roots; however, existence of one real negative root is confirmed by Descartes’ rule. It must be clear that the Descartes’ rule gives only the upper bound of the number of real roots; it does not tell about the exact number of real roots that a polynomial equation has. Most often, we are interested in determining all real and complex roots (simple or multiple) of a polynomial equation. Moreover, in case of real roots, we also want to know the interval in which each root lies. All this information can be obtained using Strum sequence. We define the Strum sequence of functions as follows: Let f (x) be a given polynomial of degree n, and f1 (x) be its first derivative. Let f2 (x) be the remainder with reverse sign when f (x) is divided by f1 (x). Similarly, f3 (x) is the remainder with reverse sign when f1 (x) is divided by f2 (x). This process is continued until a constant fn (x) is obtained. The sequence of functions

f ( x), f1 ( x), f 2 ( x ),  , f n ( x ) is called the Strum chain or Strum sequence or Strum functions, These terms can be multiplied or divided by a positive constant for further simplification. We state an important property of the strum sequence which is also known as Strum’s Theorem. Theorem 2.1. If f (x) = 0 is a polynomial equation whose Strum sequence is

f ( x), f1 ( x), f 2 ( x ),  , f n ( x ) then the number of real roots of f (x) = 0 in the interval (a, b) is same as the difference between the number of changes of sign in the sequence at x = a and x = b, provided f (a) and f (b) are non zero. In order to determine the exact number of real roots of a polynomial equation f (x) = 0, we assign a large negative value to x and find the signs of Strum functions f (x), f1 (x), f2(x), …, f (xn). Suppose that the number of variations in the signs of the sequence is V(– ∞). Next, we assign a large positive value to x and find the corresponding number of variations in the signs of the Strum sequence. Let this number be V(∞). The Strum’s theorem tells us that the exact number of real roots of equation f (x) = 0 is V(– ∞) – V(∞). In order to determine the number of real roots in an interval (a, b), we calculate V(a) and V(b) and then find the difference V(a) – V(b).

2.15

Fundamentals of Numerical Methods

x3 

x1 f ( x2 )  x2 f ( x1 ) 1 ( 6.4375)  1.5  (10)   2.403509 f ( x2 )  f ( x1 )  6.4375  (10)

and f (x3) = 20.968538. The next iteration is obtained by using x2 and x3. That is

x4 

x2 f ( x3 )  x3 f ( x2 ) 1.5  (20.968538)  2.403509  (6.4375)   1.712228 f ( x3 )  f ( x2 ) 20.968538  (6.4375)

and f (x4) = –3.117212. The remaining two iterations are obtained as under

x5 

x3 f ( x4 )  x4 f ( x3 ) 2.403509  (3.117212)  1.712228  20.968538   1.801695 f ( x4 )  f ( x3 ) 3.117212  20.968538

and f (x5) = –1.264502. Lastly

x6 

2.6

x4 f ( x5 )  x5 f ( x4 ) 1.712228  ( 1.264502)  1.801695  ( 3.117212)   1.862757 f ( x5 )  f ( x4 ) 1.264502  ( 3.117212)

NEWTON-RAPHSON METHOD

Suppose that x = α is an exact root of the equation f (x) = 0. Let x0 be a good estimate of root α such that α = x0 + h. Since the true root is α, and h = α − x0 is the number measuring how far is the estimate x0 from α. So

f ( )  f ( x0  h)  0

(2.20)

If h is ‘sufficiently small’, then the Taylor series expansion of f (x0 + h) leads to an approximation

f ( x0  h)  f ( x0 )  h f ( x0 )

(2.21)

If f ʹ(x0) ≠ 0, then equations (2.20) and (2.21) imply that

h and therefore

f ( x0 ) f ( x0 )

  x0  h  x0 

f ( x0 ) f ( x0 )

Hence, an improved estimate of α is given by

x1  x0 

f ( x0 ) f ( x0 )

The next iteration x2 is obtained from x1 in the similar manner as x1 was obtained from x0. That is

(2.22)

(2.23)

2.18 or

Approximation of Roots of Equations

 n 1 

f ( ) 2  n + terms containing higher powers of  n 2 f ( )

Neglecting εn3 and higher power, we get

 n 1 

f ( ) 2 n 2 f ( )

(2.29)

which implies that the convergence of Newton-Raphson method is quadratic. Newton-Raphson method starts with an initial approximation and does not require a bracket of a root. Choice of initial approximation plays a vital role in the convergence of iteration sequence to the exact root. Thus, one should be careful while choosing initial approximation in the Newton-Raphson method.. If the initial approximation is not close enough to the exact root, Newton-Raphson method may not converge, or may converge to another root. We now prove an important theorem regarding the convergence of the Newton-Raphson method. Theorem 2.2. Suppose that f (x) is continuous, convex, monotonically increasing function and has continuous derivatives of first two orders. If f (x) has a zero, then the Newton-Raphson method converges to the exact root of the equation f (x) = 0 regardless of the initial approximation. Proof. The errors εn and εn+1 in the nth and (n +1)th iterations are given by (equation (2.29))

 n 1 

f ( ) 2  n  , n  1, 2,... 2 f ( )

Since f (x) is monotonically increasing and convex function, we have f ʹ(x) > 0 and f ʹʹ(x) > 0 for all x, and thus the above equation implies that εn > 0 for all n. Moreover, xn = α + εn implies that xn > α, and therefore, f (xn) > f (α) = 0 for all n. Now, using the monotonically increasing condition of f (x) in the Newton-Raphson method f ( xn )  n 1   n  (2.30) f ( xn ) we obtain that εn+1 < εn. Hence, both sequences {xn} and {εn} are decreasing and bounded from below. It means that these sequences are convergent, i.e. there exists e̅ and x̅ such that

lim en  e

n

and

lim xn  x

n

Thus for sufficiently large n, equation (2.30) becomes

e  e which implies that f (x̅ ) = 0, i.e. x̅ = α.

 f  x f x



The above theorem guarantees global convergence of Newton-Raphson method for a monotonically increasing convex smooth function. If some of these conditions are relaxed then the method may not converge to the exact root. The method will converge only if the initial approximation is sufficiently close to the exact root.

2.19

Fundamentals of Numerical Methods Newton-Raphson Method for Multiple Roots

While deriving equation (2.29) we had assumed that α is a simple root of the equation f (x) = 0, implying that f ʹ(α) ≠ 0. Suppose that the multiplicity of root α is m where m ≠ 1, then the expansion of terms f (xn + εn) and f (xn + εn) in the right-hand side of equation (2.28) leads to

 nm

m!

 n 1   n 

f ( m ) ( ) 

 n m 1

(m  1)!

 n m 1

(m  1)!

f ( m ) ( ) 

 nm

m!

f ( m 1) ( )......

f ( m 1) ( )  ..........

On simplification, we get

1 1 f ( m 1) ( ) 2   n 1  1    n  2 n m (m  1) f ( m ) ( )  m

(2.31)

which implies that the method converges linearly. However, if the multiplicity of the root is known in advance, we can modify the Newton-Raphson formula for a root of multiplicity m as

xn 1  xn  m

f ( xn ) f ( xn )

(2.32)

then we get

 nm  n 1   n  m

m!

 n m 1

f ( m ) ( ) 

(m  1)! or

 n 1 

 n m 1

(m  1)!

f ( m ) ( ) 

 nm

m!

f ( m 1) ( )......

f ( m 1) ( )  ..........

1 f ( m 1) ( ) 2 n m(m  1) f ( m ) ( )

(2.33)

which implies quadratic convergence of the method. Thus, one must use the modified formula given by equation (2.32) to approximate a root of α of known multiplicity m. Advantages and Disadvantages of Newton-Raphson Method One of the big advantages of Newton-Raphson method is that once it the gets the smell of a root, it converges to it with amazing speed, but there is no guarantee that this method will converge to the true root universally. In fact, the method can go bad in many situations. (a) Inflexion Points: If a root is in close proximity of a point of inflexion, then the method may diverge away from the root. For instance, the equation f (x) = (x – 1.5)3 + 0.125 = 0 has a root x = 1. The function f (x) has a point of inflexion at x = 1.5, which is quite close to the root. If we approximate this root with a fairly good initial value x0 = 1.499, the first few iterations are: x1 = – 4.167 × 104, x2 = – 2.778 × 104, x3 = – 1.852 × 104 etc. In other words, the method diverges away from the root. Although, the sequence of iteration may eventually converge to the root, but the presence of inflexion point slows down the convergence process.

2.20

Approximation of Roots of Equations

(b) Division by Zero: If a function f (x) attains a local maxima or minima at an iteration xi, then the first derivative f ʹ (xi) = 0, and hence the method fails. (c) Root Jumping: This phenomenon is generally observed in oscillating functions. If f (x) = 0 has many roots, then despite a good initial approximation, the method may jump and converge to other root. For example sin x = 0 has root at x = 0, ± π, ± 2π etc. If we start with an initial approximation x0 = 2.4 π to capture the root x = 2π, then the sequence of iterations end up converging to the root x = 0. Example 2.10 Use Newton-Raphson method to approximate a root of the equation

x sin x  cos x  0 with x = 3 as an initial approximation. Here f (x) = x sin x + cos x and f ʹ(x) = x cos x. Take x0 = 3 as an initial approximation. The first iteration is given by

x1  x0 

f ( x0 ) 0.566632  3  2.8092 f ( x0 ) 2.969977

Second iteration is given by

x2  x1 

f ( x1 ) 0.028606  2.8092   2.7984 f ( x1 ) 2.655437

Similarly, the third iteration is x3 = 2.7984. Since, the value of x2 and x3 are constant to four decimal places; the sequence of iterations converges to a final value x = 2.7984. Example 2.11 Apply Newton-Raphson method to estimate a root close to x = 1 of the equation

cos x  x e x  0 Take x0 = 1, and denote

f ( x)  cos x  x e x f ( x)   sin x  e x  x e x If xn is a known iteration, then the next iteration is given by

xn 1  xn 

f ( xn ) , n  0,1, 2,.... f ( xn )

Starting with n = 0, the first approximation is given by

x1  x0 

f ( x0 ) ( 2.17798)  1  0.653079 f ( x0 ) ( 6.278035)

x2  x1 

f ( x1 )  0.460642  0.653079   0.531343  f ( x1 )  3.783942

For n = 1, 2, 3, 4 we get

2.22

Approximation of Roots of Equations

r4  r3 

f (r3 ) 0.0000212  0.0143962   0.01439478 f (r3 ) 14.9108

Note that f (x4) = 1.5187 × 10–9. Thus, r4 is correct up to eighth places of decimal. Hence the monthly interest rate is 0.01439478, i.e. 1.439478%. Example 2.13 The circle in Figure 2.7 has radius 1, and the longer circular arc ACB is twice as long as the chord AB. Find the length of the chord AB correct up to 8 places of decimal.

B

A

C Figure 2.7 Circle of radius 1 in which arc ACB is twice as long as the chord AB

Let O be the centre of the circle, and l and l’ denotes the lengths of the arcs ACB and ADB respectively. Join A and B to O as shown in Figure 2.8, and let L be the foot of perpendicular drawn from O on AB. Since triangles OLA and OLB are congruent, thus AL = LB = x (say). Let θ be the angle AOL. Since r = 1, therefore the circumference l + l’ = 2π. Also, since l = 2 (AB) = 4x, thus l’ = 2π – 4x. Now the angle subtended by the arc ADB on the centre O is l’/r, i.e. l . Therefore

2  l   2  4 x which implies that θ = π – 2x. Now in the right angled triangle ALO, we have sin θ = x implying that

sin 2x  x D A

x

L

B

θ O C

Figure 2.8 Circle of radius 1 with a triangle OAB We will apply Newton-Raphson method to find a root of the equation

f ( x)  x  sin 2 x  0 Take initial approximation as x0 = 1, the first iteration is given by

2.29

Fundamentals of Numerical Methods

xn 1  xn 

2.8

f ( xn ) , f ( xn )

n  0,1, 2,...

CHEBYSHEV METHOD

Suppose that x0 is an initial approximation of a root of f (x) = 0. If x1 = x0 + h is a better approximation of the root, then we can write

f  x0  h   0

(2.38)

Expanding f (x0 + h) by Taylor series about x = x0 and neglecting h3 and higher powers, we get

f ( x0 )  h f   x0  

h2 f ( x0 )  0 2

(2.39)

implying that

h

f ( x0 ) f ( x0 )  h2  f  x0  2 f   x0 

(2.40)

Now, we use equation (2.22) of Newton-Raphson method, and replace h in the right-hand side of equation (2.40) by – f (x0)/f ʹ(x0). The resulting equation is therefore

 f ( x0 ) f ( x ) f ( x0 )  0 f   x0  2  f   x   3 0   2

h

Thus the first approximation of the root is given by

 f ( x0 ) f ( x ) f ( x0 ) x1  x0   0 f   x0  2  f   x   3 0   2

The second iteration x2 can be obtained from x1 in a similar manner as x1 is obtained from x0. The process is repeated until a desired level of accuracy is obtained. The general form of Chebyshev formula can be written as

 f ( xn ) f ( x ), f ( xn )  n f   xn  2  f   x   3 n   2

xn 1  xn 

n  0,1, 2,...

(2.41)

The method requires three evaluations, namely f (xn), f ʹ(xn) and f ʹʹ(xn) in each iteration. Convergence of Chebyshev Method If εn and εn+1 are the errors in the nth and (n +1)th iterations of a root α, then equation (2.41) implies that

 f (   n ) f (   ) f (   n )  n   n f     n  2  f       3 n   2

 n 1

Expanding f (α + εn) and f ʹ(α + ε n) by Taylor series and noting that f (α) = 0, we get

(2.42)

2.32

Approximation of Roots of Equations

(1)  1 x1  1   6  1.125 3   2 2  2 2

or

2

3 x 3  x  1  x1  x1  1 x2  x1  1 2 1  (6 x1 ) 3 3 x1  1 2 3 x12  1

(0.701172)  0.701172   1.125   6.75   1.299857 3  2.796875 2  2.796875  2

2

3 x 3  x  1  x2  x2  1 x3  x2  2 2 2  (6 x2 ) 3 3 x2  1 2 3 x2 2  1

(0.10358)  0.10358   1.299857    7.799144   1.324693 4.068888 2  4.068888 3 2

2

3 x 3  x  1  x3  x3  1 x4  x3  3 2 3  (6 x3 ) 3 3 x3  1 2 3 x32  1

(0.000106)  0.000106   1.324693   7.948158   1.324718 3  4.264435 2  4.264435  2

Iteration x4 is correct up to 6 places of decimal.

2.9

MULTIPOINT ITERATION METHOD

The Chebyshev method can be modified to obtain another method known as multipoint iterative method that does not require the evaluation of second order derivative. If xn and xn+1 are two successive iterations of a roots of f (x) = 0, then equation (2.39) can be written as

f ( xn )  h f   xn  

h2 f ( xn )  0 2

(2.46)

where h = xn+1 – xn. Rewrite equation (2.46) as

h

f ( xn ) , n  0,1, 2,... h f   xn   f ( xn ) 2

Note that the denominator in the right-hand side can be approximated by Taylor series expansion of the function f ʹ(xn + h/2). Therefore

2.41

Fundamentals of Numerical Methods

2.12

FIXED POINT ITERATION METHOD

Let g (x) be a given function. A point x0 is called a fixed point of the function g(x) if x0 is mapped to itself by the function, i.e. g(x0) = x0. Geometrically, a fixed point of a function g (x) is the point where the curve y = g(x) intersects the straight line y = x. For instance, x = 2 and 3 are two fixed points of the function f (x) = x2– 4 x + 6, for f (2) = 2 and f (3) = 3. Not all functions have fixed points. In fixed point iteration method, we rearrange a given equation f (x) = 0 in the following form:

x  g  x

(2.68)

Each fixed point of g(x) is a root of f (x) = 0. Our job now is to approximate a fixed point α of the function g(x) by an iterative method. The method includes guessing of an initial approximation, and then improving this approximation successively using equation (2.68) up to a certain preset accuracy level. Let x0 be an initial guess of a fixed point α of g(x), then the first approximation is given by

x1  g ( x0 ) Similarly, the subsequent iterations are given by

x2  g ( x1 ) x3  g ( x2 ) and so on. In general, if xn be the current iteration then the next iteration is given by

xn 1  g ( xn ),

n  0, 1, 2, ...

(2.69)

It is important to note that the sequence of approximation {xi} generated by the above iteration scheme may not necessarily converge to the fixed point α. Note that from the viewpoint of the curves y = x and y = g(x), the points (x1, x1), (x2, x2) etc lie on the straight lines y = x, whereas the points (x0, x1), (x1, x2) etc lie on the curve y = g(x). We can easily display the convergence or divergence of the sequence x0, x1, x2, … by joining the points (x0, 0), (x0, x1), (x1, x1), (x1, x2), (x2, x2) etc using line segments. The sequence {xi} converges to the root α only if these line segments converge to the point of intersection of y = x and y = g(x). Another noteworthy point is that the effectiveness of iteration method mainly depends on the choice of function g(x) and an initial approximation x0. For instance, the equation f (x) = x3 – 5x – 10 = 0 can be rearranged in the following different ways: (a) x = (5 x + 10)1/3 (b) x = (x3 – 10)/5 (c) x = 10/ (x2 – 5) It can be verified that the equation f (x) = 0 has a real root near x = 3. If we take x0 = 2.5 (which is fairly good initial approximation of the root) and apply form (a) by letting g(x) = (5 x + 10)1/3, we get the sequence x1 = 2.8231, x2 = 2.8891, x3 = 2.9022, x4 = 2.9048 etc, which gradually converges to a final value 2.9055. On the other hand, if we take g(x) = (x3 – 10)/5 or 10/ (x2 – 5), then we find that the sequence of iteration diverges away from the root. We will now prove an important theorem that prescribes the sufficient condition for the sequence of iterations {xn} to converge to a root. Theorem 2.3. If an equation x = g(x) has a root α in an interval (a, b), and i) g(x) is differentiable function in [a, b] ii) | gʹ(x) | ≤ l < 1 for all x  [a, b] then the sequence of iteration {xn} given by xn + 1 = g (xn) converges to the root α for any choice of initial approximation x0 in [a, b].

2.43

Fundamentals of Numerical Methods

In the first case, we have | gʹ(x) | < 1 for all x  [2, 3], while in the remaining two cases, | gʹ(x) | > 1 in the same interval. It means that the sequence of iterations will converge to the root in the first case and diverge away in the remaining two cases. Moreover, the successive iterations will form a staircase pattern in first case because of 0 < gʹ (x) < 1 for all x  [2, 3]. (a)

Y y = g (x)

y=x (x1, x1)

(x0, x1)

(x2, x2)

(x1, x2) (x2, x3)

(b)

x0

α

O

X

Y y=x

(x0, x1)

(x1, x1)

(x1, x2) y = g(x) O

x0

α

X

Figure 2.10 Fixed point iteration method: (a) staircase pattern, and (b) cobweb pattern

We now prove a result which can be used as stopping criterion for the fixed point iteration method. Theorem 2.4. Suppose that the equation x = g(x) has a root x = α in interval (a, b). If xn denotes nth iteration of fixed point iteration scheme xn = g (xn – 1), n = 1, 2, …, then

|  xn | 

ln | x1  x0 | 1 l

where g (x) is differentiable on (a, b) and | g (x) | ≤ l < 1.

2.47

Fundamentals of Numerical Methods

Example 2.23 A root of equation f (x) = 0 is approximated using an iterative scheme xn+1 = g (xn) where n = 0, 1, … If the function g(x) is chosen in the following manner g  x  x  a f  b f 2  c f 3

where f = f (x), then determine the values of a, b and c so that the order of the iteration method is maximum. Let x = α be an exact root of the equation f (x) = 0 which is solved by iteration method xn+1 = g (xn). Since the iteration function g (x) contains three unknowns a, b and c; three conditions are required to evaluate them. Also, to maximize the order of the method, the derivatives g (α), g (α) etc are equated to zero. Taking both aspect into consideration, we impose three conditions, namely g (α) = g (α) = g (α) = 0. Moreover, noting that f (α) = 0, one obtains g     0  1  a f   0

g     0   a f   2 b  f    0 2

g     0   a f   6 b f  f   6 c  f    0 3

where f  = f (α). Solving these equations, we obtain a

b 

1 f

f  2  f 

3

3  f    f  f  2

c

6  f 

5

Aitken’s ∆ 2 Process to Accelerate the Convergence: Steffensen Method Aitken’s process is used to accelerate the convergence of a linearly convergent scheme. Suppose that xn –1, xn and xn+1 are three consecutive approximations obtained from fixed point iteration method. If the terms εn–1, εn and εn+1 denote the errors involved in these iterations, then xn–1 = α + εn–1, xn = α + εn and xn+1 = α + εn+1. Moreover, equation (2.73) implies that

 n 1  n  n  n 1 or

 xn 1    xn 1      xn   

On simplification, we get



xn 1 xn 1  ( xn ) 2 xn 1  2 xn  xn 1

(2.74) 2

2.58

2.14

Approximation of Roots of Equations

ROOTS OF POLYNOMIAL EQUATIONS

In previous sections, we discussed several techniques to find the roots of an arbitrary nonlinear equation f (x) = 0. We now consider a special case when the function f (x) is a polynomial. Such polynomial equations often arise in differential equations with constant coefficients and numerical integration. 2.14.1 Birge-Vieta Method Birge-Vieta method is an iterative technique to determine real linear factors of a given polynomial. The method is applicable only when all the coefficients of the polynomial are real. The method is based on the remainder theorem that states that if a polynomial Pn (x) is divided by a linear factor (x – q) then the remainder R is given by R = Pn (q), and thus, if x = α is a zero of Pn (x), then R = Pn (α) = 0. In Birge-Vieta method, we start with an initial approximation q and successively improve it to a final value q such that R (q) = 0. To understand the concept better, consider a polynomial Pn (x) of degree n of the form

Pn ( x)  a0 x n  a1 x n 1  ........  an 1 x  an ,

a0  0

(2.93)

if this polynomial is divided by a linear factor (x – q) where q is a real number, then we can write

Pn ( x)  ( x  q ) Qn 1 ( x)  R

(2.94)

where R is the remainder. The quotient Qn–1(x) is known as the deflated polynomial, and given by

Qn 1 ( x)  b0 x n 1  b1 x n  2  b2 x n  3 ....  bn  2 x  bn 1

(2.95)

where bi’s are real constants. It must be noted that the values of constants bi’s and remainder R depends on q. Thus, these parameters can be regarded as uniquely determined functions of q. Now suppose that q +  q is a better approximation of a zero of Pn(x), then

R q   q  0

(2.96)

Expanding the left-hand side by Taylor series only up to ∆ q, we obtain

Rq or

dR 0 dq

q  

R R

(2.97)

where Rʹ = dR/dq. Equation (2.97) can be used as an iterative formula to improve the approximations successively till a desired level of accuracy is obtained. The only task we have to complete is to determine the value of R in terms of q, and for that, we require the values of constants bi’s which can be obtained by using equation (2.95) in (2.94) and comparing the coefficients of similar powers of x. The result is

a0  b0 a1  b1  b0 q

Chapter 3

SOLUTION OF LINEAR SYSTEM OF ALGEBRAIC EQUATIONS

3.1

INTRODUCTION

Collection of linear equations involving a set of variables is called a linear system of equations. It holds important place in the studies of science, engineering and business for its capability to model large number of computational problems, and a myriad of other applications. Often, a modeling process results in nonlinear equations which are reduced to systems of linear equations when we apply iterative solution procedures. In fact, most of the numerical methods to solve partial differential equations, integral equations and boundary value problems of ordinary differential equations convert the original problem into linear system of equations using finite difference methods. In most cases, these systems involve large number of equations which cannot be solved by ordinary methods. Moreover, we are often interested in the methods which can be efficiently implemented by computer programs. A system of linear algebraic equation consisting n equations in n variables can be written as

a1 1 x1  a12 x2    a1 n xn  b1 a2 1 x1  a2 2 x2    a2 n xn  b2 

(3.1)

an 1 x1  an 2 x2    an n xn  bn where xj ( j = 1, 2,…, n) are unknowns, ai j (i, j = 1, 2, …, n) are coefficients, and bj (j = 1, 2, …, n) are the non-homogeneous terms. The system (3.1) is called a homogeneous linear system if bj = 0 ∀ j. Using summation notation, the linear system can be expressed as n

a j 1

ij

x j  bi , i  1, 2, , n

(3.2)

Another convenient method to represent system (3.1) is in matrix notations. We write Ax  b

where x = [x1 x2 ⋯ xn]T, b = [b1 b2 ⋯ bn]T and the coefficient matrix A is given by

(3.3)

3.5

Fundamentals of Numerical Methods

3 3 2 D2  2 3 1  16, 1 4 1

3 1 3 D 3  2 3  3   8 1 2 4

Thus, the solution of the given system is

x1 

D1 A

 1, x2 

D2 A

 2,

x3 

D3 A

 1

3.2.2 Gauss Elimination Method Gauss elimination method–named after famous German mathematician C. F. Gauss (1777 – 1855)– proceeds in two steps: (i) forward elimination, and (ii) backward substitution. The first step involves elimination of the forward variables of the given linear system by apply row operations successively on the augmented matrix [A | b] until the matrix A in it is transformed to an upper triangular matrix. In the second step, the values of variables xi’s are calculated by back substitution, i.e. we solve the last equation of the reduced system and use the result to solve the second last equation, and so on. The procedure is explained by considering a linear system

 a11 a  21  a31    a  n1

a12 a22 a32 an 2

a13  a1n   x1   b1  a23  a2 n   x2  b2      a33  a3n   x3    b3              an 3  an n   xn  bn 

(3.8a)

whose augmented matrix is

 a11   a21  A | b    a31   a  n1

a12 a22 a32 an 2

a13  a1n b1   a23  a2 n b2  a33  a3n b3      an 3  ann bn 

Now, apply row operations Ri – (ai1/a11)R1, i = 2, 3, ⋯, n on matrix [A | b] so that all the elements of first column except a11 (i.e. ai1, i = 2, 3, …, n) become zero. This process is called as the first step of forward elimination, after which we get

 a11  0  A | b    0  0 

a12  a22  a32 an 2

a13  a1 n   a2 n a23   a3n a33   an 3  an n

b1   b2  b3    bn 

3.6

Linear System of Algebraic Equations

where aʹi j = ai j – (ai1 / a11)a1j, i, j = 2, 3, ⋯, n, i ≤ j. For the second step of forward elimination, apply row operations Ri – (aʹi2 /aʹ22) R2, i = 3, 4, ⋯, n, to get

 a11  0  A | b    0  0 

a13  a1n b1     a2 n b2  a23   a3n b3 a33     an3  ann bn 

a12  a22 0 0

where ai j = aʹi j – (aʹi 2 / aʹ22) aʹ2 j, i, j = 3, 4, ⋯, n, i ≤ j. The process of forward variable elimination is repeated (n – 1) times so that the original matrix A is transformed to an upper triangular matrix, i.e.

 a11  0  A | b    0  0 

a12  a22 0

a13    a23   a33 

0

0

a1n a2 n a3n 

 an( nn1)

       bn( n 1)  b1 b2 b3

(3.8b)

The linear system corresponding to the above matrix can be written as

a11 x1  a12 x2  a13 x3    a1n xn  b1  x2  a23  x3    a2 n xn  b2 a22  x3    a3n xn  b3 a33

(3.9)

 an( nn1) xn  bn( n 1) Now, the solution of the system (3.8) can be obtained by backward substitution (also known as back substitution), i.e. the last equation of (3.9) is solved first to get

xn 

bn ( n 1) ann ( n 1)

(3.10a)

Substitute this value in the second last equation of (3.9) to obtain the value of xn-1. The process of back substitution is continued until all xi s are obtained. The values of the variables can be generalized as

xi 

bi

i 1

n

  ai j  x j j  i 1

aii

i 1

i 1

,

i  n  1, n  2, ,1

(3.10b)

Operation Counts One of the important factors to assess the efficiency of a numerical scheme is the computational cost of the scheme, i.e. time taken in the computation and the cost of hardware used. Computational cost of any scheme is directly linked to the number of operations involved in it. In the following part, we will

3.7

Fundamentals of Numerical Methods

show that the Gauss elimination method requires a total of 2n3/3 + 3n2/2 – 7n/6 mathematical operations: Let’s first develop an approach for operation counts: we will calculate the number of operations involved in forward elimination and back substitution separately. The meaning of ‘operations’ here is addition, subtraction, multiplication and division. The first step of forward elimination consists of n – 1 row transformations of the form Ri – (ai1/a11)R1. For each row, the division ai1/a11 needs to be calculated only once, and thus the number of operations carried out when the row transformation is applied to the any row is 2n +1. So, the number of operations involved in the first step of forward elimination is (2n + 1)(n – 1). Similarly, the second step of forward elimination consists of n – 2 row transformations, each requiring 2n – 1 operations. The process of forward elimination is continued up to n – 1 times. In the last step we require one row transformation involving 5 operations. Thus the total number of mathematical operations, N1, involved in the forward elimination is:

N1   2n  1 n  1   2n  1 n  3    5 n

   2i  1 i  1 i2 n

n

n

i2

i2

i2

  2 i 2  i  1  n(n  1)(2n  1)   n(n  1)   2  1    1  (n  1) 6    2  

2 3 1 2 7 n  n  n 3 2 6

Furthermore, computation of variable xn by back substitution requires 1 operation (division). Similarly, the number of operation involved in calculation of xn–1, xn–2, …., x1 are 3, 5, …, 2n–1 respectively. Since 1 + 3 + 5 + … + (2n – 1) = n2; the total number of operations, N, involved in the Gauss elimination method is:

N  N1  n 2 

2 3 1 2 7 n  n  n  n2 3 2 6



2 3 3 2 7 n  n  n 3 2 6

 , a33  ,  , an( nn1) in matrix (3.8b) are called the pivot elements. One of the major The elements a11 , a22 drawbacks of Gauss elimination method is that it leads to division by zero if any of the pivot elements becomes zero. Another apparent disadvantage of this method is that the round-off error due to successive divisions and subtractions accumulates in the last equation for xn. The errors spreads further by the process of back substitution to all the variables, and shows its maximum effect into the value of x1. This is known as the propagation of error. Systems containing large number of equations are particularly prone to such errors. To avoid these pitfalls, we use pivoting of augmented matrix at each stage of forward elimination. The pivoting process involves interchanging of the row having zero pivot with one of the rows below it. There are generally two types of pivoting strategies: partial pivoting and full pivoting.

3.8

Linear System of Algebraic Equations

Partial Pivoting Partial pivoting is based on the fact that the order of linear equations in a system can be interchanged without affecting the solution. At the beginning of the first step of forward elimination, select the element whose magnitude is largest in the first column, and interchange the row having this element with first row. Similarly, in the beginning of second step of elimination, select the element with largest magnitude in the second column except its first element, and interchange the corresponding row with second row. In other words, at the beginning of kth step of forward elimination, we find the maximum of |akk|, |ak+1,k|,⋯, |ank|. Suppose that the maximum value is |apk|, k ≤ p ≤ n, then the kth and pth rows are interchanged (see Figure 3.1).

Rows where pivoting has been done

akk

Row where pivoting is being done

apk

Row with maximum absolute value

element

Column to search for max absolute value Figure 3.1 Partial pivoting of a linear system

The main idea of partial pivoting is to ensure that the pivot elements remain non zero and the multiplier used for row transformation is as small as possible so that the effects of round-off errors are minimized. Full Pivoting At the beginning of first step of forward elimination, search the entire matrix A for an element with largest magnitude. Suppose that this element belongs to ith row and jth column. We then interchange the first row with i th row, and first column with j th column of matrix A. It must be noted that the interchanging of column requires reordering of the corresponding variables. Therefore, we also interchange the order of first and i th variables in the matrix x. This process is primarily intended to make the largest magnitude element as the first pivot of the matrix A. Similarly, at the beginning of second step of forward elimination, we search for an element with largest magnitude within the sub matrix which excludes the first row and first column of the modified matrix. Interchange the corresponding rows, columns and order of variable as stated earlier. In general, at the beginning of kth step, we select the sub matrix that excludes first k rows and k columns of the modified matrix A, and search for the element with largest magnitude in it. If this element belongs to l th row and mth column, then interchange the kth row with lth row, kth column with mth column and the kth variable with mth variable in matrix x (see Figure 3.2).

3.17

Fundamentals of Numerical Methods

d 4  d 4 

a4 d3 (1)(1 / 3) 5  1  b3 4/3 4

Thus, the solution is given by

x4 

d 4 5 / 4  1 b4 5 / 4

x3 

d3  c3 x4 1/ 3  (1)(1)  1 b3 4/3

x2 

d 2  c2 x3 1 / 2  (1)(1)  1 b2 3/ 2

x1 

d1  c1 x2 1  (1)(1)  1 b1 2

Multiple Right-Hand Sides Sometimes, we come across several systems of linear equations having same coefficient matrix, but different right-hand sides. This situation occurs often when we are given a linear system and asked to analyze the sensitivity of its solution when the values of the constants in the right-hand sides vary. Also, when an engineering process is modeled whose boundary conditions change frequently, one gets several values in the right-hand sides of the linear equations. Assume that we are solving m linear systems, each having the same coefficient matrix A but different right-hand sides, i.e. the linear equations of the form

A x1  b1 A x2  b2

(3.18)

  A xm  bm

where xi = [x1i x2i ⋯ xni]T and bi = [b1i b2i ⋯ bni]T, i = 1, 2, …, m. The above system of linear systems can be expressed in a matrix form as

Ax  b

(3.19)

where x = [x1 x2 … xm] and b = [b1 b2 … bm]. Such system can be solved by Gauss elimination method with a bit of extra efforts. Reduce the coefficient matrix A of the augmented matrix [A | b] to an upper triangular matrix by row operations. If that the reduced augmented matrix is [Aʹ | bʹ], then the backward substitution is applied to each column of bʹ to obtain the solutions. The procedure is illustrated in the following example. Example 3.8 Use Gauss elimination method to find the solution of the following linear systems

x y z  1

p  q  r  2

u v w 2

2 x  y  3z  4 ;

2 p  q  3r  5 ; 2u  v  3w   1

3 x  2 y  2 z  2

3 p  2q  2r  1

3u  2v  2w  4

3.25

Fundamentals of Numerical Methods

 a11 a  21  a31    a  n1

a12 a22 a32 an 2

a13  a1n   l11 0 a23  a2 n  l21 l22 a33  a3n   l31 l32       an 3  an n  ln1 ln 2

0 0 l33 ln 3

 0  u11 u12   0   0 u22  0  0 0       ln n   0 0

l1 1 u1 2  l11 u11 l u l u  l u  2 1 11 21 1 2 2 2 2 2   l3 1 u11 l31 u1 2  l3 2 u2 2    l u  n1 11 ln1 u12  ln 2 u2 2

u13  u1n   u23  u2 n  u33  u3n      0  un n 

l1 1 u13 l2 1 u13  l2 2 u2 3 l31 u13  l3 2 u2 3  l33 u33 ln1 u13  ln 2 u2 3  ln 3 u33



l1 1 u1 n



l21 u1 n  l2 2 u2 n

    l31 u1 n  l3 2 u2 n  l33 u3 n       ln1 u1 n  ln 2 u2 n    ln n u n n 

On equating the matrices of left- and right-hand sides, we get n2 equations in n2 + n unknowns, implying that uij and lij determined from these equations are not unique. Thus the LU decomposition of the matrix A is not unique. To strike uniqueness, we reduce the number of unknowns to n2. This can be done by choosing the n diagonal elements of either of the matrices L or U as unity. Doolittle Factorization If the diagonal elements of the lower triangular matrix L are chosen as unity, i.e. lii = 1, then the LU factorization is said to be Doolittle factorization. In this case, the first row of the matrix U becomes identical to the first row the coefficient matrix A, i.e. u1j = a1j, j = 1, 2, …, n. Moreover, we get li1 = ai1/a11, i = 2, 3, …, n. We can then calculate the second row of U, and after that, the second column of L. This process is continued until we get determine L and U completely. Crout Factorization If the diagonal elements of the upper triangular matrix U are chosen as 1, i.e. uii = 1, then the factorization method is known as Crout factorization, named after American mathematician P. D. Crout (1907–1984). In this case, the first column of the matrix L becomes identical to the first column of the coefficient matrix A. In other words, li1 = ai1, i = 1, 2, …, n. Also, we get u1j = a1j / a11. Using these values, we calculate the second column of L and after that the second row of U. The process is continues all values are determined. Example 3.12 Decompose the following matrix using (i) Doolittle, and (ii) Crout factorization method

1 4 3 A   2 7 9   5 8 2  (i) Doolittle factorization: Suppose that

1 4 3  1 0  2 7 9   l    21 1  5 8 2   l31 l3 2

0  u11 u12  0   0 u2 2 1   0 0

u13   u2 3  u33 

3.27

Fundamentals of Numerical Methods

1 2 3  1 0  2 4 2   l    21 1  2 5 1  l31 l32

0  u11 u12 0   0 u22 1   0 0

u12  u11  l21u11 l21u12  u22 l31u11 l31u12  l32 u22

u13  u23  u33  u13

   l31u13  l32u23  u33  l21u13  u23

which yields u11 = 1, u12 = 2, u13 = 3, l21 = 2 and l31 = 2; but, it fails as the pivot element u22 becomes zero. In case, the coefficient matrix of the linear system does not admit LU factorization; we interchange the corresponding row with another row of the given linear system. As pointed out in the beginning of the section, the LU factorization of a matrix A is not unique. However, for any two distinct factorization of A, there is a relationship between the pair of upper and lower triangular matrices. Suppose that A = L1 U1 = L2 U2, then it follows that

L21L1  U 2 U11 Since the matrix in the left is a lower triangular, while the matrix in the right is an upper triangular. Thus, these matrices can be equal if and only if they result in a diagonal matrix, say D. Now, it can be easily shown that the relationship between is these matrices is L1 = L2 D and U2 = D U1. Cholesky Factorization Before outlining this method, we define two important two important classes of matrices, namely strictly diagonally dominant matrices and symmetric positive definite matrices. A square matrix is strictly diagonally dominant if the magnitude of the diagonal element in each row is strictly greater than then the sum of the magnitudes of remaining elements in the row. In other words, matrix A = [ai j]n  n is strictly diagonally dominant if

a11  a12  a13    a1n a22  a21  a23    a2 n  an n  an 1  an 2    an n 1 For instance, the matrix

2 1 4 2  4 1    2 1 6  is strictly diagonally dominant because |a11| > |a12| + |a13|, |a22| > |a21| + |a23|, |a33| > |a31| + |a32|, whereas the matrix 1 2 3 2 4 2    2 5 1 

3.34

Linear System of Algebraic Equations

Solving, we get: l11 = 1, l21 = 3, l31 = 6, l22 = 2, l32 = 1, l33 = 5. Thus

Inverse of L is obtained as

1 0 0  L   3 2 0  6 1 5   10 1  L   15 10  9 1

Therefore, the inverse of A is

 

A 1  L1

T

L1 

0 0 5 0  1 2 

0 0 10 15 9   10  203 33 9  1    15 5 0   1  33 13 1 0 5  1   50   100   0  9 1 0 2   9 1 2  2 

Operation Counts: Doolittle and Crout factorization Both Doolittle and Crout methods require same number of mathematical operations for factorization of a matrix A of order n × n. Operation count is quite simple. If we use Doolittle factorization, then the LU decomposition can be written as  a11 a  21  a31    a  n1

a12 a22 a32 an 2

a13  a1n   u11 u12   a23  a2 n  l21 u11 l21 u12  u2 2 a33  a3n    l31 u11 l31 u12  l3 2 u2 2        an 3  an n  ln1 u11 ln1 u12  ln 2

u13  u1 n   l21 u13  u2 3  l21 u1 n  u2 n  l31 u13  l3 2 u2 3  u33  l31 u1 n  l3 2 u2 n  u3 n      ln1 u13  ln 2 u2 3  ln 3  ln1 u1 n  ln 2 u2 n    un n 

Now equate the matrices in the left- and right-hand side. Obviously, computation of u1 j does not require any operation. Equating the first column (except its first element), we can obtain each li 1 by a single division. Next equate the second row (except first element) of both matrices. Note that determination of u2j requires two operations (one division and one subtraction). This process is continued until we determine un n which requires 2n – 2 operations. The number of operations required corresponding to each entry of the right-hand side matrix is given in matrix form:

0 1  1   1 

0 0  2 2  3 4   3 5 

0

 2  4     2n  2 

Thus the total number of operations, N, can be obtained from

N  1 (n  1)  3  (n  2)    (2n  3)  1  2  (n  1)  4  (n  2)    (2n  2)  1 

2 3 1 2 1 n  n  n 3 2 6

3.41

Fundamentals of Numerical Methods

3.3

ITERATIVE METHODS

Direct methods provide solution of a linear system in finite number of steps. The accuracy of the solution obtained from a direct method depends mainly on the nature of the linear system rather than the choice of the method. These methods work very well for small systems, i.e. system containing few equations. However, most of the mathematical processes include linear system with large number of equations. Moreover, the coefficient matrix associated with such system is often sparse, i.e. most of elements of the coefficient matrix are zero. Such systems cannot be efficiently solved by direct methods. We will now discuss iterative methods in which we start with a ‘wise’ guess of the solution, and then, the iterative scheme proceeds to yield a better approximation of the solution. Thus, we get a new solution at the end of each iteration. These methods use lesser number of steps than direct methods, and therefore, generally preferred over direct methods when the number of equations in a linear system is large. Iterative methods are very efficient when the coefficient matrix A is sparse. Iterative schemes to find solution of linear system are similar to fixed point iteration method for finding roots of equations. Here, we treat the linear system A x = b as an equation A x – b = 0 and look for a matrix x that satisfies this equation. The equation is then converted in an equivalent form x  Ex r

(3.34)

where E and r are appropriate matrices which differ from scheme to scheme. Matrices E and r are known as iteration matrix and residual matrix respectively. Starting from some initial guess x(0) of the solution, the iterative formula uses (3.34) to generate a sequence of approximations given by

x ( k 1)  E x ( k )  r

(3.35)

The sequence of approximations thus obtained may or may not converge to a final value. However, if an iterative method converges, then the accuracy of the solution mainly depends on the number of times that the scheme has been applied. Most of the iterative methods work on the notion of splitting of the coefficient matrix. Suppose that A is a n  n matrix. If M and N be two matrices of order n  n such that M is a nonsingular matrix and A = M + N, then the matrices M and N are called a splitting pair of matrix A. The splitting can help converting the linear system A x = b in the form (M + N) x = b which can further be expressed as an iterative relation x  M 1N x  M 1b

which is in the form of equation (3.34). 3.3.1 Gauss-Jacobi Method Consider the following splitting of the coefficient matrix A

A  LDU

(3.36)

where L is a strictly lower triangular (li j = 0 for i ≤ j), D is a diagonal matrix, and U is strictly upper triangular matrix (ui j = 0 for i ≥ j). With this splitting of matrix A the linear system A x = b can be expressed as (L  D  U ) x  b or

D x   (L  U ) x  b

(3.37)

3.46

Linear System of Algebraic Equations

1 1 1 1 3 x2(1)  2 x3(1)    3  1.6  2  3.286    0.93 9 3 9 3 1 (1) 1   x1  x3(1)  1.6    0.222  3.286   1.6  2.213 5 5 1 (1) 26 1 26  x1  2 x2(1)     0.222  2  1.6    3.225 7 7 7 7

x1(2) 



x2(2)



x3(2)









Repeating the process two more times, we get the third and fourth iterations as

1 1 1 1 3 x2(2)  2 x3(2)    3  2.213  2  3.225    1.121 9 3 9 3 1 1   x1(2)  x3(2)  1.6    0.93  3.225   1.6  2.059 5 5 1 (2) 26 1 26  x1  2 x2(2)     0.93  2  2.213   2.949 7 7 7 7

x1(3) 



x2(3)



x3(3) and







1 1 1 1 3 x2(3)  2 x3(3)    3  2.059  2  2.949    1.008 9 3 9 3 1 1   x1(3)  x3(3)  1.6   1.121  2.949   1.6  1.966 5 5 1 (3) 26 1 26  x1  2 x2(3)    1.121  2  2.059    2.966 7 7 7 7

x1(4) 



x2(4)



x3(4)











Exact solution of the given system is x1 = 1, x2 = 2, x3 =3. 3.3.2 Gauss-Seidel Method We have seen in Gauss-Jordan method that the values of the variable xi obtained in the nth iteration remain unused until the entire nth iteration is completed, resulting in slow convergence of the method. This process is modified in Gauss-Seidel [P. L. Seidel (1821 – 1896) was a German mathematician] method where the approximations are used as soon as they become available. That is, once we have calculated x1 from the first equation, we use it in second equation to calculate x2. The values of x1 and x2 are then used in third equation to obtain x3, and so on. Thus, Gauss-Seidel method is a process of successive replacement rather than simultaneous replacement. The iteration scheme can be expressed as

x2( k 1)  xn( k 1)

b 1 a12 x2( k )  a13 x3( k )    a1 n xn( k )  1 a11 a11

    b 1  a21 x1( k 1)  a23 x3( k )    a2 n xn( k )  2  a22 a22      b 1  an1 x1( k 1)  an 2 x2( k 1)    an n 1 xn( k11)  n  an n an n 

x1( k 1)  













(3.43)

3.49

Fundamentals of Numerical Methods Put k = 0 and use the initial approximation x(0) = [1, 1, 1]T, we get

 0.2  1  3   2.6  0 0.6 x (1)   0 0.3 0.15  1   4.5    4.05 0 0.2 0.017  1 3.333 3.55  Similarly, for k = 1, 2 and 3

x

(2)

x

(3)

 0.2   2.6   3   1.28  0 0.6    0 0.3 0.15   4.05   4.5    2.753 0 0.2 0.017  3.55  3.333  4.202   0.2   1.28   3   2.189  0 0.6    0 0.3 0.15   2.753   4.5   3.044  0 0.2 0.017   4.202  3.333 3.954 

 0.2   2.189   3  1.964  0 0.6 x (4)   0 0.3 0.15  3.044    4.5    2.994  0 0.2 0.017  3.954  3.333  4.008  3.3.3 Successive Over Relaxation Method The Successive Over Relaxation (SOR) method is designed to improve the convergence of GaussSeidel method by taking the weighted average of x(k) and x(k+1) produced by the Gauss-Seidel method. The method uses a weighting factor which can be selected appropriately to enhance the convergence of the solution. Recall that in Gauss-Seidel method, the iteration x(k+1) can be regarded as a sum of current iteration x(k) and an increment x(k+1) – x(k). That is

x ( k 1)  x ( k )  (x ( k 1)  x ( k ) )

(3.48)

where the increment can be obtained using equation (3.47) as

x ( k 1)  x ( k )   D1L x ( k 1)  D1U x ( k )  D1b  x ( k )

(3.49)

In SOR scheme, the above increment is multiplied by a weight factor ω to control the convergence of the method. In other words, we assume that

x ( k 1)  x ( k )   (x ( k 1)  x ( k ) )

(3.50)

Parameter ω is known as the relaxation factor. Using equation (3.49) in (3.50), we get

x ( k 1)  x ( k )     D1L x ( k 1)  D1U x ( k )  D1b  x ( k )  which can be simplified as

x ( k 1)  (D   L) 1  (1   )D   U  x ( k )   (D   L) 1 b Alternatively, equation (3.51) can be derived by multiplying the equation

(3.51)

3.62

Linear System of Algebraic Equations

 0.6 0 B J   I  0.6  0.2  0    2  0.4  0 0 0.2 





The eigenvalues are 0, ± √0.4. Thus ρ(BJ) = √0.4. Now the optimal value of ω can be obtained from

opt 

3.5

2 1  1     B J 

2



2 1  1  0.4

 1.127

ILL-CONDITIONED LINEAR SYSTEM

When we solve a linear system, it is usually expected that a small change in the coefficient matrix A or matrix b will not change its solution invariably. Unfortunately, it is not true in many cases. For example, consider the following system

400 x1  201 x2 

200

800 x1  401 x2   200 whose solution is x1 = – 301, x2 = 600. Now, suppose that the coefficient of x1 in the first equation is changed from 400 to 401 so that the system becomes

401 x1  201 x2 

201

800 x1  401 x2   200 The solution of the new system is x1 = 120801, x2 = – 241000; much different than what one would have expected! A linear system whose solution is extremely sensitive to the variation in its coefficients is called an Ill-conditioned system. Unlike well-conditioned system, an ill-conditioned system is quite over reactive to the round off errors generated by the solution procedure, and thus, may lead to serious underestimation or overestimation of the actual result. We will now analyze the sensitivity of the solution with change in the coefficient matrix A and matrix b of the system A x = b. Suppose that if matrix A is changed to A + ∆ A and b is changed to b + ∆ b, the solution changes to x + ∆ x. Thus

( A   A )(x   x)  b  b Simplifying, we get

( A   A ) x   b  ( A )x or

 x  ( A   A ) 1 [  b  (  A ) x ]

or

 x  A 1 ( I  A  1  A )  1 [  b  (  A ) x ]

thus

  x    A 1   (I  A 1 A) 1    b    ( A )x 

or

  x    A 1   (I  A 1 A) 1    b    ( A )   x 

or

 b  x   A     A 1  (I  A 1 A) 1   A     x  A   x   A  

Fundamentals of Numerical Methods

4.5

This means that the polynomial Hn(x) which is of degree n or less, has n + 1 distinct zeros. This is possible only if Hn (x) is identically zero, yielding Pn (x) = Gn(x). ■ Note that the degree of Pn(x) may be less than n. For instance, if all points (x0, f0), (x1, f1), …, (xn, fn) lie on a straight line then Pn(x) will be a polynomial of degree 1. Similarly, if all fi’s are same, then Pn(x) is constant (i.e. of degree 0). The method described above uses monomial 1, x, x2, …, xn as the basis for the interpolating polynomial. The biggest disadvantage of this method is that the condition number (a parameter that measures how much the output value of the matrix can change for a small change in its elements) of the Vandermonde matrix is very large; signifying large amount of efforts and error when Gauss elimination method is used for computation of ai’s. Fortunately, there are other simpler methods that can be used for construction of interpolating polynomial. These methods do not use the monomials 1, x, x2, …, xn as the basis of the interpolating polynomial. We describe them below: 4.2.1 Lagrange Form of Interpolating Polynomial Construction of Lagrange form of polynomial passing through n + 1 distinct points (x0, f0), (x1, f1), …, (xn, fn) is extremely simple. We first define a polynomial Ln,i (x) which is 1 at node xi and 0 at all other nodes n

Ln,i  x    j 0 j i

x  x  x  x  j

i

(4.8)

j

The product in the right is taken over all j from j = 0 to n except j = i. Note that each Ln,i(x) is precisely of degree n for i = 0, 1, 2, …, n. In the expanded form, polynomial Ln,i (x) is Ln ,i  x  

 x  x0  x  x1   x  xi 1  x  xi 1   x  xn   xi  x0  xi  x1   xi  xi 1  xi  xi 1   xi  xn 

so that when x = xi, the resulting product is 1, and when x = xi, i ≠ j, one of the factors in the numerator will be zero. Moreover, the fact that no two xi’s are same implies that the denominator is nonzero and the expression in the right-hand side is well defined. Now, we can construct the desired interpolating polynomial with the help of Ln,i(x). Define n

Ln  x    Ln ,i  x  f i

(4.9)

i 0

Since each Ln,i(x) is a polynomial of degree n; the summation in the right is a polynomial of degree at most n. Furthermore, the fact that Ln,i(xi) = 1 and Ln,i(xj) = 0 for i ≠ j, we can deduce that Ln(xi) = fi. Polynomial (4.9) defined above is known as Lagrange form of interpolating polynomial, named after J. L. Lagrange (1736 – 1813) who is usually considered to be a French mathematician but Italians Encyclopedia refers him an Italian mathematician. Terms Ln,i(x) serve as basis for the interpolating polynomials and often known as Lagrange’s fundamental polynomials. In terms of Kronecker delta δ i j, one can write

1 when i  j Ln ,i  x j    i j   0 when i  j

(4.10)

If each fi is equal to 1, then uniqueness of the interpolating polynomials implies that Ln(x) = 1. Thus, equation (4.9) leads to the result

Fundamentals of Numerical Methods

y0 

4.11

 0  1 0  1 0  3  0  3 0  1 0  3 y3  y  3  1 3  1 3  3  1  3 1  1 1  3 1 

or

y0  

or

y0 

 0  3 0  1 0  3  0  3 0  1 0  1 y1  y 1  31  11  3  3  3 3  1 3  1 3

1 9 9 1 y3  y1  y1  y3 16 16 16 16

1 9  y1  y1    y3  y3  16 

4.2.2 Newton Form of Interpolating Polynomial Isaac Newton (1643 – 1727) suggested an elegant way of expressing the interpolating polynomial. Instead of choosing monomials 1, x, x2, …, xn or Ln,i (x) as the basis, he expressed the polynomial as Pn  x   a0  a1 G1  x   a2 G2  x     an Gn  x 

(4.21)

where a0, a1, …, an are constants, and the basis polynomials Gk (x) are defined as Gk  x    x  x0  x  x1   x  xk 1  k  1, 2, ..., n

(4.22)

Each Gk(x) has exactly k zeros, namely x0, x1, …, xk–1. That is, Gk (xj) = 0 for j < k and Gk (xj) ≠ 0 for j ≥ k. In the expanded form, the interpolating polynomial (4.21) can be expressed as Pn  x   a0  a1  x  x0   a2  x  x0  x  x1     an  x  x0  x  x1   x  xn 1 

(4.23)

The basis polynomials in the right are in ascending order of their degree up to nth degree. Thus, the sum Pn(x) is essentially of degree n or less. Coefficients ai’s can determined by imposing the condition that the polynomial passes through the points (x0, f0), (x1, f1), …, (xn, fn), i.e. Pn  xi   f i ,

i  0, 1,  , n

which translate to the following conditions

a0  f 0

  a0  a1 G1  x1   f1   a0  a1 G1  x2   a2 G2  x2   f 2   ........................................................   a0  a1 G1  xn   a2 G2  xn     an Gn  xn   f n 

(4.24)

The above linear system can be expressed in matrix form as M a = f, where the matrices M, a and f are given by

Fundamentals of Numerical Methods

4.3

4.25

ERROR IN POLYNOMIAL INTERPOLATION

We have seen in last sections that while the interpolating polynomial P(x) agrees with underlying function f (x) at the interpolation points. There is no reason for us to believe these two will be close to each other elsewhere. The difference f (x) – P(x) measures the deviation of the interpolating polynomial from the actual function, and is refer to as interpolation error. In this section, we will study ways to estimate the interpolation error and strategies to minimize the error. We now prove an important theorem that provides an estimation of interpolation error in terms of derivatives of the actual function: Theorem 4.2. If f (x) is a function that has continuous n + 1 derivatives on the interval [x0, xn], and Pn(x) is the interpolating polynomial that agrees with f (x) at n + 1 distinct points x0, x1, …, xn, then for all x  [x0, xn],    [x0, xn] such that the interpolation error en is given by

en  x   f  x   Pn  x  

n 1 f ( n 1)     x  x j  (n  1)! j 0

(4.55)

Proof. Consider a point x[x0, xn]. If x is one of the interpolation points x0, x1, …, xn then f (x) and Pn(x) coincide with each other, and thus, the result (4.55) holds trivially. We will now show that the relation (4.55) also holds for an arbitrary point x other than the interpolation points. Define a function

  y   f  y   Pn  y    g  y 

(4.56)

where λ is constant, and function g(y) is a polynomial of degree n + 1, given by n

g  y   y  xj 

(4.57)

j 0

Note that the polynomial ϕ(y) vanishes at the n + 1 interpolation points x0, x1, …, xn. We choose the constant λ in such a manner that ϕ(y) vanishes at one more arbitrarily chosen point x in the interval [x0, xn] where x ≠ xj, j = 0, 1, …, n. In other words, the value of λ is obtained from the equation ϕ(x) = 0, x ≠ xj yielding f  x   Pn  x   n (4.58)  x  xj  j 0

Since x ≠ xj, j = 0, 1, …, n, the denominator in the right does not vanish, and therefore λ is well defined. Notice that the function f (y) and polynomials Pn(y) and g(y) have continuous derivatives of order n + 1 on the interval [x0, xn]. Thus, their linear combination, ϕ(y), also has continuous derivatives of order n + 1 on [x0, xn]. Moreover ϕ(y) has n + 2 zeros x0, x1, …, xn, x in [x0, xn]. Therefore, by generalized Rolle’s mean value theorem, ϕʹ(y) vanishes at least n + 1 times in [x0, xn]; ϕʹʹ(y) vanishes at least n times in [x0, xn] and so on. Finally, ϕ(n+1)(y) vanishes at least once in [x0, xn]. Denote this point by ξ so that

 ( n 1)     0 f ( n 1)    Pn

( n 1)

    g ( n 1)    0

(4.59)

Since Pn(ξ) is a polynomials of degree n; the terms Pn(n + 1)(ξ) vanish uniformly. Moreover, g(ξ) is a polynomial of degree n + 1, thus (4.57) implies that g(n+1)(ξ) = (n + 1)!

4.48

4.7

Interpolation

INTERPOLATION USING FINITE DIFFERENCES

4.7.1 Newton-Gregory Forward Interpolating Polynomial When the nodes of interpolation are equally spaced, Newton interpolating polynomial reduces to a simpler form containing forward differences. We have shown in equation (4.92) that if xj = x0 + i h, i = 0, 1, …, n, the divided difference can be expressed in terms of forward difference as f  x0 , x1 ,  , xn  

 n f  x0  n! hn

and therefore, Newton form of interpolating polynomial Pn  x   f  x0    x  x0  f  x0 , x1    x  x0  x  x1  f  x0 , x1 , x2      x  x0  x  x1   x  xn 1  f  x0 , x1 , , xn 

translates to

Pn  x   f 0   x  x0 

 f0 2 f0   x  x0  x  x0  h  h 2! h 2

    x  x0  x  x0  h   x  x0  (n  1)h 

n f0 n! hn

where f0 = f (x0). Letting u = (x – x0)/h in the above relation, we can write

Pn  x   f 0  u  f 0 

1 1 u  u  1  2 f 0    u  u  1 u  n  1  n f 0 2! n!

This result is known as Newton-Gregory forward interpolation formula. An alternative form is



where





Pn  x   f 0  u  f 0  u  2 f 0    u  n f 0 1 2 n

(4.101)

 uk   u(u 1)k(!u  k  1)

An alternative derivation of Newton-Gregory forward interpolation formula is straight from the definition of shift operator E without referring to Newton form of interpolating polynomial. Using relation u = (x – x0)/h, we can write x = x0 + u h. Therefore

f  x   f  x0  uh   E u f  x0   1    f  x0  u

Notice that the parameter u takes integral value only at the nodes x0, x0 + h, …, x0 + nh. For points other than these nodes, u is non-integral, and thus, the binomial expansion of the right-hand side leads to the following result

    f  x    u   f  x    u   f  x    u   f  x    1 2 n

f  x   1  u   u  2    u  n   f  x0    1 2 n 0

0

2

0

n

0

Fundamentals of Numerical Methods

4.53

In order to determine the number of workers earning wages in the range Rs 400 – 449, we first approximate the number of workers whose wages are less than 450 using Newton-Gregory backward interpolation formula. We have u = (x – x5)/h = (450 – 600)/100 = – 1.5. Now

f  450   f 5  u f 5 

u  u  1 2!

   (1.5)  98 

 2 f5   

u  u  1  u  4  5!

5 f5

1 1 (1.5)(1.5  1)(213)  (1.5)(1.5  1)(1.5  2)  4 2! 3!



1 (1.5)(1.5  1)(1.5  2)(1.5  3)  374 4!



1 (1.5)(1.5  1)(1.5  2)(1.5  3)(1.5  4)  723 5!

 1618  147  79.875  0.25  8.7656  8.4727  1408.613 Therefore, the number of workers whose wages are in the range 400 – 449 are: f (450) – f (400) = 1408.613 – 1209 = 199.613 ≈ 200 workers.

4.8

CENTRAL DIFFERENCE INTERPOLATION

While interpolating for the values of x which are in the beginning or end of a given table, it is understandable that one has no option but to rely on the leading differences. However, when interpolating near the middle part of a table, it might be of advantage if we could involve the data which are on both sides of the value of x between which we wish to interpolate. That is to say, instead of starting from the beginning of the table and choosing the terms f0, f1, f2, … we might shifts the origin suitably and prefer a formula that contains terms f–2, f–1, f0, f1, f2, etc. In central interpolation, we are not confined to the leading differences; the interpolation uses values of f (x) on the either sides of the origin. There are several formulae for central differences interpolation which can be directly developed from Newton-Gregory forward formula by shifting the origin appropriately and invoking properties of finite differences. In next few subsections, the derivation of some of the major central interpolation formulae is explained. 4.8.1 Gauss Forward Interpolation Newton-Gregory forward interpolation formula is P  x   f0  u  f0 

u  u  1 2!

 2 f0 

u  u  1 u  2  3!

3 f0 

u  u  1 u  2  u  3  4!

 4 f0  

where u = (x – x0)/h. Gauss forward formula can be simply derive by converting the leading differences f0, ∆ f0, ∆ 2f0, ∆ 3f0, ∆ 4f0, ∆5f0 etc in terms of the differences on the zigzag line as shown in Table 4.8, i.e. by f0, ∆f0, ∆2f–1, ∆3f–1, ∆4f–2 etc. The conversion needs to be done successively. That is, we first convert ∆2f0, ∆3f0, ∆4f0, ∆ 5f0 etc in terms of ∆2f–1, ∆ 3f–1, ∆4f–1, ∆ 5f–1 etc. In the resulting expression, ∆ 4f–1, ∆5f–1 etc are converted into ∆ 4f–2, ∆ 5f–2 etc. The procedure is illustrated below: From the definition of forward difference, we know that

Fundamentals of Numerical Methods

4.65

Construct a difference table with x = 30 as the origin. Laplace-Everett formula as far as fourth differences is

  v  v 2  12  2 v  v 2  12  v 2  22  4 P  x   v f 0   f 1   f 2  ... 3! 5!     u  u 2  12  2 u  u 2  12  u 2  22  4  u f1   f0   f 1   3! 5!   For x = 35, we have u = (35 – 40)/10 = 0.5, and v = 1 – u = 0.5. Substituting values of u, v and differences of f in the formula, we get u

x

f (x)

–2

10

1.0000

–1

20

1.301

0

30

1.4771

1

40

1.6021

2

50

1.699

3

60

1.7782

∆2 f

∆f

∆3 f

∆4 f

∆5 f

0.301 0.1761 0.125 0.0969 0.0792

– 0.1249 – 0.0511 – 0.0281 – 0.0177

0.0738 0.023 0.0104

– 0.0508

0.0382

– 0.0126

  0.5  0.52  12  0.5  0.52  12  0.52  2 2  P  35   0.5  1.4771  (0.0511)  (0.0508)  3! 5!     0.5  0.52  12  0.5  0.52  12  0.52  2 2    0.5  1.6021  (0.0281)  (0.0126)  3! 5!     0.73855  0.003194  0.000277812    0.80105  0.001756   1.544203

Actual value of the function is log10(35) = 1.5441.

4.9

INTERPOLATION BY ITERATION

In preceding sections, we discussed various algorithms to develop an explicit form of interpolating polynomial passing through n + 1 points. We now describe a different technique that generates value of the interpolating polynomial at the single value of the independent variable. This technique is iterative and very effective when employed on digital computers. The key concept of iterative interpolation is the fact that the interpolating polynomial of degree n passing through given n + 1 points can be obtained by combining two polynomials of degree n –1 that interpolate different sets of n points amongst the given n + 1 points. To grasp the concept, recall the linear interpolant passing through points (x0, f0) and (x1, f1). The interpolant can be written as

Fundamentals of Numerical Methods

4.10

4.73

HERMITE INTERPOLATION

We have seen in preceding sections that Lagrange and Newton interpolation deal with construction of a polynomial Pn(x) satisfying the conditions Pn  xi   f  xi  , i  0,1, , n

Now we turn our attention to slightly different interpolation problems in which the requirement is not only to match the values of function at given points but also to match the derivatives of the function. The concept is illustrated with an example. Example 4.32 Construct an interpolating polynomial P(x) that passes through points (0, –1), (1, 1) and satisfies the condition Pʹ(1) = 3. Assume a polynomial of degree n of the form Pn  x   a0  a1 x  a2 x 2    an x n

where the n + 1 constants a0, a1, …, an can be uniquely determined if n +1 conditions are given. In this example, 3 conditions, including the derivative conditions, are given. Thus it can be assumed that the interpolating polynomial is of second degree. That is P  x   P2  x   a0  a1 x  a2 x 2

so that

P2  x   a1  2a2 x

Since the polynomial P2(x) passes through the points (0, –1), (1, 1), and P2(1) = 3. Therefore

1  a0 1  a0  a1  a2 3  a1  2a2 Solving, we obtain: a0 = –1, a1 = 1, a2 = 1. Thus the polynomial is P2  x   1  x  x 2

In the above example, we were asked to equate derivatives of only first order. However, there may be requirement of matching the higher order derivatives. Hermite interpolation–named after French mathematician Charles Hermite (1822-1901)–uses an interpolant which is equal to the function and its derivatives of order up to m at n + 1 distinct points x0, x1, …, xn. This means that the observed values are:

 x0 , f 0  ,  x1 , f1  , ,  xn , f n   x0 , f 0 ,  x1 , f1 , ,  xn , f n   x0 , f 0 ,  x1 , f1 , ,  xn , f n 

 x , f  ,  x , f  , ,  x , f  0

(m) 0

1

(m) 1

n

(m) n

4.78

Interpolation

H3  x   1   1 

4.11

1 4 11 2 2  x  1   x  1   x  1  x  8 3 147 4116 1 4 11  x  1   x 2  2 x  1   x3  10 x2  17 x  8 3 147 4116

11x 3  222 x 2  1783 x  2544 4116

ERROR IN HERMITE INTERPOLATION

Now, we prove an important theorem for estimation of the error in Hermite interpolation. Theorem 4.6. Suppose that the function f (x) is n + 2 times differentiable on [x0, xn] and H2n+1(x) is the Hermit interpolating polynomial satisfying conditions (4.119) and (4.121) at n + 1 distinct points x0, x1, …, xn, then  x  [x0, xn],    [x0, xn] such that the error of interpolation e(x) is given by

e  x   f  x   H 2 n 1  x  

 x  x0   x  x1    x  xn   2n  2  ! 2

2

2

f (2 n  2)  

(4.130)

Proof. If x is one of the points x0, x1, …, xn, then the term

 x  x0   x  x1    x  xn   2n  2  ! 2

2

2

f (2 n  2)  

vanishes uniformly. Thus, the interpolation condition that H2n + 1(xi) = f (xi), i = 0, 1, …, n yields e x  0

Hence, the theorem holds trivially at the points of interpolation. Now we prove the theorem at a point other than x0, x1, …, xn. Choose an arbitrary x in the interval [x0, xn] such that x is not a point of interpolation, and define a function g(y) by

g  y   f  y   H 2 n 1  y  

 y  x0   y  x1    y  xn  2 2 2  x  x0   x  x1    x  xn  2

2

2

 f  x   H 2 n 1  x  

(4.131)

Since, the right-hand side is linear combination of differentiable functions; function g(y) defined above is also differentiable on [x0, xn], Moreover, at each point of interpolation xi, i = 0, 1, …, n, we have

 x  x0    xi  xi    xi  xn  g  xi   f  xi   H 2 n 1  xi   i 2 2 2  x  x0    x  xi    x  xn   f  xi   H 2 n 1  xi   0 2

Moreover,

g  x   f  x   H 2 n 1  x  

2

2

 f  x   H 2 n 1  x  

 x  x0    x  xn   f x  H 2 n 1  x    0 2 2     x  x0    x  xn  2

2

Fundamentals of Numerical Methods

4.81

polynomial of degree 99; and theoretically, because large degree polynomials have an awkward tendency to oscillate wildly. To make this point clear, we briefly describe Runge’s phenomenon– named after German mathematician C. D. T. Runge (1856–1927). Consider the function

f  x 

1 1  25 x 2

Runge observed that if this function is approximated between – 1 and 1 using n equidistant points xi given by 2i xi  1  n i = 0, 1, …, n. then contrary to what one would expect, the error increases as we increase the degree of interpolating polynomial. In Figure 4.4, we have plotted the actual function f (x) and its polynomial approximation of sixth (n = 6) and eighth (n = 8) degree. For sixth degree polynomial, the grid points are x = – 1, –2/3, –1/3, 0, 1/3, 2/3, 1, and the polynomial is

P6  x   1 

211600 2 2019375 4 1265625 6 x  x  x 24089 96356 96356

For n = 8, the grid points are –1, –3/4, –1/2, –1/4, 0, 1/4, 1/2, 3/4, 1 and the polynomial is P8  x   1 

98366225 2 228601250 4 383  10 6 6 2  108 8 x  x  x  x 7450274 3725137 3725137 3725137

1 0.5 f(x)

0

P6(x) P8(x)

-0.5 -1

-1

-0.6

-0.2

x

0.2

0.6

1

Figure 4.4 Approximation of f (x) = 1/(1 + 25 x2) by sixth and eighth degree polynomials in interval [–1, 1]

Observe two important things: (i) both P6(x) and P8(x) are oscillatory except in the middle part of the curve, and (ii) error in approximation of f (x) by P8(x) is more than that of P6(x). The error increase when we increase the degree of interpolant. This phenomenon is known as Runge’s phenomenon. Let’s now understand what causes this phenomenon. We have seen in an earlier section that the bound of error for n + 1 equally spaced grid points is given by

en  x  

M n 1 n 1 h 4  n  1

4.88

4.13

Interpolation

SPLINE INTERPOLATION

Piecewise interpolating polynomials discussed in preceding sections are easy to construct, but they lack smoothness at the endpoints of subintervals (see Example 4.35). In this section, we discuss another form of piecewise polynomial interpolant, known as ‘splines’. The term spline refers to a thin and flexible mechanical device used by drafters for drawing smooth curves in engineering and architect designing. Typically, a drafter would insert pins at judiciously selected points (known as knots) and bent the spline so that it passes through each knot. The idea is to choose an interpolant that guarantees the continuity of the curve as well of its tangent and curvature at each point. Importance of spline functions is best summed up by Rice (1969, page 123) 1 who quoted: “Spline functions are the most successful approximating functions for practical application so far discovered. The readers may be unaware of the fact that ordinary polynomials are inadequate in many situations. This is particularly the case when one approximates the functions which arise from the physical world rather than from the mathematical world. Functions which express physical relationship are frequently of a disjointed or dissociated nature. That is to say that their behavior in one region may be totally unrelated to their behavior in another region. Polynomials along with most other functions have just the opposite property. Namely, their behavior in one region determines their behavior everywhere. Splines do not suffer from this handicap since they are defined piecewise, yet, for k ≥ 3, they represent nice, smooth curves in the physical world .” Mathematically, the spline interpolation is another form of piecewise polynomial interpolation with certain continuity conditions at the nodes. Suppose that f0, f1, …, fn are the discrete values of a function f (x) at n + 1 nodes x0, x1, …, xn respectively, where x0 < x1 < … < xn. A spline of degree k is a function S(x) satisfying the following conditions: (i) S(x) passes through the points (xi, fi), i.e. S(xi) = f (xi), i = 0, 1, …, n (ii) on each subinterval [xi, xi+1], S(x) is a polynomial of degree k, and (iii) S(x) and its first (k – 1) derivatives are continuous on the interval (x0, xn). Suppose that the spline S(x) is made up of polynomials S1(x), S2(x), …, Sn(x) in subintervals [x0, x1], [x1, x2], …, [xn – 1, xn] respectively as shown in Figure 4.8. f

S1(x)

Si (x)

S2 (x) f1

f2

x1

x2

Sn (x)

fi–1

fi

xi–1

xi

f0

x0

fn–1

xn–1

fn

xn

X

Figure 4.8 Development of spline function S(x)

1 Rice J R (1969). The approximation of functions. Vol 2, Addison-Wesley, Reading, Massachusetts.

Fundamentals of Numerical Methods

Si  x  

4.93





1 2 2 M i  x  xi 1   M i 1  xi  x   Ai 2 hi

(4.158)

Since equation (4.158) is valid for xi –1 ≤ x ≤ xi; the expression for Si(xi) can be obtained directly by setting x = xi in (4.158). That is

Si  xi  

1 1 2 M i  xi  xi 1   Ai  hi M i  Ai 2 hi 2

(4.159)

However, in order to determine Si+1(xi), we must first replace i by i + 1 in (4.158), and then set x = xi in it. Simplifying, we get

Si1  xi   

1 1 2 M i  xi 1  xi   Ai 1   hi 1 M i  Ai 1 2 hi 1 2

(4.160)

where Ai+1 can be obtained by replacing i by i + 1 in equation (4.156). Now, the continuity condition of first order derivative, i.e. Si (xi) = Si+1(xi) implies that

1 1 hi M i  Ai   hi 1 M i  Ai 1 2 2 Finally, substitute the values of Ai and Ai + 1 in the above equation and simplify it further. This leads to the following recursive relation between Mi – 1, Mi and Mi + 1

 f f f  f i 1  hi M i 1  2  hi  hi 1  M i  hi 1 M i 1  6  i 1 i  i  hi  hi 1 

(4.161)

For i = 1, 2, …, n – 1, equation (4.161) provides a system of n – 1 linear equations in n + 1 unknowns, namely M0, M1, …, Mn. In order to determine these unknowns uniquely, two additional conditions are required which are often chosen as follows: a) Natural end conditions assume that the second order derivatives at both end points are zero. That is S1(x0) = M0 = 0 and Sn(xn) = Mn = 0. A cubic spline with these end conditions is known as natural cubic spline. Instead of choosing natural end conditions, we may also opt for some nonzero second order derivatives such as S1(x0) = f  (x0) and Sn(xn) = f  (xn). b) Clamped end conditions assume that the first order derivatives at the first and the last nodes are known. That is S1(x0) = f0 and Sn(xn) = fn. A cubic spline with these conditions is known as clamped cubic spline. Clamped cubic splines give better approximation of the function, but require numerical values of the derivative at the endpoints. c)

Periodic end conditions assume that S1(x0) = Sn(xn) and S1(x0) = Sn(xn). A cubic spline with these end conditions is known as periodic cubic spline. d) Not-a-knot end conditions do not specify any extra conditions at the end points. When we have no information other than the value of f at each interpolating point, it is advised to use Not-aknot conditions. Since, the cubic spline changes from one cubic to another at each knot, the idea of not-a-knot condition is not to change the cubic polynomial as we cross the first and the last interior knots, i.e. x1 and xn–1. Thus, a not-a-knot spline requires that the third derivative of the spline is continuous at x1 and xn–1, i.e. S1(x1) = S2(x1) and Sn-1(xn–1) = Sn(xn–1). Once the values of M0 and Mn are known, we can express the system of equation (4.161) as

Fundamentals of Numerical Methods

4.15

4.103

MULTIVARIATE POLYNOMIAL INTERPOLATION

Interpolation techniques for function of several independent variables are less developed than that of a single variable function. The main difficulty in polynomial interpolation of multivariate function is that the interpolating polynomial may not be unique. For instance, assume that we are given values of a bivariate function z = f (x, y) at n +1 points (x0, y0), (x1, y1), …, (xn, yn) in the x–y plane and asked to determine whether there can be a unique polynomial that attains value f (xi, yi) at each point (xi, yi), i = 0, 1, …, n. Obviously, answer to this question in general is ‘no’, because if we assume for a while that all the points (xi, yi, zi) lie on a straight line in the x–y–z space, then there may be infinite many planes passing through these points (xi, yi, zi). Each of these plane is a linear interpolant of the form z = a x + by + c. It means that even a linear interpolant through these points is not unique. However, if the points of interpolation are specifically chosen, the interpolating polynomial will be unique. For simplicity, we will restrict ourselves to functions of two independent variables only. Suppose that a function f (x, y) is tabulated for (m + 1)(n + 1) distinct points (xi, yj), where i = 0, 1, …, m, j = 0, 1, …, n. For simplicity, we denote the point (xi, yj) by pi, j and the corresponding value of function f (xi, yj) by fi, j as shown in the following table: y x x0

y0

y1



yn

f0,0

f0,1



f0,n

x1

f1,0

f1,1



f1,n

fm,0

fm,1

⁞ xm

… …

fm,n

Now, we seek a polynomial, P(x, y), of degree at most m in x and at most n in y satisfying interpolation conditions P  xi , y j   f i , j (4.185) i = 0, 1, …, m, j = 0, 1, …, n. Usually, the interpolating polynomial should be of the form m

n

P  xi , y j    ai , j x i y j

(4.186)

i 0 j 0

Equation (4.186) contains (m + 1)(n + 1) constants ai,j which can be determined using (m + 1)(n + 1) condition prescribed by equation (4.185). Since, number of conditions and unknown are equal; interpolating polynomial of the form (4.186) is unique. The interpolating polynomial can be formed more conveniently by using Lagrange form of interpolating coefficients. Consider

 x  xk  , k  0  xi  xk  m

X m, i ( x)  

i  1, 2, , m

(4.187)

j  0,1, , n

(4.188)

k i

and

n

Yn , j ( y )   k 0 k i

 y  yk 

y

j

 yk 

,

Clearly Xm, i(x) is of degree m and Yn, j(y) is of degree n. Thus the product Xm, i(x)Yn, j(y) is of degree m in x, and n in y. Moreover, this product vanishes at every interpolation point p, β = (x, yβ) except when  = i and β = j in which case the product is unity. Therefore, the polynomial

Chapter 5

NUMERICAL DIFFERENTIATION AND INTEGRATION

5.1

INTRODUCTION

The concepts of differentiation and integration are extremely important to model wide range of real life problems. Starting from a simple problem concerning slope of a tangent line, to a complicated fluid flow problem, derivatives and integrals have made their presence almost everywhere. The basic rules for computation of derivative and integrals are taught at the school level. These methods are sufficient to calculate most of the derivatives; however, calculation of integrals by analytical methods may not be always possible. Moreover, in practical application, we are often given discrete values of a function, and required to compute the derivatives and integral without knowing the actual function. For example, the position of a moving car is recorded every minute using a GPS system, and we are required to compute the speed of the car; or approximate the area of cross section of a river whose depth is measured every foot along its width (See Figure 5.1). dn

d0 d1

d2

d3

d4 depth

Figure 5.1 Cross sectional view of a river

In the first case, if the position of the car is known as a continuous function of time, we can calculate the speed by differentiating the position with respect to time. Similarly, in the second case, if the depth of the river is known as a function of distance (from one bank), the area of cross section can be obtained by integrating this function with respect to the distance. But when the depth is measured only at isolated points, integration is not possible analytically. The same is applicable to the moving car

5.19

Fundamentals of Numerical Methods 5.2.3 Numerical Differentiation using Undetermined Coefficients

We have seen in preceding sections that the approximation of a derivative of f (x) is eventually expressible as a linear combination of values f0, f1, f2 etc. The method of undetermined coefficients uses this concept. In this method, we express the kth order derivative of f (x) as a linear combination of some or all values amongst f0, f1, …, fn. The coefficients involved in the linear combination are then determined by solving the linear system which consists of as many equations as the number of unknowns. The first nonzero term of the remaining series measures the error of approximation. The procedure is illustrated with an example: We seek a numerical approximation of f (x0) in terms of f (x0 – h), f (x0) and f (x0 + h). Since, the expression sought is for second order derivative, it is assumed that

  x0   a0 f  x0  h   a1 f  x0   a2 f  x0  h  f num  ( x0 ) is approximation of f (x0), and a0, a1 and a2 are constants. To evaluate a0, a1 and a2, where f num expand the right-hand side by Taylor series, and combine the coefficients of the similar terms. We get   x0    a0  a1  a2  f  x0   h  a0  a2  f   x0   f num

h2  a0  a2  f   x0  2!

h3 h4  a0  a2  f   x0    a0  a2  f (iv )  x0   3! 4! Since, there are three unknowns; three equations are required to uniquely determine the unknowns. Thus, equating both sides the coefficients of f (x0), f (x0) and f (x0), we get 

a0  a1  a2  0 ;

a0  a2  2 / h 2

 a0  a2  0 ;

whose solution is: a0 = 1/h2, a1 = – 2/h2 and a2 = 1/h2. Consequently, the approximation is

  x0   f ( x0 )  f num

f  x0  h   2 f  x0   f  x0  h  h2

For truncation error, note from one of the above equations that

f   x0  

f  x0  h   2 f  x0   f  x0  h  h2



h h2  a0  a2  f   x0    a0  a2  f (iv )  x0   3! 4!

Substituting the values of a0 and a2, the second term in the right vanishes and therefore

TE  

h2 h2  a0  a2  f (iv )     f (iv )   4! 12

Example 5.7 Derive an approximation for second order derivative f (x0) in terms of values f (x0 – 2h), f (x0 – h), f (x0), f (x0 + h) and f (x0 + 2h). Determine the leading term of the error. Use the expression to estimate f (1.5) for the function f (x) = 1/x2 with h = 0.1. Compare the result with exact solution.

  x0  . So Denote numerical approximation of f (x0) by f num   x0   a0 f  x0  2h   a1 f  x0  h   a2 f  x0   a3 f  x0  h   a4 f  x0  2h  f num Expand the terms in the right-hand side by Taylor series and combine the similar terms

5.21

Fundamentals of Numerical Methods

5.3

ROUNDOFF ERRORS IN NUMERICAL DIFFERENTIATION

Recall the simplest approximation of the first order derivative

f   x0  

f  x0  h   f  x0 



h

h f    2

where the remainder term – h f ()/2, x0 <  < x0 + h denotes the truncation error. Note that the error is proportional to h and tends to zero as the step size h goes to zero. So, one might get an impression that the numerical approximation becomes better and better as h is reduced. But this is not the case. In reality, when h is reduced, the value of f (x0 + h) gets closer to f (x0), and at the same time, the denominator of the fraction also tends to zero. It means that in the above formula, difference of two approximately equal numbers is being divided by a small number h. This leads to another form of error, known as roundoff error. It is quite possible that if h is extremely small, the roundoff errors grow so large that the numerical differentiation becomes unstable. When we enter the values of f (x0) and f (x0 + h) in a digital computer, certain roundoff error are included in these values depending on the machine precision. That is to say that the actual calculation will be carried out using some values F(x0) and F(x0 + h) instead of f (x0) and f (x0 + h). The difference between the actual value and the value used in the calculation is the roundoff error. If we denote the respective roundoff errors by ε (x0) and ε (x0 + h) then

f  x0   F  x0     x0  f  x0  h   F  x0  h     x0  h 

and

Therefore, the numerical approximation of f (x0) becomes

f   x0  

F  x0  h   F  x0  h



  x0  h     x0  h 

h

2

f   

The second term in the right denotes the roundoff error (RE), whereas the third term is the truncation error (TE) in the numerical differentiation. Thus

f   x0   So

RE 

F  x0  h   F  x0  h

  x0  h     x0  h

TE 

and



 RE  TE

  x0  h  h



  x0  h

h f    2

If ε = Max {|ε(x0 + h)|, |ε(x0)|} and M2 = Max | f ()|, x0 <  < x0 + h, then we can write

RE 

2 h and TE  M 2 h 2

Note that when h → 0, the bound of truncation error (h M2)/2 tends to zero, but the roundoff error 2ε/h tend to infinity. In such situation, a more realistic approach would be to choose the step size in such a

5.23

Fundamentals of Numerical Methods

RE 

Now

4 h

and TE 

h2 M3 3

where ε = Max {|ε–2|, |ε–1|, |ε0|} and M3 = Max | f ()|, x0 – 2h <  < x0. For optimal step size, we need to minimize the value of 4 h 2  M3 h 3 which results in hopt = (6 ε/M3)1/3. 5.4

RICHARDSON EXTRAPOLATION FOR NUMERICAL DIFFERENTIATION

Richardson extrapolation–named after English mathematician L. F. Richardson (1881–1953)–is a simple and effective technique of developing higher order methods from the lower order ones. In this method, we use two approximations of same order and combine them in such a manner that the leading term of the error is eliminated. Thus, the order of the resulting approximation is always higher than that of actual approximations. Suppose that we have a method of order p for approximation of f (x0). If this method is used twice; one with step size h and other with k h, then for step size h

f   x0   Dh  C1 h p  O  h p 1 

(5.60)

where Dh is the numerical approximation of f (x0) and C1 is a constant which depends on the method we use. Similarly, for step size k h

f   x0   Dkh  C1 (k h) p  O  h p 1 

(5.61)

where Dkh is the numerical approximation of f (x0) with step size k h. Multiply equation (5.60) by k and subtract from it equation (5.61), we obtain

f   x0  

Dh k p  Dkh  O  h p 1  p k 1

p

(5.62)

implying that the expression

Dh k p  Dkh k p 1 is a higher order approximation of f (x0). To illustrate the procedure in a particular case, we apply the Richardson extrapolation to a second order approximation of f (x0). Recall that

f   x0  

f  x0  h   f  x0  h  2h



h2 f    , x0  h    x0  h 6

If Dh and Dh/2 are two approximations of f (x0) using step sizes h and h/2 respectively, then

f   x0   Dh  C1 h 2  O  h 4 

f   x0   Dh / 2  C1  h / 2   O  h 4  2

5.31

Fundamentals of Numerical Methods

5.5

NUMERICAL INTEGRATION

Evaluation of definite integrals of the form b

I   f  x  dx

(5.67)

a

is an important aspect in science and engineering–recall the example of cross section of a river stated in the beginning of this chapter. At school level, students learn many formulae and methods to integrate a function f (x). But these analytical methods can work only if the integrand f (x) has an antiderivative, i.e. there exists a function F(x) such that F (x) = f (x). In reality, we often require to integrate functions whose antiderivatives are either extremely complicated or not expressible in terms of standard functions. For instance, antiderivative of f (x) = e –sin x is not known, and therefore, this function cannot be integrated by analytical methods. Furthermore, one can think of analytical integration only if the function is expressed in a mathematical form. Imagine that we are given the discreet values of a function and expected to integrate it. For instance, given the depth of river at several equidistant points along its width, find the cross section of the river. In such cases, we need to explore an alternative technique which can approximate value of a definite integral irrespective of the form in which the integrand is expressed. The basic task in numerical integration–also known as numerical quadrature–is to compute an approximate value of a definite integral. The numerical approximation of integral differs from analytical integration in two ways. First, it is an approximation so it might contain certain error. Second, it does not provide the solution in terms of elementary functions; it only gives us a numerical value as an approximation of the integral. Typically, numerical techniques are meant to provide an estimation of a definite integral in terms of discreet observed value of the integrand. That is n

b

I   f  x  dx  w0 f 0  w1 f1    wn f n   wi f i a

(5.68)

i 0

where the parameters wi, i = 0, 1, …, n are known as quadrature weights. fi is short form of f (xi). For convenience, we denote the integral (5.67) by I ( f ) and its numerical approximation (5.68) by In( f ). The techniques of numerical integration are based on two different approaches. In the first approach, we fix equally spaced nodes (or abscissas) namely x0, x1, …, xn in the interval [a, b] and then approximate the function f (x) by a polynomial interpolating the points (xi, fi). Finally, the interpolating polynomial is integrated to obtain an approximation of the given integral. This approach is commonly known as Newton–Cotes quadrature method. In the second approach, the nodes are not fixed. Rather, we determine the nodes and weights for a given n such that the method is as high accurate as possible. This approach is known as Gaussian quadrature method. We discuss these approaches one by one in the subsequent sections. 5.5.1 Newton-Cotes Quadrature Methods Suppose that x0, x1, …, xn are the equally spaced nodes and f0, f1, …, fn be the corresponding values of function f (x). Consider Lagrange’s form of polynomial interpolating the points (xi, fi), i = 0, 1, …, n n

Ln  x    Ln , i  x  f i i0

where the associated polynomial Ln, i (x) are

(5.69)

5.34

Numerical Differentiation and Integration

Now from (5.75)

f  x

Pn 1  x 

f  xi  1  f ( n 1)    (n  1)! i  0  x  xi  Pn 1  xi  n



which results in n f  xi  d  f  x  1 d ( n 1)  f       2 dx  Pn 1  x   ( n  1)! dx  i  0  x  xi  Pn 1  xi 

Comparing the last term of this equation with (5.78), we obtain the desired result.



Besides the aforementioned results concerning error in quadrature, there is another approach which comes in handy whenever the degree of precision of a method is known. Suppose that a numerical quadrature rule n

I n  f    wi f i i 0

has degree of precision as n, then the method must provide exact values whenever f (x) is a polynomial of degree n or less. That is to say that the error in the quadrature formula is of the form

En  C f ( n 1)  

(5.79)

where  is in [a, b] and C is the error constant. The actual integral I ( f ) can be expressed in terms of numerical quadrature In( f ) and error En(x) as

I  f   I n  f   C f ( n 1)   In order to determine C, we set f (x) = xn+1 in both sides of the above equation. The resulting expression can be written as

C

 1  bn 2  a n 2 n   wi xin 1   (n  1)!  n  2 i 0 

(5.80)

Therefore, if the quadrature weights wi s are known, we can find the error constant C using (5.80) and consequently, the quadrature error using (5.79). Note that the formula (5.73) can determine the quadrature error whenever the abscissas xi’s are know; whereas, formula (5.80) can be used only when both abscissas and quadrature weights are know. Closed Newton-Cotes Formulae We now derive the simplest closed Newton-Cotes formula for n = 1. That is, we fit a linear interpolant between two points (a, f (a)) and (b, f (b)). Lagrange’s polynomials associated with linear interpolant are: xb xa and L1, 0  x   L1,1  x   a b ba Thus from (5.72), the quadrature weights become

w0  

b

a

 x  b dx a  b

and

w1  

b

a

 x  a dx b  a 

5.36

Numerical Differentiation and Integration

or

C

ba  1  b3  a 3 b  a 2   a  b2     12   2!  3 2 

3

Substituting the value of C in E1 = C f (), we obtain the same result as (5.82). We can infer from (5.82) that if b – a is not sufficiently small, trapezoidal rule would contain significant error. In such cases, interval [a, b] is divided into N subintervals [xi – 1, xi], i = 1, 2, …, N where x0 = a and xN = b such that each subinterval is of length h = (b – a)/N and apply the trapezoidal rule in each subinterval. Finally, these N results are added to obtain the value of the integral in [a, b]. This approach is commonly referred to as Composite Newton-Cotes quadrature. Trapezoidal rule in composite form can be expressed as N

b

I  f    f  x  dx    a

i 1

xi xi 1

f  x  dx

Thus N N 1 h h  I1  f     f  xi 1   f  xi     f 0  2 f i  f N  2 i 1 2 i 1 

(5.83)

where fi = f (xi). Relation (5.83) is known as the composite trapezoidal rule. It follows from (5.82) that the error E1 in the composite rule is N

E1    i 1

h3 h3 N  1 f  i    12 12  N



n

 f     , i 1

i

i  [ xi 1 , xi ]

Note that the term  f (i)/N = M (say) is the average of f (1), …, f (N). So, M must lie between values Min{f (x)} and Max{f (x)}, a ≤ x ≤ b. Since f (x) is continuous on [a, b]; it must attain all values between its minimum and maximum values at some point [a, b]. That means that there exists number   [a, b] such that f () = M, and therefore

E1  

h3 N h 2 (b  a ) f      f    12 12

(5.84)

If M2 is the bound of absolute value of f (x) in [a, b] then the bound of error E1 can be obtained from

E1 

h 2 (b  a ) M2 12

(5.85)

Example 5.12 Use composite trapezoidal method with (i) N = 4, and (ii) N = 8 to estimate the integral



 /2

0

sin x dx

Determine the bound of quadrature error in both cases. (i) For N = 4, we have h = (b – a)/N = π/8. The equally spaced abscissas are: x0 = 0, x1 = π/8, x2 = π/4, x2 = 3π/8, x4 = π/2, and the corresponding values of f (x) are f0 = sin 0 = 0, f1 = sin (π/8) = 0.38268, f2 = sin (π/4) = 0.70711, f3 = sin (3π/8) = 0.92388, f4 = sin (π/2) = 1. Writing the composite trapezoidal rule for N = 4

5.43

Fundamentals of Numerical Methods

h4 b  a 

E3 

80

M4

where h = (b – a)/N. To keep the error below 5  10–5, the number of pieces N must satisfy

b  a 

5

80 N 4

M 4  5  105

Solving, we obtain N = 8.8 ≈ 9, i.e. number of subintervals m = N/3 = 3. Open Newton-Cotes Formulae In open Newton-Cotes methods, the endpoints of the interval are not included in the abscissas, i.e. neither x0 = a nor xn = b; however, points a, x0, x1, …, xn and b are equally spaced with spacing x. The simplest open Newton-Cotes formula is for n = 0 in which we consider only one abscissa x0 in the interval [a, b]. Keeping in view that this abscissa has to be equally spaced from both a and b, we take x0 = (a + b)/2. Since a polynomial that interpolates only one point must be a constant; thus, the integrand f (x) is interpolated by the constant f (x0). The corresponding quadrature weight becomes b

b

a

a

w0   L0, 0 dx   dx  b  a Substituting this value in I0( f ) = w0 f0, the simplest open Newton-Cotes formula reads

 ab I0  f   b  a  f    2 

(5.99)

Relation (5.99) is known as the midpoint rule. Error in the method can be obtained by setting n = 0 in (5.73). Thereafter, we follow a procedure exactly similar to the error term of Simpson’s rule, and get

E0  x  

b  a  24

3

f   

(5.100)

where   [a, b]. Alternatively, we use equation (5.79) to obtain the error term (5.100). Using this approach, we find that with n = 0, x0 = (a + b)/2 and w0 = b – a, the value of C is zero, implying that the method returns exact value for linear polynomials. So we use f (x) = x2 and note that

C

3 2 1  b3  a 3  a  b   b  a   b  a       2!  3 24  2  

Substituting this in E0 = C f (), we get the same expression as (5.100). Degree of precision of midpoint method is 1. For composite midpoint method, divide [a, b] into N subintervals and consider the midpoint of each subinterval as the abscissa. This provides us N abscissas

 1 xi  a   i   h, i  0,1, , N  1  2

(5.101)

where h = (b – a)/N. The composite midpoint rule takes the form

I 0  f   h  f 0  f1    f N 1 

(5.102)

5.45

Fundamentals of Numerical Methods (ii) Open two-point rule is

I1  f  

3x  f 0  f1  2

where x = (b – a)/3 = 1/3; x0 = a + x = 1/3, x1 = a + 2x = 2/3; f0 = f (x0) = 3/4, f1 = f (x1) = 3/5. Thus

I1  f  

3 1 3 3    0.675 2 3  4 5 

I2  f  

4 x  2 f 0  f1  2 f 2  3

(iii) Open three-point rule is

where x = (b – a)/4 = 1/4; x0 = a + x = 1/4, x1 = a + 2x = 1/2, x2 = a + 3x = 3/4 ; f0 = 4/5, f1 = 2/3, f2 = 4/7. Thus

I2  f  

4 1 4 2 4   2    2    0.69206 3 4 5 3 7

Exact value of the integral is I = log (2) = 0.693147. From (5.100), bound of error of midpoint rule is |E0| ≤ (b – a)3M2/24, where M2 is the maximum value of | f (x)| in the interval [0, 1]. Since | f (x)| = 2/(1 + x)3, so M2 = 2 and |E0| ≤ 1/12 = 0.08333. From (5.105), the bound of error of two-point rule is |E1| ≤ ( b – a)3M2/36. Substituting values of a, b and M2 we obtain |E1| ≤ 1/18 = 0.055556. Lastly, from (5.107), bound of error of open three-point rule is |E2| ≤ 7(b – a)5M4/23040. Proceeding in similar manner with M4 = 24, we get |E2| ≤ (7  24)/23040 = 0.0072917. 5.5.2 Gaussian Quadrature Methods Quadrature methods discussed in Section 5.5.1 are based on equally spaced abscissas. Consequently, if n + 1 abscissas x0, x1, …, xn are chosen, an interpolating polynomial of degree n is considered and integrated. Thus, the definite integral b

I   f  x  dx a

is approximated by a numerical quadrature n

I n  f    wi f i i 0

where fi = f (xi). Quadrature weights wi are determined by integrating the associated interpolating polynomials. We have also seen in previous section that for a given n, the degree of precision of a method is n or n + 1 according as n is odd or even. Generally, these quadrature methods are suitable when the abscissas are pre specified. However when the integrand f (x) is known, we have the freedom to select the abscissas xi (at which the integrand f (x) requires to be evaluated) so as to achieve highest

5.53

Fundamentals of Numerical Methods

Example 5.19 Use (i) two-point, and (ii) three-point Gauss-Chebyshev formula to estimate the integral



1

1

1  x 2 cos x dx

Comparing the given integral with

I 

f  x

1

1

1  x2

dx

we get f (x) = (1 – x2) cos x. The two-point Gauss-Chebyshev formula is



1

1

f  x 1 x

dx 

2



 

1   f   2  2

 2

 1  f   2 

0.380122  0.380122  1.194189

Applying three-point Gauss-Chebyshev formula, we get



1

1

f  x

dx 

1  x2



 

3  f     f 0  3   2 

 3

 3  f     2  

0.161965  1  0.161965  1.386416

5.5.3 Gaussian Quadrature using Orthogonal Polynomials In this section, we describe a general procedure to obtain abscissas and weights of Gaussian quadrature using orthogonal polynomials. Before stating the main result, we briefly discuss few important aspects related to orthogonal polynomials. Suppose that {p0(x), p1(x), p2(x), …, pn(x)} be a family of polynomials which consists of n + 1 polynomials for a given positive integer n. Let the degree of each pk be k, i.e. p0 is constant, p1 is linear, p2 is quadratic, and so on. The family of polynomials is said to be orthogonal with respect to weight function w(x) over an interval [a, b] if the orthogonality relation b

 w  x  p  x  p  x  dx  0 i

a

j

is obeyed whenever i  j. Some examples of orthogonal polynomials are Legendre polynomials, Chebyshev polynomials, Laguerre polynomials, Hermite polynomials etc. Each of these polynomials has a specific weight function w(x) and interval [a, b]. For instance Legendre polynomials are orthogonal with respect to weight function w(x) = 1 over the interval [–1, 1]. Similarly, Chebyshev polynomials Tn(x) are orthogonal with respect to w(x) = (1 – x2)–1/2 over [–1, 1]. Information concerning origin of these polynomials can be found in any standard book on Special Functions. Here, we briefly describe the Legendre polynomials Pn(x) which arise as the solution of Legendre differential equation 2

1  x  ddxP 2

n 2

 2x

dPn  n  n  1 Pn  0 dx

and can be obtained from the Rodrigue’s formula

5.63

Fundamentals of Numerical Methods

E1 



1    x2 4 e x dx   w0 x04  w1 x14   f (iv )   ,    1, 1  4!   4 4 1   4  x2   1   1   (iv )  2  x e dx         f   4!  0 2  2 2     

Substituting x2 = t in the first integral, we get

E1 

1    t 3/ 2   (iv ) 1  5   (iv )  0 e t dt   f          f   4!  4  4!   2  4 

1 3    (iv )  (iv )  f     f    4!  4 4  48 Letting f (x) = cos x, we obtain 







5.6

2

e  x cos x dx 

 

 1   1  cos     cos     0.76024   1.3475 2   2  2 

ROMBERG INTEGRATION

We have seen in preceding sections that the accuracy of numerical integration can be increased by dividing the interval [a, b] into smaller subintervals, and applying a quadrature formula in each subinterval. This process–known as the composite quadrature method–increases the accuracy of the result, but requires large number of function evaluations. Now recall how Richardson extrapolation in Section 5.4 was used to improve the accuracy of derivatives. There, we used two approximations of same order and combined them in such a manner that the leading term of the error is eliminated. The resulting approximation was of higher order than the actual approximations. Romberg integration– named after German mathematician W. Romberg (1909 – 2003)–uses the concept of Richardson extrapolation to improve the accuracy of composite trapezoidal rule. It can be shown using EulerMaclaurin summation formula that if the integrand f (x) is sufficiently differentiable, then b

 f  x  dx  T  f   c h a

h

1

2

 c2 h 4  

(5.152)

where c1, c2, … are constant. The term Th ( f ) refers to the trapezoidal approximation of the integral with step-size h. Note that the trapezoidal approximation is of second order. The basic task in Romberg integration is to use trapezoidal approximations of the given integral with step sizes h and h/2, and combine the approximations in such a manner that the term c1 in equation (5.152) is eliminated. The resulting approximation will be of fourth order. For successive use of this process, the integral is approximated by trapezoidal method with step-sizes h, h/2, h/4 etc, and combined to get approximations of 2nd order. Now the 2nd order approximations are combined to obtain fourth order approximations. The process is repeated until a desired level of accuracy is obtained. Conventionally, the Romberg integral is denoted by Rk,i where index i denotes the level of extrapolation and index k controls the step-size h by the relation

h

ba 2k 1

(5.153)

5.65

Fundamentals of Numerical Methods

or

R4,1 

h  f 0  2  f1  f 2    f 6   f 7  2

R4,1 

1   64 16 64 4 64 16 64  1  1 2           0.78474712 16   65 17 73 5 89 25 113  2 

Using relation (5.153), we obtain the following table of Romberg integrals O(h2)

O(h4)

O(h6)

O(h8)

0.75 0.775

0.78333333

0.78279412

0.78539216

0.78552942

0.78474712

0.78539812

0.78539852

0.78539644

Exact value of the integral is π/4, i.e. 0.78539816.

5.7

QUADRATURE FORMULAE FOR DOUBLE INTEGRAL

Trapezoidal rule and Simpson's rule for single integrals can also be extended to multiple integrals. In this section, we develop quadrature method for estimation of the double integral of the form

I 

d

c

b

 f  x, y  dx dy

(5.155)

a

where a, b, c and d are constants. Typically, a Newton-Cotes quadrature formula uses two sets of equally spaced points, namely xi s and yj s to expresses the double integral in the form

I 

d

c

b

n

m

 f  x, y  dx dy   w a

j 0 i 0

i, j

f  xi , y j 

(5.156)

where wi,j are the quadrature weights. Development of trapezoidal rule and Simpson’s rule for approximation of double integrals is illustrated below: Set m = n = 1 and apply trapezoidal rule to the interior integral of (5.155), we get

I 

ba d  f  a, y   f  b, y   dy 2 c  d ba  d f  a, y  dy   f  b, y  dy    c c   2

Apply trapezoidal rule once again, the double integral becomes

I

 b  a  d  c  4

 f  a, c   f  a, d   f  b, c   f  b, d  

Comparing the quadrature with (5.156), we find that w0, 0 = w0, 1 = w1, 0 = w1, 1 = (b – a)(d – c)/4. For simplicity, denote x = (b – a)/n = b – a and y = (d – c)/n = d – c. So, the result can be expressed as

Chapter 6

NUMERICAL SOLUTION OF ORDINARY DIFFERENTIAL EQUATIONS

6.1

INTRODUCTION

Mathematical relationships that contain a function, its derivatives and independent variables are called differential equations. They are at the heart of many physical phenomenon and processes related to engineering, biological, business and economical sciences. Differential equations are divided in two major categories namely ordinary differential equations and partial differential equation. While an ordinary differential equation (ODE) arises in case of function of single variable; a partial differential equation (PDE) indicates the presence of two or more independent variables. An ODE of order n can be written in the form:

 dy d 2 y dn y    x, y, , 2 ,, n   0 dx dx dx  

(6.1)

An ODE is called an explicit differential equation if it can be expressed in the form

 dn y dy d n 1 y   f x , y , ,  ,   dx dx n dx n 1  

(6.2)

An ODE which is not explicit is known as an implicit differential equation. Another major classification of ODEs is based on their linear and nonlinear characteristic. A differential equation that does not contain: (a) product of two derivatives, and (b) product of dependent variable and its derivatives is called a linear differential equation. Differential equations that are not linear are called nonlinear differential equations. A function of independent variables that satisfies the given differential equation is called a solution of the differential equation. If a solution contains as many independent arbitrary constants as the order of the differential equation, then it is known as the general solution. Literature is replete with various analytical techniques to obtain closed form solution of the ordinary and partial differential equations; however, most of these techniques have very limited scope while dealing with real world problems. Often, differential equations describing complex processes do not admit closed form general solution

6.7

Fundamentals of Numerical Methods y ( x ) 

y ( x  h)  y ( x ) h h2  y ( x )  y ( x )  h 2! 3!

The term h and its higher powers in the right-hand side of the above equation can be abbreviated in the ‘big-oh notation’ as

y ( x) 

y ( x  h)  y ( x )  O ( h) h

which provides us an approximation of dy/dx as

y ( x) 

y ( x  h)  y ( x ) h

(6.7)

We can also expand y(x – h) by a Taylor series. For this, we have to replace h in equation (6.6) by – h. We get h2 h3 (6.8) y ( x  h)  y ( x )  hy ( x )  y ( x )  y ( x )   2! 3! which can be arranged as

y ( x) 

y ( x )  y ( x  h)  O ( h) h

Thus, another approximation of dy/dx is

y ( x) 

y ( x )  y ( x  h) h

(6.9)

Approximations obtained in both equations (6.7) and (6.9) are of first order, which means that the derivative term yʹ(x) is approximated by neglecting h and its higher power. A higher order approximation of dy/dx can be obtained by subtracting equation (6.8) from equation (6.6), leading to the result

y ( x)  or as

y ( x  h)  y ( x  h)  O(h 2 ) 2h

y ( x) 

y ( x  h)  y ( x  h) 2h

(6.10)

The approximation defined by equation (6.10) is of second order. 6.3.1 Forward Euler Method Let’s replace the derivative term in the left-hand side of equation (6.4) by equation (6.7), resulting in

y ( x  h)  y ( x )  h f ( x, y )

(6.11)

Equation (6.11) is the basis of the forward Euler method. The method is named after Swiss mathematician Leonhard Euler (1707 – 1783). It provides the solution at the current grid point in terms

6.15

Fundamentals of Numerical Methods

y2  0.5  2(0.2)(0.2)2 sin(0.5)  0.50767 Similarly, for i = 1, 2 and 3

y3  y1  2h x22 sin y2  0.5  2  0.2 (0.4) 2 sin (0.50767)   0.53111 y4  y2  2h x32 sin y3  0.50767  2  0.2 (0.6) 2 sin(0.53111)   0.58061 y5  y3  2h x42 sin y4  0.53111  2  0.2 (0.8) 2 sin(0.58061)   0.67153 The given differential equation is solved analytically. With given initial condition, the solution reads



y  2 tan 1 e x

3

/3

tan(0.25)



Analytical and numerical solutions are compared in the Table 6.2. Table 6.2 Comparison of numerical and analytical results of dy/dx = x2 sin y, y(0) = 0.5

Grid point xi :

0.2

0.4

0.6

0.8

1.0

Exact solution y(xi) :

0.50128

0.51032

0.53563

0.58816

0.68466

Numerical solution yi :

0.5

0.50767

0.53111

0.58061

0.67153

6.3.4 Trapezoidal Method An alternative approach to derive numerical methods of preceding sections is to integrate the differential equation dy/dx = f (x, y) with respect to x from x = xi to xi+1



xi 1 xi

xi 1  dy    dx  xi f ( x, y ) dx  dx 

y ( xi 1 )  y ( xi )  

xi 1

xi

f ( x, y ) dx

(6.22)

The integrand f (x, y) in right-hand side of equation (6.22) represents the slope of the solution curve y(x), which varies continuously from point to point of the solution curve. If the varying slope f (x, y) in the interval [xi, xi+1] is approximated by a constant value f (xi, y(xi)), i.e. by the slope at the initial point of the interval, then equation (6.22) becomes

y ( xi 1 )  y ( xi )  f  xi , y ( xi )  

xi 1 xi

y ( xi 1 )  y ( xi )  h f  xi , y ( xi ) 

In numerical approximation, this equation can be written as

yi 1  yi  h f ( xi , yi )

dx

6.18

6.4

Numerical Solution of Ordinary Differential Equations

ERROR IN NUMERICAL METHODS

When we use a numerical method to solve an initial value problem, we often want to know the accuracy of the solution. In previous examples, we saw that the numerical solution of a given problem did not agree completely with the corresponding analytical solution. The difference between the analytical and numerical solutions indicates error in the numerical solution. There are number of ways to describe the error. Suppose that yi denotes the numerical solution at ith grid point and y(xi) is the corresponding exact solution, then the difference

 i  y  xi   yi ,

i  1, 2, ..., N

(6.25)

is known as the global error of the numerical solution at the ith grid point. To highlight the magnitude of the numerical error at a specific grid point, we define the relative error

Relative error = [ ( y(xi) − yi ) / y(xi) ] × 100        (6.26)
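For instance, with the midpoint-method results of Table 6.2 these two error measures can be tabulated directly (a small sketch, using the tabulated values):

```python
exact = [0.50128, 0.51032, 0.53563, 0.58816, 0.68466]   # y(x_i) from Table 6.2
numer = [0.5,     0.50767, 0.53111, 0.58061, 0.67153]   # y_i from Table 6.2
for ye, yn in zip(exact, numer):
    eps = ye - yn                  # global error, equation (6.25)
    rel = eps / ye * 100           # relative error in percent, equation (6.26)
    print(f"eps = {eps:+.5f}   relative error = {rel:.3f} %")
```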

It is difficult to know the exact value of εi, because the true solution y(xi) is not known in most cases. However, we would be happy to get some estimate of εi. Even this is not an easy task, because the error propagates as we move from one grid point to another. The error in a numerical solution is usually a combination of the truncation error and the round-off error.

The Taylor series is used as a fundamental tool for deriving a numerical method. We have already seen in the Euler and midpoint methods that the derivative term of the initial value problem is replaced by a truncated Taylor series. This means that we actually approximate an infinite series by the first few significant terms of the series. Obviously, the numerical solution will contain a certain amount of error at each of the grid points. The error arising due to the use of a truncated Taylor series is called the truncation error.

The precision of our computation depends on the machine that we use. Digital computers have precision limits on their ability to represent numbers. Most computers round off real numbers after a certain number of decimal places and use this approximation to generate the solution at the next grid point. This leads to another form of error in numerical methods. Such unavoidable errors are called round-off errors, and they significantly affect the result when the number of steps N is quite large. Even if no truncation error is involved, round-off errors are present. Certain numerical methods are highly sensitive to round-off errors and occasionally lead to disastrous results. A classical example of what round-off error can do is the failure of a Patriot missile on 25 February 1991 in the Persian Gulf War. Antiballistic Patriot missiles were deployed by U.S. troops to intercept Iraqi Scud missiles. Due to poor handling of rounding errors, a Scud missile evaded the Patriot system and hit a U.S. army base in Dhahran, Saudi Arabia, killing 28 soldiers. Usually, truncation errors increase with step size, whereas round-off errors decrease as the step size increases.

In numerical methods, the error propagates. For example, when we apply the forward Euler method to an initial value problem, we obtain the solution y1 at the first grid point. This solution contains some error, because we have used a truncated Taylor series. We now use this value, and not the exact value y(x1), to calculate y2. Thus, the accuracy of y2 is influenced by the error contained in the estimate of y1 as well as by the truncated Taylor series used for the evaluation of y2. Hence, the value y2 contains more error than y1. The error accumulates further as we move on to the next grid point. A geometrical interpretation of the global error in the forward Euler method is presented in Figure 6.4. Curves C1, C2, C3, etc. are members of the family of curves that constitutes the solution of dy/dx = f (x, y). Here C1 is the solution curve satisfying the condition y(x0) = y0. When we obtain y1 using the forward Euler method, we switch from C1 to some other curve C2. It means that we approximate the exact solution BL by y2. Thus the error at this stage is BP. Now, when we move from point x1 to x2, we use the tangent to C2 at P instead of

τ_{i+1} = (h³/3!) y‴(ξ),    x_i < ξ < x_{i+1}

Third order Taylor series (p = 3):

yi 1  yi  h yi 

h2 h3 yi yi 2! 3!

Local truncation error is

 i 1 

h 4 ( iv ) y ( ), xi    xi 1 4!
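To make the mechanics concrete, here is a small sketch (an illustration of my own, not an example from the text) of the third-order Taylor method applied to dy/dx = x + y, y(0) = 1, for which the higher derivatives are easy to write by hand: y″ = 1 + y′ and y‴ = y″. The exact solution is y = 2eˣ − x − 1.

```python
import math

def taylor3(x0, y0, h, n):
    """Third-order Taylor method for y' = x + y:
    y_{i+1} = y_i + h*y'_i + h**2/2 * y''_i + h**3/6 * y'''_i."""
    x, y, ys = x0, y0, [y0]
    for _ in range(n):
        d1 = x + y          # y'
        d2 = 1 + d1         # y'' = 1 + y'
        d3 = d2             # y''' = y''
        y = y + h * d1 + h**2 / 2 * d2 + h**3 / 6 * d3
        x += h
        ys.append(y)
    return ys

ys = taylor3(0.0, 1.0, 0.1, 10)
print(ys[-1], 2 * math.exp(1.0) - 2.0)   # numerical vs exact value y(1) = 2e - 2
```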

The major advantage of the Taylor series method is that the upper bound of the error in its solution near x = x0 can be predicted. Let's write equation (6.51) for all x in the neighborhood of x0 as

y(x) = y0 + (x − x0) y′0 + ((x − x0)²/2!) y″0 + ⋯ + ((x − x0)^p / p!) y0^(p) + Ep(x)        (6.56)

where Ep(x) is the error in estimating the actual solution y(x) by a polynomial of degree p. Note that the terms y′0, y″0, y‴0, etc. in the right-hand side are the exact values of y′(x0), y″(x0), y‴(x0), etc. in the respective order. That means that the error Ep(x) is the same as the truncation error. If the polynomial consisting of the first p + 1 terms on the right is denoted by P(x), then the absolute value of the error is given by

| Ep(x) | = | y(x) − P(x) |        (6.57)

We now state a theorem that provides us a bound on the error Ep(x).

Theorem 6.3. (Lagrange's bound of error) Suppose that a function y(x) and its first p + 1 derivatives are continuous. If y(x) is approximated by a pth degree Taylor series polynomial P(x) centered at x = x0, then the global error bound is given by

| Ep(x) | ≤ ( M / (p + 1)! ) | x − x0 |^(p+1)        (6.58)

where M is the bound of | y^(p+1) | in the interval [x0, x].

Proof. Differentiating equation (6.57) p + 1 times and noting that P(x) is a polynomial of degree p, we get

Ep^(p+1)(x) = y^(p+1)(x)

If M is the bound of | y^(p+1)(x) | in the interval [x0, x], then the above equation translates to

| Ep^(p+1)(x) | ≤ M

The desired expression for the error bound can now be obtained by repeatedly integrating this relation p + 1 times. The first integration with respect to the variable x results in

∫ Ep^(p+1)(x) dx ≤ ∫ M dx = Mx + C

where C is the constant of integration. Also, by the property of integrals

6.7.2 Runge-Kutta Methods

The Taylor series method has a tremendous ability to keep the errors small, but it also has the major disadvantage of requiring higher order partial derivatives of the function f (x, y) at the beginning of each step in order to approximate the solution at the end of the step. Since most real-world problems involve complicated differential equations, the evaluation of higher order partial derivatives may be a very onerous task. Runge-Kutta methods constitute an important family of iterative methods for the numerical solution of initial value problems, developed around 1900 by the German mathematicians C. Runge and M. W. Kutta. These methods include a family of explicit and implicit schemes of various orders which maintain the desirable accuracy of the Taylor series method without requiring the evaluation of partial derivatives. Instead, they simulate the effect of higher order derivatives by simply evaluating the function f (x, y) several times in the interval [xi, xi+1].

The slope f (x, y) of the solution curve y(x) of the initial value problem dy/dx = f (x, y), y(x0) = y0 varies continuously in the interval [xi, xi+1]. A close look at the Euler methods reveals that in the forward Euler method, the varying slope of the solution curve in the interval [xi, xi+1] is approximated by f (xi, yi), whereas it is approximated by f (xi+1, yi+1) in the backward Euler method. The midpoint method uses the slope at the midpoint of the interval [xi, xi+2] to approximate the solution at the current grid point. The Runge-Kutta methods are based on the idea of taking a weighted average of the slopes at multiple points in the interval [xi, xi+1]. A Runge-Kutta method that takes ν slopes into account is defined as follows:

yi 1  yi  ( w1 K1  w2 K 2  w3 K 3    w K )

(6.60)

where w1, w2, …, wν are the weights associated with the slopes of the solution curve in the interval [xi, xi+1], and K1, K2, …, Kν are the slope parameters defined as follows:

K1  h f  xi  a1 h, yi  b11 K1  b12 K 2    b1 K     K 2  h f  xi  a2 h, yi  b21 K1  b22 K 2    b2 K         K  h f  xi  a h, yi  b 1 K1  b 2 K 2    b  K  

(6.61)

In order to specify a particular method, we need to provide an integer ν, and the parameters wi, ai and bij. Since K1, K2, K3, …, Kν are implicitly defined in the above equations, this method is known as a fully implicit ν-stage Runge-Kutta method. The derivation of implicit Runge-Kutta methods is exceedingly complicated. In this chapter, we will confine our discussion to ν-stage explicit Runge-Kutta methods, in which K1, K2, …, Kν are defined as follows:

K1  h f ( xi , yi )

   K 2  h f  xi  a2 h, yi  b21 K1         K  h f  xi  a h, yi  b 1 K1  b 2 K 2    b  1 K 1  

(6.62)


6.7.3 Adaptive Step-Size Control: Embedded Runge-Kutta Methods

Some differential equations may have solutions that change rapidly in certain intervals of x, while in other intervals they change relatively slowly. In such problems, keeping a constant step size throughout the solution domain may not be a good idea. Rather, the step size should be small where the solution changes rapidly, and reasonably large elsewhere. Our aim in this section is to develop numerical methods that can automatically select the step size to control the error in the approximations. A numerical method is called an adaptive method if it has the capability of selecting an appropriate step size at every step.

The first component of any adaptive method is a procedure to estimate the error in the approximated solution. It has been mentioned earlier that it is very difficult to guess the global error in the solution; however, we shall be happy even if we get a reasonably good idea of the local truncation error. So, in the following part of the section, we discuss a generic approach to estimate the local truncation error of a single-step method. Consider an initial value problem

dy  f  x, y  , y  x0   y0 dx

(6.82)

Let yi be an approximate solution of this problem at x = xi. Now suppose that the following single-step method of order p1 is used to estimate the solution at x = xi+1:

y_{i+1} = y_i + h ϕ( x_i, y_i, h )        (6.83)

Our aim is to obtain an estimate of the local truncation error of method (6.83) at the current step xi+1. Let's assume that another single-step method of order p2 (where p2 > p1) with increment function ϕ1(xi, yi, h) is also used to estimate the solution of (6.82) at xi+1. Suppose that the solution predicted by this method is ỹi+1. So, we have

yi 1  yi  h 1  xi , yi , h 

(6.84)

Now consider another initial value problem

dy/dx = f (x, y),    y(xi) = yi        (6.85)

whose exact solution at x = xi+1 is Yi+1. If method (6.83) is applied to problem (6.85) to approximate the solution at x = xi+1, then the expression yi + h ϕ(xi, yi, h) is based on the exact solution yi, and thus the difference Yi+1 − yi − h ϕ(xi, yi, h) contains only the local truncation error of method (6.83) at the current step xi+1. Since the method is of order p1 and yi + h ϕ(xi, yi, h) = yi+1, we can write



Yi 1  yi 1  C1 h p1 1  O h p1  2



(6.86)

Similarly, if method (6.84) is applied to the problem (6.85), we have



Yi 1  y i 1  O h p2 1



(6.87)

Subtracting (6.87) from (6.86) and noting that p2 > p1, we obtain



y i 1  yi 1  C1 h p1 1  O h p1  2


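A minimal sketch of how this estimate drives step-size control: the forward Euler method (order 1) is paired with Heun's method (order 2), their difference plays the role of C1 h^(p1+1), and the step is accepted or shrunk accordingly. The safety factor 0.9 and the acceptance rule below are common practical choices, not prescriptions from the text.

```python
import math

def adaptive_euler_heun(f, x0, y0, x_end, h, tol=1e-4):
    """Step-size control using two single-step methods of orders 1 and 2.
    err = |y_heun - y_euler| estimates the local truncation error of the Euler step."""
    x, y = x0, y0
    while x < x_end:
        h = min(h, x_end - x)
        k1 = f(x, y)
        y_euler = y + h * k1                          # order-1 prediction
        k2 = f(x + h, y_euler)
        y_heun = y + h * (k1 + k2) / 2                # order-2 prediction
        err = abs(y_heun - y_euler)                   # local error estimate, cf. (6.86)-(6.87)
        if err <= tol:                                # accept the step, keep the better value
            x, y = x + h, y_heun
        # adjust h; exponent 1/2 because the controlled (Euler) method is first order
        h *= 0.9 * math.sqrt(tol / err) if err > 0 else 2.0
    return x, y

print(adaptive_euler_heun(lambda x, y: x * x * math.sin(y), 0.0, 0.5, 1.0, 0.1))
```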

w3(x) = w0 + ∫_{x0}^{x} f3 ( x, u2, v2, w2 ) dx

      = 3 + ∫_0^x ( 1 + 5x + 3x² + (7/2) x³ + (5/8) x⁴ + (3/8) x⁵ ) dx

      = 3 + x + (5/2) x² + x³ + (7/8) x⁴ + (1/8) x⁵ + (1/16) x⁶

Now, substitute x = 0.5 in the expressions for u3(x), v3(x), w3(x). We get

u3(0.5) = 1 + 2(0.5) + 2(0.5)² + (7/6)(0.5)³ + (7/8)(0.5)⁴ + (1/4)(0.5)⁵ + (1/48)(0.5)⁶ = 2.708659

v3(0.5) = 2 + 3(0.5) + (3/2)(0.5)² + (11/6)(0.5)³ + (1/2)(0.5)⁴ + (3/8)(0.5)⁵ + (1/24)(0.5)⁶ = 4.147786

w3(0.5) = 3 + 0.5 + (5/2)(0.5)² + (0.5)³ + (7/8)(0.5)⁴ + (1/8)(0.5)⁵ + (1/16)(0.5)⁶ = 4.30957

6.9 CONVERGENCE AND STABILITY OF SINGLE-STEP METHODS

Earlier in this chapter, we defined three important terms, namely convergence, consistency and stability of a numerical method. Consistency of a numerical method refers to the reduction of the truncation error with step size. We say that a numerical method is consistent if τi → 0 as h → 0. Convergence, on the other hand, refers to the reduction in the global error with step size. A numerical method is said to be convergent if the global error | y(xi) − yi | tends to zero when the step size h becomes sufficiently small. Stability of a numerical method deals with the sensitivity of the solution to small perturbation errors. If the method does not let these perturbations grow beyond a stipulated bound, then we say that the method is stable. Mathematically, a numerical method is called stable if the combined effect of all errors (truncation and round-off) is bounded as the number of grid points N tends to infinity, i.e.

 i  y( xi )  yi  M

as N  

(6.108)

Among the aforementioned terms, convergence is the most important thing that we look for in a numerical method. If a numerical method is not convergent, then all the solutions that we obtain from it are just useless numbers. However, it is not easy to comment directly on the convergence of a method, because doing so requires a fair estimate of the global error, which is extremely difficult to obtain. So, in the sections to follow, we will try to comment on the convergence of a method by linking it to the remaining two aspects, namely consistency and stability.

Some authors define stability in terms of a perturbation of the initial condition. Let {yi} be the sequence of approximations of an initial value problem dy/dx = f (x, y), y(x0) = y0 obtained by a numerical method. Now suppose that {wi} is the sequence of approximations obtained by the same numerical method when the initial condition is slightly perturbed to y(x0) = y0 + ε. The method is said to be stable if there


yielding

[ y1 ]       [ y0 ]     [  0.98   0.18 ] [  1 ]     [  0.8  ]
[ z1 ]  =  B [ z0 ]  =  [ −0.18   0.80 ] [ −1 ]  =  [ −0.98 ]

and

[ y2 ]       [ y1 ]     [  0.98   0.18 ] [  0.8  ]     [  0.6076 ]
[ z2 ]  =  B [ z1 ]  =  [ −0.18   0.80 ] [ −0.98 ]  =  [ −0.928  ]

Example 6.28 If the three-stage Runge-Kutta method is applied to the problem

dy/dx = ω z,    dz/dx = −ω y,    ω ≠ 0

then determine the optimal step size that produces a stable solution.

Stability of the solution depends on the given problem as well as on the applied numerical method. In the present case, the eigenvalues of the coefficient matrix A are obtained from

A  I  0  

or

 0 

implying that λ = ± ω i. If h is the step size then the Runge-Kutta method produces stable results if 1 h  

1

or

h   2

2



h   6

3

1

 h2 2  h2 2  i h  1   1 2 6   2

2

or

 h 2 2  h 2 2  2 2 1    h  1   1 2  6   

or

h 6 6 h 4 4  0 36 12

resulting in the condition ℎ < √3/ .
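The bound is easy to verify numerically: with λ = ±ωi, the three-stage amplification factor E(λh) = 1 + λh + (λh)²/2 + (λh)³/6 has modulus at most one exactly up to h = √3/ω. A small check (the value ω = 2 is an arbitrary choice made for illustration):

```python
import math

omega = 2.0
E = lambda lam_h: 1 + lam_h + lam_h**2 / 2 + lam_h**3 / 6   # three-stage RK amplification factor

h_limit = math.sqrt(3) / omega
for h in (0.9 * h_limit, h_limit, 1.1 * h_limit):
    growth = abs(E(1j * omega * h))          # eigenvalues are lambda = +/- i*omega
    print(f"h = {h:.4f}   |E(i*omega*h)| = {growth:.6f}")
# |E| <= 1 only for h <= sqrt(3)/omega, confirming the stability bound.
```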

6.10 STIFF EQUATIONS

Stability is not an issue concerning the numerical method alone. It depends on the differential equation, the initial values and the numerical scheme applied for the solution. For instance, suppose that the initial value problem

6.11 MULTISTEP METHODS

In the preceding sections, we have discussed a few single-step methods. These methods determine the solution at a grid point with the help of the solution at the immediately previous grid point; there is no need to go further back into the history of the solution. Although methods like the Runge-Kutta methods might use the value of the function at several points between xi and xi+1, they do not retain this information for direct use in future approximations. Indeed, the entire information used by single-step methods is obtained within the interval over which the solution is being approximated. The main disadvantage of single-step methods is that a low order single-step method suffers from low accuracy, whereas a higher order method needs a large number of function evaluations at each grid point.

There is another class of numerical methods which require solutions at several previous grid points for estimating the current solution. These methods are known as multistep methods. A multistep method is called a k-step method if it uses the values of y and its derivative y′ at k previous nodes. These methods are further divided into two major categories, namely explicit and implicit methods. A k-step method is explicit if the current approximation yi+1 can be explicitly expressed in terms of k previous values of y and its derivative y′. A method that is not explicit is known as an implicit method. A k-step method for the numerical solution of dy/dx = f (x, y), y(x0) = y0 is expressible in the form

yi 1  a1 yi  a2 yi 1   ak yi k 1  h b0 yi1  b1 yi  b2 yi1   bk yik 1 

(6.137)

where a1, a2, …, ak, b0, b1, …, bk are constants, and i = k − 1, k, …, N − 1. If b0 = 0, the method is explicit, since it provides yi+1 explicitly in terms of the solutions at previous grid points; when b0 ≠ 0, the method is implicit, since the term yi+1 is present on both sides of the equation. Equation (6.137) expresses a linear relationship between yi+1, yi, …, yi−k+1 and y′i+1, y′i, …, y′i−k+1, so these methods are also known as linear multistep methods.

Linear multistep methods are divided into four major categories: (i) Adams-Bashforth methods, (ii) Adams-Moulton methods, (iii) Nyström and Milne-Simpson methods, and (iv) backward differentiation formulas. Methods of the Adams-Bashforth and Nyström families are explicit, whereas the Adams-Moulton, Milne-Simpson and backward differentiation formulas are implicit methods. An important point to note here is that all the aforesaid multistep methods are based on polynomial interpolation; they differ only in the manner in which this polynomial is used. While the first three categories rely on numerical integration of the interpolating polynomial, the last category uses numerical differentiation of the polynomial for approximation of the solution. Derivations of multistep methods of categories (i), (ii) and (iii) are based on a common approach: approximate the function f (x, y) of the initial value problem dy/dx = f (x, y), y(x0) = y0 by an interpolating polynomial, and integrate the resulting equation over an appropriate interval of x to obtain an estimate of y(x).

6.11.1 Adams-Bashforth Methods

Integrating the equation dy/dx = f (x, y) with respect to x from x = xi to x = xi+1, we get

y  xi 1   y  xi   

xi 1 xi

f  x, y ( x )  dx

The function f (x, y(x)) cannot be integrated without knowing y(x). Therefore, an acceptable approach is to replace f (x, y(x)) by an interpolating polynomial P(x), so that we can obtain the approximation

y ( xi 1 )  y ( xi )  

xi 1 xi

P  x  dx


The roots of the quadratic equation 0.065333 y4² + y4 − 0.843244 = 0 are given by

y4 = [ −1 ± √( 1 + 4 (0.065333)(0.843244) ) ] / ( 2 (0.065333) ) = 0.801295, −16.107496

We retain y4 = 0.801295. Lastly, for i = 3,

y5 = y3 − (h/3) [ x5² y5² + 4 x4² y4² + x3² y3² ]

   = 0.938393 − (0.1/3) [ (1.5)² y5² + 4 (1.4)² (0.801295)² + (1.3)² (0.938393)² ]

which reduces to the quadratic 0.075 y5² + y5 − 0.720992 = 0. The roots of this quadratic equation are

y5 = [ −1 ± √( 1 + 4 (0.075)(0.720992) ) ] / ( 2 (0.075) ) = 0.685726, −14.019059

We retain y5 = 0.685726. The analytical solution of the given initial value problem is y(x) = 3/(x³ + 1). A comparison of the analytical and numerical solutions is presented in Table 6.16.

Table 6.16 Comparison of the exact solution with the Milne-Simpson method for dy/dx = −x²y², y(1) = 1.5

Grid point xi :            1.1        1.2        1.3        1.4        1.5
Exact solution y(xi) :     1.287001   1.099706   0.938379   0.801282   0.685714
Numerical solution yi :    1.287008   1.099719   0.938393   0.801295   0.685726
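The whole computation of this example can be scripted. In the sketch below the exact solution supplies the starting values y0, y1, y2 (an assumption made for self-containedness; the text's own starting values are produced on the preceding pages), and at every step the implicit Milne-Simpson relation y_{i+1} = y_{i−1} + (h/3)(f_{i+1} + 4 f_i + f_{i−1}) is reduced to the same kind of quadratic in y_{i+1} that was solved above.

```python
import math

f = lambda x, y: -x**2 * y**2
exact = lambda x: 3.0 / (x**3 + 1.0)

h, x0 = 0.1, 1.0
xs = [x0 + i * h for i in range(6)]
ys = [exact(xs[i]) for i in range(3)]          # starting values y_0, y_1, y_2

for i in range(2, 5):
    # Milne-Simpson: y_{i+1} = y_{i-1} + h/3 * (f_{i+1} + 4 f_i + f_{i-1});
    # with f = -x^2 y^2 this becomes the quadratic  a*y**2 + y - c = 0  in y_{i+1}.
    a = h / 3 * xs[i + 1] ** 2
    c = ys[i - 1] + h / 3 * (-4 * xs[i] ** 2 * ys[i] ** 2 - xs[i - 1] ** 2 * ys[i - 1] ** 2)
    ys.append((-1 + math.sqrt(1 + 4 * a * c)) / (2 * a))     # retain the root near the solution
print([round(y, 6) for y in ys])    # compare with Table 6.16
```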

6.11.5 Backward Differentiation Formulas

Backward differentiation formulas (BDF) constitute a family of implicit methods that are the most popular for the numerical solution of stiff problems. These methods are derived by a procedure which is completely opposite to that of the Adams-Bashforth and Adams-Moulton methods. We have seen in the preceding sections that the multistep methods of the Adams family are derived by integrating the polynomial that approximates the function f (x, y(x)). In contrast, the backward differentiation formulas are derived by differentiating a polynomial that approximates y(x), and setting its derivative y′(xj) equal to f (xj, yj). If the solution y(x) of the initial value problem dy/dx = f (x, y), y(x0) = y0 is approximated by an interpolating polynomial Pk(x) of degree k passing through the k + 1 points (xi+1, yi+1), (xi, yi), (xi−1, yi−1), …, (xi−k+1, yi−k+1), then we have seen that

y(x) ≈ Pk(x) = y_{i+1} + ( (x − x_{i+1}) / h ) ∇y_{i+1} + ( (x − x_{i+1})(x − x_i) / (2! h²) ) ∇²y_{i+1}
        + ( (x − x_{i+1})(x − x_i)(x − x_{i−1}) / (3! h³) ) ∇³y_{i+1} + ⋯
        + ( (x − x_{i+1})(x − x_i) ⋯ (x − x_{i−k+2}) / (k! h^k) ) ∇^k y_{i+1}        (6.158)

Differentiating both sides with respect to x, and setting x = xi +1 and P ʹ(xi+1) = f (xi+1, yi+1), we obtain
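The resulting low-order members of the family include the two-step formula y_{i+1} = (4 y_i − y_{i−1})/3 + (2h/3) f(x_{i+1}, y_{i+1}) (BDF2), quoted here as a standard result since the text's derivation continues on the following page. The sketch below applies it to the linear problem dy/dx = −50 (y − cos x), y(0) = 0, chosen for illustration because the implicit equation can then be solved for y_{i+1} in closed form.

```python
import math

# BDF2 (a standard two-step backward differentiation formula, quoted for illustration):
#   y_{i+1} - (4/3) y_i + (1/3) y_{i-1} = (2h/3) f(x_{i+1}, y_{i+1})
# applied to dy/dx = -lam*(y - cos x), y(0) = 0, a mildly stiff linear test problem.
lam, h, n = 50.0, 0.05, 40
xs = [i * h for i in range(n + 1)]
ys = [0.0]
ys.append((ys[0] + h * lam * math.cos(xs[1])) / (1 + h * lam))   # backward Euler start-up step
for i in range(1, n):
    rhs = (4 * ys[i] - ys[i - 1]) / 3 + (2 * h / 3) * lam * math.cos(xs[i + 1])
    ys.append(rhs / (1 + 2 * h * lam / 3))        # implicit equation solved in closed form
print(ys[-1], "at x =", xs[-1])   # the solution settles close to cos(x) despite the large step
```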

6.13 STABILITY OF MULTISTEP METHODS

Recall the key concept of stability analysis that we discussed for single-step methods in Section 6.9. We mentioned there that a numerical method is called stable if the combined effect of all errors (truncation and round-off) is bounded as the number of grid points N tends to infinity, i.e.

 i  y ( xi )  yi  M

as N  

For the stability analysis of a single-step method, we used the test equation dy/dx = λy, y(x0) = y0, and converted the given method into the form yi+1 = E(λh) yi. Finally, for λ < 0, the interval of absolute stability of the single-step method was obtained from the condition | E(λh) | < 1. In the case of multistep methods, the stability analysis is carried out in a slightly different manner. First, we apply the multistep method to the same test equation. This converts the method into a homogeneous difference equation, which has a characteristic equation associated with it. Finally, the stability criterion of the multistep method is defined by imposing a condition on the roots of the characteristic equation. The complete procedure is described below.

Applying the general multistep method

yi 1  a1 yi  a2 yi 1    ak yi  k 1  h  b0 yi1  b1 yi  b2 yi1    bk yi k 1  yielding

yi 1  a1 yi  a2 yi 1    ak yi  k 1  h  b0 yi 1  b1 yi  b2 yi 1    bk yi  k 1 

(6.186)

Equation (6.186) is a kth order homogeneous difference equation with constant coefficients. Suppose that its solution is of the form

y_i = c ξ^i        (6.187)

where c is a constant. Using equation (6.187) in (6.186) and dividing the resulting equation by ξ^(i−k+1), we get

ξ^k − a1 ξ^(k−1) − a2 ξ^(k−2) − ⋯ − ak − λh ( b0 ξ^k + b1 ξ^(k−1) + b2 ξ^(k−2) + ⋯ + bk ) = 0        (6.188)

Equation (6.188) is known as the characteristic equation of the multistep method. Let's denote

ρ(ξ) = ξ^k − a1 ξ^(k−1) − a2 ξ^(k−2) − ⋯ − ak        (6.189)

σ(ξ) = b0 ξ^k + b1 ξ^(k−1) + b2 ξ^(k−2) + ⋯ + bk        (6.190)

so that equation (6.188) reduces to

ρ(ξ) − λh σ(ξ) = 0        (6.191)

The terms ρ(ξ) and σ(ξ) are known as the first and second characteristic polynomials, respectively. As mentioned earlier, for a consistent linear multistep method, we should have

ρ(1) = 0


The Routh-Hurwitz criterion for k = 3 requires that c0 > 0, c1 > 0, c2 > 0, c3 > 0 and c1 c2 − c0 c3 > 0. These conditions translate to

2 + (11/3) λh > 0,    4 − (2/3) λh > 0,    2 − 2λh > 0,    −λh > 0

and

( 4 − (2/3) λh ) ( 2 − 2λh ) + ( 2 + (11/3) λh ) λh > 0

Solving, we get that the interval of absolute stability is (−6/11, 0).

6.14 PREDICTOR-CORRECTOR METHODS

The major drawback of implicit multistep methods is that they require the solution of a nonlinear implicit equation at every step. This difficulty can be circumvented to a large extent by using predictor-corrector methods. Predictor-corrector methods are based on the idea of combining the strength of an explicit method with the high stability of an implicit method. These methods usually proceed in two steps. The first step is a predictor that calculates a rough approximation of the solution yi+1 by using an explicit method. The second step is a corrector that refines the rough approximation by using an implicit method. The corrector step is iterated until the refined values attain a predetermined level of accuracy. A predictor-corrector method has different variants depending on the number of times that the corrector is repeated.

Suppose that the predictor is an explicit linear k-step method

y^(0)_{i+1} = Σ_{j=1}^{k} a_j y_{i−j+1} + h Σ_{j=1}^{k} b_j f_{i−j+1}        (6.199)

where yi(0) 1 denotes the predicted numerical solution. The corrector is an iterative implicit linear multistep method. For simplicity, we assume that the corrector is also a k-step method, given by k

y^(s+1)_{i+1} = Σ_{j=1}^{k} a_j y_{i−j+1} + h b0 f ( x_{i+1}, y^(s)_{i+1} ) + h Σ_{j=1}^{k} b_j f_{i−j+1} ,    s = 0, 1, 2, …        (6.200)

Each time the corrector (6.200) is used, an evaluation of the function f ( x_{i+1}, y^(s)_{i+1} ) is made. A method is called a PEC method if it uses one predictor P, one evaluation E of the function f ( x_{i+1}, y^(s)_{i+1} ), and one corrector C. Suppose that the corrector C is iterated twice in the hope of achieving better accuracy, and each iteration requires one function evaluation E; then the mode is denoted by PECEC or P(EC)², where

P:    y^(0)_{i+1} = Σ_{j=1}^{k} a_j y_{i−j+1} + h Σ_{j=1}^{k} b_j f_{i−j+1}

E:    f^(0)_{i+1} = f ( x_{i+1}, y^(0)_{i+1} )

C:    y^(1)_{i+1} = Σ_{j=1}^{k} a_j y_{i−j+1} + h b0 f^(0)_{i+1} + h Σ_{j=1}^{k} b_j f_{i−j+1}

E:    f^(1)_{i+1} = f ( x_{i+1}, y^(1)_{i+1} )

C:    y^(2)_{i+1} = Σ_{j=1}^{k} a_j y_{i−j+1} + h b0 f^(1)_{i+1} + h Σ_{j=1}^{k} b_j f_{i−j+1}        (6.201)
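A concrete PECE sketch, using the two-step Adams-Bashforth method as predictor and the trapezoidal rule as corrector; this particular pairing is a common textbook choice made here for illustration, applied to the problem dy/dx = −x²y², y(1) = 1.5 of Table 6.16.

```python
def pece(f, x0, y0, h, n):
    """PECE mode: Predict (2-step Adams-Bashforth), Evaluate, Correct (trapezoidal rule), Evaluate."""
    xs = [x0 + i * h for i in range(n + 1)]
    ys = [y0, y0 + h * f(x0, y0)]                 # start-up value by forward Euler
    fs = [f(xs[0], ys[0]), f(xs[1], ys[1])]
    for i in range(1, n):
        y_pred = ys[i] + h / 2 * (3 * fs[i] - fs[i - 1])           # P
        f_pred = f(xs[i + 1], y_pred)                              # E
        y_corr = ys[i] + h / 2 * (fs[i] + f_pred)                  # C  (trapezoidal corrector)
        ys.append(y_corr)
        fs.append(f(xs[i + 1], y_corr))                            # E
    return ys

print(pece(lambda x, y: -x**2 * y**2, 1.0, 1.5, 0.1, 5))
```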


For rapid convergence of the method, we would like h L |b0| to be much smaller than unity. In practice we prefer h L |b0| ≈ 0.1 or thereabouts. Condition (6.206) can be readily applied to the multistep methods discussed in the preceding sections. For instance, when the second-order Adams-Moulton method is used as corrector, we have b0 = 1/2, so the convergence condition is h L < 2. Similarly, for the third and fourth order methods we respectively have b0 = 5/12, implying that h L < 12/5, and b0 = 3/8, which implies that h L < 8/3.

6.15 MODIFIED PREDICTOR-CORRECTOR METHODS

If the orders of the predictor and corrector methods are equal, then this feature can be used to estimate the local truncation error and thereby reduce the number of iterations of the corrector formula, leading to a modified predictor-corrector method. To illustrate this process, we take the explicit midpoint method as predictor and the implicit trapezoidal method as corrector. Both predictor and corrector are second-order methods, with truncation errors as follows.

Predictor (midpoint method):

yi 1  yi 1  2h f  xi , yi  ,  iP1 

h3 y  1  , 3

xi 1  1  xi 1

(6.207)

Corrector (trapezoidal method):

y_{i+1} = y_i + (h/2) [ f ( x_{i+1}, y_{i+1} ) + f ( x_i, y_i ) ],    τ^C_{i+1} = −(h³/12) y‴(ξ2),    x_i < ξ2 < x_{i+1}        (6.208)

Denote the predicted value by y^(0)_{i+1} and the first corrected value by y^(1)_{i+1}; it follows that the exact solution y(x_{i+1}) is given by

y(x_{i+1}) = y^(0)_{i+1} + (h³/3) y‴(ξ1)

y(x_{i+1}) = y^(1)_{i+1} − (h³/12) y‴(ξ2)

Subtracting, we get

y^(1)_{i+1} − y^(0)_{i+1} = (h³/3) y‴(ξ1) + (h³/12) y‴(ξ2)        (6.209)

Assuming that the third-order derivative does not vary much in the interval, i.e. y‴(ξ1) ≈ y‴(ξ2) = y‴(ξ), the above equation becomes

y^(1)_{i+1} − y^(0)_{i+1} ≈ (5h³/12) y‴(ξ),    x_{i−1} < ξ < x_{i+1}        (6.210)

We now have a measure of the local truncation error in terms of y^(0)_{i+1} and y^(1)_{i+1}. This can be used in (6.207) and (6.208) to obtain the truncation errors of the predictor and corrector formulas. Here, we get

τ^P_{i+1} ≈ (4/5) ( y^(1)_{i+1} − y^(0)_{i+1} )        (6.211)

τ^C_{i+1} ≈ −(1/5) ( y^(1)_{i+1} − y^(0)_{i+1} )        (6.212)
