NOLTA, IEICE Paper

Backward error bounds for 2 × 2 linear systems arising in the diagonal pivoting method

Kenta Kobayashi 1,3 a) and Takeshi Ogita 2,3 b)

1 Graduate School of Commerce and Management, Hitotsubashi University, 2-1 Naka, Kunitachi, Tokyo 186-8601, Japan
2 Division of Mathematical Sciences, Tokyo Woman's Christian University, 2-6-1 Zempukuji, Suginami-ku, Tokyo 167-8585, Japan
3 CREST, JST

a) [email protected]
b) [email protected]

Received October 13, 2014; Revised January 21, 2015; Published July 1, 2015

Abstract: Matrix factorizations such as LU, Cholesky and others are widely used for solving linear systems. In particular, the diagonal pivoting method can be applied to symmetric and indefinite matrices. Floating-point arithmetic is extensively used for this purpose. Since finite precision numbers are treated, rounding errors are involved in computed results. In this paper rigorous backward error bounds for the 2 × 2 linear systems which arise in the factorization process of the diagonal pivoting method are given. These bounds are much better than previously known ones.

Key Words: rounding error analysis, backward error bound, floating-point arithmetic, diagonal pivoting method

1. Introduction and main result

In this paper we are concerned with the rounding error analysis of the diagonal pivoting method (also known as the block LDL^T factorization) when using floating-point arithmetic. Since finite precision numbers are used, rounding errors are involved in computed results. For the backward error analysis of the diagonal pivoting method, it is necessary to derive a backward error bound for the 2 × 2 linear systems solved during the factorization process. For the diagonal pivoting method there are several pivoting strategies such as Bunch–Parlett [1], Bunch–Kaufman [2] and so forth. Rounding error analyses for the diagonal pivoting method are presented in [3–5]; see [5] for details.

Let F be a set of floating-point numbers. We assume the use of the following standard model of floating-point arithmetic, barring overflow and underflow: for a, b ∈ F and ◦ ∈ {+, −, ∗, /} it holds that


    fl(a ◦ b) = (1 + δ)(a ◦ b),   |δ| ≤ u.    (1)
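As a concrete illustration of the model (1), the rounding error δ of a single binary64 operation can be extracted exactly with rational arithmetic. The following is only an illustrative sketch (the operands are arbitrary and the check is not part of the analysis):

```python
from fractions import Fraction

u = Fraction(1, 2**53)            # unit roundoff of IEEE 754 binary64

a, b = 0.1, 0.3                   # arbitrary operands, already rounded to F
computed = a * b                  # fl(a * b), one floating-point operation
exact = Fraction(a) * Fraction(b) # exact product of the stored operands

delta = (Fraction(computed) - exact) / exact   # fl(a*b) = (1 + delta)(a*b)
assert abs(delta) <= u                          # |delta| <= u, as in (1)
print(float(delta), float(u))
```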

Throughout the paper we use the notation δi with |δi| ≤ u, i = 1, 2, . . ., to treat the rounding errors in floating-point arithmetic. For rounding error analysis as in [6] we introduce the constant

    γn = nu / (1 − nu)

with the implicit assumption nu < 1. If nu ≪ 1, then γn ∼ nu.

We refer to backward error bounds for a solution of a 2 × 2 linear system which is a pivot in the factorization process of the diagonal pivoting method. We denote the pivot as the real symmetric 2 × 2 matrix A:

    A = [ a11  a12
          a21  a22 ] ∈ F^{2×2},   a12 = a21 =: β.

We assume that the following conditions hold:

    |a11| < α|β|,          (2)
    |a11 a22| < α² β²,     (3)

where α is a constant for controlling the pivoting, e.g. α = (1 + √17)/8 ∈ [0.64, 0.641] in Bunch–Kaufman pivoting. In fact, the conditions (2) and (3) hold in most pivoting strategies such as Bunch–Parlett, Bunch–Kaufman and others.

We also assume that the 2 × 2 linear system Ax = b with b = (b1, b2)^T ∈ F² is successfully solved by using floating-point arithmetic, and a computed solution x̂ is obtained. According to the Oettli–Prager theorem [6, Theorem 7.3], the componentwise backward error defined as

    ω(x̂) := min_{ΔA} { ε : (A + ΔA)x̂ = b, |ΔA| ≤ ε|A| }

is characterized by

    ω(x̂) = max_i |(b − Ax̂)i| / (|A||x̂|)i   (with 0/0 := 1),    (4)

where | · | denotes the componentwise absolute value for vectors or matrices. We can see that ω(x̂) depends on A, b and x̂. Our goal is to provide an upper bound of ω(x̂) for arbitrary A and b satisfying the conditions (2) and (3), namely, a constant c such that

    (A + ΔA)x̂ = b,   |ΔA| ≤ c|A|.    (5)

For any c ≥ ω(x̂) there exists a backward error ΔA satisfying (5). Note that ΔA is not uniquely determined in general.

The approximate solution x̂ of Ax = b depends on how the system is solved, and so does the constant c. There are mainly two ways. One is to apply Gaussian elimination with partial pivoting (GEPP) for solving Ax = b. The other is to compute the explicit inverse A⁻¹ of A, as in

    A⁻¹ b = 1/(a11 a22 − a21²) [ a22   −a21
                                 −a21   a11 ] b    (6)

          = 1/(a21 (a11/a21 · a22/a21 − 1)) [ a22/a21   −1
                                              −1         a11/a21 ] b.    (7)

In [3], estimations of ω(x̂) for α = (1 + √17)/8 are given for the Bunch–Kaufman pivoting in the cases of GEPP and of the explicit inversion with scaling as in (7). The latter is adopted in the LAPACK auxiliary routine xLASYF. These estimations are

    c = { 4γ2 + O(u²)      (GEPP)
        { γ180 + O(u²)     (explicit inversion with scaling as in (7))

      ≤ { 8u               (GEPP)
        { 180u             (explicit inversion with scaling as in (7)).    (8)

Note that these are derived by ignoring O(u²) terms. Therefore we modify this point for the sake of completeness. Moreover, we also deal with the case of using the form (6), which does not apply scaling. We will show that using (6) gives a smaller c than (7). Although some rough estimates often suffice to show the backward stability of numerical algorithms, rigorous and computable ones are mandatory in verified numerical computations; in particular, precise estimations are preferable. Therefore we refine the backward error bounds for the 2 × 2 linear system Ax = b, i.e., we try to provide upper bounds of ω(x̂) that are as small as possible.

By our rounding error analysis we will prove the following theorem.

Theorem 1.1   Suppose α ∈ [0.64, 0.641]. Under the conditions (2) and (3) we can choose

    c := { 3(1 + 2α²)/2 · γ2                for u < 2⁻¹    (GEPP)
         { (9 + α²)/(13(1 − α²)) · γ13      for u ≤ 2⁻⁷    (explicit inversion without scaling as in (6))
         { (10 + 3α²)/(16(1 − α²)) · γ16    for u ≤ 2⁻¹¹   (explicit inversion with scaling as in (7))    (9)

and

    c ≤ { 5.5u     for u ≤ 2⁻⁹     (GEPP)
        { 16u      for u ≤ 2⁻¹³    (explicit inversion without scaling as in (6))
        { 19.1u    for u ≤ 2⁻¹³    (explicit inversion with scaling as in (7)).    (10)

Note that α = (1 + √17)/8 ∈ [0.64, 0.641]. It turns out that each of the bounds in the above theorem is considerably smaller than that in (8).
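To see how the constants in (9) translate into the plain multiples of u given in (10), one can simply evaluate them numerically. The sketch below is only an arithmetic check (assuming binary64, u = 2⁻⁵³, and the Bunch–Kaufman value α = (1 + √17)/8); it is not part of the proof of Theorem 1.1:

```python
import math

u = 2.0**-53                               # unit roundoff of binary64
alpha = (1 + math.sqrt(17)) / 8            # Bunch-Kaufman constant, in [0.64, 0.641]

def gamma(n):                              # gamma_n = n*u / (1 - n*u)
    return n * u / (1 - n * u)

c_gepp    = 3 * (1 + 2 * alpha**2) / 2 * gamma(2)
c_inv     = (9 + alpha**2) / (13 * (1 - alpha**2)) * gamma(13)
c_inv_scl = (10 + 3 * alpha**2) / (16 * (1 - alpha**2)) * gamma(16)

# Each ratio c/u should stay below the corresponding constant in (10):
# roughly 5.46 (< 5.5), 15.95 (< 16) and 19.04 (< 19.1).
for name, c in [("GEPP", c_gepp), ("inverse", c_inv), ("scaled inverse", c_inv_scl)]:
    print(f"{name:15s} c = {c / u:.3f} * u")
```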

2. Rounding error analysis

For later use we first present the following lemma.

Lemma 2.1   Let m, n ∈ N and a, u ∈ R with 0 < a < 1 and 0 < u < n⁻¹ be given. For k ∈ N satisfying

    k ≥ (m + an) / ((1 − a)(1 − nu)),

it holds that

    (1 − u)^m − a(1 + u)^n ≥ (1 − ku)(1 − a).

Proof.   It is straightforward that

    (1 − ku)(1 − a) ≤ (1 − (m + an)u/((1 − a)(1 − nu)))(1 − a) = 1 − a − (m + an)u/(1 − nu)
                    = (1 − mu/(1 − nu)) − a(1 + nu/(1 − nu)) ≤ (1 − mu) − a(1 − nu)⁻¹
                    ≤ (1 − u)^m − a(1 + u)^n,

since 1 − pu ≤ (1 − u)^p ≤ 1/(1 + u)^p for any p ∈ N.   □

In the following we will prove (9). The inequalities (10) immediately follow from (9). The proofs are given by carefully treating the rounding errors and taking their cancellations into account. Put ξk := 1 + δk. Then 1 − u ≤ |ξk| ≤ 1 + u. Moreover, we abbreviate ξk1 ξk2 · · · ξkm as ξk1,k2,...,km for readability.
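Although the proof above is elementary and self-contained, Lemma 2.1 can also be sanity-checked numerically. The following sketch (random parameters, exact rational arithmetic; purely illustrative and not needed for the analysis) verifies the inequality for the smallest admissible k:

```python
import random
from fractions import Fraction
from math import ceil

random.seed(0)
for _ in range(1000):
    m, n = random.randint(1, 20), random.randint(1, 20)
    a = Fraction(random.randint(1, 99), 100)          # 0 < a < 1
    u = Fraction(1, random.randint(n + 1, 10 * n))    # 0 < u < 1/n
    k = ceil((m + a * n) / ((1 - a) * (1 - n * u)))   # smallest k allowed by Lemma 2.1
    lhs = (1 - u)**m - a * (1 + u)**n
    rhs = (1 - k * u) * (1 - a)
    assert lhs >= rhs
print("Lemma 2.1 holds on all sampled cases")
```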


2.1 Gaussian elimination with partial pivoting

The process of solving the linear system Ax = b by GEPP can be written as follows:

    w := a11/β,   z1 := b2,   z2 := b1 − w z1,
    x2 := z2 / (β − w a22),   x1 := (z1 − a22 x2) / β.

Note that from the condition (2) the first and second rows are permuted. Taking the rounding errors into account we have

    ŵ := (a11/β) ξ1,
    x̂2 := ((b1 − ŵ b2 ξ2) ξ3 / ((β − ŵ a22 ξ4) ξ5)) ξ6,
    x̂1 := ((b2 − a22 x̂2 ξ7) ξ8 / β) ξ9.

A little computation yields

    x̂ = 1/((β² − a11 a22 ξ1,4) ξ5) [ −a22 ξ3,6,7,8,9    {β² ξ5 + a11 a22 ξ1 (ξ2,3,6,7 − ξ4,5)} ξ8,9 / β
                                      β ξ3,6              −a11 ξ1,2,3,6                               ] b,

which implies

    Ã x̂ = b,    (11)

where

    Ã := [ ã11  ã12
           ã21  ã22 ]
       = [ a11 ξ1,2/ξ8,9    {β² ξ5 + a11 a22 ξ1 (ξ2,3,6,7 − ξ4,5)}/(β ξ3,6)
           β/ξ8,9            a22 ξ7                                        ].    (12)

Here

    |ã11 − a11| = |a11| |ξ1,2/ξ8,9 − 1| ≤ ((1 + u)²/(1 − u)² − 1) |a11| ≤ 2γ2 |a11|,

    |ã12 − a12| = |β (ξ5/ξ3,6 − 1) + (a11 a22 ξ1/β)(ξ2,7 − ξ4,5/ξ3,6)|
                ≤ [ ((1 + u)/(1 − u)² − 1) + α²(1 + u)((1 + u)²/(1 − u)² − (1 − u)²) ] |a12|
                = 1/(1 − u)² [ ((1 + u) − (1 − u)²) + α²(1 + u)((1 + u)² − (1 − u)⁴) ] |a12|
                ≤ 1/(1 − 2u) [ (3u − u²) + α²(6u + 2u²) ] |a12|
                = u/(1 − 2u) [ 3(1 + 2α²) − u(1 − 2α²) ] |a12| ≤ 3(1 + 2α²)u/(1 − 2u) |a12|
                ≤ 3(1 + 2α²)/2 · γ2 |a12|,    (13)

    |ã21 − a21| = |β| |1/ξ8,9 − 1| ≤ (1/(1 − u)² − 1) |β| < γ2 |a21|,

and

    |ã22 − a22| = |a22 (ξ7 − 1)| ≤ u |a22|.

Note that we utilize 1 − 2α² > 0 in (13). Hence we have

    c := 3(1 + 2α²)/2 · γ2,

which implies c ≤ 5.5u for u ≤ 2⁻⁹.
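The elimination process above translates directly into a few lines of code. The following is a minimal sketch (binary64 assumed; the helper names gepp_2x2 and omega are ours, not from the paper) that solves a random pivot satisfying (2) and (3) and evaluates the componentwise backward error via (4):

```python
import numpy as np

u = 2.0**-53                     # unit roundoff of binary64
alpha = (1 + np.sqrt(17)) / 8

def gepp_2x2(a11, a22, beta, b1, b2):
    # rows are permuted first, since (2) gives |a11| < alpha*|beta| <= |beta|
    w  = a11 / beta
    z1 = b2
    z2 = b1 - w * z1
    x2 = z2 / (beta - w * a22)
    x1 = (z1 - a22 * x2) / beta
    return np.array([x1, x2])

def omega(A, b, x):
    # componentwise backward error, formula (4); degenerate 0/0 entries not handled here
    return float(np.max(np.abs(b - A @ x) / (np.abs(A) @ np.abs(x))))

# draw one random pivot satisfying the conditions (2) and (3)
rng = np.random.default_rng(0)
while True:
    a11, a22, beta, b1, b2 = rng.standard_normal(5)
    if abs(a11) < alpha * abs(beta) and abs(a11 * a22) < (alpha * beta)**2:
        break

A = np.array([[a11, beta], [beta, a22]])
b = np.array([b1, b2])
print(omega(A, b, gepp_2x2(a11, a22, beta, b1, b2)) / u)   # expected to stay below 5.5
```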

2.2 Explicit inversion without scaling

As is the case with the previous subsection, let x̂ denote a computed solution of Ax = b obtained by using (6). The process of solving Ax = b via (6) can be written as follows:

    μ := β² − a11 a22,
    g11 := −a22/μ,   g12 = g21 := β/μ,   g22 := −a11/μ,
    x1 := g11 b1 + g12 b2,   x2 := g21 b1 + g22 b2.

Taking the rounding errors into account yields

    μ̂ := (β² ξ1 − a11 a22 ξ2) ξ3,
    ĝ11 := −(a22/μ̂) ξ4,   ĝ12 = ĝ21 := (β/μ̂) ξ5,   ĝ22 := −(a11/μ̂) ξ6,
    x̂1 := (ĝ11 b1 ξ7 + ĝ12 b2 ξ8) ξ9,   x̂2 := (ĝ21 b1 ξ10 + ĝ22 b2 ξ11) ξ12.

Hence, we have

    x̂ = 1/μ̂ [ −a22 ξ4,7,9    β ξ5,8,9
               β ξ5,10,12     −a11 ξ6,11,12 ] b.

Then

    Ã x̂ = b,   Ã := ψ [ (ξ6,11/ξ9) a11    (ξ5,8/ξ12) β
                         (ξ5,10/ξ9) β      (ξ4,7/ξ12) a22 ],

with

    ψ := (β² ξ1 − a11 a22 ξ2) ξ3 / (β² ξ5,5,8,10 − a11 a22 ξ4,6,7,11).

It follows that |Ã − A| ≤ ε|A|, where

    ε := max_{(a,b,c) = (6,11,9), (5,8,12), (5,10,9), (4,7,12)} | ψ · ξa,b/ξc − 1 |.

Substituting ξi by (1 − u) or (1 + u) so as to maximize ε yields

    ε ≤ | (β²(1 + u) − |a11 a22|(1 − u)) / (β²(1 − u)⁴ − |a11 a22|(1 + u)⁴) · (1 + u)³/(1 − u) − 1 |
      = [ β²{(1 + u)⁴ − (1 − u)⁵} + |a11 a22|(1 − u)(1 + u)³ u ] / [ {β²(1 − u)⁴ − |a11 a22|(1 + u)⁴}(1 − u) ]
      ≤ [ ((1 + u)⁴ − (1 − u)⁵)/(1 − u) + α²(1 + u)³ u ] {(1 − u)⁴ − α²(1 + u)⁴}⁻¹.

Here we have

    ((1 + u)⁴ − (1 − u)⁵)/(1 − u) + α²(1 + u)³ u ≤ 9u/(1 − u) + α² u/(1 − 3u) ≤ (9 + α²) u/(1 − 3u).

Moreover, by Lemma 2.1 it holds for u ≤ 2⁻⁷ that

    (1 − u)⁴ − α²(1 + u)⁴ ≥ (1 − 10u)(1 − α²).

Thus

    ε ≤ u/((1 − 3u)(1 − 10u)) · (9 + α²)/(1 − α²) ≤ (9 + α²)/(13(1 − α²)) γ13 =: c,

which implies c ≤ 16u for u ≤ 2⁻¹³.
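The recurrences at the beginning of this subsection are equally short in code. A minimal illustrative sketch (binary64; the sample pivot is arbitrary but chosen to satisfy (2) and (3)):

```python
import numpy as np

u = 2.0**-53

def inv_solve_2x2(a11, a22, beta, b1, b2):
    # explicit inverse as in (6), without scaling
    mu  = beta * beta - a11 * a22
    g11 = -a22 / mu
    g12 = beta / mu            # = g21 by symmetry
    g22 = -a11 / mu
    return np.array([g11 * b1 + g12 * b2, g12 * b1 + g22 * b2])

# one sample pivot satisfying (2) and (3) for alpha in [0.64, 0.641]
a11, a22, beta, b1, b2 = 0.3, -0.5, 1.0, 0.7, -0.2
A = np.array([[a11, beta], [beta, a22]])
b = np.array([b1, b2])
x = inv_solve_2x2(a11, a22, beta, b1, b2)
omega = float(np.max(np.abs(b - A @ x) / (np.abs(A) @ np.abs(x))))   # formula (4)
print(omega / u)               # expected to stay below 16
```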


2.3 Explicit inversion with scaling

As is the case with the previous subsection, let x̂ denote a computed solution of Ax = b obtained by using (7). The process of solving Ax = b via (7) can be written as follows:

    w1 := a11/β,   w2 := a22/β,
    μ := β(w1 w2 − 1),   g11 := w2/μ,   g12 = g21 := −1/μ,   g22 := w1/μ,
    x1 := g11 b1 + g12 b2,   x2 := g21 b1 + g22 b2.

Taking the rounding errors into account yields

    ŵ1 := (a11/β) ξ1,   ŵ2 := (a22/β) ξ2,
    μ̂ := β(ŵ1 ŵ2 ξ3 − 1) ξ4,5,
    ĝ11 := (ŵ2/μ̂) ξ6,   ĝ12 = ĝ21 := −(1/μ̂) ξ7,   ĝ22 := (ŵ1/μ̂) ξ8,
    x̂1 := (ĝ11 b1 ξ9 + ĝ12 b2 ξ10) ξ11,   x̂2 := (ĝ21 b1 ξ12 + ĝ22 b2 ξ13) ξ14.

Hence, we have

    x̂ = 1/((a11 a22 ξ1,2,3 − β²) ξ4,5) [ a22 ξ2,6,9,11    −β ξ7,10,11
                                          −β ξ7,12,14      a11 ξ1,8,13,14 ] b.

Then

    Ã x̂ = b,   Ã := ψ [ (ξ1,8,13/ξ11) a11    (ξ7,10/ξ14) β
                         (ξ7,12/ξ11) β        (ξ2,6,9/ξ14) a22 ],

with

    ψ := (β² − a11 a22 ξ1,2,3) ξ4,5 / (β² ξ7,7,10,12 − a11 a22 ξ1,2,6,8,9,13).

It follows that

    |Ã − A| ≤ max(ε1, ε2) |A|,

where

    ε1 := max_{(a,b,c,d) = (1,8,13,11), (2,6,9,14)} | ψ · ξa,b,c/ξd − 1 |

and

    ε2 := max_{(a,b,c) = (7,10,14), (7,12,11)} | ψ · ξa,b/ξc − 1 |.

In a similar way to the previous subsection, it holds that

    max(ε1, ε2) ≤ | (β² − |a11 a22|(1 − u)³) / (β²(1 − u)⁴ − |a11 a22|(1 + u)⁶) · (1 + u)⁵/(1 − u) − 1 |
                = [ β²{(1 + u)⁵ − (1 − u)⁵} + |a11 a22|(1 − u)(1 + u)⁵{(1 + u) − (1 − u)²} ] / [ {β²(1 − u)⁴ − |a11 a22|(1 + u)⁶}(1 − u) ]
                ≤ [ ((1 + u)⁵ − (1 − u)⁵)/(1 − u) + 3α²(1 + u)⁵ u ] {(1 − u)⁴ − α²(1 + u)⁶}⁻¹.

Here we have

    ((1 + u)⁵ − (1 − u)⁵)/(1 − u) + 3α²(1 + u)⁵ u < 10u/(1 − 5u) + 3α² u/(1 − 5u) = (10 + 3α²) u/(1 − 5u).

Moreover, by Lemma 2.1 it holds for u ≤ 2⁻¹¹ that

    (1 − u)⁴ − α²(1 + u)⁶ ≥ (1 − 11u)(1 − α²).

Thus

    max(ε1, ε2) ≤ u/((1 − 5u)(1 − 11u)) · (10 + 3α²)/(1 − α²) < (10 + 3α²)/(16(1 − α²)) γ16 =: c,

which implies c ≤ 19.1u for α ∈ [0.64, 0.641] and u ≤ 2⁻¹³.
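For completeness, the scaled variant (7), the one used in xLASYF, looks as follows in code. Again this is only an illustrative sketch under the same assumptions as in the previous subsections:

```python
import numpy as np

u = 2.0**-53

def scaled_inv_solve_2x2(a11, a22, beta, b1, b2):
    # explicit inverse as in (7): the diagonal entries are first divided by beta
    w1 = a11 / beta
    w2 = a22 / beta
    mu  = beta * (w1 * w2 - 1.0)
    g11 = w2 / mu
    g12 = -1.0 / mu            # = g21
    g22 = w1 / mu
    return np.array([g11 * b1 + g12 * b2, g12 * b1 + g22 * b2])

a11, a22, beta, b1, b2 = 0.3, -0.5, 1.0, 0.7, -0.2   # same sample pivot as before
A = np.array([[a11, beta], [beta, a22]])
b = np.array([b1, b2])
x = scaled_inv_solve_2x2(a11, a22, beta, b1, b2)
omega = float(np.max(np.abs(b - A @ x) / (np.abs(A) @ np.abs(x))))   # formula (4)
print(omega / u)               # expected to stay below 19.1
```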


Fig. 1. Distribution histogram of ω(x̂) for 10⁶ data sets of normally distributed pseudo-random numbers among 500 bins.

Table I. Statistical results of ω(x̂) for 10⁶ data sets of normally distributed pseudo-random numbers.

                        GEPP                    Explicit inversion       Explicit inversion
                                                without scaling          with scaling
    Median              0.65 × 10⁻¹⁶            0.86 × 10⁻¹⁶             0.89 × 10⁻¹⁶
    Maximum value       3.97 × 10⁻¹⁶ ∼ 3.57u    7.54 × 10⁻¹⁶ ∼ 6.79u     8.03 × 10⁻¹⁶ ∼ 7.23u
    Standard deviation  0.40 × 10⁻¹⁶            0.62 × 10⁻¹⁶             0.64 × 10⁻¹⁶

3. Sharpness of the error bounds

Recall from Theorem 1.1 that

    c ≤ { 5.5u     for u ≤ 2⁻⁹     (GEPP)
        { 16u      for u ≤ 2⁻¹³    (explicit inversion without scaling as in (6))
        { 19.1u    for u ≤ 2⁻¹³    (explicit inversion with scaling as in (7))

holds for α = (1 + √17)/8. In this section we show how sharp the error bounds are. We assume the use of IEEE 754 binary64 floating-point arithmetic with rounding to nearest (ties to even), so that u = 2⁻⁵³.

Let us first consider the following example. For v := 2⁻²⁷, let

    a11 := 23(1 + 9v),
    a22 := −36(1 + 20v + 16u),
    β = a12 = a21 := 46(1 − 14v),
    b1 := 16 − 15v,
    b2 := −32(1 − 16v + u).

In the case of using GEPP, we have |ã12 − a12| ≤ c|a12| with c = 5.816··· × 10⁻¹⁶ ∼ 5.24u, which is slightly smaller than 5.5u but shows the sharpness of the error bound in a way. On the other hand, ω(x̂) = 4.403··· × 10⁻¹⁶ ∼ 3.97u. As mentioned before, the difference between ω(x̂) and c is due to the arbitrariness of ΔA. Thus it turns out that the bound 5.5u for GEPP is very reasonable with respect to c, but not necessarily to ω(x̂). In a similar way, we think we may also find examples that nearly attain the other bounds of c for the explicit inversions.

Next, we evaluate c in Theorem 1.1 as an upper bound of ω(x̂) from a statistical perspective. In Fig. 1, we display a distribution histogram of ω(x̂) for 10⁶ data sets (a11, a22, β, b1, b2) of normally distributed pseudo-random numbers. Note that all the data sets satisfy the conditions (2) and (3). As expected, we can see from Fig. 1 that the results seem to be log-normally distributed. The medians, maximum values and standard deviations are also displayed in Table I. It can be seen from the results that c in Theorem 1.1 is at most about three times larger than ω(x̂). We think this is reasonable as an a priori estimate for ω(x̂).
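The specific example above can be reproduced directly. The sketch below builds the data set exactly as defined, solves by GEPP following the recurrence of Section 2.1, and evaluates ω(x̂) via (4); on a binary64 platform with round-to-nearest it is expected to reproduce the value ω(x̂) ≈ 3.97u reported above:

```python
import numpy as np

u, v = 2.0**-53, 2.0**-27

# the example data set from this section
a11  = 23 * (1 + 9 * v)
a22  = -36 * (1 + 20 * v + 16 * u)
beta = 46 * (1 - 14 * v)
b1   = 16 - 15 * v
b2   = -32 * (1 - 16 * v + u)

# GEPP with the rows permuted, as in Section 2.1
w  = a11 / beta
x2 = (b1 - w * b2) / (beta - w * a22)
x1 = (b2 - a22 * x2) / beta

A = np.array([[a11, beta], [beta, a22]])
b = np.array([b1, b2])
x = np.array([x1, x2])
omega = float(np.max(np.abs(b - A @ x) / (np.abs(A) @ np.abs(x))))
print(omega, omega / u)   # the text reports 4.403e-16, i.e. about 3.97*u, for this example
```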


Acknowledgments

The authors wish to thank the two anonymous referees for their valuable comments.

References

[1] J.R. Bunch and B.N. Parlett, "Direct methods for solving symmetric indefinite systems of linear equations," SIAM J. Numer. Anal., vol. 8, no. 4, pp. 639–655, 1971.
[2] J.R. Bunch and L. Kaufman, "Some stable methods for calculating inertia and solving symmetric linear systems," Math. Comp., vol. 31, pp. 163–179, 1977.
[3] N.J. Higham, "Stability of the diagonal pivoting method with partial pivoting," SIAM J. Matrix Anal. Appl., vol. 18, no. 1, pp. 52–65, 1997.
[4] I. Slapničar, "Componentwise analysis of direct factorization of real symmetric and Hermitian matrices," Linear Alg. Appl., vol. 272, no. 1–3, pp. 227–275, 1998.
[5] H.R. Fang, "Stability analysis of block LDL^T factorizations for symmetric indefinite matrices," IMA J. Numer. Anal., vol. 31, no. 2, pp. 528–555, 2011.
[6] N.J. Higham, "Accuracy and Stability of Numerical Algorithms," 2nd ed., SIAM, Philadelphia, PA, 2002.

