A Columnwise Recursive Perturbation Based Algorithm for Symmetric Indefinite Linear Systems

Fred Gustavson
IBM Watson Research Center
P. O. Box 218, Yorktown Heights, NY 10598, USA

Jerzy Wasniewski
UNIC, Building 304 - DTU
DK-2800 Lyngby, Denmark

Abstract

It has recently been shown that the recursive formulation of some numerical linear algebra algorithms leads to better performance on computer architectures with hierarchical memory. In the present paper we discuss a recursive algorithm for the solution of symmetric indefinite linear systems. A columnwise perturbation approach is presented. Numerical tests on an IBM SMP node illustrate the better performance compared to the corresponding LAPACK subroutines.

Key words: recursive algorithm, symmetric indefinite system

Alexander Karaivanov
UNIC, Building 304 - DTU
DK-2800 Lyngby, Denmark

Plamen Yalamov
University of Rousse
7017 Rousse, Bulgaria

This research is supported by the UNIC collaboration with the IBM T.J. Watson Research Center at Yorktown Heights. The third author was partially supported by the Danish Natural Science Research Council through a grant for the EPOS project (Efficient Parallel Algorithms for Optimization and Simulation). The last author was partially supported by Grant I-702/97 from the Bulgarian Ministry of Education and Science.

1 Introduction

Recursive algorithms for dense linear algebra problems are developed in [1, 2, 6, 9]. They show that recursion leads to better performance on processors with a tiered memory system. Also, the codes using recursion are very concise and easy to write in languages that support recursion (e.g. Fortran 90). One of the important problems in practice is the solution of symmetric indefinite linear systems. Two possible recursive algorithms for this problem are discussed in [8]. Some difficulties with the recursive formulation of the LAPACK SYSV [3] algorithm are given there, and a perturbation approach to overcome them is presented. Then in [10] another way of formulating a recursive perturbation algorithm is discussed. It is based on approximating A^{-1} by B^{-1}, where A - B has low rank. Here B^{-1} can be computed cheaply, and because A - B has low rank, A^{-1} is also computed cheaply. We compute A^{-1} from B^{-1} by standard techniques, e.g. the Sherman-Morrison-Woodbury (SMW) formula [5, p. 129]. Experiments show that the two recursive perturbation algorithms [8, 10] will fail for some (although exotic) matrices. Here we propose another perturbation approach which has better stability properties.
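For the rank-1 case, the SMW formula reduces to the Sherman-Morrison update. The following sketch (illustrative plain Python, not the paper's code; a diagonal matrix D stands in for the cheaply invertible B) shows how a solve with A = D + u v^T can be built from solves with D alone:

```python
# Sketch: Sherman-Morrison rank-1 update, illustrating how a solve with
# A = D + u v^T is recovered from solves with the cheap matrix D.

def solve_diag(d, b):
    """Solve D y = b for a diagonal matrix stored as the list d."""
    return [bi / di for di, bi in zip(d, b)]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def solve_rank1_update(d, u, v, b):
    """Solve (D + u v^T) x = b using only two solves with D."""
    y = solve_diag(d, b)          # y = D^{-1} b
    z = solve_diag(d, u)          # z = D^{-1} u
    alpha = dot(v, y) / (1.0 + dot(v, z))
    return [yi - alpha * zi for yi, zi in zip(y, z)]

# Example: D = diag(2, 4), u = v = (1, 1), so A = [[3, 1], [1, 5]].
x = solve_rank1_update([2.0, 4.0], [1.0, 1.0], [1.0, 1.0], [4.0, 6.0])
# -> [1.0, 1.0], the solution of [[3, 1], [1, 5]] x = [4, 6]
```

The cost is two solves with D plus O(n) vector work, which is why a low-rank difference A - B makes A^{-1} cheap once B^{-1} is available.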

2 The perturbation approach

In [8] we presented a perturbation algorithm which overcomes some of the difficulties arising from the recursive formulation of the LAPACK algorithm with Bunch-Kaufman pivoting. We tested this algorithm with "randomly" generated matrices, and the performance results were quite promising. The algorithm produces the following factorization:

    A = L D L^T,

where L is perturbed unit lower triangular and D is perturbed diagonal. This factorization is then used to solve a linear system of equations. Since the factorization is perturbed, we apply iterative refinement at the end in order to add more correct digits to the solution. The problem with the perturbation algorithm in [8] is that upper bounds on the entries of L are not evident. It is known (see [4]) that growth in L can lead to instability. Here we propose a new version of the perturbation algorithm which overcomes this difficulty. The factorization part of the algorithm is given as follows:

Columnwise Recursive Symmetric Indefinite Factorization (CRSIF) of A_{1:m,1:n}; initially k = 1:

    if (n = 1)
        sigma_k = max_{k+1 <= i <= m} |A_{i,k}|
        if (sigma_k > 1)
            if (sigma_k^2 tau > |A_{k,k}|)
                A_{k,k} = A_{k,k} + Sgn(A_{k,k}) sigma_k^2 tau
            end
        else
            if (sigma_k tau > |A_{k,k}|)
                A_{k,k} = A_{k,k} + Sgn(A_{k,k}) sigma_k tau
            end
        end
        D_{k,k} = A_{k,k}
        L_{k+1:m,k} = A_{k+1:m,k} / D_{k,k}
    else
        n1 = n/2
        n2 = n - n1
        CRSIF of A_{:,k:k+n1-1}
        update A_{:,k+n1:k+n-1}
        CRSIF of A_{:,k+n1:k+n-1}
    end
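In Python-like form, the recursion above might be sketched as follows. This is a simplified rendering under our reading of the algorithm: a dense list-of-lists matrix, an unblocked trailing update, and the perturbation thresholds as stated; function and variable names are ours, not the authors' Fortran.

```python
# Sketch of the columnwise recursive factorization: symmetric A is factored
# in place as L D L^T; the diagonal holds D, the strict lower triangle L.
import math

def crsif(A, k, n, tau):
    """Factor columns k..k+n-1 of the symmetric matrix A in place."""
    m = len(A)
    if n == 1:
        sigma = max((abs(A[i][k]) for i in range(k + 1, m)), default=0.0)
        # Perturbation size sigma^2*tau when sigma > 1, else sigma*tau.
        pert = sigma * sigma * tau if sigma > 1.0 else sigma * tau
        if pert > abs(A[k][k]):
            A[k][k] += math.copysign(pert, A[k][k]) if A[k][k] else pert
        for i in range(k + 1, m):
            A[i][k] /= A[k][k]          # column k of L
    else:
        n1 = n // 2
        crsif(A, k, n1, tau)
        # Update trailing columns with the freshly factored column block.
        for j in range(k + n1, k + n):
            for i in range(j, m):
                A[i][j] -= sum(A[i][p] * A[p][p] * A[j][p]
                               for p in range(k, k + n1))
        crsif(A, k + n1, n - n1, tau)

# Example: a 3x3 symmetric matrix whose exact factorization has D = (4, 4, 4)
# and all strictly-lower entries of L equal to 0.5.
A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
crsif(A, 0, 3, 1e-8)
```

Note that each pair of columns (p, j) with p < j meets at exactly one split of the recursion, so the trailing update is applied exactly once per pair.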

Since the matrix A is square, we must set m = n when we first call CRSIF. The number tau is a parameter of the algorithm which we discuss later. Let us note that the recursion in [8] splits both rows and columns. Here the recursion works only with whole columns (therefore we call the algorithm columnwise). The benefit is that this allows us to control the growth in L. From the description of the algorithm it can be seen that the entries of L, which is stored in the lower triangle of A, satisfy

    |L_{i,j}| <= C / tau,

for some constant C not depending on tau. We do not present a proof of this fact here (a proof is available from the authors). After some manipulation we can also show that similar bounds hold in the updated part of A, i.e. A_{:,k+n1:k+n-1} (assuming the original matrix was scaled appropriately). Thus this modification should work in a more stable way. Although this factorization is perturbed, we can use it for the solution of linear systems. For this purpose we need a small number of iterative refinement steps [7, Sec. 3.5.3]. Because of the added perturbations it is possible that iterative refinement will fail, so the question is how much L D L^T changes when we add perturbations. Hence we consider the product L D L^T = A. For any diagonal entry we have

    A_{i,i} = D_{i,i} + sum_{j=1}^{i-1} L_{i,j}^2 D_{j,j}.    (1)

If a perturbation is necessary, then we add sigma_i tau to D_{i,i} (the following conclusions also hold if we add sigma_i^2 tau), and we have

    Ã_{i,i} = D_{i,i} ± sigma_i tau + sum_{j=1}^{i-1} L_{i,j}^2 D_{j,j}.    (2)

By comparing (1) and (2) we see that Ã_{i,i} = A_{i,i} ± sigma_i tau. Thus for the whole algorithm we operate not on the original matrix A but on Ã, where

    Ã = A + Delta A,    |Delta A_{i,i}| <= sigma_i tau,    (3)

i.e. Delta A is diagonal. From practical considerations we think that the best choice of tau is the square root of the machine precision. Iterative refinement is expected to work well when sigma_i tau is not large, since then we only add small perturbations to the original matrix A. The case when sigma_i tau is large forces us to compute Ã^{-1}. We do this by using a rank-k correction to A, where k is the number of diagonal changes for which sigma_i tau is large; see [10], where this approach is studied. Unfortunately, when k is large, the cost of computing the correction is large and performance degrades. In order to reduce the cases where we add large perturbations, and as a result use large rank-k corrections, we propose a more sophisticated choice of the perturbation. The corresponding algorithm is given below. The rank-k correction is done by the SMW formula.
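A generic refinement loop of this kind can be sketched as follows. This is illustrative only: `matvec` applies the original A exactly, and `solve_approx` stands for the solve with the perturbed factorization; here a plain diagonal solve plays that role.

```python
# Sketch: iterative refinement with an approximate (perturbed) solver.
def refine(matvec, solve_approx, b, x, steps):
    """Refinement sweeps: r = b - A x, then correct x with the cheap solver."""
    for _ in range(steps):
        r = [bi - axi for bi, axi in zip(b, matvec(x))]
        d = solve_approx(r)
        x = [xi + di for xi, di in zip(x, d)]
    return x

# Example: A = [[2, 1], [1, 3]], b = [3, 4] (exact solution (1, 1)); the
# "approximate solve" is a diagonal solve standing in for the perturbed LDL^T.
matvec = lambda v: [2.0 * v[0] + v[1], v[0] + 3.0 * v[1]]
diag_solve = lambda r: [r[0] / 2.0, r[1] / 3.0]
x = refine(matvec, diag_solve, [3.0, 4.0], [0.0, 0.0], steps=40)
# x converges toward the exact solution (1, 1)
```

The loop contracts the error by roughly the size of the relative perturbation per sweep, which is why a few steps suffice when only O(tau) perturbations were added.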

Perturbed Recursive SMW Symmetric Indefinite Factorization (PRSMWSIF) of A_{1:m,1:n}; initially k = 1:

    if (n = 1)
        sigma_k = max_{k+1 <= i <= m} |A_{i,k}|
        if (sigma_k > 1)
            if (sigma_k^2 tau > |A_{k,k}|)
                delta = Sgn(A_{k,k}) (sigma_k^2 tau - |A_{k,k}|)
                A_{k,k} = A_{k,k} + delta
                if (|delta| > f_p(tau))
                    u_k = |delta|
                    v_k = -sign(delta)
                end
            end
        else
            if (sigma_k tau > |A_{k,k}|)
                delta = Sgn(A_{k,k}) (sigma_k tau - |A_{k,k}|)
                A_{k,k} = A_{k,k} + delta
                if (|delta| > f_p(tau))
                    u_k = |delta|
                    v_k = -sign(delta)
                end
            end
        end
        D_{k,k} = A_{k,k}
        L_{k+1:m,k} = A_{k+1:m,k} / D_{k,k}
    else
        n1 = n/2
        n2 = n - n1
        PRSMWSIF of A_{:,k:k+n1-1}
        update A_{:,k+n1:k+n-1}
        PRSMWSIF of A_{:,k+n1:k+n-1}
    end

This algorithm ensures bounded entries of both the matrix L and the reduced matrix (the lower right corner of A). Theoretical analysis of this will be presented in future work. After the factorization is finished, we first apply 1-2 steps of iterative refinement to reduce the influence of the small perturbations of order O(tau). Then we apply the SMW formula to remove the influence of the large rank-k corrections. In this way the fast iterative refinement scheme is combined with the slower but more stable rank-k correction scheme. At the same time, small perturbations (leading to iterative refinement) are used wherever possible.
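The base-case pivot handling can be condensed into a small helper. This is a sketch under our assumptions: the `threshold` argument models f_p(tau), the sign conventions for the recorded (u_k, v_k) pair follow the listing above, and all names are ours.

```python
# Sketch: perturb a too-small pivot up to sigma^2*tau (or sigma*tau) and
# record a rank-1 correction term (u_k, v_k) when the shift delta is large.
import math

def perturb_pivot(akk, sigma, tau, threshold):
    """Return the perturbed pivot and an optional (u_k, v_k) record."""
    target = sigma * sigma * tau if sigma > 1.0 else sigma * tau
    if target <= abs(akk):
        return akk, None                     # pivot already large enough
    delta = math.copysign(target - abs(akk), akk if akk else 1.0)
    akk += delta                             # now |akk| equals target
    if abs(delta) > threshold:
        # Large shift: record it for the later SMW correction step.
        return akk, (abs(delta), -math.copysign(1.0, delta))
    return akk, None                         # small shift: refinement handles it
```

For instance, with akk = 0.5, sigma = 2 and tau = 0.5 the target is sigma^2*tau = 2, so delta = 1.5 and the shift is recorded when it exceeds the threshold.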

3 Numerical experiments

The tests are done in Fortran 90, in double precision, on an IBM SMP node with four processors. The exact solution is chosen to be x = (1, ..., 1)^T everywhere. We tested three kinds of matrices:

- Series 1: randomly generated matrices;

- Series 2: matrices with the following structure(1):

      A = [ Sigma  B ]
          [ B^T    I ],

where Sigma is diagonal with small entries, B is random with large entries, and I is the identity. The entries of Sigma are chosen to be less than tau (with tau equal to 10^{-15}). In this way the CRSIF algorithm is forced to make perturbations.
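A Series 2 matrix of this kind might be generated as follows (an illustrative construction, not the authors' test driver; the value ranges for the "large" entries of B are assumptions):

```python
# Sketch: build a Series 2-style test matrix [[Sigma, B], [B^T, I]], with
# Sigma diagonal and below tau, and B random with large entries.
import random

def series2(n, tau=1e-15, big=1e3):
    A = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        A[i][i] = random.uniform(0.0, tau)      # Sigma: tiny diagonal entries
        A[n + i][n + i] = 1.0                   # identity block
    for i in range(n):
        for j in range(n):
            b = random.uniform(-big, big)       # B: large random entries
            A[i][n + j] = b
            A[n + j][i] = b                     # mirror to keep A symmetric
    return A
```

Every pivot in the Sigma block is below the perturbation threshold, so the factorization is forced to perturb each of the first n columns.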

- Series 3: matrices similar to Series 2, but with a zero lower-right block:

      A = [ Sigma  B ]
          [ B^T    0 ].

(1) Suggested to us by John Reid.

Figure 1: Speedup results: SYSV/PRSYSV (-), PSYSV/PRSYSV (- -), and PSYSV/SYSV (...)


Figure 2: Error results: SYSV (-), PSYSV (- -), and PRSYSV (...)

The first algorithm is tested with Series 1 and Series 2. Our algorithm, which is based on CRSIF, is called PRSYSV. In order to show the advantage of recursion, we changed the Bunch-Kaufman pivoting to the perturbation approach in the LAPACK subroutine; the resulting algorithm is called PSYSV. In this way the comparison between PSYSV and PRSYSV shows the speedup introduced by recursion only. The results for Series 1 are given in Figs. 1-2. The algorithm behaves quite well on Series 1 because the perturbations we add are small enough. In Series 2 the error blows up because this assumption is not valid, and we do not present those experiments.

Figure 3: Series 1. Speedup results: SYSV/PRSMWSYSV (-), PSMWSYSV/PRSMWSYSV (- -), and SYSV/PSMWSYSV (...)

For the second algorithm we present results with all three series. The results are presented in Figs. 3-8. Our algorithm is denoted by PRSMWSYSV. A LAPACK SYSV based code (PSMWSYSV) was developed with our combined approach instead of the Bunch-Kaufman pivoting. We see that the combination of the two types of perturbations works well. Of course, it can be slow if we add too many large perturbations. The experiments show that the recursive formulation can speed up the algorithms substantially. Unfortunately, we must take care of the numerical stability as well. This forces us to apply techniques such as perturbations and small-rank corrections. In some cases this slows down the algorithm. This issue needs more research to improve the present algorithms.

Figure 4: Series 1. Error results: SYSV (-), PSMWSYSV (- -), and PRSMWSYSV (...)

Figure 7: Series 3. Speedup results: SYSV/PRSMWSYSV (-), PSMWSYSV/PRSMWSYSV (- -), and PSMWSYSV/SYSV (...)

Figure 5: Series 2. Speedup results: SYSV/PRSMWSYSV (-), PSMWSYSV/PRSMWSYSV (- -), and SYSV/PSMWSYSV (...)

Figure 6: Series 2. Error results: SYSV (-), PSMWSYSV (- -), and PRSMWSYSV (...)

Figure 8: Series 3. Error results: SYSV (-), PSMWSYSV (- -), and PRSMWSYSV (...)

References

[1] B. Andersen, F. Gustavson, J. Wasniewski. A recursive formulation of the Cholesky factorization operating on a matrix in packed storage form. In: Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, San Antonio, TX, USA, March 24-27, 1999.

[2] B. Andersen, F. Gustavson, J. Wasniewski, P. Yalamov. Recursive formulation of some dense linear algebra algorithms. In: Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, San Antonio, TX, USA, March 24-27, 1999.

[3] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, D. Sorensen. LAPACK Users' Guide, Release 2.0. SIAM, Philadelphia, 1995.

[4] C. Ashcraft, R. Grimes, J. Lewis. Accurate symmetric indefinite linear equation solvers. SIAM J. Matrix Anal. Appl., 20:513-561, 1999.

[5] Å. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.

[6] E. Elmroth, F. Gustavson. New serial and parallel recursive QR factorization algorithms for SMP systems. In: Applied Parallel Computing (Eds. B. Kågström et al.), Lecture Notes in Computer Science, vol. 1541, Springer, 1998.

[7] G. Golub, C. Van Loan. Matrix Computations, 3rd edition. Johns Hopkins University Press, Baltimore, 1996.

[8] A. Gupta, F. Gustavson, A. Karaivanov, J. Wasniewski, P. Yalamov. Experience with a Recursive Perturbation Based Algorithm for Symmetric Indefinite Linear Systems. (Submitted to EuroPar'99.)

[9] F. Gustavson. Recursion leads to automatic variable blocking for dense linear-algebra algorithms. IBM J. Res. Develop., 41:737-755, 1997.

[10] F. Gustavson, A. Karaivanov, J. Wasniewski, P. Yalamov. Application of the Sherman-Morrison-Woodbury Formula to a Recursive Algorithm for Symmetric Indefinite Linear Systems. (Submitted to EuroPar'99.)