A Parallel Implementation of the Conjugate Gradient Method on the Meiko CS-2

Antonio d'Acierno, Giuseppe De Pietro, Antonio Giordano

IRSIP - CNR, via P. Castellino 111, 80131 Napoli (Italy)

Abstract

In this paper we describe a parallel implementation of a preconditioned conjugate gradient algorithm on the Meiko CS-2. The parallel implementation is based on the use of a global reduction operator; this yields simple and efficient code whose performance is easily and precisely predictable. We describe and validate a performance model, and we present and discuss experimental results.

1 Introduction

Solving large sparse systems of linear equations is an integral part of mathematical computing and finds application in a variety of fields such as fluid dynamics, structural analysis, atmospheric modeling and so on. Iterative methods for solving such systems, such as the conjugate gradient method, are becoming increasingly appealing as they can be easily parallelised [1, 2, 3, 4]. In this paper we describe a parallel implementation of a Preconditioned Conjugate Gradient (PCG) algorithm on the Meiko CS-2. The implementation is based on the use of reduce operations, which are well supported by the CS-2 hardware.

The paper is organised as follows. In section 2 the PCG algorithm is briefly described and the sequential code is sketched; section 3 describes the CS-2 hardware and introduces the reduce operation supported by the hardware/software system used. In section 4 we describe the parallel implementation, while in section 5 we propose and validate a performance model and present and discuss experimental results. Conclusions are given in section 6.

2 The PCG Algorithm

The PCG algorithm is used for solving systems of linear equations of the form:

$$A\vec{x} = \vec{b} \tag{1}$$

where $A$ is an $N \times N$ symmetric positive definite sparse matrix. Let $K_i$ be the number of non-zero elements in the $i$-th column and let $K_{\max} = \max_i K_i$, $i = 0, \ldots, N-1$. In our implementation the matrix $A$ is stored using two $K_{\max} \times N$ matrices, AC and ROWS, where AC is a compressed version of $A$ and ROWS contains the row indices of the elements of $A$ stored in AC, so that:

$$\mathrm{AC}[i][j] = A[\mathrm{ROWS}[i][j]][j] \tag{2}$$

The product of a compressed matrix by a vector can be described in the following way:

    void mv_prod (int n, int K_MAX, double *result, double **matr, int **rows, double *vect)
    {
      int i, j;
      for (i = 0; i
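A minimal sketch of such a product over the AC/ROWS layout of equation (2) might look as follows. It is not the authors' listing: the name `mv_prod_sketch`, the convention that unused slots of AC hold 0.0 together with a valid row index, and the use of the symmetry of A to read column j as row j are all assumptions made for illustration.

```c
#include <assert.h>

/*
 * Sketch of result = A * vect for a symmetric matrix A stored in
 * compressed form (hypothetical helper, not taken from the paper):
 *   matr[i][j] is the i-th stored entry of column j of A,
 *   rows[i][j] is the row index of that entry.
 * Assumption: slots beyond the last non-zero of a column hold 0.0
 * and a valid row index, so they contribute nothing to the sum.
 */
void mv_prod_sketch(int n, int k_max, double *result,
                    double **matr, int **rows, const double *vect)
{
    int i, j;

    assert(n > 0 && k_max > 0);

    for (j = 0; j < n; j++) {
        double sum = 0.0;
        for (i = 0; i < k_max; i++) {
            /* A is symmetric, so A[rows[i][j]][j] == A[j][rows[i][j]]:
               the stored column j can be used as row j of the product. */
            sum += matr[i][j] * vect[rows[i][j]];
        }
        result[j] = sum;
    }
}
```

For a 3 x 3 tridiagonal matrix with 2 on the diagonal and -1 off it, K_max is 3; the first and last columns each need one padded slot, and the middle column fills all three. Because every output element is computed independently from one stored column, the outer loop can be divided among processors without write conflicts.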