POPE : A "Paradigm-Oriented" Parallel Programming Environment for SIT Algorithms

F.A. Rabhi and J. Schwarz
Department of Computer Science
University of Hull, Hull HU6 7RX (UK)
Email : [email protected]

Functional Languages Implementation Workshop. Paper 37

Abstract

A more efficient control of the implicit parallelism in functional programs is possible if parallel algorithmic structures (or skeletons) are used in the design of algorithms. A structure captures the behaviour of a parallel programming paradigm and acts as a template in the design of an algorithm. This paper addresses the issue of defining a structure for static iterative transformation (SIT) algorithms, which exploit coarse-grained data parallelism. The parameters required by the structure, which form the problem specification, are supplied by the programmer using the Haskell functional language. The paper shows how this specification can be successively turned into a sequential functional program, then into a parallel program for a graph reduction machine, and finally into a program that maps onto a specific parallel architecture.

Keywords. Functional languages, data parallel algorithms, skeletons, iterative algorithms, Haskell.

Implementation of Functional Languages '94, UEA, Norwich

1 Introduction

Two major problems are hampering the acceptance of functional languages as a means of programming parallel systems. First is the difficulty of writing some programs with sufficient inherent parallelism. Second is the unpredictable performance of programs due to problems of load-balancing, grain size, and locality of references [23]. One potential solution to both these problems involves the use of constructs known as parallel algorithmic structures (or skeletons). There are a number of projects underway involving skeletons in one form or another, including, most notably, the work of Darlington et al. [6], Kelly [12], Cole [5] and Bratvold [4]. Such skeletons capture the behaviour of an entire class of parallel algorithms, or a paradigm. Any algorithm that obeys a known paradigm can then be specified by using its corresponding parallel algorithmic structure as a template, leaving the lower-level details of exploiting parallelism to the implementation. Therefore, the process of designing a program is entirely "paradigm-oriented": a user enters the problem specification by defining the parameters required by the corresponding structure. This approach benefits the programmer in providing convenient high-level parallel concepts, while expressing the parallelism in a non-architectural way provides the opportunity for efficient implementation across parallel architectures.

This paper addresses the issue of defining a parallel algorithmic structure for a specific class of parallel algorithms, called static iterative transformation (or SIT) algorithms [21]. We show what the user needs to define in the specification, and how this specification can be successively converted into a sequential functional program (Section 4), then into a parallel program for a graph reduction machine (Section 5), and finally into a program that maps onto a specific parallel architecture (Section 7). A prototype implementation using C and PVM is described in Section 7.2.

2 Iterative Transformation Algorithms

These algorithms operate on a set of homogeneous data objects. These objects are transformed through several iteration steps. During an iteration step, one or more of the following operations can be performed:

- local operation : each object performs a computation using local data or data received from other objects
- broadcast : global information is made available to all objects
- combine : global information is obtained by combining local data

Local communication between objects can form a variety of communication patterns. An algorithm is static if there is no change to the shape of the communication pattern, and dynamic if the number of objects changes at run-time. In a static algorithm, parallelism arises when applying a local operation simultaneously to every object. In a dynamic algorithm, additional parallelism arises when a group of objects can be combined independently from other groups. In the rest of the paper, we only consider the static case. This class of algorithms is of great importance in scientific and engineering applications such as image processing, numerical analysis and finite element methods. They can be considered as data-parallel algorithms [10] with an iterative control structure and with a coarser grain than simple vector processing. In other classifications, they correspond to geometric parallelism [9] or domain partition algorithms [14].
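As a concrete illustration of these operations (our own sketch, not part of POPE), the following Haskell function performs one synchronous iteration step over a static chain of objects: each object averages its value with those of its chain neighbours (a local operation), and the largest change is gathered into a global value (a combine). The representation of objects as a plain list is an assumption made only for this example.

```haskell
-- One synchronous iteration step over a static chain of objects.
-- Local operation: each object averages its own value with those of
-- its chain neighbours. Combine: the maximum change is collected as
-- global information. (Illustrative sketch only.)
step :: [Double] -> ([Double], Double)
step xs = (xs', maximum (zipWith (\old new -> abs (old - new)) xs xs'))
  where
    n     = length xs
    xs'   = [ avg i | i <- [0 .. n-1] ]
    avg i = sum nbhd / fromIntegral (length nbhd)
      where nbhd = [ xs !! j | j <- [i-1, i, i+1], j >= 0, j < n ]
```

Iterating step until the global change falls below a threshold gives exactly the control structure described above.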

3 The user's specification

To design an iterative transformation algorithm, the following components have to be specified by the user:

- the size and shape of the set
- a description of the objects and the information held by these objects
- how these objects communicate


- the local computations involved, which include a description of the initial state for each object, and how a state changes from one iteration to another
- the global computations involved, which include a description of the global variables, their initial values and how these values change from one iteration step to another

The rest of this section shows how to enter these various parameters using a functional language. In the examples, functions are defined using the Haskell [11] syntax.

3.1 Data structures

As the number of objects does not change, we assume that the size of the set is defined as a special constant wsetsize (of an appropriate type Size) whose value can be accessed anywhere in the specification.

wsetsize :: Size

Each object carries p items of information in a tuple of type State.

type State = (S1, ..., Sp)

Each object is uniquely identified in the set through a global coordinate system. Global information that is accessible by all the objects is also represented as a tuple of type Global:

type Global = (G1, ..., Gq)
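For instance (our own example, not from the paper), a one-dimensional relaxation problem might instantiate these declarations as follows; all concrete types and values here are illustrative assumptions:

```haskell
type Size = Int

-- Hypothetical instantiation: each object holds its current and
-- previous value.
type State = (Double, Double)

-- The global tuple holds the iteration count and the largest change
-- observed during the last step.
type Global = (Int, Double)

-- A chain of 64 objects, fixed for the whole computation.
wsetsize :: Size
wsetsize = 64
```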

3.2 Transformations

A transformation occurs during an iteration cycle. To make the process of defining a transformation easy, the local state and the global variables are paired into a tuple ((s1, s2, ..., sp), (g1, g2, ..., gq)) and transformations can be defined as:

transf ((s1, s2, ..., sp), (g1, g2, ..., gq)) = ((s1', s2', ..., sp'), (g1', g2', ..., gq'))

The left component of the result tuple (s1', s2', ..., sp') represents the new state (in each object) and the right-hand side (g1', g2', ..., gq') represents the new global data. All communication is implicit through variable name references. For example, if a global variable gi is used in the computation of one of the gi' expressions, its value is locally accessed. If it is needed in the computation of one of the si' expressions, a broadcast operation would have to be performed. The left component may also contain references to the self index coord in case objects need to know their position in the set. Local neighbourhood communication is achieved through references to external expression lists. An external expression list is of the form exp@dest where exp is an arbitrary expression and dest is a list of neighbour coordinates. A condition imposed in the paradigm is that the expression exp must be computed strictly by each of the objects whose coordinates are in dest, and the accumulated results are returned as a list. An expression computed remotely may refer to a:


- remote state variable sj : which corresponds to a value in the state before it is modified in the iteration cycle
- remote state variable sj' : which corresponds to the value retained after the end of the iteration cycle

Another condition is that the new state is computed strictly (i.e. in normal form) so that there are no references to the old state at the start of the new iteration step.
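The behaviour of an external expression list exp@dest can be modelled in plain Haskell as follows (our simulation, not POPE's implementation; fetch and the list-of-states representation are assumptions made for this sketch): every object named in dest evaluates the expression on its own state, and the requesting object receives the results as a list.

```haskell
-- Simulate exp@dest: each object whose index appears in dest evaluates
-- the given expression on its own state; the results are gathered into
-- a list, in the order given by dest. (Illustrative model only.)
fetch :: (state -> a) -> [Int] -> [state] -> [a]
fetch expr dest states = [ expr (states !! j) | j <- dest ]

-- Example: ask objects 0 and 2 for the first component of their state:
--   fetch fst [0, 2] [(1,'a'), (2,'b'), (3,'c')]  =  [1, 3]
```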

3.3 Initial conditions and termination

The initial state (s1i, s2i, ..., spi) for each object and the initial global values (g1i, g2i, ..., gqi) are defined using a parameterless transformation init.

init = ((s1i, s2i, ..., spi), (g1i, g2i, ..., gqi))

The initial state values may contain references to the self-index coord but should not contain external expression lists. The final component in the specification is the termination condition terminate. Termination is decided based upon the value of the global data.

terminate :: Global -> Bool

Given a transformation transf to be executed during the iteration cycle, the entire SIT problem can be expressed using the function sit$ (defined later), which produces the executable specification:

problem = sit$ terminate transf init
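Although sit$ is defined later in the paper, its sequential behaviour can be sketched as a combinator that applies transf until terminate holds on the global component. This is our own simplified sketch (with generic type variables rather than the paper's State and Global), not the paper's definition:

```haskell
-- Sequential sketch of a sit combinator: starting from the initial
-- (state, global) pair, apply the transformation until the termination
-- predicate holds on the global component. (Hedged sketch, not POPE's
-- actual definition.)
sit :: (g -> Bool) -> ((s, g) -> (s, g)) -> (s, g) -> (s, g)
sit terminate transf = go
  where
    go pair@(_, gl)
      | terminate gl = pair
      | otherwise    = go (transf pair)
```

For example, sit (\g -> g >= 3) (\(s, g) -> (s + 1, g + 1)) (0, 0) iterates three times and returns (3, 3).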

3.4 Examples

The following examples show how to define specifications for various SIT algorithms.

3.4.1 Example 1 : Iterative methods

Iterative methods work by continuously refining a solution to a problem until an acceptable one is reached. A well-known example of iterative methods is solving a set of equations Ax = b, where A is an n x n matrix, b a vector of size n and x the vector of size n to be determined. These methods only converge when certain conditions apply, but this is not discussed in this paper for the sake of simplicity. Three methods are considered here: the Jacobi relaxation, the Jacobi over-relaxation and the Gauss-Seidel relaxation. For each of these methods, the corresponding specification will be given.


In the Jacobi relaxation, each point x_i in the vector x is refined according to the following equation:

    x_i(t+1) = a_ii^(-1) * ( b_i - sum_{j /= i} a_ij * x_j(t) )

We assume that the algorithm stops when the difference between the old value and the new one at every point is less than some threshold. The objects are arranged into a chain of size wsetsize. In this SIT specification, each object is uniquely identified by its position i in the chain, where 1 <= i <= wsetsize. In the calculation, an object i only requires row i of the matrix A. Therefore, the state of an object consists of the row A_i, the constant b_i and the variable x_i. We choose to keep the maximum difference between two successive values in the whole grid as a global variable. A function sumprod that computes the sum of the products between a row in the matrix (represented as a Haskell array r) and the list of values xs is defined:

sumprod j i r []     = 0
sumprod j i r (x:xs)
  | j == i    = sumprod (j+1) i r (x:xs)
  | otherwise = x * r!j + sumprod (j+1) i r xs
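To check the intended behaviour of sumprod, here is a small self-contained example (the matrix values are our own invention; note that when j reaches i the diagonal column is skipped without consuming a value, so xs holds the values of the other objects only):

```haskell
import Data.Array

-- Self-contained copy of the paper's sumprod for experimentation.
sumprod :: Int -> Int -> Array Int Double -> [Double] -> Double
sumprod _ _ _ []     = 0
sumprod j i r (x:xs)
  | j == i    = sumprod (j+1) i r (x:xs)   -- skip the diagonal element
  | otherwise = x * r!j + sumprod (j+1) i r xs

-- Example: row 1 of a 3x3 matrix, indexed 1..3, with the values of
-- objects 2 and 3 being 2 and 3:
--   sumprod 1 1 (listArray (1,3) [4,5,6]) [2,3]  =  5*2 + 6*3 = 28
```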

Each object needs to communicate with all the other objects, so a function others is defined:

others x = [ y | y <- [1..wsetsize], y /= x ]
