A Simple Implementation of Divide and Conquer Parallelism (expressed in Haskell)

Paul Roe
School of Computing Science, Queensland University of Technology
GPO Box 2434, Brisbane, QLD 4001, Australia
E-mail: proe@fit.qut.edu.au
Abstract
Divide and conquer (D&C) parallelism is a powerful paradigm of data parallel processing. In particular it is more powerful than the simple data parallelism offered by Fortran 90 and C*: it supports recursive parallel computation over arrays. Using the D&C paradigm results in programs which are easier to write and more portable than those written using other paradigms such as message passing. In the past, special languages and systems have been proposed to support parallel D&C programming. This paper describes how abstractions for D&C programming may be implemented in Haskell using simple message passing. The method described should be applicable to most languages which support message passing. As yet there is no unifying parallel programming paradigm, and hence not all problems are amenable to a D&C implementation; a big advantage of building divide and conquer abstractions on top of message passing is that the two paradigms may be freely combined. To date the D&C abstractions have only been tested using a message passing simulator; however, it should be straightforward to port the implementation to a real message passing parallel system.

Keywords: parallel programming, data parallelism, functional programming
1 Introduction
Parallel programming is becoming increasingly important. Unfortunately designing, writing and porting parallel programs is difficult. Data parallel programming is widely recognised as the simplest paradigm for parallel programming. Divide and conquer (D&C) parallelism represents a particularly expressive form of data parallelism, which has been advocated by many [2,3,7-9]. It is more powerful than the simple data parallelism offered by Fortran 90 and C*; in particular it supports recursive parallel computation over arrays, as advocated by Mou [8,9]. This enables basic
operations such as scan and aggregation to be expressed. It also enables many other parallel algorithms to be expressed in a clear and succinct manner, for example: monotonic sort, matrix multiplication, polynomial evaluation, linear lower triangular systems, and Gaussian elimination (see [8] for further details). In languages such as C* such algorithms must be provided as primitives or expressed in a more complex manner. In general D&C parallelism results in programs which are easier to write and more portable (see for example [3,10]) than those written using message passing or conventional data parallelism.

In this paper some parallelism abstractions are defined which support divide and conquer parallelism, and in particular parallel recursive computation over arrays. These abstractions are designed for implementation on a multicomputer using a language which supports message passing. In addition to the previously mentioned benefits of divide and conquer parallelism, two other important benefits arise from the approach taken here. Firstly, no special language or system is required to support the D&C abstractions: although the implementation is described in terms of (message passing) Haskell, it should be possible to implement the abstractions in most languages which support a basic message passing model of parallelism. Secondly, since the implementation is built on top of message passing it is possible to freely combine the two paradigms. This is particularly important since there is no single unifying parallel programming paradigm, and thus not all problems are suited to a D&C solution. Currently the implementation has only been tested with a message passing simulator; however, it should be straightforward to port this to a real parallel system.

The structure of this paper is as follows: the next section describes the message passing base on which the parallel divide and conquer abstractions will be built. Section 3 describes divide and conquer parallelism via some example algorithms.
The key to these algorithms is recursive parallel computation over arrays. Section 4 describes the implementation of abstractions which support divide and conquer parallelism. Some example algorithms using these abstractions are also described. Due to space limitations only relatively simple algorithms are described; however, the abstractions have a much wider use than shown here. The final sections discuss further work and conclusions.
2 The message passing model
The divide and conquer model of parallelism which will be constructed is built on top of a message passing base. To simplify the exposition it is assumed that the target machine consists of a simple linear array of processors, and that the number of processors is an exact power of two. Note that neither of these restrictions is necessary; the divide and conquer model which will be described can be implemented on any multicomputer architecture which supports message passing (though some algorithms, eg. the magg algorithm described later, do require an exact power of two number of processors). It is assumed that each processor can communicate only with its adjoining neighbours. From the software perspective the system comprises a network of communicating Haskell programs (processes), in a one-to-one correspondence with the processor network. (For information on Haskell and functional programming see [1,4,5].)

We shall extend Haskell with some simple message passing operations; these may be easily implemented in Glasgow Haskell via its C language interface. Haskell programs (processes) may communicate with adjacent programs via send and receive functions:

send    :: Comm a => Dir -> a -> IO ()
receive :: Comm a => Dir -> IO a

The send and receive functions are overloaded (analogous to read and show in Haskell), and Comm is the class representing communicable values (analogous to Text). The idea is that actual communication is performed by lower level functions which, for example, operate on byte arrays. Then send and receive may be implemented using these lower level communications functions, plus functions for converting values to and from byte arrays. For simplicity it is assumed that send is non-blocking; were send blocking, more complex programs would be required to achieve a correct implementation, which only strengthens the argument for using divide and conquer abstractions. As regards directions (Dir) there are two directions for sending and receiving, left and right, plus an operation oppDir which reverses a direction (eg. oppDir left = right). Since send and receive are imperative operations (they have side effects) their use must be controlled. This is achieved via the IO type. For the purposes of this paper the only imperative computation of concern is I/O; hence only a simplified view of integrating imperative computation in Haskell is presented (for a full description see [6]). The IO type represents I/O actions. All Haskell programs must be of type IO; this restricts all I/O to occur at the top level of a Haskell program. There is a useful analogy with C in that all IO actions produce results. For example receive, given a direction, produces an IO action containing the received value. The send operation sends a value in the specified direction; its result is an IO action returning the unit type (), indicating that the action is performed solely for its effect. (The role of the unit type is rather like that of void in C.) In addition to send and receive two other basic IO actions will be required:

procNo :: IO Int
noProc :: IO Int
The procNo action produces the number of the processor on which the program is running. Processors are numbered from one. The noProc action produces the total number of processors as its result. These are all the basic IO actions required. (We do not concern ourselves with real I/O in this paper.) In order to write useful programs it is necessary to build and combine IO actions. The following functions are used to do this:

returnIO :: a -> IO a
thenIO   :: IO a -> (a -> IO b) -> IO b
seqIO    :: IO () -> IO b -> IO b
The returnIO function produces an IO action which does nothing but return the given value as a result. The thenIO function composes two actions in sequence; the second action is actually a function which is passed the result of the first action. The seqIO function is used to compose in sequence an IO action, whose result is not required, with another IO action. Its role is like that of the semicolon in imperative languages. Consider the following Haskell program:

distFirstRight :: Comm a => a -> IO a
distFirstRight a =
  procNo `thenIO` \pn ->
  noProc `thenIO` \np ->
  if pn == 1 then
    send right a `seqIO` returnIO a
  else if pn == np then
    receive left
  else
    receive left `thenIO` \v ->
    send right v `seqIO` returnIO v
If this program is run on each processor in a linear array of processors, it will have the effect of broadcasting the value given to the first processor to all other processors. Each program inspects the number of the processor on which it is running, and the total number of processors, in order to determine what actions to perform. The first processor sends its value right and returns the same value. The last processor receives a value from its left and returns that value. All other processors receive a value from their left neighbour, pass the value on to their right neighbour and return this value. Note that backquotes are Haskell's notation for using a function as an infix operator, and that the form "\v -> e" denotes a lambda abstraction, whose body e extends as far to the right as possible.

As can be seen, writing, debugging and understanding message passing programs is difficult. This is due to the complex behaviour of message passing programs. It is very easy to write incorrect programs. Also, writing reusable software is difficult, particularly for more complex problems, where algorithms and message passing details become inextricably intertwined. Parallel computers need to be programmed at a higher level of abstraction than message passing. This is all the message passing machinery required to implement divide and conquer abstractions; before describing this implementation, divide and conquer parallelism itself is discussed.
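An aside (my observation, not made in the paper): in standard Haskell the three combinators above are exactly the Monad operations on IO, so a minimal sketch of them needs no new machinery at all:

-- Sketch: the paper's combinators as the standard Monad operations on IO.
returnIO :: a -> IO a
returnIO = return

thenIO :: IO a -> (a -> IO b) -> IO b
thenIO = (>>=)

seqIO :: IO () -> IO b -> IO b
seqIO = (>>)

Only send, receive, procNo and noProc then remain machine-specific.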
3 Divide and conquer parallelism
The message passing model of parallelism is very simple to implement, and many parallel systems support this model, eg. PVM. Unfortunately it is often not the easiest or most natural way in which to express parallel programs. In particular data parallelism, where applicable, is a much simpler programming paradigm. Divide and conquer (D&C) parallelism has been advocated by many, and represents a particularly powerful form of data parallelism [2,3,7-9]. Essentially data parallelism involves performing the same operation, in parallel, over a collection of data items. For a multicomputer this means distributing data items over processors, and running the same program on each processor over local data items (SPMD parallelism). Consider the scan data parallel operation (also known as parallel prefix). This is an important parallel operation which is usually provided as a
primitive or library routine in data parallel languages. Using D&C parallelism it is possible to define the scan operation. Scan performs the following operation over a vector:

scan ⊕ [a1, a2, ..., an] = [a1, a1 ⊕ a2, ..., a1 ⊕ a2 ⊕ ... ⊕ an]

For example: scan (+) [1,2,3,4,5] = [1,3,6,10,15]. A pictorial description of one possible D&C implementation of scan on a linear array of processors is shown below:

[Figure: eight processors, drawn as circles numbered 1-8 in a linear array; at each of the three recursive levels an arrow carries the last value of each left half into every element of the corresponding right half, where it is combined using the operator f.]
Each processor, represented by a circle, contains one element of the vector. Arrows denote the communication of a value from one processor to another. Communicated values are combined with original values using the operator supplied as an argument to scan (which must be associative). The natural way to express this parallel divide and conquer algorithm is in terms of recursion over a vector. Some idealised Haskell code which expresses this is shown below:

scan op vec
  | isUnit vec = vec
  | otherwise  = concat l1 r3
  where
    (l,r) = halve vec
    l1 = scan op l
    r1 = scan op r
    r2 = distribute (last l1) r1
    r3 = pairmap op r2 r1
The vector operations halve and concat split a vector into two halves and concatenate two vectors, respectively. The distribute operation distributes a value over a vector. The pairmap operation applies an operator to corresponding elements from two vectors, forming a vector result. For example, pairmap (+) is vector addition. The above scan definition may be compared with the direct message passing implementation of scan given in the appendix. Although the implementations have a similar length, the message passing implementation is not as clear, nor is it as portable
as the one given above. It is difficult to determine the algorithm from the message passing code. Also the D&C implementation is constructed using general purpose operations such as pairmap and distribute. If all programs can be constructed using a few general purpose operations such as these, then porting a program to another machine only involves porting the general purpose operations.

We would like to express divide and conquer programs in a form similar to that shown previously, and in particular without having to resort to explicit low level message passing. The key to the above algorithm is the use of recursion over a vector. Our goal is to build some divide and conquer abstractions, expressed in terms of message passing, which support the expression of divide and conquer programs in a similar data parallel style to that shown previously. Before going on to describe how this goal may be achieved another divide and conquer program is described: multiple aggregation. Multiple aggregation combines together, using a function, values spread across processors; each processor receives a copy of the result. Like scan, multiple aggregation is an important operation for parallel programming:

magg ⊕ [a1, a2, ..., an] = [r, r, ..., r]   where r = a1 ⊕ a2 ⊕ ... ⊕ an

For example: magg (+) [1,2,3,4] = [10,10,10,10]. A pictorial description of the communications involved in the algorithm is shown below:

[Figure: eight processors in a linear array; at each recursive level values are exchanged across the midpoint of each block and combined, so that every processor ends up holding the full aggregate.]

The divide and conquer algorithm to perform multiple aggregation may be expressed thus:

magg op vec
  | isUnit vec = vec
  | otherwise  = concat l3 r3
  where
    (l,r) = halve vec
    l1 = magg op l
    r1 = magg op r
    l2 = distribute (first r1) l1
    r2 = distribute (last l1) r1
    l3 = pairmap op l1 l2
    r3 = pairmap op r2 r1
Notice how this function uses the same component functions as scan, namely: halve, concat, distribute and pairmap. This code may be compared with the direct message passing implementation shown in the appendix.
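To experiment with these two idealised definitions without any parallel machinery, vectors can be modelled as ordinary Haskell lists. The following is a purely sequential sketch of my own (the paper's vectors are distributed); the list definitions of halve, distribute, pairmap, first and isUnit are plausible assumptions, and concat is renamed cat to avoid the Prelude clash:

-- Sequential list model of the idealised D&C code (a sketch, not the
-- paper's distributed implementation).
halve :: [a] -> ([a], [a])
halve xs = splitAt (length xs `div` 2) xs

cat :: [a] -> [a] -> [a]       -- the paper's concat, renamed
cat = (++)

distribute :: a -> [a] -> [a]  -- replace every element by the given value
distribute x = map (const x)

pairmap :: (a -> b -> c) -> [a] -> [b] -> [c]
pairmap = zipWith

first :: [a] -> a
first = head

isUnit :: [a] -> Bool
isUnit xs = length xs == 1

scan :: (a -> a -> a) -> [a] -> [a]
scan op vec
  | isUnit vec = vec
  | otherwise  = cat l1 r3
  where (l,r) = halve vec
        l1 = scan op l
        r1 = scan op r
        r2 = distribute (last l1) r1
        r3 = pairmap op r2 r1

magg :: (a -> a -> a) -> [a] -> [a]
magg op vec
  | isUnit vec = vec
  | otherwise  = cat l3 r3
  where (l,r) = halve vec
        l1 = magg op l
        r1 = magg op r
        l2 = distribute (first r1) l1
        r2 = distribute (last l1) r1
        l3 = pairmap op l1 l2
        r3 = pairmap op r2 r1

main :: IO ()
main = do
  print (scan (+) [1,2,3,4,5])  -- [1,3,6,10,15]
  print (magg (+) [1,2,3,4])    -- [10,10,10,10]

Running main reproduces the two worked examples above.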
4 Implementation of D&C abstractions
How can divide and conquer parallelism be implemented in terms of message passing? Each processor of a multicomputer will run the same program, but will contain different data. We wish to treat data elements distributed over a machine's processors as a single composite data structure: an array. The key to the D&C algorithms described in the previous section is the expression of parallelism via recursion over an array. To implement this we define an abstract data type representing recursive arrays, together with a set of operations on this type. Each program/processor manipulates a single element of the recursive array; thus the recursive array operations will actually only operate on a single local data element of the array. Since we are dealing with a linear array of processors, a linear array, a vector, will suffice. For example, given that the type of recursive arrays is Vec, a vector of integers has type Vec Int. Each element of the vector resides on a different processor, and is processed by the program running on that processor. Note that each processor runs the same program (SPMD parallelism). The simplest vector operation is vmap. This applies a function element-wise across the vector. It has type:
vmap :: (a->b) -> Vec a -> Vec b

The implementation of vmap is trivial: each processor simply applies the function to its local element of the vector.
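For instance, squaring every element of a distributed integer vector needs no communication at all, since each processor works purely locally. A usage sketch against the Vec interface (squareAll is a hypothetical name, not from the paper):

squareAll :: Vec Int -> Vec Int
squareAll = vmap (\x -> x * x)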
The power of divide and conquer parallelism comes from the ability to express parallel algorithms using recursion over vectors (or more generally arrays). This must be incorporated into the implementation of Vec. The recursive computation over arrays, as previously described, involves recursively halving arrays and recursively concatenating result arrays. The key to this recursive computation is that at each stage an element on a processor is associated with a certain block of other elements. For example, consider the recursive halving of a vector of eight elements:

Recursive level 1:  [1 2 3 4 5 6 7 8]
Recursive level 2:  [1 2 3 4] [5 6 7 8]
Recursive level 3:  [1 2] [3 4] [5 6] [7 8]

At each stage in the recursion each vector element is associated with a block of other elements, constituting a half, a quarter or an eighth of the original vector. If the number of processors is an exact power of two, then the recursion can be `tracked' using a recursion level counter. Thus by associating a counter with each element it is possible to track the level of recursion (halving). In the case that the number of processors is not an exact power of two, a pair of indices can be used to track blocks of elements. For the remainder of the paper it is assumed that the number of processors is an exact power of two. Thus the implementation of a Vec consists of associating a counter with each vector element, which records the current state of recursion (halving):

type Vec a = (a, Int)

We can now define vmap:

vmap :: (a->b) -> Vec a -> Vec b
vmap f (v,i) = (f v, i)

To express D&C computations two basic functions are required: one to determine when recursive halving should stop (isUnit) and one to halve and concatenate a vector (toHalves). The implementation of isUnit is straightforward:

isUnit :: Vec a -> Bool
isUnit (_,i) = i == 0

This function returns true when a vector has been recursively halved to a unit vector. The toHalves function combines halving and concatenating of vectors. It is necessary to combine these operations in order to guarantee that a function defines a size-preserving vector transformation. The generalised use of vector halving and concatenating is difficult to implement; for example, what should happen if the result of a vector operation is a vector twice the size of the original? The toHalves function takes two vector transformations, and a vector, as arguments. The transformations are applied to each respective half of the vector, and the resulting vectors are concatenated. Since the transformations are likely to involve communication they take the form of IO computations. The implementation of toHalves utilises a function inRightHalf to determine whether each vector element being processed belongs to the left or right half of the whole vector. Based on this, either the left or the right vector transformation is applied to each vector element.

toHalves :: (Vec a -> IO (Vec b)) ->
            (Vec a -> IO (Vec b)) ->
            Vec a -> IO (Vec b)
toHalves l r (v,i) =
  procNo `thenIO` \pn ->
  (if inRightHalf pn i then r (v,i-1) else l (v,i-1)) `thenIO` concat
  where
    concat (v,_) = returnIO (v,i)

To mix D&C computation and ordinary message passing computation, functions to construct Vecs and to extract Vec values are required:

mkVec :: a -> IO (Vec a)
mkVec a = noProc `thenIO` \np ->
          returnIO (a, log2 np)

getVecVal :: Vec a -> a
getVecVal (a,i) = a

These allow Vec to be made into an abstract data type. Two utility functions which operate on Vecs are next defined. The vpair function produces a vector of pairs from two vectors:

vpair :: Vec a -> Vec b -> Vec (a,b)
vpair (a,it1) (b,it2)
  | it1 == it2 = ((a,b),it1)
  | otherwise  = error "vpair: uneq vecs"

Notice how it is necessary for both vectors to have the same shape/size (this is known as conformance in C*). The pairmap function applies an operator to the elements of a vector of pairs:

pairmap :: (a->b->c) -> Vec (a,b) -> Vec c
pairmap f v = vmap (uncurry f) v

These are all the basic Vec operations that are required.

4.1 Scan

We are now almost in a position to define scan. All that remains is to define some vector communication functions. In particular, for the scan operation a function which distributes the last element of the left half of a vector across all the elements in the right half is required:
distLastLR :: Comm a => Vec a -> IO (Vec a)
distLastLR (v,i) =
  procNo `thenIO` \pn ->
  if isLeftLast pn i then
    send right v `seqIO` returnIO (v,i)
  else if isRightLast pn i then
    receive left `thenIO` \u ->
    returnIO (u,i)
  else if inRightHalf pn i then
    receive left `thenIO` \u ->
    send right u `seqIO` returnIO (u,i)
  else
    returnIO (v,i)
This function is a vector transformation. Since it uses communication it must take the form of an IO computation. This function is not specific to scan and may be used in any recursive vector computation. In general a library of vector communication (transformation) operations is required. Using distLastLR and the other Vec operations, scan may be defined thus:

scan :: Comm a => (a->a->a) -> Vec a -> IO (Vec a)
scan op v
  | isUnit v  = returnIO v
  | otherwise =
      toHalves (scan op) (scan op) v `thenIO` \u ->
      distLastLR u                   `thenIO` \w ->
      toHalves (returnIO . vmap fst)
               (returnIO . pairmap op)
               (vpair w u)

(The "." operator is function composition.) This compares favourably with the idealised scan code described in Section 3. The main difference between the idealised code and the above is that above, vectors must be paired in order to process them. This is due to the restrictions which using toHalves imposes. The next section describes how the scan implementation can be further improved.
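Before moving on, a quick sanity check (my addition, not the paper's): the net effect of scan is ordinary parallel prefix, so its results can be compared against Haskell's sequential scanl1:

import Data.List (scanl1)

-- Sequential specification: a correct distributed scan must agree with this.
scanSpec :: (a -> a -> a) -> [a] -> [a]
scanSpec = scanl1

main :: IO ()
main = print (scanSpec (+) [1,2,3,4,5])  -- prints [1,3,6,10,15]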
4.2 Divide and conquer function

As previously defined, scan uses explicit recursion. A common technique in functional programming is to `package up' recursion into higher order functions. This can be done for Vecs by defining a general divide and conquer function thus:

divCon :: (Comm a, Comm b) =>
          (Vec a -> IO (Vec a)) ->
          (Vec b -> IO (Vec b)) ->
          (a->b) -> Vec a -> IO (Vec b)
divCon d c f v
  | isUnit v  = returnIO (vmap f v)
  | otherwise = d v `thenIO` \u ->
                toHalves (divCon d c f) (divCon d c f) u `thenIO` c

The divCon function recursively halves and processes a vector. It takes as arguments: a vector transformation to be applied before the vector is split, a transformation to be applied after the vector has been concatenated, and a function to be applied to unit vectors. Using this function scan can be defined thus:

scanDC :: Comm a => (a->a->a) -> Vec a -> IO (Vec a)
scanDC op v = divCon d c id v
  where
    d   = returnIO
    c v = distLastLR v `thenIO` \u ->
          toHalves (returnIO . vmap fst)
                   (returnIO . pairmap op)
                   (vpair u v)

4.3 Multiple aggregation

Multiple aggregation is an important parallel operation, which was described in Section 3. To implement this operation three general Vec distribution functions are required; these functions have implementations similar to distLastLR, and hence will not be defined here.

exchangeMid    :: Comm a => Vec a -> IO (Vec a)
distFirstToAll :: Comm a => Vec a -> IO (Vec a)
distLastToAll  :: Comm a => Vec a -> IO (Vec a)

The first function exchanges the values of the middle two elements of a Vec, the second distributes the first value of a Vec to all other elements of a Vec, and the third distributes the last value of a Vec to all other elements of a Vec. The multiple aggregation function may be defined using divCon thus:

magg :: Comm a => (a->a->a) -> Vec a -> IO (Vec a)
magg op v = divCon d c id v
  where
    d   = returnIO
    c v = exchangeMid v `thenIO` \u ->
          toHalves distLastToAll distFirstToAll u `thenIO` \w ->
          toHalves (returnIO . pairmap op)
                   (returnIO . pairmap (flip op))
                   (vpair v w)
(The flip function reverses the arguments to an operator.) Once again this compares favourably with the idealised code described in Section 3, and with the explicit message passing implementation described in the appendix.
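To see the shape of divCon in isolation, here is a purely sequential model over lists (my sketch: pure functions stand in for the IO transformations, and splitting is explicit), together with scan expressed through it:

-- Sequential model of divCon's control structure (a sketch; no
-- communication, plain lists in place of distributed vectors).
divConL :: ([a] -> [a])   -- transformation applied before the split (d)
        -> ([b] -> [b])   -- transformation applied after concatenation (c)
        -> (a -> b)       -- function applied to unit vectors (f)
        -> [a] -> [b]
divConL d c f v
  | length v <= 1 = map f v
  | otherwise     = c (divConL d c f l ++ divConL d c f r)
  where
    u      = d v
    (l, r) = splitAt (length u `div` 2) u

-- scan through the model: nothing happens before the split; afterwards
-- the right half absorbs the last element of the scanned left half.
scanL :: (a -> a -> a) -> [a] -> [a]
scanL op = divConL id fixRight id
  where
    fixRight w = let (l, r) = splitAt (length w `div` 2) w
                 in  l ++ map (op (last l)) r

main :: IO ()
main = print (scanL (+) [1,2,3,4,5,6,7,8])  -- [1,3,6,10,15,21,28,36]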
4.4 Multi-element vectors and masking

The algorithms described so far have assumed that one vector element resides on each processor. For efficiency it is desirable to store and process many vector elements on each processor. This can be achieved by defining a type representing parallel vectors of vectors, and operations on this type. If the type of ordinary sequential vectors is SVec, then a parallel vector of vectors may be defined thus:

type Vector a = Vec (SVec a)
If smap is the map operation over sequential vectors, then a map over Vectors may be defined thus:

vecmap :: (a->b) -> Vector a -> Vector b
vecmap f v = vmap (smap f) v
An efficient scan for Vectors is a little more involved. An algorithm for efficient scanning is described via an example. Consider:

vscan (+) [[1,2,3],[4,5,6],[7,8,9]]
First the sub-vectors are scanned giving: [[1,3,6],[4,9,15],[7,15,24]]
Next the last elements of the sub-vectors ([6,15,24]) are scanned, giving: [6,21,45]. These values are then shifted right one place: [0,6,21]. Finally these last values are added to the corresponding results of scanning the sub-vectors, giving the desired result:

[[1,3,6],[10,15,21],[28,36,45]]

In order to implement this efficient scan on vectors a processor masking operation is required. Masking operations have a similar role to where statements in C*. These operations can be defined using the existing abstractions. However, a function which returns a vector of processor numbers is required:

vProcNo :: IO (Vec Int)
vProcNo = procNo `thenIO` \pn -> mkVec pn

Using vProcNo a processor mask operation may be defined thus:

procMask :: (Int -> Bool) -> Vec a -> Vec a -> IO (Vec a)
procMask pred u v =
  vProcNo `thenIO` \vpn ->
  returnIO (pairmap f (vpair vpn (vpair u v)))
  where
    f pn (a,b) | pred pn   = a
               | otherwise = b

The procMask operation takes a predicate and two vectors as arguments. The predicate is supplied with each processor's number. If the predicate is true for a particular processor then the corresponding element from the first vector appears in the result vector, otherwise the element from the second vector is returned. Processor masking enables different vector results to be generated depending upon processors' numbers. In addition to procMask, a communication (transformation) function to shift Vec values one place left or right is required. This takes a direction, an end value and a Vec as arguments:

shift :: Comm a => Dir -> a -> Vec a -> IO (Vec a)

Eg: shift right 0 [1,2,3,4] = [0,1,2,3]. The scan operation on vectors may be defined thus:

vscan :: Comm a => (a->a->a) -> Vector a -> IO (Vector a)
vscan op vec =
  let w = vmap (sscan op) vec in
  scanDC op (vmap slast w)        `thenIO` \v ->
  shift right (error "no val") v  `thenIO` \u ->
  procMask isFirstProc w (pairmap g (vpair u w))
  where
    g x a = smap (op x) a

(The function slast returns the last element of an SVec and sscan performs a scan over an SVec.) The implementation closely follows the previous description of the algorithm. This section has shown how multi-element vectors may be defined and processed, and how masking may be implemented. Masking is an important data parallel operation.
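The multi-element algorithm can be checked in miniature with a pure model (my sketch: the outer list plays the distributed Vec, the inner lists the per-processor SVecs; the undefined end value mirrors the paper's error "no val" and is never demanded, thanks to lazy evaluation):

import Data.List (scanl1)

type SVec a = [a]

sscan :: (a -> a -> a) -> SVec a -> SVec a
sscan = scanl1

slast :: SVec a -> a
slast = last

vscanModel :: (a -> a -> a) -> [SVec a] -> [SVec a]
vscanModel op vec = zipWith3 combine [1 ..] shifted scanned
  where
    scanned = map (sscan op) vec            -- scan each sub-vector locally
    lasts   = scanl1 op (map slast scanned) -- scan the sub-vector lasts
    shifted = undefined : init lasts        -- shift right one place
    combine i x block
      | i == (1 :: Int) = block             -- first processor: unchanged
      | otherwise       = map (op x) block  -- fold in the carried prefix

main :: IO ()
main = print (vscanModel (+) [[1,2,3],[4,5,6],[7,8,9]])
-- prints [[1,3,6],[10,15,21],[28,36,45]], matching the worked example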
5 Further work

The implementation which has been described can be adapted to most architectures. For example, in the case of a grid a two dimensional array is the natural parallel data structure. With a few minor changes the implementation can also manipulate mutable vectors.

From an efficiency viewpoint there are a couple of areas to be explored. Firstly, the use of recursion level counters can be optimised; an extreme optimisation, valid for machines with small numbers of processors, is to completely unfold the recursion. Secondly, conformance checking, checking that vectors have the same size, could be optimised. Often it can be deduced that vectors must have the same size. A more sophisticated type system might even eliminate the need for conformance checking; however, a simple solution to the problem is sought. One aspect of the implementation which could be improved is the distribution functions. The idealised code in Section 3 for scan and magg used a single generic distribute function. It is desirable to devise a single function like this. However it is also important that distribution/communication functions are efficient. Investigation has also started into explicitly representing communication patterns. This allows communication patterns to be manipulated, which can be useful since often communication patterns are simple reflections or variations of others. Ultimately the goal is to be able to generate a right to left scan by simply reversing a left to right scan.

The abstractions as described must be used carefully; they do not support completely foolproof data parallel programming. Problems can arise if message passing or processor identification via procNo is combined with the D&C abstractions. In particular, if only certain processors, determined via procNo, engage in a communication activity (eg. distLastToAll) a program may not behave correctly. Also, although a vProcNo operation was introduced, it is not strictly necessary. The following is a valid program:

pMask pred u v =
  procNo `thenIO` \pn ->
  if pred pn then returnIO u
             else returnIO v
A potential solution is to design more restrictive abstractions which prevent the use of procNo in D&C programs. It is also necessary to prevent the import of values from program parts written using message passing. However, so far such abstractions have proved too restrictive, and have led to a clumsy style of programming.
6 Conclusions
A simple implementation of abstractions which support divide and conquer parallelism has been described. It should be possible to implement these on most parallel systems which support message passing. The D&C paradigm of parallel programming can lead to clearer and more portable programs than if other paradigms are used. In addition, the implementation technique presented allows message passing and D&C paradigms to be combined.
Acknowledgements
I would like to thank the anonymous referees for their helpful comments.
References
[1] R Bird and P Wadler. An Introduction to Functional Programming. Prentice Hall International, 1988.

[2] F W Burton and M R Sleep. Executing functional programs on a virtual tree of processors. In Conference on Functional Programming Languages and Computer Architecture, pages 187-194, Portsmouth, New Hampshire, October 1982.

[3] M Cole. Algorithmic Skeletons: structured management of parallel computation. Pitman, 1989.

[4] P Hudak and J H Fasel. A gentle introduction to Haskell. SIGPLAN Notices, Volume 27, Number 5, pages T1-T53, May 1992.

[5] P Hudak, S L Peyton Jones and P L Wadler (editors). Report on the programming language Haskell, a non-strict purely functional language (version 1.2). SIGPLAN Notices, Volume 27, Number 5, May 1992.

[6] J Launchbury and S L Peyton Jones. Lazy functional state threads. In Proceedings of ACM SIGPLAN '94 Conference on Programming Language Design and Implementation, pages 24-35. ACM Press, June 1994.

[7] D McBurney and M R Sleep. Transputer-based experiments with the ZAPP architecture. Technical Report SYS-C86-10, University of East Anglia, November 1986.

[8] Z G Mou. Divacon: A parallel language for scientific computing based on divide-and-conquer. In Proceedings of 3rd Symposium on Frontiers of Massively Parallel Computation, pages 451-461. IEEE, October 1990.

[9] Z G Mou. A Formal Model for Divide-and-Conquer and Its Parallel Realization. PhD thesis, Yale University, 1990.

[10] D B Skillicorn. Architecture-independent parallel computation. IEEE Computer, Volume 23, Number 12, pages 38-50, December 1990.
Appendix

scan :: Comm a => (a->a->a) -> a -> IO a
scan f a =
  procNo `thenIO` \pn ->
  noProc `thenIO` \np ->
  thenIOs (map (scanphase pn f) [1..log2c np]) a

scanphase :: Comm a => Int -> (a->a->a) -> Int -> a -> IO a
scanphase pn f i a =
  if isLeftLast pn i then
    send right a `seqIO` returnIO a
  else if inLeftHalf pn i then
    returnIO a
  else
    receive left `thenIO` \v ->
    if isRightLast pn i then
      returnIO (f v a)
    else
      send right v `seqIO` returnIO (f v a)

aggregate :: Comm a => (a->a->a) -> a -> IO a
aggregate f a =
  procNo `thenIO` \pn ->
  noProc `thenIO` \np ->
  thenIOs (map (agg pn f) [1..log2c np]) a

agg :: Comm a => Int -> (a->a->a) -> Int -> a -> IO a
agg pn f i a =
  if inRightHalf pn i then
    if isRightFirst pn i then
      send left a `seqIO`
      receive left `thenIO` \v ->
      if i > 1 then send right v `seqIO` returnIO (f v a)
               else returnIO (f v a)
    else
      receive left `thenIO` \v ->
      if isRightLast pn i then returnIO (f v a)
                          else send right v `seqIO` returnIO (f v a)
  else -- in left half
    if isLeftLast pn i then
      send right a `seqIO`
      receive right `thenIO` \v ->
      if i > 1 then send left v `seqIO` returnIO (f a v)
               else returnIO (f a v)
    else
      receive right `thenIO` \v ->
      if isLeftFirst pn i then returnIO (f a v)
                          else send left v `seqIO` returnIO (f a v)
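The appendix code relies on several helpers the paper leaves undefined. The following is one plausible reconstruction (my assumptions, not the paper's: processors are numbered from 1, and at phase i processors form blocks of size 2^i, each split into a left and a right half):

import Data.Bits (shiftL)

-- Ceiling of log base 2.
log2c :: Int -> Int
log2c n = head [k | k <- [0..], 1 `shiftL` k >= n]

-- Chain a list of IO transformations, threading the value through each
-- (written here with standard return and >>=, i.e. returnIO and thenIO).
thenIOs :: [a -> IO a] -> a -> IO a
thenIOs []     a = return a
thenIOs (f:fs) a = f a >>= thenIOs fs

-- Position of processor pn within its block of size 2^i, counting from 0.
posInBlock :: Int -> Int -> Int
posInBlock pn i = (pn - 1) `mod` (1 `shiftL` i)

inLeftHalf, inRightHalf, isLeftFirst, isLeftLast,
  isRightFirst, isRightLast :: Int -> Int -> Bool
inLeftHalf   pn i = posInBlock pn i <  (1 `shiftL` (i-1))
inRightHalf  pn i = posInBlock pn i >= (1 `shiftL` (i-1))
isLeftFirst  pn i = posInBlock pn i == 0
isLeftLast   pn i = posInBlock pn i == (1 `shiftL` (i-1)) - 1
isRightFirst pn i = posInBlock pn i == (1 `shiftL` (i-1))
isRightLast  pn i = posInBlock pn i == (1 `shiftL` i) - 1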