A Calculational Framework for Parallelization of Sequential Programs

Zhenjiang HU
University of Tokyo
[email protected]

Masato TAKEICHI
University of Tokyo
[email protected]

Abstract

A great deal of effort has been made on systematic ways for parallelizing sequential programs. What seems to be unsatisfactory, however, is that the current approaches are either too general, where many heuristics are needed, or too restrictive, where the application scope is rather limited. In this paper, we propose a calculational framework for deriving parallel divide-and-conquer programs from naive sequential ones. Being more constructive, our method is not only helpful in the design of efficient parallel programs in general but also promising in the construction of a parallelization system. Several interesting examples are used for illustration.

1 Introduction

Parallel programs are known to be more difficult to write than their sequential counterparts. Consider the sbp problem of determining whether the brackets '(' and ')' in a given string are correctly matched. This problem has a straightforward linear sequential algorithm, in which the string is examined from left to right. A counter is initialized to 0, and incremented or decremented as opening and closing brackets are encountered.

    sbp xs          = sbp' xs 0
    sbp' [ ] c      = (c = 0)
    sbp' (x : xs) c = if x = '(' then sbp' xs (c + 1)
                      else if x = ')' then c > 0 ∧ sbp' xs (c − 1)
                      else sbp' xs c

It, however, becomes difficult to write a parallel program like those in [BSS91, Col95], whose underlying algorithms are actually non-trivial. So, it would be helpful if we could parallelize sequential programs. As a matter of fact, much attention has been drawn to looking into systematic ways of parallelization of sequential programs. Basically, there are two major approaches. One aims at a general derivation of parallel programs from sequential ones, e.g. [CTT97]. It utilizes artificial-intelligence techniques such as synthesis from examples, and makes a painstaking effort to systemize the derivation process while imposing as few restrictions as possible on the form of sequential programs. This approach has the advantage of generality, but it usually requires heuristics and human insights in the derivation process, which seems difficult to make automatic. The other approach, which is very popular, is to use the Bird-Meertens formalism [Bir87, MFP91] to synthesize parallel functional programs by program calculation, e.g. [GDH94, Gor96a, Gor96b]. Different from the first approach, whose emphasis is on the derivation process, its emphasis is on skeletons of sequential programs whose parallel versions can be obtained by a simple application of calculational rules (laws). This calculational approach has the advantage of a simple derivation process suited to mechanical implementation, as demonstrated in other applications [OHIT97, HITT97]. But for lack of descriptive power of the specific forms, its application scope is rather limited.

In this paper, we make the first attempt to construct a calculational framework for parallelization with the aim of combining the advantages of the two approaches. We take advantage of the first one for deriving elementary parallelization laws, and of the second one for constructing our parallelization algorithm. Our main contributions are as follows.
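For concreteness, the sequential definition above can be transcribed directly into an executable form. The Python sketch below (function names are our transcription, not the paper's) mirrors sbp and its helper sbp'.

```python
def sbp(xs: str) -> bool:
    """Sequential bracket matching: scan left to right with a counter."""
    return sbp1(xs, 0)

def sbp1(xs: str, c: int) -> bool:
    # c counts currently open brackets; it must never go negative
    # and must be zero when the whole string has been consumed.
    if not xs:
        return c == 0
    x, rest = xs[0], xs[1:]
    if x == '(':
        return sbp1(rest, c + 1)
    elif x == ')':
        return c > 0 and sbp1(rest, c - 1)
    else:
        return sbp1(rest, c)
```

Note that the counter c forces a strictly left-to-right evaluation order, which is exactly what makes this definition hard to parallelize directly.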

• We develop several elementary but general parallelization laws (Section 3). By elementary, we mean that they contribute to the core transformations in our parallelization algorithm; and by general, we mean that they are more powerful than the previous ones [Ski92, GDH94, Gor96a, Gor96b] and can be applied to synthesize several interesting parallel programs (as demonstrated in Section 3). Moreover, these laws can be directly implemented by simple symbolic manipulation.

• We propose a systematic and constructive parallelization algorithm (Section 4) for the derivation of divide-and-conquer parallel programs from naive sequential ones. The distinguished point of our algorithm is its constructive way of deriving associative/distributive operators from the data types, and its effective use of the fusion and tupling calculations in the parallelizing process.

• Our parallelization algorithm is given in a calculational way like those in [OHIT97, HITT97]. Therefore, it preserves the advantages of transformation in calculational form: being correct, being guaranteed to terminate, and being natural to generalize to programs over data types other than the lists we study in this paper. It would be not only helpful in the design of efficient parallel programs but also promising in the construction of a parallelization system.

The organization of this paper is as follows. In Section 2, we review the notational conventions and some basic concepts used in this paper. Then we focus on deriving several basic parallelization laws with some interesting examples in Section 3. Finally, we propose our parallelization algorithm in Section 4. Related work and conclusions are given in Sections 5 and 6 respectively.

Information Systems and Technologies for Network Society, Fukuoka, Japan, September 1997

2 Preliminary

In this section, we briefly review the notational conventions known as the Bird-Meertens Formalism [Bir87] and some basic concepts which will be used in the rest of this paper.

2.1 Functions

Functional application is denoted by a space, and the argument may be written without brackets. Thus f a means f (a). Functions are curried, and application associates to the left. Thus f a b means (f a) b. Functional application is regarded as more binding than any other operator, so f a ⊕ b means (f a) ⊕ b, but not f (a ⊕ b). Functional composition is denoted by a centralized circle ◦. By definition, we have (f ◦ g) a = f (g a). Functional composition is an associative operator, and the identity function is denoted by id.

Infix binary operators will often be denoted by ⊕, ⊗ and can be sectioned; an infix binary operator like ⊕ can be turned into unary functions by (a⊕) b = a ⊕ b = (⊕b) a.

The projection function πᵢ selects the i-th component of tuples, e.g., π₁ (a, b) = a. The △ is an important binary operator on tuples, defined by (f △ g) a = (f a, g a). And f₁ △ · · · △ fₙ is abbreviated to △ⁿᵢ₌₁ fᵢ.

2.2 Lists

Lists are finite sequences of values of the same type. There are two basic views of lists.

• Parallel View. A list is either empty, a singleton, or a concatenation of two lists. We write [ ] for the empty list, [a] for the singleton list with element a (and [·] for the function taking a to [a]), and xs ++ ys for the concatenation of xs and ys. We usually call the lists in the parallel view append lists.

• Sequential View. A list is either empty [ ], or constructed from an element a and a list x either by the constructor : to give a : x or by the constructor ^: to give x ^: a. Semantically we have a : x = [a] ++ x and x ^: a = x ++ [a]. To tell the difference, we call the former cons lists and the latter snoc lists.

Concatenation is associative, and [ ] is its unit. For example, the term [1] ++ [2] ++ [3] denotes a list with three elements, often abbreviated to [1, 2, 3].

2.3 Recursions on Lists

Functions over lists are usually defined in a recursive way. This section introduces some well-known recursive patterns over append, cons and snoc lists.

Definition 1 ((List) Homomorphism)
A function h satisfying the following three equations is called a list homomorphism:

    h [ ]        = ι⊕
    h [x]        = f x
    h (xs ++ ys) = h xs ⊕ h ys

where ⊕ is an associative binary operator with unit ι⊕. We write ([f, ⊕])¹ for the unique function h. Usually, even a function h defined by the last two equations is considered to be a list homomorphism too. □

List homomorphisms are a good characterization of parallel computational models [Ski92, GDH94, Col95]. Intuitively, the definition of list homomorphisms means that the value of h on the larger list depends in a particular way (using the binary operation ⊕) on the values of h applied to the two pieces of the list.

¹ Strictly speaking, we should write ([ι⊕, f, ⊕]) to denote the unique function h. We can omit the ι⊕ because it is the unit of ⊕.
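As a concrete illustration (our transcription, not the paper's), a list homomorphism ([f, ⊕]) can be evaluated by splitting the list anywhere; the Python sketch below evaluates it by splitting at the midpoint, which is precisely the divide-and-conquer shape.

```python
def hom(f, op, unit, xs):
    """Evaluate the list homomorphism ([f, op]) with unit `unit` on xs.

    h []       = unit
    h [x]      = f x
    h (l ++ r) = op(h l, h r)   # the two halves could run in parallel
    """
    if len(xs) == 0:
        return unit
    if len(xs) == 1:
        return f(xs[0])
    mid = len(xs) // 2
    return op(hom(f, op, unit, xs[:mid]), hom(f, op, unit, xs[mid:]))

# Examples: sum and the identity on lists are homomorphisms.
total = hom(lambda x: x, lambda a, b: a + b, 0, [1, 2, 3, 4])   # sum
```

Associativity of op with unit `unit` is what guarantees that any split point gives the same answer.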


The computations of h xs and h ys are independent of each other and can thus be carried out in parallel. This simple equation can be viewed as expressing the well-known divide-and-conquer paradigm. A number of works on efficiently mapping list homomorphisms to particular parallel architectures can be found in [Ski92, GDH94, Gor96a]. It follows that if we can derive list homomorphisms from sequential programs, we can obtain corresponding parallel programs.

Definition 2 (Left / Right Reduction)
A list function h is called a left reduction if there exists a binary operator ⊕ and a value e such that

    h [ ]       = e
    h (xs ^: x) = h xs ⊕ x.

Dually, h is called a right reduction if there exists a binary operator ⊕ and a value e such that

    h [ ]      = e
    h (x : xs) = x ⊕ h xs. □

In contrast to the parallelism in homomorphisms, left and right reductions stipulate a computation order, leading to sequential programs.

Theorem 1 (Third Homomorphism Theorem)
A function h is a homomorphism if it is both a left and a right reduction. □

This theorem [Gib96] states that a function that can be computed by both a left reduction and a right reduction is necessarily a list homomorphism, which can be computed according to any parenthesization of the list. This theorem looks very attractive because it claims that if a problem can be defined by two specific sequential programs, then it can be defined by a list homomorphism which can be implemented in parallel as argued in Section 2.3. However, there remain two major problems with this "attractive" theorem. First, there are a lot of useful and interesting list functions that are not list homomorphisms and thus have no corresponding operator ⊕ [Col95]. Second, as pointed out by Gorlatch [Gor96b], although the existence of an associative binary operator is guaranteed, the theorem does not address a direct and efficient way of calculating it.

To solve these problems, rather than using list homomorphisms, we choose mutumorphisms (mutual morphisms) [Fok92] on append lists as our parallel computation model.

Definition 3 ((List) Mutumorphisms)
The h₁, · · ·, hₙ are called list mutumorphisms if they are mutually defined in the following way:

    hᵢ [ ]        = ι⊕ᵢ
    hᵢ [x]        = fᵢ x
    hᵢ (xs ++ ys) = ((△ⁿᵢ₌₁ hᵢ) xs) ⊕ᵢ ((△ⁿᵢ₌₁ hᵢ) ys) □

List mutumorphisms (mutumorphisms for short in this paper) are a generalization of homomorphisms, powerful enough to describe all primitive recursive functions. Moreover, they can be automatically turned into efficient list homomorphisms via tupling calculation [HIT97, HITT97].

Theorem 2 (Tupling [HIT97])
Let h₁, · · ·, hₙ be mutumorphisms as defined in Definition 3. Then, △ⁿᵢ₌₁ hᵢ = ([△ⁿᵢ₌₁ fᵢ, △ⁿᵢ₌₁ ⊕ᵢ]) and (ι⊕₁, · · ·, ι⊕ₙ) is the unit of △ⁿᵢ₌₁ ⊕ᵢ. □

Therefore, like homomorphisms, mutumorphisms can be considered a good characterization of a parallel computational model as well. In the rest of this paper, we shall focus on how to derive mutumorphisms from sequential programs in a very general setting.

3 Parallelization Laws

Before giving our parallelization algorithm, we shall develop several elementary laws for parallelizing sequential programs. We develop these laws basically based on the parallel synthesis method (second-order generalization + induction) [CTT97]. We omit the discussion of the detailed development process [HT97b], where some extension of the parallel synthesis method has been made.

3.1 Basic Form

Sequential programs are usually specified in the following recursive way:

    f (x : xs) = g x (g' xs) ⊕ f xs

reading that the result of f applied to x : xs can be obtained by merging the result of f recursively applied to xs with the results of the other functions g and g' applied to x and xs. Certainly, not all sequential programs have efficient parallel counterparts. If the ⊕ is associative, we then have the following divide-and-conquer parallel program for f.

Theorem 3 (Basic Form) Given is a sequential program

    f [ ]      = e
    f (x : xs) = g x (g' xs) ⊕ f xs

where ⊕ denotes an associative binary operator and g' is a homomorphism ([f_g', ⊕']). Then, for any non-empty lists xs and ys, we have

    f [x]        = g x (g' [ ]) ⊕ e
    f (xs ++ ys) = G xs (g' ys) ⊕ f ys

where G is a function defined by

    G [x] z        = g x z
    G (xs ++ ys) z = G xs (g' ys ⊕' z) ⊕ G ys z.

Proof. We prove the new definition of f by induction on the length of the non-empty list xs.

• Base. Assuming xs = [x], we have

      f (xs ++ ys)
    =   { Assumption }
      f ([x] ++ ys)
    =   { trivial }
      f (x : ys)
    =   { Given definition of f }
      g x (g' ys) ⊕ f ys
    =   { Definition of G }
      G [x] (g' ys) ⊕ f ys
    =   { Assumption }
      G xs (g' ys) ⊕ f ys

• Induction. Assuming xs = x : xs', we have

      f (xs ++ ys)
    =   { Assumption }
      f ((x : xs') ++ ys)
    =   { trivial }
      f (x : (xs' ++ ys))
    =   { Definition of f }
      g x (g' (xs' ++ ys)) ⊕ f (xs' ++ ys)
    =   { Definition of G, Inductive hypothesis }
      G [x] (g' (xs' ++ ys)) ⊕ (G xs' (g' ys) ⊕ f ys)
    =   { Definition of g' }
      G [x] (g' xs' ⊕' g' ys) ⊕ (G xs' (g' ys) ⊕ f ys)
    =   { Associativity of ⊕ }
      (G [x] (g' xs' ⊕' g' ys) ⊕ G xs' (g' ys)) ⊕ f ys
    =   { Definition of G }
      G ([x] ++ xs') (g' ys) ⊕ f ys
    =   { Since xs = x : xs' }
      G xs (g' ys) ⊕ f ys □

This theorem shows a mechanical way to turn a sequential definition of f into mutumorphisms², which can be automatically transformed into efficient homomorphisms by an application of the tupling theorem. Notice that the sequential programs that can be dealt with here are much more general than those in the previous studies [BSS91, GDH94, Gor96b] because the xs is also allowed to be used by functions other than f.

² By efficiency, we mean that redundant computations due to multiple data traversals of the input by several functions in the mutumorphisms are removed.

3.2 Extended Forms

This section extends the basic form in several respects, enriching the syntactic structures in a recursive definition. One is accumulating parameters, very useful to store information for later computation, as seen in the example of the sbp problem in the introduction where a counter is used for accumulation. The following theorem is a natural extension of Theorem 3, capable of dealing with accumulating parameters.

Theorem 4 (Accumulation) Given is a sequential program

    f [ ] c      = g1 c
    f (x : xs) c = g2 x (g2' xs) c ⊕ f xs (g3 x ⊗ c)

where ⊕ and ⊗ are two associative binary operators, and g2' = ([f_g2', ⊕']). Then, for any non-empty lists xs and ys, we have

    f [x] c        = g2 x (g2' [ ]) c ⊕ g1 (g3 x ⊗ c)
    f (xs ++ ys) c = G2 xs (g2' ys) c ⊕ f ys (G3 xs ⊗ c)

where G2 and G3 are functions defined by

    G2 [x] z c        = g2 x z c
    G2 (xs ++ ys) z c = G2 xs (g2' ys ⊕' z) c ⊕ G2 ys z (G3 xs ⊗ c)
    G3 [x]            = g3 x
    G3 (xs ++ ys)     = G3 ys ⊗ G3 xs. □

For a use of the theorem, recall the sbp problem given in the introduction. Based on the manipulation property of the if-structure, we would like to turn the definition of sbp' into the following.

    sbp' (x : xs) c = (if x = '(' then True
                       else if x = ')' then c > 0 else True)
                      ∧ sbp' xs ((if x = '(' then 1
                                  else if x = ')' then −1 else 0) + c)

It follows from Theorem 4³ that

    sbp' (xs ++ ys) c = G2 xs c ∧ sbp' ys (G3 xs + c)

where

    G2 [x] c        = if x = '(' then True
                      else if x = ')' then c > 0 else True
    G2 (xs ++ ys) c = G2 xs c ∧ G2 ys (G3 xs + c)
    G3 [x]          = if x = '(' then 1 else if x = ')' then −1 else 0
    G3 (xs ++ ys)   = G3 ys + G3 xs

³ Note that the second parameter of G2 is a dead one (i.e., not necessary) and has been removed.

This is the parallel version we would like to get in this paper, although it is currently inefficient as there are multiple recursive calls in the RHS which operate on the same input list. But this can be automatically improved by tupling calculation as intensively studied in [HIT97, HITT97]. For instance, we can obtain the following program by tupling sbp', G2 and G3.

    sbp' xs c = s   where (s, g2, g3) = s23 xs c
    s23 [x] c        = if x = '(' then (c + 1 = 0, True, 1)
                       else if x = ')' then (c − 1 = 0, c > 0, −1)
                       else (c = 0, True, 0)
    s23 (xs ++ ys) c = let (sx, g2x, g3x) = s23 xs c
                           (sy, g2y, g3y) = s23 ys (g3x + c)
                       in (g2x ∧ sy, g2x ∧ g2y, g3x + g3y)

It is worth noting that s23 can be implemented in parallel on a multiple-processor system supporting bidirectional tree-like communication, in O(log n) time where n denotes the length of the input list, based on the algorithm in [Ble89]. Two passes are employed: an upward pass in the computation can be used to compute the third component of s23 xs c before a downward pass is used to compute the first two values of the tuple. This example is taken from [Col95], where only an informal and intuitive derivation was given. Although our derived program is a bit different, it is as efficient as that in [Col95].

Another structure that is important in a recursive definition is the conditional. Related work can be found in [FG94, CDG96], where transformation on conditional expressions is proposed. Our parallelization law with regard to the conditional structure is as follows.

Theorem 5 (Conditional) Given is a sequential program

    f [ ]      = e
    f (x : xs) = if g1 x (g1' xs) then g2 x (g2' xs) ⊕ f xs
                 else g3 x (g3' xs)

where ⊕ denotes an associative binary operator, and gi' = ([f_gi', ⊕i']) for i = 1, 2, 3. Then, we can calculate a parallel version of f.

Proof. The calculational rule can be found in [HT97a]. □

An example of the use of the theorem for parallelizing a sequential program solving the least sorted prefix problem can be found in [HT97a].

So far we have considered linear recursions, i.e., recursions with a single recursive call in the definition body. Now we shall provide our parallelization law for nonlinear recursions. For instance, the following lfib is a tricky nonlinear recursion on lists, which computes the fibonacci number of the length of a given list, mimicking the fibonacci function on natural numbers.

    lfib [ ]       = 1
    lfib (x : xs)  = lfib xs + lfib' xs
    lfib' [ ]      = 0
    lfib' (x : xs) = lfib xs

To handle nonlinear recursive sequential programs, we need to make use of a distributive property in order to parallelize them.

Theorem 6 (Nonlinear Structure) Assume that f1 and f2 are mutually recursive functions defined by

    f1 [ ]      = e1
    f1 (x : xs) = g1 x (g1' xs) ⊕ (g11 ⊗ f1 xs) ⊕ (g12 ⊗ f2 xs)
    f2 [ ]      = e2
    f2 (x : xs) = g2 x (g2' xs) ⊕ (g21 ⊗ f1 xs) ⊕ (g22 ⊗ f2 xs)

where ⊕ is associative and commutative, and ⊗ is associative and distributive w.r.t. ⊕, i.e., for any x, y and z,

    x ⊗ (y ⊕ z) = (x ⊗ y) ⊕ (x ⊗ z).

Then, f1 and f2 can be parallelized.

Proof. The calculational rule can be found in [HT97a]. □

We can use this theorem to parallelize the lfib function and get an efficient O(log n) parallel program [HT97a].

4 Parallelization Algorithm

Having given the parallelization laws, we are ready to propose our parallelization algorithm, making it clear how to recognize associative and distributive operators in a program and how to apply these parallelization laws in a systematic way.

4.1 Recognizing Associative and Distributive Operators

Central to our parallelization laws is the use of the associativity of a binary operator ⊕ as well as the distributivity of ⊗. So we must be able to recognize them in a program. There are several ways: limiting the application scope so that all associative and distributive operators can be made explicit, e.g. in [FG94, CTT97], or adopting some artificial methods like anti-unification [Hei94] to synthesize them. To be constructive, we shall focus on the associative and distributive operators that are derivable from the target type of a sequential program to be parallelized.
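Before turning to operator recognition, the tupled s23 program of Section 3.2 can be checked executably. The Python transcription below (names ours; the recursion only exhibits where the splits are, not Blelloch's two-pass processor schedule) splits the input at the midpoint and recovers parallel bracket matching.

```python
def s23(xs, c):
    """Tupled bracket checker: returns (sbp' xs c, G2 xs c, G3 xs)."""
    if len(xs) == 1:
        x = xs[0]
        if x == '(':
            return (c + 1 == 0, True, 1)
        elif x == ')':
            return (c - 1 == 0, c > 0, -1)
        else:
            return (c == 0, True, 0)
    m = len(xs) // 2
    _, g2x, g3x = s23(xs[:m], c)            # left half
    sy, g2y, g3y = s23(xs[m:], g3x + c)     # right half, counter shifted by g3x
    # (sbp', G2, G3) combined as in the tupled program
    return (g2x and sy, g2x and g2y, g3x + g3y)

def sbp_par(xs):
    """Bracket matching via the divide-and-conquer s23."""
    if not xs:
        return True
    s, _, _ = s23(xs, 0)
    return s
```

The two recursive calls on the halves are the points that a parallel implementation would distribute over processors.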

In fact every type R which has a zero constructor CZ (a constructor with no arguments, like [ ] for lists) has a function ⊕ that is associative and that has the zero CZ as both a left and a right identity [SF93]. Such a function ⊕ is called a zero replacement function, since x ⊕ y means replacing all the CZ in x with y. Here are two examples. For the type of cons lists, we have such a ⊕ defined by

    [ ] ⊕ y      = y
    (x : xs) ⊕ y = x : (xs ⊕ y)

which is the list append operator ++; and for the type of natural numbers, we have it defined by

    0 ⊕ y        = y
    (Succ n) ⊕ y = Succ (n ⊕ y)

which is the integer addition +.

Associating with ⊕, we may derive a most natural distributive ⊗. For example, for the type of natural numbers, associating with + we have a distributive operator ⊗ defined by:

    (x ⊗) 0        = 0
    (x ⊗) (Succ n) = x + x ⊗ n

Clearly, ⊗ is our familiar ×. This natural distributive operator is very important, as seen in Theorem 6.

4.2 Main Algorithm

Suppose that we are given a sequential program defined by

    f : [A] → C → R
    f [ ] c      = g1 c
    f (x : xs) c = body

where body is an expression. The accumulating parameter is possibly unnecessary, in which case it can be eliminated. Our algorithm derives a parallel mutumorphism for f, consisting of four steps. We shall adopt scan (also known as prefix sums) [Ble89, FG94, Gor96b] as our running example:

    scan [ ]      = [ ]
    scan (x : xs) = x : (x+) ∗ scan xs

where ∗ is a map function [Bir87] defined informally by f ∗ [x1, · · ·, xn] = [f x1, · · ·, f xn].

Step 1: Making the Associative Operator Explicit

First of all, we need to make the associative operator ⊕ explicit in our program. Such a ⊕ is not an arbitrary associative operator; rather it is the zero replacement operator derivable from the resulting type R, which has a zero constructor CZ. To this end, we try to rewrite constructor expressions for producing the result by using ⊕. For instance, when R is the cons list type (whose zero constructor is [ ] and whose associative operator is ++), we have the rule

    x : e  ⇒  [x] ++ e.

When R is the type of natural numbers, we have

    Succ e  ⇒  (Succ 0) + e.

Returning to our running example, we should get

    scan [ ]      = [ ]
    scan (x : xs) = [x] ++ (x+) ∗ scan xs.

Step 2: Normalizing the Body

In order to apply our parallelization laws, we normalize the body to the following normal form:

    e1 ⊕ e2 ⊕ · · · ⊕ en

where each ei is (i) an expression without recursive calls (to f), or (ii) a recursive call (to f), or (iii) a function application, say g e, where e is an expression of (ii) and g is another function, or (iv) an if expression, say if e1' then e2' else e3', where e2' is an expression of form (ii) or (iii), and e3' is an expression of form (i). Looking at the scan example, we simply normalize the body to

    [x] ++ (x+) ∗ scan xs

in which the first expression is of form (i) and the second can be considered as a function application g (scan xs) where g r = (x+) ∗ r.

Step 3: Removing Recursive Calls by Fusion

Recall that the parallelization laws require that recursive calls (to f) should be exposed to the associative operator in the body rather than being wrapped in a function application, and require that the predicate part of an if expression should not contain recursive calls. However, as seen in Step 2, the normalized body may contain expressions that violate this requirement. Our idea is to apply fusion calculation [OHIT97] in order to remove such recursive calls and to turn the unexpected expressions into expected ones.

As an example, consider our running example of scan, where the recursive call scan xs is not directly exposed to the associative operator ++, being included in the expression (x+) ∗ scan xs. Let scan' xs x = (x+) ∗ scan xs. We apply the fusion calculation to scan' and obtain the following result.

    scan (x : xs)    = [x] ++ scan' xs x
    scan' [ ] y      = [ ]
    scan' (x : xs) y = [x + y] ++ scan' xs (x + y)

Step 4: Applying Parallelization Laws

Now we are ready to use the parallelization laws to derive a parallel f. There are three cases according to the structure of the normalized body.

• First, the transformed body has no recursive call to f. In this case, we step to parallelize the other functions in the body. For instance, for the new scan definition

    scan (x : xs) = [x] ++ scan' xs x

we should step to parallelize scan'.

• Second, the transformed body has a single recursive call to f (a direct recursive call or a recursive call inside an if structure), denoted by E[f] here and after. We have three subcases.

  – body = e ⊕ E[f]. We apply Theorem 3, 4 or 5 for parallelization. For the example of scan', we can use Theorem 4 and get the following parallel version for scan' after the elimination of the second parameter of G2.

        scan' (xs ++ ys) c = G2 xs c ++ scan' ys (G3 xs + c)
        G2 [x] c           = [x + c]
        G2 (xs ++ ys) c    = G2 xs c ++ G2 ys (G3 xs + c)
        G3 [x]             = x
        G3 (xs ++ ys)      = G3 ys + G3 xs

    The new scan' can thus be parallelized by Theorem 4, leading to a parallel scan.

  – body = E[f] ⊕ e. Defining a new associative operator ⊕̂ by x ⊕̂ y = y ⊕ x, we turn the body into the first subcase, i.e., body = e ⊕̂ E[f].

  – body = e1 ⊕ E[f] ⊕ e2. Here, we need to check whether ⊕ is commutative. If so, we exchange the positions of E[f] and e2, transforming it to the first subcase. Otherwise, we give up parallelizing.

• Third, the transformed body has more than one recursive call, say two for simplicity. In this case, we require that ⊕ should be commutative and should have a corresponding distributive operator ⊗ with an identity unit, say ι⊗. If this requirement is satisfied, we can transform the body to the form

    e ⊕ E1[f] ⊕ E2[f].

Then we can introduce a new function f' (in fact, f' is the same as f) and have

    e ⊕ E1[f] ⊕ E2[f'].

Now we are able to apply Theorem 6 for parallelizing the mutually defined functions f and f'.
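The derived scan' can again be transcribed into Python for a concrete check (our transcription; here G2 and scan' coincide once the dead parameter is removed, so a single function plays both roles, and scan xs equals scan' xs 0). Splitting at the midpoint, it computes prefix sums.

```python
def g3(xs):
    """G3 xs: the total accumulator shift contributed by xs (its sum)."""
    return sum(xs)

def g2(xs, c):
    """G2 xs c (= scan' xs c): prefix sums of xs, each offset by c."""
    if len(xs) == 1:
        return [xs[0] + c]               # G2 [x] c = [x + c]
    m = len(xs) // 2
    # G2 (l ++ r) c = G2 l c ++ G2 r (G3 l + c); halves are independent
    return g2(xs[:m], c) + g2(xs[m:], g3(xs[:m]) + c)

def scan_par(xs):
    """Prefix sums in divide-and-conquer form."""
    if not xs:
        return []
    return g2(xs, 0)
```

The only value communicated between the two halves is the scalar g3 of the left half, which is what permits the O(log n) two-pass schedule mentioned earlier.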

5 Related Work

Much attention has been paid to parallel programming with homomorphisms [Ski92, Col95, Gor96b, Gor96a, GDH94, HIT97], because homomorphisms are a good characterization of parallel computational models [Ski92, GDH94, Gor96a]. Our work has been much influenced by these works. We are particularly interested in how to derive efficient homomorphisms.

One popular way, known as the calculational approach [Ski92], makes use of laws in the Bird-Meertens Formalism [Bir87]. It forces the initial programs to be described in terms of a small set of specific homomorphisms such as map and reduction, from which a more complicated homomorphism is derived based on calculational laws such as promotion rules. As argued in this paper, homomorphisms are rather limiting, excluding many interesting programs. To remedy this situation, Cole [Col95] proposed the idea of near homomorphisms (also called almost homomorphisms), a composition of a projection function with a homomorphism, and gave an intuitive and informal account of how to solve problems using them. This idea was then formalized by [HIT97]. However, all of the above require programs to be initially defined on append lists (i.e., in the parallel view of lists).

What is more challenging is to derive homomorphisms from sequential programs defined over cons or snoc lists (i.e., in the sequential view of lists). To this end, one idea is to develop skeletons of sequential programs whose list homomorphisms can be easily derived, e.g., in [GDH94, Gor96a]. However, the prepared skeletons are usually less general and depend heavily on the associativity of the operators, but how to find or determine an associative operator was not given. Compared to them, our approach is not restricted to any skeleton. Furthermore, our approach gives a way to recognize the associative operator from the target type.

Another idea for deriving homomorphisms from sequential programs is to make use of the third homomorphism theorem [Gib96]. Barnard et al. [BSS91] tried it for the language recognition problem. As pointed out by [Gor96b], although the existence of an associative binary operator is guaranteed, the theorem does not address any efficient way of calculating it. Gorlatch [Gor96a, Gor96b] proposed an idea of synthesizing list homomorphisms by generalizing both leftward and rightward reduction functions. Since his idea was studied in an informal way and the generalization algorithm was not given at all, it is unclear how to do it in general.

Our work was greatly inspired by the parallel synthesis algorithm in [CDG96, CTT97]. We brought in the idea of second-order generalization and inductive derivation from [CTT97] for building our basic parallelization laws. What is different is that rather than determining a desired pre-parallel form from examples, which requires many heuristics and human insights, we propose a constructive algorithm.

6 Conclusions

In this paper, we have proposed a calculational framework for parallelizing naive sequential programs. In particular, we give a constructive parallelization algorithm, by developing a set of elementary but powerful parallelization laws and by deriving associative and distributive operators from the resulting data type. We have illustrated with several interesting problems that our parallelization algorithm can be applied to a wide class of programs.

Acknowledgements

We gratefully acknowledge inspiring discussions with W.N. Chin, from which some of the ideas in this paper came. Thanks are also due to the members of the Tokyo CACA Seminars (http://www.ipl.t.u-tokyo.ac.jp/~caca/) for many stimulating suggestions.

References

[Bir87]   R. Bird. An introduction to the theory of lists. In M. Broy, editor, Logic of Programming and Calculi of Discrete Design, pages 5-42. Springer-Verlag, 1987.
[Ble89]   Guy E. Blelloch. Scans as primitive parallel operations. IEEE Transactions on Computers, 38(11):1526-1538, November 1989.
[BSS91]   D. Barnard, J. Schmeiser, and D. Skillicorn. Deriving associative operators for language recognition. Bulletin of EATCS (43), pages 131-139, 1991.
[CDG96]   W. Chin, J. Darlington, and Y. Guo. Parallelizing conditional recurrences. In Annual European Conference on Parallel Processing, LNCS 1123, pages 579-586, LIP, ENS Lyon, France, August 1996. Springer-Verlag.
[Col95]   M. Cole. Parallel programming with list homomorphisms. Parallel Processing Letters, 5(2), 1995.
[CTT97]   W. Chin, S. Tan, and Y. Teo. Deriving efficient parallel programs for complex recurrences. In ACM SIGSAM/SIGNUM International Conference on Parallel Symbolic Computation, Hawaii, July 1997. ACM Press. To appear.
[FG94]    A. Fischer and A. Ghuloum. Parallelizing complex scans and reductions. In ACM PLDI, pages 135-146, Orlando, Florida, 1994. ACM Press.
[Fok92]   M. Fokkinga. A gentle introduction to category theory -- the calculational approach. Technical Report Lecture Notes, Dept. INF, University of Twente, The Netherlands, September 1992.
[GDH94]   Z.N. Grant-Duff and P.G. Harrison. Skeletons, list homomorphisms and parallel program transformation. Technical report, Department of Computing, Imperial College, 1994.
[Gib96]   J. Gibbons. The third homomorphism theorem. Journal of Functional Programming, 1996. To appear.
[Gor96a]  S. Gorlatch. Systematic efficient parallelization of scan and other list homomorphisms. In Annual European Conference on Parallel Processing, LNCS 1124, pages 401-408, LIP, ENS Lyon, France, August 1996. Springer-Verlag.
[Gor96b]  S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. Microprocessing and Microprogramming, 41:571-578, 1996. (Also appears in PLILP'96.)
[Hei94]   B. Heinz. Lemma discovery by anti-unification of regular sorts. Technical Report 94-21, FB Informatik, Technische Universitat Berlin, May 1994.
[HIT97]   Z. Hu, H. Iwasaki, and M. Takeichi. Formal derivation of efficient parallel programs by construction of list homomorphisms. ACM Transactions on Programming Languages and Systems, 19(3):444-461, 1997.
[HITT97]  Z. Hu, H. Iwasaki, M. Takeichi, and A. Takano. Tupling calculation eliminates multiple data traversals. In ACM SIGPLAN International Conference on Functional Programming, Amsterdam, The Netherlands, June 1997. ACM Press.
[HT97a]   Z. Hu and M. Takeichi. A calculational framework for parallelization of sequential programs. Technical Report METR 97-07, Faculty of Engineering, University of Tokyo, May 1997.
[HT97b]   Z. Hu and M. Takeichi. Synthesizing calculational laws for parallelization. In preparation, May 1997.
[MFP91]   E. Meijer, M. Fokkinga, and R. Paterson. Functional programming with bananas, lenses, envelopes and barbed wire. In Proc. Conference on Functional Programming Languages and Computer Architecture, LNCS 523, pages 124-144, Cambridge, Massachusetts, August 1991.
[OHIT97]  Y. Onoue, Z. Hu, H. Iwasaki, and M. Takeichi. A calculational fusion system HYLO. In IFIP TC 2 Working Conference on Algorithmic Languages and Calculi, Le Bischenberg, France, February 1997. Chapman & Hall.
[SF93]    T. Sheard and L. Fegaras. A fold for all seasons. In Proc. Conference on Functional Programming Languages and Computer Architecture, pages 233-242, Copenhagen, June 1993.
[Ski92]   D.B. Skillicorn. The Bird-Meertens Formalism as a parallel model. In NATO ARW "Software for Parallel Computation", June 1992.
