Array Structures and Data-Parallel Algorithms

Research Report ISIS-RR-95-1E

Array Structures and Data-Parallel Algorithms*

Gaëtan Hains†

John Mullins‡

February 1995

Institute for Social Information Science (ISIS) at Makuhari
FUJITSU LABORATORIES LTD., 1-9-3 Nakase, Mihama-ku, Chiba 261, Japan
Telephone: (Makuhari) +81-43-299-3211   Fax: +81-43-299-3075

Email: [email protected]

Abstract

We apply Brookes and Geva's theory of generalised concrete data structures and computational co-monads to the semantics of higher-order data-parallel functional languages. This yields a mathematical framework for describing the interaction between higher-order functions, explicitly distributed data and asynchronous algorithms. Concrete data structures (or CDSs) allow the construction of several Cartesian closed categories, standard models for typed functional languages. Brookes and Geva have studied generalised CDSs and so-called parallel algorithms as meanings for lambda-calculus terms; an input-output function may correspond to many algorithms. Their construction is adapted to data-parallel functional languages through concrete array structures with explicit data layout. We construct a sub-category of array gCDSs preserved by exponentiation, through isomorphisms relating higher-order objects to their local parts. This formalism brings notions of data locality, synchronisation and denotational semantics into a unified framework.

Key words: Specification and verification, semantics, implementation issues, data-parallel programming, functional programming.

* Work supported by Fujitsu Labs, ENS-Lyon (France), CRIN-CNRS (France), FCAR (Québec) and NSERC (Canada).
† Visiting the Parallel Programming Group, Makuhari.
‡ Department of Computer Science, University of Ottawa, Ontario, Canada.

1 Introduction

This paper is about the construction of explicit semantic models for massively parallel implementations of functional languages. Any successful abstract model of computation must be able to account for both machines and programs, through both an execution model and a programming model. But with the rapid development of massively parallel architectures and their execution models, less attention has been paid to the programming side. A common situation has been the use of programming models which are restricted versions of parallel execution models. This is in sharp contrast with the development of ever more abstract programming models for sequential programming.

Bougé's work [3, 4] has highlighted the above issues and analysed their consequences for the semantics and verification of imperative data-parallel programs. The data-parallel paradigm is found extremely useful because it superposes two complementary interpretations of any given program. The macroscopic vision, a programming model, sees it as a sequential composition of parallel operations. The microscopic vision, an execution model, views the program as a (flat) parallel composition of sequential traces. This duality, when formalised, yields a rich context for studying the articulation of program and parallel execution. However, data-parallel programming is defined through an imperative view (a flow of control perpendicular to data and process space) and, although not incompatible with declarative languages, it has not led to such a clear understanding of the relationship between functional programming and parallel implementations. Paraphrasing [4]: a common approach for the use of such languages on parallel architectures has been the addition of new functionality to account for parallelism; and until now this approach has not allowed the most effective use of massively parallel architectures.
This situation is partly due to the difficulty of designing efficient implementations for very high-level languages. But in our opinion the difficulty of "adding" parallel operations may be part of the problem, and in general it requires a deep analysis of the language's semantics. Such an analysis is advocated by Hudak and Anderson [11] to, among other things, let the programmer control execution features like evaluation order and data/task mapping to a particular machine topology. In this paper we pursue a similar but more general goal: to incorporate evaluation order and physical location into the semantic (programming) model. To do so in a relatively clean way requires the construction of Cartesian closed categories (CCCs) [12], standard models for typed lambda calculi. Concrete data structures (CDSs) were used by Curien [8] to construct several sequential semantics, while analysing the question of what is a truly sequential functional program. Later, Brookes and Geva [6] defined generalised CDSs (or gCDSs) and a notion of parallel algorithms. In [10] we designed a CCC of gCDSs whose objects are domains of array structures with explicit physical addresses or indices. This introduced the notion of task mapping in functional semantics. Here we combine this result with a general construction of Brookes and Geva on gCDSs [5] to describe explicitly distributed algorithms (as opposed to functions) and hence cover notions like evaluation order in our CCC of array structures. The algorithms of Brookes and Geva are intended to classify multiple computations for the same input-output function. This notion is not unrelated to Violard and Perrin's recent explanation of reduction by nondeterminism in program refinement.


The straightforward but key technical novelty we introduce is the array gCDS, preserved by exponentiation through an isomorphism relating a function to its local values. For M, N generalised concrete structures (domains or types): M² is the domain of arrays with elements from M, and (M² → N²) ≅ (M² → N)². Continuous array transformations thus correspond to arrays of continuous functionals. The former is the macroscopic (global) view and the latter the microscopic (local) view of the same λ-term (program). In summary, concrete data structures allow the introduction of time and space through plain (but not arbitrary) set-theoretical tricks, yet provide us with a complete categorical structure. This allows the kind of two-level discussion characteristic of data-parallel programming and is our motivation for avoiding purely operational or purely categorical methods.

The rest of the paper is technical but requires no deep understanding. The reader is expected to understand the basic concepts of CCCs, but all other developments are explained and given intuitive meanings. We first outline the category of gCDSs and continuous functions, then its subcategory of array structures where the above isomorphism holds. At that point we can describe the (parallel) semantics of a continuous transformation on arrays of integers. The next sections explain algorithms with a linearly ordered time, then distributed time and distributed (array-)algorithms to replace continuous functions as arrows of the CCC. At that point the full flexibility of our model is visible: an array transformation can be given an infinite variety of distributed algorithms whose evaluation and localisation features are recursively defined. Next is an application of a subset of the model to solve an open problem about the Crystal language [7]. The conclusion outlines future developments and related work.

2 Concrete Data Structures

This section summarises basic notions about concrete data structures [6]. A concrete data structure or CDS [8] is a tuple (C, V, E, ⊢) where C is a countable set of cells, V is a countable set of values, and E ⊆ C × V is a set of events. An event (c, v) is often written cv. The enabling relation ⊢ relates finite sets of events to cells. It induces a precedence relation on cells: c ≤ c′ iff ∃y, v. y ∪ {cv} ⊢ c′. The enabling relation must be well-founded. A cell c is called initial if it is enabled by the empty set of events (written ⊢ c). A cell c is filled in a set of events y if ∃v. cv ∈ y. Write F(y) for the set of cells filled in y. If y ⊢ c, then y is an enabling of c, and c is said to have an enabling in any superset y′ of y, written y ⊢_y′ c. Let E(y) be the set of cells enabled in y, and call A(y) = E(y) − F(y) the set of cells accessible from y. Let M, M′, N denote CDSs from now on.

A state of M is a set of events x ⊆ E_M which is functional and safe, namely where cv₁, cv₂ ∈ x ⇒ v₁ = v₂ and c ∈ F(x) ⇒ ∃y ⊆ x. y ⊢_x c. Define D(M) to be the poset of states of CDS M, ordered by set inclusion. For example let Bool = ({B}, {T, F}, {BT, BF}, ⊢) where B is initial. Then D(Bool) is {∅, {BT}, {BF}} and represents the flat domain of booleans. Nat = ({N}, ℕ, {Nn | n ∈ ℕ}, ⊢), where N is initial, has states {∅, {N0}, {N1}, ...} and represents the flat domain of naturals. The following structure

Bool + Nat = ({S, B, N}, {L, R} ∪ V_Bool ∪ V_Nat, {SL, SR} ∪ E_Bool ∪ E_Nat, ⊢)

where the enablings are ⊢ S, SL ⊢ B, and SR ⊢ N, has a state domain isomorphic to the sum of Bool and Nat.

Our last example will be used later. It encodes the linear order ω̂ into a CDS: Vnat = (ℕ, {•}, {n• | n ∈ ℕ}, ⊢) where ⊢ 0 and {k•} ⊢ (k + 1). Its domain of states is isomorphic to the vertical ordering of the natural numbers with a point at infinity. Vnat plays the role of trace index in the Brookes-Geva algorithms.

The posets obtained as sets of states of CDSs are called concrete domains (or CDs) and have the following properties.

Proposition 1 (Brookes and Geva [6]) CDs are (consistently complete) Scott domains where ⊥ is the empty set, least upper bounds are given by unions and the compacts are the finite states.

Recall from the theory of Scott domains [9] that when D, E are consistently complete domains then so is [D → E]. However, Berry and Curien have shown that the continuous function space does not preserve CDSs. In fact our construction of array structures (Sect. 3) is inherently parallel; it is non-deterministic in the terminology of [8] and so unsuitable for the theory of CDSs as developed by Curien. But continuous functions do preserve gCDSs, and this is one of our motivations for using them.

A generalised concrete data structure or gCDS [6] is a CDS equipped with a partial order ≤ on its cells and such that its set of events and its enabling relation are upwards-closed with respect to cell ordering: cv ∈ E, c ≤ c′ ⇒ c′v ∈ E and y ⊢ c, c ≤ c′ ⇒ y ⊢ c′. States must be upwards-closed with respect to cell ordering: cv ∈ x, c ≤ c′ ⇒ c′v ∈ x. Any CDS is a gCDS with the discrete order on its cells. The domains built from gCDS states are called generalised concrete domains (gCDs) and satisfy Proposition 1.

The category gCDScont has gCDSs as objects and continuous functions as arrows. More precisely, an arrow M → N is an element of [D(M) → D(N)]. Its product is defined as follows.
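Before turning to products, the state conditions just defined (functional and safe) can be sketched concretely. The toy model below is ours, not the paper's: cells and values are strings, and the enabling relation maps each cell to a list of alternative finite event sets that enable it.

```python
# Toy model of CDS states: an event is a (cell, value) pair, and
# enablings[c] lists the alternative event sets that enable cell c.

def is_state(events, enablings):
    """Check the two state conditions from the text: functional and safe."""
    # functional: at most one value per cell
    cells = [c for (c, v) in events]
    if len(cells) != len(set(cells)):
        return False
    # safe: every filled cell has an enabling contained in the state
    for (c, v) in events:
        if not any(set(y) <= set(events) for y in enablings.get(c, [])):
            return False
    return True

# Bool + Nat from the text: initial cell S; SL enables B, SR enables N.
enablings = {
    "S": [[]],                      # ⊢ S  (initial cell)
    "B": [[("S", "L")]],            # SL ⊢ B
    "N": [[("S", "R")]],            # SR ⊢ N
}

assert is_state({("S", "L"), ("B", "T")}, enablings)      # left injection of true
assert not is_state({("B", "T")}, enablings)              # B filled but not enabled
assert not is_state({("S", "L"), ("S", "R")}, enablings)  # not functional
```

The three assertions mirror the sum structure's intended behaviour: a cell may only be filled once its enabling has happened, and never with two values.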
Let c.i denote a pair (c, i) where i is an integer tag, and extend this notation to sets of cells, events etc. The product M₁ × M₂ = (C, ≤, V, E, ⊢) of two gCDSs Mᵢ = (Cᵢ, ≤ᵢ, Vᵢ, Eᵢ, ⊢ᵢ) is defined by superposition: C = C₁.1 ∪ C₂.2, with c.i ≤ c′.i′ if and only if c ≤ᵢ c′ and i = i′; V = V₁ ∪ V₂; E = E₁.1 ∪ E₂.2; and y.i ⊢ c.i if and only if y ⊢ᵢ c.

Proposition 2 (Brookes and Geva [6]) The product preserves gCDSs and is a categorical product in gCDScont with pairing ⟨x₁, x₂⟩ = x₁.1 ∪ x₂.2 and projection πᵢ(x) = {cv | c.i v ∈ x}. D(M₁) × D(M₂) is isomorphic to D(M₁ × M₂).

Let two gCDSs M, M′ be given and let us call them the source structure and the target structure. The exponential M → M′ is (C, ≤, V, E, ⊢) where C = D_fin(M) × C_M′, and D_fin(M) is the set of finite states of M ordered by inclusion. (x, c′) will be abbreviated to xc′. The finite states are ordered by inclusion and the cells of the target retain their order: ≤ = ⊆ × ≤_M′. Also V = V_M′. And most importantly:

- E = {xc′v′ ∈ C × V | c′v′ ∈ E_M′}. So an event is a pair (xc′, v′), but viewing it instead as the pair (x, c′v′) highlights its intended meaning: to associate an event c′v′ of the target to a finite state x in the source. It is a finitary piece of a map from states to states.

- {xⱼc′ⱼv′ⱼ | 1 ≤ j ≤ l} ⊢ xc′ if and only if {c′ⱼv′ⱼ | 1 ≤ j ≤ l} ⊢_M′ c′ and xⱼ ⊆ x for all j. A cell xc′ is enabled by a set of events when the source-state parts of those events are subsets of x and the target-event parts enable c′.

The exponential preserves gCDSs: D(M → N) ≅ [D(M) → D(N)].

Proposition 3 (Brookes and Geva [6]) An exponential domain is isomorphic to its space of continuous functions ordered pointwise, so that continuous functions are coded as states. The isomorphisms are:

a ∈ D(M → N) ↦ λz ∈ D(M). {c′v′ | ∃x ⊆ z. xc′v′ ∈ a}    (1)
f ∈ [D(M) → D(N)] ↦ {xc′v′ ∈ E | c′v′ ∈ f(x)}    (2)

We may thus freely interchange continuous functions and states of the exponential structure. Moreover, application and curryfication satisfy both halves of the definition of an exponentiation [12], therefore gCDScont is a CCC.
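The function-as-state reading of equation (1) can be sketched in code. The encoding below is ours: an exponential state is a set of triples (x, c′, v′) with x a frozen set of source events, and application collects the target events whose source part is contained in the input state.

```python
# Sketch of equation (1): apply a function coded as a set of
# exponential events (x, c', v') to an input state z.

def apply_state(a, z):
    # emit c'v' whenever some recorded finite state x is contained in z
    return {(c2, v2) for (x, c2, v2) in a if x <= z}

# Boolean negation on the Bool structure, coded as a state:
neg = {
    (frozenset({("B", "T")}), "B", "F"),
    (frozenset({("B", "F")}), "B", "T"),
}

assert apply_state(neg, {("B", "T")}) == {("B", "F")}
assert apply_state(neg, set()) == set()   # no input knowledge, no output
```

The second assertion illustrates why such states denote continuous (monotone) functions: enlarging z can only enlarge the output.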

3 Array Structures

Define now a sub-category of gCDScont whose objects are the state domains of array structures, where cells, values and events are labelled by network addresses called indices. Communication along the network's edges is represented by the enabling relation, in analogy to the way cell S communicates to cells B, N in Bool + Nat. Given a gCDS, an array gCDS is constructed by replicating the cells over the nodes or indices of a graph. This graph indirectly defines the enabling relation as explained below. Array indices thus represent addresses in a static ("physical") multiprocessor network.

In the remainder we assume the existence of a fixed countable directed graph (I, L) whose nodes ı⃗ ∈ I will be called indices and whose edges (ı⃗, ȷ⃗) ∈ L will be called channels or links. Let M = (C₀, ≤₀, V₀, E₀, ⊢₀) be a given gCDS. The array data structure or array structure over M, written M² = (C, ≤, V, E, ⊢), is defined as a gCDS:

- C = I × C₀ is countable because both I and C₀ are. A cell (ı⃗, c) or ı⃗c in the array structure is said to be located at ı⃗. Cells are ordered locally: ı⃗c ≤ ı⃗′c′ if and only if ı⃗ = ı⃗′ and c ≤₀ c′.

- V = V₀ is countable by hypothesis.

- E = I × E₀. Possible events are the localisations of possible scalar events. The type of E is correct since E₀ ⊆ C₀ × V₀ and so E ⊆ I × C₀ × V₀ = C × V.

- The enabling relation ⊢ is between P_fin(I × E₀) and I × C₀. There are two types of enablings, local or through a link: {ı⃗′c₁v₁, ..., ı⃗′cₖvₖ} ⊢ ı⃗c when ((ı⃗′ = ı⃗) or (ı⃗′, ı⃗) ∈ L) and {c₁v₁, ..., cₖvₖ} ⊢₀ c. The first type of enabling gives M² a copy of the enabling relation of M at every location ı⃗. The second defines the effect of enablings across links (see for example the arrays in the left column of Fig. 1).

Because M is a gCDS, it follows that E and ⊢ are upwards-closed with respect to ≤. The following is also necessary to make M² a gCDS.

Lemma 1 ≤ is well-founded. Proof: A descending chain in ≤ determines a descending chain in ≤₀. □

As a result the set D(M²) of states of an array structure is a generalised concrete domain called an array domain. Its states will be called arrays over M.

Two important remarks about M²: because a cell can be enabled either locally or remotely, enablings are not unique even when they were so in M; and because the cells of t ∈ D(M²) may have been enabled remotely, the set tı⃗ = t ∩ ({ı⃗} × C₀ × V₀) is not in general a state of D(M). For example the following is a state of Vnat² when (ȷ⃗, ı⃗) ∈ L: t = {ȷ⃗0, ȷ⃗1, ȷ⃗2, ı⃗3}. It represents the situation where node ȷ⃗ has begun counting and ı⃗ picks up from 3 onwards. Here tı⃗ is not a state.

Define now the category ADScont with array structures as objects, continuous transformations between their array domains D(M²) as morphisms, function composition as composition and identity transformations as identity morphisms. It is a subcategory of gCDScont. Let Null be the CDS (∅, ∅, ∅, ∅) whose only state is ∅. Then Null² = (∅, ∅, ∅, ∅) and Null² is a terminal object in ADScont.

The product of array structures is a special case of the product of CDSs. A pair of arrays corresponds by geometrical superposition to an array of pairs. Let Mₖ² = (I × Cₖ, Vₖ, I × Eₖ, ⊢ₖ²) for k = 1, 2 be two array structures over Mₖ = (Cₖ, Vₖ, Eₖ, ⊢ₖ). The array product structure M₁² × M₂² is such that the two enabling relations are superimposed without interaction: {ı⃗ₗeₗ.kₗ}ₗ ⊢ ı⃗c.k if and only if ∀l. kₗ = k and {ı⃗ₗeₗ}ₗ ⊢ₖ ı⃗c. As a result the array construction preserves finite products.
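The remote-enabling rule and the Vnat² example above can be sketched for the special case of Vnat, where every enabling is a singleton. The encoding below (indices as strings, events as (index, n) pairs, links as ordered pairs) is ours, not the paper's.

```python
# Sketch of array states over Vnat: event (i, n) with n > 0 needs a
# predecessor event (i2, n-1) either locally (i2 == i) or across a link.

def is_array_state(t, links):
    return all(n == 0 or
               any((i2 == i or (i2, i) in links) and (i2, n - 1) in t
                   for i2 in {i} | {a for (a, b) in links if b == i})
               for (i, n) in t)

links = {("j", "i")}                             # one channel j -> i
t = {("j", 0), ("j", 1), ("j", 2), ("i", 3)}     # the state from the text
assert is_array_state(t, links)                  # i picks up from 3 via the link
assert not is_array_state({("i", 3)}, links)     # locally, 3 would need 2 first
```

The first assertion shows why tı⃗ alone ({3}) is not a Vnat state even though t is a Vnat² state: event 3 at i was enabled remotely by event 2 at j.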

Lemma 2 (M₁ × M₂)² = M₁² × M₂² in ADScont. The isomorphisms are split_{M₁,M₂} : x ↦ (x₁, x₂) where xᵢ = {e | e.i ∈ x}, and merge : (x₁, x₂) ↦ x₁.1 ∪ x₂.2.

Since the product is a categorical product in gCDScont and since it preserves array domains, it is also a categorical product in ADScont. The exponential operator preserves array structures.

Proposition 4 (M² → N²) = (M² → N)² in ADScont. Proof: (Outline) Both structures have the same cells, cell ordering, events and enablings up to the following isomorphisms:

loc : a ∈ [D(M² → N²)] ↦ {ı⃗xe | xı⃗e ∈ a}
glob : t ∈ [D((M² → N)²)] ↦ {xı⃗e | ı⃗xe ∈ t}

which interchange the role of indices and other parts of the structure. □

So loc (localisation) takes an array transform and decomposes it into an array of scalar functionals, while glob (globalisation) takes the array of functionals and returns the transformation that applies every element of it to an argument array. Since localisation is independent of the array structure of the source domain M²:

Corollary 1 (M → N²) = (M → N)² for any gCDSs M, N.

ADScont is closed for (the terminal object and) the product and exponentiation of its enclosing category gCDScont. Moreover, in gCDScont application and curryfication satisfy the axioms for being a CCC. Therefore

Proposition 5 ADScont is a CCC.

Let I = {a⃗, b⃗, c⃗} and L = {a⃗b⃗, b⃗c⃗, c⃗a⃗}. Write the array {a⃗x, b⃗y, c⃗z} as [x | y | z].

max² : Vnat² → Vnat²
[0,1,3 | 1 | 2] ↦ [0,1,2,3 | 0,1,2,3 | 0,1,2,3]
[4,5 | 0,1 | 2,3] ↦ [0,...,5 | 0,...,5 | 0,...,5]

Figure 1: The effect of max².

To illustrate the structure of ADScont, consider the following example. Interpret arrays t ∈ D(Vnat²) as maps from I to unary integers, with the provision that for example {0, 1, 2, 7} is interpreted as 7 (remember that in general tı⃗ is not a state of Vnat). According to this representation, union of sets of events corresponds to their maximum. Consider the max-reduction function:

max² : Vnat² → Vnat², ∀ı⃗. (max² t)ı⃗ = ⋃{tȷ⃗ | ȷ⃗ ∈ I}.

It computes the overall maximum of integers in the array and distributes the result everywhere, as illustrated by Fig. 1. Now max² is a state of (Vnat² → Vnat)² ≅ (Vnat² → Vnat²) and its events have the form xı⃗n ≅ ı⃗xn where x ∈ D_fin(Vnat²). By applying the definitions, the enablings are

{ı⃗′x′(n − 1)} ⊢ ı⃗xn where x′ ⊆ x and (ı⃗′ = ı⃗ or (ı⃗′, ı⃗) ∈ L).

In other words, if knowing x′ about the input makes the output at ı⃗′ no less than (n − 1), then knowing x, the output at ı⃗ may be n or more. The enablings for which ı⃗′ ≠ ı⃗ enable the transmission of values along links, of which algorithms for max² can take advantage: the event ı⃗n can occur in max² t before or even without ı⃗k events for k < n. This is the intention behind the definition of M². We return to this example in Sect. 6.
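The extensional effect of max² in Fig. 1 can be sketched with the unary-integer reading used in the text. The representation below (arrays as dicts mapping each index to the largest event present there) is our simplification:

```python
# Extensional sketch of max²: take the overall maximum and
# distribute it to every index, as in Fig. 1.

def max_reduce(t):
    m = max(t.values())
    return {i: m for i in t}

# [0,1,3 | 1 | 2] reads as a=3, b=1, c=2; the result is 3 everywhere.
assert max_reduce({"a": 3, "b": 1, "c": 2}) == {"a": 3, "b": 3, "c": 3}
assert max_reduce({"a": 5, "b": 1, "c": 3}) == {"a": 5, "b": 5, "c": 5}
```

Note that this only captures the input-output function; the intensional content (which link carried which value, and when) is exactly what the algorithms of the next sections add.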

4 Algorithms

This section summarises Brookes and Geva's construction of the co-Kleisli category of gCDSs and algorithms. It constitutes the framework for an intensional semantics where algorithms, i.e. intensional meanings, are related to each other and to continuous functions, i.e. extensional meanings.

Given any gCDS M, the structure of paths over M is PM = (Vnat → M). Paths are treated as maps or states depending on context. P is made into an endofunctor on gCDScont by (f : M → N) ↦ (Pf = map f : PM → PN) where (map f) c = f ∘ c. Functor P together with the following continuous maps

val_M = λc ∈ D(PM). ⋃{c n | n ∈ D(Vnat)} : PM → M
pre_M = λc ∈ D(PM). λm. λn. c(min(m, n)) : PM → P²M

satisfies the axioms of a co-monad over gCDScont, i.e. val : P → Id, pre : P → P² and path : Id → P are natural transformations and the following identities hold:

(map pre_M) ∘ pre_M = pre_PM ∘ pre_M    (3)
val_PM ∘ pre_M = id_PM    (4)
(map val_M) ∘ pre_M = id_PM    (5)

The continuous transformation path_M = λx ∈ D(M). λi. x : M → PM is added and satisfies:

id_M = val_M ∘ path_M    (6)
pre_M ∘ path_M = path_PM ∘ path_M    (7)

The above axioms make (P, val, pre, path) a computational co-monad [5]. The co-Kleisli category built from gCDScont and (P, val, pre, path) is called gCDSalg and is defined as follows. Its objects are gCDSs and its arrows are algorithms: continuous maps from paths to states. The exponential of M, N is (M ⇒ N) = (PM → N) and algorithm composition is a′ • a = a′ ∘ (map a) ∘ pre_M. The identity algorithm is id_M = val_M; the terminal object, currying, the pairing and projections are the same as in gCDScont, with the provision for example that a pair of paths corresponds to a path on pairs in the obvious fashion. Algorithms form a domain under pointwise ordering, i.e. eagerness (see [5] for the general categorical definitions and examples of algorithms in gCDSalg). The application algorithm is given by

app̂_{M,M′} = app_{M,M′} ∘ (val_{M⇒M′} × id_PM).

The input-output function of an algorithm a is fun a = a ∘ path_M and the canonical algorithm of function f is alg f = f ∘ val_M. In summary, P constructs computations, val is a standard algorithm, pre constructs the prefixes of a given computation, and path makes a constant computation out of a state. Defined in this way, gCDSalg is a CCC and refines gCDScont in the sense that fun⁻¹ defines an input-output equivalence where each class of algorithms forms a complete lattice. Moreover, M → M′ is isomorphic to the quotient of M ⇒ M′ by input-output equivalence.
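The maps val and pre can be sketched on finite paths. The encoding below is our toy model (a path is a monotone list of states, i.e. sets of events), not the categorical construction itself:

```python
# Toy model of paths: val takes the union of all stages, pre builds the
# path of prefixes, mirroring val_M and pre_M (min of two times =
# truncation of the path).

def val(path):
    out = set()
    for s in path:
        out |= s
    return out

def pre(path):
    # pre(c) m n = c(min(m, n)): the m-th entry is c cut off at time m
    return [[path[min(m, n)] for n in range(len(path))]
            for m in range(len(path))]

c = [set(), {"x"}, {"x", "y"}]
assert val(c) == {"x", "y"}
assert pre(c)[0] == [set(), set(), set()]   # nothing before time 0
assert [val(p) for p in pre(c)] == c        # (map val) ∘ pre = id, axiom (5)
```

The last assertion checks axiom (5) on this small example: taking the value of each prefix recovers the original path.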

5 Distributed Paths

The co-Kleisli construction is directly applicable to array structures. But it finds a useful application to array algorithms in a slightly generalised form: with distributed time instead of the implicit global time of P-paths. We first verify the construction and then show that algorithms in the resulting category are more expressive than those of gCDSalg.

Let first I⊥ be the CDS ({C}, I, {C} × I, {⊢ C}) such that D(I⊥) is isomorphic to the flat domain of indices. Consider now I⊥ → Vnat, whose domain is isomorphic to ω̂^I since D(Vnat) ≅ ω̂. So D(I⊥ → Vnat) is the I-indexed product of D(Vnat). We will use it as the domain of distributed clocks, and write its states m̂ as maps from I to ω̂. Define T : ADScont → ADScont by TM² = (I⊥ → Vnat) → M² = ω̂^I → M². States c of TM² are called computations. Since TM² ≅ (ω̂^I → M)² (by Corollary 1), T preserves array domains. Its extension to a functor is pointwise like P: given f : M² → N² define Tf = map f, i.e. (Tf) c = f ∘ c. T preserves identities and composition. Define now the elements of a

c         0           1             2
0     [∅ | ∅]     [0 | ∅]       [0,1 | ∅]     ...
1     [0 | ∅]     [0,1 | ∅]     [0,1 | ∅]     ...
2     [0 | ∅]     [0,1 | 2]     [0,1 | 2]     ...
3     [0,1 | 2]   [0,1,3 | 2]   [0,1,3 | 2]   ...
...

c′        0           1             2
0     [∅ | ∅]     [∅ | ∅]       [∅ | ∅]       ...
1     [∅ | ∅]     [0,1 | ∅]     [0,1 | ∅]     ...
2     [∅ | ∅]     [0,1 | ∅]     [0,1,3 | 2]   ...
3     [∅ | ∅]     [0,1 | ∅]     [0,1,3 | 2]   ...
...

Figure 2: Computations c, c′ ∈ T Vnat² (|I|-dimensional); the event n• is written n.

computational co-monad in analogy with (P, val, pre, path):

tval_M² = λc ∈ D(TM²). ⋃{c n̂ | n̂ ∈ ω̂^I} : TM² → M²
tpre_M² = λc ∈ D(TM²). λm̂. λn̂. c(m̂ ∩ n̂) : TM² → T²M²
tpath_M² = λt ∈ D(M²). λm̂. t : M² → TM²

where (m̂ ∩ n̂) is the pointwise minimum of two "distributed clocks" in ω̂^I. The value of a computation is defined by tval as the union of its intermediate values. The m̂-th initial subcomputations of c are the (tpre c) m̂, namely those parts of c happening no later than m̂. And tpath associates a constant computation to any array.

Proposition 6 (T, tval, tpre, tpath) is a computational co-monad in ADScont. Proof: (Outline) Axioms (3) to (7) and the properties of natural transformations are verified with M² replacing M, T replacing P, and tval, tpre, tpath replacing val, pre, path. The non-trivial parts involve basic properties of ω. □

We may now define the associated co-Kleisli category [5] of array domains and algorithms, ADSalg. Its objects are the ADSs and its arrows are continuous maps from computations to arrays. The exponential and composition are defined as in gCDSalg: M ⇒² N = (TM → N) and a′ • a = a′ ∘ (map a) ∘ tpre_M².
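Distributed clocks and the truncation performed by tpre can be sketched on a finite fragment of the computation c of Fig. 2. The encoding below is ours: a computation is a dict keyed by clock pairs (one counter per index), tval takes the union over all stages, and tpre truncates at the pointwise minimum.

```python
# Toy model of distributed time over I = {a, b}: keys are clocks
# (m_a, m_b), values are array states (sets of (index, event) pairs).

c = {(0, 0): set(),
     (0, 1): {("a", 0)},
     (1, 0): {("a", 0)},
     (1, 1): {("a", 0), ("a", 1)},
     (2, 1): {("a", 0), ("a", 1), ("b", 2)}}   # upper-left corner of Fig. 2's c

def tval(comp):
    out = set()
    for s in comp.values():
        out |= s                               # union of intermediate values
    return out

def tpre(comp, m):
    # tpre(c)(m)(n) = c(m ∩ n): c truncated at distributed time m
    return {n: comp[tuple(min(a, b) for a, b in zip(m, n))] for n in comp}

assert tval(c) == {("a", 0), ("a", 1), ("b", 2)}
assert tpre(c, (1, 1))[(2, 1)] == {("a", 0), ("a", 1)}   # b's event 2 cut off
```

The second assertion shows the initial-subcomputation reading: nothing that happened after distributed time (1, 1) survives the truncation.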

6 Array Algorithms

We now explain how array algorithms are strictly more expressive than path algorithms and give example algorithms for max². Assume the index space I = {a⃗, b⃗}, L = {a⃗b⃗, b⃗a⃗} and the initial part of a computation c in T Vnat² (Fig. 2). Implicit communications occur when cells are filled without local enabling, like cell b⃗2 between c(1, 2) and c(2, 2). Implicit local work occurs when cells are filled without remote enabling, for example cell a⃗1 between c(1, 0) and c(1, 1). So because they are array-valued, computations in TM² are able to discriminate between values computed locally or communicated. As a consequence, algorithms in M² ⇒² N² can select computations according to those communications, favour certain links etc. This is the first kind of information specified in ADSalg and not in gCDSalg. But M² ⇒² N² also involves

distributed time. A computation in TM² specifies a family of possible PM²-paths, determined by the relative speeds of the local times. For example c of Fig. 2 traces the following path when the times evolve as (0,0) → (0,1) → (1,2) → (2,2) → (3,2) → ...:

[∅ | ∅] → [0 | ∅] → [0,1 | ∅] → [0,1 | 2] → [0,1,3 | 2] → ...

Some events of c occur at b⃗ when it is a⃗'s time which increases, like c(1,1) = [0,1 | ∅] and c(2,1) = [0,1 | 2]. Certain algorithms may reject this behaviour and thus map c to ⊥. Yet others could interpret it as a write action of a⃗ into b⃗'s store. A computation may be "asynchronous" like c or synchronous like the c′ of Fig. 2, in the sense that c′(m_a, m_b) = c′(min(m_a, m_b), min(m_a, m_b)). So although computations are very loose structures with respect to concurrency, algorithms can filter them and map their asynchronous behaviour to unique values. This justifies our claim that ADSalg is a truly parallel model of functional (hence deterministic) programs. It should be clear from the above remarks that array algorithms allow the specification of synchronisation details invisible to algorithms on paths. That is the second way in which ADSalg is more expressive than gCDSalg.

Consider now four possible algorithms for the function max² introduced at the end of Sect. 3. They vary in eagerness and synchronisation. Let lsmax² (lazy-synchronous) be the least² algorithm such that

lsmax² c = ⊥ for asynchronous c
lsmax² ([∅ | ∅]ⁿ · c) = max² (tval c) for c synchronous

where a synchronous computation is represented by its diagonal path, in which · denotes concatenation. Here lsmax² ignores asynchronous computations and waits arbitrarily long before producing its value. Let esmax² (eager-synchronous) be the least algorithm of value ⊥ for asynchronous c and such that:

esmax² ([0 | ∅] · [0 | 1] · c) = max² (tval c) where c is synchronous.

esmax² immediately requires an action at a⃗ and a communication along a⃗b⃗. Similar definitions over all computations yield algorithms lamax² (lazy-asynchronous) and eamax² (eager-asynchronous). The partial order among the four algorithms is lsmax² < {lamax², esmax²} < eamax². The "synchronous" predicate could be replaced by other properties, for example to modulate the use of certain links.
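The "synchronous" filter used by lsmax² and esmax² can be sketched on the finite computation encoding introduced earlier. The code below is our illustration, with ⊥ modelled as None:

```python
# Sketch of the synchrony predicate and a lazy-synchronous algorithm:
# a computation is synchronous when its value at (ma, mb) only depends
# on min(ma, mb); the rest is mapped to ⊥ (None).

def synchronous(comp):
    return all(comp[(ma, mb)] == comp[(min(ma, mb), min(ma, mb))]
               for (ma, mb) in comp)

def lsmax(comp):
    if not synchronous(comp):
        return None                 # ⊥ on asynchronous computations
    out = set()
    for s in comp.values():
        out |= s                    # tval: union of all stages
    vals = [n for (_, n) in out]
    m = max(vals) if vals else 0
    return {i: m for i in {i for (i, _) in out}}

sync = {(0, 0): set(), (0, 1): set(), (1, 0): set(),
        (1, 1): {("a", 0), ("b", 1)}}
assert synchronous(sync)
assert lsmax(sync) == {"a": 1, "b": 1}

async_c = dict(sync)
async_c[(0, 1)] = {("a", 0)}        # an event before b's clock catches up
assert lsmax(async_c) is None
```

Varying the predicate (or dropping it, for the asynchronous variants) changes which computations an algorithm accepts while leaving its input-output function untouched, which is exactly the eagerness/synchronisation ordering among the four max² algorithms.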

7 Data-Parallel Semantics

Report [5] concludes with a pair of denotational semantics for the same simply-typed lambda-calculus with products and constants. The extensional semantics is interpreted in gCDScont by standard equations and the intensional semantics follows similar equations interpreted in gCDSalg. The main result is that if they agree on the constants (the precise definition of this agreement involves path and fun), the intensional and extensional semantics also agree on all terms. By Proposition 6 this construction applies directly to ADScont and ADSalg, but it is beyond the scope of this article to discuss its implications in detail.

² Taking advantage of the complete lattice structure of fun⁻¹ max².

But to further illustrate the interest of array structures for data-parallel semantics, we propose a short solution to a problem left open by the designers of the parallel functional language Crystal (described in [7]). We only use ADScont, not ADSalg. The problem is to give a meaning to an index domain d which is defined recursively using a function f whose domain is precisely d. This is difficult in Crystal because of the separate treatment of index domains and functions over them. But in array semantics everything refers to the same index domain. So for the purpose of this example let us assume that I = ℕ. Let 2 be a trivial CDS isomorphic to the boolean domain {F < T}. Then arrays of 2² represent subsets of ℕ and we write the singleton {nT} as n̄. The problem's data are the domain d : 2² of the form {0, 1, ..., n}, a function f : 2² → M² into some values M, and a predicate over f, d which we encode as p : (2² → M²) → 2² → 2², so that the only values of p are T^ω and F^ω (the constant true and constant false arrays). The question left open in [7] is approximately the following: define (f, d) such that f n̄ = ⊥ when n is outside the range of d, and d is the least initial segment ending in n such that the predicate holds at n. After a few transformations we arrive at the following description:

d = (Y(λx. λn. if p f n̄ = T^ω then n̄ else n̄ ∪ x(n + 1))) 0̄
f = λx. if x ⊆ d then e else ⊥

And so (f, d) is the fixpoint of Φ(f, d) = ((Y(λx. λn. ...)) 0̄, λx. if x ⊆ d ...).
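The mutually recursive pair (f, d) above can be approximated by a Kleene-style iteration. The sketch below is entirely ours: solve, stop and the numeric example are illustrative stand-ins, not Crystal constructs, and e plays the role of the given expression defining f on its domain.

```python
# Kleene-style iteration for a mutually recursive (f, d): d is the
# least initial segment whose end satisfies the stopping predicate
# (computed from the current f), and f is e restricted to d.

def solve(e, stop, limit=100):
    d, f = set(), {}
    for _ in range(limit):
        n, new_d = 0, set()
        while True:
            new_d.add(n)
            if stop(f.get(n)):     # predicate evaluated on the current f
                break
            n += 1
            if n > limit:
                break
        new_f = {x: e(x) for x in new_d}
        if (new_d, new_f) == (d, f):
            return d, f            # fixpoint reached
        d, f = new_d, new_f
    return d, f

# illustrative instance: e(x) = x*x, stop once the value reaches 9
d, f = solve(lambda x: x * x, lambda v: v is not None and v >= 9)
assert d == {0, 1, 2, 3}
assert f[3] == 9
```

The point of the exercise is the same as in the text: d and f are defined in terms of each other, yet a single fixpoint over the shared index domain resolves both.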

8 Conclusion

A model of data layout has been introduced in the theory of gCDSs and shown to provide intensional semantics for higher-order functional programs on arrays. The following observations should convince the reader that the ADScont-ADSalg framework brings new precision to data-parallel language description.

The choice of language primitives is an important question in data-parallel programming [2, 1]. Too many or too powerful primitives hinder portability, while too few and less expressive ones limit expressive power. Now a fixed set of primitives (i.e. constants) can be formally related to different implementations by varying the algorithms for those constants and/or the index space (I, L). "Porting" the language corresponds to preserving the input-output functions of primitive algorithms.

Hudak's work [11] highlights the need for functional programmers to control computation algorithms. In ADSalg there is an objective criterion for choosing language constants that would provide this kind of control: they must be array algorithms. An interesting question is the definition of synchronisation primitives with (functorial) semantics in ADSalg, i.e. which preserve array algorithms.

The loc isomorphism determines in a trivial way the localisation of a function or algorithm's output. But a distributed-memory implementation along the lines suggested by array semantics (i.e. where indices are processes) requires control of locality in communications. Although this difficult question was not raised here, it can now be described mathematically: in a function M² → M², an event xı⃗cv causes more or less communication along channel ȷ⃗ı⃗ depending on the size of xȷ⃗ (recall that for an array state x and an index ȷ⃗, xȷ⃗ is an abbreviation for {e | ȷ⃗e ∈ x}). Similarly, computations and algorithms can be classified according to communication criteria.

Work on those questions should improve our understanding of data-parallel functional languages and provide new tools for their systematic design.

References

[1] G. Blelloch and S. Chatterjee. VCODE: a data-parallel intermediate language. In J. JáJá, editor, 3rd IEEE Symp. Frontiers of Massively Parallel Comp., 1990.
[2] G. E. Blelloch. Vector Models for Data-Parallel Computing. MIT Press, 1990.
[3] L. Bougé. On the semantics of languages for massively parallel SIMD architectures. In E. H. L. Aarts and J. van Leeuwen, editors, PARLE-91, number 505 and 506 in Lecture Notes in Computer Science, Eindhoven, June 1991. Springer.
[4] L. Bougé. Le modèle de programmation à parallélisme de données: une perspective sémantique (version révisée). Research Report 94-06, LIP, École Normale Supérieure de Lyon, 1994.
[5] S. Brookes and S. Geva. A cartesian closed category of parallel algorithms between Scott domains. Technical Report CMU-CS-91-159, Carnegie Mellon Univ., 1991.
[6] S. Brookes and S. Geva. Continuous functions and parallel algorithms on concrete data structures. In MFPS'91, Lecture Notes in Computer Science. Springer, 1991.
[7] M. Chen, Y.-Il Choo, and J. Li. Crystal: Theory and pragmatics of generating efficient parallel code. In B. K. Szymanski, editor, Parallel Functional Languages and Compilers, chapter 7. ACM Press, 1991.
[8] P.-L. Curien. Categorical Combinators, Sequential Algorithms and Functional Programming. Birkhäuser, Boston, second edition, 1993.
[9] C. A. Gunter and D. S. Scott. Semantic domains. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science. North-Holland, MIT Press, 1990.
[10] G. Hains and J. Mullins. A categorical model of array domains. Rapport de Recherche RR94-43, LIP, École Normale Supérieure de Lyon, December 1994.
[11] P. Hudak and S. Anderson. Pomset interpretations of parallel functional programs. In G. Kahn, editor, Functional Programming Languages and Computer Architecture, number 274 in Lecture Notes in Computer Science. Springer, 1987.
[12] B. C. Pierce. Basic Category Theory for Computer Scientists. MIT Press, 1991.
