A geometrical data-parallel language (1)

Jean-Luc Dekeyser, Dominique Lazure, Philippe Marquet

Laboratoire d'Informatique Fondamentale de Lille, Université des Sciences et Technologies de Lille (2)
Publication ERA-127, April 1993, revised January 1994. ACM SIGPLAN Notices, 1994.

(1) This project is partially supported by Digital Equipment Corporation within the framework of an EERP/DPRI contract.
(2) Bâtiment M3, Cité Scientifique, 59655 Villeneuve d'Ascq cedex, France.

Abstract

The HELP project proposes a data-parallel programming model that lets a programmer express an algorithm as closely as possible to the way he thinks about it. In many parts of a data-parallel program, data manipulations can be modeled as geometrical migrations inside a Cartesian reference space. We define the C-HELP language within the family of explicit data-parallel languages: communications and computations are separated, and any vector description is banished. The model and its associated languages are based on the notion of hyper-space, and algorithm development follows an original semantics in which computations are limited to a set of hyper-space points. The hyper-space is not only a compilation-oriented concept: it is a multi-dimensional virtual array integrated into the programming model, and it provides a reference frame for every object access.

Introduction

Parallel programming techniques are divided into two fields: task parallelism and data parallelism. The two techniques coexist, and the community of parallelism specialists is divided into two clans. Nevertheless, a connection between the two philosophies can be observed: task parallelism is selected for the execution model (supported by the architecture), whereas data parallelism becomes the programming model (supported by the compilers).

Recent work has led to the emergence of two standard programming languages: HPF [For93] and HPC. With these tools it is possible to exploit the power of parallel architectures efficiently, but no general methodology is provided for the scientist who develops parallel algorithms. These new standards are based on standard languages (Fortran, C) to which some features are added, independently of the programming model. In the C-based languages, the specifically parallel features are often derived directly from the architecture, while the Fortran-derived languages provide no deterministic optimization criteria (a program can be optimized only after it has been executed). Hence, data-parallel algorithm development needs tools that simplify access to the power of the architecture.

The aim of the HELP project is to propose a general methodology integrating all the steps of parallel development. This methodology is particularly well suited to scientific algorithms. It relies on the following three principles:

Unification of the data-parallel development tools. The design, the translation and the debugging of a data-parallel program have to be grounded on a single concept: the geometrical model. Data-parallel programming deals with computations applied in parallel to homogeneous data structures. These structures are often multi-dimensional arrays, and the manipulations of these objects can be expressed as geometrical moves inside a Cartesian space. The resolution methodology elaborated during algorithm design has to be directly supported by the syntax and the semantics of the parallel programming language. The debugging tool also has to rest on this methodology.

Explicit communications and parallel computations. To be efficient, parallelism has to be explicitly expressed by the programmer. Like vector architectures, massively parallel machines require explicit communications and explicit calls to parallel computations. The compiler does not have to take into account the migrations of objects in the network. As far as efficiency is concerned, these migrations are too influential to be hidden from the programmer.

Separation of communications and computations. While communications and computations are explicit, the legibility of a parallel program requires these two fundamental operations to be separated at both the model and the language levels. The generated code also becomes more efficient as a result.

HELP provides an explicit and imperative data-parallel language in which the communication and computation phases are clearly distinguished. The HELP-based languages support this distinction by providing explicit communication operators. In contrast to High Performance Fortran [For93], any communication generated by a computation on non-aligned operands is banished from the HELP model.

In this article, we detail every step of the development of a scientific algorithm using the HELP model. We present the main characteristics of the HELP implementation in the C-HELP language. The inversion of a square matrix by the Gauss-Jordan algorithm is developed as an example; it shows how well the HELP model suits matrix computation algorithms.

1 Data-parallel thinking

Before writing a parallel program, one has to choose a strategy for parallelizing the problem. To support this choice, we propose the geometrical model.

1.1 The geometrical model

Scientific data-parallel algorithms mainly rely on manipulations of regular structures (vectors, matrices...) to which computations are applied either globally or on a regular subset (for example, a single matrix row). The description of the algorithm is therefore simplified if the programming model is restricted to the manipulation of geometrical objects and their geometrical subsets. For example, in many numerical algorithms a matrix has to interact with one or several of its rows or columns. In this case, and in order to avoid implicit communications, the data-parallel model has to provide a replication primitive that extends the vector and makes it conform to the matrix.

A programming model able to implement these algorithms consists of defining a work space, moving data-parallel objects inside this space with geometrical primitives, and then triggering computations locally at the points of this space. These computations may be triggered in parallel because each point owns its elemental operands. A geometrical approach allows the programmer to visualize the algorithm inside the space. Understanding the data-parallel model is made easier by associating this geometrical visualization with the geometrical expression of the algorithm.
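The replicate-then-compute pattern described above can be illustrated with a minimal sketch. It is written in plain Python rather than in a HELP language, and the helper names `replicate_row` and `pointwise` are hypothetical, chosen only for this illustration. The communication step copies a row over the whole 2-dimensional space; the computation step then uses only operands that are already present at each point.

```python
# Plain-Python illustration (not C-HELP); helper names are hypothetical.

def replicate_row(row, n_rows):
    """Communication step: copy a 1-D row onto every row of the 2-D space."""
    return [list(row) for _ in range(n_rows)]

def pointwise(op, a, b):
    """Computation step: apply op at each point; operands must cover the same points."""
    assert len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

matrix = [[1, 2, 3],
          [4, 5, 6]]

# Geometrical move: the first row is extended so that it conforms to the matrix.
spread = replicate_row(matrix[0], len(matrix))

# Local computation: each point subtracts the replicated value it now holds.
result = pointwise(lambda m, r: m - r, matrix, spread)
```

Keeping the two steps in separate functions mirrors the separation of communications and computations that the model asks for.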

1.2 Example: Gauss-Jordan, the algorithm

The Gauss-Jordan algorithm for square matrix inversion is made up of three phases: the initialization, the diagonalization and the inverse computation.

Initialization The square matrix A has to be inverted. A and an identity matrix of the same size are coupled together: the treatments applied to A are simultaneously applied to this second matrix. The "union" of the two matrices is called M.

[Figure: the matrix M, with A on the left and the identity matrix (ones on the diagonal, zeros elsewhere) on the right.]

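Diagonalization In order to zero the off-diagonal elements of column p, we apply to each row r, with $r \neq p$:

$$\mathrm{row}_r = \mathrm{row}_r - \mathrm{row}_p \times A_{rp} / d_p$$

Since communications are explicit, we have to diffuse the pivot row ($\mathrm{row}_p$) onto every other row and the pivot column ($\mathrm{col}_p$) onto every other column, so that $A_{rp}$ is visible along row r. This treatment can be carried out in parallel.

[Figure: the matrix M during the diagonalization, showing the diagonal elements d_1, d_2, ..., d_{p-1}, d_p with zeros around them, the pivot row row_p, the pivot column col_p and a current row row_r.]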

Inverse computation After the diagonalization phase, each row of the matrix has to be divided by the diagonal element of the same index:

$$\mathrm{row}_r = \mathrm{row}_r / d_r$$

In order to diffuse each diagonal element to the right part of the matrix M, we use the geometrical primitives of projection and replication. This communication is carried out in parallel for each row.

[Figure: the matrix M at the inverse computation phase. The left part is diagonal, with elements d_1, d_2, ..., d_r, ..., d_N and zeros elsewhere; the right part becomes A^{-1}.]

This basic example shows that a data-parallel model using a 2-dimensional space and geometrical communication primitives is well suited to general matrix computation: geometrical primitives perform the data migrations, after which the data are available at the points where the computations have to be triggered.
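The three phases can be sketched as follows. This is a sequential plain-Python simulation, not C-HELP code, and the function name `gauss_jordan_inverse` is an assumption made for the illustration. As in the description above, no row exchange is performed, so the sketch assumes that every pivot is non-zero. The replication of the pivot row and pivot column is done before the local computation, to mimic the separation of communication and computation.

```python
def gauss_jordan_inverse(a):
    """Invert a square matrix by Gauss-Jordan, without pivoting (assumption)."""
    n = len(a)
    # Initialization: M is the "union" of A and the same-size identity matrix.
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]

    # Diagonalization: one step per pivot p.
    for p in range(n):
        d_p = m[p][p]
        if d_p == 0.0:
            raise ZeroDivisionError("zero pivot; row exchanges are not covered here")
        # Communication: replicate the pivot row on every row,
        # and the pivot column on every column.
        row_p = [list(m[p]) for _ in range(n)]
        col_p = [[m[r][p]] * (2 * n) for r in range(n)]
        # Computation: purely local at each point of the rows r != p.
        for r in range(n):
            if r != p:
                for c in range(2 * n):
                    m[r][c] = m[r][c] - row_p[r][c] * col_p[r][c] / d_p

    # Inverse computation: divide each row by its diagonal element.
    for r in range(n):
        d_r = m[r][r]
        for c in range(2 * n):
            m[r][c] = m[r][c] / d_r

    # The right part of M now holds the inverse of A.
    return [row[n:] for row in m]

inverse = gauss_jordan_inverse([[2, 1], [1, 1]])
```

The loops over r and c stand in for what would be a single parallel step over all the points of the space.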

1.3 Hyper-Space: A geometrical reference for the HELP model

The manipulation of interacting objects always takes place inside a hyper-space. A hyper-space is a geometrical set of points where the parallel objects (matrices, vectors) are positioned and moved. The HELP hyper-space point is the basic entity for computations: every computation on data-parallel objects is triggered at the point level, on local data. Several hyper-spaces may be defined for a single algorithm, for different sizes or data shapes.

The hyper-space groups and aligns objects that share the same distribution. The programmer specifies the data distribution of the program through directives linked to the hyper-space. Objects of the same hyper-space are therefore allocated with the same distribution algorithm.

A hyper-space is defined as a Cartesian reference frame of points with positive coordinates. Some information is attached to each hyper-space:

• The size of each dimension (number of points).
• Priority orders between the hyper-space dimensions, allowing the programmer to favor either parallelism or communications. Two concepts are provided for this purpose:

The elementary block specifies a geometrical set of hyper-space points. Communications between two points of the same block never generate any physical communication. In particular, a full dimension may be grouped into a single elementary block of the same size; during the projection, the HELP compiler then allocates this dimension in memory.

The parallelism priority tree. A priority order between the hyper-space dimensions allows the programmer to favor the physical projection of a dimension onto the processor network. Several examples of this parallelism priority tree are developed in part 2.
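The effect of elementary blocks can be pictured with a small sketch. It is plain Python, not part of HELP, and it rests on two assumptions that the text does not state: coordinates start at 1, and blocks are contiguous and aligned on multiples of the block size. The helper names `block_of` and `needs_physical_communication` are hypothetical.

```python
def block_of(coord, block_size):
    """Index of the elementary block holding a 1-based coordinate (assumed layout)."""
    return (coord - 1) // block_size

def needs_physical_communication(p, q, block_sizes):
    """True when points p and q lie in different elementary blocks."""
    return any(block_of(a, s) != block_of(b, s)
               for a, b, s in zip(p, q, block_sizes))

# Hypothetical 6 x 2 hyper-space: dimension x uses blocks of one point,
# dimension y is grouped whole into a single block of size 2.
sizes = (1, 2)

along_y = needs_physical_communication((3, 1), (3, 2), sizes)  # same block
along_x = needs_physical_communication((3, 1), (4, 1), sizes)  # different blocks
```

Under these assumptions, a move along y stays inside one block and costs no physical communication, while a move along x crosses a block boundary.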

Secondary dimensions can be defined as the composition of several hyper-space dimensions. With this feature, one can manipulate non-regular objects such as the diagonal of a matrix. Secondary dimensions are also used with the geometrical communication primitives.

The data-parallel objects (DPO) are necessarily linked to a hyper-space. Their shape and their position are dynamic during execution, but the association of a DPO with its hyper-space is fixed. These objects are multi-dimensional arrays, defined in relation to the hyper-space dimensions, either primary or secondary. Each point included in the geometry of the object holds one of its elements.

1.4 Two programming levels

The HELP thinking model clearly separates communications from computations. Following this model, the HELP languages provide two programming levels: the microscopic level, where computations local to a point are triggered, and the macroscopic level, where communications are modeled by calls to geometrical primitives.

2 Data-parallel writing

After the algorithm has been designed at the model level, it has to be translated into a data-parallel language. Such a language is derived from a classical language (C or Fortran) to which features derived from the model are added. One can also program the algorithm directly with HelpDraw, a graphical editor derived from the HELP model that provides all the geometrical features of the hyper-space notion. In order to ease the learning of the data-parallel language, sequential code is still expressed in the usual host language. Only the data-parallel part of the algorithm is written with specific constructors, which are identical in C-HELP and Fortran-HELP.

2.1 The C-HELP language

Hyper-Spaces declaration

For each hyper-space dimension, the programmer has to declare:

• A name, in order to increase the legibility of the C-HELP code.
• The size (number of points).
• The elementary block size. By default, this size is 1. The star symbol ('*') causes the dimension to be mapped in memory.

The parallelism priority is specified by a priority tree. The highest-priority dimensions (those appearing at the highest level of the tree) are mapped onto the physical network, in order to favor parallelism. If two dimensions have the same priority (the same level in the tree), priority is assigned from left to right. By default, the priority tree has a single level and the dimensions appear in declaration order.

Examples of priority tree A 6 × 2 hyper-space is mapped onto a 2-dimensional grid machine with 3 × 2 processors. Two layers are necessary, since two points are mapped onto each physical processor. Three different mappings can be obtained with different priority trees:

hspace plan [ x = 6 , y = 2 ] (x,(y));

[Figure: mapping of the twelve points (1,1) to (6,2) onto the processor grid for the tree (x,(y)); axes x and y.]

hspace plan [ x = 6 , y = 2 ] ((x),y);

[Figure: mapping of the twelve points (1,1) to (6,2) onto the processor grid for the tree ((x),y); axes y and x.]

hspace plan [ x = 6 , y = 2 ] (x,y);

[Figure: mapping of the twelve points (1,1) to (6,2) onto the processor grid for the tree (x,y); axes x and y.]

Secondary dimensions The secondary dimensions are obtained by composing several hyper-space dimensions. The composition may involve primary or secondary dimensions, provided that no primary dimension appears more than once in the expression. The size of a secondary dimension cannot be specified by the programmer.

hspace plan [ x = 100 , y = 100 , d = (x , y) ];

[Figure: the hyper-space plan, with primary axes x and y and the secondary dimension d.]

The hyper-space declaration follows the grammar:

[Grammar: the twelve production rules are not recoverable here; the surviving terminal symbols are hspace, `[', `]', `,', `=', `(', `)' and `*'.]

Data-Parallel Object declaration By default, every DPO is dynamic in its size and its position inside the hyper-space. A DPO can be declared static with the keyword steady if its size and position are known at compile time. Such a DPO cannot migrate over the hyper-space. A DPO is built over the primary and secondary dimensions; however, the primary dimensions used to define a secondary dimension cannot all be used together with this secondary dimension to allocate a DPO. The DPO declaration follows the grammar:

[Grammar: the production rules are not recoverable here; the surviving terminal symbols are steady, dpo, `[', `]', `,', `=', `*', `:', `;', conform, `(' and `)'.]

In a DPO declaration, a reference to a hyper-space dimension is expressed either with the dimension name or through the declaration order of the primary dimensions of the hyper-space. In order to specify the position and the size, one can declare:

• Only the position; the DPO dimension then has size one by default.
• `*'; the DPO dimension is complete (the same size as the hyper-space dimension).
• A pair lower bound `:' upper bound.
• A pair lower bound `;' length.

    hspace cubicspace [ x=100 , y=100 , z=100, d=(x,y) ] ;
    dpo float cubicspace [ x=40;30 , y=80;20, z=1 ] cube ;
    dpo float cubicspace [ x=30:100, y=10 , z=* ] plan ;
    dpo float cubicspace [ x=1 , z=80 , y=* ] line ;
    dpo float cubicspace [ x=1 , y=50 , z=50] point ;
    dpo float cubicspace [ x=30 , d=1;70 , z=1 ] diag ;
    dpo float cubicspace conform(line) sameline;

[Figure: the hyper-space cubicspace, with axes x, y and z, showing the DPO cube, plan, line, point and diag.]
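The four ways of giving a position and a size along one dimension can be summarized by a small resolver. The sketch below is plain Python and is not part of any HELP tool; the function name `resolve_extent` is hypothetical. It assumes 1-based coordinates and an inclusive upper bound for the `:` form, which the text does not spell out, and it does not cover the `:` `*` form that appears in the grammar.

```python
def resolve_extent(spec, dim_size):
    """Return (lower, length) of a DPO along one hyper-space dimension.

    spec is one of: "k" (position only), "*" (whole dimension),
    "lo:hi" (bounds, assumed inclusive) or "lo;len" (lower bound and length).
    """
    spec = spec.replace(" ", "")
    if spec == "*":
        lower, length = 1, dim_size
    elif ":" in spec:
        lo, hi = spec.split(":")
        lower, length = int(lo), int(hi) - int(lo) + 1
    elif ";" in spec:
        lo, ln = spec.split(";")
        lower, length = int(lo), int(ln)
    else:
        lower, length = int(spec), 1
    # The object has to fit inside the hyper-space dimension.
    if lower < 1 or length < 1 or lower + length - 1 > dim_size:
        raise ValueError(f"extent {spec!r} does not fit in a dimension of size {dim_size}")
    return lower, length

# Specifications taken from the declarations of cube and plan above.
cube_x = resolve_extent("40;30", 100)
plan_x = resolve_extent("30:100", 100)
plan_y = resolve_extent("10", 100)
plan_z = resolve_extent("*", 100)
```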

The microscopic level The microscopic treatments are computed locally at the hyper-space points. Such treatments consist of applying scalar operators or functions to the values of the DPO that belong to the same point. All the scalar operators of the host language are extended to DPO manipulation. In order to make two DPO interact, the HELP model requires the programmer to ensure that they are conform (same position and same shape). Since the projection directives are linked to the hyper-space, the logical conformity of two DPO implies the physical conformity of their data. There is no implicit communication in C-HELP, which makes code generation simpler and more efficient.

The conformity domain By default, two interacting DPO have to be conform; the conformity domain is defined as the set of points where these DPO are allocated.

[Figure: two placements of the DPO A, B and C in the (x, y) plane. In the first, A = B + C is legal and the conformity domain = A = B = C; in the second, A = B + C is illegal.]

Constraint domain, conformity rule Sometimes the computation has to be restricted to a set of points common to DPO that do not have the same shape or the same position. The operator and constructor on(DPO) limits the conformity domain to the points where DPO is allocated. The interacting objects under the influence of an on(DPO) must include DPO. The constraint domain can be reduced successively by nesting on constructors.

[Figure: two DPO A and B in the (x, y) plane. The expression A + B is illegal, since A and B are not conform; on(B) A + B is legal, since A includes B.]

The conformity rule must be applied to all the operands of all the microscopic operators (except injective assignment, see below). The rule is as follows. Two DPO are conform:

1. If the conformity domain is expressed by a conformity constructor on, each DPO domain has to include this domain.
2. If there is no explicit conformity domain, all the DPO must have the same shape and the same position inside the hyper-space.

Scalar data are considered conform to any DPO. Moreover, a DPO appearing as the argument of a nested on has to be included in the previous conformity domain (successive reductions of the conformity domain).

[Figure: the DPO A, B, C and D in the (x, y) plane.]

    on (B) {          /* conformity domain = allocation domain of B         */
      A + B;          /* correct : A and B include the conformity domain    */
      on (C) {        /* correct : the conformity domain includes C         */
                      /* the conformity domain becomes the domain of C      */
        D + A;        /* correct : D and A include the conformity domain    */
      }               /* the conformity domain back to B                    */
      D + B;          /* error: D does not include B                        */
    }
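The conformity rule and the nesting of on constructors can be checked mechanically on rectangular domains. The following sketch is plain Python, not C-HELP; the class `Box` and the functions `conform` and `enter_on` are hypothetical names, and the coordinates given to A, B, C and D are invented so as to reproduce the situation of the example (A includes B, B includes C, D includes C but not B).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    """Rectangular set of hyper-space points: inclusive (lower, upper) per dimension."""
    bounds: tuple

    def includes(self, other):
        return all(lo <= olo and ohi <= hi
                   for (lo, hi), (olo, ohi) in zip(self.bounds, other.bounds))

def conform(operands, domain=None):
    """Conformity rule: same shape and position, or inclusion of the on domain."""
    if domain is None:
        return all(op == operands[0] for op in operands)
    return all(op.includes(domain) for op in operands)

def enter_on(dpo, domain=None):
    """Return the conformity domain set by on(dpo), checking nested inclusion."""
    if domain is not None and not domain.includes(dpo):
        raise ValueError("a nested on must reduce the current conformity domain")
    return dpo

# Hypothetical placements in a 2-D hyper-space.
A = Box(((1, 8), (1, 8)))
B = Box(((2, 6), (2, 6)))
C = Box(((3, 4), (3, 4)))
D = Box(((3, 5), (1, 4)))

without_on = conform([A, B])          # A + B with no on: not conform
dom_b = enter_on(B)                   # on (B)
a_plus_b = conform([A, B], dom_b)     # A + B inside on (B)
dom_c = enter_on(C, dom_b)            # nested on (C)
d_plus_a = conform([D, A], dom_c)     # D + A inside on (C)
d_plus_b = conform([D, B], dom_b)     # D + B back inside on (B)
```

Representing the conformity domain as an explicit value makes the "back to B" step visible: leaving the nested block simply means using `dom_b` again.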

Masked domain Like most data-parallel languages, C-HELP makes it possible to mask some points of the conformity domain with the operator where(DPO_expr). The DPO appearing in the expression DPO_expr and in the expressions of the where block have to be conform. The new masked domain is composed of the points where the evaluation of DPO_expr yields true. The masked domain is a subset of the conformity domain. The microscopic operations nested in a masked domain are computed only on the points of the masked domain. Moreover, the application of a macroscopic operator or a function call masks the current conformity domain (see below). For the execution of the elsewhere block, the masked domain is inverted.

                                        /* A, B, C, and D are conform   */
    where (A!=0)                        /* mask1 : (A!=0)               */
      where (C!=0)                      /* mask2 : (mask1) && (C!=0)    */
        A = ( (B / C) + D ) / A ;
      elsewhere                         /* mask3 : not(mask2)           */
        A = D / A;
    elsewhere                           /* mask4 : not(mask1)           */
      A = 1 ;
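The four masks of this example can be simulated point by point. The sketch below is plain Python, not C-HELP, and the function name `run_where_example` is hypothetical. It assumes that the inversion performed for the inner elsewhere is relative to the enclosing masked domain, that is, mask3 is taken as mask1 and not (C != 0); this reading is an assumption, chosen because it keeps the division by A away from the points where A is zero.

```python
def run_where_example(A, B, C, D):
    """Simulate the nested where/elsewhere example on flat lists of equal length."""
    n = len(A)
    mask1 = [A[i] != 0 for i in range(n)]
    mask2 = [mask1[i] and C[i] != 0 for i in range(n)]
    # Assumption: the inner elsewhere is inverted inside mask1 only.
    mask3 = [mask1[i] and not mask2[i] for i in range(n)]
    mask4 = [not mask1[i] for i in range(n)]

    out = list(A)
    for i in range(n):
        if mask2[i]:
            out[i] = ((B[i] / C[i]) + D[i]) / A[i]
        elif mask3[i]:
            out[i] = D[i] / A[i]
        elif mask4[i]:
            out[i] = 1
    return out

# One point for each of the three assignments.
new_A = run_where_example(A=[2.0, 4.0, 0.0],
                          B=[6.0, 1.0, 5.0],
                          C=[3.0, 0.0, 7.0],
                          D=[2.0, 8.0, 9.0])
```

Every point falls in exactly one of mask2, mask3 and mask4, so each element of A is assigned once.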

The C-language operator expr ? then_expr : else_expr is extended to DPO operands. This construct is valid if expr, then_expr and else_expr satisfy the conformity rule. The resulting DPO is allocated on the conformity domain.

Association The association operator
