Dynamic Distributed Data (DDD) in a Parallel Programming Environment – Specification and Functionality

Klaus Birken
Rechenzentrum der Universität Stuttgart, Allmandring 30, 70550 Stuttgart
E-mail: [email protected]

Peter Bastian
Institut für Computeranwendungen, Universität Stuttgart, Pfaffenwaldring 27, 70550 Stuttgart
E-mail: [email protected]

Forschungs- und Entwicklungsberichte RUS-22
September 1994
Abstract

The parallel implementation of algorithms based on dynamic data structures (e.g. adaptive PDE solvers, sparse matrices) on architectures with distributed memory unavoidably leads to various technical problems concerning redundancy, consistency and load balancing. The module DDD is presented, which makes the development of dynamic parallel algorithms much easier. DDD consists of a formal model for describing distributed data, a functionality specification and a portable library implementation. The main goal of DDD is to provide a general and efficient interface connecting SPMD-style applications with various architectures and programming models.
Contents

1 Motivation

2 Framework

3 Formal Model: Data Distribution
  3.1 Parallelization in the DDD context
  3.2 The notion of "object" in DDD
  3.3 Global View of Data
    3.3.1 Global object
    3.3.2 Reference relation
    3.3.3 Set of global relations
    3.3.4 Global (sequential) graph
  3.4 Local View of Data
    3.4.1 (Local) object
    3.4.2 Local relation sets, local graphs
  3.5 Distributed View of Data
    3.5.1 Identification of local objects
    3.5.2 Distributed objects
    3.5.3 Two possible pointer models
    3.5.4 Set of distributed relations, distributed graph
  3.6 Parallelization/Distribution
    3.6.1 Distribution
    3.6.2 Design of distributions
    3.6.3 Design example
  3.7 Supplements
    3.7.1 Additional property: Priority
    3.7.2 Additional property: Range

4 Functionality of DDD
  4.1 Management Module
    4.1.1 General management
    4.1.2 Definition module
  4.2 Object Management Module
    4.2.1 Object creation and deletion
    4.2.2 Object properties
  4.3 Interface Module
    4.3.1 Interface definition
    4.3.2 Interface usage
  4.4 Xfer Module
    4.4.1 Xfer process
    4.4.2 Object xfer operations
    4.4.3 Additional xfer operations
  4.5 Identification Module
    4.5.1 Identification process
    4.5.2 Identification operations
  4.6 Supplement Module
    4.6.1 Additional info functions
    4.6.2 Maintenance and debugging
Chapter 1

Motivation

Many existing, well-known algorithms use unstructured data as an underlying concept for computationally intensive procedures. For example, modern algorithms for the numerical solution of partial differential equations may use discretizations resulting in unstructured grids. Complex data structures are needed to represent these grids in the corresponding implementations (see [1], [4]). One general way to express the "unstructuredness" of data within a particular implementation is the concept of references, which connect certain data objects in a potentially arbitrary manner. Depending on the programming language, one might use native pointers in C, index arithmetic in Fortran or some user-defined pointer class in C++ to represent those references in the actual program.

For the implementation of these methods on a MIMD parallel computer with distributed memory, the data distribution paradigm is usually used: all processors execute the same program, but work on different portions of the data structure (single program, multiple data model). Several problems arise in this context [5]:
- The whole (global) data structure has to be split up and distributed into the local memories of the participating processors.
- This distribution has to be balanced evenly to ensure efficient execution of the parallel program.
- Whenever the data structure changes at runtime, the workload becomes imbalanced and has to be redistributed, taking the previous distribution into account.
- In order to keep the differences between the sequential algorithm and the parallel program as small as possible, data redundancy is needed. The existence of data object copies enforces protocols for keeping these copies consistent; those protocols have to be handled carefully because of their influence on the efficiency of the parallel program.
- During distribution of the data, complex structures built upon the reference concept (as mentioned above) have to be transferred from the local memory of one processor into that of another. All references which undergo this procedure become invalid when being transferred across local memory address spaces. Therefore a specialized packing/unpacking mechanism has to be implemented.
- Most algorithms need to exchange information between different processors during the program run, based on an existing data topology. The resulting communication patterns on complicated processor and data topologies have to be organized well in order to avoid efficiency losses due to communication latency or data buffering.
Considering these problems, a layer-based strategy for implementing data-dynamic programs on MIMD parallel computers seems unavoidable. A useful level of abstraction has to be specified in order to keep the application part of the parallel program as simple as possible. In the following, a parallelization strategy is presented which consists of a parallel data model and a library specification defining useful functions to access and manipulate the model's data.
Chapter 2

Framework

In the following chapters an abstract model of a data structure and a related set of operations will be sketched. The combination of both shall supply an easy and transparent way of managing dynamic data which has to be distributed on MIMD systems as described in the introductory section.

Fig. 2.1 shows the structure of a parallel program using the paradigm described in this paper. The actual application program is not connected directly to the MIMD hardware (i.e. it does not call the native communication library functions for exchanging topological and numerical data), but relies on two stacked layers of parallel functionality, whose complexity increases from bottom to top.

The native communication library of the parallel system is covered by a standardized parallel processing interface (short: PPIF), which is a thin communication layer ensuring portability of the resulting parallel program (see [2]). Its functionality includes basic synchronous and asynchronous send and receive calls as well as some predefined communication topologies (array and tree structures). PPIF is based on a virtual channel model, which can easily be implemented even on parallel programming models that do not have a notion of channels, by adding dummy channel structures.

The DDD layer (short for dynamic distributed data) between PPIF and the actual application supplies all functionality described in the next chapters. All problems mentioned in the previous section are covered by DDD; the application program does not have to fall back on the message passing primitives of the lower layers.
[Figure 2.1: Structure of a parallel program using the DDD library. Layers from top to bottom: Application, DDD, Parallel Processing IF (machine independent), Parallel Hardware (machine dependent).]
Chapter 3

Formal Model: Data Distribution

3.1 Parallelization in the DDD context

In this chapter, a formal model of a general data distribution technique is developed. Fig. 3.1 gives a simplified overview. Every programmer concerned with a (sequential) algorithm based on dynamic, unstructured data will have some data structures in mind which are plausible and useful for the correct and efficient implementation of the program. This data structure is called the global data structure in Fig. 3.1 and in the remainder of this paper. This notion of a (sequential) data structure should not have to be changed when implementing the algorithm on a parallel computer, although in reality the implementation of a distributed data structure is necessary there. The method of transferring the global data structure into the (desired) distributed structure suitable for the distributed memory hardware is called parallelization. The abstract data model will use graphs for specifying the properties of the global and the distributed structure, respectively. The distributed structure will consist of several local graphs, each of them located inside one processor's memory.
[Figure 3.1: Parallelization via data distribution in DDD. The (notion of the) global data structure is transformed by parallelization into the (desired) distributed structure, which consists of local graphs.]

In the following, each of the three components and their relations (as shown in Fig. 3.1) is formally defined. This provides an easy but nevertheless powerful way to define parallel (i.e. distributed) data structures and to develop methods of managing the actual distribution.
3.2 The notion of "object" in DDD

In the following, the term object will be used for small pieces of data inside a processor's memory. Although these objects may be classified by certain types (i.e. classes) and are manipulated through corresponding function calls (i.e. methods), they are not objects in the sense of the widely discussed object-orientation paradigm (compare [6]). The main difference is that there is no type checking at compile time, as the DDD object abstraction occurs at the DDD library interface as specified in Chapter 4 and in the DDD 1.0 Reference Manual ([3]).
3.3 Global View of Data

At first, the global view of the given data structure shall be characterized. This global view of data can be directly implemented on a sequential computer; on a distributed memory parallel computer it represents the underlying notion of data the application programmer has in mind.
3.3.1 Global object.

Definition 1. Let a global object o~ ∈ O~ be specified by

    o~ = (data, desc)

with general structure desc and current data entries data. O~ shall be the set of all global objects.

Global objects can be classified according to their structure description desc, which describes the internal format of a certain object type as a set of tuples

    (offset, length, type) ∈ desc.

These tuples denote small, successive memory parts (called elements) containing data specified by type according to Table 3.1. Each element has a given distance from the beginning of its object (offset) and a fixed size (length).

Table 3.1: Types of components of a global object
  1  data          problem-specific data (e.g. unknowns)
  2  data pointer  pointer to an object consisting of type-1 data (e.g. a block with user data of variable length)
  3  pointer       reference, i.e. pointer to another object

Table 3.1 classifies all possible elements of a global object roughly into data elements and pointer elements. This will be used for the definition of a general reference relation in the next section.
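To make the desc concept concrete, the following C sketch shows one possible in-memory representation of a structure description as a table of (offset, length, type) elements. The type names and the example Node object are purely illustrative assumptions and not part of the DDD specification.

    /* Illustrative sketch only: one possible representation of a structure
     * description 'desc' as a table of (offset, length, type) elements.
     * All names used here are hypothetical, not the DDD API.              */
    #include <stddef.h>

    enum ElemType { EL_DATA = 1, EL_DATAPTR = 2, EL_OBJPTR = 3 };

    typedef struct {
        size_t offset;          /* distance from the beginning of the object */
        size_t length;          /* fixed size of the element in bytes        */
        enum ElemType type;     /* classification according to Table 3.1     */
    } ElemDesc;

    /* Example object type: a node with numerical data and a reference. */
    typedef struct Node {
        double unknowns[3];     /* type-1 data (problem-specific)             */
        double *userblock;      /* type-2 data pointer (variable-length data) */
        struct Node *next;      /* type-3 pointer, i.e. a reference           */
    } Node;

    /* The corresponding structure description desc for Node objects. */
    static const ElemDesc NodeDesc[] = {
        { offsetof(Node, unknowns),  sizeof(double[3]), EL_DATA    },
        { offsetof(Node, userblock), sizeof(double *),  EL_DATAPTR },
        { offsetof(Node, next),      sizeof(Node *),    EL_OBJPTR  }
    };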
3.3.2 Reference relation.

Definition 2. The notation

    o~_i --ptr--> o~_j

specifies the situation where object o~_i has a pointer component representing a reference to object o~_j (e.g. a pointer in C).
3.3.3 Set of global relations.

Definition 3. The set of global relations R~ shall be specified by

    (o~_i, o~_j) ∈ R~  ⇔  o~_i --ptr--> o~_j.

R~ thus denotes the set of all references occurring within the whole data structure.
3.3.4 Global (sequential) graph.

Now the global graph G~ (corresponding to the sequential program) is built up from the global objects O~ (nodes) and the global relations R~ (edges) as

    G~ = (O~, R~).

Fig. 3.2 shows an example of a global graph structure with four global objects and their reference relations. As the references are uni-directional in the general case, G~ will be a directed graph.
[Figure 3.2: Example of a global graph G~ with four global objects o~_1, ..., o~_4.]
3.4 Local View of Data

3.4.1 (Local) object.

Definition 4. Let a (local) object o be specified by

    o = (data, desc, proc)

with current entries data and processor number proc ∈ P. The set of (abstract) processors P may denote real processors as well as processes. Again, objects are classified according to their structure description desc; additionally, type-1 data is distinguished even further (see Table 3.2).

Table 3.2: Types of components of a (local) object
  1a  global data   problem-specific data valid in all address spaces (e.g. Cartesian coordinates)
  1b  local data    data valid on only one processor (e.g. local linked lists)
  2   data pointer  as before
  3   pointer       as before

For each processor p the set of all (local) objects O_p can be defined; the set of all local objects O is derived by O = ∪_{p∈P} O_p.
3.4.2 Local relation sets, local graphs.

Definition 5. With the local relation sets R_p specified by

    (o_i, o_j) ∈ R_p  ⇔  o_i ∈ O_p ∧ o_j ∈ O_p ∧ o_i --ptr--> o_j    ∀p ∈ P,

one can define local graphs G_p by

    G_p = (O_p, R_p)    ∀p ∈ P.
Fig. 3.3 shows an example of two local graphs G_A and G_B on two processors A and B, respectively. Note that there are no references from one local graph into the other.
[Figure 3.3: Example of local graphs G_A (on processor A) and G_B (on processor B).]
3.5 Distributed View of Data

3.5.1 Identification of local objects.

The mapping id : O → I shall assign an index from an arbitrary index set I to every object. By id, another mapping sh : I → P(O) is induced as follows:

    sh(i) := { o ∈ O | id(o) = i },    i ∈ I.

Hence, local objects are grouped by being mapped onto the same index i. On the other hand, every index i ∈ I denotes a group of objects (as a subset of O).
3.5.2 Distributed objects.

Every group of objects with index i can now be regarded as one single distributed object o^_i with constituent local objects located on different processors.

Definition 6. Let the set of distributed objects O^ be specified by

    o^_i := sh(i) ∈ O^    ∀i ∈ I.

Usually the desc structure definitions will be equal for all local objects inside the same distributed object. In that case the distributed object can be regarded as a single unit, consisting of several object copies with different data and common structure.
3.5.3 Two possible pointer models.

The exact definition of the reference relation --ptr--> leads to two different models concerning the characteristics of allowed memory references:

Model L: The additional restriction

    ∀ o_i, o_j ∈ O : o_i --ptr--> o_j  ⇒  proc(o_i) = proc(o_j)

forbids references across processor address spaces. This admits efficient access via hardware-oriented pointers, analogous to the sequential program.
Model G: The above restriction is not made; references with

    o_i --ptr--> o_j  and  proc(o_i) ≠ proc(o_j)

are possible. This is equivalent to the implementation of a global address space.

The algorithm described here uses Model L, hence the technique of memory addressing is restricted. Without supporting hardware, a portable, efficient implementation of Model G functionality does not seem possible at present. As a practical compromise, the basic (efficient) Model L functionality should be enhanced by more comfortable but slower Model G addressing.
3.5.4 Set of distributed relations, distributed graph.

Definition 7. With the set of all distributed relations R^ specified by

    (o^_i, o^_j) ∈ R^  ⇔  ∃ o_k ∈ o^_i  ∃ o_l ∈ o^_j  ∃ p ∈ P : (o_k, o_l) ∈ R_p,

one can build up the distributed graph G^ by

    G^ = (O^, R^)

from the distributed objects O^ (nodes) and the distributed relations R^ (edges).

Fig. 3.4 shows an example of a distributed graph structure. The two local graphs (as in Fig. 3.3) constitute this distributed graph via identification of local objects to construct distributed objects (dashed circles in Fig. 3.4). Note that, according to Model L, there are still no references across processor partitions. Using Model L, the distributed graph obviously emerges out of the overlay of all local graphs because of the definition of distributed relations.
[Figure 3.4: Example of a distributed graph G^. The local objects of G_A and G_B are identified with each other to form the distributed objects o^_1, ..., o^_4 (dashed circles).]
3.6 Parallelization/Distribution

In the previous sections the desired data structure G~ and all (in terms of correctness) possible distributed structures G^ have been defined. The following section characterizes a class of possible mappings which build up a relation between those two structures and thus describe the method of data parallelization.
3.6.1 Distribution.

Definition 8. A distribution is a homomorphism D : O~ → O^ with respect to the relations R~ and R^.

Thus, if two objects o~_1, o~_2 are related in R~ (i.e. (o~_1, o~_2) ∈ R~), the mapped objects have to be related in R^ (homomorphism property):

    (o~_1, o~_2) ∈ R~  ⇒  (D(o~_1), D(o~_2)) ∈ R^.
This guarantees that at least all relations in the (desired) global data model are represented by corresponding relations in the actual (implemented) distributed model.
[Figure 3.5: Example of a distribution/mapping D between the global graph G~ and the distributed graph G^.]

In Fig. 3.5 one can see how this works in the case of the previous example. Without the distribution D there is no official connection between G~ and G^. Although the homomorphism is obvious due to the small size of this example, the actual relation between G~ and G^ is constructed only when the distribution D is introduced. Here D can be defined via

    D(o~_1) = o^_1,  D(o~_2) = o^_2,  D(o~_3) = o^_4,  D(o~_4) = o^_3.
3.6.2 Design of distributions.

In practice one uses the graph G~ as a starting point and tries to design the distributed structure G^ with its constituent local graphs G_p in such a way that an appropriate homomorphism exists. The homomorphism property of the distribution D supports this design in a constructive manner.
3.6.3 Design example.

In most cases the underlying assumption is that a bijective distribution (i.e. an isomorphism) can be found; then a unique mapping between objects in O~ and those in O^ can be constructed. The minimal distributed graph (in terms of the number of local objects) can then be constructed by fulfilling the Model-L restriction mentioned in Section 3.5.3 in a straightforward manner:

1. Determine a real decomposition of the data structure into partitions, which are assigned to the existing processors.
2. Create exactly one local object copy for every object in O^ and derive its proc property from the above partitioning.
3. References in R~ whose isomorphic equivalent lies inside a processor's domain can be mapped immediately into the set of local references.
4. All other references are not allowed in Model L and have to be resolved by introducing one more local object copy on one of the participating processors. Using this new local object the reference can be "localized" and thus be expressed as a local reference (see the sketch below).

Often it may become necessary to use more local objects than the minimal number, as the algorithms used by the application program can be expressed in a more natural way if redundant copies of data objects exist. In numerical algorithms, for example, all elements along the border between two different processor partitions may be duplicated to enable a homogeneous and symmetric view of the data structure.
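As a purely illustrative sketch of step 4, the following C fragment resolves a reference that would cross a partition boundary by creating an additional local copy of the referenced object and redirecting the pointer to it. All names are hypothetical, and the fragment is meant to operate on the still-sequential structure while the partitioning is being constructed, not on an already distributed one.

    /* Illustrative sketch (hypothetical names): resolving a reference that
     * would cross processor partitions by introducing a redundant local copy. */
    #include <stdlib.h>
    #include <string.h>

    typedef struct Obj {
        int    proc;          /* partition/processor this copy is assigned to */
        double data[4];       /* type-1a data, valid in all address spaces    */
        struct Obj *ref;      /* type-3 pointer (reference)                   */
    } Obj;

    /* Localize the reference src->ref for partition 'me' (Model L):
     * if the referenced object belongs to another partition, create a
     * copy assigned to 'me' and let the reference point to that copy.  */
    static void localize_reference(Obj *src, int me)
    {
        if (src->ref != NULL && src->ref->proc != me) {
            Obj *copy = malloc(sizeof(Obj));
            memcpy(copy, src->ref, sizeof(Obj));
            copy->proc = me;      /* the new copy belongs to this partition   */
            copy->ref  = NULL;    /* its own references must be localized
                                     separately (or left unresolved)          */
            src->ref = copy;      /* the reference is now purely local        */
        }
    }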
3.7 Supplements

Most operations concerning the object set will not be applied to all objects as a whole. Usually the operations are applied to restricted subsets of objects instead, chosen according to a number of predefined object properties. Thus the object definition from Section 3.4 is extended by the following two object properties: priority and range. Hence the definition of a local object can now be rewritten as

    o = (data, desc, proc, prio, range).
3.7.1 Additional property: Priority.

The new object property prio assigns a priority prio(o) ∈ N to each object o. The priority can be used to define consistency protocols on the application level. For instance, a certain local object copy of each distributed object may be distinguished and thus preferred in later operations (e.g. during exchange of data inside a distributed object).
3.7.2 Additional property: Range.

Via the new object property range, an index range(o^) ∈ R may be assigned to each distributed object o^. Thus the whole set of distributed objects may be split up into subsets according to the range set R. Through the range property, objects may be distinguished according to the applied algorithms. For instance, every numerical element may be assigned its level inside a multilevel hierarchy in a computational fluid dynamics code by using the level number as range value.
Chapter 4

Functionality of DDD

In the previous chapter a formal way of expressing data parallelism has been described. The distributed data structures used there can be accessed via function calls provided by the DDD interface. This chapter gives an overview of the DDD functionality. For a detailed description of the DDD functions and the error messages returned by DDD refer to the DDD 1.0 Reference Manual ([3]).

Fig. 4.1 shows an overview of the DDD functionality, which is organized as a package of six different modules. The Management module contains functions to initialize DDD and its management parameters. The Object Management module provides routines for creating and destroying DDD objects. The Interface module supports communication on existing, static data topologies. The Xfer module enables dynamic changes of the data topology; the most important part of this is the transparent transfer of DDD objects across local memory boundaries. The Identification module supports the creation of distributed objects via identification of local objects (i.e. the construction of both mappings id and sh). The Supplement module contains other routines, e.g. for getting information about DDD internals or for executing a consistency check of the global data structure.
[Figure 4.1: Functionality of DDD – overview of the six modules: Management, Object Manager, Interface (IF), Xfer, Identification (Ident) and Supplements.]
4.1 Management Module

The Management module contains functions for general DDD management and for the definition of DDD object types.
4.1.1 General management.

The whole parallel section of an SPMD application starts with a call to DDD_Init() and ends with its counterpart DDD_Exit(). All initialization and clean-up work is done in DDD_Init() and DDD_Exit(), respectively, e.g. allocating all global buffer space and initializing the underlying message passing library (i.e. PPIF). The user-supplied memory manager (MEMMGR, see [3]) has to be initialized before the parallel section starts.
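A minimal usage sketch of this program skeleton follows; the parameter lists of DDD_Init()/DDD_Exit() and the memmgr_init() call are assumptions made for illustration, since the exact signatures are given only in the reference manual [3].

    /* Sketch of the overall SPMD program skeleton; argument lists assumed. */
    int main(int argc, char **argv)
    {
        memmgr_init();               /* user-supplied memory manager (MEMMGR),
                                        initialized before the parallel part  */

        DDD_Init(&argc, &argv);      /* start of the parallel section:
                                        buffers, PPIF initialization, ...     */

        /* ... register object types, build the distributed structure,
               run the actual application ...                                 */

        DDD_Exit();                  /* counterpart: clean-up, shut down PPIF */
        return 0;
    }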
4.1.2 Definition module.

The main data structure managed by DDD is the local object, as specified in Section 3.4. All objects can be classified according to their structure description desc. The different object structure descriptions (one for each object type) have
to be registered via DDD_StructRegister(), which returns a DDD type id. This DDD_TYPE can be used to refer to a certain predefined desc property. The routine DDD_StructDisplay() prints out a DDD_TYPE description and may thus serve as a useful debugging tool.

Usually DDD library functions are called by the application program. Some cases exist, however, in which DDD has to use functionality that cannot be defined a priori (e.g. gathering data out of some objects and preparing it to be sent to another processor). This functionality can be coded inside the application program using handlers, which are usually small routines performing some exactly specified work on a single local object. These handlers can be registered separately for each previously defined DDD_TYPE via DDD_HandlerRegister(). This may be done many times during runtime, depending on what functionality the application program needs in a certain situation.
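The following sketch indicates how an application might register an object type and a handler. The parameter lists of DDD_StructRegister() and DDD_HandlerRegister(), as well as the handler identifier and the Node layout, are assumptions for illustration only; the authoritative interface is the reference manual [3].

    /* Hypothetical sketch: registering a structure description and a handler. */
    typedef struct Node {
        double coord[3];        /* global data (valid everywhere) */
        struct Node *next;      /* local reference                */
    } Node;

    static DDD_TYPE TypeNode;

    /* Handler: a small routine working on a single local object, here
     * assumed to be called whenever a Node object has been deleted.    */
    static void NodeDeleteHandler(void *obj)
    {
        /* application-specific clean-up for the Node object 'obj' ... */
        (void)obj;
    }

    void register_types(void)
    {
        /* Assumed call: describe the Node layout and obtain a DDD_TYPE id. */
        TypeNode = DDD_StructRegister("Node", sizeof(Node) /* , element list ... */);
        DDD_StructDisplay(TypeNode);                 /* debugging aid */

        /* Assumed call: attach the handler to the previously defined type. */
        DDD_HandlerRegister(TypeNode, HANDLER_OBJDELETE, NodeDeleteHandler);
    }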
4.2 Object Management Module

After a DDD_TYPE has been registered, objects of that type may be created and deleted with routines from the Object Management module. It also provides functions for retrieval and manipulation of the additional object properties defined in Section 3.7.
4.2.1 Object creation and deletion.

Local objects of a given DDD_TYPE may be created or deleted using the functions DDD_ObjGet() and DDD_ObjDelete(), respectively. These functions use memory manager routines, which may be implemented by the application programmer to keep control over memory management. Some other functions (DDD_ObjInit(), DDD_ObjUnlink(), DDD_ObjMoveTo()) allow different low-level manipulations of DDD objects (refer to [3] for a detailed explanation). These functions may interact with a sophisticated MEMMGR in more complex applications, but will not be necessary in most cases.
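A short sketch of object creation and deletion, using the Node type and the TypeNode id from the previous sketch; the parameter lists are assumptions (the real signatures are defined in [3]).

    /* Hypothetical sketch: creating and deleting a local DDD object. */
    void create_and_delete(DDD_TYPE TypeNode)
    {
        /* Assumed call: allocate (via MEMMGR) and register a new local object. */
        Node *n = (Node *)DDD_ObjGet(TypeNode, sizeof(Node));

        /* ... link the object into the local data structure and use it ... */

        /* Assumed call: remove the local copy again. */
        DDD_ObjDelete(n);
    }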
4.2.2 Object properties.

The supplementary object properties prio and range may be read by the info functions DDD_InfoPriority() and DDD_InfoRange(). Manipulation of these values is possible using DDD_PrioritySet() for changing the prio property of one local object and DDD_RangeSet() for changing the range property of a whole distributed object (cf. Section 3.7.2).
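For illustration, reading and changing the supplementary properties might look as follows; the parameter lists and the concrete priority/range values are assumptions.

    /* Hypothetical sketch: querying and changing the prio and range properties. */
    void adjust_properties(Node *n)
    {
        int prio  = DDD_InfoPriority(n);   /* prio of this local copy            */
        int range = DDD_InfoRange(n);      /* range of the distributed object    */

        if (prio == 0)
            DDD_PrioritySet(n, 1);         /* promote this local copy            */

        DDD_RangeSet(n, range + 1);        /* e.g. move the object one level up  */
    }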
4.3 Interface Module

Every communication on an existing distributed data structure that exchanges information between different parts of this structure should happen via DDD interfaces. Communication may only happen inside single distributed objects; usually this means communication between the different local objects contained in the same distributed object. Sources and destinations of the communication are selected using the object property prio defined in Section 3.7.1. The definition of a DDD interface X is done by calling DDD_IFDefine(), which needs the following parameters:

    X = (O_X, A, B)

with O_X ⊆ O being a subset of objects chosen according to their desc and range properties, and A, B being subsets of priorities. A ∩ B need not be empty (i.e. sending to object copies with the same priority is allowed). If A and B are left unrestricted, exchange of data will happen between all objects in the interface.
4.3.1 Interface definition.

One standard interface is automatically defined by DDD without previous calls to DDD_IFDefine(). It is defined such that it contains all objects o ∈ O with arbitrary priority. All other (application-specific) interfaces have to be defined before their first usage via DDD_IFDefine(). The best place to do this in the code is the initialization phase (after registering the object structures). The routine DDD_DisplayIF() may be called at any time on arbitrary processors to print out various quantitative information about the interfaces on these processors.
4.3.2 Interface usage.

Three kinds of communication can be derived from the above interface definition:

- A → B (forward oneway communication): every object copy o ∈ O_X ∩ O_p with prio_p(o) ∈ A sends data to all object copies o ∈ O_X ∩ O_q, q ≠ p, with prio_q(o) ∈ B.
- B → A (backward oneway communication): the above with A and B exchanged.
- A ↔ B (exchange communication): both directions A → B and B → A simultaneously.

In the current implementation all communications using DDD interfaces are minimal with respect to the number of messages, i.e. only one message (if necessary at all) is sent from p to q. The two functions DDD_IFOneway() (for oneway communication in either direction) and DDD_IFExchange() (for exchange communication simultaneously in both directions) are provided for implementing the above specification.
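A sketch of how an interface might be defined and used follows. The parameter lists of DDD_IFDefine(), DDD_IFOneway() and DDD_IFExchange(), the handle type DDD_IF, the direction constant IF_FORWARD, the priority values and the gather/scatter callbacks are all assumptions for illustration; only the function names themselves come from this specification.

    /* Hypothetical sketch: define an interface X = (O_X, A, B) and use it. */
    static DDD_IF BorderIF;

    /* Assumed callbacks: pack data from / unpack data into one local object. */
    static void Gather(void *obj, void *buffer)  { /* copy data out of obj */ }
    static void Scatter(void *obj, void *buffer) { /* copy data into obj   */ }

    void define_and_use_interface(DDD_TYPE TypeNode)
    {
        int A[] = { 1 };    /* source priorities      (set A) */
        int B[] = { 0 };    /* destination priorities (set B) */

        /* Assumed call: objects selected by type (and range), priority sets A, B. */
        BorderIF = DDD_IFDefine(1, &TypeNode, 1, A, 1, B);

        /* Forward oneway communication A -> B over this interface.          */
        DDD_IFOneway(BorderIF, IF_FORWARD, sizeof(double), Gather, Scatter);

        /* Exchange communication A <-> B in both directions simultaneously. */
        DDD_IFExchange(BorderIF, sizeof(double), Gather, Scatter);
    }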
4.4 Xfer Module

The Xfer module contains functions for changing the existing distributed data topology. This includes operations for creating local object copies on other processors (the actual object transfer) and for deleting object copies in local memory.
4.4.1 Xfer process.

Prior to the actual transfer, a global transfer operation has to be established. This is done via a call to DDD_XferBegin() on all processors (beginning of the first phase). After this, all necessary Xfer calls may be issued on each processor individually (see next section). Finally, the transfer operation is started by calling DDD_XferEnd() (second phase). During the first phase all Xfer calls are not executed immediately, but are only recorded for later execution. The actual transfer starts after DDD_XferEnd() has been issued on all processors. This principle is well known from database processing (transactions) and allows minimization of the necessary message passing operations at the beginning of the second phase.
4.4.2 Object xfer operations.

In the first phase of a global transfer operation two different commands may be issued. DDD_XferCopyObj() will create an object copy in some other processor's memory. DDD_XferDeleteObj() will delete a local copy. As other processors may be involved in this deletion by owning an object copy of the same distributed object, the latter command also has to be issued during the global transfer operation. A DDD_XferMoveObj() operation for moving an object copy from one processor's memory to another may easily be created by concatenating the above two operations, as sketched below. Whenever a processor gets more than one local copy of the same distributed object, the one with the higher prio property is preferred.
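A sketch of a global transfer operation, moving a local object to another processor by combining copy and delete; the parameter lists and the priority argument are assumptions.

    /* Hypothetical sketch: move one local object copy to processor 'dest'. */
    void move_object(Node *n, int dest)
    {
        DDD_XferBegin();                 /* first phase: record xfer commands  */

        DDD_XferCopyObj(n, dest, 1);     /* create a copy on 'dest' with an
                                            (assumed) priority argument of 1   */
        DDD_XferDeleteObj(n);            /* delete the local copy here         */

        DDD_XferEnd();                   /* second phase: execute all recorded
                                            commands with minimal messages     */
    }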
4.4.3 Additional xfer operations.

Some supplementary Xfer operations are provided to add data of various formats to the objects being transferred. For a more detailed description refer to the functions DDD_XferCopyObj() and DDD_XferAddData() in the DDD 1.0 Reference Manual ([3]).
4.5 Identification Module

One way to construct a distributed data structure is to create all objects on one processor, partition the whole object set and afterwards transfer the subsets to the other processors according to this partitioning. This method has several disadvantages, the most important being the serious sequential bottleneck on the master processor both in storage space and execution time. An alternative method to construct the distributed structure is to create local objects on all processors independently as a first step and to identify local objects on different processors afterwards in order to construct the distributed objects. The functionality for this identification is provided by the DDD Identification module. Identification can be regarded as a distributed construction of the mappings id and sh defined in Section 3.5.1.
4.5.1 Identification process.

Similar to the Xfer operation, every identification procedure has to be initiated via a call to DDD_IdentifyBegin() (beginning of the first phase). After this, all necessary Identify calls as described in the next section may be issued on each processor individually. Finally, the identification process is started by calling DDD_IdentifyEnd() (second phase). During the first phase all Identify calls are not executed immediately, but are only recorded for later execution. The actual identification procedure starts after DDD_IdentifyEnd() has been issued on all processors. As in the case of the Xfer module, this technique is well known from database processing and allows the minimization of the necessary message passing operations at the beginning of the second phase.
4.5.2 Identification operations.

Between the two calls to DDD_IdentifyBegin() and DDD_IdentifyEnd() an arbitrary series of Identify operations may be issued on each processor individually. Valid Identify operations are DDD_IdentifyNumber(),
DDD_IdentifyString() and DDD_IdentifyObject() for identifying two local objects via a common number, a string or another (already identified) object, respectively. Every processor may map each of its local objects onto a unique identification tuple by issuing a series of Identify commands. The corresponding processor has to create the same tuple for one of its local objects. After the identification has been completed, every pair of objects with the same identification tuple will be mapped onto the same distributed object.
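A sketch of the identification process: two processors create their border objects independently and then identify them via a common number. The parameter lists are assumptions; the global identification value would typically come from the application, e.g. a global node number known on both sides.

    /* Hypothetical sketch: identify local objects across processors by a
     * common number, turning each pair into one distributed object.      */
    void identify_border_nodes(Node **border, int *globalNumber,
                               int n, int otherProc)
    {
        int i;

        DDD_IdentifyBegin();                 /* first phase: record commands */

        for (i = 0; i < n; i++) {
            /* Assumed call: object, partner processor, identification value. */
            DDD_IdentifyNumber(border[i], otherProc, globalNumber[i]);
        }

        DDD_IdentifyEnd();                   /* second phase: construct the
                                                mappings id and sh            */
    }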
4.6 Supplement Module

The Supplement module contains several routines for retrieving useful internals of DDD structures and for checking the consistency of distributed data.
4.6.1 Additional info functions.

Two info functions, DDD_InfoPriority() and DDD_InfoRange(), have already been mentioned in the description of the Object Management module. Other info routines return the global id (i.e. id(o), cf. Section 3.5.1) of a local object (DDD_InfoGlobalId()) or a list of the local object members of one particular distributed object together with their prio entries (DDD_InfoProcList()).
4.6.2 Maintenance and debugging.

A few functions are provided to help during debugging of the parallel application: the function DDD_ConsCheck() performs a consistency check of the whole distributed data structure. A complete list of local object copies may be printed out by calling DDD_ListLocalObjects() on one particular processor.
Bibliography

[1] P. Bastian. Parallele adaptive Mehrgitterverfahren. PhD thesis, Universität Heidelberg, 1994.
[2] P. Bastian. Parallel adaptive multigrid methods. Preprint 93-60, Interdisziplinäres Zentrum für Wissenschaftliches Rechnen, Heidelberg, October 1993.
[3] K. Birken. Dynamic Distributed Data in a parallel programming environment – DDD Reference Manual. Forschungs- und Entwicklungsberichte RUS-23, Rechenzentrum der Universität Stuttgart, Germany, September 1994.
[4] C. Helf and U. Küster. A finite volume method with arbitrary polygonal control volumes and high order reconstruction for the Euler equations. In Proceedings of ECCOMAS 1994, University of Stuttgart, Germany, 1994.
[5] D. Roose and R. Van Driessche. Distributed memory parallel computers and computational fluid dynamics. In H. Deconinck, editor, Lecture Notes on Computational Fluid Dynamics, LS 1993-04, Von Karman Institute for Fluid Dynamics, Belgium, 1993.
[6] E. Yourdon. Object-Oriented Systems Design – An Integrated Approach. Prentice Hall (Yourdon Press), 1994.