MPI: EXTENSIONS AND APPLICATIONS

A Thesis

Submitted to the Graduate School of the University of Notre Dame in Partial Fulfillment of the Requirements for the Degree of Master of Science

in Computer Science and Engineering by Jeffrey M. Squyres, B.A., B.S.

Andrew Lumsdaine, Director

Department of Computer Science and Engineering
Notre Dame, Indiana
April 1996

MPI: EXTENSIONS AND APPLICATIONS

Abstract

by Jeffrey M. Squyres

Message passing is the most well known and well understood paradigm for implementing parallel algorithms on distributed memory architectures. The Message Passing Interface (MPI) has emerged as a widely used standard for writing message-passing programs. This work presents research projects that use or extend the functionality of MPI. The first is a proposed set of C++ language bindings for MPI. The proposed bindings provide a basic set of objects and methods derived from a direct one-to-one mapping of the MPI function specifications. Second, a full-featured MPI class library, Object Oriented MPI (OOMPI), is presented. OOMPI incorporates many features of C++ into MPI, such as inheritance, polymorphism, and function overloading. Finally, a parallel image processing library written with MPI is presented. This library can be used by programmers with minimal knowledge of parallel programming.

To Paul Phillips, the definitive Useless Master


TABLE OF CONTENTS

LIST OF FIGURES

ACKNOWLEDGEMENTS

1 SUMMARY
  1.1 The Message Passing Interface
  1.2 Overview

2 C++ LANGUAGE BINDING PROPOSAL FOR MPI
  2.1 MPI Bindings
  2.2 Alternatives
    2.2.1 Other Work
  2.3 Recommendations
  2.4 C++ Bindings
    2.4.1 Naming Rules
    2.4.2 Copy Semantics
    2.4.3 Construction / Destruction Semantics
    2.4.4 Comparison Syntax
    2.4.5 Constants
    2.4.6 Class Libraries

3 OBJECT ORIENTED MPI
  3.1 Introduction
  3.2 Requirements
  3.3 Analysis
    3.3.1 Syntax
    3.3.2 Ports and Communicators
    3.3.3 Messages
    3.3.4 User Defined Data Types
    3.3.5 Return Values
    3.3.6 A Stream Interface for Message Passing
    3.3.7 Packed Data
    3.3.8 Attributes
    3.3.9 Objects
    3.3.10 Communicator Objects
    3.3.11 Message and Data Objects
    3.3.12 Object Semantics
  3.4 Examples
    3.4.1 Ring, Version 1
    3.4.2 Ring, Version 2
  3.5 Availability

4 PARALLEL AND DISTRIBUTED ALGORITHMS FOR HIGH-SPEED IMAGE PROCESSING
  4.1 Introduction
  4.2 Parallel Image Processing
    4.2.1 Data Distribution
  4.3 Requirements
    4.3.1 System Model
    4.3.2 Functional Requirements
    4.3.3 Non-Functional Requirements
    4.3.4 Reduction Kernels
    4.3.5 Analysis
  4.4 Design
  4.5 Experimental Results
    4.5.1 Communication Costs
    4.5.2 Parallel Performance
    4.5.3 Load Balancing
    4.5.4 Communication and Computation Patterns
  4.6 Conclusions
    4.6.1 Further Extensions to PIPT
    4.6.2 PIPT as Photoshop Plug-In
    4.6.3 An Object-Oriented Implementation of PIPT
    4.6.4 Parallel Image Processing in a Shared Memory Environment
    4.6.5 Parallel Image Processing in a Wide-Area Network

BIBLIOGRAPHY

A MPI-2 C++ BINDINGS
  A.1 C++ Classes
  A.2 Defined Constants
  A.3 Typedefs
  A.4 C++ Bindings for Point-to-Point Communication
  A.5 C++ Bindings for Collective Communication
  A.6 C++ Bindings for Groups, Contexts, and Communicators
  A.7 C++ Bindings for Process Topologies
  A.8 C++ Bindings for Environmental Inquiry
  A.9 C++ Bindings for Profiling

B OOMPI INTERFACE
  B.1 Notation
  B.2 Class Listings
    OOMPI_Array_message
    OOMPI_Cart_comm
    OOMPI_Comm
    OOMPI_Comm_world
    OOMPI_Datatype
    OOMPI_Environment
    OOMPI_Error
    OOMPI_Graph_comm
    OOMPI_Group
    OOMPI_Inter_comm
    OOMPI_Intra_comm
    OOMPI_Message
    OOMPI_Op
    OOMPI_Packed
    OOMPI_Port
    OOMPI_Request
    OOMPI_Request_array
    OOMPI_Status
    OOMPI_Status_array
    OOMPI_Tag
    OOMPI_User_type
    OOMPI Enumerated Types
    OOMPI_Constants

C PIPT INTERFACE
  C.1 Defined Constants and Enumerated Types
    PIPT Enumerated Types
  C.2 Data Structures
    PIPT Datatypes
  C.3 Functions
    PIPT Array Functions
    PIPT Errors
    PIPT_Exit()
    PIPT_Get_object_offset()
    PIPT_Init()
    PIPT Kernels
    PIPT Kernel Invocation Routines
    PIPT_Manager()
    PIPT Memory Manipulation Functions
    PIPT Registration Functions
    PIPT Toolkit Mode
    PIPT Worker Functions

LIST OF FIGURES

4.1  Data dependency for an image processing algorithm using a point operator.

4.2  Data dependency for an image processing algorithm using a window operator.

4.3  Fine-grained parallel decomposition for an image processing algorithm using a window operator.

4.4  Coarse-grained parallel decomposition for an image processing algorithm using a window operator.

4.5  System model for applications using the original image processing toolkit.

4.6  System model for applications using the parallel image processing toolkit.

4.7  Detailed diagram of the interaction between the various components of the PIPT.

4.8  Complete code listing for the PIPT implementation of a parallel average filter.

4.9  PIPAverage() with level of indirection.

4.10 Input image for preliminary experiments.

4.11 Comparison of the time required to send messages in a workstation cluster and an IBM SP-2.

4.12 Execution times for a parallel square median filter on a cluster of Sparc 5s, an IBM SP-2, and an SGI PowerChallenge.

4.13 Execution times for a parallel average filter on a cluster of Sparc 5s, an IBM SP-2, and an SGI PowerChallenge.

4.14 Comparison of ideal speedup with observed speedup for the parallel square median filter and the parallel average filter.

4.15 Observed parallel speedup as a function of window size for the parallel square median filter and the parallel average filter.

4.16 Communication patterns for scattering and gathering image data for a four-processor parallel average filter with a 17 x 17 window and no load balancing.

4.17 Communication patterns for the same experiment as in Figure 4.16, except using the simple load balancing scheme from Section 4.5.3.

ACKNOWLEDGEMENTS

This thesis is due, in large part, to the combined efforts of many people. I would like to thank the Coca Cola and Snapple companies for contributing to many late night hacking sessions. /bin/fortune, for its wise and inspirational quips, will always hold a special place among my revered and sacred sources of knowledge. And of course Dorothy, our wonderful Cleaning Lady, for putting up with our slobbish habits, will forever remain a cherished memory (if only my apartment were as clean as our lab!). Perhaps my largest source of inspiration was the Trombone Section of the Notre Dame Marching Band – may they eternally be crazy (play loud!). What would an acknowledgements section be without mentioning the DomeCam? It will conquer the world. This work would have been impossible without the contributions and guidance from Dr. Andrew Lumsdaine. Additionally, Dr. Robert L. Stevenson was heavily involved in the PIPT project. I would also like to thank Ms. Tracy Payne for listening to all my rantings and ravings, putting up with my moody work habits, and for that one long car ride which was the source of most of the inspiration for Chapter 4.


Brian C. McCandless and John J. Tran contributed to the development and testing of the PIPT. Brian McCandless was also an integral part of the design and implementation of OOMPI. The PIPT effort was sponsored by Rome Laboratory, Air Force Materiel Command, USAF under grant number F30602-94-1-0016. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Rome Laboratory or the U.S. Government. This work was also supported in part by NSF Grant CCR92-09815.


CHAPTER 1

SUMMARY

1.1 The Message Passing Interface

One of the challenges in developing parallel software is making it portable to the various (and diverse) types of parallel hardware that are available (both now and in the future). In order to make parallel code portable, it is important to incorporate a model of parallelism that is supported by a large number of potential target architectures. The most widely used and well understood paradigm for the implementation of parallel programs on distributed memory architectures is that of message passing. Several message passing libraries are available in the public domain, including p4 [3], Parallel Virtual Machine (PVM) [1], PICL [10], and Zipcode [21]. Recently, a core of library routines (influenced strongly by existing libraries) has been standardized in the Message Passing Interface (MPI) [9, 11, 23]. Public domain implementations of MPI are widely available. Several vendors of parallel machines are already providing native versions of MPI for their hardware; it is expected that an increasing number of parallel architectures will support native implementations of MPI.


1.2 Overview

This thesis discusses two extensions to MPI and one application that uses MPI. Chapters 2 through 4 are based upon the following three papers:

1. Andrew Lumsdaine and Jeffrey M. Squyres. "C++ Language Bindings Proposal for MPI." Presented to the MPI Forum.

2. Jeffrey M. Squyres, Andrew Lumsdaine, and Brian C. McCandless. "Object Oriented MPI Reference." CSE Technical Report TR-96-10, University of Notre Dame, March 1996. Also available at: http://www.cse.nd.edu/lsc/research/oompi/

3. Jeffrey M. Squyres, Andrew Lumsdaine, Brian C. McCandless, and Robert L. Stevenson. "Cluster-Based Parallel Image Processing." Submitted to the Journal of Parallel and Distributed Computing. Also available at: http://www.cse.nd.edu/lsc/research/PIPT/

Chapter 2 is a proposal for C++ MPI language bindings [25]. The proposal has been presented to the MPI Forum and is presently under consideration; it is expected that some form of the proposal will be adopted into the MPI-2 standard [17]. The new bindings take advantage of several features of the C++ language, such as const and reference semantics, as well as allowing for profiling libraries and polymorphism. They do not compose a class library; rather, they present a simple, unambiguous, one-to-one mapping from the current MPI function definitions to bindings in C++.

Chapter 3 presents such a class library: Object Oriented MPI (OOMPI). OOMPI is a comprehensive C++ class library for MPI [26] that provides a simple, intuitive, and flexible object oriented interface to MPI. Since OOMPI is implemented on top of the existing C MPI bindings, the library is usable on any parallel architecture that has an MPI implementation. This chapter discusses the design of OOMPI as well as many issues specific to the design of parallel object oriented communication libraries. The Parallel Image Processing Toolkit (PIPT) is an image processing library described in Chapter 4 [24]. The PIPT is a scalable and extendible library in which the parallel processing is transparent to the programmer; users can either use image processing routines that are built into the PIPT, or have their own routines take advantage of the transparent parallelism inherent in the library. Since the PIPT is implemented on top of MPI, it is highly portable to many different types of parallel architectures; this chapter focuses mainly on issues dealing with cluster-based applications that use the PIPT.


CHAPTER 2

C++ LANGUAGE BINDING PROPOSAL FOR MPI

2.1 MPI Bindings

The MPI standard is a functional specification that currently specifies bindings only for the C and Fortran 77 languages. As C++ [8, 16] gains widespread acceptance as a programming language for high-performance computing, the MPI standard should be expanded to include this user community. In addition, the C++ language includes semantic and syntactic improvements over the C language; C++ MPI bindings that take advantage of these improvements are necessary to gain acceptance in the object-oriented community.

2.2 Alternatives

There are three principal alternatives to consider for C++ language bindings:

1. Use the existing C language bindings,

2. Provide a lightweight set of objects, or

3. Provide a comprehensive class library.

We discuss each of these alternatives below.

C++ Bindings

The C bindings provide one available interface for writing MPI programs in ANSI C++. C++ MPI programs may be written by simply including the "mpi.h" header file (as specified in [9]) and calling MPI functions as described by the C bindings. So, one proposal for C++ bindings for MPI could be to make the existing C bindings the C++ bindings as well.

There are some drawbacks to this alternative. First, it is conceivable that there will be versions of MPI written in C++. There will be natural C++ bindings in such an implementation, on top of which the C bindings would be layered. To require that C++ programs use the C bindings of such an implementation seems unwarranted. Moreover, the specification of MPI is in itself object-oriented. Thus, it seems that certain C++ objects and bindings would be appropriate in such a setting. Finally, C++ potentially provides a much higher level of expressive power than C, and this expressiveness should be provided to MPI users (if possible).
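As a minimal sketch of this alternative, an ordinary C++ program can simply call the C bindings (the ranks and tag below are arbitrary):

  #include "mpi.h"

  int main(int argc, char *argv[])
  {
      MPI_Init(&argc, &argv);

      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      int i = 42;
      MPI_Status status;
      if (rank == 0)
          MPI_Send(&i, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      else if (rank == 1)
          MPI_Recv(&i, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);

      MPI_Finalize();
      return 0;
  }

This compiles as C++ precisely because the C bindings use no C++ features; the drawbacks discussed above are stylistic rather than functional.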

Lightweight Objects

As mentioned above, the design of MPI is object-oriented in flavor. It has been proposed within the MPI Forum that the C++ interface to MPI should be a basic set of classes corresponding to the fundamental object types in MPI-1. The functionality of MPI-1 would be realized through member functions of the objects. This type of interface would be lightweight: for instance, MPI error codes would still be returned by function calls, no new types of objects would be introduced, and only minimal use would be made of advanced C++ features such as polymorphism.


Class Library

There have also been proposals within the Forum that a class library be defined and used for the C++ bindings. A class library should make full use of C++ features such as inheritance and polymorphism. Thus, a class library would use polymorphism to obviate the need to explicitly provide type information for messages. By way of contrast, the C++ bindings would still require that type information be provided. Similarly, a class library would provide "out" variables as return values to allow chaining of operations; the C++ bindings would still return error codes.

A class library is not appropriate as a C++ binding, however. While a class library may make user programming more elegant, a binding must provide a direct and unambiguous mapping to the specified functionality of MPI. Thus, a general class library should be provided, but it should be built on top of the C++ bindings. However, the C++ bindings should be general enough to allow the building of application-specific class libraries as well.

Discussion: [26] provides the specification of one possible C++ class library for MPI: Object Oriented MPI, or OOMPI. Although the OOMPI class library is specified in C++, few assumptions about implementation are made; in some sense the specification is a generic one that uses C++ as the program description language. Thus, the OOMPI specification can also be considered a generic object oriented class library specification, which can form the basis for MPI class libraries in other object-oriented languages. As such, its inclusion in MPI-2 would be appropriate.


2.2.1 Other Work

Other C++ interfaces to MPI include MPI++ [22], mpi++ [13], and PARA++ [6]. We comment on each of these briefly.

MPI++ is one of the earliest proposed C++ interfaces to MPI. One of its goals was to be semantically and syntactically consistent with the C interface. The basic design of MPI++ would thus form a solid basis for a C++ binding for MPI.

mpi++ is a more recently introduced C++ interface to MPI and is presently under development. The version described in [13] includes only point-to-point communication operations. However, even with these operations, mpi++ does not appear to be semantically or syntactically consistent with the C interface because of its use of templates for datatypes.

PARA++ provides a generic high-level interface for performing message passing in C++, with no attempt (by design) to be consistent with MPI.

2.3 Recommendations

The bindings subcommittee makes the following recommendations for C++ bindings for MPI:

1. The C++ language bindings should consist of a small set of classes with a lightweight functional interface to MPI-2. The set of classes should be as small as possible, and the classes should correspond to fundamental MPI object types (e.g., communicator, group, etc., as well as any new types that may emerge from MPI-2).

2. The MPI C++ language bindings must provide a semantically correct interface to MPI-2.

3. To the greatest extent possible, the C++ bindings should be semantically and syntactically identical to the C bindings, but use C++ semantics where a higher level of expressive power is desired.

4. A class library should also be designed and provided as an annex to the MPI-2 document. The class library should be built on top of the C++ bindings. A class library built on top of the C bindings has already been designed and is provided in [26]. A class library built on top of the C++ bindings will have a somewhat clearer interface because it will be able to take advantage of const arguments and so forth.

2.4 C++ Bindings

The complete set of C++ language bindings is presented in Appendix A. There is a small set of classes defined for these bindings. However, to maintain consistency with the binding definitions, we list the C++ bindings in the same order as given for the C bindings (rather than grouping them by class; see Section A.1).

The bindings are similar to the MPI++ proposal (see Section 2.2.1), but with several notable differences: MPI++ makes use of C++ features such as polymorphism, while not taking advantage of other C++ features such as reference and const semantics. Appendix A proposes the exact opposite: do not use polymorphism, because such a scheme cannot embrace the simple one-to-one mapping that is critical for a binding, and do use reference and const semantics, since they are an important feature of the C++ language.

In order to make the following function list less cluttered, the keyword virtual has been left out of each declaration with the implicit understanding that all member functions are virtual (except for constructors and the assignment operator). In particular, profiling library objects can easily be derived from these objects; the profiling mechanism can be invoked, followed by a call to the original MPI function.

2.4.1 Naming Rules

The bindings presented above were generated by applying a simple set of rules to the existing C bindings. These rules can also be applied to MPI-2 C bindings to create MPI-2 C++ bindings:

1. Remove the MPI opaque handle from the argument list and make it an object. If there is no MPI opaque handle in the argument list, the function should be globally scoped.

   - If the original function contained an opaque MPI handle, make the function a method of the MPI object. Strip the "MPI_" from the beginning of the function name, as well as any part of the name that is duplicated in the object name (e.g., MPI_Comm_compare() becomes MPI_Comm::Compare()).

   - If the original function did not contain a single opaque MPI handle, the full function name should be retained (e.g., MPI_Buffer_attach() becomes ::MPI_Buffer_attach()).

2. Follow the MPI-1 capitalization rules for the method names.

3. Methods should be declared const if they return information to the user and do not change the internal state of the calling object.

4. Any reference or array argument that will not be modified in the method should be declared const.

5. Pointer arguments should be changed to references when a reference is what is semantically implied.

6. Array arguments should be denoted with square brackets ([]), not pointers.

These rules should be applied to all new functions created in MPI-2 so that the C++ bindings stay current with the latest functionality of MPI.
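As a hypothetical illustration, applying the rules to MPI_Comm_size() would yield:

  /* C binding: */
  int MPI_Comm_size(MPI_Comm comm, int *size);

  // Derived C++ binding: the opaque handle becomes the object, "MPI_"
  // and the duplicated "Comm" are stripped from the name, the method is
  // const (it does not change the communicator), and the output pointer
  // becomes a reference.
  int MPI_Comm::Size(int& size) const;

The authoritative bindings produced by these rules are listed in Appendix A.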

2.4.2 Copy Semantics

The copy and assignment semantics in C++ should be the same as those specified by the C and Fortran 77 bindings. That is, the MPI user level objects should behave like handles.

Advice to implementors. Each MPI user level object is likely to contain, by value or by reference, implementation-dependent state information. The assignment and copying of MPI object handles may simply copy this value (or reference). (End of advice to implementors.)
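For example, a sketch of the intended handle behavior (not mandated syntax):

  MPI_Comm a = MPI_COMM_WORLD;  // a is a handle referring to the
                                // predefined world communicator
  MPI_Comm b = a;               // copies only the handle; b refers to the
                                // same underlying communicator, and no new
                                // communicator is created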


2.4.3 Construction / Destruction Semantics

The construction and destruction semantics in C++ should be the same as those specified by the C and Fortran 77 bindings. That is, the MPI user level objects should behave like handles.

Advice to implementors. Default constructors for all MPI objects should allow the comparison with a corresponding MPI_*_NULL object to return a boolean true, MPI_IDENT, or whatever the corresponding comparison function returns (see Section 2.4.4). The destructor for each MPI user level object should not invoke the corresponding MPI_*_free() function (if it exists), for the following reasons:

1. Such a scheme would not be consistent with the C and Fortran 77 functionality.

2. The model put forth in MPI-1 makes memory allocation and deallocation the responsibility of the user, not the implementation.

3. MPI objects going out of scope may produce collective operations, which is not intuitive and may not be what the user wants (this also affects the copy and construction semantics). The communicator that is created at the beginning of the function shown below would be freed when the function exits, which would trigger a collective operation:

  void foo()
  {
      MPI_Comm a = MPI_COMM_WORLD.Dup();
      // Rest of the function
  }

(End of advice to implementors.)

2.4.4 Comparison Syntax

Several MPI objects already have MPI functions for comparison (e.g., MPI_Comm::Compare()). These functions typically can return more than two values; a boolean is not sufficient to describe their relationship. Therefore, the equality and inequality operators (operator==() and operator!=()) should not be overridden for these objects.

Discussion: However, many C and Fortran implementations that use pointers or integers as MPI handles indirectly allow the use of the == and != operators; the only meaningful boolean response is true, which is equivalent to MPI_IDENT. It may be desirable to adopt this functionality into the C++ bindings. For example, after an assignment such as a = b, the test (a == b) should return true.

However, several objects do not have methods equivalent to the C MPI_*_compare() functions (e.g., MPI_Datatype). Since no other form of comparison is available, and since a simple boolean answer is all that is required, the equality and inequality operators should be overridden to provide this functionality. This is also consistent with the C and Fortran 77 functionality.


The following table summarizes which objects should make use of the overridden operators:

    Override operators    Use MPI functionality
    ------------------    ---------------------
    MPI_Datatype          MPI_Comm
    MPI_Op                MPI_Group
    MPI_Status
    MPI_Request
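For instance, a sketch of the two comparison styles (the exact Compare() signature is assumed to follow the naming rules of Section 2.4.1):

  MPI_Datatype t1, t2;
  MPI_Comm c1, c2;
  int result;
  // ...
  bool same_type = (t1 == t2);  // overridden operator: a boolean suffices
  c1.Compare(c2, result);       // MPI functionality: result may be MPI_IDENT,
                                // MPI_CONGRUENT, MPI_SIMILAR, or MPI_UNEQUAL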

2.4.5 Constants

Constants should be singleton objects; they should either be declared const or derived from the original MPI object and override all non-const functions to prevent modification.

2.4.6 Class Libraries

The C++ bindings have been designed to support class libraries; they include const and reference semantics.

CHAPTER 3

OBJECT ORIENTED MPI

3.1 Introduction

This chapter describes an object oriented approach to the Message Passing Interface (MPI) [9, 11]. Object Oriented MPI (OOMPI) is a class library specification that encapsulates the functionality of MPI into a functional class hierarchy to provide a simple, flexible, and intuitive interface.

3.2 Requirements

With the specification of a C++ class library, we necessarily move away from the simple one-to-one mapping of MPI function to language binding (as with C and Fortran). We therefore also run the risk of adding, losing, or changing MPI-1 specified functionality with the library specification. In order to properly delimit the scope of the MPI C++ class library, we have the following guidelines:

Semantics. The MPI C++ class library must provide a semantically correct interface to MPI.


Syntax. The names of member functions should be consistent with the underlying MPI functions that are invoked by each member function.

Functionality. The MPI C++ class library must provide all functionality defined by MPI-1. To the greatest extent possible, this functionality should be provided through member functions of objects, although some globally scoped functions may be permitted.

Objects. It is only natural to think of communicating objects in a message-passing C++ program. The MPI-1 specification, however, does not deal with objects. It only specifies how data may be communicated. Thus, we require that the MPI C++ class library similarly provide the capability for sending the data that is contained within objects. Moreover, since the data contained within an object is essentially a user-defined structure, we require that mechanisms be provided to build MPI-1 user-defined data types for object data and for communicating that data in a manner identical to communicating primitive data types. Objects that have complex data to be communicated must be explicitly constructed to do so.

Implementation. The MPI C++ class library must be a layer on top of the C bindings.[1] In conjunction with the guidelines for functionality, this implies that the MPI-1 functionality will essentially be provided with calls to C functions. That is, there will be no attempt by the C++ class library itself to provide any MPI-1 functionality apart from that provided by the C bindings. Further implementation stipulations are that:

- The class library must introduce as little overhead as possible to the C bindings.

- The class library may not make use of internal details of particular implementations of the C bindings.

- Except where the C++ language offers a simpler interface, the class library should preserve similar function names from the C MPI bindings, as well as the necessary arguments.

[1] For consistency, this document only discusses OOMPI relative to the C MPI bindings. Although C++ MPI bindings are currently proposed, they have not yet been accepted by the MPI Forum. Future plans for OOMPI include implementations based upon the accepted C++ MPI bindings; additional OOMPI documentation will be released at that time.

3.3 Analysis

3.3.1 Syntax

A typical MPI function call (in C) is of the following form:

  MPI_Comm comm;
  int i, dest, tag;
  ...
  MPI_Send(&i, 1, MPI_INT, dest, tag, comm);

Here, i, 1, MPI_INT, and tag specify the content and type of the message to be sent, and comm and dest specify the destination of the message. A more natural syntax results from encapsulating the pieces of information that make up the message and the destination. That is, we could perhaps encapsulate i, 1, MPI_INT, and tag as a message object, and comm and dest as a destination (or source) object.


Before committing to any objects, let's examine the sort of expressive syntax that we would like for OOMPI. The function call above would be very naturally expressed as:

  int i;
  ...
  Send(i);

But this is incomplete; we still require some sort of destination object. In fact, we would like an object that can serve as both a source and a destination of a message. In OOMPI, this object is called a port.[2]

[2] The moniker "port" was suggested by Marc Snir.

3.3.2 Ports and Communicators

Using an OOMPI_Port, we can send and receive objects with statements like:

  int i, j;
  OOMPI_Port Port;
  ...
  Port.Send(i);
  Port.Receive(j);

The OOMPI_Port object contains information about its MPI communicator and the rank of the process to whom the message is to be sent. Note, however, that although the expression Port.Send(i) is a very clear statement of what we want to do, there is no explicit construction of a message object. Rather, the message object is implicitly constructed (see Section 3.3.3 below).

Port objects are very closely related to communicator objects; a port is said to be a communicator's view of a process. Thus, a communicator contains a collection of ports, one for each participating process. OOMPI provides an abstract base class OOMPI_Comm to represent communicators. Derived classes provided by OOMPI include OOMPI_Intra_comm, OOMPI_Inter_comm, OOMPI_Cart_comm, and OOMPI_Graph_comm, corresponding to an intra-communicator, inter-communicator, intra-communicator with Cartesian topology, and intra-communicator with graph topology, respectively. Individual ports within a communicator are accessed with operator[](), i.e., the ith port of an OOMPI_Comm c is c[i]. The following code fragment shows an example of sending and receiving:

  int i, j, m, n;
  OOMPI_Intra_comm Comm;
  ...
  Comm[m].Send(i);
  Comm[n].Receive(j);

Here, the integer i is sent to port m in the communicator and the integer j is received from port n.

3.3.3 Messages

We define an OOMPI_Message object with a set of constructors, one for each of the MPI base data types. Then, we define all of the communication operations in terms of OOMPI_Message objects. The need to construct OOMPI_Message objects explicitly is obviated: since promotions for each of the base data types are declared, an OOMPI_Message object will be constructed automatically (and transparently) whenever a communication function is called with one of the base data types.


Discussion: Message objects could be eliminated entirely by declaring each communication function in terms of every base data type. However, this would result in an enormous number of almost identical member functions. The use of message objects seems better for the sake of maintainability. There is some function overhead because of the need to construct a message object, but the constructors can be made very lightweight so that the overhead is negligible.
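A minimal sketch of how such promotion constructors might be declared (illustrative only; default tag arguments are elided here, and the actual signatures appear in Appendix B):

  class OOMPI_Message {
  public:
    // One non-explicit constructor per MPI base type; an int argument
    // passed to Port.Send(), for example, is therefore promoted
    // automatically to a temporary OOMPI_Message.
    OOMPI_Message(char data, int tag);
    OOMPI_Message(int data, int tag);
    OOMPI_Message(double data, int tag);
    // ... and so on for the remaining base types
  };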

The base types supported by OOMPI are:

  char     unsigned char
  short    unsigned short
  int      unsigned
  long     unsigned long
  float    double

In addition to messages composed of single elements of these types, it is also desirable to send messages composed of arrays of these types. By introducing an OOMPI_Array_message object, we can also provide automatic promotion of arrays. Thus, to send an array of integers, we can use a statement like:

  int a[10];
  OOMPI_Port Port;
  ...
  Port.Send(a, 10);

Again, no explicit message is constructed. Note that in the above examples we have not explicitly given a tag to the messages that are sent. If no tag is given, a default tag is assigned by OOMPI, but a user can supply a tag as well:

  int a[10];
  OOMPI_Port Port;
  ...
  Port.Send(a, 10, 201);

The declaration of OOMPI_Port::Send() is:

  void OOMPI_Port::Send(OOMPI_Message buf, int tag = OOMPI_NO_TAG);
  void OOMPI_Port::Send(OOMPI_Array_message buf, int count,
                        int tag = OOMPI_NO_TAG);

Here, the default value OOMPI_NO_TAG is not the tag used on the message. Rather, it is a dummy value indicating that no tag was explicitly given, so inside the body of OOMPI_Port::Send(), a default tag is used, depending on the type of the data. OOMPI reserves the top OOMPI_RESERVED_TAGS tags; users can use any tag between zero and OOMPI_TAG_UB.

3.3.4 User Defined Data Types

Although it is convenient to be able to pass messages of arrays or of single elements of basic data types, significantly more expressive power is available by accommodating user objects (i.e., user-defined data types). That is, OOMPI should provide the ability to make statements of the form:

  MyClass a[10];
  OOMPI_Port Port;
  ...
  Port.Send(a, 10, 201);

To accomplish this, OOMPI provides a base class OOMPI_User_type from which all non-base type objects that will be communicated must be derived. This class provides an interface to the OOMPI_Message and OOMPI_Array_message classes so that objects derived from OOMPI_User_type can be sent using the syntax above. Besides inheriting from OOMPI_User_type, the user must also construct objects in the derived class so that an underlying MPI_Datatype can be built. OOMPI provides a streams-based interface to make this process easier. The following is an example of a user-defined class object:

  class foo : public OOMPI_User_type
  {
  public:
    foo() : OOMPI_User_type(type, this, FOO_TAG)
    {
      // Build the data type if it is not built already
      if (!type.Built()) {
        type.Struct_start();
        ...

OOMPI_Intra_comm& operator>>(OOMPI_Message data). Invokes an MPI_Recv() with MPI_ANY_SOURCE as the source argument. The result is put into data. The default tag for OOMPI_Message is used.

OOMPI_Status Recv(RECVBUF, int tag = OOMPI_NO_TAG). Invokes an MPI_Recv() with MPI_ANY_SOURCE as the source argument. The result is put into data. If tag is not specified, the default tag for data is used.

OOMPI_Request Irecv(RECVBUF, int tag = OOMPI_NO_TAG). Invokes an MPI_Irecv() with MPI_ANY_SOURCE as the source argument. The result is put into data. If tag is not specified, the default tag for data is used.

OOMPI_Request Recv_init(RECVBUF, int tag = OOMPI_NO_TAG). Invokes an MPI_Recv_init() with MPI_ANY_SOURCE as the source argument. The result is put into data. If tag is not specified, the default tag for data is used.

OOMPI_Status Probe(int tag). Invokes MPI_Probe() with MPI_ANY_SOURCE as the source argument.

OOMPI_Status Iprobe(int tag, int& flag). Invokes MPI_Iprobe() with MPI_ANY_SOURCE as the source argument.

bool Iprobe(int tag). Shortcut for the previous function, except it returns the value of flag.

General Topology Functions

int Cart_map(int ndims, int dims[], bool periods[]). Invokes MPI_Cart_map() and returns the resulting newrank.

int Cart_map(int ndims, int dims[], bool periods). Shortcut function similar to the above function, except periods is a single bool indicating the periodicity of all dimensions.

int *Dims_create(int ndims, int dims[], int nnodes = 0). Calls MPI_Dims_create(). Returns the dimensions array. If nnodes is not specified, Size() is used.

int Graph_map(int ndims, int dims[], int edges[]). Invokes MPI_Graph_map() and returns the resulting newrank.

See Also

OOMPI_Cart_comm, OOMPI_Comm, OOMPI_Graph_comm, OOMPI_Inter_comm

Name

OOMPI_Message

Declaration

  #include "oompi.h"
  class OOMPI_Message

Inheritance

This class inherits from OOMPI_Tag.

Description

A class for managing the data associated with a message: its type, count, tag, and data. This class also promotes the base types into OOMPI_Message objects so that they can be easily accessed in other OOMPI functions. OOMPI_Message objects are usually created via promotion (and immediately destroyed) to keep the calling semantics of OOMPI simple. However, sometimes it is desirable to explicitly create an OOMPI_Message. Explicit creation of OOMPI_Message objects allows the default type tag to be overridden. OOMPI_Message objects can be explicitly created for scalar variables and for contiguous arrays that require a count argument. If an OOMPI_Message is explicitly created, it can be re-used even if the value of the variable (or the values in an array) change. The OOMPI_Message object keeps a pointer to the data, not a copy of the data.

NOTE: If the data pointed to by an OOMPI_Message object is deleted, the OOMPI_Message will still reference the deleted memory. It is considered erroneous to attempt to use the OOMPI_Message after the data memory has been deleted.

Constructors/Destructors

OOMPI_Message(const OOMPI_Message& a). Copy constructor. Copies type, location, count, and tag information.

OOMPI_Message& operator=(OOMPI_Message& a). Assignment operator. Copies type, location, count, and tag information.

OOMPI_Message(<type> data, int tag = OOMPI_TAG). Constructor. This constructor is used to promote a base type into an OOMPI_Message type. This function is not templated; the <type> notation is used for brevity. In the above function prototype, <type> can take on any of the following base types:

  char     unsigned char      float
  short    unsigned short     double
  int      unsigned           OOMPI_Packed
  long     unsigned long

The tag argument sets the default tag for that message. If the tag is not specified, the default is used (depending on the type).

OOMPI_Message(<type>& data, int tag)
OOMPI_Message(<type> *data, int count)
OOMPI_Message(<type> *data, int count, int tag). Constructors. Constructors for explicitly creating an array message (not to be confused with an OOMPI_Array_message). These functions are used when it is desirable to either override the default tag for a base type, or create an envelope for an array that can be used (for example) in the streams send/receive interface. <type> supports the same types as listed above.

OOMPI_Message(OOMPI_Datatype type, void *top, int count = 1). Constructor. Used for explicitly creating OOMPI_Message objects based upon OOMPI_Datatypes. Uses a default tag based upon type.

OOMPI_Message(OOMPI_Datatype type, void *top, int count, int tag). Constructor. Used for explicitly creating OOMPI_Message objects based upon OOMPI_Datatypes.

Access Functions

void *Get_top(void). This access function returns the address of the top of the object that is being encapsulated in the OOMPI_Message. Since the OOMPI_Message retains a pointer to the data, it will always reflect the current value of the data (even if it changes after the OOMPI_Message was created). This is useful for creating "persistent" OOMPI_Messages that can be re-used in successive OOMPI calls.

MPI_Datatype *Get_type(void). This access function returns the MPI_Datatype associated with the message. This function is only meant to be used by OOMPI; it is not considered to be part of the user interface.

int Get_count(void). Returns the count of the current object (will always be 1 for data that was promoted).
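For illustration, a sketch of re-using an explicitly created, persistent message (the loop is illustrative):

  int i = 0;
  OOMPI_Message msg(i, 201);  // explicit message with an overridden tag
  OOMPI_Port Port;
  ...
  for (int k = 0; k < 10; k++) {
    i = k * k;                // msg keeps a pointer to i, so each send
    Port.Send(msg);           // transmits the current value of i
  }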

See Also

OOMPI_Constants, OOMPI_Datatype, OOMPI_Intra_comm, OOMPI_Packed, OOMPI_Port

Name

OOMPI_Op

Declaration

  #include "oompi.h"
  class OOMPI_Op

Description

An OOMPI_Op object has four main functions: construction, copying, use as an argument for reduction operations, and destruction.

Inheritance

None.

Constructors/Destructors

OOMPI_Op(MPI_Op op = MPI_OP_NULL). MPI constructor. A new container is created for op and a reference to it is made. This constructor also acts as the default constructor.

OOMPI_Op(const OOMPI_Op& a). Copy constructor. Performs a shallow (reference counted) copy.

const OOMPI_Op& operator=(const OOMPI_Op& a). Assignment operator. Performs a shallow (reference counted) copy.

OOMPI_Op(MPI_User_function *function, bool commutative = true). Constructor. Calls MPI_Op_create() to create an MPI_Op handle. A new container is created for the resulting MPI_Op handle and a reference to it is made.

Discussion: It would seem more consistent to have an OOMPI_User_function user callback method, prototyped as:

  typedef void OOMPI_User_function(void *invec, void *inoutvec,
                                   int &len, OOMPI_Datatype &datatype);

The OOMPI version of the user's callback method would have the same semantics as the corresponding MPI callback function, except that it would provide an OOMPI_Datatype rather than an MPI_Datatype. Additionally, the callback method could be a member function of a user object that can have local data associated with it. Such a callback scheme would necessitate a level of indirection, where OOMPI registers an intermediate MPI callback function that can provide the translation from the MPI_Datatype to the corresponding OOMPI_Datatype, and then invoke the user callback method. However, it seems that the only reasonable way to do this would be to wrap the calls to MPI_Reduce() and MPI_Reduce_scatter() in functions that cache attributes (or other global data) containing the relevant user method pointer and OOMPI_Datatype. This may not provide good performance. For this release of OOMPI, it was decided that MPI user-defined operators are low-level functions, and therefore must use the corresponding MPI_Datatype.

NOTE: It is erroneous to generate an OOMPI_Datatype from the MPI_Datatype that is passed to the user's callback function. This is erroneous for the same reasons that it is erroneous to create a new OOMPI object with the result of a Get_mpi() function (see the Compatibility paragraph in Section 3.3.12).

137

~OOMPI_Op(). Destructor. The destructor deletes the reference to the MPI_Op handle (which may trigger a call to MPI_Op_free()).

Access and Information

bool Is_null(void). Returns a boolean indicating whether the instance is valid for use in reduction operations or not.

MPI_Op& Get_mpi(void). Returns the internal MPI_Op.

See Also

OOMPI_Comm, OOMPI_Intra_comm

Name

OOMPI_Packed

Declaration

  #include "oompi.h"
  class OOMPI_Packed

Description

The OOMPI_Packed object is used to provide a streams-based interface to the MPI-1 packing and unpacking functions. Note that access to MPI_Pack_size() is not provided through this object. Instead, it is provided through the OOMPI_Comm object.

Discussion: Since MPI_Pack_size() is used to determine how large a buffer must be created in order to pack or unpack a message, it would be pointless to create an OOMPI_Packed object with a buffer only to determine what the "real" size of the buffer should be. Therefore, MPI_Pack_size() is encapsulated in all communicators so that the proper buffer size can be determined before an OOMPI_Packed object is created.

Inheritance

This class inherits functions from OOMPI_Tag.

Constructors/Destructors

OOMPI_Packed(int size, OOMPI_Comm &c, int tag = OOMPI_PACKED_TAG). Constructor. A buffer of length size is allocated for packing and unpacking. The tag will be used as the default tag in the streams-based interface for sending and receiving this object.

OOMPI_Packed(void *ptr, int size, OOMPI_Comm &c, int tag = OOMPI_PACKED_TAG). Constructor. A buffer of length size is provided by the caller. The tag will be used as the default tag in the streams-based interface for sending and receiving this object.

OOMPI_Packed(const OOMPI_Packed &a). Copy constructor. The copy constructor performs a deep copy of the object (the entire buffer is copied).

const OOMPI_Packed &operator=(const OOMPI_Packed &a). Assignment operator. The assignment operator performs a deep copy of the a object (the entire buffer is copied). The destination buffer is only deleted if the OOMPI_Packed object initially created it; if the user specified the buffer in the OOMPI_Packed constructor, it is not deleted before the copy takes place (but OOMPI will allocate a new buffer for the destination of the copy).

~OOMPI_Packed(). Destructor. The destructor deletes the buffer only if the OOMPI_Packed object created the buffer; if the user specified the buffer in the OOMPI_Packed constructor, it is not deleted.

Access and Information

int Get_position(void). Returns the current position of the pack/unpack.

int Get_size(void). Returns the size of the buffer available for packing and unpacking.

void Reset(void). Resets the state of the object back to the beginning of the buffer.

int Set_position(int size). Sets the current position of the pack/unpack. Returns the actual position that is set.

int Set_size(int size). Allows the user to expand or shrink the buffer used for packing and unpacking. Returns the size that the buffer is actually set to.

Packing and Unpacking

void Start(int position = 0). Resets the state of the object and prepares for repeated calls to operator>>() (to unpack from the buffer). Optionally specify a specific location to start in the buffer.

OOMPI_Packed& operator>>(OOMPI_Message data). Unpacks the specified object from the buffer using MPI_Unpack(). Attempting to unpack beyond the end of the buffer is undefined; it is the user's responsibility to ensure that this does not happen.

void Unpack(BUFFER). Unpacks the specified object from the buffer using MPI_Unpack(). Attempting to unpack beyond the end of the buffer is undefined; it is the user's responsibility to ensure that this does not happen.
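A brief unpacking sketch (the receive of the packed buffer and the size computation are illustrative):

  int i;
  double d;
  OOMPI_Packed buf(size, comm);  // size determined beforehand via the
                                 // communicator's encapsulated MPI_Pack_size()
  Port.Receive(buf);             // receive the packed data
  buf.Start();                   // prepare to unpack from the beginning
  buf >> i >> d;                 // unpack, in order, with operator>>()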

See Also

OOMPI_Comm, OOMPI_Intra_comm, OOMPI_Port, OOMPI_Tag

Name

OOMPI_Port

Declaration

  #include "oompi.h"
  class OOMPI_Port

Description

A class for managing point-to-point and rooted collective operations.

Inheritance

None.

Constructors/Destructors

OOMPI_Port(void). Default constructor. Sets the internal communicator to MPI_COMM_NULL and the internal rank to OOMPI_PROC_NULL.

OOMPI_Port(MPI_Comm &c, int my_rank). MPI constructor. Creates an OOMPI_Port in the corresponding MPI_Comm with the specified rank my_rank.

OOMPI_Port(const OOMPI_Port& a). Copy constructor. Performs a shallow (reference counted) copy.

const OOMPI_Port& operator=(const OOMPI_Port& a). Assignment operator. Performs a shallow (reference counted) copy.

~OOMPI_Port(). Destructor. Deletes the reference to the internal MPI_Comm. This may trigger a call to MPI_Comm_free() if the original OOMPI_Comm has been deleted.

Access and Information

int Rank(void). Calls MPI_Comm_rank() to return the rank of the port in its communicator.

Intercommunicator Management

OOMPI_Inter_comm Intercomm_create(OOMPI_Intra_comm& peer_comm, int remote_leader, int tag = OOMPI_INTERCOMM_CREATE_TAG). Calls MPI_Intercomm_create() to create a new intercommunicator. The local leader is implicitly specified by the invoking OOMPI_Port instance. The remote leader is specified as a (peer_comm, remote_leader) pair.

OOMPI_Inter_comm Intercomm_create(OOMPI_Port& peer_port, int tag = OOMPI_INTERCOMM_CREATE_TAG). Calls MPI_Intercomm_create() to create a new intercommunicator. The local leader is implicitly specified by the OOMPI_Port that the function is invoked on. The remote leader is specified with peer_port.

Rooted Collective Operations

void Bcast(BUF). Calls MPI_Bcast() with the choice argument BUF.

void Gather(SENDBUF, RECVBUF). Calls MPI_Gather() with the choice arguments SENDBUF and RECVBUF.

void Gather(SENDBUF). Shortcut for the previous function; non-root processes may call this function, and therefore not have to specify the RECVBUF.

void Gatherv(SENDBUF, OOMPI_Array_message recvbuf, int recvcounts[], int displs[]). Calls MPI_Gatherv() with SENDBUF and recvbuf.

void Reduce(SENDBUF, RECVBUF, OOMPI_Op op). Calls MPI_Reduce(), using the internal MPI_Op of op.

void Reduce(SENDBUF, OOMPI_Op op). Shortcut for the previous function; non-root processes may call this function, and therefore not have to specify the RECVBUF.

void Scatter(SENDBUF, RECVBUF). Calls MPI_Scatter() with the choice arguments SENDBUF and RECVBUF.

void Scatter(RECVBUF). Shortcut for the previous function; non-root processes may call this function, and therefore not have to specify the SENDBUF.

void Scatterv(OOMPI_Array_message sendbuf, int sendcounts[], int displs[], RECVBUF). Calls MPI_Scatterv() with sendbuf and RECVBUF.

Streams Interface

const OOMPI_Port& operator<<(OOMPI_Message buf). Streams interface to MPI_Send().

const OOMPI_Port& operator>>(OOMPI_Message buf). Streams interface to MPI_Recv().

Sends

void Bsend(SENDBUF, int tag = OOMPI_NO_TAG). Calls MPI_Bsend() with the choice argument SENDBUF. If no tag is specified, the default tag for SENDBUF is used.

void Bsend_init(SENDBUF, int tag = OOMPI_NO_TAG). Calls MPI_Bsend_init() with the choice argument SENDBUF. If no tag is specified, the default tag for SENDBUF is used.

void Ibsend(SENDBUF, int tag = OOMPI_NO_TAG). Calls MPI_Ibsend() with the choice argument SENDBUF. If no tag is specified, the default tag for SENDBUF is used.

void Irsend(SENDBUF, int tag = OOMPI_NO_TAG). Calls MPI_Irsend() with the choice argument SENDBUF. If no tag is specified, the default tag for SENDBUF is used.

void Isend(SENDBUF, int tag = OOMPI_NO_TAG). Calls MPI_Isend() with the choice argument SENDBUF. If no tag is specified, the default tag for SENDBUF is used.

void Issend(SENDBUF, int tag = OOMPI_NO_TAG). Calls MPI_Issend() with the choice argument SENDBUF. If no tag is specified, the default tag for SENDBUF is used.

void Rsend(SENDBUF, int tag = OOMPI_NO_TAG). Calls MPI_Rsend() with the choice argument SENDBUF. If no tag is specified, the default tag for SENDBUF is used.

void Rsend_init(SENDBUF, int tag = OOMPI_NO_TAG). Calls MPI_Rsend_init() with the choice argument SENDBUF. If no tag is specified, the default tag for SENDBUF is used.

void Send(SENDBUF, int tag = OOMPI_NO_TAG). Calls MPI_Send() with the choice argument SENDBUF. If no tag is specified, the default tag for SENDBUF is used.

void Send_init(SENDBUF, int tag = OOMPI_NO_TAG). Calls MPI_Send_init() with the choice argument SENDBUF. If no tag is specified, the default tag for SENDBUF is used.

void Ssend(SENDBUF, int tag = OOMPI_NO_TAG). Calls MPI_Ssend() with the choice argument SENDBUF. If no tag is specified, the default tag for SENDBUF is used.

void Ssend_init(SENDBUF, int tag = OOMPI_NO_TAG). Calls MPI_Ssend_init() with the choice argument SENDBUF. If no tag is specified, the default tag for SENDBUF is used.

Receives

OOMPI_Request Irecv(RECVBUF, int tag = OOMPI_NO_TAG). Calls MPI_Irecv() with the choice argument RECVBUF. If no tag is specified, the default tag for RECVBUF is used.

OOMPI_Status Recv(RECVBUF, int tag = OOMPI_NO_TAG). Calls MPI_Recv() with the choice argument RECVBUF. If no tag is specified, the default tag for RECVBUF is used.

OOMPI_Request Recv_init(RECVBUF, int tag = OOMPI_NO_TAG). Calls MPI_Recv_init() with the choice argument RECVBUF. If no tag is specified, the default tag for RECVBUF is used.
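For illustration, a short usage sketch combining these interfaces (assuming a communicator object Comm as in Section 3.3.2):

  int i = 5, j;
  ...
  Comm[1].Send(i);  // blocking send to the port of rank 1
  Comm[0] >> j;     // streams-interface receive from the port of rank 0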

See Also

OOMPI_Array_message, OOMPI_Comm, OOMPI_Intra_comm, OOMPI_Inter_comm, OOMPI_Message, OOMPI_Request, OOMPI_Status, OOMPI_User_type

Name

OOMPI_Request

Declaration

#include "oompi.h"
class OOMPI_Request

Inheritance

None.

Description

A class for encapsulating a single MPI_Request handle and its associated functionality.

Constructors/Destructors

OOMPI_Request(MPI_Request request = MPI_REQUEST_NULL). MPI constructor. A new container is created for request and a reference to it is made. This constructor also acts as the default constructor.

OOMPI_Request(OOMPI_Request& a). Copy constructor. Performs a deep copy; OOMPI_Request objects are not reference counted.

const OOMPI_Request& operator=(OOMPI_Request& a). Assignment operator. Performs a deep copy; OOMPI_Request objects are not reference counted.

const OOMPI_Request& operator=(MPI_Request& a). Assignment operator. Performs a deep copy; OOMPI_Request objects are not reference counted.

~OOMPI_Request(void). Destructor. Frees the memory associated with the current MPI_Request. Does not trigger a call to MPI_Request_free() because the underlying MPI implementation will take care of it.

Access and Information

bool Is_null(void). Returns a boolean indicating whether the underlying MPI_Request is MPI_REQUEST_NULL or not.

MPI_Request& Get_mpi(void). Returns the internal MPI_Request.

bool operator==(const OOMPI_Request& a). Returns a boolean indicating whether the current request refers to the same MPI_Request as a.

bool operator!=(const OOMPI_Request& a). Returns a boolean indicating whether the current request does not refer to the same MPI_Request as a.

Test / Wait

OOMPI_Status Test(bool& flag). Calls MPI_Test(). If the underlying MPI request is valid (i.e., it is not MPI_REQUEST_NULL) and the operation on the current MPI request has completed, flag is set to true. An OOMPI_Status object is returned.

bool Test(void). Shortcut for the previous function; returns the boolean flag.

OOMPI_Status Wait(void). Calls MPI_Wait() and waits until the communication associated with the internal MPI request has completed. An OOMPI_Status object is returned.
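For example, a nonblocking receive can be paired with Wait(); a minimal sketch (the port expression is hypothetical) follows.

// Sketch: post a nonblocking receive through the port interface,
// overlap it with computation, then wait on the returned request.
int buf;
OOMPI_Request req = OOMPI_COMM_WORLD[0].Irecv(buf);

// ... unrelated computation ...

OOMPI_Status status = req.Wait();   // blocks until the receive completes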

Start / Cancel

void Cancel(). Calls MPI_Cancel() to cancel the pending communication on the MPI_Request handle.

void Start(). Calls MPI_Start() to begin the communication associated with a persistent MPI_Request handle.

See Also

OOMPI_Comm, OOMPI_Request_array, OOMPI_Status

Name

OOMPI_Request_array

Declaration

#include "oompi.h"
class OOMPI_Request_array

Inheritance

None.

Description

A class for encapsulating an array of MPI_Request handles and their associated functionality.

Constructors/Destructors

OOMPI_Request_array(int count = 1). Default constructor. An array of count MPI request handles is created.

OOMPI_Request_array(MPI_Request array[], int count). Constructor. Creates an OOMPI_Request_array object from an array of MPI_Request handles.

OOMPI_Request_array(const OOMPI_Request_array& a). Copy constructor. Performs a deep copy of a; OOMPI_Request objects are not reference counted.

const OOMPI_Request_array& operator=(OOMPI_Request_array& a). Assignment operator. Performs a deep copy of the OOMPI_Request_array object; OOMPI_Request objects are not reference counted.

~OOMPI_Request_array(). Destructor. Deletes the memory associated with the current array of requests. Does not trigger a call to MPI_Request_free() because the underlying MPI implementation will take care of it.

Access and Information

OOMPI_Request& operator[](int i). Returns a reference to an OOMPI_Request object that contains the ith element in the internal MPI_Request array.

MPI_Request *Get_mpi(void). Returns the internal array of MPI_Request handles.

int Get_size(void). Returns the number of MPI_Request handles in the internal array.

bool Set_size(int size). Sets the size of the array of MPI_Request handles. Returns a boolean indicating whether the action was successful or not.

bool operator==(const OOMPI_Request_array& a). Returns true if an element-wise comparison of the current instance with a succeeds for every element; false otherwise.

bool operator!=(const OOMPI_Request_array& a). Returns true if an element-wise comparison of the current instance with a fails for any element; false otherwise.

Test / Wait

OOMPI_Status_array Testall(bool& flag). Calls MPI_Testall() to test if all of the communications associated with the array of MPI requests have completed. flag is set to true if all of the operations have completed. An OOMPI_Status_array object is returned that contains an MPI_Status handle for each MPI_Request in the invoking object.

bool Testall(OOMPI_Status_array& status). Shortcut for the previous function, except that it takes status as an argument and returns flag.

OOMPI_Status Testany(int& index, int& flag). Calls MPI_Testany() to test if any of the operations associated with the array of MPI requests has completed. flag is set to true if at least one operation completed, and index contains the index of the request that completed. An OOMPI_Status object is returned that contains the MPI_Status of the completed operation. Otherwise, an invalid OOMPI_Status is returned.

bool Testany(OOMPI_Status& status, int& index). Shortcut for the previous function, except that it takes status as an argument and returns flag.

OOMPI_Status_array Testsome(int& outcount, int array_of_indices[]). Calls MPI_Testsome() to test some of the operations associated with the array of MPI requests. outcount is set to the number of operations completed. An OOMPI_Status_array object is returned that contains an MPI_Status handle for each MPI_Request in the invoking object.

int Testsome(OOMPI_Status_array& status, int array_of_indices[]). Shortcut for the previous function, except that it takes status as an argument and returns outcount.

OOMPI_Status_array Waitall(void). Calls MPI_Waitall() to block until all of the operations associated with the array of MPI requests return. An OOMPI_Status_array object is returned that contains an MPI_Status handle for each MPI_Request in the invoking object.

void Waitall(OOMPI_Status_array& status). Shortcut for the previous function, except that it takes the status as an argument and returns nothing.

OOMPI_Status Waitany(int& index). Calls MPI_Waitany() to block until any one of the operations associated with the array of requests completes. index contains the index of the request that completed. An OOMPI_Status object is returned that contains the MPI_Status of the completed operation.

int Waitany(OOMPI_Status& status). Similar to the previous function, except that the index is returned and the status is filled in.

OOMPI_Status_array Waitsome(int& outcount, int array_of_indices[]). Calls MPI_Waitsome() to block until at least one of the operations associated with the array of MPI requests completes. It sets outcount to the number of operations completed. An OOMPI_Status_array object is returned that contains an MPI_Status handle for each MPI_Request in the invoking object.

int Waitsome(OOMPI_Status_array& status, int array_of_indices[]). Shortcut for the previous function, except that it takes the OOMPI_Status_array as an argument and returns outcount.
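A typical use is to collect several outstanding requests and wait on all of them at once; a minimal sketch (the buffers and port expressions are hypothetical) follows.

// Sketch: collect two outstanding receives and wait on both.
int buf1, buf2;
OOMPI_Request_array reqs(2);

reqs[0] = OOMPI_COMM_WORLD[1].Irecv(buf1);
reqs[1] = OOMPI_COMM_WORLD[2].Irecv(buf2);

OOMPI_Status_array statuses = reqs.Waitall();   // MPI_Waitall() underneath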

Start

int Startall(void). Starts all of the communications associated with the MPI request array by calling MPI_Startall().

See Also

OOMPI_Comm, OOMPI_Request, OOMPI_Status

Name

OOMPI_Status

Declaration

#include "oompi.h"
class OOMPI_Status

Description

A class for encapsulating a single MPI_Status handle and its associated functionality.

Inheritance

None.

Constructors/Destructors

OOMPI_Status(void). Default constructor. An invalid instance is created. Since there is no OOMPI_STATUS_NULL object, it is not possible to merge the default and MPI constructors into one function.

OOMPI_Status(MPI_Status status). MPI constructor. A new container is created for status and a reference to it is made.

OOMPI_Status(OOMPI_Status& a). Copy constructor. Performs a deep copy; OOMPI_Status objects are not reference counted.

OOMPI_Status& operator=(OOMPI_Status& a). Assignment operator. Performs a deep copy; OOMPI_Status objects are not reference counted.

OOMPI_Status& operator=(MPI_Status& a). Assignment operator. Performs a deep copy; OOMPI_Status objects are not reference counted.

~OOMPI_Status(). Destructor. Deletes the current reference to the internal MPI_Status.

Access and Information

int Get_count(OOMPI_Datatype type). Calls MPI_Get_count() to get the number of entries of datatype type that were received.

int Get_elements(OOMPI_Datatype type). Calls MPI_Get_elements() to get the number of basic elements that were received.

int Get_error(void). Returns the error code of the message referenced by the status object.

MPI_Status& Get_mpi(void). Returns a reference to the underlying MPI_Status handle.

int Get_source(void). Returns the source rank of the message referenced by the status object.

int Get_tag(void). Returns the tag of the message referenced by the status object.
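These accessors are typically applied to the status returned from a receive; a minimal sketch (the port expression and buffer are hypothetical) follows.

// Sketch: interrogate the status returned by a blocking receive.
int buf;
OOMPI_Status status = OOMPI_COMM_WORLD[0].Recv(buf);

int source = status.Get_source();          // rank of the sender
int tag    = status.Get_tag();             // tag of the message
int count  = status.Get_count(OOMPI_INT);  // number of entries received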

Test

bool Test_cancelled(OOMPI_Status& status). Calls MPI_Test_cancelled() to determine whether the cancellation associated with the OOMPI_Request object was successful. Returns true upon a successful cancellation.

See Also

OOMPI_Comm, OOMPI_Datatype, OOMPI_Port, OOMPI_Request

Name

OOMPI_Status_array

Declaration

#include "oompi.h"
class OOMPI_Status_array

Description

A class for encapsulating an array of MPI_Status handles and their associated functionality.

Inheritance

None.

Constructors/Destructors

OOMPI_Status_array(int size = 1). Default constructor. A new container is created for a newly created array of size MPI_Status handles, and a reference to it is made.

OOMPI_Status_array(MPI_Status array[], int size = 1). Constructor. A new container is created for array and a reference to it is made.

OOMPI_Status_array(const OOMPI_Status_array& array). Copy constructor. Performs a deep copy; OOMPI_Status_array objects are not reference counted.

OOMPI_Status_array& operator=(const OOMPI_Status_array& array). Assignment operator. Performs a deep copy; OOMPI_Status_array objects are not reference counted.

~OOMPI_Status_array(). Destructor. Frees the memory associated with the internal MPI_Status array if the memory was allocated by OOMPI.

Access and Information

OOMPI_Status& operator[](int i). Returns a reference to the ith OOMPI_Status in the array.

MPI_Status *Get_mpi(void). Returns a pointer to the underlying MPI_Status array.

int Get_size(void). Returns the number of MPI_Status handles in the internal array.

bool Set_size(int size). Sets the size of the array of MPI_Status handles. Returns a boolean indicating whether the action was successful or not.

See Also

OOMPI_Comm, OOMPI_Datatype, OOMPI_Port, OOMPI_Request

Name

OOMPI_Tag

Declaration

#include "oompi.h"
class OOMPI_Tag

Description

The OOMPI_Tag class is used for inheritance only; it is never explicitly instantiated. Several OOMPI objects inherit from OOMPI_Tag to gain the use of its methods.

Inheritance

None.

Constructors/Destructors

OOMPI_Tag(int tag = OOMPI_NO_TAG). Default constructor. Creates a tag with a sentinel value that, while valid, indicates that no tag has been set.

Access and Information

int Get_tag(void). Returns the value of the tag.

void Set_tag(int tag). Sets the value of the tag.

See Also

OOMPI_Constants

Name

OOMPI_User_type

Declaration

#include "oompi.h"
class OOMPI_User_type

Description

A base class for creating user-defined OOMPI data objects. Classes derived from OOMPI_User_type can immediately use existing OOMPI communication functions, provided the user-defined type is properly constructed.

Inheritance

This class inherits functions from OOMPI_Message and OOMPI_Tag.

Constructors/Destructors

OOMPI_User_type(OOMPI_Datatype& type, void *top, int tag). Constructor. Associates an OOMPI datatype with a user object. Its arguments are a reference to the static OOMPI_Datatype member of the user class (see the code example in section 3.3.4), this, and the default tag to use for this class. NOTE: It is not necessary for the type argument to have been constructed yet; it only needs to be instantiated.

~OOMPI_User_type(). Destructor. Does nothing except internal bookkeeping.
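A sketch of the intended usage follows; the class, its members, and the POINT_TAG constant are hypothetical, and the actual datatype construction is shown in the code example in section 3.3.4.

// Hypothetical user class wired into OOMPI_User_type. The static
// OOMPI_Datatype member is shared by all instances; it only needs
// to be instantiated before the constructor runs, not constructed.
class Point : public OOMPI_User_type {
public:
  Point(void) : OOMPI_User_type(type, this, POINT_TAG)
  {
    // On first use, the datatype would be built here from the
    // x and y members (see the code example in section 3.3.4).
  }

private:
  double x, y;
  static OOMPI_Datatype type;   // one datatype for all Points
};

OOMPI_Datatype Point::type;     // instantiated at file scope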

See Also

OOMPI_Datatype, OOMPI_Message, OOMPI_Tag

Name

OOMPI Enumerated Types

Declaration

#include "mpi.h"

Description

Listed below are OOMPI enumerated types and their possible values.

OOMPI Enumerated Types

OOMPI_Aint. This type corresponds to MPI_Aint.

OOMPI_Compare. This type is returned from the communicator and group Compare() functions. Its possible values are:

OOMPI_IDENT
OOMPI_CONGRUENT
OOMPI_SIMILAR
OOMPI_UNEQUAL

Default tags. The following is a list of the default tags that are used in OOMPI (usually based upon the datatype). Note that all of these values are above OOMPI_TAG_UB, and should never conflict with user tags.

OOMPI_CHAR_TAG
OOMPI_SHORT_TAG
OOMPI_INT_TAG
OOMPI_LONG_TAG
OOMPI_UNSIGNED_CHAR_TAG
OOMPI_UNSIGNED_SHORT_TAG
OOMPI_UNSIGNED_TAG
OOMPI_UNSIGNED_LONG_TAG
OOMPI_FLOAT_TAG
OOMPI_DOUBLE_TAG
OOMPI_BYTE_TAG
OOMPI_MESSAGE_TAG
OOMPI_PACKED_TAG
OOMPI_MPI_DATATYPE_TAG
OOMPI_INTERCOMM_CREATE_TAG
OOMPI_NO_TAG

OOMPI_Error_action. This type is used to check and set what OOMPI does when MPI errors are encountered. Valid values are:

OOMPI_ERRORS_ARE_FATAL. Let the underlying MPI function handle the error.

OOMPI_ERRORS_EXCEPTION. OOMPI throws an OOMPI_Error exception to handle the error.

OOMPI_ERRORS_RETURN. Do nothing; OOMPI_errno is left to record the error. How reliably the MPI implementation can continue after the error is implementation dependent.
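For example, under OOMPI_ERRORS_RETURN a program would test OOMPI_errno (described under OOMPI Constants) after each call; a minimal sketch (the destination and payload are hypothetical) follows.

// Sketch, assuming the error action is OOMPI_ERRORS_RETURN: the
// call reports failure only through OOMPI_errno.
int dest = 1, value = 0;   // hypothetical destination and payload

OOMPI_COMM_WORLD[dest].Send(value);
if (OOMPI_errno != OOMPI_SUCCESS) {
  // recover or abort; how reliably execution can continue here
  // depends on the underlying MPI implementation
}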

OOMPI_Error_type. OOMPI_errno is loaded with a value of this type after an MPI error occurs. The values listed below have the same meanings as their C counterparts.

OOMPI_SUCCESS
OOMPI_ERR_BUFFER
OOMPI_ERR_COUNT
OOMPI_ERR_TYPE
OOMPI_ERR_TAG
OOMPI_ERR_COMM
OOMPI_ERR_RANK
OOMPI_ERR_REQUEST
OOMPI_ERR_ROOT
OOMPI_ERR_GROUP
OOMPI_ERR_OP
OOMPI_ERR_TOPOLOGY
OOMPI_ERR_DIMS
OOMPI_ERR_ARG
OOMPI_ERR_UNKNOWN
OOMPI_ERR_TRUNCATE
OOMPI_ERR_OTHER
OOMPI_ERR_INTERN
OOMPI_ERR_PENDING
OOMPI_ERR_IN_STATUS
OOMPI_ERR_LASTCODE

See Also

OOMPI_Constants

Name

OOMPI Constants

Declaration

#include "mpi.h"

Description

Listed below are OOMPI global constants and their respective types. They are mainly used for initialization and comparison. It is erroneous to attempt to assign a value to any of these constants.

OOMPI Constants

int OOMPI_ANY_SOURCE. Has the same meaning as MPI_ANY_SOURCE.

int OOMPI_ANY_TAG. Has the same meaning as MPI_ANY_TAG.

OOMPI_Comm_world OOMPI_COMM_WORLD. Singleton instance of the OOMPI_Comm_world class.

Pre-defined datatypes. The following OOMPI_Datatype constants are initialized upon OOMPI_COMM_WORLD::Init():

OOMPI_CHAR
OOMPI_SHORT
OOMPI_INT
OOMPI_LONG
OOMPI_UNSIGNED_CHAR
OOMPI_UNSIGNED_SHORT
OOMPI_UNSIGNED
OOMPI_UNSIGNED_LONG
OOMPI_FLOAT
OOMPI_DOUBLE
OOMPI_BYTE
OOMPI_MESSAGE
OOMPI_PACKED

OOMPI_Error_type OOMPI_errno. Contains the result code of the last OOMPI function called.

OOMPI_Environment OOMPI_ENV. Singleton instance of the OOMPI_Environment class.

int OOMPI_HOST. This integer is initialized in OOMPI_COMM_WORLD::Init(). It corresponds to the integer attribute that the MPI_HOST keyval can be used to retrieve.

int OOMPI_IO. This integer is initialized upon OOMPI_COMM_WORLD::Init(). It corresponds to the integer attribute that the MPI_IO keyval can be used to retrieve.

Pre-defined operations. The following OOMPI_Op constants are initialized in OOMPI_COMM_WORLD::Init():

OOMPI_MAX
OOMPI_MIN
OOMPI_SUM
OOMPI_PROD
OOMPI_MINLOC
OOMPI_MAXLOC
OOMPI_BAND
OOMPI_BOR
OOMPI_BXOR
OOMPI_LAND
OOMPI_LOR
OOMPI_LXOR

OOMPI_Port OOMPI_PORT_NULL. This port is analogous to MPI_PROC_NULL; any communications on it will immediately return.

int OOMPI_PROC_NULL. Has the same meaning as MPI_PROC_NULL.

int OOMPI_RESERVED_TAGS. The number of tags that OOMPI reserves for internal use.

int OOMPI_TAG_UB. This integer is initialized in OOMPI_COMM_WORLD::Init(). It corresponds to the integer attribute that the MPI_TAG_UB keyval can be used to retrieve. However, since OOMPI reserves the upper OOMPI_RESERVED_TAGS tags, OOMPI_TAG_UB actually equals the MPI implementation's upper bound on tags minus OOMPI_RESERVED_TAGS.

int OOMPI_UNDEFINED. Has the same meaning as MPI_UNDEFINED.

bool OOMPI_WTIME_IS_GLOBAL. This bool is initialized in OOMPI_COMM_WORLD::Init(). It corresponds to the integer attribute that the MPI_WTIME_IS_GLOBAL keyval can be used to retrieve.

See Also

OOMPI_Comm_world, OOMPI_Datatype, OOMPI_Environment, OOMPI_Op, OOMPI_Packed, OOMPI_Port

APPENDIX C

PIPT INTERFACE

This section covers the design of the PIPT, including the defined constants and enumerated types, the primary data structures, and the major functions provided by the PIPT.

C.1 Defined Constants and Enumerated Types

The following pages contain the descriptions for the enumerated data types contained in the PIP Toolkit.


Name

PIPT Enumerated Types

Declaration

#include <pipt.h>
PIPT_PARAMTYPES
PIPT_PARAMACTIONS

Description

PIPT_PARAMTYPES is an enumerated list that corresponds to all the legal types used in the PIPT. These include many of the built-in C types and PIPT special types. PIPT_PARAMTYPES values are used extensively in the registration routines. The following is a complete list:

PIPT_CHAR
PIPT_UCHAR
PIPT_SHORT
PIPT_USHORT
PIPT_INT
PIPT_LONG
PIPT_ULONG
PIPT_DOUBLE
PIPT_FLOAT
PIPT_ARRAY
PIPT_IMAGE
PIPT_FEATURE

PIPT_PARAMACTIONS is an enumerated type used in the call to PIPT_Register_param() to associate an action with each registered parameter. The possible actions are: BROADCAST, SCATTER, GATHER, and REDUCE.

BROADCAST parameters are sent by the Manager to all the Workers working on the routine. This is done once by the Manager at the start of an image processing routine. Any PIPT_PARAMTYPE can be broadcast.

SCATTER parameters are divided by the Manager into slices. Each slice is received by only one Worker. Only PIPT_FEATURE and PIPT_IMAGE can be scattered.

GATHER is the opposite of scatter. It is used when an image or feature is distributed among the Workers and needs to be gathered back to the Manager for reassembly. Only PIPT_FEATURE and PIPT_IMAGE can be gathered.

REDUCE is used when a reduction operation is performed over the parameter. Only an ARRAY can be reduced.

See Also

PIPT_IMAGE, PIPT_FEATURE, PIPT_ARRAY, PIPT_Register()

C.2 Data Structures

The following pages contain the descriptions for the data structures contained in the PIP Toolkit.


Name

PIPT Datatypes

Declaration

#include <pipt.h>
PIXEL
PALETTE
IMAGE
FEATURE
ARRAY

Description

A PIXEL is the smallest unit of an IMAGE. Values for a PIXEL range from 0 to MAXPIXEL. The constant MAXPIXEL is defined as 255, but your code should use the defined constant. PALETTE is a data structure for encapsulating color palette data. The actual PALETTE data structure is defined as follows:

typedef struct {
    int   nColors;
    byte  *rMap;
    byte  *gMap;
    byte  *bMap;
} PALETTE;

The field nColors specifies the size of the color palette. The fields rMap, gMap, and bMap are byte arrays of size nColors, which store the color palette data. A byte is an unsigned 8-bit PIPT datatype with values from 0 to 255.

IMAGE is a data structure for encapsulating image data. The actual IMAGE data structure is defined as follows:

typedef struct {
    int      fmt;
    u_long   h;
    u_long   w;
    u_long   p;
    PIXEL    ***data;
    PALETTE  *palette;
} IMAGE;

The fields h, w, and p specify the height, width, and number of planes in the image, respectively. The data is stored in raster format: data[0][0] points to the beginning of a contiguous plane of h*w pixels; data[k][0] points to the beginning of the kth plane (or color); data[k][i] points to the first pixel in the ith row of the kth plane. The palette field is only used for palettized color images.

FEATURE is a data structure for encapsulating feature data. The actual FEATURE data structure is defined as follows:

typedef struct {
    u_long  h;
    u_long  w;
    u_long  dim;
    float   ***data;
} FEATURE;

The fields h, w, and dim specify the height, width, and number of dimensions in the feature, respectively. The feature data is pointed to by the data field. FEATUREs are often used in conjunction with IMAGEs in image processing routines and share a similar data structure.

ARRAY is an N-dimensional array of a specified PIPT_PARAMTYPE. Its structure is defined as follows:

typedef struct {
    PIPT_PARAMTYPES  type;
    int              ndims;
    int              contig;
    int              *dims;
    void             *ptr;
} ARRAY;

The allowed PIPT_PARAMTYPES for ARRAYs are: PIPT_CHAR, PIPT_UCHAR, PIPT_SHORT, PIPT_USHORT, PIPT_INT, PIPT_LONG, PIPT_ULONG, PIPT_DOUBLE, and PIPT_FLOAT. The field ndims refers to the number of dimensions, and the dims array gives the size of each dimension. The number of contiguous dimensions in memory is stored in contig, and ptr is the pointer to the array data.

See Also

allocImage(), allocFeature(), PIPT_Array_alloc(), PIPT_PARAMTYPES

C.3 Functions

The following pages contain the descriptions for the functions contained in the PIP Toolkit that are unique to the PIPT. Descriptions of the image processing functions contained in the PIPT can be found in the documentation for the original IP Toolkit.


Name

PIPT Array Functions

Declaration

#include <pipt.h>
ARRAY *PIPT_Array_alloc(PIPT_PARAMTYPES type, int *dims, int ndims, int maxcontig)
void PIPT_Array_free(ARRAY *array)

Description

PIPT_Array_alloc() allocates memory for an ndims-dimensional array of the specified type. The value of type can be any of the PIPT_PARAMTYPES; however, an ARRAY of PIPT_ARRAY, PIPT_IMAGE, or PIPT_FEATURE cannot be passed between the Manager and Workers. The dims array contains the size of each of the ndims dimensions. The parameter maxcontig is the maximum number of dimensions which will be contiguous in memory. If the value of maxcontig is 0, or greater than the number of dimensions, then PIPT_Array_alloc() will attempt to make all the dimensions contiguous. If PIPT_Array_alloc() cannot make the specified number of dimensions contiguous, it will try one less than that number, then two less, and so on until it either allocates the ARRAY or fails. Upon failure PIPT_Array_alloc() returns NULL. After the ARRAY has been allocated successfully, the contig field of the ARRAY will contain the actual number of contiguous dimensions in memory, which may not equal the number passed into PIPT_Array_alloc().

The following example creates an array with 3 dimensions. The last two dimensions are designated to be contiguous in memory.

int dims[] = {3, 100, 100};
int ndims = 3;
ARRAY *a;

a = PIPT_Array_alloc(PIPT_PIXEL, dims, ndims, 2);

The first element in the ARRAY can be accessed as follows:

((PIXEL ***) a->ptr)[0][0][0]

PIPT_Array_free() frees an ARRAY structure previously allocated by PIPT_Array_alloc().

Notes

If PIPT_Array_free() is called with an array of pointers to other structures, only the pointers will be freed.

See Also

ARRAY, PIPT_PARAMTYPES

Name

PIPT Errors

Declaration

#include <pipt.h>
PIPT_ERRNO PIPT_errno
char **PIPT_errlist
PIPT_ERROR_LEVEL *PIPT_errlevel
BOOLEAN PIPT_errflag
BOOLEAN PIPT_Completed
PIPT_ERRNO *PIPT_errarray
void PIPT_Set_errno(PIPT_ERRNO errno)
void PIPT_perror(char *message)

Description

PIPT_errno contains the error tag of the last error to have occurred. If this number is nonzero, then an error has occurred. The errno values are listed in the tables below.

PIPT_errlist is a table containing error messages for each PIPT_ERRNO.

PIPT_errlevel is a table listing the severity of each PIPT_ERRNO. There are four error levels: PIPT_HEALTHY, PIPT_BRUISED, PIPT_WOUNDED, and PIPT_DEAD.

PIPT_HEALTHY indicates that there are no problems.

PIPT_BRUISED is an internal error and will not be seen in user programs.

PIPT_WOUNDED occurs in the Manager if the routine was aborted. If the error occurred in a Worker, it indicates that the Worker was unable to work on the current routine. The Manager will skip this Worker for the remainder of the routine but return to it on the next routine. The errors that have an error level of PIPT_WOUNDED are:

PIPT_errno          PIPT_errlist
PIPT_ERR_NONE       PIPT no error
PIPT_ERR_INIT       PIPT not initialized
PIPT_ERR_RREG       PIPT routine not registered
PIPT_ERR_KREG       PIPT kernel not registered
PIPT_ERR_CREG       PIPT compute point function not registered
PIPT_ERR_PREG       PIPT parameter not registered
PIPT_ERR_RPREV      PIPT routine previously registered
PIPT_ERR_KPREV      PIPT kernel previously registered
PIPT_ERR_RPARAM     PIPT bad routine parameter
PIPT_ERR_KPARAM     PIPT bad kernel parameter
PIPT_ERR_KMATCH     PIPT kernel and routine do not match
PIPT_ERR_NSUP       PIPT option is not supported
PIPT_ERR_HOOK       PIPT hook function failed
PIPT_ERR_USR2       PIPT user error 2 (wounded)
PIPT_ERR_SIG        PIPT signal received

PIPT_DEAD indicates, for the Manager, that a fatal error occurred and no more image processing is possible. If this error occurs on a Worker, it indicates that the error is severe enough that the Worker can no longer process any more routines. The Manager will not send any more work to this Worker. The errors that have an error level of PIPT_DEAD are:

PIPT_errno          PIPT_errlist
PIPT_ERR_MEM        PIPT out of memory
PIPT_ERR_INTRNL     PIPT internal error
PIPT_ERR_WORK       PIPT all workers dead
PIPT_ERR_USR3       PIPT user error 3 (dead)
PIPT_ERR_SIG        PIPT signal received

PIPT_errflag is set to TRUE on the Manager process whenever an error occurs on any process.

PIPT_Completed is set to TRUE if a routine completes successfully.

PIPT_errarray contains an array of PIPT_errno values for all the processors. For example:

/* error on the Manager */
PIPT_errarray[0];

/* error on the first Worker */
PIPT_errarray[1];

PIPT_Set_errno() is called to set a PIPT_errno value when an error has occurred.

PIPT_perror() displays all the errors encountered by any processor since the last time they were cleared. If the message parameter is non-NULL, then the message is displayed along with the error messages. Errors are cleared by PIPT_Manager().


Name

PIPT Exit()

Declaration

#include <pipt.h>
void PIPT_Exit()

Description

PIPT_Exit() should be called in the user program before exiting. It tells the Workers to quit, then destroys the MPI communicators. This function allows the user program to exit normally.
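A minimal sketch of the canonical program skeleton follows; the image processing calls in the middle are hypothetical.

/* Sketch of the canonical PIPT program skeleton. */
int main(int argc, char *argv[])
{
    PIPT_Init(argc, argv);   /* all processes call; only the Manager returns */

    /* ... registered image processing routines are invoked here ... */

    PIPT_Exit();             /* tell the Workers to quit, tear down MPI */
    return 0;
}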

Notes

No other PIPT functions can be called after PIPT_Exit() is invoked. If your application needs to use the Worker processors for other parallel processing besides the PIPT, but will later return to the PIPT, then PIPT_Set_toolkit_mode() should be used.

See Also

PIPT_Init(), PIPT_Set_toolkit_mode(), PIPT_Get_toolkit_mode()

Name

PIPT Get object offset()

Declaration

#include <pipt.h>
OBJECT_INFO *PIPT_Get_object_offset(void *ptr)

Description

PIPT_Get_object_offset() returns a pointer to an OBJECT_INFO structure which contains information about where a slice on a Worker process fits into the original image or feature. The parameter ptr points to the image or feature slice. OBJECT_INFO is defined as:

typedef struct _object_info {
    u_long  windowTop;
    u_long  realTop;
    u_long  realBottom;
    u_long  windowBottom;
    u_long  height;
} OBJECT_INFO;

The windowTop and windowBottom fields are the row numbers from where the slice was taken in the original image or feature. The topmost row is indexed with 0; the bottommost row is indexed with height - 1. Slices are often sent with extra rows on the top and bottom to account for the window in window operators. The actual part of the image that the Worker is expected to compute is defined by the realTop and realBottom row numbers. Once again, these numbers refer to rows in the original image or feature, not rows in the slice. The height field is defined as the number of rows in the overall image, not the height of the slice.

Notes

The information that this function returns is only important for window kernels such as ProcessWindow or ProcessFeatureWindow where the boundaries of the data in the window need to be controlled so that the Workers do not cross over the boundaries of the overall image.

Example

OBJECT_INFO *info;

info = PIPT_Get_object_offset(ptr);
if (info->windowTop == 0) {
    /* Top slice */
} else if (info->windowBottom == info->height - 1) {
    /* Bottom slice */
} else {
    /* Middle slice */
}

Diagnostics

PIPT_Get_object_offset() will return NULL upon failure.

See Also

PIPT_Kernel()

Name

PIPT Init()

Declaration

#include <pipt.h>
void PIPT_Init(int argc, char *argv[])

Description

PIPT_Init() initializes the PIPT. It must be the first PIPT function called in any user program. It has three main purposes. First, it initializes MPI (the Message Passing Interface) and sets up the communicators between the Manager and the Worker processors. Second, it sets up several internal tables, including the routine and kernel tables. These tables store information about all the image processing routines available to the PIPT. Third, it defines a signal handling routine for all signals which were not already assigned by the user program. This signal handling routine allows for proper error handling of unexpected signals.

Notes

PIPT_Init() is called by all the invoking processes. Only the Manager process returns.

See Also

PIPT_Exit(), PIPT_Set_toolkit_mode(), PIPT_Get_toolkit_mode()

Name

PIPT Kernels

Declaration

#include <pipt.h>
IMAGE *ProcessWindow(IMAGE *pimageIn, u_long ulWindowHeight, u_long ulWindowWidth, PIXEL (*transform)())
IMAGE *ProcessPoint(IMAGE *pimageIn, PIXEL (*transform)())
IMAGE *ProcessPointRow(IMAGE *pimageIn, PIXEL (*transform)())
FEATURE *ProcessFeatureWindow(IMAGE *pimageIn, u_long ulWindowHeight, u_long ulWindowWidth, float (*transform)())
FEATURE *ProcessFeaturePoint(IMAGE *pimageIn, float (*transform)())
FEATURE *ProcessMultFeatureWindow(IMAGE *pimageIn, u_long ulNumFeatures, u_long ulWindowHeight, u_long ulWindowWidth, void (*transform)())
FEATURE *ProcessFeatureWindowFeature(FEATURE *pfeatureIn, u_long ulWindowHeight, u_long ulWindowWidth, float (*transform)())
ARRAY *ReducePoint(IMAGE *pimageIn, ARRAY *array, void (*transform)())

Description

These functions are the parallel computational kernels provided by the PIPT. Most parallel image processing routines call one of these functions as an entry point into the opaque parallel transport mechanism. There are three types of kernels:

point operators: Each point in the output image or feature depends on only the corresponding point in the input image or feature.

window operators: Each point in the output image or feature depends on an area, or window, of input points. The size of the window is passed into the kernel.

reduction operators: The output depends on every point of the input image.

ProcessWindow() performs an operation on a window of input pixels to compute each pixel of the output image. The first argument, pimageIn, is the input image. The next two arguments, ulWindowHeight and ulWindowWidth, define the size of the window. The fourth argument, transform, is a function pointer that points to the specific window operator to perform. The prototype for the window operator is:

PIXEL ComputePoint(IMAGE *pimageWindow);

ProcessPoint() performs an operation on one input pixel to compute each pixel of the output image. The first argument, pimageIn, is the input image. The second argument, transform, is a function pointer that points to the specific point operator to perform. The prototype for the point operator is:

PIXEL ComputePoint(PIXEL pixelValue);

ProcessPointRow() works just like ProcessPoint() except that the point operator is passed the row number and plane number of the current pixel, as well as the pixel value. The prototype for the point operator is:

PIXEL ComputePoint(PIXEL pixelValue, int rowNum, int planeNum);

ProcessFeatureWindow() performs an operation on a window of input pixels to compute each float of the output feature. The first argument, pimageIn, is the input image. The next two arguments, ulWindowHeight and ulWindowWidth, define the size of the window. The fourth argument, transform, is a function pointer that points to the specific window operator to perform. The prototype for the window operator is:

float ComputePoint(IMAGE *pimageWindow);

ProcessFeaturePoint() performs an operation on one input pixel to compute each float of the output feature. The first argument, pimageIn, is the input image. The second argument, transform, is a function pointer that points to the specific point operator to perform. The prototype for the point operator is:

float ComputePoint(PIXEL pixelValue);

ProcessMultFeatureWindow() performs an operation on a window of input pixels a multiple number of times to compute each float of the output feature. The first argument, pimageIn, is the input image. The second argument, ulNumFeatures, specifies the number of times to perform the window operator. The next two arguments, ulWindowHeight and ulWindowWidth, define the size of the window. The fifth argument, transform, is a function pointer that points to the specific window operator to perform. The prototype for the window operator is:

void ComputePoint(IMAGE *pimageWindow);

ProcessFeatureWindowFeature() performs an operation on a window of input floats to compute each float of the output feature. The first argument, pfeatureIn, is the input feature. The next two arguments, ulWindowHeight and ulWindowWidth, define the size of the window. The fourth argument, transform, is a function pointer that points to the specific window operator to perform. The prototype for the window operator is:

float ComputePoint(FEATURE *pfeatureWindow);

ReducePoint() is a reduction kernel. It performs an operation over each plane of an image. The first argument, pimageIn, is the input image. The second argument, array, is an N-dimensional array which is defined in the routine. The 0th dimension is defined by the opaque transport mechanism and refers to the number of slices which the image will be broken up into. The first dimension refers to the plane number. The rest of the dimensions are determined by the routine return type. The third argument, transform, is a function pointer that points to the specific reduce operator to perform. The prototype for the reduce operator is:

void ComputePoint(void *reduceVariable, PIXEL pixelValue);
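To make the calling convention concrete, a complete point operator is small; the following sketch (the name and cutoff value are invented) is the kind of function passed to ProcessPoint():

/* Sketch: a point operator suitable for ProcessPoint(). The
 * cutoff value 128 is arbitrary. */
static PIXEL ThresholdPoint(PIXEL pixelValue)
{
    return (pixelValue > 128) ? MAXPIXEL : 0;
}

/* Invoked from an image processing routine as:
 *     pimageOut = ProcessPoint(pimageIn, ThresholdPoint);
 */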

Notes

The above parallel kernels can only be called by routines which have been registered with PIPT_Register_routine().

Diagnostics

The kernels return a NULL pointer upon failure.

See Also

PIPT_Register(), ARRAY

Name

PIPT Kernel Invocation Routines

Declaration

#include <pipt.h>
void PIPT_Kernel_start(void *Kernel)
void PIPT_Kernel_param(void *ptr [, u_long overlap ])
void *PIPT_Kernel(void *Transform)
void PIPT_Kernel_end()

Description

The four functions PIPT_Kernel_start(), PIPT_Kernel_param(), PIPT_Kernel(), and PIPT_Kernel_end() are needed to properly invoke a computational kernel in the PIPT. Often these functions are grouped together into a wrapper function, instead of each being called explicitly by the image processing routine (see the example below).

PIPT_Kernel_start() is called with a function pointer to the kernel as its only parameter. It informs the opaque transport mechanism that a new kernel is about to be invoked.

PIPT_Kernel_param() is called to load a kernel parameter with a value. The first parameter is a pointer to the kernel parameter to load. The second parameter is only used if the first parameter is an image or a feature; it represents the overlap caused by a window operator. NOTE: The PIPT_Kernel_param() calls must be made in the same order that the parameters were registered in the kernel registration function.

PIPT_Kernel() is used to call the kernel. Its only argument is a pointer to the routine's transform function. From this pointer, it can determine the correct kernel to call. This level of indirection is needed for internal bookkeeping.

PIPT_Kernel_end() is called after PIPT_Kernel() returns. It informs the opaque transport mechanism that the kernel is completed.

Examples

The following example is taken from the PIPT_Process_window() kernel. The wrapper function, ProcessWindow(), is called from the image processing routines, and makes all the calls necessary to invoke PIPT_Process_window():

IMAGE *
ProcessWindow(IMAGE *pimageIn, u_long ulWindowHeight,
              u_long ulWindowWidth, PIXEL (*computePoint)())
{
    u_long ulOverlap = ulWindowHeight / 2;
    IMAGE *ret;

    PIPT_Kernel_start(PIPT_Process_window);
    PIPT_Kernel_param(&ulWindowHeight);
    PIPT_Kernel_param(&ulWindowWidth);
    PIPT_Kernel_param(&pimageIn, &ulOverlap);
    ret = PIPT_Kernel(computePoint);
    PIPT_Kernel_end();

    return ret;
}

See Also

PIPT_Kernel(), PIPT_Register()

Name

PIPT Manager()

Declaration

#include <pipt.h>
BOOLEAN PIPT_Manager(void (*transform)(), void *kernel)

Description

PIPT_Manager() manages the parallel aspect of the image processing routines. The call to PIPT_Manager() is made within the computational kernel and marks the place where the Manager and Workers part. When a Worker process invokes PIPT_Manager(), it simply returns FALSE. When the Manager process invokes PIPT_Manager(), it begins to send messages to the Workers to give them instructions. The first message tells the Workers which routine to process. The Manager then divides the scatter variables (see PIPT_PARAMACTIONS) into slices and begins sending the slices to the Workers for processing. When a Worker completes a slice, it sends the result back to the Manager. When all the slices have been processed, the Workers go into wait mode, and the Manager returns back to the calling kernel function with a value of TRUE.

The transform parameter is a function pointer that points to the routine's Compute Point function. This function pointer was passed into the kernel by the routine. The kernel parameter is a function pointer to the current computational kernel.

Notes

All the complexities of parallel programming were placed into the transport mechanism, which is only accessed through the PIPT_Manager() function. This was done intentionally so that programmers of custom kernels and routines do not need to be familiar with parallel programming in order to do image processing with the PIPT.

Example

The following example is taken from the PIPT_Process_window() kernel. It illustrates how to invoke PIPT_Manager() and do the appropriate error checking.

if (PIPT_Manager(transform, PIPT_Process_window)) {
    if (!PIPT_Completed) {
        freeImage(pimageOut);
        return NULL;
    }
    return pimageOut;
}

Diagnostics

If PIPT_Manager() was able to complete the routine successfully, the global BOOLEAN variable PIPT_Completed will be set to TRUE. If PIPT_Completed is FALSE, then PIPT_errno should be checked in the user program. PIPT_Completed only has meaning for the Manager process.

See Also

PIPT_Set_num_divisions(), PIPT_Kernel(), PIPT_PARAMACTIONS, PIPT_Error()

Name

PIPT Memory Manipulation Functions

Declaration

#include <pipt.h>
void *PIPT_Malloc(int size)
void *PIPT_Calloc(int num, int size)
void *PIPT_Realloc(void *ptr, int size)
void PIPT_Free(void *ptr)

Description

These functions are wrappers for the corresponding C functions. On the Manager these functions simply call their C counterparts, but on the Workers they perform internal memory management within the PIPT. When PIPT_Malloc(), PIPT_Calloc(), or PIPT_Realloc() is called, the function places the pointer to the newly allocated memory into a table. When PIPT_Free() is called with a non-NULL pointer, it simply frees the memory that ptr points to and removes it from the table. But when PIPT_Free() is called with a NULL pointer, it frees everything in the table. This is done by the transport mechanism on the Workers after an image processing routine is finished.
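A minimal sketch of the Worker-side behavior:

/* Sketch: allocations made through the PIPT wrappers are tracked
 * internally on the Workers. */
PIXEL *tmp = (PIXEL *) PIPT_Malloc(1024 * sizeof(PIXEL));

/* ... use tmp ... */

PIPT_Free(tmp);    /* frees just this block and drops it from the table */
PIPT_Free(NULL);   /* frees every block still in the table, as the
                      transport mechanism does after each routine */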


Name

PIPT Registration Functions

Declaration

#include <pipt.h>
void PIPT_Register_routine(void *Routine, void *computePoint, void *Kernel)
void PIPT_Register_kernel(void *Kernel)
void PIPT_Register_param(void *Function, PIPT_PARAMTYPES type, PIPT_PARAMACTIONS action, void *ptr [, u_long overlap ])
void PIPT_Register_hooks(void *Routine, BOOLEAN (*OpenRoutine)(), void (*PreSlice)(), void (*PostSlice)(), void (*CloseRoutine)())
void PIPT_Register_user_kernels()
void PIPT_Register_user_routines()

Description

PIPT_Register_routine() registers a routine with the opaque transport mechanism, which enters the routine into the PIPT Routine Table. The table associates routines with a computational kernel and compute point function. PIPT_Register_routine() takes three arguments: Routine is a pointer to the routine entry function, which is called by the Manager. computePoint is a pointer to the routine's compute point function. The compute point function is called by the Worker processes. Kernel is a pointer to the computational kernel which is called by the routine.

PIPT_Register_kernel() is called to register a computational kernel with the PIPT. The only parameter is a pointer to the kernel itself.

PIPT_Register_param() associates a parameter of a previously registered routine or kernel with the opaque transport mechanism. The first argument, Function, is a pointer to either a routine or a kernel. The second argument, type, specifies the PIPT type of the parameter; PIPT_PARAMTYPES is an enumerated type which contains the valid PIPT types. The third argument, action, specifies what operation is to be performed on the parameter by the opaque transport mechanism. The possible actions are BROADCAST, SCATTER, GATHER, and REDUCE. The fourth parameter, ptr, is the pointer to the parameter. A fifth argument, overlap, is used whenever the action is either SCATTER or GATHER. An overlap is the number of extra rows which must be sent on the top and bottom of each slice of a window operation. For window operations, the value of the overlap is usually one half the window height. For non-window kernels, a pointer to a u_long must still be provided; define a variable with a value of zero to send into this function.

PIPT_Register_hooks() is used in routines where the Worker needs more entry points into the routine than just the compute point function. PIPT_Register_hooks() provides a mechanism for increased flexibility if needed, by allowing the routine to register four additional entry points, or hooks. For example, in some routines, memory needs to be allocated by the Workers at the start of a routine, then freed when the routine terminates. In the scheme described above, this is not possible, since each Worker only calls the compute point function. To fix this situation, define two new functions called Open() and Close(). Each Worker will call Open() after it receives all the routine and kernel parameters, and each Worker will call Close() after the computation for the routine is complete. Another situation arises when a particular action needs to be performed by the Worker on each slice before and after processing it. Two new functions can be defined, called PreSlice() and PostSlice(). PreSlice() is used mainly in the reduction routines to initialize state variables. PostSlice() is not used in any built-in PIPT routine, but may be useful for some user designed routines. A registration sketch follows.
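The following is a minimal sketch of how the registration calls above fit together for a hypothetical Threshold() routine; every name other than the PIPT functions and types is invented, and PIPT_Process_point is assumed by analogy with the PIPT_Process_window kernel used elsewhere in this appendix.

/* Sketch: registration for a hypothetical Threshold() routine. */
extern IMAGE *Threshold(IMAGE *pimageIn, int nCutoff);  /* routine entry (invented) */
extern PIXEL  ThresholdPoint(PIXEL pixelValue);         /* compute point (invented) */

static IMAGE  *pimageIn;    /* scattered to the Workers in slices */
static IMAGE  *pimageOut;   /* gathered back from the Workers     */
static int    nCutoff;      /* broadcast once to every Worker     */
static u_long ulZero = 0;   /* zero overlap for a non-window kernel */

void Threshold_register(void)
{
    PIPT_Register_routine(Threshold, ThresholdPoint, PIPT_Process_point);
    PIPT_Register_param(Threshold, PIPT_INT,   BROADCAST, &nCutoff);
    PIPT_Register_param(Threshold, PIPT_IMAGE, SCATTER,   &pimageIn,  &ulZero);
    PIPT_Register_param(Threshold, PIPT_IMAGE, GATHER,    &pimageOut, &ulZero);
}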


PIPT_Register_user_routines() is a function that the user program needs to write if there are any non-built-in PIPT routines to be registered. This function contains a list of calls to routine registration functions, and is called by the PIPT during PIPT_Init(). For example:

PIPT_Register_user_routines()
{
    My_routine1_register();
    My_routine2_register();
    My_routineN_register();
}

The above code calls the registration functions for each of the listed routines. PIPT_Register_user_kernels() is identical to PIPT_Register_user_routines() except that it is used to register user kernels instead of user routines.

See Also

PIPT_Init(), PIPT_Kernel_param(), PIPT_PARAMACTIONS, PIPT_PARAMTYPES

Name

PIPT Toolkit Mode

Declaration

#include <pipt.h>
void PIPT_Set_toolkit_mode(BOOLEAN mode)
BOOLEAN PIPT_Get_toolkit_mode()

Description

PIPT_Set_toolkit_mode() allows user programs to leave and re-enter the PIPT. This feature is used in programs that do parallel processing other than parallel image processing with the PIPT. When the toolkit mode is set to FALSE with PIPT_Set_toolkit_mode(FALSE), the Worker processors exit from their waiting mode and re-enter the user program. When the toolkit mode is set to TRUE, the Workers leave the user program and re-enter their waiting mode, ready for more image processing.

PIPT_Get_toolkit_mode() returns the current toolkit mode. A return value of TRUE indicates that the PIPT is ready to do image processing routines.
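A minimal sketch of leaving and re-entering the toolkit:

/* Sketch: temporarily reclaim the Workers for non-PIPT parallel
 * work, then hand them back to the toolkit. */
PIPT_Set_toolkit_mode(FALSE);   /* Workers leave wait mode and
                                   re-enter the user program */

/* ... other parallel computation on all processes ... */

PIPT_Set_toolkit_mode(TRUE);    /* Workers return to wait mode,
                                   ready for more image processing */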

See Also

PIPT_Init(), PIPT_Exit()

Name

PIPT Worker Functions

Declaration

#include <pipt.h>
void PIPT_Get_numworkers(int *numworkers, int *maxworkers)
void PIPT_Set_numworkers(int numworkers)

Description

PIPT_Get_numworkers() returns the current and maximum number of Workers in the PIPT application through its pointer arguments. PIPT_Set_numworkers() sets the number of Worker processors available for parallel image processing.

Notes

The number of Workers cannot exceed the maximum number of Workers. The maximum number of Workers is defined as the number of processors that the PIPT application was originally invoked with, minus one. If numworkers is 1, then the PIPT enters serial mode.

Example

/* decrement the number of workers */
int numworkers, maxworkers;

PIPT_Get_numworkers(&numworkers, &maxworkers);
PIPT_Set_numworkers(numworkers - 1);
