arXiv:cs/0610120v2 [cs.MS] 22 Oct 2008

Classdesc and Graphcode: support for scientific programming in C++

Russell K. Standish
School of Mathematics, University of New South Wales
[email protected]
http://parallel.hpc.unsw.edu.au/rks

Duraid Madina
Department of Systems Studies, University of Tokyo
[email protected]

October 23, 2008

Abstract

Object-oriented programming languages such as Java and Objective C have become popular for implementing agent-based and other object-based simulations, since objects in those languages can reflect (i.e. make runtime queries of an object's structure). This allows, for example, a fairly trivial serialisation routine (conversion of an object into a binary representation that can be stored or passed over a network) to be written. However, C++ does not offer this ability, as type information is thrown away at compile time. Yet C++ is often a preferred development environment, whether for performance reasons or for its expressive features such as operator overloading. In scientific coding, changes to a model's code take place constantly, as the model is refined and different phenomena are studied. Yet traditionally, facilities such as checkpointing, routines for initialising model parameters, and analysis of model output depend on the underlying model remaining static; otherwise, each time a model is modified, a whole slew of supporting routines needs to be changed to reflect the new data structures. Reflection offers the advantage of the simulation framework adapting to the underlying model without programmer intervention, reducing the effort of modifying the model. In this paper, we present the Classdesc system, which brings many of the benefits of object reflection to C++; ClassdescMP, which dramatically simplifies coding of MPI-based parallel programs; and Graphcode, a general-purpose data parallel programming environment.

1 Introduction

This paper describes Classdesc, ClassdescMP and Graphcode, techniques for building high performance scientific codes in C++. Classdesc is a technique for providing automated reflection capabilities in C++, including serialisation support. ClassdescMP builds on Classdesc's serialisation capability to provide a simple interface for using the MPI message passing library with objects. Graphcode implements distributed objects on a graph, where the objects represent computation and the links between objects represent communication patterns. It is a higher level of abstraction than the message passing paradigm of ClassdescMP, yet more general and powerful than traditional data parallel programming paradigms such as High Performance Fortran[2] or POOMA[12].

This paper is organised into three sections, describing the three technologies in more detail. The final section concludes with a description of the current status of the code, and where it can be obtained.

2 Classdesc

2.1 Reflection and Serialisation

Object reflection allows straightforward implementation of serialisation (i.e. the creation of binary data representing objects that can be stored and later reconstructed), binding of scripting languages or GUI objects to ‘worker’ objects, and remote method invocation. Serialisation, for example, requires knowledge of the detailed structure of the object. A member object may itself be serialisable (e.g. a dynamic array structure), yet be implemented in terms of a pointer to a heap object. Also, one may be interested in serialising the object in a machine independent way, which requires knowledge of whether a particular field is an integer or floating point variable.

Languages such as Objective C give objects reflection by creating class objects, and implicitly including in each object of that class an isa pointer to the class object. Java does much the same thing, providing all objects with the native (i.e. non-Java) method getClass(), which returns the object's class at runtime, as maintained by the virtual machine. When using C++, on the other hand, most of the information about what exactly objects are is discarded at compile time. Standard C++ does provide a run-time type information (RTTI) mechanism; however, this is only required to return a unique signature for each type used in the program. Not only is this signature compiler dependent, it could be implemented by the compiler enumerating all types used in a particular compilation, in which case the signature for a given type would differ from program to program! Importantly, standard RTTI does not provide any information on the internal structure of a type, nor on the methods implemented.

The solution to this problem lies (as it must) outside the C++ language per


se, in the form of a separate program which parses the interface header files. A number of C++ reflection systems do this. SWIG[4], perhaps the oldest, parses a somewhat simplified C++ syntax with markup, and exposes selected top-level objects to scripting languages. Reflex[13], a more recent system than Classdesc[9], is interesting in that it uses the GCC-XML parser (based on the C++ front end of GCC) to parse the input file and build a dictionary of type properties. Classdesc differs from these other attempts by traversing the data structures recursively at runtime, providing a genuine solution to serialisation, as well as allowing "drill down" of simulation objects in an interactive exploration of a running simulation.

The generated routines are generically termed object descriptors. The object descriptor generator only needs to handle class, struct and union definitions. Anonymous structs used in typedefs are parsed as well. What is emitted in the object descriptor is a sequence of function calls for each base class and member, similar in nature to compiler-generated constructors and destructors. Function overloading ensures that the correct sequence of actions is generated at compile time.

For instance, assume that your program had the following class definition:

class jellyfish: public animal
{
  double position[D], velocity[D], radius;
  int colour;
};

and you wished to generate a serialisation operator called pack. Then the object descriptor generator will emit the following function definition for jellyfish:

#include "pack_base.h"
void pack(pack_t *p, string nm, jellyfish& v)
{
  pack(p,nm,(animal&)v);
  pack(p,nm+".position",v.position,is_array(),1,D);
  pack(p,nm+".velocity",v.velocity,is_array(),1,D);
  pack(p,nm+".radius",v.radius);
  pack(p,nm+".colour",v.colour);
}

The use of auxiliary types like is_array() improves resolution of overloaded functions, without polluting the global namespace further.
This function is overloaded for arbitrary types, but is more than a template, so deserves a distinct name. We call these functions class descriptors (hence the name Classdesc), or simply actions for short. Thus, calling pack(p,"",var), where var is of type jellyfish, will recursively descend the compound structure of the class type, until it reaches primitive data types, which are handled by the following generic template defined for primitive data types:


template <class T> void pack_basic(pack_t *p, string desc, T& arg)
{p->append((char*)&arg,sizeof(arg));}

given a utility routine pack_t::append that adds a chunk of data to a repository of type pack_t. This can even be given an easier interface by defining the member template:

template <class T> pack_t& pack_t::operator<<(T& arg);

Optional arguments to get allow selective reception of messages by source and tag. By setting the preprocessor macro HETERO, MPIbuf is derived from xdr_pack instead of pack_t. This allows ClassdescMP programs to be run on heterogeneous clusters, where the numerical representation may differ from processor to processor. In the MPI-2 C++ bindings, the basic object handling messages is a communicator. In ClassdescMP, an MPIbuf has a communicator. It also has a buffer, and assorted other housekeeping members, some of which are used for managing asynchronous communication patterns.


b->v = fooptr(p)->u;
for (Ptrlist::iterator n=p->begin(); n!=p->end(); n++)
  {
    b->u -= 0.25 * fooptr(n)->v;
    b->v -= 0.25 * fooptr(n)->u;
  }
}
swap(graph,back);

Here graph and back are the graph and backing buffer for the calculation. We are also assuming a utility function fooptr() written by the programmer to return a foo* pointer to the object. A typical implementation of this might be:

foo *fooptr(objref& x) {return dynamic_cast<foo*>(&*x);}
foo *fooptr(Ptrlist::iterator x) {return dynamic_cast<foo*>(&**x);}

It is important to use the dynamic_cast feature of C++ to catch errors such as x not referring to a foo object, or an incorrect combination of dereferencing and address-of operators. dynamic_cast will return a NULL pointer in case of error, which typically causes an immediate NULL dereference error. Old-fashioned C-style casts (of the form (foo*)) will simply return an invalid pointer in case of error, which can be very hard to debug.

It should be noted that a Graph object appears as a list of those objects local to the executing processor, so this code will execute correctly in parallel. Each time Prepare_Neighbours is called, the message tag is incremented, preventing subsequent calls from interfering with the delivery of the previous batches of messages.

4.7 Deployed applications and performance

Within the EcoLab system[18], Graphcode is deployed with two of the example models provided with the EcoLab software. These models are working scientific models, not toy examples. The first model is the spatial Ecolab model[19], in which the panmictic Ecolab model (Lotka-Volterra ecology equations, coupled with mutation) is replicated over a 2D Cartesian grid, and migration is allowed between neighbouring grid cells. The performance of this model has not yet been studied in detail.

The second model is one of jellyfish in assorted lakes on the islands of Palau. Each jellyfish is represented as a separate C++ object, an approach commonly called agent-based modelling. The jellyfish move around within a continuous space representing the lake, and from time to time bump into each other. In order to determine if a collision happens in the next timestep, each jellyfish must examine all the other jellyfish to see if its path intersects theirs. This is clearly an O(N^2) serial operation, which severely limits the scalability of the model. To improve scalability, the lake is subdivided into a Cartesian grid, and each jellyfish is allocated to the cell containing its position. If the cells are sufficiently large that a jellyfish will only ever pass from one cell to its neighbouring cell

[Figure 2 appears here: speedup vs. number of processors (0-30), with curves for 9am, noon, 3pm, 6pm and 9pm simulation times and a linear-speedup reference line.]

Figure 2: Speedup of the Jellyfish application with 1 million jellyfish in the OTM lake. Speedup is reported as the accumulated wall time for a single processor divided by the accumulated wall time for n processors at that point in the simulation. Note that the speedup from 16 processors to 32 is more than double.

in a given timestep, then only the jellyfish within the cell, plus those within the nearest neighbouring cells, need to be examined. This reduces the complexity of the algorithm to dramatically less than O(N^2), and also allows the algorithm to be executed in parallel. In the field of molecular dynamics simulations, this method is often called a particle-in-cell method. PARMETIS allows nodes and edges to be weighted, so in this case we weight each cell by w_i = n_i^2, and each edge by v_ij = n_j.

In figure 2, the speedup (relative performance of the code running on n processors versus 1 processor) is plotted for different stages of the simulation. The simulation starts at 7am with the jellyfish uniformly distributed throughout the lake. As the sun rises in the east, the jellyfish track the sun, and become concentrated along the shadow lines. PARMETIS is called repeatedly to rebalance the calculation. As the sun sets at around 5pm, the jellyfish disperse randomly throughout the lake. In figure 3, the speedup is plotted as a function of simulation time, so that the effect of load unbalancing can be seen. It can be seen that Graphcode delivers scalable performance in this application.

Graphcode has also been deployed in a 3D artificial chemistry model[8] exhibiting superlinear speedup over 64 processors due to the effectively enlarged

[Figure 3 appears here: speedup vs. time of day (8-20 on a 24-hour clock), with curves for 2, 4, 8, 16 and 32 processors.]

Figure 3: Speedup of the Jellyfish application as a function of simulation time. This is differential speedup, calculated from the wall time needed to simulate a 6 minute period, so it does not include partitioning time.


memory cache.

5 Current Status

Classdesc and Graphcode are open source packages written in ISO standard C++. They have been tested on a range of platforms and compilers, including Linux, Mac OS X, Cygwin (Windows), Irix and Tru64; gcc and Intel's icc for Linux, as well as the native C++ compilers for Irix and Tru64. The source code is distributed through a SourceForge project, available from http://ecolab.sourceforge.net. The code is managed by the Aegis source code management system, which is browsable through a web interface.

Version numbers of the form x.Dy are considered "production ready": they have been tested on a range of platforms, and are more likely to be reliable. These releases are also available through the SourceForge file release system. Versions of the form x.y.Dz are under active development, and have only undergone minimal testing (i.e. they should compile, but may still have significant bugs). Developers interested in contributing to the code base can register as a developer of the system by emailing one of the authors. Classdesc and Graphcode are also included as part of the EcoLab simulation system, which is available from the same source code repository.

Acknowledgments

This work was funded by a grant from the Australian Partnership for Advanced Computing (APAC) under the auspices of its Computational Tools and Techniques programme. Computer time was obtained through the Australian Centre for Advanced Computation and Communication (ac3).

References

[1] Draft technical report on C++ library extensions. Technical Report DTR 19768, International Standards Organization, 2005.

[2] Charles H. Koelbel, David B. Loveman, Robert S. Schreiber, Guy L. Steele, Jr., and Mary E. Zosel. The High Performance Fortran Handbook. MIT Press, Cambridge, Mass., 1994.

[3] J. M. Squyres, B. C. McCandless, and A. Lumsdaine. Object oriented MPI: A class library for the message passing interface. In Proceedings of the POOMA conference, 1996.

[4] David M. Beazley. SWIG: An easy to use tool for integrating scripting languages with C and C++. In Proceedings of the 4th Annual USENIX Tcl/Tk Workshop. USENIX, 1996. http://www.usenix.org/publications/library/proceedings/tcl96/beazley.html.

[5] Greg Colvin, Beman Dawes, and Darin Adler. Boost Smart Pointers. http://www.boost.org/.

[6] O. Coulaud and E. Dillon. PARA++: C++ bindings for message passing libraries. User's guide. Technical report, INRIA, 1995.

[7] Dennis Gannon, Shridar Diwan, and Elizabeth Johnson. HPC++ and the Europa call reification model. ACM SIGAPP Applied Computing Review, 4:11–14, 1996.

[8] Duraid Madina, Naoki Ono, and Takashi Ikegami. Cellular evolution in a 3D lattice artificial chemistry. In Banzhaf et al., editors, Advances in Artificial Life, volume 2801 of Lecture Notes in Computer Science, pages 59–68, Berlin, 2003. Springer.

[9] Duraid Madina and Russell K. Standish. A system for reflection in C++. In Proceedings of AUUG2001: Always on and Everywhere, page 207. Australian Unix Users Group, 2001.

[10] J. K. Ousterhout. Tcl and the Tk Toolkit. Addison-Wesley, 1994.

[11] John K. Ousterhout. Scripting: Higher-level programming for the 21st century. IEEE Computer, 31(3):23–30, 1998.

[12] J.V.W. Reynders, P.J. Hinker, J.C. Cummings, S.R. Atlas, S. Banerjee, W.F. Humphrey, S.R. Karmesin, K. Keahey, M. Srikant, M.D. Tholburn, et al. POOMA: a framework for scientific simulations on parallel architectures. In Parallel Programming in C++, pages 547–588. MIT Press, Cambridge, MA, 1996.

[13] S. Roiser and P. Mato. The SEAL C++ reflection system. Interlaken, Switzerland, 2004. http://chep2004.web.cern.ch/chep2004/.

[14] Kirk Schloegel, George Karypis, and Vipin Kumar. Parallel multilevel algorithms for multi-constraint graph partitioning. In A. Bode et al., editors, Euro-Par 2000, volume 1900 of Lecture Notes in Computer Science, page 296, Berlin, 2000. Springer.

[15] Kirk Schloegel, George Karypis, and Vipin Kumar. A unified algorithm for load-balancing adaptive scientific simulations. In Supercomputing 2000, 2000. http://www.sc2000.org/proceedings/techpapr.

[16] Will Schroeder, Ken Martin, and Bill Lorensen. The Visualization Toolkit: An Object-Oriented Approach to 3-D Graphics. Prentice Hall, Upper Saddle River, N.J., 1996.

[17] Marc Snir et al. MPI: The Complete Reference. MIT Press, Cambridge, MA, 1996.

[18] Russell K. Standish. EcoLab documentation. Available at http://ecolab.sourceforge.net.

[19] Russell K. Standish. Cellular Ecolab. Complexity International, 6, 1999.

[20] Russell K. Standish and Richard Leow. EcoLab: Agent based modeling for C++ programmers. In Proceedings SwarmFest 2003, 2003. arXiv:cs.MA/0401026.

[21] Bjarne Stroustrup. The C++ Programming Language. Addison-Wesley, Reading, Mass., 3rd edition, 1997.

[22] Robert A. van Engelen and Kyle Gallivan. The gSOAP toolkit for web services and peer-to-peer computing networks. In Proceedings of the 2nd IEEE International Symposium on Cluster Computing and the Grid (CCGrid2002), pages 28–135, Berlin, Germany, May 2002.
