Copyright 1996 IEEE. Published in the Proceedings of the 4th EUROMICRO Workshop on Parallel and Distributed Processing, January 1996 at Oporto, Portugal. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 732-562-3966.

Experiences on Porting a Parallel Objects Environment from a Transputer Network to a PVM-Based System

Franco Zambonelli
Dipartimento di Elettronica Informatica e Sistemistica - Università di Bologna
Viale Risorgimento 2 - 40136 Bologna - ITALY
E-mail: [email protected]

Matteo Pugassi
Dipartimento di Elettronica e Informazione - Politecnico di Milano
Piazza L. da Vinci 32 - 20133 Milano - ITALY
E-mail: [email protected]

Letizia Leonardi, Nello Scarabottolo
Dipartimento di Scienze dell'Ingegneria - Università di Modena
Via Campi 214 - 41100 Modena - ITALY
E-mail: {leonardi, scarabot}@dsi.unimo.it

Abstract

Parallel Objects is a powerful model for distributed/parallel Object-Oriented programming. The goal of this paper is to present the approach adopted in porting the support of the Parallel Objects environment, originally implemented for a massively parallel architecture, onto the PVM environment, which is nowadays a de-facto standard in the design of distributed applications on heterogeneous networks of computers.

1. Introduction

Heterogeneous computer networks are rapidly growing as the preferred target architectures for the development of parallel and distributed applications, for several reasons:
• high availability;
• low costs;
• ease of scaling.
The lack of standards has, for many years, seriously compromised the diffusion of general-purpose environments for the development of large and portable

applications. In the last few years, several proposals have been issued with the aim of defining a stable and portable interface to applications. Among them, the PVM environment [1, 2] has become a de-facto standard in the design of distributed applications, because PVM is now available for a wide range of architectures, from workstations to massively parallel machines. The PVM environment defines a message-passing abstract architecture by means of a set of primitives with a simple and clearly specified interface. However, the outlined model is still low-level and forces users to deal explicitly with process creation and inter-process communication directives. A higher level of abstraction is needed to make parallel programming widely accepted. A promising field is object-oriented programming. The above considerations have driven our work toward the implementation of the support for a parallel Object-Oriented programming environment on top of the PVM architecture, in order to offer a high-level programming interface to PVM users. The paper presents the Parallel Objects (PO for short) programming environment [3], based on the object-oriented paradigm [4] and enriching it with parallelism. PO applications were originally conceived for massively

parallel architectures, in particular for a transputer-based Meiko Computing Surface - 1 (from now on, MCS) [5]. The peculiar characteristics of the MCS architecture have influenced the implementation of the support. Thus, the porting of the support onto PVM had to deal with several problems, mainly due to the different computational models defined by the Meiko concrete architecture and the PVM abstract one. The goal of the paper is to present our experience in the porting and to describe the implementation of the PVM support. The paper is organized as follows. Section 2 describes the Parallel Objects model. Section 3 discusses the MCS support of PO. The need for moving toward distributed systems and the choice of the PVM system are motivated in section 4. The experience and the problems encountered in porting the PO environment from the MCS architecture to PVM are analyzed in section 5. Finally, section 6 briefly describes the PVM-based support of PO.

2. Parallel Objects

PO is an Object-Oriented parallel environment based on the active objects model [6]. PO can express parallelism along two dimensions of concurrency: by associating independent execution capacity with objects and by allowing multiple threads of execution within the same object. We distinguish, then, between inter-object and intra-object parallelism [3]. In parallel object languages, computation results from message passing between objects. When one object requires an external service, it sends a message to another object: the message specifies the service the sender needs. The receiving object verifies whether the operation is correct or not, then executes the requested operation. In PO, synchronous, asynchronous and future communication modes are all available. Asynchronous communication modes enforce real inter-object parallelism: both the sender and the receiver objects can execute at the same time. The second form of parallelism, the intra-object one, arises because of multiple activities within the same object. Since a parallel object can receive more than one request, each parallel object can execute several internal activities, one for each service request. Activities are created on request and terminated when the service is completed. The PO model does not constrain the number of internal activities that may run within the same object. Incoming service requests are scheduled for execution on the basis of internal synchronization constraints against the already executing activities [7]. When the constraints allow it, a request is served by creating an activity to execute the specified operation. PO objects are always instances of a given class. PO classes describe both the non-parallel and the parallel part of an object. In particular, a class describes the interface (i.e. all the operations that can be requested of its instances), the state variables of its instances and the synchronization constraints. PO classes can be incrementally defined by means of a multiple inheritance mechanism.

3. The MCS Support to PO

This section describes the PO support for the MCS transputer-based architecture [5] and analyzes the characteristics of the MCS architecture that deeply influenced its implementation.

3.1. The PO Preprocessor

PO introduces a high-level language for defining classes and for developing applications by using them: for further details see [3]. Starting from the PO language, a preprocessor parses the code of PO classes in order to resolve the inheritance relationships and to translate the PO code into C code with calls to the support library. The main tasks of the preprocessor are:
• to check the syntax of the PO sources in order to detect syntax errors or invalid semantics;
• to handle class inheritance;
• to translate the class implementation into C code, enriched with calls to the primitives of the PO support.
With regard to the last point, PO primitives are defined for:
• dynamic object creation and deletion;
• object naming;
• requesting services in asynchronous, synchronous and future mode;
• accessing the object state.
In addition, primitives dealing with explicit object allocation are under implementation. PO primitives encapsulate any architectural dependency: in particular, their interface has been conceived to be as general as possible to guarantee the portability of both the preprocessor and the run-time support, described below.

3.2. The Run-Time Support

The MCS support implementation consists of two levels: a support for intra-object parallelism, replicated in each object, and a support for inter-object parallelism. The intra-object support is composed of a set of threads (see Figure 1) that realize the execution management of each PO object [8]: the Object Manager (OM for short) and one or more State Managers (SM). The OM represents the identity of the object itself: it is in charge of receiving the service requests from other objects. Moreover, it is the only entity authorized to access and modify the so-called object execution state (represented by the state of the pending requests and of the executing activities). When required, the OM is in charge of creating a new activity, on the basis of the given intra-object scheduling policy, and of passing it a reference to the state. Activities of a PO object that reside on the same node as the OM access the state via direct memory references. However, the state of any PO object can be split into several partitions: one SM is associated with each partition. When the activities of an object are distributed onto different nodes, the SMs allow activities to remotely access the managed part of the state. An access to a non-local state partition is transparently translated into a message for the corresponding SM.

Figure 1. A MCS PO Object (service requests reach the OM; activities access the state partitions through the SMs)

The PO support for inter-object management consists of a set of entities - replicated on each node and composed of one or more threads - called system managers [9]. The monitoring manager periodically measures the application behavior to detect its evolution. The allocation manager is in charge of deciding the allocation of newly created objects [10]. The creation manager implements the decisions taken by the allocation manager: it receives the command to create each new object in its node. The router manager is in charge of delivering all messages that flow between application objects (both inter-object and intra-object communications) and between the managers of nodes that are not physically connected.

3.3. MCS Issues in the Support Design

Even though the PO environment was designed with portability in mind, the peculiar characteristics of the MCS architecture have influenced the implementation of both the support and the environment itself. In particular:
1. the threaded scenario with a globally shared and unprotected memory space: all processes on a node have access to the whole local memory;
2. the presence of a high-speed, point-to-point communication network, having (or pretending to have!) communication times comparable to local memory access times;
3. the single-application, partition-based scheduling policy of the system, allowing only one application at a time to be in execution within a partition of the system;
4. the difficulty of accessing external resources, in particular mass storage devices.
Point 1 has facilitated the implementation of the MCS support for intra-object parallelism. The parallel activities within a single PO object must share the state of the object itself. Because parallel activities, in the MCS, are threads, they share the whole local address space; thus, holding a pointer to the state is all that is needed to access it. Any unsafe or inconsistent access to the shared state is prevented by the a priori synchronization policy. Again with regard to the access to the state, point 2 has allowed objects to be distributed across nodes. In fact, if the cost of accessing the network is comparable with the cost of memory accesses, objects can profit from internal distribution and remote access to the state, by means of the SMs, in order to turn intra-object concurrency into real parallelism. As a consequence of the single-application scenario (point 3), the unprotected approach does not have to deal with any protection/authentication problem: no resources are shared among different users.
Finally, referring to point 4, it must be noticed that, since the support to the MCS architecture does not provide any way to load new parts of code during execution, all the code of an application has to be loaded onto the transputer nodes before starting the execution

of the application itself: this results in an excessive utilization of the local memory of each node of the system. An important characteristic that has to be analyzed in an Object-Oriented environment is the class-instance relationship. In the PO support, both the creation manager and each OM need access to the class data structures: the former to handle object creation correctly, the latter to handle the creation of new activities. Thanks to the capability of sharing memory, the preprocessor builds the class lattice as a global data structure to be maintained in the local memory of each node during the whole application execution. Since this structure is globally accessible from any thread within a node¹, classes and instances share the same description. This also makes it possible to dynamically resolve any inheritance-related binding, such as the reference to a method or to an instance variable.

¹ The distribution of the replicas of the class lattice does not cause any consistency problem: up to now, PO classes cannot change at runtime.

4. Moving to Distributed Systems

4.1. Motivations

The diffusion of massively parallel systems for general-purpose computing has been limited for different reasons:
• on the one hand, non-standard environments and/or system libraries give no guarantee on software porting and reuse;
• on the other hand, the widely claimed power scalability of this kind of architecture cannot be appreciated: scaling up a parallel system by replicating its components does not outweigh the technological obsolescence of each single component.
In addition, the lack of any general-purpose solution to the typical problems regarding the efficiency of massively parallel architectures, such as load balancing [11] and routing [12], makes it often necessary to solve them from scratch. All these reasons have limited the diffusion of massively parallel architectures to special-purpose applications, where their intrinsic characteristics are welcome and porting and/or economic aspects are not so important.
In the last few years, the idea of considering clusters of connected workstations as the preferred target architecture for general-purpose parallel computing has emerged. Networked workstations, because they are already present in every computing center, show a lower cost/performance ratio with respect to massively parallel architectures. In addition, scalability is guaranteed by:
• the capability of frequently updating the hardware at the level of single workstations;
• the possibility of enlarging a system by simply adding new workstations.
Not only hardware factors make the use of those systems as virtual parallel machines increase, but also software ones:
• the standardization of the many UNIX versions and the high level of connectivity reached among different computer manufacturers;
• the multi-user scheduling capabilities, making the system always available for execution;
• the fact that simple (and often low-cost) software add-ons can turn networked computers into a ready-to-use message-passing parallel architecture.
One of the key points that could make parallel programming on a network of workstations a widely diffused technology is standardization: software investments must be guaranteed by providing parallel libraries with a defined and long-lived interface, available on most commercial systems. In addition, to provide hardware transparency (heterogeneous nodes within a network may have different internal data representations), data translation protocols are needed. Currently, many development environments are available that try to overcome these problems. However, the programming level defined by those libraries is usually low, leaving the user in charge of explicit message passing and process creation: we believe that higher-level parallel programming environments are needed. For these reasons, we decided to implement the support of the PO programming environment upon the library that, in our view, is becoming a standard.
4.2. PVM

Presently, one of the most widely diffused libraries for parallel programming on networks of workstations is the Parallel Virtual Machine (PVM) [1, 2]. Developed at the Oak Ridge National Laboratory, it is a well-known and supported system for message-passing programming and remote task execution management over a network of TCP/IP-connected heterogeneous machines. Its availability as public domain software, its frequent updates, the wide and growing number of supported systems, and its simplicity have made PVM the "de facto" standard for the management and programming of parallel virtual machines. The main features of PVM are:
• a message-passing communication protocol, in order to support distributed memory machines;
• non-blocking communications, in order to enhance concurrent execution;
• automatic data format translation, to maintain data integrity during heterogeneous inter-node communications;
• processes that can access all the external resources of the node where they are running;
• unique process IDs across the whole virtual machine, to allow global referencing;
• dynamic process creation, with some limited, but user-modifiable, load balancing capabilities;
• use of the standard C and Fortran compilers and libraries available on the nodes;
• availability of graphical tools to help users during the development and testing phases.
Providing only simple message-passing primitives, PVM can be used as a basis for more elaborate and powerful programming environments, using it as a lower-level abstraction layer that hides all the implementation details of the underlying operating systems and hardware architectures.
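As a concrete illustration of the programming level PVM defines, a minimal task that enrolls in the virtual machine, spawns a worker from an executable file and exchanges one message could look like the following sketch (the executable name "worker" and the message tags are our placeholders; error handling is omitted, and a running PVM daemon is assumed):

```c
#include <stdio.h>
#include "pvm3.h"   /* PVM message-passing library */

int main(void)
{
    int mytid = pvm_mytid();              /* enroll in the virtual machine */
    int worker_tid;

    /* Spawn one worker task from the named executable file;
     * "worker" is a placeholder for an application binary. */
    pvm_spawn("worker", NULL, PvmTaskDefault, "", 1, &worker_tid);

    /* Pack an integer and send it to the worker (message tag 1). */
    int value = 42;
    pvm_initsend(PvmDataDefault);         /* XDR encoding: heterogeneity-safe */
    pvm_pkint(&value, 1, 1);
    pvm_send(worker_tid, 1);

    /* Block until the worker replies with tag 2, then unpack. */
    int reply;
    pvm_recv(worker_tid, 2);
    pvm_upkint(&reply, 1, 1);
    printf("task %x received reply %d\n", mytid, reply);

    pvm_exit();                           /* leave the virtual machine */
    return 0;
}
```

Every call above — spawning by file name, explicit packing, tagged send/receive — is exactly the low-level bookkeeping the PO support layer is meant to hide from the application programmer.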

5. From MCS to PVM: Porting Issues

To provide the same PO programming paradigm on top of PVM, a different design of the support system is required, due to the architectural differences of virtual machines with respect to massively parallel ones. In particular, with reference to the key points analyzed in section 3.3, the implementation of the PO support onto a network of PVM workstations has to deal with a different scenario:
1. PO processes are mapped onto UNIX processes: memory protection mechanisms therefore prevent memory sharing among them;
2. the slow (compared with the time needed to access local memory), non-dedicated communication network forbids state distribution among nodes;
3. the available physical resources are shared among several users and several applications;
4. there is the capability of dynamically accessing the file system to load new code on demand.
Let us analyze how these characteristics have influenced the porting of the PO support.

5.1. The Problem of the State

The lack of shared memory among processes has posed the problem of implementing the object state: the state of an object, in fact, must be shared among the concurrent activities within it. Since any PVM process is a closed entity, providing this capability is not trivial. Possible solutions are:
• to create a PVM process that manages the state the same way the State Manager does in the MCS support;
• to step outside the PVM model by using threads, such as Sun lightweight processes [13] or DCE threads [14].
Both solutions present drawbacks. The former makes accesses to the state costly, since any access is translated into a slow request sent to the State Manager. The latter makes the implementation of the support non-standard. We have chosen to overcome these problems by adopting the UNIX shared memory library within a PVM process [13]. This keeps the support portable, since this library is available on most UNIX-based systems. Note, moreover, that any access to the state is translated by the preprocessor into a primitive of the support (see section 3.1): there is no need to know how the shared state is implemented, and any change in the implementation of state access would not undermine the portability of PO applications. Our hope is that the urge for lightweight processes to achieve greater flexibility in the area of distributed environments, which emerged from our experience too, will produce feedback in future PVM releases.

5.2. Concentrated vs. Distributed Objects

With regard to the problem of accessing the object state, one could object that even in the MCS implementation access to the object state is granted by the State Manager processes. However, an important difference arises: in the transputer implementation, State Managers were introduced only to provide remote state access capabilities (which, thanks to the high-speed network, can be effectively granted). In a distributed implementation, the cost of accessing the network is higher and unpredictable. Thus, we have chosen not to allow, as a general rule, an object to be distributed onto more than one node. In particular, any activity within an object that needs to access the state is not allowed to execute on a node different from the one where the state resides, i.e., the Object Manager node. Without this limitation, the cost of accessing such a state would become too high, degrading the execution time. Only activities that can execute independently without accessing the state, or activities that belong to an object with no state at all, can be allocated anywhere in the system. Because of

this choice, no support for remotely accessing the object state was provided.

5.3. Structuring Application Code

A distributed architecture, composed of a set of networked workstations, has the capability of making resources shared among several users and/or several applications. This means that, in general, protection and authentication mechanisms have to be added to the system to protect execution against unauthorized accesses. Using only the TCP/IP communication protocol and the normal UNIX libraries, PVM builds the virtual machine in user space, thus allowing many non-intersecting virtual machines, owned by different users, to run on the same nodes. Running in user space, PVM inherits the standard UNIX access controls, because only an authorized user can start PVM processes and only acknowledged machines can become PVM nodes.

5.4. Dynamic Code Loading

In the MCS support, no code can be executed by a process unless it was loaded at application boot time. In the PVM system, instead, the creation of a new process is associated with the name of the file that contains its executable code. Thanks to this capability, code can be loaded onto the system nodes on demand and, thus, memory usage can be improved. However, to provide this characteristic in the support, the preprocessor, the way it builds the inheritance lattice, and the enriched PO code it produces have to change. In particular, all class inheritance must be resolved at compile time.

5.5. The Preprocessor

The PO preprocessor is in charge of parsing the PO code and translating it into C code enriched with the PO primitives. However, it is also in charge of resolving inheritance and building the class hierarchy. In the MCS support, the globally shared memory makes it possible to provide the inheritance lattice as a global data structure. In a PVM system, as in all non-threaded operating systems, there is no way of accessing other tasks' memory to execute code; thus, whenever an object, or an activity within an object, has to be created, the associated code must be loaded from the file system. To provide a global view of the inheritance hierarchy, one could build a global file containing the whole application hierarchy and make objects and activities execute the various parts of this file. However, such a choice is neither feasible nor convenient, because loading a file from mass storage is a very expensive task.

For these reasons, the way the preprocessor acts on the PO code to produce C code has to be changed: each class implementation requires a copy of all the inherited method code in self-contained C files. In more detail, the translation phase breaks each class implementation into two different files:
• a class description file, which maintains a description of the state instance variables, of the interface provided by the objects of the class, and of the class scheduling policy;
• a file containing the implementation of all class methods.
As a consequence, the produced code is globally larger than the MCS one. Moreover, dynamic binding is no longer possible.

6. The PO-PVM Support

6.1. Intra-Object Management

In the PVM implementation, for each class, the Object Manager code is executed from the class description file. As soon as a new object is created, the OM is in charge of:
• initializing the internal state by allocating it in a shared memory area and setting it to the proper values before activity start-up;
• connecting itself to the PVM communication subsystem in order to provide a gateway for incoming method invocations.

Figure 2. A PVM PO Object (service requests reach the OM; activities share the state through a shared memory segment)

A PO object, in fact, is a complex autonomous entity capable of reacting to asynchronous events. Then, after the initialization phase, the Object Manager waits on the PVM port for a message. In particular, there are two possible message types corresponding to different events:

• external messages, sent by other objects requesting method execution;
• internal messages, sent by the object activities to signal a change in their execution state.
Whenever an external message is received, the OM:
• verifies its correctness with respect to the provided interface;
• checks the possibility of activating the operation with respect to the already executing activities. If the object internal scheduling policy allows activation, the OM starts a new PVM task from the method implementation file and sends it the requested method name together with the parameters; otherwise the request is enqueued in the internal waiting-to-start task list.
To maintain efficiency, each activity within an object is an autonomous entity: it is capable of sending the reply directly to the client that requested it, bypassing the Object Manager and thus avoiding unnecessary multiple slow PVM calls. Once it has accomplished its task, each activity sends an internal message to the OM, allowing the OM itself to test the activation of enqueued requests with respect to the synchronization constraints and, if it is the case, to activate them. The OM is also responsible for object shutdown, required by an explicit termination message to objects that are not used anymore.

6.2. Inter-Object Management

Currently, the inter-object support level is limited to the presence of a single entity - we call it the PO spawner, in analogy with the PVM terminology - replicated onto each system node (see figure 3). The PO spawner is in charge of:
• activating new objects;
• terminating running ones;
• providing name service facilities for public objects.
The creation of a new object is explicitly commanded by means of a message sent to the PO spawner. The allocation of the new object can be decided either by the PO spawner, on the basis of an integrated policy, or by the PVM system itself on the basis of its internal, cyclic, allocation policy.
In PO there are two types of objects:
• the private ones, owned by their creator and not accessible in a public manner;
• the public ones, that are created with a public name that can be used by other objects that need their services.
Public names require some checking in order to avoid duplications and/or bad name resolution. A simple solution to this problem is charging a single PO spawner with providing name services. This is possible, and is currently used in our implementation, but produces network traffic towards the node of the PO spawner in charge of the naming service. To avoid this problem, a distributed implementation of the name server will be adopted in the future.

Figure 3. The PO-PVM Support (on each UNIX host, a PO spawner runs on top of the PVM daemon; application objects run on top of the PO spawners, and the hosts are connected by the network)

7. Conclusions and Future Works

The paper presented the Parallel Objects environment and reported our experience in porting it from a massively parallel architecture to the PVM system. The work was motivated by the clear advantages of having a standard parallel programming library over a networked computer system to be used as the target architecture for the development of parallel and distributed applications. The PO environment was originally conceived for massively parallel architectures, in particular transputer-based ones: the scenario of the PVM system forced several design rethinkings. In particular, the main topics we had to deal with are:
• the different memory model;
• the different network model;
• the different system scheduling policy;
• the different resources the system nodes can access.
First experimental results show that the advantages of a high-level programming environment do not cost too much in performance, i.e., the overhead of the support is limited with respect to the one imposed by the PVM system itself. Future work will deal with extending the support in order to allow object migration and persistency.

References

1. G. A. Geist, V. S. Sunderam, "Network-Based Concurrent Computing on the PVM System", Concurrency: Practice and Experience, Vol. 4, No. 4, June 1992.
2. A. Beguelin et al., "Recent Advances to PVM", 1994, available via ftp at netlib2.cs.utk.edu:/pvm3.
3. M. Boari et al., "A Programming Environment Based on Parallel Objects for Transputer Architectures", Models and Tools for Massively Parallel Architectures, Napoli (I), June 1993.
4. P. Wegner, "Concepts and Paradigms of Object-Oriented Programming", ACM OOPS Messenger, Vol. 1, No. 1, Aug. 1990.
5. Meiko Ltd., "Computing Surface Reference Manual", Meiko Ltd., 1989.
6. R. S. Chin, S. T. Chanson, "Distributed Object-Based Programming Systems", ACM Computing Surveys, Vol. 23, No. 1, March 1991.
7. A. Corradi, L. Leonardi, "PO Constraints as Tools to Synchronize Active Objects", The Journal of Object-Oriented Programming, Vol. 4, No. 6, Oct. 1991.
8. A. Ciampolini, A. Corradi, L. Leonardi, "The Support for a Dynamic Parallel Object Model on a Transputer-Based Architecture", IEEE Int. Phoenix Conf. on Computers and Communications, March 1991.
9. A. Ciampolini, A. Corradi, L. Leonardi, F. Zambonelli, "The Benefits of Migration in a Parallel Objects Environment", EUROMICRO Workshop on Parallel and Distributed Processing, Malaga (E), Jan. 1994.
10. A. Corradi, L. Leonardi, F. Zambonelli, "Load Balancing Strategies for Massively Parallel Architectures", Parallel Processing Letters, Vol. 3, No. 2&3, Sept. 1992.
11. N. G. Shivaratri, P. Krueger, M. Singhal, "Load Distributing for Locally Distributed Systems", IEEE Computer, Vol. 25, No. 12, Dec. 1992.
12. M. Boari et al., "Adaptive Routing for Dynamic Applications in Massively Parallel Architectures", IEEE Parallel and Distributed Technology, Vol. 3, No. 1, Spring 1995.
13. Sun Ltd., "SunOS Reference Manual", N-8003827-10, 1990.
14. Open Software Foundation, "Introduction to OSF DCE", Prentice Hall, N.J., 1992.
