Using the Object Space: a Distributed Parallel make¹
Andreas Polze
e-mail: [email protected]
Institut für Informatik, Freie Universität Berlin
14195 Berlin, Takustr. 9, FR Germany

Abstract
We present the Object Space approach to distributed computation. It allows for decoupled communication between program components by providing a shared data space of objects. The Object Space approach extends the sequential language C++ with coordination and communication primitives as known from Linda. It integrates inheritance into the associative addressing scheme and facilitates passing of arbitrary objects between program components. Furthermore we introduce the notion of application-specific matching functions. A prototype for Object Space has been implemented in C++ under UNIX. We give a distributed parallel make as an example to demonstrate the ideas and use of the concepts developed.

1. Introduction
In this paper we develop a model to integrate object-oriented programming paradigms like inheritance and data encapsulation with decoupled communication between components of a distributed application as known from Linda [1]. Within our model processes may exchange arbitrary objects. When matching one object against another, inheritance is taken into account by an associative addressing scheme. The notion of application-specific matching functions allows access to more than one object within a single operation. This extends the coordination model of Linda [2]. We define the Object Space Language (OSL) to express communication and synchronization constructs and briefly explain the prototype implementation of Object Space. A distributed parallel make serves as an example to show the use of the ideas and concepts developed. The final section of the paper presents conclusions and an outline of future work.

2. The Object Space approach
Object Space supports communication and synchronization between components of a distributed application. All components (clients) get access to a shared associative data store known as the Object Space. Every client may write objects into the store which can subsequently be read by others. In order to read an object a component presents a template which is matched against the objects. If no matching object can be found within Object Space then the read operation blocks. This way objects may be passed from one component to another. This style of communication is called decoupled: the sender of an object does not know its receiver, nor vice versa. If several components try to read one object, it is nondeterministic which of them will succeed. Object Space itself is implemented in a distributed fashion, employing Object Space Manager processes in different nodes of a network. Decoupled communication is well suited for scalable distributed computing within a network of workstations. Object Space integrates coordination constructs (communication and synchronization) into the C++ language. Two other approaches which combine object-oriented programming languages with a Linda-like communication style may be found in [4] and [5].
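The decoupled, blocking style of communication over a shared store can be illustrated with a small in-process sketch. It is not the prototype's code: the names ToySpace and Entry are hypothetical, a template is modelled simply as a predicate, and a mutex with a condition variable stands in for the Object Space Manager processes.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <list>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical object type; any copyable type would do.
struct Entry {
    std::string name;
    int value;
};

// Hypothetical in-process stand-in for the shared store.
class ToySpace {
public:
    using Template = std::function<bool(const Entry&)>;

    // out: write an object into the store and wake up blocked readers.
    void out(Entry e) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            items_.push_back(std::move(e));
        }
        cv_.notify_all();
    }

    // rd: block until an object matches; return a copy and leave it stored.
    Entry rd(const Template& t) { return take(t, false); }

    // in: like rd, but remove the matching object from the store.
    Entry in(const Template& t) { return take(t, true); }

    // Non-blocking probe (an addition for testing, not an Object Space operation).
    std::optional<Entry> try_rd(const Template& t) {
        std::lock_guard<std::mutex> lock(mu_);
        for (const Entry& e : items_)
            if (t(e)) return e;
        return std::nullopt;
    }

private:
    Entry take(const Template& t, bool remove) {
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            for (auto it = items_.begin(); it != items_.end(); ++it) {
                if (t(*it)) {
                    Entry found = *it;
                    if (remove) items_.erase(it);
                    return found;
                }
            }
            cv_.wait(lock);  // no match yet: sleep until the next out()
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::list<Entry> items_;
};
```

A writer never names a reader: it only calls out(), and whichever blocked reader holds a matching template is served. With several readers waiting for the same object, the sketch (like Object Space) does not define which of them wins.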
2.1. The Object Space Language
The Object Space Language (OSL) defines how communication and synchronization between program components via Object Space may be specified. It is therefore a coordination language in the sense of [2] and has to be embedded into a sequential computational language, C++ in our case. A preprocessor translates coordination constructs written in OSL into the creation of C++ objects and calls to the special library libos++.
1 appeared in the Proceedings of 4th IEEE Workshop on Future Trends of Distributed Computing Systems, Lisbon, September 1993.
OSL-Program      ::=  { object_space_op | local_computation }.
object_space_op  ::=  object ‘‘.’’ op_spec.
op_spec          ::=  rd | in | out | eval .
object           ::=  [ objID ]‘‘(’’ class [ ‘‘:’’ base ] ‘‘,’’ data_comp { ‘‘,’’ data_comp } ‘‘)’’.
data_comp        ::=  formal | actual.
formal           ::=  [ type ] ‘‘?’’ name.
actual           ::=  [ type ] name [ ‘‘=’’ value [ matching_fct ]].
matching_fct     ::=  delta‘‘(’’ diff ‘‘)’’ | or value { or value }.

figure 1: Object Space Language
Matching one object against another requires structure information about both. Since C++ does not provide runtime type information, we need a special mechanism to obtain this structure information. When designing libos++, we have chosen to make the programmer supply structure information by enumerating an object’s data components using calls to special functions from libos++. These functions have to be called within the special member function description() inside a C++ class definition. Besides an enumeration of all components of an object at runtime, we need a means to distinguish between those components that are significant for associative addressing and those that are not. Corresponding to actual and formal data components as described below, the special functions actual() and formal() serve this purpose.

OSL hides these details from the programmer. A preprocessor can automatically generate calls to the special functions mentioned above using information contained in OSL constructs.

Figure 1 shows the definition of the Object Space Language using extended Backus-Naur notation. A program in OSL is a sequence of Object Space operations and local sequential operations. The notion of local_computation is defined by the syntactic rules of C++, OSL’s host language.

In OSL objects may appear without a name. It is useful to leave out the object’s name if that object is used by Object Space operations only (e.g. for synchronization purposes).

An object’s class has to be defined as an ordinary C++ class somewhere. In our implementation of the Object Space approach class names are mapped to unique integer classIDs by a distributed type service. A much more complex technique to obtain runtime type information from C++ classes is described in [6]. Inheritance information is expressed in a distributed fashion as a relation over classIDs. The associative addressing scheme takes this information into account when performing Object Space operations.
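To make the role of description(), actual() and formal() more concrete, the following sketch shows one way such an enumeration could look. The paper does not give the signatures of the libos++ functions, so everything here is an assumption for illustration: the base class Describable, the Component record, and the argument lists of actual() and formal() are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One enumerated data component: name, textual value, and whether it is a
// formal placeholder or an actual value used for associative addressing.
struct Component {
    std::string name;
    std::string value;
    bool is_formal;
};

// Hypothetical base class; the real libos++ interface is not shown in the paper.
class Describable {
public:
    virtual ~Describable() = default;

    // Each class enumerates its own data components here.
    virtual void description() = 0;

    // Collects the components by running description().
    const std::vector<Component>& components() {
        comps_.clear();
        description();
        return comps_;
    }

protected:
    // Assumed signatures: register a component with or without a value.
    void actual(const std::string& name, const std::string& value) {
        comps_.push_back({name, value, false});
    }
    void formal(const std::string& name) { comps_.push_back({name, "", true}); }

private:
    std::vector<Component> comps_;
};

// A tstatC-like class used as a template: f is formal, s carries a value.
class TstatC : public Describable {
public:
    TstatC(std::string f, std::string s) : f_(std::move(f)), s_(std::move(s)) {}

    void description() override {
        formal("f");
        actual("s", s_);
    }

private:
    std::string f_;
    std::string s_;
};
```

In the approach described above, calls of this kind would not be written by hand; the OSL preprocessor would generate them from a construct such as (tstatC, ?f, s="nonexistent").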
The set of types for data components of objects in Object Space contains arbitrary C++ classes besides the C++ builtin types. Although the type of a data component is already defined in the declaration of a class, we allow explicit specification of that type in OSL. The explicitly specified type overrides a type known from the class declaration during matching in the Object Space. Data components may be formal. The notation ?name describes a formal component. Its value is ⊥ (undefined). The type of a formal component, but not its value, is taken into account by the associative addressing scheme within Object Space.

We give an example of operations written in OSL. Let tstatC be a C++ class containing a string component f describing the name of a file and a component s of class state describing the file’s state. Objects of class tstatC could be used to describe the sub-targets of a make target. Several processes may each retrieve one of these objects and perform the actions necessary to generate a particular sub-target in parallel. For demonstration purposes we assume that member function show_f prints component f of a tstatC object:

    (tstatC, f="file1.o", s="nonexistent").out;
    (tstatC, f="file2.o", s="up-to-date").out;
    (tstatC, f="file3.o", s="out-of-date").out;
    tstatC o;
    o(tstatC, ?f, s="nonexistent" or "out-of-date").in;
    o.show_f();
This example demonstrates how three objects are written into Object Space. Afterwards one object is read. The matching function or causes the object written first or last to be read nondeterministically. So within our example either the string file1.o or file3.o is printed out.
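The effect of the template in this example can be reproduced with a few lines of plain C++. This is only a sketch: the struct names and the helper candidates() are hypothetical, an empty optional stands in for the formal component ?f, and the state is simplified to a string.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Plain stand-in for the tstatC class: file name f and state s.
struct TstatC {
    std::string f;
    std::string s;
};

// Template for a read operation. An empty optional plays the role of a
// formal component (?f); the list holds the alternatives joined by `or`.
struct TstatTemplate {
    std::optional<std::string> f;
    std::vector<std::string> s_alternatives;  // empty list = formal
};

bool matches(const TstatTemplate& t, const TstatC& o) {
    if (t.f && *t.f != o.f) return false;
    if (t.s_alternatives.empty()) return true;
    return std::find(t.s_alternatives.begin(), t.s_alternatives.end(), o.s) !=
           t.s_alternatives.end();
}

// Names of all stored objects that an `in` with template t could return.
std::vector<std::string> candidates(const std::vector<TstatC>& space,
                                    const TstatTemplate& t) {
    std::vector<std::string> names;
    for (const TstatC& o : space)
        if (matches(t, o)) names.push_back(o.f);
    return names;
}
```

Applied to the three objects written above, candidates() yields file1.o and file3.o; the in operation returns one of the two, and which one is not determined.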
2.2. The associative addressing scheme
Unlike objects in OSL, objects carry no identity once they have been put into Object Space. Within Object Space an object is represented as the aggregate of class-specific type information and of its visible data components. Inheritance is expressed as a relation over the type information. It is taken into account when performing Object Space operations.

Arbitrary C++ objects may be used as messages within Object Space. Objects from a derived class may match against objects from a base class. During Object Space operations an object may be stored or retrieved as a whole only. So each attempt to access a data component of an object must pass the access control mechanisms known from C++. Thus concepts of object-oriented programming like data encapsulation and inheritance are effective for communication between program components.

The read operations rd and in carry a template as argument. A template is an object with (perhaps) formal data components. It describes a set M_match of matching objects. An element of this set is used to fill the formal data components of the template with values. In OSL templates may be written as objects.

Let us now consider under which circumstances an object matches a template. Templates may contain matching functions. These boolean functions influence the construction of the set M_match. An object matches a template if:
• both are instances of the same class and corresponding data components match, or
• its class is derived from the template’s class and corresponding data components match.

As shown in figure 2, two data components match if one of them is formal, or if they are of the same type and the template’s matching function returns true when applied to the object’s data component.
For each data component m_i of a template and the corresponding component o_i of an object the following holds:

\[ (m_i, o_i) \in \mathit{match} \iff m_i = \bot \ \lor\ o_i = \bot \ \lor\ \mathit{match\_fct}(o_i, m_i) \]

figure 2: relation match

Now let us briefly discuss a matching function available within Object Space. We write the object’s data component to which a matching function is applied as the first argument and separate it from the other arguments by a semicolon. An operation equal is used in the definition of matching functions. This operation is defined for all types used
within Object Space. As an example, figure 3 shows the matching function or as defined within Object Space. Let m_i for 1 ≤ i ≤ n be the alternative values of a template’s data component and let o be a component of an object stored within Object Space:

\[ \mathit{or}(o;\, m_1, \ldots, m_n) = \begin{cases} \text{true}, & \exists\, i:\ o \ \mathit{equal}\ m_i \\ \text{false}, & \text{else} \end{cases} \]

figure 3: matching function or

The associative addressing scheme uses a default function if no matching function is explicitly given for a template’s data component.

Four operations are defined within Object Space:
• out writes an object into Object Space.
• in and rd carry a template as argument. Both operations retrieve a matching object from the Object Space and store its values in the template; they block until a matching object is found. in removes the matching object from Object Space, whereas rd leaves it there.
• eval creates a new UNIX process either locally or remotely. As parameters it carries the command line arguments for a remote process, or the address and arguments of a function which has to be executed locally.

The Object Space approach has been implemented in C++ under UNIX. Within that implementation access to Object Space is available through inheritance from two communication base classes: objsp_comm and objsp_proc. The operations out, in and rd are implemented within class objsp_comm. Furthermore this class provides access to the distributed type service. Client classes which want to use Object Space have to supply their class name to a constructor of class objsp_comm. Operation eval is implemented by class objsp_proc. An instance of that class may initiate process creation. Afterwards it contains a unique identifier for the newly created process, composed of the process’ node name and its UNIX process identifier.
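The matching rules of this section (class equality or derivation, the relation match of figure 2 and the function or of figure 3) can be put together in a short sketch. Several simplifications are assumptions of the sketch rather than properties of Object Space: all component values are strings, so the type check is omitted; "corresponding" components are taken to be those at the same position; a single value in a template is treated as equality, which may differ from the actual default function; and the inheritance relation is assumed to be acyclic with one direct base per class.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

using ClassId = int;
using Value = std::optional<std::string>;  // nullopt stands for a formal component

// Inheritance as a relation over class IDs: derived class -> direct base.
using BaseRelation = std::map<ClassId, ClassId>;

// True if cls equals ancestor or is (transitively) derived from it.
bool same_or_derived(const BaseRelation& bases, ClassId cls, ClassId ancestor) {
    for (;;) {
        if (cls == ancestor) return true;
        auto it = bases.find(cls);
        if (it == bases.end()) return false;
        cls = it->second;
    }
}

struct TemplateComp {
    std::vector<std::string> alternatives;  // empty = formal, several = `or`
};
struct Obj {
    ClassId cls;
    std::vector<Value> comps;
};
struct Templ {
    ClassId cls;
    std::vector<TemplateComp> comps;
};

// Figure 3: true if o equals one of the alternatives.
bool or_fct(const std::string& o, const std::vector<std::string>& m) {
    for (const std::string& mi : m)
        if (o == mi) return true;
    return false;
}

// Figure 2: a pair matches if either side is formal or the matching function holds.
bool comp_match(const TemplateComp& m, const Value& o) {
    if (m.alternatives.empty() || !o) return true;
    return or_fct(*o, m.alternatives);
}

bool object_matches(const BaseRelation& bases, const Templ& t, const Obj& o) {
    if (!same_or_derived(bases, o.cls, t.cls)) return false;
    // A derived object may carry additional components after the inherited ones.
    if (o.comps.size() < t.comps.size()) return false;
    for (std::size_t i = 0; i < t.comps.size(); ++i)
        if (!comp_match(t.comps[i], o.comps[i])) return false;
    return true;
}
```

With class 2 derived from class 1, a template of class 1 matches objects of both classes as long as the components agree, while an object of an unrelated class is rejected even if its components are identical.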
3. Prototype implementation
The Object Space approach provides the illusion of a distributed shared memory. Our prototype implementation in C++ is based on interprocess communication mechanisms available within the UNIX operating system.

Some kind of storage medium is needed to hold the data which make up an object. In our implementation this storage is provided by a separate UNIX process, the Object Space Manager. It uses virtual memory to store the objects. Furthermore this process performs matching between templates and objects. It implements matching strategies which correspond to the matching functions available in the language OSL.

In addition to the Object Space Manager several Client Object Space Manager processes may exist. These processes do not store any objects at all. They just forward operation requests to the Object Space Manager process using a special protocol on top of TCP/IP. Thus crossing machine boundaries is possible. A scenario around Object Space as shown in figure 4 includes several processes, each of them storing objects into and retrieving them from Object Space.
[Diagram: an Object Space Manager and a Client Object Space Manager connected via TCP/IP; libos++ clients 1, 2 and 3 attached to the managers through System V message queues (msgsnd, msgrcv); example operations (intC, c=1).out; and (intC, ?c).in; issued by clients.]

figure 4: distributed Object Space Managers

Client processes communicate with a local Object Space Manager process. They use message queues as communication mechanism. Message queues were introduced with UNIX System V but are now available on nearly every UNIX system. The library libos++ transforms calls to Object Space operations into a proper sequence of msgsnd() and msgrcv() system calls on these message queues. An Object Space Manager handles two message queues, a read queue and a write queue. Each client process contacts the read queue of a local Object Space Manager to initiate operations on the Object Space. Results of operations are returned on the corresponding write queue.
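The System V calls involved are easy to try out in isolation. The sketch below sends one request through a private queue and receives it again within a single process; it shows only the msgget(), msgsnd(), msgrcv() and msgctl() calls. The message layout Request and the function roundtrip() are hypothetical and do not reflect the protocol actually spoken between libos++ and an Object Space Manager.

```cpp
#include <cassert>
#include <cstring>
#include <optional>
#include <string>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

// Layout expected by msgsnd()/msgrcv(): a long type tag followed by the payload.
struct Request {
    long mtype;
    char text[64];
};

// Sends `payload` through a fresh private queue and receives it again.
// Returns nothing if any of the system calls fails.
std::optional<std::string> roundtrip(const std::string& payload) {
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid == -1) return std::nullopt;

    Request out{};  // zero-initialised, so text stays NUL-terminated
    out.mtype = 1;
    std::strncpy(out.text, payload.c_str(), sizeof(out.text) - 1);

    std::optional<std::string> result;
    if (msgsnd(qid, &out, sizeof(out.text), 0) == 0) {
        Request in{};
        // Receive the first message of type 1; blocks until one is available.
        ssize_t n = msgrcv(qid, &in, sizeof(in.text), 1, 0);
        if (n >= 0) result = std::string(in.text);
    }
    msgctl(qid, IPC_RMID, nullptr);  // remove the queue again
    return result;
}
```

In the prototype the two ends of a queue live in different processes, and a manager would use one queue for incoming requests and another for results, as described above.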
4. Distributed parallel make
The standard UNIX utility make may be used to maintain, update and regenerate groups of programs according to their dependencies on each other. Besides dependencies, actions which describe how a target may be generated can be expressed within a makefile. Actions may be written as a sequence of shell commands.

A tool pmake is included in the ISIS package [3]. This ISIS application employs several processes to fulfill the tasks described above. Whilst standard make executes all actions sequentially, pmake may execute actions in parallel. These actions have to be written using a special notation within the makefile.

The pmake program models the actions belonging to a particular target within a directed acyclic graph as shown in figure 5. Actions which have to be executed to generate a make target are represented as nodes within that graph. Data dependencies between different targets are modeled as edges within the graph. The pmake program executes two phases:
1) The graph showing parallel actions and dependencies between sub-targets is transformed into an ASCII representation and written into file pmake.gph. Sequential actions are executed immediately.
2) Several processes are created, each of them reading the file pmake.gph. These processes execute actions in parallel.

We show a pmake version which uses Object Space as a communication mechanism instead of the ISIS package. Instances of class tstatC as mentioned in section 2.1 describe the state of sub-targets and define a protocol for communication between processes executing pmake steps.
[Diagram: comp1.c and comp4.c are compiled into comp1.o and comp4.o; comp1.o, comp2.o, comp3.o and comp4.o are linked into prog.]

figure 5: pmake scenario

As shown in figure 5, each step belonging to a make target has input dependencies and generates output data which other steps depend on. Input dependencies and output data are usually files within the UNIX filesystem. After execution of pmake’s first phase file pmake.gph contains the steps necessary to generate the sub-targets which are out-of-date.
[Diagram: pmake uses eval to create three do_it processes (Step 1, Step 2, Step 3) around Object Space; each do_it process starts its action via fork(); exec();. The compile steps (cc -c ...) read ("comp1.c") and ("comp4.c") and write ("comp1.o") and ("comp4.o"); the link step (cc -o prog) reads ("comp1.o"), ("comp2.o"), ("comp3.o") and ("comp4.o") and writes ("prog").]

figure 6: distributed pmake processes
The basic algorithm

For each input dependency which is up-to-date an object of class tstatC is written into Object Space. Each object contains a dependency’s name and its state, for example:

    (tstatC, f="comp1.c", s="up-to-date")
    (tstatC, f="comp2.o", s="up-to-date")
    (tstatC, f="comp3.o", s="up-to-date")
    (tstatC, f="comp4.c", s="up-to-date")

Now three do_it processes are created, initiating the compile and link steps shown in figure 5. Each process attempts to read the tstatC objects describing its input dependencies before it performs any action (e.g. starts a compiler). It may block while doing so. After executing an action each do_it process writes a tstatC object into Object Space describing its output file.

At the beginning several processes may block when attempting to read their tstatC objects since their input dependencies are nonexistent or out-of-date. But at least one process will perform its action immediately. As execution advances, every process eventually becomes active. Finally a tstatC object corresponding to the initial pmake target appears in Object Space.

Figure 6 shows the processes and their dependencies corresponding to our example scenario from figure 5. Within figure 6, process creation initiated by eval is shown by thick arrows. Data flow as visible through Object Space out and rd operations is indicated by thin arrows. The filename component of the corresponding tstatC objects appears as a label at these arrows. pmake actions are represented by ellipses. Each of these actions is guarded by an additional do_it process which initiates the appropriate action once its input dependencies are fulfilled. The Object Space operation eval allows processes to be launched on different network nodes.

Care is required to handle cases where generation of a sub-target fails. To avoid blocking of the pmake algorithm an output object is written into Object Space in these cases, too. But the state component of that object describes the corresponding pmake sub-target as nonexistent. Subsequent steps relying on that object then do not perform any pmake action at all. Instead these steps generate output objects whose state component describes the corresponding target as nonexistent. Within our example finally the object

    (tstatC, f="prog", s="nonexistent")
appears in Object Space. Thus the whole pmake algorithm terminates if an action fails.

Another problem appears if a user interrupts execution of the pmake program (e.g. by sending a signal via kill() or via the control-C key). Our implementation allows for propagation of signals over Object Space. A special function within libos++ may be used to send a signal to a process running either locally or remotely. This function takes a process identifier as returned from eval as argument. Currently Object Space processes are not able to catch this kind of signal. They terminate silently when receiving a signal via Object Space. Nevertheless this mechanism allows the implementation of robust distributed applications within the context of UNIX.
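The data-flow character of the algorithm, including the propagation of failures, can be sketched with threads in place of do_it processes. This is an illustration under stated assumptions, not the pmake implementation: StateSpace, Step, do_it() and run_pmake() are hypothetical names, the store lives in one process, actions are ordinary callbacks instead of compiler runs started via fork() and exec(), and inputs are read with a formal state component so that a failed input is seen rather than waited for.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Toy stand-in for Object Space holding tstatC-like (file, state) pairs.
class StateSpace {
public:
    void out(const std::string& f, const std::string& s) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            states_[f] = s;
        }
        cv_.notify_all();
    }

    // rd with a formal state: blocks until an object for file f exists.
    std::string rd(const std::string& f) {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [&] { return states_.count(f) != 0; });
        return states_[f];
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::map<std::string, std::string> states_;
};

struct Step {
    std::vector<std::string> inputs;
    std::string output;
    std::function<bool()> action;  // returns false if the action fails
};

// One do_it guard: wait for all inputs, run the action, publish the result.
void do_it(StateSpace& space, const Step& step) {
    bool inputs_ok = true;
    for (const std::string& input : step.inputs)
        if (space.rd(input) != "up-to-date") inputs_ok = false;
    // The action is skipped if any input is missing; an object is written anyway.
    bool ok = inputs_ok && step.action();
    space.out(step.output, ok ? "up-to-date" : "nonexistent");
}

// Runs all steps in parallel and returns the final state of `target`.
std::string run_pmake(const std::vector<std::string>& sources,
                      const std::vector<Step>& steps,
                      const std::string& target) {
    StateSpace space;
    for (const std::string& src : sources) space.out(src, "up-to-date");

    std::vector<std::thread> workers;
    for (const Step& step : steps)
        workers.emplace_back([&space, &step] { do_it(space, step); });

    std::string result = space.rd(target);
    for (std::thread& w : workers) w.join();
    return result;
}
```

For the scenario of figure 5 the target prog ends up as up-to-date when every action succeeds. If one compile step fails, the link step never runs its action and prog is reported as nonexistent, so the computation terminates instead of blocking.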
5. Conclusions
We have developed the Object Space approach to distributed computing. It provides decoupled communication over a shared data space of C++ objects. Based on the C++ class concept, we are able to define abstract data types describing communication protocols for distributed applications. The associative addressing scheme within Object Space takes inheritance into account. This allows previously designed protocol classes to be extended by derivation of subclasses. Instances of a subclass may be used instead of instances of the base class during Object Space operations.

With a parallel make example we have demonstrated that the Object Space approach provides a powerful mechanism to express communication in data flow algorithms within the context of UNIX. Furthermore Object Space allows for distribution of parallel processes over the nodes of a network. Finally, compared with the ISIS version our pmake algorithm turns out to be simpler and more natural.

Currently we use the Object Space approach to implement a distributed phonebook application [7] and to develop a mechanism for automatic generation of distributed prototype implementations from LOTOS specifications [8].
References
[1] D. Gelernter; Generative communication in Linda; ACM Transactions on Programming Languages and Systems, 7(1):80-112, 1985.
[2] D. Gelernter, N. Carriero; Coordination Languages and their Significance; Communications of the ACM, Vol. 35, No. 2, Feb. 1992.
[3] K. Birman, R. Cooper, T. Joseph, K. Marzullo, M. Makpangou, K. Kane, F. Schmuck and M. Wood; The ISIS System Manual, Version 2.0; Cornell University, May 8, 1990.
[4] R. Jellinghaus; Eiffel Linda: An Object-Oriented Linda Dialect; ACM Sigplan Notices, Vol. 25, No. 3, December 1990.
[5] S. Matsuoka, S. Kawai; Using Tuple Space Communication in Distributed Object-Oriented Languages; Proceedings of OOPSLA’88.
[6] P.-A. Pauw, R. Werring, A. Jansen; An operational metadata system for C++; Proceedings of TOOLS USA’92.
[7] A. Polze; The Object Space Approach: decoupled communication in C++; to appear in Proceedings of TOOLS USA’93.
[8] A. Polze, A. Vogel; Generation of Distributed Prototype Implementations from LOTOS Specifications based on the Object Space Approach; (in German) Report B-93-2, Freie Universität Berlin, Institut für Informatik, 1993.