A CORBA-Based Object Group Service and a Join Service Providing a Transparent Solution for Parallel Programming

Markus Aleksy, Axel Korthaus
University of Mannheim, Germany
{aleksy|korthaus}@wifo3.uni-mannheim.de

Abstract

The field of distributed parallel programming is dominated by tools such as the Parallel Virtual Machine (PVM) [5] and the Message Passing Interface (MPI) [11]. For general distributed computing, on the other hand, mainly standards like the Common Object Request Broker Architecture (CORBA) [12], Remote Method Invocation (RMI) [19], and the Distributed Component Object Model (DCOM) [10] are used. In this paper, we examine the suitability of CORBA-based solutions for meeting application requirements in the field of parallel programming. We outline concepts defined within CORBA which are helpful for the development of parallel applications. Subsequently, we present our design of an Object Group Service and a Join Service which facilitate the development of CORBA-based distributed and parallel software applications by transparently encapsulating typical forking and joining mechanisms often needed in that context.

1. Introduction

In the field of parallel programming, message-passing libraries such as PVM and MPI play a prevailing role. When the Java programming language emerged and became more popular, these solutions were ported to that language and new techniques were developed. There are, for example, products such as JPVM [8], jPVM (formerly JavaPVM) [9], JavaMPI [7], HPJava [6], or DOGMA [3] which aim at enabling parallel and distributed programming with Java. While jPVM accesses the "original" PVM implementation via the Java Native Interface (JNI) [18], JPVM is a purely Java-based solution developed from scratch. A detailed description of JPVM and an assessment of its performance can be found in [4] and [20]. In this paper, we focus on CORBA as a potential middleware solution for distributed parallel programming, and we present our approach to the transparent distribution and joining of data based on different strategies.

The CORBA standard is widely used in the area of object-oriented, distributed systems. Not only does it provide independence from the underlying computer architectures and programming languages, it also allows users a vendor-independent choice of ORB products. A prerequisite for the latter option was the introduction of a unique object reference, the Interoperable Object Reference (IOR), and a standardized transmission protocol, the Internet Inter-ORB Protocol (IIOP), in CORBA 2.0. In the course of our research work on CORBA, we analyzed the interoperability of ORB products from different vendors [14] and examined the portability and exchangeability of the stubs and skeletons produced by the corresponding Interface Definition Language (IDL) compilers [1]. The results of these examinations have shown that collaborations between different C++ and Java ORB products work well and that, as far as Java is concerned, the files generated by the respective IDL compilers can be interchanged in many cases.

The basic communication model propagated by CORBA is synchronous and blocking. Synchronous communication means that the client invokes a server operation and has to pause its own processing until the server has processed the call and finally acknowledged the termination of the operation. An acknowledgement occurs even if the return type of the invoked operation is void. During the processing of the request on the server, the client blocks, i.e., no further activities can take place on the client. In case of communication errors or server failures, the blocking might last for an undesirably long period of time, since no acknowledgement would arrive from the server and the client would not be able to resume its operation before a certain timeout period, which must be specified by the developers or users, has passed. Even if no such errors occur, blocking of the client can be highly undesirable, e.g., in situations where operation calls lead to complex and time-consuming computations. In that case, valuable processing time is unnecessarily lost for the client if it does not depend on the immediate return of the computed result, but could perform computations and evaluations on its own in the meantime.

If CORBA is to be used as the middleware for a distributed, parallel application, this basic communication model seems to be insufficient in most cases. For example, there might be a "master" process residing on one computer in the network which dispatches tasks that are to be executed in parallel by several "worker" processes living on different computers in the network. In that case, a basic requirement is that it must be possible to trigger the workers via CORBA in an asynchronous way in order to achieve parallelism and to enable the master to continue its work. Therefore, we first have to analyze how asynchronous operation invocations can be performed and how client blocking can be avoided in CORBA. The following sections present possible ways to achieve this goal. The description is not limited to the means provided by the CORBA standard itself, but also includes techniques for solving these kinds of problems in Java.

2. CORBA features for asynchronous operation invocation

The CORBA standard defines three possible ways of extending the basic synchronous communication model by asynchronous and/or non-blocking calls (cf. [13]), namely:

• the Interface Definition Language (IDL) keyword oneway,
• the Dynamic Invocation Interface (DII), and
• the Event Service.

The first option is the use of the IDL keyword oneway. With the help of this keyword, it can be specified that a server operation only receives information from the client, but does not return any information. For oneway operations, only "in" parameters are allowed, and the result type of the operation has to be void. Calls to operations specified as oneway are executed according to "best-effort" semantics, i.e., there is no guarantee that these messages will actually be delivered; it is only guaranteed that a call will be delivered at most once. There are no further details with respect to the semantics of oneway calls in the CORBA standard, although the standard obviously implies that these calls are asynchronous and non-blocking. As a matter of fact, all the ORB products we analyzed are based on the common understanding that oneway calls should be implemented to show asynchronous and non-blocking behavior, i.e., after having sent its request, the client does not block, nor does it receive any kind of notification about whether the request was processed successfully or not. Should the request not reach the server at all, e.g., due to a communication error, the client does not get any feedback about that situation either.
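For illustration, a oneway operation might be declared in IDL as follows; the interface and operation names are ours and are not part of the OGS IDL:

// Hypothetical example of a oneway operation: only "in" parameters
// are allowed, and the result type must be void.
interface TaskReceiver {
  oneway void submit(in string task_description);
};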

The second way of achieving our goal is to employ the Dynamic Invocation Interface. In that case, requests have to be made using a special request operation called send_deferred(), and the results can be obtained by calling an operation named get_response() at any point in the program code. Although the invocation is asynchronous, it might nevertheless block the client in case the server has not finished processing the request at that time. If, for any reason, the client must not halt when its flow of control reaches that point, the developer has to fall back on the features of the programming language used. A different alternative is a "dynamic" oneway invocation, executed with the help of operation send_oneway(). Since this approach does not provide any feedback, it is superfluous to call get_response(), and, consequently, no blocking of the client can occur.

The Event Service represents the last of the three currently available possibilities for achieving asynchronous and/or non-blocking communication. Although it does not belong to the CORBA core, it was added to the OMG standard as part of the Common Object Services Specification (COSS). The Event Service applies the Publish-Subscribe design pattern; its users are subdivided into suppliers and consumers. The so-called Event Channel represents the central element of this service and supports the following communication models:

• the pure Pull model,
• the pure Push model,
• the hybrid Pull/Push model, and
• the hybrid Push/Pull model.

As we have already illustrated in our interoperability analysis [14], an Event Service from a specific vendor can be used without any problems together with servers and clients that rely on an ORB from a different vendor. But since the Event Service has several weaknesses, the OMG specified an enhanced solution, the Notification Service, which we do not describe here. Further techniques for synchronous and asynchronous communication in CORBA ("CORBA Messaging") can be found in [16] and [17].

3. Avoiding blocking on the programming language level

Taking into account that asynchronous operation calls are connected with certain restrictions, synchronous calls are to be preferred in many cases. As mentioned before, the main disadvantage of synchronous communication techniques is the blocking of the master. This problem can be solved on the programming language level, provided a language is used which is suitable for that purpose. If C, C++, or Java is used, for example, blocking can be avoided by aptly employing multithreading techniques. However, in the case of C and C++, this might lead to restrictions with regard to portability. Furthermore, regardless of the programming language used, the development effort increases and the source code becomes less understandable.

4. The components of our solution

The solution we suggest consists of:

• an Object Group Service (OGS),
• a Join Service,
• one or more masters, and
• one or more workers.

In this scenario, it is the responsibility of the OGS to pass a request issued by a master to the members of a group of workers. The Join Service is in charge of collecting the workers' results, combining them, and sending a single answer back to the master. Fig. 1 illustrates the structure and the functionality of the OGS and Join Service approach in a graphical way.

[Fig. 1: Structure and functionality of the OGS and Join Service approach. The diagram shows a Master, the GroupManager with Group 1 to Group n, Workers A to D, and the JoinServer.]

In the following section, we will explain the components of our solution in detail. The basic design principle of our approach, i.e., the application of the "divide and conquer" philosophy, is comparable to the Master-Slave design pattern described in [2]. However, by focusing on a CORBA context, adding features for the dynamic management of groups of workers, providing a dedicated service for recombining the partitioned data chunks, and integrating several distinct dispatching and joining strategies, we go far beyond the basic ideas expressed in the Master-Slave design pattern and show how to apply the "divide and conquer" principle in a CORBA environment in practice.

5. Introduction of an Object Group Service (OGS) for parallel processing

Due to the currently insufficient specification of the CORBA standard with regard to group communication, we developed our own Object Group Service (OGS) in order to support the design of applications using parallel processing already on the IDL level. Our primary goals in the development of the OGS were:

• simplicity with respect to the handling of the OGS, supported by a restriction of the scope of the OGS to core functionality,
• generality of functionality by abstraction from specific data structures in order to make the service suitable for a great variety of tasks, and
• support of several different data dispatching strategies in order to allow the realization of different computing algorithms and to decrease network traffic (compared to sending the complete data to each worker).

Our design and the resulting implementation aim at supporting the parallel processing of CORBA operation calls, i.e., a request issued by a master is propagated to any number of workers in parallel, which process the transferred data independently of each other. An OGS client has to implement an interface called Master, containing only a single operation, called receive(), which receives the information issued by the Join Server. A worker, on the other hand, has to implement the interface Worker, including an operation send() which not only gets information about the operation to be executed as an argument, but is also provided with the Interoperable Object Reference (IOR) of the Join Server as well as some information about the call itself. The IOR is required so that the result can be sent back to the master via a callback.
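A minimal IDL sketch of these two interfaces, reconstructed from the description above, might look as follows. Only the operation names receive() and send() are taken from the text; the parameter lists, parameter names, and types are our assumptions:

// Sketch only: Master is implemented by the OGS client, Worker by each worker.
interface Master {
  // called back by the Join Server with the combined end result
  void receive(in any result);
};

interface Worker {
  // invoked by the OGS with the data dispatched to this worker, the
  // Join Server's object reference, and information about the call
  // (this signature is an assumption based on the prose)
  void send(in any data, in Object join_server, in string call_info);
};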

As a first step, the master asynchronously transmits the message to be performed, the dispatching strategy of choice together with additional information, its own IOR, and a particular joining strategy to a particular group of workers by calling the oneway operation send(). The second step comprises informing the Join Server of the prospect of getting results from the workers and dispatching the message to the group members, the workers. In the last step, the workers call the Join Service's operation receive(), where the results of the computations are combined into a single end result that can be sent back to the master.

The responsibility of the operation send() specified in the IDL interface Group is to propagate a call issued by the master to all members of the group. The interface also describes functionality for the attachment and detachment of workers to and from a group, respectively. If a worker tries to register itself although it has already been registered before, an AlreadyRegistered exception will be raised. Similarly, an attempt to detach a worker which is not registered leads to a NotRegistered exception. With the help of operation get_size(), the number of workers belonging to a specific group can be determined. The result of this operation can be helpful when the concrete dispatching strategy has to be decided upon. By calling operation merge(), the OGS informs the Join Service that a master's request has been sent to the members of a group, i.e., the workers registered with the group. This message contains additional information that enables a quicker identification of the incoming worker messages with respect to the different groups and the multicasts involved.

To be able to administer several groups of workers, we defined a GroupManager interface. It provides operations for the creation (create()), listing (list()), resolution (resolve()), and deletion (destroy()) of groups. Operation create() will raise an AlreadyExists exception in case a group with the specified name has already been created, operation resolve() will throw a NotFound exception in case the group does not exist, and, in the same case, the invocation of operation destroy() will lead to a NotAvailable exception.
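Putting the operations and exceptions named above together, the Group and GroupManager interfaces can be sketched in IDL roughly as follows. The operation signatures, the names attach() and detach(), and the return type of list() are our assumptions rather than the authors' published IDL; DataStructure and JoinStrategyInfo are the types introduced below (cf. sections 6 and 7), and Master and Worker are the interfaces sketched above:

// Sketch only; exact signatures are assumptions.
exception AlreadyRegistered {};
exception NotRegistered {};
exception AlreadyExists {};
exception NotFound {};
exception NotAvailable {};

typedef sequence<string> NameSeq;  // assumed return type of list()

interface Group {
  // oneway, so that the master does not block; carries the data, the
  // dispatching/joining information, and the master's object reference
  oneway void send(in DataStructure message,
                   in JoinStrategyInfo join_info,
                   in Master caller);
  void attach(in Worker w) raises (AlreadyRegistered);
  void detach(in Worker w) raises (NotRegistered);
  unsigned long get_size();
};

interface GroupManager {
  Group create(in string name) raises (AlreadyExists);
  NameSeq list();
  Group resolve(in string name) raises (NotFound);
  void destroy(in string name) raises (NotAvailable);
};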

A DataStructure contains the data that has to be distributed as well as information about the dispatching strategy that has been chosen (cf. section 6). A JoinStrategyInfo contains information relevant for the recombination of data (cf. section 7). Both the result parameter of operation receive() and the data element inside DataStructure which is used as the message parameter of operation send() are of CORBA type any. The reason for this design decision is that it enables the transmission of arbitrary data types and structures through the OGS. The price for this flexibility is a certain loss in performance, because marshalling and unmarshalling of type any takes more time than that of simple data types.

As we already pointed out in section 1, it is desirable to minimize the blocking of the master. Therefore, by specifying the Group interface's operation send() as oneway, we use one of CORBA's built-in mechanisms for asynchronous communication. During our studies, we also tested another approach, based on a synchronous call to the OGS, but applying multithreading in order to minimize the master's blocking. In that approach, operation send() simply adds the data arguments to a queue and returns immediately, thus keeping the blocking time of the master very low. For each group of workers, a dispatcher thread is started which reads the queue and dispatches the data to the workers. However, a performance evaluation of the two solutions showed that, because of the reduced communication overhead, the asynchronous solution is about 60 times faster than the synchronous one based on multithreading. This only refers to the time the master is blocked during the invocation of operation send(); the processing of data performed by the workers and the recombination of the partial results are excluded. In our test environment, we had a master on a Pentium III PC with Linux 2.2.10 perform 100 calls to each version of the OGS. The OGSs were located on a SPARCstation 10 with Sun Solaris 2.5, together with the Join Service and five workers (connected via 10 MBit Ethernet). The synchronous calls took 14092 ms, while the asynchronous calls only required 230 ms. In the following examples, we refer to the asynchronous solution only.

6. Data dispatching strategies

The design of our Object Group Service allows the distribution of data independently of the data types or structures at hand. The developer only has to pay attention to the requirement that the data has to be organized as a CORBA sequence, i.e., a vector of variable length, in case a strategy other than simply passing the whole data to all workers is to be applied. The data elements of the sequence, on the other hand, can have any simple or complex structure. Simple basic data types are just as permissible as complex data types containing other data types such as simple types, arrays, sequences, and self-defined (and possibly nested) structures. At runtime, the OGS dynamically determines the type of data contained in the CORBA sequences and copies the data into new subsequences of the same data type. How many of these subsequences have to be created depends on the number of workers and the number of data sets to be dispatched.

A positive aspect of this approach is the possibility to distribute even sequences of data types/structures that were not known when the OGS was developed. Thus, the OGS is capable of covering a broad field of current and future applications. The element dds in a DataStructure specifies the dispatching strategy for the data to be distributed. Furthermore, a DataStructure contains information needed for the execution of the chosen strategy. In addition to the element dds mentioned before, there are some strategy-specific components. The sequence data_size provides the information necessary for the execution of the DIFFERENT_SIZE strategy (see below). The field data_length, on the other hand, is used if the SAME_SIZE strategy applies. The following snippet from the IDL interface of the Object Group Service illustrates this:





enum DataDispatchStrategy {
  ANYTHING,
  PER_ELEMENT,
  SAME_SIZE,
  DIFFERENT_SIZE
};

typedef sequence<unsigned long> sizeSeq;

struct DataStructure {
  DataDispatchStrategy dds;
  sizeSeq data_size;
  unsigned long data_length;
  any data;
};

The OGS supports the following data dispatching strategies:

• ANYTHING: The OGS sends the complete data set to all the workers.
• PER_ELEMENT: The first element is sent to the first worker, the second element to the second worker, and so on. When the last worker has received its data, the cycle starts again with the first worker (provided that there is still more data to be distributed), i.e., given that there are n workers, the first worker gets element n+1, the second worker gets element n+2, and so on, until there are no more elements left.
• SAME_SIZE: With this strategy, the OGS uses the value of data_length as the specification of the number of data elements to be sent to each worker. The amount of data specified in data_length is assigned to each worker, e.g., if data_length=3, the first worker gets the first three data sets, the second worker gets the next three data sets, and so on, until no data sets are left. Should the number of data sets to be transmitted exceed the product of the number of workers and data_length, the surplus data sets are lost. In the opposite case, some workers might receive no data at all.
• DIFFERENT_SIZE: If the DIFFERENT_SIZE strategy is used, the data_size component is evaluated to determine the individual quantity of data sets to be assigned to each specific worker. The sequence data_size consists of integer values that reflect the number of data sets the corresponding worker has to process, i.e., data_size[i] contains the number of elements to be delivered to worker i+1. It is permissible to have the value 0 contained in the sequence, in which case the corresponding worker does not get any data. Should the actual number of data sets to be distributed be bigger than the sum of the integer values contained in data_size, the additional data sets are simply neglected. In the opposite case, the specified quantities are delivered as long as there are data sets left; if the OGS runs out of data, the dispatching is terminated, so that some workers might remain unconsidered.
• Default: If the OGS is not able to recognize the data format provided (i.e., if it is not a CORBA sequence), it will use its default dispatching strategy, which is equivalent to the ANYTHING strategy. Thus, the complete data will be sent to all the workers registered with the service.

Table 1 shows an example consisting of three workers and a simple data sequence: {1, 2, 3, 4, 5}. The rows of the table illustrate how the data will be distributed between the three workers depending on the kind of dispatching strategy in use.

Table 1: Distribution of data resulting from different dispatching strategies

Strategy                      Worker 1         Worker 2         Worker 3
PER_ELEMENT                   1, 4             2, 5             3
DIFFERENT_SIZE { 2, 1, 7 }    1, 2             3                4, 5
SAME_SIZE { 3 }               1, 2, 3          4, 5             (none)
ANYTHING                      1, 2, 3, 4, 5    1, 2, 3, 4, 5    1, 2, 3, 4, 5
UNKNOWN                       1, 2, 3, 4, 5    1, 2, 3, 4, 5    1, 2, 3, 4, 5

7. The Join Service

The Join Service represents the counterpart of the OGS. Its function is to collect and combine the results of the group requests that have been sent to the workers. Subsequently, the end result of the recombination can be sent back to the master. To be able to perform this task, the Join Service needs some information from the master and from the OGS. The information originating from the master states which strategy has been chosen for joining the resulting data. Presently, the following joining strategies are supported:

• COMPLETE and
• PARTIAL.

The COMPLETE strategy has the effect that the Join Service waits for the results of all the workers involved. If one or more workers crash during the computation, no result will be reported back to the Join Service by those workers, and the service threads waiting for the results will have to wait indefinitely. Using the PARTIAL strategy, this drawback can be avoided. In addition to information about the strategy itself, a timeout value has to be specified for the service. In this case, the Join Service waits at most for the specified period of time and subsequently reports back the results of those workers which have already delivered. If all the workers send their answers before the timeout expires, the Join Service will report the complete end result back to the master immediately, without waiting for the end of the timeout period.
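The corresponding IDL is not reproduced in the paper; assuming that the chosen strategy and the timeout are simply bundled in the JoinStrategyInfo structure mentioned in section 5, it might look roughly like this (all names except JoinStrategyInfo, COMPLETE, and PARTIAL are our assumptions):

// Sketch only.
enum JoinStrategy {
  COMPLETE,
  PARTIAL
};

struct JoinStrategyInfo {
  JoinStrategy strategy;
  unsigned long timeout;  // maximum waiting time for PARTIAL; unit assumed to be milliseconds
};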

The data about the joining strategy is transferred from the master to the OGS, which adds its own information about the group identity, the job identity and the identities of the workers involved, and transmits the extended data to the Join Service. This additional information is needed for an accelerated identification of the incoming worker messages. The sequence of messages reaching the Join Service may be arbitrary; because of the identification information the service will be capable of recombining the results accordingly, no matter whether the first worker is also the first one to reply or not.
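Based on this description, the Join Service interface itself can be sketched as follows; merge() and receive() are the operation names given in section 5, whereas the parameter lists are our assumptions (NameSeq and JoinStrategyInfo as in the sketches above):

// Sketch only; exact signatures are assumptions.
interface JoinServer {
  // called by the OGS when a master's request has been dispatched to a group;
  // announces the group and job identities, the workers involved, the joining
  // strategy, and the master to be called back with the combined result
  void merge(in string group_id, in string job_id, in NameSeq worker_ids,
             in JoinStrategyInfo join_info, in Master caller);

  // called by each worker to deliver its partial result for a given job
  void receive(in any result, in string job_id, in string worker_id);
};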

8. An OGS scenario modeled with UML

Figure 2 shows a UML deployment diagram corresponding to our implementation. It is an instance-level diagram depicting a snapshot of a system implementing the OGS and Join Service approach at runtime. The different computational nodes, e.g., the computer on which the master is running, the network, etc., are symbolized by 3D boxes. The CORBA objects implementing the master, the workers, the OGS, and the Join Service are modeled as UML components with explicit interfaces (shown in "lollipop" notation) which represent their IDL interfaces. The solid lines between the nodes show the hardware connections used for the actual flows of information. Dependencies between the components are expressed by dashed lines, e.g., when a component uses an interface of another component. For example, the master component depends on the GroupManager interface of the :GroupManager component in order to be able to receive a list of available groups by calling operation list().

Let us have a look at an example scenario of the use of the OGS and Join Service for parallel processing purposes. The UML sequence diagram in Fig. 3 shows the dynamic flow of messages in the scenario described below. In the example, worker1 creates two groups g1 and g2 and registers itself with these groups. With the help of operation list(), worker2 finds out which groups exist and registers with them. Finally, the master requests a list of the existing groups and sends a message to groups g1 and g2, from where the data is dispatched to the workers and the information required for recollecting the data is transmitted to the Join Service.

[Fig. 2: UML deployment diagram with the nodes :ClientMachine, :OGSMachine, and :ServerMachine, the components master:Master, grpman:GroupManager, and worker1:Worker with their Master, GroupManager, and Worker interfaces, and a callback dependency between the components]
