A Case for Event-Driven Distributed Objects

Aliandro Lima, Walfredo Cirne, Francisco Brasileiro, and Daniel Fireman
Laboratório de Sistemas Distribuídos, Departamento de Sistemas e Computação
Universidade Federal de Campina Grande, 58109-970, Campina Grande, PB, Brazil
[aliandro,walfredo,fubica,fireman]@dsc.ufcg.edu.br

Abstract. Much work has been done to make the development of distributed systems as close as sensible to the development of centralized systems. As a result, there are today good distributed object solutions that closely resemble centralized programming. However, this very attempt to mimic centralized programming implies that distributed objects create the illusion that threads traverse the whole distributed application. This brings all the problems related to multi-threaded programming, including the need to reason about the thread behavior of the whole application, which is amplified by the large scale and inherent non-determinism of distributed systems. Moreover, distributed objects present other troubles when the application is not “pure” client-server, i.e., when the client has other things to do besides waiting for the server. As an alternative, there are a number of message-based non-blocking communication solutions. Unfortunately, these solutions were not designed to directly address the above-mentioned issue of multi-threading over the whole distributed application. In addition: (i) these solutions are not as well integrated with the programming language as distributed objects, and (ii) most of them do not provide a well-defined embedded failure detection mechanism, something that is crucial for the development of many distributed systems, and that is well solved by distributed objects (as they couple method invocation and failure detection). We here propose and evaluate an improvement on this status quo, named JIC (Java Internet Communication). JIC is an event-driven middleware that relies on a non-blocking communication model, yet provides semantics close to the object-oriented paradigm. JIC is designed to combine the best characteristics of distributed objects and message-based solutions. For instance, JIC defines a precise scope for the application’s threads, promotes non-blocking communication, provides a failure detection service that is simple to use and has precise semantics, and has performance comparable to Java RMI. Furthermore, JIC is designed to be firewall and NAT friendly, greatly helping the deployment of JIC-based applications across multiple administrative domains.

1 Introduction

In the last decades, much work has been done to make the development of distributed systems easier. Indeed, a great number of those efforts relied on papering over the distinction between centralized and distributed programming. If distributed programming were as simple as centralized programming, distributed system developers could take full advantage of the object-oriented paradigm and write their programs without worrying about object location.

However, programming a distributed system cannot be as simple as programming a centralized system. There are fundamental characteristics that are not present in a local environment, like partial failure in the absence of a central manager, mandatory concurrency and partial connectivity (the last one due to the increasing use of firewalls and NATs). Therefore, unifying the programming models for local and distributed objects is doomed to failure if the particular characteristics of distributed systems are not considered [1]. This means that such a unification, in the best case, would make local programming as complex as distributed programming.

Such a realization allowed for great progress with distributed objects [2], as some solutions can cope with the differences between distributed and centralized programming while integrating well with the programming language (e.g., Java RMI [3, 4] and CORBA [5]). These solutions provide middleware that hides the basic building blocks of processes and message passing, providing a distributed programming model as close as sensible to the object-oriented paradigm. This is done via a blocking communication paradigm, in which a client invokes a server and blocks until receiving a response or detecting a failure.

Note that the blocking client thread gives the illusion that the thread goes beyond the local address space, traversing the distributed application. In reality, each invocation is remotely performed by a new thread. Because of that, the server application is forced to be multi-threaded even if it could otherwise process remote requests sequentially.
Much worse, since the new thread can be thought of as an extension of the invoking thread, the programmer must reason about the thread behavior of the whole distributed system to ensure that thread-related problems do not occur. In particular, one cannot combine two correct components and know whether the result is still correct without examining the code of both [6]. The problem arises because one has no control over how third-party code acquires locks, and thus the composition of components may lead to distributed deadlocks. In short, if centralized multi-threaded programming is already complex and failure-prone [7], distributed multi-threaded programming only makes matters worse.

Moreover, the blocking nature of distributed object invocation does not fit well with applications that are not “pure” client-server ones, i.e., those in which the client has other things to do besides waiting for the server (e.g., interactive, peer-to-peer and performance-constrained [8] applications). The most obvious way to circumvent this problem seems to be creating an extra thread to block on each invocation, releasing the original thread to continue working. However, this solution has a series of drawbacks. First, creating a new thread for each new invocation may lead to a thread explosion in the client if it is performing a great number of simultaneous invocations. Moreover, if the server needs to call back the client, communication initiation should still be possible in both directions (from client to server and vice versa), which is typically a problem due to firewalls and NATs. Finally, since the application thread does not block, the failure detection mechanism must be augmented. Note that, in distributed objects, the failure detection mechanism is embedded in the remote method invocation. If the invocation is done through a new thread, there should be a mechanism to notify the original thread when a failure occurs.

A possible way to deal with the problems of distributed objects is to avoid them altogether and use message-based communication solutions instead. There are a number of message-based communication alternatives available, ranging from sockets to sophisticated Event-Based Middleware (EBM) [9, 10] and Message-Oriented Middleware (MOM) [11, 12], passing through parallel programming support such as PVM [13] and MPI [14]. These solutions do not block waiting for results, and thus some of the problems described above simply do not exist (thread explosion, for example). On the downside, they do not integrate with the programming language as nicely as distributed objects.

In 2005, we faced the need for a communication solution that could solve the problems discussed while remaining lightweight and working well in environments composed of multiple administrative domains (read: firewalls and NATs are present) where failures are common. Such a need came from the OurGrid project, a free-to-join peer-to-peer computational grid in which research labs donate their spare computational resources to other labs [15]. OurGrid was based on Java RMI and, although it has been in production since December 2004, the problems of using distributed objects were making the system brittle and hampering its evolution [16]. It was clear that the communication solution we needed was message-based. However, we could not find an existing message-based solution that was a good match for OurGrid’s needs.
The problem was that existing solutions were designed with other goals in mind. Sockets simply aim to expose the transport layer to the application programmer. EBMs and MOMs try to decouple the components of a distributed system via the publish-subscribe paradigm, often augmented by support services, like store and forward, message routing and transaction management, provided by integration containers. PVM and MPI, in turn, are designed to facilitate collective operations, hence easing the development of parallel applications. None of them, for example, has good support for firewall and NAT traversal, or aims to scope out the threads of a distributed application.

JIC (Java Internet Communication) then appeared as a solution for our needs in OurGrid. And, since we had the chance to develop a novel solution, we made it well integrated with the programming language (in this case, Java), challenging the belief that message-based solutions must rely on send/receive primitives. The key point of keeping a good integration with the programming language is to allow the type-checking system to work also for the interaction among distributed components.

As we show in detail throughout the paper, JIC (i) is event-driven (i.e., a message-based solution in which messages are consumed by threads that process them in an event loop); (ii) relies on a non-blocking communication model, yet provides semantics close to the object-oriented paradigm; (iii) creates explicit scope for threads, called access points; (iv) is NAT and firewall friendly; (v) provides a failure detection service that is simple to use and has precise semantics; and (vi) has performance comparable to that of Java RMI.

We are currently working on migrating OurGrid [15] from RMI to JIC. This ongoing experience suggests that JIC can be a very interesting solution for applications that are not well served by distributed objects or other event-based solutions. We conjecture that JIC should serve well most applications that are still written at the socket level, like most peer-to-peer applications.

The remainder of this paper is organized as follows: the next section introduces the JIC solution, highlighting how the discussed problems are addressed and explaining the semantics that JIC provides to the programmer who develops a distributed application using it. We also explain how these semantics are achieved. After that, in Section 3 we present the results obtained from a performance and software engineering evaluation of JIC. Then, we discuss related work in Section 4 and finally conclude in Section 5.

2 The JIC Solution

JIC’s main goal is to provide non-blocking access to Java objects located in a distributed system in a way that threads can have a well-defined scope. As such, we require that JIC objects only carry void-returning methods, i.e., methods that return no value. Thus, the client object can promptly proceed to the next line of code. The server object will have the target method eventually invoked.

Clearly, JIC objects are different from Java objects. Besides the required void type, the methods of JIC objects throw no exceptions. A more subtle (and more important) difference is that the thread from the client (invoker) object never goes into the server (invoked) object. Invocation only marks the invoked method to be executed by the server’s own thread (actually, by the access point thread, as we shall see shortly). Nevertheless, JIC provides automatic object marshaling (saving the programmer from creating tedious “message” classes) and enables Java type-checking to work. Type-checking greatly helps to catch bugs in early development stages, and it is this characteristic that makes us consider JIC well integrated with the language. Note that the automatic object marshaling and type-checking are possible due to the use of stub objects, which in JIC are automatically generated at run time.

A JIC object must implement the EventProcessor interface, which denotes that the object can be exported by JIC (as Remote denotes an RMI object), and contains a few methods that must be implemented by any JIC object. The code fragment below shows an excerpt of the OurGrid EventProcessor that performs job submission.

    public interface MyGrid extends EventProcessor {
        public void addJob(JobSpecification jobSpec);
    }

    public class MyGridImpl extends SimpleEventProcessor implements MyGrid {
        public void addJob(JobSpecification jobSpec) {
            ...
        }
    }

Notice that the MyGrid interface extends EventProcessor. In addition, the MyGridImpl class implements the EventProcessor interface by extending the SimpleEventProcessor class. SimpleEventProcessor is a default implementation of the EventProcessor methods and provides specialized semantics for remote objects, overriding the default local semantics of the equals, toString and hashCode methods. It was developed to spare the programmer from implementing these methods for each application EventProcessor. Hence, by extending SimpleEventProcessor, it is possible to inherit the semantics provided by JIC.

2.1 Connectivity

JIC is built on top of Jabber [17, 18], which provides a good substratum for firewall- and NAT-friendly message passing. As such, this is the infrastructure used by EventProcessor objects to communicate. Jabber is a set of streaming XML protocols, mostly used for instant messaging, that enables entities to exchange messages in close to real time. Its architecture is similar to the e-mail architecture. If two entities want to communicate, each must create an account on a Jabber server, receiving a Jabber Identification (JID). An entity that has a JID and connects to a Jabber server is named a Jabber client. JIDs are based on DNS and recognized URI schemes, so that they have a form similar to an e-mail address, as shown below:

    node@domain/resource

In the JID above, domain is the Fully Qualified Domain Name (FQDN) of the Jabber server, node is the client that connects to the server, and resource is a session-specific object that belongs to a client, such as a device or location. Therefore, a message sent from one Jabber client to another first goes to the server of the sending client, which forwards it to the server of the recipient client, which finally relays it to the recipient client.

Note that the use of Jabber servers as relays greatly reduces the problems posed by the use of firewalls and NATs. These problems arise because firewalls and NATs limit the connectivity between peers, typically making it possible to start a connection from only one direction. Since every EventProcessor communicates through a Jabber server, one needs to open just a single port per administrative domain, no matter how many objects are exported by an application. This port is exactly the one that makes the Jabber server accessible by other Jabber servers. Indeed, if the instant messaging port (5222) is open by default, Jabber communication can be done without any additional administrative effort.

2.2 Access Points

Since JIC uses Jabber as a NAT- and firewall-friendly communication substratum, JIC method invocations are sent to Jabber clients. Each Jabber client used by JIC is known as an AccessPoint. An access point is the entry point for a set of JIC objects. In order to receive remote invocations, a JIC object must be exported by some access point. The following code fragment creates an access point and exports the OurGrid job submission object in the access point peer.

    ...
    MyGrid mygrid = new MyGridImpl();
    String myGridName = "mygrid";
    ...
    String accessPointName = "[email protected]/myGridAP";
    AccessPoint accessPoint = new AccessPoint(accessPointName);
    accessPoint.start();
    accessPoint.bind(myGridName, mygrid);

Each access point contains a queue of invocations to be performed on the objects it holds, as well as a set of threads that perform such invocations, as depicted in Figure 1. That is, any invocation of a method m that belongs to an object obj exported via the access point ap is marshaled by the JIC runtime, transferred to ap using Jabber, and stored in ap’s invocation queue. The threads that belong to the access point ap remain in an eternal loop: get an invocation from the queue and perform it. Note that the invocation queue is processed in FIFO order. Therefore, if ap was created with t threads, and there are t + u not yet completed invocations of ap’s objects, the last u invocations remain in the queue waiting for the t threads to finish their invocations. In particular, if ap has a single thread (i.e., t = 1), the JIC objects exported by ap do not need to worry about synchronization and thread safety.

Note that an access point is a self-contained component that sets boundaries for its threads. It is exactly this feature that makes threads more tractable in JIC-based applications. In JIC, threads do not travel over the whole distributed application. Instead, each thread remains confined within its own access point. This renders thread usage composable and modular in JIC. One does not need to understand and reason about the thread behavior of the whole application; this can be done one access point at a time. Note, however, that if two access points are running on the same JVM, it is currently up to the programmer to ensure that the threads from one access point do not traverse objects of the other access point. We are currently working on automatic ways to instrument the code and use monitoring techniques to detect violations of this rule [19].

[Figure 1 labels: AccessPoint; Jabber Message Communication Object; JICEvent; EventHandler; MethodCall; EventProcessor; n threads]

Fig. 1. Access point overview
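The queue-and-threads behavior of an access point, together with the idea of a run-time generated stub, can be illustrated with a minimal sketch. This is not JIC code: AccessPointSketch, JobSink, export and the thread names are hypothetical, the stub is built with the standard java.lang.reflect.Proxy, and the Jabber transport and marshaling steps are left out, so the "remote" call is simply queued inside one JVM.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical JIC-style interface: void methods only, no declared exceptions. */
interface JobSink {
    void addJob(String jobSpec);
}

/** Illustrative access point (not the JIC class): one FIFO queue, t confined threads. */
final class AccessPointSketch {
    private final BlockingQueue<Runnable> invocations = new LinkedBlockingQueue<>();
    private final List<Thread> workers = new ArrayList<>();

    AccessPointSketch(int t) {
        for (int i = 0; i < t; i++) {
            Thread worker = new Thread(this::eventLoop, "ap-thread-" + i);
            worker.setDaemon(true); // let the JVM exit when the demo ends
            workers.add(worker);
        }
    }

    void start() {
        for (Thread worker : workers) {
            worker.start();
        }
    }

    /** Eternal loop: take the oldest queued invocation and perform it. */
    private void eventLoop() {
        try {
            while (true) {
                invocations.take().run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns a stub that only enqueues calls; the caller's thread never enters the target. */
    <T> T export(Class<T> type, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(target, args); // equals, hashCode, toString stay local
            }
            invocations.add(() -> perform(target, method, args));
            return null; // void method: the caller proceeds immediately
        };
        Object stub = Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(stub);
    }

    private static void perform(Object target, Method method, Object[] args) {
        try {
            method.invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AccessPointSketch ap = new AccessPointSketch(1); // t = 1: the target needs no locking
        CountDownLatch done = new CountDownLatch(2);
        JobSink target = spec -> {
            System.out.println(Thread.currentThread().getName() + " runs " + spec);
            done.countDown();
        };
        JobSink stub = ap.export(JobSink.class, target);
        ap.start();
        stub.addJob("job-1"); // returns at once; the access point thread does the work
        stub.addJob("job-2");
        done.await();
    }
}
```

With t = 1 the target is only ever touched by the single access point thread, which is why it can skip synchronization; raising t trades that simplicity for parallelism within the access point.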

2.3 Services and Objects

JIC objects are said to provide a service, i.e., a computational task invoked by another program. We differentiate between services and objects because a given JIC object may go down and have to be replaced by another object in the future. In fact, this commonly happens in practice for a number of reasons: power failures, reboots to install patches, software rejuvenation, etc. The service is the permanent entity that fulfills a given role in the distributed system (e.g., the name service for a domain). The object is the Java object that at some point in time is in charge of a service. At any moment in time, a service is provided by (at most) one object.

The service id is formed by the access point id together with the service name (which is given by the application) and is a globally unique identifier. The object id is formed by the service id together with an incarnation number. JIC objects can choose their incarnation number. This is meant to be used when the service is implemented so as to save the state of the object that provides it, and to recover this state when a new object takes over the service. In such a situation, moving from one object that provides the service to another is just seen as a temporary service unavailability. When the object that implements the service is stateless, however, it will have no history of past interactions with remote objects and, therefore, these remote objects must know that the object they were interacting with no longer exists. In this case, the incarnation number can be randomly generated when the object is exported. In this manner, when the service reboots, it receives a new incarnation number, denoting that there is a new object providing the service and that there is no state about past interactions with this service.

In short, the differentiation between services and objects is needed because a remote object may want to know that a service it is using has just rebooted. If such a service is stateless, the remote object likely wants to take action, such as resubmitting pending requests. Accordingly, a JIC lookup is done on a service, but it returns an object. JIC communication is always among objects.
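The relation between service ids, object ids and incarnation numbers can be made concrete with a small sketch. The class and method names below (ServiceId, ObjectId, withRandomIncarnation, replaces) are hypothetical and are not taken from JIC; the only assumptions carried over from the text are that a service id is the access point id plus the service name, and that an object id adds an incarnation number.

```java
import java.util.Objects;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical id of the permanent service: access point id plus service name. */
final class ServiceId {
    final String accessPointId;
    final String serviceName;

    ServiceId(String accessPointId, String serviceName) {
        this.accessPointId = Objects.requireNonNull(accessPointId);
        this.serviceName = Objects.requireNonNull(serviceName);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ServiceId)) {
            return false;
        }
        ServiceId other = (ServiceId) o;
        return accessPointId.equals(other.accessPointId) && serviceName.equals(other.serviceName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(accessPointId, serviceName);
    }
}

/** Hypothetical id of the object currently in charge: service id plus incarnation number. */
final class ObjectId {
    final ServiceId service;
    final long incarnation;

    /** A stateful object that recovers its state can pass the same incarnation again. */
    ObjectId(ServiceId service, long incarnation) {
        this.service = Objects.requireNonNull(service);
        this.incarnation = incarnation;
    }

    /** A stateless object draws a fresh incarnation each time it is exported. */
    static ObjectId withRandomIncarnation(ServiceId service) {
        return new ObjectId(service, ThreadLocalRandom.current().nextLong());
    }

    /** True when this id names a new object that took over the same service. */
    boolean replaces(ObjectId previous) {
        return service.equals(previous.service) && incarnation != previous.incarnation;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ObjectId)) {
            return false;
        }
        ObjectId other = (ObjectId) o;
        return service.equals(other.service) && incarnation == other.incarnation;
    }

    @Override
    public int hashCode() {
        return Objects.hash(service, incarnation);
    }
}
```

A client that caches the ObjectId returned by a lookup could compare it with the id returned by a later lookup: if replaces returns true, the service has rebooted without its previous state, and pending requests may need to be resubmitted.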

2.4 JIC Semantics

Understanding the precise semantics of the communication infrastructure is key to developing robust applications. JIC is no different. One must understand the semantics provided by JIC to develop an application using it. By doing that, it is possible to understand how the communication among objects happens and in which situations the JIC infrastructure will detect a failure.

Consider a scenario in which there are two JIC objects A and B, exported within access points AP_A and AP_B, respectively, that intend to communicate with each other. AP_A and AP_B can be the same access point, distinct access points in the same address space, or access points in different address spaces, possibly on distinct machines. Communication is done through remote method invocation, such that an invocation is marshaled into a message and sent to the receiver side, where it is placed in the access point queue to be consumed in FIFO order. Every parameter except EventProcessors is passed by value. EventProcessors, since they are remote objects, are passed by reference.

Given that, we have some definitions. Let S_A and S_B be the Jabber servers used by the access points AP_A and AP_B, which contain the objects A and B, respectively. A path between two JIC objects A and B is composed of all Jabber elements that are involved in the communication between these objects. Thus, we say that there is a path between A and B if and only if S_A and S_B are up and all the communication links are active, i.e., they are able to communicate. Similarly, we say that there is no path between A and B if S_A or S_B is down, or if at least one communication link is not active. If there is a path between A and B, we say that A is connected to B, which we denote connected_A(B). Obviously, a path between two objects is of little use unless both objects are available. We say that an object is available if the object is exported via an access point.
In the same way, we say that an object is unavailable if the object is not exported via an access point. Of course, if the access point is down, the object is unavailable. The state of an object A is denoted available(A) if the object is available and ¬available(A) otherwise. Now, we can define the reachability relation between JIC objects, which is of great importance for understanding JIC semantics:

    reachable_A(B) ≡ available(A) ∧ available(B) ∧ connected_A(B)

Given the above definitions, JIC provides precise semantics to the programmer who writes a distributed application. Considering that an EventProcessor A has registered interest in the failure of an EventProcessor B and vice versa, JIC guarantees that:

1. if B stays reachable from A, then every message sent from A to B is eventually received by B. Messages between two JIC objects are delivered in FIFO order;
2. if B becomes unreachable from A, then A will eventually be notified that B is unreachable. In this case, we say that A suspects B;
3. if at least one message from A to B is lost, A will eventually be notified that B is unreachable. In addition, B will also be notified that A is unreachable, i.e., A suspects B and B suspects A.

2.5 Assuring the semantics

Note that “A suspects B” does not guarantee that B has become unavailable. It is possible that a message from A to B has been dropped in the network, or that B is too slow in responding to the failure detection messages from A. This means that the programmer must reason about the application knowing that either the invocations are successful or the caller is going to be notified of a failure. However, being notified does not assure that a failure happened.

In order to avoid any inconsistent behavior, and to guarantee that the above semantics hold, JIC has a failure detection mechanism that implements the probabilistic model described in [20]. Our implementation provides a flexible framework in which it is possible to balance the trade-off between detection time and wrong suspicions. The JIC access point thus provides a high-level failure detection interface that makes it possible for an object to register interest in an object failure or a service recovery. Once A registers interest in the failure of B, the local AP_A failure detection mechanism starts monitoring B. This monitoring follows a pull model [21], in which the AP_A failure detector periodically requests information about the availability of B. The local AP_B receives these requests and responds whether B is alive or not. If there is no response up to a threshold (which can be application specific), AP_A will suspect that B has failed.

Nevertheless, there is a situation in which B stays available and reachable from A, but does not receive a message due to a brief network failure. In that case, the AP_B failure detector was able to respond to AP_A’s monitoring messages, although B has failed to receive an application message. AP_A needs to detect such a failure, but cannot do it using the failure detection messages alone. Because of that, we implemented a handshake protocol, similar to the TCP three-way handshake [22], that establishes a connection between two access points before objects start to communicate.
This handshake is responsible for defining sequence numbers for messages between two access points. Every application message carries such a sequence number, so that the other side can identify whether a message was lost (since messages between two access points are delivered in FIFO order, a sequence number greater than the expected one denotes a lost message). In this situation, the access point that identified the message loss (AP_B) releases the connection (also using a three-way protocol) and notifies B that A is unreachable. The same happens at AP_A after it finishes the release protocol. Then, both A and B are notified that the other is not reachable, as stated in the JIC semantics. It is important to notice that the application does not need to know which message has been lost. It only needs to be notified that an object of interest has failed.

Note that, although we are using TCP between the access points and the Jabber servers, we cannot simply rely on the TCP three-way handshake protocol. This is because we do not have an end-to-end TCP connection, but rather one point-to-point connection between each access point and its server, and another between the servers themselves.
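The receiver-side loss check described above can be sketched in a few lines. This is an illustration rather than the JIC implementation: SequenceChecker and its Verdict values are hypothetical names, the initial sequence number is assumed to come from the handshake, and treating numbers below the expected one (or anything arriving after the connection was released) as stale is an assumption of this sketch that the text does not specify.

```java
/** Illustrative receiver-side check for one connection between two access points. */
final class SequenceChecker {
    enum Verdict { DELIVER, MESSAGE_LOST, STALE }

    private long expected;     // next sequence number we should see
    private boolean released;  // set once a loss has been detected

    /** firstExpected is assumed to be the value agreed on during the handshake. */
    SequenceChecker(long firstExpected) {
        this.expected = firstExpected;
    }

    Verdict onMessage(long sequence) {
        if (released || sequence < expected) {
            return Verdict.STALE; // assumption: ignore anything old or after release
        }
        if (sequence > expected) {
            // FIFO delivery means a gap can only come from a lost message:
            // release the connection and notify the local object.
            released = true;
            return Verdict.MESSAGE_LOST;
        }
        expected++;
        return Verdict.DELIVER;
    }
}
```

On MESSAGE_LOST, the access point would run the release protocol and notify the interested local object that its peer is unreachable; the application never needs to learn which sequence number went missing.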

3 Evaluation

In this section, we evaluate the use of JIC against Java RMI, which is a representative and successful example of distributed object middleware. The experiment focuses on performance, but we also provide a preliminary assessment of the software engineering benefits of setting scope for threads.

The performance evaluation consisted of running an application that multiplies matrices. Basically, the application reads two matrices from the disk (randomly reading numbers from a very large file), multiplies them, and calculates the greatest sum of elements of a column in the resulting matrix. For each request, this procedure is done twice; the first time, it is done with concurrency control (there is a critical region protected by a Java synchronized block). The greater of the values from the two executions is returned as the result. We believe that such an application is representative of real workloads, as it combines processing inside and outside critical regions, as well as disk access with processing.

The experiment was executed in a local area network using machines with about the same configuration. Timing measurements were performed on machines with no users, so that they were not influenced by other processes. Moreover, the experiments were conducted in periods of light network traffic, so as to reduce communication contention.

We divided the experiment into two scenarios: client-server and peer-to-peer applications. In the former, clients performed requests on a single server, asking it to multiply matrices of a specified order. After processing, the server returned a result, which marks the end of a request. In the latter, there were multiple peers running. Peers produce requests, which are processed locally and forwarded to three other random peers, a process that is repeated until each request reaches the entire set of peers. After receiving the responses to the forwarded requests, each peer compares them with the local result, selecting the greatest value as the request result. If a peer receives the same request twice, it does not process the request again, but only replies with the already calculated result.

For each kind of application, we executed scenarios with two distinct matrix orders: one small enough to be processed in a few milliseconds (matrix order 10) and another that was processed in seconds (matrix order 150). For each execution, every client/peer performed 100 requests. The metric measured was the average processing time for a single request.

Figures 2(a) and 2(b) show the results we obtained for the client-server scenario. The x axis represents the number of clients that simultaneously accessed the single server. On the y axis, we have the mean request processing time considering all the clients and the standard deviation of that metric, represented as an error bar.
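For readers who want to reproduce a workload of this shape, the per-request computation described above might look like the following sketch. It is not the benchmark code used in the experiment: MatrixWorkload and its method names are hypothetical, the matrices are passed in memory instead of being read from a large file, and a single static lock stands in for the critical region.

```java
/** Illustrative per-request computation: multiply, then take the greatest column sum, twice. */
final class MatrixWorkload {
    private static final Object LOCK = new Object(); // stands in for the critical region

    /** Plain triple-loop product of an n x k matrix and a k x m matrix. */
    static long[][] multiply(long[][] a, long[][] b) {
        int n = a.length;
        int k = b.length;
        int m = b[0].length;
        long[][] c = new long[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                long sum = 0;
                for (int x = 0; x < k; x++) {
                    sum += a[i][x] * b[x][j];
                }
                c[i][j] = sum;
            }
        }
        return c;
    }

    /** Greatest sum of the elements of any single column. */
    static long greatestColumnSum(long[][] matrix) {
        long best = Long.MIN_VALUE;
        for (int j = 0; j < matrix[0].length; j++) {
            long sum = 0;
            for (long[] row : matrix) {
                sum += row[j];
            }
            best = Math.max(best, sum);
        }
        return best;
    }

    /** One request: first pass inside a synchronized block, second pass outside it. */
    static long handleRequest(long[][] a, long[][] b) {
        long guarded;
        synchronized (LOCK) {
            guarded = greatestColumnSum(multiply(a, b));
        }
        long unguarded = greatestColumnSum(multiply(a, b));
        return Math.max(guarded, unguarded);
    }
}
```

In a thread-per-request server, concurrent calls to handleRequest serialize on the lock during the first pass, whereas a single-threaded access point never contends for it.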

[Figure 2 panels: (a) Client-Server Matrix Multiplication (100 requests, order 10); (b) Client-Server Matrix Multiplication (100 requests, order 150). x axis: number of clients (1, 3, 5, 10, 20), with JIC and RMI bars for each; y axis: request time mean and standard deviation (ms).]

Fig. 2. (a) Client-server application for a small matrix and (b) for a big matrix

We can see that JIC performance is comparable to that of Java RMI. The greater times sometimes observed with JIC can be explained by the use of the Jabber server, which acts as a relay for every message between the clients and the server. However, we consider this a good trade-off, since the use of the Jabber server makes JIC a firewall- and NAT-friendly solution and, besides that, we did not need to worry about concurrency in JIC, as the server access point was single-threaded. In RMI, the server has as many threads as the number of simultaneous clients, so the server must be thread-safe.

Figures 3(a) and 3(b) show the same measurement for the peer-to-peer scenario. This scenario was executed for a network of 20 peers. Here, the results show that JIC average request times are smaller for both matrix sizes. We believe this happens due to the blocking nature of the RMI application. In RMI, each request is locally processed, then forwarded to a random peer, and only forwarded to the second peer after the result from the first random peer has been received. (Likewise, the forwarding to the third random peer is only done after the second peer returns.) As such, for the whole system of n peers, there are at most n^2 threads, but n^2 − n are blocked waiting for a response from some other peer. JIC requests, although dispatched in parallel, are processed by a single thread in each peer, but the peers do not suffer the overhead of thread management.

Although we have not finished porting OurGrid to JIC, we have already changed most of its interfaces to explicitly implement an event-driven architecture in which threads are modularized in the same way as in JIC. Communication is still RMI, but when a remote method is invoked, it creates an object denoting the invocation and places it in an invocation queue. Invocation queues are consumed by threads that never leave the “access point”. All business logic is within “access points”, behind the invocation queue.
RMI methods only serve to relay invocations to the queue. We retained the traditional RMI usage only for the components that are commonly installed in different administrative domains, where the need for callbacks would create a great administrative burden of opening firewall ports and/or setting up reverse NATs.

Even considering that the event-driven architecture was not applied to all of the code, and that much of the existing code is tedious and will disappear with JIC (i.e., the “invocation classes” and all RMI code that exists before the invocation queue), the results were very positive. We were able to ship a long-delayed release of OurGrid (version 2.2), which had been “stuck” in a fix-a-bug-introduce-another loop. Maintenance time was drastically reduced, and programmers felt much more confident about their code. As numerical evidence, the number of synchronized blocks in the code decreased by 36.8%, the efferent coupling [23] by 4.6%, and the lack of cohesion of methods (LCOM*) [23] by 26.6%. We refer the reader to [16] for a detailed description of our experience of introducing event-driven remote invocation in OurGrid.

By finishing the migration of OurGrid to JIC, we expect to further improve the code. For expressing an event-driven architecture, JIC is much more natural than RMI methods that explicitly place “invocation objects” on an event queue.

Fig. 3. (a) Peer-to-peer application for a small matrix and (b) for a big matrix. Panel titles: (a) P2P Matrix Multiplication (100 requests, order 10); (b) P2P Matrix Multiplication (100 requests, order 150). Axes: request time mean and standard deviation (ms) versus number of peers; series: JIC and RMI.

A lot of OurGrid code is just going to disappear, further simplifying the software and increasing its understandability.
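
The invocation-queue arrangement described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not OurGrid or JIC code: the class AccessPointSketch, the Invocation interface, the submitJob method and the STOP sentinel are all hypothetical names. The method submitJob plays the role of an RMI method that merely relays to the queue, and a single consumer thread applies every invocation to the business state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch only: every name here is hypothetical, not OurGrid or JIC code.
final class AccessPointSketch {

    /** An object denoting one remote invocation. */
    interface Invocation {
        void applyTo(List<String> state);
    }

    // Sentinel invocation that tells the consumer thread to stop (sketch-only device).
    private static final Invocation STOP = ignored -> { };

    private final BlockingQueue<Invocation> queue = new LinkedBlockingQueue<>();
    // Business state: touched only by the consumer thread, so it needs no locks.
    private final List<String> state = new ArrayList<>();
    private final Thread consumer = new Thread(this::consume, "access-point");

    void start() {
        consumer.start();
    }

    /** Stands in for an RMI method body: it only relays the invocation to the queue. */
    void submitJob(String jobName) {
        queue.add(s -> s.add("job:" + jobName));
    }

    /** Asks the consumer to stop, waits for it, and returns the resulting state. */
    List<String> stopAndGetState() {
        queue.add(STOP);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while stopping", e);
        }
        return state;
    }

    // The only code that runs business logic; this thread never leaves the access point.
    private void consume() {
        try {
            for (Invocation next = queue.take(); next != STOP; next = queue.take()) {
                next.applyTo(state);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AccessPointSketch accessPoint = new AccessPointSketch();
        accessPoint.start();
        accessPoint.submitJob("a");
        accessPoint.submitJob("b");
        System.out.println(accessPoint.stopAndGetState());
    }
}
```

Because only the consumer thread touches the state, the sketch needs no synchronized blocks around the business logic, which is in line with the reduction in synchronized blocks reported above.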

4 Related Work

The Common Object Request Broker Architecture (CORBA) [5] provides two main non-blocking communication models: (i) the Oneway invocation; and (ii) the Deferred Synchronous model. In [24], Schmidt and Vinoski show that Oneway calls guarantee neither non-blocking semantics nor reliable delivery. In the Deferred Synchronous model, after making a request, the client can either poll to see whether the target object has returned a response, or perform a separate blocking call to wait for the response. The Deferred Synchronous model is inefficient due to its reliance on the Dynamic Invocation Interface (DII), which allocates memory and copies data excessively [24].

To address these problems, the CORBA Messaging specification [25] introduces Asynchronous Method Invocation (AMI) [26], which specifies two similar non-blocking communication models. In the Polling model, the invocation returns a Poller object. The client can use the Poller methods to obtain the status of the request and the value of the reply from the server (in a blocking way or not). In the Callback model, a callback entity named ReplyHandler is passed as a parameter when a client invokes a server, and the server responses are redirected by the client ORB to the ReplyHandler. However, few ORBs fully implement this specification.

Other works aim to build remote invocation mechanisms for Java that resemble RMI. In [27] and [28], the authors propose to extend the model used in RMI to provide non-blocking communication. The solution consists of automatically creating a thread to deal with each remote call. Despite their simplicity and good integration with the language, these solutions have the same drawbacks as RMI, since they use the same communication layer and do not prevent the thread explosion that can occur at the client side if many simultaneous invocations are performed.

Message-oriented middleware (MOM) systems, such as IBM’s MQSeries [29], are simple and provide low coupling between their entities.
Basically, MOMs are composed of three types of entities: (i) the event service, which is responsible for managing events and notifying the interested entities that some event has happened; (ii) the interest objects, which publish events in the event service; and (iii) the receptor objects, which subscribe their interest in the occurrence of events of a given type (publish/subscribe model). Despite the well-defined API for exchanging messages, MOMs are not fully (and transparently) integrated into programming languages. MOM architectures provide mechanisms that allow suppliers to reliably transmit messages asynchronously through special entities named containers. The containers (or servers) route and process messages on behalf of application processes, using a store-and-forward strategy: if a consumer happens to be unavailable due to scheduled downtime, a site crash, or a network partition, the router will attempt to deliver the message periodically until the consumer becomes available.

Another problem with MOMs is that vendors have made proprietary extensions to the model. Because of that, applications built using systems from different vendors may not be able to exchange messages. To circumvent this problem, the user needs to use (or build) gateways that deal with the particularities of each vendor implementation. Another drawback is that MOM’s non-blocking mechanisms are too heavyweight for many high-performance and real-time applications. Moreover, the message-oriented invocation mechanisms of MOM systems can be harder to program correctly due to the lack of strong type checking.
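
To make the publish/subscribe roles and the type-checking concern concrete, here is a small in-process sketch. It is an assumption-laden illustration, not any vendor's MOM API: EventServiceSketch and its subscribe and publish methods are hypothetical names, and store-and-forward delivery is deliberately left out. Payloads travel as plain Object values, so a mismatch between what is published and what a receptor expects compiles without complaint and surfaces only at run time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-process stand-in for a MOM event service; not any vendor's API.
final class EventServiceSketch {

    // Receptor objects registered per event type.
    private final Map<String, List<Consumer<Object>>> subscribers = new HashMap<>();

    /** A receptor object subscribes its interest in events of a given type. */
    void subscribe(String eventType, Consumer<Object> receptor) {
        subscribers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(receptor);
    }

    /** A publishing object hands an event to the service; returns how many receptors were notified. */
    int publish(String eventType, Object payload) {
        List<Consumer<Object>> receptors =
                subscribers.getOrDefault(eventType, new ArrayList<>());
        for (Consumer<Object> receptor : receptors) {
            receptor.accept(payload);
        }
        return receptors.size();
    }

    public static void main(String[] args) {
        EventServiceSketch service = new EventServiceSketch();
        List<String> received = new ArrayList<>();
        // The payload is an untyped Object: this cast is checked only at run time.
        service.subscribe("job.finished", payload -> received.add((String) payload));
        service.publish("job.finished", "job-17");
        System.out.println(received);
        try {
            service.publish("job.finished", 17); // wrong payload type, yet it compiles
        } catch (ClassCastException e) {
            System.out.println("type error detected only at run time");
        }
    }
}
```

With a typed remote interface, passing an integer where a string is expected would be rejected by the compiler instead.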

5 Conclusions

JIC is a communication solution that supports a notion of distributed objects that can be invoked without blocking the invoking object. This approach leads to an event-driven software architecture, in which threads are scoped within components of the distributed application, called access points. Delimiting thread scope (instead of letting threads traverse the whole distributed application) greatly simplifies software development. In particular, scoping threads makes for composable components. JIC also addresses other difficulties of using distributed objects in scenarios in which clients (invokers) have something else to do besides waiting for the server (invoked) to respond, such as thread explosion and the need to decouple failure detection from method invocation. JIC is built over Jabber, which provides native message forwarding and thus greatly eases the deployment of JIC-based applications across firewalls and NATs. Finally, all of this is done without introducing a performance penalty (when compared with RMI). It is also worth noting that JIC is open source (available at www.ourgrid.org).

We are currently porting OurGrid, a free-to-join grid [15], to JIC. The initial impression of the OurGrid programmers was that developing a JIC application requires more effort than developing its RMI equivalent. This is due to the need to explicitly manage response events. In JIC, the programmer must separate events that are new requests from those that are responses to a request initiated by the component itself. In RMI, this is done automatically: the response is the method return value. However, after a while, the programmers realized that the initial greater effort to develop JIC code pays off. The code seems harder to develop in the beginning, but it is easier to get right, much like code that is developed with automated tests. Understanding the code behavior and debugging it are much simpler. We credit most of this simplicity to the well-defined thread scoping.
However, we also believe that JIC leads to a good distributed programming discipline. By making event handling explicit, the programmer is invited to think about how the code should react to any event (including failures of remote objects), which can happen in any configuration of the component state. While this seems like a lot of work, it is truly necessary in a distributed system. RMI programmers might only realize this after lots of tests and bug occurrences. In short, we believe that RMI is deceptively simple. JIC just exposes the real complexity of distributed programming in a way that helps the programmer to deal with it.

One exciting possibility JIC opens up is obtaining parallelism without writing multi-threaded code. JIC access points can be single-threaded. If an application has enough of them, this may be enough to exploit the application's parallelism. Even if this is not the case, one can start with a single thread per access point, profile the application, and only invest in multi-threading the bottleneck access points. We are currently investigating these possibilities.

On the downside, JIC is language dependent. However, we do not see any obstacle to removing this limitation by using XML to encode messages so that method calls are represented in an interoperable way. Some argue that this could require manual coding of the XML to achieve true interoperability [30]. But that seems to be the price to pay for true multi-language interoperability.
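
The explicit handling of response events discussed in this section can be sketched as follows. This is only an illustration under stated assumptions and does not show JIC's actual API: the class ResponseCorrelationSketch, the Kind enumeration, the Event class and the request-id scheme are hypothetical, and a failure event is assumed to name the request it affects. The handler is meant to be called by the single thread of an access point, so it separates new requests from responses to its own requests without any locking.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these event types and names are hypothetical, not JIC's API.
final class ResponseCorrelationSketch {

    enum Kind { REQUEST, RESPONSE, FAILURE }

    static final class Event {
        final Kind kind;
        final int requestId;
        final String body;

        Event(Kind kind, int requestId, String body) {
            this.kind = kind;
            this.requestId = requestId;
            this.body = body;
        }
    }

    // Requests this component initiated and is still waiting on, keyed by request id.
    private final Map<Integer, String> pending = new HashMap<>();
    private final List<String> trace = new ArrayList<>();
    private int nextId = 0;

    /** Records an outgoing request and returns at once; nothing blocks here. */
    int send(String operation) {
        int id = nextId++;
        pending.put(id, operation);
        return id;
    }

    /** Called by the single access-point thread for each incoming event. */
    void handle(Event e) {
        switch (e.kind) {
            case REQUEST:
                trace.add("serve " + e.body);
                break;
            case RESPONSE: {
                String operation = pending.remove(e.requestId);
                trace.add(operation == null
                        ? "ignore unexpected response " + e.requestId
                        : operation + " -> " + e.body);
                break;
            }
            case FAILURE: {
                // Assumption: a failure event names the request it affects.
                String operation = pending.remove(e.requestId);
                if (operation != null) {
                    trace.add(operation + " failed: " + e.body);
                }
                break;
            }
        }
    }

    List<String> trace() {
        return trace;
    }

    public static void main(String[] args) {
        ResponseCorrelationSketch component = new ResponseCorrelationSketch();
        int id = component.send("multiply");
        component.handle(new Event(Kind.REQUEST, 7, "multiply from another peer"));
        component.handle(new Event(Kind.RESPONSE, id, "result matrix"));
        // The request was already answered, so this late failure event is dropped.
        component.handle(new Event(Kind.FAILURE, id, "peer unreachable"));
        System.out.println(component.trace());
    }
}
```

Each branch forces a decision about what to do in the current component state, including the cases a blocking call would hide, such as a response or a failure arriving for a request that is no longer pending.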

6 Acknowledgments

We would like to thank Ayla Dantas, Marcell Manfrim and Katia Saikoski for their important comments and suggestions on this paper. We also thank the anonymous reviewers for their insightful comments and questions. This work has been developed in collaboration with HP Brazil R&D.

References

1. Waldo, J., Wyant, G., Wollrath, A., Kendall, S.: A Note on Distributed Computing. Technical Report SMLI TR-94-29, Sun Microsystems Labs (1994)
2. Coulouris, G., Dollimore, J.: Distributed systems: concepts and design. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (2005)
3. Grosso, W.: Java RMI. O’Reilly & Associates, Inc., Sebastopol, CA, USA (2002)
4. RMI: Java Remote Method Invocation web site. At java.sun.com/products/jdk/rmi/ (2006)
5. CORBA: Common Object Request Broker Architecture Core Specification. At www.omg.org/technology/documents/formal/corba iiop.htm (2006)
6. Sutter, H., Larus, J.: Software and the concurrency revolution. Queue 3(7) (2005) 54–62
7. Lee, E.A.: The problem with threads. Technical Report UCB/EECS-2006-1, EECS Department, University of California, Berkeley (2006)
8. Schmidt, D., Gokhale, A., Harrison, T., Parulkar, G.: A high-performance endsystem architecture for real-time CORBA. IEEE Communications Magazine 4(2) (1997)
9. Pietzuch, P.R.: Event-based middleware: A new paradigm for wide-area distributed systems? In: 6th CaberNet Radicals Workshop. (2002)
10. Starovic, G., Cahill, V., Tangney, B.: An event based object model for distributed programming. In: OOIS (Object-Oriented Information Systems) ’95, London, Springer-Verlag (1995) 72–86
11. DS-online: Distributed Systems Online web site. Message Oriented Middleware section. At dsonline.computer.org/middleware/intro MOM.html (2006)
12. Monson-Haefel, R., Chappell, D.: Java Message Service. O’Reilly & Associates, Inc., Sebastopol, CA, USA (2000)
13. Sunderam, V.S.: PVM: a framework for parallel distributed computing. Concurrency, Practice and Experience 2(4) (1990) 315–340
14. Forum, M.P.I.: MPI: A message-passing interface standard. Technical Report UT-CS-94-230 (1994)
15. Cirne, W., Brasileiro, F., Andrade, N., Costa, L., Andrade, A., Novaes, R., Mowbray, M.: Labs of the world, unite!!! (Accepted for publication in Journal of Grid Computing. Available at walfredo.dsc.ufcg.edu.br/papers/Labs of the World Unite v19.pdf)
16. Dantas, A., Cirne, W., Saikoski, K.: Using aop to bring a project back in shape: The ourgrid case. Journal of the Brazilian Computer Society 11(3) (2006) 21–35
17. JSF: Jabber software foundation web site. At www.jabber.org/ (2006)
18. XMPP: Extensible messaging and presence protocol specification. At www.xmpp.org/specs/ (2006)
19. Deng, X., Dwyer, M.B., Hatcliff, J., Mizuno, M.: Invariant-based specification, synthesis, and verification of synchronization in concurrent programs (2002)
20. Hayashibara, N., Défago, X., Yared, R., Katayama, T.: The phi accrual failure detector. In: Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems (SRDS04), IEEE Computer Society (2004) 66–78
21. Felber, P., Défago, X., Guerraoui, R., Oser, P.: Failure detectors as first class objects. In: Proceedings of the International Symposium on Distributed Objects and Applications (DOA’99), Edinburgh, Scotland (1999) 132–141
22. Tomlinson, R.S.: Selecting sequence numbers. In: Proceedings of the 1975 ACM SIGCOMM/SIGOPS workshop on Interprocess communications, New York, NY, USA, ACM Press (1975) 11–23
23. Henderson-Sellers, B.: Object-Oriented Metrics: Measures of Complexity. Prentice Hall (1995)
24. Schmidt, D., Vinoski, S.: Object interconnections: Programming asynchronous method invocations with corba messaging (1999)
25. Object-Management-Group: Corba messaging specification, omg document orbos/98-05-05 ed. (1998)
26. Vinoski, S.: New features for corba 3.0. Commun. ACM 41(10) (1998) 44–52
27. Raje, R., William, J., Boyles, M.: An asynchronous remote method invocation (ARMI) mechanism for java. In: ACM Workshop on Java for Science and Engineering Computation, ACM Press (1997)
28. Falkner, K.E.K., Coddington, P.D., Oudshoorn, M.J.: Implementing Asynchronous Remote Method Invocation in Java. Technical Report DHPC-072 (1999)
29. MQSeries: IBM MQSeries web site. At www.ibm.com/software/mqseries (2006)
30. Loughran, S., Smith, E.: Rethinking the Java SOAP Stack. Technical Report HPL-2005-83, Hewlett-Packard Bristol Laboratories (2005)
