To appear in IEEE Transactions on Software Engineering

The JEDI event-based infrastructure and its application to the development of the OPSS WFMS

G. Cugola, E. Di Nitto, A. Fuggetta
CEFRIEL – Politecnico di Milano
Via Fucini, 2, 20133 Milano, Italy
+39 2 239541
e-mail: {cugola, dinitto, fuggetta}@elet.polimi.it

ABSTRACT

The development of complex distributed systems demands the creation of suitable architectural styles (or paradigms) and related run-time infrastructures. An emerging style that is receiving increasing attention is based on the notion of event. In an event-based architecture, distributed software components interact by generating and consuming events. An event is the occurrence of some state change in a component of a software system, made visible to the external world. The occurrence of an event in a component is asynchronously notified to any other component that has declared some interest in it. This paradigm (usually called “publish/subscribe”, from the names of the two basic operations that regulate the communication) holds the promise of supporting flexible and effective interaction among highly reconfigurable, distributed software components. In the past two years, we have developed an object-oriented infrastructure called JEDI (Java Event-based Distributed Infrastructure). JEDI supports the development and operation of event-based systems and has been used to implement a significant example of a distributed system, namely, the OPSS workflow management system (WFMS). The paper illustrates JEDI's main features and how we have used them to implement OPSS. Moreover, the paper provides an initial evaluation of our experiences in using the event-based architectural style and a classification of some of the event-based infrastructures presented in the literature.

Keywords: Event-based systems, distributed systems, software architectures, workflow, business processes, object-orientation, publish/subscribe middleware.

1 INTRODUCTION

Convergence between telecommunication, broadcasting, and computing is opening new opportunities and challenges for a potentially large market of innovative network-wide services. The class of users affected by this revolution is very large: families, professionals, large organizations, government agencies, and administrations. The services range from home banking and electronic commerce to coordination and workflow support for large dispersed teams, within the same company or even across multiple companies. Many research and industrial activities are currently being carried out to identify feasible strategies to develop and operate these services in an effective and economically viable way. The requirements and technical problems that have to be addressed are complex and critical:

• Services must be able to operate on a wide area network with acceptable performance.

• The software technology used to implement these services must be “light”, i.e., it should scale with the number of components and users involved and with their distribution.

• The technology must enable a “plug and play” approach to support dynamic reconfiguration and the introduction of new service components.

• Finally, it is essential to support openness and interoperability between different platforms, since the services are usually implemented on a heterogeneous hardware infrastructure.

To foster the diffusion of network-wide applications, we need to identify proper architectural styles and supporting infrastructures able to cope with the above requirements and challenges. Indeed, a wide range of distributed architectural styles and middleware infrastructures have been conceived specifically to address these issues. Most of these existing styles and infrastructures are based on a point-to-point communication model. For instance, the basic service offered by CORBA [36], RMI [51], and DCOM [20] is the synchronous invocation of a remote service offered by some server over the network. The wide diffusion of the point-to-point communication model has been fostered by the availability of RPC, which is certainly an effective mechanism to implement a wide range of distributed systems. RPC is characterized by a tight conceptual coupling between the component that requests a service (i.e., the client) and the component that satisfies the request (i.e., the server). Before invoking a service, the client has to know that a server capable of satisfying its request exists, and it has to obtain a reference to that server. Even extensions and new facilities of advanced middleware infrastructures, such as the CORBA Naming Service [37] and the CORBA Dynamic Invocation Interface, do not depart significantly from the underlying RPC paradigm.

Despite the effectiveness and conceptual simplicity of the point-to-point communication model, many situations require a more decoupled model. In particular, the communication among the components of a distributed system may involve more than two parties, and may be driven by the contents of the information being exchanged rather than by the identity of information producers and consumers. As an example, consider a network management system. In this system, whenever a network node signals a failure, a procedure has to be started to fix the failure. With an event-based approach, the node is simply required to notify the “external world” of the detected failure and can therefore ignore how the failure will be handled. The “external world” might be constituted by a single application placed at a fixed location on the net and in charge of executing the complete recovery procedure. Alternatively, it can be composed of different applications dynamically dispersed across the network and in charge of different steps of the recovery procedure (e.g., logging the failure, reconfiguring a subsystem, etc.). As another example, consider a distributed workflow management system where, as soon as an activity A terminates, other activities A1,…,An have to be launched. In this case, it is useful to have a mechanism that hides the existence of A1,…,An from A and allows A to simply notify the “external world” of its termination. The effect of this notification is hidden from A, thus increasing information hiding and reducing the coupling among components. The two scenarios presented above are not the only ones with such communication requirements: other scenarios that will likely emerge in the near future are presented in [4]. A promising approach to address these issues is the event-based paradigm. The components of an event-based system cooperate by sending and receiving events, a particular form of messages.
The sender delivers an event to an event dispatcher. The event dispatcher is in charge of distributing the event to all the components that have declared their interest in receiving it. Thus, the event dispatcher supports a high degree of decoupling between the sources and the recipients of an event. The relevance and potential impact of the event-based paradigm have been acknowledged by the OMG, which has recently defined an event service on top of the CORBA framework (see Section 5). Nevertheless, we are still far from a satisfactory solution able to address, in a coherent and comprehensive way, all the issues and problems related to the creation of an effective, network-wide event distribution infrastructure [45]. This can easily be verified by considering the large number of initiatives being launched in the area. Several new draft proposals have been submitted to the IETF (Internet Engineering Task Force). Furthermore, the event-based paradigm has been the focus of the first workshop of the TWIST series (The Workshop on Internet-scale Software Technologies) [60]. The workshop gathered researchers from leading software companies and from academia to compare existing approaches and steer future research work on the topic.

As a contribution to the ongoing research work, we have developed an event-based, object-oriented infrastructure called JEDI (Java Event-based Distributed Infrastructure) that has been applied, among others, to the development of a WorkFlow Management System (WFMS) called OPSS (ORCHESTRA Process Support System).1 A WFMS [3, 23] is an environment for developing and executing a process-based application, i.e., a coordinated set of activities involving both humans and computerized tools. Typical examples of the activities supported by a WFMS are business services, such as customer care, interoffice procedures, and software development processes. This paper presents JEDI and OPSS, highlighting their main features and functionality. It also illustrates some lessons we have learned from the development and operation of JEDI. This paper significantly extends a previously published paper [15] by providing more details on the design choices that guided the development of both JEDI and OPSS, and by introducing new features that were not presented in the previous paper. It also significantly enriches the analysis of the state of the art and the comparison with, and evaluation of, related work. The contributions of this paper can be summarized as follows:

• It describes JEDI, an event-based infrastructure suitable for developing a wide range of distributed systems.

• It introduces OPSS and discusses the OPSS features that benefit most from the adoption of an event-based communication infrastructure.

• It presents our experiences in using the event-based paradigm and provides a comprehensive comparison of our work with the state of the art in the field.

The paper is organized as follows. Section 2 presents JEDI's basic concepts and implementation. Section 3 provides an overview of the architecture of OPSS. Section 4 evaluates our experience. Section 5 presents related work. Finally, Section 6 draws some conclusions and outlines future research activities.

1 OPSS has been developed as part of the ORCHESTRA project [34].

2 JEDI: A JAVA EVENT-BASED DISTRIBUTED INFRASTRUCTURE

2.1 High-level architecture of JEDI

Figure 1 describes the logical architecture of JEDI. The infrastructure is based on the notion of active object2 (AO). An AO is an autonomous computational unit performing an application-specific task. Each active object has its own thread of control and interacts with other AOs by explicitly producing and consuming events3. Events are a particular type of message. Conventional messages are sent from a source to one or more recipients, as specified by the source itself. Conversely, events do not include any information about their recipients. A JEDI event is an ordered set of strings: the first string is the event name, while the remaining strings are the values of the event parameters4. We deliberately kept the structure of JEDI events simple, and we discuss this choice in Sections 4.6 and 4.7. An event is generated by an AO and sent to a component called the event dispatcher (ED). The ED notifies the event to those AOs that have explicitly declared their interest in receiving it (event recipients). An AO declares the classes of events it is interested in by invoking an event subscription operation. It can also stop accepting events of a given class by invoking the unsubscribe operation. Subscription and unsubscription can be invoked at any time during the AO's lifetime. Events are notified asynchronously with respect to their generation.
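The event structure just described can be illustrated with a minimal Java sketch. It assumes that an event is nothing more than an immutable ordered list of strings whose first element is the name. The class `Event` and its methods are hypothetical and are not JEDI's actual classes. `toString` renders the function-call notation introduced in footnote 4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch (not a JEDI class): an event as an ordered list of strings.
// Position 0 holds the event name; the remaining positions hold the parameter values.
final class Event {
    private final List<String> fields;

    Event(String name, String... params) {
        List<String> f = new ArrayList<>();
        f.add(name);
        f.addAll(Arrays.asList(params));
        this.fields = Collections.unmodifiableList(f);
    }

    String name() {
        return fields.get(0);
    }

    // Read-only view of the parameter values, in order.
    List<String> params() {
        return fields.subList(1, fields.size());
    }

    // Function-call notation used in the paper, e.g. open(foo.c,read).
    @Override
    public String toString() {
        return name() + "(" + String.join(",", params()) + ")";
    }
}
```

Because the event carries no recipient field, nothing in this representation ties a producer to its consumers. Routing is left entirely to the dispatcher.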

2 We have not used the term “component” since it is heavily overloaded and could have caused confusion.

3 In this paper we often use the term “event” with the meaning of “event notification”. We believe that the precise interpretation of the term can be easily derived from the context.

4 In the remainder of the paper, an event will be represented using a notation similar to function calls in traditional programming languages, e.g., open(foo.c,read), where open is the name of the event, and foo.c and read are its parameters.

[Diagram: several AOs exchanging events through a central Event Dispatcher.]

Figure 1: A logical view of JEDI architecture.

2.2 Main features of JEDI

2.2.1 Event patterns

AOs can subscribe either to a specific event or to an event pattern. An event pattern is an ordered set of strings representing a very simple form of regular expression. The first string of the pattern is the pattern name, while the others are the pattern parameters. Each string of the pattern may end with an asterisk. Given a pattern p, an event e matches the pattern iff the following conditions hold:

• The name of e is equal to the name of p, if the latter does not end with an asterisk. Conversely, if the name of p ends with an asterisk, then the name of e must start with the same characters as the name of p (excluding the asterisk). In other words, a trailing asterisk has the same semantics as in the Unix and DOS shells.

• e and p have the same number of parameters.

• Each parameter of e matches the parameter of p in the same position, using the asterisk semantics defined for event names. That is, for each i, letting e_i be the i-th parameter of e and p_i the i-th parameter of p, either e_i equals p_i (if p_i does not end with an asterisk) or e_i starts with the same characters as p_i, excluding the asterisk.

For instance, pattern foo(aa*, bb) matches all the events whose name is foo and that have two parameters, the first one starting with “aa” and the second one being exactly “bb”. Another example is the pattern *(*, *, *), which matches all the events having three parameters, regardless of their names and of the values of their parameters.
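The three matching conditions can be sketched as follows. This is an illustrative reading of the rules, not JEDI's code. The class `PatternMatcher` is hypothetical. Both the pattern and the event are passed as ordered lists of strings with the name in position 0, so the name and the parameters are checked with the same trailing-asterisk rule.

```java
import java.util.List;

// Hypothetical sketch of the pattern-matching rules (not JEDI's implementation).
// Pattern and event are ordered lists: name at index 0, then the parameters.
final class PatternMatcher {
    private PatternMatcher() {}

    // A pattern string ending in '*' matches any value that starts with the
    // characters before the asterisk; otherwise the value must be equal.
    static boolean matchString(String pattern, String value) {
        if (pattern.endsWith("*")) {
            return value.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return value.equals(pattern);
    }

    static boolean matches(List<String> pattern, List<String> event) {
        // Equal sizes means a name plus the same number of parameters.
        if (pattern.size() != event.size()) {
            return false;
        }
        // Index 0 compares the names; the other indices compare parameters pairwise.
        for (int i = 0; i < pattern.size(); i++) {
            if (!matchString(pattern.get(i), event.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, foo(aa*,bb) matches foo(aax,bb) but not foo(ab,bb). The pattern *(*,*,*) matches any event with exactly three parameters, because a lone "*" reduces to the empty prefix.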


2.2.2 Reactive objects

An active object can invoke the basic operations offered by JEDI (e.g., event generation and subscription) in any order. In our experience, however, some active objects tend to follow a fairly standard sequence of operations. Upon activation, an AO subscribes to some events and then starts waiting for them to occur. When one of these events is notified, the AO performs some operations (possibly generating new events, and subscribing to or unsubscribing from events) and then starts waiting again. It therefore executes a standard loop: wait for any of the events it has subscribed to, then process it. We use the term reactive object to indicate this particular kind of active object. The JEDI framework provides programmers with standard classes supporting the implementation of both active and reactive objects (see Section 2.4). The JEDI class used to implement reactive objects (i.e., the ReactiveObject class) exports an abstract method (called processMessage) that is automatically invoked each time the reactive object has to be notified of an event it has subscribed to. Programmers who want to implement a reactive object should subclass ReactiveObject and implement the processMessage method.
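The reactive-object loop described above can be sketched in Java as follows. Everything in the sketch is hypothetical. `ReactiveObjectSketch` is not JEDI's `ReactiveObject`, and an in-memory `BlockingQueue` stands in for the connection to the event dispatcher. Subscriptions are not modeled. Events are passed to `processMessage` as string arrays (name first) purely for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a reactive object; NOT the JEDI ReactiveObject class.
// An in-memory queue stands in for the connection to the event dispatcher.
abstract class ReactiveObjectSketch implements Runnable {
    private static final String[] STOP = new String[0]; // sentinel that ends the loop
    private final BlockingQueue<String[]> inbox = new LinkedBlockingQueue<>();

    // Called by the (simulated) dispatcher to notify an event: name first, then parameters.
    void deliver(String[] event) {
        inbox.add(event);
    }

    // Asks the loop to terminate after the events already queued.
    void stop() {
        inbox.add(STOP);
    }

    // Subclasses define the reaction to a single notified event.
    protected abstract void processMessage(String[] event);

    // The standard loop: wait for an event, process it, wait again.
    @Override
    public void run() {
        try {
            while (true) {
                String[] event = inbox.take(); // blocks until an event arrives
                if (event == STOP) {
                    return;
                }
                processMessage(event);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

// Example subclass: counts terminated(...) events, e.g. terminated(A).
class TerminationCounter extends ReactiveObjectSketch {
    int count = 0; // written only by the loop thread

    @Override
    protected void processMessage(String[] event) {
        if (event.length > 0 && event[0].equals("terminated")) {
            count++;
        }
    }
}
```

A subclass only supplies `processMessage`. The inherited `run` method implements the wait-process-wait loop, so application code never handles the waiting itself.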

2.2.3 Distribution of the Event Dispatcher

The event dispatcher is a logically centralized component, since it must have global knowledge of all the events that are generated and of all the subscription requests that are issued. However, a centralized implementation of the event dispatcher can become a critical bottleneck for a distributed system. This happens, in particular, when the system is composed of many Internet-wide distributed AOs engaged in intense communication. In this situation, it is worthwhile to decompose the event dispatcher into several distributed, cooperating components in order to guarantee an acceptable level of performance. This decomposition, however, requires a coordination protocol to be defined among the event dispatcher components: they need to share information about generated events and subscriptions to guarantee that AOs connected to different components communicate properly. Such a protocol has to be carefully designed to limit the network load generated by intra-dispatcher coordination. In some cases, the coordination traffic could even exceed the traffic generated by the AOs, resulting in unacceptable performance degradation. In JEDI we provide two implementations of the event dispatcher: centralized and distributed. The centralized version consists of a single (operating system) process and has been developed to address the requirements of simple systems, composed of few AOs, running over a local area network, and exchanging a limited number of events. The distributed version addresses the needs of “network-intensive” applications by exploiting a set of dispatching servers (DSs) interconnected in a tree structure. Each DS is located on a different node of the network and is connected to one parent DS (except for the root DS) and to zero or more descendant DSs. Each AO is connected to a DS, which is not necessarily a leaf of the tree.

[Diagram: three panels, titled “Local subscription strategy”, “Distributed subscription strategy”, and “Hierarchical strategy (JEDI solution)”, each showing a tree of DSs with attached AOs; the legend distinguishes events from subscriptions.]

Figure 2: Alternative strategies for distributed event dispatching.

There are several strategies that can be exploited to distribute events across the hierarchy of DSs (see Figure 2). Two key issues have to be considered in defining such strategies: the handling of subscription and unsubscription requests, and the distribution of events. A first strategy (called local subscription) exploits a very simple approach. Each subscription request is recorded locally by the DS that has received it from the issuing AO. When an event is generated, it is distributed to all the DSs in the tree, and each DS decides autonomously to which AOs the event has to be sent. A somewhat dual strategy (called distributed subscription) is based on a radically different approach: subscriptions are distributed to all the DSs in the tree. In particular, the DS that has received the subscription from the issuing AO forwards it to its parent and descendants, which, in turn, forward it to their parent and descendants (excluding the DS the subscription comes from), and so on. In the distributed version of the JEDI ED, we adopt an intermediate solution that we call the hierarchical subscription strategy. In this strategy, subscriptions are propagated only upwards in the DS tree, so that only the ancestors of the DS that has accepted the subscription request from an AO eventually receive it. Consequently, when a DS receives an event from one of the objects connected to it (either an AO or another DS), it dispatches the event to the following entities:

i. its parent, if the parent is not the one that propagated the current event;

ii. the subset of its descendants that are subscribed to an event pattern matching the received event;

iii. the AOs that are directly connected to the DS and are subscribed to an event pattern matching the received event.
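The three dispatching rules can be modeled in a single process to show how subscriptions and events flow through the tree. This is a hypothetical sketch, not JEDI's dispatcher code. Events and patterns are reduced to bare names with an optional trailing asterisk, and the DSs are plain objects that call each other directly. The model also assumes a DS never forwards an event back to the child it came from, a point the rules above leave implicit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-process model of the hierarchical subscription strategy (not JEDI code).
// Simplifications: events and patterns are bare names, a trailing '*' is a prefix
// wildcard, and deliveries to AOs are recorded in a shared log as "ao<-event".
class DispatchingServer {
    private final DispatchingServer parent;
    private final List<DispatchingServer> children = new ArrayList<>();
    private final Map<String, List<String>> aoSubs = new HashMap<>();                // local AO -> patterns
    private final Map<DispatchingServer, List<String>> childSubs = new HashMap<>(); // child DS -> patterns
    private final List<String> log;
    int eventsHandled = 0; // how many times this DS processed an event

    DispatchingServer(DispatchingServer parent, List<String> log) {
        this.parent = parent;
        this.log = log;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    private static boolean anyMatch(List<String> patterns, String event) {
        for (String p : patterns) {
            boolean hit = p.endsWith("*")
                    ? event.startsWith(p.substring(0, p.length() - 1))
                    : event.equals(p);
            if (hit) return true;
        }
        return false;
    }

    // A locally attached AO subscribes: record it here, then propagate upward only.
    void subscribe(String ao, String pattern) {
        aoSubs.computeIfAbsent(ao, k -> new ArrayList<>()).add(pattern);
        if (parent != null) parent.subscribeFromChild(this, pattern);
    }

    private void subscribeFromChild(DispatchingServer child, String pattern) {
        childSubs.computeIfAbsent(child, k -> new ArrayList<>()).add(pattern);
        if (parent != null) parent.subscribeFromChild(this, pattern);
    }

    // A locally attached AO generates an event.
    void publish(String event) {
        receive(event, null);
    }

    // 'from' is the neighbouring DS that forwarded the event, or null for a local AO.
    private void receive(String event, DispatchingServer from) {
        eventsHandled++;
        // (i) forward to the parent unconditionally, unless the parent sent it
        if (parent != null && from != parent) {
            parent.receive(event, this);
        }
        // (ii) forward to descendants with a matching subscription, never back to the sender
        for (DispatchingServer child : children) {
            if (child != from && anyMatch(childSubs.getOrDefault(child, new ArrayList<>()), event)) {
                child.receive(event, this);
            }
        }
        // (iii) deliver to directly attached AOs with a matching subscription
        for (Map.Entry<String, List<String>> e : aoSubs.entrySet()) {
            if (anyMatch(e.getValue(), event)) {
                log.add(e.getKey() + "<-" + event);
            }
        }
    }
}
```

Consider a root with children A and B, a subscriber to fail* attached below A, and an event failure published by an AO attached to A. In this model the root still processes the event once, while B never sees it. This is the cost discussed in the following paragraph.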

One may argue that JEDI events are dispatched upward to the top of the ED hierarchy even when, in principle, this might turn out to be unnecessary. For instance, the event shown in the example of Figure 2 (bottom diagram) is generated by, and has to be received by, AOs attached to the same sub-tree. Nevertheless, it has to be propagated up to the top of the hierarchy, since a matching subscription could have been issued in some other sub-tree, and the intermediate DSs handling the event have no knowledge of it. The other strategies, however, have their own advantages and disadvantages too. For instance, the distributed subscription strategy allows events to be distributed along the minimal path, since each DS is able to build the path that events have to follow to reach all the parties interested in receiving them (see [7] for an optimized algorithm to calculate this path). This approach, however, has the disadvantage of requiring a potentially high number of messages to be exchanged each time a new subscription (or unsubscription) request is issued. Therefore, this strategy is effective when the number of events is considerably larger than the number of subscriptions and unsubscriptions. The approach adopted in JEDI represents a reasonable compromise that is expected to operate satisfactorily in a variety of situations. Colleagues at the University of Colorado at Boulder and the University of California, Irvine, are conducting a detailed, quantitative analysis of possible alternative strategies [45]. As a concluding remark, notice that in JEDI the behavior of AOs is not influenced by the implementation strategy chosen for the ED. The choice between the centralized and the distributed version affects only the overall performance of the system, not the way AOs are implemented.

2.2.4 Preservation of event ordering

In general, a crucial issue in a distributed system is to establish a relationship between the order in which messages are generated and the order in which they are received. None of the communication mechanisms traditionally used over the Internet guarantees a total ordering of messages, since latency is extremely variable. With RPC or RMI, for instance, two clients might invoke the same remote method in an order that differs from the one seen by the component that receives these method invocations and implements the corresponding method. Consider for instance two clients A and B that invoke a method M at times t1 and t2 respectively, with t1 < t2. Let d1 and d2 be the times needed to deliver the two invocation requests to the machine hosting the method code (i.e., the server). Due to the variable latency of the network, it may happen that d1 > (t2 – t1 + d2), that is, t1 + d1 > t2 + d2, so that A's request arrives after B's. In this case, the server observes the two invocation requests in an order that is not consistent with the original ordering of the clients’ operations. In most cases developers live with this problem, since solving it would require a global clock, which would be very expensive to manage. Also, most distributed systems are built around a centralized element that naturally serializes the messages it receives, thus defining a total ordering that is not necessarily the actual order in which messages were generated, but that can still be considered acceptable. The ordering problem, however, becomes critical when a distributed event dispatcher comes into play. In this case, two events coming from different sources can be delivered to some recipients without passing through the same serialization point. This means not only that a recipient may receive events in an order different from the one in which they were generated, but also that different active objects can receive the same events in different orders.
The order of event delivery can be guaranteed only if events are tagged with a timestamp, a global clock among all the DSs is assumed, and the communication network provides a guaranteed fixed latency. The problem with such an approach is that it is not suited to wide-area networks, whose latency is highly variable. All these problems are considered and discussed in detail in [35]. As discussed in Section 1, one of the main goals of JEDI is to support the development of distributed applications composed of a large number of components, which is at odds with the assumption of a global clock. Therefore, we have chosen to guarantee only a particular form of partial ordering among events, i.e., causal ordering [30]. Under a causal ordering policy, if event e1 has been caused by the generation of event e2, then any AO subscribed to both e1 and e2 receives e2 before e1, and never vice versa. A special case of causality is the relationship among events generated by the same AO: JEDI ensures that the events generated by a given source are delivered to all the interested recipients in the order in which they were published. While ordering among causally related events is guaranteed, JEDI users should not make any assumption on the ordering of events that are not causally related.
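The special case mentioned at the end of the paragraph above, delivery in publication order for events from the same source, can be sketched with per-source sequence numbers. The text does not detail JEDI's mechanism, so this illustration rests on assumptions. `FifoReceiver` is hypothetical, and each source is assumed to number its events 0, 1, 2, and so on. The sketch does not capture causal relationships between different sources.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-source FIFO delivery, the "same source" special case of
// causal ordering. It does NOT implement general causal ordering across sources.
// Assumption: each source tags its events with consecutive sequence numbers 0, 1, 2, ...
class FifoReceiver {
    private final Map<String, Long> nextExpected = new HashMap<>();
    private final Map<String, Map<Long, String>> pending = new HashMap<>();
    final List<String> delivered = new ArrayList<>();

    // Called when an event arrives from the network, possibly out of order.
    void onArrival(String source, long seq, String event) {
        long next = nextExpected.getOrDefault(source, 0L);
        if (seq < next) {
            return; // duplicate of an event that was already delivered
        }
        Map<Long, String> buffer = pending.computeIfAbsent(source, k -> new HashMap<>());
        buffer.put(seq, event);
        // Release buffered events for as long as they continue this source's sequence.
        while (buffer.containsKey(next)) {
            delivered.add(buffer.remove(next));
            next++;
        }
        nextExpected.put(source, next);
    }
}
```

For instance, if event 1 from source A arrives before event 0 from A, the receiver holds it back. It releases both, in order, once event 0 arrives, while events from other sources are delivered immediately.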

As a final remark, our experience in implementing several event-based applications, including OPSS, has shown that causal ordering is sufficient to let pairs of components synchronize through the generation of events. We therefore argue that guaranteeing only causal ordering is acceptable, especially when compared with what most applications offer: in those applications, too, the order in which events (or messages) from different sources are received does not provide a trustworthy indication of the order in which they were generated.

2.2.5 Mobility

The ability to move running application components across the nodes of a network is currently a hot topic in software engineering research [22]. Mobility can be used to reduce network traffic, since applications can be moved (or can autonomously move) close to the resources they need for their execution. Moreover, it can be used to implement applications whose graphical front-ends (and their state) follow nomadic users as they move from site to site. Mobility is fairly orthogonal to the event-based paradigm. At the same time, our experience with JEDI has convinced us that the event-based communication paradigm is particularly suited to support communication among mobile components (some basic form of event communication is also provided by the Aglets mobility platform [31]). In fact, the decoupling among components introduced by the event dispatcher allows each component to operate independently of the physical location of the other components. Supporting mobile AOs imposes specific requirements on event-based infrastructures. In particular, if an AO can move, it is natural to require that, while it is moving from one place to another, the event-based infrastructure stores the subscriptions it has issued and the matching events generated in the meantime. JEDI offers two operations to handle the mobility of active objects: moveOut and moveIn. By invoking the moveOut operation, an AO temporarily disconnects from the event dispatcher. Through the moveIn operation, the AO can reconnect to the dispatcher at a later time. While the AO is disconnected, the event dispatcher stores the event patterns the AO is subscribed to, so that the AO does not have to re-subscribe when it reconnects. Moreover, at moveOut time, the AO can request the event dispatcher to store all the events matching its subscriptions that are generated while it is disconnected.
When the AO reconnects, it receives all the events generated during the disconnection that match its subscriptions. The event dispatcher delivers these events according to the causal ordering rule presented in Section 2.2.4. When the event dispatcher is distributed, the AO can either reconnect to the dispatching server it was initially connected to or connect to another dispatching server. In the latter case, the new dispatching server engages in direct communication with the old one in order to obtain all the subscriptions issued by the AO and all the events that have been buffered on its behalf. Moreover, the new dispatching server notifies its parent dispatching server that all new events addressed to the AO have to be routed through a new path. The moveOut and moveIn operations enable the exploitation of JEDI in conjunction with frameworks for building mobile agents, such as Aglets [31] and µCode [42]. It is therefore possible to implement generic mobile active objects (agents, in the terminology of the mobility community) that interact through events. In this case, the code mobility environment handles the state of the moving AOs, while JEDI deals with enqueueing and redistributing events on behalf of temporarily disconnected active objects. We are currently experimenting with the integrated use of JEDI and µCode [41]. Event-based infrastructures themselves can be profitably enriched with some basic mobility features. These features could be exploited by programmers who do not need all the features (security, management of remote resources, naming services, ...) provided by a full-fledged mobility framework, but can still take advantage of moving components from one site to another. To address this need, JEDI offers a specific mechanism to move reactive objects. In JEDI a reactive object can move to a different host by invoking the move operation, which causes the following actions to occur:

i. The reactive object is temporarily disconnected from the ED (i.e., the moveOut operation is invoked) and the thread of control executing the reactive object's main loop is stopped.

ii. The state of the reactive object (i.e., the value of its attributes) is serialized and stored using standard Java facilities [50].

iii. The reactive object is moved to the new location through a network connection. At the destination host, the reactive object is restarted and reconnected to the ED (i.e., the moveIn operation is invoked).
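The dispatcher-side bookkeeping implied by moveOut and moveIn can be sketched for a single dispatching server. The class `MobilityAwareServer`, its method signatures, and the use of a `Consumer<String>` as the AO's endpoint are hypothetical. Events are bare names. The sketch does not model the handoff between two dispatching servers or the move operation for reactive objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch (not the JEDI API) of moveOut/moveIn support in one dispatching
// server: subscriptions survive disconnection and, on request, matching events are
// buffered and replayed in arrival order when the AO reconnects.
class MobilityAwareServer {
    private final Map<String, List<String>> subscriptions = new HashMap<>(); // AO -> patterns
    private final Map<String, Consumer<String>> connected = new HashMap<>();  // attached AOs
    private final Map<String, List<String>> buffers = new HashMap<>();        // away AOs that asked for buffering

    void open(String ao, Consumer<String> sink) {
        connected.put(ao, sink);
    }

    void subscribe(String ao, String pattern) {
        subscriptions.computeIfAbsent(ao, k -> new ArrayList<>()).add(pattern);
    }

    // Temporary disconnection: subscriptions are kept, events are buffered if requested.
    void moveOut(String ao, boolean bufferEvents) {
        connected.remove(ao);
        if (bufferEvents) {
            buffers.put(ao, new ArrayList<>());
        }
    }

    // Reconnection, possibly with a new endpoint: replay the missed events first.
    void moveIn(String ao, Consumer<String> sink) {
        connected.put(ao, sink);
        List<String> missed = buffers.remove(ao);
        if (missed != null) {
            for (String event : missed) {
                sink.accept(event);
            }
        }
    }

    void publish(String event) {
        for (Map.Entry<String, List<String>> sub : subscriptions.entrySet()) {
            if (!matchesAny(sub.getValue(), event)) continue;
            Consumer<String> sink = connected.get(sub.getKey());
            if (sink != null) {
                sink.accept(event);
            } else if (buffers.containsKey(sub.getKey())) {
                buffers.get(sub.getKey()).add(event);
            } // otherwise the AO is away without buffering, and the event is dropped
        }
    }

    private static boolean matchesAny(List<String> patterns, String event) {
        for (String p : patterns) {
            if (p.endsWith("*") ? event.startsWith(p.substring(0, p.length() - 1)) : event.equals(p)) {
                return true;
            }
        }
        return false;
    }
}
```

Replaying the buffer in arrival order preserves, within one server, the order in which events reached it. Across servers, the distributed case described above would also require transferring the buffer to the new dispatching server.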

2.3 Summary of the JEDI operations and features

In summary, JEDI offers the following set of operations that can be invoked by any AO:

• open. It opens a connection with the event dispatcher. This is the first operation that any active object has to invoke.

• close. It closes the connection with the event dispatcher.

• subscribe. It subscribes the issuing AO to the set of events that match a given event pattern.

• unsubscribe. It removes an existing subscription.

• dispatch. It allows an AO to generate an event notification.

• getEvent. It retrieves the first event addressed to the AO from the queue of events associated with the AO.

• hasEvents. It checks whether the queue of events associated with the AO contains any event.

• moveOut. It is used to temporarily disconnect from the event dispatcher.

• moveIn. It is used to reconnect after a moveOut.

• move. It is used by reactive objects to move to another location.
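To make the list concrete, the non-mobility operations can be written as a Java interface. The following are all assumptions for illustration, not JEDI's `ConnectionToED` API: the names `EventConnection` and `LoopbackConnection`, the string-based signatures, and returning null from getEvent on an empty queue. The loopback class is only a single-AO test double. It enforces the rule that open must be invoked first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical Java rendering of the non-mobility operations (not JEDI's signatures).
// Events and patterns are simplified to plain strings.
interface EventConnection {
    void open();
    void close();
    void subscribe(String pattern);
    void unsubscribe(String pattern);
    void dispatch(String event);
    String getEvent();    // first queued event, or null if the queue is empty (assumption)
    boolean hasEvents();
}

// Test double: a "dispatcher" serving a single AO, so dispatched events loop back to it.
class LoopbackConnection implements EventConnection {
    private final List<String> patterns = new ArrayList<>();
    private final Deque<String> queue = new ArrayDeque<>();
    private boolean open = false;

    @Override public void open() { open = true; }

    @Override public void close() { open = false; patterns.clear(); queue.clear(); }

    @Override public void subscribe(String pattern) { requireOpen(); patterns.add(pattern); }

    @Override public void unsubscribe(String pattern) { requireOpen(); patterns.remove(pattern); }

    @Override public void dispatch(String event) {
        requireOpen();
        for (String p : patterns) {
            boolean hit = p.endsWith("*") ? event.startsWith(p.substring(0, p.length() - 1)) : event.equals(p);
            if (hit) {
                queue.addLast(event); // enqueue once, even if several patterns match
                return;
            }
        }
    }

    @Override public String getEvent() { return queue.pollFirst(); }

    @Override public boolean hasEvents() { return !queue.isEmpty(); }

    private void requireOpen() {
        if (!open) throw new IllegalStateException("open() must be invoked first");
    }
}
```

Keeping the AO-side operations behind an interface is what allows different connection implementations to be swapped without touching AO code. The mobility operations are omitted here for brevity.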

The event-based communication style used in JEDI is characterized by the following properties:

• It is asynchronous.

• Only the subscribers of an event will receive it.

• The source of a communication does not specify the destination of the communication.

• The destination of a communication does not necessarily know the identity of the source.

• Events are guaranteed to be received according to the causal relationships that hold among them. This property is guaranteed even in the presence of mobile AOs.

• An AO can disconnect from the dispatching server it is connected to and reconnect at a later time from a different host to a different dispatching server. JEDI stores the AO’s subscriptions and, if required, the events addressed to the AO while it is disconnected.

• Reactive objects are provided with a special operation that allows them to autonomously move from one host to another without losing any of the events they have subscribed to.

2.4 The implementation of JEDI

JEDI has been implemented as a framework of Java classes. The framework includes the event dispatcher and the classes needed to develop active and reactive objects, organized as two Java packages. Package polimi.jedi contains the classes needed to implement active and reactive objects. Package polimi.jedi.dispatcher includes the classes that implement the event dispatcher. Figure 3 and Figure 4 describe the UML logical design of the two packages.

Each active object communicates with the event dispatcher through the methods offered by interface ConnectionToED (Figure 3). This interface includes all the operations listed in Section 2.3 and hides the details of the communication between the AO and the event dispatcher. Thanks to this design choice, it is possible to change the implementation of the ED (e.g., to move from the centralized to the distributed ED) without affecting existing AOs. Currently, the infrastructure provides two implementations of interface ConnectionToED: classes RMIConnectionToED and SocketConnectionToED. The former uses RMI to connect to the event dispatcher, while the latter uses standard TCP/IP sockets. An ad-hoc, eight-bit protocol has been developed to send and receive events and event subscriptions over plain sockets. Plain TCP/IP communication can be used to implement components that have to run in an environment that does not support RMI (e.g., a Java 1.0 virtual machine). Furthermore, TCP/IP connections allow non-Java active objects to exploit the features of the JEDI event dispatcher.

JEDI provides an abstract class called ReactiveObject to implement reactive objects. Users can easily implement new reactive objects by creating subclasses of ReactiveObject. These subclasses have to provide a suitable implementation of the abstract method processMessage, which is called each time a new event is received. Each reactive object uses an RMIConnectionToED instance to communicate with the event dispatcher.

Figure 4 illustrates the most important Java classes used to implement the event dispatcher (package polimi.jedi.dispatcher) and their relationships. An instance of class EventQueue stores the queue of events that have been received and not yet dispatched, while an instance of class Register contains all the subscriptions that have been received by the ED. RMIBasedED is the main class. Each RMIBasedED instance constitutes what we called a dispatching server in Section 2.2.3. The relation between RMIBasedEDs shown in Figure 4 models the connection among different dispatching servers to create a distributed event dispatcher. Each RMIBasedED instance is an RMI server that exports the services used to publish, receive, subscribe to, and unsubscribe from events. Moreover, each RMIBasedED instance also acts as a TCP/IP daemon, waiting for TCP/IP connections from the AOs that are interested in using TCP/IP to communicate with the event dispatcher. Each time a new TCP/IP connection is opened, a CommunicationThread instance is created to manage it.

As discussed in more detail in Section 5.1, a key distinctive feature of an event-based infrastructure is the set of mechanisms used to observe and notify events. There are basically two approaches: push and pull. In a push approach, events are pushed from the source to the event dispatcher (observation) or from the event dispatcher to the recipient (notification). The pull model assumes that it is the event dispatcher that “pulls” events from the source (observation) or the recipient that “pulls” them from the event dispatcher (notification). In JEDI, event observation is always based on a push approach, i.e., it is always the producer that contacts the event dispatcher to deliver an event. Conversely, event notification can be accomplished using either a pull or a push approach: JEDI active objects exploit a pull approach, while reactive objects are based on a push behavior.
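The following single-process sketch mirrors the dispatcher roles just described. It has a queue of events that have been received but not yet dispatched, and a register of subscriptions. Reactive objects are notified by push, through a callback. Active objects are notified by pull, through a per-object queue. Several elements are assumptions: the names SketchDispatcher and SketchReactiveObject, the String event type, the prefix-based subscriptions, and the processMessage(String) signature. The real classes communicate over RMI or sockets.

```java
import java.util.*;

// Hypothetical stand-in for a reactive object: the dispatcher pushes each
// matching event by calling processMessage (parameter type assumed).
abstract class SketchReactiveObject {
    abstract void processMessage(String event);
}

final class SketchDispatcher {
    // Events received from producers and not yet dispatched.
    private final Deque<String> eventQueue = new ArrayDeque<>();
    // Register: subscriber -> event prefixes it subscribed to (assumed pattern form).
    private final Map<Object, List<String>> register = new LinkedHashMap<>();
    // Per-active-object queues used for pull-based notification.
    private final Map<Object, Deque<String>> pullQueues = new HashMap<>();

    void subscribe(Object subscriber, String prefix) {
        register.computeIfAbsent(subscriber, k -> new ArrayList<>()).add(prefix);
        if (!(subscriber instanceof SketchReactiveObject)) {
            pullQueues.computeIfAbsent(subscriber, k -> new ArrayDeque<>());
        }
    }

    // Observation is push-only: producers hand events to the dispatcher.
    void publish(String event) {
        eventQueue.addLast(event);
    }

    // Drains the event queue, routing each event to its subscribers.
    void dispatchPending() {
        while (!eventQueue.isEmpty()) {
            String ev = eventQueue.pollFirst();
            for (Map.Entry<Object, List<String>> e : register.entrySet()) {
                if (!e.getValue().stream().anyMatch(ev::startsWith)) continue;
                Object sub = e.getKey();
                if (sub instanceof SketchReactiveObject) {
                    ((SketchReactiveObject) sub).processMessage(ev); // push
                } else {
                    pullQueues.get(sub).addLast(ev);                 // kept until pulled
                }
            }
        }
    }

    // Pull side used by active objects: null when nothing is pending.
    String getEvent(Object activeObject) {
        Deque<String> q = pullQueues.get(activeObject);
        return q == null ? null : q.pollFirst();
    }
}
```

Producers only append to the queue, which matches JEDI's push-only observation. The push/pull distinction appears solely on the notification side.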

Figure 3: Package polimi.jedi (classes ConnectionToED, RMIConnectionToED, SocketConnectionToED, EventQueue, and ReactiveObject, with a connectedTo relation to EventDispatcher from polimi.jedi.dispatcher).

Figure 4: The event dispatcher (package polimi.jedi.dispatcher): classes EventDispatcher (parent/son relation), RMIBasedED, Register, EventQueue (from polimi.jedi), and CommunicationThread.

3 OPSS

WFMSs (WorkFlow Management Systems) support human beings in the execution of processes (also called workflows). There are several examples of processes in many domains of our society, ranging from traditional accounting and IS processes to engineering processes such as those used to develop software. Usually, WFMSs exploit some form of process model, i.e., a formal description of the steps to be carried out to pursue the business objective. The core component of a WFMS is the process engine. It enacts the process model and, by doing so, guides and supports human actors in the accomplishment of their activities. It also guarantees that the actions performed by human actors are coherent with the constraints and properties mandated by the process model. Finally, it automates the execution of repetitive tasks. Process engines usually exploit a database that persistently stores the current state of the process. Another important element of WFMSs is the user interaction environment. It is usually composed of a number of tools such as an agenda, a mail client, and other specialized elements. These tools allow human actors to be notified of their assignments and to proactively perform actions that push forward the state of the process.

OPSS (ORCHESTRA Process Support System) is a WFMS that has been developed as part of the ORCHESTRA project. ORCHESTRA (Open aRCHitecture for supporting Enhanced Services in inTegRAted broadband networks) is a retailing infrastructure supporting the development, deployment, and operation of multimedia services [34, 17]. It allows users distributed over a wide-area network to transparently access services from several types of terminals. It also supports nomadic users: they can access the ORCHESTRA environment regardless of their physical location. In ORCHESTRA, services can be distributed and replicated across the network, depending on load-balancing needs. Users do not need to be aware of such distribution and replication, since ORCHESTRA is in charge of locating and executing services on their behalf. Within the ORCHESTRA context, OPSS has been specifically conceived to support the design and operation of sophisticated process-based services.
Examples of such services are electronic commerce, customer care, and remote education. We call them business services. Basic requirements for these services can be summarized as follows:

• Services have to be scalable with respect to the distribution of the involved users and operators.

• Services have to cope with a number of users that changes dynamically. Depending on particular circumstances, the number of customers can vary from tens to thousands of people. This is quite unusual in traditional process-based activities, where the number and the identity of actors are quite stable.

• The user interaction environment provided by services has to be dynamically deployed onto the customer terminals. In ORCHESTRA, in fact, we assume that each service is allowed to install on the user terminal all the components needed to use the service.

To address these requirements we implemented OPSS on top of the JEDI framework. In the remainder of this section we present the main characteristics of OPSS and its architecture.

3.1 The Architecture of OPSS

In OPSS, the activities that constitute a process can be executed by human agents or by some computerized support. The executors of process activities are collectively called agents. Each agent receives an activity description (i.e., a process model fragment) and executes it. An activity description may be specified in any language that can be understood by the agent that is supposed to execute it. OPSS exploits three types of agents: software agents, human agents, and external tools.

Figure 5: The ORCHESTRA Process Support System (components: human agents, software agents, tools, activity descriptions, Agenda, Event Dispatcher, State Server, OPSS Viewer).

Software agents are computerized interpreters of executable activity descriptions. In the current implementation of OPSS, we have taken a very simplistic approach: activity descriptions for software agents are simply coded in Java (exploiting a set of classes offering specific process semantics, see later on) and are defined as sub-classes of ReactiveObject. Thus, software agents are Java interpreters.5 Human agents are persons executing creative, human-specific activities (e.g., customer service operators). Activity descriptions executed by human agents can be written in natural language or in any simple graphical format that is understood by the agent. Activity descriptions for human agents are received and visualized by the Agenda tool.

5 In principle, it is possible to introduce additional software agents for full-fledged process modeling languages without affecting the general architecture of the environment.

External tools are components that execute business-specific activities (e.g., a configuration management procedure). The activity description for an external tool is just the set of information needed to launch and operate the tool (e.g., the initial parameters). External tools can be either OPSS-compliant or off-the-shelf tools. The latter have to be interfaced with OPSS through suitable gateways or wrappers. The JEDI interface ConnectionToED supports the programmer in the implementation of both tools and gateways.

3.1.1 State Server

Like any other WFMS, OPSS has a persistent repository storing the state of the enacting process. This component is called State Server and mirrors the state of all the process entities. According to the formalization proposed by the Workflow Management Coalition [60], the key entities of a process are activities, agents in charge of executing them, resources (e.g., tools and devices) used to carry out the activities, and artifacts used as inputs or produced as results by activities. These entities are represented in the State Server as a set of objects, called process entity representatives, each containing a detailed description of a specific process entity. These objects constitute a reification of the process state [53]. The State Server subscribes to events such as the login of users and the creation of new activities, artifacts, or resources. When one of these events occurs (e.g., a new activity needs to be started), it creates the corresponding process entity representative. Process entity representatives show a reactive behavior themselves. In particular, they have a state, subscribe to events, and react to them according to rules that define the set of admissible transitions between states. Process entity representatives are organized in a class hierarchy rooted at ProcessElement (see Figure 6), which, in turn, is a subclass of ReactiveObject. The subclasses of ProcessElement are the following:

• AgentInfo. This class defines the possible states of process agents: Available (i.e., the agent can be assigned to the execution of an activity) and NotAvailable.

• ActivityInfo. This class is used to maintain information on the activities of the process. An activity can be in one of the following states: Defined, Assigned, OnGoing, Suspended, Terminated, Aborted. These states are presented in more detail later on.

• ArtifactInfo. This class defines the information concerning documents and data manipulated in the process. The possible states are Created, OnEdit, Edited, and Destroyed.

• ResourceInfo. This class defines information about the tools that can be invoked or used by OPSS (e.g., the executable code of the Java interpreters or of an external tool, devices such as a printer or an audio device). The possible states are Available and NotAvailable.

Figure 6: Process entity representatives and the State Server structure (classes StateServerRMI, StateServerRMI_Impl, ReactiveObject, ProcessElement, AgentInfo, HumanAgentInfo, SoftwareAgentInfo, ToolInfo, ActivityInfo, ArtifactInfo, ResourceInfo; associations Creates/modifies, Uses, IsExecutedBy, Precedes).

Each of the above classes is associated with a finite state machine (FSM) called life cycle. A life cycle defines the set of events the process entity is interested in and the set of admissible transitions between states. A transition is defined by a triple: triggering event, condition, and action. In this respect, transitions are similar to ECA rules in active databases (see Section 5.1 for a brief description of ECA rules). When an object receives an event Ei in a state Sj, all the transitions having Sj as initial state and Ei as triggering event are evaluated for firing. One of the transitions whose condition evaluates to true is chosen nondeterministically and fired. Firing a transition executes its action part and moves the instance to the target state. The execution of the action part of a state transition can produce new events that may influence the behavior of agents and the state of other objects in the State Server.
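A life cycle of this kind can be sketched as a list of (source state, triggering event, condition, action, target state) tuples. The class below is a hypothetical illustration, not OPSS code. Conditions are BooleanSupplier instances, and actions are Runnable instances that may publish further events. The nondeterministic choice among enabled transitions is modeled with java.util.Random.

```java
import java.util.*;
import java.util.function.BooleanSupplier;

// Hypothetical life cycle: ECA-like transitions evaluated on each event.
final class LifeCycle {
    private static final class Transition {
        final String from, event, to;
        final BooleanSupplier condition;
        final Runnable action;

        Transition(String from, String event, BooleanSupplier condition, Runnable action, String to) {
            this.from = from; this.event = event; this.condition = condition;
            this.action = action; this.to = to;
        }
    }

    private final List<Transition> transitions = new ArrayList<>();
    private final Random random;
    private String state;

    LifeCycle(String initialState, Random random) {
        this.state = initialState;
        this.random = random;
    }

    LifeCycle on(String from, String event, BooleanSupplier condition, Runnable action, String to) {
        transitions.add(new Transition(from, event, condition, action, to));
        return this;
    }

    String state() { return state; }

    // The events the object must subscribe to: every triggering event.
    Set<String> interestingEvents() {
        Set<String> s = new TreeSet<>();
        for (Transition t : transitions) s.add(t.event);
        return s;
    }

    // Collects the transitions leaving the current state on this event whose
    // condition holds, fires one of them chosen at random, and returns true.
    // Returns false (state unchanged) if no transition is enabled.
    boolean receive(String event) {
        List<Transition> enabled = new ArrayList<>();
        for (Transition t : transitions) {
            if (t.from.equals(state) && t.event.equals(event) && t.condition.getAsBoolean()) {
                enabled.add(t);
            }
        }
        if (enabled.isEmpty()) return false;
        Transition chosen = enabled.get(random.nextInt(enabled.size()));
        chosen.action.run(); // may produce new events
        state = chosen.to;
        return true;
    }
}
```

Passing a seeded Random makes the choice reproducible in tests. interestingEvents() returns the set of events the object would subscribe to.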

Figure 7: The Activity life cycle (states Defined, Assigned, OnGoing, Suspended, Terminated, Aborted; transitions AssignAgent, WillingToStartActivity, Suspend, Resume, Terminate, Abort).
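The ActivityInfo life cycle of Figure 7, described step by step in the next paragraph, can also be encoded as a plain transition table. In this hypothetical sketch, the guard of each transition is passed in as a boolean. Examples of guards are "the assigned agent is Available" and "the preceding activities are Terminated". The returned string is the event the representative would publish. The Refused(...) warning event is an invented name, since the warning events are not named. We read the four Abort arcs as leaving Defined, Assigned, OnGoing, and Suspended; this is our interpretation of the figure.

```java
import java.util.*;

// States of an activity, as read from Figure 7.
enum ActivityState { DEFINED, ASSIGNED, ON_GOING, SUSPENDED, TERMINATED, ABORTED }

final class ActivityLifeCycle {
    private static final Map<ActivityState, Map<String, ActivityState>> TABLE =
            new EnumMap<>(ActivityState.class);

    private static void add(ActivityState from, String event, ActivityState to) {
        TABLE.computeIfAbsent(from, k -> new HashMap<>()).put(event, to);
    }

    static {
        add(ActivityState.DEFINED, "AssignAgent", ActivityState.ASSIGNED);
        add(ActivityState.ASSIGNED, "WillingToStartActivity", ActivityState.ON_GOING);
        add(ActivityState.ON_GOING, "Suspend", ActivityState.SUSPENDED);
        add(ActivityState.SUSPENDED, "Resume", ActivityState.ON_GOING);
        add(ActivityState.ON_GOING, "Terminate", ActivityState.TERMINATED);
        // Assumed reading of the four Abort arcs in the figure.
        for (ActivityState s : EnumSet.of(ActivityState.DEFINED, ActivityState.ASSIGNED,
                                          ActivityState.ON_GOING, ActivityState.SUSPENDED)) {
            add(s, "Abort", ActivityState.ABORTED);
        }
    }

    private ActivityState state = ActivityState.DEFINED;

    ActivityState state() { return state; }

    // Returns the event to publish, or null. Events with no transition from
    // the current state are ignored; a failing guard yields a warning event
    // (hypothetical name) and leaves the state unchanged.
    String receive(String event, boolean guardHolds) {
        ActivityState target = TABLE.getOrDefault(state, Collections.emptyMap()).get(event);
        if (target == null) return null;
        if (!guardHolds) return "Refused(" + event + ")";
        state = target;
        switch (event) {
            case "AssignAgent": return "AgentAssigned";
            case "WillingToStartActivity": return "ActivityStarted";
            default: return null;
        }
    }
}
```

A table keeps the admissible transitions in one place, so an event that is not admissible in the current state is simply ignored. Terminated and Aborted have no outgoing entries and are therefore final.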

As an example, Figure 7 shows the life cycle associated with class ActivityInfo. Upon creation, the state of an object AI of this class is set to Defined. In this state AI is characterized solely by a unique identifier and by an activity description. AI can enter state Assigned when it receives event AssignAgent(activityID, agentID), i.e., when an agent has been selected to execute the activity. The transition to state Assigned can only be executed if the instance of class AgentInfo representing agent agentID is in state Available. This state transition triggers the production of event AgentAssigned(activityID, agentID) and, if the agent becomes fully booked as a result of the current assignment, the transition of the AgentInfo instance into state NotAvailable. Agendas subscribe to event AgentAssigned to provide human agents with information about their assignments. In state Assigned, when AI receives event WillingToStartActivity(agentID, activityID), it checks whether the preceding activities have been terminated. If this is the case, it moves to state OnGoing and produces event ActivityStarted(activityID, AD-URL). The agent assigned to activity activityID must subscribe to this event or, for a human agent, his/her Agenda must. Parameter AD-URL contains the location of the activity description to be executed. If for any reason activity activityID cannot be started when event WillingToStartActivity is received (e.g., it has to wait for the termination of some other activity), AI produces an event to warn the requesting agent. Besides its event-based interface, the State Server exports a set of RMI services through which any OPSS component can query the state of the running process (i.e., of the process entity representatives). These services constitute a synchronous interaction mechanism that is not directly supported by JEDI.

In the current OPSS prototype the State Server is implemented as a centralized component. This can have negative effects on the scalability of the system: the State Server can become a bottleneck, especially when agents are distributed over a wide area. We are working on a new distributed State Server. Since process entities within the State Server are autonomous objects that evolve according to their own FSM and communicate with all the other entities (including the ones running in the same State Server) through events, it is relatively easy to distribute them over a number of State Servers. The main issue to be dealt with concerns the creation of these process entities. This activity is currently performed by the centralized State Server based on the events received from agents. In the distributed implementation, State Servers would have to be coordinated to prevent more than one State Server from creating a copy of the same entity. Otherwise, multiple copies of the same entity would generate duplicated events that would have to be handled by the other components of OPSS.

3.1.2 OPSS Viewer

OPSS Viewer is a monitoring tool that provides information on the state of the process. When it is launched, it sends event StartMonitor to notify other OPSS components of its creation. Each process entity representative has been implemented to subscribe to this event and to react to its occurrence by generating a proper response event, which carries information on the current state of the process entity representative. The Viewer collects all these events and exploits them to provide human agents with an initial visualization of the process state. After this initial setup, the Viewer listens to all the events that notify specific state changes occurring during the normal execution of the process, and uses their contents to update the information offered to the human agent. Multiple viewers can coexist without interfering with each other or with the process being executed. They can subscribe to the same events and, based on the information carried by such events, provide human agents with different and independent representations of the same process. Figure 8 and Figure 9 show the process representation of two different viewers we have implemented so far. In the viewer shown in Figure 8 the process is represented in terms of the process entities stored in the State Server. The rightmost window in the figure illustrates the set of process entity representatives of the technology advisor process, which is presented in more detail in Section 3.2. The leftmost window describes the life cycle of a particular process entity representative and its current state. In the viewer shown in Figure 9 the process is represented in terms of the sequence of activities that constitute the process, and of their input-output and control-flow relationships. The description is given in a standard notation called IDEF0 [18]. The diagram is animated by changing the color of the activities being executed. The control signals represent the events received by the activity representatives.
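The StartMonitor handshake can be illustrated with a small synchronous simulation. The Bus, Representative, and Viewer classes, as well as the CurrentState and StateChanged event names, are hypothetical. Each representative answers StartMonitor with its current state. A viewer builds its initial picture from these answers and then applies state-change events. Because viewers only subscribe and never alter representatives, any number of them can be attached.

```java
import java.util.*;
import java.util.function.Consumer;

// Minimal synchronous bus standing in for the event dispatcher (hypothetical).
final class Bus {
    private final List<Consumer<String[]>> listeners = new ArrayList<>();

    void subscribe(Consumer<String[]> l) { listeners.add(l); }

    // Iterates over a copy so listeners may publish while being notified.
    void publish(String... event) {
        for (Consumer<String[]> l : new ArrayList<>(listeners)) l.accept(event);
    }
}

// A process entity representative that answers StartMonitor with its state.
final class Representative {
    private final Bus bus;
    private final String id;
    private String state;

    Representative(Bus bus, String id, String initialState) {
        this.bus = bus;
        this.id = id;
        this.state = initialState;
        bus.subscribe(ev -> {
            if (ev[0].equals("StartMonitor")) bus.publish("CurrentState", id, state);
        });
    }

    void changeState(String newState) {
        state = newState;
        bus.publish("StateChanged", id, newState);
    }
}

// A viewer: bootstraps from the answers to StartMonitor, then tracks changes.
final class Viewer {
    final Map<String, String> view = new TreeMap<>();

    Viewer(Bus bus) {
        bus.subscribe(ev -> {
            if (ev[0].equals("CurrentState") || ev[0].equals("StateChanged")) {
                view.put(ev[1], ev[2]);
            }
        });
        bus.publish("StartMonitor");
    }
}
```

A second viewer started later triggers new CurrentState answers that the first viewer also receives. In this sketch those answers simply overwrite entries with identical values.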

Figure 8: The user interface of the basic OPSS viewer.

Figure 9: The user interface of the IDEF0-based OPSS viewer.

3.2 An example of business process implemented in OPSS

To validate the JEDI/OPSS approach, we have implemented an ORCHESTRA service called technology advisor. This service provides users with information and recommendations about technological problems. In particular, a user can log in to the service and browse through the subjects supported by the technology advisor. Each subject is associated with several multimedia documents and services. In general, documents are automatically downloaded and displayed on the user’s computer. Services include the possibility to set up synchronous conversations with experts and to send them asynchronous multimedia messages. The technology advisor manages the interaction between users and experts, according to the subject and to the experts’ workload.


We started the implementation of the technology advisor process by identifying its main entities and their representatives in the State Server. The artifacts used or produced during the process are technical papers, presentations, and videos. The agents operating the process are the human experts and the Java interpreters in charge of executing the automated activities. The resources exploited during process enactment are the remote conferencing system provided by ORCHESTRA, a search engine that makes it possible for customers to browse the information provided by the service, and some agendas provided both to customers and to human experts. There are three process activities, one executed by human agents and two by software agents. The automated activities are manageUserAccess, in charge of authenticating users who access the service, and manageUserInteraction, in charge of reacting to the requests of a specific user. A new instance of manageUserInteraction is created each time a new user enters the service. The activity executed by the human experts is called manageMeeting. When executing it, an expert instructs one or more customers on a specific subject, interacting with them through the ORCHESTRA remote conferencing service. As an example of execution of the technology advisor, consider the case in which a customer requests, through his/her agenda, a synchronous conversation with an expert. As a result, the agenda generates an event AskForMoreInfo. This event is handled by the manageUserInteraction activity description, which, in turn, queries the State Server (through its synchronous RMI interface) to check whether at least one expert skilled in the subject specified by the user is available. If this is the case, it generates an event CreateNewActivity that, in turn, causes a representative of the manageMeeting activity to be created in the State Server.
As soon as manageUserInteraction is notified of the creation of the new activity, it generates the AssignAgent event, specifying as a parameter the identifier of one of the available agents. This event is received by the manageMeeting representative. If the agent is still available at that time, the representative changes its state to Assigned (as specified in Section 3.1.1) and generates an AgentAssigned event. Upon receiving it, the expert's agenda issues a WillingToStartActivity event. This event, in turn, causes the manageMeeting representative to change its state to OnGoing and to request ORCHESTRA to start a remote conferencing session between the customer and the expert. Notice that if the designated agent is no longer available when the manageMeeting representative receives event AssignAgent, an error notification is generated. This notification is received by manageUserInteraction, which tries to assign another agent.

4 EVALUATION

The development of OPSS has demonstrated that the main advantage of the event-based paradigm supported by JEDI is the easy reconfigurability of the system. However, our experience has also identified some problems and open issues that are discussed hereafter.

4.1 Synchronous vs. asynchronous communication

In JEDI, active objects communicate using a pure event-based style. Namely, the only means for an active object to send (receive) information is to generate (receive) an event. Events are sent and received asynchronously. We have noticed that in many situations an active object, after generating an event, needs some response from the recipient(s) of the event in order to continue its processing. For instance, consider the case in which an agent needs to notify the State Server that a new activity has to be created and that this activity has to be assigned to a certain agent. The agent executes the following code fragment:

    sendEvent("CreateNewActivity(ActID,ActType)");
    sendEvent("AssignAgent(ActID,AgentID)");

The execution of this code might be erroneous because of possible race conditions. For instance, the State Server might be unable to react to event CreateNewActivity in time. This happens if it has not yet created the corresponding ActivityInfo object when event AssignAgent is produced. As a result, event AssignAgent is lost, because the ActivityInfo object subscribes to it too late. In this case the State Server does not keep track of the agent assignment. To avoid this situation, the agent should receive confirmation of the creation of the ActivityInfo object before generating the next event. In OPSS we have obtained this behavior by programming the event recipient to produce an event that acts as a “response” to the initial event. This way, the source of the initial event can explicitly subscribe to this event and wait for its occurrence before producing the AssignAgent event. This solution is quite cumbersome and expensive. It requires the exchange of a large number of messages between the event source, the recipient(s), and the event dispatcher.
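The response-event workaround can be sketched as follows. AsyncBus is a hypothetical stand-in for the dispatcher that delivers on a worker thread. ActivityCreated(...) is an assumed name for the State Server's response event. The agent subscribes to the response before publishing CreateNewActivity, since otherwise the response itself could be missed. It then blocks on a CountDownLatch until the response arrives or a timeout expires, and only then publishes AssignAgent.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Consumer;

// Hypothetical asynchronous bus: publish() returns immediately and each
// delivery runs later on a worker thread.
final class AsyncBus {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    void subscribe(Consumer<String> s) { subscribers.add(s); }

    void unsubscribe(Consumer<String> s) { subscribers.remove(s); }

    void publish(String event) {
        for (Consumer<String> s : subscribers) pool.execute(() -> s.accept(event));
    }

    void close() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}

final class RequestingAgent {
    // Creates an activity, waits for the response event, and only then
    // assigns it, so AssignAgent cannot overtake the ActivityInfo creation.
    static boolean createAndAssign(AsyncBus bus, String actId, String actType,
                                   String agentId, long timeoutMs) throws InterruptedException {
        CountDownLatch created = new CountDownLatch(1);
        String response = "ActivityCreated(" + actId + ")"; // hypothetical response event
        Consumer<String> waiter = ev -> { if (ev.equals(response)) created.countDown(); };
        bus.subscribe(waiter); // subscribe BEFORE sending the request
        try {
            bus.publish("CreateNewActivity(" + actId + "," + actType + ")");
            if (!created.await(timeoutMs, TimeUnit.MILLISECONDS)) return false;
            bus.publish("AssignAgent(" + actId + "," + agentId + ")");
            return true;
        } finally {
            bus.unsubscribe(waiter);
        }
    }
}
```

The extra subscription, the response event, and the blocking wait are exactly the overhead that the next paragraph proposes to fold into a synchronous operation.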
An alternative solution would be to explicitly define in JEDI the concept of “return value” from the event recipient(s) back to the agent that has generated the event, and to provide programmers with mechanisms to easily manage these values. In particular, we are currently introducing in JEDI an additional synchronous operation for event generation that requires a “return value” from the recipient(s) of the event. The execution of this operation allows an active object to send an event to the dispatcher and wait until some information is returned. The information comes from the event recipient(s) or, if no object is interested in the event, from the event dispatcher. When the event has multiple recipients, several policies can be envisaged to manage the return values. For instance, the source can wait for the first return value, or it can wait until all the recipients have provided a response. In the latter case the event dispatcher should inform the source of the number of return values that it should receive. Notice that this additional synchronous mechanism still preserves the anonymity of the recipient(s) of the event, since the exchange of return values can still be managed by the event dispatcher. More generally, the mechanism preserves the basic semantics of events (multicast dispatching and anonymity of both source and recipients). It also introduces a significant amount of flexibility and optimization in the management of complex agent interaction patterns.

4.2 Event granularity

Identifying the events to be exchanged among agents turned out to be a significant problem. If the granularity of events is very fine, many events have to be generated, since each of them carries a limited meaning. This choice might significantly complicate the programming activity, reduce the performance of the system, and make the system difficult to test and monitor. On the other hand, a too coarse-grained definition of events might hide inside agents significant operations that must be made visible to the rest of the system. For instance, consider the example presented in the previous section. In that case, events CreateNewActivity and AssignAgent (which gave us several synchronization troubles) could have been replaced by a single event. That event would carry the information about both the creation of the activity and its assignment to the specified agent. This design choice reduces the number of exchanged events but modifies the semantics of activities. Under it, an activity can be created only if a proper executing agent has already been designated and if the creator of the activity is aware of this designation. There is no universal solution to this event design problem. It is the designer’s responsibility to evaluate the trade-off and select the most suitable solution, based on the constraints and requirements of the problem being addressed. Still, event-based infrastructures can support designers in this decision. They can provide suitable event composition languages and mechanisms that allow higher-level events to be synthesized from lower-level events. In [13] we approach this problem by introducing into the event dispatcher a new component called event filter. The event filter captures all the events generated by components and uses them to synthesize new events according to the guidelines provided by a set of filtering rules. A filtering rule is composed of two main parts: an event expression and an event generator. The event expression defines the specific combination of input events to be recognized in order to produce a new event, called filtered event. The event generator indicates how to compute the filtered event. Components can post filtering rules to the event filter according to their specific needs. Thus, whatever the granularity of the events generated by components, developers can define proper filtering rules so that other specific components receive coarser-grained events. We are currently assessing this approach. Similar approaches are presented in Section 5.

4.3 Remote procedure call vs. event-based design paradigms

The event-based paradigm represents a significant shift with respect to traditional synchronous remote procedure call approaches. In a remote procedure call approach, interaction between components occurs when a component is not able to perform some operation and asks some other component to do it on its behalf. In an event-based approach, components are autonomous entities that inform the “external world” of the main changes occurring in their internal state or in the state of the components and devices they can observe. The notification of an event is seen by a component as an external stimulus that can determine a change in its internal state. Thus, collaboration among components is indirect. Based on this consideration, a main step in understanding the advantages and drawbacks of the remote procedure call and event-based design paradigms should be the identification of the classes of systems that better suit each approach. Since they address different requirements, we are convinced that event-based and remote procedure call approaches are not mutually exclusive. Instead, they can be profitably integrated even in the same system. In OPSS we have used the event-based approach to guarantee the autonomy of process agents and the reconfigurability of the system. We have used the remote procedure call approach to query the global state of the process maintained by the State Server. We are aware, however, that a systematic study and characterization of the problem is definitely needed.

4.4 Network-wide event distribution

The development of OPSS has emphasized the need for powerful and efficient mechanisms to support the notification and distribution of events on a network-wide scale (e.g., on the Internet). The event-based infrastructure must guarantee that the services implemented on top of it are made available to users dispersed over the Internet. The hierarchical ED we implemented may represent an initial solution to the problem. However, a number of issues remain to be addressed. First, as we have underlined in Section 2.2.3, several other event routing strategies can be envisaged. Second, connection topologies alternative to the hierarchical one have to be evaluated. Finally, the impact of the expressive power provided by the subscription mechanism on the performance of the system has still to be analyzed. Colleagues at the University of Colorado at Boulder and the University of California, Irvine, are addressing these issues by defining and assessing new architectures for distributed EDs [44, 45].

4.5 Mobility

We argue that the event buffering and forwarding features provided by JEDI to support the mobility of active objects represent a powerful mechanism for implementing sophisticated applications. However, these features may introduce several problems when combined with ED distribution. The ED, in fact, has to provide specific mechanisms to guarantee that mobile objects do not receive duplicated events and that the original ordering of events is respected. We provided a specific solution for our hierarchical ED, but the impact of this issue on alternative ED architectures is still to be understood. Also, we still lack extensive experimentation with mobility, since it was not exploited in the OPSS implementation.

4.6 Event structure

A JEDI event is a simple sequence of strings. An alternative solution could have been to define events as Java objects. In principle, by exploiting the Java serialization API, it is possible to transmit any serializable Java object through a network connection. As a consequence, we could have defined a class Event with a minimal set of attributes and methods, allowed programmers to specialize it, and used the resulting subclasses to instantiate their own events. A similar solution has been adopted for the development of the C2 event-based infrastructure (see Section 5.1). We have chosen a simple event structure for the sake of flexibility and interoperability. Indeed, a more sophisticated solution would have significantly enriched the semantics of events. However, it would also have constrained the possibility of using different languages to develop active objects. Moreover, the event structure we have selected, although simple, makes it possible to implement an easy-to-use and expressive event subscription operation. Other structures for events are presented in Section 5.1.1.

4.7 Global vs. local type system

A critical issue in the development of JEDI has been the selection of the type system used to create and distribute events. There are at least two basic alternatives. In the first, the space of event types exchanged through the event dispatcher is global: all the components in the event-based infrastructure see and use the same set of event types. The second alternative assumes that there is no global type space. Each constituent of the event-based infrastructure can produce events without referring to a specific type. A subscriber is supposed to know the structure of the events it receives, while this structure is completely hidden from the event-based infrastructure. Certainly, a global type system may be desirable, since in general it makes it possible to perform significant consistency and compatibility checks on the information being exchanged. Still, we have preferred to adopt the second solution, because an important requirement underlying the development of JEDI is its ability to operate effectively over the Internet. The experience of the past years has demonstrated that it is extremely difficult to define type systems at the Internet level, where it is necessary to cross company boundaries and involve millions of independent users. The issue is not merely technical: it is also related to scalability and ease of operation. The Internet is inherently decentralized and based on autonomous and independent operators. Type compatibility cannot be enforced by an explicit, network-wide (and thus conceptually centralized) type system; rather, it is the result of a set of simple and voluntary conventions. MIME is a typical example of such an approach. It does not define the structure of the different files being exchanged over the Internet; it is used just to label the documents being exchanged, so that each party can access them according to agreed procedures and tools (e.g., a “text” file is what you can usually open with an editor). This position is far from being consolidated and accepted in the community, as demonstrated by the debate that took place at WISEN 98 [60]. As a last remark, we argue that the two issues of event structure and event type system are fairly orthogonal. For instance, it is possible to offer the ability to create complex, structured events without using any global type space. Conversely, one may use a very simple structuring mechanism and enforce a global type system. Certainly, the overall complexity of the system tends to increase significantly as different features are integrated. This issue is further discussed in the next section.
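To make the combination of a simple event structure and a local type space concrete, the following Java sketch shows a dispatcher that handles events as plain sequences of strings and routes them only by their first element, treated as a conventional label in the spirit of MIME. The names (StringEvent, subscribe, publish) and the convention of using the first string as the label are assumptions made for illustration; this is not the JEDI API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: an untyped dispatcher that never inspects event parameters.
public class UntypedDispatch {

    // An event is an immutable sequence of strings; by the convention assumed
    // here, the first string is a label (comparable to a MIME type).
    record StringEvent(List<String> parts) {
        StringEvent {
            if (parts.isEmpty()) throw new IllegalArgumentException("empty event");
            parts = List.copyOf(parts);
        }
        String label() { return parts.get(0); }
    }

    private record Subscription(String label, Consumer<StringEvent> handler) {}

    private final List<Subscription> subscriptions = new ArrayList<>();

    void subscribe(String label, Consumer<StringEvent> handler) {
        subscriptions.add(new Subscription(label, handler));
    }

    // Routes on the label only; returns the number of deliveries performed.
    int publish(StringEvent event) {
        int delivered = 0;
        for (Subscription s : subscriptions) {
            if (s.label().equals(event.label())) {
                s.handler().accept(event);
                delivered++;
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        UntypedDispatch ed = new UntypedDispatch();
        List<Integer> releases = new ArrayList<>();
        // Only the subscriber knows that position 2 holds a release number.
        ed.subscribe("NewRelease", e -> releases.add(Integer.parseInt(e.parts().get(2))));
        ed.publish(new StringEvent(List.of("NewRelease", "UsefulStuff", "4")));
        ed.publish(new StringEvent(List.of("Unrelated", "x")));
        System.out.println(releases); // prints [4]
    }
}
```

Because the dispatcher never interprets the parameters, any producer that follows the labeling convention can participate, whatever language it is written in. The price is that a malformed parameter is only detected by the subscriber (here, by Integer.parseInt).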

4.8 Putting everything together

Several colleagues have argued that event-based infrastructures are not particularly new. These infrastructures have been around for years, and therefore they might be considered consolidated technologies. Indeed, the growing number of commercial systems being introduced on the market seems to support this claim. We argue that this observation is only partially true. Event-based systems have certainly reached a significant level of maturity, which makes them suitable for implementing complex and critical applications such as a trading system for the stock market [57]. Still, our experience in the development of JEDI/OPSS and our analysis of the state of the art in the field (see Section 5) have identified two critical open issues:



As discussed in the previous sections, many facets and features characterize an event-based infrastructure. The critical point is not to support these features individually; rather, it is to identify a reasonable compromise that integrates them in a feasible and viable way. For instance, combining full support for mobility, distributed event dispatching, and a global, object-oriented type system may result in a very complicated and inefficient solution that is not applicable to real, Internet-wide applications. JEDI is an attempt to identify a reasonable compromise among these often conflicting requirements and features.



It is not altogether clear whether it is reasonable to envision a single general-purpose event-based infrastructure. Given the wide variety of features and application domains, one may even argue that it is necessary to create different event-based infrastructures, each offering only the features needed by a specific application domain. JEDI partially supports this vision. For instance, we have purposely limited the level of abstraction of some JEDI features to make it suitable for operating at the Internet scale (e.g., by avoiding a global and powerful type system in favor of scalability and flexibility). In any case, we argue that it is necessary to perform a systematic evaluation of the correlation between the technical features of event-based infrastructures and the characteristics of the application domains.

The next section is an initial attempt to address the two issues above by providing criteria and concepts to compare and analyze existing approaches and systems.

5 RELATED WORK

This section surveys event-based infrastructures and compares them with JEDI. It also compares OPSS with similar “Internet-wide” WFMSs.

5.1 Event-based infrastructures and frameworks

As pointed out in Section 4.8, the idea of exploiting events in software systems is not new and has been adopted in several contexts. For instance, in the area of active databases, events are generated when updates are performed on data. These events may trigger the execution of actions, according to a set of Event-Condition-Action rules (ECA rules). These rules specify the set of events they are triggered by (Event part), the condition that has to be checked upon triggering (Condition part), and the set of operations that are executed if the condition is true (Action part) [21]. In active databases, both the generation of events and the reaction to their occurrence are local to the DBMS. In this respect, they differ significantly from event-based infrastructures devoted to supporting communication among distributed components. For this reason, even issues that appear to be similar are in reality addressed in the two domains with different emphasis and scope. For instance, the issues related to the distribution and scalability of the dispatching mechanism are irrelevant in active databases, while they are of primary importance in event-based infrastructures. Similarly, while approaches to support the analysis and testing of ECA rules have been proposed for DBMSs (see for instance [1]), the same issues have not been considered so far in event-based infrastructures. For the above reasons, we will not discuss active databases further. The remainder of this section focuses on understanding and comparing the characteristics and peculiarities of event-based infrastructures devoted to supporting communication among distributed components.
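For readers less familiar with active databases, the structure of an ECA rule described above can be summarized with a small Java sketch in which the event is a local update, the condition is a predicate over it, and the action is a callback. The names (UpdateEvent, Rule, onUpdate) are hypothetical and do not correspond to the rule language of any specific DBMS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Minimal sketch of Event-Condition-Action rules; no real DBMS API is modeled.
public class EcaRules {

    // An update event raised locally when a value in a table changes.
    record UpdateEvent(String table, String column, double oldValue, double newValue) {}

    // Event part: the table whose updates trigger the rule.
    // Condition part: a predicate checked upon triggering.
    // Action part: the operation executed if the condition holds.
    record Rule(String table, Predicate<UpdateEvent> condition, Consumer<UpdateEvent> action) {}

    private final List<Rule> rules = new ArrayList<>();

    void addRule(Rule r) { rules.add(r); }

    // Called (by a hypothetical DBMS) after each update; returns how many actions fired.
    int onUpdate(UpdateEvent e) {
        int fired = 0;
        for (Rule r : rules) {
            if (r.table().equals(e.table()) && r.condition().test(e)) {
                r.action().accept(e);
                fired++;
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        EcaRules db = new EcaRules();
        List<String> log = new ArrayList<>();
        db.addRule(new Rule("prices",
                e -> e.newValue() > e.oldValue() * 1.25,
                e -> log.add("large increase on " + e.column())));
        db.onUpdate(new UpdateEvent("prices", "IBM", 100, 110));
        db.onUpdate(new UpdateEvent("prices", "IBM", 100, 130));
        System.out.println(log); // prints [large increase on IBM]
    }
}
```

Note that everything happens inside one process: the rule engine and the update source share the same memory, which is exactly why distribution and dispatching scalability do not arise in this setting.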

The first event-based infrastructures were proposed in the 1980s to solve specific problems such as the development of extensible CASE tools [43] and the integration of applications running on mainframes [40]. In recent years, interest in this communication paradigm has exploded due to the dramatic diffusion of distributed, component-based systems. Since we started the development of JEDI, a number of new event service infrastructures have been proposed in both academia and industry. Moreover, there have been several attempts to define the event-based architectural style and to classify event-based infrastructures according to well-defined frameworks. One of these efforts is presented in [9]. That paper focuses on the identification of the main functional components of an event-based middleware and defines a type system that can be further specialized to describe specific event-based middleware systems.

[Figure 10 diagram: Participants (Informers), the Registrar, and the Router with its MTFs and DCs, connected by “register”, “instruct”, and “send” arrows.]

Figure 10: Functional components of an event-based infrastructure.

Figure 10 summarizes the functional components of an event-based infrastructure as identified in [9]. Participants can either send or receive messages that represent the occurrence of some event. Before sending or receiving any message, a participant has to inform the Registrar of its intention to do so. The Router is in charge of delivering the messages. It may contain a number of internal components, the Message Transforming Functions (MTFs) and the Delivery Constraints (DCs). MTFs are in charge of filtering the messages on behalf of some listener, while DCs define constraints on the order in which events are received. A more general framework is proposed in [44]. In this case, an event-based system is described by seven models:

The object model characterizes the components of the system.



The event model focuses on the characteristics of events.



The naming model defines how components refer to the other components and to events for the purpose of subscribing to them.



The observation model focuses on the mechanisms through which the occurrences of events are observed.



The time model is related to the temporal aspects of events.



The notification model focuses on the mechanism to notify consumers of the occurrence of events.



The resource model defines how computational resources are allocated and accounted.
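Returning to the framework of [9], the Router's two internal components can be illustrated as follows: each registered listener has a filter standing in for its MTF, and a delivery constraint enforces per-sender FIFO order using sequence numbers, also discarding duplicates. This is a minimal Java sketch under those assumptions; [9] does not prescribe this particular constraint, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical Router: per-listener filter (MTF) plus a FIFO delivery constraint (DC).
public class Router {

    record Message(String sender, long seq, String body) {}

    interface Listener { void deliver(Message m); }

    // MTF per listener, reduced here to a filter predicate.
    private final Map<Listener, Predicate<Message>> mtfs = new LinkedHashMap<>();
    // DC state: next expected sequence number and held-back messages, per sender.
    private final Map<String, Long> nextSeq = new HashMap<>();
    private final Map<String, SortedMap<Long, Message>> pending = new HashMap<>();

    // Registration of a listener together with its filter.
    void register(Listener l, Predicate<Message> mtf) { mtfs.put(l, mtf); }

    void send(Message m) {
        long expected = nextSeq.getOrDefault(m.sender(), 0L);
        if (m.seq() < expected) return; // already delivered: drop the duplicate
        SortedMap<Long, Message> queue = pending.computeIfAbsent(m.sender(), s -> new TreeMap<>());
        queue.put(m.seq(), m);
        // Release every message that is now in FIFO order for this sender.
        while (!queue.isEmpty() && queue.firstKey() == expected) {
            Message ready = queue.remove(queue.firstKey());
            for (Map.Entry<Listener, Predicate<Message>> e : mtfs.entrySet()) {
                if (e.getValue().test(ready)) e.getKey().deliver(ready);
            }
            expected++;
        }
        nextSeq.put(m.sender(), expected);
    }

    public static void main(String[] args) {
        Router router = new Router();
        List<String> received = new ArrayList<>();
        router.register(m -> received.add(m.body()), m -> !m.body().startsWith("debug"));
        router.send(new Message("A", 1, "second"));        // held back: 0 not seen yet
        router.send(new Message("A", 0, "first"));         // releases 0, then 1
        router.send(new Message("A", 2, "debug: ignored")); // in order but filtered out
        router.send(new Message("A", 0, "first"));         // duplicate, dropped
        System.out.println(received); // prints [first, second]
    }
}
```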

In this section we introduce a classification that can be considered a pragmatic specialization of the two frameworks mentioned above. The objective is to provide the reader with a guide for a practical comparison between our work and other efforts presented in the literature. We classify systems according to their event model, subscription approach, observation and notification models, and, finally, their architecture.

5.1.1 The classification framework

As for the event model, we identify three different classes of event-based systems:

Tuple-based: notifications are defined as an ordered sequence of strings. For example, (UsefulStuff, 4, http://…) is a tuple-based event notification that indicates the availability of Release 4 of product “UsefulStuff”, which can be downloaded from site “http://…”.



Record-based: notifications are defined as sets of typed fields, each characterized by a name and a value. For instance,

    Struct NewRelease {
        string  ProductName    = "UsefulStuff"
        integer ProductRelease = 4
        string  DownloadURL    = "http://…"
    }

is a record-based notification composed of three typed fields. Note that, within the record-based category, different event-based infrastructures could be further classified depending on the richness of the type system they offer.

Object-based: notifications have both a state and a set of methods. For instance, the following Java code defines a class of events called NewSoftwareRelease (method bodies omitted):

    class NewSoftwareRelease extends Event {
        public String ProductName;
        public String ProductRelease;
        private String DownloadURL;
        NewSoftwareRelease(String name, String release, String url) { ... }
        public void DownloadAndInstall() { ... }
    }

In this case, an event is created by invoking the class constructor NewSoftwareRelease:

    NewSoftwareRelease NewProduct = new NewSoftwareRelease("UsefulStuff", "4", "http://…");

Event NewProduct represents a new product being delivered. It provides a method, DownloadAndInstall, that the receiver of the event can invoke to download the product from the address contained in the variable DownloadURL and install it on the local machine.6
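As a rough illustration of the three event models, the following Java sketch encodes the same “UsefulStuff” notification as a tuple, as a record, and as an object with behavior. All names are hypothetical, the URL stays elided as in the examples above (written with ASCII dots), and downloadAndInstall only returns a description instead of performing a download.

```java
import java.util.List;

// Illustrative sketch of the tuple-, record-, and object-based event models.
public class EventModels {

    // Tuple-based: an ordered sequence of strings; meaning is positional.
    static List<String> tupleEvent() {
        return List.of("UsefulStuff", "4", "http://...");
    }

    // Record-based: named, typed fields.
    record NewRelease(String productName, int productRelease, String downloadUrl) {}

    // Object-based: state plus behavior; the URL is reachable only through a method.
    static final class NewSoftwareRelease {
        final String productName;
        final String productRelease;
        private final String downloadUrl;

        NewSoftwareRelease(String name, String release, String url) {
            this.productName = name;
            this.productRelease = release;
            this.downloadUrl = url;
        }

        // Stand-in for DownloadAndInstall: only describes what would happen.
        String downloadAndInstall() {
            return "download " + productName + " " + productRelease + " from " + downloadUrl;
        }
    }

    public static void main(String[] args) {
        List<String> t = tupleEvent();
        NewRelease r = new NewRelease("UsefulStuff", 4, "http://...");
        NewSoftwareRelease o = new NewSoftwareRelease("UsefulStuff", "4", "http://...");
        // A tuple receiver must know that position 1 holds the release number ...
        System.out.println(Integer.parseInt(t.get(1)) == r.productRelease()); // prints true
        // ... while an object-based receiver simply invokes the event's behavior.
        System.out.println(o.downloadAndInstall());
    }
}
```

The trade-off mirrors the discussion in Section 4.6: moving from tuples to objects adds names, types, and behavior, but also ties the receiver to the sender's language and class definitions.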

The subscription approaches can be classified as follows:

6 Notice that DownloadURL has been defined as private because we want to ensure that the user of a NewSoftwareRelease instance downloads the corresponding software only by invoking the method DownloadAndInstall.



Content-free: Subscription is accomplished by specifying a channel. The subscriber receives all the messages that are posted to the channel.



Subject-based: Each event is labeled with a subject. Subscriptions are specified by indicating the subject of interest. Notice that the subject-based approach is a variation of the content-free concept. We introduce this distinction because it reflects a market trend. In practice, both subjects and channels can be used to represent the “key” of the events the subscriber wants to receive. Both approaches enable the exploitation of multicast communication infrastructures and guarantee the high level of performance needed in several critical application domains (e.g., thousands of events per second in stock market applications). The drawback is the limited freedom subscribers have in expressing the event categories they want to receive.



Content-based: Subscriptions are specified as expressions evaluated over the event contents. Within the content-based category, subscription language constructs can be further classified depending on the expressive power they provide to specify predicates:

Disjoint elementary expressions: it is possible to specify the value or the range of values for each event parameter. For instance, in a system supporting a record-based event model, we could issue the following subscription: subscribe(name = “UsefulStuff”, release > 4), where name and release are names of event fields, and “UsefulStuff” and 4 are two constant values.



Compound expressions: it is possible to compare different event parameters. For instance, in a system supporting a tuple-based event model, the following subscription could be issued: subscribe(1st parameter > 4, 2nd parameter < 1st parameter), which requires the first parameter to exceed 4 and the second parameter to be smaller than the first.



Regular expressions: the subscription request is expressed using regular expressions. For instance, in a system supporting a tuple-based event model, the following subscription could be issued: subscribe(“*Staff”, “*”, “*.it”).

Event combination: it is possible to define subscription expressions that require the combined occurrence of more than one event. For instance, in a system supporting a record-based event model, we could issue the following subscription: subscribe(A followed by B where A.share = “IBM” and B.share = “IBM” and B.value = A.value + 25%). This subscription would be issued by a component that wants to be notified when the IBM share value increases by 25%.

Two characteristics of event-based infrastructures that can influence the design of applications are the observation model and the notification model. These models can follow two different communication styles, push or pull. In general, in the push style the provider of data (i.e., the event source in the observation model and the event dispatcher in the notification model) starts the communication with the receiver (i.e., the event dispatcher in the observation model and the event recipient in the notification model). Conversely, in the pull style, the receiver explicitly polls the provider. The adopted communication style has an impact both on the way active objects are implemented and on the performance of the event dispatcher. For instance, if the observation mechanism is pull, a producer of events must offer the event service a “polling service” through which it can be queried. In turn, the event dispatcher must periodically invoke this polling service, which increases its workload.

An important factor that affects the performance and scalability of event-based infrastructures is the internal architecture of the event dispatcher. We classify event-dispatcher architectures according to the following categories:

Direct Connection. No explicit event dispatcher exists. Events are dispatched by the sources to the interested parties that are directly connected to the sources themselves. In other words, the sources act as event dispatchers of the events they want to notify.



Broadcast. This is a special case of direct connection in which sources exploit IP multicast [26] to deliver the events to the destinations.



Centralized. A single event dispatcher performs the dispatching of events.



Distributed. A number of interconnected dispatching servers cooperate to deliver events.



Mixed. The mixed approach exploits broadcast messages to deliver events within a LAN and a centralized or distributed event dispatcher to forward them across different LANs. This way it is possible to take advantage of broadcasting mechanisms while overcoming their limitations in WAN communications.
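Returning to the subscription approaches classified above, content-based matching with disjoint elementary expressions can be sketched in a few lines of Java: a subscription is a conjunction of constraints, each comparing one field of a record-based event (here a Map from field names to values) against a constant. The operator set and the class names are assumptions made for illustration; the example subscription is the one given earlier for this category.

```java
import java.util.List;
import java.util.Map;

// Hypothetical content-based filter built from disjoint elementary expressions.
public class ContentFilter {

    enum Op { EQ, GT, LT }

    // One elementary expression: <field> <op> <constant>.
    record Constraint(String field, Op op, Object constant) {
        boolean test(Map<String, Object> event) {
            Object value = event.get(field);
            if (value == null) return false;                 // missing field: no match
            if (op == Op.EQ) return value.equals(constant);
            if (!(value instanceof Integer) || !(constant instanceof Integer)) return false;
            int v = (Integer) value;
            int c = (Integer) constant;
            return op == Op.GT ? v > c : v < c;
        }
    }

    // A subscription matches when every elementary expression holds.
    static boolean matches(List<Constraint> subscription, Map<String, Object> event) {
        return subscription.stream().allMatch(k -> k.test(event));
    }

    public static void main(String[] args) {
        // subscribe(name = "UsefulStuff", release > 4)
        List<Constraint> sub = List.of(
                new Constraint("name", Op.EQ, "UsefulStuff"),
                new Constraint("release", Op.GT, 4));
        System.out.println(matches(sub, Map.of("name", "UsefulStuff", "release", 5))); // true
        System.out.println(matches(sub, Map.of("name", "UsefulStuff", "release", 4))); // false
        System.out.println(matches(sub, Map.of("name", "OtherStuff", "release", 9)));  // false
    }
}
```

Because each constraint looks at a single field, the check is cheap and easy to index; supporting compound expressions would require constraints that read two fields of the same event, which this sketch deliberately omits.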

As a final comment, notice that, while support for component mobility is a distinctive feature of JEDI, it does not appear in the above classification since none of the other event-based infrastructures supports it.

5.1.2 A comparison of representative event-based infrastructures

Table 1, Table 2, and Table 3 classify representative event-based systems with respect to the characteristics identified in Section 5.1.1. In the remainder of this section we describe all the infrastructures listed in the tables.

    System          Tuple   Record   Object
    JavaBeans                          ✓
    CORBA                     ✓
    C2                                 ✓
    Smartsockets              ✓
    TIB/rend.                 ✓
    ToolTalk                  ✓
    Elvin                     ✓
    Yeast                     ✓
    GEM                       ✓
    Gryphon                   ✓
    JMS                       ✓
    JEDI              ✓

Table 1. Classification of the event models.

[Table 2 grid: columns Content-free, Subject-based, and Content-based (Disjoint elementary expressions, Compound expressions, Regular expressions, Event combination); rows JavaBeans, CORBA, C2, Smartsockets, TIB/rend., ToolTalk, Elvin, Yeast, GEM, Gryphon, JMS, JEDI; the JEDI row includes an entry marked “(Partially)”.]

Table 2. Classification of the subscription approaches.


[Table 3 grid: columns Observation model (Push, Pull), Notification model (Push, Pull), and Event-service architecture (Dir. Conn., Distr., Mixed, Centr.); rows JavaBeans, CORBA, C2, Smartsockets, TIB/rend., ToolTalk, Elvin, Yeast, GEM, Gryphon, JMS, JEDI; one architecture entry reads “(Not specified)”.]

Table 3. Classification of observation and notification models, and of event-service architectures.

JavaBeans are reusable components written in Java [48]. They support a simplified event-based model in which communication is permitted only among components (called beans) running in the same process. Despite this simplification, JavaBeans provide a powerful event model, in which events are instances of the class EventObject. A JavaBeans developer can define new classes of events by specializing this class. The JavaBeans approach does not provide an explicit event dispatcher. Instead, the sources of events are in charge of explicitly notifying each of the JavaBeans that has expressed interest in receiving an event. JavaBeans can subscribe to classes of events: they will receive all the events belonging to that class or to one of its subclasses. However, they cannot express requirements on the specific content of the events they will be notified of. Subscriptions are issued by directly contacting the sources of the events of interest. The observation and notification models follow the push approach and exploit the Java method invocation protocol.
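The JavaBeans style of event handling described above can be illustrated with the standard java.util.EventObject and java.util.EventListener types. ReleaseEvent, ReleaseListener, and ReleaseSource are hypothetical names; the point of the sketch is that the source itself keeps the list of interested listeners and pushes events to them by ordinary method invocation, with no separate dispatcher.

```java
import java.util.ArrayList;
import java.util.EventListener;
import java.util.EventObject;
import java.util.List;

// Sketch of the JavaBeans event pattern (direct connection, push, in-process).
public class BeansEvents {

    // New event classes are defined by specializing EventObject.
    static class ReleaseEvent extends EventObject {
        private final int release;
        ReleaseEvent(Object source, int release) {
            super(source);
            this.release = release;
        }
        int getRelease() { return release; }
    }

    interface ReleaseListener extends EventListener {
        void releaseAnnounced(ReleaseEvent e);
    }

    static class ReleaseSource {
        private final List<ReleaseListener> listeners = new ArrayList<>();

        // Subscription happens by contacting the source directly.
        void addReleaseListener(ReleaseListener l) { listeners.add(l); }
        void removeReleaseListener(ReleaseListener l) { listeners.remove(l); }

        // No dispatcher: the source notifies every registered listener itself.
        void announce(int release) {
            ReleaseEvent e = new ReleaseEvent(this, release);
            for (ReleaseListener l : List.copyOf(listeners)) l.releaseAnnounced(e);
        }
    }

    public static void main(String[] args) {
        ReleaseSource src = new ReleaseSource();
        List<Integer> seen = new ArrayList<>();
        src.addReleaseListener(e -> seen.add(e.getRelease()));
        src.announce(4);
        System.out.println(seen); // prints [4]
    }
}
```

Iterating over a copy of the listener list lets a listener unsubscribe itself while being notified, a common precaution in this pattern.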

OMG has defined a standard for the implementation of an event service on top of the CORBA ORB [37]. In particular, the standard defines the IDL interfaces for the three types of components involved in an event-based interaction: event suppliers, event consumers, and event channels. Event consumers may be directly connected to event suppliers. Alternatively, the distribution of events can be mediated by an event channel that allows multiple suppliers to communicate with multiple consumers asynchronously, thus providing a true event distribution mechanism. A component of the system (either supplier or consumer) may be connected to several event channels. According to our classification, the CORBA event service supports a record-based event model. The channel can either be aware of the structure of events (typed approach) or not (untyped approach). Current implementations of the event service specification mainly support the untyped approach. In the untyped approach, a consumer connected to an event channel receives all the events that suppliers forward to that channel. Thus, the subscription approach is content-free. Conversely, the observation and notification models are quite sophisticated. In fact, CORBA supports both the push and the pull approaches, and they can be combined in different ways, with both push and pull event observation and notification in the same system. The CORBA-compliant event channels currently available on the market mostly have a centralized architecture. Event channels can be pipelined, which constitutes a sort of distributed event dispatching architecture. However, such distribution is not transparent to application developers, who have to manage it explicitly. In order to enhance the capabilities of the event service, OMG is currently working on specifying the interfaces of a notification service [38]. It enriches the event service with a content-based subscription mechanism supporting compound expressions and event combination.

C2 is an event-based architectural style that has been designed to support the development of GUI software [56]. C2 is currently also applied to other classes of applications. The distinctive characteristic of C2 is that it supports the dynamic reconfiguration of an application [39]. In C2, multiple software components can communicate through connectors (called C2 buses) that manage the routing of events. Each C2 bus has two terminations (called bottom and top). Components connected to the bottom of a connector are enabled to generate “requests”.
The connector forwards these requests to the components connected to its top that are able to serve them. In turn, components connected to the top can generate “notifications” that the connector forwards to all the components connected to its bottom. A component can be connected to the bottom of one C2 bus and to the top of another one. This means that components and buses can define a kind of hierarchical architecture. The constraints imposed on this architecture are that components cannot be interconnected directly and that direct or indirect cyclic connections are not allowed. According to our classification, the event model supported by the C2 infrastructure is object-based, as the parameters of events can be Java serialized objects. The subscription approach is content-free for notifications, since all the components connected to the bottom of a bus receive all the notifications generated by the components connected to the top of the same bus. The internal architecture of each bus is centralized, but the architectural style provides guidelines to compose these buses. Finally, the observation and the notification models are push-based, and the protocol used is in both cases RMI.
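The push and pull styles that the CORBA event service lets designers combine can be illustrated with a hypothetical in-memory channel: push consumers are called back as soon as an event arrives, pull consumers poll a private queue, and the channel can either receive events pushed by suppliers or pull them from suppliers. This Java sketch does not use the CORBA IDL interfaces, and all names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical event channel mixing push and pull on both sides.
public class EventChannel<E> {

    private final List<Consumer<E>> pushConsumers = new ArrayList<>();
    private final List<Deque<E>> pullQueues = new ArrayList<>();

    // Push consumer: the channel calls it as soon as an event arrives.
    void connectPushConsumer(Consumer<E> consumer) { pushConsumers.add(consumer); }

    // Pull consumer: receives a private queue that it polls when it decides to.
    PullConsumer<E> connectPullConsumer() {
        Deque<E> queue = new ArrayDeque<>();
        pullQueues.add(queue);
        return new PullConsumer<>(queue);
    }

    // Push supplier: hands an event to the channel.
    void push(E event) {
        pushConsumers.forEach(c -> c.accept(event));
        pullQueues.forEach(q -> q.addLast(event));
    }

    // Pull supplier: the channel polls it and forwards whatever it returns.
    void pullFrom(Supplier<Optional<E>> supplier) {
        supplier.get().ifPresent(this::push);
    }

    static final class PullConsumer<T> {
        private final Deque<T> queue;
        PullConsumer(Deque<T> queue) { this.queue = queue; }
        // Non-blocking poll: empty if no event is waiting.
        Optional<T> tryPull() { return Optional.ofNullable(queue.pollFirst()); }
    }

    public static void main(String[] args) {
        EventChannel<String> channel = new EventChannel<>();
        List<String> pushed = new ArrayList<>();
        channel.connectPushConsumer(pushed::add);
        PullConsumer<String> puller = channel.connectPullConsumer();
        channel.push("e1");                         // a push supplier delivers e1
        channel.pullFrom(() -> Optional.of("e2"));  // the channel pulls e2 from a pull supplier
        System.out.println(pushed);                 // prints [e1, e2]
        System.out.println(puller.tryPull().get()); // prints e1
    }
}
```

The sketch also shows the cost mentioned in Section 5.1.1: with a pull supplier, the channel must actively call pullFrom, so someone has to schedule that polling.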

Smartsockets [55] is a commercial event-based infrastructure developed by Talarian. It provides a rich environment for the development of event-based applications and supports monitoring of the events exchanged among the components of an application. It also provides APIs for synchronous communication and supports fault-tolerant connections. As for the event-based communication mechanism, Smartsockets supports a record-based event model and predefines a set of commonly used event types. Developers can either create events as instances of these types or define their own application-specific types. The subscription approach adopted by Smartsockets is subject-based. Subjects can be organized in hierarchies (e.g., “/stock/computer” is a subject defined within the context of the broader subject “stock”). Subscriptions refer to subjects and can contain wildcards. For instance, “/*/computer” matches all the subjects containing a sub-subject called “computer”. Subjects are orthogonal to event types, in the sense that events of the same type can belong to different subjects and, vice versa, a subject can be associated with events of different types. Therefore, a consumer cannot express requirements on the content of the events it wants to receive. The internal structure of the Smartsockets event dispatcher is distributed. Each dispatching server is aware of all the subscriptions that have been issued at any point in the system and is able to dynamically route events based on the cost of network connections and on their load. While this approach to distribution provides several advantages on a local area network in terms of performance, load sharing, and reliability, its applicability to wide-area network scenarios has to be analyzed in light of the discussion in Section 2.2.3. Smartsockets supports a push observation model and both push and pull approaches for the notification model.
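Subject-based matching of the kind described for Smartsockets can be sketched as a comparison of '/'-separated subject paths. The Java sketch below assumes that '*' matches exactly one level of the hierarchy, which is enough for the “/*/computer” example; the actual wildcard rules of Smartsockets may differ. Note that, as the text points out, the matcher never looks at the event content.

```java
// Hypothetical hierarchical subject matcher; '*' is assumed to match one level.
public class SubjectMatcher {

    static boolean matches(String pattern, String subject) {
        // The -1 limit keeps empty leading/trailing segments, so "/a" and "a" differ.
        String[] p = pattern.split("/", -1);
        String[] s = subject.split("/", -1);
        if (p.length != s.length) return false;
        for (int i = 0; i < p.length; i++) {
            if (!p[i].equals("*") && !p[i].equals(s[i])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("/*/computer", "/stock/computer"));     // true
        System.out.println(matches("/*/computer", "/news/computer"));      // true
        System.out.println(matches("/*/computer", "/stock/food"));         // false
        System.out.println(matches("/*/computer", "/stock/computer/ibm")); // false
    }
}
```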
TIB/Rendezvous is a commercial infrastructure developed by TIBCO for creating and maintaining large, distributed, event-based applications [57]. Over the past years it has been used to integrate financial and banking applications (especially trading services for financial markets). It offers several interesting features, including reliable and scalable distribution of events. It exploits a three-level hierarchical event dispatcher: each node running one or more agents must also run a TIB/Rendezvous daemon, which is in charge of filtering events for the agents running on that node. TIB/Rendezvous daemons communicate among themselves by means of broadcast messages. The delivery of events among nodes that do not belong to the same subnet is achieved through two kinds of “routing daemons”: a subnet routing daemon and a wide-area routing daemon. The combination of TIB/Rendezvous daemons, subnet routing daemons, and wide-area routing daemons defines the three-level hierarchical structure mentioned above. TIB/Rendezvous events are composed of a set of typed data fields. The subscription approach is similar to the one adopted by Smartsockets: each event has an associated subject, which plays the role of a special field. Agents may subscribe to one or more subjects. Observation and notification models follow the push approach.

ToolTalk [47] is a product derived from FIELD [43]. It was originally conceived to support tool integration in software engineering environments. In ToolTalk, events are composed of a name and a set of attributes. Each attribute can be of type integer, string, or byte, and is associated with a textual comment that describes its semantics to application developers. For the purpose of our classification, we can consider these events as record-based. The components of a ToolTalk system subscribe to events either statically, at installation time, or dynamically, during their execution. If there is no dynamic subscription for a specific event, ToolTalk exploits static subscriptions to start a component able to receive and handle the event. Also, subscriptions can have two different scopes: session and file. A session is defined as the set of all the tools served by the same ToolTalk server. If a component performs a subscription with a session scope, it can receive the messages that are sent to that session and that match the subscription. When a component performs a subscription with a file scope, it receives all the messages that match the subscription and that refer to that file, independently of the session in which they are generated. According to our classification, the ToolTalk subscription approach is based on disjoint elementary expressions. As mentioned above, multiple ToolTalk servers can be instantiated. They interact when events associated with files have to be delivered. Therefore, the event service architecture is distributed. The observation and notification models are both push. ToolTalk also supports events with return values, as described in Section 4.1.

Elvin [46] is an event-based infrastructure developed at the University of Queensland (Australia). The focus of this project is on defining a rich event subscription language and on achieving scalability by supporting the federation of event dispatchers.
The event model supported by Elvin is record-based. The types supported for fields are string, integer, float, and date. The subscription mechanism currently supported exploits compound expressions: the standard comparison operators can be used to compare a field with other fields of the same event or with constant values. The observation and notification models support the push approach. Distribution of the ED is ongoing work on the latest version of Elvin (Elvin4). In [4] the authors distinguish between local-area and wide-area federation of EDs. The first takes place within the boundaries of a single organization or business unit and aims at providing reliability (if an ED fails, the others take its place) and scalability with respect to the number of users (some form of load balancing allows new users to be connected to the least loaded ED). The second type of federation is closer to the distributed approach we propose in JEDI and focuses on issues such as minimizing the coordination messages among dispatchers and ordering messages. An interesting characteristic of Elvin that, to our knowledge, is not shared by other event-based infrastructures is called quenching. It is the ability of components attached to the Elvin dispatcher to learn (from the dispatcher itself) whether some other components have issued a subscription compatible with a given quench expression. Using this feature, a component can autonomously decide not to generate events in which no one is interested. At first glance, this approach seems to optimize network traffic. However, a deeper evaluation is needed when we consider applications in which components can subscribe to events at any time during their execution. In this case, the generator of an event cannot issue a quenching request once and for all, but has to renew it every time it generates the event. When a distributed event dispatcher is considered, the management of such quenching requests may be quite expensive in terms of network load.

Yeast is an event-action system [29]. It offers a powerful mechanism for specifying and detecting the combined occurrence of events. The main component of Yeast is a centralized server that observes event sequences and reacts to their occurrence according to some action specification. In this respect, it differs from the other event-based infrastructures discussed so far. Events in Yeast can be either operating system events (e.g., file changes) or messages produced by the components of the system. According to our classification, the structure of events is record-based. Users of Yeast control its execution by defining and posting event-action specifications. These specifications define an event pattern and the actions that have to be performed when the event pattern is matched. Actions include any command that can be executed by the Unix shell (e.g., sending an e-mail message or creating a new file). An event pattern can be composed of a number of event descriptors combined using logical and temporal operators. For instance, “file foo mtime changed then in 10 minutes” is an event pattern composed of two event descriptors. The first one is of type file, is called foo, and has an attribute called mtime. The second one is a temporal event descriptor.
The event pattern is matched if file foo has changed and ten minutes have elapsed since then. Event patterns represent the Yeast subscription mechanism; according to our classification, they support event combination. As for event observation, Yeast supports a mixed approach: operating system events (which the authors call predefined events) can be observed through either a pull or a push approach, while messages produced by components (called user-defined events) are observed only through a push approach. Yeast does not have an explicit notification model, since it is not meant to automatically notify components of the occurrence of events. However, since its action language is the Unix shell, the programmer can notify components by exploiting the standard e-mail mechanism. Yeast and JEDI (or any other event-based infrastructure) are quite different and complementary. Yeast does not offer any event dispatching functionality, but provides sophisticated mechanisms for defining and observing event sequences and for reacting to their occurrence (see Section 4.2). Thus, it could be implemented as a component on top of many event-based infrastructures.

GEM (Generalized Event Monitor) [35] has been developed to support network management. The GEM architecture is composed of three types of nodes: event generators, event monitors, and event disseminators. Event generators emit event notifications. Event monitors process the event notifications they receive from the other nodes of the network: they filter and compose incoming notifications and emit the resulting notifications, interpreting filtering and composition rules defined in a dedicated language. Event disseminators forward event notifications to the clients that subscribe to them. In GEM the focus is on the definition of the language for event filtering and composition, which can be considered similar to the one implemented in Yeast. The main difference is that GEM manages the temporal aspects of evaluating filtering and composition rules under the assumption that event monitors are distributed. This requires the definition of mechanisms that guarantee the existence of a global clock. The event model provided by GEM is record-based. The observation model can be either push or pull. The notification model is push. The event service architecture is distributed. From the available documentation we could not determine whether the components that use the system (the monitored objects and the final users of the event notifications) need to be aware of this distribution. The subscription approach is not explicitly described in [35], although the language used for defining filtering and composition rules could potentially support an event combination approach to subscription.

Gryphon [7] is a research project by IBM that currently focuses on defining efficient algorithms to match events against content-based subscriptions. The system supports a record-based event model and a compound expression-based subscription approach. The architecture of the event dispatcher is distributed.
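A naive version of content-based routing can make the idea concrete. In the Python sketch below, an event is a record (a mapping from attribute names to values), a subscription is a conjunction of attribute tests tagged with the neighbor that issued it, and the dispatcher returns the set of neighbors whose subscriptions the event satisfies. This linear scan is only a baseline: it is not the efficient matching algorithm Gryphon investigates, and the names `attr`, `Subscription`, and `route` are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set

Event = Dict[str, Any]               # record-based event: attribute -> value
Predicate = Callable[[Event], bool]  # elementary test on one attribute

_OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
        "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def attr(name: str, op: str, value: Any) -> Predicate:
    """Build a test such as attr("price", "<", 100)."""
    compare = _OPS[op]
    # An event lacking the attribute never satisfies the test.
    return lambda event: name in event and compare(event[name], value)


@dataclass
class Subscription:
    neighbor: str                # next hop: another dispatcher or a local agent
    predicates: List[Predicate]  # compound expression, read as a conjunction

    def matches(self, event: Event) -> bool:
        return all(test(event) for test in self.predicates)


def route(event: Event, subscriptions: List[Subscription]) -> Set[str]:
    """Return the neighbors that must receive the event (linear scan)."""
    return {s.neighbor for s in subscriptions if s.matches(event)}


subscriptions = [
    Subscription("dispatcher-B", [attr("type", "=", "quote"), attr("price", "<", 100)]),
    Subscription("agent-1", [attr("type", "=", "quote")]),
    Subscription("agent-2", [attr("type", "=", "news")]),
]
print(sorted(route({"type": "quote", "price": 80}, subscriptions)))
```

The final line prints `['agent-1', 'dispatcher-B']`. Note that the sketch simply assumes each dispatcher already holds the subscriptions of its neighbors, which is precisely the assumption Gryphon makes about subscription distribution.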
When receiving an event, a dispatcher executes a matching algorithm that, based on subscriptions, determines the set of neighbors (either other dispatchers or application agents) that need to receive the event. As we mentioned before, the main focus of the project, at the moment, is on defining an efficient algorithm for performing such matching and on limiting the network traffic due to event delivery. No attention is currently paid to how subscriptions are distributed to all event dispatchers: event dispatchers are assumed to be somehow informed of the subscriptions issued by all the connected active objects. As we discussed in Section 2.2.3, and as other researchers have pointed out [4], the issue of subscription distribution cannot be disregarded, since it can dramatically increase the network traffic in a wide area network.

The Java Message Service (JMS) [49] is an API developed by Sun Microsystems. It aims to be the standard, common interface for Java messaging products. Sun does not provide any implementation of this interface, and assumes that other tool vendors will adopt it. According to the specification, messaging products can be broadly classified as either point-to-point or publish-subscribe systems. Point-to-point (PTP) products are built around the concept of message queues: each message is addressed to a specific queue, and clients extract messages from the queue(s) established to hold their messages. Publish-subscribe systems are what we defined as event-based infrastructures. Consistently with this classification, the JMS interfaces can be split into two subsets: one tailored for point-to-point messaging products, the other tailored for publish-subscribe systems. To be “JMS compliant”, a tool vendor has to adopt either of the two sets of interfaces. In this paper we discuss only the interfaces related to the publish-subscribe paradigm. JMS messages are composed of a set of standard headers (each characterized by a name and a value), a set of properties, which can be user-defined or vendor-specific, and a body, which may include any stream of data. JMS messages are addressed to a topic. Topics have a name and are organized in a hierarchy. JMS clients subscribe to messages addressed to a given topic by specifying a message selector. Any string that conforms to a subset of the standard SQL92 conditional expression syntax can be used as a selector. A selector can reference message headers and properties, but not message bodies; only messages whose headers and properties match the selector are delivered to the subscribers. Messages can be received both synchronously and asynchronously. Observe that, being a pure API, JMS does not specify how the dispatcher is implemented, i.e., as a centralized server or through a set of collaborating, distributed components.

5.1.3 Other related approaches

In this section we present some systems that we do not consider true event-based infrastructures. Nevertheless, we believe that they offer specific functionality relevant to the discussion presented in this section.

Multicast RPC [10, 58, 59] (also known as group RPC) allows a client to invoke a service on a group of servers that export the same interface. The servers “register” to a class of messages (service requests) by joining a group and by exporting the common interface defined for the group. This is quite different from the approach taken by JEDI, in which event consumers use a more powerful declarative approach to “register” to a class of messages and do not need to export any common interface. Moreover, multicast RPC is a synchronous communication mechanism in which an answer is required, while JEDI implements an asynchronous communication mechanism without answer. From this viewpoint, multicast RPC is complementary to the JEDI approach, and could be similar to the synchronous mechanism we advocated in Section 4.

Linda [12] is the precursor of a generation of languages aiming at describing and supporting cooperative computations. The basic idea is that different autonomous computations can cooperate by reading and writing information through a shared repository (or space) of information tuples. Each Linda program can read a tuple from the repository on the basis of its contents, using a pattern matching mechanism. A read operation does not remove the tuple from the repository; Linda also offers a consume operation that reads the tuple and removes it from the repository. There are several differences between Linda and JEDI. First, JEDI makes it possible to “declare”, through the subscribe operation, the class of events in which an application is interested. As a consequence, the application receives all the events that conform to the subscription without requesting them further. In other words, the ED distributes events to the application as they are produced, asynchronously with respect to the main control flow of the application. Conversely, in Linda each read/consume operation is independent of the others and is executed synchronously by the Linda program. Second, JEDI (as any other true event-based approach) guarantees that all the parties that have declared their interest in an event will eventually receive it. This is enforced by the JEDI run-time support based on subscription requests. In Linda the only way to achieve a similar effect is to work at the application level. For instance, before removing a tuple, a Linda program might check some global information to be sure that all the other interested parties have already read it. Alternatively, each event producer can write multiple copies of a tuple, one for each interested party, which means that the producer must know the number of interested parties. In both cases, the correctness of the event distribution semantics is left to the programmer. JavaSpaces by Sun [52] and TSpaces by IBM [33] both follow the Linda approach to support distributed cooperation.
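The difference between reading and consuming a tuple, together with the copy-per-recipient workaround, can be seen in a toy, single-process tuple space written in Python. It is a sketch under simplifying assumptions, not Linda, JavaSpaces, or TSpaces: operations never block (loosely modelled on the predicate forms `rdp` and `inp` offered by some Linda implementations), and a `None` field in a template acts as a wildcard.

```python
from typing import Any, List, Optional, Tuple

Record = Tuple[Any, ...]


class TupleSpace:
    """Toy shared repository; a None field in a template matches any value."""

    def __init__(self) -> None:
        self._tuples: List[Record] = []

    def out(self, tup: Record) -> None:
        """Write a tuple into the space."""
        self._tuples.append(tup)

    @staticmethod
    def _matches(template: Record, tup: Record) -> bool:
        return len(template) == len(tup) and all(
            field is None or field == value for field, value in zip(template, tup))

    def rdp(self, template: Record) -> Optional[Record]:
        """Read a matching tuple, leaving it in the space (None if absent)."""
        return next((t for t in self._tuples if self._matches(template, t)), None)

    def inp(self, template: Record) -> Optional[Record]:
        """Consume a matching tuple, removing it from the space (None if absent)."""
        for i, t in enumerate(self._tuples):
            if self._matches(template, t):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
space.out(("event", "doc-42", "approved"))
first = space.inp(("event", None, None))   # consumer A takes the event
second = space.inp(("event", None, None))  # consumer B finds nothing
print(first, second)

# Workaround: the producer writes one copy per interested party,
# so it must know who (and how many) the recipients are.
for recipient in ("A", "B"):
    space.out(("event", recipient, "doc-42", "approved"))
print(space.inp(("event", "B", None, None)))
```

The first `print` shows that consumer B receives `None` once A has consumed the tuple; the workaround succeeds only because the producer enumerated the recipients in advance.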
In both JavaSpaces and TSpaces, the Linda paradigm is enhanced with a subscription mechanism that allows agents to be notified when new tuples matching their subscriptions are inserted into the shared repository (TSpaces also supports subscriptions to other kinds of events, such as the removal of a tuple from the shared repository). Still, the style of interaction supported by such systems differs from the one supported by event-based infrastructures since, as we discussed above, they do not really support multicast communication of tuple contents. This has to be managed at the application level by ensuring that components do not remove a tuple from the shared repository before all the interested parties have read it.

Event-based systems can be considered an evolution of a well-established class of products often called MOMs (Message-Oriented Middleware) [40].⁷ In MOMs, messages are sent to explicit queues, which guarantee location transparency. Depending on the specific MOM, messages can be tuples or records. In several MOMs, there can be multiple consumers for the same message queue. This approach makes MOMs similar to Linda. As a consequence, we argue that MOMs exhibit the same problem as Linda. In fact, even if a MOM made it possible to just “read” a message from the queue without removing it, this would be a decision left to the consumer: the platform cannot guarantee the delivery of the event to all the interested parties (and only to them). Notice that some new-generation MOMs (see, for instance, Oracle AQ) combine the persistency properties of the MOM approach with the ability to multicast messages that is typical of event-based infrastructures.

⁷ Some authors consider MOMs as including both message queues and publish/subscribe (i.e., event-based) infrastructures. In this paper we refer to MOMs as including only message queues, as in [40].

5.1.4 A word of wisdom

The list of systems we have presented is not meant to be exhaustive; rather, it is meant to provide an overview of the issues addressed by existing approaches. New systems are released every few months: for instance, Sun, HP, Toshiba, Oracle, and Microsoft have recently delivered their event-based infrastructures. Compared with the other approaches, JEDI can be considered an interesting representative of the event-based infrastructure category. Even if some other systems provide more powerful event models and subscription approaches, JEDI is interesting for its unique ability to deal with agent mobility and for its attempt to support Internet-wide applications. As we have discussed in the previous sections, many of the systems we considered have a distributed architecture. However, the approaches they adopt to distribute subscriptions have been designed to support fault tolerance and load balancing among pools of dispatchers located on the same LAN, while their applicability to an Internet-wide setting is not discussed. As an exception, the researchers of the Elvin project explicitly consider and discuss this problem, but they do not present any concrete solution. JEDI provides an initial proposal. We are aware that this proposal needs to be improved and must take into account critical issues such as fault tolerance and security. Still, we feel that it contributes to the identification and analysis of the problem.

5.2 WFMSs

There are a number of WFMSs that, like OPSS, use event-based communication. These environments differ in how pervasively event-based communication is used in their architecture. SPADE [6, 8] is the first WFMS developed in our group. It uses the event-based communication mechanisms provided by ToolTalk (see Section 5.1.2) and DEC FUSE [19] (another FIELD spin-off) to support communication between the engine executing the process and the external tools. In SPADE events are used in a rather limited way: they are exploited only to support the interaction between the centralized process engine and the tools, and not to support other interactions in the environment, such as those occurring between the process engine and the process state repository. Nevertheless, event-based communication proved to be a valuable and non-intrusive mechanism for controlling external tools in a flexible way [5].

ProcessWall and APEL introduce the idea of exploiting a state server to store the relevant information on the process. ProcessWall [25] is a server that provides storage for process state and operations for defining and manipulating the structure of the state. The applications that actually execute the process operate as clients of this server: they execute the process activities and invoke the ProcessWall operations to modify the state of the process according to the result of their processing. An event dispatching system is used to notify the interested clients of changes in the state of the process. Unlike in OPSS, communication in the opposite direction, from clients to the ProcessWall server, is point-to-point. This limits the possibilities of reconfiguring the system: for instance, it is not easy to replicate or distribute the ProcessWall state server without affecting its clients. APEL [16] is an environment developed at IMAG (Grenoble, France). It exhibits several interesting features, such as a high-level graphical interface that incorporates different paradigms for process modeling (activity-based and document-based). The aspect that is particularly relevant in the context of this paper is the APEL architecture and its underlying technology. APEL is centered on an event server and a state server that jointly offer a service similar to ProcessWall. The event server distributes events to the other components of the architecture, while requests from these components to the process server are carried out by exploiting CORBA facilities. A similar architectural approach is also implemented in PEACE+ [32].

Two other interesting environments are Serendipity [24] and PROSYT [14]. Both of them allow the execution of the process to be distributed over a wide area network and use the event-based approach to support communication among distributed process engines.
Unlike OPSS, neither Serendipity nor PROSYT stores the state of the process separately from the process engines. As a result, when process engines connect (or reconnect) to the system, they must explicitly synchronize with all the other process engines, and managing this kind of synchronization can become heavy and cumbersome as the number of engines grows. Serendipity is more sophisticated than OPSS as far as the mechanisms for defining distributed process models are concerned. It also supports temporary disconnection of process engines; even if this is in principle also possible in OPSS, we have not implemented this feature yet. As for event-based communication, Serendipity implements it in an ad-hoc way: system elements are connected by point-to-point communication channels, and all the features concerning event publication and delivery are implemented as part of Serendipity itself.
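The synchronization argument can be illustrated with a small Python sketch in which the process state lives in a separate server that announces every change through an event dispatcher, in the spirit of the ProcessWall and APEL state servers. A newly connected engine then needs one subscription and one snapshot read, regardless of how many other engines are running, instead of synchronizing with each peer. The `Dispatcher`, `StateServer`, and `Engine` classes and the single `state-changed` event type are hypothetical; this is not OPSS, ProcessWall, or JEDI code, and it ignores distribution, failures, and event ordering.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[str, Any], None]


class Dispatcher:
    """Minimal in-process publish/subscribe dispatcher (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, key: str, value: Any) -> None:
        for handler in self._handlers[event_type]:
            handler(key, value)


class StateServer:
    """Stores process state apart from the engines and announces each change."""

    def __init__(self, dispatcher: Dispatcher) -> None:
        self._state: Dict[str, Any] = {}
        self._dispatcher = dispatcher

    def snapshot(self) -> Dict[str, Any]:
        return dict(self._state)

    def update(self, key: str, value: Any) -> None:
        # Clients call the server directly (point-to-point, as in ProcessWall);
        # only the change notifications travel as events.
        self._state[key] = value
        self._dispatcher.publish("state-changed", key, value)


class Engine:
    """A process engine that may join late and keeps a local view of the state."""

    def __init__(self, server: StateServer, dispatcher: Dispatcher) -> None:
        self.view: Dict[str, Any] = {}
        # Subscribe before reading the snapshot so that no change is missed.
        dispatcher.subscribe("state-changed", self._on_change)
        self.view.update(server.snapshot())

    def _on_change(self, key: str, value: Any) -> None:
        self.view[key] = value


dispatcher = Dispatcher()
server = StateServer(dispatcher)
server.update("review", "done")
late = Engine(server, dispatcher)  # joins after "review" has completed
server.update("release", "started")
print(late.view)
```

The late engine ends with `{'review': 'done', 'release': 'started'}` without contacting any other engine. Routing the `update` requests through the dispatcher as well would remove the clients' direct dependency on the server, which is the point on which the comparison above sets OPSS apart from ProcessWall.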

Like OPSS, PROSYT exploits the JEDI framework to distribute process model enactment. PROSYT has been implemented at Politecnico di Milano in parallel with OPSS and has helped to highlight the pros and cons of the event-based communication paradigm described in the previous sections. Unlike OPSS, PROSYT focuses on providing the mechanisms that allow humans to deviate from the modeled process and on keeping track of such deviations. Using JEDI as the underlying infrastructure allows PROSYT to better monitor the actions performed by users on tools, thus limiting the occurrence of deviating actions outside the control of the system.

Interest in exploiting events in WFMSs is also growing in the industrial context. Companies such as Netscape and Oracle are actively participating in the IETF SWAP (Simple Workflow Access Protocol) working group, which aims at defining a protocol for supporting communication between process engines in an Internet-wide environment [54]. The protocol is based on HTTP and defines two main roles for components: the process instance and the observer. A process instance is any process fragment that is being executed; it allows external components to start, stop, resume, and terminate its execution. Observers control the execution of process instances and are notified of their termination. The protocol assumes that such notifications can be delivered using a general-purpose event-based infrastructure whose specification is considered outside the scope of SWAP. Besides the obvious technological differences, the communication protocol defined in OPSS is more complex than the one proposed by SWAP: in OPSS, events are not limited to notifying the termination of an activity, but are generated in several other situations that can be defined by process modelers. Endevours and OzWeb are two interesting web-based WFMSs.
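The two SWAP roles can be sketched as a small state machine in Python: a process instance accepts the start, stop, resume, and terminate operations and, on termination, hands a notification to a general-purpose event service for its observers. This is neither SWAP's HTTP encoding nor OPSS code; the state names, the reading of “stop” as a suspension that “resume” undoes, and the `publish` callback standing in for the event infrastructure are assumptions of the sketch.

```python
from enum import Enum
from typing import Callable, Dict, FrozenSet, Tuple


class State(Enum):
    NOT_STARTED = "not started"
    RUNNING = "running"
    SUSPENDED = "suspended"
    TERMINATED = "terminated"


# operation -> (states in which it is allowed, resulting state)
TRANSITIONS: Dict[str, Tuple[FrozenSet[State], State]] = {
    "start": (frozenset({State.NOT_STARTED}), State.RUNNING),
    "stop": (frozenset({State.RUNNING}), State.SUSPENDED),
    "resume": (frozenset({State.SUSPENDED}), State.RUNNING),
    "terminate": (frozenset({State.NOT_STARTED, State.RUNNING, State.SUSPENDED}),
                  State.TERMINATED),
}


class ProcessInstance:
    def __init__(self, name: str, publish: Callable[[str, str], None]) -> None:
        self.name = name
        self.state = State.NOT_STARTED
        self._publish = publish  # stand-in for the external event service

    def invoke(self, operation: str) -> None:
        allowed, target = TRANSITIONS[operation]
        if self.state not in allowed:
            raise ValueError(f"cannot {operation} an instance that is {self.state.value}")
        self.state = target
        # SWAP-style: observers are only told about termination.
        if target is State.TERMINATED:
            self._publish("terminated", self.name)


notifications = []
instance = ProcessInstance("order-17", lambda kind, name: notifications.append((kind, name)))
for op in ("start", "stop", "resume", "terminate"):
    instance.invoke(op)
print(instance.state.value, notifications)
```

Here the observer receives a single `("terminated", "order-17")` notification. Publishing on every transition, or on conditions defined by process modelers, would move the sketch closer to the richer event protocol that the comparison above attributes to OPSS.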
Endevours and OzWeb share with OPSS the possibility of distributing process execution, even though they are not based on the event-based communication paradigm. Endevours [11, 28] supports distribution of process execution, lightweight installation and reconfiguration, and easy integration of process fragment interpreters with tools and hyperwebs of artifacts. Its architecture is composed of three main levels: the user level, which manages the interaction with users; the system level, which defines the main process abstractions (e.g., activities, artifacts, …); and the foundation level, which manages object persistency and distribution. The foundation level may interact with a number of HTTP servers (through the HTTP protocol) to operate on distributed process artifacts. Different Endevours installations can interact with the same HTTP server, which exploits a locking policy to prevent installations from accessing an artifact while it is in an inconsistent state. Both Endevours and OPSS provide a decentralized execution of processes, i.e., they exploit multiple process engines. The main difference is that Endevours does not rely on the event-based approach to coordinate the interaction of different engines (interpreters, in the Endevours terminology): they interact by sharing the artifacts and information stored in a passive repository controlled by HTTP servers.

In OzWeb [27], workflow support is introduced in the context of a subweb. A subweb is a collection of hyperlinks to web documents. Each hyperlink is associated with information such as the content type and the access mode of the corresponding document. Users access the subweb documents using standard web browsers configured to use a subweb proxy as a mediator for all their communications. The proxy forwards the requests concerning the subweb to a subweb server, which acts as a workflow service and checks whether the operation corresponding to each request can be performed based on the current state of the process. The execution of an operation in the subweb server can also trigger the automatic execution of other process fragments. The interesting aspect of OzWeb is its capability of enhancing the behavior of web-based systems while maintaining their simplicity and worldwide accessibility.

6 CONCLUSION

In this paper we have illustrated the experiences and lessons learned from the development of JEDI, an event-based infrastructure for the development of complex distributed systems. JEDI exploits the notion of event and adopts standard Internet technologies to provide the software developer with a programming framework where multiple active objects cooperate by generating and consuming events. JEDI offers a simple set of mechanisms to create mobile active objects that interoperate by exchanging events on an Internet scale. The entire architecture is based on very simple and orthogonal concepts. Events are asynchronously distributed to subscribers, and all the operations related to event subscription and notification are managed in a highly dynamic and flexible way. JEDI has been used to implement the OPSS Process Support System, a significant example of a distributed system whose development has greatly benefited from the availability of an event-based infrastructure. By exploiting JEDI features, OPSS can offer extremely flexible and dynamically changeable support for workflow management.

The main lessons we have learned from the work described in this paper indicate that the event-based approach nicely complements traditional RPC and conventional point-to-point communication techniques, and that it is suited to situations in which distributed components need to interact asynchronously while preserving anonymity. These advantages are also demonstrated by the growing interest of both academia and industry in this technology. Nevertheless, a number of technological issues concerning event-based architectures remain to be explored. In this respect, we argue that the most critical issue is the identification of appropriate design and implementation strategies that make it possible to integrate different (and sometimes conflicting) features, such as Internet-wide scalability, an enhanced event model (e.g., object-oriented), synchronous and asynchronous event handling mechanisms, and event filtering. Moreover, we still lack effective methodological guidelines to guide and support the design of event-based systems.

With respect to these issues, we are currently addressing several aspects that we consider critical impediments to the effective exploitation of the event-based architectural style. In particular, we are introducing extensions to the existing JEDI event model and operations. The main purpose of these extensions is to support return values. We are also enriching the structure of events in order to provide a more flexible way to specify the information associated with an event. The impact of these extensions is particularly critical, since they have to be combined with other existing features of JEDI: mobility and distribution of the event dispatcher. We have identified different strategies to implement these extensions and are currently evaluating their implementation cost and performance. To achieve this goal, we plan to reuse and further extend the work carried out at the University of Colorado at Boulder on the evaluation of different architectures for the event dispatcher [45].

ACKNOWLEDGEMENTS

The authors wish to thank Antonio Carzaniga, Carlo Ghezzi, Dennis Heimbigner, David Rosenblum, and Alex Wolf for their important contribution to the work described in this paper. They also wish to thank S. Beretta, C. Colombo, F. Coda, S. Montaruli, S. Sargenti, E. Tracanella, and F. Vadalà, who provided essential support in the development and implementation of JEDI and OPSS. OPSS development has been funded by Telecom Italia under a contract managed by Armando Limongiello.
The views and the conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of Telecom Italia. Elisabetta Di Nitto and Alfonso Fuggetta have been partially supported by the University of California, Irvine. Alfonso Fuggetta has also been supported by CNR.

REFERENCES

1. A. Aiken, J.M. Hellerstein, and J. Widom, “Static Analysis Techniques for Predicting the Behavior of Active Database Rules”, ACM Transactions on Database Systems (ACM TODS), vol. 20, no. 1, 1995, pp. 3-41.

2. K. Alho, C. Lassenius, and R. Sulonen, “Process Enactment Support in a Distributed Environment”, WET ICE ’95, IEEE Fourth Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises, Berkeley Springs, West Virginia, April 20-22, 1995.

3. V. Ambriola, R. Conradi, and A. Fuggetta, “Assessing Process-Centered Environments”, ACM Transactions on Software Engineering and Methodology, vol. 6, no. 3, July 1997.

4. D. Arnold, B. Segall, J. Boot, A. Bond, M. Lloyd, and S. Kaplan, “Discourse with Disposable Computers: How and why you will talk to your tomatoes”, Usenix Workshop on Embedded Systems (ES99), Cambridge, Massachusetts, March 1999.

5. S. Bandinelli, E. Di Nitto, and A. Fuggetta, “Supporting cooperation in the SPADE-1 environment”, IEEE Transactions on Software Engineering, vol. 22, no. 12, December 1996.

6. S. Bandinelli, A. Fuggetta, and C. Ghezzi, “Process Model Evolution in the SPADE Environment”, IEEE Transactions on Software Engineering, IEEE Computer Society, December 1993.

7. G. Banavar, T. Chandra, B. Mukherjee, J. Nagarajarao, R.E. Strom, and D.C. Sturman, “An Efficient Multicast Protocol for Content-Based Publish-Subscribe Systems”, Proceedings of ICDCS ’99, International Conference on Distributed Computing Systems.

8. S. Bandinelli, A. Fuggetta, C. Ghezzi, and L. Lavazza, “SPADE: an environment for Software Process Analysis, Design, and Enactment”, in A. Finkelstein, J. Kramer, and B. Nuseibeh, editors, Software Process Modelling and Technology, Research Studies Press Limited (J. Wiley), 1994.

9. D. J. Barrett, L. A. Clarke, P. L. Tarr, and A. E. Wise, “A Framework for Event-Based Software Integration”, ACM Transactions on Software Engineering and Methodology, vol. 5, no. 4, October 1996.

10. K. P. Birman and T. A. Joseph, “Reliable Communication in the Presence of Failures”, ACM Transactions on Computer Systems, 5(1), February 1987.

11. G.A. Bolcer and R.N. Taylor, “Endevours: A Process System Integration Infrastructure”, IRUS Conference on Software Process Improvement, Practice and Experience, January 24, 1997, Irvine, CA.

12. N. Carriero and D. Gelernter, “Linda in Context”, Communications of the ACM, 32(4), April 1989.

13. S. Ceri, E. Di Nitto, A. Discenza, F. Fuggetta, and G. Valetto, “DERPA: A generic distributed event-based reactive processing architecture”, CEFRIEL Technical Report, 1998.

14. G. Cugola, “Tolerating Deviations in Process Support Systems Via Flexible Enactment of Process Models”, IEEE Transactions on Software Engineering, special issue on Managing Inconsistency in Software Development, vol. 24, no. 11, November 1998.

15. G. Cugola, E. Di Nitto, and A. Fuggetta, “Exploiting an event-based infrastructure to develop complex distributed systems”, Proceedings of the 20th International Conference on Software Engineering (ICSE 98), Kyoto, Japan, April 1998.

16. S. Dami, J. Estublier, and M. Amiour, “APEL: a graphical yet executable formalism for process modeling”, Automated Software Engineering Journal, special issue on Process Technology, vol. 5, no. 1, January 1998.

17. M. Decina, E. Di Nitto, A. Fuggetta, V. Trecordi, and J. Wojtowicz, “ORCHESTRA: a retailing infrastructure for network-wide services”, CEFRIEL internal report, submitted for publication.

18. Paula S. deWitte and Chris Pourteau, “IDEF enterprise engineering methodologies support simulation”, Magazine Manufacturing Systems: Information Technology for Manufacturing Managers, March 1997, pp. 70-75.

19. Digital Equipment Corporation, “DEC FUSE Handbook - Version 1.1”, Maynard, Massachusetts, December 1991.

20. Guy Eddon and Henry Eddon, “Inside Distributed COM”, Redmond, WA, Microsoft Press, 1998.

21. P. Fraternali and L. Tanca, “A structured approach for the definition of the semantics of the active databases”, ACM Transactions on Database Systems, 1995.

22. A. Fuggetta, G.P. Picco, and G. Vigna, “Understanding code mobility”, IEEE Transactions on Software Engineering, May 1998.

23. D. Georgakopoulos, M. Hornick, and A. Sheth, “An overview of workflow management: from process modeling to workflow automation infrastructure”, Distributed and Parallel Databases, no. 3, pp. 119-153, 1995.

24. J.C. Grundy, M. D. Apperley, J.G. Hosking, and W.B. Mugridge, “A decentralized architecture for software process modeling and enactment”, IEEE Internet Computing, September/October 1998.

25. D. Heimbigner, “The ProcessWall: A Process Server Approach to Process Programming”, Fifth ACM/SIGSOFT Conference on Software Development Environments, 9-11 December 1992, Washington, D.C.

26. V. Johnson and M. Johnson, “IP Multicast Backgrounder”, An IP Multicast Initiative White Paper, http://www.ipmulticast.com.

27. G. E. Kaiser, S. E. Dossick, W. Jiang, J. Jingshuang Yang, and S. Xi Ye, “WWW-based Collaboration Environments with Distributed Tool Services”, World Wide Web, Baltzer Science Publishers, 1:3-25, January 1998.

28. P.J. Kammer, G.A. Bolcer, R.N. Taylor, and A.S. Hitomi, “Supporting distributed workflow using HTTP”, Proceedings of the 5th International Conference on the Software Process (ICSP5), Lisle, IL, June 1998.

29. B. Krishnamurthy and D.S. Rosenblum, “Yeast: A General Purpose Event-Action System”, IEEE Transactions on Software Engineering, vol. 21, no. 10, October 1995.

30. L. Lamport, “Time, clocks, and the ordering of events in a distributed system”, Communications of the ACM, 21(7):558-565, 1978.

31. D. B. Lange and D. T. Chang, “IBM Aglets Workbench: Programming Mobile Agents in Java”, IBM Corp. White Paper, February 1997.

32. S. Latrous and F. Oquendo, “A reflective multi-agent system for software process enaction and evolution”, Proceedings of the First International Conference on Practical Application of Intelligent Agents and Multi-Agent Technology, London, UK, April 1996.

33. T.J. Lehman, S.W. McLaughry, and P. Wyckoff, “TSpaces: The Next Wave”, Proceedings of the Hawaii International Conference on System Sciences (HICSS-32), January 1999.

34. A. Limongiello, R. Melen, M. Roccuzzo, V. Trecordi, and J. Wojtowicz, “An Experimental Open Architecture to Support Multimedia Services Based on CORBA, Java and WWW Technologies”, IS&N ’97, Cernobbio (Como), Italy, 27-29 May 1997.

35. M. Mansouri-Samani and M. Sloman, “GEM: A Generalized Event Monitoring Language for Distributed Systems”, IEE/IOP/BCS Distributed Systems Engineering Journal, vol. 4, no. 2, June 1997.

36. Object Management Group, “CORBA/IIOP 2.2 Specification”, February 1998, ftp://ftp.omg.org/pub/docs/formal/98-0701.pdf.

37. Object Management Group, “CORBAservices: Common Object Services Specification”, July 1997, ftp://ftp.omg.org/pub/docs/formal/97-07-04.pdf.

38. Object Management Group, “Notification Service”, August 1999, OMG TC Document telecom/99-07-01, http://www.omg.org/docs/telecom/98-06-17.pdf.

39. P. Oreizy, N. Medvidovic, and R. N. Taylor, “Architecture-Based Runtime Software Evolution”, Proceedings of the 20th International Conference on Software Engineering (ICSE ’98), Kyoto, Japan, April 19-25, 1998.

40. OVUM, “OVUM Evaluates: Middleware”, OVUM Ltd, 1996.

41. D. Piantanida and E. Sanvito, “JAMES – Java Meeting Scheduler”, Master Thesis (in Italian), Politecnico di Milano, Dipartimento di Elettronica e Informazione, 1999.

42. G.P. Picco, “µCode: A Lightweight and Flexible Mobile Code Toolkit”, Proceedings of the 2nd International Workshop on Mobile Agents (MA’98), Stuttgart, Germany, K. Rothermel and F. Hohl eds., September 1998, Springer, Lecture Notes in Computer Science vol. 1477, pp. 160-171.

43. S.P. Reiss, “Connecting Tools Using Message Passing in the Field Environment”, IEEE Software, July 1990.

44. D.S. Rosenblum and A.L. Wolf, “A Design Framework for Internet-Scale Event Observation and Notification”, 6th European Software Engineering Conference (Joint with SIGSOFT ’98, Foundations of Software Engineering), Zurich, Switzerland, September 1997.

45. D. S. Rosenblum, A.L. Wolf, and A. Carzaniga, “Critical Considerations and Designs for Internet-Scale, Event-Based Compositional Architectures”, Digest of the OMG-DARPA-MCC Workshop on Compositional Software Architectures, Monterey, CA, January 1998.

46. B. Segall and D. Arnold, “Elvin has left the building: A publish/subscribe notification service with quenching”, Proceedings of AUUG97, September 1997.

47. Sun Microsystems, “Integrating applications with the SPARCworks 3.0.1 toolset”, http://www.sun.com/software/Products/Developer-products/literature/int_tool/preface.html.

48. Sun Microsystems, “JavaBeans”, Sun Microsystems Technical Report.

49. Sun Microsystems, “Java Message Service Specification”, Sun Microsystems Technical Report.

50. Sun Microsystems, “Java Object Serialization Specification”, Technical Report, ftp://ftp.javasoft.com/docs/jdk1.2/serialspec-JDK1.2.pdf.

51. Sun Microsystems, “Java Remote Method Invocation Specification”, February 10, 1997, ftp://ftp.javasoft.com/docs/jdk1.1/rmi-spec.pdf.

52. Sun Microsystems, “The JavaSpaces Specification”, November 1999, http://www.sun.com/jini/specs/js101.html.

53. S. M. Sutton, D. Heimbigner, and L.J. Osterweil, “APPL/A: a Language for Software-Process Programming”, ACM Transactions on Software Engineering and Methodology, 4(3), July 1995.

54. K. Swenson, “Simple Workflow Access Protocol (SWAP)”, Internet Draft, August 1998, http://www.ietf.org/internetdrafts/draft-swenson-swap-prot-00.txt.

55. Talarian, “Mission Critical Interprocess Communications - an Introduction to Smartsockets”, White Paper.

56. R.N. Taylor, N. Medvidovic, K.M. Anderson, E.J. Whitehead Jr., J.E. Robbins, K.A. Nies, P. Oreizy, and D.L. Dubrow, “A component-based architectural style for GUI software”, IEEE Transactions on Software Engineering, vol. 22, no. 6, June 1996.

57. TIBCO, “TIB/Rendezvous”, White Paper, http://www.rv.tibco.com/rvwhitepaper.html.

58. K. S. Yap, P. Tripathi, and S. Tripathi, “Fault Tolerant Remote Procedure Call”, Proceedings of the 8th International Conference on Distributed Computing Systems, June 1988.

59. X. Wang, H. Zhao, and J. Zhu, “GRPC: A Communication Cooperation Mechanism in Distributed Systems”, ACM Operating Systems Review, 27(3), 1993.

60. WISEN 98: 1998 Workshop on Internet Scale Event Notification, Irvine Research Unit on Software (IRUS), Irvine, CA, 13-14 July 1998, http://www.ics.uci.edu/wisen.

61. The Workflow Management Coalition, “The Workflow Reference Model”, WFMC-TC-1003, 29-Nov-94, ver. 1.1, http://www.aiim.org/wfmc/DOCS/refmodel/rmv1-16.html.
