Towards A Content-Based Publish/Subscribe Architecture to Support Complex User Subscriptions

Simon Courtenage
Cavendish School of Computer Science, University of Westminster, 115 New Cavendish Street, London, United Kingdom
[email protected] June 3, 2002
Abstract

Content-based publish/subscribe systems allow users to create their own subscriptions and decide what messages to receive. Most existing content-based publish/subscribe systems allow filters on individual messages, but we wish to extend this flexibility to allow users to create subscriptions that build new messages from combinations and patterns of messages. However, this increased flexibility causes significant problems. The publish/subscribe network has to be aware of the combinations and patterns specified in user subscriptions, to recognise them as they occur, and to create new messages from these combinations and patterns in order to forward them on to the interested subscribers. And since new combinations and patterns can be added dynamically to the system as users make new subscriptions, the network must be dynamically configurable. This paper presents a solution to these problems. It first describes a simple functional-style programming language for expressing user subscriptions as computable functions which, as component events occur, progressively compute the result that produces the composite event occurrence. It then presents an outline of a distributed event detection system which can be used to support a content-based publish/subscribe system and which is dynamically configured with complex user subscriptions expressed in the event language.
The author is a Senior Lecturer in the University of Westminster.
1 Introduction
One particular area within the field of distributed systems where event languages play an important role is that of publish/subscribe systems [10]. Here, users or subscribers make known to the system what information they are interested in receiving by registering a subscription. Messages are then delivered to the relevant subscribers based on what the system knows about users’ interests from their current subscriptions. Traditional publish/subscribe systems classified messages into, for example, a hierarchy of topics, and the scope for expressing user subscriptions was limited to specifying which topic to receive messages from. This kind of message subscription facility is offered by, for example, the CORBA Event Service [13] and Java Distributed Events [15].

However, more recent research has concentrated on using content-based routing [1] [2] [6] [7] [11] as a means of delivering messages. In content-based publish/subscribe systems, subscribers express their interest in terms of the content of a message or combination of messages, effectively defining filters which the content of a message has to satisfy in order to be routed on to the subscriber. The advantage of content-based publish/subscribe systems over channel or topic-based alternatives is that subscribers have greater flexibility in specifying their requirements. A subscriber is not limited to predefined message classifications, but can in essence define their own customised message grouping. Typical applications of this form of publish/subscribe are event notification services, such as Elvin [19], Siena [4], Gryphon [20], READY [14], and HERALD [3].

In a scalable architecture, a publish/subscribe system is typically implemented as a network of servers, with local servers acting as access points for publishers and subscribers.
Messages originate on the network at the local access point of a publisher, and are routed across the network to the local access point of a subscriber who has previously declared, through a subscription, an interest in receiving messages with certain properties. The problem facing any such system, however, lies in turning the subscription request into routing information to be distributed across the network of servers, in such a way that the task of delivering a message from a publisher to an interested subscriber can be easily accomplished. If subscription requests are simply filters on single messages, then the problem reduces to insertion of the subscription as an entry into the routing tables of the appropriate servers.

The task is more complex, however, if we wish to allow subscribers to make subscription requests that are satisfied by patterns of messages. A subscription which is satisfied by a pattern of messages essentially defines a composite event, whose component event occurrences are the publication of messages that are part of the pattern. A new subscription may define a new composite event that the network will have to be notified of in order to begin detecting its occurrence. This in turn will require that existing event detectors of the component events are informed of the new event detector in order to send on notifications of component event occurrences.

One possible solution to this problem is to route individual messages directly to the local access points of subscribers, which will be responsible for detecting when the complex subscription is satisfied by a particular message pattern. However, this approach ignores the possibility that different subscriptions may share some common structure, and can therefore lead to excess network traffic.

In the rest of this paper, we describe a simple, functional-style programming language for expressing user subscriptions as composite events. In this language, specifications of composite events appear as computable functions, which take component events as arguments, progressively computing the result which will produce the composite event occurrence. In the succeeding section, we show how such functional expressions can be represented as directed acyclic graphs. We describe how these graphs dictate the organisation of event detectors in the network, such that the network is essentially a merger of the graphs of all subscriptions currently in effect. In the following section, we discuss how new user subscriptions are dynamically implemented in the network, introducing a second tier of administrative servers to provide a static structure to the dynamically-evolving network of event detectors. Finally, we discuss related work and summarise our research and future work.
2 An event language for complex user subscriptions
In [9], we introduced an event language for expressing complex user subscriptions based on the typed λ-calculus, a simple yet powerful programming language. Expressions in the typed λ-calculus are built from variables, anonymous functions (or λ-abstractions) and applications of one term to another. Types in our event language are built according to the following grammar:

    T ::= t | TSeq T1 T2 | TDisj T1 T2 | TConj T1 T2

A type expression, therefore, may be the type of a primitive event, denoted by t, or a compound type. Compound types, which are assigned to composite event expressions, consist of a type constructor which builds the compound type from two type arguments. For example, a sequence type would be constructed as

    TSeq CityTemp CityTemp

Expressions in the event language are built up from variables, constants (which are used to denote actual events), anonymous functions, and compound data structures, according to the following grammar:

    E ::= a | x | λP.E | Seq E1 E2 | Disj E1 E2 | Conj E1 E2

where a is an event constant, x is a variable, and P is an argument pattern (see below). The data constructors Seq, Disj, and Conj build compound values.

Argument patterns specify the events that are part of the composite event specified in the body of a λ-abstraction and their relationship to each other. The grammar of argument patterns is as follows:

    P ::= x :: T | P1 & P2 | P1 ; P2 | P1 | P2 | (P)

An argument pattern is either a variable x (tagged with an event type T), a conjunction of two other patterns P1 & P2, a temporal sequence P1 ; P2, a disjunction of two patterns P1 | P2, or a bracketed pattern (P).

The grammar of the event language is flexible enough to allow λ-abstractions whose argument patterns do not reflect the structure, partially or totally, of the abstraction body. The grammar also allows nested λ-abstractions, which do not have a meaning when seen as event specifications. Therefore, we define a predicate WELL FORMED over expressions in the event language to determine if an expression has a valid meaning as an event specification or not, such that WELL FORMED returns true for an expression E iff E does not contain nested λ-abstractions and, if E does contain a λ-abstraction λP.E′, then the argument pattern P reflects the body of the abstraction E′.

We do not restrict the types of variables in argument patterns to be only the types of primitive events. Variables in argument patterns can also be assigned compound types, for example,

    λ(x :: T1 & y :: (TSeq T2 T2)).Conj x y

Functions in the event language map the occurrence of events to the occurrence of a composite event of which they are components. Component events of a composite event, and their relationship to each other in the structure of the composite event, are specified in the single argument pattern of a function. When a component event occurs, it is passed to a functional composite event expression, and the result is another functional expression if more component events are required for the composite event to occur¹. Otherwise, a non-functional expression is returned, at which point we say that the composite event has occurred².
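As an illustration only, the expression and pattern grammars above can be modelled as a small abstract syntax tree, together with one possible reading of WELL FORMED. The class names (`PVar`, `PBin`, `Const`, `Var`, `Lam`, `Cons`) and the function `well_formed` are hypothetical. The condition that a pattern "reflects" the body is approximated here, as an assumption, by requiring that the variables bound by the pattern are exactly the variables used in the body; the precise definition is the one given in [9].

```python
from dataclasses import dataclass
from typing import Union


# Argument patterns: P ::= x :: T | P1 & P2 | P1 ; P2 | P1 | P2
@dataclass(frozen=True)
class PVar:
    name: str
    type: str


@dataclass(frozen=True)
class PBin:
    op: str  # "&" conjunction, ";" sequence, "|" disjunction
    left: "Pattern"
    right: "Pattern"


Pattern = Union[PVar, PBin]


# Expressions: E ::= a | x | λP.E | Seq E1 E2 | Disj E1 E2 | Conj E1 E2
@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    pattern: Pattern
    body: "Expr"


@dataclass(frozen=True)
class Cons:
    ctor: str  # "Seq", "Disj" or "Conj"
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Var, Lam, Cons]


def pattern_vars(p):
    """Variables bound by an argument pattern."""
    if isinstance(p, PVar):
        return {p.name}
    return pattern_vars(p.left) | pattern_vars(p.right)


def used_vars(e):
    """Variables occurring in a λ-free expression."""
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Cons):
        return used_vars(e.left) | used_vars(e.right)
    return set()


def contains_lambda(e):
    if isinstance(e, Lam):
        return True
    if isinstance(e, Cons):
        return contains_lambda(e.left) or contains_lambda(e.right)
    return False


def well_formed(e):
    """Assumed reading of WELL FORMED: no λ inside a λ body, and the
    pattern binds exactly the variables that the body uses."""
    if isinstance(e, Lam):
        return (not contains_lambda(e.body)
                and pattern_vars(e.pattern) == used_vars(e.body))
    if isinstance(e, Cons):
        return well_formed(e.left) and well_formed(e.right)
    return True


# λ(x :: T | (y :: U ; z :: U)).Disj x (Seq y z)
good = Lam(
    PBin("|", PVar("x", "T"), PBin(";", PVar("y", "U"), PVar("z", "U"))),
    Cons("Disj", Var("x"), Cons("Seq", Var("y"), Var("z"))),
)
# A nested abstraction, which has no meaning as an event specification.
nested = Lam(PVar("x", "T"), Lam(PVar("y", "U"), Cons("Seq", Var("x"), Var("y"))))
# A body that uses a variable the pattern does not bind.
mismatch = Lam(PVar("x", "T"), Cons("Conj", Var("x"), Var("y")))
```

Under these assumptions, `well_formed(good)` holds while `nested` and `mismatch` are rejected. A structural check (for instance, that a `;` in the pattern corresponds to a `Seq` in the body) would be stricter than the variable-set comparison used here.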
For example, if we have the composite event expression

    λ(x :: T | (y :: U ; z :: U)).Disj x (Seq y z)

then, when an event a :: U occurs, we apply the event as an argument to the event expression and obtain

    (λ(x :: T | (y :: U ; z :: U)).Disj x (Seq y z)) a :: U

Using the evaluation rules given in [9], this results in

    λ(x :: T | z :: U).Disj x (Seq a z)

If a further event b :: U occurs, then we have the applied expression

    (λ(x :: T | z :: U).Disj x (Seq a z)) b :: U

which evaluates to

    Disj ⊥ (Seq a b)

(Note that in the last evaluation, the variable x is replaced with a special value ⊥, intended as a null value which represents the non-occurrence of that particular event in the disjunction.)

¹ This technique is called currying, a feature of the λ-calculus. Briefly, a curried function is one that, if called with fewer arguments than parameters, will return a function that awaits the remaining arguments.
² In other words, the result is an expression in the event language which does not evaluate to a λ-abstraction.
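The two evaluation steps in the example above can be mimicked with a short sketch. This is not the set of evaluation rules from [9], which are not reproduced in this paper; it is a simplified assumption in which each event is offered to the argument pattern, the pattern shrinks to a residual pattern, and the unused side of a satisfied disjunction is bound to ⊥. The names `Var`, `Op`, `feed` and `substitute` are hypothetical.

```python
from dataclasses import dataclass

BOTTOM = "⊥"  # null value for a disjunct that did not occur


@dataclass(frozen=True)
class Var:
    name: str
    type: str


@dataclass(frozen=True)
class Op:
    kind: str  # ";" sequence, "&" conjunction, "|" disjunction
    left: object
    right: object


def names(p):
    """Variables still awaited by pattern p."""
    return {p.name} if isinstance(p, Var) else names(p.left) | names(p.right)


def feed(p, event, etype, env):
    """Offer one event occurrence to pattern p.

    Returns (consumed, residual). residual is None once p is satisfied.
    Variable bindings are recorded in env.
    """
    if isinstance(p, Var):
        if p.type == etype:
            env[p.name] = event
            return True, None
        return False, p
    # A sequence only accepts events for its first component;
    # conjunctions and disjunctions try the left side, then the right.
    if p.kind == ";":
        sides = [("left", "right")]
    else:
        sides = [("left", "right"), ("right", "left")]
    for here, other in sides:
        consumed, rest = feed(getattr(p, here), event, etype, env)
        if not consumed:
            continue
        sibling = getattr(p, other)
        if rest is None:
            if p.kind == "|":
                # One branch suffices: the other branch did not occur.
                for n in names(sibling):
                    env[n] = BOTTOM
                return True, None
            return True, sibling
        left, right = (rest, sibling) if here == "left" else (sibling, rest)
        return True, Op(p.kind, left, right)
    return False, p


def substitute(body, env):
    """Replace variables in a body given as nested (ctor, left, right) tuples."""
    if isinstance(body, tuple):
        ctor, left, right = body
        return (ctor, substitute(left, env), substitute(right, env))
    return env.get(body, body)


# λ(x :: T | (y :: U ; z :: U)).Disj x (Seq y z)
pattern = Op("|", Var("x", "T"), Op(";", Var("y", "U"), Var("z", "U")))
body = ("Disj", "x", ("Seq", "y", "z"))
env = {}

_, residual1 = feed(pattern, "a", "U", env)    # event a :: U occurs
_, residual2 = feed(residual1, "b", "U", env)  # event b :: U occurs
result = substitute(body, env)
```

After the first event the residual pattern corresponds to x :: T | z :: U, and after the second the pattern is exhausted and `result` represents Disj ⊥ (Seq a b). The sketch ignores cases the paper does not discuss here, such as events that arrive while both branches of a disjunction are partially matched.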
3 A distributed system for composite event detection
The key problem we address is how the publish/subscribe system can be dynamically configured with the details of new subscriptions, so that subscribers receive the appropriate composite messages. Our solution lies in a network of event detector servers, each responsible for the detection of a particular event, in which, as new subscriptions are registered and new composite events defined, new event detector servers are added to the network. For example, consider the subscription

    λ(x :: T1 ; y :: T2).Seq x y

When fully applied (i.e., events of type T1 and T2 have occurred in the proper sequence), we get the composite event

    Seq a :: T1 b :: T2

(where a :: T1 and b :: T2 are the component event occurrences). In [9], we showed how this expression can be converted into the expression

    (λ(x :: T1 ; y :: T2).Seq x y) ((λu :: T1.u) a :: T1) ((λv :: T2.v) b :: T2)

We can construct a graphical representation of the structure of this expression in which sub-expressions appear as nodes and edges between nodes wherever one sub-expression is applied as a function to another. For the example above, we obtain the graph

    [Figure: expression graph with node E3 at the top and nodes E1 and E2 below it, each joined to E3 by an edge]

(where E1 = λu :: T1.u, E2 = λv :: T2.v and E3 = λ(x :: T1 ; y :: T2).Seq x y). The structure of the expression graph defines the organisation of the event detector servers in the network required to be able to route messages from the local access points of the publishers of messages of type T1 and T2 to the subscriber. If a further subscription is made, e.g.,

    λ(x :: TSeq T1 T2 & z :: T3).Conj x z

then, as before, we can convert back to the expression

    (λ(x :: TSeq T1 T2 & z :: T3).Conj x z) ((λu :: TSeq T1 T2.u) a1 :: TSeq T1 T2) ((λv :: T3.v) a2 :: T3)

In this case, the expression graph is

    [Figure: expression graph with node E6 at the top and nodes E4 and E5 below it, each joined to E6 by an edge]

(where E4 = λu :: TSeq T1 T2.u, E5 = λv :: T3.v and E6 = λ(x :: TSeq T1 T2 & z :: T3).Conj x z). If we attempt to configure the network for this subscription, then we already have an event detector server E3 to detect events of type TSeq T1 T2 but need new event detectors for E5 and E6. Hence the network graph becomes

    [Figure: network graph with node E6 at the top, joined by edges to E3 and E5; E3 is in turn joined by edges to E1 and E2]
The advantage of this approach is that event detection is shared between subscriptions where they have common sub-expressions, which reduces network traffic. An alternative scheme might be to route all events to the server at which a subscription is registered, but this means that detection of events of the same type can be duplicated, resulting in more network traffic than need be the case.

To this network structure, we add two further types of nodes: publication and subscription nodes. Publication nodes act as access points to the network for publishers, being the point of origin for messages of a particular type. Subscription nodes are the point of origin for subscriptions, being the access point at which a subscriber registers their subscription. A subscriber who wants to make a subscription does so through a subscription node. The subscription node, on behalf of the subscriber, then communicates with the rest of the network to put the subscription into effect.
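The sharing of event detectors between subscriptions described in this section can be sketched as a registry keyed by event type. This is a minimal illustration under stated assumptions, not the system's implementation: event types are encoded as nested tuples such as `("TSeq", "T1", "T2")`, the class `DetectorNetwork` and its method `ensure` are hypothetical, and detector names (`D1`, `D2`, ...) are generated by a counter rather than following the E1 to E6 labels used in the text.

```python
class DetectorNetwork:
    """Merged graph of event detectors for all current subscriptions."""

    def __init__(self):
        self.detectors = {}  # event type -> detector name
        self.edges = set()   # (source detector, target detector)

    def ensure(self, etype):
        """Return the detector for etype, creating it (and any detectors
        for its component types) only if none exists yet."""
        if etype in self.detectors:
            return self.detectors[etype]  # shared with earlier subscriptions
        sources = []
        if isinstance(etype, tuple):      # compound type: (constructor, T1, T2)
            sources = [self.ensure(part) for part in etype[1:]]
        name = f"D{len(self.detectors) + 1}"
        self.detectors[etype] = name
        # Component detectors forward their event occurrences to the new one.
        self.edges.update((source, name) for source in sources)
        return name


net = DetectorNetwork()

# First subscription: λ(x :: T1 ; y :: T2).Seq x y
seq_type = ("TSeq", "T1", "T2")
net.ensure(seq_type)

# Second subscription: a conjunction of a TSeq T1 T2 event and a T3 event.
conj_type = ("TConj", seq_type, "T3")
net.ensure(conj_type)
```

With both subscriptions registered, the network holds five detectors rather than six, because the detector for TSeq T1 T2 created by the first subscription is reused by the second. This mirrors the merged network graph in which E3 feeds E6 directly and no separate E4 is created.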
4 Implementation of new subscriptions
One problem posed by the network architecture outlined above is that it is difficult for subscribers to find out the current organisation of the network. This follows from the dynamic nature of the network. Event detector servers are created and removed according to the set of current subscriptions, so at any one time, a subscriber cannot know which event detector servers exist and, if they do exist, where they are located on the network.
To overcome this problem, we introduce a secondary network of administration servers. This network is fixed and static. The responsibility of each administration server is to manage a subset of event detector servers, effectively a sub-graph of the network graph of event detector servers. Each event detector server is registered with its local administration server, in terms of its host location and the event type it is detecting. Adding the administration server network to the network graph example above gives

    [Figure: the network graph of event detector servers E1, E2, E3, E5 and E6 from the previous section, with added links from the administration servers A1 and A2 to the event detector servers they manage]
(where A1 and A2 are administration servers). Note that the graph only shows the links between administration servers and the event detector servers they manage, and not the connections between the administration servers themselves. The administration server network is assumed to be a bidirectional network implemented through routing tables.

When a new subscription is made, the subscription node responsible communicates with the nearest administration server to discover if an event detector server exists for each of the component events that are used in the structure of the composite event which is the new subscription. If the administration server does not manage an event detector node of that type, it broadcasts the request to the other administration servers. If the event is not currently being detected, then an administration server spawns the appropriate event detector node and communicates with existing event detector nodes that provide the component events of the composite event detected by the new node to update their routing tables.
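The lookup, broadcast and spawn steps described above can be sketched as follows. This is a simplified, single-process illustration and not a description of an actual implementation: the class `AdminServer`, the function `register_subscription`, the `"server:type"` location strings and the shared `routing` dictionary are all hypothetical. In particular, the text does not say which administration server spawns a missing detector, so the sketch assumes it is the server that was contacted.

```python
class AdminServer:
    """Administration server managing a subset of event detector servers."""

    def __init__(self, name):
        self.name = name
        self.registry = {}  # event type -> location of its event detector
        self.peers = []     # other administration servers

    def lookup_local(self, etype):
        return self.registry.get(etype)

    def find_or_spawn(self, etype):
        """Return (location, created) for the detector of etype."""
        location = self.lookup_local(etype)
        if location is not None:
            return location, False
        # Not managed here: broadcast the request to the other servers.
        for peer in self.peers:
            location = peer.lookup_local(etype)
            if location is not None:
                return location, False
        # The event is not being detected anywhere: spawn a new detector
        # (assumption: it is spawned by the server that was contacted).
        location = f"{self.name}:{etype}"
        self.registry[etype] = location
        return location, True


def register_subscription(admin, composite, components, routing):
    """Put a subscription for a composite event into effect via admin."""
    sources = [admin.find_or_spawn(c)[0] for c in components]
    target, created = admin.find_or_spawn(composite)
    if created:
        # Component detectors update their routing tables so that they
        # forward event occurrences to the new composite detector.
        for source in sources:
            routing.setdefault(source, set()).add(target)
    return target


a1, a2 = AdminServer("A1"), AdminServer("A2")
a1.peers, a2.peers = [a2], [a1]
routing = {}

# An earlier subscription, registered through A1.
register_subscription(a1, "TSeq T1 T2", ["T1", "T2"], routing)

# A new subscription, registered through A2, that reuses the TSeq detector.
target = register_subscription(
    a2, "TConj (TSeq T1 T2) T3", ["TSeq T1 T2", "T3"], routing
)
```

In this run the detector for TSeq T1 T2 is found at A1 through the broadcast, while the detectors for T3 and for the new composite event are spawned under A2. A real deployment would replace the in-memory `peers` list and `routing` dictionary with messages over the administration server network.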
5 Related work
Most work into event languages and composite event detection has been in the context of active databases and ECA (Event-Condition-Action) rules, for example HiPAC [ref], Compose [12], SAMOS [ref], and Snoop [8]. ECA rules are user-defined, like subscriptions, and are of the form “on event, if condition, then action”. Composite events are formed from primitive events (typically database operations) and operators representing conjunction, disjunction and sequence. (Other operators are also possible, for example, negation in Compose and Any in Snoop.) Unlike publish/subscribe, most active databases are centralised systems with global event detection. Hence the problem of distributing the structure of a composite event across different event detectors does not arise.

Event languages for distributed systems have typically been used to monitor and debug distributed system behaviour. Schwiderski [18] defines a composite event language for monitoring the behaviour of distributed systems, but assumes that each event specification would be separately assigned to a node. Our work is related to Schwiderski’s, in that the implementation of the event language will use graph reduction [16], a well-known implementation technique for functional programming languages, which is akin to the expression trees used in Schwiderski’s event detection algorithm. However, graph reduction does not suffer the disadvantages of Schwiderski’s approach since it does not store activated child nodes in a buffer associated with the node for a composite event.

Another language for composite events in a distributed environment is GEM [17]. GEM scripts are used to configure nodes in a distributed system that are responsible for monitoring system behaviour. Again, however, individual scripts must be created to configure each node, and the question of which nodes need to be configured is user-defined.

In their work into the design of Siena, a scalable, wide-area event notification system, Carzaniga, Rosenblum and Wolf [7] discuss factoring subscription patterns into their component parts so that each sub-pattern can be distributed across the network. However, there are no details on how this can be achieved, and furthermore, they do not present an event language. In [5], they discuss patterns of filters for Siena, but only in terms of sequences of filters for temporally-ordered sequences of messages.

Fabret et al. [11] and Aguilera et al. [1] consider the problem of efficient matching of events to subscriptions. Their description of subscriptions, however, is solely in terms of predicates over the attributes of messages.
6 Summary
This paper has sought to present a solution to the problem of dynamically configuring content-based publish/subscribe systems with new complex subscriptions. We defined a language for describing composite events as functional expressions that can be represented as graphs, and outlined a possible two-tier system architecture of dynamic event detectors and static administration servers. In this architecture, the organisation of the event detectors is essentially a merger of the graphs of all current subscriptions.

The architecture presented is essentially a ‘naive’ version. In a real setting, many thousands of subscriptions may be current, and under the current proposal, this may result in many thousands of separate event detector servers. From the perspective of performance, this is not likely to be a feasible implementation. A more probable implementation would focus on clustering a sub-graph of the event detector graph into a single server, with local event detection and communication of events between event detectors within the same server. This would reduce network traffic and increase performance, as well as reduce the load on the carrier network generally from the reduction in the number of servers.

This suggestion, however, points to the future directions for our work, since the division of the network of event detectors between servers in order to optimise performance is likely to depend on the way in which the network expands as new subscriptions are made. So far little work has been done to analyse the likely dynamic behaviour of this kind of application.
References

[1] Marcos Kawazoe Aguilera, Robert E. Strom, Daniel C. Sturman, Mark Astley, and Tushar Deepak Chandra. Matching events in a content-based subscription system. In Symposium on Principles of Distributed Computing, pages 53–61, 1999.
[2] G. Banavar, T. Chandra, B. Mukherjee, J. Nagarajarao, R. E. Strom, and D. C. Sturman. An efficient multicast protocol for content-based publish-subscribe systems. In Proceedings of the 19th IEEE Conference on Distributed Computing Systems (ICDCS’99), 1999.
[3] L. Cabrera, M. Jones, and M. Theimer. Herald: Achieving a global event notification service, 2001.
[4] A. Carzaniga, D. Rosenblum, and A. Wolf. Interfaces and algorithms for a wide-area event notification service. Technical report, Department of Computer Science, University of Colorado.
[5] A. Carzaniga, D. Rosenblum, and A. Wolf. Achieving expressiveness and scalability in an internet-scale event notification service. In Proc. of the 19th ACM Symposium on Principles of Distributed Computing, Portland, OR, July 2000.
[6] A. Carzaniga, D. Rosenblum, and A. Wolf. Challenges for distributed event services: Scalability vs expressiveness, 2000.
[7] A. Carzaniga, D. Rosenblum, and A. Wolf. Content-based addressing and routing: A general model and its application, 2000.
[8] S. Chakravarthy and D. Mishra. Snoop: An Expressive Event Specification Language for Active Databases. Data and Knowledge Engineering, 14(1):1–26, November 1994.
[9] S. A. Courtenage. Specifying and detecting composite events in content-based publish/subscribe systems. Technical Report CSCS/2002/01, Cavendish School of Computer Science, University of Westminster, 2002.
[10] P.Th. Eugster, P. Felber, R. Guerraoui, and A.-M. Kermarrec. The many faces of publish/subscribe, 2001.
[11] F. Fabret, F. Llirbat, J. Pereira, and D. Shasha. Efficient matching for content-based publish/subscribe systems. Technical report, INRIA, 2000. http://wwwcaravel.inria.fr/pereira/matching.ps.
[12] N. Gehani, H. Jagadish, and O. Shmueli. Event specification in an active object-oriented database. In Proceedings of International Conference on Management of Data (SIGMOD’92), June 1992.
[13] Object Management Group. CORBAservices: Common Object Service Specification. Technical report, Object Management Group, July 1998.
[14] R. Gruber, B. Krishnamurthy, and E. Panagos. The architecture of the READY event notification service. 1999.
[15] Sun Microsystems Inc. Java Distributed Event Specification. Mountain View, CA, USA, 1998.
[16] Simon L. Peyton Jones. The Implementation of Functional Programming Languages. Prentice Hall, 1987.
[17] M. Mansouri-Samani and M. Sloman. GEM: A generalized event monitoring language for distributed systems. Research Report DOC 95/8, Imperial College of London, August 1995.
[18] S. Schwiderski. Monitoring the behavior of Distributed Systems. PhD thesis, 1996.
[19] B. Segall and D. Arnold. Elvin has left the building: A publish/subscribe notification service with quenching. In Proceedings of AUUG’97, 1997.
[20] R. Strom, G. Banavar, T. Chandra, M. Kaplan, K. Miller, B. Mukherjee, D. Sturman, and M. Ward. Gryphon: An information flow based approach to message brokering, 1998.