Data Dissemination supporting collaborative complex event processing: characteristics and open issues

Roberto Baldoni, Silvia Bonomi, Giorgia Lodi, Leonardo Querzoni
Sapienza - University of Rome
Via Ariosto 25, I-00185, Rome, Italy

{baldoni, bonomi, lodi, querzoni}@dis.uniroma1.it

ABSTRACT
Most distributed applications today receive events, process them, and in turn create new events that are sent to other processes. Business intelligence, air traffic control, collaborative security, and complex system software management are examples of such applications. In these applications, basic events, potentially occurring at different sites, are correlated in order to detect complex event patterns formed by basic events that may have temporal and spatial relationships among them. In this context, a fundamental functionality is data dissemination, which brings events from event producers to the event consumers where complex event patterns are detected. In this paper we discuss the characteristics that a data dissemination service should have in order to best support complex event pattern detection. We consider event traffic that can reach thousands of events per second coming from different event sources; the data dissemination service therefore has to sustain high throughput. Finally, we present an assessment of a number of technologies that can be used to disseminate data in this context, discussing scenarios where those technologies can be effectively deployed.

Keywords
Data dissemination, Event pattern detection, Complex event processing, Collaborative systems, Data dissemination service, Reliability

1. INTRODUCTION

Today there is a trend towards enabling the construction of new IT services that enterprises provide within changing scenarios, according to different business needs and situations, in a continuum [25]. The backbone of these scenarios is a dynamic and loosely coupled distributed system formed by autonomous entities (e.g., nodes, processes, organization clouds) distributed across different administrative domains. These entities need to cooperate and federate in order to effectively deploy complex monitoring applications that employ “sense-and-respond” capabilities, required to augment the entities’ perceived knowledge of the global business scenario state and to respond timely and appropriately to changes that may occur. Examples of these applications include business intelligence, air traffic control, and collaborative security, to cite a few. The possibility to federate and cooperate offers the opportunity to pool resources together and share data for common benefit [9, 20]. Services can be composed so as to form complete end-to-end services, and the entities can pave the way to collaborative executable enterprises [25]. In order to provide such end-to-end services it is crucial to support event-driven computing, so as to monitor and detect complex event patterns generated in nearly real time.

Complex Event Pattern Detection (CEPD) is becoming a fundamental capability of a variety of monitoring applications. It consists of detecting complex event patterns that may occur over a certain time period and possibly within a certain spatial distance. In most existing CEPD systems (e.g., [2, 5]), distributed and possibly heterogeneous event sources originate simple, basic events that are continuously pushed to a CEPD site. The CEPD site evaluates and recognizes, with continuous queries and rules, the presence of specific complex patterns of events that may have temporal and/or spatial relationships.

As an example, the emerging scenario of the protection of financial critical infrastructures represents an interesting context where “sense-and-respond” capabilities and the notion of federation are of primary importance. Financial institutions are indeed increasingly exposed to a variety of security-related risks, such as massive and coordinated cyber attacks [3, 8] aiming at capturing high value (or otherwise sensitive) information, or at disrupting service operation for various purposes. To date, these attacks have been monitored and faced in isolation by the single financial institutions, using several tools that reinforce their defense perimeter (e.g., intrusion detection systems, firewalls). These tools detect possible attacks by exploiting the information available at the financial institution, carefully looking at whether some host performs suspicious activities within certain time windows. For example, in the case of stealthy scan attacks [31], basic events obtained from the web servers’ logs available at financial institution sites can be used to recognize whether there exists an unusually high number of TCP SYN requests, possibly targeting an unusually high number of ports and originating from the same external IP address. However, detecting these attacks requires a wider view of what is happening in the Internet, which could be obtained by sharing and combining the information coming from several financial institutions, thus improving the chances of identifying low volume activities that would have gone undetected if individual institutions relied exclusively on their local protection systems [23].

[Figure 1: The CEPD system model. Event sources at institutions X, Y and Z feed basic events into a data dissemination service, which delivers them to CEPD modules 1, 2, 3, ..., i; an approximated space/time service (e.g., GPS/NTP) provides common time and location references.]

In order to exploit all the information available at the different financial institutions and, in general, sites, it is mandatory to employ a data dissemination service that allows both the large volume of basic events originated at those sites to reach the CEPD destination systems for processing, and the results of the CEPD computation to reach all the sites interested in receiving them. The data dissemination service should guarantee a number of Quality of Service (QoS) requirements (e.g., reliability, ordering) in order to sustain high-throughput CEPD systems: without these guarantees, the systems run the risk, for instance, of losing events that are decisive for detecting complex event patterns, or of obtaining events and disseminating results late, thus losing the potential benefits of the CEPD computation, as interested receivers might be unable to react in time to what has been correlated and detected.

The main contributions of this paper are thus the following: (i) we present a model of a data dissemination service for collaborative event detection environments (Section 2 and Section 3); (ii) we review a number of technologies available on the market, as well as well-known data dissemination paradigms, that can be used for the implementation of the data dissemination service (Section 4); and, finally, (iii) we discuss a number of open issues in this area (Section 5).

2. COMPLEX EVENT PATTERN DETECTION

Our reference system model is depicted in Figure 1. A possibly large set of sources sends streams of basic events; events are managed by a data dissemination service that brings them to Complex Event Pattern Detection (CEPD) modules, which in turn derive complex events representing significant activities. Complex events are obtained by properly correlating basic events that are apparently uncorrelated. We consider that each CEPD module can potentially be interested in receiving any subset of basic events, and that CEPD correlation is based on the ability to recognize patterns of basic events, namely complex event patterns, that exhibit logical, timing and/or spatial relationships among them.

Event Sources. We assume event sources are loosely synchronized and geographically dispersed over different administrative domains. They can access a coarse-grained common clock, such as the Network Time Protocol (NTP), and might have knowledge of their geographical location. This means that events can be labeled with timing and geographical timestamps. Note that, due to the typically unpredictable delays of loosely coupled distributed systems like the Internet, this kind of synchronization cannot be used to reliably and totally order events produced by independent sources [22]. In this discussion we assume that the aggregated traffic generated by the sources can reach hundreds or thousands of events per second, which entails that the data dissemination service has to sustain high throughput.

CEPD Module. As in [26], a CEPD module can be a finite state automaton that may detect several patterns concurrently. Operators are used to express these patterns and can be embedded in a programming language; the automaton is then obtained by compiling the language. We assume that a CEPD module can ideally keep in memory all the events received from the data dissemination service, i.e., event patterns cannot be missed because the CEPD module drops events due to memory limitations. Another important aspect of a CEPD module is when the automaton can consume an event coming from the data dissemination service (i.e., the detection policy). In the following we discuss pattern operators and detection policies, respectively.

Event Pattern Operators. Usually operators work on a set of events C, namely the context, kept in memory by the CEPD module (e.g., C could be the entire set of events kept in memory by a CEPD module). To express complex patterns, operators can be broadly grouped into the following five classes:

• Logical operators: the logical pattern operators are and, or, and not. As an example, the pattern e and e′ is detected as soon as both e and e′ belong to C.

• Quantifier operators: given a predicate P(e), the any operator states that any event e in C makes P(e) true; the exists operator states that there is at least one event e in C that makes P true.

• Temporal operators: these operators are related to the time interval in which events occur. They include sequence (identified by a list of events), stating that the pattern is satisfied if there exists a sequence of events that occurred in the same order as in the pattern definition list, and a number of operators, such as time interval and time within [2], that denote the time interval and the point in time at which the events occurred.

• Counting operators: counting operators include, among others, count, which counts how many times certain basic events or event patterns have been detected within the context C [19].

• Spatial operators: these operators include distance, used to evaluate whether the events occur within a certain distance, and moving, used when the events describe consistent movements in a certain direction [19]. The spatial operators can be conveniently used when processing events generated, for instance, by sensors in smart houses [26].

Note that all these operators can be used in conjunction, thus producing complex pattern expressions. For example, using the syntax of the open source complex event processing engine Esper [2], the pattern (A or B) where timer:within(5000) matches any A or B event in the next 5 seconds. More complex operators can also be devised, for example based on statistical properties of the context (e.g., trends).
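As a concrete illustration, the stealthy scan scenario from the introduction combines counting and temporal operators over a sliding time window. The following fragment is a minimal sketch using Esper's Java API [2]; the event type TcpSynEvent, its attributes and the thresholds are hypothetical names and values introduced here purely for illustration.

    import com.espertech.esper.client.EPServiceProvider;
    import com.espertech.esper.client.EPServiceProviderManager;
    import com.espertech.esper.client.EPStatement;

    public class StealthyScanSketch {
        public static void main(String[] args) {
            // Obtain an engine instance (registration of the hypothetical
            // TcpSynEvent type with the engine configuration is omitted).
            EPServiceProvider epService =
                    EPServiceProviderManager.getDefaultProvider();
            // Counting + temporal operators: flag a source IP that issues
            // more than 100 TCP SYNs towards more than 50 distinct ports
            // within a sliding 60-second window.
            EPStatement stmt = epService.getEPAdministrator().createEPL(
                  "select srcIp, count(*) as syns, count(distinct dstPort) as ports "
                + "from TcpSynEvent.win:time(60 sec) "
                + "group by srcIp "
                + "having count(*) > 100 and count(distinct dstPort) > 50");
            // A listener attached to stmt would emit each detection
            // as a complex event.
        }
    }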

Detection Policy in CEPD. An important part of the design of a CEPD module is determining the right time to consume an event incoming from the data dissemination service. As remarked by Pietzuch in [26], the problem is to decide when the next event in the input stream can be safely consumed by the automaton without running the risk that an event with an older timestamp is still being delayed by the network. Premature consumption could lead to incorrect detection or non-detection of an event pattern. Pietzuch identifies two main detection policies:

• Best Effort Detection (BED): basic events are consumed as soon as they arrive at the CEPD module. This policy may cause incorrect detection, and is therefore suited to applications that are sensitive to delay but not to false positives.

• Guaranteed Detection (GD): basic events are consumed only when all the preceding events are available (i.e., when they are stable). This policy requires basic events to be delivered by the data dissemination service respecting some ordering property, and prevents the CEPD module from missing patterns that should be detected. In an asynchronous setting, the guaranteed policy can introduce an unbounded delay. To avoid this problem, a Probabilistic Stability policy can be used, which allows the CEPD module to consume events when they are stable with a specified probability. This latter policy is a trade-off between BED and GD that makes it possible to relax the constraints on the event deliveries of the data dissemination service.
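To make the difference between the two policies concrete, the fragment below is a minimal sketch, assuming that events carry source timestamps and that the dissemination layer exposes a (possibly probabilistic) stability point, i.e., a timestamp below which no further events are expected; all names are hypothetical.

    import java.util.Comparator;
    import java.util.PriorityQueue;

    class DetectionPolicySketch {
        interface Automaton { void consume(long timestamp, Object payload); }

        static class Event {
            final long sourceTimestamp;
            final Object payload;
            Event(long ts, Object p) { sourceTimestamp = ts; payload = p; }
        }

        private final Automaton automaton;
        private final boolean bestEffort; // true = BED, false = GD
        private final PriorityQueue<Event> buffer = new PriorityQueue<>(
                Comparator.comparingLong((Event e) -> e.sourceTimestamp));

        DetectionPolicySketch(Automaton automaton, boolean bestEffort) {
            this.automaton = automaton;
            this.bestEffort = bestEffort;
        }

        void onEvent(Event e, long stabilityPoint) {
            if (bestEffort) {                    // BED: consume immediately
                automaton.consume(e.sourceTimestamp, e.payload);
                return;
            }
            buffer.add(e);                       // GD: buffer until stable
            // Release, in timestamp order, only events that no late
            // arrival can precede anymore.
            while (!buffer.isEmpty()
                    && buffer.peek().sourceTimestamp <= stabilityPoint) {
                Event stable = buffer.poll();
                automaton.consume(stable.sourceTimestamp, stable.payload);
            }
        }
    }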

Application-specific Timed Detection. The detection of a complex pattern of events is an activity that can possibly happen at any point in time after the events that constitute the pattern have been generated by (multiple) sources. However, from an application point of view, it is often desirable that a pattern that occurred at the sources is recognized as quickly as possible by the CEPD module, that is, within some application-dependent time bound. For example, a collaborative security application could require that detection happen within a maximum of 10 seconds after the occurrence of the pattern at the sources. The pattern detection can happen only when all the events that constitute the pattern have reached the CEPD module and the module itself has completed its elaboration. Therefore, if both activities are guaranteed to end within the application time bound from the production of the last event that constitutes the pattern, the system is able to provide a timed detection service for the given application. Note that if the service is not timed, the detection of patterns can become useless from an application viewpoint, as interested receivers might be unable to react in time to what has been correlated and detected.

3. DATA DISSEMINATION SERVICE

A data dissemination service for collaborative event pattern detection aims at carrying each event from its source to all the CEPD modules interested in receiving it. Ideally, a data dissemination service should deliver each event instantaneously to all the interested CEPD modules in a totally ordered way, reflecting the real-time order of event generation. Note that such ideal behavior ensures consistency among the sets of patterns detected by the different deployed CEPD modules: if a specific pattern is detected by one CEPD module, then it will be detected by all the other CEPD modules. However, such behavior cannot be obtained in practice, and a more realistic data dissemination service can be characterized by the following properties:

• Reliability: given an event e, generated by some source s, e has to be delivered to all the CEPD modules interested in e;

• Ordering: given two modules CEPDi and CEPDj and two events e and e′ such that e happens before e′ according to the common global time of the sources, if e and e′ are delivered to both CEPDi and CEPDj, then e is delivered before e′ at both modules.

Each of these properties, if satisfied by the data dissemination service, determines whether a given detection policy can be used for a certain type of event pattern. Let us consider a data dissemination service that does not ensure the Reliability property. In this case, any event can be lost or delivered only to a subset of the interested CEPD modules; as a consequence, independently of the operators describing the pattern, only a BED policy, and not a GD policy, can be ensured. In other words, the Reliability property is a necessary condition for adopting a GD policy. The Ordering property is not needed to deterministically detect most patterns (i.e., it is not needed to adopt a GD policy): patterns based on logical, quantifier, counting, time interval, time within and spatial operators can be detected deterministically even if the data dissemination service guarantees Reliability but does not satisfy Ordering, since these operators consider just the set of occurred events and not the order in which those events took place. In contrast, in order to deterministically detect patterns involving the sequence operator, the Ordering property is mandatory; otherwise only a BED policy can be used. Needless to say, if both Reliability and Ordering are satisfied, the GD policy can be effectively used and all the event patterns can be deterministically detected. Besides Reliability and Ordering, a third fundamental property characterizing the data dissemination service is:

• Timeliness: there exists a time interval ∆ such that, given any event e delivered at a CEPD module at time t, e has been generated at a time t′ with t − ∆ ≤ t′ < t.

The Timeliness property does not directly impact the kinds of patterns that can be detected by the CEPD modules, nor the detection policies. However, the time needed by the data dissemination service to convey events to the CEPD modules has a strong influence on timed detection: if the application time bound is smaller than the value of ∆, no timed detection can be achieved. Note that, in the Timeliness property, the value of ∆ can also depend on the throughput that the data dissemination service has to sustain: the higher the throughput, the larger the value of ∆, and this, in turn, influences timed event pattern detection.
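As a small illustration of the Ordering property, a dissemination service can impose one deterministic linearization by sorting events by source timestamp and breaking ties by source identifier. This is only a sketch under the assumptions of Section 2 (loosely synchronized, uniquely identified sources): every CEPD module applying this comparator to the same delivered set obtains the same order, but, because of clock skew, that order approximates rather than equals real time [22].

    import java.util.Comparator;

    class TimestampedEvent {
        final long timestamp;  // coarse-grained (e.g., NTP) generation time
        final String sourceId; // assumed globally unique source identifier
        final byte[] payload;

        TimestampedEvent(long timestamp, String sourceId, byte[] payload) {
            this.timestamp = timestamp;
            this.sourceId = sourceId;
            this.payload = payload;
        }

        // One deterministic total order: by timestamp, ties broken by source.
        static final Comparator<TimestampedEvent> GLOBAL_ORDER =
                Comparator.comparingLong((TimestampedEvent e) -> e.timestamp)
                          .thenComparing(e -> e.sourceId);
    }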

4. TECHNOLOGIES FOR DATA DISSEMINATION

This section reviews a number of state-of-the-art data dissemination technologies and approaches that can be considered for supporting CEPD systems. We distinguish between message-passing technologies, shared-memory technologies, and a hybrid of the two.

4.1 Data Distribution Service (DDS)

The OMG’s Data Distribution Service for Real-time Systems (DDS) is an API specification and interoperability wire protocol that defines a data-centric publish-subscribe interaction paradigm [15]. DDS is based on a fully decentralized architecture, which provides an extremely rich set of configurable QoS policies that can be associated with topics. A publisher can declare the intent of generating data with an associated QoS and of writing the data to a topic. The DDS is then responsible for disseminating the data (in either a reliable or a best-effort fashion) in agreement with the declared QoS, which has to be compatible with the one defined by the topic. The DDS provides a set of QoS policies to control the timeliness properties of distributed data. Specifically, it defines the maximum inter-arrival time for data and the maximum amount of time that may elapse for the distribution of data from publishers to subscribers. With respect to the properties discussed in Section 3, DDS guarantees reliability and timeliness in the data dissemination. No total ordering reflecting real-time event generation is ensured for events originated from multiple and heterogeneous sources. In addition, its guaranteed QoS properties can be effectively applied only when DDS is deployed in a strictly controlled setting (i.e., in a managed environment); in a large scale, unreliable and unmanaged context, as collaborative event detection environments can be, the performance obtainable by DDS may become unpredictable [10], thus compromising the possibility to support high throughput CEPD systems.
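Because the concrete API differs across DDS implementations, the fragment below is a vendor-neutral sketch that models only QoS policies defined by the OMG specification (RELIABILITY, DEADLINE, LATENCY_BUDGET); the class and field names are hypothetical stand-ins for a vendor's generated types.

    // Hypothetical, vendor-neutral model of spec-defined DDS QoS policies;
    // real DDS APIs expose equivalent structures on topics, writers and readers.
    class DdsQosProfileSketch {
        enum ReliabilityKind { BEST_EFFORT, RELIABLE }

        ReliabilityKind reliability = ReliabilityKind.RELIABLE; // RELIABILITY
        long deadlinePeriodMs = 100; // DEADLINE: max inter-arrival time of data
        long latencyBudgetMs  = 50;  // LATENCY_BUDGET: desired max delivery delay

        // DDS matches endpoints on a request/offer basis: a reader's requested
        // policy must be no stronger than what the writer offers.
        boolean requestCompatibleWithOffer(DdsQosProfileSketch offered) {
            return offered.reliability.ordinal() >= this.reliability.ordinal()
                    && offered.deadlinePeriodMs <= this.deadlinePeriodMs;
        }
    }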

4.2 Java Message Service (JMS)

The Java Message Service (JMS) [24] is a standard promoted by Sun Microsystems to define a Java API for the implementation of message-oriented middleware. A JMS implementation represents a general-purpose message-oriented middleware (MOM) that acts as an intermediary between heterogeneous applications: the applications can choose the communication mode that best suits their specific needs, such as the pub/sub and point-to-point modes. JMS allows an application to require that every message be received once and only once, or to choose a more permissive (and generally more efficient) policy that permits messages to be dropped or duplicated. It supports various degrees of reliability through different basic and advanced mechanisms. Basic mechanisms include: message persistence, through which a JMS application can specify that messages are persistent; message priority levels, through which an application can mark urgent messages; and, finally, message expiration, through which an application can set a message expiration time in order to prevent stale messages from being delivered. The most advanced mechanism consists in the creation of durable subscriptions, which allow subscribers that are idle to receive messages as soon as they come back on-line. Other features common in MOM products, like load balancing, resource usage control, and timeliness of messages, are not explicitly addressed by the JMS specification. With respect to the properties discussed in Section 3, JMS guarantees reliability through different mechanisms, including message persistence. However, as stated earlier, no timeliness or total order reflecting real-time event generation is provided. Finally, it is worth noting that JMS is typically deployed through a central server that implements all the MOM functionality. This solution can thus suffer from the inherent drawbacks of a centralized system. The central server can become a single point of failure or a security vulnerability: if the server crashes or is compromised by a security attack, the data dissemination process can be jeopardized. In addition, the volume of events the central server can disseminate per time unit is limited by the server’s processing and bandwidth capacities, which prevents the system from being sufficiently scalable to support high throughput CEPD systems.
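For concreteness, the reliability mechanisms listed above map directly onto standard javax.jms calls, as in the following sketch (the factory lookup, destination name and subscription name are deployment-specific assumptions):

    import javax.jms.*;

    public class JmsReliabilitySketch {
        public static void main(String[] args) throws JMSException {
            ConnectionFactory factory = lookupFactory(); // typically via JNDI
            Connection connection = factory.createConnection();
            Session session =
                    connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Topic topic = session.createTopic("basicEvents"); // hypothetical name

            MessageProducer producer = session.createProducer(topic);
            producer.setDeliveryMode(DeliveryMode.PERSISTENT); // persistence
            producer.setPriority(9);                           // urgent (0-9)
            producer.setTimeToLive(60000);                     // expire in 1 min

            // Durable subscription: an idle subscriber receives, on
            // reconnection, the messages published while it was off-line.
            TopicSubscriber subscriber =
                    session.createDurableSubscriber(topic, "cepd-module-1");

            connection.start();
            producer.send(session.createTextMessage("basic event payload"));
            connection.close();
        }

        private static ConnectionFactory lookupFactory() {
            return null; // JNDI lookup omitted in this sketch
        }
    }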

4.3 ZooKeeper

ZooKeeper [1] is a distributed, open-source coordination service for distributed applications that allows the implementation of higher-level services like synchronization, configuration maintenance, leader election, group membership, event notification, locking, priority queue mechanisms, etc. ZooKeeper is made up of replicated servers offering a shared hierarchical name space modeled as a file system. Each server belonging to the ZooKeeper service must know all the others, and maintains an in-memory image of the state, along with transaction logs and snapshots in a persistent store. As long as a majority of the servers are up, the service is available. Clients interact with the ZooKeeper service by establishing TCP connections; in case of connection failure, the client (re-)connects to a different server. ZooKeeper ensures ordering using timestamps: each update is labeled with a number that reflects the order of all ZooKeeper transactions, and subsequent operations can use this order to implement higher-level abstractions, such as synchronization primitives. Note that this type of ordering ensures sequential consistency of the updates (i.e., updates from a client are applied in the order in which they were sent). Other interesting properties of ZooKeeper are: atomicity (updates either succeed or fail), single system image (a client sees the same view of the service regardless of the server it connects to), persistence (once an update has been applied, it persists from that time forward until a client overwrites it) and timeliness (the clients’ view of the system is guaranteed to be up-to-date within a certain time bound). With respect to the properties of a good data dissemination service, ZooKeeper ensures reliability (thanks to persistence) and timeliness, but not a total order reflecting the real-time generation of events: ZooKeeper ensures that a total order among all the events exists, but this order is just one of the possible linearizations of the event partial order. Finally, being a memory-based system implies that the amount of data is limited to what fits in memory, which makes this solution unfeasible in a context characterized by high event throughput.
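The transaction ordering described above can be made visible with sequential znodes: the counter that ZooKeeper appends to each name encodes the total order it assigned to the update. A minimal sketch follows (the connection string and paths are assumptions, and the parent znode /events is assumed to exist):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkOrderingSketch {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 3000, null);

            // PERSISTENT_SEQUENTIAL appends a monotonically increasing counter,
            // so the returned path reflects the order ZooKeeper assigned,
            // e.g. /events/evt-0000000042.
            String path = zk.create("/events/evt-", "payload".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
            System.out.println("event stored at " + path);
            zk.close();
        }
    }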

4.4 Bulletin Board (BB)

Bulletin Board [14] is a peer-to-peer topic-based shared memory service supporting write-subscribe communication semantics. Although BB is fundamentally a memory, data is pushed out through notifications rather than pulled out through reads; unlike pub/sub, there is no requirement for every subscriber to be notified of each individual update. The BB service supports a failure detection functionality through which the subscribers of each topic can learn of writers joining and leaving the topic (either voluntarily or due to failure). In addition, it guarantees two main properties: safety, for which (1) each update notification has a corresponding write occurring at an earlier point in the execution, and (2) the per-writer FIFO order is preserved; and liveness, which mandates that under stable conditions (i.e., in the absence of process and communication failures), for each update U to a topic T, each subscriber of T is notified of either U or an update U′ that supersedes U. With regard to the data dissemination properties, BB guarantees reliability. A FIFO order per single source is ensured; however, this order is weaker than the one required by our data dissemination model. In addition, to the best of our knowledge, no specific timeliness assessment is currently available. Note that the provided write-subscribe semantics limits BB’s applicability to a specific context: it can be effectively used for disseminating management information among distributed controllers, in order to support control loops involving agents and application containers running on managed machines. Hence, it cannot be recommended for a high throughput CEPD system. Finally, BB is fully integrated in IBM WebSphere Virtual Enterprise (WVE); no standalone version is currently available on the market.

4.5 Multicast Technologies

In the nineties there was a large body of research on multicast platforms for building reliable and consistent distributed systems (e.g., [12]). This research produced interesting add-ons to commercial products (e.g., [6]) that focus on keeping a certain number of replicas consistent. One of the main consistency models introduced for such platforms is Virtual Synchrony, on top of which several different types of multicast primitives have been proposed for ordered and reliable multicast diffusion (causal multicast, total order multicast, etc.) [13]. Data dissemination could be implemented using such platforms, which would deliver events to CEPD modules in a consistent and ordered way. However, it is well known that these platforms only work for small numbers of groups of small size: in most implementations, performance does not scale in terms of number of groups, group size or multicast rate, and the platforms can show instability, as detailed in Section 5. This lack of scalability along several dimensions prevents the use of such technology in a collaborative large scale environment. Currently, there is an interesting line of research that aims at providing scalable multicast platforms in the context of cloud computing; this research could reconcile some degree of multicast consistency with the need to sustain high throughput (e.g., [28]).
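For instance, with JGroups [6], group-based dissemination reduces to a few calls. The sketch below uses the default protocol stack; obtaining total order would additionally require a sequencer-based protocol in the stack configuration, which we leave as an assumption (the group name and payload are illustrative):

    import org.jgroups.JChannel;
    import org.jgroups.Message;
    import org.jgroups.ReceiverAdapter;

    public class MulticastSketch {
        public static void main(String[] args) throws Exception {
            JChannel channel = new JChannel(); // default protocol stack
            channel.setReceiver(new ReceiverAdapter() {
                @Override
                public void receive(Message msg) {
                    // Hand the received event to the local CEPD module.
                    System.out.println("event: " + new String(msg.getBuffer()));
                }
            });
            channel.connect("cepd-events");       // join the group
            channel.send(new Message(null, null,  // null dest = whole group
                    "basic event".getBytes()));
            channel.close();
        }
    }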

4.6 Gossip Technology

The recent shift from small/mid-scale distributed systems deployed in very controlled environments to large or huge-scale systems, geographically distributed over the world, where processes interact using unreliable links that traverse several independent administrative domains, has shown the limits of traditional deterministic approaches to information dissemination. This shift led to the design of novel data dissemination algorithms based on the gossip paradigm. These algorithms follow the so-called epidemic approach, where data is disseminated like the spread of a contagious disease or the diffusion of a rumor. This approach has several advantages that have been thoroughly studied: a few initial infection points are sufficient to quickly infect the whole population, as the number of infected processes grows with an exponential trend. Moreover, these algorithms are strongly resilient to the premature departure of several processes, making them very robust against failures. The gossip approach has been successfully applied to a variety of application domains like database replication [18], cooperative attack detection [30], resource monitoring [27], and publish/subscribe based data dissemination [16]. Taking into account the properties of an ideal data dissemination service, most algorithms based on the gossip paradigm are able to deliver huge amounts of events in a geographically distributed setting with good reliability properties. Thanks to the quick spread of “infections”, the time figures are also very interesting. However, such properties can be guaranteed only on a probabilistic basis, thus allowing only best-effort policies. Moreover, gossip-based algorithms usually do not provide total order services.
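The epidemic approach can be captured in a few lines. The following self-contained toy sketch implements push gossip: each process forwards a newly seen event to a fixed number of randomly chosen peers (the fanout), so an infection started at a few points reaches the whole population quickly; the fanout value and peer wiring are illustrative, not taken from any cited system.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Random;
    import java.util.Set;

    public class GossipSketch {
        static final int FANOUT = 3;

        final List<GossipSketch> peers = new ArrayList<>();
        final Set<String> seen = new HashSet<>();
        final Random random = new Random();

        void deliver(String eventId) {
            if (!seen.add(eventId)) return; // already infected: stop forwarding
            // ...hand the event to the local CEPD module here...
            List<GossipSketch> targets = new ArrayList<>(peers);
            Collections.shuffle(targets, random);
            for (GossipSketch peer :
                    targets.subList(0, Math.min(FANOUT, targets.size()))) {
                peer.deliver(eventId); // in a real system: an async network send
            }
        }
    }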

5. OPEN ISSUES

Reliability vs system stability. In order to ensure reliable event dissemination, a typical approach is to impose a (strong or weak) feedback loop between sources and receivers. However, it is well known that when a system has to sustain high throughput applications, synchronization points are a source of system instability [17]. Examples of system oscillations have been reported in air traffic control systems, the Amazon platform, and IP multicast [11]. Decoupling the system, using asynchronous communication whenever conceivable, and reducing synchronization points to as few as possible are key requirements to effectively support high load and scalable applications [29]. As a consequence, data dissemination services should deal with events that can be either lost or received only by a subset of the target CEPD receivers. This implies that the GD policy introduced above cannot be ensured; one has to rely instead on a probabilistic stability policy. Gossiping techniques can be used to retrofit reliability into the data dissemination service. In addition, in order to maximize the probability of identifying a specific event pattern, several CEPD modules detecting the same pattern and disseminating the result of the detection can be deployed.

Timeliness vs Priority and Deadlines. To guarantee the timeliness of some activity in a system, a time-related attribute called priority has traditionally been used. Prioritizing tasks is a way to maximize the number of tasks that complete within their deadlines, and this prioritization policy becomes more important when the system has to sustain high throughput applications. Current data dissemination services provide only marginal support for prioritizing the forwarding of some events towards the destinations with respect to other events. In particular, if we consider an event e that belongs to some event pattern that has to occur within some time interval, the routing of e from source to destination should be prioritized with respect to events that either belong to no event pattern or belong to patterns without time constraints. An interesting challenge would be to use the temporal event patterns defined within the CEPD modules to assign priorities to events for routing inside the data dissemination service; a possible realization is sketched below.
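One way to realize this idea is to order the dissemination service's outgoing buffer by pattern deadline, serving time-constrained events first. The sketch below is purely illustrative; in particular, the assumption that the CEPD modules feed back a deadline per event is ours, not a feature of any surveyed system.

    import java.util.Comparator;
    import java.util.concurrent.PriorityBlockingQueue;

    class PrioritizedForwarder {
        static class OutEvent {
            // Earliest deadline of any time-constrained pattern the event
            // belongs to (assumed supplied by the CEPD modules); events in
            // no such pattern get Long.MAX_VALUE and are forwarded last.
            final long patternDeadlineMs;
            final byte[] payload;
            OutEvent(long deadline, byte[] payload) {
                this.patternDeadlineMs = deadline;
                this.payload = payload;
            }
        }

        private final PriorityBlockingQueue<OutEvent> outbox =
                new PriorityBlockingQueue<>(64,
                        Comparator.comparingLong((OutEvent e) -> e.patternDeadlineMs));

        void enqueue(OutEvent e) { outbox.put(e); }

        OutEvent nextToForward() throws InterruptedException {
            return outbox.take(); // earliest-deadline-first forwarding
        }
    }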

Privacy. Processing and disseminating large amounts of data coming from multiple collaborative sites, made easier by the broader use of programming models such as MapReduce [21] and technologies like cloud computing [4], poses further security and privacy challenges that cannot be ignored. If we consider the financial context mentioned in the introduction of this paper, it is likely that sensitive data of financial institutions’ customers, which must be properly protected from malicious users, flow from data sources through data dissemination services in order to reach predefined CEPD destination systems. Most platforms for large scale processing, and all the data dissemination specifications and paradigms analyzed in this paper, do not embody specific mechanisms for managing data privacy, confidentiality and integrity. There exist specific vendor implementations of those specifications that include security mechanisms [7]; however, we believe that privacy enforcement is still an open issue that needs to be addressed for both data dissemination services and processing systems in order to foster a wider use of collaborative environments.

Acknowledgements This work is partially supported by the EU project COMIFIN.

6. REFERENCES

[1] ZooKeeper documentation. http://hadoop.apache.org/zookeeper/docs/r3.2.1/zookeeperOver.html.
[2] Where Complex Event Processing meets Open Source: Esper and NEsper. http://esper.codehaus.org/, 2009.
[3] FBI investigates 9 million ATM scam. http://www.myfoxny.com/dpp/news/090202_FBI_Investigates_9_Million_ATM_Scam, 2010.
[4] Google App Engine. http://code.google.com/appengine/, 2010.
[5] JBoss Drools Fusion. http://www.jboss.org/drools/drools-fusion.html, 2010.
[6] JGroups. http://www.jgroups.org/, 2010.
[7] Real time messaging and integration middleware. http://www.rti.com/, 2010.
[8] Update: Credit card firm hit by DDoS attack. http://www.computerworld.com/securitytopics/security/story/0,10801,96099,00.html, 2010.
[9] M. Balazinska, H. Balakrishnan, and M. Stonebraker. Contract-based load management in federated distributed systems. In NSDI, pages 197–210, 2004.
[10] R. Baldoni, L. Querzoni, and S. Scipioni. Event-based data dissemination on inter-administrative domains: Is it viable? In FTDCS 2008, pages 44–50, Washington, DC, USA, 2008. IEEE Computer Society.
[11] K. Birman. Rethinking multicast for massive-scale platforms. In ICDCS, page 1, 2009.
[12] K. P. Birman. The process group approach to reliable distributed computing. Commun. ACM, 36(12):36–53, 103, 1993.
[13] K. P. Birman and T. A. Joseph. Exploiting virtual synchrony in distributed systems. In SOSP, pages 123–138, 1987.
[14] V. Bortnikov, G. V. Chockler, A. Roytman, and M. Spreitzer. Bulletin Board: a scalable and robust eventually consistent shared memory over a peer-to-peer overlay. In ACM LADIS 2009, 2009.
[15] A. Corsaro, L. Querzoni, S. Scipioni, S. T. Piergiovanni, and A. Virgillito. Quality of service in publish/subscribe middleware. In R. Baldoni and G. Cortese, editors, Global Data Management. IOS Press, 2006.
[16] P. Costa and G. P. Picco. Semi-probabilistic content-based publish-subscribe. In Proceedings of the 25th IEEE International Conference on Distributed Computing Systems, pages 575–585, Washington, DC, USA, 2005. IEEE Computer Society.
[17] G. DeCandia et al. Dynamo: Amazon's highly available key-value store. In SOSP, pages 205–220, 2007.
[18] A. Demers et al. Epidemic algorithms for replicated database maintenance. In PODC '87: Proceedings of the Sixth Annual ACM Symposium on Principles of Distributed Computing, pages 1–12, New York, NY, USA, 1987. ACM.
[19] O. Etzion. Event processing architecture and patterns. DEBS Tutorial, July 2008.
[20] Y. Huang, N. Feamster, A. Lakhina, and J. J. Xu. Diagnosing network disruptions with network-wide analysis. In SIGMETRICS '07, San Diego, California, USA, 12–16 June 2007.
[21] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107–113, 2008.
[22] C. Liebig, M. Cilia, and A. Buchmann. Event composition in time-dependent distributed systems. In Proceedings of the Fourth IECIS International Conference on Cooperative Information Systems, page 70, Washington, DC, USA, 1999. IEEE Computer Society.
[23] G. Lodi et al. Defending financial infrastructures through early warning systems: the intelligence cloud approach. In Proceedings of the 5th Annual Workshop on Cyber Security and Information Intelligence Research, pages 1–4, New York, NY, USA, 2009. ACM.
[24] Sun Microsystems. Java Message Service (JMS). http://java.sun.com/products/jms/, 2008.
[25] NESSI. NESSI Strategic Agenda, 2009.
[26] P. R. Pietzuch. Hermes: A Scalable Event-Based Middleware. Ph.D. thesis, University of Cambridge, 2004.
[27] R. van Renesse, K. P. Birman, and W. Vogels. Astrolabe: a robust and scalable technology for distributed system monitoring, management, and data mining. ACM Trans. Comput. Syst., 21(2):164–206, 2003.
[28] Y. Vigfusson et al. Dr. Multicast: Rx for data center communication scalability. In EuroSys, 2010.
[29] W. Vogels. Eventually consistent. Commun. ACM, 52(1):40–44, 2009.
[30] G. Zhang and M. Parashar. Cooperative detection and protection against network attacks using decentralized information sharing. Cluster Computing, 13(1):67–86, March 2010.
[31] C. V. Zhou, C. Leckie, and S. Karunasekera. A survey of coordinated attacks and collaborative intrusion detection. Computers & Security, 29(1):124–140, 2010.