Efficient Publish/Subscribe through a Self-Organizing Broker Overlay ...

11 downloads 2415 Views 758KB Size Report
Email: {baldoni,beraldi,querzoni,virgi}@dis.uniroma1.it. Recently many scalable and ..... Zone of interest and list of matched events for brokers in the example of ...
c The Author 2005. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved.

For Permissions, please email: [email protected] doi:10.1093/comjnl/bxh000

Efficient Publish/Subscribe through a Self-Organizing Broker Overlay and its Application to SIENA ROBERTO BALDONI 1 , ROBERTO B ERALDI 1 , L EONARDO Q UERZONI 1 A NTONINO V IRGILLITO12 1

AND

Dipartimento di Informatica e Sistemistica, Universit`a di Roma “La Sapienza”, Via Salaria 113, 00198 Roma, Italia 2 Istituto Nazionale di Statistica, Via Cesare Balbo 16, 00184 Roma, Italia Email: {baldoni,beraldi,querzoni,virgi}@dis.uniroma1.it

Recently many scalable and efficient solutions for event dissemination in publish/subscribe (pub/sub) systems have appeared in the literature. This dissemination is usually done over an overlay network of brokers and its cost can be measured as the number of messages sent over the overlay to allow the event to reach all intended subscribers. Efficient solutions to this problem are often obtained through smart dissemination algorithms that avoid flooding events on the overlay. In this paper we propose a complementary approach, that is obtaining efficient event dissemination by reorganizing the overlay network topology. More specifically, this reorganization is done through a self-organizing algorithm executed by brokers whose aim is to directly connect, through overlay links, pairs of brokers matching same events. In this way, on average, the number of brokers involved in an event dissemination decreases, thus, reducing its cost. Even though the paradigm of the self-organizing algorithm is general and then applicable to any overlay-based pub/sub system, its concrete implementation depends on the specific system. As a consequence we studied the effect of the introduction of the self-organizing algorithm in the context of a specific system implementing a tree-based routing strategy, namely SIENA, showing the actual performance benefits through an extensive simulation study. In particular performance results point out the capacity of the algorithm to converge to an overlay topology accommodating efficient event dissemination with respect to a specific scenario. Moreover, the algorithm shows a significant capacity to adapt the overlay network topology to continuously changing scenarios while keeping an efficient behavior with respect to event dissemination. Keywords: Publish/Subscribe systems, overlay networks, self-organization, scalability, distributed systems Received 00 Month 2004; revised 00 Month 2004; accepted

1.

INTRODUCTION

In the last years, a growing attention has been paid to the publish/subscribe (pub/sub) communication paradigm as a mean for disseminating information (also called events) through distributed systems on wide-area networks [1]. Participants to the communication can act as publishers, that submit information to the system, and as subscribers, that express their interest in specific types of information. Main characteristics of such many-to-many communication paradigm are [2]: the interacting parties do not need to know each other (decoupling in space), partners do not need to be up at the same time (decoupling in time), and send/receive operations do not block participants (decoupling in flow). So, the publish/subscribe paradigm has been largely recognized as the most promising

application-level communication paradigm for integration of information systems. Pub/sub systems can be classified in two main classes: topic-based systems (e.g. TIB/Rendezvous [3], SCRIBE [4] and BAYEUX [5]) and content-based systems (e.g. SIENA [6], JEDI [7], HERMES [8], Gryphon [9] and REBECA [10]). In a topic-based system, processes exchange information through a set of predefined subjects (topics) which represent many-to-many distinct (and fixed) logical channels. Content-based systems are more flexible as subscriptions are related to specific information content and, therefore, each combination of information items can actually be seen as a single dynamic logical channel. This makes the usage of methods for efficient event dissemination based on multicast trees unfeasible due to the huge number of trees to be built and maintained.

T HE C OMPUTER J OURNAL VOL . 00 N O . 0, 2006

2

R. BALDONI et al.

Topic-based and content-based pub/sub systems are mainly realized considering an overlay network (usually in the form of a static network of application-level event routers called event brokers), built over a physical one such as the Internet, where each publisher or subscriber is connected to any broker of the network. In this case events are diffused from publishers to subscribers using a diffusion algorithm over this overlay network ([6, 7, 11, 12, 9, 8, 3]). The main issue with pub/sub system based on an overlay network of brokers is how to devise smart diffusion algorithms that avoid any trivial approach based on event flooding. SIENA and TIB/Rendezvous, for example, employ a specific mechanism that filters out an event from that parts of the overlay network through which no interested subscribers are reachable. This mechanism reduces the cost of the diffusion of an event e, that can be measured as the number of overlay hops necessary to reach all the interested subscribers. Lowering this cost is a relevant objective [6, 11, 13] because it allows to reduce the overall processing load at each broker, due to event matching against subscriptions and event forwarding, thus enhancing the scalability of the whole system. [6] and [11] point out that the filtering capabilities of an event dissemination algorithm perform at their best when the distributions of subscriptions and events present a certain degree of regionalism, that is, brokers matching same events are placed within a limited number of overlay hops. On the contrary, when brokers matching the same events are far from each other in terms of overlay hops, the performance gain obtained using a smart event dissemination algorithm, with respect to an algorithm that simply floods the network with events, becomes negligible [14]. Taking into account the previous consideration, in this paper we propose a synergistic and complementary approach to help an event dissemination algorithm to reduce the cost of such a diffusion by dynamically reorganizing the overlay network topology. More specifically, this reorganization is done through a self-organizing algorithm, executed by brokers, whose basic principle is to cluster brokers sharing similar interests within a limited number of overlay hops. This self-organization algorithm does its best to reach an overlay topology that accommodates the previous principle by looking at the most recent events generated and matched by brokers. In order to design such a self-organizing algorithm, several points must be addressed: • define a measure of the similarity of interests between two brokers; • discuss how to execute a convenient reorganization of the overlay topology exploiting only information local at each broker; • find a balance between optimizing overlay hops and network-level metrics, like latency or bandwidth, in order that a reconfiguration of the overlay network does not spoil an initial overlay topology favorable with respect to network-level metrics.

While the measure of similarity of interests is a general notion defined in Section 2, the self-organizing algorithm and its relation with the network-level metrics is strictly dependent on the specific event routing strategy employed by the pub/sub system. Therefore, Section 3 presents a self-organizing algorithm specifically tailored to the treebased event routing strategy employed within many pub/sub systems, like SIENA [6], Rebeca [10], Jedi [7] and REDS [15]. Section 4 reports an extensive simulation study. Experimental results show (i) the ability of the algorithm to converge to a stable overlay topology (i.e. algorithm convergence), (ii) its ability to adapt to continuously changing scenarios while keeping an efficient behavior with respect to event dissemination (i.e. algorithm stability), (iii) the performance gain obtainable in various scenarios with respect to a plain event dissemination algorithm and (iv) how the selforganization algorithm can effectively cope with problems related to network-level metrics like latency or bandwidth. Section 5 compares our work with both different methods used for efficient event dissemination and other related works. Finally, section 6 concludes the paper. 2.

A SELF-ORGANIZING OVERLAY NETWORK

A pub/sub system is a mechanism used to efficiently diffuse events issued by publishers to a set of subscribers that can dynamically change. An event is a piece of information made up of a structured set of pairs (attribute, value), plus an optional payload. Each subscription selects, by means of constraints expressed on attributes, a subset of events. We say that an event matches a subscription if the event attribute values satisfy the constraints defined by the subscription. A pub/sub system is structured as a set of event brokers, {B1 , . . . , BN }. Each broker accepts connections from clients that issue events and subscriptions3 . Brokers are interconnected through transport-level links which form an overlay network, that can be abstracted as a generic graph, where brokers represent vertices and transport level links (also called overlay links) represent undirected edges. In particular each broker Bi maintains a set of open connections with other brokers (its neighbors) and the overlay link connecting broker Bi with broker Bj is denoted as li,j . As examples of pub/sub systems based on overlay network of brokers, JEDI [7], SIENA [6] and Rebeca [11] are based on acyclic topologies; Kyra [12] exploits multiple cliques that are interconnected among each other. In the following we refer exclusively to the former class of systems (Tree-based event routing systems). We assume that the tasks related to overlay topology maintenance (such as insertion of new brokers or topology repairing after broker failures) are carried out by a human operator, as our focus is on exploiting self-organization techniques for enhancing performance rather than dependability. Moreover, using a non-automatic management for handling 3 Brokers completely act on behalf of their clients, so clients will not be considered in the following for simplicity.

T HE C OMPUTER J OURNAL VOL . 00 N O . 0, 2006

E FFICIENT P UBLISH /S UBSCRIBE THROUGH A S ELF -O RGANIZING B ROKER OVERLAY joins/departures/crashes of brokers is a common choice in most of the above-mentioned systems [6, 16, 11, 8] as topology changes due to such events are expected to be rare. Nevertheless, it is important to note that the resilience to node faults of broker overlays based on acyclic graphs can be increased employing self-repair techniques like those introduced in [17]. A publisher can inject events in the system through any broker of the overlay. Each subscription issued by a subscriber through a subscribe operation is stored into a broker, and it can be later removed by the same subscriber through an unsubscribe operation. To diffuse an event to the set of intended subscribers, a pub/sub system employs a diffusion mechanism running over the overlay network. This mechanism is able to route events through the overlay from the source broker to those hosting subscription matched by the same event (target brokers). A broker involved in the diffusion of an event e can be either a target broker of e or a broker that acts as a router of the event, even though it has no subscription matching e. In the following we refer to the latter type of broker as a pure forwarder for e. Ideally, in order to lower the cost due to the event dissemination, one broker should only process events that match the subscriptions it hosts. Therefore, to reach this ideal goal, the event dissemination should not involve pure forwarders. Let us remark that in many of the pub/sub systems the overlay network is static, and this makes it impossible to exclude pure forwarders from each event dissemination. A necessary condition to exclude pure forwarder from an event dissemination is that the overlay can be dynamically changed. We propose a pragmatic approach to reorganize the overlay network, based on a simple principle: we try to create clusters of brokers that supposedly will be target for the same events in the near future. In this situation, an event is expected to follow a single path toward the cluster rather than being diffused in different directions towards dispersed target brokers. As an example Figure 1(a) depicts the diffusion of an event e, published by P1 , toward two interested subscribers S1 and S2 ; the cost of this diffusion is six overlay hops in the best case (black arrows). If B9 and B5 were directly connected, the cost of the diffusion could be lowered to as much as three overlay hops (see Figure 1(b)). To put the previous principle in practice, the following problems have to be addressed: (i) determining a method to compute the similarity of interests between brokers (ii) implementing the reorganization exploiting only local knowledge at each broker and (iii) finding a balance between overlay hops and network-level metrics, like latency or bandwidth, so that a reorganization of the overlay network topology would not disrupt a favorable overlay topology w.r.t. network-level metrics. The final result must then be analyzed to evaluate the trade-off between the gain obtained on event diffusion cost and the overhead generated by the self-organization algorithm. In the rest of this section we discuss point (i). Solutions to the last two problems are intimately related to the event

3

B7

B1

B4

B3 B6

B8

B2 B9

Pure forwarders

B5

publish(e)

notify(e) notify(e)

P1

S2

S1

(a) B7

B1

B4

B3 B6

B8

Pure forwarders

B2

B5

B9 publish(e)

notify(e) P1

notify(e)

S2

S1

(b)

FIGURE 1. Diffusion of an event on two similar overlay networks. Black arrows show messages used by an hypothetical optimal (e.g. with minimum event dissemination cost) event dissemination algorithm to notify an event e injected on broker B2 to interested subscribers connected to brokers B9 and B5 . Note how the presence of an overlay link connecting B9 to B5 in the second network greatly reduces the number of messages generated.

dissemination mechanism of the specific pub/sub system. Therefore we provide these solutions in the context of broker-based pub/sub systems relying on tree-based event routing in Section 3. Measuring the similarity of interests between two brokers. Let mi be a list containing (attribute, value) pairs4 of the last Qi events matched by the broker Bi at a given time. We define the similarity of interests between Bi and Bj as the ratio

ai,j =

|{e ∈ mi : ∃s ∈ SBj , e matches s}| . Qi

where SBj is the set of subscriptions stored at broker Bj . ai,j is an estimation of the probability that a new event matched by one of the two brokers will also be matched by 4 m does not contain event payloads that are transparent to the matching i mechanism

T HE C OMPUTER J OURNAL VOL . 00 N O . 0, 2006

4

R. BALDONI et al.

the other one. When ai,j is close to one then almost all the last Qi events were matched by both Bi and Bj . By considering an obvious “locality” principle, we argue that further common matches can happen with a high probability in the future 5 . If ai,j is zero then either subscriptions hosted by Bi and Bj are disjoined or none of the last matched events by Bi have also been matched by Bj and vice versa. In other words, let us consider the case that Bi and Bj are connected through a direct overlay link: if ai,j tends to one, then the probability that Bi acts as a pure forwarder for an event e matched by Bj tends to zero (and vice versa). Note that since ai,j is related to the actual number of common matched past events, this number dynamically adapts to the changes in the distributions of events and subscriptions, i.e. no a-priori knowledge is required about them. Given the previous definition each link li,j of the overlay network can then be labeled with a weight w(li,j ) representing the similarity of interests between the brokers connected through it, i.e. w(li,j ) = ai,j . 3.

B2 0

5 The principle holds as long as subscriptions can be considered longlived and the distribution of events in the event space changes slowly w.r.t. the event injection ratio. Both these assumptions are in fact true for a wide range of different publish/subscribe applications.

B6

DREQ(m 9 , HS(B 9 ,B 1 ))

DREQ(m 9 , HS(B 9 ,B4 ))

0.25

B4

DREQ(m 9 , HS(B 9 ,B 8 ))

0

DREQ(m 9 , HS(B 9 ,B6 ))

B7 B3

DREQ(m 9 , HS(B 9 ,B 5 ))

0

B1

B8

DREQ(m 9 , HS(B 9 ,B7 ))

0.25

DREQ(m 9 , HS(B 9 ,B 3 ))

0.25

0.25 B5

B9

(a)

B2

0.25 0

B6

DREP(0.25, 9, HS(B 9 ,B 1 ))

B1

B8

DREP(0.75, 9, HS(B 9 ,B5 ))

0.25

B4

DREP(0.75, 9, HS(B 9 ,B5 ))

DREP(0.75, 9, HS(B 9 ,B5 ))

0.25

SELF-ORGANIZING ALGORITHM

In this section we present a self-organization algorithm specifically tailored to SIENA, a popular content-based pub/sub system introduced in [6]. However, our algorithm is more general and can be applied to all pub/sub systems relying on a tree-based event routing strategy, such as Rebeca [10], Jedi [7] and REDS [15]. SIENA [18] is a content-based pub/sub system which realizes event dissemination through a Content-Based Routing (CBR) algorithm over a tree-shaped overlay network of brokers. Subscriptions and events are defined over a fixed event schema, constituted by a set of n attributes each characterized by a name and a type, where the type can be a common basic type such as integer, real or string. An event is therefore a set of n values, one for each attribute whose type is consistent with that attribute’s type. If all values are defined for every attribute, an event can be considered as a point in the n-dimensional event space. In the sequel we abstract the notion of subscription as a subset of events inside the event schema, defined by using a set of constraints over the attributes. In SIENA, subscriptions issued by each subscriber on a broker are inserted in the broker’s subscription table. We define the zone of interest of a broker Bi , denoted ZBi , as the union of all subscriptions contained in Bi ’s subscription table. Each Broker Bi is connected to a maximum of Fi neighbor brokers through overlay links (in the rest of the paper, for the sake of simplicity, we will consider this value equal for all brokers, i.e. Fi = F ). The routing table of Bi contains an entry for each of these links and each entry provides information about the subscriptions hosted

0.25

0

DREP(0.75, 9, HS(B 9 ,B5 ))

B7 B3

DREP(0.75, 9, HS(B9 ,B 5 ))

0

DREP(0.75, 9, HS(B 9 ,B 5 ))

0.25

0.25 B9

B5

(b)

FIGURE 2. Broker discovery phase. The first picture shows the propagation of DREQ messages started at B9 . The second one shows corresponding DREP messages while they converge toward B9 . Labels on links represent their weight.

on brokers reachable through the corresponding link. More specifically a covered zone Zli,j is associated to each link li,j and it is defined as the union of all the zone of interests belonging to brokers that lies beyond li,j . Routing tables are kept up-to-date by SIENA using a smart update mechanism whose details are here omitted (see [6] for such details). Event dissemination in SIENA is realized through a simple mechanism: each broker Bi receiving an event e through a link li,j , (i) matches e against subscriptions contained in its subscription table and then (ii) forwards e through all the links li,k (with k 6= j) such that e matches Zli,k . This behavior ensures that e is forwarded only through those links which can lead to interested subscribers. Note that a broker forwarding e to some neighbor in (ii) without any match in (i) is actually acting as a pure forwarder for e. 3.1.

Algorithm Details

Let us consider two brokers Bi and B` that are not directly connected by an overlay link. The goal of the selforganization algorithm is to connect directly Bi to B` only if there is an overlay link lp,q in the path from Bi to B` such that ai,` is larger than the similarity of interests of the brokers connected by lp,q , i.e. ap,q . Moreover to keep the topology acyclic as required by SIENA’s CBR algorithm, lp,q must be

T HE C OMPUTER J OURNAL VOL . 00 N O . 0, 2006

E FFICIENT P UBLISH /S UBSCRIBE THROUGH A S ELF -O RGANIZING B ROKER OVERLAY Broker B1 B2 B3 B4 B5 B6 B7 B8 B9

ZBi S1 , S10 S2 S3 , S 6 S4 , S 8 S1 , S7 , S10 , S11 S6 S7 S8 , S 6 S1 , S3 , S10 , S11

mi e1 , e10 e2 e3 , e6 e4 , e8 e1 , e7 , e10 , e11 e6 e7 e6 , e8 e1 , e3 , e10 , e11

TABLE 1. Zone of interest and list of matched events for brokers in the example of Figure 2(a)

teared down. The overall self-organization algorithm can be split in four phases: (i) (ii) (iii) (iv)

triggering of a broker discovery, broker discovery, tear-down link selection, overlay topology update.

For the sake of clarity, along the description of the various algorithm phases we refer to a running example over the network of brokers depicted in Figure 2(a). This figure represents a snapshot of a system at a given time whose state, i.e. zone of interest and the last four events matched (Q = 4) by each broker6 , is reported in Table 17 . In Figure 2(a) each link is labelled with its weight, i.e. the associativity between the brokers it connects; for example, the weight of link l3,9 is given by w(l3,9 ) = a3,9 = |m3 ∩ m9 |/Q = |{e3 , e6 } ∩ {e1 , e3 , e10 , e11 }|/Q = 1/4 = 0.25. In this example we suppose that an event ex matches only subscription Sx .

3.1.1. Triggering of a broker discovery The triggering phase, executed by a broker Bi , launches the execution of each run of the self-organization algorithm that could lead to an update of the overlay topology. The triggering occurs every δ events received by the broker; the value of δ is updated at the end of each run according to its outcome and following a backoff policy: if the overlay topology has changed, then the self-organization succeeded and δ is set to a base value, otherwise δ = 2δ. The choice of employing a backoff mechanism to limit the frequency of runs is justified by the fact that, if a selforganization run does not lead to a change of the overlay topology, it will take time before the conditions in the overlay network become favorable for a successful execution of a new self-organization run started by the same broker. Once a run of the self-organization algorithm has been launched, the following Activation Predicate (AP) is checked for each overlay link of Bi : 6 For simplicity, we consider in the following a same value Q of Q at i each broker. 7 For the sake of clarity we depict in figures a particular scenario in which each subscription Si is matched only by event ei .

5

AP : αi,j (mi ) > ai,j . where αi,j (mi ) is the number of events in mi , that match Zli,j divided by Q. The set ZBj , needed to calculate ai,j , is a subset of Zli,j and can be thus derived from it without the need any message exchange between Bi and Bj . If the predicate is false, then the run terminates immediately. Otherwise, for each link li,j such that the predicate holds, a broker discovery procedure is started. The rationale behind this action is that if AP holds for a link li,j , then there could be a broker behind Bj with a similarity of interests with Bi equal to αi,j (mi ), that is larger than ai,j . To avoid starting a discovery based on too little information, the list mi has to contain at least Q/2 events. At the end of the triggering phase, k independent broker discovery procedures are started, where k is the number of links for which AP is satisfied. For each activated procedure the broker B` with the highest similarity with Bi , located behind the link, will be returned, as we detail in the following. Considering the example in Figure 2(a), let us assume that the self-organization algorithm is triggered on broker B9 by the arrival of an event. B9 checks AP on its only link l9,3 and finds out that α9,3 (m9 ) = 1 is greater than a9,3 = 0.25; given the fact that AP is true on l9,3 a broker discovery phase is started on this link. 3.1.2. Broker Discovery The broker Bi starting the broker discovery on one of its links sends through it a request message8 DREQ containing mi and a hop sequence HS. A hop sequence HS represents the overlay network path joining two brokers Bx and By and is defined as a list of pairs (broker id, weight), {(Bx , 0), (Bx+1 , w(lx,x+1 )), . . . , (By , w(ly−1,y ))}, such that any two adjacent brokers in the list are connected via an overlay link. We denote such a path as HS(Bx , By ). Note that the weight in the first pair is always 0. The hop sequence HS contained in the DREQ message sent by Bi is initialized to {(Bi , 0)}. The size of DREQ messages grows with the size of mi and the length of HS. The forwarding of the DREQ message is driven by the list of events mi , as explained in the following. When a broker Bj receives DREQ on one of its links lk,j , it (i) computes its own associativity w.r.t. Bi , ai,j , (ii) updates HS by adding (Bj , w(lj,k )), i.e. HS(Bi , Bj ) = HS ∪ (Bj , w(lj,k )) and, finally, (iii) computes the following Forwarding Predicate (FP): FP : ∃lj,h 6= lj,k : αj,h (mi ) > ai,j . FP is based on the same idea as AP: when FP is true for a link lj,h there are some possibilities to locate another node behind that link with a similarity w.r.t. Bi larger than ai,j . In this case a copy of the DREQ message is sent on that 8 We consider messages exchanged by the algorithm sent through reliable (or perfect) point-to-point channels. An implementation example of such channels over fair-lossy links (e.g. TCP or UDP) can be found in [19]

T HE C OMPUTER J OURNAL VOL . 00 N O . 0, 2006

6

R. BALDONI et al.

link9 . Note that the evaluation of FP is completely based on information that is either locally maintained on Bj (i.e. Zlj,h and ZBj ) or contained in the DREQ message (i.e. mi ). If no link exists such that FP is satisfied, then a Discovery Reply message, DREP, is sent back to Bi along the path stored in HS. The DREP message contains ai,j , HS(Bi , Bj ) and alj , where alj is the number of overlay connections that can still be created on Bj before the limit F is reached. Referring to our example (Figure 2(a)), the DREQ message generated on B9 is forwarded through various brokers toward B1 and B5 which are the nodes with the largest similarity of interests with B9 . Note that in B3 the predicate FP is true for links l3,1 and l3,6 , and a copy of DREQ is sent through them both. On the other hand, in B6 FP is true only for link l6,8 and not for l6,2 as α6,2 (m9 ) = 0. For each lj,h satisfying FP, Bj forwards the DREQ and then waits for the corresponding reply message DREP(ai,n , aln , HS(Bi , Bn )), where Bn is the broker reachable through lj,h that can offer the highest similarity of interests w.r.t. Bi . As soon as Bj receives the reply from each link through which previously DREQ was forwarded, it computes the maximum among all the similarity values carried in the received replies and its own ai,j . Let ai,`0 be the maximum and B`0 be the corresponding broker; then Bj sends a DREP(ai,`0 , al`0 , HS(Bi , B`0 )) through lj,k toward Bi . In Figure 2(b) broker B3 , after gathering DREP messages coming from B1 and B5 through l3,1 and l3,6 respectively, determines that the broker with the largest similarity with B9 among itself, B1 and B5 is the latter; then it forwards toward B9 (through l3,9 ) the DREP message received from B5 . 3.1.3. Tear-Down Link Selection The aim of this phase is to select the link that must be teared down during the overlay topology update phase. The procedure is activated for each DREP message received by the broker Bi that started the self-organization run and returns a tear-down candidate link denoted as ltd . If no link exists that can be deleted, ltd = N U LL. A single selection works as follows: let DREP be the reply to the the DREQ message sent along link li,j . The reply contains the identifier B` of the broker behind li,j with the highest similarity with Bi , together with the path stored in HS(Bi , B` ). Let us denote the link that should be created between Bi and B` as lnew . Clearly, if al` = 0 ∧ ali = 0 the link lnew cannot be created, as both Bi and B` have no overlay connections still available, and thus ltd = N U LL. Otherwise, we have two cases: (i) ali > 0 ∧ al` > 0: in this case both Bi and B` have available overlay connections and thus they can establish the link lnew between them. In this case, ltd is the link with the minimum weight in HS(Bi , B` ). 9 This forwarding mechanism can lead, in some rare case, to a simple flooding of the DREQ message on the whole network. To limit the depth of the search conducted through DREQ messages it is possible to add in FP a simple constraint based on a TTL counter.

(ii) al` = 0, ali > 0 (resp. ali = 0, al` > 0): in this case ltd is the link that connects B`−1 to B` (resp. Bi to Bi+1 ), i.e. ltd = l`−1,` (resp. ltd = li,i+1 ); in this way the constraint F on the maximum number of overlay links remains satisfied for B` (resp. Bi ). The next phase of the algorithm (overlay topology update) takes place only if ltd 6= N U LL and w(lnew ) > w(ltd ). In other words a reconfiguration occurs only if lnew is expected to be traversed by a number of events that are matched on both brokers directly connected through it is larger than those which traverse ltd . Considering Figure 2(b), after DREP message reaches B9 through l9,3 , the algorithm selects B5 as the candidate to establish a link (i.e., lnew = l9,5 ) with B9 , and l4,7 as ltd . The reorganization of the overlay network will occur as w(l9,5 ) = a9,5 is greater than w(l4,7 ) (we suppose both ali and al` larger than 0). 3.1.4. Overlay topology update Let Bi and B` be the two brokers that must be connected by lnew and Bp and Bq be the brokers connected by ltd in the path stored in HS(Bi , B` ). To avoid network partitioning or the creation of cycles we must ensure that during the ltd tear down, other links in HS(Bi , B` ) are not teared down by concurrent runs of the self-organization algorithm. Therefore there is the need of a locking mechanism to ensure that only one tear down operation at a time can take place along the path from Bi to B` . A possible implementation of the locking mechanism is the following: Bi sends a LOCK message along the path towards B` . A generic broker B in that path executes the following locking algorithm: • when receiving a LOCK message through link l: (i) if l is not locked and B 6= B` , locks l and forwards the LOCK message to the next node towards B` ; (ii) if l is not locked and B = B` , locks l and sends an ACK message to the next node towards Bi ; (iii) if l is locked or B` is no more reachable through the path HS(Bi , B` ), sends a NACK towards Bi . • when receiving an ACK message: B forwards the ACK message to next node towards Bi ; • when receiving a NACK message: (i) if B 6= Bi removes the lock from the link and forwards the NACK message to next node towards Bi ; (ii) if B = Bi aborts the self-organization. Locks on links must be associated to the corresponding reorganization in order to correctly handle concurrent instances of the lock algorithm. As an example Figure 3(a) depicts an execution of the locking algorithm acting on the path from B9 to B5 assuming no concurrent instances. Numbers reported in the picture indicate the message order. Once the path is locked, Bi establishes the new overlay link with B` , then both Bi and B` send a CLOSE message to Bp and Bq respectively, along the locked path (no

T HE C OMPUTER J OURNAL VOL . 00 N O . 0, 2006

E FFICIENT P UBLISH /S UBSCRIBE THROUGH A S ELF -O RGANIZING B ROKER OVERLAY

B2

B8

3:LOCK

B4

9:ACK

10:ACK

B6

4:LOCK

5:LOCK 2:LOCK

8:ACK

11:ACK

B7 B3 7:ACK

12:ACK

B1

6:LOCK

1:LOCK

B5

B9

(a) 15:CLOSE

B2

B6

14:CLOSE

B8

18:UNLOCK

16:CLOSE

7

It is important to note that events being diffused inside the path identified by HS(Bi , B` ) can be lost and produce false negatives due to the abrupt change of routing tables entries performed during the path unlock. For example let us consider an event e waiting at B4 and to be notified at B5 (Figure 3.a). If the link l4,7 is teared down before e reaches B7 , it will never be notified at B5 . To avoid such problem, in-transit events should be buffered at each broker and rerouted according to the current value of the local routing table. This can lead to duplication of events that can be easily fixed locally at each broker. In the example of Figure 3.b, the event e, after the routing table update at B4 due to the self-organization, will be re-routed to B8 that will forward it toward B5 without duplicating local notifications.

B4

17:UNLOCK

19:UNLOCK

B7 B3 20:UNLOCK

3.2.

14:UNLOCK

B1

Addressing Network Awareness

13:CLOSE 13:CLOSE

B9

B5

(b)

FIGURE 3. Overlay topology update phase. The first picture represents the locking algorithm in action while it locks link in the path connecting B9 to B5 . The second picture shows the moment when link ltu (l4,7 in this example) is teared down and the new link lnew (l9,5 ) is created.

synchronization is needed between Bi and B` ). When these messages arrive at their destination, the link lp,q is teared down and an UNLOCK message starts from Bp and Bq and follows the reverse path to Bi and B` as shown in Figure 3(b) with respect to the path from B9 to B5 (also in this case no synchronization is needed between Bp and Bq ). The UNLOCK message is also used to trigger an update of routing tables10 and to remove locks on the path. A lock can be removed only by the broker that established it (locks are associated with the identity of the locking broker). Once the UNLOCK message arrives at Bi and B` , the link lnew becomes operative. Let us also remark that if the path HS(Bi , B` ) changes (due to a concurrent reconfiguration) before the propagation of the LOCK message, but after the previous phases ended, eventually a NACK message is returned to Bi which aborts the self-organization run. In case of a broker failure, the administrator, besides reconnecting the topology and restoring routing tables [20] as required by SIENA, must remove pending locks and reset data structures associated with the self-organization algorithm. 10 Various mechanisms can be used to fix routing tables (see [20] for an example), but it is important to note that only nodes on the new path connecting Bp to Bq are interested by this update.

The self-organization algorithm presented in the previous section follows a network-oblivious approach, trying to achieve only a reduction in the number of overlay hops. Nevertheless, it is important to note that usually overlay networks are built trying to respect some constraint based on physical proximity between nodes, such that the resulting overlay topology closely matches the underlying network topology. With this technique it is possible to obtain good performance from the overlay network with respect to network-level metrics. For example, connections between brokers in a same LAN must be privileged, both for simplicity of management and for performance reasons. In this case, the intervention of the self-organization can disrupt this proximity, thus affecting network-level performance. To avoid this problem, or at least limit its effects, our self-organization algorithm follows this simple principle: new links can be created only among those brokers whose network-level distance is less than a threshold value d. Network distance can be measured indifferently with any metric, either IP hops or latency or bandwidth, depending on the specific application requirements. Anyway, the choice of the distance metric does not affect the algorithm specification. When network awareness is considered each broker involved in a broker discovery phase checks if its network distance from the broker source of the self-organization is higher than d. If this is the case the broker will simply avoid to propose itself as a candidate, letting the discovery continue in its search for other candidates. Performance obtained considering network awareness obviously depends from the choice of d as this threshold value is actually used to prune the set of candidate brokers for a self-organization run; pruned candidate brokers can sometimes be best candidates, i.e. those brokers offering the largest similarity, thus leading to a less efficient overlay network. In particular if d is unbounded the network-level metric is not taken into account; if d = 0 no reorganization can happen.

T HE C OMPUTER J OURNAL VOL . 00 N O . 0, 2006

8 4.

R. BALDONI et al. SIMULATION STUDY

To evaluate performance and applicability of our selforganization algorithm, we conducted an extensive simulation study. This study is based on results obtained from a prototype implementation of a SIENA-like pub/sub broker augmented with the self-organization algorithm. The prototype is deployed over J-Sim [21], a component-based realtime simulator that allows to simulate the whole TCP/IP protocol stack. The aim of this study is to compare the behavior of the algorithm on different scenarios in order to evaluate at which extent, on these scenarios, the self-organization can improve performance. This section is organized as follows: at first we introduce the simulation model used in our experiments and the scenarios on which the simulations were ran; then we study specific aspects of the algorithm like convergence and stability; the third subsection is devoted to performance results that show how our algorithm can effectively reduce the cost of event dissemination from an application-level point of view; at last we take network latency into account to show how our algorithm can cope with network-level metrics while trying to optimize an application-level metric.

4.1.

Experimental model

4.1.1. TCP/IP and overlay network models The experiments were run over a 100 nodes TCP/IP network. The IP network was first created using the GT-ITM tool [22, 23] configured to generate four subnetworks linked by backbone links. Then TCP/IP-based hosts were simulated on this network through the INET framework provided by J-Sim. Brokers were installed on all hosts and connected to form an initial pub/sub network that closely matches the underlying IP network. The maximum number of allowed overlay connections per broker was set to F = 10.

4.1.2. Data Model Given the absence of publicly available data traces of real pub/sub applications, we tested our algorithm on various synthetic scenarios. Following the same approach previously used in other simulation studies ([24, 12, 25]) we generated nine scenarios characterized as follows: Event space. The proposed self-organization algorithm is independent from the actual event space used, as it only relies on the matching operations of the CBR algorithm. Then, for the purpose of simulations, we assumed a simple two-dimensional event space, i.e., subscriptions have two numerical range filters (defined on the domain of real numbers in the range [+10, −10]). This simple space was chosen to limit the cost of matching operations in our simulations, but, given the independence of the algorithm from the actual structure of the space, we believe that the results presented in this paper are of general validity.

U0 U1 U2 U3 U4 G0 G1 G2 G3

Distribution of events uniform uniform uniform uniform uniform gaussian gaussian gaussian gaussian

Distribution of subscriptions uniform uniform uniform uniform uniform uniform uniform uniform uniform

Popularity of events 1.7% 5.4% 10.1% 24.5% 50.9% 1.4% 5.2% 8.8% 24.6%

Percentage of target brokers 4.8% 14.6% 24.3% 48.9% 75.3% 4.1% 14.2% 22.9% 51.3%

TABLE 2. Scenarios used for the experiments.

Distributions of subscriptions and events in the event space. We generated various scenarios using two different events distributions: uniform (U scenarios) and Gaussian (G scenarios). The Gaussian distribution simulates “hot spots” in the event space: each G-scenario presents four randomly chosen “spots” around which events are generated using a normal gaussian distribution with mean 0 and standard deviation 1. On the contrary, using a uniform distribution, an event can be generated in any point of the event space with the same probability. As far as subscriptions are concerned a uniform distribution was always used to generate values for the constraints. The different U and G scenarios are characterized by popularity of events, which is defined as the percentage of subscriptions matched on average by each event. Event popularity is an important parameter as it affects the average number of target brokers for each event thus affecting also the number of messages generated for their diffusion. Table 2 reports characteristics of each scenario along with the corresponding values of average popularity of events and average number of target brokers11 . In this study we do not consider scenarios with extremely high popularity because, in such scenarios, simple event flooding has been proved to work quite well w.r.t. CBR [14]. Distribution of subscriptions and events in the broker network. Subscriptions and events are spread uniformly among the brokers. In our experiments we do not take into account the presence of regionalism (i.e., the presence of similar subscriptions and events clustered into the brokers’ network): the purpose of our algorithm is to cluster subscriptions, so it would be pointless to apply it in a scenarios where clustering is already present because of regionalism. 4.2.

Performance metrics

For each scenario the evolution of the system was observed under the same settings with and without the reconfiguration algorithm. The following statistics were collected: • Number of reorganizations: the number of reorganization caused by the self-organization algorithm. 11 Percentage of target brokers have been computed by injecting 300 subscriptions in the system.

T HE C OMPUTER J OURNAL VOL . 00 N O . 0, 2006

E FFICIENT P UBLISH /S UBSCRIBE THROUGH A S ELF -O RGANIZING B ROKER OVERLAY Parameter Q δbase d

Description Maximum length of mi . Base value for the triggering backoff algorithm. Maximum latency allowed for link created by the self-organization algorithm.

9

Value 50 50 unbounded

TABLE 3. Parameters of the Self-organization algorithm.

• Notification cost: the ratio between the number of total application messages generated by brokers and the number of subscriptions matched by diffused events (notifications). The number of application messages includes the control messages generated by self-organization algorithm, such as discovery, locks and route updates messages. • Percentage of brokers involved: the average number of brokers involved during the diffusion of an event. This metric is used to compare the number of pure forwarders in the different settings. • Event dissemination time: the time spent for an event to travel through the overlay network from the injection point to all the target brokers. This metric is used for the evaluation of network-level performance as it reflects the network latency of the links traversed by events. The average notification cost expresses the efficiency of event dissemination: the lower the cost, the higher the efficiency. The average percentage of brokers involved per event dissemination gives us the opportunity of comparing the algorithm against the ideal case, i.e., when this number corresponds to the percentage of target brokers. For each scenario the above metrics have been estimated through several independent simulation runs such that their confidence interval was less than 5%. In each simulation run first 300 subscriptions are injected in the pub/sub system, then statistics are collected during the subsequent diffusion of 5000 events. The size of the event lists was set to Q = 50, and the base value of the backoff mechanism was set to δ = 5012 . Unless differently specified in the text, the maximum latency allowed for links created by the selforganization algorithm was always set to d = ∞. These parameters are reported in Table 3 for the convenience of the reader. Note that our simulation model does not explicitly model publishers and subscribers attached to brokers as all the network traffic generated by them can be considered as a fixed cost that occurs with or without the use of the selforganization algorithm; for this reason this cost is actually ignored in our results. In the following, CBR indicates the results from the plain SIENA’s event dissemination algorithm and CBR+SO the ones obtained when self-organizations are allowed.

12 These values were chosen by basing on a suite of preliminary tests. These tests showed how, starting from a base value (i.e., 50 in our scenarios), the impact of Q on the performance results is negligible. This result probably stems from the fact that the minimum useful value for Q is a function of both the event injection ratio and the popularity of events. The value of δ is instead proportional to Q

4.3.

Algorithm convergence

The first aspect we analyze is the algorithm capacity to converge in a limited number of runs. With convergence we mean the ability to reach a stable overlay network topology, i.e., a topology where no more reorganizations happen, given that initial conditions of the scenario do not change. To run these tests we first allocated 300 subscriptions on the brokers and then started injecting events. The number of successful self-organizations are collected every 50 events. The results obtained under scenarios U and G are reported in Figure 4(a) and 4(b) respectively. The curves show how the number of reorganizations generated by the selforganization algorithm reaches a maximum within a limited number of events injected in the system. During the injection of the first 1000 events there is a huge growth of the number of reorganizations that converges to a maximum after the 2000th event. The initial growth is needed to transform the initial overlay topology into an efficient one and can be considered as a transient condition; then the system enters a steady-state condition, in which the number of successful self-organizations is negligible w.r.t. the number of injected events.

4.4.

Algorithm stability

Results about the convergence of the algorithm previously shown were obtained under “static” systems in which the subscriptions inserted at start time remain unchanged for the whole observation period. Obviously, such a scenario does not give any hint on how the algorithm behaves in a dynamic setting where subscriptions (and then interest of brokers) can change during the lifetime of the system. This is an important aspect that must be analyzed to check how the self-organization algorithm reacts to changes in the state of the system and if it is able to eventually converge to a new efficient overlay topology. For this purpose we ran specific experiments, in which the set of subscriptions is abruptly changed at once every 4000 events; although this situation is hardly going to happen in real applications, where subscriptions are expected to change continuously with a certain frequency during the whole lifetime of the system, it can be seen as a “worst case” scenario that puts the algorithm under the most stressing condition. Figure 5(a) reports the notification cost during the evolution of the system as more and more events are injected (the cost was measured every 50 events; the values shown are not cumulative); curves are shown only for scenario U1, as tests conducted for all the other scenarios showed the same behavior. The vertical dotted lines at 4000 and 8000

T HE C OMPUTER J OURNAL VOL . 00 N O . 0, 2006

10

R. BALDONI et al.

70

U0 100

U1 U2

80

60

U3 40

20

U4

Reorganizations of the overlay network

Reorganizations of the overlay network

120

G2

60

G1 50

40

G3 30

G0 20

10

0

0 0

1000

2000

3000

4000

0

5000

1000

2000

3000

4000

5000

Events

Events

(a)

(b)

3,0

3,0

2,5

2,5

2,0

2,0

Notification cost

Notification cost

FIGURE 4. Reconfiguration distributions in U and G scenarios. The curves show how the number of reorganizations of the overlay network converges to a maximum within a limited number of events injected in the system.

1,5

1,0

CBR+SO

0,5

1,5

1,0

CBR+SO

0,5

CBR

CBR 0,0 2000

0,0 4000

6000

8000

10000

12000

2000

4000

6000

8000

10000

12000

Events

Events

(a)

(b)

FIGURE 5. Evolution of notification cost in scenario U1. The graph a shows how the algorithm is able to react to an abrupt change of the subscriptions present in the system (occurring after 4000 and 8000 events), and converge, in a limited number of runs, to a new stable overlay topology. Graph b shows the algorithm rection to a continuous subscription change (occurring from event 4000 to 12000). The set of subscription updates was the same for both graphs.

indicate the instants when subscriptions were substituted. The grey curve depicts the cost for CBR that remains mostly constant during all the test as it does not suffer from the overhead generated by the self-organization algorithm. The black curve depicts the notification cost for the content based routing algorithm augmented with self-organization (CBR+SO). As the curve clearly shows, a subscription update suddenly renders the overlay topology inefficient to diffuse events to new target brokers. The self-organization algorithm quickly detects this new situation and starts re-adapting the overlay network by converging to a new stable topology. Moreover, the new overlay topology obtained after this transient condition provides almost the same increase in performance that was observed before changing subscriptions (this is true as long as the distribution of subscriptions does not change). The transient conditions showed in the plot are large both in

amplitude (notification cost) and duration due to the fact that subscriptions were updated as a whole at the same time. Figure 5(b) shows the results for a similar scenario where subscriptions are not changed all at the same time, but their updates are uniformly spread in time: starting from event 4000, up to the end of the test, the same updates used in the previous experiment are periodically issued, one at a time. In this case the abrupt performance evolution observed in figure 5(a) is absent: when subscriptions are spread over time, the algorithm gracefully adapts the overlay topology, continuously pursuing for an efficient one.

4.5.

Performance results

In this section we investigate the amount of performance improvement obtained through the adoption of our algorithm in the different scenarios described above. Let us note

T HE C OMPUTER J OURNAL VOL . 00 N O . 0, 2006

E FFICIENT P UBLISH /S UBSCRIBE THROUGH A S ELF -O RGANIZING B ROKER OVERLAY

70%

100%

% brokers involved in event diffusion

90%

Normalized notification cost

11

80% 70% 60% 50% 40% 30% 20% 10% 0%

60%

50%

40%

30%

20%

10%

0%

U0

U1

U2

U3

U4

U0

Scenario CBR

U1 CBR

CBR+SO

(a)

Scenario CBR+SO

U2

U3 Ideal value

(b)

FIGURE 6. Performance analysis in U-scenarios. Graph a) shows average notification cost for event dissemination (values are normalized with CBR set to 100%) where the cost of CBR+SO includes the reconfiguration overhead. Graph b) shows average percentage of brokers involved in each event dissemination.

(a) U0 U1 U2 U3

CBR 0.097 0.235 0.325 0.521

CBR+SO 0.182 0.342 0.448 0.659

Gain 88% 46% 38% 26%

(b) G0 G1 G2 G3

CBR 0.02 0.09 0.182 0.461

CBR+SO 0.115 0.237 0.356 0.642

Gain 478% 161% 96% 39%

TABLE 4. Average link weights in U and G scenarios.

that while G-scenarios should favor the mechanism of selforganization given the concentration of events on specific zones of the event space they offer, U-scenarios represent more difficult settings for CBR+SO as the similarity of interests is uniformly spread among all the brokers. The results shown in this section are obtained from measures taken on stable overlay networks. Figure 6(a) reports the notification cost in U-scenarios for both CBR and CBR+SO. The cost is normalized, so that the CBR’s cost is always represented as 100%. As the plot shows SO+CBR can obtain a gain on plain CBR that ranges from 20% to 26% on scenarios U0 through U3. Scenario U4 shows a more modest gain; this is due to the large number of target brokers induced by the large popularity of events which characterizes U4 (see table 2): each event in this scenario must be diffused, on average, to 3/4 of all the brokers, and this means that the gain of CBR itself w.r.t. a simple event flooding mechanism is negligible. High popularity scenarios are not favorable workloads for our self-organization algorithm and for simple CBR as well [14]; for this reason we decided to concentrate our efforts only on scenarios with lower popularity.

Figure 6(b) shows the percentage of brokers involved on average in each event dissemination under various Uscenarios for CBR and CBR+SO. The ideal value, i.e., the average percentage of target brokers for each event, is also reported as a reference. It represents the minimum value that could be reached by an hypothetical “optimal” diffusion mechanism that delivers all the events in one hop per target broker. The figure shows how the self-organization algorithm modifies the application network topology trying to reach this ideal value. In fact, the gap between the ideal value and the CBR value can be interpreted as due to pure forwarder brokers that are needed by the routing mechanism to forward each event toward all the target brokers: the selforganization algorithm reduces the average notification cost lowering the average number of pure forwarders. This result is obtained by CBR+SO reorganizing the network in order to increase the similarity between brokers connected by links. The relationship between the average number of pure forwarders for each event dissemination and the average similarity between brokers is demonstrated by the values reported in Table 4(a). The table shows that the self-organization algorithm accommodates the principles introduced in section 2 and that an increase in the average link weight (i.e., similarity between brokers) obtained by CBR+SO corresponds to the reduction of pure forwarders shown in Figure 6(b). Figure 7 and table 4(b) report the results obtained under G-scenarios. As it appears from the plot, CBR+SO on these scenarios can obtain a gain w.r.t. plain CBR up to 33%. 4.6.

Taking network latency into account

All the results previously shown were obtained without taking the effect of self-organization on network-level metrics into account. Here we choose link latency as the network proximity metric, and study the behaviour of our algorithm when the value of threshold d varies from 0 (no

T HE C OMPUTER J OURNAL VOL . 00 N O . 0, 2006

12

R. BALDONI et al.

70%

100%

% brokers involved in event diffusion

Normalized notification cost

90% 80% 70% 60% 50% 40% 30% 20%

60%

50%

40%

30%

20%

10%

10% 0%

0% G0

G1

G2

Scenario

CBR

G0

G3

G1 CBR

CBR+SO

Scenario

G2

CBR+SO

(a)

G3 Ideal value

(b)

FIGURE 7. Performance analysis in G-scenarios. Graph a) shows average notification cost for event dissemination (values are normalized with CBR set to 100%) where the cost of CBR+SO includes the reconfiguration overhead. Graph b) shows average percentage of brokers involved in each event dissemination.

6,0

5,0

5,5

4,5 4,0

4,5 3,5

4,0

Notification cost

Event diffusion time (sec)

5,0

3,5 3,0 2,5 2,0

3,0 2,5 2,0 1,5

1,5 1,0

1,0

0,5

0,5

0,0

0,0 0

0,05

0,1

0,15

0,2

0,25

0,3

0,35

0,4

0,45

0

0,5

0,05

0,1

0,15

0,2

0,25

0,3

0,35

0,4

0,45

0,5

Maximum latency allowed d (sec)

Maximum latency allowed d (sec)

(a)

(b)

FIGURE 8. Network level metrics analysis. The plots show how varying d it is possible to tune how out algorithm affects average event dissemination time (on the left) and average notification cost (on the right).

reorganizations are allowed) up to ∞ (link latency is not taken into account). Figure 8(a) shows how the average event dissemination time13 varies with d. The maximum and minimum values for event dissemination time obtained from the tests are also reported in the figure. Figure 8(b) shows the corresponding average notification cost. As the plots show, by varying d it is possible to fine-tune the behaviour of the algorithm in order to reduce the notification cost while limiting the increase in network latency (15% in our experiments by choosing d = 0.2). Figure 8(a) has also another important interpretation: even though the average event dissemination time increases with d, vertical bars show that there can be cases where the diffusion time decreases with d. This means that the use of our algorithm can also transform the initial overlay topology 13 We do not take the time required to process matching operations on brokers into account.

in a new one that also has better network-level performance than the initial one. For this reason d should be determined on a per-case basis and possibly by means of an automatic mechanism that can dynamically adjust it. 5.

RELATED WORK

Managed vs. self-managed overlay networks. Pub/sub systems based on networks of brokers ([6, 7, 11, 12, 9, 3]) rely on a managed overlay, that is brokers and links connecting them are setup manually by an administrator. In particular, [26] introduces algorithms and tools to simplify network administration, i.e. to make topology changes in the overlay network, but these tools are intended to be just an aid for a human administrator. The notion of automatic self-organization of the overlay network has been introduced by Cugola et al. in [20] and then included in the REDS system [15]. In REDS, self-organization is driven by changes happening

T HE C OMPUTER J OURNAL VOL . 00 N O . 0, 2006

E FFICIENT P UBLISH /S UBSCRIBE THROUGH A S ELF -O RGANIZING B ROKER OVERLAY in the underlying network, often occurring in mobile systems and peer-to-peer networks. Differently from our approach, authors of [20] introduce techniques that “react” to topological changes, but does not induce such changes to improve performance.. A similar approach was presented also in [27]. A completely self-organized publish/subscribe system was introduced in [28]. In this work the authors propose an implementation of the content-based publish/subscribe scheme on top of an unstructured peer-to-peer system. The proposed publish/subscribe system is able to self-organize in order to exploit similarity between client subscriptions. Differently from our approach the proposed similarity metric is only based on subscriptions, without taking into account event distributions. Event dissemination through multicast trees and gossiping. In single source systems efficient event dissemination can be achieved by dynamically building single-source multicast trees ([29, 3]) over an overlay network. However the model addressed in this paper, where every broker can be a source of events, is not suitable for such technique. This is due to the fact that the number of trees that should be built for each broker is given by the number of topics, in topicbased pub/sub systems, and by the number of possible events in content-based pub/sub systems. Moreover, these trees should be rebuilt/updated each change/removal of a subscription. Gossip-based algorithms have been considered as a viable alternative to traditional deterministic multicast algorithms in large scale environments [30]. Using these algorithms, there is no need of building/maintaining any structure over the overlay network. This is at the cost of actually flooding any event through the overlay. DHT-based pub/sub systems. Scribe [4] and Bayeux [5] are two pub/sub systems built on top of two DHT overlays (namely Pastry [31] and Tapestry [32]), which leverage their scalability, efficiency and self-organization capabilities. However, both systems provide only a topic-based addressing, thus offering limited expressiveness to users. Other works ([33], [34], [10], [35], [36]) propose solutions that combine the self-organization capabilities of overlay networks with the expressive addressing schemes of content-based pub/sub. In such systems, a mapping between content-based subscriptions and DHT keys is realized in order to assign each subscription to a given node in the overlay, that is responsible for matching all the events that are published for that subscription. Such mappings are typically static and, therefore, do not allow a reorganization at run-time as our self-organization algorithm does. Distributed query processing systems. If we consider an event as a piece of data and a subscription as a persistent query, our CBR architecture becomes very similar to wide area distributed query processing systems (e.g. PIER [37], Astrolabe [38], etc.). In PIER, for example, a query dataflow engine is deployed over a DHT system, however queries are

13

instantaneous (i.e. non persistent ) in the sense that they do not last for a given period of time. The aim of PIER is indeed to build database related functions (selection, projection, join, grouping and aggregation) over the basic put/get primitives provided by DHTs. Semantic overlay networks. Keeping close over a virtual network all pieces of information having similar semantics is a principle that has been also used in the self-organizing semantic overlay network named pSearch [39]. In this system the basic idea is to adapt classical information retrieval algorithms (like the one used in Google-like search engines) to work on DHTs. However the semantic space used in pSearch is based on document summarization, therefore queries can return only approximate results which is in contrast with the content-based pub/sub system which is able to deliver precise results with respect to persistent queries (i.e. subscriptions). Semantical proximity has also been successfully exploited for enhancing the search performance in peer-to-peer file sharing systems. In particular, [40, 41] present a mechanism for improving search by grouping peers sharing similar content. In both, semantic proximity is computed following on a principle similar to our similarity metric: taking into account previous query result, a node consider as ‘semantically close’ those nodes that in the past either have provided positive replies to its query or have issued queries to which it has replied positively. Network-aware overlays. Many techniques for building network-aware overlay topologies have been described in the literature ([42, 43, 44, 45, 46]). In all these works the overlay is built only by taking into account the underlying network topology, without considering, as we do in our algorithm, also application-level information. On the other hand our self-organization algorithm presents some resemblances with the network aware heuristic used in Pastry. Pastry addresses network proximity (as described in [47, 45]) through a heuristic that chooses the neighbors for any node in the network as the ones that are the closest at network-level to the node (according to a generic distance metric). 6.

CONCLUSIONS

This paper presented a novel approach, based on a selforganizing overlay network, to improve performance of event dissemination in pub/sub systems based on such networks. The proposed self-organization algorithm works in synergy with an event dissemination mechanism of a pub/sub system and it is able to modify the topology of the overlay network adapting to changes in event and subscription distributions. In particular each resulting overlay topology represents a compromise, among all possible event dissemination occurring in the system, that tends to efficient overall performance (i.e., reduction of the average number of overlay hops per event dissemination). By reducing the notification cost in terms of overlay hops with respect to the native event dissemination algorithm

T HE C OMPUTER J OURNAL VOL . 00 N O . 0, 2006

14

R. BALDONI et al.

of the pub/sub system, this approach makes the system augmented with a self-organizing overlay network more scalable. The paper first introduced the basic principle behind the self-organization that states “brokers matching similar events should be as close as possible in the overlay network”. Then the paper presented a measure, used to label edges of the graph representing the overlay network, that quantifies the similarity of interests between two brokers. The paper then showed the application of this approach to a tree-based event routing system, namely SIENA [6], presenting the details of the self-organization algorithm. Performance results executed over nine scenarios show that the self-organization algorithm converges (i.e., the algorithm converges to a stable overlay topology after a limited number of event disseminations for a given scenario) and it is stable (i.e., for each scenario, given a sequence of subscription changes, the algorithm converges to a stable topology after each change). Moreover, these results reveal during a stable topology a performance gain of the notification cost in terms of overlay network hops up to 33% with respect to the native SIENA diffusion algorithm. ACKNOWLEDGEMENTS The authors would like to thank Antonio Carzaniga for the interesting discussions about SIENA and Michael A. Jaeger for his helpful comments on a preliminary version of the paper. The authors are also indebted with the anonymous reviewers for their suggestions and criticism that greatly helped improving the content of this paper. REFERENCES [1] Baldoni, R. and Virgillito, A. (2005) Distributed event routing in publish/subscribe communication systems: a survey. Technical Report TR-1/06. Dipartimento di Informatica e Sistemistica, Universit´a di Roma “La Sapienza”. http://www.dis.uniroma1.it/ midlab/publications.php. [2] Eugster, P., Felber, P., Guerraoui, R., and Kermarrec, A. (2003) The Many Faces of Publish/Subscribe. ACM Computing Surveys, 35, 114–131. [3] Oki, B., Pfluegel, M., Siegel, A., and Skeen, D. (1993) The information bus - an architecture for extensive distributed systems. Proceedings of the Fourteenth ACM Symposium on Operating Systems Principles, Asheville, North Carolina, USA, 5-8 December, pp. 58–68. ACM Press, New York. [4] Rowston, A., Kermarrec, A., Castro, M., and Druschel, P. (2001) SCRIBE: The Design of a Large-Scale Notification Proceedings of the third International Infrastructure. Workshop on Networked Group Communication, London, UK, 7-9 November, pp. 30–43. Springer-Verlag, Berlin. [5] Zhuang, S. Q., Zhao, B. Y., Joseph, A. D., Katz, R., and Kubiatowicz, J. (2001) Bayeux: An architecture for scalable and fault-tolerant wide-area data dissemination. Proceedings of the eleventh International Workshop on Network and Operating Systems Support for Digital Audio and Video, Port Jefferson, NY, USA, 25-26 June, pp. 11–20. ACM, New York. [6] Carzaniga, A., Rosenblum, D., and Wolf, A. (2001) Design and evaluation of a wide-area notification service. ACM Transactions on Computer Systems, 3, 332–383.

[7] Cugola, G., Nitto, E. D., and Fuggetta, A. (2001) The jedi event-based infrastructure and its application to the development of the opss wfms. IEEE Transactions on Software Engineering, 27, 827–850. [8] Pietzuch, P. and Bacon, J. (2002) Hermes: A distributed event-based middleware architecture. Proceedings of the 22nd International Conference on Distributed Computing Systems Workshops (DEBS), Vienna, Austria, 2-5 July, pp. 611 – 618. IEEE Computer Society, Washington. [9] The Gryphon Team (2004) Achieving Scalability and Throughput in a Publish/Subscribe System. Technical report. IBM Research Report RC23103. [10] Terpstra, W. W., Behnel, S., Fiege, L., Zeidler, A., and Buchmann, A. P. (2003) A peer-to-peer approach to content-based publish/subscribe. Proceedings of the second International Workshop on Distributed Event-Based Systems, San Diego, California, USA, 8 June, pp. 1–8. ACM, New York. [11] M¨uhl, G. (2002) Large-Scale Content-Based Publish/Subscribe Systems. PhD thesis, Technical University of Darmstadt. [12] Cao, F. and Singh, J. P. (2004) Efficient event routing in content-based publish-subscribe service networks. Proceedings IEEE INFOCOM, Hong Kong, China, 7-11 March, pp. 929 – 940. IEEE, Washington. [13] Triantafillou, P. and Economides, A. (2004) Subscription summarization: A new paradigm for efficient publish/subscribe systems. Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS ’04), Tokyo, Japan, 24-26 March, pp. 562–571. IEEE Computer Society, Washington. [14] Opyrchal, L., Astley, M., Auerbach, J., Banavar, G., R.Strom, and Sturman, D. (2000) Exploiting IP multicast in contentbased publish-subscribe systems. Proceedings of Middleware 2000, IFIP/ACM International Conference on Distributed Systems Platforms, Lecture Notes in Computer Science, 4-7 April, pp. 185–207. Springer-Verlag, Berlin. [15] Picco, G. P. and Cugola, G. (2005) Reds: A reconfigurable dispatching system. Technical report. Politecnico di Milano. [16] Bhola, S., Strom, R. E., Bagchi, S., Zhao, Y., and Auerbach, J. S. (2002) Exactly-once delivery in a content-based publish-subscribe system. Proceedings of the International Conference on Dependable Systems and Networks, Bethesda, MD, USA, 23-26 June, pp. 7–16. EEE Computer Society, Washington. [17] Porter, B., Ta¨ıani, F., and Coulson, G. (2006) Generalised repair for overlay networks. (to appear) Proceedings of the 25th IEEE Symposium on Reliable Distributed Systems, Leeds, UK, 2-4 October. IEEE, Washington. [18] SIENA Web Site, http://www.cs.colorado.edu/users/carzanig/siena/. [19] Basu, A., Charron-Bost, B., and Toueg, S. (1996) Simulating reliable links with unreliable links in the presence of process crashes. Proceedings of the 10th International Workshop on Distributed Algorithms, Bologna, Italy, October, pp. 105– 122. Springer-Verlag, Berlin. [20] Picco, G. P., Cugola, G., and Murphy, A. L. (2003) Efficient content-based event dispatching in the presence of topological reconfiguration. Proceedings of the 23rd International Conference on Distributed Computing Systems, Providence, RI, USA, 19-22 May, pp. 234–243. IEEE Computer Society, Washington.

T HE C OMPUTER J OURNAL VOL . 00 N O . 0, 2006

E FFICIENT P UBLISH /S UBSCRIBE THROUGH A S ELF -O RGANIZING B ROKER OVERLAY [21] J-Sim Web Site, http://www.j-sim.org. [22] Zegura, E. W., Calvert, K. L., and Bhattacharjee, S. (1996) How to model an internetwork. Proceedings IEEE INFOCOM, San Francisco, CA, USA, 24-28 March, pp. 594– 602. IEEE, Washington. [23] Site,, G. W. http://www-static.cc.gatech.edu/ projects/ gtitm/. [24] Riabov, A., Liu, Z., Wolf, J., Yu, P., and Zhang, L. (2002) Clustering algorithms for content-based publicationsubscription systems. Proceedings of the 22nd International Conference on Distributed Computing Systems, Vienna, Austria, 2-5 July, pp. 133–42. IEEE Computer Society, Washington. [25] Carzaniga, A. and Wolf, A. (2002) A benchmark suite for distributed publish/subscribe systems. Technical Report CU-CS-927-02. Software Engineering Research Laboratory, Department of Computer Science, University of Colorado at Boulder. [26] Bhola, S. (2004) Topology Changes in a Reliable Publish/Subscribe System. Technical report. IBM Research Report RC23354. [27] Xue, T. and Feng, B. (2003) An efficient and self-configurable publish-subscribe system. In M. Li, X. H. Sun, Q. Deng, and J. Ni, editors, Grid and Cooperative Computing: Second International Workshop, pp. 159–163. SpringerVerlag GmbH. [28] Voulgaris, S., Rivi`ere, E., Kermarrec, A.-M., and van Steen, M. (2005) Sub-2-sub: Self-organizing content-based publish and subscribe for dynamic and large scale collaborative networks. Research Report RR5772. INRIA, Rennes, France. [29] Raghavan, S., Manimaran, G., and Murthy, C. S. R. (1999) A rearrangeable algorithm for the construction of delay-constrained dynamic multicast trees. IEEE/ACM Transactions on Netwoking, 7, 514–529. [30] Eugster, P. T., Guerraoui, R., Kermarrec, A.-M., and Massoulieacute, L. (2004) Epidemic information dissemination in distributed systems. IEEE Computer, 37, 60–67. [31] Rowstron, A. and Druschel, P. (2001) Pastry: Scalable, decentralized object location and routing for large-scale peerto-peer systems. Proceedings of IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, 12-16 November Lecture Notes in Computer Science, pp. 329–350. Springer-Verlag, Berlin. [32] Zhao, B. Y., Huang, L., Stribling, J., Rhea, S. C., Joseph, A. D., and Kubiatowicz, J. (2003) Tapestry: A Resilient Global-scale Overlay for Service Deployment. IEEE Journal on Selected Areas in Communications, 22, 41–53. [33] Gupta, A., Sahin, O. D., Agrawal, D., and Abbadi, A. E. (2004) Meghdoot: Content-based publish/subscribe over p2p networks. Proceedings of the 5th ACM/IFIP/USENIX International Middleware Conference, Toronto, Canada, 1820 October Lecture Notes in Computer Science, pp. 254–273. Springer, Berlin. [34] Pietzuch, P. and Bacon, J. (2003) Peer-to-peer overlay broker networks in an event-based middleware. Proceedings of the 2nd International Workshop on Distributed Event-Based Systems (DEBS’03), San Diego, California, USA, 8 June, pp. 1–8. ACM, New York. [35] Triantafillou, P. and Aekaterinidis, I. (2004) Contentbased Publish/Subscribe over Structured P2P Networks. Proceedings of the 3rd International Workshop on Distributed Event-based Systems (DEBS’04), Edinburgh, UK, 2425 May, pp. 104–109. IEE The Institution of Electrical Engineers, London.

15

[36] Baldoni, R., Marchetti, C., Vitenberg, R., and Virgillito, A. (2005) Content-based publish/subscribe over structured overlay networks. Proceedings of the 25th International Conference on Distributed Computing Systems, Columbus, OH, USA, 6-10 June, pp. 437–446. IEEE Computer Society, Washington. [37] Huebsch, R., Hellersteinand, J. M., Lanhamand, N., Looand, B. T., Shenkerand, S., and Stoica, I. (2003) Querying the internet with PIER. Proceedings of 29th International Conference on Very Large Data Bases, Berlin, Germany, 9-12 September, pp. 321–332. Morgan Kaufmann, San Fransisco. [38] Van Renesse, R., Birman, K. P., and Vogels, W. (2003) Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining. ACM Transactions on Computer Systems, 21, 164–206. [39] Tang, C., Xu, Z., and Dwarkadas, S. (2003) Peer-to-peer information retrieval using self-organizing semantic overlay networks. Proceedings of ACM SIGCOMM, Karlsruhe, Germany, 25-29 August, pp. 175–186. ACM Press, New York. [40] Sripanidkulchai, K., Maggs, B., and Zhang, H. (2003) Efficient content location using interest-based locality in peerto-peer systems. Proceedings of IEEE INFOCOM, San Franciso, CA, USA, 30 March - 3 April, pp. 2166–2176. IEEE Computer Society, Washington. [41] Voulgaris, S., Kermarrec, A.-M., Massoulie, L., and van Steen, M. (2004) Exploiting semantic proximity in peerto-peer content searching. In Proceedings of the 10th IEEE International Workshop on Future Trends in Distributed Computing Systems, Suzhou, China, 26-28 May, pp. 238– 243. IEEE Computer Society, Washington. [42] Kwon, M. and Fahmy, S. (2002) Topology-aware overlay networks for group communication. Proceedings of the 12th International Workshop on Network and Operating System Support for Digital Audio and Video, Miami Beach, Florida, USA, 12-14 May, pp. 127–136. ACM Press, New York. [43] Xu, Z., Tang, C., and Zhang, Z. (2003) Building topologyaware overlays using global soft-state. Proceedings of the 23rd International Conference on Distributed Computing Systems, Providence, RI, USA, 19-22 May, pp. 500–508. IEEE Computer Society, Washington. [44] Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., and Balakrishnan, H. (2003) Chord: A scalable peer-to-peer IEEE/ACM lookup protocol for internet applications. Transactions on Networking, 11, 17–32. [45] Castro, M., Druschel, P., Hu, Y. C., and Rowston, A. (2002) Exploiting network proximity in distributed hash tables. International Workshop on Future Directions in Distributed Computing, Bertinoro, Italy, 3-7 June. [46] Ratnasamy, S., Handley, M., Karp, R., and Shenker, S. (2002) Topologically-aware overlay construction and server selection. Proceedings of IEEE INFOCOM, New York, USA, 23-27 June, pp. 1190 – 1199. IEEE Computer Society, Washington. [47] Castro, M., Druschel, P., Hu, Y. C., and Rowston, A. (2002) Exploiting Network Proximity in Peer-to-Peer Overlay Networks. Technical Report MSR-TR-2002-82. Microsoft Research.

T HE C OMPUTER J OURNAL VOL . 00 N O . 0, 2006

Suggest Documents