Data Aware Multicast Implementation based on P-Grid

Oana Jurca
Doctoral School of Computer and Communication Sciences, EPFL
1015 Lausanne, Switzerland
[email protected]

Responsible assistants: Sébastien Baehni, Anwitaman Datta, Roman Schmidt
Professors: Karl Aberer, Rachid Guerraoui

Abstract

This paper presents the implementation viewpoint of Data Aware Multicast, a multicast algorithm for peer-to-peer dissemination of events in a distributed, topic-based, publish-subscribe system. The implementation is based on P-Grid, a structured, completely decentralized, self-organizing peer-to-peer system. We discuss various design approaches to interconnecting the algorithm with the underlying peer-to-peer system, ranging from a loose dependency to a highly coupled solution, and describe their advantages and drawbacks. Two implementations were built on the most promising design approaches; they have similar structures, apart from the group membership layer. Some implementation difficulties arise from exploiting P-Grid efficiently, others from ensuring the integrity of the data structures required by the algorithm. The test results provide a means to compare the two implementations and their overall performance.

Contents

1 Introduction
2 Data Aware Multicast: the algorithm
  2.1 Data structures
  2.2 Topic subscription
  2.3 Event dissemination
  2.4 Support
3 P-Grid
  3.1 P-Grid construction
  3.2 Search feature
4 Application design
  4.1 General design
    4.1.1 Messaging
    4.1.2 Membership
    4.1.3 DaMulticast
  4.2 Possible variations of the general design
    4.2.1 A faithful approach
    4.2.2 Membership based on P-Grid search
    4.2.3 Dissemination based on P-Grid replica management
    4.2.4 Having a hierarchical topic tree in the leaves of the P-Grid tree
5 Implementation
  5.1 General overview
  5.2 Communication
  5.3 Membership
  5.4 The publish-subscribe primitives
  5.5 daMulticast
    5.5.1 Topic table
    5.5.2 Supertopic table
    5.5.3 Dissemination in own topic
    5.5.4 Dissemination in super topic
6 Test settings and results
  6.1 Test settings
  6.2 Test results
7 Related work
8 Future work
9 Conclusions

List of Figures

1 Example of a topic architecture
2 The dissemination of an event in daMulticast
3 Example P-Grid
4 A search query
5 General design
6 Peers' data structures
7 A faithful approach
8 Example network
9 Membership based on P-Grid search
10 Example network
11 Dissemination based on P-Grid replica management
12 Example network
13 Having a hierarchical topic tree
14 Topic search area
15 Package overview
16 The peer2peer package
17 The pubsub package
18 The usage of OutputPipe
19 The usage of InputPipe
20 The membership package
21 The dam package
22 The damulticast package
23 The dissemination algorithm
24 SCAMP based membership implementation's reliability and latency
25 SCAMP based membership implementation's efficiency and upward events
26 P-Grid based membership implementation's reliability and latency
27 P-Grid based membership implementation's efficiency and upward events

1 Introduction

The area of publish-subscribe algorithms has raised several interesting questions, driven by the need to reconcile message-delivery performance with complete decentralization. Both aspects have to be considered to meet today's demands. Even though delivery performance has always been an issue, it is harder to achieve in the typical distributed environments we face today; indeed, a peer-to-peer (P2P) setting is a very realistic one for a publish-subscribe study. Another interesting idea is to organize the topics of such publish-subscribe systems in a hierarchical structure and thus manage the inclusion relationships between the topics. Data Aware Multicast ([3]) is such a topic-based algorithm, relying only on local knowledge, aiming at no parasite (unrequested) messages, and offering good probabilistic guarantees on message delivery. The algorithm is "data-aware" in the sense that it exploits knowledge of process subscriptions and topic inclusion relationships to group the participating processes and, through these groups, to manage an efficient flow of information. The subscription mechanism reflects the hierarchical organization of topics: a process that subscribes to a topic is considered a subscriber to that topic's whole subtopic tree. Thus, when a process is interested in several topics, it is enough to subscribe to the highest-level topic; it will then receive all related messages. The underlying membership algorithm plays an important role in achieving the requirements of Data Aware Multicast (daMulticast, for short), and beneath it, the underlying overlay network is important as well.
As the goal of this doctoral school project was to produce an implementation of the daMulticast algorithm, the choice of the underlying system would influence further issues, like the interconnection between higher- and lower-level abstractions and algorithm-specific support. The P-Grid ([1]) system offers the needed primitives, which are those of an overlay allowing peers to efficiently discover and contact each other. P-Grid is a structured, completely decentralized, self-organizing peer-to-peer system. It relies on a virtual distributed routing tree, similarly structured as standard distributed hash tables. Each peer holds a part of the overall tree and takes decisions based strictly on local information. Because the final goal was to have an implementation of daMulticast, I considered the existing implementation of P-Grid and tried to make maximum use of the features it offered, without changing its code. As such, several problems had to be circumvented, even though, at the conceptual level, they were easily solved or already solved. This also led to some new ideas on how a better implementation could be done in the future, taking advantage of other existing or new P-Grid features. The rest of the paper is organized as follows: Section 2 presents daMulticast, Section 3 presents P-Grid, Section 4 describes the general design and Section 5 discusses implementation issues. Section 6 presents the tests, Section 7 related work and Section 8 future work. I conclude in Section 9.

2 Data Aware Multicast: the algorithm

In this section, I present an overview of daMulticast ([3]). It is a multicast algorithm for P2P dissemination of events in a distributed, topic-based publish-subscribe system. Processes can publish events on certain topics and expect events on the topics they have subscribed to. A characteristic of daMulticast is that the topics considered are hierarchically organized, based on inclusion relationships, up to a root topic that unites all topic trees. The algorithm then groups processes by the topics they are interested in. It is "data-aware" in the sense that it exploits information about


process subscriptions and topic inclusion relationships to build dynamic groups of processes and efficiently manage the flow of information within and between these groups.

Figure 1: Example of a topic architecture

With the topics organized in a hierarchy, it is sufficient for a process interested in several topics to subscribe to the highest-level topic and receive all interesting events, if those topics include one another. Figure 1 presents an example hierarchy, where the topic denoted by A is the root topic; it has as subtopics the topics B, E and G. Topic B is the super topic of topics C and D. The inclusion relationship between topics translates into a similar relationship between processes, in the sense that those interested in a topic T are "super processes" for the processes interested in a subtopic of T. The daMulticast algorithm interconnects process groups based on the topic inclusion relationship (or, equivalently, the process and "super process" relationship). Published events are propagated within groups in a gossip-based manner and disseminated between groups following a bottom-up approach imposed by the topic hierarchy. To have access to process groups, daMulticast relies on an underlying membership service that manages the groups corresponding to each topic and provides daMulticast with members of such groups. DaMulticast itself interconnects these groups, relying on two topic tables that each process has to keep. These two topic tables use information about topic inclusion and are all the memory complexity a process has to deal with in order to ensure message propagation. Still, processes will not receive any parasite messages, as an event is routed only along the topic hierarchy. As the techniques involved are gossip-based, several parameters make it possible to trade the memory complexity of the algorithm against its dissemination reliability. Another service daMulticast assumes as underlying support is the messaging service, which allows events (messages) to be sent and received by the processes involved in the algorithm.

2.1 Data structures

As mentioned before, each process has to maintain two tables used in the dissemination algorithm: a topic table and a supertopic table. They represent the whole memory complexity a process has to deal with in order to participate in the algorithm, considering that all the topics the process is interested in include one another. The topic table of a process p stores the identifiers (IDs) of other processes interested in the same topic as p. The supporting membership service fills it with values and should, based only on local knowledge, provide a population of about log(n) identifiers, where n is the number of processes interested in the corresponding topic (the size of the group for that topic). The sizes of the topic tables of all processes in a topic group will be around this value, and the tables will allow a good dissemination of an event in the group. The constant involved in setting the topic table size is a parameter in the trade-off between the memory complexity of the algorithm and the reliability of the dissemination.

In contrast to the topic table's varying size, the supertopic table has a constant size. It stores the IDs of super-processes of this process (processes interested in the super topic of process p's topic of interest). Its content can be provided in several ways; the underlying membership service does not directly fill this table as it does the topic table. One way to provide content for the supertopic table is that, at initialization time, the joining process also receives contacts for the group of its supertopic. Another way is for the process to ask other processes, via an initialization message, about processes interested in the supertopic. As soon as a process detects that its supertopic table is outdated (i.e., the information about the processes is no longer consistent), it updates the table. The size of the supertopic table provides one more parameter for configuring the already mentioned trade-off. An exception to maintaining two tables is the group of processes interested in the root topic of the topic hierarchy: processes in this group have only the topic table. Maintaining the links between groups is an important part of daMulticast. The two-table mechanism creates links between groups, but these links also need to be maintained (updated) to reflect a consistent view of the system of groups. For the sake of reliability and load balancing, these links should not stay the same throughout the entire lifetime of the group.
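As an illustration, the per-process state described above could be sketched as follows. The class name, the constant-size parameter value and the helper methods are hypothetical, not the actual implementation's API:

```python
import math
import random

class Process:
    """Hypothetical sketch of the state a daMulticast process keeps."""
    SUPERTOPIC_TABLE_SIZE = 3  # assumed constant-size parameter

    def __init__(self, pid, topic):
        self.pid = pid
        self.topic = topic
        self.topic_table = set()       # IDs of peers subscribed to the same topic
        self.supertopic_table = set()  # IDs of peers subscribed to the supertopic

    def target_view_size(self, group_size):
        # The membership service should keep the topic table around
        # c * log(n) entries; c trades memory against reliability.
        c = 1
        return max(1, round(c * math.log(group_size)))

    def refresh_supertopic_table(self, super_members):
        # e.g. re-sample super-processes when the table is detected stale
        k = min(self.SUPERTOPIC_TABLE_SIZE, len(super_members))
        self.supertopic_table = set(random.sample(sorted(super_members), k))
```

For a group of 1000 subscribers, the target topic-table size under these assumptions would be about 7 entries.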

2.2 Topic subscription

To join a group, a process must subscribe to the topic of that group. It goes through an initialization phase, which is responsible for initializing the topic and supertopic tables of that process. These tables represent the only membership information a process must maintain for any number of topics it is interested in, if those topics include one another. Once a process has joined a group, the underlying membership algorithm and other mechanisms keep its topic tables updated and the process can participate in the dissemination algorithm. When the process is no longer interested in a topic, it unsubscribes from that topic (it no longer keeps the topic and supertopic tables) and no longer participates in the dissemination algorithm for that topic.

2.3 Event dissemination

The final goal of daMulticast is that subscribers of a topic receive the messages published on that topic (including messages published on the subtopics it includes). The dissemination mechanism relies on the topic tables that each process participating in the algorithm maintains.

Figure 2: The dissemination of an event in daMulticast

First, a process willing to disseminate an event (process 1 in Figure 2) can elect itself (with a certain probability) to forward the event to the processes in its supertopic table. If it decides to act as a link between groups (to send the event to processes from its supertopic table), it sends the event to at least one super-process (processes 2 and 3 in the same figure). In the second step, it gossips the event in its own group, by sending it to processes from its topic table. The event moves up to the next supertopic group as long as there is a supertopic with interested processes. When an event reaches the root group, the processes receiving it gossip it only within their group (no super group, and thus no supertopic table, exists).
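The two steps above can be sketched as follows. The election probability `p_up`, the fanout parameter and the function signature are illustrative assumptions, not values fixed by the algorithm's paper:

```python
import random

def disseminate(event, topic_table, supertopic_table, send,
                p_up=0.5, super_fanout=1):
    """Sketch of daMulticast's two-step dissemination at one process."""
    # Step 1: with some probability, elect oneself as a link between
    # groups and push the event to at least one super-process.
    if supertopic_table and random.random() < p_up:
        targets = random.sample(sorted(supertopic_table),
                                min(super_fanout, len(supertopic_table)))
        for peer in targets:
            send(peer, event)
    # Step 2: gossip the event inside the own group. Processes in the
    # root group only ever execute this step (no supertopic table).
    for peer in topic_table:
        send(peer, event)
```

A process in the root group simply passes an empty supertopic table, so only the in-group gossip step runs.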

2.4 Support

In order to achieve its goals, daMulticast employs two supporting services: a membership service and a messaging service. The role of the membership service is to provide the contents of the topic tables. In order to do so, it must manage groups of processes corresponding to the topics of interest that exist in the system. For a process to take part in the daMulticast algorithm, it must first enter the system by joining a group. From that point on, the membership service is responsible for including the new member in the group. The membership service also needs support, namely a weakly connected overlay network that provides the location of the involved nodes. Since daMulticast assumes support for sending and receiving events, the messaging service is responsible for carrying the events from one process to another. This service should be able to ensure message traffic in general, from a sender process to a receiver process.

3 P-Grid

P-Grid is a peer-to-peer lookup system based on a virtual distributed search tree, similarly structured as standard distributed hash tables. Each peer in the network holds part of the overall tree. Every participating peer's position in the network is determined by its path, that is, the binary bit string representing the subset of the tree's overall information that the peer is responsible for. This mechanism partitions the overall search space (into which data items are mapped) into intervals for which different peers are responsible. Such a peer should be able to provide the addresses of all peers that hold an information item with a key value belonging to its corresponding interval. For example, the path of Peer 4 in Figure 3 is 10, so it stores all data items whose keys begin with 10. For fault tolerance, multiple peers can be responsible for the same path, for example, Peer 1 and Peer 6 (in P-Grid terms, they are called replicas). From the routing viewpoint, for each bit in its path, a peer stores a reference to at least one other peer that is responsible for the other side of the binary tree at that level. This means that for each possible prefix of its path, a peer has references to peers with the same prefix but a different continuation. Thus, if a peer receives a binary query string it cannot satisfy, it must forward the query to a peer that is closer to the result (has a longer common prefix with the query). In the given example, as all path lengths are equal to two, each routing table has two levels. A characteristic of P-Grid, in contrast to other DHT-based P2P systems, is the separation of concerns between a peer's identifier and its path. In P-Grid, peer paths are not determined a priori but are acquired and changed dynamically through negotiation with other peers as part of the network maintenance protocol.
Thus, a decentralized, self-organizing process constructs P-Grid's prefix-routing infrastructure and adapts it to a given distribution of data keys stored by the peers ([1]). The process is based on pair-wise interactions of peers in which they locally decide whether to modify the routing infrastructure (by path extension or retraction) in a given data key subspace, if the present data justifies such a modification. As a result, the shape of the (virtual) tree underlying the construction of routing tables will adapt to the data key distribution, achieving a uniform load distribution for peers with respect to storage.

Figure 3: Example P-Grid

The following section gives more information on the grid's construction.

3.1 P-Grid construction

As mentioned before, P-Grid peer paths are not determined beforehand, but result from an algorithm that constructs the virtual routing tree. The pair-wise operations composing this algorithm can cause peers to change their paths or to exchange information (about the data items they reference) and are presented below. Other (still pair-wise) operations cause peers to exchange routing-table information or to initiate interactions with previously unknown peers.

Balanced split: The peers check whether their paths are identical. If so, they extend their paths by complementary bits, i.e., split the common subspace covered by their paths, exchange their data correspondingly and reference each other for future query routing. This enables the refinement of the indexing structure into subspaces that are sufficiently populated with data.

Unbalanced split: The peers check whether one path is a proper prefix of the other. If so, the peer with the shorter path extends it by one bit complementary to the bit of the longer path at the same level. Then, the peers exchange the data corresponding to their updated paths and update their routing tables. This enables the refinement of the indexing structure into subspaces as in the previous case, but covers the frequently occurring situation that peers have already specialized to different degrees.

Balanced data exchange: The peers check whether their paths are identical. If so, they mutually replicate all data pertaining to their common path. This allows peers to take advantage of unused storage space to increase resilience.

Unbalanced data exchange: The peers check whether one path is a proper prefix of the other. If this is the case, the data from the peer with the shorter path that pertains to the longer path is moved to the peer with the longer path. Through this mechanism, multiple peers can become responsible for the same search subspace, without any limit on the number of such peers per subspace.
It will also bring P-Grid into a

stable state, in which the set of paths of all peers is prefix-free and complete (i.e., no peer's path is a proper prefix of another peer's path, and if there exists a peer with a path p, then there also exists another peer with the binary-inverted path). This guarantees full coverage of the search space and a complete partitioning of the search space among the peers. All data stored at a peer then matches its path.
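Under simplifying assumptions (paths as plain bit strings; data and routing-table exchanges left out), the two split operations could be sketched as:

```python
def interact(a_path, b_path):
    """Return the new (a_path, b_path) after one pair-wise interaction.

    Simplified illustration of P-Grid path negotiation, not the
    actual protocol code.
    """
    if a_path == b_path:
        # Balanced split: extend by complementary bits, partitioning
        # the common subspace between the two peers.
        return a_path + "0", b_path + "1"
    if b_path.startswith(a_path):
        # Unbalanced split: the shorter path is extended by the bit
        # complementary to the longer path's bit at that level.
        flipped = "1" if b_path[len(a_path)] == "0" else "0"
        return a_path + flipped, b_path
    if a_path.startswith(b_path):
        flipped = "1" if a_path[len(b_path)] == "0" else "0"
        return a_path, b_path + flipped
    # Paths already diverge: only routing-table and data exchanges
    # would happen (not modelled here).
    return a_path, b_path
```

For example, two fresh peers with empty paths perform a balanced split into "0" and "1"; a peer with path "1" meeting a peer with path "10" extends to "11".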

3.2 Search feature

A P-Grid search can start at any peer and is resolved based on the routing tables that each peer maintains. Upon receiving a query, the peer verifies whether it is responsible for that query. If it cannot resolve the query, it forwards it to the peer from its routing table that has the longest common prefix with the search query. This algorithm always terminates successfully, in the sense that this prefix-routing strategy will always find the location of a peer at which the search can continue (by completeness) and each time the query is forwarded, the length of the common prefix of the query and the peer's path increases.

Figure 4: A search query

The same P-Grid network is used to exemplify the search mechanism. In Figure 4, a search for a data item mapped to the key "100" is initiated at peer 6. This peer checks whether it is responsible for the key. Since it is not, it forwards the query to the peer from its routing table having the longest common prefix ("1") with the query: peer 5. Repeating the same steps, peer 5 forwards the query to peer 4, which is responsible for the query. As another example, Peer 2 forwards any query starting with "1" to Peer 4, which is in Peer 2's routing table and whose path starts with "1". Peer 4 can either satisfy the query or forward it to another peer, depending on the next bits of the query. If Peer 2 gets a query starting with "0" and the next bit of the query is "1", it is responsible for the query. If the next bit is "0", however, Peer 2 will check its routing table and forward the query to Peer 6, whose path starts with "00".
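Using the peer paths and routing tables of the example network, the prefix-routing search can be sketched as follows. The data structures mirror Figure 3; the function is an illustration, not P-Grid's actual code:

```python
# Peer paths from the example network: Peers 1 and 6 are replicas of "00".
PATHS = {1: "00", 6: "00", 2: "01", 3: "10", 4: "10", 5: "11"}

# ROUTING[peer] maps a prefix (the "other side" of the tree at some
# level) to a known peer responsible for it.
ROUTING = {
    1: {"1": 3, "01": 2},
    6: {"1": 5, "01": 2},
    2: {"1": 4, "00": 6},
    3: {"0": 2, "11": 5},
    4: {"0": 6, "11": 5},
    5: {"0": 6, "10": 4},
}

def search(start, key):
    """Follow routing references until a peer responsible for `key` is found."""
    peer, hops = start, [start]
    while not key.startswith(PATHS[peer]):
        # Forward to the table entry sharing the longest prefix with the key.
        prefix = max((p for p in ROUTING[peer] if key.startswith(p)), key=len)
        peer = ROUTING[peer][prefix]
        hops.append(peer)
    return hops

# search(6, "100") follows 6 -> 5 -> 4, as in Figure 4.
```

The prefix-free, complete path set guarantees that at least one routing entry always matches, so the loop always terminates at a responsible peer.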

4 Application design

This section presents the envisioned design for the whole implementation. It starts with a general description of the concepts that have to be managed (membership and communication support, and building daMulticast on top), with a possible logical structuring and some interactions. It actually presents more than just possible designs for the task at hand, as it goes from implementing daMulticast on top of P-Grid to an analysis of a publish-subscribe system that would take maximum advantage of P-Grid's capabilities. Furthermore, this section presents several possible alternative designs for the membership and communication support, respectively.

4.1 General design

As mentioned before, daMulticast relies on membership and communication (messaging) support. It assumes that the messaging part is ensured, no matter how, as long as it has access to a communication layer. For the membership support, daMulticast (in its original paper) actually uses a specific membership algorithm, SCAMP ([4]), which provides access to concepts such as groups and group members.

Figure 5: General design

Every design for the task at hand should consider this general design as a template and build on it to develop different approaches. Figure 5 presents this general design, with P-Grid included in a dashed box, as the final goal is to integrate it into the implementation. The box outline is dashed to reflect that no conditions are set on where in the architecture it can be used. Other issues in the analysis of the different designs are the type and quantity of data that each node has to keep, as shown in Figure 6. As part of the P-Grid network, each node holds a part of the virtual routing tree and information about the stored data. As part of the daMulticast implementation, each node also maintains topic tables.

Figure 6: Peers' data structures

The following subsections reflect the logical separation of concepts into messaging-related, membership-related and multicast-algorithm-related issues.


4.1.1 Messaging

The obvious need for messaging support comes from the need to send messages on the existing topics between the nodes that form the topic-based publish-subscribe network. Such a node must be able to send messages to other nodes (thus, node identification and location information must be managed) and to receive messages from others. It must also provide means to register listeners for incoming messages (and be able to route them to interested parties).

4.1.2 Membership

The key concept for the membership support is the group. It reflects the organization of nodes by the topic in which they are interested. All the nodes interested in (subscribed to) a topic must be in the same group. A node may be interested in several topics and thus be part of several groups. A node interested in topics that include one another has to subscribe only to the highest one in the topic hierarchy and will receive all interesting messages. The membership service must manage such groups, allowing processes to subscribe to and unsubscribe from a group, depending on their interest in the corresponding topic. It must also provide, upon request, a list of members of a certain topic group.

4.1.3 DaMulticast

This part of the global design is quite straightforward, as it has to provide the features of the daMulticast algorithm described in [3]. It is based on the functionality offered by the previously discussed supporting layers, with the purpose of offering a topic-based publish-subscribe system.
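The three concerns above could be captured by interfaces such as the following sketch; all class and method names are hypothetical, not the actual implementation's classes:

```python
from abc import ABC, abstractmethod

class Messaging(ABC):
    """Hypothetical messaging-layer interface."""
    @abstractmethod
    def send(self, peer_id, message): ...
    @abstractmethod
    def register_listener(self, callback): ...  # upcall for incoming messages

class Membership(ABC):
    """Hypothetical membership-layer interface."""
    @abstractmethod
    def subscribe(self, topic): ...
    @abstractmethod
    def unsubscribe(self, topic): ...
    @abstractmethod
    def members(self, topic): ...  # partial view of the topic group

class DaMulticast(ABC):
    """Hypothetical daMulticast-layer interface, built on the two above."""
    @abstractmethod
    def publish(self, topic, event): ...
    @abstractmethod
    def deliver(self, event): ...  # upcall when an event arrives
```

Keeping the layers behind interfaces like these is what allows the membership implementation to be swapped (SCAMP-based vs. P-Grid-based) without touching the daMulticast layer.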

4.2 Possible variations of the general design

During the design phase, different approaches were considered, mainly for the membership layer; these approaches are described separately below. As for the messaging support, a decision had to be made between using P-Grid's communication mechanisms and having a dedicated, separate support layer. The latter was preferred, not only for clarity (to separate the implementation of daMulticast from the implementation of the underlying P2P network), but also because the membership layer might need messages of its own (and, again, to keep the membership layer separate from other implementations). The next subsections describe the solutions for the membership layer.

4.2.1 A faithful approach

In the original paper, daMulticast assumes that the membership service is provided by an underlying gossip-based membership algorithm, the one presented in [4]. This first design is a faithful approach, that is, the architecture reflects exactly the original architecture: there is a messaging support (not specified in the paper, so it will be the one presented before) and membership support through SCAMP. Figure 7 reflects this architecture. As such, each daMulticast node that subscribes to a given topic participates in the membership algorithm for that topic. Moreover, the nodes interacting with each other as a result of the algorithm requirements are what populates the daMulticast topic tables. SCAMP is a peer-to-peer membership protocol that operates in a fully decentralized manner and provides each member with a partial view of the group membership. The protocol is self-organizing, in the sense that the size of partial views naturally (through the algorithm) converges to the value required to support a gossip algorithm reliably. This value is a function of the group

Figure 7: A faithful approach

size (actually of the order of log(n), where n is the total group size), but is achieved without any node knowing the group size. Each new node starts by contacting one of the nodes in the group (so its initial view of the group consists only of its contact) and then participates in the algorithm (receiving and routing join and leave requests from other nodes). Through this mechanism, as the results in [4] showed, the mean partial view size is quite close to log(n), but actual view sizes vary from 1 (only the contact node) to about 4 times the mean value. Still, one algorithm requirement remains unsolved: finding a "contact" of a group, the starting point of the SCAMP algorithm. The design delegates this issue to P-Grid, whose powerful search capabilities can easily provide a node in the network pertaining to a certain group. Nodes have to register themselves in P-Grid with the topic(s) in which they are interested. In this approach, the normal registration of P-Grid is considered, through its usual mapping of data to a key, which is then stored in the P-Grid tree. As mentioned, in order to find a node interested in a certain topic using P-Grid, the node has to register in the network. The first step is to map the topics into P-Grid keys. Independently of the mapping algorithm, the same topic will always map to the same key. Then, subscription to and unsubscription from a group translates directly into registering or unregistering topic interests in the network. From P-Grid's viewpoint, all topics are in the system all the time (at any point, it can route a query on any given key), only subscribers being present or not in the network. The registration process is a common point between all discussed approaches, the variation coming from the mapping functions that are used or from internal P-Grid management issues.
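A deterministic topic-to-key mapping could be sketched as follows, assuming (hypothetically) that topics are hashed to fixed-length binary strings P-Grid can route on; the hash function and key length are illustrative choices, not the implementation's actual mapping:

```python
import hashlib

def topic_key(topic, bits=16):
    """Map a topic name to a binary key string of `bits` bits.

    Illustrative sketch: SHA-1 and the 16-bit truncation are
    assumptions, not P-Grid's actual key mapping.
    """
    digest = hashlib.sha1(topic.encode("utf-8")).digest()
    as_int = int.from_bytes(digest[:4], "big")
    return format(as_int, "032b")[:bits]
```

Because the mapping is deterministic, registering a subscriber under a topic and later searching for that topic's subscribers route to the same region of the P-Grid tree.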
Reflecting the configuration presented so far, an example network of six nodes could look as in Figure 8, which shows the data stored at each peer under each peer id. Pertaining to P-Grid management are the routing and data storage fields, which contain information about the virtual routing tree and the data mapped at this peer, respectively. The last field corresponds to the topic tables that daMulticast uses to disseminate topic messages; these tables are filled, in this case, by the SCAMP membership algorithm.

Life cycle

The life cycle of a node in the topic-based publish-subscribe system is to subscribe to a topic, possibly publish messages, receive messages that others published on the topic, and finally unsubscribe from the topic. In this approach, the life cycle translates to:

- Subscription: Register the topic interest in P-Grid, then ask it for a contact and start the SCAMP membership algorithm.

- Publish: Publish the message through daMulticast (based on the topic tables, filled by SCAMP). The node will also receive messages that other nodes published through daMulticast.

Figure 8: Example network

- Unsubscription: Unregister the topic interest in P-Grid (after this, P-Grid will no longer provide it as a contact node for new subscribing nodes) and leave the topic group through SCAMP (SCAMP will remove it from other nodes' topic tables).

Advantages and drawbacks The advantage of this solution is the clean separation between daMulticast and the rest. It depends only on the messaging service, which is independent (does not depend on further components), and on the membership service. As described, the latter depends, but only loosely, on P-Grid, which provides it with a first contact. The drawback of the solution is that it does not use P-Grid as much as it could (P-Grid can directly provide the partial views required at each peer). One of the versions of the application implements this design.

4.2.2 Membership based on P-Grid search

A second design choice considers more involvement from P-Grid in solving the membership service, thus better exploiting it. It relies entirely on the underlying P2P network to provide group management and no longer uses the original membership algorithm, because the membership service is available based on P-Grid. The P-Grid registration step described in Section 4.2.1 remains the first action that every node has to take. However, the topic table management relies on querying P-Grid for members of the corresponding topics, as in Figure 9.

Figure 9: Membership based on P-Grid search

For this approach to be consistent with the daMulticast requirements (i.e. that the topic tables should probabilistically cover the whole group, as the SCAMP algorithm ensures), the search function should return randomly selected results from the network. Because of the random factor that appears in the routing of each search through the P-Grid search tree, a random choice does appear, but from the topic tables' viewpoint this is not enough (they must contain a certain number of entries, i.e. log(n)). The results returned should also be randomly chosen from the available data.

As presented in Section 3, the P-Grid tree can have multiple peers responsible for the same key (called replicas in P-Grid terms). From the membership requirements point of view, these P-Grid replicas do not have to be consistent with each other, but collectively they must have the complete information about the members of the corresponding group. In the routing mechanism of a search, when several P-Grid replicas are available, the routing choice is random (one of those available is randomly selected to forward the query), thus the mechanism will access each of them with about the same probability. Furthermore, the special search function required by daMulticast should return randomly chosen peers from the whole collection of matching peers. Because a search query in P-Grid tends toward finding all matching data, a daMulticast-dedicated query should bound the number of results to log(n), where n is the size of the topic group. Such a function can easily be implemented based on the existing search capabilities (it should wait for the search to end and then return a random partial view of the required size) and would allow a better comparison with the SCAMP membership (this is the reason for returning log(n) peers).

This alternative membership service should also manage the inclusion of new peers in and the exclusion of departing peers from the topic tables. Thus, it will have an update mechanism that refills the topic tables after given time intervals or when they become outdated. In this way, it has all the features of the original membership algorithm, with the advantage of a better group view size (the query will return exactly log(n) peers, as compared to the SCAMP case, where the view sizes are only expected to be around this value). This approach uses P-Grid to provide the required membership information, through the specially created search function. Because P-Grid's objective is to return a complete view of the existing data, it is more natural to use it in this way than just to provide an initial contact.
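The post-processing step of the dedicated search function (wait for the search to end, then return a random partial view of log(n) peers) can be sketched as follows; the class and method names are illustrative, not taken from the implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the dedicated-search post-processing described above: given the
// full result set of a P-Grid search for a topic, return a random partial
// view of size log2(n), where n is the number of matching peers.
public class PartialView {

    // Return min(log2(n), n) randomly chosen peers from the full result set.
    public static List<String> randomView(List<String> searchResults) {
        int n = searchResults.size();
        int size = Math.max(1, (int) Math.ceil(Math.log(n) / Math.log(2)));
        List<String> copy = new ArrayList<>(searchResults);
        Collections.shuffle(copy); // uniform random permutation
        return copy.subList(0, Math.min(size, n));
    }
}
```

For a topic group of 16 peers, this returns a view of 4 randomly chosen members, matching the log(n) size the text compares against SCAMP.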

Figure 10: Example network

To reflect the changes implied by having the membership based solely on P-Grid, Figure 10 illustrates them on the same example network, with the data stored at each peer. Pertaining to P-Grid management are the routing and data storage fields, while the last field corresponds to the topic tables that daMulticast needs, which in this case are filled with the results returned from P-Grid searches.

A mixture between these first two approaches is also possible, like initializing the topic tables through P-Grid's special search function at each new join and then allowing the original membership algorithm to maintain the tables (no update from P-Grid). Another issue solved more easily by a hybrid approach is the recovery from the isolation that can occur in the SCAMP membership; in that case, a P-Grid search can be used to update the tables.

Life cycle

The life cycle of a node in this approach (membership based on P-Grid, not the variation presented in the previous paragraph) translates to:

- Subscription: Register the topic interest in P-Grid, then ask it for peers interested in the same topic and in the super-topic, to fill the topic tables. Update them at given time intervals, by using the same P-Grid function.

- Publish: Publish the message through daMulticast (based on the topic tables, filled by P-Grid's special search function). The node will also receive messages that other nodes published through daMulticast.

- Unsubscription: Unregister the topic interest in P-Grid (after this, P-Grid will no longer provide it as one of the results in subsequent searches). The node needs to take no further action, as the subsequent updates of the other nodes' topic tables will remove it from those tables.

Advantages and drawbacks This approach has the advantage of exploiting P-Grid better, more easily and more efficiently than in the first case, while requiring only minor modifications to the daMulticast algorithm. Another change is that, with this approach, daMulticast no longer depends on the membership algorithm of [4]. It depends directly on P-Grid, by filling the topic tables with the results returned from the new dedicated search function. The drawback is (perhaps) a longer time until the topic tables are updated, compared to the SCAMP algorithm. In addition, the implementation of the dedicated search function might raise further problems (how long to wait for a search to end, so as to get a partial view of a size consistent with the algorithm, but without making the daMulticast algorithm stall). A second version of the application implements a variation of this approach, in the sense that it uses only already implemented features of P-Grid (thus, no special search function; the membership service randomizes and bounds the partial view).

4.2.3 Dissemination based on P-Grid replica management

Another design emerged from the existence of P-Grid replicas (multiple peers responsible for the same key, see [2]). If these replicas have means of gossiping between themselves, the only thing a publisher of a topic message needs is to be able to communicate with any one of these replicas. Then, it would simply publish its message to that replica. In this configuration, daMulticast is no longer employed in its original version; only the goal (data-aware, topic-based publish-subscribe) is common. There are no more topic tables, but there still exists the need for membership and messaging services (Figure 11). Of course, interested nodes still have to register in P-Grid through the mechanism described in Section 4.2.1. Nevertheless, no further information is required from P-Grid; there are no more searches for group members to fill topic tables, but only one search, for a P-Grid replica responsible for the considered topic.

Figure 11: Dissemination based on P-Grid replica management

To overcome a possible drawback, the function employed in mapping topics to P-Grid keys keeps no relationship between the keys that would reflect the inclusion relationship between topics. In this way, topics including one another map to unrelated parts of the virtual P-Grid search tree.

Because this approach deals with nodes at another level than the previous approaches (which were working at the subscribers' level), the two roles of each node in the network must be presented. As part of the publish-subscribe system, all nodes are subscribers (though in the current approach, they could also be only publishers). As part of the P-Grid system, they are delegates (they keep information about subscribers of the topics that map to the key they are responsible for). A node can receive a message in its subscriber role, because it is interested in the message topic, or it can receive a message in its delegate role, and then it has to filter the message (it forwards the message only to the topic subscribers that it is a delegate for) and gossip it to its replicas (which are all delegates for the same topic). The publish-subscribe logic is tightly coupled with the P-Grid replica management, the forwarding of messages toward super-topics also having to be included at this level. This logic can construct the hierarchy of including topics and forward the message to all of them.
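The delegate-side filtering just described could look roughly as follows. Since this design was ultimately not implemented, all names and structures here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of a delegate's role in the replica-based design: on
// receiving a published message it delivers it only to the subscribers it is
// a delegate for, at most once per message.
public class Delegate {
    // topic -> subscribers this peer is a delegate for
    private final Map<String, Set<String>> subscribers = new HashMap<>();
    private final Set<String> seenMessageIds = new HashSet<>();
    public final List<String> delivered = new ArrayList<>();

    public void register(String topic, String subscriber) {
        subscribers.computeIfAbsent(topic, t -> new HashSet<>()).add(subscriber);
    }

    // Filter: deliver only to subscribers of the message's topic, once.
    public void onPublish(String messageId, String topic) {
        if (!seenMessageIds.add(messageId)) return; // already handled
        for (String sub : subscribers.getOrDefault(topic, Collections.emptySet())) {
            delivered.add(sub + ":" + topic);
        }
        // here the message would also be gossiped to the other replicas
        // and forwarded to a delegate of each including super-topic
    }
}
```

The duplicate check models the fact that replica gossiping can hand the same message to a delegate several times.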

Figure 12: Example network

The same example is used to show the changes in the data structures implied by this approach. Figure 12 presents an extra field for each leaf, used in the replica management process.

Life cycle

The life cycle of a node in this approach translates to:

- Subscription: Register the topic interest in P-Grid (one or several delegates will then reference the new subscriber).



- Publish: Get from P-Grid one of the replicas for the publishing topic. Publish the message at this replica (which, in turn, will gossip it in its replica group, but also send it to a super-topic delegate). Subscribers will receive topic messages from their delegates.

- Unsubscription: Unregister the topic interest in P-Grid (through P-Grid's mechanisms, no delegate will reference this node anymore).

Advantages and drawbacks The main advantage of this approach is that both publishers' and subscribers' lives are much easier (a publisher does not even have to be a subscriber of the topic on which it wants to publish). Moreover, the amount of information required from P-Grid is also small (just one of the P-Grid replicas has to be found for the publishing topic). However, the supplementary logic required from P-Grid is more complex (it must also trace all the topics that include the publishing topic and send the message to a delegate for each of those topics). Another drawback is that P-Grid replicas become potential bottlenecks. An increase in the number of P-Grid replicas for a key, correlated with a decrease of the data stored at each replica, can solve this problem. But then the system can end up having too many nodes (the delegates) receiving messages they are not interested in (there is no relationship between the delegate for a topic and the topic itself). Because of the gossiping at the delegates' level and the fact that subscribers can be referenced by multiple delegates, topic messages will be sent multiple times, both in replica gossiping (not an issue with the first two approaches) and in subscriber dissemination (same as the previous approaches). Overall, this design was not considered for implementation, due to its large number of drawbacks.

4.2.4 Having a hierarchical topic tree in the leaves of the P-Grid tree

A fourth approach came from the idea to map the topic hierarchy into P-Grid's tree hierarchy (Figure 13). How could the mapping of groups to leaves reflect the inclusion relationship existing between the topics? This can be done by imagining super-topic nodes not in the leaves of the tree, but at upper levels in the tree. In this way, a publisher would only have to publish a topic message in the corresponding topic node and the message would naturally travel higher in the tree, while the peers at each level have the responsibility to disseminate the message in their own group.

Figure 13: Having a hierarchical topic tree

As no peers exist in the interior nodes of the P-Grid search tree (after all, it is a virtual tree; it represents the query routing decisions), this approach needs a way to translate such a positioning. The simplest way is to map topics that include one another into keys that are in a prefix relationship (the key of a super-topic would be a prefix of all its subtopics' keys). Then, the action of "going higher in the tree" actually translates into bigger delegate groups to gossip the message, because the search area includes more and more leaves of the search tree.


Again, each delegate will filter the message toward its subscribers and then gossip it to the other delegates, both for the topic and for the including super-topics (because of the prefix relationship, it will have in its routing table references to peers responsible for other keys starting with the same prefix).
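The prefix convention can be stated as a one-line predicate; the keys used here are examples, not the output of a real mapping function.

```java
// Minimal sketch of the prefix convention assumed in this design: the key of
// a super-topic is a prefix of all its subtopics' keys, so "going higher in
// the tree" widens the dissemination area.
public class PrefixTopics {

    // True if the peer responsible for 'key' lies in the dissemination area
    // of the (super-)topic mapped to 'topicKey'.
    public static boolean inSearchArea(String key, String topicKey) {
        return key.startsWith(topicKey);
    }
}
```

For instance, a peer responsible for "00" is in the search area of the topic mapped to "0" and of the root topic (the empty prefix), but a peer responsible for "10" is not in the area of "0".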

Figure 14: Topic search area

The general idea is presented in Figure 14, where the mapping of super-topics into the leaves of their subtopics determines wider and wider publishing areas for super-topics. A message published on a topic that maps to "00" would have only peers 1 and 6 disseminating it for this subtopic. Furthermore, since the message also has to arrive at the topic mapped to "0", which is the direct super-topic, in the next round peers 1, 6 and 2 will have to disseminate the message.

Life cycle

In this approach, the life cycle of a node would translate to:



- Subscription: Register the topic interest in P-Grid (one or several delegates will then reference the new subscriber).



- Publish: Get from P-Grid one of the delegates for the publishing topic. Publish the message at this delegate (which, in turn, will gossip it in its replica group, but also send it to delegates whose keys have the same prefix, and which might thus be delegates for subscribers interested in the super-topics; the initial delegate has the other delegates in its routing table). Subscribers will receive topic messages from their delegates.

- Unsubscription: Unregister the topic interest in P-Grid (through P-Grid's mechanisms, no delegate will reference this node anymore).

Advantages and drawbacks The main advantage is the simple and natural translation from the topic hierarchy to the P-Grid search tree. However, the drawbacks are even more numerous than in the previous solution, because of the prefix relationship between keys. At the present time, a well-defined hashing function that would take care of this inclusion relationship does not exist, and it would also have to take into account the impact on the P-Grid tree's depth and shape. To disseminate messages all the way up to the root topic, the dissemination area could span the whole network in search of delegates, and only after this step would the delegates perform the filtering. Thus, it would tend to turn into a broadcast at the delegates' level, and the fact that at the subscribers' level no parasite message is sent becomes unimportant. Because of the gossiping at the delegates' level and the fact that subscribers can be referenced by multiple delegates, this approach will also cause topic messages to be sent multiple times, both in replica gossiping (not an issue with the first two approaches) and in subscriber dissemination (same as the previous approaches). This solution was also not considered for the implementation phase, due to the broadcast-like dissemination of the messages for the root topic at the replica level.

5 Implementation

This section presents the implementation details of the project. The implementation has two versions, which differ in the membership service, provided either by an implementation of SCAMP (as the faithful design in Section 4.2.1 described it) or entirely by P-Grid (the approach of the design in Section 4.2.2). It starts with a general overview that describes the main packages and their interactions. Then, it continues with presentations of each logical block of functionality, namely the support blocks (membership and communication) and the algorithm implementation, the daMulticast package. Each section will underline the problems encountered and their solutions.

5.1 General overview

The general design view presented in Section 4.1 maps to the general implementation overview of Figure 15. It is made of the top-level packages daMulticast, peer2peer and pubsub and shows the dependency relationships between them, as well as the packages which implement the interface packages. The daMulticast package encapsulates all the logic necessary to implement the daMulticast algorithm. It depends on the peer2peer package, which offers the membership and communication primitives, and on the pubsub package, which offers the topic-based publish-subscribe primitives.

Figure 15: Package overview

The peer2peer package (Figure 16) encapsulates both the membership and the communication primitives and provides means of node localization (needed for message exchange between nodes). The LocalPeer interface provides means of accessing the local node and its allocated resources (communication and membership related). The daMulticast implementation relies on the LocalPeer to interconnect with the supporting services. The RemotePeer interface provides means of addressing other nodes, in order to communicate with them (to exchange messages). The membership interface offers common group management functionality and its implementation delegates membership operations to the underlying P2P system, in this case P-Grid. The communication interface provides the primitives for bidirectional communication between peers. It receives messages (modeled by the Message interface) from other peers in the network through the InputPipe interface, which manages a list of interested listeners (based on the PipeListener interface) towards which it dispatches the received messages. Locally issued messages are sent to other (remote) peers through a dedicated OutputPipe.

Figure 16: The peer2peer package

The pubsub package (Figure 17) contains the topic-based publish-subscribe primitives, modeled by the Topic and TopicSubscriber interfaces, which allow publishers to publish topic events and subscribers to receive topic events, respectively. Topic management requires the presence of a local TopicManager to retain the existing topics at its node and the presence of TopicSubscribers that are interested in the topics. A similar registration mechanism manipulates the dynamic relationship between the local topics and their (local) subscribers, as described in Figure 15. Each node has one TopicManager, obtained from the LocalPeer; the manager knows all local topics and each of them knows all its local subscribers. The publish-subscribe level also deals with messages, which, in this case, encapsulate the actual topic-related content.

Figure 17: The pubsub package

5.2 Communication

The communication package is totally independent. It deals with the concept of "pipes" (as the names of the encapsulated interfaces suggest) between the local peer, at one end, and remote peers, at the other end. The two types of pipes are the InputPipe, which receives messages, and the OutputPipe, which sends messages to other peers. In the implementation, each LocalPeer has exactly one InputPipe and can open as many OutputPipes toward remote peers as it needs. Figure 18 shows, through a sequence diagram, the steps required for the local peer to send a message to a remote peer. First, the local peer must have a reference to the pipe manager, which knows how to create an output pipe. Then, it asks the pipe manager to provide it with an output pipe toward a specific remote peer. The pipe manager returns such a reference (obtained by calling one of the OutputPipe's constructors) and the local peer sends its message and subsequently closes the output pipe.
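An OutputPipe implementation along the lines described above might look roughly like the following sketch. The field and method names follow the description in the text, but the body is an assumption, not the project's code.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.net.Socket;

// Hedged sketch of an output pipe: one java.net.Socket per remote peer,
// opened on demand and closed after the message has been written, as the
// sequence in the text (getOutputPipe, sendMessage, close) suggests.
public class SimpleOutputPipe {
    private final String host;
    private final int port;

    public SimpleOutputPipe(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Open a socket to the remote peer, write one serialized message, close.
    public void sendMessage(Object message) throws IOException {
        try (Socket socket = new Socket(host, port);
             ObjectOutputStream out =
                 new ObjectOutputStream(socket.getOutputStream())) {
            out.writeObject(message);
        }
    }
}
```

The try-with-resources block plays the role of the explicit close() call in the sequence diagram.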

Figure 18: The usage of OutputPipe

Things are somewhat more complicated with the input pipe. Each local peer has only one input pipe, which receives all incoming messages. The desired behavior states that processes should not be concerned with messages in which they have no interest and that they should not block while waiting for a message to arrive. The following solution solves this problem: every entity interested in messages must implement the PipeListener interface, whose only method (the onMessage(message) method) is called whenever a new message is available, and register itself as a listener to the unique input pipe which receives the new messages. To avoid the input pipe blocking while it dispatches the received messages to its listeners, a MessageDispatcher fulfills the task of periodically checking the received message queue and sending the messages to the interested listeners.

Figure 19 depicts the typical use of the input pipe. An interested listener first requests from the local peer the reference of the input pipe (instantiated in the initialization phase). Then, it registers itself as a pipe listener. Whenever the input pipe receives a message, it deposits the message in the received messages queue. The associated message dispatcher (created by the input pipe in the initialization phase) periodically checks the queue and sends the message to all listeners that have registered. Thus, the interested listener will receive the message. When it is no longer interested, it unregisters itself from the input pipe, which ensures that it will not receive any more messages.

Figure 19: The usage of InputPipe

In the actual implementation, output pipes use a simple socket (java.net.Socket) to connect to a remote peer and send it a message. The input pipe, however, uses a ServerSocketChannel (from the java.nio.channels package), which can open (and manage) a dedicated socket for every incoming request, without blocking on any of them.
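The listener/dispatcher pattern just described can be sketched as follows. This is a simplified, single-threaded rendering with illustrative names; in the real implementation the MessageDispatcher runs in its own thread and checks the queue periodically.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of the input-pipe pattern: listeners register with the single input
// pipe, received messages are only enqueued, and a dispatcher step drains
// the queue toward all registered listeners so the pipe never blocks.
public class SimpleInputPipe {
    public interface PipeListener { void onMessage(String message); }

    private final Queue<String> received = new ConcurrentLinkedQueue<>();
    private final List<PipeListener> listeners = new CopyOnWriteArrayList<>();

    public void register(PipeListener l)   { listeners.add(l); }
    public void unregister(PipeListener l) { listeners.remove(l); }

    // Called by the network layer when a message arrives: only enqueue.
    public void deliver(String message) { received.add(message); }

    // Called periodically by the MessageDispatcher thread.
    public void dispatchPending() {
        String m;
        while ((m = received.poll()) != null) {
            for (PipeListener l : listeners) l.onMessage(m);
        }
    }
}
```

After unregistering, a listener receives no further messages, matching the behavior shown in Figure 19.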

5.3 Membership

As mentioned before, the membership package only offers the basic requirements of a membership service. It is the interface between the actual membership service that daMulticast needs and P-Grid, and it is the starting point for both membership implementations (based entirely on P-Grid and based on P-Grid and SCAMP, respectively). Every design presented in Section 4 initializes daMulticast at each node by registering to the topics of interest. When a topic is no longer of interest, the node must unsubscribe from that topic. This registration process translates, in membership terms, to the joinGroup() and leaveGroup() methods (Figure 20), which inform P-Grid about new or parting members. At any point, the membership manager can provide members of a given group, through the searchPeers() method. Indeed, these are all the primitives a higher-level membership service needs: the means to specify group affiliation and to retrieve group members.

Figure 20: The membership package

The local peer implementation creates a membership manager, which initializes a P-Grid node during the creation phase. As a result, the local peer becomes a node in the P-Grid network. To become a node in the daMulticast network, it must register its topics of interest. For this task, the node uses P-Grid, so a group join means that the node shares a file in the P-Grid network that has the name of the group. All nodes that subscribed to the same topic share a file with the same name (the topic name). Similarly, to leave a group, a node removes the corresponding file from its shared folder. Then, group membership is transparent to daMulticast, as the underlying peer-to-peer network is able to find all the files with the group name. The searchPeers() method takes advantage of this behavior to retrieve group members.

The membership package implements a minimal functionality and acts as an abstract interface for further (higher-level) implementations. A concrete membership service can either build upon a first-contact strategy (which SCAMP does), or upon repeated usage of the searchPeers() method (the approach of the membership based entirely on P-Grid).
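The file-based join/leave mechanism described above (and visible in the code snippet of Figure 20, which creates and deletes a file named group + ".xml") can be sketched as follows; the local shared-folder handling is a stand-in for P-Grid's shared folder.

```java
import java.io.File;
import java.io.IOException;

// Sketch of the file-based membership mechanism: joining a group means
// sharing a file named after the group, leaving removes it. In the real
// implementation the file lives in the folder shared through P-Grid.
public class FileMembership {
    private final File sharedFolder;

    public FileMembership(File sharedFolder) { this.sharedFolder = sharedFolder; }

    public void joinGroup(String group) throws IOException {
        new File(sharedFolder, group + ".xml").createNewFile();
    }

    public void leaveGroup(String group) {
        new File(sharedFolder, group + ".xml").delete();
    }

    // Local check; in the network, searchPeers() finds all peers sharing
    // a file with the group name.
    public boolean isMember(String group) {
        return new File(sharedFolder, group + ".xml").exists();
    }
}
```

Once every subscriber shares such a file, membership becomes transparent to daMulticast: a P-Grid search for the group name returns exactly the sharing peers.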

5.4 The publish-subscribe primitives

The pubsub.dam package implements the publish-subscribe primitives needed in the daMulticast system. They relate to the topics in the system and to the topic messages, but also to the management of topics at each peer, which TopicManagerImpl ensures (Figure 21). TopicSubscribers then simply subscribe to the topics that interest them.

Figure 21: The dam package

The topic manager manages topics by mapping topic names to actual topic instances, one per topic name. This mechanism gives the manager a complete and easily maintainable view of the topics in the system. Interested subscribers request the topic instances from the topic manager, in order to subscribe to the corresponding topic; thus all subscribers for a topic retrieve, and subscribe to, the same topic instance. The topic manager implements the PipeListener interface and, in addition, manages incoming messages. First, it checks if they are new messages (not already received) and then distributes them to the corresponding topic. Each topic (referenced by the topic manager) knows its subscribers and executes the daMulticast algorithm for incoming (from the topic manager) and published (from the topic subscribers) messages. The next section describes the implementation of the daMulticast algorithm.
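The two responsibilities just described (one topic instance per name, duplicate filtering of incoming messages) can be sketched as follows; the class layout is an assumption for the example, not the actual TopicManagerImpl.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the topic manager behavior: map topic names to a
// single shared instance, and accept each incoming message id only once.
public class SimpleTopicManager {
    private final Map<String, Object> topics = new HashMap<>();
    private final Set<String> seenMessageIds = new HashSet<>();

    // All subscribers asking for the same name get the same instance.
    public Object getTopic(String name) {
        return topics.computeIfAbsent(name, n -> new Object());
    }

    // Returns true only the first time a given message id is seen;
    // duplicates are dropped instead of being distributed to the topic.
    public boolean acceptMessage(String messageId) {
        return seenMessageIds.add(messageId);
    }
}
```

The shared-instance property is what lets every local subscriber of a topic receive the messages distributed to that single topic object.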

5.5 daMulticast

Figure 22 presents the damulticast package, which includes the implementation of the daMulticast algorithm (in the DaMulticast class) and of the related topic tables (in the TopicTable class). The other classes in the package serve the actual membership service, which fills the topic and supertopic tables. DaMulticast includes the two topic tables it uses, which are both instances of the TopicTable class but employ different algorithms to initialize and update their content. For the topic table, it is best if the algorithm can profit from the features of the membership algorithm. The supertopic table has only one content update algorithm, which provides it with the required remote peer population.

Figure 22: The damulticast package

From a structural viewpoint, topic tables are lists of remote peers, so they provide means of adding and removing remote peers, requesting a given peer, and so on. The implementation extends the list concept with the specification of the content algorithm used by each type of topic table. The following subsections present the logical components of the algorithm and the differences between the two considered membership service implementations.

5.5.1 Topic table

The topic table of every instance of DaMulticast contains references to remote peers that are interested in the same topic as the one the parent instance manages. When DaMulticast creates the topic table (in its initialization phase), it also provides the table with the specific membership algorithm the table must use. In this way, it chooses between the two implemented membership services.

SCAMP solves the membership problem in the first version of the implementation. The nodes exchange various membership-related messages, to announce a join or a departure or to forward a new subscription, in addition to the topic-related messages. Based on the first type of messages, the topic table includes or excludes peers, which receive (through the dissemination process) messages of the second type. Besides the two daMulticast topic tables, the SCAMP algorithm uses a third one, called the inview table, which contains the remote peers that reference the local peer in their topic tables. This last table's content is changed based on inview messages (stating that a peer should be added to the receiver's inview table) and messages that announce a departure from the group. In the SCAMP-based membership implementation, the searchPeers() method is no longer used; instead, the joinGroup() method from the MembershipManager class returns a contact for the newly subscribed group. The membership algorithm starts to fill the topic table and the inview table, using the contact as a starting point. The table size tends toward a value proportional to the logarithm of the group size.

The P-Grid based membership service relies on an update algorithm, which provides content for the topic table based on the searchPeers() method from the MembershipManager class. At given time intervals, the update algorithm repeats the method call, to have an updated view of the system (to ensure that the topic table includes newly joined nodes and excludes departed nodes).
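For reference, the core SCAMP subscription rule from [4] that fills such tables can be sketched as follows: a node receiving a forwarded subscription keeps it with probability 1/(1 + view size) and otherwise forwards it to a random member of its view. This is a simplified, illustrative rendering, not the project's SCAMP code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hedged sketch of the probabilistic keep-or-forward rule of SCAMP [4].
// All names are hypothetical; only the 1/(1 + view size) rule is from SCAMP.
public class ScampNode {
    public final List<String> topicTable = new ArrayList<>();
    private final Random random;

    public ScampNode(long seed) { this.random = new Random(seed); }

    // Returns the peer to forward the subscription to,
    // or null if the subscription was kept locally.
    public String onForwardedSubscription(String newPeer) {
        if (random.nextDouble() < 1.0 / (1 + topicTable.size())) {
            topicTable.add(newPeer);
            return null; // kept: newPeer is now in this node's partial view
        }
        // forward to a uniformly chosen member of the current view
        return topicTable.get(random.nextInt(topicTable.size()));
    }
}
```

A node with an empty view always keeps the subscription (probability 1), which is what bootstraps the first contact's table.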

Supertopic table
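The SCAMP subscription-forwarding rule that fills such a partial view can be sketched roughly as follows. This is an illustrative reading of the algorithm in [4], not the actual implementation: the class and method names (ScampSketch, onSubscription) are made up for this sketch, and the real protocol also involves forwarding multiple copies of a subscription and the inview bookkeeping described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch of SCAMP-style subscription handling [4]:
// a node keeps a forwarded subscription with probability 1/(1 + view size),
// otherwise it forwards the subscription to a random member of its view.
class ScampSketch {
    private final List<String> view = new ArrayList<>();   // partial view (topic table)
    private final Random random;

    ScampSketch(Random random) { this.random = random; }

    // Returns the peer the subscription is forwarded to, or null if kept locally.
    String onSubscription(String newPeer) {
        double keepProbability = 1.0 / (1 + view.size());
        if (!view.contains(newPeer) && random.nextDouble() < keepProbability) {
            view.add(newPeer);                             // integrate the subscriber
            return null;
        }
        if (view.isEmpty()) {                              // nobody to forward to: keep it
            view.add(newPeer);
            return null;
        }
        return view.get(random.nextInt(view.size()));      // forward to a random neighbour
    }

    List<String> view() { return view; }
}
```

Because the keep probability shrinks as the view grows, view sizes across the group tend toward a value proportional to the logarithm of the group size, which is what makes the table size scale with the group.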

The supertopic table references nodes interested in the supertopic of the topic managed by the parent DaMulticast instance. Its content corresponds to the "super processes" presented in Section 2. Unlike the topic table, the supertopic table's size does not vary with the size of the corresponding group; it has a constant, configurable size. When DaMulticast creates the supertopic table, it first has to find the super topic of its own topic (through the getSuperTopic() method of the TopicManager). Then, it provides the table with an update algorithm similar to the one used in the P-Grid based approach for solving the membership. The main difference is that the number of results returned by the searchPeers() method is bounded by the constant that defines the supertopic table size.

5.5.3 Dissemination in own topic
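Both the P-Grid based topic table and the supertopic table are kept fresh by periodically re-running the search. A minimal sketch of such a refresh step is shown below; the Supplier stands in for the searchPeers() call, the size bound models the supertopic table's constant, and all names here are illustrative assumptions rather than the actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Sketch of the periodic view refresh: the table is rebuilt from the peers
// returned by a search (standing in for MembershipManager.searchPeers()),
// truncated to a configurable maximum size, as for the supertopic table.
class RefreshingTable {
    private final List<String> peers = new ArrayList<>();
    private final Supplier<List<String>> search;   // stand-in for searchPeers()
    private final int maxSize;                     // bound (supertopic table size)

    RefreshingTable(Supplier<List<String>> search, int maxSize) {
        this.search = search;
        this.maxSize = maxSize;
    }

    // Called at a fixed time interval: newly joined peers appear,
    // departed peers disappear, and the size bound is enforced.
    void refresh() {
        List<String> found = search.get();
        peers.clear();
        peers.addAll(found.subList(0, Math.min(maxSize, found.size())));
    }

    List<String> peers() { return peers; }
}
```

In a real deployment the refresh() call would be driven by a timer (for example a ScheduledExecutorService), with the interval chosen as the trade-off between view freshness and search traffic.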

The dissemination logic in daMulticast has two parts: one disseminates a topic message in the node's own topic group, and the other disseminates it in the super topic group. This section describes the implementation of the first part, while the next section presents the second.

Figure 23: The dissemination algorithm (sequence diagram: a TopicSubscriber publishes a message to its TopicImpl, which calls disseminate(message) on the DaMulticast instance; DaMulticast then performs get() : RemotePeer calls on the topicTable and the supertopicTable)

The daMulticast algorithm uses gossip-based techniques to disseminate messages. Each node has only a partial view of the group membership of its topic, stored in its topic table. Thus, it sends the message to the peers in its topic table, which do the same if they have not already received that message. Because the membership service's goal is to provide overlapping partial views, this gossiping technique should reach all members of the topic group, possibly with multiple sends of the same message to a given peer. However, no peer gossips the same message twice (every peer's topic manager enforces this constraint).

5.5.4 Dissemination in super topic
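The own-topic gossip step, together with the "gossip each message at most once" guard, might be sketched as follows; the seen-messages set plays the role of the topic manager's duplicate check, and all names here are illustrative, not the implementation's.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;

// Sketch of gossip dissemination in the node's own topic group:
// a message is forwarded to every peer in the (partial) topic table,
// but each node gossips a given message at most once.
class GossipNode {
    private final String id;
    private final List<String> topicTable;            // partial view of the group
    private final Set<String> seenMessages = new HashSet<>();
    private final BiConsumer<String, String> send;    // (destination, message)

    GossipNode(String id, List<String> topicTable,
               BiConsumer<String, String> send) {
        this.id = id;
        this.topicTable = topicTable;
        this.send = send;
    }

    // Deliver a message to this node; gossip it onward only on first receipt.
    void receive(String messageId) {
        if (!seenMessages.add(messageId)) {
            return;                                   // already gossiped: drop it
        }
        for (String peer : topicTable) {
            if (!peer.equals(id)) {
                send.accept(peer, messageId);
            }
        }
    }
}
```

Duplicate receptions are expected (the partial views overlap by design); the guard only prevents a node from re-gossiping, which bounds the total traffic per message.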

The dissemination process toward the peers in the supertopic table has a number of supplementary constraints. First, every node decides, with a certain probability (a parameter that trades the number of messages sent against the reliability of the dissemination), whether or not it will act as a link between its own topic group and the supertopic group. If the node decides to be a link, it sends the topic message to every peer in its supertopic table, each with some probability. This second probability determines how many peers from the supertopic table receive the message and is another parameter of the just-mentioned trade-off. Figure 23 depicts the message dissemination algorithm. It is a two-step algorithm: the first step manages the dissemination in the peer's own group and accesses the topic table to get references to peers from the same group; the second step manages the dissemination in the group of the peer's supertopic, so the supertopic table is accessed to get the needed peer references.
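The two probabilistic decisions of the super-topic step can be sketched like this; pLink and pSend correspond to the two trade-off parameters described above, and the rest of the names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of dissemination toward the super topic: a node first decides,
// with probability pLink, whether to act as a link to the supertopic group;
// if so, it sends the message to each supertopic-table peer with probability pSend.
class SuperTopicForwarder {
    private final double pLink;   // probability of acting as a link
    private final double pSend;   // per-peer send probability
    private final Random random;

    SuperTopicForwarder(double pLink, double pSend, Random random) {
        this.pLink = pLink;
        this.pSend = pSend;
        this.random = random;
    }

    // Returns the subset of the supertopic table that receives the message.
    List<String> targets(List<String> supertopicTable) {
        List<String> selected = new ArrayList<>();
        if (random.nextDouble() >= pLink) {
            return selected;                      // not acting as a link this time
        }
        for (String peer : supertopicTable) {
            if (random.nextDouble() < pSend) {
                selected.add(peer);
            }
        }
        return selected;
    }
}
```

Raising either probability increases the chance that at least one copy of the message crosses into the supertopic group (reliability) at the cost of more duplicate upward messages, which is exactly the trade-off measured in Section 6.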


6 Test settings and results

To compare the two versions of the implementation, they were tested in the same settings for their reliability (the percentage of subscribers receiving the topic messages) and efficiency (the percentage of useful messages among all topic messages sent). Other measurements capture the number of rounds needed to disseminate a message to all subscribers and the amount of messages exchanged between a topic and its super topic. The dissemination probability toward the super topic was the varying parameter of the tests.
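For reference, the two metrics reduce to simple ratios over delivery and message counts; the helpers below are hypothetical, not part of the actual test harness.

```java
// Hypothetical helpers for the two test metrics: reliability is the fraction
// of subscribers that received the message, efficiency the fraction of sent
// messages that were useful (first deliveries rather than duplicates).
class Metrics {
    static double reliability(int subscribersReached, int totalSubscribers) {
        return (double) subscribersReached / totalSubscribers;
    }

    static double efficiency(int usefulMessages, int totalMessagesSent) {
        return (double) usefulMessages / totalMessagesSent;
    }
}
```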

6.1 Test settings



Each test employed a linear architecture comprised of three topics, T0, T1 and T2, where T0 is the root topic, T1 has T0 as its super topic, and T2 is the leaf topic with T1 as its super topic. Each topic has a number of subscribers: 100 for the leaf topic, 38 for the intermediate topic and 8 for the root topic. Among the 100 subscribers of the leaf topic, one is the publisher; thus, in each test, there is one published message that should arrive at all 146 subscribers in the system. The test settings for P-Grid determined the construction of a new grid for every test, while each daMulticast subscriber was also a node in P-Grid. The tests allow some amount of time for the grid's construction, and only then does the publisher publish the topic message. A test set is comprised of 5 tests; each test used a different dissemination probability toward the super topic, starting at 0.2 and increasing by 0.2 up to 1. There was one test set for each implementation.

6.2 Test results

As stated, for each implementation, the tests measured reliability, latency, efficiency and the amount of upward events between topic groups. Figure 24 presents the reliability and latency for the test set of the SCAMP membership based implementation.

Figure 24: SCAMP based membership implementation's reliability and latency

For good reliability, the dissemination probability must be quite high (over 0.4). The latency depends on the dissemination probability, but also on the publisher's behaviour of directly disseminating in the super topic and on the construction of the partial views. For the leaf topic, the dissemination probability has no influence; the results depend only on the actual partial views each member has.

Figure 25 presents the efficiency and the number of upward events for the root and intermediate topics for the test set of the SCAMP membership based implementation. The efficiency decreases with the dissemination probability, while the number of upward events increases.

Figure 25: SCAMP based membership implementation's efficiency and upward events

Figure 26 presents the reliability and latency for the test set of the P-Grid based membership implementation. The algorithm presents good reliability even for a 0.2 dissemination probability. The latency depends on the dissemination probability, but also on the publisher's behaviour of directly disseminating in the super topic and on the construction of the partial views, and appears to be similar to the previous implementation.

Figure 26: P-Grid based membership implementation's reliability and latency

Figure 27: P-Grid based membership implementation's efficiency and upward events

Figure 27 presents the efficiency and the number of upward events for the root and intermediate topics for the test set of the P-Grid based membership implementation. The efficiency decreases with the dissemination probability, but slightly less than in the first implementation, while the number of upward events increases with the dissemination probability, also slightly less than with the SCAMP membership. The two implementations are both functionally correct and differ only slightly in the test results. They can be interchanged with approximately the same results but, in addition to the small improvement of the P-Grid based membership implementation, the latter also generates less network traffic (no exchange of membership messages).

7 Related work

Similar to daMulticast, Scribe [8] (on Pastry [7]), HiCAN [6] (on CAN [5]) and Bayeux [10] (on Tapestry [9]) provide application-level multicast on top of an underlying overlay. Scribe supports a large number of groups, with a potentially large number of members per group. It uses Pastry to create and manage groups and to build multicast trees for the dissemination of messages to each group. Each multicast tree starts at a rendez-vous point associated with the corresponding group. Publishers on a topic must contact this node in order to publish their events, which are then routed through the tree. This mechanism makes the algorithm sensitive to failures of the rendez-vous and routing nodes, which is not the case in the daMulticast approach. Scribe nodes that are part of a topic's multicast tree may or may not be subscribers to the topic, which implies receiving parasite messages, a disadvantage that daMulticast avoids. HiCAN (CAN multicast) also supports multiple groups and relies on CAN to manage them; it also uses a spanning tree, constructing one tree per topic. Its common point with daMulticast is that the nodes of such a tree are all members of the corresponding topic (thus no parasite messages are received), and publishing an event corresponds to flooding such a local CAN. However, because of the mechanisms involved in managing group membership (joins and departures) and routing, failures of nodes pose a series of problems. Moreover, it is significantly more expensive to build and maintain separate CAN overlays per group than to provide the membership service that daMulticast requires. Bayeux also builds a multicast tree per group on top of Tapestry. The root of this tree is the source node, which is the only possible source of events for the group topic, as opposed to daMulticast, which allows any group member to be an event source (to publish messages on a topic).
In addition, each group join in Bayeux is routed by Tapestry all the way to the source node, which makes it a single point of failure. The nodes in the multicast tree are not necessarily subscribers of the event source; they might be part of the tree merely as "routers", because of the multicast tree construction algorithm that Bayeux uses. This mechanism generates parasite messages at the "routers", which is not the case with daMulticast. All these related algorithms are based on spanning trees and, as a consequence, they are sensitive to failures of processes located at the nodes of those trees. Even if their systems provide fault-tolerance mechanisms for replicating (or replacing) such node processes, these are resource-consuming, and the respective node processes must have more bandwidth and processing power than "average" processes. Moreover, none of these algorithms considers the hierarchical disposition of topics, as daMulticast does, leading to high memory complexity and/or parasite messages.

8 Future work

The current implementations of daMulticast on top of P-Grid are still far from showing the full potential of such an approach. First of all, the implementations could use only those P-Grid features available in the P2P system's current implementation. Section 4 presented several design approaches that tried to profit from a more extensive use of P-Grid. Another hard problem in publish-subscribe systems is persistency. It would be interesting to study how message persistency could be ensured in the setting of daMulticast and P-Grid, and what that would mean from an algorithmic point of view. The first steps in this direction might come from a different exploitation of the TopicManager or Topic interfaces; their implementations might address this issue by keeping more information than they already manage in the current approach. However, resolving persistence-related requests from peers requires more attention.

9 Conclusions

This paper presented the implementation of a topic-based, distributed publish-subscribe system that ensures the dissemination of topic messages published in the system toward the corresponding topic subscribers. Data Aware Multicast, a topic-based, distributed publish-subscribe algorithm, is the starting point of the implementation, and P-Grid, a peer-to-peer system, provides the distributed underlying layer that connects the participating nodes. In order to achieve the best results, various possibilities of interconnecting the publish-subscribe algorithm and the peer-to-peer system were analyzed. The membership service that daMulticast needs was the natural link between the two; however, no constraints are imposed a priori on how much the underlying layer must be involved. Section 4 presents four different approaches, starting with an architecture faithful to the original one (the first design approach), then involving P-Grid more and more (the next two approaches), and arriving at a transposition of the daMulticast topic hierarchy idea into P-Grid's data structures (the fourth approach). Each solution presented advantages and drawbacks, and the final implementation has two versions, based on the most promising approaches, that kept the main benefits of daMulticast (no parasite messages and reduced knowledge of other peers). The two versions of the implementation differ in their membership algorithm: the original SCAMP membership algorithm, as proposed in [4], or a similar algorithm (in terms of memory complexity) based entirely on P-Grid. SCAMP needs its own messages for solving the membership and a contact node, provided by P-Grid, while the second algorithm better exploits P-Grid's features. Section 5 describes the implementation issues in greater detail, from both the membership and communication viewpoints.
However, none of the implementations offers a natural mapping from the topic hierarchy to the peer-to-peer system's structures, even though the daMulticast properties are met. Together with ensuring message persistence with the help of the underlying layer, these are interesting directions for the future development of the project.

References

[1] K. Aberer. P-Grid: A self-organizing access structure for P2P information systems. In Sixth International Conference on Cooperative Information Systems (CoopIS 2001), September 5-7, 2001.

[2] K. Aberer, A. Datta, and M. Hauswirth. The quest for balancing peer load in structured peer-to-peer systems. Technical Report IC/2003/32, EPFL, 2003.

[3] S. Baehni, P. Th. Eugster, and R. Guerraoui. Data-aware multicast. In Proceedings of the 5th IEEE International Conference on Dependable Systems and Networks (DSN '04), June 2004.

[4] A. J. Ganesh, A.-M. Kermarrec, and L. Massoulie. SCAMP: Peer-to-peer lightweight membership service for large-scale group communication. In Proceedings of the 3rd International Workshop on Networked Group Communication (NGC), 2001.

[5] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. In Proceedings of the ACM Conference on Applications, Technologies, Architectures (SIGCOMM), 2001.

[6] S. Ratnasamy, M. Handley, R. Karp, and S. Shenker. Application-level multicast using content-addressable networks. In Proceedings of the 3rd International Workshop on Networked Group Communication (NGC), 2001.

[7] A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proceedings of the 4th IFIP/ACM International Conference on Distributed Systems Platforms and Open Distributed Processing (Middleware), 2001.

[8] A. Rowstron, A.-M. Kermarrec, M. Castro, and P. Druschel. Scribe: The design of a large-scale event notification infrastructure. In Proceedings of the 3rd International Workshop on Networked Group Communication (NGC), 2001.

[9] B. Zhao, J. Kubiatowicz, and A. Joseph. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Technical Report UCB//CSD-01-1141, U.C. Berkeley, 2001.

[10] S. Q. Zhuang, B. Zhao, A. Joseph, R. Katz, and J. Kubiatowicz. Bayeux: An architecture for scalable and fault-tolerant wide-area data dissemination. In Proceedings of the 11th International Workshop on Network and OS Support for Digital Audio and Video, 2001.

