SONNET: subscription using path queries over structured overlay ...

Front. Comput. Sci. China 2007, 1(2): 213−225 DOI 10.1007/s11704-007-0022-3

RESEARCH ARTICLE

SONNET: subscription using path queries over structured overlay networks

QIAN Weining1,2, XU Linhao3, ZHOU Aoying1,2, ZHOU Minqi1

1 Department of Computer Science and Engineering, Fudan University, Shanghai 200433, China
2 Software Engineering Institute, East Normal University of China, Shanghai 200062, China
3 Department of Computer Science, National University of Singapore, Singapore 117543, Singapore

© Higher Education Press and Springer-Verlag 2007

Abstract Application-level content-based routing using XML is a key technology for decentralized publish/subscribe systems. In this paper, a new approach is proposed to support the efficient dissemination of XML packets while allowing clients to specify their subscriptions with path queries. The proposed method is based on a Chord-like distributed hash table (DHT) scheme. The integration of XML packet filtering and finger-table-based routing in structured overlay networks provides an elegant base for the proposed SONNET system, upon which optimization techniques are studied. Analytical and empirical results show that coupling dissemination and routing in publish/subscribe systems offers robustness and extensibility, while decoupling the two aspects brings more scalability and workload balance. Extensive empirical studies demonstrate that the proposed method outperforms previous efforts for content-based routing.

Keywords publish/subscribe system, data dissemination, distributed hash table, overlay network

1 Introduction

Along with the popularity of Internet-based applications, collaboration among loosely coupled computers has become a basic requirement for modern large-scale distributed information systems. The publish/subscribe (P/S) communication paradigm is considered appropriate for systems of this type, which have a large number of nodes spread over the Internet [1]. In this paradigm, information is published to a middleware service by publishers, while subscribers retrieve information based on their subscriptions.

Received October 10, 2006; accepted February 25, 2007
E-mail: {wnqian,zhouminqi,ayzhou}@fudan.edu.cn, [email protected]

Many research efforts have been devoted to event-based P/S techniques. In event-based P/S systems, information notifications are organized as events, and a subscription can be issued based on the topic or type of the events. The content-based P/S system has recently become a field of interest in both the database and network research communities. Compared to traditional event-based P/S systems, content-based systems introduce a dynamic subscription scheme based on the properties of source information, which offers more expressive power. Subscribers enter the network with specifications describing their interests. A specification is often in the form of name-value predicates, which the middleware service interprets to identify the part of the information that the subscriber is interested in. Motivated by the fact that XML is becoming a standard for data storage and exchange, it is natural to represent source information in XML, with the specifications in the form of XML queries [2].

In order to support efficient and reliable publishing and subscription, a distributed architecture is often used for implementing the middleware service of P/S systems [1]. The middleware service can be provided by a set of distributed servers or by processes on publishers and subscribers. In either approach, a set of distributed nodes collaborate to provide scalability, availability and robustness.

This paper focuses on the problem of content-based P/S with XML data sources. There are several requirements for such systems. First, the system should provide sufficient power to represent users' specifications. For XML data, a subset of the XQuery or XPath query language is a natural choice for defining the user's interest. Second, though the complexity of XML documents and query languages introduces additional cost, the system should stay efficient in terms of data publishing, subscription management, and data dissemination. Furthermore, in a large-scale network, the system should scale as the number of publishers and subscribers increases. Last but not least, the system should be robust and available even when some nodes or network links fail.

We present SONNET, a framework for reliable and efficient subscription of XML data with path queries. Taking advantage of the reliability and high routing performance provided by structured overlay networks, SONNET maintains subscription specifications in a distributed manner. The publishers generate the source information as XML documents. Each document transmitted in the system has a head attached to it. The head together with the XML document forms a packet, and the XML document is called the body of the packet. In SONNET, the head of a packet is a summary of the packet body. The basic idea of SONNET is to determine where to disseminate a packet by comparing the head of the packet with the entries in the finger table¹. This process does not need to parse the packet body, i.e., the XML document. Therefore, it is much more efficient than current XML document filtering techniques. Furthermore, SONNET balances the workload on different nodes and network links dynamically. It introduces a series of optimization techniques to provide better quality of service (QoS).

1.1 Our contribution

The major contributions of our work can be summarized as follows:

● A new framework for content-based routing in Chord-like rings, called SONNET, is introduced. As a natural extension to existing DHT-based overlay network techniques, the path-query-based subscription mechanism is more general than current event-based P/S systems. To the best of our knowledge, this is one of the first efforts to integrate XML packet dissemination with routing in the underlying structured overlay network. Previous work either routes queries to data sources or directly uses path queries as entries in the routing table. SONNET is more flexible, since it can be built upon existing overlay network services.

● We use path digests to summarize both path queries and XML packets. A path digest is composed of two parts: a content digest for summarizing the element names, and an order digest for summarizing the order of the elements. Analytical and empirical results show that a small path digest can provide powerful filtering functionality.

● Two routing schemes, i.e., basic subscription and advanced routing, are described in detail. While simple and fully integrated with the Chord-like finger tables, the basic subscription method is inflexible in terms of up-stream router selection and does not scale with the number of possible element names. Despite its complexity, advanced routing leaves room for optimization. Our experiments show that even with more nodes in the system, advanced routing is more efficient than the basic scheme in terms of the number of hops and latency.

1.2 Paper organization

The rest of this paper is organized as follows. Related work is introduced in Section 2. We analyze the requirements of content-based P/S systems in Section 3, where an overview of the SONNET system is also provided. In Section 4, the basic subscription scheme is introduced in detail. The method for handling the dynamics of the network is introduced in Section 5. Section 6 is devoted to the enhancement technologies for further improving the performance of the basic subscription method. The experimental results are presented and discussed in Section 7, and the last section gives concluding remarks.

2 Related work

Recently, the community has focused on content-based routing using XML (or RDF). In this style of routing, packets are routed based on their content rather than on a destination address. The mesh-based approach [3] is an early work that aims at highly reliable and timely data transmission. An (n−1)-resilient client connects to n parents and receives duplicate packet streams from each parent. Thus, it can still receive the packet stream if (n−1) independent links fail without being repaired. Compared with the mesh-based system, SONNET is designed for situations where clients' needs are written as path queries. Other works include view selection [4], which improves the query evaluation at each router; SemCast [5], which selects overlapping channels to avoid duplicate packets; Meghdoot [6], for content-based routing over P2P networks without path queries; and a P2P-DIET [7] based method for RDF query processing [6].

Since the initial success of file-sharing systems such as Napster and Gnutella, there has been a growing interest in peer-to-peer (P2P) networks. Structured (distributed hash table (DHT) based) overlay networks, such as Chord [8], CAN [9], Pastry [10], Tapestry [11], and Koorde [12], have been proposed to solve the problems in key-based search applications. The DHT mapping scheme has the core advantage of load balancing, and these systems can be proven to be scalable. Application-level multicast infrastructure was subsequently introduced over these systems. Two examples are CAN multicast [13], built upon CAN, and SCRIBE [14], built on top of Pastry. CAN multicast uses the routing tables maintained by CAN to flood messages to all nodes in a CAN overlay network, and each group has a separate CAN overlay. In SCRIBE, no separate overlay networks are constructed. Instead, a multicast tree is constructed by the rendezvous point of each group.

Another combination of query processing over XML data and P2P-based technology is to route the query to the nodes storing the required XML documents, as in the work reported in Refs. [15-17]. It should be noted that SONNET is motivated by a different application scenario, in which most subscription queries are continuous queries and the source XML documents that can answer a specific query may be scattered over various nodes. Consider a researcher who is interested in P2P technologies presented in research papers: the desired paper may be published in DBWorld, DBLP, CiteSeer, or any other similar web site. For such a case, query-routing-based methods are not applicable.

¹ A finger table is a table containing pointers to neighboring nodes. It is used to route messages and maintain connectivity in structured overlay networks.


The work presented by Bonifati et al. [17] coincides with SONNET in that their system also supports linear path queries. SONNET differs from theirs in two respects: first, SONNET is designed for publish/subscribe, while their work is for pull-based query processing; second, SONNET supports exact-match predicates, while their work supports structural search only.

On the other hand, XML filtering and XML stream processing have attracted much attention recently. Different approaches have been proposed for efficient filtering of an XML stream against a given set of XML queries (say, XPath queries). Existing approaches can be divided into three categories: automata-based filtering, e.g., YFilter [18]; indexing-based filtering, e.g., Index-Filter [19]; and approximate filtering, e.g., the Bloom-filter-based approach [20]. However, the work reported in this paper has a different focus from XML packet filtering techniques. Since the routers are themselves clients, which are usually personal computers, they are not capable of taking on the heavy burden of parsing the XML packets that are to be disseminated in the system. Therefore, SONNET distributes the filtering process over different nodes. Because the existing XML packet filtering technologies are independent of what is presented in this paper, they can be employed to implement the XML query processor on an individual client, which provides the final answers to subscriptions.

ONYX [2] is the work most closely related to SONNET and shares the same motivation. The key difference is that ONYX directly uses XPath queries as entries in the routing table, so each forwarding step is essentially a filtering step. SONNET instead uses an additional header, which is a 128-bit string, for each packet. The header is checked via bit-based operators in the routing process, which is much cheaper than XML filtering. Consequently, routing also serves as an approximate dissemination process in SONNET, and only packets that pass the routing test are evaluated by the XML filtering module.

3 Motivation and system overview

Publish/subscribe systems are important applications in large-scale network environments. Content-based P/S is important for applications such as news-feed subscription and web-log (blog) aggregation. There are several challenging requirements for content-based P/S that motivate the design of SONNET. They are listed as follows:

(1) Powerful subscription query language. The information published may contain several fields (attributes) and be organized in a tree structure. Information consumers may subscribe to content of interest by issuing queries on several fields or on the relationships between the fields. Since XML has become the de facto standard for information publishing and exchange, publishing in XML and querying in an XML query language are a natural choice for P/S systems.

(2) Accurate yet efficient dissemination. It is important to guarantee that every user gets the information he/she subscribes to. Furthermore, the P/S service should be efficient, which means the information should be delivered to subscribers in time.

(3) Low computation and network cost. Since most P/S applications are built on large-scale networks, the P/S process should not overload the system. No node in the system should be assigned too many information dissemination tasks, information should not be flooded through the network, and no network link should be overloaded by data transmission. Therefore, a P/S system is more than a system with separate dissemination nodes. Performance optimization functions, such as workload balancing, should be integrated with the dissemination modules.

(4) Overlay as a service. The P/S application sometimes overlaps with other applications. Constructing a separate overlay network for each application adds overhead to the nodes in the network. Furthermore, existing overlay network technologies have proven to be efficient in terms of key-based lookup. We argue that P/S systems can treat the overlay network as a service, which may increase the robustness of the system without affecting performance.

3.1 System overview

The node architecture of SONNET is illustrated in Fig. 1 (a), while a sketch of the network architecture is shown in Fig. 1 (b). A SONNET node is composed of two layers. The higher layer is a conventional XML filtering engine that evaluates XPath queries over the XML packets returned by the lower layer. The lower layer distinguishes SONNET from existing P/S systems. It is built upon a Chord-like overlay network. Each node is assigned a unique identifier. A set of pointers to neighboring nodes is maintained in the routing table, in which each entry is a pair of node identifier and network pointer. When a packet is received, its head is compared with the node identifier and with all identifiers appearing in the routing table. When a potential match occurs, the packet is returned to the higher layer or transmitted to the neighboring node corresponding to the matched identifier. Note that both the head of the packet and the identifier of the node are bit-strings, so the partial matching test consists of bit-string operations. Therefore, dissemination in the lower layer is efficient compared to previous XML document dissemination methods. Though dissemination in this layer is approximate, accurate subscription can be achieved via the higher-layer filtering. Furthermore, we will show that the lower-layer dissemination is quite powerful for filtering out a large number of packets. This is important for large-scale networks, since each node devotes only a small portion of its computational power to keeping the whole system efficient. The details of the lower-layer dissemination are introduced in the next section.

When a query is posed by a user, the node generates its node identifier based on the query. Then, the node registers its query and node identifier with an up-stream node. An up-stream node is a node that has packets to feed to its down-stream nodes. Thus, the essential idea behind SONNET is to map the publish/subscribe relationships to the up- and down-stream relationships in the overlay network. By fully utilizing the network links, SONNET becomes robust and increases availability. Furthermore, content-based dissemination is translated into a low-cost bit-string comparison process. Therefore, the requirements listed above can be met. We also introduce enhancement technologies to further optimize the performance in Section 6.

Fig. 1 The SONNET network architecture

4 Basic subscription scheme

4.1 Document and query encoding

In SONNET, an XML document is treated as a single-rooted document with nested elements. Each element has an element name, or tag. An element may contain content. Currently, SONNET supports categorical content, which means the content can be searched only by exact-match queries.

A subset of XPath is supported by SONNET. SONNET supports linear path expressions. By linear we mean that any two elements or attributes e1 and e2 in the path are in an ancestor-descendant relationship. In SONNET, attributes are treated in the same way as elements. SONNET also supports equality predicates, for exact match. This subset of XPath is chosen for two reasons. First, it is an important subset of queries that has strong motivation [17]. More advanced queries can be decomposed into several linear queries and be filtered further by the higher layer of the system. Second, our document and query encoding technology can integrate filtering for this kind of query into overlay routing with low overhead.

Each document is encoded as a set of digests, while each query is encoded into one digest. A digest is composed of a content digest, for summarizing the content of elements (including the element name), and an order digest, for summarizing the nesting relationships of the elements.


For an m-bit content digest, an element e is mapped to a bit position by a hash function $f_m: s \rightarrow [0, m-1]$. Thus, the hash function for an element is defined as $h(e) = x_0 x_1 \ldots x_{m-1}$, in which:

$$x_i = \begin{cases} 1, & i = f_m(e), \\ 0, & \text{otherwise.} \end{cases}$$

For a path with n elements, $p: \{e_1, e_2, \ldots, e_n\}$, the content digest is defined as follows:

$$d_c(p) = \bigvee_{i=1}^{n} h(e_i).$$

Here, ∨ means the bit-wise OR of a set of bit-strings. For general path queries, '//' is simply omitted when producing the content digest. Thus, the paths e1/e2//, e1/e2, e1//e2, and //e1/e2 all have the same content digest.

An order digest includes (l−2) integers, each of which is the rank of the permutation of a partial path from the root '/'. Here, l is the maximum length of the paths whose order information can be stored in the order digest, and it is determined by the number of bits devoted to the order digest. Recall that the content of the elements has been summarized by the content digest, so when generating the order digest, only the alphabetic order of the elements is considered. For an element appearing more than once in the path, each occurrence is assigned a different order. Taking the paths /a/c/b, /b/d/c, and /a/b/a as examples, they are all treated as (1, 3, 2), which is a permutation of the set {1, 2, 3}.

The rank of a permutation π = π0, π1, ..., πn−1, denoted by rp(π), is a unique integer in the range [0, n!−1]. There is a one-to-one mapping between a permutation and its rank of permutation (RP), since for a set of n elements there are exactly n! permutations. Many research efforts have been devoted to computing the RP of a given permutation quickly, and it has been shown that this can be done in O(n) time [21]. For the sake of self-containment, the algorithm is reproduced as Algorithm 1, where π−1[π[i]] = i for i = 0, 1, ..., n−1.

Given two paths p1 and p2, we can use rp(p1) and rp(p2) alone to test whether the two paths are ordered in the same way. However, we need prefix matching, since the query target is usually an intermediate element in a path rather than a leaf element. Therefore, for a complete path p = /p0/p1/.../pk, the RPs of all paths /p0/p1, /p0/p1/p2, ..., /p0/p1/.../pl are computed and stored in sequence in the order digest. In the current implementation of SONNET, l is set to 8. For a path whose length is less than 8, zero is used as the RP of /p0/p1/.../pi where i≥k. The bits needed for each RP are shown in Table 1. We can see that only 56 bits are needed to store the order digest of paths with length less than 8, which is sufficiently large for many applications [18, 20]. The order digest of a path p is denoted by do(p).

A predicate is treated as an additional element, and it is used for producing the content digest only. Thus, a path /a/b[@c = 'v'] is transformed into p:(a, b, @c, @c = 'v') for generating the content digest, and (a, b, c) for generating the order digest.

Algorithm 1 Given a permutation, compute its rank

Integer rank(n, π, π−1)
1: if n = 1 then
2:   return 0;
3: end if
4: s ← π[n−1];
5: swap(π[n−1], π[π−1[n−1]]);
6: swap(π−1[s], π−1[n−1]);
7: return (s + n × rank(n−1, π, π−1));

Table 1 The assignment of bits in order digest

Index         2   3     4     5       6       7       8
Bits          1   2−4   5−9   10−16   17−26   27−39   40−56
Query length  2   3     4     5       6       7       8
n!            2   6     24    120     720     5040    40320
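The encoding described above (content digest, alphabetic-order permutation, Algorithm 1, and the bit layout of Table 1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SONNET implementation: the hash f_m is a hypothetical choice (SHA-1 reduced modulo m), the 32-bit content digest width is assumed, and packing the ranks from the lowest bit upward is one possible reading of Table 1.

```python
import hashlib

CONTENT_BITS = 32   # assumed width m of the content digest
MAX_ORDER_LEN = 8   # l = 8, as stated for the current implementation
# Slot widths for prefix lengths 2..8, read off Table 1 (56 bits in total).
ORDER_WIDTHS = {2: 1, 3: 3, 4: 5, 5: 7, 6: 10, 7: 13, 8: 17}


def element_bit(name, m=CONTENT_BITS):
    """Hypothetical f_m: map an element name to a bit position in [0, m-1]."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % m


def content_digest(elements, m=CONTENT_BITS):
    """d_c(p): bit-wise OR of the one-hot strings h(e_i)."""
    d = 0
    for e in elements:
        d |= 1 << element_bit(e, m)
    return d


def to_permutation(elements):
    """0-based alphabetic position of each element; repeats ordered by occurrence."""
    order = sorted(range(len(elements)), key=lambda i: (elements[i], i))
    perm = [0] * len(elements)
    for position, index in enumerate(order):
        perm[index] = position
    return perm


def _rank(n, pi, inv):
    """Recursive step of Algorithm 1; pi and inv are modified in place."""
    if n <= 1:
        return 0
    s = pi[n - 1]
    pi[n - 1], pi[inv[n - 1]] = pi[inv[n - 1]], pi[n - 1]
    inv[s], inv[n - 1] = inv[n - 1], inv[s]
    return s + n * _rank(n - 1, pi, inv)


def rank(perm):
    """Rank of a permutation of 0..n-1, an integer in [0, n!-1]."""
    pi = list(perm)
    inv = [0] * len(pi)
    for i, v in enumerate(pi):
        inv[v] = i
    return _rank(len(pi), pi, inv)


def order_digest(elements):
    """d_o(p): ranks of the prefixes of length 2..8, packed slot by slot."""
    elements = list(elements)[:MAX_ORDER_LEN]
    packed, shift = 0, 0
    for length in range(2, MAX_ORDER_LEN + 1):
        if length <= len(elements):
            packed |= rank(to_permutation(elements[:length])) << shift
        shift += ORDER_WIDTHS[length]   # missing prefixes keep rank 0
    return packed
```

With these definitions, the paths /a/c/b, /b/d/c and /a/b/a all map to the same permutation and therefore to the same order digest, while /a/b and /b/a share a content digest but differ in their order digests.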

4.2 Approximate filtering using digest

4.2.1 Subscription with simple path queries

After an XML packet is generated, the root router generates a path digest for each complete path appearing in the packet. The path digest is attached to the packet, and it is sent with the packet to the down-stream routers whose identifiers match this packet digest. On receiving an XML packet, a router must judge whether the packet should be disseminated and which neighboring nodes it should be sent to. We show that this can be done using relatively cheap bit operations on the path digest. A simple path query q matches a path digest d(p) if the following two conditions hold:

1. All elements in q appear in p. This can be tested by checking whether dc(q)|dc(p) equals dc(p), where | means bit-wise OR.
2. The length(q)-th element of do(q) is the same as that in do(p). The test on order digests can be done with masking followed by an equality test.

The pseudo-code of the process on a router is presented in Algorithm 2. Each neighboring node is checked, and the packet is forwarded to the nodes whose queries are potentially matched. In the procedure IsOrderMatch(), the mask m[i] is a bit-string in which only the bits listed in Table 1 under index i are set to 1, and all other bits are set to 0. Note that the order digest summarizes the order information of paths whose length varies from 2 to 8. For a path whose length is larger than 8, the prefix consisting of the first eight elements is used for the order match test. We argue that in real-life applications, very few path queries have a length larger than 6. Furthermore, even when only the prefix is used, the filtering power is sufficient to filter out a large number of XML packets, as analyzed in Subsection 4.3.

4.2.2 Subscription with general path queries

A general path query is a path query with '//'. The content digest can be used in the same way as for a simple path query, but the use of the order digest needs to be modified. SONNET uses three additional bits to denote the position of the occurrence of the '//', and takes a best-effort strategy for utilizing the order digest for general path queries. For a general path query, the prefix before the leftmost '//' is retrieved. On receiving a new packet, the content digest over the whole path and the RP over the prefix are used to filter the packet, as illustrated in Algorithm 3.

Algorithm 2 XML packet processing on routers when only simple path queries exist

OnPacketGenerate(P)
1: for all complete path p appearing in P do
2:   for all node n in finger table do
3:     if IsContentMatch(Pd, n.id) then
4:       Send P to node n;
5:     end if
6:   end for
7: end for

OnPacketArriveSP(P)
1: for all node n in finger table do
2:   if IsContentMatch(Pd, n.id) and IsOrderMatch(Pd, n.id) then
3:     Send P to node n;
4:   end if
5: end for

boolean IsContentMatch(p, q)
1: if (dc(q)|dc(p)) = dc(p) then
2:   return true;
3: else
4:   return false;
5: end if

boolean IsOrderMatch(p, q)
1: Initialize masks m[2 : 8];
2: l ← length(p) ≤ 8 ? length(p) : 8; {Truncation of long paths}
   {Masking order digests using bit-wise AND: &.}
3: if (do(p) & m[l]) = (do(q) & m[l]) then
4:   return true;
5: else
6:   return false;
7: end if

Algorithm 3 XML packet processing on routers when general path queries exist

OnPacketArriveGP(P)
1: for all node n in finger table do
2:   p' ← prefix(n.id);
3:   if IsContentMatch(Pd, n.id) and IsOrderMatch(Pd, p') then
4:     Send P to node n;
5:   end if
6: end for

4.3 Performance analysis of basic subscription

The performance measurement of SONNET includes two main parts: precision and efficiency. Precision can be measured by both false negatives and false positives. Here, the false negative ratio is the ratio of expected packets that fail to be disseminated to the higher layer in the node. The false positive ratio is the ratio of packets that are not expected but are delivered to the higher layer in the node.

Proposition 1. There is no false negative in the SONNET basic subscription scheme.

This is obvious, as we use the same hash function to generate the content digest and the same deterministic permutation strategy to rank the XML tags in both the query digest and the packet digest. For two identical sets of XML tags that are ordered in the same sequence, we get the same content digest and the same order digest. Because the dissemination of XML packets is based on the content digest and the order digest, there is no false negative.

The computation of the false positive ratio is a bit more complex. We first consider the scenario in which the queried tag set contains no duplicates, and then the scenario with duplicates. Suppose the vocabulary of the element names in the XML documents contains N different tags, and the length of the query is m. The order digest contains m−1 parts corresponding to the m−1 tag orders. The ith part can represent (i+1)! different possible orders. Thus, the distinguishing ability of the order digest of a path with length m is m!, for m≤8. Considering that the total number of possible published paths is $C_N^m \times m!$, the false positive ratio of the order digest would be $1 - \frac{1}{C_N^m}$. Similarly, the distinguishing ability of the content digest is $M^m$, where M is the bit length of the content digest.

Proposition 2. The false positive ratio of the path digest is $1 - \frac{M^m}{C_N^m}$ for paths without duplicated tags.

Now we consider the scenario where the queried tags may contain duplicates. The distinguishing ability of the order digest and the content digest remains unchanged. However, the number of possible published paths is then $N^m$. Thus, the following proposition holds.

Proposition 3. The false positive ratio of the path digest is $1 - \frac{m! \times M^m}{N^m}$ for paths that may have duplicated tags.

Note that a negative false positive ratio means there would be no false positives if the hash function is appropriately chosen. The path-digest-based dissemination method is quite powerful for long path queries. For example, suppose the vocabulary contains one hundred tags, the length of the query is six, and 32 bits are used for the content digest. The false positive ratio is then less than 30%.

The computational cost of the path-digest-based method is low. For a partial match test, only two bit-string operations are needed on the content digest: one combines the content digest of the query with that in the packet head, and the other tests whether the result is a match. Furthermore, for a query with m elements, only 2m bit-string operations are needed for order digest testing. Thus, for a path query whose length is smaller than 8, no more than 20 instructions are needed for partial match testing.
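The partial match test of Section 4.2 and the false positive estimates of Propositions 2 and 3 can be checked with a short Python sketch. It operates on digests given as plain integers; the slot layout (lowest bits first) is an assumption about how Table 1 is packed, and clamping negative ratios to zero follows the remark that a negative value means no false positives.

```python
from math import comb, factorial

# Slot widths for prefix lengths 2..8, read off Table 1.
ORDER_WIDTHS = {2: 1, 3: 3, 4: 5, 5: 7, 6: 10, 7: 13, 8: 17}


def order_mask(length):
    """m[l]: ones exactly on the bits that Table 1 assigns to prefix length l."""
    length = min(length, 8)                      # long paths are truncated
    shift = sum(ORDER_WIDTHS[k] for k in range(2, length))
    return ((1 << ORDER_WIDTHS[length]) - 1) << shift


def is_content_match(dc_packet, dc_query):
    """True if every bit set in the query digest is also set in the packet digest."""
    return (dc_query | dc_packet) == dc_packet


def is_order_match(do_packet, do_query, query_len):
    """Compare the rank slot for the query length (condition 2 of Section 4.2.1)."""
    if query_len < 2:
        return True                              # a single element has no order slot
    mask = order_mask(query_len)
    return (do_packet & mask) == (do_query & mask)


def fp_without_duplicates(n_tags, length, content_bits):
    """Proposition 2: 1 - M^m / C(N, m), clamped at zero."""
    return max(0.0, 1 - content_bits ** length / comb(n_tags, length))


def fp_with_duplicates(n_tags, length, content_bits):
    """Proposition 3: 1 - m! * M^m / N^m, clamped at zero."""
    ratio = factorial(length) * content_bits ** length / n_tags ** length
    return max(0.0, 1 - ratio)


# The worked example from the text: N = 100 tags, m = 6, M = 32 bits.
print(round(fp_with_duplicates(100, 6, 32), 3))      # 0.227, i.e. below 30%
print(round(fp_without_duplicates(100, 6, 32), 3))   # 0.099
```

For N = 100, m = 6 and M = 32, Proposition 3 gives 1 − 720 × 32^6 / 100^6 ≈ 0.227, which agrees with the "less than 30%" figure quoted above.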

5 Handling the dynamic network

Since SONNET works in a dynamic networking environment, nodes can freely join or leave the network, and each node can register or unregister its queries. A node may also leave the system due to node or network failure, in which case no notification is issued in advance. This section introduces the basic protocol for handling the dynamic behavior of the system.

5.1 On node's join and query registration

Recall that a node with a new query is treated as a new node in the system. Node joining and query registration therefore share the same process. A newcomer generates its node identifier, which is composed of three parts: content digest, order digest and node digest, and then issues a join message to a node that is already in the SONNET system. This node is called the bootstrapper, or well-known node (wkn). The bootstrapper acts in the same way as that in Chord and in other P2P systems. The bootstrapper locates a set of candidate up-streams for the newly joined node according to its node identifier. In particular, the bootstrapper uses Chord-like primitives to find the routers whose identifiers satisfy the following conditions:

● The content digest matches the newcomer's content digest. This means that the up-stream router should have a broader query, so that all useful packets can pass the filter. The nodes satisfying this condition can be reached via links of the first class in the finger table.

● The order digest of the prefix before the first (leftmost) '//' in the newcomer's query matches the corresponding part in the up-stream router. This ensures that the newcomer does not register with the wrong router. Taking the new path query p:/a/b/c as an example, though both /b/a and /a/b are content matched with p, only /a/b is a qualified up-stream router, since do(p)[2] equals rp(/a/b). To find the routers satisfying this second condition, links of the second class in the finger table are traversed.

There are some situations where particular up-stream nodes do not exist. If a node wants to register at such a position in the ring, it degrades its path digest and tries to locate the responsible routers. The degradation occurs when no qualified up-stream router is found by a newcomer, or when the qualified up-stream router cannot satisfy other constraints, such as bandwidth capability. The suffix of the path query may be dropped, so that up-stream routers with broader queries can be contacted. The degradation is invoked recursively, until the root router is reached or a satisfactory up-stream router is found. Thus, a path query /a/b/c/d/e may be degraded to /a/b/c/d, /a/b/c, and so on.

After retrieving the information of these potential up-stream routers, the new node contacts them for the registration process. When a router receives a registration request from a down-stream node, it records the identifier and the IP address of that node in its finger table, so that the generated XML packets can be filtered and sent to the down-stream node.

5.2 On node's leave

When a node leaves, the reverse of the joining process is applied as follows. A leave message is sent to all the nodes in the finger table, so that they can update their own finger tables. The registration with the up-stream routers is withdrawn. For its down-stream nodes, the leaving router is obliged to find a new responsible router, which is usually a node in the finger table of the leaving node. However, if no such replacement exists in the finger table, the degradation procedure is invoked, as in the case of joining. Finally, the new up-stream's information is forwarded to the down-stream nodes, so that they can replace the leaving node with the new responsible routers. After that, the node can leave the SONNET system.

5.3 On node's failure

The stabilization process of the original Chord has to handle all the finger table entries as well as the predecessor and successor lists. SONNET, however, only needs to handle the nodes in the up-stream set and some special down-stream nodes. The health of these nodes is confirmed by their heartbeat messages. If an up-stream router has not sent packets for a long time, the node pings the router; if there is no response, the up-stream router is considered to have crashed. The down-stream nodes are stabilized using the same process as fingers are. After detecting the failure of an up-stream router, the node needs to find an alternative. If there are still up-stream routers with the same content digest and order digest as the crashed router, the node simply asks them to select a new one. Otherwise, the same process as on node's join is performed to find a new router.
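The failure handling described above can be sketched as two small steps: deciding that a silent up-stream router has crashed, and looking for a replacement with the same digests. This is only a hedged illustration; the `Router` record, the `silence_limit` threshold for "a long time" and the `ping` callback are hypothetical names, not part of the SONNET protocol description.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Router:
    router_id: str
    content_digest: int
    order_digest: int


def is_crashed(last_packet_at: float, now: float, silence_limit: float,
               ping: Callable[[], bool]) -> bool:
    """A router is probed only after it has been silent for longer than
    silence_limit; it counts as crashed if the probe gets no response."""
    if now - last_packet_at <= silence_limit:
        return False
    return not ping()


def pick_replacement(crashed: Router, known: Sequence[Router]) -> Optional[Router]:
    """Return another known up-stream router with the same content and order
    digest, or None, in which case the caller repeats the join procedure."""
    for r in known:
        if (r.router_id != crashed.router_id
                and r.content_digest == crashed.content_digest
                and r.order_digest == crashed.order_digest):
            return r
    return None
```

Returning `None` instead of raising keeps the fallback explicit: the caller decides when to re-run the join process.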

6 Optimization

6.1 Problems in basic subscription

Although the basic subscription scheme introduced in the last section has the advantages that the number of hops for packet transmission is bounded and that the dissemination process can be implemented using inexpensive bit operations, it suffers from several drawbacks. First, the first intermediate router on the XML packet routing path would be overloaded if the number of possible first elements of a path, denoted by E, is large. Note that for an m-bit content digest, only m intermediate routers may become first intermediate routers receiving XML packets directly from root routers. These routers may therefore become the bottleneck of the system when E is much larger than m.

The lack of quality of service (QoS) is another problem for basic subscription in SONNET. Though the up-stream router is the logically nearest source of XML packets for a specific query, it may not be the best in terms of QoS when latency, stability, and/or query processing cost are considered.

Last but not least, the ring generated by the basic subscription protocol is not full, i.e., some identifiers will never be taken by any node, because all bits in their content digest are zero. On the other hand, though full decentralization is always a goal of research on peer-to-peer computing systems, quite often there exist super-peers that provide global services to the whole system in real-life applications. The basic subscription scheme is not flexible enough to support such super-peers.

In this section, the advanced subscription techniques are introduced. They provide more optimization techniques for improving the performance of SONNET.
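To make the first drawback concrete, the following sketch spreads E first elements over the m bit positions of a content digest and counts how many elements each first intermediate router would have to serve. The SHA-1 based `bit_position` is only a stand-in for the paper's element-to-bit mapping, and the values E = 2000 and m = 32 are illustrative choices, not measured settings.

```python
import hashlib
from collections import Counter
from typing import Iterable


def bit_position(element: str, m: int) -> int:
    """Stand-in mapping: hash an element name to one of m bit positions."""
    digest = hashlib.sha1(element.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


def first_router_load(elements: Iterable[str], m: int) -> Counter:
    """Number of distinct first elements handled per first intermediate router."""
    return Counter(bit_position(e, m) for e in elements)


E, m = 2000, 32
load = first_router_load((f"e{i}" for i in range(E)), m)
print("average elements per router:", E / m)
print("busiest router serves:", max(load.values()))
```

Whatever the mapping, at most m routers share the E elements, so the busiest one serves at least E/m of them, which is the hotspot that the advanced scheme aims to remove.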

6.2 Proximity-based routing

The method for generating node identifiers is different from that in the basic subscription scheme. We first need to address the problem of unscalability. The basic idea is to avoid overloading first intermediate routers by spreading the related elements, i.e., the elements sharing the same fm(e), across a range of identifiers. Consequently, these elements are managed by different routers, removing the hotspots of the basic subscription scheme.

In proximity-based routing, the node identifier is composed of a path digest and a node digest. The node digest is a summary of the node setting, and is independent of the query. Therefore, the node identifier is longer than the packet head. Only the path digest part of the identifier is used for routing a packet. Thus, there may be several nodes with different node identifiers corresponding to the same query.

With this scheme, the property of the basic subscription scheme that up-stream routers have their down-streams as fingers no longer holds. For example, a node with query /a/b/c may not have any finger pointing to any node with query /a/b, since the extended content digest of /a/b/c may no longer differ by one bit from that of /a/b. Therefore, the overlay network protocol needs to be extended to a registration-based one, which works as follows:

1. When a new subscription query is generated, the subscriber determines its identifier based on the path digest of the query and its node digest.
2. The subscriber generates the path digest for its candidate up-stream routers, and sends the registration message to the candidate up-stream routers with any node digest.
3. The acknowledgement messages are collected by the subscriber, and the most preferred router(s) are chosen to register the subscription query.
4. The chosen routers fill the subscriber's identifier into their corresponding finger tables. Future packets that match the subscriber’s identifier will be forwarded to the subscriber.

The registration process is illustrated in Fig. 1(b). Note that the registration information is stored in the finger table, and no additional storage space is needed.

The proximity-based routing scheme relaxes the coupling of the dissemination relationships and the overlay network. It enables more optimization techniques to be utilized in SONNET.

6.3 Up-stream router selection

When a node joins the system, it contacts one qualified up-stream router via the well-known node, just as in the basic subscription scheme. But instead of picking the particular responsible router as the up-stream router, as in the basic scheme, the qualified up-stream router contacts all its predecessors and successors for the most suitable up-stream router according to the proximity metric. The newcomer may choose how deep this message will go. Example proximity metrics include:

(1) Locality-aware selection
Since consistent hashing uniformly distributes nodes over the identifiers, there is no guarantee that physically close nodes are logically adjacent in the SONNET structure. We have to evaluate the physical distance between nodes manually. There are several metrics that represent the distance. In SONNET, we use lss = latency(r, c), the one-way latency from the router to the subscriber, because the key operation publish is executed from the up-stream router to the down-stream node. The latency may be obtained directly from probing, which is exact but wastes some bandwidth, or may be estimated through synthetic coordinate systems, which is faster but yields only approximations.

(2) Most stable first
msf = alt/adt, where alt denotes the average live time and adt the average down time. Again, this information can be collected via probing by the down-stream node or directly from super-peers.

(3) Cost-based selection
As mentioned before, it is desired that the query of the up-stream router contains that of the down-stream node, which will save a large amount of bandwidth. However, this may not always be possible. Instead, we resort to an overlap metric. In particular, cbs = overlap(c, r), where the overlap function evaluates the partial overlap metric between the subscriber and the router [5]. The overlap metric can be based on a query or based on a benchmark. The query-based overlap metric compares two queries and thus assumes a uniform distribution of data, while the benchmark-based overlap metric compares the data sets obtained by applying the two queries to some predefined benchmark. The benchmark-based overlap metric explores the features of applications better, but it depends on a benchmark that typically represents these characteristics, which is sometimes hard to collect.

Thus, the ranking conforms to the scoring function: score(c, r) = α / lss + β ⋅ msf + γ ⋅ cbs, where α, β, γ are three parameters (α + β + γ = 1), whose values depend on the subscriber’s individual needs.

6.4 Extra facilities using 0-source nodes

Since no query occupies 0-path identifiers, this portion of the ring is never used. However, if a node is under-provisioned, it may choose to construct another virtual node at a 0-path identifier. This portion of the system may provide extra facilities such as statistical information and schema information.

Statistics can be collected periodically for better tuning of the whole system. Usually, each root router is associated with an XML schema (or DTD), to which all the packets it publishes conform. This meta-data information can be stored at a particular 0-path virtual node by hashing its name. The placement and retrieval of such information conform to Chord placement and lookup. When a root router enters the system, it first looks up its schema title in the network. If there is already a root router that has the identical schema or a similar one, it has the option of using the already established one, avoiding an increase in the number of sources.

6.5 Dynamic workload balancing

There are two kinds of workload balancing in SONNET. The first one is the balance of workloads among the nodes with an identical query. It is called horizontal workload balance. Up-stream router selection guarantees the horizontal workload balance, since overloaded nodes refuse to serve the registration request, and lightly loaded nodes tend to provide better QoS.

The second kind of workload balance is called vertical workload balance. It means that the workload on up-stream and down-stream nodes should be balanced. We propose active degradation for vertical workload balance. When no up-stream router with a satisfactory QoS exists, a subscriber can degrade its digest to become a router corresponding to a shorter query. The degradation procedure is to cut the last element of the subscription query, and to re-construct the finger table based on the new content digest and order digest. Note that the filtering engine for the end user remains unchanged. The overhead of a degraded router is that it may need more effort to filter and disseminate additional packets for other down-stream nodes and itself. This is a trade-off between high QoS and heavy overhead for each router in SONNET. When a degraded node finds that the nodes with the same subscription query are no longer overloaded, it upgrades the query by one level. This technique is aimed at the imbalance in the length of queries.

7 Empirical study

7.1 Experimental setup

SONNET is empirically evaluated on a cluster of forty nodes. It is compared with two previous technologies, namely the Mesh-based XML dissemination [3] and YFilter [4]. SONNET is compared with the former over P2PSim [22], a multi-threaded, discrete event simulator to evaluate, investigate, and explore P2P protocols, for validating the overall performance in a large-scale network. The parameter settings of P2PSim are listed in Table 2. We implemented a 2-resilience mesh network to compare with SONNET. For fairness, in SONNET, each node may choose two up-stream routers. The robustness, network overhead, and workload distributions are compared between the mesh-based method and SONNET.

Table 2 The parameters P2PSim took in the experiments

Parameter name             Value        Step
Number of nodes            100−3000     100
Number of elements         400−2000     N/A
Topology                   Euclidean    N/A
Mean time to publish       3000         N/A
Mean time to live          10000        N/A
Mean time to die           0−10000      500
Exit time                  2 000 000    N/A
Stabilization timer        10000        N/A
Base for SONNET            2            N/A
Max-down-stream for Mesh   2            N/A
K-resilience for Mesh      2            N/A

Mesh-based XML dissemination technology is chosen for comparison for two reasons. First, it shares the idea of ONYX [2] in that the XPath queries are directly used as entries in the routing table. Second, the mesh-based technology emphasizes stability and robustness of the system. It shares the motivation of SONNET, which aims to provide a robust service for content-based publish/subscribe. Furthermore, YFilter [18] is used as the filtering engine in the mesh-based XML dissemination implementation. Thus, it is used as a reproduction of the ONYX system.

SONNET is also compared with YFilter, an accurate XML filtering engine. Since the method used in SONNET is an approximate filtering technique, we use YFilter as a benchmark for the filtering accuracy of SONNET. Furthermore, we compare the dissemination efficiency of SONNET with that of the YFilter system, which is released by the authors of [18]. Both the XML documents and the XPath queries are generated using the document and query generator shipped with the YFilter system.

In all experiments, we use 32 bits for the content digest and 56 bits for the order digest. The remaining bits in the 128-bit packet head and node identifier are used as flags or node digests.

In the experimental evaluation, the basic subscription scheme is called BR (basic routing), while the method with optimization techniques is called AR (advanced routing).
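The digest widths used in the experiments (a 32-bit content digest and a 56-bit order digest inside a 128-bit packet head or node identifier) leave 40 bits for flags or the node digest. A minimal sketch of such a layout is given below. The field order, the treatment of content plus order digest as the path digest, and all helper names are assumptions made for illustration; the text only fixes the widths.

```python
from typing import Tuple

CONTENT_BITS = 32    # content digest width used in the experiments
ORDER_BITS = 56      # order digest width used in the experiments
TOTAL_BITS = 128     # width of the packet head and of the node identifier
REST_BITS = TOTAL_BITS - CONTENT_BITS - ORDER_BITS  # flags or node digest


def _check(value: int, bits: int, name: str) -> None:
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")


def pack_identifier(content: int, order: int, rest: int = 0) -> int:
    """Pack the fields as content | order | rest (assumed field order)."""
    _check(content, CONTENT_BITS, "content digest")
    _check(order, ORDER_BITS, "order digest")
    _check(rest, REST_BITS, "flags/node digest")
    return (content << (ORDER_BITS + REST_BITS)) | (order << REST_BITS) | rest


def unpack_identifier(ident: int) -> Tuple[int, int, int]:
    """Inverse of pack_identifier: return (content, order, rest)."""
    rest = ident & ((1 << REST_BITS) - 1)
    order = (ident >> REST_BITS) & ((1 << ORDER_BITS) - 1)
    content = ident >> (ORDER_BITS + REST_BITS)
    return content, order, rest


def path_digest(ident: int) -> int:
    """Part of the identifier assumed to be used for routing a packet."""
    return ident >> REST_BITS
```

Under this layout two nodes that subscribe to the same query share `path_digest(...)` and differ only in the low 40 bits, which matches the statement that several node identifiers may correspond to one query.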


7.2 XML packet dissemination

The filtering power of the content digest and order digest is shown in Fig. 2, which plots the false positive ratio of our approximate filtering. The longer the paths, the more powerful our path-digest-based method is. Even for short path queries with four elements or attributes, the false positive rate is lower than 1/3. Note that 1) identifier matching is a low-cost process, and 2) only packets that pass the identifier matching process are sent to the filtering engine. Therefore, a low false positive rate means fast dissemination and low computation cost. Another interesting result shown in the figure is that increasing the number of elements does not affect the false positive rate much. Therefore, this approximate filtering scheme is suitable for applications in an open world, in which the published XML documents may have different schemas or representations.
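The two-stage process mentioned above (a cheap identifier match first, the exact filtering engine only for packets that pass) can be sketched as follows. The sketch assumes that the false positive rate is measured as the fraction of packets that pass the digest stage but are then rejected by the exact engine; the text does not spell out the definition, so this reading and the function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def two_stage_filter(packets: Sequence[str],
                     digest_match: Callable[[str], bool],
                     exact_match: Callable[[str], bool]) -> Tuple[List[str], float]:
    """Run the cheap digest test first and the exact test only on survivors.

    Returns the delivered packets and the observed false positive rate of
    the digest stage (0.0 if nothing passed the digest stage).
    """
    passed = [p for p in packets if digest_match(p)]
    delivered = [p for p in passed if exact_match(p)]
    false_positives = len(passed) - len(delivered)
    rate = false_positives / len(passed) if passed else 0.0
    return delivered, rate


# Toy example for the query /a/b: the digest stage only checks that both
# elements occur, the exact stage also checks their order.
packets = ["/a/b/c", "/b/a/c", "/x/y"]
cheap = lambda p: {"a", "b"} <= set(p.strip("/").split("/"))
exact = lambda p: p.startswith("/a/b")
print(two_stage_filter(packets, cheap, exact))  # (['/a/b/c'], 0.5)
```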

Fig. 2 False positive rate of SONNET

The cost of our path-digest-based dissemination method can be divided into two parts. The first part is the cost of generating the path digest when a new XML packet is generated. The second part is the cost of comparing the packet head with the node identifier. Note that the first part of the cost has no corresponding part in YFilter. The experimental result for the packet generating cost is shown in Fig. 3. Generating path digests for XML documents is very efficient. Furthermore, the digest generating time is linear in the depth of the XML documents and in the number of packets.

Fig. 3 The cost of generating digests

The dissemination cost of SONNET is compared with that of YFilter in Fig. 4. Since SONNET does not need to parse the packet body during dissemination, the parse time is not included for YFilter either. The filtering time is the time for evaluating ten thousand XPath queries over one hundred XML packets on one node. SONNET outperforms YFilter when the depth of the XPaths is less than eight. Note that SONNET is designed for distributed P/S applications. Usually there are fewer long path queries than short ones. Furthermore, the packets arriving at nodes corresponding to long path queries are those that have already passed the nodes of short queries. Therefore, evaluating long path queries will not consume too much time.

Fig. 4 Dissemination efficiency compared with YFilter (evaluating 10 000 path queries over 100 packets)

From the experimental results in Figs. 2 and 4, we can conclude that the path-digest-based XML packet filtering technique performs well in terms of accuracy and filtering efficiency.

7.3 Robustness of SONNET

SONNET is developed over a Chord-like overlay network. As mentioned, the rationale behind the design is that a P2P-based overlay network may provide better routing and maintenance performance. SONNET, with the basic routing scheme, is compared with another system designed for XML packet transmission over a mesh-based network. Since the mesh-based method has no stabilization mechanism, we turned off the stabilization of SONNET. The resulting loss rate is shown in Fig. 5. The loss rate is defined as the number of packets received over the number of packets that are expected to be received. Even without this maintenance mechanism, SONNET outperforms the mesh-based method. Actually, when the stabilization function is turned on, SONNET almost never loses any packet. This advantage is the result of the robustness of the underlying overlay network. It also validates our argument that scalable P/S systems should be developed upon P2P-based overlay network protocols.

Fig. 5 The loss rates experienced by nodes in SONNET (basic routing as representative) and Mesh systems. The figure shows loss rates of the average, the best 10%th and 90%th nodes

7.4 Efficiency of subscription

To measure the subscription performance of the SONNET system, we vary the number of nodes from 100 to 3000. To compare the two systems fairly, we tune the deathmean parameter so that the loss rates of the two systems are roughly the same. For both SONNET and the mesh-based method, the upper bound on the number of hops is O(log n), so we expect to see a similar number of hops in both systems. However, when the number of subscribers is small, the SONNET system is not entirely binary, because many positions in the ring are not occupied at all, which compresses the SONNET structure. As a result, routers of i-length paths with a small i may have extra work. This is why a smaller number of hops is observed for SONNET than for the mesh in Fig. 6(a).
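The O(log n) bound can be illustrated with generic Chord-style finger routing. The sketch below is not SONNET's digest-based routing: it assumes a fully occupied ring of 2^bits identifiers in which every node has fingers at distances 1, 2, 4, and so on, and it counts the hops of greedy routing. The function names are illustrative.

```python
import math


def chord_hops(src: int, dst: int, bits: int) -> int:
    """Hops of greedy finger routing on a fully occupied ring of 2**bits ids."""
    size = 1 << bits
    hops = 0
    cur = src
    while cur != dst:
        dist = (dst - cur) % size
        # Follow the largest finger (a power of two) that does not overshoot.
        step = 1 << (dist.bit_length() - 1)
        cur = (cur + step) % size
        hops += 1
    return hops


def hop_bound(n: int) -> int:
    """ceil(log2 n): the worst-case hop count suggested by the O(log n) bound."""
    return math.ceil(math.log2(n))


print(max(chord_hops(0, d, 5) for d in range(32)))  # 5
print(hop_bound(100), hop_bound(3000))              # 7 12
```

For the range of 100 to 3000 nodes used in the experiment, this bound grows only from 7 to 12 hops; a sparsely occupied ring, as discussed above, can need fewer.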

Fig. 6 Observed number of hops and standardized latency with varying number of subscribers. In this experiment, the number of possible elements is fixed and the loss rates for the two systems are tuned to be roughly the same. We also plotted the standardized latency reduced for each link in advanced routing.

The situation for the latency metric is similar, except that in advanced routing the latency is further optimized. Figure 6(b) verifies this empirically. We use the normalized latency, latency/avg(latency), to standardize the result.

7.5 Basic routing vs. advanced routing

The ability to select up-stream routers according to cbs (cost-based selection) allows the SONNET system to save a large amount of bandwidth, since the subscriber chooses the router whose query overlaps its own query most. In particular, if query containment exists, the bandwidth is further optimized, as described before. Similarly, latency is expected to be optimized according to lss. We evaluate how much bandwidth is saved by using the cbs metric. The bottom line in Fig. 6(b) shows the normalized latency reduction in advanced routing. Table 3 shows the drastic savings in bandwidth and latency for different query depths. The result in each column is obtained separately, that is, the values in the first column are obtained by setting the parameter α = 1 and, similarly, those in the second column by setting γ = 1. If a subscriber needs a balance between the two optimizations, it chooses appropriate α and γ, which balances the results in the two columns. Note that the percentage saved is for each link; for a particular routing path, the result is accumulated.

Table 3 The percentage saved in bandwidth and reduced in latency per link in advanced routing. This result does not include extra bandwidth saved through filtering of merged results

Depth   Latency saving   Bandwidth saving
1       21%              17%
2       25%              25%
3       29%              34%
4       30%              37%
5       31%              39%

The effectiveness of active degradation and upgrading between up-stream and down-stream nodes is shown in Fig. 7, which illustrates the change of workload distribution before and after degradation. When no degradation is applied (degradation level 0), the workload is unbalanced: some nodes are overloaded, i.e., their load is 1.25−3 times more than the average load, and some nodes are lightly loaded. Degradation is quite effective in the experiments. When three-level degradation is allowed, most nodes achieve a similar workload.

Fig. 7 Workload distribution affected by degradation

7.6 Workload balancing and scalability

Node stress is the number of down-stream nodes per node, and link stress is the number of packets sent per link. In SONNET, every node has approximately two up-stream routers for resilience and, due to the characteristics of consistent hashing, roughly two down-stream routers as well. However, as mentioned before, some nodes have extra work because the number of subscribers is small. Thus, in Fig. 8(a), the average number of links per node in SONNET approaches 2 and becomes a little smaller than 2 as the number of nodes in the system increases (owing to the fact that the edge subscribers do not have any down-stream subscribers), but at the same time the 90%th node has a workload of approximately 4 links, indicating that there are still many nodes that have a workload of at least four links. We can also see that the number of extra links of basic routing remains small as the size of the system increases. In contrast, the number of links for the mesh system grows as more nodes join the system.

Fig. 8 The node stress and link stress of nodes in SONNET system vs. Mesh system with a varying number of subscribers. Again, the number of elements is fixed. We also plot the extra links out of Chord fingers in SONNET.

As illustrated in Fig. 8(b), the number of packets per link for the mesh system is balanced, but the number of packets per link in SONNET is much higher and more diverse. This is because in the SONNET system every link is fully utilized, e.g., a down-stream node needs all the packets of its up-stream routers. Thus, the link stress is much higher than that of the mesh system. Note that this does not mean SONNET is less efficient in terms of workload balance. On the contrary, it means SONNET is more flexible, since each node may automatically find appropriate up-stream routers without global knowledge of the system. Thus, the system may decrease the overhead of network maintenance by fully utilizing existing network links. As shown in the previous experimental results, the system can stay robust and efficient even when the link stress is high.
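The two workload measures used in this subsection, node stress (down-stream nodes per node) and link stress (packets sent per link), can be computed from a dissemination log as in the following sketch. The input formats, a list of (up-stream, down-stream) links and a list of per-packet transmissions, are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

Link = Tuple[Hashable, Hashable]  # (up-stream node, down-stream node)


def node_stress(links: Iterable[Link]) -> Dict[Hashable, int]:
    """Number of down-stream nodes attached to every node that appears."""
    stress: Dict[Hashable, int] = {}
    for up, down in links:
        stress[up] = stress.get(up, 0) + 1
        stress.setdefault(down, 0)  # edge subscribers have no down-streams
    return stress


def link_stress(transmissions: Iterable[Link]) -> Dict[Link, int]:
    """Number of packets sent over every link, one log entry per packet."""
    return dict(Counter(transmissions))


links = [("r", "a"), ("r", "b"), ("a", "c")]
print(node_stress(links))  # {'r': 2, 'a': 1, 'b': 0, 'c': 0}
print(link_stress([("r", "a")] * 3 + [("r", "b")]))  # {('r', 'a'): 3, ('r', 'b'): 1}
```

Edge subscribers show up with a node stress of 0, which is why the average can fall slightly below 2 even when most routers serve about two down-stream nodes.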

8 Conclusions

This paper presents and evaluates SONNET, a highly efficient and available content-based subscription system over structured overlay networks. It is one of the first works that try to utilize the underlying overlay network for publish/subscribe. We design a path-digest-based approach for XML packet dissemination that can be integrated with structured overlay networks. The elegance of SONNET is that it is simple yet open for optimization. The advanced routing scheme built upon the basic scheme is a first step toward developing such advanced technologies. Analytical and empirical results show the efficiency, robustness, availability, and scalability of SONNET in XML packet dissemination with path query subscriptions.

Acknowledgements The authors would like to thank Mr. Xi Liu for his contribution to the early stage of the SONNET system, and Mr. Yongyan Liu for his help in the implementation of the SONNET prototype system.

References

1. Eugster P T, Felber P A, Guerraoui R, et al. The many faces of publish/subscribe. ACM Computing Surveys, 2003, 35(2): 124–131
2. Diao Y, Rizvi S, Franklin M. Towards an internet-scale xml dissemination service. In: Proceedings of the 30th VLDB Conference, 2004
3. Snoeren A C, Conley K, Gifford D K. Mesh-based content routing using xml. In: Proceedings of 18th ACM Symposium on Operating System Principles (SOSP’01), 2001
4. Gupta A K, Suciu D, Halevy A Y. The view selection problem for xml content based routing. In: Proceedings of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’2003), 2003
5. Papaemmanouil O, Cetintemel U. Semantic multicast for content-based stream dissemination. In: Proceedings of the 7th International Workshop on the Web and Databases (WebDB’2004), 2004
6. Gupta A, Sahin O D, Agrawal D, et al. Meghdoot: Content-based publish/subscribe over p2p networks. In: Proceedings of the Fifth ACM/IFIP/USENIX International Middleware Conference (Middleware’2004), 2004, 254–273
7. Idreos S, Koubarakis M, Tryfonopoulos C. P2p-diet: An extensible p2p service that unifies ad-hoc and continuous querying in super-peer networks. In: Proceedings of ACM SIGMOD 2004 International Conference on Management of Data (SIGMOD’2004), 2004
8. Stoica I, Morris R, Karger D, et al. Chord: a scalable peer-to-peer lookup service for internet applications. In: Proceedings of the ACM SIGCOMM 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM’2001), ACM Press, 2001, 149–160
9. Ratnasamy S, Francis P, Handley M, et al. A scalable content-addressable network. In: Proceedings of the ACM SIGCOMM 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM’2001), 2001
10. Rowstron A, Druschel P. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In: Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms (Middleware’2001), 2001, 329–350
11. Zhao B Y, Kubiatowicz J, Joseph A D. Tapestry: a fault-tolerant wide-area application infrastructure. ACM SIGCOMM Computer Communication Review, 2002, 32(1): 81
12. Kaashoek M F, Karger D R. Koorde: A simple degree-optimal distributed hash table. In: Proceedings of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS’03), 2003
13. Ratnasamy S, Handley M, Karp R, et al. Application-level multicast using content-addressable networks. In: Proceedings of Third International Workshop on Networked Group Communication (NGC’01), 2001
14. Castro M, Druschel P, Kermarrec A-M, et al. Scribe: A large-scale and decentralized application-level multicast infrastructure. IEEE Journal on Selected Areas in Communications, 2002, 20(8)
15. Galanis L, Wang Y, Jeffery S R, et al. Locating data sources in large distributed systems. In: Proceedings of 29th VLDB Conference (VLDB’2003), 2003
16. Koloniari G, Pitoura E. Content-based routing of path queries in peer-to-peer systems. In: Proceedings of the EDBT’2004 Conference, 2004
17. Bonifati A, Matrangolo U, Cuzzocrea A, et al. Xpath lookup queries in p2p networks. In: Proceedings of the WIDM’2004 Workshop, 2004
18. Diao Y, Fischer P, Franklin M, et al. YFilter: Efficient and scalable filtering of xml documents. In: Proceedings of the 18th IEEE International Conference on Data Engineering (ICDE’2002), 2002
19. Bruno N, Gravano L, Koudas N, et al. Navigation- vs. index-based xml multi-query processing. In: Proceedings of the 19th IEEE International Conference on Data Engineering (ICDE’2003), 2003
20. Gong X, Qian W, Yan Y, et al. Bloom filter-based xml packets filtering for millions of path queries. In: Proceedings of the 21st IEEE International Conference on Data Engineering (ICDE’2005), 2005
21. Myrvold W J, Ruskey F. Ranking and unranking permutations in linear time. Information Processing Letters, 2001, 79(6): 281–284
22. P2PSim. P2p simulator. http://www.pdos.lcs.mit.edu/p2psim/, 2004
