Efficient Management of Multidimensional Data in Structured Peer-to-peer Overlays

Djelloul Boukhelef (1)
Supervised by Prof. Hiroyuki Kitagawa (1,2)

(1) Graduate School of Systems and Information Engineering
(2) Center for Computational Sciences
University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8573, Japan
[email protected] , [email protected]

ABSTRACT

Efficient handling of multidimensional data is a challenging issue in P2P systems. DHT-based systems provide extremely scalable mechanisms for exact-match lookups. However, efficient evaluation of complex queries (such as multi-attribute range search and kNN search) over huge volumes of multidimensional data is still an open problem in DHTs, mainly because they use hashing, which destroys the spatial locality of the stored data, and also because of the high cost of node joins and departures. In this paper we propose a new scalable and distributed indexing structure for managing multidimensional data in dynamic P2P systems. Our approach is based on the Content Addressable Network paradigm. The key idea is to equip each node with long links towards some distant nodes in the system, such that a message moves faster to its target during routing while the cost of maintaining the network during node churn is minimized. Our system is a pure P2P overlay that is fully decentralized and self-organizing, where no predefined limits are imposed on the size of the network or the routing state per node. Each node self-adjusts its routing state to cope with changes in network membership. Specifically, in a network with N nodes, each node maintains O(log N) long links. Exact-match and range queries are routed within O(log N) hops. We also provide an effective load balancing mechanism that assigns a newly joining node to a heavily loaded area in the key space. This mechanism guarantees a constant load imbalance factor, with an amortized cost of O(log N) messages per node join, compared to O(log^2 N) in other systems. We implemented a simulator and conducted experiments to study the performance of our design. Experimental results validate the scalability and efficiency of our approach.

1. INTRODUCTION

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, to post on servers or to redistribute to lists, requires a fee and/or special permission from the publisher, ACM. VLDB ‘09, August 24-28, 2009, Lyon, France. Copyright 2009 VLDB Endowment, ACM 000-0-00000-000-0/00/00.

Peer-to-peer (P2P) computing has emerged as a powerful paradigm for structuring large-scale distributed systems in an ad-hoc manner, offering a large variety of features such as efficient storage and location of data items, a wide-area routing architecture, massive scalability, and fault-tolerance [9]. P2P systems are fundamentally different from traditional client-server systems in that they neither employ any central authority nor assume any global knowledge. Participating nodes act simultaneously as clients and servers, and exchange information and services directly with each other. By nature, P2P systems are dynamic: nodes can join and leave the network freely. Each node keeps contact with some other nodes in the system (called neighbors). Efficiency, scalability, and fault-tolerance are recognized as the central challenges in designing completely decentralized and self-organizing P2P systems.

At the heart of P2P data sharing systems, efficient data storage and location is still an open challenge, related to both scalability and efficiency. Facing the problems of scalability and search correctness in unstructured P2P systems, researchers have proposed the structured P2P approach, in which the overlay topology is tightly controlled and the mapping of data items to nodes is strictly defined. A wide range of structured P2P protocols have been proposed over the last few years, such as CAN [11], Chord [13], and P-Grid [1], to name only a few. Their main focus is on lookup efficiency, achieved by shortening the lookup paths and minimizing the routing state maintained by each node in the system. Distributed Hash Tables (DHTs) implement a scalable distributed hash function to deterministically map data items to nodes. DHT-based systems have become an important class of P2P protocols due to their scalability, routing efficiency, and search completeness. Storage and query loads are shared among all participating nodes in such a way that they manage approximately the same amount of data and participate fairly in the query processing and routing tasks.
Load balancing, another critical issue in the design of P2P systems, deals with the even distribution of data and routing loads among the nodes that make up the system. It is often desirable that each node assumes responsibility for a portion of the key space that is proportional to its power (measured in terms of its processor speed, available bandwidth, storage capacity, etc.), and that this property is maintained as nodes join and leave the system [5]. To achieve a uniform distribution of hash values among peers, DHTs generally adopt one of the following strategies: (i) randomizing the DHT address associated with each item with a “good enough” hash function, and (ii) making each DHT node responsible for a balanced portion of the DHT address space [6]. The use of randomization may result in several problems. With high probability, random distribution of values creates load imbalance among peers of up to a logarithmic factor [7, 2]. Uniformity by randomization is also sensitive to adversarial intervention through peer removals and/or joins [2].

DHT protocols efficiently support exact-match lookups. However, because randomization does not preserve the locality of data, the resulting DHT schemes are generally not efficient at supporting complex queries such as range queries and nearest-neighbor searches. Applications that require efficient execution of such queries need more sophisticated methods that simultaneously achieve good load balance and preserve data locality.

CAN [11] is a decentralized self-organizing P2P system that provides DHT functionality on an Internet-like scale. CAN operates on a virtual d-dimensional Cartesian key space that is dynamically partitioned among the existing nodes such that every node owns its own distinct region within the overall key space. Each node maintains links towards its immediate neighbors (i.e. nodes managing the adjacent regions). CAN’s greedy routing using immediate links is not efficient in large-scale networks, especially when the number of dimensions is small, mainly because at each hop a message can only be routed through an adjacent node that is closer to the target. Moreover, in low dimensions a node has only a few neighbors through which it can forward messages, which makes the routing process more sensitive to network failures. Achieving rapid lookup response therefore requires a lookup protocol that shortens lookup paths and evenly distributes routing traffic, to avoid node/link bottlenecks and improve system responsiveness.

Our contributions. We propose in this paper a novel method for efficient routing in grid overlays, which we call RCAN.
Our proposal (topological structure and routing mechanism) aims to optimize several features simultaneously: short routing paths, small routing state, low maintenance cost during node joins and departures, and more routing flexibility and fault-tolerance. Our basic substrate is a simple CAN overlay. As mentioned above, greedy routing using only neighboring nodes is inefficient and vulnerable to failures. The idea of RCAN is to equip each node with a few long links towards some non-adjacent nodes (called distant neighbors). RCAN establishes long links in such a way that the network diameter is small while the cost of building and updating routing tables when nodes join or leave the system is very low. In RCAN, a node selects as “distant neighbors” nodes located at distances proportional to powers of 2 from itself in the coordinate space. Long links are clockwise-directed and wrap around along each dimension. The long links from different nodes together form multiple independent rings along each dimension. These rings are small, and their maintenance is very simple. RCAN is self-scaling: no predefined limit is imposed on the number of links per node. Each node maintains a routing state that automatically scales with the network size. The number of rings and their sizes also self-adjust as nodes join or leave the network. In a uniformly partitioned space, a node is a member of one ring per dimension, i.e. the ring that intersects its region.

Long links have been used by many other P2P protocols to provide good routing performance, such as Chord [13] and Symphony [10]. In direct relation to our work, eCAN [15], LDP [12], and SCAN [14] have also adopted long links, but in different ways. These three methods, like our multi-ring infrastructure, are built on top of a conventional CAN-like overlay. Our long links model is simple and straightforward. Long links can be established during or after the construction of the overlay. Our multi-ring infrastructure is mainly used to speed up the routing process and to enable a high level of routing flexibility and robustness by providing multiple disjoint paths between any pair of nodes. Because it is easy to maintain, our multi-ring overlay efficiently handles frequent node joins and departures. Unlike eCAN’s hierarchical design, RCAN is a pure P2P system in which all nodes are at the same level and play the same role. A flat architecture is a good alternative to a hierarchy in highly dynamic networks, and is necessary to reach a high degree of scalability, achieve fair load distribution among the existing nodes, avoid node/link bottlenecks, and provide more routing flexibility and robustness. Long links in LDP point to randomly selected nodes. SCAN establishes long links following a harmonic distribution. Unlike in these methods, the number of long links per node is not fixed a priori in RCAN. Moreover, as the size of the network changes, nodes autonomously adapt the size of their zones to ensure good load balancing, and adjust the number of long links they maintain to achieve good routing performance.

Balancing the load in a decentralized P2P system is a challenging problem due to the dynamic nature of such an environment and the absence of any kind of global knowledge about its actual composition. We propose a simple and efficient decentralized mechanism to fairly distribute the data load among the participating nodes in a large-scale self-organizing Content Addressable Network.
The key idea is to enable a new node that joins the system to share the load with a heavily loaded node already in the system, such that the total load remains evenly distributed among all nodes. In the multiple random choices method, a new node probes the loads of some nodes selected uniformly at random, then chooses the heaviest one among them to share the load with. The spirit of our work is close to that of [7, 12]. However, our work proposes more efficient and practical load balancing techniques that use long links to achieve a small and constant load imbalance. Moreover, our method is simple, easy to deploy, and incurs only a small communication overhead. In the RCAN overlay, a newly joining node selects one node in the system at random, and the heaviest node among that node and its neighbors is chosen for the split. Routing and load information in RCAN is disseminated only between a node and its immediate and distant neighbors.

The contributions of our work so far are summarized as follows:

• Self-scaling architecture: Each node autonomously adapts the size of its routing table as the network size (n) changes, without the need to estimate the network size or set an upper limit for it.
• Logarithmic routing state: Each node in RCAN maintains an average of O(d) short links and O(log n) long links with high probability (w.h.p).
• Logarithmic lookup performance: A search initiated from any node reaches the node where the key is stored after at most O(log n) overlay hops (w.h.p).
• Low maintenance cost: The communication complexity of maintaining long links when a node joins is O(log n) messages, instead of O(log^2 n) in other logarithmic-degree overlays such as Chord, SCAN, LDP, etc.
• Decentralized load balancing mechanism: It extends the multiple choices method to achieve a very low load imbalance among all the participating nodes. Specifically, in the case of uniform data distribution, the load imbalance factor is constant and close to 4. Our load balancing strategy is executed during node joins and incurs a very low communication overhead.

We have implemented a prototype system of RCAN, and performed an experimental study to compare the efficiency of RCAN with that of the original CAN, LDP and SSP [12]. Results confirm the scalability and efficiency of our design, and show its superiority over these methods in terms of routing performance, design flexibility, and load balancing.
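The join-time rule described above (pick one node at random, then split the heaviest node among that node and its neighbors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the `loads` and `neighbors` dictionaries, and the use of an item count as the load measure are all hypothetical.

```python
import random


def choose_node_to_split(loads, neighbors, rng=random):
    """Pick the existing node that a newly joining node should split with.

    loads:     dict node_id -> load (assumed here to be a stored-item count)
    neighbors: dict node_id -> list of neighbor ids (immediate and distant)
    rng:       source of randomness (the random module or a random.Random)
    """
    # Step 1: the newcomer contacts one existing node chosen at random.
    contact = rng.choice(sorted(loads))
    # Step 2: the candidates are the contact itself plus all its neighbors.
    candidates = [contact, *neighbors[contact]]
    # Step 3: the heaviest candidate is the one whose region gets split.
    return max(candidates, key=lambda node: loads[node])


if __name__ == "__main__":
    loads = {0: 5, 1: 50, 2: 7}
    neighbors = {0: [1], 1: [0, 2], 2: [1]}
    print(choose_node_to_split(loads, neighbors))
```

Because the candidate set includes long-link neighbors, a single random probe already samples load from several distant parts of the key space, which is the intuition behind using long links for load balancing.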

2. RELATED WORK

Content Addressable Network (CAN), described in [11], achieves O(d · n^(1/d)) routing performance with a routing state of only O(d) per node. Recently, several works have aimed at improving CAN’s routing performance. Their essence is to slightly increase the routing state at each node in order to reduce the lookup latency and provide more routing flexibility and fault-tolerance.

In [15], Xu et al. proposed eCAN, a design for efficient routing in CAN using expressways. The objective of their work is closely related to ours. eCAN is a hierarchical scheme that maintains neighbor pointers at different levels of the logical space. Expressways are built by taking snapshots at different times during the evolution of the system. Some peers are hence selected to handle the expressways. An expressway maintains links to other neighboring expressways at the same level, and also pointers to the peers covered by its region. eCAN shows that it is possible to achieve O(log n) routing performance by keeping logarithmic routing information at each node in a CAN overlay. In eCAN, routing is conducted in a bottom-up manner. A query message needs to be propagated to upper-level expressways, which creates a performance bottleneck at the higher-level nodes that maintain expressways. Moreover, the construction of long links in eCAN depends on the joining process of nodes, which may have a direct impact on routing flexibility, scalability and fault-tolerance.

LDP [12] improves CAN’s routing performance by establishing long distance pointers (LDPs) to some randomly selected nodes. The number of LDPs per node is fixed and does not change with the network size. The selection of random distant neighbors incurs high communication overhead [12], and does not guarantee good network coverage. An improved method based on sub-space pointers (SSP) was also proposed [12]. It consists of partitioning the key space into fixed virtual sub-spaces.
Each node selects random nodes from each sub-space as its long-range contacts.

In a seminal paper [8], Kleinberg shows that a d-dimensional grid augmented with only one additional long link per node, chosen at random following a harmonic distribution, yields networks in which routing can be performed in an expected O(log^2 n) hops. Symphony [10] and SCAN [14] are two P2P systems built on this idea. They use a “fixed” number k of long links per node to achieve an expected routing path length of O((log^2 n)/k). Symphony arranges all the participating nodes in one big ring. Each node then establishes k links using a harmonic distribution function. SCAN approximates Kleinberg’s small-world model in multidimensional spaces. But, instead of estimating the network size as in Symphony, SCAN places an upper limit N_max on the network size and takes k = log N_max.

The aforementioned methods, except eCAN, follow a flat P2P design. They establish a fixed number of long links per node, and achieve O(log n) lookup performance on average (k is chosen to be close to log n in Symphony and SCAN). Scalability is a common problem in LDP and SCAN, as the number of links per node is fixed and does not scale with the network size. Large values of k incur high maintenance overhead in small networks, and small values of k do not guarantee a logarithmic network diameter (maximum path length) in large networks [3]. Moreover, mechanisms to build long links during the evolution of the network are not clearly addressed in these methods. Although long links can be maintained in a lazy manner (as in eCAN), the maintenance of one link needs O(log n) messages on average. Hence, with k close to log n, the total cost of updating the k outgoing links is O(log^2 n). The same remark applies to the departure of nodes, since k incoming links need to be re-established. Thus, the expected communication overhead for a node join or leave is O(log^2 n) messages for all of the aforementioned methods, as well as for Chord, Pastry, etc.
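To make the harmonic link selection used by the small-world designs above more concrete, the following sketch draws a long-link distance on a unit ring with probability density proportional to 1/x over [1/n, 1], using inverse-CDF sampling. It is only an illustration of one standard way to sample such a distribution; the function name and the unit-ring normalization are assumptions, and the code is not taken from any of the cited systems.

```python
import math
import random


def harmonic_link_distance(n, rng=random):
    """Draw a distance in [1/n, 1) with density proportional to 1/x.

    The CDF is F(x) = 1 + ln(x) / ln(n), so inverting u = F(x) for a
    uniform u in [0, 1) gives x = n ** (u - 1).
    """
    u = rng.random()
    return math.exp(math.log(n) * (u - 1.0))


if __name__ == "__main__":
    rng = random.Random(42)
    samples = [harmonic_link_distance(1024, rng) for _ in range(5)]
    print(samples)
```

Short distances are drawn much more often than long ones, which is what lets greedy routing make progress at every distance scale; the price is that each of the k random links has to be located and repaired individually when nodes join or leave.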

3. OUR PROPOSAL

In this section we briefly describe RCAN, a Multi-ring Content Addressable Network [3]. RCAN is a new self-scaling P2P protocol with a novel topological and routing infrastructure. The basic substrate of RCAN is a conventional grid-like overlay, where nodes know only about their immediate neighborhoods. The key design idea of RCAN is to equip each node with a few long links towards some distant nodes in the system. Long links are established in such a way that the routing path is shortened while the maintenance overhead of building and updating these links when nodes join or leave the system is very low. Distant neighbors are situated at distances that are inverse powers of 2 from the originating node in the coordinate space. The set of long links from each node is partitioned into d small subsets, each of which is established along one dimension. Long links are clockwise-directed and wrap around the key space. The set of all long links in the system yields multiple independent rings along each dimension (the multi-ring routing infrastructure). The rings are small, and their maintenance is very easy. The number of rings and their sizes (number of slots in a ring) self-adjust as nodes join and leave the network. In a uniformly partitioned key space, a node is a member of only one ring per dimension, i.e. the ring that intersects its region.

The goal of RCAN is twofold. First, it aims to improve the routing performance and enhance the fault-tolerance of CAN-like overlays by building fast shortcuts between nodes at the overlay level, while minimizing their maintenance cost during frequent node joins and departures (churn). The second, ultimate goal is to efficiently support semantic queries over data with multidimensional keys. Semantic queries include, but are not limited to, multi-attribute range queries and k-nearest neighbor search. From the architectural point of view, RCAN is mainly designed for large-scale dynamic P2P systems.

Self-organization. RCAN is a fully self-adjusting P2P system in which no upper limits are imposed on the number of links per node or on the network size. Each node maintains a routing table, consisting of the set of links towards its neighbors, which automatically scales when the network size changes. A self-organizing P2P overlay has the ability to spontaneously adapt itself to continuous changes in network membership due to frequent and autonomous node joins and departures, without the need for an external or central authority. A self-organizing system should also take into account the data distribution, which can be affected by the insertion and deletion of data items, and the routing loads imposed on nodes and links by query processing. In a dynamic P2P system, nodes join and leave the system frequently, which may partially impair the predefined structure of the overlay and reduce its performance. It is therefore essential to deploy low-cost stabilization mechanisms that restore the system structure and keep its performance at an acceptable level. Following this idea, we also proposed efficient techniques to maintain the routing tables during node churn that incur a very low communication overhead. With high probability, each node in RCAN maintains O(log n) long links, at most O(log n) messages are needed to build the routing table of a newly joining node, and O(log n) messages are needed to update the invalid entries in the other O(log n) affected routing tables [3]. In a self-organizing system, nodes should cooperate to ensure fair load balancing among them, and provide more routing flexibility and fault-tolerance. RCAN provides a cost-free yet efficient mechanism to cope with data load imbalance at a large scale [4].
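As a rough sanity check of the O(log n) bound on long links, consider the idealized case of a perfectly uniform partition, together with the rule stated in Section 3.2 that a node keeps one long link per split of its region along each dimension. The symbol L (number of halvings) is introduced only for this illustration, and the high-probability argument for non-uniform partitions in [3] is not reproduced here.

```latex
\begin{align*}
n = 2^{L} \text{ equal regions}
  &\;\Longrightarrow\; r.l = L = \log_2 n \quad \text{for every region } r,\\
\text{long links per node}
  &= \sum_{i=0}^{d-1} \bigl(\text{splits of } r \text{ along dimension } i\bigr)
   = r.l = \log_2 n,\\
\text{long links per dimension}
  &\approx \frac{\log_2 n}{d}.
\end{align*}
```

For example, with n = 2^16 and d = 2, each node would keep 16 long links, 8 along each dimension.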

3.1 Overview of RCAN Overlay

RCAN operates on a native d-dimensional Cartesian key space that wraps around each dimension. The key space is subdivided into non-overlapping hyper-rectangular regions (also called zones). The sizes of the regions can change dynamically through split and merge operations that occur when nodes join or leave the system. In what follows we describe the data and overlay structures in RCAN.

Regions. Each region r is given a globally unique identifier (r.id) generated by applying a hash function to a reference point from r itself. The identifier of a region need not bear any semantics, yet it should distinguish the region during its whole lifetime. In RCAN, the reference point of a region is the smallest point that may possibly belong to that region (see footnote 1). The goal is to guarantee that the reference point does not change when the region is split or merged.

Level of a region. RCAN is a decentralized self-organizing content addressable network. Multiple splits and merges may occur independently at the same time and at different locations in the key space. As a consequence, regions with different extents may coexist. To keep track of the evolution of the key space, each region r is associated with a positive integer r.l (called its level) that indicates the number of partitioning operations (split and merge) the region r has undergone (see footnote 2). The next dimension along which the region should be split (resp. merged) is inferred from the region’s level. Specifically, the next dimension for splitting (resp. merging) r is r.l mod d (resp. (r.l − 1) mod d).

Footnote 1: In 2D RCAN (figure 1(a)), the reference point of a region corresponds to the top-left corner of its bounding rectangle.
Footnote 2: Conceptually, the level represents the depth of a region in the virtual partitioning tree (i.e. distance from the root).

Figure 1: 2D RCAN overlay. (a) Multi-rings infrastructure; (b) Routing from source to target in CAN (black arrows) and RCAN (blue arrows). [Panel titles: (a) Long links model; (b) Torus model.]

Example. In figure 1(a), nodes 3 and 4 are siblings, but 2 and 8 are not. The levels of nodes 11, 7, 10, 3, and 6 are 3, 4, 5, 6, and 7 respectively.

Split. The bounding hyper-rectangle of a region r is subdivided into two equal, disjoint regions along one dimension i_0. The extent of r is reduced to cover only the first half. The second half is assigned to a new sibling region s. Data items that fall into the second half are also transferred to s. After the split, the level of r is incremented by one, and s is given the same level as r.

Merge. This is the inverse of split. Two sibling regions r and s are combined into one bigger region that covers both of them. One of the two regions (without loss of generality, say r) is extended to take the place of the new region. The other region (s) is deleted after its data items are moved to r. Finally, the level of r is decremented by one.

Assigning data items to regions. As stated above, our aim is to efficiently support semantic queries (e.g. range queries, nearest neighbor search) over multidimensional data. The idea is to use a locality-preserving mapping that places data items in the key space according to their semantics (the spatial coordinates of data points in 3D space, for example). Locality-preserving mappings ensure that items with similar attribute values are assigned to the same region, or at least stored nearby. CAN [11] and Chord [13] are examples of DHTs that employ hashing techniques to assign IDs to items and nodes. This randomization, unfortunately, destroys data proximity, so such systems can efficiently support only exact-match queries. In contrast, RCAN operates on a native key space where data items are mapped directly to regions according to their attribute values. As discussed above, this is very important for efficiently supporting semantic queries.
However, in the absence of good load balancing techniques, this may result in a highly imbalanced partitioning where some regions are big and store a large amount of data, while others are small and store only a few items (or may be empty). The situation may become arbitrarily bad if the data distribution is not uniform. To address this problem, RCAN proposes simple and efficient load balancing techniques that handle the load imbalance caused by changes in the network composition due to node joins (static load balancing) [4].
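The split and merge operations and the level-based choice of dimension described in this section can be sketched as follows. This is a minimal illustration, assuming a region is stored as a lower corner `lo` (playing the role of the reference point) and an upper corner `hi`; the class and field names are hypothetical, and the transfer of data items between regions is omitted.

```python
from dataclasses import dataclass


@dataclass
class Region:
    lo: list        # lower corner, used here as the reference point
    hi: list        # upper corner (exclusive)
    level: int = 0  # number of splits this region has undergone

    def split(self):
        """Halve the region along dimension (level mod d); return the sibling."""
        d = len(self.lo)
        i = self.level % d
        mid = (self.lo[i] + self.hi[i]) / 2
        sibling_lo = list(self.lo)
        sibling_lo[i] = mid
        # The sibling takes the second half and the same (new) level.
        sibling = Region(sibling_lo, list(self.hi), self.level + 1)
        # This region keeps the first half, so its reference point is unchanged.
        self.hi[i] = mid
        self.level += 1
        return sibling

    def merge(self, sibling):
        """Inverse of split: absorb the sibling along dimension ((level-1) mod d)."""
        d = len(self.lo)
        i = (self.level - 1) % d
        # Simplified sibling check: same level, adjacent along dimension i.
        assert sibling.level == self.level and sibling.lo[i] == self.hi[i]
        self.hi[i] = sibling.hi[i]
        self.level -= 1


if __name__ == "__main__":
    r = Region([0.0, 0.0], [1.0, 1.0])
    s = r.split()   # split along dimension 0
    t = r.split()   # split along dimension 1
    print(r, s, t)
    r.merge(t)
    r.merge(s)
    print(r)
```

Keeping the lower corner fixed in both operations mirrors the requirement that a region's reference point, and hence its identifier, survives splits and merges.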

Assigning regions to nodes. Conceptually, RCAN builds a d-dimensional overlay on top of an evolving set of computers (nodes) connected through a physical communication network. Regions in RCAN are assigned to nodes using a one-to-one mapping, which is independent of the structure and composition of the underlying network. Each node p in the network owns one region r of the key space, and is responsible for the data items covered by r. The logical address of p is taken to be the identifier of its region (p.id = r.id). Other, more sophisticated mappings can also be employed in RCAN, since our definition of a region is logical and independent of the physical network.

3.2 Multi-ring routing overlay

For routing purposes, each node in the system maintains routing information (routing state) about its neighboring nodes, which consists of two types of links (see footnote 3): short and long.

Short Links. A node maintains contact with O(d) adjacent neighbors on average. Short links are maintained by exchanging heartbeat messages between neighboring nodes.

Long Links. They are at the heart of the design of RCAN. The routing state of a node in RCAN is augmented with a few unidirectional links towards nodes in the system that are at distances equal to inverse powers of 2 in the key space. The number of links maintained by a node along each dimension is equal to the number of times its region has been split along that dimension. As a result, the total number of long links at each node always tracks the size of its region: the more often the region has been split, the more links the node keeps. This property enables the implementation of self-scaling routing tables, since a node can dynamically adapt the number of its long links by establishing additional links when its region splits, or dropping extra links when its region is merged. With high probability, the total number of long links per node is bounded by O(log n) [3].

Multi-rings routing infrastructure. Long links are parallel to the data axes and wrap around along each dimension. Long links originating from nodes whose origins are situated on the same line form a ring. Multiple small sub-rings are hence formed along each dimension. In a regularly partitioned grid, a node is a member of one ring along each dimension (i.e. the ring passing through its origin). The number of rings and the number of nodes per ring self-scale as the network size changes. When regions have different sizes, a node may temporarily become a member of more than d rings, because its region is large and might intersect more than one ring along each dimension.
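The placement of long links described above can be illustrated by computing, for one node, the points in the key space that its long links aim at. The sketch makes several assumptions that go beyond the text: the key space is the unit torus [0, 1)^d, the j-th link along a dimension is at distance exactly 2^-j for j = 1, 2, ..., and links point in the positive (clockwise) direction. The function name and the `splits` argument are hypothetical.

```python
def long_link_targets(origin, splits):
    """Return (dimension, j, target_point) for every long link of a node.

    origin: reference point of the node's region, coordinates in [0, 1)
    splits: splits[i] = number of times the region was split along dimension i
    """
    targets = []
    for dim, count in enumerate(splits):
        # One link per split along this dimension, at distances 1/2, 1/4, ...
        for j in range(1, count + 1):
            point = list(origin)
            # Move clockwise along one axis only and wrap around the torus.
            point[dim] = (origin[dim] + 2.0 ** -j) % 1.0
            targets.append((dim, j, tuple(point)))
    return targets


if __name__ == "__main__":
    for link in long_link_targets((0.75, 0.0), [2, 1]):
        print(link)
```

Since every target differs from the origin in exactly one coordinate, all links of a node along a given dimension stay on the same axis-parallel line, which is what makes the links of different nodes close up into rings.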

Example. Nodes 0, 8, 2, 9, 5, and 11 are members of the first horizontal ring. Nodes 0, 12, 10, and 7 are members of the first vertical ring. Node 11 owns a big region; it is thus a member of other horizontal rings, including the ring passing through nodes 12, 6, 4 and others.

When a node p splits its region with a new node q along dimension i, q builds its long links. Node q becomes the successor of p on the same ring along dimension i. On the other dimensions, q becomes a member of the ring adjacent (in the positive direction) to p’s ring and takes the same position as p on its ring. If the level of p is the highest in its ring, the adjacent ring that q will join does not yet exist. In this case a new ring is created and q becomes its first member.

Footnote 3: When we say that a node p has a link towards a node q, this means that a direct communication channel is established between the two nodes, through which p can send messages to q. A link is materialized by the network address, region extent, etc. of the node towards which the link points.

RCAN’s multi-rings infrastructure is a virtual model for organizing long links. Rings are built using existing long links and do not need additional building or maintenance overhead. Our multi-ring model is highly flexible. Rings are created or removed dynamically with almost no extra cost. The number of rings and their sizes self-adjust to fit any changes in the network size or node distribution.

Routing mechanism. RCAN adopts a hop-by-hop greedy routing approach. A node uses only its local routing table to decide the next routing step. In RCAN, a message is identified by a multidimensional key specifying the coordinates of the target point. During a routing task, a node looks up in its routing table a neighboring node (immediate or distant) that is strictly closer to the target according to a well-defined metric, and forwards the message through that neighbor. If several neighbors are at the same expected distance from the target, one of them is chosen at random. Intuitively, the routing task in RCAN consists of resolving the routing path along one ring at each step. Rings can be used in an arbitrary order. Another good feature of RCAN’s multi-ring topology is that there are many paths with almost the same expected distance between any pair of nodes. This property provides more routing flexibility and robustness against node and link failures.
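A single greedy routing step as described above can be sketched as follows. The text does not specify the distance metric, so this sketch assumes the L1 distance on the unit torus and represents each neighbor by a single point; a real node would compare against region extents. All names are hypothetical.

```python
import random


def torus_distance(a, b):
    """L1 distance on the unit torus (an assumed metric, not from the paper)."""
    return sum(min(abs(x - y), 1.0 - abs(x - y)) for x, y in zip(a, b))


def next_hop(current, table, target, rng=random):
    """Return the neighbor to forward to, or None if no neighbor is closer.

    current: point of the node holding the message
    table:   points of its immediate and distant neighbors
    target:  multidimensional key carried by the message
    """
    here = torus_distance(current, target)
    # Only neighbors strictly closer to the target are eligible.
    closer = [p for p in table if torus_distance(p, target) < here]
    if not closer:
        return None
    best = min(torus_distance(p, target) for p in closer)
    # Ties between equally good neighbors are broken at random.
    return rng.choice([p for p in closer if torus_distance(p, target) == best])


if __name__ == "__main__":
    table = [(0.25, 0.0), (0.5, 0.0), (0.0, 0.25)]
    print(next_hop((0.0, 0.0), table, (0.5, 0.25)))
```

Requiring strict progress guarantees that the message never loops, and the random tie-break spreads traffic over the many near-equivalent paths that the multi-ring topology provides.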

4. IMPLEMENTATION AND EVALUATION

In order to demonstrate the effectiveness of our proposal, we have implemented a prototype system of RCAN in C++. We have analyzed the behavior of RCAN under uniform and skewed data patterns. Uniform data are generated using the C random function. The real-world dataset consists of the coordinates of about 680,000 main cities in Europe (http://www.world-gazetteer.com). We performed experiments on networks of up to 2^16 nodes.

Figure 2: Europe dataset and partitions. (a) Europe; (b) n = 1024 nodes.

In our experiments we mainly focused on the efficiency of our scheme in terms of routing performance, maintenance overhead, and load balancing. Experimental results confirm the scalability and efficiency of our design compared to the original CAN as well as other existing methods [3, 4]. A system version of RCAN is currently under construction, in which link maintenance during node departure/failure and the bootstrapping service will be implemented. Due to space constraints, we do not present our experimental results in this paper (please refer to our conference papers [3, 4]).

5. CONCLUSION AND FUTURE WORK

In this paper we proposed RCAN, a novel self-organizing topological structure that overcomes the weakness of greedy routing in CAN-like overlays. The key idea is to equip each node with additional long-range links. RCAN is a pure P2P system in which all nodes assume the same responsibilities. Each node maintains a routing state that self-scales logarithmically with the network size. RCAN’s multi-ring infrastructure gracefully adapts itself to changes in the network membership. Our solution is simple and efficient, and shows that even a small extension can lead to significant improvements in several respects.

We must emphasize that RCAN is by no means the first P2P system that uses long links, nor the first to achieve logarithmic routing performance using logarithmic routing state. However, to the best of our knowledge, RCAN is the first CAN-like overlay that provides a completely decentralized mechanism for a self-scaling routing state. The number of long links per node is proved to be O(log n). RCAN is also the first to achieve O(log n) maintenance overhead during node churn. This makes RCAN optimal in this respect in comparison with other existing methods, which incur an O(log^2 n) maintenance overhead. RCAN also increases network connectivity, which makes the routing process more flexible and fault-resilient.

Load balancing. RCAN proposes a static load balancing mechanism that achieves a constant load imbalance factor [4]. This mechanism is simple and efficient but supports node joins only. In our ongoing work, we are studying strategies for load balancing on node departure as well as for the dynamic case, where the distribution of data may change over time. In the first case, we need to find a good substitute for the departing node. In the latter case, when a load imbalance is detected, part of the load of a heavily loaded node is moved to a lightly loaded one (data moving). Another idea is to force a lightly loaded node to leave the system and rejoin at a highly populated region of the key space (node moving). Multiple regions may also be assigned to the same physical node (virtual servers [6]). Efficient load balancing strategies should reduce the amount of transferred data and the communication overhead.

Query processing. Efficient evaluation of complex queries (e.g., range queries and similarity search) is an important issue in applications that manage data with multidimensional keys (spatial data, scientific data, etc.). Efficient execution of such queries in large-scale and dynamic P2P systems is a challenging problem. The main reason is the lack of knowledge about the actual composition of the network, which may result in missing some target nodes or visiting too many nodes that are not relevant to the query. Moreover, a P2P network is dynamic and its composition may change during query evaluation. As a result, some sub-queries or parts of the answer may be lost during the routing process. Up to this point, RCAN provides efficient support for exact-match search only. In our PhD research work we are also focusing on efficient range query evaluation strategies that (i) guarantee that the query message is propagated to all relevant nodes while minimizing the number of non-relevant nodes visited, (ii) fairly distribute the query workload among the participating nodes, and (iii) achieve low communication overhead and query latency. These strategies should also be robust against node churn by providing the best-effort answer for any given query.
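To make requirement (i) concrete, the following minimal sketch shows what "relevant nodes" means for a range query in a CAN-like key space: a node is relevant exactly when its zone overlaps the query rectangle. It is an illustration under stated assumptions, not a strategy proposed in this paper. Zones are assumed to be axis-aligned, half-open boxes that do not wrap around the torus, the query is a closed box, and the helper names `overlaps` and `relevant_nodes` are hypothetical. It also uses a global view of all zones, which no real node has.

```python
def overlaps(zone, query):
    """True if a half-open zone [lo, hi) intersects a closed query box.

    zone and query are (lo, hi) pairs of coordinate tuples with the
    same number of dimensions.
    """
    (zlo, zhi), (qlo, qhi) = zone, query
    return all(
        zl <= qh and ql < zh
        for zl, zh, ql, qh in zip(zlo, zhi, qlo, qhi)
    )


def relevant_nodes(zones, query):
    """Return the sorted ids of nodes whose zones intersect the query.

    zones maps a node id to its zone. A distributed strategy must
    reach exactly this set using local routing tables only.
    """
    return sorted(node for node, zone in zones.items() if overlaps(zone, query))
```

For example, with the unit square split into four equal quadrants, a query box from (0.1, 0.1) to (0.6, 0.4) overlaps only the two lower quadrants, so an ideal strategy visits those two nodes and no others.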

Acknowledgements. This research is partly supported by Grant-in-Aid from MEXT 21013004, Japan.

6. REFERENCES

[1] K. Aberer. P-grid: A self-organizing access structure for P2P information systems. In CoopIS, pages 179–194, 2001.
[2] I. Abraham, B. Awerbuch, Y. Azar, Y. Bartal, D. Malkhi, and E. Pavlov. A generic scheme for building overlay networks in adversarial scenarios. In IPDPS, page 40b, 2003.
[3] D. Boukhelef and H. Kitagawa. Multi-ring infrastructure for content addressable networks. In CoopIS, pages 193–211, 2008.
[4] D. Boukhelef and H. Kitagawa. Dynamic load balancing in RCAN content addressable network. In ICUIMC, pages 98–106, 2009.
[5] G. Giakkoupis and V. Hadzilacos. A scheme for load balancing in heterogenous distributed hash tables. In PODC, pages 302–311, 2005.
[6] D. R. Karger and M. Ruhl. Simple efficient load balancing algorithms for peer-to-peer systems. In SPAA, pages 36–43, 2004.
[7] K. Kenthapadi and G. S. Manku. Decentralized algorithms using both local and random probes for p2p load balancing. In SPAA, pages 135–144, 2005.
[8] J. Kleinberg. The small-world phenomenon: An algorithmic perspective. In STOC, pages 163–170, 2000.
[9] E. K. Lua, J. Crowcroft, M. Pias, R. Sharma, and S. Lim. A survey and comparison of peer-to-peer overlay network schemes. IEEE Communications Surveys and Tutorials, 7(2):72–93, 2005.
[10] G. S. Manku, M. Bawa, and P. Raghavan. Symphony: distributed hashing in a small world. In USENIX USITS, pages 10–10, 2003.
[11] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A Scalable Content-Addressable Network. In SIGCOMM, pages 161–172, 2001.
[12] O. D. Sahin, D. Agrawal, and A. E. Abbadi. Techniques for efficient routing and load balancing in Content-Addressable Networks. In IEEE P2P, pages 67–74, 2005.
[13] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup protocol for Internet applications. IEEE/ACM Transactions on Networking, 11(1):17–32, 2003.
[14] X. Sun. SCAN: a small-world structured P2P overlay for multi-dimensional queries. In WWW, pages 1191–1192, 2007.
[15] Z. Xu and Z. Zhang. Building low-maintenance expressways for P2P systems. Tech. Rep. HPL-2002-41 41, 2002.