A P2P-Framework for Context-Based Information

Mirko Knoll and Torben Weis

Context-based Systems Group, Universität Stuttgart
{knoll, weis}@ipvs.uni-stuttgart.de

Abstract. Context-based systems have recently become increasingly popular as sensor technology improves and network access becomes almost omnipresent. Until now, establishing such systems has required large investments in an appropriate infrastructure. In this paper we propose an architecture for a context-based system built on an overlay network using peer-to-peer technology. Furthermore, we modify the underlying Pastry algorithm to optimize for locality of data. The result is a self-organizing network that stores context data on nodes that are geographically close to the location the data refers to.

1 Introduction

Advances in sensor technology enable us to track the position of people outdoors and indoors. Using small sensor boards, we can gather a multitude of additional information about our environment. Furthermore, RFID tags allow us to track objects ranging from books to freight containers. The data gathered by such sensor systems is the basis for a digital world model. In home-automation scenarios these world models are not very large and can easily be handled by commercial off-the-shelf SQL databases. However, large-scale world models require an appropriate infrastructure.

Until recently, huge server farms had to be established to provide large-scale context information. Along with the costs for the servers, further costs arise for bandwidth, disk storage, and administration. In contrast to the WWW, world models are highly dynamic, especially when they store the position of people and physical objects. Routing all of this dynamic data to a central server is not feasible. Therefore, we must try to store such dynamic data close to the location where it originates. Thus, large-scale context services are ideally assembled from a set of servers distributed across the globe and connected by high-bandwidth networks. Although technically possible, such a large-scale infrastructure requires a significant investment. Projects without commercial backing in particular cannot afford such an expensive infrastructure.

Because of these drawbacks we are working on a P2P-based solution that removes the need for an expensive fixed infrastructure for hosting context-based data. The idea is that those users and institutions who want to participate in a context-based application have to provide their share of bandwidth, CPU, and disk space. The major challenge is to develop a P2P algorithm that allows us to store and search information based on location. Modern P2P algorithms such as Pastry feature a 1-dimensional data structure (i.e. a ring structure), but location-based storage and queries are at least 2-dimensional. Furthermore, we want to optimize the storage of information such that it is stored close to the location it is related to.

In this paper we propose an efficient architecture based on a modified Pastry algorithm. In the following section we state our system model. Then, we discuss how we partition the world and map it to the 1-dimensional Pastry structure using space-filling curves. We close with a discussion of related work, conclusions, and outlook.

The presented work has been funded by DFG Excellence Center 627 "Nexus".

2 System Model

Using a P2P overlay network we can avoid an expensive fixed infrastructure. P2P systems are self-organizing and decentralized. This allows us to reduce the cost of ownership to a minimum, as the costs for the resources (e.g. disk space, CPU power, bandwidth) are spread over all peers [1][2]. In contrast to first-generation P2P algorithms such as Napster, second-generation algorithms no longer require a central index server. Therefore, we have chosen Pastry [3] as the base algorithm for the overlay network in our system. In Pastry each node gets a 128-bit identifier by applying a hash function to its IP address. This procedure yields IDs that are unique with very high probability. However, in the original Pastry algorithm the IDs are intentionally unrelated to the physical location of the Pastry node. Thus, to store and query location-based data, we must map a location in the physical world to the ID of a Pastry node.

To better understand the modifications we made to achieve locality of data, we briefly explain Pastry's routing mechanism. Given a message and a key, Pastry routes the message to the node whose node ID is numerically closest to the key. In each routing step the message reaches a node that shares a prefix with the key at least one digit longer than before, thus reaching the destination in O(log N) steps. However, because Pastry calculates node IDs by hashing the IP address, even nodes with "close" IDs may reside on different continents. For the original Pastry algorithm this effect is intended, because it reduces the probability that two machines with close IDs fail at the same time. In contrast, we aim for locality of data to minimize the distance that data and queries have to travel. We intend to keep Pastry's prefix routing but drop the hashing. Instead, we partition the world into equally sized zones with static IDs.
As we expect most nodes to be location-aware through GPS or other means of localization in the near future, they will be able to calculate their ID from their position. However, this approach raises another problem. Context information is related to 2-dimensional (e.g. a map) or even 3-dimensional (e.g. several floors of a building) coordinates, whereas common P2P algorithms feature only a 1-dimensional index structure (mostly a ring).
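The prefix-routing step described in this section can be illustrated with a minimal sketch. It is not Pastry's actual implementation: real Pastry nodes consult a local routing table and leaf set, whereas this sketch assumes global knowledge of all node IDs. The function names `shared_prefix_len` and `route`, and the short 4-digit hexadecimal IDs, are hypothetical and chosen only for readability.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading digits that two equal-length hex IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(start: str, key: str, nodes: list[str]) -> list[str]:
    """Return the list of nodes a message visits on its way to `key`.

    Assumption: every hop can see all node IDs. Each hop moves to a node
    whose shared prefix with the key is longer than the current one; when
    no such node exists, the message goes to the numerically closest node.
    """
    path = [start]
    current = start
    while True:
        here = shared_prefix_len(current, key)
        longer = [n for n in nodes if shared_prefix_len(n, key) > here]
        if not longer:
            closest = min(nodes, key=lambda n: abs(int(n, 16) - int(key, 16)))
            if closest != current:
                path.append(closest)
            return path
        # Take the smallest available prefix improvement, mimicking a
        # routing table that resolves one more digit per hop.
        current = min(longer, key=lambda n: shared_prefix_len(n, key))
        path.append(current)


if __name__ == "__main__":
    nodes = ["1a3f", "7b20", "7c91", "7c05", "7c0e"]
    print(route("1a3f", "7c0d", nodes))
```

Because every hop extends the matched prefix by at least one digit, the number of hops is bounded by the number of digits in the ID, which is where the logarithmic bound comes from.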

3 World Partitioning

In order to assign an ID corresponding to the location of the node, we divide the world into equally sized zones. For the remainder of the paper we consider only 2-dimensional coordinates for simplicity. Still, our model can easily be extended to support a third dimension by adding more bits to the IDs. Each zone may contain at most one node. The more zones there are, the more nodes can be supported. We believe that a 56-bit encoding suffices, as a single zone then covers less than 0.007 m². After a node has booted and determined its position, it determines its zone and can thus derive its ID and join the Pastry ring. As a zone contains at most one node, no node ID will occur twice. As long as positioning techniques are not accurate enough to match the resolution of the world partitioning, the IDs can be extended by a few bits to increase the tolerance. In case two nodes are still assigned the same ID, a recovery procedure assigns a new ID to the newer node.

The way IDs are assigned to zones is crucial for achieving locality of data. Imagine a row-wise ID assignment. The IDs of the last node in row n and the first node in row n + 1 differ by only 1. Thus, the IDs of these nodes are close, but geographically the nodes are far apart. Conversely, the IDs of the first nodes of rows n and n + 1 are very different, but geographically these nodes are close. In general, there is no optimal solution to this problem, because we reduce 2-dimensional information (i.e. x and y coordinates) to 1-dimensional information (i.e. the IDs). Several methods are available for mapping a line (or curve) onto a 2-dimensional area. Our objective is to make the distance in ID space reflect the geographic distance. One of the most researched approaches to this kind of dimension reduction is space-filling curves (SFC). We have investigated the characteristics of several curves and assessed their usefulness in our scenario. In contrast to earlier publications [4][5], we focus on the average case, as it is essential to have an appropriate relation between ID distance and geographic distance for all nodes.
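A minimal sketch of how a node might derive a 56-bit zone ID from its position is given here, together with the row-wise assignment for comparison. Several things in it are assumptions and not taken from the paper: the latitude/longitude grid (which, unlike the zones described above, is not equal-area), the choice of the Lebesgue (Z-order) curve as the mapping, the surface figure of about 5.1e14 m² for the Earth, and all function names.

```python
EARTH_SURFACE_M2 = 5.1e14  # assumed approximate surface area of the Earth


def cell_of(lat: float, lon: float, bits_per_axis: int = 28) -> tuple[int, int]:
    """Map a position in degrees to integer grid coordinates (x, y).

    Assumption: a plain latitude/longitude grid, which is NOT equal-area.
    """
    n = 1 << bits_per_axis
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y


def rowwise_id(x: int, y: int, bits_per_axis: int = 28) -> int:
    """Row-wise numbering: rows are laid end to end."""
    return (y << bits_per_axis) | x


def morton_id(x: int, y: int, bits_per_axis: int = 28) -> int:
    """Lebesgue (Z-order) numbering: interleave the bits of x and y."""
    zid = 0
    for i in range(bits_per_axis):
        zid |= ((x >> i) & 1) << (2 * i)
        zid |= ((y >> i) & 1) << (2 * i + 1)
    return zid


def zone_id(lat: float, lon: float, bits_per_axis: int = 28) -> int:
    """Derive a zone ID (2 * bits_per_axis bits) from a position."""
    x, y = cell_of(lat, lon, bits_per_axis)
    return morton_id(x, y, bits_per_axis)


if __name__ == "__main__":
    # Average zone size for 2**56 zones under the assumed surface area.
    print(EARTH_SURFACE_M2 / 2**56, "m^2 per zone")
    # Row-wise problem on an 8 x 8 grid: end of row 0 vs. start of row 1.
    print(rowwise_id(7, 0, 3), rowwise_id(0, 1, 3))
```

On the 8 x 8 example, the last cell of row 0 and the first cell of row 1 receive consecutive row-wise IDs although they lie seven columns apart, which is the effect the text describes. Under the assumed surface area, the average zone size comes out at roughly 0.007 m².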

4 Space-Filling Curves

Peano defined a space-filling curve as a surjective function $f : I \to \Omega \subset \mathbb{R}^d$, where $\Omega$ has positive measure in $\mathbb{R}^d$. This definition admits several possible curves that map the interval $I = [0, 1]$ onto the space $\Omega = [0, 1]^d$ [6]. Each curve features a different level of locality. Here, locality describes the relationship between the distance of two points $p_1, p_2 \in I$ on the SFC and the distance of their images $f(p_1), f(p_2) \in \Omega$ in the multi-dimensional space. We are looking for a curve with superior locality properties, such that $p_1 \approx p_2 \Leftrightarrow f(p_1) \approx f(p_2)$. In our scenario, we use the SFC solely to assign indices to the zones. As we are not filling the entire area (in the mathematical sense of space), we only consider discretizations of the curves. As it is difficult to estimate the locality properties of the curves from their geometrical representation alone, we are running simulations for the curves of Peano, Hilbert, Lebesgue, and others. Which curve performs best is still subject to further research.
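To give an idea of what such a simulation might look like, the following sketch computes discretized Hilbert and Lebesgue (Z-order) indices on a small grid and measures one possible average-case locality value: the mean ID difference between geographically adjacent zones. The metric, the grid size, and all function names are assumptions made for illustration; the paper does not specify how its simulations are set up, and the sketch makes no claim about which curve is preferable.

```python
def hilbert_id(x: int, y: int, order: int) -> int:
    """Index of cell (x, y) along a Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def morton_id(x: int, y: int, order: int) -> int:
    """Index of cell (x, y) along a Lebesgue (Z-order) curve."""
    zid = 0
    for i in range(order):
        zid |= ((x >> i) & 1) << (2 * i)
        zid |= ((y >> i) & 1) << (2 * i + 1)
    return zid


def rowwise_id(x: int, y: int, order: int) -> int:
    """Row-wise numbering, used as a baseline."""
    return (y << order) | x


def avg_neighbor_gap(curve, order: int) -> float:
    """Mean |ID difference| over all pairs of edge-adjacent cells."""
    n = 1 << order
    gaps = []
    for x in range(n):
        for y in range(n):
            if x + 1 < n:
                gaps.append(abs(curve(x, y, order) - curve(x + 1, y, order)))
            if y + 1 < n:
                gaps.append(abs(curve(x, y, order) - curve(x, y + 1, order)))
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    for name, curve in [("row-wise", rowwise_id),
                        ("Lebesgue", morton_id),
                        ("Hilbert", hilbert_id)]:
        print(name, avg_neighbor_gap(curve, 3))
```

A fuller evaluation would also have to look at the opposite direction (how far apart zones with close IDs can lie) and at non-uniform node distributions, which this sketch does not attempt.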

Fig. 1. Space-filling curves and their orders (Lebesgue[4], Peano[4] and Hilbert[3])

5 Related Work

To our knowledge there is no other framework for hosting context data that is based on P2P technology and optimizes for locality of its content. However, there has been extensive research on individual aspects of our project.

P2P Systems: Protocols in this category are optimized for the main tasks of P2P systems: insertion, lookup, and deletion of keys. CAN is a popular and efficient representative of this category. Its storage mechanisms are similar to Pastry's, but its routing algorithm differs. CAN's coordinate space is completely logical and bears no relation to any physical coordinate system [7]. Hence it is difficult to host context information according to its location, as peers are responsible for a randomly chosen zone. Furthermore, the coordinate space is partitioned dynamically, which requires many updates of neighboring nodes when nodes join or fail. Thus, its runtime of $O(\frac{d}{4} \cdot N^{1/d})$ is worse than Pastry's $O(\log N)$.

Another interesting approach has been presented by Heutelbeck [8] in his work on distributed space partitioning trees (DSPT). The work concentrates on publishing and searching geometrical objects within certain geometrical constraints using a distributed data structure. In RectNet, a direct implementation of this approach, the first node of the network becomes the so-called clusterhead and is responsible for the entire context space (storing all information and answering all queries locally). As more and more clients enter the area, they all send their queries to the clusterhead A, whose load (CPU and traffic) steadily increases. Once a certain threshold is exceeded, the clusterhead divides the space into two subclusters and has two other nodes (B and C) handle them. However, the clusterhead remains responsible for routing all inter-subcluster communication, and therefore the approach does not scale well.
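For a rough sense of the gap between the two routing bounds quoted for CAN and Pastry, the expressions can simply be evaluated for a few network sizes. This is a back-of-the-envelope sketch, not a measurement: constant factors are taken at face value, and the digit size b = 4 bits for Pastry (so that the logarithm is taken to base 16) is an assumed, commonly used configuration. The function names are hypothetical.

```python
import math


def can_hops(n: int, d: int) -> float:
    """Evaluate the CAN bound (d / 4) * n**(1 / d) for n nodes, d dimensions."""
    return (d / 4) * n ** (1 / d)


def pastry_hops(n: int, b: int = 4) -> float:
    """Evaluate the Pastry bound log_{2**b}(n); b = 4 is an assumption."""
    return math.log(n, 2 ** b)


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        print(f"N={n:>10}  CAN(d=2)={can_hops(n, 2):10.1f}  "
              f"Pastry={pastry_hops(n):5.2f}")
```

Increasing the number of dimensions d narrows the gap for CAN, but for a fixed d the polynomial term grows faster than the logarithm as N increases.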

Data Storage: This area deals with storage problems and with how to distribute large amounts of data among all peers most adequately. One of the largest projects in this area is OceanStore [9][10]. Its research focuses on a utility infrastructure for providing continuous access to persistent information. OceanStore distinguishes between service providers and users who subscribe to one of these providers. The providers consist of untrusted servers, which makes it necessary to replicate all data on several other servers to prevent loss. Therefore, OceanStore is less suited for hosting context-based information.

Space-Filling Curves: In [11] the authors present a P2P information discovery system that supports complex searches using keywords. Their system uses Chord for the overlay network topology and the Hilbert SFC for the dimension reduction. However, their focus lies on mapping data elements that are local in a multi-dimensional keyword space to indices that are local in the 1-dimensional index space. In this case two documents are considered local when their keywords are lexicographically close (e.g. computer and computation) or when they have keywords in common. This comes at a cost, as using an SFC does not guarantee a uniform distribution of data elements in the index space. Therefore, additional load-balancing algorithms have to run to reduce the load on heavily used nodes.

6 Conclusion and Outlook

In this paper we have proposed an architecture for hosting context-based information on a peer-to-peer system. In contrast to existing context-based systems, our algorithm optimizes the data distribution for geographic locality, keeping the distance that information travels short. We are planning to extend the simulations of the space-filling curves so that we can evaluate further scenarios based on stochastic distributions. Furthermore, we are studying replication methods that back up the data on nearby nodes, so that if a node disconnects from the ring, the information stored on that node will still be available in the same region.

References

1. Milojicic, D.S., Kalogeraki, V., Lukose, R., Nagaraja, K., Pruyne, J., Richard, B., Rollins, S., Xu, Z.: Peer-to-Peer Computing. Technical Report HPL-2002-57, HP Laboratories Palo Alto (2002)
2. Hauswirth, M., Dustdar, S.: Peer-to-peer: Grundlagen und Architektur. Datenbank-Spektrum 13 (2005) 5–13 (in German)
3. Rowstron, A., Druschel, P.: Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In: Proceedings IFIP/ACM Middleware 2001, Heidelberg, Germany (2001)
4. Niedermeier, R., Reinhardt, K., Sanders, P.: Towards optimal locality in mesh-indexings. In: Fundamentals of Computation Theory. (1997) 364–375
5. Pascucci, V., Frank, R.J.: Global static indexing for real-time exploration of very large regular grids. In: Supercomputing '01: Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM), New York, NY, USA, ACM Press (2001) 2–2
6. Sagan, H.: Space-Filling Curves. Springer-Verlag, New York, NY, USA (1994)
7. Ratnasamy, S.P.: A Scalable Content-Addressable Network. PhD thesis, University of California, Berkeley, CA, USA (2002)
8. Heutelbeck, D.: Distributed Space Partitioning Trees and their Application in Mobile Computing. PhD thesis, Open University Hagen, Germany (2005)
9. Kubiatowicz, J., Bindel, D., Chen, Y., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Weimer, W., Wells, C., Zhao, B.: OceanStore: An Architecture for Global-scale Persistent Storage. In: Proceedings of ACM ASPLOS, ACM (2000)
10. Kubiatowicz, J.: The OceanStore Project. http://oceanstore.cs.berkeley.edu/ (2005)
11. Schmidt, C., Parashar, M.: Flexible information discovery in decentralized distributed systems. In: 12th IEEE International Symposium on High Performance Distributed Computing (HPDC-12 '03), IEEE Computer Society (2003) 226
