SBARC: A Supernode Based Peer-to-Peer File Sharing System

Zhiyong Xu and Yiming Hu
Department of Electrical & Computer Engineering and Computer Science
University of Cincinnati, Cincinnati, OH 45221-0030
E-mail: {zxu,yhu}@ececs.uc.edu
Abstract

Peer-to-Peer (P2P) systems have become one of the hottest research topics; their fully decentralized control and self-organization make them attractive for particular applications. However, they face more technical problems than the client/server architecture. In this paper, we propose SBARC, a new P2P file sharing system which takes the tremendous resource differences among peers into account to improve system performance. SBARC divides peers into supernodes and ordinary nodes, and most workloads are carried by supernodes. The main contributions of SBARC are: (1) a supernode based routing algorithm which reduces the average routing latency; (2) caching of routing information to further reduce the routing cost; (3) a coordinated file caching scheme which makes efficient use of free storage space. Our simulation results show that the SBARC routing and caching schemes achieve better performance than previous approaches.
1 Introduction

The biggest problem in the traditional C/S architecture is that the functionalities of clients and servers are strictly separated: clients can not take over workloads from servers even when the servers are very busy. The P2P architecture goes to the other extreme: system workloads are uniformly distributed over all peers without considering the heterogeneous resources among peers. Each node is supposed to make an equal contribution. This seems natural, since equal treatment of peers is the basic principle of a P2P system. Unfortunately, computer resources such as network bandwidth, computation power and storage capacity are quite different among peers. Saroiu and Gribble conducted a measurement study on Napster and Gnutella [1].

In this paper, we propose SBARC, a Supernode BAsed Routing and Caching P2P file sharing system. SBARC makes intensive use of the heterogeneous character of P2P systems and attains a large performance improvement. In SBARC, we incorporate the advantages of both the C/S and the P2P architectures. Different peers are treated differently and make unequal contributions according to their capabilities. System workloads are neither uniformly distributed on all nodes nor concentrated on a small number of centralized servers; they are unevenly spread. Supernodes take most system workloads and act more like servers, while ordinary nodes take only a small portion of system workloads and act more like clients. However, SBARC does not make a strict separation between the roles of servers and clients as a C/S system does. The rest of the paper is organized as follows: we discuss related work in Section 2; the concept of supernodes is introduced in Section 3; in Section 4, we describe the SBARC system design; Section 5 presents SBARC performance evaluation results; we conclude the paper in Section 6.

2 Related Works

A number of research papers have been published in the past two years, including PAST [2] [3], OceanStore [4] and CFS [5], and new scalable routing/locating algorithms such as Pastry [6], Tapestry [7], Chord [8] [9] and CAN [10] have been proposed. A drawback of all these systems is that they do not pay much attention to the resource diversity among peers. Exploiting diversity among peers to boost system performance was mentioned in [1] [11], but no solution was given. The significant difference between SBARC and other systems is its heavy usage of the resource differences among peers. Brocade [12] also notices this problem and uses supernodes to improve routing efficiency. However, SBARC differs from Brocade in many design considerations: SBARC integrates supernodes directly into the underlying P2P routing algorithm, while Brocade creates another level of structure over the existing algorithm. In addition, SBARC proposes a new caching scheme and utilizes supernodes to achieve better caching performance, which is not addressed in Brocade. KaZaA [13] also uses supernodes to achieve fast searching, but it can not guarantee that a file existing in the system will be found, and it does not use supernodes for caching purposes.

Some other P2P systems such as Freenet [14], FreeHaven [15], Publius [16] and Crowds [17] are more concerned with providing strong anonymity and anti-censorship. This is also an important research direction in P2P systems, but we do not address this problem in SBARC.

3 Supernodes

3.1 Selection Criteria

Using supernodes to improve P2P system performance has become a hot topic recently. We briefly introduce the concept of a supernode in this section. A node can be a supernode only if certain criteria are satisfied. The basic requirement is a high bandwidth connection. This property is important for achieving low network latency in large-scale P2P systems. The second criterion is enough computation power, because supernodes need to deal with most system workloads. Third, supernodes can not join/leave the system frequently, otherwise their efficiency will be greatly reduced. It is also helpful if supernodes have a large amount of storage space.
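To make these criteria concrete, the following is a minimal Python sketch of such an admission check; the specific thresholds (bandwidth, session length, and so on) are illustrative assumptions, not values prescribed by SBARC.

    # Sketch of a supernode admission check; the thresholds are illustrative
    # assumptions, not values prescribed by SBARC.
    from dataclasses import dataclass

    @dataclass
    class NodeProfile:
        bandwidth_mbps: float      # access link bandwidth
        cpu_score: float           # relative computation power
        mean_session_hours: float  # observed session length (stability)
        free_storage_gb: float     # shared storage space (helpful, not required)

    def is_supernode_candidate(p: NodeProfile) -> bool:
        """A node qualifies only if it is well connected, powerful enough,
        and stable; large storage is helpful but not required."""
        if p.bandwidth_mbps < 10.0:        # basic requirement: high bandwidth
            return False
        if p.cpu_score < 1.0:              # enough computation power
            return False
        if p.mean_session_hours < 12.0:    # must not join/leave frequently
            return False
        return True

    if __name__ == "__main__":
        print(is_supernode_candidate(NodeProfile(100.0, 2.0, 48.0, 20.0)))  # True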
3.2 Supernode Domain

A supernode domain consists of several topologically adjacent nodes. It can include one to three supernodes and tens of ordinary nodes. Once a supernode becomes the primary supernode, the other supernodes in the domain can only be secondary supernodes. Another way to create a supernode domain would be to aggregate nodes whose nodeids are numerically close. However, this is impractical because the nodes within one domain could then be widely spread, which would bring large maintenance overheads. The primary supernode records the information of all other nodes in its domain. It also stores copies of all other nodes' routing tables. Routing tasks previously executed on the ordinary nodes are now taken over by their primary supernode. The cache space of the ordinary nodes is also managed by their primary supernode. The details are described in Section 4.2 and Section 4.4.
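As a rough illustration only, a domain can be represented as a set of topologically adjacent members from which the primary supernode is elected. The election rule sketched below (smallest processing delay plus link latency) is the one used later in our simulations (Section 5.2); the class and field names are hypothetical.

    # Hypothetical sketch: group topologically adjacent nodes into a supernode
    # domain and elect the primary supernode. Names and fields are illustrative.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        nodeid: str
        is_supernode: bool
        processing_delay_ms: float
        link_latency_ms: float       # latency to the rest of the domain

    @dataclass
    class SupernodeDomain:
        members: List[Node] = field(default_factory=list)

        def elect_primary(self) -> Node:
            # Among the supernodes in the domain, pick the one with the smallest
            # processing delay plus link latency (the rule used in Section 5.2);
            # the remaining supernodes become secondary supernodes.
            supers = [n for n in self.members if n.is_supernode]
            return min(supers, key=lambda n: n.processing_delay_ms + n.link_latency_ms)

    if __name__ == "__main__":
        d = SupernodeDomain([
            Node("10232312", True, 2.0, 5.0),
            Node("10233021", True, 1.0, 5.0),
            Node("10231102", False, 8.0, 5.0),
        ])
        print(d.elect_primary().nodeid)  # 10233021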
Figure 1: State of a hypothetical SBARC node with nodeid 10232312, showing its routing table (each entry holds a node and its primary supernode), neighborhood set, leaf set and supernode table.
3.3 Supernode Problems

Because of the overhead of creating a new supernode domain, supernodes must be stable. A frequently joining/leaving node can not be a supernode even if it has a high bandwidth connection; otherwise the system maintenance overhead would increase greatly and the performance gain would shrink sharply. To encourage a node to apply to be a supernode, some benefits should be given; for example, a large storage space quota can be assigned to the client who contributes its node as a supernode. An adversarial supernode can cause great trouble for system performance. Security techniques such as a credit system and quota limitation must be applied to relieve the problem; more research is needed on supernode security.
4 SBARC System Design

In this section, we describe the design of SBARC. SBARC is based on PAST, a large-scale P2P storage utility. We give a brief introduction of PAST here; details can be found in [2] [3] [6]. PAST is totally self-organized and decentralized. In PAST, each node is capable of initiating and routing client requests to insert or retrieve files. Nodes also contribute storage to the system. Files are replicated on multiple nodes to ensure persistence and availability. PAST provides scalability, high availability, persistence and security. SBARC additionally takes into account the diversity of computer resources (network bandwidth, computation power, shared disk storage capacity, etc.) among peers to boost system performance. In SBARC, a node can be either a supernode or an ordinary node. System workloads are unevenly distributed on peers according to their capabilities. SBARC borrows ideas from PAST in its naming scheme, node operations, fault handling, storage management and security mechanism. Thus PAST properties, such as a routing procedure finishing in less than $\lceil \log_{2^b} N \rceil$ steps (e.g., at most 5 steps with $b = 4$ and $N = 100000$ nodes) and network locality in routing tables, also hold in SBARC. However, SBARC differs
from PAST significantly in its usage of supernodes; the differences are mainly concentrated in the routing algorithm and the caching scheme. We focus our discussion on these two aspects only.
4.1 SBARC Data Structures

In SBARC, each node, whether a supernode or an ordinary node, maintains several tables: a routing table, a neighborhood set, a leaf set and a supernode table. Unlike Pastry, in SBARC an entry in the routing table contains two items. The first item is the nodeid and associated IP address of a node whose nodeid shares some leading digits with the present node's nodeid. The second item is the nodeid and IP address of the first item's primary supernode. If the first item is itself a primary supernode or it does not belong to any supernode domain, the second item is set to 0. The nodes in the entries in higher levels of the routing table share fewer digits with the present node, while nodes in lower-level entries share more digits. The neighborhood set and the leaf set have the same contents as in Pastry and are created using the same method. The supernode table maintains the information of supernodes: the first entry stores the nodeid and IP address of the primary supernode (it can be null in case the present node does not belong to any supernode domain), and it may have extra entries storing information about secondary supernodes. Figure 1 shows the state of a hypothetical node 10232312.

Besides these tables, a supernode maintains extra tables: a domain member table, a routing cache table and a file cache table. It also keeps copies of the routing tables of all other nodes in its domain. The domain member table keeps the nodeids and IP addresses of all nodes within the supernode domain and largely overlaps with its neighborhood set. The routing cache table and the file cache table are used in the SBARC caching scheme and will be discussed later.
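The per-node state described above can be summarized in a short sketch; the field names and the flattened routing-table layout are our own shorthand, not SBARC's actual implementation.

    # Sketch of SBARC per-node state (Section 4.1). Field names are illustrative;
    # the routing table is flattened rather than organized by prefix level.
    from dataclasses import dataclass, field
    from typing import Dict, List, Optional, Tuple

    NodeRef = Tuple[str, str]          # (nodeid, IP address)

    @dataclass
    class RoutingEntry:
        node: NodeRef                           # node sharing a prefix with this node
        primary_supernode: Optional[NodeRef]    # None (shown as 0 in Figure 1) if the
                                                # node is itself a primary supernode or
                                                # belongs to no supernode domain

    @dataclass
    class SbarcNodeState:
        nodeid: str
        routing_table: Dict[str, RoutingEntry] = field(default_factory=dict)
        neighborhood_set: List[NodeRef] = field(default_factory=list)
        leaf_set: List[NodeRef] = field(default_factory=list)
        supernode_table: List[NodeRef] = field(default_factory=list)  # primary first

    @dataclass
    class SupernodeState(SbarcNodeState):
        # Extra tables kept only on a supernode.
        domain_members: List[NodeRef] = field(default_factory=list)
        member_routing_tables: Dict[str, Dict[str, RoutingEntry]] = field(default_factory=dict)
        routing_cache: Dict[str, Tuple[NodeRef, Optional[NodeRef]]] = field(default_factory=dict)
        file_cache: Dict[str, NodeRef] = field(default_factory=dict)  # fileid -> caching node

A supernode thus carries all of the ordinary per-node state plus the domain-wide tables it manages on behalf of its members.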
4.2 Routing/Locating Algorithm
Routing/locating is the most frequently executed operation in P2P systems. Improvements in the routing/locating algorithm can greatly boost overall system performance. Current routing algorithms simply treat nodes as if they had the same amount of computer resources and could handle workloads equally. However, a node with a fast CPU and a broad bandwidth connection (a power node) can handle a routing task much faster than a node with a slow CPU and a narrow bandwidth connection (a weak node).

Pastry uses a smart mechanism to relieve this problem: when a new node comes, all the entries in its routing table are created by choosing nodes which are topologically close to the current node. However, it has limitations. If the number of system nodes is small, a weak node can not be avoided if it is the only node satisfying the prefix requirement. In SBARC, we solve this problem by taking routing hops on supernodes, bypassing the weak nodes and eliminating the possibly long latencies on weak nodes. Another advantage of using supernodes in the routing/locating procedure is the relief of the node failure problem, because supernodes stay in the system much longer than ordinary nodes.

Basically, SBARC uses Pastry as the underlying routing/locating algorithm, but SBARC can provide much better routing performance. SBARC still takes at most $\lceil \log_{2^b} N \rceil$ hops for a routing request; however, its routing/locating algorithm utilizes both the locality property of Pastry and the power of supernodes.

In SBARC, a routing request is always sent to the supernode instead of the ordinary node. The supernode chooses an entry in the routing table of the corresponding ordinary node whose nodeid shares more digits with, or is numerically closer to, the request key and uses the primary supernode in this entry for the next routing hop. Only in case of a supernode failure will the current supernode forward the message to the original ordinary node. Thus SBARC bypasses ordinary nodes in most cases. Supernodes can process a routing request much faster and forward the message to the next node more quickly, so a shorter average routing network latency can be expected. For example, suppose node A is performing a routing procedure to node X and needs five hops to reach the destination. In Pastry, a hop with a very long network latency may exist and dominate the whole routing latency. In SBARC, the third and fourth hops are carried on supernodes, so it achieves a much lower routing latency.

In both SBARC and PAST, a file is stored on the node whose nodeid is numerically closest to its fileid, so the routing and locating procedures are combined. However, in SBARC, routing and locating procedures can be separated with the introduction of the routing cache table. A routing procedure may finish early if it gets a hit in a routing cache table along the routing path. In this case, if it does not get a file cache hit simultaneously, the originator still needs to perform the locating procedure to retrieve the required file from the destination node; since it already knows the destination node, it can retrieve the file from that node directly.

4.3 Node Operations

In SBARC, when a new node comes, it needs to initialize its tables and inform other nodes of its presence. The first step of node arrival is to determine its supernode domain. Assume the new node X initially knows about a nearby node A'. X asks A' about its primary supernode A and adds node A to its supernode table as the primary supernode; node X then contacts A for the other supernodes in this domain and adds them to its supernode table as well. In the second step, node X contacts A to route a "join" message with the key equal to X. This message is sent along the path to node Z whose nodeid is numerically closest to X. The whole procedure is similar to Pastry. However, in SBARC, the message is mainly carried on supernodes, like other routing messages. The encountered supernodes do not send the corresponding rows of their own routing tables, but the corresponding rows of the routing tables of their surrogated ordinary nodes to node X. The routing table on node X is then created correctly, as in Pastry. The neighborhood set and leaf set on node X can also be created using the same mechanism as in Pastry. In the third step, supernode A informs the other nodes who need to be aware of node X's arrival with the new node's information; these nodes are mainly supernodes, and they do not modify their own routing tables but modify their surrogated nodes' routing tables accordingly. The neighborhood sets and leaf sets of affected nodes are also modified using the same mechanism as Pastry [6]. In the final step, node X sends its newly created routing table to supernode A and the node arrival operation finishes.

SBARC node failures can be divided into two categories. If an ordinary node fails, its primary supernode knows immediately. Because other nodes are not aware of this and the supernode is still alive, a newly arriving routing message via this node can be carried by this supernode without any problem. However, in such a situation, the supernode should not take routing tasks via the failed ordinary node any more. It should notify the originator of the failure of the node; the rest of the operations are the same as in Pastry – the entries in affected tables are modified. If a primary supernode fails, a secondary supernode can take over the responsibility and become the primary supernode. It notifies all other nodes in the domain and these nodes modify their tables accordingly. In SBARC, when a new routing request is sent to the failed supernode, the originator gets no response; it waits for a while and then sends the routing message to the corresponding ordinary node. The ordinary node takes over its own routing task, forwards the message to the next node, and the routing procedure continues. After the response returns, it notifies the originator to modify its routing table, setting the second item in the corresponding entry to the new primary supernode, or to 0 if there is no supernode.

4.4 Caching Scheme

The caching scheme is another important aspect which distinguishes SBARC from other P2P systems. PAST and CFS also use caches: they cache files along the routing paths. However, an individual node can not cache many files because of the large storage requirement, so a low cache hit rate can be anticipated. Such a caching scheme is not very efficient and can not achieve much benefit, especially in a large-scale system.

The usage of supernodes enables a more efficient caching scheme in SBARC. Unused storage space in one supernode domain is managed together by the primary supernode and appears as one big virtual cache. This cache can keep more files than an individual node. Besides files, their routing information is also a cache-able object. Because the size of a file's routing information is much smaller than the actual file, a large amount of routing information can be cached with a very small storage space.

In SBARC, two kinds of cache tables are created on supernodes.
Figure 2: Tables on supernode 01332302.
One is the routing cache table, which is used to cache routing information; the other is the file cache table, which is used to manage all cached files in this big virtual file cache. A simple example of the cache tables on a hypothetical supernode 01332302 is shown in Figure 2.
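Before describing the two tables in detail, the following hedged Python sketch summarizes how a primary supernode might process an incoming routing request: it first consults its file cache and routing cache tables, then falls back to the supernode-based next-hop selection of Section 4.2. The data layouts and helper names are illustrative assumptions, not SBARC's actual interfaces.

    # Hedged sketch of request handling on a primary supernode (Sections 4.2/4.4).
    # The table layouts and helper names are illustrative, not SBARC's interfaces.

    def shared_prefix_len(a: str, b: str) -> int:
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    def handle_request(file_cache, routing_cache, member_routing_tables,
                       fileid, target_nodeid, is_alive=lambda node: True):
        # 1. File cache hit: a copy of the file is cached inside this domain.
        if fileid in file_cache:
            return ("file_cached_at", file_cache[fileid])
        # 2. Routing cache hit: the destination node is already known.
        if fileid in routing_cache:
            return ("destination_known", routing_cache[fileid])
        # 3. Otherwise pick the next hop from the routing table of the surrogated
        #    ordinary node the message was addressed to: the entry whose nodeid
        #    shares the longest prefix with the request key.
        table = member_routing_tables[target_nodeid]
        node, primary_super = max(table, key=lambda e: shared_prefix_len(e[0], fileid))
        # 4. Prefer the entry's primary supernode as the next hop; fall back to the
        #    ordinary node only if that supernode is unreachable (or there is none).
        if primary_super and is_alive(primary_super):
            return ("forward_to", primary_super)
        return ("forward_to", node)

    if __name__ == "__main__":
        tables = {"10231102": [("10233002", "02313021"), ("23021331", None)]}
        print(handle_request({}, {}, tables, fileid="10232000",
                             target_nodeid="10231102"))
        # -> ('forward_to', '02313021')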
4.4.1 Routing Cache Table

The routing cache table caches files' routing information. Each entry in this table records a fileid; the nodeid and associated IP address of the node which is numerically closest to this fileid; and the nodeid and IP address of this node's primary supernode. In SBARC, when a routing procedure finishes, the routing information is cached on the supernodes along the routing path. Next time, when a new routing request for the same file comes, a supernode which caches the routing information can reply to the originator directly, so a lower routing cost is achieved. A small storage space is allocated for the routing cache table on each supernode. Because the size of routing information is much smaller than the size of the corresponding file, such a small space is big enough to cache routing information for many more files than could be cached as actual files. Because of the relatively large number of entries, an efficient algorithm is used to speed up the searching procedure: SBARC organizes routing cache tables as hash tables, and it also maintains an LRU list over the entries of each hash value to exploit the temporal locality of popular files. With the usage of the routing cache table, the routing cost can be further reduced: a high cache hit rate can be expected, and thus a reduced average number of routing hops and a lower average routing latency can be achieved.
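The organization described above (a hash table whose buckets keep their entries in LRU order) can be sketched as follows; Python's OrderedDict stands in for the per-bucket LRU list, and the bucket count and per-bucket capacity are arbitrary illustrative values.

    # Sketch of a routing cache table: a hash table whose buckets keep their
    # entries in LRU order (Section 4.4.1). Capacity and bucket count are
    # illustrative choices, not SBARC parameters.
    from collections import OrderedDict

    class RoutingCacheTable:
        def __init__(self, num_buckets=64, capacity_per_bucket=16):
            self.buckets = [OrderedDict() for _ in range(num_buckets)]
            self.capacity = capacity_per_bucket

        def _bucket(self, fileid):
            return self.buckets[hash(fileid) % len(self.buckets)]

        def lookup(self, fileid):
            """Return (dest_node, dest_primary_supernode) or None; a hit is
            moved to the most-recently-used end of its bucket."""
            b = self._bucket(fileid)
            if fileid in b:
                b.move_to_end(fileid)
                return b[fileid]
            return None

        def insert(self, fileid, dest_node, dest_primary_supernode):
            """Cache a finished routing result, evicting the least recently
            used entry of the bucket when it is full."""
            b = self._bucket(fileid)
            b[fileid] = (dest_node, dest_primary_supernode)
            b.move_to_end(fileid)
            if len(b) > self.capacity:
                b.popitem(last=False)   # drop the LRU entry

    if __name__ == "__main__":
        rct = RoutingCacheTable()
        rct.insert("10232000", ("10233002", "1.2.3.4"), ("02313021", "5.6.7.8"))
        print(rct.lookup("10232000"))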
4.4.2 File Cache Table

The routing cost can be reduced by the routing cache table. However, the originator still needs to contact the destination node whose nodeid is numerically closest to the key to retrieve the requested file. Generally, that node is at a longer network distance than the intermediate nodes encountered during the routing procedure. If the requested file can be found on one of the intermediate nodes, the originator can download the file from it, and better system performance is obtained. Thus, caching files is also very important.

SBARC also utilizes unused storage space for file caching, as PAST does. But unlike PAST, which lets each node manage its unused cache space individually, in SBARC the unused space of all nodes within a supernode domain is managed together to provide a more efficient file caching scheme. The primary supernode maintains a file cache table to manage the information of all files cached in its domain. During a routing procedure, if the primary supernode finds that the requested file is cached in its domain, it immediately replies to the originator with the nodeid of the node which stores the copy, and this node can satisfy the originator with the requested file. In most cases, this node is much closer than the node whose nodeid is numerically closest to the fileid, because of the network locality property held in SBARC. File cache tables are also organized as hash tables to speed up the searching procedure.
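A hedged sketch of the domain-wide file cache table managed by the primary supernode is given below: lookups answer with the nodeid of the member holding a cached copy, and insertions place a file on a member with free space. The placement policy shown (first member with enough room) is an assumption for illustration; SBARC's actual placement and eviction decisions are discussed in Section 5.3.2.

    # Illustrative sketch of the primary supernode's file cache table
    # (Section 4.4.2). The placement policy is an assumption, not SBARC's.
    class DomainFileCache:
        def __init__(self, member_free_space):
            # member_free_space: nodeid -> free cache space in bytes
            self.free = dict(member_free_space)
            self.table = {}            # fileid -> nodeid of the caching member

        def lookup(self, fileid):
            """Nodeid of a member caching this file, or None on a miss."""
            return self.table.get(fileid)

        def insert(self, fileid, size):
            """Place the file on some member with enough free space and record
            it in the file cache table; return that member's nodeid."""
            for nodeid, space in self.free.items():
                if space >= size:
                    self.free[nodeid] = space - size
                    self.table[fileid] = nodeid
                    return nodeid
            return None                # domain cache full; caller must evict

    if __name__ == "__main__":
        cache = DomainFileCache({"13012112": 10_000_000, "13213301": 5_000_000})
        print(cache.insert("00230221", 4_000_000))   # e.g. '13012112'
        print(cache.lookup("00230221"))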
5 Performance Evaluation

In this section, we evaluate SBARC by simulation. First, we describe the emulated network model and workload traces. Then the experimental results of the supernode based routing algorithm are presented. The efficiency of the routing cache table is tested next. Finally, we discuss the supernode based file caching scheme.
5.1 Simulation Environment

5.1.1 Network Model and Workload Traces

We use the Transit-Stub topology (GT-ITM TS model) [18] as the emulated network model. In our emulated networks, the latencies of intra-transit domain links, stub-transit links and intra-stub domain links are set to 100, 20 and 5 ms respectively. The total number of nodes varies from 1000 to 100000. We use web proxy logs obtained from the National Laboratory for Applied Network Research (NLANR) as our workload; this method is also used by [6]. The trace data we used were collected from eight individual servers between April 30, 2002 and May 6, 2002.
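For reference, the link-latency assignment used in the emulated topology can be stated directly in code; the function below simply restates the three constants given above, and the link-type labels are our own.

    # Link latencies used in the emulated GT-ITM transit-stub topology
    # (Section 5.1.1); the link-type names are our own labels.
    LINK_LATENCY_MS = {
        "intra_transit": 100,   # link between routers inside a transit domain
        "stub_transit": 20,     # link connecting a stub domain to a transit domain
        "intra_stub": 5,        # link between nodes inside a stub domain
    }

    def path_latency_ms(link_types):
        """Sum the one-way latency over a path described by its link types."""
        return sum(LINK_LATENCY_MS[t] for t in link_types)

    if __name__ == "__main__":
        # Example: stub -> transit -> transit -> stub
        print(path_latency_ms(["intra_stub", "stub_transit", "intra_transit",
                               "stub_transit", "intra_stub"]))   # 150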
5.2 Routing/Locating Algorithm
In the first simulation, we evaluate the SBARC supernode based routing algorithm. Caching is disabled to remove the impact of the caching scheme, and node failures are ignored as well. For Pastry, routing tables are generated by selecting the node which has the minimal link latency to the present node and satisfies the nodeid prefix requirement. For SBARC, the nodes within a stub domain form a supernode domain, and the node with the minimal processing delay and link latency is chosen as the primary supernode; no secondary supernodes are used. An entry in SBARC routing tables records information about both the node which has the minimal link latency to the present node and its primary supernode. For both SBARC and PAST, the parameters we use are $b = 4$, $|L| = 16$ and $|M| = 32$; the same parameters are also used in the other simulations.

As we mentioned before, Pastry can not avoid all weak nodes during routing procedures. Figure 3 shows the percentage of weak nodes the Pastry routing algorithm encountered. The total weak nodes curve reflects the percentage of weak nodes in one routing procedure. The last hop weak node curve reflects the probability that the node whose nodeid is numerically closest to a request key is a weak node. In SBARC, nodes along the routing path are mostly supernodes, which have smaller processing delays and can process requests quickly.
For the Pastry routing algorithm, as the total number of nodes increases, both the percentage of total weak nodes and the percentage of last-hop weak nodes drop accordingly. However, even in a 100000-node network, Pastry still has 5.83% weak nodes in its routing procedures. Because the processing delay on weak nodes is much larger than on other nodes, the routing performance of Pastry is greatly affected by this small percentage of weak nodes. The SBARC routing algorithm has nearly the same average number of routing hops as Pastry for any network size, but it achieves a much lower average routing network latency because it avoids weak nodes. Figure 4 shows the results. In this simulation, the SBARC routing algorithm has 32.1% to 46.4% lower average routing network latency than Pastry. As the network size increases, the performance gap between SBARC and Pastry diminishes a little. However, even in a 100000-node network, SBARC outperforms Pastry by 32.1%. Figure 4 also shows that both the SBARC and Pastry routing algorithms have good scalability: the average routing latency rises slowly as the network size increases. Because supernodes are more stable than ordinary nodes, SBARC will not suffer from the node failure problem as much as Pastry, so a larger performance gap between the SBARC and Pastry routing algorithms can be expected, especially in a highly dynamic environment.

Figure 3: Percentage of weak nodes in Pastry (no cache used); total weak nodes and last hop weak node versus the number of nodes.

Figure 4: Routing latency comparison (no cache used); average routing latency (ms) of SBARC and Pastry versus the number of nodes.

5.3 Caching Scheme Performance

5.3.1 Routing Cache Table

In this experiment, the efficiency of the routing cache table is evaluated. The number of entries in the routing cache table varies from 10 to 1000 and the LRU algorithm is used as the replacement policy. A system with no routing cache table is used as the baseline. Figure 5 shows the results. For the baseline system, a routing procedure finishes only when it reaches the destination node, so it has the largest average number of routing hops, which is nearly $\lceil \log_{2^b} N \rceil$. By using the routing cache table, the average number of routing hops drops significantly, and the more entries in the routing cache tables, the more performance gain is achieved. With a 10-entry table, it drops by 20.3% to 36.6%. As the number of entries increases to 100, SBARC gets another 32.2% to 38.4% reduction. As the number of entries increases to 1000, the average number of routing hops decreases even more. We tested larger numbers of entries and found that the additional performance improvement is limited while the maintenance cost increases greatly, so we prefer a 1000-entry table.

Figure 6 shows the cache hit rates for seven workload traces in a 10000-node network. The emulated TS network includes 6 transit domains with 7 nodes each, and each stub domain contains 16 nodes. In this experiment, we count a cache hit if a routing procedure finishes before it arrives at the destination node whose nodeid is numerically closest to the request key. As the number of entries in the routing cache table increases, the cache hit rate increases accordingly. For a 10-entry table, the hit rate varies from 5% to 15% with an average of 10.57%. As the number of entries increases to 100, the cache hit rate improves to 25% – 59% with an average of 43.43%, and it varies from 67% to 89% with an average of 77.29% for a 1000-entry table. The hit rate is highly dependent on the traces. For example, in the 100-entry configuration, the system gets only a 25% hit rate for the 4-May trace but 59% for both the 1-May and 5-May traces. However, with a 1000-entry configuration, the system always gets a high hit rate.

Figure 5: Average number of routing hops comparison; no-cache, cache-10, cache-100 and cache-1000 versus the number of nodes.

Figure 6: Routing cache table hit rate; cache hit rate (%) of cache-10, cache-100 and cache-1000 for the traces 30-Apr through 6-May.

5.3.2 File Cache Table

We check the performance of the SBARC file caching scheme in the last experiment. We adopt the method used in PAST [3] in our simulation. The April 30th, 2002 NLANR trace is chosen as the workload. The TS model used here is a 2580-node network. The contributed storage space distribution is a truncated normal distribution with mean 27 MB and standard deviation 10.8, with upper and lower limits at 51 MB and 2 MB respectively. The total storage capacity on all peers is 78 GB. These are the same parameters used in PAST [3]. The PAST caching scheme is used for comparison. We disable PAST replica diversion and file diversion for simplicity. In PAST, each node can only manage and use its own storage for file caching. In SBARC, when a request comes to a supernode for the first time, the supernode chooses a node in its domain to cache this file; in case it is full, some old files are moved to a weak node. SBARC keeps popular files on supernodes and power nodes for fast access. The replacement algorithm used in both SBARC and PAST is LRU.

Figure 7 shows the experimental results. It is clear that the SBARC file caching scheme has higher hit rates than PAST. When the total storage utilization is low, both the SBARC and PAST file caching schemes achieve good performance. As the storage utilization increases, the file cache hit rates decrease in both schemes. However, the performance degradation in PAST is much faster than in SBARC, especially when the system storage is nearly full. SBARC can achieve a 40% cache hit rate at 95% storage utilization, while PAST can only achieve a 13% hit rate. For both SBARC and PAST, the average number of routing hops increases as the storage utilization increases, but in SBARC it increases more slowly.

Figure 7: File cache hit rate and average number of routing hops of SBARC and PAST versus storage utilization, under the Least-Recently-Used (LRU) replacement policy.
6 Conclusion and Future Work

Peer-to-Peer systems have recently become a hot research topic. As self-organized and fully distributed systems, they have great advantages over the traditional C/S architecture for some particular applications. However, such a network architecture is far more complex and faces more technical difficulties. In this paper, we propose a new
P2P file sharing system, SBARC. SBARC does not separate system nodes into clients and servers as in the C/S architecture, nor does it treat all system nodes equally as other P2P systems do. It classifies peers into supernodes and ordinary nodes according to their different resources and capabilities. SBARC makes intensive use of supernodes to boost system performance. Its routing algorithm achieves lower routing costs, and the usage of the routing cache table further improves the routing performance. The unused storage space on all nodes within a supernode domain forms a large virtual file cache managed together by supernodes, so a higher file cache hit rate is achieved. Simulation results show that SBARC achieves better routing and caching performance than PAST. The SBARC approach can also be applied to other systems such as CFS, CAN and OceanStore; we will further our research on these systems.
Acknowledgments This work is supported in part by the National Science Foundation Career Award CCR-9984852, and an Ohio Board of Regents Computer Science Collaboration Grant. Thanks to Sudhindra Rao and Juan Li for providing many suggestions during the writing of this paper.
References

[1] S. Saroiu, P. K. Gummadi, and S. D. Gribble, "A measurement study of peer-to-peer file sharing systems," in MMCN, San Jose, CA, Jan. 2002.
[2] P. Druschel and A. Rowstron, "PAST: A large-scale, persistent peer-to-peer storage utility," May 2001.
[3] A. Rowstron and P. Druschel, "Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility," in SOSP, Banff, Alberta, Canada, pp. 188–201, Oct. 2001.
[4] J. Kubiatowicz, D. Bindel, P. Eaton, Y. Chen, D. Geels, R. Gummadi, S. Rhea, W. Weimer, C. Wells, H. Weatherspoon, and B. Zhao, "OceanStore: An architecture for global-scale persistent storage," in ASPLOS, Cambridge, MA, pp. 190–201, Nov. 2000.
[5] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica, "Wide-area cooperative storage with CFS," in SOSP, Banff, Alberta, Canada, pp. 202–215, Oct. 2001.
[6] A. I. T. Rowstron and P. Druschel, "Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems," in Middleware, Heidelberg, Germany, pp. 329–350, Nov. 2001.
[7] B. Zhao, J. Kubiatowicz, and A. Joseph, "Tapestry: An infrastructure for fault-tolerant wide-area location and routing," Technical Report UCB/CSD-01-1141, U.C. Berkeley, CA, 2001.
[8] I. Stoica, R. Morris, D. Karger, M. Kaashoek, and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup service for internet applications," Tech. Report TR-819, MIT.
[9] F. Dabek, E. Brunskill, M. F. Kaashoek, D. Karger, R. Morris, I. Stoica, and H. Balakrishnan, "Building peer-to-peer systems with Chord, a distributed lookup service," in HotOS, Schloss Elmau, Germany, pp. 195–206, May 2001.
[10] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, "A scalable content-addressable network," Technical Report TR-00-010, U.C. Berkeley, CA, 2000.
[11] E. P. Markatos, "Tracing a large-scale peer to peer system: an hour in the life of Gnutella," in CCGrid, May 2002.
[12] B. Zhao, Y. Duan, L. Huang, A. Joseph, and J. Kubiatowicz, "Brocade: Landmark routing on overlay networks," in IPTPS, Cambridge, MA, March 2002.
[13] KaZaA, http://www.kazaa.com/.
[14] I. Clarke, O. Sandberg, B. Wiley, and T. W. Hong, "Freenet: A distributed anonymous information storage and retrieval system," in Workshop on Design Issues in Anonymity and Unobservability, Berkeley, CA, pp. 46–66, Jul. 2000.
[15] R. Dingledine, M. J. Freedman, and D. Molnar, "The Free Haven project: Distributed anonymous storage service," in Workshop on Design Issues in Anonymity and Unobservability, Berkeley, CA, pp. 67–95, Jul. 2000.
[16] M. Waldman, A. D. Rubin, and L. F. Cranor, "Publius: A robust, tamper-evident, censorship-resistant web publishing system," in Proceedings of the 9th USENIX Security Symposium, Denver, CO, pp. 59–72, Aug. 2000.
[17] M. K. Reiter and A. D. Rubin, "Crowds: Anonymity for Web transactions," ACM Transactions on Information and System Security, vol. 1, no. 1, pp. 66–92, 1998.
[18] E. W. Zegura, K. L. Calvert, and S. Bhattacharjee, "How to model an internetwork," in Proceedings of the IEEE Conference on Computer Communication, San Francisco, CA, pp. 594–602, Mar. 1996.