May 1, 2001 - Retrieval performance between end-hosts is highly variable and dynamic. ⢠Need to ... Peer-to-Peer Syste
Enabling Efficient Content Location and Retrieval in Peer-to-Peer Systems by Exploiting Locality in Interests Kunwadee Sripanidkulchai, Bruce Maggs, Hui Zhang, Carnegie Mellon University
Challenges Challengesposed posedby bypeer-to-peer peer-to-peer • Want scalable and high performance content location and peer
Gnutella overlay
selection. Existing solutions provide scalable location, but have not addressed peer selection. • Retrieval performance between end-hosts is highly variable and dynamic.
Content Peer list overlay
May 1, 2001
High variability (σ ≈ 1 sec) in ping times over a 24-hour period to a random end-host on the Internet (typical for 1/3 of 2400 end-hosts pinged in our experiments)
4
Ping Time (ms)
10
3
10
3) The list evolves as more content is retrieved and more peers are discovered
Content location Queries for content are sent to peers in the list
18:00:00
00:00:00 Time
06:00:00
12:00:00
• Need to use up-to-date performance state to select a peer • For scalability, cannot maintain up-to-date state for all peers • Which peers should we maintain state for? - Peers that have locality in interests
Locality in interests Observation: people share common interests. Can we exploit this to improve content location and retrieval? D, E, F 0/3
Fine-grained dynamic performance state can be maintained for peers on the list
Potential benefits and overhead • Use Boeing corporate web proxy traces to drive the request stream for the simulations • Treat a request for a new document (a compulsory cache miss for a web cache) as a publish in peer-to-peer system • Ran simulations over a period of 5 minutes to 3 hours • Content location algorithms are based on - Asking random peers - Asking peers with same interests (1 hop) - Asking peers of peers with same interests (2 hops)
A, C, D, E
2/3 0/3
3/3 A, B, C, D
Peer selection and content retrieval
A, B, C
60
11
max
F, G, H
10
50 9 content−based 1 hop 8
Miss rate (%)
Proposedsolution solution Proposed
average over 16 runs
30
min
Number of peers
random
40
7
content−based 2 hops
6
5
20 4
A distributed algorithm for peers to self-organize into clusters based on interests (peer list) Why is it easier to incorporate dynamic performance state when using locality in interests to locate and retrieve content? - Only need to keep performance state for peers that are likely to provide the content one is looking for.
Peer list 1) Each peer maintains a list of peers who share the same interests 2) Peer lists are initially bootstrapped using existing protocols, such as Gnutella, Tapestry, Pastry, CAN, or Chord. We use the following heuristic: peers that have the content you are looking for have the same interests.
content−based 1 hop
3
content−based 2 hops
2
10
0
0
2000
4000
6000 Simulation length (s)
8000
10000
12000
Locating content amongst peers with locality in interests results in low miss rates
1
0
2000
4000
6000 Simulation length (s)
8000
10000
12000
Maintaining a small list of peers who share the same interests provides good hit rate
Implementationstatus status Implementation • Refining algorithm by ranking peers in one’s list to select peers that are more likely to have content • Exploring alternative mechanisms to bootstrap peer lists • Developing techniques for incorporating dynamic performance state into algorithm • Implementing our solution using Gnutella to bootstrap peer lists