Building Small Worlds in Unstructured P2P Networks using a Multi-agent Bayesian Inference Mechanism Prithviraj Dasgupta Computer Science Department University of Nebraska, Omaha, NE 68182. E-mail:
[email protected] ABSTRACT Over the past few years, peer-to-peer(p2p) unstructured networks have emerged as an attractive paradigm for enabling online interactions between a large number of users in a decentralized manner. However, the decentralized nature of unstructured p2p networks makes load balancing a challenging problem. Specifically, the self-interested nature of users on the nodes of a p2p network and dynamic changes in network topology give rise to an unbalanced distribution of nodes across an unstructured p2p network. This results in network congestion and significant search latencies for all nodes. In this paper, we describe a small-world network model and a Bayesian inference mechanism within a multiagent setting to address these issues. Simulation results for a file sharing p2p application show that our algorithm achieves an exponential reduction in number of messages exchanged and improves load-balancing across the network. Categories & Subject Descriptors: I.2.11 Distributed Artificial Intelligence: Multi-agent systems. General Terms: Management, Design, Experimentation. Keywords: P2P networks, small world graphs, topology balancing.
1.
INTRODUCTION
A p2p network is an overlay network between nodes interconnected by an underlying physical network and enables users located on its nodes to establish short-term connections with other users(nodes) to acquire and share resources with each other in a decentralized manner. In the absence of any centralized mechanism for controlling the topology of the network and for load balancing, p2p networks are characterized by an unbalanced topology with high clustering of nodes in some regions and relatively sparse node distribution in others. This results in network congestion, significant latencies for resource search requests, and a disparate distribution of load across the p2p network. In this paper,we address the issues related to unbalanced topology for unstructured p2p networks.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. AAMAS’07 May 14–18 2007, Honolulu, Hawai’i, USA. Copyright 2007 IFAAMAS .
In an unstructured p2p network, a user(node) wishing to join the network selects one or more existing nodes within the network as its neighbors. Once it has joined the network, a node sends information it wishes to disseminate across the network (for e.g. resource search queries in a p2p file sharing application) to its neighbors. The neighbors then forward the query across the network using a flooding mechanism. The unbalanced network topology in an unstructured p2p network can be attributed to two features of the p2p nodejoin mechanism: (1)When a node joins a p2p network, the selection of neighbors is decided only by the node entering the network, while existing nodes in the network unconditionally accept the entering node as a neighbor. (2) A node can join a p2p network network at any arbitrary location and potentially generate a large amount of network traffic. This gives rise to sporadic traffic patterns leading to an uneven distribution of network load. In this paper, we extend the model of a small-world structure for p2p networks described in [2] using a Bayesian inference mechanism that enables each agent to respond to dynamic changes in the network’s topology and a node’s parameters. 1 Building a small-world graph within an unstructured network is considerably challenging because nodes can possess any arbitrary degree and can join at any arbitrary location in the network.
2. SMALL-WORLD MODEL FOR UNSTRUCTURED P2P NETWORKS Small-world networks are a class of networks[3] that have the following properties: (i) a short characteristic path length that is typical of random graphs, and, (ii) a high clustering coefficient that is typical of well-ordered graphs. These properties prevent the formation of hubs in the network and make small-world networks easily navigable. A small-world graph is constructed from a non-small-world base graph in two phases. First, edges are added between certain nodes of the non-small world base graph to get an augmented graph that satisfies the small-world properties. The edges that are added can be of two types: (i) Short-range link that connects two nodes that are separated by a distance ≤ Ls in the base graph, and (ii) Long-range link that connects two nodes that are separated by a distance > Ls in the base graph. The distance Ls is called the short-range distance of the small-world graph. Simultaneously, the number of links that are added is controlled to maintain a fixed ratio α between the number of short-range links and the total number 1 We have modeled our unstructured p2p network using the Gnutella 0.6 protocol [4].
936 c 978-81-904262-7-5 (RPS) 2007 IFAAMAS
of links (short range+long range) added to each node in the augmented graph. α is called the proximity factor. A suitable value of α = 0.66 to ensure small-world properties has been reported in [5].
3.
UTILITY-BASED NEIGHBOR SELECTION
The parameters used by our agent-based neighbor selection algorithm are the following: N Ni R τ rj,t
Set of agents(nodes) where each agent is located on a node of the p2p network. Set of neighbor agents(nodes) of agent i ∈ N Set of resources stored by agents(nodes) of the p2p network Set of resource types for the different resources ∈ R Number of resources of type t ∈ τ shared by agent j
For our model, we define the following parameters to enable a priori calculation of resource availability on a node: (1) Resourcefulness: The resourcefulness of agent j to r agent i for resource type t is given by ρi,j,t = Σn∈Nj,trn,t . i Intuitively, ρi,j,t denotes the relative contribution of agent j to the total number of resources of type t shared by different agents with agent i. (2) Connectedness: Let N (i, h, ρmin ) denote the set of agents that are h hops away from agent i such that each agent j ∈ N (i, h, ρmin ) satisfies the criterion ρi,j,t > ρmin for resource type t. The connectedness of an agent i for Pk |N(i,h,ρmin )| resource type t is given by χ(i, t, k) = h=1 hδ 2 where δ is a control parameter and k is the connectedness lookahead parameter. The connectedness value of an agent is a measure of the number of agents with a minimum resourcefulness of ρmin that are within a radius of k hops from the agent, and reflects the expected availability of resources among agents in the vicinity of the agent. We assume that the probability of locating a resource of type t at an agent j by another agent i is approximated from agent j’s resourcefulness and connectedness parameters and is given by πi,j,t,k = ργi,j,t × χ1−γ j,t,k where the constant γ ∈ [0, 1] denotes the relative preference between ρi,j,t and χj,k,t . In the rest of the paper, we drop the subscripts i and k from π and denote the probability as πj,t for legibility. Let v and c denote respectively the average valuation of a resource and the average search cost/hop to agent i. The utility to agent i by accepting agent e as a neighbor for resource type t is given by: Ui,e,t = Utility from accepting e as a neighbor − Utility from not accepting e as a neighbor, = (v πe,t − c) j=a −(v πe,t − c)(Πj=an−1 (1 − πj,t )) 1 j=a or, Ui,e,t = (v πe,t − c)(1 − Πj=an−1 (1 − πj,t )) 1 Fig. 1 shows the decision function used by an agent to admit neighbors while maintaining the small-world properties.
4.
BAYESIAN UPDATE MECHANISM FOR ADDRESSING AGENT DEPARTURES
In this section, we describe a Bayesian inference mechanism that enables an agent to dynamically update another 2
δ = (maxh=1..k h logh d) + 1 ensures that χ ∈ [0, 1]
function response returns void{ inputs: Message extendedPing; Calculate ρi,e,t = Σ re r n∈Ni n,t
Calculate Ui,e,t using the eq. for utility of neighbor selection if (Fd (e, Ui,e,t , αi , di , hi,e )) = true) send pong message appended with ρ( i, e), χi,t
}
function Fd returns boolean{ inputs: int e; //id of agent originating ping double Ui,e,t ; // utility of agent e to agent i for res. type t double αi ; // proximity factor of agent i int di ; //no. of neighbors(degree) of agent i int hi,e ; // distance between agents i and e int nsl ← αi × di ; //reqd. no. of short-range links of i int nll ← (1 − αi ) × di ; //reqd. no. of long-range links of i Ui,e = Σt∈τ wi,t Ui,e,t , // wi,t gives the p.d.f of agent i //over the different resource types. if (hi,e ≤ Ls ) if ((| Si |< nsl) return true; else if ((Ui,e,t > min Ui,aj ,t | ∃aj ∈ Si ){ Si ← Si − {aj } ∪ {e} return true; else if ((| Li |< nll) return true; else if (Ui,e,t > min Ui,aj ,t | ∃aj ∈ Li ){ Li ← Li − {aj } ∪ {e} return true; return false;
}
Figure 1: Algorithm used by agent i to respond to a ping from an entrant agent e. agent’s ρ and χ values without explicitly exchanging messages with that agent. We have modeled dynamic updates arising from agent departures through the changes to the resourcefulness(ρ) and connectedness(χ) parameters of agents in the vicinity of the departed agent. We postulate that as long as most of the neighbors of an agent j remain unchanged, the value of χj,t,k can be estimated from the connectedness of agent j’s neighbors. Based on this assumption, we describe a Bayesian update based mechanism that enables an agent i to approximate the resourcefulness and connectedness parameters of an agent j dynamically without exchanging any additional messages with agent j or its neighbors. Suppose an agent i wishing to determine the utility of accepting entrant agent e as a neighbor discovers that it has not recently updated the connectedness χj,t,k of agent j which lies along the ping-path from agent e to agent i. Let C : χ → wc/wc denote a function that maps the connectivity of an agent to a boolean variable that takes two values: well-connected(wc) and not well-connected(wc’). We assume that the connectedness parameter of agent j is influenced by agents lying within a radius of k -hops from it. Agent i is then interested in calculating the probability: P (C(χj ) = wc | C(χa(1,j,n) ), C(χa(2,j,n) )....),
(1)
for h = 1..k , where χa(l,j,h ) denotes the connectedness of the l-th agent that is h -hops away from agent j. The probability that agent n, a h -hop neighbor of agent j is also an m-hop(m ≤ n + 1) neighbor of agent i is given by: P (hn,i = m|hn,j = h ) = Δh −m+1 where Δ is the clustering
The Sixth Intl. Joint Conf. on Autonomous Agents and Multi-Agent Systems (AAMAS 07)
937
χa(l,j,n) =
hX +1
(P (hn,i = m|hn,j = h ) × χa(l,i,m) )
m=1
or, χa(l,j,n) =
hX +1
(Δh
−m+1
× χa(l,i,m) )
8
10
7
10 Ratio between no. of pong and no. of ping messages (log scale)
coefficient of the agents in the network. A suitable value for Δ[1] is given by: Δ = N12 Σ di (dEii−1) , where N2 is the number of agents in the sub-network located within a radius of T T Lhops of agent j with degree > 2, Ei is the number of edges in the sub-network and di is the degree of agent i. We can then rewrite the connectedness of an agent h hops away from agent j as:
(2)
d=6
4
10
d=9
d=3
3
10
2
10
d=9
d=3 d=6
0
10
20
30
40 50 60 70 Percentage of Nodes Deployed
80
90
100
Figure 2: Ratio between no. of pong and ping messages (log-scale) vs. percentage of agents deployed.
P (C(χa(1,j,h ) ),C(χa(2,j,h ) )...)
P (C(
P
P Δh −m+1 ×χa(1,i,m) ),C( Δh −m+1 ×χa(2,i,m) ...)|C(χj )=wc) P h −m+1 P Δ ×χa(1,i,m) ),C( Δh −m+1 ×χa(2,i,m) )...)
P (C(
×P (C(χj ) = wc)
Each term in the above equation is available at agent i. Therefore, the connectedness parameter χj of agent j can be approximated by agent i from its knowledge of the connectedness parameter of agent j’s neighbors.
5.
SIMULATION RESULTS
For our experimental results we consider a p2p network comprising a maximum of 1000 agents. The arrival rate of agents follows a Poisson distribution with a mean = 50. The maximum degree of an agent is drawn from {3, 6, 9}. The values of the different constants used for the simulation are k = 2, δ = 2, γ = 0.5, v = 10, c = 2, α(proximity factor) = 0.66, and,Ls (short-range distance for small-world network)= 3. Parameters d(average agent degree) and T T L(time-tolive for a message) are varied across simulation runs. The period (in minutes) for which an agent remains in the network is drawn from U[5-30]. Each simulation was run over a period of 15 mins. The dataset used for the resources on each agent is drawn from a set of 10, 000 files. Resources can be one of three possible resource types, and each agent has a preferred resource type selected at random. Resource search queries are generated at intervals of 10 seconds from an agent selected randomly from the currently active agents. The query string for a query is drawn randomly from the file names in the dataset. All results are averaged over 10 runs. The simulation results are reported in Figures 2 through 4.
6.
REFERENCES
[1] P. Boykin and V. Roychowdhury. Leveraging social networks to fight spam. IEEE Computer, 38(4):61–68, 2005. [2] P. Dasgupta. A multi-agent mechanism for topology balancing in unstructured p2p networks. In Proc. of Intl. Conf. on Intelligent Agent Technology, pages 389–392, 2006. [3] J. Kleinberg. The small-world phenomenon: an algorithm perspective. In STOC, pages 163–170, 2000. [4] T. Klingberg and R. Manfredi. Gnutella 0.6 protocol specification draft. 2002. [5] S. Merugu, S. Srinivasan, and E. Zegura. Adding structure to unstructured peer-to-peer networks: the use of small-world graphs. J. Parallel Distrib. Comput., 65(2):142–153, 2005.
938
1 0.9
N = 1000, TTL = 7, Ls = 3 d: Average node degree
Agent−based Gnutella
0.8 Average hit ratio on queries
=
5
10
10
d=6 0.7 0.6 0.5 0.4 0.3 d=6
0.2 d=9
d=3
0.1 0 10
20
30
40 50 60 70 Percentage of nodes deployed
80
90
100
Figure 3: Average query hit ratio (linear scale) vs. percentage of agents deployed.
25 N = 1000 nodes, TTL = 7, L s =3, d = 6 20
Mean node stress
=
6
10
1
m=1
P (C(χa(1,j,h ) ),C(χa(2,j,h ) )...|C(χj )=wc)×P (C(χj )=wc)
Gnutella Agent−based
10
Using Bayes’ rule and Equation 2 we can rewrite Equation 1 as: P (C(χj ) = wc | C(χa(1,j,h ) ), C(χa(2,j,h ) )....)
N = 1000, TTL = 7, Ls = 3 d: Average node degree
Adapt mechanism (Merugu, Srinivasan, & Zegura 2005) 15
10
5
Agent−based mechanism Gnutella protocol
0 10
20
30
40
50
60
70
80
90
100
Percentage of nodes deployed
Figure 4: Average link stress on each agent of the network (lower is better) for different mechanisms vs. percentage of agents deployed.
The Sixth Intl. Joint Conf. on Autonomous Agents and Multi-Agent Systems (AAMAS 07)