On studying P2P topology based on modified fuzzy adaptive resonance theory*
Yufeng Wang 1, Wendong Wang 2
1 Communications Engineering Department, Nanjing University of Posts and Telecommunications (NUPT), Nanjing 210000, CHINA
2 State Key Laboratory of Networking & Switching Technology, Beijing University of Posts and Telecommunications (BUPT), Beijing 100876, CHINA
[email protected]
Abstract. Given the vast and miscellaneous contents in P2P systems, an intelligent P2P network topology is required to route queries to a relevant subset of peers. Based on the incremental clustering capability of Fuzzy Adaptive Resonance Theory (Fuzzy ART), this paper uses a modified fuzzy ART to provide a small-world P2P construction mechanism, which not only categorizes peers so that all the peers in a cluster are semantically similar but, more importantly, organizes the P2P topology into a small-world network structure. In detail, the modified fuzzy ART net is used to cluster each peer into one or more appropriate categories according to its data interest, and a reverse selection mechanism in the modified fuzzy ART is provided to construct semantic long-range edges among clusters. Simulations demonstrate that a P2P small-world network emerges, i.e., a highly clustered network with small diameter, and that the information retrieval performance is significantly higher than in a random topology.
1 Introduction
Recently, there has been much research interest in emerging Peer-to-Peer (P2P) overlay networks because they provide a good substrate for large-scale data sharing, content distribution, etc. Research indicates that P2P content sharing has become very popular in the last few years and is nowadays the biggest consumer of Internet bandwidth [1]. Generally, there exist two classes of P2P overlay networks: unstructured and structured [2]. Structured means that the P2P overlay network topology is tightly controlled and that contents are placed not at random peers but at specified locations that make subsequent queries more efficient. Such structured P2P systems use a Distributed Hash Table (DHT) as a substrate to provide key-based lookup of data through mathematical functions. Although structured P2P networks can efficiently locate rare items since the key-based routing is scalable, they incur significantly higher overheads than unstructured P2P networks for popular content, and they do not provide efficient fuzzy keyword-based search (semantic-based search), which is more important in large-scale file sharing.
P2P systems are unstructured when the overlay topology is ad hoc and the placement of data is completely independent of the overlay topology. Such networks use flooding-like search mechanisms to send queries across the overlay with a limited scope. Flooding-based techniques are effective for locating highly replicated items and are resilient to peers joining and leaving the system, but they are poorly suited for locating rare items and are not scalable, as the load on each peer grows linearly with the total number of queries and the system size. Thus, intelligent network topologies are required to route queries to a relevant subset of peers in unstructured P2P systems.
Many artificial intelligence techniques, including neural networks and fuzzy inference methods, have recently been proposed to search contents in P2P networks. An intelligent P2P content-based document retrieval system, known as iSearch-P2P, was proposed in [3]; it incorporated an intelligent technique based on the Fuzzy ART neural network to perform document clustering in order to support content-based publishing and retrieval over P2P networks. This approach avoided the indexing and query flooding problems of most existing P2P systems and greatly improved scalability. However, in iSearch-P2P the P2P architecture was a static two-level hierarchy, which could not adapt to the highly dynamic P2P environment. Furthermore, each object was classified into exactly one category, whereas in fact one object may belong to several categories at the same time. In document clustering, Ref. [4] made use of a modified version of Fuzzy ART that enables a document to be in multiple clusters, with the number of clusters determined dynamically.
* Research supported by the NSFC Grants 60472067, JiangSu education bureau (5KJB510091) and State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications.
Semantic Overlay Networks (SONs) [5] used static profiles to organize P2P networks, improving search in P2P data-sharing networks by clustering peers with semantically similar contents. Each piece of data had to be assigned manually to a globally predefined classification hierarchy maintained by some authority, which broke the model of truly decentralized networks. The large size of P2P systems and the great number of contents in a highly dynamic P2P environment make such static classification unsuitable for large-scale applications. Ref. [6] proposed a model in which peers advertised their expertise in the P2P network. The knowledge about the expertise of other peers formed a semantic topology. Based on the semantic similarity between the subject of a query and the expertise of other peers, a peer could select appropriate peers to forward queries to, instead of broadcasting the query or sending it to a random set of peers. In many networks found in nature and society, a so-called small-world structure has been observed, namely a small average diameter and a high degree of clustering, which makes them effective and efficient in terms of spreading and finding information. Ref. [7] demonstrated how a small world of peers can organize itself in an ontology-based P2P knowledge management system, providing rewiring strategies to build a network in which peers with similar semantic interests form clusters. These strategies relied only on the local knowledge of each peer and on the notion of similarity derived from the relationships of entities in the ontology. Ref. [8] presented a new algorithm for content-oriented search in P2P networks, in which the P2P architecture was organized into a small-world network structure. This structure could then be exploited for implementing an efficient search algorithm: each peer
forwarded incoming queries to just one of its neighbors (the one whose document profile best matched the query). Semantic small world (SSW) [9] was proposed to cluster peers into a small-world overlay network according to their local data semantics. The main focus of this paper is to examine the use of fuzzy ART to classify (or organize) peers into clusters corresponding to their data interest, and to form the P2P architecture into a semantic small-world network. Our contribution lies in combining many of the above ideas in a way that is guided by the concept of fuzzy adaptive resonance theory and the small-world network model, to organize the P2P architecture into connected semantic clusters. Although Fuzzy ART is called “fuzzy” in the sense that it works with fuzzy data, it categorizes an object into exactly one cluster (i.e., it is a hard clustering algorithm); thus it cannot be used effectively for peer clustering, since a peer may belong to several semantic categories. A modified version of Fuzzy ART is used to enable a peer to be in multiple semantic clusters, and reverse selection in the modified Fuzzy ART is provided to form the connectivity among semantic clusters. The paper is organized as follows. A brief introduction to Fuzzy ART and the small-world network model is given in Section 2. Section 3 provides the modified fuzzy ART-based peer clustering algorithm and organizes the P2P architecture into a semantic small world based on reverse selection in soft fuzzy ART. In Section 4, the search performance of our architecture is simulated and evaluated using metrics from the small-world network model (average clustering coefficient and average shortest path length) and from information retrieval (recall ratio). Finally, a brief conclusion and future work are given in Section 5.
2 System architecture
To facilitate semantic-based search in P2P systems, peers are usually represented by a collection of attribute values that can be derived from the contents shared by those peers. In our intelligent P2P architecture, to support cooperation and mutual understanding among peers, we assume there is an ontological definition that defines the semantics of commonly used concepts and terminologies. The ontology could be defined using DAML-OIL (http://www.daml.org/). Each peer can be seen as a point in a multidimensional semantic space. A P2P overlay network designed for efficient semantic-based search should be constructed so that the peers are organized in accordance with their location in the semantic space. However, the shared contents in P2P systems are vast and miscellaneous and, more importantly, the peer churn rate (the rate of peers joining/leaving the P2P system) is very high, so it is imperative to study how to cluster peers in such highly dynamic systems and adapt to rapid changes of the P2P architecture. It has been shown that a self-organizing ART network is suitable for dynamic partnership seeking, so in this paper the Fuzzy ART network is used to conduct the dynamic cluster formation process.
ART neural networks were developed to address the stability-plasticity dilemma, which can be posed as follows: how can a learning system be designed to remain plastic (adaptive) and at the same time remain stable in the face of irrelevant events? ART networks solve this problem through an incremental algorithm. In detail, an ART network adapts to new inputs indefinitely while, at the same time, not letting new inputs change any stored pattern unless the input pattern matches the stored pattern within a certain tolerance. That is, new categories can be formed when the environment does not match any of the stored patterns, but the environment cannot change stored patterns unless they are sufficiently similar. However, basic ART works only with binary input patterns, which does not fit the P2P environment. So we use the fuzzy ART net model, which accepts “fuzzy binary” inputs (i.e., analog numbers between 0 and 1) and incorporates theories from fuzzy logic. The general structure of a fuzzy ART network is shown in Fig. 1.
Fig. 1 Fuzzy adaptive resonance theory
A typical ART network includes three layers. The layers F0, F1 and F2 are the input, comparison and recognition layers, respectively. The input layer F0 receives the attributes of the peers that need to be classified. Each peer advertises its capability in the comparison layer F1 for competence comparison. The nodes in F0 and F1 are composed of the entities of the ontology. The corresponding nodes of layers F0 and F1 are connected via one-to-one, non-modifiable links. Nodes in the recognition layer F2 are candidates for the semantic clusters. There are two distinct sets of connections between the layers: bottom-up (F1 to F2) and top-down (F2 to F1). In addition, a vigilance parameter ρ defines the “tolerance” used when comparing vectors. F2 is a competitive layer, which means that only the node with the largest activation becomes active and the other nodes remain inactive; each node in F2 corresponds to a category. Therefore, every node in F2 has its own unique top-down weight vector, also called the “prototype vector”, which is used to compare the input pattern to the prototypical pattern associated with the category for which the node stands.
However, the traditional fuzzy ART mechanism has the following disadvantages. First, fuzzy clustering algorithms are slow compared to non-fuzzy algorithms: they tend to be iterative, which requires repeatedly calculating the associations between every cluster/peer pair. Second, a single peer very often contains multiple themes. For example, the shared contents on the author’s computer relate to two fields: basketball and computer science. The traditional fuzzy ART clustering algorithm mentioned above assigns each peer to a single cluster. So, for the problem of constructing the P2P topology, this paper uses the soft fuzzy architecture to classify peers into several appropriate clusters.
In a social system, people tend to be surrounded mostly by people who are similar to themselves in some sense. On the other hand, if people are related only to very similar people, this leads to so-called “caveman worlds”, i.e., cliques that are disconnected from each other. In practice, however, many people maintain relationships with people from different professions, geographical locations, etc.; such relationships are called long-range edges. The topology of a social network graph, where nodes are persons and edges are the acquaintances connecting them, is called a “small world”. This phenomenon can be found not only in social networks but also in a great number of other self-organizing systems. Watts and Strogatz [10] describe the clustering coefficient and the characteristic path length as indicators of small-world networks (that is, a short average distance and a large clustering coefficient). The small-world network structure can be exploited to implement an efficient search algorithm in a P2P system, and we need algorithms that mimic the behavior of a social system so as to organize the P2P architecture into the small-world model. Thus, this paper uses the modified fuzzy ART net to form clusters of peers with similar interests, and provides the reverse selection mechanism in the modified fuzzy ART to form semantic long-range edges among clusters, so that the P2P architecture can be constructed into a small-world network.
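The two small-world indicators mentioned above can be illustrated with a minimal sketch. It assumes an unweighted, undirected graph stored as a dictionary of neighbor sets; the function names and the toy graph are hypothetical and are not taken from the paper, which reports a weighted clustering coefficient in its experiments.

```python
from collections import deque

def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph.

    adj maps each node to the set of its neighbours.
    """
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        # count the edges that exist among the neighbours of this node
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def average_shortest_path(adj):
    """Mean hop count over all ordered pairs of distinct reachable nodes (BFS)."""
    dist_sum, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs if pairs else 0.0

# Toy graph: two triangles (0,1,2) and (3,4,5) joined by one long-range edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {n: set() for n in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

cc = clustering_coefficient(adj)
apl = average_shortest_path(adj)
```

In this toy graph the single edge between the two cliques plays the role of a long-range edge: without it the cliques would form a disconnected "caveman world".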
3 P2P architecture based on modified fuzzy ART
Before describing our small-world P2P architecture based on modified fuzzy ART, we introduce several concepts.
Peer profile: The P2P network consists of a set of peers. Every peer has a knowledge base that contains the knowledge it wants to share. This knowledge is condensed into a peer profile, which is a summarization of the peer’s contents. How profiles are calculated is explained later in the paper.
Peer neighbor set: A peer stores the addresses of other peers together with their profiles. These sets of neighbors represent the connections between peers and thus define the P2P network topology.
Common ontology: We assume that in our P2P architecture, peers operate on knowledge represented in terms of a common ontology, which provides a common conceptualization of their domains.
Each peer maintains a modified fuzzy ART neural net to run the intelligent P2P topology construction algorithm, which can be broadly divided into three stages: pre-processing, cluster building & small-world network formation, and topology adaptation.
3.1 Pre-processing
While there are many traditional clustering algorithms available, peer clustering brings along distinctive issues of its own. One such issue is representation. A peer is typically represented as a profile, where each dimension corresponds to a term (word) and the value indicates the relative importance of the corresponding term for the peer. In detail, each peer is represented as a vector $\vec{I} = [\omega_1, \ldots, \omega_n]$, where $n$ is the size of the peer profile vector and $\omega_i$ is the term frequency, which indicates the number of times term $i$ occurs in the peer. We assume the vector to be normalized according to a sum norm, i.e., $\sum_{i=1}^{n} \omega_i = 1$.
3.2 Clustering building & small-world network formation
Each peer maintains a personal semantic shortcut index, which defines the virtual topology of the P2P architecture. Two strategies are used to create and maintain the semantic shortcut index in a highly dynamic P2P setting.
Clustering strategy: Although Fuzzy ART is called “fuzzy” in the sense that it works with fuzzy data, it is a hard clustering algorithm, which assigns an object to one specific cluster. The basic fuzzy ART model therefore does not fit the P2P environment, in which a peer may belong to several categories. So, a modified version of Fuzzy ART is used for soft peer cluster building. Instead of choosing the maximum-similarity category and applying the vigilance test to check whether it is close enough to the input pattern, we check every category in the F2 layer and apply the vigilance test to each. If a category passes the vigilance test, the peer is put into that category. The similarity measure computed in the vigilance test defines the degree of membership of the given input pattern in the current cluster. Each peer maintains a modified fuzzy ART neural net to run the intelligent P2P topology construction algorithm, which takes two input parameters: the vigilance parameter ρ (0 ≤ ρ ≤ 1) and the learning rate λ (0 ≤ λ ≤ 1). The detailed steps are as follows:
Step 1: Initialization: initialize all the parameters; that is, each peer calculates its peer profile according to the pre-processing stage.
Step 2: Apply input peer profile: let $\vec{I}$ be the next input peer profile vector, and let P denote the set of candidate prototype vectors (semantic categories), which represent the features of the categories known to the peer. Initially, the set P contains only the current peer’s own profile, since the current peer has no knowledge about other semantic categories.
Step 3: Vigilance test: each prototype vector undergoes a vigilance test that compares the similarity between the prototype and the current input peer profile. Let
$$sim[i] = \frac{|\vec{I} \wedge \vec{P}_i|}{|\vec{I}|},$$
where $\vec{I} \wedge \vec{P}_i$ is the vector whose $k$th component is the minimum of the $k$th components of $\vec{I}$ and $\vec{P}_i$, and $|\cdot|$ is the norm of a vector, defined as the sum of its components. The vector sim records the similarity between each prototype and the current input peer profile and is sorted in decreasing order. All prototype vectors that pass the vigilance test (that is, sim[i] > ρ) are adapted to the given input peer profile (Step 4). Note that each prototype vector represents the semantics of a specific cluster. This mechanism makes the updated prototype vector accommodate the features of the new peer; that is, the updated prototype vector represents the features of the whole cluster (the original cluster plus the new peer). If no prototype passes the test, a new prototype (cluster) is created. Go to Step 2 to continue with the next input peer profile.
Step 4: Matched prototype update: each matched prototype is updated to move closer to the current input peer profile according to
$$\vec{P}_i = \lambda (\vec{I} \wedge \vec{P}_i) + (1 - \lambda) \vec{P}_i,$$
where λ is the learning rate. If λ is 1, this is called fast learning. After the update, all the prototypes are reactivated and the algorithm continues with the next input peer profile (Step 2).
Our modified Fuzzy ART has the following advantages. First, it avoids iterative search because every F2 node is checked, which makes it computationally less expensive. Second, by eliminating the category choice step, it reduces the number of user-defined parameters in the system.
Inter-cluster strategy: To construct an overlay with small-world network properties, each peer maintains a set of short-range contacts pointing to peers in the same semantic cluster and a certain number of long-range contacts to other semantic clusters. Inspired by the small-world formation mechanism provided by Kleinberg [11], the long-range contacts are obtained by choosing peer categories in reverse order in the above modified fuzzy ART algorithm (that is, by selecting new neighbors whose profiles are least similar to the current peer). The detailed process is as follows: in Step 3, select several categories from the sorted vector sim in bottom-up order (that is, the selected categories are semantically far away from the current peer), and connect to the corresponding categories according to the distribution $C \cdot sim[j]^2$, where sim[j] represents the similarity between the current peer and the selected category j, and C is a normalization constant that brings the total probability to 1. In this way, our approach makes use of the modified fuzzy ART to construct the semantic long-range edges in the P2P topology.
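Steps 2 to 4 of the clustering strategy can be sketched as a single function that presents one peer profile to the set of prototypes. This is only an illustrative reading of the steps, not the authors' implementation; the function names, the three-topic toy profiles and the parameter values (ρ = 0.4, λ = 0.5) are assumptions chosen to keep the arithmetic easy to follow.

```python
def fuzzy_and(a, b):
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(x, y) for x, y in zip(a, b)]

def norm(v):
    """Sum norm: the sum of the components."""
    return sum(v)

def soft_fuzzy_art_step(profile, prototypes, rho, lam):
    """Present one peer profile; return (matched cluster indices, similarities).

    Every prototype is tested (there is no winner-take-all choice). All
    prototypes with similarity above rho are moved towards the profile;
    if none passes, the profile itself becomes a new prototype.
    """
    sims = [norm(fuzzy_and(profile, p)) / norm(profile) for p in prototypes]
    matched = [i for i, s in enumerate(sims) if s > rho]
    if not matched:
        prototypes.append(list(profile))  # create a new cluster
        return [len(prototypes) - 1], sims
    for i in matched:
        p = prototypes[i]
        m = fuzzy_and(profile, p)
        prototypes[i] = [lam * mi + (1 - lam) * pi for mi, pi in zip(m, p)]
    return matched, sims

# Hypothetical topics: [basketball, computer science, music].
protos = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# A peer with two themes joins both existing clusters.
matched, sims = soft_fuzzy_art_step([0.5, 0.5, 0.0], protos, rho=0.4, lam=0.5)

# A peer interested only in music matches nothing and opens a new cluster.
matched_new, _ = soft_fuzzy_art_step([0.0, 0.0, 1.0], protos, rho=0.4, lam=0.5)
```

With λ = 1 (fast learning) the update would replace each matched prototype by the component-wise minimum; smaller values of λ move the prototype only part of the way.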
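The inter-cluster strategy can be sketched as follows, reading the distribution as proportional to the squared similarity. Several details are assumptions made only for this illustration: the size of the candidate window taken from the least similar end of sim, sampling without replacement, and the uniform fallback when all candidate similarities are zero. The function names are hypothetical.

```python
import random

def long_range_probabilities(sims, candidates):
    """P(j) = C * sims[j]**2 over the candidates, with C normalising the sum to 1."""
    weights = [sims[j] ** 2 for j in candidates]
    total = sum(weights)
    if total == 0:
        # assumption: fall back to a uniform choice if every similarity is zero
        return [1.0 / len(candidates)] * len(candidates)
    return [w / total for w in weights]

def pick_long_range(sims, n_candidates, n_links, rng):
    """Choose long-range categories from the least similar end of sims."""
    order = sorted(range(len(sims)), key=lambda j: sims[j])  # ascending similarity
    candidates = order[:n_candidates]
    chosen = []
    while candidates and len(chosen) < n_links:
        probs = long_range_probabilities(sims, candidates)
        j = rng.choices(candidates, weights=probs, k=1)[0]
        chosen.append(j)
        candidates.remove(j)  # sample without replacement
    return chosen

# Similarity of the current peer to five known categories (toy values).
sims = [0.9, 0.1, 0.3, 0.8, 0.2]
far = sorted(range(len(sims)), key=lambda j: sims[j])[:3]
probs = long_range_probabilities(sims, far)
links = pick_long_range(sims, n_candidates=3, n_links=2, rng=random.Random(0))
```

Among the semantically distant candidates, the comparatively closer ones are still favored, which is the analogue of Kleinberg's distance-dependent link probability.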
3.3 Query processing and Topology adaptation
When a peer receives a query, the query profile vector is summarized in the same way as a peer profile. The query profile is then input into the modified fuzzy ART model to recognize the cluster that matches it. If the query belongs to the semantic cluster in which the current peer is located (that is, the similarity between the query profile and the current peer’s cluster exceeds the vigilance parameter), the query results are returned from that semantic cluster. Otherwise, if the query profile belongs to another category that the current peer knows about, the query is forwarded to the semantic cluster corresponding to its interest. In the worst case, if the query profile cannot be classified into any category (because the modified fuzzy ART has not yet recognized this category), a new category is formed. The query is then forwarded to several semantic long-range neighbors according to a similarity measure defined as the simple scalar product between the query profile vector $\vec{q}$ and the category profile vector $\vec{p}$. Note that the fuzzy weights are not updated during query processing.
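The three-way routing decision described above can be sketched as a small function. The data layout (dictionaries of category prototypes), the action labels and the fanout parameter are hypothetical choices for this illustration; the formation of a new category in the worst case is omitted, and no prototype is modified, in line with the remark that fuzzy weights are not updated during query processing.

```python
def similarity(query, prototype):
    """Vigilance-test similarity |q AND p| / |q| with the sum norm."""
    return sum(min(a, b) for a, b in zip(query, prototype)) / sum(query)

def dot(q, p):
    """Scalar product used to rank long-range neighbours."""
    return sum(a * b for a, b in zip(q, p))

def route_query(query, own_cluster, known, long_range, rho, fanout):
    """Return (action, target cluster ids) for one query profile.

    own_cluster: (cluster_id, prototype) of the local peer's cluster
    known:       cluster_id -> prototype for other categories the peer knows
    long_range:  cluster_id -> prototype reachable over long-range links
    """
    if similarity(query, own_cluster[1]) > rho:
        return "answer_locally", [own_cluster[0]]
    matches = [c for c, p in known.items() if similarity(query, p) > rho]
    if matches:
        return "forward_to_cluster", matches
    ranked = sorted(long_range, key=lambda c: dot(query, long_range[c]), reverse=True)
    return "forward_long_range", ranked[:fanout]

own = ("sports", [1.0, 0.0, 0.0])
known = {"cs": [0.0, 1.0, 0.0]}
long_range = {"music": [0.0, 0.0, 1.0], "cs": [0.0, 1.0, 0.0]}

r1 = route_query([0.8, 0.2, 0.0], own, known, long_range, rho=0.7, fanout=1)
r2 = route_query([0.1, 0.9, 0.0], own, known, long_range, rho=0.7, fanout=1)
r3 = route_query([0.4, 0.2, 0.4], own, known, long_range, rho=0.7, fanout=1)
```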
For the highly dynamic P2P environment, this paper uses a gossiping mechanism to propagate and learn changes: each peer randomly selects another peer with which to exchange neighborhood views, and uses the modified fuzzy ART net to update the topology structure. Peers also inspect any passing queries that they have not issued themselves. Because storage space is limited, cached files and routing table entries may have to be replaced occasionally, which is done using an LRU (least recently used) strategy. Generally, category (and peer) profile attributes are of two types: one is a feature vector, in which each element represents an ability of the category (peer); the other consists of attributes assessed by others who have been in contact with the category (peer). Ideally, the attributes of a category (peer) combine these two types of features. That is, the modified fuzzy ART weights of a semantic category should be updated according to the evaluations of the peers that contacted this category in the past, and the semantic small-world network should then be updated based on the changed fuzzy ART weights. The processing of the small-world P2P architecture based on modified fuzzy ART is shown in Fig. 2.
Fig. 2 Architecture of P2P topology construction based on modified ART
4 Simulation results
In our experiment, some simplifying assumptions had to be made in order to reduce the complexity of the problem. The first simplification concerns the peer profile: instead of working with a real peer knowledge base, we assume peer profile vectors to consist of semantic categories. That is, we presume that each peer can be classified according to the topics it covers and, for our simulation, we assume this classification to be available for all peers. This means that each peer is represented by a category vector $\vec{I} = [\omega_1, \ldots, \omega_n]$, with the weight $\omega_i$ indicating how important topic $i$ is for this peer. Ref. [12] suggested that the number of files per peer is significantly skewed in typical P2P networks: a few peers hold a large number of documents, whereas the majority of peers share few files. So, we implemented a Zipf distribution for the number of topics per peer: most peer profiles consist of just one category, whereas a few peer profiles cover many categories. In our experiments,
we used 2000 peers, each of which was allowed a routing index of size 20, keeping 14 “inner-cluster” neighbors and 6 “inter-cluster” ones. The query TTL was set to 5, and the vigilance parameter was set to 0.7. We used the free software package PeerSim 1.0 to perform the simulation. The simulator is component-based, which makes it easy to reach extreme scalability and to support dynamism. Each simulation experiment consisted of the following sequence of operations. We created the peers with their profiles according to the peer profile distribution and arranged them in a random network topology in which every peer knew 20 random peers. We did not make any further assumptions about the network topology. This random graph serves as a benchmark for our modified fuzzy ART-based intelligent P2P topology construction mechanism; that is, we examine the topology characteristics obtained through our approach and evaluate which of the two graphs allows a better search. As in Ref. [7], this paper uses the weighted clustering coefficient and the characteristic path length (average shortest path length) to measure the emergence of the P2P small-world network.
Table 1. Characteristics of formed P2P topology
                                    Modified fuzzy ART-based topology        Random topology
(Weighted) clustering coefficient   0.35 (without weight), 0.21 (weighted)   0.01
Average shortest path               4.1                                      3.4
The (weighted) clustering coefficient and the average shortest path length of the stabilized P2P topology are given in Table 1. The results show that a small-world structure emerged: paths were short (although slightly longer than the average shortest path in the random graph) and the clustering coefficient was significantly higher than its counterpart in the random graph. To evaluate search performance in the P2P system, we used the recall measure known from classical information retrieval (for a given query, this measure indicates how many of the peers that have relevant information are reached), defined as follows:
$$Recall_{peer} = \frac{\#\,\text{peers found based on our approach in P2P network}}{\#\,\text{peers in whole P2P network}}$$
Fig. 3 Recall ratio vs. TTL in two types of network topology (vertical axis: recall ratio; horizontal axis: TTL from 1 to 5; curves: modified fuzzy ART-based topology and random topology)
From Fig. 3, we can see that the recall in the modified fuzzy ART-based P2P topology is significantly higher than in the random graph. The main reason is that our approach organizes the P2P topology into a small-world pattern, in which peers with similar contents are clustered into semantic categories and long-range edges are constructed to maintain connectivity in the P2P system. So, a query can be answered by the category in which the querying peer is located, or by other categories that can be reached within a few hops over the long-range edges among categories. Note that the simulation was run for a certain number of cycles; in each cycle, every peer, which maintains a modified fuzzy ART neural net, runs the intelligent P2P topology construction algorithm, using the gossiping protocol to exchange its view of semantic categories with a randomly selected peer. The communication overhead and computational complexity may therefore be considerably larger than those of other P2P topology construction mechanisms, which should be investigated further in future work.
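How recall depends on the TTL can be illustrated with a minimal sketch. It assumes plain TTL-limited flooding over an adjacency list and reads the recall measure as the fraction of relevant peers that are reached; both are simplifications made for this example and do not reproduce the routing used in the simulation. The function name and the toy line topology are hypothetical.

```python
from collections import deque

def recall_with_ttl(adj, source, relevant, ttl):
    """Fraction of relevant peers reached from source within ttl hops."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if hops[u] == ttl:
            continue  # the query is not forwarded beyond its TTL
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    reached = sum(1 for p in relevant if p in hops)
    return reached / len(relevant)

# Toy line topology 0 - 1 - 2 - 3 - 4 with relevant content on peers 2 and 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
relevant = {2, 4}
recalls = [recall_with_ttl(adj, 0, relevant, ttl) for ttl in (1, 2, 4)]
```

On a topology with short paths to the relevant clusters, the same TTL reaches a larger share of the relevant peers, which is the effect Fig. 3 reports.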
5 Conclusion
Traditional unstructured P2P systems select peer neighborhoods randomly and use flooding-like search mechanisms to forward query messages, which does not suit the features of P2P systems: large size, vast and miscellaneous contents, high churn rate, etc. Thus, an intelligent network topology is required to route queries to a relevant subset of peers. In this paper we provided small-world P2P construction mechanisms based on a modified Fuzzy ART. The most important feature of the Fuzzy ART neural network is its incremental clustering capability, which is very effective in dynamic P2P systems. Considering that traditional fuzzy ART clustering algorithms assign each peer to a single cluster, this paper used the soft fuzzy architecture to classify peers into one or more appropriate clusters or to form a new semantic category. Inspired by small-world formation methods, a reverse selection mechanism in fuzzy ART was provided to form the long-range edges among semantic clusters. In detail, our intelligent P2P topology construction algorithm is composed of three stages: pre-processing; cluster building & small-world network formation (including the clustering strategy to form the semantic clusters and the inter-cluster strategy to form the long-range edges in the P2P topology); and topology adaptation using the gossiping mechanism. Measured by the characteristics of the small-world model and by a metric from information retrieval, the simulation results showed that a small-world P2P network emerged and that the recall ratio of our approach was significantly higher than in a random P2P topology. To our knowledge, this is the first time a fuzzy ART neural network has been used to construct a P2P topology, but our work is very preliminary and can be improved in the following aspects:
The approach assumes that the same ontology is used in all peer profiles and works only with hypothetical semantic categories, which may not be realistic in existing P2P networks. In real P2P systems, the problems of emergent ontologies, ontology alignment and mapping would have to be solved.
The approach only briefly mentions that the modified fuzzy ART weights of a semantic category should be updated according to the evaluations of the peers that contacted this category in the past, which in turn updates the semantic small-world network. Further research will realize this idea and evaluate its effect on the P2P topology and search performance.
References
1. Saroiu S., Gummadi K. P., Dunn R., Gribble S. D., Levy H. M.: An Analysis of Internet Content Delivery Systems. In Proc. of OSDI’02, Dec. 2002
2. Eng Keong Lua, Jon Crowcroft, Marcelo Pias, Ravi Sharma, Steven Lim: A Survey and Comparison of Peer-to-Peer Overlay Network Schemes. IEEE Communications Survey and Tutorial, March 2004
3. Rodionov Maxim, Siu Cheung Hui: Intelligent Content-Based Retrieval for P2P Networks. Proceedings of the 2003 International Conference on Cyberworlds (CW’03)
4. Ravikumar Kondadadi, Robert Kozma: A Modified Fuzzy ART for Soft Document Clustering. International Joint Conference on Neural Networks, World Congress on Computational Intelligence, 2002
5. Arturo Crespo, Hector Garcia-Molina: Semantic Overlay Networks for P2P Systems. Available at: http://www-db.stanford.edu/~crespo/publications/op2p.ps
6. Haase P., Siebes R., Harmelen F. V.: Peer Selection in Peer-to-Peer Networks with Semantic Topologies. Proceedings of the International Conference on Semantics in a Networked World (ICNSW’04), LNCS 3226
7. Christoph Schmitz: Self-organization of a Small World by Topic. Proceedings of 1st International Workshop on Peer-to-Peer Knowledge Management, 2005
8. Witschel H. F.: Content-oriented Topology Restructuring for Search in P2P Networks. Available at: http://www.sempir.informatik.uni-leipzig.de/VeroeffentlichungenDateien/simulation.pdf
9. Li Mei, Lee Wang-Chien, Sivasubramaniam: Semantic Small World: An Overlay Network for Peer-to-Peer Search. Proceedings of the 12th IEEE International Conference on Network Protocols (ICNP’04)
10. Watts D. J., Strogatz S.: Collective Dynamics of ‘Small-World’ Networks. Nature 393 (1998), pp. 440–442
11. Kleinberg J.: The Small-World Phenomenon: An Algorithmic Perspective. In Proceedings of the 32nd ACM Symposium on Theory of Computing, 2000
12. Saroiu S., Gummadi P., Gribble S.: A Measurement Study of Peer-to-Peer File Sharing Systems. In Proceedings of Multimedia Computing and Networking, 2002