Ontology-Driven Peer Profiling in Peer-to-Peer Enabled Semantic Web

Olena Parkhomenko
School of Computing & Engineering
University of Missouri-Kansas City
[email protected]

Yugyung Lee
School of Computing & Engineering
University of Missouri-Kansas City
[email protected]

E. K. Park
School of Computing & Engineering
University of Missouri-Kansas City
[email protected]

ABSTRACT
Peer-to-peer (P2P) systems and the Semantic Web are two novel technologies that face many shortcomings when considered as isolated paradigms. We present an approach that uses ontologies to set up a peer profile containing all the data necessary for peer-to-peer interoperability. Using this profile can help eliminate some major issues persistent in current P2P networks, such as security, resource aggregation, and group management. We also consider applications of peer profiling for a Semantic Web built on P2P networks, such as an improved semantic search for resources that are not explicitly published on the Web but are available in a P2P system. We develop the ontology-based peer profile in RDF format and demonstrate its manifold benefits for peer communication and knowledge discovery in both P2P networks and the Semantic Web.

Categories & Subject Descriptors: H.3.3 Information Search and Retrieval
General Terms: Management, Design, Standardization
Keywords: P2P, Semantic Web, ontology, profiling, peer profile

1. Introduction
There has been a lot of interest recently in P2P technology, since it promises improved scalability by eliminating centralized dependencies, allows resources to be aggregated while maintaining low-cost interoperability, and eliminates the need for an expensive infrastructure by distributing the costs of maintenance among clients/peers. While the benefits of P2P are obvious, there are still challenges to face, one of them being interoperability. Quite a few P2P systems already exist, but there is still no support to enable these P2P systems to interoperate [7]. Another novel technology has developed along with P2P systems: the Semantic Web. Ontologies are one of the key enabling technologies for the Semantic Web, because they enable a shared and common understanding of a domain that can be communicated between people and application systems [4]. The approach proposed in this paper uses ontologies as the peer information exchange format from which dynamic peer profiles, containing all the data necessary for peer-to-peer interoperability, are constructed. In other words, ontologies are used to solve the interoperability issue in P2P networks. Since our approach considers a Semantic Web based on peer-to-peer networks, the ontology-driven peer discovery algorithm can also enable an improved semantic search, by providing access to data/files not explicitly listed on the Web, and offers an alternative view on the possibilities of dynamic web services composition based on a peer-to-peer communication model. In this paper we propose a draft of a peer profile based on a peer ontology, in RDF format, and demonstrate its manifold benefits for peer communication and knowledge discovery in both P2P networks and the Semantic Web.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CIKM’03, November 3–8, 2003, New Orleans, Louisiana, USA.
Copyright 2003 ACM 1-58113-723-0/03/0011…$5.00.

2. Related Work
There has been a lot of research in the areas of both P2P networking and the Semantic Web. In [9] the authors introduced an ontology engineering framework showing that ontologies can be successfully used to resolve generic interoperability issues and to represent knowledge. A model for dynamic composition of data and services, as well as for agent communication to retrieve data on the Semantic Web, showed that ontologies can be successfully adopted to express various kinds of requirements [5]. One step further is taken in [1] with an approach to querying for semantic relationships between entities on the Semantic Web, based on general domain-independent characteristics. A P2P network can be modeled similarly to demonstrate the possibility of effectively querying relationships in a P2P community of interest. The concept of a community of interest on the Semantic Web, supported by ontology-based knowledge representation, was further developed and implemented in [10]. A conceptual ontology-based architecture for Semantic Web enabled web services was proposed in [3]. The underlying principles of this framework, decoupling and a scalable mediation service based on peer-to-peer communication, served as a stepping stone for our approach. The critical need in many domains to retrieve on-line services that provide useful behavior was addressed in [6]. Among the practical implementations combining P2P and the Semantic Web is SELF-SERV [2], a system through which existing web services can be declaratively composed and the resulting composite services executed based on the P2P paradigm. An interesting approach to a distributed discovery service built on a peer-to-peer infrastructure was discussed in [11]. It was developed as an alternative to the existing centralized web service registry, UDDI, which cannot manage models or schemas of metadata. It can potentially let a peer-to-peer network not only enable service discovery, but also handle service binding and execute the services on behalf of the query originator.
3. Proposed solution in a nutshell
As current research and implementations suggest, both P2P systems and the Semantic Web have to adapt to a highly dynamic environment. An effective data and service retrieval approach should take into account the dynamic composition of a peer-to-peer community of interest formed to complete a task that involves the dynamic retrieval and use of web services. To bring out the full potential of peer-to-peer networks, it is important to understand and explore the social interactions between peers [7]. This can be achieved with an ontology-based peer profile that provides a universal data format for peer classification and management. Using ontologies for peer profiling provides the following benefits [8]:
• Ability to share a common understanding of the structure of information among both peers and software agents
• Re-use of domain knowledge
• Explicit domain assumptions
• Separation of domain knowledge from operational knowledge
• Analysis of domain knowledge
Figure 1 shows an example scenario of using ontology-driven peer profiles to determine a virtual peer community that hosts several web services used to buy a travel package. A machine that needs to discover its peers on the network, be it for service retrieval or information search, can send a flooding request (broadcast) to directly connected peers. The response can be negative, positive, or a reference (IP address) to a peer that can potentially be part of the necessary group. Communities of interest can be as small as two peers and as big as the whole Web. They can overlap, and some peers can belong to several communities based on their profiles. Ontology-driven peer profiling provides the following advantages:
• Resolving peer-to-peer interoperability issues
• Dynamic peer discovery
• Dynamic web services composition
• Addressing privacy and security issues (a peer can keep different copies of its profile for different requests)
• Searching for data and web services, as well as retrieving data not explicitly posted on the Web (file exchange)
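The membership exchange described above (a broadcast to directly connected peers, answered with a negative response, a positive response, or a reference to another peer) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' protocol: the `PeerProfile` fields, the `respond` and `broadcast` helpers, and the matching rule (a peer answers positively when its profile lists the requested category) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Reply(Enum):
    NEGATIVE = "negative"
    POSITIVE = "positive"
    REFERENCE = "reference"  # "ask this other peer instead"


@dataclass
class PeerProfile:
    # Hypothetical, heavily simplified profile: the peer's address, the
    # resource categories it offers, and the categories of peers it knows.
    address: str
    categories: set[str]
    known_peers: dict[str, set[str]] = field(default_factory=dict)


def respond(profile: PeerProfile, category: str) -> tuple[Reply, str | None]:
    """Answer a membership request for `category` from the local profile only."""
    if category in profile.categories:
        return Reply.POSITIVE, profile.address
    for address, cats in profile.known_peers.items():
        if category in cats:
            return Reply.REFERENCE, address  # refer requester to a known member
    return Reply.NEGATIVE, None


def broadcast(neighbors: list[PeerProfile], category: str) -> tuple[set[str], set[str]]:
    """Send the request to every directly connected peer; collect members and referrals."""
    members: set[str] = set()
    referrals: set[str] = set()
    for peer in neighbors:
        reply, address = respond(peer, category)
        if reply is Reply.POSITIVE:
            members.add(address)
        elif reply is Reply.REFERENCE:
            referrals.add(address)
    return members, referrals


neighbors = [
    PeerProfile("10.0.0.2", {"airline"}),
    PeerProfile("10.0.0.3", {"library"}, known_peers={"10.0.0.9": {"hotel"}}),
    PeerProfile("10.0.0.4", {"library"}),
]
print(broadcast(neighbors, "airline"))  # ({'10.0.0.2'}, set())
print(broadcast(neighbors, "hotel"))    # (set(), {'10.0.0.9'})
```

A referral lets the requester reach a peer outside its direct neighborhood (here 10.0.0.9) in a follow-up round; a real deployment would also need duplicate and loop suppression, which this sketch omits.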
Figure 1. The Ontology-Based Community Group (panels: Travel Community, Library Community; legend: broadcast, established P2P community of interest connection, negative membership response, positive membership response)

4. Conceptual model
The proposed model is designed to build smoothly into the existing peer-to-peer infrastructure, enhance it, and provide a way of integrating with the Semantic Web architecture. Peer networks are very dynamic in nature: not only do they connect a wide spectrum of communication paradigms, ranging from desktop machines to various mobile and wireless devices, but peer groups on a P2P network also change frequently, since devices go on and off at unpredictable times. The challenge here is to provide application-level connectivity and task processing. Information and service retrieval on the Semantic Web faces similar issues, since thousands of new resources appear and evolve dynamically. Thus, the proposed ontology-based peer profile will enhance peer communication mechanisms, and its metadata representation will provide a bridge for semantic search and discovery of information and services, both hosted on peers and explicitly published on the Web.

Security is by far one of the main components, and one of the biggest challenges, of a robust P2P system. The reason lies in a benefit provided by P2P: a peer community member can act as both a client and a server, and as such it needs mechanisms to handle the risks associated with unauthorized data access, since only authenticated and trusted information or service requests should be granted. In existing P2P systems, security either requires potentially cumbersome intervention from the user or interaction with a trusted third party [7]. The metadata peer profile proposed here lets a peer maintain, access, and update security and trust data in its profile, which can be stored both locally and centrally. The profile can be used to handle digital rights management, peer reputation, trust, and accountability.

The ability to aggregate the resources of interacting nodes is one of the crucial requirements of a P2P system. Resources can range from files, services, and other content residing on a node to bandwidth, disk space, and CPU processing power. Because resources are so diverse, the architecture of resource aggregation is difficult both to classify and to implement. In this respect, the ontology-based profile is a handy tool: not only does it provide detailed information about the resources available on each peer, but it also delivers input and output parameters and, where applicable, other aggregation-specific information, which should significantly simplify resource aggregation.

Peer group management includes discovery of other peers in the community, and location and routing between those peers [7]. Peer discovery algorithms in practice depend on the network topology, and can therefore range from highly centralized to highly distributed. The factors that influence this diversity include, for example, the nature of the peers, since wireless or mobile devices depend on their own communication range. The existing peer discovery algorithms are in essence location and routing algorithms that can be classified into three major models [7]: the centralized directory model, the flooded requests model, and the document routing model. While the centralized directory model can be quite efficient for some specific P2P systems, such as Napster, in this paper we concentrate on pure P2P models with no prior advertisement of shared resources. Table 1 shows a comparative analysis of the P2P models and the corresponding ontology-based peer profile solutions.

The document routing model looks very promising for an integrated P2P and Semantic Web solution, since it has proved to be very efficient for large, global communities. This model is implemented by assigning a random ID to each peer on the network and keeping a registry of the known peers. When a resource is shared or a document is published on such a network, the document is assigned an ID based on a hash of its contents and its name [7]. The document is then routed from peer to peer, each keeping a local copy, toward the peer whose ID is most similar to the document's ID. When a peer later requests the document, the request is likewise forwarded toward the peer whose ID is most similar to the document's ID, and the process ends when a copy of the document is found. The ontology-driven peer profile proposed here builds easily on this model by keeping a record of the known-peers registry, the current peer ID, and detailed information on all documents and resources available from or known to a given peer. The peer ontology adds semantic accuracy when searching for any type of answer on the P2P network as well as on the Semantic Web.
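The document routing steps just described can be illustrated with a small single-process simulation. It is a sketch under our own assumptions, not the paper's algorithm: IDs are 32-bit integers taken from a SHA-1 hash of the name and contents, "most similar" means smallest absolute numeric distance, and each peer forwards greedily to whichever known peer is closest to the document ID, keeping a local copy as the document passes through. The names `doc_id`, `Peer`, `publish`, `request` and `ID_BITS` are hypothetical; deployed systems define their own ID spaces and similarity metrics.

```python
from __future__ import annotations

import hashlib

ID_BITS = 32  # assumed size of the identifier space


def doc_id(name: str, contents: bytes) -> int:
    """Document ID from a hash of the document's name and contents (SHA-1, truncated)."""
    digest = hashlib.sha1(name.encode("utf-8") + contents).digest()
    return int.from_bytes(digest, "big") >> (160 - ID_BITS)


class Peer:
    def __init__(self, peer_id: int) -> None:
        self.peer_id = peer_id
        self.neighbors: list[Peer] = []    # registry of known peers
        self.store: dict[int, bytes] = {}  # local copies keyed by document ID

    def closest(self, target: int) -> Peer:
        # Assumed similarity: absolute numeric distance. On ties `min` keeps
        # the first candidate (self), so routing only moves to strictly closer peers.
        return min([self, *self.neighbors], key=lambda p: abs(p.peer_id - target))

    def publish(self, target: int, data: bytes) -> Peer:
        """Keep a local copy, then pass the document on toward the most similar ID."""
        self.store[target] = data
        nxt = self.closest(target)
        return self if nxt is self else nxt.publish(target, data)

    def request(self, target: int) -> bytes | None:
        """Forward a request toward the most similar ID until a copy is found."""
        if target in self.store:
            return self.store[target]
        nxt = self.closest(target)
        return None if nxt is self else nxt.request(target)


# Four peers in a line; in practice the IDs would be chosen at random.
peers = [Peer(i) for i in (0x10000000, 0x50000000, 0x90000000, 0xD0000000)]
for a, b in zip(peers, peers[1:]):
    a.neighbors.append(b)
    b.neighbors.append(a)

holder = peers[0].publish(0x88000000, b"profile paper")
print(hex(holder.peer_id))           # 0x90000000
print(peers[3].request(0x88000000))  # b'profile paper'
```

Because routing is greedy, it stops at a local minimum; with a sparse registry of known peers that may not be the globally closest peer, which is one reason the replication listed for this model in Table 1 matters.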
While the proposed ontology-based peer profile clearly has the potential to provide the missing link between P2P and the Semantic Web and to enhance both technologies, it still faces the challenges related to ontology use:
• Peer ontology update: In a P2P environment we cannot expect any ontology maintenance, since users often do not even know what is in the ontologies on their machines [4]. The peer profile ontology is no exception. The problem is that every single peer must use the same profile structure, which in turn relies on the peer ontology. One possible solution is to agree on a standard structure for the peer ontology: if it is defined in enough detail, there will be no need for frequent updates.
• Peer profile maintenance: While the ontology structure can be the same for every peer, each peer instantiates it with its own profile data. A mechanism should be provided to keep the profile content accurate and up to date.
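One simple way to approach profile maintenance is a timed refresh of the kind the RefreshTimer class of the peer profile (Section 5) is meant to support. The sketch below assumes, hypothetically, that every profile entry records its value, the time of its last update and a per-entry time-to-live, and that a caller-supplied `fetch` function can recompute an entry; `ProfileEntry` and `refresh_expired` are our names, not part of the ontology.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ProfileEntry:
    value: str
    updated_at: float  # time of last update, in the caller's clock units
    ttl: float         # hypothetical refresh interval for this entry

    def expired(self, now: float) -> bool:
        return now - self.updated_at >= self.ttl


def refresh_expired(entries: dict[str, ProfileEntry], now: float,
                    fetch: Callable[[str], str]) -> list[str]:
    """Re-fetch every expired entry in place; return the keys that were refreshed."""
    refreshed = []
    for key, entry in entries.items():
        if entry.expired(now):
            entry.value = fetch(key)  # e.g. re-measure bandwidth, re-scan shared files
            entry.updated_at = now
            refreshed.append(key)
    return refreshed


profile = {
    "bandwidth": ProfileEntry("10 Mbit/s", updated_at=0.0, ttl=60.0),
    "neighbors": ProfileEntry("10.0.0.2", updated_at=50.0, ttl=60.0),
}
print(refresh_expired(profile, now=70.0, fetch=lambda key: f"fresh {key}"))  # ['bandwidth']
```

Per-entry intervals let volatile data such as neighbor lists be refreshed more often than slowly changing data such as installed software.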
Table 1. Comparative analysis of P2P models vs. ontology-based peer profile solution

Centralized metadata index (Napster)
  Model: location inquiry from central server, download directly from peers
  Reliability: central server returns multiple download locations, client can retry
  Issues: central point of failure for managed infrastructure; with a large number of requests, not very scalable
  Peer profile solution: use distributed public peer registry storage (as, for example, UDDI)

Broadcast request
  Model: broadcast request to as many peers as possible, download directly
  Reliability: receive multiple replies from peers with available data, requester can retry
  Issues: not very scalable solution, efficient only in limited communities
  Peer profile solution: simplifies discovery of information and services on a P2P-based semantic web; if integrated with domain ontologies and semantic web, one can request the ‘right’ peers instead of flooding the network

Document routing
  Model: document routing within uni- or multidimensional ID space
  Reliability: replicate data across multiple peers; keep track of multiple paths to each peer
  Issues: document ID must be known before posting a request for a given document, hence difficulty implementing search; community islanding issue
  Peer profile solution: the ontology-driven peer profile easily builds over this model, by keeping a record of the known peers registry, current peer ID, and IDs and detailed information about all resources available from or known to a given peer, adding semantic accuracy to information retrieval
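The broadcast row's peer profile solution (request the ‘right’ peers instead of flooding the network) can be sketched as a lookup over a peer's registry. Assumptions, all ours: each registry record holds the resource categories a neighbor advertised, and a subclass relation from a shared domain ontology lets a request for a general category (here WebService, one of the Resource subclasses) match peers offering a more specific one. `SUBCLASS_OF`, `is_a`, `select_peers` and the FlightBooking/HotelBooking categories are hypothetical.

```python
from __future__ import annotations

# Hypothetical fragment of a shared domain ontology: category -> parent category.
# Assumed acyclic; a cycle would make `is_a` loop forever.
SUBCLASS_OF = {
    "WebService": "Resource",
    "Document": "Resource",
    "FlightBooking": "WebService",
    "HotelBooking": "WebService",
}


def is_a(category: str, wanted: str) -> bool:
    """True if `category` is `wanted` or a (transitive) subclass of it."""
    current: str | None = category
    while current is not None:
        if current == wanted:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def select_peers(registry: dict[str, set[str]], wanted: str) -> list[str]:
    """Return only the registered peers whose advertised categories match `wanted`."""
    return sorted(peer for peer, categories in registry.items()
                  if any(is_a(c, wanted) for c in categories))


registry = {  # peer address -> categories advertised by that peer
    "10.0.0.2": {"FlightBooking"},
    "10.0.0.3": {"Document"},
    "10.0.0.4": {"HotelBooking", "Document"},
}
print(select_peers(registry, "WebService"))     # ['10.0.0.2', '10.0.0.4']
print(select_peers(registry, "FlightBooking"))  # ['10.0.0.2']
```

Only the selected peers are contacted, so the request reaches two of the three known peers instead of all of them; the saving grows with the size of the registry.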
5. Implementation
5.1 Peer profile
The peer profile ontology (Figure 2), developed in accordance with the principles outlined in [8], has several major classes, such as Resource, PeerRegistry and Agreement, that are crucial for enabling profile-based peer communication and resource discovery. Let us briefly describe the properties and functions of those classes. ‘Resource’ is an abstract class whose subclasses (Software, Document, WebSite, WebService) cover several possible types of resources hosted at or available through a given peer. Each subclass has its own properties that best characterize it. For example, a web service is represented in terms of its input/output parameters, namespace, binding attributes, available operations, etc. Apart from the subclass-specific properties, all the subclasses inherit general information, such as ID, price, provider, name, and category, from their parent class ‘Resource’. Below is an example of the Document class:
The paper describes enhancing P2P and Semantic Web with peer profiling …

‘Agreement’ is another parent class, covering various policies and terms of resource/peer use. It is designed to provide a wide range of criteria for setting up security information and resolving some major security issues pertinent to P2P networks. ‘PeerRegistry’ is developed to support peer group management. A possible example scenario has a peer entering a P2P network, registering itself with the directly connected peers, and searching for a service using the category property. We do not propose a new discovery algorithm here, but we would like to emphasize that the peer profile builds in easily and can significantly enhance all existing P2P models by adding semantics to the communication and interaction mechanisms, and especially by making possible a generic resource discovery approach suitable for both P2P and the Semantic Web.

Figure 2. Peer profile ontology with classes and slots

The peer profile supports timed content update (RefreshTimer class), group membership handling (GroupMembership class), and definition of the supported query and request languages (Language class). It also makes it possible to use the UDDI services registry, since every resource instantiated within the peer profile has a unique ID.

5.2 P2P algorithms with peer profile
In the document routing model that we adopt to demonstrate our approach, the peer identifier space is multidimensional, that is, each peer keeps track of its neighbors in each dimension. The peer profile is used to enhance the communication medium. Currently, when a new peer joins the network, it randomly chooses a point in the identifier space and contacts the peer currently responsible for that point. The contacted peer then splits the entire space for which it is responsible into two pieces and transfers responsibility for half of it to the
new peer, which contacts all of the neighbors to update their routing entries [7]. After that, peer profiling comes into play. First, the refresh timer on every entry in the new peer's profile is checked, and all expired entries are updated. Then the new peer updates the peer registry section of its profile with information about its neighbors. It then advertises its original group name and available resource categories to the neighbors, and collects the same information from them. To increase the robustness of this algorithm, the entire identifier space can be replicated to create two or more “realities”. In this case a peer can have a separate profile entity for each reality. In each reality each peer is responsible for a different set of information; therefore, if a document cannot be found in one reality, a peer can use the routing information for a second reality to find the desired information. The final step is to register the available services with UDDI. The following algorithm shows how the peer profile is used along with conventional peer management (membership and discovery): Let N be the total number of peers in a peer community and P[i1, i2…ik] be the peer identifier space, where Pr[i] a peer is responsible for and k