Semantic-Overlay-Driven Web Services Discovery
Hai Jin, Hao Wu, Yunfa Li, Hanhan Chen
Cluster and Grid Computing Laboratory, School of Computer, Huazhong University of Science and Technology, Wuhan, 430074, China
{hjin, haowu, yunfali, Chenhanhua}@hust.edu.cn
Abstract. Web services and Peer-to-Peer (P2P) computing environments share many characteristics. Deploying Web services on a P2P network is therefore a natural fit: it exploits the advantages of P2P to achieve service integration and resource autonomy. Service discovery is a key step in this convergence. In this paper, we present a semantic-overlay-based approach to Web service discovery. To exploit the strengths of semantic Web services, the service profile is used both to describe a Web service and as the source of service data. A service-expertise-based model is proposed for selecting service nodes, and we emphasize the similarity functions that measure the semantic similarity of the different elements inside a service profile.
1. Introduction
Peer-to-Peer (P2P) is a dynamic network architecture that has already opened up a research area and a large industry. Many design problems associated with directory services, communication protocols, and message formats are being addressed by P2P applications. Because all peers run the same software and communicate symmetrically, they can be viewed as Web service providers: each is distributed over a loosely coupled network, and some providers are also consumers of the services that support them. Web services and P2P computing environments thus share many characteristics, and deploying Web services on a P2P network is a natural way to exploit the advantages of P2P for service integration and resource autonomy.
Semantic Web Services (SWS) [1], on the other hand, integrate Semantic Web techniques to enable automatic discovery, selection, execution and monitoring of Web services, following either inter-organization business logic or mark-up based on first-order logic derivatives. They have already enabled the rapid adoption of e-commerce applications. OWL-S is a service ontology for describing Web services semantically; it has three parts: a Service Profile, a Process Model and a Grounding. The Service Profile describes what a service does and is the main part used for semantic service matching. Both service requests and service advertisements can be derived from it. Furthermore, the Service Profile is registry-model-neutral and can support various registry models.
To combine the strengths of SWS with those of P2P, this paper designs a semantic-P2P-based approach to Web service discovery. The approach focuses on service discovery, which consists of two phases: (1) discovering the service peer, and (2) searching for the service in that peer's service registry. In our framework, similarity functions are used to select peers and also play a key role in service selection. We emphasize them in this paper and apply them to the different steps of service discovery.
The paper is structured as follows. Section 2 presents the semantic-P2P-based service discovery framework, illustrates the expertise-based model for service node selection, and details the similarity functions used in our approach. Section 3 reviews previous work on service discovery and summarizes related work on semantic P2P architectures. Finally, Section 4 concludes and discusses future work.
2. Semantic-P2P based Service Discovery Framework
2.1 Expertise-based model for services node selection
At present, there are three patterns for Web service discovery:
Matchmaking: There is usually a central service registry with which all service providers register their service descriptions. The query is matched against the registered services, and the information about the best-matching service is returned to the service requestor.
Broker: The broker acts as an agent server. Providers register their services and requestors register their requests with the broker. The broker finds the best service, delivers the query to that service, and relays the answer returned by the service to the original requestor.
P2P model: This model distributes the service registry to avoid a single point of failure. The service request is propagated over the P2P network by the routing mechanism, which searches for the most appropriate service for the query.
A critical factor in the success of the P2P model, however, is the available network bandwidth. As the number of peers searching for content or communicating in the network increases, the resulting traffic can potentially saturate the whole network. In this section we introduce an expertise-based model for service discovery, in which each peer advertises its Service Expertise (SE) in the network. Peer selection is based on semantically matching the subject of a service request against the service expertise. This reduces the average lookup length of a query and makes query routing more effective, which in turn cuts down the bandwidth consumed in the P2P network. The model is shown in Fig. 1. An expertise-based model has already been presented in [2]; here we improve on it and apply it to our work.
Fig. 1. Expertise-based service peer matching
2.1.1 Semantic description of service expertise
Peers: The P2P network consists of a set of peers P. Every peer p ∈ P acts as service provider and service consumer simultaneously. Each peer also has a Local Service Registry (LSR) and a Metadata/Knowledge Repository (MR/KR). The LSR, which provides the local service registry, can adopt different registry architectures, such as UDDI or ebXML. The local KR or MR stores metadata items, which include the metadata extracted from Web service descriptions and the metadata describing peer resources, e.g. expertise. Some of these resources also take part in service discovery.
Common Ontology: The peers share an ontology O, which provides a common conceptualization of the Web service domain. The ontology is used for describing the expertise of peers and the subject of requests. Here the ontology O can be a ServiceTaxonomy for Web service classification (a set of concepts T together with the relations between the concepts, T×T), to which all peers refer when categorizing Web services. Note that, to make services easy for clients to find, service taxonomies are supported by both UDDI and the Service Profile [3]. Several well-defined industry-standard taxonomies exist today, such as the North American Industry Classification System (NAICS) 1 and UNSPSC 2. Based on these taxonomies, the ServiceTaxonomy ontology can be created for service classification.
Service Expertise: An expertise description e ∈ E is an abstract, semantic description of the knowledge base of a peer based on the common ontology O. This expertise can either be extracted from the knowledge base automatically or specified in some other manner. In our model, the expertise model can be a concept hierarchy as shown in Figure 2. Service Expertise is a subclass of expertise; it is an abstract semantic description of the local service registry. Category, Functional Description and NonFunctional Description can all be instance types of the service expertise.
Here, we take the ServiceTaxonomy as the basis of the service expertise model; that is, we adopt Category as the implementation of the service expertise, rather than the functional description or the non-functional description. Service expertise E is defined as E ⊆ 2^T, where each e ∈ E is a set of ServiceTaxonomy concepts (e ⊆ T), and each peer provides instances of those categories.
1 http://www.census.gov/epcd/www/naics.html
2 http://eccma.org/unspsc/browse/
Fig. 2. Service Expertise
Peer Advertisements: An advertisement a ∈ A is used to promote the description of a peer's expertise in the network; it associates a peer p with an expertise e through the relation A ⊆ P×E. An advertisement indicates which kinds of Web services a peer specializes in, and consists of service topics from the ServiceTaxonomy. Peers decide autonomously, without central control, to whom they promote advertisements and which advertisements they accept. This decision can be based on the semantic similarity between expertise descriptions.
2.1.2 Services node search and selection
Service Requests. Requests r ∈ R are posed by a user and are evaluated against the service registries of the peers. A peer first evaluates the request against its local service registry and then decides to which peers the request should be forwarded. Request results are returned to the peer that originally initiated the query.
Request Subjects. A subject s ∈ S is an abstraction of a given request r expressed in terms of the common ontology. The subject can be seen as the complement of an expertise description, as it specifies the expertise required to answer the query. By analogy with the expertise, a subject is an abstraction of a service request. During service discovery, each s is a set of topics from the ServiceTaxonomy. For example, if a peer requests a consultation service for traveling, the subject can be: Information Retrieval/Travel Consultation.
Similarity Function. The similarity function SF measures the semantic similarity between two entities. An increasing value indicates increasing similarity. It has three properties: (1) SF(a,b) = 1 if a = b; (2) SF(a,b) = 0 if a is disjoint from b; (3) SF(a,b) = SF(b,a). The function SF: S×E → [0,1] yields the semantic similarity between a subject s ∈ S and an expertise description e ∈ E, and is used to determine to which peers a service request should be forwarded. Another similarity function SF: E×E → [0,1] can be defined to determine the similarity between the service expertise of two peers.
The similarity functions we use are presented in detail in Section 2.3.
Peer Selection Policy. The peer selection algorithm returns a ranked set of peers, some of which are selected as the next destinations for query routing. For each service request r, the peer first computes the similarity between the topics of r and every expertise description held in the array E[], then ranks the computed similarity values, stored in the array S[], and finally selects all peers whose value is above a certain threshold.
2.2 Semantic topology in service network
The semantic topology (as shown in Fig. 3) relies on the knowledge that peers have about the expertise of other peers; it is thus independent of the underlying network topology. For this reason, the TTL of a service request need not be considered here, as it is handled by the underlying P2P protocol. The semantic topology can be described by the relation Knows ⊆ P×P, where Knows(p1, p2) means that p1 knows about the expertise of p2. The relation Knows is established by the choice of peers to which a peer sends its advertisements. Furthermore, peers can decide to accept an advertisement and include it in their registries, or to discard it. The semantic topology, in combination with the expertise-based peer selection, is the basis for semantic routing.
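The peer selection policy of Section 2.1.2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the peer identifiers and the use of a Jaccard overlap as a stand-in for SF are all assumptions. The Jaccard overlap is chosen only because it satisfies the three stated properties of SF (1 for equal sets, 0 for disjoint sets, symmetric); the category similarity of Section 2.3.1 could be plugged in instead.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Topics = FrozenSet[str]  # a subject or an expertise: a set of taxonomy topics


def jaccard(a: Topics, b: Topics) -> float:
    """Stand-in similarity function SF (an assumption, not from the paper)."""
    if a == b:
        return 1.0  # also covers the case where both sets are empty
    return len(a & b) / len(a | b)


def select_peers(
    subject: Topics,
    known_expertise: Dict[str, Topics],  # peer id -> advertised expertise (E[])
    sf: Callable[[Topics, Topics], float],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Rank known peers by SF(subject, expertise) and keep those above threshold."""
    scores = [(peer, sf(subject, e)) for peer, e in known_expertise.items()]  # S[]
    ranked = sorted(scores, key=lambda item: item[1], reverse=True)
    return [(peer, score) for peer, score in ranked if score > threshold]


# Hypothetical advertisements cached by the local peer.
KNOWN = {
    "p1": frozenset({"Travel", "Hotel"}),
    "p2": frozenset({"Finance"}),
    "p3": frozenset({"Travel"}),
}
NEXT_HOPS = select_peers(frozenset({"Travel"}), KNOWN, jaccard, threshold=0.4)
```

With these example advertisements, the request is forwarded to p3 and p1, while p2 is dropped because its expertise is disjoint from the subject.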
Fig. 3. Layered Architecture of the Framework
We now present our abstract description of a service. A service is described by a four-tuple (Category, Fi, Qi, Ci), where Category is the class label of the service in a service taxonomy. The category is important for service classification and service selection, and it is also used to compute the similarity with expertise for peer selection and query routing. Fi is the functional description of the service, Qi is the description of its QoS attributes, and Ci indicates its cost. The Service Profile provides a functional and a non-functional description of a service; however, its model for describing QoS is not formal enough. Some work has already been done to address this weakness [4].
At present, existing service matching and selection approaches focus on the functional part and the QoS properties of a service, and make a choice based on the matching degree. In our semantic P2P model, by contrast, the matching process is carried out in several steps. First, all services are classified when they are registered, and searching and peer selection are based on this classification. The classification refers to a ServiceTaxonomy, which provides a category vocabulary for all Web services and is represented as an ontology that all peers share and refer to. Second, we match the functional part; then we compute the similarity of the QoS attributes. This design is based on the assumption that, during service discovery, a consumer first considers whether the Fi of a service fits its requirement, and only then pays attention to Qi and Ci. The service discovery flow is as follows.
1. First, the semantic topology is set up in the P2P network, since this is the basis of semantic routing. Each peer that joins advertises its service expertise to the network, and the peers then create the semantic overlay topology according to the Knows relations between them.
2. An originating peer initializes a Web service request described with a Service Profile, in which the key information ServiceCategory must be set for the semantic routing of the request.
3. The originating peer computes the similarities between the request r and all expertise descriptions (a set of e) of other peers cached in local storage, and selects the peers whose similarity value sim(r,e) is larger than a certain threshold as the peers to forward the request to.
4. Each peer on the query forwarding path, when it receives the request, performs the same procedure as in step 3 and forwards the request. Meanwhile, the peer searches its local registry for services matching the request. The search focuses on computing the semantic similarity for (Fi, Qi, Ci). When suitable services are found, the result is returned to the originating peer.
5. When the originating peer receives the query results, it analyzes them and communicates with the service provider via SOAP messages.
2.3 Similarity functions in our approach
To combine the local similarities into the global similarity Sim, we use a weighted average, assigning a weight wi to each of the local similarities involved:
\[ Sim_{WeightedAverage} = \sum_{i=1}^{n} w_i \cdot sim_i, \quad \text{where } \sum_{i=1}^{n} w_i = 1 \qquad (1) \]
The weighted average allows a very flexible definition of what "similar" means in a certain context. When measuring similarity for Web services, we can divide the global similarity into different parts, such as a category part, a functional part, and a non-functional part, and select or design a similarity function suited to each part. Because these local similarities are then combined into the global similarity, each local part can be optimized separately to improve the global measure.
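As a small sketch of Eq. (1), the following function combines named local similarities into a global value. The function name, the part names and the example weights are illustrative assumptions; the paper does not prescribe concrete weights.

```python
import math
from typing import Mapping


def weighted_average(sims: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Global similarity as in Eq. (1): sum of w_i * sim_i with weights summing to 1."""
    if set(sims) != set(weights):
        raise ValueError("every local similarity needs exactly one weight")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * sims[name] for name in sims)


# Hypothetical local similarities and weights for one candidate service.
LOCAL = {"category": 1.0, "functional": 0.5, "nonfunctional": 0.0}
WEIGHTS = {"category": 0.5, "functional": 0.25, "nonfunctional": 0.25}
GLOBAL_SIM = weighted_average(LOCAL, WEIGHTS)  # 0.5*1.0 + 0.25*0.5 + 0.25*0.0
```

Changing only the weights shifts the emphasis between the parts without touching the local similarity functions.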
2.3.1 Category similarity
To compare the classifications of two services according to the ServiceTaxonomy, we define a category similarity function. The semantic topology of the P2P network can be built according to the taxonomic similarity of the peers' expertise; here we use the service category as the expertise. In [5], the authors propose a metric for the conceptual distance between C1 and C2 in hierarchical "is-a" semantic nets: the distance between C1 and C2 is the minimum number of edges separating them. This approach assumes that the domain of measurement is represented by a network and that the concepts within it have a purely hierarchical relationship. Subsequently, [6] compared different similarity measures and showed that, for measuring the similarity between concepts in a hierarchically structured semantic network, the following measure yields the best results:
\[ Sim_{category}(C_1, C_2) = \begin{cases} e^{-\alpha l} \cdot \dfrac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}} & \text{if } C_1 \neq C_2 \\ 1 & \text{otherwise} \end{cases} \qquad (2) \]
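A minimal sketch of Eq. (2) over a tree-shaped taxonomy is given below. Several points are assumptions made for illustration: the taxonomy is stored as a child-to-parent map, l is the number of edges on the path through the deepest common ancestor, h is taken to be the depth of that common ancestor (with the root at depth 0), and the default values of alpha and beta as well as the example topics are arbitrary. The fraction in Eq. (2) is the hyperbolic tangent of βh, so `math.tanh` is used.

```python
import math
from typing import Dict, List, Optional

Taxonomy = Dict[str, Optional[str]]  # concept -> parent concept (None for the root)


def path_to_root(concept: str, parent: Taxonomy) -> List[str]:
    """Return [concept, its parent, ..., root]."""
    path = [concept]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def category_similarity(
    c1: str, c2: str, parent: Taxonomy, alpha: float = 0.2, beta: float = 0.6
) -> float:
    """Eq. (2): exp(-alpha * l) * tanh(beta * h) for distinct concepts, else 1."""
    if c1 == c2:
        return 1.0
    up1 = path_to_root(c1, parent)
    steps_from_c2 = {c: i for i, c in enumerate(path_to_root(c2, parent))}
    for steps_from_c1, ancestor in enumerate(up1):
        if ancestor in steps_from_c2:  # first hit is the deepest common ancestor
            l = steps_from_c1 + steps_from_c2[ancestor]  # shortest path length
            h = len(path_to_root(ancestor, parent)) - 1  # depth of the ancestor
            return math.exp(-alpha * l) * math.tanh(beta * h)
    raise ValueError("concepts do not share a root")


# Hypothetical fragment of a ServiceTaxonomy.
TAXONOMY: Taxonomy = {
    "Service": None,
    "InformationRetrieval": "Service",
    "TravelConsultation": "InformationRetrieval",
    "WeatherReport": "InformationRetrieval",
    "Finance": "Service",
}
```

Under these assumptions, two sibling topics deep in the hierarchy score higher than two topics whose only common ancestor is the root, for which tanh(0) makes the similarity 0.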
α ≥ 0 and β ≥ 0 are parameters scaling the contribution of the shortest path length l and the depth h in the concept hierarchy, respectively. The shortest path length is a metric for the conceptual distance between C1 and C2. Intuitively, concepts at upper layers of the concept hierarchy are more general and are semantically less similar than concepts at lower levels. With function (2) we can compute the similarity of any pair of category concepts. When the subject of a service request contains multiple category topics, we iterate over all topics of the subject and average their similarities with the most similar topic of the service expertise.
2.3.2 Functional similarity
The functional description in the Service Profile is usually given as IOPE, i.e. Input, Output, Precondition and Effect, with the corresponding object properties hasInput, hasOutput, hasPrecondition and hasEffect in the Profile. Matching the IOPE is the key to matching whole services, so we pay most attention to IOPE similarity. There is already some work on precise matching of the functional structure, such as [7] and [8]. Because our matching rests on the precondition that the service request and the service advertisement are expressed with the same ontology (the Service Profile), we consider the parameters of the IOPE rather than the XML tags of the IOPE. The first step of our algorithm divides the IOPE into four sets, {Input}, {Output}, {Precondition} and {Effect}, whose elements are the operation parameters corresponding to each part of the IOPE (tools such as the OWL-S API 3 can be used to extract these parameters from the service profile document). Then, for each set, we use set similarity [9] to compute the IOPE similarity between two different profiles, since it is often necessary to compare not only two entities but two sets of entities.
3 http://www.mindswap.org/2004/owl-s/api/
Multidimensional Scaling (MDS) [10], known from statistics, is a family of methods that map a set of points into a finite-dimensional flat (Euclidean) domain, where the only data given is the distance between every pair of points. It can be used to optimize the computation of set similarity. Since the similarity values between single entities have already been calculated (for example with function (2)), MDS lets each entity be described by a vector representing its similarity to every other entity contained in the two sets. A representative vector is then created for each set by averaging the vectors of all its members, and the cosine between the two set vectors, obtained through the scalar product, is taken as the similarity value:
\[ Sim_{set}(A, B) = \frac{\sum_{a \in A} \vec{a} \cdot \sum_{b \in B} \vec{b}}{|A| \cdot |B|} \qquad (3) \]
Here A = {a1, a2, …} and B = {b1, b2, …}, and the vectors are defined analogously for both sets: a = (sim(a,a1), sim(a,a2), …, sim(a,b1), …) and b = (sim(b,b1), sim(b,b2), …, sim(b,a1), …). In the final step, the overall similarity indicating how well the two services match on function (see formula (4)) is computed by matching the service names and by identifying the pair-wise correspondence of their IOPE. (Note that, unlike WSDL, the Service Profile puts all the Input and Output parameters together instead of associating them with each atomic process.)
\[ Sim_{Functional}(S_1, S_2) = w_0 \cdot Sim_{ServiceName}(S_1, S_2) + w_1 \cdot Sim_{\{Input\}}(S_1, S_2) + w_2 \cdot Sim_{\{Output\}}(S_1, S_2) + w_3 \cdot Sim_{\{Precondition\}}(S_1, S_2) + w_4 \cdot Sim_{\{Effect\}}(S_1, S_2), \quad \text{where } \sum_{i=0}^{4} w_i = 1 \qquad (4) \]
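The set similarity and its use in formula (4) can be sketched as below. This is an illustration under explicit assumptions rather than the authors' algorithm: each set vector is the average of its members' similarity vectors, and the result is normalized by the lengths of the two set vectors so that it is a true cosine in [0, 1], which is one reading of the prose around Eq. (3). The `Profile` class, the convention for empty sets (two empty sets match fully, one empty set does not match), the reuse of the entity similarity for service names, and the equal default weights are all hypothetical choices.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

EntitySim = Callable[[str, str], float]  # similarity of two single entities


@dataclass
class Profile:
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    preconditions: List[str] = field(default_factory=list)
    effects: List[str] = field(default_factory=list)


def set_vector(members: Sequence[str], universe: Sequence[str], sim: EntitySim) -> List[float]:
    """Average, over all members, of each member's similarity to every entity."""
    return [sum(sim(m, u) for m in members) / len(members) for u in universe]


def set_similarity(a: Sequence[str], b: Sequence[str], sim: EntitySim) -> float:
    """Cosine between the averaged similarity vectors of sets a and b."""
    if not a and not b:
        return 1.0  # assumed convention: nothing required on either side
    if not a or not b:
        return 0.0  # assumed convention: one side has no counterpart
    universe = list(a) + list(b)  # vector dimensions: a1, a2, ..., b1, b2, ...
    va, vb = set_vector(a, universe, sim), set_vector(b, universe, sim)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm > 0 else 0.0


def functional_similarity(
    s1: Profile, s2: Profile, sim: EntitySim,
    weights: Tuple[float, ...] = (0.2, 0.2, 0.2, 0.2, 0.2),  # w0..w4, illustrative
) -> float:
    """Formula (4): weighted sum of name, input, output, precondition, effect parts."""
    if len(weights) != 5 or not math.isclose(sum(weights), 1.0):
        raise ValueError("need five weights that sum to 1")
    parts = [
        sim(s1.name, s2.name),
        set_similarity(s1.inputs, s2.inputs, sim),
        set_similarity(s1.outputs, s2.outputs, sim),
        set_similarity(s1.preconditions, s2.preconditions, sim),
        set_similarity(s1.effects, s2.effects, sim),
    ]
    return sum(w * p for w, p in zip(weights, parts))
```

In practice the `sim` argument would be a concept-level measure such as function (2); any callable returning values in [0, 1] can be passed in.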
2.3.3 Non-functional similarity
The non-functional aspect of a service describes constraints such as Quality of Service (QoS), management statements, security policies, pricing information, and other contracts between Web services. According to [11], the QoS of Web services addresses generic dimensions such as price, execution duration, availability and reliability. Assume that there is a set of Web services with the same non-functional properties (here we focus on the QoS part) and that m QoS criteria are used to evaluate a Web service; the QoS of each service can then be expressed as an m-dimensional vector. QoS attributes of Web services are usually hierarchical and layered. For example, suppose services S1 and S2 provide similar functionality with QoS grades of level 1 and level 2 respectively, and that level 2 is better than level 1. As a rule, every QoS criterion of S2 is then better than the corresponding criterion of S1. We can thus compare two profiles on the basis of the similarity of their QoS vectors. Here the QoS semantic similarity mostly concerns the semantics of numerical values, since QoS is usually specified numerically: response time is measured as, e.g., 5000 milliseconds or 5 seconds, and availability as a percentage (99.9%). Numeric comparison requires defining "better" or "worse" semantics for logical operators such as larger-than, less-than and equal. For example, let S1 and S2 have QoS profiles Q1 = {Response time = 50 ms, Availability = 99.99%} and Q2 = {Response time = 100 ms, Availability = 99%} respectively. Intuitively, response time 1 is lower than response time 2, so S1 beats S2 on response time; availability 1 is also higher than availability 2, which indicates that S1 is a better choice than S2.
To simplify the similarity computation, we propose normalizing the QoS attributes, which allows service qualities to be measured uniformly and independently of units. Consider cost per request, availability, and response time per request as the three attributes of interest. We can assign a utility score between 1 and 5, with 5 representing the highest utility, to each attribute's value range. For instance, the scores for the availability attribute A could be 5 (A ≥ 99.999 percent), 4 (99.99 percent ≤ A < 99.999 percent), 3 (99.9 percent ≤ A < 99.99 percent), 2 (99 percent ≤ A < 99.9 percent), and 1 (A < 99 percent). Scores for cost and response time could be assigned in a similar fashion. With this method, the corresponding QoS profiles may look like this: Q1 = {Response time = 2, Availability = 4} and Q2 = {Response time = 1, Availability = 2}. We define the logical operator larger-than to mean "better", i.e. the higher the normalized value, the better the QoS level. Figure 4 lists the algorithm for assessing the similarity of two corresponding QoS attributes in a QoS vector.
Algorithm 1. Matching QoS Attributes Algorithm
Input: QoS vectors A and B
Output: Similarity of the vectors A and B
Suppose: SQoS = {QA1, QA2, …, QAn}, where QAi is the i-th QoS attribute
float Simi(QAi, QBi) {
    if (QAi or QBi does not exist) score = 0;
    else if (QAi is equal to QBi) score = 1;
    else if (QAi is larger than QBi) score = 0.8;
    else score = 0;
    return score;
}
The global similarity for the QoS description is defined as follows:
\[ Sim_{QoS}(Q_A, Q_B) = \frac{\sum_{i=1}^{n} w_i \cdot sim_i(Q_{A_i}, Q_{B_i})}{\sum_{i=1}^{n} w_i}, \quad \text{where } \sum_{i=1}^{n} w_i = 1 \qquad (5) \]
Fig. 4. Matching QoS attributes
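The normalization, Algorithm 1 and formula (5) can be put together in a short sketch. The function names, the dictionary representation of a QoS vector and the equal weights are assumptions for illustration; the availability thresholds and the scores 1, 0.8 and 0 are taken from the text. Note that Algorithm 1 is not symmetric: which vector plays the role of A (for example the advertised service) and which of B (for example the request) is not fixed by the paper, so the argument order here is only a convention.

```python
import math
from typing import Mapping, Optional


def availability_score(percent: float) -> int:
    """Map an availability percentage to the 1..5 utility score from the text."""
    for bound, score in ((99.999, 5), (99.99, 4), (99.9, 3), (99.0, 2)):
        if percent >= bound:
            return score
    return 1


def attribute_similarity(qa: Optional[int], qb: Optional[int]) -> float:
    """Algorithm 1 on normalized scores, where larger means better."""
    if qa is None or qb is None:
        return 0.0  # attribute missing on one side
    if qa == qb:
        return 1.0
    if qa > qb:
        return 0.8
    return 0.0


def qos_similarity(
    a: Mapping[str, int], b: Mapping[str, int], weights: Mapping[str, float]
) -> float:
    """Formula (5): weighted attribute similarities divided by the weight sum."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    weighted = sum(
        w * attribute_similarity(a.get(name), b.get(name)) for name, w in weights.items()
    )
    return weighted / total


# Normalized profiles from the running example in the text.
Q1 = {"response_time": 2, "availability": availability_score(99.99)}
Q2 = {"response_time": 1, "availability": availability_score(99.0)}
EQUAL_WEIGHTS = {"response_time": 0.5, "availability": 0.5}
```

Because Q1 is better than Q2 on both attributes, comparing Q1 against Q2 scores 0.8 on each attribute, while the reverse comparison scores 0.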
3. Related Works
The convergence of P2P and Web services has attracted a number of research efforts. In [12], the authors migrate their DAML-S service invocation system to the Gnutella P2P platform and present a P2P-based Web service system. However, they do not consider a semantic topology, which can reduce the average lookup length of a query and make query routing more effective, whereas query flooding in Gnutella consumes a large amount of bandwidth in the P2P network. Works partially similar to ours are presented in [13] and [14], both of which describe service-oriented semantic P2P systems. The approach of the former is based on Bibster, a bibliographic sharing system that relies on Semantic Web and P2P techniques. Bibster supports bibliographic information search by creating a semantic topology in a JXTA network. The latter exposes existing Edutella/JXTA P2P services as Web services and integrates Web-service-enabled content providers into Edutella/JXTA. Unlike both of these service-oriented systems, our work addresses the routing of service requests in the P2P network and the matching of services at the local peer. Our framework is designed entirely for Web services, whereas their work mainly provides Web service invocation for the corresponding functions of Bibster or Edutella, such as a bibliographic search service. The most similar work is reported in [15], which presents a semantic-link-based retrieval infrastructure for service discovery. We create the semantic overlay based on the similarity between peers, which are semantically represented by the service-expertise model. These are different implementations of the same concept.
As far as service matching methods are concerned, [7] and [8] both present precise, description-logic-based matching of services. We use semantic similarity to determine service selection, and the similarity measurement is based on the semantic distance of concepts in an ontology. [16] and [17] propose methods analogous to ours. The former adopts WSDL as the source for service matching and uses text similarity techniques to assess service similarity; the similarity value consists of the similarity of the document text and the semantic distance of the vocabulary within WordNet. Rather than WSDL, we base the similarity computation on the service profile. [17] proposes a method based on the principle that the more information two concepts share, the more similar they are. By quantifying the individual information that two OWL concepts contain, as well as the information shared between them, the similarity between the two concepts is measured, and consequently the similarity of two service ontologies can be measured. Rather than computing the similarity between two concepts directly, we divide the functional description of a service (IOPE) into different parts and compute set similarities. In addition, we include a similarity algorithm for matching QoS attributes, which is essential for service discovery but is often neglected by other work.
4. Conclusion and Future Works
This paper has proposed and detailed an approach to service discovery based on a semantic-overlay architecture. A service-expertise-based model is adopted as the basis for creating the semantic topology between service peers. We have also presented the step-by-step service discovery procedure and emphasized the similarity functions used for service selection, namely category similarity, functional similarity and non-functional similarity. Comparisons with related work have been discussed to position our work. The similarity-based method is being implemented for Web service selection in our system (a reference-sharing system based on a P2P architecture that we are currently developing).
In the future, our model will be further improved and the similarity functions will be studied in more depth. Community-based Web service management will be considered within the semantic topology. Because the answers returned to the originating peer may be excessive, the analysis of answers will also be studied so as to give more precise results.
Acknowledgment. This work is supported by National Basic Research Program of China (973) under Grant No.2003CB317003. Furthermore, I am grateful to Yijiao Yu and other members in SemreX team for their creative advice.
References:
1. S. A. McIlraith, D. L. Martin, “Bringing Semantics to Web Services,” IEEE Intelligent Systems, Vol. 18, No. 1, 2003, pp. 90-93.
2. P. Haase, R. Siebes, F. van Harmelen, “Peer selection in peer-to-peer networks with semantic topologies,” In Proceedings of International Conference on Semantics of a Networked World: Semantics for Grid Databases, 2004, Paris.
3. K. Verma, K. Sivashanmugam, A. Sheth, A. Patil, S. Oundhakar, and J. Miller, “METEOR-S WSDI: A Scalable Infrastructure of Registries for Semantic Publication and Discovery of Web Services,” Journal of Information Technology and Management, Special Issue on Universal Global Integration, Vol. 6, No. 1, 2005, pp. 17-39.
4. C. Zhou, L.T. Chia, B.S. Lee, “Service Discovery and Measurement based on DAMLQoS,” In Proceeding International World Wide Web Conference 2005.
5. R. Rada, H. Mili, E. Bicknell, and M. Blettner, “Development and application of a metric on semantic nets,” In IEEE Transactions on Systems, Man, and Cybernetics, volume 19, Jan/Feb 1989.
6. Y. Li, Z. A. Bandar, and D. McLean, “An approach for measuring semantic similarity between words using multiple information sources,” Transactions on Knowledge and Data Engineering, Vol. 15, No. 4, July/August 2003, pp. 871–882.
7. M. Paolucci, T. Kawamura, T. R. Payne, K. P. Sycara, “Semantic Matching of Web Services Capabilities,” In Proceedings of International Semantic Web Conference 2002, pp. 333-347.
8. L. Li and I. Horrocks, “A software framework for matchmaking based on semantic web technology,” In Proceedings of the Twelfth Int. World Wide Web Conf. (WWW 2003), ACM, 2003, pp. 331–339.
9. M. Ehrig, P. Haase, M. Hefke, and N. Stojanovic, “Similarity for ontology-a comprehensive framework,” In Workshop Enterprise Modeling and Ontology: Ingredients for Interoperability, 2004.
10. I. Borg and P. Groenen, “Modern Multidimensional Scaling, Theory and Applications,” New York, Springer-Verlag, 1997.
11. D. A. Menasce, “QoS Issues in Web Services,” IEEE Internet Computing, Vol. 6, No. 6, 2002.
12. M. Paolucci, K. Sycara, T. Nishimura, and N. Srinivasan, “Using DAML-S for P2P Discovery,” In Proceedings of the First International Conference on Web Services (ICWS'03), Las Vegas, Nevada, USA, June 2003, pp. 203-207.
13. P. Haase, S. Agarwal and Y. Sure, “Service-Oriented Semantic Peer-to-Peer Systems,” In Workshop on Intelligent Networked and Mobile Systems of WISE 2004.
14. Q. Changtao, W. Nejdl, “Interacting the Edutella/JXTA peer-to-peer network with Web services,” In Proceedings of 2004 International Symposium on Applications and the Internet, 2004, pp. 67–73.
15. H. Zhuge, J. Liu, “Flexible Retrieval of Web Services,” Journal of Systems and Software, Vol. 70, No. 1-2, 2004, pp. 107-116.
16. Y. Wang, E. Stroulia, “Semantic Structure Matching for Assessing Web-Service Similarity,” In Proceedings of First International Conference on Service Oriented Computing, Trento, Italy, December 15-18 2003.
17. J. Hau, W. Lee, J. Darlington, “A Semantic Similarity Measure for Semantic Web Services,” In Workshop of WWW2005, Web Service Semantics: Towards Dynamic Business Integration, 2005.