SOCA DOI 10.1007/s11761-015-0182-1

ORIGINAL RESEARCH PAPER

Semantic approach for Web service classification using machine learning and measures of semantic relatedness

Shailja Sharma1 · J. S. Lather2 · Mayank Dave3

Received: 19 December 2014 / Revised: 4 September 2015 / Accepted: 5 September 2015 © Springer-Verlag London 2015

Abstract Web service classification, the task of assigning a service to one category from a predefined set, has become challenging because manually organizing and searching services is no longer feasible, given time constraints and the exponentially growing number of services. In this paper, we propose a hybrid approach for the automatic classification of Web services that is independent of the service description model and improves classification accuracy. The approach assists the repository administrator during service registration and users during service retrieval. It exploits both the semantic and the syntactic information in the service description by combining techniques from machine learning, data mining, logical reasoning, statistical methods and measures of semantic relatedness. The Omiotis measure of semantic relatedness is applied to transform the service vectors into semantically enriched service vectors, which are then used by the classification algorithms. Supervised support vector machine and k-Nearest Neighbor classifiers are used to assign service profiles to categories. We present an empirical evaluation and comparison of the proposed approach on the OWLS-TC dataset, with the aim of enabling the discovery and reuse of existing services.

Corresponding author: Shailja Sharma

1 Department of Computer Applications, National Institute of Technology, Kurukshetra, India

2 Department of Electrical Engineering, National Institute of Technology, Kurukshetra, India

3 Department of Computer Engineering, National Institute of Technology, Kurukshetra, India

Keywords Web service discovery · Web service classification · Measures of semantic relatedness · Machine learning · Support vector machine

1 Introduction

Web services, an emerging technology, expose the functionality of information systems through standard Web technologies in order to automate inter-organizational interactions. Service classification on the Internet faces two major challenges. The first is the lack of automatic classification mechanisms in service repositories. The second is that the explicit or implicit semantic information of a service is not exploited when the service is published. Appropriate classification matters because it helps users discover and reuse the services they need in repositories across the Internet. Currently, the platform elements for Web services are the Extensible Markup Language (XML) [1], HTTP [2], the Web Services Description Language (WSDL) [3] and the Universal Description Discovery and Integration (UDDI) [4] specifications. Web services are usually described with WSDL definitions and advertised in UDDI registries. Publishing, searching for and finding services on the Web are all carried out through this centralized repository. When a service is registered, it must be classified according to a taxonomy such as the United Nations Standard Products and Services Code or the North American Industry Classification System [5]. Each taxonomy contains predefined categories, such as Stock, Entertainment and Scientific, and the service publisher manually selects one of them for the service. Web service registries usually contain a large number of categories, so assigning the appropriate class to a service is often a tedious and error-prone task.

In practice, manual classification of services and manual maintenance of repositories make the process difficult and error-prone. Several issues make the classification task more complex:

- Taxonomies in real-world applications are very large, each consisting of thousands of categories organized in multiple hierarchical levels.
- Multiple people are involved in maintaining or sharing services in a common repository.
- Several distributed repositories are shared nowadays.
- Classifying a service properly requires good knowledge of the taxonomy, the organization of its classes, the application domains and the service profiles.

For these reasons, the classification task becomes overwhelming for UDDI administrators. Service users, too, must browse and search published services by category manually. This is again time-consuming. Determining the right category for the required service is often very difficult, which can hinder the reuse of published services. The current standards lack semantics, and administrators and users differ in their knowledge, so the services returned are often inadequate and sometimes irrelevant to the user query.

At present, both service classification during registration and service retrieval require user intervention, and automatic classification mechanisms are needed to minimize it. To address these problems, we propose an automatic hybrid classification approach. It integrates the semantics of the service profiles with their statistical information and generates semantic vectors for the services. Supervised machine learning-based classification algorithms then use these vectors to improve service classification and, consequently, service retrieval.
Techniques from measures of semantic relatedness and machine learning are applied to the operations and to the textual documentation, namely the argument definitions and the comments written by the service developers. The process has three stages: preprocessing the Web service data, integrating the semantics of the service profiles with their statistical information, and transforming the data into a form suitable for the specific classification method. The approach draws on principles from machine learning, data mining, statistical techniques, logical reasoning and measures of semantic relatedness (MSR). Service profiles are transformed into semantic vectors with the Omiotis measure of semantic relatedness [6]. Omiotis is constructed from the WordNet thesaurus and a lexical ontology and can handle synonymy and polysemy. We use it because it has shown the highest correlation with human judgment among dictionary-based measures of semantic relatedness, and because it has been shown to improve text classification performance considerably [7]. In this paper, we extend, utilize and implement the Omiotis measure for Web service classification, and we perform experiments to validate the efficiency of the presented approach. The main contributions of this paper are:

• A hybrid approach for the automated classification of Web services.
• A validation of the efficiency of the proposed hybrid classification approach.
• A comparative view of the improvements in recall, precision and accuracy.

The remainder of this paper is structured as follows. Section 2 presents related work on the classification of Web services. Section 3 discusses Omiotis and various other measures of semantic relatedness. Section 4 presents the proposed approach for the automatic classification of Web services, along with the algorithmic steps used to achieve it. Section 5 describes the implementation of the approach, the tools and the dataset, followed by the experimental analysis, comparisons and results. Section 6 concludes the paper and outlines directions for future enhancements.

2 Literature review

The machine learning-based approaches to service classification reported in the literature [8–16] differ mainly in two respects. Some match syntactic or semantic concepts of argument definitions [8,9,12]; others apply document classification techniques [10,11].

- The METEOR-S Web Service Annotation Framework (MWSAF) [8] classifies WSDL documents by matching their argument definitions. Based on graph matching, it converts the argument definitions into graphs and then matches them against the ontological concepts of the categories.
- In [9], Duo et al. convert the service definitions into ontological concepts and then perform ontology mapping to assign services to their relevant classes.
- METEOR-S [10] is an improved version of MWSAF that treats each Web service as a document and applies document classification to Web services.
- ASSAM considers the natural language description and applies the SVM algorithm to the set of WSDL documents.
- Batra and Bawa [13] proposed an approach to semantic Web service classification based on a normalized similarity score (NSS). The NSS measure of semantic relatedness is used to compute the similarity score between the terms of a service and all the categories.
- In [16], Web services are classified automatically using several vector-based representation models. The textual descriptions are combined with the input/output signatures for syntactic/semantic annotations, and the services are then classified with different machine learning classifiers.
- The AWSC approach [14] is based on the Rocchio algorithm. It treats each service as a separate document and applies text mining and machine learning techniques for service classification.

Our main observations about these approaches are as follows. First, the MWSAF classification approach has shown low accuracy and was evaluated only on a small dataset. Second, METEOR-S, the machine learning-based enhanced version of MWSAF, is fast and more accurate, but it does not consider the service documentation or comments. Third, ASSAM suffers from low accuracy. The major disadvantage of these approaches is that most of them ignore the semantic information of the service profiles during classification, although this information can improve accuracy considerably. The approach of Katakis et al. [16] does exploit semantic relationships for service classification. However, it does not consider the senses of the terms used in the textual description, nor does it address the polysemy and synonymy of those terms.

Compared with some of the approaches above, we evaluate our approach on a larger set of 1007 Web services. Our approach also uses the service documentation and comments written in natural language for classification and discovery. To select the category from the service information, we use natural language processing, machine learning, subsumption reasoning, measures of semantic relatedness and text mining techniques.
Our approach differs in that it combines machine learning with the Omiotis measure of semantic relatedness, which takes the senses of the terms into account. Moreover, the experimental evaluation shows improvements in accuracy, precision and recall.

3 Semantic relatedness and the Omiotis measure

Measures of semantic relatedness (MSR) are functions that quantify how closely two words or documents are related in meaning. Prevalent statistical measures of the semantic relatedness between two words, concepts, paragraphs or documents include:

- latent semantic analysis (LSA) [17];
- pointwise mutual information (PMI-IR) [18];
- normalized Google distance (NGD) [19];
- explicit semantic analysis (ESA) [20];
- vector generation of explicitly defined multidimensional semantic space (VGEM).

Thesaurus-based measures [21–23] have also been proposed to calculate the semantic similarity between words. Similarly, Tsatsaronis et al. [6] introduced Omiotis, a WordNet-based measure of the semantic relatedness between texts. Omiotis computes the final relatedness score between two texts by combining the words' statistical information with their word-level semantic relatedness (SR) values. The semantic path between two words is evaluated with three key factors:

(a) the semantic path length;
(b) the specificity of the intermediate nodes, given by their depth in the thesaurus hierarchy;
(c) the types of the semantic edges that compose the path.

Omiotis is the first measure of semantic relatedness between texts that considers all three factors when computing pairwise word-to-word relatedness scores. Its experimental evaluation showed that it approximates human judgments of semantic relatedness between words better than previous related measures [6]. Omiotis is based on the sense relatedness measure SR [6], which is defined for a pair of terms $T = (t_1, t_2)$ through semantic compactness and semantic path elaboration.

Definition 1 Consider a word thesaurus $O$ and let $T = (t_1, t_2)$ with $t_1, t_2 \in O$. Let $X_1$ be the set of senses of $t_1$ in $O$ and $X_2$ the set of senses of $t_2$ in $O$. Let $S_1, S_2, \dots, S_{|X_1|\cdot|X_2|}$ be the pairs of senses $S_k = (s_i, s_j)$, with $s_i \in X_1$ and $s_j \in X_2$, and let $P$ range over the semantic paths connecting the two senses of a pair. The semantic relatedness of $T$ is defined as:

$$SR(T, S, O) = \max_{S_k} \Big\{ \max_{P} \big\{ SCM(S_k, O, P) \cdot SPE(S_k, O, P) \big\} \Big\} = \max_{S_k} \{ SR(S_k, O) \}, \quad k = 1, \dots, |X_1| \cdot |X_2| \qquad (1)$$

The semantic compactness of a sense pair $S$ along a path $P$ of length $l$ is

$$SCM(S, O, P) = \prod_{i=1}^{l} w_i,$$

where $w_1, w_2, \dots, w_l$ are the weights of the path's edges. The semantic path elaboration of $S$ along $P$ is

$$SPE(S, O, P) = \prod_{i=1}^{l} \frac{2 d_i d_{i+1}}{d_i + d_{i+1}} \cdot \frac{1}{d_{max}},$$

where $d_i$ is the depth of sense $p_i$ according to $O$ and $d_{max}$ is the maximum depth of $O$. For two terms $t_1 \equiv t_2 \equiv t$ with $t \notin O$, the semantic relatedness is defined as 1. When $t_1 \in O$ and $t_2 \notin O$, or vice versa, it is defined as 0. Further details on Omiotis and the related terms can be found in [6].
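The following Python sketch shows how Eq. (1) combines semantic compactness and semantic path elaboration. It is not the Omiotis implementation of [6]. It assumes that the candidate paths between senses have already been enumerated; a real implementation would search WordNet for them. The sense ids, edge weights, node depths and `D_MAX` in the toy thesaurus are hypothetical, and so are the helper names.

```python
from math import prod

def scm(edge_weights):
    # Semantic compactness: product of the edge weights w_1..w_l along path P.
    return prod(edge_weights)

def spe(depths, d_max):
    # Semantic path elaboration: for every edge (p_i, p_{i+1}), multiply by the
    # harmonic mean of the two node depths divided by the maximum depth d_max.
    return prod(2 * a * b / ((a + b) * d_max) for a, b in zip(depths, depths[1:]))

def sense_pair_sr(paths, d_max):
    # SR(S_k, O): best SCM * SPE over the candidate paths of one sense pair.
    return max((scm(w) * spe(d, d_max) for w, d in paths), default=0.0)

def term_sr(t1, t2, senses, paths, d_max):
    """SR(T, S, O) of Eq. (1) for T = (t1, t2).

    senses: term -> list of sense ids; terms missing from it are treated as not in O.
    paths:  (sense_a, sense_b) -> list of (edge_weights, node_depths) pairs,
            where node_depths has one more entry than edge_weights.
    """
    if t1 not in senses or t2 not in senses:
        # Special cases from the text: identical out-of-thesaurus terms -> 1,
        # otherwise 0 (0 for two different out-of-thesaurus terms is an assumption).
        return 1.0 if t1 == t2 else 0.0
    return max(
        (sense_pair_sr(paths.get((a, b), paths.get((b, a), [])), d_max)
         for a in senses[t1] for b in senses[t2]),
        default=0.0,
    )

# Toy thesaurus: every sense id, weight and depth below is hypothetical.
SENSES = {"cost": ["cost#1", "cost#2"], "price": ["price#1"]}
PATHS = {
    ("cost#1", "price#1"): [([0.8], [6, 7]), ([0.9, 0.5], [6, 5, 7])],
    ("cost#2", "price#1"): [([0.4, 0.4, 0.4], [3, 2, 4, 7])],
}
D_MAX = 16
```

With this toy data, the one-edge path between `cost#1` and `price#1` wins: its SCM is 0.8, its SPE is 84/208, and longer paths are penalized by both factors.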

4 Proposed methodology for semantic service classification and discovery

The proposed approach consists of four major steps:

1. creation of the service vectors;
2. calculation of the semantic relatedness matrices;
3. enrichment of the service vectors with semantic information;
4. classification of the services.

The proposed hybrid semantic service classifier transforms the service vectors by combining logic-based reasoning with Omiotis-based semantic similarity. SVM and kNN classifiers are employed to classify the Web services.


Fig. 1 Architecture of the proposed approach

4.1 Overview of the classification architecture

Automatic service classification helps identify the relevant class for a new service during service publication. The architecture of the approach is depicted in Fig. 1. The approach involves four tasks: preprocessing the Web service profiles, assigning an importance weight to each extracted word, integrating semantic knowledge into the service vectors, and transforming the data into a form suitable as input to a classification method. The hybrid semantic integrator computes semantic service vectors by aggregating the valuations of (a) ontology-based subsumption reasoning and (b) the Omiotis-based semantic similarity measure. Embedding this semantic information into the syntactic service vectors transforms the service profiles into semantically enriched service vectors. These vectors are the input to two machine learning-based classification algorithms, k-Nearest Neighbor [24] and SVM [25], which identify the best category among the N available categories.

Fig. 2 Snapshot of a section of service description for service BookAuthorPriceService

4.2 Hybrid semantic integrator

The hybrid semantic integrator transforms the service profiles into semantically enriched vectors that the machine learning-based classifiers can use, integrating the syntactic and semantic information of the services. The proposed framework is divided into four steps:

1. Creation of semantic service vectors for services.
2. Omiotis-based semantic relatedness matrix calculation.
3. Transformation of service vectors into Omiotis-enriched hybrid service representations.
4. Machine learning-based automatic classification of services.

The calculation of semantic vectors starts with extracting all relevant, non-trivial, high-quality information from the service profiles. The services are preprocessed to extract information such as the inputs, the outputs, the service names and other textual documentation from the service description file, as shown in Fig. 2. The service profiles are then mapped onto vectors using information retrieval techniques. We use five data vector representations, based on the service information, to model the service profiles. The extracted terms/concepts are assigned binary weights and stored in different vectors depending on the service representation. Figure 3 gives the algorithm of the proposed approach, and its steps are detailed in the following subsections.


4.2.1 Creation of vector-based representations of service profiles

We adopt the data representations of [16], in which the OWL-S service profiles are transformed into vector-based representations. The five representations are as follows.

1. Text Service vectors representing only the textual descriptions of the service profiles. Terms are extracted from the textual-description tags of all the service profiles and stored in the term vector Terms_Text(|X|). Each term receives a binary weight (0 or 1) according to its presence or absence in each service description. For any service i in the service set, the Text vector is:

$$\mathit{Text}(\mathit{Service}_i) = \mathit{Text}(x_{i,1}, x_{i,2}, \dots, x_{i,|X|}) \qquad (2)$$

Here |X| is the number of distinct terms in the textual comments of the service corpus.

2. Syn Service vectors representing the syntactic information of the service profiles, containing the concepts from the input and output tags of the service profiles. The concepts are extracted and stored in the term vector Terms_Syn(|Y|). Each concept receives a binary weight (0 or 1) according to its presence in the service descriptions. For any service i in the service set, the Syn vector is:

$$\mathit{Syn}(\mathit{Service}_i) = \mathit{Syn}(y_{i,1}, y_{i,2}, \dots, y_{i,|Y|}) \qquad (3)$$

Here |Y| is the size of the vocabulary of all the input and output concepts in the service corpus.

3. Sem Service vectors representing the semantic information of the service profiles. The concepts are extracted and stored in the term vector Terms_Sem(|Y|). An OWL DL reasoner is used to deduce subsumption relationships, such as exact, plugin and subsumption, among the input and output concepts of the service profiles. For each concept, all of its superclass and subclass concepts in the ontology hierarchy, as deduced by the reasoner, are also included in the service vector and assigned binary weights according to their presence. For any service i in the service set, the Sem vector is:

$$\mathit{Sem}(\mathit{Service}_i) = \mathit{Sem}(y_{i,1}, y_{i,2}, \dots, y_{i,|Y|}) \qquad (4)$$

4. Text_Syn Service vectors representing the combination of the textual (Text) and syntactic (Syn) information of the service profiles. The terms/concepts are extracted and stored in the term vector Terms_Text_Syn(|X + Y|). For any service i in the service set, the Text_Syn vector is:

$$\mathit{Text\_Syn}(\mathit{Service}_i) = \mathit{Text\_Syn}(x_{i,1}, \dots, x_{i,|X|}, y_{i,1}, \dots, y_{i,|Y|}) \qquad (5)$$

5. Text_Sem Service vectors representing the combination of the textual and semantic information of the service profiles. The terms/concepts are extracted and stored in the term vector Terms_Text_Sem(|X + Y|). This representation combines the textual information with the semantic information deduced by the OWL DL reasoner, as in Sem(Service_i):

$$\mathit{Text\_Sem}(\mathit{Service}_i) = \mathit{Text\_Sem}(x_{i,1}, \dots, x_{i,|X|}, y_{i,1}, \dots, y_{i,|Y|}) \qquad (6)$$

Fig. 3 Proposed algorithm for classification and discovery of services
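The five binary representations can be illustrated with a minimal Python sketch, which rests on two assumptions. First, each profile is taken to be already preprocessed into a set of text terms and a set of input/output concepts. Second, a hypothetical parent map (`PARENT`) stands in for the OWL DL reasoner, so only superclass/subclass expansion is modelled, not exact/plugin matching. All helper names and data here are illustrative.

```python
def vocabulary(term_sets):
    # Sorted union of the terms of all services gives a stable column order.
    return sorted(set().union(*term_sets))

def binary_matrix(term_sets, vocab):
    # Row i, column k is 1 if vocab[k] occurs in service i, otherwise 0.
    return [[1 if term in terms else 0 for term in vocab] for terms in term_sets]

def ancestors(concept, parent):
    # Walk up the toy is-a hierarchy (stand-in for superclass reasoning).
    found = set()
    while concept in parent:
        concept = parent[concept]
        found.add(concept)
    return found

def descendants(concept, children):
    # Depth-first walk down the toy hierarchy (stand-in for subclass reasoning).
    found, stack = set(), [concept]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def expand(concepts, parent):
    # Add all superclasses and subclasses of each input/output concept.
    children = {}
    for child, par in parent.items():
        children.setdefault(par, set()).add(child)
    expanded = set(concepts)
    for concept in concepts:
        expanded |= ancestors(concept, parent) | descendants(concept, children)
    return expanded

# Hypothetical toy corpus: text terms and input/output concepts per service.
text_terms = [{"book", "price"}, {"car", "price"}, {"vehicle", "rent"}]
io_concepts = [{"Book", "Price"}, {"Car", "Price"}, {"Vehicle"}]
PARENT = {"Car": "Vehicle", "Book": "PrintedMaterial"}  # toy is-a hierarchy

vocab_x = vocabulary(text_terms)   # |X| text terms
vocab_y = vocabulary(io_concepts)  # |Y| input/output concepts
text = binary_matrix(text_terms, vocab_x)
syn = binary_matrix(io_concepts, vocab_y)
# Sem keeps the |Y| columns of Syn; expanded concepts outside vocab_y are dropped.
sem = binary_matrix([expand(c, PARENT) for c in io_concepts], vocab_y)
text_syn = [t + s for t, s in zip(text, syn)]  # |X| + |Y| columns
text_sem = [t + s for t, s in zip(text, sem)]
```

In the toy output, the Sem row of the car service gains the `Vehicle` column that its Syn row lacks. This is the kind of effect subsumption reasoning has on the Sem representation.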

Table 1 Semantic relatedness matrices for the five representations

Service representation   Term vector                Semantic relatedness matrix
Text(n, |X|)             Terms_Text(|X|)            Semantic_Matrix_Text(|X|, |X|)
Syn(n, |Y|)              Terms_Syn(|Y|)             Semantic_Matrix_Syn(|Y|, |Y|)
Sem(n, |Y|)              Terms_Sem(|Y|)             Semantic_Matrix_Sem(|Y|, |Y|)
Text_Syn(n, |X + Y|)     Terms_Text_Syn(|X + Y|)    Semantic_Matrix_Text_Syn(|X + Y|, |X + Y|)
Text_Sem(n, |X + Y|)     Terms_Text_Sem(|X + Y|)    Semantic_Matrix_Text_Sem(|X + Y|, |X + Y|)

4.2.2 Omiotis-based semantic relatedness matrix calculation

A semantic relatedness matrix is a symmetric square matrix that gives the semantic relatedness score of each term with every other term in the corpus. For the five service representations of Sect. 4.2.1 (Text, Syn, Sem, Text_Syn and Text_Sem), the terms/concepts are extracted and stored in five term vectors: Terms_Text(|X|), Terms_Syn(|Y|), Terms_Sem(|Y|), Terms_Text_Syn(|X + Y|) and Terms_Text_Sem(|X + Y|), respectively. For each term vector in Table 1, the Omiotis measure of Eq. (1) gives the semantic relatedness score of every term with every other term of the corresponding corpus. This yields five semantic relatedness matrices: Semantic_Matrix_Text(|X|, |X|), Semantic_Matrix_Syn(|Y|, |Y|), Semantic_Matrix_Sem(|Y|, |Y|), Semantic_Matrix_Text_Syn(|X + Y|, |X + Y|) and Semantic_Matrix_Text_Sem(|X + Y|, |X + Y|). Each entry of a semantic relatedness matrix contains the relatedness value of one pair of terms from the corresponding term vector.

4.3 Transformation of service vectors into Omiotis-enriched hybrid service representations

In this phase, the service vectors are merged with semantic information to transform the service profiles into semantically enriched vectors. As shown in Fig. 4, the semantic information in the semantic relatedness matrices is combined with the binary, information retrieval-based vectors to generate the semantically enriched service vectors. Specifically, each service representation matrix is multiplied by its corresponding semantic relatedness matrix:

- Text by Semantic_Matrix_Text(|X|, |X|);
- Syn by Semantic_Matrix_Syn(|Y|, |Y|);
- Sem by Semantic_Matrix_Sem(|Y|, |Y|);
- Text_Syn by Semantic_Matrix_Text_Syn(|X + Y|, |X + Y|);
- Text_Sem by Semantic_Matrix_Text_Sem(|X + Y|, |X + Y|).

This yields the following Omiotis-based hybrid representations for the n services.

1. Omiotis_Text The Text representation, which contains the terms from the textual descriptions of the services, is enriched with Omiotis-based semantic information. The Text matrix is multiplied by the semantic relatedness matrix of the |X| terms in the textual vocabulary:

$$\mathit{Omiotis\_Text}(n, |X|) = \mathit{Text}(n, |X|) \times \mathit{Semantic\_Matrix\_Text}(|X|, |X|) \qquad (7)$$

Each row gives the semantically enriched vector of a single service i:

$$\mathit{Omiotis\_Text}(\mathit{Service}_i) = \mathit{Omiotis\_Text}(ox_{i,1}, ox_{i,2}, \dots, ox_{i,|X|})$$

2. Omiotis_Syn The Syn matrix is multiplied by Semantic_Matrix_Syn(|Y|, |Y|) to generate Omiotis-enriched service vectors representing the syntactic information of the service profiles:

$$\mathit{Omiotis\_Syn}(n, |Y|) = \mathit{Syn}(n, |Y|) \times \mathit{Semantic\_Matrix\_Syn}(|Y|, |Y|) \qquad (8)$$

Each row gives the hybrid vector of a single service:

$$\mathit{Omiotis\_Syn}(\mathit{Service}_i) = \mathit{Omiotis\_Syn}(oy_{i,1}, oy_{i,2}, \dots, oy_{i,|Y|})$$

3. Omiotis_Sem The Sem matrix is multiplied by Semantic_Matrix_Sem(|Y|, |Y|). The result is a set of Omiotis-enriched service vectors representing the semantic information of the service profiles:

$$\mathit{Omiotis\_Sem}(n, |Y|) = \mathit{Sem}(n, |Y|) \times \mathit{Semantic\_Matrix\_Sem}(|Y|, |Y|) \qquad (9)$$

Each row combines the logic-based and non-logic-based semantic information of one service:

$$\mathit{Omiotis\_Sem}(\mathit{Service}_i) = \mathit{Omiotis\_Sem}(oy_{i,1}, oy_{i,2}, \dots, oy_{i,|Y|})$$

4. Omiotis_Text_Syn Omiotis-enriched service vectors representing the combination of textual and syntactic information are obtained by multiplying Text_Syn by Semantic_Matrix_Text_Syn(|X + Y|, |X + Y|):

$$\mathit{Omiotis\_Text\_Syn}(n, |X + Y|) = \mathit{Text\_Syn}(n, |X + Y|) \times \mathit{Semantic\_Matrix\_Text\_Syn}(|X + Y|, |X + Y|) \qquad (10)$$

Each row gives a semantically hybrid vector for the textual and syntactic terms of one service:

$$\mathit{Omiotis\_Text\_Syn}(\mathit{Service}_i) = \mathit{Omiotis\_Text\_Syn}(ox_{i,1}, \dots, ox_{i,|X|}, oy_{i,1}, \dots, oy_{i,|Y|})$$

5. Omiotis_Text_Sem Omiotis-enriched service vectors representing the combination of textual and semantic information are given by the matrix product:

$$\mathit{Omiotis\_Text\_Sem}(n, |X + Y|) = \mathit{Text\_Sem}(n, |X + Y|) \times \mathit{Semantic\_Matrix\_Text\_Sem}(|X + Y|, |X + Y|) \qquad (11)$$

Each row gives the semantic vector of a single service, containing both textual and semantic information:

$$\mathit{Omiotis\_Text\_Sem}(\mathit{Service}_i) = \mathit{Omiotis\_Text\_Sem}(ox_{i,1}, \dots, ox_{i,|X|}, oy_{i,1}, \dots, oy_{i,|Y|})$$

Fig. 4 Transformation of service vectors into Omiotis-enriched semantic service vectors
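The matrix calculation of Sect. 4.2.2 and the products (7) to (11) can be sketched in plain Python. This is a minimal illustration, not the authors' MATLAB implementation. `toy_relatedness` and its scores are hypothetical placeholders for Omiotis, and giving every term a self-relatedness of 1 is an assumption of the sketch.

```python
def relatedness_matrix(vocab, relatedness):
    # Symmetric |V| x |V| matrix: each unordered pair is scored once and mirrored.
    n = len(vocab)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            matrix[i][j] = matrix[j][i] = relatedness(vocab[i], vocab[j])
    return matrix

def matmul(a, b):
    # (n x p) times (p x q) product in plain Python, e.g. Text x Semantic_Matrix_Text.
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def cosine(u, v):
    # Cosine similarity; 0 when either vector is all zeros.
    dot = sum(x * y for x, y in zip(u, v))
    norm = (sum(x * x for x in u) * sum(y * y for y in v)) ** 0.5
    return dot / norm if norm else 0.0

# Hypothetical scores standing in for Omiotis; self-relatedness is set to 1 here.
TOY_SCORES = {frozenset({"automobile", "car"}): 0.9, frozenset({"cost", "price"}): 0.8}

def toy_relatedness(a, b):
    return 1.0 if a == b else TOY_SCORES.get(frozenset({a, b}), 0.0)

vocab = ["automobile", "car", "cost", "price"]
text = [[0, 1, 0, 1],   # service A mentions "car" and "price"
        [1, 0, 1, 0]]   # service B mentions "automobile" and "cost"
semantic_matrix_text = relatedness_matrix(vocab, toy_relatedness)
omiotis_text = matmul(text, semantic_matrix_text)
```

The two binary rows share no term, so their cosine similarity is 0. After enrichment they become [0.9, 1.0, 0.8, 1.0] and [1.0, 0.9, 1.0, 0.8], whose cosine similarity is 3.4/3.45, roughly 0.986. This illustrates how the product lets synonymous terms contribute to the same dimensions.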

4.3.1 Machine learning-based automatic classification of services

Machine learning algorithms are described as either “supervised” or “unsupervised”, depending on whether the learner is given labelled data. In unsupervised learning algorithms, such as clustering and self-organizing maps, no predefined labels are associated with the data. In supervised algorithms, the classes are predetermined. The learner's task is to search for patterns and construct mathematical models, which are then evaluated on their predictive capacity in relation to measures of variance in the data itself.

In our approach, the semantically enriched vectors of all five representations (Omiotis_Text, Omiotis_Syn, Omiotis_Sem, Omiotis_Text_Syn and Omiotis_Text_Sem) are passed to the classification phase as input data, where machine learning-based classification algorithms are applied. In the classification phase, the category vector Category_Vector = {Category_1, Category_2, Category_3, ..., Category_m} contains the m domains used to classify the services in the repository. The service training set consists of j ordered pairs (Sem_Serv_1, Category_1), (Sem_Serv_2, Category_2), (Sem_Serv_3, Category_3), ..., (Sem_Serv_j, Category_j). Here Sem_Serv_i is the semantic vector of Web service i, and Category_i is the category label assigned to that service in advance by human experts. The service test data is another set of semantically enriched vectors without formally assigned class labels, i.e. (Sem_Serv_{j+1}, Sem_Serv_{j+2}, Sem_Serv_{j+3}, ..., Sem_Serv_{j+l}). The representation of the training set and test set is shown in Table 2, where columns represent semantically enriched services and rows represent categories.

Table 2 Training set and test set representation (columns Sem_Serv_1 to Sem_Serv_j form the training set; columns Sem_Serv_{j+1} to Sem_Serv_{j+l} form the test set)

             Sem_Serv_1     ···   Sem_Serv_j     Sem_Serv_{j+1}     ···   Sem_Serv_{j+l}
Category_1   member_{1,1}   ···   member_{1,j}   member_{1,j+1}     ···   member_{1,j+l}
···          ···            ···   ···            ···                ···   ···
Category_i   member_{i,1}   ···   member_{i,j}   member_{i,j+1}     ···   member_{i,j+l}
···          ···            ···   ···            ···                ···   ···
Category_m   member_{m,1}   ···   member_{m,j}   member_{m,j+1}     ···   member_{m,j+l}

Each cell of the table indicates the category membership of a service: member_{i,j} = 1 if Sem_Serv_j belongs to Category_i, and member_{i,j} = 0 otherwise. In our implementation, services are classified into domain-specific classes with the SVM and kNN classification algorithms, which draw their inferences from the training set. Each service is finally assigned the category inferred by the classifier. In this way, services with semantically similar profiles are assigned to the same category. This automatic classification reduces human intervention in assigning or choosing the appropriate class for a service, thereby improving the service discovery process.

To discover relevant services, the appropriate class for the incoming query is found first, and all the services in that class are then matched against the query. The cosine similarity between the semantic vector of the query and that of each service in the recommended class is calculated. If this matching degree is greater than or equal to the minimum threshold specified by the user, the service is considered potentially relevant and returned to the user; otherwise it is dropped. The proposed approach thus supports efficient classification and, in turn, semantic discovery of Web services.

For example, service advertisements from the automobile sales domain use varied concepts such as vehicle, automobile, car, motorcycle, bike, and price or cost. The concept “automobile” is semantically similar to concepts such as “vehicle”, “car”, “motorcycle” and “bike”; similarly, “cost” is semantically related to “price”. Without an automatic classifier, these services must be classified manually. Despite their different terms, they should all be placed in the same category because they share the same semantic space.
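The classification and discovery steps can be sketched in Python. The paper uses both SVM and kNN classifiers but does not give their settings here, so this sketch implements only a kNN that uses cosine similarity and majority voting. The choice of `k`, the vectors, the labels and the threshold values are hypothetical, as are the helper names. The repository's own vectors double as the training set for simplicity.

```python
from collections import Counter
from math import sqrt

def cosine(u, v):
    # Cosine similarity; 0 when either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norms = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def membership_matrix(labels, categories):
    # Table 2 layout: member[i][j] = 1 if service j belongs to categories[i].
    return [[1 if label == cat else 0 for label in labels] for cat in categories]

def knn_classify(query, training, k=3):
    # training: list of (Sem_Serv vector, category) pairs; majority vote
    # among the k training services most cosine-similar to the query.
    nearest = sorted(training, key=lambda pair: cosine(query, pair[0]), reverse=True)[:k]
    return Counter(cat for _, cat in nearest).most_common(1)[0][0]

def discover(query, repository, threshold, k=3):
    # Classify the query, then return (score, index) for every service of the
    # predicted category whose cosine similarity reaches the user threshold.
    category = knn_classify(query, repository, k)
    hits = [(cosine(query, vec), idx) for idx, (vec, cat) in enumerate(repository)
            if cat == category]
    return category, sorted((h for h in hits if h[0] >= threshold), reverse=True)

# Hypothetical enriched vectors and labels, for illustration only.
repository = [([1.0, 0.9, 0.0], "Automobile"),
              ([0.9, 1.0, 0.1], "Automobile"),
              ([0.0, 0.1, 1.0], "Book")]
query = [1.0, 1.0, 0.0]
```

For this query, both Automobile services score above 0.99. Raising the threshold to 0.997 keeps only the first of them, which shows how the user threshold trades recall for precision.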
In our proposed approach, end users such as service providers and administrators submit service profiles to the automatic classifier. The classifier preprocesses each incoming service to extract its terms and transforms the service profile into a semantic vector using the Omiotis measure of semantic relatedness and an ontology reasoner. The automatic classifier then deduces the appropriate category for the semantic vector of the service to be classified. This helps attain globally consistent decisions irrespective of the end user.
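The enrichment step can be illustrated with a toy example. The sketch below is a minimal Python illustration, assuming a binary service-by-term matrix and a symmetric term-by-term relatedness matrix; the relatedness values are invented for illustration and are not Omiotis scores, the function name is hypothetical, and the ontology-reasoning step is not modelled. Because the relatedness matrix is symmetric, multiplying the service matrix by it on the right yields, row for row, the same vectors as multiplying the relatedness matrix by the transposed service matrix.

```python
def semantic_vectors(service_matrix, relatedness):
    """Return service_matrix x relatedness.

    service_matrix: one binary row per service (1 = term occurs in the profile).
    relatedness: square, symmetric term-by-term matrix of relatedness scores.
    Row i of the result is the semantically enriched vector of service i.
    """
    n_terms = len(relatedness)
    if any(len(row) != n_terms for row in service_matrix):
        raise ValueError("service vectors and relatedness matrix disagree on the number of terms")
    return [
        [sum(row[k] * relatedness[k][j] for k in range(n_terms)) for j in range(n_terms)]
        for row in service_matrix
    ]


# Hypothetical vocabulary and relatedness scores (illustrative, not Omiotis output).
TERMS = ["automobile", "car", "price", "cost"]
RELATEDNESS = [
    [1.0, 0.8, 0.1, 0.1],  # automobile
    [0.8, 1.0, 0.1, 0.1],  # car
    [0.1, 0.1, 1.0, 0.9],  # price
    [0.1, 0.1, 0.9, 1.0],  # cost
]

# Two services that share no term: one uses "automobile"/"price", the other "car"/"cost".
SERVICES = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

if __name__ == "__main__":
    for vec in semantic_vectors(SERVICES, RELATEDNESS):
        print(vec)
```

Although the two binary service vectors are orthogonal (cosine similarity 0), their enriched vectors, approximately [1.1, 0.9, 1.1, 1.0] and [0.9, 1.1, 1.0, 1.1], have a cosine similarity of about 0.99, so a classifier sees the two services as close neighbours.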

5 Empirical evaluation and results

For the implementation and validation of our approach, we used the OWL-S service retrieval test collection OWLS-TC v2, which is available as open source at [26]. Any Web service specification language that provides the required information about a Web service could be used for the Web service description; we chose OWL-S, a widely used language that has been submitted to the W3C. After preprocessing, binary matrices were created for all five representations, i.e. Text, Syn, Sem, Text_Syn and Text_Sem. The terms/concepts were extracted, and the term vectors Text_Text (456), Term_Syn (395), Term_Sem (395), Terms_Text_Syn (851) and Text_Text_Sem (851) were generated for the respective representations. Next, the semantic relatedness matrices Semantic_Matrix_Text (456, 456), Semantic_Matrix_Syn (395, 395), Semantic_Matrix_Sem (395, 395), Semantic_Matrix_Text_Syn (851, 851) and Semantic_Matrix_Text_Sem (851, 851) were calculated; the semantic relatedness value of every term pair was computed with the freely available Omiotis MSR [29]. Once these matrices had been calculated, the semantic vectors were generated by integrating the binary Text, Syn, Sem, Text_Syn and Text_Sem matrices with the corresponding Omiotis-based semantic relatedness matrices: for each representation, the product of the semantic relatedness matrix and the service vectors is calculated, so that each row of the result is the semantic vector of one service. The MATLAB [28] toolkit was used for this integration of the statistical and semantic matrices, yielding the semantically enriched matrices Omiotis_Text, Omiotis_Syn, Omiotis_Sem, Omiotis_Text_Syn and Omiotis_Text_Sem. These semantically enriched service vectors are passed to the machine learning-based classifiers.

Service profiles have been classified using Support Vector Machine (SVM) and k-Nearest Neighbor (kNN) classifiers, for which RapidMiner [27] provides built-in implementations. The kNN algorithm was executed with k = 3. To evaluate the proposed approach, we used three performance measures, namely accuracy, weighted_mean_recall and weighted_mean_precision, and we validated the procedure with 10-fold cross-validation. The performance of the two classifiers for the different representations of the service vectors is given in Table 3.

Table 3 Comparison of performance of SVM and kNN classifiers using the proposed approach (all values in %; Acc = accuracy, WMR = weighted_mean_recall, WMP = weighted_mean_precision)

Representation    | kNN Acc | kNN WMR | kNN WMP | SVM Acc | SVM WMR | SVM WMP
Text              |  92.05  |  87.21  |  93.70  |  94.54  |  93.27  |  94.38
Omiotis_Text      |  93.84  |  90.05  |  94.20  |  94.54  |  93.22  |  94.20
Syn               |  84.41  |  81.28  |  87.64  |  94.04  |  89.88  |  96.54
Omiotis_Syn       |  90.07  |  89.05  |  91.71  |  94.04  |  91.41  |  93.60
Sem               |  91.36  |  89.11  |  87.76  |  96.82  |  96.75  |  96.29
Omiotis_Sem       |  93.54  |  91.97  |  94.83  |  96.13  |  95.18  |  95.21
Text_Syn          |  92.06  |  89.29  |  92.72  |  95.63  |  94.40  |  95.57
Omiotis_Text_Syn  |  94.03  |  91.85  |  95.12  |  95.93  |  95.90  |  96.27
Text_Sem          |  92.95  |  90.25  |  93.55  |  97.02  |  96.51  |  96.74
Omiotis_Text_Sem  |  94.83  |  92.82  |  96.14  |  97.22  |  97.38  |  96.89

5.1 Discussion

The proposed approach has been evaluated with two classifiers, whose performance is discussed below.

5.1.1 Performance of kNN classifier

The proposed approach was evaluated for all five representations of the services, and the classification performance was measured in terms of accuracy, weighted_mean_recall and weighted_mean_precision. For every representation, the classification accuracy of the proposed approach was higher than that of its baseline counterpart. Initially, the Text representation of services was considered, and its accuracy, weighted_mean_recall and weighted_mean_precision were compared with those of Omiotis_Text. After enrichment with Omiotis-based semantic information, the Text representation improved by 1.79 percentage points in accuracy, 2.84 in weighted_mean_recall and 0.50 in weighted_mean_precision. Similarly, integrating Omiotis-based information into the Text_Syn representation of the service profiles improved accuracy, weighted_mean_recall and weighted_mean_precision by 1.97, 2.56 and 2.40 percentage points, respectively. Performance improved further when both logical subsumption reasoning and Omiotis-based semantic information were incorporated into the service profiles, as in Omiotis_Text_Sem, which outperformed all the other representations with an accuracy of 94.83 %, a weighted_mean_recall of 92.82 % and a weighted_mean_precision of 96.14 %. This is a clear improvement of 1.88, 2.57 and 2.59 percentage points in accuracy, weighted_mean_recall and weighted_mean_precision, respectively, over its baseline counterpart Text_Sem.

5.1.2 Performance of SVM classifier

After enrichment with Omiotis-based semantic information, the highest weighted_mean_recall and weighted_mean_precision, 97.38 and 96.89 % respectively, were attained for the Omiotis_Text_Sem representation, which outperformed all other representations of the services. Thus, it can be concluded that combining text, logical reasoning and Omiotis-based semantic information can improve service classification. The accuracy of the baseline and the proposed approach for the kNN and SVM classifiers is compared in Fig. 5, which shows that the Omiotis_Text_Sem representation achieves the highest accuracy for both classifiers: 94.83 % for kNN and 97.22 % for SVM. These results show that combining the textual description, service signatures, subsumption reasoning and Omiotis-based information yields better classification accuracy. With kNN, the proposed approach achieved higher accuracy than the baseline for every service representation; with SVM, accuracy improved for the combined Text_Syn and Text_Sem representations, was unchanged for Text and Syn, and was slightly lower for Sem (96.13 versus 96.82 %). The classification accuracy of kNN and SVM is compared in Fig. 6; consistent with Table 3, SVM achieves higher accuracy than kNN for every representation.

The proposed automatic classifier determines the most suitable category based on the required functionality, so that service providers, repository administrators and consumers can find service categories consistently, which enables the discovery and reuse of existing services.

Fig. 5 Comparative view of accuracy of kNN and SVM for baseline versus proposed approach

Fig. 6 Accuracy comparison of kNN and SVM classifiers
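A minimal, self-contained Python sketch of the evaluation protocol (kNN with k = 3, 10-fold cross-validation, and accuracy, weighted_mean_recall and weighted_mean_precision) is given below. It is not RapidMiner's implementation: using cosine similarity as the kNN similarity, weighting all classes equally unless class_weights is supplied, and pooling the predictions of all folds before scoring are assumptions of this sketch, and the function names are hypothetical.

```python
import math
import random
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training vectors most similar to x."""
    ranked = sorted(range(len(train_X)), key=lambda i: cosine(train_X[i], x), reverse=True)
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def classification_metrics(y_true, y_pred, class_weights=None):
    """Accuracy plus weighted means of per-class recall and precision.

    Classes are those occurring in y_true; each class has weight 1 unless
    class_weights (a dict mapping class -> weight) is given.
    """
    classes = sorted(set(y_true))
    weights = class_weights or {c: 1.0 for c in classes}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    recall_sum = precision_sum = 0.0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        actual = sum(t == c for t in y_true)
        predicted = sum(p == c for p in y_pred)
        recall_sum += weights[c] * tp / actual
        precision_sum += weights[c] * (tp / predicted if predicted else 0.0)
    total = sum(weights[c] for c in classes)
    return {
        "accuracy": accuracy,
        "weighted_mean_recall": recall_sum / total,
        "weighted_mean_precision": precision_sum / total,
    }


def cross_validate_knn(X, y, folds=10, k=3, seed=0):
    """k-fold cross-validation of kNN; predictions of all folds are pooled before scoring."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    y_true, y_pred = [], []
    for f in range(folds):
        test_idx = order[f::folds]  # every folds-th shuffled index forms one fold
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            y_true.append(y[i])
            y_pred.append(knn_predict(train_X, train_y, X[i], k))
    return classification_metrics(y_true, y_pred)
```

Passing class_weights, for example the number of services in each category, turns the equal-weight averages into support-weighted ones; averaging per-fold scores instead of pooling predictions is another common convention.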

6 Conclusion and future work

In this paper, a semantic approach for the automatic classification of Web services is proposed. It involves machine learning, data mining, logical reasoning and measures of semantic relatedness. The proposed hybrid approach aims to improve classification accuracy by merging semantic information into the service profiles: it transforms the Web service profiles into Omiotis-based, semantically enriched vector representations, which are used as input to the machine learning-based classifiers. For experimental evaluation and validation of the results, supervised machine learning-based support vector machines (SVM) and k-Nearest Neighbor (kNN) classifiers have been used. The proposed approach has been evaluated for different data representation models of the OWLS-TC service test collection. Empirical evaluation has shown that Web service classification accuracies of 94.83 % for kNN and 97.22 % for SVM can be obtained with the proposed approach. Thus, by automatically assigning an appropriate category to a service, human intervention in the classification process can be minimized and service discovery can be improved. In future work, we plan to implement the proposed approach for other service description languages.

References

1. Dashofy E, Hoek M, Taylor A (2001) A highly-extensible, XML-based architecture description language software architecture. In: Proceedings of working IEEE/IFIP conference on software architecture, Amsterdam, pp 103–112
2. Curbera F, Duftler M, Khalaf R, Nagy W, Mukhi N, Weerawarana S (2002) Unraveling the Web services web: An introduction to SOAP, WSDL, and UDDI. IEEE Internet Comput 6(2):86–93
3. World Wide Web Consortium (W3C), Web Services Description Language (WSDL) 1.1. (2001). http://www.w3.org/TR/wsdl/
4. UDDI Technical White Paper (2000). http://www.uddi.org/pubs/Iru-UDDI-Technical-White-Paper.pdf

5. Corella M, Castells P (2006) A heuristic approach to semantic web service classifications. In: Proceedings of the 10th international conference on knowledge-based and intelligent information & engineering systems, LNAI 4253, Springer, pp 598–605
6. Tsatsaronis G, Varlamis I, Vazirgiannis M (2010) Text relatedness based on a word thesaurus. Artif Intell Res 37:1–39
7. Nasir J, Karim A, Tsatsaronis G, Varlamis I (2011) A knowledge-based semantic kernel for text classification. In: Proceedings of the 18th international conference on string processing and information retrieval, pp 261–266
8. Patil A, Oundhakar S, Sheth A, Verma K (2004) METEOR-S Web Service annotation framework. In: Proceedings of the 13th international conference on World Wide Web, ACM Press, pp 553–562
9. Duo Z, Zi L, Bin X (2005) Web service annotation using ontology mapping. In: IEEE international workshop on service-oriented system engineering, pp 235–242
10. Oldham N, Thomas C, Sheth A, Verma K (2004) METEOR-S Web Service annotation framework with machine learning classification. In: Semantic web services and web process composition, vol 3387 LNCS, CA, 2004, Springer, pp 137–146
11. Heß A, Kushmerick N (2003) Learning to attach semantic metadata to web services. In: International semantic web conference, vol 2870 of LNCS, Springer, 2003, pp 258–273
12. Corella M, Castells P (2006) Semi-automatic semantic-based web service classification. In: Business process management workshops, vol 4103 of LNCS, Vienna, Austria, Springer, pp 459–470
13. Batra S, Bawa S (2011) Semantic discovery of web services using principal component analysis. Phys Sci 6(18):4466–4472
14. Crasso M, Zunino A, Campo M (2008) AWSC: An approach to Web services classification based on machine learning techniques. CONICET, Inteligencia Artificial, Revista Iberoamericana de Inteligencia Artificial 12(37):25–36
15. Mohanty R, Ravi V, Patra M (2012) Classification of web services using Bayesian network. Softw Eng Appl 5:291–296
16. Katakis I, Meditskos G, Tsoumakas G, Bassiliades N, Vlahavas I (2009) On the combination of textual and semantic descriptions for automated semantic web service classification. In: Artificial intelligence applications and innovations III, IFIP, vol 296, pp 95–104
17. Kaur I, Hornof A (2005) A comparison of LSA, WordNet and PMI for predicting user click behaviour. In: Proceedings of the SIGCHI conference on human factors in computing, pp 51–60
18. Bouma G (2009) Normalized (pointwise) mutual information in collocation extraction. In: Proceedings of GSCL, pp 31–40
19. Cilibrasi R, Vitanyi P (2004) The Google similarity distance. ArXiv.org or Clustering by Compression. IEEE Trans Inf Theory 51(4):1523–1545
20. Gabrilovich E, Markovitch S (2007) Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In: Proceedings of the 20th international joint conference on artificial intelligence (IJCAI), Hyderabad, pp 1606–1611
21. Resnik P (1995) Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of the 14th international joint conference on artificial intelligence, vol 1. Morgan Kaufmann Publishers Inc., San Francisco, pp 448–453
22. Lin D (1998) An information-theoretic definition of similarity. In: Proceedings of the fifteenth international conference on machine learning (ICML ’98), San Francisco, pp 296–304
23. Jiang J, Conrath W (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In: Proceedings of international conference on research on computational linguistics, pp 90–98
24. Guo G, Wang H, Bell D, Bi Y, Greer K (2004) An kNN model approach and its applications in text categorization, vol 2945 of LNCS, Springer, pp 559–570
25. Hearst M, Dumais S, Osman E, Platt J, Scholkopf B (1998) Support vector machines. Intell Syst Appl IEEE 13(4):18–28

26. http://projects.semwebcentral.org/projects/owls-tc/
27. http://rapid-i.com/
28. http://www.mathworks.in/
29. http://Omiotis.hua.gr/WebSite/wsinfo.html

Shailja Sharma is working as a System Analyst at Kurukshetra University, Kurukshetra. She received her B.Tech. from Kurukshetra University, Kurukshetra, and her M.E. in Computer Science and Engineering from Thapar University, Patiala. She is a research scholar in the area of Semantic Web services in the Department of Computer Applications, National Institute of Technology, Kurukshetra. Her research interests focus on the Semantic Web, Web services, data mining and machine learning.

Jagdeep Singh Lather is working as a Professor in the Department of Electrical Engineering and Computer Applications, National Institute of Technology, Kurukshetra. He holds B.Tech., M.Tech. and doctorate degrees in Electrical Engineering. He has over 15 years of experience in teaching and research. His areas of interest are robust control, flexible AC transmission systems, mining techniques and decision trees, ANN and fuzzy logic. He is a lifetime member of scientific and professional societies, including ISTE, India.

Prof. Mayank Dave received his B.Tech. degree from Aligarh Muslim University, Aligarh, India, in 1989, and his M.Tech. degree in Computer Science and Technology and his Ph.D. degree from IIT Roorkee, India, in 1991 and 2002, respectively. He is currently a Professor in the Department of Computer Engineering at National Institute of Technology Kurukshetra (NIT Kurukshetra), India, with over 24 years of experience in teaching and research. He has published approximately 130 research papers in various international/national journals and conferences. His research interests include mobile ad hoc and sensor networks, cyber security, cloud computing and software engineering. He is a senior member of IEEE and a life member of the IETE, the Computer Society of India and the Institution of Engineers (India).
