Enhancing service discovery using cat swarm optimisation based web ...

+Model PISC-353; No. of Pages 3

ARTICLE IN PRESS

Perspectives in Science (2016) xxx, xxx—xxx

Available online at www.sciencedirect.com

ScienceDirect journal homepage: www.elsevier.com/pisc

Enhancing service discovery using cat swarm optimisation based web service clustering☆

Sunaina Kotekar, Sowmya S. Kamath∗

Department of Information Technology, National Institute of Technology Karnataka, Surathkal, India

Received 20 February 2016; accepted 9 June 2016
Available online xxx

KEYWORDS Web service discovery; WSDL; CSO; Clustering; Swarm intelligence

Summary Web service discovery is a critical task in service oriented application development. Due to extensive proliferation in the number of available services, it is challenging to obtain all the relevant services available for a given task. For the retrieval of most relevant Web services, a user would have to use those service-specific terms that best describe and match the natural language documentation contained within a service description. This process can be time intensive, due to functional diversity of available services in a repository. Domain specific clustering of Web Services based on the similarities of their functionalities would greatly boost the ability of a Web service search engine to retrieve the most relevant service. In this paper, we propose a novel technique to cluster service documents into functionally similar service groups using the Cat Swarm Optimisation Algorithm. We present experimental results that show that the proposed technique was effective and enhanced the process of service discovery. © 2016 Published by Elsevier GmbH. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

☆ This article belongs to the special issue on Engineering and Material Sciences.
∗ Corresponding author. Tel.: +91 9741799088. E-mail addresses: [email protected] (S. Kotekar), [email protected] (S.S. Kamath).

http://dx.doi.org/10.1016/j.pisc.2016.06.068
2213-0209/© 2016 Published by Elsevier GmbH. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Please cite this article in press as: Kotekar, S., Kamath, S.S., Enhancing service discovery using cat swarm optimisation based web service clustering. Perspectives in Science (2016), http://dx.doi.org/10.1016/j.pisc.2016.06.068

Introduction

Web services (WS) are client and server applications that communicate over standard Web protocols like HTTP and HTTPS. The main components of a web service architecture are the provider, the consumer and a service broker such as UDDI (Universal Description, Discovery and Integration). UDDI holds service descriptions in WSDL (Web Services Description Language) format, each of which describes the functionality of a particular WS. The task of searching for relevant WS for a given requirement is normally based on the service name and its natural language description. However, as per many studies (Elgazzar et al., 2010), most services may not have well-described natural language documentation. To overcome this limitation, text mining techniques can be applied to the WSDL to identify useful components that describe the actual functionality of the corresponding WS. Using this functionality-related information, WSDLs can be clustered so as to reduce the search space during service discovery and selection (Zhu et al., 2012).

In document clustering, the chosen algorithm plays a major role. Traditional algorithms include K-means, Hierarchical Agglomerative and Suffix Tree clustering. Ant Colony Optimisation (ACO), Particle Swarm Optimisation (PSO), Genetic Algorithms (GA) etc. are popular swarm intelligence based clustering algorithms (Abraham et al., 2008). In this paper, we adapted a more recent approach, the Cat Swarm Optimisation (CSO) algorithm (Chu et al., 2006), for WS clustering; it is based on a cat's social and foraging behaviour in nature. We applied the adapted CSO algorithm to a set of WSs to determine similar groups. The clustering accuracy of the CSO algorithm is compared with that of the traditional basic K-means clustering algorithm, and the results are presented in this paper.

Proposed system

Figure 1

The proposed methodology addresses the problem of extracting the functional information of services and using it to automatically categorise a set of WSs in a domain-specific manner. First, each WSDL document is pre-processed using basic NLP techniques such as stop-word removal and stemming to obtain natural language terms, referred to as 'Attributes'. Each document is then weighted using TF-IDF (Term Frequency-Inverse Document Frequency) (Eqs. (1)-(3)), and the nearness and dissimilarity between the documents are calculated using Euclidean distance (Eq. (4)).

$$ \text{tf-idf}_{i,j} = \text{tf}_{i,j} \times \text{idf}_{j} \tag{1} $$

$$ \text{tf}_{i,j} = \frac{\text{number of times attribute } j \text{ occurs in document } i}{\text{total number of words in document } i} \tag{2} $$

$$ \text{idf}_{j} = \log \frac{N}{\left| \{ d \in D : j \in d \} \right|} \tag{3} $$

where N is the total number of documents in the collection D, and the denominator of Eq. (3) is the number of documents that contain attribute j.
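As an illustration of Eqs. (1)-(3), the weighting could be computed along the following lines. This is only a minimal sketch: the stop-word list, the whitespace tokeniser and the function names are hypothetical, stemming is omitted, the natural logarithm is assumed for Eq. (3), and the word count in Eq. (2) is taken after stop-word removal.

```python
import math
from collections import Counter

# Illustrative stop-word list only; a real pipeline would use a fuller list
# and would also apply stemming, which is left out of this sketch.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "to", "in"}


def tokenize(text):
    """Lower-case the text, split on whitespace and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tf_idf_matrix(documents):
    """Return (vocabulary, matrix) with matrix[i][j] = tf-idf of attribute j in document i."""
    tokenized = [tokenize(d) for d in documents]
    vocabulary = sorted({w for doc in tokenized for w in doc})
    n_docs = len(tokenized)
    # Number of documents that contain each attribute (denominator of Eq. (3)).
    doc_freq = {w: sum(1 for doc in tokenized if w in doc) for w in vocabulary}

    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc)
        row = []
        for w in vocabulary:
            tf = counts[w] / total if total else 0.0   # Eq. (2)
            idf = math.log(n_docs / doc_freq[w])       # Eq. (3), natural log assumed
            row.append(tf * idf)                       # Eq. (1)
        matrix.append(row)
    return vocabulary, matrix
```

An attribute that occurs in every document receives a weight of zero under this definition, since its inverse document frequency is log 1 = 0.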

Both the k-means and the CSO algorithms are then applied to the dataset, and the documents are clustered using the computed distance values. Once a predefined stopping condition is reached, the resulting clusters are examined. The standard datasets chosen are Iris, Glass, Balance scale, Soybean small and Wine. In addition, WSDL documents taken from OWLS-TC4 are processed to create a TF-IDF matrix, with their domains used as class labels.

K-means clustering

In the k-means algorithm, the number of clusters to be formed is given by k. Initially, k documents are chosen at random as cluster centres. Every document is assigned to its nearest centre by calculating the Euclidean distance between centre and document as per Eq. (4). The mean of all the documents in each cluster is found, and the one with the least value is made the new cluster centre. The documents are then reassigned according to the newly calculated Euclidean distances, and the process continues until no more reassignments are possible, i.e., a stable clustering has been reached.

$$ d(x, y) = \lVert x - y \rVert_{2} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{4} $$

Figure 1 Flowchart of CSO algorithm.
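The procedure described above can be sketched as follows. The sketch assumes the textbook variant of k-means, in which the new centre of a cluster is simply the mean of its members; the function names, the `seed` argument and the `max_iter` cap are illustrative choices, not taken from the paper.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors (Eq. (4))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def k_means(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; returns (labels, centres)."""
    rng = random.Random(seed)
    # Choose k distinct documents at random as the initial centres.
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centres[c])) for p in points
        ]
        if new_labels == labels:
            break  # no reassignments: stable clustering reached
        labels = new_labels
        # Recompute each centre as the mean of its members (assumed variant).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres
```

Because the loop stops as soon as an assignment pass changes nothing, the result depends on the random initial centres and may be a local optimum, which is the behaviour the CSO variant is intended to improve on.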

Cat swarm clustering

The CSO (Santosa and Ningrum, 2009) algorithm consists of two sub-procedures modelled on the behaviour of cats hunting prey, termed the "seeking mode" and the "tracing mode". In CSO, the number of cats used in each iteration is initialised first; each cat has a position of D dimensions, a velocity for every dimension, a fitness value that indicates how well the cat satisfies the fitness function, and a flag that identifies the mode of the cat (seeking/tracing). The final solution is the best position found by one of the cats. CSO is applied until the best clustering is obtained, i.e. the one with the least computed SSE (sum of squared errors) value. Fig. 1 presents the process of CSO; the sub-modules of CSO are explained below.

Seeking mode: The four fundamental aspects of seeking mode are: seeking memory pool (SMP), the number of copies made of each cluster centre; self-position consideration (SPC), a random Boolean value, 0 or 1; seeking range of the selected dimension (SRD), a mutative ratio in [0, 1]; and counts of dimension to change (CDC).
1. Define the seeking mode specifications (SMP, SPC and SRD).
2. For every cluster centre: replicate the cluster centre position SMP times, find j = SMP - SPC, and determine the shifting value (SRD * cluster centre).
3. Set m = 1. While m is less than j, add or subtract the shifting value to the centres at random. (SMP × k cluster centre candidates are produced.)
4. Determine distances, assign data to clusters, then find the SSE.
5. Use the roulette wheel selection method to choose a new cluster centre candidate.
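A rough sketch of the two modes for a single cluster centre is given below. It is not the authors' implementation, and several details are assumptions made for illustration: every dimension of a copy is shifted (CDC is ignored), a candidate is scored by the SSE of Eq. (5) with the other centres held fixed, the roulette wheel weights are taken as (worst SSE - candidate SSE) so that lower SSE is favoured, and the tracing update applies the velocity and position rules of Eqs. (6) and (7) with one random value per call. All function and parameter names are hypothetical.

```python
import random


def sse(points, centres):
    """Sum of squared distances from each point to its nearest centre (Eq. (5))."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres)
    return total


def seeking_step(points, centres, idx, smp=5, spc=1, srd=0.2, rng=random):
    """One seeking-mode move for centres[idx]; returns the chosen candidate."""
    centre = centres[idx]
    candidates = [list(centre) for _ in range(smp)]      # SMP copies of the centre
    n_moved = smp - spc                                   # j = SMP - SPC
    for cand in candidates[:n_moved]:
        for d in range(len(cand)):
            shift = srd * centre[d]                       # shifting value
            cand[d] += shift if rng.random() < 0.5 else -shift
    # Score each candidate with the remaining centres left unchanged (assumption).
    scores = [
        sse(points, centres[:idx] + [c] + centres[idx + 1:]) for c in candidates
    ]
    worst = max(scores)
    weights = [worst - s for s in scores]                 # lower SSE, larger weight
    if sum(weights) == 0:
        weights = [1.0] * len(candidates)                 # all equal: pick uniformly
    return rng.choices(candidates, weights=weights, k=1)[0]


def tracing_step(position, velocity, best_position, c1=2.0, rng=random):
    """Velocity and position update of Eqs. (6) and (7) for one cat."""
    r1 = rng.random()                                     # random value in [0, 1)
    new_velocity = [
        v + r1 * c1 * (b - x)
        for v, x, b in zip(velocity, position, best_position)
    ]
    new_position = [x + v for x, v in zip(position, new_velocity)]
    return new_position, new_velocity
```

A full run would alternate these steps over all centres, according to each cat's mode flag, and keep the set of centres with the lowest SSE seen so far; that outer loop is left out here because the paper does not spell out how cats are split between the two modes.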


Table 1 Purity of cluster formation for different datasets for k-means and CSO algorithms.

Dataset name      No. of documents   Attributes   Classes   K-means purity (%)   CSO purity (%)
Iris              150                4            3         67                   90
Glass             214                9            6         54                   58
Balance scale     625                4            3         61                   78
Soybean small     47                 35           4         79                   83
Wine              178                13           3         70                   72
WSDL documents    684                644          9         41                   45
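For reference, a purity score of the kind reported in Table 1 could be computed as in the short sketch below. It assumes that Eq. (8) is the usual purity measure: for each cluster, count the documents of its most frequent class, sum these counts and divide by the total number of documents. The function name and the label encoding are illustrative.

```python
from collections import Counter


def purity(cluster_labels, class_labels):
    """Fraction of documents that fall in the majority class of their cluster."""
    per_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        per_cluster.setdefault(cluster, Counter())[cls] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(class_labels)
```

For example, with cluster labels `[0, 0, 0, 1, 1, 1]` and classes `["a", "a", "b", "b", "b", "a"]`, each cluster holds two documents of its majority class, so the purity is 4/6; multiplying by 100 gives the percentage form used in the table.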

The SSE of a clustering is computed as

$$ SSE = \sum_{i=1}^{k} \sum_{x \in D_i} \lVert x - m_i \rVert^{2} \tag{5} $$

where D_i denotes the i-th cluster and m_i its centre.

Tracing mode: This is the sub-model depicting cats tracing prey.
1. For every cluster centre: update the velocity (Eq. (6)), update the position (Eq. (7)) and find the new cluster centre.
2. Determine distances, assign data to clusters, then find the SSE.

$$ v_{k,d} = v_{k,d} + r_1 \times c_1 (x_{best,d} - x_{k,d}) \tag{6} $$

$$ x_{k,d} = x_{k,d} + v_{k,d} \tag{7} $$

where $x_{best,d}$ is the position of the cat with the best fitness value, $x_{k,d}$ is the position of cat k, $c_1$ is a constant and $r_1$ is a randomly generated value between 0 and 1.

Experimental analysis and results

The experiment was conducted to cluster the chosen standard datasets as well as the WSDL documents. The clustering purity, or accuracy, was calculated using Eq. (8), where k denotes the number of clusters and j the number of classes. In the case of the WSDL documents, the domain to which each WS belongs is taken into account when calculating the accuracy. The domains were "communication", "economy", "education", "food", "geography", "medical", "simulation", "travel" and "weapon" (Fig. 2 and Table 1).

$$ \text{Purity} = \frac{\sum_{1}^{k} \max_{1}^{j} (\text{documents belonging to each class})}{\text{total number of documents}} \tag{8} $$

Figure 2 Purity comparison graph of different datasets, k-means vs CSO.

Conclusion and future work

In this paper, an approach for categorising Web services to deal with their functional diversity was discussed. For clustering the services, both the K-means and the bio-inspired CSO clustering algorithms were applied, and their performance was compared on both the standard datasets and the WSDL dataset. In the results, CSO achieved higher purity than K-means on all six datasets. K-means stops once the documents are stable in their clusters, whereas in the CSO tracing mode the centres are changed randomly to check for better clusters. We intend to extend the proposed clustering methodology to optimise real-time Web service search engines, in order to improve both time and precision during Web service discovery.

References

Abraham, A., Das, S., Roy, S., 2008. Swarm Intelligence Algorithms for Data Clustering. Soft Computing for Knowledge Discovery and Data Mining, pp. 279-313.
Chu, S.C., Tsai, P.W., Pan, J.S., 2006. Cat Swarm Optimization, LNAI 4099, 3 (1). Springer-Verlag, Berlin/Heidelberg, pp. 854-858.
Elgazzar, K., Hassan, A.E., Martin, P., 2010. Clustering WSDL documents to bootstrap the discovery of web services. In: The 8th IEEE International Conference on Web Services (ICWS'10), Miami, FL, July 2010, pp. 147-154.
Santosa, B., Ningrum, M.K., 2009. Cat swarm optimization for clustering. In: International Conference of Soft Computing and Pattern Recognition, 2009 (SOCPAR'09). IEEE.
Zhu, J., Kang, Y., Zheng, Z., Lyu, M.R., 2012. A Clustering-Based QoS Prediction Approach for Web Service Recommendation. IEEE Paper.
