Service Discovery Acceleration with Hierarchical Clustering Zijie Cong · Alberto Fernandez · Holger Billhardt · Marin Lujak
This is the accepted manuscript version. The final publication is available at link.springer.com DOI: http://dx.doi.org/10.1007/s10796-014-9525-2
Abstract This paper presents an efficient Web Service discovery approach based on hierarchical clustering. Conventional web service discovery approaches usually organize the service repository as a flat list, so service matchmaking is performed with linear complexity. In this work, the services in a repository are clustered using hierarchical clustering algorithms with a distance measure taken from an attached matchmaker. Service discovery is then performed over the resulting dendrogram (binary tree). In comparison with conventional approaches, which mostly perform exhaustive search, we show that the service-clustering method brings a dramatic improvement in time complexity with an acceptable loss in precision.
Work supported by the Spanish Government through grants TIN2009-13839-C03-02 (co-funded by Plan E), CSD2007-0022 (CONSOLIDER-INGENIO 2010) and TIN2012-36586-C03-02.

Z. Cong · A. Fernandez · H. Billhardt · M. Lujak
Artificial Intelligence Research Group, University Rey Juan Carlos, Calle Tulipan s/n, Mostoles, Spain
E-mail: [email protected]
Keywords Service discovery · Hierarchical clustering · Service matchmaking
1 Introduction

With the recent increase in popularity and attention from both the scientific and industrial communities, systems designed upon the Service Oriented Architecture (SOA) pattern are widely visible in practice. This category of system design principles is based on loosely coupled functional units termed "services". To facilitate the routine tasks in SOA, combined research efforts have contributed standardized service description languages [2, 18, 13, 3, 23], automated service discovery tools [11, 22, 12, 10, 6] and service composition approaches [27].

Among the various SOA operations, web service discovery is an integral link. Although contemporary service matchmakers have achieved remarkable precision and recall scores [1], the efficiency of matchmaking is usually not a critical concern. Conventional approaches perform a rudimentary comparison between a service query and each service advertisement registered in the service repository. Such matchmaking has linear complexity, and as the service repository grows, the resulting query response time becomes hardly acceptable. Sophisticated semantic matchmaking mechanisms often prolong the matching operation, which worsens this situation. An efficient matchmaking method is thus required for service-based systems to achieve scalability and practicality. An adequate example would be a large open multi-agent system where a significantly large number of services exist to describe the agents' capabilities.

In this paper, an approach that organizes the services registered in a repository into clusters is presented. Services are clustered using hierarchical clustering algorithms based on
the chosen distance measure. This measure should be identical to the measure used by the matchmaker attached to the service directory. The dendrograms produced by hierarchical clustering are binary trees; service discovery is then performed over a binary service tree instead of a linear service list, thereby reducing the time complexity. Additional benefits of the proposed method are its independence from the matchmaker and from the service description language.

The rest of the paper is organized as follows. Section 2 introduces related work, including service matchmaking, clustering techniques applied to service discovery and related fields. Section 3 briefly introduces a neutral service description model that is used throughout this paper and a common distance measure. Section 4 presents the main approach of this paper in detail, with an example presented in Section 5. Descriptions of the experiments and their evaluation are presented in Section 6. Finally, Section 7 concludes this work.

2 Related Works

Since this work combines efforts from various research fields, related techniques and background information are briefly introduced in this section, including Web Service description, discovery and document clustering.
2.1 Web Service Description and Discovery

Web Service description has gone through a rapid development in the last decade. Various Web Service description languages and underlying models have been proposed by both industrial and academic entities. One of the earliest efforts on standardizing Web Service description resulted in a W3C (World Wide Web Consortium) recommendation named the Web Service Description Language (WSDL) [2]. WSDL provides syntactic information about the functionalities of a service with low-level message exchange descriptions. Since this model is designed primarily as a human-readable document, service discovery mechanisms built for WSDL are mostly based on Information Retrieval techniques such as TF-IDF/cosine similarity [17].

Semantic annotations were proposed to enhance service discoverability, invocability and compatibility in fully automated environments. The two most well-known Semantic Web Service description approaches are Semantic Markup for Web Services (OWL-S) [18] and the Web Service Modeling Ontology (WSMO) [3]. Both approaches are ontology-based models and have their grounding in WSDL (though other formats are technically acceptable as well). Lightweight semantic approaches exist as well in standards and practice; Semantic Annotation for WSDL (SAWSDL) [13] is such an effort that brings semantic annotation into plain, existing WSDL documents.

To achieve precise service discovery, matchmakers that process both syntactic and semantic information have been proposed. Common Semantic Web Service matchmakers evaluate the subsumption relations of inputs and outputs between a service request and a service advertisement to determine the degree of match (DOM). In [21], the authors proposed four DOMs of semantic matching: exact, plug-in, subsumes and fail. Semantic inputs and outputs are matched based on their ontological subsumption relation. This measure has since been widely adopted by various matchmaker implementations [11, 12, 6]. In practice, matchmakers usually have hybrid implementations, i.e. the final degree of match is an aggregated score of both semantic and syntactic information.

While most of the previously mentioned matchmakers perform exhaustive search in the service repository, concerns about the efficiency of matchmaking, and solutions to it, are presented explicitly or implicitly in the literature. An example is the use of a text search engine, such as Lucene, in semantic web service matchmakers; e.g. in [12], an inverted index of syntactic information (tags) is used to optimize the query speed.
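Returning to the degree-of-match classification of [21], the following minimal Python sketch classifies a single requested/advertised output pair against a toy concept hierarchy. The hierarchy, the helper names and the simplified treatment of the exact case are assumptions made purely for illustration; they are not part of [21] or of any concrete matchmaker.

```python
# Toy sketch of Paolucci-style degree-of-match classification for one
# requested/advertised output pair. The concept hierarchy below is an
# illustrative assumption; the refinement of "exact" to direct subclasses
# is omitted for brevity.

# Toy concept hierarchy: concept -> direct parent
SUBSUMPTION = {
    "Missile": "Weapon",
    "Weapon": "Thing",
    "Funding": "Thing",
}

def ancestors(concept):
    """Return the strict ancestors of a concept in the toy hierarchy."""
    result = set()
    while concept in SUBSUMPTION:
        concept = SUBSUMPTION[concept]
        result.add(concept)
    return result

def degree_of_match(requested, advertised):
    """Classify one advertised output against one requested output."""
    if advertised == requested:
        return "exact"
    if advertised in ancestors(requested):
        return "plug-in"   # the advertised concept subsumes the requested one
    if requested in ancestors(advertised):
        return "subsumes"  # the requested concept subsumes the advertised one
    return "fail"

print(degree_of_match("Missile", "Weapon"))   # plug-in
print(degree_of_match("Weapon", "Missile"))   # subsumes
print(degree_of_match("Funding", "Funding"))  # exact
```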
2.2 Document Clustering

Essentially, web service discovery based on service descriptions is a special case of the document retrieval task, and researchers in service discovery can usually benefit from the mature results of the information retrieval (IR) field. Document clustering [14, 8, 24] is a common technique in IR for grouping documents into topics or organizing search engine results. Document clustering usually requires no training samples; its general goal is to partition a set of unlabeled documents into a number of clusters through an unsupervised process. This process groups certain documents into one cluster based on a distance measure; common distance measures include Euclidean distance, Manhattan distance, edit distance, etc.

Efforts to cluster web services can also be found, such as [4, 7, 16]. The approach presented in [4] performs centroid-based clustering (Quality Threshold) over a service directory and uses the resulting clusters as answers to service queries; no explicit performance evaluation was presented in that work. These works emphasized feature extraction for non-semantic web service descriptions and discovery precision.
3 Service Description and Similarity Measures
Despite the fact that there exist a number of Web Service description models, as mentioned in the previous section, a neutral service description model is employed in this paper to facilitate the presentation of the rest of the work. It should be pointed out that the approach presented in this work is independent from the description model and the matchmaking strategies, i.e. there is no explicit reason to choose one description model or matchmaker over another.

This neutral model is named GCM [6]; it captures various components commonly found in other service description models, ranging from semantically annotated inputs/outputs to natural language text descriptions. It currently covers the following components:

Semantic Inputs/Outputs: Semantically annotated inputs and outputs are found in OWL-S, SAWSDL, WSMO-Lite, etc. These components are usually associated with ontological concepts and represented by URIs.
Syntactic Inputs/Outputs: Syntactic inputs/outputs are found in WSDL, represented by IDs of XML Schema elements, usually nouns or noun phrases.
Text in Natural Language: A textual description can be found explicitly in OWL-S, WSMO or a human agent's service query description.
Syntactic Tags: Syntactic tags (keywords) can be provided directly by the service provider or extracted from the textual description.
Preconditions/Effects: Preconditions and effects can be provided by highly expressive service description models such as WSMO and OWL-S.
Category: The categorization of a service can be given by its provider according to a service classification system, e.g. the North American Industry Classification System (NAICS) [20].

In this work, semantic/syntactic IOs and syntactic tags are used for distance computation, since the services involved in the experiments contain no other information. Compared to complete models, e.g. OWL-S and WSMO, GCM is lightweight and simple to manipulate due to its disregard of the process model (flow control) and the grounding information.

The distance between two services is a weighted sum of the distances of their components (each component being a collection of concepts). Even though GCM contains both semantic and syntactic information, the distance of both types is computed based on the semantic subsumption relation. Domain ontologies are naturally used in the case of semantic components, and the lexical database WordNet [19] is used as an ontology in the case of syntactic information.

The function conDist (Equation 1) computes the distance between two concepts, C1 and C2, in a given ontology O, as defined in [15]:

conDist_O(C_1, C_2) = \begin{cases} 0 & \text{if } C_1 = C_2 \\ 1 - e^{-\alpha l} \, \dfrac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}} & \text{otherwise} \end{cases}   (1)

where α ≥ 0 and β ≥ 0 are parameters scaling the contribution of the shortest path length (l) between the two concepts and the depth (h) of their common subsumer in the concept hierarchy, respectively.

The component distance is defined as the average distance of the best-matched concepts from both components, calculated by Equation 2:

cDist(P_1, P_2) = \dfrac{\sum_{\sigma \in P_1} \min_{\tau \in P_2} conDist(\sigma, \tau) \; + \; \sum_{\tau \in P_2} \min_{\sigma \in P_1} conDist(\tau, \sigma)}{|P_1 \cup P_2|}   (2)

As with the choice of description model, any distance measure for Web Services can be used for clustering, subject to two constraints:
– Symmetric: the distance measure should be symmetric.
– Fine-grained: to obtain a meaningful clustering result, a fine-grained score should be used instead of the categorical degrees of match presented in [21].

The overall distance is calculated by the function serviceDist (Equation 3), where distSynT, distSemI, distSemO, distSynI and distSynO denote the distances of the syntactic tags, semantic inputs, semantic outputs, syntactic inputs and syntactic outputs between two services, respectively. The weights of the components have been set empirically. For that, we took into account that semantic IO information is expected to provide more precise information than syntactic IOs. Also, syntactic tags are more important than syntactic IOs because they are obtained from explicit tags or from a piece of textual description, which usually conveys more information about the functionality of the service.

serviceDist = 0.30\,distSynT + 0.25\,distSemI + 0.25\,distSemO + 0.10\,distSynI + 0.10\,distSynO   (3)
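A minimal sketch of Equations 1-3 in Python is given below. The parameter values, the convention for empty components and the dictionary layout are assumptions for illustration only; in the paper, l and h would be derived from a domain ontology or from WordNet.

```python
# Sketch of the concept distance (Eq. 1), component distance (Eq. 2) and
# weighted service distance (Eq. 3). Parameter values are assumed.
import math

ALPHA, BETA = 0.2, 0.6  # assumed values for the scaling parameters

def con_dist(l, h, alpha=ALPHA, beta=BETA):
    """Concept distance given the shortest path length l between the two
    concepts and the depth h of their common subsumer (Eq. 1)."""
    if l == 0:  # identical concepts
        return 0.0
    # tanh(beta*h) = (e^{bh} - e^{-bh}) / (e^{bh} + e^{-bh})
    return 1.0 - math.exp(-alpha * l) * math.tanh(beta * h)

def c_dist(p1, p2, dist):
    """Component distance (Eq. 2): average best-match distance from both
    sides; dist(c1, c2) is a concept distance such as con_dist."""
    if not p1 or not p2:
        return 0.0 if not p1 and not p2 else 1.0  # assumed convention
    total = sum(min(dist(s, t) for t in p2) for s in p1) \
          + sum(min(dist(t, s) for s in p1) for t in p2)
    return total / len(set(p1) | set(p2))

def service_dist(d):
    """Weighted sum of the component distances (Eq. 3); d maps component
    names to their distances."""
    return (0.30 * d["synT"] + 0.25 * d["semI"] + 0.25 * d["semO"]
            + 0.10 * d["synI"] + 0.10 * d["synO"])
```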
4 Clustering Web Services

The main idea behind clustering the services in a repository is to re-organize the linear structure into a tree structure, so that efficient tree-based search algorithms can be applied
to service discovery. Since no natural order exists in service descriptions, hierarchical clustering is used to achieve this goal.

Hierarchical clustering [28] is a cluster analysis method designed for building a hierarchy of clusters. The fundamental idea behind the bottom-up or agglomerative hierarchical clustering algorithm is to consecutively merge the two most similar clusters into one, until only a root is reached. Top-down or divisive hierarchical clustering is the reversed process: every step divides the service repository into two sets, until no more divisions can be made or a predefined depth is reached.

4.1 Agglomerative Clustering

A naive algorithm for hierarchical clustering is straightforward: a distance matrix M is pre-computed, where M is an N × N matrix of pair-wise distances between items (services). In the first phase of clustering, each service is considered as one cluster; then the pair of clusters with the smallest linkage value (explained below) is merged into one cluster. This process is repeated until only one root cluster is left. This algorithm is named Hierarchical Agglomerative Clustering (HAC) and is shown in Algorithm 1.

Algorithm 1: HAC Algorithm
  input : Set of services S = {s_1, s_2, ..., s_n}
  output: Clusters C = {c_1, c_2, ..., c_m}
  C := S;
  for i = 1 to n do
      for j = 1 to n do
          M_{i,j} := serviceDist(c_i, c_j)
      end
  end
  while |M| ≠ 1 do
      Merge c_l, c_q with minimal serviceDist(c_l, c_q) into c_u;
      c_u is represented by the label of the cluster;
      Remove c_l, c_q from M;
      Add c_u to M;
  end

4.1.1 Linkage Criteria

One important part of hierarchical agglomerative clustering is the choice of the linkage criterion, i.e. which services of a cluster are used to represent the cluster in the distance calculation. In this work, we experimented with three different linkage criteria for constructing the clusters (shown graphically in Figure 1); a code sketch combining Algorithm 1 with these criteria is given at the end of this subsection.

Single Linkage: The single linkage criterion uses the distance of the closest pair of services from two clusters as the distance between the two clusters:

Linkage_S(A, B) = \min_{a \in A, b \in B} serviceDist(a, b)

Complete Linkage: The complete linkage criterion uses the distance of the furthest pair of services from two clusters as the distance between the two clusters:

Linkage_C(A, B) = \max_{a \in A, b \in B} serviceDist(a, b)

Group Average Linkage: The group average linkage criterion uses the average distance of every pair of services from the two clusters as the distance between the two clusters:

Linkage_G(A, B) = \frac{1}{|A| + |B|} \sum_{a \in A} \sum_{b \in B} serviceDist(a, b)

Fig. 1: Three Linkage Criteria (single, complete and group average linkage)

Compared to the single linkage criterion, complete linkage tends to produce compact clusters with similar diameters [5]; this is important for the search function to find the best match for a given query by avoiding local optima. Group average linkage is often an ideal candidate for many applications; its main drawback is that it complicates the cluster labeling task.
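The sketch below is a minimal Python rendering of Algorithm 1 with a pluggable linkage criterion; the nested-tuple dendrogram representation and the function names are assumptions, and any symmetric, fine-grained distance (such as serviceDist from Equation 3) can be passed as `dist`.

```python
# Sketch of Algorithm 1 (naive hierarchical agglomerative clustering)
# with a pluggable linkage criterion.
import itertools

def linkage_single(a, b, dist):
    return min(dist(x, y) for x in a for y in b)

def linkage_complete(a, b, dist):
    return max(dist(x, y) for x in a for y in b)

def linkage_group_average(a, b, dist):
    # normalisation follows the paper's formula; classic UPGMA divides by |A|*|B|
    return sum(dist(x, y) for x in a for y in b) / (len(a) + len(b))

def hac(services, dist, linkage=linkage_complete):
    """Merge the two closest clusters until a single root remains and
    return the dendrogram as nested (left, right) tuples."""
    clusters = [[s] for s in services]  # every service starts as its own cluster
    trees = list(services)              # subtree corresponding to each cluster
    while len(clusters) > 1:
        i, j = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]], dist))
        clusters[i] += clusters[j]      # merge cluster j into cluster i
        trees[i] = (trees[i], trees[j])
        del clusters[j], trees[j]       # j > i, so index i stays valid
    return trees[0]
```

Complete linkage is used as the default here because of the compactness argument above; switching the criterion only changes the `linkage` argument.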
4.2 Hierarchical Divisive Clustering

Unlike hierarchical agglomerative clustering, divisive clustering starts from one cluster that contains all the services in the repository. A flat clustering algorithm is then used to divide a cluster into k partitions; this process repeats until no further division is possible, i.e. each cluster contains only one service. In some circumstances divisive clustering produces more accurate hierarchies than bottom-up algorithms [25].
One crucial concern in this clustering method is the choice of the flat clustering algorithm; several candidates can be taken into consideration, such as k-means clustering [9] and Expectation-Maximization (EM) clustering [26]. In this work, k-means clustering is used, since it is the most common clustering algorithm. To generate a binary tree from a set of services, k is set to 2 in every step. The overall algorithm is presented in Algorithm 2.

Algorithm 2: Divisive Clustering
  input : Services S = {s_1, s_2, ..., s_n}
  input : TreeNode* node_p
  repeat
      Cen1 := random service from S;
      Cen2 := random service from S;
  until Cen1 ≠ Cen2;
  repeat
      Clus1 := {s ∈ S | serviceDist(s, Cen1) < serviceDist(s, Cen2)};
      Clus2 := {s ∈ S | serviceDist(s, Cen2) < serviceDist(s, Cen1)};
      Cen1 := CentroidSelection(Clus1);
      Cen2 := CentroidSelection(Clus2);
  until Cen1 and Cen2 remain unchanged;
  node_p→left := Cen1;
  node_p→right := Cen2;
  divisiveClustering(Clus1, new node_child_l);
  divisiveClustering(Clus2, new node_child_r);

4.3 Cluster Labeling

Each cluster is assigned a label; this is referred to as cluster labeling in machine learning research. Among the many existing cluster labeling techniques, one frequently used in information retrieval is centroid labeling. Since service discovery can essentially be considered a special case of the information retrieval problem, the same technique is adopted in this work. Centroid labeling selects one element (service) within the cluster as the centroid to represent the entire cluster. The selection of the centroid is based on its average distance to the other elements in the same cluster. Algorithm 3 shows the details of the realization.

Algorithm 3: CentroidSelection
  input : Cluster S = {s_1, s_2, ..., s_n}
  output: Service centroid
  for i = 1 to n do
      for j = 1 to n do
          AverageDist_i += serviceDist(s_i, s_j)
      end
      AverageDist_i := AverageDist_i / |S|
  end
  return centroid := s with minimal AverageDist

In agglomerative clustering, after the node merging operation, a label service is picked from the services within the newly formed cluster. For divisive clustering, a centroid is picked by the k-means algorithm during the division operation.
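A compact Python sketch of Algorithm 3 and of one 2-means division step of Algorithm 2 follows; the function names, the iteration cap and the assumption that distinct services have strictly positive distance are illustrative choices, not details fixed by the paper.

```python
# Sketch of Algorithm 3 (centroid labeling) and one division step of
# Algorithm 2; `dist` is the symmetric serviceDist measure.
import random

def centroid_selection(cluster, dist):
    """Return the member with the smallest average distance to the cluster
    (Algorithm 3); its own distance contributes 0 to the average."""
    return min(cluster,
               key=lambda s: sum(dist(s, t) for t in cluster) / len(cluster))

def split_two_means(services, dist, rng=random):
    """One 2-means division step of Algorithm 2: partition `services`
    around two centroids re-estimated with centroid_selection.
    Assumes distinct services have a strictly positive distance."""
    cen1, cen2 = rng.sample(services, 2)  # two distinct random seeds
    for _ in range(100):                  # iteration cap for safety
        clus1 = [s for s in services if dist(s, cen1) < dist(s, cen2)]
        clus2 = [s for s in services if s not in clus1]
        new1 = centroid_selection(clus1, dist)
        new2 = centroid_selection(clus2, dist)
        if (new1, new2) == (cen1, cen2):  # centroids unchanged: converged
            break
        cen1, cen2 = new1, new2
    return (clus1, cen1), (clus2, cen2)
```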
5 Example

To explain the clustering process with better clarity, an example of hierarchical agglomerative clustering is provided in this section. The example takes four services from the OWL-S test collection version 4 (OWLS-TCv4, http://projects.semwebcentral.org/projects/owls-tc/). These four services are originally described in OWL-S; before going through the clustering process, they are transformed into GCM descriptions (described in Section 3). The four services used in this example are summarised below, including their inputs and outputs.

– Service 1: governmentweapon_funding_service
  – Input: SUMO.owl#Weapon
  – Input: SUMO.owl#Government
  – Output: SUMO.owl#Funding
– Service 2: governmentmissile_funding_reliableservice
  – Input: Mid-level-ontology.owl#Missile
  – Input: SUMO.owl#Government
  – Output: SUMO.owl#Funding
– Service 3: governmentmissile_funding_service
  – Input: Mid-level-ontology.owl#Missile
  – Input: SUMO.owl#Government
  – Output: SUMO.owl#Funding
– Service 4: governmentmissile_financing_service
  – Input: SUMO.owl#Weapon
  – Input: SUMO.owl#Government
  – Output: Mid-level-ontology.owl#Financing

An example of the GCM representation of Service 1, obtained by transforming OWL-S into GCM (details in [6]), is the following (with prefix sumo = <http://127.0.0.1/ontology/SUMO.owl#>):

Inputs = {<sumo:Government, government>, <sumo:Weapon, weapon>}
Outputs = {<sumo:Funding, funding>}
Preconditions = Ø
Effects = Ø
Category = Ø
Tag-cloud = {<research, 1>, <types, 1>, <government, 2>, <funding, 3>, <weapon, 2>}
Text = "This service returns the funding for research on the given weapon types provided by the given government."

Note that Inputs/Outputs are sets of pairs of semantic (e.g. sumo:Government) and syntactic (e.g. government) descriptions. The tag cloud is a set of <keyword, frequency> pairs, which was obtained automatically using information retrieval
techniques over the textual description and other fields (e.g. inputs/outputs). The distance matrix for this example is shown in Table 1; the values are calculated using the distance measure described in Section 3. Note that the matrix is symmetric since the distance function is symmetric.
Table 1: Example service distances

            Service 1   Service 2   Service 3   Service 4
Service 1   0.00000     0.02680     0.02828     0.03563
Service 2   0.02680     0.00000     0.00082     0.00327
Service 3   0.02828     0.00082     0.00000     0.00062
Service 4   0.03563     0.00327     0.00062     0.00000
In the initial phase, Service 3 and Service 4 are merged into a cluster (C1). Cluster C1 is then compared with Service 1 and Service 2; under complete linkage, C1 (with Service 4 as cluster label) is merged with Service 2 to form C2. Finally, Service 1 is merged with C2 to create C3; the root cluster is reached and the algorithm halts. The resulting dendrogram is shown in Figure 2.
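The merge order of this small example can be reproduced with an off-the-shelf routine; the sketch below feeds Table 1 into SciPy's complete-linkage clustering as an illustrative cross-check (SciPy is assumed to be available and is not the implementation used in the paper).

```python
# Reproducing the example dendrogram from Table 1 with complete linkage.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

D = np.array([[0.00000, 0.02680, 0.02828, 0.03563],
              [0.02680, 0.00000, 0.00082, 0.00327],
              [0.02828, 0.00082, 0.00000, 0.00062],
              [0.03563, 0.00327, 0.00062, 0.00000]])

Z = linkage(squareform(D), method="complete")
print(Z)
# Row 1: services 3 and 4 (0-based indices 2 and 3) merge at 0.00062 -> C1,
# row 2: C1 merges with service 2 at 0.00327 -> C2,
# row 3: service 1 joins at 0.03563 -> the root C3.
```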
Fig. 2: Dendrogram of the Example Services (Service 3 and Service 4 form C1, C1 and Service 2 form C2, and Service 1 joins C2 to form the root C3)
5.1 Service Discovery over the Dendrogram

Service discovery over the dendrogram resulting from clustering is done in two steps. First, the service request is compared with the cluster labels at each level to find the appropriate path to follow; when this process reaches a leaf node of the dendrogram, the best-matched service advertisement has been found. Then the service repository is ordered according to the best-matched service advertisement: starting from the best-matched service, the other services are appended based on their distance to it. A query example is depicted in Figure 3, where dashed lines indicate the best-matched-node finding step and the dot-dashed line indicates the ranked-list generation step. The query result (an ordered list of the four services) is also presented in Figure 3.
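A sketch of this two-step query procedure is given below. The nested-tuple tree and the `label` callable (returning a cluster's centroid service, precomputed during clustering) are assumptions that match the HAC sketch given earlier, not a prescribed interface.

```python
# Sketch of service discovery over the dendrogram: (1) descend the tree by
# comparing the query with the cluster labels, (2) rank the repository by
# distance to the best-matched leaf.
def best_match(tree, query, dist, label):
    """Follow the closer cluster label at every level down to a leaf."""
    while isinstance(tree, tuple):
        left, right = tree
        tree = left if dist(query, label(left)) <= dist(query, label(right)) else right
    return tree  # leaf: the best-matched service advertisement

def ranked_answer(tree, query, dist, label, services):
    """Order the whole repository by distance to the best-matched service."""
    best = best_match(tree, query, dist, label)
    return sorted(services, key=lambda s: dist(best, s))
```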
6 Experiment and Evaluation

To evaluate the performance of the proposed framework, experiments were carried out based on a de facto standard test collection and tools. The test collection used to perform the experiments is the OWL-S test collection version 4. OWLS-TC is the most complete collection for testing semantic Web service matchmaking algorithms and has been used in the Semantic Service Selection (S3) contest (http://www-ags.dfki.uni-sb.de/~klusch/s3/), which took place yearly from 2007 to 2012. This test collection contains 1083 web services described in OWL-S and 42 queries. All services in OWLS-TC contain semantic inputs, outputs and a textual service description, and 160 of them contain preconditions and effects described in SWRL and PDDL. The relevance sets of the test collection were produced based on human judgments and the results of past S3 contest participants.

The experiments test three main criteria:
– Precision/recall
– Query response time
– Average precision
The precision and recall values are defined as:

recall = \frac{|relevant\ services \cap retrieved\ services|}{|relevant\ services|}

precision = \frac{|relevant\ services \cap retrieved\ services|}{|retrieved\ services|}

Since the test collection contains multiple queries and their relevance sets, macro averaging is adopted to produce the results for the complete series of tests. The macro-averaged precision computes the mean of the precision values of the answer sets returned by a matchmaker for all queries in the test collection at equidistant standard recall levels (Recall_i, 0 ≤ i ≤ λ):

Precision_{macro}(i) = \frac{1}{|Q|} \sum_{q \in Q} \max\{P' \mid R' \geq Recall_i \wedge (R', P') \in O_q\}

where Q is the set of request documents and O_q denotes the set of observed pairs of recall and precision values for query q when scanning the ranked services in the answer set for the query stepwise for true positives.

The average precision is calculated using:

AP = \frac{\sum_{q \in Q} Precision_q}{|Q|}
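A small sketch of how these measures can be computed from ranked answer sets follows. The data layout (`queries` maps a query id to a (ranked answers, relevant set) pair), the number of recall levels and the reading of Precision_q in the AP formula are assumptions for illustration.

```python
# Sketch of the evaluation measures used in this section.
def precision_recall_points(ranked, relevant):
    """Observed (recall, precision) pairs while scanning a ranked answer set."""
    points, hits = [], 0
    for k, service in enumerate(ranked, start=1):
        if service in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / k))
    return points

def macro_precision(queries, levels=11):
    """Mean interpolated precision at `levels` equidistant recall levels."""
    curve = []
    for i in range(levels):
        r = i / (levels - 1)
        per_query = []
        for ranked, relevant in queries.values():
            obs = precision_recall_points(ranked, relevant)
            per_query.append(max((p for rec, p in obs if rec >= r), default=0.0))
        curve.append(sum(per_query) / len(queries))
    return curve

def average_precision(queries):
    """Mean over queries of the precision of each full answer set
    (one possible reading of Precision_q in the AP formula)."""
    per_query = [len(set(ranked) & relevant) / len(ranked)
                 for ranked, relevant in queries.values()]
    return sum(per_query) / len(queries)
```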
Fig. 3: Query example. The query result is the ordered list: governmentmissile_financing_service.owls, governmentmissile_funding_service.owls, governmentmissile_funding_reliableservice.owls, governmentweapon_funding_service.owls.

Table 2: Summary of Average Precision

Method                  Average Precision
Exhaustive              0.646327712
Single Linkage          0.598832021
Complete Linkage        0.590268042
Group Average Linkage   0.607004631
Divisive Clustering     0.313872513
6.1 Experiment Results
Figure 4 illustrates the precision/recall results of five matchmaking strategies: the conventional exhaustive search over a flat service directory and the proposed methods over the clustered service directory. The average precision (Table 2) of the exhaustive matchmaker is 0.646; this value is used as a reference. All three linkage criteria showed a drop in average precision. Group average linkage clustering is the best-performing criterion among the three, with an average precision of 0.607.

The divisive clustering method performed surprisingly worse than the agglomerative approaches. No evident explanation was found for this phenomenon; a preliminary estimation would be that the quality of the flat clustering algorithm significantly affects the final result. Note that k-means is a general method to create k clusters, in this case particularised to two. That fact, along with some randomness when selecting the centroids, can be the reason for the low quality of the clusters.

The average query response time decreased dramatically from 5620 milliseconds (exhaustive search) to 113 milliseconds (clustered). To demonstrate the advantage in time complexity of the clustering method, two existing matchmakers, SeMa2 and OWLS-iMatcher [10], are included in the comparison in Figure 5. All experiments were performed on a personal computer with an Intel i5 processor at 2.60 GHz and 4 gigabytes of RAM. The time for creating the clusters (7 seconds without distance matrix computation, 14.5 minutes with distance matrix computation) in the proposed method is not included in the comparison.
Fig. 4: Precision vs. Recall graph of different clustering methods (Linear Search, Complete Linkage Clustering, Single Linkage Clustering, Group Average Linkage Clustering, Divisive Clustering)
Fig. 5: Query time (in milliseconds) using different matchmakers (Exhaustive, OWLS-iMatcher, SeMa2, Clustered)

7 Conclusion

This paper presents a method for accelerating automated service discovery in SOA. Hierarchical clustering is used to
organize flat service directory registrations into a tree structure, and the search is performed over this tree. The benefits of this approach include a dramatic reduction in time complexity. An additional advantage, compared to other service discovery acceleration approaches, is that this approach is independent from the concrete distance metric (matchmaking algorithm) and from the service description model; hence most existing matchmakers can be applied to the proposed framework with little or no modification.

Various clustering methods were tested experimentally. The experiment results have shown that hierarchical agglomerative clustering produces an acceptable precision vs. recall
level with much higher efficiency. On the other hand, divisive clustering has shown an obvious drop in precision vs. recall. The slight drop in precision for HAC is possibly due to the fact that most matchmakers produce asymmetric similarity scores, since most existing matchmakers employ Paolucci's degree-of-match definition, whereas in clustering the distance matrix is always symmetric. Asymmetric clustering is ongoing research with few practical algorithms. To verify this hypothesis, state-of-the-art clustering methods must be used instead of conventional clustering techniques; this will be part of the future work. The future work will also include embedding other existing matchmakers to examine the effectiveness of the proposed method. Experimenting with different linkage criteria, divisive clustering strategies and advanced cluster labeling techniques will also be performed.

References

1. Blake, B., Cabral, L., König-Ries, B.: Semantic Web Services: Advancement Through Evaluation. Springer (2012)
2. Christensen, E., Curbera, F., Meredith, G., Weerawarana, S., et al.: Web services description language (WSDL) 1.1 (2001)
3. De Bruijn, J., Lausen, H., Polleres, A., Fensel, D.: The web service modeling language WSML: An overview. In: The Semantic Web: Research and Applications, pp. 590–604. Springer (2006)
4. Elgazzar, K., Hassan, A.E., Martin, P.: Clustering WSDL documents to bootstrap the discovery of web services. In: Web Services (ICWS), 2010 IEEE International Conference on, pp. 147–154. IEEE (2010)
5. Everitt, B.S., Landau, S., Leese, M., Stahl, D.: Hierarchical clustering. Cluster Analysis, 5th Edition, pp. 71–110 (2001)
6. Fernández, A., Cong, Z., Baltá, A.: Bridging the gap between service description models in service matchmaking. Multiagent and Grid Systems 8(1), 83–103 (2012)
7. Fernandez, A., Hayes, C., Loutas, N., Peristeras, V., Polleres, A., Tarabanis, K.: Closing the service discovery gap by collaborative tagging and clustering techniques. In: 7th International Semantic Web Conference, ISWC, pp. 115–128 (2008)
8. Fung, B.C., Wang, K., Ester, M.: Hierarchical document clustering using frequent itemsets. In: Proceedings of the SIAM International Conference on Data Mining, pp. 59–70 (2003)
9. Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics) 28(1), 100–108 (1979)
10. Kiefer, C., Bernstein, A.: The creation and evaluation of iSPARQL strategies for matchmaking. In: The Semantic Web: Research and Applications, pp. 463–477. Springer (2008)
11. Klusch, M., Fries, B., Sycara, K.: Automated semantic web service discovery with OWLS-MX. In: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 915–922. ACM (2006)
12. Klusch, M., Kapahnke, P.: iSeM: Approximated reasoning for adaptive hybrid selection of semantic services. In: The Semantic Web: Research and Applications, pp. 30–44. Springer (2010)
13. Kopecky, J., Vitvar, T., Bournez, C., Farrell, J.: SAWSDL: Semantic annotations for WSDL and XML Schema. IEEE Internet Computing 11(6), 60–67 (2007)
14. Larsen, B., Aone, C.: Fast and effective text mining using linear-time document clustering. In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 16–22. ACM (1999)
15. Li, Y., Bandar, Z.A., McLean, D.: An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering 15(4), 871–882 (2003)
16. Liu, W., Wong, W.: Web service clustering using text mining techniques. International Journal of Agent-Oriented Software Engineering 3(1), 6–26 (2009)
17. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval, vol. 1. Cambridge University Press, Cambridge (2008)
18. Martin, D., Paolucci, M., McIlraith, S., Burstein, M., McDermott, D., McGuinness, D., Parsia, B., Payne, T., Sabou, M., Solanki, M., et al.: Bringing semantics to web services: The OWL-S approach. In: Semantic Web Services and Web Process Composition, pp. 26–42. Springer (2005)
19. Miller, G.A.: WordNet: a lexical database for English. Communications of the ACM 38(11), 39–41 (1995)
20. NAICS Association, et al.: NAICS - North American Industry Classification System (2003)
21. Paolucci, M., Kawamura, T., Payne, T.R., Sycara, K.: Semantic matching of web services capabilities. In: The Semantic Web - ISWC 2002, pp. 333–347. Springer (2002)
22. Pedrinaci, C., Liu, D., Maleshkova, M., Lambert, D., Kopecky, J., Domingue, J.: iServe: a linked services publishing platform. In: CEUR Workshop Proceedings, vol. 596 (2010)
23. Sheth, A., Verma, K., Miller, J., Rajasekaran, P.: Enhancing web service descriptions using WSDL-S. Research-Industry Technology Exchange at EclipseCon, pp. 1–2 (2005)
24. Slonim, N., Tishby, N.: Document clustering using word clusters via the information bottleneck method.
In: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 208–215. ACM (2000) 25. Steinbach, M., Karypis, G., Kumar, V., et al.: A comparison of document clustering techniques. In: KDD workshop on text mining, vol. 400, pp. 525–526. Boston (2000)
26. Sundberg, R.: Maximum likelihood theory for incomplete data from an exponential family. Scandinavian Journal of Statistics, 49–58 (1974)
27. Yue, P.: Automatic service composition. In: Semantic Web-based Intelligent Geospatial Web Services, pp. 21–25. Springer (2013)
28. Zhao, Y., Karypis, G., Fayyad, U.: Hierarchical clustering algorithms for document datasets. Data Mining and Knowledge Discovery 10(2), 141–168 (2005)