Using Recommendation to Limit Search Space in Web Services Discovery

Mohamed Sellami∗, Walid Gaaloul∗, Samir Tata∗ and Mohamed Jmaiel†
∗ Institut TELECOM, CNRS UMR Samovar, Evry, France. Email: {mohamed.sellami,walid.gaaloul,samir.tata}@it-sudparis.eu
† ReDCAD Laboratory, Sfax, Tunisia. Email: [email protected]

Abstract: The discovery of suitable Web services (WSs) for a given task is one of the central operations in Service-Oriented Architectures, and research on WSs aims at automating this step. Given the large number of WSs that can be expected in real-world settings, the efficiency of automated discovery becomes important. We therefore propose to exploit previous discovery results to reduce the search space in a distributed registry environment. The architecture we adopt structures WS registries into communities. We then adapt recommendation techniques to ease the service discovery process. This consists of (1) the selection of registry communities using semantic matching and (2) the recommendation of registries within these communities using a technique based on user characterization. To put the proposed distributed registry architecture into practice and to test the efficiency of recommendation-based discovery, we propose to implement a Peer-to-Peer (P2P) registry environment on top of the JXTA platform.
I. INTRODUCTION

We are witnessing the democratization of the Internet and of electronic services. As a result, more and more companies are adopting the Business-to-Business (B2B) concept to carry out transactions with their partners or to run their supply chains online. In such scenarios, the companies involved have to make their services accessible on the net and available for consultation through service registries. Note that, by definition, WSs are designed to be used automatically by other software programs. Given the increasing number of available WSs, this is particularly relevant when a discovery engine is a heavily used component in systems for dynamic WS composition and semantically enabled business process management [1], [2].

Despite more than half a decade of effort, discovering WSs is still considered as difficult as looking for a needle in a haystack [3]. Basically, a company interested in another company's service has to screen several registries to discover the service that best suits its needs. This task can be very cumbersome, since the number of available registries, and of the services they advertise, can be very large. Several approaches dealing with distributed registries [4], [5], [6] propose to structure their registry networks into groups according to the business domain or to the functional or non-functional properties of the services they host. This certainly limits a service requester's search space, but since the number of registries inside a group can also be very large, the service discovery process can remain complex.

To deal with this problem, and to enhance service discovery in distributed registry environments, we propose to use recommendation techniques to limit a requester's search space. We aim at discovering WSs faster and more precisely, based on users' interests and WS usage data. This paper presents a distributed registry architecture designed to enhance the WS discovery process by limiting a requester's search space. The architecture is based on a P2P network of WS registries. For each discovery query, a recommender system suggests the appropriate WS registry using WS usage data.

The paper is organized as follows. In Section II we define a WS discovery query according to our needs and specify the user and registry characterizations. Sections III and IV detail our recommendation techniques for reducing the WS search space. Section V introduces the proposed distributed WS registry architecture and describes the implementation work done to validate our approach. In Section VI we review some recommendation-based approaches to WS discovery. Section VII concludes the paper with an outlook on future work.
II. THE WS DISCOVERY QUERY

In this work, we cluster WS registries into communities to bootstrap the service discovery process. To cater for the highly dynamic and distributed nature of WSs, we adopt a divide-and-conquer strategy by grouping WS registries. Using distributed registries instead of a centralized one provides higher availability by removing the "single point of failure", and avoids the bottleneck of a centralized registry. However, dealing with a large number of registries makes service discovery in such environments awkward. To deal with this issue, we group registries catering for similar domain needs into communities. Grouping WS registries into communities of similar business domains can greatly reduce the search space of a service discovery task. In this section, we define the WS discovery query used in our approach and introduce the concept of user/registry characterization.
A. User inputs

According to [7], the discovery of WSs in UDDI registries typically follows an Information Retrieval approach, whereas high-level matchmaking techniques [8] are used for semantic WSs because of the more structured annotation of service profiles. However, semantic WSs are so far available only at the academic level, which is not enough to test a practical methodology like the one proposed in this paper. Instead, we opt for the more readily available format, namely WSDL. Basically, we separate the semantics from the WSDL description and define a service requester's query Q as the triplet <SD, RC, Cq> where:
• SD represents an abstract WS description written in the standard WS description language WSDL. It represents the functional requirement of a service requester.
• RC stands for the requester characterization. We define a characterization as a data structure containing information that is useful for recommendation and that characterizes a service requester (areas of interest, invocation history, etc.). More details are given in Section II-B1.
• Cq stands for the query's concept and expresses the category of the WS the requester is looking for. It is represented by a concept taken from a business domain ontology.

To express this concept, we suppose that all service requesters share the same semantic stack. This is achieved through common ontologies, or through ontology mediation mechanisms if different ones are used.

In the same way, in a service publication process, the service provider has to supply the tuple <SD, Cp>, where SD and Cp are respectively the description and the category of the provided WS. In this case, the concept Cp is used to forward the SD to the adequate registry community.

B. Recommendation technique inputs

The recommendation of service registries is based on past user characterizations and on the service requester's characterization. In this section we introduce the concepts of user characterization and global registry characterization.
1) User Characterization: We define a user characterization as a data structure containing the user's areas of interest, his trace and his non-functional requirements.
• Areas of interest. The business domains the user is interested in (e.g. software, hardware, etc.). Those domains can be represented as concepts from a given domain ontology. Although we aim to recommend one registry among several that belong to the same community (and thus share the same semantic domain), the requester's areas of interest have an impact on the recommendation. In fact, a user interested in Software and Mobile services might have different needs from a user interested in Software and Computer.
• Trace. The list of concepts of the services invoked by a service consumer. In these traces, an invoked service is referenced by the semantic concept annotating the invoked operation. These concepts are extracted from a domain ontology specific to each registry community. The trace plays an important role in recommendation, since service requesters with the same past invocation traces are mostly interested in the same services.
• Non-functional requirements. In addition to the functional requirements (SD), a service requester may specify non-functional ones. These requirements are represented by (concept, value) pairs, where concept is an ontology concept of a non-functional property and value is the desired weight for this property.

2) Global registry Characterization: Each registry in a community keeps a copy of the characterization of every requester who successfully discovered and invoked a service. These characterizations are used to decide whether or not to recommend a registry to a service requester. We propose to merge past user characterizations into a single global registry characterization containing:
• Global areas of interest. The list of all domains of interest found in a registry's past user characterizations and the number of times they were identified.
• Global trace. The list of all invoked service concepts from the traces of past user characterizations and their number of occurrences.
• Global non-functional requirements. A list of (concept, value) pairs where concept is identified from past user characterizations and value is the average of all values corresponding to this concept.

In Table I, we show how to merge two user characterizations (RC1, RC2) into a single global registry characterization (GRC).

TABLE I: GLOBAL REGISTRY CHARACTERIZATION

|     | Areas of Interest | Trace | Non-Functional Req |
|-----|-------------------|-------|--------------------|
| RC1 | Hardware, Software | SwOrdering, PackOrdering | DeliveryTime=2 |
| RC2 | Software, Packaging | SwOrdering | DeliveryTime=4 |
| GRC | (Hardware,1) (Software,2) (Packaging,1) | (SwOrdering,2) (PackOrdering,1) | DeliveryTime=3 |
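The merge illustrated in Table I can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the `Characterization` class, the `merge` function and the dictionary layout of the result are hypothetical names chosen for illustration, not part of the described system.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Characterization:
    """Hypothetical container for a requester characterization (RC)."""
    areas: list                      # areas of interest (ontology concepts)
    trace: list                      # concepts of the invoked services
    nfr: dict = field(default_factory=dict)  # non-functional (concept -> value)


def merge(characterizations):
    """Merge past requester characterizations into one global characterization."""
    areas, trace = Counter(), Counter()
    nfr_values = {}
    for rc in characterizations:
        areas.update(rc.areas)   # count how often each area was identified
        trace.update(rc.trace)   # count how often each concept was invoked
        for concept, value in rc.nfr.items():
            nfr_values.setdefault(concept, []).append(value)
    # Average the values recorded for each non-functional concept.
    nfr = {c: sum(v) / len(v) for c, v in nfr_values.items()}
    return {"areas": dict(areas), "trace": dict(trace), "nfr": nfr}


# The two requester characterizations of Table I.
rc1 = Characterization(["Hardware", "Software"],
                       ["SwOrdering", "PackOrdering"], {"DeliveryTime": 2})
rc2 = Characterization(["Software", "Packaging"],
                       ["SwOrdering"], {"DeliveryTime": 4})
grc = merge([rc1, rc2])
```

With the two rows RC1 and RC2 as input, `grc` holds the counts and the averaged DeliveryTime shown in the GRC row of Table I.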
III. REDUCING THE SEARCH SPACE

A. Motivations

To be sure of discovering the service that best fits his needs, a service requester would have to screen all available registries. Obviously, such a process is not feasible in practice, especially when the number of registries available online, and of the services they offer, is very large. We propose to reduce the search space of a service requester in a distributed and structured registry environment by first selecting the appropriate registry community, and then recommending, from that community, the registry(ies) that best suit his needs according to his characterization.

Recommender systems form a specific type of information filtering technique that attempts to support users by identifying interesting products and services in situations where the number and complexity of offers outstrip the user's capability to survey them and reach a decision [9], [10]. In this paper we apply a collaborative filtering technique to enhance WS discovery. The well-known collaborative filtering approach finds the nearest neighbors of the current item based on criteria such as users' preferences, ratings, usage data, or interests. One variant of this technique is item-to-item collaborative filtering [11], which is effective at finding the items most relevant to the current one. Suppose that a user is listening to a song in an application equipped with a recommender system. After listening to the song, he may be interested in another song with the same rhythm. The system can analyze the sound wave of the current song and suggest a list of the songs with the most similar sound waves. In this use case, an item-to-item collaborative filtering algorithm is applied and the criterion is sound wave similarity. Many algorithms exist for computing item-to-item similarity, and one of them is used very effectively by Amazon [12].

B. Registry community selection

In our work, registries are structured according to the business semantic domain of the services they contain. It is therefore useful to execute the discovery query only on communities that are semantically similar to the query. This is ensured by computing the semantic similarity between the service requester's query and the list of registry communities. The communities most semantically similar to the query are recommended to the requester. The registry community selection proceeds as follows.
A service requester starts by formulating his query Q = <SD, RC, Cq>. The semantic similarity between the query and the list of registry communities¹ is then computed, as described in Algorithm 1. The registry community(ies) selected by the community selection component may, and probably will, contain a large number of registries.

Algorithm 1 CommunitySelection(Q)
input: Q: A user's query.
output: L: The list of selected registry communities.
Extract the concept cq from Q;
for all registry communities gi do
    Extract the associated concept cgi;
    Calculate the semantic similarity between cq and cgi;
end for
Select N-Top similar registry communities;
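Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch, not the implemented component: the toy taxonomy `PARENT`, the numeric ranking `SCORES`, and the way the match levels (exact, plug-in, subsumes, fail) are mapped onto a simple child-to-parent hierarchy are all assumptions made for the example.

```python
# Toy business-domain taxonomy (child -> parent); purely illustrative.
PARENT = {
    "Packaging": "Logistics",
    "Shipping": "Logistics",
    "Logistics": "Business",
    "Software": "IT",
    "IT": "Business",
}

# Assumed numeric ranking of the four match levels.
SCORES = {"exact": 3, "plug-in": 2, "subsumes": 1, "fail": 0}


def ancestors(concept):
    """Return all strict ancestors of a concept in the toy taxonomy."""
    result = []
    while concept in PARENT:
        concept = PARENT[concept]
        result.append(concept)
    return result


def match_level(cq, cg):
    """Degree of match between the query concept cq and a community concept cg."""
    if cq == cg:
        return "exact"
    if cg in ancestors(cq):   # community concept is more general than the query
        return "plug-in"
    if cq in ancestors(cg):   # query concept is more general than the community
        return "subsumes"
    return "fail"


def community_selection(cq, communities, n=1):
    """Return the n communities whose concept best matches cq, dropping fails."""
    ranked = sorted(communities,
                    key=lambda cg: SCORES[match_level(cq, cg)], reverse=True)
    return [cg for cg in ranked if match_level(cq, cg) != "fail"][:n]


selected = community_selection("Packaging",
                               ["Software", "Logistics", "Packaging"], n=2)
```

Here a query annotated with Packaging selects the Packaging community first (exact match) and the more general Logistics community second, while the unrelated Software community is discarded.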
Currently we use the approach proposed in [13] to compute the similarity between two concepts. It distinguishes four levels of similarity: exact, plug-in, subsumes, and fail.

¹ Registry communities are referenced by concepts taken from a business domain ontology.

C. Registry recommendation

After a registry community has been selected for the query, the service requester may again be faced with the large number of services published in the registries of that community. Executing a traditional syntactic or semantic matching between the user's query and all the available services to discover the most adequate one would be time consuming. To deal with that, we propose to limit the user's search space using a recommendation technique that guides the service requester to the registry most appropriate to his needs. This recommendation is based on the service requester's characterization and on the past requesters' characterizations, represented through global registry characterizations. Registries whose past requester characterizations are the most similar to the current requester's characterization are recommended, as they have the greatest chance of satisfying his query.

The registry recommendation is ensured by a recommender system that takes as input the identifier of the community selected for the query (see Section III-B) and the service requester's characterization. The recommender system then requests all the global registry characterizations from the registries in the selected community. Once they are received, it computes the recommendation factors between the service requester's characterization and all the global registry characterizations (see Algorithm 2). Details about the recommendation process are given in Section IV. Finally, the recommender system returns the recommended registry(ies) to the service requester, who can then run his query on them.

Algorithm 2 RegistryRecommendation(id, RC)
input: id: The registry community identifier, RC: The user's characterization.
output: L: The list of recommended registries.
Request all global registry characterizations GRC from registry community id;
for all GRCi do
    Compute the recommendation factor between GRCi and RC;
end for
Recommend N-Top registries according to the recommendation factor;
IV. CHARACTERIZATION-BASED REGISTRIES RECOMMENDATION

Using the service requester's characterization and the global registry characterizations, the recommender system can recommend one or several registries from a community to the requester. The proposed recommendation formula uses the following definitions:
• Similarity Factor: we define SF as the function computing the similarity factor between two characterization sets.
• Recommendation Factor: we define RF as the function computing the recommendation factor between two characterizations. The recommendation is given on the basis of different similarity factors.
• Characterization Set: represents the different attributes extracted from a characterization. We identify three types: the set of areas of interest, denoted by I, the set of traces, denoted by T, and the set of non-functional requirements, denoted by R.
• SF Weights: we associate with each SF an importance vis-à-vis the global recommendation factor. We define α as the weight associated with $SF(I_i, I_j)$, β as the weight associated with $SF(T_i, T_j)$ and δ as the weight associated with $SF(R_i, R_j)$.

The recommendation factor RF between a service requester characterization RC and a global registry characterization GRC is computed with the following formula:

$$RF(GRC_i/RC) = \frac{\alpha \times SF(I_i, I_j) + \beta \times SF(T_i, T_j) + \delta \times SF(R_i, R_j)}{3} \qquad (1)$$

The registry with the highest recommendation factor has the greatest chance of satisfying the user's request. This factor is calculated on the basis of similarities between the interests, traces and non-functional requirements of an RC and those of each GRC in a list. A GRC represents the characterization of the past users of a specific registry, so a high similarity between RC and GRC indicates a resemblance between the service requester and the past users of that registry.
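Formula (1) and the ranking step of Algorithm 2 can be sketched as below. This is a hedged illustration: the function names and the registry labels are hypothetical, the default weights are the ones assumed in the example of Section IV-C, and the similarity factors are taken as already computed inputs (their values here are those reported in that example).

```python
def recommendation_factor(sf_interests, sf_traces, sf_nfr,
                          alpha=0.5, beta=0.8, delta=0.3):
    """Formula (1): weighted sum of the three similarity factors, divided by 3."""
    return (alpha * sf_interests + beta * sf_traces + delta * sf_nfr) / 3


def recommend(similarities, n=1):
    """Rank registries by recommendation factor and keep the n best."""
    factors = {reg: recommendation_factor(*sf)
               for reg, sf in similarities.items()}
    ranked = sorted(factors, key=factors.get, reverse=True)
    return ranked[:n], factors


# (SF interests, SF traces, SF non-functional) per registry; the labels
# "registry1"/"registry2" are hypothetical names for the owners of GRC1/GRC2.
similarities = {
    "registry1": (0.78, 0.35, 0.5),
    "registry2": (0.91, 0.66, 0.66),
}
top, factors = recommend(similarities)
```

Rounded to two decimals, the factors are 0.27 and 0.39, so the second registry is the one recommended.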
A. Areas of Interests/Traces similarity factor

To compute the similarities between the areas of interests (resp. traces) we use the vector space model [14] to represent these document fragments as vectors, using the Term Frequency-Inverse Document Frequency (TF-IDF) metric for the vector's weighting (see Formula 2). The similarity between the two vectors is then computed using the cosine function (see Formula 3). The vector $WI_{grc_i} = \{w_{i,1}, w_{i,2}, ..., w_{i,n}\}$ representing the weight of every area of interest (resp. $WT$ for the trace) in a global registry characterization $GRC_i$ (resp. the requester characterization RC) is computed as follows:

$$w_{i,j} = TF_{i,n} \times IDF_i = \frac{|I_{i,j}|}{|I_i|} \times \log \frac{|GRC|}{\sum |GRC_{k,j}|} \qquad (2)$$

Where:
• $I_i = \{(a_1, n_1), (a_2, n_2), ..., (a_k, n_k)\}$ represents the set of the areas of interests $a_j$ found in $GRC_i$ and the number of times $n_j$ they were identified.
• $|I_{i,j}|$ is the number of times the area of interest $area_j$ appears in $I_i$.
• $|I_i|$ is the total number of times areas appear in $I_i$.
• $|GRC|$ is the total number of GRCs.
• $\sum |GRC_{k,j}|$ is the total number of GRCs where the area of interest $I_{i,j}$ appears.

By the same way, we compute the vectors associated with the requester characterization (RC). But since a service requester's set of areas of interests (resp. trace) is not weighted, we use the total number of times an area (resp. trace) is identified in all the GRCs as a weight. The similarity factor between two characterizations' Areas of Interests (resp. Traces), represented by two vectors, is then calculated using the cosine function:

$$SF(I_{rc}, I_{grc_1}) = cosine(WI_{rc}, WI_{grc_1}) = \frac{WI_{rc} \times WI_{grc_1}}{|WI_{rc}| \times |WI_{grc_1}|} \qquad (3)$$

B. Non functional requirements similarity factor

Using the TF-IDF metric for computing a vector representing the non functional requirements is not suitable. We cannot represent them with a vector on the basis of the number of times the requirements were identified. In fact, each requirement is associated with a value, representing its importance to the service requester, and that value can have different meanings (time in ms, price, etc.). To compute the similarity factor between two sets of non-functional requirements we define:
• The Characterization Set: we define R as a set of tuples $(c_i, v_i)$ where $c_i$ represents a concept in the characterization and $v_i$ its associated value.
• The Common Characterization Set: we define the common characterization set C as $\{(c_1, v_1^{rc}, v_1^{grc}), (c_2, v_2^{rc}, v_2^{grc}), ..., (c_n, v_n^{rc}, v_n^{grc})\}$ where $c_i$ is the common concept between $R_{rc}$ and $R_{grc}$, and $v_i^{rc}$ (resp. $v_i^{grc}$) is the value of that concept in $R_{rc}$ (resp. $R_{grc}$).

To compute SF, we suppose that all requirements have the same level of importance and we use the following formula:

$$SF = \frac{\sum \frac{\min(v_i^{rc}, v_i^{grc})}{\sup(v_i^{rc}, v_i^{grc})}}{|R_{grc}|} \qquad (4)$$
where $|R_{grc}|$ is the number of concepts in the GRC.

C. Example

We now illustrate how the recommendation of a registry takes place in practice. We take the simplified example of a company manufacturing personal computers. In order to keep its production line running, the company has to contact other companies to supply the necessary components. To simplify, we suppose that the company only needs a hardware component, a software component and a packaging box. We suppose that the company has just consumed two services to buy the hardware and software components. The operations used, namely buyHardComponent and buySoftComponent (respectively referenced by the concepts HwOrdering and SwOrdering), are saved in its characterization. The current company characterization is shown in Table II.

TABLE II: THE COMPANY CHARACTERIZATION

| Areas of Interest | Trace | Non-Functional Req |
|-------------------|-------|--------------------|
| Hardware, Software, Packaging | HwOrdering, SwOrdering | DeliveryTime=2 |
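The similarity factors of Sections IV-A and IV-B can be sketched in Python as below. This is a sketch under assumptions: the function names are hypothetical, a base-10 logarithm is assumed because it reproduces the weights reported in this example, and the non-functional values are assumed to be strictly positive. The counts used at the end (100 GRCs in the Packaging community, document frequencies 30, 20 and 15, and the GRC1 and RC occurrence counts) are the figures of the worked example developed in the remainder of this section.

```python
import math


def tfidf_vector(counts, doc_freq, n_docs, vocabulary):
    """Formula (2): TF-IDF weight of each term (area or trace concept)."""
    total = sum(counts.values())
    return [
        (counts.get(term, 0) / total) * math.log10(n_docs / doc_freq[term])
        for term in vocabulary
    ]


def cosine(u, v):
    """Formula (3): cosine of the angle between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nfr_similarity(r_rc, r_grc):
    """Formula (4): sum of min/max ratios over common concepts, over |R_grc|."""
    common = set(r_rc) & set(r_grc)
    ratios = sum(min(r_rc[c], r_grc[c]) / max(r_rc[c], r_grc[c]) for c in common)
    return ratios / len(r_grc) if r_grc else 0.0


vocabulary = ["Hardware", "Software", "Packaging"]
doc_freq = {"Hardware": 30, "Software": 20, "Packaging": 15}  # GRCs per area
i_grc1 = {"Software": 150, "Packaging": 10}
i_rc = {"Hardware": 430, "Software": 540, "Packaging": 330}

w_grc1 = tfidf_vector(i_grc1, doc_freq, 100, vocabulary)
w_rc = tfidf_vector(i_rc, doc_freq, 100, vocabulary)
sf_interests = cosine(w_rc, w_grc1)
sf_nfr = nfr_similarity({"DeliveryTime": 2}, {"DeliveryTime": 4})
```

Computed without intermediate rounding, the interest similarity comes out at about 0.77, slightly below the 0.78 obtained from the rounded weight vectors, while the non-functional similarity is 0.5.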
Now, the company wants to find a WS to buy the packaging box. The adequate community is selected through semantic matching (see Section III-B), and the query is routed to the Packaging registry community, which contains a large number of registries offering different services. A recommendation, using the RC and the GRCs of the registries in the Packaging community, can now advise on which registry the company's query should be executed. We suppose that this community presents 100 global registry characterizations. For simplicity, we apply the recommendation technique introduced in Section IV to two registries. The values extracted from their GRCs are presented in Table III.

TABLE III: THE GLOBAL REGISTRY CHARACTERIZATIONS

|      | Areas of Interest | Trace | Non-Functional Req |
|------|-------------------|-------|--------------------|
| GRC1 | (Software,150) (Packaging,10) | (SwOrdering,20) (PackOrdering,40) | DeliveryTime=4 |
| GRC2 | (Hardware,130) (Software,89) (Packaging,20) | (SwOrdering,40) (PackOrdering,30) | DeliveryTime=3 |

To compute the recommendation factor between the company characterization RC and the global registry characterizations GRC1 and GRC2, we start by computing the similarity factor between the areas of interest of the RC and of GRC1. From the characterizations we extract:
• The set of areas of interest of GRC1: $I_{grc_1}$ = {Software(150), Packaging(10)}
• The set of areas of interest of RC, weighted by the total number of times an area is identified in all the GRCs of the registry community: $I_{rc}$ = {Hardware(430), Software(540), Packaging(330)}

In the 100 GRCs of the Packaging registry community, we suppose that the area of interest Hardware appears in 30 of them, Software in 20 and Packaging in 15. By applying (2) we obtain $WI_{grc_1}$ and $WI_{rc}$, representing respectively the weight vectors of the areas of interest in GRC1 and RC:

$$WI_{grc_1} = (0, 0.65, 0.05) \qquad WI_{rc} = (0.17, 0.29, 0.20)$$

By applying (3), we obtain the similarity factor between the areas of interest in GRC1 and RC:

$$SF(I_{rc}, I_{grc_1}) = cosine(WI_{rc}, WI_{grc_1}) = \frac{WI_{rc} \times WI_{grc_1}}{|WI_{rc}| \times |WI_{grc_1}|} \approx 0.78$$

Similarly, we compute the similarity factor between the traces in GRC1 and RC:

$$SF(T_{rc}, T_{grc_1}) = cosine(WT_{rc}, WT_{grc_1}) = \frac{WT_{rc} \times WT_{grc_1}}{|WT_{rc}| \times |WT_{grc_1}|} \approx 0.35$$

To establish the similarity factor between the non-functional requirements in GRC1 and RC we use (4):

$$SF(R_{rc}, R_{grc_1}) = \frac{2/4}{1} = 0.5$$

In the same way, the similarity factors between the different characterization elements in GRC2 and RC are: $SF(I_{rc}, I_{grc_2}) \approx 0.91$, $SF(T_{rc}, T_{grc_2}) \approx 0.66$ and $SF(R_{rc}, R_{grc_2}) \approx 0.66$.

According to this example and supposing that (β = 0.8 > α = 0.5 > δ = 0.3)², the following recommendation factors are obtained using (1):

$$RF(GRC_1/RC) = \frac{0.5 \times 0.78 + 0.8 \times 0.35 + 0.3 \times 0.5}{3} \approx 0.27$$

$$RF(GRC_2/RC) = \frac{0.5 \times 0.91 + 0.8 \times 0.66 + 0.3 \times 0.66}{3} \approx 0.39$$

On the basis of those values, the query is routed to the registry that has GRC2.

V. EXPERIMENTS

In this section, we propose an architecture for implementing and testing our approach to WS discovery in a distributed registry environment. We then implement a WS discovery system according to this architecture to illustrate the feasibility of our approach.

A. The proposed architecture

Because of the dynamic nature of the registry network (a registry can join or leave the network at any time), a P2P network is well adapted to distributing WS registries. In addition, using a P2P-based decentralized infrastructure gives our system better interoperability and scalability. The proposed architecture is shown in Figure 1. In the following, we present the two layers of our architecture and show how a WS discovery query is handled.
Fig. 1. WSs discovery architecture
The Presentation & Trading layer is the access point for both requesters and providers of services. It helps requesters express their queries and providers publish their services. The layer consists of several Presentation & Trading (P&T) peers. When a requester or provider connects to the architecture, the discovery or publication is assisted by one of those P&T peers. To limit the search space, the P&T peer recommends to the service requester the registry on which he should run his query. This is done in two steps. First, it forwards the user's query to the most adequate registry community (community selection component) based on the WS category. Then, the P&T peer (the recommender system) recommends to the user one or several registries that have the highest "probability" of satisfying the query. The community selection mechanism and the registry recommendation use the techniques presented in Sections III and IV. At this stage, the service requester can execute his query on the recommended registries.

² We consider that a similarity in the invocation history is more important than a similarity in the areas of interest, which is in turn more important than a similarity in the non-functional requirements.

To summarize, the components constituting a P&T peer (Figure 2) are:
• Community selection component. Finds, based on semantic similarity, the most suitable community of registries for a service description in a publication process, or for a user's query in a discovery process.
• Recommender system. Recommends a list of registries to a user in the discovery process. The recommendation is based on past user characterizations, as presented in Section IV.
• Data component. Holds replicas of the ontologies used and contains the list of existing registry communities.
• GUI component. The interface offered to the service providers and clients that interact with the discovery system.
• Watchdog component. Monitors and updates the traces of a service consumer's characterization.
Fig. 2. Structure of a presentation & trading peer

The service providing layer contains a P2P network of registry peers. Every registry peer is a proxy to a WS registry. These peers are clustered according to the Cp concepts associated with their service descriptions, so each community of registry peers hosts descriptions of services belonging to a specific business domain (Chemistry, Mechanics, etc.). The structure of the communities is managed and updated by the presentation & trading peers and the community management peer. Communities are created on the basis of a domain ontology containing concepts that define the semantics of business domains. This ontology is stored in the data component of the presentation & trading peers. Because of the dynamic nature of this layer (arrival and departure of registry peers, publication of new services, etc.), the structure of this network has to be updated and managed regularly. A community management peer ensures this task. As this functionality is outside the scope of this paper, details are not presented.

B. Implementation

To illustrate the feasibility of our recommendation-based approach for reducing the service requester's search space, we deployed a P2P-based testbed on top of Sun Microsystems's JXTA platform [15]. The JXTA platform is suitable for implementing our distributed registry network testbed. A registry community is viewed as a peer community in which each service registry corresponds to a JXTA peer. To simulate a distributed registry environment (the service providing layer), we can implement several peer communities, i.e., registry communities, hosting several peers, i.e., registry peers. Those JXTA peers create a virtual network on top of the physical network.

Fig. 3. A screenshot of the P&T peer console captured at run-time.
Our experimental work implemented the characterization-based registry recommendation. Here, a client interacts with a group of peers, i.e., a registry community, that hosts a set of peers, i.e., registry peers. Two JXTA peers have been developed: TradingPeer and RegistryPeer. These JXTA peers have been deployed over various distributed platforms. Communications between a TradingPeer and a RegistryPeer are ensured through JXTA pipes. The WS registries are implemented using jUDDI³, the open source Java implementation of UDDI, and the RegistryPeer(s) communicate with them using the UDDI4J⁴ API. In our experiments, we created a TradingPeer, three RegistryPeer(s) that belong to the same registry community and are connected to jUDDI registries, and a Client application. Figure 3 uses some screenshots to illustrate the interactions between the peers at run-time. In this figure, the different interactions are numbered. When a service requester contacts the TradingPeer, a GUI appears that lets the client choose a predefined query and his characterization (Figure 4(a)).
³ http://ws.apache.org/juddi/
⁴ http://uddi4j.sourceforge.net/

Fig. 4. Graphical User Interface offered by the P&T peer: (a) Result of the recommendation; (b) Details of the result.

TABLE IV: PROS AND CONS OF CURRENT METHODS

| Methods | Pros | Cons |
|---------|------|------|
| Clustering WS: WSs are clustered by splitting inputs, outputs, operations and merging similar words | Reduce search space on clusters | Applied only for formatted terms & lost the meaning of WS description |
| Query string based analysis: Find the relevant WSs based on similarity between strings | Do not need to record user's profile | Could not track user's interest |
| Syntactical matching: Find similar WS by matching WS documents syntactically | Easy to implement | Lack semantic meanings of documents |
| Semantic description analysis: Match semantic descriptions of WSs | Semantic meaning of WS are considered | Lack of semantic resources |
| Rules based analysis: Operations are suggested by the rules defined by system administrators | Flexible to define & manage the rules for recommendation | Suggestions depend on personal perspectives and it's difficult to fulfill all of the rules |
| Rating based analysis: WSs are suggested based on their ratings | User's interest is considered in generating recommendations | New coming or low rated WSs are not recommended while high rated ones are always in the recommended list |
Once validated, those two files are temporary saved by the TradingPeer (Figure 3, Interaction (i1)). Using the query, the community selection component of the TradingPeer will
choose the most adequate community for the requester’s query (i2). This selection happens in compliance with the description we provided in Section III-B. Afterwards, the TradingPeer contacts the selected community (Packaging community) to get the list of registries (i3). At this stage, the TradingPeer will interact with the different RegistryPeer(s) of the selected community, in our example three, to recommend one of them. The TradingPeer recovers the global registry characterization of the first registry in the list and computes the different similarity factors vis-`a-vis the requester characterization (i4). Using those factors, we obtain the associated recommendation factor (i5). The different calculations in (i4i5) are done according to the formulas introduced in Section IV. The same steps are used to compute to recommendation factors of the two others registries. When it completes its recommendations computing (Figure 4(b)), the TradingPeer will recommend to the service requester Registry 3 belonging to the Packaging registry community (Figure 4(a)). VI. R ELATED W ORKS There is already substantial work in the field of Semantic WSs (SWS) on automating WS discovery by semantic matchmaking, mostly focussing on the retrieval performance [16], [17]. However, the computational performance of semantically enabled discovery and the practical consequences for SWS environments have not received a lot of attention so far. Current approaches show that there are two tendencies to improve WS discovery: recommender systems or search engines. Both of them reduce the search space of WSs to a list of services that
are most relevant to the user’s interests or to the keywords in the query string. The pros and cons of current research are summarized in Table IV. No solution is perfect in delivering the most relevant services to users, because each has its own limitations. This paper proposes a new solution that aims to overcome the drawbacks of current approaches and to improve WS discovery based on users’ characterizations. Although recent recommender systems have limitations in WS discovery, they remain the best available means of leading users to the expected services. In the following, we briefly present some works that used recommendation techniques to enhance the WS discovery process.

In [18], Birukou et al. propose an implicit recommender system to improve WS discovery. Their approach benefits from service developers’ past discovery experiences: data observed from requests and from their corresponding service invocations and executions. This is done without any explicit interaction with the developers. The system uses a WS based on the implicit culture theory of service developers (IC-service). For each user’s request, the IC-service collects and manages observation data from the service’s invocation and execution. Using these data, the proposed system can compute the similarity between a user’s request and the theory’s antecedent in order to recommend services.

In [19], the authors note that semantics and syntax are insufficient to discover the service that best suits a user’s needs in a grid infrastructure. They propose to add two dimensions of service description to semantics and syntax, namely quality and usage patterns. On the basis of those four service description dimensions, they propose an architecture for recommendation-based service mediation.

To handle user requirements that may change over time, Zhang and Han [20] propose to proactively recommend services to users. On the basis of the outputs of the previously chosen
services, they define the user’s interest model. The recommendation is made by matching the inputs of available services against the user’s interests. To improve the accuracy of the matching, the authors semantically annotate service descriptions and user interests.

In [21], the authors combine semantic matching of WS descriptions with recommendation techniques. To select a service, a semantic matching between the user’s requirements and the available service descriptions is executed. The list of services resulting from this matching is then ordered by a recommender system based on previous users’ ratings. Consequently, the proposed system has to explicitly ask users to rate services.

Some works use syntactical matching to recommend WSs. In [22], for example, Blake and Nowlan propose to compute a WS recommendation score by matching the strings in the user’s file (string data collected during a user’s operational sessions) against the strings describing the registered WSs (message names and operation names). To compute this score, they take into account the syntactical heterogeneity that can be observed in service descriptions (i.e., services whose inputs are functionally the same but syntactically different) and propose a recommendation algorithm that handles this point. On the basis of the obtained score, they can judge whether a user might be interested in some services.

The approach that we propose differs substantially from those mentioned above. The previous approaches were developed for service discovery in centralized registries, while we aim at discovery in a distributed registry environment. The proposed WS clustering approach automatically reduces the search space of WSs and can therefore be seen as a preliminary step to WS discovery. In addition, rather than recommending WSs, our approach recommends the registries with the highest probability of satisfying a service requester’s need.
Moreover, we believe that combining past users’ executions with a service requester’s trace is the best way to identify the real needs of a requester. To the best of our knowledge, only one approach [18] uses past users’ executions as input data for its recommendation technique. However, that approach does not consider the current service requester’s trace in its recommendation, as we do. Additionally, we use ontological concepts to model past users’ executions instead of textual descriptions extracted from past users’ requests, which are ambiguous and prone to typos.

VII. CONCLUSION

In this paper, we proposed an approach to enhance the WS discovery process in a distributed and structured registry environment. We proposed to limit a user’s search space by (1) selecting a registry community according to the user’s query and (2) recommending a registry from that community using the requester’s characterization. We also proposed a suitable architecture for implementing WS discovery systems using our approach, and we deployed and demonstrated the approach on top of a JXTA-based distributed registry environment.
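
To make the two-step narrowing concrete, the following is a minimal sketch under stated assumptions, not the implementation deployed on JXTA. A Jaccard overlap between concept sets stands in for the semantic matching used for community selection, and a cosine similarity between concept-weight vectors stands in for the similarity and recommendation factors of Section IV, whose exact formulas are not reproduced here. The function names, community contents and weights are all hypothetical and chosen only for illustration.

```python
import math


def jaccard(a, b):
    """Overlap between two concept sets (stand-in for semantic matching)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity between two sparse concept-weight vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_registry(query_concepts, requester_profile, communities):
    """Return (community, registry, scores) for a query and a requester profile."""
    # Step 1: select the community whose concepts best overlap the query.
    best_community = max(
        communities,
        key=lambda name: jaccard(query_concepts, communities[name]["concepts"]),
    )
    # Step 2: score only the registries of that community against the
    # requester characterization and keep the best one.
    registries = communities[best_community]["registries"]
    scores = {
        rid: cosine(requester_profile, characterization)
        for rid, characterization in registries.items()
    }
    best_registry = max(scores, key=scores.get)
    return best_community, best_registry, scores


# Hypothetical data: two communities, three registries in the first one.
communities = {
    "Packaging": {
        "concepts": {"packaging", "wrapping", "shipping"},
        "registries": {
            "Registry 1": {"wrapping": 1.0},
            "Registry 2": {"shipping": 1.0, "tracking": 1.0},
            "Registry 3": {"packaging": 2.0, "shipping": 1.0},
        },
    },
    "Payment": {
        "concepts": {"payment", "invoice"},
        "registries": {"Registry 4": {"payment": 1.0}},
    },
}

query = {"packaging", "shipping"}
profile = {"packaging": 1.0, "shipping": 1.0}

community, registry, scores = recommend_registry(query, profile, communities)
print(community, registry)
```

With this toy data, only the three registries of the selected community are scored; the registry of the other community is never examined, which is the search-space reduction the approach relies on.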
Our future research work will focus on continuing the experimentation we started. We plan to collect and publish WS descriptions on the implemented jUDDI registries so that we can simulate the entire discovery process. On that basis, we can collect performance indicators for our approach, such as recall, precision and discovery time. Those indicators will allow us to compare our recommendation-based discovery approach to a matching-based approach (i.e., one where discovering the right service in a registry community implies matching the query against all the services in that community). When using our approach, we might obtain less pertinent results among the discovered WSs. However, the discovery time should be shorter, since we do not have to scan all registries. On the basis of our prototype, it will be interesting to observe the ratio of result pertinence to discovery time obtained with our approach.

REFERENCES

[1] P. Traverso and M. Pistore, “Automated composition of semantic web services into executable processes,” in ISWC, 2004.
[2] M. Hepp, F. Leymann, J. Domingue, A. Wahler, and D. Fensel, “Semantic business process management: A vision towards using semantic web services for business process management,” in ICEBE, 2005.
[3] J. Garofalakis, Y. Panagis, E. Sakkopoulos, and A. Tsakalidis, “Web service discovery mechanisms: Looking for a needle in a haystack?” in International Workshop on Web Engineering, 2004.
[4] M. Sellami, S. Tata, and B. Defude, “Service discovery in ubiquitous environments: Approaches and requirements for context-awareness,” in BPM Workshops, 2008.
[5] K. Sivashanmugam, K. Verma, and A. P. Sheth, “Discovery of web services in a federated registry environment,” in ICWS, 2004.
[6] B. Xu and D. Chen, “Semantic web services discovery in P2P environment,” in ICPPW Workshops, 2007.
[7] J. D. Garofalakis, Y. Panagis, E. Sakkopoulos, and A. K. Tsakalidis, “Contemporary web service discovery mechanisms,” J. Web Eng., vol. 5, no. 3, 2006.
[8] M. Klusch, B. Fries, and K. Sycara, “Automated semantic web service discovery with OWLS-MX,” in AAMAS, 2006.
[9] P. Resnick and H. R. Varian, “Recommender systems,” Commun. ACM, vol. 40, no. 3, 1997.
[10] A. Felfernig, G. Friedrich, and L. Schmidt-Thieme, “Guest editors’ introduction: Recommender systems,” IEEE Intelligent Systems, vol. 22, no. 3, 2007.
[11] B. Sarwar, G. Karypis, J. Konstan, and J. Reidl, “Item-based collaborative filtering recommendation algorithms,” in WWW, 2001.
[12] G. D. Linden, J. A. Jacobi, and E. A. Benson, “Collaborative recommendations using item-to-item similarity mappings,” Patent, 2001. [Online]. Available: http://www.freepatentsonline.com/6266649.html
[13] M. Paolucci, T. Kawamura, T. R. Payne, and K. P. Sycara, “Semantic matching of web services capabilities,” in ISWC, 2002.
[14] G. Salton, A. Wong, and C. S. Yang, “A vector space model for automatic indexing,” Commun. ACM, vol. 18, no. 11, 1975.
[15] B. Traversat, A. Arora, M. Abdelaziz, M. Duigou, C. Haywood, J.-C. Hugly, E. Pouyoul, and B. Yeager, “Project JXTA 2.0 Super-Peer Virtual Network,” Sun Microsystems, Tech. Rep., 2003.
[16] M. Paolucci, T. Kawamura, T. R. Payne, and K. P. Sycara, “Semantic matching of web services capabilities,” in ISWC, 2002.
[17] L. Li and I. Horrocks, “A software framework for matchmaking based on semantic web technology,” in WWW, 2003.
[18] A. Birukou, E. Blanzieri, P. Giorgini, and N. Kokash, “Improving web service discovery with usage data,” IEEE Softw., vol. 24, no. 6, 2007.
[19] B. Mehta, C. Niederée, A. Stewart, C. Muscogiuri, and E. J. Neuhold, “An architecture for recommendation based service mediation,” in ICSNW, 2004.
[20] C. Zhang and Y. Han, “Service recommendation with adaptive user interests modeling,” in ICDCIT, 2007.
[21] U. S. Manikrao and T. V. Prabhakar, “Dynamic selection of web services with recommendation system,” in NWESP, 2005.
[22] M. B. Blake and M. F. Nowlan, “A web service recommender system using enhanced syntactical matching,” in ICWS, 2007.