Service retrieval based on behavioral specification Daniela Grigori, Mokrane Bouzeghoub Prism, Universite de Versailles Saint-Quentin en Yvelines 45 avenue des Etats-Unis, 78035 Versailles Cedex, FRANCE
[email protected]
Abstract The capability to easily find useful services (software applications, software components, scientific computations) becomes increasingly critical in several fields. Current approaches for components retrieval are mostly limited to the matching of their inputs/outputs. Recent approaches have demonstrated that this approach is not sufficient to discover relevant components. To go beyond the limits of these approaches, a substantial effort has been done by different works on semantic web and ontologies by exploiting more knowledge on the semantics of the components. However, this effort still remains insufficient and does not fulfill user needs as many functional or quality aspects are hidden within the specification of components behavior. In this paper we argue that, in many situations, the component discovery process should be based on the specification of this behavior, that is the process model (e.g. workflow) which describes each component. The idea behind is to develop matching techniques that operate on process models and allow delivery of partial matches and evaluation of semantic distance between these matches and the user requirements. Consequently, even if a service satisfying exactly the user requirements does not exist, the most similar ones will be retrieved and proposed for reuse by extension or modification. To do so, we reduce the problem of service behavioral matching to a graph matching problem and we adapt existing algorithms for this purpose. We propose a set of operators useful in behavioral matching that can be implemented using variations of the same algorithm. Keywords: web services, services retrieval, behav-
ioral matching
1. Introduction The capability to easily find useful services (software applications, software components, scientific computations) becomes increasingly critical in several fields. Examples of such services are numerous: • Software applications as web services which can be invoked remotely by users or programs. One of the problems arising from the model of web services is the need to put in correspondence service requesters with service suppliers, especially for services which are not yet discovered or which are new, taking into account the dynamic nature of the Web where services are frequently published and removed. • Programs and scientific computations which are important resources in the context of the Grid, sometimes even more important than data [Fos03], [Fos02]. In such a system, data and procedures are first rank classes which can be published, searched and handled. Thus, the scientists need to retrieve procedures with desired characteristics, to determine if a required calculation was already carried out and whether it is more advantageous to carry out it again or to retrieve data generated previously. • Software components which can be downloaded to create a new application. To reduce the development, test and maintenance costs, a fast solution is to re-use existing components.
In all these cases, users are interested in finding suitable components in a library or collection of models. User formulates a requirement as a process model; his goal is to use this model as a query to retrieve all components whose process models match with a whole or part of this query. If models that match exactly do not exist, those which are most similar must be retrieved. For a given task, the models that require minimal modifications are the most suitable ones. Even if the retrieved models have to be tailored to the specific needs of the task, the effort for the tailoring will be minimal. In the next section we present several motivating scenarios. Section 3 presents existing approaches for service retrieval and shows their drawbacks for the presented scenarios. In section 4 and 5 we show how the behavioral matching is reduced to a graph matching problem and fix base requirements for this purpose. Section 5 presents the algorithm which constitute our background and sets the preliminary definitions. Section 6 shows how this algorithm can be adapted to our problem and section 7 exemplify its use for composite service discovery. Finally section 8 present ongoing work and conclusions.
2 Motivating scenarios 2.1 Web services integration: Consider a company that uses service S to order office suppliers. Suppose that the company wants to find retailers (say WalMart or Target) having compatible web services (a new retailer or replacing the current partner). The allowed message exchange sequences are called business protocols and can be expressed for example using BPEL abstract processes, WSCI, or other protocol languages (see, e.g., [BeCT04]). The specification of the business protocol is important, as it rarely happen that service operations can be invoked independently from one another. Thus the company will search for a service having a compatible business protocol. Among retailer services, the most compatible one has to be found. If the service is not fully compatible, the company will adapt its service or will develop an adaptor in order to interact with the found service. In both situations, the differences between the business protocols have to be automatically identified. In the former case, finding the most similar service
allows to minimize the development cost. In the latter case, identifying automatically the differences between protocols is the first stage in the process of semiautomatically developing adapters (see [BCG+05]). As an example, suppose that the first protocol may expect to exchange messages in the following order : clients can invoke login, then getCatalogue to receive the catalogue of products including shipping options and preferences (e.g., delivery dates), followed by submitOrder, sendShippingPreferences, issueInvoice, and makePayment operations. In contrast, the second protocol allows the following sequence of operations: login, getCatalogue, submitOrder, issueInvoice, makePayment and sendShippingPreferences. This is possible, e.g. because the provider does not charge differently according to shipping preferences. Clients are allowed to specify their shipping preferences at a final step. The retrieval process is expected to show that the two services are similar, excepting the order of the operations. In this case, an adapter that resolves the mismatch order can be automatically generated using the techniques presented in [BCG+05]. Note that finding retailer services taking into account only inputs and outputs can be used only as a first filter as it does not guarantee that retrieved services support the same order of exchanged messages.
2.2 Delta-analysis: Delta-analysis [GO97] consists in comparing two models. For instance, the first describes the current state of a business process in the company while the second highlights how the business process should be carried out in the future. The future model can be a reference model or a model built especially for the needs of the company by a consultant. The reference model is important in those business environments where the interaction has been standardized either de jure or de facto (e.g. due to the presence of a dominant player in the market). For example, the RosettaNet consortium standardizes the external behavior of services in the IT supply-chain service. The reference model describes the processes in a generic form, independently of the company. To make the decision whether the current model should be or not adapted to the future model (”reengineered”), the development cost for adapting
it must be taken into account. The cost and effort of adapting the process to a required model can be asset by automatically identifying the differences between the models and assigning costs for each type of modification needed.
2.3 Reusing components: Reusing components permits to reduce modeling and development costs. The application designer formulates a requirement model that contains essential elements (or objects) of his application. The discovery process will first match these elements against those of the component library and selects the most appropriate ones to be composed in the user application. The remaining components which do not match will be be developed by the application programmers. Compared to services reuse whose implementations are encapsulated, component reuse permits code manipulation and allows code specialization and code overriding. For this reason, providing partial matching elements may be of great interest. Similarly to services matching, components matching gets more relevant results if it is based on the formal specifications of the components behavior, that is the process model of each component.
2.4 Reverse engineering: Reverse engineering allows to abstract source code into high level specification for different goals: either to improve source code documentation or to migrate a legacy application into a modern system. Matching the behavioral models resulted from reverse engineering processes may respectively allow computing deltas between successive models (e.g. to capture transitions between models) or checking whether a legacy component can be involved or not within a new system architecture (e.g. to migrate its persistent data).
2.5 Scientific experiments: Suppose that a scientist would like to apply an astronomical analysis consisting in several steps invoking different applications. If the analysis was already carried out and the results stored, he will save months of calculations. If a program which makes this analysis exists, he should not write one from scratch. He
would need to modify an existing program, for instance only by adding a step, adding data in the data flow, etc. Similarly, many biological applications result from long experimental processes which are based on specific conditions acquired from several tests and refinements. The reuse of such applications as services to build more complex experiments reduce dramatically the development cost as the ground experiments, which may take months or years, are not run again. Providing a powerful and relevant service retrieval to a scientific Grid community may save a lot of money and increase significantly scientists’ productivity.
3. Related work Currently, the algorithms for Web services discovery in registers like UDDI or ebXML are based on a search by key words or tables of correspondence of couples (key-value). Within the framework of the semantic Web, description logics were proposed for a richer and precise formal description of services. These languages allow the definition of ontologies, such as for example DAML-S, which are used as a basis for semantic matching between a declarative description of the required service and descriptions of the services offered ([PKP+02], [BeK02], [BeH03]). In [PKP+02], [BeH03], a published service is matched with a required service when the inputs and outputs of the required service match (are equal or a generalization of the type) the inputs and outputs of the published service. [BeK02] defines a query language for services which allows to find services by specifying conditions on the activities which compose them, the exceptions treated, the flow of the data between the activities. In [Kaw03] independent filters are defined for service retrieval: the name space, textual description, the domain of ontology that is used, types of inputs/outputs and constraints. The approach presented in [Car03] takes into account the operational properties like execution time, cost and reliability. The software engineering community developed techniques of deductive research (services and the requests are expressed formally by using logic) and techniques based on the analysis of the execution (comparison between the desired behavior at output compared to that observed). In the context of the Grid [Fos02], the research of
procedures is based on a high-level language which expresses the relations between procedures and data used in input and produced as output. We can see through these references that service retrieval based of key words or some semantic attributes is not satisfactory for a great number of applications. The tendency of recent work is to exploit more and more knowledge on service components and behavior. The need to take into account the behavior of the service described by a process model was underlined by several researchers [TBG01], [BaV03], [BeK02], [ZdJ04], [PVM01], [WMF+04]. In [BeK02], in order to improve precision of web service discovery, the process model is used to capture the salient behavior of a service. The greater expressiveness of process models, as compared with keywords, offers the potential to increase substantively retrieval precision. [BaV03] argues that the matchmaking based on service input and output is not sufficient as some output data may be produced only under certain internal conditions. Thus, they propose an algorithm that matches output data taking into account the process structure, for instance conditional branching. [TBG01] underlines the importance of including behavior aspects in matchmaking process in the B2B environment and mention it as a future work. [PVM01], which presents a model for dynamic service aggregation, stresses also the capability to automatically verify the behavioral compatibility of various processes as a requirement in electronic marketplaces. [ZdJ04] deals with the equivalence of two processes modeled using Petri nets. It is supposed that partners discover each other by searching in business registry, and then they agree on a protocol. Their work verifies the compatibility between the agreed protocol and the process existing in the enterprise. We take a different approach, by allowing to find a partner that is compatible (fully or partially) to an existing enterprise process. Very recently, authors in the academia have published papers that discuss similarity and compatibility at different levels of abstractions of a service description (e.g., [BeCT04a, Bordeaux04, DHM+04, PoFo04, WMF+04]). In terms of protocols specification and analysis, existing approaches provide models (e.g.,
based on pi-calculus or state machines) and mechanisms to compare specifications (e.g., protocols compatibility checking). Our objective is to propose an approach for service retrieval based on behavioral specification. To the best of our knowledge there is not another approach allowing to retrieve services having similar behaviour.
4. A meta-model for behavioral service retrieval Web services interface definition is complemented by conversation model. Conversation protocol describes the observable behavior of a web service by specifying constraints on exchanged messages order for correct interaction with the service. The composition model allows to integrate web services according to the specification of an explicit process. For both models (composition and conversation), most of existing proposals (standard and research models) are graph based. For this reason, we choose to use a metamodel based on process graphs in order to compare two process models. A process is represented as a directed graph, whose nodes are activities. Edges have associated transition conditions expressing the control flow dependencies between activities. An activity having more than one incoming edge is a join activity; it has an associated join condition. For an or-join activity the associated join condition is the disjunction of conditions associated to incoming edges. For an and-join activity the associated join condition is the conjunction of conditions associated to incoming edges. Activities have a type and can be atomic or composed. Activities associated to web services consume input data elements and produce output data elements. They specify also the name of the operation. In summary, a process model is represented by a directed attributed and labeled graph whose nodes are activities and whose edges represent control flow constraints. A labelled graph will be defined by a quadruple G = (V, E, rV , rE ) : - V is the set of vertices - E ⊂ V × V is the set of edges - rV : V → LV is a function assigning labels to the vertices
- rE : E → LE is a function assigning labels to the edges. where LV and LE denote the set of vertex and edge labels, respectively. Let p be the number of node attributes, and LV = A1 × A2 × · · · Ap . Edge (u, v) originates at node u and terminates at node v. In this section we proposed a graph-based metamodel for service behavioral specification. In the following we will use it to compare two processes and to allow an inexact match.
5 Background and preliminary definitions Using graphs as a representation formalism for both user requirements and service models, the service matching problem turns into a graph matching problem. We want to compare the process graph representing user requirements with the model graphs in library. The matching process can be formulated as a search for graph or subgraph isomorphism. However, it is possible that it does not exist a process model such that an exact graph or subgraph isomorphism can be defined. Thus, we are interested in finding process models that have similar structure, if models that have identical structure do not exist. The error-correcting graph matching integrates the concept of error correction (or inexact matching) into the matching process ([Saha81], [Bun00]). To make the paper self-contained, in this section we recall the principle of this graph matching method and the basic definitions (taken from [Mes95]). In order to compare the model graphs to an input graph and decide which of the model is most similar to the input, it is necessary to define a distance measure for graphs. Similar to the string matching problem where edit operations are used to define the string edit distance, the subraph edit distance is based on the idea of edit operations that are applied to the model graph. The graph edit operations are used to alter the model graphs until there exist subgraph isomorphism to the input graph. To each of the graph edit operations, a certain cost is assigned. The costs are application dependent and reflect the likelihood of graph distortions. The more likely a certain distortion is to occur the smaller is its cost. The subgraph edit distance from a model to an input graph is then defined to be the minimum cost taken over all sequences of edit operations
that are necessary to obtain a subgraph isomorphism. It can be concluded that the smaller the subgraph distance between a model and an input graph, the more similar they are. The following graph edit operations are considered: vertex and edge label modifications, inserting and suppressing vertices, inserting and suppressing edges. In the following we give the definitions for error correcting graph matching (taken from [Mes95]). Definition 5.1 Graph isomorphism Let g and g’ be graphs. A graph isomorphism between g and g′ is a bijective mapping f : V → V ′ such that - rV (v) = rV′ ′ (f (v)) for all v ∈ V - for any edge e = (u, v) ∈ E there exists an edge e′ = (f (u), f (v)) ∈ E ′ such that rE (e) = ′ (e′ ) and for any edge e′ = (u′ , v ′ ) ∈ E ′ there rE ′ exists an edge e = (f −1 (u), f −1 (v)) ∈ E such ′ (e′ ). that rE (e) = rE ′ If f : V → V ′ is a graph isomorphism between graphs g and g′ , and g′ is a subgraph of another graph g”, i.e. g′ ⊂ g”, then f is called a subgraph isomorphism from g to g”. Given a graph G, a graph edit operations δ on G is any of the following: • substituting the label rV (v) of vertex v by l (this implies substituting all the attributes of vertex v rV (v) by (l1 , l2 · · · lp ) • substituting the label rE (e) of edge e by l′ • deleting the vertex v from G (for the correction of missing vertices) Note that all edges that are incident with the vertex v are deleted, too. • deleting the edge e from G (for the correction of missing edges). • Inserting an edge between two existing vertices (for the correction of extraneous edges). The five edit operations are powerful enough to transform any graph G into a subgraph of any other GI . Therefore it is always possible to find a sequence of edit operations that transform a model graph G into another graph G′ such that a subgraph isomorphism f from G′ to a distorted input graph GI exists.
Definition 5.2 Edited graph Given a graph and an edit operation δ , the edited graph δ((G)) is a graph in which the operation δ was applied (modifying an edge or vertex label, suppressing an edge or vertex, inserting an edge). Given a graph G and a sequence of edit operations ∆ = (δ1 , δ2 , · · · δk , ), the edited graph ∆(G) is a graph ∆(G) = δk (· · · δ2 (δ1 (G)))..). Definition 5.3 Ec subgraph isomorphism Given two graphs G and G′ , an error correcting (ec) subgraph isomorphism f from G to G′ is a 2-tuple f = (∆, f∆ ) where - ∆ is a sequence of edit operations such that there exists a subgraph isomorphism from ∆(G) to G′ . - f is a subgraph isomorphism from ∆(G) to G′ . The cost of an ec subgraph isomorphism f = (∆, f∆ ) is the cost of the edit operations ∆ , i.e; C(f ) = C(∆(G)). There is usually more than one sequence of edit operations such that a subgraph isomorphism from ∆(G) to G′ exist and, consequently, there is usually more than one ec subgraph isomorphism from G to G′ . We are interested in the ec subgraph isomorphism with minimum cost. Definition 5.4 Subgraph edit distance Let G and G” be two graphs. The subgraph distance from G to G”, de(G, G”) is given by the minimum cost taken over all error-correcting subgraph isomorphism f from G to G”. The subgraph isomorphism detection is based on a state-space search by means of an algorithm similar to A* [Saha81]. The basic idea of a state-space search is to have states representing partial solutions of the given problem and to define transitions from one state to another such that the latter state represents a more complete solution than the previous state. For each state s there is an evaluation function f(s) which describes the quality of the represented solution. The states are expanding according to the value of f. In the case of ec subgraph isomorphism detection, given a model graph G and an input GI a state s in the search space represents a partial match from G to GI. Different algorithms have been proposed for ec graph matchingin order to reduce the computation complexity (see for example [Mes95], [MeBu98]).
6. Behavioral matching In this section we show how the error-correcting subgraph isomorphism algorithm can be used for service behavioral matchmaking. First we define a similarity measure based on the subgraph distance (see definition ?? ) allowing to rank models in the library. Then we deal with the situation when model graphs are matched against disjoint subgraphs in the input graph. Finally a set of operators useful for model management are proposed.
6.1. Similarity measure for behavioral matching The subgraph edit distance defined previously expresses the cost of transformation needed to adapt the model graph in order to cover a subgraph in the input model. This distance is asymmetric, it represents the distance from the model graph to the input graph. In order to rank the model graphs, the similarity measure has to take into account the number of vertices in the input graph that were covered by the model graph. If two model graphs have the same subgraph distance to the input graph but are matched to subgraphs with different number of nodes, the one that matches a subgraph with more nodes will be preferred. Depending on the application, the similarity measure can be defined in different ways. The total distance between the two graphs can be defined as the sum of the subgraph edit distance and the cost of adding the nodes of the input graph not covered by the ec subgraph. The second possibility is to define it as the subgraph edit distance (de) divided by the number of nodes of model graph (NM ): D = de/NM . The similarity measure is the inverse of this distance.
6.2. Composing fragments to match the input graph If two models are matched into 2 subgraphs of the input model that are disjoint (the set of nodes are disjoint), then it is possible to combine them to form a graph that will be matched to a bigger subgraph of the input graph. Suppose two model graphs G1 and G2 . Let f1 = (∆1 , f∆1 ) and f2 = (∆2 , f∆2 ) be two ec subgraph isomorphism from G1 and G2 to GI , respectively. The problem is to find an ec subgraph isomorphism from
G = G1 U G2 to GI that is based on f1 and f2 . f1 and f2 can be combined if no two vertices in ∆(G1 ) and ∆(G2 ) are mapped into the same input vertex. More precisely, the intersection of the images of f∆1 and f∆2 must be empty. The construction of an ec subgraph isomorphism f = (∆, f∆ ) from f1 and f2 requires that a set of edit operations ∆ and a subgraph isomorphism f∆ are generated on the basis of ∆1 , ∆2 , and f∆ 1,f∆ 2 respectively such that ∆ is a subgraph isomorphism from (G1 ∪ G2 ) to GI : f∆ (v) = f∆1 (v) if v ∈ V∆1 f∆2 (v) if v ∈ V∆2 ∆ = ∆1 + ∆2 + ∆E ∆E is constructed as follows: For each pair (vi , wj ), vi ∈ V1 , wj ∈ V2 If there is an edge eI = (f∆1 (vi), f∆2 (wj)) ∈ GI , then an edge (vi, wj) must be inserted in G1 ∪ G2 . To summarize, if two models G1 and G2 cover disjoint subgraphs in the input graph, it is possible to construct a model that is their union in order to find a better match for the input graph.
6.3. Behavioral matching operators Any algorithm that implements optimal ec subgraph isomorphism can be used for graph isomorphism, subgraph isomorphism or maximum common subgraph detection if it is run under a cost function that satisfies the following conditions [Bun99]: • cost of node deletion + cost of node insertion is less than cost of node substitution. • cost of node deleting + cost of node insertion is less than cost of edge substitution. There won’t be any substitution operation in a minimum cost sequence of edit operations, and the only edit operations actually applied are insertions and deletions. The ec matching algorithm allows to compute the following operators. • Match- The operator Match takes two models as input and returns a mapping betwwen them. The mapping identifies objects in the models that are either equal or similar. Two versions of the match operator can be defined. ExactM atch(G1, G2) is equivalent to graph or subgraph isormorphism
and can be found by the ec subgraph isomorphism algorithm under certain cost functions. The approximate match M atch(G1, G2) returns an ec subgraph isomrophism from G1 to G2 . • Difference - The difference between two models is the set of objects in one model that do not correspond to any object in the other model. The difference is calculated relative to a given mapping, computed by the match operator. Dif fmatchM −>I (GM , GI ) the difference between the model graph and the input graph relative to the ec subgraph isomorphism from GM to GI represents the set of vertex and the set of edges that were suppressed from GM in the sequence of edit operations ∆. Dif fmatchM →I (GI , GM ) the difference between the input graph and the model graph relative to the ec subgraph isomorphism from GM to GI represents subsets of nodes and edges of GI that have no correspondences in the model graph. The set of nodes is VM − f∆ (VI ) and the set of edges is EM − f∆ (VI ) × f∆ (VI ). • Merge - returns a copy of all objects in the two models, but objects of input models that are equal are represented only once in the output model. Having two models G1 , G2 , the operator M erge(G1 , G2 ) determines the minimum common supergraph of G1 and G2 . For instance, consider the example of two subsidiary companies that fusion and whose marketing departments target different, but overlapping market segments. In this case, the merge operator could be applied to the processes modeling the activity of these departments. • Intersection - The intersection between two models returns the set of objects that are common to both models. In term of graphs, Intersection(G1 , G2 ) represents the maximum common subgraph of G1 and G2 . • Match composition - Having M atch(G1 , M ), M atch(G2 , M ) the problem is to determine M atch(G1 ∪ G2 , M ). In section ?? we showed how this operator is calculated.
7 Semantic composite web service discovery In this section we illustrate the use of the algorithm presented above for semantic composite web service discovery. Suppose that user needs to develop a new composite web service. He specifies the composition model and then he will try to find in the library similar web services or fragments that could be composed to develop the new web service. In order to allow a semantic match, we suppose that the following semantic information is attached to each service (the composite web service and each web service component) similar to [SVS+03]: • Name of operation. In the process definition, an activity attribute, called operation-concept is added which allows users to search for operations based on ontological concepts. For instance, operations buyTicket and cancelTicket are mapped to ontological concepts TicketBooking and TicketCancellation in the TravelService Ontology. • Input and output parameters of activities. Using ontologies to annotate input and output elements brings user requirements and service advertisements to common conceptual space and helps to apply reasoning mechanism to find a better match. The semantics implied by these parameters, which are known only to provider of the service, can be made explicit. For example, the input Travel Details and output Confirmation are mapped to ontological concepts TicketInformation and ConfirmationMessage, respectively. The mapping of parameters to ontological concepts could be done manually or semiautomatically. In [DHM+04] an algorithm is presented that clusters operation parameters names from WSDL descriptions into meaningful concepts. • preconditions and effects. Each activity may have a number of preconditions and effects. The preconditions are some logical conditions, which must be true for executing the activity. Effects are changes in the world after the execution of the operation. These preconditions and effects are mapped to ontological concepts. Similarly, tran-
sition conditions associated to edges are mapped to ontological concepts.
7.1 Behavioral matchmaking User enters web service requirements as a template constructed using ontological concepts. The template and existing process models are transformed in graphs. In this way, the problem of service matchmaking is transformed into a problem of graph matching. If user defines input, output parameters and operation name for the new composite web service, then a first filter could be applied for components retrieval based on these properties. The behavioral matching will be applied either to the set of services retrieved in the first step or independently. We use the algorithm for ec subgraph isomorphism to retrieve the most similar models. For each attribute of a service, the cost function of substituting an attribute value has to be defined. These attributes being concepts in the ontology, the cost function is the distance between these concepts in the ontology. We use Wu and Palmer similarity measure [WP94] to calculate this distance. Given a tree, and two nodes a,b of this tree, we first of all find their deepest (in terms of depth in the tree) common ancestor c. The similarity measure is computed as follows: SW P (a, b) = 2 × Depth(c)/(Depth(a) + Depth(b) + 2 × Depth(c)) The distance (cost) will be: d(a, b) = 1 − SW P (a, b) Each service is characterized by following attributes : operation name, input, output, pre-conditions and effects. For the ec algorithm the cost function for each graph operation has to be defined. The cost for deletion/insertion of an edge and an vertex can be set to constants. The cost for substituting a label and its attributes is defined as follows. Given a label l1 and attributes a1 , a2 , · · · an and another label l2 and attributes b1 , b2 , · · · bm , the cost for substituting l1 with l2 is computes as follows. If l1 and l2 are not identical then the substitution cost is set to the corresponding constant. If, on the other hand l1 and l2 are identical and n=m, the substitution cost CS is defined to be the weighted mean of distances between a1 , a2 , · · · an and b1 , b2 , · · · bn , i.e., P P CS = d(ai , bi ) × wi / wi
Applying the ec subgraph isomorphism algorithm to compare user requirements with the model graphs allows to find the most similar models in the library. The algorithm outputs also the transformations needed, for exemple adding an edge, or changing the order of two nodes. In a second stage, if some models cover disjoint subgraphs of user model, a composition of these models can be proposed.
8. Conclusion In this paper we proposed a solution for process retrieval based on behavioral specification. First we motivated the need to retrieve services based on their composition or conversation model. By using a graph representation formalism for services, we proposed to use a graph error correcting matching algorithm in order to allow an inexact matching. The behavioral matchmaking can be implemented as a web service that takes as input the graph representations of two processes and calculates the degree of similarity between them and outputs also the transformations needed to transform one process into the other. This service invokes the service that matches services based on functional requirements (services considered as black boxes) for which methods were proposed in the literature. We also proposed a set of process management operators that are useful for process retrieval. We are working on extending the algorithm and the operators to tackle the situation when one node in the first graph corresponds to a subgraph of the second graph. This situation appears when the first service has a single operation (activity) to achieve certain functionality, while in the second service the same behavior is achieved by receiving several messages. Suppose that a protocol requires to receive the purchase order as well as shipping preferences in one message, while the second one needs two separate messages for this purpose (sendShippingpreferences and submitOrder). While in this paper we dealt with the semantics aspects of behavioral matchmaking, we did not address the operational aspects. The graph isomorphism computation being a NP-complete problem, in our future work we will try to apply constraints and heuristics to cut down the computational effort to a manageable size.
9 References [BaV03] S. BANSAL, J. M. VIDAL. Matchmaking of web services based on the DAML-S service model. ” In Proc. of the Second In.l Joint Conference on Autonomous Agents and Multiagent Systems ”, pages 926-927, 2003. [BeCT04] B. Benatallah, F. Casati, and F. Toumani. Web services conversation modeling: A Cornerstone for E-Business Automation. IEEE Internet Computing, 8(1), 2004. [BCG+05] B. Benatallah, F. Casati, D. Grigori, H. R. Motahari Nezhad, F. Toumani, Developing Adapters for Web Services Integration, CAISE 2005. [BeH03] B. BENATALLAH, M. S. HACID, C. REY and F. TOUMANI. Semantic Reasoning for Web Services Discovery. WWW2003 Workshop on EServices and the Semantic Web. May 20, 2003. Budapest. Hungary. [BeK02] A. BERNSTEIN, M. KLEIN. Towards High-Precision Service Retrieval in ” Proc. of Int. Semantic Web Conference (ISWC) ”, 2002. [Bordeaux04] L. Bordeaux et al. When are two Web Services Compatible?. VLDB TES’04. Toronto, Canada. 2004. [Bun99] H. BUNKE: Error correcting graph matching: On the influence of the underlying cost function, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol 21, No 9, 1999, 917 - 922 [Bun00] H. BUNKE Recent developments in graph matching, Proc. 15th Int. Conf. on Pattern Recognition, Barcelona, 2000, Vol 2, 117 - 124 [CaS03] J. CARDOSO, A. SHETH. Semantic EWorkflow Composition. in ” Journal of Intelligent Information Systems ) ”, Volume 21 , Issue 3 (November 2003), 191 - 225. [DHM+04] L. DONG, A. HALEVY, J. MADHAVAN, E. NEMES and J. ZHANG. Similarity Search for Web Services. Proceedings of VLDB, 2004. [FVW+02] I. FOSTER, J. VOECKLER, M. WILDE, Y. ZHAO. Chimera: A Virtual Data System for Representing, Querying and Automating Data Derivation. in ” Proc. of the 14th Conf. on Scientific and Statistical Database Management ”, Edinburgh, Scotland, July 2002. [FVW+03] I. FOSTER, J. V?CKLER, M. WILDE, Y. ZHAO. The Virtual Data Grid: A New Model and
Architecture for Data-Intensive Collaboration. . in ” CIDR - Conf. on Innovative Data System Research ”, 2003. [GO97] V. GUTH, A. OBERWEIS. Delta-Analysis of Petri Net based Models for Business Processes, In: E. Kovacs, Z. Kovacs, B.Cserto, L. Pepei (Hrsg.): Proceedings of the 3rd Int. Conference on Applied Informatics, Eger-Noszvaj/Ungarn, S. 23-32; [KBH+03] T. KAWAMURA, J.A. DE BLASIO, T. HASEGAWA, M. PAOLUCCI, K. SYCARA. A Preliminary Report of a Public Experiment of a Semantic Service Matchmaker combined with a UDDI Business Registry, in ” Proc. of 1st International Conference on Service Oriented Computing (ICSOC) ”, Trento, Italy, December 2003. [Mes95] B. Messmer, Graph Matching Algorithms and Applications, PhD Thesis, University of Bern, 1995 [MeBu98] B. Messmer, H. Bunke ,A New Algorithm for Error-Tolerant Subgraph Isomorphism Detection, IEEE Trans. PAMI, 20:493-505, 1998 [PKP+02] M. PAOLUCCI, T. KAWAMURA, T. R. PAYNE, K. SYCARA. Semantic Matching of Web Services Capabilities. in ” Proc. of The First International Semantic Web Conference (ISWC) ”, Sardinia (Italy), June, 2002. [PoFo04] S. R. Ponnekanti and A. Fox. Interoperability among Independently Evolving Web Services. Middleware ’04. Toronto, Canada, 2004. [PVM01] G. PICCINELLI, G. DI VITANTONIO, L. MOKRUSHIN. Dynamic service aggregation in electronic marketplaces. . in ” Computer Networks ”, 37(2): 95-109, 2001. [TBG01] D. TRASTOUR, C. BARTOLINI, J. GONZALEZ-CASTILLO. A Semantic Web Approach to Service Description for Matchmaking of Services. in ” Proc of Int. Semantic Web Working Symposium (SWWS) ”, Stanford University, California, USA, 2001. [WMF+04] A. WOMBACHER, B. MAHLEKO, P. FANKHAUSER, E. NEUHOLD. Matchmaking for Business Processes based on Choreographies IEEE International Conference on e-Technology, e-Commerce and e-Service(EEE-04), March 2004, Taipei, Taiwan [Saha81] L. G. Shapiro and R. M. Haralick, ”Structural descriptions and inexact matching,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI- 3, pp. 504-
519, 1981. [SVS+03] K. SIVASHANMUGAM, K. VERMA, A. SHETH, J. MILLER. Adding Semantics to Web Services Standards, International Conference on Web Services (ICWS’03), June 2003 [WP94] Z. WU and M. PALMER ”Verb Semantics and Lexical Selection”, Proceedings of the 32nd Annual Meetings of the Associations for Computational Linguistics, pages 133-138. [ZdJ04] J. ZDRAVKOVIC, P. JOHANESSON. Cooperation of Processes through Message Level Agreement, in ”Proc. of Int. Conf. On Advanced Information Systems Engineering (CAISE) ”, Riga, Latvia, June 2004.