Integrating Directories and Service Composition

Ion Constantinescu and Boi Faltings
Artificial Intelligence Laboratory
Swiss Federal Institute of Technology
IN (Ecublens), CH-1015 Lausanne (Switzerland)
{ion.constantinescu, boi.faltings}@epfl.ch
http://liawww.epfl.ch

Abstract. A major challenge in web service composition is that there will be a large number of often redundant web services that can be used. Furthermore, the set of services is unbounded and changes dynamically. Consequently, web service composition methods have to be tightly integrated with directories or search engines for such services. We introduce an approach for creating scalable directories of web services modelled by DAML-S service descriptions. Based on this approach, we develop a service composition algorithm that incrementally queries the directory. Our contribution also includes an implementation of such a system that helped us in understanding the practical challenges in building a tool for service integration. The major novelty of our approach is that it is fully able to deal with partial matches, both in the directory and the planning algorithm. Keywords: Semantic Web Services, Semantic Interoperability, Searching and Querying

1 Introduction

While the current WWW is most commonly accessed through a browser, the future semantic web will very often be accessed through web services. This will require automatic techniques for finding and then composing these services to achieve the desired functionality. Previous work has shown techniques for matchmaking (for example, LARKS [10]) and service composition with AI planning techniques (for example, [8], [11]). In order to be useful in a full-scale semantic web, these techniques will have to be scaled to environments with large numbers of services. Furthermore, it will often be the case that web services can fulfill the requested functionality only partially. For example, we might be interested in finding a phone directory for the US but find only regional directories for each state. In this case of partial matches, a general service can still be composed by routing queries to the relevant information sources. Although service composition is related to AI planning, there are a number of differences between the two which make the direct adoption of planning techniques for service integration questionable: the number of services is very large, but on the other

hand, the resource conflicts which have motivated many of the AI planning algorithms are rare. In this paper, we show how to construct scalable directories for large numbers of web services and how to build service composition algorithms on this basis. We investigate the scaling behavior on randomly generated scenarios and show that the directory component significantly enhances the scalability of service composition.

1.1 A music sharing scenario

Let's consider as an example (see Figure 1) the following scenario: a personal agent (PA) is delegated the task of finding for its user an album of good pop music (since its user doesn't like to listen to melodies without context). For that, the PA first has to determine what is currently considered good pop, and uses a recommendation site (e.g. billboard.com) to find Album X as a first option. Then the PA has to use a library site (e.g. cdcovers.cc) to determine which are the melodies in Album X - let's presume that they are Melody-1, Melody-2 and Melody-3. In the next step the PA determines from the installed system software which are the current ways of downloading music (e.g. it could use the p2p networks KaZaa, iMesh and DC++). Let's also assume that, because the user wants to use the computer while the music is retrieved, the PA has to minimize the required computing power and is constrained to use only one p2p network for all the downloads.

Fig. 1. Putting together a good pop music album.

The PA tries to use the first p2p network to search for the three melodies. Since Melody-1 had a nice videoclip and Melody-2 was a huge concert success, they are very


easy to find, but Melody-3 is quite unknown, so it is not found. As such, this p2p network cannot be used for Album X. It is very important to notice that in fact finding Melody-3 is a hard part of the problem (since the melody itself is not popular), independent of the p2p network used. When searching the second p2p network the PA could start again by searching first for Melody-1, etc. Still, a better strategy would be to try first to solve the hard part of the problem: finding Melody-3. Again the melody cannot be found, and as such this network cannot be used either, but searching first for Melody-3 sped up the process (we avoided the searches for Melody-1 and Melody-2). Finally, searching the third network for Melody-3 gives a positive answer, as it does for Melody-1 and Melody-2. The problem has been solved and the current p2p network can be used to retrieve all the files. This example shows how several information services can be linked together by searching a directory of a potentially very large number of services. We consider this to be a new class of service integration problems where the service integration difficulty lies in the high cost of accessing the directory.

1.2 Web Services and Semantic Web Services

Why are service descriptions like WSDL, which define only inputs/outputs/failures, not enough? Because when large numbers of services are developed, the situation could occur that services that are functionally equivalent are semantically very different (e.g. a good number of services could have only one string as input and one string as output, and without some additional markup regarding preconditions and effects they would be indistinguishable). We have investigated this issue in more depth by gathering 473 descriptions of existing services and computing some statistics. All descriptions used the Web Services Description Language (WSDL, http://www.w3.org/TR/wsdl).
The statistics show that all services were RPC-like, with on average 2 inputs and 1 output, and a maximum of 22 inputs and 36 outputs respectively. Each service defined its supported messages using an average of 6 unique fields from a total of 136 unique fields (we considered as fields the unique names of message parts). The vast majority of those fields (97%) were encoded as primitive data types (86% strings, 6% longs, 2% ints, 1% doubles, 1% booleans and 1% arrays of primitive types) and only 3% were encoded as user-defined data types. These figures confirm our suspicion that doing service integration using existing web services technology will be a very difficult and imprecise task - since for the analysed descriptions the majority of the fields of a web service are defined as completely opaque strings. Thus enhancing current web services to semantic web services seems to be a very promising approach for enabling better service integration.

1.3 Driving assumptions for service integration

This paper will build on the following assumptions, which will be discussed and justified in detail in Section 2:


Service Descriptions: services are described by preconditions and effects expressed as conjunctions of grounded literals. Services will be coarse-grained, usually with more than one precondition or effect. We consider only information services that have only positive preconditions and effects.

Large directories: there will be large numbers of services which will use directories for publishing and discovery.

Partial and incremental discovery: the directory and the integration algorithm will be able to deal with partial matches of service descriptions and will be able to rank results from the directory according to the match.

1.4 About this paper

This paper puts service integration in a wide context where large numbers of continuously changing service descriptions will be stored in directories. In this case the difficulty is due to the high computational cost of accessing the directory. We compare experimental results on three algorithms: a basic forwardChaining algorithm, and two directory-enhanced algorithms - forwardChainingBestMatch and backwardChaining. Our experimental results show that the last two algorithms make better use of the directory and outperform the classic forwardChaining algorithm while also being more scalable.
The rest of the paper is organized as follows: in Section 2 we present our problem definition regarding service descriptions and directories and some existing technologies that motivated some of the choices made; in Section 3 we show how such service descriptions can be matched and we present a number of possible match types; in Section 4 we show how large numbers of such service descriptions can be indexed in large directories that provide millisecond response times for tens of thousands of registered services; in Section 5 we present three algorithms for service composition that make use of the search capabilities of the directory and handle partially matching services; in Section 6 we present two synthetic testbeds for service integration and show some experimental results; finally, Sections 7 and 8 contain the conclusion and acknowledgements.

2 Background and Problem Definition

Even if our underlying formalism is richer (see below), for the purpose of this paper we consider that each service description can be represented as two sets of grounded literals - one set for preconditions and one set for effects. We note a service description X having preconditions P1, P2 and effects E1, E2 as X(P1, P2 → E1, E2). We also define the functions pre(X) and eff(X), which return the set of literals describing the preconditions and, respectively, the effects of the service X. This simplification of notation was made to keep the integration algorithms clear, but the implemented systems and the experimental results are based on the key/value formalism presented below.

Our underlying formalism builds on DAML-S Profiles [1] while also being similar to UDDI. A service is described via a set of key/value pairs. Keys can have multiple


inheritance relations with other keys. The same applies for values. A possible mapping between our formalism and the DAML-S Profile and UDDI can be seen in Figure 2. In the case of UDDI, a key/value pair maps to a keyedReference, a key maps to a tModel and a value maps to a string. In the case of DAML-S, a key/value pair maps to a ParameterDescription, a key maps to a DAML Property and a value maps to a DAML class expression or to an instance resource of a given type.

Fig. 2. Possible mapping to UDDI and DAML-S.

Our formalism is more restricted than DAML-S by allowing only preconditions and non-conditional effects. This is because we consider inputs and outputs as a form of information-availability preconditions and information-availability effects. We consider a directory system that can be searched for a given "template" service description (e.g. Q) and can return one or more results (e.g. S1, S2, S3, ...) according to a given type of match (e.g. MatchType): searchMatch(MatchType, Q) → S1, S2, S3, .... The directory also has the ability to sort results according to their match precision with respect to the query Q, such that the "best" matches are found at the beginning of the result set. Results can also be pulled one-by-one from the directory: nextResult(search(Q)) will return only the next result. Finally, the directory can return only "new" results by filtering out results from a list of known services: searchNew(Q, knownServices) will return only the services that match Q but are not in the known services set. Please note that in the algorithms in Section 5 these different query capabilities of the directory service will be used aggregated as single functions (e.g. nextBestNewOverlapMatch will retrieve the next value from a sorted set of the results of an overlap match query while filtering out some values). Now we can define the core problem addressed in this paper, service integration: given a service description Q and a system of directories that can be queried for service descriptions, provide a new integrated service S by chaining existing services such that S provides all the functionality of Q.
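To make this interface concrete, here is a minimal Python sketch of the directory abstraction described above; the class and method names (Service, Directory, search_new) are our own illustration, not from the paper, and matching is reduced to simple set inclusion over precondition/effect literals.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Service:
    name: str
    pre: frozenset   # information-availability preconditions
    eff: frozenset   # information-availability effects

class Directory:
    def __init__(self, services):
        self.services = list(services)

    def search_new(self, available, known):
        """Services whose preconditions are all satisfied by `available`
        and that are not already in `known` (PlugIn-style filtering)."""
        return [s for s in self.services
                if s.pre <= available and s not in known]

d = Directory([Service("a", frozenset({"p"}), frozenset({"q"})),
               Service("b", frozenset({"q"}), frozenset({"r"}))])
print([s.name for s in d.search_new(frozenset({"p"}), set())])  # ['a']
```

A real directory would additionally sort these results by match precision and return them one at a time.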

3 Discovery and Matchmaking of Web Services

Currently UDDI is the state of the art for directories of web services. The standard is clear in terms of data models and query API, but suffers from the fact that it considers service descriptions to be completely opaque. More fine-grained behaviour can


be achieved by the specification of application-dependent tModels, but still, in the end, the UDDI query process is a comparison of sets of tokens which takes into consideration neither classification relations nor the structure of the service description. One method used for determining relevant services from a directory of advertisements is matchmaking. In this case the directory query (requested capabilities) is formulated in the form of a service description template that presents all the features of interest. This template is then compared with all the entries in the directory and the “matching” results are returned. A good amount of work exists in the area of matchmaking, including LARKS [10] and the newer efforts geared towards DAML-S [9]. Other approaches include the Ariadne mediator [7]. The outcome of the matchmaking process can be one of the relations below between a query service Q and a library service S (examples in Figure 3). The first three types have been previously identified by Paolucci in [9]. Determining one of these match relations between two service descriptions requires that corresponding relations be determined between all the inputs and preconditions of the query Q and library service S, and between the outputs and effects of the library service S and the query service Q (note the reversed order of query and library services in the match for outputs and effects). Still, our approach is more complex than that of Paolucci in that we also take into account the relations between the Properties that introduce different inputs or outputs (equivalent to parameter names). This is quite important for disambiguating services with equivalent signatures (e.g. we can disambiguate two services that have two string outputs by knowing the names of the respective parameters). In the example below we will consider how the match relation is determined between services Q and S that have only one output, defining the provided style of music.
– Exact - S is an exact match of Q (Q ≡ S). In our example this is the case of Q1 and S1.
– PlugIn - S is a plug-in match for Q, if S could always be used instead of Q (Q ⊆ S). In our example this is the case for the output of Q2 and S1.
– Subsumes - Q contains S (S ⊆ Q). In this case S could be used under the condition that it satisfies some additional constraints such that it is specific enough for Q. In the case of several S's, discrimination between them could be done based on those constraints. This is the case with the output of Q1 and S2.
– Overlap - Q and S have a given intersection (Q ∩ S ≠ ∅). In this case, runtime constraints over both Q and S have to be taken into account. In our example this is the case for the output of Q3 and S3 (Rock ⊆ (Pop ∪ Rock) ∩ (Rock ∪ Jazz)).
– Failed - Q and S have no intersection (Q ∩ S = ∅). In this case the system could use a “nearest neighbour” technique to provide services which are as close as possible to Q. This is the case for the output of Q2 and S2.

It has to be noticed that the following implications hold for any match between query and library service descriptions Q and S: Exact(Q, S) ⇒ PlugIn(Q, S) and Exact(Q, S) ∨ PlugIn(Q, S) ∨ Subsumes(Q, S) ⇒ Overlap(Q, S). Given also that a Subsumes match requires the specification of supplementary constraints, we can order the types of match by “precision” as follows: Exact, PlugIn, Subsumes, Overlap. We consider Subsumes and Overlap as “partial” matches. This order corresponds to the one suggested by Paolucci in [9].
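The five match relations can be illustrated with ordinary sets standing in for class expressions; the sketch below is our own simplification (real matching compares class hierarchies and parameter names, not flat sets), but it respects the precision ordering given above.

```python
# Classify the relation between a query set Q and a library set S,
# following the match ordering in the text (most precise first).
def match_type(Q, S):
    if Q == S:
        return "Exact"
    if Q <= S:          # S can always be used instead of Q
        return "PlugIn"
    if S <= Q:          # S usable under additional constraints
        return "Subsumes"
    if Q & S:           # runtime constraints over both are needed
        return "Overlap"
    return "Failed"

print(match_type({"Opera"}, {"Opera"}))                  # Exact
print(match_type({"Opera"}, {"Opera", "Instrumental"}))  # PlugIn
print(match_type({"Pop", "Rock"}, {"Rock", "Jazz"}))     # Overlap
```

Note that the implications from the text hold by construction: an Exact match also satisfies the PlugIn test, and any of the first three satisfies the Overlap test.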

Fig. 3. Match types of one output of query and library services Q and S by “precision”: Exact, PlugIn, Subsumes, Overlap.

4 Efficient Service Directories

In a real-world environment created by numerous service providers that advertise their particular services, we assume a realistic setting in which directories will store numerous descriptions. The directory service must efficiently deal with data organisation and retrieval. The need for efficient discovery and matchmaking leads to the creation of search structures and indexes for directories. The novelty of our approach is to consider a service description as a multidimensional data record and then use in the directory techniques related to the indexing of this kind of information. For that we need to first numerically encode objects and service descriptions, and we present below a technique for doing that. This approach leads to local response times in the order of milliseconds for directories containing tens of thousands of service descriptions.

4.1 Multidimensional Access Methods - GiST

There is a lot of work in the database community regarding the indexing and storage of multidimensional objects, from rectangles, polygons and CAD drawings to images. Many


solutions have been proposed for managing multidimensional data. Therefore, work has been done on isolating the common approach that all these solutions take. Hellerstein [6] proposed as a unifying framework the Generalised Search Tree (GiST). The design principle of GiST arises from the observation that search trees used in databases are balanced trees with a high fanout in which the internal nodes are used as a directory and the leaf nodes point to the actual data. Each internal node holds a key in the form of a predicate P and can hold at most a predetermined number of pointers to other nodes (usually a function of system and hardware constraints, e.g. filesystem page size). To search for records that satisfy a query predicate Q, the paths of the tree that have keys P that satisfy Q are followed. So the requirement for a general search tree is that the search key of a given node is a predicate that holds for all the nodes below. A large number of existing tree algorithms can be recast in terms of GiST: B+-trees, R-trees, R*-trees (by slightly modifying the GiST insertion algorithm), extended KD-trees, etc. GiST is well known in both the academic (PostgreSQL) and industrial (Informix) DB communities for defining access methods for custom data types. The architecture of GiST is split in two parts: key methods (which have to be implemented for any new application) and tree methods (which are provided by toolkits supporting GiST). In order to be able to use GiST we must express a service description as a key and implement the GiST key methods. To accomplish this, our approach is to numerically encode a service description and its data types. We present this encoding technique below.
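The GiST search principle - descend only into subtrees whose key predicate is consistent with the query - can be sketched with bounding boxes as keys; the node layout below is our own toy illustration, not the GiST API.

```python
# Each node is (key_box, children, data): the key is a predicate
# (here, a bounding box) that holds for everything below the node.
def contains(box, pt):
    (x1, y1, x2, y2) = box
    return x1 <= pt[0] <= x2 and y1 <= pt[1] <= y2

def search(node, pt):
    key, children, data = node
    if not contains(key, pt):
        return []                 # prune this whole subtree
    if data is not None:
        return data               # leaf: points to actual records
    out = []
    for c in children:
        out += search(c, pt)
    return out

leaf1 = ((0, 0, 1, 1), [], ["svc-a"])
leaf2 = ((2, 2, 3, 3), [], ["svc-b"])
root = ((0, 0, 3, 3), [leaf1, leaf2], None)
print(search(root, (2.5, 2.5)))  # ['svc-b']
```

Section 4.2 shows how a service description becomes such a set of rectangles, so that this kind of spatial pruning applies to directory queries.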

4.2 Encoding service descriptions


Fig. 4. Numeric encoding of a service description


Taxonomies can be numerically encoded such that inclusion relations can be determined by very simple operations [4]. Our approach is to use an interval-based representation for both classes and properties. The method is generalised to support multiple parents by allowing the encoding of a class/property as a set of intervals instead of only a single interval. The numeric encoding of a service description is then straightforward: the pairing of properties represented as sets of intervals with classes or values also represented as sets of intervals can be seen as a set of rectangles in a bidimensional space having on one axis Classes or Values and on the other Properties. We take as an example the case of a service description with two properties prop1 and prop4 which have as ranges (types) the classes classE and classC respectively (Figure 4 (c)). As shown in Figure 4 (a), prop4 inherits from both prop2 and prop3, such that it will be represented by a set of two single-inheritance properties - prop4_0 and prop4_1 - and their associated intervals. Similarly, classE inherits from both classB and classD, such that it is also going to be represented by two single-inheritance classes / intervals - classE_0 and classE_1. So the service description above can be represented in a bidimensional space as a set of four rectangles (prop1_0 x classE_0, prop1_0 x classE_1, prop4_0 x classC_0, prop4_1 x classC_0).
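For a single-inheritance taxonomy, the interval encoding can be sketched as a DFS numbering where subsumption reduces to interval containment (multiple inheritance, as in the paper, would assign a set of intervals per class); the function names and the toy hierarchy are ours.

```python
# Assign each class an [enter, exit) interval from a depth-first
# traversal; a class's interval then contains all of its descendants'.
def encode(tree, root):
    intervals, counter = {}, [0]
    def dfs(node):
        start = counter[0]; counter[0] += 1
        for child in tree.get(node, []):
            dfs(child)
        intervals[node] = (start, counter[0]); counter[0] += 1
    dfs(root)
    return intervals

tree = {"Thing": ["MusicStyle"], "MusicStyle": ["Classic", "Pop"],
        "Classic": ["Opera"]}
iv = encode(tree, "Thing")

def subclass_of(a, b):    # a is subsumed by b iff interval(a) is inside interval(b)
    return iv[b][0] <= iv[a][0] and iv[a][1] <= iv[b][1]

print(subclass_of("Opera", "MusicStyle"))  # True
print(subclass_of("Pop", "Classic"))       # False
```

Pairing such intervals for a property with intervals for its range class yields exactly the rectangles described above.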

5 Service composition with directories

In recent years, service integration has been an active field for both the AI and database research communities, including InfoSleuth [2], work by Doan and Halevy [5], McIlraith [8] and Thakkar and Knoblock [11]. In this paper we analyse a class of algorithms for building integrated services that incrementally extend an initial set of propositions until the set satisfies the initial integration query. Depending on what is used to query the directory for extending the set we see two possible approaches:

– a forward chaining approach (Figure 5 (a)) that will continuously expand the set of propositions by using the set as a query to the directory. A similar approach was used by Thakkar and Knoblock [11].
– a backward chaining approach (Figure 5 (b)) that will expand the set by the difference between what has been fulfilled and what remains to be fulfilled. This approach might lead to backtracking, as some expansion paths might end up having no result. A similar approach is used by classical STRIPS-like planners.

The first algorithm for service integration (Algorithm 1 - forwardChain) is a variant of the one proposed by Thakkar and Knoblock [11] and uses a forward chaining technique. The algorithm tries to fulfill a query service description Q by incrementally searching for services that can be satisfied using preconditions available in the current search state S. This search state is also used for filtering out the services that were already retrieved from the directory. It has to be noted that this algorithm produces only the "possible" services to be used for solving the integration problem. For creating an actual solution the resulting S must be searched again backwards starting from the goals. A similar polynomial



Fig. 5. Two approaches to service integration: forward chaining and backward chaining.

post-processing step named "dataflow analysis" is also performed by Thakkar and Knoblock [11]. The main issue with this first algorithm is that in an environment with large numbers of available services, retrieving all the services that match a given query will be practically impossible (due to the changing environment and the large quantities of data that would have to be transferred). In order to overcome this issue we rely on the capability of the directory service to find the "best" match for a given query while again filtering out results already in the solution state S. This is presented in Algorithm 1 - forwardChainBestMatch, where only one new service is retrieved at a time. We also propose a different algorithm, more similar in approach to classical planning, where at each step the directory is searched by using the unsatisfied preconditions, and already retrieved services are filtered using the set knownServices. A service that is a best new match is retrieved and added to the candidate solution. The main disadvantage of this approach is that the search process might encounter dead-ends where no new services can be found for the remaining problem. As such, the algorithm might need to backtrack. Still, our assumptions (a resource-rich environment) should lead to a large number of choices for solving a given problem, and due to this, intensive backtracking should not actually be required.

pre( S ) ← pre( Q )
eff( S ) ← ∅

procedure forwardChain( Q, S ) do
    results ← allNewPlugInMatches( pre(S), S )
    if results ⊆ S then return failure end
    for x ∈ results do
        pre( S ) ← pre( S ) ∪ eff( x )
        eff( S ) ← eff( S ) ∪ eff( x )
    end
    if eff( Q ) ⊆ eff( S ) then return success
    else return forwardChain( Q, S ) end
end do.

pre( S ) ← pre( Q )
eff( S ) ← ∅

procedure forwardChainBestMatch( Q, S ) do
    x ← nextBestNewPlugInMatch( pre(S), S )
    if x = NIL then return failure end
    pre( S ) ← pre( S ) ∪ eff( x )
    eff( S ) ← eff( S ) ∪ eff( x )
    if eff( Q ) ⊆ eff( S ) then return success
    else return forwardChainBestMatch( Q, S ) end
end do.

Algorithm 1: Two forward chaining algorithms: classic and using best match. Q is the integration problem and S is the solution state.
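An executable rendering of the best-match variant over a toy directory might look as follows; services are (pre, eff) pairs of frozensets, and "best new PlugIn match" is simplified to "first unused service whose preconditions are available" (the selection policy and names are our assumptions, not the paper's ranking).

```python
# Forward chaining with one service retrieved per directory query.
def forward_chain_best(query_pre, query_eff, directory):
    pre, eff, used = set(query_pre), set(), []
    while not query_eff <= eff:
        # stand-in for nextBestNewPlugInMatch: first unused service
        # whose preconditions are already satisfied by the state
        x = next((s for s in directory
                  if s not in used and s[0] <= pre), None)
        if x is None:
            return None                  # failure: no new match
        used.append(x)
        pre |= x[1]
        eff |= x[1]
    return used                          # success

directory = [(frozenset({"p"}), frozenset({"q"})),
             (frozenset({"q"}), frozenset({"r"}))]
plan = forward_chain_best(frozenset({"p"}), frozenset({"r"}), directory)
print(len(plan))  # 2
```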

6 Service composition testbed and experimental results

For experimental purposes we have tested the proposed algorithms against a number of synthetic models of the services existing in the network. Given that services are described by grounded literals (preconditions/effects), the differences between models come from different choices regarding:

– the coupling between services - e.g. given one service Q providing some outputs and effects, what other services S1, ..., Sn depend on it.
– the fan-in of a service - how many preconditions a service needs.
– the fan-out of a service - how many effects a service provides.
– the redundancy of a service - for a given service description, how many equivalent (same preconditions/effects) services exist.
– model determinism - are the service descriptions in the model generated completely deterministically with respect to the parameters above, or is there a random factor involved.

6.1 Layered Test Model

First we have considered a completely deterministic model. Here service descriptions are organized in a number of layers, where a service on layer k can be coupled only with services from the previous (k-1) or the next (k+1) layer. All services have the same fan-in (number of inputs+preconditions) and the same fan-out (number of outputs+effects). There are no redundant services.
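A possible instantiation of such a layered model (parameter names and the exact coupling pattern are our own choices) is sketched below: facts are assigned to layers, and each service consumes fan_in facts from layer k and produces fan_out facts on layer k+1.

```python
# Deterministic layered model: no randomness, no redundant services.
def layered_model(layers, breadth, fan_in=2, fan_out=2):
    facts = [[f"f{k}_{i}" for i in range(breadth)] for k in range(layers)]
    services = []
    for k in range(layers - 1):
        for i in range(breadth):
            pre = frozenset(facts[k][(i + j) % breadth]
                            for j in range(fan_in))
            eff = frozenset(facts[k + 1][(i + j) % breadth]
                            for j in range(fan_out))
            services.append((pre, eff))
    return facts, services

facts, services = layered_model(3, 4)
print(len(services))  # 8: 4 services on each of the 2 layer transitions
```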



previousGoals ← ∅

procedure backwardChain( Q, solution ) do
    if eff( Q ) = ∅ then return solution end
    if ∃ triedGoals ∈ previousGoals s.t. eff( Q ) ⊆ triedGoals then return NIL end
    previousGoals ← previousGoals ∪ { eff( Q ) }
    knownServices ← ∅
    foreach x = nextBestNewOverlapMatch( eff( Q ), knownServices ) do
        knownServices ← knownServices ∪ { x }
        newSolution ← { x } ∪ solution
        newQ ← Q
        eff( newQ ) ← eff( newQ ) \ eff( x )
        eff( newQ ) ← eff( newQ ) ∪ pre( x )
        eff( newQ ) ← eff( newQ ) \ pre( Q )
        resultSolution ← backwardChain( newQ, newSolution )
        if resultSolution ≠ NIL then return resultSolution end
    end
    return NIL
end do.

Algorithm 2: Backward chaining algorithm. pre( Q ) are the initial problem conditions and eff( Q ) are the goals to be fulfilled.
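An executable sketch of this backward chaining recursion, including the loop check on previously tried goal sets, might look as follows (services are (pre, eff) frozenset pairs; the overlap match is simplified to "the service's effects intersect the open goals", and all names are ours).

```python
# Regress from the goals toward the initial conditions, backtracking
# over candidate services; `tried` prevents revisiting goal sets.
def backward_chain(goals, init, directory, solution=(), tried=None):
    tried = set() if tried is None else tried
    goals = frozenset(goals) - frozenset(init)
    if not goals:
        return list(solution)            # all goals grounded in init
    if any(goals <= t for t in tried):
        return None                      # already failed on these goals
    tried.add(goals)
    for x in directory:                  # overlap: x achieves some goal
        if x in solution or not (x[1] & goals):
            continue
        new_goals = (goals - x[1]) | x[0]
        result = backward_chain(new_goals, init, directory,
                                (x,) + tuple(solution), tried)
        if result is not None:
            return result
    return None

directory = [(frozenset({"p"}), frozenset({"q"})),
             (frozenset({"q"}), frozenset({"r"}))]
plan = backward_chain({"r"}, {"p"}, directory)
print(len(plan))  # 2
```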

For creating an instance of the model we first uniquely assign facts (preconditions/effects) to each of the layers. Then service descriptions are defined using facts from two adjacent layers (e.g. inputs and preconditions from layer k and outputs and effects from layer k+1). We have implemented a testbed using this model. The main measure according to which we have compared the three algorithms presented before is the number of directory accesses, computed as the product between queries and retrieved results. Please note that for forwardChainBest and backwardChain this is equivalent to the number of directory queries, since in these cases only one result is retrieved from the directory each time. For testing purposes we have varied both the depth of the model (the number of steps required for the integration) and the breadth of one layer. The tests show that increasing the number of layers doesn't have a big impact on the number of accesses, in



Fig. 6. Layered test model: variation of directory access where level depth varies and breadth is constant (4).

contrast to an increase in the number of propositions on each layer, which has a high influence. Still, these tests used services with equal fan-in/fan-out. We have also measured the number of directory accesses with respect to the number of services when the breadth varied for the same constant level depth. As can be seen in Figure 6, all three algorithms show a linear dependence on the number of services. In terms of performance, the focused backwardChaining algorithm performs better than the other two algorithms. The forwardChainingBest algorithm, which retrieves only one service at a time, also performs better than the classical forwardChaining algorithm.

6.2 Random Test Model

As a second model we have considered one generated in a non-deterministic manner. As the main parameter of the model we have used the number of services, defined over combinations of at most maximum service size propositions from a vocabulary of size vocabulary size. As for the layered model, we have measured the quality of the three algorithms by the number of directory accesses, computed as the product between queries and retrieved results. For creating problem instances we have first generated a set of unique random combinations of at most maximum service size propositions. Then for each service description to generate we randomly pick from this set two combinations - one to be used as preconditions and one as effects. Equivalently, when creating a test problem we create it by randomly picking from the combinations a set of propositions for the initial conditions

14

Ion Constantinescu, Boi Faltings

and a set of propositions for the goals to be achieved. Even if this approach induces some form of dependece between the created services and the problems tested we argue that in reality this is also the actual case (where only small sets of combinations of fact propositions are usually of interest). For the actual tests presented in Figure 7 we have used a maximum service size of 3 and we have varied the vocabulary size from 10 to 100.
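The generation procedure for the random test model can be sketched as follows. This is our own minimal illustration, not the paper's testbed code; the names (`Service`, `make_random_model`, `num_combinations`) are assumptions introduced for the sketch:

```python
import random
from typing import NamedTuple

class Service(NamedTuple):
    preconditions: frozenset  # propositions required to invoke the service
    effects: frozenset        # propositions produced by the service

def make_random_model(num_services, vocabulary_size, max_service_size,
                      num_combinations, seed=0):
    """Random test model: draw a pool of unique random proposition
    combinations, then build each service by picking one combination from
    the pool as its preconditions and one as its effects."""
    rng = random.Random(seed)
    vocabulary = list(range(vocabulary_size))
    pool = set()
    while len(pool) < num_combinations:
        size = rng.randint(1, max_service_size)
        pool.add(frozenset(rng.sample(vocabulary, size)))
    pool = list(pool)
    services = [Service(rng.choice(pool), rng.choice(pool))
                for _ in range(num_services)]
    # A test problem is built the same way: the initial conditions and the
    # goals are both drawn from the same combination pool, which induces the
    # dependence between services and problems discussed above.
    problem = (rng.choice(pool), rng.choice(pool))
    return services, problem
```

Drawing problems from the same pool as the services mirrors the paper's argument that in practice only a small set of proposition combinations is of interest.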

Fig. 7. Random test model: forward chaining best match and backward chaining. (Plot: directory accesses, i.e. queries × retrieved results, 0–4500, against the number of service descriptions in the directory, 0–9000, for Forward Chaining, Forward Chaining Best Match, and Backward Chaining.)

The results show that both the forwardChainingBestMatch and the backwardChaining algorithms make better use of the directory and outperform the classic forwardChaining algorithm, while also being more scalable. forwardChainingBestMatch and backwardChaining have comparable performance, which suggests that the choice between the two may have to be application dependent.

7 Conclusion

Web services will likely be a major application of semantic web technologies. Automatically invoking web services requires solving the problems of indexing and automatic service composition; we have presented approaches to both. Web service composition presents a new application domain for planning techniques. It is easier than many other applications since it presents few resource conflicts, but on the other hand the large number of available services presents new challenges. In this paper, we have compared three algorithms for service indexing and composition with regard to their scalability to large numbers of available services.


We have created two synthetic environments: an optimistic one (layered), where all possible service combinations are present, and a random one, where services are generated non-deterministically. These have been used to compare the performance of three algorithms with respect to the number of directory accesses required. The first, which we call forwardChaining, was suggested by [11] and does not use the functionality of a searchable directory. By integrating it with a directory mechanism, we have extended it into a second algorithm, forwardChainingBestMatch, which incrementally retrieves the services that best match the current problem-solving state. The third algorithm, backwardChaining, is closer to AI planning as it constructs a solution backwards, starting from the goals and querying the directory for services that partially match the remaining problem. The results show that, as expected, the first algorithm has only theoretical value, as it is outperformed by an order of magnitude by the other two. Both algorithms using the directory show good performance and scalability on the randomly generated problems. On the optimistic (layered) model, backward chaining scales significantly better than forward chaining. Of course, until web services actually become widely used, we do not have an accurate model of their distribution. However, one clear conclusion is that integrating composition planning with a directory is important for achieving scalability, and we have shown an approach to doing so that appears to be practical.
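The backwardChaining strategy described above can be illustrated with a short sketch. This is our own simplification under stated assumptions: the directory query `find_best_match` and the greedy goal-regression loop are illustrative stand-ins, not the paper's exact algorithm:

```python
def backward_chain(goals, initial, find_best_match, max_steps=100):
    """Greedy backward chaining: repeatedly query the directory for a
    service that partially matches the still-unsatisfied goals, prepend it
    to the plan, and regress its preconditions into the open goals."""
    open_goals = set(goals) - set(initial)
    plan = []
    for _ in range(max_steps):
        if not open_goals:
            return plan  # every goal is now grounded in the initial facts
        # One directory access: ask for a service whose effects cover at
        # least part of the remaining (open) goals.
        service = find_best_match(open_goals)
        if service is None:
            return None  # no service advances the remaining problem
        plan.insert(0, service)
        open_goals = ((open_goals - set(service.effects))
                      | (set(service.preconditions) - set(initial)))
    return None  # step budget exhausted (guards against cyclic regressions)
```

Each loop iteration costs exactly one directory query returning one result, which is why the directory-access count for backwardChaining equals its query count in the experiments.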

8 Acknowledgements

Special thanks to Walter Binder, who contributed substantially to this work.

References

1. A. Ankolekar. DAML-S: Web Service Description for the Semantic Web, 2002.
2. R. J. Bayardo, Jr., W. Bohrer, R. Brice, A. Cichocki, J. Fowler, A. Helal, V. Kashyap, T. Ksiezyk, G. Martin, M. Nodine, M. Rashid, M. Rusinkiewicz, R. Shea, C. Unnikrishnan, A. Unruh, and D. Woelk. InfoSleuth: Agent-based semantic integration of information in open and dynamic environments. In Proceedings of the ACM SIGMOD International Conference on Management of Data, volume 26(2), pages 195–206, New York, 1997. ACM Press.
3. Peter Cheeseman, Bob Kanefsky, and William M. Taylor. Where the Really Hard Problems Are. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence (IJCAI-91), Sydney, Australia, pages 331–337, 1991.
4. Ion Constantinescu and Boi Faltings. Efficient matchmaking and directory services. Technical Report IC/2002/77, Artificial Intelligence Laboratory, Swiss Federal Institute of Technology, 2002.
5. AnHai Doan and Alon Y. Halevy. Efficiently ordering query plans for data integration. In ICDE, 2002.
6. Joseph M. Hellerstein, Jeffrey F. Naughton, and Avi Pfeffer. Generalized search trees for database systems. In Umeshwar Dayal, Peter M. D. Gray, and Shojiro Nishio, editors, Proceedings of the 21st International Conference on Very Large Data Bases (VLDB), pages 562–573. Morgan Kaufmann, 1995.


7. Craig A. Knoblock, Steven Minton, Jose Luis Ambite, Naveen Ashish, Ion Muslea, Andrew Philpot, and Sheila Tejada. The Ariadne Approach to Web-Based Information Integration. International Journal of Cooperative Information Systems, 10(1-2):145–169, 2001.
8. S. McIlraith, T. C. Son, and H. Zeng. Mobilizing the semantic web with DAML-enabled web services. In Proceedings of the Second International Workshop on the Semantic Web (SemWeb2001), Hong Kong, China, May 2001.
9. Massimo Paolucci, Takahiro Kawamura, Terry R. Payne, and Katia Sycara. Semantic matching of web services capabilities. In Proceedings of the 1st International Semantic Web Conference (ISWC), 2002.
10. K. Sycara, J. Lu, M. Klusch, and S. Widoff. Matchmaking among heterogeneous agents on the internet. In Proceedings of the 1999 AAAI Spring Symposium on Intelligent Agents in Cyberspace, Stanford University, USA, March 1999.
11. S. Thakkar, C. A. Knoblock, J. L. Ambite, and C. Shahabi. Dynamically composing web services from on-line sources. In Proceedings of the AAAI-02 Workshop on Intelligent Service Integration, July 2002.