Using Agents in Performing Multi-site Queries

Elisabetta Di Nitto¹, Carlo Ghezzi¹, Maurizio Sabba², and Paolo Selvini²

¹ Politecnico di Milano, Dipartimento di Elettronica e Informazione, Piazza L. Da Vinci 32, 20133 Milano
² CEFRIEL, Via Fucini 2, 20133 Milano

Abstract. Search engines, portals, and directories have some limitations as mechanisms to support search over the Internet. Recent solutions to such problems enable the usage of more sophisticated query languages whose expressive power resembles that of traditional DBMS query languages. These approaches, however, fail when the data of interest are partitioned over different sites. In such a case, an approach that supports combined searches is needed. In this paper we present a runtime platform and a language supporting a powerful approach to extract and assemble information according to users' needs. The language is based on an XML query language and adapts it to the case where information is obtained by aggregating parts distributed over multiple sites. The platform is based on mobile agents and acts as an intermediary between users and information providers.

1 Introduction

The Web is currently the largest and most up-to-date information source. It is therefore the ideal store in which to look for any kind of data. But whoever wants to find information over the Internet can easily get overwhelmed by the large amount of data being published. The most widely used tools supporting the search of Web information are collectively called search engines [7], [9]. Search engines offer a simple yet very powerful approach to information search. Users type some keywords and search engines return a number of links that are related to such keywords. Usually, however, the number of returned links exceeds the users' ability (and willingness) to manage them. Also, most of the returned data are not related to the users' actual search objective. Thus, despite their flexibility and simplicity, search engines turn out to be too unstructured to be helpful in all cases. This is why they are complemented by directories and portals that provide a structured view over a small portion of the available information. Directories and portals usually classify information in (possibly nested) categories and allow users to look for information just like in a phone directory. Being based on the idea of manually classifying information that is stored in unstructured documents, such an approach is intrinsically adequate only for small amounts of data; typically, data concerning a specific subject, a specific organization, or a specific group of users. Indeed, categories are defined once and for all, and the user typically has very limited freedom to define his/her own categorization structure.

Recent solutions to such problems are based on the idea of providing a structure for web information that enables the usage of more sophisticated query languages whose expressive power resembles that of traditional DBMS query languages. Such approaches are based on XML [2] as the language for structuring information. Current research in this area is focused on the definition of proper primitives for data extraction and selection from a single XML document [1]. Such a limitation makes current query languages unsuitable when the data of interest are partitioned over different sites. Consider for instance the case where you want to go on vacation in a place offering a golf course and directly connected to your home town by plane. Usually the information you are looking for (hotels and flights) is provided by a number of different web sites and is not aggregated (some web sites provide only information on flights and others only on hotels). In this case, with the existing techniques, you have to run the search manually over all the involved sites.

In this paper we start from the experience that is being developed in the field of XML query languages and try to apply it to a context where the user requires information that is obtained by aggregating parts distributed over multiple sites. In particular, we present a platform that acts as an intermediary between users and information providers by supporting a powerful approach to extract and assemble information according to users' needs. Such a platform offers the user some searching agents that are instructed with the objective of the search and are able to produce a result by executing multiple queries on different sites. Searching agents can either perform the queries from a remote server or move from one data source to another to collect results, if this is more appropriate for performance reasons. The goal of the current paper is to present the first prototype of our platform and to discuss the lessons we have learnt from this experience.

The work we present is part of a wider project called OPELIX that aims at supporting what we call i-commerce. By i-commerce we mean electronic commerce of information, i.e., goods that can be handled and delivered electronically (e.g., documents, tickets, and reservations). OPELIX aims at providing proper support to intermediaries that allow users and sellers to get in contact and execute a business transaction. [6] provides an interesting taxonomy of the roles that intermediaries, or infomediaries, cover in different business scenarios. In the context of this paper, we focus on a specific service provided by infomediaries, that is, supporting users in finding information, independently of the location and the number of information providers.

The rest of the paper is structured as follows. Section 2 presents an example of information search that is used in the rest of the paper to demonstrate and motivate our approach. Section 3 provides a general overview of the searching infrastructure we have developed and puts it in the context of the OPELIX project. Sections 4 and 5 are the main contributions of the paper and present the language we have defined to instruct searching agents and the architecture of our prototype, respectively. Finally, Section 6 draws the conclusions and presents our current research agenda.

2 A running example of information search

This section sketches an example that is used in the rest of the paper to demonstrate how our approach works. The example is focused on showing that information is seldom packaged and classified according to the user's needs. In such cases, current web technology does not help much. The user has to query different sites, portals, or search engines to get some partial results that need to be assembled by hand to obtain the desired final information.

Let us consider a system comprising a network of web sites where information about goods to be sold is stored. The stored information can have a non-homogeneous structure. For example, consider the case where vendor sites sell information about theatre performances, flight tickets, and hotel reservations. Each of these sites sells only one kind of data (either theatre tickets, flights, or hotel reservations). Tables 1 and 2 show a partial snapshot of the state of two of these sites (site1 and site2), containing information about theatre and flight tickets, respectively.

Let us then suppose a user wants to attend some opera performance, e.g. "La Traviata", and, in particular, any available performance, no matter where it is located, provided that a flight directly connecting his/her home city (New York) to the performance city exists (with a schedule compatible with the performance) and that a five-star hotel in that city has a suite available. It is clear that our user will have to browse the network of sellers, looking for each piece of information individually and taking care of all the compatibilities and matchings among such data. For example, at site 1 he/she should pick the data from the second and the third row of Table 1. Then he/she should merge these results with similar data collected at the other theatre sites and, finally, match them with data from flight sites (see Table 2) and hotel sites. While the matching is trivial if the amount of data being analyzed is small, when the size of the data and the number of sites to visit grow, searching can become difficult, time consuming, and tedious.

name          city     date        price
Hamlet        London   05/03/2001  200 EUR
La Traviata   Roma     07/07/2001  100 EUR
La Traviata   Milano   07/31/2001  400 EUR
...           ...      ...         ...

Table 1. Theatre performances database hosted by site site1.domain1.com.

depart/date   depart/place   arrival/date   arrival/place   code     price
07/05/2001    New York       07/08/2001     Roma            AZ1301   700 EUR
04/05/2001    New York       04/06/2001     Tokyo           NW675    900 EUR
...           ...            ...            ...             ...      ...

Table 2. Flights database hosted by site site2.domain2.com.

3 OPELIX and the Request and MatchMaking component

The goal of the OPELIX project (see www.opelix.org) is to enable enterprises to produce, sell, deliver, and manage highly personalized contents and services over the Internet. In more detail, OPELIX supports the following tasks:

- Define offers for certain products.
- Formulate requests and match them with existing offers.
- Negotiate an offer or a set of offers.
- Deliver products (i.e., information) in a timely and personalized manner, to precisely match the information demands of users.
- Manage flexible and secure payments according to different payment models such as pay-per-view, volume-based, quality-dependent, time-based, and flat subscription fees, or a combination of these.
- Provide means for authentication, non-repudiation, and copyright management.
- Support the generation of higher-level products through combination, categorization, or personalization of information coming from other (commercial) sources.

In OPELIX three main business roles are envisaged: the seller, who offers some products (e.g., flight tickets); the customer, who may buy some products; and the intermediary, who offers to both the customer and the seller services that enable and support their interaction. In addition, the intermediary is able to add value to some sellers' offers by combining them and by making them more interesting for customers.

OPELIX has been designed as a composable system, where different functional components can either be installed and used alone or be integrated with the other components. Figure 1 shows the various functional components. They are integrated through an event-based communication infrastructure that enables good decoupling and independence of components [4]. In turn, each functional component is often distributed over the customer, the seller, and the intermediary sites.

[Figure 1 shows the OPELIX functional components (User Interface/Browser, Request and Matchmaking, Negotiation, Payment with SET and Millicent, Delivery and Dissemination with email and Minstrel, Targeting, Business Workflow, Content Management, User Profiles, and the Security Service based on SEMPER), plus the agent platforms Aglets and Voyager, all connected through the Communication Infrastructure.]

Fig. 1. High level architecture of OPELIX.

In this paper we focus on the Request and MatchMaking component (RMM henceforth), which is in charge of matching requests with the available offers. The RMM component is actually a distributed system that can be installed on all the sites that store the offers managed by a certain intermediary. In general, a request formulated on a site can require the execution of searching activities on the other sites. The actual execution of the search activity is delegated to a searching agent that is also able to move through different sites when needed.

Mobile agents become very useful when information is stored separately in different nodes, as in the example presented in Section 2. In this case, an agent, having a list of hotels and theaters, can first look in the data about theaters for the dates of the customer's preferred performances and then look for an accommodation in some nearby hotel. We call combined searches those searches that are executed on different sites and provide an integrated result.

A searching agent (either mobile or fixed) can make decisions on the number of results to collect before terminating the visit. The search, in fact, could either be exhaustive or could stop when, for instance, at least one result has been found. Moreover, a searching mobile agent can take decisions on the sites to visit and on the opportunity to perform parallel searches on more than one site. All these decisions can be taken on the basis of information that is either provided to the agent as part of a request or part of its own semantics. The RMM component has been developed in order to support the first scenario. In particular, agents are instructed by formulating requests in a simple language we have called the Request Definition Language (RDL).

4 The Request Definition Language

The Request Definition Language (RDL) is the language we have developed to define a search request. A search request is defined in terms of a main objective and, optionally, some subordinate objectives. In the example of Section 2, the main objective is to find a "La Traviata" performance. Subordinate objectives concern finding flights landing in the city where the theater is located and hotels in the same city having a suite available in the performance period.

Any main or subordinate objective is defined in RDL according to the following syntax:

APPLY "query" AS "queryname"
AT "site1", "site2", ..., "siteN"
USING_POLICY "policyname"

The query defines the data that have to be extracted to fulfill the corresponding objective. Later in this section we present its syntax. queryname is a label that identifies the query. site1, site2, etc. form the set of information providers where the query has to be executed (the itinerary). Seller addresses are currently represented in the form tcp://HOST:PORT. policyname defines the strategy used to enforce termination of the current search. The policies we have defined so far are:

- best first: the search is stopped when a first set of results is retrieved from one of the sites in the site list.
- complete: the search is performed on all the sites in the site list.
- medium: the search is stopped when some results are retrieved from a subset of the sites in the site list. The number of sites from which to obtain results is a parameter of this policy. This policy has been introduced for the sake of flexibility; currently we do not foresee any specific usage of it, and we may eliminate it based on the feedback of the end users who will exercise the prototype within the OPELIX project.

In principle the query might be expressed in any query language that is understood at the sites where the query is actually executed. We use an XML query language since we assume that the data being queried are expressed in XML. According to this design decision, all sites participating in the i-commerce community supported by OPELIX are required to provide an XML-based interface to their own data. The increasingly widespread adoption of XML makes this assumption realistic. The query language we have adopted within the OPELIX project is XQL [12]. However, RDL has been designed in such a way that it will be easy to replace XQL with any language that will result from the standardization work of the W3C query working group.

4.1 XQL

The following XQL query extracts information about books on sale:

/Order/Item[discount > 10]/Book[price < 50]?/
    (Author/Firstname | Author/Lastname | Title)

The expression /Order/Item represents the from part of the query. If the exact path to reach a certain element is not known, XQL provides a sort of shortcut by means of the "//" operator: the expression //Item refers to every element named "Item" in the document being queried, regardless of where this element is located. The expression between square brackets is called a filter and corresponds to the where part. Finally, the elements between round brackets are the attributes belonging to the result (the select part). The question mark in the query denotes that the three attributes belonging to the result have to be included inside a Book element.
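As an illustration (not taken from the paper's data set), the following hypothetical document fragment is one the query above would match; discount and price are modeled here as child elements, consistently with the filters in the query:

<Order>
  <Item>
    <discount>15</discount>
    <Book>
      <price>45</price>
      <Author>
        <Firstname>John</Firstname>
        <Lastname>Smith</Lastname>
      </Author>
      <Title>An Example Title</Title>
    </Book>
  </Item>
</Order>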

4.2 Objectives and sub-objectives in RDL

Queries referring to main objectives are well-formed XQL formulas. The clause below represents the main objective of the example presented in Section 2:

APPLY "//show[name = "La Traviata" $and$ status = "Available"]?/
       (city|date|price)"
AS "performances"
AT "tcp://site1.domain1.com:10000", "tcp://site3.domain3.com:10000"
USING_POLICY "complete"
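Such a query presumes that the seller exposes the data of Table 1 through its XML-based interface. A hypothetical fragment of that document, including the status element used in the filter (only the root element name is our guess, since it does not appear in the query), could be:

<shows>
  <show>
    <name>La Traviata</name>
    <status>Available</status>
    <city>Roma</city>
    <date>07/07/2001</date>
    <price>100 EUR</price>
  </show>
  <show>
    <name>La Traviata</name>
    <status>Available</status>
    <city>Milano</city>
    <date>07/31/2001</date>
    <price>400 EUR</price>
  </show>
</shows>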

Subordinate objectives are preceded by the THEN keyword. Queries referring to them contain some additional non-XQL terms that are used to express relationships between main and subordinate objectives. An example of a subordinate clause is reported below:

THEN APPLY "//flight[arrival/date < %performances@date% $and$
            arrival/place = %performances@city% $and$
            departure/place = "New York"]?/
            (code|price|departure/date)"
     AS "flights"
     AT "tcp://site2.domain2.com:10000", "tcp://site4.domain4.com:10000",
        "tcp://site5.domain5.com:10000"
     USING_POLICY "complete"

The constraint part of a subordinate query contains references to the results of other queries. Such references are resolved during the execution of a request (see Section 5.1 for more details). Syntactically, we use the notation %queryname@tagname% to refer to the values of tagname in the result set of query queryname. Every RDL statement has a RETURN keyword specifying the maximum number of final results to be produced. To require that all results be returned, the keyword has to be followed by "ALL". It is also possible to specify the exact number of expected results. The keyword WITH JOIN before the keyword RETURN indicates that the results have to be joined according to the traditional relational semantics. If this keyword is not used, the results obtained by executing each query are simply concatenated.
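Putting the pieces together, a complete RDL request for the running example might look like the following sketch. The hotel objective, its element names, the hotel seller address, and the placement of the WITH JOIN RETURN clause at the end of the statement are illustrative assumptions, since they are not shown in the paper:

APPLY "//show[name = "La Traviata" $and$ status = "Available"]?/(city|date|price)"
  AS "performances"
  AT "tcp://site1.domain1.com:10000", "tcp://site3.domain3.com:10000"
  USING_POLICY "complete"
THEN APPLY "//flight[arrival/date < %performances@date% $and$
       arrival/place = %performances@city% $and$ departure/place = "New York"]?/
       (code|price|departure/date)"
  AS "flights"
  AT "tcp://site2.domain2.com:10000", "tcp://site4.domain4.com:10000",
     "tcp://site5.domain5.com:10000"
  USING_POLICY "complete"
THEN APPLY "//hotel[city = %performances@city% $and$ category = "5 stars" $and$
       suite = "Available"]?/(name|price)"
  AS "hotels"
  AT "tcp://site6.domain6.com:10000"
  USING_POLICY "best first"
WITH JOIN RETURN ALL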

5 The system architecture

The RMM component is in charge of managing search requests expressed in the RDL language. The core components of the architecture are searching agents that interpret and execute the requests. In Figure 2 agents are represented as small faces. The picture shows that they can move from site to site to perform a search. In addition, the picture shows that several sources of information (sellers, in the OPELIX context) can be involved in the execution of a search.

[Figure 2 shows a Customer submitting a Request Form to the RMM installed at the Intermediary site, which in turn interacts with the RMM installations at the Seller sites.]
Fig. 2. High level architecture of the RMM component. Agents are visualized by small faces.

An agent is instantiated as a result of a customer's request. Such a request is usually received by an Intermediary who interacts with the customer, offering searching and other i-commerce services. The RMM component installed at the Intermediary site provides users with guidance in the definition of a search request and then translates it into an RDL script to be passed to an agent. The agent interprets an RDL statement by executing the following steps:

- identify all the objectives to be fulfilled;
- for each objective, query the proper sources (in our example, the web sites providing information on theatres, airlines, and hotels);
- assemble the results while they are being produced and eventually stop the search and provide the results to the user.

The management of each objective requires execution of the following operations:

- Query preparation: if the query contains references to other objectives, these references are resolved and a well-formed XQL query is generated.
- Query execution: the query is executed at the sites belonging to the site list.
- Assembling of results: the results obtained at each site are merged.

After considering all searching objectives, the agent joins the obtained results if this is required in the RDL request and generates the final result in an XML format.

In order to perform all the above steps, the architecture of Figure 2 is refined as shown in Figure 3, where the internal components of the RMM system are presented. At each site the mobile agent infrastructure is installed. Thanks to such an infrastructure, agents live and operate within specific execution contexts that isolate them from the details of the machine where they are running and provide all the services needed for moving and for communicating with other agents. In addition, the agent infrastructure protects the machine from attempts by malicious agents to perform security attacks.

[Figure 3 shows the Intermediary installation (RMM UI, search request transformation, agent infrastructure, XQL interpreter, XML-based DBMS) and a Seller installation (agent infrastructure, XQL interpreter, XML-based DBMS).]

Fig. 3. Detailed architecture of the RMM component.

Mobile agent infrastructures are still in the research stage and several are currently described in the literature [5], [11]. Most of them are available as research prototypes. In our research we decided not to devote our efforts to developing yet another agent infrastructure but, rather, to rely on one of the available prototypes. In particular, we selected Voyager [8], developed by ObjectSpace, which, at the time of our analysis, seemed to be the easiest to use and the most powerful. In principle, however, any infrastructure could replace it.

From the protected context provided by the agent infrastructure, the agent has to execute queries that require access to the information on products stored at each site. Such access is mediated by the XQL interpreter which, for the sake of flexibility, we assume to be installed locally at each site. Such a component could have been implemented as part of the agent itself, at the cost of increasing the size of the agent. The other components shown in Figure 3 are devoted to managing the interaction with the customer and to translating the customer's request into the RDL syntax. A detailed presentation of the approach we use to guide the customer in the definition of a request is out of the scope of this article and will be given in a future publication.

In the following sections more details on the steps performed by agents during the execution of RDL requests are provided.

5.1 Query preparation

During query preparation the agent determines if the query contains any reference to other objectives. As mentioned in Section 4, such references are syntactically identified by '%'. If some reference exists, the agent replaces it with the

value computed during the execution of the corresponding query. For instance, consider the two queries presented in Section 4 that refer to the example of Section 2. Assume that the execution of the query about performances provides the following result set:

"Roma", "07/07/2001", "100 EUR"
"Milano", "07/31/2001", "400 EUR"

In this case, the "flight" query is transformed into:

//flight[arrival/date < "07/07/2001" $and$ arrival/place = "Roma" $and$
         departure/place = "New York"]?/(code|price|departure/date)
$or$
//flight[arrival/date < "07/31/2001" $and$ arrival/place = "Milano" $and$
         departure/place = "New York"]?/(code|price|departure/date)

In the resulting query, the terms %performances@date% and %performances@city% have been replaced by the corresponding values extracted from each tuple of the performances result set. Since two results exist in the result set, two clauses are generated, connected by an $or$. In general, a query, say A, can only contain references to queries that are executed before A is prepared. Since the order of execution corresponds to the order in which queries are written in the RDL statement, these queries have to appear before A in the statement. The experience we have gained so far has shown that this restriction on queries is not too constraining in practice. Moreover, it guarantees the absence of circular references that could otherwise be hard to identify and manage.
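The substitution step can be sketched in Java as follows. This is a minimal illustration, not the actual prototype code: it assumes that each result set is available as a list of tuples (maps from tag names to values) and that all references in a query point to the same, previously executed query, as in the running example.

import java.util.*;
import java.util.regex.*;

// Hypothetical sketch of query preparation: every %queryname@tagname%
// reference is replaced with the values produced by an earlier query,
// generating one clause per tuple; clauses are connected by $or$.
public class QueryPreparer {

    // Maps a query name (e.g. "performances") to its result set, where each
    // tuple maps tag names (e.g. "city") to values (e.g. "Roma").
    private final Map<String, List<Map<String, String>>> resultSets;

    public QueryPreparer(Map<String, List<Map<String, String>>> resultSets) {
        this.resultSets = resultSets;
    }

    public String prepare(String query) {
        Pattern ref = Pattern.compile("%(\\w+)@(\\w+)%");
        Matcher probe = ref.matcher(query);
        if (!probe.find()) {
            return query;                       // no references: already well formed
        }
        String referenced = probe.group(1);     // assumes a single referenced query
        List<String> clauses = new ArrayList<>();
        for (Map<String, String> tuple : resultSets.get(referenced)) {
            StringBuffer clause = new StringBuffer();
            Matcher m = ref.matcher(query);
            while (m.find()) {
                // Replace the reference with the quoted value from this tuple.
                String value = "\"" + tuple.get(m.group(2)) + "\"";
                m.appendReplacement(clause, Matcher.quoteReplacement(value));
            }
            m.appendTail(clause);
            clauses.add(clause.toString());
        }
        return String.join(" $or$ ", clauses);
    }
}

For the running example, applying prepare() to the flight query with the two performances tuples would yield the two $or$-connected clauses shown above.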

5.2 Query execution

As a first step of query execution, the agent determines the number of sites where the query will be executed. This is the starting point for the definition of a search strategy that determines the way these sites should be visited. At this stage, we have defined the following, very simple strategy. If the number of sites to be visited is smaller than a given threshold (defined at installation time), the agent moves to the site where the query has to be performed and executes it by interacting with the local XQL interpreter. Otherwise, it creates a set of slave agents that run the query on all sites in parallel. Such agents are slaves in the sense that they are devoted to executing only a portion of a more complex request that can include more than one objective. When slave agents are spawned, the master agent suspends its activity and waits for the results. In turn, slave agents move to the site where the query will be executed, do their job, and then go back to communicate the results to the master agent. In order to manage possible faults that prevent slave agents from rejoining their originating agent, a timeout policy is adopted. If a slave agent does not find the originating agent waiting for it, it simply terminates its execution.
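The strategy just described can be summarized by the sketch below. Slave agents are modeled as concurrent tasks for brevity; in the prototype they would be mobile agents spawned through the agent infrastructure, and all class and method names here are hypothetical.

import java.util.*;
import java.util.concurrent.*;

// Illustrative sketch of the simple execution strategy described above.
public class QueryExecutor {

    private final int threshold;       // defined at installation time
    private final long timeoutMillis;  // guards against slaves that never rejoin

    public QueryExecutor(int threshold, long timeoutMillis) {
        this.threshold = threshold;
        this.timeoutMillis = timeoutMillis;
    }

    // Returns one XML result fragment per site that answered in time.
    public List<String> execute(String xqlQuery, List<String> sites)
            throws InterruptedException {
        List<String> results = new ArrayList<>();
        if (sites.size() < threshold) {
            // Few sites: the agent itself migrates and queries each site in turn.
            for (String site : sites) {
                results.add(runAt(site, xqlQuery));
            }
        } else {
            // Many sites: spawn one slave per site and wait for them with a timeout.
            ExecutorService slaves = Executors.newFixedThreadPool(sites.size());
            List<Future<String>> pending = new ArrayList<>();
            for (String site : sites) {
                Callable<String> slave = () -> runAt(site, xqlQuery);
                pending.add(slaves.submit(slave));
            }
            for (Future<String> f : pending) {
                try {
                    results.add(f.get(timeoutMillis, TimeUnit.MILLISECONDS));
                } catch (ExecutionException | TimeoutException e) {
                    // A slave that cannot rejoin its master in time is given up.
                }
            }
            slaves.shutdownNow();
        }
        return results;
    }

    // Placeholder: migrate to the site and run the query on its local XQL interpreter.
    private String runAt(String site, String xqlQuery) {
        return "<results/>";
    }
}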

Other strategies for query execution will be studied and implemented in the future to improve the overall performance and provide more flexibility.

5.3 Assembling results

If the query is executed on multiple sites, the result set is, in the general case, the union of the single result sets collected at each site. The completion policy determines when to stop collecting non-empty result sets. For instance, if the policy is best first, the first result set is the final one. If slave agents are exploited in the search, this means that the master agent waits for the first agent to come back and then sends a dispose message to all the others. Vice versa, if the master agent performs the search by itself, it simply stops its journey at the first site that offers a result set.
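Along the same lines, the best first policy can be sketched as waiting only for the first non-empty result set and disposing of the remaining slaves. Again, names and structure are illustrative assumptions rather than the prototype's code:

import java.util.*;
import java.util.concurrent.*;

// Sketch of the "best first" completion policy: the master accepts the first
// non-empty result set that comes back and disposes of all remaining slaves.
public class BestFirstAssembler {

    public String collectBestFirst(List<Callable<String>> slaveTasks, long timeoutMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(slaveTasks.size());
        CompletionService<String> arrivals = new ExecutorCompletionService<>(pool);
        for (Callable<String> task : slaveTasks) {
            arrivals.submit(task);
        }
        try {
            for (int i = 0; i < slaveTasks.size(); i++) {
                Future<String> next = arrivals.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                if (next == null) {
                    break;                      // timed out waiting for further slaves
                }
                try {
                    String resultSet = next.get();
                    if (resultSet != null && !resultSet.isEmpty()) {
                        return resultSet;       // first non-empty result set is the final one
                    }
                } catch (ExecutionException e) {
                    // this slave failed; keep waiting for the others
                }
            }
            return "";                          // no site produced results
        } finally {
            pool.shutdownNow();                 // corresponds to the dispose message
        }
    }
}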

6 Conclusion

Our search subsystem, being dynamic in its behavior, distinguishes itself from traditional search engines. It allows users to perform sophisticated searches, which can yield results that closely match their expectations. The searching process depends on the single request. At the level of each request, it is possible to define the itinerary to be followed and the depth of the search, defined as the number of results to be collected. Queries with some degree of mutual inter-dependency can be formulated, as we have shown in our running example.

Our approach is based on autonomous agents. Each agent interprets a searching (sub)task and executes it, producing a result that is either communicated to the customer or packaged with the results obtained by other agents. By exploiting mobility, the agent, which packages both the knowledge needed for searching and the partial results obtained in a searching activity, performs the search at the site where the resources are stored. This approach is not always convenient with respect to traditional remote method invocation [3]. Our preliminary experience gained with OPELIX shows that moving agents offers advantages in the case of network faults and in the cases where users may become disconnected from the network. In these cases, in fact, the agent can perform at least part of its task even if it is temporarily isolated from the rest of the system.

One of the problems we faced in the development of our prototype is the lack of standard approaches to mobile agent development. Each supporting platform offers its own linguistic and runtime mechanisms based on a model of mobility that in general varies from case to case. MASIF [10] is an approach aiming at overcoming this problem through the standardization of agent management, transfer, system names, system types, and location syntax. One of our current research efforts is to exploit an approach à la MASIF to build a software layer that enables interoperability between agent platforms.

A notable aspect that is currently missing in our approach concerns security. This problem has several facets that need to be considered separately. An issue

that is particularly crucial when mobility is used is to ensure that agents moving to a site do not try to attack the system that hosts them and, in turn, are not attacked by it. This problem is under investigation in the mobile agents community. We plan to integrate the results achieved by other researchers as soon as they are sufficiently general and stable.

Other future activities concern the refinement of RDL in order to make it more expressive. Also, we plan to build a GUI that allows every user to formulate queries in a simple way without knowing the internals of RDL sentences. The RMM prototype is going to be used in real case studies as part of the OPELIX project. We plan to collect useful hints, feedback, and new requirements from these experiences.

Acknowledgements

We thank all members of the OPELIX team for their invaluable help and support in the development of our approach. We also thank Angela Bonifati, who helped us with the analysis of XML query languages.

References

1. A. Bonifati and S. Ceri. Comparative Analysis of Five XML Query Languages. ACM SIGMOD Record, 29(1):68-77, March 2000.
2. T. Bray, J. Paoli, and C. M. Sperberg-McQueen. Extensible Markup Language (XML) 1.0 - W3C Recommendation, Oct. 2000. http://www.w3.org/TR/2000/REC-xml-20001006.
3. A. Carzaniga, G. P. Picco, and G. Vigna. Designing Distributed Applications with Mobile Code Paradigms. In R. Taylor, editor, Proceedings of the 19th International Conference on Software Engineering (ICSE'97), pages 22-32. ACM Press, 1997.
4. G. Cugola, E. Di Nitto, and A. Fuggetta. The JEDI Event-based Infrastructure and its Application to the Development of the OPSS WFMS. To appear in IEEE Transactions on Software Engineering.
5. A. Fuggetta, G. P. Picco, and G. Vigna. Understanding Code Mobility. IEEE Transactions on Software Engineering, 24, May 1998.
6. V. Grover and J. T. C. Teng. E-Commerce and the Information Market. Communications of the ACM, 44(4), April 2001.
7. V. N. Gudivada, V. V. Raghavan, W. I. Grosky, and R. Kasanagottu. Information Retrieval on the World Wide Web. IEEE Internet Computing, pages 58-68, Sep.-Oct. 1997.
8. ObjectSpace Inc. ObjectSpace Voyager 4.0 Documentation, 2000. http://support.objectspace.com/doc/index.html.
9. S. Lawrence and C. Lee Giles. Searching the Web: General and Scientific Information Access. IEEE Communications, 37(1):116-122, 1999.
10. OMG. CORBA Facilities: Mobile Agent System Interoperability Facilities Submission. OMG Technical Report, 1997.
11. OMG. Agent Technology. OMG Green Paper, March 2000.
12. J. Robie, J. Lapp, and D. Schach. XML Query Language (XQL), 1998. http://www.w3.org/TandS/QL/QL98/pp/xql.html.
