
Hybrid Reasoning for Web Services Discovery

Mohamed Quafafou1, Omar Boucelma1, Yacine Sam2, and Zahi Jarir3

1 Aix-Marseille Univ, LSIS, 13097, Marseille, France; CNRS, UMR 6168, 13097, Marseille, France
2 François Rabelais University, 37041, Tours, France
3 Cadi Ayyad University, Marrakech, Morocco

Abstract. This paper describes a novel approach for discovering web services. The approach combines two logic-based formalisms and two reasoning engines, and is illustrated using a set of web services dedicated to web information extraction. The approach exhibits more or less complex relationships between services: (1) a service may have one or many variants, that is, services that perform the same task, which leads to a first category of services specified with Feature Logics; (2) services may also be related by closer relationships, and those links are expressed using OIE (Ontology for Information Extraction), a generic ontology that we designed. Hybrid reasoning is then performed based on feature logics and OIE, respectively, for Web services discovery.

1 Introduction

Automatically discovering web services is still considered a research problem, despite recent advances in which semantic layers have been added and promoted as a way to solve it: semantic web technologies and ontologies [5], [9], [10] and reasoning mechanisms [4], [7], to cite a few, are examples of such solutions. In this paper, we present a novel approach for tackling the issue. Our approach relies on two logic-based formalisms: Feature Logics, or FL [14], and OIE, an Ontology for web Information Extraction that we designed and implemented. The approach is illustrated by means of a use case drawn from a real-world example, namely web services for web information extraction (IE for short). A typical problem statement in this domain is as follows: given a set of IE methods and techniques (Web services), how to choose the most suitable methods (services) that fulfil some IE-specific needs? We believe that our contribution is twofold:

– Structuring of the services’ space into subspaces,
– Definition of two semantic levels described by means of two complementary formalisms, namely FL and OIE, our Ontology for web Information Extraction, expressed in SHOIN description logics [3].

The remainder of the paper is organized as follows: Section 2 describes IE concepts and methods. Section 3 gives an overview of FL and its usage for IE services’ representation. Hybrid reasoning is discussed in Section 4. Finally, we conclude in Section 5.

Z. Lacroix and M.E. Vidal (Eds.): RED 2010, LNCS 6799, pp. 150–159, 2012. © Springer-Verlag Berlin Heidelberg 2012

2 Web Information Extraction

We identified two kinds of IE services: those dedicated to web data management and those related to data extraction.

2.1 Web Data Management Services

Some IE services are generic, as they are associated with web data management, including data source querying and web page fetching and parsing. These generic services include the following classes:

– HTTP query building: An HTTP query is composed of three parts: a query method, a base URL and a set of key/value pairs. This service builds these three parts from its parameters and returns a list containing a unique item.
– Fetching: A fetching service takes as input either a URL or an HTTP request and downloads the document referred to. It returns an HTTP response, or an empty list in case of an error.
– Querying: A querying service calls a predetermined service with a set of parameters. It takes as input a set of parameters and outputs the result of calling the service.
– Parsing: A parsing service takes a document in a specific format (XML, HTML, PDF, DOC, etc.), parses it and returns a result of a specific type (DOM object, abstract type, etc.), or an empty list in case of a parsing error.
– Filtering: This service selects from its input according to a predetermined predicate that can be defined as a set of tests. Any input object satisfying the predicate is returned. All other input objects are held back.
– Extracting: An extraction service returns subparts of its input using an expression which is applied to the input. For example, given the DOM representation of an HTML page and the //a/@href XPath expression, the resulting extraction service returns the links contained in the input document.
– Transforming: A transformation service changes the format of its input. When the input is an HTML/XML document (or its DOM representation), the transformation can be described by an XSL stylesheet.

2.2 Web Data Extraction Services

The extraction service may be more complex, as web information extraction is still an active research field and many algorithms for wrapper construction are available. Services implementing such algorithms belong to the same class, as they are dedicated to the same extraction task. An example of such a class of services is the Pages labeling class introduced and formalized in [6]: the input of such a service is a set of labeled pages, in which every instance found on the pages must be labeled. Another class of services, Document structure analysis, considers that a wrapper can be obtained by analyzing the "logical" structure of the pages. In [1], the authors propose to construct a wrapper automatically by searching for maximal prefixes of the sequence formed by a document. Finally, Knowledge-based wrappers are services based on methods involving the construction of knowledge-based wrappers [12,2]. In order to cope with those different classes, we need high-level models that support reasoning on such services, considering both their intrinsic properties and their mutual relationships: those models are described in the sequel.

3 Formal Web Services Representation

Web services are offered under multiple variants. As an example, a Personal Web information extraction service may have two explicit variants: one for the US SuperPages (superpages.com), and the second for the French Pages Jaunes (pagesjaunes.com). An explicit variant may itself have variants, which we call implicit service variants. In fact, there exist many IE services (methods/algorithms), each of them being used in a specific context (e.g., task complexity, extraction technique, degree of automation, etc.). Our Web services representation relies on two formalisms. Feature logics [13] (FL for short) is the language for describing explicit Web services variants, and feature term unification is used for discovering and composing them. For implicit Web services variants, an ontology language is used together with a dedicated reasoning mechanism.

3.1 Feature Logics: An Overview

Feature Logics [13] is a knowledge representation formalism based on feature terms. A feature term (ft) denotes a set of objects characterized by some features. A feature is a functional property or characteristic of an abstract object. In its simplest form, an ft consists of a conjunction of “feature:value” pairs named slots, where each feature represents an object characteristic, an object being a service in our case. A feature value may include literals, variables, and embedded fts. For the sake of simplicity, we restrict our syntax to slots. Fig. 1 below illustrates some of the useful ft operators. In this figure, x denotes a variable, f an attribute, a a constant, and S an ft. Complex fts can be built recursively from elementary fts using well-known boolean operators such as intersection, union, and complement.

3.2 FL Specification of Explicit Web Services Variants

Information extraction services are generally specified under several variants. The need for multiple variants of each information extraction service is motivated by differences in the characteristics of the resources. As an example, a Personal Web information extraction service can be offered under multiple variants, depending on its country (US Directory SuperPages for the USA and Pages Jaunes for France). For the same reason, other services are also offered under multiple variants, such as Filter, Transform, Fetch, External, Xmlparse, and Extract (here under two variants).

Notation              Name           Interpretation
⊤ (also [ ])          Top            Universe
⊥ (also { })          Bottom         { }, Inconsistency
a                     Atom           {a}
f : S                 Selection      value of f is in S
x                     Variable       -
S ⊓ T (or [S,T])      Intersection   S and T hold
S ⊔ T (or {S,T})      Union          S or T holds

Fig. 1. FT syntax and semantics

            Service 1             Service 2            Service 3
Variant 1   Filter : us-filter    Transform : us-q     Fetch : sp-f
Variant 2   Filter : fr-filter    Transform : fr-q     Fetch : pj-f

            Service 4             Service 5            Service 6
Variant 1   External : sp-e       Xmlparse : sp-p      Extract : sp-n
Variant 2   External : pj-e       Xmlparse : pj-p      Extract : pj-n

Fig. 2. Web information management services

Reasoning techniques on FL are also used to select compatible variants of services in the workflow of services (classes) of each source. Since the IE process is a workflow of tasks (i.e., of services), the selection of a given service variant in the workflow implies the selection of a compatible variant for the next service of the workflow. Indeed, it is not possible to use a transformation service designed for the French Pages Jaunes Web site in a process that has already selected a filtering service designed for the US Directory SuperPages Web site. This example generalizes to all the services that are in sequence in the workflow: they have to be derived from compatible variants, where compatibility means here the use of services designed for the same country.

An IE service of a given domain can be enriched as new variants are added. When the number of variants is large, the automatic construction of combinations is necessary; this can be done based on (1) feature logics and (2) unification for, respectively, specifying the Web information management services and selecting compatible variants in the workflow of the target source. Fig. 2 represents every service of the workflow with its variants. In this example, every service has exactly two variants. In other cases, this number can be higher, depending on the variants specified by different users. Using feature logics, the services of Fig. 2 can be represented as depicted in Fig. 3 below:

Service 1. [Service : filter, country : {usa, fr}]
Service 2. [Service : transform, country : {usa, fr}]
Service 3. [Service : fetch, country : {usa, fr}]
Service 4. [Service : external, country : {usa, fr}]
Service 5. [Service : xmlparse, country : {usa, fr}]
Service 6. [Service : extract, country : {usa, fr}]

Fig. 3. FL expression of web services
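To make the role of unification in variant selection more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simplified encoding in which a feature term is a dictionary from feature names to sets of atoms (a set with several atoms standing for a union), and in which an empty intersection stands for ⊥. The names `term`, `unify`, `propagate` and `WORKFLOW` are hypothetical helpers introduced only for this illustration, and the `aggregate` parameter is an assumed way of merging the service names rather than intersecting them.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional

# Assumed encoding: feature name -> set of admissible atoms.
# A set with several atoms plays the role of a union such as {usa, fr}.
FeatureTerm = Dict[str, FrozenSet[str]]


def term(**slots) -> FeatureTerm:
    """Build a feature term; each slot value is one atom or an iterable of atoms."""
    return {
        name: frozenset([value]) if isinstance(value, str) else frozenset(value)
        for name, value in slots.items()
    }


def unify(left: FeatureTerm, right: FeatureTerm,
          aggregate: Iterable[str] = ()) -> Optional[FeatureTerm]:
    """Intersect two feature terms slot by slot; None stands for bottom.

    Features named in `aggregate` are merged by set union instead of being
    intersected (an assumption made for this sketch).
    """
    merged = set(aggregate)
    result = dict(left)
    for name, values in right.items():
        if name not in result:
            result[name] = values
        elif name in merged:
            result[name] = result[name] | values
        else:
            common = result[name] & values
            if not common:
                return None  # inconsistent slot: the whole term is bottom
            result[name] = common
    return result


# The six services of Fig. 3, each available for both countries.
WORKFLOW: List[FeatureTerm] = [
    term(service=name, country={"usa", "fr"})
    for name in ("filter", "transform", "fetch", "external", "xmlparse", "extract")
]


def propagate(query: FeatureTerm, workflow: List[FeatureTerm]) -> Optional[FeatureTerm]:
    """Unify the query with each service in turn, carrying the result forward."""
    current: Optional[FeatureTerm] = query
    for service in workflow:
        current = unify(current, service, aggregate=("service",))
        if current is None:
            return None
    return current


selected = propagate(term(country="fr"), WORKFLOW)
```

In this sketch a query for `fr` narrows every service to its French variant, while a query for a country that no service supports yields `None`. The set-based encoding does not preserve the order of the services, which a real workflow representation would need.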

4 Hybrid Reasoning about Web Services

Hybrid reasoning results from the application of both FL and OIE to a query which consists of the network/workflow of services involved in an extraction process. In a nutshell, FL deals with services’ variants while OIE helps in selecting the most appropriate IE method.

4.1 FL-Based Reasoning

FL query processing starts as follows: based on the query variability criteria, a compatible variant of the first service node is selected. The selection is then propagated to the other services of the workflow, again based on compatibility with the variability criteria. To make this clear, let us illustrate the selection process for the services presented above. The variability criterion here is the country, so the feature term [country : value] is added to the services’ specifications. Different country values generate different variants of a given service. For the French personal Web information service Pages Jaunes, the feature term is [country : fr]. The selection of Web information management services is done by feature unification [14] between the first service of the workflow and the query; the selection of the other variants is done by feature propagation [14]. The unification of the query with the first service of a workflow (filtering service) is represented as follows:

[country : fr] ⊓ [service : filter, country : {usa, fr}]
= [service : filter, country : [fr, usa]] ⊔ [service : filter, country : [fr, fr]]
= [service : filter, country : [⊥]] ⊔ [service : filter, country : [fr, fr]]
= ⊥ ⊔ [service : filter, country : [fr]]
= [service : filter, country : fr]

As a result of the unification process, the second variant of the filter service is selected. Then, this result is unified with the second service in the workflow, which leads to the following result:

[service : filter, country : fr] ⊓ [service : transform, country : {usa, fr}]
= [service : [filter,transform], country : [fr, usa]] ⊔ [service : [filter,transform], country : [fr, fr]]
= [service : [filter,transform], country : [⊥]] ⊔ [service : [filter,transform], country : [fr]]
= ⊥ ⊔ [service : [filter,transform], country : [fr]]
= [service : [filter,transform], country : fr]

We can see that the second variant of the transform service has been selected. This variant is compatible with the second variant of the filter service already selected: this process is known as feature propagation [14]. In fact, the variability criteria are passed through the services in the workflow, starting from the first service, which is selected by the query. Note that the values of the attribute service (the name of the web service) are aggregated instead of being unified; this is done by a new operator that we introduced in [11] in order to deal with service composition based on FL.

4.2 OIE-Based Reasoning

In general, information extraction methods are compared according to different dimensions. In our case, we consider three dimensions: (i) task complexity, (ii) degree of automation, and (iii) extraction technique used by the service. The difficulty of the extraction task depends on the input document: page category (semi-structured, unstructured, structured, etc.), HTML support, variability of the information to be extracted, data format heterogeneity, extraction target, etc. The degree of automation relates to users’ expertise (programming, labeling, etc.), the capture of the input document (yes or no), the output format (text, XML, etc.), applicability to other areas (high, low, medium), etc. Finally, the extraction algorithm is characterized by: a single or multiple scan passes over the document, the type of extraction rules (e.g., regular expressions), the adopted parsing technique (DOM tree, syntactic, semantic, etc.), the learning algorithm (bottom-up ILP, top-down ILP, etc.), and the level of analysis (word, label, etc.).

The OIE ontology, expressed in the SHOIN DL formalism [3], is used for describing concepts of the IE domain. OIE will be used for any purpose that is essential for the extraction of multi-source information from the web. For example, let us consider the query “list of men’s shoes with their price”. Assuming there exist two data sources that can contribute to the answer, each source requiring specific IE methods, OIE will find the specific methods to be applied to each source for digging out the desired information. The main issue here is to find all concepts that are semantically related to a given concept (see use of OIE above). OIE allows modeling the semantic neighborhood while taking into account many possible relations in the field of Web Information Extraction.

4.2.1 TBox and ABox Contents. The top concept of our ontology is OIE. Specialization of OIE leads to the other concepts, including Tool, Criteria, Classification. The concept Criteria is also specialized and leads to AutomationDegree, TaskDomains, TechniquesUsed, etc. The TBox contains three basic concepts, namely Tool, Criteria and Classif. Tool represents IE methods that are well known in the literature; Classif defines the IE method (manual, semi-supervised, unsupervised); and finally, Criteria represents some benchmarks. The TBox also contains axioms of type condition (*), specialization (**) and definition (***), some of which are presented in Fig. 4. Fig. 5 shows some assertions of the corresponding ABox.

Concept-Disjointness (*)
  Classif ⊓ Tool ⊑ ⊥                                              (A01)
  Classif ⊓ Criteria ⊑ ⊥                                          (A02)
Subsumption (**)
  AutomationDegree ⊑ Criteria                                     (A04)
  TaskDomains ⊑ Criteria                                          (A05)
Extensional Concepts (*)
  Criteria ≡ AutomationDegree ⊔ TaskDomains ⊔ TechniquesUsed      (A07)
Concept-Equivalence (***)
  MA-MVA ≡ NHS                                                    (A26)

Fig. 4. Excerpt of TBox axioms

Concept         Assertions
Classif         Classif(Manual), Classif(Semi-Supervised), Classif(Supervised)
Tool            Tool(DEByE), Tool(DeLa), Tool(DEPTA), Tool(EXALG)
Applicability   Applicability(High), Applicability(Low), Applicability(Medium)

Fig. 5. Excerpt of ABox assertions

4.2.2 Ontology Guided Service Discovery. Once the axioms and assertions are specified, the reasoning engine is involved for query processing. Since the formal specification of OIE has been made in SHOIN DL, we use a DL reasoner (such as RacerPro, Pellet, Kaon, etc.). Let us consider the following query:



“retrieve IE methods that are supervised and generate results in XML”, which is expressed in SHOIN as follows: ∃hasClassif.(’Supervised’) ⊓ ∃hasOutputAPISupport.(’XML’). The execution of Algorithm 1 below returns the following three well-known IE services: RAPIER, SRV and WHISK (see a survey of IE methods in [8]).

Algorithm 1. IE service discovery
Require: ’OIE_DIG.xml’, urlMI (URL of the inference engine), urlS (data source URL), urlsP (pages URLs), Criteria (the query criteria)
Ensure: A set of IE services that match the IE task
1: Variables: Request : String, file_RESPONSE : ResponseDocument
2: connect(urlMI);
3: createKB();
4: populateKB(’OIE_DIG.xml’);
5: Source.Download_Source(String urlS, String[] urlsP)
6: Query ← Define_Criteria(Source, Criteria)
7: ’DIG_asks.xml’.content ← format_asks(R)
8: file_RESPONSE ← inference_engine.execute(’DIG_asks.xml’)
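The query above is a conjunction of two restrictions, so its answer can be pictured as an intersection of two sets of individuals. The following Python sketch illustrates only that idea over a toy set of role assertions; it is not a DL reasoner and does not use the DIG interface of Algorithm 1. The assertions for RAPIER, SRV and WHISK merely mirror the result reported in the text, `PlaceholderTool` is an invented individual, and the names `ASSERTIONS`, `instances_with` and `answer` are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Toy role assertions of the form (role, individual, filler).
# Illustrative data only: it mirrors the answer reported in the text and
# adds one invented individual that must not match the query.
Assertion = Tuple[str, str, str]

ASSERTIONS: Set[Assertion] = {
    ("hasClassif", "RAPIER", "Supervised"),
    ("hasOutputAPISupport", "RAPIER", "XML"),
    ("hasClassif", "SRV", "Supervised"),
    ("hasOutputAPISupport", "SRV", "XML"),
    ("hasClassif", "WHISK", "Supervised"),
    ("hasOutputAPISupport", "WHISK", "XML"),
    ("hasClassif", "PlaceholderTool", "Manual"),
    ("hasOutputAPISupport", "PlaceholderTool", "XML"),
}


def instances_with(role: str, filler: str, assertions: Set[Assertion]) -> Set[str]:
    """Individuals related to `filler` through `role`."""
    return {subject for (r, subject, f) in assertions if r == role and f == filler}


def answer(criteria: Iterable[Tuple[str, str]], assertions: Set[Assertion]) -> List[str]:
    """Individuals satisfying every (role, filler) pair of the query."""
    matches: Optional[Set[str]] = None
    for role, filler in criteria:
        found = instances_with(role, filler, assertions)
        matches = found if matches is None else matches & found
    return sorted(matches or [])


supervised_xml = answer(
    [("hasClassif", "Supervised"), ("hasOutputAPISupport", "XML")],
    ASSERTIONS,
)
```

A real DL reasoner would also exploit the TBox axioms (subsumption, equivalence, disjointness), which this lookup over explicit assertions ignores.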

4.2.3 Processing a Service Request. Recall that service requests (queries for short) are characterized by several dimensions such as task complexity, degree of automation, extraction techniques, etc. These queries are represented by a network/workflow of nodes/services. Each node of this network is represented by a pair of items: the class of Web service required and the list of constraints related to this class. Formally, we represent each node by a pair (X,Y) where X is the class and Y is the list of constraints on the class. To handle the multiple dimensions of each query, we propose a processing algorithm that performs hybrid reasoning, that is, both FL and OIE reasoning. The main building blocks of this algorithm are: IEMatchMaker (IEM), FLMatchMaker (FLM), and OIEMatchmaker (OIEM, for ontology Matchmaker). The algorithm behaves as follows: when a query is received, for each node (X,Y), IEM must discover the appropriate web services of class X that perform the required IE method under constraint Y. First, IEM forwards the structure of the node to FLM to discover the list of Web services related to that node: FLM uses a unification process based on FL reasoning. If the result returned by FLM is empty, IEM asks OIEM to handle the same query. As several services may be returned either by FLM or OIEM, we apply a Select service that selects only one service to be invoked, as depicted in Algorithm 2 below:

Algorithm 2. InstanceClass(X:Y) (Combining FL and OIE reasoning schemas)
Require: X:Y (atomic query where X is the class of the service sought and Y a constraint)
Ensure: R : a set of candidate variants to answer the atomic query
1: BEGIN
2:   R = Select(for all s in class X do verify(s, Y))
3:   If (empty(R)) Then
4:     R = Select(OIE(X:Y))
5:   Endif
6:   Return(R)
7: END
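The control flow of Algorithm 2 (try the FL side first, fall back to OIE only when nothing is found) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: services are plain dictionaries, `verify` and `oie_lookup` are caller-supplied callables standing in for FL unification and the ontology reasoner, and `select` simply picks the first candidate because the text does not specify the selection policy. All names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Service = Dict[str, str]
Constraint = Dict[str, str]


def select(candidates: List[Service]) -> Optional[Service]:
    """Placeholder for the Select service: keep the first candidate, if any."""
    return candidates[0] if candidates else None


def instance_class(
    service_class: str,
    constraint: Constraint,
    registry: Dict[str, List[Service]],
    verify: Callable[[Service, Constraint], bool],
    oie_lookup: Callable[[str, Constraint], List[Service]],
) -> Optional[Service]:
    """Answer one atomic query X:Y, asking the ontology only as a fallback."""
    # Step 2: candidates of class X that satisfy constraint Y (FL side).
    chosen = select(
        [s for s in registry.get(service_class, []) if verify(s, constraint)]
    )
    # Steps 3-5: nothing found, so hand the same query to the OIE side.
    if chosen is None:
        chosen = select(oie_lookup(service_class, constraint))
    return chosen


# Toy usage with the two filter variants of Fig. 2.
REGISTRY: Dict[str, List[Service]] = {
    "filter": [
        {"name": "us-filter", "country": "usa"},
        {"name": "fr-filter", "country": "fr"},
    ]
}


def matches_all(service: Service, constraint: Constraint) -> bool:
    """Stand-in for FL verification: every constrained slot must agree."""
    return all(service.get(key) == value for key, value in constraint.items())


def toy_oie(service_class: str, constraint: Constraint) -> List[Service]:
    """Stand-in for the ontology reasoner; returns an invented suggestion."""
    return [{"name": "ontology-suggested-" + service_class}]


french = instance_class("filter", {"country": "fr"}, REGISTRY, matches_all, toy_oie)
fallback = instance_class("filter", {"country": "de"}, REGISTRY, matches_all, toy_oie)
```

Passing the two matchmakers as callables keeps the fallback logic independent of how FL unification or ontology reasoning is actually implemented.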

5 Conclusion

In this paper, we described a novel approach for discovering web services. The approach is illustrated with a use case related to the (Web) Information Extraction (IE) domain, where a typical problem statement is as follows: given a set of IE methods and techniques, how to choose the most suitable method that fulfils some IE-specific needs? Our approach departs from existing ones as follows: (1) we propose to structure the set of services into subsets or classes; services within a class are related by means of a semantic link expressed in OIE, an ontology for IE that we designed; (2) we make a different use of ontologies: OIE is not a domain ontology (e.g., tourism, biology, etc.), as is the case in many research works; (3) we address web services customization by means of variants expressed in Feature Logics; note that a variant expresses another type of (semantic) link between concepts; and finally, (4) we designed a hybrid resolution algorithm that combines both FL and OIE reasoning engines for Web services discovery.

References

1. Chang, C.-H., Lui, S.-C.: Iepad: information extraction based on pattern discovery. In: WWW, pp. 681–688 (2001)
2. Habegger, B., Quafafou, M.: Multi-pattern wrappers for relation extraction from the web. In: ECAI, pp. 395–399 (2002)
3. Horrocks, I., Sattler, U.: A tableaux decision procedure for shoiq. In: IJCAI, pp. 448–453 (2005)
4. Klusch, M., Fries, B., Sycara, K.P.: Automated semantic web service discovery with owls-mx. In: AAMAS, pp. 915–922 (2006)
5. Kopecký, J., Vitvar, T., Bournez, C., Farrell, J.: Sawsdl: Semantic annotations for wsdl and xml schema. IEEE Internet Computing 11(6), 60–67 (2007)
6. Kushmerick, N.: Wrapper induction: Efficiency and expressiveness. Artif. Intell. 118(1-2), 15–68 (2000)
7. Küster, U., König-Ries, B., Stern, M., Klein, M.: Diane: an integrated approach to automated service discovery, matchmaking and composition. In: WWW, pp. 1033–1042 (2007)
8. Laender, A.H.F., Ribeiro-Neto, B.A., da Silva, A.S., Teixeira, J.S.: A brief survey of web data extraction tools. SIGMOD Record 31(2), 84–93 (2002)
9. Martin, D.L., Burstein, M.H., McDermott, D.V., McIlraith, S.A., Paolucci, M., Sycara, K.P., McGuinness, D.L., Sirin, E., Srinivasan, N.: Bringing semantics to web services with owl-s. In: World Wide Web, pp. 243–277 (2007)
10. Roman, D., Keller, U., Lausen, H., de Bruijn, J., Lara, R., Stollberg, M., Polleres, A., Feier, C., Bussler, C., Fensel, D.: Web service modeling ontology. Applied Ontology 1(1), 77–106 (2005)
11. Sam, Y., Colonna, F.-M., Boucelma, O.: Customizable-resources description, selection, and composition: A feature logic based approach. In: Meersman, R., Tari, Z. (eds.) OTM 2006 Part-I. LNCS, vol. 4275, pp. 377–390. Springer, Heidelberg (2006)
12. Seo, H., Yang, J., Choi, J.: Knowledge-based Wrapper Generation by Using XML. In: IJCAI-2001 Workshop on Adaptive Text Extraction and Mining, Seattle, Washington (2001)
13. Smolka, G.: Feature-logik. In: GWAI, pp. 477–478 (1989)
14. Smolka, G.: Feature-constraint logics for unification grammars. J. Log. Program 12(1&2), 51–87 (1992)