Semantic and Conceptual Context-Aware Information Retrieval

Bénédicte Le Grand, Marie-Aude Aufaure and Michel Soto

1 Laboratoire d’Informatique de Paris 6 – 8, rue du Capitaine Scott – F-75015 Paris, {Benedicte.Le-Grand, Michel.Soto}@lip6.fr
2 Supélec – Computer Science department – plateau du Moulon – 3, rue Joliot Curie – F-91192 Gif sur Yvette Cedex, [email protected]
3 Axis Research Team – INRIA – Domaine de Voluceau – BP 105 – F-78 153 Le Chesnay
Abstract. This paper presents an information retrieval methodology which uses Formal Concept Analysis in conjunction with semantics to provide contextual answers to Web queries. The conceptual context defined in this article can be global (i.e. stable) or instantaneous (i.e. bounded by the global context). Our methodology consists first of a pre-treatment that provides the global conceptual context, and then of an online contextual processing of users’ requests, each associated with an instantaneous context. Our information retrieval process is illustrated through experimental results in the tourism domain. One benefit of our approach is a more relevant and refined information retrieval, closer to the users’ expectations.

Keywords. Context-dependent semantics, emergent semantics in information retrieval systems, Formal Concept Analysis.
1 Introduction

This paper presents a context-aware semantic information retrieval tool. Our goal is to use conceptual analysis in conjunction with semantics in order to provide contextual answers to users’ queries on the Web. We present our methodology and show experimental results of an information retrieval performed on selected tourism Web sites.

The information retrieval process is divided into two steps:
• Offline pre-treatment of Web pages;
• Online contextual processing of users’ requests.

The pre-treatment consists in computing a conceptual lattice from tourism Web pages in order to build an overall conceptual context; this notion is defined in sections 2 and 3. Each concept of the lattice corresponds to a cluster of Web pages with common properties. The terms describing each page are matched against a tourism thesaurus, in order to label each concept in a standardized way.
2-9525435-1 © SITIS 2006
Whereas the processing of tourism Web pages is achieved offline, the information retrieval is performed in real time: users formulate their query with terms from the thesaurus. This set of terms is then compared to the concepts’ labels and the best-matching concepts are returned. Users may then navigate within the lattice by generalizing or, conversely, by refining their query.

This method has several advantages:
• Results are provided according to both the context of the query and the context of available data. For example, only query refinements corresponding to existing tourism pages are proposed;
• The added semantics can be chosen depending on the target user(s);
• More powerful semantics can be used, in particular ontologies. This allows enhanced query formulation and provides more relevant results.

This paper is organised as follows: section 2 introduces the notion of context, in the general sense and in the field of computer science. Section 3 briefly describes Formal Concept Analysis and Galois lattices, and defines our global and instantaneous conceptual contexts. Our methodology for a semantic coordination of conceptual contexts and ontologies (or thesauri) is proposed in section 4. Finally, we conclude and give some perspectives of this work.
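To make the online step outlined in this introduction more concrete, the following Python sketch ranks concepts by how well their labels match the terms of a query. It is only an illustration under stated assumptions: the similarity measure is not specified at this point in the paper, so the Jaccard overlap used here, the function name rank_concepts and the sample labels are all hypothetical.

```python
def rank_concepts(query_terms, concept_labels):
    """Rank concepts by Jaccard overlap between query terms and concept labels.

    Assumption: Jaccard overlap stands in for the (unspecified) comparison
    between the query and the concepts' labels.
    """
    query = set(query_terms)
    scored = []
    for concept_id, label in concept_labels.items():
        label = set(label)
        union = query | label
        score = len(query & label) / len(union) if union else 0.0
        if score > 0:  # keep only concepts sharing at least one term
            scored.append((score, concept_id))
    # Best score first; ties broken by concept identifier for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [concept_id for _, concept_id in scored]


# Hypothetical concept labels made of thesaurus terms.
CONCEPT_LABELS = {
    "c1": {"accommodation"},
    "c2": {"accommodation", "hotel"},
    "c3": {"accommodation", "hotel", "restaurant"},
    "c4": {"museum"},
}

best = rank_concepts({"hotel", "accommodation"}, CONCEPT_LABELS)
```

In this toy setting, moving from the concept labelled by the query terms to one with fewer label terms corresponds to generalizing the query, and moving to one with more label terms corresponds to refining it.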
2 Notion of context

Context is an abstract notion that cannot be precisely defined because it only makes sense when it is linked to a particular situation. Human beings implicitly associate a context with a set of actions, an attitude, etc. in situations of everyday life: context surrounds and gives meaning to something else.

Some definitions of context have emerged in cognitive psychology, philosophy and areas of computer science like natural language processing. The concept of formal context was introduced by McCarthy [1] [2]. According to Giunchiglia, who has also worked on context formalization, “a context is a theory of the world which encodes an individual’s subjective perspective about it”. This theory is partial (incomplete) and approximate, as the world is never described in full detail [3].

Context is a key issue for many research communities like artificial intelligence, mobile computing, problem solving, etc. [4] [5]. In artificial intelligence, interaction between contexts is defined by rules allowing navigation from one context to others [6]. Contexts can be represented by conceptual graphs, topic maps, description logics with OWL extensions, etc.

In the Semantic Web, context is often used as a filter for disambiguation in information retrieval [7], to define contextual web services [8], or as a means to integrate or merge different ontologies [9] [10]. Context could be specified at different granularity levels (document, web page, etc.). Additional information, i.e. the context, could then be linked to each resource.
3 Conceptual contexts and relationship with ontologies

In the previous section, we have presented various definitions of context. In this article, we define conceptual contexts, based on Formal Concept Analysis and Galois lattices in particular.

Many works apply concept lattices to information retrieval [11]. Formal concepts can be seen as relevant documents for a given query. Combining a domain ontology with concept lattices to enhance information retrieval is more recent. In [12], the authors propose an approach based on Formal Concept Analysis to classify and search relevant data sources for a given query. This work is applied to bioinformatics data. A concept lattice is built according to the metadata associated with the data sources. Then, a concept built from a given query is merged into this concept lattice. In this approach, query refinement is performed using a domain ontology. The refinement process of OntoRefiner, dedicated to Semantic Web Portals [13], uses a domain ontology to build a Galois lattice for query refinement. The domain ontology avoids building the whole lattice. Finally, the CREDO system [14] allows the user to query web documents and to see the results in a browsable concept lattice (http://credo.fub.it). This system is useful for quickly retrieving the items with the intended meaning, and for highlighting the documents’ content.

In [7], the authors investigate methods for automatically relaxing over-constrained queries based on domain knowledge and user preferences. Their framework combines query refinement and relaxation in order to provide a personalized access to heterogeneous RDF data. In contrast to this approach, our method is dedicated to imprecise and user-centered queries. In our proposition, Galois lattices are built in order to represent the web pages’ content. The user can then browse the lattice in order to refine or generalize his/her query.
Compared to the approaches described above, our proposition is not only dedicated to information retrieval, but can also be used for other purposes like populating ontologies, comparing web sites through their lattices, helping web site designers, etc.

This section is organized as follows: after a short introduction to Galois lattices, we propose our definition of global and instantaneous conceptual contexts.

3.1 Introduction to Formal Concept Analysis and Galois Lattices

Formal Concept Analysis (FCA) is a mathematical approach to data analysis which provides information with structure. FCA may be used for conceptual clustering as shown in [15] and [16]. The notion of Galois lattice for a relationship between two sets is the basis of a set of conceptual classification methods. This notion was introduced by [17] and [18]. Galois lattices group objects into classes that materialise concepts of the domain under study. Individual objects are discriminated according to the properties they have in common. This approach is powerful because the resulting classification is semantic. The algorithm we implemented is based on the one that was proposed in [19].
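As a rough illustration of this grouping, the following Python sketch enumerates, by brute force, every pair (set of objects, set of shared properties) that is closed in both directions, for a tiny hypothetical table of pages and properties. It is not the algorithm of [19]: it tries every subset of objects and is therefore exponential, which is only acceptable for toy data. The page names, the properties and the function names are all assumptions made for the example.

```python
from itertools import combinations


def common_properties(objects, relation):
    """Properties shared by every object in `objects` (all properties if empty)."""
    shared = set().union(*relation.values())
    for obj in objects:
        shared &= relation[obj]
    return shared


def objects_having(properties, relation):
    """Objects that possess every property in `properties`."""
    return {obj for obj, props in relation.items() if properties <= props}


def enumerate_concepts(relation):
    """Brute-force enumeration of (objects, properties) classes.

    Each class is obtained by taking the properties shared by a subset of
    objects, then collecting every object that has all of those properties.
    """
    concepts = set()
    objs = sorted(relation)
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_properties(subset, relation)
            extent = objects_having(intent, relation)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Hypothetical object/property table: Web pages and the terms describing them.
PAGES = {
    "page_a": {"hotel", "beach"},
    "page_b": {"hotel", "museum"},
    "page_c": {"hotel", "beach", "museum"},
}

concepts = enumerate_concepts(PAGES)
```

On this toy table the sketch yields four classes, the most general one grouping all three pages under the single shared property "hotel".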
Let us first introduce the basic concepts of Galois lattices. Let E and E’ be two finite sets (E is a set of objects and E’ is the set of these objects’ properties), and let R ⊆ E × E’ be a binary relation between these two sets. Figure 2 shows an example of binary relation between two sets. According to Wille’s terminology [20], the triple (E, E’, R) is a formal context, which corresponds to a unique Galois lattice representing natural groupings of the elements of E and E’.

Let P(E) be the power set of E (the set of all its subsets) and P(E’) the power set of E’. Each element of the lattice is a couple, also called concept, noted (X, X’). A concept is composed of two sets X ∈ P(E) and X’ ∈ P(E’) which satisfy the two following properties:

(1) X’ = f(X), where f(X) = { x’ ∈ E’ | ∀x ∈ X, xRx’ }
    X = f’(X’), where f’(X’) = { x ∈ E | ∀x’ ∈ X’, xRx’ }

A partial order on concepts is defined as follows: Let C1=(X1, X’1) and C2=(X2, X’2),
(2) C1