Ontological Modeling for Interactive Question Answering

Ontological Modeling for Interactive Question Answering Roberto Basili, Diego De Cao, and Cristina Giannone University of Rome Tor Vergata, Department of Computer Science, Systems and Production, 00133 Roma, Italy {basili,decao,giannone}@info.uniroma2.it

Abstract. This paper proposes a model for ontological representation supporting task-oriented dialog. The adoption of our ontology representation allows an interactive Question Answering (iQA) task to be mapped into a knowledge-based process. It supports dialog control, speech act recognition, planning and natural language generation through a unified knowledge model. A platform for developing iQA systems in specific domains, called REQUIRE (Robust Empirical QUestion answering for Intelligent Retrieval), has been entirely developed over this model. The first prototype, developed for medical consulting in the sexual health domain, has recently been deployed and is currently under testing. It serves here as a basis for exemplifying the model and discussing its benefits.

1 Introduction

Ontologies nowadays represent the common practice for representing knowledge in information systems, but the sharing of distributed information is still problematic. Information Retrieval (IR) methods are widely used to locate information and knowledge across Web resources, and the dominant human-computer interaction modality is still query based, i.e. a surrogate of natural language (NL) questioning. Returned results are generally too many, precisely because a query is such a shallow NL surrogate. Moreover, the employed IR engine usually cannot directly use the semantic information available in an ontology. An increasingly applied methodology to bridge this gap is Question Answering (QA). QA is the process that allows the user to ask questions in natural language and retrieve a set of pointwise answers: document fragments that contain a precise reply to the user question [1]. An interesting use of ontologies as knowledge for QA is shown in PowerAqua [2]. Yet QA is also affected by noise, as retrieval and answer detection incur wrong interpretations due to the vagueness or ambiguity typical of source questions. Interactive Question Answering (iQA) attempts to solve this problem through mixed-initiative dialog: the system interacts with the user and helps him disambiguate the question so as to focus on the suitable answer. In this paper, we discuss an integrated ontology-driven approach to iQA that resulted in a platform called REQUIRE (Robust Empirical QUestion answering for Intelligent Retrieval) [3].

R. Meersman, Z. Tari, P. Herrero et al. (Eds.): OTM 2007 Ws, Part I, LNCS 4805, pp. 544–553, 2007. © Springer-Verlag Berlin Heidelberg 2007


In REQUIRE, the dialog is managed as a finite-state process, initiated by the user question and culminating in the system's final answer. The intermediate states deal with temporary situations needed to resolve ambiguous states of affairs that prevent producing a unique answer: these are the clarification, explanation and disambiguation states. A typical interaction supported by REQUIRE is reported in Fig. 1. The example shows the processing of a vague question (U1), where the agent's reply (A1) is a clarification question. The user's reply (U2) confirms the suggested sub-topic and allows the agent to propose further suggestions (A2). The user's response (U3) is an explanation question: he does not know the topic introduced by the agent and asks for information about it. The agent provides the topic definition and re-formulates the question (A3). The user then accepts (U4) and receives the final response (A4). In order to support the above

U1: How can I use a long-term effect contraceptive method?
A1: Are you interested in artificial contraceptive methods?
U2: Yes, I am.
A2: Do you want to speak about under-skin implants, and about their prescription in particular?
U3: What is an under-skin implant?
A3: Under-skin implants are long-effect contraceptive methods. They are plastic capsules inserted under the skin through a cut. Do you still want to speak about the prescription of under-skin implants?
U4: Why not?
A4: A doctor, or a sanitary assistant, has to insert and remove the under-skin implant ...

Fig. 1. Dialog example with the system in explanation state

process, REQUIRE makes explicit use of knowledge about the domain and the target QA task. As a consequence, its iQA paradigm results in a knowledge-based approach to all the involved tasks, such as NL question interpretation, dialog planning, clarification and NL generation. The REQUIRE ontology comprises two major components. A World Model describes the reference world in which each dialog is immersed: its concepts and relationships are essential for inferences such as the interpretation of input utterances, and the ontology offers their rigorous dictionary. A Task Model provides a structured representation of the answers as a hierarchy of different topics. In the rest of the paper we first introduce the overall view of the proposed ontological model for iQA (Section 2). In Section 3 we discuss the role of the individual ontological components and their integrated use in the overall dialog process. Section 4 reports performance measures of the REQUIRE dialog system that constitute an indirect evaluation of the proposed knowledge model.
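The finite-state dialog management described above can be sketched as a transition table. This is a minimal illustration only: the state and speech-act names are hypothetical, not taken from REQUIRE's actual implementation.

```python
# Hypothetical sketch of a REQUIRE-style finite-state dialog loop.
# State and speech-act names are illustrative, not the system's own.
from enum import Enum, auto

class State(Enum):
    QUESTION = auto()       # initial user question
    CLARIFICATION = auto()  # system asks about a candidate topic
    EXPLANATION = auto()    # system defines an unknown concept
    DISAMBIGUATION = auto() # system resolves an ambiguous term
    ANSWER = auto()         # final certified answer returned

# Transition table: (current state, recognized user act) -> next state
TRANSITIONS = {
    (State.QUESTION, "ambiguous"):      State.DISAMBIGUATION,
    (State.QUESTION, "vague"):          State.CLARIFICATION,
    (State.QUESTION, "clear"):          State.ANSWER,
    (State.CLARIFICATION, "confirm"):   State.CLARIFICATION,
    (State.CLARIFICATION, "what_is"):   State.EXPLANATION,
    (State.CLARIFICATION, "resolved"):  State.ANSWER,
    (State.EXPLANATION, "confirm"):     State.CLARIFICATION,
    (State.DISAMBIGUATION, "resolved"): State.CLARIFICATION,
}

def run_dialog(acts):
    """Replay a sequence of recognized speech acts; return visited states."""
    state, trace = State.QUESTION, [State.QUESTION]
    for act in acts:
        state = TRANSITIONS[(state, act)]
        trace.append(state)
    return trace

# The Fig. 1 session: vague question, clarification, explanation request,
# acceptance, and finally the certified answer.
trace = run_dialog(["vague", "confirm", "what_is", "confirm", "resolved"])
```

Replaying the Fig. 1 session this way passes through a clarification and an explanation state before reaching the final answer.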

2 Overview of the REQUIRE Ontology

The role of knowledge in an iQA system is to support natural language interpretation of the user input and dialog control, dynamically based on the retrieval results. The relevant information here concerns the involved natural language, the underlying knowledge domain, and the nature and properties of the available answers. This knowledge is stored in specific and independent repositories, or can be inferred from them: a lexical knowledge base (LKB), a world model (WM) and a task model (TM), respectively, designed as reported in Fig. 2.

Fig. 2. Structure of ontologies in REQUIRE
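The three-repository architecture behind a single unifying layer can be sketched as follows; all class and field names are invented for illustration and do not reflect REQUIRE's actual interfaces.

```python
# Illustrative sketch of the three REQUIRE knowledge repositories (LKB,
# WM, TM) behind one "unifying logic layer". All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class KnowledgeLayer:
    lkb: dict = field(default_factory=dict)    # word sense -> lemma/aliases
    wm: dict = field(default_factory=dict)     # domain concept -> definition
    tm: dict = field(default_factory=dict)     # topic -> certified answers
    links: dict = field(default_factory=dict)  # word sense -> WM concept

    def concepts_for(self, word_senses):
        """Cross-reference lexical items into domain concepts (LKB -> WM)."""
        return {self.links[s] for s in word_senses if s in self.links}

kl = KnowledgeLayer(
    lkb={"phase#1": "phase"},
    wm={"MenstrualCycle": "Cyclic changes in the ovary ..."},
    links={"phase#1": "MenstrualCycle"},
)
```

The `links` table stands in for the lexicon-to-ontology mapping that the unifying layer computes; the point is only that the three stores stay independent while one interface crosses between them.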

The domain and the task in REQUIRE are represented by two independent ontologies, described according to the standard OWL formalism (www.w3.org/TR/owl-features/) [4]. However, users refer to the domain during the dialog through NL, so alternative ways to refer to the WM are possible. A specific Lexical Knowledge Base, highly correlated with the WM, is thus foreseen. Its aim is to express the common linguistic ways of referring to the domain, which differ from (when they do not outright diverge from) the WM definitions. The WM can be seen here as a specialized domain thesaurus, and there is no guarantee that the user, in dialog, is able to refer to its concepts properly: linguistic reasoning is thus needed, and is carried out according to the LKB content. The three resources are integrated in that they are all expressed in the same KR formalism while, at the same time, specific inferences that cross-reference the concepts of the individual KBs are supported. The unifying logic layer in Fig. 2 explicitly deals with the problem of locating domain concepts from lexical items or, vice versa, selecting the proper linguistic expressions for the available WM conceptualizations. The common RDF/OWL interfaces thus enabled allow NL interpretation and dialog control to be handled flexibly. This architecture provides critical support to the modular design of the overall target iQA system. Ready-made resources are usually available for new applications, although they provide only partial solutions to the iQA requirements. For example, large-scale LKBs like WordNet [5] are publicly available, but they are not well suited to specific domains. Specialized thesauri can also be found as WMs for specific domains, but they are not guaranteed to provide the proper coverage for a target application. The interaction between a WM and a lexicon (e.g. WordNet) allows the inherent gap between the user language and the ontology to be filled without relying on too strict a hypothesis about the completeness of the latter. The availability of large resources based on RDF/OWL (including widely available reasoning tools) augments the applicability of the approach for the variety of required inferences: NL interpretation and generation, dialog planning and control.

2.1 The World Model

The role of the World Model is to provide the encyclopedic knowledge regarding the application domain through a specialized dictionary and the relevant relations, at least subsumption among concepts. The first prototype of the REQUIRE system realizes a sexual health information service. In this case the thesaurus needs to precisely define the dictionary and jargon of a medical consultant. We adopted MeSH (http://www.nlm.nih.gov/mesh/), a large-scale medical taxonomy created by the National Library of Medicine.

[Figure content: the MeSH concept Menstrual Cycle, with its definition ("The cyclic cellular, histological, and functional changes in the OVARY during the MENSTRUAL CYCLE in response to the changing endocrine environment."), the related entries Endometrial Cycle and Ovarian Cycle, and the conceptualized word senses phase:14425301 and organic_process:12762272.]

Fig. 3. An example of MeSH concept represented in RDF/OWL
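The "conceptualized in" mapping illustrated by Fig. 3, with its preference scores (discussed in Section 3), can be sketched as below. The sense inventory and the score values are invented for illustration; only the idea of maximizing a preference score over a noun's senses comes from the paper.

```python
# Sketch of the "conceptualized in" relation with preference scores
# alpha_lC. The senses and scores here are invented for illustration.
CONCEPTUALIZED_IN = {
    # (word sense, WM concept) -> preference score alpha
    ("phase#organic_process", "MenstrualCycle"): 0.81,
    ("phase#stage", "MenstrualCycle"): 0.42,
    ("phase#stage", "DrugAdministration"): 0.17,
}

def interpret(noun_senses):
    """Pick the WM concept reachable from the noun's senses that
    maximizes the preference score alpha."""
    candidates = [
        (alpha, concept)
        for (sense, concept), alpha in CONCEPTUALIZED_IN.items()
        if sense in noun_senses
    ]
    return max(candidates)[1] if candidates else None

# Disambiguating the noun "phase" given its two lexical senses:
best = interpret({"phase#organic_process", "phase#stage"})
```

Here the sense with the highest score pulls the interpretation toward the MeSH concept, mirroring the selection by maximal α described in Section 3.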

The availability of a description of each concept, guaranteed by MeSH, is a further opportunity for the offered iQA service. The user can in fact also access complementary knowledge made available by these definitions. He can ask for explanations about concepts already introduced in the discourse, as in the example of Fig. 1: in turn U3, the user obtains this information via the definition returned by the system. The WM also plays a role in the resolution of anaphoric references as they emerge from the dialog: later references to a previously introduced discourse focus (e.g. menstrual phase) are in general partial (e.g. phase), and matching them along a dialog session is a problematic issue. MeSH concepts (e.g. Menstrual Cycle) are linked to topics (e.g. senses like menstrual phase) as these latter are introduced through NL interpretation; this information helps to resolve later references to them (phase). A thorough discussion of the WM in this process is reported in Section 3.
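The concept-proximity idea behind this anaphora resolution can be sketched with a toy taxonomy: a later partial mention is resolved to the previously mentioned topic whose concept is closest in the hierarchy. The taxonomy, distance measure and names below are invented; the actual mechanism, via the "conceptualized in" relation, is detailed in Section 3.

```python
# Hedged sketch of anaphora resolution via taxonomic proximity.
# The toy taxonomy below is invented for illustration.
PARENTS = {
    "MenstrualCycle": "OrganicProcess",
    "OvarianCycle": "MenstrualCycle",
    "SkinImplant": "ContraceptiveDevice",
    "ContraceptiveDevice": "MedicalDevice",
}

def ancestors(c):
    out = [c]
    while c in PARENTS:
        c = PARENTS[c]
        out.append(c)
    return out

def distance(a, b):
    """Tree distance through the lowest common ancestor (inf if unrelated)."""
    aa, bb = ancestors(a), ancestors(b)
    for i, x in enumerate(aa):
        if x in bb:
            return i + bb.index(x)
    return float("inf")

def resolve(mention_concept, discourse_topics):
    """Pick the previously mentioned topic closest to the new mention."""
    return min(discourse_topics, key=lambda t: distance(mention_concept, t))

# A partial mention interpreted as MenstrualCycle is resolved against
# the topics already active in the discourse:
ref = resolve("MenstrualCycle", ["SkinImplant", "OvarianCycle"])
```

The unrelated contraception topic is pruned away and the cycle-related antecedent wins, which is the behavior the paper describes for references like "phase".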

2.2 The Task Model

The Task Model includes knowledge about the answers. General topics are defined to group the different certified answers made available by the application.


The grouping of answers into topics provides a good trade-off, as it helps maximize the expressiveness of system replies in ambiguous states of affairs: in the current implementation, a ranked list of candidates is returned by a traditional IR engine (http://lucene.apache.org/), and a large set of candidate answers is mistakenly retrieved. As previously exemplified (see A1 or A2 in Fig. 1), clarification questions are meaningful because they are oriented towards evaluating the user's interest in general topics rather than in individual candidate answers: such topics are in fact far fewer than the answers. While it is certainly true that structuring the IR output in an open domain is not always possible a priori, as the overall set of topics is unknown (although clustering technologies could be applied at run time, e.g. as in [6]), in a large number of domain-specific systems the set of answers is closed: in all such application scenarios, certified information is required to refer to a large, but known, set of possible answers. In these cases, an a priori grouping of certified answers into topics and the corresponding organization of topics into a hierarchy is possible. We call this information, derivable a priori, the Task Model. In the TM, each concept is related to a group of answers dealing with the same (or closely related) topics, and different topics are separated into different branches of the TM: eliciting this information through dialog allows the system to converge on the suitable answer by pruning irrelevant candidates. In Fig. 4, a small excerpt of the task model for the sexual health domain is reported. It represents the topic "AIDS" as a sub-concept of "sexually transmitted pathologies". The TM ontology represents the important areas of the domain along two semantic dimensions: areas (i.e. clusters of answers) via the Mconcept notion, and individual responses (i.e. certified answers) via the notion of Mresponse. The hasResponse relation links the former with the latter. One Mconcept usually corresponds to many answers, as they provide information about its different aspects: for example, information about infection (i.e. contagio) by AIDS corresponds to the Mresponse r_441; users asking about infection by AIDS will receive the information in the statement property of Mresponse r_441. Different aspects of an Mconcept are described by different Mresponses. The TM has 346 classes representing different Mconcepts in the sexual health area, and 749 Mresponses. Notice that the TM also supports Natural Language Generation (NLG). An Mresponse is also morphologically characterized (property morphology), as it triggers the generation of NL sentences during dialog turns. NLG is based on textual grammars, encoded in the TM via NL templates, as proposed in [7]. NL templates are made available for each Mresponse through the hasTemplate property. Given the dialog status and the involved Mresponse, the related NL templates allow the clarification and explanation utterances to be compiled efficiently. This mechanism is further discussed in Section 3. In this way, the TM hierarchical structure drives the planning of the best dialog strategy triggered by an initial user question. Details on TM-based planning are given in Section 3.
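The Mconcept/Mresponse organization and the template-based generation it supports can be illustrated with a small hypothetical sketch. The identifier r_441 and the infection example come from the paper; the field names, template text and English statement are assumptions for illustration.

```python
# Illustrative Task Model fragment: Mconcepts group certified Mresponses,
# each carrying an NL template used for generation. Field names assumed.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Mresponse:
    rid: str
    prop: str        # aspect of the topic, e.g. "infection" (contagio)
    statement: str   # the certified answer text
    template: str    # NL template with slots for generation

@dataclass
class Mconcept:
    label: str
    parent: Optional[str]
    responses: list  # the hasResponse relation

aids = Mconcept(
    label="AIDS",
    parent="sexually transmitted pathologies",
    responses=[
        Mresponse(
            rid="r_441",
            prop="infection",
            statement="Infection occurs through sexual intercourse.",
            template="Do you want to speak about {prop} of {label}?",
        )
    ],
)

def clarification_turn(concept, resp):
    """Slot-fill the Mresponse template into a clarification question."""
    return resp.template.format(prop=resp.prop, label=concept.label)

turn = clarification_turn(aids, aids.responses[0])
```

Slot filling over such a template produces the sentence skeleton to which the morphology-driven transformations described in Section 3 would then apply.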


[Figure content: the Mconcept "AIDS", sub-concept of "malattie a trasmissione sessuale o veneree" (sexually transmitted or venereal diseases), with the Mresponse property "contagio" (infection), statement "Il contagio avviene attraverso i rapporti sessuali." ("Infection occurs through sexual intercourse.") and morphology "il-mas.sing".]

Fig. 4. An example of concept represented in RDF/OWL in the task model

3 Using Ontological Knowledge for iQA

We previously showed which ontological components are used by REQUIRE. Here we discuss how they are employed in the main dialog states of REQUIRE. The Disambiguation and Explanation states require access to the World Model in order to find correct interpretations of ambiguous user utterances. The Task Model is the main resource supporting the Clarification state, in two specific respects: discourse planning and natural language generation.

NL interpretation through the World Model. During dialog, inappropriate uses of terms are very common and must be managed in order to detect the intended ontology concepts. Inaccurate term usage often hides the semantics implicitly intended by the user. In general, although ontologies capture aspects of the semantics that are somewhat independent of linguistic information, most text processing applications (as in iQA) require an explicit mapping between domain concepts and their textual counterparts. The problem is: which nouns and verbs are used to communicate a given conceptual notion or event? This lexical semantic information must be made available to the ontological description, especially with respect to potentially useful paradigmatic properties. A type e in an ontology is a specific concept and, through inheritance, it may be connected with several (topological) properties in the hierarchy. However, this semantic dimension is independent of the linguistic properties (e.g. the rules needed for detecting potential realizations of e in textual material). Linguistic rules usually include a combination of syntagmatic constraints S and semantic constraints M that a text t must satisfy in order to realize e. The separation between the WM and the LKB is preserved, and a


many-to-many mapping between word senses (in the LKB) and ontological concepts (in the WM) is automatically computed. The integration of lexical and ontological knowledge is thus carried out automatically, according to quantitative models of semantic similarity [8] and ontology learning [9] discussed in previous works. The LKB word senses are mapped (when possible) to the specific concepts C of the WM through a specific bidirectional relation called conceptualize, which connects ontology concepts C to word senses l (as in C "conceptualizes" l) or vice versa (l "conceptualized in" C). In Fig. 3, we see how the MeSH concept Menstrual Cycle conceptualizes the word senses phase and organic process, together with their aliases. Each mapping pair ⟨l, C⟩ in the conceptualized in relation is characterized by a preference score, denoted by α_lC: given an ambiguous noun n and its lexical senses l(n), the best interpretation C reachable through the pairs ⟨l(n), C⟩ in the ontology can be selected by maximizing the scores α_lC. As the acquisition of the "conceptualized in" mapping (the pairs ⟨l, C⟩ and the scores α_lC) is carried out automatically (details on the automatic acquisition of the lexicon-ontology mapping and its preferences are given in [9]), the above process is largely applicable to ontologies of different domains. The conceptualized in relation is also useful in the resolution of anaphoric references. When an ambiguous linguistic item t is introduced in the dialog, the detection of the proper antecedent t' in the discourse is problematic. The item t and a candidate antecedent t' (i.e. a previously mentioned topic) can be mapped to concepts C and C', respectively. Anaphoric resolution of t is thus done by choosing the antecedent t' whose concept C' is the closest to the interpretation C of t: the topic t' is selected as the proper referent of t and the dialog can proceed. In this sense, the World Model supports both NL input interpretation and dialog control.

Explanation. Explanation is triggered by topics t unknown to the user, as for example suggested in turn A2 of Fig. 1. The user may ask about them (as in U3) and the system switches to an Explanation state. In this case, the encyclopedic knowledge in the WM is needed to provide the topic definition (A3). Locating the proper concept evoked by t requires the disambiguation of potential conflicts, and the conceptualization algorithm discussed above is triggered: the concept C maximizing the score with t is selected, and its definition from the WM allows the user to proceed meaningfully in the dialog.

Using the Task Model for Clarification. The Clarification state is the main locus of dialog control. Detecting the best answer is carried out through a chain of clarification questions, such as turn A2 in Fig. 1: this involves the planning and NL generation components, both strictly dependent on the task model. Selecting the best answer in iQA consists in deciding how the dialog will proceed during a session triggered by an initial user question. The many candidate answers returned by the IR engine form the answer list AL. The


system uses a plan to decide which interactions are useful to focus on the subset of relevant ones. The dialog in fact aims to fully prune the irrelevant responses from AL, possibly triggered by the user's original question. Individual turns are planned to reach the correct response (i.e. a node in the Task Model) while minimizing the number of interactions. A specific module, the Planner, navigates the TM hierarchy and organizes turns into a plan. In REQUIRE, a probabilistic model of the answer distribution P(AL) is maintained. The best plan, i.e. the most informative chain of clarification questions, is derived by maximizing the information gain at each turn. The shortest chain of clarification questions is the one that most efficiently minimizes the overall entropy: it is selected as the best plan at each step [3]. Natural Language Generation is needed at each clarification step, and it is controlled by the TM itself. Every concept in the task model is assigned a linguistic pattern, also called a template class, as in [7]: it expresses the form of a clarification reply suitable for a given Mresponse. Templates are activated by the dialog state according to the current context, involving either a TM concept, in clarification, or a WM node, in disambiguation. The NL output of the system is derived through slot filling of a template class, according to the involved Mresponse node in the TM. During slot filling, the information about that node (e.g. its label) is employed. Slot filling produces a sentence skeleton, onto which linguistic transformations are applied: the compilation of linguistically well-formed prepositional phrases describing specific properties of the Mresponse (e.g. the property morphology) is thus carried out, using the nouns characterizing the property label of the Mresponse concept in the TM.
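The entropy-driven planning step can be sketched as follows. The answer distribution and topic grouping are invented; the actual REQUIRE planner is described in [3]. The sketch only shows why a topic question that splits the candidate mass evenly yields a higher information gain than one covering a marginal candidate.

```python
# Sketch of entropy-minimizing clarification planning: each candidate
# yes/no topic question splits the answer distribution P(AL), and the
# question with the highest information gain is asked first.
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def information_gain(p_answers, topic_members):
    """Expected entropy reduction from a yes/no question about a topic."""
    p_yes = sum(p_answers[a] for a in topic_members)
    p_no = 1.0 - p_yes
    h = entropy(p_answers.values())

    def cond(members, mass):
        # entropy of the renormalized distribution after the reply
        if mass == 0:
            return 0.0
        return entropy([p_answers[a] / mass for a in members])

    yes_h = cond([a for a in p_answers if a in topic_members], p_yes)
    no_h = cond([a for a in p_answers if a not in topic_members], p_no)
    return h - (p_yes * yes_h + p_no * no_h)

# Four candidate answers after retrieval (invented probabilities):
P = {"a1": 0.4, "a2": 0.3, "a3": 0.2, "a4": 0.1}
even_split = information_gain(P, {"a1", "a3"})  # splits mass 0.6 / 0.4
skewed = information_gain(P, {"a4"})            # splits mass 0.1 / 0.9
```

A planner greedily choosing the highest-gain topic at each turn prunes AL fastest, which is the sense in which the shortest clarification chain minimizes the overall entropy.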

4 Evaluation

An evaluation test was conducted according to user satisfaction metrics through a Web-based survey, as also suggested by [10]. The following evaluation criteria were studied:

- Topic understanding and Meaningful interaction: the quality of the system behavior according to the utility of each generated turn.
- Topic coverage and Contextual appropriateness: the ability to produce clear turns consistent with the dialog progress, as proposed in [11].
- Interaction/Dialog quality: the overall quality of the system-generated sentences.
- Ease of use: the usability of the system, i.e. the friendliness perceived by the user.

Users were asked to score these criteria with respect to one entire dialog session, except for Contextual Appropriateness, which is reported on a per-turn basis; the reported score in that case is the average across the set of all turns. These metrics allow us to measure dialog user satisfaction and, indirectly, the impact of the ontology model. Fig. 5 reports the aggregate results of the subjective scores for four different areas. The class disease, i.e. a well-specified topic, achieves the best scores across the different criteria. It was, in fact, the inspiring area of the entire service, targeted to provide information about possible sexual diseases, poorly known among young people. On the contrary, the class sexuality, which is


Fig. 5. Perceived Accuracy by the user according to different question types

much more vague and ambiguous, has the lowest values in all cases. An analysis of the sessions in this area showed that user questions were very general and did not correspond to any available question in the TM. It is interesting to notice that the best values for all classes are obtained on the Dialog Quality measure, which emphasizes the good perceived quality of the NL Generator component. In the scope of this paper we are mostly interested in Contextual Appropriateness, which measures the quality of each individual turn. Here, the best value is achieved for the anatomy topic and not for diseases. It is worth noticing that this class is well represented in the WM (i.e. MeSH) but poorly covered by the task model (the target service was not assumed to give information about anatomy, but rather to concentrate on consulting about diseases and good practices). This suggests that even when no final response can be obtained (as anatomy is almost absent from the TM), individual turns across a whole dialog session still keep the interaction within acceptable levels of quality and consistency: the average accuracy of the individual turns is in line with user expectations of a naturally flowing communication. This is the main contribution of the overall ontological modeling.

5 Conclusion

The ontological model discussed in this paper is a flexible and widely applicable approach to the development of knowledge-based interactive Question Answering systems. It aims to support a clear separation between the domain and control knowledge interleaved in dialog, thus allowing maximal reuse of existing customer or public domain resources. The model has been used to develop an iQA platform, REQUIRE, on top of widely available resources such as WordNet and MeSH. Thanks to this consistent reuse of existing resources, the design and development effort required to deploy the final system has been limited. The high level of usability measured by the Web-based survey


is a rather good result. The perceived naturalness of the REQUIRE dialogs also compares well with the AquaLog [2] evaluation, where linguistic failures affected the global performance, yielding 48.69% of questions correctly handled, against our 64.4% of dialogs successfully concluded. We attribute most of the positive aspects of the system to its underlying knowledge organization and to its use of large-scale lexical resources within a more traditional dialog architecture. iQA is used here as an indirect way of evaluating a complex ontology. However, as it can also be seen as a paradigm for accessing and navigating large-scale knowledge resources, we plan to apply the proposed architecture to the development of human-oriented interfaces to ontologies.

References

1. Hirschman, L., Gaizauskas, R.: Natural language question answering: The view from here. Journal of Natural Language Engineering 7(4), 275–300 (2001)
2. Lopez, V., Sabou, M., Motta, E.: PowerMap: Mapping the real semantic web on the fly. In: Cruz, I.F., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 414–427. Springer, Heidelberg (2006)
3. Basili, R., De Cao, D., Giannone, C., Marocco, P.: Data-driven dialogue for interactive question answering. In: Pazienza, M.T., Basili, R. (eds.) AI*IA 2007: Advances in Artificial Intelligence, 10th Congress of the Italian Association for Artificial Intelligence. Springer, Heidelberg (2007)
4. Antoniou, G., van Harmelen, F.: Web ontology language: OWL. In: Studer, R., Staab, S. (eds.) Handbook on Ontologies. Springer, Berlin (2004)
5. Miller, G.A.: Nouns in WordNet: a lexical inheritance system. International Journal of Lexicography 3(4), 245–264 (1990)
6. Spaccapietra, S., Atzeni, P., Fages, F., Hacid, M.-S., Kifer, M., Mylopoulos, J., Pernici, B., Shvaiko, P., Trujillo, J., Zaihrayeu, I. (eds.): Journal on Data Semantics VIII. LNCS, vol. 4380. Springer, Heidelberg (2007)
7. Stent, A., Dowding, J., Gawron, J., Bratt, E.O., Moore, R.: The CommandTalk spoken dialogue system. In: Proceedings of ACL (1999)
8. Basili, R., Pennacchiotti, M., Zanzotto, F.M.: Language learning and ontology engineering: an integrated model for the semantic web. In: Proceedings of the 2nd Meaning Workshop, Trento, Italy (2005)
9. Basili, R., Vindigni, M., Zanzotto, F.M.: Integrating ontological and linguistic knowledge for conceptual information extraction. In: Proceedings of the IEEE/WIC WI2003 Conference on Web Intelligence (2003)
10. Harabagiu, S.M., Hickl, A., Lehmann, J., Moldovan, D.I.: Experiments with interactive question-answering. In: Proceedings of the Meeting of the Association for Computational Linguistics (2005)
11. Danieli, M., Gerbino, E.: Metrics for evaluating dialogue strategies in a spoken language system. In: Working Notes of the AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation (1995)