Web-based Anaphora Resolution for the QUASAR Question Answering System
Davide Buscaldi, Yassine Benajiba, Paolo Rosso and Emilio Sanchis
Dpto. de Sistemas Informáticos y Computación (DSIC), Universidad Politécnica de Valencia, Spain
{dbuscaldi,ybenajiba,prosso,esanchis}@dsic.upv.es
Abstract. This paper describes the work done by the RFIA group at the Departamento de Sistemas Informáticos y Computación of the Universidad Politécnica de Valencia for the 2007 edition of the CLEF Question Answering task. We participated in the Spanish monolingual task only. A series of technical difficulties prevented us from completing all the tasks we had subscribed to. Our 2006 system was modified to comply with the 2007 guidelines, in particular with regard to anaphora resolution, which we tackled with a web-based anaphora resolution module.
1 Introduction
QUASAR (QUestion AnSwering And Retrieval) is the name we gave to our Question Answering (QA) system. It is based on the JIRS Passage Retrieval (PR) system [1], which is specifically oriented to this task. JIRS does not use any knowledge about the lexicon or the syntax of the target language, and can therefore be considered a language-independent PR system. The system we used this year differs slightly from the one used in 2006. Its major improvement is the addition of an Anaphora Resolution module, introduced to comply with the guidelines of CLEF QA 2007. As shown in [2], correct anaphora resolution is crucial and improves accuracy by more than 10% with respect to a system that implements no anaphora resolution method. The web is an important resource for QA [3] and has already been used to resolve anaphora in texts [4,5]. We took these works into account when building our web-based Anaphora Resolution module. In the next section we briefly describe the structure of our QA system, with particular emphasis on the new Anaphora Resolution module. In Section 3 we discuss the results of QUASAR in the 2007 CLEF QA task.
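To make the n-gram approach concrete, the following minimal Python sketch ranks passages by the question n-grams they share with the query, rewarding longer matches more strongly. It is only a simplified illustration of the density-based n-gram matching idea behind JIRS [1], not the actual JIRS weighting model; all identifiers in it are ours.

    # Minimal sketch of n-gram-based passage ranking in the spirit of JIRS.
    # Simplified illustration only: the real JIRS model rewards the longest
    # question n-grams found in a passage and also weights individual terms.

    def ngrams(tokens, n):
        """All contiguous n-grams of a token list, as a set of tuples."""
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def passage_score(question, passage, max_n=3):
        """Score a passage by the question n-grams it contains; longer
        shared n-grams receive a quadratically higher reward."""
        q, p = question.lower().split(), passage.lower().split()
        score = 0.0
        for n in range(1, min(max_n, len(q)) + 1):
            score += (n ** 2) * len(ngrams(q, n) & ngrams(p, n))
        return score

    question = "aforo del Estadio Santiago Bernabeu"
    passages = ["El aforo del Estadio Santiago Bernabeu supera los 80.000.",
                "Madrid es la capital de España."]
    print(max(passages, key=lambda p: passage_score(question, p)))

Note that no lexical or syntactic analysis of the target language is involved: ranking relies purely on surface n-gram overlap, which is what makes this kind of PR language-independent.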
2 System Architecture
In Fig. 1 we show the architecture of the system used by our group at CLEF QA 2007. The user question is first examined by the Anaphora Resolver (AR).
Fig. 1. Diagram of the QA system. (Figure labels: Input Question; Question Analysis: Question Analyzer, Question Classifier; Passage Retrieval: Search Engine (JIRS); Answer Extraction: TextCrawler, question/passage n-gram extraction, n-gram comparison, filter, answer selection; Answer.)
The AR passes the question to QUASAR in order to obtain the answer, which is then used for anaphora resolution in the following questions of the same group (i.e., questions on the same topic). The AR hands the possibly reformulated question over to the Question Analysis module, which is composed of a Question Analyzer, which extracts constraints to be used in the answer extraction phase, and a Question Classifier, which determines the class of the input question. At the same time, the question is passed to the Passage Retrieval module, which generates the passages used by the Answer Extraction (AE) module, together with the information collected in the question analysis phase, to extract the final answer. A detailed description of each module is beyond the scope of this paper; we describe in detail only the new AR module we added to the QUASAR system [6] in order to comply with the CLEF 2007 guidelines.
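The following sketch summarizes the control flow of Fig. 1 for one question group. All function names, signatures and stub bodies are illustrative assumptions chosen for readability; they are not the actual QUASAR interfaces.

    # Hedged sketch of the Fig. 1 control flow for one question group.
    # Every function here is an illustrative stub, not the real module.

    def resolve_anaphora(question, previous_qa):      # AR module (Sect. 2.1)
        return question                               # stub: pass-through

    def analyze(question):                            # Question Analyzer
        return {"constraints": []}                    # stub: no constraints

    def classify(question):                           # Question Classifier
        return "FACTOID"                              # stub: fixed class

    def retrieve_passages(question):                  # JIRS search engine
        return ["some passage", "another passage"]    # stub: canned passages

    def extract_answer(passages, analysis, q_class):  # Answer Extraction
        return passages[0] if passages else "NIL"     # stub: first passage

    def answer_group(questions):
        """Process one CLEF question group, feeding each answer back to
        the anaphora resolver so it can reformulate later questions."""
        previous_qa = []
        for q in questions:
            q = resolve_anaphora(q, previous_qa)
            analysis, q_class = analyze(q), classify(q)
            passages = retrieve_passages(q)
            a = extract_answer(passages, analysis, q_class)
            previous_qa.append((q, a))
        return previous_qa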
2.1 Anaphora Resolution Module
The anaphora resolution module carries out its work in five basic steps:

– Step 1: Using patterns, the module identifies the entity names and the temporal and numerical expressions occurring in the first question of a group.

– Step 2: For each question of the group, the module expands the entity names that occurred in the first question (which is the most likely to contain the target) and occur only partially in the others. For instance, if the first question was "Cuál era el aforo del Estadio Santiago Bernabéu en los años 80?" ("What was the capacity of the Santiago Bernabeu stadium in the 80s?") and another question within the same group was "Quién es el dueño del estadio?" ("Who is the owner of the stadium?"), "estadio" in the second question is replaced by "Estadio Santiago Bernabéu".

– Step 3: In this step the module focuses on pronominal anaphora and uses web counts to decide whether to replace the anaphora with the answer to the previous question or with one of the entity names occurring in the first question of the group (see the sketch after this list). For instance, if the first question was "Cómo se llama la mujer de Bill Gates?" ("What's the name of the wife of Bill Gates?") and the following question was "En qué universidad estudiaba él cuando creó Microsoft?" ("In which university was he studying when he created Microsoft?"), the module would check the web counts of both "Bill Gates creó Microsoft" and "Melinda Gates creó Microsoft" and then replace the anaphora, changing the second question to "En qué universidad estudiaba Bill Gates cuando creó Microsoft?".

– Step 4: The remaining type of anaphora to be resolved is the possessive anaphora. Similarly to the previous step, the module decides how to change the question using web counts. For instance, if we take the same example mentioned in Step 2 and suppose that a third question was "Cuánto dinero se gastó durante su ampliación entre 2001 y 2006?" ("How much money was spent on its enlargement between 2001 and 2006?"), the module would check the web counts of both "ampliación del Estadio Santiago Bernabéu" and "ampliación del Real Madrid Club de Fútbol" and change the question to "Cuánto dinero se gastó durante ampliación del Estadio Santiago Bernabéu entre 2001 y 2006?".

– Step 5: For the questions that have not been changed in any of the previous steps, the module appends to the question the entity name found in the first question.
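The web-count comparison of Steps 3 and 4 can be illustrated with the following minimal Python sketch. The web_count helper and its mock hit counts are hypothetical stand-ins for a search-engine query; the paper does not specify which engine or API was used.

    import re

    # Mock hit counts for illustration only; a real implementation
    # would query a search engine (not specified in this paper).
    MOCK_COUNTS = {
        "Bill Gates creó Microsoft": 50000,
        "Melinda Gates creó Microsoft": 300,
    }

    def web_count(phrase):
        """Hypothetical stand-in for a search-engine hit count."""
        return MOCK_COUNTS.get(phrase, 0)

    def resolve_pronoun(question, pronoun, candidates, context_pattern):
        """Replace `pronoun` in `question` with the candidate entity whose
        combination with the surrounding context is most frequent on the
        web. Candidates are the entities from the first question plus the
        answer to the previous question."""
        best = max(candidates,
                   key=lambda c: web_count(context_pattern.format(c)))
        # Replace only the whole-word pronoun, keeping the rest intact.
        return re.sub(r"\b%s\b" % re.escape(pronoun), best, question,
                      count=1)

    question = "En qué universidad estudiaba él cuando creó Microsoft?"
    print(resolve_pronoun(question, "él",
                          ["Bill Gates", "Melinda Gates"],
                          "{} creó Microsoft"))
    # -> "En qué universidad estudiaba Bill Gates cuando creó Microsoft?"

The same candidate-selection idea applies to Step 4, with the possessive phrase (e.g. "ampliación del {}") taking the place of the verb phrase used here.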
3 Experiments and Results
This year we experienced some technical difficulties, with the result that we were able to submit only one run for the Spanish monolingual task. Moreover, we realized after the submission that JIRS had indexed only a part of the collection, specifically the Wikipedia snapshot. As a result, we obtained an accuracy of only 11.5%. The poor performance is due partly to the fact that our patterns were defined for a collection of news articles rather than for an encyclopedia, and partly to the fact that many questions had their answer in the news collection. For instance, in newspapers definitions are provided as expressions between commas alongside the entity being defined, whereas in an encyclopedia the entity is the title of the article and the definition is given in the article's first paragraph. Moreover, Wikipedia includes templates for some categories of entities that contain most of the relevant information; currently our system is not able to process such templates. We returned 54 NIL answers, more than 25% of the total number of questions, of which only one was correct. The Anaphora Resolution module did not perform particularly well: we obtained only 3.33% accuracy over linked questions, compared to an accuracy of 12.94% over self-contained questions.
4 Conclusions and Further Work
Our experience in the CLEF QA 2007 exercise was disappointing in terms of results; however, we managed to develop a web-based anaphora resolution module and acquired valuable knowledge for our next participation. The first lesson we learnt from this participation is that answers do not appear in the same form in Wikipedia and in the newspaper collections. To address this problem we will need to implement different answer extraction methods, specifically focused on encyclopedias, such as the one proposed in [7]. The second lesson is that the anaphora resolution method needs a deeper analysis of the question structure. We believe that syntactic parsing may improve the performance of this module.
Acknowledgments This work was partially supported by the TIN2006-15265-C06-04 research project.
References

1. Gómez, J.M., Montes-y-Gómez, M., Arnal, E.S., Rosso, P.: A passage retrieval system for multilingual question answering. In: Text, Speech and Dialogue. Volume 3658 of Lecture Notes in Computer Science. Springer (2005) 443–450
2. Vicedo, J.L., Ferrández, A.: Importance of pronominal anaphora resolution in question answering systems. In: ACL '00: Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, Morristown, NJ, USA, Association for Computational Linguistics (2000) 555–562
3. Lin, J.: The web as a resource for question answering: Perspectives and challenges. In: Language Resources and Evaluation Conference (LREC 2002), Las Palmas, Spain (2002)
4. Markert, K., Modjeska, N., Nissim, M.: Using the web for nominal anaphora resolution. In: EACL Workshop on the Computational Treatment of Anaphora, Budapest, Hungary (2003)
5. Bunescu, R.: Associative anaphora resolution: A web-based approach. In: EACL Workshop on the Computational Treatment of Anaphora, Budapest, Hungary (2003)
6. Buscaldi, D., Gómez, J.M., Rosso, P., Sanchis, E.: N-gram vs. keyword-based passage retrieval for question answering. In: Evaluation of Multilingual and Multimodal Information Retrieval. Volume 4730 of Lecture Notes in Computer Science. Springer (2007) 377–384
7. Fissaha, S., Jijkoun, V., de Rijke, M.: Fact discovery in Wikipedia. In: 2007 IEEE/WIC/ACM International Conference on Web Intelligence, ACM (2007)