
Using Syntactic Information for Improving why-Question Answering
Suzan Verberne, Lou Boves, Nelleke Oostdijk, Peter-Arno Coppen
Radboud University Nijmegen

Presenter: Sai Qian

Structure
• Introduction & Related work
• Paragraph retrieval for why-QA
• Answer re-ranking
• Discussion
• Future directions


Introduction & Related work
• 5% of all questions in the QA system are why-questions
• Difference from factoid questions
  – Cannot be stated in a single phrase
  – Paragraph retrieval instead of named entity retrieval

• Improving the QA system
  – Better retrieval technique (search engine)
  – Better ranking system
• Does syntactic knowledge about the relation between question and answer help?


Introduction & Related work
• A substantial amount of work has gone into improving QA systems by adding syntactic information
  – Tiedemann, 2005
  – Quarteroni et al., 2007
  – Higashinaka and Isozaki, 2008

• Syntactic information gives a small but significant improvement on top of the traditional bag-of-words approach


Paragraph retrieval for why-QA
• Baseline system
  – Wumpus Search Engine
  – Question analysis
    • Remove stop words
    • Remove punctuation
    • Remains: set of question content words
  – Ranking: QAP algorithm (passage scoring algorithm)
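The question analysis step of the baseline can be illustrated with a minimal sketch. The stop-word list and the function name `question_content_words` are assumptions made for illustration; the slide does not say which stop-word list the system uses.

```python
import string

# Hypothetical stop-word list for illustration only; the list used in
# the presented system is not given on the slide.
STOP_WORDS = {"why", "do", "does", "did", "is", "are", "was", "were",
              "the", "a", "an", "of", "on", "in", "to", "than"}


def question_content_words(question):
    """Return the content words of a question, in their original order."""
    # Remove punctuation, then lowercase and split on whitespace.
    cleaned = question.translate(str.maketrans("", "", string.punctuation))
    tokens = cleaned.lower().split()
    # Remove stop words; what remains are the question content words.
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(question_content_words("Why do people sneeze?"))
```

The resulting content words would then form the query sent to the search engine.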

Paragraph retrieval for why-QA
• Evaluation
  – Manual assessment
  – Measures: Success@10, Success@150, Mean Reciprocal Rank@150
  – Answer & document retrieval

• Result

• Improvement
  – Retrieval
  – Ranking
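The evaluation measures named on this slide, Success@n and Mean Reciprocal Rank@n, can be sketched as below. The input format (one rank per question, `None` when no relevant answer was retrieved) and the function names are assumptions for illustration. The sample ranks are made up and are not results from the presentation.

```python
def success_at_n(ranks, n):
    """Fraction of questions with a relevant answer in the top n.

    ranks: for each question, the 1-based rank of the first relevant
    answer, or None if no relevant answer was retrieved.
    """
    hits = sum(1 for r in ranks if r is not None and r <= n)
    return hits / len(ranks)


def mrr_at_n(ranks, n):
    """Mean reciprocal rank, counting only answers ranked within the top n.

    Questions without a relevant answer in the top n contribute 0.
    """
    total = sum(1.0 / r for r in ranks if r is not None and r <= n)
    return total / len(ranks)


if __name__ == "__main__":
    # Made-up ranks for four questions, for illustration only.
    ranks = [1, 4, None, 20]
    print(success_at_n(ranks, 10), success_at_n(ranks, 150), mrr_at_n(ranks, 150))
```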


Answer re-ranking
• QAP algorithm (baseline system)
  – Term overlap between query and passage
  – Passage length
  – Total corpus frequency for each term
• Example
  – Why do people sneeze?
  – Why do women live longer than men on average?
  – Why are mountain tops cold?
• The aim: find the syntactic information that discloses a relation between the question and its answer

Answer re-ranking
• Re-ranking system
  – Idea: term overlap
  – Term: a subset of question terms
  – Feature: a set of question items and a set of answer items
  – Proportion:
• Defined features: 32 in total
  – F1: head; F2: modifier; F3: noun phrase;
  – F4: subject; F6: main verb; F10: direct object;
  – ……
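The idea of a feature as an overlap between a set of question items and a set of answer items can be illustrated with a small sketch. The exact proportion formula is not recoverable from the slide, so the definition below (the share of question items that also occur among the answer items) is an assumption, as is the name `overlap_proportion`.

```python
def overlap_proportion(question_items, answer_items):
    """Share of question items that also occur among the answer items.

    ASSUMPTION: this is one plausible overlap proportion, not necessarily
    the formula used in the presented system.
    """
    q = set(question_items)
    a = set(answer_items)
    if not q:
        # No question items for this feature: define the overlap as 0.
        return 0.0
    return len(q & a) / len(q)


if __name__ == "__main__":
    # Example for a "main verb" style feature: the question's main verb
    # against the verbs found in a candidate answer passage.
    print(overlap_proportion({"sneeze"}, {"sneeze", "irritate"}))
```

Each of the 32 features would supply its own pair of item sets, for example question heads against answer heads, or the question's main verb against the answer's verbs.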

Answer re-ranking
• Feature extraction
  – Parser
    • Pelican Parser: more detailed
    • EP4IR Dependency Parser: more robust
  – Lemmatization
    • “sailors of the old”
    • Applied only to verbs
• Re-ranking
  – Scoring: 0-10 for each feature
  – Feature selection: genetic algorithm (optimizes MRR)
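The re-ranking step can be sketched as a weighted sum over feature values. This assumes that the 0-10 value per feature acts as a weight on that feature's overlap score, which the slide does not state explicitly. The feature names, weights, and candidates below are made up, and the genetic algorithm that would choose the weights is not shown.

```python
def rerank(candidates, weights):
    """Sort candidate answers by a weighted sum of their feature values.

    candidates: list of (answer_id, {feature_name: value}) pairs.
    weights: {feature_name: weight}; ASSUMED here to be the 0-10 values
    mentioned on the slide. Features without a weight count as 0.
    """
    def score(features):
        return sum(weights.get(name, 0) * value
                   for name, value in features.items())

    # Highest score first; sorted() is stable, so ties keep input order.
    return sorted(candidates, key=lambda c: score(c[1]), reverse=True)


if __name__ == "__main__":
    # Made-up weights and feature values, for illustration only.
    weights = {"main_verb": 8, "subject": 2}
    candidates = [
        ("a1", {"main_verb": 0.0, "subject": 1.0}),
        ("a2", {"main_verb": 1.0, "subject": 0.0}),
    ]
    print([answer_id for answer_id, _ in rerank(candidates, weights)])
```

A weight search such as the genetic algorithm named on the slide would repeatedly call a function like this with different weight sets and keep the set that yields the highest MRR.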

Answer re-ranking
• Result

• Features that substantially contribute to the ranking score


Discussion
• Error analysis
  – No effect: 35/93
    • 25/35 no relevant answer
    • 10/35 RR=1
  – Improved: 40/93
  – Deteriorated: 18/93
  – 11 drop out of the top 10, 22 enter the top 10


Discussion
• Examples of deteriorated QA pairs
  – Why do neutral atoms have the same number of protons as electrons? (answer in “Oxidation number”)
  – Why do flies walk on food? (answer in “Insect Habitat”)
  – Why is Wisconsin called the Badger State? (answer in “Wisconsin”)
• Reason
  – No lexical overlap between the question focus and the document title
  – Feature 28 & Feature 13

Discussion
• Feature selection analysis
  – QAP: baseline system
  – Cue words: because, since, therefore, in order to, due to……
  – Main verbs: lemmatization leads to more matches
  – Question focus & document title
• Parser comparison
  – Only EP4IR is applied to the answer documents
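The cue-word feature from the feature selection analysis can be illustrated with a short sketch that finds cue phrases in a candidate answer passage. Only the five cue phrases named on the slide are included, since the full list is elided there. The function name `cue_phrases_in` and the matching strategy (case-insensitive, whole words) are assumptions.

```python
import re

# Only the cue phrases named on the slide; the full list is not given.
CUE_PHRASES = ["because", "since", "therefore", "in order to", "due to"]

# Match any cue phrase as a whole word or phrase, ignoring case.
_CUE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(c) for c in CUE_PHRASES) + r")\b",
    re.IGNORECASE,
)


def cue_phrases_in(passage):
    """Return the cue phrases found in a passage, lowercased, in order."""
    return [m.group(0).lower() for m in _CUE_PATTERN.finditer(passage)]


if __name__ == "__main__":
    print(cue_phrases_in("Mountain tops are cold due to the lower air pressure."))
```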

Future directions
• Improving retrieval
• Collecting a larger data collection: improve feature selection
• Investigating extra information for why-questions other than syntactic description
• Improving the EP4IR parser in constituent extraction

The End

Thank you very much!
