Using Information Extraction to Generate Trigger Questions for Academic Writing Support

Ming Liu and Rafael A. Calvo
University of Sydney, Sydney NSW 2006, Australia
Abstract. Automated question generation approaches have been proposed to support reading comprehension. However, these approaches are not suitable for supporting writing activities. We present a novel approach to generate different forms of trigger questions (directive and facilitative) aimed at supporting deep learning. Useful semantic information from Wikipedia articles is extracted and linked to the key phrases in a student's literature review, particularly focusing on extracting information containing three types of relations (kind-of, similar-to and different-to) by using syntactic pattern matching rules. We collected literature reviews from 23 Engineering research students, and evaluated the quality of 306 computer-generated questions and 115 generic questions. Facilitative questions are more useful when it comes to deep learning about the topic, while directive questions are clearer and more useful for improving the composition.

Keywords: Information Extraction, Question Generation, Academic Writing Support
1 Introduction
The purpose of academic writing is to document new knowledge, generally including a review of what is currently known about a given topic [1]. This is the particular focus of the literature review genre, a common activity in advanced undergraduate and postgraduate courses, and necessary for all research students. Afolabi [2] identified some of the most common problems that students have when writing a literature review, including not being sufficiently critical, lacking synthesis and not discriminating between relevant and irrelevant materials. Helping students with these issues is difficult and time-consuming, a significant problem in research methods courses. Automated and semi-automated feedback approaches are being developed to ease the burden. One common form of feedback is questioning the writer about issues in the composition. This is considered an effective method for promoting critical thinking, yet not much is known about how human instructors generate questions or what types of questions are most effective.

In order to find out how human supervisors generate such specific trigger questions, we conducted a large study [3] in an Engineering research methods course and analyzed 125 trigger questions generated by 25 human supervisors to support their research students' literature review writing. In that study, we identified important concept types, such as Research Field, System, Technology and Technical Term, from which the questions were generated.
The aim of the current study is to automatically generate two types of questions (directive and facilitative) from these important concept types. Q1 and Q2 in Example 1 were computer generated to ask the student writer to critically analyze the differences between the Technology concept Principal Component Analysis (PCA) and factor analysis in relation to the writing, while Q1 in Example 2 asks the writer to critically compare PCA with other types of true eigenvector-based multivariate analyses. Q2 in Example 2 triggers reflection on the limitations of PCA.

Example 1
Q1: Have you discussed the differences between PCA and factor analysis in relation to your project? If not, please consider doing so. (Directive)
Q2: What do you think of the differences between PCA and factor analysis in relation to your project? (Facilitative)

Example 2
Q1: Have you compared the advantages and disadvantages of PCA to other types of the true eigenvector-based multivariate analyses in relation to your project? If not, please consider doing so. (Directive)
Q2: One limitation of principal component analysis is that the results of PCA depend on the scaling of the variables. How do you address these issues in your project? (Facilitative)

Another intention of this research is to explore how useful the directive and facilitative strategies shown in the examples above are. Black and William [4] defined directive feedback as feedback that tells the student what needs to be fixed or revised, while facilitative feedback provides comments and suggestions to help guide students in their own revision and conceptualization. Ellis [5] found that teachers chose directive and facilitative strategies at different times to accomplish different purposes. For example, facilitative feedback may help students improve overall essay organization and coherence, while directive feedback may help them to address spelling and grammatical errors or improve sentence structures. However, the impact of these two strategies on our question templates is still unknown. In this study, we evaluated both directive and facilitative questions generated by the system.

The remainder of the paper is organized as follows: Section 2 provides a brief review of the literature focusing on question generation and information extraction. Section 3 describes the linguistic patterns developed. Section 4 briefly describes the question generation process, while Section 5 details the evaluation and results.
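As a rough illustration of how the two strategies differ only in their surface wording, the sketch below fills a directive and a facilitative template with the same extracted entity pair. The class and method names are hypothetical, and the templates are paraphrased from Example 1 rather than taken from the system's actual template set.

// Hypothetical sketch: the same extracted entity pair rendered as a directive
// and as a facilitative trigger question (wording paraphrased from Example 1).
public class TriggerQuestionTemplates {

    // Directive form: tells the writer what to check or revise.
    static String directive(String entity1, String entity2) {
        return String.format(
                "Have you discussed the differences between %s and %s in relation to your project?"
                + " If not, please consider doing so.", entity1, entity2);
    }

    // Facilitative form: invites reflection without prescribing a fix.
    static String facilitative(String entity1, String entity2) {
        return String.format(
                "What do you think of the differences between %s and %s in relation to your project?",
                entity1, entity2);
    }

    public static void main(String[] args) {
        System.out.println(directive("PCA", "factor analysis"));    // cf. Example 1, Q1
        System.out.println(facilitative("PCA", "factor analysis")); // cf. Example 1, Q2
    }
}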
2 Related Work
One of the first automatic question generation (QG) systems proposed for supporting novices learning English was AUTOQUEST [6]. This approach uses simple pattern matching rules to transform a declarative sentence into a question. For example, the pattern S1 (cause) + so that (conjunction) + S2 (effect) can be used to generate a why question.
E.g. Sentence: Jamie had an afternoon nap so that he wouldn't fall asleep at the concert later. Question: Why did Jamie have an afternoon nap? (A minimal sketch of this kind of rule is given at the end of this section.)

Other systems that support reading and language learning include that of Kunichika et al. [7], who proposed a question generation method based on both syntactic and semantic information (Space, Time and Agent) so that it can generate more question types (Where, When and Who). More recently, Mostow and Chen [8] proposed an approach to generate questions based on a situation model, which can generate what, how and why questions (e.g. What did ...? Why/How did ...?). Although these approaches are useful for reading comprehension tasks, they are not suitable for writing support, since it is not useful to ask authors questions about what they have just written, especially while expecting the answer to be the one contained in the same document. Our solution is to extract knowledge from Wikipedia articles that discuss concepts described in the student's writing.

Typically, information extraction is used to identify named entities (e.g. authors and books) and relationships between named entities (e.g. authorX wrote bookY). Most work has focused on supervised methods that identify the named entities and extract their relations [9]. However, these approaches require a manually annotated corpus, which is very time-consuming and laborious to build. Semi-supervised and unsupervised approaches depend on seed patterns or examples of specific types of relations, which are learned by using regular expressions [10]. Comparative expressions in English are diverse and complex. Fiszman et al. [11] concentrated on extracting the comparative relation between two drugs based on a shared attribute in the medical domain. In this study, the information extraction task focuses on extracting other entities that have comparative (similar-to and different-to) or hierarchical (kind-of) relations with a key concept in the student's composition.
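The following is the minimal sketch of an AUTOQUEST-style "so that" rule referred to above. It is a hypothetical illustration of this kind of pattern rule, not AUTOQUEST's actual implementation; in particular, its naive subject/verb handling only covers the sample sentence.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a "S1 (cause) + so that + S2 (effect)" rule that
// produces a why question about S1.
public class SoThatRule {

    // Matches "<subject> had <rest> so that <effect>" and keeps the cause part.
    private static final Pattern SO_THAT =
            Pattern.compile("^(\\w+) had (.+?) so that .+$");

    static String toWhyQuestion(String sentence) {
        String s = sentence.trim().replaceAll("\\.$", "");
        Matcher m = SO_THAT.matcher(s);
        if (!m.matches()) {
            return null; // rule does not apply
        }
        // Naive assumption: "had" in the cause clause becomes "have" in the question.
        return String.format("Why did %s have %s?", m.group(1), m.group(2));
    }

    public static void main(String[] args) {
        System.out.println(toWhyQuestion(
                "Jamie had an afternoon nap so that he wouldn't fall asleep at the concert later."));
        // Output: Why did Jamie have an afternoon nap?
    }
}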
3 Linguistic Patterns Generation
Our training set contains the frequent comparative patterns identified by Fiszman [11] and 52 sentences (each expressing one of the three relations) extracted from 20 Wikipedia articles (one for each composition). After observing common linguistic patterns in this training set, we developed 26 Tregex rules: 5 for the kind-of relation, 10 for different-to and 11 for similar-to. The reason for using Tregex [12] is that it can identify target syntactic elements (e.g. main verbs or subject complements) which match predefined cue phrases. If they are matched, we extract the matched noun phrases (NPs) as entities. For example, the sentence "A extends B" is identified as a kind-of type by detecting that its main verb matches the cue phrase 'extend'. Then the matched NP A in the subject is extracted as Entity1 and the matched NP B in the object as Entity2.
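As a concrete illustration of how such a rule operates, the sketch below applies a Tregex pattern for the 'extend' cue to a hand-built parse tree of the sentence "A extends B". The pattern is an illustrative reconstruction rather than one of the 26 rules, and the Stanford CoreNLP library (which provides Tregex) is assumed to be on the classpath.

import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;

// Illustrative reconstruction of a kind-of rule using the 'extend' cue phrase.
// The parse tree is hand-built so no parser is needed to run the example.
public class KindOfExtendRule {

    public static void main(String[] args) {
        Tree tree = Tree.valueOf(
                "(ROOT (S (NP (NNP A)) (VP (VBZ extends) (NP (NNP B))) (. .)))");

        // The subject NP becomes Entity1 and the object NP becomes Entity2
        // when the main verb matches the cue phrase 'extend'.
        TregexPattern rule = TregexPattern.compile(
                "S < (NP=entity1 $++ (VP < (/^VB/ < /^extend(s|ed)?$/) < NP=entity2))");

        TregexMatcher m = rule.matcher(tree);
        while (m.find()) {
            System.out.println("Entity1: " + words(m.getNode("entity1"))); // A
            System.out.println("Entity2: " + words(m.getNode("entity2"))); // B
        }
    }

    // Concatenate the leaf words under a matched node.
    private static String words(Tree node) {
        StringBuilder sb = new StringBuilder();
        for (Tree leaf : node.getLeaves()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(leaf.value());
        }
        return sb.toString();
    }
}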
3.1 Interpreting kind-of patterns
For kind-of sentences, the frequent linguistic pattern is a subject complement in the form of a possessive case. Table 1 illustrates this frequent pattern, in which the noun phrase (NP) possessed by Entity2 (NP) in the subject complement matches a cue phrase such as kind. Entity1 is the matched NP in the subject, while Entity2 is the possessor in the possessive construction. These linguistic patterns indicate the necessary linguistic units: {BE} means some form of be, such as is or am, or an equivalent such as belongs to, while a slash indicates disjunction; {Prep} means a preposition, such as to or of. From the example in Table 1, we extracted feature extraction as Entity1 and dimensionality reduction as Entity2 with a kind-of relation.

Table 1: The frequent pattern in kind-of relation type sentences
Name: The subject complement in the form of a possessive case
Frequent Pattern: Entity1 {BE} kind/family/form/generalization/class/example/extension {Prep} Entity2
Example: In pattern recognition and in image processing, feature extraction is a special form of dimensionality reduction.
Tregex Rule: S < NP=entity1 < (VP
Other patterns:
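Purely to illustrate how a pattern of this shape can be expressed and matched, the sketch below runs a hypothetical Tregex pattern, written for this example only and not the rule from Table 1, over a hand-built parse of the example sentence; Entity1 and Entity2 come out as "feature extraction" and "dimensionality reduction", as described above. Stanford CoreNLP is again assumed to be on the classpath.

import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;

// Hypothetical companion to Table 1: a subject-complement kind-of pattern in which
// the complement NP contains a cue noun (kind/form/class/...) followed by a PP
// that holds Entity2.
public class KindOfSubjectComplementRule {

    public static void main(String[] args) {
        // Hand-built parse of the Table 1 example sentence.
        Tree tree = Tree.valueOf(
                "(ROOT (S (PP (PP (IN In) (NP (NN pattern) (NN recognition)))"
              + " (CC and) (PP (IN in) (NP (NN image) (NN processing)))) (, ,)"
              + " (NP (NN feature) (NN extraction))"
              + " (VP (VBZ is) (NP (NP (DT a) (JJ special) (NN form))"
              + " (PP (IN of) (NP (NN dimensionality) (NN reduction))))) (. .)))");

        TregexPattern rule = TregexPattern.compile(
                "S < (NP=entity1 $++ (VP < (/^VB/ < /^(is|are|was|were)$/)"
              + " < (NP < (NP < (NN < /^(kind|family|form|generalization|class|example|extension)$/))"
              + " < (PP < NP=entity2))))");

        TregexMatcher m = rule.matcher(tree);
        if (m.find()) {
            System.out.println("Entity1: " + words(m.getNode("entity1"))); // feature extraction
            System.out.println("Entity2: " + words(m.getNode("entity2"))); // dimensionality reduction
        }
    }

    // Concatenate the leaf words under a matched node.
    private static String words(Tree node) {
        StringBuilder sb = new StringBuilder();
        for (Tree leaf : node.getLeaves()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(leaf.value());
        }
        return sb.toString();
    }
}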