A Method to Resolve Ambiguity of Interpretation of English Sentences for Intelligent English Learning Support Systems

Hidenobu KUNICHIKA, Minoru HONDA, Tsukasa HIRASHIMA and Akira TAKEUCHI
Dept. of AI, Kyushu Institute of Technology, Japan
{kunitika, honda, tsukasa, takeuchi}@minnie.ai.kyutech.ac.jp

Abstract: This paper presents a method of ambiguity resolution for natural language processing in intelligent English learning support systems. Ambiguity in the interpretation of sentences is one of the most important problems for intelligent language learning support systems that allow learners to input freely composed sentences. Our system has a question-and-answer function that asks learners about the contents of a story. Our method for this function resolves structural and semantic ambiguity by using the degree of agreement between the results of natural language processing (syntactic and semantic information) for the learners' answers and for the story.

Introduction
QA (Question and Answer) about the contents of a story is widely used in language learning. QA in a target language is effective for acquiring practical skills because learners use multiple language skills to answer questions, i.e. to grasp the contents of a story and of the question sentences, and to compose answer sentences. When a teacher drills a learner with such QA, the teacher adaptively leads the learner toward answering the question correctly. That is to say, the teacher gives simple hints to learners who almost always answer correctly and gives concrete knowledge to learners who often answer incorrectly. Some computer-assisted language learning systems are equipped with test functions that ask about the contents of a story. Most of them, however, use typical answers and comments prepared beforehand (Heift, 2001; Levin, 1995), so teachers are burdened with preparing them for every learning material. Moreover, it is very difficult to prepare enough answers and comments to cover the variety of answers that learners may give. The target of our study is to realize a QA function that provides suitable questions for each learner and tailored advice according to the learner's answers. To achieve this target, we need the following functions: (1) to understand English sentences, (2) to generate various kinds of question sentences automatically, (3) to select suitable questions for each learner from a set of generated question sentences, (4) to analyze learners' answer sentences and diagnose errors, and (5) to support the learners depending on their errors.
In previous studies, we proposed a method of analyzing stories and representing their meanings (Kunichika, 1995) for function (1), a method of generating various kinds of questions from the syntactic structures and meanings of stories (Kunichika, 2001) for function (2), a method of calculating the difficulty of a question (Kunichika, 2002) for part of function (3), and a method of identifying syntactic errors (Kunichika, 1995) for function (4). Generally, when computers analyze natural language, multiple interpretations of a sentence are generated. In order to support learners adaptively, computers need to interpret learners' answer sentences correctly; that is, they need to resolve ambiguity in the interpretation of those sentences. This paper proposes a method of ambiguity resolution. In the following chapters, we first describe the outline of the QA function. Then, we describe a method of supporting learners. After that, we present the factors that cause ambiguity in interpreting learners' inputs and a method of ambiguity resolution. Finally, we present our conclusions and discussion.

The outline of the QA function
The QA function asks questions about the surface meaning of a story for novice English learners. We assume that users of the QA function have studied the basics of English grammar. The learning targets of QA are to master the use of grammatical knowledge and to practice conversation. In this chapter, we describe the processing flow and the information that is also used for ambiguity resolution.

Processing flow
Fig. 1 shows the processing flow of the QA function. First, the QA function generates as many question sentences about the story as possible by referring to the syntactic and semantic information extracted by natural language processing and transforming it into interrogative sentences. The query forms that the QA function handles are alternative questions and questions using interrogative pronouns, and there are five kinds of questions: questions about the content of one sentence, questions using antonyms and synonyms, questions using modifiers that appear in multiple sentences, questions about contents represented by multiple sentences using a relative pronoun, and questions about time and space relationships. Next, the function calculates the difficulty of each question for a learner, and then selects a suitable and purposeful question for the learner by referring to the difficulty of the questions, educational intentions, information about the story and the student model. The learner's answer is analyzed by natural language processing, and its syntactic and semantic information is extracted. The QA function also identifies syntactic errors. After that, it judges the answer semantically by comparing the semantic information of the answer with the semantic information of the story. The results of the judgment are stored in the student model and then used for selecting the next question. If the learner's answer is incorrect, the QA function helps the learner correct the error intelligently by referring to both the student model and a teaching paradigm. For example, if the learner frequently misuses a particular item of grammatical knowledge, the function teaches that knowledge in detail. We describe the method of supporting learners later.

[Figure 1: The processing flow of the QA function (Question and Answer function). Components: educational intentions; information about textbooks (new words, new idioms); information about sentences (sentences, syntax, semantics, structure); dictionary of synonyms and antonyms; interrogative tables; question generation rules; generating questions and information for judgment; selecting a suitable question; asking the question; the learner's answers; judging the answer and identifying error origin; student model.]

Information for the QA function
Syntactic and semantic information and student models are important for the QA function. Syntactic and semantic information is mainly used for automated question generation and for judging learners' answers. A student model is used for selecting a suitable question. The QA function uses syntactic and semantic information generated by natural language processing. We have proposed a method of extracting syntactic and semantic information from stories based on DCG (Definite Clause Grammar). Here we briefly describe our representation of syntactic and semantic information. Fig. 2 shows an example of syntactic information. Syntactic information consists of a syntactic tree, which expresses the parts of speech and modification relationships of words and phrases, and a feature structure, which expresses both the grammatical functions of words and phrases and grammatical information such as sentence structure and idioms. Semantic information consists of time and space information and information about verbs, nouns and modifiers. Fig. 3 shows the semantic information of the four sentences placed in its upper left-hand corner. One of the features of this representation is that each piece of information is stored separately and correspondences are expressed by links (shown as arrows), so context such as the referents of pronouns or the time order of events can be retrieved easily.

[Figure 2: An example of syntactic information for the sentence "I saw a beautiful girl in the park.": a syntactic tree giving the parts of speech and modification relationships of its words and phrases, and a feature structure with the slots subj, verb, obj and prp (features such as person, number, case, verb type, tense, article and preposition) and the sentence-level features stype, tense1, tense2, voice, t_aux and idiom.]

[Figure 3: An example of semantic information for the story "John had breakfast at eight this morning. Then he went to West Park on a blue bicycle. He sat on a white bench in the park. There was a red bicycle near the small bench." Panels: time information (this_morning, eight, then), space information (he, West_park, white bench, red bicycle, linked by on and near), verbs (have, go, sit, be), nouns (breakfast, eight, this_morning, john, west_park, bicycle, bench) and adjectives and adverbs (quantity, color, size). For example, the frame of "have" is class: bodily_action, agent: john, object: breakfast, time: eight (restriction: at).]

A student model is prepared for each learner, and every time the QA function judges a learner's answer, the student model is updated. A student model records, as the learner's state of understanding, a history of whether or not the learner has used each vocabulary item and each grammatical item correctly. Thus, by referring to the student model, the QA function can identify vocabulary and grammar that are familiar or unfamiliar to the learner.
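The student model described above can be sketched as a per-item record of correct and incorrect uses. This is a minimal sketch under stated assumptions, not the authors' implementation: the item naming scheme (such as `"grammar:past_tense"`), the error-rate measure and the `threshold`/`min_uses` cut-offs used to call an item unfamiliar are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StudentModel:
    """Hypothetical student model: per-item history of correct/incorrect use.

    Item names such as "grammar:past_tense" are an assumption of this sketch.
    """

    history: dict = field(default_factory=lambda: defaultdict(list))

    def record(self, item: str, correct: bool) -> None:
        # Called every time an answer is judged, once per item it uses.
        self.history[item].append(correct)

    def error_rate(self, item: str) -> float:
        uses = self.history.get(item, [])
        if not uses:
            return 0.0  # never used: no evidence of misuse yet
        return uses.count(False) / len(uses)

    def unfamiliar(self, threshold: float = 0.5, min_uses: int = 2) -> list:
        # Items used at least `min_uses` times whose error rate reaches
        # `threshold`; both cut-offs are hypothetical tuning parameters.
        return sorted(
            item
            for item in self.history
            if len(self.history[item]) >= min_uses
            and self.error_rate(item) >= threshold
        )


# Usage: two misuses and one correct use of the past tense.
model = StudentModel()
for correct in (False, False, True):
    model.record("grammar:past_tense", correct)
model.record("word:bench", True)
model.record("word:bench", True)
print(model.unfamiliar())  # ['grammar:past_tense']
```

Keeping the raw history rather than a single score leaves room for other measures, for example weighting recent answers more heavily.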

Supporting learners
When a learner makes errors, the QA function helps the learner correct them intelligently. Errors are classified into three kinds: vocabulary errors, syntactic errors and semantic errors. The present QA function does not handle spelling errors; instead, it tells learners which words it does not know. The QA function can identify 61 kinds of syntactic errors (Kunichika, 1995), which are typical errors observed among Japanese junior and senior high school students. An answer containing a semantic error is not an appropriate answer to the question. In order to judge a learner's answer semantically, the QA function compares the semantic information of the answer with the semantic information of the story. Semantic errors therefore appear as differences between the two: lack, excess and replacement. It is desirable that learners correct errors by themselves rather than being given correct answers immediately. The QA function therefore supports learners according to a policy of urging them to think carefully, as follows.

An adaptive method to support learners
In order to correct errors, learners need both to realize the errors and to express an answer sentence by using correct knowledge. Because it is desirable to give learners a chance to correct errors by themselves, our adaptive method of supporting learners does not give full support at once, but gives information step by step as follows.
(1) to inform the learner that the answer is incorrect
(2) to give a hint: a sample sentence that uses the correct knowledge for syntactic errors, or a part of the original story for semantic errors
(3) to indicate concretely where the errors are and what kinds of errors they are
(4) to give the correct knowledge concretely
(5) to give an example of a correct answer
Items (1) and (3) help learners realize errors; items (2) and (4) help them express correct sentences.
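The step-by-step policy above can be sketched as a small state machine that moves one step toward more concrete support each time the learner repeats the same error. This is a hypothetical sketch: the error labels (such as `"replacement:agent"`), the `SupportPlanner` name and the rule of restarting at step (1) when a different error appears are assumptions, since the text only describes escalation while the same error persists.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# The five support steps, ordered from the most abstract to the most concrete.
STEPS = (
    "inform that the answer is incorrect",
    "give a hint",
    "show where the errors are and their kinds",
    "give the correct knowledge",
    "give an example of a correct answer",
)


@dataclass
class SupportPlanner:
    """Hypothetical planner that escalates one step per repeated error."""

    step: int = 0                     # index into STEPS
    last_error: Optional[str] = None  # error label from the previous answer

    def next_support(self, error: str) -> Tuple[int, str]:
        if error == self.last_error:
            # Same mistake again: one step more concrete, capped at step (5).
            self.step = min(self.step + 1, len(STEPS) - 1)
        else:
            # A different mistake (assumption): restart at step (1).
            self.step = 0
            self.last_error = error
        return self.step + 1, STEPS[self.step]


planner = SupportPlanner()
levels = [planner.next_support("replacement:agent")[0] for _ in range(6)]
print(levels)  # [1, 2, 3, 4, 5, 5]
```

Step (5) is treated as absorbing here: once reached, further repetitions keep returning the example answer.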
The QA function, moreover, changes the abstraction level of the information it gives: first, it gives learners abstract information such as hints, (1) and (2). If the learners still make the same errors, it then gives them more concrete information, (3) and (4). Step (5) is the final step for learners who have given up. For example, suppose the QA function generates the question "Who has a baby on her back?" from the story "The mother koala is running in the movie. She has a baby on her back." and gives it to a learner, and the learner inputs "A kangaroo has a baby on her back.", which contains a semantic error. The QA function then gives information as follows.
(1) The function tells the learner "Your answer is incorrect."
(2) If the learner still makes the same mistake, the function shows the original story.
(3) If the learner still makes the same mistake, the function gives the message "Replacement error: The agent "a kangaroo" is incorrect."
(4) If the learner still makes the same mistake, the function gives the message "The agent is "the mother koala" instead of "a kangaroo"."
(5) If the learner still makes the same mistake, the function gives the example answer "The mother koala has a baby on her back." as the final means of support.

Information to be extracted from learners' inputs
In order to realize the support described in the previous section, the QA function needs to extract correctly, from learners' inputs, where the errors are and what kinds of errors they are. The function analyzes learners' inputs and identifies the words that have syntactic errors and the kinds of those errors. The syntactic errors that can be identified are the 61 typical kinds mentioned before. Semantic errors are identified by comparing the semantic information of a learner's answer, which is generated by natural language processing, with that of the story. As a result, the kinds of errors (lack, excess or replacement) and the words that have the semantic errors are identified.
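The comparison that yields lack, excess and replacement errors can be sketched by treating the semantic information of a sentence as a mapping from roles to fillers. This sketch flattens the linked representation of Fig. 3 into a single hypothetical frame per sentence and assumes pronouns have already been resolved through those links; `semantic_diff` and the role names are illustrative, not the system's actual data structures.

```python
def semantic_diff(story: dict, answer: dict) -> list:
    """Classify role-level differences between two semantic frames.

    Each frame is a flat dict from a role name (e.g. "agent") to a filler;
    this flat encoding is an assumption of the sketch.
    """
    errors = []
    for role in sorted(story.keys() | answer.keys()):
        if role not in answer:
            errors.append(("lack", role, story[role]))        # missing in the answer
        elif role not in story:
            errors.append(("excess", role, answer[role]))     # not in the story
        elif story[role] != answer[role]:
            errors.append(("replacement", role, answer[role]))
    return errors


# Koala example, with the pronoun "she" already resolved to the mother koala.
story = {"verb": "have", "agent": "mother_koala", "object": "baby", "space": "back"}
answer = {"verb": "have", "agent": "kangaroo", "object": "baby", "space": "back"}
print(semantic_diff(story, answer))  # [('replacement', 'agent', 'kangaroo')]
```

The returned role and filler are what a message such as "Replacement error: The agent "a kangaroo" is incorrect." needs.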

Problems in analyzing learners' inputs
In order to help learners appropriately, it is necessary to interpret learners' input sentences correctly. Generally, natural language processing produces ambiguous interpretations of sentences. Even if sentences are syntactically correct, ambiguity may arise. Because users of the QA function are learners, input sentences sometimes have syntactic errors. Our parser tries to interpret erroneous sentences by identifying the causes of the errors. The number of possible interpretations is therefore larger than when only correct sentences are analyzed, which complicates ambiguity resolution. Ambiguity is classified into two kinds: structural ambiguity caused by modification relationships, and semantic ambiguity caused by words with multiple meanings. Generally, common-sense knowledge such as restrictions on word co-occurrence reduces ambiguity. The QA function does not, however, use such restrictions, because the function allows authors to input their own stories and it is very difficult to prepare a sufficient amount of common-sense knowledge for every story. The rest of this chapter explains each kind of ambiguity.

Structural ambiguity
In general, there are multiple candidates that a modifier such as a prepositional phrase can modify. Such a case is called structural ambiguity. For example, the prepositional phrase "on the ground" in the sentence "Tom found the school on the ground." can modify either the verb "found" or the noun phrase "the school".

Semantic ambiguity
Some words have the same spelling but different parts of speech, tenses or meanings. For example, "rock" can mean a stone, a kind of music and so on, and the present, past and past participle forms of "put" are spelled the same. Therefore, ambiguity caused by word meaning can arise.

Automated ambiguity resolution
One simple way to resolve ambiguity is to ask learners to select the correct interpretation. However, the aims of the QA function are to master the use of grammatical knowledge and to practice conversation, so selecting the correct interpretation of a sentence from many candidates is not an essential activity for learners. Moreover, the selection is difficult for novice learners. It is therefore inappropriate to ask learners to select the correct interpretation. This chapter presents a method of automated ambiguity resolution for the QA function. The QA function asks about the contents of a story and learners answer the questions. Thus, we assume that learners' answer sentences will accord with the contents of the story. The QA function therefore resolves ambiguity fundamentally by referring to the syntactic and semantic information of the sentences in the story. We also assume that learners will try to answer questions correctly, so interpretations that are adequate as correct answers are given priority over other candidate interpretations. We describe automated resolution of structural and semantic ambiguity below.

Resolution of structural ambiguity
Fundamentally, structural ambiguity is resolved by referring to the syntactic information of the sentences in the original story, as mentioned before. The correctness of the syntactic information of the original story is guaranteed because its author has checked it by hand at the authoring stage. We first show a simple example of ambiguity resolution. The sentence "Tom found the school on the ground.", shown before, is ambiguous. We assume that T1, shown below, is the original sentence of a question; that is, the question is generated from this sentence, and "on the ground" modifies the verb.
(T1) Tom found the school on the ground.
If a learner's answer sentence is also "Tom found the school on the ground.", the function judges that "on the ground" modifies the verb by referring to the modification relationship in the syntactic information of the original sentence. However, learners' answers and original sentences do not always have the same modification relationship; that is, the modifiers, the modified words and their parts of speech are not always the same, because answer sentences may contain errors. For example, if the original sentence of a question is T1 and an answer sentence is "Tom found the school in the ground.", they do not have the same modification relationship. A human would, however, consider it an error in the preposition, that is, the learner used "in" instead of "on" because the learner could not distinguish the prepositions, and would judge that "in the ground" modifies the verb.

[Figure 4: Modification relationships. In the original sentence of the story, the modifier M1o (part of speech P1o) modifies the modified words M2o (part of speech P2o); in the answer sentence, the modifier M1a (part of speech P1a) modifies the modified words M2a (part of speech P2a).]

In order to realize this ability of structural ambiguity resolution, we have defined the degree of agreement. M1o and M2o in Fig. 4 are a modifier and the modified words in the original sentence, respectively. P1o and P2o are the parts of speech of M1o and M2o. The "answer sentence" in Fig. 4 shows one candidate interpretation of an answer sentence. M1a, M2a, P1a and P2a belong to that interpretation and correspond to M1o, M2o, P1o and P2o, respectively. The degrees of agreement are as follows, where L5 is the highest level. The QA function calculates the level of each candidate interpretation and selects the interpretation classified at the highest level among them.
L5: (M1o = M1a) and (M2o = M2a) and (P1o = P1a) and (P2o = P2a). This level corresponds to cases in which both modification relationships are the same.
L4: searching for another sentence in the story instead of the original sentence and trying to satisfy the conditions of L5. Learners may input the content of another sentence that appears near the original sentence because they confuse the sentences. Thus, the function compares another sentence with the answer sentence.
L3: ((M1o = M1a) or (M2o = M2a)) and (P1o = P1a) and (P2o = P2a). There are cases in which learners express only part of the modification relationship. This level corresponds to cases in which the modifiers or the modified words are the same.
L2: (the head of M1o = the head of M1a) and (the head of M2o = the head of M2a) and (P1o = P1a) and (P2o = P2a). This level corresponds to cases in which fewer expressions are the same than in L3; that is, only the heads of the modifiers and of the modified words, which are the most important information in the phrases, are the same.
L1: none of L2 to L5.

Resolution of semantic ambiguity
As with the resolution of structural ambiguity, the QA function resolves semantic ambiguity by referring to the semantic information of the original story.
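Returning to structural ambiguity, the degree of agreement L5 to L1 defined above can be sketched as a scoring function over candidate interpretations. In this minimal sketch, each sentence is reduced to the single modification relationship in question, phrases and their heads are plain strings, the part-of-speech tags (`"pp"`, `"verb"`, `"noun"`) are hypothetical labels, and `others` stands for the relationships of other story sentences that L4 searches; none of these names come from the system itself.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Modification:
    """One modification relationship (a hypothetical encoding of Fig. 4)."""

    modifier: str       # M1
    modifier_pos: str   # P1
    modifier_head: str  # head word of M1
    modified: str       # M2
    modified_pos: str   # P2
    modified_head: str  # head word of M2


def level(orig: Modification, cand: Modification) -> int:
    """Levels L5, L3, L2 or L1 against a single story sentence."""
    if (orig.modifier_pos, orig.modified_pos) != (cand.modifier_pos, cand.modified_pos):
        return 1  # L5, L3 and L2 all require P1o = P1a and P2o = P2a
    if orig.modifier == cand.modifier and orig.modified == cand.modified:
        return 5
    if orig.modifier == cand.modifier or orig.modified == cand.modified:
        return 3
    if (orig.modifier_head == cand.modifier_head
            and orig.modified_head == cand.modified_head):
        return 2
    return 1


def agreement(original: Modification, others: Sequence[Modification],
              cand: Modification) -> int:
    """L5 against the original sentence, else L4 if another sentence gives L5."""
    lv = level(original, cand)
    if lv < 5 and any(level(o, cand) == 5 for o in others):
        return 4
    return lv


def select(original, others, candidates):
    # max() keeps the first candidate on ties; this tie-breaking is an assumption.
    return max(candidates, key=lambda c: agreement(original, others, c))


# T1 and the answer "Tom found the school in the ground."
t1 = Modification("on the ground", "pp", "on", "found", "verb", "found")
verb_reading = Modification("in the ground", "pp", "in", "found", "verb", "found")
noun_reading = Modification("in the ground", "pp", "in", "the school", "noun", "school")
print(select(t1, [], [noun_reading, verb_reading]) is verb_reading)  # True (L3 beats L1)
```

In the example, attaching "in the ground" to the verb shares the modified word and both parts of speech with T1 (L3), while attaching it to "the school" changes P2 and falls to L1, matching the human judgment described earlier.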
All possible interpretations of an input sentence are generated by natural language processing. Semantic ambiguity is resolved by comparing the differences between the semantic information of each interpretation and that of the original sentence. We show an example below: an original sentence and an answer sentence.
Original sentence: Eva fishes for her mother.
Answer sentence: Eva fishes her mother.

The candidate interpretations C1 to C3, and the differences d1 to d5 between each interpretation and the interpretation of the original sentence, are as follows. Each "[]" in the sentences below indicates a missing word.
C1: Eva fishes [] her mother.
  d1) lack of a preposition
C2: Eva fishes her [] [] mother.
  d1) lack of a preposition
  d2) lack of an article for "mother"
  d3) use of the verb as a transitive verb
  d4) excess of the object "her"
C3: Eva fishes her mother.
  d3) use of the verb as a transitive verb
  d5) excess of the object "her mother"
d1 and d2 are syntactic errors. Interpretations that contain syntactic errors are also generated because we want to evaluate the semantic correctness of learners' answers and give advice even if the answers include syntactic errors. Fig. 5 shows the relationships among the three interpretations and the five differences.

[Figure 5: Relationship among the interpretations and the differences. C1 has d1; C2 has d1, d2, d3 and d4; C3 has d3 and d5. d1 is shared by C1 and C2, and d3 by C2 and C3.]

The differences of C2 include those of C1. The QA function therefore discards C2, which differs from the semantic information of the original sentence more than C1 does. There is no inclusion relationship between C1 and C3, so the function cannot identify which is the correct interpretation at this stage.

Resolution of remaining ambiguity
If an answer sentence includes excess information that is ambiguous, the ambiguity cannot be resolved by the methods described above. We show an example as follows.
Original sentence: Tom sings a song.
Question: Who sings a song?
Learner's answer: Tom sings a song for June.

Because "June" can mean both "the name of the sixth month" and "the name of a person", the prepositional phrase "for June" can have either the semantic role "a period of time" or "purpose". The phrase is excess information compared with the semantic information of the original sentence. Because the function cannot resolve ambiguity caused by excess errors by referring to the semantic information of the original sentence, the ambiguity remains. If ambiguity remains, the function selects an interpretation from the remaining candidates at random, for the following reasons.
- The QA function uses surface expressions, not semantic roles, in the support messages it gives learners. For example, the function gives the message "Excess error: "for June" is an excess of information as compared with the story." for the above error.
- Student models for the QA function keep the kinds of errors rather than concrete information such as the surface expressions and semantic roles of phrases in answer sentences.
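The two semantic steps, discarding a candidate whose differences include another candidate's and then choosing at random among those left, can be sketched with sets of difference labels. This is an illustrative sketch assuming each candidate's differences from the original sentence have already been computed as labels such as `"d1"`; `prune_dominated`, `resolve` and the seeded `random.Random` (used only so runs are reproducible) are assumptions, not part of the described system. Candidates with identical difference sets are both kept, since neither strictly includes the other.

```python
import random
from typing import Dict, FrozenSet, List


def prune_dominated(candidates: Dict[str, FrozenSet[str]]) -> List[str]:
    """Keep candidates whose difference set does not strictly include another's."""
    return sorted(
        name
        for name, diffs in candidates.items()
        # `other < diffs` is a proper-subset test, so a set never rules itself out.
        if not any(other < diffs for other in candidates.values())
    )


def resolve(candidates: Dict[str, FrozenSet[str]], rng: random.Random) -> str:
    """Prune by inclusion, then choose at random among the survivors."""
    return rng.choice(prune_dominated(candidates))


# Eva example: differences d1 to d5 of C1 to C3 from the original sentence.
eva = {
    "C1": frozenset({"d1"}),
    "C2": frozenset({"d1", "d2", "d3", "d4"}),
    "C3": frozenset({"d3", "d5"}),
}
print(prune_dominated(eva))  # ['C1', 'C3']
print(resolve(eva, random.Random(0)))  # one of C1 or C3
```

For the "for June" answer, the month and person readings differ only in the semantic role of an excess phrase, so neither set includes the other and the random choice decides.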

Conclusions and Discussions
This paper presented a method to resolve ambiguity in natural language processing for intelligent English learning support systems. We described the method with our QA function in mind. However, the method could be used in other systems, because it uses typical syntactic and semantic information, such as modification relationships of words, parts of speech and semantic roles, and does not depend on the forms of that information. The performance of the ambiguity resolution does, however, depend on the performance of natural language processing, that is, on the degree of detail of the syntactic and semantic information.

Our QA function allows authors to input their own stories. When sentences in the stories are analyzed, ambiguity arises, and this ambiguity in authoring is resolved by the authors. In order to reduce the authors' burden, the QA function analyzes the sentences in stories using only correct grammatical rules, which reduces ambiguity, and presents the authors with the candidate interpretations of each sentence. We have not yet sufficiently evaluated the performance of our method, but we have tried resolving ambiguity on some conceivable answer sentences; in these trials, the QA function selected the correct interpretation. An evaluation using actual answer sentences remains a future issue.

References
Heift, T., & Nicholson, D. (2001). Web Delivery of Adaptive and Interactive Language Tutoring. International Journal of AIED, 12, 310-324.
Kunichika, H., Takeuchi, A., & Otsuki, S. (1995). An Multimedia Language Learning Environment with Intelligent Tutor. In Chan, T., & Self, J. (Eds.), Emerging Computer Technologies in Education, Ch. 10. VA: Association for the Advancement of Computing in Education.
Kunichika, H., Katayama, T., Hirashima, T., & Takeuchi, A. (2001). Automated Question Generation Methods for Intelligent English Learning Systems and its Evaluation. Proc. of ICCE2001, 2, 1117-1124.
Kunichika, H., Urushima, M., Hirashima, T., & Takeuchi, A. (2002). A Computational Method of Complexity of Questions on Contents of English Sentences and its Evaluation. Proc. of ICCE2002.
Levin, L., & Evans, D. (1995). ALICE-chan: A Case Study in ICALL Theory and Practice. In Holland, V., Kaplan, J., & Sams, M. (Eds.), Intelligent Language Tutors: Theory Shaping Technology, Ch. 5. NJ: Lawrence Erlbaum Associates Inc.
