Tree Transformation Rules for Lexicalized TAG Parser in Korean

Kong Joo Lee, Changhyun Kim, and Gil Chang Kim
Department of Computer Science
Korea Advanced Institute of Science and Technology
Taejon, 305-701, Korea

Abstract
In lexicalized grammar, each basic structure associated with a lexical item specifies the domain of locality over which constraints can be stated. In Korean, a basic structure is defined for a combination of content words and function words. We regard a function word, such as a verbal ending or a particle, as causing a basic structure to be transformed. We therefore construct basic structures only for content words rather than for all possible combinations of content words and function words, because the structure for such a combination can be obtained by transforming the basic structure of the content word. In this paper, we propose tree transformation rules that automatically transform a basic structure into the appropriate one according to each function word.

1. Introduction
One disadvantage of Lexicalized Tree Adjoining Grammar (LTAG) is that a grammar consists of a large number of elementary trees. A tree family is essentially a set of sentential trees sharing the same argument structure abstracted from the lexical item. The trees in a family can be thought of as all the possible syntactic transformations (e.g. wh-question, relative clause, topicalized and passive sentence) of a given argument structure. In the case of English, one concern in parsing with LTAG is that the tree family for each lexical item must be constructed in advance. Metarules (Srinivas et al., 1994) are used to derive the trees within a tree family automatically. Moreover, the number of elementary trees considered by a parser can be quite large, because most of the trees within a family are included in parsing. To select the appropriate elementary trees, Supertagging (Joshi & Srinivas, 1994) picks the best elementary tree sequence using statistical information.

In Korean, a word-phrase is a sequence of words delimited by whitespace. A word-phrase consists either of content words only or of content words followed by function words. Content words can be classified into nouns, verb stems, and adjective stems, while function words can be classified into verbal endings and particles. Because Korean is postpositional, function words follow content words. A content word carries the meaning of the word-phrase it belongs to. A function word indicates not only a grammatical relation but also transformational information, that is, which transformation has occurred in the word-phrase.

Basically, we assume that the elementary tree of the content word of a word-phrase is transformed by its function words. In other words, we can derive a transformed tree from the base form of an elementary tree according to the verbal endings or particles. For example, in the word-phrase cohaha-nun¹ (like-ADNOMINAL), the adnominal ending nun tells us that the S-type initial tree of the base form cohaha-ta (like) is transformed into an NP-type auxiliary tree that can modify the following noun.

¹ The notation of Korean follows the Yale Romanization System.

In this paper, we propose tree transformation rules that automatically transform an elementary tree into the one appropriate for a parser. Consequently, we need neither construct a tree family in advance nor include unnecessary trees in the input to a parser.

The overall processing is as follows. Only the base form of the elementary tree of each content word, rather than all of its variations, is stored in the lexicon. As Korean is a highly developed postpositional language, verbal endings and particles can be repeated within a word-phrase and can even be mixed within it (Lee & Lim, 1994). Therefore, from the result of morphological analysis we obtain a sequence of verbal endings and/or particles for each word-phrase. We propose one tree transformation rule for each function word class. The rules are applied to the elementary tree of a content word for its verbal endings and/or particles one by one. In this way, we obtain the final transformed trees of each word-phrase, which are the inputs to the parser.

In section 2, we review the formalism of lexicalized TAG. Section 3 describes a Korean parser based on LTAG. Section 4 presents the tree transformation rules and their graph. Finally, we conclude with future directions.

Figure 1: Substitution and Adjunction Operations

Figure 2: Elementary trees (a) to (d) and a derived tree (e) for "Tom left for Rome."

2. The Formalism of Lexicalized TAG
TAGs (Joshi et al., 1975; Joshi, 1985) are powerful enough to characterize dependencies, such as the subcategorization of a verb, that may span an unbounded distance and may be nested or crossed. The primitive elements of the TAG formalism are known as elementary trees. Lexicalized TAG associates each elementary tree with a lexical item, called the anchor of the corresponding structure, over which linguistic constraints are specified. Elementary trees are of two types: initial trees and auxiliary trees. In describing a natural language, initial trees are minimal linguistic structures that contain no recursion, i.e., trees containing the phrasal structure of simple sentences. Recursive structures are represented by auxiliary trees, which represent constituents that are adjuncts to basic structures.

Initial trees are characterized as follows: all internal nodes are labeled by nonterminals, and all leaf nodes are labeled by terminals or by nonterminal nodes marked for substitution (#). In auxiliary trees, all internal nodes are labeled by nonterminals, and all leaf nodes are labeled by terminals or by nonterminal nodes marked for substitution, except for exactly one nonterminal node, called the foot node (*). The foot node has the same label as the root node of the tree.

Elementary trees are combined by two operations: substitution and adjunction. Substitution replaces a node marked for substitution with a tree whose root has the same label as the node. In an adjunction operation, an auxiliary tree is inserted into an elementary tree; the root and foot nodes of the auxiliary tree must have the same label as the node at which it adjoins. Figure 1 shows the substitution and adjunction operations.

Figure 2(a), (b), and (d) are initial trees, and Figure 2(c) is an auxiliary tree. As an example, the elementary trees in Figure 2 can be combined to form the sentence "Tom left for Rome." as follows:
1. Figure 2(a) substitutes at the node NP0 of (b).
2. Figure 2(d) substitutes at the node NP1 of (c).
3. The result of step 2 adjoins to the VP node of the result of step 1.
The final result is shown in Figure 2(e).
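The two combining operations can be made concrete with a small sketch. The Python below is our own illustration, not the parser described in this paper: the `Node` class, the `name` field used to address substitution sites such as NP0, and the helpers `substitute` and `adjoin` are hypothetical. It rebuilds the derivation of "Tom left for Rome." from the trees of Figure 2, following the three steps listed above.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    subst: bool = False         # leaf marked for substitution (# in the text)
    foot: bool = False          # foot node of an auxiliary tree (* in the text)
    name: Optional[str] = None  # hypothetical address such as "NP0" to pick a site

    def leaves(self) -> list[str]:
        """Terminal yield of the tree, left to right."""
        if not self.children:
            return [self.label]
        return [w for child in self.children for w in child.leaves()]


def find(node: Node, pred: Callable[[Node], bool]) -> Optional[Node]:
    """First node in preorder that satisfies pred."""
    if pred(node):
        return node
    for child in node.children:
        hit = find(child, pred)
        if hit is not None:
            return hit
    return None


def substitute(tree: Node, site_name: str, initial: Node) -> Node:
    """Replace the substitution node called site_name with a copy of an initial tree."""
    result = copy.deepcopy(tree)
    site = find(result, lambda n: n.subst and n.name == site_name)
    if site is None or site.label != initial.label:
        raise ValueError(f"no substitution node {site_name} labeled {initial.label}")
    site.children, site.subst = copy.deepcopy(initial).children, False
    return result


def adjoin(tree: Node, site_label: str, aux: Node) -> Node:
    """Adjoin a copy of an auxiliary tree at the first internal node labeled site_label."""
    result = copy.deepcopy(tree)
    site = find(result, lambda n: n.label == site_label and bool(n.children))
    if site is None or aux.label != site_label:
        raise ValueError(f"cannot adjoin a {aux.label} tree at {site_label}")
    new = copy.deepcopy(aux)
    foot = find(new, lambda n: n.foot)
    foot.children, foot.foot = site.children, False  # foot takes over the old subtree
    site.children = new.children                     # site now carries the auxiliary tree
    return result


def leaf(label: str, word: str) -> Node:
    return Node(label, [Node(word)])


# Elementary trees of Figure 2.
tree_a = Node("NP", [leaf("N", "Tom")])
tree_b = Node("S", [Node("NP", subst=True, name="NP0"), Node("VP", [leaf("V", "left")])])
tree_c = Node("VP", [Node("VP", foot=True),
                     Node("PP", [leaf("P", "for"), Node("NP", subst=True, name="NP1")])])
tree_d = Node("NP", [leaf("N", "Rome")])

step1 = substitute(tree_b, "NP0", tree_a)  # step 1
step2 = substitute(tree_c, "NP1", tree_d)  # step 2
derived = adjoin(step1, "VP", step2)       # step 3
print(" ".join(derived.leaves()))          # Tom left for Rome
```

Both operations copy their inputs, so the elementary trees stay reusable across derivations, as lexicon entries must be.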

3. Korean Parser based on MC-LTAG
Korean is a verb-final language, and therefore the order of the arguments of a verb is relatively free. However, the pure TAG formalism does not permit an elementary tree to have scrambled arguments for an anchor, so it is not natural to adopt the pure TAG formalism for parsing Korean sentences. In this paper, we implement a parser for Korean based on Multi-Component TAG (MC-TAG) (Weir, 1988). In MC-TAG, the elementary tree representing a verb is extended to a set of trees. The tree set representing a verb contains an auxiliary tree for each argument of the verb and an initial tree that corresponds to the maximal projection of the verb. For Korean, it is assumed that all arguments are adjoined to a VP node of the initial tree (Rambow & Lee, 1994).
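How a tree set accounts for free argument order can be sketched as follows, under simplifying assumptions of our own: each argument's auxiliary tree is a VP-rooted tree with a pre-filled NP (instead of a substitution node), every argument tree adjoins at the highest VP, and the locality constraints of MC-TAG are not modeled. The names `tree_set`, `adjoin_at_vp`, and `word_orders`, and the placeholder arguments SUBJ and OBJ, are hypothetical. Adjoining the argument trees in different orders yields every argument order, with the verb always in final position.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from itertools import permutations
from typing import Iterator


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    foot: bool = False  # foot node of an auxiliary tree (VP*)

    def walk(self) -> Iterator[Node]:
        """Nodes in preorder."""
        yield self
        for child in self.children:
            yield from child.walk()

    def leaves(self) -> list[str]:
        if not self.children:
            return [self.label]
        return [w for child in self.children for w in child.leaves()]


def tree_set(verb: str, args: list[str]) -> tuple[Node, list[Node]]:
    """Hypothetical MC-TAG tree set: an initial tree for the verb's maximal projection
    plus one VP-rooted auxiliary tree per argument (NPs pre-filled for brevity)."""
    initial = Node("S", [Node("VP", [Node("V", [Node(verb)])])])
    aux = [Node("VP", [Node("NP", [Node(arg)]), Node("VP", foot=True)]) for arg in args]
    return initial, aux


def adjoin_at_vp(tree: Node, aux: Node) -> Node:
    """Adjoin a copy of aux at the highest VP node (the first VP in preorder)."""
    result = copy.deepcopy(tree)
    site = next(n for n in result.walk() if n.label == "VP" and n.children)
    new = copy.deepcopy(aux)
    foot = next(n for n in new.walk() if n.foot)
    foot.children, foot.foot = site.children, False
    site.children = new.children
    return result


def word_orders(verb: str, args: list[str]) -> set[tuple[str, ...]]:
    """Surface strings obtained by adjoining the argument trees in every order."""
    initial, aux = tree_set(verb, args)
    orders = set()
    for sequence in permutations(aux):
        tree = initial
        for arg_tree in sequence:
            tree = adjoin_at_vp(tree, arg_tree)
        orders.add(tuple(tree.leaves()))
    return orders


print(sorted(word_orders("cohaha-ta", ["SUBJ", "OBJ"])))
# [('OBJ', 'SUBJ', 'cohaha-ta'), ('SUBJ', 'OBJ', 'cohaha-ta')]
```

The argument adjoined last ends up leftmost, so each permutation of adjunctions gives one scrambled order without any extra elementary tree.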

Figure 3: The first stage of the parser (each word-phrase WP1, WP2, WP3, ... of the input sentence → morphological analyzer & tagger → content word + sequence of function words; content word → dictionary → base form tree; base form tree + function words → tree transformation rules → transformed tree → 2nd stage)

A general two-step parsing strategy for LTAG is as follows. In the first stage, the parser selects a set of elementary trees associated with the lexical items in the input sentence, and in the second stage the sentence is parsed with respect to this set. Figure 3 shows the first stage of our LTAG-based parser. First, each word-phrase is morphologically analyzed into one content word and a sequence of function words. Second, the parser consults a dictionary to find the base form of the elementary trees for the content word. Third, the parser transforms the base form according to the verbal endings and particles. This process is performed for every word-phrase in the sentence. Finally, the transformed trees are fed to the second stage of the parser. In the next section, we describe an automatic tree transformation rule for each verbal ending or particle.
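The first stage can be outlined as a loop over word-phrases. This is a minimal sketch, assuming the morphological analyzer has already split each word-phrase into a content word and a list of function-word tags; trees are reduced to their type (initial or auxiliary) and root label, and the `ElemTree` class, the two lexicon entries, and the three-entry `RULES` table are hypothetical stand-ins. Rules are applied one by one, as described in the introduction; section 4.2 refines this by delaying some rules.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ElemTree:
    anchor: str
    kind: str  # "initial" or "auxiliary"
    root: str  # root label: "S", "NP" or "VP"


# Hypothetical lexicon: only one base-form tree is stored per content word.
LEXICON = {
    "cohaha": ElemTree("cohaha", "initial", "S"),  # base form cohaha-ta (like)
    "sicak": ElemTree("sicak", "initial", "NP"),   # noun 'start'
}

# Hypothetical subset of Table 1: function-word tag -> (kind, root) of the result.
RULES = {
    "exm": ("auxiliary", "NP"),  # transf_m
    "exn": ("initial", "NP"),    # transf_n
    "jcp": ("initial", "S"),     # transf_p
}


def transform(tree: ElemTree, tag: str) -> ElemTree:
    """Apply the rule for one function-word tag; tags without a rule
    (e.g. ef, efp, jc, jx) leave the tree structure unchanged."""
    if tag not in RULES:
        return tree
    kind, root = RULES[tag]
    return replace(tree, kind=kind, root=root)


def first_stage(word_phrases):
    """word_phrases: (content word, [function-word tags]) pairs, assumed to come
    from a morphological analyzer. Returns one transformed tree per word-phrase."""
    trees = []
    for content, tags in word_phrases:
        tree = LEXICON[content]   # dictionary lookup of the base form
        for tag in tags:          # transform one function word at a time
            tree = transform(tree, tag)
        trees.append(tree)
    return trees


print(first_stage([("cohaha", ["exm"]), ("sicak", ["jcp", "efp"])]))
```

The adnominal ending turns the S-type initial tree of cohaha into an NP-type auxiliary tree, while sicak followed by jcp and efp ends as an S-type initial tree because efp changes no structure.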

4. Tree Transformation Rules
Korean differs from English in syntactic transformation. Whereas forming an interrogative sentence, for instance, involves a structural transformation of a declarative sentence in English, it does not trigger any structural transformation in Korean. Instead, conjugation with a verbal ending can convert a sentence into a nominal, an adverbial, or an adnominal clause, which causes its structure to be transformed.

4.1 Tree Transformation Rules
A tree transformation rule converts an elementary tree into a transformed one according to a verbal ending or a particle. In fact, we made a rule not for each lexical item but for each part-of-speech class of function words, because function words in the same class behave identically with respect to transformation. In Korean, there are in general 7 categories of verbal endings and 6 categories of particles (Kim & Seo, 1994). Of the 7 categories of verbal endings, 5 can transform a base tree, while the other 2 just change its attributes. Most particles just indicate the grammatical relation between word-phrases; however, some of them cause a base tree to be transformed, as a verbal ending does. Table 1 summarizes the 6 tree transformation rules.

Table 1: Tree transformation rules

| Rule | Triggering condition | Effect | Examples |
|---|---|---|---|
| transf_m | adnominal ending (exm) | NP-type auxiliary tree | 는(nun), 던(ten) |
| transf_m | adnominal particle (jcm) | NP-type auxiliary tree | 의(uy) |
| transf_n | nominal ending (exn) | NP-type initial tree | 기(ki), 음(um) |
| transf_a | adverbial ending (exa) | VP-type auxiliary tree | 게(key) |
| transf_a | adverbial particle (jca) | VP-type auxiliary tree | 에서(eyse) |
| transf_s | subordinate conjunctive ending (ecs) | S-type auxiliary tree | 어서(ese) |
| transf_q | coordinate conjunctive ending (ecq) | S-type initial tree | 고(ko) |
| transf_q | connective particle (jj) | NP-type initial tree | 와(wa) |
| transf_p | predicative particle (jcp) | S-type initial tree | 이(i) |

transf_m: This rule transforms an elementary tree into an NP-type auxiliary tree. An adnominal ending or an adnominal particle enables a word-phrase to modify the following noun. In fact, a word-phrase with an adnominal ending takes the form of a relative clause with an argument missing. For example, applying this rule to the word-phrase cohaha-nun (like-ADNOMINAL) generates three transformed trees: one with a missing subject, one with a missing object, and one with no missing argument.

transf_n: This rule transforms an elementary tree into an NP-type initial tree. If a nominal ending is attached to a verb stem, the sentential structure headed by the verb becomes a nominal. Any particle can then follow the nominal ending.

transf_a: This rule transforms an elementary tree into a VP-type auxiliary tree.

transf_s: This rule transforms an elementary tree into an S-type auxiliary tree, so that one sentence is connected subordinately to the following sentence.

transf_q: A coordinate conjunctive ending and a connective particle have the same meaning as 'and' or 'or' in English. The former connects sentences, and the latter connects word-phrases. This rule transforms an elementary tree into an S-type initial tree for a coordinate conjunctive ending, and into an NP-type initial tree for a connective particle.

transf_p: A predicative particle is similar to the verb 'be' in English. If it is attached to a noun, the word-phrase becomes a predicate. This rule results in an elementary tree with one argument, the subject.
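Because each rule is keyed by a part-of-speech class rather than by a lexical item, Table 1 can be stored as a plain lookup table. The sketch below assumes the tag names of Table 1; `rule_for` and `transf_m_readings` are hypothetical helpers. It shows that transf_q yields different tree types depending on its trigger (ecq or jj), and that transf_m produces three readings for a transitive verb such as cohaha-nun, each represented here only by the arguments that remain overt.

```python
from __future__ import annotations

# Table 1 as data: function-word tag -> (rule, resulting tree type).
TABLE_1 = {
    "exm": ("transf_m", "NP-type auxiliary tree"),
    "jcm": ("transf_m", "NP-type auxiliary tree"),
    "exn": ("transf_n", "NP-type initial tree"),
    "exa": ("transf_a", "VP-type auxiliary tree"),
    "jca": ("transf_a", "VP-type auxiliary tree"),
    "ecs": ("transf_s", "S-type auxiliary tree"),
    "ecq": ("transf_q", "S-type initial tree"),
    "jj": ("transf_q", "NP-type initial tree"),
    "jcp": ("transf_p", "S-type initial tree"),
}

# Tags that only change attributes and never trigger a rule.
ATTRIBUTE_ONLY = {"ef", "efp", "jc", "jx"}


def rule_for(tag: str) -> tuple[str, str] | None:
    """Return (rule, effect) for a function-word tag, or None for attribute-only tags."""
    if tag in ATTRIBUTE_ONLY:
        return None
    if tag not in TABLE_1:
        raise ValueError(f"unknown function-word tag: {tag}")
    return TABLE_1[tag]


def transf_m_readings(args: list[str]) -> list[tuple[str, ...]]:
    """Overt arguments in each relative-clause reading of an adnominal word-phrase:
    one reading per missing argument, plus one with no argument missing."""
    readings = [tuple(a for a in args if a != gap) for gap in args]
    readings.append(tuple(args))
    return readings


print(rule_for("jj"))  # ('transf_q', 'NP-type initial tree')
print(transf_m_readings(["subject", "object"]))
# [('object',), ('subject',), ('subject', 'object')]
```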

Figure 4 gives a graphical illustration of each tree transformation rule.²

Figure 4: Tree Transformation Rules

Figure 5: Graph of Tree Transformation Rules (states e0 to e6 and j0 to j5, entered from a "predicative start" and a "nominal start"; each edge is labeled condition/rule, e.g. exn/transf_n, jcp/transf_p, jj/transf_q, null/transf_m, null/transf_a, null/transf_s, null/transf_q)

The remaining verbal endings and particles change the attributes rather than the structure of an elementary tree. The following verbal endings and particles do not cause a transformation:
• final ending (ef): This ending just indicates the style of a sentence: interrogative, declarative, imperative, or exclamatory. Unlike in English, there is no transformation for sentence style in Korean.

• prefinal ending (efp): It represents the tense, politeness, or honorific of an elementary structure.
• case particle (jc): It determines the case of an elementary structure.
• auxiliary particle (jx): It gives an additional meaning to an elementary tree.

4.2 Graph of Tree Transformation Rules
As stated before, there can be several verbal endings and particles within one word-phrase in Korean. To obtain the final elementary tree of a word-phrase, tree transformation rules are applied repeatedly to a base tree according to the sequence of verbal endings and particles. While the rules are being applied, conflicts may occur. A conflict arises when a rule that converts a tree into an initial tree is applied right after a rule that converts it into an auxiliary tree. For example, applying transf_p after transf_a results in a conflict. Based on an analysis of a corpus of about 160,000 word-phrases, we constructed a graph of tree transformation rules that contains no conflicts (Figure 5). Each edge of the graph is labeled with a condition (the part of speech of a function word) and its corresponding transformation rule. To avoid conflicts, the application of a transformation rule is delayed until no conflict can arise. As can be seen from Figure 5, for example, we apply transf_m not on the adnominal ending itself but on the null³ that follows it. The following is an example of applying the tree transformation rules. There are 6 words in the word-phrase sicak-eyse-pwuthe-i-ess-ko (Figure 6).

² In transf_p, the node NP [subj] is in fact realized as a VP-type auxiliary tree, following MC-LTAG.

³ A null symbol marks the end of a word-phrase.
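The delayed application along the graph can be sketched for the example word-phrase. This is a deliberately simplified, hypothetical reading of Figure 5, not the full graph: we assume that exn, jcp, and jj fire their rules immediately, that exm, jcm, exa, jca, ecs, and ecq only record a pending rule that fires at the null marking the end of the word-phrase, and that a later immediately firing function word discards a pending rule. The function names and the handling of the prefinal ending ess are illustrative. Firing every rule as soon as its function word appears produces transf_a followed by transf_p, which is a conflict; delaying the rules yields transf_p then transf_q plus the attribute tense:past, matching trees (b) to (d) of Figure 6.

```python
from __future__ import annotations

# ASSUMPTION: a simplified reading of the edge labels in Figure 5.
IMMEDIATE = {"exn": "transf_n", "jcp": "transf_p", "jj": "transf_q"}
DELAYED = {"exm": "transf_m", "jcm": "transf_m", "exa": "transf_a",
           "jca": "transf_a", "ecs": "transf_s", "ecq": "transf_q"}

# Kind of tree each rule produces (from Table 1).
PRODUCES = {"transf_m": "auxiliary", "transf_a": "auxiliary", "transf_s": "auxiliary",
            "transf_n": "initial", "transf_q": "initial", "transf_p": "initial"}


def has_conflict(rules: list[str]) -> bool:
    """True if a rule producing an initial tree directly follows one producing an auxiliary tree."""
    return any(PRODUCES[a] == "auxiliary" and PRODUCES[b] == "initial"
               for a, b in zip(rules, rules[1:]))


def apply_naively(morphs: list[tuple[str, str]]) -> list[str]:
    """Fire every rule as soon as its function word is seen (no delay)."""
    table = {**IMMEDIATE, **DELAYED}
    return [table[tag] for _, tag in morphs if tag in table]


def apply_with_delay(morphs: list[tuple[str, str]]) -> tuple[list[str], dict[str, str]]:
    """Walk the (morpheme, tag) pairs, delaying some rules until the closing null."""
    applied: list[str] = []
    attrs: dict[str, str] = {}
    pending = None
    for morph, tag in list(morphs) + [("", "null")]:
        if tag == "null":
            if pending is not None:
                applied.append(pending)
        elif tag in IMMEDIATE:
            pending = None  # ASSUMPTION: a later transforming word supersedes a delayed rule
            applied.append(IMMEDIATE[tag])
        elif tag in DELAYED:
            pending = DELAYED[tag]
        elif tag == "efp" and morph == "ess":
            attrs["tense"] = "past"  # attribute change only, no structural change
    return applied, attrs


# Function words of sicak-eyse-pwuthe-i-ess-ko; the content word sicak is omitted.
EXAMPLE = [("eyse", "jca"), ("pwuthe", "jx"), ("i", "jcp"), ("ess", "efp"), ("ko", "ecq")]

print(apply_naively(EXAMPLE), has_conflict(apply_naively(EXAMPLE)))
print(apply_with_delay(EXAMPLE))
```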

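Figure 6 panels: morphemes and POS tags of the input word-phrase sicak-eyse-pwuthe-i-ess-ko: sicak (nc), eyse (jca), pwuthe (jx), i (jcp), ess (efp), ko (ecq). (a) the base tree, an NP anchored by sicak; (b) the S-type tree with one argument (subject NP0) obtained by transf_p, for sicak-eyse-pwuthe-i; (c) the same tree with the attribute [tense:past], for sicak-eyse-pwuthe-i-ess; (d) the S-type tree obtained by transf_q, for sicak-eyse-pwuthe-i-ess-ko.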

Figure 6: Example of tree transformation

In the word-phrase 시작에서부터이었고 (sicak-eyse-pwuthe-i-ess-ko), 시작 (sicak) is a content word (noun) meaning 'start', and 에서부터이었고 (eyse-pwuthe-i-ess-ko) is a sequence of function words: 에서 (eyse) is an adverbial particle, 부터 (pwuthe) is an auxiliary particle meaning 'from', 이 (i) is a predicative particle like 'be' in English, 었 (ess) is a prefinal ending that marks the past tense, and finally 고 (ko) is a coordinate conjunctive ending. In Figure 6, tree (a) is the base form, and applying transf_p according to the graph yields tree (b). Because the prefinal ending 었 (ess) does not cause a transformation, tree (c) remains the same as tree (b), except that its tense attribute is changed. Then, because of the coordinate conjunctive ending 고 (ko) followed by null, rule transf_q transforms tree (c) into tree (d). As a result, we obtain the final tree (d) without any conflict.

5. Conclusion
In Korean, function words, namely verbal endings and particles, play an important role in transforming base trees. In this paper, we have proposed 6 tree transformation rules, one for each class of transforming function words. Using these rules, a base tree can be transformed into the one appropriate for parsing. When the rules are applied, various sequences of verbal endings and particles in a word-phrase may cause conflicts. Based on an analysis of a corpus, we constructed a graph that eliminates any conflict between tree transformation rules. As a result, we show the possibility of reducing the number of elementary trees by using the result of morphological analysis and tree transformation rules in a Korean parser based on MC-LTAG.

References

Aravind K. Joshi, L. Levy and M. Takahashi, 1975, "Tree adjunct grammars," Journal of Computer and System Sciences.
Aravind K. Joshi, 1985, "Tree Adjoining Grammars: How much context sensitivity is required to provide a reasonable structural description," Natural Language Parsing, pp. 206-250, Cambridge University Press, Cambridge, U.K.
Aravind K. Joshi and B. Srinivas, 1994, "Disambiguation of Super Parts of Speech (or Supertags): Almost Parsing," Proceedings of COLING '94, pp. 154-160.
Jae-Hoon Kim and Jungyun Seo, 1994, "A Korean Part-of-Speech Tag Set for Natural Language Processing," CAIR-TR-94-55.
Young-Suk Lee, 1993, "Scrambling as a Case-Driven Obligatory Movement," Ph.D. thesis, University of Pennsylvania.
I. S. Lee and H. B. Lim, 1994, "Korean Syntax," HakYeonSa (in Korean).
Owen Rambow and Young-Suk Lee, 1994, "Word Order Variation and Tree Adjoining Grammar," Computational Intelligence on Tree Adjoining Grammar.
B. Srinivas, D. Egedi, C. Doran and T. Becker, 1994, "Lexicalization and Grammar Development," Proceedings of KONVENS '94.
Yves Schabes, Anne Abeillé and Aravind K. Joshi, 1988, "Parsing Strategies with 'Lexicalized' Grammars: Application to Tree Adjoining Grammars," Proceedings of the 12th International Conference on Computational Linguistics.
David J. Weir, 1988, "Characterizing Mildly Context-Sensitive Grammar Formalisms," Ph.D. thesis, University of Pennsylvania.
