A Karaka Based Approach to Parsing of Indian ... - ACL Anthology

A Karaka Based Approach to Parsing of Indian Languages

Akshar Bharati
Rajeev Sangal

Department of Computer Science and Engineering
Indian Institute of Technology Kanpur
Kanpur 208 016, India

Abstract

A karaka based approach for parsing of Indian languages is described. It has been used for building a parser of Hindi for a prototype Machine Translation system.

A lexicalized grammar formalism has been developed that allows constraints to be specified between 'demand' and 'source' words (e.g., between a verb and its karaka roles). The parser has two important novel features: (i) It has a local word grouping phase in which word groups are formed using 'local' information only. They are formed based on finite state machine specifications, thus resulting in a fast grouper. (ii) The parser is a general constraint solver. It first transforms the constraints into an integer programming problem and then solves it.
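The first feature, grouping by finite state machine, can be pictured with a small sketch. It is hypothetical and is not the authors' implementation. The tag set (`N`, `PSP`, `V`, `AUX`), the `TRANSITIONS` table, and the function `group_words` are all assumptions. They were chosen so that the Hindi sentence used in Section 2 (ladake adhyapak ko haar pahana rahe hein) is grouped as described there. The table lists only locally unambiguous patterns. Whenever no transition applies, the current group is closed rather than guessed at.

```python
# Hypothetical finite-state local word grouper (an illustration, not the
# authors' code). Input tokens are (word, tag) pairs; the tags N (noun),
# PSP (postposition/parsarg), V (verb) and AUX (auxiliary) are assumed.
from typing import Dict, List, Tuple

Token = Tuple[str, str]

# (state, tag) -> next state. Only locally unambiguous patterns are listed;
# a missing entry closes the current group.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("START", "N"): "NOUN",
    ("NOUN", "PSP"): "NOUN_PSP",  # noun + parsarg, e.g. 'adhyapak ko'
    ("START", "V"): "VERB",
    ("VERB", "AUX"): "VERB",      # verb + any number of auxiliaries
}


def group_words(tokens: List[Token]) -> List[List[str]]:
    """Scan left to right, extending the current group while a transition exists."""
    groups: List[List[str]] = []
    current: List[str] = []
    state = "START"
    for word, tag in tokens:
        nxt = TRANSITIONS.get((state, tag))
        if nxt is None:
            # No transition: close the current group and start a new one.
            # Tags that cannot start a group go to the dead state "OTHER",
            # so the next token always begins a fresh group.
            if current:
                groups.append(current)
            current = []
            nxt = TRANSITIONS.get(("START", tag), "OTHER")
        current.append(word)
        state = nxt
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    sentence = [("ladake", "N"), ("adhyapak", "N"), ("ko", "PSP"), ("haar", "N"),
                ("pahana", "V"), ("rahe", "AUX"), ("hein", "AUX")]
    print(group_words(sentence))
    # [['ladake'], ['adhyapak', 'ko'], ['haar'], ['pahana', 'rahe', 'hein']]
```

Each token is examined once with a single table lookup, so the grouper runs in linear time. This is consistent with the abstract's claim of a fast grouper.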

1. Introduction

Languages belonging to the Indian linguistic area share several common features. They are relatively word order free, their nominals are inflected or have postposition case markers (collectively described as having vibhakti), and they have verb complexes consisting of sequences of verbs (possibly joined together into a single word). There are also commonalities in vocabulary, and in the senses spanned by a word in one language compared to those of its counterpart in another Indian language.

We base our grammar on the karaka (pronounced kaarak) structure. It is necessary to mention that although karakas are thought of as similar to cases, they are fundamentally different: "The pivotal categories of the abstract syntactic representation are the karakas, the grammatical functions assigned to nominals in relation to the verbal root. They are neither semantic nor morphological categories in themselves but correspond to semantics according to rules specified in the grammar and to morphology according to other rules specified in the grammar." [Kiparsky, 82].

Before describing our grammar formalism, let us look at the parser structure.

[Figure: structure of the parser. sentence -> morphological analyzer (uses the active lexicon) -> lexical entries -> local word grouper (uses the verb form chart) -> word groups -> core parser (uses the karaka chart and lakshan charts) -> intermediate representation]

The function of the morphological analyzer is to take each word in the input sentence and extract its root and other associated grammatical information. This information forms the input to the local word grouper (LWG).
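The morphological analysis step can be sketched as a toy lookup plus suffix stripping. This is an illustration only. The lexicon entries, the suffix table (`ata`, `ana`, `a`), the feature labels, and the function names `analyze` and `analyze_sentence` are assumptions. They are modelled on the verb forms 'jotata', 'jota' and 'jotana' that appear in examples A.1 to A.3, and a practical analyzer would need much larger paradigm tables.

```python
# Hypothetical toy morphological analyzer (illustration only, not the paper's).
# It looks a word up in a small lexicon; failing that, it strips the longest
# known suffix and checks whether the remainder is a known verb root.
from typing import Dict, List, Optional

# Closed-class words and noun stems: word -> features (assumed entries).
LEXICON: Dict[str, Dict[str, str]] = {
    "Ram": {"root": "Ram", "cat": "N"},
    "khet": {"root": "khet", "cat": "N"},
    "ne": {"root": "ne", "cat": "PSP"},
    "ko": {"root": "ko", "cat": "PSP"},
    "se": {"root": "se", "cat": "PSP"},
    "hai": {"root": "hai", "cat": "AUX"},
}

# 'jot' = plough (from A.1-A.3); 'pad' is the assumed root of 'pada' in A.3.
VERB_ROOTS = {"jot", "pad"}

# Verb suffixes in this transliteration, longest first (assumed labels).
VERB_SUFFIXES = [("ata", "imperfective"), ("ana", "infinitive"), ("a", "perfective")]


def analyze(word: str) -> Optional[Dict[str, str]]:
    """Return the root and grammatical features of `word`, or None if unknown."""
    if word in LEXICON:
        return dict(LEXICON[word])
    for suffix, form in VERB_SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in VERB_ROOTS:
            return {"root": stem, "cat": "V", "form": form}
    return None


def analyze_sentence(sentence: str) -> List[Optional[Dict[str, str]]]:
    """Analyze each whitespace-separated word after dropping a final period."""
    return [analyze(w) for w in sentence.rstrip(".").split()]


if __name__ == "__main__":
    for word in "Ram ne khet ko jota".split():
        print(word, analyze(word))
```

Trying the longest suffix first keeps 'jotata' from being misread as a stem 'jotat' plus the perfective '-a'.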

2. Local Word Grouper (LWG)

The function of this block is to form the word groups on the basis of 'local information' (i.e., information based on adjacent words), which will need no revision later on. This implies that whenever there is a possibility of more than one grouping for some word, the words will not be grouped together by the LWG. This block has been introduced to reduce the load on the core parser, resulting in increased efficiency and simplicity of the overall system. The following example illustrates the job done by the LWG. In the following sentence in Hindi:

    ladake  adhyapak  ko  haar     pahana   rahe  hein
    boys    teacher   to  garland  garland  -ing  are
    (Boys are garlanding the teacher.)

the output corresponding to the word 'ladake' forms one unit, the words 'adhyapak' and 'ko' form the next unit, and similarly 'pahana', 'rahe' and 'hein' form the last unit.

3. Core Parser

The function of the core parser is to accept the input from the LWG and produce an 'intermediate language' representation (i.e., the parsed structure along with the identified karaka roles) of the given source language sentence. The core parser has to perform essentially two kinds of tasks: (1) karaka role assignment for verbs, and (2) sense disambiguation for verbs and nouns. For translating among Indian languages, assignment of karaka roles is sufficient; one need not do the semantic role assignment after the karaka assignment. Let us now look at the grammar.

3.1 Grammar Formalism

The notion of karaka* relation is central to the model. Karaka relations are semantico-syntactic relations between the verb(s) and the nominals in a sentence. The computational grammar specifies a mapping from the nominals and the verb(s) in a sentence to karaka relations between them. Similarly, other rules of grammar provide a mapping from karaka relations to (deep) semantic relations between the verb(s) and the nominals. Thus, the karaka relations by themselves do not give the semantics. They specify relations which mediate between the vibhakti of nominals and the verb form on one hand and semantic relations on the other [Bharati, Chaitanya, Sangal, 90].

For each verb, there is a default karaka chart for one of its forms, called the basic form. The default karaka chart specifies a mapping from vibhaktis to karakas when that verb-form is used in a sentence. (The karaka chart has additional information besides vibhakti, pertaining to the 'yogyata' of the nominals. This serves to reduce the number of possible parses. Yogyata gives the semantic type that must be satisfied by the word group that serves in the karaka role.)

When a verb-form other than the basic one occurs in a sentence, the applicable karaka chart is obtained by taking the default karaka chart and transforming it using the verb type and its form. The new karaka chart defines the mapping from vibhakti to karaka relations for the sentence. Thus, for example, 'jotata hai' (ploughs) in A.1 has the default karaka chart, which says that the karta takes no parsarg (Ram). However, for 'jota' (ploughed) in A.2, or A.4, the karaka chart is transformed so that the karta takes the vibhakti 'ne', 'ko' or 'se',

A.1  Ram  khet  ko          jotata  hai.
     Ram  farm  ko-parsarg  plough  -s.
     (Ram ploughs his farm.)

A.2  Ram  ne  khet  ko  jota.
     Ram  ne  farm  ko  ploughed.
     (Ram ploughed the farm.)

A.3  Ram  ko  khet  jotana  pada.
     Ram  ko  farm  plough  had-to.
     (Ram had to plough the farm.)

*Here, we use the word 'karaka' in an extended sense which includes 'hetu', 'tadarthya' etc. in addition to actual karakas.
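The karaka chart, its transformation, and the constraint-solving step can be sketched together. Everything named below is hypothetical. `DEFAULT_CHART` encodes the default chart for 'jot' described above: the karta takes no parsarg, and the karma is assumed to take 'ko' or no parsarg, as in A.1 to A.3. `KARTA_VIBHAKTI` is an assumed transformation table that covers only the 'ne' case (A.2) and the 'ko' case (A.3). Yogyata is left out. The paper uses an integer-programming solver. In its place, `assign` enumerates the 0-1 variables x[g][k] (group g fills karaka k) directly, which is only practical for short sentences.

```python
# Hypothetical sketch: karaka charts, a chart transformation, and karaka
# assignment by exhaustive search over 0-1 choices. This stands in for the
# integer-programming solver the paper uses; names and rules are assumptions.
from itertools import product
from typing import Dict, List, Set, Tuple

# karaka -> (mandatory?, allowed vibhaktis); "0" means no parsarg.
Chart = Dict[str, Tuple[bool, Set[str]]]

# Default chart for the basic form of 'jot' (plough), as in A.1.
DEFAULT_CHART: Chart = {
    "karta": (True, {"0"}),
    "karma": (True, {"ko", "0"}),
}

# Assumed transformation rules: verb form -> new vibhakti set for the karta.
KARTA_VIBHAKTI = {"perfective": {"ne"}, "obligational": {"ko"}}


def transform(chart: Chart, verb_form: str) -> Chart:
    """Return the applicable chart for a non-basic verb form."""
    new = dict(chart)
    if verb_form in KARTA_VIBHAKTI:
        mandatory, _ = chart["karta"]
        new["karta"] = (mandatory, KARTA_VIBHAKTI[verb_form])
    return new


def assign(groups: List[Tuple[str, str]], chart: Chart) -> List[Dict[str, str]]:
    """All assignments of (noun, vibhakti) groups to karakas such that each
    group fills exactly one karaka whose chart licenses its vibhakti, no
    karaka is filled twice, and every mandatory karaka is filled."""
    karakas = list(chart)
    solutions = []
    # Picking one karaka per group enforces sum_k x[g][k] = 1 for every g;
    # the remaining integer-programming constraints are checked below.
    for choice in product(karakas, repeat=len(groups)):
        if len(set(choice)) != len(choice):
            continue  # sum_g x[g][k] <= 1 violated
        if any(vib not in chart[k][1] for (_, vib), k in zip(groups, choice)):
            continue  # x[g][k] = 1 where the chart forbids it
        if any(chart[k][0] and k not in choice for k in karakas):
            continue  # mandatory karaka left unfilled
        solutions.append({k: noun for (noun, _), k in zip(groups, choice)})
    return solutions


if __name__ == "__main__":
    a2_groups = [("Ram", "ne"), ("khet", "ko")]  # A.2
    print(assign(a2_groups, DEFAULT_CHART))  # []
    print(assign(a2_groups, transform(DEFAULT_CHART, "perfective")))
    # [{'karta': 'Ram', 'karma': 'khet'}]
```

Under the default chart, A.2 has no valid assignment. Under the chart transformed for the perfective form, it has exactly one. Two nominals that both lack a parsarg would give two assignments under the default chart. That is the kind of ambiguity the yogyata information is meant to reduce.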

Typically, a number of source word groups will qualify for a particular