Selectional restrictions usually specify semantic types. Here we use karaka relations, and specify not just semantic types but also post- ...
A Karaka Based Approach to Parsing of Indian Languages

Akshar Bharati
Rajeev Sangal
Department of Computer Science and Engineering
Indian Institute of Technology Kanpur
Kanpur 208 016
India

Abstract

A karaka based approach for parsing of Indian languages is described. It has been used for building a parser of Hindi for a prototype Machine Translation system.

A lexicalised grammar formalism has been developed that allows constraints to be specified between 'demand' and 'source' words (e.g., between verb and its karaka roles). The parser has two important novel features: (i) It has a local word grouping phase in which word groups are formed using 'local' information only. They are formed based on finite state machine specifications, thus resulting in a fast grouper. (ii) The parser is a general constraint solver. It first transforms the constraints to an integer programming problem and then solves it.
1. Introduction
Languages belonging to the Indian linguistic area share several common features. They are relatively word order free, nominals are inflected or have postposition case markers (collectively described as having vibhakti), they have verb complexes consisting of sequences of verbs (possibly joined together into a single word), etc. There are also commonalities in vocabulary, in the senses spanned by a word in one language and those of its counterpart in another Indian language, etc.
We base our grammar on the karaka (pronounced kaarak) structure. It is necessary to mention that although karakas are thought of as similar to cases, they are fundamentally different:

"The pivotal categories of the abstract syntactic representation are the karakas, the grammatical functions assigned to nominals in relation to the verbal root. They are neither semantic nor morphological categories in themselves but correspond to semantics according to rules specified in the grammar and to morphology according to other rules specified in the grammar." [Kiparsky, 82].

Before describing our grammar formalism, let us look at the parser structure:

[Figure: parser structure. Main flow: sentence -> morphological analyzer -> lexical entries -> local word grouper -> word groups -> core parser -> intermediate representation. Side inputs: active lexicon -> morphological analyzer; verb form chart -> local word grouper; karaka chart & lakshan charts -> core parser.]

The function of the morphological analyzer is to take each word in the input sentence and extract its root and other associated grammatical information. This information forms the input to the local word grouper (LWG).
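As a rough illustration of the first two stages of the parser structure described above, the following Python sketch looks up each word in a toy lexicon and then groups adjacent words with a small state machine. The lexicon entries (including the roots), the category labels, the `Analysis` record, and the function names `analyze` and `group_words` are all assumptions made for this example; the finite state specifications actually used by the parser are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: surface form -> (root, category).
# Roots and categories are illustrative assumptions only.
LEXICON = {
    "ladake": ("ladaka", "noun"),
    "adhyapak": ("adhyapak", "noun"),
    "ko": ("ko", "postposition"),
    "haar": ("haar", "noun"),
    "pahana": ("pahana", "verb"),
    "rahe": ("raha", "aux"),
    "hein": ("hai", "aux"),
}


@dataclass(frozen=True)
class Analysis:
    word: str
    root: str
    category: str


def analyze(sentence):
    """Morphological analysis stage: attach a root and category to each word."""
    return [Analysis(w, *LEXICON[w]) for w in sentence.split()]


def group_words(analyses):
    """Local word grouping using only adjacent-word information.

    The state is the category of the word that opened the current group.
    A postposition attaches to a preceding noun group, an auxiliary attaches
    to a preceding verb group, and any other word opens a new group.
    """
    groups = []
    state = "start"
    for a in analyses:
        attaches = (
            (a.category == "postposition" and state == "noun")
            or (a.category == "aux" and state == "verb")
        )
        if attaches:
            groups[-1].append(a.word)
        else:
            groups.append([a.word])
            state = a.category
    return groups


groups = group_words(analyze("ladake adhyapak ko haar pahana rahe hein"))
```

For the Hindi sentence 'ladake adhyapak ko haar pahana rahe hein', this sketch yields the groups `['ladake']`, `['adhyapak', 'ko']`, `['haar']` and `['pahana', 'rahe', 'hein']`. Treating 'haar' as a unit of its own is a choice of the sketch.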
2. Local Word Grouper (LWG)

The function of this block is to form the word groups on the basis of 'local information' (i.e., information based on adjacent words) which will need no revision later on. This implies that whenever there is a possibility of more than one grouping for some word, they will not be grouped together by the LWG. This block has been introduced to reduce the load on the core parser, resulting in increased efficiency and simplicity of the overall system.

The following example illustrates the job done by the LWG. In the following sentence in Hindi:

    ladake  adhyapak  ko  haar     pahana   rahe  hein
    boys    teacher   to  garland  garland  -ing  are
    (Boys are garlanding the teacher.)

the output corresponding to the word 'ladake' forms one unit, the words 'adhyapak' and 'ko' form the next unit, and similarly 'pahana', 'rahe' and 'hein' form the last unit.

3. Core Parser

The function of the core parser is to accept the input from the LWG and produce an 'intermediate language' representation (i.e., the parsed structure along with the identified karaka roles) of the given source language sentence. The core parser has to perform essentially two kinds of tasks:

1) karaka role assignment for verbs
2) sense disambiguation for verbs and nouns.

For translating among Indian languages, assignment of karaka roles is sufficient. One need not do the semantic role assignment after the karaka assignment.

Let us now look at the grammar.

3.1 Grammar Formalism

The notion of karaka* relation is central to the model. These are semantico-syntactic relations between the verb(s) and the nominals in a sentence. The computational grammar specifies a mapping from the nominals and the verb(s) in a sentence to karaka relations between them. Similarly, other rules of grammar provide a mapping from karaka relations to (deep) semantic relations between the verb(s) and the nominals. Thus, the karaka relations by themselves do not give the semantics. They specify relations which mediate between vibhakti of nominals and verb form on one hand and semantic relations on the other [Bharati, Chaitanya, Sangal, 90].

For each verb, for one of its forms called basic, there is a default karaka chart. The default karaka chart specifies a mapping from vibhaktis to karakas when that verb-form is used in a sentence. (The karaka chart has additional information besides vibhakti, pertaining to the 'yogyata' of the nominals. This serves to reduce the possible parses. Yogyata gives the semantic type that must be satisfied by the word group that serves in the karaka role.)

When a verb-form other than the basic occurs in a sentence, the applicable karaka chart is obtained by taking the default karaka chart and transforming it using the verb type and its form. The new karaka chart defines the mapping from vibhakti to karaka relations for the sentence. Thus, for example, 'jotata hai' (ploughs) in A.1 has the default karaka chart, which says that karta takes no parsarg (Ram). However, for 'jota' (ploughed) in A.2, or A.4, the karaka chart is transformed so that the karta takes the vibhakti 'ne', 'ko' or 'se'.

    A.1  Ram  khet  ko           jotata  hai.
         Ram  farm  ko-parsarg   plough  -s.
         (Ram ploughs his farm.)

    A.2  Ram  ne   khet  ko   jota.
         Ram  ne-  farm  ko-  ploughed.
         (Ram ploughed the farm.)

    A.3  Ram  ko   khet  jotana  pada.
         Ram  ko-  farm  plough  had-to.
         (Ram had to plough the farm.)

*Here, we use the word 'karaka' in an extended sense which includes 'hetu', 'tadarthya' etc. in addition to actual karakas.
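The karaka chart transformation and the resulting role assignment for A.1 to A.3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the form labels `basic`, `past` and `obligation`, the karma entry of the chart (inferred from 'khet ko' and bare 'khet' in the examples), and all function names are hypothetical. Exhaustive enumeration stands in here for the integer programming formulation used by the parser, and yogyata checks are left out.

```python
from itertools import permutations

NO_PARSARG = ""  # marks a nominal with no parsarg

# Default chart for the basic verb form: karaka -> permitted vibhaktis.
# The karta entry follows the text; the karma entry is an assumption.
DEFAULT_CHART = {
    "karta": {NO_PARSARG},
    "karma": {"ko", NO_PARSARG},
}

# Hypothetical form labels mapped to the karta vibhakti after transformation:
# 'jota' (A.2) -> 'ne', 'jotana pada' (A.3) -> 'ko'.
KARTA_VIBHAKTI_BY_FORM = {
    "past": "ne",
    "obligation": "ko",
}


def karaka_chart(verb_form):
    """Return the default chart, transformed if the verb form is not basic."""
    chart = {karaka: set(vibs) for karaka, vibs in DEFAULT_CHART.items()}
    if verb_form != "basic":
        chart["karta"] = {KARTA_VIBHAKTI_BY_FORM[verb_form]}
    return chart


def assignments(groups, chart):
    """Enumerate one-to-one karaka assignments whose vibhaktis fit the chart.

    groups is a list of (nominal, vibhakti) pairs from the word grouper.
    """
    karakas = list(chart)
    found = []
    for chosen in permutations(groups, len(karakas)):
        if all(vib in chart[k] for k, (_, vib) in zip(karakas, chosen)):
            found.append({k: noun for k, (noun, _) in zip(karakas, chosen)})
    return found


a1 = assignments([("Ram", NO_PARSARG), ("khet", "ko")], karaka_chart("basic"))
a2 = assignments([("Ram", "ne"), ("khet", "ko")], karaka_chart("past"))
a3 = assignments([("Ram", "ko"), ("khet", NO_PARSARG)], karaka_chart("obligation"))
```

In each of the three sentences the sketch finds exactly one assignment, with 'Ram' as karta and 'khet' as karma. In A.3 the vibhakti 'ko' alone would permit 'Ram' as karma, but the bare 'khet' cannot be karta under the transformed chart, so the one-to-one requirement rules that reading out.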
Typically, a number of source word groups will qualify for a particular