Experiments in Indian Language Dependency Parsing

Bharat Ram Ambati, Phani Gadde and Karan Jindal
Language Technologies Research Centre,
International Institute of Information Technology, Hyderabad, India – 500032
{ambati,phani.gadde}@research.iiit.ac.in, [email protected]
Abstract

In this paper we present our experiments in parsing three Indian languages, namely Hindi, Telugu and Bangla. We explore two data-driven parsers, Malt and MST, and compare the results of both. We describe the data and parser settings used in detail; some of these are specific to one particular Indian language or to all of them. We report our results on the test data of the ICON tools contest. The averages of the best unlabeled attachment, labeled attachment and labeled accuracies are 88.43%, 71.71% and 73.81% respectively.
1 Introduction
Parsing is one of the major tasks that help in understanding natural language, and it is useful in several natural language applications: machine translation, anaphora resolution, word sense disambiguation, question answering and summarization are a few of them. This need has led to the development of grammar-driven, data-driven and hybrid parsers. Due to the availability of annotated corpora in recent years, data-driven parsing has achieved considerable success. The availability of phrase structure treebanks for English has seen the development of many efficient parsers.

Indian languages are morphologically rich, free word order languages. It has been suggested that free word order languages can be handled better using a dependency-based framework than a constituency-based one (Hudson, 1984; Shieber, 1985; Mel'čuk, 1988; Bharati et al., 1995). As a result, dependency annotation using the Paninian framework has been started for Indian languages (Begum et al., 2008). There have been some previous attempts at parsing Hindi following a constraint-based approach (Bharati and Sangal, 1993; Bharati et al., 2002, 2008b). With the availability of a treebank for Hindi, attempts have also been made at building statistical (Bharati et al., 2008a; Husain et al., 2009; Ambati et al., 2009) and hybrid parsers (Bharati et al., 2009). In all these approaches, both syntactic and semantic cues are explored to reduce the confusion between ambiguous dependency tags.

In this paper we describe in detail our experiments in parsing three Indian languages, namely Hindi, Telugu and Bangla. Some of the settings are specific to one particular Indian language or to all of them, and some apply in general to any kind of language. We explore two data-driven parsers, Malt and MST, and compare the results of both. We report our results on the ICON tools contest test data. The averages of the best unlabeled attachment, labeled attachment and labeled accuracies are 88.43%, 71.71% and 73.81% respectively.

The paper is arranged as follows. In Section 2 we present general information about the contest data. In Section 3 we describe our approach to parsing. Section 4 describes the data and parser settings for all three languages. We present our results in Section 5 and conclude in Section 6.
2 Tools Contest
In the ICON tools contest 2009, we are provided with treebanks of three Indian languages, namely Hindi, Telugu and Bangla[1]. Hindi and Bangla are Indo-Aryan languages, while Telugu is a Dravidian language which is agglutinative in nature. Table 1 shows the general statistics of the data.

[1] Hindi, Telugu and Bangla are official languages of India. They are respectively the fourth, seventh and fourteenth most widely spoken languages in the world. For complete details, see http://en.wikipedia.org/wiki/Ethnologue_list_of_most-spoken_languages
                 Hindi    Telugu   Bangla
    Sentences    1800     1756     1279
    Words        25420    9482     12183
    Unique Words 5465     3201     4283
    Chunks       16185    6753     8221
Table 1. Statistics of the data.

In the contest we are provided two types of data for all three languages, which differ only in the tagset: one is a fine-grained and the other a coarse-grained dependency tagset. There are around 30 tags for Telugu and Bangla and 50 tags for Hindi in the fine-grained tagset, whereas the coarse-grained tagset has 12 tags for all three languages.
3 Approach
We used two data-driven parsers, Malt[2] (Nivre et al., 2007a) and MST[3] (McDonald et al., 2005b), for our experiments.

Malt is a classifier-based shift/reduce parser. It uses the arc-eager, arc-standard, covington projective and covington non-projective algorithms for parsing (Nivre, 2006). History-based feature models are used for predicting the next parser action (Black et al., 1992), and support vector machines for mapping histories to parser actions (Kudo and Matsumoto, 2002). It uses graph transformations to handle non-projective trees (Nivre and Nilsson, 2005).

MST uses the Chu-Liu-Edmonds (Chu and Liu, 1965; Edmonds, 1967) maximum spanning tree algorithm for non-projective parsing and Eisner's algorithm for projective parsing (Eisner, 1996). It uses online large-margin learning as the learning algorithm (McDonald et al., 2005a).

Malt provides an XML file in which we can specify the features for the parser, but for MST these features are hard-coded. The accuracy of MST's labeler is very low; we tried to modify the code but could not get better results. So we used a maximum entropy classifier, maxent[4], for labeling: we first ran MST to obtain the unlabeled dependency tree, and on the output of MST we used the maximum entropy algorithm for labeling (Dai et al., 2009).
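To make the transition-based strategy concrete, the following minimal Python sketch implements the four arc-eager transitions. The choose() function stands in for the SVM classifier that Malt actually learns from history-based features; it is an assumption of this sketch, not part of either parser's API.

    def arc_eager_parse(n, choose):
        """Parse a sentence of n words (ids 1..n; 0 is the artificial root).
        Returns arcs: a dict mapping each dependent to its head."""
        stack, buffer, arcs = [0], list(range(1, n + 1)), {}
        while buffer:
            action = choose(stack, buffer, arcs)   # SVM decision in Malt
            if action == "LEFT-ARC" and stack[-1] != 0 and stack[-1] not in arcs:
                arcs[stack.pop()] = buffer[0]      # buffer front governs stack top
            elif action == "RIGHT-ARC":
                arcs[buffer[0]] = stack[-1]        # stack top governs buffer front
                stack.append(buffer.pop(0))
            elif action == "REDUCE" and stack[-1] in arcs:
                stack.pop()                        # stack top already has a head
            else:                                  # SHIFT
                stack.append(buffer.pop(0))
        return arcs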
4 Settings

4.1 Input Data

Both parsers take CoNLL format as input, so we converted the data to CoNLL format for our experiments. The FEATS column of each node in the data has six fields; these are six morphological features, namely the category, gender, number, person, vibhakti[5] and TAM[6] markers of the node. We experimented with different combinations of these fields for both parsers. For all three languages, the vibhakti and TAM fields gave better results than the others. This is similar to the settings of Bharati et al. (2008a), who showed that for Hindi, vibhakti and TAM markers help in dependency parsing whereas gender, number and person markers do not.
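To make this input transformation concrete, here is a small Python sketch that reduces FEATS to just these two fields, assuming the field order described above; the file name is a hypothetical placeholder.

    def keep_vibhakti_tam(line):
        """Reduce the FEATS column of one CoNLL-X line to its vibhakti and
        TAM fields (assumed to be fields 5 and 6 of FEATS, per the order
        category, gender, number, person, vibhakti, TAM)."""
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 10:                  # blank line between sentences
            return line.rstrip("\n")
        feats = cols[5].split("|")           # FEATS is the sixth CoNLL column
        cols[5] = "|".join(feats[4:6]) or "_"
        return "\t".join(cols)

    with open("hindi_train.conll") as f:     # hypothetical file name
        reduced = [keep_vibhakti_tam(l) for l in f]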
4.2 Malt Settings
Malt provides options for four parsing algorithms: arc-eager, arc-standard, covington projective and covington non-projective. We experimented with all the algorithms for all three languages on both tagsets. Tuning the SVM model was difficult; we tried various parameters but could not find any fixed pattern. Finally, we tested the performance by adapting the CoNLL 2007 shared task (Nivre et al., 2007b) settings used by the same parser for various languages (Hall et al., 2007). For the feature model, after exploring generally useful features, we experimented with different combinations of the settings used in the CoNLL 2007 shared task for various languages. Table 2 shows the best settings for all three languages on both tagsets.
                     Algorithm      SVM Settings
    Hindi - fine     arc-eager      Turkish settings
    Telugu - fine    arc-eager      Portuguese settings
    Bangla - fine    arc-eager      Portuguese settings
    Hindi - coarse   arc-standard   English settings
    Telugu - coarse  arc-eager      Czech settings
    Bangla - coarse  arc-eager      Czech settings

Table 2. Malt Settings
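For concreteness, the sketch below shows how one of the Table 2 configurations might be trained from Python. The single-letter options follow MaltParser's documented command line of that era, but the exact values, file names and feature model file here are illustrative assumptions, not our actual setup.

    import subprocess

    subprocess.run([
        "java", "-jar", "malt.jar",
        "-m", "learn",                 # train a parsing model
        "-c", "hindi_fine",            # name of the configuration / model
        "-i", "hindi_train.conll",     # training data in CoNLL format
        "-a", "nivreeager",            # arc-eager; also nivrestandard, covproj, covnonproj
        "-F", "features.xml",          # feature model specification file
        "-l", "libsvm",                # SVM learner, as described in Section 3
    ], check=True)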
[2] Malt version 1.2.
[3] MST version 0.4b.
[4] http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html
[5] Vibhakti is a generic term for preposition, post-position and suffix.
[6] TAM is the Tense, Aspect and Modality marker.
4.3 MST Settings
MST provides options for two algorithms, projective and non-projective. It also provides options to select features over a single edge (order=1) or over pairs of adjacent edges in the tree (order=2). We can also specify k-best parses during training using the training-k attribute. Table 3 shows the best settings for all three languages on both tagsets.
                           algorithm        training-k   order
    Hindi (coarse, fine)   non-projective   5            2
    Telugu (coarse, fine)  non-projective   1            1
    Bangla (coarse, fine)  non-projective   5            2

Table 3. MST Settings
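The non-projective decoding step can be illustrated with a minimal edge-factored Chu-Liu-Edmonds sketch in Python. It assumes the arc scores are already available as a matrix (in MST they come from the trained large-margin model) and recovers the highest-scoring tree by greedy head selection plus cycle contraction.

    import numpy as np

    NEG = float("-inf")

    def _find_cycle(heads):
        """Return one cycle in the head pointers, or None."""
        state = [0] * len(heads)                 # 0 new, 1 on path, 2 done
        for start in range(len(heads)):
            path, v = [], start
            while v != -1 and state[v] == 0:
                state[v] = 1
                path.append(v)
                v = heads[v]
            if v != -1 and state[v] == 1:        # walked into our own path
                return path[path.index(v):]
            for u in path:
                state[u] = 2
        return None

    def chu_liu_edmonds(score):
        """score[h, d]: score of arc h -> d; node 0 is the root.
        Returns heads[d] for every node (heads[0] = -1)."""
        n = score.shape[0]
        score = score.copy()
        score[:, 0] = NEG                        # nothing attaches to the root
        np.fill_diagonal(score, NEG)             # no self-loops
        heads = list(np.argmax(score, axis=0))   # greedy best head per node
        heads[0] = -1
        cycle = _find_cycle(heads)
        if cycle is None:
            return heads
        # Contract the cycle into one node, recurse, then expand.
        in_cyc = set(cycle)
        cyc_score = sum(score[heads[v], v] for v in cycle)
        rest = [v for v in range(n) if v not in in_cyc]  # rest[0] == 0 (root)
        m = len(rest)
        new = np.full((m + 1, m + 1), NEG)
        enter, leave = {}, {}
        for i, u in enumerate(rest):
            for j, v in enumerate(rest):
                new[i, j] = score[u, v]
            # arc u -> cycle: break the cycle arc into the entry node
            gains = [score[u, v] - score[heads[v], v] for v in cycle]
            k = int(np.argmax(gains))
            enter[u], new[i, m] = cycle[k], cyc_score + gains[k]
            # arc cycle -> u: leave from the best cycle node
            k = int(np.argmax([score[v, u] for v in cycle]))
            leave[u], new[m, i] = cycle[k], score[cycle[k], u]
        sub = chu_liu_edmonds(new)
        for i, u in enumerate(rest[1:], start=1):
            heads[u] = leave[u] if sub[i] == m else rest[sub[i]]
        entry = enter[rest[sub[m]]]              # cycle arc replaced on expansion
        heads[entry] = rest[sub[m]]
        return heads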
With the original MST parser, the labeled accuracy is very low. This is because only minimal features are used for labeling: features from the FEATS column of the CoNLL format are not used, so the vibhakti and TAM markers, which are crucial for labeling, are ignored by the parser. We modified the code so that these markers are used (Bharati et al., 2008a). We also tried to add much richer context features, like sibling features, but that modification of the code is more complex: it would require changing the entire data structure used for the labeling task. Because of this, we used maxent for labeling.
4.4 Maximum Entropy Settings
The unlabeled dependency tree given by MST is passed to maxent for labeling. We did several experiments with the different options provided by the maxent tool; the best results were obtained when the number of iterations is 50. As we have the complete tree information, we used edge features as well as context features. The nodes and features we experimented with are listed below.
Nodes
• CN: Current node
• PN: Parent node
• RLS: Right-most left sibling
• LRS: Left-most right sibling
• CH: Children

Features
• W: Lexical item
• R: Root form of the word
• P: Part-of-speech tag
• CP: Coarse POS tag
• VT: Vibhakti or TAM markers
• D: Direction of the dependency arc
• SC: Number of siblings
• CC: Number of children
• DS: Difference in positions of the node and its parent
• PL: POS tag list from the dependent to the tree's root along the dependency path
Table 4 shows the best settings for all three languages individually.

                            Features
    Hindi (fine, coarse)    CN: W,R,P,CP,VT,R+P; PN: R,P,CP,VT; RLS: R,CP,VT;
                            LRS: R,CP,VT; CH: R,VT; D, SC, DS, CC
    Telugu (fine, coarse)   CN: W,R,P,CP,VT,R+P; PN: R,P,CP,VT; RLS: R,CP,VT;
                            LRS: R,CP,VT; CH: R,CP,VT; D, SC, DS
    Bangla (fine, coarse)   CN: W,R,P,CP,VT; PN: R,P,CP,VT; RLS: R,CP,VT;
                            LRS: R,CP,VT; D, SC, DS

Table 4. maxent Settings (CN: W represents the lexical item (W) of the current node (CN))
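As an illustration of this labeling stage, the sketch below extracts a subset of the Table 4 features for one node and trains a maximum entropy model. scikit-learn's logistic regression (a maxent model) stands in here for Zhang Le's maxent toolkit that we actually used, and the tree representation (a list of dicts with a dummy root at index 0) is an assumption of the sketch.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def node_features(tree, i):
        """tree: list of token dicts (index 0 is a dummy root) with keys
        word, root, pos, cpos, vt and head (index of the parent)."""
        n = tree[i]
        p = tree[n["head"]]                                   # parent node (PN)
        return {
            "CN_W": n["word"], "CN_R": n["root"], "CN_P": n["pos"],
            "CN_CP": n["cpos"], "CN_VT": n["vt"],             # current node (CN)
            "PN_R": p["root"], "PN_P": p["pos"],
            "PN_CP": p["cpos"], "PN_VT": p["vt"],             # parent node
            "D": "left" if i < n["head"] else "right",        # arc direction
            "DS": abs(i - n["head"]),                         # node-parent distance
            "SC": sum(1 for m in tree[1:] if m["head"] == n["head"]) - 1,
        }

    # Logistic regression is a maxent model; max_iter=50 mirrors the best
    # iteration count reported above, though the optimizers differ.
    labeler = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=50))
    # labeler.fit(feature_dicts, dependency_labels)   # one instance per node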
4.5 Some General Hindi-Specific Features
For Hindi we also explored using TAM classes instead of TAM markers (Bharati et al., 2008a): TAM markers which behave similarly are grouped into a class. This reduced the number of features and the training time, but there was no significant improvement in accuracy.

Genitive vibhakti markers (kA, ke, kI) are the major clue for the 'r6' relation, but in the case of pronouns the vibhakti feature does not provide this information. However, the category given by the morphological analyzer has a special value 'sh_P' in such cases. Using this information and the suffix of the pronoun, we gave kA, ke or kI as the vibhakti feature accordingly.
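A minimal sketch of this pronoun fix follows, with the suffix test simplified to a plain string match; the exact suffix logic here is an assumption for illustration.

    def pronoun_genitive_vibhakti(category, wordform):
        """Return kA / ke / kI for genitive pronouns flagged 'sh_P' by the
        morph analyzer, based on the pronoun's suffix; None otherwise."""
        if category != "sh_P":
            return None
        for suffix in ("kA", "ke", "kI"):
            if wordform.endswith(suffix):
                return suffix
        return None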
4.6 Clause Boundary Information as a Feature for Hindi

4.6.1 Why Clause Boundary?
Traditionally, a clause is defined as a phrase containing at least a verb and a subject. It can be an independent clause or a dependent clause, based on whether or not it can stand alone when taken in isolation. By this definition, the words inside a clause form a set of modifier-modified relations, thereby forming a meaningful unit, like a sentence. This makes most of the dependents of the words in a clause be words in the same clause; in other words, the dependencies of the words in a clause are largely localized to the clause boundary. Given a sentence, the parser has to disambiguate between several words in the sentence to find the parent of a particular word. Making the parser use the clause boundary information enables it to localize its decisions to some extent, which results in better performance. The search space of the parser can be reduced to a good extent if we solve the relatively small problem of identifying the clauses. This is the whole idea behind partial parsing, where we sacrifice completeness for efficiency and still get valuable information about the sentence. Interestingly, it has been shown recently that most of the non-projective cases in Hindi are inter-clausal (Mannem and Chaudhry, 2009). Identifying clause boundaries, therefore, should prove helpful in parsing non-projective structures. The same holds true for long-distance dependencies.
4.6.2 Clause Boundary Identifier
We used the Stage 1 parser of Husain et al. (2009) for the clause boundary information. We used this partial parser as a pre-processing step to get the clause boundaries. It uses Malt to identify just the intra-clausal relations. To achieve this, they introduce a special dummy node named _ROOT_ which becomes the head of the sentence; all the clauses are connected to this dummy node with a dummy relation, so in effect the parser gives only intra-clausal relations. We did extensive experiments on Hindi, trying different definitions of clauses, to find an optimal clause definition for the partial parsing. Since some of these experiments need to be analyzed in more detail for Bangla and Telugu, we limited ourselves to using the clause boundary information in Hindi parsing.

Since the above tool parses all the clauses, we also have information about the structure of each clause, along with the clause boundaries. We did several experiments trying to utilize all the information given by the partial parser. The best results were obtained when the clause boundary information, along with the head information for each clause, was given as a feature to each node. The accuracies of the clausal features are given in Table 5.
                                   Precision   Recall
    Clause Boundary Information    84.83%      91.23%
    Head Information               92.42%      99.40%
Table 5. Accuracies of the features being used.

Since the clause inclusion features for each node cannot be given directly in either Malt or MST, we modified the parsers to make them handle this kind of feature. The best results using clause boundary information were obtained with a modified version of MST. We first experimented with giving only the clause inclusion (boundary) information to each node; this can help the parser reduce its search space in parsing decisions. Then we also provided head and non-head information (whether the node is the head of its clause or not). The head or non-head information helps in handling complex sentences with multiple clauses, where each verb in the sentence has its own argument structure. The best accuracy for the MST parser reported in this paper is obtained when both of these features are used. For Hindi, we used the clause boundary features for unlabeled parsing; the output is then given to maxent for labeling.
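The sketch below shows one way these two clause features could be appended to each node's FEATS column. The (start, end, head) span representation for clauses is an assumed encoding of the Stage 1 parser's output, not its actual format.

    def add_clause_features(tokens, clauses):
        """tokens: list of dicts with a 'feats' string (CoNLL FEATS column).
        clauses: list of (start, end, head) index spans produced by the
        clause boundary identifier (assumed representation)."""
        for cid, (start, end, head) in enumerate(clauses):
            for i in range(start, end + 1):
                is_head = "y" if i == head else "n"  # node heads its clause?
                tokens[i]["feats"] += "|cl=%d|clhead=%s" % (cid, is_head)
        return tokens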
5 Experiments and Results
We merged the training and development data and did 5-fold cross-validation to tune the parsers. We extracted the best settings from the cross-validation experiments and applied them to the test data of the contest. The test data contains 150 sentences for each of the three languages. Table 6 shows the individual and average results on both the coarse-grained and fine-grained tagsets for all three languages' test data using both approaches.
                       Hindi-  Telugu- Bangla- Average- Hindi-  Telugu- Bangla- Average- Average-
                       fine    fine    fine    fine     coarse  coarse  coarse  coarse   total
    Malt         UAS   90.14   86.28   88.45   88.29    90.22   85.25   90.22   88.56    88.43
                 LAS   74.48   60.55   72.63   69.22    79.33   65.01   78.25   74.2     71.71
                 L     76.38   61.58   75.34   71.1     81.66   66.21   81.69   76.52    73.81
    MST+maxent   UAS   91.26   84.56   87.51   87.78    90.48   85.42   87.41   87.77    87.77
                 LAS   73.79   57.12   69.61   66.84    79.15   63.12   75.03   72.43    69.64
                 L     76.04   58.15   73.15   69.11    81.75   64.67   79.19   75.2     72.16

Table 6. Results on Test Data (UAS: unlabeled attachment score; LAS: labeled attachment score; L: labeled accuracy)
6 Conclusions and Future Directions
For all the languages, Malt performed better than MST+maxent. We would like to do a proper error analysis after the release of the gold test data. We have modified the implementation of MST to handle vibhakti and TAM markers for labeling. We observed that even during unlabeled parsing, some features that might not be useful for parsing are being used. We would like to modify the implementation to experiment with features for unlabeled parsing as well.

Acknowledgments

We would like to express our gratitude to Dr. Dipti Misra Sharma and Prof. Rajeev Sangal for their guidance and support. We would also like to thank Mr. Samar Husain for his valuable suggestions. Mr. Sambhav Jain worked on a series of initial experiments on Hindi parsing.
References

B. R. Ambati, P. Gade and C. GSK. 2009. Effect of Minimal Semantics on Dependency Parsing. In Proceedings of the RANLP 2009 Student Research Workshop.

R. Begum, S. Husain, A. Dhwaj, D. M. Sharma, L. Bai and R. Sangal. 2008. Dependency annotation scheme for Indian languages. In Proceedings of IJCNLP-2008. http://www.iiit.net/techreports/2007_78.pdf

A. Bharati, S. Husain, B. Ambati, S. Jain, D. Sharma and R. Sangal. 2008a. Two semantic features make all the difference in parsing accuracy. In Proceedings of ICON-08.

A. Bharati, S. Husain, D. M. Sharma and R. Sangal. 2008b. A Two-Stage Constraint Based Dependency Parser for Free Word Order Languages. In Proceedings of the COLIPS International Conference on Asian Language Processing (IALP) 2008, Chiang Mai, Thailand.

A. Bharati, S. Husain, D. M. Sharma and R. Sangal. 2009. Two stage constraint based hybrid approach to free word order language dependency parsing. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT-09), Paris.

A. Bharati, R. Sangal and T. P. Reddy. 2002. A Constraint Based Parser Using Integer Programming. In Proceedings of ICON-2002. www.iiit.net/techreports/2002_3.pdf

A. Bharati, V. Chaitanya and R. Sangal. 1995. Natural Language Processing: A Paninian Perspective. Prentice-Hall of India, New Delhi, pp. 65-106. ltrc.iiit.ac.in/downloads/nlpbook/nlp-panini.pdf

A. Bharati and R. Sangal. 1993. Parsing Free Word Order Languages in the Paninian Framework. In Proceedings of ACL-93.

E. Black, F. Jelinek, J. D. Lafferty, D. M. Magerman, R. L. Mercer and S. Roukos. 1992. Towards history-based grammars: Using richer models for probabilistic parsing. In Proceedings of the 5th DARPA Speech and Natural Language Workshop, pages 31-37.

Y. J. Chu and T. H. Liu. 1965. On the shortest arborescence of a directed graph. Science Sinica, 14:1396-1400.

Q. Dai, E. Chen and L. Shi. 2009. An iterative approach for joint dependency parsing and semantic role labeling. In Proceedings of the 13th Conference on Computational Natural Language Learning (CoNLL-2009), June 4-5, Boulder, Colorado, USA.

J. Edmonds. 1967. Optimum branchings. Journal of Research of the National Bureau of Standards, 71B:233-240.

J. Eisner. 1996. Three new probabilistic models for dependency parsing: An exploration. In Proceedings of COLING-96, pages 340-345.

J. Hall, J. Nilsson, J. Nivre, G. Eryigit, B. Megyesi, M. Nilsson and M. Saers. 2007. Single Malt or Blended? A Study in Multilingual Parser Optimization. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 933-939.

S. Husain, P. Gadde, B. Ambati, D. M. Sharma and R. Sangal. 2009. A modular cascaded approach to complete parsing. In Proceedings of the COLIPS International Conference on Asian Language Processing (IALP) 2009.

R. Hudson. 1984. Word Grammar. Basil Blackwell, Oxford.

T. Kudo and Y. Matsumoto. 2002. Japanese dependency analysis using cascaded chunking. In CoNLL-2002, pages 63-69.

P. Mannem and H. Chaudhry. 2009. Insights into Non-projectivity in Hindi. In Proceedings of the ACL-IJCNLP Student Research Workshop.

R. McDonald, K. Crammer and F. Pereira. 2005a. Online large-margin training of dependency parsers. In Proceedings of ACL 2005, pages 91-98.

R. McDonald, F. Pereira, K. Ribarov and J. Hajic. 2005b. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of HLT/EMNLP, pages 523-530.

I. A. Mel'čuk. 1988. Dependency Syntax: Theory and Practice. State University Press of New York.

J. Nivre and R. McDonald. 2008. Integrating Graph-Based and Transition-Based Dependency Parsers. In Proceedings of ACL-2008.

J. Nivre, J. Hall, J. Nilsson, A. Chanev, G. Eryigit, S. Kübler, S. Marinov and E. Marsi. 2007a. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2):95-135.

J. Nivre, J. Hall, S. Kübler, R. McDonald, J. Nilsson, S. Riedel and D. Yuret. 2007b. The CoNLL 2007 Shared Task on Dependency Parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007.

J. Nivre. 2006. Inductive Dependency Parsing. Springer.

J. Nivre and J. Nilsson. 2005. Pseudo-projective dependency parsing. In Proceedings of ACL-2005, pages 99-106.

S. M. Shieber. 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy, 8:334-343.