Cross-framework parser stacking for data-driven dependency parsing

Lilja Øvrelid, Jonas Kuhn and Kathrin Spreyer
Department of Linguistics, University of Potsdam
Karl-Liebknecht-Str 24/25, 14476 Potsdam
{ovrelid,kuhn,spreyer}@uni-potsdam.de

ABSTRACT. In this article, we present and evaluate an approach to the combination of a grammar-driven and a data-driven parser which exploits machine learning for the acquisition of syntactic analyses guided by both parsers. We show how conversion of LFG output to dependency representation allows for a technique of parser stacking, whereby the output of the grammar-driven parser supplies features for a data-driven dependency parser. We evaluate on English and German and show significant improvements in overall parse results stemming from the proposed dependency structure as well as other linguistic features derived from the grammars. Finally, we perform an application-oriented evaluation and explore the use of the stacked parsers as the basis for the projection of dependency annotation to a new language.

RÉSUMÉ. Dans cet article, nous présentons et évaluons une approche permettant de combiner un analyseur fondé sur une grammaire et un analyseur fondé sur des données, en utilisant des méthodes d’apprentissage automatique pour produire des analyses syntaxiques guidées par les deux analyseurs. Nous montrons comment la conversion de la sortie d’un analyseur LFG en une représentation en dépendances permet d’utiliser une technique d’empilement d’analyseurs ("parser stacking"), dans laquelle la sortie de l’analyseur fondé sur une grammaire fournit des caractéristiques utilisables par un analyseur fondé sur les données. Nous évaluons notre approche sur l’anglais et l’allemand, et montrons des améliorations significatives pour les résultats d’analyses syntaxiques complètes qui découlent de l’analyse en dépendances ainsi que des caractéristiques provenant de grammaires. Enfin, nous procédons à une évaluation dédiée à une application, et explorons l’utilisation de cet empilement d’analyseurs comme point de départ pour l’annotation en dépendances d’une nouvelle langue.

KEYWORDS: data-driven dependency parsing, Lexical Functional Grammar (LFG), parser combination, stacking, deep linguistic features

MOTS-CLÉS : analyse syntaxique en dépendances fondée sur des données, Grammaires Lexicales Fonctionnelles (LFG), combinaison d’analyseurs, caractéristiques linguistiques profondes

TAL. Volume 50 – n◦ 3/2009, pages 109 à 138


1. Introduction

The divide between grammar-driven and data-driven approaches to parsing has become less pronounced in recent years due to extensive work on robustness and efficiency for the grammar-driven approaches (Riezler et al., 2002; Hockenmaier and Steedman, 2002; Miyao and Tsujii, 2005; Clark and Curran, 2007; Zhang et al., 2007; Cahill et al., 2008a; Cahill et al., 2008b). Hybrid techniques which combine hand-crafted linguistic knowledge with statistical models for parse disambiguation characterize most large-scale grammar-driven parsing systems. The linguistic generalizations captured in such knowledge-based resources are thus increasingly available for use in practical applications.

The NLP community has in recent years witnessed a surge of interest in dependency-based approaches to syntactic parsing, spurred by the CoNLL shared tasks on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007). Nivre and McDonald (2008) show how two different approaches to data-driven dependency parsing, the graph-based and transition-based approaches, may be combined and subsequently learn to complement each other to achieve improved parsing results for a range of different languages. Although there has been some work towards incorporating more linguistic knowledge in data-driven parsing by means of feature design or representational choices (Klein and Manning, 2003; Bod, 1998; Øvrelid and Nivre, 2007), few studies investigate a setting where a data-driven parser may learn directly from a grammar-driven one.1

In this paper, we show how a data-driven dependency parser may straightforwardly be modified to learn from a grammar-driven parser, hence combining the strengths of the two approaches to syntactic parsing. We investigate an approach which relies only on a general mapping from grammar output to dependency graphs.
We evaluate on English and German and show significant improvements for both languages in terms of overall parsing results, stemming both from the dependency structure representation proposed by the grammar-driven parser and from a set of additional features extracted from the respective grammars. A detailed feature and error analysis provides further insight into the precise effect of the linguistic, grammar-derived information in parsing and into the differences between the two languages. We furthermore investigate the importance of parser quality in the parser stacking setting. Experiments with automatically assigned part-of-speech tags set the scene for an application-realistic setting, and we show how very similar and significant improvements may be obtained in the application of the parser combination to raw text. Finally, we go on to explore a realistic example of the use of the stacked parser in a more complex application scenario, which among other things involves out-of-domain application of the components: we apply the parsers as the basis for the projection of dependency annotation to a new language and show significant improvements over baseline parsers on this task.

1. See Zhang and Wang (2009), however, for a similar study which exploits an English HPSG grammar during parsing and applies the resulting system to an out-of-domain parsing task.
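The core stacking idea — the grammar-driven parser's output becoming ordinary features of the data-driven parser — can be sketched as follows. The data layout and field names here are hypothetical, invented for illustration; they do not reproduce the actual experimental setup:

```python
# Minimal sketch of parser stacking: each treebank token row is extended with
# the head and dependency label proposed by the converted XLE analysis, so the
# data-driven parser can consult them as ordinary features during training.

def stack_features(treebank_row, xle_analysis):
    """Merge a treebank token row with the grammar-proposed analysis.

    treebank_row: dict with at least 'id', 'form', 'pos' (hypothetical layout)
    xle_analysis: dict mapping token id -> (proposed_head, proposed_label)
    """
    head, label = xle_analysis.get(treebank_row["id"], (None, "_"))
    enriched = dict(treebank_row)
    enriched["xle_head"] = head    # grammar-proposed head: a feature, not gold
    enriched["xle_label"] = label  # grammar-proposed dependency label
    return enriched

row = {"id": 2, "form": "halte", "pos": "VVFIN"}
xle = {2: (0, "ROOT")}
print(stack_features(row, xle))
```

Tokens for which the grammar produced no analysis simply receive a null feature value, so the data-driven parser can fall back on its usual feature model.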


[Figure 1 here: the XLE f-structure f1 for the sentence (PRED ‘halte’, with SUBJ “pro”, OBJ f2 ‘Verhalten’ (CASE acc, SPEC “das”, ADJUNCT f4 ‘damalige’), and XCOMP-PRED ‘für’ (PTYPE nosem) governing PRED ‘richtig’), shown with the converted dependency analysis (‘converted’, arcs SUBJ, OBJ, SPEC, ADJCT, XCOMP-PRED and the complex label SUBJ-OBJ) above the sentence Ich halte das damalige Verhalten für richtig, and the gold-standard Tiger annotation (‘gold’, arcs SB, OA, NK, MO) below it.]
Figure 1. Treebank enrichment with LFG output for German example sentence Ich halte das damalige Verhalten für richtig ‘I consider the past behavior (to be) correct’

The paper is structured as follows. Section 2 briefly introduces grammar-driven LFG-parsing, and Section 3 describes the conversion to dependency structures and the feature extraction. In Section 4 we introduce MaltParser, the data-driven dependency parser employed in the experiments, and Section 5 goes on to describe the parser stacking experiments in more detail. Section 6 presents the results from the experiments, and Section 7 provides an in-depth error analysis of these results. The effect of using automatically assigned part-of-speech tags is examined in a set of experiments detailed in Section 8. In Section 9 we present experiments with the stacked parsers in the task of annotation projection. Finally, Section 10 concludes and discusses the generality of the approach and some ideas for future work.

2. Grammar-driven LFG-parsing

The ParGram project (Butt et al., 2002) has resulted in wide-coverage grammars for a number of languages. These grammars have been written collaboratively within the linguistic framework of Lexical Functional Grammar (LFG) and employ a common set of grammatical features. In the work described in this paper, we use the English and German grammars from the ParGram project. The XLE system (Crouch et al., 2007) performs unification-based parsing using these hand-crafted grammars, assigning an LFG analysis. It processes raw text and assigns to it both a phrase-structural (‘c-structure’) and a feature-structural, functional (‘f-structure’) representation which encodes predicate-argument structure. The set of grammatically possible analyses for the input string is represented in a packed format and then undergoes a statistical disambiguation step: a log-linear model trained on a standard treebank (Riezler et al., 2002) is used to single out the most probable analysis.
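The disambiguation step can be sketched in miniature: each candidate analysis is represented by a feature-count vector, scored against learned weights, and the highest-scoring analysis wins. The feature names and weights below are invented for illustration; XLE's actual model and feature set are far richer:

```python
# Toy sketch of log-linear parse disambiguation: score each candidate analysis
# as the dot product of its feature counts with the learned weight vector and
# return the highest-scoring one.

def best_analysis(candidates, weights):
    """candidates: list of (analysis_id, feature_counts) pairs;
    weights: feature name -> learned weight (hypothetical values)."""
    def score(feats):
        return sum(weights.get(f, 0.0) * count for f, count in feats.items())
    return max(candidates, key=lambda cand: score(cand[1]))[0]

# Two competing attachments for one input string (invented features):
weights = {"attach_high": -0.3, "attach_low": 0.5}
candidates = [
    ("analysis_1", {"attach_high": 1}),
    ("analysis_2", {"attach_low": 1}),
]
print(best_analysis(candidates, weights))  # analysis_2 scores higher
```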


In a dependency-based evaluation2 on a subset of Penn treebank sentences, the English grammar was earlier reported to achieve a 77.6% F-score (Kaplan et al., 2004), whereas the German grammar achieves a 72.59% F-score (Forst et al., 2004) on the Tiger treebank. In order to increase the coverage of the grammars, we employ the robustness techniques of fragment parsing and ‘skimming’ available in XLE (Riezler et al., 2002).

3. Dependency conversion and feature extraction

In extracting information from the output of the deep grammars, we wish to capture as much of the precise, linguistic generalizations embodied in the grammars as possible, while keeping to the requirements of the dependency parser. This means that the deep analysis must be reduced to a token-based representation. The process is illustrated in Figure 1, and details of the conversion are provided in Section 3.2 below.

3.1. Data

We make use of standard data sets for the two languages. The English data set consists of sections 2-24 of the Wall Street Journal portion of the Penn treebank (Marcus et al., 1993), converted to dependency format (Johansson and Nugues, 2007). For German we employ the Tiger treebank (Brants et al., 2004), in the version released with the CoNLL-X shared task on dependency parsing (Buchholz and Marsi, 2006).

3.2. LFG to dependency structure

The parser stacking relies on a conversion of LFG output to dependency graphs, so we start out by extracting a dependency representation from the XLE output. The extraction is performed by a set of rewrite rules which are executed by XLE’s built-in extraction engine. The mapping between input tokens and f-structures is readily available in the LFG analysis via the φ-projection. In Figure 1, the dashed lines illustrate the mapping for the object NP das damalige Verhalten in example (3) below.
The head of the NP (Verhalten) maps to the f-structure f2, which has f3 (identified with das) as its specifier (SPEC) and f4 (damalige) as an adjunct (ADJCT). These dependencies are straightforwardly adopted in the extracted dependency representation as shown. In Figure 1, the gold standard treebank analysis (‘gold’) is shown below the sentence.

2. F-structures are reduced to dependency-triples and evaluation is performed using the standard measures of precision and recall. A subset of the triples, i.e. the dependencies ending in a ‘pred’-value, provides the basis for the evaluation scores presented here.
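The triple extraction just described can be sketched roughly as follows. The data layout (plain dicts standing in for the φ-projection and for f-structure attributes) and the assumption that every f-structure maps back to a single PRED-introducing token are simplifications for illustration, not XLE's actual rewrite machinery:

```python
# Sketch of deriving dependency arcs from f-structures via the phi-projection:
# every grammatical-function attribute whose value is another f-structure
# yields a head -> dependent arc between the corresponding tokens.

def extract_dependencies(phi, fstructures):
    """phi: token -> f-structure id (assumed injective in this sketch);
    fstructures: f-structure id -> {grammatical function: f-structure id}."""
    # Invert phi: f-structure id -> the token that introduces its PRED.
    fs_to_token = {fs: tok for tok, fs in phi.items()}
    arcs = []
    for fs_id, attrs in fstructures.items():
        head_tok = fs_to_token.get(fs_id)
        for func, dep_fs in attrs.items():
            dep_tok = fs_to_token.get(dep_fs)
            if head_tok is not None and dep_tok is not None:
                arcs.append((head_tok, func, dep_tok))
    return arcs

# f2 ('Verhalten') has f3 ('das') as SPEC and f4 ('damalige') as ADJCT:
phi = {"Verhalten": "f2", "das": "f3", "damalige": "f4"}
fstrs = {"f2": {"SPEC": "f3", "ADJCT": "f4"}}
print(sorted(extract_dependencies(phi, fstrs)))
# [('Verhalten', 'ADJCT', 'damalige'), ('Verhalten', 'SPEC', 'das')]
```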


[Figure 2 here: dependency graph for “[...] 251.2 million shares were traded .” with arcs labeled ROOT, MOD, SPECDET, SUBJ and the generic PHI-arc from traded to were.]

Figure 2. Example of generic PHI-arc between co-heads of the verbal complex were traded

The resulting converted dependency analysis (‘converted’) is shown above the sentence. The mapping between input tokens and f-structures represented via the φ-projection is not necessarily injective. This means that two tokens may be mapped to the same f-structure, e.g., the auxiliary and the main verb in the English present perfect construction. While this lack of isomorphy allows for highly modular representations that clearly distinguish between surface-oriented properties (c-structure) and abstract syntactic and semantic dimensions (f-structure), including the additional linguistic features described in Section 3.3 below, it is not directly compatible with the word-based representations assumed in dependency grammar. We reconcile this discrepancy by postulating generic arcs (labeled PHI) between such co-heads, such that the token that introduces the predicate to the f-structure is the head of all its co-heads in the dependency structure. This is illustrated in Figure 2, where the passive auxiliary were becomes the PHI-dependent of traded in the derived dependency structure, because they are mapped to the same f-structure in the XLE output.

Most dependency parsers require that sentences be represented as trees where each token is a node and dependents have exactly one head. LFG parsers, on the other hand, compute structures which are directed acyclic graphs, where a dependent may have more than one head. Within LFG, so-called ‘structure sharing’ is employed to account for phenomena such as raising and control. In the conversion process, we attach dependents with multiple heads to their closest head and supply them with the corresponding label. In an alternative version, we also represent the additional attachment as a set of complex dependency labels listing the functional paths.
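The PHI repair for co-heads can be sketched as follows; the data layout is invented for illustration (the actual conversion is implemented with XLE's rewrite rules):

```python
# Sketch of the PHI-arc repair: when several tokens map to the same
# f-structure, the token introducing its PRED becomes the head, and every
# remaining co-head is attached to it with the generic PHI label.

def add_phi_arcs(phi, pred_introducer):
    """phi: token -> f-structure id;
    pred_introducer: f-structure id -> token introducing its PRED."""
    arcs = []
    for token, fs in phi.items():
        head = pred_introducer[fs]
        if token != head:
            arcs.append((head, "PHI", token))
    return arcs

# 'were' and 'traded' share one f-structure; 'traded' introduces the PRED,
# so 'were' becomes its PHI-dependent, as in Figure 2:
phi = {"were": "f1", "traded": "f1"}
print(add_phi_arcs(phi, {"f1": "traded"}))
# [('traded', 'PHI', 'were')]
```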
For instance, in the English example in (1), a so-called ‘tough’ construction (Rosenbaum, 1967; Postal and Ross, 1971) where the subject of the matrix verb is also an object of the infinitival clause, or in the subject-to-object raising construction in (2), the boldfaced argument receives the complex label SUBJ-OBJ. In German, we find similar analyses of object predicative constructions, for instance, as in (3). Figure 1 shows the f-structure and the corresponding converted dependency output of example (3), where Verhalten receives the complex SUBJ-OBJ label.

114

TAL. Volume 50 – n◦ 3/2009

(1) the anthers in these plants are difficult to clip

(2) allowed the company to begin marketing a new lens

(3) Ich halte das damalige Verhalten für richtig
    I   hold  the past     behavior  for right
    ‘I consider the past behavior (to be) correct’
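The tree conversion for structure sharing can be sketched as follows. The closest-head choice and the label ordering (function of the closest head first) are illustrative guesses at the strategy described in the text, not the authors' exact procedure, and the token positions below are simply those of example (3):

```python
# Sketch of resolving structure sharing for tree conversion: a dependent with
# several heads is attached to its closest head, and (in the alternative
# representation) receives a complex label joining the shared functions.

def resolve_multiple_heads(dep_pos, heads):
    """dep_pos: linear position of the dependent;
    heads: list of (head_position, grammatical_function) pairs."""
    by_distance = sorted(heads, key=lambda h: abs(h[0] - dep_pos))
    closest_head = by_distance[0][0]
    complex_label = "-".join(func for _, func in by_distance)
    return closest_head, complex_label

# 'Verhalten' (position 5) is OBJ of 'halte' (2) and shared SUBJ of
# 'richtig' (7); the complex label joins both functions:
print(resolve_multiple_heads(5, [(2, "OBJ"), (7, "SUBJ")]))
# (7, 'SUBJ-OBJ')
```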

Since the LFG grammars and the treebank annotation make somewhat different assumptions with respect to tokenization, a stage of post-processing of the extracted data is necessary in order to make the annotations truly parallel. In particular, the treatment of multiword units, hyphenated expressions, and sentence-internal punctuation, such as initials and genitive ’s in English, is mapped to match the treebank annotation in a subsequent post-processing stage. This stage improves the level of token-wise parallelism between the two versions of the treebank – the gold standard and the XLE-parsed, converted version – and its effect is quite dramatic: the level of mismatch between the two versions, i.e. the number of tokens that are not mapped to a grammar-derived analysis, is reduced following post-processing by 85.8% for English and 66.8% for German. This dramatic effect is mostly due to propagation of mismatch within a sentence. For instance, the XLE grammars for English and German treat multiword units like New York as one word form, whereas the treebank annotations split these into two forms. On the other hand, the grammar treats hyphenated expressions in English, such as needle-like, as one form, whereas the treebank does not. In terms of the effect of post-processing on the dependency analysis of the grammar, we adopt the following strategy: we duplicate the analysis of composite forms which are split, like New York, and we assign to concatenated forms the analysis of their last element, as in the hyphenated needle-like.

3.3. Additional linguistic features

The LFG grammars contain linguistic generalizations which may not be reduced to a dependency structure.
For instance, the grammars contain information on morphosyntactic properties such as case, gender and tense, as well as more semantic properties detailing various types of adverbials, distinguishing count and mass nouns, specifying semantic conceptual categories such as human, time and location, etc. Table 1 presents the features extracted from the German and English XLE-parses for use during parsing, organized by the part-of-speech of the dependent. The final two rows of Table 1 provide features which are language-specific for the two languages (English and German), whereas the rest are shared features of both grammars. We also provide the possible values for the features, most of which are atomic or binary-valued. A few features (GOVPREP, COORDFORM) take the word form (Form) of another token in the syntactic context as their feature value. Because LFG is a unification-based formalism, some of the features will not be strictly local to a specific token, but stem from a head or dependent (or mother/sister, to be precise).
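As an illustration of how such features might be attached to tokens, the sketch below merges a token's local f-structure features with features percolated from elsewhere in the structure; the record layout and function names are our own invention for this sketch:

```python
# Sketch of collecting grammar-derived features for one token: local features
# of its f-structure plus non-local features contributed by a dependent or
# head (e.g. the German DEF feature percolated up from the determiner).

def collect_features(token_fs, percolated):
    """token_fs: local attribute-value pairs of the token's f-structure;
    percolated: features contributed by other tokens in the structure."""
    feats = dict(token_fs)
    feats.update(percolated)  # non-local features join the token's own
    return feats

local = {"CASE": "acc", "NUM": "sg"}        # local to the common noun
from_determiner = {"DEF": "+"}              # percolated from its SPEC
print(collect_features(local, from_determiner))
```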


For instance, the German definiteness feature (DEF) of a common noun comes from the determiner. The features also bear witness to the fact that XLE makes use of a phrase-structural representation for parsing (c-structure). For instance, it provides a COORDLEVEL feature which takes as values phrase categories such as NP and VP, detailing the phrasal level of coordination.

Quite a few of the features detailed in Table 1 are available in the original treebank annotation. Clearly, treebanks differ quite a lot in the amount of information expressed by the annotation. The Penn Treebank contains only information on part-of-speech and syntactic structure, whereas the Tiger treebank in addition distinguishes the morphosyntactic categories case, number, gender, person, tense and mood. The level of linguistic analysis expressed by the features varies, ranging from morphosyntactic information, which is fairly straightforward to obtain, to deeper features detailing structural aspects (GOVPREP, COORDFORM), subcategorization information (VTYPE, SUBCAT) or various semantic properties which reflect more fine-grained linguistic analyses of phenomena such as nominal semantics (COMMON, PROPERTYPE, etc.), referentiality (NTYPE, DEIXIS, GENDSEM), adverbial semantics (ADJUNCTTYPE, ADVTYPE), nominalization (DEVERBAL), etc. Whether these distinctions prove helpful in the task of syntactic parsing, and which of them contribute the most, is an empirical question which we investigate experimentally below.

3.4. Coverage

Hand-crafted grammars are often accused of low coverage on running text. However, a lot of effort has in recent years been put into increasing their robustness.
As mentioned earlier, we employ the robustness techniques available with XLE, allowing for fragmented parses as well as using the technique of ‘skimming’, which bounds the amount of work performed per subtree and skims the remaining constituents when too much time has passed (Riezler et al., 2002). Following the XLE-parsing of the treebanks and the ensuing dependency conversion, we have sentence coverage of 95.4% for the English grammar and 97.3% for the German grammar. In order to align the two versions of the treebank we then perform tokenization fixes for the respective languages, as described in Section 3.2 above. Following this stage of post-processing, we find that 92.3% of the English tokens and 95.2% of the German tokens may be mapped to the corresponding token in the treebank. In terms of sentence coverage, we end up with a grammar-based analysis for 95.2% of the English sentences, 45238 sentences altogether, and 96.5% of the German sentences, 38189 sentences altogether.
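As a quick sanity check, the per-language treebank totals can be back-computed from the reported coverage figures; the derived totals are approximate, since the rates are rounded in the text:

```python
# Back-of-the-envelope check of the reported sentence coverage: covered counts
# and rates are taken from the text, total treebank sizes are derived.

en_covered, en_rate = 45238, 0.952   # English: 95.2% sentence coverage
de_covered, de_rate = 38189, 0.965   # German: 96.5% sentence coverage

en_total = round(en_covered / en_rate)   # approx. total English sentences
de_total = round(de_covered / de_rate)   # approx. total German sentences
print(en_total, de_total)
```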


[Table 1 here: possible values of the grammar-derived features, organized by the part-of-speech of the dependent. Recoverable feature names include, for verbs, MOOD, TENSE, VTYPE, PASSIVE(+/−), PERF(+/−) and GOVPREP(Form); for nouns/pronouns, CASE, COMMON, LOCATIONTYPE, NUM, NTYPE, PERS, PROPERTYPE and GOVPREP(Form); for prepositions, PSEM and PTYPE; and for coordination, COORDLEVEL, COORD(+/−) and COORDFORM(Form).]