More than Surface-Based Parsing; Higher Level Evaluation of Cass-SWE Dimitrios Kokkinakis Språkdata, Göteborg University Box 200, SE-405 30, Sweden
[email protected]
Abstract
Surface-based, shallow or partial parsing has become an important sub-area in Language Engineering. This development is not difficult to motivate. Full syntactic analysis is computationally expensive, time-consuming and offers poor linguistic coverage. Parsing systems aiming for global optimization and exhaustive-search-based syntactic analysis tend to have limited coverage and to degrade when facing unrestricted data, since efforts to create a parse spanning an entire sentence end by making poor local decisions. This paper gives a brief description of a parser for written Swedish that can provide a syntactic analysis for any sentence it is confronted with, and that can quickly and reliably recognize and extract grammatical relations using finite-state cascades. Since the parsing mechanism and the Swedish grammar developed therein have been reported elsewhere, this paper will only give a brief presentation of the grammar and instead concentrate on an evaluation intended to determine its effectiveness at a higher level of linguistic analysis, namely grammatical relations.
1. Introduction
Surface-based, shallow or partial parsing has become an important sub-area in Language Engineering (LE), the interdisciplinary field that performs tasks involving the processing of human language by applying NLP and Computational Linguistics techniques to practical software systems. In parallel with this development, interest in the evaluation of language technology components has grown rapidly (promoted by the Language Resources and Evaluation Conferences, LREC (1998, 2000); by workshops initiated by the Expert Advisory Group on Language Engineering Standards, EAGLES; and by numerous competitions focused on the evaluation of components and/or larger systems). Specifically for syntactic parsing, the PARSEVAL scheme has been the dominant evaluation paradigm. PARSEVAL compares a candidate parse with its reference parse from a manually annotated corpus, using three metrics: precision, the proportion of the parse found that is correct; recall, the proportion of the "correct" parse that the parser has found; and average crossing brackets, the number of response constituents that violate the boundaries of a constituent in the key parse, which occurs where two bracketed sequences overlap but neither is properly contained in the other; cf. Grishman et al. (1992). However, during the last couple of years researchers have raised the issue of whether currently available parsing evaluation schemes, such as PARSEVAL, are appropriate for measuring parsing accuracy. A starting point for this discussion has been the fact that it is practically impossible to make objective inter-system comparisons, since there is no way to measure the difference in syntactic complexity between syntactically annotated corpora: different grammatical frameworks produce different representations, and parsing design philosophies across languages and corpora vary considerably. Carroll et al. 
(1998) show how a new, more informative and generally applicable technique for measuring parsing accuracy can be based on the use of grammatical relations. Similarly, Gaizauskas et al. (1998) proposed an alternative, flatter scheme for evaluation in which ‘problematic’ items have been deleted, and only constituency annotations on which there is broad agreement across a range of grammatical theories have been kept for evaluation. This paper primarily deals with measuring parsing accuracy using a handful of grammatical relations as its evaluation input. The remainder of the paper is structured as follows: background information on parsers that are able to extract grammatical relations for different languages, and on their evaluation based solely on these relations; a short presentation of the parser used in this work, called Cass-SWE (Kokkinakis & Johansson Kokkinakis (1999)); a worked example showing how Cass-SWE processes a particular sentence; evaluation results intended to determine its effectiveness on the grammatical relations recognised; and ways in which Cass-SWE may be improved and extended.
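The three PARSEVAL metrics introduced above can be sketched in a few lines of Python. This is an illustrative simplification, not the original PARSEVAL implementation: brackets are represented here as plain (start, end) token spans, and the function names are invented.

```python
# Illustrative sketch of the three PARSEVAL metrics: bracket precision,
# bracket recall, and crossing brackets. Brackets are (start, end) spans.

def parseval(candidate, gold):
    """Compare a candidate parse's brackets against the reference parse."""
    cand, ref = set(candidate), set(gold)
    matched = cand & ref
    precision = len(matched) / len(cand)   # proportion of found brackets that are correct
    recall = len(matched) / len(ref)       # proportion of correct brackets that were found

    # A candidate bracket "crosses" a gold bracket when the two spans
    # overlap but neither is properly contained in the other.
    def crosses(a, b):
        return (a[0] < b[0] < a[1] < b[1]) or (b[0] < a[0] < b[1] < a[1])

    crossing = sum(1 for c in cand if any(crosses(c, g) for g in ref))
    return precision, recall, crossing

# Toy example: two of three candidate brackets match the reference,
# and the span (1, 4) crosses the gold span (3, 5).
p, r, x = parseval([(0, 2), (3, 5), (1, 4)], [(0, 2), (3, 5), (0, 5)])
```

In a full evaluation these counts would of course be accumulated over all sentences of the test corpus, and labels would be compared as well as spans.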
2. Background
Partial parsing does not require a full syntactic parse of a sentence in order to pursue semantic analysis. For a number of LE applications it suffices if a system, or a component of a larger architecture, has access to "limited" syntactic information in the form of grammatical relations, such as subject and object. Applications that may benefit from the utilization of such relations include document indexing, Grefenstette (1994), and Information Extraction, Riloff (1993), Grishman (1997). An approach with many similarities to ours is the evaluation of the automatic recognition and extraction of subject and object dependency relations for French, presented in Aït-Mokhtar & Chanod (1997). They used sequential finite-state transducers for the extraction of the grammatical relations; parsing was done incrementally, following tokenisation and part-of-speech tagging. The authors commented on the impact that part-of-speech annotation has on high-quality parsing, calculating that over 25% of the errors produced by their parser were due to tagging, followed by coordination and apposition errors. Their results for subject recognition were 94.4% precision and 87.4% recall, and for object recognition 86.3% precision and 79.9% recall. Evaluation was based on 1,005 sentences averaging 23 tokens/sentence. Descriptions of heuristic identification of syntactic relations for English and their evaluation are given by Tapanainen & Järvinen (1997). The authors use a commercial parser, a successor of the English Constraint Grammar (ENGCG), see Karlsson (1994). Their evaluation in terms of subject, object and predicative relations in different types of texts gave, for subject assignment, 95%-98% precision and 83%-92% recall; for objects, 89%-94% precision and 83%-91% recall; and for predicatives, 92%-97% precision and 86%-96% recall.

3. The Parser
To our knowledge, very few attempts have been made at shallow parsing of Swedish both efficiently and on a very large scale. 
Moreover, robust parsers that assign grammatical relations are even rarer. We have implemented a large grammar for written Swedish which is efficient, has broad coverage, and can produce grammatical relations of the form described in Section 3.2. The grammar we developed is called Cass-SWE. Cass, Cascaded analysis of syntactic structure, is the name Steven Abney gave to the parser he developed and which we use in our work. Cass uses a finite-state cascade mechanism and extended expressions, called internal transducers by Abney (1997), for inserting actions and roles into patterns. Details of the Swedish grammar can be found in Kokkinakis & Johansson Kokkinakis (1999).

3.1 Characteristics of Cass-SWE
- The head of every phrase constituent is returned, on demand;
- Any kind of input, even ill-formed and ungrammatical, is accepted by Cass-SWE. Finite-state approximation is a key element of current parsing practice and, although this means that a finite-state recognizer will sometimes treat sentences as grammatical when they are not, the usual effect is that the approximation is more efficient and tolerant than a context-free model;
- The output is partial; no attachment is performed. However, different types of elliptical constructions are recovered and marked;
- Any type and number of features provided by other software can be added to the input format. An example is named entities (see the worked example in Section 3.3);
- The labels of a small number of tokens, such as the infinitive marker ‘att’, are intentionally ignored by Cass-SWE; this shows the lexicalisation dimension of the parser, and can be seen as a way to reduce ambiguity at the cost of writing a few more rules at different levels;
- The levels in Cass-SWE are ordered according to the following principles:

(1) Basic Phrases: Temporal adverbial phrases are recognized first, followed by locative adverbial phrases. Noun phrases follow, then adjectival and prepositional phrases. A distinction is made for PPs having the preposition av 'by' as head, since such PPs may signal the Agent in a passive. Verbal groups, or chains, follow; these allow adverbial phrases to intervene in the chain. Phrases of the above types consist of finite-state rules. Bundles of rules are divided into different levels depending on their internal complexity, simpler following complex. For instance, 5 levels are distinguished for verbal groups: finite active & passive, non-finite active & passive, and auxiliary.

(2) Subordinate Clauses, Questions and Main Clauses:
- Embedded questions with interrogative pronoun + ‘som’
- Relative clauses with ‘som’ and ‘där’
- Adverbial clauses
- Infinitive clauses
- Complement clauses, headed by the complementizer ‘att’
- Complement clauses, headed by a preposition
- wh-questions with interrogative adverb & pronoun
- yes/no questions
- Relative clauses without ‘som’
- Copula passive constructions (‘bliva/vara’ + past participle)
- Various types of main clauses
All types of clauses are divided into levels. The division depends partly on the type of the verbal group and the word order, partly on any available lexicalized complementizers or part-of-speech tags that can provide strong evidence for a particular type of clause.

(3) Complex phenomena, discontinuous constructions, embedding etc.: Complex combinations of different types of main and subordinate clauses; verbless constructions are recognized last.

3.2 Grammatical Relations
The parser identifies five grammatical relations: subject, agent, object, indirect object, and predicative (predicative attribute and predicative adjunct, collapsed into a single relation). To these five relations should be added the recognition and annotation of the main predicate of any clause, marked ‘VRB’. No subcategorisation information is used or required; the topographical structure of the surface strings decides what grammatical label different constituents may get. Moreover, the identification of these relations does not take the verb's transitivity into account. If desired, verbal transitivity can be dealt with at a later stage, using a lexicon containing information on transitivity, although it is debatable whether accurate lexicons of that kind are compatible with what one actually finds in large corpora. 
The phrasal heads (in base form, if so desired) are used as fillers of the above roles.

3.3 Worked Example
We provide a short example showing how a sentence is processed by the system.

(1) Enligt tidskriften Der Spiegel började härvan rullas upp i höstas sedan en schweizisk företagskonsult träffat ett par höga chefer på det schweizisk-svenska företaget ABB i Zürich.
(1’) According to the magazine Der Spiegel, the tangle started to be unravelled last autumn, when a Swiss business consultant met a couple of high-ranking managers of the Swiss-Swedish company ABB in Zürich.
Any text to be analyzed by Cass-SWE is tokenised and part-of-speech tagged. During tokenisation a filter is used to recognize and annotate different multi-word phenomena:

Pre-Tagging Filter
- Multi-word expressions: prepositions, adjectives, adverbials, pronouns/determiners, common nouns, particle verbs;
- A number of complex proper nouns, ‘Der Spiegel’, ‘Dow Jones’, ‘Los Angeles’;
- Non-lexicalized/naturalized foreign expressions, e.g. ‘walk over’, ‘bungee jump’.

Although the morphosyntactic annotation scheme used for tagging is rich in features, it has certain limitations when it comes to applying it to further processing such as finite-state parsing. By this we do not mean that parsing cannot be carried out, but rather that high-quality parsing depends on suitably annotated input. Therefore, we use a Post-Tagging Filter as a way to reduce parsing complexity, by putting more emphasis on the lexicon, i.e. the tokens and their respective morphosyntactic descriptions provided by the tagger. Thus, we reduce ambiguity in the parser's input. The filter scans the local context of the tagged words and, if appropriate conditions apply, adds a feature to an existing tag. By slightly complicating, or rather enriching, the tagset, the grammar rules can be kept simpler. After the pre-tagging filtering, the sentence in (1) is transformed into (2) and then part-of-speech tagged (2’). The morphosyntactic description is given in upper case following the slash ‘/’. The tagset can be found at: spraakdata.gu.se/lb/parole/sgml2suc.html.

(2) Enligt tidskriften Der_Spiegel började härvan rullas_upp i höstas sedan en schweizisk företagskonsult träffat ett par höga chefer på det schweizisk-svenska företaget ABB i Zürich.

(2’) Enligt/S tidskriften/NCUSND Der_Spiegel/NP började/VMISA härvan/NCUSND rullas_upp/VMN0P i/S höstas/NC0000 sedan/CS en/DIUS0 schweizisk/A0PUSNI företagskonsult/NCUSNI träffat/VMU0A ett/DINS0 par/NCNSNI höga/A0P0PN0 chefer/NCUPNI på/S det/DFNS0 schweizisk-svenska/A0P0SND företaget/NCNSND ABB/Y i/S Zürich/NP ./F
The second filter is used after tagging and can recognize and appropriately annotate:

Post-Tagging Filter
- Coordination;
- Measure/quantity nominals;
- Modal & temporal auxiliary verbs;
- Temporal adverbs and common nouns designating time;
- Copula verbs in the passive;
- Adjectival nouns;
- Various types of appositions.

The new annotations after filtering the sentence in (2’) are given in (3). Note that the annotation for the marked tokens is the one modified by the filter, either by adding a feature, as in the case of ‘APP1’, a type of apposition, or by changing the tag, as with ‘VAISA’, where the original annotation ‘VMISA’, i.e. main verb, is changed to auxiliary verb. Optionally, a named-entity recognizer, or other annotation software, can be used to add annotations of interest; an example is given in (3’) using named-entity annotation. This process can be run separately and its output then merged into the input text format of the parser.

(3) Enligt/S tidskriften/NCUSND-APP1 Der_Spiegel/NP började/VAISA härvan/NCUSND rullas_upp/VMN0P i/S höstas/NC0000-T sedan/CS en/DIUS0 schweizisk/A0PUSNI företagskonsult/NCUSNI träffat/VMU0A ett/DINS0 par/NCNSNI-MSR höga/A0P0PN0 chefer/NCUPNI på/S det/DFNS0 schweizisk-svenska/A0P0SND företaget/NCNSND-APP1 ABB/Y i/S Zürich/NP ./F

(3’) Enligt tidskriften Der_Spiegel började härvan rullas_upp i höstas sedan en schweizisk företagskonsult träffat ett par höga chefer på det schweizisk-svenska företaget ABB i Zürich.

The input format of Cass-SWE lists the tokens vertically, one per line, and every token can be associated with any number of tab-separated features.
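The local-context enrichment performed by the post-tagging filter can be sketched as follows. The word lists and conditions below are invented for illustration; only the tag strings (‘NC0000-T’, ‘NCNSNI-MSR’) follow the actual annotations shown in (3).

```python
# Sketch of the post-tagging filter idea: scan each tagged token's local
# context and, when a condition applies, enrich its tag with a feature.

TIME_NOUNS = {"höstas", "våras"}      # hypothetical word list
MEASURE_NOUNS = {"par", "dussin"}     # hypothetical word list

def post_filter(tagged):
    """tagged: list of (token, tag) pairs; returns pairs with enriched tags."""
    out = []
    for i, (tok, tag) in enumerate(tagged):
        if tok in TIME_NOUNS and tag.startswith("NC"):
            tag += "-T"                               # time-designating noun
        elif (tok in MEASURE_NOUNS and tag.startswith("NC")
                and i > 0 and tagged[i - 1][1].startswith("DI")):
            tag += "-MSR"                             # measure/quantity nominal
        out.append((tok, tag))
    return out

# 'par' preceded by the indefinite determiner 'ett' becomes a measure nominal,
# as in example (3): ett/DINS0 par/NCNSNI-MSR
print(post_filter([("ett", "DINS0"), ("par", "NCNSNI"), ("chefer", "NCUPNI")]))
```

A real filter would of course consult richer context (verb chains for the VMISA/VAISA distinction, apposition patterns, coordination), but the principle is the same: the tagset is enriched so that the grammar rules downstream stay simple.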
(5) [main_hvp02 [pp3 [S hdp=Enligt ne=n/a] [np-app1 hd=tidskriften ne= n/a [NCUSND-APP1 hd=tidskriften ne= n/a] [NP hd=Der Spiegel ne=n/a]]] [vg_ hd=började ne=n/a [AUX_m hd=började ne=n/a]] SBJ=[np0 hd=härvan ne=n/a [NCUSND hd=härvan ne=n/a]] VRB=[vg_p_i hd=rullas_upp ne=n/a [VMN0P hd=rullas upp ne=n/a]] [rp1 hd=höstas ne=TIME [S hdp=i ne=TIME] [NC0000-T hd=höstas ne=TIME]]] [adv_clause02 [CS hd=sedan ne=n/a] SBJ=[np2 hd=företagskonsult ne=n/a [DIUS0 hd=en ne=n/a] [A0PUSNI hd=schweizisk ne=n/a]
[NCUSNI hd=företagskonsult ne=n/a]] VRB=[vg_a_i hd=träffat ne=n/a [VMU0A hd=träffat ne=n/a]] OBJ=[np1 hd=chefer ne=n/a [DINS0 hd=ett ne=n/a] [MEASURE hd=par ne=n/a] [A0P0PN0 hd=höga ne=n/a] [NCUPNI hd=chefer ne=n/a]] [pp3 [S hdp=på ne=ORGANIZATION] [np-app2 hd=företaget ne=ORGANIZATION [DFNS0 hd=det ne=n/a] [A0P0SND hd=schweizisk-svenska ne=n/a] [NCNSND-APP1 hd=företaget ne=n/a] [NP hd=ABB ne=ORGANIZATION]]] [pp3 [S hdp=i ne=LOCATION] [np0 hd=Zürich ne=LOCATION [NP hd=Zürich ne=LOCATION]]]] [F .]
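The cascade mechanism that produces analyses like (5) can be sketched in miniature: each level is an ordered list of pattern rules applied over the current sequence of labels, and the output of one level is the input of the next. The tag names and patterns below are invented for illustration and are far simpler than the actual Cass-SWE grammar.

```python
import re

# Minimal finite-state cascade: level 1 groups tags into basic phrases,
# level 2 builds a clause from those phrases. Patterns match over a
# space-joined sequence of labels.
LEVELS = [
    # Level 1: basic phrases (noun phrases, verbal groups)
    [("NP", r"(DT )?(JJ )*NN"), ("VG", r"(AUX )?VB")],
    # Level 2: a transitive clause built from level-1 phrases
    [("CL", r"NP VG( NP)?")],
]

def parse(tags):
    labels = list(tags)
    for level in LEVELS:
        for cat, pattern in level:
            out, i = [], 0
            while i < len(labels):
                m = re.match(pattern + r"( |$)", " ".join(labels[i:]))
                if m:
                    out.append(cat)                 # reduce the matched span
                    i += len(m.group(0).split())    # tokens consumed
                else:
                    out.append(labels[i])           # pass the label through
                    i += 1
            labels = out
    return labels

# "The manager met a consultant" tagged DT NN VB DT NN
print(parse(["DT", "NN", "VB", "DT", "NN"]))  # prints ['CL']
```

Because each level only ever reduces spans found by earlier levels, unmatched material simply passes through, which is what gives the cascade its robustness on ill-formed input.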
4. Evaluation
One of the oldest projects we are aware of that dealt with the production of a treebank for a large sample of a natural language is the SYNTAG project, carried out in the mid-1980s at Språkdata. The aim of the project was to annotate 100,000 tokens with detailed syntactic information. The work was carried out and completed successfully using an interactive annotation procedure communicating with a relational database, where all the information was stored. Guidelines were also produced in order to ensure tagging consistency; details can be found in Järborg (1986). One of the interesting parts of SYNTAG was the annotation of the subject and other grammatical relations for all sentences in the sample. Unfortunately, during various changes in hardware, part of the material, approximately 10-15%, has been corrupted. In order to evaluate the output of Cass-SWE we extracted 23 sentences from SYNTAG and supplemented them with 80 more from a daily Swedish newspaper (whose grammatical relations were tagged manually). The average length of the 103 sentences in the sample was 16.3 tokens/sentence. Table 1 shows the results achieved for the six grammatical relations examined in this work. We used the standard metrics precision and recall for the evaluation, calculated as:

Precision = Correct Relations / Tagged Relations (received)
Recall = Correct Relations / Actual Relations (desired)
Relation   Actual   Tagged   Correct   Precision   Recall
SBJ        181      173      167       96.53%      92.26%
OBJ        89       83       77        92.77%      86.51%
IBJ        1        1        1         100%        100%
AGN        1        1        1         100%        100%
FLN        16       10       10        100%        62.50%
VRB        199      198      198       100%        99.49%

Table 1. Precision/Recall for grammatical relations
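As a sanity check, the two formulas above applied to the Table 1 counts reproduce the reported figures to within rounding:

```python
# Precision and recall as defined in Section 4, applied to the SBJ and
# OBJ rows of Table 1 (counts taken directly from the table).

def precision(correct, tagged):
    return 100.0 * correct / tagged   # correct relations / tagged relations

def recall(correct, actual):
    return 100.0 * correct / actual   # correct relations / actual relations

# SBJ: 181 actual, 173 tagged, 167 correct
print(precision(167, 173), recall(167, 181))   # ~96.53, ~92.26
# OBJ: 89 actual, 83 tagged, 77 correct
print(precision(77, 83), recall(77, 89))       # ~92.77, ~86.51
```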
4.1 Error Analysis
An error analysis of the instances used was conducted; Table 2 illustrates a typology of the errors that could be detected and which influenced the extraction of roles. This implies that other errors, such as part-of-speech errors that did not influence the relation assignment, were ignored. Most of the errors were due to the lack of appropriate rules in Cass-SWE, sometimes caused by elliptic or word-order phenomena. Moreover, some rules could syntactically parse a sentence, but the mechanism that could assign a syntactic role was lacking. Part-of-speech errors were the second major group that produced wrong results. For three of the examples it was unclear to the author whether the errors were due to the tags or to the lack of appropriate rules in the grammar.

Error                                      Occ.   Example
Cass-SWE errors                            16
  (=>Elliptic)                             (3)    … vanligt folk, människor hon träffar på stan …
                                                  … nämligen skyltar som säger vad du vill …
  (=>word order)                           (2)
Part-of-speech errors                      8      … ett motiv som för/PREPOSITION tankarna till …
                                                  … som Last/ADJECTIVE klarade à_la Hedman …
Apposition error                           1      … [dåvarande chefen] för Götaverken [dr Hugo Hammar]
Tokenisation                               1      The_story_of_an engine …
Combination of part-of-speech and          3      [En bit in på 2000-talet]/NP/SBJ kommer [stora grupper]/OBJ att …
Cass-SWE errors                                   … de skrivit ner [sina], förmodar man, [innersta tankar]

Table 2. Error analysis
5. Discussion and Further Work
We have described how Cass-SWE, a cascaded finite-state parser for written Swedish, is used to identify and annotate grammatical relations such as subject and object. Aït-Mokhtar & Chanod commented that part-of-speech errors were their most serious problem for the identification of syntactic relations, but we do not fully share this view. The evaluation of the grammatical relations extracted revealed rather that the error rate is due to a mixture of part-of-speech tagging errors and lack of rules. However, we also think that it is useful to explore to what extent errors caused by a tagger can affect a parser's results. That was the major reason we conducted the error analysis, and we think that there is room for improvement at this particular point, by exploring better taggers than those we had at our disposal at the time this paper was written. Our view is, however, that tagging errors have a limited effect on the recognition of grammatical relations; the evaluation material is of course not large, but the results seem to support that claim. An explanation might be that the post-tagging filter is sufficiently robust to overcome serious ambiguity problems such as appositions and coordination. Voutilainen (1998) has discussed the usefulness of part-of-speech taggers for syntactic parsing, and the impact of part-of-speech tagging errors on syntactic analysis. His conclusion was that by using a tagger the parser's output becomes less ambiguous without a considerable penalty to the recognition rate. The results can be improved and enriched. It is fairly straightforward to mark constituents other than noun phrases, e.g. infinitive clauses, as subject or object, if these stand in such a relation. New, more fine-grained relations can easily be integrated. 
It is also feasible to use valency information for the recognition of prepositional objects with higher accuracy than can be achieved now, although we have not yet empirically investigated how accurate the extraction of prepositional phrases is at this stage. Attachment ambiguity can be resolved using an already implemented PP-attachment recognizer for Swedish (Kokkinakis 2000), an interesting exercise that we leave for the near future. Elliptical constructions are recognized and marked but, as with the prepositional objects, we have not evaluated them. Unbounded or long-distance dependencies, we believe, require other mechanisms, such as full parsing, in order to be fully recovered. A partial parser can only reliably analyse some of the different forms of unbounded dependency constructions, particularly wh-questions and multiple clause intervention; relative clauses are much harder to resolve. Robust partial parsing, in conjunction with the integration of semantic information, such as named entities or ontological categories from semantic lexica, and the extraction of grammatical relations, opens exciting new opportunities for further linguistic research. One of the most evident application areas is Information Extraction. Identifying only some aspects of syntactic structure simplifies the
subsequent process of knowledge extraction and template filling. The template slots that an IE system extracts during scenario pattern matching often correspond to noun phrases in a text, and the relationships to be extracted often correspond to grammatical relations. Thus, if syntactic relations are already correctly marked in a text, we would expect the scenario patterns to become simpler to acquire and more accurate.

References
Abney S. (1997) Part-of-Speech Tagging and Partial Parsing. In Corpus-Based Methods in Language and Speech Processing, Young S. and Bloothooft G. (eds), Chap. 4, pp. 118-136, Kluwer Academic Publishers
Aït-Mokhtar S. and Chanod J-P. (1997) Subject and Object Dependency Extraction Using Finite-State Cascades. In Automatic Information Extraction and Building of Lexical Semantic Resources Workshop, Vossen P., Adriaens G., Calzolari N., Sanfilippo A. and Wilks Y. (eds), pp. 71-77, Madrid, Spain
Brants T., Skut W. and Krenn B. (1997) Tagging Grammatical Relations. Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing (EMNLP), Cardie C. and Weischedel R. (eds), Rhode Island, USA
Carroll J., Briscoe T. and Sanfilippo A. (1998) Parser Evaluation: a Survey and a New Proposal. Proceedings of the 1st LREC, pp. 447-454, Granada, Spain
Gaizauskas R., Hepple M. and Huyck C. (1998) A Scheme for Comparative Evaluation of Diverse Parsing Systems. Proceedings of the 1st LREC, pp. 143-149, Granada, Spain
Grefenstette G. (1994) Explorations in Automatic Thesaurus Discovery. Kluwer
Grishman R., MacLeod C. and Sterling J. (1992) Evaluating Parsing Strategies using Standardized Parse Files. Proceedings of the 3rd ACL Conference on Applied Natural Language Processing, Italy
Grishman R. (1997) Information Extraction: Techniques and Challenges. In Information Extraction, A Multidisciplinary Approach to an Emerging Information Technology, Pazienza (ed.), pp. 10-27, Springer
Järborg J. (1986) Manual för syntaggning. Research Report from the Dept. of Swedish, Göteborg University (in Swedish)
Karlsson F. (1994) Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text. Mouton de Gruyter, Berlin
Kokkinakis D. and Johansson-Kokkinakis S. (1999) A Cascaded Finite-State Parser for Syntactic Analysis of Swedish. Proceedings of the 9th EACL, Bergen, Norway
Kokkinakis D. (2000) PP-Attachment Disambiguation for Swedish (Combining Unsupervised & Supervised Training Data). Nordic Journal of Linguistics, 2000:3
Ljung M. and Ohlander S. (1987) Allmän Grammatik. Liber (in Swedish)
LREC (1998) 1st Language Resources and Evaluation Conference, Granada, Spain
LREC (2000) 2nd Language Resources and Evaluation Conference, Athens, Hellas
Marcus M., Kim G., Marcinkiewicz M.A., MacIntyre R., Bies A., Ferguson M., Katz K. and Schasberger B. (1994) The Penn Treebank: Annotating Predicate Argument Structure. Proceedings of the Human Language Technology Workshop, San Francisco, CA
Riloff E. (1993) Automatically Constructing a Dictionary for Information Extraction Tasks. Proceedings of the 11th National Conference on Artificial Intelligence (AAAI-93)
Tapanainen P. and Järvinen T. (1997) A Non-Projective Dependency Parser. Proceedings of the Applied Natural Language Processing Conference (ANLP), Washington D.C.
Voutilainen A. (1998) Does Tagging Help Parsing? A Case Study on Finite State Parsing. Proceedings of the FSMNLP ’98 Workshop, Ankara, Turkey