Software for Annotating Argument Structure - ACL Anthology

Software for A n n o t a t i n g A r g u m e n t Structure Wojciech

Skut,

BrigitteKrenn, Thorsten Brants, U n i v e r s i t £ t des S a a r l a n d e s 66041Saarbrficken, Germany

HansUszkoreit

{skut,krenn,brants,uszkoreit}@coli.uni-sb.de change into the structure of the sentence being annotated is immediately displayed. Extra effort has been put into the development of a convenient keyboard interface. Menus are supported as a useful way of getting help on commands and labels. Automatic completion and error check on user input are supported. Three tagsets have to be defined by the user: partof-speech tags, phrasal categories and grammatical functions. They are stored together with the corpus, which permits easy modification when needed. The user interface is implemented in T c l / T k Version 4.1. The corpus is stored in an SQL database.

Abstract We present a tool developed for annotating corpora with argument structure representations. The presentation focuses on the architecture of the annotation scheme and a number of techniques for increasing the efficiency and accuracy of annotation. Among others, we show how the assignment of grammatical functions can be automatised using standard part-of-speech tagging methods. 1

The

Annotation

Scheme

3

Several features of the tool have been introduced to suite the requirements imposed by the architecture of the annotation scheme (cf. (Skut et al., 1997)), which can itself be characterised as follows: • Direct representation of the underlying argument structure in terms of unordered trees; • Rudimentary, flat representations; uniform treatment of local and non-local dependencies; • Extensive encoding of linguistic information in grammatical function labels. Thus the format of the annotations is somewhat different from treebanks relying on a context-free backbone augmented with trace-filter annotations of non-local dependencies. (cf. (Marcus et al., 1994), (Sampson, 1995), (Black et al., 1996)) Nevertheless, such treebanks can also be developed using our tool. To back this claim, the representation of structures from the SUZANNE corpus (cf. (Sampson, 1995)) will be shown in the presentation. 2

User

Interface

A screen dump of the tool is shown in fig. 1. The largest part of the window contains the graphical representation of the structure being annotated. The nodes and edges are assigned category and grammatical function labels, respectively. The words are numbered and labelled with part-of-speech tags. Any 27

Automation

To increase the efficiency of annotation and avoid certain types of errors made by the human annotator, manual and automatic annotation are combined in an interactive way. T h e automatic component of the tool employs a stochastic tagging model induced from previously annotated sentences. Thus the degree of automation increases with the amount of data available. At the current stage of automation, the annotator determines the substructures to be grouped into a new phrase and assigns it a syntactic category. The assignment of grammatical functions is performed automatically. To do this, we adapted a standard part-of-speech tagging algorithm (the best sequence of grammatical functions is to be determined for a sequence of syntactic categories, cf. (Skut et al., 1997)) The annotator supervises the automatic assignment of function tags. In order to keep him from missing tagging errors, the grammatical function tagger is equipped with a function measuring the reliability of its output. On the basis of the difference between the best and second-best assignment, the prediction is classified as belonging to one of the following certainty intervals: R e l i a b l e : the most probable tag is assigned, Less r e l i a b l e : the tagger suggests a function tag; the annotator is asked to confirm the choice,

- Gen eral:

-Sentence:

corp.,: iR~!co,pus tes~op~. Editor: IThorsten

.. I FI IFI

[] _.ar.r I-~---II _.°.o.d II ~_*,t I

No.: 4 / 1269 Comment: I Origin: refcorp.tt

Last edited: Thorsten, 07102/97, 17:39:29

I

511

sog~

i 505

+ Es° spielt1 eben keine3 Rolle PPER VVFIN ADV PlAT NN

's $,

ob die KOUS ART

Mus% gef"allig9 iStlo NN ADJD VAFIN

-i1 $(

nur2 etwas13 ADV PlAT

I

-Dependency:

,--Move:

I " ' ' II " " ' I ~-° to: I I :~0 11;,o I[] ~_,,to. I :~°° II ;'°° I Mato"es:°

I

14 Neues, s

16

mu",,/

,,

i F Parentlabel:

Selection: | Command: I

I N ° d e no.: I / Parent[abel: I

I

I Ex-ecute I

I Swi.!£hing to sentence .no. 4....Done.

Figure 1: Screen dump of the annotation tool

U n r e l i a b l e : the annotator has to determine the. function himself. The annotator always has the option of altering already assigned tags. The tagger rates 90% of all assignments as reliable. Accuracy for these cases is 97%. Most errors are due to wrong identification of the subject and different kinds of objects in S's and VP's. Accuracy of the unreliable 10% of assignments is 75%, i.e., the annotator has to alter the choice in 1 of 4 cases when asked for confirmation. Overall accuracy of the tagger is 95%. In several cases, the tagger has been able to abstract from annotation errors in training material, which has proved very helpful in detecting inconsistencies and wrong structures. This first automation step has considerably increased the efficiency of annotation. The average annotation time per sentence improved by 25%.

28

References Ezra Black et al. 1996. Beyond Skeleton Parsing: Producing a Comprehensive Large-Scale General-English Treebank With Full Grammatical Analysis. In The 16th International Conference on Computational Linguistics, pages 1 0 7 113, Copenhagen, Denmark. Mitchell Marcus et al. 1994. The Penn Treebank: Annotating Predicate Argument Structure. In Proceedings of the Human Language Technology Workshop, San Francisco. Morgan Kaufmann. Geoffrey Sampson. 1995. English for the Computer. The SUSANNE Corpus and Analytic Scheme. Wojciech Skut et al. 1997. An Annotation Scheme For Free Word Order Languages. In The 7th Conference on Applied Natural Language Processing, Washington, DC.

IR

Software for Annotating Argument Structure - ACL Anthology

Software for Annotating Argument Structure - ACL Anthology

Suggest Documents

Japanese Predicate Argument Structure Analysis ... - ACL Anthology

Structure Sharing with Binary Trees - ACL Anthology

Structure-Sharing in Lexical Representation - ACL Anthology

Site Report - ACL Anthology

Untitled - ACL Anthology

Semantic Visualization - ACL Anthology

Briefly Noted - ACL Anthology

The ACL Anthology Searchbench - Association for Computational ...

A Logic for Semantic Interpretation - ACL Anthology

boolean semantics for natural language - ACL Anthology

Patent Claim Processing for Readability - ACL Anthology

Online Supplemetary Materials1 for Daval ... - ACL Anthology

a proposal for sls evaluation - ACL Anthology

Latent Topic Embedding - ACL Anthology

focusing in dialog - ACL Anthology

Tagging Psychotherapeutic Interviews for Linguistic ... - ACL Anthology

Dependency Constraints for Lexical Disambiguation - ACL Anthology

Automatic Sentence Skeleton Compilation for ... - ACL Anthology

Unsupervised Approaches for Automatic Keyword ... - ACL Anthology

Proceedings of the... - ACL Anthology

on the independence of discourse structure and ... - ACL Anthology

the penn treebank: annotating predicate argument

Proceedings of the COLING/ACL 2006 Main ... - ACL Anthology

Transition-Based Neural Word Segmentation - ACL Anthology