The Annotation Graph Toolkit

The Annotation Graph Toolkit (Version 1.0): Application Developer’s Manual

Kazuaki Maeda, Xiaoyi Ma, Haejoong Lee and Steven Bird
Linguistic Data Consortium, University of Pennsylvania
3615 Market St., Philadelphia, PA 19104-2608 USA
fmaeda, xma, haejoong, [email protected]

January 25, 2002

Abstract

Annotation graphs provide an efficient and expressive data model for linguistic annotations of time-series data. This technical report describes a complete software infrastructure supporting the rapid development of tools for transcribing and annotating time-series data. This general-purpose infrastructure uses annotation graphs as the underlying model, and allows developers to quickly create special-purpose annotation tools using common components. The architecture, the application programming interface and a file I/O library are explained. A toy example of how to build a special-purpose annotation tool using this general-purpose infrastructure is given.

© Copyright 2001 University of Pennsylvania.

This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v 1.0 or later. (The latest version is presently available at http://www.opencontent.org/openpub/.)

1 Introduction

Linguistic databases have seen increasingly broad use in the scientific study of language, in research and development of language-related technologies, and in language-related applications more broadly. There have been many independent efforts to provide tools for creating annotated linguistic databases, to provide general formats for expressing them, and to provide tools for creating, browsing and searching databases containing them (see [6, 7]). While the utility of existing tools, formats and databases is unquestionable, their sheer variety – and the lack of standards able to mediate among them – has become a critical problem.

Bird and Liberman have discovered striking commonalities among linguistic databases and have developed a general-purpose model – the annotation graph – for expressing the logical structure of linguistic annotations [8]. A mapping to relational form permits existing database technology to be used for persistent storage and transaction processing [5, 9]. The annotation graph model, a generalization of the Tipster model used in text retrieval [13], is capable of representing virtually all types of linguistic annotation (e.g. phonetic, orthographic, part-of-speech, syntactic, discourse, intonational). This development has opened up an interesting range of new possibilities for creation, maintenance and search, and is leading to the development of new annotation tools with applicability across the text, audio and video modalities. It is also permitting existing annotation tools – each with large user-bases – to be made fully interoperable.

Annotation Graphs. An annotation graph is a directed acyclic graph where edges are labeled with fielded records, and nodes are (optionally) labeled with time offsets. The model can be illustrated with an application to a simple linguistic database called TIMIT [11].
This database contains phonetically transcribed recordings of 630 speakers of 8 major dialects of American English, and was designed to improve the dialect independence of speech recognition systems for American English. Figure 1 shows part of the annotation of one of the sentences. The first file contains the word transcription, and the second file contains the phonetic transcription. Part of the corresponding annotation graph is listed underneath. Each node contains an identifier and an offset (in 16kHz samples).

train/dr1/fjsp0/sa1.wrd:

    2360 5200 she
    5200 9680 had
    9680 11077 your
    11077 16626 dark
    16626 22179 suit
    22179 24400 in
    24400 30161 greasy
    30161 36150 wash
    36720 41839 water
    41839 44680 all

train/dr1/fjsp0/sa1.phn:

    0 2360 h#
    2360 3720 sh
    3720 5200 iy
    5200 6160 hv
    6160 8720 ae
    8720 9680 dcl
    9680 10173 y
    10173 11077 axr
    11077 12019 dcl
    12019 12257 d

Graph nodes (identifier, offset):

    0 0
    1 2360
    2 3270
    3 5200
    4 6160
    5 8720
    6 9680
    7 10173
    8 11077

Graph edges (label, start node, end node):

    P/h#    0 1
    P/sh    1 2
    P/iy    2 3
    W/she   1 3
    P/hv    3 4
    P/ae    4 5
    P/dcl   5 6
    W/had   3 6
    P/y     6 7
    P/axr   7 8
    W/your  6 8

Figure 1: TIMIT Annotation Data and Graph Structure
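The step from the two TIMIT files to graph arcs can be sketched in a few lines of Python. This is an illustration only, not AGTK code: the function name timit_to_arcs, the tuple layout, and the way nodes are numbered (one node per distinct offset, in increasing order) are assumptions made for the example. The W/ and P/ label prefixes are borrowed from Figure 1.

```python
def timit_to_arcs(text, prefix):
    """Turn TIMIT 'start end label' lines into (start, end, label) arcs.

    Offsets are sample counts; `prefix` marks the annotation type,
    e.g. "W" for words and "P" for phones, as in Figure 1.
    """
    arcs = []
    for line in text.strip().splitlines():
        start, end, label = line.split()
        arcs.append((int(start), int(end), f"{prefix}/{label}"))
    return arcs


# First lines of the two files shown in Figure 1
wrd = """
2360 5200 she
5200 9680 had
"""
phn = """
0 2360 h#
2360 3720 sh
3720 5200 iy
5200 6160 hv
6160 8720 ae
8720 9680 dcl
"""

arcs = timit_to_arcs(wrd, "W") + timit_to_arcs(phn, "P")

# Assumption: every distinct offset becomes one node of the graph.
offsets = sorted({t for start, end, _ in arcs for t in (start, end)})
node_of = {offset: number for number, offset in enumerate(offsets)}

for start, end, label in arcs:
    print(node_of[start], "->", node_of[end], label)
```

Word and phone arcs that share an offset end up sharing a node, which is how the two transcription tiers become a single graph.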

Overview. The intended audience of this report is would-be AGTK application developers. Each section is intended to cover a different aspect of application development. Section 2 gives a high-level overview of the architecture, showing how all the pieces fit together. Sections 3 and 4 describe the AG and file I/O libraries respectively, with examples of how to call them (in C++). Section 5 shows how these libraries are called from Tcl and Python. Sections 6 and 7 show how special-purpose GUI components for visualization and annotation are interfaced to the libraries, and how complete applications are developed. Section 8 concludes the presentation with some remarks about ongoing activities. The appendices include further definitions and samples (IDL, DTD, configuration files, code samples).

All software described in this report is available under an open source license. For information about downloads and pointers to documentation, please see www.ldc.upenn.edu/AG.

2 Architecture

Existing annotation tools are based on a two-level model (Figure 2 Left). The system we demonstrate is based on a three-level model, in which annotation graphs provide a logical level independent of the application and physical levels (Figure 2 Right). This is the three-level model of modern database systems [1] applied to linguistic databases in support of data independence, data reuse, and software integration. The application level represents special-purpose tools built on top of the general-purpose infrastructure at the logical level.

[Figure 2. Left: the Application Level (annotation tools, extraction systems, visualization & exploration, conversion tools, evaluation software, query systems, automatic aligners) sits directly on the Physical Level (RDB format, XML, tab delimited flat files). Right: the same two levels, with a Logical Level (AG-API) between them.]

Figure 2: The Two and Three-Level Architectures for Speech Annotation

The system is built from several components which instantiate this model. Figure 3 shows the architecture of the tools currently being developed. Annotation tools must provide graphical user interface components for signal visualization and annotation. The communication between components is handled through an extensible event language. An application programming interface for annotation graphs (AG-API) has been developed to support well-formed operations on annotation graphs. This permits applications to abstract away from file format issues, and deal with annotations purely at the logical level.

As with other recent architectures for language technologies (e.g. [2]), the architecture consists of a set of loosely-coupled, heterogeneous components that communicate with each other by exchanging messages. This design has three benefits. First, components can be implemented in the most opportune language, and wrappers can easily be added to legacy and third-party components. Second, message traffic can be logged to facilitate error diagnosis and to permit inter-component and human-computer interactions to be replayed and analyzed. Third, message passing permits the transport protocol to be separated from the communication content. The former is enforced by the infrastructure, while the latter is extremely flexible.


Figure 3: Architecture for Annotation Systems

3 The AG Library

The AG library (libag) is implemented in C++, and provides functions for creating, deleting, modifying and searching the following AG objects: AGSet, AG, Annotation, Anchor, Timeline, Signal, Feature and Metadata. These objects are related to each other according to the object model shown in Figure 4. The various objects will be explained in more detail as we describe the API.

Figure 4: The AG Object Model

The AG library also keeps indexes for the Annotation, Anchor, Feature and Metadata types so that searches can be done efficiently. The API provides access to internal objects (signals, anchors, annotations, etc.) using identifiers. Types of AG identifiers include AGSetId, AGId, and AnnotationId. Please see Appendix A for a complete list of AG data types and their identifiers.

3.1 The structure of AG identifiers

All identifiers are represented as strings. The internal structure of an identifier can best be understood in terms of the object hierarchy in Figure 4. AG identifiers are fully-qualified: given an object identifier, all of the ancestor objects can be discovered simply by inspecting the identifier. For instance, an anchor "Timit:AG1:Anchor2" belongs to the AG "Timit:AG1", which in turn belongs to the AGSet "Timit". Internally, the AG library maps these string identifiers to object references. A consequence of this design is that AG objects can be referenced from scripting languages, using human-readable names.
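To make the "fully-qualified" property concrete, the following Python sketch recovers the ancestor identifiers from an identifier string. It is not part of the AG library: the helper name ancestors is hypothetical, and the sketch assumes only the colon-separated layout shown in the example above.

```python
def ancestors(identifier):
    """Return the ids of all ancestor objects, outermost first.

    Assumes the colon-separated layout shown in the text,
    e.g. "Timit:AG1:Anchor2".
    """
    parts = identifier.split(":")
    return [":".join(parts[:n]) for n in range(1, len(parts))]


# The anchor from the text: its AGSet first, then its AG
print(ancestors("Timit:AG1:Anchor2"))
```

An AGSetId such as "Timit" has no ancestors, so the helper returns an empty list for it.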

3.2 AG API functions

This section explains some typical AG API functions. The complete IDL definition of the AG-API is provided in the appendix (also available online).

3.2.1 AGSet and AG functions

An AGSet is an object which contains a set of annotation graphs. Typically, an AGSet corresponds to a corpus, but it might also correspond to a particular user-specified selection from a corpus, or a selection spanning several corpora. The first thing to do in working with annotation graphs is to create an AGSet object to hold them.

CreateAGSet

    AGSetId CreateAGSet(AGSetId agSetId);

An AGSet must be created using CreateAGSet before anything else can be done. CreateAGSet creates an empty AGSet with a specified AGSetId, and returns the AGSetId. Once an AGSet is created, Timelines, Signals, AGs, and then Annotations and Anchors can also be created. Certain functions can then be called on these data types; for example, to test for their existence or to delete them. An AGSet can be deleted by using DeleteAGSet. Its existence can be tested by using ExistsAGSet.

CreateAG

Once an AGSet has been created, we can create individual AGs. We do this as follows:

    AGId CreateAG(Id id);
    AGId CreateAG(Id id, TimelineId timelineId);

The id might be an AGSetId or an AGId. If it is an AGSetId, an AGId will be assigned to the new AG. However, if it is an AGId, the library will try to use the supplied id. If this id is unavailable, it will assign a new AGId. The timelineId is the id of the timeline with which the new AG will be associated. An AG can be created without being associated to any timeline.

CreateAG returns the AGId of the new AG. It throws an AGException if the id does not contain a valid AGSetId, or if the timeline does not exist.

3.2.2 Timeline functions

A timeline is a set of signals sharing the same abstract time. For example, a three-way conference call can be recorded with each participant on a different channel. Each channel can then be seen as a separate Signal, and these three synchronized signals are grouped into the same Timeline. The functions CreateTimeline, ExistsTimeline and DeleteTimeline will now be explained.

CreateTimeline

    TimelineId CreateTimeline(Id id);

CreateTimeline creates a new Timeline and returns the object’s TimelineId. The id argument is an AGSetId or a TimelineId. In either case, the AGSet to which the Timeline belongs must already exist; otherwise an exception will be thrown. For example:

    /* Create an AGSet with id "Timit" */
    AGSetId agSetId = CreateAGSet("Timit");

    /* Fine, since AGSet "Timit" exists */
    TimelineId timeline1 = CreateTimeline(agSetId);

    /* Fine. The kernel knows that "Timit:Timeline2" is not an AGSetId,
       so it will consider it as a TimelineId of a Timeline belonging
       to AGSet "Timit", which exists */
    TimelineId timeline2 = CreateTimeline("Timit:Timeline2");

    /* Exception caught, since AGSet "CallHome" does not exist */
    TimelineId timeline3 = CreateTimeline("CallHome");

    /* Exception caught. The kernel knows that "CallHome:Timeline2" is
       not an AGSetId, so it will consider it as a TimelineId of a
       Timeline belonging to AGSet "CallHome", which does not exist. */
    TimelineId timeline4 = CreateTimeline("CallHome:Timeline2");

ExistsTimeline

    boolean ExistsTimeline(TimelineId timelineId);

This function tests for the existence of the specified Timeline, and returns true if it exists and false otherwise.

DeleteTimeline

    void DeleteTimeline(TimelineId timelineId);

This function deletes the specified Timeline if it exists.

CreateSignal

    SignalId CreateSignal(Id id, URI uri, MimeClass mimeClass,
                          MimeType mimeType, Encoding encoding,
                          Unit unit, Track track);

CreateSignal creates a new signal and adds it to the timeline. The id argument might be a TimelineId or a SignalId. If it is a TimelineId, the library will generate a new SignalId. If it is a SignalId, the library will try the given id first, and if it is taken, generate a new SignalId. If the id given is invalid, it throws an AGException.

The uri argument specifies a location where the signal is to be found. Applications may use this information to display and replay a signal. The mimeClass and mimeType arguments tell an application about the format of the signal, while the encoding argument specifies how samples are coded (e.g. mu-law). The unit argument specifies the sample rate of the signal; annotation applications may use this information to set the granularity of time coding and time alignment in a user interface. The track argument records which track of the signal file contains the signal. In this way, we can create two or more distinct signal objects which reference different tracks of the same signal file.

GetSignals

    SignalIds GetSignals(TimelineId timelineId);

GetSignals returns the SignalIds of the Signals contained in the specified Timeline. The SignalIds are separated by spaces. For example:

    /* Create an AGSet with id "Timit" */
    AGSetId agSetId = CreateAGSet("Timit");

    /* Create a Timeline with id "Timit:timeline1" */
    CreateTimeline("Timit:timeline1");

    /* Create a signal with id "Timit:timeline1:signal1" */
    CreateSignal("Timit:timeline1:signal1", "my uri", "my mimeClass",
                 "my mimeType", "my encoding", "my unit", "my track");

    /* Create a signal with id "Timit:timeline1:signal2" */
    CreateSignal("Timit:timeline1:signal2", "my uri", "my mimeClass",
                 "my mimeType", "my encoding", "my unit", "my track");

    /* Create a signal with id "Timit:timeline1:signal3" */
    CreateSignal("Timit:timeline1:signal3", "my uri", "my mimeClass",
                 "my mimeType", "my encoding", "my unit", "my track");

    /* Get Signals contained in Timeline "Timit:timeline1".
       The variable signalIds will be
       "Timit:timeline1:signal1 Timit:timeline1:signal2 Timit:timeline1:signal3" */
    SignalIds signalIds = GetSignals("Timit:timeline1");

3.2.3 Annotation Functions

CreateAnnotation

    AnnotationId CreateAnnotation(Id id, AnchorId start, AnchorId end,
                                  AnnotationType annotationType);

CreateAnnotation creates a new Annotation. The id argument can be an AGId or an AnnotationId. If it is an AGId, an AnnotationId will be assigned to the new annotation. On the other hand, if it is an AnnotationId, the library will try to use the supplied id. If this id is unavailable, it will assign a new AnnotationId. start is the id of the start anchor. end is the id of the end anchor. annotationType is the type of the annotation. CreateAnnotation returns the AnnotationId of the new annotation.

ExistsAnnotation

    bool ExistsAnnotation(AnnotationId annotationId);

Test if the annotation exists.

DeleteAnnotation

    void DeleteAnnotation(AnnotationId annotationId);

Delete an annotation.

CopyAnnotation

    AnnotationId CopyAnnotation(AnnotationId annotationId);

CopyAnnotation copies an existing annotation, with a new identifier assigned to the new annotation.

SplitAnnotation

    AnnotationIds SplitAnnotation(AnnotationId annotationId);

This function splits an annotation into two, creating a new annotation with the same label data as the original one. It returns Ids for both annotations.

[Figure 5. Before: anchor 1 (offset 4.3) --l--> anchor 2 (offset 6.5). After: anchor 1 (4.3) --l--> anchor 3 --l--> anchor 2 (6.5).]

Figure 5: Split an annotation

NSplitAnnotation

    AnnotationIds NSplitAnnotation(AnnotationId annotationId, short N);

This function splits an annotation into N annotations, creating N-1 new annotations having the same label data as the original one. It returns Ids for all annotations, including the original one.

[Figure 6. Before: anchor 1 (offset 4.3) --l--> anchor 2 (offset 6.5). After: anchor 1 (4.3) --l--> anchor 3 --l--> anchor 4 --l--> anchor 5 --l--> anchor 2 (6.5).]

Figure 6: Nsplit an annotation with N = 4
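The effect of an N-way split can be sketched on a toy in-memory model in Python. This is not how libag is implemented: the dictionaries, the function name nsplit and the new_ids iterator are all hypothetical. The sketch assumes, following Figure 6 where the new anchors 3, 4 and 5 are drawn without offsets, that the intermediate anchors are created unanchored.

```python
import itertools


def nsplit(anchors, annotations, ann_id, n, new_ids):
    """Split annotation `ann_id` into `n` chained annotations.

    anchors:     dict anchor id -> offset (None if unanchored)
    annotations: dict annotation id -> (start anchor, end anchor, label)
    new_ids:     iterator yielding unused ids (hypothetical helper)
    Returns the ids of all n annotations, the original one first.
    """
    start, end, label = annotations[ann_id]
    # n - 1 new anchors, created without offsets (assumption from Figure 6)
    middle = [next(new_ids) for _ in range(n - 1)]
    for anchor in middle:
        anchors[anchor] = None
    chain = [start] + middle + [end]
    # The original annotation keeps its id and now ends at the first new anchor.
    annotations[ann_id] = (chain[0], chain[1], label)
    result = [ann_id]
    # Each remaining pair of neighbouring anchors gets a new annotation
    # carrying the same label data.
    for left, right in zip(chain[1:], chain[2:]):
        new_ann = next(new_ids)
        annotations[new_ann] = (left, right, label)
        result.append(new_ann)
    return result


anchors = {"1": 4.3, "2": 6.5}
annotations = {"ann1": ("1", "2", "l")}
fresh = (f"new{i}" for i in itertools.count(1))
ids = nsplit(anchors, annotations, "ann1", 4, fresh)
print(ids)
```

SplitAnnotation corresponds to the special case n = 2 in this model.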

3.2.4 Accessing Label Data

SetFeature

    void SetFeature(Id id, FeatureName featureName, FeatureValue featureValue);

SetFeature sets the features of an annotation as well as the features of the metadata associated with AGSets, AGs, Timelines and Signals. So, the Id can be an AnnotationId, AGSetId, AGId, TimelineId or SignalId. This is also true for other Feature functions, such as ExistsFeature, DeleteFeature, GetFeature, etc.

GetAnchorSet

    AnchorIds GetAnchorSet(AGId agId);

GetAnchorSet returns all the Anchors in a given AG.

GetAnchorSetByOffset

    AnchorIds GetAnchorSetByOffset(AGId agId, Offset offset, float epsilon=0);

GetAnchorSetByOffset returns all anchors whose offsets lie between offset-epsilon and offset+epsilon, inclusive. The default value for epsilon is 0.

GetAnchorSetNearestOffset

    AnchorIds GetAnchorSetNearestOffset(AGId agId, Offset offset);

GetAnchorSetNearestOffset returns all anchors at the nearest offset to the given offset.
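The two offset queries can be illustrated with a small Python sketch over a plain dictionary of anchors. The function names, the dictionary layout and the anchor ids are hypothetical. Two details are assumptions not settled by the text: unanchored anchors are skipped, and when two different offsets are equally near, the anchors at both are returned.

```python
def anchors_by_offset(anchors, offset, epsilon=0.0):
    """Ids of anchors whose offset lies in [offset - epsilon, offset + epsilon]."""
    return sorted(
        a for a, t in anchors.items()
        if t is not None and offset - epsilon <= t <= offset + epsilon
    )


def anchors_nearest_offset(anchors, offset):
    """Ids of all anchors at the offset closest to `offset`."""
    # Assumption: anchors without an offset take no part in the search.
    timed = {a: t for a, t in anchors.items() if t is not None}
    if not timed:
        return []
    best = min(abs(t - offset) for t in timed.values())
    return sorted(a for a, t in timed.items() if abs(t - offset) == best)


# anchor id -> offset (None marks an unanchored anchor)
anchors = {"A1": 4.0, "A2": 6.5, "A3": 6.5, "A4": 9.0, "A5": None}
print(anchors_by_offset(anchors, 6.0, 0.5))
print(anchors_nearest_offset(anchors, 7.0))
```

With the default epsilon of 0, the range query matches only anchors exactly at the given offset.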

3.2.5 Accessing Annotations

GetIncomingAnnotationSet and GetOutgoingAnnotationSet

    AnnotationIds GetIncomingAnnotationSet(AnchorId anchorId);

This function returns the incoming annotations of the specified anchor.

    AnnotationIds GetOutgoingAnnotationSet(AnchorId anchorId);

This function returns the outgoing annotations of the specified anchor.

The incoming annotations of anchor a are the annotations which end with anchor a. Similarly, the outgoing annotations of anchor a are the annotations which start with anchor a. For example, in the AG shown in Figure 7, the incoming annotations of anchor 2 are a,b,c,d,e, and the outgoing annotations of anchor 2 are f,g,h.

[Figure 7. Anchors 1 (offset 4.3), 2 (offset 6.5) and 3 (offset 8.9); annotations a, b, c, d and e end at anchor 2, and annotations f, g and h start at anchor 2.]

Figure 7: Incoming and outgoing annotations
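The incoming and outgoing sets are simply two indexes over the same edge list, as the following Python sketch shows. It is a toy model, not the library's implementation. The edge list mirrors Figure 7 under the assumption that annotations a to e start at anchor 1 and annotations f to h end at anchor 3.

```python
from collections import defaultdict

# (annotation id, start anchor, end anchor); endpoints assumed from Figure 7
edges = [(name, "1", "2") for name in "abcde"] + \
        [(name, "2", "3") for name in "fgh"]

incoming = defaultdict(list)  # anchor -> annotations ending there
outgoing = defaultdict(list)  # anchor -> annotations starting there
for ann, start, end in edges:
    outgoing[start].append(ann)
    incoming[end].append(ann)

print(incoming["2"])
print(outgoing["2"])
```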

GetAnnotationSetByOffset

    AnnotationIds GetAnnotationSetByOffset(AGId agId, Offset offset);

GetAnnotationSetByOffset returns all annotations that overlap a particular offset.

GetAnnotationSeqByOffset

    AnnotationIds GetAnnotationSeqByOffset(AGId agId);

GetAnnotationSeqByOffset returns all annotations sorted by start anchor offsets as the first sorting key, end anchor offsets as the second, and AnnotationIds as the third.

    AnnotationIds GetAnnotationSeqByOffset(AGId agId, Offset begin);

This function returns all the annotations with their start anchor offset greater than or equal to a specified offset, sorted by start anchor offsets as the first sorting key, end anchor offsets as the second, and AnnotationIds as the third.

    AnnotationIds GetAnnotationSeqByOffset(AGId agId, Offset begin, Offset end);

This function returns all the annotations with the start anchor offset in between the specified offsets, sorted by start anchor offsets as the first sorting key, end anchor offsets as the second, and AnnotationIds as the third.
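The three-key ordering can be expressed compactly as a tuple sort. The Python sketch below works on a hypothetical dictionary of annotations and is not the library's implementation. It assumes that the end bound is inclusive, which the text does not state.

```python
def annotation_seq_by_offset(annotations, begin=None, end=None):
    """Ids sorted by (start offset, end offset, id).

    annotations: dict id -> (start offset, end offset)
    begin, end:  optional bounds on the start offset
                 (assumption: both bounds are inclusive)
    """
    selected = [
        (start, stop, ann_id)
        for ann_id, (start, stop) in annotations.items()
        if (begin is None or start >= begin) and (end is None or start <= end)
    ]
    # Tuples compare element by element, which gives the three sorting keys.
    return [ann_id for _, _, ann_id in sorted(selected)]


annotations = {
    "Ann3": (0.0, 2.0),
    "Ann1": (0.0, 2.0),
    "Ann2": (0.0, 1.0),
    "Ann4": (1.5, 3.0),
}
print(annotation_seq_by_offset(annotations))
print(annotation_seq_by_offset(annotations, begin=1.0))
```

Ann2 comes first because its end offset is smaller, while Ann1 and Ann3 tie on both offsets and are ordered by id.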


3.2.6 Built-in Load and Save Functions

toXML

    string toXML(Id id);

toXML returns a string in the ATLAS Level 0 XML format of the specified AGSet or AG.

LoadFromDB

    void LoadFromDB(string connStr, AGSetId agsetId);

LoadFromDB loads the specified AGSet from the database server. The variable connStr specifies the connection string that ODBC uses to connect to the server. It contains information such as hostname, database name, user name, password etc. Table 1 shows some of the parameters used in a connect string; for a complete list, see http://www.mysql.com/doc/M/y/MyODBC_connect_parameters.html.

    Argument   What the argument specifies
    DSN        Registered ODBC Data Source Name.
    SERVER     The hostname of the database server.
    UID        User name as established on the server. On SQL Server this is the logon name.
    PWD        Password that corresponds with the logon name.
    DATABASE   Database to connect to. If not given, DSN is used.

Table 1: Parameters in Connect String

DSN is the registered ODBC Data Source Name; it should be defined in the .odbc.ini file in your home directory. All other arguments can be defined either in the .odbc.ini file or in the connect string itself. To gain access to most ODBC data sources, you must provide a valid user ID and corresponding password. These values are initially registered by the database administrator.

Probably the easiest way is to define every argument in the .odbc.ini file in your home directory. The following is a sample driver section for DSN ’talkbank’ in the configuration file for iODBC. To make the explanation easier, line numbers are included. Please notice that UID and PWD become USER and PASSWORD, respectively, in iODBC’s configuration file.

    1  [talkbank]
    2  Driver   = /pkg/ldc/lib/libmyodbc.so
    3  DSN      = talkbank
    4  SERVER   = talkbank.ldc.upenn.edu
    5  USER     = myuserid
    6  PASSWORD = mypasswd
    7  DATABASE = talkbank

Line 1 is the name of the driver section, which is ’talkbank’. You can have multiple driver sections in one configuration file. Line 2 specifies the ODBC driver to use. Line 3 gives the name of the DSN, which is ’talkbank’. Line 4 specifies the hostname of the machine on which the database server is running. Line 5 is the user name to use to connect to the server. Line 6 is the password associated with the user name. Line 7 is the database to connect to.

If you have all required arguments specified in your .odbc.ini file like the one above, the connect string can simply be:

    DSN=talkbank;

If you have not specified some of the arguments, say USER and PASSWORD, in the configuration file, you can still specify them in your connect string:

    DSN=talkbank;UID=myuserid;PWD=mypasswd;
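When a script assembles the connect string from separate values, a small helper keeps the KEY=value; layout consistent. The Python sketch below is only an illustration: the helper name connect_string is hypothetical, and it does no escaping, so it assumes that no value contains a semicolon.

```python
def connect_string(**params):
    """Join ODBC parameters as KEY=value; pairs, in the order given.

    Assumption: values contain no ';' characters (no escaping is done).
    """
    return "".join(f"{key}={value};" for key, value in params.items())


# Everything else comes from the [talkbank] section of .odbc.ini
print(connect_string(DSN="talkbank"))

# User name and password supplied in the connect string instead
print(connect_string(DSN="talkbank", UID="myuserid", PWD="mypasswd"))
```

The resulting string would then be passed as the connStr argument of LoadFromDB.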

StoreToDB

    void StoreToDB(string connStr, AGSetId agSetId);

StoreToDB stores the specified AGSet to the database server. The variable connStr contains connection information, such as hostname, database name, user name and password, as explained above.

Many other functions exist; please see the appendix. There are also examples of how to use these functions in the demo code in the source code distribution.

4 The File I/O Library

The AG file I/O library is also implemented in C++. This section describes how the I/O classes are used to read native format files into AGs and write them back. Table 2 summarizes the formats which are currently supported by the AG file I/O library. There is one abstract class named agfio which declares the interface for the load and store methods, which are virtual functions. By inheriting the agfio class and implementing the load and store methods, each format will be a class with load and store methods. Loading or storing is done by creating an instance of a format class and calling the load or store method with the proper arguments.

4.1 Load Method

The load method is declared as follows:

    AGIds load(const string& filename,
               const Id& id = "",
               map<string,string>* signalInfo = NULL,
               map<string,string>* options = NULL)
        throw(LoadError);

    Format name   Supported I/O   Target corpus or format
    AIF           input/output    ATLAS Interchange Format, Level 0 (http://www.ldc.upenn.edu/AG/doc/xml/ag.dtd)
    BAS           input           BAS Partitur format (http://www.phonetik.uni-muenchen.de/Bas/BasFormatseng.html)
    BU            input           Boston University Radio Speech Corpus
    LCF           input/output    LDC Callhome Format
    SwitchBoard   input           Switchboard
    TF            input/output    Table Format
    TIMIT         input           TIMIT Corpus (http://www.ldc.upenn.edu/lol/docs/TIMIT.html)
    TreeBank      input/output    Penn Treebank (http://www.cis.upenn.edu/~treebank/home.html)
    XLabel        input           XLabel format

Table 2: Supported file formats

filename is the name of the annotation file to load, including the path. For some formats, however, it should not include the file extension.

id is the id of an AG or AGSet into which the annotation file is loaded. If there does not exist an AG or AGSet with the specified id, load will create one. The AIF format does not require any id since its files specify their own ids.

signalInfo is a feature-value map. The features a signal can have are uri, mimeClass, mimeType, encoding, unit, and track. (See the interface definition of the CreateSignal() function.) signalInfo is always optional.

options is a feature-value map of options.

Each format has its own requirements for the arguments. Table 3 summarizes those requirements. Note that BU and TIMIT require filename not to include extensions of file names. Those formats consist of several files whose names are identical except for their extensions. The BU and TIMIT loaders will expand the given filename with proper extensions. The base option of BU allows users to select which annotation file to use for the base annotation. Each annotation file is loaded into a single AG except for an AIF file, which can create many AGs. Thus, each call of the load method returns an AGId, except for AIF, where calling load returns a list of AGIds. If an error occurs during loading, LoadError is thrown and the corresponding message is printed.

    Format       filename            id   signalInfo  options
    AIF          Yes                 No   No          Optional
                                                        encoding := XML encoding name for the output (default value: UTF-8)
    BAS          Yes                 Yes  Optional    No
    BU           Yes (no extension)  Yes  Optional    Optional
                                                        base := lbl | lba (default value: lbl)
    LCF          Yes                 Yes  Optional    No
    SwitchBoard  Yes                 Yes  Optional    No
    TF           Yes                 Yes  Optional    Yes
                                                        header := field names separated by separators
                                                        separator := separator of the format
                                                        ann_type := type of the annotation (default value: TF)
    TIMIT        Yes (no extension)  Yes  Optional    No
    TreeBank     Yes                 Yes  No          No
    XLabel       Yes                 Yes  Optional    Optional
                                                        ann_type := type of the annotation (default value: object)

Table 3: Argument requirements for load method

4.2 Store Method

store is used to store the loaded annotation file after it has been modified. The store method is declared as follows:

    void store(const string& filename,
               const AGIds& agIds = "",
               map<string,string>* options = NULL)
        throw(StoreError);

filename is the name of the file where the AGs are stored.

agIds is a list of AG ids to be stored. If agIds is empty, all AGs are stored.

options is a feature-value map of options. Some formats may require options to work properly.

The TF format takes 2 options – header and separator – for the case when the user wants to change the header or separator of the output file. However, this is optional. If any error occurs during store, StoreError is thrown and the corresponding message is printed. As shown in Table 2, currently only AIF, LCF and TF formats can be stored.

4.3 Example

Consider a case where we want to load a CSV (Comma Separated Value) file, which is of the TF format. Thus, we need the TF I/O class. In order to use the TF I/O class, we have to include TF.h in our program:

    #include <TF.h>

Now, we can declare a TF loader:

    agfio* loader = new TF;

or

    TF loader;

Before we can call the load method of loader, we need to collect the arguments for the method. First of all, the name of the annotation file is V1-1.ann, and it is located in the data directory. Secondly, we will load the file into an AG whose id is CSV74:AG33. Note that we don’t need to create an AG with that id. Thirdly, the signal file for the annotation file is V1-1.wav. It is located in VERVET/somewhere, its mime class is audio, its mime type is wav, its encoding method is 16-bit linear, and its encoding unit is 44.1 kHz. Thus we have the following signalInfo:

    map<string,string> signalInfo;
    signalInfo["uri"] = "VERVET/somewhere/V1-1.wav";
    signalInfo["mimeClass"] = "audio";
    signalInfo["mimeType"] = "wav";
    signalInfo["encoding"] = "16-bit linear";
    signalInfo["unit"] = "44.1kHz";

Finally, the TF loader requires options such as header, separator and ann_type. Using the header and separator for the VERVET CSV format and ‘CSV’ as ann_type, we have the following options:

    map<string,string> options;
    options["header"] = "START,END,TYPE,DATE,TIME,";
    options["header"] += "CALLER,RECIPIENT,CONTEXT,CALL_TYPE,REMARKS";
    options["separator"] = ",";
    options["ann_type"] = "CSV";

Now, we are ready to load the file:

    // if the loader is declared as "agfio* loader = new TF;"
    loader->load("data/V1-1.ann", "CSV74:AG33", &signalInfo, &options);

or

    // if the loader is declared as "TF loader;"
    loader.load("data/V1-1.ann", "CSV74:AG33", &signalInfo, &options);

5 Scripting Language Access to the APIs

This section describes how to access the AG API and AG file I/O API from Tcl and Python. Throughout this section, we assume that the AG library is installed in /usr/local, and /usr/local/lib is included in the user’s LD_LIBRARY_PATH environment variable. This can be done in a Unix Bourne shell with:

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

or in a C shell:

    setenv LD_LIBRARY_PATH "$LD_LIBRARY_PATH":/usr/local/lib

5.1 Tcl

In order to use the AG Tcl interface, a Tcl program must include the following 2 lines of code before any AG API calls:

    source /usr/local/lib/ag.tcl
    load /usr/local/lib/ag_tcl.so

By looking at the interface definitions of the AG API in ag.idl (see Appendix A), we can see how to make AG API calls. For example, CreateAnnotation is defined in ag.idl as follows:

    // Id may be AGId or AnnotationId
    AnnotationId CreateAnnotation( in Id             id,
                                   in AnchorId       anchorId1,
                                   in AnchorId       anchorId2,
                                   in AnnotationType annotationType );

To create an Annotation in a Tcl program, we can do:

    AG_CreateAnnotation $agId $a1 $a2 $ann_type

where $agId is the id of an AG, $a1 and $a2 are ids of Anchors, and $ann_type is the type of the annotation (e.g. “word”). The difference here is that the Tcl command has the prefix AG_ and that there are no commas or parentheses.

The following is an example of the load function call:

    # setting signal information
    set signalInfo(uri) "LDC-LCF00/speech_data/en_4065.sph"
    set signalInfo(mimeClass) "audio"
    set signalInfo(mimeType) "NIST sphere"
    set signalInfo(encoding) "8-bit mu-law"
    set signalInfo(unit) "8kHz"

    # calling the load function
    AGF_load LCF "data/en_4065.txt" $agset signalInfo

Compare this to the I/O function load described in Section 4.1. AGF_load has an additional argument for the format name, which is the first argument of the function. The arguments signalInfo and options are Tcl arrays (hashes). You can find more examples in the demo program, demo/ag_wrapper_test.tcl, in the aglib package.


5.2 Python

In order to use the AG Python interface, a Python program must import the ag module. Before the ag module can be imported, the Python interpreter must know where the ag module files (ag.py, ag_python.so) are located. This can be done in one of two ways. The first method is to append the directory (/usr/local/lib) to the PYTHONPATH environment variable:

    sh$ export PYTHONPATH=$PYTHONPATH:/usr/local/lib

or

    csh% setenv PYTHONPATH "$PYTHONPATH":/usr/local/lib

The second method is to put the following lines into the Python program before importing the ag module:

    import sys
    sys.path.append("/usr/local/lib")

Now, our program can import the ag module and use the APIs.

    import ag

By looking at the interface definitions of the AG API in ag.idl (see Appendix A), we can easily see how to make AG API calls. For example, CreateAnnotation is defined in ag.idl as follows:

    // Id may be AGId or AnnotationId
    AnnotationId CreateAnnotation( in Id             id,
                                   in AnchorId       anchorId1,
                                   in AnchorId       anchorId2,
                                   in AnnotationType annotationType );

To create an Annotation in a Python program, we do:

    ag.CreateAnnotation(agId, a1, a2, ann_type)

where agId is the id of an AG, a1 and a2 are ids of Anchors, and ann_type is the type of the annotation. The following is an example of the load function call:

    # setting signal information
    signalInfo = {}
    signalInfo["uri"] = "LDC-LCF00/speech_data/en_4065.sph"
    signalInfo["mimeClass"] = "audio"
    signalInfo["mimeType"] = "NIST sphere"
    signalInfo["encoding"] = "8-bit mu-law"
    signalInfo["unit"] = "8kHz"
    # calling the load function
    ag.load("LCF", "data/en_4065.txt", agset, signalInfo)


Compare this to the I/O function load described in Section 4.1. The Python load function has an additional argument for the format name, which is the first argument of the function. The arguments signalInfo and options are Python dictionaries. You can find more examples in the demo program, demo/ag_wrapper_test.py, in the aglib package.
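As a small illustration of the signalInfo argument, the sketch below builds the dictionary with a helper function and checks that the five keys used in the load example are present before the dictionary would be handed to ag.load. The names make_signal_info and missing_keys are assumptions introduced here for illustration and are not part of the AG Toolkit; the ag.load call appears only as a comment so that the sketch runs without the toolkit installed.

```python
# Keys used for signal information in the load example.
SIGNAL_INFO_KEYS = ("uri", "mimeClass", "mimeType", "encoding", "unit")


def make_signal_info(uri, mime_class, mime_type, encoding, unit):
    """Return a signalInfo dictionary (hypothetical helper, not part of ag)."""
    return {
        "uri": uri,
        "mimeClass": mime_class,
        "mimeType": mime_type,
        "encoding": encoding,
        "unit": unit,
    }


def missing_keys(signal_info):
    """List the keys from SIGNAL_INFO_KEYS that are absent from signal_info."""
    return [key for key in SIGNAL_INFO_KEYS if key not in signal_info]


signalInfo = make_signal_info(
    "LDC-LCF00/speech_data/en_4065.sph",
    "audio", "NIST sphere", "8-bit mu-law", "8kHz")

if not missing_keys(signalInfo):
    # With the toolkit installed, the call from the example would be:
    #   ag.load("LCF", "data/en_4065.txt", agset, signalInfo)
    print("signalInfo is complete")
```

Whether the load function itself requires all five keys is not stated in this manual, so the check is only a convenience for catching misspelled key names early.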

5.3 Exception Handling

The AG and AGFIO libraries raise exceptions when they encounter a situation they cannot handle. AGException is the exception raised by the AG core library. LoadError and StoreError are raised by the load and store methods of the AG File I/O library, respectively. These are C++ exceptions, which need to be handled somewhat differently in scripting languages.

In Tcl

Tcl has the catch command to catch exceptions, which we can use to catch AG/AGFIO exceptions. To check whether the caught exception is an AG/AGFIO exception, we need to check two things: the error code returned by the catch command, and the error message. First, the error code should be 1. Second, the error message consists of the error type, the origin of the error, and the message, separated by commas. For an AG/AGFIO exception, the error type must be one of AGException, LoadError, or StoreError. Here is an example:

    set errcode [ catch { AG_CreateAGSet "bad:id" } msg ]
    if {$errcode == 1} {
        set i [string first "," $msg]
        set etype [string range $msg 0 [expr {$i - 1}]]
        if {$etype == "AGException"} {
            puts "AGException caught!!"
        }
    }

In Python

Python has its own exception handling system. When an AG/AGFIO exception is raised, it is propagated to Python in the form of a Python exception. To catch AG/AGFIO exceptions in Python, the program needs to catch AGWError, which stands for AG Wrapper Error. AGWError is a class with three methods: type() returns the type of the error, origin() returns the origin of the error, and message() returns the message of the error. Thus, the user program can examine the exception it has caught.

Here is an example:

    try:
        ag.CreateAGSet("bad:id")
    except ag.AGWError, err:
        print err.origin(), "caused", err.type()
        print "message was:", err.message()
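The error message layout described for Tcl (error type, origin and message, separated by commas) can also be taken apart in a few lines of Python. The following is a minimal sketch and not part of the toolkit: the function name parse_ag_error and the sample message text are made up for illustration, and the sketch assumes that the type and origin fields contain no commas themselves.

```python
AG_ERROR_TYPES = ("AGException", "LoadError", "StoreError")


def parse_ag_error(msg):
    """Split 'type,origin,message' into a 3-tuple, or return None.

    Only the first two commas are treated as separators, so commas inside
    the message text are preserved.
    """
    parts = msg.split(",", 2)
    if len(parts) != 3 or parts[0] not in AG_ERROR_TYPES:
        return None
    return tuple(parts)


# Hypothetical message text, made up for this example.
sample = "AGException,CreateAGSet,bad id: expected a name, got 'bad:id'"
print(parse_ag_error(sample))
print(parse_ag_error("some other Tcl error"))
```

Returning None for anything that does not start with one of the three known error types mirrors the check on the error type in the Tcl example.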

6 Waveform Display with WaveSurfer

WaveSurfer [20] was developed by Kåre Sjölander and Jonas Beskow of KTH as a tool for displaying and manipulating sound files. It is open source, and it is continually being improved. WaveSurfer is written in Tcl/Tk and its widget, called wsurf (Figure 8), can be embedded in an application written in Tcl/Tk. A Python interface has also been developed. This makes WaveSurfer an excellent component to use with the AG Toolkit.

When embedding a wsurf widget in an annotation tool, two kinds of interaction are involved: one for controlling the wsurf widget from the annotation tool, and one for getting information about events happening in the wsurf widget. The former is done via the API that the wsurf widget provides. The latter is handled via a special mechanism called the WaveSurfer plug-in. This section explains how these two communication methods are used. Later in this report (Section 7.6), we will learn how to use wrappers for the wsurf widget that provide an inter-component communication mechanism consistent with the other components of the AG Toolkit.

Figure 8: The wsurf widget

6.1 Installation of wsurf and snack packages

The wsurf widget requires the snack [19] package, also written by Kåre Sjölander. In order to use the wsurf widget from the wish Tcl/Tk interpreter, both snack and wsurf need to be installed in a location accessible to the wish interpreter. The directories which the interpreter searches can be found by using the following Tcl/Tk command:

    puts $auto_path

The snack and wsurf packages should be installed in one of the directories in auto_path. Alternatively, one could install these packages in a private directory, and add the following line at the beginning of the Tcl script:

    set auto_path [concat private_directory_name $auto_path]

Once these components are in the right path, the following command should be included in order to use the wsurf package:

    package require wsurf 1.0

6.2 Wsurf API

The commands provided by the wsurf API include the following: openFile saveFile filename play stop configure ....

For example, the following segment of Tcl/Tk code first creates and then packs a wsurf widget. Then, it opens the file sample.wav, and plays the portion of the sound file starting at 0.25 seconds and ending at 1.25 seconds.

    set w [wsurf .w]
    pack $w
    $w openFile sample.wav
    $w play 0.25 1.25

6.3 WaveSurfer Plug-in API

When some event, such as a change in the highlighted region, occurs in the wsurf widget, this information needs to be passed to the main annotation tool. This can be handled using the plug-in architecture offered by WaveSurfer. A plug-in is a Tcl/Tk script stored in a special location where the wsurf widget will look. This location can be specified using the -plugindir option given to the wsurf command. Within a plug-in, the following Tcl/Tk command is used to register the plug-in:

    wsurf::RegisterPlugin Plug-inName

Once a plug-in is registered, callback functions for common events, such as cursormoved and openfile, can be specified in the plug-in file. The user-definable callbacks include the following:

    cursormovedproc
    cutproc
    getboundsproc
    getconfigurationproc
    getoptproc
    openfileproc
    panecreatedproc
    playproc
    recordproc


6.4 WaveSurfer configuration files

Configuration files can be used to customize the appearance of a wsurf widget. The configurable properties include the number of waveforms (e.g., 1 for a one-channel signal, 2 for a two-channel signal), and the colors and fonts used in the display. In addition, one can specify whether to include spectrograms or pitch tracks. Appendix D shows an example of a WaveSurfer configuration file.

6.5 Python support

The AG Toolkit includes a Tkinter-based Python interface for the wsurf widget. This allows a wsurf widget to be created and controlled from a Python program. For example, the following code starts a wsurf widget from Python.

    from Tkinter import *
    from agWsurf import *
    root = Tk()
    root.tk.eval('package require wsurf')
    w = Wsurf(root)
    w.pack()
    root.mainloop()

7 Building Annotation Tools with Tcl/Tk or Python

The AG library and the AG file I/O library provide the means to create, manipulate, read and write AG data. Using the wrappers for Tcl/Tk and Python, one can access these APIs from a scripting language. This greatly simplifies and accelerates the development cycle of prototyping, testing and debugging. Combined with the wsurf widget, these components provide the foundation for developing a specialized tool. This section explains the design of the inter-component communication (message passing) architecture we use, shows an example annotation component (a spreadsheet widget) and illustrates how to build a new annotation tool. A toy example of a specialized annotation tool using the AG Toolkit is presented.

7.1 Building tools with Tcl/Tk

Tcl [14, 18, 21] is a scripting language, and Tk is a graphical user-interface toolkit. The Tk toolkit provides a graphical user interface for Unix, Windows and Macintosh platforms. Some graphical user-interface components, called widgets, are provided with the standard Tcl/Tk distribution. For example, the following fragment of Tcl/Tk code creates and displays a text widget:

    set t [text .t]
    pack $t

Other GUI components, such as buttons, menubars and canvases, come standard with Tk. In addition, open-source GUI components and extensions are developed and distributed by various developers around the world. [http://dev.scriptics.com] provides pointers to such software.


7.2 Building tools with Python

Python [16, 4] has become very popular as a scripting language. Python provides an object-oriented programming framework, and is easy to learn. A GUI package called Tkinter [12], based on the Tk toolkit, is included in the standard Python distribution, although it is not built by default. Tkinter provides class definitions for the standard widgets included in Tk. In addition, it is relatively easy to write a Tkinter class for a third-party Tk widget that is not supported by Tkinter. The following segment of Python code creates and displays a Tk text widget.

    from Tkinter import *
    root = Tk()
    t = Text(root)
    t.pack()
    root.mainloop()

7.3 The inter-component communication architecture

An annotation tool built with the AG Toolkit will consist of several major components. A typical tool might have the following components:

• a main program (script),
• an annotation/transcription component, and
• a signal display component.

An annotation/transcription component is a component in which the user enters annotations and transcriptions. The signal display component gives access to recorded digital signals, such as speech waveforms. Typically, annotation/transcription components and waveform display components are reused for various specialized annotation tools. To create a new annotation tool, the developer writes a main program that uses the right selection of widgets and provides callback functions to handle widget events. Events are passed among components so that the necessary tasks can be performed within each component.

Consider the following example. Suppose that the user already has an annotation assigned to a specific region in the signal, and now wants to assign new start and end times to the annotation. Suppose the keyboard input Control-g in the waveform component is assigned to this task. When the user hits Control-g while there is a newly highlighted region in the waveform, this information (event) needs to be passed to the main program, and then to the annotation/transcription component and the AG library. This propagation of event information is illustrated in Figure 9. The event SetRegion is generated in the waveform component, and passed to the main program with two parameters, the start time and the end time. Then, the main program sends SetRegion to the transcription component so that the start and end times can be updated in the transcription component. Also, the main program uses the AG function SetAnchorOffset to update the internal representation of the AG data. Table 4 shows a list of typical events passed around in this manner.

Figure 9: An example of inter-component message passing (components: Main program, Waveform display, Transcription editor, AG-API; messages: SetRegion t1 t2, AG::SetAnchorOffset)

    Event name              Typical arguments
    CreateAnnotation        start time, end time
    DeleteAnnotation        annotation ID
    SetFeature              feature, value
    SetRegion               start time, end time
    GetRegion               start time, end time
    SetCurrentAnnotation    annotation ID
    Play                    start time, end time
    Stop

Table 4: A list of common events

These events and their necessary parameters are passed as associative arrays (e.g., Tcl arrays and Python dictionaries). The AG Toolkit provides two utility Tcl files, ag-client.tcl and ag-master.tcl, to assist in the creation of these arrays. Functions in ag-client.tcl are prefixed with AC_ and functions in ag-master.tcl are prefixed with AM_. For example, if the current region has a start time of 0.250 seconds and an end time of 1.250 seconds, AC_CreateAnnotation creates an array with the following key and value pairs: Name:CreateAnnotation, StartTime:0.250 and EndTime:1.250. These arrays are passed to event handlers defined in the recipients. Table 5 illustrates how this passing of events is performed between the main program, an annotation/transcription component (ag-table), a waveform display component (ag-wsurf), the AG library and the AG File I/O library.
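The message flow of Figure 9 can be sketched in Python using plain dictionaries as event messages. This is only an illustration under stated assumptions: the helper ac_set_region, the classes Recorder and MainProgram, and the use of the keys StartTime and EndTime for SetRegion (borrowed from the AC_CreateAnnotation example above) are hypothetical and do not reproduce the toolkit's own code.

```python
def ac_set_region(start, end):
    """Build a SetRegion event message (hypothetical helper)."""
    return {"Name": "SetRegion", "StartTime": start, "EndTime": end}


class Recorder(object):
    """Stand-in for a component; it records the events it receives."""

    def __init__(self):
        self.received = []

    def newevent(self, event):
        self.received.append(event)


class MainProgram(object):
    """Routes events from the waveform display to the other components."""

    def __init__(self, table):
        self.table = table
        self.region = None

    def signal_event(self, event):
        if event["Name"] == "SetRegion":
            self.region = (event["StartTime"], event["EndTime"])
            # Forward the message to the transcription component.  A real
            # tool would also update the AG data here (SetAnchorOffset).
            self.table.newevent(event)


table = Recorder()
main = MainProgram(table)
main.signal_event(ac_set_region(0.250, 1.250))
print(table.received)
```

Because every message carries its own Name key, a component can ignore events it does not understand, which is what allows the same components to be reused across tools.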

7.4 A spreadsheet widget in Tcl/Tk

The AG Toolkit comes with a spreadsheet annotation widget called agTable (or ag-table, as it was originally called in the Tcl version). The agTable widget is based on TkTable, written by Jeffrey Hobbs et al. [http://sourceforge.net/projects/tktable/]. Figure 10 shows a screen shot of the agTable widget. The number of columns and their corresponding feature names are configurable with the -titledef option as follows:

    ::ag-table::ag-table .w -titledef "ID:text START:text END:text TRANS:text"

    Sender component    Code example                                  Recipient component  Recipient handler
    main                $t newevent [AC_SetRegion]                    ag-table             Event handler defined in ag-table
    ag-table            ::ag-table::OutputCommand $w [AM_GetRegion]   main                 Event handler (tableEvent) defined in main
    main                $w newevent [AC_SetRegion]                    ag-wsurf             Event handler defined in ag-wsurf
    ag-wsurf (ag.plug)  $Info(AgCallback) $w [AM_RegionChanged]       main                 Event handler (signalEvent) defined in main
    main                AG_CreateAnnotation ...                       AG library           AG library
    main                AG_toXML ...                                  AGFIO                AGFIO library (libagf)

Table 5: Sender and recipient of event messages in Tcl/Tk

Event messages can be passed from the main script to agTable as follows. First, the following Tcl/Tk code creates and displays an ag-table widget.

    package require ag-table
    set t [::ag-table::ag-table .t]
    pack $t

Figure 10: The spreadsheet widget, applied to data from the DAMSL/TRAINS corpus [15]

The following code then sends a SetRegion event message to the ag-table widget.

    $t newevent [AC_SetRegion]

In this case, the arguments for the event SetRegion, the start time and the end time, are taken from global variables defined in the main script. (Of course, this is just one way to handle the arguments, and one can define one's own AC_ and AM_ functions to replace those in the files ag-client.tcl and ag-master.tcl.)


7.5 A spreadsheet widget in Python

The AG Toolkit includes a class definition for TkTable, so it is easy to use TkTable as if it were a widget included in Tkinter. In addition, the class agTable, based on TkTable, is provided in order to support the same functions as the Tcl version of agTable. The Python implementation provides a somewhat more systematic way of defining interfaces between components than the Tcl implementation does. For example, the following segment of Python code defines an interface (table.ac) from the main script to the agTable component (table) using the event handler table.newevent.

    table.ac = agUtils.agClient(table.newevent)

Once the interface is defined, we can use the methods the interface provides, such as setRegion. Note that in the Python version, the annotation ID, the start offset and the end offset are provided as arguments.

    table.ac.setRegion(annId, start, end)

7.6 Embedding a wsurf widget in Tcl

To use the message passing architecture with the wsurf widget, the AG Toolkit provides wrappers (ag-wsurf.tcl and ag.plug). The ag-wsurf.tcl script takes event messages and converts them into the wsurf commands described in Section 6.2. For example, the following code sends the event SetRegion, using the current region defined in global variables, to the wsurf widget $w.

    $w newevent [AC_SetRegion]

On the other hand, event messages passed from the wsurf widget to the main program are generated in the WaveSurfer plug-in file ag.plug, and the messages are passed to a callback function defined in the main program. For example, when the user sets a region in the waveform display by dragging with a mouse button, the following message is generated by ag.plug:

    RegionChanged t1 t2

The event handler defined in the main script receives this message, and saves the passed start and end times as the current region.

7.7 Embedding a wsurf widget in Python

The AG Toolkit includes a wrapper class definition for the Wsurf widget in Python. The class Wsurf covers the Wsurf API. The class agWsurf is a subclass of Wsurf, and it provides methods specific to the tasks required by the tools we create.
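The idea behind the client interface of Section 7.5, an object that wraps a component's event handler and exposes one method per event, can be sketched as follows. EventClient is a hypothetical stand-in written for this example and is not the toolkit's agUtils.agClient; the dictionary keys, the FakeTable class and the annotation id "ann1" are likewise assumptions.

```python
class EventClient(object):
    """Hypothetical stand-in for a client interface wrapping an event handler."""

    def __init__(self, handler):
        self.handler = handler

    def setRegion(self, annId, start, end):
        # Package the arguments as an event message and pass it on.
        event = {"Name": "SetRegion", "AnnotationId": annId,
                 "StartPosition": start, "EndPosition": end}
        return self.handler(event)


class FakeTable(object):
    """Minimal component that stores the region of each annotation."""

    def __init__(self):
        self.rows = {}

    def newevent(self, event):
        if event["Name"] == "SetRegion":
            self.rows[event["AnnotationId"]] = (
                event["StartPosition"], event["EndPosition"])
            return True
        return False


table = FakeTable()
table.ac = EventClient(table.newevent)
table.ac.setRegion("ann1", 0.25, 1.25)
print(table.rows)
```

Compared with building the dictionary by hand at every call site, the wrapper keeps the message layout in one place, so a change to the event format only touches the client class.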


7.8 Putting everything together: a toy annotation tool in Tcl/Tk

Now that we have seen the major components in the AG Toolkit, we can create a specialized tool. For the sake of illustration, we only consider a very simple toy annotation tool. The only events supported in the tool are the following:

• Hitting Return (in Version 1.0) or Control-e (in Version 0.98) in the spreadsheet inserts a new annotation.

• Hitting Control-g in the spreadsheet updates the start and end times of the current annotation using the current region in the waveform.

• Entering text in a cell and hitting Tab or an arrow key updates the AG data with the text.

Appendix E shows the main script for the toy example tool. The key bindings for Return or Control-e and for Control-g are already defined in ag-table.tcl. These events pass the event messages CreateAnnotation and GetRegion to the main script. Tab and the arrow keys send SetFeature with the annotation id of the current row, the content of the current cell and the feature name (the column name). The following callback function, defined in the toy main script, handles incoming event messages from the ag-table.

    24  proc ag-tableEvent {w event} {
    25      array set e $event
    26      global gv
    27      switch $e(Name) {
    28          CreateAnnotation {
    29              set a1 [AG_CreateAnchor $gv(ag)]
    30              AG_SetAnchorOffset $a1 $gv(currentStartPosition)
    31              set a2 [AG_CreateAnchor $gv(ag)]
    32              AG_SetAnchorOffset $a2 $gv(currentEndPosition)
    33              set id [AG_CreateAnnotation $gv(ag) $a1 $a2 $gv(AGType)]
    34              puts "$id Created"
    35              $gv(tableWidget) newevent [AC_CreateAnnotation $id 1]
    36              set gv(currentAnnotation) $id
    37          }
    38          SetCurrentAnnotation {
    39              set gv(currentAnnotation) $e(AnnotationId)
    40          }
    41          GetRegion {
    42              $gv(tableWidget) newevent [AC_SetRegion]
    43              AG_SetAnchorOffset \
    44                  [AG_GetStartAnchor $gv(currentAnnotation)] \
    45                  $gv(currentStartPosition)
    46              AG_SetAnchorOffset \
    47                  [AG_GetEndAnchor $gv(currentAnnotation)] \
    48                  $gv(currentEndPosition)
    49          }
    50          SetFeature {
    51              AG_SetFeature $e(AnnotationId) $e(FeatureName) $e(Value)
    52              puts "$e(FeatureName) for $e(AnnotationId) set as $e(Value)."
    53          }
    54          default {}
    55      }
    56  }

When the ag-tableEvent callback function gets the message CreateAnnotation from the ag-table widget, it calls the AG API function AG_CreateAnnotation to create a new annotation (Line 33), using the start and end times stored in gv(currentStartPosition) and gv(currentEndPosition). Similarly, when the callback function gets the message GetRegion, it updates the AG data using AG_SetAnchorOffset for the start and end anchors of the current annotation (Lines 43 and 46). When the callback function receives the message SetFeature, it updates the AG data as well.

The callback function signalEvent handles the event messages passed from the wsurf widget. The RegionChanged handler simply updates the variables defined in the main script that store the start and end times of the current region (Lines 64 and 65).

    59  proc signalEvent {w event} {
    60      array set e $event
    61      global gv
    62      switch $e(Name) {
    63          RegionChanged {
    64              set gv(currentStartPosition) $e(StartPosition)
    65              set gv(currentEndPosition) $e(EndPosition)
    66          }
    67          default {}
    68      }
    69  }

This toy tool does not actually do anything useful, since it has no means of saving the data. However, it is a straightforward task to add a menu item which saves the contents of the current AG data using AG_toXML, for example.

7.9 Putting everything together: a toy annotation tool in Python

Similarly, we show a toy annotation tool written in Python in Appendix F. It does basically the same work as the Tcl version of the tool. Note, however, that there are slight differences in the implementation. For example, the GetRegion message from the spreadsheet widget returns a dictionary containing StartPosition and EndPosition, and the updating of the spreadsheet is done within the agTable widget. The following is the method definition of the event message handler for the agTable component in the toy tool.

    def agTableEvent(self, event):
        if event['Name'] == 'CreateAnnotation':
            a1 = ag.CreateAnchor(self.ag)
            ag.SetAnchorOffset(a1, self.currentStartPosition)
            a2 = ag.CreateAnchor(self.ag)
            ag.SetAnchorOffset(a2, self.currentEndPosition)
            annId = ag.CreateAnnotation(self.ag, a1, a2, self.agType)
            self.currentAnnotation = annId
            msg = {}
            msg['AnnotationId'] = annId
            return msg
        elif event['Name'] == 'GetRegion':
            inmsg = self.wsurf.ac.getRegion('')
            if inmsg.has_key('StartPosition') and inmsg.has_key('EndPosition'):
                self.currentStartPosition = inmsg['StartPosition']
                self.currentEndPosition = inmsg['EndPosition']
            msg = {}
            msg['StartPosition'] = self.currentStartPosition
            msg['EndPosition'] = self.currentEndPosition
            return msg
        elif event['Name'] == 'SetFeature':
            ag.SetFeature(event['AnnotationId'], event['FeatureName'],
                          event['Value'])

Figure 11: A toy annotation tool
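Handler logic of this kind can be tried out without the toolkit by substituting a small in-memory object for the ag module. The sketch below is an assumption-laden illustration: StubAG is not the AG library, it implements only the four calls the handler uses, the ids it returns (and the AG id "demo:ag1") are made up, and handle_table_event is a simplified, free-standing variant of the method above that keeps its state in a plain dictionary.

```python
class StubAG(object):
    """In-memory stand-in for the few ag functions used by the handler.

    This is NOT the AG library; the ids it returns are made up.
    """

    def __init__(self):
        self.offsets = {}       # anchor id -> offset
        self.annotations = {}   # annotation id -> (start anchor, end anchor, type)
        self.features = {}      # (annotation id, feature name) -> value

    def CreateAnchor(self, agId):
        anchorId = "%s:A%d" % (agId, len(self.offsets) + 1)
        self.offsets[anchorId] = None
        return anchorId

    def SetAnchorOffset(self, anchorId, offset):
        self.offsets[anchorId] = offset

    def CreateAnnotation(self, agId, a1, a2, annType):
        annId = "%s:E%d" % (agId, len(self.annotations) + 1)
        self.annotations[annId] = (a1, a2, annType)
        return annId

    def SetFeature(self, annId, name, value):
        self.features[(annId, name)] = value


def handle_table_event(ag, state, event):
    """Simplified handler that keeps its state in a plain dictionary."""
    if event["Name"] == "CreateAnnotation":
        a1 = ag.CreateAnchor(state["ag"])
        ag.SetAnchorOffset(a1, state["start"])
        a2 = ag.CreateAnchor(state["ag"])
        ag.SetAnchorOffset(a2, state["end"])
        annId = ag.CreateAnnotation(state["ag"], a1, a2, state["type"])
        state["current"] = annId
        return {"AnnotationId": annId}
    elif event["Name"] == "SetFeature":
        ag.SetFeature(event["AnnotationId"], event["FeatureName"],
                      event["Value"])
    return None


stub = StubAG()
state = {"ag": "demo:ag1", "start": 0.25, "end": 1.25, "type": "word"}
reply = handle_table_event(stub, state, {"Name": "CreateAnnotation"})
handle_table_event(stub, state, {"Name": "SetFeature",
                                 "AnnotationId": reply["AnnotationId"],
                                 "FeatureName": "TRANS", "Value": "your"})
print(stub.annotations)
print(stub.features)
```

Passing the ag object in as a parameter, rather than importing it at module level, is the design choice that makes the substitution possible.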

8 Conclusion

This report has described a new toolkit, AGTK, for developing linguistic annotation software. It is being used for annotation projects at the Linguistic Data Consortium [www.ldc.upenn.edu], and as the framework for the NSF Talkbank project [www.talkbank.org]. Existing third-party tools, such as CLAN, Emu and Transcriber, are currently being ported to use the AG API [17, 10, 3]. They will share the same internal data model and relational storage model, while keeping their distinctive user interfaces and file formats. Once these ports have been completed, we will have a shared library of user interfaces to complement the AG and file I/O libraries. We hope that these shared libraries will continue to grow as members of the wider community contribute I/O and GUI components. In time, we hope to have interfaces to video widgets on all platforms. We also hope to develop, or to collaborate on the development of, new applications for annotation in the following areas: sociolinguistics, conversational analysis, sign and gesture, interlinear text, discourse and dialogue, disfluency, and syntax. We also hope to apply the tools in the annotation of materials in many languages, and to support non-Roman scripts. We invite would-be developers to join us.

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant Nos. 9978056, 9980009 (Talkbank). The authors are grateful to Claude Barras, Steve Cassidy, David Day, John Garofolo, Mark Liberman and Gary Simons for discussions on the material presented here.


Appendix A: AG API defined in IDL

    // IDL definition for flat AG API
    // 2001-09-10 v1.3b2 Steven Bird, Xiaoyi Ma(LDC)

    interface AG {

      typedef string Id;              // generic identifier
      // Id can be from any of AGSetId, AGId, AnnotationId, TimelineId, SignalId

      typedef string AGSetId;         // AGSet identifier
      typedef string AGId;            // AG identifier
      typedef string AGIds;           // AG identifiers (space separated list)
      typedef string AnnotationId;    // Annotation identifier
      typedef string AnnotationType;  // Annotation type
      typedef string AnnotationIds;   // Annotation identifiers (list)
      typedef string AnchorId;        // Anchor identifier
      typedef string AnchorIds;       // Anchor identifiers (list)
      typedef string TimelineId;      // Timeline identifier
      typedef string SignalId;        // Signal identifier
      typedef string SignalIds;       // Signal identifiers (list)
      typedef string FeatureName;     // feature name
      typedef string FeatureNames;    // feature name (list)
      typedef string FeatureValue;    // feature value
      typedef string Features;        // feature=value pairs (list)
      typedef string URI;             // a uniform resource identifier
      typedef string MimeClass;       // the MIME class
      typedef string MimeType;        // the MIME type
      typedef string Encoding;        // the signal encoding
      typedef string Unit;            // the unit for offsets
      typedef string AnnotationRef;   // an annotation reference

      typedef float  Offset;          // the offset into a signal

      //// AGSet ////
      AGSetId    CreateAGSet( in AGSetId agSetId );
      boolean    ExistsAGSet( in AGSetId agSetId );
      void       DeleteAG( in AGSetId agSetId );

      // Id is AGSetId or AGId
      AGId       CreateAG( in Id id,
                           in TimelineId timelineId );
      boolean    ExistsAG( in AGId agId );
      void       DeleteAG( in AGId agId );
      AGIds      GetAGIds( in AGSetId agSetId );

      //// Signals ////
      // Id may be AGSetId or TimelineId
      TimelineId CreateTimeline( in Id id );
      boolean    ExistsTimeline( in TimelineId timelineId );
      void       DeleteTimeline( in TimelineId timelineId );

      // Id may be TimelineId or SignalId
      SignalId   CreateSignal( in Id id,
                               in URI uri,
                               in MimeClass mimeClass,
                               in MimeType mimeType,
                               in Encoding encoding,
                               in Unit unit,
                               in Track track );
      boolean    ExistsSignal( in SignalId signalId );
      void       DeleteSignal( in SignalId signalId );
      SignalIds  GetSignals( in TimelineId timelineId );

      MimeClass  GetSignalMimeClass( in SignalId signalId );
      MimeType   GetSignalMimeType( in SignalId signalId );
      Encoding   GetSignalEncoding( in SignalId signalId );
      string     GetSignalXlinkType( in SignalId signalId );
      string     GetSignalXlinkHref( in SignalId signalId );
      string     GetSignalUnit( in SignalId signalId );
      Track      GetSignalTrack( in SignalId signalId );

      //// Annotation ////
      // Id may be AGId or AnnotationId
      AnnotationId   CreateAnnotation( in Id id,
                                       in AnchorId anchorId1,
                                       in AnchorId anchorId2,
                                       in AnnotationType annotationType );
      boolean        ExistsAnnotation( in AnnotationId annotationId );
      void           DeleteAnnotation( in AnnotationId annotationId );
      AnnotationId   CopyAnnotation( in AnnotationId annotationId );
      AnnotationIds  SplitAnnotation( in AnnotationId annotationId );
      AnnotationIds  NSplitAnnotation( in AnnotationId annotationId,
                                       in short N );

      AnnotationType GetAnnotationType( in AnnotationId annotationId );
      AnchorId       GetStartAnchor( in AnnotationId annotationId );
      AnchorId       GetEndAnchor( in AnnotationId annotationId );
      void           SetStartAnchor( in AnnotationId annotationId,
                                     in AnchorId anchorId );
      void           SetEndAnchor( in AnnotationId annotationId,
                                   in AnchorId anchorId );

      Offset         GetStartOffset( in AnnotationId annotationId );
      Offset         GetEndOffset( in AnnotationId annotationId );
      void           SetStartOffset( in AnnotationId annotationId,
                                     in Offset offset );
      void           SetEndOffset( in AnnotationId annotationId,
                                   in Offset offset );

      // this might be necessary to package up an id into a durable reference
      AnnotationRef  GetRef( in Id id );

      //// Features ////
      // this is for both the content of an annotation, and for the metadata
      // associated with AGSets, AGs, Timelines and Signals.
      void         SetFeature( in Id id,
                               in FeatureName featureName,
                               in FeatureValue featureValue );
      boolean      ExistsFeature( in Id id,
                                  in FeatureName featureName );
      void         DeleteFeature( in Id id,
                                  in FeatureName featureName );
      string       GetFeature( in Id id,
                               in FeatureName featureName );
      void         UnsetFeature( in Id id,
                                 in FeatureName featureName );
      FeatureNames GetFeatureNames( in Id id );
      void         SetFeatures( in Id id,
                                in Features features );
      Features     GetFeatures( in Id id );
      void         UnsetFeatures( in Id id );

      //// List-Valued Features ////
      // ?? do we need to permit features to have list values?

      //// Anchor ////
      // Id may be AGId or AnchorId
      AnchorId   CreateAnchor( in Id id,
                               in Offset offset,
                               in Unit unit,
                               in SignalIds signalIds );
      AnchorId   CreateAnchor( in Id id,
                               in SignalIds signalIds );
      AnchorId   CreateAnchor( in Id id );
      boolean    ExistsAnchor( in AnchorId anchorId );
      void       DeleteAnchor( in AnchorId anchorId );
      void       SetAnchorOffset( in AnchorId anchorId,
                                  in Offset offset );
      Offset     GetAnchorOffset( in AnchorId anchorId );
      void       UnsetAnchorOffset( in AnchorId anchorId );

      void       SetOffsetUnit( in AnchorId anchorId,
                                in Unit unit );
      Unit       GetOffsetUnit( in AnchorId anchorId );
      void       SetAnchorSignalIds( in AnchorId anchorId,
                                     in SignalIds signalIds );
      SignalIds  GetAnchorSignalIds( in AnchorId anchorId );

      AnchorId      SplitAnchor( in AnchorId anchorId );
      AnnotationIds GetIncomingAnnotationSet( in AnchorId anchorId );
      AnnotationIds GetOutgoingAnnotationSet( in AnchorId anchorId );

      //// Index ////
      AnchorIds     GetAnchorSet( in AGId agId );
      AnchorIds     GetAnchorSetByOffset( in AGId agId,
                                          in Offset offset,
                                          in float epsilon );
      AnchorIds     GetAnchorSetNearestOffset( in AGId agId,
                                               in Offset offset );

      AnnotationIds GetAnnotationSetByFeature( in AGId agId,
                                               in FeatureName featureName,
                                               in FeatureValue featureValue );
      AnnotationIds GetAnnotationSetByOffset( in AGId agId,
                                              in Offset offset );
      AnnotationIds GetAnnotationSetByType( in AGId agId,
                                            in AnnotationType annotationType );

      // Get all the annotations sorted by using start anchor offset
      // as first sorting key and end anchor offset as the second
      AnnotationIds GetAnnotationSeqByOffset( in AGId agId );

      // Get all the annotations with start anchor offset greater or
      // equal to specified offset, sort by using start anchor offset
      // as first sorting key and end anchor offset as the second
      AnnotationIds GetAnnotationSeqByOffset( in AGId agId,
                                              in Offset begin );

      // Get all the annotations with start anchor offset in between
      // the specified offsets, sort by using start anchor offset
      // as first sorting key and end anchor offset as the second
      AnnotationIds GetAnnotationSeqByOffset( in AGId agId,
                                              in Offset begin,
                                              in Offset end );

      //// Ids ////
      // Id may be AGId, AnnotationId, AnchorId
      AGSetId    GetAGSetId( in Id id );

      // Id may be AnnotationId or AnchorId
      AGId       GetAGId( in Id id );

      // Id may be AGId or SignalId
      TimelineId GetTimelineId( in Id id );

      // dump the current AG Set in ATLAS Level 0 format
      string toXML();   // deprecated

      // dump the specified AGSet in ATLAS Level 0 format
      string toXML( in AGSetId id );

      // dump the specified AG in ATLAS Level 0 format
      string toXML( in AGId agId );

      // Load an AGSet from the database server.
      // The connect string contains the hostname, database name,
      // user name and password.
      // Returns true on success, false if it cannot connect to the DB server.
      bool LoadFromDB( in string connStr,
                       in AGSetId agsetId );

      // Store the AGSet to the database server.
      // The connect string contains the hostname, database name,
      // user name and password.
      // Returns true on success, false if it cannot connect to the DB server.
      bool StoreToDB( in string connStr,
                      in AGSetId agsetId );

      //// Relations ////
      //
      // AddToRelation( AnnotationId A, AnnotationId B, string Name, short Position )
      //   -- adds the named relation between the two annotations, if there
      //      are already relations of this type on A (the 'parent') then
      //      Position specifies the position in the relation (child) list.
      // DeleteRelation( AnnotationId A, AnnotationId B, string Name )
      // GetRelation( AnnotationId A, string Name ) -> set( AnnotationId )
      // GetInverseRelation( AnnotationId A, string Name ) -> set( AnnotationId )
      //
      // These will be implemented using a higher-level API which encodes
      // the information in complex feature values, following the Dublin Core
      // DCSV model http://purl.org/dc/documents/rec/dcmi-dcsv-20000728.htm
    };
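The ordering described in the comments for GetAnnotationSeqByOffset (start anchor offset as the first sorting key, end anchor offset as the second) can be illustrated with a short Python sketch. This is not the library's implementation: the function name, the dictionary input and the annotation ids are made up, and treating both the begin and end bounds as inclusive is an assumption, since the comments only say "greater or equal" for begin and "in between" for the two-bound form.

```python
def annotation_seq_by_offset(annotations, begin=None, end=None):
    """Return annotation ids ordered by (start offset, end offset).

    annotations maps an id to a (start, end) pair of offsets.  The optional
    begin and end bounds filter on the start offset (both assumed inclusive).
    """
    selected = []
    for ann_id, (start, stop) in annotations.items():
        if begin is not None and start < begin:
            continue
        if end is not None and start > end:
            continue
        selected.append((start, stop, ann_id))
    selected.sort()
    return [ann_id for (_start, _stop, ann_id) in selected]


# Offsets taken from the TIMIT word and phone samples in Appendix B;
# the ids w1, p1, ... are hypothetical.
sample = {
    "w1": (9190, 11517),    # your
    "p1": (9190, 10337),    # jh
    "p2": (10337, 11517),   # ih
    "w2": (11517, 16334),   # dark
}
print(annotation_seq_by_offset(sample))
print(annotation_seq_by_offset(sample, begin=10337))
```

Sorting tuples of the form (start, end, id) gives the two-key ordering directly, because Python compares tuples element by element.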

Appendix B: Examples of File Formats

TIMIT

.txt:
9190 21199 your dark suit

.wrd:
9190 11517 your
11517 16334 dark
16334 21199 suit

.phn:
9190 10337 jh
10337 11517 ih
11517 12500 dcl
12500 12640 d
12640 14714 ah
14714 15870 kcl
15870 16334 k
16334 18088 s
18088 20417 ux
20417 21199 q

AIF

AIF format sample which is equivalent to the TIMIT sample above:

jh your your dark suit ih dcl dark d ah kcl k s suit ux q
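The TIMIT label files shown above share one layout: each line holds a start offset, an end offset and a label. A minimal sketch of reading that layout is given below. It is not the toolkit's file I/O library; the function names parse_timit_labels and phones_in_word are hypothetical, and the sample strings simply repeat the .wrd and .phn data from this appendix.

```python
from typing import List, Tuple

Label = Tuple[int, int, str]  # (start offset, end offset, label text)


def parse_timit_labels(text: str) -> List[Label]:
    """Parse 'start end label' lines as found in TIMIT .txt, .wrd and .phn files."""
    labels = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two runs of whitespace only, so that a label
        # such as "your dark suit" in a .txt file stays in one piece.
        start, end, label = line.split(None, 2)
        labels.append((int(start), int(end), label))
    return labels


def phones_in_word(word: Label, phones: List[Label]) -> List[str]:
    """Return the phone labels whose span lies inside the word's span."""
    w_start, w_end, _ = word
    return [p for (s, e, p) in phones if s >= w_start and e <= w_end]


WRD = """9190 11517 your
11517 16334 dark
16334 21199 suit"""

PHN = """9190 10337 jh
10337 11517 ih
11517 12500 dcl
12500 12640 d
12640 14714 ah
14714 15870 kcl
15870 16334 k
16334 18088 s
18088 20417 ux
20417 21199 q"""

words = parse_timit_labels(WRD)
phones = parse_timit_labels(PHN)
dark_phones = phones_in_word(words[1], phones)
```

Because the word and phone boundaries coincide at shared offsets, the containment test recovers which phones belong to which word. In annotation graph terms, these are the arcs that span the same anchors.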

Appendix C: A DTD for Annotation Graphs
v1.0b1 v1.0b2 v1.0b3 v1.0

Steven Steven Steven Steven

Bird Bird Bird Bird

(LDC) (LDC) (LDC) (LDC)


2001-04-02 2001-04-23 2001-09-15 2001-12-20

v1.1b1 v1.1b2 v1.1b3 v1.1

Steven Steven Steven Steven

Bird Bird Bird Bird

(LDC) (LDC) (LDC) (LDC)

-->

"1.0" "http://www.ldc.upenn.edu/atlas/ag/" "http://www.w3.org/1999/xlink" "http://purl.org/DC/documents/rec-dces-19990702.htm"



#REQUIRED



mimeClass    CDATA    #REQUIRED
mimeType     CDATA    #REQUIRED
encoding     CDATA    #REQUIRED
unit         CDATA    #REQUIRED
xlink:type   CDATA    #FIXED "simple"
xlink:href   CDATA    #REQUIRED
track        NMTOKEN  #IMPLIED
>

#REQUIRED #IMPLIED #IMPLIED #IMPLIED

Appendix D: Example of Wsurf Configuration Files

A configuration file for a 1-channel speech file

# -*-Mode:Tcl-*-
# mono.conf: a configuration file for wavesurfer
$widget configure -wavefill "#000000"
$widget configure -selectfill "#cfcfcf"
#$widget configure -framecolor "#7f7f7f"
$widget configure -framecolor "white"
$widget configure -cursorcolor "red"
$widget configure -fillcolor "yellow"
$widget configure -background ""
$widget configure -overviewheight "35"
$widget configure -pixelspersecond "40"

set pane [$widget addPane -maxheight 2048 -minheight 10]
$pane configure -fillcolor "lightyellow"
$pane configure -framecolor "blue"
$pane configure -height 100

if {[ssurfer::PluginEnabled analysis]} {
    $widget analysis::addWaveform $pane -channel 1 -predraw 1 -limit -1 -fill black
}

set pane [$widget addPane -maxheight 10 -minheight 10]
$pane configure -height "10"
$pane configure -scrollheight "10"
$pane configure -relief "raised"
$pane configure -height 10

if {[ssurfer::PluginEnabled timeaxis]} {
    $widget timeaxis::addTimeAxis $pane -color black -font "Courier 8"
}

Appendix E: A Toy Annotation Tool in Tcl/Tk

#!/pkg/ldc/bin/wish8.3

# global variables are kept in the array gv
set gv(agset) ""
set gv(ag) ""
set gv(currentStartPosition) 0.00
set gv(currentEndPosition) 0.00
set gv(AGType) toy
set gv(currentAnnotation) ""

# loading external modules
package require -exact wsurf 1.0
set auto_path [concat [file dirname [info script]] $auto_path]
package require -exact agtk 1.0
load ag_tcl.so
set gv(scriptdir) [file dirname [info script]]
source [file join $gv(scriptdir) lib ag.tcl]
source [file join $gv(scriptdir) lib ag-client.tcl]
source [file join $gv(scriptdir) lib ag-wsurf.tcl]
source [file join $gv(scriptdir) lib ag-table.tcl]

# Callback function for the ag-table widget
proc ag-tableEvent {w event} {
    array set e $event
    global gv
    switch $e(Name) {
        CreateAnnotation {
            set a1 [AG_CreateAnchor $gv(ag)]
            AG_SetAnchorOffset $a1 $gv(currentStartPosition)
            set a2 [AG_CreateAnchor $gv(ag)]
            AG_SetAnchorOffset $a2 $gv(currentEndPosition)
            set id [AG_CreateAnnotation $gv(ag) $a1 $a2 $gv(AGType)]
            puts "$id Created"
            $gv(tableWidget) newevent [AC_CreateAnnotation $id 1]
            set gv(currentAnnotation) $id
        }
        SetCurrentAnnotation {
            set gv(currentAnnotation) $e(AnnotationId)
        }
        GetRegion {
            $gv(tableWidget) newevent [AC_SetRegion]
            AG_SetAnchorOffset [AG_GetStartAnchor $gv(currentAnnotation)] $gv(currentStartPosition)
            AG_SetAnchorOffset [AG_GetEndAnchor $gv(currentAnnotation)] $gv(currentEndPosition)
        }
        SetFeature {
            AG_SetFeature $e(AnnotationId) $e(FeatureName) $e(Value)
            puts "$e(FeatureName) for $e(AnnotationId) set as $e(Value)."
        }
        default {}
    }
}

# Callback function for the wavesurfer widget
proc signalEvent {w event} {
    array set e $event
    global gv
    switch $e(Name) {
        RegionChanged {
            set gv(currentStartPosition) $e(StartPosition)
            set gv(currentEndPosition) $e(EndPosition)
        }
        default {}
    }
}

# Initializing the AG kernel
set gv(agset) [AG_CreateAGSet toy]
set gv(timeline) [AG_CreateTimeline $gv(agset)]
set gv(ag) [AG_CreateAG $gv(agset) $gv(timeline)]

# wsurf widget
::wsurf::Initialize -plugindir [file join lib $gv(scriptdir)]
set gv(wsurfWidget) [wsurf .w -configuration [file join lib mono.conf]]
set ::wsurf::ag-event::Info(AgCallback) signalEvent

# table widget
set gv(tableWidget) [::ag-table::ag-table .t -eventProc ag-tableEvent \
    -maxheight 200 -titledef "ID:text:20 Start:text End:text F1:text"]
pack $gv(tableWidget) $gv(wsurfWidget) -fill both -expand yes

# load a sample file
set p(Name) OpenFile
set p(FileName) [file join $gv(scriptdir) .. speech sp1.wav]
$gv(wsurfWidget) newevent [array get p]

Appendix F: A Toy Annotation Tool in Python

"""
A toy annotation tool
"""

from Tkinter import *
from agTable import *
from agWsurf import *
import ag
import agUtils  # provides agClient, used below

class agToy:
    def __init__(self, master):
        try:
            master.tk.eval('package require Tktable')
            master.tk.eval('package require -exact wsurf 1.0')
        except TclError:
            print "This tool requires Tktable and WaveSurfer 1.0"

        # create spreadsheet widget
        self.trans = agTable(eventProc=self.agTableEvent)
        self.trans.pack(expand=1, fill=BOTH)

        # create waveform display
        self.wsurf = agWsurf(eventProc=self.agSignalEvent)
        self.wsurf.pack(expand=1, fill=BOTH)

        self.agset = ag.CreateAGSet('Toy')
        self.timeline = ag.CreateTimeline(self.agset)
        self.ag = ag.CreateAG(self.agset, self.timeline)
        self.currentAnnotation = None
        self.currentStartPosition = 0.00
        self.currentEndPosition = 0.00
        self.agType = 'Toy'

        # create interfaces to clients
        self.trans.ac = agUtils.agClient(self.trans.newevent)
        self.wsurf.ac = agUtils.agClient(self.wsurf.newevent)

    def agTableEvent(self, event):
        if event['Name'] == 'CreateAnnotation':
            a1 = ag.CreateAnchor(self.ag)
            ag.SetAnchorOffset(a1, self.currentStartPosition)
            a2 = ag.CreateAnchor(self.ag)
            # the end offset belongs on the second anchor
            ag.SetAnchorOffset(a2, self.currentEndPosition)
            annId = ag.CreateAnnotation(self.ag, a1, a2, self.agType)
            self.currentAnnotation = annId
            msg = {}
            msg['AnnotationId'] = annId
            return msg
        elif event['Name'] == 'GetRegion':
            inmsg = self.wsurf.ac.getRegion('')
            if inmsg.has_key('StartPosition') and inmsg.has_key('EndPosition'):
                self.currentStartPosition = inmsg['StartPosition']
                self.currentEndPosition = inmsg['EndPosition']
            msg = {}
            msg['StartPosition'] = self.currentStartPosition
            msg['EndPosition'] = self.currentEndPosition
            return msg
        elif event['Name'] == 'SetFeature':
            ag.SetFeature(event['AnnotationId'],
                          event['FeatureName'],
                          event['Value'])

    def agSignalEvent(self, event):
        if event['Name'] == 'RegionChanged':
            self.currentStartPosition = event['StartPosition']
            self.currentEndPosition = event['EndPosition']

root = Tk()
main = agToy(root)

# load a sample speech file
main.wsurf.ac.loadFile("test.wav")

root.mainloop()
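The event flow in the two toy tools (a region change in the waveform widget, then the creation of an annotation, then the setting of a feature) can be followed without Tk, WaveSurfer or the AG library by replacing the kernel with an in-memory stand-in. The sketch below is written under that assumption: ToyGraph, ToyController, their method names and the generated ids are hypothetical and are not part of the toolkit's API. Only the event names and dictionary keys are taken from the listings above.

```python
class ToyGraph:
    """In-memory stand-in for the AG kernel calls used by the toy tool (hypothetical)."""

    def __init__(self):
        self.anchors = {}      # anchor id -> offset
        self.annotations = {}  # annotation id -> record

    def create_anchor(self, offset):
        anchor_id = "A%d" % (len(self.anchors) + 1)
        self.anchors[anchor_id] = offset
        return anchor_id

    def create_annotation(self, start_anchor, end_anchor, ann_type):
        ann_id = "Ann%d" % (len(self.annotations) + 1)
        self.annotations[ann_id] = {
            "start": start_anchor,
            "end": end_anchor,
            "type": ann_type,
            "features": {},
        }
        return ann_id

    def set_feature(self, ann_id, name, value):
        self.annotations[ann_id]["features"][name] = value


class ToyController:
    """Routes event dictionaries to handlers, keyed on the event's 'Name' entry."""

    def __init__(self, ann_type="Toy"):
        self.graph = ToyGraph()
        self.ann_type = ann_type
        self.current_start = 0.0
        self.current_end = 0.0
        self.current_annotation = None
        self.handlers = {
            "RegionChanged": self.on_region_changed,
            "CreateAnnotation": self.on_create_annotation,
            "SetFeature": self.on_set_feature,
        }

    def dispatch(self, event):
        handler = self.handlers.get(event["Name"])
        if handler is None:
            return None  # unknown events are ignored, like the Tcl 'default {}' branch
        return handler(event)

    def on_region_changed(self, event):
        self.current_start = event["StartPosition"]
        self.current_end = event["EndPosition"]

    def on_create_annotation(self, event):
        # One anchor per end of the selected region, then an annotation spanning them.
        a1 = self.graph.create_anchor(self.current_start)
        a2 = self.graph.create_anchor(self.current_end)
        ann_id = self.graph.create_annotation(a1, a2, self.ann_type)
        self.current_annotation = ann_id
        return {"AnnotationId": ann_id}

    def on_set_feature(self, event):
        self.graph.set_feature(
            event["AnnotationId"], event["FeatureName"], event["Value"]
        )


tool = ToyController()
tool.dispatch({"Name": "RegionChanged", "StartPosition": 0.57, "EndPosition": 0.72})
reply = tool.dispatch({"Name": "CreateAnnotation"})
tool.dispatch({"Name": "SetFeature", "AnnotationId": reply["AnnotationId"],
               "FeatureName": "F1", "Value": "your"})
print(tool.graph.annotations[reply["AnnotationId"]])
```

Keeping the handlers in a dictionary mirrors the switch statement of the Tcl version. A further event type can be added by registering one more handler, without touching the dispatch logic.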

References

[1] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison Wesley, 1995.

[2] James Allen, Donna Byron, Myroslava Dzikovska, George Ferguson, Lucian Galescu, and Amanda Stent. An architecture for a generic dialogue shell. Natural Language Engineering, 6:213–228, 2000.

[3] Claude Barras, Edouard Geoffrois, Zhibiao Wu, and Mark Liberman. Transcriber: development and use of a tool for assisting speech corpora production. Speech Communication, 33:5–22, 2001.

[4] David M. Beazley. Python Essential Reference. New Riders, 2001.

[5] Steven Bird, Peter Buneman, and Wang-Chiew Tan. Towards a query language for annotation graphs. In Proceedings of the Second International Conference on Language Resources and Evaluation. Paris: European Language Resources Association, 2000.

[6] Steven Bird and Jonathan Harrington, editors. Speech Communication: Special Issue on Speech Annotation and Corpus Tools, volume 33. Elsevier, 2001.

[7] Steven Bird and Mark Liberman. Linguistic annotation, 1998. http://www.ldc.upenn.edu/annotation/.

[8] Steven Bird and Mark Liberman. A formal framework for linguistic annotation. Speech Communication, 33:23–60, 2001.

[9] Steve Cassidy and Steven Bird. Querying databases of annotated speech. In Proceedings of the Eleventh Australasian Database Conference, pages 12–20. Los Alamitos, CA: IEEE Computer Society, 2000.

[10] Steve Cassidy and Jonathan Harrington. Multi-level annotation of speech: An overview of the emu speech database management system. Speech Communication, 33:61–77, 2001.

[11] John S. Garofolo, Lori F. Lamel, William M. Fisher, Jonathon G. Fiscus, David S. Pallett, and Nancy L. Dahlgren. The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CDROM. NIST, 1986. http://www.ldc.upenn.edu/Catalog/LDC93S1.html.

[12] John E. Grayson. Python and Tkinter Programming. Manning, 2000.

[13] R. Grishman. TIPSTER Architecture Design Document Version 2.3. Technical report, DARPA, 1997. http://www.nist.gov/itl/div894/894.02/related_projects/tipster/.

[14] Mark Harrison and Michael McLennan. Effective Tcl/Tk programming: writing better programs with Tcl and Tk. Addison-Wesley, 1998.

[15] Daniel Jurafsky, Elizabeth Shriberg, and Debra Biasca. Switchboard SWBD-DAMSL Labeling Project Coder's Manual, Draft 13. Technical Report 97-02, University of Colorado Institute of Cognitive Science, 1997. http://stripe.colorado.edu/~jurafsky/manual.august1.html.

[16] Mark Lutz and David Ascher. Learning Python. O'Reilly, 1999.

[17] Brian MacWhinney. The CHILDES Project: Tools for Analyzing Talk. Mahwah, NJ: Lawrence Erlbaum, second edition, 1995. http://childes.psy.cmu.edu/.

[18] John K. Ousterhout. Tcl and the Tk Toolkit. Addison-Wesley, 1994.

[19] Kåre Sjölander. The Snack sound toolkit, 2000. http://www.speech.kth.se/snack/.

[20] Kåre Sjölander and Jonas Beskow. WaveSurfer – an open source speech tool. In Proceedings of the 6th International Conference on Spoken Language Processing, 2000. http://www.speech.kth.se/wavesurfer/.

[21] Brent Welch. Practical Programming in Tcl and Tk, 3rd ed. Prentice Hall, 2000.
