Using a type hierarchy to characterize reliability of coding schemas for dialogue moves

Staffan Larsson
Dept. of linguistics, Göteborg University
January 23, 1998

Problem

An important issue in choosing a coding schema is reliability. Is the schema reliable, i.e. can the coding results be replicated by researchers other than those who devised the schema? If so, then (arguably) the results derived from coding are more likely to reflect facts. Unfortunately, many schemas do not include useful measures of reliability. This problem has only recently been recognized, and a proposed solution, the Kappa statistic ([6]), seems to have been generally accepted. The advantage of Kappa is that it normalizes pairwise agreement between coders with respect to the expected random agreement, i.e. the amount of agreement that the coders would reach if they annotated a dialogue by chance.

Since the theory of speech acts was conceived, there have been numerous proposals of which speech acts (dialogue moves) there are, how they are to be defined and how they are related to each other. In the last few years, several taxonomies in the form of coding schemas for dialogue moves (or speech acts) have been developed more-or-less independently and used for coding dialogues.

HCRC have developed a schema with three complementary structural levels (move, game and transaction) for coding dialogue structure in the Map Task corpus. Linköping University (Ahrenberg, Dahlbäck, Jönsson) have coded a corpus of WOZ-dialogues using two different schemas - one very simple (henceforth referred to as LINLIN1) and one slightly more complex (LINLIN2). In connection with the TRAINS project [8], Traum ([9]) has developed five complementary coding schemas for spoken dialogue structure: coherence, grounding, surface form, illocutionary function and argumentation structure. A part of the TRAINS corpus has been annotated using these schemas. Allwood [2] also gives an account of communication management which has subsequently been refined and specified in two complementary coding schemas, Own Communication Management (OCM) and Interaction Management (IM).
Finally, the DAMSL (Dialogue Act Markup in Several Layers) schema is the product of the Discourse Resource Initiative (DRI), consisting of researchers from several dialogue projects worldwide. The goal of DRI is to provide a standard for coding of dialogue acts, which if necessary can be augmented with further subdivisions of the given categories. There are signs indicating that the DAMSL schema may become a "standard" schema for dialogue move coding. Such a standard has several advantages, e.g. increased re-usability and comparability of research results and a common framework for research on dialogue moves. However, it is likely (if not certain) that the DAMSL schema won't please everyone - depending on what you are interested in, it may be too complex regarding some aspects and/or too simple regarding others. This means e.g. that some distinctions that have acceptable reliability in other kinds of dialogue may prove to be unreliable for the kind of dialogue you are studying. In short, we will probably need a more flexible coding schema.

This paper is an edited version of a poster presented at CMC/98. Work on this paper has been supported by S-DIME (Swedish dialogue move engine), NUTEK/HSFR Language Technology project F305/97.
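For reference, the Kappa statistic mentioned in this section is commonly written as below. This is the standard formulation (as in [6]); the symbols P(A) and P(E) are notational choices and the numbers in the worked line are purely hypothetical.

```latex
\kappa = \frac{P(A) - P(E)}{1 - P(E)}
% P(A): observed proportion of items on which the coders agree
% P(E): proportion of agreement expected by chance
% Hypothetical example: P(A) = 0.8, P(E) = 0.5
% \kappa = (0.8 - 0.5) / (1 - 0.5) = 0.6
```

A value of 1 means perfect agreement and a value of 0 means agreement no better than chance.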

Solution

Our proposed solution to these problems is this. Firstly, try to produce a "maximal" schema with all imaginable distinctions one might want to make. One way of doing this is to unify existing schemas into a type hierarchy. Figures 1 and 2 show a rough impression of how some of the schemas mentioned above are related. There are clearly similarities between the schemas, e.g. the division of "core moves" into initiatives and responses. The DAMSL schema is generally the most complex, but in some cases the HCRC schema gives a more fine-grained analysis. In the "feedback" group, the HCRC, DAMSL, TRAINS and GBG-IM schemas all contribute to a unified schema with larger coverage than any of the individual schemas.

These tables suggest that, if we view coding schemas as type hierarchies for dialogue moves, we can embed complex (parts of) schemas in simpler schemas. For example, we can produce a maximally complex schema by extending the DAMSL schema with parts of the HCRC, TRAINS and GBG-IM schemas (see Figure 3), and try to code dialogues using this maximally complex schema.

Since it may sometimes be difficult or impossible to decide what category to apply to an utterance, one should allow for coding of "non-leaf" categories, i.e. categories that have subcategories. For example, if one can't decide whether a certain utterance is a Request or a Suggest, one should perhaps code it as an Influencing-addressee-future-action. Of course, one should always try to be as specific as possible. We can then compute reliability at any desired level (or combination of levels) in the hierarchy. When analysing coded dialogues, we may choose to collapse some distinctions (e.g. seeing checks, aligns and queries as info-requests) if reliability for these aspects turns out to be too low, or if we're simply not interested in them.
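The idea of collapsing distinctions and computing reliability at a chosen level of the hierarchy can be sketched as follows. This is a minimal illustration, not the method of any of the cited projects: the hierarchy fragment, the function names and the two coders' annotations are all hypothetical, and the agreement measure is the two-coder (Cohen) form of kappa, which estimates chance agreement from each coder's own label distribution.

```python
from collections import Counter

# Hypothetical fragment of a unified hierarchy (child -> parent), loosely
# based on categories named in the text; it is not the full schema.
PARENT = {
    "Initiating": "Move",
    "Response": "Move",
    "Info-request": "Initiating",
    "Check": "Info-request",
    "Align": "Info-request",
    "Query-yn": "Info-request",
    "Query-w": "Info-request",
    "Influencing-addressee-future-action": "Initiating",
    "Request": "Influencing-addressee-future-action",
    "Suggest": "Influencing-addressee-future-action",
}


def path_from_root(label):
    """Return the categories from the root down to `label`."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def collapse(label, depth):
    """Map a label to its ancestor at `depth` (root = 0).

    Labels coded above that depth (non-leaf codings) are left unchanged.
    """
    path = path_from_root(label)
    return path[min(depth, len(path) - 1)]


def cohen_kappa(a, b):
    """Two-coder kappa: (P(A) - P(E)) / (1 - P(E))."""
    n = len(a)
    p_a = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:  # both coders used a single identical label throughout
        return 1.0
    return (p_a - p_e) / (1 - p_e)


def kappa_at_depth(a, b, depth):
    """Collapse both codings to `depth`, then compute kappa."""
    return cohen_kappa([collapse(x, depth) for x in a],
                       [collapse(y, depth) for y in b])


# Hypothetical annotations of six utterances by two coders.
coder_a = ["Check", "Query-yn", "Request", "Response", "Align", "Suggest"]
coder_b = ["Align", "Query-yn", "Suggest", "Response", "Align",
           "Influencing-addressee-future-action"]

kappa_leaf = kappa_at_depth(coder_a, coder_b, 3)
kappa_mid = kappa_at_depth(coder_a, coder_b, 2)
print(kappa_leaf, kappa_mid)
```

In this made-up data the coders disagree on several leaf categories but always agree once labels are collapsed to depth 2, so kappa rises from 13/31 (about 0.42) at the leaf level to 1 at the collapsed level.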
(An aside: While the kappa statistic is designed for mutually exclusive categories, it can be extended to non-mutually-exclusive categories by replacing each set of such categories with its superset. For example, if we have a set of two independent categories initiating and response, we can compute kappa for the superset {initiating, response, initiating+response}.)

Of course, it may turn out that the "maximal" schema was not sufficient after all, and we may want to include further subclassifications. Since we allow coding and analysis at any level, this will not pose a problem - the annotations done with the old schema will simply be seen as non-maximally-specific uses of the new schema.

An obvious problem with this sort of approach is that schemas use different definitions for the same phenomena. Thus, to make this kind of comparison more exact, there is a need for finding a way of giving (more) exact and principled semantics for coding schemas.
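The aside on non-mutually-exclusive categories can be illustrated with a small sketch: each utterance's set of independent tags is turned into one combined label, and kappa is then computed over those combined labels. The function names and the example annotations are hypothetical, the "none" label for an empty tag set is an added assumption, and the two-coder (Cohen) form of kappa is used.

```python
from collections import Counter


def combine(tags):
    """Turn a set of independent tags into one mutually exclusive label.

    The "none" label for an empty set is an assumption of this sketch.
    """
    return "+".join(sorted(tags)) if tags else "none"


def cohen_kappa(a, b):
    """Two-coder kappa: (P(A) - P(E)) / (1 - P(E))."""
    n = len(a)
    p_a = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:  # both coders used a single identical label throughout
        return 1.0
    return (p_a - p_e) / (1 - p_e)


# Hypothetical tag sets assigned to four utterances by two coders.
coder_a = [{"initiating"}, {"response"}, {"initiating", "response"}, {"response"}]
coder_b = [{"initiating"}, {"response"}, {"response"}, {"response"}]

labels_a = [combine(t) for t in coder_a]
labels_b = [combine(t) for t in coder_b]
kappa_combined = cohen_kappa(labels_a, labels_b)
print(labels_a, labels_b, kappa_combined)
```

With combined labels, a coder who marks an utterance as both initiating and response disagrees with one who marks it as response only, which is the behaviour the aside describes.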


[Figure 1: table relating the core move categories of five schemas. The row alignment between schemas is not recoverable; categories are listed per schema.]

LINLIN2: Initiative (Update, Question); Response (Answer)
HCRC: Initiating moves (Explain, Query-yn, Query-w, Check, Align, Instruct); Response moves (Reply-y, Reply-n, Reply-w, Clarify); Ready ???
DAMSL: Forward Looking Function: Statement (Assert, Reassert, Other), Info-request, Influencing-addressee-future-action (Action-directive, Open-Option), Committing-speaker-future-action (Offer, Commit), Explicit-performative, Exclamation; Backward Looking Function: Answer, Agreement (Accept, Accept-part, Maybe, Reject, Reject-part, Hold)
TRAINS: Core speech acts (Inform, YNQ, WHQ, Request, Suggest, Offer, Promise, Eval, Accept, Reject)
GBG-IM: +Accept-content, -Accept-content

Figure 1: Rough impression of relations between move taxonomies, pt. 1


[Figure 2: table relating the feedback, grounding, turn-taking and discourse-management categories of five schemas. The row alignment between schemas is not recoverable; categories are listed per schema.]

LINLIN2: Discourse management (Opening, Continuation, Closing)
HCRC: (Response moves) Acknowledge
DAMSL: Understanding: Signal-understanding (Acknowledge, Repeat-rephrase, Completion), Signal-Non-Understanding, Correct-Misspeaking; Conventional (Opening, Closing)
TRAINS: Grounding (Ack, ReqAck, ReqRepair, Repair, Initiate, Continue, Cancel); Turn-taking (take-turn, keep-turn, release-turn, assign-turn)
GBG-IM: Feedback function (Elicit FB, +Accept-com-act, -Accept-com-act, +Understanding, -Understanding, +Perception, -Perception, +Contact, -Contact); Turn Management (Turn acceptance, Turn holding, Turn closing)

Figure 2: Rough impression of relations between move taxonomies, pt. 2


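[Figure 3: tree diagram rooted at Move, which branches into Initiating (Forward Looking) and Response (Backward Looking). Other nodes shown: Update, Assert, Reassert, Instruct, Other, Info-request, Check, Align, Query-yn, Query-w, Action-directive, Open-Option. Kappa labels shown: K=0.89, K=0.66, K=0.7, K=?, K=?. The attachment of the remaining nodes and of the Kappa labels is not recoverable.]

Figure 3: Part of hypothetical schema formed by uniting DAMSL and HCRC, with phony Kappa values for groups at different levels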

References

[1] Allen, J. and Core, M. (1997): Draft of DAMSL: Dialog Act Markup in Several Layers.
[2] Allwood, J. (1995): An Activity Based Approach to Pragmatics. In Gothenburg Papers in Theoretical Linguistics 76, Dept. of linguistics, University of Göteborg. Forthcoming in Bunt & Black (eds.), Approaches to Pragmatics.
[3] Ahrenberg, L., Dahlbäck, N. and Jönsson, A. (1995): Coding Schemes for Studies of Natural Language Dialogue. In Working Notes from AAAI Spring Symposium, Stanford, 1995.
[4] Carletta, J., Isard, A., Isard, S., Kowtko, J., Doherty-Sneddon, G. (1996): HCRC dialogue structure coding manual. Technical Report HCRC/TR-82.
[5] Carletta, J., Isard, A., Isard, S., Kowtko, J., Newlands, A., Doherty-Sneddon, G., Anderson, A. (1997): The reliability of a dialogue structure coding scheme. Computational Linguistics, Volume 23, Pages 13-31.
[6] Carletta, J. (1996): Assessing agreement on classification tasks: the kappa statistic. Computational Linguistics, Volume 22(2), Pages 249-254.
[7] Poesio, M. and Traum, D. R. (1997): Representing Conversation Acts in a Unified Semantic/Pragmatic Framework. Draft.
[8] Traum, D. R. and Hinkelman, E. A. (1992): Conversation Acts in Task-oriented Spoken Dialogue. Computational Intelligence, 8(3):575-599. Also available as University of Rochester TR 425.
[9] Traum, D. R. (1996): Coding Schemas for Spoken Dialogue Structure. Unpublished manuscript.
