Annotating Conversations for Information State Updates

Massimo Poesio†, Robin Cooper‡, Staffan Larsson‡, David Traum*, and Colin Matheson†

† University of Edinburgh, ‡ Göteborg University, * University of Maryland

February 15, 1999

Theme III - Dialogue Analysis (Empirical)

1 MOTIVATIONS

We are engaged in the task of defining a taxonomy of dialogue acts for the Autoroute domain on the basis of an investigation of the Autoroute corpus collected by DERA.[1] These days, dialogue acts are found as ingredients of two rather different types of theories. Some people, developers of spoken dialogue systems among them, see labeling utterances with dialogue acts as a compact way of characterizing the structure of conversations in a given domain; such labeling makes it possible to annotate the large amounts of data that are needed to train statistical models of dialogue act recognition (Alexandersson et al., 1997; Carletta et al., 1997). The theories of dialogue acts resulting from this work are suitable for empirical work: they have large coverage and make annotation easy. Other researchers (Allen and Perrault, 1980; Cohen and Levesque, 1990; Sadek et al., 1994) are concerned with the reasoning processes that underlie dialogues; their aim is not just to provide a list of labels, but also to specify the effects of speech acts so that planning-like techniques can be used to determine the appropriate dialogue act to use in a given situation. These theories result in detailed formalizations of each dialogue act that make it possible for other researchers to verify the predictions of a theory and to make meaningful comparisons between theories; such formalizations are often cast in terms of the effects of dialogue acts on the agents' mental states or, as we will call them here, INFORMATION STATES (Cohen and Levesque, 1990; Poesio and Traum, 1998).

The problem we encountered is that neither the empirical nor the formalist approach to dialogue acts has produced taxonomies that can be easily adapted to obtain an empirically supported taxonomy for a given domain. Because classification schemes for dialogue acts produced by the empirical camp are only specified in a fairly informal fashion, it is not easy to tell whether such a scheme is appropriate for a different domain. On the other hand, formalist theories tend to develop taxonomies with much narrower coverage, and they are hard to verify empirically because checking whether a given utterance satisfies the definition of a certain dialogue act is not always easy. An additional difficulty with the existing schemes is that they are not always easy to reconcile with the hypothesis, common in modern theories of dialogue acts, that utterances may operate at different levels (e.g., grounding, 'core', and turn-taking) and that more than one 'core' dialogue act can be performed simultaneously (Traum and Hinkelman, 1992). Annotation schemes that only allow an annotator to specify a single label ensure better reliability, but at the expense of the annotator's being able to capture all the effects of an utterance; conversely, annotation schemes that allow several dialogue acts to be associated with an utterance tend to result in poor reliability, as not all annotators may always mark all the relevant labels.

We have been working towards an annotation methodology that we feel provides better support for the development of theories of dialogue acts by making annotation possible (if not as easy as simply assigning labels to utterances), while preserving some of the rigor characteristic of more formalized theories (Cooper and Larsson, 1999). The main distinguishing feature of the methodology we are proposing is that we take the notion of information state and its updates as central. Starting from the view that dialogue acts can be characterized in terms of their effects on an information state, (i) we characterized information states in terms of feature structures, and (ii) we formulated the effects of utterances in terms of operations on these feature structures, thus making it possible for us to adapt existing annotation tools to the task of annotating in terms of updates to information states. So far, we have cast two theories of dialogue acts in these terms, one developed by Cooper and Larsson on the basis of work by Ginzburg (Cooper and Larsson, 1999; Ginzburg, 1998), and one developed by Poesio and Traum (1998), and we have used the derived annotation schemes for some preliminary annotation of the Autoroute corpus. In this paper we describe our characterization of information states and of the operations on them, we give examples of two annotations of the same dialogue using two different annotation schemes, and we describe the tools we are using.

[1] We are grateful to the Speech Research Unit of the Defence Evaluation and Research Agency, Malvern, UK, for making the Autoroute dialogues available to the Trindi project.

2 CHARACTERIZING INFORMATION STATES AND HOW THEY GET UPDATED

2.1 Casting Information States in Terms of Feature Structures

The characterization of the state of the conversation adopted in the spoken dialogue systems currently in real use, or close to actual use (e.g., (Albesano et al., 1997)), can be represented in terms of feature structures as in (2): a list of fields which the system must fill before being able to ask a query.[2]

(2)

    [ l1 = a1
      l2 = a2
      ...
      ln = an ]

For example, in the Autoroute domain we are currently studying, the goal of the system is to identify the start and end points of the trip, and the departure time; this information can be represented as in (3), by assuming that the value of each field is a (possibly empty) list of elements.[3]

(3)

    [ start = < >
      end   = < >
      stime = < > ]

Each dialogue act can then be formalized as an operation on these feature structures that either specifies a value for one of the fields (more precisely, a push on that field), or replaces one of the values already specified (a pop followed by a push). Jonathan Ginzburg has shown (Ginzburg, 1995a,b, 1998) that a characterization of information states and their dynamics in terms of feature structures and operations on them can be used for more complex types of information state as well. We therefore took this view of information states as our starting point.

[2] This notation can be interpreted in various ways. One interpretation we have adopted is in terms of typed records as discussed in Cooper (1998a, 1998b). Using the notation a : T to represent the judgement that a is of type T, if a1 : T1, a2 : T2, ..., an : Tn, then the object in (2) is of the record type in (1):

(1)

    [ l1 : T1
      l2 : T2
      ...
      ln : Tn ]

[3] Additional constraints can also be imposed by the user (e.g., minimizing time, or toll cost); we will ignore these constraints here.
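To make these operations concrete, here is a minimal Python sketch of the slot-filling information state in (3) together with push, pop, and a replace built from the two; the operation names follow the paper's terminology, but the encoding (plain dictionaries of lists, with the top of each stack at the end of the list) and the sample values are our own illustration, not part of the annotation schemes.

    # A minimal sketch of the slot-filling information state in (3).
    # Each field holds a (possibly empty) list of values; the most
    # recently pushed value sits at the end of the list.

    def make_autoroute_state():
        """The initial state (3): every field is an empty list."""
        return {"start": [], "end": [], "stime": []}

    def push(state, field, value):
        """Specify a value for a field (a push on that field)."""
        state[field].append(value)

    def pop(state, field):
        """Discard the current value of a field."""
        return state[field].pop()

    def replace(state, field, value):
        """Replace the value already specified (a pop followed by a push)."""
        pop(state, field)
        push(state, field, value)

    state = make_autoroute_state()
    push(state, "start", "Malvern")            # caller supplies a starting point
    replace(state, "start", "Great Malvern")   # hypothetical correction
    print(state)   # {'start': ['Great Malvern'], 'end': [], 'stime': []}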

2.2 Some Shared Aspects of Our Theories of Information States

Although one of our goals is to develop a methodology for annotation that can be used both with very simple and with very complex theories of information states, we are also concerned with developing a theory of information states adequate for our domain. Therefore, we did not simply develop two completely independent annotation systems; instead, we attempted to identify some of their common features. Specifically, both annotation schemes discussed below incorporate the assumption from Ginzburg's theory of dialogue, KOS (Ginzburg, 1995a,b, 1998), that conversational rules involve updates by each dialogue participant (DP) of her own DIALOGUE GAMEBOARD (DGB). Both of the annotation schemes discussed below involve separately annotating the DGB for each agent; the value of that field is in turn a feature structure representing the information state of that agent at that point. The characterization of these information states depends on the theory, but both theories discussed below assume that the feature structure includes a field for each component of the mental state assumed by that theory (beliefs, intentions, obligations, etc.). In the theory developed by Cooper and Larsson, for example (see below), an agent's information state consists of a private and a shared part, each of which is in turn a feature structure with fields bel, agenda, etc.:

    [ a = [ private = [ bel    = { }
                        agenda = < raise(Where does B want to go?),
                                   raise(What time does B want to make the journey?),
                                   raise(Does B want the quickest or shortest route?) > ]
            shared  = [ bel    = { B wants a route from A,
                                   B has A's attention,
                                   A has B's attention,
                                   B wants assistance }
                        qud    = < Where does B want to start? > ] ]
      b = [ private = [ bel    = { }
                        agenda = < respond(Where does B want to start?) > ]
            shared  = [ bel    = { B wants a route from A,
                                   B has A's attention,
                                   A has B's attention,
                                   B wants assistance }
                        qud    = < Where does B want to start? > ] ] ]

Depending on the needs of the study, one may stop at the level of detail of the previous example, where the contents of propositional attitudes are left unanalyzed, or specify a language for expressing propositions and go deeper in the analysis. In the annotations we have done so far, we have represented propositions as English sentences, as in the example just given.
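As one concrete, and entirely illustrative, encoding of this structure, the sketch below renders an agent's information state as Python dataclasses with the private/shared division and the bel, agenda, and qud fields described above; following the paper, propositions and questions are represented as English sentences. The class names and the dictionary of participants are our own scaffolding.

    from dataclasses import dataclass, field

    @dataclass
    class Private:
        bel: set = field(default_factory=set)       # private beliefs
        agenda: list = field(default_factory=list)  # stack of actions to perform

    @dataclass
    class Shared:
        bel: set = field(default_factory=set)       # the conversational record
        qud: list = field(default_factory=list)     # stack of questions under discussion

    @dataclass
    class AgentState:
        private: Private = field(default_factory=Private)
        shared: Shared = field(default_factory=Shared)

    # The annotated information state: one dialogue gameboard per participant.
    infostate = {"a": AgentState(), "b": AgentState()}

    infostate["a"].private.agenda = [
        # the top of the stack is the last element of the list
        "raise(Does B want the quickest or shortest route?)",
        "raise(What time does B want to make the journey?)",
        "raise(Where does B want to go?)",
    ]
    infostate["b"].shared.qud = ["Where does B want to start?"]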

2.3 Updates as Operations on Feature Structures

Information state updates can be specified in terms of operations on feature structures such as:

    pop(a.private.agenda)
    push(a.shared.qud, Where does B want to start?)
    push(b.private.agenda, respond(Where does B want to start?))
    push(b.shared.qud, Where does B want to start?)

It is possible to define such operations precisely enough that an annotation tool can carry them out automatically; the annotator then simply has to specify which operations on the previous information state are performed by a given utterance, and the annotation tool will automatically produce a new information state. In doing so, an annotator may identify certain regularities: sets of operations that are often performed together. Such sets of operations can then be defined as dialogue acts. For example, the dialogue act query-w from the MapTask classification scheme (Carletta et al., 1997) is defined by Cooper and Larsson as follows:

    query-w(A, B, q)    "A asks B q"
    Preconditions:
        fst(A.Private.Agenda) = raise(q)
        whq(q)
    Effects:
        pop(A.Private.Agenda)
        push(q, A.Common.QUD)
        push(respond(fst(B.Common.QUD)), B.Private.Agenda)
        push(q, B.Common.QUD)

Having defined these dialogue acts, an annotator can then specify how the previous information state is updated by an utterance simply by specifying which dialogue act(s) were performed, leaving it to the annotation tool to carry out the desired updates. In other words, the methodology we are proposing is one where the development of a taxonomy of dialogue acts for a given domain/task proceeds by specifying (i) the notion of information state relevant for that task, then (ii) the operations on those states that may result from utterances, and finally (iii) a taxonomy of dialogue acts defined as complex update operations. These definitions can then be empirically verified by annotating dialogues in that domain, checking whether all the desired information is present and whether the information states resulting from the operations specified are as desired.
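The following sketch shows how the query-w definition could be executed mechanically over the illustrative dataclass encoding from Section 2.2 (which it assumes); the precondition test and the four effects transcribe the definition above, with one simplification noted in the comments, and the driver function itself is our scaffolding rather than part of the actual annotation tool.

    def query_w(infostate, a, b, q):
        """'A asks B q': preconditions and effects of query-w."""
        agenda = infostate[a].private.agenda
        # Preconditions: raising q is first on A's agenda; q should also be
        # a wh-question (whq(q)), which we do not check in this sketch.
        assert agenda and agenda[-1] == f"raise({q})", "precondition failed"
        agenda.pop()                                  # pop(A.Private.Agenda)
        infostate[a].shared.qud.append(q)             # push(q, A.Common.QUD)
        # Simplification: the definition pushes respond(fst(B.Common.QUD));
        # here we push respond(q) directly instead of the unevaluated expression.
        infostate[b].private.agenda.append(f"respond({q})")
        infostate[b].shared.qud.append(q)             # push(q, B.Common.QUD)

    query_w(infostate, "a", "b", "Where does B want to go?")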

3 TWO EXAMPLES OF ANNOTATION SCHEMES

We are currently investigating this approach to the study of dialogues. In the rest of the paper we discuss in more detail how we have used this methodology to develop annotation schemes for studying two taxonomies for classifying dialogue acts: one developed by Cooper and Larsson on the basis of work by Ginzburg (Cooper and Larsson, 1999; Ginzburg, 1998), and one developed by Poesio and Traum (1998).

3.1 Cooper and Larsson

Ginzburg (1995a,b, 1998) proposes that the DGB is structured to include the following three fields:

    FACTS: the set of commonly agreed upon facts.
    QUD: a set of Questions Under Discussion, partially ordered by a precedence relation.
    L-M: the latest move.

The aim of Cooper and Larsson is to develop a theory of dialogue update based on the simplest notion of information state that would allow them to illuminate the dynamics of QUD management. Each agent's DGB involves both Private and Common information. The Private information consists of a set of private beliefs (a set of propositions). In the particular annotation they did, this was treated as a static field, not modified as a result of the dialogue moves. As with many aspects of this annotation, this was a simplification that they thought worth pursuing as long as it would hold, but that they felt would probably not hold up under a more detailed annotation or a similar annotation of a more complex dialogue. Their overall analytical strategy was to use means as simple as possible until it becomes clear what phenomena motivate additional complexity.

The second private field is an Agenda, which is a stack of actions that the agent is to perform. The idea here is that the Agenda represents very local actions. More general goals that the agent wishes to achieve with the conversation (or her life) would, on the simple view presented here, be included in the private beliefs. (This feels like an oversimplification, and it will probably be necessary to have a separate field for goals.) In contrast to goals, Agenda items are actions that should in general be performed in the next move. Agenda items are introduced as a result of the previous move. Cooper and Larsson tried to make minimal assumptions about what actions could be put on the Agenda (i.e., what actions could be performed by the dialogue participants). Dialogue participants may either raise questions (put them on QUD), respond to questions (which are maximal in QUD), or give an instruction to the other dialogue participant. The goal is to do as much as possible in terms of raising or responding to questions.

The first Common field in the information state is again for a set of beliefs (i.e., a set of propositions). It is something of a misnomer to call these beliefs, since the field is meant to represent what has been established for the sake of the conversation, and they do not really mean that this necessarily represents a commitment on the part of the dialogue participants to the common propositions. The common beliefs represent, rather, what has been established as part of the conversational record: assumptions according to which the rest of the dialogue should proceed. This can, of course, be distinct from what the dialogue participants "really think". The second Common field is QUD, a stack of questions under discussion. Like the Agenda, this is meant to be a local affair, representing the question(s) that should be addressed more or less in the next turn, not general issues that have been raised by the conversation so far or issues that the agent feels to be generally relevant.

An example of an information state before and after utterance U4 in the Autoroute dialogue we have been studying:

    [ A = [ Private = [ Bel    = 3(i).A.Private.Bel
                        Agenda = < raise(Where does B want to start?),
                                   raise(Where does B want to go?),
                                   raise(What time does B want to make the journey?),
                                   raise(Does B want the quickest or shortest route?) > ]
            Common  = [ Bel = 3(i).A.Common.Bel ∪ { B wants a route from A }
                        QUD = < > ] ]
      B = [ Private = [ Bel    = 3(i).B.Private.Bel
                        Agenda = < > ]
            Common  = [ Bel = 3(i).B.Common.Bel
                        QUD = < > ] ] ]
U4 A pop(A.Private.Agenda) push(Where does B want to start the journey?, A.Common.QUD) push(respond(fst(B.Common.QUD)), B.Private.Agenda) push(Where does B want to start the journey, B.Common.QUD) 4. 2

6 6 6 6 6 6 6 6 4

2

A

B

=

=

"

6 6 4

Private

2

Common

=

Private

=

Common

=

4

=

h h h

Bel

3(ii).A.Private.Bel raise(Where does B want to go?), raise(What time does B want to make the journey?), Agenda = raise(Does B want the quickest or shortest i route?) Bel = 3(ii).A.Common.Bel QUD = Where does B want to start the journey? 3 i Bel = 3(ii).B.Private.Bel Agenda = respond(fst(B.Common.QUD)) i 5 Bel = 3(ii).B.Common.Bel QUD = Where does B want to start the journey

# 3 3

=




7 7 7 7 5 7 7 7 7 7 7 5
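Replayed over the illustrative dictionary-and-dataclass encoding sketched in Section 2.2, the four updates for U4 look as follows; only the fields that change are shown, and the encoding is ours, not the annotation tool's.

    # A's agenda before U4, top of the stack last.
    q = "Where does B want to start the journey?"
    a, b = infostate["a"], infostate["b"]
    a.private.agenda = [
        "raise(Does B want the quickest or shortest route?)",
        "raise(What time does B want to make the journey?)",
        "raise(Where does B want to go?)",
        "raise(Where does B want to start?)",   # first on the agenda
    ]

    # U4 [A]: "Where would you like to start your journey?"
    a.private.agenda.pop()                      # pop(A.Private.Agenda)
    a.shared.qud.append(q)                      # push(q, A.Common.QUD)
    b.private.agenda.append("respond(fst(B.Common.QUD))")
    b.shared.qud.append(q)                      # push(q, B.Common.QUD)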

3.2 Poesio and Traum

One of the central concerns of Poesio and Traum (1998), which builds upon previous work by Traum (1994), is the process by which the common ground is established, or GROUNDING (Clark and Schaefer, 1989; Traum and Hinkelman, 1992). Poesio and Traum view the information state as including a record both of the material that has already been grounded, indicated as G here, and of the material that has not yet been grounded; the ungrounded part consists of a specification of the current 'contributions', or DISCOURSE UNITS, as they are called in (Traum and Hinkelman, 1992). As in the case of the notion of information state developed by Cooper and Larsson, the information state of each agent is explicitly represented in the feature-based representation; but since the grounding process is explicitly represented, each agent's information state includes, in addition to a list of the agent's private intentions, the following information:

- The current state of G;
- The current DUs and their position on the UDUS stack.

The second difference is that obligations are used instead of QUD (Traum and Allen, 1994). G and each DU in turn contain the following information:

- The current obligations of the agents;
- A list of speech acts, listed under the attribute DH;
- The list of propositions to which the agents are socially committed.

To summarize, each information state contains the following information:

    [ A: [ G:    [ OBL: < ... >
                   DH:  < ... >
                   SCP: < ... > ]
           INT:  < ... >
           DUi:  ...
           UDUS: < DUi, ... > ]
      B: [ G:    [ OBL: < ... >
                   DH:  < ... >
                   SCP: < ... > ]
           INT:  < ... >
           DUi:  ...
           UDUS: < DUi, ... > ] ]
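The sketch below gives one possible Python rendering of this structure; the field names (OBL, DH, SCP, INT, UDUS) come from the paper, while the concrete types are our guess at a workable encoding rather than the authors' own representation.

    from dataclasses import dataclass, field

    @dataclass
    class DU:
        """Grounded material (G) and discourse units share the same shape."""
        obl: list = field(default_factory=list)   # obligations, e.g. 'B ANSWER 4'
        dh: list = field(default_factory=list)    # dialogue history of speech acts
        scp: list = field(default_factory=list)   # socially committed propositions

    @dataclass
    class PTState:
        g: DU = field(default_factory=DU)         # material already grounded
        dus: dict = field(default_factory=dict)   # pending discourse units, by name
        udus: list = field(default_factory=list)  # stack of ungrounded DU names
        int_: list = field(default_factory=list)  # the agent's private intentions

    # One information state per dialogue participant.
    pt_infostate = {"A": PTState(), "B": PTState()}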

The obligation stack is represented as a list for each dialogue participant (DP), the obligations themselves being of two types ('address' or 'answer') with pointers to the relevant moves in the DH. The following is an example:

    A: < A ANSWER 2 >
    B: < B ANSWER 3, B ADDRESS 1 >

This indicates that DP A has an obligation to answer Move 2, while DP B has obligations to answer Move 3 and to address Move 1. The intentions stack is again a simple list of agenda items which must be dealt with in turn.
To see how this notion of information state applies to representing the effects of utterances, consider the same utterance used to exemplify the Cooper and Larsson approach. The effect of a new utterance is to create a new DU (DU4), which becomes part of the information state of both agents. The main difference between the information states of the agents in this case is that B has the intention to get a route from Malvern to Edwinstowe, whereas A has the intentions to get the information he needs to address that request. U4 [A]: Where would you like to start your journey. 2 2 

OBL: SCP:

< B UNDERSTANDING-ACT 4B, A ADDRESS 3C >

33



6 6G: 6 6 6A: 6INT: < GET(SP), GET(DEST), GET(ST), GET(ROUTE-TYPE), GIVE B ROUTE(SP,DEST,ST,ROUTE-TYPE)   6 6 6 4 OBL: < B ANSWER 4 > 6 DU4A: DH: < 4: INFO -REQUEST > 6 6 2  3 6 OBL: < B UNDERSTANDING - ACT 4B, A ADDRESS 3C > 6 6 6G: SCP: 7 6 6 7 6B: 6INT: < GET A ROUTE FROM MALVERN TO EDWINSTOWE > 7   6 6 7 4 4 5 OBL: < B ANSWER 4 > DU4B:

DH:

< 4: INFO-REQUEST >

77 77 7 >7 77 57 7 7 7 7 7 7 7 7 7 5

The answer from B in U5 results in DU4 being grounded, i.e., added to G. B’s obligation to answer 4 is moved to G and stays there until his action is grounded. B commits himself to the belief that the starting point is Malvern. (We only show A’s info state for brevity.) U5 2 [B]: 2 Malvern. 

OBL: SCP:

< A UNDERSTANDING-ACT DU5B, B ANSWER 4, A ADDRESS 3C>

6 6G: 6 6 6 6INT: < GET(SP), GET(DEST), GET(ST), GET(ROUTE-TYPE), # " 6A: 6 6 6 OBL: < A ADDRESS 5B > 4 4DU5B: SCP: DH:

GIVE

B



ROUTE (SP,DEST,ST,ROUTE-TYPE)

< 5A: ANSWER, 5B: ASSERT >

33 77 77

7 >7 77 77 55

4 THE ANNOTATION TOOLS A set of scripts have been implemented to facilitate annotation using the MAT (Manual Annotation Tool) in GATE. Also, two MAT annotation schemes have been designed for annotating updates to information states. The infostate scheme has the following attributes:

 Participant: The participant whose information state is updated.  Operation: This is the type of operation to be performed, e.g. push, pop (for stacks), add and delete (for sets).  Field: The fields are shorthand names for paths in the information state record, such as qud (for common.qud), agenda (for private.agenda) etc.  Content: The value of this attribute is a reference to an annotation produced by the content scheme. Contents are currently sentences of natural language. Eventually, one might want want to complement this with a more formal represantation of content.  Action: This attribute is used only for pushes to the agenda, as in push(A.private.agenda, raise(content-12)). The actions are raise, respond and instruct.

7

 Order is a natural number indicating when an update is to be performed in relation to other updates caused by a single utterance (segment). It is used in cases where a single segment is annotated with several order-dependent updates. For example, if an utterance is annotated with several pushes to A.shared.qud,the resulting information state depends on the order in which these are executed. The contents scheme is used to annotate the dialogue transcription with (natural language) paraphrases of the contents. These paraphrases (and all other annotations) are assigned indexes, which can then be used as values of the content attribute of the infostate scheme. For example, the annotation for the utterance A

according to the Cooper and Larsson scheme might look something like this:

ID 19 37 39 40

TYPE contents infostate infostate infostate

START

END

164 164 164

209 209 209

41

infostate

164

209

ATTRIBUTES (string:Where does B want to start?) (field:agenda) (operation:pop) (participant:A) (content:19) (field:qud) (operation:push) (participant:A) (action:respond) (content:19) (field:agenda) (operation:push) (participant:B) (content:19) (field:qud) (operation:push) (participant:B)

5 (PRELIMINARY) DISCUSSION) We used the annotation schemes discussed above to study one dialogue from the Autoroute corpus; this involved several annotations and subsequent revisions. In this section we are going to discuss our preliminary conclusions about the methodology and some empirical issues raised by this work.

5.1 Advantages and Disadvantages of the Methodology We do feel that the methodology we are developing could be useful both (i) for people who are interested in studying dialogue acts either from an empirical perspective or by looking in more detail at the formal differences between systems; and (ii) for people who are building systems, who could just come up with a characterization of their information states without worrying about formal details, a characterization of the updates each dialogue act performs, and then use the tool to check that their definitions of dialogue acts behave as intended. There are however some potential problems to be considered. First of all, since the notation does not wear its semantics on its sleeves, more detailed comparisons between theories will involve either more detailed annotations, or spelling out the interpretation of primitives such as intentions and obligations, or both. For example, we have been investigating the differences between a model based on obligations and a model based on questions under discussions; but such differences cannot be revealed as long as the only constraint we impose on the fields is that their values are stacks. Secondly, it has become even clearer to us that annotating for information states is not suitable for large-scale annotation: both because it is time consuming, and because it is even more difficult to agree on what the composition of the information state of an agent is than it is to agree on what dialogue act it is performed. It definitely seems to be the case that this type of annotation should be used in the preliminary phases of an annotation work, to come up with a taxonomy of dialogue acts that appears to have adequate coverage and matches the operations that the system has to perform; subsequent, large scale annotation can then be done in terms of atomic labels.

8

5.2 Some Empirical Issues In this section we will discuss a few preliminary empirical findings–in particular, we’ll compare the ’optimistic’ approach to grounding of Cooper and Larsson, where it is always assumed that utterances are grounded, with the pessimistic approach of Poesio and Traum, where it is assumed that nothing is assumed grounded until an explicit signal is received.

References Albesano, D., Baggia, P., Danieli, M., Gemello, R., Gerbino, E., and Rullent, C. (1997). A robust system for human-machine dialogue in a telephony-based application. Journal of Speech Technology, 2(2), 99–110. Alexandersson, J., Buschbeck-Wolf, B., Fujinami, T., Maier, E., Reithinger, N., Schmitz, B., and Siegel, M. (1997). Dialogue acts in VERBMOBIL-2. Verbmobil Report 204, DFKI, University of Saarbruecken. Allen, J. F. and Perrault, C. (1980). Analyzing intention in utterances. Artificial Intelligence, 15(3), 143–178. Carletta, J., Isard, A., Isard, S., Kowtko, J., Doherty-Sneddon, G., and Anderson, A. H. (1997). The reliability of a dialogue structure coding scheme. Computational Linguistics, 23(1), 13–32. Clark, H. H. and Schaefer, E. F. (1989). Contributing to discourse. Cognitive Science, 13, 259 – 94. Cohen, P. R. and Levesque, H. J. (1990). Rational interaction as the basis for communication. In P. Cohen, J. Morgan, and M. Pollack, editors, Intentions in Communication, chapter 12, pages 221–256. Morgan Kaufmann. Cooper, R. and Larsson, S. (1999). Dialogue moves and information states. In Proc. of the Third IWCS, Tilburg. Ginzburg, J. (1995a). Resolving questions, i. Linguistics and Philosophy, 18(5), 567–609. Ginzburg, J. (1995b). Resolving questions, ii. Linguistics and Philosophy, 18(6), 567–609. Ginzburg, J. (1998). Clarifying utterances. In J. Hulstijn and A. Niholt, editors, Proc. of the Twente Workshop on the Formal Semantics and Pragmatics of Dialogues, pages 11–30, Enschede. Universiteit Twente, Faculteit Informatica. Poesio, M. and Traum, D. (1998). Towards an axiomatisation of dialogue acts. In J. Hulstijn and A. Nijholt, editors, Proc. of the Twente Workshop on the Formal Semantics and Pragmatics of Dialogues, pages 207–222, Enschede. Universiteit Twente, Faculteit Informatica. Sadek, D., Ferrieux, A., and Cozannet, A. (1994). Towards an artificial agent as the kernel of a spoken dialogue system: a progress report. In Proc. of the AAAI Workshop on Integration of Natural Language and Speech. Traum, D. R. (1994). A Computational Theory of Grounding in Natural Language Conversation. Ph.D. thesis, University of Rochester, Department of Computer Science, Rochester, NY. Traum, D. R. and Allen, J. F. (1994). Discourse obligations in dialogue processing. In Proc. of the 32nd Annual Meeting of the Association for Computational Linguistics, pages 1–8, New Mexico. Traum, D. R. and Hinkelman, E. A. (1992). Conversation acts in task-oriented spoken dialogue. Computational Intelligence, 8(3). Special Issue on Non-literal Language.

9
