Hybrid Discourse Modeling and Summarization for a Speech-to-Speech Translation System

Dissertation submitted in fulfillment of the requirements for the degree Doktor der Ingenieurwissenschaften (Dr.-Ing.) of the Technische Fakultät I of the Universität des Saarlandes, presented by Jan Alexandersson

Dean: Prof. Dr. Philipp Slusallek. First reviewer: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster. Second reviewer: Prof. Dr. Dietrich Klakow.

Saarbrücken 2003

Doctoral colloquium on 18 December 2003 at 16:15, DFKI, Saarbrücken, Saal Turing.

Declaration under Oath

I hereby declare that I have produced this thesis without unauthorized assistance from third parties and without the use of aids other than those indicated. Data and concepts taken directly or indirectly from other sources are marked as such, with the source indicated. This thesis has not previously been submitted, in the same or a similar form, to any other examination board in Germany or abroad.

Saarbrücken, 6 November 2003


Danksagung

Vermutlich ist mein Freund und früherer Nachbar Nils Dahlbäck an dieser Arbeit schuld. Er ist der Typ, der mich davon überzeugt hat, in Linköping Informatik zu studieren und nicht irgendwas anderes irgendwo anders, weit weg vom Heimatstädtchen. Desweiteren überzeugten er und seine Kollegen Mats Wirén, Magnus „Eximanatorn“ Merkel, Arne Jönsson und Lars Ahrenberg mich, dass Computerlinguistik viel mehr Spass macht als z.B. Prologprogrammierung. Ich tat mein Bestes, um diese Richtung nicht einzuschlagen, und konzentrierte mich einige Zeit auf andere Dinge. Wie dem auch sei, Mats ist es gelungen, mir den Weg zu bereiten und am DFKI in Saarbrücken eine Tür zu öffnen, um richtig mit Computerlinguistik arbeiten zu dürfen. Vielen Dank! Das erste halbe Jahr am DFKI besetzte ich einen kleinen Teil von Stephan Busemanns Büro. Hier genoss ich eine wirklich angenehme Zeit - tackar, tackar! Hiermit bedanke ich mich bei Prof. Hans Uszkoreit dafür, dass er es mir ermöglichte, in der IUI-Gruppe am DFKI weiterzuarbeiten. Den entscheidenden Anstoss für die vorliegende Arbeit gab mir meine anfängliche Tätigkeit in der IUI-Gruppe von Prof. Dr. Wolfgang Wahlster. Norbert Reithinger bat mich, mich der Dialoggruppe von VerbMobil anzuschliessen, der auch Elisabeth Maier angehörte - vielen Dank! Im Rückblick kann ich sagen, dass meine ersten VerbMobil-Jahre die besten Voraussetzungen boten und optimale Bedingungen schufen, um Computerlinguistik wirklich zu lernen.
Durch die Jahre hindurch hatte ich am DFKI die Gelegenheit, mit vielen interessanten Menschen zu arbeiten, die aus diesem Lebensabschnitt eine gute Zeit für mich machten und desweiteren in der einen oder anderen Weise einen Beitrag zu dieser Arbeit geleistet haben: insbesondere Norbert Reithinger, Elisabeth Maier, Michael Kipp, Ralf Engel, Peter Poller und Tilman Becker, aber auch Wolfgang Finkler, Anne Kilger, Özlem Senay, Amy Demeisi, Massimo Romanelli, Patrick Bauer, Markus Löckelt, Reinhard Karger, die KollegInnen in den Projekten VerbMobil und SmartKom,

. . . (habe ich jemanden vergessen?). Mein ganz spezieller Dank gilt Stephan Lesch für seine Hilfe bei der Implementierung und Entwicklung der Konfabulationen. Ich bin den Koautoren meiner wissenschaftlichen Veröffentlichungen sehr dankbar für die fruchtbare Zusammenarbeit. Mein wärmster Dank gilt Norbert Pfleger für seine nicht endenden, kamikatze-ähnlichen Leistungen während der Entstehung dieser Arbeit. Melanie Siegel und Mark Seligman erklärten vieles über Japan und die japanische Sprache - domo! Walter Kasper antwortete auf mindestens eintausend Fragen betreffend den wahren Kern der Semantik. Ein weiterer Dank an Tony Jameson für Konfabulationen und Korrekturlesen. Auch Amy Demeisi, Aude Wagnière und Tilman Becker danke ich für ihre unschätzbare Hilfe beim Korrekturlesen. Tilman Becker, Uli Krieger, Frederik Fouvry, Takashi Ninomiya und Bernd Kiefer trugen wesentlich zum Verständnis der Default-Unifikation während der Formalisierung der Overlay-Operation bei. Klaus Zechner sei bedankt für die Erlaubnis, seine Software benutzen zu dürfen, obwohl er seine geistige und wissenschaftliche Heimat verlassen hat - Klaus: Ich durfte sie nie benutzen. . . :-(. Vielen Dank an www.leo.org für die unschätzbare Übersetzungshilfe und an Google und CiteSeer für das Auffinden dessen, was ich gesucht habe. Einen besonderen Dank an Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster für seinen unmissverständlichen Ansporn, diese Arbeit endlich zu einem Ende zu bringen. Tack så mycket für den Tritt und dafür, dass Sie Ihre Zeit für Diskussionen, Vorschläge etc. investiert haben. Spezieller Dank gilt auch Prof. Dr. Dietrich Klakow, der kurzfristig die Aufgaben des Zweitgutachters übernommen hat. Meiner Familie in Linköping ein warmes Dankeschön dafür, dass sie jederzeit einfach für mich da war und immer noch für mich da ist.
Zum Schluss, meine liebste Eva, „Marie-gumman“ och „min Malin“: Ohne eure Liebe, Geduld und Unterstützung und vor allem ohne die Aufmunterung, als ich den Glauben an meine Fähigkeiten verlor, wäre diese Arbeit nicht entstanden. (. . . und hier ist es nochmal:) Ich verspreche, dass ich keine weitere Dissertation schreiben werde.

Jan Alexandersson, November 2003


Acknowledgments

This thesis is probably the fault of my friend and former neighbor Nils Dahlbäck! He is the guy who convinced me to study computer science in Linköping and not something else somewhere else, far away from my home town. Furthermore, he and his colleagues Mats Wirén, Magnus “Eximanatorn” Merkel, Arne Jönsson and Lars Ahrenberg convinced me that NLP is much hipper than, say, logic programming. I tried my very best to avoid this direction and for a while concentrated on other things. However, Mats finally managed to open the door to really working with computational linguistics at DFKI in Saarbrücken. Thank you! For the first six months at DFKI I occupied a tiny part of Stephan Busemann’s office. There, I enjoyed a very pleasant time - tackar, tackar. I hereby express my gratitude to Prof. Hans Uszkoreit for making it possible to continue working in the IUI group at DFKI. The seed of this thesis was planted when I joined Prof. Wolfgang Wahlster’s IUI group. I was drafted by Norbert Reithinger to join the dialogue group of VerbMobil, which also included Elisabeth Maier - many thanks! Looking back, the first VerbMobil years offered an optimal environment and the greatest opportunity imaginable to really learn computational linguistics. Over the years I came to work with a lot of people at DFKI who made this period of life a great time and who, furthermore, contributed in one way or another to this thesis: especially Norbert Reithinger, Elisabeth Maier, Michael Kipp, Ralf Engel, Peter Poller and Tilman Becker, but also Martin Klesen, Wolfgang Finkler, Anne Kilger, Özlem Senay, Amy Demeisi, Massimo Romanelli, Patrick Bauer, Markus Löckelt, Reinhard Karger, and the colleagues in the VerbMobil and SmartKom projects . . . (did I forget someone?). A very special thank-you to Stephan Lesch for his help with the implementation and development of the confabulations.
I am very grateful to my co-authors for the fruitful collaboration on the publications I have been involved in. My warmest gratitude is due to Norbert Pfleger for his never-ending, kamikatze-like efforts during the writing of this thesis. Melanie Siegel and Mark Seligman explained a lot about Japan and Japanese - domo! Walter Kasper answered at least one thousand questions about the true nature of semantics. Thanks to Tony Jameson for confabulations, and to Amy Demeisi, Aude Wagnière and Tilman Becker for invaluable help with proofreading. Tilman Becker, Uli Krieger, Frederik Fouvry, Takashi Ninomiya and Bernd Kiefer contributed enormously to my understanding of default unification during the formalization of the overlay operation. Thanks also to Klaus Zechner for allowing me to use his software although he had left his intellectual and scientific home - Klaus: I was never allowed to use it... :-(. Thanks to www.leo.org for invaluable translation support, and to Google and CiteSeer for finding most of the stuff I was looking for. Very special thanks are due to Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster for his irresistible encouragement to bring this thesis to its long-awaited completion. Tack så mycket for kicking me and for investing your time in discussions, suggestions etc. I am also indebted to Prof. Dr. Dietrich Klakow, who agreed on short notice to act as second reviewer. I owe my family in Linköping warm thanks for just being there. Finally, meine liebste Eva, “Marie-gumman” och “min Malin”: without your love, patience and support and, in particular, without the encouragement when I lost belief in my abilities, I would never have made it. (. . . and here it is again:) I promise I will never ever write another thesis.

Jan Alexandersson, November 2003


Warum entspricht uns die Welt nicht? Weil sie von selbst keinen Sinn hat. Wir sind so, daß wir etwas, was keinen Sinn hat, nicht ertragen. Also geben wir dem, was keinen Sinn hat, einen Sinn. Schon die Sinnlosigkeit zu erforschen, macht Sinn. Entspricht uns.

Why doesn’t the world *reflect* us? Because it doesn’t make sense by itself. We are such that we can’t bear something that doesn’t make sense. So we make sense of what doesn’t make sense. Just investigating senselessness makes sense. *Reflects* us.

Martin Walser, “Sprache, sonst nichts”, Die Zeit No. 40, 30 September 1999


Kurzzusammenfassung

Die vorliegende Arbeit behandelt hauptsächlich zwei Themen, die für das VerbMobil-System, ein Übersetzungssystem gesprochener Spontansprache, entwickelt wurden: das Dialogmodell und als Applikation die multilinguale Generierung von Ergebnisprotokollen. Für die Dialogmodellierung sind zwei Themen von besonderem Interesse. Das erste behandelt eine in der vorliegenden Arbeit formalisierte Default-Unifikations-Operation namens Overlay, die als fundamentale Operation für Diskursverarbeitung dient. Das zweite besteht aus einem intentionalen Modell, das Intentionen eines Dialogs auf fünf Ebenen in einer sprachunabhängigen Repräsentation darstellt. Neben dem für die Protokollgenerierung entwickelten Generierungsalgorithmus wird eine umfassende Evaluation der Protokollgenerierungsfunktionalität vorgestellt. Zusätzlich zu „precision“ und „recall“ wird ein neues Maß - Konfabulation (engl. „confabulation“) - vorgestellt, das eine präzisere Charakterisierung der Qualität eines komplexen Sprachverarbeitungssystems ermöglicht.


Short abstract

The thesis discusses two parts of the speech-to-speech translation system VerbMobil: the dialogue model and one of its applications, multilingual summary generation. In connection with the dialogue model, two topics are of special interest: (a) the use of a default unification operation called overlay as the fundamental operation for dialogue management; and (b) an intentional model that is able to describe intentions in dialogue on five levels in a language-independent way. Besides the actual generation algorithm developed, we present a comprehensive evaluation of the summarization functionality. In addition to precision and recall, a new characterization, confabulation, is defined that provides a more precise understanding of the performance of complex natural language processing systems.


Zusammenfassung

Die vorliegende Arbeit beschreibt die Ergebnisse mehrjähriger Forschung, die sich auf ein kleines Gebiet der Verarbeitung natürlicher Sprache konzentriert hat: einen robusten Ansatz zur Dialogmodellierung und Diskursverarbeitung mit dem übergeordneten Ziel der Sprachunabhängigkeit. Dieser Ansatz wurde hauptsächlich im Kontext von VerbMobil entwickelt - einem Projekt, das ein Übersetzungssystem für spontansprachliche Verhandlungsdialoge entwickelt hat. Der erste wesentliche Beitrag dieser Arbeit ist ein Dialogmodell, das auf propositionalem Gehalt und Intentionen beruht. Zwei unterschiedliche Teile des Modells werden im Detail diskutiert: (a) Ein Ansatz zur Modellierung und Verfolgung des propositionalen Gehalts auf der Basis von Beschreibungslogiken und Default-Unifikation. Diese Technik wurde im Rahmen des SmartKom-Projekts weiterentwickelt, bei dem ein symmetrisches multimodales Dialogsystem entwickelt wurde. Dies trifft insbesondere zu für die Weiterentwicklung und Formalisierung von Überlagerung (engl. „overlay“), einem Default-Unifikationsalgorithmus, der in Kombination mit einer Bewertungsfunktion als Hauptverarbeitungsmechanismus für die Interpretation von Benutzerhypothesen dient. (b) Ein Ansatz für die Verfolgung von Intentionen, basierend auf Dialogakten, Dialogschritten und Dialogspielen, kombiniert mit Dialogphasen. Diese Bausteine werden so arrangiert, dass ein gesamter Dialog sprachunabhängig auf fünf unterschiedlichen Ebenen beschrieben werden kann. Im zweiten Hauptbeitrag dieser Arbeit wird eine auf dem Inhalt des Diskursgedächtnisses basierende Anwendung beschrieben: die multilinguale Generierung von Zusammenfassungen/Protokollen. Bei Verhandlungsdialogen enthält das Diskursgedächtnis am Ende des Dialogs unter den Diskursobjekten diejenigen Objekte, auf die sich die Teilnehmer verständigt haben.
Basierend auf diesen wird ein datengetriebener „bottom-up“-Generierungsalgorithmus vorgestellt, der zusammen mit zwei bereits existierenden Modulen des VerbMobil-Systems - dem Transfermodul und dem multilingualen Generator GECO - Zusammenfassungen in allen Sprachen des Systems erzeugt. Der dritte wesentliche Beitrag dieser Arbeit ist die Evaluation der Zusammenfassungsfunktionalität von VerbMobil. Die umfassende Evaluation belegt die Validität unseres Ansatzes zur Dialogmodellierung. Darüber hinaus wird gezeigt, dass die üblichen Evaluationsmasse, die auf „precision“ und „recall“ basieren, nicht geeignet sind, die Qualität der Protokollfunktionalität zu messen. Eine wesentliche Erkenntnis ist, dass Verarbeitungsfehler in allen Schritten neue Diskursobjekte in das Diskursgedächtnis einfügen können, die gar nicht Teil des Dialogs waren. Diese Objekte - Konfabulationen (engl. „confabulations“) genannt - tauchen schliesslich in den Zusammenfassungen/Protokollen auf. Die Standardevaluationsmasse „precision“ und „recall“ basieren auf der Annahme, dass eine Teilmenge der Diskursobjekte, die tatsächlich von den (Dialog-)Teilnehmern erwähnt werden, korrekterweise oder fälschlicherweise ausgewählt wird. Mit diesen Massen alleine werden die konfabulativen Fehler des Systems in der Evaluation nicht erkennbar. Daher wird eine ausführlichere Evaluation beschrieben, die Konfabulationen von echt-positiven Diskursobjekten unterscheidet. Mithilfe zweier neuer Metriken - relative und totale Konfabulation - wird eine informativere und ehrlichere Charakterisierung der Systemleistung präsentiert. In der abschliessenden Diskussion werden die Ergebnisse der Arbeit zusammengefasst und weitere Forschungsthemen vorgeschlagen. Schliesslich enthält der Anhang Ausschnitte des Annotationshandbuchs für Dialogakte, -schritte und -spiele zusammen mit einigen Charakterisierungen des verwendeten Korpus. Er enthält auch „traces“ zweier Beispieldialoge in VerbMobil zusammen mit den zugehörigen Ergebnisprotokollen.


Abstract

This thesis describes the result of several years of research focusing on a small part of natural language processing: a robust approach to dialogue modeling and discourse processing, with the overall goal of language independence. The main context in which the approach has been developed is VerbMobil - a project that built a speech-to-speech translation system for spontaneously spoken negotiation dialogue. The first major contribution of the thesis is a dialogue model based on propositional content and intentions. Two distinct parts of the model are discussed in depth: (a) An approach to the modeling and tracking of propositional content that is based on description logics and default unification. This technique has been developed further within the SmartKom project - a project for symmetric multimodal dialogue. This is true in particular for the advancement and formalization of overlay, a default unification algorithm that, in combination with a scoring function, serves as the main operation for the contextual interpretation of user hypotheses. (b) An approach to the modeling and tracking of intentions that is based on dialogue acts, dialogue moves and dialogue games together with dialogue phases. These building blocks can be arranged in such a way that a complete dialogue is described on five different levels in a language-independent way. A new characterization called dialogue moves is introduced that encompasses several dialogue acts. In the second major contribution of the thesis, an application based on the content of the discourse memory is described: the multilingual generation of summaries. For negotiative dialogue, the discourse memory contains at the end of the dialogue, among other discourse objects, those objects that have been agreed upon by both interlocutors.
On the basis of these, a data-driven bottom-up generation algorithm is described that, together with two already existing modules of the VerbMobil system - the transfer module and the multilingual generator GECO - produces summaries in any language deployed by the system.
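As an informal illustration of the overlay idea mentioned above, the following sketch applies default unification to plain nested dictionaries: new information wins on conflicts, while the old discourse context fills in unspecified slots. This is only a toy model under assumptions of ours: the thesis formalizes overlay over typed feature structures and pairs it with a scoring function, both omitted here, and the slot names and values are invented for illustration.

```python
def overlay(covering, background):
    """Toy default unification: the covering (new) structure wins on
    conflicts, the background (old context) fills in missing slots.
    Feature structures are modeled as plain nested dicts; the real
    operation works on typed feature structures with a scoring function.
    """
    if not (isinstance(covering, dict) and isinstance(background, dict)):
        return covering                  # atomic value: new information wins
    result = dict(background)            # inherit every slot from the context
    for slot, value in covering.items():
        if slot in background:
            result[slot] = overlay(value, background[slot])  # recurse per slot
        else:
            result[slot] = value         # slot unknown in context: just add it
    return result

# A previously negotiated meeting, refined by a partial new suggestion
# (invented example data):
context = {"type": "meeting", "date": "2nd of July", "location": "Hanover"}
suggestion = {"type": "meeting", "date": "5th of July"}
print(overlay(suggestion, context))
# {'type': 'meeting', 'date': '5th of July', 'location': 'Hanover'}
```

Note that the operation is asymmetric: overlay(context, suggestion) would instead let the old date survive, which is why the new user hypothesis is always taken as the covering argument.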

The third major contribution of the thesis is the evaluation of the summarization functionality of VerbMobil. A comprehensive evaluation shows the validity of our approach to dialogue modeling. Additionally, it is shown that standard evaluation metrics based on precision and recall fail to describe the performance of the summarization functionality correctly. A crucial finding is that erroneous processing at any step can insert discourse objects into the discourse memory that were never part of the dialogue. These objects - called confabulations - eventually appear in the summaries. The standard evaluation metrics precision and recall are based on the assumption that the system correctly or erroneously selects a subset of those discourse objects that are actually mentioned by the interlocutors. If only these metrics are used, the number of confabulative errors committed by the system is never revealed. Therefore, a more extensive evaluation is described that distinguishes confabulations from true positive discourse objects. Through the use of two new metrics - relative and total confabulation - a more honest characterization of the system performance is presented. In the conclusion, we summarize the thesis and suggest further research directions. Finally, the appendix shows excerpts from the annotation manuals for dialogue acts, moves and games, together with some corpus characteristics. It also shows traces of two sample dialogues processed by VerbMobil, along with the corresponding summaries.
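The distinction between ordinary selection errors and confabulations can be made concrete with a small set-based sketch. The function name, the toy data, and the ratio labeled relative confabulation below are illustrative assumptions, not the exact definitions used in the thesis; the point is only that precision and recall alone cannot separate a wrongly selected but mentioned object from an object that was never uttered at all.

```python
def summary_metrics(selected, agreed, mentioned):
    """Toy evaluation sketch (names and definitions are illustrative).

    selected:  discourse objects the system put into the summary
    agreed:    objects actually agreed upon by the interlocutors (gold)
    mentioned: every object actually uttered in the dialogue

    Confabulations are selected objects outside the mentioned set, i.e.
    objects invented somewhere in the processing pipeline.
    """
    sel, gold, said = set(selected), set(agreed), set(mentioned)
    true_pos = sel & gold
    confabulated = sel - said            # never uttered by anyone
    precision = len(true_pos) / len(sel) if sel else 0.0
    recall = len(true_pos) / len(gold) if gold else 0.0
    relative_confabulation = len(confabulated) / len(sel) if sel else 0.0
    return precision, recall, relative_confabulation

p, r, c = summary_metrics(
    selected={"meeting on July 5", "dinner at the hotel"},   # dinner is invented
    agreed={"meeting on July 5"},
    mentioned={"meeting on July 5", "meeting on July 2"},
)
print(p, r, c)
# 0.5 1.0 0.5: recall is perfect, yet half the summary was never said at all
```

Here precision and recall look respectable while half of the summary is confabulated, which is exactly the kind of error the additional metric is meant to expose.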


Contents 1 Introduction 1.1 The VerbMobil Project . . . . 1.1.1 The VerbMobil scenario 1.1.2 The VerbMobil system 1.2 Main Scientific Questions . . . . 1.3 Thesis Organization . . . . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

1 7 8 10 15 17

2 Dialogue Modeling 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . 2.2 Some terminology . . . . . . . . . . . . . . . . . . 2.2.1 Syntax, Semantics and Pragmatics . . . . 2.3 Theories of Dialogue Structure . . . . . . . . . . 2.3.1 Modeling utterances . . . . . . . . . . . . 2.3.2 Relations between utterances and turns . 2.3.3 Initiative, Response and Mixed Initiative 2.3.4 Conversational Games . . . . . . . . . . . 2.3.5 Dialogue Games . . . . . . . . . . . . . . 2.3.6 Plan Based Approaches . . . . . . . . . . 2.3.7 Plan Recognition Models . . . . . . . . . 2.3.8 Dialogue Grammars . . . . . . . . . . . . 2.3.9 DRT . . . . . . . . . . . . . . . . . . . . . 2.3.10 Conclusion . . . . . . . . . . . . . . . . . 2.4 Using the Theory - Annotating Corpora . . . . . 2.4.1 Coding Schemata—MATE . . . . . . . . . 2.4.2 Annotating Corpora reliably . . . . . . . 2.5 Dialogue Modeling in VerbMobil . . . . . . . . 2.6 The Intentional Structure . . . . . . . . . . . . . 2.6.1 Dialogue Acts in VerbMobil . . . . . . . 2.6.2 Dialogue Moves in VerbMobil . . . . . .

. . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . .

19 20 24 24 27 27 34 35 38 40 40 42 43 45 45 46 47 48 51 52 54 57

xvii

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . .

63 65 66 71

3 Dialogue Management in VerbMobil 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Characteristics of the VerbMobil Dialogues . . . . . . . . . 3.2.1 Human–human vs. man–machine negotiation dialogue 3.2.2 Length of the dialogues . . . . . . . . . . . . . . . . . 3.2.3 Behavioural differences due to cultural differences . . 3.2.4 Turn complexity . . . . . . . . . . . . . . . . . . . . . 3.2.5 Subdialogues . . . . . . . . . . . . . . . . . . . . . . . 3.2.6 Controlling vs. Mediating . . . . . . . . . . . . . . . . 3.2.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Recognizing Spontaneous Speech . . . . . . . . . . . . . . . . 3.3.1 Speech Recognition . . . . . . . . . . . . . . . . . . . . 3.3.2 Prosody . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Architecture and Tasks of the Dialogue Component . . . . . . 3.5 Input to the dialogue component . . . . . . . . . . . . . . . . 3.5.1 Recognition of the Dialogue Act . . . . . . . . . . . . 3.5.2 Recognition of Propositional Content . . . . . . . . . . 3.6 Dialogue Processing in VerbMobil - DiVe . . . . . . . . . . 3.7 Managing the Thematic Structure . . . . . . . . . . . . . . . 3.7.1 Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.7.2 The Dialogue Processor . . . . . . . . . . . . . . . . . 3.7.3 Completing the Data . . . . . . . . . . . . . . . . . . . 3.7.4 Completing the Data - revisited . . . . . . . . . . . . . 3.7.5 Related Work . . . . . . . . . . . . . . . . . . . . . . . 3.7.6 Formalizing overlay . . . . . . . . . . . . . . . . . . 3.7.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 3.8 Managing the Intentional Structure . . . . . . . . . . . . . . . 3.8.1 The plan processor . . . . . . . . . . . . . . . . . . . . 3.8.2 Acquiring Plan Operators and Language Models . . . 3.8.3 Adapting the Plan Processor for building the Intentional Structure . . . . . . . . . 
. . . . . . . . . . . . . 3.8.4 Processing Flow of the Intentional Structure . . . . . 3.8.5 Recognizing moves . . . . . . . . . . . . . . . . . . . . 3.8.6 Building the moves structure . . . . . . . . . . . . . .

73 73 74 74 75 75 76 77 77 78 78 79 81 82 83 84 85 88 89 90 92 94 96 99 101 104 110 111 113 116

2.7 2.8

2.6.3 Games in VerbMobil . . . . . . 2.6.4 Dialogue Phases in VerbMobil Propositional Content . . . . . . . . . . Conclusion . . . . . . . . . . . . . . . .

xviii

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

116 118 120 120

3.9

3.8.7 Building the rest of the Structure 3.8.8 Setting the dialogue phase . . . . 3.8.9 Discussion . . . . . . . . . . . . . Conclusion . . . . . . . . . . . . . . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

4 Generating Multilingual Summaries 4.1 Summarization . . . . . . . . . . . . . . . . . . . . . . 4.2 A Summarization of Related Work on Summarization 4.2.1 Related Work . . . . . . . . . . . . . . . . . . . 4.3 Natural Language Generation . . . . . . . . . . . . . . 4.3.1 Some Terminology and Concepts . . . . . . . . 4.3.2 Related Work . . . . . . . . . . . . . . . . . . . 4.4 The Summary Generator - SuGe . . . . . . . . . . . . 4.4.1 Requirements and a Solution . . . . . . . . . . 4.4.2 Designing the Generation Algorithm . . . . . . 4.4.3 Implementing the Generation Algorithm . . . . 4.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 4.5.1 Discussion . . . . . . . . . . . . . . . . . . . . . 4.5.2 Confabulation vs. Mistake . . . . . . . . . . . . 4.5.3 Error Analysis . . . . . . . . . . . . . . . . . . 4.5.4 Discussion . . . . . . . . . . . . . . . . . . . . . 4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . .

. . . .

122 123 125 127

. . . . . . . . . . . . . . . .

129 131 135 136 138 140 142 148 148 150 154 163 169 170 174 176 178

5 Conclusion 179 5.1 Main Scientific Answers . . . . . . . . . . . . . . . . . . . . . 182 5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 184 Appendix

187

References

215

xix

xx

List of Figures 1.1 1.2 1.3 1.4 1.5 1.6 1.7

The Vauquois triangle . . . . . . . . . . . . . . . . . . . . . . “Reinhard4”—The sample dialogue . . . . . . . . . . . . . . . Reinhard4 - The summary . . . . . . . . . . . . . . . . . . . . A demonstration of the VerbMobil system. . . . . . . . . . The VerbMobil graphical user interface (GUI) . . . . . . . Relating VerbMobil to the Vauquois triangle. . . . . . . . . Textual representation of the German VIT representing the sentence “das ist wundersch¨ on.” . . . . . . . . . . . . . . . . . Graphical representation of a sample VIT. representing the sentence Das Treffen dauert 1.5 Tage. which translates to The meeting lasts 1.5 days. . . . . . . . . . . . . . . . . . . .

5 9 11 12 13 14

Example of reduction . . . . . . . . . . . . . . . . . . . . . . . Harry Bunts Dialogue Control Functions . . . . . . . . . . . . A dialogue grammar for the Cars application. ∗, +, () and | have their usual meaning. . . . . . . . . . . . . . . . . . . . . 2.4 Tree structure of an oral dialogue using the SUNDIAL dialogue manager. . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5 The five levels of the intentional structure . . . . . . . . . . . 2.6 Dialogue acts hierarchy as employed in VerbMobil 2 . . . . 2.7 An example of unhanding. In (15), the speaker is requesting a suggestion. Such an act corresponds to the transfer-initiative move. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8 Clipping of the hierarchy . . . . . . . . . . . . . . . . . . . . . 2.9 Textual representation of the propositional content of the sentence well it is about the business meeting in Hanover . . . . 2.10 Graphical representation of the proposition content for the sentence well it is about the business meeting in Hanover . . .

23 32

1.8

2.1 2.2 2.3

xxi

16

17

44 45 53 55

62 67 67 68

3.1

The dialogue component and its neighbour modules in VerbMobil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Architecture of DiVe (Dialogue Processing for VerbMobil) 3.3 Principal temporal units (capital letters) and their possible specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Dialogue excerpt showing recognized utterance (left), extracted objects (middle) and content objects (right) derived by template filling and completion with a sponsoring expression. The resulting structure is more specific than the arguments of the complete operation. . . . . . . . . . . . . . . . . . . . 3.5 Plan Operator Syntax . . . . . . . . . . . . . . . . . . . . . . 3.6 The PCFG for the greet move. The numbers in brackets are the probabilities produced by the Boogie system. The root of the grammar i the top-most rule (s-10455). . . . . . . . . . . 3.7 Two leaf operators. . . . . . . . . . . . . . . . . . . . . . . . . 3.8 Processing the intentional structure. . . . . . . . . . . . . . . 3.9 Plan operators for i) a complete negotiation dialogue and ii) arbitrary number of I-R games. . . . . . . . . . . . . . . . . 3.10 Plan operators for the negotiation game . . . . . . . . . . . . 3.11 A plan operator for a complete negotiation dialogue extended with the tree context. In the :goal, the variable ?IN is the input variable and the ?OUT is the output variable. . . . . . 4.1

4.2

83 91 97

100 114

117 118 122 123 124

125

The Summary Machine. The dotted lines indicate how the processing in VerbMobil relates to the summary machine. The EXTRACTION module corresponds to the syndialog module and the INTERPRETATION module to the dialogue module of VerbMobil. Finally, the module corresponding to the GENERATION module is presented in this chapter. . . . . . 132 VerbMobil viewed as a summarizer. Message extraction methods are applied to the utterances of the dialogue yielding dialogue act and propositional content (Extraction). These are interpreted in context, forming topic-specific negotiation objects (Interpretation). The most specific accepted suggestions are then processed to produce a summary description (Summary Generation) consisting of formatting directives and German VITs. Depending on target language, the German VITs are eventually sent to the transfer component and, finally, verbalized by the existing generator of VerbMobil–GECO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 xxii

4.3

The speech recorder device . . . . . . . . . . . . . . . . . . . 137

4.4

The pipeline architecture of DiaSumm.

4.5

Reiter’s reference architecture . . . . . . . . . . . . . . . . . . 142

4.6

Sample plan operator a la’ Moore and Paris. This plan operator is used for persuading the user to do an act. . . . . . . . 145

4.7

Conceptual Architecture of the Summary Generator—SuGe— in VerbMobil. The new parts are marked with thicker lines: The actual summary generator and the templates. . . . . . . 150

4.8

The main predicate of the generation algorithm—gen-concept.156

4.9

A sample input structure for the summary generator. The root (pc-appointment) is a meeting with three filled roles: has duration, has date and has participants. A possible verbalization is “Speaker1 and speaker2 meet on the fifth of July. The meeting lasts 1.5 days.” . . . . . . . . . . . . . . 156

. . . . . . . . . . . . 139

4.10 Conditions for relating the generated values of roles of an appointment to a verb. . . . . . . . . . . . . . . . . . . . . . . 159 4.11 Some examples of mappings. . . . . . . . . . . . . . . . . . . 160 4.12 Excerpt from one of the German–English evaluation dialogues. Each block shows the spoken utterance (first row), recognized chain (second row), system translation (third row) and translation (fourth row—82 and 84 is a translation of the system translation whereas 83 is a translation of the spoken utterance). . . . . . . . . . . . . 164

4.13 Evaluation Results for four bilingual German–English dialogues assuming perfect speech recognition. . . . . . . . . . . 166 4.14 Evaluation results for 30 (10 English, 10 German and 10 German–English) dialogues using the output from our speech recognizers and segmentation by the prosody module. . . . . 167 4.15 Evaluation of five German–German dialogues, manually transcribed and processed by speech recognition. . . . . . . . . . . 168 4.16 Confabulation in summarization. The sources for confabulations are i) RECOGNITION: Output from the ASR are almost always incorrect. ii) EXTRACTION: The recognition of dialogue act and extraction of propositional content produce errors. iii) PROCESSING: The interpretation of the extracted information in context may yield wrong result. . . 171 4.17 Evaluation of five German–German dialogues, manually transcribed and processed by speech recognition. . . . . . . . . . . 175 xxiii

1 Number of dialogue acts per turn over all turns for three sets of monolingual dialogues (German–German, English–English and Japanese–Japanese). 195

2 Number of dialogue acts per turn for three sets of monolingual dialogues. The upper figure describes turns containing the dialogue acts greet, introduce and bye, whereas the lower describes turns not containing these dialogue acts. 196

3 Number of dialogue acts per turn for the 10 multilingual German–English dialogues. Three curves are given: one for the turns containing the dialogue acts greet, introduce and bye, one for all other turns and, finally, one for all turns. 197

4 The sortal ontology in VerbMobil. 214

List of Tables

2.1 Distribution of task and dialogue initiative for a subset of the TRAINS91 corpus. 38

2.2 A confusion matrix for two annotators. 48

3.1 Annotated CD-ROMs. 75

3.2 Development of Speech Recognition during the VerbMobil project. 79

3.3 Distribution of dialogue acts and length information for the German dialogues. 86

3.4 Distribution of dialogue acts and length information for the English dialogues. 87

3.5 Distribution of dialogue acts and length information for the Japanese dialogues. 87

3.6 Distribution of dialogue acts and length information for all data. 88

3.7 Mapping from dialogue to negotiation act and operations. 95

3.8 The different lookings and their corresponding dialogue acts. 119

3.9 The distribution of plan operators over move classes. There are a total of 303 semi-automatically acquired operators for the move classes. The class “domaindependent” is the result of compiling the dialogue act hierarchy into operators, whereas “top” consists of handwritten operators for games, phases, and the top layer. Finally, “misc” contains plan operators for, e. g., maintaining the tree context. 121

4.1 Some RST relations and their meaning. 143

4.2 The verbs of the VerbMobil semantic database and their corresponding suffix. 153

4.3 The Category Task Contingency Table, visualizing TP, FP, TN and FN. The two columns stipulate the content of one feature of the dialogue: either a feature (X) is present or not (φ). The rows constitute two distinct features X and Y, or no feature (φ). 165

4.4 Evaluation of the complete system performance of DiaSumm. 170

4.5 Modified version of the Category Task Contingency Table. In addition to TP, FP, TN and FN we introduce CFP, representing the case where a feature not present in the dialogue has been introduced in the summary due to erroneous processing. X and Y are distinct features actually occurring in the dialogue, whereas Z is a distinct but confabulated feature. φ represents no feature. CFP occurs in two positions, namely in the cases where feature Z is a confabulation-based error. 173

Chapter 1

Introduction

Contextual knowledge is necessary for understanding almost anything happening in the world, be it reading a sentence of a newspaper text or an article, listening to someone speaking, or watching someone doing something. A computer system involved in some way in communication between, for example, two humans is no exception: it is essential for it to maintain a context for a correct interpretation or understanding of the observations made. This becomes particularly evident for short contributions. The knowledge that a question about the age of the dialogue participant has preceded the utterance “forty-two” makes the process of understanding an easy task. Another factor affecting understanding is the domain in which the communication takes place. Knowledge about the domain often disambiguates the meaning of certain words, expressions and actions.

The way contextual knowledge is modeled, organized and managed in a computer system depends on the task performed by the system. Most systems have some notion of topic and focus or accessibility of referents. Simple task-oriented systems employ no discourse context at all. This is the case for most applications built on simple dialogue systems based on finite-state technology, e. g., VoiceXML (see http://www.voicexml.org/). In such simple systems the interpretation of each word and expression is unambiguous, and therefore there is no need for, e. g., clarifications, except perhaps in the case where the speech recognizer fails. The drawback is that the dialogues in which the system can participate might be unnatural or artificial.

Engaged in a dialogue with another interlocutor, i. e., a human, a computer system can ask clarification questions to resolve ambiguities. This is not necessarily the case for a system mediating or eavesdropping on a dialogue

between two humans. The latter is the case for a translation system for spoken language like the VerbMobil system.

The VerbMobil system is a speech-to-speech translation system for spontaneously spoken negotiation in two domains: appointment scheduling and travel planning (including hotel reservation and entertainment). In a different mode, the domain of PC maintenance is supported. The final VerbMobil system translates between English, German and Japanese.¹ Such a system needs a module that is responsible for maintaining contextual information to support translation.

Whereas translation between languages within the same family (e. g., Romance languages) is easier, more distant languages are harder to translate. One reason for this has to do with how the world is viewed in the cultures of the respective languages. In some cultures, there are situations or phenomena which can be described by fixed phrases or sometimes even a single word (see below). Closely related languages, like English and German, require less contextual information for an approximately correct translation than, for example, German and Japanese. When translating to and from Japanese it is important to know, for example, who is speaking to whom and what the social relation between the speakers is. The latter is important since, for instance, politeness in Japanese is a highly complicated matter, especially for a non-Japanese speaker. This example is picked up by Levinson in his efforts to define pragmatics. In (Levinson, 1983, page 10), he talks about language-specific pragmatics: “. . .
for example, the pragmatics of English might have relatively little to say about social status (beyond what we need to describe the appropriate contexts for the use of sir, your honour and the like), while in contrast the pragmatics of Japanese would be greatly concerned with the grammaticalization of the relative social ranks of the participants and referents.”

In computer systems intended to participate in a dialogue in some way, the contextual information is maintained in a part of the dialogue manager we call the discourse memory or discourse manager. It comprises data structures—which are used to record important parts of the dialogue—and algorithms used to update the content of the data structures, or the discourse state, to support the processing of other components in the system. One of the tasks of the discourse manager is to interpret sensory perception in context. These kinds of interpretations are crucial for the functioning of the

¹ Additionally, Phillips and Siemens continued to integrate Mandarin into the system.


dialogue system. In man–machine dialogue systems, the discourse manager is often part of a module called the dialogue manager. The dialogue manager has the responsibility of initiating actions when confronted with stimuli. A typical action is the reaction to a user contribution by accessing an external resource, e. g., a database, and then, depending on the outcome of the access, the initiation and possibly the generation of a response.

Contrary to such dialogue systems, in VerbMobil there exists no dialogue manager² in a traditional sense. This is because the VerbMobil system does not control the dialogue but merely mediates it. The reasons for this are manifold, but one of them is that experiences gained in so-called Wizard-of-Oz (WOZ) experiments (Dahlbäck, Jönsson, & Ahrenberg, 1993) showed that a translation system intervening too often is not accepted by the users of the system (see chapter 3).

Despite its mediating role in VerbMobil, the dialogue manager has the task of keeping track of what has been uttered (the content) and the attitudes expressed towards the content. In case the content of the dialogue memory remains intact throughout the dialogue, it is possible, for instance at the end of the dialogue, to recapitulate or rephrase the dialogue. Additionally, by making use of the attitudes uttered towards the content of the utterances it is possible to construe the result of the negotiation. The user then has a document either affirming what was said and translated during the course of the dialogue, or functioning as a reminder.

The list of challenges and topics addressed by the VerbMobil project is very long, but we would like to highlight some particularly important and interesting ones by giving a short introduction to the state of the art and the challenges:

Speech Recognition
At the time VerbMobil started in 1993, speaker-dependent isolated-word recognition with push-to-talk technology for a very limited vocabulary size was deployed.
During the course of the project, we witnessed an impressive development resulting in open-microphone continuous speaker-independent large-vocabulary speech recognition. Some of the reasons for this were new modeling techniques and bigger and better corpora, where enough training material was available. Despite these advancements, the recognition rate is still far from perfect. In fact, the speaker-independent, continuous speech

² Throughout the thesis we will use the terms “dialogue manager”, “dialogue module” and “discourse manager” interchangeably for denoting the dialogue module of VerbMobil.


recognizers used in the final system have a word accuracy of between 76% and 89% (Waibel, Soltau, Schultz, Schaaf, & Metze, 2000). Additionally, spoken language has no delimiters, like full stops or commas. Therefore, a system dealing with multi-sentence contributions has to split the input stream of words into chunks corresponding to what are often referred to as utterances. This process is called segmentation. However, the segmentation of a contribution is not always performed correctly.

Spontaneously Spoken Dialogue
Even though linguists and socio-linguists have been engaged in analyzing real-world situations, the formal description of spontaneous dialogue in terms of, e. g., large grammars is unsolved. One of the unanswered questions before the VerbMobil project started was how people would behave while engaged in a negotiation interpreted by a machine instead of a human interpreter. A related example is that of, e. g., (Lakoff, 1973), where the use of so-called indirect speech acts (see section 2.3.1) is imposed by the “rules of politeness”, which say that one should be clear and be polite. In case these rules give rise to a conflict, people tend to err on the side of politeness. But how does this affect the behaviour of a human being while talking to another human being via a machine? One of the findings of this thesis is that people behave very differently when using the VerbMobil system. This does not necessarily have to do with politeness, but rather with other factors, like the limited system performance compared to a human interpreter. Therefore, on the one hand, the structure of the multilingual dialogues themselves as well as the contributions in the multilingual setting are simpler. On the other hand, most of the monolingual data in the VerbMobil corpus show more freedom in structure and language.

Machine Translation
Despite considerable efforts worldwide, at the project start machine translation (MT) was (and still is) far from mature.
In particular, the combination of MT and spontaneously spoken language seemed very challenging. The general task of MT is to translate an expression from some source language (S) into a target language (T). Within MT, one often refers to the diagram depicted in figure 1.1, called the “MT triangle” (Vauquois, 1975). There, “S” stands for Source, “T” for Target, whereas “I” stands for Interlingua. The MT triangle is often used when characterizing an MT system or an approach to MT in terms of, e. g., interlingua and transfer,

or when discussing the tradeoffs involved in MT. In an ideal world, there is some language-neutral representation—an interlingua—which can be used as intermediate representation. Translation would then consist of mapping the source onto the interlingua representation (analysis) and then mapping the interlingua representation onto the target representation (generation). Now, this theory is far from realizable in practice; in fact, the general consensus is that there will be no interlingua translation devices in the near future. This statement is based on the observation that even if a language analysis component is faced with a (restricted) domain which is known in advance, analysis competitions, e. g., MUC,³ often reach results in the area of 30–95% depending on the task. Instead, so-called transfer approaches are used, where translation consists of mapping the source to the target, possibly via some intermediate representation, but where special knowledge about that particular language pair is used.


Figure 1.1: The Vauquois triangle

While it is easier (although not an easy task!) to translate between related languages, translating between, e. g., German and Japanese is more difficult. This has to do with a number of factors, like the different structure of the languages, honorifics, the way the world is described in terms of concepts, etc. European languages, of course, show partly relatively small differences, ranging from honorifics⁴ to translation mismatches (see below). Natural languages have different means of describing real-world entities by means of words and grammatical constructs. One language, or maybe better “culture”, might use a single word for a certain real-world entity or action, while another language might need several words. A simple example is the Swedish word blunda, which translates to “eyes closed.” The most

³ MUC = Message Understanding Conference.
⁴ As I left Sweden, the second person singular was always used to address a stranger, but this is never the case in German.


striking example is probably the Fuegian (South American) word Mamihlapinatapai, which translates to “to look at each other, each hoping the other will offer to do something which both parties much desire done but which neither is willing to do.” As we will see below, one has to have knowledge about context, and sometimes even a different view of how things function, to be able to correctly interpret and hence translate an expression. The notions of translation mismatches and translation divergences, e. g., (Kameyama, Ochitani, & Peters, 1991a; Siegel, 1996), describe the phenomenon where an expression in the source language either does not contain enough information for a correct translation, or the target language cannot express the meaning of the source. There are several classes of translation mismatches (see also (Kameyama, Ochitani, & Peters, 1991b)):

Number and definiteness
In Japanese such information is often omitted. Example: “hon wo yomimashita”, which can be translated to “I read {the,a,’ ’} book(s)”.

Perspective
In Japanese, jibun (self) can mean {my,your,him,. . . }self depending on context, where, however, the referent can be far away in the context. jibun no accounto means “account of self.”

Lexical translation mismatches
Some words have no direct translation or are ambiguous. Some words mean something in between the concepts of related words. An example is the Japanese word e, which means “painting” or “drawing” but not “photo”; it is more specific than “picture”, but more general than “painting” and “drawing.” The English word “go” corresponds, depending on context, to the German words “gehen” or “fahren.”

Speaker perspective
In Japanese, honorifics play an important role. Depending on the relations between the speakers, a word like “give” has many different translations, e. g., ageru, morau, kureru, kudusaru, sashiageru. Therefore the translation from English to Japanese can be problematic.
These phenomena are not unique to the language pair Japanese–English. The Swedish utterance jag väntar på min kompis is ambiguous in the sense of speaker perspective. The default interpretation would read something like I’m waiting for my buddy, but the syntactic information also allows for the reading I’m waiting on my buddy. In German, the buddy can be in the accusative case—Ich warte auf meinen Kumpel—indicating a more situative

meaning, or in the dative—Ich warte auf meinem Kumpel—indicating that the speaker is physically located on his buddy, waiting for someone/something else. Whereas humans develop strategies for dealing with these kinds of ambiguities, an MT system translating this into German has to resolve the perspective. A related phenomenon is that some sentences cannot be translated from one language to another using a one-to-one sentence approach. In some cases one needs more than one sentence in the target language to express the content of the source sentence correctly.

Another interesting characteristic of language hides behind the term idioms. Idioms are not necessarily hard to translate. However, they need to be identified, since the translation—or maybe better the analysis, if there is one—is not based on compositionality but on a fixed meaning.

Finally, there are other, more subtle challenges when dealing with language and translation, which have to do with how language is used and how one communicates. My own experience (from now 10 years abroad) is that in some situations there are no satisfying expressions in my mother tongue. There are translations or approximate expressions—yes—but they simply do not fit 100%.
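The “go” example above is, at its core, a context-dependent lexical choice problem. The following toy sketch illustrates the idea; the context feature and the rule are invented for illustration and are in no way the VerbMobil solution:

```python
# Toy lexical choice for translating English "go" into German.
# The single context feature (whether a vehicle is involved) stands in
# for the contextual knowledge an MT system must consult; illustrative only.

def translate_go(context: dict) -> str:
    """Pick a German verb for English 'go' from contextual cues."""
    if context.get("by_vehicle"):   # going by train, car, bus, ...
        return "fahren"
    return "gehen"                  # default: going on foot

print(translate_go({"by_vehicle": True}))   # a trip by train -> fahren
print(translate_go({}))                     # walking somewhere -> gehen
```

A real system of course needs far richer context (syntax, semantic sorts, discourse state) than a single feature, which is precisely why representations like the VIT introduced below carry so many information layers.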

1.1

The VerbMobil Project

Prior to the actual project, a feasibility study answering the question whether “VerbMobil was an appropriate goal to pursue” (Kay, Gawron, & Norvig, 1991, page 2) was conducted. The study went through the state of the art in the main research fields for the project—machine translation and speech recognition—with a fine-toothed comb. An important part of the study is the chapter containing the recommendations on how the project should be conducted.

The actual VerbMobil project was, at the time of writing, the largest European AI project (Wahlster, 2001). Starting in 1993, about 900 person years were spent in sub-projects ranging from speech recognition via analysis, transfer and generation to speech synthesis, corpus collection and annotation. Its first phase employed 125 persons per year for four years, whereas its second phase engaged about 100 persons per year over the next four years. By the end of the project in the year 2000, a total of $80 million of (private and public) funding had been spent. Besides the actual system (see below), the project produced a large corpus of both monolingual and bilingual dialogues. This is one of the

reasons behind the success of the system: practically unlimited amounts of (annotated) data⁵ for the training of statistics-based modules and functionalities. In total, 180 hours of dialogues were recorded (see (Karger & Wahlster, 2000) and the Bavarian Archive for Speech Signals—BAS⁶). The corpus is available on CD-ROMs.

1.1.1

The VerbMobil scenario

Following other research projects, the VerbMobil project narrowed down the task of the system by means of a scenario. The basic setting was two business persons negotiating a meeting, possibly including a trip (to the meeting), accommodation and entertainment, e. g., having dinner or visiting a theater.

An Example Dialogue
Throughout this thesis we will refer to a dialogue called “Reinhard4”, as shown in figure 1.2. Although the dialogue is not taken from our corpus but is a constructed dialogue, we have chosen to use it since it contains most of the interesting phenomena that are of our concern. In particular, the summary generated for this dialogue (see chapter 4) contains more or less all information mentioned in the dialogue, indicating that the discourse processing (see chapter 3) was successful. One of our concerns in this thesis is that of generating summaries. In our view, a summary should contain the agreed-upon items of the dialogue. Such a (German) summary⁷ for Reinhard4 is shown in figure 1.3. A more complete version of Reinhard4, annotated with linguistic information as processed by the system, is found in the appendix.

⁵ Some statisticians are probably protesting, and I agree: yes, for some tasks you cannot have enough data, but for the training of, e. g., the dialogue act recognition component there was enough!
⁶ At the time of writing this thesis, their English home page is http://www.phonetik.uni-muenchen.de/Bas/BasHomeeng.html
⁷ An English translation is: Participants: Speaker B, Thompson. Date: 6.8.2002. Time: 3:12 pm to 3:13 pm. Theme: Trip with accommodation and recreational activities. Results. Scheduling: Speaker B and Thompson agreed on a business meeting on the 20th of January 2002 at 11 am in Hanover. Speaker B and Thompson will meet on the 16th of January 2002 at 9:30 at the train station. Traveling: A trip was agreed upon. The outward journey to Hanover by train starts at 5 pm on the 19th of January 2002.
Accommodation: A hotel in the city center was agreed upon. A single room costs 80 Euro. Thompson takes care of the reservation. Entertainment: A dinner at a restaurant was agreed upon. Thompson will reserve a table.


G1a   schönen Guten Tag

E2a   hello this is Thompson speaking
E2b   hello hello Mr. Schneider

G3a   ja es geht um das Geschäftstreffen in Hannover
G3b   das ist ja am zwanzigsten Januar um elf Uhr vormittags

E4a   so we have to leave Munich at six o’clock

G5a   vielleicht fahren wir lieber den Tag davor
G5b   da gibt es einen Zug um zwei Uhr

E6a   I would prefer to leave at five
E6b   I am pretty busy that day

G7a   ja gerne, können wir machen
G7b   dann brauchen wir noch eine Übernachtung

E8a   yes

G9a   ich kenne ein gutes Hotel im Stadtzentrum
G9b   ein Einzelzimmer kostet achtzig Euro

E10a  that sounds pretty good
E10b  can you please do the reservations

G11a  sicher
G11b  dann machen wir ein gemeinsames Abendessen in einem Restaurant ich werde einen Tisch reservieren

E12a  that would be nice
E12b  let us meet at the station on Wednesday

G13a  um halb zehn am Bahnhof

E14a  good see you then

G15a  bis dann

Figure 1.2: “Reinhard4”—The sample dialogue


There, the content of the discourse memory is also found.
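As described above, the discourse memory records the content of the contributions together with the attitudes expressed towards that content, from which the negotiation result can be construed. A minimal sketch of this idea follows; the class, the attitude labels and the update policy are my own simplifications, not the actual VerbMobil data structures:

```python
# Minimal discourse-memory sketch: store (speaker, attitude, content)
# triples and construe the negotiation result as the suggestions that
# were subsequently accepted.

class DiscourseMemory:
    def __init__(self):
        self.history = []   # (speaker, attitude, content) triples

    def update(self, speaker, attitude, content=None):
        self.history.append((speaker, attitude, content))

    def result(self):
        """Return the items that were suggested and then accepted."""
        accepted, pending = [], None
        for _speaker, attitude, content in self.history:
            if attitude == "SUGGEST":
                pending = content
            elif attitude == "ACCEPT" and pending is not None:
                accepted.append(pending)
                pending = None
            elif attitude == "REJECT":
                pending = None
        return accepted

# A fragment of the Reinhard4 negotiation, schematically:
mem = DiscourseMemory()
mem.update("G", "SUGGEST", "train at two o'clock")
mem.update("E", "REJECT")
mem.update("E", "SUGGEST", "leave at five")
mem.update("G", "ACCEPT")
print(mem.result())   # ['leave at five']
```

Only the accepted suggestion survives into the result, which is exactly the behaviour a summary of agreed-upon items requires.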

1.1.2

The VerbMobil system

One of the features of the VerbMobil project was that apart from the corpus, numerous publications, a book (Wahlster, 2000) (see also (Kay et al., 1991)), etc., the project produced a working system - the VerbMobil system (see figure 1.4). At the time of writing—three years after the end of the project—the system is still alive and can be tried out when visiting DFKI. The system runs on a SUN Ultra-Sparc 80 with 4 processors (450 MHz), 2 GB main memory, 8 GB swap, no special signal processing hardware, a Desklab Gradient A/D converter or the Sun internal audio device, and close-speaking cordless microphones.

Figure 1.5 depicts the graphical user interface (GUI) of the system. With some exceptions, the buttons correspond to one or more modules contributing to language processing in some way. Between the buttons, there are paths indicating the main data flow between the different modules. The bottom-row buttons correspond to the modules for processing input and output to and from the system. Above this row, to the lower left, the different speech recognition buttons together with the prosody button (“prosodic analysis”) are positioned. To the upper left, one finds the integrated processing and the different more linguistically oriented translation tracks (“semantic construction” etc.). To the right of that, the deep linguistic translation track (“dialog semantics,” “transfer” and “generation”) is located. Below that, the focus of this thesis—hidden behind the button “dialog and context evaluation”—is found. The shallow translation tracks—“statistical translation” and “case-based translation”—are, as indicated by the data flow paths, processing the output directly from the prosody analysis. One important module—the selection module—has no button. Finally, to the lower right, the different synthesizers are found. Furthermore, the VerbMobil system actually consists of many more modules, e. g., technical ones responsible for message passing, logging, etc. (Klüter, Ndiaye, & Kirchmann, 2000).

The final system processes a vocabulary of over 10,000 word forms for German, almost 7,000 for English and about 2,000 for Japanese. Obviously, the main focus has been on the two language pairs English–German and German–English, and we will, in fact, focus on this language pair in this thesis.

One of the reasons for VerbMobil’s success was its parallel translation tracks, deploying different translation strategies, each with its own characteristics in terms of advantages and disadvantages. In (Wahlster, 2001),

VERBMOBIL ERGEBNISPROTOKOLL Nr. 1
______________________________________________________________________
Teilnehmer: Sprecher B, Thompson
Datum: 6.8.2002
Uhrzeit: 15:12 Uhr bis 15:13 Uhr
Thema: Reise mit Treffen Unterkunft und Freizeitgestaltung
______________________________________________________________________
GESPRÄCHSERGEBNISSE:

Terminabsprache:
Sprecher B und Thompson vereinbarten ein Geschäftstreffen am 20. Januar 2002 um 11 Uhr am Vormittag in Hannover. Sprecher B und Thompson treffen sich am 16. Januar 2002 um halb 10 am Bahnhof.

Reiseplanung:
Eine Reise wurde vereinbart. Die Hinfahrt nach Hannover mit der Bahn beginnt am 19. Januar 2002 um 5 Uhr am Nachmittag.

Unterkunft:
Ein Hotel im Stadtzentrum wurde vereinbart. Ein Einzelzimmer kostet 80 Euro. Thompson kümmert sich um die Reservierung.

Freizeit:
Ein Essen in einem Restaurant wurde vereinbart. Thompson kümmert sich um die Reservierung.
______________________________________________________________________
Protokollgenerierung automatisch am 6.8.2002 15:15:58 h

Figure 1.3: Reinhard4 - The summary


Figure 1.4: A demonstration of the VerbMobil system.

the evaluation measure used in VerbMobil is described: “We call a translation ‘approximately correct’ if it preserves the intention of the speaker and the main information of his utterance.” Given this, the multi-engine approach consisting of the translation tracks described below contributes to an average translation quality of 85%. The translation tracks are:

Example-based translation
Trained on the aligned bilingual VerbMobil corpus, the example-based translation track (Auerswald, 2000) delivers a relatively good translation as long as the input matches the training data.

Dialogue-act-based translation
By robustly extracting the core intention in terms of the dialogue act together with the propositional content, this translation track (Reithinger & Engel, 2000) is robust against, in particular, speech recognition errors.

Figure 1.5: The VerbMobil graphical user interface (GUI)

Statistical translation
Trained on the aligned bilingual VerbMobil corpus, the statistical translation track (Vogel et al., 2000) delivers a high approximate correctness, especially for the language pair German–English.

Substring-based translation
By combining statistical word alignment with the precomputation of translation chunks together with contextual clustering, the substring-based translation track (Block, Schachtl, & Gehrke, 2000) guarantees a translation with high approximate correctness.

Semantic-based translation
Also known as the deep translation track, the semantic-based translation track deploys a classical computational linguistics approach using deep analysis (Schiehlen, 2000; Flickinger, Copestake, & Sag, 2000; Siegel, 2000; Rupp, Spilker, Klarner, & Worm, 2000; Pinkal, Rupp, & Worm, 2000; Bos & Heine, 2000),

semantic transfer (Emele, Dorna, Lüdeling, Zinsmeister, & Rohrer, 2000) and generation (Becker, Kilger, Lopez, & Poller, 2000a). This track provides a high-quality translation in case of success.

Next, we characterize the system by relating the different translation tracks to the translation triangle described above. As depicted in figure 1.6, the different translation tracks are positioned according to their functioning. Obviously, three translation tracks—example-based, statistical and substring-based—are rather shallow. The semantic or transfer-based track is more abstract, but is still far from being denotable as interlingua. Actually, the dialogue-act-based translation is the translation track that is most interlingua-like. The reason for this characterization is that this track is based on two language-independent information carriers, namely the dialogue act for representing the intention and the VerbMobil domain model for representing the propositional content.

[Figure 1.6 shows the Vauquois triangle with Source (ASR) and Target (CTS) at the base corners and the interlingua at the apex. The translation tracks are placed at increasing levels of abstraction: example-based, statistical and substring-based near the base, with the DA-based track and the VM-Transfer track (via syntax and semantics, including DiVe) higher up along the analysis and generation edges.]

Figure 1.6: Relating VerbMobil to the Vauquois triangle.
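The interlingua-like character of the dialogue-act-based track can be made concrete with a toy sketch: the language-neutral carrier is a pair of dialogue act and propositional content, and per-language templates verbalize it on the generation side. The dialogue-act idea follows the thesis, but the templates, lexicon and function below are invented for illustration only:

```python
# Toy dialogue-act-based "translation": transfer only a language-neutral
# (dialogue_act, content) pair; language-specific templates verbalize it.
# Templates and lexicalization are invented for illustration.

LEXICON = {"MONDAY": {"en": "Monday", "de": "Montag"}}

TEMPLATES = {
    "en": {"SUGGEST": "how about {item}?", "ACCEPT": "{item} is fine"},
    "de": {"SUGGEST": "wie waere es mit {item}?", "ACCEPT": "{item} passt"},
}

def generate(dialogue_act: str, content: dict, lang: str) -> str:
    """Verbalize a dialogue act plus domain-model content in `lang`."""
    item = LEXICON[content["item"]][lang]          # lexicalize the concept
    return TEMPLATES[lang][dialogue_act].format(item=item)

# The same language-neutral input rendered in both languages:
print(generate("SUGGEST", {"item": "MONDAY"}, "en"))  # how about Monday?
print(generate("SUGGEST", {"item": "MONDAY"}, "de"))  # wie waere es mit Montag?
```

Because only the intention and the content are transferred, misrecognized function words in the source never reach the target side, which is why this track degrades gracefully under speech recognition errors.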


The VerbMobil Interface Term - VIT
For the rest of the thesis we use a description language for linguistic information called the VerbMobil Interface Term, or “VIT” for short. The VIT is a uniform data structure serving as the linguistic information carrier, mostly between the modules within the so-called deep processing track in VerbMobil. It encodes different linguistically motivated pieces of information, like (most notably) semantics, morpho-syntax, syntactic tense, semantic sorts, scope, prosody, etc. The analyzed surface string and the dialogue act are also part of a VIT. As written in the introduction of (Dorna, 1996): “This information is linked to semantics and can be used for computing semantic tense, for disambiguation of under-specified analyses, for guiding semantic evaluation such as anaphora resolution, for adjacency or linear precedence determination, and for many more.” Figure 1.7 shows an example of the textual representation of the VIT representing the German sentence “das ist wunderschön”, which translates approximately to That is beautiful—depending on context. Figure 1.8 shows a graphical representation of an English VIT.

1.2

Main Scientific Questions

Approaching the task of dialogue modeling in a speech-to-speech translation system for spontaneously spoken language, we put together a catalogue of unanswered research questions:

Representation issues
How can we build up, maintain and exploit a representation of the dialogue in a way suitable for incremental processing and the support for translation and dialogue processing?

Controlling versus mediating
What consequences for dialogue management are there due to the mediating role of VerbMobil?

Spontaneous speech
What consequences on dialogue processing are there due to spontaneous speech?

Multilinguality
What additional effects are there on dialogue processing due to the multilingual scenario?

Minutes and Summarization
How can documents mirroring the result of the course of the negotiation be construed?

vit(vitID(sid(2,a,de,0,10,1,de,y,synger),        % SegmentID
        [word(das,1,[l28]),                      % WHG String
         word(ist,2,[l26]),
         word('wundersch"on',3,[l25])]),
    index(l24,l25,i7),                           % Index
    [stat(l25,i7),                               % Conditions
     wunderschoen(l25,i8),
     decl(l24,h4),
     pron(l28,i8)],
    [leq(l23,h4),                                % Constraints
     in_g(l26,l23),
     in_g(l28,l23),
     in_g(l25,l34)],
    [],                                          % Sorts
    [prontype(i8,third,demon)],                  % Discourse
    [gend(i8,neut), num(i8,sg),                  % Syntax
     pers(i8,3), cas(i8,nom)],
    [ta_mood(i7,ind), ta_perf(i7,nonperf),       % Tense and Aspect
     ta_tense(i7,pres)],
    [pros_mood(l24,decl)]                        % Prosody
)

Figure 1.7: Textual representation of the German VIT representing the sentence "das ist wunderschön."
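The slots of the textual VIT in Figure 1.7 can be mirrored in a simple record type. The following sketch is purely illustrative: the field names are ours, not part of the actual VerbMobil implementation, which operates on Prolog terms.

```python
from dataclasses import dataclass

@dataclass
class VIT:
    """Illustrative record mirroring the slots of a textual VIT.
    Field names are hypothetical; the real VIT is a Prolog term."""
    segment_id: tuple    # vitID(...): dialogue, language, time span, ...
    whg_string: list     # word/3 entries: the analyzed surface string
    index: tuple         # index(top label, main label, main instance)
    conditions: list     # semantic predicates, e.g. wunderschoen(l25,i8)
    constraints: list    # scope constraints, e.g. leq(l23,h4)
    sorts: list          # semantic sorts
    discourse: list      # discourse information, e.g. pronoun type
    syntax: list         # morpho-syntax: gender, number, person, case
    tense_aspect: list   # ta_mood, ta_perf, ta_tense
    prosody: list        # e.g. pros_mood(l24,decl)

# The VIT of "das ist wunderschoen" from Figure 1.7, transliterated:
vit = VIT(
    segment_id=("sid", 2, "a", "de", 0, 10, 1, "de", "y", "synger"),
    whg_string=[("das", 1, ["l28"]), ("ist", 2, ["l26"]),
                ("wunderschoen", 3, ["l25"])],
    index=("l24", "l25", "i7"),
    conditions=["stat(l25,i7)", "wunderschoen(l25,i8)",
                "decl(l24,h4)", "pron(l28,i8)"],
    constraints=["leq(l23,h4)", "in_g(l26,l23)", "in_g(l28,l23)",
                 "in_g(l25,l34)"],
    sorts=[],
    discourse=["prontype(i8,third,demon)"],
    syntax=["gend(i8,neut)", "num(i8,sg)", "pers(i8,3)", "cas(i8,nom)"],
    tense_aspect=["ta_mood(i7,ind)", "ta_perf(i7,nonperf)",
                  "ta_tense(i7,pres)"],
    prosody=["pros_mood(l24,decl)"],
)

# The WHG-string slot lets us recover the analyzed surface string:
surface = " ".join(w for w, _, _ in vit.whg_string)
```

The point of the sketch is merely that a VIT bundles heterogeneous layers of linguistic description (semantics, morpho-syntax, prosody, etc.) for one segment in one uniform record.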


Figure 1.8: Graphical representation of a sample VIT representing the sentence Das Treffen dauert 1.5 Tage, which translates to The meeting lasts 1.5 days.

Evaluation What metrics and methods can be used for evaluation? In addition to recognition and translation performance, we are particularly interested in the performance of the summarization functionality.

1.3 Thesis Organization

In the next chapter (chapter 2), we introduce some terminology, followed by an overview of related and relevant research in the area of dialogue modeling. We especially look at the representation of utterances, but also at relations between utterances and higher level structures. Before we put forward our own theory (sections 2.5 – 2.7) we briefly mention a way of verifying a theory by means of corpus annotation. Chapter 3 starts with a look at some features of the VerbMobil corpus, followed by a presentation of the performance of speech recognition and its effects (section 3.3). Thereafter, we summarize the main tasks and sketch the implementation of the dialogue module of VerbMobil in sections 3.4 and 3.6. Sections 3.7 and 3.8 contain a detailed description of the processing of the two main data structures of the dialogue module - the thematic structure and the intentional structure.

The third main pillar of the thesis—multilingual summary generation—is the content of chapter 4. There, we summarize relevant work in sections 4.2 (summarization) and 4.3 (generation). In section 4.4, the summary generator is presented. Before we conclude the chapter, we evaluate our dialogue module by evaluating the summary generation functionality in section 4.5. Chapter 5 concludes the thesis and points at some future directions. Finally, the appendix contains two sample dialogues and their processing in the working system, as well as some linguistic information, e. g., the VerbMobil sortal ontology and some samples from our annotation manuals.


Chapter 2

Dialogue Modeling

The goal of this chapter is to lay the foundations for the dialogue modeling used by the dialogue component in VerbMobil. We start by surveying the relevant history and state of the art in dialogue modeling. In particular we are interested in how we can represent chunks of communication, e. g., utterances, moves and turns, in a language-independent way. We therefore consult work in the area of speech act theory and conversational games. It is important that we develop a theory which is tailored for the application to a working system processing spontaneous spoken language. Therefore, on the one hand, the model we finally end up with has to be fine grained enough to draw relevant inferences; on the other hand it should not require so much detail that we fail to use the model under the actual processing circumstances, i. e., recognition errors. In section 2.3 we therefore identify models capable of contributing to a model of negotiation dialogues in the VerbMobil scenario. Section 2.4 surveys an important companion of theories in general: corpus annotation, i. e., the process of annotating collected linguistic material according to a theory. There we discuss more thoroughly how we measure how well a corpus is annotated, and whether, given the measurement, we can be sure that the theory is valid. Finally, we put forward our own model of negotiation dialogues in the VerbMobil scenario in sections 2.5–2.7. Whereas the first of these sections (2.5) has an introductory character, the second and third are concerned with two means for analyzing our negotiation dialogues. The former serves the purpose of characterizing a tree-like structure spanning the whole dialogue. The goal of the latter is to contribute to, e. g., translation and the tracking of the objects being negotiated. First, however, we give an introduction to this chapter with some historical remarks and

some notes on the VerbMobil scenario (section 2.1). Section 2.2 introduces some of the terminology used in this chapter and throughout the thesis.

2.1 Introduction

Until the VerbMobil project started, most of the work within the NLP community concerned with the implementation of systems had focussed on dialogues between a human and a computer (henceforth man–machine dialogue). The first man–machine dialogue system was probably ELIZA (Weizenbaum, 1966), a computer program that communicates in natural language, playing the role of a psychiatrist. Among the fundamental issues dealt with are, e. g., the identification of key words and the discovery of minimal context. A milestone in the history of AI programs was SHRDLU, e. g., (Winograd, 1972). SHRDLU—developed in the late sixties1—is a computer program that controls a robot arm in a blocks world. The program understands and reacts to a wide variety of user contributions or speech acts (see below), including statements about, e. g., ownership of the items in the world. It resolves simple anaphorical expressions and answers questions concerning why a particular action was performed. SHRDLU was one of the first systems that used procedural semantics, e. g., (Woods, 1981), meaning that a certain natural language expression is associated with a piece of program code. Question-answering (Q/A) systems assume that each input is a question and produce an answer to it. One of the first Q/A systems was LUNAR (Woods & Kaplan, 1977), the first computer program that allowed users to type questions to a database in natural language. LUNAR translated English questions about the chemical analysis of moon rocks into database queries, with a vocabulary size of about 3,500 words. The syntactic analysis handled phenomena like tense, modality and even some anaphora and conjunctions. The processing of anaphora reveals the presence of some notion of discourse memory.
Notable is that LUNAR was one of the first systems to be evaluated: out of 111 questions, 78% were answered correctly. Later on, the TRAINS system, e. g., (Ferguson, Allen, Miller, & Ringger, 1996), allowed for spoken language. LINLIN (Jönsson, 1993) was an attempt to develop a typed interface to a travel agency. Both TRAINS and LINLIN belong to a category of dialogue systems which have to process ellipses and anaphorical expressions and hence require a more or less elaborate discourse model capable of resolving these kinds of phenomena.

1 See also http://hci.stanford.edu/cs147/examples/shrdlu/

One of the primary goals of these projects was to develop computer systems facilitating man–machine communication using written or spoken language. The modeling within these systems was aimed at the use, or direct implementation, of dialogue systems. However, at some sites in Japan, Germany and the USA, research on human–human models applied to, e. g., speech translation systems similar to the VerbMobil system has been carried out (Iida & Arita, 1992; Levin et al., 1995).

A wide range of models for describing certain aspects of dialogue or communication has been proposed in the past. Some of them are built on a quite simple dialogue structure, imposed by, for example, technical limitations of the system at which the model is aimed. Others, not necessarily more complicated, have been developed from a more theoretical perspective. Within some of these models, sub-models characterise certain aspects of the dialogues, such as clarification sub-dialogues. One reason for the quite exhaustive modeling of clarification sub-dialogues lies in the limitations of speech recognition: even today the best speaker-independent speech recognizers perform quite badly (see section 3.3). Admittedly, for limited vocabularies, and especially for speaker-dependent recognition, current systems reach recognition accuracies of 95% and more. Another good reason for the exhaustive modeling of clarification dialogues is when the dialogue system is not allowed to make any errors, typically when it is involved in a critical task such as a money transaction. For many systems dealing with spoken dialogue, short, single- or two-sentence user contributions were the basis of models and theories.
A common approach for processing user utterances is to represent them with speech acts together with their propositional content. Speech acts (or dialogue acts) can be viewed as labels mirroring the intention behind the utterance, e. g., posing a request (see section 2.3.1). The propositional content is used to represent the semantic content of an utterance (without looking at the intention). It is commonly represented using, for instance, some logical form or a frame-based formalism (see section 2.7). Examples of systems advocating the speech-act-plus-propositional-content approach are SHRDLU and TRAINS. There are, however, exceptions, such as the approach taken in the SUNDIAL project (McGlashan et al., 1992; Heisterkamp & McGlashan, 1996), where the result of the analysis is semantics as feature structures (SIL) only. Aiming for the translation of spontaneous speech within a negotiation

scenario, VerbMobil (Wahlster, 2000) is one of the first implemented broad-coverage systems concerned with mediating human–human dialogues. The final system is able to translate between the three languages English, German and Japanese, using a vocabulary of about 10,000 words for German, about 6,900 for English and around 2,500 for Japanese.2 One of the efforts of the project was the large data collection (Karger & Wahlster, 2000), which resulted in 3,200 dialogues (German: 1,454; English: 726; Japanese: 1,020) of transcribed mono- as well as multi-lingual negotiation dialogues. Different parts of the corpus have been annotated with various linguistic information, which has opened up the way for machine learning approaches.

The VerbMobil Scenario

The VerbMobil dialogues are negotiation dialogues between two business people, involved in the negotiation of a meeting time/place, travel planning, hotel booking and leisure time activities. In the VerbMobil setting no barge-in is allowed, i. e., a dialogue participant may not interrupt the system during translation. Instead, one interlocutor speaks at a time, and the contribution is then analyzed, translated, and verbalized in the target language by VerbMobil. After the translation has been synthesized, the next contribution may be uttered. One such contribution from one interlocutor is called a turn. A turn is possibly divided into segments, which sometimes coincide with utterances or even sentences. In what follows, we will use the terms segment and utterance interchangeably to denote these parts of a turn. It is assumed that each segment can be annotated with at least one dialogue act. A segment consists of at least one word as recognized by one of the speech recognizers or as transcribed in the VerbMobil corpus, see (Burger, 1997; Alexandersson et al., 1998). In the running system, it is assumed that each segment should be translated.

Our use of turn differs from that of, e. g., (Allwood, 1994), who uses turn to denote that a speaker is "holding the floor." Under that definition it is thus allowed for another speaker to intervene with, for example, back-channeling utterances. Instead we have a more practical and technically oriented usage: the listener may not intervene while the speaker is speaking or in the pause imposed by the system during processing.

Interestingly, human interpreters choose a translation strategy which is based neither on word-by-word nor even on sentence-by-sentence translation, but rather on something that can be denoted abstraction

2 Additionally, Siemens and Philips have continued to integrate Mandarin into the system.


Original utterance
Oh, Moment, ich glaube, Freitag habe ich einen festen Termin, da kann ich leider nicht. Also Freitags kann ich nicht, ich kann dienstags, mittwochs und donnerstags. Ham Sie da vielleicht noch einen Termin frei?

Literal translation
Oops, one moment, I think Friday I have a regular appointment, unfortunately I can't then. So, Fridays I can't. I am free Tuesdays, Wednesdays and Thursdays. Are you free then?

Interpreter translation
Friday is impossible, but Tuesday, Wednesday, Thursday is okay.

Figure 2.1: Example of reduction

or reduction of the contribution. This is illustrated by the example in Figure 2.1. It shows a turn with its literal translation and its translation by a human interpreter (Schmitz & Quantz, 1995). The interpreter does not translate the individual segments but rather renders the intended interpretation of the turn. Worth noting is that this reduction of course eliminates performance phenomena like hesitations ("oh") or repetitions ("also freitags kann ich nicht"), but also turn-giving segments ("Ham Sie da vielleicht noch einen Termin frei?"). The two core intentions of the contribution are, however, retained. The first part of the turn is backward looking; it refers back and responds to the previous proposal. The second part is concerned with displaying to the hearer a new, forward looking proposal. The interpreter chooses in this case not to translate the request (are you free then?) but gives instead a quite neutral translation. Here, too, the core intention is kept. Note that such an approach to translation violates the basic assumption mentioned above that each segment should be translated.

Our corpus of negotiation dialogues is cluttered with such pairs of a forward looking (part of a) turn which is reacted upon in a backward looking (part of a) turn. Such patterns are well known and described as adjacency pairs, language games and conversational games by several researchers and philosophers, e. g., (Wittgenstein, 1974; Schegloff & Sacks, 1973; Kowtko, Isard, & Doherty, 1993).

It is possible to characterize our dialogues on other levels. For instance, our negotiation dialogues can be divided into phases such as greeting or

negotiation phase. In most dialogues3 the speakers start by greeting each other. Then, the topic of the dialogue might be mentioned,4 followed by the actual negotiation, which can be divided into several negotiations. Finally, the result of the negotiation is sometimes concluded and there is some kind of farewell phase.

A salient phenomenon is the difference in negotiation style: the American English part of the corpus shows more efficiency in that one speaker often verbalizes parts of his diary by posing several alternative possibilities; the listener then just has to choose. The German speakers are more verbal and take their time, even asking the listener to suggest, for example, a date. In the Japanese part of the corpus other behaviour occurs, which is almost exotic to us: a suggestion not suitable for the listener is not directly rejected. Rather, it is commended, and rejected only implicitly by the adjacent counter-suggestion from the listener (see also section 3.2, page 75).
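The phase sequence just described (greeting, optional topic mention, one or more negotiation rounds, optional closing, farewell) can be sketched as a small finite-state model. The transition table below is our own toy simplification of the regularities noted above, not the dialogue module's actual model; it includes the observed shortcut of speakers who skip the greeting:

```python
# Admissible phase transitions in a VerbMobil-style negotiation dialogue;
# a toy simplification of the regularities described in the text.
TRANSITIONS = {
    "START":       {"greeting", "negotiation"},  # some speakers skip greeting
    "greeting":    {"topic", "negotiation"},     # topic mention is optional
    "topic":       {"negotiation"},
    "negotiation": {"negotiation", "closing", "farewell"},  # several rounds
    "closing":     {"farewell"},                 # result sometimes concluded
    "farewell":    set(),
}

def is_wellformed(phases):
    """Check that a sequence of phases follows the admissible transitions
    and ends in the farewell phase."""
    state = "START"
    for phase in phases:
        if phase not in TRANSITIONS[state]:
            return False
        state = phase
    return state == "farewell"
```

Under this sketch, both the canonical sequence greeting, topic, negotiation, ..., farewell and the hurried variant that opens directly with a suggestion are well-formed, reflecting the exceptions noted in the footnotes.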

2.2 Some terminology

In this section we summarize and discuss some terminology important for this thesis. Of particular importance is the definition of pragmatics. We slightly re-define pragmatics and, given this definition, we will show in the next chapter that with the work described here we are in fact able to at least partly meet some of the challenges hidden behind this term.

2.2.1 Syntax, Semantics and Pragmatics

The trichotomy of syntax, semantics and pragmatics, originally introduced by Charles Morris (Morris, 1938), is one of the most popular ways of subdividing the study of human language, or human language communication respectively. There is a wide variety of definitions of these concepts, especially since Morris' original definition of, for instance, pragmatics has split into several disciplines such as socio-linguistics, psycholinguistics, etc. Nevertheless we will recapitulate his original and some currently used definitions:

3 There are exceptions to almost every regularity described here.
4 The topic is not negotiated - the instructions of the participants contain no alternative to the negotiation scenario. But some - I guess hungry or, maybe, bored - subjects try to solve the task as fast as possible and pose the first suggestion without greeting or considering the topic even worth mentioning.

• Syntax:

In his classical work Morris describes syntax as "the study of the formal relation of signs to one another." A less formal description could be: syntax denotes the study of how words can be put together to form phrases, utterances or sentences in some natural language (cp. syntax in formal languages).

• Semantics:

According to Morris, semantics stands for “the study of the relation of signs to the objects to which the signs are applicable.” Another view of semantics is “the study of situation independent meaning of utterances.” To achieve this, one usually uses truth, reference and logical form.

• Pragmatics:

Morris states: "the study of the relation of signs to interpreters." (Levinson, 1983) devotes a chapter to the discussion of the definition of the word pragmatics. One of the definitions given there is: "pragmatics is the study of those principles that will account for why a certain set of sentences are anomalous, or not possible utterances." (Mey, 2001) uses the definition "Pragmatics studies the use of language in human communication as determined by the conditions of society." We argue that pragmatics stands for the study of the principles for how non-anomalous communicative acts make sense. Another possible definition of pragmatics is, contrary to the definition of semantics above, "pragmatics deals with situation dependent meaning." We will return to our definition of pragmatics in chapter 3, where we describe a method for answering the question whether an utterance is pragmatically relevant or not, given the state of a discourse memory.

Propositional Content

What is a proposition (lat. prōpositiō)? What is "propositional content"? (Bußmann, 1990; Honderich, 1995; Reber, 1996) present the following definitions:

• (Honderich, 1995, proposition (-al content)): "The precise formulation varies, but a proposition, or propositional content, is customarily defined in modern logic as 'what is asserted' when a sentence (an indicative or declarative sentence) is used to say something true or false, or as

'what is expressed by' such a sentence. The term is also applied to what is expressed by the subordinate clauses of complex sentences, to forms of words which, if separated from the complex sentences of which they are part, can stand alone as indicative sentences in their own right. Accordingly, such sentences and clauses are often called 'propositional signs'. . . . "

• (Reber, 1996, proposition): "4. A linguistic proposition is a formal statement that represents a component of the underlying meaning of a sentence. Here, the sentence 'apples are red' would be represented as (apple, all, red). The notion of truth here is irrelevant; the concern is with whether or not the proposition provides an accurate characterisation of the underlying meaning of the sentence being analysed."

• (Reber, 1996, propositional content): "In linguistics, the full set of propositions (4) expressed by a sentence or paragraph or extended discourse."

• Hadumod Bußmann (Bußmann, 1990): According to Bußmann, the term proposition has evolved out of philosophy and logic. Later, the term was adopted by the research fields of linguistic semantics and speech act theory. In (Bußmann, 1990) she bases her discussion on the work of, e. g., Russell, Austin and Searle. In her view, the proposition is

der sprachunabhängige, bezüglich des Illokutionstyps neutrale gemeinsame Nenner der Bedeutung von Sätzen bezeichnet, die das Zutreffen eines Sachverhalts zum Inhalt haben.

A possible translation5 is

. . . the language independent meaning, neutral with respect to illocutionary type, shared by sentences that assert that some state of affairs holds.

5 A well-known online machine translation facility related to space fishes makes the following out of the quote: "languageindependent common denominators of the meaning of sentences, neutral concerning the Illokutionstyps, designates, which cover an applying of circumstances."


To conclude, the propositional content of, for instance, a sentence seems to correspond to what is stated without looking at the illocutionary force (see section 2.3.1). In the following examples the proposition remains the same despite the different illocutionary force and sentence mode:

(1) Jan smiles casually.

(2) Is it true that Jan smiles casually?

(3) Does Jan really smile casually?

A logical form could be something like smiles_casually(jan).
2.3 Theories of Dialogue Structure

We discuss theories of dialogue structure relevant for this thesis, or theories which one would expect to be relevant. This includes models and theories that describe primarily spontaneously spoken dialogue on different levels and that can be used for constructing the model we implement in the VerbMobil system. The requirements (see section 3.4) of our model are mainly posed by other modules or functionalities in the system, such as the transfer module or the generation of summaries or minutes.

2.3.1 Modeling utterances

For the modeling of utterances a vast variety of work is found in the literature. Almost all of it refers back to Austin's and Searle's work on speech acts. These theories of speech acts try to map linguistic utterances onto a fixed alphabet of categories, possibly in combination with, for instance, propositional content. As we will see below, a beloved child has many names: the categories bear names like speech acts, dialog(ue) acts, communicative acts or actions, or moves. We start by walking through the important work concerning labels for utterances, continue with propositional content and, finally, with work concerned with higher level structures.

Speech acts

The theory of speech acts was first coined and formulated by Austin (Austin, 1962)6 and a couple of years later by Searle (Searle, 1969). Austin showed that the traditional semantics of that time almost completely neglected everything except the informative function of language as it appears in indicative statements. Instead he viewed communication – speaking as well as writing – as a kind of social action. Within his theory communication is built up from basic (atomic) items – speech acts – which distinguish three aspects or forces of an utterance or act:

• locution: the utterance itself, its meaning and reference,

• illocution: the conventional function of the utterance as it should be understood by the hearer(s),

• perlocution: the consequence or effect on the audience.

Searle (Searle, 1969) continued the work of Austin. He claims that an utterance can be characterized as performing four different acts:

• Uttering words: performing utterance acts

• Referring and predicating: performing propositional acts

• Stating, questioning, commanding, promising, etc.: performing illocutionary acts

• Perlocutionary act: the consequence or effect – intended or not – achieved in an audience

In (Searle, 1975) he extends this approach and argues that his speech act theory accepts the basic principles of communication as defined by Grice (Grice, 1975). Presupposing the Gricean maxims, the speech act theory categorizes the acts a speaker performs into five general classes:

• assertives: the speaker tells how things are (concluding, asserting)

• directives: the speaker tries to get the listener(s) to do something (requesting, commanding)

• commissives: the speaker commits to doing something (promising, threatening, offering)

• expressives: the speaker expresses his attitude or feelings (thanking, apologising, welcoming, congratulating)

6 Austin's classical paper was not written by himself! It was published posthumously by his students. It is thus questionable whether it really represents Austin's own view or not, see (Allwood, 1977; Levinson, 1983).


• declaratives: the speaker brings about changes (declaring war, christening, excommunicating, etc.)

where each speech act is characterised by ten so-called felicity conditions.7

When introducing speech acts, one has to mention the term indirect speech acts. In sentences like "Can you pass me the salt?" the literal meaning is a question in which the hearer is asked whether she is able to pass the speaker the salt. However, the intention behind the utterance is rather a request which asks the hearer (in a polite way) to pass the salt, e. g., (Brown, 1980; Levinson, 1983; Gordon & Lakoff, 1975).

(Levinson, 1983) states that this list (assertives – declaratives) is "a disappointment in that it lacks a principled basis" and points at other classificatory schemes (Levinson, 1983, p. 241). He concludes (Levinson, 1983, p. 243 ff) that:

(i) "all utterances not only serve to express propositions, but also to perform actions."

(ii) "...there is one privileged level of action that can be called the illocutionary act – or, more simply, the speech act. ..."

(iii) ". . . there is at least one form of utterance that directly and conventionally expresses it8 – namely, the explicit performative."

(iv) "The proper characterization of illocutionary force is provided by specifying the set of felicity conditions for each force."

He continues to claim that the combination of illocutionary force and propositional content can be used to describe an utterance. Important is that the illocutionary force constitutes something which cannot be captured by truth-conditional semantics. Rather, illocutionary acts are to be described in terms of felicity conditions, which are specifications for appropriate usage. The same goes for propositional content, which can

7 The "felicity conditions" were introduced by Austin as a tool for determining whether a performative succeeds or "goes wrong." The term "goes wrong" is used since performatives cannot be true or false: Suppose I utter "I hereby christen you, my daughter, Ida." Then it might be acceptable, if she has not been named something else. But if she already has, then the performative is not false but "went wrong" or is, in Austin's terminology, infelicitous.
8 Any particular illocutionary force.


receive "propositional content restrictions," which describe restrictions on the content in combination with particular illocutionary forces.

Also (Allwood, 1977) takes a critical look at the work by Austin and Searle. He points out that Austin's definition is not exact enough: it is for instance unclear how one can distinguish between the intended and the achieved effect of an utterance in the mind of the hearer. Furthermore, the term "speech act", or more precisely the use of "speech", seems to restrict the theory to spoken acts. But human–human communication is not just spoken; it is also non-verbal (see (Allwood, 1995a) for examples).

Communicative Acts

Among other critiques and suggestions, Allwood points out that the atomic view of communicative actions is suboptimal and that, since communicative actions usually occur "sequential in interaction" and not "in isolation", one might better study larger chunks of communication. Instead of speech acts, the term "communicative act" is suggested and a new "more suitable conceptual framework for the study of communicative actions" (Allwood, 1977) is put forward (see also (Allwood, 1976)). In the view of (Allwood, 1977), the following framework should be used for the study of communicative action:

1. The intention and purpose – intended effects – of a communicative action.

2. The actual overt behaviour used to perform the communicative action.

3. The context in which the communicative action is performed.

4. The actual achieved effects on a receiver of the act of communication (which do not necessarily coincide with the intended effects).

5. As an extension of 4, the notion of conventional force, i. e., the social consequences of a certain communicative action.

In this more general framework, we find in (1) the illocutionary force. (2) is a very general description of how the message is performed, e. g., language or gestures. Apart from (3), which is present in some form in most frameworks, Allwood uses Austin's (and Searle's) perlocutionary act in (4). The extension of (4) in (5) is to our knowledge not found in other frameworks.
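Allwood's five-part framework can be written down as a simple record type. The sketch below is our own illustrative rendering with hypothetical field names, not part of Allwood's formalism; it uses the "pass the salt" example from the discussion of indirect speech acts above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CommunicativeAct:
    """Illustrative rendering of Allwood's five-part framework;
    field names are ours, not Allwood's."""
    intended_effect: str                       # 1. intention and purpose
    overt_behaviour: str                       # 2. speech, gesture, writing, ...
    context: dict                              # 3. situation of performance
    achieved_effect: Optional[str] = None      # 4. actual effect on receiver
    conventional_force: Optional[str] = None   # 5. social consequences

act = CommunicativeAct(
    intended_effect="hearer passes the salt",
    overt_behaviour="speech: 'Can you pass me the salt?'",
    context={"situation": "dinner", "roles": ["speaker", "hearer"]},
    achieved_effect="hearer answers 'yes' without passing the salt",
)

# Allwood's point: the intended effect (1) and the achieved effect (4)
# need not coincide.
misfired = act.achieved_effect != act.intended_effect
```

Keeping (1) and (4) as separate fields makes the distinction Allwood criticizes Austin for blurring directly inspectable.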

Dialogue acts

The term dialogue act was coined by Harry Bunt. The dialogue act as a concept is part of the Dynamic Interpretation Theory (DIT), e. g., (Bunt, 1994, 2000), which consists of two main concepts: that of context and that of dialogue acts. A dialogue act can be viewed as a context-changing operator which, applied to a context, produces another, updated context. We will not pursue the context part of DIT in too much detail, but briefly mention the aspects which are necessary for the text below. Context is described by five facets:

• The linguistic context is the (local) context surrounding, e. g., a word or a phrase.

• The semantic context is formed by the underlying task and the task domain, like objects, properties and relations relevant to the task.

• The physical context consists of global aspects like time and place plus local aspects like communicative channels etc.

• The social context means the type of interactive situation and the roles of the participants.

• The cognitive context comprises the participants' beliefs, intentions, plans and other attitudes.

In DIT, dialogues are viewed (Bunt, 1994, p. 3) "in an action perspective and language as a means to perform context-changing actions." He continues:

"We have introduced the term 'dialogue act' for referring to the functional units used by the speaker to change the context. These functional units do not correspond to natural language utterances in a simple way, because utterances in general are multi-functional, . . . "

DIT distinguishes between three aspects of an utterance or dialogue act:

• The utterance form of a dialogue act determines the changes to the linguistic context that the dialogue act causes.

• The communicative function defines precisely what significance the semantic content will have in the new context.

• The semantic content will have a particular significance in the new context (which it did not necessarily have before the dialogue act was performed).

As an example of the multi-functionality of dialogue acts, "thank you" in the context of an answer is used. It functions not only as an expression of gratitude, but also to inform that the answer was understood.9 In addition, depending on intonation, the utterance may have a turn management function.

Figure 2.2: Harry Bunt's Dialogue Control Functions. (The figure shows a taxonomy tree: informative functions (task-oriented or interaction-oriented), subdivided into information seeking (QUESTION, VERIFICATION) and information providing (INFORM, ANSWER, CORRECTION); and dialogue control functions, subdivided into feedback (positive and negative auto- and allo-feedback as well as feedback eliciting, each over perception, interpretation, evaluation and execution), discourse structuring (topic management, dialogue delimitation with opening and closing), and interaction management (turn management, time management, own communication management, and social obligations management: greeting, farewell, apology and gratitude expression, each with INIT and REACT variants).)

9 Whether the answerer knows that the answer was correctly understood is left unanswered.

32

In (Bunt, 1994) the dialogue control functions are divided into two main parts. The first part covers informative dialogue functions. These are subdivided into task-oriented and interaction-oriented ones. As examples of the former, QUESTION and VERIFICATION are given; examples of the latter are INFORM and ANSWER. For the second part – dialogue control – three groupings are presented: feedback, discourse structuring and interaction management. Feedback functions provide the hearer with information about the speaker's processing of the preceding utterance; examples are perception and evaluation. Topic management, opening and closing are examples of the discourse structuring part. This grouping serves to indicate the speaker's view of the state of his/her plan for how to continue. Finally, the interaction management grouping consists of, e. g., turn/time management and social obligations management such as apologies and different kinds of greetings (e. g., welcome). This grouping functions as a "smoothing glue" between turns and at the beginning and end of (parts of) dialogues.

In (Bunt, 1996) the author adds the concept of allo-feedback together with some classes for "illustrative purposes." This add-on has been incorporated into Figure 2.2. Clearly, and admittedly (personal communication), completeness of the inventory of dialogue control functions is problematic: the repertoire described above has been restricted to Dutch and English information-seeking dialogues. Moreover, the system of dialogue control functions is "customizable." In subsequent papers, e. g., (Bunt, 1996, 1995, 2000), different aspects of the dialogue control function system are developed. For instance, in (Bunt, 1995), the author introduces the difference between dialogue control acts and task-oriented acts. The reason for differentiating dialogue acts is that they operate on different parts of the context: whereas the task-oriented acts cause changes in the semantic context, the dialogue control acts may only cause changes in the social or physical context.

Dialogue Moves

The term dialogue move has been used in a number of publications including (Carlson, 1983; Larsson, 1998a). Within the TRINDI project (Larsson, 1998a) the term move is used. The reason for this is given in the first footnote of (Larsson, 1998a, p. 1): "There is a proliferation of names for this level of dialogue structure which perhaps is symptomatic of the lack of agreement between researchers. Apart from dialogue moves (Carlson) we have dialogue objects (Jönsson), communicative actions (Allen), communicative acts (Allwood and Traum), and the speech acts of Austin and Searle. Of course, there are also sometimes theoretical differences between many of these accounts. As noted, we will use the 'dialogue move' terminology which is derived from Wittgenstein's 'language game' metaphor, but this should not be taken as indicating any specific theoretical standpoint."

2.3.2 Relations between utterances and turns

Next, we turn to relations between utterances and turns. Relations like backward looking aspects are naturally found in monologue as well as dialogue, since sentences or utterances are either related to the previous context in some way or are unconnected, i. e., they introduce something new not previously mentioned. Important means related to communication management include (see (Allwood, 1994)) basic communication feedback (Allwood, Nivre, & Ahlsén, 1992), turn management (Sacks, Schegloff, & Jefferson, 1974), and sequential structuring (Schegloff & Sacks, 1973). The concept of forward/backward looking center is an important ingredient in the centering framework (Grosz, Joshi, & Weinstein, 1995).

Forward and Backward looking aspects

The concept of forward and backward looking aspects of an utterance has been used by a number of researchers, e. g., (Allwood, 1994, 1995a, 1995b; Larsson, 1998b), to characterize a certain aspect of the relation between communicative actions. Allwood argues (Allwood, 1995b) that ". . . every contribution in a dialogue, except possibly the first and the last, could be said to have both a backward looking reactive expressive aspect and a forward looking evocative aspect." We argue that the first utterance can have a backward looking aspect as well, since it might refer back to a dialogue prior to the current one. Also, the last contribution may consist of, for example, a promise to investigate something which might become the topic of another dialogue. In (Allwood, 1994, p. 7) the author writes: "With regard to forward and backward orientation, all parts of an utterance can be potentially relevant in both directions, however, the obligated functions are mainly backward-looking and the obligating functions are mainly forward-looking."

(Carletta, Dahlbäck, Reithinger, & Walker, 1997) describe the result of the Discourse Resource Initiative (DRI) meeting at Schloss Dagstuhl in 1997. The goal of that meeting was to come up with a general framework for the annotation of different resources, i. e., corpora. Amongst other topics, two sub-groups discussed forward and backward looking aspects of utterances. Some of the aspects discussed in the forward looking group were, e. g., Statement, Assert and Reassert. For the backward looking aspects of an utterance, the four dimensions understanding, agreement, information relations and answer were proposed. In particular, the agreement dimension covers the speaker's attitude towards actions, plans, objects etc. Subsequent work produced the DAMSL (Dialog Act Markup in Several Layers) scheme for the annotation of different aspects of, for example, forward/backward aspects of (parts of) dialogues, see (Allen & Core, 1997).

2.3.3 Initiative, Response and Mixed Initiative

Several researchers have looked into a phenomenon known under the terms initiative and response. The term initiative describes the modeling of "holding the floor"10 (Sacks et al., 1974), "who is driving the dialogue" (Strayer & Heeman, 2001) or "controlling the dialogue" (Walker & Whittaker, 1990). For mixed-initiative collaborative problem-solving dialogues, (Smith, 1997) takes the view that "initiative lies with the participant whose goals currently have priority." In (Guinn, 1996) a more global viewpoint is taken: "An agent is said to have dialogue initiative over a mutual goal when that agent controls how that goal will be solved by the collaborators." The term mixed initiative is also used for the modeling of these aspects of dialogue, e. g., (Smith, 1997; Walker & Whittaker, 1990; Novick, 1988).

Initiativ och Respons

Inspired by the work of, for example, (Severinson-Eklund, 1983; Schegloff & Sacks, 1973; Goffman, 1981), Linell and Gustavsson (Linell & Gustavsson, 1987) describe a model called "Initiativ och Respons"11 (henceforth IR) for dialogue communication that characterizes utterances into various types. The model is developed on the basis of recorded dialogues from various situations, such as court room negotiations, dinner chats, and class room lectures. The focus of their work lies on the dynamics and coherence of dialogue and on the dominance between the dialogue partners. The basic parts of the model are initiative and response. The authors use initiative for contributions where the speaker adds something new to the dialogue; response is used for cases where the speaker refers back to the previous contribution. The responses are thereby defined in terms of how they refer to the context: focal, reference to own material or to the partner's contribution. It is also distinguished whether the contribution is relevant to the previous discourse. The IR model is related to the work known under adjacency pairs, e. g., (Schegloff & Sacks, 1973), and to "opening move" and "responding move" (Sinclair & Coulthard, 1975).

Initiative, Control and Mixed Initiative

One of the most cited works in the area of mixed initiative is (Walker & Whittaker, 1990), in which the authors discuss the two terms mixed initiative and control in task-oriented and advice-giving12 dialogues. They explore how control is related to attentional state, predicting a shift of the attentional state wherever a shift in control occurs. Due to the master-slave assumption for task-oriented dialogues, the control is expected to be held exclusively by the expert. On the other hand, they predict that the control is transferred back and forth within advice-giving dialogues. The authors use four basic "utterance types" – assertion, command, question and prompt – for the annotation of the dialogues. With these, rules for who has control of the dialogue are given:

Utterance Type   Controller
ASSERTION        SPEAKER (unless response to a question)
COMMAND          SPEAKER
QUESTION         SPEAKER (unless response to a question or command)
PROMPT           HEARER

10 . . . in combination of turns, where the turn is a combination of the notions of utterance, sentence and speech act
11 No folks, this is not a typo, it is genuine Swedish meaning—surprise, surprise—Initiative and Response. :-)
12 The dialogues consist of ten financial support advice-giving dialogues (474 turns) – see (Hirschberg & Litman, 1987) "Now lets talk about now", (Pollack, Hirschberg, & Webber, 1982) "User participation in the reasoning process of expert systems" – and four dialogues (450 turns) (Whittaker & Stenton, 1988).
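The control rules above can be sketched as a small function. The encoding is our own reading of the table: for the "unless response to . . ." cases we assign no control, i. e., control stays where it was.

```python
# Sketch of the control rules of (Walker & Whittaker, 1990). The encoding
# is ours: for the "unless response to ..." cases we return None, meaning
# that the utterance assigns no control and control stays unchanged.
def controller(utterance_type, response_to=None):
    """Return "SPEAKER", "HEARER", or None for a single utterance."""
    if utterance_type == "PROMPT":
        return "HEARER"
    if utterance_type == "COMMAND":
        return "SPEAKER"
    if utterance_type == "ASSERTION":
        return None if response_to == "QUESTION" else "SPEAKER"
    if utterance_type == "QUESTION":
        return None if response_to in ("QUESTION", "COMMAND") else "SPEAKER"
    raise ValueError(utterance_type)

print(controller("QUESTION"))                           # SPEAKER
print(controller("ASSERTION", response_to="QUESTION"))  # None
```

Tracking control over a whole dialogue then amounts to folding this function over the utterance sequence, updating the current controller whenever a value other than None is returned.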

The thesis of their work is that the controller corresponds to the Initiating Conversational Participant (ICP). In (Heeman & Strayer, 2001; Strayer & Heeman, 2001) the ideas put forward in (Walker & Whittaker, 1990) are applied to the TRAINS corpus: a part of the corpus was annotated with the utterance types presented in (Walker & Whittaker, 1990). An inspection of the annotation indicated that the thesis of (Walker & Whittaker, 1990) is basically correct. However, it was found that the initiator of a discourse segment does not necessarily keep the initiative throughout the entire segment.

Initiative and Response on Two Levels

(Chu-Carroll & Brown, 1997, 1998) propose a two-level description of initiative-response which is able to distinguish between two different kinds of initiative. The authors argue that there is a difference between taking the initiative on the dialogue level and on the task level, and that previous models fail to distinguish these. To clarify their point they use the following example, where a student (S) is consulting an advisor (A):

(1)  S: I want to take NLP to satisfy my seminar course requirement.
(2)     Who is teaching NLP?

(3a) A: Dr. Smith is teaching NLP.

(3b) A: You can't take NLP this semester because you haven't taken AI, which is a prerequisite for the course.

(3c) A: You can't take NLP this semester because you haven't taken AI, which is a prerequisite for the course. I would suggest that you take some other seminar course to satisfy your requirement, and sign up as a listener for NLP if you're really interested in it.

(Chu-Carroll & Brown, 1997) argue that previous models, i. e., (Walker & Whittaker, 1990; Novick, 1988), can distinguish between 3a and 3b (and between 3a and 3c) but fail to explain the difference between 3b and 3c. In the wording of (Chu-Carroll & Brown, 1997), "in 3c A is more active" in that she proposes an action which she believes should be incorporated into S's plan. An agent is said to have the task initiative if

"she is directing how the agent's task should be accomplished, i.e., if her utterances directly propose actions that the agents should perform."

Furthermore, an agent is said to have the dialogue initiative if

"she takes the conversational lead in order to establish mutual beliefs between the agents, such as mutual beliefs about a particular piece of domain knowledge or about the validity of a proposal."

Important here is the insight that dialogue initiative is subordinate to task initiative:

"Thus, when an agent takes over the task initiative, she also takes over the dialogue initiative, since the agent's proposal of actions can be viewed as an attempt to establish the mutual belief that a set of actions be adopted by the agents."

In (Chu-Carroll & Brown, 1998) their model is applied to a subset of the TRAINS91 corpus by annotating 16 dialogues with the task initiative holder (TIH) and the dialogue initiative holder (DIH), obtaining the results shown in Table 2.1.

Table 2.1: Distribution of task and dialogue initiative for a subset of the TRAINS91 corpus.

             TIH:System    TIH:User
DIH:System   37 (3.5%)     274 (26.3%)
DIH:User     4 (0.4%)      727 (69.8%)

Evidently, in the majority of the turns the task initiative holder also holds the dialogue initiative. However, in almost 27% of the turns, the "agents' behaviour can be better accounted for by tracking the two types of initiatives separately."

2.3.4 Conversational Games

One example of a corpus based on human–human dialogues is the map task corpus (Carletta et al., 1997a). In the dialogues recorded within this scenario, two humans are trying to duplicate a route drawn on one of their maps. However, the maps differ in details, and so the description of the route is not straightforward. There is sometimes eye contact, but the interlocutors are not allowed to see each other's map. The modeling within the map task project is based on those of, e. g., (Sinclair & Coulthard, 1975; Carlson, 1983; Levin & Moore, 1977), but the conversational games model itself is described in (Kowtko et al., 1993). Three levels of structure have been developed and annotated. Starting with the lowest, conversational moves are the building blocks for the second level, conversational games. On top of these, on the third level, the dialogues are divided into transactions. Conversational moves are grouped into two different categories for modeling the dialogues:

• initiating moves: Instruct, Explain, Check, Align, Query-yn, and Query-w

• response moves: Acknowledge, Reply-w, and Clarify

Additionally, there is a third type of move, Ready, which occurs between two games and signals that the participant is prepared for a new game. Games are used to model the goal structure of a dialogue. A game is composed of one initiating move and its succeeding moves until the goal of the initiating move has been fulfilled or abandoned. However, there are some naturally occurring aspects of spontaneous dialogue complicating this (theoretically nice) idea: it is not always clear to the hearer what goal the initiator of a game is trying to achieve. In general, a common phenomenon in spontaneous dialogues (which the map task dialogues attest) are embedded games, which might serve new purposes not even salient in the current top-level game. These might even be introduced while the dialogue partner is speaking. Therefore, the authors conclude (Carletta et al., 1997, Sect. 3.4): "This makes it very difficult to develop a reliable coding scheme for complete game structures." The annotated game structure can be viewed as a tree structure where the top level of each game is named by the purpose of the initiating move. Another view is that the conversational games, and indeed the transactions as well, just serve a bracketing purpose.

Typical of the dialogues in the map task corpus are short turns (typically one or two moves). Even in the cases where the turns are longer, they consist either of sequences of the same move type (typically sequences of Instruct), or of the ending of a game followed by one or two initiating moves. Another important observation is that the ending of an embedded game may coincide with the ending of the embedding game.

2.3.5 Dialogue Games

In (Carlson, 1983), the fundamental unit for describing discourse is the so-called language game. It is based on the work by, e. g., Wittgenstein, Grice and Coulthard. Carlson writes: "Wittgenstein did however not intend to develop his idea into a systematic theory of language games." but continues to argue that there is "a very precise definition of a game (of strategy) in the mathematical theory of games developed by John von Neumann and Oskar Morgenstern." Within this theory there are the three central concepts of strategy, payoff and solution to a game. These concepts are given for each participant of the game, and a rational agent is one "who uses the most efficient means available to him to further his goals, i. e., one who follows his optimal strategies. The main virtue (and occasional weakness) of game and decision theory is its ability to explicate this key concept of goal-oriented action." Carlson uses the following, rather strong characterization of "well-formedness of discourse": "A discourse is coherent if it can be extended into a well-formed dialogue game."

2.3.6 Plan Based Approaches

In the late seventies and early eighties, a plan-based approach to modeling the intentions of agents involved in cooperative dialogue was developed, e. g., (Cohen, 1978; Allen & Perrault, 1980). In particular, it was shown that by letting one agent model the plans of another, it is possible to provide a formal explanation not just of speech acts but also of indirect speech acts and of the reason for providing more information than asked for. Within this line of research, an agent A is said to have beliefs 13 , intentions and goals, e. g., to acquire some information. To reach a goal he creates a plan which is then executed. The execution of the plan may be observable by another agent B, who may try to infer the plan of A, a process which is often referred to as plan recognition, e. g., (Kautz, 1987). Classical problem-solving systems, like STRIPS (Fikes & Nilsson, 1990), provide a formulation of actions and plans for these tasks. Cohen (Cohen, 1978) showed that it is possible to model certain speech acts, like request and inform, as actions in a planning system. Using the same planning paradigm, Allen and Perrault (Allen & Perrault, 1980) showed how three relevant tasks can be encoded in plans, namely

• generation of responses that provide more (but necessary) information than requested

• generation of responses providing an appropriate reaction to an utterance that consists solely of a sentence fragment

• generation of clarification dialogues for the case when the interpretation of an utterance is ambiguous

Little or nothing is written about the performance and evaluation of the approach. And, since these are highly knowledge-intensive approaches, the robustness and coverage of such a system are questionable. One interesting design decision made during the development of the TRAINS system 14 , e. g., (Allen et al., 1995), is revealed in (Allen, Miller, Ringger, & Sikorski, 1996), where the authors admit that "We also knew that it would not be possible to directly use general models of plan recognition to aid in speech act interpretation (as in (Allen & Perrault, 1980; Litman & Allen, 1987; Carberry, 1990)), as these models would not lend themselves to real-time processing." This argument is substantiated by the fact that there is no polynomial algorithm for STRIPS-like planners. However, the structure of plan-based systems is appealing, and the authors continue:

13 Beliefs can be thought of as knowledge, i. e., "things" the agent thinks he knows or holds to be true.
14 See http://www.cs.rochester.edu/research/trains/


"Our approach was to try to retain the overall structure of plan-based systems, but to use domain-specific reasoning techniques to provide real-time performance."

Using this domain-specific reasoning, an evaluation of the complete system is given in (Allen et al., 1996). Even today, when computers are 100 or maybe 1000 times faster than in the late seventies and early eighties, Allen admitted that (personal communication): "A straightforward implementation of those techniques as a heuristic search in a domain independent way would still be too inefficient—and too inaccurate—for a realistic application." Still, a characteristic property of many of the dialogues in this line of research is the short user contributions: at most, a contribution consists of two utterances.
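The planning view of speech acts can be made concrete with a minimal STRIPS-style sketch. The operator and predicate names below are ours, loosely modeled on the REQUEST and INFORM acts discussed above; they are not taken from any of the cited systems.

```python
# Minimal STRIPS-style sketch of speech acts as planning operators.
# Predicate and operator names are illustrative, loosely modeled on the
# REQUEST/INFORM acts of Cohen (1978) and Allen & Perrault (1980).
class Operator:
    def __init__(self, name, preconditions, effects):
        self.name, self.preconditions, self.effects = name, preconditions, effects

    def applicable(self, state):
        # All preconditions must hold in the current state.
        return self.preconditions <= state

    def apply(self, state):
        # STRIPS-style add effects (no delete lists in this tiny sketch).
        return state | self.effects

# REQUEST: S wants to know P and believes H knows P.
request = Operator(
    "request(S, H, inform(P))",
    preconditions={"want(S, know(S, P))", "believe(S, know(H, P))"},
    effects={"believe(H, want(S, know(S, P)))"},
)

# INFORM: H knows P and believes S wants to know it.
inform = Operator(
    "inform(H, S, P)",
    preconditions={"know(H, P)", "believe(H, want(S, know(S, P)))"},
    effects={"know(S, P)"},
)

state = {"want(S, know(S, P))", "believe(S, know(H, P))", "know(H, P)"}
for op in (request, inform):  # a two-step plan achieving know(S, P)
    assert op.applicable(state)
    state = op.apply(state)
print("know(S, P)" in state)  # True
```

The point of the sketch is only the mechanism: an indirect speech act can then be explained as an observed action whose preconditions and effects are chained backwards to the speaker's inferred goal.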

2.3.7 Plan Recognition Models

Contrary to the theoretical models presented above, (Iida & Arita, 1992) present a four-layered plan recognition model which has been developed for a translation scenario. Although the material is monolingual (Japanese only), it shares a lot of the problems we face when developing models within the VerbMobil project. The model tries to describe a dialogue by means of a hierarchical tree structure and consists of four different kinds of plans:

• Interaction plans are connected to the utterances of the dialogue and allow for the exchange of information between speaker and hearer. An example of such a plan is a request–response pair (in their terminology: request–action unit).

• Communication plans roughly connect to the interaction plans and thus are an abstraction of the different possible realizations of the turn-taking pair.

• Dialogue plans span a dialogue: the participants first open up the dialogue, then talk about something (content), and finally close the dialogue.

• Domain plans form the top level and allow for performing a given action as a sequence of other actions. As an example, a registration for a conference is decomposed into "obtaining a registration form", "filling out the form", and finally "sending the form."

Four dialogues are used as evaluation corpus. There is no quantitative evaluation but merely one example of the structure constructed by the plan recognizer. Furthermore, the model is constructed purely by hand. Kautz suggests a formal plan recognition model based on events in a hierarchy (Kautz, 1987, 1991). His plans are more restrictive than those of, e. g., (Allen & Perrault, 1980); still, his plan hierarchy has to be completed with circumscription. Kautz argues that the plans need to be formalized in such a restricted way since otherwise ". . . it would lead to massive increase in the size of the search space, since an infinite number of plans could be constructed by chaining on preconditions and effects." (Vilain, 1990) shows that plan recognition using this framework can be translated into parsing with a context-free grammar. For more information on plan recognition, see (Kautz, 1991; Carberry, 1990).

2.3.8 Dialogue Grammars

Another line of research is concerned with describing dialogues with a (context-free) grammar, e. g., (Scha & Polanyi, 1988; Jönsson, 1993). One of the arguments for this line is pragmatic: some human–machine dialogues are simple enough to be described with a grammar, making the use of, for instance, plan recognition models as in (Allen & Perrault, 1980) unnecessary.

LINLIN

Developing models for typed natural language interfaces, (Dahlbäck & Jönsson, 1992) describe a simple tree dialogue model—LINDA—which basically accepts IR-units. A move introducing a goal is called an initiative, and a goal-satisfying move a response. 98% of the 21 dialogues, stemming from five different applications, can be described by LINDA; 88% of the same dialogues can be described by adjacency pairs. Figure 2.3 shows a sample dialogue grammar for an interface to a database of used cars, taken from (Dahlbäck & Jönsson, 1992). There, QT and AT are task-related questions and answers, and QD and AD are repair sequences initiated by the system.

SUNDIAL

In (Bilange, 1991), a task-independent dialogue model for spoken dialogue developed within the SUNDIAL project is presented. SUNDIAL was

D     ::= QT/AT | QX/AS
IR    ::= QT/AT | QX/AS
QT/AT ::= QT (QD/AD)* (AT)
QD/AD ::= QD (AD)
QX/AS ::= QT AS | QS AS | QD AS

Figure 2.3: A dialogue grammar for the Cars application. *, +, () and | have their usual meaning.

a European project whose goal was to develop generic oral cooperative dialogue systems. One of its main applications was flight reservation. The model is based on two negotiation patterns:

1. Negotiation opening + Reaction
2. Negotiation opening + Reaction + Evaluation

Four decision layers are used:

Rules of conversation This layer is divided into four levels:

• Transition level This level resembles a discourse segment (Grosz & Sidner, 1986).

• Exchange level This level models exchanges.

• Intervention level This level consists of the building blocks for the exchange level. "Three possible illocutionary functions are attached to the interventions: initiative, reaction and evaluation."

• Dialogue acts In addition to the core speech act, the dialogue act contains structural effects on the dialogue (Bunt, 1989).

System dialogue act computation

User dialogue act interpretation

Strategic decision layer Since the model involves non-determinism, it is possible to continue the dialogue in several ways; this layer is concerned with selecting the next best action, i. e., dialogue act. A general strategy for oral dialogue is to avoid embedded structures.

The model gives rise to tree structures as depicted in Figure 2.4 (Bilange, 1991).

[Tree diagram: a transaction T1 dominating exchanges such as E1 and E3, each composed of initiative (system), reaction (user) and evaluation (system) interventions; the exchange E2 is embedded within E1.]

Figure 2.4: Tree structure of an oral dialogue using the SUNDIAL dialogue manager.
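Dialogue grammars of this kind can be processed with standard context-free parsing machinery. As a minimal sketch (the move labels and the simplified rules are ours, not the actual LINLIN or SUNDIAL grammars), a recursive-descent recognizer for question–answer units with embedded clarification subdialogues:

```python
# Recursive-descent recognizer for a simplified dialogue grammar
# (rule names and move labels are illustrative, not from LINLIN itself):
#   D  ::= IR+          a dialogue is one or more IR units
#   IR ::= Q  IR*  A    an initiative (question), optional embedded
#                       clarification subdialogues, then a response
def parse_ir(moves, pos):
    """Try to recognize one IR unit starting at `pos`; return the new
    position on success, or None on failure."""
    if pos < len(moves) and moves[pos] == "Q":
        pos += 1
        while True:  # consume embedded clarification subdialogues
            sub = parse_ir(moves, pos)
            if sub is None:
                break
            pos = sub
        if pos < len(moves) and moves[pos] == "A":
            return pos + 1
    return None

def accepts(moves):
    """True iff the whole move sequence is covered by one or more IR units."""
    pos = 0
    while pos < len(moves):
        nxt = parse_ir(moves, pos)
        if nxt is None:
            return False
        pos = nxt
    return len(moves) > 0

print(accepts(["Q", "A", "Q", "Q", "A", "A"]))  # True: embedded clarification
print(accepts(["Q", "Q", "A"]))                 # False: first Q never answered
```

This kind of recognizer runs in time linear in the number of moves for the grammar above, which illustrates why the grammar-based line is attractive for real-time systems.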

2.3.9 DRT

Discourse Representation Theory (DRT), e. g., (Kamp, 1981; Kamp & Reyle, 1993), is a semantic theory developed for the representation and computation of "trans-sentential anaphora and other forms of text cohesion" (Kamp & Scott, 1996). Other motivations for the development of DRT were donkey anaphora as well as tense and aspect. Although DRT was developed from a more theoretical perspective, there are now a lot of partial implementations running at several sites. In DRT, the fundamental structure for representing a discourse is the discourse representation structure (DRS), which consists of a set of discourse referents and a set of conditions on them. The set of discourse referents is the set of individuals of the discourse the DRS is representing. The different kinds of conditions are, e. g., predicates, negation and conditionals. The construction of a DRS consists of following a number of construction rules, e. g., the universal quantifier rule. The interpretation of a DRS consists of assigning the DRS a model: given a DRS and a model, the DRS is said to be true iff there is a function f into the model which verifies the DRS.
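The truth definition above can be made concrete with a small sketch. The representation (atomic conditions only, no negation or conditionals) and all names are ours:

```python
from itertools import product

# A minimal DRS sketch: a DRS is a set of discourse referents plus a set
# of conditions; here conditions are atomic predicates over referents,
# e.g. ("farmer", "x") or ("owns", "x", "y"). All names are illustrative.
def verifies(referents, conditions, model):
    """True iff some embedding f of the referents into the model's domain
    verifies every condition (the DRS truth definition)."""
    for values in product(model["domain"], repeat=len(referents)):
        f = dict(zip(referents, values))
        if all((pred, *[f[arg] for arg in args]) in model["facts"]
               for (pred, *args) in conditions):
            return True
    return False

# "A farmer owns a donkey": referents x, y; conditions farmer(x),
# donkey(y), owns(x, y).
model = {
    "domain": {"pedro", "chiquita"},
    "facts": {("farmer", "pedro"), ("donkey", "chiquita"),
              ("owns", "pedro", "chiquita")},
}
print(verifies(["x", "y"],
               [("farmer", "x"), ("donkey", "y"), ("owns", "x", "y")],
               model))  # True
```

The existential force of the discourse referents falls out of the search for a verifying embedding; conditionals and negation would require embedded sub-DRSs, which this sketch deliberately omits.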

2.3.10 Conclusion

The pioneering work by Austin and Searle resulted in an awareness that truth-conditional semantics alone is not sufficient for modeling communication. Instead, communication is composed of atomic items which received the name speech acts. The basic idea behind speech acts is present in a number of works, e. g., (Allwood, 1977; Bunt, 2000; Carlson, 1983; Larsson, 1998a), where, however, different aspects have received different levels of elaboration. The bottom line is that utterances should be modeled by at least something like an illocutionary force together with propositional content. This is exactly the approach taken in VerbMobil (see section 2.6.1). For higher-level structures, there are some similarities between the different models, but what is modeled by the structures differs. One family of approaches can be represented by communicative games. The idea is that certain obligations exist, especially in dialogue: one responds to a greeting with a greeting, answers a question, and poses a question when one is requested to ask one. Computationally, there are two major lines for modeling dialogue on an abstract level. The first is a computationally expensive approach based on STRIPS-like planning. This model is capable of explaining rather complex phenomena, such as the reason for a, possibly indirect, speech act in a certain context (Allen & Perrault, 1980; Carberry, 1990; Litman & Allen, 1987). The other line follows the assumption that dialogue, both typed and oral, can be modeled by a (context-free) grammar (Scha & Polanyi, 1988; Jönsson, 1993; Bilange, 1991; Kautz, 1987; Vilain, 1990). Research in the area of context-free grammars has produced efficient (polynomial) algorithms for processing these grammars. For a system like VerbMobil, where the task is to follow the dialogue rather than control it and where real-time processing is of great importance, we adopt the latter approach (see section 2.6). Although regarded as too "ineloquent" (Allwood, 1995a) from a theoretical perspective, this choice is not necessarily uninteresting, since it allows for a formal machinery expressive enough for drawing interesting conclusions, e. g., (Kautz, 1991; Vilain, 1990).

2.4 Using the Theory - Annotating Corpora

We now turn to the area of corpus annotation. Huge efforts in both time and money are invested world-wide in annotating corpora with different kinds of information, such as syntax, semantics, dialogue acts and conversational games. The reasons for annotating a corpus are manifold; amongst the ones most prominent to us are the following:

• Proof of concept By annotating a corpus with a model, and especially if the annotations of several annotators are homogeneous, the annotation serves as a kind of proof of concept. However, unreliable coding does not necessarily indicate a non-perfect model: the annotation manual might be premature, and a revision of it might lead to better annotation results.

• Supervised learning An annotated corpus can be used to gain experience not just by humans but by machines.

• Evaluation An annotated corpus can be used to evaluate the performance of a machine.

We will return to these topics later in this chapter and in chapter 3.

2.4.1 Coding Schemata—MATE

The MATE project (Klein et al., 1998) was, like DAMSL (Allen & Core, 1997), an effort to unify several dialogue act annotation schemata into one. Additionally, an annotation tool was developed. The effort was based on the observation that there are a lot of similarities between the schemata. Several groups had developed different annotation schemata for their corpora with their own sets of dialogue acts or speech acts. All these groups were working on models for different tasks, like information retrieval/seeking dialogues (i. e., Alparon and LinLin), analysis of child language (i. e., Chat), travel planning (CSTAR), appointment scheduling (JANUS and VerbMobil), courtroom interaction (SLSA) and others. Some of these coding schemata (or, more precisely, sets of dialogue acts) were additionally used in dialogue systems (i. e., JANUS, CSTAR, VerbMobil, and LinLin), whereas some were developed with more theoretical interests in mind (i. e., Chat and SLSA). Almost all schemata investigated in the MATE project are task- and application-oriented rather than domain-independent. The reason for this is that most of the schemata are geared towards some task-oriented application; consequently, these schemata are domain-restricted too. Also, for most of the schemata several dialogues (in the range of 15 to some thousand) have been annotated. Most of the annotation schemata have been evaluated as well (see below). Finally, the MATE project judged three of these schemata to be "good" from a dialogue act perspective: Alparon, SWBD-DAMSL and VerbMobil. This judgement is based on the comparison in (Klein et al., 1998, p. 28–30), where the resources put into the annotation schemes as well as the annotations are listed. Also, the evaluation in the VerbMobil project is outstanding, together with the efforts for SWBD-DAMSL.

2.4.2 Annotating Corpora Reliably

We present and discuss the annotation of three relevant corpora: the map task corpus (Carletta et al., 1997a), the TRAINS91 corpus (Chu-Carroll & Brown, 1998), and the VerbMobil corpus (Maier, 1997; Reithinger & Kipp, 1998). To measure the quality of the annotation, the inter-coder reliability κ (Cohen, 1960) is used:

    κ = (P(A) − P(E)) / (1 − P(E))                    (2.1)

In formula 2.1, P(A) represents the proportion of times that the coders agree, and P(E) the proportion of times that one would expect them to agree by chance. The basis for the computation is a so-called confusion matrix, which shows how the annotations coincide. Table 2.2 shows such a table for two annotators, annotating n dialogue acts on N utterances. There, a cell (n, m) where n ≠ m indicates that the annotators have different opinions about the annotation for a certain dialogue act, whereas the case n = m reflects how often the annotators agree on a certain dialogue act.

Table 2.2: A confusion matrix for two annotators

          d1      d2      ...     dn      Σ
    d1    f1,1    f1,2    ...     f1,n    f1,*
    d2    f2,1    f2,2    ...     f2,n    f2,*
    ...   ...     ...     ...     ...     ...
    dn    fn,1    fn,2    ...     fn,n    fn,*
    Σ     f*,1    f*,2    ...     f*,n    N

To compute P(A), we sum the cases where the annotators agree and divide by N:

    P(A) = (Σ_{i=1}^{n} f_{i,i}) / N                  (2.2)

and to compute P(E):

    P(E) = (Σ_{i=1}^{n} f_{*,i} · f_{i,*}) / N²       (2.3)
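As a concrete illustration of formulas 2.1–2.3 (our sketch, not part of any evaluation tool chain), the computation can be written down in a few lines of Python; the matrix layout follows Table 2.2:

```python
def kappa(matrix):
    """Cohen's kappa for a square confusion matrix of two annotators."""
    n = len(matrix)
    N = sum(sum(row) for row in matrix)
    # P(A): observed agreement, the diagonal cells f_{i,i} (formula 2.2)
    p_a = sum(matrix[i][i] for i in range(n)) / N
    # Marginals: f_{i,*} (row sums) and f_{*,j} (column sums)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # P(E): agreement expected by chance from the marginals (formula 2.3)
    p_e = sum(col_sums[i] * row_sums[i] for i in range(n)) / N ** 2
    # Formula 2.1
    return (p_a - p_e) / (1 - p_e)

# Two annotators, two dialogue acts, 50 utterances:
print(round(kappa([[20, 5],
                   [10, 15]]), 3))  # 0.4 (P(A) = 0.7, P(E) = 0.5)
```

Perfect agreement (only diagonal cells filled) yields κ = 1, and agreement exactly at chance level yields κ = 0.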

There are different opinions about the usefulness and behaviour of the kappa statistics. Whereas its advantages include the ease of calculation and its appropriateness for testing whether agreement exceeds chance levels for binary and nominal ratings, the list of disadvantages is regrettably longer. Among the, to us, more relevant cons we have:

• Despite a high level of agreement and the fact that individual ratings are accurate, κ may be low. Whether a given kappa value implies a good or a bad rating system or diagnostic method depends on what model one assumes about the decision-making of the raters (Uebersax, 1988).

• κ is influenced by base rates and trait prevalence. Consequently, kappa figures are seldom comparable across studies, procedures, or populations (Thompson & Walter, 1988; Feinstein & Cicchetti, 1990).

Whereas the general opinion seems to be that a κ value greater than 0.8 indicates good agreement between, in our case, the annotators, a lower value does not necessarily indicate a lower agreement. For 0.67 < κ < 0.8, tentative conclusions are allowed to be drawn (Carletta, 1996). However, for binary traits and binary ratings, especially where the data is skewed, e. g., a 0.9/0.1 split of the data, a good agreement might result in a lower kappa value (Grove, Andreasen, McDonald-Scott, Keller, & Shapiro, 1981; Uebersax, 1987; Gwet, 2001). This observation implies that biases tend to move kappa down rather than inflate it. A good κ value should be used as an indication of good agreement, but a further analysis of absolute agreement using the proportions of specific positive/negative agreement might shed more light on the actual situation. To conclude, the kappa statistics can be used for, e. g., evaluating inter-coder agreement, but it should be interpreted with care and, where possible, followed up with additional analysis. Recent suggestions advocate alternative measurements, e. g., the AC1 statistics (Gwet, 2001).
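The skew effect just mentioned is easy to reproduce with made-up numbers (ours, not taken from any of the cited studies): with a 0.9/0.1 marginal split, 90% raw agreement yields a considerably lower kappa.

```python
# Hypothetical binary confusion matrix with a 0.9/0.1 marginal split.
matrix = [[85, 5],
          [5, 5]]
N = 100
p_a = (matrix[0][0] + matrix[1][1]) / N        # observed agreement: 0.90
p_e = (90 * 90 + 10 * 10) / N ** 2             # chance agreement:   0.82
kappa = (p_a - p_e) / (1 - p_e)
print(round(kappa, 2))  # 0.44, despite 90% raw agreement
```

Because the marginals concentrate on one category, chance agreement P(E) is already very high, leaving little room between P(A) and 1 for kappa to reward.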
We summarize some of the corpus annotations relevant to our work:

The Map Task Corpus (Carletta et al., 1997a) In what follows, K stands for "number of coders" and N for "number of units". For move segmentation, κ = .92 (N = 4097, K = 4) was reached using word boundaries as units, and for move classification κ = .83 (N = 563, K = 4). For game coding, the agreement on where the games start and, for agreed starts, where they end has been used as the metric; pairwise percent agreement is used. Agreement on the beginning of a game reached 70%, with an agreement on the type of game of κ = .86 (N = 154, K = 4). In (Carletta et al., 1997a) it is concluded that there is room for improvement of the annotation scheme. For the cross-coding agreement, the problems arise where "the dialogue participants begin to overlap their utterances or fail to address each other's concerns clearly."

TIH-DIH (Chu-Carroll & Brown, 1998) An experiment with the annotation of task and dialogue initiative on 16 dialogues from the TRAINS91 corpus. In total, three humans annotated "approximatively 10 %" of the corpus, which probably corresponds to one or maybe two dialogues. The inter-coder reliability was tested using the kappa measure, obtaining κ = 0.57 for the task initiative holder and κ = 0.69 for the dialogue initiative holder, thus probably rendering the annotation irrelevant. To explain why no better κ values were obtained, the authors point at the above-mentioned behaviour of κ in combination with skewed data. This is clearly the case (see table 2.1 on page 38). Another reason might be the small number of annotated dialogues.

VerbMobil At two points during the project, an evaluation of the coding of dialogue acts as well as of the segmentation was performed (Maier, 1997). During the first phase of VerbMobil, the dialogue acts were defined in (Jekat et al., 1995), a document also serving as coding manual. The inter-coder reliability during the first phase was evaluated on 10 dialogues. For segmentation, given the boundary agreement P(A) = 0.9849, we have κ = 0.9293. For the dialogue acts, ten pre-segmented dialogues were annotated by two annotators, giving P(A) = 0.8294 and therefore κ = 0.7975. During the second phase, the inter-coder reliability of dialogue acts was tested for ten dialogues (Reithinger & Kipp, 1998). Given P(A) = 0.8530, we have κ = 0.8261. One coder was asked to re-annotate 5 of the dialogues, obtaining κ = 0.8430 (P(A) = 85.86%). In total, 1505 dialogues (738 German, 375 English and 402 Japanese) have been annotated with dialogue acts. This corresponds to 76210 dialogue acts (37954 German, 22682 English and 15574 Japanese).
Conclusions

We make two important observations. The inter-coder reliability for the VerbMobil corpus is:

• comparable with that of another well known and widely accepted annotated corpus, the map task corpus, being aware that such a comparison is questionable, e. g., (Thompson & Walter, 1988; Feinstein & Cicchetti, 1990).

• good enough to show the validity of the definition of our dialogue acts as defined in (Alexandersson et al., 1998). This opens the door to actually using our corpus for, e. g., machine learning methods (see section 3.5.1).

2.5 Dialogue Modeling in VerbMobil

We are now in the position to describe the models used by the dialogue module in VerbMobil. At this point we would like to stress that VerbMobil is a translation system: its purpose is to translate utterances from one language to another. The translation is performed within a negotiation scenario. Whereas the task of a man–machine dialogue system is to act as a dialogue partner, the task of a dialogue component within a translation scenario is to track the dialogue.15 Our processing (see chapter 3) is therefore focused on providing contextual information which can be used to improve the overall translation:

• Prediction What is going to happen next?

• Recognition What is the most reasonable interpretation of a certain sign?

• Translation What is the most reasonable disambiguation of a sign, e. g., a word or phrase?

• Generation (How) has a negotiation object been uttered/realized?

To reach this goal we have developed two means of representing the dialogues. The first one, called the thematic structure, is concerned with representing the content and status of the negotiation. This is modeled and represented by the propositional content (see section 2.2.1) together with the dialogue act. The information contained in this structure can be used to, for instance, generate summaries, but it can also contribute to realizing the ideas presented in (Schmitz & Quantz, 1995), where the authors point at the redundancy phenomena common in spontaneous speech (see Figure 2.1). The second structure we call the intentional structure. This tree structure is used to represent different abstractions of the dialogue, e. g., the dialogue phase. As it turns out, this information can be used to improve the translation of words and phrases (Alexandersson et al., 1998). The model presented below, that is, the intentional structure including, e. g., dialogue acts and dialogue games, as well as the design of the propositional content, is based on our rather large corpus analysis effort. In the next section (2.6) we will start by introducing the intentional structure. The propositional content is presented in section 2.7.

15 See section 3.7.4 for a discussion of the similarities between the discourse processing in a mediating system (VerbMobil) and a man–machine system (SmartKom).

2.6 The Intentional Structure

Our main objectives within VerbMobil for the intentional structure are twofold. First, it is used to support the translation process (Alexandersson et al., 1998): some utterances, phrases, and words, like the German phrase "Guten Tag", can be translated as "Hello" or "Good bye" depending on the dialogue phase (see sections 2.6.3 and 2.6.4). Second, it has the potential to form the basis of the generation of minutes (see chapter 3). However, the exercise of designing a model capable of describing such a structure requires influences from much previous work as well as novel ideas. Especially when it comes to building the structure in the running system (see chapter 3), the use of techniques and research results from the areas of probabilistic language modeling and machine learning has turned out to be fruitful. The design of the intentional structure is depicted in Figure 2.5. It is a hierarchical structure composed of five levels, where each level can be viewed as a more abstract representation of the dialogue the higher up in the tree we go. A basic assumption is that cooperative negotiation dialogues can be represented by a tree structure. Much of the work presented below resembles work described elsewhere, e. g., (Bunt, 2000; Chu-Carroll & Brown, 1998; Carletta, 1996; Allwood, 2000; Levinson, 1983), but some parts, e. g., the moves, are, to our knowledge, novel. The levels are (from the bottom):

• dialogue act level. This level is an abstraction of the utterances.16 The dialogue act received its name because the ideas presented within the Dynamic Interpretation Theory, e. g., (Bunt, 1996), resemble much of the modeling in VerbMobil. But also the description in (Levinson, 1983) has inspired our definition. Our dialogue acts are discussed in section 2.6.1.

16 We will also use the more technical term "segments" in the next chapter.

[Figure: the intentional structure. From the top: dialogue level (APPOINTMENT NEGOTIATION); dialogue phase level (GREETING PHASE, NEGOTIATION PHASE); dialogue games level (INITIATIVE-RESPONSE); dialogue moves level (INITIATIVE, RESPONSE); dialogue act level (SUGGEST: "How about ...", FEEDBACK: "Well", REJECT: "that does not suit me").]

Figure 2.5: The five levels of the intentional structure

• dialogue move level. On this level, sequences of dialogue acts are combined into dialogue moves, like greet and initiative. Section 2.6.2 describes our definition and usage of moves.

• dialogue game level. Here, we combine the moves into games. Examples of games are greeting and negotiation. Our definition and usage of the games are discussed in section 2.6.3.

• dialogue phase level. The games form the phases (see section 2.6.4). As indicated in, e. g., (Kowtko et al., 1993), a dialogue can be described by dialogue phases.

• dialogue level. This level consists of just one node: the top level. In the case of VerbMobil (assuming a cooperative negotiation dialogue) this level corresponds to the scenario, i. e., appointment scheduling or the complete phase-2 scenario (see section 2.6.4).
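As an illustration only (our own sketch, not the actual VerbMobil data structures), the five levels can be written down as a small recursive node type; the instance below mirrors the fragment shown in Figure 2.5:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    level: str                  # "dialogue", "phase", "game", "move" or "act"
    label: str                  # e.g. "NEGOTIATION PHASE", "SUGGEST"
    children: list = field(default_factory=list)

# The fragment from Figure 2.5:
tree = Node("dialogue", "APPOINTMENT NEGOTIATION", [
    Node("phase", "GREETING PHASE"),
    Node("phase", "NEGOTIATION PHASE", [
        Node("game", "INITIATIVE-RESPONSE", [
            Node("move", "INITIATIVE", [
                Node("act", "SUGGEST"),    # "How about ..."
            ]),
            Node("move", "RESPONSE", [
                Node("act", "FEEDBACK"),   # "Well"
                Node("act", "REJECT"),     # "that does not suit me"
            ]),
        ]),
    ]),
])

def acts(node):
    """Collect the dialogue act labels at the leaves, left to right."""
    if node.level == "act":
        return [node.label]
    return [a for child in node.children for a in acts(child)]

print(acts(tree))  # ['SUGGEST', 'FEEDBACK', 'REJECT']
```

Walking down from the root recovers ever more concrete descriptions of the same dialogue; walking the leaves recovers the dialogue act sequence.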

The first of the levels above, the dialogue acts, has emerged partially from corpus analysis as well as from literature studies. Over the course of the project, the corpus collection effort (see section 1.1) resulted in an impressive collection of material which was used for numerous tasks. One was the identification of typical communicative acts, which we call dialogue acts. The dialogue acts have been defined in (Alexandersson et al., 1998), a document which also served as the manual for the annotation effort.

2.6.1 Dialogue Acts in VerbMobil

From both a theoretical and a practical point of view, the annotation of utterances or segments with labels, such as dialogue acts or communicative functions, is a well-established technique for analyzing communication and building dialogue systems. For dialogue modeling and for one of the translation tracks in VerbMobil, the basic processing/translation entity is the dialogue act. As in many other schemata, our dialogue acts are task-oriented. Their purpose is to mirror the intention behind the dialogue phenomena occurring in our negotiation dialogues. In general, language has to be segmented into chunks which can be processed according to the task of the system. For written (European) language one can use markers, like a full stop, comma or question mark, to perform this.17 The result is a sequence of sentences and/or part(s) of sentences. Spontaneously spoken language contains nothing like full stops. Instead, other means (i. e., prosodic analysis) have to be used for deciding where to segment the input. In the VerbMobil corpus, many contributions can be divided into sentences or phrase-like pieces of linguistic material. But our corpus shows frequent occurrences of non-grammatical material (in the sense of not being describable by traditional grammar) which might still be pragmatically meaningful or even important for the dialogue (Allwood, 2001; Ward, 2000). The dialogue act has proved to be a helpful tool for assigning meaning to these chunks. In VerbMobil it is used for different purposes, like translation (Reithinger, 1999; Reithinger & Engel, 2000), disambiguation for the semantic transfer (Buschbeck-Wolf, 1997; Emele et al., 2000) and the generation of summaries (Alexandersson et al., 2000). In section 3.7 we discuss our usage of dialogue acts for discourse processing. Figure 2.6 shows the hierarchy of dialogue acts used in the VerbMobil project (Alexandersson et al., 1998).
17 Japanese is one example for which this is not true. Instead, other means, like sentence-final particles, have to be used.

The hierarchy is formed as a decision


tree for supporting the annotation of dialogues as well as processing. It divides the dialogue acts into three main classes:

[Figure: the dialogue act hierarchy, rooted in TOP/DIALOGUE_ACT, with the leaves GREET, BYE, INTRODUCE, POLITENESS_FORMULA, THANK, DELIBERATE, BACKCHANNEL, INIT, DEFER, CLOSE, REQUEST_SUGGEST, REQUEST_CLARIFY, REQUEST_COMMENT, REQUEST, SUGGEST, REQUEST_COMMIT, DIGRESS, EXCLUDE, DEVIATE_SCENARIO, REFER_TO_SETTING, CLARIFY, INFORM, EXPLAINED_REJECT, GIVE_REASON, FEEDBACK, NOT_CLASSIFIABLE, COMMIT, FEEDBACK_NEGATIVE, REJECT, FEEDBACK_POSITIVE, ACCEPT, CONFIRM and OFFER grouped under the classes CONTROL_DIALOGUE, MANAGE_TASK and PROMOTE_TASK.]

Figure 2.6: Dialogue acts hierarchy as employed in VerbMobil 2

• control dialogue This set contains dialogue acts for segments concerned with social interaction, for segments related to the dialogue itself (e. g., opening or closing the conversation), or for segments used for smoothing the communication.

• manage task These acts are for segments concerned with managing the task (i. e., initializing, deferring, or closing the task).

• promote task Finally, we use this set for utterances concerned with promoting the task.

A complete and very detailed description of the definition of the dialogue acts (including examples in German, English and Japanese) and of how to utilize the hierarchy as a decision tree is found in (Alexandersson et al., 1998),18 but see also (Alexandersson et al., 2000). In the appendix, an example of a dialogue act definition, the dialogue act suggest, is reprinted from (Alexandersson et al., 1998, page 77–78).
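To illustrate the decision-tree reading of the hierarchy, here is a small sketch; the class membership of the individual acts is our simplified assumption based on the class descriptions above, not a verbatim copy of Figure 2.6:

```python
# First branching level of the decision tree: the three main classes
# (membership shown for a subset of the acts only).
DIALOGUE_ACT_CLASSES = {
    "control_dialogue": {"greet", "bye", "introduce", "politeness_formula",
                         "thank", "deliberate", "backchannel"},
    "manage_task": {"init", "defer", "close"},
    "promote_task": {"request_suggest", "suggest", "accept", "reject",
                     "inform", "feedback", "commit", "offer", "confirm"},
}

def main_class(act):
    """Walk the first level of the decision tree for a dialogue act."""
    for cls, acts in DIALOGUE_ACT_CLASSES.items():
        if act in acts:
            return cls
    return None

print(main_class("suggest"))  # promote_task
```

An annotator (or classifier) first decides among the three classes and only then among the far more numerous leaves, which is what makes the hierarchy usable as a decision tree.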

Discussion

The definition and usage of dialogue acts in VerbMobil has been inspired by a number of theories. Indeed, Austin and Searle were the pioneers (Austin, 1962; Searle, 1969), arguing that the intention of an utterance can be represented by labels. Bunt's definition of dialogue acts is very much inspired by the work of Allwood. A closer inspection of the relationship between our dialogue acts and those of Bunt (Bunt, 2000) shows that one of the three facets of his dialogue acts, the linguistic form, is outside our definition. Instead, Levinson's view has been adopted: in (Levinson, 1983) it is argued that utterances can be described solely by the illocutionary force (together with the propositional content) (see Section 2.3.1 on Page 27). Finally, our usage and definition of dialogue acts coincides with the first of Allwood's five facets of his communicative acts (Allwood, 1977), namely the intention and purpose of the utterance. The main reason for the relatively weak resemblance between our work and that of Allwood is that his view on communication is more general and takes more into account than ours. On the other hand, the framework of Allwood is more tailored towards the study of dialogue systems than towards their implementation. For example, it is very unclear how a computer system equipped with speech and prosody recognition only should be able to capture the "notion of conventional force". However, the dialogue acts do not suffice when it comes to describing and processing our dialogues. The main challenge lies in the nature of spontaneous speech: many contributions contain repetitions, hesitations, thinking-aloud and the like. Still, the main force(s) of the contribution does not change. Also, a sequence of dialogue acts can contribute to form a single force or unit. Our solution consists of three additional levels of modeling: dialogue moves, dialogue games and dialogue phases, as described below.

18 (Alexandersson et al., 1998) does not contain the description of the dialogue act inform feature. This act was introduced at the very end of the project in an effort to discriminate scenario-relevant informs from "deviative" informs, i. e., informs outside the core scenario.


2.6.2 Dialogue Moves in VerbMobil

There are several facts and observations motivating the moves layer of the intentional structure. Among the more prominent: our data show frequent occurrences of sequences of utterances which together can be viewed as a unit. This unit is detected and used by (human) interpreters, in that they reduce the contribution to its central parts. Our next level in the intentional structure, the dialogue moves, is concerned with this modeling. Additionally, we define the moves in such a way that, given the move, it should be possible to decipher who holds the initiative on the task level. We also discuss the difference between our usage of moves and their usage in other contexts. Consider the following example (translation in italics):

Example 1 (cdrom15par: g009ac)

ABA018 (move: initiative):
(4) init, suggest: ja, am nächsten Tag hm dann zurück
    yes, return on the next day
(5) suggest: hm ja, da können wir eigentlich dann ja schon mittags fliegen
    hm well, then we could in fact fly at noon
(6) suggest: oder wir könnten natürlich auch noch den Nachmittag nutzen, um in Hannover irgendwas zu unternehmen?
    or we could of course use the afternoon to do something in Hanover

ABD019 (move: response):
(7) accept: ja,
    yes,
(8) accept: gerne,
    I'd love to
(9) accept: machen wir das
    let us do that
(10) inform: doch. da gibt's doch Einiges zum anschauen.
    sure. there are quite a few things to look at

ABA020 (move: initiative):
(11) suggest: und dann könnten wir so gegen acht oder gegen neun zurückfliegen.
    how about flying back at about eight or nine

In turn ABA018 the speaker takes the initiative not just by introducing the topic, but also by suggesting flying at noon. This is followed by another, alternative suggestion. These utterances form the first move, the initiative. The second speaker accepts this second suggestion with three utterances of type accept, and makes an additional remark that identifies which one of the suggestions was accepted. Our data are full of another kind of example, which is yet another reason for introducing the moves layer. In these examples, the interlocutors use different, sometimes even culture-specific, means of realizing their intention. The (American) English interlocutors tend to be very direct, rejecting a proposal more or less directly:

(12)

MGT017: that is not going to work. (reject)



In the Japanese culture, in contrast, it is impossible (unless one wants to be impolite) to reject a suggestion in such a way. Instead, there is a standard surface structure for rejecting a proposal which consists of several dialogue acts.

The following example is taken from (Siegel, 1996, page 35), where a negotiation dialogue between a German and a Japanese speaker is interpreted. The German interlocutor has suggested meeting on Monday morning, which turns out to be impossible for the Japanese dialogue partner. Since the Japanese culture forbids the American direct style of rejection, the interlocutor chooses the following surface structure:

(13)

eto getsubˆ obi no gozenchˆ u wa desu ne watashitachi no tokoro wa kore Monday NO morning WA COP NE we WA this kyˆ ujitsu ni natte orimahite kaisha yasumi na mon de desu ne.dame nan holiday NI get (Polite) company free COP NE bad desu ne.sumimasen COP NE excuse

which translates to: Well Monday Morning (feedback), that is a holiday at our side (give reason), the company is closed (give reason), that is bad (reject), (I'm) sorry (politeness formula). This sequence of dialogue acts19 apparently expresses a (Japanese) refusal. Still, the core intention of the contribution is the same as in (12). Hence, to model our dialogues in a language-independent way, we need to abstract away from the culture- or language-dependent realization. The concrete pattern in (13) is absent in the (American) English part of our corpus. A very polite German interlocutor might utter something similar, but since the VerbMobil dialogues have a more colloquial style, we rather find direct rejects in the German part of the corpus too.

19 In (Siegel, 1996), the sequence of dialogue acts is feedback give reason give reason reject date apologize. The dialogue acts used in (Siegel, 1996) are the set of dialogue acts used in the first phase of VerbMobil (see (Jekat et al., 1995)). The set presented in this thesis contains no reject date but the more general reject. Jekat et al.'s apologize is a special case of Alexandersson et al.'s politeness formula.

For a characterization of a move within a negotiation scenario we use the labels initiative, response, transfer-initiative and confirm. Central to the characterization is the concept of the forward- or backward-looking aspect of an utterance, e. g., (Allwood, 1994). By backward-looking aspect we mean that the turn, or possibly a part of it, contains a direct reaction to something, which in the case of negotiation dialogues is, e. g., a proposal of a meeting place introduced earlier in the dialogue. The forward-looking aspect roughly covers the cases where something is proposed or a new topic is introduced, which opens up a new discourse segment. The main classes for the negotiation part of a dialogue are:

• initiative A (part of a) turn is annotated with the label initiative when

the turn has a forward-looking aspect and (i) when something is suggested and the dialogue contains no open topics, or (ii) when a suggestion refines a previous proposal that has been accepted explicitly, or (iii) when a direct counter-proposal is made.

• response A (part of a) turn is annotated with the label response when the turn has a backward-looking aspect. This occurs (i) when some earlier proposal is rejected or accepted, or (ii) when a declarative or imperative suggestion with an implicit acceptance or rejection contains a refinement of an earlier proposal.

• transfer-initiative A (part of a) turn is annotated with the label transfer-initiative when the turn has a forward-looking aspect and (i) when a suggestion is explicitly requested, or (ii) when a topic is introduced without the locutor making a suggestion.

• confirm A (part of a) turn is annotated with the label confirm20 when the turn has a backward-looking aspect and (i) when a preceding acceptance is confirmed, or (ii) when a summarization of the agreement achieved so far is accepted.

We use the following moves for the beginning and the end of the dialogues:

• greet A (part of a) turn is annotated with the label greet if the speaker greets and optionally introduces herself.

• bye A (part of a) turn is annotated with the label bye if the speaker says goodbye.

Additionally, we introduce the following moves for the modeling of clarification dialogues:

• clarify-query A (part of a) turn is annotated with the label clarify-query for the cases where a clarification question is posed. This move can have a backward- as well as a forward-looking aspect.

• clarify-answer A (part of a) turn is annotated with the label clarify-answer for the cases where an answer to a clarification question is given. This move can always be characterized by a backward-looking aspect.

20 Throughout this thesis, the distinction between the dialogue act confirm and the move confirm is indicated by the type style.


We use some additional moves which are outside the pure negotiation but which cover additional phenomena occurring in our corpus:

• request-commit A (part of a) turn is annotated with the label request-commit for the cases where there is a request for a commitment, i. e., for taking care of booking a hotel or a table.

• commit A (part of a) turn is annotated with the label commit for the cases where the speaker commits herself to taking care of some action, e. g., booking a hotel.

• request-describe A (part of a) turn is annotated with the label request-describe for the cases where a description of something is requested.

• describe A (part of a) turn is annotated with the label describe for the cases where the speaker provides a description of something.

As for the dialogue acts, we give an example of the definition and the annotation instructions for moves: in the appendix we give the definition of the move initiative.

Discussion

Although pointing in the right direction, we believe that the notion of initiative in (Linell & Gustavsson, 1987) is a bit diffuse. In their work, a participant takes the initiative by "adding something to the discourse." "Something" is too coarse for our purpose, and we allow ourselves to depart from such a general framework. Instead, our usage of initiative and response resembles one of the two levels, the task level, described in (Chu-Carroll & Brown, 1997), where the initiative is modeled on two levels. Chu-Carroll and Brown argued that it is possible, and indeed useful, to separate the task and the dialogue initiative. For the sake of the generation of minutes and summaries, we are rather interested in how objects under negotiation are introduced and treated by the dialogue participants. The work of Walker and Whittaker is more concerned with the relation between the control and attentional state within task-oriented and advice-giving dialogues.
In (Walker & Whittaker, 1990) it is even questioned whether the notion of control is the same as having the initiative. Our usage of forward- and backward-looking aspects of a move is related to the concept of obligated (backward-looking) and obligating (forward-looking) functions of a contribution, e. g., (Allwood, 1994). As presented in (Allwood, 1994), his contribution seems to resemble what we call a turn and is thus potentially a bigger chunk than our moves. The obligating and obligated functions are more general than our forward/backward-looking aspects in that they have more facets. Whereas our aspects are focused on the attitude towards the task level, Allwood's functions are, as most parts of his framework, multi-functional. The unique feature of our framework is the (forward-looking) move transfer-initiative: we argue that it is indeed possible, on the task level, to deliberately hand over the initiative as well as take it. This becomes evident when a direct request to suggest, for instance, a date is performed (see (15) in figure 2.7 on page 62). This feature of our model is absent in other approaches known to us. Rather, the initiative is said to be taken by one of the dialogue participants, but never unhanded. This might be due to the importance of taking the initiative for the sake of reaching some goal faster in a man–machine dialogue. According to our experiences with the VerbMobil corpus, this is not necessarily the case. Instead the speakers take their time, act politely, and sometimes even give the initiative away.

...

(14)

wir wollen ja da zusammen nach Hannover fahren zu unserm Geschäftspartner
We should go to our business partner in Hanover

(15) und ich wollte Sie jetzt mal fragen wann es Ihnen denn prinzipiell am besten passen würde
and I'd like to ask when it would suit you best

(16) also mögliche Termine wären bei mir auf den ersten Blick einerseits zwischen dem vierten Juni und dem sechsten Juni oder aber zwischen dem . . .
Well, it would be possible between the fourth and sixth of June or between the . . .

Figure 2.7: An example of unhanding. In (15), the speaker is requesting a suggestion. Such an act corresponds to the transfer-initiative move.

For the modeling of initiative, there are a number of subtle cases which are more difficult to grasp and define, occurring, e. g., where a suggestion from one speaker is very general. The initiative is then, in a narrow sense and according to our definition, taken by the speaker. But the speaker, in this case, is not driving the negotiation that adamantly. However, this observation is non-trivial to quantify and to capture in a sharp definition. Thus, for the sake of simplicity and robustness during processing, we have chosen not to model different levels of initiative. Finally, our choice of "move" as the term for the modeling described above might be sub-standard, since there is already a different concept in the literature called move. That move, however, resembles what we call a dialogue act and originates from the work on language games (Wittgenstein, 1974) and conversational games (Sinclair & Coulthard, 1975). Although we have searched for another term, we find that "move" best fits our intention and modeling.
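The abstraction from dialogue-act sequences to moves, as in examples (12) and (13), can be sketched as a small pattern matcher. This is an illustrative sketch with simplified, made-up patterns, not the recognition machinery of the running system (see chapter 3):

```python
import re

# Each move is a regular expression over a string of dialogue-act labels.
# Both the direct English reject in (12) and the indirect Japanese
# sequence in (13) abstract to the same response move.
MOVE_PATTERNS = {
    "response": re.compile(
        r"^(reject"                                               # direct
        r"|feedback( give_reason)+ reject( politeness_formula)?"  # indirect
        r"|accept( accept)*( inform)?)$"                          # acceptance
    ),
}

def classify_move(acts):
    joined = " ".join(acts)
    for move, pattern in MOVE_PATTERNS.items():
        if pattern.match(joined):
            return move
    return None

print(classify_move(["reject"]))                        # response
print(classify_move(["feedback", "give_reason", "give_reason",
                     "reject", "politeness_formula"]))  # response
```

The point of the layer is exactly this many-to-one mapping: culture- and language-specific surface realizations collapse into one language-independent move.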

2.6.3 Games in VerbMobil

With the moves as building blocks we can compose games. Games are described by context-free rules, where the left-hand side denotes the name of the game and the right-hand side consists of at least one move. The first move is called the opening and the remaining moves the closing. For the course of cooperative negotiation dialogues we use the following basic games:

• Introduction This game corresponds to the INIT move. It consists of the turn(s) concerned with the topic introduction of the negotiation.

• Negotiation This game is the most characteristic game for our negotiation dialogues. A negotiation game always encompasses an initiative. However, the initiative might be preceded by a transfer-initiative, and followed by a response and, finally, one or more optional confirm(s).

• Closing This dialogue game, typically found at the end of the entire dialogue, signals that the speaker is concluding the conversation. Commonly preceding the actual farewell, this game may also be used to signal the transition between different dialogue topics such as accommodation and transportation.

Additionally, the following phenomena occur in our corpus:

• Opening This game is used for the mutual greeting. It usually consists of greets, but might also contain clarifications.

• Exit This game signals the end of the conversation. The game Exit includes, e. g., the move Bye.

• Nego-Incomplete This game is reserved for cases where a negotiation between the speakers has begun but, for some reason, for example an inquiry or an abrupt topic change, has not been completed.

• Exchange This dialogue game categorizes the exchange of information between speakers that does not contain the kind of explicit suggestions and rejections typical of the Negotiation game, but rather the small talk back and forth between the speakers.

• Commit This game covers the case where the willingness to take care of certain aspects of our scenario, i. e., book actions, is expressed.

The definition of the Negotiation game, taken from the annotation manual, is given in the appendix.

Discussion The concepts of "opening move" or "initiating move" and "responding move" used for the modeling of English classroom discourse (Sinclair & Coulthard, 1975) or the map task (Carletta et al., 1997) have served as an inspiration for our model. Since our model differs somewhat, we avoid nomenclature conflicts by calling the left part of the game the opening move and the right, responding part the closing move. This is because in our model the number of closing moves is not restricted to exactly one move, but is allowed to be zero, one or more. If we compare our work with that of (Carletta et al., 1997), we see that their moves correspond to our dialogue acts whereas their conversational games correspond to our games. Their concept of transactions is absent in our model. Furthermore, their framework is tightly coupled with the goal of the game: the game starting with an initiating move is completed with its responding move when the goal of the initiating move has been fulfilled. As the authors admit, it is very difficult for a human annotator faced with the transcription of a whole dialogue to annotate such a structure reliably (see (Carletta et al., 1997, p. 10) for more details).
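Since the games are context-free rules over move sequences, and since in our restricted model the right-hand sides are flat, they can be sketched as patterns over move labels. The following Python sketch is our own simplification for illustration; the pattern strings and move labels are not the annotation manual's notation:

```python
import re

# Each game as a pattern over move labels: an optional transfer-initiative,
# an opening move, then zero or more closing moves (illustrative labels).
GAME_PATTERNS = {
    "negotiation": r"(transfer_initiative )?initiative( response)+( confirm)*",
    "opening": r"greet( greet| clarify)*",
    "exit": r"bye( bye)*",
}

def classify_game(moves):
    """Return the first game whose pattern matches the whole move sequence."""
    seq = " ".join(moves)
    for game, pattern in GAME_PATTERNS.items():
        if re.fullmatch(pattern, seq):
            return game
    return None
```

For instance, `classify_game(["transfer_initiative", "initiative", "response"])` yields `"negotiation"`, while a lone `"response"` matches no game.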
There is evidence for the usefulness of, as well as for negative influences from, more complex structures when it comes to the recognition of, for example, dialogue acts. (Qu, Di Eugenio, Lavie, Levin, & Rose, 1997) describe an experiment on the inclusion of higher-level structures in the recognition of dialogue acts in the JANUS speech-to-speech translation system. There,

the cumulative error, i. e., errors due to prior errors, seems to be more problematic if more context is taken into account for the recognition of dialogue acts. In (Poesio & Mikheev, 1998) another experiment based on the Map Task corpus is described. They suggest that the inclusion of hierarchical context as found in the Map Task corpus can positively affect the recognition. However, their recognition is based on a perfectly recognized and thus artificially constructed context. (Eckert & Strube, 2000) use a simplified structure inspired by the work of, e. g., (Carletta et al., 1997a, 1997b) where the basic entities are dialogue acts clustered into initiations and acknowledgements. These two entities are then combined into synchronizing units. We believe that their concept is too generic for the focus of our model; still, our exchange game serves a similar function. Our work can be viewed as a sub-model of the full-fledged structures of the Map Task corpus. However, the phenomenon of cumulative error as described above has made us restrict the modeling of games to the initiative and its immediate response(s), possibly with a prepended transfer-initiative. Parts of the dialogue concerned with the exchange of information outside the scope of the pure negotiation are modeled by the more general exchange game. The only embedded games are the clarification games. Higher-level structures are found in our phase level (see below). These render the complex higher-level structures used in the Map Task corpus, although they are simpler. The exchange level of the dialogue model of SUNDIAL (Bilange, 1991) bears much resemblance to our games level. In particular, their negotiation patterns are more or less the same as our negotiation game; however, our optional initial transfer-initiative is missing in their patterns. The dialogue model of LINLIN is simpler than ours; this model, however, has been developed with typed dialogues in mind.
Finally, our games resemble the “interaction plans” in the plan recognition model of (Iida & Arita, 1992).

2.6.4

Dialogue Phases in VerbMobil

Whereas the topmost level describes different scenarios, such as "VM-scenario-complete", the fourth level of our intentional structure ties the games together. In our scenario, cooperative negotiation dialogues consist of a greeting phase followed by a negotiation phase. A negotiation phase is sometimes preceded by an initiation phase where the topic of the negotiation is presented/discussed21 and succeeded by a recapitulation of the topic's result. Finally, the dialogue ends with a good-bye phase possibly preceded by a closing phase where the result of the negotiation as a whole is recapitulated. In VerbMobil, the following dialogue phases are defined and used:

• opening includes the greeting until the topics have been introduced.

• init includes the part where the topic of a negotiation is presented.

• negotiation includes the negotiation.

• closing includes the conclusion of the negotiation.

• bye includes the saying of good-bye.
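The phase sequence described above can be read as a simple finite-state constraint. The following sketch is our own reading of the text, not an implemented VerbMobil component; the transition table is an assumption (e.g., that init and negotiation may repeat per topic, and that closing may be skipped):

```python
# Allowed successor phases for a cooperative negotiation dialogue
# (our own simplification of the phase model).
NEXT_PHASE = {
    "opening": {"init", "negotiation"},
    "init": {"negotiation"},
    "negotiation": {"init", "negotiation", "closing", "bye"},
    "closing": {"init", "bye"},
    "bye": set(),
}

def valid_phase_sequence(phases):
    """Check that a dialogue starts with opening, ends with bye,
    and only uses allowed phase transitions in between."""
    if not phases or phases[0] != "opening" or phases[-1] != "bye":
        return False
    return all(b in NEXT_PHASE[a] for a, b in zip(phases, phases[1:]))
```

Under these assumptions, `["opening", "init", "negotiation", "closing", "bye"]` is accepted, while a dialogue jumping straight from opening to bye is not.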

2.7

Propositional Content

To model propositional content we use an ontological description language, the Discourse Language Representation (dlr) (Koch, 1998; Alexandersson et al., 2000), inspired by description logic languages such as KL-ONE, e. g., (Brachman & Schmolze, 1985) and CLASSIC, e. g., (Borgida, Brachman, McGuinness, & Resnick, 1989). Just as in such languages, there is a set of concepts (the T-Box) which can be instantiated to represent abstract and physical real-world objects (the instances forming the A-Box). A concept has a name and a fixed number of roles which can be filled with other objects belonging to some concept. Concepts are connected by multiple inheritance, where a more specialized concept may introduce new roles. We call relations within one object sister-relations or sister-roles, and the objects of sister-relations sister objects or sisters. A DLR expression is called a direx (DIscourse Representation EXpression). The basis for the modeling are objects and situations. Situations are used to model activities of the speakers, such as events and actions, e. g., journeys, meetings and different kinds of moves. Objects, on the other hand, are entities in the world, abstract and concrete, like time (see below), different types of locations, and seats. As an example, figure 2.9 shows the textual representation of a direx expression for the sentence well it is about the business meeting in Hanover. Figure 2.10 shows the graphical representation of the same expression.

21 In our scenario the speakers receive stringent instructions on how to behave. Therefore, there is no discussion or negotiation when it comes to the topic; the topics are introduced by one speaker and the hearer immediately accepts them.

[Figure 2.8, a clipping of the concept hierarchy, appeared here; its nodes include top, object, situation, abstract_object, concret_object, time, ticket, location, geo-location, nongeo-location, hotel, quality, topic, book_action, action, move, event, watch_theater and move_by_bus.]

Figure 2.8: Clipping of the hierarchy

The roles has_det and has_loc_spec are primarily used for translation purposes; these roles are an example of sister-roles.

has_appointment:[appointment,
  has_meeting:[meeting, has_name='geschaeftstreffen'],
  has_location:[city, has_name='hannover',
    has_loc_spec=in, has_det=unknown]]

Figure 2.9: Textual representation of the propositional content of the sentence well it is about the business meeting in Hanover
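Rendered as plain nested mappings, the direx of figure 2.9 might look as follows. This is an illustrative encoding of ours, not the actual DLR implementation; the convention of storing the concept name under a reserved key "type" is an assumption:

```python
# direx for "well it is about the business meeting in Hanover":
# concept name under "type", roles as the remaining keys.
direx = {
    "has_appointment": {
        "type": "appointment",
        "has_meeting": {"type": "meeting", "has_name": "geschaeftstreffen"},
        "has_location": {"type": "city", "has_name": "hannover",
                         "has_loc_spec": "in", "has_det": "unknown"},
    }
}

def role(obj, *path):
    """Follow a chain of roles, returning None if any role is unfilled."""
    for r in path:
        if not isinstance(obj, dict) or r not in obj:
            return None
        obj = obj[r]
    return obj
```

For example, `role(direx, "has_appointment", "has_location", "has_name")` returns `"hannover"`.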

Embedded in DLR is another language, the Temporal Expression Language (tel22), which is used to model time (Endriss, 1998). tel is a quite surface-oriented representation language, but can be used as an interlingual representation23 and contains the necessary inferences (see below) in the domain of

22 tel is the successor of ZeitGram (Küssner & Stede, 1995).
23 There are some expressions in German, like morgens, which are ambiguous as well as not directly translatable into English; the meaning is either "in the morning" or "every morning."

has_appointment

appointment has_meeting: meeting has_name=geschaeftstreffen has_location: city has_name=hannover has_loc_spec=in has_det=unknown

Figure 2.10: Graphical representation of the proposition content for the sentence well it is about the business meeting in Hanover

appointment scheduling. tel is based on TYPE:VALUE pairs; e. g., Monday at eight o'clock in the morning is represented as the unordered list [dow:mon, pod:morning, tod:08:00]. tel is based on a large corpus analysis of English and German dialogues in the VerbMobil corpus and contains a wide range of constructs which are divided into five main constructions (see (Alexandersson et al., 2000)):

Simple comprises day, day of week, part of day, seasons, holidays etc.

Modified comprises qualifications, such as early, or fuzzy expressions such as around two o'clock.

Spans comprises durations or open spans.

Referenced comprises time expressions with reference to some point, such as two hours from now.

Counted comprises constructions for expressions like the second week of June.

Expressions in tel are called tempex (TEMPoral EXpression). Below are some examples of tempex expressions taken from our corpus24:

(17) How 'bout the afternoon of Monday the ninth?
     during:[pod:afternoon,dow:mon,dom:9]

(18) How 'bout the afternoon of Monday the ninth?
     during:[pod:afternoon,dow:mon,dom:9]

(19) I could do it on a weekend, from Friday the sixth of May to Saturday the seventh of May.
     interval:[pow:weekend, min_between([dow:fri,dom:6,month:may], [dow:sat,dom:7,month:may])]

(20) What about twenty-ninth through to the second of April?
     interval:min_between(dom:29,[dom:2,month:apr])

(21) We need to get another meeting going for about two hours in the next few weeks.
     [for:fuzzy_dur(dur(2,hours)), during:next(int:dur(several,weeks))]

(22) Are you free on Monday the twenty-sixth after twelve o'clock or how 'bout sometime in the morning on Tuesday the twenty-seventh?
     from:set([[dow:mon,dom:26,ex_after(tod:12:0)], [pod:morning,dow:tue,dom:27]])

24 See (Endriss, 1998) for more examples.
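Since a tempex is an unordered list of TYPE:VALUE pairs, two partial time expressions can be combined by a unification-like merge. The following small sketch assumes an encoding as a Python dict and is not the actual TEL machinery:

```python
def merge_tempex(a, b):
    """Merge two partial tempex dicts; return None on a value conflict."""
    merged = dict(a)
    for slot, value in b.items():
        if slot in merged and merged[slot] != value:
            return None  # e.g. dow:mon and dow:tue cannot co-occur
        merged[slot] = value
    return merged
```

Merging {"dow": "mon", "dom": 9} with {"pod": "afternoon"} yields all three pairs, whereas merging {"dow": "mon"} with {"dow": "tue"} fails.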

For the representation of larger chunks of discourse and for disambiguation we classify utterances as belonging to one of four topics:

Scheduling for the actual date schedule (begin and end time, location etc).

Traveling for the travel arrangements (departure information, means of transportation etc).

Accommodation for hosting (rooms, prices etc).

Entertainment for spare time activities (visiting a cinema, restaurant etc).

Now we have the tools to represent utterances: the dialogue act, the topic, dlr and tel. Here are some more examples taken from our corpus together with their textual representation:

(23) das ist ja am zwanzigsten Januar um elf Uhr vormittags
     that is on the twentieth of January at eleven o'clock in the morning
     [SUGGEST, uncertain_scheduling, has_date:[date, tempex='tempex(ge_2920_0, [from:[dom:20,month:jan,tod:11:0,pod:morning_ger2]])']]

(24) so we have to leave Munich at six o'clock
     [SUGGEST, scheduling, has_move:[move, has_source_location:[city,has_name='muenchen'], has_departure_time:[date, tempex='tempex(en_2920_0,[from:tod:6:0])']]]

(25) so how about a three o'clock flight out of Hannover
     [SUGGEST, traveling, has_move:[move, has_departure_time:{time_of_day:3}, has_source_location:[city,has_name='hannover']]]
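An utterance representation of the kind shown in (23)–(25) bundles a dialogue act, a topic and the propositional content. The following container is a hypothetical illustration of ours, not a VerbMobil data structure:

```python
from dataclasses import dataclass, field

@dataclass
class Utterance:
    dialogue_act: str   # e.g. "SUGGEST"
    topic: str          # scheduling | traveling | accommodation | entertainment
    content: dict = field(default_factory=dict)  # direx as nested frames

# Example (24): "so we have to leave Munich at six o'clock"
u = Utterance("SUGGEST", "scheduling",
              {"has_move": {"type": "move",
                            "has_source_location": {"type": "city",
                                                    "has_name": "muenchen"}}})
```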

Discussion We have strived to make our modeling of propositional content language-independent and usable for discourse processing and translation. The KL-ONE-like description language is expressive enough to model the content not just of utterances but of bigger chunks of discourse by abstracting away from stylistic and linguistic features in favour of a goal-oriented representation. Since the concepts of our ontology are products of an extensive corpus analysis, we have a model which is deeply anchored in the domain. While this is an advantage for discourse processing, a drawback of such a comparatively coarse-grained model is, of course, that we lose, e. g., linguistic information while instantiating the domain objects. However, we can represent the result of a complete dialogue with our language (see chapters 3 and 4). Finally, using typed frame-based representations, it is possible to add information to the discourse state in a formal way using default-unification-like operations (see sections 3.7.3 and 3.7.4). This is very important, since our version of default unification, the overlay operation, has turned out to be the fundamental operation for discourse processing.

Our approach to modeling propositional content shares some similarities with that of the C-Star consortium (Levin, Gates, Lavie, & Waibel, 1998). There, an interlingua approach for a translation scenario is presented where an expression in the so-called interchange format is called a domain action (DA). In addition to concepts and arguments, a DA contains a speech act. A DA has the following syntax:

    speaker : speechact + concept* argument*        (2.4)

The speaker tag is either "a:" for agent, or "c:" for customer. The speech act is a single label.25 Concepts and arguments can appear in arbitrary number one after another. A total of 38 speech acts, 68 concepts, and 86 arguments have been defined, yielding 423 DAs. An example of the representation is:

(26) On the twelfth we have a single and a double available
     a:give-information+availability+room (room-type=(single & double), time=(md12))

Admittedly, there are certain linguistic phenomena not covered by their approach, e. g., relative clauses, modality, politeness, anaphora, and number. An evaluation, however, yielded a coverage of 92% measured on two previously unseen dialogues. Clearly, this approach is simpler than ours. It is also unclear if their approach is expressive enough to represent the result of a complete negotiation of a topic, or if the representation is just aimed at representing utterances. Also, their language lacks constructions for representing anaphora.
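The default-unification flavour of the overlay operation mentioned in the discussion above can be sketched as follows. This is a bare-bones illustration of ours, not the actual VerbMobil implementation, which additionally respects the type hierarchy (see sections 3.7.3 and 3.7.4):

```python
def overlay(covering, background):
    """Lay the new (covering) structure over the old (background) one:
    covering values win on conflicts, background values survive where
    the covering structure is silent."""
    if not (isinstance(covering, dict) and isinstance(background, dict)):
        return covering  # atomic conflict: the new value wins
    result = dict(background)
    for role_name, value in covering.items():
        result[role_name] = (overlay(value, background[role_name])
                             if role_name in background else value)
    return result
```

For instance, overlaying a new location name onto an old, more fully specified location keeps the old determiner but replaces the name.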

2.8

Conclusion

We have presented and motivated the design of our modeling of the negotiation dialogues for the VerbMobil scenario. The modeling is based on three pillars: • The dialogue act

The dialogue act is our primary tool for characterizing utterances in VerbMobil. It is used to capture the illocutionary force of an utterance. We described how the design of the dialogue acts has been influenced by previous work, such as that of Austin, Searle, Levinson, Allwood and Bunt.

• The propositional content

Using widely accepted as well as new ideas, the second pillar for representing utterances and discourse is the propositional content (Alexandersson et al., 2000; Koch, 1999). Our representation language, DLR, was inspired by knowledge representation languages like KL-ONE (Brachman & Schmolze, 1985). Embedded in DLR there is a language, TEL, for representing time expressions. The goal of the design—this

25

There are three exceptions: The speech acts “verify”, “request-verification”, and “negate” combine with other speech acts. “So you’re not leaving on Friday, right? has the speech act “request-verification-negate-give-information”.


goes for the dialogue acts as well—has been to represent information encoded in the surface structure of language in such an abstract way that we have, in principle, a representation which can be used as an interlingua for translation and for additional tasks, such as resolving ellipses. The most prominent are translation, analysis and reasoning as well as generation. • The intentional structure

Based on the utterances annotated with dialogue acts, we have designed a tree structure that we call the intentional structure. The requirements of the transfer module (i. e., supporting the translation process by distinguishing the dialogue phase) as well as robustness during processing have had a big impact on the design of the structure. However, previous work in the area of mixed initiative and initiative-response, e. g., (Linell & Gustavsson, 1987; Chu-Carroll & Brown, 1997; Walker & Whittaker, 1990; Kowtko et al., 1993), has also been indispensable. Most notably, we have characterized a phenomenon in a, to our knowledge, new way: the moves layer of the intentional structure. There, we view parts of contributions as chunks expressing the same move type. However, the notion of move in this thesis differs from its traditional use. To us, a move represents a sequence of utterances which together form a certain abstract communicative function, e. g., initiative and response. The traditional usage of the term move corresponds to what we call the dialogue act. One of the moves, Transfer-Initiative, is new and not found elsewhere in the literature. The Transfer-Initiative move resembles what is known as how may I help you? utterances, where the system starts by handing over the initiative.

In the next chapter we will present the algorithms and tools used in the running system.


Chapter 3

Dialogue Management in VerbMobil This chapter is devoted to the implementation of the dialogue component of VerbMobil, which we will refer to as DiVe. We start by presenting the tasks of the module and the difference between the tasks our module is faced with and those of dialogue managers in man–machine dialogue systems. Selected parts of both mono-lingual and multi-lingual dialogues show the variability of the VerbMobil data.

3.1

Introduction

Apart from, or maybe in addition to, imperfect input, at least two other factors influence our approach to the understanding and processing of VerbMobil dialogues:

• Users of the running system have to deal with inaccurate translations and therefore make abundant use of repetitions, confirmations, and clarifications. The ideal conditions during the recording of, for instance, the mono-lingual dialogues of our corpus drastically change the users' behaviour. The mono-lingual part of our corpus does not show ample examples of clarification dialogues due to translation problems. This trait is easy to explain, but does not change the fact that we lack data covering phenomena like clarifications caused by, e. g., one participant suspecting a false translation.

• VerbMobil mediates the dialogue and, as indicated below (see section 3.4), is not supposed to perform too frequent clarification dialogues. It has to obey the principle of unobtrusiveness.

As a result, we pursue an approach using methods not unlike those of Information Extraction (e. g., (Hobbs et al., 1996)): we know what to expect, we try to extract as much information as possible, checking consistency on the way. To make clearer what the consequences of a mediating scenario are for processing: the interpretation of (partial) utterances in a man–machine system is heavily guided by, e. g., the expectations of the system. Many systems1 employ a rather simple way of managing the dialogue. For instance, a well-known technique for getting the user to provide information needed for, e. g., a database look-up is for the system to take the initiative and ask precise questions. There are, of course, other differences between the VerbMobil dialogues and man–machine dialogues. Amongst the more striking ones, the size and complexity of a user contribution in the VerbMobil corpus is on average larger.

3.2

Characteristics of the VerbMobil Dialogues

This section surveys the characteristics of the VerbMobil data from the point of view of dialogue management. As will be shown, the experience gained during the recording of wizard-of-oz dialogues led us to take a mediating approach to dialogue management. Other important characteristics are the variability in verbosity and the complexity of user contributions.

3.2.1

Human–human vs. man–machine negotiation dialogue

In contrast to man–machine dialogues, our corpus contains many dialogues where the turns do not consist of one or two utterances, but of up to ten or even more. The average number of utterances per turn in our corpus is 2.2 over all dialogues (English and German) annotated with dialogue acts, and the distribution of the number of utterances or dialogue acts per turn is shown in table 3.1. The distributions for English, German and Japanese mono-lingual dialogues are shown in the appendix (pages 195–197). The average turn length is 2.4 for the German dialogues, 1.9 for the English, and 2.5 for the Japanese ones. Contrary to the German and English dialogues,

1 There are, of course, systems employing real mixed initiative, i. e., the user might at any point say anything.


the Japanese dialogues have a lower number of turns with one dialogue act than turns with two dialogue acts. This shows that the structure of turns is more complex in particular in the Japanese–Japanese dialogues, but also in the German–German dialogues, compared to the English–English ones (see the appendix for more details). Ten sample multi-lingual (German–English) dialogues have been annotated with dialogue acts. The distribution is shown in figure 3 in the appendix. For these dialogues, the average turn length is 1.8. The number of multi-lingual dialogues is, however, too small to draw any conclusions. Still, the curves indicate a more complex structure than in the mono-lingual dialogues, in that the number of turns with two dialogue acts is twice the number of turns with one dialogue act.
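The per-turn statistics reported here boil down to counting dialogue acts per turn over the annotated corpus. A sketch of that counting, with an assumed data layout (our own, not the corpus format):

```python
from collections import Counter

def turn_length_stats(dialogues):
    """dialogues: list of dialogues; a dialogue is a list of turns; a turn
    is a list of dialogue acts. Returns (distribution, average length)."""
    dist = Counter(len(turn) for dialogue in dialogues for turn in dialogue)
    total_turns = sum(dist.values())
    avg = sum(n * freq for n, freq in dist.items()) / total_turns
    return dist, avg
```

On a toy corpus of three turns with one, two and three dialogue acts respectively, the average turn length is 2.0.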

3.2.2

Length of the dialogues

Table 3.1 shows the annotated CD-ROMs ("# CD-ROMs"), the number of annotated dialogues ("# dial.") and dialogue acts ("# dial. acts"), and the mean, minimal, and maximal length of dialogues, measured in dialogue acts.

Table 3.1: Annotated CD-ROMs

language  | # CD-ROMs | # dial. | # dial. acts | mean  | min | max
German    | 13        | 738     | 37954        | 51.24 | 6   | 208
English   | 6         | 375     | 22682        | 59.93 | 7   | 347
Japanese  | 2         | 402     | 15574        | 39.52 | 16  | 83
Sum       | 21        | 1505    | 76210        | 50.41 | 6   | 347

3.2.3

Behavioural differences due to cultural differences

• While German speakers tend to be polite and offer the dialogue partner the opportunity to take the initiative and make a suggestion, the American English speakers are very direct and efficient, often presenting parts of their calendar and asking the partner to select one of the proposed dates.

• While the English parts of our corpus contain rather short turns, the German parts contain many phenomena like thinking aloud, thereby producing, e. g., suggestions which are immediately rejected by the same speaker in the same turn.

• Apart from the big difference between Japanese and the European languages, the way one behaves within the different cultures also differs, in that Japanese speakers are very polite. For negotiations, the way one negotiates is (for us Europeans) somewhat extreme: there are no direct rejections, since it is impolite to reject a suggestion outright. Instead, the following pattern is common: the suggestion is repeated (feedback), followed by at least one explanation why it is impossible to accept the suggestion. Not until this point is the actual rejection uttered, and it is followed by an obligatory excuse (Siegel, 1996, page 35). This negotiation pattern is possible in the German culture but rare2 and absent in the (American) English speaking culture.

3.2.4

Turn complexity

Some turns are rather complex, as shown by utterances 27 – 36. This turn also shows another particularity of our data: the speaker tends to think out loud, verbalizing utterances which humans can filter out due to, e. g., prosodic characteristics and lack of (new or interesting) information, but which are hard for a machine to interpret as deliberations or small talk.

(27) ja also
     well

(28) Zu- nach Hannover ist die Verbindung von M"unchen aus auch an und f"ur sich auch recht g"unstig,
     Trai- to Hanover the connection from Munich is in itself also pretty good

(29) deshalb , im $I-$C-$E ist 's auch an und f"ur sich kein Problem.
     therefore , in the ICE it is no problem

(30) w"ar' mir auch sehr recht.
     that would also be fine with me

(31) das heißt , ab dem zehnten Februar w"are es tut mir ,(REJECT) liegen zwei da geht es fr"uhestens am achtzehnten Februar , dem . (SUGGEST)
     here a set of possible options is suggested

2 German example (cdrom7, m068n): JUJ018: ja , das kommt mir auch sehr gelegen . (ACCEPT) lieber an dem Samstag oder lieber an dem Sonntag ? (SUGGEST) MCE019: (yes)

AB: initiative
  commit - AAK061: da beauftrag' ich meine Sekret"arin . (I'll tell my secretary)
BA: response
  accept - AAJ062: oh ja , (oh yes)
  accept - prima . (great)

A sample game definition

Negotiation This dialogue game mostly encompasses the move initiative from speaker A followed by a response move from speaker B. Optionally, speaker B opens the game with an initial transfer-initiative. The response can also be confirmed by speaker A. A special case is the response-initiative: some contributions from one speaker may function as a response and as an initiative at the same time. The game can, however, include different negotiating sub-games as well that relate to the appointment scheduling task. It commonly encompasses the moves initiative, transfer-initiative, response, describe, response-initiative and confirm.

Example 10 (Cdrom39multi-par: moo2e.gam)

negotiation
  AB: initiative
    suggest - okay I am free between the twenty second and the twenty fourth
    request_comment - what is your schedule like?
  BA: response
    accept - ja am vierundzwanzigsten habe ich einen freien Termin (I have an appointment free on the 24th)

Example 11 (Cdrom23-par: e004ac.gam) 

negotiation
  BA: transfer-initiative
    explained_reject - I am going off on vacation then for three weeks so until the first week of December
  AB: initiative
    feedback_positive - all right
    suggest - I am away the beginning of that week but I could do it towards the end of the week or the beginning of the next week
  BA: response
    accept - okay the beginning of the next week
    accept - sounds good


Example 12 (Cdrom15-par: g002ac.gam) 





negotiation
  BA: initiative
    suggest - oder die Woche danach? (or the week after that?)
  AB: response
    suggest - Montag Dienstag Mittwoch ein-und (Monday, Tuesday, Wednesday first)
  BA: response
    suggest - Dienstag Mittwoch (Tuesday, Wednesday)
  AB: clarify-query
    request_clarify - Dienstag Mittwoch? (Tuesday, Wednesday?)
  BA: clarify-answer
    feedback_positive - ja (yes)
  AB: response
    accept - das w"urde gehen (that is possible)
    accept - hervorragend (splendid)
  BA: response
    accept - ja (yes)

Corpus Characteristics

To provide a picture of the distribution of dialogue acts per turn in our corpus, we have counted the dialogue acts for each turn of our annotated dialogues. The basis for the curves are 794 German–German, 375 English–English and 403 Japanese–Japanese dialogues. Three curves are provided: the first (i. e., figure 1) includes all turns, the second all turns including the dialogue acts greet, introduce and bye, i. e., the upper graph in figure 2. The third describes the length of all negotiation turns, i. e., turns not containing the dialogue acts of the second counting; an example of this is the lower graph in figure 2. Additionally, we provide the same information for a small set (10 dialogues) of German–English multi-lingual dialogues (see figure 3). The figures there are, due to the low number of annotated dialogues, to be viewed more as a hint than as a finding.

[Figure 1 appeared here: frequency of segments per turn (1–21) for German (avg: 2.434553), English (avg: 1.907913) and Japanese (avg: 2.451446).]

Figure 1: Number of dialogue acts per turn for all turns for three sets of monolingual dialogues (German–German, English–English and Japanese–Japanese).

195

[Figure 2, upper graph, appeared here: frequency of segments per turn for greet/introduce/bye turns; German (avg: 2.783030), English (avg: 2.193168), Japanese (avg: 1.855114).]

[Figure 2, lower graph, appeared here: frequency of segments per turn for negotiation turns; German (avg: 2.355079), English (avg: 1.886131), Japanese (avg: 2.679409).]

Figure 2: Number of dialogue acts per turn for three sets of monolingual dialogues. The upper figure describes turns containing the dialogue acts greet, introduce and bye, whereas the lower describes turns not containing these dialogue acts.

196

[Figure 3 appeared here: frequency of segments per turn for the 10 German–English multilingual dialogues; all turns (avg: 1.83), greet/introduce/bye turns (avg: 2.10), negotiation turns (avg: 1.79).]

Figure 3: Number of dialogue acts per turn for the 10 multilingual German–English dialogues. Three curves are given: one for turns containing the dialogue acts greet, introduce and bye, one for all other turns and, finally, one for all turns.

197

Two Sample Dialogues

We provide our sample dialogue, Reinhard4, annotated with propositional content and dialogue acts as processed in the VerbMobil system. The trace is followed by the content of the dialogue memory and the corresponding German summary.

;*************************************************************
; Dialog: 3000/reinhard4
; Time : NIL
;*************************************************************

sch"onen Guten Tag ^ t1000ageb2e3r1geySHALLOW (((GREET . 30.064) 10.021) ((POLITENESS_FORMULA . 40.993) 13.664)) [GREET,any_topic]

; ; ; ;

hello this is speaking ^ t1001benb3e4r1genSHALLOW (((GREET . 54.993) 10.998) ((INTRODUCE . 61.868) 12.373)) [GREET,any_topic,has_addressee:[person, has_last_name=’’]] hello hello Mr t1001benb4e5r1geySHALLOW (((GREET . 51.717) 12.929) ((INTRODUCE . 60.128) 15.032)) [GREET,any_topic,has_addressee:[person, has_last_name=’’]]

; ; ; ;

; ; ; ; ; ; ; ; ;

; ; ; ;

ja es geht um das Gesch"aftstreffen in Hannover ^ t1002ageb5e6r1genSHALLOW (((INIT . 58.054) 7.256) ((INFORM . 65.007) 8.125)) [INIT,scheduling,has_appointment:[appointment,has_meeting:[meeting, has_name=’geschaeftstreffen’],has_location:[city, has_name=’hannover’,has_loc_spec=in,has_det=unknown]]] das ist ja am zwanzigsten Januar um elf Uhr vormittags t1002ageb6e7r1geySHALLOW (((SUGGEST . 71.321) 7.132) ((ACCEPT . 77.315) 7.731)) [SUGGEST,uncertain_scheduling,has_date:[date, tempex=’tempex(ge_2920_0,[from:[dom:20,month:jan,tod:11:0, pod:morning_ger2]])’]] so we have to leave Munich at six o’clock ^ t1003benb7e8r1geySHALLOW (((SUGGEST . 73.986) 8.220) ((INFORM_FEATURE . 78.199) 8.688)) [SUGGEST,travelling,has_move:[move,has_source_location:[city, has_name=’muenchen’],has_departure_time:[date, tempex=’tempex(en_2920_0,[from:tod:6:0])’]]]

198

; ; ; ;

; ; ; ;

; ; ; ;

vielleicht fahren wir lieber den Tag davor ^ t1004ageb8e9r1genSHALLOW (((SUGGEST . 65.576) 9.368) ((INFORM . 75.163) 10.737)) [SUGGEST,travelling,has_move:[move,has_departure_time:[date, tempex=’tempex(ge_2920_1,[from:neg_shift(dur(1,days),ana_point)])’ ]]] da gibt es einen Zug um zwei Uhr t1004ageb9e10r1geySHALLOW (((SUGGEST . 58.064) 7.258) ((INFORM . 65.774) 8.221)) [SUGGEST,travelling,has_move:[move,has_transportation:[rail], has_departure_time:[date,tempex=’tempex(ge_2920_2,[from:tod:2:0])’ ]]]

; ; ; ;

I would prefer to leave at five ^ t1005benb10e11r1genSHALLOW (((SUGGEST . 55.135) 7.876) ((REJECT . 63.196) 9.028)) [SUGGEST,travelling,has_move:[move,has_agent:[speaker], has_departure_time:[date,tempex=’tempex(en_2920_1,[from:tod:5:0])’ ]]] I am pretty busy that day t1005benb11e12r1geySHALLOW (((GIVE_REASON . 35.271) 5.878) ((REJECT . 39.514) 6.585)) [GIVE_REASON,any_topic,has_agent:[speaker]]


ja gerne k"onnen wir machen ^ t1006ageb12e13r1genSHALLOW (((ACCEPT . 31.367) 6.273) ((SUGGEST . 42.486) 8.497)) [ACCEPT,any_topic] dann brauchen wir noch eine "Ubernachtung t1006ageb13e14r1geySHALLOW (((INIT . 43.727) 7.287) ((SUGGEST . 46.141) 7.690)) [INIT,accommodation,has_accommodation:[hotel,has_amount=1]]


yes ^ t1007benb14e15r1geySHALLOW (((ACCEPT . 14.225) 14.225) ((INFORM . 19.302) 19.302)) [ACCEPT,any_topic]


ich kenne ein gutes Hotel im Stadtzentrum ^ t1008ageb15e16r1genSHALLOW (((INIT . 71.335) 10.190) ((SUGGEST . 75.288) 10.755)) [INIT,accommodation,has_agent:[speaker],has_accommodation:[hotel, has_amount=1,has_quality_indication=good,has_location:[nongeo_location,has_name=’stadtzentrum’,has_loc_spec=in,has_det=def]]] ein Einzelzimmer kostet achtzig Euro t1008ageb16e17r1geySHALLOW (((INFORM . 57.922) 11.584) ((INFORM_FEATURE . 57.937) 11.587)) [INFORM,accommodation,consists_of:[room,has_amount=1, has_size=single,has_price:[price,has_amount=80, has_currency=euro]]]

that sounds pretty good ^ t1009benb17e18r1genSHALLOW (((ACCEPT . 23.778) 5.944) ((SUGGEST . 40.537) 10.134)) [ACCEPT,any_topic] can you please do the reservations t1009benb18e19r1geySHALLOW (((SUGGEST . 65.944) 10.990) ((REQUEST_SUGGEST . 69.315) 11.552)) [SUGGEST,any_topic,has_book_action:[book_action, has_agent:[addressee]]]


sicher dann machen wir ein gemeinsames Abendessen in einem Restaurant ^ t1010ageb19e20r1genSHALLOW (((ACCEPT . 87.567) 8.756) ((SUGGEST . 89.498) 8.949)) [ACCEPT,entertainment,has_entertainment:[entertainment, has_theme:[dine_out],has_location:[nongeo_location, has_name=’restaurant’,has_loc_spec=in,has_det=undef]]] ich werde einen Tisch reservieren t1010ageb20e21r1geySHALLOW (((COMMIT . 33.705) 6.741) ((REQUEST_COMMIT . 57.783) 11.556)) [COMMIT,any_topic,has_book_action:[book_action,has_agent:[speaker]]]


that would be nice ^ t1011benb21e22r1genSHALLOW (((ACCEPT . 30.028) 7.507) ((SUGGEST . 35.490) 8.872)) [ACCEPT,any_topic] let us meet at the station on Wednesday t1011benb22e23r1geySHALLOW (((SUGGEST . 62.319) 7.789) ((ACCEPT . 65.925) 8.240)) [SUGGEST,scheduling,has_appointment:[appointment, has_location:[nongeo_location,has_name=’bahnhof’,has_loc_spec=at, has_det=def],has_date:[date, tempex=’tempex(en_2920_2,[from:dow:wed])’]]]


um halb zehn am Bahnhof ^ t1012ageb23e24r1geySHALLOW (((ACCEPT . 39.409) 7.881) ((SUGGEST . 42.095) 8.419)) [ACCEPT,uncertain_scheduling,has_date:[date, tempex=’tempex(ge_2920_3,[from:tod:9:30])’], has_location:[nongeo_location,has_name=’bahnhof’]]


good see you then ^ t1013benb24e25r1geySHALLOW (((BYE . 24.403) 6.100) ((ACCEPT . 31.609) 7.902)) [BYE,any_topic,has_agent:[addressee]]

bis dann ^ t1014ageb25e26r1geySHALLOW (((BYE . 9.760) 4.880) ((ACCEPT . 22.613) 11.306)) [BYE,any_topic]

====================== DIALOGUE FRAME ====================== begin: 5. July 2002, 11:51 pm end: 6. August 2002, 2:55 pm participants: VM_SPEAKER (SPEAKER2) [en] [b] VM_SPEAKER (SPEAKER1) [] [ge] [a]

=========================================== SCHEDULING frame [open] =========================================== theme:

attitudes: (#) ===> ACCEPT relations: ((MORE_SPECIFIC_THAN . #)) APPOINTMENT (P5*+0) HAS_LOCATION --> CITY (P4*) HAS_NAME=hannover HAS_MEETING --> MEETING (P3**) HAS_NAME=geschaeftstreffen HAS_DATE --> DATE (P5*) TEMPEX=tempex(ge_2920_0, from:[year:2002, month:jan, dom:20, pod:morning_ger2, tod:11:0]) attitudes: (#) ===> ACCEPT relations: ((MORE_SPECIFIC_THAN . #) (MORE_SPECIFIC_THAN . #)) APPOINTMENT (P29*+0) HAS_LOCATION --> NONGEO_LOCATION (P30***) HAS_NAME=bahnhof HAS_DATE --> DATE (P29*) TEMPEX=tempex(ge_2920_3, from:[year:2002, month:jan, dom:16, tod:9:30])


=========================================== ENTERTAINMENT frame [open] =========================================== theme:

attitudes: (#) ===> ACCEPT relations: NIL ENTERTAINMENT (P22**) HAS_THEME --> DINE_OUT (P23**) HAS_LOCATION --> NONGEO_LOCATION (P24**) HAS_NAME=restaurant attitudes: (# #) ===> ACCEPT relations: NIL BOOK_ACTION (P25*) HAS_AGENT --> VM_SPEAKER (SPEAKER1) HAS_LAST_NAME= HAS_LANGUAGE=ge HAS_CHANNEL=a =========================================== ACCOMMODATION frame [open] =========================================== theme:

attitudes: (# #) ===> ACCEPT relations: ((MORE_SPECIFIC_THAN . #)) HOTEL (P17*) CONSISTS_OF --> ROOM (P19**) HAS_PRICE --> PRICE (P20**) HAS_CURRENCY=EURO HAS_AMOUNT=80 HAS_SIZE=SINGLE HAS_AMOUNT=1 HAS_AMOUNT=1 HAS_QUALITY_INDICATION=GOOD HAS_LOCATION --> NONGEO_LOCATION (P18*) HAS_NAME=stadtzentrum attitudes: (#) ===> ACCEPT relations: NIL


BOOK_ACTION (P21*) HAS_AGENT --> VM_SPEAKER (SPEAKER1) HAS_LAST_NAME= HAS_LANGUAGE=ge HAS_CHANNEL=a =========================================== TRAVELLING frame [open] =========================================== theme:

attitudes: (# #) ===> ACCEPT relations: NIL JOURNEY (P14*+0) HAS_MOVE_THERE --> MOVE (P14*) HAS_TRANSPORTATION --> RAIL (P12**) HAS_DEST_LOCATION --> CITY (P4*) HAS_NAME=hannover HAS_DEPARTURE_TIME --> DATE (P15*) TEMPEX=tempex(en_2920_1, from: [year:2002, month:jan, dom:19, pod:pm, tod:5:0])

VERBMOBIL ERGEBNISPROTOKOLL Nr. 1 ______________________________________________________________________ Teilnehmer: Sprecher B, Thompson Datum: 6.8.2002 Uhrzeit: 15:12 Uhr bis 15:13 Uhr Thema: Reise mit Treffen Unterkunft und Freizeitgestaltung ______________________________________________________________________ GESPR"ACHSERGEBNISSE:

Terminabsprache: Sprecher B und Thompson vereinbarten ein Gesch"aftstreffen am 20. Januar 2002 um 11 Uhr am Vormittag in Hannover. Sprecher B und Thompson treffen sich am 16. Januar 2002 um halb 10 in einem Bahnhof.


Reiseplanung: Eine Reise wurde vereinbart. Die Hinfahrt nach Hannover mit der Bahn beginnt am 19. Januar 2002 um 5 Uhr am Nachmittag.

Unterkunft: Ein Hotel in einem Stadtzentrum wurde vereinbart. Ein Einzelzimmer kostet 80 Euro. Thompson k"ummert sich um die Reservierung.

Freizeit: Ein Essen in einem Restaurant wurde vereinbart. Thompson k"ummert sich um die Reservierung. ______________________________________________________________________ Protokollgenerierung automatisch am 6.8.2002 15:15:58 h

A bigger example: the Dialogue of the Week ddw33

This dialogue is the Dialog der Woche ("dialogue of the week") ddw-33 from the third of November 1999, given together with the German and English summary. In the trace, each turn is shown as the spoken utterance followed by the recognized and segmented chains. Below the trace, the selected (see section 4.4.1) content of the dialogue memory is given.

;************************************************************* ; Dialog: 99_33_AC ; Time : NIL ;************************************************************* - hi, I’m calling about the trip to Hanover, in March. ; - hi and clear ; ^ t103benb2e3r1genSHALLOW ; (((GREET . 39.984) 13.328) ((INIT . 44.642) 14.880)) ; [GREET,any_topic] ; - that trip to Hanover on in on shoot ; t103benb3e4r1geySHALLOW ; (((REQUEST_COMMENT . 86.046) 10.755) ((INIT . 88.607) 11.075)) ; [REQUEST_COMMENT,travelling,has_move:[move,has_dest_location:[city, has_name=’hannover’,has_det=unknown]]] - das is’ eine wunderbare Idee, da komm’ ich doch glatt mit. ; - das ist eine wunderbare Idee


^ t104ageb4e5r1genSHALLOW (((ACCEPT . 28.081) 5.616) ((INFORM . 50.517) 10.103)) [ACCEPT,any_topic] da komme ich doch glaube nicht t104ageb5e6r1geySHALLOW (((GIVE_REASON . 54.244) 9.040) ((ACCEPT . 54.507) 9.084)) [GIVE_REASON,any_topic,has_agent:[speaker]]

- shall we travel on the first of March then and return on the second? ; - we travel on the the first of March then ; ^ t105benb6e7r1genSHALLOW ; (((SUGGEST . 95.022) 9.502) ((REJECT . 106.574) 10.657)) ; [SUGGEST,travelling,has_move:[move,has_addressee:[person, has_sex=fem,has_first_name=’’], has_departure_time:[date,tempex=’tempex(en_28042_0,[from: [dom:1,month:mar]])’]]] ; - and return on the second ; t105benb7e8r1geySHALLOW ; (((SUGGEST . 47.415) 9.483) ((REJECT . 48.738) 9.747)) ; [SUGGEST,travelling,has_move_back:[move,has_departure_time:[date, tempex=’tempex(en_28042_1,[from:dom:2])’]]] - mir w"urde es am zweiten M"arz besser passen. ; - mir w"urde es am zweiten M"arz besser passen ; ^ t106ageb8e9r1geySHALLOW ; (((SUGGEST . 59.840) 7.480) ((ACCEPT . 68.829) 8.603)) ; [SUGGEST,uncertain_scheduling,has_date:[date, tempex=’tempex(ge_28042_0,[from:[dom:2,month:mar]])’]] - good so, we we’ll leave Hamburg on the first and return on the second // I suggest we travel by train. ; - I would so we were to leave Hamburg on the first ; ^ t107benb9e10r1genSHALLOW ; (((INFORM . 96.959) 8.814) ((REJECT . 98.477) 8.952)) ; [INFORM,travelling,has_move:[move,has_source_location:[city, has_name=’hamburg’],has_departure_time:[date, tempex=’tempex(en_28042_2,[from:dom:1])’]]] ; - and we can on the second ; t107benb10e11r1geySHALLOW ; (((SUGGEST . 42.756) 7.126) ((ACCEPT . 46.975) 7.829)) ; [SUGGEST,uncertain_scheduling,has_date:[date, tempex=’tempex(en_28042_3,[from:dom:2])’]] - ja, mit dem Zug fahren finde ich gut. ; - ja mit dem Zug ; ^ t108ageb11e12r1genSHALLOW ; (((ACCEPT . 23.735) 5.933) ((SUGGEST . 27.805) 6.951)) ; [ACCEPT,travelling,has_move:[move,has_transportation:[rail]]] ; - fahren finde ich gut


t108ageb12e13r1geySHALLOW (((ACCEPT . 34.604) 8.651) ((SUGGEST . 42.556) 10.639)) [ACCEPT,travelling,has_move:[move,has_agent:[speaker]]]

- good, there is a train that leaves at ten in the morning. shall we take that train? ; - there is a train that leaves at ten in the morning so we take that train ; ^ t109benb13e14r1geySHALLOW ; (((INFORM_FEATURE . 146.674) 9.167) ((SUGGEST . 151.907) 9.494)) ; [INFORM_FEATURE,travelling,has_move_there:[move,has_transportation: [rail],has_arrival_time:[date, tempex=’tempex(en_28042_4,[from:tod:10:0])’], has_arrival_time:[date,tempex=’tempex(en_28042_5, [from:pod:morning_ger1])’],has_transportation:[rail]]] - ja, dieser Zeitpunkt passt mir sehr gut. ; - ja diese Zeitpunkt pa"st mir sehr gut ; ^ t110ageb14e15r1geySHALLOW ; (((ACCEPT . 50.438) 7.205) ((SUGGEST . 64.508) 9.215)) ; [ACCEPT,any_topic] - good, do you know a hotel in Hanover? ; - good did you know the hotel in Hanover ; ^ t111benb15e16r1geySHALLOW ; (((INFORM_FEATURE . 75.444) 9.430) ((SUGGEST . 83.008) 10.376)) ; [INFORM_FEATURE,accommodation,has_agent:[addressee], has_accommodation:[hotel,has_location:[city,has_name=’hannover’, has_det=undet],has_det=def]] - ja, ich kenne mehrere Hotels in Hannover. ; - ja ich kenne mehrere Hotels in Hannover ja ; ^ t112ageb16e17r1geySHALLOW ; (((INFORM . 69.915) 8.739) ((SUGGEST . 74.666) 9.333)) ; [INFORM,accommodation,has_agent:[speaker],has_accommodation:[hotel, has_amount_indication=several,has_location:[city,has_name=’hannover’, has_det=undet]]] - good. can you make reservations for us there? ; - you make reservations four us there ; ^ t113benb17e18r1geySHALLOW ; (((REQUEST_SUGGEST . 72.282) 12.047) ((INFORM_FEATURE . 73.725) 12.287)) ; [REQUEST_SUGGEST,any_topic,has_book_action:[book_action, has_agent:[addressee]]] - k"onn’ Sie das bitte wiederholen? ; - k"onnen Sie das bitte wiederholen ; ^ t114ageb18e19r1geySHALLOW ; (((REQUEST . 36.478) 7.295) ((REQUEST_SUGGEST . 50.099) 10.019)) ; [REQUEST,any_topic,has_agent:[addressee]]


- yes. will you please make reservations at a hotel? ; - would you please make reservations at the hotel ; ^ t115benb19e20r1geySHALLOW ; (((COMMIT . 75.067) 9.383) ((SUGGEST . 83.135) 10.391)) ; [COMMIT,accommodation,has_book_action:[book_action,has_agent: [addressee],has_book_theme:[hotel,has_loc_spec=at,has_det=def]]] - ja, welches Hotel von den dreien m\"ochten wir nehm’? ; - ja welches Hotel von den dreien m"ochten wir nehmen ; ^ t116ageb20e21r1geySHALLOW ; (((INIT . 84.874) 9.430) ((INFORM . 95.995) 10.666)) ; [INIT,accommodation,has_book_action:[book_action, has_book_theme:[hotel]]] - you choose. ; - yeah she is ; ^ t117benb21e22r1geySHALLOW ; (((ACCEPT . 36.332) 12.110) ((INFORM . 41.427) 13.809)) ; [ACCEPT,any_topic] - ich schlage das Hotel Luisenhof vor. ; - ich schlage das Hotel lohnt diese denn Hof vor ; ^ t118ageb22e23r1geySHALLOW ; (((SUGGEST . 92.379) 10.264) ((COMMIT . 99.272) 11.030)) ; [SUGGEST,accommodation,has_agent:[speaker], has_accommodation:[hotel,has_det=def]] - good. I trust your choice. why don’t we meet at the station on Wednesday? ; - good I will trust your choice ; ^ t119benb23e24r1genSHALLOW ; (((ACCEPT . 67.657) 11.276) ((BYE . 69.806) 11.634)) ; [ACCEPT,any_topic,has_agent:[speaker]] ; - why don’t we meet at the station on Wednesday ; t119benb24e25r1geySHALLOW ; (((SUGGEST . 67.800) 7.533) ((REQUEST_SUGGEST . 94.176) 10.464)) ; [SUGGEST,scheduling,has_appointment:[appointment, has_location:[nongeo_location,has_name=’bahnhof’, has_loc_spec=at,has_det=def],has_date:[date, tempex=’tempex(en_28042_7,[from:dow:wed])’]]] - ja, dann treffen wir uns doch am Mittwoch am Bahnhof. ; - ja dann treffen wir uns doch am Mittwoch am Bahnhof ; ^ t120ageb25e26r1geySHALLOW ; (((ACCEPT . 60.055) 6.005) ((SUGGEST . 62.715) 6.271)) ; [ACCEPT,scheduling,has_appointment:[appointment,has_date:[date, tempex=’tempex(ge_28042_1,[from:dow:wed])’], has_location:[nongeo_location,has_name=’bahnhof’, has_loc_spec=at,has_det=def]]]


- good. at nine forty five $A-$M. ; - great at nine ; ^ t121benb26e27r1genSHALLOW ; (((ACCEPT . 29.854) 9.951) ((SUGGEST . 34.135) 11.378)) ; [ACCEPT,uncertain_scheduling,has_date:[date, tempex=’tempex(en_28042_8,[from:tod:9:0])’]] ; - forty five $A-$M ; t121benb27e28r1geySHALLOW ; (((ACCEPT . 47.872) 15.957) ((INFORM_FEATURE . 48.404) 16.134)) ; [ACCEPT,uncertain_scheduling,has_date:[date, tempex=’tempex(en_28042_9,[from:pod:am])’]] - einverstanden. um Viertel vor zehn treffen wir uns am Bahnhof. ; - einverstanden um Viertel vor zehn treffen wir uns am Bahnhof ; ^ t122ageb28e29r1geySHALLOW ; (((ACCEPT . 74.036) 7.403) ((SUGGEST . 78.333) 7.833)) ; [ACCEPT,scheduling,has_appointment:[appointment,has_date:[date, tempex=’tempex(ge_28042_2,[from:tod:9:45])’], has_location:[nongeo_location,has_name=’bahnhof’, has_loc_spec=at,has_det=def]]] - good. see you there. ; - good see you then ; ^ t123benb29e30r1geySHALLOW ; (((BYE . 24.033) 6.008) ((ACCEPT . 31.635) 7.908)) ; [BYE,any_topic,has_agent:[addressee]] - ja, alles klar. wann fahren wir zur"uck? ; - ja alles klar ; ^ t124ageb30e31r1genSHALLOW ; (((ACCEPT . 17.700) 5.900) ((BYE . 20.166) 6.722)) ; [ACCEPT,any_topic] ; - wann fahren wir zur"uck ; t124ageb31e32r1geySHALLOW ; (((REQUEST_SUGGEST . 38.082) 9.520) ((SUGGEST . 43.212) 10.803)) ; [REQUEST_SUGGEST,travelling,has_move_back:[move, has_departure_time:[date]]] - I think, we should return on Thursday evening. there is a six thirty train. why don’t we take that one? ; - should return on Thursday evening ; ^ t125benb32e33r1genSHALLOW ; (((SUGGEST . 54.203) 10.840) ((ACCEPT . 62.967) 12.593)) ; [SUGGEST,travelling,has_move_back:[move,has_departure_time:[date, tempex=’tempex(en_28042_10,[from:[dow:thu,pod:evening]])’]]] ; - there is the six thirty ; t125benb33e34r1genSHALLOW ; (((INFORM_FEATURE . 44.202) 8.840) ((SUGGEST . 46.394) 9.278))


[INFORM_FEATURE,uncertain_scheduling,has_date:[date, tempex=’tempex(en_28042_11,[from:tod:6:30])’]] train why don’t we take that one t125benb34e35r1geySHALLOW (((SUGGEST . 73.156) 10.450) ((REQUEST_SUGGEST . 90.955) 12.993)) [SUGGEST,travelling,has_move:[move,has_transportation:[rail]]]

- gut. um Donnerstach Abend fahren wir zur"uck. super Sache. ; - gut und Donnerstag abend fahren wir zur"uck ; ^ t126ageb35e36r1genSHALLOW ; (((SUGGEST . 57.070) 8.152) ((ACCEPT . 64.842) 9.263)) ; [SUGGEST,travelling,has_move_back:[move,has_departure_time:[date, tempex=’tempex(ge_28042_3,[from:[dow:thu,pod:evening]])’]]] ; - super Sache ; t126ageb36e37r1geySHALLOW ; (((ACCEPT . 22.597) 11.298) ((INFORM . 33.639) 16.819)) ; [ACCEPT,any_topic] - excellent. so, I see you on Wednesday morning. bye. ; - excellent so I will see you on Wednesday morning ; ^ t127benb37e38r1genSHALLOW ; (((ACCEPT . 66.313) 7.368) ((SUGGEST . 71.151) 7.905)) ; [ACCEPT,uncertain_scheduling,has_date:[date, tempex=’tempex(en_28042_13,[from:[dow:wed,pod:morning_ger1]])’]] ; - bye ; t127benb38e39r1geySHALLOW ; (((BYE . 10.547) 10.547) ((ACCEPT . 20.013) 20.013)) ; [BYE,any_topic] - m"ochten Sie die Preise vom Hotel Luisenhof noch wissen? ; - m"ochten Sie die Preise vom Hotel Juli diesen Hof noch wissen ; ^ t128ageb39e40r1geySHALLOW ; (((REQUEST . 124.347) 11.304) ((SUGGEST . 126.908) 11.537)) ; [REQUEST,accommodation,has_agent:[addressee],has_accommodation: [hotel,has_det=def],has_date:[date, tempex=’tempex(ge_28042_4,[from:month:jul])’]] - no, that’s okay. you take care of that. ; - no that is okay could take care of that mhm ; ^ t129benb40e41r1geySHALLOW ; (((REQUEST_SUGGEST . 93.184) 9.318) ((ACCEPT . 98.854) 9.885)) ; [REQUEST_SUGGEST,any_topic] ; - that ; ^ t130aenb41e42r1geySHALLOW ; (((ACCEPT . 13.415) 13.415) ((INFORM . 14.796) 14.796)) ; [ACCEPT,any_topic] - tsch"uss. ; - tsch"u"s


^ t131bgeb42e43r1geySHALLOW (((BYE . 10.995) 10.995) ((ACCEPT . 19.939) 19.939)) [BYE,any_topic]

- bye. ; - bye ; ^ t132aenb43e44r1geySHALLOW ; (((BYE . 8.090) 8.090) ((ACCEPT . 20.824) 20.824)) ; [BYE,any_topic]
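The structure of the trace lines can be illustrated with a small decoder. Note that the decomposition of the segment identifiers (e.g. t103benb2e3r1genSHALLOW) and the reading of the hypothesis lists, where each dialogue act is paired with a total and a per-word score and the lowest-scoring hypothesis is the selected one, are assumptions inferred from the examples in the traces above, not the documented VerbMobil naming convention:

```python
import re

# Hedged sketch: the segment identifiers appear to decompose as below.
# Every field interpretation here is an assumption based only on the
# examples in the traces.
SEG_ID = re.compile(
    r"t(?P<turn>\d+)"           # turn number
    r"(?P<channel>[ab])"        # speaker channel
    r"(?P<lang>en|ge)"          # language of the segment
    r"b(?P<begin>\d+)"          # begin boundary of the segment
    r"e(?P<end>\d+)"            # end boundary of the segment
    r"r(?P<run>\d+)"            # recognizer run
    r"(?:en|ge)(?P<last>[yn])"  # (assumed) flag: last segment of the turn
    r"(?P<depth>SHALLOW|DEEP)"  # depth of the analysis
)

def best_act(hypotheses):
    """Select the dialogue act with the lowest total score.

    `hypotheses` mirrors the Lisp association lists in the trace, e.g.
    (((GREET . 39.984) 13.328) ((INIT . 44.642) 14.880)), rendered here
    as [(act, total_score, per_word_score), ...]. Lower scores are
    assumed to be better, as with negative log probabilities."""
    return min(hypotheses, key=lambda h: h[1])[0]

m = SEG_ID.match("t103benb2e3r1genSHALLOW")
act = best_act([("GREET", 39.984, 13.328), ("INIT", 44.642, 14.880)])
```

Applied to the first segment of the trace above, the decoder yields turn 103, channel b, English input, and GREET as the selected dialogue act, matching the annotation in the trace.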

Here is the content of the dialogue memory: DIA(119): (dm-show-tm) ====================== DIALOGUE FRAME ====================== begin: 26. September 2002, 10:11 pm end: 1. October 2002, 1:10 am participants: VM_SPEAKER (SPEAKER1) [en] [b] VM_SPEAKER (SPEAKER2) [FEM] [] [ge] [a]

=========================================== ACCOMMODATION frame [open] =========================================== theme:

attitudes: (#) ===> ACCEPT relations: ((MORE_SPECIFIC_THAN . #)) BOOK_ACTION (P24*) HAS_AGENT --> VM_SPEAKER (SPEAKER2) HAS_SEX=FEM HAS_FIRST_NAME= HAS_LANGUAGE=ge HAS_CHANNEL=a HAS_BOOK_THEME --> HOTEL (P25*) =========================================== TRAVELLING frame [open] =========================================== theme:

attitudes: (# #) ===> ACCEPT


relations: NIL JOURNEY (P47*+0) HAS_MOVE_THERE --> MOVE (P45**) HAS_TRANSPORTATION --> RAIL (P46**) HAS_ARRIVAL_TIME --> DATE (P18*****) TEMPEX=tempex(en_28042_5, from:[year:2002, month:mar, dom:2, pod: morning_ger1]) HAS_ADDRESSEE --> VM_SPEAKER (SPEAKER2) HAS_SEX=FEM HAS_FIRST_NAME= HAS_LANGUAGE=ge HAS_CHANNEL=a HAS_DEPARTURE_TIME --> DATE (P44***) TEMPEX=tempex(en_28042_11, from:[year:2002, month:feb, dom:28, pod:evening, tod:6:30]) HAS_MOVE_BACK --> MOVE (P47*) HAS_DEPARTURE_TIME --> DATE (P48*) TEMPEX= tempex(ge_28042_3, from:[year:2002, month:feb, dom:28, dow:thu, pod:evening]) =========================================== SCHEDULING frame [open] =========================================== theme: attitudes: (#) ===> ACCEPT relations: ((MORE_SPECIFIC_THAN . #)) APPOINTMENT (P37**) HAS_DATE --> DATE (P38**) TEMPEX=tempex(ge_28042_2, from:[year:2002, month:feb, dom:27, pod:am, tod:9:45]) HAS_LOCATION --> NONGEO_LOCATION (P39**)


HAS_NAME=bahnhof

This is the German summary. Please note that Michelle is a confabulation. VERBMOBIL ERGEBNISPROTOKOLL Nr. 1 ______________________________________________________________________ Teilnehmer: Sprecher B, Frau Michelle Datum: 2.9.2003 Uhrzeit: 15:12 Uhr bis 15:13 Uhr Thema: Reise mit Treffen und Unterkunft ______________________________________________________________________ GESPR"ACHSERGEBNISSE:

Terminabsprache: Sprecher B und Frau Michelle treffen sich am 26. Februar 2003 um viertel vor 10 am Morgen am Bahnhof.

Reiseplanung: Eine Reise wurde vereinbart. Die Hinfahrt findet mit der Bahn statt. Die Hinfahrt dauert vom 27. Februar 2003 um halb 7 am Abend bis dem 2. M"arz 2003 am Morgen. Die R"uckreise beginnt am Donnerstag Abend am 27. Februar 2003.

Unterkunft: Frau Michelle reserviert ein Hotel. ____________________________________________________________________ Protokollgenerierung automatisch am 2.9.2003 15:15:58


Next, the English summary: VERBMOBIL SUMMARY Nr. 1 ____________________________________________________________________ Participants: Michelle, Speaker B Date: 2.9.2003 Time: 15:12 until 15:13 Uhr Theme: Appointment schedule with trip and accommodation ____________________________________________________________________ RESULTS: Scheduling: Speaker B and Michelle will meet in the train station on the 26. of february 2003 at a quarter to 10 in the morning. Traveling: A trip was agreed on. The trip there is by train. The trip there lasts from the 27. of february 2003 at six thirty pm until the 2. of march in the morning. The trip back starts an thursday the 27. of february 2003. Accommodation: Michelle is taking care of the hotel reservation. ____________________________________________________________________ Summary generation automatically 2.9.2003 15:15:58


The VerbMobil sortal ontology

Sort              Examples
property          Angst, Interesse
field             Informatik, Wirtschaft, Sport
info_content      Programm, Plan
institution       BASF, Universität
symbol            Bindestrich, Punkt
meeting_sit       treffen, Konferenz, Sitzung
communicat_sit    sagen, danken, Gespräch
action_sit        arbeiten, schreiben, handeln
move_sit          fahren, legen, gehen, Reise
position_sit      liegen, stehen
temp_sit          anfangen, beenden, dauern
mental_sit        passen, denken, Annahme
time              Sommer, Abend, dofw, mofy
human             Frau, Kollege, Leute
animal            Käfer, Hund
instrument        Computer, Telefon
info_bearer       Kalender, Unterlagen
food              Bier, Abendessen
vehicle           Auto, Zug
substance         Luft
geo_location      Berlin, Saarbrücken
nongeo_location   Raum, Gebäude

Figure 4: The sortal ontology in VerbMobil. The tree layout of the figure did not survive extraction; the table lists its leaf sorts with their example words. The internal nodes of the hierarchy include anything, abstract, temporal, situation, space_time, entity, agentive_object, thing, and location.
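A typical use of such a sortal hierarchy is testing whether one sort subsumes another. The sketch below encodes a fragment of the hierarchy as a child-to-parent map with a transitive subsumption check; the individual parent links are assumptions read off Figure 4 and are illustrative only, not the complete VerbMobil ontology:

```python
# Illustrative fragment of the sort hierarchy as a child -> parent map.
# The specific links are assumptions inferred from Figure 4; the full
# VerbMobil ontology contains many more sorts.
PARENT = {
    "geo_location": "location",
    "nongeo_location": "location",
    "human": "agentive_object",
    "animal": "agentive_object",
    "vehicle": "thing",
    "food": "thing",
    "agentive_object": "entity",
    "thing": "entity",
    "entity": "anything",
    "location": "anything",
    "abstract": "anything",
}

def subsumes(general, sort):
    """Return True if `general` equals `sort` or is one of its ancestors."""
    while sort is not None:
        if sort == general:
            return True
        sort = PARENT.get(sort)  # climb one level; None at the root
    return False
```

Under these assumed links, for example, anything subsumes vehicle, while entity does not subsume geo_location.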

References

Alexandersson, J., & Becker, T. (2001). Overlay as the Basic Operation for Discourse Processing in a Multimodal Dialogue System. In Workshop Notes of the IJCAI-01 Workshop on “Knowledge and Reasoning in Practical Dialogue Systems” (pp. 8–14). Seattle, Washington.
Alexandersson, J., & Becker, T. (2003a). Default Unification for Discourse Modelling. In H. Bunt & R. Muskens (Eds.), Computing Meaning (Vol. 3). Dordrecht: Kluwer Academic Publishers. (Forthcoming)
Alexandersson, J., & Becker, T. (2003b). The Formal Foundations Underlying Overlay. In Proceedings of the 5th International Workshop on Computational Semantics (IWCS-5). Tilburg, The Netherlands.
Alexandersson, J., Buschbeck-Wolf, B., Fujinami, T., Kipp, M., Koch, S., Maier, E., Reithinger, N., Schmitz, B., & Siegel, M. (1998). Dialogue Acts in VERBMOBIL-2 – Second Edition (Verbmobil-Report No. 226). DFKI Saarbrücken, Universität Stuttgart, Technische Universität Berlin, Universität des Saarlandes.
Alexandersson, J., Engel, R., Kipp, M., Koch, S., Küssner, U., Reithinger, N., & Stede, M. (2000). Modeling Negotiation Dialogues. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 441–451). Springer-Verlag.
Alexandersson, J., & Poller, P. (1998). Towards Multilingual Protocol Generation for Spontaneous Speech Dialogues. In Proceedings of the 9th International Workshop on Natural Language Generation (INLG98) (pp. 198–207). Niagara-on-the-Lake, Ontario, Canada.
Alexandersson, J., & Poller, P. (2000). Generating Multilingual Dialog Summaries and Minutes. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 509–518). Springer-Verlag.

Alexandersson, J., Poller, P., Kipp, M., & Engel, R. (2000). Multilingual Summary Generation in a Speech-to-Speech Translation System for Multilingual Dialogues. In Proceedings of the International Natural Language Generation Conference (INLG-2000) (pp. 148–155). Mitzpe Ramon, Israel.
Alexandersson, J., & Reithinger, N. (1997). Learning Dialogue Structures from a Corpus. In Proceedings of the 5th European Conference on Speech Communication and Technology (EUROSPEECH-97) (pp. 2231–2235). Rhodes.
Alexandersson, J., Reithinger, N., & Maier, E. (1997). Insights into the Dialogue Processing of Verbmobil. In Proceedings of the 5th Conference on Applied Natural Language Processing (ANLP-97) (pp. 33–40). Washington, DC.
Allen, J., & Core, M. (1997, October). Draft of DAMSL: Dialog Act Markup in Several Layers.
Allen, J., Schubert, L., Ferguson, G., Heeman, P., Hwang, C. H., Kato, T., Light, M., Martin, N., Miller, B., Poesio, M., & Traum, D. (1995). The TRAINS Project: A Case Study in Defining a Conversational Planning Agent. Journal of Experimental and Theoretical AI, 7, 7–48.
Allen, J. F., Miller, B. W., Ringger, E. K., & Sikorski, T. (1996). A Robust System for Natural Spoken Dialogue. In A. Joshi & M. Palmer (Eds.), Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL’96) (pp. 62–70). San Francisco: Morgan Kaufmann Publishers.
Allen, J. F., & Perrault, C. R. (1980). Analyzing Intention in Utterances. Artificial Intelligence, 15, 143–178.
Allwood, J. (1976). Linguistic Communication as Action and Cooperation – A Study in Pragmatics. Doctoral dissertation, Department of Linguistics, University of Göteborg. (Gothenburg Monographs in Linguistics 2)
Allwood, J. (1977). A Critical Look at Speech Act Theory. In O. Dahl (Ed.), Logic, Pragmatics and Grammars (pp. 53–99). Lund: Studentlitteratur.
Allwood, J. (1994). Obligations and Options in Dialogue. Think, 3, 9–18.

Allwood, J. (1995a). An Activity Based Approach to Pragmatics (Tech. Rep. No. 76). Göteborg: Department of Linguistics, University of Gothenburg. (ISSN 0349-1021)
Allwood, J. (1995b). Dialog as Collective Thinking. In P. Pylkkänen & P. Pylkkö (Eds.), New Directions in Cognitive Science. Helsinki. (Also in Pylkkänen, P., Pylkkö, P., & Hautamäki, A. (Eds.), 1997, Brain, Mind and Physics, pp. 205–211, Amsterdam: IOS Press)
Allwood, J. (2000). An Activity Based Approach to Pragmatics. In H. Bunt & B. Black (Eds.), Abduction, Belief and Context in Dialogue (pp. 47–80). Amsterdam: John Benjamins.
Allwood, J. (2001). Structure of Dialog. In M. Taylor, D. Bouwhuis, & F. Neal (Eds.), The Structure of Multimodal Dialogue II (pp. 3–24). Amsterdam: John Benjamins.
Allwood, J., Nivre, J., & Ahlsén, E. (1992). On the Semantics and Pragmatics of Linguistic Feedback. Journal of Semantics, 9, 1–26.
Antoniou, G. (1999). A Tutorial on Default Logics. ACM Computing Surveys (CSUR), 31 (4), 337–359.
Auerswald, M. (2000). Example-Based Machine Translation with Templates. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 420–429). Springer-Verlag.
Austin, J. (1962). How To Do Things with Words. Oxford: Clarendon Press.
Baggia, P., Kellner, A., Perennou, G., Popovici, C., Sturm, J., & Wessel, F. (1999). Language Modelling and Spoken Dialogue Systems – the ARISE Experience. In Proceedings of the 6th European Conference on Speech Communication and Technology (EUROSPEECH-99) (pp. 1767–1770). Budapest, Hungary.
Batliner, A., Buckow, J., Niemann, H., Nöth, E., & Warnke, V. (2000). The Prosody Module. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 106–121). Springer-Verlag.
Becker, T., Finkler, W., Kilger, A., & Poller, P. (1998). An Efficient Kernel for Multilingual Generation in Speech-to-Speech Dialogue Translation. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING/ACL-98) (pp. 110–116). Montreal, Quebec, Canada.
Becker, T., Kilger, A., Lopez, P., & Poller, P. (2000a). An Extended Architecture for Robust Generation. In Proceedings of the International Natural Language Generation Conference (INLG-2000) (pp. 63–68). Mitzpe Ramon, Israel.
Becker, T., Kilger, A., Lopez, P., & Poller, P. (2000b). The Verbmobil Generation Component VM-GECO. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 483–498). Springer-Verlag.
Becker, T., & Löckelt, M. (2000). Liliput: A Parameterizable Finite-Domain Constraint Solving Framework and its Evaluation with Natural Language Generation Problems. In TRICS: Techniques for Implementing Constraint Programming Systems, a CP 2000 Workshop (pp. 101–117). Singapore.
Bilange, E. (1991). A Task Independent Oral Dialogue Model. In Proceedings of the 5th Meeting of the European Chapter of the Association for Computational Linguistics (EACL’91) (pp. 83–88). Berlin, Germany.
Birkenhauer, C. (1998). Das Dialoggedächtnis des Übersetzungssystems Verbmobil. Diplomarbeit, Universität des Saarlandes.
Block, H. U., Schachtl, S., & Gehrke, M. (2000). Adapting a Large Scale MT System for Spoken Language. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 396–412). Springer-Verlag.
Borgida, A., Brachman, R. J., McGuinness, D. L., & Resnick, L. A. (1989). CLASSIC: A Structural Data Model for Objects. In Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data (pp. 59–67). Oregon.
Boros, M., Eckert, W., Gallwitz, F., Görz, G., Hanrieder, G., & Niemann, H. (1996). Towards Understanding Spontaneous Speech: Word Accuracy vs. Concept Accuracy. In Proceedings of the International Conference on Spoken Language Processing (ICSLP-96) (Vol. 2, pp. 1009–1012). Philadelphia, PA.

Bos, J., & Heine, J. (2000). Discourse and Dialog Semantics for Translation. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 337–348). Springer-Verlag.
Bos, J., Schiehlen, M., & Egg, M. (1996). Definition of the Abstract Semantic Classes for the Verbmobil Forschungsprototyp 1.0 (Verbmobil-Report No. 165). Universität des Saarlandes, IBM Heidelberg, Universität Stuttgart. (ISSN 1434-8845)
Bouma, G. (1990). Defaults in Unification Grammar. In Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics (pp. 165–172). University of Pittsburgh, Pennsylvania, USA.
Boye, J. (1996). Directional Types in Logic Programming (Doctoral dissertation, Linköping Institute of Technology). Linköping Studies in Science and Technology.
Brachman, R., & Schmolze, J. (1985). An Overview of the KL-ONE Knowledge Representation System. Cognitive Science, 9 (2), 171–216.
Brown, G. P. (1980). Characterizing Indirect Speech Acts. American Journal of Computational Linguistics, 6 (3–4), 150–166.
Bunt, H. (1989). Information Dialogues as Communicative Action in Relation to Partner Modelling and Information Processing. In M. Taylor, F. Néel, & D. Bouwhuis (Eds.), The Structure of Multimodal Dialogue (pp. 47–73). Amsterdam: North-Holland Elsevier.
Bunt, H. (1994). Context and Dialogue Control. Think Quarterly, 3 (1), 19–31.
Bunt, H. (1995). Dialogue Control Functions and Interaction Design. In R. Beun, M. Baker, & M. Reiner (Eds.), Dialogue and Instruction (pp. 197–214). Heidelberg: Springer Verlag.
Bunt, H. (2000). Dialogue Pragmatics and Context Specification. In H. C. Bunt & W. J. Black (Eds.), Abduction, Belief and Context in Dialogue. Studies in Computational Pragmatics (Vol. 1, pp. 81–150). Amsterdam: John Benjamins Publishing Company.
Bunt, H. C. (1996). Dynamic Interpretation and Dialogue Theory. In M. M. Taylor, F. Néel, & D. G. Bouwhuis (Eds.), The Structure of Multimodal Dialogue, Volume 2. Amsterdam: John Benjamins.

Burger, S. (1997). Transliteration spontansprachlicher Daten (Verbmobil Technisches Dokument No. 56). Universität München.
Buschbeck-Wolf, B. (1997). Resolution on Demand (Verbmobil-Report No. 196). IMS, Universität Stuttgart.
Bußmann, H. (1990). Lexikon der Sprachwissenschaft (2nd ed.). Stuttgart: Alfred Kröner Verlag.
Cahill, L., Doran, C., Evans, R., Mellish, C., Paiva, D., Reape, M., Scott, D., & Tipper, N. (1999). In Search of a Reference Architecture for NLG Systems. In Proceedings of the 7th European Workshop on Natural Language Generation (pp. 77–85). Toulouse.
Cahill, L., Doran, C., Evans, R., Mellish, C., Paiva, D., Reape, M., Scott, D., & Tipper, N. (2000). Reinterpretation of an Existing NLG System in a Generic Generation Architecture. In Proceedings of the International Natural Language Generation Conference (INLG-2000) (pp. 69–76). Mitzpe Ramon, Israel.
Carberry, S. (1990). Plan Recognition in Natural Language Dialogue. Cambridge, MA: The MIT Press.
Carbonell, J. G., & Goldstein, J. (1998). The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 335–336). Melbourne, Australia: ACM.
Carletta, J. (1996). Assessing Agreement on Classification Tasks: The Kappa Statistic. Computational Linguistics, 22 (2), 249–254.
Carletta, J., Dahlbäck, N., Reithinger, N., & Walker, M. A. (Eds.). (1997). Standards for Dialogue Coding in Natural Language Processing. Schloß Dagstuhl. (Seminar Report 167)
Carletta, J., Isard, A., Isard, S., Kowtko, J. C., Doherty-Sneddon, G., & Anderson, A. H. (1997a). The Reliability of a Dialogue Structure Coding Scheme. Computational Linguistics, 23 (1), 13–31.
Carletta, J., Isard, A., Isard, S., Kowtko, J. C., Doherty-Sneddon, G., & Anderson, A. H. (1997b). The Reliability of a Dialogue Structure Coding Scheme. Computational Linguistics, 23 (1), 13–32.

Carlson, L. (1983). Dialogue Games: An Approach to Discourse Analysis (Vol. 17). Dordrecht, Holland: D. Reidel Publishing Company.
Carpenter, B. (1992). The Logic of Typed Feature Structures. Cambridge, England: Cambridge University Press.
Carpenter, B. (1993). Skeptical and Credulous Default Unification with Applications to Templates and Inheritance. In T. Briscoe, V. de Paiva, & A. Copestake (Eds.), Inheritance, Defaults, and the Lexicon (pp. 13–37). Cambridge, England: Cambridge University Press.
Carroll, J., Copestake, A., Flickinger, D., & Poznanski, V. (1999). An Efficient Chart Generator for (Semi-)Lexicalist Grammars. In Proceedings of the 7th European Workshop on Natural Language Generation (pp. 86–95). Toulouse, France.
Charniak, E., Riesbeck, C. R., McDermott, D. V., & Meehan, J. R. (1987). Artificial Intelligence Programming (2nd ed.). Hillsdale, NJ: Erlbaum.
Chen, S. F. (1996). Building Probabilistic Models for Natural Language. Doctoral dissertation, Harvard University, Cambridge, Massachusetts.
Chu-Carroll, J. (1998). A Statistical Model for Discourse Act Recognition in Dialogue Interactions. In J. Chu-Carroll & N. Green (Eds.), Working Notes from the AAAI Spring Symposium on Applying Machine Learning to Discourse Processing (pp. 12–17). Stanford University.
Chu-Carroll, J., & Brown, M. K. (1997). Tracking Initiative in Collaborative Dialogue Interactions. In Proceedings of the Association for Computational Linguistics (ACL'97) (pp. 262–270). Madrid, Spain.
Chu-Carroll, J., & Brown, M. K. (1998). An Evidential Model for Tracking Initiative in Collaborative Dialogue Interactions. User Modeling and User-Adapted Interaction, 8 (3–4), 215–253.
Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 20, 37–46.
Cohen, P. R. (1978). On Knowing What to Say: Planning Speech Acts. Doctoral dissertation, Department of Computer Science, University of Toronto, Ontario. (Available as Technical Report TR 118)
Copestake, A. (1993). The Representation of Lexical Semantic Information. Doctoral dissertation, University of Sussex.

Dahlbäck, N., & Jönsson, A. (1992). An Empirically-Based Computationally Tractable Dialogue Model. In Proceedings of the 14th Annual Meeting of the Cognitive Science Society (pp. 785–790). Bloomington, Indiana.
Dahlbäck, N., Jönsson, A., & Ahrenberg, L. (1993). Wizard of Oz Studies – Why and How. Knowledge-Based Systems, 6 (4), 258–266.
Dale, R. (1995). An Introduction to Natural Language Generation (Tech. Rep.). Microsoft Research Institute (MRI), Macquarie University. (Presented at the 1995 European Summer School on Logic, Language and Information)
Denecke, M. (1999). Integrating Knowledge Sources for the Specification of a Task-Oriented System. In J. Alexandersson, L. Ahrenberg, K. Jokinen, & A. Jönsson (Eds.), Proceedings of the IJCAI'99 Workshop 'Knowledge and Reasoning in Practical Dialogue Systems' (pp. 33–40). Stockholm.
Deransart, P., & Maluszyński, J. (1993). A Grammatical View of Logic Programming. MIT Press.
Dorna, M. (1996). The ADT Package for the Verbmobil Interface Term (Verbmobil-Report No. 104). Universität Stuttgart. (ISSN 1434-8845)
Dorna, M. (2000). A Library Package for the Verbmobil Interface Term (Verbmobil-Report No. 238). IMS, Universität Stuttgart. (ISSN 1434-8845)
Earley, J. (1970). An Efficient Context-Free Parsing Algorithm. Communications of the ACM, 13 (2), 94–102.
Eckert, M., & Strube, M. (2000). Dialogue Acts, Synchronising Units and Anaphora Resolution. Journal of Semantics, 17 (1), 51–89.
Emele, M., Dorna, M., Lüdeling, A., Zinsmeister, H., & Rohrer, C. (2000). Semantic-Based Transfer. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 359–376). Springer-Verlag.
Endriss, U. (1998). Semantik zeitlicher Ausdrücke in Terminvereinbarungsdialogen (Verbmobil-Report No. 227). Technische Universität Berlin.
Feinstein, A., & Cicchetti, D. (1990). High Agreement but Low Kappa: I. The Problems of Two Paradoxes. Journal of Clinical Epidemiology, 43 (6), 543–549.

Ferguson, G. M., Allen, J. F., Miller, B. W., & Ringger, E. K. (1996). The Design and Implementation of the TRAINS-96 System: A Prototype Mixed-Initiative Planning Assistant (Tech. Rep.). University of Rochester: Computer Science Department. (TRAINS Technical Note 96-5)
Fikes, R. E., & Nilsson, N. J. (1990). STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. In J. Allen, J. Hendler, & A. Tate (Eds.), Readings in Planning (pp. 88–97). San Mateo, CA: Kaufmann.
Flickinger, D., Copestake, A., & Sag, I. A. (2000). HPSG Analysis of English. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 255–264). Springer-Verlag.
Fouvry, F. (2000). Robust Unification for Linguistics. In 1st Workshop on RObust Methods in Analysis of Natural language Data (ROMAND'00) (pp. 77–88). Lausanne.
Fouvry, F. (2003). Constraint Relaxation with Weighted Feature Structures. In International Workshop on Parsing Technologies (IWPT'03) (pp. 103–114). Nancy, France.
Gates, D., Lavie, A., Levin, L., Waibel, A., Gavaldà, M., Mayfield, L., Woszczyna, M., & Zahn, P. (1996). End-to-End Evaluation in JANUS: A Speech-to-Speech Translation System. In E. Maier, M. Mast, & S. LuperFoy (Eds.), Proceedings of the ECAI-96 Workshop on Dialogue Processing in Spoken Language Systems (pp. 195–206). Budapest.
Goffman, E. (1981). Forms of Talk. Oxford: Basil Blackwell.
Gordon, D., & Lakoff, G. (1975). Conversational Postulates. In P. Cole & J. Morgan (Eds.), Syntax and Semantics (Vol. 3: Speech Acts, pp. 83–106). New York: Academic Press.
Grice, H. P. (1975). Logic and Conversation. In P. Cole & J. Morgan (Eds.), Syntax and Semantics (Vol. 3: Speech Acts, pp. 41–58). New York: Academic Press.
Grimes, J. E. (1975). The Thread of Discourse. The Hague: Mouton.
Grosz, B., Joshi, A., & Weinstein, S. (1995). Centering: A Framework for Modeling the Local Coherence of Discourse. Computational Linguistics, 21 (2), 203–225.

Grosz, B., & Sidner, C. (1986). Attention, Intentions and the Structure of Discourse. Computational Linguistics, 12 (3), 175–204.
Grove, W. M., Andreasen, N. C., McDonald-Scott, P., Keller, M. B., & Shapiro, R. W. (1981). Reliability Studies of Psychiatric Diagnosis, Theory and Practice. Archives of General Psychiatry, 38, 408–413.
Grover, C., Brew, C., Manandhar, S., & Moens, M. (1994). Priority Union and Generalization in Discourse Grammars. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (pp. 17–24). Las Cruces, New Mexico.
Guinn, C. I. (1996). Mechanisms for Mixed-Initiative Human-Computer Collaborative Discourse. In A. Joshi & M. Palmer (Eds.), Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL'96) (pp. 278–285). San Francisco: Morgan Kaufmann Publishers.
Gwet, K. (2001). Handbook of Inter-Rater Reliability. STATAXIS Publishing Company.
Heeman, P., & Strayer, S. (2001). Adaptive Modeling of Dialogue Initiative. In Proceedings of the NAACL Workshop on Adaptation in Dialogue Systems (pp. 79–80). Pittsburgh, USA.
Heinecke, J., & Worm, K. (1996). The Verbmobil Semantic Database (Verbmobil-Report No. 106). Humboldt-Universität Berlin, Universität des Saarlandes. (ISSN 1434-8845)
Heisterkamp, P., & McGlashan, S. (1996). Units of Dialogue Management: An Example. In Proceedings of the International Conference on Spoken Language Processing (ICSLP-96) (pp. 200–203). Philadelphia, PA.
Hirschberg, J., & Litman, D. (1987). Now Let's Talk About Now: Identifying Cue Phrases Intonationally. In Proceedings of the 25th Conference of the Association for Computational Linguistics (ACL'87) (pp. 163–171). Stanford University, Stanford, CA.
Hitzeman, J., Mellish, C., & Oberlander, J. (1997). Dynamic Generation of Museum Web Pages: The Intelligent Labelling Explorer. In Proceedings of the Museums and the Web Conference (pp. 107–115). Los Angeles.

Hobbs, J., Appelt, D. E., Bear, J., Israel, D., Kameyama, M., Stickel, M., & Tyson, M. (1996). FASTUS: A Cascaded Finite-State Transducer for Extracting Information from Natural-Language Text. In E. Roche & Y. Schabes (Eds.), Finite State Devices for Natural Language Processing (pp. 383–406). MIT Press.
Honderich, T. (Ed.). (1995). The Oxford Companion to Philosophy. Oxford: Oxford University Press.
Hovy, E., & Marcu, D. (1998). Automated Text Summarization. Tutorial at the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING/ACL-98). Montreal.
Iida, H., & Arita, H. (1992). Natural Language Dialogue Understanding on a Four-layer Plan Recognition Model. Journal of Information Processing, 15 (1), 60–71.
Imaichi, O., & Matsumoto, Y. (1995). Integration of Syntactic, Semantic and Contextual Information in Processing Grammatically Ill-Formed Inputs. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI'95) (pp. 1435–1440). Montreal, Canada.
Jekat, S., Klein, A., Maier, E., Maleck, I., Mast, M., & Quantz, J. J. (1995). Dialogue Acts in Verbmobil (Verbmobil-Report No. 65). Universität Hamburg, DFKI Saarbrücken, Universität Erlangen, TU Berlin.
Jelinek, F. (1990). Self-Organized Language Modeling for Speech Recognition. In A. Waibel & K.-F. Lee (Eds.), Readings in Speech Recognition (pp. 450–506). Morgan Kaufmann.
Johnston, M., Bangalore, S., Vasireddy, G., Stent, A., Ehlen, P., Walker, M., Whittaker, S., & Maloor, P. (2002). MATCH: An Architecture for Multimodal Dialogue Systems. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02) (pp. 376–383). Philadelphia.
Jönsson, A. (1993). Dialogue Management for Natural Language Interfaces – An Empirical Approach. Doctoral dissertation, Linköping Studies in Science and Technology.
Jurafsky, D., Wooters, C., Tajchman, G., Segal, J., Stolcke, A., Fosler, E., & Morgan, N. (1995). Using a Stochastic Context-Free Grammar as a Language Model for Speech Recognition. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP-95) (pp. 189–192). Detroit, MI.
Kameyama, M., Ochitani, R., & Peters, S. (1991a). Resolving Translation Mismatches with Information Flow. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL'91) (pp. 193–200). Berkeley, California.
Kameyama, M., Ochitani, R., & Peters, S. (1991b). Resolving Translation Mismatches with Information Flow. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL'91) (pp. 193–200). University of California, Berkeley, California.
Kamp, H. (1981). A Theory of Truth and Semantic Representation. In J. Groenendijk, T. M. Janssen, & M. Stokhof (Eds.), Formal Methods in the Study of Language (pp. 277–322). Amsterdam: Mathematisch Centrum Tracts.
Kamp, H., & Reyle, U. (1993). From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Dordrecht: Kluwer.
Kamp, H., & Scott, D. R. (1996). Discourse and Dialogue. In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, & V. Zue (Eds.), Survey of the State of the Art of Human Language Technology. Stanford, CA and Giardini, Pisa, Italy: Cambridge University Press. (Available at http://www.cse.ogi.edu/CSLU/HLTsurvey/)
Kaplan, R. M. (1987). Three Seductions of Computational Psycholinguistics. In P. Whitelock, H. Somers, P. Bennett, R. Johnson, & M. M. Wood (Eds.), Linguistic Theory and Computer Applications (pp. 149–188). London: Academic Press.
Karger, R., & Wahlster, W. (2000). Facts and Figures about the Verbmobil Project. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 454–467). Springer-Verlag.
Karttunen, L. (1986). D-PATR: A Development Environment for Unification-Based Grammars. In Proceedings of the 11th International Conference on Computational Linguistics (COLING-86) (pp. 25–29). Bonn, Germany: Institut für angewandte Kommunikations- und Sprachforschung e.V. (IKS).

Karttunen, L. (1998). The Proper Treatment of Optimality in Computational Phonology. In K. Oflazer & L. Karttunen (Eds.), Finite State Methods in Natural Language Processing (pp. 1–12). Bilkent University, Ankara, Turkey.
Kasper, W., Bos, J., Schiehlen, M., & Thielen, C. (1999). Definition of Abstract Semantic Classes (Verbmobil Technisches Dokument No. 61). DFKI GmbH.
Kautz, H. A. (1987). A Formal Theory of Plan Recognition. Doctoral dissertation, Department of Computer Science, University of Rochester. (Also available as Technical Report 215)
Kautz, H. A. (1991). A Formal Theory of Plan Recognition and its Implementation. In J. F. Allen, H. A. Kautz, R. N. Pelavin, & J. D. Tenenberg (Eds.), Reasoning About Plans (pp. 69–125). San Mateo, CA: Morgan Kaufmann Publishers.
Kay, M. (1996). Chart Generation. In A. Joshi & M. Palmer (Eds.), Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL'96) (pp. 200–204). San Francisco: Morgan Kaufmann Publishers.
Kay, M., Gawron, J. M., & Norvig, P. (1991). Verbmobil – A Translation System for Face-to-Face Dialog. CSLI.
Kemp, T., Weber, M., & Waibel, A. (2000). End to End Evaluation of the ISL View4You Broadcast News Transcription System. In Proceedings of the Conference on Content-Based Multimedia Information Access (RIAO-00). Paris.
Kipp, M., Alexandersson, J., Engel, R., & Reithinger, N. (2000). Dialog Processing. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 454–467). Springer-Verlag.
Kipp, M., Alexandersson, J., & Reithinger, N. (1999). Understanding Spontaneous Negotiation Dialogue. In J. Alexandersson, L. Ahrenberg, K. Jokinen, & A. Jönsson (Eds.), Proceedings of the IJCAI'99 Workshop 'Knowledge and Reasoning in Practical Dialogue Systems' (pp. 57–64). Stockholm.
Klein, M., Bernsen, N. O., Davies, S., Dybkjær, L., Garrido, J., Kasch, H., Mengel, A., Pirrelli, V., Poesio, M., Quazza, S., & Soria, C. (1998, July). Supported Coding Schemes. MATE Deliverable No. D1.1. DFKI, Saarbrücken, Germany. (Also available from http://www.dfki.de/mate/d11)
Klüter, A., Ndiaye, A., & Kirchmann, H. (2000). Verbmobil From a Software Engineering Point of View: System Design and Software Integration. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 637–660). Springer-Verlag.
Koch, S. (1998). DLR. (Unpublished Technical Memo)
Koch, S. (1999). Analyse und Repräsentation von Dialoginhalten – Bestimmung des zentralen Turngehalts. Diplomarbeit, Technische Universität Berlin.
Koller, A., & Striegnitz, K. (2002). Generation as Dependency Parsing. In Proceedings of the 40th Conference of the Association for Computational Linguistics (ACL'02) (pp. 17–24). Philadelphia.
Kowtko, J. C., Isard, S. D., & Doherty, G. M. (1993). Conversational Games Within Dialogue (Tech. Rep. No. HCRC/RP-31). University of Edinburgh: Human Communication Research Centre.
Krieger, H.-U. (1995). TDL—A Type Description Language for Constraint-Based Grammars. Foundations, Implementation, and Applications. (Doctoral dissertation, Universität des Saarlandes, Department of Computer Science). Saarbrücken Dissertations in Computational Linguistics and Language Technology.
Krieger, H.-U. (2001). Greatest Model Semantics for Typed Feature Structures. Grammars, 4 (2), 139–165.
Kupiec, J., Pedersen, J. O., & Chen, F. (1995). A Trainable Document Summarizer. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 68–73). Tampere, Finland.
Küssner, U., & Stede, M. (1995). Zeitliche Ausdrücke: Repräsentation und Inferenz (Verbmobil Memo No. 100). Technische Universität Berlin. (In German)

Lakoff, R. (1973). Logic of Politeness or Minding Your P's and Q's. In C. Corum et al. (Eds.), Papers from the 9th Regional Meeting of the Chicago Linguistic Society (pp. 292–305). Chicago.
Lambert, L. (1993). Recognizing Complex Discourse Acts: A Tripartite Plan-Based Model of Dialogue. Doctoral dissertation, University of Delaware, Newark, Delaware.
Larsson, S. (1998a). Coding Schemas for Dialogue Moves (Tech. Rep.). Department of Linguistics, Göteborg University. (Technical Report from the S-DIME Project)
Larsson, S. (1998b). Using a Type Hierarchy to Characterize Reliability of Coding Schemas for Dialogue Moves. In Proceedings of the 2nd International Conference on Cooperative Multimodal Communication, Theory and Applications (CMC/98). Tilburg University, Tilburg.
Lascarides, A., & Copestake, A. A. (1999). Default Representation in Constraint-Based Frameworks. Computational Linguistics, 25 (1), 55–105.
Levin, J. A., & Moore, J. A. (1977). Dialogue Games: Metacommunication Structures for Natural Language Interaction. Cognitive Science, 1 (4), 395–420.
Levin, L., Bartlog, B., Llitjos, A. F., Gates, D., Lavie, A., Wallace, D., Watanabe, T., & Woszczyna, M. (2000). Lessons Learned from a Task-Based Evaluation of Speech-to-Speech Machine Translation. In Proceedings of the International Conference on Language Resources and Evaluation (LREC-2000) (pp. 721–724). Athens, Greece.
Levin, L., Gates, D., Lavie, A., & Waibel, A. (1998). An Interlingua Based on Domain Actions for Machine Translation of Task-Oriented Dialogues. In Proceedings of the International Conference on Spoken Language Processing (ICSLP-98). Sydney Convention and Exhibition Centre, Darling Harbour, Sydney.
Levin, L., Glickman, O., Qu, Y., Gates, D., Lavie, A., Rose, C., Ess-Dykema, C. V., & Waibel, A. (1995). Using Context in Machine Translation of Spoken Language. In Proceedings of Theoretical and Methodological Issues in Machine Translation (TMI-95) (pp. 173–187). Leuven, Belgium.

Levinson, S. C. (1983). Pragmatics. Cambridge University Press.
Linell, P., & Gustavsson, L. (1987). Initiativ och respons. Om dialogens dynamik, dominans och koherens (Vol. SIC 15). University of Linköping, Studies in Communication.
Litman, D. J., & Allen, J. A. (1987). A Plan Recognition Model for Subdialogues in Conversations. Cognitive Science, 11, 163–200.
Löckelt, M., Becker, T., Pfleger, N., & Alexandersson, J. (2002). Making Sense of Partial. In Bos, Foster, & Matheson (Eds.), Proceedings of the 6th Workshop on the Semantics and Pragmatics of Dialogue (EDILOG 2002) (pp. 101–107). Edinburgh.
Luperfoy, S. (1992). The Representation of Multimodal User Interface Dialogues Using Discourse Pegs. In Proceedings of ACL'92 (pp. 22–31). Delaware, USA.
Maier, E. (1996). Context Construction as Subtask of Dialogue Processing – The Verbmobil Case. In A. Nijholt, H. Bunt, S. LuperFoy, G. V. van Zanten, & J. Schaake (Eds.), Proceedings of the 11th Twente Workshop on Language Technology (TWLT), Dialogue Management in Natural Language Systems (pp. 113–122). Enschede, Netherlands.
Maier, E. (1997). Evaluating a Scheme for Dialogue Annotation (Verbmobil-Report No. 193). DFKI GmbH.
Malenke, M., Bäumler, M., & Paulus, E. (2000). Speech Recognition Performance Assessment. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 583–591). Springer-Verlag.
Mani, I. (2000–2001). Summarization Evaluation: An Overview. In Proceedings of the 2nd NTCIR Workshop on Research in Chinese & Japanese Text Retrieval and Text Summarization. Tokyo, Japan. (ISBN 4-924600-96-2)
Mani, I., House, D., Klein, G., Hirschman, L., Obrist, L., Firmin, T., Chrzanowski, M., & Sundheim, B. (1998). The TIPSTER SUMMAC Text Summarization Evaluation – Final Report (Tech. Rep.). The MITRE Corp.
Mani, I., & Maybury, M. (Eds.). (1999). Advances in Automatic Text Summarization. Cambridge, MA: MIT Press.

Mann, W. C., & Thompson, S. A. (1988). Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text, 8 (3). (Also available as USC Information Sciences Institute Research Report RR-87-190)
Marcu, D. (1997). From Local to Global Coherence: A Bottom-Up Approach to Text Planning. In AAAI/IAAI (pp. 629–635). Menlo Park, California: American Association for Artificial Intelligence.
Marcu, D. (1998). Improving Summarization Through Rhetorical Parsing Tuning. In The 6th Workshop on Very Large Corpora (pp. 206–215). Montreal, Canada.
Marcu, D. (1999). Discourse Trees are Good Indicators of Importance in Text. In I. Mani & M. Maybury (Eds.), Advances in Automatic Text Summarization (pp. 123–136). Cambridge, MA: MIT Press.
McGlashan, S., Fraser, N., Gilbert, N., Bilange, E., Heisterkamp, P., & Youd, N. J. (1992). Dialogue Management for Telephone Information Systems. In Proceedings of the 3rd Conference on Applied Natural Language Processing (ANLP-92) (pp. 245–246). Trento, Italy.
McKeown, K. R. (1985). Text Generation – Using Discourse Strategies and Focus Constraints to Generate Natural Language Text. Cambridge, GB: Cambridge University Press.
McKeown, K. R., & Radev, D. R. (1995). Generating Summaries of Multiple News Articles. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 74–82). Seattle, Washington.
Mellish, C., O'Donnell, M., Oberlander, J., & Knott, A. (1998). An Architecture for Opportunistic Text Generation. In E. Hovy (Ed.), Proceedings of the 9th International Workshop on Natural Language Generation (INLG-98) (pp. 28–37). New Brunswick, New Jersey: Association for Computational Linguistics.
Mey, J. L. (2001). Pragmatics: An Introduction (2nd ed.). Blackwell Publishers. (ISBN 0-631-21132-2)
Moore, D. C. (2002). The IDIAP Smart Meeting Room (Tech. Rep. No. IDIAP-Com 02-07). IDIAP.

Moore, J. D., & Paris, C. (1993). Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information. Computational Linguistics, 19, 652–694.
Morgan, N., Baron, D., Edwards, J., Ellis, D., Gelbart, D., Janin, A., Pfau, T., Shriberg, E., & Stolcke, A. (2002). The Meeting Project at ICSI. In Proceedings of the First International Conference on Human Language Technology Research (HLT-2002) (pp. 246–252). San Diego, California, USA.
Morris, C. (1938). Foundations of the Theory of Signs. In Morris (Ed.), Writings on the General Theory of Signs. The Hague: Mouton. (1971)
Ninomiya, T., Miyao, Y., & Tsujii, J. (2002). Lenient Default Unification for Robust Processing within Unification Based Grammar Formalisms. In Proceedings of the 19th International Conference on Computational Linguistics (COLING-2002) (pp. 744–750). Taipei, Taiwan.
Novick, D. G. (1988). Control of Mixed-Initiative Discourse Through Meta-Locutionary Acts: A Computational Model. Doctoral dissertation, University of Oregon.
O'Donnell, M., Knott, A., Oberlander, J., & Mellish, C. (2000). Optimising Text Quality in Generation from Relational Databases. In Proceedings of the International Natural Language Generation Conference (INLG-2000) (pp. 133–140). Mitzpe Ramon, Israel.
Pfleger, N. (2002). Discourse Processing for Multimodal Dialogues and its Application in SmartKom. Diplomarbeit, Universität des Saarlandes.
Pfleger, N., Alexandersson, J., & Becker, T. (2002). Scoring Functions for Overlay and their Application in Discourse Processing. In Proceedings of KONVENS 2002 (pp. 139–146). Saarbrücken.
Pfleger, N., Alexandersson, J., & Becker, T. (2003). A Robust and Generic Discourse Model for Multimodal Dialogue. In Workshop Notes of the IJCAI-03 Workshop on "Knowledge and Reasoning in Practical Dialogue Systems" (pp. 64–70). Acapulco, Mexico.
Pfleger, N., Engel, R., & Alexandersson, J. (2003). Robust Multimodal Discourse Processing. In Kruijff-Korbayová & Kosny (Eds.), Proceedings of DiaBruck: 7th Workshop on the Semantics and Pragmatics of Dialogue (pp. 107–114). Wallerfangen, Germany.

Pinkal, M., Rupp, C., & Worm, K. (2000). Robust Semantic Processing of Spoken Language. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 322–336). Springer-Verlag.
Poesio, M., & Mikheev, A. (1998). The Predictive Power of Game Structure in Dialogue Act Recognition: Experimental Results Using Maximum Entropy Estimation. In Proceedings of the International Conference on Spoken Language Processing (ICSLP-98) (pp. 2235–2238). Sydney, Australia.
Poesio, M., & Traum, D. R. (1997). Conversational Actions and Discourse Situations. Computational Intelligence, 13 (3), 309–347.
Pollack, M. E., Hirschberg, J., & Webber, B. L. (1982). User Participation in the Reasoning Processes of Expert Systems. In Proceedings of the 2nd National Conference on Artificial Intelligence (AAAI-82) (pp. 358–356). Pittsburgh, PA.
Prüst, H. (1992). On Discourse Structuring, VP Anaphora and Gapping. Doctoral dissertation, Universiteit van Amsterdam.
Qu, Y., Di Eugenio, B., Lavie, A., Levin, L. S., & Rose, C. P. (1997). Minimizing Cumulative Error in Discourse Context. In E. Maier, M. Mast, & S. LuperFoy (Eds.), ECAI Workshop on Dialogue Processing in Spoken Language Systems (Vol. 1236, pp. 171–182). Springer.
Qu, Y., Rose, C. P., & Di Eugenio, B. (1996). Using Discourse Predictions for Ambiguity Resolution. In Proceedings of the 16th International Conference on Computational Linguistics (COLING-96) (pp. 358–363). Copenhagen.
Rambow, O., Bangalore, S., & Walker, M. (2001). Natural Language Generation in Dialog Systems. In Proceedings of the 1st International Conference on Human Language Technology Research (HLT'2001). San Diego, USA.
Reber, A. S. (1996). The Penguin Dictionary of Psychology. Putnam Inc.
Reiter, E. (1994). Has a Consensus NL Generation Architecture Appeared, and is it Psycholinguistically Plausible? In 7th International Workshop on Natural Language Generation (pp. 163–170). Kennebunkport, Maine.
Reiter, E., & Dale, R. (2000). Building Natural-Language Generation Systems. Cambridge University Press.

Reithinger, N. (1995). Some Experiments in Speech Act Prediction. In Working Notes from the AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation (pp. 126–131). University of Stanford.
Reithinger, N. (1999). Robust Information Extraction in a Speech Translation System. In Proceedings of the 6th European Conference on Speech Communication and Technology (EUROSPEECH-99) (pp. 2427–2430). Budapest, Hungary.
Reithinger, N. (2000). Leave One Out Experiments for Statistical Dialogue Act Recognition (Verbmobil-Memo No. 146). DFKI Saarbrücken.
Reithinger, N., Alexandersson, J., Becker, T., Blocher, A., Engel, R., Löckelt, M., Müller, J., Pfleger, N., Poller, P., Streit, M., & Tschernomas, V. (2003). SmartKom – Adaptive and Flexible Multimodal Access to Multiple Applications. In Proceedings of ICMI 2003. Vancouver, B.C. (forthcoming)
Reithinger, N., & Engel, R. (2000). Robust Content Extraction for Translation and Dialog. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 428–440). Springer-Verlag.
Reithinger, N., Engel, R., Kipp, M., & Klesen, M. (1996). Predicting Dialogue Acts for a Speech-To-Speech Translation System. In Proceedings of the International Conference on Spoken Language Processing (ICSLP-96) (pp. 654–657). Philadelphia, PA.
Reithinger, N., & Kipp, M. (1998). Large Scale Dialogue Annotation in Verbmobil. In Proceedings of the 10th European Summer School in Logic, Language and Information (ESSLLI-98) Workshop on Recent Advances in Corpus Annotation. Saarbrücken, Germany.
Reithinger, N., Kipp, M., Engel, R., & Alexandersson, J. (2000). Summarizing Multilingual Spoken Negotiation Dialogues. In Proceedings of the 38th Conference of the Association for Computational Linguistics (ACL'2000) (pp. 310–317). Hong Kong, China.
Reithinger, N., & Klesen, M. (1997). Dialogue Act Classification Using Language Models. In Proceedings of the 5th European Conference on Speech Communication and Technology (EUROSPEECH-97) (pp. 2235–2238). Rhodes.

Rupp, C., Spilker, J., Klarner, M., & Worm, K. L. (2000). Combining Analyses from Various Parsers. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 311–321). Springer-Verlag.
Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A Simplest Systematics for the Organisation of Turn-Taking in Conversation. Language, 50, 696–735.
Scha, R., & Polanyi, L. (1988). An Augmented Context Free Grammar for Discourse. In Proceedings of the 12th International Conference on Computational Linguistics (COLING-88) (pp. 573–577). Budapest.
Scheffler, T. (2003). Generation With TAG – A Semantics Interface and Syntactic Realizer. Diplomarbeit, Universität des Saarlandes.
Schegloff, E. A., & Sacks, H. (1973). Opening up Closings. Semiotica, VIII (4), 289–327.
Schiehlen, M. (2000). Semantic Construction. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 200–216). Springer-Verlag.
Schmitz, B., & Quantz, J. (1995). Dialogue Acts in Automatic Dialogue Interpreting. In Proceedings of the 6th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-95) (pp. 33–47). Leuven, Belgium.
Searle, J. R. (1969). Speech Acts. Cambridge, GB: University Press.
Searle, J. R. (1975). Indirect Speech Acts. In P. Cole & J. Morgan (Eds.), Syntax and Semantics (Vol. 3: Speech Acts). New York: Academic Press.
Severinson-Eklund, K. (1983). The Notion of Language Game: A Natural Unit of Dialogue and Discourse (Tech. Rep. No. SIC 5). University of Linköping, Department of Communication Studies.
Sidner, C. L. (1994). An Artificial Discourse Language for Collaborative Negotiation. In B. Hayes-Roth & R. Korf (Eds.), Proceedings of the 12th National Conference on Artificial Intelligence (AAAI-94) (pp. 814–819). Menlo Park, California: AAAI Press.

Siegel, M. (1996). Die maschinelle Übersetzung aufgabenorientierter japanisch-deutscher Dialoge. Lösungen für Translation Mismatches. Doctoral dissertation, Universität Bielefeld.
Siegel, M. (2000). HPSG Analysis of Japanese. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 265–280). Springer-Verlag.
Simpson, A., & Fraser, N. M. (1993). Black Box and Glass Box Evaluation of the SUNDIAL System. In Proceedings of the 3rd European Conference on Speech Communication and Technology (EUROSPEECH-93) (pp. 1423–1426). Berlin, Germany.
Sinclair, J. M., & Coulthard, M. (1975). Towards an Analysis of Discourse: The English Used by Teachers and Pupils. London: Oxford University Press.
Siskind, J. M., & McAllester, D. A. (1993). Nondeterministic Lisp as a Substrate for Constraint Logic Programming. In R. Fikes & W. Lehnert (Eds.), Proceedings of the 11th National Conference on Artificial Intelligence (AAAI-93) (pp. 133–138). Menlo Park, California: AAAI Press.
Smith, R. W. (1997). Practical Issues in Mixed-Initiative Natural Language Dialog: An Experimental Perspective. In Proceedings of the AAAI Spring Symposium on Computational Models for Mixed Initiative Interaction (pp. 158–162). Stanford University.
Stede, M. (1999). Lexical Semantics and Knowledge Representation in Multilingual Text Generation. Boston/Dordrecht/London: Kluwer Academic Publishers.
Steele, G. L. (1984). Common LISP. Digital Press.
Stolcke, A. (1994a). Bayesian Learning of Probabilistic Language Models. Doctoral dissertation, University of California at Berkeley.
Stolcke, A. (1994b). How to Boogie: A Manual for Bayesian Object-oriented Grammar Induction and Estimation. International Computer Science Institute, Berkeley, California.
Stolcke, A. (1995). An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities. Computational Linguistics, 21, 165–201.

Strayer, S., & Heeman, P. (2001). Reconciling Initiative and Discourse Structure. In Proceedings of the 2nd SIGdial Workshop on Discourse and Dialogue (pp. 153–161). Aalborg, Denmark.
Tessiore, L., & Hahn, W. von. (2000). Functional Validation of a Machine Interpretation System: Verbmobil. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 613–636). Springer-Verlag.
Teufel, S., & Moens, M. (2000). What's Yours and What's Mine: Determining Intellectual Attribution in Scientific Text. In SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC'00) (pp. 110–117). Hong Kong, China.
Teufel, S., & Moens, M. (2002). Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status. Computational Linguistics, 28 (4), 409–445. (Special issue on Summarization)
Thompson, W., & Walter, S. (1988). A Reappraisal of the Kappa Coefficient. Journal of Clinical Epidemiology, 41 (10), 949–958.
Touretzky, D. (1986). The Mathematics of Inheritance Systems. Morgan Kaufmann.
Tsang, E. (1993). Foundations of Constraint Satisfaction. London: Academic Press Ltd.
Uebersax, J. (1987). Diversity of Decision-Making Models and the Measurement of Interrater Agreement. Psychological Bulletin, 101, 140–146.
Uebersax, J. S. (1988). Validity Inferences from Interobserver Agreement. Psychological Bulletin, 104, 405–416.
Vauquois, B. (1975). La Traduction Automatique à Grenoble. Documents de Linguistique Quantitative, 24. (Paris: Dunod)
Vilain, M. B. (1990). Getting Serious about Parsing Plans: A Grammatical Analysis of Plan Recognition. In Proceedings of the American Association for Artificial Intelligence (pp. 190–197). Boston, MA.
Vogel, S., Och, F. J., Tillmann, C., Nießen, S., Sawaf, H., & Ney, H. (2000). Statistical Methods for Machine Translation. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 379–395). Springer-Verlag.

Wahlster, W. (Ed.). (2000). Verbmobil: Foundations of Speech-to-Speech Translation. Springer-Verlag.

Wahlster, W. (2001). Robust Translation of Spontaneous Speech: A Multi-Engine Approach. In Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI'01) (Vol. 2, pp. 1484–1493). Seattle, Washington. San Francisco: Morgan Kaufmann.

Wahlster, W. (2003). Towards Symmetric Multimodality: Fusion and Fission of Speech, Gesture, and Facial Expression. In A. Günter, R. Kruse, & B. Neumann (Eds.), KI 2003: Advances in Artificial Intelligence. Proceedings of the 26th German Conference on Artificial Intelligence (pp. 1–18). Hamburg. Berlin, Heidelberg: Springer.

Wahlster, W., Reithinger, N., & Blocher, A. (2001). SmartKom: Multimodal Communication with a Life-Like Character. In Proceedings of the 7th European Conference on Speech Communication and Technology (EUROSPEECH 2001 - Scandinavia) (pp. 2231–2235). Aalborg, Denmark.

Waibel, A., Bett, M., Metze, F., Ries, K., Schaaf, T., Schultz, T., Soltau, H., Yu, H., & Zechner, K. (2001). Advances in Automatic Meeting Record Creation and Access. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP-2001). Salt Lake City, UT.

Waibel, A., Soltau, H., Schultz, T., Schaaf, T., & Metze, F. (2000). Multilingual Speech Recognition. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation (pp. 33–45). Springer-Verlag.

Walker, M., & Whittaker, S. (1990). Mixed Initiative in Dialogue: An Investigation into Discourse Segmentation. In Proceedings of the 28th Conference of the Association for Computational Linguistics (ACL'90) (pp. 70–78). University of Pittsburgh.

Ward, N. (2000). Issues in the Transcription of English Conversational Grunts. In Proceedings of the ACL 2000 Workshop: 1st SIGdial Workshop on Discourse and Dialogue. Hong Kong.

Weizenbaum, J. (1966). ELIZA — A Computer Program for the Study of Natural Language Communication between Man and Machine. Communications of the ACM, 9(1), 36–44.

Whittaker, S., & Stenton, P. (1988). Cues and Control in Expert-Client Dialogues. In Proceedings of the 26th Annual Meeting of the Association for Computational Linguistics (ACL'88) (pp. 123–130). Buffalo, NY.

Winograd, T. (1972). Understanding Natural Language. New York: Academic Press. (Also published in Cognitive Psychology, 3:1 (1972), pp. 1–191.)

Wittgenstein, L. (1974). Philosophical Grammar. Oxford: Basil Blackwell.

Woods, W. (1981). Procedural Semantics as a Theory of Meaning. In A. Joshi, B. L. Webber, & I. Sag (Eds.), Elements of Discourse Understanding (pp. 300–334). Cambridge: Cambridge University Press.

Woods, W., & Kaplan, R. (1977). Lunar Rocks in Natural English: Explorations in Natural Language Question Answering. Fundamental Studies in Computer Science, Linguistic Structures Processing (5), 521–569.

Young, S. (1996). Large Vocabulary Continuous Speech Recognition. IEEE Signal Processing Magazine, 13(5), 45–57.

Zechner, K. (2001a). Automatic Generation of Concise Summaries of Spoken Dialogues in Unrestricted Domains. In Proceedings of the 24th ACM SIGIR International Conference on Research and Development in Information Retrieval (pp. 199–207). New Orleans, LA.

Zechner, K. (2001b). Automatic Summarization of Spoken Dialogues in Unrestricted Domains. Doctoral dissertation, Carnegie Mellon University, School of Computer Science, Language Technologies Institute. (Also printed as: Technical Report CMU-LTI-01-168.)

Zechner, K. (2002). Automatic Summarization of Open-Domain Multiparty Dialogues in Diverse Genres. Computational Linguistics, 28(4), 447–485. (Special issue on Summarization)

