Spoken Variable Initiative Dialog: An Adaptable Interface

Ronnie W. Smith
Department of Mathematics
East Carolina University
Greenville, NC 27858

Abstract

Recent advances in speech recognition technology have raised hopes about the development of practical spoken natural language interfaces. Embedding the speech recognition technology within a sophisticated dialog processing mechanism can overcome many of the traditional problems in spoken natural language systems. This paper presents a theory of natural language dialog that enables: (1) robust behavior in the presence of word misrecognitions; (2) flexible behavior according to the level of user expertise; and (3) practical implementation of spoken natural language systems using commercial hardware. In addition to the theory, the paper also describes experimental results obtained from using an implementation of this theory.
1 Spoken Natural Language: The Future is Now

Major advances in recent years in speech recognition technology (see [7] and [13])[1] have raised expectations about the development of practical spoken natural language interfaces. Such interfaces can provide user flexibility as well as allow users to have their hands and eyes busy on the task at hand. Examples of such situations include equipment repair, telephone assistance, airline piloting, automobile driving, and heavy equipment operation. Traditionally, voice natural language systems have not been able to provide the required performance due to a lack of flexibility, a lack of effective goal-seeking behavior, and poor speech recognition rates, among other reasons. These problems can be circumvented by embedding the speech recognition technology within a dialog processing mechanism with the following capabilities:
- uses problem solving to help the human user carry out the task;
- conducts subdialogs to carry out the various subgoals needed to complete the task, switching between subdialogs as needed;
- exploits knowledge about the user to determine what information is known and what information needs to be communicated;
- exploits context-dependent expectations about user responses to assist in interpreting the user's responses; and
- engages in dialog where the task initiative can vary from strongly computer controlled to strongly user controlled or somewhere in between.
This ability to vary the level of initiative is called variable initiative dialog, and it represents a major advance in the utility of natural language systems. With such a capability, a natural language system can effectively communicate with both task novices and experts as well as users with intermediate levels of expertise. Task novices require a computer-controlled dialog to lead them through the details. Task experts need occasional advice while pursuing their own task strategies. A variable initiative dialog system is capable of effective interaction with both.

[1] In addition, several projects demonstrating dramatically improved speech recognition performance were presented at the 1993 ARPA (Advanced Research Projects Agency) Human Language Technology Workshop.

[Figure 1 is an architecture diagram. The DIALOG CONTROLLER (receives goal recommendations; invokes the theorem-prover for goal completion; controls the interface for acquiring missing axioms; updates acquired dialog and user knowledge) is connected by communications links to the DOMAIN PROCESSOR (an expert system that provides domain expertise and knowledge), to GENERAL REASONING (theorem proving: proves goal completion and notifies the dialog controller of missing axioms in a proof), to the LINGUISTIC INTERFACE (speech recognition turns sound waves into words; generation synthesizes logical meaning and vocalizes it), and to KNOWLEDGE (dialog knowledge and user knowledge).]

Figure 1: Dialog Processing Architecture

This paper describes a theory of natural language dialog that permits variable initiative behavior as well as the other behaviors previously listed. Experimental results obtained from using an implementation of this theory are also presented. The implemented system uses spoken natural language in providing task assistance to users engaged in the repair of an electronic circuit. The results describe the implementation's effectiveness and robustness as well as the impact of the variable initiative capability. The results provide clear indication that commercial application of this technology is within reach, and should stimulate thought about how to further improve the quality of spoken natural language interaction.
2 An Integrated Model of Natural Language Dialog Processing

We assume that the purpose of the natural language system is to assist the user in carrying out some task that is ongoing with the dialog. Starting from a foundation based on the study of issues such as plan recognition, contextual reference, presuppositions, and user modeling (see [1], [2], [3], [5], and [12]), we devised an architecture that integrates some of these features into a single system. This architecture is shown in figure 1. This section provides an overview of the model; a detailed description is provided in [10].

The dialog controller handles the overall coordination of the system's processing. The user task to which the dialog pertains is accomplished via completion of a set of domain-specific task goals. As a result, the domain processor must provide a recommendation to the dialog controller about what should be the current task goal. Goal completion is described by a theorem; consequently, the dialog controller invokes theorem-proving to determine if a goal has been achieved. When the theorem-prover notifies the dialog controller that the proof cannot be completed due to the absence of a required axiom, the dialog controller uses language to acquire the missing axiom. Dialog flexibility based on user knowledge is achieved by maintaining a user model (see [6]) that consists of axioms about the user that are available during the theorem-proving process.

The dialog controller acquires a missing axiom by computing a specification of the utterance needed to get the user to verbally supply it. For example, the controller may be trying to learn if there is a wire between connectors 84 and 99. An appropriate axiom might be of the form fact(wire(84,99),exist,X), where X=absent or X=present depending on whether or not the wire exists. If no such axiom is present in the knowledge base, the controller may compute an utterance specification such as question(fact(wire(84,99),exist,X)), transmitting this specification to the linguistic generator for verbalization (i.e. utterance 1 of figure 2).

(1)  C: Is there a wire between connector 84 and connector 99?
(2)  U: It is not there.
(3)  C: Add a wire between connector 84 and connector 99.
(4)  U: I need help.
(5)  C: Obtain a wire.
(6)  U: Done.
(7)  C: Connect metal of one free end of the wire to connector 84.
(8)  U: How?
(9)  C: Bend back connector 84.
(10) U: Done.
(11) C: Insert free end of wire into connector 84.
(12) U: Wire between 84 and 99 is connecting.
(13) C: What is the LED displaying?

Figure 2: Sample Dialog Segment

Thus, the key contribution of this model is the role of theorem-proving as a unifying feature that enables the dialog system to incorporate mechanisms for several independently studied phenomena of dialog into an integrated whole. Theorem-proving provides the motivation for the dialog: to acquire missing axioms. It also yields a natural definition of the subdialog: all the language interaction about a recommended task goal. Finally, it provides a natural method for subdialog clarification: simply modify the active theorem to include the new subgoals specifically mentioned.
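The missing-axiom mechanism can be sketched in a few lines. This is a simplified illustration in Python rather than the paper's Quintus Prolog implementation; the function names and the tuple representation of axioms are invented for the sketch.

```python
# Sketch of the missing-axiom mechanism: the prover reports which axiom
# is absent, and the controller turns that report into an utterance
# specification for the generator.

def prove(goal, knowledge_base):
    """Try to 'prove' a goal: succeed if a matching axiom exists,
    otherwise report the goal itself as the missing axiom."""
    for axiom_goal, value in knowledge_base:
        if axiom_goal == goal:
            return ("proved", value)
    return ("missing_axiom", goal)

def dialog_step(goal, knowledge_base):
    """One controller step: complete the goal, or compute an utterance
    specification asking the user to supply the missing axiom."""
    status, payload = prove(goal, knowledge_base)
    if status == "missing_axiom":
        return ("question", payload)   # verbalized, e.g., as utterance 1
    return ("done", payload)

kb = []
goal = "fact(wire(84,99),exist,X)"

# No axiom yet: the controller asks (cf. utterance 1 of figure 2).
assert dialog_step(goal, kb) == ("question", goal)

# The user's reply "It is not there" supplies X=absent.
kb.append((goal, "absent"))
assert dialog_step(goal, kb) == ("done", "absent")
```

The same loop generalizes: every computer utterance in figure 2 corresponds to some axiom the active proof could not find.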
This is illustrated in the sample dialog segment of figure 2, for which the theorem-proving/dialog relationship is illustrated in figure 3. The dialog has reached the point where the wire between connectors 84 and 99 is of main interest to the domain processor (MAIN GOAL 1). The dialog controller attempts to carry out the goal of observing this wire, but is stymied by the lack of an axiom in the knowledge base. Utterance 1 by the computer and the user response, utterance 2, are required to acquire the missing axiom about the wire's status. Once it is learned that the wire is missing, the computer initiates a subdialog to have this wire added (MAIN GOAL 2). Utterance 4 initiates a clarification subdialog about this main goal. Consequently, the dialog controller instructs the theorem-prover to insert into the active theorem a subgoal for learning how to accomplish the main goal (the "learn to do add" subgoal). Resumption of theorem-proving leads to an expansion of this subgoal in order to lead the user through each substep of adding the wire. The first two substeps are to locate connectors 84 and 99. These are satisfied trivially by the theorem-prover without the use of language. The user model knowledge indicating that the user knows how to locate these connectors was inferred from the user utterance, "It is not there." Continuing this subdialog, the subgoal introduced by utterance 7 (connect(end1,84)) requires initiation
[Figure 3 is a proof-tree diagram relating theorem-proving to the dialog. MAIN GOAL 1, fact(wire(84,99),exist,X), has a missing axiom (utterance 1) supplied with X=absent by utterance 2 ("It is not there"), from which userknows(loc(84)) and userknows(loc(99)) are inferred. MAIN GOAL 2, do(action(add,wire(84,99))), has a missing axiom (utterance 3) eventually supplied by utterance 12 ("wire between 84 and 99 is connecting"); its "learn to do add" clarification subgoal (utterance 4, "I need help") expands into locate 84 and locate 99 (satisfied by user model axioms), obtain wire (missing axiom, utterance 5; "done", utterance 6), connect(end1,84) (missing axiom, utterance 7), and connect(end2,99) (proof never started). The connect(end1,84) subgoal's "learn to do" clarification (utterance 8, "How?") expands into bend 84 (missing axiom, utterance 9; "done", utterance 10) and insert(end1,84) (missing axiom, utterance 11). MAIN GOAL 3, fact(LED,display,X), has a missing axiom (utterance 13).]

Figure 3: Theorem-proving/dialog relationship
of a second clarification subdialog for learning how to do its substeps as a result of utterance 8. After completing the first two of these substeps, utterance 12 indicates that the wire has been added. The system searches through the possibilities for missing axioms for the unsatisfied goals introduced at utterances 11, 7, and 3 to determine that utterance 12 supplies the missing axiom for the main goal, adding the wire. Because each computer utterance addresses a specific missing axiom, the system has specific expectations for the user's response. Sample expectations for a response to utterance 3 include: (1) that the user performed the action (adding the wire); (2) that the wire exists; (3) that the user does not know how to perform the action; and (4) that the user needs help performing the action. Similar expectations for user responses at other points in the dialog can be computed from specific domain knowledge about the required action (e.g. adding a wire) as well as general dialog knowledge. By searching through the different sets of expected responses, the system can determine which goal is relevant to the user's utterance. In the case of utterance 12, it is MAIN GOAL 2 that is directly relevant rather than the subgoals introduced at utterances 11 and 7. After it is determined that MAIN GOAL 2 is completed, proofs of the substeps are discontinued (note also that the proof of the subgoal connect(end2,99) does not even need to be started), and the dialog continues with the next main goal (MAIN GOAL 3). If at this point the user realizes the wire might have been misconnected, the user might say "where is connector 99?" in response to utterance 13. This is an example of an interrupt, an unexpected change to another subdialog. Interrupts are detected based on the expectations for user responses in the active subdialog as well as the possible expectations of other subdialogs.
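The expectation-driven attribution of an utterance to a goal can be sketched as a search over the open goals' expectation sets. This is an illustrative Python sketch; the goal names and expectation strings are invented to mirror the example.

```python
# Hedged sketch of expectation-driven interpretation: each open goal
# carries a set of expected user responses, ordered from the currently
# active subdialog outward. An utterance is attributed to the first
# goal whose expectations it matches; a match outside the active
# subdialog signals an interrupt or a return to an earlier subdialog.

def find_relevant_goal(utterance, open_goals):
    """open_goals: list of (goal_name, expected_responses)."""
    for goal, expectations in open_goals:
        if utterance in expectations:
            return goal
    return None   # unexpected utterance: fall back to clarification

goals = [
    ("insert_end1_84",  {"done", "how?"}),
    ("connect_end1_84", {"done", "how?"}),
    ("add_wire_84_99",  {"done", "i need help",
                         "wire between 84 and 99 is connecting"}),
]

# Utterance 12 matches MAIN GOAL 2 (adding the wire), not the subgoals.
assert find_relevant_goal("wire between 84 and 99 is connecting",
                          goals) == "add_wire_84_99"
```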
Based on a search through these expectations, it can be determined that the user utterance is relevant not to MAIN GOAL 3, but to MAIN GOAL 2, and the proof and corresponding subdialog about MAIN GOAL 2 can be reopened. We have seen how the principles of dialog processing operate, with one glaring exception. Namely, how does variable initiative dialog take place? This is our next topic.
3 Variable Initiative: Giving Priority to a Conversant's Goals

3.1 Description

The need for variable initiative dialog arises because at some points the user has sufficient knowledge to take control of the dialog and accomplish several goals without much computer assistance, while at other times, as in the sample dialog, the user needs detailed assistance. In other words, user initiative is characterized by giving priority to the user's goals of carrying out steps uninterrupted, while computer initiative is characterized by giving priority to the specific goals of the computer. However, initiative is not an all-or-nothing prospect. Either the user or the computer may have the initiative without having complete control of the dialog. Based on this idea we have identified four dialog initiative modes that characterize the level of initiative that the computer has in the dialog. These are described below:

1. directive - The computer has complete dialog control. It recommends a task goal for completion and will use whatever dialog is necessary to obtain the needed item of knowledge related to the task goal. No interruptions to other subdialogs are allowed.

2. suggestive - The computer still has dialog control, but not as strongly. The computer will make suggestions about the task goal to perform next, but will allow minor interruptions to closely related subdialogs.

3. declarative - The user has dialog control. Consequently, the user can interrupt to any desired subdialog at any time, but the computer is free to mention relevant, though not required, facts as a response to the user's statements.

4. passive - The user has complete dialog control. Consequently, the computer will passively acknowledge user statements. It will provide information only as a direct response to a user question.
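The four modes and the interruption policy each one implies can be summarized in a small sketch. The names and the `closely_related` flag are illustrative, not part of the implemented system.

```python
# Hypothetical sketch of the four initiative modes and the subdialog
# interruption policy each one implies.
from enum import Enum

class Mode(Enum):
    DIRECTIVE = 1    # computer controls; no interruptions
    SUGGESTIVE = 2   # computer controls; minor interruptions allowed
    DECLARATIVE = 3  # user controls; any interruption allowed
    PASSIVE = 4      # user controls; computer only acknowledges

def interruption_allowed(mode, closely_related):
    """Whether a user-initiated switch to another subdialog is honored."""
    if mode == Mode.DIRECTIVE:
        return False
    if mode == Mode.SUGGESTIVE:
        return closely_related
    return True  # declarative and passive: user has dialog control

assert not interruption_allowed(Mode.DIRECTIVE, True)
assert interruption_allowed(Mode.SUGGESTIVE, True)
assert not interruption_allowed(Mode.SUGGESTIVE, False)
assert interruption_allowed(Mode.DECLARATIVE, False)
```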
Our notion of dialog control extends the one given by Whittaker and Stenton [11]. They propose a definition for dialog control based on the utterance type of the speaker (question, assertion, command, or prompt). We extend their view to consider the potential relationship between task control and dialog control as well as to provide a computational model for participating in variable initiative dialogs. A brief summary of our model is provided in the remainder of this section.
3.2 Mechanism

The computer can indicate its level of initiative only through its responses. Consequently, the choice of response topic is affected by dialog mode. In directive and suggestive mode, the computer has the initiative and should thus base its response primarily on its own goal. Conversely, in declarative and passive mode, the computer should select as its response topic something it believes relevant to the perceived user goal. As an example, suppose that the user initially states, "the light is off," and suppose the computer knows that in order for the light to be lit, the switch must be turned up to activate the power. Consequently, the computer goal of highest priority is to put the switch up. However, depending on the mode, any of the following goals could be selected:

1. directive - goal: make the switch be up; possible computer response: "put the switch up."

2. suggestive - goal: observe current switch position; possible response: "what is the switch position when the light is off?"

3. declarative - goal: user learn "switch is up ⇒ power circuit is activated"; possible response: "the power is on when the switch is up."

4. passive - goal: user learn "computer understood user's last utterance"; possible response: "okay."

The dialog segments of figure 4, obtained from actual usage of the implemented system, illustrate the differences between the two modes in which we experimentally evaluated the system: directive and declarative. Note the following phenomena from the dialog segments:

1. In the directive mode dialog segment the subject is performing task goals under the close guidance of the computer. There is language interaction about each task goal.

2. In the declarative mode dialog segment the subject independently carries out several task goals known to be necessary without any interaction.
By allowing the user to arbitrarily change subdialogs, the computer is able to provide the relevant assistance when the potential problem is reported, without requiring language interaction for task goals already completed. To evaluate user behavior with the system as well as the problem-solving effectiveness of the system under differing dialog initiative modes, an experimental study was conducted. Details about the implementation and its evaluation are described next.
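The mode-dependent choice of response topic from the light/switch example can be sketched as a simple policy table. This is an illustrative sketch only; the goal notation is invented, and the response strings are the examples given in the text.

```python
# Hedged sketch: in computer-initiative modes the response is driven
# by the computer's own goal (put the switch up); in user-initiative
# modes, by what is relevant to the perceived user goal.

def select_goal_and_response(mode):
    policy = {
        "directive":   ("achieve(switch, up)",
                        "Put the switch up."),
        "suggestive":  ("observe(switch, position)",
                        "What is the switch position when the light is off?"),
        "declarative": ("user_learn(switch_up_implies_power_on)",
                        "The power is on when the switch is up."),
        "passive":     ("user_learn(last_utterance_understood)",
                        "Okay."),
    }
    return policy[mode]

goal, response = select_goal_and_response("directive")
assert response == "Put the switch up."
```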
4 Implementation and Evaluation

The hardware configuration is sketched in figure 5. The dialog processor has been implemented on a Sun 4 workstation, with the majority of the code written in Quintus Prolog and the parser in C. The implemented dialog system assists users in repairing a Radio Shack 160 in One Electronic Project Kit. The particular circuit being used causes the Light-Emitting Diode (LED) to alternately display a one and seven. The system can detect errors caused by missing wires as well as a dead battery. Speech recognition is performed
Dialog Segment 1 (Directive Mode)

C: This is the circuit fix it shop. How may I help you?
U: I want to fix a circuit.
C: What is the id of the circuit?
U: Rs111.
C: I am familiar with that circuit. The LED is supposed to be displaying alternately flashing one and seven.
U: The LED is off.
C: What is the switch at when the LED is off?
U: The switch is down.
C: Put the switch up.
U: Okay.
C: Put the knob to one zero.
U: Okay.
C: What is the LED displaying?
U: LED is displaying alternately flashing one and top of seven.
C: Is there a wire between connector one zero two and connector one zero four?
**************************************************************

Dialog Segment 2 (Declarative Mode)

C: This is the circuit fix it shop. How may I help you?
U: Rs111 is not working.
C: I am familiar with that circuit. The LED is supposed to be displaying alternately flashing one and seven.

NOTE: Subject put the switch up and set the knob to one zero, observed the LED display, and noted the potential problem without requiring any assistance from the computer.

U: There is no wire on one zero four.
C: There is supposed to be a wire between connector one zero four and connector one zero two.
Figure 4: Sample Dialog Segments: Directive vs. Declarative
[Figure 5 is a block diagram: Voice Input → Speech Recognition on IBM PC → Sun Workstation (Parsing, Dialog Processing, Response Formulation) → Speech Output on DECtalk. The arrows are communications links.]

Figure 5: Hardware Configuration
by a Verbex 6000 running on an IBM PC. To improve speech recognition performance we restricted the vocabulary to 125 words. A DECtalk[2] DTC01 text-to-speech converter is used to provide spoken output by the computer.
4.1 Problem-solving Effectiveness

The system was experimentally evaluated using eight Duke University undergraduates. These students had completed one computer science class and were taking a second, but none had taken any AI classes nor had they previously used a natural language system. Furthermore, none were electrical engineering students, who could probably have fixed the circuit without any computer assistance. Each subject participated in the experiment in three sessions. During the first session, a training session, subjects recorded their voice patterns for the speech recognizer and then practiced using the system on up to four "warmup" problems where the computer had maximal initiative (directive mode). The last two sessions were problem-solving sessions. In one problem-solving session the computer operated in directive mode, and in the other session it operated in declarative mode, giving the user the initiative. Because we have not yet developed a good strategy for automatically changing modes during a dialog, we had to lock the system in one mode for the duration of a dialog to study the effects of different levels of initiative. In each of the problem-solving sessions a subject could work on up to ten different problems.

A total of 141 problems were attempted, of which 118 (84%) were successfully completed. Of the 23 that were not completed successfully, 22 were terminated prematurely due to excessive time being spent on the dialog. Misunderstandings due to misrecognition were the cause of 13 of these failures. Misunderstandings due to inadequate grammar coverage occurred in 3 of the failures. In 4 of the failures the subject misconnected a wire; the domain processor did not have sufficient knowledge to properly assist the user in this situation. In one failure there was confusion by the subject about when the circuit was working, and in another failure there were problems with the system software. A hardware failure caused termination of the final dialog.
Thus, over two-thirds of the failures were due to miscommunication between the computer and user. The vast majority of these failures were due to misrecognition by the computer. In fact, out of 2840 utterances, only 50% were correctly recognized word for word by the speech recognizer. However, due to the development of an error-correcting parser [4], used in conjunction with dialog expectations to select among equally likely parses, 81.5% were still correctly interpreted. Furthermore, the robustness that a goal-seeking dialog processor provides enabled the vast majority of dialogs to be completed successfully in spite of the recognition problems. A detailed description of the experimental procedure and results is given in [9].
4.2 Eliminating Miscommunication

Whenever spoken input is used, it is virtually certain that words will be misrecognized. This leads to ungrammatical utterances that the computer will misinterpret. Our error-correcting parser overcomes this problem by determining the corrections of "lowest cost" needed to transform the ungrammatical input received from the speech recognizer into a grammatical utterance. Allowed corrections are the insertion and deletion of words as well as the substitution of phonetically similar words such as "eight" and "it". The cost of a correction is pre-specified according to its impact on the utterance meaning. For example, the insertion or deletion of "not" should have a high cost, while the insertion or deletion of "a" and "the" will normally have a low cost. In addition to the cost of the corrections, each utterance has an expectation cost based on the likelihood that the utterance will be spoken in the current context. The overall cost is a function of the correction cost and expectation cost (see [4] for details).

Despite the error-correcting parser, 18.5% of the utterances were still misinterpreted. Due to the potential for confusion, whenever a serious misrecognition occurred that caused the system to compute a contradictory interpretation of the user's utterance, the experimenter was allowed to notify the user that a misrecognition
[2] DECtalk is a trademark of Digital Equipment Corporation.
had occurred and to inform the user of the incorrect interpretation. An example from the experiment: the user said "the circuit is working", but the speech recognizer returned the words "faster it is working", which was interpreted as the phrase "faster"; consequently, the experimenter told the user, "Due to misrecognition, your words came out as 'faster'." It is important to note that the experimenter did not tell the user what to do, but merely described what happened. In this way, the interaction was restricted to being between the computer and user as much as possible, given the current quality of commercial, real-time, continuous speech recognition devices, where the application domain has a vocabulary with many phonetically similar words (e.g. "switch" and "which", "eight" and "it", etc.).

In general, misrecognition problems appear to be the limiting technological factor for commercial spoken natural language dialog. Even under the restricted conditions of the experimental system (a 125-word vocabulary and speaker-dependent recognition[3]), the system still had a significant number of misrecognitions. Systems that allow less restrictive forms of spoken input, such as a 1000 to 2000 word vocabulary of speaker-independent speech, are even more prone to misrecognitions. Thus, research continues into providing better recognition performance and understanding of spoken natural language inputs. Two systems that have been constructed to study the problem of spoken natural language understanding in the larger vocabulary, speaker-independent environment are the MINDS system of Young et al. [13] and the TINA system of Seneff [8]. MINDS was the first spoken natural language system to exploit high-level dialog information to assist with the speech recognition process. As in our system, it uses domain-specific and dialog-specific knowledge to predict user responses.
These predictions constrain the search during the speech recognition process for the next user utterance, improving word recognition accuracy from 82.1% to 97.0% and semantic accuracy from 85% to 100% in a test of 200 spoken sentences. MINDS illustrates the power of high-level expectations within a domain-specific dialog environment of database query.

TINA is designed to provide a portable spoken natural language interface to database query systems. TINA uses probabilistic networks in the parsing process. The networks are created dynamically from context-free grammar rules, where the probabilities are derived from frequency counts on the rules generated in parsing a set of training sentences selected by the system designer. The networks serve to provide expectations for the occurrence of words during parsing. The system has been evaluated in two different domains, successfully parsing 84% of a 200-sentence test set in a domain with a 965-word vocabulary and 76% of a 560-sentence test set in another domain.

Another avenue of research is selective automatic verification, whereby the computer asks the user, "Did you mean to say X?", where X is the computer's interpretation based on the recognized words. Verification is selective in that it is only done when the correction cost of the selected interpretation exceeds a given threshold, implying that there is reason to believe there are substantial inaccuracies in the recognized words. Such a system was constructed after completion of our experiment, and tests were conducted to see what percentage of utterances might be correctly interpreted with verification. The maximum value achieved was 97%, but with a tradeoff of verifying 22% of the utterances that were initially interpreted correctly. It is still an open research problem to devise the best ways for reducing misrecognition and the possible confusion it entails.
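The selective-verification decision reduces to a threshold test on the correction cost of the chosen interpretation. The threshold value below is an arbitrary placeholder, not the one used in the actual system.

```python
# Sketch of selective automatic verification: confirm with the user
# ("Did you mean to say X?") only when the selected interpretation
# required costly corrections, i.e. the recognized words were probably
# substantially inaccurate. THRESHOLD is an invented placeholder value.

THRESHOLD = 5.0

def should_verify(correction_cost, threshold=THRESHOLD):
    return correction_cost > threshold

assert not should_verify(1.0)   # cheap fix: accept silently
assert should_verify(8.0)       # expensive fix: confirm with the user
```

Raising the threshold verifies less and risks more misinterpretations; lowering it catches more errors but, as the experiment showed, needlessly verifies utterances that were already interpreted correctly.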
4.3 The Effects of Initiative

The comparative results of users as a function of the level of initiative followed the expected trends. When experimental subjects had the initiative they tended to:
- Complete the dialogs faster (4.5 minutes in declarative versus 8.5 minutes in directive)
- Speak fewer utterances per dialog (10.7 in declarative versus 27.6 in directive)
[3] Speaker-dependent recognition requires each system user to individually record his or her voice patterns for use by the recognizer. In contrast, a speaker-independent recognizer does not require each user to record his or her voice patterns in advance; such a recognizer uses generic voice patterns already recorded.
- Speak longer utterances (37% one-word utterances in declarative versus 60% in directive)
A tradeoff occurred with respect to the impact of misrecognitions. A higher percentage of misinterpretations occurred in declarative mode, 25% as opposed to 15% in directive mode, but it was easier to recover from the misrecognitions in declarative mode. More misinterpretations occurred in declarative mode because user initiative encouraged longer user utterances, leading to more misrecognized words, but user initiative also allowed users to easily redirect the computer back to the appropriate goal after a misinterpretation. In contrast, when a misrecognition caused the computer to take up an erroneous goal in directive mode, there was little the user could do to change the course of the dialog until the computer realized it was not making progress. This points out one of the needs for automatic mode switching: to allow a user, following a misrecognition, to redirect the computer to the appropriate goal before yielding control to receive the computer's guidance again.
5 Where Do We Go Next?

This paper has discussed a model for spoken variable initiative dialog. The current results show the utility of commercial speech recognition and speech production hardware, provided that the constructed system has: (1) an integrated dialog processing model that combines a domain problem solver, a general subdialog mechanism, and knowledge about the user for providing timely and coherent task assistance; and (2) a robust parsing and language understanding mechanism for correctly determining utterance meaning in the presence of misrecognitions. Specifically, the implemented system demonstrates:
- problem-solving effectiveness: 84% of the problems were completed within the artificial time constraints imposed for the experiment. Probably 95% would have been completed if more time had been allowed.
- flexible behavior: experimental subjects given the initiative used 45% less time to complete the problems.
- robust behavior: a significant percentage of problems were completed although over 18% of the utterances were misinterpreted due to misrecognition.
- real-time response: after the experimental testing was completed, an improved parser was implemented. Running on a Sparc-2 workstation, the average response time by the computer for a subject tested with the enhanced system was 2.2 seconds.
With continuing improvements in speech recognition technology and in the algorithms for (1) using dialog expectations, (2) verifying suspicious user utterances, and (3) changing mode, spoken variable initiative dialog can provide a robust, flexible, and adaptable interface to expert systems and other application programs for users of varying expertise and training.
6 Acknowledgments

This research was supported by National Science Foundation Grant Number NSF-IRI-88-03802 and by Duke University. Other researchers who have contributed to this project include Alan W. Biermann, Robert D. Rodman, Ruth S. Day, D. Richard Hipp, Dania Egedi, and Robin Gambill. The author also expresses his grateful appreciation to the anonymous referees, whose comments have helped greatly in the preparation of the final version of this paper.
References

[1] S. Carberry. Plan Recognition in Natural Language Dialogue. MIT Press, Cambridge, Mass., 1990.
[2] R.E. Frederking. Integrated Natural Language Dialogue: A Computational Model. Kluwer Academic Publishers, Boston, 1988.
[3] G.G. Hendrix, E.D. Sacerdoti, D. Sagalowicz, and J. Slocum. Developing a natural language interface to complex data. ACM Transactions on Database Systems, pages 105-147, June 1978.
[4] D.R. Hipp. Design and Development of Spoken Natural-Language Dialog Parsing Systems. PhD thesis, Duke University, 1992.
[5] S.J. Kaplan. Cooperative responses from a portable natural language query system. Artificial Intelligence, 19(2):165-187, 1982.
[6] A. Kobsa and W. Wahlster, editors. User Models in Dialog Systems. Springer-Verlag, New York, 1989.
[7] K. Lee, H. Hon, and R. Reddy. An overview of the SPHINX speech recognition system. In A. Waibel and K. Lee, editors, Readings in Speech Recognition, pages 600-610. Morgan Kaufmann, San Mateo, CA, 1990.
[8] S. Seneff. TINA: A natural language system for spoken language applications. Computational Linguistics, pages 61-86, March 1992.
[9] R.W. Smith. A Computational Model of Expectation-Driven Mixed-Initiative Dialog Processing. PhD thesis, Duke University, 1991.
[10] R.W. Smith, D.R. Hipp, and A.W. Biermann. A dialog control algorithm and its performance. In Proceedings of the 3rd Conference on Applied Natural Language Processing, pages 9-16, 1992.
[11] S. Whittaker and P. Stenton. Cues and control in expert-client dialogues. In Proceedings of the 26th Annual Meeting of the Association for Computational Linguistics, pages 123-130, 1988.
[12] R. Wilensky, D.N. Chin, M. Luria, J. Martin, J. Mayfield, and D. Wu. The Berkeley UNIX consultant project. Computational Linguistics, 14(4):35-84, 1988.
[13] S.R. Young, A.G. Hauptmann, W.H. Ward, E.T. Smith, and P. Werner. High level knowledge sources in usable speech recognition systems. Communications of the ACM, pages 183-194, February 1989.