Goal-Oriented Multimedia Dialogue with Variable Initiative

Alan W. Biermann, Curry Guinn, Michael S. Fulkerson, Greg A. Keim, Zheng Liang, Douglas M. Melamed, Krishnan Rajagopalan

Duke University, Durham, NC 27708, USA

Abstract. A dialogue algorithm is described that executes Prolog-style rules in an attempt to achieve a goal. The algorithm selects paths in the proof in an attempt to achieve success and proves subgoals on the basis of internally available information where possible. Where "missing axioms" are discovered, the algorithm interacts with the user to solve subgoals, and it uses the information received from the user to attempt to complete the proof. A multimedia grammar codes messages sent to and received from the user into a combination of speech, text, and graphics tokens. This theory is the result of a series of dialogue projects implemented in our laboratory. These will be described, including statistics that measure their levels of success.

1 Problem Solving and Subdialogues

When two individuals collaborate to solve a problem, they undertake a variety of behaviors to enable a fast and efficient convergence to a goal. They reason individually to find a sequence of actions that will achieve success, and if they are successful, the problem will be solved directly. The reasoning, however, could uncover obstacles, and they may then communicate to try to address them. One participant may see a solution to the problem if a specific issue can be resolved and bring that to the attention of the other. The other may be able to provide the needed support but may respond with other subproblems to be faced. They hand back and forth subproblems related to the global goal, and solutions to them as they find them. If a sufficient set of the subgoals is solved, the global one will be solved also, and the interaction will terminate with success.

The problematic subgoals are thus the ones that generate most interactions. We will call the set of interactions related to a given subgoal a subdialogue, and every problem solving dialogue seems to be an amalgamation of such subdialogues. Thus we do not see a problem solving dialogue as a linear coverage from beginning to end. Rather we see it as a light-footed dance in the space of mentionable topics, with almost every step aimed at a problematic subgoal.

The concept of subdialogues as constituents of dialogue has been studied widely. For example, Grosz and Sidner [5] call them segments, and they give an analysis of dialogue phenomena in terms of a three-part model. Specifically, they define the intentional structure to be the reasoning mechanism that supports the interaction, the attentional structure to be the representation of current and recent subtopics within the dialogue, and the linguistic structure to be the syntax of the interaction.

As another example, Allen et al. [1] have developed a blackboard architecture for processing subdialogues and give extensive examples of its use in discourse understanding. Reichman [9] developed a somewhat grammatical view of discourse, which she analyzed in terms of "context spaces," a construction analogous to our subdialogues.

A theory of subdialogues must account for a variety of phenomena. There must be a way to decide when to open a subdialogue, and each participant must decide whether or not to follow the lead of the other participant. That is, the question of who has the initiative [10] must be decided, and this may require some negotiation. A decision must be made within a subdialogue as to what is an appropriate question or answer to transmit to the partner and how to present it. This may include reference to a user model [7]. Sentence processing will necessarily reference the context of the subdialogue for the purposes of noun phrase resolution and other meaning linkages. The subdialogue context will make it possible to correct errors in the recognition of incoming speech. Metastatements must be made from time to time announcing the beginning and end of significant parts of the session, giving the session status, and otherwise encouraging collaboration and keeping the partners in synchronism.

An example of a problem solving dialogue appears in Figure 1. It illustrates the use of the Duke Programming Tutor as a student discovers she has forgotten to include quotes around the strings to be printed. All interactions are spoken except where noted. The subdialogues are marked; the top level subdialogue, aimed at teaching the student a set of concepts in programming, is listed as subdialogue 1; the specific interactions aimed at finding the remaining error and at getting instruction on the writeln statement are listed as subdialogues 2 and 3.

The collection of mechanisms that enable participation in problem solving dialogue constitutes a theory of such dialogue [1, 3, 5, 6, 9, 10, 11, 13]. This paper will overview such a theory, describe several implementations, and give data concerning its capabilities in sessions with human beings. For an overview of the literature on computer aided instruction, see [12].

2 A Theory of Dialogue for Machine Implementation

The issue is how to build a machine that can participate in problem solving dialogue. The theory begins with a problem solving mechanism; it is a Prolog-style theorem prover [6, 10, 11] that can operate on a Horn clause database to infer goals in the application domain of interest. The system begins with a top level goal and attempts to prove that the goal is true using rules from the database.¹ If it succeeds, the system will announce that the goal is achieved and no significant dialogue will occur. If it fails, it will look for subgoals that would help complete the proof. Then it will consult its user model to determine whether any failed subgoals may be approachable by the user.

¹ The system does not use traditional Prolog depth-first, left-to-right processing. It selects paths using domain and dialogue criteria.

All interactions by student and computer are spoken except where noted. The subdialogues are indicated by the integers at the right. The interaction has proceeded to the point where everything is correct except that there are no quotes around the strings that are to be printed.

Student:  I need more help.                                             (1)
Computer: THE WRITELN STATEMENT IS NOT CORRECT. (Figure 2)              (2)
Student:  Explain writeln.                                              (3)
Computer: THE WRITELN STATEMENT IS USED WHEN YOU WANT TO OUTPUT
          SOME VALUE ONTO THE SCREEN DURING THE EXECUTION OF THE
          PROGRAM.                                                      (3)
Student:  Okay.                                                         (3)
Computer: TO USE THE WRITELN STATEMENT, YOU HAVE TO TYPE IN THE
          FOLLOWING. (Figure 3)                                         (3)
Student:  What's wrong with my program?                                 (1)
Computer: THE WRITELN STATEMENT IS NOT DONE YET.                        (2)
Student:  I do not understand.                                          (2)
Computer: THERE IS SOMETHING MISSING HERE. (Figure 4)                   (2)
Student:  I do not understand.                                          (2)
Computer: YOU NEED TO ADD AN APOSTROPHE HERE.                           (2)
Student:  {Oh.} [Student inserts two missing quotes in each writeln
          statement.] Okay.                                             (2)
Computer: CONGRATULATIONS. YOUR PROGRAM IS CORRECT NOW.                 (1)

Fig. 1. A spoken language and graphics dialogue between a student and the machine.

Upon finding a needed subgoal that may be within the user's repertoire, the system will initiate a subdialogue to try to achieve that subgoal. This interaction may involve the discovery of additional subgoals and their associated subdialogues, it may be interrupted by user-initiated subdialogues, and a complex and highly fragmented interaction may ensue. The system controls flow continually, of course, to guide the session toward ultimately solving the global goal.
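This loop can be made concrete with a short sketch. The following is a hedged illustration rather than the authors' code: a backward chainer over Horn clauses that, when a subgoal can be proved neither from its facts nor its rules (a missing axiom), opens a stubbed subdialogue with the user and records a positive outcome as a new axiom. The predicate names, the user-model stub, and the yes/no subdialogue are all invented for the example.

    # Minimal sketch of the missing-axiom dialogue loop (illustrative only).
    FACTS = {"header_ok"}                 # axioms already established internally
    RULES = {                             # Horn clauses: goal :- body
        "program_correct": [["header_ok", "body_ok"]],
        "body_ok":         [["writeln_ok"]],
        "writeln_ok":      [["keyword_ok", "left_quote_ok"]],
    }

    def ask_user(goal):
        """Stand-in for a subdialogue aimed at achieving `goal`."""
        return input(f"SYSTEM: please resolve '{goal}' (y/n): ").lower().startswith("y")

    def prove(goal, user_model=lambda g: True):
        if goal in FACTS:
            return True
        for body in RULES.get(goal, []):
            if all(prove(sub, user_model) for sub in body):
                FACTS.add(goal)
                return True
        # Missing axiom: no internal proof exists. Consult the user model,
        # then interact; the received information completes the proof.
        if goal not in RULES and user_model(goal) and ask_user(goal):
            FACTS.add(goal)
            return True
        return False

    print("Goal achieved." if prove("program_correct") else "Goal not achieved.")

Here left_quote_ok is the only unprovable leaf, so the entire interaction concentrates on that subgoal, mirroring the quote-repair subdialogue of Figure 1.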

Fig. 2. "THE WRITELN STATEMENT IS NOT CORRECT."

Fig. 3. "TO USE THE WRITELN STATEMENT, YOU HAVE TO TYPE IN THE FOLLOWING:"

Fig. 4. "THERE IS SOMETHING MISSING HERE."

All of this can be illustrated by showing the design of our Duke Programming Tutor, which is aimed at teaching a set of concepts to a student. The highest level goal is that the student understand the set of concepts, and its subgoals are those concepts. This is shown in Figure 5. The subgoals below any specific concept are its constituents, and this decomposition proceeds to a very low level, the atomic concepts in the current domain. An important subgoal in the proof tree is that the user be able to do an example program. The tutorial session proceeds by selecting subgoals that the user model says are appropriate for this user at this time and then initiating dialogue to address them.

Suppose, for example, that the user has been asked to write an example program and has typed in the code shown in Figure 2. (In order to serve a specific course currently being taught at Duke, the system teaches Pascal.) The system compares this program with a model and creates a Prolog description of what the user has typed, as sketched below. The system then attempts to prove that the program is correct using this synthesized set of rules. The theorem proving tree is shown in Figure 6. It turns out that this program has an error in the writeln statements; the single quotes around the text to be printed are missing. Theorem proving proceeds down the tree and discovers that none of the following subgoals can be achieved: exerciseprog, body, writeln, leftquote. So it can undertake a subdialogue to try to achieve one of these subgoals.
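As a rough illustration of what such a synthesized description might look like, the sketch below derives provable facts from the typed program text. The predicate names and the single quoting check are hypothetical stand-ins, not the tutor's actual rule set.

    # Illustrative sketch: turn typed code into facts a prover can consume.
    import re

    def program_facts(source):
        """Derive facts such as ("writeln_stmt", line_no) from the typed code."""
        facts = set()
        for i, line in enumerate(source.splitlines(), start=1):
            m = re.search(r"writeln\((.*)\)", line)
            if m:
                facts.add(("writeln_stmt", i))
                arg = m.group(1)
                if arg.startswith("'") and arg.endswith("'"):
                    facts.add(("left_quote", i))
                    facts.add(("right_quote", i))
        return facts

    src = "begin\n  writeln(Do you like mathematics?);\nend."
    facts = program_facts(src)
    # The quotes are missing, so no quote facts are derived and the proof of
    # the writeln subgoal fails exactly at the quote leaves, as in Figure 6.
    print(("left_quote", 2) in facts)   # False: a failed subgoal drives dialogue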

Fig. 5. The goal tree for the Programming Tutor.

Fig. 6. The theorem proving tree for the student's example program.

But the system must not initiate an interaction without consulting the user model. It must decide what subgoal is reasonable to attempt. It must not use vocabulary or concepts that are outside the user's repertoire. Also it should not address the user at a level far below his or her capabilities. For example, this tutor is initially to be used in the first weeks of the course, when much emphasis is at the statement level of the code. So the user model would allow reference to statement level concepts and below: writeln, leftquote. The system could then choose the highest level one, writeln, and send it to the user: "THE WRITELN STATEMENT IS NOT CORRECT" (with highlighting). On the other hand, a first day student would not be familiar with the concept of a statement and might have to receive coaching at the character or word level: "YOU NEED TO ADD AN APOSTROPHE HERE" (with a displayed arrow).

A more expert student (at a level we are not working with at this time) might be able to handle concepts like a sort routine or a hash function computation. For such a student, references at the statement level would be quite inappropriate unless higher level coaching failed.

For the purposes of the current example, let us examine the system behavior after the statement "THE WRITELN STATEMENT IS NOT CORRECT" (with highlighting). Here are the possible user responses:

Correct repair. The user might see the error in the code and edit the program until the error is fixed. This result would yield a revision to the Prolog representation of Figure 6 and would show the failed nodes related to this error now repaired. The theorem proving would continue to try to show that the exercise program is correct and look for other failure points in the proof.

Additional questioning. The student might fail to understand the recent output. Either the user model was incorrect or the student just needs more help. Here the system steps to a level lower in the theorem proving tree and gives another output: "THERE IS SOMETHING MISSING HERE" (with a graphic arrow).

Fig. 7. Proving the assertion that the user understands writeln.

Related remarks. The student may give an assertion or ask a question related to the recent output. For example, the response might be to ask a general question about the particular construction at hand: "Explain writeln." Here the system does a nearest neighbor search of expected meanings in the current context and tries to find a local meaning with a syntax that is near (in the Hamming distance sense) to the user's utterance. If it finds such a meaning, it makes a corresponding entry in the Prolog database and computes its appropriate answer.

In the current tutorial example, the dialogue algorithm attempts to prove explain writeln and fails. Figure 7 shows the knowledge base for the writeln statement. The same mechanism as described above governs processing. We will assume the predicate understand statement has been proved through earlier interactions; the user knows what a statement is. Next the system examines use writeln, checks that the user model allows the interaction, and then enunciates "THE WRITELN STATEMENT IS USED WHEN YOU WANT TO OUTPUT SOME VALUE ONTO THE SCREEN DURING THE EXECUTION OF THE PROGRAM." If the user indicates understanding with "okay" or some paraphrase of it, the assertion use writeln will go into the database and control will return to the dialogue system. It will continue the proof of explain writeln by enunciating the predicate syntax writeln. If the user indicates lack of understanding at this point, then the subgoals below syntax writeln will be enunciated to the user: "THE SYNTAX OF THE WRITELN STATEMENT REQUIRES THAT YOU FIRST TYPE THE KEYWORD WRITELN", "YOU THEN HAVE TO TYPE A LEFT PARENTHESIS, '('", and so forth. Assuming all of these receive positive confirmation from the user, syntax writeln will be proven and control can return to the dialogue machine. If any of them fails, they could conceivably be decomposed to lower level concepts and their associated explanations.

Unrelated remarks. The user may assert something that does not relate to the subgoal in any direct way. The system has a priority list of other subgoals and proceeds to search them in order for a (low Hamming distance) match with the incoming utterance. Upon finding one, its behaviors will proceed as described in the previous case. If no match is found, the system can ask for a repeat of the assertion or follow its own initiative for the next step.

Metastatements. Metastatements [6] are defined to be assertions above or about the dialogue. They are not part of the information exchange needed to solve the problem but rather refer to what has just been done, what should be done next, or how the interaction is proceeding. They serve the purposes of synchronizing the interaction, maintaining morale for the participants, and directing attention to key issues. Our system has few abilities to handle such statements from the user. But if the user says "Please repeat," the system responds appropriately.

In summary, the dialogue system has a top level algorithm that cycles through the steps described above. Its functioning is approximated by the algorithm shown in Figure 8.
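The confirmation mechanism of the explain writeln example can be sketched as follows. The rule decomposition mirrors Figure 7, but the predicate names, canned utterances, and the treatment of "okay" are illustrative assumptions; the real system decomposes syntax writeln only after a failed confirmation, which this simplified version does not model.

    # Sketch: positive confirmations become new axioms in the database.
    DB = {"understand_statement"}            # proved in earlier interactions
    RULES = {                                # tutorial rule: goal :- body
        "explain_writeln": ["understand_statement", "use_writeln", "syntax_writeln"],
        "syntax_writeln":  ["keyword_writeln", "left_paren_writeln"],
    }
    UTTERANCE = {
        "use_writeln": "THE WRITELN STATEMENT IS USED WHEN YOU WANT TO OUTPUT "
                       "SOME VALUE ONTO THE SCREEN DURING THE EXECUTION OF THE PROGRAM.",
        "keyword_writeln": "YOU FIRST TYPE THE KEYWORD WRITELN.",
        "left_paren_writeln": "YOU THEN HAVE TO TYPE A LEFT PARENTHESIS, '('.",
    }

    def tutor(goal):
        if goal in DB:
            return True
        if goal in RULES:                    # decompose and teach each part
            ok = all(tutor(sub) for sub in RULES[goal])
        else:                                # atomic concept: enunciate it
            print("COMPUTER:", UTTERANCE[goal])
            ok = input("STUDENT: ").strip().lower() in ("okay", "ok", "yes")
        if ok:
            DB.add(goal)                     # positive confirmation becomes an axiom
        return ok

    tutor("explain_writeln")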

3 Communication with the Reasoning System

The internal representation for the purposes of reasoning is in terms of logical predicates. The external form is in terms of spoken sentences, graphical objects (pointers or highlighting, in our case), and displayed text. A translation [4, 8] is needed between the two forms so that the user can communicate with the internal system.

Attempt proof of top level goal.
If success, then halt.
While top level goal is not achieved:
    Select failed subgoal.
        If system has initiative, its own scoring will govern the choice.
        If user has initiative, the choice will follow the user's selection.
    If selected subgoal is approachable as indicated by user model:
        Engage in subdialogue.
        Enter results into database.

Fig. 8. The dialogue algorithm.

This section will describe a type of multimedia grammar that can do this translation successfully. The grammar is based upon a set of operators that manage the syntax and semantics of the language. A given multimedia communication is accounted for by a sequence of these operators, as will be described here. An operator will have five constituents:

a. a name
b. an applicability criterion
c. a syntactic form
d. a semantic form
e. a complexity

An example of such an operator is the noun-line operator associated with the noun "line." It has the following five constituents:

a. noun-line,
b. initializes a noun phrase,
c. "line" (spoken),
d. collect all objects of type line from the input stream,
e. 1

The usage of this operator can be illustrated by applying it to the program in Figure 2. That is, we could begin constructing a noun phrase with the spoken word "line," and the associated semantics would collect all of the lines in the input stream. If the program in Figure 2 is the input, then the semantics of noun-line would be as shown in Figure 9(a). The collected objects are those items delineated by square brackets.

Another operator corresponds to the use of spoken ordinal words such as "first" or "tenth":

a. ordinal,
b. previous operator should be a noun or adjective,
c. the spoken ordinal word appended to the left of the phrase,
d. collect the specified single object,
e. 1

We can apply this operator, instantiated at "fifth," to the result of the noun-line operator in Figure 9(a) to obtain the object in Figure 9(b). It counts through the square bracketed objects until it finds the fifth one. It sends that to the output stream.

In a similar fashion, a variety of operators can be constructed for other parts of speech. Without further discussion, we note that the complete noun phrase "the ninth character in the fifth line" can be easily accounted for as shown in Figure 9. The semantics resolve to the single object [D], which is, in fact, the ninth character in the fifth line of the original program. The complexity of this operator sequence is 7.

An example of an operator for the spoken article "this," accompanied by an appropriately aimed arrow on the screen, is as follows:

a. art-pointer,
b. the previous operator should be a noun or adjective,
c. "this" (spoken with an arrow aimed at the designated item),
d. collect the specified object,
e. 1

It is easy to see how this operator can be used in place of some of the others listed above to account for these forms instead of the one shown in Figure 9:

"the ninth character in this line" (with graphics arrow)    complexity = 6
"this character" (with graphics arrow)                      complexity = 2

Thus there are many syntactic forms capable of expressing a given meaning, and the purpose of the complexity computation is to narrow the search to only a few forms, one of which will be most appropriate for defining the target meaning for the utterance. The mechanism of the complexity computation gives the system the ability to adapt its communications to the current situation being encountered. For example, a normal user operating in a typical environment might appropriately receive a lot of speech communications with graphics aids as described here. However, if the situation changed so that the user was distracted from seeing the screen, the complexities of the graphics outputs could be increased greatly to filter them out of the communications. In another situation, the ambient noise could become intolerable, so that spoken communications cease to function successfully. In this case, spoken outputs could be rated as highly complex and they would be filtered from the outputs.

In general, we allow the full power of a programming language in the syntax and semantics slots for the operators. Thus the syntax could be coded to operate a mechanical arm or adjust the expression on a simulated face. However, our own usage has included only spoken and displayed words and graphics arrows or highlighting of one kind or another.
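To make the operator machinery concrete, here is a small sketch in which each operator carries the five constituents and composition accumulates both the phrase and the complexity. The types and the tiny driver loop are our own illustrative assumptions, not the system's representation.

    # Sketch: five-constituent operators composed as in Figure 9(a)-(b).
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Operator:
        name: str                # a. name
        applies_after: str       # b. applicability criterion
        syntax: Callable         # c. syntactic form (builds the phrase)
        semantics: Callable      # d. semantic form (transforms the stream)
        complexity: int          # e. complexity

    noun_line = Operator("noun-line", "start",
                         lambda p: "line",
                         lambda stream: stream.splitlines(),  # collect all lines
                         1)

    def ordinal(n, word):        # instantiated at, e.g., "fifth"
        return Operator("ordinal", "noun-or-adjective",
                        lambda p: f"{word} {p}",
                        lambda objs: objs[n - 1],  # collect the single object
                        1)

    program = ("program t;\nvar\n  answer:string;\nbegin\n"
               "writeln(Do you like mathematics?);")
    meaning, phrase, cost = program, "", 0
    for op in (noun_line, ordinal(5, "fifth")):
        meaning = op.semantics(meaning)
        phrase = op.syntax(phrase)
        cost += op.complexity
    print(phrase, "->", meaning, "| complexity =", cost)
    # fifth line -> writeln(Do you like mathematics?); | complexity = 2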

Certainly, this mechanism is capable of processing full sentences, but we omit the additional discussion here. Notice also that our mechanism works both as a generator of communications and as a recognizer of them. This can be illustrated by re-examining the generation described above. As a generator, one begins with the program in Figure 2 and tries to find a way to specify a particular [D]. Here one searches the space of operators for a sequence that specifies that [D], and a side effect is one of the syntactic forms given above. As a recognizer, one begins with the example program and a syntax such as "the ninth character in the fifth line." Here we search for the sequence of operators that creates the syntax, and a side effect is a computation of the meaning, which is that certain [D]. While the system can function in both directions, our recent implementation has used it only in the generation mode.

We have a theory of error correction for speech input in the context of subdialogues that we have not been able to combine with the multimedia grammars. This is illustrated in Figure 10, which shows in the rightmost column a series of expected meanings after the machine utterance "THE WRITELN STATEMENT IS NOT CORRECT." These expectations come from the contexts described above: correct repair, additional questions, related remarks, and so forth. Associated with each meaning is a set of possible syntactic forms, as shown in the second column. The incoming utterance, which typically will contain misrecognitions, is shown on the left in Figure 10, and the computation is to find the syntax that is closest to one of the syntactic forms of the second column. The distance is a Hamming-like computation with heavier weighting for the more important words. This process is highly related to what is referred to in the literature as plan recognition [2].
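The paper does not spell out the exact metric, so the following is one plausible reading: a position-wise, Hamming-like comparison of word strings in which important words carry more weight. The weights, the expectation table, and the misrecognized input are all invented for the example.

    # Sketch of expectation-based correction of a misrecognized utterance.
    WEIGHT = {"writeln": 3.0, "explain": 2.0}       # assumed word weights

    def distance(heard, expected):
        h, e = heard.split(), expected.split()
        d = 0.0
        for i in range(max(len(h), len(e))):
            wh = h[i] if i < len(h) else ""
            we = e[i] if i < len(e) else ""
            if wh == we:
                d -= WEIGHT.get(we, 1.0)            # reward matched words
            else:
                d += max(WEIGHT.get(wh, 1.0), WEIGHT.get(we, 1.0))
        return d

    EXPECTED = {                                    # syntax -> meaning
        "it is fixed": "assert(error_repaired)",
        "explain writeln": "quest(explain,writeln)",
        "please repeat": "quest(meta,repeat)",
    }

    heard = "explain white line"    # recognizer error for "explain writeln"
    best = min(EXPECTED, key=lambda s: distance(heard, s))
    print(best, "->", EXPECTED[best])   # explain writeln -> quest(explain,writeln)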

4 Variable Initiative

If one participant has the major knowledge related to a subgoal, efficiency requires that that individual control the interaction. However, when the current subgoal is complete, the other participant may have the more complete knowledge of some new subgoal, so control should be returned. Only if these changes are made quickly to account, at every instant, for where the initiative should be will the dialogue move forward at its best speed.

In realistic situations, the case is not always as clear cut. One participant might seem to have slightly more knowledge at a given time but, in fact, be willing to yield the initiative if the other makes a strong assertion. Thus variable initiative is the most general and useful feature. A system may be strongly directive and always follow its own selection of subdialogues without regard for the inputs from the partner. It may be weakly directive and proceed similarly but with the option of accepting partner goals if they are strongly asserted. It may be mildly passive and follow the partner specified goals except where it finds very strong reasons for following its own preferred path.

(a) Semantics for the syntax "line."  Complexity = 1

[program t;]
[var]
[  answer:string;]
[begin]
[writeln(Do you like mathematics?);]
[readln(answer);]
[if answer = 'yes' then]
[  begin]
[  writeln(You should like this course.);]
[  end]
[else]
[  begin]
[  writeln(Learning Pascal will help you.);]
[  end;]
[readln;]
[end.]

(b) Semantics for the syntax "fifth line."  Complexity = 2

[writeln(Do you like mathematics?);]

(c) Semantics for the syntax "the fifth line."  Complexity = 3

[writeln(Do you like mathematics?);]

(d) Semantics for the syntax "in the fifth line."  Complexity = 4

writeln(Do you like mathematics?);

(e) Semantics for the syntax "character in the fifth line."  Complexity = 5

[w][r][i][t][e][l][n][(][D][o][ ][y][o][u][ ]....

(f) Semantics for the syntax "ninth character in the fifth line."  Complexity = 6

[D]

(g) Semantics for the syntax "the ninth character in the fifth line."  Complexity = 7

[D]

Fig. 9. The operator sequence for the syntax "the ninth character in the fifth line."

The figure matches the recognized user utterance (erroneous input from the recognizer) against examples of expected syntax (derived from the expected meanings), which are in turn derived from the dialogue context tree. Each candidate meaning is rejected (No.) or accepted (Yes!).

Recognized utterance: "Explain white line"

Expected meaning            Examples of expected syntax                         Match
assert(error_repaired)      It is fixed. I found it.                            No.
                            Here is the correct version. Etc.
quest(error_at,keyword)     Is the error at the keyword?                        No.
                            Is writeln spelled wrong? Etc.
quest(error_at,left_paren)  Is there supposed to be a left parenthesis?         No.
                            Does a parenthesis follow writeln? Etc.
quest(error_at,arg)         Is the argument wrong? Is the argument correct?     No.
                            Is it the parameter? Etc.
quest(error_at,rt_paren)    Is there supposed to be a right parenthesis? Etc.   No.
quest(explain,writeln)      I do not understand writeln. Explain writeln.       Yes!
                            How does writeln work? Etc.
quest(meta,repeat)          Please repeat. Say again. Etc.                      No.

Fig. 10. Error correction in the context of expected meanings. The system finds the best match, using a modified Hamming distance criterion, between the incoming recognized speech and the allowed syntactic versions of the expected meanings.

Or it may be very passive and follow the partner without exception. The theory of dialogue described here [6, 10, 11] allows for coding these behaviors.

First, the system must have methods for selecting its preferred next subgoal. A reasonable way to do this is to use heuristic information from the domain in conjunction with locally available information to rank all suggested subgoals. Then the highest ranking ones are preferred. The system then selects its best choice for the next subgoal. When the user responds, however, the decision must be made as to whether to follow the user or to more strongly assert its own selection. If the system is in a strongly directive mode, it will tend not to follow the user's preference, and if it is in a passive mode, it will follow the user.

Guinn [6] has developed a scheme that maintains real numbers on the nodes of the proof tree indicating their desirability as subgoals to be examined. It uses its user model to estimate whether it should take the initiative on a given desirable subgoal. If it chooses to take that subgoal and the user objects, it enunciates the major branches of the subtree that cause it to choose that tree. (This is called negotiation.) This is an attempt to get the user to accept that subgoal and to follow it.
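A schematic of the mode-dependent decision might look like the following. The four mode names come from this section, while the numeric desirability scores, thresholds, and assertion-strength bonus are assumptions made for illustration.

    # Sketch: variable initiative as a threshold on yielding to the partner.
    SCORES = {"fix_writeln": 0.9, "explain_loops": 0.4}   # proof-tree node scores

    THRESHOLD = {                  # margin the user's goal must beat
        "strongly_directive": float("inf"),   # never yield to the partner
        "weakly_directive":   0.3,            # yield only to strong assertions
        "mildly_passive":    -0.3,            # yield unless own reasons are strong
        "very_passive":      float("-inf"),   # always follow the partner
    }

    def next_subgoal(mode, user_goal=None, assertion_strength=0.0):
        own = max(SCORES, key=SCORES.get)     # system's best-ranked subgoal
        if user_goal is None:
            return own
        margin = SCORES.get(user_goal, 0.0) + assertion_strength - SCORES[own]
        # Follow the user when their asserted goal outweighs ours by the
        # mode's threshold; otherwise keep the initiative (and negotiate).
        return user_goal if margin > THRESHOLD[mode] else own

    print(next_subgoal("weakly_directive", "explain_loops", assertion_strength=0.9))
    print(next_subgoal("strongly_directive", "explain_loops", assertion_strength=0.9))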

5 Some Examples of Prolog-Style Rule-Based Dialogue Systems and their Characteristics

Our project constructed a system called the Circuit-Fixit-Shoppe [10, 11] to gain experience with the ideas described above. This system used speech only, for both input and output, and contained a database of information for circuit repair. The system used a Verbex 6000 speaker-dependent connected-speech recognition system with a small vocabulary grammar (125 words). It used a Dectalk system for speech output. It had four levels of initiative, which were set manually. The circuit being repaired was set up on a Radio Shack experimenter's board so that the user could easily make measurements and change the existing wiring.

The system was tested with a series of eight users who attempted a total of 141 circuit repairs and who spoke a total of 2840 utterances. The experimental sessions resulted in 84 percent successful repairs. The input utterances were spoken to the system at the average rate of nearly three per minute. The speech recognizer perfectly recognized 50.0 percent of the inputs, but the error correction system, using the context and nearest neighbor strategies, raised the recognition rate to 81.5 percent.

The system was tested at two levels of initiative, directive and mildly passive. The main difference between the two modes was that the sessions were much longer, involving more and shorter spoken inputs, when the system was in directive mode. In this mode, the system tended to force the user through long sequences of steps that he or she would have preferred to skip. With the system in the mildly passive mode, the users tended to use longer sentences and move much more rapidly to a solution to each problem.

A second system [6] was built to test ideas related to mixed initiative and negotiation. This system had a new dialogue algorithm that had been proven to be mathematically correct.

                          Without Mixed     With Mixed
                          Init. and Neg.    Init. and Neg.
Execution Time (sec.)          93                47
Number of Utterances           41.4              28.1
Branches Explored               6.2               3.3

Fig. 11. Averages for execution time, number of utterances, and number of branches explored below a node for solving 206 problems with and without mixed initiative and negotiation features.

It was implemented without any translation to external media; that is, its only essential language was the predicate calculus language of its theorem proving system. The system was analyzed and tested with three new features: an automatic initiative setting feature, a negotiation feature, and a summary feature that enables the system to explain to its partner a failed reasoning chain to prevent its recurrence. The system was tested by setting up two copies of itself, each with a different portion of a total database of facts. Then the two copies of the system were allowed to carry out a dialogue by passing predicates back and forth until the dialogue goal was proven. The table in Figure 11 shows the execution times, number of utterances, and average number of branches explored below a node for solving 206 problems with the features turned off and on. The results show that efficiency nearly doubled on several dimensions when the mixed initiative and negotiation features were turned on.

More recently, our project has implemented the Duke Programming Tutor system as described above. This system has the complete reasoning and multimedia capabilities described and was tested in a prototype version in September 1995 by eleven students in a first course in computer science. All eleven students were able to use the system successfully on their assignments, although one did not finish in the specified time period. The average length of the dialogues was about 16 minutes, during which students spoke an average of 28 utterances (1.78 utterances per minute). The word error rate was about 15 percent.

6 Summary

Dialogue theory is the implementing technology for speech and general multimedia interactive systems. It asserts that efficient convergence to a goal involves the discovery of obstacles to the goal's achievement and the immediate addressing of them. This leads to subdialogues and the many jumps from one to another as the problem solving process goes forward.

A dialogue system is built around a reasoning mechanism, in our case an executive for Prolog-style rules. The dialogue is driven by the theorem proving steps; specifically, most interactions address the failures to complete the proof. The communications between the system and the outside world require a translation mechanism that can convert between the internal predicate calculus and the multimedia external forms. Once the structure of the system has been created, one can find ways to include many capabilities, such as user modeling, variable initiative, negotiation, expectation-based error correction of speech, and many others.

7 Acknowledgment

This work is supported by Office of Naval Research Grant No. N00014-94-10938, National Science Foundation Grant No. IRI-92-21842, and a grant from the Research Triangle Institute, which is funded by the Army Research Office.

References

1. J. Allen, S. Guez, L. Hoebel, E. Hinkelman, K. Jackson, A. Kyburg, and D. Traum. The discourse system project. Technical Report 317, University of Rochester, 1989.
2. J. F. Allen and C. R. Perrault. Analyzing intention in dialogues. Artificial Intelligence, 15(3):143-178, 1980.
3. J. F. Allen, L. K. Schubert, G. Ferguson, P. Heeman, C. H. Hwang, T. Kato, M. Light, N. G. Martin, B. W. Miller, M. Poesio, and D. R. Traum. The TRAINS project: A case study in building a conversational planning agent. Journal of Experimental and Theoretical AI, 7:7-48, 1995.
4. S. K. Feiner and K. R. McKeown. Coordinating text and graphics in explanation generation. In Proceedings of the 8th National Conference on Artificial Intelligence, volume I, pages 442-449. AAAI Press/The MIT Press, 1990.
5. Barbara J. Grosz and Candace L. Sidner. Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3):175-204, September 1986.
6. Curry I. Guinn. Meta-Dialogue Behaviors: Improving the Efficiency of Human-Machine Dialogue: A Computational Model of Variable Initiative and Negotiation in Collaborative Problem-Solving. PhD thesis, Duke University, 1995.
7. Alfred Kobsa and Wolfgang Wahlster, editors. User Models in Dialog Systems. Springer-Verlag, Berlin, 1989.
8. M. T. Maybury, editor. Intelligent Multimedia Interfaces. AAAI/MIT Press, 1993.
9. R. Reichman. Getting Computers to Talk Like You and Me. The MIT Press, Cambridge, Mass., 1985.
10. Ronnie W. Smith and D. Richard Hipp. Spoken Natural Language Dialog Systems: A Practical Approach. Oxford University Press, 1994.
11. Ronnie W. Smith, D. Richard Hipp, and Alan W. Biermann. An architecture for voice dialog systems based on Prolog-style theorem proving. Computational Linguistics, 21(3):281-320, September 1995.
12. Etienne Wenger. Artificial Intelligence and Tutoring Systems. Morgan Kaufmann Publishers, Inc., Los Altos, CA, 1987.
13. S. R. Young, A. G. Hauptmann, W. H. Ward, E. T. Smith, and P. Werner. High level knowledge sources in usable speech recognition systems. Communications of the ACM, pages 183-194, August 1989.
