Using Protocols to Model Mixed Initiative Interaction

TIMOTHY FOSSUM and SUSAN HALLER
Computer Science Department
University of Wisconsin - Parkside
Kenosha, WI, U.S.A.
email: {fossum, haller}@cs.uwp.edu
ABSTRACT
We are interested in the mechanisms needed to control computer initiative to allow systems to engage in mixed-initiative task-oriented/planning (TOP) dialogues. A TOP dialogue is about an ongoing task that may require replanning during its execution. We discuss our model for initiative and motivate the use of protocols. Finally, we present some protocol rules and analyze a TOP dialogue exchange in terms of them.

KEY WORDS
Natural Language, Dialogue Management, Mixed Initiative
1. Introduction

We are interested in the mechanisms needed to control computer initiative to allow systems to engage in mixed-initiative task-oriented/planning (TOP) dialogues. A mixed-initiative TOP dialogue occurs when the participants negotiate turns that maintain or relinquish control of the interaction in dialogues that people use to do work. A task-oriented dialogue is about a task that goes on during the discussion. In contrast, a planning dialogue is about making a plan for a task that will take place after the dialogue. A TOP dialogue is about an ongoing task that may require replanning during its execution. Mixed-initiative control is essential in these interactions because each participant may have information required to carry out the task successfully. Furthermore, circumstances may change unexpectedly as a result of performing the task.

As an example of mixed initiative in a TOP dialogue, consider the hypothetical discussion between petroleum engineer E and robot system S in Figure 1. In this situation, E is constructing and executing a plan with S to take temperature readings in pipes in a distribution system. S tries to plan the shortest path between points based on its partial knowledge of the connectivity of the pipes and the position (open or closed) of the valves connecting them. At line 1, E states that she needs the temperatures in pipes P3 and P7. S agrees to E’s overall task proposal implicitly by taking the initiative to formulate its first subplan, and reports the activity to E as it executes the subplan (lines 2–3). Afterwards, S continues by formulating the next subplan of the overall task and again reports it to E as execution starts (line 4). However, this plan is
flawed. To report the task failure, S tells E about the point of failure (line 6). By pausing, S cues E to take the planning initiative. E does so and proposes the first steps of a new route to P7 (line 7). The route suggested by E implies that there is a pipe connection between P2 and P5 that S did not know about. S verifies this fact (lines 8 and 9) and uses it to formulate a route all the way to P7. S signals recognition and acceptance of E’s proposed route with the acknowledgment “Ok”, and as before, S reports the state of the task to E as execution starts (line 10). This plan is also flawed, and S reports its failure (line 12). This time, however, S has the knowledge necessary to reformulate the plan from P5 to the destination P7. Instead of taking the initiative to start the task, S only takes the initiative to propose its plan to E (line 13). E approves it (line 14), and S performs the task, implicitly communicating its completion by reporting the temperature in P7 (line 16).

Figure 1 illustrates that to participate in a TOP dialogue, a system must be able to recognize the plans that another participant is making, both at the domain level and at the level of the communication. Moreover, to participate in the interaction, the system must reason about the domain plan and also about the language to use to point out difficulties and ask for additional information. The initiative in this interaction shifts between participants to share information, to confirm beliefs, and to propose plans of action.
2. Computational Models of Mixed Initiative Interaction

In research on TOP dialogues, it is assumed either that the dialogue is controlled by one participant or that control is shared by the participants. Smith and Hipp define system dialogue initiative in terms of the system’s overall level of assertiveness [1]. They identify four dialogue modes that characterize different levels of system initiative in the conversation. In their model, the mode is set at the beginning of each interaction and cannot be altered during the course of the discussion. Their approach to managing dialogue initiative works because the system’s problem-solving knowledge and limitations are always the same: the system has complete knowledge of how to perform the task, but only the user knows the state of the task and has the ability to affect this state.
 1  E: I need the current temperatures in locations P3 and P7.
 2  S: I will go to P3 through P2. [moves as indicated]
 3     The temperature at P3 is 33 degrees Celsius.
 4     I am going back through P2 to P4 and then to P7.
 5     [moves to P2 but is blocked from P4 by a closed valve.]
 6     The valve to P4 is closed. [pause]
 7  E: Try going back through P2 to P5.
 8  S: Is P2 connected to P5?
 9  E: Yes.
10  S: Ok. I am going through P5 to P6, and then to P7.
11     [moves back through P2 to P5]
12     The valve to P6 is closed.
13     It’s possible to get to P7 through P9 and P8.
14  E: Ok.
15  S: [moves as indicated]
16     The temperature in P7 is 42.

Figure 1. A Task-oriented/Planning Dialogue (the brace annotations DSP1 and DSP2 mark the discourse segments for the P3 subtask and the P7 subtask, respectively)
In general, situations where one participant (either system or user) is the designated controller of the interaction simplify the problem of modeling the discourse. However, as Guinn suggests, it is often the case in TOP dialogues that the dialogue initiative will shift between participants because each participant has knowledge and abilities to contribute to accomplishing the task [2, 3]. The question of how to represent and reason about mixed-initiative dialogues has been called the dialogue question [4].

Initiative has been considered in discourse models for tutoring systems [5, 6]. However, in designing initiative control mechanisms in tutoring systems, role assumptions like teacher and student and the schematic structure of the tutoring session can be exploited. In contrast, in task-oriented dialogues where no one is the accepted authority, the dialogue initiative may need to be negotiated on the basis of such factors as who actually knows more or the personal assertiveness of each participant.

Grosz and Sidner note that the structure of task-oriented dialogues often follows the structure of the domain task under discussion [7]. Guinn used this assumption to design a model for managing interaction in which initiative shifts between the agents occur at transition points between discussion of goals and subgoals of the task [3]. Because knowledge for planning the task is distributed between the two agents, they are forced to trade initiative at these transition points in order to exchange information. Planning initiative and dialogue initiative are assumed to be one and the same. Even when a goal is a dialogue goal only (a clarification, for example), it is considered to be a subgoal of the overall task.

Chu-Carroll and Brown distinguish between dialogue initiative and planning initiative (which Chu-Carroll calls task initiative), and the two are tracked separately in their model [8]. Based on their analysis of the TRAINS-91 dialogue corpus [9], Chu-Carroll and Brown
note instances where a participant clearly takes the dialogue initiative but not the planning initiative. For example, in the exchange

U: Let’s get the tanker car to Elmira and fill it with OJ.
S: You don’t have OJ in Elmira.
S takes the initiative to point out the invalidity of U’s plan without proposing an alternative plan or steps that repair U’s proposed plan. In this model, an agent has the dialogue initiative if she takes control of the conversation to establish mutual beliefs about a piece of domain knowledge or the validity of a proposal, and an agent has the planning initiative if she is suggesting steps for how the task should be accomplished. An agent who takes the planning initiative will necessarily take the dialogue initiative. However (as in the above example), an agent can take the dialogue initiative without taking control of planning.
3. Modeling Task, Plan, and Dialogue Initiative

We are developing a model for mixed-initiative interaction that allows a user to engage in a TOP dialogue. The system’s knowledge and the user’s knowledge may be different, but the two can interact to share information. Also, both the user and the system are able to plan.

To motivate our model of initiative, consider the segmentation by discourse purpose of the dialogue in Figure 1. The overall task in the dialogue is to get temperature measurements in pipes P3 and P7. The annotations in Figure 1 show how the dialogue decomposes into two discourse segment purposes, DSP1 (getting the temperature in P3) and DSP2 (getting the temperature in P7) [7]. Each of these discourse segments corresponds to a subtask that S has planned and
is executing. However, in DSP2 there are initiative shifts as E and S replan the task when confronted with new information. Note that there is no segmentation into subtask purposes in DSP2 because the entire plan for this subtask must be discarded. That is, when the valve between P2 and P4 is found to be closed, S and E do not enter a subtask discussion about how to get through the valve. Rather, they enter into a planning discussion to replan the entire subtask starting from this new position.

We believe that there is a need to distinguish between task dialogues and planning dialogues. As previously defined, a task-oriented dialogue is about the ongoing execution of a plan, and a planning dialogue is about planning steps in advance to achieve a task goal. For example, when S informs E that the valve to P4 is closed (line 6), S is reporting on the state of the task and is engaging in a task dialogue. If either S or E knew of a plan to open the valve, the task dialogue would have an embedded subtask dialogue (with a different discourse segment purpose) concerning opening the valve. Instead, the closed valve indicates plan failure (neither E nor S can open the valve), which requires S and E to enter into a planning dialogue to formulate a new plan to get to P7. Since this plan is for the same purpose, it is not an embedded discourse segment (in DSP2) with its own discourse segment purpose.

In our terminology, the theories of Guinn and of Chu-Carroll and Brown apply to planning dialogues. They do not account for failure of an ongoing task and subsequent replanning. In contrast, Smith and Hipp – in the Circuit Fixit Shop – consider only task dialogues. Each initiative shift corresponds to the initiation of a subtask. Since the system has complete knowledge of the task and the plan to carry it out, the system and the user do not engage in any planning dialogue.

In the dialogues we are considering, both participants are capable of contributing plans and knowledge about the domain. The task is ongoing – requiring a task dialogue – and plans may fail – requiring a planning dialogue. As a result, the model of initiative in dialogue that we have developed distinguishes between three kinds of initiative: task, planning, and dialogue. We define a dialogue participant as having the planning initiative if the participant is suggesting steps for accomplishing a task. A participant has the task initiative if she knows the plan for a task (or subtask) and is directing its execution. Following Chu-Carroll and Brown, a participant has the dialogue initiative if she is controlling the conversation to establish mutual beliefs about a piece of domain information or the validity of a proposal.

Guinn discusses the use of negotiation to resolve conflicts over who shall have the initiative in planning a given subtask [3]. Guinn uses a predictive model to resolve these conflicts, and which participant gets the initiative is determined by the weighting of positive and negative evidence for each plan. However, initiative conflicts in interaction are often the result of unpredictable events that are not resolved by a predictive model. Furthermore,
these conflicts are often simple clashes that result from both participants trying to take the initiative at the same time. Such conflicts do not necessarily require complex negotiation to resolve. Often, unwritten rules based on factors like social roles, personal assertiveness, and the current locus of control play a part in determining who will give way. We are designing a flexible and adaptive model for managing initiative in human-computer TOP dialogues that allows for simple resolution of conflicts based on the current locus of control (who has the initiative) and the personal assertiveness of the human participant.
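To make these definitions concrete, the sketch below shows one way the three kinds of initiative and a simple clash rule could be represented in PROMELA, the protocol modeling language we adopt in Section 5. It is a minimal illustration under our own assumptions: the participant names E and S, the user_assertive flag, and the grab_planning rule are hypothetical, not part of our implementation.

  /* Illustrative sketch only: three initiative holders plus a
     simple clash rule based on the current locus of control and
     the user's assertiveness. */
  mtype = { E, S };
  mtype task_init = S;         /* who is directing execution of the plan */
  mtype plan_init = S;         /* who is suggesting steps for the task */
  mtype dlg_init  = S;         /* who is controlling the conversation */
  bool user_assertive = true;  /* assumed per-user trait */

  inline grab_planning(who) {
    if
    :: plan_init == who -> skip          /* already has the initiative */
    :: else ->
       if
       :: who == E && user_assertive ->
          plan_init = E;                 /* assertive user wins the clash */
          dlg_init = E                   /* planning implies dialogue initiative */
       :: else -> skip                   /* current locus of control keeps it */
       fi
    fi
  }

  active proctype demo() {
    grab_planning(E);                    /* E tries to take the planning initiative */
    printf("planning initiative: %e\n", plan_init)
  }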
4. The Need for Protocols

Protocols are sets of rules that govern the interaction of agents in a distributed system [10]. While protocol design has been used primarily in the context of computer networks and data communication, we believe that protocols are an ideal framework for modeling human-computer interaction as well, since protocols must deal with asynchronous and unpredictable events.

To motivate the use of protocols in our model, consider an initiative cue type proposed by Chu-Carroll and Brown [8]. They note that a perceptible silence or pause indicates a discourse boundary and thereby signals a shift in initiative. An example is line 6 of our interaction (Figure 1), where S pauses, resulting in a shift of the planning and dialogue initiative to E. In line 12, S utters a statement similar to that of line 6, but instead of pausing, S introduces a plan sub-dialogue (line 13), keeping the planning and dialogue initiative. The absence of a pause indicates to E that she is not expected to take the planning and dialogue initiative. But what if S pauses slightly (perhaps because of an event as innocuous as garbage collection)? Will E consider this an offer to shift initiative? If E “takes” the initiative, but S still formulates its own plan, under what circumstances should S still express it?

As another example, consider part of the dialogue in Figure 1. After discovering that the valve to P4 is closed (line 6), E takes the planning initiative to suggest another route to P7 (line 7) and confirms some domain knowledge for S (line 9). At line 10, S retakes the task initiative to finish planning the route and start execution. However, when a similar failure of this plan occurs at line 12, S takes the planning initiative to suggest a route to P7 without taking the task initiative. Instead, S waits for E’s approval. Here, S is negotiating control differently based on previous events: having failed a second time, S seeks approval of its plan before taking the task initiative. As events change knowledge, protocol rules quickly transform these changes into changed behavior.

As a third example, consider the modification of part of the dialogue in Figure 1 that is shown in Figure 2. In line 7, E takes the planning initiative to suggest a new route to P7. Suppose this licenses S once more to take the planning and task initiative to complete the suggested plan and start execution.
...
 6  S: The valve to P4 is closed. [pause]
 7  E: Try going back through P2 to P5.
 8  S: Ok. I am going through P5 to P6, and then to P7.
 9     [starts to move back through P2 to P5]
10  E: Wait, the valve to P6 is closed.
...

Figure 2. A Task-oriented Dialogue – Modified
E does not need to second-guess what S is going to do, because E knows she can interrupt S. This is what happens at line 10: E interrupts S’s task initiative to indicate to S that its task will fail. This kind of flexible initiative resolution with S is desirable so that E does not have to consider all of the possible effects of her utterances on S’s behavior before speaking.

These examples are based on asynchronous events and interactions that the agents have little control over and that are not resolved by the predictive negotiation method described by Guinn. Even Cohen and Levesque (who have developed rigorous logical theories of communication in terms of agents’ beliefs, intentions, and actions) qualify their work with the admission that “such a program [of activity] does not mean that [they] believe all language use is completely and consciously thought out and planned for” [11]. Likewise, we fully admit that, in general, a large component of rational interaction is planned behavior that requires complex negotiation and reasoning. However, we believe that most of the simple initiative clashes that we have illustrated here (and that occur often in TOP dialogues) can be handled efficiently and effectively by using protocol rules.
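To preview how a protocol rule absorbs an asynchronous cue of this kind, the sketch below models a perceptible pause as a timeout guard in PROMELA (the notation introduced in the next section). The channel name hear and the message types are our own illustrative assumptions, not rules from our system.

  /* Hypothetical sketch: a pause modeled as a PROMELA timeout. */
  mtype = { PROPOSAL, AFFIRM };
  chan hear = [1] of { mtype };

  active proctype S_waits() {
    /* S has reported a task failure and paused, offering the initiative */
    if
    :: hear?PROPOSAL ->  /* E takes the planning initiative */
       printf("S: E is planning; S yields\n")
    :: timeout ->        /* nothing arrives: the pause goes unclaimed */
       printf("S: no uptake; S keeps the initiative\n")
    fi
  }

Adding a process for E that executes hear!PROPOSAL flips the outcome without any change to S’s rule: the changed circumstance, rather than an explicit negotiation, drives the changed behavior.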
5. Modeling Mixed Initiative

We are representing our protocol rules for plan execution using PROMELA, the protocol modeling language described by Holzmann [10]. In the first phase of this project, we are translating plans for traversing the pipe system – generated by the planning and acting inference engine SNeRE [12] – into PROMELA code that executes the plans as tasks in a virtual pipe world using an external PROMELA interpreter.

The channel FIFO queue model described by Holzmann is a simple and powerful mechanism for agent-to-agent communication and synchronization. A channel c is created with the variable declaration chan c. Reading (retrieving) an item x from a channel queue is written c?x, where c is a channel identifier. Writing (putting) an item y onto a channel queue is written c!y. Channel reads block if nothing is in the queue; channel writes block if the queue is full.
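The fragments in our figures elide channel initializers, but in full PROMELA a channel declaration normally also gives a queue capacity and a message type. A minimal self-contained example of these operations, with names of our own choosing, is:

  /* Channel creation, a blocking write, and a blocking read. */
  chan c = [1] of { byte };  /* FIFO queue holding one byte */

  active proctype writer() {
    c!42                     /* blocks whenever the queue is full */
  }

  active proctype reader() {
    byte x;
    c?x;                     /* blocks until an item is queued */
    printf("read %d\n", x)
  }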
The PROMELA code fragment in Figure 3 is from the listener that models the top-level interaction between our system S and user E.

if
:: type == TASK_REQUEST ->
   chan c;
   run proceduralize_plan(rep, c);   // rep is a plan to carry out
   if
   :: c?FAIL,reason ->
      say!reason;
      chan c1;
      run replan_request(reason, rep, c1);
      ...
   :: c?SUCCEED,proc ->              // proc is the returned procedure
      chan c1;
      run proc(hear, say, c1);       // execute
      c1?result                      // get result and discard it
                                     // for synchronization
   fi
:: type == PLAN_REQUEST ->
   chan c;
   run construct_plan(rep, c);       // rep is a goal
   if
   :: c?FAIL,reason -> say!reason
   :: c?SUCCEED,plan ->              // we have a plan
      chan c1;                       // express it
      run inform(plan, hear, say, c1);
      c1?result
      hear?type,rep;                 // wait for response
      ...
   fi
... other utterance types
fi
Figure 3. A Protocol Rule for the listener
Each utterance is translated into a representation that consists of a speech act type and a domain task, plan, or goal that is appropriate to that speech act type. The type component of the channel input carries the type of speech act, and the representation rep is the internal representation of the domain component (a task, plan, or goal). This code fragment shows how the listener differentiates between requests to carry out tasks (TASK_REQUEST) and planning requests (PLAN_REQUEST).

The if...fi structure of the code consists of a list of cases, each introduced with “::”. The first statements of all the cases are evaluated together; the first one to become executable (i.e., to stop blocking or to evaluate to TRUE) selects its case, and the remaining statements of that case are executed. Once a case is triggered, no other cases are evaluated. The run command starts execution of a procedure as an independent process that is carried out in parallel with other activities (including other processes), and returns immediately after the process is started. Channels are used to synchronize processes.
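In isolation, this run-and-synchronize pattern looks as follows; the worker process here is an illustrative stand-in for construct_plan or proceduralize_plan, not code from our rules.

  /* Sketch: spawn a worker, continue, then block on its reply channel. */
  mtype = { SUCCEED, FAIL };

  proctype worker(chan reply) {
    reply!SUCCEED              /* stands in for real planning work */
  }

  init {
    chan c = [0] of { mtype }; /* rendezvous channel */
    mtype result;
    run worker(c);             /* returns as soon as the process starts */
    c?result;                  /* synchronize: block until the worker reports */
    printf("worker reported %e\n", result)
  }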
In Figure 3, if the speech act type is a PLAN_REQUEST, then rep is a representation of a goal, and it is passed to the construct_plan(rep, c) process, which communicates with our planner. The channel c is used by the construct_plan process to report on the success or failure of planning. The if...fi code following it will not execute until construct_plan deposits a result into the channel c. If S fails to produce a plan (c?FAIL), then a reason will be returned. This might be nothing more informative than a constraint on the world that could not be satisfied (for example, the valve to P6 is closed). This result will be written to the say channel for communication to E. In contrast, if S formulates a plan (c?SUCCEED), a plan is returned and further text planning (inform) must be performed to communicate the plan. For our initial model, we will assume that all attempts to communicate plans succeed, and we simply flush the result from the channel c1 (c1?result) that was created to handle the process and proceed to listen for a response on the hear channel.

The listener protocol rule shows how the utterance types PLAN_REQUEST and TASK_REQUEST result in different actions by the system. It also shows how these utterance types drive the system’s planning, its execution of plans, and its communication to E about them.

To relate the rule in Figure 3 to the interaction in Figure 1, consider the extension of the PROMELA code fragment shown in Figure 4, in which we have expanded the c?SUCCEED subcase under the PLAN_REQUEST case of the rule in Figure 3. In response to E’s initial plan request (Figure 1, line 1), S has formulated a plan. S proposes to E that it go on the route P2 → P4 → P7. After informing E of the plan and listening on the hear channel, the first subcase to succeed is a timeout. Therefore, S executes its plan by opening a new channel and making a TASK_REQUEST to itself (run task_request(plan,c1)). Figure 3 gives the code for handling TASK_REQUESTs: executing the plan (as a task) is accomplished by running proceduralize_plan(rep,c). As part of carrying out the task, S evaluates the status of the valve from P2 to P4. The valve is closed, resulting in a report of the reason for the failed task execution (say!reason). As part of running replan_request (not shown), S modifies its knowledge base, formulates a new goal (based on its current situation), and then invokes plan_request again. The code fragment we have shown represents only what is executed when S finds that the valve from P2 to P4 is closed.

As seen in Figure 4, S keeps a channel open to E (hear?type,rep). This allows for the possibility that E may participate in the planning if she has planning-related information. E can also take over the task initiative by issuing a task request, possibly to abort the current task. The dialogue in Figure 1 indicates that E did not take the planning initiative at this point, though she could have. The code fragment we have shown illustrates how dialogue, planning, and task initiative can be coordinated and driven by the distribution of information between S and E and the circumstances that they encounter as the task is carried out.
// S successfully constructs a plan in response to a PLAN_REQUEST
chan c;
run construct_plan(rep, c);
if
:: c?FAIL -> say!reason
:: c?SUCCEED,plan ->               // we have a plan
   chan c1;
   run inform(plan, hear, say, c1);
   c1?result;                      // tell the plan and
   hear?type,rep;                  // wait
   if
   :: type == AFFIRM ->            // "Ok", "Yes", rep is nil
      ...
   :: timeout ->                   // no response - implicit yes
      chan c1;
      run task_request(plan,c1);
      c1?result;                   // request to do it
      say!result                   // report result
   fi
:: hear?type,rep ->                // E has something to say
   if
   :: type == INFORM ->            // rep is domain info
      ...
   :: type == TASK_REQUEST ->
      ...
   fi
fi
Figure 4. Extended PROMELA code for Figure 3
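Rules like those in Figures 3 and 4 can be exercised directly with a PROMELA interpreter. The harness below is a hypothetical reduction of the listener (speech acts stripped of their rep component, with a trivial reply) meant only to show how an utterance from E drives the dispatch; everything beyond the mtype names taken from the figures is our own scaffolding.

  /* Hypothetical test harness for a listener-style rule. */
  mtype = { TASK_REQUEST, PLAN_REQUEST, INFORM };

  chan hear = [1] of { mtype };  /* utterances from E to S */
  chan say  = [1] of { mtype };  /* S's reports back to E */

  active proctype listener() {
    mtype type;
    hear?type;
    if
    :: type == TASK_REQUEST -> say!INFORM  /* would execute, then report */
    :: type == PLAN_REQUEST -> say!INFORM  /* would plan, then propose */
    fi
  }

  active proctype user_E() {
    hear!PLAN_REQUEST;   /* E opens with a planning request */
    say?_                /* consume S's reply */
  }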
6. Current Status and Future Work

We are building a virtual pipe world in VRML that gives the user the first-person perspective of the robot. At all times, there are three sets of information about the connectivity of the pipes. First, there is the user’s information, displayed on a map in our GUI. Second, there is the system’s information, which is used for planning. This information may or may not agree with the user’s information. Finally, there is the actual pipe configuration, which is revealed to the user and the system as the virtual robot explores the pipe domain. The GUI allows the user to select planning- and task-related utterances in the context of our pipe domain. We are also identifying primitive acts that can be translated into robot actions. The planning component will be greatly simplified in this early phase, and initiative will be primarily client-server.

Next, we will develop the planning component by building mixed-initiative models that are converted into PROMELA code; internally-represented primitive acts will be translated into robot actions. We will then elaborate the planning component by building in a richer set of inference rules and asynchronous interaction between the user and the system (such as handling interruptions and simultaneous planning). We will replace the PROMELA interpretation of plans with an internal plan execution engine to improve efficiency. We will use our graphical simulation of the robot moving through the pipe system to perform timed task-completion experiments with human subjects and different sets of our protocol rules to determine which are most effective with different types of users.
References
[1] R. W. Smith and D. R. Hipp. Spoken Natural Language Dialog Systems. Oxford University Press, 1994.
[2] C. Guinn. Mechanisms for mixed-initiative human-computer collaborative discourse. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, pages 278–285, 1996.
[3] C. I. Guinn. An analysis of initiative selection in collaborative task-oriented discourse. User Modeling and User-Adapted Interaction, 8(3-4):255–314, 1998.
[4] E. Hovy. Recent trends in computational research on monologic discourse structure. Computational Intelligence, 7(4):363–366, 1991.
[5] A. Cawsey. Explanation and Interaction: The Computer Generation of Explanatory Dialogues. The MIT Press, Cambridge, Massachusetts, 1992.
[6] J. C. Lester, B. A. Stone, and G. D. Stelling. Lifelike pedagogical agents for mixed-initiative problem solving in constructivist learning environments. User Modeling and User-Adapted Interaction, 9(1-2):1–44, 1999.
[7] B. J. Grosz and C. L. Sidner. Attention, intentions, and the structure of discourse. Computational Linguistics, 12:175–204, 1986.
[8] J. Chu-Carroll and M. K. Brown. An evidential model for tracking initiative in collaborative dialogue interactions. User Modeling and User-Adapted Interaction, 8(3-4):215–253, 1998.
[9] D. Gross, J. Allen, and D. Traum. The TRAINS-91 dialogues. Technical Report TN92-1, Department of Computer Science, University of Rochester, 1991.
[10] G. Holzmann. Design and Validation of Computer Protocols. Prentice-Hall, Englewood Cliffs, NJ, 1991.
[11] P. R. Cohen and H. J. Levesque. Rational interaction as the basis for communication. In P. R. Cohen, J. Morgan, and M. E. Pollack, editors, Intentions in Communication, page 221. MIT Press, 1990.
[12] D. Kumar. From Beliefs and Goals to Intentions and Actions: An Amalgamated Model of Inference and Acting. PhD thesis, State University of New York at Buffalo, 1993.