Interactive Dialogues in Explanation and Learning

ESPRIT Basic Research Action #3160 - “IDEAL” - ATR

Interactive Dialogues in Explanation and Learning

Giacomo Ferrari, Department of Linguistics, University of Pisa
Irina Prodanof, Institute for Computational Linguistics, National Research Council - Pisa
Michele Carenini, Paolo Moreschini, AITech s.n.c. - Pisa

Foreword

This report belongs to a series which describes the main results of the research carried out in the framework of the ESPRIT Basic Research Action #3160 "IDEAL" ("Interactive Dialogues for Explanation And Learning"). The reports deal with different theoretical and implementational aspects of dialogue modeling in explanation, and relate to the architecture of a demonstration system which is being developed for the project. Each document describes and discusses a single module of the model, and they should be considered as a whole for a global understanding of the system itself. The system being developed is defined as an ITS ("Intelligent Tutoring System"), i.e. a system which provides information to a user about a particular domain. Since the user can be a complete novice or a partial expert, the system must be able to provide the appropriate information, as far as both the form and the content are concerned.



Abstract

In this report, the first of the series described in the foreword, the most relevant characteristics of dialogues in explanation and learning are highlighted, on the basis of empirical data collected by City University of London. The intrinsically different roles of expert (teacher) and user (student) are examined, and the consequences on the structure of the dialogue itself are discussed. The need for an approach based on the most recent developments of natural language dialogue modeling is advocated.

1. Introduction

In this report, some of the characteristics of dialogue in explanation and learning are examined, in order to draw indications and recommendations about the necessary ingredients of a computational model of such a kind of dialogue. In particular, the large variety of types of expression will be pointed out, which suggests the definition of a comprehensive dialogue model, able to deal with different styles of discourse (simple exchange, complex exchange, long description, etc.).


It will also be pointed out that many different kinds of knowledge are involved in an explanation task, which is to be modelled as a knowledge-based process.

2. The scenario

2.1. Models of explanation

Explanation is traditionally understood as the activity of an expert communicating to a novice/user the reasons for the decisions (s)he has made, or the advice (s)he has given (see, for instance, [Shortliffe 76], [Swartout 83], [Hughes 86], [Buchanan & Shortliffe 84], etc.). In some cases, as in [Pilkinghton et al. 1988] and [Cawsey 88], a broader instructional context has been considered. Other studies on explanatory discourse deal with the generation of the expert's answers as a function of the interpretation and classification of the novice's questions, as in [Gilbert 87] and [Sarantinos & Johnson 91], or with a knowledge-based approach to answer generation ([Nicolosi et al. 88], [Paris 87]).

To sum up briefly, it is clear that the mechanisms which produce an explanation have often been designed as ad hoc solutions in strict dependence on an application, a domain, or a specific architecture. The consequence of such a variety of application-dependent approaches is the lack of a general technique for the design of man-machine interfaces for explanation. More recent research is oriented towards the application to explanation of computational dialogue modeling techniques developed in the field of Computational Linguistics, as in [Paris 91] (which also contains a brief discussion of previous approaches). The use of methods and techniques from Computational Linguistics should improve the generality and portability of the resulting solutions and, above all, make interaction in the domain of teaching and tutoring closer to a natural (mixed-initiative) one.

2.2. The corpus

In order to carry out research on the application of dialogue modeling techniques to explanation in knowledge- and task-based systems, some empirical evidence has been collected at City University of London, in a scenario where an expert starts teaching e-mail to a novice, and the novice is free to throw in any kind of interruption to confirm his/her attention, to ask questions, etc. Also, videos have been taken of the sessions, in order to evaluate

the relevance of non-verbal communication facts, such as the use of pointing or images. Two selected samples are presented in the Appendix, and show the kind of dialogues which have been collected. In particular, Dialogue #1 occurs in the situation where the student is a complete novice, and is taught e-mail from the beginning to the end, according to a very rigid schema; Dialogue #2, instead, is a fragment of a larger dialogue, and deals with the situation where the student already has familiarity with the subject being taught ("editing"). From these dialogues, a deeper insight into the dialogue phenomena occurring in a teaching and explanation context has been acquired.

3. Characteristics of explanation dialogues

The most important characteristic of the collected explanation dialogues is the variety of expression forms they consist of. In fact, a novice can submit his problems by means of simple questions, as in

E.: Anyone with the name Dowling will be listed on the [...] on the screen. Ah, you've got me in one.
S.: What if there's two Dowlings? [1]

or

S.: Are we gonna direct...ah...all this...ah...into a file, or into a directory? [2]

as well as by proposing a complex description of some event, in the place of a request for explanation, as in

S.: Ah, well, I remember...ah...seeing...ah...you and Robert...um...doing the mail, managing the mail with...a...some sort of environment where you can choose...ah...ah...which message you want to....
E.: Oh, visual interface? [3]

or in many other different forms.

Also, the answer of the expert can be either a short declarative sentence, as in

S.: Return. What's that for?
E.: Now, that's sending mail to me... [4]

or a long description as well, as in

S.: What...ah...ah...what's editor prompter? What does it do?

E.: Well you ca - you can tell it any editor that you want it to use and the default is what you've specified up here...'/usr/local/emacs'. And this is a special editor [...] um, ...which does nothing more than prompt you for the recipient of the message and the subject line. So...and then it will prompt you for the actual message itself. [5]

Thus, intuitively, the general structure of explanatory interactions is expected not to deviate much from the structure of conversational exchanges; however, the units of the conversation can exhibit even very complex constructs, similar to other types of non-dialogic discourse, often obeying complex rhetorical structures. Thus, examples [3] and [5] are structured as narratives, even though they participate in a dialogue exchange like [1], [2], and [4]. It is, therefore, necessary that a theoretical account of dialogue in explanation be able to deal, within the same uniform framework, with many different structures of discourse, from conversational exchanges to long fluent discourses, such as narratives or descriptions.

A second peculiarity of explanatory discourse is the use of both linguistic and non-linguistic (pointing, charts, pictures, etc.) communication means, as in

E.: [...] Ah. let's try that word 'mail' again to see if the carbon copy has reached you. [.....] Ah, there it is. Okay. [...] The word 'homework', which was the one you put as the subject. [...] And you've got, well, a number one there, and each of these messages will be numbered...... [6]

Thus, the second requisite is that the theoretical framework must exhibit the possibility of accounting for verbal, visual, and deictic aspects of communication by means of very general descriptive tools.

Finally, it has to be taken into account that communication can be about different sets of subjects. The external perceivable world is directly involved in the just-mentioned example [6], as in all those cases where reference to the external environment occurs by deixis. The domain of the discourse is the main subject of communication, while the strategy of the discourse itself is the subject in cases like

E.: That's okay, we've managed to do something on the mail, but there's a lot more facilities. Let's find out what other facilities there are. [7]

Thus, adopting a uniform framework, it will be necessary to provide a description of discourse structures which do not change with the change of subject. Discourse structures are to be classified independently from the subject area (the knowledge source) they deal with.

To sum up, a model of dialogue in explanation must accommodate different kinds of (traditionally modelled) types of discourse, and comes quite close to a general comprehensive model of discourse. It is obvious that such a full theory of dialogic communication requires a highly structured and rich system of categories.

4. Structure of an explanation dialogue

4.1. Roles in explanation

In the context of explanation, the function of dialogue participants is a well-defined and limited one. An expert is to explain a specific subject matter to a learner, who asks questions, requests clarification, and provides confirmation of his/her attention or understanding. The expert is supposed to have a deep and complete knowledge of the subject the explanation is about, as well as of the strategies by which the explanation is to be provided. The learner, instead, is supposed to exhibit short-term reactions to specific difficulties, acting according to very local strategies. In order to establish the right position of a theory of discourse, we will first examine the structure of explanation communication and the role of both expert and learner.

4.2. Explanation dialogue

An explanation dialogue is, typically, a dialogue where an expert is committed to explaining some technical point, and a learner tries to extract as much technical content as possible from the expert's discourse. It can be seen as an expert-defined loop, where a learner can interject his/her questions, doubts, local uncertainties and, eventually, confirmation acts.


The sample dialogues in the Appendix have been partitioned into sections, which have been identified with informal labels. Using these labels, even a superficial inspection of those dialogues shows the attempt of the expert to always go back to his/her original explanation plan, while interruptions are, in general, exhausted in a question/answer pair, and do not generate a permanent focus shift. Such a dialogue structure can be visually represented as follows

[figure omitted: local clarification loops nested within the general explanation loop]

fig. 1: graphic representation of communication loops in explanation

Two different kinds of discourse are then identified: an instructional discourse, which is the main teaching loop, and the clarification dialogue, which includes all the interactions, stimulated by the expert or spontaneous, between the expert and the novice. The instructional discourse follows a general line which has a starting point and comes to an end, because it occurs according to a general strategy, which is partly related to the nature of the domain itself, and partly to the expert's personal teaching experience. Thus an instructional discourse follows a strategy which is decided from the beginning of the dialogue itself, and is not modified unless specific (very rare) requests for a change are made by the learner(s). Learners, instead, do not seem to have general learning strategies; more often they play a passive role, simply activating local clarification strategies when something is unclear. Thus, the intervention of the learner, in the majority of cases, is the request for clarification, which is exhausted in a local loop. The position of the expert is, therefore, the dominant one, and, in general, (s)he will always go back to his/her explanation strategy as soon as an interruption has been terminated.
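This loop structure can be given a minimal procedural sketch. The code below is a hypothetical illustration, not IDEAL code, and all names in it are invented: the expert advances through a fixed explanation plan, each learner interruption is exhausted in a local clarification loop, and focus then returns to the plan.

```python
# Minimal sketch of the loop structure of fig. 1 (hypothetical; names are
# illustrative, not from the IDEAL system).

def run_explanation(plan, interruptions):
    """Walk an explanation plan; each step may trigger local clarification
    loops that are exhausted before the expert resumes the plan."""
    transcript = []
    for step in plan:                                    # general explanation loop
        transcript.append(("EXPERT", step))
        for question in interruptions.get(step, []):     # local clarification loop
            transcript.append(("LEARNER", question))
            transcript.append(("EXPERT", f"clarify: {question}"))
        # focus returns to the original plan: no permanent focus shift
    return transcript

log = run_explanation(
    plan=["introduce mail", "send a message", "read a message"],
    interruptions={"send a message": ["What's Return for?"]},
)
```

The nesting of the inner loop inside the outer one mirrors the claim that interruptions are exhausted locally and never divert the overall plan.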



4.3. The expert

The expert is, therefore, characterised by a more general view of how the dialogue is or should be going on. It is possible to assume that, to this end, (s)he uses some types of knowledge, among which the following are included:

- knowledge of the domain of discourse: the expert does not simply know the subject matter, but (s)he also has a sort of meta-knowledge about which points are the most and the least relevant, what is necessary to know just to start, and what is, instead, necessary to progress, etc.
- knowledge about the strategies of explanation, including how it is appropriate to explain something, where examples are proper, how complex examples must be, where peaks of attention are needed, etc.
- general knowledge about learners, i.e. knowledge about typical misunderstandings and difficulties, knowledge about different levels of acquaintance with the subject, etc.
- specific knowledge about the current learner, including his/her present goals or levels of knowledge.

4.4. The learner

The learner is simply characterised by the fact that (s)he is involved in learning something, and his/her role is, by definition, that of learning what the expert is imparting. In rare cases a learner may have more complex intentions, such as learning something for a specific goal, but in general, a learner has the sole function of hearing and understanding. No knowledge is supposed to be present in his/her mind; (s)he will simply exhibit local strategies directed to clarify difficulties and doubts. Even in those cases in which a quasi-expert learner actively follows the instructional discourse, sometimes anticipating the expert's explanations, as in Dialogue #2 in the Appendix, the expert seems, ultimately, to keep control of the dialogue.
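The four knowledge sources attributed to the expert in 4.3 can be gathered into a simple record type. The sketch below is purely illustrative; the field names and contents are assumptions, not IDEAL's actual knowledge representation.

```python
from dataclasses import dataclass, field

# Illustrative record of the expert's four knowledge sources (all field
# names and example contents are assumptions, not from the IDEAL system).
@dataclass
class ExpertKnowledge:
    domain: dict = field(default_factory=dict)                  # subject matter + meta-knowledge about relevance
    explanation_strategies: list = field(default_factory=list)  # where examples are proper, attention peaks, ...
    learner_stereotypes: dict = field(default_factory=dict)     # typical misunderstandings per expertise level
    current_learner: dict = field(default_factory=dict)         # this learner's goals and level of knowledge

k = ExpertKnowledge(
    domain={"mail": {"relevance": "core", "prerequisites": ["login"]}},
    explanation_strategies=["assess knowledge", "describe process", "make example"],
    learner_stereotypes={"novice": ["confuses the mailbox file with the mail command"]},
    current_learner={"level": "novice"},
)
```

Keeping the four sources as separate fields reflects the conclusion of section 6.2 that the interactions between distinct knowledge sources must be defined explicitly.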

5. Explanation dialogues and theory of discourse

The following is an attempt to capture, from the above discussions and requisites, the main functionalities which build up the general process of explanation:

(1) - the expert is aware of the domain (s)he has to explain and builds a general instructional strategy. In some cases, as in the following example from Dialogue #1,



E.: Sharon -- have you used the mail before?
S.: No.
E.: Do you know what it is?
S.: No. [8]

(s)he asks questions in order to assess the level of knowledge of the subject on the part of the future learners. These general strategies can be expressed in terms of general action-types, which can be combined to form entire explanation strategies. Decisions about where to use examples, where and which examples to choose, how to organize an explanation discourse, etc. are taken in advance according to the evaluation of those strategies. Report ATR #5 discusses the explanation strategies in detail.

(2) - the expert can often try to assess the level of comprehension of his/her pupils on specific points, by interjecting assessment subdialogues.

E.: Where do you mail your mail already? That you are quite comfortable with? [9]

(3) - the expert translates the major explanation steps into more detailed, communication-oriented sequences of acts and speaks according to the current strategy;

E.: Has anyone suggested to you the idea of thinking about the mail system as if it were like telephone exchanges? With lots of gateways? If you think of it, like the way you specify your phone number. [10]

(4) - the learner tries honestly to learn, interjecting confirmation signs or local requests for clarification.

According to this set of basic actions, the model of an explanation dialogue can be detailed as follows



[figure omitted: the EXPERT contributes general teaching strategies and the LEARNER contributes interruptions to the DIALOGUE MODEL, which structures the communication acts]

fig. 2: composition of a Dialogue Model in explanation

It appears that the role of a dialogue modeling theory is to provide means to account for the different choices of expression of both expert and learner. An expert builds general explanation strategies, which can be represented in terms of general schemata (as in Dialogue #1), like

"assess knowledge"
"describe process"
"make example"

corresponding to plan-like structures of linguistic or, more generally, communicative actions. A learner, instead, produces local reactions to non-understanding conditions, in terms of requests for clarification.
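Such schemata can be read as plan-like structures that expand into sequences of communicative actions. The minimal sketch below assumes an invented expansion table: the schema names come from the text, but their expansions are illustrative only.

```python
# Hypothetical expansion of explanation schemata into communicative
# actions (the expansion table is invented, not from the IDEAL reports).
SCHEMATA = {
    "assess knowledge": ["ask(prior_use)", "ask(prior_concept)"],
    "describe process": ["inform(purpose)", "inform(steps)"],
    "make example":     ["instruct(command)", "show(result)", "check(understanding)"],
}

def expand(strategy):
    """Flatten a strategy (a list of schema names) into the sequence of
    communicative actions the expert will perform."""
    return [act for schema in strategy for act in SCHEMATA[schema]]

actions = expand(["assess knowledge", "describe process", "make example"])
```

The point of the sketch is that an explanation strategy is decided in advance as a composition of schemata, and only then realized as individual communicative acts.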

6. Conclusions

6.1. Requirements for a theory of discourse

Explanation is a very complex situation, where different styles of expression, serving different (sub)goals, coexist in the same context. The main requirement is, then, the identification and classification of basic communication units and communication functions, which can describe, within a uniform framework, those different kinds of dialogues. Thus, the role of a theory of discourse is to

• provide a common description language for utterances produced both within general explanation strategies and within the learner's interrupts;
• provide the constructive elements out of which any discourse can be formed. Basic descriptive units shall coincide with basic acts of communication.


Thus, explanation dialogues are to be described in terms of sequences of discourse acts, which (partition and) classify both the expert's complex discourse and the learner's interruptions. Discourse acts, and their classification, are described in ATR #2. Here, it is important to remark that they serve the role of

• providing a classification tool for all the communicative acts occurring in an explanation dialogue;
• offering a criterion of well-formedness of dialogue, as only specific sequences of discourse acts can be considered legal;
• providing direct access to the pragmatic and knowledge-based model of dialogue, through the identification of the communicative function of any discourse act, and the evaluation of the felicity conditions of each act in a specific dialogue context.
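The well-formedness criterion, under which only certain sequences of discourse acts are legal, can be sketched as a transition table over act labels. The acts and transitions below are a toy example, not the classification given in ATR #2.

```python
# Toy legality check over discourse-act sequences (the acts and the
# transition table are illustrative; the real classification is in ATR #2).
LEGAL_NEXT = {
    "REQ":    {"INFORM", "DENY"},
    "INFORM": {"CONF", "REQ"},
    "DENY":   {"REQ", "INFORM"},
    "CONF":   {"INFORM", "REQ"},
}

def well_formed(acts):
    """A dialogue is well formed iff every adjacent pair of acts is a
    legal transition in the table."""
    return all(b in LEGAL_NEXT.get(a, set()) for a, b in zip(acts, acts[1:]))

ok  = well_formed(["REQ", "DENY", "REQ", "DENY", "INFORM", "CONF"])  # opening of Dialogue #1
bad = well_formed(["CONF", "CONF"])
```

A table of this kind makes the well-formedness criterion mechanical: any sequence not generated by the table is rejected as an illegal dialogue.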

6.2. Knowledge for explanation

Another conclusion is that many different kinds of knowledge are involved in explanation, both domain knowledge and knowledge related to the explanation technique. It is dubious whether all these types of knowledge can be represented in a uniform way, but it is in any case necessary to correctly define the interactions between the different knowledge sources.

References

[Buchanan & Shortliffe 84] Buchanan, B.G., Shortliffe, E.H., Rule Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project, Addison-Wesley, 1984.
[Cawsey 88] Cawsey, A., Explaining the behaviour of simple electronic circuits, International Conference on Intelligent Tutoring Systems, Montreal (1988).
[Gilbert 87] Gilbert, N., Question and Answer Types, in D.S. Moralee (ed.), Research and Development in Expert Systems IV, Cambridge University Press (1987), pp. 162-172.
[Hughes 86] Hughes, S., Question Classification in Rule-based Systems, Research and Development in Expert Systems II, Cambridge University Press (1986), pp. 123-131.
[Nicolosi et al. 88] Nicolosi, E., Leaning, M.S., Boroujerdi, M.A., The Development of an Explanatory System Using Knowledge-based Models, in Proceedings of the 4th Explanations Workshop (KB Design group), Manchester (1988).


[Paris 87] Paris, C., The Use of Explicit User Models in Text Generation: Tailoring to a User's Level of Expertise, PhD Thesis, Columbia University (1987).
[Paris 91] Paris, C., Generation and Explanation: Building an Explanation Facility for the Explainable Expert Systems Framework, in Paris, C., Swartout, W., Mann, W. (eds.), Natural Language Generation in Artificial Intelligence and Computational Linguistics, Kluwer Academic Press, 1991, pp. 49-82.
[Pilkinghton et al. 1988] Pilkinghton, R., Tattersall, C., Hartley, R., Instructional Dialogue Management, CEC ESPRIT 280 EUROHELP (1988).
[Sarantinos & Johnson 91] Sarantinos, E., Johnson, P., Question Analysis and Explanation Generation, Proceedings of the Twenty-fourth Hawaii International Conference on System Sciences, Hawaii (1991).
[Shortliffe 76] Shortliffe, E.H., Computer-based Medical Consultation: MYCIN, Elsevier, New York, 1976.
[Swartout 83] Swartout, W., XPLAIN: a System for Creating and Explaining Consulting Programs, Artificial Intelligence 21 (1983), pp. 285-325.

Appendix

Dialogue #1

(Section labels from the original margin are shown in square brackets; the speech-act classification follows ATR #2.)

[Assessing knowledge]
E.: Sharon -- have you used the mail service before? [REQ]
S.: No [DENY]
E.: Do you know what it is? [REQ]
S.: No [DENY]

[Introduction]
E.: Well, let's have a look
S.: Yeah
E.: Because you can send letters round the University to other people on the computer [INFORM]
S.: Yeah [CONF]

[Show example]
E.: We'll just have a look and see if there's any mail for you. Type the word 'mail' and see what happens
S.: There
E.: Return. No. No-one's sent you any letters.
S.: Oh [laugh]

[Introduce example]
E.: But if you want to send me a message...
S.: Yeah

[Subtopic "address"]
E.: ...or I want to send you a message...
S.: Yeah
E.: ...we can use this electronic mail system. But first, who am I on the computer? Well, you know my surname, it's Dowling, so if you type 'finger Dowling' you can see who I am. Space. That's it. Now it'll take a time. Anyone with the name Dowling will be listed on the -
S.: Yeah
E.: - on the screen. Ah, you've got me in one.

[Asking clarification]
S.: What if there's two Dowlings?
E.: Well, you'll also get their....
S.: Oh, okay.

[Resume subtopic "address"]
E.: ...initials. But here it gives my login name.
S.: Yeah.
E.: Now, that's the important bit, because that's the one that you're going to use...
S.: Yeah.
E.: ...to mail me a message. But if you picked any of your class mates you'd also get their login names.
S.: Yeah.

[Resume main topic]
E.: So let's try mailing me something. This time, type in 'mail', and then follow it up with my login name. Space. SA326. Return. Now, you want to send me mail about something.
S.: Yeah.

[Subtopic "subject"]
E.: So let's see. Homework. Let's just give it the word 'homework'. That really is a title, just so that you can grab a person's attention -- "Ah, I've got some mail. What's it going to be about?"

[ask confirmation]
S.: Return?

[Confirm]
E.: And return.

[Subtopic "message"]
E.: Now we can type in a message. Now if you're going to ask for help, you're going to say something like - ah - 'Could I see you to discuss last week's homework...
S.: Yeah.

[make example]
E.: ...at two-thirty today?' So that'll be a simple message to type in here.
S.: Yeah.
E.: 'Could I see you to discuss...
S.: Shall I type it?
E.: Yeah, please. 'Could I see you to discuss...

[misconception]
S.: Does it matter about...
E.: Anything
S.: Doesn't matter about...

[correct misconception]
E.: No, no, just type away. People doing mail systems don't bother too much about English. It's just a very brief message. 'Could I see you...
S.: Yeah.

[resume example]
E.: about last week's homework at two-thirty in your room. Yes. At two-thirty in your room. Return. Now, that's put in a message.
S.: Yeah.

[Subtopic "end mail"]
E.: You could have had as many lines as you like, but when you finish it, if you just do an empty full stop, then it'll finish the mail message. Okay, so just type an empty full stop and return. And return.

[ask clarification]
S.: Return. What's that for?

[clarification]
E.: Now, that's sending mail to me...
S.: Oh okay.

[Subtopic "carbon copy"]
E.: ...but you could send a carbon copy to anyone you like.
S.: Yeah, okay.
E.: Why don't you send yourself a carbon copy? DJ140.
S.: Return

[check/correct]
E.: No, type in 'DJ14 - 140'. Return.

[summing up]
E.: Now, what you've managed to do there is send me a mail message, and also you've got a carbon copy yourself.
S.: Yeah.

[Create example]
E.: Now, one of your classmates can send you mail....
S.: Yeah.

E.: ...and when you logon first thing, it will tell you if you've got any new mail.
S.: Yeah, okay.
E.: If you have got anything, it will say 'mail for DJ140'...
S.: Yeah.
E.: ...and then you can look at this. Ah, let's try that word "mail" again to see if the carbon copy has reached you.

[Deviation time]
S.: How long does it take to convey? Straight away?
E.: Almost straight away.
S.: Yeah.

[Resume example]
E.: Ah, there it is. Okay.
S.: Yeah.

[Describe items in the example]
E.: Now, can you see that you've got the heading.
S.: Yeah.
E.: The word 'homework', which was the one you put as the subject,
S.: Yeah.
E.: And you've got, well, a number one there, and each of these messages will be numbered, one, two, three, four....
S.: Yeah
E.: And you've got the origin -- this one came from yourself.
S.: Yes.
E.: The name of the person that sent it, but when they sent it.

[summing up]
E.: Now, at present we've seen how to send mail, and how to see if we've got any mail...
S.: Yeah.

[preparing next step]
E.: ...but the next thing is to read that mail.
S.: Yeah.
E.: And to do that we pick the message number.
S.: Yeah.
E.: Number one. Okay. One. Return. Now, there's a lot at the top here, where it came from.
S.: Yeah.
E.: But it's got the mail message. Now, that happened to be a carbon copy.
S.: Yeah.
E.: But it could've been the original one.
S.: Yeah.
E.: And you can list all of them, numbers one, two, three, as they come up here. When you've finished, you can type now 'q' to quit this mail system.
S.: Yeah.
E.: So 'q' on its own. Return. And it's saved the message...
S.: Yeah.
E.: ...in a mailbox, that's what that stands for.
S.: Yeah.
E.: Now as this is the first time, you wouldn't have a mailbox before....
S.: No.
E.: but now you have, and, if we look in your directory, we can find that file. So, if you give the 'ls' command and look at all the files in your directory...
S.: 'ls'
E.: Yes.
S.: Return?
E.: Return. You have got all the files that you had before....
S.: Yeah.
E.: ...and this one, mailbox.
S.: Yeah.
E.: Now that's just a normal text file, and you can edit it, list it, whatever you like with it, but every time someone sends you a message, the new message gets added to the end of the file 'mailbox'.
S.: Yeah.
E.: So, at any time you can list it. Type 'cat mailbox'. Let's have a look at the contents of that file. 'cat' -- I beg your pardon...
S.: Oh.
E.: ...'cat mbox'.
S.: Okay.
E.: Space. That's it.
S.: There.
E.: That's -- hello!
S.: [laugh] Ah, yeah.
E.: Ah. Someone has sent you...
S.: Yeah.
E.: ...some messages before.
S.: yeah.
E.: But incidentally someone's sent you a load of junk as well.
S.: Yeah.
E.: And there we go.
S.: Yeah.
E.: At the end of that is the message that I sent you.
S.: Yeah.
E.: Alright.
S.: Yes.
E.: Well, you can send me messages, you can ask for help, you can answer some of those....
S.: Yeah.
E.: ...that reeled off the top of the screen.
S.: Yeah.
E.: Good. There we go

Dialogue #2

[Build example]
E.: Okay. We'll start with sending a mail message to someone. To stop bothering people I'll send it to myself. Okay. Right. It's just like mail where you just instead of typing 'mail person' you type 'ream'. And that - that prompts you for a subject, that's 'subject' and it puts you into your favourite editor.
S.: Oh. Right.

[Subtopic "ream"]
E.: Now, unlike ream where you - er, unlike mail, sorry, where you just - it puts you into a straight text, this ... ah ... puts you into, in this case, microemacs
S.: Right. So that means you can edit the ...
E.: Yep, you can {edit all the headers, the cc's...
S.: {edit all the 'form', 'to', and everything, right, but not - but not the address.
E.: Yeah, you can change anything.
S.: Oh. Oh, right, so you can do ...
E.: So you can add someone if you've forgotten. Ah... Like say I wanted to send it to Bob as well, {just add the name Bob in.
S.: {Right. Right. So - so you can put my name on the end of that 'to' list.
E.: Yeah, okay.
S.: Yeah, let's do it. At aipna, edai, where whatever's easy. Yeah.
E.: aipna's the best.
S.: Okay.
E.: Right. Type in the test message, add a little signature, and quit the editor as you would normally. And that's it gone. I've sent the message.
S.: Right, so it goes - right, so it goes as soon as you - as soon as you exit the editor ...
E.: That's right.


General Theory of Discourse

1. Introduction

In these pages a General Theory of Discourse, GTD, will be described, some elements of which have been developed and used also in the IDEAL Project. Initially we were oriented towards a taxonomic approach to the construction of such a theory; however, this approach allowed us only to identify a certain number of features of explanation dialogues; to use such features it became necessary to develop an alternative approach which gave an account of a structure plausible for them. We focused on the relationship between explanation dialogues and discourse theory to emphasize the characteristics the former must have to be reasonably representative of the latter. It is obvious that a full theory of dialogic communication requires a highly structured and rich system of categories, which may turn out to be overdimensioned for the description of dialogues in the field of explanation. The model presented in the following sections is, then, a reduced and domain-oriented version of a more articulated theory of dialogue and discourse.

2. Requisites of a theory of discourse

The main objective of a theory of discourse in explanation and learning is the identification of a set of categories that describe and classify the communication units of an expert-learner dialogue.

2.1. Interpretation and reasoning

In computational modeling of discourse, the major use of a theoretical account is to provide the necessary means to describe real utterances in such a way as to trigger the mechanism of interpretation and response generation. If the system is to simulate the expert behaviour, the general loop will be:


start
  teaching                         (expert)
  interruption
  interpretation & evaluation      (expert)
  response generation (& check)    (expert)
  confirmation
  resume teaching                  (expert)

where the marked activities are those of the expert. This schema dictates some requisites for the definition of discourse categories. In fact, as they must participate in the process of interpretation, as well as in that of response generation, they must:

- provide a hint on the speaker's (learner's) intentions;
- participate in the process of planning the response;
- be compatible with the semantic content of both expert's and learner's utterances.

The above listed requisites suggest the use of an approach similar to that of Speech Act Theory [Searle 1969], [Searle & Vanderveken 1985].

2.2. Generality

Discourse categories should also be general, i.e. applicable to different domains and different kinds of discourse. Many different discourse theories have been elaborated, which propose different types of categorization. In general those categories have been empirically determined on the basis of a sample corpus which presents a specific kind of discourse (narrative, free discourse, conversation, etc.). Some of these categories can be applied to different kinds of texts, while others seem to be dependent upon the type of discourse.

A general theoretical framework should, instead, provide categories for any kind of discourse. This appears to be quite difficult, as many different accounts can be given of different discourse structures. A solution to this unbounded expansion of classification categories for discourse results in the following further requisites:

• discourse categories must be completely independent from the content of the utterances to be classified, i.e. the categories must be defined according to formal (syntactic) criteria;
• the set of categories must be extensible according to a formal mechanical principle, i.e. categories must be defined in terms of a few primitives, whose combination can originate new categories, if required.

3. General background

Many different classification methods and structure descriptions for discourse and dialogue have been developed in the last decade, also aiming at computational applications. They differ both in the kind of approach and in the type of discourse they try to model. Some of them adopt a syntactic/semantic approach to discourse segmentation, both in the case of conversational exchanges [Reichman 1978, 1985] and with the aim of generality. Others try to motivate the structure of discourse, interactive or not, on a strong pragmatic base. The reconstruction of the intentions of the speaker in dialogue is treated by [Allen & Perrault 1980], [Allen 1983], [Allen & Litman 1986], [Litman & Allen 1987], and [Pollack 1986]. An attempt to give a formal computational account of speech act theory is due to [Cohen 1978, 1987], [Cohen & Perrault 1979], [Cohen & Levesque 1985], [Perrault & Allen 1980], and [Perrault 1989]. Discourse coherence, focusing and reference have been studied by [Grosz 1974, 1977], [Hobbs 1978, 1979], and [Sidner 1983]. More recently, [Grosz & Sidner 1986] presented a comprehensive theory of discourse, which was meant to build on all the previous studies on computational discourse modeling; it unifies the plan-based approach with a model of focus and focus shifting, but the syntactic aspect is totally missing, even though alluded to in some points. Also [Wachtel 1986] and [Bunt 1989] tried to give a comprehensive model of discourse, where the pragmatic and the semantic structure are related, but their description remains limited to interactive discourse.

In the frame of the ESPRIT project 527, [Reilly & Harper 1985] elaborated on a discourse description method proposed by [Burton 1981]. This model is basically a syntactic account of discourse structures, but it is also clear that it shares with the other syntactic approaches an intrinsic weakness. In fact, the criterion by which the functional categories are defined is a diffuse one, and content is often confused with form or function. Thus, for example, the distinction between an informative act, realised as a statement, and a meta-statement, also realised as a statement, relies only on whether the subject is some information in the domain, or the structure of the discourse itself.

Within the same project a semantic approach was proposed, based on Situation Semantics ([Barwise & Perry 1983]), without any unification with the syntactic description. This was intended firstly to account for a number of misconceptions occurring during communication, which are due to an insufficient comprehension of the relation between the content of an utterance and its reference to the real world ([Ferrari & Prodanof 1987]). In addition, within the theoretical framework of Situation Semantics it is natural to describe those messages which are the result of the evaluation of different communicative stimuli coming from different perceptual channels, such as verbal communication associated with visually perceived expressions (faces, noddings, pointings).

4. A general theory of discourse: a brief sketch

The theoretical framework we are going to present here relies on a few general principles which try to account for the substantially different roles of expert and learner, and the different structures of their respective portions of the dialogue.

As sketched above, the general assumption will be that the expert sets up general explanation strategies. These will be further detailed in terms of units of discourse, which can be described in syntactic, taxonomic and functional terms. The learner has a single strategy, learning; his/her interventions are activated by conditions of failure, and expressed in the same units of discourse as explanation. The discourse structures are made of connected discourse units which are described in terms of:


- their appearance as linguistic or non-linguistic pieces of communication;
- their syntactic structures;
- the subject of the communication;
- their functions;
- their hierarchical position in a structure of the discourse.

Two points must be very clear from the beginning. The first is that the basic discourse units, as well as the structures, are the building blocks by which the intentions of both the expert and the learner can be expressed on one side, and inferred on the other. The second is that the classification must be kept as independent as possible from the content of the messages. The last requisite is a very general one, which will probably find a more extended application in free discourse or narrative; it may be useless when dealing with explanation situations.

5. Communication Units

5.1. Different channels of communication

Communication, especially in explanation and learning situations, occurs in several different ways, and often through different channels. By channel of communication we mean the perception channel by which the receiver of the message acquires the message itself.

In general, a receiver hears linguistic constructions or vocal markers, and sees noddings, pointings, as well as pictures or posters. In real human-to-human communication, the most frequent case is that both channels are used, as natural language and non-verbal communication means are often associated, even unconsciously. In human-computer interaction some simplifications are in order. Thus, the natural language text appearing on a screen will be considered as vocal communication, at least at the dialogue control level, in order not to introduce all the difficulties which arise from the introduction of a model of reading. Also, pointing with a mouse is assimilated to finger pointing in natural communication.

5.2. Communicative acts



According to the communication channel employed and the form of the message, different types of communication units can be identified. The most relevant are:

• ordinary linguistic utterances; these include well-formed sentences, and elliptical or, in any case, incomplete sentences;
• vocal markers, i.e. linguistic units which cannot be reduced to sentences, but carry some information. Many so-called cue words fall in this category, as well as confirmation words (yes, OK, go on, etc.);
• socially coded gestures, the most common of which is pointing. Noddings can also be considered socially coded gestures (positive nodding, negative nodding, etc.). Pointing with a mouse on a screen, or with a stick on a poster, is classified in this category;
• attitudinal faces, which are not precisely socially coded, but make explicit the attitude of the hearer or, generically, his psychological reaction. These may be very important in explanatory dialogues, as, in many cases, the expression of the hearer is a good measure, for the speaker, of the comprehension of what he is saying;
• graphical communication, which occurs through the showing of pictures, charts, and graphical examples. It is doubtful whether these are to be treated as basic units of communication, or sets of units, or, at least, a special unit with a sort of temporal persistence. It is instead clear that the use of the visual channel is, in this case, more structured and conscious than in the other visual aspects of communication, where the perception of, say, a face or a pointing is immediately followed by its classification, without any decomposition effort.

The above list is intended to cover a wide range of human-to-human communication situations, and may appear overdimensioned for dealing with explanation in general, and with our corpus in particular.
Any instance of the above listed units, occurring during a communication, is referred to as a communicative act.

Communicative acts are the basic units of communication and consist of a syntactic form and a semantic content, independent from the context. Linguistic units are traditionally associated with a phrase marker, or any other syntactic representation. Gestures


and faces can also be associated with syntactic patterns ([Denoued et al. 1986], [Howell & Fuchs 1968], [Jones 1983], [Sheehy & Harper 1987]). The semantic content is the meaning of the act, taken in abstraction from the context. As a simple example, the sentence

...what's editor prompter?

means that "there is some x asking some y to perform the action of informing x about the definition of a z of type editor prompter". Also, a pointing gesture means "the object which lies at the end of an ideal line drawn from the tip of the finger". As in Situation Semantics ([Barwise & Perry 1983], [Prodanof & Ferrari 1987]), a further level of decoding is postulated, in order to account for communication ([Ferrari 1987]). This is the level called interpretation, which consists in relating the meaning of communicative acts at least to a specific spatio-temporal location, and to a specific situation where a specific person speaks about a specific object and a specific person hears.

5.3. Combined communicative acts

These basic units can combine in different ways. A usual combination associates a deictic pointing with a linguistic expression:

And it will ask us if we want it to create this folder

Also, a nodding gesture associated with a happy expression carries a different message than the same nodding combined with an expression of discomfort. Graphical communication proposes a more complex relation between one or several linguistic utterances and a graphical object persisting for some time.
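For illustration only (not from the report), such a compound can be sketched as the merge of an utterance's non-contextual meaning with the referent supplied by a co-temporal pointing; all names here are hypothetical:

```python
def merge_acts(meaning, pointing_referent):
    """Combine the non-contextual meaning of an utterance with a co-temporal
    pointing gesture: the pointing supplies the referent of the deictic slot."""
    merged = dict(meaning)
    for slot, value in meaning.items():
        if isinstance(value, str) and value.startswith("deictic:"):
            merged[slot] = pointing_referent     # the gesture resolves the deictic
    return merged

# "...create this folder" uttered while pointing at a folder on the screen
meaning = {"predicate": "create", "object": "deictic:this folder"}
combined = merge_acts(meaning, "folder#42")
```

After the merge, the compound message carries the world referent in place of the unresolved deictic.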

6. Discourse units

6.1. Discourse acts

A complete message is, then, carried by a (set of) communicative act(s), which contribute to the formation of a single message and integrate with one another.



A discourse act is a set of co-temporal communicative acts integrated together, forming a single message and performing the same function.

In these pages we will focus our attention on those discourse acts constituted exclusively by linguistic expressions. A discourse act is identified by a set of complex features, the variation of one of which causes a different classification. As we will see, it is necessary to classify it according to its syntactic structure and its function.

6.1.1 The syntax of a discourse act

The syntactic structure traditionally assigned to sentences by linguistic theory is a sufficient parameter of classification, and allows us to identify the sentence-class, which ranges over traditional categories such as declarative, interrogative (possibly articulated into the traditional types of wh-questions, yes/no-questions, etc.), and imperative; the class of the described event, event-class, which coincides with the main verb; and the syntactic roles, such as SUBJECT, OBJECT, etc., which belong to fragments of the sentence itself. No specific representation is foreseen for ill-formed sentences; it is, instead, assumed that the syntactic form is a normalised one, where the only unsolved gaps are those presented by anaphoric pronouns and ellipsis.

6.1.1.1 Cooccurrent communicative acts

When a discourse act is the result of the cooccurrence of different communicative acts, these in general follow different channels, and integrate with one another, forming a compound. The most frequent compounds are between linguistic utterances and pointings or faces. In the former case, the pointing accompanies a deictic pronoun and indicates its referent in the world. In the latter case, faces have, in general, the function of communicating some mental attitude relative to the uttered proposition. In both cases, the information coming from the non-verbal communicative act must be merged with the verbal message, in order to obtain a significant compound.

Dialogues contain several utterances where there is an implicit reference to the screen, but there are also examples of explicit coexistence of pointing and utterance, as in


....this one came from yourself.

It is also possible to infer different intonations and faces by the learner, as she, for instance, utters "Oh" and laughs, or makes other non-linguistic comments, as occurs in several dialogues.

Non-verbal communication acts which accompany the utterance are represented by their own syntactic structure or, more frequently, are translated into pseudo-lexical or pseudo-sentential items, which are merged, at a semantic level, with the representation of the main utterance.

6.1.1.2 Pure non-verbal communication and special markers

Non-verbal communicative acts may also form discourse acts. However, it lies outside the scope of this paper to provide a syntax for them; the reference to the above mentioned bibliography is enough.

A special position is to be allowed for both the so-called cue words and other kinds of discourse markers. These can be considered pure indicators in the flow of the communication and, possibly, syntactically indivisible units. They provide indications about the flow of discourse control, and may or may not fall into the proposed classification. In the affirmative case, they will be classified as any other discourse act; otherwise, they will simply be taken as unclassified delimiters.

6.1.2 Functions

In 6.1 we have noted that any discourse act satisfies a single function; what follows is a (not complete) list of examples of these functions. To add new elements it is important that a representation is associated with each function, useful also to give an account of the intentional structure which underlies discourse acts. The given description must not be considered complete, also since no semantics has been assigned to the operators, neither the traditional ones such as believe and want, nor those derived from the planning terminology, such as know-if, has-plan, perform. We anyway assume that their interpretation is intuitive, or suggested by the current literature on knowledge and action. The first three functions are called high level functions, since their recognition is immediate ("iss" and "rec" stand for "issuer" and "receiver"):

1. INFORM, which occurs when the issuer "assumes that the receiver does not have some piece of information but wants to have it, and wants to cause the receiver to have such a piece of information";
description: bel(iss, not(bel(rec, P))) AND bel(iss, want(rec, bel(rec, P))) AND want(iss, bel(rec, P))

2. REQUEST, which occurs when the issuer "does not have some piece of information, assumes the receiver has that piece of information, and wants the receiver to pass such a piece of information to him";
description: bel(iss, not(bel(iss, P))) AND bel(iss, bel(rec, P)) AND want(iss, bel(iss, P))

3. COMMAND, which occurs when the speaker "wants the receiver to perform some action, useful to reach some goal";
description: want(iss, perform(rec, Act))

The expression high level function derives from the fact that these functions can be recognized without any particular reference to the context of the utterance; i.e. they do not need any information other than the syntactic one to be properly recognized. High level functions are all and only those functions in the discourse which can be recognized by a person who does not participate in a conversation but happens to listen to some fragments of the dialogue(1). The correct interpretation of other functions, low level functions, depends strongly on the context of the utterance, and cannot be derived directly from the syntactic structure of the discourse act.

• REPORT, which occurs when the issuer "assumes that the receiver is able to classify a certain event, to infer from it some facts and to compare these facts with his/her knowledge about facts related to this event";
description: bel(iss, has-beliefs(rec, P))
where has-beliefs means that in the receiver's knowledge base there are some beliefs about P;

• CHECK, which occurs when the issuer "believes he/she knows the subject of the question he/she asks, but wants to make the receiver confirm his/her (own) level of understanding of the subject". In the framework of teaching-learning dialogues, this articulates in two possible directions: the expert checks the user's understanding, or the user checks his/her own understanding of a particular subject;
description: bel(iss, bel(iss, P)) AND want(iss, know-if(bel(rec, P)))

• CONFIRM, which occurs when the issuer "wants the receiver to believe that he believes something";
description: want(iss, bel(rec, bel(iss, P)))

• DENY, which is the contrary of CONFIRM;
description: want(iss, bel(rec, not(bel(iss, P))))

• PROPOSE, which conveys intentions or plans about the forthcoming discourse;
description: want(iss, bel(rec, has-plan(iss, p+)))
where p+ represents a plan, or a set of plans.

(1) It is worth noting that such an external listener is a somewhat ideal character, since he is assumed to be completely unlinked to any context. It is also important to stress the difference between the listener's recognizing the (syntactic) sentence-class and his inferring a probable function of the sentence he heard. The former implies only a syntactic recognition of the sentence; the latter presupposes a kind of low-level interpretation of the dialogue participants' intentions.

6.2 The structure of the discourse act

Finally, the representation of the linguistic portion of a generic discourse act is the following:

{(function d-act_i f_i) (sentence-class d-act_i s_i) (event-class d-act_i e_i) (role_1 d-act_i r_1) ... (role_n d-act_i r_n)}

where
• function identifies one of the previously described functions, and can always be instantiated with a high level function; it can be instantiated with a low level function only at the end of the interpretation phase, i.e. after the resolution of the references, the specification of the context and the abduction process;
• sentence-class identifies a syntactic category (DECLARATIVE, INTERROGATIVE or IMPERATIVE);
• event-class is identified through the individuation of the main verb of the sentence;
• roles identify traditional grammatical roles in the sentence.
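For illustration only, this feature structure can be rendered as a small record type; the class and field names are hypothetical, not from the report:

```python
from dataclasses import dataclass, field

@dataclass
class DiscourseAct:
    function: str            # INFORM, REQUEST, CHECK, ... (high level at first)
    sentence_class: str      # DECLARATIVE | INTERROGATIVE | IMPERATIVE
    event_class: str         # main verb of the sentence
    roles: dict = field(default_factory=dict)   # SUBJECT, OBJECT, ...

# "...what's editor prompter?" classified as a REQUEST
act = DiscourseAct(
    function="REQUEST",
    sentence_class="INTERROGATIVE",
    event_class="be",
    roles={"SUBJECT": "editor prompter"},
)
```

The function field would hold a high level function after parsing, and be refined to a low level one only after interpretation, as the text specifies.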

7. Exchanges

A dialogue is a sequence of acts, at least a question and an answer, connected by a generic uniformity of content. Sometimes these sequences become more complex, when clarification subdialogues take place in order to negotiate some unclear aspects of the object of the dialogue itself. These sequences of discourse acts can be partitioned into higher order units, according to some internal as well as external parameters.

The most relevant internal parameter is focus coherence, that is, in a very informal definition, the fact that a discourse partition relates to a single subject matter. When the subject is changed, a new discourse partition is opened. By external parameters, cue words are in general meant, as they are supposed to be a superficial mark of the deep level partitioning, i.e. focus shifting. A higher level discourse unit is thus introduced, the exchange.

An exchange is the minimal dialogue unit relative to the same subject.

This definition marks the fact that the necessary condition is focus coherence. From the above sketch, the boundaries of an exchange are delimited in two ways. Under the viewpoint of the content, they relate to the same objects, while structurally they are only sometimes delimited by some cue word or expression.
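A minimal sketch of this partitioning, assuming focused objects are already given per act (all names hypothetical, not from the report):

```python
def segment_exchanges(acts):
    """acts: list of (text, set_of_focused_objects). Open a new exchange
    whenever an act's focused objects are disjoint from the current ones
    (i.e. a focus shift occurs)."""
    exchanges, current, current_focus = [], [], set()
    for text, objs in acts:
        if current and current_focus.isdisjoint(objs):   # focus shift
            exchanges.append(current)
            current, current_focus = [], set()
        current.append(text)
        current_focus |= objs
    if current:
        exchanges.append(current)
    return exchanges

dialogue = [
    ("How do I send mail?", {"send-mail"}),
    ("Type the message and quit the editor.", {"send-mail", "editor"}),
    ("What's in the mail directory?", {"mail-directory"}),
]
# segment_exchanges(dialogue) → two exchanges: the first two acts, then the third
```

Cue words could be used as an additional, external signal confirming a boundary detected this way.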

8. A synthesis of the dialogue model

The above discourse model relies on a three-level hierarchy of discourse units. Each of them consists of units of the lower levels, together with a set of features and properties. The formalization of this model therefore resembles an augmented PSG more than a simple rewriting system. Here is a synthesis of the categories:

<dialogue> ::= [<exchange>++, <features>]
<exchange> ::= [<discourse-act>++, <features>]
<focus> ::= an object;
<focus-list> ::= a list of objects;
<discourse-act> ::= [<communicative-act>++, <function>, <meaning>]
<function> ::= INFORM | REQUEST | CHECK...
<discourse-act-meaning> ::= the result of the composition of the various <meaning>s of the <communicative-act>s which form a discourse act;
<communicative-act> ::= <utterance> | <vocal-marker> | <gesture> | <face> | <graphical-act>
<meaning> ::= a non-contextual semantic representation of the content of the communicative act; in the case the communicative act is an utterance, the meaning is composed as specified in section 6.3;
<interpretation> ::= the contextualized <meaning>.

No discourse unit is taken as the axiom, as this notation cannot, conceptually, be taken as a grammar which dictates well-formedness conditions, but simply as the representation of regularities in discourse. The very nature of discourse does not intuitively include the notion of axiom, as this implicitly defines conditions of termination, which are absent in real dialogues.
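One way to read the three-level hierarchy described in the prose, purely as an illustration, is as nested record types; all class and field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class CommunicativeAct:
    channel: str     # utterance | vocal marker | gesture | face | graphical
    meaning: str     # non-contextual semantic content

@dataclass
class DiscourseActNode:
    function: str    # INFORM, REQUEST, ...
    acts: list       # co-temporal communicative acts forming one message

@dataclass
class Exchange:
    focus: str       # the subject shared by the whole exchange
    discourse_acts: list

@dataclass
class Dialogue:
    exchanges: list

d = Dialogue(exchanges=[
    Exchange(focus="send-mail", discourse_acts=[
        DiscourseActNode(function="REQUEST",
                         acts=[CommunicativeAct("utterance",
                                                "ask how to send mail")]),
    ]),
])
```

Each level contains only units of the level below, plus its own features, as the synthesis above states.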

9. The relationships between GTD and explanation model

9.1. Planning discourse units

The explanation plan seems to be a typical hierarchical plan, with at least three levels of abstraction, four if one considers the general plan underlying the whole dialogue. The following is a sketch of the abstraction levels:


[Figure 4: the hierarchy of explanation plans. The SESSION PLAN "explain e-mail" expands into STRATEGIC PLANS such as "assess level of knowledge", "define level of explanation", "Introduce" and "make example1"; these in turn expand into OPERATIONAL PLANS whose terminal acts are functional acts such as "CHECK knowledge", "INFORM" and "COMMAND".]

This creates a relation between the dialogue categories and the planning activity underlying the explanation task. The formal relation is established by the belief/intention structure corresponding to each discourse unit.

Three levels of planning activity can be identified in explanation dialogues. At the highest level, the expert is supposed to act according to an overall session plan, which directly derives from a general teaching strategy. This may not be present, if the expert does not adopt a highly structured approach. Strategic plans translate the general explanation plan into explanation/learning strategies, which include didactic plans, which can take labels such as introduce, make example, give analogy, etc., or student strategies. At a lower level, operational plans are built to produce the discourse as a predefined sequence of functional acts which deliver explanation content and control the conversation. Discourse acts are the terminal boxes of these operational plans.

9.2. Focusing, planning, and discourse partitions

Two general discourse categories have been introduced, discourse acts and exchanges, which are defined, respectively, in terms of functions and of focus coherence. There are, however, two important sources of diffuseness which affect this classification. The first is the precise definition of focus coherence, while the second has to do with the relationship between discourse planning and the content of the conversation.
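A minimal sketch of this top-down expansion, assuming a hypothetical plan library keyed by plan labels like those in Figure 4 (the library contents and function names are illustrative, not from the report):

```python
# Hypothetical plan library: session plan -> strategic plans -> operational acts.
PLAN_LIBRARY = {
    "explain e-mail":            ["assess level of knowledge", "introduce", "make example"],
    "assess level of knowledge": ["CHECK knowledge"],
    "introduce":                 ["INFORM"],
    "make example":              ["INFORM", "COMMAND"],
}

def expand(plan):
    """Expand a plan top-down into its sequence of terminal discourse acts
    (the 'terminal boxes' of the operational plans)."""
    if plan not in PLAN_LIBRARY:           # a terminal box: a discourse act
        return [plan]
    acts = []
    for sub in PLAN_LIBRARY[plan]:
        acts.extend(expand(sub))
    return acts

# expand("explain e-mail") → ["CHECK knowledge", "INFORM", "INFORM", "COMMAND"]
```

Session, strategic and operational levels here are just depths in the same recursion; the text stresses that the number of such levels need not be fixed.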


Focus coherence should rely on the coherence of the objects introduced by the discourse and of the task that the expert is explaining. However, this definition of focus depends upon the definition of focused objects we adopt, which is a function of the granularity of the knowledge we are dealing with. Thus, in the dialogue

E.: And your mail, that's in the mail directory.
S.: Yeah.
E.: Right. So, we can't copy those over using rcp, right because it's a whole directory structure.
..................
E.: .........so it's going to put all your mail into a file.......

the part in italics deals with a different subject (how to copy directories) from the original one (how to receive mail), but it is the subject of a sub-part of the dialogue which, from a more general view, is integrated into the first subject. Thus, according to the level of detail we want to use in the description of focus, this will or will not represent a focus shifting.

Also, the relation with the planning activity is unclear. Discourse planning is an autonomous activity which, however, depends upon the content of the discourse itself. The three levels of planning activity can be defined in terms of discourse plan types, but actual discourse plans arise from the instantiation of discourse action types to specific discourse objects. Thus, the problem of granularity comes out again. It is reasonable to accept a model where:

- session plan types are instantiated with high level discourse objects; this should be the case of the previous example, where a plan type, say EXPLAIN-BY-PRACTICE, is instantiated with RECEIVE-MAIL as object. This is the level of maximum focus stability, which is to be referred to as the exchange level;
- strategic plan types are instantiated with locally employed objects; in the previous example the expert decides to show how to receive messages by receiving them and discussing how to do that. The description of how to receive a message is a locally strategic plan, which introduces a reduced set of objects with respect to the exchange level, but nevertheless realizes a communicational choice;
- operational plans deal with the building of sequences of discourse acts, instantiated to single objects or coherent sets of objects. In our example, the discussion about the different ways of copying a message is of this type.


Different discourse planning levels correspond, then, to definitely different levels of granularity of the evoked knowledge, and the distinction between focus shifting from one level to the other is quite diffuse. If we want to maintain a symmetry between discourse structure, focus keeping, and levels of planning, two different approaches can be taken. The first consists in defining the levels of planning in a rigid manner and creating as many discourse units. The second consists in recognizing only low level discourse units, leaving the problem of the planning levels unsolved. The first approach has the advantage of creating a symmetry between discourse categories and plan categories, while the second does not constrain the number of planning hierarchical levels, but limits very much the classificatory capacity of the theory.

The solution indicated here is in favour of not constraining the number of planning levels, and of keeping focus maintenance and plan boxes as two separate entities. From a generation point of view, a plan is to be considered a structure of uninstantiated communicative actions, from a higher to a lower level of abstraction. Focus maintenance keeps track of the discourse subject matter, regardless of the planning activity. The merging point of the two activities is given by discourse acts.

9.3. Communication situations

The classification method proposed here is an attempt to associate a sort of syntactic description of discourse with a pragmatic account of the intentional structure underlying the communication. In fact, the notion of the function of a discourse act tries to provide an account, as formal as possible, of the pragmatics of communication. The syntactic aspects are dealt with at a traditional syntactic level. The annotated corpus is an example of the use of the categories introduced in section 7.1.2, but no exhaustive list of discourse acts is provided, because the method described here aims at complete flexibility and modularity. In other words, the attempt has been to propose a method for defining categories, rather than a list of categories.

This classification method, however, covers only a part of the phenomenon of communication, namely the descriptive (external) one. It does not account for the process by which a message is understood and produces a reaction which is coherent with the stimulus. The basic assumption is that the process of communication has a perceptual level, where a set of facts directly related to the intended message, together with the context, are received and

classified by a communication participant, and a knowledge-based level, where windows are opened on the different knowledge sources available to the receiver of the message, in order to carry out the evaluation and inference activity necessary to the deep understanding of the content of each communication act.

In order to define the framework of communication, it will be assumed that any message has an issuer and one or more receivers. The issuer produces a (set of) communicative act(s) in a context which is available to the receiver, and the issuer knows it is. The receiver perceives a set of communicative acts. He is also aware of their meaning, i.e. the set of relations associated with the message independently from any contextual application of that meaning. The receiver also perceives a set of facts about both the discourse and the real world context. The union of all these facts will be called the communicative situation, and will be the basic unit of communication semantics.

A complete message is, then, carried by a (set of) communicative act(s), together with the perceptual context which is associated with it. In accordance with the Situation Semantics approach proposed in [Prodanof & Ferrari 1987], such a perceived context is represented by a set of properties identifying who the issuer is, who the receiver is, the location, the time of the act, and some other facets of the located act. These properties also cover a part of what in Situation Theory is the resource situation, i.e. a set of generically available facts about the communication, which combine with it in order to complete the information content of the message.

A communicative situation is a set of properties identifying a (set of) communicative act(s) with respect to its content and its context.
Some of these properties are a revision of the properties of a discourse situation of Situation Semantics, while others are an attempt to represent the relevant aspects of the resource situation. Also in accordance with Situation Semantics, interpretation is postulated as a further level of understanding of a communicative act. The interpretation of a communicative situation is a function of the meaning of the communicative act(s) which it consists of, and of the objects referred to. It provides a link between the objects mentioned in the communication act(s) and a set of objects in the world, and can be identified as a general mechanism for referencing real world objects. The negotiation of


the reference gives rise, in some cases, to clarification subdialogues (see [Prodanof & Ferrari 1987]), and a specific mechanism for generalized reference resolution has to be devised. Interpretation is the function on which a general mechanism for keeping track of focus shifts can be based. In fact, focus shifting can be defined as a function which maps discourse objects onto world objects. As long as the world objects persist, no shifting occurs, and the exchange remains the same.

References

[Allen & Perrault 1980]: Allen J. F., Perrault C. R., "Analyzing intention in utterances", Artificial Intelligence 15, 3, 1980, pp. 143-178.
[Allen 1983]: Allen J. F., "Recognizing intentions from natural language utterances", in Brady M., Berwick R. C. (eds.), Computational Models of Discourse, MIT Press 1983, pp. 107-166.
[Allen & Litman 1986]: Allen J. F., Litman D. J., "Plans, Goals, and Language", in Ferrari G. (ed.), IEEE Proceedings, Special Issue on Natural Language Processing, 74, 7, July 1986, pp. 939-947.
[Barwise & Perry 1983]: Barwise J., Perry J., Situations and Attitudes, MIT Press 1983.
[Bunt 1989]: Bunt H. C., "Information Dialogues as Communicative Actions in Relation to Partner Modeling and Information Processing", in Taylor M. M., Néel E. & Bouwhuis D. G. (eds.), The Structure of Multimodal Dialogue, North Holland, 1989.
[Burton 1981]: Burton D., "Analysing spoken discourse", in Coulthard M., Montgomery M. (eds.), Studies in Discourse Analysis, Routledge & Kegan 1981, pp. 61-81.
[Cohen 1978]: Cohen P. R., On Knowing What to Say: Planning Speech Acts, PhD thesis and TR 118, Comp. Science Dept., Toronto, 1978.
[Cohen 1987]: Cohen P. R., "Analyzing the structure of argumentative discourse", Computational Linguistics 13, 1-2, 1987, pp. 11-24.
[Cohen & Levesque 1985]: Cohen P. R., Levesque H. J., "Speech acts and rationality", Proc. of ACL, 1985, pp. 49-59.
[Cohen & Perrault 1979]: Cohen P. R., Perrault C. R., "Elements of a plan based theory of speech acts", Cognitive Science 3, 1979, pp. 177-212.
[Denoued et al. 1986]: Denoued B., Pavaro B., Vladis A., Clement, Fromont J., Lejeune A., Maudet J. F., "Command Language Syntax and Gestural Syntax: Two Constraints in Designing Usability and Learnability", in International Scientific Work with Display Units, Stockholm 1986, pp. 785-788.
[Ferrari 1987]: Ferrari G., "Theoretical framework for a formal model of dialogue", in ESPRIT 527 CFID - Deliverable 7, Dublin 1987.
[Grosz 1974]: Grosz B. J., "The structure of task oriented dialog", IEEE Symposium on Speech Recognition, 1974.
[Grosz 1977]: Grosz B. J., "The representation and use of focus in a system for understanding dialogs", Proc. IJCAI, 1977, pp. 67-76.
[Grosz & Sidner 1986]: Grosz B. J., Sidner C., "Attention, Intention, and the structure of discourse", Computational Linguistics 12, 3, 1986.
[Hobbs 1978]: Hobbs J. R., "Resolving pronoun references", Lingua 44, 1978, pp. 311-338.


[Hobbs 1979]: Hobbs J. R., "Coherence and co-reference", Cognitive Science 3, 1, 1979, pp. 67-82.
[Howell & Fuchs 1968]: Howell W. C., Fuchs A. F., "Population stereotype in code design", Organizational Behavior and Human Performance 3, 1968, pp. 310-339.
[Jones 1983]: Jones S., "Stereotypes in Pictograms of Abstract Concepts", Ergonomics 26, pp. 605-611.
[Litman & Allen 1987]: Litman D. J., Allen J. F., "A plan recognition model for subdialogues in conversations", Cognitive Science 11, 2, 1987, pp. 163-200.
[Perrault 1989]: Perrault C. R., "Speech Acts in Multimodal Dialogues", in Taylor M. M., Néel E. & Bouwhuis D. G. (eds.), The Structure of Multimodal Dialogue, North Holland, 1989.
[Perrault & Allen 1980]: Perrault C. R., Allen J. F., "A plan based analysis of indirect speech acts", AJCL 6, 3-4, 1980, pp. 167-182.
[Polanyi & Scha]: Polanyi L., Scha R. J. H., "Discourse syntax and semantics", in Polanyi L. (ed.), The Structure of Discourse, Ablex, forthcoming.
[Pollack 1986]: Pollack M., "A model of plan inference that distinguishes between the beliefs of actors and observers", in Proceedings of the 24th Annual Meeting of the ACL, 1986, pp. 207-214.
[Prodanof & Ferrari 1987]: Prodanof I., Ferrari G., "Discourse context", in ESPRIT 527 CFID - Deliverable 7, Dublin 1987.
[Reichman 1978]: Reichman R., "Conversational coherency", Cognitive Science 2, 4, 1978, pp. 283-328.
[Reichman 1985]: Reichman R., Getting Computers to Talk Like You and Me, MIT Press, 1985.
[Reilly & Harper 1985]: Reilly R., Harper J., "A dialogue classification system", in ESPRIT 527 CFID - Deliverable 2, Dublin 1985.
[Sarantinos & Johnson 1990]: Sarantinos E., Johnson P., "Explanation Dialogues: A theory of how experts provide explanations to novices and partial experts", to appear in Artificial Intelligence.
[Searle 1969]: Searle J. R., Speech Acts: An Essay in the Philosophy of Language, Cambridge Univ. Press, 1969.
[Searle & Vanderveken 1985]: Searle J. R., Vanderveken D., Foundations of Illocutionary Logic, Cambridge Univ. Press, 1985.
[Sheehy & Harper 1987]: Sheehy N., Harper J., "Representation of non verbal behaviour in the formal model", ESPRIT 527 CFID - Deliverable 7, Dublin 1987.
[Sidner 1983]: Sidner C., "Focusing in the comprehension of definite anaphora", in Brady M., Berwick R. C. (eds.), Computational Models of Discourse, MIT Press 1983, pp. 267-330.
[Wachtel 1986]: Wachtel T., "Pragmatic sensitivity in NL interfaces and the structure of conversation", in Proceedings of COLING86, Bonn 1986, pp. 35-41.


The role of the Dialogue Manager in an ITS

Abstract

In this paper the architecture of the Dialogue Manager (DM) of an Intelligent Tutoring System is described. We start with a description of the general flow of communication which involves the DM activities, i.e. which are its internal modules and how they work during communication. Then the description language which is returned by a parser (and sent to the DM) will be briefly sketched. Finally all the internal modules of the DM (the Focus Structures Managing Mechanism, the IF-THEN Rules Module, and the Output Strategies Module) will be described in detail.

1. Introduction

In a formalization of man-machine interaction (and of dialogic communication in general), the Dialogue Manager appears as the central module in many senses. From a cognitive point of view, it is the logical unit which translates what has been said (a Natural Language input) into a particular formalism corresponding to a "language of the mind" (see [Fodor 1977]), i.e., a semantic representation which makes explicit the relationships among the components of the sentence and the roles played by each of them. From a linguistic point of view, the DM accounts for the dialogue participants' competence, understood as the logical unit which presides over language understanding and generation. From a computational point of view, given a system for the simulation of natural language understanding and generation, as described in [Ferrari et al. 1991b], the Dialogue Manager is the module which makes the other modules of the system communicate and establishes the modalities of their communication. In an Intelligent Tutoring System, ITS, like the one designed for IDEAL, the DM activates three distinct functions: the Focus Structures Managing Mechanism, the IF-THEN rules module, and the Output Strategies module (fig.1):


[fig.1 shows the Dialogue Manager's modules in an ITS: the PARSER sends logical form 1 to the DM, whose internal modules are the Focus Structures Managing Mechanism, the IF-THEN Rules (which issue TS/FS queries to the KRS), and the Output Strategies; the DM also consults the Student's Model and sends logical form 2 to the RESPONSE GENERATOR, which produces the output.]

fig.1: The Dialogue Manager's modules in an ITS

The DM receives the result of the parsing phase, which is a structure called the logical form. The logical form is a description of the input sentence which makes it possible to identify the following pieces of information (for a theoretical account of the relevance of this information see also the General Theory of Discourse, GTD, as described in [Ferrari et al. 1991c]):

• function: the function of the sentence according to the list given in GTD; this particular element will not be used by the DM. Possible instances of functions are: INFORM, CONFIRM, DENY, CONTINUE etc.;

• sentence-class, i.e. DECLARATIVE, IMPERATIVE, INTERROGATIVE. As far as INTERROGATIVE is concerned, it is possible to further specify it with a label (type): "Yes/No-questions" or "WH-questions" (the latter can be recognized through the use of markers such as "HOW", "WHAT", "WHEN" etc.);

• event-class, which identifies the main verb of the sentence;

• roles, which correspond to traditional syntactic roles inside the sentence (subject, object, specifier etc.).


The following is an example of the logical form derived from a sentence:

sentence: How can I send a mail?
logical form: ((SENTENCE_CLASS request) (TYPE how) (MAIN_VERB send) (SUBJECT speaker) (OBJECT mail))

When the DM is reached by the logical form from the Parser, it activates the Focus Structures Managing Mechanism (see the next section and [Ferrari et al. 1991a]), which has at its disposal a tasks stack and an objects stack. The mechanism returns the current state of the stacks, and the DM tries to match the result to an IF-THEN rule. Some IF-THEN rules make it possible to build an appropriate query to the Knowledge Representation System, which returns the relevant portion of knowledge. At this point, the DM is able to select the right Output Strategy, i.e. the appropriate explanatory behaviour which will then be verbalized by the Response Generator. The logical form which is returned by the DM ("logical form 2") contains all the information necessary for the Response Generator to create an appropriate output. Other IF-THEN rules allow the DM to select the appropriate Output Strategy immediately, with no query to the KRS. As we will see, the first input processing causes a query to the KRS, while successive inputs do not (necessarily). The KRS we refer to is fully described in [Ferrari et al. 1990], [Ferrari et al. 1991d] and [Sarantinos & Johnson 1990]. Its knowledge base is divided into two parts: a Task Knowledge Structure and a Domain Taxonomic Substructure. The former is constituted by a collection of structures which describe the tasks relative to a particular domain; the latter comprehends all the objects of the domain.
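The cycle just described — logical form in, stack inspection, rule match, optional KRS query, strategy out — can be sketched in a few lines of Python. This is purely our illustration: the dictionary encoding of the logical form, the rule encoding (condition predicate, needs-KRS flag, strategy name), and the dict standing in for the KRS are all assumptions, not IDEAL's actual data structures.

```python
# Hypothetical sketch of one Dialogue Manager cycle (names are ours).

logical_form = {"SENT_CLASS": "req", "TYPE": "how",
                "MAIN_VERB": "send", "SUBJ": "speaker", "OBJ": "mail"}

def dm_cycle(lf, tasks, objects, rules, krs):
    """Match the logical form and stack state against the IF-THEN rules."""
    for condition, needs_krs, strategy in rules:
        if condition(lf, tasks, objects):
            # Some rules build a query to the KRS (here a plain dict lookup).
            result = krs.get(lf["MAIN_VERB"]) if needs_krs else None
            return strategy, result
    return None, None

# A single rule in the spirit of Rule 1 below: a "how" request with both
# stacks empty triggers a KRS query and control_strategy_1.
rules = [(lambda lf, t, o: lf["SENT_CLASS"] == "req"
          and lf["TYPE"] == "how" and not t and not o,
          True, "control_strategy_1")]
krs = {"send": ["run_mailer", "identify_address", "edit_file", "send"]}

strategy, ts = dm_cycle(logical_form, [], [], rules, krs)
```

Applied to the sentence above, `dm_cycle` returns the task-explanation strategy together with the task structure retrieved for "send".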

2. IF-THEN rules

The structure of an IF-THEN rule is the following:

    IF logical form 1 AND stacks state AND tests THEN actions

(The output from the DM has been called logical form 2 since it is a structure similar to the one returned by the Parser.)


where the variable logical form 1 is constituted by some of the elements of the logical form returned by the parser. Here are some examples of actual rules:

IF    (SENT-CLASS = req) (TYPE = how)
      (main-verb is ACTION) (SUBJ = speaker)
AND   tasks stack: empty
      objects stack: empty
THEN  query(KRS, main-verb(arguments), result)
      select(output-strategy)

fig.2: Rule 1

This rule treats a SENT-CLASS "req", the TYPE of which is "how", with a main verb which can be classified as an ACTION, the SUBJECT of which is the "speaker" (these categories are taken from the General Theory of Discourse, which has been used also in the IDEAL framework; see [Ferrari et al. 1991a] and [Fodor 1977]); both the tasks stack and the objects stack are empty. All these elements cause a query to the KRS related to an entry which matches the main verb. The result is a task structure, TS. According to the information which constitutes the result of the query, the appropriate output strategy is selected, in this case control_strategy_1, i.e. the strategy which controls the explanation of a task. Another rule is the following:


IF    (SENT-CLASS = req) (TYPE = how)
      (main-verb is ACTION) (SUBJ = speaker)
AND   tasks stack: main-task
AND   main-verb ∈ TS of the main-task
THEN  search(main-verb, TS)
      select(output-strategy)

fig.3: Rule 2

This rule is usually applied to manage an interruption by the user; it is important to distinguish between the test and the actions to be performed: the former is used only to check whether the main verb belongs to the task structure of the task being explained, the main task; the action "search(main-verb, TS)" is necessary to establish the position of the main verb in the current task structure. There are three possible positions, according to the user's input:

a) the user asks for information about a topic which has already been explained; in this case the selected output strategy will lead to the generation of an answer like: "I have already told you that. Do you want me to repeat it?" and to the selection of some alternative explanation device, using actions like "make_example", "create_analogy" (see also [Ferrari et al. 1991e]);

b) a sequence is being explained; the user asks for information about a topic inside that sequence, but not respecting the sequential order; in this case the selected output strategy will lead to the generation of an answer like: "I haven't explained that yet. Wait a minute"; no examples or analogies are necessary, since all the user must do is wait;

c) a sequence is being explained; the user asks for information about a topic out of that sequence (but inside the same task); in this case the selected output strategy consists of the generation of an answer like: "I haven't finished this explanation yet. Do you want me


to give it up?"; the system waits for an answer before continuing or skipping to another subject.

The following rule is applied when the user asks for information about some topic out of the current task:

IF    (SENT-CLASS = req) (TYPE = how)
      (main-verb is ACTION) (SUBJ = speaker)
AND   tasks stack: main-task
AND   main-verb ∉ TS of the main-task
THEN  select(output-strategy)

fig.4: Rule 3

The fact that the main verb does not belong to the TS makes the system select an output strategy consisting at least of a check on the user's intentions (whether or not to abandon the explanation of the current task). If the user asks a question about a particular object during the explanation of a task, like:

What is a file?

the DM selects the appropriate rule, something like:


IF    (SENT-CLASS = req) (TYPE = what)
      (main-verb is NOT_ACTION) (SUBJ = file)
AND   tasks stack: main task
      objects stack: empty
AND   file ∈ object-list of TS of the main task
THEN  query(KRS, SUBJ, result)
      select(output-strategy)

fig.5: Rule 4

In this rule there is a SENT-CLASS "req", the TYPE of which is "what", with a main verb which cannot be classified as an ACTION, the SUBJECT of which is the object "file"; in the tasks stack a main task is stored, and the objects stack is empty. All these elements cause a query to the KRS related to an object in the Domain Taxonomic Substructure. According to the information which constitutes the result of the query, the appropriate output strategy, control_strategy_2, i.e. the strategy which controls the explanation of an object, is selected. The last example of an IF-THEN rule is the following:

IF    (SENT-CLASS = req) (TYPE = what)
      (main-verb is NOT_ACTION) (SUBJ = file)
AND   tasks stack: main task
      objects stack: empty
AND   file ∉ object-list of TS of the main task
THEN  query(KRS, SUBJ, result)
      select(output-strategy)

fig.6: Rule 5

In this case the user's question is about an object out of the task being explained. The Dialogue Manager's behaviour is somewhat equivalent to that following the application of Rule 3. The rules which have been introduced in this section are only some of the actual rules necessary for the complete management of control and explanation strategies by the DM during a dialogue.
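The three-way position test behind Rule 2 (already explained / later in the sequence / outside the sequence) can be sketched as a small classifier. This is our own illustrative encoding, keeping the explained and pending steps as plain lists; the step names are those of the "edit_file" task used throughout the paper.

```python
def classify_interruption(step, explained, pending):
    """Classify a user's request against the current task sequence
    (cases a/b/c of the Rule 2 discussion above)."""
    if step in explained:
        return "a"     # already explained: offer to repeat / rephrase
    if pending and step == pending[0]:
        return "next"  # the step due next: no real interruption
    if step in pending:
        return "b"     # later in the same sequence: "wait a minute"
    return "c"         # same task, outside the sequence: ask to give up

explained = ["proc_1"]
pending = ["ed_chars", "quit_editor"]
```

For instance, with "proc_1" already explained, a question about "quit_editor" falls in case b, while one about a step of the same task outside the current sequence falls in case c.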

3. The Focus Structures

We will now consider some examples of the way the Focus Structures Managing Mechanism works. Here we will not refer to a complete dialogue, but only to some fragments which can be considered quite representative of the movements inside the stacks.

[1] "How can I send a mail?"

The logical form derived from this sentence allows us to fill the stacks in this way:

tasks        objects
send(s1)     e-mail(a2)

fig.7: The tasks stack and the objects stack after the first input

where the variable s1 refers to "send" and is put on the first level of the tasks stack; the variable a2 has "e-mail" as its type and is put on the first level of the objects stack. Let's now consider a user's answer to a system's check, like:

[2a] "Do you know how to edit a file?"
[2b] "No"


tasks        objects
edit(e1)     file(a3)
send(s1)     e-mail(a2)

fig.8: The tasks stack and the objects stack after a user's denial

The negative answer to the question is translated into the appropriate logical form (on the basis of the system's question), and "edit(e1)" is put on the second level of the tasks stack, while "file(a3)" is put on the second level of the objects stack. If we had "Yes" instead of the user's "No", it would have coincided with an acknowledgment by the user, and the explanation would have continued as if this topic had actually been explained. If the "Yes" is followed by a request for explanation on a particular topic, and this topic belongs to the task structure whose name corresponds to the lowest level in the tasks stack, we have the following situation:

[3] "Yes. How can I run the editor?"

tasks        objects
run(r1)      editor(a4)
edit(e1)     file(a3)
send(s1)     e-mail(a2)

fig.9: The tasks stack and the objects stack after a user's request on a topic inside the task

If the "Yes" is followed by a request for explanation on a particular topic and this topic does not belong to the task structure considered at the moment, such as:


[4] "Yes. How can I receive a mail?"

rule 3 is applied and the system asks for confirmation on abandoning the current explanation; if the user confirms his intention to shift to another topic, we have the following final situation:

tasks        objects
receive(s2)  e-mail(a3)

fig.10: The tasks stack and the objects stack after a user's request on a topic out of the current task

This situation corresponds exactly to a typical beginning of an exchange.
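The stack movements of fragments [1]-[4] can be summarized in a few lines of Python (a sketch with our own string encoding of the stack entries; the paired push is the only operation the fragments need):

```python
tasks, objects = [], []

def push(task, obj):
    """Push paired entries on the tasks and objects stacks."""
    tasks.append(task)
    objects.append(obj)

push("send(s1)", "e-mail(a2)")   # [1]  "How can I send a mail?"
push("edit(e1)", "file(a3)")     # [2b] "No" to "Do you know how to edit a file?"
push("run(r1)", "editor(a4)")    # [3]  "Yes. How can I run the editor?"

depth_before_shift = len(tasks)  # three levels, as in fig.9

# [4] a topic outside the current task: rule 3 fires; once the user
# confirms the shift, both stacks restart from the new exchange (fig.10).
tasks, objects = ["receive(s2)"], ["e-mail(a3)"]
```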

4. Output Strategies

There are two main classes of strategies (see fig.11 below): the first one ("Control strategies") comprehends the two standard explanation strategies relative to a task structure or a frame structure, control_strategy_1 and control_strategy_2. The second class ("Clarification strategies") comprehends the strategies which are selected when the user interrupts the normal flow of explanation. There are two main kinds of interruption the user can make: a request for further explanation on an already explained topic, or a request for the explanation of a new topic. The former must make the system choose an alternative explanation strategy (for instance, giving an example, showing a picture, creating an analogy etc.); the latter must allow the user to confirm his intention to skip to another topic.

Output Strategies
  1st class strategies
    Control_strategy_1
    Control_strategy_2
  2nd class strategies
    Clarification strategies

Fig.11: Output strategies


Control_strategy_1, as has been said, is the strategy which controls the explanation of a task.

  • control_strategy_1 --> TS1

Fig.12: Control_strategy_1

This picture represents the normal flow of explanation of a task structure TS1, with a starting point, a direction and an end. As we have seen in the IF-THEN rules section, it can happen that the user interrupts the explanation to ask for information about something else.

  • control_strategy_1 --> TS1
        |
        interruption --> clarification subdialogue

Fig.13: Control_strategy_1 interrupted

This picture represents a very general situation, regardless of the content of the user's interruption. Actually, the user's interruption causes a clarification subdialogue, depending on (and instrumental to making the user aware of) the relationship between the interruption's content and the point reached in the task currently being explained. These clarification subdialogues take place after the user has answered affirmatively to the system's requests for confirmation, like the ones we have seen in section 2 (when this is the case). The following is a first case of interruption:


  • control_strategy_1 --> TS1
        |
        interruption
        |
  • control_strategy_2 --> object

Fig.14: The definition of an object inside the explanation of a task

Here the user asks for information about a particular object which is in the task being explained, for instance in a situation like the following:

[5a] System: ...first you must edit a file and ...
[5b] User: ...what is a file?

In this case, since the object mentioned is present in a precise step of a procedure of the task currently explained, and the proper comprehension of that object is therefore essential to the understanding of the procedure, the system does not ask for confirmation about the user's intention; control_strategy_1 is suspended but kept active, and the control passes to control_strategy_2. Once the object has been completely defined, the control of the explanation returns to control_strategy_1, and the explanation of the task continues. This is in practice a normal PUSH-POP process, like the one on an ATN graph. This interruption is quite particular, since the clarification subdialogue is constituted by the activation of control_strategy_2. There are other cases in which the control is left to control_strategy_1 (or control_strategy_2, when an object is being explained) and strategies of the second class are activated. Some cases will now be presented, with reference to the kinds of interruption we saw in section 2. The user asks for information about a topic which has already been explained; in this case the selected output strategy will lead to the generation of an answer like: "I have already told you that. Do you want me to repeat it?" and to the selection (in the case of a confirmation from the user) of some alternative explanation device, using explanation strategies like "make_example", "create_analogy" etc., followed by a check on the user's comprehension:


  • control_strategy_1 --> TS1
        |
        clarification subdialogue: make_example / create_analogy --> CHECK

Fig.15: The second class strategies inside a clarification subdialogue

These explanation schemata ("create_analogy", "make_example" etc.) represent the strategies belonging to the second class; they can be considered particularly useful explanation strategies which are managed by control strategies. Another case is when a sequence is being explained and the user asks for information about a topic inside that sequence, but not respecting the sequential order; in this case we will have the generation of an answer like: "I haven't explained that yet. Wait a minute"; no examples or analogies (i.e., no activation of second class strategies) are necessary, since all the user must do is wait. When a sequence is being explained and the user asks for information about a topic out of that sequence (but inside the same task), we have the generation of an answer like: "I haven't finished this explanation yet. Do you want me to give it up?"; the system waits for an answer before continuing or skipping to another subject. As has been said, control_strategy_1 is selected when the system must explain a task structure to the user. This means that the strategy is mapped onto a particular task structure returned by the Knowledge Representation System. ("Mapped" here means that, once the Knowledge Representation System has returned a task structure, control_strategy_1 takes this structure as its "argument", i.e. as the data base which contains all the steps to be explained. The term "mapped" must not be confused with "cabled": the strategy is not structured on the task but, instead, is absolutely general.) An interesting case is the one in which, during the explanation of a task, the user asks for information about another task. In this case, the system could answer something like: "The explanation of this task has not been finished yet; do you want to give it up and pass to the explanation of another task?". If the user answers negatively, the


explanation of the current task is continued; if the user answers affirmatively, the current explanation is aborted and control_strategy_1 is mapped onto another task structure:

  • control_strategy_1 --> TS1
        |
        interruption
        |
  • control_strategy_1 --> TS2

Fig.16: The explanation of a task is interrupted for the explanation of another task

In this case, obviously, no second class strategy is applied.
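The PUSH-POP control described above (the ATN analogy of fig.14) amounts to a small stack of active strategies. A minimal sketch, with hypothetical names of our own:

```python
control = ["control_strategy_1"]      # currently explaining a task (TS1)

def define_object(obj):
    """Suspend the task explanation, define the object, then resume."""
    control.append("control_strategy_2")   # PUSH: task strategy kept active
    explanation = f"definition of {obj}"   # frame-based explanation would go here
    control.pop()                          # POP: control returns to the task
    return explanation

text = define_object("file")
```

After the call, the control stack is back to `["control_strategy_1"]`, mirroring the return of control once the object has been completely defined.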

5. Normal flow along a task structure: control_strategy_1

The KRS returns to the DM the appropriate portion of the TKS (we are not considering the case in which a frame structure is returned). The following steps are performed to reach the explanation's end, depending upon the connectives we meet in the structure (for a complete account of the connectives in a Task Knowledge Structure, see [Ferrari et al. 1991d]):

• alt-alt: this causes the generation of questions to the user about possible alternatives (for instance, to send a mail one can answer a mail he has received, or he can write a mail ex novo). In this case, the system knows that there are different paths to do different things, and so asks the user which one he chooses.

• alt-eq: the system knows that there are different paths which are equivalent, i.e., they are useful to do the same thing. For instance, it is the same thing to use an editor like "vi" or one like "emacs". It is the system which randomly chooses one path or the other, keeping the other one as a possible alternative.

• seq: the system lists the main steps; all the titles of these steps are stored in a list of things to be done; the system focuses on the first element of this list; before the system explains this step, it performs a check about

it. At the end of each step, the system generates a check on the level of understanding of the user. If the user answers affirmatively (and this is always the case in the normal flow of explanation that we are considering) the system continues explaining the elements in the list of things to be done, and, when the list is empty, it generates a check on the parent node in the task structure. The following picture represents the task structure relative to the task "edit_file":

edit_file
  SEQ: proc_1
         SEQ: digit("emacs")
              digit(file_name)
       ed_chars
       quit_editor
         ALT-ALT: save&quit
                    SEQ: digit(^X)
                         digit(^F)
                  quit
                    digit(^C)

Fig.17: The structure of the task "edit_file"

This is the way the picture must be read: to accomplish the task "edit_file", it is necessary to perform a sequence composed of three steps: "proc_1" (the procedure to open a file), "ed_chars", and "quit_editor". To perform "proc_1", it is necessary to perform the sequence digit("emacs") - digit(file_name). To perform "quit_editor", there are two alternatives: one must decide whether he wants to save the file he has edited (in which case he must perform the sequence composed of ^X and ^F), or whether he just wants to quit the editor (in which case he must only type ^C).
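The structure of fig.17 can be encoded as nested (connective, children) pairs. The sketch below (our encoding, not the project's TKS syntax) flattens one possible execution path, taking the first branch of each ALT-ALT where a real system would instead put the alternative to the user:

```python
# fig.17 as nested tuples: (connective, children); leaves are step names.
edit_file = ("SEQ", [
    ("SEQ", ['digit("emacs")', "digit(file_name)"]),  # proc_1
    "ed_chars",
    ("ALT-ALT", [                                     # quit_editor
        ("SEQ", ["digit(^X)", "digit(^F)"]),          # save&quit
        "digit(^C)",                                  # quit
    ]),
])

def first_path(node):
    """Flatten one execution path, picking the first ALT-ALT branch."""
    if isinstance(node, str):
        return [node]
    connective, children = node
    if connective == "ALT-ALT":
        return first_path(children[0])  # a real system would ask the user
    return [step for child in children for step in first_path(child)]
```

Here `first_path(edit_file)` yields the save-and-quit reading of the task: emacs, the file name, the editing step, then ^X and ^F.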

6. Normal flow along a frame structure: control_strategy_2

Control_strategy_2 is the strategy selected when the query's result is a frame structure. In this case the Knowledge Representation System returns a frame structure which comprehends all the features of that particular object. The strategy extracts the class the object belongs to, its possible


superclass, and a list of the slots which describe the object. If the implied object is itself a class, then a list of its instances is also returned.

class:       editor
slots:       name, operative_system
superclass:  tool
instances:   ed1, ed2, ed3

fig.18: the frame structure returned after a question about "editor"
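Encoded as a plain dictionary, the frame of fig.18 is enough to drive the extraction steps just listed (class, superclass, slots, instances). A hedged sketch, with a verbalization function of our own invention standing in for the Response Generator's output:

```python
editor_frame = {
    "class": "editor",
    "slots": ["name", "operative_system"],
    "superclass": "tool",
    "instances": ["ed1", "ed2", "ed3"],  # present only when the object is a class
}

def explain_object(frame):
    """Verbalize the pieces control_strategy_2 extracts from a frame."""
    parts = [f"'{frame['class']}' is a subclass of '{frame['superclass']}'."]
    parts.append("It is described by: " + ", ".join(frame["slots"]) + ".")
    if frame.get("instances"):  # only classes list their instances
        parts.append("Known instances: " + ", ".join(frame["instances"]) + ".")
    return " ".join(parts)
```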

7. Focus managing during interruptions

Interruptions are defined as deviations caused by the user which in some way disturb the normal flow of explanation. In this section we will see some possible interruptions with a short comment, and we will focus especially on the movements inside the stacks. All the following examples are derived from the task "new_mail", which is constituted by the sequence "run_mailer" - "identify_address" - "edit_file" - "send". The task "edit_file" is described in fig.17.

new_mail
  :seq  run_mailer
        identify_address
        edit_file
        send

fig.19: The task "new_mail"

case1: The state of the tasks stack, after the step "proc_1" has been explained, is the following:

  proc_1
  edit(file)
  new_mail

fig.20: The tasks stack after "proc_1" has been explained

The system makes a check on the user's understanding of the explained step:

System: (...) Did you understand how to execute "proc_1"?
User: Yes. How can I print a mail?

i.e., the user answers affirmatively to the check, and then asks a question about another task. This is what happens:

  proc_1 *
  edit(file)
  new_mail

fig.21: The tasks stack after the user's interruption

This is the state of the stack after the user's "Yes". The top level is marked, but not yet popped (the system has not reached the end of the :seq). We have a match with rule 3, and, since the previous task has not been completed, the system asks:

System: "I haven't finished with this topic yet. Do you want me to stop here?"

The user can answer the system's question in two ways:

i) "Yes" (i.e., I want the explanation of this topic to stop here), and we arrive at this situation:

  print_mail

fig.22: The tasks stack after the user's confirmation

in which case the explanation of the task "new_mail" is stopped;

ii) "No" (i.e., I want the explanation of this topic to continue; excuse me for the interruption), which leads to this situation:

  digit("emacs")
  proc_1 *
  edit(file)
  new_mail

fig.23: The tasks stack after the user's denial

in which case the explanation is continued.

case2: The system "edit_file").

arrived

to the last step of a :seq

(in the

task

System: (...) Did you understand the procedure to quit the editor? User: No (+ reference to print(mail): see case1); Yes System: Then you understood everything about edit_file? User: Yes (all the stacks are reset) No f i g . 2 4 : The possibilities after a :s e q


When the user answers the first "Yes", the system is at the end of a :seq, and so it makes a global check on the level (task) immediately superior. The user's last "No" corresponds to a request for further explanation about a topic that the system considers already explained (see next section).

case 3: Another case is the following. After the explanation of the "proc_1" step, the user asks the system to explain how to quit the editor: the explanation of a procedure is not finished, and the user asks for information on a step yet to come in the same :seq, disturbing the sequential order. The most plausible answer by the system in this case seems to be something like: "I haven't explained that yet. Wait a minute".
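As an informal illustration of the stack movements in cases 1-3, a Python sketch follows; the class `TaskStack` and its methods are hypothetical names invented for the example, not part of the IDEAL implementation.

```python
# Illustrative sketch (not from the report) of the stack movements in
# cases 1-3; the class and method names are hypothetical.

class TaskStack:
    def __init__(self):
        self.items = []      # bottom ... top, e.g. ["new_mail", "edit(file)", "proc_1"]
        self.marked = set()  # steps explained but not yet popped (their :seq is unfinished)

    def push(self, task):
        self.items.append(task)

    def mark_top(self):
        # after the user's "Yes" on a step check: the step is marked, but
        # not popped, since the end of the :seq has not been reached (fig.21)
        self.marked.add(self.items[-1])

    def interrupt(self, new_task, stop_current):
        # the user asks about another task ("How can I print a mail?"):
        # the system asks whether to stop the current topic
        if stop_current:
            # "Yes": the old explanation is abandoned (fig.22)
            self.items = [new_task]
            self.marked.clear()
        # "No": the interruption is dropped and the explanation continues
        # with the next step of the :seq, e.g. digit("emacs") (fig.23)

stack = TaskStack()
for t in ["new_mail", "edit(file)", "proc_1"]:
    stack.push(t)
stack.mark_top()                                  # fig.21: proc_1 *
stack.interrupt("print_mail", stop_current=True)  # fig.22
print(stack.items)                                # ['print_mail']
```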

8. Requesting further explanation

The user is always allowed to answer negatively to a system's check, be it a local check about a single step, or a global one about a whole sequence or a whole task. A negative answer is interpreted by the system as a request for further explanation about some topic. The system must have at its disposal some alternative explanations to be used in this case.
The system's behaviour changes according to whether the negative answer occurs after a local check or a global one. The former takes place after a single step has been explained (for instance, after the "proc_1" step). In this case the system provides an alternative explanation, which can consist of local actions like make_example, create_analogy etc.
The latter check takes place typically after a complete sequence has been explained (for instance, once the sequence "proc_1" - "ed_chars" - "quit_editor" has been explained, a check on "edit_file" is performed). In this case there are two possibilities:
• the sequence is explained again in the same way, waiting for a negative answer to a single check (so reducing the problem to the one of a negative answer to a local check);


• the sequence is explained again, immediately activating alternative ways of explanation for every single step.
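The two possibilities above, together with the local-check case, can be sketched as follows; this Python fragment is purely illustrative, and the function name `handle_negative_answer` is invented (only make_example and create_analogy come from the text).

```python
# Illustrative sketch of dispatching a negative answer to a check; the
# function name is hypothetical, the local actions are from the text.

LOCAL_ALTERNATIVES = ("make_example", "create_analogy")

def handle_negative_answer(check_kind, step=None, sequence=(), alternatives_now=False):
    if check_kind == "local":
        # negative answer after a single step: give an alternative explanation
        return [(step, LOCAL_ALTERNATIVES[0])]
    if not alternatives_now:
        # global check, possibility 1: re-explain the sequence in the same
        # way, waiting for a negative answer to some local check
        return [(s, "explain") for s in sequence]
    # global check, possibility 2: re-explain, activating alternative
    # ways of explanation for every single step at once
    return [(s, LOCAL_ALTERNATIVES[0]) for s in sequence]

seq = ("proc_1", "ed_chars", "quit_editor")
plan = handle_negative_answer("global", sequence=seq)
print(plan)  # [('proc_1', 'explain'), ('ed_chars', 'explain'), ('quit_editor', 'explain')]
```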

Conclusion

In this article the role of a Dialogue Manager in an Intelligent Tutoring System has been defined. It should be clear that the role the Dialogue Manager plays in an ITS is a central one, since it integrates the other modules of the system and enables them to communicate with each other.
It is worth noting that the main characteristic of the DM as it has been presented in these pages is its generality, i.e., the fact that its design does not depend on the way the other modules are designed. The DM is not concerned with how the parser builds its output: the logical form must simply be characterised by at least a certain number of elements, which are defined from time to time by the theory underlying the system. Also the KRS is not necessarily "visible" to the DM: it must return some structure describing an object or a task, but it does not matter how this structure is stored or built inside the KRS.

References

[Ferrari et al. 1990] Ferrari, G., Prodanof, I., Carenini, M., Moreschini, P.: Integrating knowledge structures with discourse processing, ESPRIT BRA #3160 "IDEAL" Deliverable B1 - Report #3, Dept. of Linguistics - University of Pisa & Institute for Computational Linguistics - CNR, Pisa.
[Ferrari et al. 1991a] Ferrari, G., Prodanof, I., Carenini, M., Moreschini, P.: Focus Managing Mechanism in an ITS, ESPRIT BRA #3160 "IDEAL" Draft A3.2#1, Dept. of Linguistics - University of Pisa & Institute for Computational Linguistics - CNR, Pisa.
[Ferrari et al. 1991b] Ferrari, G., Prodanof, I., Carenini, M., Moreschini, P.: Teaching and Explanation Dialogues, Technical Report ATR1, Dept. of Linguistics - University of Pisa & Institute for Computational Linguistics - CNR, Pisa.
[Ferrari et al. 1991c] Ferrari, G., Prodanof, I., Carenini, M., Moreschini, P.: The General Theory of Discourse, Technical Report ATR2, Dept. of Linguistics - University of Pisa & Institute for Computational Linguistics - CNR, Pisa.
[Ferrari et al. 1991d] Ferrari, G., Prodanof, I., Carenini, M., Moreschini, P.: The Knowledge Representation System, Technical Report ATR5, Dept. of Linguistics - University of Pisa & Institute for Computational Linguistics - CNR, Pisa.

[Ferrari et al. 1991e] Ferrari, G., Prodanof, I., Carenini, M., Moreschini, P.: Plans and Strategies in Explanation Dialogues, Technical Report ATR4, Dept. of Linguistics - University of Pisa & Institute for Computational Linguistics - CNR, Pisa.
[Fodor 1977] Fodor, J.D.: Semantics, Harvard University Press, Cambridge, 1977.

[Sarantinos & Johnson 1990] Sarantinos, E. and Johnson, P.: Explanation Dialogues: A theory of how experts provide explanations to novices and partial experts, to appear in Artificial Intelligence.



Plans and Strategies in Explanation Dialogues

Abstract

In this paper the plan-based approach to discourse modelling is briefly discussed and its inadequacy for the treatment of explanatory discourse is shown. The need for a static representation of communication schemata is advocated and, consequently, the requirements for such a representation are laid down. A graph-like formalism is selected and some examples of its use are discussed in detail. Finally, a preliminary formal definition of an ATN-like metalanguage is attempted.

1. Introduction

Plan-based approaches to computational discourse modelling seem to focus on the intentional structure, i.e. on the goals, and the actions to attain them, which underlie human communication. But real human-human communication seems to prove that people make plans in two senses: about what to say, and about how to say it (see also [Appelt 82]).
In adopting a plan-based approach, then, it is necessary to distinguish between real world plans, which set up chains of actions, including linguistic ones, to obtain real world goals, and communicational plans, which deal with the form of communication. Social constraints, urgency, need for clarity, diplomacy and any other contextual requirements are the parameters for this latter type of plan.

2. Plan-based approaches to discourse: an overview

Planning has often been proposed as a possible model for discourse pragmatics. This approach is theoretically motivated by the fact that philosophy of language tends to explain the pragmatics of discourse in terms of felicity conditions for communication acts, which rely, basically, on a principle of satisfaction of the speaker's and hearer's intentions while participating in a communication event.
As intentions seem to be the only motivations for human action and human communication, it follows almost automatically that goals underlie communication, and techniques to reach those goals are the pragmatic substance of communication. Traditionally, two different discourse types have been studied with this approach: interaction and narrative.


2.1. Planning and interaction

Many theoretical proposals have been made to explain simple phenomena occurring in goal-oriented interaction, since the first model presented in [Allen 79]. Despite the rich literature, they can be reduced to two really different approaches, according to the way a plan is conceived of. In one approach [Allen 79, 83], [Allen & Litman 86], [Appelt 82], [Cohen & Perrault 79] linguistic actions are described in terms of ordinary actions, and are included in the set of operators used to build an effective plan. Thus, in this approach, the plan of taking a train to a given destination can include moving from home to the railway station, buying the ticket, carrying one's luggage, as well as asking the departure time of the required train. Asking is treated as any other action, although it results in a linguistic act, rather than in a physical action.
In the other approach, although the way a (communicative) plan is conceived of does not change, plans are described as mental representations of how actions should be concatenated [Pollack 86]. A sequence of actions, such as those mentioned above, is considered as an agent's believed sequence, rather than a sequence in reality. Other agents, in fact, can have different views on the actions necessary to achieve the same goal, thus believing a different plan.
The real difference between these two approaches is that in the first one an orthodox plan building mechanism is the control structure for the whole system, while in the second approach plan recognition is carried out, or should be carried out, by a modal theorem prover, as modal operators are in play.

2.2. Planning and narratives

Plans are also seen as a good explanation of many narrative texts [Schank & Abelson 77], [Hobbs & Agar 81]. This approach is much less formal than the previous one, and plans can be represented either as real plans or as sequences of typical actions (scripts). In this case, however, plans are not expected to provide an explanation of the intentions of the participants in the communication process, and of the structure of communication itself, but simply are the very subject communication is about. They do not support communication; they are the object being communicated. The existence of stereotypical plans is the only explanation of our ability to understand narratives, even when some shortcuts are taken, thus eliminating the explicit mention of some steps in the story.



3. Plans, actions, speech acts, and communication

All the presented approaches have a common character: they tend to treat linguistic and communicative actions as ordinary actions in the world. The cases dealt with under 2.1 consider linguistic actions like asking and answering at the same level as actions in the world such as moving or taking a taxi. The general assumption being that all actions serve to verify preconditions of further actions, it is obvious that linguistic actions also have this status, and are dealt with exactly as other actions.
In this approach, the possibility that a speaker separately decides which information is to be asked and how it is to be asked is ruled out. Nevertheless, people in reality do make plans about how to communicate, at least in order to cope with problems of conventions.
Under § 2.2 the communication plan is also completely ignored. Narrative is assumed to stick to a rigid sequence of goals and actions to reach those goals, but no space is left to the choice of the pure communicative goal, i.e. the way a goal structure is presented.

4. Rhetorical categories and planning

The situation in Tutoring Systems is quite different. The substantive goal remains basically the same, i.e. to communicate a specific set of notions. What may, and actually does, change is the way such a knowledge transmission must take place. In more precise terms, while in the other, computationally known, cases there is a general balance between real world planning and discourse planning, in Tutoring there is no structured task other than providing information. All the planning activity is limited to pure communicational aspects. We deal more with rhetorical and strategic choices than with real plans.
This implies, besides a conceptual difference, also a substantial difference in the representational and processing technique. In fact, in the pure plan-based approach, communicational actions are treated as ordinary planning operators, and applied to real world situations, every time a linguistic action seems to create the preconditions for the execution of another action. In general, linguistic actions (speech acts) are generated one at a time as consequences of real world situations; in no case are they generated as a consequence of other linguistic actions, or as a precondition for other linguistic actions.

The same observation holds for narrative texts.
In ITSs, instead, linguistic actions are very often chained in stereotypical sequences, corresponding to well-assessed tutoring strategies. It is, therefore, often the case that a specific situation in the real world, such as a student's question, triggers a sequence of speech/discourse acts. In principle it may be possible to define a series of linguistic operators such that they can automatically build an explanation plan. However, the general feeling is that such explanation strategies are so stereotypical as to make the use of general planning operators largely overdimensioned for the real problem. The static representation of those stereotypes is probably the most suitable means.

5. A proposal

5.1. Requirements for representing explanation strategies

From the above discussion, some indications arise, precise enough to allow the specification of a representation means as well as a procedure for the generation of explanations. The general requirements are the following:
- explanation strategies consist of schemata for the organization of explanatory communication, which may ultimately be organized in sequences of discourse acts;
- different explanation techniques for the same subject correspond to different schemata, which propose choice points, where the choice is, intuitively, a function of at least the level of knowledge of the learner, if not also a matter of taste of the explainer. This level of knowledge can be implicitly known to the tutor, or can be assessed by means of explicit questions;
- explanation strategies must allow interruptions from the learner; there must be, therefore, a mechanism which keeps track of the suspended strategies;
- explanation strategies must allow recursion, i.e. they must be able to call other strategies, including themselves; in fact, interruptions from the learner may require the call for other (embedded) strategies.

5.2. Sketching a metalanguage

Sequences of acts and choice points can be very easily represented in terms of oriented graphs (transition networks), with different paths representing alternative strategies; recursion can be represented in terms of RTNs as well. The definition of a metalanguage for augmentations is necessary to access the knowledge base, to build communication schemata, and to trigger minor linguistic actions like requests for confirmation, or to execute tests on the status of the explanation.

6. A model for explanation strategies

In this paragraph, an attempt to model explanation strategies by means of a network is presented. As sketched above, an explanation strategy consists in a sequence of operations which build an explanation fragment either as part of a teaching task, or as an answer to the learner's question. These operations, although different in communicative content, may be reduced to a pair, as they must identify the relevant knowledge and assign the right explanation schema. The same schema may take different pieces of knowledge as arguments, with different explanatory and communicative results. Strategies must recursively call for substrategies, including themselves. Some example strategies will help to understand informally how they are constructed.

6.1. General teaching/explaining strategy

The following graph describes a general strategy.

fig. 1: representation of a general strategy (a transition network over the states START, D1, T1, S1, E1 and D2, with arc labels including ST/INTRO and KN/STATE)

This graph introduces the convention that arcs are labelled with upper-case symbols whenever they are calls to other graphs, whose name (initial state) corresponds to that label (much like ATN PUSHes). This graph says that a general teaching strategy starts with an introductory form, specified in the graph, continues with a form, also specified in a graph, and ends with one of the five possible continuations.
It is possible that continuations can be appended one after the other, but this possibility has not been included in the present figure, as the possible combinations have not been empirically motivated and tested. Empirical work carried out at City University can feed the present model with real data.
The first step, the introduction, is represented in the following graph.

fig. 2: the graph for an INTRODUCE strategy (states S1 and S2)

The first arc is traversed only if a general introduction of the domain of explanation is in question. The constructive part of the action ranges over sentences like
[1a] The domain of my competence is...
[1b] I am here to teach you...
[1c] I can teach you everything about...
while the query to the knowledge base simply relates to its top node, i.e. what the knowledge base is about.
The second arc is an example of a general action, which can be called both for the entire task and for some subtask. The constructive part of the action ranges over formulas like
[2] Now I am going to teach you how to...
while the query refers to the heading (= task or subtask name) focused by the system.

6.2. A general active strategy

The next step presented by the top level graph is a call for a CHECK strategy. This is a general strategy which applies to many cases where the initiative stands on the part of the system. A CHECK can occur:
- at the beginning of a session, after an introduction, in order to identify the level of acquaintance of the learner;


- after the introduction, by the system, of a new concept, in order to establish to what extent it is known to the learner;
- after any explanation cycle;
- before initiating an explanation/example cycle ("do you want that...?").
This situation is represented in the following graph.

fig. 3: the graph for a CHECK strategy (a network around state C1)

The first arc ranges over sentences like
[3a] Have you experience of...?
[3b] Do you know anything about...?
where the query is about the top level knowledge labels.
The second arc deals with sentences like
[4a] Is it clear?
[4b] Do you understand? etc.
The consequence is that:
1- if the answer is affirmative, the original strategy can be carried on;
2- if the answer is negative, clarification and negotiation strategies are to be activated.
The third arc ranges over sentences like
[5a] Do you want me to explain...?
[5b] Do you need some further examples?
which are equivalent to the first type of questions, but for the fact that they mention the learner's intentions.
It is interesting to note how this specific strategy is a realization of the CHECK act, in terms of the different arguments it can take, in perfect agreement with the General Theory of Discourse (see ATR #2).

6.3. A general explaining strategy

The following is the description of a strategy for explaining (i.e. describing) something. This is the case of the DESCRIBE strategy, which can be used
- as a way of introducing a subject, as in


[6] I am your ITS; my domain is e-mail. E-mail is a system to send messages around the world through a computer network (DESCRIBE). Do you know something about e-mail? (check-competence)
- as an answer to a learner's question like
[7a] What is a prompt?
[7b] I don't know what e-mail is
The following graph specifies a DESCRIBE strategy.

fig. 4: the graph for a DESCRIBE strategy (a network over the states D1, D2, D3 and D4)

The first arc refers to those cases in which a description starts with a definition, as in
[8] E-mail is...
The second arc ranges over sentences very similar to those of the first arc. The difference between the two is due to the knowledge partition being queried: in the first case a "definition property" will be involved, possibly the "comment" slot (see ATR #5), while in the second the "describing properties" will be in focus.
The third arc refers to those cases where an object (procedure) is described by just listing its main steps, as in
[9] E-mail consists in editing a message, calling the mailer, and...
All these ways of describing a knowledge unit can be used in isolation or associated with one another, as the graph shows. The three main steps can be followed by an example (an EXAMPLE graph).

6.4. Generalisations

From the above examples, a few generalisations seem to come out. The first one is that strategies consist of sequences of substrategies and basic actions. Substrategies are, in fact, graphs like strategies, while basic actions consist in
- a constructive part, which establishes the kind of message to be generated; these may coincide, at least at the lower level, with discourse acts;
- a query part, which, in most cases, says what part of the knowledge is relevant to the current explanation; it is assumed that the relevant knowledge partition is kept in memory and updated by the DM, while the query part extracts from focus specific knowledge items (properties of concepts and procedures) necessary for the current explanation/answer.
On the basis of these generalisations it is possible to build a general ATN-like language to describe discourse structures in terms of calls to (sub)strategies and primitive actions.
Theoretically, strategies are strictly related to discourse acts. In fact, the ultimate units of a strategy, from the point of view of the communication form, are discourse acts. On the other hand, the distinction between the constructive (communicative) part of a strategic operation and the query realizes the theoretical distinction between abstract discourse acts and their content.
Strategies are accessible from different points: the system can activate them as a teaching initiative, or they can be fired by the reasoning component (IF-THEN rules) [see ATR #3 and #6].
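As an informal illustration only, a basic action of the kind just described, a constructive part paired with a query part, might be sketched in Python as follows; the toy knowledge base, the function names and the output sentence are invented for the example, not taken from the report.

```python
# Illustrative sketch of a basic strategic action as a pair of a
# constructive part and a query part; the toy KB and all names are
# hypothetical.

KB = {
    "e-mail": {
        "comment": "a system to send messages around the world through a computer network",
        "steps": ["edit a message", "call the mailer", "send it"],
    }
}

def query_definition(topic):
    # query part: extract the relevant knowledge item from the focused partition
    return KB[topic]["comment"]

def define_obj(topic, definition):
    # constructive part: realize a DEFINE-OBJ discourse act as a sentence
    return f"{topic.capitalize()} is {definition}."

def basic_action(topic):
    # a basic action = (constructive part, query part) applied to a topic
    return define_obj(topic, query_definition(topic))

print(basic_action("e-mail"))
# E-mail is a system to send messages around the world through a computer network.
```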

7. A metalanguage

Taking ATNs as a model for the representation and running of explanation strategies, it is necessary to define a language, consisting of arc-types, tests, and actions, which actually runs the strategies. The following is a formal, although very preliminary, presentation of such a language.

::= +
::= +
::= an arbitrary label
::= + +
::= PUSH | POP | GENERATE
::= INTRODUCE | DESCRIBE | CHECK etc.
::= DEFINE-OBJ | DESCR_OBJ | DESCR_TSK | INTRODUCE_DOMAIN | CHECK_INT | CHECK_COMP etc.
::= T | |


::= +
::= a boolean operator
::= GETVAL | LISTVAL | GET_PARENT | GET_ANCESTOR | BUILD | GET_PRED
::= the name of an object in the knowledge base (frame, task)
::= the name of a slot
::= the content of the specified registers ordered in a list
::= a predicate from the logical form of a sentence
::= SETR

A PUSH occurs when a substrategy is called from a higher level strategy, i.e. "discourse complex strategies" consist of "non-terminal" sequences of substrategies called by a PUSH. A POP marks the exit from a graph (a strategy). A GENERATE is a call for a discourse simple strategy, i.e. a schema directly related to the structure of the current explanation. Tests and actions are quite clear. Forms are functions which access the knowledge base according to different modalities and in different fields, and return values such as, for example, the content of a slot, the list of the contents of all the slots of a frame etc. Forms mostly realize the query part of the strategies.
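The PUSH/POP/GENERATE arc-types could drive a tiny interpreter like the following Python sketch; the toy graphs, the function name and the omission of tests and forms are all simplifying assumptions for illustration, not the report's actual networks.

```python
# Minimal sketch of an ATN-like runner over strategy graphs with PUSH,
# POP and GENERATE arcs; the graphs are toy stand-ins, and the tests
# and forms of the metalanguage are omitted.

GRAPHS = {
    "TEACH": [("PUSH", "INTRODUCE"), ("GENERATE", "CHECK_COMP"), ("POP", None)],
    "INTRODUCE": [("GENERATE", "INTRODUCE_DOMAIN"), ("POP", None)],
}

def run(graph_name, acts):
    for arc_type, arg in GRAPHS[graph_name]:
        if arc_type == "PUSH":        # call a substrategy (recursion is allowed)
            run(arg, acts)
        elif arc_type == "GENERATE":  # emit a discourse simple strategy
            acts.append(arg)
        elif arc_type == "POP":       # exit the current graph
            return

acts = []
run("TEACH", acts)
print(acts)  # ['INTRODUCE_DOMAIN', 'CHECK_COMP']
```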

8. Conclusions

The approach described in the above paragraphs bears resemblance to RST and RST-like approaches (see [McKeown 85], [Mann & Thompson 86]), which have not been inspected in depth. Also, the idea of pairing a generic communication-building action with an access to the knowledge base vaguely recalls the query-semantics approach to generation [Matthiessen 84], although it operates at a conceptually higher level.
As in orthodox ATNs, the metalanguage relates to a specific representational device, in our case frames; changing the KRS language would result in changing the methodology, at least the forms, exactly as changing syntactic theory resulted in a change in the ATN metalanguage (see [Kaplan 73]).


References

[Allen 79] J.F. Allen, A plan-based approach to speech act recognition, PhD thesis, Toronto, 1979.
[Allen 83] J.F. Allen, Recognizing intentions from natural language utterances, in M. Brady & R.C. Berwick (eds.), Computational Models of Discourse, MIT Press, 1983, pp. 107-166.
[Allen & Litman 86] J.F. Allen & D.J. Litman, Plans, Goals, and Language, Proceedings of the IEEE, 1986, pp. 939-947.
[Appelt 82] D.E. Appelt, Planning Natural Language Utterances to Satisfy Multiple Goals, TechNote 259, SRI International, 1982.
[Cohen & Perrault 79] P.R. Cohen & C.R. Perrault, Elements of a plan-based theory of speech acts, Cognitive Science 3, 1979, pp. 177-212.
[Hobbs & Agar 81] J.R. Hobbs & M. Agar, Text plans and world plans in natural discourse, Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, 1981, pp. 190-196.
[Kaplan 73] R. Kaplan, A general syntactic processor, in R. Rustin (ed.), Natural Language Processing, Algorithmic Press, 1973, pp. 193-241.
[Mann & Thompson 86] W.C. Mann & S.A. Thompson, Rhetorical Structure Theory: Description and Construction of Text Structures, ISI/RS-86-174, Marina del Rey, 1986.
[Matthiessen 84] C.M.I.M. Matthiessen, Systemic grammar in computation: the NIGEL case, TR RR-83-121, USC/ISI, 1984.
[McKeown 85] K.R. McKeown, Text Generation: Using Discourse Strategies and Focus Constraints to Generate Natural Language Text, Cambridge University Press, 1985.
[Pollack 86] M. Pollack, A model of plan inference that distinguishes between the beliefs of actors and observers, Proceedings of the 24th Annual Meeting of the ACL, 1986, pp. 207-214.
[Schank & Abelson 77] R.C. Schank & R.P. Abelson, Scripts, Plans, Goals, and Understanding, Lawrence Erlbaum, 1977.


The Knowledge Representation System in an ITS

Abstract

This paper is about the design and development of the Knowledge Representation component embedded in an Intelligent Tutoring System. It is strongly oriented to implementation and only partially interested in describing the state of the art in Knowledge Representation or in evaluating the different theories developed in that particular field. It begins with a general description of the system, to stress the role of a Knowledge Representation Module inside an Intelligent Tutoring System. It will be shown that not every input is necessarily translated into a query to the Knowledge Representation Module; we will then see when this is the case and what it means to the system. The kind of theoretical model of knowledge representation which has been chosen (namely "Task Knowledge Structures") will then be described, and a computational interpretation of that representation will be proposed. We will finally see what it means for the Knowledge Representation System to return a task structure or a frame structure. The article will finish with a brief description of how a Student Model (itself a part of the Knowledge Representation System) can be defined.

1. Introduction

In an Intelligent Tutoring System, ITS, like the one designed for IDEAL, the Knowledge Representation System, KRS, is an autonomous module which interacts with the other modules of the system, under the control of the Dialogue Manager. A brief description of the system will be sketched; for a deeper analysis of the interaction between the KRS and the Dialogue Manager see [Ferrari et al. 1991a].

fig.1: the KRS module in the IDEAL ITS (block diagram connecting the input, the PARSER, the DM — with its IF-THEN Rules, Focus Structures and Output Strategies — the KRS with its Student's Model, and the RESPONSE GENERATOR; the edges are labelled logical form 1, logical form 2, query, TS/FS and output)

As shown in the figure (and fully explained in [Ferrari et al. 1991a]), the Dialogue Manager receives the result of the parsing phase, which is a structure called logical form 1. The logical form represents a description of the input sentence, where the following information is clearly identified (for a theoretical account of the relevance of this information see also the General Theory of Discourse, GTD, as described in [Ferrari et al. 1991b]):
• function: the function of the sentence according to the criteria and the list given in the GTD; this particular element will not be used by the DM; possible instances of functions are INFORM, CONFIRM, DENY, CONTINUE etc.;
• sentence-class, which provides a syntactically-based characterization of the sentence in terms of the three categories DECLARATIVE, IMPERATIVE, INTERROGATIVE. As far as INTERROGATIVE is concerned, it is possible to further specify it as "Yes/No-question" or "WH-question";
• event-class, which identifies the main verb of the sentence;
• roles, which coincide with traditional syntactic roles inside the sentence (subject, object, specifier etc.).
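As an informal illustration, the four elements above could be collected in a structure like the following Python sketch; the class name `LogicalForm1` and the sample values are assumptions made for the example, not the project's actual encoding.

```python
# Illustrative sketch of the elements of logical form 1 listed above;
# the dataclass and the sample values are hypothetical.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LogicalForm1:
    function: str                  # e.g. INFORM, CONFIRM, DENY, CONTINUE (not used by the DM)
    sentence_class: str            # DECLARATIVE, IMPERATIVE or INTERROGATIVE
    question_type: Optional[str]   # for INTERROGATIVE: "Yes/No-question" or "WH-question"
    event_class: str               # the main verb of the sentence
    roles: dict = field(default_factory=dict)  # syntactic roles (subject, object, ...)

# a plausible logical form for "How can I send a mail?"
lf = LogicalForm1(function="INFORM",
                  sentence_class="INTERROGATIVE",
                  question_type="WH-question",
                  event_class="send",
                  roles={"subject": "speaker", "object": "mail"})
print(lf.event_class, lf.roles["object"])  # send mail
```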


When the logical form arrives from the Parser, the DM activates the Focus Structures Managing Mechanism (see [Johnson & Johnson 1990]), which can access a tasks stack and an objects stack, which keep track of the task currently explained and of its related objects. The mechanism returns the current state of the stacks, and the DM tries to match the result to an IF-THEN rule.
The IF-THEN rules comparatively evaluate the input logical form and the values returned from the stacks, and build an appropriate query to the KRS, which returns the relevant task structure or a frame structure relative to some particular topic. At this point, the DM is able to select the appropriate explanatory behaviour, which will then be verbalized by the Response Generator. In some cases, when "check" questions and answers are treated, or, in general, very simple loops are in question, no access to the KRS is made.

2. Querying the KRS

The structure of an IF-THEN rule is the following:

IF logical form 1 AND stacks state AND tests THEN actions

where the variable logical form 1 is constituted by some of the elements of the logical form returned by the parser. Here is an example of a rule:

IF
(SENT-CLASS = req) (TYPE = how) (main-verb is ACTION) (SUBJ = speaker) (OBJ = mail)
AND
tasks stack: empty; objects stack: empty
THEN
query(KRS, main-verb(arguments), result)
select(output-strategy)

fig.2: A rule for a query about a task to the KRS
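The rule of fig.2 can be sketched in Python as follows; the dictionary encoding of the logical form and the function name are hypothetical stand-ins for the DM's IF-THEN machinery, invented for illustration.

```python
# Illustrative sketch of the IF-THEN rule of fig.2; the logical-form
# encoding and the rule function are hypothetical.

def rule_task_query(lf, tasks_stack, objects_stack):
    # IF part: a "how" request whose main verb is an ACTION, with
    # SUBJ = speaker and OBJ = mail, and both stacks empty
    if (lf["sent_class"] == "req" and lf["type"] == "how"
            and lf["main_verb_is_action"]
            and lf["subj"] == "speaker" and lf["obj"] == "mail"
            and not tasks_stack and not objects_stack):
        # THEN part: query the KRS about the task matching the main
        # verb, then select an output strategy on the result
        return ("query", "KRS", lf["main_verb"], "select(output-strategy)")
    return None

# "How can I send a mail?"
lf = {"sent_class": "req", "type": "how", "main_verb": "send",
      "main_verb_is_action": True, "subj": "speaker", "obj": "mail"}
print(rule_task_query(lf, [], []))
# ('query', 'KRS', 'send', 'select(output-strategy)')
```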

In this rule there is a SENT-CLASS "req", the TYPE of which is "how", with a main verb which can be classified as an ACTION, the SUBJECT of which is the speaker*, and the OBJECT of which is "mail"; both the tasks stack and the objects stack are empty. All these elements cause a query to the KRS related to a task which matches the main verb. According to the information which constitutes the result of the query, the appropriate output strategy is selected (see [Ferrari et al. 1991a]; [Ferrari et al. 1991c]). This rule is typically selected for questions from the user like:
[1] How can I send a mail?
If the user instead asks a question about a particular object in the task being explained, like:
[2] What is a file?
and the tasks stack is not empty, the DM selects a rule like:

IF    (SENT-CLASS = req) (TYPE = what)
      (main-verb is NOT_ACTION) (SUBJ = file)
AND   tasks stack: t1
      objects stack: empty
THEN  query(KRS, SUBJ, result)
      select(output-strategy)

fig. 3: A rule for a query to the KRS about an object

The input here is a SENT-CLASS req, the TYPE of which is what, with a main verb which cannot be classified as an ACTION, the SUBJECT of which is the object file; in the tasks stack the task t1 is stored, and the objects stack is empty. All these elements cause a query to the KRS related to the object identified as SUBJ. According to the information returned by the query, the appropriate output strategy is selected. This rule is typically selected after the user has asked for information related to the task currently being explained.

(5) These categories are taken from the General Theory of Discourse which has also been used in the IDEAL framework. See [Ferrari et al. 1991b].

3. Task Structures as a Knowledge Representation System

In this section a brief account of the knowledge structure which has been used for IDEAL will be sketched. The main formal objects of task-oriented ITSs, where the expert teaches the user the way to perform some particular task, i.e. how (through which steps) he/she can reach a given goal, are task structures. For a complete account of the Task Knowledge Structure theory, see [Johnson & Johnson 1990].

The Task Knowledge Structure (TKS) theory is based on the assumption that the knowledge necessary to perform a particular task can be stored in a Knowledge Base as a task structure, i.e. a structure which contains information on plans, goals, procedures, and objects related to the task. The Knowledge Base contains a directory of all the task structures related to a particular domain, and a Domain Taxonomic Substructure (DTS), which represents generic knowledge not directly related to any particular task structure. The TKSs contain a goal substructure and a taxonomic substructure: the former represents knowledge about the plan for carrying out a complete task; the latter contains information about the objects associated with the tasks and about the actions they are involved in.

It became necessary to give a computational interpretation of TKSs, since they have been described only informally, not as a structure but as a topology, i.e. the relationships among the various components of the KB have not been made explicit. The only computational reference was the suggestion to represent TKSs and DTSs as frames, with sets of procedures represented by production rules.

4. A computational representation of TKSs

4.1. Knowledge entities

In order to map task structures into a computational representation, it is necessary to identify and define the elements which instantiate the "boxes" of the topology proposed in [Johnson & Johnson 1990]. It is important to notice that this method


is domain-independent, provided that the domains treated can be represented in terms of task structures. The building blocks of a DTS/TKS are the following (examples are about the specific e-mail domain of IDEAL):

• OBJECTS: frame structures representing objects belonging to the e-mail domain, described in a generic way, with no relation to any particular task;
• ACTIONS: any kind of atomic action, such as digit("string"), click(icon), press(key) etc.;
• PROCEDURES: linear sequences of actions with no branching, i.e. a rigid chaining of actions;
• TASKS: control structures on procedures or actions, based on particular relationships holding between them, labelled as:
  • :seq ---> a sequence that must necessarily be followed up to its end;
  • :alt-alt ---> two different paths to obtain different things;
  • :alt-eq ---> two different paths to obtain the same thing.
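The three connectives could be encoded, for instance, as tagged nodes of a task body. This is an illustrative Python sketch of the idea (the concrete encoding used in IDEAL is the CLOS one described in this report); the node names are examples only.

```python
# Illustrative encoding of task bodies (not the IDEAL representation):
# a body is a tuple whose first element is one of the three connectives.

SEQ, ALT_ALT, ALT_EQ = ":seq", ":alt-alt", ":alt-eq"

def is_procedure(body):
    """A procedure is a rigid chaining of actions: :seq nodes only."""
    if isinstance(body, str):          # an atomic action name
        return True
    tag, *children = body
    return tag == SEQ and all(is_procedure(c) for c in children)

open_file = (SEQ, "digit_emacs", "digit_file_name")
quit_editor = (ALT_ALT, (SEQ, "digit_^X", "digit_^F"), "digit_^C")

print(is_procedure(open_file))    # True: no branching
print(is_procedure(quit_editor))  # False: contains an alternative
```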

4.2 DTS model

The Domain Taxonomic Substructure consists of a task-independent description of all the involved objects. CLOS, the standard Common Lisp Object System (see [Steele 1990]), has been chosen as a description language for objects; it can be used as a frame-based representation language, which makes explicit the relations among classes, superclasses, subclasses, and instances. The following are examples of such descriptions:

classes

(defclass address ()
  ((userid :initarg :userid)
   (node :initarg :node :type string)
   (network :initarg :network :initform "bitnet")))

(defclass editor (tool)
  ((name :initarg :name :type string)
   (operative_system :initarg :op_system :initform "vm")))

(defclass file ()
  ((name :initarg :name :type string)
   (dir :initarg :dir)
   (permission :initarg :permission :type flag)))

(defclass flag ()
  ((owner :initarg :ow)
   (group :initarg :gr)
   (others :initarg :ot)))

(defclass tool ()
  ((name :initarg :name :type string)
   (purpose :initarg :purpose)))

instances

(setq file1 (make-instance 'file))
(setf (slot-value file1 'name) "profile")
(setf (slot-value file1 'dir) "/home/user")

(setq ed1 (make-instance 'editor))
(setf (slot-value ed1 'name) "vi")
(setf (slot-value ed1 'operative_system) "unix")

(setq ed2 (make-instance 'editor))
(setf (slot-value ed2 'name) "xedit")
(setf (slot-value ed2 'operative_system) "vm")

(setq ed3 (make-instance 'editor))
(setf (slot-value ed3 'name) "emacs")
(setf (slot-value ed3 'operative_system) "unix")

(setq add1 (make-instance 'address))
(setf (slot-value add1 'userid) "paolox")
(setf (slot-value add1 'node) "icnucevm")

Superclasses' names follow class names (e.g. the class "tool" is the superclass of the class "editor"). In the class definition a list of slots is then defined, where the keyword "initarg" identifies the name by which the slot is referred to; the keyword "initform" identifies a possible default value for the slot; and the keyword "type" identifies the type or the class of the values the slot can have.

4.3 TKS model

As far as the Task Knowledge Structures are concerned, it is necessary to introduce three particular classes, namely ACTION, PROCEDURE and TASK. The difference among these three classes is that ACTIONs can be thought of as atomic commands peculiar to the domain (e.g., the writing of a particular string, clicks on icons, the use of particular function keys, etc.).

PROCEDUREs are defined as fixed sequences of actions which include no alternatives between them. They could be thought of as tasks with the connective :seq only. TASKs are defined as control structures on procedures, based on the connectives :seq, :alt-alt, and :alt-eq. The following is the CLOS description of the three classes:

(defclass action ()
  ((name :initarg :name :type string)
   (arguments :initarg :arg :type list)
   (comment :initarg :comm :type string)))

(defclass procedure ()
  ((name :initarg :name :type string)
   (arguments :initarg :arg :type list)
   (comment :initarg :comm :type string)
   (body :initarg :body :type list)))

(defclass task ()
  ((name :initarg :name :type string)
   (arguments :initarg :arg :type list)
   (comment :initarg :comm :type string)
   (objects-list :initarg :obj-l :type list)
   (examples :initarg :exs :type string)
   (body :initarg :body :type list)))

The following is the description of the slots of the class TASK; the slots which have the same name in the other classes are filled in the same way:

1. name, a descriptive name of the frame;
2. arguments, the list of the arguments of the frame;
3. comment, a textual comment about the frame;
4. objects-list, a list of the objects related to that particular task;
5. examples, a list of examples which can be used to describe the task;
6. body, which describes the decomposition of the task into substructures and makes use of the special connectives :seq, :alt-alt, :alt-eq.

Some of these slots occur under particular dialogic circumstances: for instance, the list of examples will turn out to be particularly useful inside a clarification subdialogue to solve a misconception. It is also important to give an example of the list of objects which is appended to a task: in the e-mail domain, given the frame corresponding to the task "send_mail", the list of objects will comprise objects directly and indirectly related to that task (for instance, "mail", "address", "file", "nickname", "subject", "date", etc.).

As far as comment is concerned, it can be used, for instance, to introduce the explanation about a particular task with a piece of text which comments in some way on that task or on the forthcoming explanation. Instances of actions, procedures and tasks are described below:

(setq act1 (make-instance 'action))
(setf (slot-value act1 'name) "click_start")
(setf (slot-value act1 'arguments) '(application_icon))
(setf (slot-value act1 'comment)
      "To start an application, double click on the appropriate icon")

(setq act2 (make-instance 'action))
(setf (slot-value act2 'name) "start_emacs")
(setf (slot-value act2 'arguments) '(emacs))
(setf (slot-value act2 'comment)
      "To execute \"emacs\", enter the \"emacs\" command string")

(setq act3 (make-instance 'action))
(setf (slot-value act3 'name) "on_file")
(setf (slot-value act3 'arguments) '(file_name))
(setf (slot-value act3 'comment)
      "To point out on which file you want a command to be executed, digit the name of the file")

(setq send (make-instance 'action))
(setf (slot-value send 'name) "send")
(setf (slot-value send 'comment)
      "To send a file or a mail, it is sufficient to use the function key F5")

(setq proc1 (make-instance 'procedure))
(setf (slot-value proc1 'name) "open_file")
(setf (slot-value proc1 'arguments) '(emacs file_name))
(setf (slot-value proc1 'comment)
      "To open a file using \"emacs\" you must digit \"emacs\" and the name of the file")
(setf (slot-value proc1 'body) '(:seq act2 act3))

(setq task1 (make-instance 'task))
(setf (slot-value task1 'name) "send_mail")
(setf (slot-value task1 'arguments) '(mail))
(setf (slot-value task1 'comment)
      "Sending mail is an operation useful to communicate very quickly with other users all across the world. It is very easy to learn how to send a mail")
(setf (slot-value task1 'objects-list) '(mail address file nickname subject date))
(setf (slot-value task1 'body) '(:alt-alt task2 task3))

(setq task2 (make-instance 'task))
(setf (slot-value task2 'name) "new_mail")
(setf (slot-value task2 'arguments) '(mail))
(setf (slot-value task2 'body) '(:seq run_mailer id_address edit_file send))

(setq task3 (make-instance 'task))
(setf (slot-value task3 'name) "reply_mail")
(setf (slot-value task3 'arguments) '(mail))
(setf (slot-value task3 'comment)
      "When you receive a mail, it is possible to reply immediately to it")
(setf (slot-value task3 'objects-list) '(mail address file nickname subject date))
(setf (slot-value task3 'body) '(:seq edit_file send))

(setq edit_file (make-instance 'task))
(setf (slot-value edit_file 'name) "edit_file")
(setf (slot-value edit_file 'arguments) '(file))
(setf (slot-value edit_file 'comment)
      "To edit a file it is necessary to activate an editor, giving a name to that file, to edit a text, and to quit the editor.")
(setf (slot-value edit_file 'body) '(:seq proc1 ed_chars quit_editor))

The KRS can be accessed either by the IF-THEN rules, or by the strategies, or by both; the search can return a specific piece of knowledge or a complete task structure. The first type of access, largely employed by the strategies (see [Ferrari et al. 1991c]), is necessary when particular pieces of the KRS (a slot value, an ancestor, etc.) are required at some point in the generation of the explanation. The second, instead, focuses on the knowledge substructure currently being dealt with.

5. Searching through the KRS

5.1. Returning a Task Structure

When the DM makes a query to the KRS about a task, the KRS returns a task structure, which is a tree where the parent node is derived from the main verb of the logical form of the input, and the son nodes are recursively built from the task description given in the previous section. The DM stores this tree in the form of a list (the "current task list"). Fig. 4 gives an example of a task structure corresponding to the task "edit_file".


edit_file
  SEQ
    proc_1
      SEQ
        digit("emacs")
        digit(file_name)
    ed_chars
    quit_editor
      ALT-ALT
        save&quit
          SEQ
            digit(^X)
            digit(^F)
        quit
          digit(^C)

fig. 4: A task structure for "edit_file"
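The tree of fig. 4 could be linearized into the "current task list" by a simple depth-first walk. This is an illustrative Python sketch (not the IDEAL code); the node names follow the figure, and the chosen list representation is an assumption.

```python
# Illustrative depth-first linearization of the "edit_file" task
# structure of fig. 4 into a current task list (not the IDEAL code).

edit_file = ("edit_file", ":seq",
             [("proc_1", ":seq", ['digit("emacs")', 'digit(file_name)']),
              ("ed_chars", None, []),
              ("quit_editor", ":alt-alt",
               [("save&quit", ":seq", ['digit(^X)', 'digit(^F)']),
                ("quit", None, ['digit(^C)'])])])

def flatten(node):
    """Return the list of topic names, each parent before its children."""
    if isinstance(node, str):            # a leaf action
        return [node]
    name, _connective, children = node
    result = [name]
    for child in children:
        result += flatten(child)
    return result

print(flatten(edit_file))
# ['edit_file', 'proc_1', 'digit("emacs")', 'digit(file_name)',
#  'ed_chars', 'quit_editor', 'save&quit', 'digit(^X)', 'digit(^F)',
#  'quit', 'digit(^C)']
```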

To complete the task edit_file, a sequence of actions/procedures must be performed: proc_1 (which corresponds to the opening of a file), ed_chars, and quit_editor. To perform proc_1, it is enough to perform the procedure composed of the two actions digit("emacs") and digit(file_name). Finally, to perform quit_editor, there are two alternatives: quitting the editor saving what has been written, by the sequence composed of ^X and ^F, or quitting the editor without saving what has been written, by the action digit(^C); both alternatives represent a way to quit the editor, but they imply different consequences (modifying or not the file in a permanent manner); thus they are connected by :alt-alt.

The current task list represents the list of things to be explained and is used by the Dialogue Manager to drive the explanation of a particular topic, to keep track of the history of the explanation, and to check whether a topic introduced by the user belongs to the current task structure.

5.2. Returning a Frame Structure

When the Dialogue Manager makes a query to the KRS about an object, the rule of fig. 3 is applied, and the KRS returns a frame structure built in this way:

1. if the object is a class, then a structure is returned consisting of a list of the slots of that class, its superclass (if any), and a list of the objects belonging to that class which are present in the KB (if any). Fig. 5 is an example of such a structure relative to the object "editor":


class:       editor
slots:       name
             operative_system
superclass:  tool
instances:   ed1, ed2, ed3

fig. 5: the frame structure returned after a question about "editor"

2. if the object is an instance of a class, then the list of the values of its slots is returned (if any), together with the name of the class it belongs to.

5.3 Specific Knowledge

In some cases, specific pieces of knowledge are to be singled out to perform some test or to build some answer. This is, especially, the case of the strategies, which may need access to single slots in a frame. This process of searching in a limited domain is dealt with by the forms, as described in [Ferrari et al. 1991c].

5.4 The Student model

The Knowledge Representation System also includes a Student Model. Adopting a very simplistic view, we assume that it consists of the facts that have been explained up to the current step. This assumption has the great advantage that the student model is perfectly monotonic, unless some explicit request for clarification clearly shows that some piece of knowledge has not been acquired. Thus at the end of each step of the explanation the system performs a check on the student's level of understanding (for a complete account of the way the system manages such checks see [Ferrari et al. 1991a]). The student's affirmative answers increase the Student Model inside the KRS. Negative answers, instead, leave it as it stands, but require additional explanations. At the end of a :seq structure a global check (i.e. a check relative to the immediately higher node in the task structure) is performed, with the same effects.
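Under this simple view, the monotonic update could be sketched as follows. This is an illustrative Python sketch, not the IDEAL implementation; representing the model as a set of fact names is an assumption.

```python
# Illustrative sketch of the monotonic student model: affirmative
# answers to check questions add the explained facts; negative answers
# leave the model unchanged (but would trigger further explanation).

def check(student_model, explained_facts, answer_is_affirmative):
    if answer_is_affirmative:
        return student_model | set(explained_facts)   # grow monotonically
    return student_model                              # unchanged

model = set()
model = check(model, {"open_file", "ed_chars"}, True)
model = check(model, {"quit_editor"}, False)   # needs re-explaining
print(sorted(model))     # ['ed_chars', 'open_file']
```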

Conclusion


This report presents a computational interpretation of the Task Knowledge Structures as defined in [Johnson & Johnson 1990]. It is not meant to provide a complete modelling of those structures, but it presents the main steps of a general methodology for mapping the results of a task analysis into a computational representation language. In this sense, the choice of CLOS, although strongly motivated, is not to be seen as constraining.

References

[Ferrari et al. 1991a] Ferrari, G., Prodanof, I., Carenini, M., Moreschini, P.: The Role of the Dialogue Manager in an ITS, Technical Report ATR3, Dept. of Linguistics, University of Pisa & Institute for Computational Linguistics, CNR, Pisa.

[Ferrari et al. 1991b] Ferrari, G., Prodanof, I., Carenini, M., Moreschini, P.: The General Theory of Discourse, Technical Report ATR2, Dept. of Linguistics, University of Pisa & Institute for Computational Linguistics, CNR, Pisa.

[Ferrari et al. 1991c] Ferrari, G., Prodanof, I., Carenini, M., Moreschini, P.: Plans and Strategies in Explanation Dialogues, Technical Report ATR4, Dept. of Linguistics, University of Pisa & Institute for Computational Linguistics, CNR, Pisa.

[Johnson & Johnson 1990] Johnson, H., Johnson, P.: Theoretical Knowledge Representations for Supporting Dialogues in Explanation and Learning, Deliverable No 6B1, Report No 1, ESPRIT BRA 3160.

[Steele 1990] Steele, G.L.: Common LISP: The Language, second edition, Digital Press, 1990.


ESPRIT Basic Research Action #3160 - “IDEAL” - ATR

General Overview of a Dialogue System's Functions in Explanation and Learning

Abstract

In this paper an overview of the functions necessary for a demonstrator to be developed within the frame of IDEAL is given. The main point such a demonstrator is expected to make clear is that an explanation system is a strongly knowledge-based system, where knowledge about the domain (task) to be taught and knowledge about the way it is to be taught strictly interact. The most complex technical difficulty is the interaction between the general teaching loop and the local question-answer cycle. In fact, a local loop may cause interruptions, revisions, or even cancellation of the general (dominating) loop.

1. Introduction

1.1. Explanation and Dialogue

In a simple view, interaction in a teaching context consists in the building of answers which are a function of the interpretation of the learner's questions [Gilbert 87]. Remarkable improvements have been produced, within the frame of this same project, by the introduction of a sophisticated classification of question types, as in [Sarantinos & Johnson 1991]. According to this approach, the identification of the question type is in itself the entry point of a path which directly leads to the right answer format, and selects the right task knowledge partition. In this model, it is of primary importance that the representation of the subject being explained (domain knowledge) be compatible with the classification of the question types.

Even this very sophisticated approach sticks to a single-loop view of the process of question answering, as any question-answer pair is looked at as self-concluded, bearing no relation to the discourse context. Thus, any question is interpreted and classified in itself, and the answer is constructed on the basis of nothing more than the question and the relevant knowledge partition. This way of processing is very similar to the question


answering systems developed in the early 70s [Woods et al. 1972, Plath 1976] as interrogation languages for databases. However, real explanation dialogues show a different structure, where questions and answers are small extra loops inserted in a general loop which represents the high-level teaching task. ATR #1 discusses in detail the structure of explanatory dialogue.

In this paradigm, interaction is to be dealt with as a dialogue, i.e., technically, as a communication process which can be treated by the theoretical and implementational tools which have been developed in other domains for computational dialogue modeling. A dialogue approach requires that interaction in explanation and learning be modeled as a knowledge-based process where knowledge about the domain, as well as knowledge about the teaching/explanation strategies, is used to carry on communication and build answers to the questions. Also, the ingredients of traditional approaches to dialogue modeling are to be reused and adapted to the new application.

1.2. Dialogue models

Specific aspects of computational dialogue modeling, such as focusing and reference resolution [Grosz 77, Sidner 83], inference of intentions [Allen 83, Allen & Litman 86, Pollack 86], and overanswering, have been studied since the early 80s. Studies on the syntactic/functional structure of some specific kinds of discourse or dialogue have also been carried out in the same period [Reichman 78, Polanyi & Scha 88]. However, no comprehensive dialogue model was presented until [Grosz & Sidner 86]; a general frame for dialogue has also been introduced in [Wachtel 86].
The attempt to use concepts from computational dialogue modeling in the context of explanation and learning shall, therefore, go through two steps: a deep and critical revision of dialogue modeling itself, in order to propose a unified view, which is, in fact, almost missing in the literature, and a parallel adaptation to the specific application. An attempt at building an original and organic theory of discourse and dialogue has been presented in ATR #2. In the following paragraphs, the main functions for the implementation of a prototype system carrying on dialogues in explanation and learning will be described.

2. General architecture for dialogue

2.1. Control structure

A didactic task can, in general, be divided into two main subtasks: teaching, and reacting to interruptions. During teaching the initiative lies with the teacher, even in those cases where an interactive style is used, with many questions from the teacher. Interruptions (questions) insert a break in the teaching subtask and pose two different problems: how to construct an answer, and whether and how to go back to the original teaching subtask.

As described in ATR #1, the teacher starts teaching a subject according to a chosen strategy; then, if an interruption comes, (s)he has to answer. The interruption may require either a simple answer, such as a confirmation, or the choice of a complex clarification strategy. After having provided the answer, however complex it may be, the teacher has to decide whether the interrupt is closed and the original strategy can be resumed, or whether it questioned important aspects of the previously explained subjects, so that some steps back, together with a change in teaching strategy, are necessary. Such a situation can be represented as follows:

[figure: the main teaching loop (teaching, start) and the answering loop (interrupt, answers), both mediated by the Dialogue Manager]

fig. 1: main loops in a didactic task

A dialogue manager is supposed to deal with two main interacting loops. The main teaching loop deals with the teaching subtask; it is carried on under the teacher's initiative, according to some teaching strategies, which can be considered a specific didactic skill. The answering loop deals with the student's questions or interruptions; it is a mixed-initiative event, as the focused subject is the one introduced by the student.

The two loops interact with one another, as both share the same knowledge base. They must also have in common the knowledge about the teaching strategies, as, in some cases, the activation of one of those strategies is necessary in order to answer some complex question. Finally, the student's interrupt may require the teacher to revise the main teaching loop, i.e. to change the teaching strategy.

The entry into the two loops can be either a start, i.e. the teacher takes his/her own initiative and carries on the teaching according to his/her own strategy, or an interruption, where the student takes the initiative, and an analysis of the question is necessary. Also the output can be either a teaching or an answering.

2.2. System's functions

The following diagram shows the main functions of the demonstrator.

[figure: the Dialogue Manager, with its two states wait (input from SAIL, after start) and active (output to the Generation pool), connected through the FOCUS to the IF-THEN rules, the Strategies, the Task Knowledge, and the student model]

fig. 2: System's Functions

The diagram tries to capture the two basic tutoring situations: when the system takes the initiative (start) and generates explanations, also creating spaces for questions and interruptions, and when the system is "listening" to the user's questions and interruptions (through the NL analyser SAIL). The Dialogue Manager has access to all the knowledge sources, i.e. the IF-THEN rules, which specify how to build an answer, the (teaching) strategies, and the task knowledge, which is a representation of the task being explained. The student model, which is, in a simple view, the record of what has already been taught and is assumed to have been learnt by the student, is accessible only through the focus space.

Direct access among the three main modules, IF-THEN rules, Strategies, and Task Knowledge, is also allowed, in order to enable the system to build short answers without going through the main loops. In the following paragraphs, the single functions will be described in more detail.


3. Modules

3.1. Dialogue Manager

The main function of the Dialogue Manager is to direct the flow of control; this is obtained by means of the following three major activities:

- It initiates (sub)dialogues. At the beginning of the explanation, the DM will call a start-strategy, to which the top domain concept (top node or top frame, e.g. e-mail) is associated. This will cause, after verbalization, the generation of sentences like

[1] I am your ITS. I will teach you how to use an e-mail system...

The same start-strategy will be recursively useful whenever a new partition of the domain knowledge is initiated, as in

[2] Now, I'll teach you the sending procedure....

- It asks verification questions, both before introducing a new subject, in order to assess the level of knowledge of the learner with respect to a partition of the task, as in

[3] Have you ever used e-mail before?

and during explanation, in order to assess the level of understanding of the learner, as in

[4] Right?......OK?

- It accepts (unexpected) interruptions at given points; interrupts consist, in general, in questions from the learner. They may occur at any point, and are, in general, allowed whenever the explanation reaches a significant point, as in

[5a] E.: Before sending a mail, you must edit the text of the message.
[5b] L.: How can I edit a file?

The DM may admit a break at given points in the explanation. Answers of the learner reply to check questions of the system, as in

[6a] E.: Before sending a mail, you must edit the text of the message.
[6b] Do you know how to edit a file?

These are not properly (unexpected) interrupts, but, unless they are plain confirmations, they may require an evaluation and imply a modification of the main strategy previously selected. These different situations are dealt with by the possibility for the Dialogue Manager to take one of the following two states:

- active, when the system is generating some kind of message and does not accept interruptions, i.e. does not listen to the learner;
- wait, when the system accepts interrupts and tries to interpret them by activating SAIL, the natural language analyser, and the reaction loop.

In the wait state, after the learner's utterance has been translated into a logical form (see ATR #3), control is given to a set of IF-THEN rules, which evaluate the input and indicate the appropriate reaction of the system. In the active state, the system activates strategies (as described in ATR #4) either for producing an explanation from the start point, or as a reaction called for by the IF-THEN rules.

A dialogue is started by the system, which, being in an active state, calls for a general explanation strategy associated with the top-level domain knowledge. The execution of such an explanation strategy will include some interruptions, by which the learner can ask questions or give positive/negative answers (if addressed by a check question). In this phase the system is in wait state, and an interpretation/reaction loop is activated by the system. The Dialogue Manager keeps track (focus) of

- the current knowledge fragment
- the current strategy
- the current state of knowledge of the student

and makes regular updates of them.
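The alternation between the two states might be sketched as follows. This is an illustrative Python sketch only; the real control flow is distributed across the modules described in this report, and the return values are placeholders.

```python
# Illustrative sketch of the Dialogue Manager's two states: in "active"
# it generates messages and ignores input; in "wait" it accepts an
# interrupt and hands it to the IF-THEN rules (here just echoed).

def dialogue_step(state, interrupt=None):
    if state == "active":
        # generate output, then open a space for questions
        return "wait", "explanation fragment"
    if state == "wait":
        if interrupt is None:
            return "active", None          # no question: resume teaching
        return "active", f"reaction to: {interrupt}"
    raise ValueError(state)

state, out = dialogue_step("active")
print(state, out)                          # wait explanation fragment
state, out = dialogue_step(state, "How can I edit a file?")
print(out)                                 # reaction to: How can I edit a file?
```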

3.2. Strategies

Explanation strategies are abstract schemata for the building of explanatory discourse, and may, ultimately, consist of sequences of communication acts directed to a specific communicative goal; when they are activated they are paired with a specific subject, i.e., in our case, a focused partition of the knowledge base. Strategies can accept programmed interrupts, which switch the Dialogue Manager to wait. They are activated by the Dialogue Manager, which is switched to active. ATR #4 gives a more detailed description of the strategies.

3.3. IF-THEN Rules

These are called by the Dialogue Manager (which goes to active) to define a response. The IF-part evaluates the input, as translated into a logical form, together with the current state of knowledge and explanation (focus). The THEN-part activates (i) communication strategies, which may consist of complex discourse structures as well as isolated answers, and (ii) queries to the knowledge base.

3.4. The knowledge sources

Task Knowledge and Student Model are static knowledge repositories. The Task Knowledge (see ATR #5) contains a description of the task being taught (in the present case e-mail), and is accessed by all the other functions whenever it is necessary to pick up some partition of it. The Student Model is, in the present restricted view, a subset of the Task Knowledge, namely the one which has just been taught and is supposed to be internalized (in attitudinal terms, believed) by the student.

During interaction, specific partitions of these knowledge repositories are focused. Computationally, the focus is a collection of pointers to the current state in the Task Knowledge and the Student Model, thus keeping track of what is being taught and what has been learnt. It is also relevant to keep a record of the teaching strategies currently activated, and of the current step, in order to be able to resume teaching as soon as interrupts have been dealt with. Finally, it is useful to keep memory of the very superficial aspects of the interaction, such as the stack of previous utterances, with the current utterance on top of it.

The elements introduced above combine into a single focus structure, which is discussed further in ATR #3, and provide the Dialogue Manager with the relevant information at any instant of the interaction. This complex focus structure is updated at any cycle of the Dialogue Manager, and is treated according to a stack policy, as described in detail in Report #2.
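Such a focus structure might look like the following. This is an illustrative Python sketch; the field names are assumptions based on the description above, not the actual IDEAL data structure.

```python
# Illustrative focus structure: pointers into the Task Knowledge and
# the Student Model, the active strategy with its current step, and a
# stack of utterances with the current one on top (not the IDEAL code).

focus = {
    "task_pointer": "send_mail",          # what is being taught
    "student_pointer": "open_file",       # what has been learnt so far
    "strategy": ("describe", 2),          # strategy name, current step
    "utterances": [],                     # stack, most recent on top
}

def push_utterance(focus, utterance):
    focus["utterances"].insert(0, utterance)

push_utterance(focus, "Before sending a mail, you must edit the text.")
push_utterance(focus, "How can I edit a file?")
print(focus["utterances"][0])    # the current utterance is on top
```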

3.5. The Generation Pool

All the messages coming from the Dialogue Manager, as well as from the other functions in the model, are to be translated into Natural Language utterances. However, generation is not in the focus of this project, and no specific algorithm for this purpose will be presented. Thus, the Generation Pool box simply corresponds to the theoretical assumption that all the messages coming from all the different functions of the model will be collapsed together in a pre-linguistic form (the Generation Pool), and a generation algorithm will take the burden of producing natural language expressions. This function is not a specific part of the present project.

4. Sample loop
4.1. Example steps
The most obvious loop representable in this model is described below, according to the following steps:
• step #1: The Dialogue Manager is active, with the top node (frame) e-mail as argument. An introduction like

[7]

I am your ITS system. I will teach you e-mail.

is generated.
• step #2: A teaching strategy is selected. According to the result of such a selection, the following sentences can be generated:

[8a] Do you know what e-mail is? | Have you ever used e-mail before?

corresponding to an "assess knowledge" strategy, and

[8b] To send an e-mail it is necessary to perform the following steps...

corresponding to a simple description strategy. If the answer to [8a] is negative, [8b] can be taken; alternatively, [8b] can be taken without any previous assessment of the student's level of knowledge. If, instead, the answer to [8a] is positive, the Task Knowledge must be consulted, in order to make hypotheses about which subtask of the main task is unknown to the student.
• step #3: After a specific subject, together with a teaching strategy, has been selected, fragments like

[9] In order to send an e-mail you must first edit a file and write your message in, then you must identify the address you want the e-mail to be sent to, and, finally, perform the right procedures to eventually send the mail.

can be generated. This step partially overlaps with step #2 (examples [8a] and [8b]).
• step #4: At this point different possibilities can occur.
a) The student is perfectly tuned to the teacher's strategy, and does not produce any significant message. The student keeps as close as possible to the teacher's strategy, and does not interject any interruption. In this case, the explanation continues.
b) The student does not understand some aspect of the task being taught. Thus he will ask questions like

[10a] What exactly is e-mail?
[10b] What do you mean by "editing a file"?

These questions come as real interruptions, as they require that the question be understood (translated into a special logical form), evaluated, i.e. compared to the current discourse subject, and, finally, that a response be calculated. Questions can refer either to some already explained piece of knowledge, or anticipate some notion which has not yet been taught. In the first case, a step back to the unclear subject is necessary, together with the choice of a new teaching strategy, different from the one previously activated in association with that subject matter. In the second case, the system can choose between resuming the initial strategy, with a sentence like

[11] I will come to this soon

and providing the explanation immediately.
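The step #4 decision just described can be sketched as a small Python function. This is an illustration only, assuming a flat set of already-taught topics and two strategy names; none of the identifiers come from the IDEAL code.

```python
# Hedged sketch of the step-#4 reaction to a student question:
# an already-explained topic triggers a step back with a *different*
# teaching strategy; a not-yet-taught topic is deferred ("I will come
# to this soon") or explained at once. All names are assumptions.
def react_to_question(topic, taught, current_strategy):
    if topic in taught:
        # Already explained but unclear: step back, switching strategy.
        new_strategy = "example" if current_strategy == "description" else "description"
        return ("step_back", topic, new_strategy)
    # Not yet taught: defer and resume the initial strategy.
    return ("defer", topic, current_strategy)

# Usage: "edit-file" was already taught by description, so it is
# re-explained with a different strategy; "send-address" is deferred.
taught = {"e-mail", "edit-file"}
assert react_to_question("edit-file", taught, "description") == \
       ("step_back", "edit-file", "example")
assert react_to_question("send-address", taught, "description")[0] == "defer"
```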

4.2. A specification of the Dialogue Manager
The following is a specification of the interacting loops in the Dialogue Manager:

START
    Set State = ACTIVE
    Select_Starting_Sentence
    Select_General_Strategy
    Execute_Discourse_Complex_Strategy
    Close_Session
END

After having initialized the system with some starting sentence, and having selected a teaching strategy, the actual core of the system is the execution of a complex strategy of discourse.

Execute_Discourse_Complex_Strategy
UNTIL <end of strategy>
    (1) Select_next_state
    (2) Select_arc
        arc_type = PUSH:
            Store_pointers
            Execute_Discourse_Complex_Strategy
        arc_type = POP:
            Pop_the_stack
            Resume_pointers
        arc_type = CHECK:
            Select CHECK question/action
            Set State = WAIT
            Evaluate
            Set State = ACTIVE
            GO (1)
        arc_type = GENERATE:
            Execute_Discourse_Simple_Strategy
            Set State = WAIT
            Count_time-out
            Evaluate
            Set State = ACTIVE
            GO (1)
        <end of level>:
            Set State = WAIT
            CR
            Evaluate
            Set State = ACTIVE
            GO (1)
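The recursive arc-dispatch loop above can be rendered as a hedged Python sketch. A strategy is represented here as a list of (arc_type, payload) pairs, and sub-strategies live in a dictionary; the representation and every concrete name are assumptions introduced for illustration, not part of the project code.

```python
# Hedged Python rendering of Execute_Discourse_Complex_Strategy:
# PUSH stores pointers and recurses into a sub-strategy, POP resumes
# the previous computation, CHECK waits for a user answer and evaluates
# it, GENERATE emits an explanation fragment. Names are illustrative.
def execute_complex_strategy(strategy, strategies, pointers, answer=lambda q: "yes"):
    trace = []
    for arc_type, payload in strategy:
        if arc_type == "PUSH":
            pointers.append(payload)                 # Store_pointers
            trace += execute_complex_strategy(
                strategies[payload], strategies, pointers, answer)
        elif arc_type == "POP":
            pointers.pop()                           # Resume_pointers
        elif arc_type == "CHECK":
            trace.append(("ask", payload))           # State = WAIT ...
            trace.append(("heard", answer(payload))) # ... then Evaluate
        elif arc_type == "GENERATE":
            trace.append(("say", payload))           # simple strategy
    return trace

# Usage: a main strategy that checks prior knowledge, describes the task,
# then pushes into a sub-strategy for "edit-file".
strategies = {"edit-file": [("GENERATE", "open the editor"), ("POP", None)]}
main = [("CHECK", "Do you know what e-mail is?"),
        ("GENERATE", "To send an e-mail..."),
        ("PUSH", "edit-file")]
trace = execute_complex_strategy(main, strategies, [])
assert trace == [("ask", "Do you know what e-mail is?"), ("heard", "yes"),
                 ("say", "To send an e-mail..."), ("say", "open the editor")]
```

As in the specification, arc selection here is deterministic: the explainer (the strategy author) fixes the order of arcs, rather than the machine choosing non-deterministically.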

For every state in a strategy, for a selected arc starting from that state:
- if it is a PUSH, store all the data structures of the current computation, and call this function recursively, with the new complex strategy name as parameter;
- if it is a POP, the currently stored data structures are popped, and a previous computation is resumed;
- if it is a CHECK, the system shall hear the user (state = WAIT), and evaluate the answer (yes or no);
- if it is a GENERATE, call Execute_Discourse_Simple_Strategy;
- at the end of any level of the strategy, an interruption is enabled, evaluated if proposed by the explainee, and disabled.
Arcs starting from a state are not followed according to a non-deterministic policy; rather, the selection of arcs is committed to the choice of the explainer.

Eval
This function builds an actual fragment of explanation, after having performed tests and actions.

    Interpret_input (SAIL) OR NIL
    Match IF-part
    Execute THEN-part
    Update_focus
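The four Eval steps above (interpret the input, match an IF-part, execute its THEN-part, update the focus) can be sketched in Python. The rule encoding as (if_part, then_part) callables and the confirmation set are assumptions made for this illustration.

```python
# Hedged sketch of Eval: a confirmation needs no reaction; otherwise the
# first IF-THEN rule whose IF-part matches the interpreted input fires,
# and the focus structure is updated. Rule format is an assumption.
CONFIRMATIONS = {"yes", "ok", "I see"}

def eval_input(utterance, rules, focus):
    if utterance in CONFIRMATIONS:
        return None                          # implicit/explicit confirmation
    for if_part, then_part in rules:         # Match IF-part
        if if_part(utterance, focus):
            response = then_part(utterance, focus)   # Execute THEN-part
            focus["history"].append(utterance)       # Update_focus
            return response
    return "fallback"

# Usage: one toy rule reacting to questions that mention "e-mail".
rules = [(lambda u, f: "e-mail" in u, lambda u, f: "explain e-mail")]
focus = {"history": []}
assert eval_input("yes", rules, focus) is None
assert eval_input("What is e-mail?", rules, focus) == "explain e-mail"
assert focus["history"] == ["What is e-mail?"]
```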

After having interpreted the input message, if it is an (implicit or explicit) confirmation, no reaction is necessary. If it isn't, the relevant knowledge is to be selected by means of the IF-THEN rules, an answer shall be built according to the indications of the THEN-part of the rule, and the focus structure is updated.

Select_Discourse_Complex_Strategy
    Execute_Discourse_Complex_Strategy
    simple_loop

If the THEN-part of the rule invokes a complex strategy, then a Complex_Strategy is selected and executed. Strategy selection is, among other things, a function of the type and complexity of the knowledge currently in focus. Complexity of the knowledge is not an arbitrary notion: it depends on whether a single concept or property, or a list of alternative elements, is involved. At the end of the strategy, the space for interruptions is guaranteed within the Execute_Discourse_Complex_Strategy function. If the retrieved knowledge is less complex, a simple_loop is executed.
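The complexity criterion just stated can be made concrete with a small sketch. The encoding of focused knowledge as either a single value or a list of alternatives is an assumption introduced here for illustration.

```python
# Hedged illustration of the selection criterion above: a single concept
# or property triggers a simple loop; a list of alternative elements
# triggers a complex discourse strategy. Encoding is an assumption.
def select_strategy(knowledge_in_focus):
    if isinstance(knowledge_in_focus, list) and len(knowledge_in_focus) > 1:
        return "Execute_Discourse_Complex_Strategy"
    return "simple_loop"

assert select_strategy("edit-file") == "simple_loop"
assert select_strategy(["local mail", "remote mail"]) == "Execute_Discourse_Complex_Strategy"
```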

5. Conclusions
In this report, a rough specification of a dialogue model has been presented. The most relevant feature of this model is the possibility of activating an explanation loop as a reaction both to (predefined) interruptions and to questions solicited by an explicit or implicit CHECK question. The interactions between complex strategies, simple strategies or loops, and the main loop are quite complex; they are handled by calling strategies from the main loop and by providing different points where control is given back to a general strategy. This complex situation, although computationally unclear, tries to capture the relation between planned teaching and the different contributions of interruptions.
