Knowledge-Based Dialogue Management

Bernd Ludwig

Bavarian Research Centre for Knowledge-based Systems (FORWISS)
Am Weichselgarten 7, D-91058 Erlangen-Tennenlohe
[email protected]

Abstract

This paper reports on our research to design and implement a spoken language dialogue system (SLDS) that can be configured easily and rapidly for different applications. In contrast to finite-state models for information dialogues, we outline a knowledge-based approach capable of handling dialogues of greater complexity than those following a finite-state argumentation structure. We show how knowledge about the application domain, the dialogue participants, syntax, and semantics can be represented in a unified framework based on Description Logics (DL) as the modelling formalism. On the basis of the knowledge in the domain model, inferences are performed in a partial first-order logic for the interpretation of user utterances and the planning of system responses.

1 Introduction

Existing finite-state systems for designing dialogue applications rely heavily on phrase and topic spotting techniques for generating semantic representations of utterances. Normally, edges in a state transition diagram are annotated with several variations of phrases the user could utter in order to reach one of the successors of the current state. The processing of the speech recognizer output, and therefore the whole natural language understanding part of the dialogue system, is thus reduced essentially to pattern matching that extracts values for parameter slots to be filled. These values are used for querying the underlying data base to retrieve the desired information. However, work in computational semantics such as [BBKdN99] or [Ram97] has shown the importance of inference tasks for constructing a semantic representation of an utterance, formalizing its meaning, and integrating the utterance into the whole dialogue context. When relying on inference, a dialogue system needs an appropriate knowledge base of linguistic and pragmatic information about the application domain. Normally, building it is an enormous knowledge engineering task, as much so-called common-sense knowledge has to be integrated into the domain model. The complexity of this task has often prevented prototype dialogue systems from being usable in real-world scenarios or as generic systems to be configured for different applications.

In this paper we outline our approach to the knowledge representation task for our dialogue system, which, in contrast to other implementations, employs a decidable subset of first-order logic, namely Description Logics (DL), as a uniform representation (and inference) language (for an introduction see [Don96]). To get the complexity of developing the knowledge base under control, we distinguish between application-independent linguistic and common-sense knowledge on the one hand and application-specific knowledge about the domain on the other. While the latter part of the knowledge base has to be configured for each application, the former is reusable and can be extended with application-specific knowledge. We show how DL defines the semantics of Discourse Representation Structures (DRS) with respect to a given domain model represented in DL (for an introduction to DRT see [KamRey93]). Therefore, we can use the well analyzed and efficient inference procedures for DL to reason about the content of utterances. In order to devise a theory of mixed-initiative dialogue, we extend DL to the appropriate[1] subset of the partial logic FIL ([Abd95]). In this logic, we can draw inferences whose results hold only under certain justifications. This means that the results of inferences may depend on conditions verified later on the basis of an eventually extended knowledge base. In the context of dialogue processing, we interpret these justifications as information to be given additionally by the user when his utterance was incomplete w.r.t. the relevant concept description. We show how the justifications of a concept serve to check the consistency between the system's private knowledge and its assumptions about the user's knowledge extracted from the DRS of the current dialogue.
We advocate incorporating an utterance into the dialogue by first considering how it contributes to the current focus, or whether it establishes a new one. This approach is based on [Tra94] and increases the robustness of the system against speech recognition errors and spoken language phenomena by viewing them as an essential part of a dialogue and not as problems beyond the implemented theory of dialogue understanding. Only in this way can one cope with "dialogue control" phenomena (see [Bun97]) that are (not exclusively) common to spoken language discourse. A grounded focus is incorporated into the dialogue by analyzing its contribution to the communicative plan executed jointly by the dialogue participants. Such contributions are characterized by the set of speech acts permitted in the discourse part of the domain model. Speech act moves are motivated by the change of the information state of the dialogue participants. This change is determined by analyzing the coherence of the current focus with the previous ones. Depending on the computed coherence relation, different hypotheses for speech acts are assigned to the focus. Disambiguation of the hypotheses is performed by deciding which conversational game ([Kyo91]) a speech act belongs to as a single move. As speech acts are always recorded in the DRS for the current dialogue, this approach allows a precise meaning to be given to modal and performative verbs as well as function words within the given framework for semantics.

[1] That is, as in DL, predicates have to be of arity one or two.

2 Modelling Domains in Description Logics

2.1 Linguistic Knowledge

Representing linguistic knowledge is a central issue for any natural language understanding system. In our approach, the parser produces Discourse Representation Structures as the semantic representation of the parsed word lattice. A DRS is closely related to an A-box statement, as their meaning is equal. To demonstrate this in a very simple case, consider the following example:

    [ f | flight(f) ]  =  ∃f : flight(f)

As discussed in [LudGoNie99], a consequence of this equality is that we may assign a concept description to the syntactic head of any DRS. In doing so, we can incorporate pragmatic reasoning into parsing: to every fragment of the word lattice given to the parser that obeys the used grammar, we assign the DRS constructed for it. For the fragment to be a valid result, we require the concept description of the DRS's syntactic head to be satisfiable within the domain model. The advantage of this approach over the traditional serial order of syntactic, semantic, and finally pragmatic processing is that it avoids (at least partially) the combinatorial explosion of ambiguities that arises when context is taken into account only in late processing steps.

The linguistic part of the domain model thus contains formal concept descriptions for notions of the application domain that are expressed in natural language. Starting with primitive concepts for lexical categories and elementary notions, as well as with (primitive) roles (i.e. binary relations) for syntactic dependencies of phrases, the domain model defines concepts for the meaning of phrases which are common to different domains and always have the same structure, like temporal and spatial expressions. For example:

    Phrase          Meaning                       DL description
    from Location   is starting point of a        starting-point ⊓ ∀has-start:Location
                    movement
    to Location     is direction of a movement    direction ⊓ ∀has-dir:Location

In addition to such domain-independent definitions, the domain model contains application-specific concepts. For example, based on the primitive concept flight, one can define the concept FlightConnection by stating:

    Meaning                            DL description
    FlightConnection is a flight       flight ⊓ ∀has-dep:(starting-point ⊓
    (and therefore a movement)                    ∀has-start:airport) ⊓
    with a flight starting point       ∀has-dest:(direction ⊓
    and a flight end point                        ∀has-dir:airport)

The central idea of our approach is to have a domain-independent so-called upper model to which one can add concepts for application-specific notions. Besides testing satisfiability during parsing, this domain model is used for all inference processes in the dialogue system. The instantiation of concepts is described in [LudGoNie99], while the present paper discusses how the knowledge represented in the domain model is used to interpret DRS in the context of a given dialogue (situation).
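The kind of concept membership check that such satisfiability testing relies on can be sketched in a few lines. The following is an illustrative toy model only, not the authors' implementation: concept descriptions are conjunctions of atomic concepts and universal role restrictions, checked against an A-box of concept and role assertions. All individual names and data structures here are hypothetical.

```python
# Toy model of DL concept descriptions (Atom, And, All) and an instance
# check against an A-box. Not the authors' system; names are illustrative.
from dataclasses import dataclass
from typing import List

@dataclass
class Atom:              # a named (primitive) concept, e.g. flight
    name: str

@dataclass
class All:               # universal role restriction, e.g. ∀has-dep:C
    role: str
    concept: object

@dataclass
class And:               # concept conjunction (⊓)
    parts: List[object]

# A-box: concept assertions (types) and role assertions (roles)
types = {
    "f1": {"flight"}, "p1": {"starting-point"}, "q1": {"direction"},
    "muc": {"airport"}, "ath": {"airport"},
}
roles = {
    ("f1", "has-dep"):  ["p1"], ("p1", "has-start"): ["muc"],
    ("f1", "has-dest"): ["q1"], ("q1", "has-dir"):   ["ath"],
}

def instance_of(ind, concept):
    """Check whether individual `ind` satisfies `concept` w.r.t. the A-box.
    A universal restriction is vacuously satisfied if `ind` has no fillers
    for the role (open-world subtleties are ignored in this sketch)."""
    if isinstance(concept, Atom):
        return concept.name in types.get(ind, set())
    if isinstance(concept, And):
        return all(instance_of(ind, c) for c in concept.parts)
    if isinstance(concept, All):
        return all(instance_of(f, concept.concept)
                   for f in roles.get((ind, concept.role), []))
    raise TypeError(concept)

# FlightConnection as defined in the table above
flight_connection = And([
    Atom("flight"),
    All("has-dep",  And([Atom("starting-point"), All("has-start", Atom("airport"))])),
    All("has-dest", And([Atom("direction"),      All("has-dir",  Atom("airport"))])),
])

print(instance_of("f1", flight_connection))  # True
```

A real DL reasoner additionally handles satisfiability and subsumption under open-world semantics; this sketch only shows the shape of the concept language.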

2.2 Pragmatic Knowledge

For every dialogue application, there exists a problem solver that represents the current state of affairs in the domain, updates this state according to the content of the dialogue and the domain constraints, and reports about the current state. Consequently, the problem solver offers a set of operations for referring to the current state. The content of the operations is interpreted according to its semantics as defined in the domain. In the case of data bases, e.g. for the flight information application, we have a very limited problem solver whose operationality is restricted to performing queries on the content of the data base. Data base entries are individuals of certain concepts, and relations between them describe their meaning, as in the following example:

    Flight   Departure   Destination   Time
    LT204    Rome        Athens        06:15
    OA178    Rome        Athens        10:05

Each column contains entries of a particular concept, while each row defines an individual of the following concept that describes the semantics of data base entries:

    FlightObject = ∀flight-number:flight-num ⊓ ∀dep-airport:airport ⊓
                   ∀dest-airport:airport ⊓ ∀dep-date:date ⊓ ∀dep-time:time

Extensions of this concept could be:

    flight-number(f1, LT204)        flight-number(f2, OA178)
    dep-airport(f1, Rome)           dep-airport(f2, Rome)
    dest-airport(f1, Athens)        dest-airport(f2, Athens)
    dep-date(f1, 1999.09.15)        dep-date(f2, 1999.09.15)
    dep-time(f1, 06:15)             dep-time(f2, 10:05)
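The translation from table rows to A-box role assertions can be sketched as follows. The column-to-role mapping and the individual names (f1, f2, ...) are assumptions made for this illustration, not part of the paper's formalism.

```python
# Hedged sketch: turn data base rows into A-box role assertions of the
# FlightObject concept described above. Mapping and names are illustrative.
ROLE_FOR_COLUMN = {
    "Flight": "flight-number", "Departure": "dep-airport",
    "Destination": "dest-airport", "Date": "dep-date", "Time": "dep-time",
}

rows = [
    {"Flight": "LT204", "Departure": "Rome", "Destination": "Athens",
     "Date": "1999.09.15", "Time": "06:15"},
    {"Flight": "OA178", "Departure": "Rome", "Destination": "Athens",
     "Date": "1999.09.15", "Time": "10:05"},
]

def abox_assertions(rows):
    """Each row becomes an individual f1, f2, ... with one role assertion
    per column, e.g. dep-airport(f1, Rome)."""
    assertions = []
    for i, row in enumerate(rows, start=1):
        ind = f"f{i}"
        for column, value in row.items():
            assertions.append((ROLE_FOR_COLUMN[column], ind, value))
    return assertions

for role, ind, value in abox_assertions(rows):
    print(f"{role}({ind}, {value})")
```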

2.3 Linking Pragmatic and Discourse Knowledge

When the content of a DRS has to be interpreted, it has to be related to the state of the problem solver. In other words, the describing situation (the content of the DRS) must be consistent with the described situation (the state of the problem solver). In our approach, we add axioms (which we call justifications, for reasons that will become clear immediately) to the domain model which force the required consistency of linguistic information (from the content of the DRS) and pragmatic information (from the problem solver):

    ∃flightdep:airport := ∃dep-airport:airport ⊓ ∃has-dep:(∃has-start:airport)

The meaning of the above definition should become clear by looking at its extension:

    dep-airport(x, d) ∧ airport(d) ∧ has-dep(x, y) ∧ has-start(y, d) → flightdep(x, d)

As can be seen from the definition, x can have d as departure only if the appropriate linguistic and pragmatic relations hold, i.e. if the information is contained in the data base as well as in the DRS of the dialogue. Normally, the information contained in the dialogue is incomplete with respect to the relevant concept descriptions of the pragmatic knowledge. Therefore, a mechanism is required that determines the missing information that still has to be acquired during the dialogue. For this purpose we use the resolution proof method of the partial first-order logic FIL (see [Abd95]), restricted to predicates of arity 1 or 2. For reasoning about the content of a DRS, we combine corresponding concepts from linguistic and pragmatic knowledge in the following way: an instance of a pragmatic concept is an instance of a linguistic concept as well if the justifications for the concept hold for the particular instance. For example, a flightobject is a flightconnection if the justifications flightdep, flightdest, flightdepdate, and flightdeptime hold. As discussed in [LudGoNie99], we denote this using ionic formulae of FIL:

    flightobject(x) ∧ { flightdep(x, s),
                        flightdest(x, e),
                        flightdepdate(x, d),
                        flightdeptime(x, t) }  →  flightconnection(x)

where the braced formulae are the justifications under which the conclusion holds. As shown above, the justifications build the bridge from linguistic to pragmatic knowledge. How they are used to describe a model of mixed-initiative dialogue is explained in Section 3.4.
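The three-valued evaluation of a single justification can be sketched as follows. A justification is proven if the linguistic facts (from the DRS) agree with the pragmatic facts (from the data base), refuted if they contradict, and undefined if the dialogue simply does not mention it yet. The fact representation below is a deliberate simplification assumed for this sketch.

```python
# Illustrative three-valued evaluation of a justification such as
#   dep-airport(x,d) ∧ airport(d) ∧ has-dep(x,y) ∧ has-start(y,d) → flightdep(x,d)
# True / False / None (undefined). Not the FIL resolution method itself.
def eval_justification(pragmatic_value, drs_value):
    """pragmatic_value: value recorded in the data base (e.g. 'Rome').
    drs_value: value extracted from the dialogue DRS, or None if the user
    has not given this piece of information yet."""
    if drs_value is None:
        return None                      # undefined: to be acquired in dialogue
    return drs_value == pragmatic_value  # proven, or refuted

# flightdep for a candidate flight departing from Rome:
print(eval_justification("Rome", None))      # None  -> still to be acquired
print(eval_justification("Rome", "Rome"))    # True  -> justification holds
print(eval_justification("Rome", "Munich"))  # False -> candidate is no solution
```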

3 Multi-Level Interpretation of Dialogues

In what follows we discuss how the knowledge in the domain model is used to incorporate utterances into the dialogue and to plan the next dialogue steps of the system.

3.1 An Example Dialogue

In this section, we present a dialogue from a flight information application to show the various types of contributions of each utterance and to discuss how the dialogue could be processed within the outlined framework.

(S1) System: Good evening. This is your flight information system. How can I help you?
(U2) User: I want a flight to Athens on Sept. 15.
(S3) System: Where do you want to depart?
(U4) User: From Munich.
(S5) System: There is an Olympic Airways flight at 10:05 and an LTU flight at 6:15.
(U6) User: At what time?
(S7) System: Olympic Airways at 10:05 and LTU at 6:15.
(U8) User: Okay. Thanks. Bye.

3.2 Parsing

Our approach to parsing the output of the speech recognizer has been discussed at length elsewhere (see [LudGoNie99]). Essentially, we use a standard chart parser that creates Discourse Representation Structures in a modified version of DRT. For the discussion in this paper, we only consider the output produced by the parser.

3.3 Grounding

Normally, semantic theories of discourse like DRT, as well as implementations of dialogue systems for (spoken) language, assume that each parsed fragment of the user's input is intended to be incorporated globally into the dialogue. But [Tra94] and [TraHin91] have shown that utterances have to be integrated into a local dialogue structure[3] where they contribute to the construction of the current focus (or dialogue goal). Taking this analysis into account, many wrong interpretations of the intended effect of an utterance can be avoided, as it now becomes possible to distinguish correctly between local and global contributions and their different effects on how they update the content of the dialogue. [Tra94] discusses a set of (linguistic) operations describing how an utterance contributes to the content of the current dialogue goal (initiate, continue, cancel, repair, ack, reqrepair, reqack). These speech acts are analyzed from the viewpoint of their operationality in a transition model as well as how they are linked to epistemic reasoning of the dialogue participants. In our approach, we implement this reasoning by inferring, on the basis of the domain model, the coherence relation of the current utterance with the current dialogue goal or its justifications. For example, the user goal stated in U2 is not answered by S3; however, S3 coheres with the flightdep justification of the concept flightconnection that is entailed in U2: an answer to S3 would answer the justification, too. Therefore, S3 is an acknowledgement of U2 and initiates a new (sub)goal. U4 contains information that answers the query in S3. So, again, this utterance is an acknowledgement of goal S3 and initiates a new focus. S5 acknowledges U4 because it answers U2, which is the super-goal of U4. U6, in contrast to the previous utterances, does not acknowledge S5, because it asks for information already expressed in S5. Consequently, it has to be interpreted as a request for repair. By repeating the information of S5, S7 performs the requested repair. U8 acknowledges this explicitly, so that the system may now assume the information about the connection to be understood.
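The grounding decision described above can be sketched as a small rule-based classifier. The data structures and the reduced rule set are simplifications assumed for this illustration; the speech act labels are taken from [Tra94].

```python
# Hypothetical sketch of the grounding step: classify an utterance by its
# coherence with the current focus. Slot-set encoding is an assumption.
def ground(utterance, focus):
    """utterance: {'provides': set of slots, 'requests': set of slots}.
    focus: {'open': slots still required, 'given': slots already grounded}.
    Returns the type of contribution to the current dialogue goal."""
    if utterance["requests"] & focus["given"]:
        return "reqrepair"   # asks for information already expressed
    if utterance["provides"] & focus["open"]:
        return "ack"         # answers an open (sub)goal, initiates a new focus
    return "initiate"        # opens a new dialogue goal

# U4 answers the open departure query of S3:
focus_after_S3 = {"open": {"dep-airport"}, "given": {"dest-airport", "dep-date"}}
print(ground({"provides": {"dep-airport"}, "requests": set()}, focus_after_S3))  # ack

# U6 asks for a departure time already expressed in S5:
focus_after_S5 = {"open": set(),
                  "given": {"dep-airport", "dest-airport", "dep-date", "dep-time"}}
print(ground({"provides": set(), "requests": {"dep-time"}}, focus_after_S5))  # reqrepair
```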

3.4 Reasoning in the Application Domain with Partial Information

Any knowledge about the application is private to the dialogue manager, as are its assumptions about the user's knowledge, which are expressed in the DRS of the past utterances. Private knowledge about the flight information application may be expressed in terms of the relations that hold in the data base. In order to plan the next dialogue step(s) as a reaction to the currently grounded dialogue goal (as discussed in the section above), and thereby to update the content of the dialogue by adding new information[4] to it, the system has to perform epistemic reasoning within its private knowledge to achieve the following:

1. To check whether the information contained in the current focus is consistent with the knowledge about the application. In the case of information retrieval from data bases, it has to be verified whether there exist data base entries which are compatible with the content of the user utterances. E.g., if the user required information about flights from Rome to Athens in the evening and there was none in the data base, inconsistency would have to be inferred and the system would have to react appropriately.

2. To compute the set of solutions compatible with the user's request, if there are any.

[3] [Tra94] and [TraHin91] call this local structure a Discourse Unit (DU).

Both aims can be reached by trying to prove the justifications of the current dialogue goal, as the justifications of a concept force linguistic and pragmatic relations to hold in a consistent manner for a justification to be acceptable:

    dep-airport(x, d) ∧ has-dep(x, y) ∧ has-start(y, d) → flightdep(x, d)

The meaning of this implication was discussed in Section 2.3. As a justification of a linguistic concept, it is used in the following way to handle partial information. If flightdep(x, d) is proven to be false from the DRS of the current dialogue, then the pair (x, d) is no solution to the user's initial query. If, on the other hand,

flightdep(x, d) is not defined in the user knowledge, the system will create a new dialogue goal to ask for the missing information and decide later whether (x, d) can be an answer to the user's query. For example, given the DRS for U2:

    [ user, e, f, p, Athens, d |
      dialogueparticipant(user)
      want(e)
      subject(e, user)
      object(e, f)
      flight(f)
      airport(Athens)
      direction(p)
      has-dir(p, Athens)
      has-dest(f, p)
      has-depdate(f, d)
      date(d)
      has-value(d, 1999.09.15) ]

the system can infer that the most specific concept of f is flightconnection. From this DRS, the system further infers flightdest(f, Athens) and flightdepdate(f, 1999.09.15). flightdep(f, d), however, is undefined. Therefore, the system generates utterance S3. From U4, flightdep(f, d) can be inferred. After having drawn this inference, the system utters S5, presenting the solutions found in the data base.

[4] New information that still has to be grounded.
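The control flow just described can be sketched as follows. This is an assumed reconstruction, not the authors' implementation: a refuted justification triggers a failure report, an undefined one triggers a system query (as S3 above), and fully proven justifications let the system present the retrieved solutions (as S5).

```python
# Sketch of planning the next dialogue step from the three-valued status of
# the justifications of the grounded goal. Names and messages are illustrative.
def next_step(status):
    """status: justification name -> True | False | None (undefined)."""
    if any(v is False for v in status.values()):
        return ("inform", "no matching flight in the data base")
    undefined = [name for name, v in status.items() if v is None]
    if undefined:
        return ("query", undefined[0])       # e.g. generates S3
    return ("inform", "present solutions")   # e.g. generates S5

# After U2: destination and date are inferred, departure is still undefined.
print(next_step({"flightdest": True, "flightdepdate": True, "flightdep": None}))
# -> ('query', 'flightdep')

# After U4: all justifications are proven.
print(next_step({"flightdest": True, "flightdepdate": True, "flightdep": True}))
# -> ('inform', 'present solutions')
```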

3.5 Dialogue Segmentation

In analogy to the grounding of utterances, one can, on the level of dialogue goals, characterize normatively and descriptively how dialogue (sub)goals contribute to the completion of a currently open goal. A goal is completed if there are no more pending obligations (in the sense of [TraAll94]) which require some (communicative) action to be taken by the obliged dialogue participant. Again, as in the case of grounding, the type of contribution is determined by a speech act assigned to the goal. The set of speech acts is determined by discourse as well as application pragmatics. In the case of the flight information application discussed throughout this paper, we can query the data base or the user, inform about facts, greet and thank, accept and reject. Some speech acts pose no obligations and therefore form dialogue segments consisting of one single goal, like greet and thank. Others pose obligations, as query does, and require a complementary speech act inform to build a segment. This operationality in the sense of conversational games (see e.g. [Kyo91]) is constrained by coherence assumptions:

    SEGMENT → query_I(φ) inform_R(ψ)   provided B_R(ψ → φ)

This rule states that a query followed by an inform constitutes a segment only if the Responder to the query assumes the content of the inform-act to imply the initial query. As shown in [LudGorNie98b], segments can be more complex than the example above and therefore describe the structure of dialogues with nested goals to be completed, as in our example dialogue, which has the following structure:

    S1          greet_S
    S1          inform_S
    S1          query_S
    U2          query_U
    S3          query_S
    U4          inform_U
    S5, U6, S7  inform_S
    U8          accept_U
    U8          thank_U
    U8          greet_U
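Segment completion via obligations can be sketched as follows. This is a hedged illustration under assumed encodings: a query poses an obligation on the addressee, a matching inform by the other participant discharges it, and acts such as greet or thank form single-goal segments posing no obligations.

```python
# Toy obligation tracking for the query/inform conversational game described
# above. The act encoding and discharge rule are assumptions for this sketch.
POSES_OBLIGATION = {"query"}               # requires a complementary inform
SELF_CONTAINED = {"greet", "thank", "accept"}  # single-goal segments

def pending_obligations(acts):
    """acts: list of (speech_act, speaker). Returns speakers whose queries
    have not yet been answered by an inform of the other participant."""
    open_queries = []
    for act, speaker in acts:
        if act in POSES_OBLIGATION:
            open_queries.append(speaker)
        elif act == "inform" and open_queries and open_queries[-1] != speaker:
            open_queries.pop()             # the responder discharges the obligation
    return open_queries

dialogue = [("greet", "S"), ("query", "U"),    # U2: user asks for a flight
            ("query", "S"), ("inform", "U"),   # S3/U4: departure sub-segment
            ("inform", "S"),                   # S5 answers U2
            ("accept", "U"), ("thank", "U")]   # U8
print(pending_obligations(dialogue))  # [] -> all goals completed
```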

4 Configuration of SLDS

Hopefully, the discussion of knowledge representation and reasoning in our approach to understanding dialogues has shown that configuring an SLDS for a specific application means representing the knowledge needed by the system instead of devising a predefined dialogue structure that cannot change dynamically during a dialogue. The main idea is to incorporate domain-independent (reasoning) algorithms that use data specified in the knowledge base of the system. By separating the application domain and the discourse domain, we are able to reuse the discourse domain and general (common-sense) knowledge (e.g. about time and space). The domain model for a new application has to be incorporated into the pre-existing, application-independent "upper model". Research will have to focus on how this knowledge can be acquired rapidly, in order to make our approach competitive, as far as configuration speed is concerned, with less powerful finite-state dialogue models.

5 Conclusion

Finite-state approaches offer tools for editing (usually graphically) an a priori dialogue structure by defining a graph whose states describe system prompts and whose edges represent possible user responses that enable the transition from one state to another when the annotations of the edge under consideration are matched. In this way, a set of parameters is collected that is used for generating a data base query. Obviously, such an approach does not provide flexibility in what the user can utter at a certain moment. Content is restricted to match the predefined responses, and the illocutionary effect is limited to answering the question posed by the last system prompt. In our dialogue model, we allow for the flexibility that finite-state models lack by analysing the coherence of the content of utterances with respect to past utterances and open dialogue goals. This approach lets the dialogue structure evolve dynamically. Therefore, the dialogue system is capable of behaving in a more user-friendly manner. We assume that the aim of engaging in a dialogue, for any participant, is to describe interactively the necessary steps by which a goal in the framework of an application domain could be reached (and eventually to execute these steps). Therefore, our model is centred on the dialogue participants and the execution of a shared discourse plan in the above sense. From this point of view, utterances are actions in the discourse domain performed in order to reach the pragmatic goal of the dialogue. These actions are to be seen as speech acts expressing how the utterance is intended to contribute to the completion of the current dialogue goal.

6 Outlook

A major open question is how the dialogue manager can be enabled to choose dynamically a discourse strategy that accomplishes the user's goal as fast as possible and adapts well to the user's resources and capabilities. On this point, we will concentrate research on the question of how discourse plans for communicating the state of the dialogue and of the problem solver for the application domain can be integrated smoothly into the existing theory. In particular, it is of interest how such plans can be decomposed into speech acts and dialogue goals whose effect would be to communicate information to the user that is not triggered by some user utterance. Another important point is to integrate a generation component that makes use of the discourse context information stored in the dialogue structure. In our opinion, integrating context into generation is indispensable for making transparent to the user of a dialogue system why the system is reacting in a certain way.

References

[Abd95] N. Abdallah, The Logic of Partial Information, Springer, New York, 1995.
[BBKdN99] P. Blackburn, J. Bos et al., Inference and Computational Semantics, in: H.C. Bunt and E.G.C. Thijsse (eds.), Third International Workshop on Computational Semantics (IWCS-3), pp. 5-19, Tilburg, 1999.
[Bun97] H. Bunt, Dynamic Interpretation and Dialogue Theory, in: M. Taylor, D. Bouwhuis, F. Neel (eds.), The Structure of Multi-Modal Dialogue, vol. 2, John Benjamins, Amsterdam, 1997.
[Don96] F. M. Donini, M. Lenzerini, D. Nardi, A. Schaerf, Reasoning in Description Logics, in: G. Brewka (ed.), Foundations of Knowledge Representation, pp. 191-236, CSLI Publications, 1996.
[KamRey93] H. Kamp and U. Reyle, From Discourse to Logic, Kluwer, Dordrecht, 1993.
[Kyo91] J. Kowtko et al., Conversational Games within Dialogue, in: Proceedings of the DANDI Workshop on Discourse Coherence, 1991.
[LudGoNie98a] B. Ludwig, G. Görz, H. Niemann, Combining Expression and Content in Domains for Dialog Managers, in: Proceedings of DL 98, ITC-irst Technical Report 9805-03, Trento, 1998.
[LudGorNie98b] B. Ludwig, G. Görz, H. Niemann, Modelling Users, Intentions, and Structure in Spoken Dialogue, in: Proceedings of Konvens 98, Bonn, Germany, 1998.
[LudGoNie99] B. Ludwig, G. Görz, H. Niemann, An Inference-Based Approach to the Interpretation of Discourse, in: Chr. Monz and M. de Rijke (eds.), Proceedings of the First Workshop on Inference in Computational Semantics, Amsterdam, to appear.
[Ram97] A. Ramsay, Does It Make Any Sense? Update Semantics as Epistemic Reasoning, in: Proceedings of ACL 97, Morgan Kaufmann Publishers, 1997.
[TraAll94] D. Traum, J. Allen, Discourse Obligations in Dialogue Processing, in: Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL 94), pp. 1-8, Las Cruces, 1994.
[TraHin91] D. Traum, E. Hinkelman, Conversation Acts in Task-Oriented Spoken Dialog, Computational Intelligence, 8(3):575-599, 1992.
[Tra94] D. Traum, A Computational Theory of Grounding in Natural Language Conversation, Ph.D. Thesis, Computer Science Dept., University of Rochester, 1994.