An Architecture for Multi-Domain Spoken Dialog Systems
Nobuo Kawaguchi
Graduate School of Engineering, Nagoya University Furo-cho, Chikusa-ku, Nagoya, 464-8603, JAPAN
[email protected]
Shigeki Matsubara
Faculty of Language and Culture, Nagoya University
Katsuhiko Toyama
Graduate School of Engineering, Nagoya University / Center for Integrated Acoustic Information Research (CIAIR), Nagoya University
Yasuyoshi Inagaki
Graduate School of Engineering, Nagoya University
Abstract
Several spoken dialog systems for specific task domains have been developed so far, but there are only a few multi-domain systems that consider extensibility and scalability. This paper proposes a distributed architecture for multi-domain spoken dialog systems that satisfies both requirements. The key concept of the architecture is the distribution and integration of data fragments. The data fragments carry information about the speech input obtained from the continuous speech recognition engine. Each fragment is distributed and integrated through a hierarchy of domain managers and work modules.
1 Introduction
Spoken dialog systems have been studied to help people who cannot use their hands or pointing devices, or who are familiar with neither keyboards nor computers (Bolt, 1980; Takebayashi et al., 1992; Ando et al., 1994; Zue, 1997).

(This research has been supported in part by a Grant-in-Aid for COE Research Project (No. 11CE2005) from the Ministry of Education, Science, Sports and Culture, Japan.)

Several conversational systems for specific domains have been developed so far. Jupiter (Zue, 1997) is a telephone dialog system for weather information based on the GALAXY (Goddeau et al., 1994) architecture. TOSBURG-II (Takebayashi et al., 1992) is a multi-modal fast-food clerk system that can manage hamburger orders. Sync/Draw (Matsubara et al., 1997) is a multi-modal drawing tool that incrementally understands speech and responds quickly to inputs. Almost all of these systems can manage only a single task domain in a sequential manner. When we consider a conversational system for several task domains, the architectures of these systems are
not suitable. A multi-domain conversational system must hold the knowledge, rules, and functionality for all of its domains, and this requirement cannot be satisfied without a distributed approach. For example, a driver's support system in a car, which we call the Driver's Secretary System (DSS), should manage several task domains, such as the air conditioner, the car radio, the navigation system, and information services, at the same time. It is quite difficult to develop a uniform knowledge base that can manage all the information for these domains, and it is even more difficult to keep such a knowledge base extensible.

Our objective is to design a basic architecture for constructing conversational systems for multiple task domains. Under a well-defined and extensible architecture, a conversational system can be developed in a distributed and separate manner. This paper proposes an architecture for spoken dialog systems, designed mainly for developing the Driver's Secretary System. The key concept of the architecture is the distribution and integration of data fragments. We regard the input and output of the conversational system as streams of input and output fragments, respectively. These fragments are distributed and integrated through a hierarchy of domain managers and work modules. A work module is a simple conversational system for a specific task domain: it interprets input fragments and responds with output fragments. A domain manager is connected to several sub-domain managers and work modules, and coordinates the distribution and integration of the input and output fragments. The hierarchical structure of the system is similar to the contract net protocol (Smith, 1980) for multi-agent systems, but there are notable differences in the distribution and integration mechanisms.

In the following sections, we first explain the requirements for a multi-domain system (Section 2). Section 3 describes our D & I architecture for multi-domain conversational systems, and Section 4 presents related work.
[Figure 1: Basic design of the architecture. The speech recognition engine sends input fragments to the Master Manager, whose Distributor forwards them to the Car Radio Control and Mail Tool work modules; their output fragments are integrated and passed to the speech synthesizer.]
2 Multi-Domain System
A multi-domain spoken dialog system should have the following features.

1. Extensibility: The system should be extensible to several domains, and a new domain should be easy to add.

2. Scalability: Even if the system is extended to many domains, it must work at a reasonable speed.

3. Usability: When users want to use the system in a specific domain, they can use it as a single-domain system, so they do not need to understand the architecture of the multi-domain system.

Additionally, the system should be developed in a compositional way: a multi-domain system should be composed from several single-domain systems and multi-domain systems. Without this feature, the extensibility of the system is hard to satisfy.
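The compositional requirement can be illustrated by giving single-domain work modules and multi-domain managers one common interface, so that a manager cannot tell whether a child is a single module or a whole sub-system. The sketch below is our own minimal illustration, not part of the paper's system; the names `DialogComponent`, `WorkModule`, `DomainManager`, and `handle` are hypothetical.

```python
# Hypothetical sketch of the compositionality requirement: work modules
# and domain managers share one interface, so managers treat either kind
# of child uniformly and new domains can be plugged in later.
from abc import ABC, abstractmethod

class DialogComponent(ABC):
    """Common interface for work modules and domain managers."""

    @abstractmethod
    def handle(self, phrase: str) -> str:
        """Interpret an input phrase and return a response utterance."""

class WorkModule(DialogComponent):
    """A single-domain system: maps known phrases to responses."""

    def __init__(self, name, responses):
        self.name = name
        self.responses = responses  # phrase -> canned response

    def handle(self, phrase):
        return self.responses.get(phrase, "")

class DomainManager(DialogComponent):
    """A multi-domain system composed of sub-managers and work modules."""

    def __init__(self, children):
        self.children = list(children)

    def add(self, child):
        # Extensibility: a new domain is added without touching others.
        self.children.append(child)

    def handle(self, phrase):
        # Return the first non-empty child response (Section 3 refines
        # this selection using relevance and confidence values).
        for child in self.children:
            answer = child.handle(phrase)
            if answer:
                return answer
        return ""

radio = WorkModule("CarRadio", {"turn on the radio": "radio on"})
mail = WorkModule("MailTool", {"read my mail": "you have 2 mails"})
system = DomainManager([radio])
system.add(mail)  # extending the running system with a new domain
print(system.handle("read my mail"))
```

Because `DomainManager` itself implements the same interface, a whole multi-domain system can in turn be plugged into a larger one unchanged, which is exactly the compositionality asked for above.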
3 D & I Architecture
This section describes our novel architecture for multi-domain conversational systems. The key concept of the architecture is the distribution and integration of the input and output fragments. The underlying idea is the difficulty of understanding spoken language without considering the task domains, so we choose to distribute the whole input to each domain-specific work module. In our architecture, several domain managers and work modules are composed hierarchically.
[Figure 2: Internal design of a domain manager, containing the Distributor, the Integrator & Selector, the Dialog Controller, the Dialog Context, and the knowledge DB of sub-managers.]

The work module is a conversational system for a specific domain. The domain manager has knowledge about its sub-domain managers and sub-work modules. A simple example of a system based on the architecture is shown in Figure 1. In this figure, there is only one domain manager (the Master Manager), which controls the whole dialog. CarRadio Control and MailTool are work modules for controlling the car radio and managing e-mail, respectively.

In the following, we explain the flow of fragments. First, user input is recognized by the speech recognition engine, which outputs the input fragments as a stream. An input fragment contains information about the input speech (examples are shown in Figure 3), such as the recognized word or phrase, the recognition probability, and the relevance to the current context. The Master Manager first decides the relevance of an input fragment to each work module using domain-specific knowledge; in this simple example, the vocabularies of the work modules are enough to decide the relevance. Figure 2 describes the internal design of the manager. The Distributor then distributes the input fragments to both CarRadio Control and MailTool together with the relevance. For the input utterance "Search mail from Kazu", the top two fragments shown in Figure 3 are distributed to each module; the relevance of each fragment is calculated from the vocabulary knowledge.

The output fragments contain the utterance phrase, the confidence in the task, and the relevance. Since CarRadio Control cannot understand the words "mail", "from", and "Kazu", the module returns an output fragment with no relevance (0.0) and full confidence (1.0). MailTool can understand the fragment, so it returns some relevance and confidence.
// Input fragment to CarRadio Control
Input:   { ID: 34054, phrase: "search mail from Kazu",
           probability: 0.75, relevance: 0.2 }

// Input fragment to MailTool
Input:   { ID: 34054, phrase: "search mail from Kazu",
           probability: 0.75, relevance: 0.8 }

// Output fragment from CarRadio Control
Output:  { ID: 34054, module ID: CarRadio Control,
           utterance: (null), relevance: 0.0, confidence: 1.0 }

// Output fragment from MailTool
Output:  { ID: 34054, module ID: MailTool,
           utterance: "no mail from Mr. Kazu",
           relevance: 0.8, confidence: 0.60 }

// Control fragment to MailTool
Control: { ID: 34054, selected: true }
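One distribution-and-integration cycle over fragments like the ones above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the paper's implementation: the relevance calculation (a crude vocabulary-overlap score), the function names, and the selection rule (maximizing relevance times confidence) are our own assumptions, so the computed relevance values differ from the hand-assigned 0.2/0.8 above.

```python
# Sketch of one D & I cycle: the Distributor scores an input fragment
# against each work module's vocabulary, the modules answer with output
# fragments, and the Integrator selects the best one and emits a
# control fragment for the selected module.

def relevance(phrase, vocabulary):
    # Fraction of the words in the phrase covered by the vocabulary
    # (an assumed scoring rule; the paper does not specify one).
    words = phrase.lower().split()
    return sum(w in vocabulary for w in words) / len(words)

def distribute(fragment, modules):
    # Attach a per-module relevance to a copy of the input fragment.
    return {
        name: dict(fragment, relevance=relevance(fragment["phrase"], vocab))
        for name, vocab in modules.items()
    }

def integrate(outputs):
    # Select the output fragment with the best relevance * confidence,
    # and build the control fragment acknowledging the selection.
    best = max(outputs, key=lambda o: o["relevance"] * o["confidence"])
    control = {"ID": best["ID"], "selected": True,
               "module ID": best["module ID"]}
    return best, control

modules = {
    "CarRadio Control": {"radio", "volume", "tune"},
    "MailTool": {"search", "mail", "from", "kazu"},
}
fragment = {"ID": 34054, "phrase": "search mail from Kazu",
            "probability": 0.75}
inputs = distribute(fragment, modules)

# Output fragments as each module might return them.
outputs = [
    {"ID": 34054, "module ID": "CarRadio Control", "utterance": None,
     "relevance": 0.0, "confidence": 1.0},
    {"ID": 34054, "module ID": "MailTool",
     "utterance": "no mail from Mr. Kazu",
     "relevance": 0.8, "confidence": 0.60},
]
best, control = integrate(outputs)
print(best["module ID"], control)
```

Note the asymmetry the example preserves: CarRadio Control answers with full confidence that the utterance is irrelevant to it, while MailTool answers with a concrete utterance, moderate confidence, and high relevance, so the Integrator picks MailTool.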
[Figure 3: Examples of input and output fragments.]

The Integrator of the Master Manager then integrates these outputs, forwards MailTool's output fragment to the speech synthesizer above, and sends a control message to MailTool about the selection (the last fragment in Figure 3). The responses of the sub-systems are integrated and selected by the Integrator, considering the confidence and the dialog context. The input fragments do not have to contain whole sentential information; they may be a single word or a part of a phrase. When word-by-word input fragments are used, the system can perform incremental interpretation (Inagaki and Matsubara, 1995; Matsubara et al., 1997). In the next subsection, we explain our architecture in a more complex configuration.

3.1 Design of the Driver's Secretary System
Figure 4 shows the global structure of the Driver's Secretary System (DSS). There are some notable features in this design. The domain managers are now hierarchically composed in several layers, but they are not restricted to forming a tree structure. For example, the Information Manager and the Reservation Manager share the work module Web Tool, and the Flight Manager and the Restaurant Manager are each owned by two domain managers. These managers should manage a dialog between two speakers: one is the driver, and the other is a clerk talking over the car phone. Managing such a simultaneous dialog is also one of the interesting problems.

There is another interesting feature. When the user wants to drive somewhere, the Drive Support Manager manages the dialog to control the navigation system. When the manager then tries to reserve a parking lot near the desired destination, the Parking Manager is notified to take control of the dialog, and by the output fragment of the Parking Manager, the Reservation Manager manages the following dialog. This kind of coordination can work among several managers using the control fragments. The design of the DSS exemplifies the extensibility of the architecture: while keeping the uniform architecture, we can extend the system to other domains by means of the stream of data fragments.
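The hand-over just described (Drive Support Manager to Parking Manager to Reservation Manager) can be sketched as a chain of managers that pass control along via control fragments. The class and field names below are hypothetical; the paper specifies the control fragment only as far as its "selected" flag.

```python
# Sketch, under assumed names, of dialog hand-over between managers:
# when a manager's task produces a follow-up task, it emits a control
# fragment delegating the subsequent dialog to a sibling manager.

class Manager:
    def __init__(self, name, follow_up=None):
        self.name = name
        self.follow_up = follow_up  # manager that continues the dialog

    def handle(self, utterance, log):
        log.append(self.name)  # record which manager took the dialog
        if self.follow_up is not None:
            # Control fragment: delegate the rest of the dialog.
            control = {"selected": True, "delegate to": self.follow_up.name}
            self.follow_up.handle(utterance, log)
            return control
        return {"selected": True}

reservation = Manager("Reservation Manager")
parking = Manager("Parking Manager", follow_up=reservation)
drive_support = Manager("Drive Support Manager", follow_up=parking)

log = []
drive_support.handle("drive to the concert hall", log)
print(log)  # order in which the managers take over the dialog
```

The point of the sketch is that no manager needs global knowledge: each one only knows the sibling it delegates to, which matches the paper's claim that coordination can emerge from control fragments alone.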
4 Related Work
GALAXY-II (Seneff et al., 1998) is an architecture for information-service conversational systems designed at MIT. It employs the HUB architecture for the integration of multiple domain servers, which can be regarded as a simple version of the blackboard architecture (Cohen et al., 1994). The blackboard architecture has a difficulty with scalability: if there are too many domain servers, it cannot work well because of network congestion. Our architecture works well thanks to the hierarchical nature of its modules. The contract net protocol (Smith, 1980) is a concept similar to our architecture, but in our architecture the Distributor does not divide the input fragments: all fragments are distributed to the sub-managers together with their relevance values (when the relevance is considerably low, the fragment is not distributed). The Integrator also does not simply combine the results.
5 Conclusion
This paper proposes an extensible and scalable architecture for multi-domain, multi-modal conversational systems. The novel features of our architecture are as follows.

1. Each work module does not have to consider the other modules and managers, so work modules can be developed separately and easily composed into a multi-domain system.

2. The domain managers that distribute and integrate the fragments only require knowledge about their own sub-managers and work modules. This feature makes the whole system compositional.

3. Because the input and output fragments have a uniform structure, managers and modules can be connected in any way. Therefore, the configuration of the system is flexible and extensible.

4. Each work module and domain manager runs concurrently, so a system based on the architecture is scalable.

We expect that the usability of each subsystem is not lost, thanks to the intelligence of the domain managers.
[Figure 4 depicts the speech recognition engine and speech synthesizer (with echo cancellation against the car audio), a sensor monitor for the steering sensor and speed meter, the Master Manager, domain managers such as In-Car Device Control, Information Manager, Drive Support Manager, Reservation Manager, Parking Manager, Flight Manager, Restaurant Manager, Internet Manager, and Phone Manager, and work modules such as Car Radio Control, Car CD Control, Air Conditioner Control, Navigation System Control, Mail Tool, and Web Tool.]
Figure 4: Canonical configuration of the Driver's Secretary System

Evaluation of the usability should be carried out experimentally as future work. We are currently implementing the Driver's Secretary System based on this architecture.
References
H. Ando, Y. Kitahara, and N. Hataoka. 1994. Evaluation of multimodal interface using spoken language and pointing gesture on interior design system. In Proc. of 4th International Conference on Spoken Language Processing, pages 567-570.

R. A. Bolt. 1980. Put-that-there: Voice and gesture at the graphics interface. ACM Computer Graphics, 14(3):262-270.

P. Cohen, A. Cheyer, M. Wang, and S. C. Baeg. 1994. An open agent architecture. In Proc. AAAI Spring Symposium, pages 1-8.

D. Goddeau, E. Brill, J. Glass, C. Pao, M. Phillips, J. Polifroni, S. Seneff, and V. Zue. 1994. Galaxy: A human language interface to on-line travel information. In Proc. of 4th International Conference on Spoken Language Processing, pages 707-710.

Y. Inagaki and S. Matsubara. 1995. Models for incremental interpretation of natural language. In Proc. of the 2nd Symposium on Natural Language Processing, pages 51-60.

S. Matsubara, H. Yamamoto, N. Kawaguchi, Y. Inagaki, and K. Toyama. 1997. An interactive multimodal drawing system based on incremental interpretation. In IJCAI-97 Workshop: Intelligent Multimodal Systems, pages 55-62.

S. Seneff, E. Hurley, R. Lau, C. Pao, P. Schmid, and V. Zue. 1998. Galaxy-II: A reference architecture for conversational system development. In Proc. of 6th International Conference on Spoken Language Processing.

R. G. Smith. 1980. The contract net protocol: High-level communication and control in a distributed problem solver. IEEE Transactions on Computers, 29(12):1104-1113.

Y. Takebayashi, H. Tsuboi, Y. Sadamoto, H. Hashimoto, and H. Shinchi. 1992. A real-time speech dialogue system using spontaneous speech understanding. In Proc. of 3rd ICSLP, pages 651-654.

V. Zue. 1997. Conversational interfaces: Advances and challenges. In Proc. EUROSPEECH'97, pages 9-18.