Integration of Dialogue Patterns into the Conceptual Model of Storyboard Design

Markus Berg1, Antje Düsterhöft1 and Bernhard Thalheim2

1 Hochschule Wismar, Germany
Department of Electrical Engineering and Computer Science
{markus.berg,antje.duesterhoeft}@hs-wismar.de

2 Christian-Albrechts-University Kiel, Germany
Department of Computer Science and Applied Mathematics
[email protected]

Abstract. In this paper we describe an extension of storyboarding that makes it possible to integrate speech dialogues. This is important because today's web information systems are growing in size and complexity. Form- or keyword-based searches will not be sufficient in the future, so alternative interfaces have to be considered. The modeling of natural language interfaces for web information systems is a key factor for the success of such systems and is therefore important for e-commerce platforms. The results of this paper show that it is possible to create patterns for common dialogue forms. These patterns are then integrated into the storyboard model and form the basis for the modeling of natural dialogues in web information systems.

1 Introduction

A web information system (WIS) [1] is a database-backed information system that is realised and distributed over the web with user access via web browsers. Information is made available via pages, including a navigation structure between them and to sites outside the system. Furthermore, there should also be operations to retrieve data from the system or to update the underlying database. The design of web information systems requires detailed information about the users, their behaviour when using the system and the possibility to use different access channels. Moreover, it must be possible to integrate traditional design methods and to support different technologies. Storyboards are a methodology which was created to address these problems by offering an abstract conceptual model. Because usability suffers as the complexity of those systems grows, an interesting approach is the usage of natural language. As language is a very efficient way of exchanging information, it helps to simplify man-machine interaction and thus increases usability. The formalisation of dialogues by dialogue patterns supports the definition of natural language dialogue models and is the key to an easy-to-use speech interface for WIS.

After a short overview of related work, we introduce the topic of storyboard design [1]. Afterwards dialogue patterns are specified and then integrated into the conceptual model of storyboard design. Finally we give a short conclusion which also provides an insight into benefits and future work.

2 Related and Previous Work

The number of publications in the field of WIS is enormous. The ARANEUS framework [8] determines that conceptual modeling of WIS consists of content, navigation and design aspects. This results in the modeling of databases, hypertext structures and page layout. A similar approach is OOHDM [9] [10], which is completely object oriented. It also comprises three layers: object layer, hypermedia components and interface layer. Our own work on storyboarding has been reported in [13] and [14]. Moreover, we have investigated the role and use of metaphors in storyboarding in [12].

Besides conceptual modeling, aspects of natural language dialogues also have to be considered. The process of developing speech interfaces is analysed in [4]. The work in [5] describes the conceptual base for the dialogue design process. In [6] the Wizard-of-Oz methodology is used for simulating interactive systems. Besides the dialogue description, the user also has to be modeled. This is done in [7]. The usage of different modalities when accessing the web is examined in [11]. The modeling of dialogues with the help of dialogue acts is regarded in [16] and [17].

3 Storyboard Design

Storyboarding is a methodology which was created for the design of large-scale data-intensive web information systems. It is based on the abstraction layer model (ALM), which is shown in figure 1 [3]. The strategic layer is used to describe the system in a general way with respect to its intention. It is comparable to a mission statement. The business layer concretises this information by describing stories which symbolise paths through the system. The purpose of this layer is to anticipate the behaviour of the users. In the conceptual layer the scenes in the storyboard are analyzed and integrated. The design of abstract media types supports the scenes by providing a unit which combines content and functionality. The presentation layer associates presentation options with the media types. In the implementation layer physical implementation aspects like setting up database schemata, page layout and the realisation of functionality by script languages are addressed.

Each layer is associated with specific modeling tasks which allow the transition between the layers. To progress from the strategic to the business layer, storyboarding and user profiling are required. To get from the business layer to the conceptual layer, conceptual models have to be created, i.e. database modeling, operations modeling, view modeling and media type modeling. The transition to the presentation layer is characterised by the definition of presentation styles. In the implementation layer all implementation tasks have to be realised. Storyboarding focuses on the business and the conceptual layer [3]. The business layer deals with user profiling and the design of the application story.

Fig. 1. Abstraction Layer Model

The core of the story space can be expressed by a directed multi-graph, in which the vertices represent scenes and the edges represent actions by the users, including navigation. If more details are added, application stories can be expressed by some form of process algebra. That is, we need atomic activities and constructors for sequencing, parallelism, choice, iteration, etc. to write stories. In the conceptual layer, media types which support the scenes and operations which support the activities in the storyboard are modeled. Moreover, hierarchical presentations and the adaptivity to users, end-devices and channels are addressed in this layer.

A WIS can be used by any web user. That is why the design of such systems requires anticipation of the users' behaviour. This problem is addressed by storyboarding, which describes the ways users may choose to interact with the system. A storyboard consists of three parts [3]: the stories, which are navigation paths through the system; the actors, which comprise users with the same profile; and the tasks, which link activities (resp. goals) of the actors with the story space, which is a container for the description of the stories. Subgraphs of the story space are called scenarios. This enables a hierarchy and encapsulation of scenes. Every action can be equipped with pre- and postconditions or triggering events. This allows us to specify under which conditions an action can be executed. SiteLang is a language which defines a story algebra and allows the formal representation of the theoretical storyboard model. The explanation of the SiteLang syntax is beyond the scope of this paper and can be found in [2].
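The story space described above can be illustrated with a small data structure. The following Python sketch is not SiteLang and not part of the storyboard formalism; the class names, the scene names and the `logged_in` flag are hypothetical and only serve to show a directed multi-graph of scenes, actions guarded by preconditions, and a scenario as a subgraph.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

State = Dict[str, object]


def always(state: State) -> bool:
    """Default precondition: the action is always executable."""
    return True


@dataclass
class Action:
    """An edge of the story space: a user action leading from one scene to another."""
    name: str
    source: str
    target: str
    precondition: Callable[[State], bool] = always


@dataclass
class StorySpace:
    """Directed multi-graph: scenes are vertices, actions are (possibly parallel) edges."""
    scenes: Set[str] = field(default_factory=set)
    actions: List[Action] = field(default_factory=list)

    def add(self, action: Action) -> None:
        self.scenes.update({action.source, action.target})
        self.actions.append(action)

    def enabled(self, scene: str, state: State) -> List[str]:
        """Names of the actions that may be executed in `scene` under `state`."""
        return [a.name for a in self.actions
                if a.source == scene and a.precondition(state)]

    def scenario(self, scenes: Set[str]) -> "StorySpace":
        """Subgraph induced by a subset of the scenes."""
        sub = StorySpace()
        for a in self.actions:
            if a.source in scenes and a.target in scenes:
                sub.add(a)
        return sub


# Hypothetical example: two parallel edges between "search" and "results".
space = StorySpace()
space.add(Action("submit_form", "search", "results"))
space.add(Action("speak_query", "search", "results"))
space.add(Action("book", "results", "booking",
                 precondition=lambda s: bool(s.get("logged_in"))))
space.add(Action("new_search", "results", "search"))

search_scenario = space.scenario({"search", "results"})
```

Here `space.enabled("results", {})` lists only `new_search`, because the precondition of `book` is not met, while the scenario over `search` and `results` keeps three of the four actions.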

4 Definition of Dialogue Patterns

A dialogue is defined as a conversational exchange of information between people. It consists of many related utterances with a specific meaning and aim. These utterances are called speech acts. Searle identified five illocutionary acts [15]: assertives, directives, commissives, expressives and declaratives. When classifying dialogue utterances, certain classes can be observed, such as greeting, apologizing or asking, which can be assigned to the illocutionary acts (e.g. an apology is an expressive speech act). Aggregation leads to a group of speech acts. While speech acts classify single utterances, dialogue acts model dialogue classes consisting of several utterances. They can be seen as a superclass of logically related speech acts. This model is well suited for analysing dialogues a posteriori. When defining dialogue patterns a priori, however, we need to consider dialogue branches, as we cannot know the answer beforehand.

A confirmation dialogue is a six-tuple D = {Q1, A, V, C, D, Q2} which comprises the following steps:

- Question
- Answer
- Verification
- Answer: Confirmation | Denial
- (Question for correction)

As mentioned above, the form of the answer is not known beforehand and introduces a branch. The question for correction is optional, depending on the path the user has chosen.

$$Q_1 \in \mu,\quad A \in \alpha,\quad V \in \phi,\quad C \in \gamma,\quad D \in \delta,\quad Q_2 \in \nu \tag{1}$$

$$Q_1, Q_2, V \subseteq Q \subseteq \Sigma^{+} \tag{2}$$

Example 1. Let µ = {"How many persons take part in your trip?", "How many persons?", "With how many persons do you want to travel?"}, α = {1, 2, 3, 4, 5}, φ = {"Are you sure?", "You want to travel with λα persons, correct?"}, γ = {yes, yo}, δ = {no, nope} and ν = {"Please say now the number of persons!"}. The instantiation of this pattern results in specific dialogues. With zero-based indices, D = {µ[2], α, φ[1], γ, δ, ν[0]} would result in the following example dialogue:

S: With how many persons do you want to travel?
U: 3
S: You want to travel with 3 persons, correct?
U: Yes
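The instantiation in Example 1 can be traced with a short program. The following Python sketch is only an illustration under several assumptions: the class and function names are hypothetical, the placeholder `{answer}` stands in for λα, user input is supplied as a prepared list of turns, and the handling of out-of-domain input (re-asking with the correction question) is an added choice that the pattern itself does not prescribe.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class ConfirmationPattern:
    questions: Sequence[str]       # mu: candidates for Q1
    answers: Sequence[str]         # alpha: domain of A
    verifications: Sequence[str]   # phi: candidates for V, "{answer}" plays the role of λα
    confirmations: Sequence[str]   # gamma: domain of C
    denials: Sequence[str]         # delta: domain of D
    corrections: Sequence[str]     # nu: candidates for Q2


def run_confirmation(pattern: ConfirmationPattern,
                     user_turns: Sequence[str],
                     choose: Callable[[Sequence[str]], str] = random.choice
                     ) -> Tuple[List[Tuple[str, str]], str]:
    """Play the pattern against prepared user turns.

    Returns the transcript and the confirmed answer. Raises StopIteration
    if the prepared user turns run out before an answer is confirmed.
    """
    turns = iter(user_turns)
    transcript: List[Tuple[str, str]] = []

    def system(text: str) -> None:
        transcript.append(("S", text))

    def user() -> str:
        text = next(turns)
        transcript.append(("U", text))
        return text.strip().lower()

    system(choose(pattern.questions))                       # Q1
    while True:
        answer = user()                                     # A
        if answer not in pattern.answers:
            system(choose(pattern.corrections))             # assumption: re-ask
            continue
        system(choose(pattern.verifications).format(answer=answer))  # V
        if user() in pattern.confirmations:                 # C
            return transcript, answer
        system(choose(pattern.corrections))                 # D, then Q2


mu = ["How many persons take part in your trip?", "How many persons?",
      "With how many persons do you want to travel?"]
phi = ["Are you sure?", "You want to travel with {answer} persons, correct?"]
nu = ["Please say now the number of persons!"]

# D = {mu[2], alpha, phi[1], gamma, delta, nu[0]}: single values fix the prompts.
pattern = ConfirmationPattern([mu[2]], ["1", "2", "3", "4", "5"], [phi[1]],
                              ["yes", "yo"], ["no", "nope"], [nu[0]])
transcript, value = run_confirmation(pattern, ["3", "Yes"])
corrected_transcript, corrected = run_confirmation(pattern, ["3", "no", "2", "yes"])
```

Passing the complete sets `mu` and `phi` instead of single-element lists lets `random.choice` vary the prompts from run to run.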

The instantiation with a set instead of a single value leads to random prompting, which prevents monotony when the user works with the system frequently. Sets referring to user input, like α, γ and δ, specify the domain of possible answers. The values can be used to generate grammars. Let a context-free grammar be defined as a quadruple of nonterminal symbols, terminal symbols, production rules and a start symbol: G = (N, Σ, P, S), where every production in P has the form A → ω with A ∈ N and ω ∈ (N ∪ Σ)+. Now we can infer Σ from α. Defining possible user utterances by merely enumerating terminal combinations is not very effective. That is why we change the domain of the user utterance sets to dom(α), dom(γ), dom(δ) = (N, Σ, P, S). The former example for α can now be expressed as α = ({S}, {1, 2, 3, 4, 5}, {S → 1|2|3|4|5}, S). Now we are able to express even more complex utterances like "I want to travel with two persons, please" or "Three persons, please", as the following example shows.


Example 2.

G = (N, Σ, P, S)
N = {S, NO, PRS, PRE, POST}
Σ = {I, want, to, travel, with, persons, please, one, two, three, four, five}
P = {S → PRE NO PRS POST | NO | NO PRS | NO PRS POST
     PRE → I want to travel with
     NO → one | two | three | four | five
     PRS → persons
     POST → please}
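The grammar of Example 2 can be exercised with a small recognizer. The following Python sketch is an illustration only: the function names are hypothetical, tokenization simply keeps alphabetic words and ignores case, and returning the numeric value of the NO rule is an added convenience rather than part of the grammar definition.

```python
import re
from typing import Dict, List, Optional

# Productions P of Example 2; keys are the nonterminals N.
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["PRE", "NO", "PRS", "POST"], ["NO"], ["NO", "PRS"], ["NO", "PRS", "POST"]],
    "PRE": [["I", "want", "to", "travel", "with"]],
    "NO": [["one"], ["two"], ["three"], ["four"], ["five"]],
    "PRS": [["persons"]],
    "POST": [["please"]],
}

NUMBER_VALUES = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def tokenize(utterance: str) -> List[str]:
    """Split into alphabetic words, dropping punctuation."""
    return re.findall(r"[A-Za-z]+", utterance)


def matches(symbols: List[str], tokens: List[str]) -> bool:
    """True if the symbol sequence derives exactly the token sequence.

    Plain backtracking; it terminates because the grammar is not recursive.
    """
    if not symbols:
        return not tokens
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:  # nonterminal: try every alternative
        return any(matches(alt + rest, tokens) for alt in GRAMMAR[head])
    # terminal: compare case-insensitively with the next token
    return bool(tokens) and tokens[0].lower() == head.lower() and matches(rest, tokens[1:])


def recognize(utterance: str) -> Optional[int]:
    """Return the number of persons if the utterance is in L(G), else None."""
    tokens = tokenize(utterance)
    if not matches(["S"], tokens):
        return None
    return next(NUMBER_VALUES[t.lower()] for t in tokens if t.lower() in NUMBER_VALUES)
```

For instance, `recognize("Three persons, please")` yields 3, whereas `recognize("six persons")` yields `None` because "six" is not a terminal of the grammar.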

The grammar of Example 2 can be transformed into the following SRGS ABNF grammar:

    $S = $PRE $NO $PRS $POST | $NO | $NO $PRS | $NO $PRS $POST;
    $PRE = I want to travel with;
    $NO = one | two | three | four | five;
    $PRS = persons;
    $POST = please;

To facilitate post-processing, the introduction of semantic return values is helpful. With the help of the λ-operator we can access values from different rules. An object-oriented approach allows us to define subtypes. While λα returns e.g. "three persons please", λα.no would return 3.

Now that we have defined grammar generation, some other dialogue types are specified:

- Question/Answer: D = {Q1, A}
- Selection: D = {Q1, A}
- Yes/No-Question: D = {Q1, A}

As you can see, the definition is the same in all three cases. By adapting the domain, however, different dialogue semantics can be supported. We are now able to create patterns for different dialogue types. But as these dialogues are predefined, they are relatively rigid. One of the most important characteristics of natural language is fuzzy answers. Some are underspecified, others overspecified. Underspecified answers lead to re-requests, and overspecified answers have to fill several patterns.

Example 3.

S: When do you want to start?
U: Tomorrow in Cologne
S*: Where do you want to start?


The third utterance of the above dialogue should not occur, as the user has already given that information. This leads to the necessity of processing overspecified answers. One approach is the extension of the recognition domain through an object-oriented pattern concept (i.e. a frame). This enables us to group semantically related dialogue patterns. In a tourism scenario there are different questions. Some of them are mandatory for generating a search query. Possible attributes are:

- begin of journey
- end of journey
- destination
- number of children
- number of adults

Some are likely to be summarized in single utterances. Instead of asking "When do you want to start your journey?" and "When do you want to end your journey?" you could ask "Please say your travel dates". Moreover, it has to be taken into account that the user may give the answer "Two adults and one child" to the question "How many adults take part?". One approach to realize this is the enclosure of domain classes:

$$
\left[\begin{array}{l}
\mathit{id}: \mathit{journey} \\
D = \{Q_1^{+}, A^{+}\} \\
\left[\begin{array}{l} \mathit{id}: \mathit{begin} \\ D = \{Q_{1a}, A_1\} \end{array}\right] \\
\left[\begin{array}{l} \mathit{id}: \mathit{end} \\ D = \{Q_{1b}, A_2\} \end{array}\right]
\end{array}\right]
$$

This enclosure is equivalent to the storyboard term scene. When asking the question with the id end we can activate A+ = A1 ∪ A2, which allows the user to overspecify his answer because A+ represents an extended recognition domain. Depending on what the user said, further dialogue steps can be omitted. This can be checked by means of the filled variables: A+ ∈ α1 ∪ α2, α1 = {$dates}, α2 = {$dates}. If α2 ≠ ∅ after asking the question begin, the question end does not need to be posed.
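The effect of the extended recognition domain can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration, not the recognizer of the paper: the `Slot` and `Frame` classes are hypothetical, and the keyword-based extractors ("from ..." and "until ...") merely stand in for real grammars such as $dates.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


def keyword(pattern: str) -> Callable[[str], Optional[str]]:
    """Build a toy extractor that returns the first regex group, if any."""
    compiled = re.compile(pattern, re.IGNORECASE)

    def extract(utterance: str) -> Optional[str]:
        found = compiled.search(utterance)
        return found.group(1) if found else None

    return extract


@dataclass
class Slot:
    """One enclosed dialogue D = {Q, A} with its own answer domain."""
    id: str
    question: str
    extract: Callable[[str], Optional[str]]


@dataclass
class Frame:
    """Enclosing pattern whose recognition domain A+ is the union of its slots."""
    id: str
    slots: List[Slot]
    values: Dict[str, str] = field(default_factory=dict)

    def absorb(self, utterance: str) -> None:
        """Try every open slot on the utterance (extended domain A+)."""
        for slot in self.slots:
            if slot.id not in self.values:
                value = slot.extract(utterance)
                if value is not None:
                    self.values[slot.id] = value

    def next_question(self) -> Optional[str]:
        """Question of the first unfilled slot, or None if all are filled."""
        for slot in self.slots:
            if slot.id not in self.values:
                return slot.question
        return None


def make_journey() -> Frame:
    return Frame("journey", [
        Slot("begin", "When do you want to start your journey?", keyword(r"\bfrom\s+(\w+)")),
        Slot("end", "When do you want to end your journey?", keyword(r"\buntil\s+(\w+)")),
    ])


overspecified = make_journey()
overspecified.absorb("From Monday until Friday")   # answers begin and end at once

plain = make_journey()
plain.absorb("from Monday")                        # answers begin only
```

After the overspecified answer, `next_question()` returns `None`, so the question end is skipped; after the plain answer it returns the question for the end of the journey.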

5 Enhancing Storyboarding by Dialogue Patterns

Having defined dialogue patterns, we now integrate them into storyboarding, as can be seen in figure 2. Because the combination of dialogue steps is defined as a scene, we extend the scene definition by dialogue patterns. A scene is defined as follows:


    Scene id
      MediaObject: modality
      Actors: user
      Context: channel
      Task: id
      Specification: on event if precondition
        doScene implementation
        accept on postcondition

Fig. 2. Storyboard with dialogue acts

Fig. 3. Instantiation of dialogue patterns

Now the tuple which defines the dialogue has to be integrated. A selection dialogue is defined as

    D1 = {Q1, A1}, Q1 = P ∪ S

and can be instantiated (see figure 3) with

    P = {"Please choose your option", "What kind of accommodation do you prefer?"},
    S = {suite, apartment, doubleroom}, A1 ∈ S

For reasons of simplification the grammar for the set A is omitted in this step. This dialogue is only one part of a scene (i.e. a complex dialogue). Another dialogue would be a question for the number of persons:

    D2 = {Q2, A2},
    Q2 = {"Please say the number of persons!", "With how many persons do you want to travel?"},
    A2 = {1..5}

By enclosing these dialogues, a complex dialogue pattern ∆ = {D}∗ can be created. In a scene definition this pattern has to be referenced. The answer domain of ∆ is Λ ∈ {α1..αn}.

    Scene accommodation
      MediaObject: speechForm
      Actors: customer
      Context: channel=speech
      Task: getMandatoryInformation
      Specification: on newUserUtterance if Λ = ∅
        doScene
          D1 = selection(Q1, A1)
          D2 = getNumber(Q2, A2)
          ∆ = {D1, D2}
        accept on Λ ≠ ∅

    DialoguePattern selection
      Specification: if α ≠ ∅
        D = {Q, A}, Q = P ∪ S
        accept on α ≠ ∅

    DialoguePattern getNumber
      Specification: if α ≠ ∅
        D = {Q, A}
        accept on α ≠ ∅

The instantiation of the dialogue patterns with specific values has to be done in the presentation layer.
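How a scene references its dialogue patterns and how they are instantiated with concrete values can be sketched as follows. This Python fragment is an illustration under assumptions: the class and function names are hypothetical, answers are matched by plain string comparison instead of a grammar, and the pre- and postcondition are modelled directly as checks on the answer dictionary that plays the role of Λ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class DialoguePattern:
    """D = {Q, A}; for a selection dialogue the question also lists the options (Q = P ∪ S)."""
    name: str
    prompts: Sequence[str]          # P
    domain: Sequence[str]           # answer domain (S for a selection, A otherwise)
    include_options: bool = False

    def question(self, index: int = 0) -> str:
        prompt = self.prompts[index]
        if self.include_options:
            return f"{prompt} ({', '.join(self.domain)})"
        return prompt

    def accept(self, utterance: str) -> Optional[str]:
        """Return the normalized answer if it lies in the domain, else None."""
        answer = utterance.strip().lower()
        return answer if answer in self.domain else None


@dataclass
class Scene:
    id: str
    media_object: str
    actors: List[str]
    context: str
    task: str
    patterns: List[DialoguePattern]                          # complex pattern ∆
    answers: Dict[str, str] = field(default_factory=dict)    # Λ


def run_scene(scene: Scene, user_turns: Sequence[str]) -> bool:
    """Execute the scene once; one prepared user turn per enclosed dialogue."""
    if scene.answers:                       # precondition: Λ = ∅
        return False
    for pattern, utterance in zip(scene.patterns, user_turns):
        value = pattern.accept(utterance)
        if value is not None:
            scene.answers[pattern.name] = value
    return bool(scene.answers)              # postcondition: accept on Λ ≠ ∅


# Instantiation with specific values (presentation layer).
selection = DialoguePattern(
    "selection",
    ["Please choose your option", "What kind of accommodation do you prefer?"],
    ["suite", "apartment", "doubleroom"],
    include_options=True,
)
get_number = DialoguePattern(
    "getNumber",
    ["Please say the number of persons!", "With how many persons do you want to travel?"],
    ["1", "2", "3", "4", "5"],
)
accommodation = Scene("accommodation", "speechForm", ["customer"], "speech",
                      "getMandatoryInformation", [selection, get_number])
accepted = run_scene(accommodation, ["Suite", "2"])
```

The same two pattern definitions could be reused in another scene with different prompts, which mirrors the separation between pattern definition and instantiation.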


6 Conclusion and Future Work

Storyboarding is a method of modeling web information systems. Because those systems are getting more complex, the user has to be offered convenient access modalities. Written and spoken language is an efficient way of interacting with extensive systems. In this paper we have described a flexible approach to modeling dialogues. With the help of speech and dialogue acts, dialogue patterns have been designed. The combination of speech acts results in dialogues which can be further arranged in hierarchies. These complex dialogues are called scenes in the model of storyboarding. That is why we integrate dialogue patterns into the scene description. This enables us to define dialogues in a formal way. They can be adapted to different use cases and can be instantiated in the presentation layer.

These dialogues are more flexible than standard dialogues in common interactive voice response systems. Nevertheless, the full flexibility of natural language cannot be expressed. Users choose the next dialogue steps on their own and do not want to follow predefined questions. Moreover, the current model does not support escaping from a dialogue: a specific sequence always has to be finished. The modeling of these natural language features will be addressed in future work.

7 Acknowledgements

This work is supported by the European Funds for Regional Development (EFRE).

References

1. Schewe, K.-D., Thalheim, B.: Conceptual modelling of web information systems. In: Data & Knowledge Engineering 54, 147-188 (2005)
2. Thalheim, B., Düsterhöft, A.: SiteLang: conceptual modeling of internet sites. In: H.S. Kunii et al. (Ed.), Conceptual Modeling-ER 2001, LNCS, vol. 2224, Springer-Verlag, Berlin, pp. 179-192 (2001)
3. Schewe, K.-D., Thalheim, B.: Web Information Systems: Usage, Content, and Functionality Modelling. Technical Report (2005)
4. Cohen, M. et al.: Voice User Interface Design. Addison-Wesley (2004)
5. Harris, R. A.: Voice Interaction Design. Crafting the New Conversational Speech Systems. Morgan Kaufmann Publ Inc (2004)
6. Fraser, N., Gilbert, G.: Simulating Speech Systems. Computer, Speech and Language (1991)
7. Fischer, G.: User Modeling in Human-Computer Interaction. In: User Modeling and User-Adapted Interaction 11 (2001)
8. Atzeni, P., Gupta, A., and Sarawagi, S.: Design and maintenance of data-intensive web-sites. In: Proceedings EDBT'98, vol. 1377 of LNCS, 436-450. Springer-Verlag, Berlin (1998)
9. Rossi, G., Garrido, A., and Schwabe, D.: Navigating between objects: Lessons from an object-oriented framework perspective. ACM Computing Surveys 32, 1 (2000)
10. Rossi, G., Schwabe, D., and Lyardet, F.: Web application models are more than conceptual models. In: Advances in Conceptual Modeling, vol. 1727 of LNCS, 239-252. Springer-Verlag, Berlin (1999)
11. Wahlster, W.: SmartKom: Multimodal dialogues with Mobile Web Users. Proc. of the Cyber Assist International Symposium, 33-34 (2001)
12. Thalheim, B., Düsterhöft, A.: The use of metaphorical structures for internet sites. Data & Knowledge Engineering 35, 161-180 (2000)
13. Feyer, T., Thalheim, B.: E/R based scenario modeling for rapid prototyping of web information services. In: Advances in Conceptual Modeling, vol. 1727 of LNCS, 253-263. Springer-Verlag (1999)
14. Schewe, K.-D., Thalheim, B.: Integrating database and dialogue design. Knowledge and Information Systems 2, 1, 1-32 (2000)
15. Searle, J.R.: Speech Acts. An Essay in the Philosophy of Language. Cambridge (1969)
16. Sitter, S., Stein, A.: Modeling the illocutionary aspects of information-seeking dialogues. In: Information Processing & Management, vol. 28(2), 165-180 (1992)
17. Stolcke, A. et al.: Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech. In: Computational Linguistics, vol. 26(3), 339-373 (2000)