Automating the Extraction of User Model Information from ... - CiteSeerX

Final Thesis

Automating the Extraction of User Model Information from Consultation Dialogues

by cand. inform. Dennis Maciuszek
Technical University of Braunschweig, Germany

Examiner: Nahid Shahmehri, Ph.D., Professor
Supervisor: Johan Åberg, M.Sc.

Linköping University, Sweden
Department of Computer and Information Science
Laboratory for Intelligent Information Systems
November 9, 2001

LiTH-IDA-Ex-01/69

Abstract

This thesis addresses a natural language processing problem posed in the context of so-called Web assistant systems, also known as live help systems. A recent feature added to a growing number of Web sites, such systems offer user support via text chat with human assistants. To adapt consultation to the individual user, long-term information about his or her skills and interests is collected in a user model. So far, this updating of user models has been a task performed manually by the assistants. The thesis specifies, designs, implements, and evaluates a software component to automate the user model acquisition task. Text phrases containing information about user skills and interests are (1) automatically highlighted in the consultation dialogues and (2) associated with semantic user model concepts. A requirements specification points out the particularities of human-human text chat communication. These particularities are considered in the choice of two suitable and feasible information extraction approaches: one keyword based and one based on partial parsing. The partial parsing approach is designed to fulfil the requirements and then implemented. An evaluation of performance indicates that the resulting system is well suited to facilitating manual user modelling, but is not yet a reliable basis for full automation.

Keywords: information extraction, Web assistant system, live help system, user model, consultation dialogue, text chat, conversational circumstances

Contents

Abstract
Contents
List of Figures
List of Tables
Foreword

1 Introduction
    1.1 The Problem
    1.2 Application Domain
    1.3 Towards a Solution

2 Information in Consultation Dialogues
    2.1 Dialogue Style
    2.2 Grammar and Vocabulary

3 Approaches to Information Extraction
    3.1 Choice of Approaches
    3.2 Chosen Approaches
    3.3 Keyword Based Interpretation
    3.4 FASTUS Partial Parsing
    3.5 Rejected Approaches

4 Implementing Information Extraction
    4.1 Design
    4.2 Implementation
    4.3 Integration

5 Results
    5.1 Application
    5.2 Evaluation

6 Conclusions
    6.1 Recapitulation
    6.2 Future Directions

Bibliography

A Grammar Specification
    A.1 Grammar Rules
    A.2 Vocabulary List

List of Figures

1.1 Elfwood WIS index page
1.2 WAS overview [ASM01]
1.3 An Elfwood consultation dialogue log (1)
1.4 An Elfwood consultation dialogue log (2)
1.5 An Elfwood consultation dialogue log (3)
1.6 Elfwood user model attributes (1)
1.7 Elfwood user model attributes (2)
2.1 Communication model of Schulz von Thun
2.2 "Two and a half" applicable IE tasks
4.1 Information extraction in the WAS
4.2 UML class diagram: Chat language
4.3 UML class diagram: Partial parsing
4.4 Partial parsing phases
4.5 Partial parsing phases in practice (1)
4.6 Partial parsing phases in practice (2)
5.1 The testing applet
5.2 UML class diagram: Application
5.3 Displayed extracted information
5.4 UML class diagram: Evaluation
6.1 A sub-ontology of user model attributes
6.2 A sub-ontology of parts of speech

List of Tables

1.1 Previous dialogue analysis statistics [ASM01]
2.1 Components of the four conversational circumstances
2.2 A fifth conversational circumstance
2.3 Specification dialogue analysis statistics
3.1 Comparison of chosen IE approaches
3.2 Keyword based interpretation: Lexicon entries
4.1 Tag syntax definitions
4.2 Tag syntax examples
5.1 Sample extraction results
5.2 Measured system performance

Foreword

This Master's thesis was written during a six-month stay in Linköping as a guest student from the Technical University of Braunschweig in Germany. I wish to express my thanks to everybody involved in making this exchange possible, especially my family for their encouragement and support, and to Dr Spie, who sent me here.

I would like to thank my supervisor Johan Åberg and my examiner Nahid Shahmehri for a truly fascinating project. I was happy to be able to work with Natural Language Processing, User Modelling, and, not to forget, "Elfwood". Thank you for being patient and helpful when the work grew larger and results were long in coming. The atmosphere at IISLAB was at the same time warm and professional. Besides my supervisor and examiner, I thank Cecile Åberg and Govert Meuwis for valuable comments on the document.

Finally, thanks to my friends in Germany and in Sweden, for everything you do, especially Brigitte, Christelle, Gerrit, Inna, and Leah. Everybody's names are in the "complex words" file, so that my program would recognise you.

Linköping in July 2001, Dennis Maciuszek


Chapter 1 Introduction

This chapter introduces the problem to be solved, as well as its theoretical and practical background.

1.1 The Problem

Concepts. The project falls in the area of user support in Web information systems (WIS). A WIS in this context may be any kind of Web site that provides substantial amounts of structured information plus a number of user functions on the data. User support means helping people acquire the information they seek, be it by guiding them to the information online or by personal consultation.

A Web assistant system (WAS, cf [AS00]), also known as a live help system, is a user support component added to a WIS, consisting of both computer-based help functions and consultation by human assistants. The assistants are experts in the WIS's application domain and also familiar with Web site operations. Users engage in online consultation dialogues with the assistants when seeking help beyond what is stored in the system's knowledge base. In our case that knowledge base is a hypertext list of frequently asked questions (FAQ) and their answers.

Consultations in a WAS can be personalised and made to run more efficiently by the assistants modelling user traits of every person they assist (cf [ASM01]). During the textual chat, or afterwards when studying its log file, the assistant records the user's statements regarding personal interests, skills, and the like, inserting discovered items into a well-defined structure: the assistant manually extracts information from the dialogue into a user model.

Automation of information extraction (IE; see [Cun99], [CL96], or [AI99] for an introduction) in this case means having a computer system process the natural language (or actually chat language) dialogue. It spots pieces of relevant information and stores them in the user model.
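As a rough illustration of what "spotting and storing" could look like, the following Python sketch files chat lines under user model attributes. It is not the method developed in this thesis: the attribute names (`drawing_skill`, `genre_interest`) and the cue phrases are hypothetical, and plain substring matching merely stands in for a real IE technique.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserModel:
    """Long-term record of one user's traits, keyed by attribute name."""
    user_id: str
    items: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, attribute: str, evidence: str) -> None:
        # Keep the supporting chat line so an assistant can review it later.
        self.items.setdefault(attribute, []).append(evidence)


# Hypothetical cue phrases; a usable lexicon would have to be far richer.
CUES = {
    "drawing_skill": ["i draw", "my drawings"],
    "genre_interest": ["fantasy", "science fiction"],
}


def extract(model: UserModel, chat_line: str) -> None:
    """Store the chat line under every attribute whose cue it contains."""
    lowered = chat_line.lower()
    for attribute, cues in CUES.items():
        if any(cue in lowered for cue in cues):
            model.add(attribute, chat_line)


model = UserModel("user42")
extract(model, "I draw mostly Fantasy characters")  # matches both attributes
extract(model, "How do I log in?")                  # matches no attribute
```

Keeping the original chat line as evidence, rather than only a flag, leaves room for a human assistant to check each entry.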

Goals. To spare the assistants tedious work, an IE system taking over the extraction task in an existing WAS was to be developed. To obtain an approach feasible within the scope of one thesis, restrictions had to be set on the undertaking. Instead of allowing cut-backs on accuracy, the IE task of finding and inserting user model information was divided into two steps.

The first step is to discover the information and highlight it in the assistant's chat window. It is indicated which aspect(s) of the user model the information in a phrase is associated with. Processing happens in "real time", ie chat line by chat line, not just on finished dialogues.

The second step, following this thesis, would determine what values to insert, depending on the content of the discovered phrase. The values would automatically be added to the user model. Until then, assistants are only pointed to user information; they still need to insert it themselves.

We would consider step one successful if a considerable percentage of what a human analyst would extract (when studying log files) was discovered and reasonably associated by the implemented IE system. In addition, the rate of false findings had to be low.
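The success criterion for step one can be read as two ratios: the share of human-extracted items that the system also finds, and the share of system findings that no human analyst would have made. The Python sketch below assumes, as a simplification, that findings are compared as exact (chat line number, attribute) pairs; the function name, attribute names, and sample data are hypothetical, and the evaluation procedure actually used in Chapter 5 may differ.

```python
def score_findings(system: set, human: set) -> tuple:
    """Return (share of human findings discovered, share of false findings)."""
    hits = system & human
    # Nothing to find counts as fully discovered; no findings means no false ones.
    discovered = len(hits) / len(human) if human else 1.0
    false_rate = len(system - human) / len(system) if system else 0.0
    return discovered, false_rate


# Made-up example: each finding is (chat line number, user model attribute).
human = {(3, "drawing_skill"), (5, "genre_interest"),
         (9, "writing_skill"), (12, "genre_interest")}
system = {(3, "drawing_skill"), (5, "genre_interest"),
          (9, "writing_skill"), (7, "genre_interest")}

discovered, false_rate = score_findings(system, human)
print(discovered, false_rate)  # prints: 0.75 0.25
```

Exact-pair matching is strict: a finding on the right line but with a different attribute counts as both a miss and a false finding.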

Motivation. The introduction of user models in a WAS adds adaptivity to the system, in the sense that consultation can be tailored to individual user needs. Assistants become more comfortable and efficient in their job. A previous field study [ASM01] conducted by the Laboratory for Intelligent Information Systems (IISLAB) indicated that assistants participating in a WAS consider user models (especially with many items added) to be helpful in their counselling sessions ([ASM01], Table 2). Only rarely did they lead to wrong assumptions about a user. However, comments about the difficulty of the extraction task were less euphoric (a mean of 5.79 on a scale of 1 = hard to 10 = easy, with a standard deviation of 2.94).

Presumably, automatic creation of adequate user models would make Web assisting more comfortable and efficient. The assistant job becomes less stressful and time-consuming; the WIS company saves financial resources. With numerous WIS running today and the vast differences in users' interests, education and cultural background, the number of possible applications of user models, WAS, and adaptivity on the Web seems huge. Exploring automatic generation of user models should be of interest not only to WAS researchers: the method could be applied to many types of adaptive systems, eg automatically gaining models of student knowledge in adaptive learning environments with chat features.

On the other hand, a practice of automatically analysing people's behaviour (not in general, but on a personal level) and electronically storing the gathered results raises difficult ethical and legal questions. IE can easily be exploited for selective advertising or espionage. In fact, much IE research has been funded by the US Defense Advanced Research Projects Agency (DARPA). At the least, this is where we enter a grey area, and we should keep an eye on what our results are being applied to. For the protection of privacy, access to user model data ought to be restricted (eg to registered assistants only) and always be granted to the user in question.

My personal interest in the project came from different angles. Studying Computer Science and Psychology, my main research interest lies in the application of Information Science methods to psychological questions, as in modelling and emulating cognitive processes. In an earlier project, I had been building an adaptive learning environment based on models of student knowledge and domain¹ knowledge. Then, I had already been using a manual extraction method for discovering prerequisite relationships between knowledge items in tutorial hypermedia texts. Now, by showing a way to automatically extract user model information from natural language consultation dialogues, I hope to make a contribution to a more adaptive WWW.

As a matter of fact, processing natural language has fascinated me since the very beginning of my interest in computing, which was in the heyday of so-called text adventure games. Writing Fantasy fiction as a hobby, I was also curious to apply IE research to the popular "Elfwood" WIS. The following section describes that system.

1.2 Application Domain

WIS. The Elfwood Web information system² is basically a huge, noncommercial archive of amateur artwork and literature in the Fantasy and Science Fiction genres. It features a number of supportive functions like keyword based search and guided tours covering related exhibits, as well as tutorial articles about how to draw and write. Figure 1.1 shows an excerpt of Elfwood's index screen.

Special emphasis lies on interactivity; users can comment on the exhibits and communicate via IRC chat or message boards. To contribute their works, users join either one or several of Elfwood's topical branches as a member. The branches are Fantasy art ("Lothlorien"), Science Fiction art ("Zone 47"), and literature ("The Wyvern's Library").

¹ the field of knowledge or information
² http://elfwood.lysator.liu.se

Figure 1.1: Elfwood WIS index page

Figure 1.2: WAS overview [ASM01]. Components: User, Support Router, Question-Answering System, Assistant, User Modelling Tool. Labelled flows: Question, Answers, Consultation dialogue, View and Edit.
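The question flow of Figure 1.2 can be sketched in a few lines of Python: a question is first sent to the question answering system, and only if the retrieved FAQ answers do not help is it routed to a human assistant whose expertise covers the question's topic category. This is a minimal sketch, not the Elfwood implementation; all class and function names are hypothetical, and the "queued" outcome for the case where no suitable assistant is free is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Question:
    query: str   # natural language query
    topic: str   # topic category chosen by the user


@dataclass
class Assistant:
    name: str
    expertise: set      # topic categories this assistant covers
    busy: bool = False  # each assistant serves one user at a time


def route(question, faq_lookup, user_satisfied, assistants):
    """Try FAQ answers first; fall back to a free assistant with matching expertise."""
    answers = faq_lookup(question)
    if answers and user_satisfied(answers):
        return ("faq", answers)
    for assistant in assistants:
        if not assistant.busy and question.topic in assistant.expertise:
            assistant.busy = True
            return ("chat", assistant.name)
    # Assumption: with no suitable assistant free, the question has to wait.
    return ("queued", None)


assistants = [Assistant("A", {"literature creation"}),
              Assistant("B", {"art creation"})]
question = Question("How do I draw backgrounds?", "art creation")

# The user rejects the FAQ answers, so a chat with assistant B is set up.
result = route(question, lambda q: ["FAQ 12"], lambda answers: False, assistants)
```

Passing `faq_lookup` and `user_satisfied` in as callables keeps the router independent of how FAQ retrieval and user feedback are actually obtained.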

WAS. So far, a Web assistant component has been integrated in the Elfwood WIS only temporarily for study and evaluation purposes. Investigated issues were a system evaluation from the user's point of view [AS01c], the generation and retrieval of FAQ items for computer-based support [AS01b], and the effectiveness of user models in aiding human assistants [ASM01].

The system works as depicted in Figure 1.2. It is called when a registered user sends in a question. Questions consist of a natural language query and a topic category. A support router determines whether computer-based support is sufficient or human assistance will be needed. First, it sends the question to a question answering system, which attempts to retrieve corresponding FAQ answers (cf [AS01b]). If the user decides these do not help, the support router establishes a text-based chat connection with a human assistant whose expertise profile covers the question's topic category. A text chat window will pop up on each side. User and assistant can then freely debate the problem matter. While talking, the assistant views and edits the user's model via an always visible user modelling tool. The more counselling sessions a user takes, the more refined her or his model gets, allowing more adapted and individual support. Aside from the user modelling data, log files of all help dialogue chats are kept.

In its first three-week evaluation run, Elfwood's WAS included 35 voluntary assistants serving one user at a time. Support was given on questions of art and literature creation, art and literature search, as well as operating functions for users and members.

Consultation dialogues. A sample dialogue log file, recorded during the field study, is displayed in Figures 1.3 to 1.5 (with names anonymised). Note that tags have automatically been inserted for structuring purposes; the actual conversation begins after . Previous lines give the question's topic category () and natural language query (). and enclose FAQ matching results. The consultation dialogue centres around a user's difficulties with drawing backgrounds (thus art creation), leading to a lively discussion about one picture in her mind, concluding with the assistant offering useful inspiration, which is not something easily accomplished by computer-based advice. An attentive assistant can notice several details in the chat dialogue which


W 2000-4-4 - 3
