Knowledge Assimilation and Web Deployment

Knowledge Assimilation and Web Deployment Techniques for Conversational Agents

Elizabeth Figa
School of Library and Information Sciences
University of North Texas
P.O. Box 311068, Denton, Texas 76203
http://www.unt.edu/slis/people/faculty/figa.htm

Paul Tarau
Department of Computer Science
University of North Texas
P.O. Box 311366, Denton, Texas 76203
http://www.cs.unt.edu/~tarau

Abstract

We describe techniques for building conversational agents that integrate online and offline knowledge bases with logical inference engines. Information sources such as the WordNet knowledge base and a Web content extraction agent using Google’s Web-search API cooperate in a distributed multi-agent component environment. Our agents interact with users over the Web as voice-enabled animated characters, and can reach wireless devices through Instant Messenger protocol adapters.

Keywords: conversational agents, natural language processing, inference engines, knowledge processing, WordNet, agent-based Web services

1. Introduction

Conversational agents [3, 22, 24, 5] have been identified as effective multi-modal interface elements for applications ranging from user support automation to video games and interactive fiction [8, 24, 6]. In the context of today’s pervasive computing, the overall convenience of natural language interaction is a pleasant surprise, despite the long road ahead to human-like perfection, as conversational agents get deployed on channels like instant messengers in environments like wireless PDAs or cell phones.

The availability of high quality lexical knowledge bases such as WordNet [10, 4], FrameNet [1], and common sense ontologies like OpenMind [13, 14] is finally providing a synergistic critical mass for the development of knowledge- and inference-based conversational agents. The ability to reuse, with minimal coding effort, semantic and lexical information from such knowledge bases promises to give a new chance to unconstrained natural language interaction. This paper describes our experience in merging these technologies into a new type of Internet-aware, knowledge intensive conversational agent that provides spoken natural language access to online information sources. Our proof of concept conversational agent is deployed as a voice-enabled character at http://logic.csci.unt.edu:8080/ wordnet agent/frame.html and as the Yahoo Instant Messenger handle jinni agent, a fairly original alternative deployment platform that is also accessible through wireless PDAs or cell phones.

2. An Architecture for Knowledge Intensive Conversational Agents

Our agent application architecture is described in Fig. 1. We give an overview of the components of the architecture and their interactions below.

[Figure 1 components: User Specific Short Term Memory; XML/RDF Meta-data; RDF/XML Parser And Search Module; Web Browser Client User Interface; Internet Information Sources; PROLOG HTTP Server, Spider and Agent Script Processor; PROLOG Natural Language Syntax and Pattern Processor; WORDNET PROLOG Knowledge Base; PROLOG Lexical and Semantic Inference Rules AND WORDNET+FRAMENET Lexical Database Processor; OPEN MIND Inference Processor; PROLOG Graph Algorithms; PROLOG Facts Derived From OPEN MIND Knowledge Base]

Figure 1. A Knowledge Intensive Conversational Agent Architecture

2.1. Server Side Prolog Agents

The agent architecture is centered around Jinni 2004 [19, 17, 16] or BinProlog-based [21, 20] Prolog Server Agents that are able to run as extensions of a Prolog-based Web server. Server agents can run on separate threads and accomplish various functions ranging from Web-based information extraction to dynamic generation of Web pages. At the cost of a few hundred lines of Prolog code (mostly DCG grammar-based) we also provide basic HTTP services. As a result, our agents can be deployed as Jinni applets or BinProlog-based Web services without having to interface to an external Web server.

2.2. Conversational Agents as Voice Enabled Web Services

Our agents are deployed using Prolog Web servers and server side Prolog script processing capabilities. This provides seamless integration between the knowledge base, the shallow script processor and the XML metadata reflected as a Prolog set of ontology-specific facts and rules. We have used Microsoft Agent [9] components embedded in a dynamically generated JavaScript Web page to provide easy integration of client-side voice and animation services (see footnote 1). When using Internet Explorer (see Fig. 2), the dynamically generated Web page triggers automatic download of the Microsoft Agent controls from the Microsoft server on first use. Client-side voice interaction is provided through the SAPI voice API’s text-to-speech component. Specific text and animation commands are generated by our Prolog Server Agent processor, which edits annotations made in a page template. The Pattern Processor, which also provides shallow natural language processing for queries, is used to locate patterns in the template and replace them with content from a Prolog database or an associative list.
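The template-editing step can be illustrated with a minimal Python sketch (the system itself is written in Prolog). The paper does not say how annotations are marked in the page template, so the `{{name}}` slot syntax, the function name `fill_template`, and the sample commands are assumptions made only for this example.

```python
import re

# Hypothetical annotation syntax: the paper does not specify how template
# slots are marked, so {{name}} is an assumption made for this sketch.
SLOT = re.compile(r"\{\{(\w+)\}\}")

def fill_template(template, bindings):
    """Replace each {{name}} slot with bindings[name]; unknown slots stay as-is."""
    def substitute(match):
        key = match.group(1)
        return str(bindings.get(key, match.group(0)))
    return SLOT.sub(substitute, template)

# An associative list of slot names and values, as a plain dictionary.
page_template = "agent.Play('{{animation}}'); agent.Speak('{{answer}}');"
bindings = {"animation": "Greet", "answer": "Hello, I am a conversational agent."}
filled_page = fill_template(page_template, bindings)
print(filled_page)
```

Leaving unknown slots untouched, rather than raising an error, keeps a partially filled page usable when a binding is missing; a stricter policy would be equally reasonable.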

Figure 2. Conversational Agents as Voice Enabled Web Services

2.3. Alternative Interaction Channels: Prolog Agents as Instant Messenger Handles

We have implemented a Yahoo Instant Messenger protocol adaptor which also works with wireless devices such as Pocket PC-based cellular phones. The reader can try this alternate interface by connecting to Yahoo Instant Messenger and initiating a chat with the jinni agent “handle”. The agent uses a Jinni component which embeds the Java-based hamsam protocol adapter available from http://hamsam.sourceforge.net. As an extension to the basic function of the Yahoo IM protocol, our Jinni agents can organize conferences and connect users with one another (see footnote 2).

2.4. PDA and cellular phone deployment through lightweight Jinni applets

On PocketPC PDAs and cellular phones supporting the Jeode Java runtime, a lightweight Jinni 2004 remote shell can also be used for text-based interaction, by connecting to the agent server using remote predicate calls over TCP/IP sockets. The applet is about 15K and it runs under the Java-enabled browsers that come standard with the latest generation of PDAs and cellular phones. The remote predicate call layer simply wraps user input in an ASCII representation of Prolog remote call predicates that are parsed by the server into Prolog queries.

2.5. Sharing a Conversational Agent Server

A remote predicate call based client-server interface makes it possible to share an agent server, which runs on top of a multi-threaded Prolog system like BinProlog [18] or Jinni 2004 [19], among several “heterogeneous” clients ranging from browsers to command line shells and Instant Messenger windows.

Footnote 1. While Microsoft Agents have the advantage of easy deployment on Windows platforms, the user interface module can easily accommodate alternative interaction forms.

Footnote 2. In particular, experiments that interleave human-to-human Instant Messenger conversation channels with agent-to-human and agent-to-agent channels are possible, with surprising social effects in which users became progressively less sure which side of the Turing test they were on at a given time.

3. Classes of Conversational Agents

On the server side, several types of conversational agents compete for interaction with the user when a query is received. We have designed a multi-agent cooperation model built around a finite state top-level control-dispatching loop that can also delegate actions to remote components built in Java, using Jinni 2004’s and BinProlog’s remote predicate call interface.

3.1. WordNet Knowledge Agents

A systematic representation of the English lexicon, based on psycholinguistic considerations, has been assembled in the WordNet [10, 4] database. Inference-enabled Prolog agents provide quick access to the semantic and lexical knowledge provided by WordNet. This database is also available in Prolog form (see http://www.cogsci.princeton.edu/~wn) and is therefore ready to be used as part of a rule-based inference system. WordNet maps word forms and word meanings as a many-to-many relation. An important characteristic of WordNet is that semantic relations (hypernymy, hyponymy, synonymy, meronymy etc.) are defined between meanings rather than between words or word phrases. Meanings are represented by integers (called synsets) associated with sets of words and word phrases that collectively define a sense element (concept, predicate or property) and that are also usable for indexing. So, for example, the synset Id=108024371 maps to the following list of words and word phrases: [[actor], [histrion], [player], [thespian], [role, player]], which collectively define a common meaning. At the same time, actor also maps to alternate meanings like those associated with synset 108026056, which represents [[actor], [doer], [worker]].

3.1.1. An Efficient Bidirectional Word Phrase to Meaning Mapping

We have refactored the set of predicates provided by WordNet, closely following the WordNet relation set (see http://www.cogsci.princeton.edu/~wn/doc.shtml), to support bidirectional constant time access to the set of meanings associated with a given word phrase (indexed by a unique head word) and to the set of word phrases and relations associated with a given (unique) meaning. Definitions and examples originally present in WordNet are preparsed so that they can be processed efficiently, if needed, at runtime. We also collect frequency information and word forms not present in the form of WordNet entries.

Note also the presence of reversed relations like hyponyms (reverse hypernyms) and reverse meronyms. These are precomputed to support high performance graph walk operations by providing constant time access to the edges related to a given node for a given relation. We have also precomputed mappings from word variants to related meanings, based on the dictionary entry they belong to. Finally, we have precomputed “toplevel” nouns and verbs (which do not have further hypernym links) and extracted a toplevel ontology [15] by running the PageRank link-analysis algorithm [11, 2] on the WordNet graph. This refactoring provides sets of facts with the following properties:

• given a head word (or a word phrase), we can extract in constant time all related information about its meanings (used in efficient parsing of textual data to lists of synsets)
• given a meaning, we can extract in constant time all relations and related meanings or words provided by the WordNet database

Overall, our refactoring simplifies WordNet while providing an efficient inference engine through Prolog rules that can digest the information contained in these basic relations.
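As a rough illustration of the bidirectional mapping and of precomputed reverse relations, the following Python sketch uses hash tables, which give average constant time lookup. It is not the authors' Prolog fact base: the function names are hypothetical, the head word is assumed to be the first token of a phrase, and the dog/wolf/canine edges are a toy relation that was not extracted from WordNet. Only the two synsets are taken from the text.

```python
from collections import defaultdict

# The two synsets quoted in the text; word phrases are tuples of tokens.
SYNSETS = {
    108024371: [("actor",), ("histrion",), ("player",), ("thespian",), ("role", "player")],
    108026056: [("actor",), ("doer",), ("worker",)],
}

def build_indexes(synsets):
    """Return (head word -> set of synset ids, synset id -> word phrases)."""
    by_head = defaultdict(set)
    for synset_id, phrases in synsets.items():
        for phrase in phrases:
            # Assumption: the head word is the first token of the phrase.
            by_head[phrase[0]].add(synset_id)
    return dict(by_head), dict(synsets)

def reverse_relation(edges):
    """Precompute the inverse of a relation given as {source: [targets]}."""
    inverse = defaultdict(list)
    for source, targets in edges.items():
        for target in targets:
            inverse[target].append(source)
    return dict(inverse)

by_head, by_synset = build_indexes(SYNSETS)
meanings_of_actor = sorted(by_head["actor"])       # word -> meanings
phrases_of_meaning = by_synset[108026056]          # meaning -> word phrases

# Toy hypernym edges (illustrative only), reversed into hyponym edges.
hyponyms = reverse_relation({"dog": ["canine"], "wolf": ["canine"]})
```

Storing the reversed edges up front trades memory for lookup speed, which matches the precomputation strategy described above.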

3.2. Reactive Agents

Our reactive agents are implemented as actions attached to a pattern processor.

3.2.1. A Definite Clause Grammar-based Pattern Processor

We have designed a generic Definite Clause Grammar-based Pattern processor which works on arbitrary data (character codes, tokens, sentences) to detect and aggregate patterns at a given syntactic level. The predicate match_pattern(Pattern,InputList) matches Pattern against InputList. Pattern can contain any combination of constants, constrained variables of the form X:P where P is a predicate about X, as well as Gap variables which match arbitrary sequences located between constants and constrained variables. Note that constants and constrained variables match single items and function as known index elements in the InputList, while Gap variables collect the text to be retrieved. If patterns contain other patterns, the mechanism allows recursive application. The code actually handles more powerful annotations (i.e. regular expressions and disjunctive patterns) which have proven very useful in applications like text mining and Internet content processing.

3.2.2. Using Prolog Graph Algorithms

In the directed graph G of meaning nodes N_G, we can see the various WordNet relations as defining elements of a set of edges E_G. We have implemented several derived relations as depth-K walks in such graphs. Besides the primitive WordNet relations, we use closures (see footnote 3) as edge generators.

Footnote 3. Our closures are predicate name+argument combinations, which receive two graph nodes as extra arguments to make up callable predicates.

3.2.3. Making Semantic Connections through Least Common Ancestors

To explore various semantic relations provided by or inferred from WordNet, we have implemented a generic graph walking algorithm which uses arbitrary relations to dynamically define edges and applies arbitrary conditions to the nodes. Among the interesting applications to chat agents, a fast least common ancestor computation provides answers such as:

guest: Relate work and fun.
agent: After scratching my head 1 times, relating work to fun makes me think about space walk.

Unfortunately, on an example like the following, with depth limited to 3, our agent will try to save face after not finding a connection (see footnote 4):

guest: Relate logic and love!
agent: About love. Love is a kind of emotion, are you thinking about any strong feeling?

Footnote 4. In fact, the same happens at level 12. And that’s because in WordNet’s “view”, there is no connection.

3.2.4. Generating Metaphors

Given the several different semantic relations between synsets, we can see WordNet as a Category [7] having synsets as Objects and semantic relations as Morphisms. Metaphors are then described as commutative diagrams induced by applying a Functor mapping the WordNet category into itself. Here is the output of an auxiliary predicate which prints out sentences produced by our metaphor generator.

?- metaphore_test(logic,programming).
[logic,is,to,common,sense,as,programming,is,to,creating,by,mental,acts]
[logic,is,to,sense,as,programming,is,to,creating,by,mental,acts]
[logic,is,to,good,sense,as,programming,is,to,planning]
[logic,is,to,sense,as,programming,is,to,planning]
[logic,is,to,philosophy,as,programming,is,to,planning]
[logic,is,to,system,of,rules,as,programming,is,to,creating,by,mental,acts]
[logic,is,to,principle,as,programming,is,to,creating,by,mental,acts]
[logic,is,to,principle,as,programming,is,to,planning]

On the more difficult topic of “logic and love”, among the one hundred or so eligible metaphors the agent picked:

guest: Compare logic and love!
agent: Logic is to common sense as love is to sexual activity.

likely as the result of a hypernym-based commutative diagram. Here are a few more answers to the same query, in raw Prolog form:

[logic,is,to,system,of,rules,as,love,is,to,physical,attraction]
[logic,is,to,system,as,love,is,to,lover]
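One way to read the commutative-diagram idea is that the same relation is applied to both words and the two results are paired. The Python sketch below shows only this reading and is not the authors' metaphor generator: the function name `analogies` is hypothetical, and the small relation table echoes phrases from the quoted dialogs, with their assignment to the hypernym relation being an assumption.

```python
from itertools import product

# Toy relation table. The word pairs echo phrases from the dialogs quoted in
# the text; filing them under "hypernym" is an assumption for this sketch.
RELATIONS = {
    "hypernym": {
        "logic": ["philosophy"],
        "programming": ["planning"],
        "love": ["emotion"],
    },
}

def analogies(source, target, relations):
    """Build 'A is to R(A) as B is to R(B)' sentences, using the same relation R on both sides."""
    sentences = []
    for edges in relations.values():
        pairs = product(edges.get(source, []), edges.get(target, []))
        for related_source, related_target in pairs:
            sentences.append(
                f"{source} is to {related_source} as {target} is to {related_target}"
            )
    return sentences

logic_programming = analogies("logic", "programming", RELATIONS)
logic_love = analogies("logic", "love", RELATIONS)
```

With a full relation graph the candidate list grows quickly (the text mentions about one hundred eligible metaphors for one query), so a real generator would also need a way to pick among them.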

3.3. Goal Driven Agents

Interesting conversations are the result of mixed initiative. This suggests adding a goal generation mechanism that can plan a consistent thread of interaction in which the agent tries to control the flow of conversation. The mechanism works as a separate thread, in such a way that more urgent user requests, detected for instance by the reactive agents, are prioritized. We have prototyped this concept with three simple goals:

• a meaning guessing game
• an age guessing game
• an anagram guessing game
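The prioritization of user requests over agent goals can be pictured with a small agenda. The Python sketch below is single-threaded and so simplifies the separate-thread mechanism described above; the class name `Agenda`, the two priority levels, and the sample actions are all assumptions for illustration.

```python
import heapq
from itertools import count

# Assumed priority levels: lower numbers are served first.
USER_REQUEST, AGENT_GOAL = 0, 1

class Agenda:
    """Pending actions ordered by priority, then by arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker within one priority level

    def add(self, priority, action):
        heapq.heappush(self._heap, (priority, next(self._arrival), action))

    def next_action(self):
        """Pop the most urgent action, or return None when nothing is pending."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

agenda = Agenda()
agenda.add(AGENT_GOAL, "continue the age guessing game")
agenda.add(USER_REQUEST, "answer: Relate work and fun.")
first = agenda.next_action()
second = agenda.next_action()
```

The user request is served first even though the game goal was queued earlier, and the game resumes once no request is pending.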

3.4. Using the Short Term Memory Database

Our agents share a user-specific short term database consisting of parsed question/answer pairs, together with the finite state machine tag indicating the context in which they occurred. This database is text-mined for dialog fragments that can help disambiguate the question about to be answered, suggest new threads of conversation, and set new goals.

3.5. The Persistence Agent

The Persistence Agent periodically saves the state of various databases and keeps track of users based on their login and password based persistent identities. This allows resuming conversations from the point at which they were left in a previous interaction session.

3.6. Ontology Specific Agents

Ontology specific agents can focus on the domain of the discourse or on the speech acts of the conversation itself.

3.6.1. Interaction Ontology

For a successful and credible interaction with a human user, agents have to be aware of references to the participants in the dialog and their feelings, emotions, requests for services etc., materialized as speech acts [12, 23]. Usually the presence of personal pronouns is an indication of such an ontology. A special purpose pattern-based parser follows up if such hints are detected. As a result, the dialog state machine enters a personal questions mode where a combination of short term memory and reactive rules is used to guess the focus of the conversation.

3.6.2. Context Ontology

Ontology specific agents, specific to a domain or context, are implemented using XML/RDF descriptions together with pattern-based generic conversation scripts. In the virtual storytelling application described in [6], in addition to the XML/RDF metadata created by human indexers for each of the stories, we have developed a set of relevant questions that are matched to all of the metadata fields. Some are generic-use question templates that work for every metadata tag and some are subject-specific, to cover questions commonly posed to the conversational agent. Both question sets are expanded using the semantic relations provided by WordNet to cover synonyms and coordinate terms, combining them in alternative forms so that the agent can answer questions about a given story that differ in form but have the same meaning. The resulting dialog provides a fairly natural interaction based on a specific ontology (information related to a specific story, for example).

agent: Hello Elizzie. I am a conversational agent who knows about things, words and people. Chat with me!
Elizzie: What is the title of the story?
agent: The Ant and His Treasure.
Elizzie: What is the subject of the story?
agent: Learning to rely on yourself; try, try again.
Elizzie: Who was telling?
agent: I know about telling, disclosing unintentionally.
Elizzie: Who was telling the story?
agent: Shelly Kneupper.
Elizzie: What is the setting?
agent: Anthill, nature.
Elizzie: Who are the characters?
agent: Ant, bee, cockroach, spider.
Elizzie: How does the story end?
agent: An ant finds a breadcrumb, and attempts to enlist the help of various other animals to transport it to his home but he learns self-reliance from his trials and errors.

We refer to [6] for a more detailed description of storytelling agents interacting with a user about the content of a collection of stories.
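To make the matching of questions to metadata fields concrete, here is a minimal Python sketch. The metadata values come from the dialog above, but the field names, the small synonym table (a stand-in for WordNet-based expansion), and the keyword lookup are assumptions; the actual system uses XML/RDF descriptions and pattern-based scripts.

```python
# Metadata values are taken from the dialog in the text; the field names and
# the synonym table are assumptions standing in for the XML/RDF tags and for
# WordNet-based expansion.
STORY = {
    "title": "The Ant and His Treasure",
    "setting": "Anthill, nature",
    "characters": "Ant, bee, cockroach, spider",
}
SYNONYMS = {
    "title": ["name"],
    "setting": ["scene", "place"],
    "characters": ["protagonists"],
}

def build_keyword_index(fields, synonyms):
    """Map each field name and each of its synonyms to the field name."""
    index = {}
    for field in fields:
        for word in [field] + synonyms.get(field, []):
            index[word] = field
    return index

def answer(question, story, index):
    """Return the value of the first metadata field mentioned in the question."""
    words = question.lower().strip("?!. ").split()
    for word in words:
        field = index.get(word)
        if field is not None:
            return story[field]
    return None  # no metadata field matched; another agent should take over

INDEX = build_keyword_index(STORY, SYNONYMS)
print(answer("What is the name of the story?", STORY, INDEX))
```

Returning `None` when no field matches leaves room for the other agent types to compete for the query, in the spirit of the dispatching model of Section 3.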

3.7. Enhancing Conversations with Web Mining Agents

Clearly, there are situations, like questions about recent events or contemporary personalities, which are not covered by knowledge bases like WordNet or Open Mind. For instance, during an experiment performed at the very moment the 2004 Oscar prizes were adjudicated, our Google API-enabled Web agent was ready to provide the relevant information from real-time online news sources:

guest: Who won the 2004 Oscar for best movie?
agent: 2004 hobbit; 2004 for happen battle Oscar history fantasy film ever won best picture

The Java-based Google API adaptor works as a remote Jinni 2004 component to which queries that cannot be handled by local agents are delegated. We expect significant improvements in this component from the advancement of the text mining and meta-search efforts of our research group. A Jinni-based adaptor calls the Google API using a Java Reflection-based interface. This Jinni component starts a server to which various other chat agent components send queries. Query answers (provided in groups of 10 by the Google API) are cached by the clients and processed as needed in their local conversational context. The client agent is also responsible for optionally expanding a query using WordNet-provided synonyms and coordinate terms, and then ranking the returned answers based on their intersection with the lexical material and the synset set of the query.

4. Conclusion

This paper has described an agent architecture supporting multiple-source knowledge assimilation and multi-channel deployment of knowledge-intensive conversational agents. The architecture uses a Prolog Server Agents-based Web interface, XML/RDF-based Semantic Web data for specific ontologies, and a Prolog-based natural language and knowledge processor. The architecture merges these technologies to build voice-enabled, easy to use end user applications with significant knowledge processing capabilities. The synergetic effect of the integrated lexical/semantic knowledge base and our specialized inference algorithms will help in developing a new generation of chatbots for interactive fiction environments as well as for uses in online teaching, user support, information retrieval and video game authoring.

References

[1] C. F. Baker, C. J. Fillmore, and J. B. Lowe. The Berkeley FrameNet project. In C. Boitet and P. Whitelock, editors, Proceedings of the Thirty-Sixth Annual Meeting of the Association for Computational Linguistics and Seventeenth International Conference on Computational Linguistics, pages 86–90, San Francisco, California, 1998. Morgan Kaufmann Publishers.
[2] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7):107–117, 1998. http://citeseer.nj.nec.com/brin98anatomy.html.
[3] M. Cavazza, F. Charles, and S. Mead. Characters in Search of an Author: A.I. Based Virtual Storytelling. In Proceedings of the International Conference on Virtual Storytelling, Avignon, France, Sept. 2001.
[4] C. Fellbaum. WordNet, an Electronic Lexical Database for English. Cambridge: MIT Press, 1998.
[5] E. Figa and P. Tarau. Story Traces and Projections: Exploring the Patterns of Storytelling. In N. Braun and U. Spierling, editors, TIDSE’2003, Darmstadt, Germany, Mar. 2003.
[6] E. Figa and P. Tarau. The VISTA Project: An Agent Architecture for Virtual Interactive Storytelling. In N. Braun and U. Spierling, editors, TIDSE’2003, Darmstadt, Germany, Mar. 2003.
[7] M. Fokkinga. A gentle introduction to category theory — the calculational approach. In Lecture Notes of the STOP 1992 Summerschool on Constructive Algorithmics, pages 1–72 of Part 1. University of Utrecht, Sept. 1992.
[8] M. Mateas and A. Stern. Integrating Plot, Character and Natural Language Processing in the Interactive Drama Facade. In N. Braun and U. Spierling, editors, Proceedings of the Technologies for Interactive Digital Storytelling and Entertainment Conference, Darmstadt, Germany, Mar. 2003.
[9] Microsoft. The Microsoft Agent Home Page. http://www.microsoft.com/msagent.
[10] G. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. Five papers on WordNet. CSL Report 43, Cognitive Science Laboratory, Princeton University, July 1990.
[11] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford Digital Library Technologies Project, 1998.
[12] J. R. Searle. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press: Cambridge, England, 1969.
[13] P. Singh. The public acquisition of commonsense knowledge. In Proceedings of AAAI Spring Symposium on Acquiring (and Using) Linguistic (and World) Knowledge for Information Access, Palo Alto, California, 2002. AAAI.
[14] P. Singh. The Open Mind Common Sense. Technical report, M.I.T Media Lab, 2003. http://commonsense.media.mit.edu.
[15] J. F. Sowa. Relating templates to logic and language. In M. T. Pazienza, editor, Information Extraction: Towards Scalable, Adaptable Systems, LNAI 1714, pages 76–94. Springer-Verlag, 1999. http://users.bestweb.net/~sowa/direct/template.htm.
[16] P. Tarau. Inference and Computation Mobility with Jinni. In K. Apt, V. Marek, and M. Truszczynski, editors, The Logic Programming Paradigm: a 25 Year Perspective, pages 33–48. Springer, 1999. ISBN 3-540-65463-1.
[17] P. Tarau. Fluents: A Refactoring of Prolog for Uniform Reflection and Interoperation with External Objects. In J. Lloyd, editor, Proceedings of CL’2000, London, July 2000. LNCS, Springer-Verlag.
[18] P. Tarau. BinProlog 10.x Professional Edition: User Guide. Technical report, BinNet Corp., 2004. Available from http://www.binnetcorp.com/BinProlog.
[19] P. Tarau. The Jinni 2004 Prolog Compiler: a High Performance Java and .NET based Prolog for Object and Agent Oriented Internet Programming. Technical report, BinNet Corp., 2004. URL: http://www.binnetcorp.com/download/jinnidemo/JinniUserGuide.html.
[20] P. Tarau and V. Dahl. A Logic Programming Infrastructure for Internet Programming. In M. J. Wooldridge and M. Veloso, editors, Artificial Intelligence Today – Recent Trends and Developments, pages 431–456. Springer, LNAI 1600, 1999. ISBN 3-540-66428-9.
[21] P. Tarau and V. Dahl. High-Level Networking with Mobile Code and First Order AND-Continuations. Theory and Practice of Logic Programming, 1(3):359–380, May 2001. Cambridge University Press.
[22] R. Wallace. AIML Pattern Matching Simplified. Technical report, A.L.I.C.E. AI foundation, 2002. Available at http://alice.sunlitsurf.com/documentation/matching.html.
[23] E. Werner. A formal computational semantics and pragmatics of speech acts. In Proceedings COLING–88, pages 744–749, 1988.
[24] M. Zancanaro, A. Cappelletti, and C. Signorino. Interactive Storytelling: People, Stories, and Games. In Proceedings of the International Conference on Virtual Storytelling, Avignon, France, Sept. 2001.
