2006 International Conference on Hybrid Information Technology November 9th ~ 11th, 2006
Action Representation for Natural Language Interfaces to Agent Systems

Christel Kemke
Department of Computer Science, University of Manitoba
e-mail: [email protected]

Abstract. In this paper, we outline a framework for the development of natural language interfaces to agent systems, with a focus on action representation. The architecture comprises a natural language parser and a case-frame-based analysis that yields a semantic representation of the linguistic content of the input. The knowledge base, used as the core instance of the mapping and interpretation process, features a representation of actions and related objects in a conceptual hierarchy, which is suited to providing a connection to the artificial agent's repertoire of actions. The framework thus features representations of actions specifically designed to link linguistic inputs of the human user to the action set of an artificial agent. The framework has been employed in the development of various agent systems and their natural language interfaces, including simulated household robots, an interior design system, a travel planner, a cook, and a remote-controlled toy car.

Keywords: Natural Language Interfaces, Artificial Agents, Agent Communication, Action Representation, Smart House, Context Models

1. Background

Research on artificial agents and multi-agent systems has attracted considerable attention in recent years, with the development of suitable physical, hardware agents as well as software tools and techniques for programming agent systems. Specific branches of this research are autonomous agents, e.g. unmanned vehicles or smart home applications, and multi-agent systems, as exemplified e.g. in robotic soccer. Our research is situated in the context of human-controlled, semi-autonomous agents, and in particular the verbal communication between human and artificial agent. So far, agent communication languages like ACL or KQML have been developed and successfully used to implement communication protocols for agent systems. Nevertheless, the representation of the contents of communicated messages is not addressed in these languages and thus remains an open problem, to be resolved by the designer of a specific agent implementation. We suggest a general framework for the representation of actions, which is suitable as a basis for the interpretation of natural language commands, queries, and statements on the one hand, as well as for the connection to an agent system and its repertoire of executable actions on the other. The core idea of this approach is to provide action descriptions based on a taxonomic hierarchy of action concepts, which allows search and selection processes for specified actions, combined with an action representation formalism suitable for reasoning and planning processes. Using the approach and framework described in this paper, we developed interfaces for a variety of simulated agent systems, including several household agents, a receptionist and guide for the computer science department, and an interior design system.
2. Overview

A core issue in the development of natural language interfaces to agent systems is a representation of the domain ontology, i.e. the actions and objects central to the agent system, in a format suitable both for representing the contents of natural language inputs and for providing a connection to the agent system's repertoire of actions. The representation also has to serve as a foundation for essential reasoning and inference tasks. We therefore chose a combination of a case-frame-like representation motivated by Fillmore's work [6,7], and a formalism based on KL-ONE [3] and the more recent school of description logics [2,16].

2.1 System Architecture

The overall system architecture (see Figure 1) provides an analysis of natural language inputs, separating different types of verbal expression (i.e. speech acts), such as commands, questions, and statements. Based on a syntactic parsing process and a rough content-based analysis, a frame representation of the input is constructed. This frame representation is centred around the action addressed in the verbal input, which the artificial agent is supposed to perform. Additional elements of the analyzed input, such as noun phrases specifying objects to act on or prepositional phrases specifying locations, serve as fillers for slots in the frame structure. This frame is mapped onto the domain representation, which consists of action concepts arranged in a taxonomic hierarchy. The hierarchical organization allows a directed, content-based search for an action concept matching the action described in the input. This realizes a mapping from the initial verbal input, through a structured representation in terms of case frames, onto the formal ontology of actions represented in the knowledge base. On the generic level, this process comprises a mapping of linguistic terms to standard terms and concepts of the ontology, such as generic action and object names, thereby eliminating linguistic ambiguity and vagueness.
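The mapping from parsed input to a case frame described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation; the dictionary names (`VERB_TO_ACTION`, `NOUN_TO_CONCEPT`, `to_case_frame`) are hypothetical, and the parse structure mirrors the "Switch the TV on." example of Figure 1.

```python
# Simplified parse of "Switch the TV on." (cf. Figure 1).
parse = {
    "sentence-type": "command",
    "head-verb": "switch",
    "object-NP": {"det": "the", "noun": "TV"},
    "compl-prep": "on",
}

# Map linguistic terms onto generic ontology terms, eliminating
# surface variation (e.g. "switch" and "turn" both map to switch).
VERB_TO_ACTION = {"switch": "switch", "turn": "switch"}
NOUN_TO_CONCEPT = {"TV": "TV", "television": "TV"}

def to_case_frame(parse):
    """Build the frame that is handed to the frame interpreter."""
    return {
        "action": VERB_TO_ACTION[parse["head-verb"]],
        "object": NOUN_TO_CONCEPT[parse["object-NP"]["noun"]],
        "mode": parse["compl-prep"],   # e.g. on / off
    }

print(to_case_frame(parse))
# {'action': 'switch', 'object': 'TV', 'mode': 'on'}
```

The frame interpreter would subsequently resolve the generic concept TV to a concrete instance such as TV-1 against the knowledge base.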
Input: "Switch the TV on."
    |  Syntactic Analysis
    v
sentence-type: command
head-verb: (verb switch)
object-NP: (NP (det the) (noun TV))
compl-prep: (prep on)
    |  Frame Interpreter
    v
action: switch
object: TV-1
mode: on
    |  Knowledge Base / Action Interpreter
    v
action: switch
object: TV-1
state: on
effect: TV-1.state := on

Figure 1. Processing of a natural language input

2.2 Knowledge Representation

The knowledge base itself contains not only a hierarchy of action-concepts but also the relevant object-concepts, which serve as representations of physical objects in the domain. These object-concepts are described through a set of features and roles, which capture attributes of objects, e.g. colour, size, or location, as well as relations between objects, like spatial relations, e.g. left-of or ontop-of. Effects of actions can now be described in terms of the changes they cause in the environment, i.e. changes of feature- or role-values of these objects. We give a generic move-action as an example in Figure 2.

This action is specified as a concept through a set of parameters referring to the object to be moved, its current location, and its future location, respectively. The effect of the action is formulated through standard logical formulas, in a STRIPS-like fashion as in [4,24]. In the example of the move-action, the action can be applied only to objects of type moveable_object (which is again a concept in the knowledge base), and the effect of the action is the assignment of a new location-value to the loc-feature of the object. Figure 2 shows the move-action description. The top line shows it as a move-function with parameters in ADL-style notation [], followed by the structured representation in our representational format. Note that moveable_object and location refer to concepts in the knowledge base; obj_1 refers to an instance of moveable_object, i.e. it is a variable standing for a concrete object in the domain, and loc_1 refers to the new value of the loc-feature of this object, i.e. its new location specification. This will be instantiated with a specific location-value when the action is instantiated, i.e. when concrete values have been substituted for the parameters obj_1 and loc_1 as a result of the natural language processing and interpretation components within the system. The instantiated action description can then be passed on to a robot or a controlling system for action execution.

move (obj_1: moveable_object; loc_1: location)

action: move
object: obj_1 of moveable_object
destination: loc_1 of location
effect: obj_1.loc := loc_1

Figure 2: Representation of move-action as function and structured concept

For the object-part of the conceptual hierarchy, we determine a taxonomy based on an extensional semantics and set inclusion, as used in the definition of the semantics of standard first-order predicate logic or description logics [], i.e. a concept C is a sub-concept of another concept D if all objects which are in (the interpretation of) C are also in (the interpretation of) D, i.e. I(C) ⊆ I(D). For example, TV is a sub-concept of electronic_device, which in turn is a sub-concept of physical_object. If we want to specify that TVs can be moved, the concept TV needs to be defined in addition as a sub-concept of moveable_object (Figure 3).
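The move-action of Figure 2, with its typed parameters and STRIPS-like effect on the loc-feature, can be rendered as a short sketch. This is an illustrative translation into Python under assumed names (`Instance`, `move`); the paper's own formalism is the structured concept notation shown above.

```python
class Instance:
    """A domain object with its concept memberships and feature values."""
    def __init__(self, name, concepts, **features):
        self.name = name
        self.concepts = set(concepts)   # e.g. {"TV", "moveable_object"}
        self.features = features        # e.g. loc="shelf"

def move(obj_1, loc_1):
    """move(obj_1: moveable_object; loc_1: location)."""
    # Applicability condition: obj_1 must be a moveable_object.
    assert "moveable_object" in obj_1.concepts, "object is not moveable"
    # Effect, STRIPS-like: obj_1.loc := loc_1
    obj_1.features["loc"] = loc_1

# Instantiation: concrete values substituted for obj_1 and loc_1.
tv_1 = Instance("TV-1", {"TV", "moveable_object"}, loc="shelf")
move(tv_1, "table")
print(tv_1.features["loc"])  # table
```

The instantiated call `move(tv_1, "table")` corresponds to the action description that would be passed on to a robot or controlling system for execution.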
Figure 3: Part of the object-concept hierarchy (physical_object with sub-concepts electronic_device and moveable_object; TV as a sub-concept of both; TV-1 as an instance of TV)
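Given an explicitly stored hierarchy like the one in Figure 3, the extensional sub-concept relation I(C) ⊆ I(D) reduces to reachability along the super-concept links. The following sketch (assumed names `SUPERS`, `is_subconcept`) illustrates this:

```python
# Hierarchy of Figure 3: each concept maps to its direct super-concepts.
# TV is declared under both electronic_device and moveable_object.
SUPERS = {
    "electronic_device": ["physical_object"],
    "moveable_object": ["physical_object"],
    "TV": ["electronic_device", "moveable_object"],
}

def is_subconcept(c, d):
    """True iff c is d or c lies below d in the taxonomy."""
    if c == d:
        return True
    return any(is_subconcept(s, d) for s in SUPERS.get(c, []))

print(is_subconcept("TV", "physical_object"))              # True
print(is_subconcept("electronic_device", "moveable_object"))  # False
```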
Note that in our object taxonomy, moveable_object is a sub-category of physical_object, which means that every moveable object is a physical object but not necessarily vice versa (see Figure 3). We can also specify that certain objects in the domain, e.g. a specific appliance or device in a house, belong to a certain concept. In Figure 3, we added an object with the identifier TV-1 as an instance of TV. Action as well as object concepts are described through feature- and role-specifications, which are inherited downwards in the hierarchy or added at any level. For example, some electronic devices can have a feature called state, which can take the Boolean values on or off. Figure 4 shows switchable_device inserted in the hierarchy as a sub-concept of electronic_device and super-concept of TV. TV inherits the state-feature from this new concept.

Figure 4: New concept switchable_device with added state-feature (electronic_device → switchable_device, with feature state: Boolean_Value → TV → instance TV-1)

The hierarchical arrangement of object-concepts also induces a hierarchy of action-concepts. We can, for example, define a generic action switch, which can be applied to all types of switchable_device. We can further define a sub-action of switch by restricting the device to TV only. This new switch-action for TVs could involve the use of a special device like a remote control, defined as a concept RC and referred to in the description of the TV-concept. Then we can define an action-concept switch_TV, dedicated to switching TVs on or off, making use of the remote control. Base-level actions in this hierarchy typically refer to the higher-level actions of agent systems, for example, a move-grasper action of a robot, which moves the robot's grasper from one position to another.
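The downward inheritance of feature-specifications described above can be sketched as follows, using the Figure 4 example; the names (`SUPERS`, `OWN_FEATURES`, `features`) are assumptions for illustration only:

```python
# Hierarchy of Figure 4: switchable_device sits between
# electronic_device and TV.
SUPERS = {
    "switchable_device": ["electronic_device"],
    "TV": ["switchable_device"],
}

# Features introduced at a given concept; switchable_device
# introduces the state-feature with Boolean values.
OWN_FEATURES = {
    "switchable_device": {"state": "Boolean_Value"},
}

def features(concept):
    """Collect features introduced at the concept or inherited
    from any super-concept further up the hierarchy."""
    collected = {}
    for sup in SUPERS.get(concept, []):
        collected.update(features(sup))
    collected.update(OWN_FEATURES.get(concept, {}))
    return collected

print(features("TV"))  # {'state': 'Boolean_Value'}
```

TV thus obtains the state-feature without declaring it itself, exactly as the paper describes for inherited feature-specifications.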
3. Using the Concept Hierarchy

Starting with a representation of the analyzed natural language input as shown in Figure 1, a frame structure is derived which eliminates most of the linguistic ambiguity in the input. This process is guided by a mapping of similar words onto specified case frames and roles, and by additional selection processes based on the concepts defined in the knowledge base. A search in the conceptual hierarchy using the given frame structure resolves remaining interpretation issues regarding the description of the action and the involved objects. The outcome of this process is an interpretation of the natural language input, yielding a description of an action in the format described above (Section 2.2).
The resulting action representation is thus in a format which can be used directly by the respective artificial agent, e.g. a robot or software agent, or easily translated into its action repertoire. Since the action-specifications, which are provided as concepts in the knowledge base, can be selected and defined with a view to the artificial agent's action repertoire, a mapping to the agent's higher-level actions is straightforward.
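One way to realize the search for a matching action concept is to collect all action concepts whose object restriction covers the frame's object and prefer the most specific one. This is an illustrative strategy, not the paper's stated algorithm; all names (`ACTIONS`, `select_action`, the depth heuristic) are assumptions.

```python
# Action hierarchy from Section 2: switch_TV specializes switch by
# restricting the device to TV.
ACTIONS = {
    "switch":    {"object": "switchable_device", "specializes": None},
    "switch_TV": {"object": "TV", "specializes": "switch"},
}
SUPERS = {"TV": ["switchable_device"]}

def is_a(c, d):
    return c == d or any(is_a(s, d) for s in SUPERS.get(c, []))

def select_action(frame):
    """Pick the most specific action whose object restriction
    subsumes the object named in the input frame."""
    candidates = [name for name, spec in ACTIONS.items()
                  if is_a(frame["object"], spec["object"])]
    def depth(name):  # distance to the hierarchy's root action
        d = 0
        while ACTIONS[name]["specializes"]:
            name, d = ACTIONS[name]["specializes"], d + 1
        return d
    return max(candidates, key=depth)

print(select_action({"action": "switch", "object": "TV", "mode": "on"}))
# switch_TV
```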
4. Conclusion

We outline in this paper a framework for the development of natural language interfaces to agent systems, with a focus on an integrated action and object taxonomy using a specific format for action descriptions, which allows the construction of a conceptual hierarchy as a knowledge base for agent systems. We developed several prototypes of agent systems with natural language interfaces, including software agents as well as physical agents, using this approach and framework [9,11,13,14]. The method of describing actions in a taxonomy has also been used by Walker [23] for a planning algorithm integrating action abstraction and plan decomposition hierarchies. A formal semantics for action- and object-concepts, and accompanying classification algorithms which ensure that the taxonomic hierarchy is coherent and consistent, have been described earlier [10,11]. We intend to do more detailed work in the future on the interface to robotic agents, in order to provide a generic, universally adaptable interface for robots based on typical robotic actions and controls. Another aspect is the connection of the natural language interface to standard lexical and ontology systems, for example WordNet and FrameNet, in order to achieve more breadth and standardization on the linguistic level.
References

1. J. F. Allen, et al., The TRAINS Project: A Case Study in Defining a Conversational Planning Agent. J. of Experimental and Theoretical AI, 7, 1995, pp. 7-48.
2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. Patel-Schneider (eds.), The Description Logic Handbook, Cambridge University Press, 2003.
3. R. J. Brachman and J. G. Schmolze, An Overview of the KL-ONE Knowledge Representation System. Cognitive Science, 9(2), pp. 171-216, 1985.
4. P. T. Devanbu and D. J. Litman, Taxonomic Plan Reasoning. Artificial Intelligence, 84, pp. 1-35, 1996.
5. B. Di Eugenio, An Action Representation Formalism to Interpret Natural Language Instructions. Computational Intelligence, 14, pp. 89-133, 1998.
6. C. F. Baker, C. J. Fillmore and J. B. Lowe, The Berkeley FrameNet Project. COLING-ACL, Montreal, Canada, 1998.
7. C. Fillmore, The Case for Case. In E. Bach and R. Harms (eds.), Universals in Linguistic Theory, pp. 1-90, Holt, Rinehart and Winston, New York, 1968.
8. D. Jurafsky and J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000.
9. C. Kemke, Speech and Language Interfaces for Agent Systems. Proc. IEEE/WIC/ACM International Conference on Intelligent Agent Technology, pp. 565-566, Beijing, China, September 2004.
10. C. Kemke, A Formal Approach to Describing Action Concepts in Taxonomical Knowledge Bases. In N. Zhong, Z. W. Ras, S. Tsumoto, E. Suzuki (eds.), Foundations of Intelligent Systems, Lecture Notes in Artificial Intelligence, Vol. 2871, Springer, 2003, pp. 657-662.
11. C. Kemke, What Do You Know about Mail? Knowledge Representation in the SINIX Consultant. Artificial Intelligence Review, 14:253-275, 2000.
12. C. Kemke, About the Ontology of Actions. Technical Report MCCS-01-328, Computing Research Laboratory, New Mexico State University, 2001.
13. C. Kemke, Natural Language Communication between Human and Artificial Agents. Pacific Rim International Workshop on Multi-Agent Systems PRIMA-2006, Guilin, China, August 2006 (in press).
14. C. Kemke, Towards an Intelligent Interior Design System. Proc. Workshop on Intelligent Virtual Design Environments (IVDEs) at the Design Computation and Cognition Conference, Eindhoven, The Netherlands, 9th of July 2006.
15. M. Montemerlo, J. Pineau, N. Roy, S. Thrun, and V. Verma, Experiences with a Mobile Robotic Guide for the Elderly. AAAI National Conference on Artificial Intelligence, Edmonton, Canada, 2002.
16. P. F. Patel-Schneider, B. Owsnicki-Klewe, A. Kobsa, N. Guarino, R. MacGregor, W. S. Mark, D. L. McGuinness, B. Nebel, A. Schmiedel, and J. Yen, Term Subsumption Languages in Knowledge Representation. AI Magazine, 11(2): 16-23, 1990.
17. E. Pednault, ADL: Exploring the Middle Ground between STRIPS and the Situation Calculus. Proc. First Int'l Conf. on Principles of Knowledge Representation and Reasoning, pp. 324-332, 1989.
18. A. Stent, J. Dowding, J. M. Gawron, E. Owen Bratt and R. Moore, The CommandTalk Spoken Dialogue System. Proc. 37th Annual Meeting of the ACL, pp. 183-190, University of Maryland, College Park, MD, 1999.
19. M. C. Torrance, Natural Communication with Robots. S.M. Thesis, MIT Department of Electrical Engineering and Computer Science, January 28, 1994.
20. D. Traum, L. K. Schubert, M. Poesio, N. Martin, M. Light, C. H. Hwang, P. Heeman, G. Ferguson, J. F. Allen, Knowledge Representation in the TRAINS-93 Conversation System. Int. J. of Expert Systems, 9(1), Special Issue on Knowledge Representation and Inference for Natural Language Processing, pp. 173-223, 1996.
21. S. Thrun, M. Beetz, M. Bennewitz, W. Burgard, A. B. Cremers, F. Dellaert, D. Fox, D. Haehnel, C. Rosenberg, N. Roy, J. Schulte, and D. Schulz, Probabilistic Algorithms and the Interactive Museum Tour-Guide Robot Minerva. International Journal of Robotics Research, 19(11):972-999, 2000.
22. W. Wahlster, VERBMOBIL: Erkennung, Analyse, Transfer, Generierung und Synthese von Spontansprache. Report, DFKI GmbH, June 1997.
23. E. Walker, An Integrated Planning Algorithm for Abstraction and Decomposition Hierarchies of Actions. Honours Project, Dept. of Computer Science, University of Manitoba, 2004.
24. R. Weida and D. Litman, Subsumption and Recognition of Heterogeneous Constraint Networks. Proceedings of CAIA-94, pp. 381-388, 1994.
Authors

Christel Kemke received a Master's degree in computer science (Diplom-Informatiker) from the University of Dortmund, a Ph.D. degree (Dr. rer. nat.) from the University of Bielefeld, Germany, and a B.Sc. (Honours) degree and Diploma in psychology from the Open University, Milton Keynes, Great Britain. She worked in research and teaching at the University of the Saarland and the German Research Centre for Artificial Intelligence in Germany; the International Computer Science Institute in Berkeley, California; University College Dublin in Ireland; and New Mexico State University, USA. Since 2001, she has been a professor in the Department of Computer Science at the University of Manitoba, Canada.