Intention through Interaction: Toward Mutual Intention in Real World Interactions

Yasser F. O. Mohammad¹, Toyoaki Nishida²
Nishida-Sumi Laboratory, Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Japan
¹ [email protected]
² [email protected]

Abstract. Human-artifact interaction in real world situations is currently an active area of research due to the foreseen importance of the social capabilities of near-future robots and other intelligent artifacts for integrating them into human society. In this paper a new paradigm for mutual intention in human-artifact interactions, based on the embodied computing paradigm and called Intention through Interaction, is introduced together with a theoretical analysis of its relation to the embodiment framework. As examples of the practical use of the framework to replace traditional symbol-based intention understanding systems, the authors' preliminary work on a real-world agent architecture (EICA) and a natural drawing environment (NaturalDraw) is briefly presented.
Keywords: Embodiment, Real World Agents, Interactive Perception, Mutual Intention
1 Introduction
Classical approaches to intelligence under the symbolic computational paradigm have been subject to heavy criticism that started in the last two decades of the twentieth century. The main problem with classical approaches is their detachment from the real world [1], [2]. In the same period, research in human-robot interaction and human-artifact interaction attracted many researchers due to the increased use of robots and supposedly-intelligent artifacts in real life [3], [5]. To make the shift from the design stance to the intentional stance, the artifact should be able to understand human intention and communicate its own intention to the human. The literature on human intention understanding is wide, but the approach used in most of this research is still based on the symbolic formalism, and so is still detached from the world [3]. In this paper we outline a new research direction toward mutual intention that is based on the embodied computation paradigm [2]. The proposed approach is not only a theoretical framework, but also a robust signal processing/synthesis based approach intended to replace the traditional symbol-based approaches. To show the practical applicability of the proposed architecture, sections 5 and 6 briefly highlight two applications of the framework in the areas of Human-Agent Interaction (EICA) and Intelligent Human-Computer Interfaces (NaturalDraw).
2 Embodied Intelligence
One of the main problems with GOFAI ("Good Old-Fashioned Artificial Intelligence") is the detachment from the real world, which caused the famous grounding problem. The root of this problem can be traced to the assumption that cognition or intelligence is a functional algorithm that operates in a central (or maybe distributed) mind that is related to the world only through a set of inputs (sensors) and outputs (actuators) [1]. This view of intelligence as some transcendental phenomenon has been challenged by many authors on both philosophical and practical grounds [1], [2]. The need for an alternative encouraged many authors to challenge the basic assumptions of GOFAI, leading to many interrelated hypotheses [2], including the dynamical hypothesis in cognitive science and the behavioral approach in robotics [10]. There is something common to all of those alternatives: all of them are enactive approaches [2] that rely on some form of embodiment to overcome the grounding problem. Five different notions of embodiment can be identified as suggested by Ziemke [2]. The first two of them are:
1. Structural Coupling, which means that the agent is dynamically related to its environment, forming a single combined dynamical system.
2. Historical Embodiment through a history of structural coupling with the environment that affects the agent's internal dynamical system.
We propose that the precondition level of embodiment for achieving intelligent autonomous real world agents is the historical embodiment level. What this level of embodiment emphasizes is the role of the extended interaction between the agent and the environment. This extended interaction is usually overlooked as a detail that is not important for implementing intelligent agents, but it is the only way around the grounding problem, as the history of the interaction can associate an internal meaning with the perceived signals, allowing the agent to act in the environment in a rooted and situated way that is not possible using only externally coded algorithms. The point to emphasize here is that the embodied intelligence that is needed is an interactive embodiment that is not only based on having a body, even one that is structurally coupled with the environment at all moments, but also on having an experience that evolves with the environment. Although many researchers in the robotics and AI domains agree in general that some form of embodiment is needed [3], [5], [6], the importance of interactive or historical embodiment is not always appreciated. This situation can be seen clearly in the work done in the area of intention understanding in the Human-Robot Interaction domain. As will be seen in the next section, just naming the problem "intention understanding" [3], [5] reflects a passive attitude that ignores the need for co-evolution of intention and ignores the embodiment paradigm. For example, the social embodiment paradigm of Dautenhahn [4], although mentioning the notion of interaction, is still at the structural coupling level (in accordance with her definition of embodiment as structural coupling).
3 Mutual Intention
To be able to interact naturally with humans, the real world agent (e.g. robot, ECA, etc.) needs to have several faculties:
1. The ability to perceive human generated signals.
2. The ability to understand human behavior in a goal directed manner.
3. The ability to show its own intentions to the human in a natural way.
Usually those are treated as three separate problems (in accordance with the normal vertical decomposition of problems in GOFAI), but in natural interactions between humans this separation does not normally exist. In natural interaction situations the intentions of the two agents co-evolve rather than being communicated. Of course, communication situations in which information is transferred in only one direction (as suggested by the points above) do exist, but this communication framework cannot be assumed to cover all possible interactions in the real world, especially those involving nonverbal behavior. Let's look at a very simple situation in which one person is giving another person directions to a specific location. This situation appears to be a one-way transfer of information that should conform to the three separate steps outlined above:
1. The listener perceives the signals given by the speaker (Passive Perception).
2. The listener analyzes the perceived signals (Intention Understanding).
3. The listener gives a final feedback (Intention Communication).
In reality the interaction will not go this way. The listener will not be passive during the instructions but will actively align his body and give feedback to the speaker, and those actions by the listener will change the cognitive state of the speaker and indirectly change the signals perceived by the listener. So perception will in fact be interactive, not passive. The second step is also not realistic because during the interaction the listener will continuously shape her understanding (internal representation) of the speaker's intention, so no final analysis step separated from the perception will occur. The third step is again not realistic because the feedback given is continuous, in the form of mutual alignment, and not just a confirmation as suggested by the scenario decomposition above. This analysis suggests recasting the problem in real world interaction from three separate problems into a single problem we call Mutual Intention formation and maintenance. Mutual Intention is defined in this work as a dynamically coherent first and second order view toward the interaction focus. The first order view of the interaction focus is the agent's own cognitive state toward the interaction focus. The second order view of the interaction focus is the agent's view of the other agent's first order view. Two cognitive states are said to be in dynamical coherence if the two states co-evolve according to a fixed dynamical law.
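The definition of dynamical coherence above is given only in prose; one possible way to write it down (our notation, not the authors') is as a pair of coupled evolution laws for the two cognitive states s_1(t) and s_2(t), with fixed functions F and G:

\dot{s}_1(t) = F\big(s_1(t),\, s_2(t)\big), \qquad \dot{s}_2(t) = G\big(s_2(t),\, s_1(t)\big)

Neither state evolves independently of the other; the coupled pair behaves as a single dynamical system.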
Researchers in social robotics have introduced many ideas from developmental psychology, like the use of imitation in learning. One problem with such work is that the task is usually the interaction itself, and although the robot can actually learn new behaviors (like simple object manipulation) [7] and interact better (joint attention) [8], the applications of those insights are still very limited because of the simplified environments/tasks and the low autonomy level (which translates to a low embodiment degree). The hypothesis we support here is that the ability to form and maintain mutual intention is a precondition for achieving acceptably intelligent real-world agents. Moreover, this formation and maintenance process is only possible through an interactive operation of composite perception, analysis, and communication of the interacting agents' cognitive states.
4 Mutual Interaction and Embodiment
The relation between mutual intention as defined in the last section and embodiment as illustrated in the section before is very important for understanding both of them. As mentioned in section 2, the level of embodiment required to achieve any nontrivial level of intelligent behavior in the real world is interactive historical embodiment (as hypothesized by the authors), and, as shown in section 3, the precondition of natural interaction between intelligent agents is the ability to interactively form and maintain mutual intention. The common concept here is interaction as a co-evolution between the agent and its environment (historical embodiment) and between the agent and other agents (mutual intention formation and maintenance). The whole vision of the Intention through Interaction framework can be stated as the following hypothesis: Intention can best be modeled not as a fixed unknown value, but as a dynamically evolving function. Interaction between two agents couples their intention functions, creating a single system that co-evolves as the interaction proceeds toward a mutual intention state.
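The hypothesis can be illustrated with a toy numerical sketch (our own illustration; the coupling law, gains, and names below are assumptions, not the intention functions actually used in EICA): each agent's intention is a time-evolving state pulled toward its own goal and, because the agents perceive each other, toward the partner's intention. With nonzero coupling the two intentions drift toward each other, which is the mutual intention state; with zero coupling they stay apart.

```python
# Minimal sketch of two coupled "intention functions" (illustrative only).
import numpy as np

def step(i_a, i_b, goal_a, goal_b, coupling=0.15, persistence=0.05, dt=0.1):
    """One Euler step of the coupled system formed by the two intentions."""
    di_a = persistence * (goal_a - i_a) + coupling * (i_b - i_a)
    di_b = persistence * (goal_b - i_b) + coupling * (i_a - i_b)
    return i_a + dt * di_a, i_b + dt * di_b

i_a, i_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # initial intentions
goal_a, goal_b = i_a.copy(), i_b.copy()                  # private goals
print("initial distance:", np.linalg.norm(i_a - i_b))
for _ in range(500):
    i_a, i_b = step(i_a, i_b, goal_a, goal_b)
print("final distance:", np.linalg.norm(i_a - i_b))
# With coupling > 0 the distance shrinks (mutual intention forms);
# with coupling = 0 the two intentions remain detached from each other.
```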
5 Embodied Interactive Control Architecture (EICA)
In this section a general framework for the design of a real world agent architecture based on the Intention through Interaction paradigm is outlined. This design is still ongoing work and is not yet completely realized in an actual agent, although the Interactive Perception part (see Fig. 1) is already implemented and gave some promising results [9]. Fig. 1 gives the building blocks of the proposed architecture. The architecture consists of two major subsystems:
1. The sensorimotor control subsystem, which is a reflexive control subsystem creating a short loop between the sensed signals and the actions. This subsystem is essential for implementing embodiment.
2. The deliberative reasoning subsystem, which is a deliberating non-symbolic subsystem that implements a long term, loose control loop between the sensed signals and the actions. The main responsibility of the modules of this subsystem is to control the modules of the lower sensorimotor subsystem. This subsystem is essential for implementing historical embodiment.
Fig. 1. Embodied Interactive Control Architecture (EICA)
The input to the agent in this framework is divided into two signal components:
1. The signals from the inanimate objects in the environment.
2. The signals from the intelligent agents (e.g. humans, animals, and robots).
This separation is essential to the design of the system, as the two kinds of signals are treated differently. The signals originating from inanimate objects are processed through a feedback system that contains two loops: a short loop through the sensorimotor subsystem that implements reflexive behavior, and a long loop through the deliberative reasoning subsystem that implements learning and behavior adaptation (historical embodiment). The signals originating from intelligent agents are processed using another two control loops, both passing through the Interactive Perception module; the first is a short interactive loop and the other is a long adaptation and learning loop (historical embodiment). This separation is not compatible with the traditional passive sense-analyze-act architectures, which process those two signals without using the possibility of behavior adaptation offered by the intelligent agents.
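A structural sketch of this routing is shown below. This is our own illustration of the description above, not released EICA code: the class name, the placeholder reflex, and the placeholder alignment feedback are all assumptions; only the routing of the two signal classes through a short and a long loop each follows the text.

```python
# Schematic routing of the two input streams through two loops each.
class EICARoutingSketch:
    def __init__(self):
        self.interaction_history = []   # accumulated experience (historical embodiment)

    # --- signals from inanimate objects --------------------------------
    def reflexive_loop(self, signal):
        """Short loop: map the sensed signal directly to a corrective action."""
        return -signal                   # placeholder reflex

    def deliberative_loop(self, signal):
        """Long loop: record experience used to adapt behavior over time."""
        self.interaction_history.append(("world", signal))

    # --- signals from intelligent agents --------------------------------
    def interactive_perception(self, signal):
        """Short interactive loop: act so as to stabilize perception,
        giving the human feedback that shapes the next signal."""
        return 0.5 * signal              # placeholder alignment action

    def interaction_learning(self, signal):
        """Long loop: learn interaction patterns over time."""
        self.interaction_history.append(("agent", signal))

    def process(self, signal, source):
        if source == "inanimate":
            self.deliberative_loop(signal)
            return self.reflexive_loop(signal)
        else:                            # "agent"
            self.interaction_learning(signal)
            return self.interactive_perception(signal)
```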
Measuring the performance of EICA can only be based on building actual systems and measuring both the performance in the real world and the ease of building those systems. This is one of the future research directions of the authors. The most important subsystems of EICA are illustrated in the following subsections.

5.1 Interactive Perception
This module is responsible for creating an interactive control loop around the signals generated by intelligent agents in order to stabilize the perception of those signals. Interactive Perception is defined as allowing the perception module to intentionally affect the environment in order to align its actions with the feedback of the human, which should also be aligned with the system's feedback. Fig. 2 gives the main building blocks of the Interactive Perception module. For implementation details refer to [9].
Fig. 2. The Interactive Perception Module. Example signals perceived at the key points of the system are shown when the human is showing the agent how to draw the B character.
To illustrate the operation of the module, let's take an example where a human is trying to teach the agent how to draw the character "B" by illustration [9]. Fig. 2 shows the block diagram of the system and the signals perceived at the key points during this interaction. The intended behavior signal here is the character "B", and it should be clear that the intention in this case is not fixed (there is no pre-specified "B" inside the head of the human), but evolves with the interaction; this evolution of the intention is what is actually utilized by the module to robustly create a low level mutual intention with the human. As an example of an unintended behavior signal, consider the third (middle) semi-circle unintentionally added to the middle of the "B" character. As usual in real world interaction, some noise is added to the signal. This is the input signal to the system, x(t) in Fig. 2.
The processing is done as follows: first an adaptive filter is used to reduce the noise component, creating the signal s̃(t). This is what we call Passive Perception, and at this point most systems stop. The result of passive perception s̃(t) is shown in Fig. 2, and it is clear that the unintended behavior is not attenuated. After that, the signal s̃(t) is processed by the interactive part of the module, which utilizes the temporal interaction pattern with the human to generate some form of low level mutual intention that allows the system to extract the intended behavior of the human. The implementation of this part is based on the correlation between multiple signal channels and the change in the acceleration of the human movement as related to his/her satisfaction level with the agent's perception, which is shown to the human online by projection on a screen (as done in [9]). The result of the Interactive Perception module, ŝ(t), is given in Fig. 2. It is clear that the unintended semi-circle is effectively eliminated by utilizing the interaction.
The main difference between interactive perception and classical passive perception based on digital signal processing lies in two points:
1. The signal to be perceived is not assumed to be captured and then analyzed offline; rather, the perception operation takes part in the signal formation through the feedback that affects the human behavior [9].
2. The signal received by the sensors is assumed to be the result of a nonlinear superposition of three signals, namely the intended behavior signal, the unintended behavior signal, and noise, rather than only two components (signal and noise).
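A simplified sketch of the two-stage pipeline described above follows. The smoothing filter, the satisfaction-based weighting, and all names are our assumptions made for illustration; the actual implementation in [9] uses correlation across channels and acceleration changes rather than a given satisfaction signal.

```python
# Toy two-stage pipeline: passive perception (noise filtering) followed by an
# interactive stage that attenuates segments the human signals as unintended.
import numpy as np

def passive_perception(x, alpha=0.2):
    """Exponential smoothing as a stand-in for the adaptive noise filter."""
    s = np.empty_like(x)
    s[0] = x[0]
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    return s

def interactive_perception(s_tilde, satisfaction):
    """Weight each sample by an online satisfaction estimate in [0, 1];
    low-satisfaction segments (unintended behavior) are attenuated."""
    return np.clip(satisfaction, 0.0, 1.0) * s_tilde

# usage: noisy 1-D trajectory with a dip in satisfaction over one segment
x = np.sin(np.linspace(0, 4 * np.pi, 200)) + 0.1 * np.random.randn(200)
satisfaction = np.ones(200)
satisfaction[80:120] = 0.1        # human signals this segment was unintended
s_hat = interactive_perception(passive_perception(x), satisfaction)
```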
5.2 Intention Function
The Intention Function module is the heart of EICA. This module is the computational realization of the view of intention that characterizes the Intention through Interaction approach. Together with the Interactive Perception module, it implements the paradigm. Together with the adaptable sensorimotor subsystem and the task and interaction learning modules, it implements embodiment by assigning internal meaning, in the form of intention function change heuristics, that grounds the agent's experience in its perception. The intention function module is realized as an active memory module (a dynamical system) in which the required behavior plans of all the other active modules in the architecture are registered. The module then automatically updates the intentional degree (a measure of the probability of selection as the next action) of all those behavior tokens based on the outputs of the emotional module (which summarizes the information from the short past history) and the innate drives module (which implements a rough subjective coherence in the agent in the Uexküllian sense).
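A minimal sketch of this active memory is given below. The registry structure, the decay-and-excite update rule, and the names are our assumptions; the text above only states that registered behavior plans get their intentional degree updated from the emotion and innate-drive modules.

```python
# Illustrative intention-function registry (not the EICA source).
class IntentionFunctionSketch:
    def __init__(self, decay=0.95):
        self.decay = decay
        self.intentionality = {}          # behavior name -> degree in [0, 1]

    def register(self, behavior, initial=0.1):
        """Called by any active module that proposes a behavior plan."""
        self.intentionality.setdefault(behavior, initial)

    def update(self, emotion_bias, drive_bias):
        """emotion_bias / drive_bias: dicts of per-behavior excitation."""
        for b in self.intentionality:
            v = self.decay * self.intentionality[b]
            v += emotion_bias.get(b, 0.0) + drive_bias.get(b, 0.0)
            self.intentionality[b] = min(max(v, 0.0), 1.0)

    def most_intended(self):
        return max(self.intentionality, key=self.intentionality.get)
```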
5.3 Action Selector/Attention Focusing
The action selection module is responsible for driving the actuators of the agent based on the motion primitives stored in the motion repertoire of the agent and the intention function. This module implements a judgment between the various actions suggested by the weighted behaviors in the intention function at the low action-primitive level. The main objectives of this module are to ensure coherence in the agent's actions by avoiding jumping between different behaviors, and to avoid behavior conflicts. Those objectives can be shown to be conflicting, and the heuristic used to bias the decision toward one of them is still an area of research. The Attention Focusing module implements a low level weighting over the sensor generated signals to reduce the effect of irrelevant inputs, based on the current state of the Intention Function and the low level emotional state.
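One simple heuristic for the coherence side of this trade-off is hysteresis: the currently running behavior gets a persistence bonus so the agent does not jump between behaviors on every small change of intentionality. The sketch below is our own toy illustration of that idea, not the heuristic actually used in EICA.

```python
# Toy action selection with a persistence bonus (illustrative only).
def select_action(intentionality, current=None, persistence_bonus=0.15):
    """intentionality: dict behavior -> degree; returns the behavior to run."""
    def score(behavior):
        bonus = persistence_bonus if behavior == current else 0.0
        return intentionality[behavior] + bonus
    return max(intentionality, key=score)

# usage: "approach" keeps running unless "avoid" clearly dominates it
print(select_action({"approach": 0.55, "avoid": 0.60}, current="approach"))
```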
5.4 Low Level Emotion
This module continuously summarizes the inputs to the system, forming a continuous representation of the internal emotional state in a multidimensional space that affects the action selector and may affect other low level behaviors, like the navigation module, which will go slower if, for example, the confusion dimension goes higher. The emotion dimensions of this module need not correspond to the normal human emotions but should be based on the experience of the robot itself.
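A small sketch of such a running summary is shown below; the dimension names, the input statistics, and the moving-average update are illustrative assumptions consistent with the description above.

```python
# Illustrative low-level emotion module as a running summary of the inputs.
import numpy as np

class LowLevelEmotionSketch:
    DIMS = ("confusion", "arousal")            # need not be human emotions

    def __init__(self, rate=0.05):
        self.rate = rate
        self.state = np.zeros(len(self.DIMS))

    def update(self, prediction_error, input_energy):
        """Move each dimension toward an input-derived statistic."""
        target = np.array([prediction_error, input_energy])
        self.state += self.rate * (target - self.state)
        return dict(zip(self.DIMS, self.state))

# e.g. a navigation behavior could scale its speed by (1 - confusion)
```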
5.5 The Deliberative Subsystem
The deliberative part implements the extended-experience embodiment of the agent by adjusting various parameters of the low level sensorimotor subsystem based on learning from the interaction between the agent and the environment and other intelligent agents. Some extended time behaviors, like interaction regulation, can be implemented in the deliberative subsystem even though they do not represent experiential learning, if their natural time scale is much longer than that of other sensorimotor behaviors, or if a simple dynamical system implementation is not possible. Those behaviors should be implemented as control behaviors that affect the outputs of existing sensorimotor behaviors at runtime and should never update the Intention Function directly.
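The constraint in the last sentence can be made concrete with a small sketch (interfaces and the speaking/listening example are assumed for illustration): a deliberative control behavior may rescale the output of an existing sensorimotor behavior at runtime, but it has no handle on the intention function itself.

```python
# Illustrative deliberative control behavior that only modulates outputs.
class InteractionRegulationSketch:
    """Slows body motion while the human is speaking (output-level control only)."""
    def __init__(self, gain_when_listening=0.4):
        self.gain_when_listening = gain_when_listening

    def modulate(self, sensorimotor_output, human_is_speaking):
        gain = self.gain_when_listening if human_is_speaking else 1.0
        return [gain * v for v in sensorimotor_output]
```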
5.6 Relation to Other Robotic Architectures
Some other behavioral robotic control architectures can be related to EICA. The subsumption architecture [10] implements a special case of EICA that disables all deliberative reasoning modules (no historical embodiment) and the Interactive Perception module (no mutual intention), and replaces the intention function, the drives, and the action selector with hard-wired subsumption channels; this is the reason that this architecture is very difficult to adopt in communicative robots. The work of Iba [11] in Interactive Multimodal Robot Programming can be considered another special case that does not use the intention function/action selection mechanism (no mutual intention), assumes a predefined set of programming gestures, and separates the programming mode from the operational mode. A main difference between EICA and traditional robotic architectures is that it not only specifies the relations between the various processes and behaviors inside the agent, but also specifies the organization of those behaviors and the processes needed to achieve both autonomy and natural interaction with humans in the real world.
5.7 Example Robotic Implementations
Although a complete implementation of EICA in a mobile robot is not currently available, the authors have made a partial implementation that focuses on the Interactive Perception subsystem. This implementation and other suggested examples are introduced briefly in the following paragraphs.

Learning Paths by Demonstration. In [9] the authors gave the details of an implementation of the Interactive Perception module in the area of learning paths by demonstration using a virtual robot in the real environment. In this system the user faces the virtual robot and shows it how to draw any 2D drawing using a motion sensor attached to her finger, and the virtual robot projects the perceived signal on a monitor and uses the implicit feedback from the human to adjust its perception. Quantitative analysis revealed that the virtual robot was able to reduce the effects of the noise and unintended behavior better than a passive perception system.

Path Following in a Partially Observable Environment. As a step toward implementing the EICA architecture in a robot targeting Human-Robot Team situations, the authors are building a miniature robot (using an e-puck robot) that can understand human gestures and communicate its own internal state. The robot will be used under the control of a human operator to follow a prespecified path in an environment that is partially observable to the human and partially observable to the robot. The robot will only use LEDs and motion cues to communicate its internal state, as well as the environmental features it encounters, to the operator. The system will implement the sensorimotor part of the architecture.

Sweeping Robot. Although not implemented yet, the EICA architecture can be used to build a sweeping robot that can be controlled by natural gestures. The interactive perception module will actively analyze the inputs from the human (sensed using motion capture devices, cameras, or both) and produce a representation of the human body movements. This representation is processed by the module to interactively attenuate the signals that are not related to the current focus of the robot (decided by the attention focusing module), and this operation affects the intention function by registering feedback behavior primitives to it. The same signal is also passed to the higher level interaction regulation module (and the interaction learning module) for implementing body alignment and other nonverbal natural feedback operations. The inanimate signals from the environment are processed through the reflexive subsystem to implement low level behaviors like wandering, waste detection, following, avoidance, etc. Those behaviors are also registered with the intention function module. The inanimate signals are also passed to the deliberative subsystem to implement task learning and planning. The robot also has a set of innate drives that manage its autonomous behavior, like self-charging, waste following, asking for help, etc. The only way for a behavior to affect the motor subsystem is through the intention function/action selection modules, and both are adapted through the experience of the robot.
6 NaturalDraw
Real world agents are not only hardware agents. Software agents can be considered real world agents as long as they are embodied in their information environment and interacting with humans in real time. As an application of the Intention through Interaction framework (one that does not use the EICA architecture) in the software domain, the authors designed and implemented a natural drawing environment for both experienced and novice users [12] called NaturalDraw. The feature of NaturalDraw most related to the current discussion is Correction by Repetition. In natural interactions humans tend to repeat whatever they believe was not correctly perceived by the interacting partner, but most artifacts and computer systems cannot effectively use this feature of human behavior because of the inherent passivity of the dominant sensing modules. NaturalDraw overcame this limitation by using two features:
1. Automatic Repetition Detection, which enables the system to interactively detect the existence of repetition in the drawing. The detection is done using real-time tracking of the distance measure between the current stroke and the existing drawing and detecting the existence of repetition based on that history [12].
2. Repetition Processing, which is done using two algorithms (PFP and SA) chosen based on the context of the interaction [12]. The result of this processing is a drawing that better resembles the intended one.
Preliminary experimentation with the system showed that NaturalDraw is more effective and easier to use than a traditional drawing environment, and also showed that the system is equally easy to use for professional and novice users. For details about the algorithms used and a statistical analysis of the results refer to [12].
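The repetition detection idea can be sketched as below. The resampling, the distance measure, and the threshold are our assumptions made for illustration only; the actual tracking and the PFP/SA processing algorithms are those described in [12].

```python
# Toy repetition detection: a new stroke is flagged as a repetition when it
# stays close to some previously drawn stroke.
import numpy as np

def stroke_distance(a, b, samples=64):
    """Mean point-wise distance after resampling both strokes to equal length."""
    def resample(s):
        s = np.asarray(s, dtype=float)
        idx = np.linspace(0, len(s) - 1, samples)
        return np.stack([np.interp(idx, np.arange(len(s)), s[:, d]) for d in range(2)], axis=1)
    return float(np.mean(np.linalg.norm(resample(a) - resample(b), axis=1)))

def is_repetition(new_stroke, drawing, threshold=10.0):
    """drawing: list of earlier strokes; True if the new stroke repeats one of them."""
    return any(stroke_distance(new_stroke, s) < threshold for s in drawing)

# usage
drawing = [[(0, 0), (5, 5), (10, 10)]]
print(is_repetition([(0, 1), (5, 6), (10, 11)], drawing))   # True: repeats the line
```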
7 Conclusion
The preliminary experimentation with systems designed under the guidelines of the Intention through Interaction paradigm suggests that this paradigm can be more effective in creating mutual intention between artifacts and humans than the traditional passive intention understanding methods, because it is based on an embodied signal processing and synthesis scheme. Two brief illustrations of the paradigm designed by the authors were presented: EICA, a general agent control architecture in the field of Human-Agent Interaction, and NaturalDraw, a free-hand drawing system in the field of HCI. The main direction of future research is the complete realization of EICA in a real world robot to study the effectiveness of the proposed paradigm.
References

1. Paul Vogt: The Physical Symbol Grounding Problem. Cognitive Systems Research 3 (2002) 429–457.
2. Tom Ziemke: Rethinking Grounding. In Riegler, Peschl, von Stein (eds.): Understanding Representations in the Cognitive Sciences. Plenum Press, New York (1999) 87–100.
3. Karim A. Tahboub: Intelligent Human-Machine Interaction Based on Dynamic Bayesian Networks Probabilistic Intention Recognition. Journal of Intelligent and Robotic Systems 45 (2006) 31–52.
4. Kerstin Dautenhahn, Bernard Ogden, Tom Quick: From embodied to socially embedded agents - Implications for interaction-aware robots. Cognitive Systems Research 3 (2002) 397–428.
5. Wentao Yu, Alqasemi, R., Dubey, R., Pernalete, N.: Telemanipulation Assistance Based on Motion Intention Recognition. Proceedings of the 2005 IEEE International Conference on Robotics and Automation (ICRA 2005) (2005) 1121–1126.
6. Peter Dayan, Bernard W. Balleine: Reward, Motivation, and Reinforcement Learning. Neuron 36 (2002) 285–298.
7. Jeff Lieberman: Teaching a Robot Manipulation Skills through Demonstration. MSc Thesis in Mechanical Engineering, MIT (2004).
8. Cynthia Breazeal: Towards sociable robots. Robotics and Autonomous Systems 42 (2003) 167–175.
9. Yasser F. O. Mohammad, Toyoaki Nishida: Interactive Perception for amplification of intended behavior in complex noisy environments. Proceedings of the International Workshop on Social Intelligence Design 2006 (SID2006), Mar. 24–26, Osaka, Japan (2006) 173–187.
10. Rodney A. Brooks: Challenges for Complete Creature Architectures. In Meyer, J.-A., Wilson, S.W. (eds.): From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior. MIT Press/Bradford Books (1991) 434–443.
11. Soshi Iba, Christiaan J. J. Paredis, Pradeep K. Khosla: Interactive Multimodal Robot Programming. The International Journal of Robotics Research 24 (2005) 83–104.
12. Yasser F. O. Mohammad, Toyoaki Nishida: NaturalDraw: Interactive Perception based Drawing for everyone. Proceedings of the 12th Conference on Intelligent User Interfaces (IUI2007), Jan. 27–31, Honolulu, Hawaii, USA (2007) 251–260.