A Cross-Platform Robotic Architecture for Autonomous Interactive Robots

Yasser F. O. Mohammad (1), Toyoaki Nishida (2)
Nishida-Sumi Laboratory, Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Japan
(1) [email protected]
(2) [email protected]
Abstract. This paper reports the lowest level of specification of EICA, a new cross-platform robotic architecture for HRI applications. The main contributions are a thorough analysis of some of the challenges that HRI applications impose on the underlying architecture, and the details of the reactive layer of the EICA architecture designed to meet those challenges, with emphasis on how low-level attention focusing and action integration are implemented. The paper also describes the implementation, using the proposed architecture, of a listener robot that uses human-like nonverbal behavior during explanation scenarios, and reports encouraging results from experiments with a simplified simulation of this robot.
1 Review and Motivation
Many researchers have studied robotic architectures for mobile autonomous robots. The proposed architectures can broadly be divided into reactive, deliberative, and hybrid architectures. One general problem with most hybrid architectures, as far as real-world interaction is concerned, is the fixed, predetermined relation between deliberation and reaction [1], [2].

Interaction between humans in the real world utilizes many channels, including verbal and nonverbal channels. To manage those channels, the agent needs a variety of skills and abilities, including dialog management, synchronization of verbal and nonverbal intended behavior, and efficient utilization of normal, society-dependent unintended nonverbal behavior patterns. In humans, those skills are managed by both conscious and unconscious processes of widely varying computational load. This suggests that implementing such behaviors in a robot will require the integration of various technologies ranging from fast reactive processes to long-term deliberative operations. The relation between the deliberative and reactive subsystems needed to implement natural interactivity is very difficult to capture in well-structured relations, such as deliberation as learning, deliberation as configuration, or reaction as advising, that are usually found in hybrid architectures. On the other hand, most other autonomous applications used to measure the effectiveness of robotic architectures (such as autonomous indoor and outdoor navigation, collecting empty cans, delivering faxes, and underwater navigation) require a very well-structured
relation between reaction and deliberation. To solve this problem, the architecture should have a flexible relation between deliberation and reaction that is dictated by the task and the interaction context rather than by a predetermined decision of the architecture designer.

Some researchers have proposed architectures that are specially designed for interactive robots. Ishiguro et al. proposed a robotic architecture based on situated modules and reactive modules. While the reactive modules represent the purely reactive part of the system, situated modules are higher-level modules programmed in a high-level language to provide specific behaviors to the robot. The situated modules are evaluated serially in an order controlled by the module controller. This module controller enables planning in the situated-modules network rather than in an internal representation, which makes it easier to develop complex systems based on this architecture [3]. One problem with this approach is the serial execution of situated modules which, while making it easier to program the robot, limits its ability to perform multiple tasks at the same time, something that is necessary for some tasks and especially for nonverbal interactive behaviors. Also, there is no built-in support for attention focusing in this system.

Nicolescu and Matarić proposed a hierarchical architecture based on abstract virtual behaviors that tried to bring AI concepts like planning into behavior-based systems. The basis for task representation is the behavior network construct, which encodes complex, hierarchical, plan-like strategies [4]. One problem with this work is the implicit inhibition links at the actuator level that prevent any two behaviors from being active at the same time even when the behavior network would allow it, which reduces the benefit of the system's opportunistic execution option in cases where the commands of the active behaviors could actually be combined into a final actuation command. Although this kind of limitation is typical of navigation and related problems, in which the goal state is usually more important than the details of the behavior, it is not suitable for human-like natural interaction, in which the dynamics of the behavior are even more important than achieving a specific goal. For example, showing distraction by other activities in the peripheral visual field of the robot through partial eye movement can be an important signal in human-robot interaction.

One general problem with most architectures that target interactive robots is the lack of proper intention modeling at the architectural level. In natural human-human communication, communicating intention is a crucial requirement for success. Leaving such an important ingredient outside the architecture can lead to reinventing intention management in every application.

This analysis of existing HRI architectures revealed the following limitations:
– Lack of intention modeling at the architectural level.
– A fixed, pre-specified relation between deliberation and reaction.
– Disallowing multiple behaviors from accessing the robot actuators at the same time.
– Lack of built-in attention focusing mechanisms at the architectural level.
To overcome the aforementioned problems, the authors designed and implemented a novel robotic architecture (EICA). In this paper, the details of the lowest level of specification of this architecture are given, with emphasis on how the proposed architecture implements attention focusing and action integration at the lowest level of the system. A real-world experiment with a humanoid robot developed using EICA is then presented.
2 L0 EICA
Fig. 1. L0 EICA
EICA is a behavioral hybrid architecture that specifies no predesigned relation between deliberation and reaction. The architecture has multiple levels of specification, making it possible to design the behaviors of the robot at whatever level of abstraction the task at hand requires, while guaranteeing that all processes, at whatever level of abstraction they were developed, can be combined in the final system using a fixed and simple action integration mechanism. A level of specification (LoS) is defined in EICA as a set of types and entities that can be used to build the computational processes of the robot at a specific level of abstraction. Levels of specification are arranged in a hierarchical structure in which any process at a specific level of specification can use the types and entities of the lower levels. In this paper only the lowest level of specification, L0 EICA, is discussed. Fig. 1 shows the lowest level of specification of the EICA architecture, called L0 EICA. Every processing component in EICA must ultimately implement the
Active interface. In that sense this type is equivalent to the Behavior abstraction in BBS (behavior-based systems) or the Agent abstraction in MAS (multi-agent systems). Every Active object has the following attributes:

Attentionality: A real number that specifies the relative attention that should be given to this process. It is used to calculate the speed at which the process is allowed to run. As shown in Fig. 2, this attribute is connected to the output of the attention effect channel of the object.

Actionability: A real number specifying the activation level of the object. A zero or negative actionability prevents the object from executing. A positive actionability means the process is allowed to run; its exact value also determines the strength of the object's effect on other active objects. As shown in Fig. 2, this attribute is connected to the output of the action effect channel of the object.

Effects: A set of output ports that connect this object through effect channels to other active components of the system.

This specification of the Active type was inspired by the Abstract Behaviors of [4], although it was designed to be more general and to allow attention focusing at the lowest level of the architecture. By separating actionability from attentionality and allowing actionability to take a continuous range of values, EICA enables a form of attention focusing that is usually unavailable in behavioral systems. This separation allows the robot to select the active processes depending on the general context (by setting the actionability value) while still assigning computational power according to the exact environmental and internal conditions (by setting the attentionality value). Because actionability is variable, the system can also use it to change the possible influence of the various processes (through the operators of the effect channels) based on the current situation.

All active components are connected together through effect channels. Every effect channel has a set of n inputs that use continuous signaling and a single output that is continuously calculated from those inputs according to the operation attribute of the channel. The currently implemented operations are:

– Max: y = max_{i=1..n} (x_i | a_i > ε)
– Min: y = min_{i=1..n} (x_i | a_i > ε)
– Avg: y = (Σ_{i=1..n} a_i x_i) / (Σ_{i=1..n} a_i)

where x_i is the value at input port i, a_i is the actionability of the object connected to port i, and y is the output of the effect channel. At this level of specification, the types that can be used to directly implement the processing components of the robot are:
Fig. 2. Schematics of L0 EICA components
MotorPlan: Represents a simple reactive plan that implements a short-path control mechanism from sensing to actuation. The action integration mechanism integrates the actuation commands generated by all running motor plans into final commands that are sent to the executers and applied to the robot actuators, based on the intentionality assigned to every motor plan. This type of process represents the reactive view of intention inspired by neuroscience and experimental psychology findings regarding human intentional behavior [5]. Motor plans in EICA are closer to reactive motor schemas than to traditional behaviors.

Process: Provides a higher level of control over the behavior of the robot by controlling the temporal evolution of the intentionality of various motor plans. As will be shown in the application presented in this paper, the interaction between multiple simple processes can generate arbitrarily complex behavior. Processes in EICA are not allowed to generate actions directly.

Reflex: A special type of process that can bypass the action integrator and send commands directly to the executer(s). Reflexes provide safety services such as collision avoidance during navigation, or safety measures that prevent accidents to the interacting person caused by failures in other modules.

Sensor: An active entity intended to communicate directly with the hardware sensors of the robot. This kind of object was introduced to provide more efficient sensing by utilizing the latency property of the data channel component.

The various kinds of active components in the robot are connected together using data channels to exchange data. A data channel is a simple object that manages the generation and distribution of data from active objects in the system based on its mode of operation. In the on-demand mode the data channel is usually
passive until it receives a request from one of its output ports. The stored data item is sent directly to that port if the time passed since the item was produced is less than the latency of the port; otherwise the source process of the data channel is interrupted to generate a new data item that is stored and passed to the requester. This kind of data-on-demand is usually not available in behavioral systems, but it can improve performance when sensing is an expensive operation, as when using vision techniques. In the continuous generation mode the data channel interrupts the source process to generate data items every 1/frequency seconds. In the interrupt mode the data generated from the source are immediately sent to the output ports whose request input is high, at a frequency determined by the latency of each port. This mode of operation is useful for separating the preconditions of various processes from the execution core, which enables a more efficient implementation when the same preconditions are to be calculated by many processes.

Other than the aforementioned types of active entities, EICA has a central Action Integrator that receives actions from motor plans and uses the source's intentionality level, as well as a priority and mutuality assigned to every DoF of the robot in the action, to decide how to integrate it with actions from other motor plans using simple weighted averaging subject to mutuality constraints. Although very simple, this algorithm can generate a continuous range of integration possibilities, ranging from pure action selection to potential-field-like action integration, based on the parameters assigned by the various motor plans of the robot.

The current implementation of EICA is written in standard C++ and is platform independent. The system is suitable for implementation onboard the robot or on a host server, and it can easily be extended to support distributed implementation on multiple computers connected via a network. The EICA implementation is based on object-oriented design principles, so every component in the system is implemented as a class. Implementing EICA applications in software is simple: inherit the appropriate classes from the EICA core system and override the abstract functions, as the sketch below illustrates.
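To make the "inherit and override" pattern concrete, the following is a minimal C++ sketch under stated assumptions, not the actual EICA core API: every name below (Active, MotorPlan, FollowFace, LookAround, HeadCommand, integrate) is a hypothetical stand-in, effect channels are left out, and the integrator is reduced to plain intentionality-weighted averaging without priority or mutuality constraints.

```cpp
// Minimal sketch of the "inherit and override" pattern and of intentionality-
// weighted action integration. All names here are hypothetical stand-ins,
// NOT the actual EICA core API; effect channels, per-DoF priorities, and
// mutuality constraints are omitted.
#include <iostream>
#include <vector>

struct HeadCommand { double pan; double tilt; };  // one command per head DoF

// Stand-in for the Active interface: every component carries attentionality
// (share of computation time) and actionability (<= 0 disables the object).
class Active {
public:
    virtual ~Active() = default;
    double attentionality = 1.0;
    double actionability  = 1.0;
    virtual void step() = 0;      // overridden by concrete components
};

// Stand-in for a reactive MotorPlan: proposes an action weighted by its
// intentionality, which Processes are allowed to adjust over time.
class MotorPlan : public Active {
public:
    double intentionality = 0.0;
    virtual HeadCommand propose() = 0;
    void step() override { /* reactive sensing-to-action mapping would go here */ }
};

class FollowFace : public MotorPlan {             // hypothetical example plan
public:
    HeadCommand propose() override { return {0.4, 0.1}; }   // toward the face
};

class LookAround : public MotorPlan {             // hypothetical example plan
public:
    HeadCommand propose() override { return {-0.8, 0.0}; }  // scan the scene
};

// Simplified Action Integrator: intentionality-weighted averaging of the
// proposals of all running motor plans.
HeadCommand integrate(const std::vector<MotorPlan*>& plans) {
    HeadCommand out{0.0, 0.0};
    double total = 0.0;
    for (MotorPlan* p : plans) {
        if (p->actionability <= 0.0 || p->intentionality <= 0.0) continue;
        const HeadCommand c = p->propose();
        out.pan  += p->intentionality * c.pan;
        out.tilt += p->intentionality * c.tilt;
        total    += p->intentionality;
    }
    if (total > 0.0) { out.pan /= total; out.tilt /= total; }
    return out;
}

int main() {
    FollowFace follow;  follow.intentionality = 0.8;
    LookAround around;  around.intentionality = 0.2;
    std::vector<MotorPlan*> plans{&follow, &around};
    const HeadCommand cmd = integrate(plans);                 // weighted blend
    std::cout << "pan=" << cmd.pan << " tilt=" << cmd.tilt << "\n";
}
```

In the actual system, the Action Integrator additionally honors a per-DoF priority and mutuality, and Reflexes may bypass it entirely.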
3 Example Implementations
So far, the EICA architecture has been used to implement the TalkBack miniature robot reported in [6] and the humanoid listener robot reported here.

3.1 A Robot That Listens
The ability to use human-like nonverbal listening behavior is an advantage for humanoid robots that coexist with humans in the same social space. The authors of [7] implemented a robot that tries to use natural human-like body language while listening to a human giving it road directions. In this work we try to build a general listener robot based on the EICA architecture. As a minimal design, only the head of the robot was controlled during this experiment. This decision was based on the hypothesis, accepted by many researchers in the nonverbal human
interaction community, that gaze direction is one of the most important nonverbal behaviors involved in realizing natural listening in human-human close encounters [8]. The evaluation data was collected as follows:

1. Six different explanation scenarios were collected in which a person explains the procedure of operating a hypothetical machine, which involves pressing three different buttons, rotating a knob, and noticing results on an LCD screen, in front of a Robovie II robot while pretending that the robot is listening. The motion tracker's data was logged 460 times per second.
2. The log files were used as the input to the robot simulator and the behavior of the robot's head was analyzed.
3. For every scenario, 20 new synthetic scenarios were generated by utilizing 20 different levels of noise. The behavior of the simulator was analyzed for every one of the resulting 120 scenarios and compared to the original performance.
4. The same system was used to drive the Robovie II robot and the final behavior was studied subjectively.

Four reactive motor plans were designed that encapsulate the possible interaction actions the robot can generate, namely: looking around, following the human face, following the salient object in the environment, and looking at the same place the human is looking at. The sufficiency of these motor plans follows from the fact that in the current scenario the robot simply has no other place to look, and their necessity was confirmed empirically by the fact that all three behavioral processes described below needed to adjust the intentionality of all of these motor plans. Fig. 3 shows the complete design of the listener robot in this experiment.
Fig. 3. The Design of the Listener Robot
The analysis of the mutual attention requirements showed the need for three behavioral processes: two processes that generate an approach-escape mechanism controlling looking toward the human operator, inspired by the Approach-Avoidance mechanism suggested in [8] for managing spatial distance in natural human-human situations, named Look-At-Human and Be-Polite; and a third process that controls the realization of the mutual attention behavior, called Mutual-Attention. For details refer to [9]; a brief description is given here:

1. Look-At-Human: This process is responsible for generating an attractive virtual force that pulls the robot's head direction toward the location of the human face. It first checks the Gaze-Map's current modes, and if their weights have been less than a specific threshold for more than 10 seconds while the human has been speaking for more than 4 seconds, it increases the intentionality of the followFace motor plan and decreases the intentionality of the other three reactive motor plans, based on the difference in angle between the lines of sight of the human and the robot and on the Confirming condition.
2. Be-Polite: This process works against the Look-At-Human process by decreasing the intentionality of the followFace motor plan in inverse proportion to the angle between the lines of sight of the human and the robot, depending on the period during which the human has been speaking; a minimal illustrative sketch of this kind of adjustment is given after the list of perception processes below.
3. Mutual-Attention: This process increases the intentionality of the followObject or followGaze motor plans. The rate of intentionality increase is determined by the confirmation mode.

Five perception processes were needed to implement the aforementioned behavioral processes and motor plans:

1. Human-Head, which continuously updates a list containing the position and direction of the human head during the last 30 seconds, sampled 50 times per second.
2. Robot-Head, which continuously updates a list containing the position and direction of the robot head during the last 30 seconds, sampled 50 times per second.
3. Gaze-Map, which continuously updates a representation of the distribution of the human gaze in both the spatial and temporal dimensions. The spatial distribution is stored as a mixture-of-Gaussians-like structure where the mean µi represents the location of an important object and the variance σi is a measure of the size of that object. The weight wi represents the importance of the place according to the gaze of the human. The details of this process are omitted due to lack of space; refer to [9].
4. Speaking, which uses the power of the sound signal to detect the existence of human speech. The current implementation simply assumes there is human speech whenever the sound signal is not zero. This was acceptable in the simulation, but with real-world data a more complex algorithm that utilizes Fourier analysis will be used.
5. Confirming, which specifies whether or not the human is making a confirming action. Currently this value is added manually to the logged data.
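As an illustration of how a process shapes motor-plan intentionality, here is a purely illustrative C++ sketch of a Be-Polite-style update; the class names, the update rule, and all constants are assumptions made for the example rather than code from the EICA implementation.

```cpp
// Purely illustrative sketch of a Be-Polite-style process (hypothetical names
// and constants; not taken from the EICA implementation). The process lowers
// the intentionality of the followFace motor plan in inverse proportion to the
// angle between the human's and the robot's lines of sight, scaled by how long
// the human has been speaking.
#include <algorithm>
#include <cstdio>

struct FollowFacePlan {           // stand-in for the followFace motor plan
    double intentionality = 0.5;
};

class BePolite {                  // stand-in for the Be-Polite process
public:
    void step(FollowFacePlan& followFace,
              double gazeAngleRad,        // angle between the two lines of sight
              double speakingSeconds) {   // how long the human has been speaking
        // Suppression is strongest when the robot is already staring at the face
        // (small angle) and the human has been speaking for a while.
        const double speakingFactor = std::min(speakingSeconds / 4.0, 1.0);  // assumed scale
        const double suppression    = 0.05 * speakingFactor / (gazeAngleRad + 0.1);
        followFace.intentionality   = std::max(0.0, followFace.intentionality - suppression);
    }
};

int main() {
    FollowFacePlan followFace;
    BePolite bePolite;
    for (int t = 0; t < 5; ++t) {
        bePolite.step(followFace, /*gazeAngleRad=*/0.2, /*speakingSeconds=*/double(t));
        std::printf("t=%d intentionality=%.3f\n", t, followFace.intentionality);
    }
}
```

Look-At-Human would apply the opposite adjustment, raising the same intentionality when the Gaze-Map and Speaking conditions described above hold.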
Table 1. Comparison Between the Simulated and Natural Behavior

Item                     Statistic   Simulation   H-H value
Mutual Gaze              Mean        31.5%        30%
                         Std.Dev.    1.94%        –
Gaze Toward Instructor   Mean        77.87%       75%
                         Std.Dev.    3.04%        –
Mutual Attention         Mean        53.12%       –
                         Std.Dev.    4.66%        –
Some of the results of numerical simulations of the listening behavior of the robot are given in Table 1. The table shows the average percentage of time spent performing three basic interactive behaviors by the simulated robot, compared with the known average values measured in human-human interaction situations; the human-human averages are taken from [8]. As the table shows, the behavior of the robot is similar to the known average human-human behavior for both the mutual gaze and gaze toward instructor behaviors, and the standard deviation in both cases is less than 7% of the mean value, which predicts robust operation in real-world situations. These results suggest that the proposed approach is, at the least, applicable to implementing natural listening behavior.
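As a quick arithmetic check against Table 1: for mutual gaze the standard deviation is 1.94/31.5 ≈ 6.2% of the mean, and for gaze toward instructor it is 3.04/77.87 ≈ 3.9%, both below the 7% bound cited above.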
Fig. 4. Effect of the error level on the behavior of the robot
Fig. 4 shows the effect of increasing the error level on the percentage of time during which the mutual gaze, gaze toward instructor, and mutual attention behaviors were recognized in the simulation. As expected, the amount of time spent in these interactive behaviors decreases with increased error level, although the decrease is not linear and can be well approximated by a quadratic function. Regression analysis revealed that in all three cases the effect on the mean time spent in the studied behavior grows with the square of the inverse SNR (signal-to-noise ratio).
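Read as a formula, and purely as a hedged restatement of this regression result (t_0 and c are hypothetical fitted constants per behavior, not values reported here), the trend has the form

    t_behavior(SNR) ≈ t_0 - c · (1/SNR)^2,

so, for example, halving the SNR roughly quadruples the drop in time spent on the behavior.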
4 Limitations
Although L0 EICA provides a unified, simple architecture for implementing behavioral control systems and provides a consistent action integration mechanism, it has some limitations that will be targeted by higher levels of specification. One such limitation is the lack of an explicit learning mechanism that enables the behavior of the robot to evolve over time. Although the architecture does allow the programmer to implement any learning algorithm as a set of processes that control the parameters of other running active components through effect channels, it does not provide a specific learning framework to make this easier. Another limitation is the lack of higher-level planning-like structures that enable complex behaviors to be built easily from simpler ones. Again, such higher-level planning-like structures can be implemented by the robot programmer in an ad-hoc fashion, but it would be better if the architecture itself provided an easy way to implement such deliberative mechanisms. A third limitation of the current system, resulting from its generality, is the lack of an explicit mechanism for reasoning about the beliefs and behaviors of interacting agents, which is required to implement a theory of mind for the robot. Future work on EICA will target all of these limitations by providing higher levels of specification compatible with L0 EICA.
5 Conclusion
This paper presented the design of the lowest level of specification of a new robotic architecture (EICA) that was designed to address four basic limitations of available HRI architectures. Some of the novel features of the proposed architecture were shown, and the implementation of a humanoid robot that uses human-like nonverbal head movement while listening, built on this architecture, was described. Experimentation with a simulated version of the robot revealed that EICA can indeed be used to develop interactive robots that achieve human-like nonverbal interaction capabilities, and that the implemented system has good noise-rejection properties even though the underlying implementation is massively parallel. In the future, the details of the higher levels of specification of EICA that introduce planning-like capabilities to the system will be reported and tested in a complete version of the listener robot.
References

1. Arkin, R.C., Fujita, M., Takagi, T., Hasegawa, R.: An ethological and emotional basis for human-robot interaction. Robotics and Autonomous Systems 42(3-4) (March 2003) 191–201
2. Karim, S., Sonenberg, L., Tan, A.H.: A hybrid architecture combining reactive plan execution and reactive learning. In: 9th Biennial Pacific Rim International Conference on Artificial Intelligence (PRICAI). (2006)
3. Ishiguro, H., Kanda, T., Kimoto, K., Ishida, T.: A robot architecture based on situated modules. In: IEEE/RSJ Conference on Intelligent Robots and Systems 1999 (IROS 1999). Volume 3., IEEE (October 1999) 1617–1624
4. Nicolescu, M.N., Matarić, M.J.: A hierarchical architecture for behavior-based robots. In: AAMAS '02: Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems, New York, NY, USA, ACM (2002) 227–233
5. Mohammad, Y.F.O., Nishida, T.: A new, HRI inspired, view of intention. In: AAAI-07 Workshop on Human Implications of Human-Robot Interactions. (July 2007) 21–27
6. Mohammad, Y.F.O., Nishida, T.: TalkBack: Feedback from a miniature robot. In: Twentieth Australian Joint Conference on Artificial Intelligence. (December 2007) 357–366
7. Kanda, T., Kamasima, M., Imai, M., Ono, T., Sakamoto, D., Ishiguro, H., Anzai, Y.: A humanoid robot that pretends to listen to route guidance from a human. Autonomous Robots 22(1) (2007) 87–100
8. Argyle, M.: Bodily Communication. Routledge, New Ed edition (2001)
9. Mohammad, Y.F.O., Ohya, T., Hiramatsu, T., Sumi, Y., Nishida, T.: Embodiment of knowledge into the interaction and physical domains using robots. In: International Conference on Control, Automation and Systems. (October 2007) 737–744