Appears in: 12th IEEE Workshop on Robot and Human Interactive Communication, October 31 - November 2, 2003, San Francisco, CA
Toward Programming of Assembly Tasks by Demonstration in Virtual Environments
Jacopo Aleotti, Stefano Caselli, Monica Reggiani
RIMLab - Robotics and Intelligent Machines Laboratory
Dipartimento di Ingegneria dell'Informazione
University of Parma, Italy
E-mail: {aleotti,caselli,reggiani}@ce.unipr.it
Abstract

Service robots require simple programming techniques allowing users with little or no technical expertise to integrate new tasks in a robotic platform. A promising solution for the automatic acquisition of robot behaviours is the Programming by Demonstration (PbD) paradigm. Its aim is to let robot systems learn new behaviours from a human operator's demonstration. This paper describes a PbD system able to deal with assembly operations in a 3D block world. The main objective of the research is to investigate the benefits of a virtual demonstration environment. By overcoming some of the difficulties of real-world demonstrations, a virtual environment can improve the effectiveness of the instruction phase. Moreover, the user can also supervise and validate the learned task by means of a simulation module, thereby reducing errors in the generation process. Experiments involving the whole set of system components demonstrate the viability and effectiveness of the approach.
1. Introduction

A long term goal in robotics is the development of effective personal and home service robots. This goal requires the development of systems enabling robot programming even with little or no technical competence. In other words, untrained users, who do not possess the skills to develop robot programs, need simple and natural methods of interaction with a robot platform. Two major problems arise in specifying tasks to be carried out in an everyday environment. The first one comprises understanding the user's intentions and plans and transforming them into a representation that can be executed by the robot. This problem requires the exploitation of adequate sensing capabilities or of a high degree of assistance by the user. The second problem, which should not be neglected, is the transformation of the internal representation of the acquired task into a format understandable by humans by means of a high-level user interface. Such an interface may provide a visual simulation of the task, allowing the human user to accept or reject the interpretation. A promising solution that has been proposed for the automatic acquisition of robot behaviours is Programming by Demonstration (PbD) [5], [7], [4].
The aim of PbD is to make robots acquire their behaviours simply by providing a demonstration of how to solve a certain task, once the necessary initial knowledge has been given to the system. A PbD interface then automatically interprets what is to be done from the observed task, thus eliminating the need for tedious textual or iconic programming. For a human user, the easiest way to program a robot is to demonstrate the task in the real world and expect the system to understand and replicate it. This is the most general approach to programming by demonstration, but it is currently still inefficient. Indeed, object recognition techniques and routines for human gesture segmentation can deal only with highly structured environments including a small set of known objects [2]. We remark that if the objects are known, the same a priori knowledge can be used to build their 3D models, which can be directly included in a simulated virtual environment. The use of a simulated virtual environment to directly demonstrate the task can help overcome some of these difficulties. First, tracking the user's actions within a simulated environment is easier than in a real environment, and there is no need for object recognition, since the state of the manipulated objects is known in advance. Moreover, the positions of the human hand and of the grasped objects do not have to be estimated using real sensors such as cameras, so the inaccuracies of sensor-based approaches do not have to be taken into account. Furthermore, while performing the demonstration the operator can always control the position of the virtual cameras through which the virtual world is viewed. Finally, the virtual environment can be augmented with operator aids such as graphical or other synthetic fixtures [14], and force feedback. This paper presents a PbD platform which handles basic manipulation operations in a 3D "block world". We propose a robot teaching method where an operator, wearing a dataglove with a 3D tracker, demonstrates the tasks in a virtual environment. The virtual scene simulates the actual workspace and displays the necessary assembly components. Our system recognizes, from the user's hand movements, a sequence of high-level actions and translates them into a sequence of commands
for a robot manipulator. The recognized task is then performed in a simulated environment for validation. Finally, if the operator agrees with the simulation, the task is executed in the real environment. A library of simple assembly operations has been developed: it allows picking and placing objects on a working plane, stacking objects, and performing peg-in-hole tasks. Section 2 reviews the state of the art regarding a sample of related PbD research. Section 3 introduces the structure and the main components of our system. Section 4 shows some experimental results that demonstrate the potential of the proposed approach to PbD systems. Section 5 summarizes the work and discusses our future investigations.
2. Related work

A number of applications developed in recent years through the Programming by Demonstration paradigm demonstrate its high potential for robotics. Indeed, this approach, which tries to learn from the given examples, capitalizes on interaction with the user to reduce errors in the learning process. This capability greatly reduces the time required for the instruction phase. A classification of the approaches based on human demonstration and an overview of implemented systems can be found in [3]. Solutions proposed in the literature often address only special problems or well-structured environments, demonstrating the difficulties arising in the implementation of a human demonstration system. In most PbD systems, the demonstration takes place in the real world. The system learns action sequences from the information returned by a set of sensors observing task execution. Examples of this approach are the pioneering work of Ikeuchi and Suehiro on Assembly Plan from Observation (APO) [5] and the system implemented by Kang in his dissertation [6]. The APO system uses a light-stripe 3D vision system to identify objects in a "block world". The acquisition system observes the operator's movements and extracts a high-level task model from state transitions. In [6], Kang uses a CyberGlove with a Polhemus 6 d.o.f. tracker and a multi-baseline stereo system to sense human actions and the workspace respectively, thus leading to a robust system that can track objects despite occlusions. In analyzing the task sequence, the system divides the observed sensory data into meaningful temporal segments, namely the pregrasp, grasp, and manipulation phases. Recent solutions involving demonstration in the real world have been proposed by Zöllner et al. [16] and by Ogawara et al. [11], [12]. Dillmann et al. [4], [13], [3], [16] investigate hand gesture and grasp recognition problems and other collateral aspects of PbD, including the inconsistency of human demonstrations and the recognition of fine manipulation tasks. Ogawara et al. [11], [12] propose a new method of constructing a human task model by attention point (AP) analysis. First, they observe the human task, constructing rough task models and finding
Attention Points which require detailed analysis. Then, they refine the human task model by applying a time-consuming analysis only to the APs. Takahashi et al. have also considered demonstration in a virtual environment [15], [10]. While their teaching-by-showing system uses advanced algorithms to recognize operations and to map them onto symbolic tasks, it does not include any simulation of the generated task, which would be useful for safety reasons, and it relies entirely on the operator for the setup of the virtual environment. More recently, Lloyd et al. [8] have used a virtual environment to program part-mating and contact tasks. Their application builds the virtual environment using a vision system that automatically locates simple objects within the work site. A 2-D mouse is then used to select objects within the simulated environment and move them around. Interaction with the objects is simplified by contact information, first-order dynamics, and graphical fixtures useful for part mating. The PbD system described in this paper differs from prior research in several respects, as will be shown in the following sections.
3. Overview of the system

This section describes the proposed PbD system (Figure 1). A virtual reality environment is used for task demonstration, avoiding a physical demonstration in the real world, which can be too time-consuming and too sensitive to sensor inaccuracies. The architecture of the system follows the canonical structure of the "teaching by showing" method, which consists of three phases. The first phase is task presentation, where the user, wearing the dataglove, executes the intended task in a virtual environment. In the second phase the system analyzes the task and extracts a sequence of high-level operations, taken from a set of rules defined in advance. In the final stage the synthesized task is mapped into basic operations and executed, first in a 3D simulated environment and then by the robotic platform. The modular implementation of the architecture allows easy replacement of individual modules, thus improving the reusability and flexibility of the application. It should be noted that adaptation to a different robot only requires replacement of the final module. Figure 1 describes the main components of the PbD testbed. The actual robot controlled by the PbD application is a Puma560, a 6 d.o.f. manipulator. The system also includes a vision sensor to recognize the objects in the real workspace and detect their initial configuration. Table 1 shows the hardware configuration of the PCs used for the demonstration platform and for the robot and vision servers. Communication between client and server stations is achieved through a CORBA framework [1].
Figure 1. PbD system architecture. [Block diagram: the human operator, wearing the CyberTouch glove and tracker and receiving visual and vibrotactile feedback, interacts with the operator demonstration interface and its virtual environment; the task planner, task performer and task simulation communicate over the CORBA infrastructure with the real workspace, which comprises the PUMA 560 robot arm, the robot controller (RCCL), the vision server and the vision system.]

Table 1. Hardware setup.
Demonstration platform: Intel dual Pentium III 700 MHz, 256 MB RAM, OS MS Windows 2000.
Robot server: Intel Pentium III 600 MHz, 256 MB RAM, OS Solaris 8.0.
Vision server: Intel Pentium 4 2.4 GHz, 512 MB RAM, OS SuSE Linux 8.1.
3.1 Demonstration interface

The demonstration interface includes an 18-sensor CyberTouch (a virtual reality glove integrating tactile feedback devices, from Immersion Corp.) and a six degree-of-freedom Polhemus tracker. The human operator uses the glove as an input device. The operator's gestures are directly mapped to an anthropomorphic 3D model of the hand in the simulated workspace. In the developed demonstration setup, the virtual environment is built upon the Virtual Hand Toolkit (VHT) provided by Immersion Corp. To deal with geometrical information in a formal way, VHT uses a scene graph data structure (the Haptic Scene Graph, HSG) containing high-level descriptions of the environment geometries. VRML models can be easily imported into VHT through a parser included in the library. To enable dynamic interaction between the virtual hand and the objects in the scene, VHT allows objects to be grasped. A collision detection algorithm (V-Clip)
generates collision information between the hand and the objects, including the surface normal at each collision point. A grasp state is achieved if the contact normals provide sufficient friction; conversely, if the grasp condition for a grasped object is no longer satisfied, the object is released. The user interface also provides vibratory feedback using the CyberTouch actuators. Vibrations convey proximity information that helps the operator grasp the virtual objects. The current implementation of the virtual environment for assembly tasks in the "block world" consists of a single plane and a set of 3D coloured blocks on it, qualitatively reproducing the real workspace configuration.
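As a rough sketch of this grasp test (not the actual VHT/V-Clip code; the Contact structure, the friction-cone angle and the helper below are assumptions for illustration), an object can be considered grasped when at least two contact normals are nearly antiparallel, so that friction at the opposing contacts can hold it:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical contact record returned by collision detection between a
// finger phalanx and an object (unit surface normal at the contact point).
struct Contact { double nx, ny, nz; };

// Crude grasp heuristic: the object is grasped if two contact normals are
// within frictionConeDeg of being antiparallel, i.e. the contacts oppose
// each other closely enough for friction to balance the grasp forces.
bool isGrasped(const std::vector<Contact>& contacts, double frictionConeDeg = 30.0) {
    const double kPi = 3.14159265358979323846;
    const double cosThreshold = std::cos((180.0 - frictionConeDeg) * kPi / 180.0);
    for (std::size_t i = 0; i < contacts.size(); ++i) {
        for (std::size_t j = i + 1; j < contacts.size(); ++j) {
            double dot = contacts[i].nx * contacts[j].nx +
                         contacts[i].ny * contacts[j].ny +
                         contacts[i].nz * contacts[j].nz;
            if (dot < cosThreshold)  // nearly antiparallel pair found
                return true;
        }
    }
    return false;
}
```

When this condition stops holding for an object currently attached to the virtual hand, the object is released, which is also the event used for task segmentation in the next section.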
3.2 Task recognition

The task planner analyzes the demonstration provided by the human operator and segments it into a sequence of high-level primitives that should describe the user's actions. To segment the human actions into high-level operations, a simple algorithm based on changes in the grasping state has been implemented: a new operation is generated whenever a grasped object is released. The effect of the operation is determined by evaluating the achieved object configuration in the workspace. Three high-level tasks have been identified as basic blocks to describe assembly operations.
Figure 2. Task hierarchy. [Class diagram: Task, specialized into BasicTask (Move_xy, Move_z, AttachObj, DetachObj) and HighLevelTask (PickAndPlaceOnTable, PickAndPlaceOnObj, PegInHole).]
The first one is used to pick objects and place them onto a support plane (PickAndPlaceOnTable), the second one is used to pile objects (PickAndPlaceOnObj), and the last one is used to put small objects into the hole of a container on the working plane (PegInHole). The three high-level tasks have been implemented in C++ as subclasses of a HighLevelTask abstract class (Figure 2). Information about the recognized high-level task is passed to the constructor when the corresponding class is instantiated. In detail, the PickAndPlaceOnTable constructor requires a reference to the grasped object and the release position on the table; the PickAndPlaceOnObj constructor requires a reference to both the grasped object and the object on which it must be deployed; finally, the PegInHole constructor requires a reference to both the grasped object and the container.
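As an illustration of this hierarchy and of the release-triggered segmentation rule, the following is a minimal C++ sketch; the SceneObject handle, the member names and the classifyRelease() helper are hypothetical and only mirror the constructor arguments described above, not the actual implementation:

```cpp
#include <memory>
#include <string>

// Hypothetical handle to an object node in the haptic scene graph.
struct SceneObject { std::string name; };

// Abstract base class of the recognized high-level operations (cf. Figure 2).
class HighLevelTask {
public:
    virtual ~HighLevelTask() = default;
};

// Pick an object and place it at a free (xf, yf) position on the table.
class PickAndPlaceOnTable : public HighLevelTask {
public:
    PickAndPlaceOnTable(const SceneObject& obj, double xf, double yf)
        : obj_(obj), xf_(xf), yf_(yf) {}
private:
    SceneObject obj_;
    double xf_, yf_;
};

// Pick an object and stack it on top of another object.
class PickAndPlaceOnObj : public HighLevelTask {
public:
    PickAndPlaceOnObj(const SceneObject& obj, const SceneObject& support)
        : obj_(obj), support_(support) {}
private:
    SceneObject obj_, support_;
};

// Insert a small object into the hole of a container on the working plane.
class PegInHole : public HighLevelTask {
public:
    PegInHole(const SceneObject& peg, const SceneObject& container)
        : peg_(peg), container_(container) {}
private:
    SceneObject peg_, container_;
};

// Invoked on every release event: the achieved configuration of the released
// object decides which high-level task is instantiated.
std::unique_ptr<HighLevelTask> classifyRelease(const SceneObject& obj,
                                               double xf, double yf,
                                               const SceneObject* objectBelow,
                                               bool releasedInsideContainer) {
    if (releasedInsideContainer && objectBelow)
        return std::make_unique<PegInHole>(obj, *objectBelow);
    if (objectBelow)
        return std::make_unique<PickAndPlaceOnObj>(obj, *objectBelow);
    return std::make_unique<PickAndPlaceOnTable>(obj, xf, yf);
}
```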
3.3 Task generation

A set of BasicTasks has been implemented for the basic movements of the real robot. The available concrete classes (Figure 2) include basic straight movements of the end effector, namely translations in the XY plane, parallel to the workspace table, and along the z axis. Moreover, two classes describe the basic operations of picking up and releasing objects by simply closing and opening the on-off gripper of the manipulator. The high-level tasks identified in the previous phase are then decomposed into a sequence of BasicTask objects describing their behaviour. Table 2 describes how the three proposed high-level tasks are decomposed into three sequences of eight basic tasks. The difference is that in the first case the object has to be released on the table, whereas in the second one it must be released on top of a pile at height zf above the table plane. Since the available manipulator has no force sensor, zf is computed in the virtual demonstration environment based on contact relations. For the peg-in-hole task, the grasped object must be released after its initial insertion in the hole. In Table 2, TOP, TABLE and DOWN are predefined constants that define the current environment.
Table 2. High-level tasks decomposition.
Step  P&PlOnTab          P&PlOnObj          PegInHole
1     Move_xy(xi, yi)    Move_xy(xi, yi)    Move_xy(xi, yi)
2     Move_z(zi)         Move_z(zi)         Move_z(zi)
3     AttachObj          AttachObj          AttachObj
4     Move_z(TOP)        Move_z(TOP)        Move_z(TOP)
5     Move_xy(xf, yf)    Move_xy(xf, yf)    Move_xy(xf, yf)
6     Move_z(TABLE)      Move_z(zf)         Move_z(DOWN)
7     DetachObj          DetachObj          DetachObj
8     Move_z(TOP)        Move_z(TOP)        Move_z(TOP)
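For instance, a minimal sketch of the PickAndPlaceOnObj column of Table 2, assuming a lightweight BasicTask value type (the constant values and parameter names below simply mirror the table and are not taken from the actual code):

```cpp
#include <string>
#include <vector>

// Lightweight description of a basic robot motion or gripper action.
struct BasicTask {
    std::string op;            // "Move_xy", "Move_z", "AttachObj", "DetachObj"
    double a = 0.0, b = 0.0;   // optional parameters (x/y position or z height)
};

// Symbolic heights used in Table 2 (numeric values here are illustrative only).
const double TOP = 0.30, TABLE = 0.0, DOWN = 0.05;

// Decomposition of PickAndPlaceOnObj following Table 2: approach, grasp,
// lift, transfer, descend to the pile height zf, release, and retract.
std::vector<BasicTask> decomposePickAndPlaceOnObj(double xi, double yi, double zi,
                                                  double xf, double yf, double zf) {
    return {
        {"Move_xy", xi, yi},   // 1: move above the grasped object
        {"Move_z",  zi},       // 2: descend to the object height
        {"AttachObj"},         // 3: close the gripper
        {"Move_z",  TOP},      // 4: lift to a safe height
        {"Move_xy", xf, yf},   // 5: move above the destination
        {"Move_z",  zf},       // 6: descend to the top of the pile
        {"DetachObj"},         // 7: open the gripper
        {"Move_z",  TOP},      // 8: retract
    };
}
```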
Each concrete class of the task tree provides two methods to perform the operation, one in the simulated environment and one in the real workspace. Once the entire task has been planned, the task performer (Figure 1) manages the execution process in both the simulated and real workspaces.
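A possible shape for this dual interface, together with the task performer loop, is sketched below; the names ExecutableTask, performSimulated, performReal and TaskPerformer are placeholders, not the actual class names:

```cpp
#include <memory>
#include <utility>
#include <vector>

// Each concrete task knows how to run itself both in the simulated
// environment and in the real workspace.
class ExecutableTask {
public:
    virtual ~ExecutableTask() = default;
    virtual void performSimulated() = 0;   // update the VRML/HSG simulation
    virtual void performReal() = 0;        // command the real manipulator
};

// The task performer first replays the whole plan in simulation for
// validation; only if the operator approves is it sent to the real robot.
class TaskPerformer {
public:
    void add(std::unique_ptr<ExecutableTask> t) { plan_.push_back(std::move(t)); }

    void simulate() {
        for (auto& t : plan_) t->performSimulated();
    }

    void execute(bool operatorApproved) {
        if (!operatorApproved) return;     // discard the task without execution
        for (auto& t : plan_) t->performReal();
    }

private:
    std::vector<std::unique_ptr<ExecutableTask>> plan_;
};
```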
3.4 Task simulation

After the recognition phase, the system displays to the human operator a graphical simulation of the generated task. This simulation improves safety, since the user can check the correctness of the interpreted task. If the user is not satisfied after the simulation, the task can be discarded without execution in the real environment. The simulation is non-interactive and takes place in a virtual environment exploiting the same scene graph used for workspace representation in the demonstration phase. The only difference is that the virtual hand node in the HSG is replaced by a VRML model of the Puma560 manipulator. The simulated robot is able to perform all the operations described in the previous section. The movement of the VRML model is obtained by applying an inverse kinematics algorithm for the specific robot and is updated at every frame. For this purpose, the PbD system exploits the RRG Kinematix library from the Robotics Research Group at the University of Texas (http://www.robotics.utexas.edu). All information about the robot is included in a Denavit-Hartenberg parameter file supplied to the algorithm. In the simulation, picking and releasing operations are achieved by attaching and detaching the HSG nodes representing the objects to and from the last link of the VRML model of the manipulator.
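A minimal sketch of this per-frame update, with the inverse kinematics stubbed out (the real system uses the RRG Kinematix library, whose API is not reproduced here; solveIK, RobotModel and updateFrame are hypothetical names):

```cpp
#include <array>

// Desired Cartesian pose of the simulated end effector for the current frame.
struct Pose { double x, y, z, roll, pitch, yaw; };

// Hypothetical stand-in for an IK solver configured with the Puma560
// Denavit-Hartenberg parameters (stubbed here; a real solver would be
// plugged in instead).
std::array<double, 6> solveIK(const Pose& /*target*/) {
    return {0, 0, 0, 0, 0, 0};  // joint angles reaching the requested pose
}

// Hypothetical handle to the VRML robot model inside the scene graph.
struct RobotModel {
    std::array<double, 6> joints{};
    void setJointAngles(const std::array<double, 6>& q) { joints = q; }
};

// Called once per rendered frame: the simulated robot tracks the planned
// end-effector trajectory by recomputing and applying the joint angles.
void updateFrame(RobotModel& robot, const Pose& target) {
    robot.setJointAngles(solveIK(target));
}
```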
3.5 Task execution

The execution in the real workspace is obtained through a C++ framework [1] based on TAO (The ACE ORB). The PbD system builds a client-server CORBA connection over a Fast Ethernet switch. The client side runs on MS Windows 2000, whereas the server controlling the real manipulator runs on Solaris 8. The methods of the concrete classes in the task list invoke blocking remote calls on the servant manipulator object, which transforms them into manipulator commands exploiting the Robot Control C Library (RCCL) [9].
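The sketch below illustrates the idea on the client side; ManipulatorProxy stands in for the IDL-generated CORBA stub of the servant manipulator object, and the method names are hypothetical placeholders rather than the actual interface:

```cpp
// Hypothetical client-side view of the manipulator servant. In the real
// system this would be an IDL-generated CORBA stub whose calls block until
// the corresponding RCCL command completes on the robot server.
class ManipulatorProxy {
public:
    virtual ~ManipulatorProxy() = default;
    virtual void moveXY(double x, double y) = 0;  // straight motion in the XY plane
    virtual void moveZ(double z) = 0;             // straight motion along z
    virtual void attachObj() = 0;                 // close the on-off gripper
    virtual void detachObj() = 0;                 // open the on-off gripper
};

// Example of a concrete basic task forwarding its real-world execution to
// the remote servant through a blocking remote call.
class MoveZ {
public:
    explicit MoveZ(double z) : z_(z) {}
    void performReal(ManipulatorProxy& robot) const { robot.moveZ(z_); }
private:
    double z_;
};
```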
Figure 3. PbD of an assembly task in a block world. [Panels: virtual environment and operator workspace during the demonstration (steps 1 and 8); predictive simulation and manipulator workspace during the execution (steps 1 and 8).]

Figure 4. PbD of peg-in-hole and piling of boxes tasks. [Panels: virtual environment and operator workspace during the demonstration (steps 1 and 3); predictive simulation and manipulator workspace during the execution (steps 1 and 3).]
The PbD system goes through four different reference frames that must be correctly matched with appropriate homogeneous transforms. The first one is the reference frame relative to the Polhemus tracker. This frame must be mapped into the second reference frame that describes the virtual demonstration environment. The third and fourth frames are attached to the simulation and the real workspaces respectively.
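As a worked sketch of this frame chaining (the calibration matrices are hypothetical placeholders that would be obtained offline), a point measured in the tracker frame can be mapped into the workspace frame by composing homogeneous transforms:

```cpp
#include <array>

using Mat4 = std::array<std::array<double, 4>, 4>;
using Vec4 = std::array<double, 4>;  // homogeneous point (x, y, z, 1)

// Multiply a homogeneous transform by a homogeneous point.
Vec4 apply(const Mat4& T, const Vec4& p) {
    Vec4 r{0, 0, 0, 0};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += T[i][j] * p[j];
    return r;
}

// Chained mapping: tracker frame -> virtual demonstration frame ->
// simulation/real workspace frame. T_virt_tracker and T_work_virt are
// placeholder calibration matrices.
Vec4 trackerToWorkspace(const Mat4& T_virt_tracker,
                        const Mat4& T_work_virt,
                        const Vec4& p_tracker) {
    Vec4 p_virt = apply(T_virt_tracker, p_tracker);
    return apply(T_work_virt, p_virt);
}
```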
4. Experiments

The capabilities of the PbD system have been evaluated in assembly experiments comprising pick and place operations on the workspace plane, stacking of objects, and peg-in-hole operations. We describe two of the experiments in the following.
The first experiment (Figure 3) consists of a sequence of eight pick and place actions on the working plane. The workspace contains four coloured boxes of the same dimensions. The images on the left of Figure 3 are snapshots of the virtual environment and the operator workspace during the demonstration phase; the images on the right show the simulated and real workspaces. In the second experiment (Figure 4) the workspace contains three objects: two coloured boxes and a cylinder. The user demonstration consists of a sequence of three steps. The user first picks up the cylinder and puts it in the container, then puts the yellow box in a different position on the table, and finally grasps the blue box and stacks it on top of the yellow one. While performing the demonstration, the user can dynamically adjust the
point of view of the virtual scene. This feature, typical of demonstration in virtual environments, can help in picking partially occluded objects, releasing them on the plane or on other boxes, and inserting them in the container. For space reasons, Figures 3 and 4 only show the initial and final steps of both tasks. Movies of the demonstrations are available at the web page: http://rimlab.ce.unipr.it/Projects/PbD/pbd.html. Figure 5 shows a close-up image of the peg entering the hole. It should be mentioned that, since the robot has no force sensor, the task is essentially a pick-and-place task with additional accuracy requirements and no intended contact between the peg and the hole.
Figure 5. Peg entering the hole.
5. Conclusions

In this paper we have described the prototype of a PbD system based on a virtual reality teaching interface. We believe that using a data glove as an input device provides a natural method for human-robot interaction. Our future research will focus on the extension of the system functionalities, including the development of refined learning algorithms and motion planning strategies. Moreover, we intend to add to the demonstration interface graphical and tactile virtual fixtures to assist the user more effectively while performing the task. A long term goal of PbD is also the acquisition of generic workspace configurations into the virtual environment regardless of the object type. A solution to the automatic object recognition problem would help in the development of advanced service robot applications based on the programming by demonstration paradigm.
Acknowledgments

This research is partially supported by MIUR (Italian Ministry of Education, University and Research) under project RoboCare (A Multi-Agent System with Intelligent Fixed and Mobile Robotic Components).
References

[1] S. Bottazzi, S. Caselli, M. Reggiani, and M. Amoretti. A Software Framework based on Real-Time CORBA for Telerobotic Systems. In IEEE Int'l Conf. on Intelligent Robots and Systems, 2002.
[2] R. Dillmann, M. Ehrenmann, D. Ambela, and P. Steinhaus. A Comparison of Four Fast Vision Based Object Recognition Methods. In IEEE Int'l Conf. on Robotics and Automation, 2000.
[3] R. Dillmann, O. Rogalla, M. Ehrenmann, R. Zöllner, and M. Bordegoni. Learning Robot Behaviour and Skills Based on Human Demonstration and Advice: The Machine Learning Paradigm. In 9th Int'l Symp. of Robotics Research, 1999.
[4] H. Friedrich, S. Münch, R. Dillmann, S. Bocionek, and M. Sassin. Robot Programming by Demonstration: Supporting the Induction by Human Interaction. Machine Learning, pages 163–189, May 1996.
[5] K. Ikeuchi and T. Suehiro. Towards an Assembly Plan from Observation, Part I: Task Recognition with Polyhedral Objects. IEEE Trans. on Robotics and Automation, 10(3), 1994.
[6] S. B. Kang. Robot Instruction by Human Demonstration. PhD thesis, Carnegie Mellon University, December 1994.
[7] Y. Kuniyoshi, M. Inaba, and H. Inoue. Learning by Watching: Extracting Reusable Task Knowledge from Visual Observation of Human Performance. IEEE Trans. on Robotics and Automation, 10(6), 1994.
[8] J. E. Lloyd, J. S. Beis, D. K. Pai, and D. G. Lowe. Programming Contact Tasks Using a Reality-Based Virtual Environment Integrated with Vision. IEEE Trans. on Robotics and Automation, 15(3), 1999.
[9] J. Lloyd and V. Hayward. Multi-RCCL User's Guide, 1992.
[10] H. Ogata and T. Takahashi. Robotic Assembly Operation Teaching in a Virtual Environment. IEEE Trans. on Robotics and Automation, 10(3), 1994.
[11] K. Ogawara, S. Iba, T. Tanuki, H. Kimura, and K. Ikeuchi. Recognition of Human Task by Attention Point Analysis. In IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems, 2000.
[12] K. Ogawara, J. Takamatsu, H. Kimura, and K. Ikeuchi. Extraction of Essential Interactions through Multiple Observations of Human Demonstrations. IEEE Trans. on Industrial Electronics, 50(4), 2003.
[13] O. Rogalla, M. Ehrenmann, and R. Dillmann. A Sensor Fusion Approach for PbD. In IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems, 1999.
[14] C. P. Sayers and R. P. Paul. An Operator Interface for Teleprogramming Employing Synthetic Fixtures. Presence, 3(4), 1994.
[15] T. Takahashi and T. Sakai. Teaching Robot's Movement in Virtual Reality. In IEEE/RSJ Int'l Workshop on Intelligent Robots and Systems, 1991.
[16] R. Zöllner, O. Rogalla, R. Dillmann, and M. Zöllner. Understanding Users Intention: Programming Fine Manipulation Tasks by Demonstration. In IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems, 2002.