In: IFAC Workshop on Human-Oriented Design of Advanced Robotics Systems (DARS '95)

WHAT CAN ROBOTS LEARN FROM HUMANS?

Holger Friedrich and Michael Kaiser
Institute for Real-Time Computer Systems & Robotics
University of Karlsruhe, D-76128 Karlsruhe, Germany
E-Mail: [email protected]

Abstract. Programming by Demonstration (PbD) is an intuitive method to program a robot. The user, acting as a teacher or programmer, shows how a particular task should be carried out. The demonstration is monitored using an interface device that allows the measurement and recording of both the applied commands and the data simultaneously perceived by the robot's sensors. This paper identifies the kind of knowledge that the robot can actually acquire from the human user through demonstrations, and the requirements that must be met in order to interpret what has been demonstrated. Finally, it presents and experimentally evaluates an approach to the integrated acquisition, evaluation, tuning, and execution of elementary skills and task-level programs for robots, based on human demonstrations.

Key Words. Robots, Man-Machine Systems, Programming Support, Machine Learning

1 INTRODUCTION

One of the major cost factors in robotic applications is the development of robot programs. Especially the use of advanced sensor systems and strong requirements on the robot's flexibility call for very skillful programmers and sophisticated programming environments. While these programming skills may exist in industrial environments, they are certainly not available when the use of robots in a personal environment is considered. To open the expected new, mainly consumer-oriented service robot market (Schraft, 1994), it is therefore essential to develop techniques that allow untrained users to operate such a personal service robot both safely and efficiently.

Figure 1 Human-Robot Interaction in the context of personal service robots. (Figure labels: Teaching, Programming, Configuration; Understanding, Control, Maintenance.)

Two basic aspects of the interaction between the robot and the user can be distinguished (Fig. 1, (Dillmann et al., 1995b)). Firstly, the user wants to configure and instruct the robot.

This requires translating the user's language into the robot's, i.e., compiling user intentions into actual robot programs. Secondly, to allow the user to efficiently control and maintain the robot, the low-level numerical representations used by the robot have to be translated into an understandable form, i.e., symbols have to be built from signals. What is desired is to enable the robot to perform these tasks partly autonomously, i.e., to learn semantically meaningful descriptions of its own perceptions, actions, and states, and to use these descriptions both to communicate robot knowledge to the user and to interpret user demonstrations, i.e., to acquire human knowledge from observing human performance. These are also the key issues in the work on Robot Programming by Demonstration (Heise, 1989; Kuniyoshi et al., 1994; Friedrich and Dillmann, 1995) and Robot Skill Acquisition via Human Demonstration (Asada and Liu, 1991; Reignier et al., 1995; Kaiser et al., 1995a).

Learning, however, can only proceed based on information obtained about, e.g., the environment the robot is operating in, the effect of the robot's actions, and, obviously, the intentions of the user, i.e., based on some notion of a goal that the user communicates to the robot. Throughout this paper, it is investigated, and exemplified by means of experiments with real robots, what kind of information can be expected from an untrained user and what learning techniques are available to make use of this information. Finally, a methodology for transferring human knowledge to robots is presented that explicitly takes the particular characteristics of untrained users' knowledge into account.

2 LEARNING FROM HUMAN DEMONSTRATIONS

Several learning tasks can be identified in the framework of robotics and robot control (Kreuziger, 1992). In principle, Machine Learning can be applied to support the following activities during robot development and deployment:

- Initial knowledge acquisition and program generation, i.e., initial robot programming and world-model acquisition.

- Action and world knowledge refinement, i.e., acquisition of new knowledge in order to solve new tasks, as well as refinement and correction of existing knowledge that helps the robot to deal with an uncertain environment and to adapt to a changing one.

As long as the robot can learn autonomously, i.e., by using only the information that is directly available (e.g., via sensor data and internal evaluation functions), both knowledge acquisition and refinement can be performed at any instant. However, if learning involves information that must be requested from or given by a human user, it must proceed demand-driven. Two prototypical cases then require the robot to extend its knowledge base, i.e., to learn (a minimal sketch of such a trigger follows the list):

1. A task is specified (either internally or by the human user) that the robot cannot perform due to missing action knowledge.

2. A task is specified that involves objects that are unknown to, or not identifiable by, the robot.
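To make this concrete, here is a minimal sketch of such a demand-driven trigger, assuming a simple set-based knowledge base. The names (`KnowledgeBase`, `Task`, `demonstration_requests`) are illustrative assumptions of ours, not an interface from the paper:

```python
# Hypothetical sketch of the demand-driven learning trigger: the robot
# compares a specified task against its knowledge base and requests a
# demonstration for each missing piece. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class KnowledgeBase:
    known_skills: set = field(default_factory=set)
    known_objects: set = field(default_factory=set)

@dataclass
class Task:
    required_skills: list
    involved_objects: list

def demonstration_requests(task, kb):
    """Case 1: missing action knowledge. Case 2: unknown objects."""
    requests = []
    for skill in task.required_skills:
        if skill not in kb.known_skills:
            requests.append(f"demonstrate skill '{skill}'")
    for obj in task.involved_objects:
        if obj not in kb.known_objects:
            requests.append(f"show object '{obj}' for signature acquisition")
    return requests

kb = KnowledgeBase(known_skills={"move_to"})
task = Task(required_skills=["move_to", "insert_peg"],
            involved_objects=["peg", "hole"])
print(demonstration_requests(task, kb))
# ["demonstrate skill 'insert_peg'",
#  "show object 'peg' for signature acquisition",
#  "show object 'hole' for signature acquisition"]
```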

2.1 Learning Tasks

The concrete learning tasks, i.e., the tasks of acquiring knowledge from external information, depend to a certain extent on the knowledge available to the robot and on the way this knowledge is used for robot control, i.e., on the robot's control architecture. To allow for a most general formulation of the learning tasks, we assume that the robot provides the following:

- a set of objects plus methods that allow these objects to be detected and identified,

- a set of elementary skills, i.e., perception-action transformations involving no model knowledge, that represent the basic capabilities of the robot,

- a reasoning mechanism that can generate a program to perform an (externally or internally) specified task, using the elementary skills and the available object knowledge, and

- a user interface that supports the specification of tasks as well as the external control of the robot for demonstration purposes.

Initially, no objects may be known to the robot, and the only existing skill may be to move unsupervised (i.e., without sensor feedback) to a commanded location. The learning tasks that must then be considered are the following (one possible set of representations is sketched after the list):

1. Acquisition of a new program schema, i.e., learning of a sequence of elementary skills including proper application conditions.

2. Acquisition of a new elementary skill, i.e., learning of a perception-action transformation that does not involve model knowledge.

3. Acquisition of a new perception skill, i.e., learning of the signature representing a specific object.

4. Integration of the acquired skill, program schema, or signature into the knowledge base, i.e., learning how to use the newly acquired knowledge.

The robot can solve none of these tasks purely on its own: completely searching the action or perception space is infeasible, and even then an evaluation by a knowledgeable supervisor would still be required.
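To illustrate what the objects of these four learning tasks might look like, the following sketch gives one possible set of data structures. All names and fields are our assumptions; the operational representations in an actual system (e.g., neural networks, cf. Section 3.2) would be richer:

```python
# Illustrative (assumed) representations for the four learning tasks.
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class ElementarySkill:           # task 2: perception-action transformation
    name: str
    policy: Callable[[Sequence[float]], Sequence[float]]  # sensors -> command

@dataclass
class ObjectSignature:           # task 3: signature of a specific object
    label: str
    matches: Callable[[Sequence[float]], bool]  # does a percept fit the object?

@dataclass
class ProgramStep:
    skill: ElementarySkill
    applicable: Callable[[dict], bool]  # application condition for this step

@dataclass
class ProgramSchema:             # task 1: skill sequence plus conditions
    name: str
    steps: Sequence[ProgramStep]

def integrate(knowledge_base: dict, item) -> None:
    """Task 4: add newly acquired knowledge so the planner can use it."""
    knowledge_base.setdefault(type(item).__name__, []).append(item)
```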

2.2 Demonstration as the Basis of Learning

Since the robot requires (human) support for solving the learning tasks, Programming by Demonstration (PbD) (Cypher, 1993) is a natural way to go. This paradigm relies on demonstrations of the task under consideration that are given by a user. The demonstrations are used as the primary input and therefore as the basis for the learning process. The PbD approach has been applied successfully in domains such as graphic editors (Lieberman, 1993), instructible software agents (Maulsby, 1994), and intelligent interfaces (Minton et al., 1995).

Additionally, Robot Programming by Demonstration (RPD) has been realized in a number of applications and on different levels of both robot control and perception:

- On the task level, demonstrations were proven suitable for the acquisition of new program schemata (Segre, 1989). In (Kuniyoshi et al., 1994), sequences of video images were analyzed in order to generate assembly plans. (Andreae, 1984) presented NODDY, a system that generated generalized programs by fusing several demonstrations. Single demonstrations and user intentions are the basis for the robot programs generated by the system described in (Friedrich and Dillmann, 1995).

- On the control level, demonstrations can be used as the basis for learning new elementary skills, both open-loop and closed-loop (a toy version of such a perception-action mapping is sketched after this list). The acquisition of open-loop skills mostly focuses on the reconstruction of trajectories from a sequence of demonstrated states (positions) (Delson and West, 1994; Ude, 1993). Systems dealing with the acquisition of closed-loop elementary skills generally feature a very task-specific design. They comprise acquisition techniques for manipulation tasks such as deburring (Asada and Liu, 1991) and assembly (Dillmann et al., 1995a; Kaiser et al., 1995a), as well as for vehicle control (Pomerleau, 1991) and autonomous robot navigation (Reignier et al., 1995).

- Learning new perception skills for object and landmark recognition can also take place on several levels. (Accame and Natale, 1995) present an approach to learning sensor parameterizations from demonstrations. Active perception skills, i.e., the combination of actions and sensing for the purpose of object recognition, are the topic of the work presented in (Klingspor and Morik, 1995).
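As a toy illustration of closed-loop skill acquisition, the sketch below fits a direct perception-action mapping to (sensor, command) pairs recorded during a demonstration. A linear least-squares policy stands in for the neural controllers used in the cited systems; the data shapes and names are assumptions:

```python
import numpy as np

def learn_reactive_skill(sensor_log, command_log):
    """sensor_log: (T, n) perceptions; command_log: (T, m) commanded motions."""
    # Fit W so that command ~= sensor @ W (one least-squares problem).
    W, *_ = np.linalg.lstsq(sensor_log, command_log, rcond=None)
    def skill(sensor_reading):
        # Closed loop: each new perception yields the next motion command.
        return sensor_reading @ W
    return skill

# Synthetic stand-in for a recorded demonstration (6D force/torque -> 3D motion):
rng = np.random.default_rng(0)
forces = rng.normal(size=(500, 6))
motions = forces @ rng.normal(size=(6, 3)) * 0.1
skill = learn_reactive_skill(forces, motions)
print(skill(forces[0]))  # next commanded translational offsets
```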

Despite the large number of publications representing research in the various domains related to RPD, what is almost never taken into account is the negative influence the human teacher has on the learning system. Especially in a service environment with demonstrations provided by inexperienced users, examples must be assumed to be far from optimal with respect to the robot.

3 TRANSFERRING HUMAN KNOWLEDGE TO ROBOTS

The human factor in demonstration and subsequent learning, i.e., throughout the whole RPD process, requires the following questions to be considered:

1. What, in general, is the quality of human demonstrations with respect to their use as a basis for robot learning?

2. What errors and/or suboptimalities that might degrade demonstration quality can, or are likely to, occur in human demonstrations?

3. How can the quality of a demonstration be assessed automatically, and how can knowledge about the demonstration quality be used?

4. How can suboptimal demonstrations be handled such that their negative influence on the learning process is overcome and a good performance is achieved?

Since automatic quality assessment and demonstration preprocessing were already discussed in (Kaiser et al., 1995b), we restrict ourselves here to questions 1 and 2, while the methodology presented in Section 3.2 considers all of the posed questions.

3.1 The teacher's performance - a crucial factor in the learning process

Ideally, the human teacher will demonstrate exactly what is required by the learning system, i.e., he will provide, in terms of examples, an exact procedure for performing a certain action or for identifying a certain object. However, knowledge provided by means of human demonstrations will seldom come as examples of a program schema, a skill, or a signature that are optimal with respect to the robot or the learning system. While suboptimality w.r.t. the robot simply means that the best solution that could in principle be executed by the robot is better (e.g., in terms of speed or energy consumption) than the demonstrated one, the latter kind of suboptimality arises from information that is missing in the sampled data. It occurs as soon as the operator uses sensors that are not available to the robot (and can therefore not be used for generating a robot program) or if the operator employs mental models that partially replace actual robot perceptions. In addition, the user might accidentally provide the learning system with incorrect control knowledge, such as incorrect or incomplete information about his intentions or about the semantics of the example data. In such a case, the knowledge acquired on the basis of the given example might be correct, but the robot will not be able to use it correctly. More specifically, the following sources of example degradation exist:

Figure 2 Left: The peg-into-hole task is a typical example of an operation requiring continuous control. Right: Actions (Dx, Dy, Dz) (commanded translational offsets) recorded from a human demonstration over roughly 600 time steps.
- Unnecessary actions that do not contribute to achieving the final goal can take place during the demonstration. Whereas this has to happen consciously on the task level, unconsciously performed unnecessary actions occur regularly on the skill level. As soon as the identification of an object requires active perception, this problem also occurs in the context of signature acquisition.

- While unnecessary motions that are corrected afterwards do not affect the course and the usefulness of a demonstration, incorrect actions do. These actions occur on the skill level, on the task level, and with respect to active perception for signature identification.

- Unmotivated actions are actions that cannot be learned by the PbD system, since it cannot determine any condition for their execution. In general, these actions occur in demonstrations that have been performed using sensors that were not used for recording the example and are not available to the robot. Often, these actions cause contradictions in the sampled data.

- The choice of the scenario for the demonstration is also crucial. Both on the task and on the skill level, demonstrating the solution under specific environmental conditions may severely restrict the general applicability of the generated robot program. For signature acquisition, this may result in signatures that are based not only on object features but also on irrelevant but randomly correlated features of the environment.

- The specification of a wrong intention is the worst thing that can happen. On the skill level, this simply means that a wrong semantics is assigned to the learned skill, such that it will not be applied in the correct context. On the task level, specifying a wrong intention may result in a complete change of the generated program. W.r.t. learning object signatures, a wrong intention means assigning a wrong label and, possibly, wrong attributes to an object.

Unfortunately, these suboptimalities are not the exception but rather the rule when dealing with human demonstrations. Unnecessary and incorrect actions are very likely to occur even when fairly simple tasks are demonstrated. For example, a human solving a puzzle will put parts in the wrong place every now and then, only to remove them again later. Furthermore, he will move parts around just to check visually whether they fit or not, without exhibiting a strategy that the learning system could extract from these unnecessary actions. Unmotivated actions are especially likely to occur when elementary skills requiring closed-loop control are demonstrated. For example, the data used by the learning system may consist mostly of forces and torques, while the user additionally relies on visual information. Also, the user might employ mental models that are not accessible to the robot during task execution.

Summarizing, example degradation mainly means that the data available for training are in general quite noisy. Fig. 2 shows an example where using the demonstration data displayed on the right will yield an elementary skill (like the insertion skill shown on the left) of only unsatisfactory performance. Consequently, whatever knowledge is learned from these data has to be refined on-line (Kaiser et al., 1995b).
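As an example of one simple preprocessing step for such noisy demonstrations, the sketch below smooths recorded action sequences, such as the (Dx, Dy, Dz) offsets of Fig. 2, with a moving-average filter before learning. This is merely an assumed illustration, not the preprocessing of (Kaiser et al., 1995b); the window size is arbitrary:

```python
import numpy as np

def smooth_demonstration(actions, window=15):
    """actions: (T, d) array of commanded offsets; returns a (T, d) array."""
    kernel = np.ones(window) / window
    # Filter each action dimension independently; mode="same" keeps length T.
    return np.column_stack([np.convolve(actions[:, d], kernel, mode="same")
                            for d in range(actions.shape[1])])

# Synthetic stand-in for the 600-sample peg-into-hole demonstration of Fig. 2:
t = np.linspace(0.0, 1.0, 600)
clean = np.column_stack([-t, 0.2 * np.sin(6 * t), -0.5 * t])
noisy = clean + np.random.default_rng(1).normal(scale=0.05, size=clean.shape)
print(smooth_demonstration(noisy).shape)  # (600, 3)
```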

3.2 A methodology for robust transfer of human knowledge

Since one of the main characteristics of knowledge provided via human demonstrations of actions or object prototypes is the varying quality of this knowledge, both example quality assessment and on-line refinement of the initially acquired knowledge become important issues. Furthermore, for both acquired elementary skills and acquired object signatures, the immediately available operational representation (e.g., as a neural network) must be complemented by a symbolic one, in order to make this new knowledge accessible both to the robot's reasoning mechanism and to the human teacher. Thus, the knowledge transfer process (Fig. 3) becomes complex: it requires more steps than the usually considered sequence of example generation, "strategy extraction" (learning), and skill application (in robotics (Guez and Selinsky, 1988; Asada and Liu, 1991) as well as in psychology (Fitts, 1964; Anderson, 1982)), and it asks for the support of the human teacher in several phases:

1. The teacher determines tasks for the robot that require the robot's knowledge base to be extended.

2. The teacher selects the scenario for the generation of the examples and performs a demonstration, either of a task-level program or of an elementary skill, or shows an object whose signature is to be acquired. Alternatively, a strategy for autonomous experimentation, including boundary conditions on perceptions and actions and an evaluation function, can be specified.

3. The teacher, supported by analysis tools, assesses the quality of the example, thereby guiding both the preprocessing and the definition of convergence criteria.

4. If an elementary skill or an object signature has been acquired, the teacher adds a symbolic interpretation, supported by the learning robot, which provides the context in which the missing skill or signature was detected (originating from step 1).

5. The teacher provides an on-line evaluation of the robot's performance, either by evaluating it subjectively, by choosing a general evaluation function (e.g., based on motion/recognition speed), or by designing an evaluation function specific to the newly acquired skill, schema, or signature.

Figure 3 Different phases of robot knowledge/skill acquisition, application, and refinement. The phases shown include example generation, example preprocessing, quality assessment, example segmentation, training data generation, parameter initialization, the off-line learning process, knowledge/skill evaluation, symbolic interpretation, knowledge/skill application, and knowledge/skill refinement/enhancement. Gray arrows indicate feedback loops. Only some phases are permitted to require user interaction.

The human teacher is therefore heavily involved in the robot's learning process. However, the cumbersome task of programming the robot has been replaced by actually communicating with the robot, either on a high level of abstraction or by showing solutions instead of formally specifying them.
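The skeleton below summarizes how the five phases of teacher involvement might be wired together. Every callback is a placeholder of our own choosing, since the paper specifies the phases and feedback loops of Fig. 3 but not a programming interface:

```python
from typing import Any, Callable

def transfer_knowledge(
    specify_task: Callable[[], Any],         # phase 1: task needing new knowledge
    demonstrate: Callable[[Any], Any],       # phase 2: scenario choice + demonstration
    assess_quality: Callable[[Any], float],  # phase 3: tool-supported assessment
    interpret: Callable[[Any], str],         # phase 4: symbolic interpretation
    evaluate: Callable[[Any], float],        # phase 5: on-line performance evaluation
    learn: Callable[[Any, float], Any],      # off-line learning from preprocessed data
    refine: Callable[[Any, float], Any],     # on-line refinement of the learned skill
    good_enough: float = 0.9,                # assumed convergence threshold
):
    task = specify_task()
    demo = demonstrate(task)
    quality = assess_quality(demo)    # guides preprocessing and convergence criteria
    skill = learn(demo, quality)
    symbol = interpret(skill)         # makes the skill accessible to the planner
    score = evaluate(skill)
    while score < good_enough:        # refinement loop (gray arrows in Fig. 3)
        skill = refine(skill, score)
        score = evaluate(skill)
    return skill, symbol
```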

4 DISCUSSION

Throughout this paper, we have identified learning tasks that are to be solved by a robot that acquires knowledge from humans. We have analyzed the possibility and the advantages of using human-supplied demonstrations as the basis of the learning process, and have pointed out some problems originating from the varying quality of human demonstrations. Finally, we have developed an approach that, despite possible suboptimalities, allows human knowledge to be transferred to robots in a highly interactive way. We believe that the assumptions we have made, i.e., to rely on the ability of a human teacher to demonstrate a solution to a given task and to provide an at least qualitatively correct evaluation of the robot's performance, are realistic. We cannot expect the action and perception skills acquired via an interactive learning approach to be comparable to those originating from an in-depth task analysis and explicit robot programming. However, especially if robots are to become consumer products, they will be exposed to users who are not at all familiar with computers or robots. For such users, explicitly programming their robot according to their personal requirements is not an option, whereas teaching by showing, i.e., Robot Programming by Demonstration, definitely is.

ACKNOWLEDGEMENTS

This work has been partially supported by the ESPRIT Project 7274 "B-Learn II" and by the German Research Foundation within the SFB 314 "Künstliche Intelligenz," Project R2. It has been performed at the Institute for Real-Time Computer Systems & Robotics, Prof. Dr.-Ing. U. Rembold and Prof. Dr.-Ing. R. Dillmann, Department of Computer Science, University of Karlsruhe, Germany.

5 REFERENCES

Accame, M. and F.G.B. De Natale (1995). Neural tuned edge extraction in visual sensing. In: Proceedings of the 3rd European Workshop on Learning Robots (EWLR-3) (M. Kaiser, Ed.). Heraklion, Crete, Greece.

Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological Review 89(4), 369-406.

Andreae, P. M. (1984). Constraint limited generalization: Acquiring procedures from examples. In: Proceedings of the National Conference on Artificial Intelligence (AAAI). pp. 6-10.

Asada, H. and S. Liu (1991). Transfer of human skills to neural net robot controllers. In: Proceedings of the 1991 IEEE International Conference on Robotics and Automation.


Baroglio, C., A. Giordana, M. Kaiser, M. Nuttin and R. Piola (1995). Learning controllers for industrial robots. Machine Learning.

Cypher, A. (Ed.) (1993). Watch What I Do: Programming by Demonstration. MIT Press. Cambridge, Massachusetts.

Delson, N. and H. West (1994). The use of human inconsistency in improving 3D robot trajectories. In: Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems.

Dillmann, R., M. Kaiser and A. Ude (1995a). Acquisition of elementary robot skills from human demonstration. In: International Symposium on Intelligent Robotics Systems. Pisa, Italy.

Dillmann, R., M. Kaiser, V. Klingspor, K. Morik and F. Wallner (1995b). Teaching and understanding intelligent service robots: A machine learning approach. In: 19. Deutsche Jahrestagung für Künstliche Intelligenz (KI '95 Workshop: Informationsverarbeitung in Servicerobotern). Bielefeld, Germany.

Fitts, P. M. (1964). Perceptual-motor skill learning. In: Categories of Human Learning (A. W. Melton, Ed.). Academic Press. New York.

Friedrich, H. and R. Dillmann (1995). Robot programming using user intentions and a single demonstration. In: Proceedings of the 3rd European Workshop on Learning Robots (EWLR-3) (M. Kaiser, Ed.). Heraklion, Crete, Greece.

Guez, A. and J. Selinsky (1988). A neuromorphic controller with a human teacher. In: IEEE International Conference on Neural Networks. San Diego, CA. pp. 595-602.

Heise, R. (1989). Demonstration instead of programming: Focussing attention in robot task acquisition. Research Report No. 89/360/22, Department of Computer Science, University of Calgary.

Kaiser, M., A. Retey and R. Dillmann (1995a). Robot skill acquisition via human demonstration. In: Proceedings of the International Conference on Advanced Robotics (ICAR '95).

Kaiser, M., H. Friedrich and R. Dillmann (1995b). Obtaining good performance from a bad teacher. In: International Conference on Machine Learning, Workshop on Programming by Demonstration. Tahoe City, California.

Klingspor, V. and K. Morik (1995). Towards concept formation grounded on perception and action of a mobile robot. In: Proceedings of the 4th International Conference on Intelligent Autonomous Systems (IAS-4). Karlsruhe.

Kreuziger, J. (1992). Application of machine learning to robotics: An analysis. In: Proceedings of the Second International Conference on Automation, Robotics, and Computer Vision (ICARCV '92).

Kuniyoshi, Y., M. Inaba and H. Inoue (1994). Learning by watching: Extracting reusable task knowledge from visual observation of human performance. IEEE Transactions on Robotics and Automation 10(6), 799-822.

Lieberman, H. (1993). MONDRIAN: A teachable graphical editor. In: Watch What I Do: Programming by Demonstration (A. Cypher, Ed.). MIT Press.

Maulsby, D. (1994). Instructible Agents. PhD thesis. University of Calgary, Canada.

Minton, S., A. Philpot and S. Wolfe (1995). Specification-by-demonstration: The ViCCS interface. In: Proceedings of the Workshop "Learning from Examples vs. Programming by Demonstration" at ICML '95 (H. Friedrich, Ed.). Lake Tahoe, USA.

Pomerleau, D. A. (1991). Efficient training of artificial neural networks for autonomous navigation. Neural Computation 3, 88-97.

Reignier, P., V. Hansen and J.L. Crowley (1995). Incremental supervised learning for mobile robot reactive control. In: Intelligent Autonomous Systems 4 (IAS-4). IOS Press. pp. 287-294.

Schraft, R. D. (1994). Serviceroboter - ein Beitrag zur Innovation im Dienstleistungswesen. Fraunhofer-Institut für Produktionstechnik und Automatisierung (IPA).

Segre, A. M. (1989). Machine Learning of Robot Assembly Plans. Kluwer Academic Publishers.

Ude, A. (1993). Trajectory generation from noisy positions of object features for teaching robot paths. Robotics and Autonomous Systems 11(2), 113-127.