Servomatic: A Modular System for Robust Positioning Using Stereo Visual Servoing
Kentaro Toyama, Gregory D. Hager, and Jonathan Wang
Department of Computer Science, Yale University, New Haven, CT 06520-8285
Abstract
We introduce Servomatic, a modular system for robot motion control based on calibration-insensitive visual servoing. A small number of generic motion control operations, referred to as primitive skills, use stereo visual feedback to enforce a specific task-space kinematic constraint between a robot end-effector and a set of target features. Primitive skills are able to position with an accuracy that is independent of errors in hand-eye calibration and are easily combined to form more complex kinematic constraints as required by different applications. The system has been applied to a number of example problems, showing that modular, high-precision, vision-based motion control is easily achieved with off-the-shelf hardware. Our continuing goal is to develop a system where low-level robot control ceases to be a concern to higher-level robotics researchers.
1 Introduction
Despite a spurt of recent research in vision-guided systems, vision-based robotic systems are still the exception rather than the rule for several reasons: sensitivity to miscalibration of cameras, unavailability of real-time vision systems which are easy to configure, and lack of vision-based motion control modules which can be used by non-experts. For a review of visual-servoing techniques, see [5]. We have already developed a real-time vision system, which we call "XVision," that allows users to perform fast feature tracking based on gradient edges and texture patches [13]. The XVision system comprises several modular software components which can be combined to track objects of varying complexity, ranging from a simple line segment to human faces. This article describes a similar modular software system we are developing for robot control tasks: Servomatic allows users to write high-level applications by calling primitive calibration-insensitive visual-servoing skills
as subroutines or background processes.

Our approach to visual servoing is based on defining hand-eye skills from a smaller set of primitive skills which enforce kinematic constraints using generic visual inputs. The goal of the skill-based paradigm is to demonstrate that a small repertoire of modular, primitive skills, together with an intuitive set of composition operations, results in an easy-to-use system which can handle a variety of tasks robustly.

Visual servoing systems may employ a single camera, typically mounted on the arm itself [2, 6, 20, 21], or they may use a stereo arrangement [1, 16, 17, 22]. Stereo systems must deal with more input data but can also offer accurate 3-D information. This article discusses a free-standing stereo camera arrangement, although with minor modifications, the same formulation could be used for different configurations of more than one camera.

Our paradigm is image-based, as opposed to position-based: we compute feedback directly from measured errors in the camera images instead of from a reconstructed Cartesian reference frame. Image-based systems tend to be more robust to camera miscalibration. The control primitives themselves are chosen to be directly related to intuitive task-space kinematic constraints in order to facilitate programming. Because image-based visual servoing relies on relative measurements between manipulator and target, it requires an endpoint-closed-loop (ECL) system that observes both; the more calibration-dependent endpoint-open-loop (EOL) controllers do not observe the manipulator. The difficulty with many ECL systems is that they use approximations to perspective transformations which are only locally valid, or they employ complex adaptive arrangements that require burdensome calculations [4, 14, 15, 25]. We describe an ECL system that uses a globally valid perspective model.

Finally, most visual servoing research has concentrated on developing solutions to specific, isolated
problems. An exception is found in [3], where it is noted that it would be possible to compile a "library" of canonical visual tasks. In this paper, we provide the building blocks for such a library. The remainder of this paper discusses the relevant literature, the theory of calibration-free visual servoing, some applications using visual servoing, and our current software design.
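To make the image-based feedback idea concrete, the following is a minimal sketch of one proportional servoing update of the general form u = -k J⁺ e, where e stacks image-plane feature errors and J is an image Jacobian relating robot velocities to feature velocities. The function name, gain, and the toy Jacobian and error values are illustrative assumptions for this sketch, not Servomatic's actual interface.

```python
import numpy as np

def visual_servo_step(J, e, gain=0.5):
    """One illustrative image-based visual servoing update.

    J    : image Jacobian mapping robot velocities to image-feature velocities
    e    : stacked image-plane error (observed minus desired feature positions)
    gain : proportional gain k

    Returns a robot velocity command u = -k * pinv(J) @ e. Iterating this
    update drives the observed image error toward zero in the absence of
    noise, without requiring an accurate hand-eye calibration.
    """
    return -gain * np.linalg.pinv(J) @ e

# Toy example: four image measurements (e.g., one feature point seen in a
# stereo pair) controlling three robot degrees of freedom.
J = np.array([[1.0, 0.0,  0.2],
              [0.0, 1.0,  0.1],
              [1.0, 0.0, -0.2],
              [0.0, 1.0, -0.1]])
e = np.array([4.0, -2.0, 3.0, 1.0])   # image-plane error, in pixels
u = visual_servo_step(J, e)           # 3-vector of robot velocities
```

Because the command is computed from measured image errors rather than from reconstructed Cartesian positions, errors in the camera model affect only the transient behavior, not the final positioning accuracy.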
2 Background
We first establish notational conventions and provide general theoretical background.
2.1 Vision-Based Control
Unless otherwise noted, all positions, orientations, and feature coordinates are expressed relative to the robot base coordinate system, W. The pose of an object in this coordinate system is represented by a pair x = (t, R); t ∈