In: 7th International Conference on Advanced Robotics (ICAR '95), Sant Feliu de Guixols, Spain
Interactive Environment Modeling and Vision-Based Task Specification for a Teleoperated Mobile Robot
Lin, I.S., Perret, J., Wallner, F., Dillmann, R.
Institute for Real-Time Computer Systems & Robotics, University of Karlsruhe, Kaiserstraße 12, D-76128 Karlsruhe, Germany; phone: +49 721 608-4059; fax: +49 721 606740; e-mail: [email protected]
LAAS/CNRS, 7 avenue du Colonel Roche, 31077 Toulouse Cedex, France; phone: +33 61.33.64.51; fax: +33 61.33.64.55; e-mail: [email protected]
Abstract. To enable a mobile robot to perform intervention or service tasks in a remote environment, the robot should be able to learn the unknown world and execute high-level task commands from the operator. In this paper an interactive modeling of man-made environments from a single camera is presented. In this model the geometrical, topological and semantic information are integrated in a hierarchical structure. Vision-based interactive task specification is proposed to allow the operator to specify a task and the behavior associated with it directly on the visual interface. The underlying planner works on the interactively created world model and maps the specification to a suitable sequence of actions. With such schemes a teleoperation system can extend its tasks to the object level, which is essential for the application of service or intervention robots.
Key Words. Telerobotics, Environment Modeling, Man-Machine Systems, Task Specification
1 INTRODUCTION

The field of teleoperation for manipulator arms has long been driven by the needs of industry and is now mature. It has entered the area of telerobotics, characterized by the introduction of supervision and decision processes at the remote site, both relieving the operator from burdensome tasks and improving efficiency [10]. However, little work has concentrated on mobile robots, so that most industrial products still rely on the old teleoperation paradigm. This situation is now changing rapidly because of the emergence of new application areas, which call for robust, fast and intelligent mobile agents, able to operate under strong constraints (ill-known dynamic environment, complex tasks, communication delay). Mobile intervention robots (de-mining of airports, fire fighting, nuclear waste cleaning, surveillance, scientific activities on the moon) and service robots (assistance to the disabled, hotel service) belong to the new market demands.
We address here the case of a mobile robot in a man-made environment (a building or equivalent). Some autonomous systems are known to operate rather well in such conditions, but at the expense of prior modeling and structuring of the environment (magnetic beacons, infra-red landmarks), which has to be static, and only for very simple tasks. We argue that there is a need for an advanced teleoperation scheme, in order to cope with the uncertainties and dynamics of the world (fire fighting), to perform more complex tasks (coordination of several agents), or to demonstrate actions that can be learned by the system (service robotics). We concentrate here on the aspects of advanced teleoperation schemes applied to mobile platforms, interactive environment modeling, and task specification. In the first part, we describe our telerobotic system and give examples of the use of shared control applied to our mobile robot PRIAMOS [12, 9], which can be seen as an extension of the system presented in [5]. The second part deals with our interactive environment modeling interface, which enables the operator to define the structure of the world in the sensory space of the robot. Unlike previous approaches, like [4], we do not limit the model to purely geometric primitives, but propose a hierarchical structure including topological and semantic information. In the third part, we address the problem of high-level task specification based on the environment model. In [7], Kay and Thorpe propose a trajectory definition scheme based on polygonal earth geometry, which can be used in real time but relies on a very clear structure of the world (a road). In our case, the input of trajectory control points is much simpler because of the geometry of the environment (rooms and corridors). However, we are interested in the specification of more complex tasks, based on topological and semantic elements, which can be translated into robust sequences of sensor-based actions using our hierarchical environment model. Finally, we present some actual results obtained with the mobile robot PRIAMOS.

Figure 1 The system configuration of the telerobotic system: the operator control & monitoring station (visual interface, input device) and the mobile robot control system (cooperation strategy, world modeling, task specification, multisensor feedback) exchange video, telecommands, status and command signals
2 AN ADVANCED TELEOPERATION SYSTEM

The typical problems associated with teleoperation are time delay and degraded perception on the operator side. For mobile robots the problems are even worse. On the one hand, there is no overhead camera as in the usual robot manipulator work cell; the operator does not have a global view of the remote environment, and the visual feedback is limited to the front or surroundings of the mobile robot. On the other hand, the telecontrol of a mobile robot implies velocity control instead of position control, so that under constant velocity the time delay causes a proportional positional error. To overcome these problems the presented advanced teleoperation system adopts the shared autonomy scheme [9]. Fig. 1 gives an overview of the configuration of the system. The global system consists of two subsystems, the local operator control and monitoring station and the remote mobile robot control system. At the local site an operator continuously receives video images of the remote environment. He uses his perception, planning and control capabilities to influence the remote mobile robot and thus closes the outer feedback loop of the telerobotic control system. The commands issued by the operator (telecommands) are mixed with the multisensor feedback on the robot to assure a correct response to the world. 24 ultrasonic sensors and structured lights on the two front corners of the robot are used to detect obstacles around and in front of the robot. The operator can control the active camera on top of the robot to get a panoramic view of the remote environment. With the shared control modes described below, the operator can control the robot under bad visual feedback while the remote control preserves the positional precision. The robot PRIAMOS uses two radio links for data transmission: a numerical one for telecommands and an analog one for video.
2.1 Control Modes for Telerobotics

Direct Control: the operator uses a suitable input device such as a 6-D mouse to control the movement of the vehicle and the active camera head on it. Safety measures are needed to compensate for the degraded perception and time delay; emergency stop and collision avoidance are two such examples. This system uses a rule-based interpretation of sensor data for real-time decisions and obstacle avoidance.
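The rule base itself is not detailed in the paper; the following is only a minimal sketch of such a safety filter, with thresholds and names of our own choosing:

```python
def filter_telecommand(v_cmd, sonar_ranges, stop_dist=0.3, slow_dist=0.8):
    """Hypothetical rule-based safety filter: the operator's velocity
    telecommand is attenuated or vetoed based on the ultrasonic readings.
    stop_dist and slow_dist are assumed thresholds in meters."""
    d_min = min(sonar_ranges)             # nearest obstacle seen by the 24 sonars
    if d_min < stop_dist:                 # rule 1: emergency stop
        return 0.0
    if d_min < slow_dist:                 # rule 2: slow down near obstacles
        return v_cmd * (d_min - stop_dist) / (slow_dist - stop_dist)
    return v_cmd                          # rule 3: free space, pass command through
```

Such a filter sits between the telecommand link and the motion controller, so the operator's input is always bounded by the local sensor data.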
Traded Control: this mode provides alternating control of the robot. The operator assumes direct control in critical situations; control is handed over to the robot when the environment is non-critical or the visual feedback is bad. The sensor feedback allows the operator to assess the remote situation and makes safe and continuous operation possible.

Figure 2 Degree of freedom sharing while moving along a wall (x/y plot of the real trace and the odometry; annotated distances: 90 cm, 110.5 cm)
Shared Control: this is a higher level of human-machine cooperation than the simple collision avoidance in direct control. Two shared control schemes are developed in this system. The first is degree-of-freedom sharing: the robot and the human each control specific degrees of freedom. For example, when moving along a corridor, the robot keeps its orientation and distance with respect to the side walls while the operator controls the forward movement. Fig. 2 shows a real trajectory of the robot travelling along a corridor under shared control. Experience shows that it is not easy for an operator to control more than two degrees of freedom at the same time. In this example the number of controlled d.o.f. is reduced from three to one, which strongly facilitates the control of the remote robot. The second scheme is a smooth change from velocity to position control. In free space the user input is proportional to the robot velocity. If the robot comes near to a target object, e.g. a docking station, the user command is automatically interpreted as positional input, scaled with the remaining distance to the target object. With this shared control the positional precision is preserved under time delay.
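A minimal sketch of both schemes follows; the blending threshold, gains and all names are our own illustrative assumptions, not the paper's implementation:

```python
def shared_command(user_fwd, wall_dist, wall_angle, target_dist=None,
                   d_ref=0.5, k_d=1.0, k_a=1.5, d_switch=1.0, v_max=0.5):
    """Degree-of-freedom sharing with a velocity-to-position hand-over.
    The operator supplies only user_fwd in [-1, 1]; the robot servos its
    own orientation and lateral distance from the side sensors."""
    # robot-controlled d.o.f.: hold distance d_ref and zero angle to the wall
    v_lateral = k_d * (wall_dist - d_ref)
    omega = -k_a * wall_angle
    # operator-controlled d.o.f.: forward motion
    if target_dist is None or target_dist > d_switch:
        return ("velocity", v_max * user_fwd, v_lateral, omega)
    # near the target the same input commands a position increment that
    # shrinks with the remaining distance, preserving precision under delay
    return ("position", target_dist * user_fwd, v_lateral, omega)
```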
Supervisory Control: the operator acts as a supervisor. He performs high-level planning and monitors the robot's execution. He may have to interrupt the execution in a dangerous situation or help the robot execute its task, e.g. by indicating the location of a landmark. In this mode the robot has the highest degree of autonomy. Supervisory control is an example of cooperation on the task level: the task planning relies mostly on the operator, while the robot is responsible for task execution under human supervision. In traded control the operator decides when to switch the control. The other control modes mentioned are based on cooperation on the servo level, where the telecommands from the operator are mixed with sensor feedback to control the robot. The degree of autonomy increases from direct to supervisory control. The choice of an adequate control mode depends on the task to be executed; this flexibility guarantees both efficient and reliable control.
3 INTERACTIVE ENVIRONMENT MODELING

Advanced teleoperation modes rely on the interpretation and decision abilities of the robot. These abilities enable the human operator to issue more abstract commands. The robotic system maps these commands onto the local environment and checks their relevance, so that it can execute them robustly. This mapping is based on a world model. A typical world model is the layout of a building, in the form of a 2D/3D geometric map; the geometric features build a comfortable representation for the operator, and the Cartesian coordinates are adequate for the robot control system. Usually, either the world model is supplied by the operator (e.g. an architect's plan), or the robot constructs it automatically from its sensor readings. The first case requires good a priori knowledge of the environment and implies a lot of work; furthermore, the resulting representation is not always suitable for the robot (because it does not reflect the particularities of its sensors). On the other hand, the automatic generation of the world model suffers from severe limitations: in the current state of the art, automatically generated maps are mostly unreliable, incomplete, and of little use to an operator. We propose here a new method for building the environment
model, which both simplifies the operator's task and produces a reliable, relevant, and useful representation. The basic principle is to benefit from the human interpretation capacities in order to extract relevant information from the robot's sensor readings. Such a system has already been proposed in [4] and used in industrial applications, for example in subsea servicing of offshore installations. However, the previous implementations concentrated on the geometrical aspects of the environment, and thus constituted only a poor compromise. Our system is unique in that it can generate a complex hierarchical model, which associates numerical, geometric, topological, and semantic information, permitting a much richer mapping between the human and machine representations. We believe that our approach will lead to more robust, more efficient, and more comfortable human/robot interaction.
3.1 Scene Reconstruction from a Single Camera

For a general reconstruction of a scene, a stereo camera system is necessary to obtain complete 3-D information. However, the transmission and image processing of stereo pictures take a lot of time, which limits the teleoperation and the rapid prototyping of the world model. In this paper we instead model a structured environment interactively from a single camera. In such an environment the three orthogonal main directions are clearly identified. From everyday life the operator knows the geometry, topology and semantics of each object in an image. For a mobile robot moving on the floor with a known camera height, the points on the floor supply the 3-D information otherwise lost in an image. The shape of an object (a rectangle, a parallelepiped) and its topological constraints (on the floor, on the wall) help to further constrain its location. These clues contribute to the world modeling and make the interactive modeling of an unknown man-made world possible.
3.2 Camera Calibration

The camera must first be calibrated to find the correspondence of points between world and camera. In this paper we use a two-stage calibration approach to find the relation between image and world. The first stage is an off-line intrinsic calibration, which is independent of the environment and is performed with the help of Tsai's algorithm [11] and a calibration object [12]. Fig. 3 illustrates the calibration object and the extracted circles.

Figure 3 The object for intrinsic calibration and the 16 extracted circles

The second stage is to find the extrinsic parameters, i.e. the translation and orientation of the camera w.r.t. the world. If the origin of the world coordinate system is arbitrarily set to the origin of the camera coordinate system, the relation between world and camera is

\begin{pmatrix} x_c \\ y_c \\ z_c \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix} \begin{pmatrix} x_w \\ y_w \\ z_w \end{pmatrix}   (1)

Since the three orthogonal main directions in the world are distinguished from other directions, the axes of the world coordinate system are chosen parallel to them. From the coordinate transformation and the perspective projection of a camera, a line in the world

x_w = x_0 + m_x t   (2)
y_w = y_0 + m_y t   (3)
z_w = z_0 + m_z t   (4)

becomes

x_c = x'_0 + m'_x t   (5)
y_c = y'_0 + m'_y t   (6)
z_c = z'_0 + m'_z t   (7)

in the camera frame and

x_f = a_x f (x'_0 + t m'_x) / (z'_0 + t m'_z) + C_x   (8)
y_f = a_y f (y'_0 + t m'_y) / (z'_0 + t m'_z) + C_y   (9)

in the image plane.

According to the central projection, lines parallel to each other in the world converge to a vanishing point (V_x, V_y) in the image plane:

V_x = a_x f m'_x / m'_z + C_x   (10)
V_y = a_y f m'_y / m'_z + C_y   (11)

where a_x, a_y, f, C_x and C_y are obtained from the intrinsic calibration. A camera with a known height H supplies the additional 3-D information needed for the calculation of position. The equation of the horizontal floor is:

r_{12} x_c + r_{22} y_c + r_{32} z_c + H = 0   (12)

With these equations, two approaches can be used to find the three Euler angles of the extrinsic calibration and the coordinates of points on the floor:

1. slope-based approach: given a point on the floor and the slopes of the three orthogonal directions through it in the image, the position of the point and the rotation matrix of eq. (1) can be calculated.

2. vanishing-point based approach: from the vanishing points of the three main directions (X, Y, Z), the tilt, turn, and roll angles of the camera can be derived. From eq. (12) the coordinates of floor points are obtained.
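As a concrete illustration of the vanishing-point based approach, here is a minimal numerical sketch under the pinhole model of eqs. (8)-(12); the function names are ours, and we assume the world Y axis is the vertical one, as the coefficients of eq. (12) suggest:

```python
import numpy as np

def vp_to_direction(vp, ax, ay, f, cx, cy):
    """Unit 3-D direction, in camera coordinates, of the parallel world
    lines converging to the vanishing point vp = (Vx, Vy); eqs. (10)-(11)
    inverted for (m'_x, m'_y, m'_z) up to scale."""
    d = np.array([(vp[0] - cx) / (ax * f), (vp[1] - cy) / (ay * f), 1.0])
    return d / np.linalg.norm(d)

def rotation_from_vanishing_points(vp_x, vp_z, ax, ay, f, cx, cy):
    """Rotation matrix of eq. (1) from the vanishing points of the two
    horizontal main directions; the vertical axis follows by orthogonality."""
    ex = vp_to_direction(vp_x, ax, ay, f, cx, cy)   # world X in camera frame
    ez = vp_to_direction(vp_z, ax, ay, f, cx, cy)   # world Z in camera frame
    ey = np.cross(ez, ex)                           # vertical world Y axis
    ey /= np.linalg.norm(ey)
    ez = np.cross(ex, ey)                           # re-orthogonalize
    return np.column_stack([ex, ey, ez])            # columns: world axes in camera frame

def floor_point(xf, yf, R, H, ax, ay, f, cx, cy):
    """Back-project image point (xf, yf) onto the floor: the viewing ray
    p = t*d is intersected with the plane of eq. (12), whose normal is
    the second column (r12, r22, r32) of R."""
    d = np.array([(xf - cx) / (ax * f), (yf - cy) / (ay * f), 1.0])
    n = R[:, 1]
    t = -H / np.dot(n, d)       # ray parameter at the floor plane
    return t * d                # 3-D floor point in camera coordinates
```

The tilt, turn and roll angles can then be read off the recovered rotation matrix with any standard Euler-angle decomposition.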
3.3 World Model

After the two-stage calibration the operator is in a position to interactively model the world. The generic objects found in buildings are stored in a geometric database. These objects may differ from each other in dimension and location, but their shapes are similar. The operator chooses an object from the database, then translates and modifies the generic model in order to match it against the images from the video feedback. The topological constraints, e.g. on the floor or on a wall, provide a convenient way to manipulate the selected object under constraints. With a good user interface the calibration and modeling can be done in a few minutes; an experiment discussed at the end of the paper justifies this approach. Besides the geometric information, each object also stores:

features: the information (e.g. color) needed by sensors to recognize the object
topology: the topological relations to other objects in the model
semantics: the properties of the object in the considered environment. The semantics is important for planning and task specification, but cannot yet be acquired automatically.

Owing to its topology, the world model is hierarchical in nature. Fig. 4 shows the example of a corridor. Every object has its own geometry and semantics; the topological relations between the objects in the world are constructed automatically. The world model serves two purposes. On the one hand it is the basis for interactive task specification, which will be discussed in section 4. On the other hand it provides a model for an autonomous system to plan and execute a task. The created world model is closely related to the layered architecture of many mobile robots [3]. In such an architecture a task is successively decomposed into subtasks, and the representation of the world is refined accordingly, as seen from the topology. A teleoperated mobile robot that learns the unknown environment, and eventually other knowledge, from the operator will extend its applicability in the real world.

Figure 4 Hierarchical model of a corridor: CORRIDOR decomposes topologically into WALL_L, WALL_R and FLOOR; each node carries geometry, features and semantics (e.g. door, picture, cupboard)
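The paper does not give the data structure itself; a minimal sketch of such a hierarchical node, with illustrative field names and made-up dimensions, might look as follows:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class WorldObject:
    """One node of the hierarchical world model sketched in Fig. 4:
    geometry plus features, topology and semantics (names are ours)."""
    name: str                                         # e.g. "door_2"
    geometry: dict                                    # pose and dimensions of the generic model
    features: dict = field(default_factory=dict)      # sensor cues, e.g. {"color": "grey"}
    semantics: str = ""                               # role used by the planner, e.g. "door"
    children: List["WorldObject"] = field(default_factory=list)  # topological containment

# a corridor decomposed as in Fig. 4
door = WorldObject("door_2", {"width": 0.9, "height": 2.0},
                   features={"color": "grey"}, semantics="door")
wall_r = WorldObject("wall_r", {"length": 20.0}, semantics="wall",
                     children=[door])
corridor = WorldObject("corridor", {"width": 2.0, "length": 20.0},
                       semantics="corridor", children=[wall_r])
```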
4 INTERACTIVE TASK SPECIFICATION

The second fundamental aspect of advanced teleoperation is the kind of input used for specifying the task to be achieved by the robotic system. Traditional teleoperation makes use of a wide range of input devices, such as master arms and joysticks, with or without force feedback. Telerobotics and telepresence have broadened the range of available man-machine interfaces, with six-degree-of-freedom mice, data gloves and all kinds of exoskeletons. The input strategy has changed accordingly, from continuous control of joint positions, to control of the end-effector in Cartesian space, to inference of intention and symbolic task description [10]. However, these developments have been mostly devoted to manipulation tasks, and, apart from a few nearly-autonomous experimental platforms [1, 2], most mobile teleoperated robots are still based on simple control strategies. Indeed, mobile robots have a much larger operational space than manipulators, so that the issue of environment modeling becomes crucial.
4.1 Vision-Based Task Specification

Traditionally, tasks for a mobile robot are specified either off-line [2] or through a graphical user interface (GUI) [8]. In [7] a land vehicle is guided by a supervising operator on the basis of video feedback. In this paper we extend this approach and combine task specification and monitoring in a single visual interface. In addition to specifying a sequence of movements as in [7], the task specifications are object-oriented and given directly on the visual interface. In a common teleoperation system a GUI is extensively used to facilitate off-line task analysis and on-line visualization [8]. Task analysis presumes a known world model, which is not always available. In on-line visualization the real sensor data (e.g. odometry, ultrasonic sensors) are visualized on the workstation. However, such visualization is not adequate for a mobile robot, because wheel slip makes the odometry unreliable: the discrepancy between the monitored position (from odometry) and the real location in the world grows with the driven distance. Therefore one or more visual feedbacks are employed in addition to the GUI as monitoring tools. Under such a configuration the working cycle from task preparation to execution/intervention takes a lot of time. In contrast, the single visual interface used in our system integrates telecontrol, interactive world modeling and task specification in one. It simplifies and shortens the task preparation and intervention cycle and minimizes the system components.
4.2 Interactive Task Specification

In the framework of our advanced teleoperation environment, we are in a good position to develop new task specification techniques. Our system is based on a complex environment model, obtained interactively through the robot's sensors. This allows us to link symbolic and geometric representations efficiently, so that a near-autonomous system is no longer necessary. Theoretically, any kind of input device could be used to specify a task, be it on a symbolic (e.g. go to John's office), topological (e.g. take the second door on the right), geometric (e.g. move one meter to the right) or numerical level. We have chosen to integrate our task specification interface into the environment modeling system, so that the tasks directly reflect the structure of the world representation.
The task to be performed is given by the operator in the form of a reference to an object in the environment model and a behavior relative to that object (e.g. go to, drive along, ...). The object is selected on the video overlay displaying the environment model, and the related behavior is picked from a menu. The pair (object, behavior) is checked for coherence, and the system extracts the geometric data of the object from the environment model. Each behavior corresponds to a specific procedure, in charge of elaborating a robust sequence of sensor-based actions to be executed by the robot. As an example, let us consider the case of a corridor. The operator can select the right wall by simply clicking on its representation on the screen, and then activate the drive along behavior. The geometric model of the wall is sent to the corresponding procedure, which computes the distance and orientation to the wall, as well as the distance to be covered during this movement. Further parameters can be extracted from the environment model, such as the minimum distance to other objects, which can be used as a safety threshold for the ultrasonic sensors. Finally, a pair of actions is sent to the robot, namely an initial rotation to align the direction of movement with the wall, and the forward movement, performed under ultrasonic sensor feedback. Because the robot path is known in advance, it can be displayed on the screen, and the human operator can supervise its planning and execution and regain control in case of an unforeseen situation. This framework offers a simple solution to the operation of mobile robots in an unknown but structured environment, under a small time delay. It was shown by Paul and Funda [6] in the case of a manipulator that the transmission of sensor-based actions can overcome communication delays. This comes at the expense of the environment modeling time, of course, but the time and safety gained soon outweigh standard teleoperation schemes. Furthermore, we believe that our approach is relevant for specific teleoperation applications without time delay, because it helps relieve the operator of a high workload and of burdensome tasks, thus enabling him to concentrate on delicate operations. This can turn out to be crucial in some cases, for example the control of a fleet of robots inside a building for a time-critical intervention such as fire fighting or preventing a chemical hazard.
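A minimal sketch of such a behavior procedure for the corridor example is given below; the two-action decomposition follows the text, while the wall representation, names and signatures are our own assumptions:

```python
import math

def drive_along(wall, robot_pose, clearance=0.5):
    """Hypothetical expansion of the pair (wall, drive_along) into a
    robust sequence of sensor-based actions, as described above."""
    # orientation and length of the wall segment from the environment model
    dx, dy = wall["x1"] - wall["x0"], wall["y1"] - wall["y0"]
    heading = math.atan2(dy, dx)
    length = math.hypot(dx, dy)
    return [
        # initial rotation: align the direction of movement with the wall
        ("rotate", {"angle": heading - robot_pose["theta"]}),
        # forward movement under ultrasonic feedback; clearance would be
        # derived from the minimum distance to other modeled objects
        ("move_forward", {"distance": length, "sonar_threshold": clearance}),
    ]
```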
5 EXPERIMENTAL RESULTS

5.1 Interactive Modeling

In this section an example of the interactive modeling of a corridor is presented. The camera is first calibrated intrinsically (Fig. 3). The extrinsic calibration is done with the help of the vanishing point of the lines parallel to the two main floor lines. The calibrated camera tilt and turn angles are 2.872 and 2.908 respectively. Because the vertical lines are nearly parallel in the image, the roll angle is assumed to be negligible in this case. The generic objects of the corridor are inserted and directly manipulated to match the feedback images. Fig. 6 shows the modeled cupboards and doors on the video interface. The consistency of the overlay of model edges on the real images demonstrates the accuracy of the calibration and the advantages of direct manipulation under geometrical and topological constraints. Fig. 5 compares the interactively modeled world with the real world. The solid lines are real objects in the world, while the dashed lines are modeled objects. The two modeled floor lines are nearly the same as the real ones (error in width: 1.19 cm). The error in the positions of the modeled doors grows with the distance from the robot, which results from the central projection of the camera.
Figure 6 Interactive modeling of a corridor
5.2 Task Specification

In order to demonstrate our task specification approach, we chose a simple task, i.e. to go to the second door on the right. This was done simply by selecting the model of the door on the screen, and selecting the go-to behavior in the behavior pull-down menu. As a result, the planning system produced one sensor-based action, which caused the robot to follow the right wall, guided by the right-side ultrasonic sensors, for a distance of 7.89 meters. We registered both the odometry readings and the actual trajectory, by means of a line drawn on the floor. Fig. 5 shows the real trace as a thin line and the odometry readings in shadowed style. The advantage of a sensor-based movement is obvious there: although the dead-reckoning estimate is quite good in the X-axis, it is very erroneous in the transverse direction, because of the uncertainty in the value of the initial angle. Indeed, our extrinsic calibration method allows us to evaluate the turn and tilt angles of the camera relative to the environment X-direction, but not those of the robot platform. Therefore, the follow-wall primitive combines the correctness of the dead-reckoning in the X-axis with the lateral sensor feedback in order to proceed robustly with the task. The distance actually driven was 7.88 meters, and the final range to the wall 55 centimeters (steady), for a command of 50 ± 10 centimeters.
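The follow-wall primitive itself is not spelled out in the paper; a minimal sketch of one servo step consistent with the description (forward progress from odometry, lateral correction from the side sonars) could be:

```python
def follow_wall_step(x_driven, x_goal, d_side, d_cmd=0.5, k=1.5, v=0.3):
    """Hypothetical control step of the follow-wall primitive: drive
    until the odometric X distance reaches x_goal, while the lateral
    sonar range d_side is servoed to the commanded clearance d_cmd.
    The sign of the correction depends on which side the wall is on."""
    if x_driven >= x_goal:
        return None                       # goal distance covered, stop
    omega = k * (d_side - d_cmd)          # steer back toward the commanded range
    return (v, omega)                     # forward speed, turn rate
```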
5.3 Efficiency of the Proposed Method

We can evaluate the efficiency of our method by comparing it with traditional teleoperation. In traditional teleoperation, even an experienced operator loses time turning the camera to the left and to the right in order to monitor the distance to obstacles and to position the robot accurately relative to the door. Using a model of the environment, he is no longer limited to the video camera, and can use a wide range of display techniques such as virtual cameras and overhead views simultaneously; the coherence of the model can be checked against an overlay of the synthetic scene on the real incoming video signal, and thereby corrected. Furthermore, the task specification interface relieves the operator from direct control, so that he can concentrate on decision-making and supervision, while at the same time regaining full use of the video camera. The price to pay is the time spent elaborating the environment model. However, the calibration of the camera orientation relative to the main directions of the world helps a lot, and the model can be limited to the few objects relevant to the task, so that the loss of time is minimal. As an example, the view presented in Fig. 6 was obtained in 2 minutes 20 seconds, including the extrinsic calibration. The task itself was performed in 1 min 13 s, making a total of 3 min 33 s. During our experiments, an experienced operator needed 1 min 36 s to perform the task under direct control, and 1 min 25 s under shared control; in both cases, he had to turn the camera to the side only once. We believe that our approach would prove even more efficient in the case of a more com-
plex task, or a sequence of tasks, and of course in the presence of a small communication delay or a limited transmission bandwidth.

Figure 5 Experimental results: real objects (solid lines) and modeled objects (dashed lines) along the corridor (Cupboard1, Doors 1-4, Cupboard2), together with the robot's real trace and odometry readings; scale mark 1 m
6 CONCLUSION

This paper aims at extending the traditional teleoperation system to a higher level, i.e. learning the remote unknown world and specifying tasks based on the visual feedback. Experiments demonstrate the fast modeling of a remote unknown environment and the execution of a task with the help of the model. These schemes, together with shared control, will facilitate the application of mobile robots in real-world environments. The integration of several local images into a global world model is the next step in the interactive modeling. The semantic, topological and geometrical information stored in the model fits the layered architecture of most mobile robots. Further efforts will be devoted to the connection of the interactively created world model with autonomous mobile systems. Interactive task specification provides a flexible and powerful way of programming a mobile robot. More behaviors will be added in the future to complete the system.
7 ACKNOWLEDGMENT

The authors would like to thank P. Weckesser for his assistance with the work described. This work was performed at the Institute for Real-Time Computer Systems and Robotics, Prof. Dr.-Ing. U. Rembold and Prof. Dr.-Ing. R. Dillmann, Department of Computer Science, University of Karlsruhe, 76128 Karlsruhe, Germany.
8 REFERENCES

[1] C. M. Angle and R. A. Brooks. Small Planetary Rovers. In Proceedings of the IEEE International Workshop on Intelligent Robots and Systems (IROS '90), Tsuchiura, Japan, July 1990.
[2] R. Chatila, R. Alami, S. Lacroix, J. Perret, and C. Proust. Planet Exploration by Robots: From Mission Planning to Autonomous Navigation. In Proceedings of ICAR '93, 1993.
[3] R. Dillmann, J. Kreuzinger, and F. Wallner. The Control Architecture of the Mobile System PRIAMOS. In Proc. of the 1st IFAC International Workshop on Intelligent Autonomous Vehicles, 1993.
[4] P. Even and L. Marce. Manned Geometry Modelling for Computer Aided Teleoperation. In Proc. Int. Symposium Teleoperation and Control, pages 113-122, 1988.
[5] R. Fournier, P. Gravez, and M. Dupont. Computer Aided Teleoperation of the Centaure Remote Controlled Mobile Robot. In Proc. Int. Symposium Teleoperation and Control, pages 97-105, 1988.
[6] J. Funda. Teleprogramming: Towards Delay-Invariant Remote Manipulation. Technical report, GRASP Laboratory, University of Pennsylvania, Philadelphia, August 1991.
[7] Jennifer Kay and Charles Thorpe. STRIPE: Supervised Telerobotics Using Incremental Polygonal Earth Geometry. In Proceedings of the 3rd International Conference on Intelligent Autonomous Systems, pages 399-405, 1993.
[8] Won S. Kim. Graphical Operator Interface for Space Telerobotics. In IEEE International Conference on Robotics and Automation, pages 761-768, 1993.
[9] I-S. Lin, F. Wallner, and R. Dillmann. An Advanced Telerobotic Control System for a Mobile Robot with Multisensor Feedback. In Proceedings of the International Conference on Intelligent Autonomous Systems (IAS-4), Karlsruhe, Germany, 1995.
[10] Thomas B. Sheridan. Telerobotics, Automation, and Human Supervisory Control. The MIT Press, London, England, 1992.
[11] Roger Y. Tsai. A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses. IEEE Journal of Robotics and Automation, RA-3, 1987.
[12] P. Weckesser and G. Hetzel. Photogrammetric Calibration Methods for an Active Stereo Vision System. In Intelligent Robot Systems, pages 430-436, 1994.