Multiple Sensorprocessing for High-Precision Navigation and Environmental Modeling with a Mobile Robot

P. Weckesser, R. Dillmann, M. Elbs and S. Hampel
Institute for Real-Time Computer Systems & Robotics
Prof. Dr. U. Rembold, Prof. Dr. R. Dillmann
Department for Computer Science, University of Karlsruhe
76128 Karlsruhe, Germany
Abstract
In this paper an approach to real-time position correction and environmental modeling based on odometry, ultrasonic sensing, structured light sensing and active stereo vision (bin- and trinocular) is presented. Odometry provides the robot with a position estimate, and with the help of a model of the environment, sensor perceptions can be matched to predictions. Ultrasonic sensing supports collision avoidance and obstacle detection and thus enables navigation in simply structured environments. Model-based image processing allows natural landmarks in the stereo images to be detected and classified uniquely. With only one observation, the robot's position and orientation relative to the observed landmark are found precisely. This sensing strategy is used when high precision is necessary for the performance of the navigation task. Finally, techniques are described that allow an automatic mapping of an unknown or only partially known environment.
1 Introduction
The application of mobile robots in the field of service robotics requires a very high performance of the sensor system, because the demands for reliability and flexibility are extremely high. Sensors which are currently used on mobile robots are able to provide distance information based on acoustical or optical active sensing techniques. The need for reliable sensors for position correction and object recognition has led to the development of a multiple sensor system for the mobile robot PRIAMOS. This was necessary in order to operate PRIAMOS safely and accurately in a laboratory or public building environment. PRIAMOS is equipped with four Mecanum wheels, which enable holonomic travel with three degrees of freedom; all of these degrees of freedom can be combined. A multisensor system supplies the vehicle with odometric, sonar, and visual information. The sensor system and its use are extensively described in [WGD95]. The odometry sensors provide a position and orientation estimate, and the sonar sensors are mainly used for collision avoidance, obstacle detection and mapping of simply structured environments.

Figure 1: PRIAMOS

The image processing and the camera-head control system are basic components of the optical sensor system. In addition to the binocular stereo vision head KASTOR, PRIAMOS is equipped with a passive camera that is used for multiple purposes. This third camera can be fully included in a trinocular stereo reconstruction process, providing an additional epipolar line to simplify the stereo matching algorithm. Furthermore, structured light (laser beams) is projected onto the ground in front of the robot and observed by the third camera in order to detect obstacles in front of the robot. The third camera observes two straight lines on the ground in the case of free space and disturbed lines in the case of an obstacle. In addition, the image of this camera is transmitted by a video link to support teleoperated remote control of the robot. The connection to the off-board computers (Sun workstations) is realized by a data-radio system with 9600 baud. This data rate is sufficient, as the whole sensor information is processed on-board; only parametric information about the environment and commands to the robot are transmitted via radio. An operator can teleoperate PRIAMOS and KASTOR with a six-degrees-of-freedom space mouse [LDW95].
Figure 2: Camera model (image plane with image point I(u, v), optical center C(X0, Y0, Z0), scene point P(X, Y, Z))
2 The active vision system KASTOR
KASTOR [WW94] consists of two cameras, mounted on a platform with motor-controlled tilt and turn. Zoom and focus as well as the vergence of each camera unit are equipped with motors. A third passive camera is added to the system in order to improve the reliability of the vision system. The cameras are connected to a real-time vision system, which is specialized in stereo image processing. The image processing system is able to produce a symbolic contour description from edges extracted from the stereo images in real time. With a calibrated stereo camera system it is possible to compute a 3D reconstruction of the scene.
2.1 Camera calibration with sub-pixel accuracy
Precise scene reconstruction is not possible until the perspective transformations describing the relation between a scene point and its image points are known. The classical way is to extract the external and internal parameters of the cameras. In general it is very difficult to find these parameters, especially when zoom lenses are used. In the following, the camera model used for the calibration is introduced (figure 2). The camera is characterized by its image plane (CCD chip) and the position of the optical center C. Altogether there are 12 camera parameters in this camera model [Gen94]. Instead of extracting the camera parameters directly from the camera geometry, it is possible, with the knowledge of at least 6 corresponding scene (S) and image (I) points, to compute the DLT (direct linear transformation) matrices M for the left and right camera [AAK71]. With the DLT matrix a homogeneous formulation (equation 1) of the perspective transformation is possible. The 12 entries m_{i,j} of the DLT matrix contain all 11 camera parameters [Foe90]. The perspective transformation can be formulated as follows:

I = M S  (1)
\begin{pmatrix} wu \\ wv \\ w \end{pmatrix} =
\begin{pmatrix}
m_{1,1} & m_{1,2} & m_{1,3} & m_{1,4} \\
m_{2,1} & m_{2,2} & m_{2,3} & m_{2,4} \\
m_{3,1} & m_{3,2} & m_{3,3} & m_{3,4}
\end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
The practical problem with this approach is the exact determination of the positions of the reference points in the images. In [Sch92] a cube is used as reference object. The corners of the cube can be located with a corner operator or by the intersection of the edges of the cube. Experimental results have shown that the accuracy that can be reached by this method is not very high (2 pixels). The accuracy of corner detection determines the quality of the calibration. In order to improve this poor accuracy in the detection of reference points for the calibration, photogrammetric methods are applied [Bey91]. An automatic calibration technique that detects the reference points in the images with sub-pixel accuracy was developed [WH94]. With this technique for the selection of reference points the accuracy of calibration was improved enormously. The whole calibration procedure is fully automated. The only specification the user has to give is the interval of zoom and focus settings the calibration is to be performed for; at the beginning, both cameras have to be focused on the reference object with the maximum focal length (upper limit of the interval). Then the calibration procedure can be performed for this camera setup. In any case sub-pixel accuracy for the detection of the reference points is achieved [WD94]. As a result it is possible to change the optical parameters and return to a pre-calibrated position, thanks to the high precision of the mechanical construction of the system [Wil93]. Because of tolerances in the lenses, the best results are achieved if the positions are always adjusted from the same side. The accuracy of the calibration remains in the sub-pixel dimension. The described calibration technique and equation 1 are used for the following purposes:

1. Stereo reconstruction: by using two or more cameras, the scene reconstruction of corresponding objects can be computed [WH94].
2. Structured light: obstacles can be detected using triangulation techniques with a light source (laser) and a camera (section 2.2).
3. Predictive and purposive vision: for a given scene point the image point can be computed (section 5.2, [Alo90]).
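To make the estimation step behind equation 1 concrete, the following minimal sketch (not the authors' implementation; function names and the use of numpy are assumptions) computes a DLT matrix from scene/image point correspondences by solving the homogeneous linear system with an SVD, and projects a scene point back into the image:

```python
import numpy as np

def dlt_calibrate(scene_pts, image_pts):
    """Estimate the 3x4 DLT matrix M of equation (1) from >= 6
    scene/image point correspondences (least squares via SVD)."""
    A = []
    for (X, Y, Z), (u, v) in zip(scene_pts, image_pts):
        # each correspondence yields two homogeneous linear equations in m
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    # the right singular vector of the smallest singular value
    # holds the 12 entries of M (up to scale)
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    return Vt[-1].reshape(3, 4)

def project(M, S):
    """Perspective transformation I = M S of equation (1)."""
    I = M @ np.append(S, 1.0)       # homogeneous image vector (wu, wv, w)
    return I[:2] / I[2]             # dehomogenize to pixel coordinates
```

In practice the quality of M depends entirely on how precisely the reference points are located, which is why the sub-pixel detection described above matters.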
2.2 Obstacle detection using structured light
A very reliable approach to obtaining three-dimensional information from two-dimensional images is triangulation with structured light. A wide range of applications has been developed to achieve high accuracy and to expand the field of view [VO90].
Figure 4: Fusion of collinear edges
Figure 3: Configuration of the sensor (third camera; diode laser with cylinder lens projecting a light stripe)

There is a wide range of commercial systems, mostly designed to operate in known environments with well-defined light sources. The system introduced here has to operate on changing floor coverings and under changing light conditions. It is useful to integrate a third camera into the vision system in order to improve the reliability of the stereo matching algorithm as well as for the supervision of teleoperation experiments with the robot. Thus it is possible to construct an inexpensive sensor that improves obstacle detection by using the third camera in connection with projected laser lines. The camera is mounted on top at the front of the robot, looking straight forward. It can be turned with a stepper motor around the horizontal axis to meet the different demands. In the lower part of the vehicle, one diode laser on each side, equipped with a cylinder lens, projects a laser line onto the floor. A configuration with two crossing lines allows supervision of a wide area of the image with a minimum of light sources and with maximum inspection of the area directly in the driving direction. To be able to distinguish the lines from the light of surrounding sources, an interference filter is built into the lens of the camera. Its transmission peak is located between the wavelengths of the two diode lasers at 673 nm and it has a bandwidth of 12.4 nm. This bandwidth is sufficient to detect the lines accurately (the lines appear white in the images), but it still allows using the images of the third camera for other purposes. In order to develop a system that is portable to other mobile robots, two approaches to image processing were implemented. The first possibility is to make use of the extracted edges of the existing image processing system. The search algorithm starts following the detected edges in the lower part of the image, close to the robot. Obstacles are identified by significant changes in the angle of the edges. As the third camera is calibrated by the same means as the two cameras of the stereo vision system (section 2.1), it is possible to calculate the scene coordinates from the image coordinates of the cameras. When a point in the image is found where the laser line breaks or changes orientation, it is assumed that this point is located on the ground (Y = 0 in scene coordinates). With a calibrated camera and this knowledge it is possible to compute the 3D coordinates from the 2D image coordinates (equation 1). The second way to detect obstacles is to reduce the information in the image by binarization. The projected laser lines can be detected by using a rendering algorithm. The boundaries defined with the rendering algorithm can be compared with the undisturbed straight lines known to the system. As soon as the image coordinates of an obstacle are found, its 3D position can be computed as described above. Using this sensor, PRIAMOS is able to detect small obstacles on the ground that the ultrasonic system cannot make out. Descending stairs as well as holes can be detected by breaks in the laser lines. The range of the sensor reaches from 50 cm in front of the vehicle to a maximum of 3 m. The reliability of the sensor is very high; by using a threshold, the height of the obstacles to be detected can be chosen. With a very low threshold it is even possible to detect a thin cable on the floor. Experiments with transparent obstacles like glass were carried out, and it was shown that even these can be detected with high probability.
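Since a break point is assumed to lie on the floor, the back-projection of equation 1 reduces to a small linear problem. A minimal sketch, assuming the calibrated 3x4 DLT matrix M from section 2.1 and the Y = 0 ground-plane convention used above (function name hypothetical):

```python
import numpy as np

def ground_point(u, v, M):
    """Back-project image point (u, v) onto the ground plane Y = 0,
    given the calibrated 3x4 DLT matrix M (equation 1)."""
    # From wu = row0.S, wv = row1.S, w = row2.S with S = (X, 0, Z, 1):
    # (m11 - u m31) X + (m13 - u m33) Z = u m34 - m14, and likewise for v.
    A = np.array([[M[0, 0] - u * M[2, 0], M[0, 2] - u * M[2, 2]],
                  [M[1, 0] - v * M[2, 0], M[1, 2] - v * M[2, 2]]])
    b = np.array([u * M[2, 3] - M[0, 3],
                  v * M[2, 3] - M[1, 3]])
    X, Z = np.linalg.solve(A, b)       # 2x2 system in the ground coordinates
    return np.array([X, 0.0, Z])       # scene point on the floor
```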
3 High-level image processing
The image processing system is able to perform real-time edge detection. Thus our high-level image processing system is designed to make extensive use of the extracted edges. The edge extraction procedure is a standard Sobel filtering algorithm suffering from the known problems of randomly broken edges and of edges not extracted in edge intersection areas. Thus the first step in high-level image processing is post-processing of the extracted line segments in order to solve the following problems:

- randomly breaking edges because of noisy images
- problems of the extraction algorithm at edge intersections
3.1 Fusion of broken edge-segments
The problem of broken edges is solved by iterative grouping of collinear edge segments. First, edges are tested for collinearity and then grouped into a new edge. To do this, reasonable thresholds for the distance between the edges and the difference in orientation have to be applied. Second, the grouped edges are tested for common edge segments; if such segments exist, the edges can be fused as well (figure 4). This algorithm is iterated until no more broken edges can be fused.
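The following sketch illustrates this iterative fusion; the segment representation (endpoint pairs), thresholds and helper names are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def _orientation(seg):
    (x1, y1), (x2, y2) = seg
    return np.arctan2(y2 - y1, x2 - x1) % np.pi     # direction-free angle

def _gap(a, b):
    # smallest endpoint-to-endpoint distance between the two segments
    return min(np.hypot(p[0] - q[0], p[1] - q[1]) for p in a for q in b)

def _collinear(a, b, max_angle=np.radians(5.0), max_offset=3.0, max_gap=10.0):
    d = abs(_orientation(a) - _orientation(b))
    if min(d, np.pi - d) > max_angle:               # similar orientation?
        return False
    (x1, y1), (x2, y2) = a                          # offset of b from line(a)
    n = np.array([y1 - y2, x2 - x1], float)
    n /= np.linalg.norm(n)
    if any(abs(n @ np.subtract(p, a[0])) > max_offset for p in b):
        return False
    return _gap(a, b) < max_gap                     # segments close enough

def fuse_collinear(segments):
    segs = [tuple(map(tuple, s)) for s in segments]
    fused = True
    while fused:                                    # iterate until stable
        fused = False
        for i in range(len(segs)):
            for j in range(i + 1, len(segs)):
                if _collinear(segs[i], segs[j]):
                    # the fused edge spans the two most distant endpoints
                    pts = segs[i] + segs[j]
                    pairs = [(np.hypot(p[0] - q[0], p[1] - q[1]), (p, q))
                             for p in pts for q in pts]
                    segs[i] = max(pairs, key=lambda t: t[0])[1]
                    del segs[j]
                    fused = True
                    break
            if fused:
                break
    return segs
```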
3.2 Grouping of straight lines to contours

After repairing the linear edge segments, the straight lines are grouped into contours. First, neighboring edges are analyzed for geometrical relations (parallel, perpendicular, intersecting; figures 5 and 6).

Figure 5: Parallel and perpendicular edges

Figure 6: Parallelograms

According to these geometrical relations the edges can be grouped into more complex contours. These contours can be:

- parallelograms: two neighboring pairs of parallel edges of the same length are grouped;
- intersections: two intersecting edges are grouped. In a natural environment, edges that run up to each other generally intersect in a corner. With this relation the weakness of the edge operator in corner areas can be compensated.

Simple objects like cubes or tables can be described by contour lines derived from the structures described above (figure 7). A sketch of the pairwise classification behind this grouping follows below.
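The pairwise classification step could look as follows (thresholds and names are hypothetical; assembling two parallel pairs into a complete parallelogram would follow the same pattern):

```python
import numpy as np

def _length(seg):
    (x1, y1), (x2, y2) = seg
    return np.hypot(x2 - x1, y2 - y1)

def _angle_diff(a, b):
    def ori(s):
        (x1, y1), (x2, y2) = s
        return np.arctan2(y2 - y1, x2 - x1) % np.pi
    d = abs(ori(a) - ori(b))
    return min(d, np.pi - d)                        # wrap-around safe

def _endpoint_gap(a, b):
    return min(np.hypot(p[0] - q[0], p[1] - q[1]) for p in a for q in b)

def group_contours(segments, ang_tol=np.radians(5), len_tol=0.2, gap_tol=5.0):
    parallel_pairs, corners = [], []
    for i, a in enumerate(segments):
        for b in segments[i + 1:]:
            if (_angle_diff(a, b) < ang_tol
                    and abs(_length(a) - _length(b)) < len_tol * _length(a)):
                parallel_pairs.append((a, b))       # parallel, similar length
            elif _angle_diff(a, b) > ang_tol and _endpoint_gap(a, b) < gap_tol:
                corners.append((a, b))              # edges running up to a corner
    return parallel_pairs, corners
```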
4 Reconstruction of an indoor scene
A basic task of image processing is the reconstruction and modeling of the scene. In order to reconstruct the environment, the perceived images first have to be matched. With KASTOR being able to perform real-time edge detection, an edge-based matching algorithm is applied. The high quality of the calibration enables a highly precise computation of corresponding endpoints of edges with the use of the epipolar constraint. In [DM94] it was shown that corresponding image points can be computed unequivocally if three or more calibrated cameras are used. With a binocular stereo system that is not provided with any knowledge of the environment, the matching is very difficult. For every point an epipolar line can be computed in the other image, on which the corresponding point must lie. This very often results in an ambiguity of possible matches. This problem can be solved with
a third camera. With the help of the third camera a second epipolar line can be computed for every point, on which the corresponding point also has to lie; this means that the corresponding point is the intersection of the two epipolar lines. By applying this technique a very high reliability of the stereo matching algorithm was achieved.

Figure 8: Indoor environment
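A sketch of this trinocular constraint, using the standard construction of a fundamental matrix from two projection matrices (here the DLT matrices of section 2.1); function names are assumptions:

```python
import numpy as np

def fundamental_from_projections(P1, P2):
    """Fundamental matrix mapping points of camera 1 to epipolar
    lines in camera 2, from the 3x4 projection matrices."""
    _, _, Vt = np.linalg.svd(P1)
    C1 = Vt[-1]                          # camera centre: P1 @ C1 = 0
    e2 = P2 @ C1                         # epipole of camera 1 in image 2
    e2_x = np.array([[0, -e2[2], e2[1]],
                     [e2[2], 0, -e2[0]],
                     [-e2[1], e2[0], 0]])
    return e2_x @ P2 @ np.linalg.pinv(P1)

def match_in_third_image(x1, x2, P1, P2, P3):
    """Corresponding point in image 3 as the intersection of the two
    epipolar lines induced by the matched points x1 and x2."""
    l31 = fundamental_from_projections(P1, P3) @ np.append(x1, 1.0)
    l32 = fundamental_from_projections(P2, P3) @ np.append(x2, 1.0)
    x3 = np.cross(l31, l32)              # intersection of homogeneous lines
    return x3[:2] / x3[2]
```

A candidate match between images 1 and 2 is accepted only if an edge point is actually found near the predicted intersection in image 3, which removes most of the binocular ambiguities.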
5 Landmark recognition
For the performance of the navigation task the robot needs some a priori knowledge of the environment. Figure 8 shows the indoor environment PRIAMOS operates in. Significant natural structures in this environment are used as landmarks to support the robot's navigation. Several different objects are used as landmarks (doors, pillars, ...) to perform a position correction [ZF92] during longer navigation tasks.
5.1 Generic models of landmarks
In order to recognize a landmark, an adequate model of this landmark has to exist in the robot's internal map. Again, a basically edge-based description of the landmarks is used. The landmarks are generically modeled from edges. Additionally, the edges can be attributed. Such attributes can be the visibility of an edge, a pointer to neighboring edges, the grey value of neighboring planes, or anything else that supports the recognition process. Two examples of generic landmarks can be seen in figure 9.
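A hypothetical data structure for such attributed, edge-based generic models (the attribute set mirrors the examples given above; none of the names are from the paper):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ModelEdge:
    start: Tuple[float, float, float]   # 3D endpoints in the landmark frame
    end: Tuple[float, float, float]
    visible: bool = True                # visibility attribute
    neighbors: List[int] = field(default_factory=list)  # adjacent edge indices
    plane_grey: Optional[float] = None  # grey value of a neighboring plane

@dataclass
class Landmark:
    name: str                           # e.g. "door" or "pillar"
    edges: List[ModelEdge] = field(default_factory=list)
```

A door would then be a Landmark whose edges describe the frame and leaf, with the attributes guiding the matching against extracted image edges.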
5.2 Predictive vision
Figure 7: Edge skeleton of an object
In order to navigate a robot in a partially known environment, the observations of natural landmarks are used to update and correct the robot's position from time to time.
Figure 9: Landmark models: door, pillar

Figure 11: Computation of the position and orientation correction (model vector X_mod, reconstructed vector X_rec; estimated position of the robot before correction, corrected position after correction)
5.2.1 Vision-based position correction
Figure 10: Pillar

Especially when high precision is necessary for the performance of a navigation task, the robot needs to know its position very accurately. The odometry sensors of the robot provide a position estimate within an interval of 20 cm. In situations where high precision in navigation is necessary, this position error has to be minimized. A navigation task that requires very accurate knowledge of position and orientation is driving through a narrow door. This task cannot be executed with a position error of about 20 cm and an orientation error of about 5 degrees. Therefore the door, which is known in the internal map, is used as a natural landmark. The robot's estimated position relative to the door allows predicting the position of the door in the camera images. A snapshot is then taken with the cameras, and the prediction of the door is matched with the extracted edges in the images. In this way it is possible to recognize the door and localize it in the camera images. As soon as the door is localized in both images, the two images can be matched and the position and orientation of the door relative to the robot can be computed. From this results a position correction that allows the robot to navigate through the narrow door. This is described in detail in [WWD95].
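As an illustration of the prediction step, the sketch below (hypothetical names and frame conventions: Y up, X-Z ground plane as in section 2.2) transforms the door edges from the internal map into the frame of the estimated robot pose and projects them with the DLT matrix of equation 1; the projected edges then serve as search regions for matching:

```python
import numpy as np

def predict_landmark_edges(model_edges_world, robot_pose, M):
    """Project 3D model edges into the image, given the estimated
    2D robot pose (x, z, phi) and the calibrated DLT matrix M."""
    x, z, phi = robot_pose
    c, s = np.cos(phi), np.sin(phi)
    predicted = []
    for P1, P2 in model_edges_world:      # 3D edge endpoints (X, Y, Z)
        pts = []
        for X, Y, Z in (P1, P2):
            # world -> robot frame: translate, then rotate by -phi
            Xr, Zr = X - x, Z - z
            Xc = c * Xr + s * Zr
            Zc = -s * Xr + c * Zr
            I = M @ np.array([Xc, Y, Zc, 1.0])   # projection, equation (1)
            pts.append(I[:2] / I[2])
        predicted.append(tuple(pts))
    return predicted
```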
In the following, the task of position correction is illustrated with another example. The robot receives a position estimate that enables it to make a planned perception of the area where a pillar is expected. The predicted model of the pillar is matched to the perceived image, and so it is possible to identify the pillar. In figure 10 a grey-value image of the pillar and the extracted edges are shown. In order to correct the orientation of the mobile robot, two landmarks (in this case two pillars) have to be observed. The correction of position and orientation (\Delta\varphi, \Delta\vec{X}) is computed from the displacement between the position of the landmark in the world model and the computed scene reconstruction (figure 11):

\cos(\Delta\varphi) = \frac{\vec{X}_{mod} \cdot \vec{X}_{rec}}{|\vec{X}_{mod}|\,|\vec{X}_{rec}|} \qquad (2)

\Delta\vec{X} = R(\Delta\varphi)\,\vec{X}_{rec} - \vec{X}_{mod} \qquad (3)
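A minimal sketch of equations (2) and (3), assuming X_mod and X_rec are the 2D ground-plane vectors to the observed landmark in the world model and in the scene reconstruction; the sign of the rotation, which equation (2) leaves open, is recovered here from the 2D cross product:

```python
import numpy as np

def rot(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def pose_correction(X_mod, X_rec):
    X_mod = np.asarray(X_mod, float)
    X_rec = np.asarray(X_rec, float)
    # equation (2): magnitude of the orientation error
    cos_dphi = X_mod @ X_rec / (np.linalg.norm(X_mod) * np.linalg.norm(X_rec))
    dphi = np.arccos(np.clip(cos_dphi, -1.0, 1.0))
    # sign from the z component of the 2D cross product (an assumption)
    if X_rec[0] * X_mod[1] - X_rec[1] * X_mod[0] < 0:
        dphi = -dphi
    # equation (3): residual displacement after rotating the reconstruction
    dX = rot(dphi) @ X_rec - X_mod
    return dphi, dX
```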
6 Environmental modeling
The same techniques that are used for position correction can also be applied for an automatic mapping of the environment (figure 8). In the first step the robot starts with a very rough map of the environment. Given the ground plan of a hallway, the task is to drive through this hallway using only ultrasonic sensing. On the way through the corridor the robot has to detect all doors (closed or open) automatically, and it is asked to update the internal map with the positions of the doors. The precision of the automatically generated map is directly related to the precision of the odometry, as no high-resolution position correction is possible with the ultrasonic sensors. With the help of the vision-based position correction, using the pillars as landmarks, a high performance can be achieved. In the next step it is planned to start without any environmental knowledge and to automatically map the whole accessible floor of the building.
7 Conclusion and future work
In this paper the sensor system of the mobile robot PRIAMOS has been described. It is explained how different types of sensor information are processed, with a focus on the visual sensors (cameras). A precise and fast calibration technique for active cameras is proposed. Based on this strategy, reactive visual sensing, structured light, and high-level image processing techniques like landmark recognition and automatic mapping of the environment are described. Future work will focus on the fusion of different types of sensor information by using extended Kalman filtering and on-line sensor planning [Agg89]. On the other hand, there will be a focus on the application of mobile service robots that can supply an operator with all the processed sensor measurements, in order to achieve a very reliable teleoperation system with a high degree of autonomy (shared control). In parallel, the image processing system will be changed from standard to digital cameras. A significant improvement in the quality of the images and in low-level processing is expected from this.
8 Acknowledgment
The authors would like to thank J. Keller and T. Killimann for their commitment in developing the camera head, and G. Hetzel for his contributions to the development of the calibration technique. This work was performed at the Institute for Real-Time Computer Systems & Robotics, Prof. Dr.-Ing. U. Rembold and Prof. Dr.-Ing. R. Dillmann, Department of Computer Science, University of Karlsruhe, 76128 Karlsruhe, Germany.
References
[AAK71] Y.I. Abdel-Aziz and H.M. Karara. Direct linear transformation into object space coordinates in close-range photogrammetry. In Symposium on Close-Range Photogrammetry, University of Illinois at Urbana-Champaign, January 1971.

[Agg89] J.K. Aggarwal, editor. Multisensor Fusion for Computer Vision. NATO ASI Series. Springer-Verlag, 1989.

[Alo90] J. Aloimonos. Purposive and qualitative vision. In Proc. of the Image Understanding Workshop, pages 816-828, 1990.

[Bey91] H. Beyer. An introduction to photogrammetric camera calibration. Technical report, Institute of Geodesy and Photogrammetry, ETH Zürich, Switzerland, 1991.

[DM94] J. Dold and H.-G. Maas. An application of epipolar line intersection in a hybrid close range photogrammetric system. In J.F. Fryer, editor, Close Range Techniques and Machine Vision, pages 65-70. ISPRS Commission V, 1994.

[Foe90] R. Foehr. Photogrammetrische Erfassung räumlicher Informationen aus Videobildern, volume 7 of Fortschritte der Robotik. W. Ameling and M. Weck, 1990.
[Gen94] V. Gengenbach. Einsatz von Rückkopplungen in der Bildverarbeitung bei einem Hand-Auge-System zur automatischen Demontage. PhD thesis, University of Karlsruhe, 1994.

[LDW95] I.S. Lin, R. Dillmann, and F. Wallner. An advanced telerobotic control system for mobile robots. In U. Rembold, editor, IAS, 1995.

[Sch92] C. Schmid. Auto-calibration of cameras by direct observation of objects. Master's thesis, University of Karlsruhe, Institute for Real-Time Control Systems and Robotics, and University of Grenoble, LIFIA, 1992.

[VO90] P. Vuylsteke and A. Oosterlinck. Range image acquisition with a single binary-encoded light pattern. IEEE Trans. on Pattern Analysis and Machine Intelligence, 12(2):148-164, February 1990.

[WD94] P. Weckesser and R. Dillmann. Accuracy of scene reconstruction with an active stereo vision system using motorized zoom lenses. In D.P. Casasent, editor, Intelligent Robots and Computer Vision XIII: 3D Vision, Product Inspection, and Active Vision, pages 470-481. SPIE, 1994.
[WGD95] F. Wallner, R. Graf, and R. Dillmann. Real-time map refinement by fusing sonar and active stereo vision. In IEEE International Conference on Robotics and Automation, Nagoya, 1995.

[WH94] P. Weckesser and G. Hetzel. Photogrammetric calibration methods for an active stereo vision system. In A. Borkowsky and J. Crowley, editors, Intelligent Robotic Systems (IRS), pages 430-436, 1994.

[Wil93] R. Wilson. Modeling and calibration of automated zoom lenses. PhD thesis, CMU, Pittsburgh, USA, 1993.

[WW94] P. Weckesser and F. Wallner. Calibrating the active vision system KASTOR for real-time robot navigation. In J.F. Fryer, editor, Close Range Techniques and Machine Vision, pages 430-436. ISPRS Commission V, 1994.

[WWD95] P. Weckesser, F. Wallner, and R. Dillmann. Position correction of a mobile robot using predictive vision. In U. Rembold and R. Dillmann, editors, International Conference on Intelligent Autonomous Systems (IAS-4), April 1995.

[ZF92] Z. Zhang and O. Faugeras. 3D Dynamic Scene Analysis. Springer-Verlag, 1992.