Proceedings of the 2007 IEEE International Conference on Networking, Sensing and Control, London, UK, 15-17 April 2007
A novel approach for Self-Localization based on Computer Vision and Artificial Marker Deposition

Savan Chhaniyara*, Kaspar Althoefer, Member, IEEE, Yahya H. Zweiri, Member, IEEE, and Lakmal D. Seneviratne, Member, IEEE
Abstract— A new velocity and position sensor concept for manned and unmanned ground vehicles is proposed. The idea of this system is to temporarily place artificial markers in the environment, such as on the surface the vehicle is maneuvering over [23-25]. Once placed, the markers have zero speed, i.e., they are stationary with respect to the environment over which the vehicle is traversing. The speed of these markers with respect to the moving vehicle is then measured using a two-dimensional (possibly three-dimensional) sensor affixed to the vehicle. The markers are preferably of a temporary nature and disappear as soon as the vehicle sensor has passed over them and the signal acquisition process has been completed. The proposed sensor system is composed of two main subsystems. The first subsystem generates the markers and places them on the surface. The second subsystem is a receiving element, which continuously acquires relative position signals from the markers placed on the road surface. This new sensor concept is envisaged to be applied in areas such as the automotive sector, planetary exploration and underwater seabed exploration. Initial experiments employing a camera system as the sensor have been conducted and results are presented.
I. INTRODUCTION
Localization is one of the most fundamental problems in providing mobile robots with truly autonomous capabilities. There are various techniques available for localization and motion estimation. Broadly, these techniques can be evaluated based on their resolution, accuracy and real-time processing capabilities, and there are trade-offs between these parameters for the different localization techniques. Accurately estimating the motion of a vehicle employing on-board sensors is still a very important research topic. Traditionally, wheel odometry, GPS, DGPS and inertial sensors have been used to obtain a vehicle’s speed and possibly its trajectory [1]. Despite the popularity and usefulness of the above sensing techniques, they suffer from drift, low resolution or limited applicability. Wheel odometry performance degrades in the presence of vehicle slip/skid. GPS/DGPS suffers from low resolution and low update rates for particular mobile robotic applications, whereas inertial sensors are prone to high noise levels, especially at low speeds, and their accuracy is affected by the double integration over time needed if position estimates are required [2-4].
Savan Chhaniyara*, K. Althoefer, and L. D. Seneviratne are with King’s College London, Department of Mechanical Engineering, London WC2R 2LS, UK (e-mail: [email protected]). Y. H. Zweiri is with the Department of Mechanical Engineering, University of Mutah, Karak, Jordan (e-mail: [email protected]).
1-4244-1076-2/07/$25.00 ©2007 IEEE
Self-localization is still an important topic in mobile robot navigation and a number of research teams aim at developing alternatives to traditional localization techniques. Many navigation algorithms that can compute a robot’s path relatively robustly in unstructured environments have been developed in the recent past. Most notable are the advances in navigation methods based on SLAM [5-6]. On-board sensors are used to acquire images and profiles of the environment (often employing a combination of camera and laser based sensors) and 2D/3D maps of the explored environment are created while, simultaneously, the robot’s path in the developing map is computed. Techniques such as Kalman filtering and particle filtering are used to calculate the path and improve the signal-to-noise ratio [7]. One of the greatest challenges of these approaches is the extraction of natural features from the environment for map building. One of the advantages of the proposed concept is that it is capable of estimating a vehicle’s trajectory and speed independently of the problems inherent to many other visual odometry approaches, which rely on the extraction of features from the environment. The latter issue is specifically relevant in environments where natural features are sparse (e.g. on the seabed [8]) or not well known a priori (e.g. on Mars). A few approaches based on visual odometry have emerged recently [9], [17]. These usually employ forward-facing or downward-facing cameras and use image processing techniques for generating 2D or 3D spaces from 2D images, detecting obstacles or extracting motion from an image sequence [10]–[14]. In contrast to methods that attempt to extract natural features from the environment (such as optical flow methods, which are applied directly to the raw, acquired camera images) and that represent a very time-consuming process involving many iterations, our new approach is computationally very cheap. When using optical flow methods, the identification of equivalent areas of interest in different images is further complicated by the fact that areas are pixelated and do not necessarily appear in the same way. Image rotations (commonly experienced during vehicle turns) further complicate matters, because the optical flow algorithm not only needs to compare shapes but also has to check for rotated versions of the identified shapes [13]. Also, optical flow methods are very susceptible to changes in
lighting, surface texture, roughness and reflectivity [13, 15]. The proposed concept uses easily recognizable markers which are rotationally invariant, thereby reducing the computational burden overall and, particularly, when the rotational motion of the vehicle is to be estimated too. Research shows that optical flow techniques can work in real-time only at relatively low vehicle speeds; employing a standard computer, the limit is at about 40 to 50 km/h [15]. It is also noted that a high camera frame rate is needed to obtain images with sufficient overlap to allow a robust estimation of distance and angle between images. Cameras are needed that are capable of acquiring images at intervals of less than 1 ms (1000 frames per second) at a vehicle speed of around 50 km/h [15]. Although technically feasible, this is relatively costly based on today’s technology. The proposed sensor system is expected to provide similar results or even go beyond that using standard camera and PC technology. In [23-27], researchers introduced the dropping of artificial markers into the environment. Mostly, these approaches addressed the issue of robotic exploration in known/unknown environments, where no distance or metric information was taken into consideration. The approach taken in this paper and by a few research groups is to use markers which are dropped into the environment and, then, to localize the robot pose [28-29]. In [28], the main focus was on developing evidence grid maps and topological maps. The perceived positions of the artificial markers were assumed to be based on the robot’s global position obtained from wheel encoders, potentially generating large errors. In [29], the authors presented a system capable of navigation without prior beacon locations for an autonomous underwater vehicle. Global translation and rotation errors were around 2 meters and a few degrees of heading respectively, which is too large for terrain-based robot localization. The camera-based sensor system proposed here measures a vehicle’s trajectory and motion employing a two-stage process. Real-time image processing algorithms are developed to locate the prominent marker features and subsequently calculate the vehicle’s relative speed, position and orientation. The new concept is not restricted to the particular implementation presented here. Depending on the particular application and the vehicle’s environment, a number of different implementations of this new sensor concept can be envisaged, using, for example, visible markers with a vision sensor, heat markers with thermal cameras, RFID tags with radio frequency emitters/receivers, (solar) LEDs, or sound-reflecting markers with ultrasonic transmitters/receivers as artificial markers. In Section II, the sensor concept and methodology are presented. Section III presents initial results and experiments demonstrating one implementation of the proposed sensor concept. The conclusion and future work are presented in Section IV and Section V respectively.
II. SENSOR CONCEPT

A. Sensor Architecture
The proposed sensing concept goes through two phases for estimating the vehicle velocity, as shown in Fig. 1. In the first phase, the system ejects small but distinct markers which attach themselves to the environment. In the second phase, the on-board sensor, which is designed to be particularly receptive to the strewn markers, senses the distance and orientation of the vehicle with respect to the markers.
Fig. 1. Sensor Architecture
Different from methods where beacons are placed in an environment to aid navigation before the vehicle begins its motion [16], our approach allows localization in unknown and unexplored environments because the markers are deposited while the motion takes place. Beacons are typically employed in confined spaces such as the factory floor, where the vehicle’s position is computed by measuring the distance to and the orientation about the known, fixed locations of the beacons; the implementation of such a method for open outdoor spaces would prove prohibitive, especially if wide areas are to be covered. The proposed concept is also different from techniques where a robot (leader) lays out markers that can be used by other robot vehicles (followers) to follow. To the best knowledge of the authors, the proposed approach has not been described in the literature before.

B. Marker selection & discharge
Current natural landmark recognition methods, which are employed to determine a robot’s position with respect to the landmarks, suffer from several disadvantages. In cases where the working environment is cluttered or unevenly lit, or the landmarks are partially occluded, errors may occur in detecting or recognizing the landmarks, resulting in errors in the position determined for the mobile robot [9], [16]. It is also observed that in many current landmark identification
schemes, the processing needed to extract information from images requires considerable processor resources, which is a hindrance if real-time localization and navigation are required. Natural landmark recognition systems also suffer from poor performance in unfamiliar or unknown terrain, often resulting in increased errors in the position and localization estimates.
One issue of the proposed concept that needs consideration is that the environment is actively altered through the sensing process and, depending on the application, the appropriate type of marker needs to be selected. In most cases, a temporary or decaying marker is preferable. The marker needs only to be active during the acquisition process by the on-board sensor. After this it may decay by itself or be actively removed by the vehicle. In this way a “pollution” of the environment is avoided and other vehicles following a similar route can use a similar sensor without confusion. In other cases, it may be preferred to use markers that stay active for a longer period, including swarms, repeatedly conducted inspection tasks, (military) leader/follower applications and loop-closure in SLAM-based applications.

C. Feature Identification and Tracking
The notable developments in recent years with respect to CCD/CMOS sensor technology, computing power and algorithmic improvements have considerably accelerated the research into and usage of visual processing for vision-based navigation methods. However, traversing autonomously at a speed of 40 mph using vision as input still presents a challenge. The maximum speed of the last DARPA Grand Challenge champion was in the 35-40 mph range, and it was not even fully relying on visual navigation. In most image processing algorithms an important aspect is feature identification. To achieve real-time processing it is very important to rapidly identify and locate good features from the spatiotemporal image sequence [17], [18]. The proposed sensor concept avoids the limitations posed by some of the approaches mentioned above and simplifies the problem of feature extraction and tracking substantially. It also considerably increases the chance of identifying features even in adverse environment conditions.

Fig. 2. Image of marker used in this study (left), simplified representation of marker image used as kernel (right).

D. Methodology
The core of the developed program to detect the markers in the image is a 2-D convolution algorithm. Since it is known which markers are used for this process and at what distance from the camera they will appear, the dimensions and size of the marker image can be computed and this knowledge can be exploited to create the appropriate convolution kernel (filter mask). The markers used in this study are circular paper shavings, as shown in Fig. 2. Circular markers are rotationally invariant and, thus, a single convolution kernel (which is effectively a simplified, graphical representation of a marker image) is sufficient to give a strong response at a marker’s location, even if an image rotation has occurred. The following convolution algorithm has been employed:
C(i, j) = \sum_{m=0}^{Ma-1} \sum_{n=0}^{Na-1} A(m, n) \cdot B(i - m, j - n)        (1)
where 0 ≤ i < Ma + Mb − 1 and 0 ≤ j < Na + Nb − 1, with A the Ma × Na camera image and B the Mb × Nb kernel. Other (optical flow) approaches pick a small image area in an acquired camera frame as the kernel (mask) and, by “comparing” this kernel with the subsequent frame, they attempt to find the image area that matches the chosen area in the subsequent camera frame [13]. However, in these approaches the chosen image area or kernel is usually highly rotation-variant, and matching the kernel with a rotated image area becomes a slow process. Also, for each chosen area in the first frame (at least two are needed to reliably compute image rotation) a separate kernel has to be created and applied to the subsequent frame, inevitably increasing the computational cost. These drawbacks are resolved by the new approach based on artificial markers with a unified shape. The convolution process outputs an image whose peaks represent the locations of the markers in a camera frame. In our study, each camera frame contains between two and four marker spots.
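To make this detection step concrete, the following Python sketch (not the authors' implementation; the use of SciPy, the kernel radius, the relative threshold and the maximum number of markers are all assumptions for illustration) builds a disc-shaped kernel, convolves it with a greyscale frame and picks the strongest response peaks as marker locations:

```python
# Illustrative sketch of disc-kernel convolution for marker detection.
import numpy as np
from scipy.signal import convolve2d

def disc_kernel(radius):
    """Binary disc used as the rotation-invariant convolution kernel."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return (x**2 + y**2 <= radius**2).astype(float)

def detect_markers(gray, radius=8, threshold=0.8, max_markers=4):
    """Return (row, col) peaks of the convolution response.

    gray: 2-D array in [0, 1] with bright markers on a darker background
    (invert the image first if the markers are darker than the ground).
    """
    kernel = disc_kernel(radius)
    kernel -= kernel.mean()              # zero-mean kernel suppresses flat regions
    response = convolve2d(gray, kernel, mode='same', boundary='symm')
    response /= response.max() + 1e-9    # normalise so a relative threshold can be used
    peaks, r = [], response.copy()
    for _ in range(max_markers):
        idx = np.unravel_index(np.argmax(r), r.shape)
        if r[idx] < threshold:
            break
        peaks.append(idx)
        # non-maximum suppression: blank a neighbourhood around the detected peak
        y0, x0 = idx
        r[max(0, y0 - 2 * radius):y0 + 2 * radius,
          max(0, x0 - 2 * radius):x0 + 2 * radius] = 0
    return peaks
```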
Fig. 3 Distances between marker points in two subsequent frames. Distances between all marker point pairs are shown; matching distances between frames are highlighted.
E. Marker grouping
In a second stage of the proposed process, the markers identified in one frame need to be paired with the corresponding markers in the subsequent frame. An algorithm has been devised that attempts to pair all marker points in the first frame with all marker points in the second frame and picks those pairs which would lead to a “reasonable” movement, as explained below. The developed program compares the following features to come to a decision on how to pair the markers. The first feature is the distance between marker points in the first frame. Since the markers are physical entities which are assumed to be rigidly attached to the ground and, thus, stationary, the distance between the markers remains constant. Hence, only those pairs of marker points in the first frame are considered which have a corresponding pair of markers in the second frame that are the same distance apart. The program calculates the distance between each pair of marker points in the first frame. This set of distances is then compared to the set of distances computed for the second frame. Marker point pairs whose distances vary strongly between frames are excluded from the further process (Fig. 3). To increase the robustness of the approach, the rotations between the straight lines spanned by the selected pairs of marker points in the first frame and the corresponding straight lines in the second frame are compared to the computed rotation of the previous frame transition. The marker pair with the least change in orientation is selected. Hence the algorithm eliminates marker pairs which would lead to an unlikely abrupt rotation from the previous frame-to-frame transition to the current one. A sketch of this pairing step is given below.
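A minimal sketch of this pairing logic follows (illustrative only; the distance tolerance and the data layout are assumptions, not the authors' code). It keeps the candidate pairing whose inter-marker distance is preserved and whose implied rotation deviates least from the previous transition:

```python
# Illustrative sketch of the marker-pairing stage described above.
import numpy as np
from itertools import combinations

def pair_markers(prev_pts, curr_pts, prev_rotation, dist_tol=3.0):
    """prev_pts, curr_pts: lists of (x, y) marker positions in two frames (pixels).
    prev_rotation: rotation (radians) estimated for the previous transition.
    Returns ((p1, p2), (c1, c2)), the pairing with the smallest orientation change."""
    def pairs_with_dist(pts):
        return [((a, b), np.hypot(*np.subtract(a, b))) for a, b in combinations(pts, 2)]

    best, best_change = None, np.inf
    for p_pair, d_p in pairs_with_dist(prev_pts):
        for c_pair, d_c in pairs_with_dist(curr_pts):
            if abs(d_p - d_c) > dist_tol:      # markers are static: spacing must be preserved
                continue
            ang_p = np.arctan2(p_pair[1][1] - p_pair[0][1], p_pair[1][0] - p_pair[0][0])
            ang_c = np.arctan2(c_pair[1][1] - c_pair[0][1], c_pair[1][0] - c_pair[0][0])
            change = abs((ang_c - ang_p) - prev_rotation)  # prefer smooth rotation over time
            if change < best_change:
                best, best_change = (p_pair, c_pair), change
    return best
```

With at most four markers per frame, this exhaustive search over candidate pairs remains computationally cheap.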
III. EXPERIMENTS & RESULTS
The main objective of the experimental procedure is to prove the feasibility of the proposed approach. A number of tests were conducted in a lab environment in order to establish the feasibility and accuracy of the chosen approach. Tests were performed using a linear test rig as shown in Fig. 4 [2]. These initial tests focus on spatiotemporal image acquisition using a downward-facing camera for a pure translational movement. The test rig’s carriage travels on horizontal guides actuated by a DC motor chain-pulley arrangement. Displacement and velocity of the carriage can be measured using a high resolution encoder attached to the motor. The camera is mounted on the carriage, allowing for a controlled movement and accurate analysis of the developed algorithm. In these tests, a Philips USB web camera with a nominal frame rate of 30 fps at a low resolution of 640 × 480 pixels is used. In this feasibility study, circular paper shavings have been used as artificial markers (Figs. 2 and 5). These markers were sprinkled over the ground manually ahead of the motion of the carriage with the camera. This process could be automated using simple mechanisms which are already used in industry for various applications, as mentioned earlier. The carriage travels at a specific, constant speed. Start and stop positions are registered and matched with the corresponding camera images obtained. During the travel of the carriage, the spatiotemporal image sequence is captured using the webcam and a standard Intel-based Pentium computer with a clock speed of 2.80 GHz, avoiding the need for a costly frame grabber. This process is repeated
at different speeds. Since for each step the distance of motion is known (employing the encoder integrated with the test rig), for each frame-to-frame transition the accuracy of the proposed sensor can be analysed. In the second experimental study, to demonstrate a real environment scenario, a cargo trolley with a specially developed camera attachment and a standard laptop were used to collect data. The same low-cost web camera was employed, taking frames at a rate of 5 per second and producing images at a resolution of 640 × 460 pixels. In this experiment, no measurements of instantaneous positions were taken; however, start and end points were recorded. In addition, a measuring tape was used to measure the distance of intermediate positions, which can be correlated with the trajectory of the motion of the trolley as measured by the proposed camera-based localization system. As there were no additional sensors in these tests to compare the results of the marker algorithm against, these experiments were used to demonstrate the ease with which artificial features can be detected even in changing environment and lighting conditions. The experiments were carried out by moving the trolley manually at different velocities. It is noted that this trolley has two caster wheels at the front, which caused sudden changes in the direction of the camera motion, leading to a curvy trajectory and blurred images at times. Despite the blurry nature of quite a number of images, the system was capable of robustly identifying all marker positions in all images (Fig. 6). This experiment also allowed evaluating the cumulative accuracy of the system over a given distance.
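For clarity, the frame-to-frame quantity being analysed can be written as a short sketch (an assumed formulation consistent with Section II, not the authors' code): the apparent rigid motion of a matched marker pair gives the camera displacement and rotation, which the calibration ratio and frame rate convert into a speed estimate. The numerical values below are illustrative only.

```python
# Illustrative sketch of the frame-to-frame motion estimate.
import numpy as np

def relative_motion(prev_pair, curr_pair):
    """prev_pair, curr_pair: ((x1, y1), (x2, y2)) matched marker positions (pixels)
    in consecutive frames. Returns (dx, dy) apparent marker shift in pixels and
    dtheta in radians; the vehicle moves by the opposite amount."""
    p1, p2 = map(np.asarray, prev_pair)
    c1, c2 = map(np.asarray, curr_pair)
    ang_p = np.arctan2(*(p2 - p1)[::-1])      # orientation of the marker segment, frame k
    ang_c = np.arctan2(*(c2 - c1)[::-1])      # orientation of the same segment, frame k+1
    dtheta = ang_c - ang_p
    shift = (c1 + c2) / 2 - (p1 + p2) / 2     # midpoint displacement in pixels
    return shift[0], shift[1], dtheta

# Example (assumed values): convert an apparent marker shift into a speed estimate.
MM_PER_PIXEL = 6.0   # test rig calibration ratio (Section III)
FPS = 30.0           # nominal web camera frame rate
dx, dy, dtheta = relative_motion(((100, 200), (180, 260)), ((95, 170), (175, 230)))
speed_mm_s = np.hypot(dx, dy) * MM_PER_PIXEL * FPS
```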
Fig. 4 Linear test rig with moving carriage, camera module and terrain
Fig. 5 (a) Camera Mount on the Cargo trolley (b) Camera Mount
Fig. 6 (a) Typical camera image. (b) Marker identification after processing. Note that the markers are correctly identified despite the relatively strong blur in the camera image.
For all experiments, the relationship between measured pixel distance and real-world distance was established in a separate calibration experiment. Markers were placed on the ground and ‘go-stop’ images were taken with the camera positioned at a specified, constant distance from the ground. The distance between marker images in pixels and the distance between the real markers in mm were established. The ratio between these distance measurements allows the computation of real distances from distances in images. Obviously, this simple approach can only provide useful results in a lab environment where the distance between camera and ground is kept constant and the image plane of the camera is always parallel to the ground. In a real-world scenario, additional means would need to be integrated to measure the camera-ground distance and the camera orientation angle. Methods for such a real-world system are discussed in [15]. The following ratios were computed for the cameras used at the respective distances:
1) ‘Start-Stop’ linear test rig experiments: 6 mm/pixel
2) Cargo trolley experiments: 4.9 mm/pixel
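Assuming this simple procedure, the ratio follows directly from a single calibration image; the marker spacing and pixel coordinates below are illustrative values only, chosen to reproduce the 6 mm/pixel figure:

```python
# Minimal calibration sketch (illustrative values, not measured data): the ratio
# between a known marker spacing on the ground and its distance in the image.
import numpy as np

known_spacing_mm = 300.0                            # tape-measured distance between two markers (example)
img_p1, img_p2 = (112.0, 240.0), (162.0, 241.0)     # their detected image positions (example, pixels)
pixel_distance = np.hypot(img_p2[0] - img_p1[0], img_p2[1] - img_p1[1])
mm_per_pixel = known_spacing_mm / pixel_distance    # approx. 6 mm/pixel for the test rig camera
```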
TABLE I
RESULT SUMMARY

d (mm)    d’ (mm)   % Error   Avg v (mm)   Avg v’ (mm)   % Error
516.21    530.97    2.85      13.489       13.56         0.526
444.65    442.40    0.506     18.8         19.2          2.13
454.75    449.63    1.125     19.50        20.106        3.01
426.23    429.12    0.678     24.56        24.12         1.79
439.47    435.54    0.894     32.65        32.24         1.255
440.94    434.89    1.37      42.22        41.88         0.805
A. Discussion of results
The position estimates computed by our system are shown in Table I. The results from the start-stop camera experiment on the test rig show that high accuracy can be achieved at each step. The average error over the number of experiments is 1.237%. The error in the average velocity of the carriage motion is 1.6% (Table I). Trajectory plots for the various experiment sets are shown in Figs. 6-9. The results from the trolley experiments also show promise. The proposed system correctly computed the distance with only a relatively small error; the overall error is 4.93% (approx. 6 mm). The estimated trajectory for trolley experiment set 1 is shown in Fig. 9.
Fig. 6 Estimated trajectory and encoder readings on test rig experiment set 1
Fig. 7 Estimated trajectory and encoder readings on test rig experiment set 2
Fig. 8 Estimated trajectory and encoder readings on test rig experiment set 3
Fig. 9 Estimated trajectory plot for the trolley experiment
IV. CONCLUSION
The proposed sensor concept proved to be fairly robust and responsive to the markers even when experiencing changes in environment, lighting conditions and image quality. It is observed that, even using a standard web camera with a low frame rate of 5 fps, good results were achieved. This study also shows that the proposed algorithm is suitable for real-time processing; on average, it takes around 1 s to process an image with a resolution of 640 × 480 pixels. It is expected that by using software written in a language such as C and vector coding techniques the processing speed will improve significantly. These initial experiments suggest that the proposed artificial marker deposition concept is feasible and that vehicle position and speed can be accurately and robustly estimated.

V. FUTURE WORK
The following issues will be addressed in the future. An in-depth study will be conducted to improve affine motion estimation between frames and overall motion prediction of the camera. Further, the effect of image size on position and orientation estimation robustness and computational time will be investigated. Further efforts will aim at improving the tracking accuracy and analyzing the effect of height variation on velocity and position estimation.
REFERENCES
[1] A. Georgiev and P. K. Allen, “Localization methods for a mobile robot in urban environments,” IEEE Transactions on Robotics and Automation, vol. 20, no. 5, pp. 851-864, Oct. 2004.
[2] S. Panzieri, F. Pascucci, and G. Ulivi, “An outdoor navigation system using GPS and inertial platform,” IEEE/ASME Transactions on Mechatronics, vol. 7, no. 2, pp. 134-142, Jun. 2002.
[3] J. Lobo and J. Dias, “Vision and inertial sensor cooperation using gravity as a vertical reference,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1597-1608, Dec. 2003.
[4] C. Thorpe, M. H. Hebert, T. Kanade, and S. A. Shafer, “Vision and Navigation for the Carnegie-Mellon Navlab,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 3, May 1988.
[5] S. Thrun and M. Montemerlo, “The GraphSLAM algorithm with applications to large-scale mapping of urban structures,” International Journal of Robotics Research, vol. 25, no. 5/6, pp. 403-430, 2005.
[6] A. Davison, “Real-Time Simultaneous Localization and Mapping with a Single Camera,” IEEE International Conference on Computer Vision, pp. 1403-1410, 2003.
[7] J.-S. Gutmann, “Markov-Kalman localization for mobile robots,” Proceedings of the 16th International Conference on Pattern Recognition, vol. 2, pp. 601-604, Aug. 2002.
[8] J. Yuh and M. West, “Underwater robotics,” Advanced Robotics, vol. 5, no. 5, pp. 609-639, Sep. 2001.
[9] G. N. DeSouza and A. C. Kak, “Vision for mobile robot navigation: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 2, pp. 237-267, Feb. 2002.
[10] E. Marchand, P. Bouthemy, F. Chaumette, and V. Moreau, “Robust real-time visual tracking using a 2D-3D model-based approach,” IEEE International Conference on Computer Vision, pp. 262-268, Sep. 1999.
[11] Y. Takaoka, Y. Kida, S. Kagami, H. Mizoguchi, and T. Kanade, “3D Map Building for a Humanoid Robot by using Visual Odometry,” IEEE International Conference on Systems, Man and Cybernetics, pp. 4444-4449, 2004.
[12] B. Jung and G. S. Sukhatme, “Detecting Moving Objects using a Single Camera on a Mobile Robot in an Outdoor Environment,” 8th Conference on Intelligent Autonomous Systems, pp. 980-987, Mar. 2004.
[13] D. Fernandez and A. Price, “Visual Odometry for an Outdoor Mobile Robot,” Proceedings of the 2004 IEEE Conference on Robotics, Automation and Mechatronics, Singapore, Dec. 1-3, 2004.
[14] S. Se, D. Lowe, and J. Little, “Mobile Robot localization and mapping with uncertainty using scale-invariant visual landmarks,” The International Journal of Robotics Research, vol. 21, no. 8, pp. 735-758, Aug. 2002.
[15] S. Chhaniyara, P. Bunnun, Y. H. Zweiri, L. D. Seneviratne, and K. Althoefer, “Feasibility of Velocity Estimation for All Terrain Ground Vehicles using an Optical Flow Algorithm,” ICARA 2006 - Third International Conference on Autonomous Robots and Agents, New Zealand, Dec. 12-14, 2006, to be published.
[16] R. Sim and G. Dudek, “Mobile Robot Localization from Learned Landmarks,” Proceedings of the 1998 IEEE/RSJ International Conference on Intelligent Robots and Systems, Victoria, B.C., Canada, Oct. 1998.
[17] Y. Cheng, M. W. Maimone, and L. Matthies, “Visual Odometry on the Mars Exploration Rovers,” IEEE Robotics & Automation Magazine, Jun. 2006.
[18] D. Chetverik and J. Verestoy, “Feature point tracking for incomplete trajectories,” Computing, vol. 62, pp. 321-338 (199).
[19] J.-Y. Bouguet, “Camera Calibration Toolbox for Matlab,” http://www.vision.caltech.edu/bouguetj/calib_doc/index.html
[20] A. Kelly, “Pose Tracking for Mobile Robot Localization from Large Scale Appearance Mosaics,” International Journal of Robotics Research, vol. 19, no. 11, 2000.
[21] M. R. Kabuka and A. E. Arenas, “Position Verification of a Mobile Robot Using Standard Pattern,” IEEE Journal of Robotics and Automation, vol. 3, no. 6, pp. 505-516, Dec. 1987.
[22] D. Comaniciu, V. Ramesh, and P. Meer, “Kernel-Based Object Tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, May 2003.
[23] G. Dudek, M. Jenkin, E. Millios, and D. Wilkes, “Robotic exploration as graph construction,” IEEE Transactions on Robotics and Automation, vol. 7, no. 6, pp. 895-865, 1991.
[24] S. Caselli, K. L. Doty, R. R. Harrison, and F. Zanichelli, “Mobile Robot Navigation in Enclosed Large-Scale Space,” Proceedings of IECON’94, Bologna, Italy, Sep. 5-9, 1994.
[25] I. M. Rekleitis, V. Dujmovic, and G. Dudek, “Efficient Topological Exploration,” IEEE International Conference on Robotics and Automation, Detroit, Michigan, USA, pp. 678-681, May 1999.
[26] M. A. Batalin and G. S. Sukhatme, “Efficient exploration without localization,” IEEE International Conference on Robotics and Automation, Taipei, Taiwan, Sep. 2003.
[27] B. Tovar, S. M. LaValle, and R. Murrieta, “Locally-optimal navigation in multiply-connected environments without geometric maps,” IEEE/RSJ International Conference on Intelligent Robots and Systems, 2003.
[28] N. Fairfield and B. Maxwell, “Mobile robot localization with sparse landmarks,” Proceedings of the SPIE Workshop Mobile Robots XVI, pp. 148-155, Oct. 2001.
[29] P. Newman, M. Bosse, and J. Leonard, “Autonomous Feature-Based Exploration,” Proceedings of the International Conference on Robotics and Automation, vol. 1, pp. 1234-1240, Sep. 2003.
[30] E. Olson, J. Leonard, and S. Teller, “Robust range-only beacon localization,” IEEE Autonomous Underwater Vehicles (AUV ’04), 2004.