JOURNAL OF MULTIMEDIA, VOL. 1, NO. 7, NOVEMBER/DECEMBER 2006
Computer Vision Methods for Improved Mobile Robot State Estimation in Challenging Terrains

Annalisa Milella
Institute of Intelligent Systems for Automation (ISSIA), Italian National Research Council (CNR), 70126 Bari, Italy
Email: [email protected]

Giulio Reina
Department of Innovation Engineering, University of Lecce, 73100 Lecce, Italy
Email: [email protected]

Roland Siegwart
Autonomous Systems Lab (ASL), Swiss Federal Institute of Technology Zurich (ETHZ), Zurich, Switzerland
Email: [email protected]

Abstract— External perception based on vision plays a critical role in developing improved and robust localization algorithms, as well as in gaining important information about the vehicle and the terrain it is traversing. This paper presents two novel methods for rough-terrain mobile robots that use visual input. The first method consists of a stereovision algorithm for real-time 6DoF ego-motion estimation. It integrates image intensity information and 3D stereo data in the well-known Iterative Closest Point (ICP) scheme. Neither a priori knowledge of the motion nor inputs from other sensors are required; the only assumption is that the scene always contains visually distinctive features which can be tracked over subsequent stereo pairs. This generates what is usually referred to as visual odometry. The second method aims at estimating the wheel sinkage of a mobile robot on sandy soil, based on an edge detection strategy. A semi-empirical model of wheel sinkage, drawing on classical terramechanics theory, is also presented. Experimental results obtained with an all-terrain mobile robot and with a wheel sinkage test bed are presented to validate our approach. It is shown that the proposed techniques can be integrated into control and planning algorithms to improve the performance of ground vehicles operating in uncharted environments.

Index Terms— rough-terrain mobile robots, computer vision, vehicle localization, wheel sinkage estimation.
Based on “Stereo-Based Ego-Motion Estimation Using Pixel Tracking and Iterative Closest Point,” by Milella A. and Siegwart R., which appeared in the Proceedings of the Fourth IEEE International Conference on Computer Vision Systems 2006, NY, USA, January 2006. © 2006 IEEE.

I. INTRODUCTION

Future cross-country mobile robots will have to explore larger and larger areas, performing difficult tasks while, at the same time, preserving their safety. This will primarily require advanced sensing and perception capabilities. Vision is our most powerful sense, through which we acquire knowledge of the environment and interact intelligently with our surroundings. Similarly, mobile robots can take advantage of visual capabilities. Video sensors supply contact-free, precise measurements
and are flexible devices that can be easily integrated with multi-sensor robotic platforms. Hence, they represent a potential answer to the need for new and improved perception capabilities for autonomous vehicles [1].

One of the main applications of vision in mobile robotics is localization, i.e. the vehicle’s capability to estimate its pose in the environment. Accurate localization is especially challenging for mobile robots operating in rough terrain. Conventional dead-reckoning techniques are not well suited to rough terrain, since wheel slip, sinkage, and sensor drift may cause localization errors that accumulate without bound as the vehicle travels [2], [3], [4], [5]. Conversely, since video sensors are exteroceptive devices, i.e. they acquire information from the robot’s environment, vision-based motion estimates are independent of any knowledge of terrain properties and wheel-terrain interaction. Like dead reckoning, vision can also lead to an accumulation of errors; however, it has been shown that, compared to dead reckoning, it yields more accurate results and can be considered a promising solution to the problem of robust robot positioning in high-slip environments [6], [7], [8]. As a result, in the last few years, several localization systems, usually referred to as visual odometry [9], have been developed that rely on feature tracking algorithms for vehicle motion estimation.

Nevertheless, in rough terrain, methods to sense the dynamic ill effects occurring at the wheel-terrain interface are highly desirable, since these effects compromise the vehicle’s traction performance and lead to the danger of entrapment with consequent mission failure [10]. A key variable in estimating vehicle-terrain interaction is wheel sinkage [11]. Knowledge of the amount of sinkage a wheel is experiencing would allow a better understanding of the effective rolling radius and a more accurate position estimate. Sinkage measurements are also valuable for terrain identification according to classical terramechanics theory [12].

In this paper, two novel vision-based methods for rough-terrain mobile robots are developed: a localization
algorithm and a method for estimating wheel sinkage.

The visual localization algorithm integrates image intensity information and 3D stereo data using the Iterative Closest Point (ICP) algorithm. The main application of ICP, as originally introduced by Besl and McKay [13], is the registration of digitized data from a rigid object with its idealized geometric model. Here, the potential of ICP for vehicle motion estimation is investigated, using stereovision. In developing the algorithm, two basic problems of ICP were addressed: its failure when dealing with large displacements and its inability to segment input data [13]. Typical solutions rely on odometric information to predict the displacement between consecutive frames and provide an initial motion estimate before ICP registration [14]. Our method, instead, overcomes both problems using the information derived from a single stereo device, without previous knowledge of the motion. The only assumption is that the scene always contains visually distinctive features, which can be tracked over subsequent images. Experimental results obtained with an all-terrain rover, the Shrimp mobile robot [15], equipped with a stereo head, are presented. Tests were performed in a laboratory environment, proving the effectiveness of the proposed method in different contexts.

This paper also presents an innovative algorithm for visual estimation of wheel sinkage for a mobile robot driving across soft soil, which we call the Visual Sinkage Estimation (VSE) module. A semi-empirical model of wheel sinkage is also introduced, serving as an analytical means of comparison. The VSE assumes the presence of a camera mounted on the vehicle body, with a field of view containing the wheel-terrain interface. A pattern of equally spaced concentric black circumferences on a white background is attached to the wheel in order to determine the contact angle with the terrain using edge detection. Experiments performed with a single-wheel test bed are reported that prove the VSE algorithm to be effective under different operating conditions, including non-flat terrain and lighting variations.

Related Work

Visual odometry is an emerging and promising solution to the problem of mobile robot localization. The key idea of visual odometry is to estimate the motion of the robot by visually tracking landmarks, appropriately selected in the environment, using an onboard camera [9], [16]. In recent years, a number of visual odometry algorithms have been proposed that use either single cameras [6], [16], [17], [18] or stereovision [7], [16], [19], [20]. They differ mainly in the feature tracking method adopted and in the transformation applied for estimating the camera motion. For instance, in [6], the visual module uses a variation of Benedetti and Perona’s algorithm [22] for feature detection, and correlation for feature tracking. Robustness is obtained by integrating visual data and an Inertial Measurement Unit (IMU) through a Kalman filter. In [7], odometry provides an estimation of the approximate
robot motion, which allows a search area to be selected for improved feature tracking, with a maximum-likelihood formulation used for motion computation. In [16], robust visual motion estimation is achieved using preemptive RANSAC [21], followed by iterative refinement.

In this paper, we propose a visual odometry algorithm for real-time 6DoF ego-motion estimation, which integrates image intensity information and 3D stereo data in the well-known Iterative Closest Point (ICP) framework. ICP is suited for aligning point clouds whose correspondences are not known, and consists of a two-step kernel: the first step searches for corresponding points between the two point clouds based on the nearest-neighbor concept; the second step determines the transformation that minimizes the distance between the nearest neighbors. The process is iterated until a convergence criterion is satisfied. ICP has been extensively studied in the literature, and many variants have been proposed to improve both accuracy and computational time [23], [24], [25]. Several applications have been developed that use ICP for surface registration and mapping; most of them employ laser scanner data. However, relatively little work has been published in the domain of ICP-based visual odometry [26], [27]. Approaches using stereo vision and ICP registration can be found in [28], [29] for Simultaneous Localization and Modeling (SLAM), and in [30] for the reconstruction of 3D partial surface models. In this paper, an approach similar to [28] is adopted, using correlation for initial matching and approximate motion estimation, followed by ICP for motion estimate refinement. However, our work is different in that it addresses the visual odometry problem. The original contribution of the proposed method lies mainly in an efficient combination of image processing and 3D registration techniques that allows robust outlier rejection in both the stereo matching and feature tracking phases. Therefore, accurate motion estimates can be achieved using a few interest points while preserving real-time constraints.

For mobile robots driving across soft soil, such as sand, loose dirt, or snow, it is critical that the dynamic ill effects occurring at the wheel-terrain interface be taken into account. One of the most prevalent of these effects is wheel sinkage [11]. Iagnemma et al. [31] described an online visual sinkage estimation algorithm that relies on the analysis of grayscale intensity along the wheel rim. Assuming that the wheel has a different color than the terrain, the location of the terrain interface is computed as the point of maximum change in intensity. This method is relatively simple and computationally efficient, but it is very sensitive to lighting variations and shadows. Moreover, it is based on the assumption that the wheel has a different gray level than the terrain, which implies previous knowledge of the soil appearance characteristics. Conversely, our method does not require any a priori information about the environment, while preserving computational efficiency.

This paper is structured as follows. Section II presents the visual odometry algorithm. Section III introduces the
Figure 1. Block diagram of the visual odometry algorithm using two consecutive image pairs corresponding to time t1 and t2.
VSE module and the wheel sinkage model. In Section IV, detailed experimental results and discussions are provided for both methods. Section V concludes the paper.

II. VISUAL ODOMETRY USING ITERATIVE CLOSEST POINT
An algorithm for real-time 6DoF ego-motion estimation is presented, which enables a mobile robot to self-localize using only the data acquired by a stereo head mounted on board. The method employs image intensity information for feature tracking and initial motion estimation, and Iterative Closest Point (ICP) [13], [25] for motion estimate refinement. In developing the algorithm, two basic problems of standard ICP were taken into account: its susceptibility to gross statistical outliers and its failure when dealing with large displacements. As an extension of these issues, another drawback of ICP was addressed, namely its inability to perform the segmentation of input data points: if data points from two shapes are intermixed and matched against the individual shapes, registration fails [13]. These limitations are intrinsic to the basic ICP concept and become particularly restrictive for robot self-localization and navigation purposes, since, as the sensor moves, different parts of the scene become occluded and, conversely, new objects appear. Therefore, vast regions may be present in only one of two consecutive point clouds, and, if an outlier region is too close to a valid region, there is no possibility for ICP to perform a correct matching process [26].

The method presented in this work involves three main phases: feature detection, feature tracking, and motion estimation. Figure 1 shows the steps of the algorithm as a flow chart. Each step is detailed in the remainder of this section. Results obtained for a test case are also shown as an example of the proposed approach.
A. Feature detection

The algorithm starts by acquiring a stereo pair and generating a dense disparity map to obtain 3D points. The SRI Stereo Engine algorithm is employed [32]. It consists of an area-correlation-based matching process, followed by a post-filtering operation that uses a combination of a confidence filter and a left/right check to reject areas with insufficient texture, where bad matches are very likely to appear. The Shi-Tomasi feature detector [33] is then applied to the left image of the stereo frame to select interest points. Only points whose associated 3D point has a high stereo-confidence level are retained for further processing. In the end, two point clouds are available for each stereo pair: the pixel point cloud and its associated 3D point cloud.
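As an illustration of this step, the sketch below pairs Shi-Tomasi corners with high-confidence 3D points. It uses OpenCV for corner detection and assumes that a dense disparity-derived 3D map and a stereo-confidence mask are already available; the authors use the SRI Stereo Engine [32], so the function and variable names here are ours and purely illustrative.

```cpp
// Sketch of the feature detection step (illustrative, not the authors' code).
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

struct Feature {
    cv::Point2f pixel;    // location in the left image
    cv::Point3f point3d;  // associated 3D point from stereo
};

// Select Shi-Tomasi corners in the left image and keep only those whose
// stereo data passed the confidence and left/right checks (mask != 0).
std::vector<Feature> detectFeatures(const cv::Mat& leftGray,
                                    const cv::Mat& points3d,   // CV_32FC3 map of 3D points
                                    const cv::Mat& validMask)  // CV_8U, non-zero where stereo is reliable
{
    std::vector<cv::Point2f> corners;
    cv::goodFeaturesToTrack(leftGray, corners,
                            200,    // maximum number of corners (illustrative value)
                            0.01,   // Shi-Tomasi quality level
                            10);    // minimum distance between corners [px]

    std::vector<Feature> features;
    for (const cv::Point2f& c : corners) {
        cv::Point ip(cvRound(c.x), cvRound(c.y));
        if (validMask.at<uchar>(ip) == 0)
            continue;                          // reject low-confidence stereo points
        cv::Vec3f p = points3d.at<cv::Vec3f>(ip);
        features.push_back({c, cv::Point3f(p[0], p[1], p[2])});
    }
    return features;  // pixel point cloud + associated 3D point cloud
}
```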
B. Feature tracking

The tracking of visual landmarks between consecutive frames is performed using an algorithm based on Normalized Cross-Correlation (NCC). NCC allows determining the degree of similarity between two image portions f and w of dimension L × K by means of the coefficient C defined as

C = \frac{\sum_{x=0}^{L-1}\sum_{y=0}^{K-1}\left(w(x,y)-\bar{w}\right)\left(f(x,y)-\bar{f}\right)}{\left[\sum_{x=0}^{L-1}\sum_{y=0}^{K-1}\left(w(x,y)-\bar{w}\right)^{2}\right]^{1/2}\left[\sum_{x=0}^{L-1}\sum_{y=0}^{K-1}\left(f(x,y)-\bar{f}\right)^{2}\right]^{1/2}} \qquad (1)
where (x, y) are the coordinates of an image point, f(x, y) and w(x, y) are the intensity values of f and w at the point (x, y), and \bar{f} and \bar{w} are the average intensities of f and w. C ranges between 0 and 1; the greater the value of C, the greater the similarity between f and w [34].
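For reference, a direct implementation of the coefficient in Eq. (1) for two equally sized patches might look as follows; this is a minimal sketch, and in practice OpenCV's cv::matchTemplate with the TM_CCOEFF_NORMED option computes the same mean-subtracted normalized correlation over a whole search region.

```cpp
// Normalized cross-correlation coefficient C of Eq. (1) for two L x K
// patches stored row-major as 8-bit intensities.
#include <cmath>
#include <cstdint>
#include <vector>

double nccCoefficient(const std::vector<std::uint8_t>& f,
                      const std::vector<std::uint8_t>& w,
                      int L, int K)
{
    const int n = L * K;
    double meanF = 0.0, meanW = 0.0;
    for (int i = 0; i < n; ++i) { meanF += f[i]; meanW += w[i]; }
    meanF /= n;  meanW /= n;

    double num = 0.0, denF = 0.0, denW = 0.0;
    for (int i = 0; i < n; ++i) {
        const double df = f[i] - meanF;
        const double dw = w[i] - meanW;
        num  += dw * df;   // numerator of Eq. (1)
        denF += df * df;   // sum of squared deviations of f
        denW += dw * dw;   // sum of squared deviations of w
    }
    const double den = std::sqrt(denW) * std::sqrt(denF);
    return (den > 0.0) ? num / den : 0.0;  // flat patches carry no information
}
```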
Based on this criterion, corresponding points are established according to the following procedure. Let us denote by {L1} and {L2} the visual landmarks detected in two subsequent images Il,1 and Il,2 acquired by the left camera of the stereo device at times t1 and t2, respectively. Each point in {L1} is paired with the point in {L2} that generates the maximum value of the coefficient C in a 5 × 5 pixel window centered at the point. To speed up and improve the search, only features within a certain pixel distance from each other are matched. A minimum value for the cross-correlation coefficient is also imposed.

False matches are then rejected using two strategies: mutual consistency check and robust statistics. The former consists of applying the cross-correlation-based pairing both from {L1} to {L2} and from {L2} to {L1}. Only pairs that mutually have each other as preferred mate are accepted as valid matches [16] and are stored together with their correlation value. A final selection is accomplished based on the median [28] and the standard deviation from the median of the computed correlation coefficients. Pairs whose correlation deviates from the median by more than two times the standard deviation from the median are rejected. This process brings two principal advantages: first of all, features which do not belong to both frames are discarded, i.e. the segmentation of the input data is performed; furthermore, a set of corresponding 3D points is selected which can be used in the successive motion estimation stage, providing an initial alignment [28].
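The rejection stage described above can be sketched as follows; the container layout and function names are illustrative assumptions, not the authors' code. bestMatch12 and score12 are assumed to hold, for each landmark of {L1}, the index of its best correlate in {L2} and the corresponding NCC value, and bestMatch21 the reverse mapping.

```cpp
// Mutual consistency check followed by the median / standard-deviation-from-median
// test on the correlation values (illustrative sketch).
#include <algorithm>
#include <cmath>
#include <vector>

struct Match { int i1; int i2; double ncc; };

std::vector<Match> rejectFalseMatches(const std::vector<int>& bestMatch12,
                                      const std::vector<double>& score12,
                                      const std::vector<int>& bestMatch21)
{
    // 1) Mutual consistency: keep a pair only if each point is the other's preferred mate.
    std::vector<Match> mutual;
    for (int i1 = 0; i1 < static_cast<int>(bestMatch12.size()); ++i1) {
        const int i2 = bestMatch12[i1];
        if (i2 >= 0 && bestMatch21[i2] == i1)
            mutual.push_back({i1, i2, score12[i1]});
    }
    if (mutual.size() < 3) return mutual;

    // 2) Robust statistics: median and standard deviation from the median
    //    of the retained correlation coefficients.
    std::vector<double> c;
    for (const Match& m : mutual) c.push_back(m.ncc);
    std::nth_element(c.begin(), c.begin() + c.size() / 2, c.end());
    const double median = c[c.size() / 2];

    double var = 0.0;
    for (const Match& m : mutual) var += (m.ncc - median) * (m.ncc - median);
    const double sigma = std::sqrt(var / mutual.size());

    // Reject pairs whose correlation deviates from the median by more than 2*sigma.
    std::vector<Match> accepted;
    for (const Match& m : mutual)
        if (std::fabs(m.ncc - median) <= 2.0 * sigma)
            accepted.push_back(m);
    return accepted;
}
```

In the sample case reported later in this section, a 100-pixel search radius and a 0.85 correlation threshold are applied before this rejection stage.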
C. Motion estimation

The problem of estimating the motion that the camera has undergone between two consecutive stereo acquisitions can be expressed as finding the 3D rotation matrix R and the translation vector t that minimize the mean-squares objective function

F(R,t) = \frac{1}{N}\sum_{i=1}^{N}\left\| R\,p_{1,i} + t - p_{2,i} \right\|^{2} \qquad (2)
where p_{1,i} and p_{2,i} denote two corresponding 3D points and N is the number of pairs. Motion estimation is performed, at first, using the point pairs established through cross-correlation, as described in the previous section. Then, ICP registration is applied to refine the motion estimate. The rejection scheme proposed by Zhang [25] is employed, which allows the maximum distance between corresponding points to be set adaptively using the statistics of the distances (i.e. mean value and standard deviation). The least-squares rotation and translation are computed using the dual-number quaternion method [35]. The process stops when the change in the motion estimate between two successive iterations is less than 1%.
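A compact sketch of this refinement stage is given below, using Eigen. Two caveats: the paper computes the least-squares rotation and translation with the dual-number-quaternion method [35], whereas the sketch uses the SVD-based solution, which minimizes the same objective (2); and the adaptive distance cut-off and the stopping test on the mean residual are simplified stand-ins for Zhang's rejection scheme [25] and the 1% criterion.

```cpp
// Minimal sketch of ICP-based motion refinement (illustrative assumptions noted above).
#include <Eigen/Dense>
#include <cmath>
#include <limits>
#include <vector>

struct Motion { Eigen::Matrix3d R = Eigen::Matrix3d::Identity();
                Eigen::Vector3d t = Eigen::Vector3d::Zero(); };

// Least-squares rigid transform mapping points P1 onto P2 (objective (2)).
Motion rigidTransform(const std::vector<Eigen::Vector3d>& P1,
                      const std::vector<Eigen::Vector3d>& P2)
{
    Eigen::Vector3d c1 = Eigen::Vector3d::Zero(), c2 = Eigen::Vector3d::Zero();
    for (size_t i = 0; i < P1.size(); ++i) { c1 += P1[i]; c2 += P2[i]; }
    c1 /= P1.size(); c2 /= P2.size();

    Eigen::Matrix3d H = Eigen::Matrix3d::Zero();
    for (size_t i = 0; i < P1.size(); ++i)
        H += (P1[i] - c1) * (P2[i] - c2).transpose();

    Eigen::JacobiSVD<Eigen::Matrix3d> svd(H, Eigen::ComputeFullU | Eigen::ComputeFullV);
    Eigen::Matrix3d R = svd.matrixV() * svd.matrixU().transpose();
    if (R.determinant() < 0) {                 // guard against reflections
        Eigen::Matrix3d V = svd.matrixV();
        V.col(2) *= -1.0;
        R = V * svd.matrixU().transpose();
    }
    Motion m; m.R = R; m.t = c2 - R * c1;
    return m;
}

// ICP refinement: start from the correlation-based estimate and iterate
// nearest-neighbour matching + rigid alignment until convergence.
Motion refineWithICP(const std::vector<Eigen::Vector3d>& cloud1,
                     const std::vector<Eigen::Vector3d>& cloud2,
                     Motion m, int maxIter = 30)
{
    double prevErr = std::numeric_limits<double>::max();
    for (int iter = 0; iter < maxIter; ++iter) {
        // Transform cloud1 with the current estimate.
        std::vector<Eigen::Vector3d> moved(cloud1.size());
        for (size_t i = 0; i < cloud1.size(); ++i) moved[i] = m.R * cloud1[i] + m.t;

        // Nearest-neighbour correspondences (brute force, for clarity).
        std::vector<double> d(moved.size());
        std::vector<int> nn(moved.size());
        double mean = 0.0;
        for (size_t i = 0; i < moved.size(); ++i) {
            double best = std::numeric_limits<double>::max(); int bestJ = 0;
            for (size_t j = 0; j < cloud2.size(); ++j) {
                const double dist = (moved[i] - cloud2[j]).norm();
                if (dist < best) { best = dist; bestJ = static_cast<int>(j); }
            }
            d[i] = best; nn[i] = bestJ; mean += best;
        }
        mean /= moved.size();
        double var = 0.0;
        for (double di : d) var += (di - mean) * (di - mean);
        const double thresh = mean + 2.0 * std::sqrt(var / d.size());  // adaptive cut-off

        // Re-estimate the motion from the retained pairs.
        std::vector<Eigen::Vector3d> P1, P2;
        double err = 0.0;
        for (size_t i = 0; i < moved.size(); ++i)
            if (d[i] <= thresh) { P1.push_back(cloud1[i]); P2.push_back(cloud2[nn[i]]); err += d[i]; }
        if (P1.size() < 3) break;
        err /= P1.size();
        m = rigidTransform(P1, P2);

        if (std::fabs(prevErr - err) < 0.01 * prevErr) break;  // change below 1%: stop
        prevErr = err;
    }
    return m;
}
```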
A sample case

Here, results obtained for a test case are reported as an example. In this experiment, the algorithm was applied to 320 × 240 px stereo images, after the camera rotated 10° around the pan axis (x). Figure 2(a) and 2(b) show the left frames of the two successive stereo pairs with the selected visual landmarks superimposed. Each feature has an associated 3D point.

Figure 2. Left images (a) before and (b) after rotation, with selected features superimposed.
Once the features in two consecutive stereo frames have been selected, the problem of finding corresponding points has to be solved. This is done using both pixel intensity and 3D stereo information. In Figure 3, the left image before rotation is shown along with the correspondences determined using intensity information only. Specifically, Figure 3(a) displays the correspondences after normalized cross-correlation-based pairing. Features at a maximum distance of 100 pixels were matched and a correlation threshold of 0.85 was set. Figure 3(b) reports the result of the rejection process based on mutual consistency check and robust statistics, showing a reduction of false matches of about 30%. False matches (about 20% of the total matches) are still present, which indicates the need for a refinement process. Nevertheless, the correspondences established from pixel intensity information can be employed to obtain an approximate motion estimate. Stereo data are then used by applying ICP. The final pairs are plotted in the image plane in Figure 4(a). After seven iterations (see Figure 4(b)-(c)), the absolute position error remains stable at 0.78 cm along the pan axis (x), 2.8 cm along the tilt axis (y), and 0.68 cm along the swing axis (z), while the absolute error in rotation is 1.10°, 0.39°, and 0.04° for the pan, tilt, and swing angles, respectively. In Figure 5(a) and 5(b), the final selected 3D pairs are displayed before and after registration. Figure 6, instead, reports the results obtained by applying ICP directly to the 3D point clouds, without previous processing. Evidently, no good motion estimate would be achieved.
Figure 3. Point pairs (a) after normalized cross-correlation-based tracking, and (b) after false-match rejection using mutual consistency check and robust statistics.
Figure 4. Result of Iterative Closest Point (ICP): (a) final correspondences plotted in the image plane; (b)-(c) absolute position and orientation errors during iteration.
Figure 5. Final pairs in 3D space (a) before and (b) after registration, using correlation and ICP. At the end, the red square points overlap the corresponding black round points.
Figure 6. Final point pairs estimated by applying ICP directly to 3D points, (a) re-projected onto the image plane and (b) in 3D space. In (b), black arrows indicate the positions of the red square points after ICP registration. Evidently, no good motion estimate would be achieved.
III. SINKAGE ESTIMATION

In this section, we present a theoretical analysis of wheel sinkage on soft terrain and we explain our approach for visual sinkage estimation.

A. Theoretical Analysis

A driven rigid wheel rolling on sandy soil (see Figure 7) undergoes a certain amount of sinkage z depending on the vertical load W acting on the wheel and the wheel slip i, defined as

i = 1 - \frac{V}{\omega R} \qquad (3)

with V the linear speed of the wheel, ω the angular rate of the wheel, and R the radius of the wheel.

Figure 7. Wheel-soil interaction model (adapted from [12]).

A semi-empirical model of sinkage z was proposed by Bekker [12], according to

z = z_s + z_j \qquad (4)

where z_s is the sinkage due to the static load only and z_j is the counterpart due to slip. z_s can be estimated as

z_s = \left(\frac{\sigma}{\frac{k_c}{b} + k_\varphi}\right)^{\frac{1}{n}} \qquad (5)

where k_c is the cohesive modulus of terrain deformation, k_φ the frictional modulus of terrain deformation, n the exponent of terrain deformation, σ the normal stress at the wheel-terrain interface, and b the wheel width. The slip contribution z_j can be evaluated from

\frac{\tau_{max}}{\sigma - p'_{crit}} = \frac{j}{z_j} \qquad (\sigma > p'_{crit}) \qquad (6)

where τ_max is the maximum shear stress that a given terrain can bear according to the Coulomb-Mohr soil failure criterion

\tau_{max} = c + \sigma_{max}\tan\varphi \qquad (7)

with c the cohesion of the soil, φ the internal friction angle of the soil, and σ_max the maximum normal stress at the wheel-terrain interface, and where p'_{crit} is the “Terzaghi bearing capacity” [36], given by

p'_{crit} = c\,N_c + \gamma\left(N_q\left(z_s + z_j\right) + 0.5\,b\,N_\gamma\right) \qquad (8)

where N_c, N_q, and N_γ are constants and γ is the density of the soil. The parameter j is the shear displacement, which is related to the wheel slip i and the angle θ by

j(\theta) = R\left[\theta_1 - \theta - (1 - i)\left(\sin\theta_1 - \sin\theta\right)\right] \qquad (9)

where θ_1 is the so-called wheel entry angle or contact angle (see Figure 7). The accuracy of the sinkage model depends on the accuracy of many empirically determined constants.
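A minimal numerical sketch of Eqs. (4)-(8) is given below, assuming consistent units for all inputs. The fixed-point iteration used to resolve the coupling between z_j and p'_crit (z_j appears on both sides through Eq. (8)) is our own illustrative choice and is not prescribed by the paper.

```cpp
// Hedged numerical sketch of the semi-empirical sinkage model of Section III.A.
#include <cmath>

struct SoilParams {
    double kc, kphi, n;     // pressure-sinkage moduli and exponent (Eq. (5))
    double c, phi;          // cohesion and internal friction angle [rad]
    double gamma;           // soil density term used in Eq. (8)
    double Nc, Nq, Ngamma;  // bearing-capacity constants
};

// Static sinkage, Eq. (5): z_s = ( sigma / (kc/b + kphi) )^(1/n)
double staticSinkage(double sigma, double b, const SoilParams& s)
{
    return std::pow(sigma / (s.kc / b + s.kphi), 1.0 / s.n);
}

// Slip sinkage from Eqs. (6)-(8), solved by fixed-point iteration on z_j.
double slipSinkage(double sigma, double sigmaMax, double j,
                   double b, double zs, const SoilParams& s)
{
    const double tauMax = s.c + sigmaMax * std::tan(s.phi);      // Eq. (7)
    double zj = 0.0;
    for (int k = 0; k < 50; ++k) {
        const double pcrit = s.c * s.Nc
            + s.gamma * (s.Nq * (zs + zj) + 0.5 * b * s.Ngamma); // Eq. (8)
        if (sigma <= pcrit) return 0.0;  // below bearing capacity: no slip sinkage
        zj = j * (sigma - pcrit) / tauMax;                       // Eq. (6) solved for z_j
    }
    return zj;
}

// Total sinkage, Eq. (4).
double totalSinkage(double sigma, double sigmaMax, double j,
                    double b, const SoilParams& s)
{
    const double zs = staticSinkage(sigma, b, s);
    return zs + slipSinkage(sigma, sigmaMax, j, b, zs, s);
}
```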
TABLE I. SAND PARAMETERS USED FOR SIMULATIONS

  ϕ [deg]               30
  c [kPa]               1.0
  kc [kN/m^(n+1)]       0.1
  kφ [kN/m^(n+2)]       55
  b [m]                 0.09
  R [m]                 0.08
  γ [kg/m^3]            1633
  Nq                    0.48
  Nc                    0
  Nγ                    10
  n                     1
Figure 9. Rigid wheel sinking into deformable terrain.
Based on traditional soil parameters for sand [12] and tuning the remaining constants using experimental data (see Table I), we created Figure 8, which shows the relationship between wheel slip and total sinkage for three different values of vertical load.

B. Visual Sinkage Estimation

In order to estimate wheel sinkage, we developed the VSE module using a camera attached to the vehicle body. We assume that the location of the wheel relative to the camera is known and fixed during the vehicle's travel. The sinkage z can be evaluated by estimating the contact angle θ1 between the wheel and the terrain (see Figure 9) using the geometrical relationship

z = R\left(1 - \cos\theta_1\right) \qquad (10)
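As a purely illustrative check of Eq. (10): for the 16-cm-diameter wheel of the test bed described in Section IV (R = 0.08 m), a hypothetical contact angle θ1 = 30° would correspond to

z = 0.08 \cdot \left(1 - \cos 30^\circ\right) \approx 0.011 \text{ m} \approx 1.1 \text{ cm}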
The VSE algorithm requires a pattern of equally spaced, 1-mm-thick concentric black circumferences on a white background, attached to the wheel, in order to determine θ1 using an edge-detection-based strategy.
Figure 8. Total sinkage as a function of wheel slip for various ground pressures.
This approach allows algorithmic simplicity and computational efficiency, providing fast, real-time measurements. In practice, the VSE operates by identifying the radial lines of the wheel along which the number of detected edges is less than that expected when the wheel rolls without sinkage. Those lines can be associated with the part of the wheel obscured by terrain and thus with sinkage.

The VSE consists of the following steps: Region of Interest (ROI) identification, pixel intensity computation, and contact angle estimation. Each step is discussed in detail in the remainder of this section.

ROI Identification – In order to estimate the contact angle θ1, the annular region along the wheel rim that includes the circumference pattern is the only image area that needs to be examined. Thus, ROI identification is performed first, reducing the computational time and improving accuracy. Given the position of the wheel center relative to the camera and the geometry of the wheel, the ROI can be detected using simple geometric projections.

Pixel Intensity Computation – A pixel intensity analysis is performed along radial lines spanning the selected ROI with an angular resolution of 1°. A typical intensity plot along a radial line is reported in Figure 10 for a test on sand. The VSE differentiates between a so-called “wheel region”, where the wheel is not obscured by terrain, and a “soil region” (“sand region” in Figure 10), where the soil is covering the wheel. The wheel region is characterized by high intensity variations that can be classified as “edges”, while the soil region shows an almost uniform intensity value. Edges are detected based on three factors [37]: contrast, the difference between the average intensity value of the pixels before the edge and the average intensity value of the pixels after the edge; steepness, the number of pixels that constitute the edge; and filter width, the number of pixels used for estimating the average intensity values. These factors were determined by analyzing a typical line intensity profile.

Figure 10. Sample diagram of pixel intensity along a radial line.
An adaptive threshold for selecting the appropriate edge intensity contrast along each radial line of inspection was experimentally determined as

C = \frac{L_{Max} - L_{Min}}{2} \qquad (11)
where L_{Max} and L_{Min} are the maximum and the minimum intensities measured along a given line. Filtering is applied to reduce noise and small-scale changes in intensity due to reflections, pebbles, etc.

Contact Angle Estimation – The contact angle θ1 is computed as the wheel angle at which the transition between the wheel region and the soil region occurs. Pixel information is converted into metric information using the camera parameters previously obtained by calibration.
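The sketch below summarizes how such a contact-angle search can be organized; the sampling of radial profiles, the helper names, and the simple contrast test are simplified assumptions rather than the authors' implementation.

```cpp
// Illustrative sketch of the VSE contact-angle search: count pattern edges on
// each radial line using the adaptive contrast threshold of Eq. (11), attribute
// lines with fewer edges than expected to the soil region, and convert the
// resulting contact angle to sinkage with Eq. (10).
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Intensity samples along one radial line of inspection (innermost to outermost).
using RadialProfile = std::vector<std::uint8_t>;

int countEdges(const RadialProfile& line, int filterWidth = 3)
{
    auto mm = std::minmax_element(line.begin(), line.end());
    const double C = (*mm.second - *mm.first) / 2.0;   // adaptive contrast, Eq. (11)
    int edges = 0;
    for (std::size_t k = filterWidth; k + filterWidth < line.size(); ++k) {
        double before = 0.0, after = 0.0;
        for (int m = 1; m <= filterWidth; ++m) { before += line[k - m]; after += line[k + m]; }
        // Contrast between the average intensities on either side of sample k.
        if (std::fabs(after - before) / filterWidth > C) ++edges;
    }
    return edges;
}

// profiles[a] is the radial profile at wheel angle a (degrees, 1 deg resolution),
// swept upwards from the lowest point of the rim. Returns sinkage per Eq. (10).
double estimateSinkage(const std::vector<RadialProfile>& profiles,
                       int expectedEdges, double wheelRadius)
{
    const double kPi = 3.14159265358979323846;
    int contactAngleDeg = 0;
    for (std::size_t a = 0; a < profiles.size(); ++a) {
        if (countEdges(profiles[a]) < expectedEdges)
            contactAngleDeg = static_cast<int>(a) + 1;  // line still obscured by soil
        else
            break;                                       // wheel region reached
    }
    const double theta1 = contactAngleDeg * kPi / 180.0;
    return wheelRadius * (1.0 - std::cos(theta1));       // Eq. (10)
}
```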
IV. EXPERIMENTAL RESULTS

A comprehensive set of experiments was performed to validate the proposed methods. The effectiveness of the visual odometry algorithm was tested using an all-terrain mobile robot, while a wheel sinkage test bed was employed to experimentally examine the VSE module and validate the sinkage model.

A. Visual Odometry using Iterative Closest Point

The visual odometry algorithm was validated using the Shrimp robot, equipped with a Videre Design stereo head, as shown in Figure 11. The Shrimp is an off-road rover characterized by a passive, non-hyperstatic structure, which makes it able to adapt to a large range of obstacles. It has six motorized wheels and is composed of four main parts: the body, the articulated front fork, and the two side bogies. More details can be found in [15].

Figure 11. The Shrimp mobile robot equipped with a Videre Design stereo head (adapted from [15]).

Several experiments were performed in order to test the effectiveness of the method under different motion conditions and environments. Here, results of three different tests are presented. In the first test, the robot was remotely controlled on flat carpet in a typical office-like environment (Figure 12(a)). The other two tests were performed on a rocky surface (Figure 12(b)). In all the experiments, the robot was driven at 6 cm/s. 3D information is referred to a reference frame attached to the chassis of the robot, as shown in Figure 11. The algorithms were implemented in C++. Note that both Figure 12(a) and 12(b) show, as white squares, the points that the algorithm uses as starting tracking features.

Figure 12. (a) Indoor test environment and (b) rocky soil with selected features superimposed.
Experiments on a flat surface

The ability of the system to reach a target position was evaluated by guiding the robot through an L-shaped path of about 1780 (x) × 2200 (y) mm to a predefined location. Five runs were executed. For each run, the i-th position errors (e^i_{px}, e^i_{py}), i = 1, 2, …, n (n = 5), were computed as
e^i_{px} = p_{Tx} - p^i_{ex}, \qquad e^i_{py} = p_{Ty} - p^i_{ey} \qquad (12)
where [p_{Tx}, p_{Ty}] denotes the position of the target and [p^i_{ex}, p^i_{ey}] is the estimated final position of the robot at the i-th run. The mean errors and standard deviations were 3.4 ± 3.8 cm along x and 5.3 ± 4.0 cm along y. An average absolute position error (E_{px}, E_{py}) along the x and y directions was defined as
E_{px} = \frac{1}{n}\sum_{i=1}^{n}\left| e^i_{px} \right|, \qquad E_{py} = \frac{1}{n}\sum_{i=1}^{n}\left| e^i_{py} \right| \qquad (13)
Lastly, the average position error was computed as

E_p = \sqrt{E_{px}^{2} + E_{py}^{2}} \qquad (14)
Figure 13(a) and 13(b) show, respectively, the estimated path and the variation of the yaw angle during one typical run. The position errors computed for the five tests are reported in Figure 13(c), corresponding to an average position error E_p of 6.3 cm.

Figure 13. Tests in an indoor environment. L-shaped path: (a) robot trajectory; (b) yaw angle variation; (c) position errors.

Experiments on a simulated rocky soil

Two different tests were performed with the robot moving on a rocky surface. In the first test, after a forward displacement, the robot was guided to climb a ramp with an inclination of about 12° to reach a target position located approximately at a distance of 2200 mm along y and at a height of 200 mm above the initial position of the robot. The test was repeated five times. Figure 14(a) and 14(b) show, respectively, the trajectory in the (y-z) plane and the pitch angle variation during one run. Position errors were estimated according to (12), (13), and (14), referred to the (y-z) plane, and are shown in Figure 14(c). The mean errors and standard deviations were 5.3 ± 4.2 cm along y and 1.2 ± 1.0 cm along z. The computed average position error E_p was 5.5 cm.

Figure 14. Tests on rocky soil. Ramp-like path: (a) displacement in the y-z plane; (b) pitch angle variation; (c) position errors.

In the second test, the robot was guided to overcome two consecutive steps of 50 mm and 100 mm, at first moving forward for about 1100 mm and then backward to the start position. In this test, the variations of all six degrees of freedom of the vehicle can be clearly observed, as shown in Figure 15(a) and 15(b), which represent, respectively, the estimated 3D positions and the Euler angles during one test. Ten runs were executed. In each run, the robot started at a marked location and was driven back to the same location. The discrepancy between the actual robot position and the estimated position is the so-called Return Position Error (RPE) [4]. The absolute RPEs for each run are reported in Figure 15(c). The following mean absolute RPEs and corresponding standard deviations were computed: 2.6 ± 3.3 cm (x), 3.1 ± 2.8 cm (y), 6.2 ± 3.8 cm (z). Taking into account all the error components, i.e. the errors along x, y, and z, a total average position error E_p of 7.4 cm was obtained.

Figure 15. Tests on rocky soil. Double-step trajectory: (a) robot positions; (b) Euler angles; (c) position errors.
B. Sinkage Estimation

Validation of the VSE module

The performance of the VSE module was evaluated using the test bed shown in Figure 16. It consists of a driven 16-cm-diameter wheel mounted on an undriven vertical axis. A low-cost wireless single-channel analog camera is attached to the wheel with a field of view containing the wheel-terrain interface.

Figure 16. The test bed for wheel sinkage estimation.
The actual sinkage of the wheel can be estimated from a potentiometer mounted on the vertical axis of the system. Tests were performed under different operating conditions, including non-flat terrain, variable lighting conditions, and terrain with and without rocks.
Representative results are shown in Figure 17(a) for a set of sample images with different sinkage levels. The error was always less than 13%. No misidentifications due to reflections or shadowing were detected in any of the experiments.
These tests prove that the VSE is able to provide real-time estimation of wheel sinkage with minimal computational requirements and a sampling rate of 5 Hz. The algorithm also proved to be very robust against variations in lighting conditions: Figure 17(b) shows that the VSE continues to work accurately even for a lighting reduction of as much as 90% of the optimal value (L = 0.9).

Figure 17. (a) Visual estimation of wheel sinkage; (b) influence of lighting variations on the VSE module.

Experimental validation of the sinkage model

The results obtained from the VSE module were compared with the sinkage model presented in Section III.A for a typical run on soft sand under uniform lighting, with the wheel starting from a standing condition. In this experiment, the wheel was subjected to a ground pressure of p = 13.8 kPa and a slip of i = 0.4. The results are shown in Figure 18. The gray solid line is the sinkage as derived by the VSE module; the black line is the same signal smoothed with a Kalman filter to compensate for measurement uncertainty. The gray dotted line shows the sinkage value predicted by the model for the same values of p and i. The discrepancy between the experimentally determined sinkage at steady state and the calculated sinkage is less than 5%, showing the effectiveness of the model in describing the sinkage phenomenon.

Figure 18. Comparison between the measured and calculated total sinkage.

V. CONCLUSIONS
In this paper, two novel vision-based methods for rough-terrain mobile robots were described. First, a stereovision algorithm for real-time 6DoF ego-motion estimation was presented. It integrates image intensity and 3D stereo information in the well-known Iterative Closest Point (ICP) framework, overcoming two basic problems of standard ICP, i.e. its failure in the presence of large displacements and its inability to segment input data. Experimental tests with an all-terrain rover showed this algorithm to be effective for vehicle self-localization in unstructured environments. Subsequently, an innovative method for wheel sinkage estimation was proposed and experimentally tested on a single-wheel test bed, proving to be computationally efficient, relatively accurate with maximum errors below 13%, and very robust to disturbances and variations in lighting conditions. The visual sinkage estimation algorithm also allowed the experimental validation of a semi-empirical model proposed to predict the behavior of the wheel on sandy terrain, which showed good agreement with the experiments. The methods described in this paper can be used to gain important information about the vehicle state and its interaction with the soil, improving the localization accuracy and traction control of rough-terrain autonomous vehicles.
REFERENCES

[1] E. Tunstel and A. Howard, “Sensing and perception challenges of planetary surface robotics,” Proceedings of IEEE Sensors, Orlando, Florida, USA, 2002.
[2] J. Borenstein, B. Everett, and L. Feng, Navigating Mobile Robots: Systems and Techniques, A. K. Peters, Ltd., Wellesley, MA, ISBN 1-56881-058-X, 1996.
[3] P. Lamon and R. Siegwart, “3D-Odometry for rough terrain – Towards real 3D navigation,” Proceedings of the International Conference on Robotics and Automation, ICRA'03, Taipei, Taiwan, 2003.
[4] L. Ojeda, G. Reina, and J. Borenstein, “Experimental results from FLEXNAV: an expert rule-based dead-reckoning system for Mars rovers,” IEEE Aerospace Conference, Big Sky, MT, USA, 2004.
[5] G. Reina, “Rough Terrain Mobile Robot Localization and Traversability with Applications to Planetary Explorations,” PhD Thesis, Politecnico of Bari, Italy, 2004.
[6] S.I. Roumeliotis, A.E. Johnson, and J.F. Montgomery, “Augmenting inertial navigation with image-based motion estimation,” Proceedings of the 2002 IEEE International Conference on Robotics & Automation, Washington, 2002, pp. 4326-4333.
[7] C. Olson, L. Matthies, M. Schoppers, and M. Maimone, “Robust stereo ego-motion for long distance navigation,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2000, pp. 453-458.
[8] D. Helmick, S.I. Roumeliotis, Y. Cheng, D. Clouse, M. Bajracharya, and L. Matthies, “Slip compensation for a Mars rover,” in Proc. 2005 IEEE International Conference on Intelligent Robots and Systems, Edmonton, Canada, Aug. 2-6, 2005, pp. 1419-1426.
[9] L.H. Matthies, Dynamic Stereo Vision, PhD thesis, Carnegie Mellon University, 1989.
[10] T.L. Huntsberger, H. Aghazarian, Y. Cheng, E.T. Baumgartner, E. Tunstel, C. Leger, A. Trebi-Ollennu, and P.S. Schenker, “Rover autonomy for long range navigation and science data acquisition on planetary surfaces,” Proceedings of the International Conference on Robotics and Automation, Washington, DC, 2002.
[11] G. Reina, L. Ojeda, A. Milella, and J. Borenstein, “Wheel slippage and sinkage detection for planetary rovers,” IEEE/ASME Transactions on Mechatronics, vol. 11, no. 2, April 2006, pp. 185-195.
[12] M.G. Bekker, Off-Road Locomotion, The University of Michigan Press, Ann Arbor, MI, 1960.
[13] P.J. Besl and N.D. McKay, “A method for registration of 3-D shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, February 1992, pp. 239-256.
[14] H. Surmann, A. Nüchter, and J. Hertzberg, “An autonomous mobile robot with a 3D laser range finder for 3D exploration and digitalization of indoor environments,” Journal of Robotics and Autonomous Systems, vol. 45, 2003, pp. 181-198.
[15] R. Siegwart, P. Lamon, T. Estier, M. Lauria, and R. Piguet, “Innovative design for wheeled locomotion in rough terrain,” Journal of Robotics and Autonomous Systems, Elsevier, vol. 40/2-3, pp. 151-162, 2002.
[16] D. Nistér, O. Naroditsky, and J. Bergen, “Visual odometry,” Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, 2004, pp. 652-659.
[17] A.J. Davison, “Real-time simultaneous localization and mapping with a single camera,” IEEE Int. Conf. on Computer Vision, Nice, 2003, pp. 1403-1410.
[18] P.I. Corke, D. Strelow, and S. Singh, “Omnidirectional visual odometry for a planetary rover,” Proceedings of IROS 2004, Japan, 2004.
[19] M. Dunbabin, K. Usher, and P. Corke, “Visual motion estimation for an autonomous underwater reef monitoring robot,” Field and Service Robotics Conference (FSR 2005), Port Douglas, Qld., 2005, pp. 57-68.
[20] A. Mallet, S. Lacroix, and L. Gallo, “Position estimation in outdoor environments using pixel tracking and stereovision,” IEEE Int. Conf. on Robotics and Automation, San Francisco, CA, USA, 2000, pp. 3519-3524.
[21] D. Nistér, “Preemptive RANSAC for live structure and motion estimation,” IEEE International Conference on Computer Vision, Nice, 2003, pp. 199-206.
[22] A. Benedetti and P. Perona, “Real-time 2-D feature detection on a reconfigurable computer,” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Santa Barbara, CA, USA, 1998.
[23] J. Diebel, K. Reuterswärd, S. Thrun, J. Davis, and R. Gupta, “Simultaneous localization and mapping with active stereo vision,” IEEE/RSJ Conf. on Intelligent Robots and Systems (IROS), Japan, 2004.
[24] S. Rusinkiewicz and M. Levoy, “Efficient variants of the ICP algorithm,” Proceedings of IEEE 3DIM, Canada, 2001, pp. 145-152.
[25] Z. Zhang, “Iterative point matching for registration of free-form curves,” INRIA Rapports de Recherche N° 1658, Programme 4: Robotique, Image et Vision, 1992.
[26] I.A.D. Nesnas, M. Bajaracharya, R. Madison, E. Bandari, C. Kunz, M. Deans, and M. Bualat, “Visual target tracking for rover-based planetary exploration,” Proceedings of the 2004 IEEE Aerospace Conference, Big Sky, Montana, 2004.
[27] A. Milella, Vision-Based Methods for Autonomous Mobile Robots, PhD Thesis, Politecnico of Bari, Italy, 2006.
[28] M.A. Garcia and A. Solanas, “3D simultaneous localization and modeling from stereo vision,” Proceedings of the 2004 IEEE International Conference on Robotics & Automation, New Orleans, LA, 2004, pp. 847-853.
[29] J.M. Sáez and F. Escolano, “A global 3D map-building approach using stereo vision,” Proceedings of the 2004 IEEE International Conference on Robotics and Automation, 2004, pp. 1197-1202.
[30] S. Weik, “Registration of 3-D partial surface models using luminance and depth information,” Proceedings of the First International Conference on Recent Advances in 3-D Digital Imaging and Modeling, 1997.
[31] K. Iagnemma, C. Brooks, and S. Dubowsky, “Visual, tactile, and vibration-based terrain analysis for planetary rovers,” in Proc. of the IEEE Aerospace Conf., Big Sky, MT, USA, 2004.
[32] K. Konolige, “Small vision systems: hardware and implementation,” 8th International Symposium on Robotics Research, Japan, 1997.
[33] J. Shi and C. Tomasi, “Good features to track,” IEEE Conference on Computer Vision and Pattern Recognition, CA, 1994, pp. 593-600.
[34] R. Gonzalez and R. Woods, Digital Image Processing, Prentice Hall, 2nd Edition.
[35] M.W. Walker, L. Shao, and R.A. Volz, “Estimating 3-D location parameters using dual number quaternions,” CVGIP: Image Understanding, vol. 54, 1991, pp. 358-367.
[36] K. Terzaghi, Theoretical Soil Mechanics, Wiley, New York, NY, 1942.
[37] National Instruments, IMAQ Vision Concepts Manual. [Online]. Available: http://www.ni.com/

Biography

Annalisa Milella received the Laurea (summa cum laude) and the Research Doctorate degrees from the Politecnico of Bari, Bari, Italy, in 2002 and 2006, respectively, both in mechanical engineering. In 2005, she was a visiting scholar at the EPFL Autonomous Systems Laboratory. Her research interests include autonomous vehicles and computer vision systems. Currently, she is with the Institute of Intelligent Systems for Automation (ISSIA), Italian National Research Council (CNR) of Bari, Italy.

Giulio Reina received the Laurea degree and the Research Doctorate degree from the Politecnico of Bari, Italy, in 2000 and 2004, respectively, both in Mechanical Engineering. From 2002 to 2003, he worked at the University of Michigan Mobile
Robotics Laboratory as a Visiting Scholar. Currently, he is an Assistant Professor in Applied Mechanics with the Department of Innovation Engineering of the University of Lecce, Lecce, Italy. His research interests include ground autonomous vehicles, mobility and localization on rough terrain, and agricultural robotics.

Roland Siegwart has been a full professor for autonomous systems at ETH Zurich since July 2006. He holds a Diploma in Mechanical Engineering (1983) and a PhD in Mechatronics (1989) from ETH Zurich. In 1989/90 he spent one year as a postdoctoral fellow at Stanford University. After that he worked part time as R&D director at MECOS Traxler AG and as lecturer and deputy head at the Institute of Robotics, ETH Zürich. In 1996 he was appointed as associate and later full professor for autonomous microsystems and robots at the Ecole Polytechnique Fédérale de Lausanne (EPFL). In 2005 he held a visiting position at NASA Ames and Stanford University. He served as Vice President for Technical Activities (2004/05) and is currently Distinguished Lecturer (2006/07) of the IEEE Robotics and Automation Society. His research interests are in the design and navigation of autonomous robots operating in complex and highly dynamic environments.