Global Alignment of Sensor Positions With Noisy Motion Measurements

Hossein Madjidi and Shahriar Negahdaripour, Senior Member, IEEE
Abstract—We investigate the global alignment of a set of 3-D spatial points, given noisy measurements of the motion between pairs of nearby positions. A common application is to determine the trajectory of a mobile vision-based system from the scene images acquired along its track. Exploiting redundant measurements and the rigid body motion constraint as the observation model, we apply the mixed adjustment model paradigm to develop recursive estimation algorithms under various scenarios. Results of experiments are given to demonstrate the performance for different noise levels in the observations and the improvements in the position estimation. We also present the results of an experiment with underwater images in the construction of the camera platform trajectory.

Index Terms—Global alignment (GA), mixed adjustment model, sequential solution, stereovision, 3-D mapping, weighted parameters.
I. INTRODUCTION

THIS paper addresses a problem which has been investigated in the context of robot localization and environmental mapping from range or video data [3], [4], [12], [18], [20]. An important problem in the construction of large-area photomosaics and 3-D topographical maps is the ability to accurately determine the positions and poses (trajectory) of the sensor platform.1 In robotics, overlapping regions in range or video scans/images taken from different robot locations are registered to generate a map which is potentially used for navigation and localization. The underlying problems have also been explored in photogrammetry for decades [15] in the application of airborne systems for terrain mapping, but auxiliary information from the global positioning system (GPS), telemetry, and target landmarks is often a significant component of the mapping system. The common denominator in all of these applications is the redundancy of information due to the overlap in the scans/images acquired at nearby positions, providing the opportunity to arrive at a globally optimum solution through some estimation process. The key issue is how to exploit this redundancy and how to implement the global alignment (GA) strategy.
Manuscript received March 3, 2004; revised September 15, 2004. This paper was recommended for publication by Associate Editor F. Chaumette and Editor F. Park upon evaluation of the reviewers' comments. This work was supported by the National Science Foundation under Grant BES-9711528. The work of H. Madjidi was supported by a NASA Ames Summer Fellowship under Task 2002-IC-1-0117-0. The underwater data set was collected under Project CS-1333, supported by the Department of Defense Strategic Environmental Research and Development Program (SERDP).

The authors are with the Electrical and Computer Engineering Department, University of Miami, Coral Gables, FL 33146 USA (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TRO.2005.852257

1Since position estimation implicitly includes the concepts of trajectory estimation and trajectory reconstruction, it is referred to as trajectory estimation for the rest of the paper.
That is, how do we devise a suitable GA scheme that yields the smallest registration error over the entire data set?

A typical scenario can be stated as follows: Given some form of redundant measurements—often of a random nature with known uncertainties (variances)—tied to sensor positions, what are the optimum estimates of these positions? The word redundant plays a key role here, because the need for an optimal estimation process arises only when the number of equations in a system exceeds the number of unknowns. The selection of the estimation process is also an important step, which is governed mainly by the form of the equations relating the measurements to the unknown parameters.

The application of interest is automatic or autonomous navigation based on the visual cues in the images of one (monocular) or more (stereo) cameras. Here, we have noisy measurements of the motion from any one position to a nearby position, which generally can be determined from a variety of sources. For example, we can determine the six translational and rotational motion components from the optical flow in the images acquired from any two nearby positions. Though these parameters can be determined from monocular cues (up to a scale-factor ambiguity), they can be computed more readily from stereo images by exploiting the disparity at each viewing position. In many situations, the measurements can be expressed explicitly in terms of the unknown position parameters, but not always; the latter is exactly the scenario in the global-alignment problem investigated here. This calls for the application of a mixed adjustment model, for implicit observation equations, to determine the unknowns from all of the available measurements in a so-called globally optimal way. Moreover, as all of the observations (motion parameters) may not be available at the same time, a sequential solution is proposed to adapt the estimation scheme to this situation. In this case, we can update the previously estimated positions as new observations become available, instead of recalculating all of the positions again. Finally, it may become necessary to incorporate prior knowledge of some sensor positions when processing large data sets. In a variation, labeled the weighted parameters case [11], the problem is reformulated by treating this available information as a second group of observations, weighted by the associated uncertainty measures.

II. RELATED WORK

Sawhney et al. [18] study the alignment of images for video mosaicking based on a plane homography model. The method recursively performs local and global alignments to
come up with the optimum transformation parameters between all overlapping frames. For frame-to-frame (F2F) registration, they use the sum-of-squared-differences (SSD) error measure in a coarse-to-fine manner. Gracias and Santos-Victor [4] investigate the same problem for underwater imagery, but also construct the camera trajectory. The method is divided into three steps: 1) motion estimation between consecutive images to determine a rough trajectory; 2) determining from the trajectory the nonconsecutive, but overlapping, images, and finding the motion parameters between them by feature matching; 3) estimating the best pairwise motions and the world plane that fit the data globally. Unnikrishnan [22], [23] proposes a solution for mosaicking that is distortion-free in a local sense, and consistent globally. Again, the main idea is to identify images with overlap, find the common features between them, and use those features as constraints in a global optimization, while exploiting the additional constraints that come with cyclic or loop shapes. Odometry data can also be incorporated into his model. One advantage of this work is the use of structural properties of the problem, such as matrix sparsity, to speed up the estimation.

Unlike the earlier methods that deal with the alignment of images, others have addressed the alignment of viewer poses in the space of (robot) coordinate frames. Lu and Milios [12] study the problem of consistent registration of “range-scan” measurements from the multiple views of a land rover, to build the map of an environment. They form a network of constraints from a set of nodes and the links between them. A node is the 3-D pose vector (position and heading) of the robot. A link, connecting two nodes, is an observation which is obtained either by odometry or by pairwise matching of the range-finder data in adjacent frames. A maximum-likelihood algorithm, applied over the whole network of frames, is used to derive a pose estimate for each frame by minimizing the Mahalanobis distance between: 1) the world coordinates of common features in neighboring frames; and 2) the observed and the actual odometry readings. Sharp et al. [19] also consider the problem of multiview registration from range data, but deal with all six degrees of freedom in the representation of the pose at each view. First, the translation and rotation analyses are decoupled and carried out separately. Next, the methodology strongly exploits the existence of loops/cycles, where the accumulated angular error of a loop is distributed among the relative rotations of the views by minimizing the sum of squared angular errors between estimates and measurements. Estimation of the translations between consecutive views involves minimizing the sum of translations around each cycle. The formulation, as proposed, does not account for measurement uncertainties. ten Hagen and Kröse [20] also study the same problem as Lu and Milios [12], but use images from an omnidirectional camera (instead of a range scanner). The primary issue resolved is a reformulation to overcome the scale-factor ambiguity in the estimation of the distance between view positions from monocular cues. Gutmann and Konolige [5], [10] use the concept of consistent pose estimation of Lu and Milios [12], and attempt to overcome its drawbacks: sensitivity to the initial estimate, and growth of the computational time with an increasing number of poses. To deal with these issues, they introduce the concept of local registration and global correlation (LRGC).
Simultaneous localization and mapping (SLAM) is another relevant paradigm in robotics applications [17], [21], where the goal is to concurrently construct an environmental map and determine the robot's position as the data is acquired. Much of the work on SLAM uses sonar data for environments modeled by a 2-D map (e.g., hallways and rooms in a building), within an extended Kalman filter (EKF) framework.

Our method also addresses the global-alignment problem in the space of viewer coordinate frames; however, it does not require the dynamical model of an EKF for the prediction stage. It is most related in dimensionality to the work of Sharp et al. [19], as we also deal with sensor positions, with the relative motions between them defined as a 6-D vector. Our framework exploits the rigid body motion of the sensor platform directly to estimate the trajectory positions, using the mixed adjustment model. Furthermore, our least-squares (LS) formulation provides a natural mechanism to incorporate measurement uncertainties. Last but most significant, we do not exploit the existence/concept of loops, which is the fundamental basis for the optimization criteria in estimating the rotations and translations in [19]. Like most other work, we treat the trajectory positions as a network of nodes, whose connections define our measurements. More precisely, regardless of the sensor path during data acquisition, any two positions/nodes can provide a constraint in the optimization process.

The balance of the paper is organized as follows. In Section III, we formulate the problem and present various solutions that build on the mixed adjustment model. Section IV covers an application of the proposed framework in the context of 3-D topographical mapping from flyover imagery. In Section V, results are presented to evaluate the method, comprising both synthetic data and an experiment with underwater images. Finally, conclusions are drawn in Section VI.

III. GLOBAL ALIGNMENT OF SENSOR POSITIONS WITH NOISY MEASUREMENTS

A. Problem Statement

Consider the path of a mobile system, defined by a set of discrete sample points $P_1, \ldots, P_n$. Each point $P_i$ is characterized by the 3-D position vector $\mathbf{x}_i$. A measurement comprises the six translational and rotational parameters describing the rigid body transformation of the sensor platform between any two neighboring points $P_i$ and $P_j$. More generally, a noisy 6-D motion vector $\mathbf{l}_{ij} = (\boldsymbol{\omega}_{ij}^T, \mathbf{t}_{ij}^T)^T$ and its covariance matrix are assumed. One way to determine this information is from the motions of a number of image features or the optical flow, given images acquired at these two positions (this is the approach used in this paper). Alternatively, we can determine the transformation between the two coordinate systems at nearby positions, using two sets of 3-D measurements or two 3-D range maps described in these coordinate systems [7], [8].

The relationship between the motion parameters and the sensor positions $\mathbf{x}_i$ and $\mathbf{x}_j$ is governed by the constraint

$$\mathbf{x}_j = R_{ij}\,\mathbf{x}_i + \mathbf{t}_{ij} \qquad (1)$$

where the rotation matrix $R_{ij}$ can be computed from the rotation vector $\boldsymbol{\omega}_{ij}$ using the Rodrigues formula, and $\mathbf{t}_{ij}$ is the translation vector. We have one such constraint between any pair of trajectory points with known motion between them.
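For concreteness, the following minimal numpy sketch evaluates a constraint of this form; the function names are ours, and the residual convention $R_{ij}\mathbf{x}_i + \mathbf{t}_{ij} - \mathbf{x}_j$ reflects our reconstruction of (1) rather than code from the paper.

```python
import numpy as np

def rodrigues(omega):
    """Rotation matrix from a 3-D rotation vector (Rodrigues formula)."""
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return np.eye(3)                      # no rotation
    k = omega / theta                         # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])        # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def constraint_residual(x_i, x_j, omega_ij, t_ij):
    """Residual of the rigid-body constraint (1); zero for noise-free motion."""
    return rodrigues(omega_ij) @ x_i + t_ij - x_j
```

With noise-free motions the residual vanishes; with noisy measurements it is exactly the misclosure that the adjustment described below distributes over all constraints.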
The estimation problem is to determine the position vectors $\mathbf{x}_1, \ldots, \mathbf{x}_n$, which are the unknown parameters, knowing the motion parameters between a large number of nearby points, which are hereafter treated as noisy observations. With $n$ points, we have $3n$ unknowns, but we can write (1) for each pair of nearby points with known motion. Under the best conditions, with knowledge of the motions between all possible pairs of points, we have $n(n-1)/2$ such constraints. In practice, we do not know each pairwise motion, though we typically have many more constraints than unknowns. As stated previously, we can determine the motion based on sufficient overlap between the 2-D images or 3-D maps at the corresponding camera positions.

Since the motion measurements cannot be expressed explicitly in terms of the position vectors, the problem cannot be solved with standard LS techniques. However, it fits into the mixed adjustment model framework [9], [11], [24], which is generally characterized by a constraint of the form

$$f(\mathbf{l}, \mathbf{x}) = \mathbf{0} \qquad (2)$$

where $\mathbf{l}$ and $\mathbf{x}$ are the observation and unknown parameter vectors, respectively. The observations are treated as random variables with attached covariance matrix $\Sigma_l$. If we rewrite (1) in the form of (2), we have

$$f(\mathbf{l}, \mathbf{x}) = R_{ij}\,\mathbf{x}_i + \mathbf{t}_{ij} - \mathbf{x}_j = \mathbf{0}. \qquad (3)$$

For example, with four positions where all pairwise motions are known, the measurement and unknown parameter vectors are

$$\mathbf{l} = \begin{bmatrix} \mathbf{l}_{12}^T & \mathbf{l}_{13}^T & \mathbf{l}_{14}^T & \mathbf{l}_{23}^T & \mathbf{l}_{24}^T & \mathbf{l}_{34}^T \end{bmatrix}^T, \qquad \mathbf{x} = \begin{bmatrix} \mathbf{x}_1^T & \mathbf{x}_2^T & \mathbf{x}_3^T & \mathbf{x}_4^T \end{bmatrix}^T \qquad (4)$$

where $\mathbf{l}_{ij}$ denotes the 6-D motion measured between positions $i$ and $j$.

Having defined the observations and the unknown parameters, we can define a cost function to be minimized, taking into account our rigid body motion constraints. For example, we can directly minimize the sum of squared errors in the measurement equations. The main drawback here is that no constraint is guaranteed to be satisfied due to the noisy measurements, even though we minimize the total squared error over all the constraints. Alternatively, we allow deviations in our motion measurements in such a way that our rigid body motion constraints are satisfied. Using Lagrange multipliers to formulate an unconstrained optimization problem, we define the cost function

$$\Phi(\hat{\mathbf{l}}, \mathbf{x}, \mathbf{k}) = (\hat{\mathbf{l}} - \mathbf{l})^T \Sigma_l^{-1} (\hat{\mathbf{l}} - \mathbf{l}) + 2\,\mathbf{k}^T f(\hat{\mathbf{l}}, \mathbf{x}) \qquad (5)$$

where $\hat{\mathbf{l}}$ is the vector of estimated motion parameters and $\mathbf{k}$ is the Lagrange multiplier vector. The minimization of the cost function is carried out by the application of the mixed adjustment model, which is described in the next section. It turns out that the revised form of (3) based on linearized constraints is more suitable and computationally efficient in the employment of the mixed adjustment model, leading to closed-form expressions for the iterative update of the unknown parameters.2

2We are exploring formulations based on the original nonlinear constraint, in order to establish the tradeoff between accuracy and computational requirements.
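To make the bookkeeping in (4) concrete, here is a small sketch of the observation and parameter dimensions for the four-position example; the variable names are illustrative only.

```python
import itertools
import numpy as np

n = 4                                              # trajectory points P1..P4
pairs = list(itertools.combinations(range(n), 2))  # all pairwise motions known
assert len(pairs) == n * (n - 1) // 2              # 6 motion observations

# l stacks one 6-D motion (rotation vector + translation) per measured pair;
# x stacks the 3-D position of every trajectory point.
l = np.zeros(6 * len(pairs))                       # 36 observation components
x = np.zeros(3 * n)                                # 12 unknowns
# Each pair contributes the 3-D constraint (3), i.e., 18 equations here --
# the redundancy that the global alignment exploits.
```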
B. Mixed Adjustment Model: Overview

In this section, we review the theory of the mixed adjustment model [9], [11], [24], the cost-function formulation, and the solution derived from the update equations. The mixed adjustment model is used when one cannot express the observations (of a random nature) explicitly as a function of the unknown parameters. Instead, one deals with an implicit constraint of the form (2), which is generally a nonlinear model. This equation can be linearized by a Taylor series expansion around initial values $\mathbf{l}_0$ and $\mathbf{x}_0$
$$f(\mathbf{l}, \mathbf{x}) \approx f(\mathbf{l}_0, \mathbf{x}_0) + \frac{\partial f}{\partial \mathbf{l}}\Big|_0 (\mathbf{l} - \mathbf{l}_0) + \frac{\partial f}{\partial \mathbf{x}}\Big|_0 (\mathbf{x} - \mathbf{x}_0) + O(2) \qquad (6)$$

where $O(2)$ denotes terms of order 2 and higher. As it becomes clearer below, $\mathbf{l}_0$ is chosen as the noisy observation, and $\mathbf{x}_0$ is the initial guess of the unknown parameters. Ignoring the higher order terms and defining

$$B = \frac{\partial f}{\partial \mathbf{l}}\Big|_0, \qquad A = \frac{\partial f}{\partial \mathbf{x}}\Big|_0, \qquad \mathbf{v} = \mathbf{l} - \mathbf{l}_0, \qquad \boldsymbol{\delta} = \mathbf{x} - \mathbf{x}_0$$

we have the linearized form of the constraint equation

$$A\,\boldsymbol{\delta} + B\,\mathbf{v} + \mathbf{w} = \mathbf{0} \qquad (7)$$

where

$$\mathbf{w} = f(\mathbf{l}_0, \mathbf{x}_0). \qquad (8)$$

The number of constraints is generally larger than the number of unknowns, and so we need a way to estimate a solution in the LS sense. With the use of a Lagrange multiplier for each constraint, we formulate an unconstrained optimization problem for the mixed adjustment model, in agreement with [6], [11], [14], and [24]. The cost function consists of the squared magnitude of the observation residual, plus the constraints from (7) weighted by the Lagrange multiplier vector $\mathbf{k}$

$$\Phi = \mathbf{v}^T P\,\mathbf{v} + 2\,\mathbf{k}^T (A\,\boldsymbol{\delta} + B\,\mathbf{v} + \mathbf{w}) \qquad (9)$$

where $P = \Sigma_l^{-1}$. The solution is given by the minimum of $\Phi$, which is obtained from the zeros of the partial derivatives with respect to $\mathbf{v}$, $\boldsymbol{\delta}$, and $\mathbf{k}$

$$\frac{\partial \Phi}{\partial \mathbf{v}} = 2 P\,\mathbf{v} + 2 B^T \mathbf{k} = \mathbf{0}, \qquad \frac{\partial \Phi}{\partial \boldsymbol{\delta}} = 2 A^T \mathbf{k} = \mathbf{0}, \qquad \frac{\partial \Phi}{\partial \mathbf{k}} = 2 (A\,\boldsymbol{\delta} + B\,\mathbf{v} + \mathbf{w}) = \mathbf{0}. \qquad (10)$$

To solve this system of equations, we solve for $\mathbf{v}$ from the first equation

$$\mathbf{v} = -P^{-1} B^T \mathbf{k} \qquad (11)$$

and substitute into the third equation

$$A\,\boldsymbol{\delta} - B P^{-1} B^T \mathbf{k} + \mathbf{w} = \mathbf{0}. \qquad (12)$$

Defining $M = (B P^{-1} B^T)^{-1}$, the Lagrange multiplier can be written as

$$\mathbf{k} = M (A\,\boldsymbol{\delta} + \mathbf{w}) \qquad (13)$$

which, after inserting into the second equation in (10), leads to the main equation

$$A^T M (A\,\boldsymbol{\delta} + \mathbf{w}) = \mathbf{0}. \qquad (14)$$

This gives the unknown parameter vector

$$\hat{\boldsymbol{\delta}} = -(A^T M A)^{-1} A^T M\,\mathbf{w}. \qquad (15)$$

We can then solve for the residual vector

$$\hat{\mathbf{v}} = -P^{-1} B^T M (A\,\hat{\boldsymbol{\delta}} + \mathbf{w}). \qquad (16)$$

Accordingly, the covariance matrices of the estimated parameters and observations can be computed; in particular, for the parameters

$$\Sigma_{\hat{x}} = (A^T M A)^{-1}. \qquad (17)$$
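A compact numpy transcription of the closed-form step (11)-(17); the matrix names follow the reconstruction above, and the sign conventions are assumptions carried over from it.

```python
import numpy as np

def mixed_model_step(A, B, P, w):
    """One linearized solve of the mixed adjustment model, per (11)-(17).

    A: (c, u) Jacobian w.r.t. the unknowns; B: (c, m) Jacobian w.r.t. the
    observations; P: (m, m) observation weight matrix; w: (c,) misclosure.
    """
    M = np.linalg.inv(B @ np.linalg.inv(P) @ B.T)   # M = (B P^-1 B^T)^-1, cf. (13)
    N = A.T @ M @ A                                 # normal matrix of (14)
    delta = -np.linalg.solve(N, A.T @ M @ w)        # parameter update, (15)
    k = M @ (A @ delta + w)                         # Lagrange multipliers, (13)
    v = -np.linalg.inv(P) @ B.T @ k                 # observation residuals, (11)/(16)
    cov_x = np.linalg.inv(N)                        # parameter covariance, (17)
    return delta, v, cov_x
```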
C. Solution to Sensor-Position Estimation

The above equations provide the solution for our problem based on the nonlinear constraints in (3). As stated in Section III-A, the sensor positions and the motion parameters form the parameter vector $\mathbf{x}$ and the observation vector $\mathbf{l}$, respectively. The problem is to find the optimal $\mathbf{x}$ based on all the measured values. Like any other nonlinear estimation problem, an initial value $\mathbf{x}_0$ is required for the unknown parameters. This vector can be computed through (1), starting with $\mathbf{x}_1$ and extending it to the rest of the points, knowing that the motion parameters between these points are available. Without loss of generality, the first position is chosen as the reference point. Having determined the initial guess of the parameter vector, the process involves the following steps.

1) Initialize $\mathbf{x}_0$.
2) Set $\mathbf{l}_0 = \mathbf{l}$ and the iteration counter $i = 0$.
3) With the current estimates, evaluate $A$, $B$, $M$, and $\mathbf{w}$.
4) Determine the unknown parameters from (15) and the residual vector from (16).
5) If either $\|\hat{\boldsymbol{\delta}}\| < \epsilon$ is satisfied or the number of iterations reaches a prespecified maximum, go to the next step ($\epsilon$ is a preselected threshold).
   • Otherwise, update
   $$\mathbf{x}_0 \leftarrow \mathbf{x}_0 + \hat{\boldsymbol{\delta}}, \qquad \mathbf{l}_0 \leftarrow \mathbf{l} + \hat{\mathbf{v}} \qquad (18)$$
   • Increment $i$, and go to step 3).
6) Compute the estimated covariance matrices of the unknown parameters and observations from (17).

Two comments should be made.

Role of the initial value $\mathbf{x}_0$ on convergence: It is usually true that the initial value of the estimation parameters has an important role in the convergence of most nonlinear estimation algorithms. In our case, we compute the initial value directly from the motion measurements, which can often be estimated with good accuracy, given sufficient overlap between the images or maps at nearby positions.

Threshold value: We have used a small fixed value for the threshold $\epsilon$, with no constraint on computational time. For real-time processing, this choice may be dictated by the tradeoff between accuracy and computation time.

Two variations in the formulation of our LS estimation, the so-called sequential formulation and weighted parameters estimation, are directly relevant to the application of these techniques for 3-D terrain mapping from flyover imagery. We describe these next, and use them in the results of Section IV.
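The steps above can be wrapped into a simple iteration, sketched below; `linearize` is a hypothetical user-supplied routine returning $A$, $B$, and $\mathbf{w}$ at the current linearization point, and the specific threshold value used in the paper is not recoverable here, so `eps` is a placeholder.

```python
import numpy as np

def global_alignment(x0, l, linearize, P, eps=1e-6, max_iter=50):
    """Iterative GA following steps 1)-6) of Section III-C (a sketch)."""
    x, l0 = x0.astype(float).copy(), l.astype(float).copy()
    P_inv = np.linalg.inv(P)
    for _ in range(max_iter):
        A, B, w = linearize(l0, x)                         # step 3
        M = np.linalg.inv(B @ P_inv @ B.T)
        delta = -np.linalg.solve(A.T @ M @ A, A.T @ M @ w) # (15), step 4
        v = -P_inv @ B.T @ (M @ (A @ delta + w))           # (16)
        x += delta                                         # update, cf. (18)
        l0 = l + v                                         # relinearize observations
        if np.linalg.norm(delta) < eps:                    # step 5
            break
    cov_x = np.linalg.inv(A.T @ M @ A)                     # step 6, cf. (17)
    return x, cov_x
```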
D. Sequential Formulation

One of the assumptions in solving a LS problem, or any other minimization computation, is that all of the measurements are available at processing time. What if new observation equations have to be added to the previous ones at a later time? This is a common scenario in certain applications, including vision-based navigation and mapping. It is often the case that the new observations will change the estimation results, as in the application of SLAM in robotics based on the EKF formulation. We would like to avoid the inefficient approach of combining both the new and old observations and redoing the LS estimation. In other words, we are interested in a recursive estimation, as in an EKF, for the mixed adjustment model. Here, we explore the impact of the new observations on the parameters without redoing the estimation based on the collective observations. More precisely, we solve the problem in a sequential process [2], [11], where we obtain an estimate of the parameter vector based on the first observation group, and revise it whenever new observations become available. The main advantage of this method is that it exploits the temporal nature of the observations; i.e., all are not available at the same time. Another advantage is the reduction in computations, because we do not process the old observations again, only their results.

The derivation of the formulas for the sequential solution of the mixed adjustment model is based on the generalization of the observation adjustment method for the explicit model [2], [11]. Suppose that: 1) there are two groups of observations $\mathbf{l}_1$ and $\mathbf{l}_2$, where the second set becomes available at a later time; and 2) we have a solution by the application of the method in Section III-B to the constraint equation $f_1(\mathbf{l}_1, \mathbf{x}_1) = \mathbf{0}$. For generality, we assume that the second set of observations depends on the initial and some new unknown vectors, $\mathbf{x}_1$ and $\mathbf{x}_2$, respectively. Thus, two sets of constraint equations can be written

$$f_1(\mathbf{l}_1, \mathbf{x}_1) = \mathbf{0}, \qquad f_2(\mathbf{l}_2, \mathbf{x}_1, \mathbf{x}_2) = \mathbf{0} \qquad (19)$$
with $P_1$ and $P_2$ as the weight matrices associated with the two sets of observations (e.g., inverse covariance matrices). Our goal is to compute the revised solution based on the new constraints in (19). After linearization, we have

$$A_1\,\boldsymbol{\delta}_1 + B_1\,\mathbf{v}_1 + \mathbf{w}_1 = \mathbf{0}, \qquad A_2\,\boldsymbol{\delta}_1 + \bar{A}_2\,\boldsymbol{\delta}_2 + B_2\,\mathbf{v}_2 + \mathbf{w}_2 = \mathbf{0} \qquad (20)$$

where $\mathbf{w}_1 = f_1(\mathbf{l}_{1,0}, \mathbf{x}_{1,0})$ and $\mathbf{w}_2 = f_2(\mathbf{l}_{2,0}, \mathbf{x}_{1,0}, \mathbf{x}_{2,0})$, or in the matrix form

$$\begin{bmatrix} A_1 & 0 \\ A_2 & \bar{A}_2 \end{bmatrix} \begin{bmatrix} \boldsymbol{\delta}_1 \\ \boldsymbol{\delta}_2 \end{bmatrix} + \begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix} \begin{bmatrix} \mathbf{v}_1 \\ \mathbf{v}_2 \end{bmatrix} + \begin{bmatrix} \mathbf{w}_1 \\ \mathbf{w}_2 \end{bmatrix} = \mathbf{0}. \qquad (21)$$

This is in the form of (7), which, based on (14), leads to

$$\begin{bmatrix} A_1^T M_1 A_1 + A_2^T M_2 A_2 & A_2^T M_2 \bar{A}_2 \\ \bar{A}_2^T M_2 A_2 & \bar{A}_2^T M_2 \bar{A}_2 \end{bmatrix} \begin{bmatrix} \boldsymbol{\delta}_1 \\ \boldsymbol{\delta}_2 \end{bmatrix} = -\begin{bmatrix} A_1^T M_1 \mathbf{w}_1 + A_2^T M_2 \mathbf{w}_2 \\ \bar{A}_2^T M_2 \mathbf{w}_2 \end{bmatrix} \qquad (22)$$

where $M_i = (B_i P_i^{-1} B_i^T)^{-1}$. The inverse of the left-hand matrix can be expressed in terms of its subblocks [2], and used in writing the solution to (22). Taking this into account, and considering the fact that a primary estimate of the parameter vector is available from the first observation group, i.e.,

$$\hat{\boldsymbol{\delta}}_1^{(1)} = -(A_1^T M_1 A_1)^{-1} A_1^T M_1\,\mathbf{w}_1 \qquad (23)$$

we can come up with the necessary equations for the sequential solution, a set of update expressions for the revised parameters, the residuals, and their covariances. Note that the normal matrix $A_1^T M_1 A_1$ and the right-hand-side vector $A_1^T M_1 \mathbf{w}_1$ are the only information needed from the first group of observations for updating the estimates.

E. Weighted Parameters Estimation

In some cases, some or all of the parameters and their related uncertainties (variances) are known, say, from a previous LS estimation process, a different measurement source or sensor, etc. In such a scenario, we want to incorporate this information into our model (2). This can be done by adding a second group of observations to our model [11], as in the sequential solution, with the difference that: 1) the second group of observations is available at the same time as the first one; and 2) the number of unknown parameters does not change between the two groups. Following the procedure in Section III-D, we can write the corresponding normal equations (24). Here, we formulate the knowledge of some (previously computed) parameters as a second observation group of the form $\mathbf{l}_2 = H\mathbf{x}$, where $H$ is a selection matrix with elements equal to one where an observation coincides with a known parameter, and zero otherwise. Similarly, the corresponding covariance matrix is determined from the known parameter variances. The solution follows directly from that given for the sequential formulation.

To summarize these results, we have reviewed the method of the LS mixed adjustment model for addressing problems where the observations (or measurements) cannot be written explicitly as a function of the unknown parameters. We have given the necessary formulas to determine the solution. Next, we have generalized the methodology for two special cases: the sequential solution and weighted parameters estimation. We now explore how the method is applied to estimate, by GA, the sensor positions of a mobile vision-based platform.
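The net effect of the sequential update, for the case where the two groups share the same parameters (as in the weighted-parameters variant), can be sketched as follows; this accumulation form is our condensation of the update expressions, whose exact statement is not preserved in the source.

```python
import numpy as np

def sequential_update(N1, u1, A2, M2, w2):
    """Fold a second observation group into a previous mixed-model solution.

    Only N1 = A1^T M1 A1 and u1 = A1^T M1 w1 are carried over from the
    first group; the old observations themselves are never revisited.
    """
    N = N1 + A2.T @ M2 @ A2         # updated normal matrix
    u = u1 + A2.T @ M2 @ w2         # updated right-hand side
    delta = -np.linalg.solve(N, u)  # revised parameter update
    cov_x = np.linalg.inv(N)        # revised parameter covariance
    return delta, cov_x, N, u       # N, u seed the next sequential step
```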
IV. A 3-D MAPPING SCENARIO

We demonstrate the application of the proposed framework for terrain mapping from flyover stereo imagery. A large area can be covered by many arbitrarily different sensor paths. The most efficient, but also least suitable for achieving high accuracy, is the so-called lawnmower pattern with no overlap between consecutive up and down swaths. In this scenario, the best estimate at each sensor position is the integration of incremental F2F motions. The estimates are highly sensitive to the drift error, which grows with the distance traveled. Improvement in accuracy can be achieved by allowing for image overlap between sample positions along consecutive swaths. Without loss of generality, we consider as an example the following conditions:

1) flyover operation of an airborne system over a 1 km × 1 km area;
2) a trajectory that follows the lawnmower pattern;
3) a 10-m platform travel distance, translating to roughly 90% image overlap between consecutive frames;
4) a roughly 80% overlap between adjacent parallel kilometer-long swaths, thus requiring 50 swaths to cover the whole area, or 25 double tracks, 5 of which are depicted in Fig. 1 (bottom right);
5) a network of connections, translating to rigid-body motion constraints of the form in (1), defined between any two neighboring camera positions (nodes) with a minimum 70% image overlap.

As an example, Fig. 1 depicts a sample node and its connections to points over two adjacent swaths (top left), as well as all the connections over a small area (top right). Roughly 5100 stereo pairs are required to image such an area, resulting in significant accumulated error for 3-D mapping if the positioning is based solely on the integration of F2F motions along the tracks.

In applying our GA method, we can determine the sought-after motion observations in many ways. For example, we can compute the optical flow for the images of two nearby positions, in addition to the disparity from the pair of stereo images at each sample position (node).
Fig. 2. Average error in estimated position for the first-track trajectory points with noisy observations from the new tracks 2–7, for different noise levels.
Fig. 1. Bottom left: first two tracks of a lawnmower pattern where the two magnified sections show the connections at one node (top left) and over some area (top right). Bottom right: first five tracks where line thickness highlights the sequential track-by-track nature of data acquisition and processing.
The motion parameters can be readily computed from the optical flow, knowing the depth (disparity) at the first of the two positions [16]. In other applications, these motion parameters can be estimated from other types of input data, e.g., from range images [8], or from the 3-D measurements of points in the two coordinate systems [7].

In the scenario described above, we have 15 300 unknown parameters, and roughly 51 000 equations for 17 000 connections. For offline computation, we can employ the general solution from Section III-C and solve for all of the sensor positions at once. To overcome the practical complexities in processing such a large amount of data, to exploit the fact that new observations (motion parameters) have little impact on nodes over distant swaths, and to allow for online processing [16], we seek more effective computational strategies. For example, we can carry out the GA optimization at the end of each track, based on the observations over the current and only the $m$ previous tracks. When the trajectory has been designed a priori and the cross-track observations have been planned based on the sensor locations, $m$ can be determined based on a small tradeoff in accuracy for much lower computational cost.

We use the formulation in the Appendix to establish to what level the observations over some future tracks contribute to the estimated sensor positions of some current tracks. Here, we recursively add observations over each new track to the ones from previous track(s), and determine their effect. As we describe in the Appendix, it is sufficient to assume that the new observations are corrupted with additive noise, and to analyze the variations in the estimated positions due to the noise level. Three levels of relative noise are considered here: 1%, 5%, and 10%, with respect to the signal level.
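A sketch of how the connection network of Fig. 1 can be enumerated under the 70% overlap rule; the 100-m square footprint is inferred from the stated overlaps (10-m spacing for 90% along-track overlap, 20-m swath spacing for 80% cross-track overlap), and the patch is kept small for illustration.

```python
import itertools

FOOT = 100.0          # assumed square image footprint side (m)
SWATH_DX = 20.0       # swath spacing for ~80% cross-track overlap (m)
STEP_DY = 10.0        # along-track spacing for ~90% overlap (m)

# A small patch: 5 swaths x 20 samples (the full grid is 50 x ~100 nodes).
nodes = [(s * SWATH_DX, i * STEP_DY) for s in range(5) for i in range(20)]

def overlap(a, b):
    """Overlap fraction of two axis-aligned square footprints."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    if dx >= FOOT or dy >= FOOT:
        return 0.0
    return (1.0 - dx / FOOT) * (1.0 - dy / FOOT)

# Every pair above the 70% threshold yields one constraint of the form (1).
edges = [(i, j) for i, j in itertools.combinations(range(len(nodes)), 2)
         if overlap(nodes[i], nodes[j]) >= 0.70]
print(len(nodes), "nodes,", len(edges), "connections")
```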
Fig. 3. Normalized mean error in the positions of first-track points from noisy observations of tracks 2–7 is independent of the noise level.
Fig. 2 shows the mean error in the estimated track-1 positions as a result of the noise in the observations of tracks 2–7, for different noise levels. One notes that while the mean variation in track-1 positions (error) drops with the addition of new tracks, the absolute error depends directly on the noise level. The linearity between the noise level and the absolute error is due to the fact that, in the above-mentioned scenario, the pitch and roll components of the rotation vectors are negligible, and the heading is also small. We can eliminate the dependency on the noise level if we instead consider normalized errors. The normalization is done with respect to the largest mean error, corresponding to the case where only one future track (track 2) contributes new observations. (The normalized error is the absolute error divided by the largest.) Considering the normalized error, depicted in Fig. 3, we conclude that the error drops to roughly 5% and 1% when using the observations of tracks 1–5 or 1–6, respectively. Alternatively, this suggests that beyond four or five future tracks, there is little change (5% or 1%) in the estimated positions of trajectory points over track 1. Consequently, we can restrict the GA strategy to using the observations over up to $m = 4$ (or 3) previous tracks with those from the current track; see Fig. 1 (bottom right).

Additionally, the sequential solution introduced in Section III-D provides the advantages of reduced dimensionality
and computational speedup, by determining only the incremental variations in the previous estimates as the observations of a new track become available. Moreover, recall that the nodes of track $k - m$ ($k - 4$ in our case) are also connected to those from track $k - m - 1$, which would no longer contribute to the GA. We can apply the method of Section III-E by treating these as so-called weighted parameters [11], previously estimated and assumed fixed with known variances.

Taking these into consideration, the 3-D mapping scenario has been analyzed in detail in our previous work [16], from trajectory planning and trajectory following to the final 3-D map computation. It has also been shown that the whole process fits very suitably within the framework of parallel and pipeline computing architectures, and can be carried out in (near) real time. This process consists of collecting the images, computing the optical flow and stereo disparity, estimating the motion parameters, and performing the GA.

V. EXPERIMENTS

Three types of experiments are presented to test the proposed methods for the GA of trajectory positions from noisy motion measurements. The first one comprises synthetic data, simulating different trajectory types in mapping an area. The second experiment is a simulation of the scenario in Section IV with ground-truth data, where the motion parameters are estimated from the optical flow and disparities in a stereo video. The final experiment deals with the estimation of the camera trajectory from real underwater flyover video imagery.

A. Synthetic Data

We explore the performance over one track of the lawnmower pattern in the scenario described in Section IV; see Fig. 4 (top row). Alternatively, the same area may be imaged with looping tracks, crossing repeatedly at some known position. Here, errors within each loop/cycle will remain bounded and independent from other loops. One notes an immediate tradeoff: while the accumulated error may be reset at the crossover point, there is a smaller number of connections (constraints) between nodes that are on the two legs of the path; see Fig. 4 (bottom row), where dashed connections represent nodes that contribute to the motion observations under our arbitrary 70% image-overlap rule. We consider two loops of such a trajectory. The slight increase in the area covered by the looping trajectory (compared to the lawnmower pattern) is to carry out a more severe test with a smaller number of connections (to apply the GA). In other words, if we select the same width in the looping trajectory as in the lawnmower pattern, we have more connections (motion observations) in the middle section. This allows us to test the differences in the estimation accuracy due to both the track shape and the number of connections. Finally, small variations are also tested where the trajectory closes itself; open trajectories simulate situations where closing the loop at a desired position may be hard to achieve in practice.

The GA is performed for each case based on five different noise levels: 1%, 5%, 10%, 18%, and 25% of the true parameters of the motion between nodes, added to the perfect observations. A Gaussian noise distribution is assumed. Moreover, the experiment was repeated 100 times for each noise level.
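A sketch of the noise model as we read it ("x% of the true parameters" taken as a per-component standard deviation proportional to the true value); the sample motion vector is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
l_true = np.array([10.0, 0.5, 0.0, 0.0, 0.0, 0.02])  # hypothetical 6-D motion

def add_relative_noise(l, level):
    """Zero-mean Gaussian noise, sigma = level * |true component|."""
    return l + rng.normal(0.0, level * np.abs(l))

for level in (0.01, 0.05, 0.10, 0.18, 0.25):   # the five levels tested
    for trial in range(100):                   # 100 repetitions per level
        l_noisy = add_relative_noise(l_true, level)
```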
Fig. 4. Four trajectories used in experiments with synthetic data sets. Top to bottom and left to right: open/closed lawnmower pattern, and open/closed looping trajectories.
For all data sets, the first position of the sensor is assumed known.3 The average error over all the nodes is used for performance evaluation. In Fig. 5, the error of all the sensor positions for a noise level of 10%, and the mean-squared error (MSE) of all the points for various noise levels, are depicted. Results for each of the four trajectories, given in Table I, can be summarized as follows.

• The error for the closed loops is smaller than that of the open trajectories for all noise levels. As expected, this is influenced by the first sensor position, which is assumed known.
• The MSE of the estimated positions increases almost linearly with the input noise level. This establishes the degree of stability in dealing with various erroneous observations.
• The positioning accuracy improvement over the F2F motion integration varies from 80% for the 1% noise level, to 45% for the 25% noise level. A decrease in the improvement rate for larger noise levels is expected: the observations become less consistent with the rigid body motion constraint, so that the GA does not improve the results as significantly.

It is to be noted that the aim of this experiment is not to find the best trajectory, but to analyze the performance of the proposed algorithm on tracks with different shapes and with different levels of input noise.

3A more complete method of fixing positions is the one described in Section III-E.
Fig. 5. Results of experiments with synthetic data for the lawnmower pattern (top) and looping trajectory (bottom). Left to right: position error along trajectory for noise level of 10% and MSE of the entire trajectory versus noise level for various cases. F2F integration (dotted), open-loop trajectory (solid), and closed-loop trajectory (dashed).
TABLE I NUMBER OF EQUATIONS, OBSERVATIONS, AND UNKNOWNS FOR EACH DATA SET IN THE FIRST EXPERIMENT, AND MEAN POSITION ERROR BASED ON F2F INTEGRATION AND AFTER GA
Fig. 6. (a) Sample stereo pair. (b) True depth map and camera trajectory for ground-truth data set in experiment 2.
B. Simulated Stereo Video

This experiment simulates the scenario in Section IV, the navigation and mapping operation of an aerial platform. Fig. 6(b) shows the terrain serving as the ground-truth data, along with the designed trajectory imposed on top of it. The images are constructed using a virtual reality (VR) model based on a prescribed texture map [25]. Fig. 6(a) depicts a selected 320 × 240 stereo pair captured from the VR model and processed by our system. The distance from the start to the end of each swath is 1 km; thus, the imaged area directly below the path slightly exceeds 1 km². The flying altitude is 110 m above the reference plane, with terrain height variations in the range 5–65 m (above the reference plane). The area is covered over 25 tracks, each with 204 positions. The camera moves in a horizontal plane with no rotation; i.e., the rotational components of the motion vector are zero. A 90% image overlap between consecutive frames translates to a roughly 10-m travel distance, with a fixed stereo baseline. Although the motion parameters can be estimated by different methods [1], we have used the formulation based on the optical flow and stereo disparity computations. The MSE of the estimated motion parameters was computed for the three translational (m) and three rotational (rad) components.
Fig. 7 shows the position errors versus frame number computed by F2F integration (dashed) and after GA (solid). Fig. 8 (left to right and top to bottom) gives the error in positions for tracks 1–5, 11, 16, 21, and 25, before and after the GA process. Recall that the GA process is applied to, and thus affects, the five most recent tracks at each time. Up to track 5, therefore, each GA correction process affects the entire past trajectory, and the data from each new track allows us to improve all of the previous estimates. After this, only the last five tracks revised by the GA process are shown. (For instance, GA updating for track 11 involves tracks 7–11.) The solid and bold curves show the results prior to and after the GA. In each case, the primary difference is over the last track, which in the former case is determined by the F2F integration from the beginning to the end of the track. These results provide a measure of the track-by-track error growth, while the entire trajectory without GA shows the impact at the global scale; see Fig. 7.

For the entire trajectory, we have also shown in Fig. 9 the 3-D position error (top), altitude error (middle), and the heading error (bottom). (The other two components of the camera pose, not shown here, are negligible.) We note that these components are relatively small, with the largest errors, not surprisingly, corresponding to points farthest from the start position.
Fig. 7. Position error (m) versus frame number before (dashed) and after (solid) GA in experiment 2.

Fig. 9. Top to bottom: errors along the trajectory of position (m), vehicle altitude (m), and heading direction (rad).

Fig. 8. Positioning errors (m) versus frame number, before and after GA performed at the end of each track, shown in solid and bold-solid, respectively. From top to bottom and left to right are the results for the last (up to five) tracks which end with track numbers 1–5, 11, 16, 21, and 25.
The same experiment with a monocular system instead of the stereo system results in similar performance (up to the scale-factor ambiguity of monocular vision), the details of which have been described in [13].

C. Real Monocular Video

The last experiment involves a real monocular sequence, comprising 479 images recorded from a remotely operated vehicle (ROV) flying at an altitude of approximately 2 m above the sea floor (see Fig. 10). The imaged area is relatively small, slightly larger than a 3 m × 3 m area, and thus all the available connections between overlapping pairs of images have been considered in the GA computations. Furthermore, the GA is applied over all the sensor positions at once, rather than incrementally over the latest five tracks. The total number of
Fig. 10. Underwater scene in real data set imaged by the ROV; designated area to be mapped is within grid of PVC pipes.
Fig. 11. Estimated trajectory for the real data set before (dashed) and after GA (solid). (a) X–Y plane. (b) Z direction.
equations, observations, and parameters are 9675, 19 350, and 1434, respectively. In the absence of ground truth, the accuracy is evaluated with respect to a 3 m × 3 m grid made of white PVC pipes, placed over a certain region of the terrain that was imaged. Fig. 11 shows the trajectories of the sensor before (dashed line) and after (solid line) the GA, projected onto the X–Y plane and in the Z direction. The grid is shown as a red square on the trajectory. The fact that the trajectory after GA exceeds the grid boundaries is consistent with the true ROV path, where each turn to start a new swath was planned to be immediately outside the grid. Furthermore, the elimination of integrated error can also be confirmed from the Z component, which suggests a relatively constant ROV altitude during the operation.

An alternative way to evaluate the impact of the GA process is based on the results in Fig. 12. Here, we have warped images at nearby positions to the same coordinate system for comparison, based on the motion measurements before and after GA. The former is the motion estimated directly from the images at the two nearby views. The latter is the revised motion after imposing the rigid body motion constraints in the GA process. Fig. 12 comprises an image at one position along a swath (first column), another image with a relatively large overlapping region from a position in the next swath (second column), and the transformations of the second image to the coordinate system of the first view based on the motion estimated directly from the optical flow for these two images, i.e., the raw observations used as input to the GA (column 3), and after GA (column 4). This comparison has been done for four pairs of images from different parts of the trajectory, shown in Fig. 11(a) with black circles connected to the nearby view by a solid black line. These results have been arranged in the order of the ROV positions along the trajectory, starting from one end of the trajectory at the bottom left. They generally confirm that the substantial improvement in the final trajectory is achieved with little change to the motion observations during the GA process, i.e., the contribution from the first error term in (5). In fact, a slightly better result in row 4 after the GA reflects the improving impact of the process on the noisy motion measurements.

VI. CONCLUDING REMARKS
GA of sensor positions with noisy motion parameters has been addressed. The mixed adjustment model paradigm provides the proper framework for arriving at the proposed solutions when the observations cannot be expressed explicitly in terms of the unknown parameters. We have described the estimation method in detail (Section III-B), along with two variations, the sequential solution (Section III-D) and weighted parameters estimation (Section III-E), that provide effective ways to solve the problem when new observations are added incrementally. Three sets of experiments have been conducted: one with synthetic data to show the performance with varying noise levels, the second to show applicability for situations with high data volume, and the last with a sequence of nearly 500 underwater images.

Future work will address a strategy for selecting suitable network nodes and the interconnections between them, given a dense set of nearby sensor positions. Given that we have 1–2 orders of magnitude more nodes than the minimum necessary for position estimation, the question becomes how we determine the critical nodes that have the largest impact on accurate pose estimation, so that the remaining ones may be eliminated. Additionally, we may devise a recursive estimation technique where we first process the most important/informative nodes, then add other ones to improve the estimation, and stop the process where/when the position improvement is negligible. Another issue is the fusion of vision data with measurements from auxiliary sensors, e.g., GPS/DGPS and telemetry data, though that is more of an application-specific issue.

APPENDIX
ANALYZING EFFECT OF OBSERVATIONS ON A SUBPART OF SENSOR POSITIONS

In our network comprising a group of unknown positions and motion observations, the effect of the observations on the estimated parameters depends on the spatial distance between them; the farther a motion observation is from an unknown trajectory
Fig. 12. Evaluation of GA based on image registration. From left to right: original image and a second image from the next swath with sufficient overlap, second image warped to the coordinate frame of the original image based on direct estimation of the motion between the two positions, and based on the modified motion after GA.
point, the less it impacts the estimated position. The goal here is to quantify this relationship. How do new observations at increasing distance affect the estimated position of some current point, and when does the position update become insignificant with additional observations? These results are used directly in Section IV to establish to what degree the observations over future tracks have a significant enough impact on the estimated sensor positions of the current track. Conversely, when does the improvement in the positioning accuracy become negligible with the observations of some future track?

To investigate these issues, we process observations over the tracks one by one, and compute the change in the estimated positions, until the adjustment based on the observations from the latest track is negligible. Furthermore, the problem can be most effectively addressed by a variation of the sequential formulation presented in Section III-D. In order to derive the proper formulation, we start with the constraint model in (2)

$$f(\mathbf{l}, \mathbf{x}) = \mathbf{0}. \qquad \text{(Ai)}$$

Suppose we divide the observation and parameter vectors, $\mathbf{l}$ and $\mathbf{x}$, into two parts, according to Fig. 13.

Fig. 13. From top to bottom: arranging the observation vector into $\mathbf{l}_1$ and $\mathbf{l}_2$ parts, and dividing the parameter vector into the part $\mathbf{x}_1$ that is of interest and the remainder $\mathbf{x}_2$.

We can rewrite the observation model (Ai) in two groups

$$f_1(\mathbf{l}_1, \mathbf{x}_1, \mathbf{x}_2) = \mathbf{0}, \qquad f_2(\mathbf{l}_2, \mathbf{x}_1, \mathbf{x}_2) = \mathbf{0} \qquad \text{(Aii)}$$

which after linearization leads to

$$A_1\,\boldsymbol{\delta}_1 + \bar{A}_1\,\boldsymbol{\delta}_2 + B_1\,\mathbf{v}_1 + \mathbf{w}_1 = \mathbf{0}, \qquad A_2\,\boldsymbol{\delta}_1 + \bar{A}_2\,\boldsymbol{\delta}_2 + B_2\,\mathbf{v}_2 + \mathbf{w}_2 = \mathbf{0}. \qquad \text{(Aiii)}$$
The LS solution of (Aiii) for $\boldsymbol{\delta}_1$ can be written as follows:

$$\hat{\boldsymbol{\delta}}_1 = G_1\,\mathbf{w}_1 + G_2\,\mathbf{w}_2 \qquad \text{(Aiv)}$$

where $G_1$ and $G_2$ collect the corresponding blocks of the inverted normal matrix, with $M_i = (B_i P_i^{-1} B_i^T)^{-1}$ as before. Without loss of generality, we can assume that the linearization in (Aiii) is done around the ideal observations and parameters, so that

$$\bar{\mathbf{w}}_1 = \mathbf{0}, \qquad \bar{\mathbf{w}}_2 = \mathbf{0}. \qquad \text{(Av)}$$

To analyze the perturbation in $\mathbf{x}_1$ due to the observations $\mathbf{l}_2$, we represent $\mathbf{l}_2$ as the ideal component plus noise, $\mathbf{l}_2 = \bar{\mathbf{l}}_2 + \mathbf{n}$, in order to analyze how the elements of $\mathbf{x}_1$ are perturbed due to the noise. This means that $\mathbf{w}_2$ can be written as

$$\mathbf{w}_2 = \bar{\mathbf{w}}_2 + B_2\,\mathbf{n} = B_2\,\mathbf{n}. \qquad \text{(Avi)}$$

Substituting for $\mathbf{w}_1$ and $\mathbf{w}_2$ in (Aiv), we have

$$\Delta\hat{\boldsymbol{\delta}}_1 = G_2\,B_2\,\mathbf{n} \qquad \text{(Avii)}$$

This directly gives the sought-after variations of $\mathbf{x}_1$ as a result of the "relevant component" of the observations $\mathbf{l}_2$. To summarize, we have arrived at an expression in (Avii) to evaluate the effect of a certain part of the observation vector, $\mathbf{l}_2$, on some component of the unknown parameter vector, $\mathbf{x}_1$. This requires the knowledge of only the positions of the sensor locations on the network (trajectory) and the observations.

ACKNOWLEDGMENT

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

REFERENCES
[1] X. Armangue, H. Araujo, and J. Salvi, "A review on egomotion by means of differential epipolar geometry applied to the movement of a mobile robot," Pattern Recog., vol. 36, pp. 2927–2944, 2003.
[2] M. A. R. Cooper, Control Surveys in Civil Engineering. London, U.K.: Collins, 1987.
[3] S. D. Fleischer and S. M. Rock, "Global position determination and vehicle path estimation from a vision sensor for real-time mosaicking and navigation," in Proc. Oceans, Halifax, NS, Canada, Oct. 1997, pp. 7641–7647.
[4] N. Gracias and J. Santos-Victor, "Underwater mosaicking and trajectory reconstruction using global alignment," in Proc. Oceans, Honolulu, HI, Nov. 2001, pp. 2557–2563.
[5] J. Gutmann and K. Konolige, "Incremental mapping of large cyclic environments," in Proc. IEEE Int. Symp. Computat. Intell. Robot. Autom., 1999, pp. 318–325.
[6] R. A. Hirvonen, Adjustment by Least Squares in Geodesy and Photogrammetry. New York: Ungar, 1971.
[7] B. K. P. Horn, H. Hilden, and S. Negahdaripour, "Closed form solutions of absolute orientation using orthonormal matrices," J. Opt. Soc., vol. 5, pp. 1127–1135, 1988.
[8] B. K. P. Horn and J. G. Harris, "Rigid body motion from range image sequences," Comput. Vision, Graphics, Image Process., vol. 53, no. 1, pp. 1–13, Jan. 1991.
[9] K. R. Koch, Parameter Estimation and Hypothesis Testing in Linear Models. New York: Springer-Verlag, 1999.
[10] K. Konolige, "Large-scale map-making," in Proc. AAAI, 2004, pp. 457–463.
[11] A. Leick, GPS Satellite Surveying, 3rd ed. New York: Wiley, 2004.
[12] F. Lu and E. Milios, "Globally consistent range scan alignment for environment mapping," Auton. Robots, vol. 4, pp. 333–349, 1997.
[13] H. Madjidi, S. Negahdaripour, and E. Bandari, "Vision-based positioning and terrain mapping by global alignment for UAVs," in Proc. IEEE Int. Conf. Adv. Video, Signal Based Surveillance, Miami, FL, Jul. 21–22, 2003, pp. 305–312.
[14] E. M. Mikhail, Observations and Least Squares. New York: Donnelley, 1976.
[15] E. M. Mikhail, J. S. Bethel, and J. C. McGlone, Introduction to Modern Photogrammetry. New York: Wiley, 2001.
[16] S. Negahdaripour and H. Madjidi, "Stereovision imaging on submersible platforms for 3-D mapping of benthic habitats and seafloor structures," IEEE J. Ocean. Eng., Special Issue on Underwater Imaging, vol. 28, no. 4, pp. 625–650, Oct. 2003.
[17] P. M. Newman, "On the structure and solution of the simultaneous localization and map building problem," Ph.D. dissertation, Australian Centre for Field Robotics, Univ. Sydney, Sydney, Australia, 1999.
[18] H. S. Sawhney, S. Hsu, and R. Kumar, "Robust video mosaicking through topology inference and local to global alignment," in Proc. Eur. Conf. Computer Vision, Berlin, Germany, 1998, pp. 103–119.
[19] G. C. Sharp, S. W. Lee, and D. K. Wehe, "Toward multiview registration in frame space," in Proc. IEEE Int. Conf. Robot. Autom., Seoul, Korea, May 21–26, 2001, pp. 3542–3547.
[20] S. H. G. ten Hagen and B. J. A. Kröse, "Toward global consistent pose estimation from images," in Proc. Int. Conf. Intell. Robots Syst., Lausanne, Switzerland, Sep. 30–Oct. 4, 2002, pp. 466–471.
[21] S. Thrun, D. Koller, Z. Ghahramani, H. Durrant-Whyte, and A. Y. Ng, "Simultaneous mapping and localization with sparse extended information filters: Theory and initial results," in Workshop Algorithmic Found. Robot., Nice, France, 2002, pp. 363–380.
[22] R. Unnikrishnan, "Globally consistent mosaicking for autonomous visual navigation," M.Sc. thesis, Robot. Inst., Carnegie Mellon Univ., Pittsburgh, PA, Sep. 2002.
[23] R. Unnikrishnan and A. Kelly, "A constrained optimization approach to globally consistent mapping," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., vol. 1, Oct. 2002, pp. 564–569.
[24] P. Vanicek and E. Krakiwsky, Geodesy: The Concepts, 2nd ed. Amsterdam, The Netherlands: Elsevier, 1986.
[25] SRI International, Menlo Park, CA. VRML Terrain Datasets. [Online]. Available: http://www.ai.sri.com/VRMLSets/
Hossein Madjidi received the B.Sc. and M.Sc. degrees in geomatics engineering from K. N. Tousi University of Technology, Tehran, Iran, and, in 2005, the M.Sc. degree in electrical and computer engineering from the University of Miami, Coral Gables, FL. His research interests include 3-D motion estimation and scene reconstruction from stereo imagery for seafloor and benthic habitat mapping. He has also been involved in different areas of geomatics engineering and photogrammetry for several years.
Shahriar Negahdaripour (S'86–M'87–SM'95) received the S.B., S.M., and Ph.D. degrees in 1979, 1980, and 1987, respectively, from the Massachusetts Institute of Technology, Cambridge. He joined the Electrical Engineering Department, University of Hawaii, Honolulu, in 1987 as an Assistant Professor. In August 1991, he joined the University of Miami, Coral Gables, FL, where he is currently a Professor of Electrical and Computer Engineering. Since the start of a project on automatic vision-based ROV station-keeping in 1988, which was supported by the University of Hawaii Sea Grant College Program, he has been involved in a number of other projects on the development of various vision technologies for underwater applications, which were supported by the National Science Foundation, the Office of Naval Research, and the Naval Undersea Warfare Center, Newport, RI. In addition to numerous journal and conference papers in this subject area, he presented a half-day tutorial at the IEEE OCEANS'98 Conference, Nice, France, on "Computer Vision for Underwater Applications." Dr. Negahdaripour has organized and moderated panel sessions at the IEEE OCEANS'01 and OCEANS'02 conferences on "Robust Video Mosaicking: Techniques, Standards, and Data Sets," and "Photomosaicking of Underwater Imagery: Challenges and Technical Issues."