Recipient of the Siemens Corporation’s Best Paper Award at IEEE International Conference on Advanced Video and Signal Based Surveillance 2003, Miami, USA.
A Multi-Camera Conical Imaging System for Robust 3D Motion Estimation, Positioning and Mapping from UAVs
Pezhman Firoozfam and Shahriar Negahdaripour
Electrical and Computer Engineering Department, University of Miami, Coral Gables, FL 33124
{pezhman,shahriar}@miami.edu
Abstract
Over the last decade, there has been increasing interest in developing vision systems and technologies that support the operation of unmanned platforms for positioning, mapping, and navigation. Until very recently, these developments relied on images from standard CCD cameras with a single optical center and a limited field of view, making them restrictive for some applications. Panoramic images have been explored extensively in recent years [12, 13, 15, 17, 18, 19]. The particular configuration of interest to this investigation yields a conical view, which is most applicable for airborne and underwater platforms. Unlike a single catadioptric camera [2, 15], a combination of conventional cameras may be used to generate images at much higher resolution [12]. In this paper, we derive complete mathematical models of the projection and image motion equations for a down-look conical camera that may be installed on a mobile platform, e.g., an airborne or submersible system for terrain flyover imaging. We describe the calibration of a system comprising multiple cameras with overlapping fields of view to generate the conical view. We demonstrate with synthetic and real data that such images provide better accuracy in 3D visual motion estimation, which is the underlying issue in 3D positioning, navigation, mapping, image registration, and photo-mosaicking.
1. Introduction
Capturing a large amount of information rapidly in a single image has motivated the fast-growing popularity of cameras with wider fields of view [10]. Unlike a photo-mosaic, which is generally generated by moving the camera viewpoint, the goal is to generate a composite image, preferably a panorama, from a fixed camera position. To capture a much wider field of view, an omnidirectional sensor uses refracting and reflecting elements in arranged configurations [17], a panning camera [13], or a combination of several cameras [12, 14]. Some work has addressed improving the quality of panoramic images through optics and hardware design [3]. Others have focused on designing suitable mirrors for capturing omni-images with the desired viewing directions and angles [3, 15], as well as the construction of stereo omni-images [13]. Clearly, the geometry, modeling, and calibration of omnidirectional cameras are important issues in utilizing these systems for a variety of applications [2, 6]. In the operation of airborne platforms, quantitative information necessary for certain automatic capabilities, including accurate localization and positioning, mosaic-based and map-based navigation, and the construction of topographical maps, can be determined more robustly from these views than from typical images with a small FOV.
In this paper, we investigate the application of a down-look conical omni-camera for deployment on submersible and airborne platforms for terrain mapping. We derive the projection and image motion equations, describe the calibration of a system comprising 6 standard CCD cameras with overlapping fields of view, and study its performance in visual motion estimation. In agreement with earlier theoretical studies, we establish and quantify the improved accuracy with the increased field of view. Results for a calibrated sequence constructed synthetically from aerial imagery and for a real data set are given in support of the new theoretical results.
In the next section, we present the conical image model and derive the equations for mapping from conical to multi-camera views, and vice versa. In section 3, we address the calibration problem and the construction of the conical image. In section 4, we give the image motion equations and the estimation of 3D motion from optical flow. In section 5, we give the results of experiments with synthetic data, an aerial image sequence with ground-truth knowledge of camera motion, and an experiment with real images to test the accuracy in 3D motion and trajectory estimation with conical views. We summarize the contributions and ongoing activities in section 6.
2. Conical Image Model

For simplicity, we use a cylindrical coordinate system, where a point at arbitrary position P = [X, Y, Z]^T is represented by the coordinates P = [φ, R, Z], with R = sqrt(X^2 + Y^2) and φ = tan^-1(Y/X). Since the camera is symmetric with respect to φ, we model the system in any constant-φ plane without losing generality.

A conical view can be generated using a rotational line-scan camera; see Figure 1 (left). Placing the one-pixel-wide scan-line images together forms the conical image. For real-time applications, this can be approximated by narrow-width image strips, captured from several cameras, to construct the conical view. Modeling of the ideal situation using a rotating camera model, and construction with multiple cameras, are analyzed next.

Figure 1. Conical camera system using a rotational scanning camera (left) and the multi-cam system (right).

2.1. Centered Camera

The projection of an arbitrary point [X, Y, Z] onto the image cone can be computed from the viewing angles (see Figure 2 (left)):

tan α = Z/R    and    tan(α − θ) = r/f.

The trigonometric expansion of tan(α − θ), together with the above relations, yields

r = f (Z − R tan θ)/(R + Z tan θ)    or    r = f ((Z/R) − tan θ)/(1 + (Z/R) tan θ).    (1)

2.2. Off-Centered Camera

Ensuring that the rotation axis precisely passes through the optical center is difficult to achieve and often undesirable. In real applications, we should consider a more general off-centered configuration of the conical camera, with the optical center at distance d from the rotation axis. Comparing the two configurations in Figure 2 shows that the projection equation becomes

r = f (Z − (R − d) tan θ)/((R − d) + Z tan θ).    (2)

Figure 2. Ideal centered and off-centered conical cameras.
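To make the projection model concrete, the following minimal sketch evaluates the centered and off-centered conical projections of equations (1) and (2). It is written in Python/NumPy purely for illustration; the paper specifies no implementation, and the function and parameter names are ours.

```python
import numpy as np

def conical_r(R, Z, f, theta, d=0.0):
    """Radial image coordinate r of a scene point with cylindrical coordinates
    (R, Z), for a conical camera with focal length f, elevation angle theta,
    and axis offset d (d = 0 gives the centered case).  Equations (1)-(2)."""
    u = R - d                                    # radial distance measured from the offset optical center
    return f * (Z - u * np.tan(theta)) / (u + Z * np.tan(theta))

# Example (illustrative numbers): a point at R = 3 m, Z = 5 m
r_centered = conical_r(R=3.0, Z=5.0, f=0.006, theta=np.radians(60))
r_offset   = conical_r(R=3.0, Z=5.0, f=0.006, theta=np.radians(60), d=0.05)
print(r_centered, r_offset)
```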
2.3. Multi-Camera Approach

Figure 1 (right) shows the realization of high-resolution conical imaging with multiple cameras: a system comprising 6 units, such as the one constructed in our lab, the Underwater Vision and Imaging Lab (UVIL) [12]. The extrinsic matrix for camera i of the multi-cam system, mounted at azimuth αi about the cone axis, is

M_ext^i = [Rz(−π/2), 0; 0, 1] · [Ry(θ − π/2), 0; 0, 1] · [I, (−d, 0, 0)^T; 0, 1] · [Rz(−αi), 0; 0, 1]

        = [ −sin αi          cos αi           0        0
            −sin θ cos αi   −sin θ sin αi     cos θ     d sin θ
             cos θ cos αi    cos θ sin αi     sin θ    −d cos θ
             0               0                0         1        ].

An arbitrary point P = [R cos φ, R sin φ, Z, 1]^T maps onto camera i at position

p_i = [Xi  Yi  Zi  1]^T = M_ext^i P
    = [  R (cos αi sin φ − sin αi cos φ)
        −R sin θ cos αi cos φ − R sin θ sin αi sin φ + Z cos θ + d sin θ
         R cos θ cos αi cos φ + R cos θ sin αi sin φ + Z sin θ − d cos θ
         1 ].

So the projection of point P onto the image plane of camera i is

x = f R (cos αi sin φ − sin αi cos φ) / (R cos θ cos αi cos φ + R cos θ sin αi sin φ + Z sin θ − d cos θ),
y = f (−R sin θ cos αi cos φ − R sin θ sin αi sin φ + Z cos θ + d sin θ) / (R cos θ cos αi cos φ + R cos θ sin αi sin φ + Z sin θ − d cos θ).

Dividing the numerator and denominator of the above equations by cos θ gives

x = f (R sin(φ − αi)/cos θ) / ((R cos(φ − αi) − d) + Z tan θ),
y = f (Z − (R cos(φ − αi) − d) tan θ) / ((R cos(φ − αi) − d) + Z tan θ).    (3)
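A small sketch of the multi-camera projection of equation (3) follows, again in Python/NumPy for illustration only. The 60-degree azimuth spacing of the 6 cameras is our assumption based on the 6-unit rig described above.

```python
import numpy as np

def project_to_camera(phi, R, Z, alpha_i, f, theta, d):
    """Project a scene point given in cylindrical coordinates (phi, R, Z)
    onto the image plane of camera i mounted at azimuth alpha_i (eq. (3))."""
    u = R * np.cos(phi - alpha_i) - d              # radial offset as seen by camera i
    denom = u + Z * np.tan(theta)
    x = f * (R * np.sin(phi - alpha_i) / np.cos(theta)) / denom
    y = f * (Z - u * np.tan(theta)) / denom
    return x, y, denom > 0                          # denom > 0 iff the point is in front of camera i

# Assumed 6-camera rig, one camera every 60 degrees
alphas = np.radians(np.arange(6) * 60.0)
for i, a in enumerate(alphas):
    x, y, in_front = project_to_camera(phi=0.3, R=2.0, Z=4.0,
                                       alpha_i=a, f=0.006, theta=np.radians(60), d=0.05)
    print(f"camera {i}: x={x:.4f}, y={y:.4f}, in front={in_front}")
```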
2.4. Mapping from Conical to Multi-Camera

Referring to Figure 3, the mapping from the conical to a multi-cam image involves first solving (2) for R:

R = d + Z (f − a r)/(r + a f),

where a = tan θ. Substituting R in (3) gives the transformation from the conical coordinate system (r, φ) to the multi-cam system (x, y).

2.5. Mapping from Multi to Conical Camera

Derivation of the mapping from the multi-cam images to the conical image, depicted in Figure 3 (top), involves several steps. For simplicity, (3) is rewritten as

x = f v/(u + Z a)    and    y = f (Z − u a)/(u + Z a),

where u = R cos(φ − αi) − d and v = R sin(φ − αi)/cos θ. Solving for u and v gives

u = Z (f − a y)/(y + a f)    and    v = x (u + a Z)/f.

Finally, R and φ can be recovered from u and v:

φ = αi + tan^-1( v cos θ/(u + d) )    and    R = sqrt( (u + d)^2 + v^2 cos^2 θ ).

Equation (2) can then be rewritten by replacing R from the above expression. This is the mapping from the multi-cam system (x, y) into the conical system (r, φ).
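The two mappings of sections 2.4 and 2.5 invert one another. The sketch below (Python/NumPy, our own illustrative code) implements the image-to-cone direction; its output can be checked for consistency against project_to_camera from the earlier sketch.

```python
import numpy as np

def image_to_cone(x, y, alpha_i, f, theta, d, Z):
    """Map an image point (x, y) of camera i to conical coordinates (r, phi),
    following section 2.5.  The scene depth Z must be supplied (a planar
    scene at constant Z is assumed during calibration)."""
    a = np.tan(theta)
    u = Z * (f - a * y) / (y + a * f)          # radial offset of the scene point in camera i
    v = x * (u + a * Z) / f
    phi = alpha_i + np.arctan2(v * np.cos(theta), u + d)
    R = np.hypot(u + d, v * np.cos(theta))
    r = f * (Z - (R - d) * a) / ((R - d) + Z * a)   # equation (2)
    return r, phi
```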
2.6. Conical Image Formation by Mosaicking

For any point c(r, φ) on the cone, the corresponding point on the image plane of camera i can be found using the mapping equations in section 2.4. Discrepancies due to camera misalignment are rectified by calibration, as described next. Interpolation on the image plane returns the actual pixel value for the desired point on the cone.

Figure 3. Mapping between the multi-cam images and the conical image: image-to-cone construction (top) and cone-to-image mapping (bottom), using the per-camera homographies Q1, ..., QN and the calibration parameters.

3. System Calibration

In practice, it is difficult to align a multi-cam system precisely, and the misalignment is compensated by calibration [2]. To do this, we use a planar scene. Thus, the misalignment of each camera can be addressed and compensated by exploiting the plane-to-plane projection, or homography, from the raw image of any one camera to the calibrated view [16]:

[x_ik, y_ik, 1]^T ≅ Q_i [x'_ik, y'_ik, 1]^T,    Q_i = [ q1 q2 q3; q4 q5 q6; q7 q8 1 ],    (4)

where [x'_ik, y'_ik]^T is some point k in camera i. The implicit up-to-scale homography transformation accounts for the imprecise knowledge of the focal length f of each camera. We also need the image-to-cone mapping equations of section 2.5 (see Figure 3), which depend on the unknown distances Z of points on the scene surfaces. Without loss of generality, we assume that the planar scene is at a constant distance Z, giving one unknown depth value (this is equivalent to choosing an object-centered coordinate system, where the Z axis is in the direction opposite to the surface normal). Another degree of freedom is the "effective focal length" f of the conical system. Finally, we need to determine the elevation angle θ of the conical system. We determine these and the unknowns of the 6 planar homographies from the calibration process. The latter encode the information we need for camera-to-camera image alignment on the cone.

Computation of the calibration parameters is done through global optimization based on a number of matching points, so-called conjugate pairs. Corresponding pairs of points in the images of neighboring cameras are selected, where each pair should map onto the same point in the conical image. The mismatch (in the conical image to be constructed) is used as the error in the optimization. More precisely, the mapping for corresponding points p'_ik = [x'_ik, y'_ik, 1]^T and p'_jk = [x'_jk, y'_jk, 1]^T (k = 1, 2, ...) in cameras i and j, respectively, can be written as

[x_ik, y_ik, 1]^T ≅ Q_i [x'_ik, y'_ik, 1]^T    (5)

and

[x_jk, y_jk, 1]^T ≅ Q_j [x'_jk, y'_jk, 1]^T.    (6)

Points c_ki and c_kj, the mappings of p_ik = [x_ik, y_ik, 1]^T and p_jk = [x_jk, y_jk, 1]^T onto the cone according to the equations in section 2.5, are theoretically identical. The Euclidean distance (or any other norm) between c_ki and c_kj is the error measure used to determine the optimum calibration parameters, by minimizing it over all matches.
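The calibration objective can be summarized in code. The following Python sketch is our own illustration: it assumes scipy.optimize is available, reuses the image_to_cone helper from the earlier sketch, and the packing of the parameter vector is our choice, not the paper's.

```python
import numpy as np
from scipy.optimize import least_squares

def cone_residuals(params, matches, alphas, d):
    """On-cone mismatch of conjugate pairs (section 3).
    params packs [f, theta, Z, q1..q8 of Q_1, ..., q1..q8 of Q_N];
    matches is a list of (i, j, p_i, p_j) with p = [x', y'] raw image points."""
    n = len(alphas)
    f, theta, Z = params[:3]
    Q = np.hstack([params[3:].reshape(n, 8), np.ones((n, 1))]).reshape(n, 3, 3)

    def to_cone(cam, p):
        q = Q[cam] @ np.array([p[0], p[1], 1.0])        # eq. (4), up to scale
        x, y = q[0] / q[2], q[1] / q[2]
        r, phi = image_to_cone(x, y, alphas[cam], f, theta, d, Z)
        return np.array([r * np.cos(phi), r * np.sin(phi)])

    return np.concatenate([to_cone(i, p_i) - to_cone(j, p_j)
                           for i, j, p_i, p_j in matches])

# Typical use, starting from the design parameters as the initial guess x0:
# sol = least_squares(cone_residuals, x0, args=(matches, alphas, d))
```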
4. Image Motion Model

To derive the image flow equations [9] for the conical image, we differentiate (2) and substitute for the velocity Ṗ = [Ẋ, Ẏ, Ż]^T of the object point using the rigid-body motion model Ω × P + T. Here, it is assumed that the motion is described in terms of a translational component T = [tx, ty, tz]^T and a rotation Ω = [ωx, ωy, ωz]^T about the origin of the camera coordinate system. Skipping the tedious algebra, we finally arrive at the compact form

M X = m,    with    X = [(1/Z) T^T  Ω^T]^T    and    m = [ṙ  f φ̇]^T,    (7)

where M is a 2×6 matrix whose entries are products of cos φ and sin φ with coefficients A, B, C, and D; these coefficients depend on the image position r and on the system parameters f, θ, and d/Z through

s = 1 − tan θ (r/f)    and    t = tan θ + r/f.

We immediately note the well-known inherent scale-factor ambiguity between T and Z.
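Since the closed-form entries of M are lengthy, one simple way to sanity-check them, and to generate the synthetic flow used later in section 5.1, is to differentiate the projection numerically. The sketch below is our own illustration in Python/NumPy; it computes (ṙ, φ̇) for the off-centered model of equation (2) under the rigid motion Ṗ = Ω × P + T.

```python
import numpy as np

def conical_flow(P, T, Omega, f, theta, d, eps=1e-6):
    """Numerically differentiate the conical projection (r, phi) of point P
    under the rigid-body point velocity dP/dt = Omega x P + T (setup of eq. (7))."""
    def project(P):
        R = np.hypot(P[0], P[1])
        phi = np.arctan2(P[1], P[0])
        r = f * (P[2] - (R - d) * np.tan(theta)) / ((R - d) + P[2] * np.tan(theta))
        return np.array([r, phi])

    Pdot = np.cross(Omega, P) + T                    # rigid-body point velocity
    r0, phi0 = project(P)
    r1, phi1 = project(P + eps * Pdot)               # forward difference in time
    return (r1 - r0) / eps, (phi1 - phi0) / eps      # (r_dot, phi_dot)

# Example: pure translation along x (illustrative numbers)
r_dot, phi_dot = conical_flow(P=np.array([2.0, 1.0, 4.0]),
                              T=np.array([0.1, 0.0, 0.0]),
                              Omega=np.zeros(3),
                              f=0.006, theta=np.radians(60), d=0.05)
```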
4.1. 3D Motion Estimation

Equation (7) represents the image motion in terms of the camera motion and the distance along the Z axis to the scene. Thus, it can be utilized to estimate motion from measurements of optical flow. Typically, this is a difficult problem to solve due to the nonlinearity of the constraint, since neither the motion nor the depth Z of the scene surface points is known. In applications where the target distance (say, the terrain) from the camera(s) is roughly constant, the scene may be assumed to be at a constant depth Z, which can be absorbed into the translation vector (due to the scale-factor ambiguity). Thus, the image motion simplifies to a linear constraint equation in terms of the 6 unknown motion parameters. Consequently, these can be found from a least-square-error formulation (or any other optimization process):

X = (M^T M)^-1 (M^T m).

This simplified solution, a special case of planar homography for a frontal surface, suffices for certain applications, e.g., flyover images over (relatively) flat terrains. However, the main motivation for using this solution is to make comparisons with the solutions obtained from images that cover a smaller field of view; that is, to assess the improvement in robustness of the solution with increasing field of view [1, 11].
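A minimal sketch of this least-squares estimator follows (Python/NumPy, our illustration): the per-pixel 2x6 blocks M_i and flow measurements m_i are stacked into one system, and the condition number of M^T M is reported, since it is the robustness measure examined in section 5.1. The ordering of the components of X is our assumption.

```python
import numpy as np

def estimate_motion(M_blocks, m_blocks):
    """Least-squares 3D motion estimate X = (M^T M)^-1 M^T m from stacked
    per-pixel constraints M_i X = m_i (section 4.1), together with the
    condition number of M^T M used in section 5.1."""
    M = np.vstack(M_blocks)                  # (2N x 6)
    m = np.concatenate(m_blocks)             # (2N,)
    MtM = M.T @ M
    X = np.linalg.solve(MtM, M.T @ m)        # [tx/Z, ty/Z, tz/Z, wx, wy, wz] (ordering assumed)
    return X, np.linalg.cond(MtM)
```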
5. Experiments

5.1. Sensitivity Analysis

Robustness of motion estimation and the impact of the FOV can be investigated theoretically [1, 11]. A detailed theoretical analysis is beyond the scope of this paper, but we provide a summary and highlights. Referring to the solution X = (M^T M)^-1 (M^T m) from the previous section, the conditioning of the 6×6 matrix M^T M has a significant impact on the robustness and accuracy of the solution [11]. In particular, the eigenvector decomposition reveals the inherent ambiguities in 3D motion recovery from differential image motions, e.g., the well-known translation-rotation ambiguity arising from the similarity of the image motions due to tx/ty translations and ωy/ωx rotations [1].

Figure 4 depicts logarithmic plots of the condition number (CN) for various viewing angles, parameterized by the size of the field of view over 1, 2, 3, and 6 cameras of the multi-cam conical system (i.e., 60 degrees for one camera). The various curves show the same results for increasing viewing angle in the vertical direction, from 10 to 30 degrees in half-angle field of view. These curves nearly level off at 3 cameras, corresponding to a viewing angle of 180 degrees. The decreasing CN with larger viewing angles verifies the improved robustness in motion recovery. The fact that the CN cannot be made any smaller, ideally equal to one, is tied to the inherent ambiguity in distinguishing between translations and rotations in the interpretation of differential image motions.

Figure 4. Logarithmic plot of the condition number (of the system matrix in computing 3D motion from optical flow) confirms increasing robustness with a larger field of view, levelling off at nearly 180 degrees, corresponding to the view from 3 of the overlapping cameras. The curves correspond to increasing field of view in the vertical direction; numeric labels are the half-angle f.o.v.

Next, we examine these findings through one experiment with noise-corrupted synthetic data. One way to construct the data is to add various levels of noise to the perfect optical flow computed from (7) for certain assumed camera motion(s). Alternatively, we can use zero motion as the perfect solution and test the estimator with a noise field as our optical flow measurement. We carry out the motion estimation computations over regions that correspond to one, two, and three camera views, as well as the entire conical view. Assessment of performance is based on calculating the mean and standard deviation of the error by repeating the experiment with a large number of noise samples. In Figure 5, each plot is the standard deviation of the error for one motion component, as the noise level is varied from zero to one pixel. These results are in full agreement with the CN behavior. Figure 6 shows a conical image in rectangular form, and three views superimposed with the computed optical flows from one image to the next.

Figure 5. Standard deviation of the motion estimation error (rotations ωx, ωy, ωz [radian] and translations dx, dy, dz [m]) for noise levels σerr from zero to one pixel, utilizing data over one camera, two cameras, three cameras, and the entire cone.

Figure 6. An aerial photograph was used to construct a synthetic sequence to test the motion estimation algorithm: the first frame as a rectangular image, and three views superimposed with the computed optical flows from one image to the next.
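The zero-motion variant of this experiment is easy to reproduce. The sketch below (Python/NumPy, our own illustration building on the estimate_motion sketch above) uses Gaussian noise fields as the flow measurement and accumulates the error statistics over the pixels of a chosen subset of cameras.

```python
import numpy as np

def sensitivity(M_blocks, sigma, trials=1000, seed=0):
    """Monte Carlo sensitivity test (section 5.1): with zero true motion,
    a noise field of std `sigma` (pixels) is used as the flow measurement,
    and the std of the estimated motion components is returned."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        m_noise = [rng.normal(0.0, sigma, size=2) for _ in M_blocks]
        X, _ = estimate_motion(M_blocks, m_noise)
        errors.append(X)                      # true X is zero, so X is the error
    return np.std(errors, axis=0)

# Compare, e.g., blocks collected over one camera vs. the full conical view:
# print(sensitivity(M_one_camera, sigma=0.5), sensitivity(M_conical, sigma=0.5))
```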
5.2. Real Data

System Calibration: The severe distortion of our conical imaging system, comprising 6 security cameras each with a field of view of roughly 72 degrees, was first rectified by internal calibration with a chess-board target image [7]. Next, the external calibration process described in section 3 was applied for image alignment in constructing the conical view. Projection onto the conical view of the images from the 6 cameras prior to calibration, with the mapping given in section 2.6 and the configuration assumed a priori from the system design parameters, yields the top image in Figure 7. Neighboring views are shown in different colors to visualize the misalignment in overlapping regions. The middle image is the result after calibration as described in section 3. Finally, the same conical view is depicted as a gray-scale image. For the most part, we have achieved a fairly accurate registration over all the views. The exception is the white PVC pipe at the top left. Based on the alignment quality over the entire image, it is conceivable that the object may have moved slightly as the different images were being acquired. For completeness, the following calibrated parameters were computed: f = 6.168 [mm], θ = 59.82 [degree], and Z = 48.68 [cm]. In addition, the six estimated homographies are listed in Table 1.

Motion Estimation: Images were acquired by the multi-camera system for constructing 18 conical views as we traversed diagonally forward from one end of a water tank to the other. The lateral component of the motion varies to allow us to test different effects with the same data set, including segmentation of a moving target (a toy lobster) based on motion cues (optical flow discontinuity) around the middle of the sequence (where the forward component was also reversed over one frame). The pose angles and positions estimated from the camera motion solutions of section 4, over the fields of view of one, two, or three cameras, as well as over the conical view, are given in Figure 8. Though the pose angles and vertical positions are fairly similar and within the accuracy of the experimental setup, significant discrepancies exist among the XY trajectories. Therefore, it becomes important to assess performance. Unfortunately, the positions of the system along the path could not be recorded due to the complexities of the setup and the data acquisition process. Therefore, we cannot assess the accuracy of the results quantitatively.

One indirect evaluation is based on the accuracy in alignment of a reference image for a particular camera (or the conical view) with any frame in the sequence along the path, after compensating for the motion. (Due to space limitations, we use the view for one of the cameras, instead of the conical view, for demonstration.) To do this, we need to warp the view along the path to the coordinate system of the reference image, based on the estimated position and pose of the camera. Choosing the coordinate system of camera 1 at the first frame as the reference, every third frame (frames 4, 7, 10, 13, 16) is warped according to the estimated position and pose, and superimposed on frame one (depicted in different colors). These should ideally coincide perfectly within overlapping areas (common fields of view). The results in Figure 10 correspond to the trajectories estimated from the data over the FOV of one, two, and three cameras, as well as the conical imaging system. These clearly verify the much superior performance when utilizing a larger FOV, and particularly the conical view. Some misalignment over the bigger objects within overlapping regions of neighboring cameras is natural, and is tied to the discrepancy in their disparity compared to the background plane (somewhat reminiscent of focusing a camera on an object at a particular depth, where objects at other depths remain out of focus). We can also visually inspect the estimated image displacements of various objects by superimposing them on pairs of consecutive frames. For example, Figure 9 depicts frames 9 and 10 of the 6 cameras in different color channels, superimposed with the estimated flows; before (red) and after (green) motion. Overall, the vectors correctly depict the object motion.
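For a roughly planar scene at constant depth, the frame-warping evaluation just described can be approximated by composing a homography from the estimated rotation R, translation t, and plane depth Z, via the standard plane-induced homography H = K (R − t n^T / Z) K^-1 for a frontal plane (n = [0, 0, 1]^T). The sketch below is our own illustration; OpenCV is our choice of tooling and is not specified by the paper.

```python
import numpy as np
import cv2  # OpenCV, assumed available

def overlay_warped(ref_img, img, K, R, t, Z):
    """Warp a grayscale frame `img` into the reference frame with the
    plane-induced homography H = K (R - t n^T / Z) K^-1, then overlay the
    two frames in different color channels for visual inspection."""
    n = np.array([0.0, 0.0, 1.0])
    H = K @ (R - np.outer(t, n) / Z) @ np.linalg.inv(K)
    warped = cv2.warpPerspective(img, H, (ref_img.shape[1], ref_img.shape[0]))
    overlay = np.zeros((*ref_img.shape[:2], 3), dtype=np.uint8)
    overlay[..., 1] = ref_img        # reference frame in the green channel
    overlay[..., 2] = warped         # warped frame in the red channel
    return overlay
```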
Table 1. Planar homographies corresponding to the calibration of the 6 cameras in our conical imaging system (see section 3 for details).

Q1 = [ 1.01  0.04 -0.10 ; -0.00  0.94 -0.03 ;  0.00 -0.00  1.00 ]
Q2 = [ 1.00  0.07 -0.09 ;  0.00  1.01 -0.18 ;  0.00 -0.00  1.00 ]
Q3 = [ 1.03 -0.00 -0.15 ; -0.03  1.01  0.04 ;  0.00 -0.00  1.00 ]
Q4 = [ 1.07 -0.05  0.07 ;  0.02  1.00 -0.05 ; -0.00 -0.00  1.00 ]
Q5 = [ 0.98  0.03  0.01 ;  0.10  0.93  0.17 ; -0.00 -0.00  1.00 ]
Q6 = [ 0.90 -0.14  0.24 ;  0.01  1.01 -0.18 ;  0.00 -0.00  1.00 ]
Figure 7. Calibration of the conical view: construction based on the assumed camera configuration (top) and after calibration (middle). Color coding is used to distinguish the 6 different images and their overlaps. The last image is the gray-scale conical view.

Figure 8. Estimated camera pose angles, XY projection, and vertical position along the trajectory, for motion estimated over the FOV of one, two, and three cameras and over the conical view.

Figure 9. Superimposed estimated optical flows on frames 9 (red) and 10 (green).

6. Conclusions
We have described a conical imaging system, constructed from 6 standard security cameras, targeted for testing various vision algorithms/systems to be implemented/deployed on image-based mobile platforms, e.g., a submersible or airborne system for terrain flyover imaging. Such a system not only enables a large amount of visual information to be recorded rapidly in a single high-resolution image, but its extended field of view also results in enhanced robustness and accuracy for a number of vision algorithms, including 3D motion estimation. We have derived complete mathematical models that enable the calibration of the multi-cam conical system and the construction of the conical view. The image motion equations have been given, enabling the estimation of 3D motion from these images. We have demonstrated through experiments with synthetic and real data that such images provide improved accuracy in visual motion estimation, which is critical for image-based mapping and positioning. Ongoing work addresses the construction of these images at frame rate and the potential use of space-variant sampling and processing techniques for certain applications, including 2D and 3D mapping.

Acknowledgements: This article is based upon work supported by the NSF under Grant No. BES-9711528, and in part by ONR under Grant No. N000140310074. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF or ONR.
References

[1] G. Adiv, "Inherent ambiguities in recovering 3D motion and structure from a noisy flow field," IEEE PAMI, Vol. 11(5), May 1986.
[2] J.M. Gluckman and S.K. Nayar, "Planar catadioptric stereo: geometry and calibration," Proc. Conference Computer Vision and Pattern Recognition, Fort Collins, June 1999.
[3] J.M. Gluckman and S.K. Nayar, "Rectified Catadioptric Stereo Sensors," Proc. CVPR'00, Hilton Head Island, South Carolina, June 2000.
[4] N. Gracias and J. Santos-Victor, "Underwater video mosaics as visual navigation maps," Computer Vision and Image Understanding, Vol. 79, July 2000.
[5] N. Gracias and J. Santos-Victor, "Underwater mosaicing and trajectory reconstruction using global alignment," Proc. Oceans'01, Honolulu, HI, November 2001.
[6] M.D. Grossberg and S.K. Nayar, "A General Imaging Model and a Method for Finding its Parameters," Proc. ICCV, Vancouver, Canada, July 2001.
[7] J. Heikkila and O. Silven, "A four-step camera calibration procedure with implicit image correction," Proc. CVPR, Puerto Rico, June 1997.
[8] J.F. Lots, D.M. Lane, and E. Trucco, "Application of 2.5D visual servoing to underwater vehicle station keeping," Proc. Oceans'00, Rhode Island Convention Center, September 2000.
[9] H.C. Longuet-Higgins and K. Prazdny, "The Interpretation of a Moving Retinal Image," Proc. of the Royal Society of London, Vol. B-208, 1980.
[10] V. Nalwa, "A true omnidirectional viewer," Technical report, Bell Laboratories, Holmdel, NJ, February 1996.
[11] S. Negahdaripour and C.H. Yu, "Robust Recovery of Motion: Effects of Surface Orientation and Field of View," Proc. CVPR, Univ. of Michigan, Ann Arbor, June 1988.
[12] S. Negahdaripour, H. Zhang, P. Firoozfam, and J. Oles, "Utilizing Panoramic Views for Visually Guided Tasks in Underwater Robotics Applications," Proc. Oceans, Honolulu, USA, 2001.
[13] S. Peleg, M. Ben-Ezra, and Y. Pritch, "OmniStereo: Panoramic Stereo Imaging," IEEE Trans. PAMI, March 2001.
[14] R. Swaminathan and S.K. Nayar, "Polycameras: Camera cluster for wide angle imaging," CUCS-012-99, Dept. of Computer Science, Columbia Univ., 1999.
[15] R. Swaminathan, M.D. Grossberg, and S.K. Nayar, "Caustics of Catadioptric Cameras," Proc. International Conference on Computer Vision, Vancouver, Canada, July 2001.
[16] R.Y. Tsai and T.S. Huang, "Estimating Three-Dimensional Motion Parameters of a Rigid Planar Patch, II: Singular Value Decomposition," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-30, No. 4, August 1982.
[17] Y. Yagi and M. Yachida, "Real-Time Generation of Environmental Map and Obstacle Avoidance Using Omnidirectional Image Sensor with Conic Mirror," Proc. CVPR, Honolulu, HI, 1991.
[18] Z. Zhang, R. Weiss, and E.M. Riseman, "Feature Matching in 360-degree Waveforms for Robot Navigation," Proc. CVPR, Honolulu, HI, 1991.
[19] J.Y. Zheng and S. Tsuji, "Panoramic representation for route recognition by a mobile robot," International Journal of Computer Vision, Vol. 9(1), 1992.
Figure 10. Each set of 6 images consists of the 1st of 18 frames in a sequence (for camera 1), superimposed with selected subsequent frames (4, 7, 10, 13, 16) that are warped to the coordinate system of the first frame, based on the estimated positions and poses at these later frames. The 4 sets (FOV = 1 camera, FOV = 2 cameras, FOV = 3 cameras, and conical camera) correspond to motion estimation from data over the FOV of one, two, and three cameras, as well as the conical view.