IIIS Proc. of SCI2000, Vol. V, pp. 317–322 (July 2000)

An Automatic Camera Calibration Method with Image Registration Technique

Toru TAMAKI†‡∗   Tsuyoshi YAMAMURA♭   Noboru OHNISHI†‡

† Dept. of Info. Eng., Nagoya Univ., Nagoya 464-8603, Japan
♭ Faculty of Info. Sci. Tec., Aichi Prefectural Univ., Aichi 480-1198, Japan
‡ Bio-Mimetic Control Research Center, RIKEN, Nagoya 463-0003, Japan
∗ e-mail: [email protected]
ABSTRACT

We propose a novel automated camera calibration method for obtaining internal camera parameters, which enables us to compensate for the distortion of an image taken by a camera with a zoom lens. The proposed method is based on image registration. First, a calibration pattern is transformed onto the distorted image of the pattern using an affine transformation. Then the registration of the two images is performed by a nonlinear optimization with the Gauss-Newton method that minimizes the residuals of the pixel values of the two images. Finally, the distortion parameters are estimated so as to minimize the residuals that remain after the second step. The experimental results show the usefulness of the proposed method.

Keywords: calibration, lens distortion, nonlinear optimization, Gauss-Newton method, image registration

1. INTRODUCTION
Calibrating a camera and compensating for lens distortion are among the most important processes in computer vision. Although self-calibration has been studied recently, many studies (for example, [1, 2]) formulate their problems without considering distortion, for simplicity. Hence, pre-calibration of the internal camera parameters is required. Some calibration codes are available via the Internet (e.g., Tsai's method [3] is available from [4]). However, such conventional techniques require a large number of correspondences between points on an image and known three-dimensional coordinates (on a plane, or on a structure such as a cube or a house), from which the transformation of the points is estimated. In such methods, the correspondences must be established by a human operator, which is unreliable because of manual correspondence errors. Moreover, it takes much time and patience, and it is too laborious to measure how the distortion parameters change as the camera zooms in and out.

An alternative procedure is to detect markers. This can be done by a template matching technique, possibly with sub-pixel accuracy. However, there is another correspondence problem: which marker on the image corresponds to which point in space known in advance. This problem cannot be neglected as the number of markers increases to improve the accuracy of the estimation; if the problem is to be avoided, the number of points used for correspondence must stay small.

In this paper, we propose a new calibration method which compensates for the distortion of an image due to lens zooming. The proposed method establishes the correspondences automatically, and a more precise estimation than marker detection is expected because the method uses not just several marker points but all points of the image. Our method is based on an image registration technique used in the area of motion analysis, and consists of the following three stages: affine transformation, plane projective transformation, and lens distortion recovery.
2. THREE ESTIMATION STEPS WITH REGISTRATION
The basic idea is that calibration needs point-to-point correspondence, and registration can supply that. The proposed method makes a correspondence between an ideal calibration pattern and a distorted image of the printed pattern taken by a camera. Since the size of the pattern printed on paper is easily measured, the three-dimensional coordinates of each pixel in the pattern can be determined easily. Other features of our method are that any image can be used as the calibration pattern, and that once the parameters are estimated they can be used as long as the zoom of the lens does not change.

Our method consists of the following three procedures. The first step is to roughly transform the pattern into the image; this transformation is represented by affine and translation parameters. Then the precise parameters of the transformation of a plane under perspective projection are estimated with a nonlinear optimization. Finally, the distortion parameters are estimated so as to minimize the residuals that remain after the second step due to the lens distortion.
3. AFFINE TRANSFORM WITH DETECTED MARKERS
At first, a pattern image $I_1$ should be created. It must have three color (r, g, b) markers with coordinates $m_r$, $m_g$, $m_b$ at the corners, and then be printed in color to make it easy to detect the markers. Then the camera to be calibrated takes an image $I_2$ of the printed pattern. The coordinates of the markers $m'_r$, $m'_g$, $m'_b$ in the image $I_2$ are detected by thresholding or template matching.

What is calculated here is the set of parameters of the transformation from the pattern $I_1$ into the pattern taken in the image $I_2$. This transformation is represented by six parameters $\theta^a = (\theta^a_1, \ldots, \theta^a_6)^T$; the first four are the affine parameters, and the last two are the translation. Let $p = (x, y)^T$ be a point on $I_1$, and $p + a(p; \theta^a)$ the point on $I_2$ corresponding to $p$, where

$$ a(p; \theta^a) = \begin{pmatrix} x & y & 0 & 0 & 1 & 0 \\ 0 & 0 & x & y & 0 & 1 \end{pmatrix} \theta^a \qquad (1) $$

We solve the following system of linear equations

$$ m'_i = m_i + a(m_i; \theta^a), \qquad i = r, g, b \qquad (2) $$

to obtain the parameters $\theta^a$.
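As a concrete illustration, the following Python sketch assembles and solves the linear system (2) for the six parameters. It is a minimal sketch, not the authors' code: numpy is assumed, and the marker coordinates are hypothetical values chosen only for the example.

```python
import numpy as np

def solve_affine(markers_src, markers_dst):
    """Solve Eq. (2): m'_i = m_i + a(m_i; theta_a) for the six
    parameters theta_a = (theta_1, ..., theta_6)."""
    A = []   # stacked 2x6 blocks of Eq. (1), one per marker
    b = []   # stacked displacements m'_i - m_i
    for (x, y), (xd, yd) in zip(markers_src, markers_dst):
        A.append([x, y, 0, 0, 1, 0])
        A.append([0, 0, x, y, 0, 1])
        b.extend([xd - x, yd - y])
    # Three markers give exactly six equations; lstsq also
    # tolerates additional markers if they are available.
    theta_a, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return theta_a

# Hypothetical marker coordinates (pattern -> captured image)
m_pattern = [(0.0, 0.0), (639.0, 0.0), (0.0, 479.0)]
m_image = [(12.0, 9.0), (615.0, 22.0), (25.0, 465.0)]
print(solve_affine(m_pattern, m_image))
```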
4. IMAGE REGISTRATION: PERSPECTIVE FITTING

The first step described above uses only three corresponding points with respect to the affine transformation. Next we make precise correspondences of every point, that is, image registration, using a plane projective transformation.

Modeling and formulation

What has to be done now is to minimize the residuals of the intensities between $p_i$ in $I_1$ and the corresponding point $p_i + u(p_i; \theta^u)$ in $I_2$:

$$ r_i = I_1(p_i) - I_2(p_i + u(p_i; \theta^u)) \qquad (3) $$

where

$$ u(p; \theta^u) = \begin{pmatrix} x & y & 0 & 0 & 1 & 0 & x^2 & xy \\ 0 & 0 & x & y & 0 & 1 & xy & y^2 \end{pmatrix} \theta^u = M_u \theta^u \qquad (4) $$

and the function which should be minimized is as follows:

$$ \min_{\theta^u} \sum_i \rho(r_i), \qquad \rho(r_i) = r_i^2 \qquad (5) $$

Here $u()$ is the displacement of a point on a plane when the view changes from one to the other under perspective projection. It is represented by the eight parameters $\theta^u = (\theta^u_1, \ldots, \theta^u_8)^T$; this is a motion model of a plane under perspective projection and is often used for motion analysis [5].

Minimization method

To estimate the parameters $\theta^u$, the function (5) is minimized by the Gauss-Newton method, a well-known nonlinear optimization technique. The parameters are updated by the following rule:

$$ \theta^u \leftarrow \theta^u + \delta\theta^u \qquad (6) $$

We use the affine parameters $\theta^a$ obtained at the first stage as the initial values of the first six elements of $\theta^u$. The last two elements of $\theta^u$ are initialized to 0. According to [6], the descent direction $\delta\theta^u$ is calculated as follows¹:

$$ \delta\theta^u = -(J^T \tilde{D} J)^{-1} J^T \tilde{D} r \qquad (7) $$
$$ J = J(\theta^u) = \frac{\partial r}{\partial \theta^u} = \left[ \frac{\partial r_i}{\partial \theta^u_j} \right] \qquad (8) $$
$$ \tilde{D} = \mathrm{diag}\left( \frac{\dot{\rho}(r_i)}{r_i} \right) \qquad (9) $$
$$ \dot{\rho}(r_i) = \left. \frac{\partial \rho(r)}{\partial r} \right|_{r=r_i} \qquad (10) $$

¹The reason for introducing $\tilde{D}$ is to make it easy to use a robust function as $\rho$ instead of least squares when the pattern in the image is partially occluded or out of the scene.

This is the same as the least-squares formulation, that is, the system of linear equations [5]

$$ \sum_{l,i} \frac{\dot{\rho}(r_i)}{r_i} \frac{\partial r_i}{\partial \theta^u_k} \frac{\partial r_i}{\partial \theta^u_l} \, \delta\theta^u_l = - \sum_i \frac{\dot{\rho}(r_i)}{r_i} r_i \frac{\partial r_i}{\partial \theta^u_k} \qquad (11) $$

for $k = 1, \ldots, 8$. The partial derivatives are transformed as below using the chain rule:

$$ \frac{\partial r}{\partial \theta^u} = \frac{\partial u}{\partial \theta^u} \frac{\partial r}{\partial u} = -M_u^T \nabla I_2(p + u(p)) \qquad (12) $$

The calculation of $\delta\theta^u$ in (6) is iterated until convergence. At each iteration, the parameters estimated at the previous iteration are used for the calculation of $u(p)$. When the iteration stops, we write the estimated parameters as $\hat{\theta}^u$.
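To make the update rule concrete, here is a minimal sketch of one Gauss-Newton iteration for the eight-parameter model (4), under simplifying assumptions that are ours, not the paper's: $\rho(r) = r^2$ (so $\dot{\rho}(r_i)/r_i = 2$ and $\tilde{D}$ cancels out of Eq. (7)), nearest-neighbor sampling in place of the bilinear interpolation described in Section 6, and image gradients from numpy. The function names are hypothetical.

```python
import numpy as np

def displacement(points, theta_u):
    """Eq. (4): u(p; theta_u) = M_u theta_u at each point p = (x, y)."""
    x, y = points[:, 0], points[:, 1]
    t = theta_u
    ux = t[0]*x + t[1]*y + t[4] + t[6]*x*x + t[7]*x*y
    uy = t[2]*x + t[3]*y + t[5] + t[6]*x*y + t[7]*y*y
    return np.stack([ux, uy], axis=1)

def gauss_newton_step(I1, I2, points, theta_u):
    """One iteration of Eqs. (6)-(12) with rho(r) = r^2.
    points: integer pixel coordinates sampled from I1, as floats."""
    gy, gx = np.gradient(I2)                # image gradient of I2
    q = points + displacement(points, theta_u)
    xi = np.clip(q[:, 0].round().astype(int), 0, I2.shape[1] - 1)
    yi = np.clip(q[:, 1].round().astype(int), 0, I2.shape[0] - 1)
    r = I1[points[:, 1].astype(int), points[:, 0].astype(int)] - I2[yi, xi]
    # Eq. (12): dr/dtheta = -M_u^T grad I2, assembled row by row
    x, y = points[:, 0], points[:, 1]
    Ix, Iy = gx[yi, xi], gy[yi, xi]
    J = -np.stack([Ix*x, Ix*y, Iy*x, Iy*y, Ix, Iy,
                   Ix*x*x + Iy*x*y, Ix*x*y + Iy*y*y], axis=1)
    delta = np.linalg.lstsq(J, -r, rcond=None)[0]   # Eq. (7)
    return theta_u + delta                          # Eq. (6)
```

Solving the least-squares problem with `lstsq` is equivalent to forming the normal equations (11) but numerically better conditioned.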
5. IMAGE REGISTRATION: DISTORTION FITTING

At the end of the previous step, the image registration between the pattern image and the pattern in the distorted image is finished except for the effect of the lens distortion.
Modeling of distortion

The relationship between an undistorted and a distorted coordinate in an image is usually modeled by the following five internal camera parameters [7]: the distortion parameters $\kappa_1$ and $\kappa_2$, the coordinates of the image center $(c_x, c_y)^T$, and the scale $s_x$, which is the aspect ratio of the width and height of a pixel. We write these parameters as $\theta^d = (\kappa_1, \kappa_2, c_x, c_y, s_x)^T$.

The distortion is represented in a coordinate system which has its origin at $(c_x, c_y)^T$, while the system used in the previous section has its origin at the top-left corner. Therefore, we introduce additional notation. Let $p_u = (x_u, y_u)^T$ and $p_d = (x_d, y_d)^T$ be points in the undistorted and distorted image respectively, both with their origins at the top-left corner of their images. And let $(\xi_u, \eta_u)^T$ and $(\xi_d, \eta_d)^T$ be points in the undistorted and distorted image respectively, with the origin at the image center $(c_x, c_y)^T$. These can be written as follows:

$$ (\xi_u, \eta_u)^T = (x_u, y_u)^T - (c_x, c_y)^T \qquad (13) $$
$$ (\xi_d, \eta_d)^T = (\xi_u, \eta_u)^T - (\kappa_1 R^2 + \kappa_2 R^4)(\xi_d, \eta_d)^T \qquad (14) $$
$$ (x_d, y_d)^T = (s_x \xi_d, \eta_d)^T + (c_x, c_y)^T \qquad (15) $$

where $R = \sqrt{\xi_d^2 + \eta_d^2}$.

As shown above, $\xi_u$ is explicitly written as a function of $\xi_d$, while $\xi_d$ is not. In order to obtain $\xi_d$ from $\xi_u$, we solve the following equation iteratively [7]:

$$ (\xi_d^k, \eta_d^k)^T = \frac{(\xi_u, \eta_u)^T}{1 + \kappa_1 R_{k-1}^2 + \kappa_2 R_{k-1}^4} \qquad (16) $$

where $R_k = \sqrt{(\xi_d^k)^2 + (\eta_d^k)^2}$, starting with $R_0 = \sqrt{\xi_u^2 + \eta_u^2}$. The iteration stops at $k = 8$ because this is a sufficient approximation [7].

Using the relations above, we obtain two functions between $p_u$ and $p_d$ in the system with the top-left corner origin:

$$ p_d = d(p_u; \theta^d) \qquad (17) $$
$$ p_u = f(p_d; \theta^d) = \begin{pmatrix} \frac{x_d - c_x}{s_x} (1 + \kappa_1 R'^2 + \kappa_2 R'^4) + c_x \\ (y_d - c_y)(1 + \kappa_1 R'^2 + \kappa_2 R'^4) + c_y \end{pmatrix} \qquad (18) $$

where

$$ R' = \sqrt{ \left( \frac{x_d - c_x}{s_x} \right)^2 + (y_d - c_y)^2 } \qquad (19) $$
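The following is a minimal sketch of both directions, assuming numpy: `f_undistort` follows Eq. (18) directly, while `d_distort` implements the fixed-point iteration (16) with the eight iterations suggested by [7]. The function names are ours; the parameters mirror $\theta^d = (\kappa_1, \kappa_2, c_x, c_y, s_x)$.

```python
import numpy as np

def f_undistort(pd, k1, k2, cx, cy, sx):
    """Eq. (18): map a distorted point to its undistorted position."""
    xi_d = (pd[0] - cx) / sx
    eta_d = pd[1] - cy
    r2 = xi_d**2 + eta_d**2          # R'^2 of Eq. (19)
    scale = 1.0 + k1 * r2 + k2 * r2**2
    return np.array([xi_d * scale + cx, eta_d * scale + cy])

def d_distort(pu, k1, k2, cx, cy, sx, iters=8):
    """Eq. (16): invert the model by fixed-point iteration."""
    xi_u = pu[0] - cx
    eta_u = pu[1] - cy
    r2 = xi_u**2 + eta_u**2          # R_0^2
    for _ in range(iters):
        scale = 1.0 + k1 * r2 + k2 * r2**2
        xi_d, eta_d = xi_u / scale, eta_u / scale
        r2 = xi_d**2 + eta_d**2      # R_k^2 for the next step
    return np.array([sx * xi_d + cx, eta_d + cy])   # Eq. (15)
```

As a consistency check, `d_distort(f_undistort(p, ...), ...)` should return `p` up to the approximation error of the eight iterations.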
$f$ and $d$ are the inverse of each other, but $d$ is not a closed-form function of $p_u$ because $d$ corresponds to the iterative procedure (16). Nevertheless, we can write the transformation between the images using Eqs. (17) and (18). Let $I_1^u$ be the image of the pattern transformed by applying $\hat{\theta}^u$ to $I_1$, and $I_1^{ud}$ be the image of the pattern transformed by applying $\theta^d$ to $I_1^u$. That is,

$$ I_1(p) = I_1^u(p + u(p; \hat{\theta}^u)) \qquad (20) $$
$$ I_1^u(p) = I_1^{ud}(d(p)) \qquad (21) $$
$$ I_1^u(f(p)) = I_1^{ud}(p) \qquad (22) $$

Fig. 1 shows these relationships.

[Figure 1: Relations between each transformation.]

Minimization with inverse registration

If the transformation from $I_1$ to $I_1^{ud}$ were closed-form, the same strategy could be used to estimate $\theta^d$. However, since $d$ is not an explicit function, the registration between $I_1$ and $I_1^{ud}$ cannot estimate the parameters directly.

We therefore consider an inverse registration. As shown in Fig. 1, we intend to match $I_2$ with $I_1^{ud}$, that is, to minimize the residuals of the intensities between the two images:

$$ r_i = I_1^{ud}(p_i) - I_2(p_i) \qquad (23) $$

This can be rewritten as follows using Eq. (22):

$$ r_i = I_2(p_i) - I_1^u(f(p_i; \theta^d)) \qquad (24) $$
Hence, the estimation method becomes the same as in the previous step. The minimization of the following function is performed with the Gauss-Newton method:

$$ \min_{\theta^d} \sum_{\Omega} \rho(r_i) \qquad (25) $$

where $\Omega = \{ i \; ; \; p_i \in I_2, \; \exists p \in I_1, \; f(p_i) = p + u(p) \}$, which means that the minimization should use the points in $I_2$ within the region corresponding to the pattern $I_1$. The system of equations which should be solved has the same form as Eq. (11):

$$ \sum_{l, i \in \Omega} \frac{\dot{\rho}(r_i)}{r_i} \frac{\partial r_i}{\partial \theta^d_k} \frac{\partial r_i}{\partial \theta^d_l} \, \delta\theta^d_l = - \sum_{i \in \Omega} \frac{\dot{\rho}(r_i)}{r_i} r_i \frac{\partial r_i}{\partial \theta^d_k} \qquad (26) $$

and the derivatives in Eq. (26) are as follows:

$$ \frac{\partial r}{\partial \theta^d} = \frac{\partial f}{\partial \theta^d} \frac{\partial r}{\partial f} = -\nabla I_1^u(f(p))^T \frac{\partial f}{\partial \theta^d} \qquad (27) $$

According to Eq. (18), the Jacobian is

$$ \frac{\partial f(p_d)}{\partial \theta^d} = \begin{pmatrix} R'^2 \frac{x_d - c_x}{s_x} & R'^4 \frac{x_d - c_x}{s_x} & \frac{\partial x_u}{\partial c_x} & \frac{\partial x_u}{\partial c_y} & \frac{\partial x_u}{\partial s_x} \\ R'^2 (y_d - c_y) & R'^4 (y_d - c_y) & \frac{\partial y_u}{\partial c_x} & \frac{\partial y_u}{\partial c_y} & \frac{\partial y_u}{\partial s_x} \end{pmatrix} \qquad (28) $$

where

$$ \frac{\partial x_u}{\partial c_x} = 1 - \frac{1 + \kappa_1 R'^2 + \kappa_2 R'^4}{s_x} - 2(\kappa_1 + 2\kappa_2 R'^2) \frac{(x_d - c_x)^2}{s_x^3} \qquad (29) $$
$$ \frac{\partial y_u}{\partial c_x} = -2(\kappa_1 + 2\kappa_2 R'^2) \frac{x_d - c_x}{s_x^2} (y_d - c_y) \qquad (30) $$
$$ \frac{\partial x_u}{\partial c_y} = -2(\kappa_1 + 2\kappa_2 R'^2) \frac{x_d - c_x}{s_x} (y_d - c_y) \qquad (31) $$
$$ \frac{\partial y_u}{\partial c_y} = 1 - (1 + \kappa_1 R'^2 + \kappa_2 R'^4) - 2(y_d - c_y)^2 (\kappa_1 + 2\kappa_2 R'^2) \qquad (32) $$
$$ \frac{\partial x_u}{\partial s_x} = -\frac{x_d - c_x}{s_x^2} (1 + \kappa_1 R'^2 + \kappa_2 R'^4) - 2(\kappa_1 + 2\kappa_2 R'^2) \frac{(x_d - c_x)^3}{s_x^4} \qquad (33) $$
$$ \frac{\partial y_u}{\partial s_x} = -2(y_d - c_y)(\kappa_1 + 2\kappa_2 R'^2) \frac{(x_d - c_x)^2}{s_x^3} \qquad (34) $$

The initial values of $c_x$, $c_y$, $\kappa_2$ and $s_x$ for solving Eq. (26) are set to half of the width and height of $I_2$, 0, and 1, respectively. On the other hand, $\kappa_1$ is initialized randomly, to avoid all of Eqs. (29)–(32) becoming 0, which happens when $\kappa_1 = \kappa_2 = 0$. We empirically choose $\kappa_1 \in [-10^{-7}, 10^{-7}]$.
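As a sanity check on Eqs. (28)–(34), the following sketch (our code, not the authors') computes the 2×5 Jacobian analytically and compares it against central finite differences of `f_undistort` from the earlier sketch; any mismatch would indicate a transcription error in the derivatives.

```python
import numpy as np

def jacobian_f(pd, k1, k2, cx, cy, sx):
    """Eqs. (28)-(34): d f / d theta_d, theta_d = (k1, k2, cx, cy, sx)."""
    a = (pd[0] - cx) / sx            # (x_d - c_x) / s_x
    b = pd[1] - cy                   # y_d - c_y
    r2 = a*a + b*b                   # R'^2 of Eq. (19)
    s = 1.0 + k1*r2 + k2*r2*r2       # common radial factor
    g = k1 + 2.0*k2*r2               # kappa_1 + 2 kappa_2 R'^2
    dxu = [r2*a, r2*r2*a,
           1.0 - s/sx - 2.0*g*a*a/sx,        # Eq. (29)
           -2.0*g*a*b,                       # Eq. (31)
           -a*s/sx - 2.0*g*a*a*a/sx]         # Eq. (33)
    dyu = [r2*b, r2*r2*b,
           -2.0*g*a*b/sx,                    # Eq. (30)
           1.0 - s - 2.0*g*b*b,              # Eq. (32)
           -2.0*g*a*a*b/sx]                  # Eq. (34)
    return np.array([dxu, dyu])

def jacobian_fd(pd, theta, eps=1e-6):
    """Central finite differences of f_undistort, for verification."""
    J = np.zeros((2, 5))
    for j in range(5):
        tp, tm = list(theta), list(theta)
        tp[j] += eps
        tm[j] -= eps
        J[:, j] = (f_undistort(pd, *tp) - f_undistort(pd, *tm)) / (2*eps)
    return J
```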
[Figure 2: The calibration pattern (640×480).]

6. SOME STRATEGIES
Interpolation of pixel values

When we need the intensity of a pixel whose coordinates are not on the integer grid, we use bilinear interpolation among the neighboring pixel values, as in the sketch after this section.

Histogram matching

Since the image taken by the camera often changes the intensities of the pattern, the histogram of the image is transformed so that it becomes the same as that of the pattern.

Coarse-to-fine

To reduce the computation time, and to obtain an accurate estimation even when there is a relatively large residual in the initial state, a coarse-to-fine strategy is employed. The procedures mentioned above are applied to a filtered image, which is heavily blurred at first and then gradually becomes finer. The second and third steps are therefore repeated in turn while changing the resolution of the images $I_1$, $I_1^u$ and $I_2$.
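Here is a minimal sketch of the bilinear interpolation used whenever a warped coordinate falls between grid points; numpy is assumed and the function name is ours.

```python
import numpy as np

def bilinear(image, x, y):
    """Bilinearly interpolate the image intensity at real-valued (x, y)."""
    h, w = image.shape
    x0 = int(np.clip(np.floor(x), 0, w - 2))
    y0 = int(np.clip(np.floor(y), 0, h - 2))
    dx, dy = x - x0, y - y0
    # Weighted mean of the four surrounding pixels
    return ((1 - dx) * (1 - dy) * image[y0, x0]
            + dx * (1 - dy) * image[y0, x0 + 1]
            + (1 - dx) * dy * image[y0 + 1, x0]
            + dx * dy * image[y0 + 1, x0 + 1])
```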
7. EXPERIMENTS

We have applied the proposed method to real images taken by a camera while changing its zoom parameter. We use a photograph as the calibration pattern (see Fig. 2), printed by a monochrome laser printer (EPSON LP-9200PS2), and take images of it with a CCD camera (Sony EVI-D30) fixed on a prop, using capturing software (on an SGI O2). Changing the camera zoom from the widest view angle, we took two images of the pattern along with a grid pattern, shown in the left column of Fig. 3. As can be seen, the wider the view angle is (Fig. 3(a)(c)), the larger the effect of the distortion becomes (the grid lines curve).

The estimation results are shown in Table 1, and Fig. 3(b)(d)(f)(h) shows the images compensated by Eq. (18) with the estimated parameters. The curved lines in the grid pattern should become straight after the compensation.

Table 1: Estimation results as the zoom changes.

        Fig. 3(a)      Fig. 3(e)
  κ1    2.804e-07      -6.7631e-08
  κ2    2.992e-13      5.219e-13
  cx    327.8          326.6
  cy    214.3          184.2
  sx    0.9954         0.9997

The results contain some error: as seen in the right column of Fig. 3, the grid lines still curve slightly, especially around the corners of the image. This is because the gradation of the illumination of the image cannot be removed by the simple histogram transformation, which should be replaced with an estimation of the illumination change by some method such as a linear brightness constraint [8]. Note that in simulation experiments using a transformed pattern with additive noise at each pixel as the distorted image, the proposed method worked very well even when the amplitude of the added uniform noise was greater than ±50.

8. CONCLUSIONS

We have proposed a new automated camera calibration technique for obtaining internal camera parameters in order to compensate for the distortion of an image. The proposed method is based on image registration and consists of two nonlinear optimization steps: perspective fitting with a geometric transformation, and distortion fitting. Experimental results demonstrate the efficiency of the proposed method, which can reduce a human operator's labor. The nonlinear optimization takes some time, but it is sufficient to run it as a batch process.

9. REFERENCES
[1] T. Mukai and N. Ohnishi, "The recovery of object shape and camera motion using a sensing system with a video camera and a gyro sensor," in Proc. of ICCV'99, pp. 411–417, 1999.
[2] J. B. Shim, T. Mukai, and N. Ohnishi, "Improving the accuracy of 3D shape by fusing shapes obtained from optical flow," in Proc. of CISST'99, pp. 196–202, 1999.
[3] R. Y. Tsai, "An efficient and accurate camera calibration technique for 3D machine vision," in Proc. of CVPR'86, pp. 364–374, 1986.
[4] R. Willson, "Camera calibration using Tsai's method," 1995. ftp://ftp.vislist.com/SHAREWARE/CODE/CALIBRATION/Tsai-method-v3.0b3/
[5] H. S. Sawhney and S. Ayer, "Compact representations of videos through dominant and multiple motion estimation," T-PAMI, vol. 18, no. 8, pp. 814–830, 1996.
[6] G. A. F. Seber and C. J. Wild, Nonlinear Regression. New York: Wiley, 1989.
[7] R. Klette, K. Schlüns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Singapore: Springer-Verlag, 1998.
[8] M. J. Black, D. J. Fleet, and Y. Yacoob, "Robustly estimating changes in image appearance," CVIU, vol. 78, no. 1, pp. 8–31, 2000.
[Figure 3: The images of the calibration pattern taken by a camera; panels (a)–(h). (left column) original images. (right column) compensated images. (upper two rows) images of the calibration pattern and grid at the widest view angle. (lower two rows) images with the camera zoomed in.]