IEICE TRANS. INF. & SYST., VOL.E83–D, NO.7 JULY 2000
PAPER
Special Issue on Machine Vision Applications
A Multiple View Approach for Auto-Calibration of a Rotating and Zooming Camera∗ Yongduek SEO† , Min-Ho AHN†† , and Ki-Sang HONG††† , Nonmembers
SUMMARY In this paper we deal with the problem of calibrating a rotating and zooming camera whose internal calibration parameters change frame by frame, without a 3D calibration pattern. First, we theoretically show the existence of the calibration parameters up to an orthogonal transformation under the assumption that the skew of the camera is zero. Auto-calibration becomes possible by analyzing inter-image homographies, which can be obtained from the matches in images of the same scene, or through direct nonlinear iteration. In general, at least four homographies are needed for auto-calibration. When we further assume that the aspect ratio is known and the principal point is fixed during the sequence, one homography yields the camera parameters; when the aspect ratio is assumed to be unknown with a fixed principal point, two homographies are enough. In the case of a fixed principal point, we suggest a method for obtaining the calibration parameters by searching the space of the principal point. If this is not the case, nonlinear iteration is applied. The algorithm is implemented and validated on several sets of synthetic data. Experimental results for real images are also given.
key words: auto-calibration, rotating and zooming camera, camera calibration, homography
1. Introduction
In this paper we deal with the problem of computing internal camera parameters using inter-image homographies obtained from several images of a scene taken by an externally rotating and internally zooming and focusing camera. Previously, Hartley proposed a self-calibration algorithm given matches of images taken by a rotating camera whose internal parameters are fixed [5], [6]. One limitation of that work is that the algorithm cannot be applied when the images are taken by a zooming or auto-focusing camera. On the other hand, it is usual to have views taken by a rotating and zooming camera, as is common in video images of sports games like soccer or American football. Seo and Hong proposed a flexible calibration method to compute time-varying internal parameters and rotation matrices given inter-image homographies [15]. A similar study using a gradient-based minimization, performed independently, can be found in the work of Agapito et al. [2]. They also proposed a linear algorithm for the same purpose using multiple homographies [1], and recently Seo and Hong investigated the effect of the estimation error of the principal point in the linear computation of the calibration parameters [16]. The basic principle of those auto-calibration algorithms is that the inter-image homographies obtained from image information are equal to the infinity homographies, which consist of just rotation and calibration matrices [12], [21].

In the first part of this paper, we theoretically show that auto-calibration is possible and that there exists a solution for the calibration parameters up to a global rotation matrix when the skew of the camera is assumed to be zero. Then, by analyzing the inter-image homographies, we develop an algorithm for computing internal camera parameters even though they change due to zooming or focusing, without using any prior information such as camera rotation or a calibration pattern. In the algorithm formulation, we use the generic constraint that the camera undergoes just a rotation. The calibration result may be used to extract 3D information contained in image sequences captured by rotating and zooming cameras. For example, the work of Kim et al. [10] used the 3D information of a soccer playground to estimate the locus of a flying ball, which is assumed to be a parabolic curve. If the internal calibration parameters are known, the locus of a more general curve can be computed. Also, when novel videos are synthesized using image mosaicking, the auto-calibration technique proposed in this paper will provide a way to deal with image sequences of rotating and zooming (and auto-focusing) cameras [19].

Manuscript received October 4, 1999. Manuscript revised March 9, 2000.
† The author is under the Ph.D. program in the EE Dept. of POSTECH, Korea.
†† The author is under the Ph.D. program in the Math. Dept. of POSTECH, Korea.
††† The author is with the EE Department of Pohang University of Science and Technology (POSTECH), Korea.
∗ This paper was presented at the IAPR Workshop on MVA'98.
The calibration algorithm is tested on synthetic data to check its performance in the presence of noise, and experimental results on real image data are given.

2. The Camera Model
In this paper we consider a set of rotating cameras with camera matrices P̃_k = K_k [R_k | 0], where R_k denotes the rotation of the k-th camera with respect to the first (0-th) camera and K_k is the camera calibration matrix defined by
K_k = [ α_k  s_k  x_k ]
      [ 0    β_k  y_k ]
      [ 0    0    1   ].
The parameters in K_k, the intrinsic parameters, represent the properties of the image formation system: β_k represents the focal length, γ_k = α_k/β_k represents the aspect ratio, s_k represents the skew, and (x_k, y_k) is called the principal point. In this paper, we call (α_k, β_k) the focal lengths of the camera as a whole. Any 3D vector x̃ = [x, y, z, 1]^T in the camera-centered coordinate system is projected at time k into the image space as

λ_k u_k = K_k [R_k | 0] x̃   (1)
        = K_k R_k x = P_k x   (2)
where the scale factor λ_k, called the depth, accounts for perspective effects and x = [x, y, z]^T. In this paper, we consider cameras whose skew is known or is assumed to be zero, and we have the following definition:

Definition 1: A camera that can be modeled as in (1) with s_k = 0 is called a zero-skew camera. An internal calibration matrix K of the form

K = [ α  0  x ]
    [ 0  β  y ]
    [ 0  0  1 ]

is called a zero-skew calibration matrix.

3. Existence of a Solution of Auto-Calibration
Let u_k = K_k R_k x and u_t = K_t R_t x; then it follows that

u_k = K_k R_k R_t^T K_t^{-1} u_t.   (3)

The equal sign denotes equality up to scale. Choosing the coordinate system with respect to the 0-th camera and re-writing this equation with respect to that coordinate system, we may write

u_k = K_k R_k K_0^{-1} u_0.   (4)
Thus, we have the following proposition, which is an extension of Hartley's [6]:

Proposition 1: Given a set of images I_0, ..., I_N taken from the same location by cameras with the calibration matrices K_0, ..., K_N, there exist 2D projective transformations, represented by matrices H_k, taking image I_0 to image I_k. Moreover, the matrix H_k may be written in the form

H_k = K_k R_k K_0^{-1}   (5)

where R_k represents the rotation of the k-th camera with respect to the 0-th.
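As a quick numerical illustration of Proposition 1, the sketch below (with made-up focal lengths, principal point, and rotation angle) builds H_1 = K_1 R_1 K_0^{-1} for a synthetic rotating and zooming camera and checks that it transfers image points exactly:

```python
import numpy as np

def calib(f, x0, y0):
    # zero-skew calibration matrix with focal length f and principal point (x0, y0)
    return np.array([[f, 0.0, x0], [0.0, f, y0], [0.0, 0.0, 1.0]])

def rot_y(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

K0 = calib(1000.0, 330.0, 230.0)   # reference (0-th) camera
K1 = calib(1100.0, 330.0, 230.0)   # zoomed camera
R1 = rot_y(np.deg2rad(10.0))       # rotation of camera 1 w.r.t. camera 0

# Proposition 1: H1 = K1 R1 K0^{-1} takes image 0 points to image 1
H1 = K1 @ R1 @ np.linalg.inv(K0)

x = np.array([0.3, -0.2, 4.5])     # scene point in the common camera-center frame
u0 = K0 @ x; u0 /= u0[2]           # projection in image 0
u1 = K1 @ R1 @ x; u1 /= u1[2]      # projection in image 1

v1 = H1 @ u0; v1 /= v1[2]          # transfer via the inter-image homography
err = np.linalg.norm(v1 - u1)      # should vanish (up to rounding)
```

Since the camera center is fixed, the transfer is exact; the homography does not depend on the depth of the point.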
Internal parameters may be found from Eq. (5), where the H_k's are computed using image correspondences. However, notice that Eq. (5) provides 8 independent equations but, in the case of the general calibration model, the number of unknown parameters is 13: five for each camera matrix and three for the rotation matrix. This means that given N inter-image homographies we have 8N equations, but the number of unknowns is 3N + 5N + 5 = 8N + 5. That is, when all the internal camera parameters vary, we cannot compute them using inter-image homographies, and we need a constraint or an assumption to reduce the number of unknowns. In this paper, we assume that the skew is always zero, that is, s_k = 0. (Due to advances in camera production this assumption is practical.) Then the number of unknowns is 3N + 4N + 4 = 7N + 4, and finally, given four or more inter-image homographies we have enough equations to compute the camera parameters.

From Proposition 1, one may find camera matrices of dimension 3 × 3

P_k = K_k R_k,  k = 0, ..., N   (6)

that satisfy the relationship H_k = P_k P_0^{-1} for each k. It can be seen that given one such sequence of camera matrices P_k, k = 0, ..., N, the matrices P_k Q may also be a possible choice of camera matrices, where Q is a nonsingular 3 × 3 matrix, because they also produce the same inter-image homographies. Now it remains to show that, given a sequence of camera matrices P_k, k = 0, ..., N, which 1) solves the inter-image transformation problem and 2) represents zero-skew cameras, the only possible transformations Q that preserve the zero-skew camera condition (Eq. (7)) belong to the group of orthogonal transformations up to scale. We need a lemma to go further [3].

Lemma 1: A camera matrix P = KR = [p_1 p_2 p_3]^T represents a zero-skew camera if and only if

(p_1 × p_3) · (p_2 × p_3) = 0.   (7)
Let us consider a non-singular 3 × 3 matrix G instead of the rotation matrix R in Eq. (7):

P = KG = [k_1 k_2 k_3]^T G = [ α  0  u_x ]
                             [ 0  β  u_y ] G.   (8)
                             [ 0  0  1   ]

Note that p_i = G^T k_i for i = 1, ..., 3. Hence

p_1 × p_3 = −α det(G) G^{-1} e_2,   (9)
p_2 × p_3 = β det(G) G^{-1} e_1,   (10)

where e_i is the standard basis of R^3. Thus

(p_1 × p_3) · (p_2 × p_3) = −αβ (det(G))^2 e_2^T G^{-T} G^{-1} e_1.   (11)
The following immediate corollary is useful in our further analysis.

Corollary 1: A camera matrix P = KG represents a zero-skew camera if and only if

e_2^T G^{-T} G^{-1} e_1 = 0.   (12)
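Lemma 1 and Corollary 1 are easy to verify numerically. The sketch below (with an arbitrary zero-skew K and a random non-singular G, values chosen for illustration) checks the identity of Eq. (11) relating the two conditions, and that an orthogonal G makes the Corollary 1 quantity vanish:

```python
import numpy as np

rng = np.random.default_rng(0)
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])

K = np.array([[900.0, 0.0, 320.0],   # zero-skew calibration matrix
              [0.0, 950.0, 240.0],
              [0.0, 0.0, 1.0]])
G = rng.standard_normal((3, 3))      # generic non-singular 3x3 matrix
P = K @ G
p1, p2, p3 = P                       # rows of P

# Lemma 1 quantity and Corollary 1 quantity
lemma = np.dot(np.cross(p1, p3), np.cross(p2, p3))
Ginv = np.linalg.inv(G)
corollary = e2 @ Ginv.T @ Ginv @ e1

alpha, beta = K[0, 0], K[1, 1]
detG2 = np.linalg.det(G) ** 2
# Eq. (11): lemma == -alpha * beta * det(G)^2 * corollary

# An orthogonal G gives (G G^T)^{-1} = I, so the corollary quantity is e2.e1 = 0:
Q_orth = np.linalg.qr(rng.standard_normal((3, 3)))[0]
Qinv = np.linalg.inv(Q_orth)
zero = e2 @ Qinv.T @ Qinv @ e1
```

The second check is the direction of Theorem 1 below: orthogonal transformations preserve the zero-skew condition for every rotation.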
3.1 Q is a Scaled Orthogonal Matrix

Suppose that we have generic N (≥ 5) zero-skew cameras P_i = K_i R_i. (The exact meaning of the term generic will be explained later.)

Theorem 1: If the transformation P_i → P_i Q preserves the zero-skew conditions of all the P_i, then Q is an orthogonal matrix up to non-zero scale, QQ^T = λI, λ ≠ 0.

Proof: By Corollary 1, we get the following equations for each i:

0 = e_2^T (R_i Q)^{-T} (R_i Q)^{-1} e_1
  = e_2^T R_i (QQ^T)^{-1} R_i^T e_1
  = e_2^T R_i Λ R_i^T e_1,   (19)

where Λ = (λ_mn) := (QQ^T)^{-1}. We choose the unit quaternion representation for a rotation matrix R as shown in Eq. (20), where q = (q_0, q_1, q_2, q_3) is a unit quaternion:

R = [ q_0² + q_1² − q_2² − q_3²   2q_1q_2 + 2q_0q_3           2q_1q_3 − 2q_0q_2         ]
    [ 2q_2q_1 − 2q_0q_3           q_0² − q_1² + q_2² − q_3²   2q_2q_3 + 2q_0q_1         ]   (20)
    [ 2q_3q_1 + 2q_0q_2           2q_3q_2 − 2q_0q_1           q_0² − q_1² − q_2² + q_3² ]

Since Eq. (19) is a linear equation in the λ_mn, its matrix form can be written as

M l = [m_ij] [λ_11 λ_12 λ_13 λ_22 λ_23 λ_33]^T = 0,   (21)

where the m_ij, given in Eqs. (13)–(18), are the (i, j)-th components of the matrix M, composed of the elements of the rotation matrices:

m_i1 = 2 (−q_i1 q_i2 + q_i0 q_i3)(−q_i0² − q_i1² + q_i2² + q_i3²)   (13)
m_i2 = q_i0⁴ − q_i1⁴ + 6 q_i1² q_i2² − q_i2⁴ − 6 q_i0² q_i3² + q_i3⁴   (14)
m_i3 = 2 (q_i0³ q_i1 + q_i0 q_i1³ − 3 q_i0 q_i1 q_i2² + 3 q_i0² q_i2 q_i3 + 3 q_i1² q_i2 q_i3 − q_i2³ q_i3 − 3 q_i0 q_i1 q_i3² − q_i2 q_i3³)   (15)
m_i4 = 2 (q_i1 q_i2 + q_i0 q_i3)(q_i0² − q_i1² + q_i2² − q_i3²)   (16)
m_i5 = 2 (−q_i0³ q_i2 + 3 q_i0 q_i1² q_i2 − q_i0 q_i2³ + 3 q_i0² q_i1 q_i3 − q_i1³ q_i3 + 3 q_i1 q_i2² q_i3 + 3 q_i0 q_i2 q_i3² − q_i1 q_i3³)   (17)
m_i6 = 4 (−q_i0 q_i2 + q_i1 q_i3)(q_i0 q_i1 + q_i2 q_i3)   (18)

Notice that the subscript i denotes the image index. Clearly (1, 0, 0, 1, 0, 1)^T is contained in the null space of M. Through a symbolic computation, one can show that the dimension of the null space of M is 1 in the generic case. Note that the rank of M is less than 5 if and only if all the determinants of the 5 × 5 submatrices of M vanish. Let Σ be the set of 4N-tuples (q_1, ..., q_N) of quaternion vectors such that the matrix M constructed from (q_1, ..., q_N) is of rank 5. Let Σ_0 be the complement of Σ in H^N, where H is the space of unit quaternions. Because Σ_0 is a Zariski-closed variety in H^N, Σ is a Zariski-open dense subset [14]. Thus, if (q_1, ..., q_N) is contained in the dense subset Σ, then Q must be a scaled orthogonal matrix. QED.

Now we can give the exact meaning of the term generic.

Definition 2: A set of zero-skew cameras K_i R_i, i = 1, ..., N is called generic if (q_1, ..., q_N) is contained in the open dense set Σ defined in the proof of Theorem 1, where q_i is the unit quaternion corresponding to the rotation matrix R_i.

Hence, if Q preserves the zero-skew condition for a generic set of N (≥ 5) cameras, then it preserves the zero-skew condition for all cameras. We will consider some exceptional cases in the following sections. In conclusion, the theorem shows that auto-calibration is possible.

Remark 1: In fact, the matrix Q is related to the selection of the camera coordinate system. Therefore, choosing the camera coordinate system with respect to an image implicitly determines the matrix Q.

4. Estimation Algorithm Outline
In the previous section, we showed that we can calculate internal camera parameters using matches of images of the same scene, even though the camera parameters vary due to auto-focusing or zooming, under the assumption of a zero-skew camera. In order to compute the internal parameters we have to solve Eq. (5). As mentioned before, because at least four inter-image homographies are needed in the computation and the equations are highly nonlinear, it seems unavoidable to use a nonlinear iterative minimization technique. However, our observation is that if we know the principal point, we can compute the other parameters, such as the focal lengths and aspect ratios, using a linear method. Therefore, we take an approach to solving Eq. (5) by defining an error function of the internal parameters and searching for the principal point around the image center that minimizes the error function. Equation (5) can be re-written as
H_k K_0 = K_k R_k.   (22)

Multiplying each side by its transpose, we have

H_k K_0 K_0^T H_k^T = K_k K_k^T.   (23)
The inter-image homography H_k can be computed using more than four image matches or by direct comparison of image intensities (mosaicking methods) [11], [18]. In particular, when point matches are given under the assumption of Gaussian detection noise, the renormalization method of Kanatani [7]–[9] will provide a statistically optimal computation of the homography. In the case that image mosaicking methods are used, the images should have enough texture to obtain homographies via direct image intensity comparison. Since obtaining image matches, computing the homography itself, and compensating for lens distortion are not the main topic of this paper, we assume in the sequel that the homographies are already obtained through an appropriate procedure. In the real experiments of the following sections, point matches are automatically obtained from the images of a calibration pattern, and we tried not to have lens distortion in the acquisition process of the images. However, note that the calibration pattern is used only to obtain the point matches needed for computing homographies.

Provided that we know the principal points (x_0, y_0) and (x_k, y_k), we can eliminate the principal-point components in Eq. (5) by multiplying H_k by C_k^{-1} on the left-hand side and by C_0 on the right-hand side, where C_k is defined by

C_k = [ 1  0  x_k ]
      [ 0  1  y_k ]   (24)
      [ 0  0  1   ].

Now we have a principal-point-free version of Eq. (23):

H̄_k K̄_0 K̄_0^T H̄_k^T = K̄_k K̄_k^T,   (25)

where H̄_k = C_k^{-1} H_k C_0 and

K̄_k = diag(α_k, β_k, 1).   (26)
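For completeness, here is a minimal normalized-DLT sketch of the homography computation from point matches. This is a generic estimator, not the renormalization method of Kanatani cited above, and the synthetic camera values are illustrative assumptions:

```python
import numpy as np

def dlt_homography(u0, u1):
    """Estimate H with u1 ~ H u0 from >= 4 point matches (normalized DLT)."""
    def normalize(u):
        m = u.mean(axis=0)
        s = np.sqrt(2.0) / np.linalg.norm(u - m, axis=1).mean()
        T = np.array([[s, 0.0, -s * m[0]], [0.0, s, -s * m[1]], [0.0, 0.0, 1.0]])
        return np.column_stack([u, np.ones(len(u))]) @ T.T, T
    a, Ta = normalize(u0)
    b, Tb = normalize(u1)
    rows = []
    for (x, y, w), (xp, yp, wp) in zip(a, b):
        rows.append([0, 0, 0, -wp * x, -wp * y, -wp * w, yp * x, yp * y, yp * w])
        rows.append([wp * x, wp * y, wp * w, 0, 0, 0, -xp * x, -xp * y, -xp * w])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    Hn = Vt[-1].reshape(3, 3)            # right singular vector of smallest value
    H = np.linalg.inv(Tb) @ Hn @ Ta      # undo the normalizing transforms
    return H / H[2, 2]

# synthetic check: rotation-only cameras induce an exact homography
rng = np.random.default_rng(0)
K0 = np.array([[1000.0, 0, 330], [0, 1000.0, 230], [0, 0, 1.0]])
K1 = np.array([[1100.0, 0, 330], [0, 1100.0, 230], [0, 0, 1.0]])
t = np.deg2rad(10.0)
R = np.array([[np.cos(t), 0, np.sin(t)], [0, 1, 0], [-np.sin(t), 0, np.cos(t)]])
X = rng.random((50, 3)) - 0.5 + np.array([0.0, 0.0, 4.5])  # points in a unit cube
p0 = (K0 @ X.T).T; u0 = p0[:, :2] / p0[:, 2:]
p1 = (K1 @ R @ X.T).T; u1 = p1[:, :2] / p1[:, 2:]
H = dlt_homography(u0, u1)
Htrue = K1 @ R @ np.linalg.inv(K0); Htrue = Htrue / Htrue[2, 2]
```

With noiseless matches the DLT recovers the true inter-image homography up to rounding; with noisy matches it gives a reasonable starting point for the statistically optimal estimators mentioned above.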
Now let us consider the left-hand side of this equation, B = H̄_k K̄_0 K̄_0^T H̄_k^T. After some algebraic calculation, the elements of B can be written as

b_11 = α_0² h_11² + β_0² h_12² + h_13²   (27)
b_12 = α_0² h_11 h_21 + β_0² h_12 h_22 + h_13 h_23   (28)
b_13 = α_0² h_11 h_31 + β_0² h_12 h_32 + h_13 h_33   (29)
b_22 = α_0² h_21² + β_0² h_22² + h_23²   (30)
b_23 = α_0² h_21 h_31 + β_0² h_22 h_32 + h_23 h_33   (31)
b_33 = α_0² h_31² + β_0² h_32² + h_33²   (32)
by using H̄_k = [h_ij]. Since b_12 = b_13 = b_23 = 0 from Eq. (25), we have three equations to compute α_0 and β_0:

[ h_11 h_21   h_12 h_22 ]            [ −h_13 h_23 ]
[ h_11 h_31   h_12 h_32 ] [α_0²]  =  [ −h_13 h_33 ]   (33)
[ h_21 h_31   h_22 h_32 ] [β_0²]     [ −h_23 h_33 ]

Notice that the total number of equations for α_0 and β_0 is 3N given N homographies. After computing α_0 and β_0, we can find α_k and β_k using the rest of the b_ij's:

α_k² = b_11 / b_33 = (α_0² h_11² + β_0² h_12² + h_13²) / (α_0² h_31² + β_0² h_32² + h_33²)   (34)

β_k² = b_22 / b_33 = (α_0² h_21² + β_0² h_22² + h_23²) / (α_0² h_31² + β_0² h_32² + h_33²).   (35)
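The linear computation of Eqs. (33)–(35) can be sketched as follows; the function name and the synthetic calibration values are our own illustrative assumptions:

```python
import numpy as np

def focal_lengths(Hbars):
    """Solve Eq. (33) for (alpha_0, beta_0) by stacking b12 = b13 = b23 = 0 over
    all principal-point-free homographies, then apply Eqs. (34)-(35) per view."""
    A, rhs = [], []
    for h in Hbars:
        A.append([h[0, 0] * h[1, 0], h[0, 1] * h[1, 1]]); rhs.append(-h[0, 2] * h[1, 2])
        A.append([h[0, 0] * h[2, 0], h[0, 1] * h[2, 1]]); rhs.append(-h[0, 2] * h[2, 2])
        A.append([h[1, 0] * h[2, 0], h[1, 1] * h[2, 1]]); rhs.append(-h[1, 2] * h[2, 2])
    (a0sq, b0sq), *_ = np.linalg.lstsq(np.array(A), np.array(rhs), rcond=None)
    result = [(np.sqrt(a0sq), np.sqrt(b0sq))]
    for h in Hbars:
        den = a0sq * h[2, 0]**2 + b0sq * h[2, 1]**2 + h[2, 2]**2
        ak = np.sqrt((a0sq * h[0, 0]**2 + b0sq * h[0, 1]**2 + h[0, 2]**2) / den)  # Eq. (34)
        bk = np.sqrt((a0sq * h[1, 0]**2 + b0sq * h[1, 1]**2 + h[1, 2]**2) / den)  # Eq. (35)
        result.append((ak, bk))
    return result

# synthetic check with assumed values: Hbar_1 = Kbar_1 R_1 Kbar_0^{-1}
def rot(axis, deg):
    t = np.deg2rad(deg); c, s = np.cos(t), np.sin(t)
    return (np.array([[1, 0, 0], [0, c, -s], [0, s, c]]) if axis == 'x'
            else np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]]))

Kb = lambda a, b: np.diag([a, b, 1.0])
R1 = rot('x', 7.0) @ rot('y', 7.0)
Hbar1 = Kb(1100.0, 1130.0) @ R1 @ np.linalg.inv(Kb(1000.0, 1030.0))
est = focal_lengths([Hbar1])
```

Note that both Eq. (33) and the ratios (34)–(35) are invariant to the unknown scale of H̄_k, so no homography normalization is required.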
This means that given the principal points of the cameras we can compute the other internal camera parameters linearly. In other words, when we have an approximation of the principal points, we may obtain a linear approximation of the focal lengths (α_k, β_k). Now we define a nonlinear error function to find the optimal calibration parameters, including the principal points. Using the relationship

R_k = K_k^{-1} H_k K_0 / (det(K_k^{-1} H_k K_0))^{1/3},   (36)

we minimize the following error function:

E = Σ_{k=1}^{N} ( ||R_k R_k^T − I||_F² + ||R_k^T R_k − I||_F² ).   (37)
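Equations (36) and (37) translate directly into code. A sketch (function names and the synthetic camera values are our own):

```python
import numpy as np

def rotation_from_homography(H, K0, Kk):
    # Eq. (36): strip the calibrations and normalize the determinant to one
    M = np.linalg.inv(Kk) @ H @ K0
    return M / np.cbrt(np.linalg.det(M))

def error_E(Hs, K0, Ks):
    # Eq. (37): total deviation of the recovered R_k from orthogonality
    E = 0.0
    for H, Kk in zip(Hs, Ks):
        R = rotation_from_homography(H, K0, Kk)
        E += np.linalg.norm(R @ R.T - np.eye(3), 'fro')**2
        E += np.linalg.norm(R.T @ R - np.eye(3), 'fro')**2
    return E

# with the true calibrations the error vanishes; with wrong ones it does not
K0 = np.array([[1000.0, 0, 330], [0, 1030.0, 230], [0, 0, 1.0]])
K1 = np.array([[1100.0, 0, 330], [0, 1130.0, 230], [0, 0, 1.0]])
t = np.deg2rad(10.0)
R1 = np.array([[np.cos(t), 0, np.sin(t)], [0, 1, 0], [-np.sin(t), 0, np.cos(t)]])
H1 = K1 @ R1 @ np.linalg.inv(K0)

E_true = error_E([H1], K0, [K1])
K0_wrong = np.array([[800.0, 0, 330], [0, 800.0, 230], [0, 0, 1.0]])
E_wrong = error_E([H1], K0_wrong, [K1])
```

The cube-root normalization keeps R_k insensitive to the unknown scale of H_k, which is why E depends only on the hypothesized calibration matrices.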
For the nonlinear optimization, we have two approaches. One is the area-search method, which searches the whole or a part of the image space for the principal points that minimize the error function; the other is the use of a gradient-based minimization algorithm such as the conjugate gradient method or the Levenberg-Marquardt method. In the latter case, initial values may be obtained by assuming that the principal points are the image centers and computing the other calibration parameters using the linear algorithm. However, there is a possibility that the solution may not be the global minimum due to errors in the image matches or in the computation of the inter-image homographies. As mentioned previously, we need at least four inter-image homographies for computing time-varying calibration parameters. However, the number of homographies can be reduced if we place some restrictions on the camera model. If we assume that the principal point does not move at all during zooming or focusing and that the aspect ratio is known, only the focal lengths will vary. In this case one inter-image homography is enough to calibrate the camera using the area-search method, which will be discussed in Sect. 5. In Sect. 6, we generalize the model by assuming the aspect ratio is unknown, in which case two inter-image homographies are
required. Assuming that the principal point is fixed, we will use the area-search method to find the global minimum. Finally, all the calibration parameters are assumed to vary except the skew, which is discussed in Sect. 7. This case is the most general, and we will use an iterative optimization method such as the Levenberg-Marquardt iteration.

5. Fixed Principal Point with Known Aspect Ratio

We consider the camera matrix when the aspect ratio and principal point are fixed with respect to time. That is, the principal point is the same during the sequence and the aspect ratio is known to be one. Only the focal length varies with zooming and focusing. In this case, the number of unknown parameters is seven: three for the rotation, two for the focal lengths, and two for the principal point. One image-to-image homography H_k, from which we have eight equations, is sufficient to compute the unknown parameters. Because the aspect ratio is assumed to be known (α_k = β_k = f_k), Eqs. (27)–(32) have slightly different forms, and we come to have three equations to compute f_0:

     [ h_11 h_21 + h_12 h_22 ]   [ −h_13 h_23 ]
f_0² [ h_11 h_31 + h_12 h_32 ] = [ −h_13 h_33 ]   (38)
     [ h_21 h_31 + h_22 h_32 ]   [ −h_23 h_33 ]

and finally f_k can be computed using two equations:

f_k² = (f_0² (h_11² + h_12²) + h_13²) / (f_0² (h_31² + h_32²) + h_33²)   (39)
     = (f_0² (h_21² + h_22²) + h_23²) / (f_0² (h_31² + h_32²) + h_33²).   (40)

Using the definition of the error function E, we parameterize the error function, Eq. (41), by the principal point (x, y), which is fixed in time, and the calibration parameters are found using the algorithm proposed in Fig. 1:

E(x, y) = Σ_{k=1}^{N} ( ||R_k R_k^T − I||_F² + ||R_k^T R_k − I||_F² ).   (41)

1. Set principal points: x ← x̄ and y ← ȳ.
2. Compute f_k, k = 0, ..., N, using all homographies.
3. Compute the error E defined by Eq. (41).
4. If E is smaller than the previous one, record the calibration parameters.
5. Repeat 1–4 over the searching area.
6. The calibration parameters are the recorded ones.

Fig. 1  Algorithm for computing calibration parameters.
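The area search of Fig. 1 can be sketched as follows for a single homography; the grid and the synthetic camera values are illustrative assumptions:

```python
import numpy as np

def estimate_f(H, x, y):
    """Given a hypothesized fixed principal point (x, y), solve Eq. (38) for f0
    by least squares and Eq. (39) for f1."""
    C = np.array([[1.0, 0, x], [0, 1.0, y], [0, 0, 1.0]])
    h = np.linalg.inv(C) @ H @ C                        # principal-point-free H
    A = np.array([h[0, 0] * h[1, 0] + h[0, 1] * h[1, 1],
                  h[0, 0] * h[2, 0] + h[0, 1] * h[2, 1],
                  h[1, 0] * h[2, 0] + h[1, 1] * h[2, 1]])
    b = -np.array([h[0, 2] * h[1, 2], h[0, 2] * h[2, 2], h[1, 2] * h[2, 2]])
    f0sq = (A @ b) / (A @ A)                            # 1-D least squares
    f1sq = (f0sq * (h[0, 0]**2 + h[0, 1]**2) + h[0, 2]**2) / (
            f0sq * (h[2, 0]**2 + h[2, 1]**2) + h[2, 2]**2)   # Eq. (39)
    return np.sqrt(abs(f0sq)), np.sqrt(abs(f1sq)), h

def area_search(H, candidates):
    """Fig. 1: scan candidate principal points; keep the one minimizing Eq. (41)."""
    best = None
    for x, y in candidates:
        f0, f1, h = estimate_f(H, x, y)
        R = np.diag([1/f1, 1/f1, 1.0]) @ h @ np.diag([f0, f0, 1.0])
        R = R / np.cbrt(np.linalg.det(R))
        E = (np.linalg.norm(R @ R.T - np.eye(3), 'fro')**2
             + np.linalg.norm(R.T @ R - np.eye(3), 'fro')**2)
        if best is None or E < best[0]:
            best = (E, x, y, f0, f1)
    return best

# synthetic camera with f0 = 1000, f1 = 1100, principal point (330, 230)
def rot(axis, deg):
    t = np.deg2rad(deg); c, s = np.cos(t), np.sin(t)
    return (np.array([[1, 0, 0], [0, c, -s], [0, s, c]]) if axis == 'x'
            else np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]]))

K = lambda f: np.array([[f, 0, 330.0], [0, f, 230.0], [0, 0, 1.0]])
H = K(1100.0) @ rot('x', 10.0) @ rot('y', 10.0) @ np.linalg.inv(K(1000.0))
grid = [(x, y) for x in range(320, 341, 5) for y in range(220, 241, 5)]
E, x, y, f0, f1 = area_search(H, grid)
```

With a noiseless homography the error is essentially zero at the true principal point and strictly positive elsewhere, so the coarse grid already identifies the correct cell; in practice a finer search around the best cell would follow.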
5.1 A Consideration about Rotation Axes

When the rotation is only about the x-axis or the y-axis, we should not use all three equations in Eq. (38). When the x-axis is the rotation axis, only the equation b_23 = 0 is valid. When the y-axis is the rotation axis, only the equation b_13 = 0 is valid. These analyses are important because the axis of the rotation is usually the x-axis, the y-axis, or a composition of the two, and the rotation about the z-axis is usually small.

First we investigate the case when the rotation is about the x-axis. Suppose that we are given two images and know the homography H between them; for now we ignore the effect of estimation errors in the computation of H. Since the rotation is about the x-axis, we are given H̄ of the form

H̄_true = K̄_1 R_θ^X K̄_0^{-1}
        = [ f_1  0    0 ] [ 1  0   0 ] [ f_0^{-1}  0         0 ]
          [ 0    f_1  0 ] [ 0  c  −s ] [ 0         f_0^{-1}  0 ]
          [ 0    0    1 ] [ 0  s   c ] [ 0         0         1 ]
        = [ f_1 f_0^{-1}  0               0     ]
          [ 0             c f_1 f_0^{-1}  −s f_1 ]
          [ 0             s f_0^{-1}      c      ]

where c = cos(θ) and s = sin(θ). Note that the equations b_12 = 0 and b_13 = 0 then hold identically when there are no errors in the principal point and the homography. Thus, only the equation b_23 = 0 is valid for the computation of the calibration parameters. A similar analysis holds for rotation about the y-axis; in that case, only the equation b_13 = 0 is valid for the computation of f_0.

5.2 Experiment with Synthetic Data

Using synthetic data, we tested our calibration algorithm and determined its performance in the presence of noise. The 3D coordinates of the synthetic data were generated randomly so that they lie in a cube of 1 × 1 × 1, and the location of the center of the cube in the camera coordinate system was chosen to be (0, 0, 4.5)^T. Noiseless image coordinates were computed using the true rotation axes, angles, and calibration parameters for each frame. Then we added Gaussian noise with zero mean and standard deviation σ to each of the image coordinates (u, v) and computed the inter-image homographies. Therefore the actual RMS pixel error is √2 times the indicated value σ.

Table 1 shows the results of camera calibration for various image noise levels. The first row shows the true values of the calibration parameters: the focal lengths of the two cameras are 1000 and 1100 and the principal point is (330, 230). The in-between rotation angles are 10° about the x-axis and 10° about the y-axis. The
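The √2 factor arises because the squared pixel error sums independent N(0, σ²) noise on the u and v coordinates; a quick Monte-Carlo check:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
# independent N(0, sigma^2) noise on each of the (u, v) coordinates
noise = rng.normal(0.0, sigma, size=(200_000, 2))
rms = np.sqrt((noise**2).sum(axis=1).mean())   # RMS pixel displacement
# rms is approximately sqrt(2) * sigma
```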
Table 1  Computation results with one run at each noise level.

Noise | f0     | f1     | x     | y     | rx    | ry   | rz
0     | 1000   | 1100   | 330   | 230   | 10°   | 10°  | 0°
0.5   | 1021.7 | 1123.9 | 333.0 | 232.0 | 9.90  | 9.69 | 0.02
0.7   | 986.6  | 1081.2 | 344.0 | 238.0 | 10.17 | 9.85 | 0.07
1.0   | 972.2  | 1073.5 | 347.0 | 258.0 | 10.38 | 9.32 | 0.27
Table 2  Computation results after 100 runs at each noise level. Two views were used in the simulation. For each noise level, the first row shows the mean values of the parameters and the next row shows the standard deviations.

Noise      | f0     | f1     | x     | y     | rx    | ry    | rz
0          | 1000   | 1100   | 330   | 230   | 10°   | 10°   | 0°
0.5 (mean) | 999.7  | 1099.9 | 330.0 | 230.1 | 10.0  | 9.98  | 0
    (std)  | 12.1   | 13.5   | 8.5   | 9.8   | 0.13  | 0.29  | 0.11
0.7 (mean) | 1002.0 | 1102.3 | 330.8 | 230.7 | 9.99  | 9.97  | 0.01
    (std)  | 20.6   | 23.5   | 10.8  | 13.1  | 0.23  | 0.33  | 0.13
1.0 (mean) | 1000.5 | 1101.1 | 328.6 | 229.1 | 10.02 | 10.01 | -0.01
    (std)  | 24.6   | 25.8   | 18.0  | 15.5  | 0.24  | 0.45  | 0.19
Table 3  Computation results after 100 runs at each noise level when ten views were used in the simulation. Among the calibration parameters, those for the first two views are listed; standard deviations are given in parentheses. Compared with the values in Table 2, the standard deviations of the focal lengths are about half, and the standard deviations for the principal point and rotation angles are also reduced or similar.

Noise | f0            | f1            | x             | y            | rx           | ry           | rz
0     | 1000          | 1100          | 330           | 230          | 10°          | 10°          | 0°
0.5   | 999.7 (7.4)   | 1099.0 (9.4)  | 330.8 (7.9)   | 228.7 (6.4)  | 10.03 (0.08) | 9.96 (0.28)  | 0 (0.07)
0.7   | 998.3 (11.6)  | 1099.8 (15.2) | 330.5 (10.18) | 229.1 (11.4) | 10.05 (0.11) | 10.01 (0.48) | -0.01 (0.14)
1.0   | 996.6 (16.8)  | 1097.2 (20.0) | 332.4 (11.5)  | 228.8 (11.5) | 10.09 (0.17) | 9.93 (0.51)  | -0.03 (0.15)

number of image matches used in the computation of the homography is about 80–120. Table 2 shows the calibration results of 100 runs with two views using the same simulation parameters. It shows the mean value and the standard deviation of each parameter at each noise level. The computed parameters f0 and f1 show some correlation between them, whereas the others show little. Table 3 shows the computation results for the first two views when ten views are used in the simulation with 100 runs at each noise level. When compared with the results listed in Table 2, the standard deviations are noticeably reduced, which means the computation using multiple views yields better results.
5.3 Experiments with Real Data
Now, we show the result of testing our algorithm using two images of a calibration pattern. The images, obtained from a TV camera on a tripod, are I0 and I1 in Fig. 2. The calibration pattern in the scene is just for obtaining point matches from which inter-image homographies are computed; no 3D information of the calibration pattern is utilized in the auto-calibration. The calibration parameters were not changed between the two views, and we tried to keep the rotation axis limited to the y-axis. The sizes of the images are 640 × 480. The TV camera used in the acquisition of the images in

Fig. 2  Three images (I0, I1, I2) of the calibration pattern. This 3D calibration pattern is for obtaining image matching points from which inter-image homographies are computed. No 3D information of the calibration pattern is utilized in the auto-calibration of the camera. The image sizes are 640 × 480. The calibration parameters are fixed during the in-between rotation.
Table 4  Estimation error of the homography. Sixty-one matches have a transfer error of 0–0.5 pixels and fifty matches 0.5–1.0 pixels. The RMS error is 0.64 pixels and the maximum error is 1.80 pixels.

ERROR          | 0–0.5 | 0.5–1.0 | 1.0–1.5 | 1.5–2.0 | RMS  | MAX
No. of matches | 61    | 50      | 9       | 2       | 0.64 | 1.80
Table 5  Calibration result for the two images, I0 and I1, in Fig. 2. Notice that the focal lengths are almost the same and the rotation is almost entirely about the y-axis.

f0    | f1    | (x, y)     | (rx, ry, rz)
573.7 | 572.8 | (302, 228) | (0.41°, −16.97°, 0.52°)
Table 6  Estimation error of the homography. Forty matches have a transfer error within 0–0.5 pixels, etc. The RMS error is 0.93 pixels and the maximum error is 2.81 pixels.

ERROR          | 0–0.5 | 0.5–1.0 | 1.0–1.5 | 1.5–2.0 | 2.0–3.0 | RMS  | MAX
No. of matches | 40    | 54      | 26      | 3       | 4       | 0.93 | 2.81
Table 7  Calibration result for the two images I1 and I2 in Fig. 2. The focal lengths differ because of zooming, and the rotation is almost entirely about the y-axis.

f1    | f2    | (x, y)         | (rx, ry, rz)
543.0 | 768.9 | (315.5, 217.5) | (4.09°, 20.68°, −2.36°)
Table 8  Calibration result for the three images in Fig. 2. The rotation angles for θ(1, 0) have reversed signs because the reference view is chosen to be I1.

f0    | f1    | f2    | (x, y)         | θ(1, 0)                  | θ(1, 2)
560.0 | 556.1 | 794.8 | (300.0, 217.0) | (−0.24°, 16.50°, −0.73°) | (3.93°, 21.17°, −2.35°)

Fig. 2 and Fig. 4 was a HITACHI SK-2600PW with a Fujinon A8.5 × 5.5VM/RD lens. A total of 122 point matches were found automatically, and the homography was computed. The estimation error of the homography is shown in Table 4, where the error is defined by the distance between a point of the first image and the transferred point of the second image:

E(p_1, p_2; H) = ||p_1 − H p_2||_2.   (42)

The homography's transfer error shows that 111 points (91%) are within 1.0 pixel distance and its RMS error is 0.64 pixels. The calibration result is shown in Table 5: f0 = 573.7, f1 = 572.8, (x, y) = (302, 228). The computed rotation angles are (0.41°, −16.97°, 0.52°), indicating that the rotation is almost entirely about the y-axis.

5.3.1 Pattern 2 and 3

Now we show the results of testing our algorithm on the images I1 and I2, which involve some zooming, as shown in Fig. 2. The rotation axis was also almost entirely the y-axis. The number of matching points was 127, and the error statistics are shown in Table 6. The resulting RMS transfer error is 0.93. The calibration result is shown in Table 7: f1 = 543.0, f2 = 768.9 and (x, y) = (315.5, 217.5). The computed rotation angles are (4.09°, 20.68°, −2.36°).

1. Set principal points: x ← x̄ and y ← ȳ.
2. Compute α_k and β_k, k = 0, ..., N, using all homographies.
3. Compute the error E defined by Eq. (43).
4. If E is smaller than the previous one, record the calibration parameters.
5. Repeat 1–4 over the searching area.
6. The optimal calibration parameters are the recorded ones.

Fig. 3  Algorithm for computing calibration parameters when the principal point is fixed and the aspect ratios are unknown. Note that the error function is parameterized by the principal point.

5.3.2 Pattern 1, 2 and 3

Table 8 shows the result of calibration using the three views. As before, the focal lengths of the first and second images are almost the same, and the rotation angles indicate that the rotation axes are almost the y-axis. However, this time the signs of the rotation angles are reversed for the first and second views, since the reference view is changed from I0 to I1.

6. Fixed Principal Point
Now we consider cameras with an unknown but fixed principal point. This time, however, the aspect ratio is also unknown. In this case, the number of unknown
Table 9  Computation results with 100 runs at each noise level when three views were used. The parameters for the first camera are listed; standard deviations are given in parentheses.

Noise | (α0, β0)                      | (x, y)                      | θ01 (rx, ry, rz)
0     | (1000, 1030)                  | (330, 230)                  | (7°, 7°, −1°)
0.5   | (1001.0, 1029.7) (10.6, 19.5) | (329.8, 230.9) (15.4, 19.7) | (7.01, 6.99, −1.00) (0.21, 0.21, 0.11)
0.7   | (995.4, 1031.4) (15.3, 24.5)  | (327.9, 229.6) (20.2, 30.1) | (6.98, 7.03, −1.03) (0.28, 0.23, 0.17)
1.0   | (1001.0, 1032.8) (20.4, 44.1) | (327.5, 229.5) (30.0, 40.0) | (7.0, 7.02, −1.0) (0.45, 0.38, 0.22)

Table 10  Computation results when ten views were used, with 100 runs at each noise level. Standard deviations are given in parentheses.

Noise | (α0, β0)                      | (x, y)                      | θ01 (rx, ry, rz)
0     | (1000, 1030)                  | (330, 230)                  | (7°, 7°, −1°)
0.5   | (1000.1, 1031.2) (6.5, 8.5)   | (328.3, 231.1) (7.1, 7.5)   | (6.99, 7.02, −0.99) (0.05, 0.21, 0.06)
0.7   | (997.9, 1031.1) (9.2, 13.7)   | (326.0, 232.6) (10.3, 11.6) | (7.0, 6.98, −0.98) (0.09, 0.36, 0.09)
1.0   | (997.4, 1030.3) (13.2, 15.9)  | (327.6, 235.4) (13.9, 14.5) | (7.02, 6.94, −0.96) (0.12, 0.44, 0.10)

parameters is nine for one image-to-image homography H_k, which provides eight equations. Hence, at least three images, or equivalently two image-to-image homographies, are needed to compute all the parameters (given two homographies we have 16 equations to compute 14 unknowns). Since the aspect ratio is assumed to be unknown, we can use the equations detailed in Sect. 4. That is, we search for the principal point around the image center that minimizes the error function defined by Eq. (43), as described in Fig. 3. Note that the error function E is parameterized by the two variables that make up the principal point (x, y). That is,

E(x, y) = Σ_{k=1}^{N} ( ||R_k R_k^T − I||_F² + ||R_k^T R_k − I||_F² ).   (43)

Although this iteration may take more time, it is certain to find the global minimum.

6.1 Exceptional Cases

When the rotation axis is only the x-axis for all the input images, it is impossible to compute the scale factors α_k. Also, when the rotation axis is the y-axis, one cannot compute the β_k's. This is due to the special forms of the rotation matrices in these two cases. When the rotations are about the x-axis, the rotation matrices are of the form

R_{X,θ_k}^k = [ 1  0   0 ]
              [ 0  c  −s ]   (44)
              [ 0  s   c ]

where c = cos θ_k, s = sin θ_k and θ_k is the rotation angle between the 0-th camera and the k-th camera. Notice that

R_{X,θ_k}^k = D(λ, 1, 1) R_{X,θ_k}^k D(λ^{-1}, 1, 1)   (45)

where

D(λ, 1, 1) = diag(λ, 1, 1).   (46)

That is, we have multiple solutions for the calibration parameters that satisfy the equation H_k = K_k R_k K_0^{-1}; more specifically, we cannot determine the exact α_k's, because if K_k is a solution then K_k D(λ, 1, 1) is a solution, too. In the case of rotations about the y-axis, it is impossible to compute unique β_k's for the same reason: if K_k is a solution then K_k D(1, λ, 1) is also a solution. In conclusion, the rotation axis should be neither purely the x-axis nor purely the y-axis for a unique computation of the calibration parameters. Except for those two cases, the rotation axis may be fixed. For example, images of the camera rotated about the axis (1/√2, 1/√2, 0) can be used to compute the calibration parameters. This problem was investigated previously in [6] for the auto-calibration of fixed internal parameters.

6.2 Experiment with Synthetic Data

The simulation data was obtained in the same environment as in the previous section. Table 9 shows the calibration results of 100 runs with three images. Only the parameters for the first camera are listed. In Table 10 we show the result of the simulation when ten views were used, with 100 runs at each noise level. Among the parameters, those of the first camera are shown. It is clear that the result is better than that of three views, indicating that it is important to use multiple views to compute the calibration parameters accurately.
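The ambiguity of Eqs. (44)–(46) is easy to confirm numerically. In the sketch below (with a pure x-axis rotation and diagonal K̄'s for simplicity; the focal lengths and λ are made-up values), scaling the α's of both cameras by any λ leaves the inter-image homography unchanged:

```python
import numpy as np

def Rx(deg):
    t = np.deg2rad(deg); c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0, 0], [0, c, -s], [0, s, c]])

D = lambda lam: np.diag([lam, 1.0, 1.0])

K0 = np.diag([1000.0, 1030.0, 1.0])
K1 = np.diag([1100.0, 1130.0, 1.0])
R = Rx(10.0)

H = K1 @ R @ np.linalg.inv(K0)

# Eq. (45): D(lam) R_X D(1/lam) = R_X, so the calibrations (K1 D, K0 D)
# explain the same homography -> the alpha's are unobservable
lam = 1.7
H_scaled = (K1 @ D(lam)) @ R @ np.linalg.inv(K0 @ D(lam))
```

The analogous demonstration with D(1, λ, 1) and a pure y-axis rotation shows the unobservability of the β's.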
Table 11  Calibration parameters computed for the views depicted in Fig. 4. We tried to fix the zoom factor for the views I1–I4. The calibration seems very plausible because the scale factors for the last four views are almost the same and the principal point is near the center of the image. Also, the mean of the aspect ratios is 1.028 and the maximum deviation from the mean is within 0.3%. Estimated principal point: (312.6, 218.8).

View   (αk, βk)          Rotation Angles         Aspect Ratio
I0     (630.2, 612.6)    (reference)             1.029
I1     (815.1, 790.5)    (6.84, -6.35, 0.70)     1.031
I2     (808.6, 788.7)    (10.46, 10.62, -2.29)   1.025
I3     (810.6, 789.9)    (2.23, 16.66, -1.35)    1.026
I4     (812.0, 790.0)    (6.64, 3.45, -0.67)     1.028

Table 12  Calibration result of one run at each noise level when the parameters are all varying.

Noise Level   (α0, β0)            (rx, ry, rz)                (x0, y0)
0             (1000, 1030)        (7°, 10°, -1°)              (330, 231)
0.5           (1002.3, 1034.9)    (7.06°, 10.28°, -0.92°)     (328, 228)
0.7           (994.4, 1027.0)     (7.08°, 10.05°, -0.98°)     (333, 228)
1.0           (988.3, 1011.5)     (7.25°, 10.71°, -0.96°)     (324, 235)

Fig. 4  Five images of size 640 × 480 used in the experiment. The first image I0 is selected as the reference view. The calibration pattern shown in the images serves only for obtaining point matches automatically, without concern for any image-matching problem; hence no 3D information of the pattern is utilized in the auto-calibration of the camera.

6.3 Experiment with Real Data

The algorithm was applied to matches from five images of size 640 × 480, which are depicted in Fig. 4. The images were obtained from a TV camera on a tripod. As in Sect. 5, the calibration pattern in the scene serves only for obtaining the point matches from which the inter-image homographies are computed; hence no 3D information of the calibration pattern is utilized in the auto-calibration. The reference view is I0, and the matches between the images were obtained automatically. The number of matches was about 130–140, and the RMS errors of the homography estimation ranged from 0.99 to 1.22 pixels. There is zooming between the reference view and the other views, and we tried to keep the zoom factor fixed for the other four views. The calibration result is shown in Table 11, where the principal point, scale factors, and rotation angles with respect to the reference view are listed. The result seems very plausible, since the scale factors for the last four views are almost the same and the principal point is near the center of the image. Also, the aspect ratios lie in the range (1.025, 1.031); their mean is 1.028 and their deviations from the mean are within 0.3%. In practice, the aspect ratio does not change significantly during image acquisition.
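The inter-image homographies in these experiments are estimated from point matches. As a minimal illustration (a plain DLT sketch, not the optimal estimators of [8], [9]): each match contributes two linear equations in the nine entries of H, and the solution is the null vector of the stacked system, obtained by SVD. The matrix H_true and the points below are synthetic stand-ins for real matches:

```python
import numpy as np

def homography_dlt(pts0, pts1):
    # Basic DLT: pts1 ~ H @ pts0 (homogeneous), needs >= 4 matches.
    A = []
    for (x, y), (u, v) in zip(pts0, pts1):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # h = right singular vector with the smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 3)

# Synthetic check: transfer points through a known H, then recover it.
H_true = np.array([[1.1,   0.02,  5.0],
                   [-0.01, 0.95, -3.0],
                   [1e-4,  2e-4,  1.0]])
pts0 = [(10.0, 20.0), (200.0, 40.0), (50.0, 300.0),
        (400.0, 400.0), (120.0, 150.0)]
pts1 = []
for x, y in pts0:
    p = H_true @ np.array([x, y, 1.0])
    pts1.append((p[0] / p[2], p[1] / p[2]))

H_est = homography_dlt(pts0, pts1)
H_est /= H_est[2, 2]          # H is defined only up to scale
print(np.allclose(H_est, H_true, atol=1e-5))
```

With noisy real matches, coordinate normalization and the statistically optimal methods cited above give noticeably better results than this bare sketch.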
7. Varying Internal Parameters
Finally, we deal with the most general case, assuming only that the skew of the calibration matrix is zero (s = 0). From Eq. (22), if we have four H^k's (N = 4) then there are 7N + 4 = 32 unknowns and 8N = 32 equations; therefore, we can compute the internal calibration parameters and the rotation matrices. In this computation we cannot use the area-searching algorithm because of the explosion of the search space; instead, we use an iterative minimization method such as the Levenberg–Marquardt (LM) method [4]. We find the calibration parameters that minimize the following
Table 13  Calibration parameters computed for the views depicted in Fig. 4. Here the iterative method was used, with the initial parameters obtained by the method of Sect. 6. The calibration seems very plausible because the focal lengths for the last four views are almost the same and the principal points are near the center of the image. Also, the mean of the aspect ratios is 1.048 and the maximum deviation from the mean is within 0.4%.

View   (αk, βk)          Rotation Angles         Principal Point   Aspect Ratio
I0     (649.7, 618.3)    (reference)             (313, 225)        1.051
I1     (840.6, 799.2)    (6.97, -6.56, 0.64)     (300, 228)        1.052
I2     (832.8, 797.2)    (10.65, 10.95, -2.11)   (319, 235)        1.045
I3     (831.5, 796.6)    (2.28, 17.17, -1.33)    (329, 221)        1.044
I4     (837.8, 798.8)    (6.76, 3.56, -0.64)     (308, 228)        1.049
error function iteratively, where the initial parameter values are computed using the method of Sect. 6:

\[
E = \sum_{k=1}^{N} \left( \| R_k R_k^{\top} - I \|_F^2 + \| R_k^{\top} R_k - I \|_F^2 \right). \tag{47}
\]

7.1 Experiment with Synthetic Data

Using the same environment as in Sect. 5, we generated synthetic image data; the result of the minimization is shown in Table 12. As expected, the calibration degrades as the input noise increases.

7.2 Experiment with Real Data

Using the previous matches of the images in Fig. 4, we tested the performance of our algorithm. Table 13 shows the calibration result. The result is again very plausible, since the scale factors for the last four views are almost the same and the principal points are near the center of the image. Also notice that the aspect ratio is almost constant. Compared with the result shown in Table 11, the calibration result of Table 13 seems more accurate in the sense that the error computed at the optimal calibration parameters (E* = 4.36 × 10⁻⁵) is smaller than the error for the parameters of Table 11 (E* = 9.75 × 10⁻⁴).

Fig. 5  Three images of a soccer game, sampled from a broadcast TV signal, are mosaicked into one. As this example shows, when it is not easy to obtain point or line matches from the input images, a mosaicking method can be applied to obtain the homographies. Here the direct non-linear iterative method of [18] was adopted to compute the homographies. The optimization method using two homography matrices gave the following result: (u0, v0) = (336.4, 236.6), f̂0 = 1028.4, f̂1 = 1054.3, and f̂2 = 1260.2. The rotation angles were (2.72°, 2.06°, 0.41°) and (4.49°, 10.26°, 1.08°), from left to right.
8. Discussions and Conclusion
In this paper we proposed a method for the auto-calibration of a rotating and zooming camera and showed that calibration is possible without a 3D pattern. Theoretically, we showed that auto-calibration is possible up to an orthogonal transformation under the assumption that the skew of the camera is zero. In the most general case, where the focal lengths and the principal point may vary, at least four inter-image homographies are needed to find the calibration parameters, and a nonlinear iterative minimization is utilized. By assuming that the principal point is fixed, we can compute the calibration parameters using two homographies. Finally, by fixing the aspect ratio of the camera as well as the principal point, one homography suffices. In these two cases, the focal lengths are parameterized by the principal point, and the calibration parameters are found by a search algorithm in the two-dimensional image space.

One practical problem in the auto-calibration of a camera may be lens distortion when a wide field of view is used. This paper did not deal with this problem, but lens distortion can be removed by utilizing additional geometric information in the scene, as described in [13], [17], [20].

Rotating and zooming video sequences are encountered quite often, particularly in sports broadcasting, where auto-calibration is important for the analysis of image sequences. For example, Fig. 5 was obtained using a direct non-linear iterative method [18]. Auto-calibration could be used for 3D information extraction, such as estimating the 3D locus of a flying ball in soccer games from images of a rotating monocular camera [10]. Another application area is image-based video regeneration or synthesis using mosaicking techniques, where auto-calibration plays an important role in the synthesis of a new video; an example can be found in [19].
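As a small numerical check of the orthogonality error in Eq. (47): with exact homographies generated from known calibration matrices and rotations, E vanishes at the true parameters and grows once any K_k is perturbed. The sketch below (numpy; all parameter values are hypothetical, not the paper's) uses R_k = K_k^{-1} H^k K_0, which follows from H^k = K_k R_k K_0^{-1}:

```python
import numpy as np

def calib(alpha, beta, u0, v0):
    # zero-skew calibration matrix
    return np.array([[alpha, 0.0, u0],
                     [0.0,  beta, v0],
                     [0.0,  0.0,  1.0]])

def rot(rx, ry, rz):
    # rotation from angles in degrees, R = Rz @ Ry @ Rx (one common convention)
    rx, ry, rz = np.deg2rad([rx, ry, rz])
    Rx = np.array([[1, 0, 0], [0, np.cos(rx), -np.sin(rx)], [0, np.sin(rx), np.cos(rx)]])
    Ry = np.array([[np.cos(ry), 0, np.sin(ry)], [0, 1, 0], [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0], [np.sin(rz), np.cos(rz), 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def energy(Ks, K0, Hs):
    # Eq. (47): R_k = K_k^{-1} H^k K_0 should be orthogonal at the true parameters
    E, I = 0.0, np.eye(3)
    for Kk, H in zip(Ks, Hs):
        R = np.linalg.inv(Kk) @ H @ K0
        E += np.linalg.norm(R @ R.T - I, 'fro')**2 \
           + np.linalg.norm(R.T @ R - I, 'fro')**2
    return E

# Hypothetical ground truth, chosen for illustration only.
K0 = calib(1000.0, 1030.0, 330.0, 231.0)
Ks = [calib(1100.0, 1133.0, 328.0, 229.0), calib(1250.0, 1287.0, 332.0, 233.0)]
Rs = [rot(7.0, 10.0, -1.0), rot(3.0, 15.0, 1.0)]
Hs = [Kk @ R @ np.linalg.inv(K0) for Kk, R in zip(Ks, Rs)]

print(energy(Ks, K0, Hs))                            # ~0 at the true parameters
bad = [calib(1155.0, 1133.0, 328.0, 229.0), Ks[1]]   # alpha_1 off by 5%
print(energy(bad, K0, Hs) > 1e-3)
```

Feeding this energy to a generic least-squares minimizer, started from the Sect. 6 estimates, mirrors the LM iteration described above.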
References
[1] L. de Agapito, R.I. Hartley, and E. Hayman, "Linear calibration of a rotating and zooming camera," Proc. CVPR'99, pp.15–21, Colorado, 1999.
[2] L. de Agapito, E. Hayman, and I. Reid, "Self-calibration of a rotating camera with varying intrinsic parameters," Proc. British Machine Vision Conf., pp.105–114, 1998.
[3] O. Faugeras, Three-Dimensional Computer Vision, MIT Press, Cambridge, Mass., 1993.
[4] R. Fletcher, Practical Methods of Optimization, John Wiley, 1987.
[5] R.I. Hartley, "Self-calibration from multiple views with a rotating camera," ECCV'94, Lecture Notes in Computer Science, vol.800, pp.471–478, 1994.
[6] R.I. Hartley, "Self calibration of stationary cameras," Int. J. Computer Vision, vol.22, no.1, pp.5–23, 1997.
[7] K. Kanatani, Statistical Optimization for Geometric Computation: Theory and Practice, Elsevier Science, 1996.
[8] K. Kanatani, "Optimal homography computation with a reliability measure," Proc. MVA'98, IAPR Workshop on Machine Vision Applications, pp.426–429, Nov. 1998.
[9] K. Kanatani, "Accuracy bounds and optimal computation of homography for image mosaicing applications," Proc. 7th Int. Conf. on Computer Vision, pp.73–78, Kerkyra, Greece, 1999.
[10] T. Kim, Y. Seo, and K.-S. Hong, "Physics-based 3D position analysis of a soccer ball from monocular image sequences," ICCV'98, pp.721–726, 1998.
[11] R. Kumar, P. Anandan, M. Irani, J. Bergen, and K. Hanna, "Representation of scenes from collections of images," IEEE Computer Society Workshop: Representation of Visual Scenes, 1995.
[12] Q. Luong and T. Vieville, "Canonic representations for the geometries of multiple projective views," Proc. 3rd European Conf. on Computer Vision, pp.589–599, Stockholm, Sweden, 1994.
[13] S.-W. Park, Y. Seo, and K.-S. Hong, "Real-time camera calibration for virtual studio," J. Real-Time Imaging, to appear, 1999.
[14] I.R. Shafarevich, Basic Algebraic Geometry I — Varieties in Projective Space, Springer-Verlag, 1988.
[15] Y. Seo and K.-S. Hong, "Autocalibration of a rotating and zooming camera," IAPR Workshop on Machine Vision Applications, pp.274–277, 1998.
[16] Y. Seo and K.-S. Hong, "About the self-calibration of a rotating and zooming camera: Theory and practice," Proc. 7th Int. Conf. on Computer Vision, pp.183–189, IEEE Computer Society Press, Kerkyra, Greece, 1999.
[17] R. Swaminathan and S.K. Nayar, "Non-metric calibration of wide angle lenses and polycameras," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp.413–419, Colorado, 1999.
[18] R. Szeliski, "Image mosaicing for tele-reality applications," Technical Report 94/1, DEC Cambridge Research Lab, April 1994.
[19] R. Szeliski and H.-Y. Shum, "Creating full view panoramic image mosaics and environment maps," Proc. SIGGRAPH 97, Computer Graphics Annual Conf. Series, pp.251–258, 1997.
[20] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," IEEE J. Robotics and Automation, vol.3, no.4, pp.323–344, 1987.
[21] A. Zisserman, D. Liebowitz, and M. Armstrong, "Resolving ambiguities in auto-calibration," Philosophical Transactions of the Royal Society of London, Series A, pp.1193–1211, 1998.
Yongduek Seo received the BS degree in electronic engineering from Kyungpook National University in 1992, and the MS and Ph.D. degrees in electronic and electrical engineering from Pohang University of Science and Technology (POSTECH) in 1994 and 2000, respectively. During April–May 1999 he was a guest researcher in the Mathematical Imaging Group, Lund Institute of Technology, Sweden. His interests include real-time computer vision, camera self-calibration, structure recovery from motion, and augmented reality.
Min-Ho Ahn received his BS degree from the Department of Mathematics, Seoul National University, in 1994 and his MS degree from the Department of Mathematics, POSTECH, in 1996. Currently, he is a Ph.D. student in the Department of Mathematics, POSTECH. His research areas are auto-calibration, reconstruction, and Computer Aided Geometric Design (CAGD).
Ki-Sang Hong received the BS degree in electronic engineering from Seoul National University, Korea, in 1977, and the MS and Ph.D. degrees in electrical and electronic engineering from KAIST (Korea Advanced Institute of Science and Technology) in 1979 and 1984, respectively. During 1984–1986, he was a researcher in the Korea Atomic Energy Research Institute. In 1986, he joined POSTECH (Pohang University of Science and Technology), Korea, where he is currently an associate professor of electronic and electrical engineering. During 1988–1989, he worked in the Robotics Institute at Carnegie Mellon University, Pittsburgh, USA, as a visiting professor. His current research interests include computer vision and pattern recognition.