A New Linear Camera Self-Calibration Technique

ACCV2002: The 5th Asian Conference on Computer Vision, 23--25 January 2002, Melbourne, Australia.

A New Linear Camera Self-Calibration Technique

Hua Li, Huaifeng Zhang, Fuchao Wu and Zhanyi Hu
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
P.O. Box 2728, Beijing 100080, P.R. China
Email: {hfzhang, fcwu, huzy}@nlpr.ia.ac.cn

Abstract

In this paper, a new active vision based camera self-calibration technique is proposed. The novelty of this technique is that it can determine LINEARLY all FIVE intrinsic parameters of a camera. Its basic principle is to use the planar information in the scene and to control the camera to undergo several sets of orthogonal planar motions; a set of linear constraints on the 5 intrinsic parameters is then derived by means of planar homographies between images. In addition, the uniqueness of the calibration solution with respect to the configurations of the camera's motion is investigated.

Keywords: Camera Self-Calibration, Active Vision, Homography

1 Introduction

Camera calibration is an indispensable step in obtaining 3D geometric information from 2D images. With traditional calibration methods, the camera's intrinsic parameters are computed from projected images of a well-structured object called a calibration grid. However, in many practical applications a calibration grid is neither available nor desirable, so a new paradigm arose, called 'self-calibration', i.e., calibration without a calibration grid. Since the pioneering works in [1,2], many similar techniques have been reported in the literature [3-15]. However, almost all such techniques have to solve nonlinear equations, which inevitably leads to either low computational speed or non-convergence. To overcome this difficulty, some researchers explored the possibility of constraining the camera to undergo specially designed motions [16-22]. Ma [21] proposed an active vision based linear calibration method in which two different sets of camera motions, each consisting of 3 mutually orthogonal translations, are used to linearly determine the camera's intrinsic parameters. Yang et al. [22] improved Ma's method: rather than two sets of 3 mutually orthogonal motions, four sets of two orthogonal planar camera motions are used. In both Ma's method and Yang's, only 4 intrinsic parameters of the camera can be determined linearly; if a full perspective camera model is used, in other words if the skew factor is non-zero, both methods become invalid. In this paper, we propose a new active vision based camera calibration technique which can compute all 5 intrinsic parameters linearly. In our method, the planar information in the scene is used, and the camera undergoes N (N >= 2) sets of three mutually orthogonal motions or N (N >= 5) sets of two orthogonal planar motions.

The organization of the paper is as follows: In Section 2, homographies associated with scene planes between two images are discussed. The linear constraints on the camera's intrinsic parameters and the uniqueness of the solution with respect to the configurations of camera motions are elaborated in Section 3. The new calibration algorithm is outlined in Section 4. Experiments on simulated and real images are reported in Sections 5 and 6 respectively. Finally, some conclusions are given in Section 7.

2 Homography Associated with a Scene Plane Between Two Images

2.1 Camera Model

A full perspective camera model is assumed, so the matrix of camera intrinsic parameters is

    K = [ f_u  s    u_0 ]
        [ 0    f_v  v_0 ]
        [ 0    0    1   ]

where (u_0, v_0) is the principal point, f_u and f_v are the focal lengths along the u and v axes respectively, and s is the skew factor.

2.2 Homography of a Plane between Two Images

Assume m = (u, v, 1)^T and m' = (u', v', 1)^T are the homogeneous coordinates of two corresponding points in two images. If these two corresponding points are projections of the same scene point lying on a plane π, the following relation holds:

    s m' = H m                                        (1)

where H is the homography between the two images induced by the plane π, and s is an unknown non-zero factor; in other words, the homography is determined only up to a non-zero scale factor. Now suppose the plane π in 3D space is defined as n^T x = d, where n is the unit normal vector of π and d is the distance from the origin of the world coordinate system to π. Assume the world coordinate system coincides with the first camera coordinate system; then, for the first image, λ m = K x. If the transformation from the first camera coordinate system to the second one is x' = R x + t, the corresponding point in the second image satisfies

    λ' m' = K x' = K R x + K t = K R x + (1/d) K t n^T x
          = λ (K R K^{-1} + (1/d) K t n^T K^{-1}) m

From (1), the homography between the two images is therefore

    H = σ (K R K^{-1} + (1/d) K t n^T K^{-1})         (2)

where σ is an unknown non-zero factor. If the camera undergoes only a pure translation (R = I), the homography becomes

    H = σ (I + (1/d) K t n^T K^{-1})                  (3)

2.3 Homography Calculation and the Associated Scale Factor Determination

Since a homography can only be determined up to a scale, we can eliminate the scaling effect by assuming the homography has the form

    H = [ h_1  h_2  h_3 ]
        [ h_4  h_5  h_6 ]
        [ h_7  h_8  1   ]

Then H can be written as a vector h = (h_1, h_2, h_3, h_4, h_5, h_6, h_7, h_8)^T. From (1), a pair of corresponding points m = (u, v, 1)^T, m' = (u', v', 1)^T yields two linear constraints on h:

    (u, v, 1, 0, 0, 0, -u'u, -u'v) h = u'
    (0, 0, 0, u, v, 1, -v'u, -v'v) h = v'

With at least 4 pairs of corresponding points, H can be determined. If the camera undergoes a pure translation, then from (3) there exists a unique factor σ such that

    H = σ (I + (1/d) K t n^T K^{-1})                  (4)
    H - σI = (σ/d) K t n^T K^{-1}                     (5)

Because rank(K t n^T K^{-1}) = 1, σ must satisfy

    det(H - σI) = 0,  det M = 0 for every 2×2 submatrix M of (H - σI)    (6)

Among the equations in (6), six are linear in σ (namely those from 2×2 submatrices containing exactly one diagonal entry of H - σI), so a unique solution can be obtained. In practice, the least squares solution of these 6 linear equations is used.

3 Linear Constraints and Camera Motion Configurations

3.1 Linear Constraints

Suppose the camera undergoes two orthogonal planar translations t^(1), t^(2), i.e. (t^(1))^T t^(2) = 0. Let H_1 be the homography of a scene plane associated with the first translation, and H_2 the homography of a scene plane (either the same plane as in the first translation, or another plane) associated with the second translation. Then, based on (3),

    H_1 = σ_1 (I + (1/d) K t^(1) n_1^T K^{-1})        (7)
    H_2 = σ_2 (I + (1/d) K t^(2) n_2^T K^{-1})        (8)

so

    K^{-1} (H_1 - σ_1 I) K = (σ_1/d) t^(1) n_1^T      (9)
    K^{-1} (H_2 - σ_2 I) K = (σ_2/d) t^(2) n_2^T      (10)

Multiplying (10) on the left by the transpose of (9) gives

    K^T (H_1^T - σ_1 I) K^{-T} K^{-1} (H_2 - σ_2 I) K = (σ_1 σ_2 / d^2) n_1 (t^(1))^T t^(2) n_2^T = 0_{3×3}

Let

    C = K^{-T} K^{-1} = [ c_1  c_2  c_3 ]
                        [ c_2  c_4  c_5 ]
                        [ c_3  c_5  c_6 ]

C is a symmetric positive definite matrix.


Since σ_1 and σ_2 can be obtained uniquely from H_1 and H_2, as shown in the preceding section, the following linear constraint on C can be derived:

    (H_1^T - σ_1 I) C (H_2 - σ_2 I) = 0_{3×3}         (11)

This is the fundamental constraint introduced in this paper. Letting c = (c_1, c_2, c_3, c_4, c_5, c_6)^T, (11) can be rewritten as

    A_{9×6} c = 0_9                                   (12)

Although (12) contains 9 linear equations on C, only one of them is independent; in other words, matrix A is of rank one. The reasoning is simple: from (9) and (10), (H_1^T - σ_1 I) and (H_2 - σ_2 I) are of rank one, so (11), and hence (12), can produce only one independent linear constraint on C. In order to determine C uniquely up to a scale factor, at least 5 constraints of the form (12) are therefore needed.

3.2 The Configurations of Camera Motions and the Uniqueness of Solution

3.2.1 Two Sets of Three Mutually Orthogonal Translations (TMOT)

Each pair of translations among the 3 translations in one TMOT produces one constraint on C, so one TMOT produces three constraints on C, and at least 2 sets of TMOTs are needed to determine C. Concerning the uniqueness of the solution of C, we have the following proposition:

Proposition: Let Γ_i = {t_{i1}, t_{i2}, t_{i3}}, i = 1, 2 be two sets of TMOTs. If these two sets are independent, then C can be determined uniquely modulo a scale factor from the constraints

    (H_{i1} - σ_{i1} I)^T C (H_{i2} - σ_{i2} I) = 0_{3×3}
    (H_{i2} - σ_{i2} I)^T C (H_{i3} - σ_{i3} I) = 0_{3×3},   i = 1, 2
    (H_{i3} - σ_{i3} I)^T C (H_{i1} - σ_{i1} I) = 0_{3×3}

where H_{ij} is the homography associated with the j-th translation in the i-th TMOT. Here, by "two sets of TMOTs being independent" we mean that all the 4×3 matrices

    [t_{i1}  t_{j1}  t_{k2}  t_{l2}]^T,   i, j, k, l = 1, 2, 3;  i ≠ j, k ≠ l

are of rank 3; in other words, none of the 4-vector groups (t_{i1}, t_{j1}, t_{k2}, t_{l2}) is coplanar. Due to limited space, the proof is omitted here.

3.2.2 Five Sets of Two Planar Orthogonal Translations (TPOT)

As shown in the previous section, each set of TPOT produces one linear constraint on C; hence, in general, 5 sets of TPOTs produce 5 linear constraints on C. Evidently, in some cases 5 TPOTs cannot produce 5 independent linear constraints; in other words, the corresponding motion configuration is a degenerate one. Concerning the uniqueness of the solution of C for five given TPOTs, we have the following conjecture:

Conjecture: Among the five motion planes, if no two or more planes are parallel, the matrix C can be determined uniquely modulo a scale factor.

4 Algorithm

Suppose the camera observes a scene plane and the correspondences of image points are established beforehand. Our new self-calibration algorithm is then as follows:
(1) Control the camera to undergo N (N >= 5) sets of two planar orthogonal translations (or N (N >= 2) sets of three mutually orthogonal translations).
(2) Compute the homographies H_{i1}, H_{i2} associated with the scene plane in each set of two planar orthogonal translations.
(3) Determine the scale factor σ_i associated with each H_i, as shown in Section 2.
(4) Write the linear constraints on c in the form A c = 0.
(5) Compute the least squares solution of A c = 0.
(6) Construct C, then decompose C^{-1} = V V^T by Cholesky factorization, decompose V = KQ by RQ factorization, and finally normalize K so that k_33 = 1. The normalized K is the matrix of the camera's intrinsic parameters.

5 Experiments with Simulated Images

As shown above, two factors largely affect the performance of our algorithm: the noise level, and the orthogonality of the two camera translations within a same motion set. In order to assess their influence, the following experiments were done.

5.1 Noise Influence

The size of the simulated images is 1024×1024 pixels, and at each image point a random noise is added; the noise unit is pixels. The camera's setup is f_u = 1000, f_v = 1000, s = 0.20, u_0 = 0, v_0 = 0, and 20 points are used for determining H. At each noise magnitude, the algorithm is run 100 times, and the means and RMS errors of the intrinsic parameters are computed. The results are shown in Table 1 and Table 2. From these two tables, we see that the RMS errors grow roughly linearly with the noise level, which is quite satisfactory.
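Steps (4)-(6) of the algorithm can be sketched as below, assuming the homographies and their scale factors are already available from steps (2)-(3). This is a minimal numpy sketch under our own naming; instead of the Cholesky-plus-RQ route of step (6), it inverts the Cholesky factor of C directly, which is algebraically equivalent because C = K^{-T}K^{-1} = LL^T forces L = K^{-T} up to scale.

```python
import numpy as np

def constraint_rows(H1, s1, H2, s2):
    """The nine rows of A in (12) for one set of orthogonal translations:
    (H1 - s1*I)^T C (H2 - s2*I) = 0, written in c = (c1,...,c6)."""
    M1, M2 = H1 - s1 * np.eye(3), H2 - s2 * np.eye(3)
    rows = []
    for i in range(3):
        for j in range(3):
            a, b = M1[:, i], M2[:, j]
            rows.append([a[0] * b[0],
                         a[0] * b[1] + a[1] * b[0],
                         a[0] * b[2] + a[2] * b[0],
                         a[1] * b[1],
                         a[1] * b[2] + a[2] * b[1],
                         a[2] * b[2]])
    return rows

def calibrate(sets):
    """sets: list of (H1, sigma1, H2, sigma2). Returns K with k33 = 1."""
    A = []
    for H1, s1, H2, s2 in sets:
        A.extend(constraint_rows(H1, s1, H2, s2))
    _, _, Vt = np.linalg.svd(np.asarray(A))
    c = Vt[-1]                 # least-squares null vector of A
    if c[0] < 0:
        c = -c                 # C is positive definite up to a positive scale
    C = np.array([[c[0], c[1], c[2]],
                  [c[1], c[3], c[4]],
                  [c[2], c[4], c[5]]])
    L = np.linalg.cholesky(C)  # C = L L^T with L = K^{-T} (up to scale)
    K = np.linalg.inv(L).T     # upper triangular, positive diagonal
    return K / K[2, 2]
```

The direct inversion works because K is upper triangular with positive diagonal, so K^{-T} is exactly the (unique) lower-triangular Cholesky factor of C; the k_33 normalization then removes the unknown scale.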


Table 1. Means of the estimated intrinsic parameters at different noise levels

Noise   fu         fv        s        u0       v0
0.1     999.620    999.705   0.535    0.108    0.117
0.2     999.524    999.627   -0.103   0.694    -0.164
0.3     997.178    998.711   -0.145   0.646    -2.838
0.4     1000.014   999.425   1.089    -0.521   0.972
0.6     997.965    998.322   -0.965   1.072    -2.249
0.8     999.079    994.869   1.826    5.065    7.403
1.0     996.211    993.771   -1.928   5.782    4.021
1.5     1025.636   986.579   7.314    19.065   65.364

Table 2. RMSs of the estimated intrinsic parameters at different noise levels

Noise   Δfu      Δfv      Δs       Δu0      Δv0
0.1     4.420    1.419    1.629    1.227    6.690
0.2     11.247   3.839    3.025    3.036    15.045
0.3     16.233   5.129    4.891    4.594    23.082
0.4     19.662   6.172    5.843    5.909    27.659
0.6     31.783   10.069   11.292   8.081    41.883
0.8     45.845   16.051   12.733   12.589   63.160
1.0     51.048   17.803   15.346   15.013   70.404
1.5     82.179   29.546   26.543   31.442   141.292

5.2 Orthogonality Influence

The image size is 1024×1024 pixels, and the camera's setup is f_u = 1000, f_v = 1000, s = 0.20, u_0 = 0, v_0 = 0. In this case, the image points do not contain any noise, but the included angle between the two camera translations is allowed to vary at random within a given error bound (ideally, the two translations should be orthogonal). At each given error bound and for each given number of motion sets, 100 runs are done. The final results are shown in Table 3 and Table 4, where "X-Y" stands for the included angle (in degrees) of the two translations in each planar motion set.

Table 3. Means of the estimated intrinsic parameters at different error bounds

X-Y       Param   5 sets     8 sets     10 sets    15 sets
89|91     fu      976.234    1001.389   998.881    998.661
          fv      1018.556   1000.040   999.786    1000.005
          s       -14.591    -7.004     1.510      -0.893
          u0      -24.988    0.954      -3.237     -0.165
          v0      26.342     -1.387     0.668      -0.808
88|92     fu      975.939    999.004    1005.938   1003.623
          fv      993.423    998.746    1006.312   1003.812
          s       -23.649    -3.687     -9.525     2.134
          u0      10.699     9.810      2.975      3.891
          v0      -21.761    -1.292     10.187     1.484
87|93     fu      972.576    995.750    1024.539   999.390
          fv      994.415    1004.771   1017.879   1004.243
          s       -31.717    -5.566     -19.352    -0.644
          u0      15.463     5.320      3.799      6.204
          v0      -3.369     -6.304     17.905     -1.356
86|94     fu      944.180    1000.055   1008.938   1001.645
          fv      1024.595   1014.221   1009.758   1017.907
          s       4.277      -17.269    2.719      -13.755
          u0      13.534     31.311     -6.051     4.980
          v0      -55.905    -5.713     -2.246     1.592
85|95     fu      928.851    990.974    1041.977   1019.730
          fv      1008.974   1039.206   1038.543   1032.747
          s       -54.663    -40.048    -28.706    -27.762
          u0      -42.923    34.950     10.556     21.341
          v0      -10.703    -13.128    41.784     13.602

Table 4. RMSs of the estimated intrinsic parameters at different error bounds

X-Y       Param   5 sets     8 sets     10 sets    15 sets
89|91     Δfu     68.675     22.476     14.833     12.223
          Δfv     95.449     20.111     15.430     12.137
          Δs      89.769     24.096     15.750     12.298
          Δu0     79.402     20.059     17.572     11.282
          Δv0     93.204     23.037     15.660     12.774
88|92     Δfu     76.504     36.093     28.332     24.695
          Δfv     74.637     36.193     33.237     19.735
          Δs      76.964     40.716     34.825     25.863
          Δu0     76.272     37.052     36.785     29.878
          Δv0     96.948     44.650     32.760     27.553
87|93     Δfu     116.353    56.076     51.361     33.932
          Δfv     117.717    54.311     42.966     37.355
          Δs      137.267    69.961     49.458     40.116
          Δu0     124.863    63.049     41.249     36.142
          Δv0     131.662    69.823     50.069     33.646
86|94     Δfu     134.559    74.162     62.798     42.071
          Δfv     157.014    77.104     61.539     54.406
          Δs      157.637    93.322     66.006     58.155
          Δu0     147.420    101.423    77.876     58.146
          Δv0     181.170    88.721     81.984     51.644
85|95     Δfu     146.971    82.832     88.394     64.677
          Δfv     193.981    100.698    87.937     77.262
          Δs      186.325    111.342    104.461    68.707
          Δu0     199.289    98.310     106.240    81.448
          Δv0     163.040    99.643     94.143     78.437
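For reference, the per-parameter statistics tabulated above (means and RMS errors over 100 runs) can be computed as follows; `rms_errors` is our own helper name, shown only to pin down the RMS convention used in Tables 2 and 4.

```python
import numpy as np

def rms_errors(estimates, truth):
    """RMS deviation of repeated estimates (shape: runs x parameters) from
    the ground-truth parameter vector, one value per intrinsic parameter."""
    E = np.asarray(estimates, float) - np.asarray(truth, float)
    return np.sqrt((E ** 2).mean(axis=0))
```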

From these tables, we see that the larger the number of motion sets and the smaller the error bound, the better the estimated results. Hence, in order to obtain satisfactory results, the two camera translations within a same motion set should be kept as orthogonal as possible, and more images should be used if possible. Fortunately, both conditions can generally be satisfied in practice with an active vision system.

6 Experiments with Real Images

In the real image experiments, a CCD camera is used. After calibration, the calibrated intrinsic parameters are used to reconstruct a calibration grid in order to verify whether they are reliable.

6.1 Calibrating the Intrinsic Parameters

Here a 3D scene including a plane is used. The camera undergoes 16 sets of two orthogonal translations, giving 16 groups of images; each group contains 3 images (one image before translation, and one after each of the two translations). The image size is 384×288 pixels. One of the image groups is shown in Fig 1.

Fig 1: A group of images taken by the CCD camera

With our new algorithm, the calibrated intrinsic parameters of the CCD camera are listed in Table 5.

Table 5. The estimated intrinsic parameters

fu         fv         s         u0         v0
524.6731   256.9008   -0.8220   172.0141   190.6486

6.2 Verification via Reconstruction

Here a standard stereo vision technique is used to reconstruct the 3D scene in order to verify whether the calibrated intrinsic parameters are reliable. Fig 2 shows the pair of images used for the reconstruction of a calibration grid. The highlighted points are the corresponding points selected from two planes of the grid which are orthogonal to each other.

Fig 2: The two images used for the reconstruction

Fig 3 shows the two reconstructed planes from different view directions: (1) is the top view, (2) is the side view.

Fig 3: The two reconstructed planes from different views

The included angle of the two reconstructed planes is 90.45 degrees, which is quite close to the true value of 90 degrees. Based on these fairly good reconstruction results, we can reasonably conclude that the calibrated intrinsic parameters are reliable. It is also worth noting that the results of our linear calibration technique can be used as initial values for a non-linear optimization method for further refinement.

7 Conclusion

In this paper, a new active vision based self-calibration technique is proposed. Our new technique can LINEARLY determine all FIVE intrinsic parameters. To our knowledge, no previous method in the literature can linearly calibrate a full perspective camera. The experiments with simulated data and real images validate our new camera calibration technique.
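The angle check in the reconstruction experiment can be sketched as follows: fit a normal to each set of reconstructed plane points by total least squares and measure the included angle. This is a minimal numpy sketch with hypothetical helper names, not the authors' reconstruction pipeline; by the sign convention used here, the acute angle is reported.

```python
import numpy as np

def plane_normal(points):
    """Total-least-squares plane normal: the direction of least variance
    of the centered 3-D points, via SVD."""
    X = np.asarray(points, float)
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X)
    return Vt[-1]

def included_angle_deg(pts_a, pts_b):
    """Acute angle (degrees) between the planes fitted to two point sets."""
    na, nb = plane_normal(pts_a), plane_normal(pts_b)
    c = abs(float(na.dot(nb)))       # normals from SVD are unit vectors
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))
```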

Acknowledgments

This work was supported by the "973" Program (G1998030502) and the National Natural Science Foundation of China (60033010, 69975021).

References

[1] R. I. Hartley, Estimation of relative camera positions for uncalibrated cameras, in Proc. ECCV'92, pp. 579-587, 1992.
[2] O. D. Faugeras, What can be seen in three dimensions with an uncalibrated stereo rig, in Proc. ECCV'92, pp. 563-578, 1992.
[3] S. J. Maybank and O. D. Faugeras, A theory of self-calibration of a moving camera, Inter. Journal of Computer Vision 8(2), pp. 123-151, 1992.
[4] Q. T. Luong and O. D. Faugeras, The fundamental matrix: theory, algorithms, and stability analysis, Inter. Journal of Computer Vision 17, pp. 43-75, 1996.
[5] O. D. Faugeras et al., 3-D reconstruction of urban scenes from image sequences, Computer Vision and Image Understanding, Vol. 69, No. 3, pp. 292-309, 1998.
[6] R. I. Hartley, Self-calibration of stationary cameras, Inter. Journal of Computer Vision 22(1), pp. 5-23, 1997.
[7] R. I. Hartley, Lines and points in three views and the trifocal tensor, Inter. Journal of Computer Vision 22(2), pp. 125-140, 1997.
[8] R. Koch, M. Pollefeys, and L. Van Gool, Multi viewpoint stereo from uncalibrated video sequences, in Proc. ECCV'98, Vol. I, pp. 55-71, 1998.
[9] M. Pollefeys, R. Koch and L. Van Gool, Self-calibration and metric reconstruction in spite of varying and unknown internal camera parameters, in Proc. ICCV'98, pp. 90-95, 1998.
[10] M. Pollefeys, L. Van Gool, and M. Proesmans, Euclidean 3D reconstruction from image sequences with variable focal lengths, in Proc. ECCV'96, 1996.
[11] A. Shashua, Projective structure from uncalibrated images: structure from motion and recognition, IEEE Trans. PAMI 16(8), pp. 778-790, 1994.
[12] S. Avidan and A. Shashua, Threading fundamental matrices, in Proc. ECCV'98, Vol. I, pp. 124-140, 1998.
[13] A. Heyden, A common framework for multiple view tensors, in Proc. ECCV'98, Vol. I, pp. 3-19, 1998.
[14] L. Quan and T. Kanade, Affine structure from line correspondences with uncalibrated affine cameras, IEEE Trans. PAMI 19(8), pp. 834-845, 1997.
[15] B. Triggs, Autocalibration from planar scenes, in Proc. ECCV'98, pp. 89-105, 1998.
[16] L. Dron, Dynamic camera self-calibration from controlled motion sequences, in Proc. CVPR'93, pp. 501-506, 1993.
[17] A. Basu, Active calibration: alternative strategy and analysis, in Proc. CVPR'93, pp. 405-500, 1993.
[18] F. Du and M. Brady, Self-calibration of the intrinsic parameters of cameras for active vision systems, in Proc. CVPR'93, pp. 477-482, 1993.
[19] F. X. Li, M. Brady, and C. Wiles, Fast computation of the fundamental matrix for an active stereo vision system, in Proc. ECCV'96, Vol. I, pp. 156-166, 1996.
[20] M. X. Li, Camera calibration of a head-eye system for active vision, in Proc. ECCV'94, pp. 543-554, 1994.
[21] S. D. Ma, A self-calibration technique for active vision systems, IEEE Trans. on Robotics and Automation, 12(1), pp. 114-120, 1996.
[22] C. J. Yang, W. Wang, and Z. Y. Hu, An active vision based self-calibration technique, Chinese Journal of Computers, Vol. 5, pp. 428-435, 1998.
