National University of Kaohsiung, Kaohsiung, Taiwan
2009 The 17th National Conference on Fuzzy Theory and Its Applications
Further Study on Camera Position Estimation from Image by Neural Network

K.H. Hsia1, S.F. Lien2, C.C. Wang3, T.E. Lee4 and J.P. Su5
1 Department of Management Information System, Far East University
E-mail: [email protected]
2,4 Graduate School of Engineering Science & Technology (Doctoral Program), National Yunlin University of Science & Technology
E-mail: [email protected], [email protected]
3 Department of Electronic Engineering, Chienkuo Technology University
E-mail: [email protected]
5 Department of Electrical Engineering, National Yunlin University of Science & Technology; Department of Information Technology, Overseas Chinese University
E-mail: [email protected]
Abstract
It is well known that different positions of the camera produce different images. From earlier research, the camera position can be estimated by an ANN (artificial neural network) when the object center lies on the image center under certain conditions. In this paper, the method is extended to estimate the position of the camera by extracting features from a 2D image under relaxed conditions. The computational load of the ANN is reduced by data classification. The experimental results show that the proposed method can estimate the position of the camera correctly.

Keywords: projective geometry; camera position estimation; neural network.

1. INTRODUCTION
Projective geometry [4] provides a useful relationship between the 3D scenery and the pictured 2D image. However, the relationship is nonlinear and highly complicated. To avoid explicit model building, an artificial neural network (ANN) [3] [6] is a possible way to represent this relationship. Consider the case where there is a circular target in the 3D scenery and the center of the circular target is projected to the centerline of the 2D image: the circle in the 3D world is projected to a deformed circle in the 2D image. By projective geometry, one can obtain the features of the deformed circle from the diameter of the circular target, the distance between the camera center and the target center, and the depression angle of the camera. It is thus possible to estimate the position of the camera from features of the deformed circle and the circular target. However, the relationship between these features and the position of the camera is more complicated than in previous research. In [5], an algorithm was developed for solving the problem under the condition that the target center must be at the center of the image. In fact, this is not general enough for applications such as security systems [8] and aerial photography [1]. In this paper, the algorithm is improved for a relaxed situation.
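As an illustration of this nonlinear 3D-to-2D relationship, the following toy pinhole-camera sketch (not the paper's calibrated setup; the focal length, camera height, target radius and distances are all assumed values) projects a ground circle through a pitched camera and shows that its image is a deformed circle whose projected center splits the vertical extent unevenly.

```python
import numpy as np

# Toy pinhole sketch (assumed parameters, not the paper's calibrated setup):
# a circle of radius r on the ground is projected through a camera placed at
# height H and pitched down by the depression angle theta.

def project(points, H, theta, f=1.0):
    """Project 3D ground points into a unit-focal-length pinhole image."""
    s, c = np.sin(theta), np.cos(theta)
    # Camera axes: x to the right, image-vertical axis, optical (forward) axis.
    A = np.array([[1.0, 0.0, 0.0],
                  [0.0,   s,   c],
                  [0.0,   c,  -s]])
    v = points - np.array([0.0, 0.0, H])   # shift origin to the camera center
    cam = v @ A.T
    return f * cam[:, :2] / cam[:, 2:3]    # perspective division

r, H, theta = 0.5, 1.5, np.deg2rad(40.0)   # assumed target/camera geometry
phi = np.linspace(0.0, 2.0 * np.pi, 360)
circle = np.stack([r * np.cos(phi),        # circle on the ground, 3 m ahead
                   3.0 + r * np.sin(phi),
                   np.zeros_like(phi)], axis=1)

img = project(circle, H, theta)
center = project(np.array([[0.0, 3.0, 0.0]]), H, theta)[0]

# The projected target center splits the vertical extent of the deformed
# circle unevenly (h != k), which is the asymmetry exploited in the paper.
h = img[:, 1].max() - center[1]
k = center[1] - img[:, 1].min()
print(h, k, h / k)
```

The ratio h/k differs from 1 whenever the projection is truly perspective, which is why such ratios carry information about the camera pose.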
In this paper, we use an ANN to learn the mapping relation between the 3D scenery and the 2D image, and a data classification [10] scheme to speed up the learning of the ANN. The experimental results show that the camera position can be estimated quickly and accurately from the image features. Details are given in the following sections.
2. PROJECTIVE TRANSFORMATIONS OF THE TARGET IN AN IMAGE

When a point in 3D space is projected to the 2D image through the camera center, the projected point differs with the position of the camera center. Consider the case where there is a circular target in the 3D space and it is projected to the 2D image: the captured image contains a deformed circle, which is the projection of the circular target. One can always adjust the camera so that the crossover point of the axes of the deformed circle, called the 'center' of the deformed circle, is located on the vertical line crossing the center of the image, as shown in Figure 1. Obviously, this 'center' is the projection of the center of the circular target.

Figure 1. Projected image of a circular target.

From the image, one can choose five distances, and these quantities satisfy the following properties:
1. h/k is dependent on the pitch angle of the camera.
2. r/d is dependent on the distance between the camera and the image center in the 3D world.
3. h/r is dependent on both the pitch angle of the camera and the distance between the camera and the image center in the 3D world.
4. k/Q is dependent on the target position.

Define Ratioproject = h/k, Ratioheight = r/d, Ratioangle = h/r and Ratioposition = k/Q. Based on trigonometric relations and projective geometry, one can obtain these ratios from the parameters of the camera and the target. The relationships of these parameters are shown in Figures 2, 3 and 4. From the above, the algorithm to find these ratios can be written as in Algorithm 1.

Figure 2. Features of the circular target projection.
Figure 3. Feature r of the circular target projection.
Figure 4. Half visual angles.

Find the geometry projection of the perspective target.
Procedure Perspective_target (Ratioproject, Ratioheight, Ratioangle, Ratioposition)
"MinRtm: The lowest boundary of the target moving range in the image."
"MaxRtm: The highest boundary of the target moving range in the image."
"r: The radius of the target."
"α: The half vertical visual angle of the camera."
"β: The half horizontal visual angle of the camera."
"R: The distance between the camera and the image center in the 3D world."
"θ: The pitch angle of the camera."
"The remaining definitions of parameters are shown in Figures 2, 3 and 4."

r = 2.6, α = 27, β = 36
for R = 20 : 1 : 320
  for θ = -62 : 1 : 62
  {
    H = sin(θ) * R
    a = sin(α) * H,  b = cos(θ) * R
    f = tan(α) * H,  d = f - 2r
    MaxRtm = f - r - b,  MinRtm = a + r - b
    for Range = MinRtm : 1 : MaxRtm
    {
      e = b + Range
      φ = tan⁻¹(e/H),  ψ = tan⁻¹(d/H)
      P = tan(φ-ψ) * R,  Q = tan(φ) * R,  S = tan(φ+ψ) * R
      k = Q - P,  h = S - Q,  u = √(e² + H²)
      v = tan(α) * u,  m = tan(α) * R
      n = tan(β) * R,  d = √(m² + n²)
      r′ = R * r / (u * cos(γ))   (feature r in the image; γ as shown in Figure 3)
      Ratioproject = h/k,  Ratioheight = r′/d
      Ratioangle = h/r′,  Ratioposition = k/Q
    }
  }

Algorithm 1: Algorithm for finding the ratios.

3. ARTIFICIAL NEURAL NETWORK TRAINING
The artificial neural network is used to replace the nonlinear relationship between a circular target in the 3D scenery and its projective transformation. The training data is constructed by Algorithm 1. For the case presented in Algorithm 1, over 100000 data are constructed. The amount of data is so huge that the training time is intolerably long. The data segmentation method [7] is proposed to speed up the training. There are at least two advantages of the data segmentation method: the first is that it saves a lot of training time, and the second is that the resulting neural networks give more precise estimates. The neural network inference procedure is illustrated in Figure 5. Suppose that the data is segmented into n groups; then there are n neural networks for estimating the camera position. The ith group of data is called "class_i". When data obtained from a picture is input, it is classified into the suitable segment and then sent to the corresponding neural network to estimate the camera position.

Figure 5. Training procedure of ANN.

4. EXTRACTING FEATURES OF THE TARGET FROM THE IMAGE
The procedure for feature extraction and image processing is illustrated in Figure 6. In this paper, the image features are Ratioangle, Ratioheight, Ratioposition and Ratioproject. Using image processing techniques, one can extract these features from the captured image. In the procedure, thresholding [9] [11] and filtering [2] are used for target segmentation. Thresholding is a basic scheme for image segmentation, and the resulting binary image is easy to analyze and compute with. The basic rectangle [9] is defined as the smallest rectangle in which the target object can be embedded. The major axis and the minor axis are the longer edge and the shorter edge of this rectangle, respectively. For our case, referring to Figure 1, the length of the major axis is equal to r + r, and the length of the minor axis is equal to h + k.
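The thresholding and basic-rectangle steps above can be sketched as follows, using a synthetic 160x120 frame in place of a real photograph; the threshold value and the ellipse parameters are assumptions chosen only for the demonstration.

```python
import numpy as np

# A minimal sketch of the feature-extraction step: threshold the image to a
# binary mask, take the smallest enclosing ("basic") rectangle of the target,
# and read the major/minor axis lengths off its longer/shorter edges.

def basic_rectangle(gray, threshold=128):
    """Return (major_axis_len, minor_axis_len) of the thresholded target."""
    mask = gray > threshold                    # binary image via thresholding
    ys, xs = np.nonzero(mask)
    height = ys.max() - ys.min() + 1           # vertical edge of the rectangle
    width = xs.max() - xs.min() + 1            # horizontal edge
    return max(height, width), min(height, width)

# Synthetic 160x120 frame containing a bright deformed circle (an ellipse),
# standing in for the projected circular target.
img = np.zeros((120, 160))
yy, xx = np.mgrid[0:120, 0:160]
ellipse = ((xx - 80) / 20.0) ** 2 + ((yy - 60) / 12.0) ** 2 <= 1.0
img[ellipse] = 255

major, minor = basic_rectangle(img)
print(major, minor)   # horizontal extent ~2*20, vertical extent ~2*12
```

With a real photograph, a filtering step would precede the thresholding, as described above; the axis lengths then give the h + k and r + r quantities directly.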
Figure 6. Procedure for feature extraction.

5. EXPERIMENTAL RESULTS
The procedure for camera position estimation is shown in Figure 7. Suppose that an image is taken by the camera somewhere in 3D space; its features are extracted by the image processing algorithm, and the obtained features are the input of the ANN for camera position estimation.

Figure 7. Estimation procedure.

Thus, there are two steps for camera position estimation from an image: the first step is image feature extraction, and the second step is camera position estimation. In our experiment, the size of the image is 160*120 and the radius of the circular target is 2.6 cm. In Figure 8, the first image is the captured real image, and the second is the binary image, in which the target is clearly visible. The last image shows the basic rectangle of the target; the major axis length is 30.1994 and the minor axis length is 20.4095. In Figure 9, the major axis length and the minor axis length are 40.9356 and 36.8006, respectively. Figures 8 and 9 show that the feature extraction algorithm can find the features of an image effectively.

Figure 8. Image of the reference object. (a) Image input (b) Binary image (c) Basic rectangle of image
Figure 9. Another image of the reference object. (a) Image input (b) Binary image (c) Basic rectangle of image

The training data is generated by geometric computation. The angle θ is varied from -62° to 62°, and the distance R from 20 cm to 300 cm. The structure of the ANN fitting the relation between Ratioproject, Ratioheight, Ratioangle, Ratioposition and Q, θ, R is as follows:
1. The number of hidden layers is 4.
2. There are 10 nodes in each hidden layer and 2 nodes in the output layer.
3. Input data: Ratioproject, Ratioheight, Ratioangle and Ratioposition.
4. Output data: Q, θ and R.
The training results of the ANNs on 9 randomly chosen data are shown in Table 1. The mean square errors (MSE) of θ and R are 0.3301 and 0.2103, respectively. Since both θ and R are sampled with step 1, the MSEs are acceptable. From Figure 2, the visible range becomes longer and longer as θ and R increase; thus the Q of the 9th I/O data in Table 1 is very large. However, the errors in percentage are very small, which is a good training result for the ANN. In this paper, the average training error of Q is 3.4%. The ANN data testing results are shown in Table 2. The MSE of the estimated θ is 0.2122, and the MSE of the estimated R is 0.2448.
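The segmentation-and-routing scheme of Section 3 can be sketched as follows. For brevity, a linear least-squares model stands in for each small neural network, and the synthetic feature data and the four-way split are assumptions made only for the demonstration.

```python
import numpy as np

# Sketch of the data-segmentation idea: the training set is split into
# n groups ("class_1" ... "class_n") by a chosen feature, one estimator is
# fitted per group, and a query is routed to its group's estimator.

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(400, 3))          # stand-in image features
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2]        # stand-in camera parameter

n_groups = 4
edges = np.linspace(-1.0, 1.0, n_groups + 1)       # segment on feature 0

models = []
for i in range(n_groups):
    in_seg = (X[:, 0] >= edges[i]) & (X[:, 0] <= edges[i + 1])
    A = np.hstack([X[in_seg], np.ones((in_seg.sum(), 1))])  # add bias column
    w, *_ = np.linalg.lstsq(A, y[in_seg], rcond=None)
    models.append(w)                                # one model per class_i

def estimate(x):
    """Route the query to its segment's model, as in Figure 5."""
    i = min(np.searchsorted(edges, x[0], side="right") - 1, n_groups - 1)
    return np.append(x, 1.0) @ models[i]

q = np.array([0.3, -0.2, 0.1])
print(estimate(q))   # close to 2*0.3 - (-0.2) + 0.5*0.1 = 0.85
```

Because each estimator sees only its own segment, every fit is small and fast, which is the training-time advantage the data segmentation method aims at.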
6. CONCLUSION
In this paper, the restriction in [5] that the 'center' of the deformed circle, i.e., the projection of the circular target, has to be located at the center of the captured image has been relaxed: the 'center' of the deformed circle is now only required to lie on the vertical line crossing the center of the image. The proposed algorithm can estimate the camera position under this condition. We introduce data segmentation into the ANN training process to cope with the huge amount of training data. Experimental results show that this speeds up the ANN learning and reduces the learning time. The results also show that the image feature extraction algorithm can find suitable object features for the ANN input, and that the obtained ANN is very effective for camera position estimation. This also shows that the complicated, highly nonlinear mapping can be replaced by an ANN.

7. REFERENCES
[1] J. A. Franco, M. Moctezuma, and F. Parmiggiani, "A fusion-based segmentation algorithm for high-resolution panchromatic aerial photography," Proc. International Geoscience and Remote Sensing Symposium, IEEE, 2002, vol. 6, pp. 3396-3398.
[2] R. C. Gonzalez, R. E. Woods and S. L. Eddins, Digital Image Processing, Prentice Hall, 2002, pp. 483-490.
[3] M. T. Hagan, H. B. Demuth and M. Beale, Neural Network Design, Thomson, 2004.
[4] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2003.
[5] K. H. Hsia, S. F. Lien, C. C. Wang, T. N. Lee and J. P. Su, "Camera position estimation from image by neural network," Second International Symposium on Intelligent Informatics (ISII2009), 2009, in press.
[6] J. S. R. Jang, C. T. Sun and E. Mizutani, Neuro-Fuzzy and Soft Computing, Pearson, 2004.
[7] B. Kolman and D. R. Hill, Elementary Linear Algebra with Applications, Prentice Hall, 2008.
[8] K. Sage and S. Young, "Security applications of computer vision," IEEE Aerospace and Electronic Systems Magazine, 1999, pp. 19-29.
[9] L. G. Shapiro and G. C. Stockman, Computer Vision, Prentice Hall, 2001.
[10] N. P. Suraweera and D. N. Ranasinghe, "A natural algorithmic approach to the structural optimization of neural networks," International Conference on Information and Automation for Sustainability (ICIAFS 2008), IEEE, 2008, pp. 150-156.
[11] D. Ziou, "Optimal thresholding for image segmentation," 2nd African Conference on Research in Computer Science, Burkina Faso, 1994, pp. 297-299.
Table 1. ANN data training results (nine randomly chosen cases, columns 1-9).

Training input:
  Ratioproject:    1.32579  0.92520  0.98187  1.02009  0.98976  1.05818  0.98861  0.99881  0.99935
  Ratioangle:      0.15936  0.05273  0.00761  0.03949  0.03691  0.04364  -0.00624  -0.01136  -0.00013
  Ratioheight:     1.17289  0.57956  0.10677  1.24019  1.17185  0.85257  -0.65794  -1.08914  -0.07546
  Ratioposition:   0.49058  0.04896  0.00123  0.07498  -0.06778  0.111494  0.00778  -0.18711  0.00001
Training output:
  Target position (cm):  -11.607  47.3128  275.317  10.7824  -25.2489  -51.81  336.237  6.4186  7877.13
  Angle (θ°):            -58  43  61  -18  10  -55  48  4  -62
  Height (cm):           21  25  29  93  95  100  192  292  293
Estimated target position (cm):  -11.60  47.31  275.31  10.78  -25.24  -51.81  336.23  6.41  7877.13
Estimated angle (θ°):            -58.49  43.66  61.33  -17.34  9.44  -55.77  47.72  3.53  -61.23
Estimated height (cm):           21.24  25.14  28.69  93.52  95.29  100.34  191.45  292.45  293.79
Error of target position (%):    1.4  3.7  4.6  1.2  3.1  3.9  4.3  0.9  5.6
Error of angle (θ°):             -0.49  -0.66  -0.33  -0.65  0.56  -0.77  0.28  0.47  0.76
Error of height (cm):            -0.24  -0.14  0.31  -0.52  -0.29  -0.34  0.65  -0.45  -0.79

Table 2. ANN data testing results (nine cases, columns 1-9).

Testing input:
  Ratioproject:    1.01123  1.22728  0.95033  0.99179  0.99333  1.00757  0.99942  0.9964  0.9936
  Ratioangle:      -0.00099  0.01675  -0.00379  -0.00106  -0.00107  -0.00516  -0.01754  0.0025  0.0047
  Ratioheight:     -0.20058  7.56670  -1.62177  -0.62743  -0.39756  -0.14470  -1.29337  0.797  0.1764
  Ratioposition:   -0.00028  0.89217  0.01256  0.00096  0.00061  -0.00131  -0.03969  0.00030  0.0017
Testing output:
  Target position (cm):  -460.614  -15.656  94.317  597.633  745.420  -647.431  -63.207  1391.9  749.823
  Angle (θ°):            -62  -54  61  58  59  -61  1  61  62
  Height (cm):           20  23  29  101  103  156  160  184  253
Estimated target position (cm):  -478.018  -15.468  96.015  571.935  775.982  -622.829  -61.8799  1467.06  718.333
Estimated angle (θ°):            -62.22  -53.41  60.55  58.74  58.43  -61.32  1.47  60.78  61.4
Estimated height (cm):           19.35  23.44  28.47  100.59  102.58  156.21  160.72  183.36  252.69
Error of target position (%):    3.7  1.2  1.8  4.3  4.1  3.8  2.1  5.4  4.2
Error of angle (θ°):             0.22  -0.59  0.45  -0.74  0.57  -0.32  -0.47  0.22  0.6
Error of height (cm):            0.15  -0.44  0.53  0.41  0.42  -0.21  -0.72  -0.64  0.31
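As a quick consistency check, the mean-square errors quoted in Section 5 can be recomputed from the error rows of Table 1: the nine height errors reproduce the reported MSE of 0.2103, while the angle errors give a value close to the reported 0.3301 (the small discrepancy is expected, since the table shows only nine randomly chosen cases).

```python
# Error rows of Table 1 (angle error in degrees, height error in cm).
angle_err = [-0.49, -0.66, -0.33, -0.65, 0.56, -0.77, 0.28, 0.47, 0.76]
height_err = [-0.24, -0.14, 0.31, -0.52, -0.29, -0.34, 0.65, -0.45, -0.79]

# Mean-square error over the listed cases.
mse = lambda e: sum(x * x for x in e) / len(e)

print(round(mse(height_err), 4))   # 0.2103
print(round(mse(angle_err), 4))
```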