Available online at www.sciencedirect.com

Procedia Engineering 41 (2012) 945 – 950

International Symposium on Robotics and Intelligent Sensors 2012 (IRIS 2012)

Depth Estimation for a Mobile Platform Using Monocular Vision

Z. Said*, K. Sundaraj and M. N. A. Wahab

School of Mechatronic Engineering, Universiti Malaysia Perlis, Perlis, Malaysia

Abstract

This paper briefly discusses a depth estimation method for a mobile platform using monocular vision. The biggest challenge for an autonomous mobile platform in an unknown environment is accurately estimating the distance and position of the obstacles around it. To navigate safely from one position to another, reliable range sensors are needed to detect any obstacle that blocks its path. A vision sensor can be used for this purpose, as it provides an effective and low-cost solution. The method discussed in this paper requires only a simple calibration. The data obtained from the calibration process are used to generate the equation for the depth estimation procedure. The results presented in this paper demonstrate the reliability of the methodology used for depth estimation.

© 2012 The Authors. Published by Elsevier Ltd. Selection and/or peer-review under responsibility of the Centre of Humanoid Robots and Bio-Sensor (HuRoBs), Faculty of Mechanical Engineering, Universiti Teknologi MARA.

Open access under CC BY-NC-ND license.

Keywords: depth estimation; monocular vision; pixel location; vision sensor

1. Introduction

Depth estimation is an important topic in computer vision. Although many techniques have been proposed, there are still many parameters that need to be defined. Depth perception can be obtained by extracting visual features from images, such as shading, texture, known object sizes, color, defocus and linear perspective [1-4]. Stereo vision is the most commonly used technique: two cameras that observe the same scene from different perspectives provide a means of determining 3D shape and position, and the depth of each point can be obtained from the disparity map of the stereo image pair [5,6]. However, a stereo vision system has several drawbacks: it consumes more power, requires more space for mounting the cameras [7] and needs more memory for the computational process [8].

Depth estimation using monocular vision is a challenging problem. For example, although features such as the texture or color of an object can give information about its depth, such methods fail to accurately determine the object's absolute depth [9]. To overcome these shortcomings, some researchers have integrated a monocular vision system with a laser system in order to obtain accurate depth estimates [10,11]. In this study, our focus is to develop a simple algorithm that can accurately estimate distance using monocular vision alone.

This paper is organized as follows: Section 2 describes previous work on depth estimation using monocular vision. The experimental procedure is discussed in Section 3, and the results obtained from it are presented in Section 4. Finally, Section 5 concludes the paper.

* Corresponding author. Tel.: +6-013-462-2972. E-mail address: [email protected]

1877-7058 © 2012 Published by Elsevier Ltd. Open access under CC BY-NC-ND license. doi:10.1016/j.proeng.2012.07.267


2. Depth estimation methods using monocular vision

In general, obtaining the distance of an object using monocular vision is a challenging task. To obtain depth perception from a single-view image, one can only estimate relative depth values by analysing the depth information in that image. Conventional range sensors such as laser systems currently provide better accuracy in depth estimation [12]; however, a monocular vision system still has a lot to offer.

2.1. Known fixed feature

This technique uses special fixed features of a known object [7,13] as a reference for visual depth estimation. To calibrate the system, Ibrar et al. [13] used circles of a known size as a reference pattern on the horizontal plane. The camera field of view (FOV) should be perpendicular to the horizontal plane. The radius of each circle is measured at different depth positions and arranged as follows:

r1 = (r11, r12, r13, r14, r15)
r2 = (r21, r22, r23, r24, r25)    (1)
r3 = (r31, r32, r33, r34, r35)

where r1, r2 and r3 represent the radii of the first, second and third circle respectively. Three-degree polynomials were then fitted, as they gave the best fit in that condition:

Hr1 = a1 r1^3 + b1 r1^2 + c1 r1 + d1
Hr2 = a2 r2^3 + b2 r2^2 + c2 r2 + d2    (2)
Hr3 = a3 r3^3 + b3 r3^2 + c3 r3 + d3

Since Hr1, Hr2 and Hr3 are supposed to correspond to the same depth position, the final estimate is their average:

Depth = (Hr1 + Hr2 + Hr3) / 3    (3)
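As a rough illustration of equations (1)-(3), the C++ sketch below evaluates three calibrated cubic polynomials and averages their outputs. The coefficients and measured radii are made-up placeholders, not values from Ref. [13].

#include <iostream>

// Evaluate one calibrated cubic H(r) = a r^3 + b r^2 + c r + d for a measured radius r (Eq. (2)).
double cubicDepth(const double p[4], double r)
{
    return ((p[0] * r + p[1]) * r + p[2]) * r + p[3];
}

int main()
{
    // Hypothetical calibration coefficients {a, b, c, d} for the three reference circles
    // (placeholders only, not values from Ref. [13]).
    const double coeff[3][4] = {
        { -0.001, 0.15, -8.0, 160.0 },
        { -0.002, 0.20, -9.5, 170.0 },
        { -0.001, 0.12, -7.2, 150.0 }
    };
    // Radii of the three circles measured in the current image, in pixels (placeholders).
    const double radius[3] = { 42.0, 55.0, 38.0 };

    // Average the three per-circle estimates, as in Eq. (3).
    double sum = 0.0;
    for (int i = 0; i < 3; ++i)
        sum += cubicDepth(coeff[i], radius[i]);

    std::cout << "Estimated depth: " << sum / 3.0 << " cm" << std::endl;
    return 0;
}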

Wahab et al. [7] utilized the diameter of the ball used in robot soccer to estimate its depth position. The diameter of the ball in the image is measured at various depths to calibrate the distance estimation. The images taken for calibrating the depth estimation must show the complete ball without any occlusion. Wang et al. [14] used an acute-triangle template to calibrate the camera. The distance between the template plane and the camera plane is measured as the template image is captured at different positions. The coordinates of the triangle vertices in the captured images are then matched with the vertex coordinates in the template plane, and the depth estimate is obtained from the information of each vertex.

2.2. Geometrical relations of the mounted camera

The geometrical relations of a camera mounted on a robot can be used to relate the world coordinate frame and the image coordinate frame. The method requires the camera to be tilted such that the entire camera FOV intersects with the floor [15-17]. A simple projection model can then be used to derive the transformation equation. Fig. 1 shows the geometrical relations of the camera mounted on a robot, where:

• 2α is the vertical field angle of view
• 2γ is the horizontal field angle of view
• δ is the camera pitch angle
• b is the distance of the blind area
• d is the distance between the nearest point of view and the intersection point of the optical axis and the ground
• h is the camera height from the ground
• 2w is the width of the field of view at the intersection point of the optical axis and the ground


Fig. 1. Geometrical relations of the mounted camera on a robot: (a) top view and (b) side view [16].

The geometrical relations in Fig. 1 can be expressed by the following equations:

α + β + δ = π / 2    (4)

tan β = b / h    (5)

Qiang et al. [16] used the information obtained from (4) and (5) to estimate the distance as follows:

y = h tan(β + 2α (n - 1 - v) / (n - 1))    (6)

where y is the object distance from the camera, n is the vertical image resolution and v is the pixel row of the object in the image.
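The projection-model estimate in (6) is straightforward to compute once the camera geometry is known. The following C++ sketch is a minimal illustration with made-up camera parameters; it is not the calibration of Ref. [16] or of the platform used later in this paper.

#include <cmath>
#include <iostream>

// Distance estimate from a pixel row using Eq. (6): y = h * tan(beta + 2*alpha*(n - 1 - v)/(n - 1)).
// h: camera height, alpha: half the vertical field angle, beta: angle to the nearest visible point,
// n: vertical image resolution, v: pixel row of the object base (0 = top row).
double distanceFromRow(double h, double alpha, double beta, int n, int v)
{
    double angle = beta + 2.0 * alpha * static_cast<double>(n - 1 - v) / static_cast<double>(n - 1);
    return h * std::tan(angle);
}

int main()
{
    const double PI = 3.14159265358979323846;
    // Illustrative parameters only (not the calibration values of any cited work):
    double h     = 0.115;              // camera height in metres
    double alpha = 20.0 * PI / 180.0;  // half of the vertical field of view
    double beta  = 35.0 * PI / 180.0;  // angle to the nearest point in view
    int    n     = 480;                // vertical resolution
    int    v     = 300;                // detected pixel row of the object base

    std::cout << "Estimated distance: " << distanceFromRow(h, alpha, beta, n, v) << " m" << std::endl;
    return 0;
}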

3. Experimental setup

All experiments were performed using a camera with a resolution of 640 × 480 pixels. The camera was mounted on a platform at a height of 11.5 cm from the floor and tilted so that the entire FOV of the camera intersects with the floor. The pitch angle of the camera was 24°. The source code was written in C++ and compiled in Microsoft Visual Studio 2008. The captured images were processed using the Open Source Computer Vision Library (OpenCV) version 2.30, developed by Intel, which is capable of reading real-time images and processing them for various applications. For the calibration process, an object was placed at different linear positions in the range of 10-155 cm from the camera, at an interval of 5 cm. The relevant data computed by the algorithm were recorded for the calibration. A few basic assumptions were made about the system:

1. The floor is even terrain.
2. The object is on the floor.
3. The intensity levels of the object and the floor are different.

Fig. 2. The captured image at a distance of 15 cm: (a) original image; (b) processed image.


Before the distance can be estimated, the input image must be converted to grayscale. The grayscale image is then converted to binary form by applying the object threshold: each pixel whose grayscale value is higher than the threshold is considered part of the object; otherwise it is considered background. The object height, width and pixel location can then be extracted from the binary image. Fig. 2 shows an image taken during the calibration process; (a) is the original captured image, while (b) is the processed image in black and white. From the processed image, the pixel location of the lowest part of the object, i.e. the object base, is extracted for measuring the distance. The red dot in (b) indicates the pixel location of the object base in the image.

4. Results

Some of the data acquired during calibration could not be used, for two reasons. For an object at a distance of 10 cm, the object base was not within the image FOV, as shown in Fig. 3(a), while for an object at a distance of 155 cm, the object was outside the image FOV, as shown in Fig. 3(d). Consequently, whenever the pixel location was either zero or at its maximum, the data were discarded. Fig. 3(a) shows that only the upper part of the object is visible in the image; at this point, the object has entered the blind area of the camera. Fig. 3(d) shows an all-white window because no pixel in the image represents the object. In Fig. 3(b) and (c), the object base is visible in the images. As the object comes closer to the camera, the number of pixels representing it also increases.

Fig. 3. The processed images at (a) 10 cm, (b) 30 cm, (c) 80 cm and (d) 155 cm.
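The grayscale conversion, thresholding and object-base extraction described at the start of this section could be sketched roughly as follows in C++ with OpenCV. This is only an illustrative outline, not the authors' code: the threshold value and file name are placeholders, and the current OpenCV C++ constants are used rather than those of the 2.30 release named above.

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // Load one calibration image (the file name is only an example).
    cv::Mat frame = cv::imread("calibration_15cm.png");
    if (frame.empty()) { std::cerr << "Could not read image" << std::endl; return 1; }

    // 1. Convert to grayscale.
    cv::Mat gray, binary;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);

    // 2. Threshold: pixels brighter than the threshold are treated as object, the rest as background.
    //    The threshold value 128 is an arbitrary placeholder; the paper does not report its value.
    cv::threshold(gray, binary, 128, 255, cv::THRESH_BINARY);

    // 3. Scan rows from the bottom of the image to find the lowest object pixel (the object base).
    int baseRow = -1, baseCol = -1;
    for (int row = binary.rows - 1; row >= 0 && baseRow < 0; --row) {
        for (int col = 0; col < binary.cols; ++col) {
            if (binary.at<uchar>(row, col) > 0) { baseRow = row; baseCol = col; break; }
        }
    }

    if (baseRow >= 0)
        std::cout << "Object base pixel location: row " << baseRow << ", col " << baseCol << std::endl;
    else
        std::cout << "No object pixels found" << std::endl;
    return 0;
}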

Fig. 4 shows the graph of object distance versus the pixel location in the image. The three-degree polynomial formulation used in Ref. [13] (Section 2.1) was not suitable for this condition, so an exponential function was used instead, as it gave the best fit. If d is the distance corresponding to pixel location y, the object distance can be obtained as follows:

d = a e^(b y) + c e^(f y)    (7)


Using the data collected from the calibration, the values of the coefficients a, b, c and f in (7) were obtained, giving:

d = 77.21 e^(-0.02447 y) + 75.56 e^(-0.003776 y)    (8)

Fig. 4. Graph of object distance (cm) versus pixel location.
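For illustration, the fitted model in (8) can be evaluated directly to map a pixel location to a distance. The short C++ sketch below assumes the same row convention as the calibration data; the example pixel locations are arbitrary.

#include <cmath>
#include <initializer_list>
#include <iostream>

// Depth estimate from the object-base pixel location y using the fitted model of Eq. (8).
double depthFromPixel(double y)
{
    return 77.21 * std::exp(-0.02447 * y) + 75.56 * std::exp(-0.003776 * y);
}

int main()
{
    // Example pixel locations (rows); chosen only to illustrate the shape of the curve in Fig. 4.
    for (double y : { 50.0, 150.0, 300.0, 450.0 })
        std::cout << "y = " << y << " -> distance ~ " << depthFromPixel(y) << " cm" << std::endl;
    return 0;
}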

The depth estimates obtained with the equation derived above can be considered very accurate, with errors of around ±1.56%, which is better than the camera projection model technique [16], whose errors were about 2.45%. Fig. 5 compares the distance estimates of the proposed technique and of the camera projection model against the real (measured) values.

Fig. 5. Comparison of object distance (cm) versus pixel location for the proposed method and the camera projection model against the measured distance.
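One plausible way to arrive at an error figure such as ±1.56% is to average the absolute percentage errors over the test distances; the paper does not state exactly how the error was averaged, so the sketch below uses made-up measured and estimated values purely for illustration.

#include <cmath>
#include <cstddef>
#include <iostream>

int main()
{
    // Hypothetical measured (ground-truth) and estimated distances in cm; the paper's
    // actual test values are not listed, so these numbers are illustrative only.
    const double measured[]  = { 20.0, 40.0, 60.0, 80.0, 100.0, 120.0 };
    const double estimated[] = { 19.7, 40.5, 59.2, 81.1, 101.4, 118.3 };
    const std::size_t n = sizeof(measured) / sizeof(measured[0]);

    // Mean absolute percentage error between estimate and ground truth.
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += std::fabs(estimated[i] - measured[i]) / measured[i] * 100.0;

    std::cout << "Mean absolute percentage error: " << sum / n << " %" << std::endl;
    return 0;
}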


5. Conclusion and future works

Researchers have tried out various monocular vision techniques in order to obtain the most accurate and precise depth estimation possible from such a system, and these techniques were briefly reviewed in this paper. We then used the pixel-location method to determine the distance of an object. This study shows that depth estimation using monocular vision can be achieved with the proposed method: the estimated depth is close to the real measured depth. The results are promising; however, the estimation technique can still be improved. In this experiment, the system only needed to process one object at a time in the environment. In future work, multiple objects will be placed at different locations at the same time. This study contributes to the monocular vision research area as an alternative to current technologies.

References

[1] Y. Salih, A. S. Malik, and Z. May, "Depth estimation using monocular cues from single image," in National Postgraduate Conference (NPC), 2011, pp. 1-4.
[2] P. P. K. Chan, J. Bing-Zhong, W. W. Y. Ng, and D. S. Yeung, "Depth estimation from a single image using defocus cues," in 2011 International Conference on Machine Learning and Cybernetics (ICMLC), 2011, pp. 1732-1738.
[3] S. M. Haris, M. K. Zakaria, and M. Z. Nuawi, "Depth estimation from monocular vision using image edge complexity," in 2011 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), 2011, pp. 868-873.
[4] J. Jae-Il and H. Yo-Sung, "Depth map estimation from single-view image using object classification based on Bayesian learning," in 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2010, pp. 1-4.
[5] Y.-g. Zhao, W. Cheng, L. Jia, and S.-l. Ma, "The Obstacle Avoidance and Navigation Based on Stereo Vision for Mobile Robot," in 2010 International Conference on Optoelectronics and Image Processing (ICOIP), 2010, pp. 565-568.
[6] C. Karaoguz, A. Dankers, T. Rodemann, and M. Dunn, "An analysis of depth estimation within interaction range," in 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2010, pp. 3207-3212.
[7] M. N. A. Wahab, N. Sivadev, and K. Sundaraj, "Development of monocular vision system for depth estimation in mobile robot - Robot soccer," in 2011 IEEE Conference on Sustainable Utilization and Development in Engineering and Technology (STUDENT), 2011, pp. 36-41.
[8] S.-s. Chen, W.-h. Zuo, and Z.-l. Feng, "Depth estimation via stereo vision using Birchfield's algorithm," in 2011 IEEE 3rd International Conference on Communication Software and Networks (ICCSN), 2011, pp. 403-407.
[9] Z. Lijing and L. Xufeng, "A Single Texture Object 3-D Depth Recovery," in 2010 Second International Workshop on Education Technology and Computer Science (ETCS), 2010, pp. 684-687.
[10] Y. Zhuang, X. Xu, X. Pan, and W. Wang, "Mobile robot indoor navigation using laser range finder and monocular vision," in 2003 IEEE International Conference on Robotics, Intelligent Systems and Signal Processing, 2003, pp. 77-82, vol. 1.
[11] S. Fengchi, Z. Yuan, L. Chao, and H. Yalou, "Research on active SLAM with fusion of monocular vision and laser range data," in 2010 8th World Congress on Intelligent Control and Automation (WCICA), 2010, pp. 6550-6554.
[12] J. Cai and R. Walker, "Height estimation from monocular image sequences using dynamic programming with explicit occlusions," IET Computer Vision, vol. 4, pp. 149-161, 2010.
[13] I. U. Jan and N. Iqbal, "A new technique for geometry based visual depth estimation for uncalibrated camera," in 2009 International Conference on Image Analysis and Signal Processing (IASP 2009), 2009, pp. 286-291.
[14] W. Qizhi and C. Xinyu, "The simple camera calibration approach based on a triangle and depth estimation from monocular vision," in 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009), 2009, pp. 316-320.
[15] G. Cheng and A. Zelinsky, "Real-time visual behaviours for navigating a mobile robot," in Proceedings of the 1996 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 96), 1996, pp. 973-980, vol. 2.
[16] Z. Qiang, H. Shouren, and W. Jia, "Automatic Navigation for A Mobile Robot with Monocular Vision," in 2008 IEEE Conference on Robotics, Automation and Mechatronics, 2008, pp. 1005-1010.
[17] B.-r. Liu, Y. Xie, Y.-m. Yang, and Z.-Z. Qiu, "A self-localization method with monocular vision for autonomous soccer robot," in 2005 IEEE International Conference on Industrial Technology (ICIT 2005), 2005, pp. 888-892.
