Disparity Map Generation from Rectified Images
Camilo Ernesto Pardo Beainy
Fabián Rolando Jiménez López
Electronic Engineering Faculty Santo Tomas University, USTA EICIT Group – Tunja, Colombia
[email protected]
Electronic Engineering Faculty Universidad Pedagógica y Tecnológica de Colombia, UPTC Tunja, Colombia
[email protected]
Edgar Andrés Gutiérrez Cáceres
Luis Fredy Sosa Quintero
Electronic Engineering Faculty Santo Tomas University, USTA EICIT Group – Tunja, Colombia
[email protected]
Electronic Engineering Faculty Santo Tomas University, USTA EICIT Group – Tunja, Colombia
[email protected]
Abstract — In this paper, we propose the development of a system for generating disparity maps from rectified images, a technique that is booming in depth perception systems for computer vision. To that end, this paper briefly explains the process associated with matching techniques for stereo vision. We explore correlation algorithms such as the Sum of Absolute Differences and the Sum of Hamming Distances, combining the latter with the Census Transform. Additionally, we present the behavior of the disparity maps for images that have been affected by impulsive noise. We initially considered rectified images taken from the image bank provided by Middlebury College, which has worked extensively on the stereo vision problem; in this case, we used the well-known Tsukuba stereo pair.
Keywords — Census Transform, Disparity Maps, Sum of Absolute Differences, Sum of Hamming Distances
I. INTRODUCTION
• Homologous visual rays must intersect pairwise at points in space. This means that observation must take place along epipolar planes, i.e., corresponding epipolar rays lie on the same line.
• The most distant points of the two images must not be separated by more than the pupillary distance of the observer, or of the vision device used.
B. Epipolar Theory
As shown in Figure 1, consider two images of the same object, I1 and I2, taken by two different cameras. Given a point P1 of I1, its correspondence P2 in I2 necessarily belongs to the straight line determined by the points P1 and P2, called the epipolar line. In this system, the optical axes of the two cameras are separated in only one dimension, the horizontal, so a point of the scene captured by the two cameras will differ only in its horizontal component. [9]
In this paper, we briefly present the process of determining the disparity maps associated with a specific scene, described by a pair of previously rectified images. This is an important problem with many applications, such as autonomous navigation systems in mobile robotics [2-5], classification systems [6], mobile surface reconstruction [7], and face recognition [8], among others.
A. Stereoscopic Vision
The requirements for stereoscopic vision are:
• Each eye must be presented with an image that covers a common area and exhibits parallax with respect to the image observed by the other eye.
Fig. 1. Geometry of a pair of cameras with parallel optical axes. [9]
978-1-4799-1121-9/13/$31.00 ©2013 IEEE
From this system, the distance to the object in question can be triangulated. In this paper, we propose the development of a system for generating disparity maps from rectified images, one of the techniques that is booming in depth perception systems for computer vision. For this reason, this document briefly explains the process associated with matching techniques for stereo vision, through the following theoretical overview.
C. Stereoscopic Matching Algorithms
Stereo matching algorithms reproduce the process of human stereopsis so that a machine can perceive the depth of each point in the observed scene, and thus manipulate objects, avoid them, or recreate three-dimensional models. For a pair of stereoscopic images, the primary goal of these algorithms is to find, for each pixel in one image, the corresponding pixel in the other image (matching), in order to obtain a disparity map containing, for each pixel, the difference in position between the two images, which is proportional to depth. There are mainly two types of stereoscopic matching algorithm:
In the tests of the system, we added salt-and-pepper noise to the original images; Figure 3 shows the images in question after being subjected to this noise.
Fig. 3. Stereo Image Pair with Salt and Pepper Noise
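The noise-injection step described above can be sketched as follows. This is an illustrative implementation, not the authors' code; the 8-bit intensity range and the split of the density between black and white dots are assumptions.

```python
import numpy as np

def add_salt_and_pepper(image, density=0.1, rng=None):
    """Corrupt a grayscale image with salt-and-pepper (impulsive) noise.

    `density` is the fraction of pixels affected (0.1 = 10%, as in the
    tests described in the paper); half become black, half white.
    """
    rng = np.random.default_rng(rng)
    noisy = image.copy()
    r = rng.random(image.shape)
    noisy[r < density / 2] = 0            # pepper (black dots)
    noisy[r > 1 - density / 2] = 255      # salt (white dots)
    return noisy
```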
To remove the generated noise, we applied a median filter to each of the images considered; this kind of filter behaves very acceptably against salt-and-pepper noise. Figure 4 shows the filtered images.
Feature-based: these find matches only for the main features in the image, such as object edges. They require much less processing time because they use only certain features of the image, so the amount of input information is lower. Their disadvantage is that they cannot find depths that change smoothly, and they produce sparse disparity maps.
Correlation-based: these produce dense disparity maps, in which a disparity is obtained for every pixel in the image. Since matching a single pixel is almost impossible, each pixel is represented by a small region that contains it, called the correlation window; the correlation between one image and the other is then computed using the values of the pixels within these windows.
For this project, we explore correlation algorithms such as the Sum of Absolute Differences and the Sum of Hamming Distances, combining the latter with the Census Transform. Additionally, we present the behavior of the disparity maps for images that have been affected by impulsive noise.
Fig. 4. Pair of Stereo Images Filtered with a Median Filter
III. SUM OF ABSOLUTE DIFFERENCES (SAD)
The Sum of Absolute Differences (SAD) is one of the simplest similarity measures. It is calculated by subtracting the pixels within a square window between the reference image I1 and the target image I2, as seen in Figures 5 and 6, followed by aggregating the absolute differences within the square window. If the left and right images coincide exactly, the result is zero. This procedure is summarized in Equation 1.
II. STEREO IMAGES
We initially considered rectified images, taken from the image bank provided by Middlebury College, which has worked extensively on the stereo vision problem. In this case, we used the well-known Tsukuba stereo pair; these images are presented in Figure 2.
Fig. 5. Matching images by correlation
SAD(x, y, d) = Σ_(i,j)∈W |I1(x + i, y + j) − I2(x + i − d, y + j)|   (1)
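The SAD matching procedure described above can be sketched as follows. This is an illustrative implementation, not the authors' code; the window size of 5 and the maximum disparity of 16 are assumed parameters, not values from the paper.

```python
import numpy as np

def sad_disparity(left, right, window=5, max_disp=16):
    """Dense disparity map by Sum of Absolute Differences.

    For each pixel of the left (reference) image, a square correlation
    window is compared against windows in the right image along the same
    row; the horizontal shift d minimizing the SAD score is kept.
    """
    h, w = left.shape
    half = window // 2
    disp = np.zeros((h, w), dtype=np.int32)
    left = left.astype(np.int32)
    right = right.astype(np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            ref = left[y - half:y + half + 1, x - half:x + half + 1]
            best_d, best_score = 0, None
            for d in range(min(max_disp, x - half) + 1):
                tgt = right[y - half:y + half + 1,
                            x - d - half:x - d + half + 1]
                score = np.abs(ref - tgt).sum()
                if best_score is None or score < best_score:
                    best_score, best_d = score, d
            disp[y, x] = best_d
    return disp
```

On a synthetic pair where the right image is the left image shifted horizontally, the recovered disparity equals the known shift in the interior of the image.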
Fig. 2. Rectified Stereo Pair of Images
To address the problems caused by the added noise, we applied a median filter to each image of the stereo pair. Each pixel of the new image is generated by computing the median of the pixels in the neighborhood of the corresponding pixel in the source image; in this way, pixels whose intensity differs greatly from their neighbors are homogenized. This type of filter is well suited to random impulsive noise.
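The median filtering step can be sketched as follows; this is a plain-NumPy illustration (not the authors' implementation), and the 3×3 window size is an assumption.

```python
import numpy as np

def median_filter(image, size=3):
    """Median filter: each output pixel is the median of its size×size
    neighborhood in the (edge-padded) input image."""
    pad = size // 2
    padded = np.pad(image, pad, mode='edge')
    out = np.empty_like(image)
    h, w = image.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + size, x:x + size])
    return out
```

An isolated impulse (a single salt or pepper pixel) is completely removed, since it can never be the median of its neighborhood.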
Fig. 6. Search and Reference Windows for the Images
The next step was to apply SAD, obtaining the disparity map shown in Figure 9; this disparity map recovers some information with respect to the noisy disparity map presented earlier in Figure 8.
Using the Sum of Absolute Differences (SAD), we performed three tests of the disparity maps under different conditions. The first test was to apply SAD to the initial stereo pair, obtaining the disparity map shown in Figure 7, in which some of the objects present in the original image can be differentiated.
Fig. 9. Disparity Map with SAD on the Images Filtered with a Median Filter. a) Procedure performed, b) Disparity map
Fig. 7. Disparity Map with SAD on the Original Images. a) Procedure performed, b) Disparity map
IV. CENSUS TRANSFORM
A second test allowed us to observe the effect of adding noise to the images of a stereo pair; "salt and pepper" noise (black and white dots) was added with a noise density equal to 0.1 (affecting 10% of the pixels). After adding the noise to the original images, we applied SAD, obtaining the disparity map shown in Figure 8. This disparity map does not manage to differentiate the objects present, so information is lost because of the noise.
There are non-parametric measures based on applying a transformation to each pixel, replacing its current value by a value relative to a region of surrounding pixels. The Rank and Census transforms stand out within this category.
Fig. 8. Disparity Map with SAD on the Original Images with Salt and Pepper Noise. a) Procedure performed, b) Disparity map
Fig. 10. Rank Transform
The Rank transform for a given window is defined as the number of pixels in the window whose intensity is less than that of the central pixel. This transform is very robust, but information is lost because the relative ordering of the pixels that fall within the neighborhood is encoded into a single value; this behavior is shown in Figure 10, where different arrangements of the pixels yield the same value of the transform.
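The definition above can be sketched directly; this is an illustrative implementation under the assumption of a 3×3 window, not the authors' code.

```python
import numpy as np

def rank_transform(image, window=3):
    """Rank transform: each pixel is replaced by the number of pixels in
    its window whose intensity is strictly below the central pixel's."""
    half = window // 2
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.uint8)
    img = image.astype(np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = img[y - half:y + half + 1, x - half:x + half + 1]
            out[y, x] = np.count_nonzero(patch < img[y, x])
    return out
```

Note how the information loss mentioned in the text arises: any permutation of the neighbors that keeps the same count of darker pixels produces the same output value.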
Zabih and Woodfill [11] propose a modification of the above method that preserves the spatial distribution of the neighbors; this variation of the Rank transform is the Census transform. It encodes the information into a bit string rather than a single value (as in the Rank transform). Figure 11 shows an example of how the Census transform is applied to a pixel within a window of size five. The pixel to which the transform is applied is located at the center of the window, with an intensity value of 78. The window is traversed from the upper left to the lower right, processing the window matrix row by row. If the intensity value of the pixel being scanned is less than the intensity of the central pixel, a zero is appended to the string; if the value is higher, a one is appended.
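A minimal sketch of the Census transform, following the less-than-center → 0, greater-than-center → 1 convention described above. This is illustrative code, not the authors' implementation; packing the bit string into an integer (here a uint32, sufficient for the 24 bits of a 5×5 window) is an implementation choice.

```python
import numpy as np

def census_transform(image, window=5):
    """Census transform: each pixel is replaced by a bit string that
    records, for every neighbor in the window (scanned left-to-right,
    top-to-bottom, skipping the center), whether that neighbor is
    smaller (0) or larger (1) than the central pixel."""
    half = window // 2
    h, w = image.shape
    census = np.zeros((h, w), dtype=np.uint32)
    img = image.astype(np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            code = 0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    if dy == 0 and dx == 0:
                        continue  # the central pixel itself is skipped
                    code = (code << 1) | int(img[y + dy, x + dx] > img[y, x])
            census[y, x] = code
    return census
```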
Fig. 13. Census Transform in the Stereo pair contaminated with noise
A final test involved applying the Census transform to the images filtered with the median filter, as shown in Figure 14. The purpose of these tests was to feed each of these results into a later stage that uses the Sum of Hamming Distances (SHD) to determine the disparity map of the scene.
Fig. 11. Census Transform
Within the development of the project, we also worked with the Census transform; in this case, three tests were conducted. The first test was to apply the Census transform to the original stereo pair images; the results of this procedure can be seen in Figure 12.
Fig. 14. Census Transform of filtered Images
Fig. 12. Census Transform in the Original Stereo pair
A second test applied the Census transform to the images that had previously been subjected to impulsive noise. Figure 13 presents the results of applying the transform to the noisy images.
V. SUM OF HAMMING DISTANCES (SHD)
After computing the Census transform, the correspondence of a pixel can be determined by computing the Hamming distance between the bit string of each candidate pixel in one image, for example the left one, and the bit string of the pixel whose counterpart is sought in the other image of the pair, the right one. The Sum of Hamming Distances (SHD) is usually used to compare images that have previously been treated with a Census transform. In this method, the comparison is performed bit by bit through an XOR operation between the values of the left and right images within a square window. This step is usually followed by a bit-counting operation, which yields the final score for the Hamming distance. Figure 15 shows the calculation of the Hamming distance for two data streams. Equation (2) gives the form of calculation associated with the SHD method, where C1 and C2 represent the images previously treated with the Census transform.
Fig. 15. Sum of Hamming Distances
Fig. 17. Disparity Map with SHD on the Original Images with Salt and Pepper Noise. a) Procedure performed, b) Disparity map
SHD(x, y, d) = Σ_(i,j)∈W Hamming(C1(x + i, y + j), C2(x + i − d, y + j))   (2)
Using the Sum of Hamming Distances (SHD), we performed three tests of the disparity maps under different conditions. The first test consisted of applying SHD to the initial stereo pair, obtaining the disparity map shown in Figure 16, in which some of the objects present in the original image can be differentiated.
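The SHD matching cost described in the text, XOR followed by bit counting summed over a square window, can be sketched as follows. This is illustrative code under the assumption that the Census codes are packed into unsigned integers; the 5×5 window (half = 2) is an assumed parameter.

```python
import numpy as np

def shd_cost(census_left, census_right, x, y, d, half=2):
    """SHD matching cost for candidate disparity d at pixel (x, y):
    XOR the Census codes of the two windows bit by bit, count the set
    bits (Hamming distance), and sum over the window."""
    c1 = census_left[y - half:y + half + 1, x - half:x + half + 1]
    c2 = census_right[y - half:y + half + 1, x - d - half:x - d + half + 1]
    xor = np.bitwise_xor(c1, c2)
    # popcount each XOR result and accumulate over the window
    return sum(bin(int(v)).count('1') for v in xor.ravel())
```

Identical windows give a cost of zero, and each differing bit between corresponding Census codes adds exactly one to the cost.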
As in the tests with SAD, we used a median filter on each image of the stereo pair; again, the pixels of the new images were generated by computing the median of the pixels in the neighborhood of the corresponding pixel in the source image, so that pixels whose intensity differs greatly from their neighbors are homogenized. The next step consisted of applying SHD, obtaining the disparity map depicted in Figure 18. This map recovers some information, but no significant improvement is seen compared with the noisy disparity map computed with SHD and presented earlier in Figure 17.
Fig. 16. Disparity Map with SHD on the Original Images. a) Procedure performed, b) Disparity map
A second test allowed us to observe the effect of adding noise to the images of a stereo pair; "salt and pepper" noise (black and white dots) was added with a noise density equal to 0.1 (affecting 10% of the pixels). After adding the noise to the original images, we applied SHD, obtaining the disparity map shown in Figure 17. In this disparity map, the objects present can still be differentiated, although some of the disparities have changed with respect to the previously analyzed noise-free images. Nevertheless, the system response is better than that obtained with SAD.
Fig. 18. Disparity Map with SHD on the Images Filtered with a Median Filter. a) Procedure performed, b) Disparity map
VI. CONCLUSIONS
Within the development of this project, we selected images provided by the Middlebury College repository. The purpose of selecting these images was to verify the behavior of the algorithms while isolating the problems inherent to real-world capture, such as different intensity levels between the two images due to glare and other lighting effects of unstructured environments, radial distortion in the lenses, or vertical displacements of the cameras, so that in the next step, applying the system to the real world, only problems of this nature remain. This paper presents an assessment of algorithms for calculating the disparity map in stereo vision systems. An idea for future development of this project is the implementation of the algorithms on embedded systems, in order to observe the behavior of the system in a real-time execution scenario.
REFERENCES
[1] L. D. Matteo, R. Verrastro, R. Cignoli, I. Bertacchini, I. Alexis, and P. Pardini, "Mapeo tridimensional de la topología del entorno mediante sistema de visión artificial estereoscópica," Seminario IA y R, Universidad Tecnológica Nacional, Facultad Regional Buenos Aires, 2010.
[2] K. Hattori and Y. Sato, "Handy Rangefinder for Active Robot Vision," IEEE International Conference on Robotics and Automation, pp. 1423–1428, 1995.
[3] J. Pages, C. Collewet, F. Chaumette, and J. Salvi, "A camera-projector system for robot positioning by visual servoing," Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06), 2006.
[4] A. Vazquez and A. Adan, "3D Vision System for Robot Interaction in Moving Scenes," IEEE International Symposium on Intelligent Signal Processing (WISP 2007), 2007.
[5] J. Ryu, S. Yun, K. Song, J. Cho, J. Choi, and S. Lee, "High Speed 3D IR Scanner for Home Service Robots," pp. 678–685, 2006.
[6] J. Oh, C. Lee, S. Lee, S. Jung, D. Kim, and S. Lee, "Development of a Structured-light Sensor Based Bin-Picking System Using ICP Algorithm," International Conference on Control, Automation and Systems, Gyeonggi-do, Korea, pp. 1673–1677, Oct. 2010.
[7] X. Maurice, P. Graebling, and C. Doignon, "Epipolar Based Structured Light Pattern Design for 3-D Reconstruction of Moving Surfaces," IEEE International Conference on Robotics and Automation, Shanghai, China, pp. 5301–5308, May 2011.
[8] M. Young, E. Beeson, J. Davis, S. Rusinkiewicz, and R. Ramamoorthi, "Viewpoint-Coded Structured Light," IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, Jun. 2007.
[9] D. Martín Carabias, R. Requero García, and J. A. Rodríguez Salor, "Sistema de Visión Estereoscópica para Navegación Autónoma de vehículos no tripulados," Universidad Complutense de Madrid, Facultad de Informática, 2010.
[10] A. Iván and M. Clemente, "Generación de Mapas de Disparidad Utilizando Cuda," Universidad Carlos III de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería de Sistemas y Automática, 2009.
[11] R. Zabih and J. Woodfill, "Non-parametric Local Transforms for Computing Visual Correspondence," Proceedings of the Third European Conference on Computer Vision (ECCV), pp. 151–158, 1994.
[12] G. Xiong, X. Li, J. Gong, and D. Lee, "Color Rank and Census Transforms using Perceptual Color Contrast," 11th International Conference on Control, Automation, Robotics and Vision, pp. 7–10, Dec. 2010.