EVALUATION OF DISPARITY MAP CHARACTERISTICS FOR STEREO IMAGE CODING
Anil Aksay, M. Oguz Bici, Gozde Bozdagi Akar
Electrical and Electronics Engineering Department, Middle East Technical University, 06531, Ankara, Turkey
{anil, mobici, bozdagi}@eee.metu.edu.tr, http://mmrg.eee.metu.edu.tr
ABSTRACT
Disparity compensation is the most widely used method for compressing stereo image pairs effectively. In this paper we examine the effects of using different disparity maps, and their properties, in an embedded JPEG2000-based disparity-compensated stereo image coder. These properties include the block size, the estimation method and the resulting entropy of the disparity map. Experimental results show that basic block matching gives better results than ground truth, especially on occluded regions and boundaries.
1. INTRODUCTION
The human visual system processes the 3D world using monocular and binocular cues. Monocular cues are interposition of objects, shading, 3D geometry and blurriness. Binocular cues are the two slightly different images entering the left and right eyes. The positional difference of an object between the left and right images is called disparity. By processing the disparities of objects, we can perceive the depth of objects around us. Stereo image capturing systems mimic the difference between the two eyes with two slightly separated parallel cameras. Stereo image viewing systems present these two slightly different images to the two eyes of the observer. Since the two images differ only slightly, estimating one from the other enables highly efficient compression of stereo image pairs. Disparity estimation is the most widely used method in stereo image coding [1, 2]. Disparity is estimated using techniques such as block matching, optical flow methods and hierarchical methods [11]. Disparity information is represented by a block-based representation [1, 3], a quadtree representation [4] or a hierarchical representation [2].
Disparity estimation is the most computationally intensive part of these coding schemes. However, in some applications the original disparity information can be obtained from the camera setup or from 3D information about the scene and objects. The original disparity information is also called ground truth. A previous work [5] investigated the use of original disparity information with different wavelet-based coding techniques for the reference frame and the residual error frame. The results show that wavelet-based coding gives better results than DCT-based coding. In this work, we examine the effect of using different disparity maps (estimated or ground truth) in a stereo image coder. We also try to improve the coding efficiency when ground truth information is used. By modifying the ground truth without intensive computation, we can achieve higher compression ratios at the same quality.
2. STEREO IMAGE CODER
In our system we use the JPEG2000-based disparity-compensated stereo image coder and decoder illustrated in Figures 1 and 2. The right image is used as the reference frame, since the evaluated depth maps are generated in that manner. The right image is first JPEG2000 coded and decoded at a specific bitrate (B1). The left image is constructed using the decoded right image and the depth map. The error is JPEG2000 coded at a specific bitrate (B2). The depth map is arithmetic coded. JPEG2000 coding is performed with the JasPer software [6] using JP2 coding syntax [7] with default parameters (6 resolution levels, integer mode, layer-resolution-component-position (LRCP) progressive, i.e., rate scalable, mode).
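The prediction step of this coder, in which the left image is constructed from the decoded right image and the depth map, might be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the sign convention (left pixel at column x taken from right column x - d), the clamping at image borders and the reading of subpixel mode as an average of the two neighbouring candidate pixels are all assumptions.

```python
import numpy as np

def compensate_left(right, disparity, subpixel=False):
    """Predict the left image from the right image and a per-pixel
    horizontal disparity map (assumed convention: left[y, x] ~ right[y, x - d])."""
    h, w = right.shape
    rows = np.arange(h)[:, None]
    # Source column in the right image for every left-image pixel.
    xs = np.arange(w)[None, :] - np.asarray(disparity, dtype=np.float64)
    if subpixel:
        # Assumed subpixel mode: average the two candidate pixels
        # on either side of the fractional source position.
        x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
        x1 = np.clip(np.ceil(xs).astype(int), 0, w - 1)
        return (right[rows, x0].astype(np.float64) + right[rows, x1]) / 2.0
    # Integer mode: take the nearest source pixel.
    x = np.clip(np.rint(xs).astype(int), 0, w - 1)
    return right[rows, x]

def residual(left, predicted):
    """Error image that would be passed to the second JPEG2000 coder."""
    return left.astype(np.float64) - predicted.astype(np.float64)
```

With an integer-valued disparity map the two modes give the same prediction, since both candidate pixels coincide.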
Input image sets are either recorded by parallel cameras or rectified [8] before entering this coding scheme. Therefore, the depth map consists of only a horizontal displacement for each pixel; the vertical positions of pixels are assumed to be identical in both images. Input depth maps can have sub-pixel accuracy, so disparity compensation is employed in three different modes: integer mode, half-pixel mode and subpixel mode. In half-pixel mode, the pixel corresponding to the disparity value is selected between the two corresponding locations. In subpixel mode, the arithmetic average of the two candidate pixels is selected according to the disparity value. Arithmetic coding of the depth map is based on [9] with an updated order-n model (n being equal to the whole sequence size), where all the probabilities are embedded into the bitstream at the start. The overhead of sending the probabilities is negligible, since the maximum disparity is limited for stereo image pairs, and coding performance increases considerably compared with the order-0 model. Depending on the disparity compensation mode, we represent the disparity at sub-pixel, half-pixel or pixel resolution. The total sequence length also changes with the block size of the depth map.
Figure 1 Stereo Image Coder
Figure 2 Stereo Image Decoder
3. DEPTH MAPS
For our experiments, we use the map, tsukuba, venus, sawtooth and room image pairs. We have used the following depth maps: original ground truth, 8x8 Block Matching, 8x8 Block Matching with Smoothing [10] and all of the disparity estimation algorithms evaluated in http://www.middlebury.edu/stereo [11, 12, 13, 14]. Initially we arithmetic code the disparity map and compensate the left image using the disparity map and the original right image. We compare the PSNR values of the reconstructed left image and the arithmetic-coded size of the disparity maps. We have evaluated the performance of all 33 disparity estimation methods described in [11]. Initial tests show that the original ground truth does not give the best performance among the disparity maps. The problems with ground truth are occluded areas and image boundaries. Moreover, coding a disparity value for each pixel is not practical. Table 1 presents some of the best performing methods.
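As a rough guide to the arithmetic-coded size of a disparity map, one can compute the ideal code length under a static model whose symbol probabilities are the empirical frequencies of the whole sequence, which is how we read the model with probabilities sent at the start of the bitstream. The sketch below is an assumption-laden estimate, not the coder of [9]: the function name is hypothetical, and the result ignores the overhead of transmitting the probabilities and the small inefficiency of a real arithmetic coder.

```python
import math
from collections import Counter

def static_model_bits(symbols):
    """Ideal code length in bits for a symbol sequence coded with a
    static model built from its own empirical symbol frequencies."""
    counts = Counter(symbols)
    n = len(symbols)
    # Each occurrence of a symbol with probability p costs -log2(p) bits.
    return -sum(c * math.log2(c / n) for c in counts.values())
```

For example, `static_model_bits(disparity_map.ravel().tolist())` would estimate the cost of a per-pixel map, while applying it to an 8x8 blocked map gives the much shorter sequence discussed next.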
In order to reduce the entropy of the disparity maps, we converted all depth maps into 8x8 blocked depth maps by using averaging and median filtering. Median filtering decreases PSNR by 1.1 dB and averaging decreases it by 1.6 dB. Both filtering methods reduce the entropy to similar values, and the resulting entropies are very small compared to full depth map coding. In order to increase the performance of ground truth, we replace the disparities of the blocks at boundaries and occluded regions with the corresponding disparities found by Block Matching. We detect those regions by comparing the SAD (sum of absolute differences) between the reconstructed image and the original image. The PSNR of the compensated images with modified ground truth maps and the corresponding ratio of modified blocks to the total number of blocks are shown in Table 2.
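The blocking and block replacement steps described above could be sketched as follows. This is only one possible reading: the function names are hypothetical, image dimensions are cropped to a multiple of the block size, and the replacement rule (use the Block Matching disparity wherever the ground-truth block has the larger SAD) is an assumption, since the exact detection criterion is not spelled out.

```python
import numpy as np

def block_reduce(depth, block=8, reducer=np.median):
    """Collapse a per-pixel depth map to one value per block
    (reducer=np.median or np.mean for the two filters)."""
    hb, wb = depth.shape[0] // block, depth.shape[1] // block
    tiles = depth[:hb * block, :wb * block].reshape(hb, block, wb, block)
    tiles = tiles.swapaxes(1, 2).reshape(hb, wb, block * block)
    return reducer(tiles, axis=-1)

def block_sad(original, reconstructed, block=8):
    """Sum of absolute differences per block between two images."""
    diff = np.abs(original.astype(np.float64) - reconstructed.astype(np.float64))
    hb, wb = diff.shape[0] // block, diff.shape[1] // block
    return diff[:hb * block, :wb * block].reshape(hb, block, wb, block).sum(axis=(1, 3))

def modify_ground_truth(gt_blocks, bm_blocks, sad_gt, sad_bm):
    """Assumed rule: replace a ground-truth block disparity with the
    block-matching one where ground truth compensates worse (higher SAD).
    Returns the modified map and the fraction of modified blocks."""
    replace = sad_gt > sad_bm
    return np.where(replace, bm_blocks, gt_blocks), float(replace.mean())
```

The second return value corresponds to the modified-block ratio reported alongside the PSNR figures, expressed as a fraction instead of a percentage.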
Images / Mode     | GT    | BM    | BM+S [10] | GC [12] | MF [13] | SO [11] | GA [14] | BD [11]
Map Integer       | 20.35 | 25.33 | 25.07     | 27.13   | 27.31   | 26.07   | 26.11   | 25.52
Map Subpixel      | 22.20 | 25.33 | 25.07     | 27.13   | 31.01   | 26.07   | 26.11   | 25.52
Tsukuba Integer   | 23.53 | 29.69 | 29.47     | 32.83   | 32.35   | 32.76   | 31.10   | 32.00
Tsukuba Subpixel  | 23.53 | 29.69 | 29.47     | 32.83   | 36.03   | 32.76   | 31.10   | 32.00
Sawtooth Integer  | 23.61 | 28.04 | 27.92     | 29.12   | 29.33   | 29.25   | 28.89   | 28.84
Sawtooth Subpixel | 24.53 | 28.04 | 27.92     | 29.12   | 31.99   | 29.25   | 28.89   | 28.84
Venus Integer     | 25.44 | 29.05 | 28.96     | 29.61   | 29.80   | 29.82   | 29.46   | 29.36
Venus Subpixel    | 27.42 | 29.05 | 28.96     | 29.61   | 31.96   | 29.82   | 29.46   | 29.36
(GT = Ground Truth, BM = Block Matching, BM+S = BM with Smoothing, GC = Graph cuts, MF = Maximum Flow, SO = Scanline Optimization, GA = Genetic Algorithm, BD = Bayesian Diffusion)
Table 1 PSNR (dB) for several disparity maps with integer and subpixel disparity compensation

Depth Map | gt    | gts   | gtav  | gtmed | bm    | gtmod | gtmods | % Modified Blocks
Map       | 20.35 | 22.20 | 19.24 | 20.16 | 25.33 | 23.72 | 25.94  | 12.45
Room      | 19.54 | 21.75 | 19.71 | 19.64 | 22.49 | 21.18 | 22.73  | 6.45
Tsukuba   | 23.53 | 23.53 | 22.79 | 23.63 | 29.69 | 27.28 | 27.28  | 5.38
Sawtooth  | 23.61 | 24.53 | 22.43 | 23.04 | 28.04 | 26.51 | 27.98  | 4.92
Venus     | 25.44 | 27.42 | 25.17 | 25.53 | 29.05 | 27.05 | 29.37  | 3.14
Table 2 PSNR (dB) for several modified ground truth maps and percentages of updated blocks (gt = original ground truth, gts = original ground truth with subpixel disparity compensation, gtav = 8x8 blocked ground truth using averaging filter, gtmed = 8x8 blocked ground truth using median filter, bm = 8x8 block matching map, gtmod = modified 8x8 blocked ground truth using median filter, gtmods = modified 8x8 blocked ground truth using median filter with subpixel disparity compensation)
Figure 3 PSNR (dB) versus bitrate (bpp) for B1 = B2 with selected algorithms (tsukuba)
4. RESULTS OF CODED STEREO PAIRS
After refining the disparity maps, we selected several algorithms according to their performance in the first stage. We applied these disparity maps to the system at several bitrates (where B1 = B2). We also compare the results with the performance of independently coded image pairs. Since we are compressing both frames, we use the following PSNR (dB) definition for calculating the quality of the stereo image pair:
$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{(D_l + D_r)/2}$$
where $D_l$ and $D_r$ denote the distortions (mean squared errors) of the left and right images. PSNR versus bitrate figures for room and map images are shown in Figure 3 and 4. As the figures show, basic block matching gives better results than the other algorithms. By modifying the ground truth and using subpixel compensation, we can perform similarly to block matching.
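The joint PSNR definition used in this section can be computed as in the sketch below. The function names are hypothetical, and D_l and D_r are taken to be the mean squared errors of the decoded left and right images against their originals, which is the usual meaning for 8-bit images with a peak value of 255.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images of equal shape."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def stereo_psnr(left, left_rec, right, right_rec):
    """PSNR (dB) of a stereo pair: 10 * log10(255^2 / ((D_l + D_r) / 2))."""
    d = (mse(left, left_rec) + mse(right, right_rec)) / 2.0
    if d == 0.0:
        return float("inf")  # identical images: PSNR is unbounded
    return float(10.0 * np.log10(255.0 ** 2 / d))
```

Averaging the two distortions before taking the logarithm means that a poorly reconstructed view lowers the joint figure more than averaging the two per-image PSNR values would.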
Figure 4 PSNR (dB) versus bitrate (bpp) for B1 = B2 with selected algorithms (map)
As the last step of the experiments, we change the ratio between B1 and B2. Increasing B1 increases the quality of the right image and also the quality of the compensated left image. Even though the rate available for the error decreases, a better estimate of the left image improves the quality of the final left image. In Figure 5, we use block matching with different B1 and B2 ratios. The figure shows that with B1/TB = 0.7, where TB is the total bitrate, we can improve PSNR by about 1.5 dB.
Figure 5 PSNR (dB) versus bitrate (bits per pixel) for different B1/(B1+B2) ratios (tsukuba)
5. CONCLUSIONS AND FUTURE WORK
In this work, we have investigated the effect of disparity information in stereo image coding methods. If ground truth information is available, we show that coding efficiency can be improved by modifying the ground truth without intensive computation. We have also shown that basic block matching gives better results than ground truth, especially on occluded regions and boundaries. In the future, we will try to extend our system to images that are not rectified (having both horizontal and vertical displacements). We will also incorporate our findings into stereo video coding.
6. ACKNOWLEDGMENTS
This work is supported by the EC within FP6 under Grant 511568 with the acronym 3DTV. Stereo pairs (map, tsukuba, sawtooth and venus) are the property of Middlebury College Stereo Vision Research Group and can be downloaded at http://www.middlebury.edu/stereo/. The Room stereo pair is the property of the Computer Vision Group, University of
Bonn and can be downloaded at http://www-dbv.cs.uni-bonn.de/stereo_data/. We would also like to thank Çağdaş Bilen and Murat Birinci for their help in obtaining the disparity maps.
7. REFERENCES
[1] A. Frajka and K. Zeger, "Residual image coding for stereo image compression", Optical Engineering, Vol. 42, pp. 182-189, 2003.
[2] S. Sethuraman, M. W. Siegel and A. G. Jordan, "Multiresolution based hierarchical disparity estimation for stereo image pair compression", Proc. of the Symposium on Application of Subbands and Wavelets, Newark, NJ, 1994.
[3] R. Shukla and H. Radha, "Disparity Dependent Segmentation based Stereo Image Coding", IEEE International Conference on Image Processing (ICIP), September 2003.
[4] N. V. Boulgouris and M. G. Strintzis, "Embedded Coding of Stereo Images", in Proc. ICIP, Vol. 3, pp. 640-643, Vancouver, Canada, 2000.
[5] N. V. Boulgouris and M. G. Strintzis, "A family of wavelet-based stereo image coders", IEEE Trans. on CSVT, Vol. 12, No. 10, pp. 898-903, October 2002.
[6] M. D. Adams and F. Kossentini, "JasPer: A software-based JPEG-2000 codec implementation", in Proc. of IEEE International Conference on Image Processing, Vancouver, BC, Canada, October 2000.
[7] International Organization for Standardization and International Electrotechnical Commission, ISO/IEC 15444-1:2000, Information technology – JPEG 2000 image coding system – Part 1: Core coding system.
[8] N. Ayache and C. Hansen, "Rectification of images for binocular and trinocular stereovision", in Proceedings of the 9th International Conference on Pattern Recognition, Rome, Italy, pp. 11-16, 1988.
[9] E. Bodden, M. Clasen and J. Kneis, "Arithmetic Coding revealed: A guided tour from theory to praxis", http://ac.bodden.de, May 2004.
[10] J. Konrad and Z.-D. Lan, "Dense disparity estimation from feature correspondences", in Proc. SPIE Stereoscopic Displays and Virtual Reality Systems, Vol. 3957, pp. 90-101, Jan. 2000.
[11] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms", Intl. J. Comp. Vis., 47(1): 7-42, 2002.
[12] Y. Boykov, O. Veksler and R. Zabih, "Fast approximate energy minimization via graph cuts", IEEE TPAMI, 23(11): 1222-1239, 2001.
[13] S. Roy and I. J. Cox, "A maximum-flow formulation of the N-camera stereo correspondence problem", ICCV, pp. 492-499, 1998.
[14] M. Gong and Y.-H. Yang, "Multi-baseline stereo matching using genetic algorithm", in IEEE Workshop on Stereo and Multi-Baseline Vision, 2001.