21st International Conference on Pattern Recognition (ICPR 2012) November 11-15, 2012. Tsukuba, Japan

Depth Image Enhancement for Kinect Using Region Growing and Bilateral Filter

Li Chen, Hui Lin and Shutao Li
College of Electrical and Information Engineering, Hunan University, Changsha, China 410082
[email protected]; [email protected]; [email protected]

Abstract

Microsoft's Kinect as a recent 3D sensor has attracted considerable research attention in the fields of computer vision and pattern recognition, but its depth image suffers from poor accuracy caused by invalid pixels, noise and unmatched edges. In this paper, an efficient approach is proposed to improve the quality of Kinect's depth image. Using the corresponding color image, pixels with wrong depth values are detected and removed with a region growing method. To accurately estimate the values of invalid pixels, a joint bilateral filter is used to fill the holes. Considering the special noise property of the Kinect sensor, an adaptive bilateral filter is proposed to effectively reduce the noise of the depth image. Experimental results show that the proposed method significantly improves the quality of the depth image by successfully filling holes, eliminating unmatched edges and reducing noise.

1. Introduction

As a new member of the 3D sensor family, Kinect has drawn great attention from researchers in the field of 3D computer vision for its consumer price and real-time nature, and has been used in a wide range of applications such as free viewpoint television (FTV), natural scene modeling and robot vision [1, 2, 3]. Based on the structured light technique, Kinect generates depth and color images simultaneously at about 30 fps. However, limited by its depth measuring principle and by object surface properties, the depth image from Kinect inevitably contains optical noise and unmatched edges, together with holes (invalid pixels), as shown in Figure 1, which makes it unsuitable for numerous applications.

To overcome these problems, several attempts were made in previous works. Matyunin et al. [4] used bilateral and median filters in the temporal domain, but theirs is an off-line approach and not suitable for real-time applications. Camplani and Salgado [5, 6] proposed two methods using bilateral and Kalman filters to smooth depth maps and reduce random fluctuation over time, but both approaches were designed for static scenes and cannot be used in dynamic situations.

In this paper, we propose a three-step approach that can handle both static and dynamic situations in real time. First, a region growing method is developed to detect pixels with wrong depth values according to the edge information of both the depth and color images. Then the bilateral filter and the region growing technique are combined to fill the holes. Finally, a new adaptive joint bilateral filter is proposed to reduce the noise of the depth map according to the special noise property of the Kinect sensor.

The remainder of this paper is organized as follows. Section 2 describes the proposed method in detail. Section 3 presents the experiments, and Section 4 concludes the paper.

978-4-9906441-0-9 ©2012 ICPR

2. The proposed method

The goal of this paper is to provide an effective solution that improves the depth quality from the Kinect sensor by filling holes, refining edges and reducing noise. We notice that unmatched edges exist because some pixels near object boundaries are assigned wrong depth values. These pixels should be detected and their values removed before the holes are filled. We therefore remove the wrong depth values first and then fill the holes; noise reduction is performed in the last step. The following subsections describe each step in detail.


(a) color image  (b) depth image  (c) problems of depth image

Fig 1: (a) and (b) are the color and depth images captured by Kinect; (c) shows the problems of the depth image. The roughness of the right part shows the optical noise. Black pixels are the invalid pixels (holes). White pixels are the edges of the color image. The edges of the depth map clearly do not match the corresponding edges of the color image (unmatched edges). Note that the unmatched edges are caused by the wrong value pixels (pixels with wrong depth values), which lie between the edges of the depth image and the corresponding edges of the color image.

2.1 Wrong value pixel removal

We notice that wrong depth values lie between the edges of the depth map and the corresponding edges of the color image (see Figure 1(c)). This fact gives us a criterion to determine whether the depth value of a certain pixel is wrong. The process of detecting wrong value pixels is shown in Figure 2. First, a region grows [9] (see Figure 2(a)) from a depth image edge (green line) until it reaches a color image edge (red line) or a certain distance. A similar process (see Figure 2(b)) is run separately from the color image edges toward the depth image edges. Figure 2(e) and (f) are the results of Figure 2(a) and (b). An AND operator is then applied between (e) and (f); the result (Figure 2(g)) masks the pixels between the edges of the depth image and the corresponding edges of the color image. Note that pixels immediately adjacent to the depth image edges may also have wrong values, so we expand the edge pixels (Figure 2(c)) with a 3×3 window and add the result (Figure 2(d)) to the final mask. The final mask of wrong value pixels is generated as follows:

M = (M_d2c AND M_c2d) OR E_d    (1)

where M is the final mask (Figure 2(h)) of wrong value pixels, M_d2c and M_c2d are the region growing results shown in Figure 2(e) and (f), and E_d denotes the depth edge image after expansion (Figure 2(d)). We regard wrong value pixels as another kind of invalid pixel and remove their values. The next subsection shows how to estimate their correct values.
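As a minimal illustration of Eq. (1), the two region-growing masks and the dilated depth-edge image can be combined with elementwise Boolean operations. The 0/1 grids and the `dilate3x3` helper below are hypothetical stand-ins for the actual region growing and Canny edge outputs, not the authors' implementation:

```python
def dilate3x3(mask):
    """Expand set pixels with a 3x3 window (the edge expansion step)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if any(mask[ny][nx]
                   for ny in range(max(0, y - 1), min(h, y + 2))
                   for nx in range(max(0, x - 1), min(w, x + 2))):
                out[y][x] = 1
    return out

def wrong_value_mask(m_d2c, m_c2d, depth_edges):
    """Eq. (1): M = (M_d2c AND M_c2d) OR dilate(E_d)."""
    e_d = dilate3x3(depth_edges)
    h, w = len(m_d2c), len(m_d2c[0])
    return [[(m_d2c[y][x] & m_c2d[y][x]) | e_d[y][x] for x in range(w)]
            for y in range(h)]

# Toy 4x4 example: growth from a depth edge vs. growth from a color edge.
m_d2c = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
m_c2d = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
edges = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
M = wrong_value_mask(m_d2c, m_c2d, edges)
```

Only pixels claimed by both growth directions survive the AND, and the dilated depth edge is then OR-ed in, matching the intersection-plus-expansion logic of Figure 2.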

2.2 Hole filling

One possible solution to the hole filling problem is to estimate the values of the invalid pixels from similar valid pixels in their neighborhood. As shown in [7], the bilateral filter fits this idea exactly, since both color similarity and spatial similarity are taken into account. However, there are situations where a simple bilateral filter may yield wrong values: for example, an invalid pixel may lie near an edge but

Fig 2: the process of wrong value pixel detection.

Fig 3: result of removing wrong value pixels. The original map is shown in Figure 1(b). Point A is an invalid pixel near an edge (white line) of the color image shown in Figure 1(a). As the figure shows, the pixels close to A are either invalid or belong to a different surface (the human body) than A (the wall).


most of its neighboring pixels on the same side are also invalid (see Figure 3). This situation is quite common, since invalid pixels usually appear near object boundaries, which correspond to the edges of the depth map; moreover, our wrong value pixel removal technique inevitably causes it. It is therefore necessary to estimate an invalid pixel's depth using only the neighboring pixels on the same surface. To achieve this, the same region growing method as in the first step is used to create a smooth region Ω_s around an invalid pixel P_i using the edge information of the color image. The estimated value D_i^E of P_i is then calculated as follows:

D_i^E = Σ_{j∈Ω_s, D_j≠0} G_s(‖i−j‖) G_c(‖C_i−C_j‖) D_j / Σ_{j∈Ω_s, D_j≠0} G_s(‖i−j‖) G_c(‖C_i−C_j‖)    (2)

where G_s is the spatial weight and G_c is the weight of color similarity. They are Gaussian functions with zero mean and standard deviations σ_s and σ_c respectively. ‖i−j‖ represents the spatial similarity and ‖C_i−C_j‖ the color similarity; they are Euclidean distances in image space and color space. To make D_i^E more reliable, a minimum-number-of-valid-pixels criterion is introduced: D_i^E is calculated only when the number of valid pixels in Ω_s reaches a certain number. After the values of the invalid pixels in smooth areas are estimated, a conventional bilateral filter (without the smooth region restraint) is used to fill the remaining invalid pixels.

2.3 Noise reduction

To better reduce the optical noise without blurring the edges, the special noise property of the sensor should be taken into account when denoising the depth image. Both theory [7] and experiment [2] show that the depth error increases as a quadratic function of distance. Based on this fact, we propose an adaptive joint bilateral filter to reduce the noise of Kinect's depth image. As shown in [8], the conventional bilateral filter is easily extended to depth image denoising with an aligned color image. For each pixel P_i, the modified depth value is calculated as follows:

D_i^M = Σ_{j∈Ω} G_s(‖i−j‖) G_c(‖C_i−C_j‖) G_d(‖D_i−D_j‖) D_j / Σ_{j∈Ω} G_s(‖i−j‖) G_c(‖C_i−C_j‖) G_d(‖D_i−D_j‖)    (3)

where G_s, G_c, ‖i−j‖ and ‖C_i−C_j‖ are the same as in Eq. (2). G_d represents the weight of depth similarity; it is a Gaussian function with zero mean and standard deviation σ_d. ‖D_i−D_j‖ is the difference between the depth values of P_i and P_j, and Ω is a neighborhood of P_i. In the above equation, higher standard deviations (larger σ_s, σ_c and σ_d) mean a smoother result, but the edges are blurred more. To better reduce the noise while preserving the edges, both σ_c and σ_d are adapted in the proposed method. σ_d is modified as a quadratic function of distance according to the special noise property of the Kinect sensor. We also notice that a low local standard deviation of depth values often represents a smooth local area, while a high local standard deviation usually indicates many details; we use this fact to adaptively choose σ_c. First, the mean and the standard deviation are calculated over a neighborhood Ω of a certain pixel. Then the standard deviation is modified according to the depth error's quadratic dependence on distance:

σ̃_l = σ_l / D_lm²    (4)

where σ_l and D_lm are the local standard deviation and the local mean, and σ̃_l is the modified standard deviation. Next, σ_c is adapted as follows:

σ_c = max{σ_c0 + k·σ̃_l, σ_cmin}    (5)

where σ_c0 is a relatively high sigma for G_c, σ_cmin is the minimum value σ_c may take, and k is a negative factor, meaning that the more details (a higher σ̃_l) a local region has, the better the discrimination ability (the lower σ_c) the color information should have.

3. Experiments

To test the performance of the proposed method, we captured over 300 depth and color image pairs from the Kinect sensor, ranging from very simple scenes (shown in Figure 1) to quite complex environments (see Figure 5), based on the OpenNI framework. To speed up the proposed algorithm, a common smooth region is generated and reused for all invalid pixels on the same surface in the hole filling step. Both the modified and the conventional bilateral filters are applied iteratively to fill large holes. In the third step, we approximate the local standard deviation of depth values by the average difference between the depth value of a pixel and those of its neighbors. The edges of both the depth and color images are obtained with the Canny operator. The window size is 9×9 for wrong value pixel detection and 5×5 for hole filling and smoothing. σ_s and σ_c are set to 1.2 and 3 respectively in the second step, and the minimum number of valid pixels is 3. In the third step, σ_c0, σ_cmin and k are set to 15, 3 and −1.5.

Figure 4 shows the results of the iterative bilateral filter and the proposed method. As the figure shows, the proposed method clearly draws the outline of the person, especially in the hand areas. Figure 5 is another example: the red box shows that the proposed method can accurately fill large holes, and the green box demonstrates its ability to refine edges. However, the proposed method cannot deal with dark areas where color information is missing (the blue box). The speed of the proposed method varies with the number of invalid pixels. The average computational time is 0.16 sec/frame for the first two steps and 0.74 sec/frame for the third step at 560×420 image resolution, implemented in C++ and OpenCV on a PC with a 2.93 GHz CPU and 2 GB RAM.

(a) bilateral filter  (b) our method

Fig 4: pieces of results for the raw depth image shown in Figure 1(b).

4. Conclusions

We have provided an effective solution to improve the depth map from Kinect by removing wrong value pixels, filling holes and reducing noise. As the experimental results show, the proposed method can significantly improve the quality of depth maps and enlarge Kinect's application field where high quality depth images are required.
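As a rough sketch of the adaptive joint bilateral filter of Section 2.3 (Eqs. (3)-(5)), not the authors' implementation, the following assumes depth and color are given as 2D lists; the `gauss` helper, grid sizes and default parameters are illustrative, and the quadratic adaptation of σ_d with distance is omitted for brevity (σ_d is held fixed):

```python
import math

def gauss(x, sigma):
    """Zero-mean Gaussian weight."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

def adaptive_sigma_c(depth, y, x, radius, sigma_c0, sigma_cmin, k):
    """Eqs. (4)-(5): adapt sigma_c from the distance-normalized
    local standard deviation of valid depth values."""
    vals = [depth[ny][nx]
            for ny in range(max(0, y - radius), min(len(depth), y + radius + 1))
            for nx in range(max(0, x - radius), min(len(depth[0]), x + radius + 1))
            if depth[ny][nx] > 0]
    if not vals:
        return sigma_c0
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    std_tilde = std / (mean * mean)                    # Eq. (4)
    return max(sigma_c0 + k * std_tilde, sigma_cmin)   # Eq. (5), k < 0

def adaptive_joint_bilateral(depth, color, radius=2, sigma_s=1.2,
                             sigma_c0=15.0, sigma_cmin=3.0, k=-1.5,
                             sigma_d=10.0):
    """Eq. (3) with sigma_c chosen per pixel via Eqs. (4)-(5).
    Invalid pixels (depth <= 0) are skipped as neighbors."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sc = adaptive_sigma_c(depth, y, x, radius, sigma_c0, sigma_cmin, k)
            num = den = 0.0
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    if depth[ny][nx] <= 0:   # hole: no contribution
                        continue
                    ws = gauss(math.hypot(y - ny, x - nx), sigma_s)
                    wc = gauss(abs(color[y][x] - color[ny][nx]), sc)
                    wd = gauss(abs(depth[y][x] - depth[ny][nx]), sigma_d)
                    num += ws * wc * wd * depth[ny][nx]
                    den += ws * wc * wd
            out[y][x] = num / den if den > 0 else depth[y][x]
    return out

# Example: a flat surface stays flat; an isolated spike is pulled
# toward its neighbors without being flattened to their value.
flat_depth = [[1000.0] * 5 for _ in range(5)]
flat_color = [[128] * 5 for _ in range(5)]
smoothed = adaptive_joint_bilateral(flat_depth, flat_color)
```

On a smooth, low-variance region σ̃_l is near zero, so σ_c stays at σ_c0 and the filter smooths aggressively; in detailed regions σ̃_l rises and the negative k drives σ_c down toward σ_cmin, letting color edges dominate and preserving structure.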

Acknowledgements This work is supported by the National Natural Science Foundation of China (No. 61172161).

References

[1] S. Izadi et al. KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera. Proceedings of ACM Symposium on User Interface Software and Technology, pp. 559-568, 2011.
[2] A. Maimone and H. Fuchs. Encumbrance-Free Telepresence System with Real-Time 3D Capture and Display using Commodity Depth Cameras. Proceedings of IEEE International Symposium on Mixed and Augmented Reality, pp. 137-146, 2011.
[3] A. D. Wilson and H. Benko. Combining Multiple Depth Cameras and Projectors for Interactions On, Above and Between Surfaces. Proceedings of ACM Symposium on User Interface Software and Technology, pp. 273-282, 2011.
[4] S. Matyunin, D. Vatolin, Y. Berdnikov, and M. Smirnov. Temporal Filtering for Depth Maps Generated by Kinect Depth Camera. 3DTV Conference: The True Vision Capture, Transmission and Display of 3D Video, pp. 1-4, 2011.
[5] M. Camplani and L. Salgado. Adaptive spatio-temporal filter for low-cost camera depth maps. IEEE International Conference on Emerging Signal Processing Applications, pp. 33-36, 2012.
[6] M. Camplani and L. Salgado. Efficient Spatio-temporal Hole Filling Strategy for Kinect Depth Maps. Proceedings of SPIE, 82900E, 2012.
[7] K. Khoshelham and O. Elberink. Accuracy and Resolution of Kinect Depth Data for Indoor Mapping Applications. Sensors, 12(2): 1437-1454, 2012.
[8] S. Kim, J. Cho, A. Koschan, and M. A. Abidi. Spatial and Temporal Enhancement of Depth Images Captured by a Time-of-Flight Depth Sensor. Proceedings of IEEE International Conference on Pattern Recognition, pp. 2358-2361, 2010.
[9] S. A. Hojjatoleslami and J. Kittler. Region Growing: a New Approach. IEEE Transactions on Image Processing, 7(7): 1079-1084, 1998.

Fig 5: another example with a complex scene. (a) color image; (b) depth image; (c) method in [4]; (d) our method.

