Multimed Tools Appl DOI 10.1007/s11042-010-0719-4
3D human modeling from a single depth image dealing with self-occlusion

In Yeop Jang · Ji-Ho Cho · Kwan H. Lee
© Springer Science+Business Media, LLC 2011
Abstract This paper presents a 2D to 3D conversion scheme to generate a 3D human model using a single depth image with several color images. In building a complete 3D model, no prior knowledge such as a pre-computed scene structure or photometric and geometric calibration is required, since the depth camera can directly acquire calibrated geometric and color information in real time. The proposed method deals with the self-occlusion problem which often occurs in images captured by a monocular camera. When an image is obtained from a fixed view, it may lack data for a certain part of an object due to occlusion. The proposed method consists of the following steps to resolve this problem. First, the noise in the depth image is reduced by a series of image processing techniques. Second, a 3D mesh surface is constructed using the proposed depth image-based modeling method. Third, the occlusion problem is resolved by removing the unwanted triangles in the occlusion region and filling the corresponding hole. Finally, textures are extracted and mapped to the 3D surface of the model to provide a photo-realistic appearance. Comparison with related work demonstrates the efficiency of our method in terms of visual quality and computation time. It can be utilized to create 3D human models in many 3D applications.

Keywords Depth image · Human modeling · Self-occlusion
I. Y. Jang · J.-H. Cho · K. H. Lee (B)
213, Department of Mechatronics, Gwangju Institute of Science and Technology (GIST), 216, Cheomdan-gwagiro, Buk-gu, Gwangju, 500-712, South Korea
e-mail: [email protected]
I. Y. Jang
e-mail: [email protected]
1 Introduction

The use of 3D human models has become more widespread due to demand in movies, games, and multimedia applications. Though widely used, 3D human models are mostly generated manually by experienced designers. This leads to high cost in industry, because the manual generation of a 3D human model requires considerable effort with inefficient procedures. It is still a challenging research topic to reconstruct a 3D human model with accuracy and efficiency. Shape recovery methods for a static human based on 3D scanners have been widely developed in computer graphics [1, 6, 28], but these methods usually require a long scanning time and a high setup cost for the 3D scanning system. On the other hand, image-based methods have been developed in computer vision to automatically generate a 3D human model [5, 13, 17, 23], but these methods require both accurate geometric and photometric calibration and synchronization of cameras when multiple cameras are used. Recently, 3D human modeling methods using a single depth camera have arisen as an alternative to overcome these limitations. 3D human modeling based on a single depth camera requires a simple system setup without any complicated operation or synchronization. It also offers reasonable acquisition time without huge data storage, so that it can be directly applied to many applications such as 3DTV, games, and virtual reality. Owing to these strengths, several researchers have proposed efficient methods which reconstruct a 3D human model using a single depth camera [21, 24, 25]. The single camera-based methods have many advantages in 3D human modeling; however, a self-occlusion problem inherently occurs as a major drawback (Fig. 1).
Fig. 1 Overall procedure of 3D human modeling resolving self-occlusion problem
The self-occlusion problem occurs when a region is hidden by another part of the human body. The occluded area results in incomplete input data for the human model. For instance, if a person is standing with an arm in front of his/her body, the arm becomes an obstacle to capturing the body region (see Fig. 2); consequently, no information exists for the occluded area. In order to generate a complete 3D human model, it is essential to resolve the self-occlusion problem in single camera-based methods. Existing techniques such as 3D scanner-based modeling and common image-based modeling cannot address the self-occlusion problem. Our proposed method solves the problem by combining a feature tracking technique and a smoothness constraint. It outperforms the current state-of-the-art work [15, 20] in terms of the simplicity of its operations while preserving the overall quality of the reconstructed model.

Related work
The occlusion problem has been a difficult task in computer vision research. Bhasin and Chaudhuri [3] presented a novel depth-from-defocus method to estimate both the depth and the focused image under self-occlusion. They proposed an iterative MAP (Maximum A Posteriori)–MRF (Markov Random Field) technique to resolve the occlusion while recovering the depth using defocus as the cue. D’Apuzzo et al. [7] applied a multi-camera photogrammetric technique to recover human motion with occlusion and then modeled the 3D data using a fitting process. Dockstader and Tekalp [8] proposed a distributed real-time computing method to track moving persons, in which occlusion effects are resolved using a multi-view implementation. Considering that self-occlusion often results in ambiguous pose configurations when capturing human motion, Schmaltz et al. [26] overcame the self-occlusion problem by splitting the surface model of an object.
Hilsmann and Eisert [12] proposed a direct method for deformable surface tracking with self-occlusion using an algorithm that weights the smoothness constraints. Atiqur et al. [2] concentrated on motions with the self-occlusion problem, using the concept of a directional motion history image to recognize human activity. Self-occlusion causes overall shape distortion and also requires heavy computation to recover the lost data in the overlapped region. Since it is a difficult task, most previous single image-based modeling methods [15, 24] assume that self-occlusion does not occur. For example, the 3D human modeling method based
Fig. 2 Input data. A depth image (8 bit gray) and several color images (24 bit RGB)
on a monocular depth image proposed by Kim et al. [15] constructs a 3D mesh model using a 2D depth image, but it does not take into account the self-occlusion problem. Although their algorithm contains a series of methods for depth image-based modeling, it cannot be extended to deal with a dynamic object in which self-occlusion occurs frequently. Guan et al. [11] proposed a visual hull-based human modeling algorithm that deals with partial occlusion. The partial occlusion in their paper is defined as the occlusion between two different objects, so it is improper to apply it to applications which require a real human model. Park et al. [20] considered the self-occlusion problem in 3D human modeling using a depth image. However, their algorithm produces an unnatural reconstruction of the 3D human model, and it cannot achieve a computation time suitable for practical applications due to its algorithmic complexity. We propose a human modeling method that resolves self-occlusion using a single depth camera. Our method directly reconstructs a real human as a 3D avatar and also greatly reduces the computational effort to develop the model. This will lead to a significant reduction of the time to market of 3D characters and enable users to be more satisfied by more realistic 3D content.

Paper organization
Section 2 outlines the overall procedure of the proposed method. Section 3 describes the acquisition of data and the preprocessing operations. In Sections 4 and 5, we present a new mesh generation algorithm that resolves the self-occlusion problem and restores the shape. The rendering method that represents the reconstructed 3D human model is described in Section 6. Section 7 validates our method by experiments. Finally, we conclude the paper in Section 8.
2 The overall procedure

A depth image and a corresponding color image are acquired using a depth camera [14] in real time. The 3D geometric shape of a human model is reconstructed from the depth image, and the color images are used to provide the texture for realistic rendering. Although the sensors of depth cameras have improved drastically in recent years, most depth sensors still suffer from quantization and system errors and occasionally lose some pixels in the image. First, in the data preprocessing step, the missing pixels are restored by closing operations and the noise is reduced by median filtering. Next, a series of operations including adaptive sampling, triangulation and separation of an occlusion region are performed to generate the mesh surface and to resolve occlusion. In this step, 2.5D geometric points in the depth image are adaptively sampled [15] to reduce the computation time for further processing and rendering tasks. Then, we construct a 3D mesh surface using the constrained Delaunay triangulation algorithm [9]. A mesh structure is used to represent the model, since it provides a watertight model with connectivity information. The reconstructed 3D surface often contains unwanted triangles due to self-occlusion. We develop a formula based on a smoothness constraint that separates only the unwanted triangles from the regular ones. After the unwanted triangles are removed, there still exist an empty area and a collapsed area on the surface (see Figs. 6 and 10). The imperfect 3D shape caused by
the above areas needs to be compensated to make a complete 3D mesh. An empty area is restored by a hole filling algorithm [22], in which interpolation is performed using the surrounding mesh. The collapsed area is restored by our own algorithm. The mesh is then refined using a Gaussian smoothing algorithm [27] to make a quality surface. Upon completion of the 3D mesh, texture mapping is applied to give a proper color appearance over the surface. Since self-occlusion results in missing texture information in the self-occluded region, the texture also needs to be restored. A series of operations is performed: detection of the first frame in which an occlusion starts, extraction of the texture, and multiple texture mapping. To detect the occurrence of self-occlusion, we monitor the moving part of the body which causes self-occlusion over all the frames and find the starting frame of occlusion using a feature tracking technique [4]. The color image in the frame just prior to an occlusion is used as the texture for the self-occluded region. For the areas without self-occlusion, the color image of the current frame is applied to each corresponding area on the 3D surface.
3 Input data acquisition and preprocessing

We use a single depth image and a sequence of color images (Fig. 2) captured by a depth camera as the main input data. The depth camera, Z Cam™ [14], simultaneously captures a sequence of depth and color images of an object at a 720 by 486 resolution. However, the acquired raw depth images usually contain quantization errors and optical noise due to the reflectivity or color variation of the objects. In addition, reflectance properties of the IR (infrared) sensor in the depth camera and environmental lighting conditions cause additional optical noise, which produces improper artifacts for rendering and visual interaction with the reconstructed model.
Fig. 3 a Two types of noise and missing pixels. b Median filtering. c Morphological closing operation
The measurement error caused by the sensing limitations of the device typically appears as high-frequency noise in the depth image. The sensing device also records its own optical noise in the depth image. In addition, some pixels may occasionally be missed while recording the depth image. Figure 3 visualizes these noise types in the raw depth image and shows the corresponding preprocessing step to fix each of them. As shown in Fig. 3a, even though the system noise is easily removed since it always occurs in the same position, the quality of the depth data is still unacceptable for generating a natural 3D model due to the high-frequency noise. We apply a median filter (using the Open Computer Vision Library [18]), which preserves the boundary of the shape, to reduce the high-frequency noise in the depth image, as shown in Fig. 3b. We also use a closing operation [18], one of the morphological image processing operations, to restore the missing pixels (Fig. 3c).
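As a concrete illustration, the two preprocessing operations can be sketched in pure NumPy (the paper uses OpenCV's implementations [18]; the 3×3 window size here is our assumption):

```python
import numpy as np

def _windows3(img, a):
    """Stack the nine 3x3-shifted views of a padded array a over img's shape."""
    h, w = img.shape
    return np.stack([a[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def median_filter3(img):
    """3x3 median filter (edge-padded): suppresses high-frequency depth noise
    while preserving shape boundaries."""
    p = np.pad(img, 1, mode="edge")
    return np.median(_windows3(img, p), axis=0).astype(img.dtype)

def closing3(img):
    """Grayscale morphological closing (3x3 dilation then erosion): restores
    isolated missing (zero-valued) depth pixels."""
    p = np.pad(img, 1, mode="edge")
    dil = _windows3(img, p).max(axis=0)
    return _windows3(img, np.pad(dil, 1, mode="edge")).min(axis=0)
```

A single missing (zero) pixel surrounded by valid depth values is filled by either operation, while object boundaries are left intact by the median filter.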
4 Mesh generation and resolving self-occlusion

4.1 Adaptive sampling
After the preprocessing step, the depth image is used as 3D point data to generate a 3D mesh surface. However, when the depth image is used directly without any additional processing, the generated 3D mesh consists of very dense triangles, which often results in a noisy mesh model. Moreover, the dense data causes a computational burden in practical applications. Using a down-sampling method, we not only reduce the number of points but also remove many artifacts prior to the actual mesh generation process. The adaptive sampling method, one of the down-sampling techniques, is described in the authors' previous work [15]. It produces a point cloud that finely represents the shape with a near-optimal number of points from a depth image.

4.2 Triangulation
We then generate a triangular mesh surface from the sampled points using constrained Delaunay triangulation [9]. We designate the silhouette points (external boundary points) as the constraint in the triangulation algorithm, since we need to keep the connectivity among the silhouette points in order to generate an accurate boundary of the 3D model.

4.3 Removal of unwanted triangles
If the 3D mesh model is generated without dealing with the self-occlusion, it produces awkward, unwanted triangles which connect the arm and the body, as shown in Fig. 4. In Fig. 4, only the non-colored region is needed for rendering a realistic 3D human model, but the red region caused by the occlusion is unavoidable during generation of the 3D mesh surface. So, a special method is proposed to remove only the
Fig. 4 The red region (occlusion region) means the unwanted triangles caused by the self-occlusion problem
red region. In practice, we formulate equation (1) based on a smoothness constraint which uses the slope of each edge of the triangles.

The mesh model is made up of many triangles, and each triangle consists of three edges, with each edge having two half-edge vectors ($\vec{v}_i^{\,1} - \vec{v}_i^{\,0}$ or $\vec{v}_i^{\,0} - \vec{v}_i^{\,1}$) and two vertices ($\vec{v}_i^{\,0}$, $\vec{v}_i^{\,1}$). The equation uses the deviation of the x, y, z values between the two vertices of an edge and multiplies it by the dot product between the direction vector of the edge and the Z unit vector ($\vec{Z}$), as described below:

$$F_i = W_i\,(p_z - p_y - p_x)$$
$$W_i = |\vec{Z} \cdot \vec{e}_i|, \qquad \vec{e}_i = \vec{v}_i^{\,1} - \vec{v}_i^{\,0} \ \text{or}\ \vec{v}_i^{\,0} - \vec{v}_i^{\,1}$$
$$\vec{p} = \vec{v}^{\,1} - \vec{v}^{\,0}, \qquad \vec{p} = p(x, y, z), \qquad \vec{v} = v(x, y, z) \qquad (1)$$
where i is the index of each edge. Finally, each weight value ($F_i$) computed by the equation is assigned to the corresponding edge. These weight values are then sorted as shown in Fig. 5. The graph implies that edges with low values are nearly flat, while edges with high values are steeply sloped, close to the Z unit vector. Thus, the edges of regular triangles in the green region of Fig. 5 have low values, while the edges of unwanted triangles in the red region have high values. Using this principle, we can extract only the unwanted triangles by using an appropriate threshold value that separates the occlusion region. To obtain the appropriate threshold, we use an adaptive threshold (AT) [19], which has been used for histogram-based thresholding in image processing. The AT divides a histogram into two parts by finding a reference point that gives
Fig. 5 Edges sorted in increasing order of corresponding weight values (e: edge index, F: weight value)
the maximum variance. Since the AT cannot cover the negative weight values in the histogram, and the edges corresponding to negative values already imply that they belong to the regular region, we exclude negative values. The AT algorithm finds all edge indices whose weight value is bigger than the threshold. Finally, we delete the vertex having the smaller z value of the two vertices from each edge that has a bigger
Fig. 6 Removal of unwanted triangles for the 3D mesh model
weight value than the threshold. Figure 6 shows the 3D mesh model in which the vertices of the unwanted triangles are removed by our method.
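The edge-weight computation of (1) and the adaptive-threshold selection can be sketched as follows. This is a simplified reading of the formula: the use of absolute per-component deviations, unit-normalized edge vectors, and an Otsu-style histogram split standing in for the AT of [19] are our assumptions.

```python
import numpy as np

def edge_weights(v0, v1):
    """Eq. (1) sketch: weight each edge by its z-dominance scaled by its
    alignment with the Z axis. v0, v1: (n,3) arrays of edge endpoints."""
    p = v1 - v0                                   # per-edge difference vector
    e = p / np.linalg.norm(p, axis=1, keepdims=True)
    w = np.abs(e[:, 2])                           # |Z . e|
    return w * (np.abs(p[:, 2]) - np.abs(p[:, 1]) - np.abs(p[:, 0]))

def otsu_threshold(values, bins=64):
    """Histogram-based adaptive threshold maximizing between-class variance.
    Negative weights are excluded, as in the text (they are already regular)."""
    vals = values[values > 0]
    hist, edges = np.histogram(vals, bins=bins)
    total = hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    total_mean = (hist * centers).sum() / total
    best_t, best_var = edges[0], -1.0
    w0 = sum0 = 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += hist[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (total * total_mean - sum0) / w1     # mean of the upper class
        var = w0 * w1 * (m0 - m1) ** 2            # between-class variance
        if var > best_var:
            best_var, best_t = var, edges[i + 1]
    return best_t
```

Edges nearly parallel to the z axis receive large positive weights, flat edges receive small or negative ones, and the threshold then isolates the occlusion region.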
5 Shape restoration and refinement

A region with no data results when the unwanted triangles between the occluded and the occluding regions are removed (see Fig. 6). Our study uses the method in [22], which optimally restores the collapsed mesh surface with an acceptable computation time. To apply this method, a hole must be constructed first.

5.1 Bridging
In Fig. 7, the mesh model does not have any hole except for the boundary hole consisting of the silhouette points, which does not need to be filled. So, we apply a novel bridging technique to make a hole for the no-data region. The technique creates a bridging triangle that separates the occluded region (no-data region) from the occluding region, which represents the arm in this study (see Fig. 8). We first classify the boundary points into two types as shown in Fig. 7. The original boundary points (red in Fig. 7), defined as the silhouette, are selected by applying the chain code algorithm [16] to the original depth data. The new boundary points (blue in Fig. 7) are formed when the unwanted triangles are removed from the mesh model. Then, two intermediate points (green points in Fig. 7)
Fig. 7 New boundary points (blue), original boundary points (red), and intermediate points (green)
Fig. 8 Bridging mesh
appear between the new boundary points and the original boundary points. We select the point having the lower z value as the first vertex of the bridging triangle and then find the rest of the points of the bridging triangle using the following equation:

$$\vec{p}_2(j) = \operatorname*{argmax}_{j} \frac{|v_j^z - v_{j+1}^z| \cdot |p_1^y - v_j^y|}{|p_1^x - v_j^x| \cdot |p_1^z - v_j^z|}, \qquad \vec{p} = \{\vec{p}_i,\ i = 1, 2, 3 \mid \vec{p}_i \subset \vec{v}_j,\ j = 1, \ldots, n\} \qquad (2)$$
where n is the number of new boundary vertices.

In (2), $\vec{p}_1$ is the first vertex of the bridging triangle, $\vec{v}_j$ denotes the new boundary vertices (blue points in Fig. 8), and $\vec{v}_{j+1}$ is the neighboring boundary vertex on either side of $\vec{v}_j$. Equation (2) identifies a boundary vertex that has the biggest difference from the y-coordinate of $\vec{p}_1$ and a large deviation from the z-value of its neighboring vertex, but little difference in the x- and z-coordinates. The boundary vertex satisfying these conditions is designated as the second vertex, $\vec{p}_2$, of the bridging triangle. The remaining vertex of the bridging triangle, $\vec{p}_3$, is the neighboring boundary vertex of $\vec{p}_2$ with the lower z-value. Figure 8 shows the bridging triangle generated by the equation. The bridging triangle successfully separates the occluded region from the occluding region and creates a valid hole. Now, we can produce a complete 3D mesh model by filling the hole for the 3D human model.
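A direct reading of (2) can be sketched as follows; the circular-shift convention for the neighbor $v_{j+1}$ and the small epsilon guarding the division are our assumptions.

```python
import numpy as np

def second_bridge_vertex(p1, v):
    """Eq. (2) sketch: among the ordered new-boundary vertices v ((n,3) array),
    pick the index of p2: large y-distance from p1 and large z-jump to its
    neighbor, but small x- and z-distance from p1."""
    eps = 1e-9
    vn = np.roll(v, -1, axis=0)               # v_{j+1}: next boundary vertex
    num = np.abs(v[:, 2] - vn[:, 2]) * np.abs(p1[1] - v[:, 1])
    den = np.abs(p1[0] - v[:, 0]) * np.abs(p1[2] - v[:, 2]) + eps
    return int(np.argmax(num / den))
```

For a boundary where one vertex sits far from $p_1$ in y, close in x and z, and next to a large z step, that vertex maximizes the ratio and is chosen as $p_2$.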
5.2 Hole filling
We apply the hole-filling algorithm proposed in [22] to fill the hole created by bridging. The algorithm starts from an arbitrary point among the boundary vertices of the hole. It works by selecting the triangle with the minimum weight, where the weight is based on the area of the triangle and the dihedral angle (normal variation) with the neighboring triangle located outside the hole. The algorithm repeats until all the boundary vertices are triangulated. Figure 9 shows the 3D human model after the no-data region (occluded region) is restored.

5.3 Restoration of collapsed area
Although the occluded area in the body is recovered by hole filling, another problem area remains: the collapsed area that appears when the unwanted triangles between the occluding part and the occluded region are removed. Figure 10 shows the collapsed area on the arm, which looks like saw teeth. Figure 11 explains our idea to recover the collapsed area using the concavity between two adjacent triangles. Let a polygon depict the collapsed area, and let the blue and red arrows denote the edge vectors of the boundary triangles in Fig. 11. Two adjacent triangles form a concave shape when the dot product of their edge vectors is greater than zero, and a convex shape when the dot product is less than zero. This is our concavity check. It is important to perform the concavity check only once along the boundary of the collapsed area, because multiple checks for the concavity may
Fig. 9 The hole filled 3D mesh surface
Fig. 10 Collapsed area on the arm
cause the distortion of the overall shape. A triangle is added in the concave opening between two edge vectors in the collapsed area, then we can obtain the restored result as shown in Fig. 12.
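The single-pass concavity check and triangle insertion can be sketched as follows. The sign convention (dot product greater than zero means a concave opening) follows the text; the closed-polyline indexing of the boundary is our assumption.

```python
import numpy as np

def fill_concave_openings(boundary):
    """One pass along the collapsed-area boundary (closed polyline, (n,3)).
    Where two successive edge vectors have a positive dot product (a concave
    opening, per the paper's test), emit a triangle spanning the opening."""
    tris = []
    n = len(boundary)
    for i in range(n):
        a, bb, c = boundary[i - 1], boundary[i], boundary[(i + 1) % n]
        e1, e2 = bb - a, c - bb                 # the two adjacent edge vectors
        if np.dot(e1, e2) > 0:                  # concave: add one triangle
            tris.append(((i - 1) % n, i, (i + 1) % n))
    return tris
```

Because the boundary is traversed exactly once, each concave opening receives a single triangle, avoiding the shape distortion that repeated checks would cause.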
Fig. 11 Concavity checks
Fig. 12 This figure illustrates our shape refinement. The collapsed area is properly recovered (red region) and the surface is smoothed by Gaussian smoothing
5.4 Smoothing of a mesh surface
Although the noise of the depth image is largely reduced by median filtering in the preprocessing step, the mesh generation and self-occlusion resolving steps may still leave a rough mesh surface. We apply 3D Gaussian smoothing [27] to the mesh model to refine the overall shape, owing to its linear computation time and storage requirement. To apply the smoothing method to a discrete mesh, it is expressed as a Laplacian operator. The Laplacian $L(p_i)$ at vertex $p_i$ can be linearly approximated using (3):

$$L(p_i) = \frac{1}{n_i} \sum_{j=0}^{n_i - 1} (p_j - p_i) \qquad (3)$$
where $p_j$ is the jth neighboring vertex of $p_i$ and $n_i$ is the valence of $p_i$. Figure 12 shows the final refined 3D surface after restoration of the collapsed area and smoothing of the surface.
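The Laplacian smoothing step of (3) can be sketched as an iterative update; the damping factor `lam` and the iteration count are illustrative parameters, not values from the paper.

```python
import numpy as np

def laplacian_smooth(verts, neighbors, lam=0.5, iters=10):
    """Eq. (3) sketch: move each vertex toward the average of its one-ring
    neighbors. `neighbors` maps a vertex index to its neighbor indices;
    vertices absent from the map are left fixed."""
    v = verts.astype(float).copy()
    for _ in range(iters):
        lap = np.zeros_like(v)
        for i, nbrs in neighbors.items():
            lap[i] = v[nbrs].mean(axis=0) - v[i]   # L(p_i)
        v += lam * lap                             # damped update step
    return v
```

A vertex displaced off the line of its two neighbors is pulled back toward their average, which is the smoothing behavior the refinement step relies on.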
6 Rendering

Our rendering consists of three steps: (1) occlusion detection, (2) texture extraction and (3) texture mapping. As already mentioned, the occluded region has neither color data nor depth data, so the texture for the occluded region must be recovered. We use a previous frame that contains the texture for the occluded region to extract the texture. First, we detect the frame in which the object (e.g., the arm) starts to occlude the body. In this step, we use a feature tracking technique [4] which has been used in object tracking applications. Second, we extract the texture for the mesh surface from the sequence of images. Lastly, we finalize
our rendering by mapping the texture images onto the reconstructed mesh surface according to the texture coordinates.

6.1 Occlusion detection
If we use only the current frame for rendering, the color data of the occluding area in the current frame can mistakenly be used to recover the occluded region. Therefore, we need to detect in which frame of the image sequence an occlusion starts. This frame is used to recover the texture of the occluded region, which cannot be recovered from the current frame alone. To detect the frame, we use an optical flow technique [4]. Optical flow is usually used for object tracking in a series of images. It yields vectors that represent the image velocity at each pixel between consecutive frames. We can keep track of a feature using these vectors, and the tracked feature makes it possible to detect occlusion by comparing its position in the current frame with the one in the next frame. Figure 13 shows that a feature point (red circle in Fig. 13) can be accurately tracked using the motion vector (blue arrow in Fig. 13), which indicates the direction of the moving feature point (green circle in Fig. 13). The location of the feature point is selected manually in the first frame, and the vector is computed at each frame using the sequence of color images. Finally, the location of the feature point is found using the vector known as the optical flow in each frame, and the tracked feature point is drawn at the computed location in the depth image corresponding to the color image.
Fig. 13 Feature tracking example (green circles: scouting point, red circles: feature point, blue arrows: motion vector)
A linear equation that uses the current location of a feature point and the motion vector computed at each frame is proposed. It computes the scouting point (green circle in Fig. 13) ahead of the feature point to detect the occurrence of occlusion.

$$\vec{s} = t\,\vec{d} + \vec{f}, \qquad \vec{s} = (s_x, s_y)^T, \qquad \vec{f} = (f_x, f_y)^T, \qquad \vec{d} = (d_x, d_y)^T \qquad (4)$$
where $\vec{s}$ is the scouting point used for occlusion detection. In (4), the location of the scouting point is computed at each frame in terms of the x- and y-index of the pixel, and $\vec{d}$ is the motion vector regarded as the image velocity. $\vec{f}$ is the tracked feature point and t is a scalar value. Usually, t is set to a value larger than the width of the tracked part of the object. For example, if the tracked part is the hand, t is manually set to a value larger than the width of a hand. As shown in Fig. 13b, the depth image has zero-intensity pixels for the background and non-zero pixels for the region of the object. The scouting point moves at each frame until it reaches the body. At the start, the scouting point and its neighbors lie on zero-intensity pixels in the depth image, but it will land on a non-zero pixel when the tracked part occludes the body. So, the occlusion frame is determined by detecting a non-zero pixel next to the scouting pixel. Finally, we use the color data of the frame previous to the occlusion frame as the texture for the occluded region.

6.2 Texture extraction
This section describes our method to extract the texture and map it to the reconstructed 3D mesh surface. Fortunately, the color coordinates and the depth coordinates correspond to each other in the data acquired by the depth camera. So, no special computational effort is needed, such as parameterization of non-parametric geometry, precomputation of solid texture, surface painting, or texture-preserving simplification. Ultimately, we only have to take the texture for the occluded region from the color image of the detected frame, and the texture for the other regions from the color image of the current frame.
The texture for the occluded area is extracted from the detected frame using the position information of the occluded area, which is known from the hole filling step, and the texture for the other regions is easily extracted from the color image of the current frame that corresponds to the mesh model.

6.3 Texture mapping
The vertices of the reconstructed 3D mesh surface are mapped one-to-one to the color data of the texture images. Figure 14 depicts how to completely map the two textures onto the corresponding areas of the 3D human. We apply multi-texturing, which uses more than one texture at a time on a surface. First, we map most of the reconstructed surface with the texture of the current frame. Then, we map the secondary texture for the occluded region, as shown in Fig. 14, using the color data of the previous frame found in the occlusion detection step.
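The scouting-point test of (4) and the per-vertex multi-texture lookup can be sketched together; the pixel coordinate conventions, the zero-valued background (per Fig. 13b), and the function names are our assumptions.

```python
import numpy as np

def occlusion_started(depth, f, d, t):
    """Eq. (4) sketch: probe the scouting point s = t*d + f ahead of the
    tracked feature f along motion vector d. A non-zero depth pixel there
    means the moving part has reached the body, i.e. occlusion begins."""
    x, y = np.round(t * np.asarray(d, float) + np.asarray(f, float)).astype(int)
    h, w = depth.shape
    if not (0 <= x < w and 0 <= y < h):
        return False                    # scouting point is outside the image
    return depth[y, x] != 0

def multi_texture_lookup(uv, occluded, tex_current, tex_preocc):
    """Section 6.3 sketch of multi-texturing: each vertex samples the current
    frame's color image, except vertices flagged as occluded, which sample the
    frame detected just before occlusion. uv: (n,2) integer pixel coords."""
    colors = tex_current[uv[:, 1], uv[:, 0]].copy()
    colors[occluded] = tex_preocc[uv[occluded, 1], uv[occluded, 0]]
    return colors
```

With a body region on one side of the depth image, a small t leaves the scouting point on the background while a larger t crosses into the body, flagging the occlusion frame.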
Fig. 14 Texture mapping on the mesh model
7 Experimental results

We test and validate the performance of our proposed 3D human modeling method by comparing it with other depth image-based methods using the same data. Figure 15 shows the results and Fig. 16 compares our result with the previous work. In Fig. 15, (a) shows the final rendering results of the proposed algorithm and (b) shows the benefit of resolving the self-occlusion problem. It successfully shows that
Fig. 15 Rendering results. a Final scene, b composite scene
Fig. 16 Visual comparison of related work (1.7 K vertices): a Textured mesh model without considering occlusion by Kim et al. [15], b same model considering occlusion problem by Park et al. [20], c Result by our proposed method
a CG object can be inserted into the reconstructed human model without surface collision. Our algorithm is first compared with the depth image-based modeling method proposed by Kim et al. [15], which does not consider any occlusion problem. Kim's method is better in terms of computation time, but the resulting 3D model contains geometrically unnatural regions, which is a critical drawback for practical 3D applications. Park et al. [20] proposed a method to resolve the self-occlusion problem, but their method is also not acceptable for practical purposes due to its long computation time. Figure 16 compares the visual quality of the proposed work with the previous work by Kim et al. [15] and Park et al. [20]. It is observed that the proposed method produces a much smoother shape while resolving the occlusion problem.

7.1 Evaluation of 3D human modeling methods by different error metrics
To evaluate the proposed method, we define error metrics (5, 6, 7) that are applied to the region of interest (see Fig. 17). We compare our method with those of Kim et al. [15] and Park et al. [20] in terms of edge slant, irregularity of triangles and gradient of the surface. The first metric in (5) estimates how much each edge ($e_i$) is slanted toward the z-direction vector ($\vec{z}$). Since the edges in the occlusion area (see Fig. 17) are slanted close to the z-axis, we evaluate the occlusion area using a dot product between each edge and the z-direction vector. A bigger value of the metric indicates that more edges remain in the occlusion region. As shown in Figs. 17 and 18, our method removes the unwanted triangles more thoroughly than the method of Kim et al. [15], which leaves many highly slanted edges in the occlusion region.

$$S(\vec{e}) = \frac{1}{n} \sum_{i=1}^{n} \vec{e}_i \cdot \vec{z}, \qquad \vec{e} = \{\text{Edges} \mid i = 1 \ldots n\} \qquad (5)$$
The second metric in (6) evaluates the irregularity of triangles. The closer the shape of a triangle is to a circle, the more regular the triangle is. Regular triangles provide a quality rendering of a mesh with a smaller number of vertices.
Fig. 17 Comparison of mesh generated by different methods
Fig. 18 Comparison of 3D human modeling methods in different error metrics
We use the irregularity term in [10] as the metric. Figure 18b shows the irregularity of the mesh for the different methods.

$$I(f) = \frac{1}{m} \sum_{i=1}^{m} \frac{f_i(\rho)^2}{4\pi\,f_i(\omega)}, \qquad f = \{\text{Triangles} \mid i = 1 \ldots m\} \qquad (6)$$
($\rho$: perimeter of a triangle, $\omega$: area of a triangle)

As the last metric, we use the gradient of the surface, based on its first derivative, which shows how much each triangle is tilted with respect to its neighbors. The gradient of a surface is defined as follows:

$$F(g) = \frac{1}{l}\,\frac{1}{k} \sum_{i=1}^{l} \sum_{j=1}^{k} \left( \frac{|g(x_i, y_i) - g(x_j, y_j)|}{|x_i - x_j|} + \frac{|g(x_i, y_i) - g(x_j, y_j)|}{|y_i - y_j|} \right), \qquad g:\ \text{function of the surface} \qquad (7)$$
where l is the number of vertices, and k is the number of one-ring neighbor vertices. The mesh surface by Kim et al. [15] shows many triangles standing along the z-direction, whereas the one by Park et al. [20] has many tilted triangles in the connecting area between the arm and the body (see Fig. 17).

The proposed method is also compared with the other methods in terms of computation time, as shown in Table 1. The proposed method is about six times faster than that of Park et al. [20]. The method by Kim et al. [15] is slightly faster than ours, but self-occlusion is not considered in their case. In summary, the proposed method generates a 3D human model with a reasonably fast computation time while resolving the self-occlusion problem using only 3D mesh data, without additional information.

Table 1 Comparison of computation time (sec)

Process                                                    Kim [15]   Park [20]   Our method
Median filtering                                           0.545      0.545       0.545
Occlusion region detection and separation in an image      –          2.405       –
Adaptive sampling and silhouette smoothing                 0.513      0.513       0.513
Point sampling in the removed area                         –          0.144       –
Delaunay triangulation and refinement                      0.710      0.710       0.710
Occlusion region removing and hole filling in mesh         –          –           0.093
Gaussian smoothing and computing of texture coordinate     0.451      0.451       0.451
Texture recovery                                           –          10.80       –
Texture extraction                                         –          –           0.050
Total                                                      2.219      15.568      2.462
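For reference, the remaining two metrics also admit a direct implementation. The sketch below assumes triangles given as vertex triples and a precomputed one-ring neighbor list; the names and data layout are illustrative, and k in (7) is taken per vertex here, one reading of the definition above:

```python
import numpy as np

def irregularity(triangles):
    """Metric (6): mean of perimeter^2 / (4*pi*area).

    triangles -- (m, 3, 3) array; one row of three 3D vertices
    per triangle. The ratio equals 1 for a circle, so values
    close to that of an equilateral triangle indicate a regular
    mesh, while large values indicate sliver triangles.
    """
    total = 0.0
    for a, b, c in triangles:
        perim = (np.linalg.norm(b - a) + np.linalg.norm(c - b)
                 + np.linalg.norm(a - c))
        area = 0.5 * np.linalg.norm(np.cross(b - a, c - a))
        total += perim ** 2 / (4.0 * np.pi * area)
    return total / len(triangles)

def surface_gradient(g, pts, neighbors):
    """Metric (7): mean absolute difference quotient of the depth
    function g over one-ring neighbors, in x and y separately.

    g         -- depth value per vertex
    pts       -- (x, y) image coordinates per vertex (assumed
                 distinct in x and in y between neighbors)
    neighbors -- list of one-ring neighbor indices per vertex
    """
    total = 0.0
    for i, ring in enumerate(neighbors):
        s = 0.0
        for j in ring:
            dg = abs(g[i] - g[j])
            s += (dg / abs(pts[i][0] - pts[j][0])
                  + dg / abs(pts[i][1] - pts[j][1]))
        total += s / len(ring)          # 1/k, per-vertex ring size
    return total / len(neighbors)       # 1/l
```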
8 Conclusion and future work

In this paper, we propose a human modeling method based on a single depth image, which reconstructs a 3D human model from images of a real human. In particular, we resolve the inherent self-occlusion problem of a human model, which often causes shape distortion and color loss. Although our method is developed for a single depth camera system, which usually yields only the frontal half of the 3D model, it is sufficient to provide a natural 3D human model over a wide range of view angles. Our method improves the visual quality of a 3D human model by effectively solving the self-occlusion problem. Experimental results show that our method outperforms the previous methods with respect to visual quality and computation time. We plan to extend our method to take into account multiple occlusions, not only in one frame but in a sequence of images. In addition, the texture extraction will be improved to automatically find a part of the human body so that the manual intervention at the beginning of a scene can be eliminated.

Acknowledgements This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2010-(C1090-10110003)) and also by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 20100018897).
References

1. Allen B, Curless B, Popovic Z (2003) The space of human body shapes: reconstruction and parameterization from range images. In: Proceedings of ACM SIGGRAPH 2003, pp 587–594
2. Atiqur RA, Tan JK, Kim HS, Ishikawa S (2008) Solutions to motion self-occlusion problem in human activity analysis. In: Proceedings of the 11th international conference on computer and information technology (ICCIT), Article no. 4803095. Khulna, Bangladesh, pp 201–206
3. Bhasin SS, Chaudhuri S (2001) Depth from defocus in presence of partial self occlusion. In: Proceedings of the international conference on computer vision, vol 2. Vancouver, Canada, pp 488–493
4. Bouguet JY (2000) Pyramidal implementation of the Lucas Kanade feature tracker. Intel Corporation, Microprocessor Research Labs
5. Carranza J, Theobalt C, Magnor MA, Seidel HP (2003) Free-viewpoint video of human actors. ACM Trans Graph 22(3):569–577
6. Cyberware. http://www.cyberware.com. Accessed 22 January 2011
7. D'Apuzzo N, Plankers R, Fua P, Gruen A, Thalmann D (1999) Modeling human bodies from video sequences. In: Proceedings of the SPIE videometrics VI, vol 3461. San Jose, USA, pp 36–47
8. Dockstader SL, Tekalp AM (2001) Multiple camera tracking of interacting and occluded human motion. Proc IEEE 89(10):1441–1455
9. Domiter V (2004) Constrained Delaunay triangulation using plane subdivision. In: Proceedings of the 8th central European seminar on computer graphics. Budmerice, pp 105–110
10. Garland M, Wilmott A, Heckbert PS (2001) Hierarchical face clustering on polygonal surfaces. In: Proceedings of symposium on interactive 3D graphics, pp 49–58
11. Guan L, Sinha S, Franco JS, Pollefeys M (2006) Visual hull construction in the presence of partial occlusion. In: Proceedings of the third international symposium on 3D data processing, visualization, and transmission (3DPVT06). Chapel Hill, USA, pp 413–420
12. Hilsmann A, Eisert P (2008) Tracking deformable surfaces with optical flow in the presence of self occlusion in monocular image sequences. In: Proceedings of CVPR workshop on non-rigid shape analysis and deformable image alignment. Anchorage, USA, pp 1–6
13. Hilton A, Beresford D, Gentils T, Smith R, Sun W (1999) Virtual people: capturing human models to populate virtual worlds. In: Proceedings of the IEEE international conference on computer animation. Geneva, pp 174–185
14. Iddan GJ, Yahav G (2001) 3D imaging in the studio and elsewhere. In: Proceedings of the SPIE videometrics and optical methods for 3D shape measurements. San Jose, CA, USA, pp 48–55
15. Kim SM, Cha J, Ryu J, Lee KH (2006) Depth video enhancement for haptic interaction using a smooth surface reconstruction. IEICE Trans Inf Syst E89-D(1):37–44
16. Liu YK, Zalik B (2005) An efficient chain code with Huffman coding. Pattern Recog 38(4):553–557
17. Mikic I, Trivedi M, Hunter E, Cosman P (2003) Human body model acquisition and tracking using voxel data. Int J Comput Vis 53:199–223
18. OpenCV Library. http://www.intel.com/research/mrl/research/opencv/. Accessed 22 January 2011
19. Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern SMC-8:62–66
20. Park JC, Kim SM, Lee KH (2006) 3D mesh construction from depth images with occlusion. In: Proceedings of the Pacific conference on multimedia (PCM). LNCS, vol 4261, pp 770–778
21. Peng E, Li L (2004) 3D human model acquisition from uncalibrated monocular video. In: Proceedings of the computer vision and graphics international conference, ICCVG 2004. Warsaw, Poland, pp 1018–1023
22. Peter L (2003) Filling holes in meshes. In: Proceedings of the Eurographics symposium on geometry processing. Aachen, Germany, pp 200–206
23. Plankers R, Fua P (2001) Tracking and modeling people in video sequences. Comput Vis Image Underst 81:285–302
24. Remondino F, Roditakis A (2003) Human figure reconstruction and modeling from single image or monocular video sequence. In: Proceedings of the 4th international conference on 3-D digital imaging and modeling (3DIM). Banff, Canada, pp 79–86
25. Sappa A, Aifanti N, Malassiotis S, Strintzis GM (2003) Monocular 3D human body reconstruction towards depth augmentation of television sequences. In: Proceedings of IEEE international conference on image processing. Barcelona, Spain
26. Schmaltz C, Rosenhahn B, Brox T, Weickert J, Wietzke L, Sommer G (2008) Dealing with self-occlusion in region based motion capture by means of internal regions. In: Articulated motion and deformable objects (AMDO). LNCS, vol 5098. Springer, Heidelberg, pp 102–111
27. Taubin G (1995) A signal processing approach to fair surface design. In: Proceedings of the SIGGRAPH, pp 351–358
28. Vitronic. http://www.vitronic.de/bodyscannen/. Accessed 22 January 2011
In Yeop Jang received the MS degree at Gwangju Institute of Science and Technology (GIST), Korea in 2006. He is currently working on his PhD degree at the Intelligent Design and Graphics Laboratory at GIST. His interests include 3D broadcasting, computer vision, image processing, and polygon modeling.
Ji-Ho Cho received the MS degree in Information and Communications in 2005 from Gwangju Institute of Science and Technology in Gwangju, Korea, and worked during the following eight months as an academic guest at Swiss Federal Institute of Technology ETHZ in Zürich. He recently started working on his PhD degree at the Intelligent Design and Graphics Laboratory at GIST. His main interests lie in 3D Video and computational photography.
Kwan H. Lee received his MS and PhD degrees at North Carolina State University in 1985 and 1988, respectively. He worked as an assistant professor at Northern Illinois University from 1988 to 1994, and has been a professor in the Mechatronics Department at Gwangju Institute of Science and Technology (GIST) since 1995. His research interests are in CAD and computer graphics, which include geometric modeling, photorealistic rendering, reverse engineering, and rapid prototyping. Currently he is the director of the Immersive Contents Research Center at GIST and focuses his research on immersive modeling and realistic material modeling.