SINGLE ITERATION VIEW INTERPOLATION FOR MULTIVIEW VIDEO APPLICATIONS

Pei-Kuei Tsung, Pin-Chih Lin, Li-Fu Ding, Shao-Yi Chien, and Liang-Gee Chen
DSP/IC Design Lab, Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan

ABSTRACT

Multiview video brings 3D and virtual-reality perceptual experiences to users through its multiple-viewpoint characteristic. Smooth and free viewpoint switching requires both matrix-based depth-image-based rendering and a complex virtual view interpolation scheme. To reduce the high computational complexity and avoid the iterative processing schedule of conventional view interpolation flows, a single iteration view interpolation algorithm is proposed. With the proposed algorithm, redundant warping operations are reduced by 86%. In addition, the proposed artifact detecting and removing algorithm detects and eliminates artifacts caused by imperfect depth maps within the same pass. Therefore, no additional post-processing or iteration is required, and single iteration processing is achieved.

Index Terms— Multiview video, 3D-TV, FTV, view interpolation, warping, DIBR

1. INTRODUCTION

For advanced TV applications, vivid perceptual quality is required. Multiview video (MVV) brings viewers 3D and realistic perceptual experiences by projecting video data of multiple views, captured from different viewing angles, to users simultaneously. As display technology evolves, many related applications, such as 3D-TV [1] and free-viewpoint TV (FTV) [2], are emerging. However, the computation requirement of MVV sequences is huge, especially in the compression and virtual view interpolation parts. Most previous works on virtual view interpolation consider only the horizontal disparity between views to simplify the computation [3][4][5]. These approaches cannot fit all MVV and 3D-TV applications. For example, an FTV display needs to support virtual views with a vertical shift while users stand up. Furthermore, viewpoint rotation and 3D zoom-in/zoom-out functionality based on interaction between users and the display should also be included. Thus, a complete virtual view interpolation that supports not only horizontal shifts but also rotations and translations in all directions is required.

Virtual view interpolation is based on depth-image-based rendering (DIBR). According to the mathematical model [6], complete DIBR needs matrix-based computation. In previous flows [3][7], the virtual view is interpolated by blending the synthesized results from all reference views; that is, several matrix-based warping operations are required for each pixel. In addition, the computational complexity is further increased by the iterative processing needed for artifact detection. To solve this high computational complexity and iterative processing schedule, a single iteration view interpolation algorithm is proposed. With the proposed algorithm, redundant warping operations are reduced by 86%. Furthermore, the proposed artifact detecting and removing algorithm eliminates artifacts within the single iteration, so no additional iteration for artifact removal is required. As a result, single iteration view interpolation is achieved.

The remainder of this paper is organized as follows. Section 2 presents the problem statement and the proposed single iteration view interpolation algorithm. Section 3 introduces the proposed single iteration artifact removing algorithm. Section 4 describes the experimental results on computational complexity. Finally, Section 5 draws the conclusion.

2. PROPOSED SINGLE ITERATION VIEW INTERPOLATION ALGORITHM

2.1. Problem Statement

Figure 1 shows the conventional view interpolation flow. First, all reference views are warped to the virtual view position. Each warped frame contains holes due to depth discontinuities, such as the green part in Fig. 1. Then, all warped frames are blended together and processed with iterative post-processing to generate the virtual view. In Fig. 1, there are two reference views, so the number of warping operations is 2×(frame size), while only 1×(frame size) pixels are output to the display. Thus, about half of the warping operations are redundant, since most warped pixels from the two views overlap except at the holes.
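The matrix-based DIBR warp behind this flow back-projects each pixel, together with its depth, into 3D space using the reference camera parameters and then reprojects it into the virtual camera [6]. The following is a minimal sketch, not the paper's implementation; the camera convention (world-to-camera transform x_cam = R·x_world + t) and all names are illustrative assumptions:

```python
import numpy as np

def warp_pixel(u, v, z, K_ref, R_ref, t_ref, K_virt, R_virt, t_virt):
    """Warp one pixel (u, v) with depth z from a reference camera into a
    virtual camera, following the standard matrix-based DIBR model.

    K: 3x3 intrinsics; R, t: world-to-camera rotation and translation.
    Returns the (sub-pixel) destination coordinates and the new depth.
    """
    # Back-project the pixel to a 3D point in world coordinates.
    ray = np.linalg.inv(K_ref) @ np.array([u, v, 1.0])
    p_world = R_ref.T @ (z * ray - t_ref)
    # Re-project the 3D point into the virtual camera.
    p_cam = R_virt @ p_world + t_virt
    p_img = K_virt @ p_cam
    return p_img[0] / p_img[2], p_img[1] / p_img[2], p_img[2]
```

With identity intrinsics and rotations and a pure horizontal baseline b between the cameras, this reduces to the familiar horizontal disparity u' = u + b/z, i.e., nearer pixels shift more; applying it per pixel and per reference view is exactly the 2×(frame size) cost described above.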

Fig. 1. Conventional view interpolation flow.

Fig. 4. Block diagram of the proposed algorithm.

Fig. 2. Scan pattern in the view interpolation: (a) conventional flow, and (b) proposed DRS and EPP flow.

Fig. 3. The out-of-boundary pixels in the warped result. The red line is the frame boundary of the output virtual view.

2.2. Proposed Algorithm

Figure 2(a) shows the scan pattern of the conventional view interpolation flow. All pixels in both references are scanned and warped, so many redundant computations are made. Since most pixels of the virtual view can be warped from a single reference view, the other reference view needs to be used only when a hole region is detected. Figure 2(b) shows the concept of the proposed dynamic reference selection (DRS) scheme. First, view 1 is selected as the initial reference. The warping process proceeds in raster-scan order until a depth discontinuity is reached. The depth discontinuity is detected by simply measuring the distance D between the warped destinations of two successive pixels and comparing D with a predefined threshold. If D is smaller than the threshold, the two pixels are regarded as belonging to the same depth region and the warping reference remains view 1; in this case, the unfilled pixels between the two destinations are directly bilinear-interpolated. On the other hand, if D is larger than the threshold, a hole is detected and the warping reference is switched to view 2. The start point of the warping process in view 2 is calculated by warping the current pixel in view 1 to view 2.

The proposed DRS method is sufficient for view interpolation with ground-truth depth maps. However, if there are inconsistencies between the depth maps of different views, they cause inconsistent switching, and additional holes are generated. To solve this, an end point prediction (EPP) scheme is proposed. When switching between references, not only the start point but also the end point of the hole region is calculated. The reference is then switched back to view 1 once the warping process reaches the predicted end point, even if no depth discontinuity is found. In this way, only the hole region is warped from view 2, and the redundant computation is further reduced.

Sometimes the start point generated at a reference switch points to an already-warped pixel, and redundant computation occurs. This case is often observed when the two references are not only horizontally shifted. The problem can be solved by usage buffer recording (UBR): a reference usage buffer records which pixels have been used, and when the warping process moves into such an area, the switching mechanism is performed immediately to avoid redundant computation.

Last but not least, boundary detection (BD) can be used to detect the out-of-boundary pixels. Fig. 3 illustrates the out-of-boundary pixels; the red line represents the border of the actual displayed region.
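One scanline of the combined DRS/EPP/UBR/BD loop can be sketched as follows. This is a 1-D simplification under assumed helpers: `warp_v1` maps a view-1 column to its virtual-view column, and `v2_src` names the view-2 source column for a hole pixel, standing in for the start-point warp into view 2; the paper's actual warp is the full matrix-based DIBR.

```python
def interpolate_scanline(ref_width, out_width, warp_v1, v2_src, threshold):
    """1-D sketch of DRS + EPP + UBR + BD for one scanline.

    Scans view 1 in raster order (DRS). On a depth discontinuity
    (destination jump D > threshold) it fills only the hole span from
    view 2, stopping at the predicted end point (EPP), skips columns
    already filled (UBR), and stops at the frame border (BD).
    Returns a per-column source label; None means unfilled.
    """
    src = [None] * out_width          # UBR: usage buffer for this row
    prev_dst = None
    for x in range(ref_width):        # raster scan of reference view 1
        dst = warp_v1(x)
        if dst >= out_width:          # BD: rest of the row is outside
            break
        if prev_dst is not None:
            d = dst - prev_dst        # distance between warped destinations
            if d > threshold:         # DRS: hole -> switch to view 2
                for h in range(prev_dst + 1, dst):   # EPP: stop at dst
                    if src[h] is None:               # UBR: skip filled
                        src[h] = ('v2', v2_src(h))
            else:                     # same depth region: interpolate gap
                for g in range(prev_dst + 1, dst):
                    if src[g] is None:
                        src[g] = ('v1-interp', g)
        if src[dst] is None:
            src[dst] = ('v1', x)
        prev_dst = dst
    return src
```

For a toy scene where view 1 jumps by 4 columns at a foreground edge, only the 4-column hole is taken from view 2 and everything else comes from view 1, which is the source of the warping-count savings reported in Section 4.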

Fig. 5. Illustration of the hole and overlap regions.

Fig. 6. Artifact removal with running interpolation: (a) before and (b) after the running interpolation.
Fig. 7. Artifact removal with background erosion: (a) before and (b) after the background erosion.

Pixels beyond the red line are not displayed, so the computation for these pixels is redundant and can be removed without affecting the visual quality. The out-of-boundary pixels are easily detected according to the scan order: once the current pixel is an out-of-boundary pixel, all following pixels in the same row are also out of boundary.

Figure 4 shows the overall block diagram of the proposed DRS, EPP, UBR, and BD algorithms. The warping process starts from view 1. BD first detects boundary pixels. Then, DRS is performed by comparing the distance D with the threshold. If D is larger than the threshold, the warping reference is switched to view 2 and the end point is calculated by EPP; if D is smaller than the threshold, the missing pixels within D are bilinear-interpolated.

3. SINGLE ITERATION ARTIFACT REMOVING

3.1. Running Interpolation with Z-Buffer

Unlike in graphics processing, the depth maps used in multiview video are generated by stereo matching algorithms

without predefined ground truth. Owing to imperfect depth maps, artifacts are generated during the view interpolation process. As shown in Fig. 6, many small dots appear in the warped virtual view, even in regions with smooth depth values. Many prior-art algorithms detect and remove these artifacts, but most of them need iterative searching and processing and cannot be used when only a single iteration is available.

Artifacts in view interpolation are mainly caused by two effects: depth discontinuity and the boundary effect. This section proposes a method to remove artifacts from depth discontinuities; a method for the boundary effect is introduced in Section 3.2.

Holes from depth discontinuities can be resolved by the algorithm introduced in Section 2. However, other small artifacts cannot be eliminated by switching the reference. They are mainly caused by truncating the floating-point result of the matrix-based warping. According to the block diagram in Fig. 4, these small gaps are filled by simple bilinear interpolation from neighboring pixels. Nevertheless, some artifacts still cannot be eliminated that way. Fig. 5 illustrates the warping process: owing to the different depth values of foreground and background, the warped frame contains "hole" regions with no pixel and "overlap" regions containing more than one pixel. If a truncation gap occurs in an overlap region, it is hard to detect, since the gap is already filled by the background. For example, some black and white dots can be found in Fig. 6(a); the white dots lie in overlap regions, and white is the color of the background. A running interpolation with a z-buffer can be used to solve this problem. During the warping process, not only the texture but also the corresponding depth values of the warped frame are saved.
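The running interpolation can be sketched as a z-buffered splat over one scanline: every warped sample carries its depth, truncation gaps are filled by interpolating both texture and depth, and at overlaps the nearer sample wins. This is a 1-D sketch with illustrative names, not the paper's implementation:

```python
def splat(frame, zbuf, x, color, depth):
    """Z-tested write: at overlaps the nearer sample (smaller depth) wins."""
    if 0 <= x < len(frame) and (zbuf[x] is None or depth < zbuf[x]):
        frame[x] = color
        zbuf[x] = depth

def running_interpolation(samples, width, max_gap=2):
    """1-D sketch of running interpolation with a z-buffer.

    samples: (warped_x, color, depth) tuples in scan order, where
    warped_x is the floating-point matrix-warp result. Small gaps left
    by truncating warped_x are filled by interpolating BOTH texture and
    depth, so background later splatted into an overlap region fails
    the depth comparison instead of showing through as a dot artifact.
    """
    frame = [None] * width
    zbuf = [None] * width
    prev = None
    for xf, color, depth in samples:
        x = int(xf)  # truncation of the floating-point warp result
        splat(frame, zbuf, x, color, depth)
        if prev is not None and 0 < x - prev[0] <= max_gap:
            for gx in range(prev[0] + 1, x):  # fill the truncation gap
                t = (gx - prev[0]) / (x - prev[0])
                splat(frame, zbuf, gx,
                      prev[1] + t * (color - prev[1]),
                      prev[2] + t * (depth - prev[2]))
        prev = (x, color, depth)
    return frame
```

Because the interpolated gap pixel also records the foreground depth, a background sample warped onto the same column afterwards is rejected by the z-test, which is exactly how the white dots of Fig. 6(a) are suppressed.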
Then, each time a truncation gap occurs, both the texture and the depth are interpolated. After the depth interpolation, the overlap artifacts (the white dots in Fig. 6(a)) can be reduced by comparing the depth values of pixels warped to the same position. Figure 6(b) shows the result of the running interpolation: both the black and the white dots are compensated.

3.2. Background Erosion

The other type of artifact due to imperfect depth maps is the boundary effect. Sometimes the object boundaries in the depth map are not clear, and mismatches between the object boundaries of the depth maps and those of the video frames can be observed. These unstable boundaries produce boundary artifacts, such as the ghost effect, in the warped frame. The red line in Fig. 5 indicates the boundary region. In the prior arts, this issue is solved by separating the reference frame into a main layer and a boundary layer. While warping the boundary layer,

Fig. 8. Computation analysis: redundant computation (warpings per pixel, from 0 to 1) for the original flow and with DRS + BD + UBR + EPP. With the proposed algorithms, the redundant computation is reduced by 86%. (Color key for Fig. 9: blue = warped from view 1; red = warped from view 2; green = warped from both views, i.e., redundant.)
special operations, like matting or filtering, are adopted to reduce the boundary effect [3][7]. However, these methods need several processing iterations. Fig. 7(a) shows a real case of the ghost effect, generated when pixels on the foreground boundary are regarded as background and warped to the wrong place. A background erosion method is proposed here to remove the wrongly warped boundary: every time a depth discontinuity is detected, the current pixel coordinate is moved backward by a predefined distance, and that span is re-warped from the other reference view. Since there is disparity between the views, an object boundary in one view is not a boundary in the other views. As a result, the re-warped region does not contain the boundary pixels, and the ghost effect is eliminated. Figure 7(b) shows the result after background erosion: the ghost found in Fig. 7(a) is removed.

Fig. 9. Reference source profiling: (a) original blending view interpolation, (b) proposed algorithm.

4. EXPERIMENTAL RESULTS

The multiview sequences "ballet" and "breakdancers" published by Microsoft are taken as the test sequences [7]. Both sequences contain 8 views. In this paper, the distance between reference views is set to 2; for example, views 1 and 3 are taken as the reference views, and the virtual view is placed at the same position as view 2. Figure 8 shows the computation analysis of the proposed algorithm. Since each pixel of the warped frame needs at least one warping operation, the redundant computation is estimated as the average number of warpings per pixel minus one. As illustrated in Fig. 8, the redundant computation is reduced by 86% after adopting all the proposed computation reduction algorithms. As a result, only 1.14 warping operations on average are needed to generate one pixel of the virtual view. The reduction from BD is only 4% because the distance between the two reference views is not very far; as shown in Fig. 3, only a small portion of the pixels are out-of-boundary pixels.

Figure 9 shows the reference source profiling. The blue and red parts in Fig. 9 mark pixels warped from view 1 and view 2, respectively, while the green part marks pixels warped from both views, where redundant computation occurs. Figure 9(a) shows the reference source profiling for view interpolation without the proposed algorithm: almost the whole frame is covered by green, and the average computation is 2.0 warpings/pixel. With the proposed algorithm, the green part is largely reduced, as shown in Fig. 9(b). The remaining green parts are caused by the imperfect foreground/background boundaries in the depth map: a wrong object boundary makes the predicted end point coincide with the start point, causing redundant computation on the foreground.

5. CONCLUSION

To reduce the computational complexity and the complex scheduling of view interpolation, a single iteration view interpolation algorithm is proposed in this paper. With the proposed DRS, EPP, UBR, and BD methods, the redundant computation is reduced by 86%. Furthermore, the artifacts due to imperfect depth maps are eliminated by the proposed running interpolation and background erosion within the single iteration.

6. REFERENCES

[1] A. Smolic and P. Kauff, "Interactive 3-D video representation and coding technologies," Proceedings of the IEEE, vol. 93, no. 1, pp. 33-36, Jan. 2005.
[2] M. Tanimoto, "Free viewpoint television - FTV," in Proceedings of Picture Coding Symposium, 2004.
[3] A. Smolic et al., "Intermediate view interpolation based on multiview video plus depth for advanced 3D video systems," in Proceedings of IEEE International Conference on Image Processing, 2008, pp. 2448-2451.
[4] W.-Y. Chen et al., "Efficient depth image based rendering with edge dependent depth filter and interpolation," in Proceedings of IEEE International Conference on Multimedia & Expo, 2005, pp. 1314-1317.
[5] S.-H. Kim et al., "A 36 fps SXGA 3-D Display Processor Embedding a Programmable 3-D Graphics Rendering Engine," IEEE Journal of Solid-State Circuits, vol. 43, no. 5, pp. 1247-1259, May 2008.
[6] C. Fehn, "Depth-Image-Based Rendering (DIBR), Compression and Transmission for a New Approach on 3D-TV," in Proceedings of SPIE Stereoscopic Displays and Virtual Reality Systems XI, San Jose, CA, USA, Jan. 2004, pp. 93-104.
[7] C. L. Zitnick et al., "High-quality video view interpolation using a layered representation," ACM Transactions on Graphics (Proc. SIGGRAPH), Aug. 2004, pp. 600-608.
