2012 IEEE International Conference on Multimedia and Expo Workshops
Novel 3DV Coding Scheme with Down-/Up-sampling and Asymmetrical Prediction
Xiang Ma, Junyan Huo, Yilin Chang, Guangliang Ren
State Key Lab. of Integrated Service Networks, Xidian University, Xi’an, China
Ying Chen, Li Zhang
Qualcomm Incorporated, San Diego, USA
978-0-7695-4729-9/12 $26.00 © 2012 IEEE DOI 10.1109/ICMEW.2012.22

Abstract—This paper proposes a novel three dimensional video (3DV) coding scheme that incorporates down-/up-sampling filters and asymmetrical prediction (ASP) into the coding system. The down-/up-sampling filters allow the scheme to satisfy the 3DV data volume requirement imposed by limited decoding capability. To further improve coding efficiency, ASP enables inter-view prediction between views of different resolutions. Experimental results show that the proposed scheme significantly improves both coding efficiency and the quality of the synthesized views, especially for low-resolution sequences and at low bit rates.
Keywords-asymmetrical prediction, down-/up-sampling, three dimensional video (3DV)

I. INTRODUCTION
Three dimensional video (3DV) gives viewers a more immersive experience than traditional two dimensional media. It lets users freely choose the viewpoint from which they watch a real 3D scene. 3DV has attracted growing research interest because of its wide range of applications, including free-viewpoint television (FTV) and three dimensional television (3DTV) [1]. Many 3DV data formats have been designed to support these applications. Multi-view video plus depth (MVD) is a typical 3DV data format. From MVD, virtual views at any viewpoint within a viewing angle can be synthesized using depth image-based rendering (DIBR) [2]. Because of its rendering capability and good compatibility, MPEG has based its 3DV standardization on the MVD representation. The final call for proposals (CfP) [3] and the requirements for 3DV [4] were released in March 2011.

Both the video and the depth data must be sent to the decoder, so an efficient 3DV coding scheme is essential for practical applications. A straightforward approach to MVD coding is to apply a conventional video coding algorithm such as H.264/AVC or H.264/MVC. However, these approaches encode the video and depth data independently. To further improve the coding efficiency of MVD, the correlation between the two components has been exploited by reusing the motion information of the video for depth coding [5]. Several down-/up-sampling algorithms [6-10] have also been proposed for depth coding that preserve good rendering quality. The results in [9] showed that a sampling factor of two achieves a good trade-off between coding efficiency and down-sampling degradation. In [10], the video information is used to design a depth up-sampling filter that benefits from the correlation between the components.

Besides coding efficiency, [4] defines several other requirements for 3DV coding. Given the decoding capability of autostereoscopic displays at affordable cost, the total volume of uncompressed video data and supplementary data should be at most four times that of a single uncompressed video. In MVD, however, the data volume is proportional to the number of views. To solve this problem, this paper proposes a novel coding scheme that introduces down-/up-sampling to meet the data volume requirement. In addition, an asymmetrical prediction (ASP) scheme is designed to improve coding efficiency by enabling inter-view prediction between views of different resolutions. The experiments follow the test conditions of the exploration experiments recommended for 3DV [11]. The experimental results show that the proposed coding scheme outperforms, on average, the JMVC8.3.1 [12] coding scheme, which MPEG uses as the anchor coding scheme [3].

The remainder of this paper is organized as follows. Section II presents the down-/up-sampling coding scheme and asymmetrical prediction. Section III gives the experimental results and analysis, and Section IV concludes the paper.
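The four-times data volume requirement discussed above can be checked with simple arithmetic. The following is a minimal sketch and not part of the paper. It assumes that one depth map counts as one single-video equivalent, which reproduces the "about six times" figure given for the 3-view case in Section II. It also scales each view by the fraction of samples it keeps. The function names are hypothetical.

```python
# Hypothetical helper (not from the paper): uncompressed MVD data volume in
# units of one full-resolution uncompressed video.
# Assumption: one depth map counts as one video equivalent (depth_weight=1.0).


def mvd_data_volume(view_scales, depth_weight=1.0):
    """view_scales: fraction of full-resolution samples kept in each view
    (1.0 = full resolution, 0.5 = down-sampled by 2 in one direction);
    the same fraction is applied to the view's video and its depth map."""
    return sum(scale * (1.0 + depth_weight) for scale in view_scales)


def meets_requirement(view_scales, limit=4.0, depth_weight=1.0):
    """True if the volume does not exceed `limit` single-video equivalents."""
    return mvd_data_volume(view_scales, depth_weight) <= limit


full_3view = [1.0, 1.0, 1.0]  # left, center, right at full resolution
samp_3view = [0.5, 1.0, 0.5]  # left and right views halved, center kept
print(mvd_data_volume(full_3view), meets_requirement(full_3view))  # 6.0 False
print(mvd_data_volume(samp_3view), meets_requirement(samp_3view))  # 4.0 True
```

Halving only the left and right views puts the 3-view configuration exactly at the limit. If depth maps were counted as luma-only 4:2:0 planes, `depth_weight` would be about 2/3 and the totals would shift accordingly.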
II. PROPOSED CODING SCHEME
The CfP [3] defines 2-view and 3-view test scenarios. In the 2-view case, the uncompressed MVD data (two videos plus two depth maps) stays within the four-times limitation. In the 3-view case, however, it is about six times the size of a single uncompressed video. Subsection A therefore proposes a new coding scheme that down-/up-samples specific views to reduce the data volume in the 3-view case. Subsection B then proposes an asymmetrical prediction scheme to further improve its coding performance.

A. Framework of Down-/Up-sampling Coding Scheme
This subsection proposes a down-/up-sampling coding scheme that satisfies the data volume limitation. Down-sampling is applied at the encoder and up-sampling at the decoder. Down-sampling introduces sampling distortion into the reconstructed views. The sampling factor is therefore set to 2, which, as discussed in [9], gives a good trade-off between this degradation and coding efficiency. Down-sampling can be applied in either the horizontal or the vertical direction.

On the decoder side, many more virtual views are generated from the received data for stereoscopic and autostereoscopic displays. The reconstructed center view is used to synthesize every virtual view between the left and right views, so its resolution is not reduced. In other words, the down-sampling module is invoked only when coding the left and right views (both video and depth). The full-resolution center video and depth then account for two single-video equivalents, and the four half-resolution left/right components account for another two. This satisfies the four-times data volume limitation in the 3-view case, while giving the center view high priority to preserve the quality of the synthesized views.

Figure 1. The framework of the proposed scheme

Fig. 1 shows the framework of the proposed down-/up-sampling coding scheme. ViewL, ViewC, and ViewR denote the left, center, and right views, respectively. The subscripts F and H denote full and half resolution, and a prime (') marks a reconstructed view. As shown in Fig. 1, ViewL_F and ViewR_F are down-sampled before being fed into the encoder together with ViewC_F. On the decoder side, the reconstructed views ViewL'_H and ViewR'_H are up-sampled after decoding. The MPEG-4 13-tap filter {2, 0, -4, -3, 5, 19, 26, 19, 5, -3, -4, 0, 2}/64 is used to down-sample the original MVD data. For up-sampling, the AVC 6-tap filter {1, -5, 20, 20, -5, 1}/32 is applied to both the luma and the chroma components.
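The two resampling stages described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation. Only the tap values come from the text. The filter phase (center tap aligned with even samples), edge-replication padding, rounding offsets, and 8-bit clipping are assumptions. The hypothetical helper `asp_interview_refs` shows how the same up-sampler could produce the inter-view references used by the asymmetrical prediction of subsection B.

```python
# Hypothetical sketch of the resampling stages. Tap values are those quoted
# in the text; phase, edge padding, rounding and 8-bit clipping are assumed.

DOWN_TAPS = [2, 0, -4, -3, 5, 19, 26, 19, 5, -3, -4, 0, 2]  # MPEG-4 13-tap, sum 64
UP_TAPS = [1, -5, 20, 20, -5, 1]                           # AVC 6-tap, sum 32


def _clip8(value):
    """Clip to the 8-bit sample range [0, 255]."""
    return max(0, min(255, value))


def _at(row, i):
    """Sample at index i, replicating the edge samples outside the row."""
    return row[max(0, min(len(row) - 1, i))]


def _transpose(picture):
    return [list(col) for col in zip(*picture)]


def downsample_row(row):
    """Low-pass filter with the 13-tap filter and keep every second sample."""
    half = len(DOWN_TAPS) // 2  # 6 taps on each side of the center tap
    out = []
    for center in range(0, len(row), 2):
        acc = sum(t * _at(row, center + k - half) for k, t in enumerate(DOWN_TAPS))
        out.append(_clip8((acc + 32) >> 6))  # divide by 64 with rounding
    return out


def upsample_row(row):
    """Copy each sample and interpolate the half-sample position after it."""
    out = []
    for i, value in enumerate(row):
        out.append(value)
        # Taps cover offsets -2..+3 around the half-sample between i and i + 1.
        acc = sum(t * _at(row, i + k - 2) for k, t in enumerate(UP_TAPS))
        out.append(_clip8((acc + 16) >> 5))  # divide by 32 with rounding
    return out


def downsample_picture(picture, vertical=False):
    """Halve the width (default) or, with vertical=True, the height."""
    if vertical:
        return _transpose(downsample_picture(_transpose(picture)))
    return [downsample_row(row) for row in picture]


def upsample_picture(picture, vertical=False):
    """Double the width (default) or, with vertical=True, the height."""
    if vertical:
        return _transpose(upsample_picture(_transpose(picture)))
    return [upsample_row(row) for row in picture]


def asp_interview_refs(left_half, right_half, vertical=False):
    """Hypothetical ASP helper: up-sample the reconstructed half-resolution
    left/right pictures of one access unit so they can serve as inter-view
    references when coding the full-resolution center-view picture."""
    return [upsample_picture(left_half, vertical),
            upsample_picture(right_half, vertical)]
```

For example, `upsample_picture(downsample_picture(pic))` returns a picture of the original width. Passing `vertical=True` mirrors the vertical down-sampling used for Poznan_Hall2. The 6-tap interpolation leaves the decoded integer-position samples unchanged and fills only the half-sample positions.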
B. Asymmetrical Prediction
As analyzed above, the center view is coded at full resolution to preserve the quality of the synthesized virtual views, and the other two views are coded at half resolution. In the existing H.264/MVC standard, inter-view prediction can achieve a significant coding gain, but only between views of the same resolution. Thus, when the right view is coded, the reconstructed pictures of the left view can serve as inter-view reference pictures. When the center view is coded, however, the reconstructed pictures of the left and right views cannot serve as inter-view reference pictures because their resolution differs. Fig. 2 illustrates this case for the sequence Newspaper. Nevertheless, the center view is clearly correlated with the other two views despite the resolution mismatch. To exploit this inter-view correlation, the proposed coding scheme introduces an asymmetrical prediction structure. When a picture of the center view is coded, the reconstructed pictures of the left and right views in the same access unit are fed into the up-sampling module. The up-sampled pictures then serve as inter-view reference pictures for the current center-view picture, as shown in Fig. 2. The AVC 6-tap filter {1, -5, 20, 20, -5, 1}/32 is used as the up-sampling filter. In this way, inter-view prediction exploits the correlation between the center view and the other two views, which is expected to improve the coding efficiency of MVD.

Figure 2. The prediction structure of ASP when coding the pictures of the center view

III. EXPERIMENTAL RESULTS AND ANALYSIS
The experiments use JMVC version 8.3.1, the H.264/MVC reference software that MPEG designated in the CfP [3]. The MPEG view synthesis reference software (VSRS3.5) [13] is used to synthesize the virtual views. The performance of the proposed scheme is evaluated by the coding efficiency of MVD and the quality of the synthesized views. To isolate the contribution of ASP, a second scheme (denoted SAMP) that uses only down-/up-sampling without ASP is also tested. The anchor coding scheme (JMVC8.3.1) used to compare the coding schemes is defined in [11]. Eight MPEG test sequences specified in [3] are used for the overall evaluation of the proposed scheme. Table I lists their basic information. OL, OC, and OR denote the left, center, and right original views, and SL and SR denote the left and right synthesized views.

TABLE I. BASIC INFORMATION OF TEST SEQUENCES
Sequences       Resolution   Frame Rate (frame/s)   OL-OC-OR   SL-SR
Kendo           1024x768     30                     1-3-5      2.75-3.25
Lovebird1       1024x768     30                     4-6-8      5.75-6.25
Newspaper       1024x768     30                     2-4-6      3.75-4.25
Balloons        1024x768     30                     1-3-5      2.75-3.25
Poznan_Hall2    1920x1088    25                     7-6-5      6.125-5.875
Poznan_Street   1920x1088    25                     5-4-3      4.125-3.875
Undo_Dancer     1920x1088    25                     1-5-9      4.5-5.5
GT_Fly          1920x1088    25                     9-5-1      5.5-4.5

TABLE II. RATE-DISTORTION COMPARISON
                 SAMP+ASP vs JMVC                 SAMP+ASP vs SAMP
Sequences        BD PSNR (dB)  BD bit rate (%)    BD PSNR (dB)  BD bit rate (%)
Kendo             0.573        -11.26             0.393         -10.08
Lovebird1         0.279         -5.62             0.881         -18.07
Newspaper         0.383         -7.46             0.490          -9.74
Balloons          1.206        -18.01             0.774         -12.33
Poznan_Hall2      0.493        -15.16             0.317          -5.58
Poznan_Street     0.023         -0.76             0.791         -23.29
Undo_Dancer      -0.118          2.38             0.805         -21.81
GT_Fly            0.059         -2.28             0.754         -23.73
Average           0.362         -7.27             0.651         -15.58

[11] lists four recommended video and depth quantization parameter (QP) pairs for the anchor coding scheme. SAMP+ASP denotes the combination of the down-/up-sampling coding scheme and ASP for the center view. In SAMP+ASP, the QPs must be redesigned for each bit-rate point. The left and right views share the same QPs, which differ from the QPs of the center view. The video and depth QPs are selected independently. Video QPs are selected as follows. First, a left/right-view video QP is chosen so that the total video bit rate of the two views is comparable to the anchor's total video bit rate for those views. Then, a center-view video QP is chosen so that its bit rate approximates the anchor's center-view video bit rate. The same criterion is applied to the depth. If several QP combinations qualify, the one that gives the highest average PSNR of the synthesized views is chosen. SAMP uses the same QP combinations as SAMP+ASP. Horizontal down-sampling is used for all test sequences except Poznan_Hall2, which is down-sampled vertically. The Bjontegaard method [14] is used to calculate the average PSNR and bit-rate differences between coding schemes. The BD bit rate [14] is the average reduction (negative value) or increase (positive value) in MVD coding bit rate of the proposed method relative to the reference method at the same coding quality. The BD PSNR [14] is the average reduction (negative value) or increase
(positive value) in average video PSNR of the proposed method relative to the reference method at the same coding bit rate.

Table II compares the coding efficiency of SAMP+ASP with the other two schemes. Compared with JMVC8.3.1, SAMP+ASP achieves an average gain of 0.362 dB in BD PSNR, or -7.27% in BD bit rate. The gains also differ between the two test sets. They are larger on average for the low-resolution sequences (0.279 dB to 1.206 dB) and smaller for the high-resolution sequences (-0.118 dB to 0.493 dB). Coding performance also depends on scene complexity. For Undo_Dancer, which has rich details, SAMP+ASP incurs a coding loss of -0.118 dB/2.38%. Table II also compares SAMP and SAMP+ASP. Adding asymmetrical prediction for the center view yields a substantial coding gain of 0.651 dB/-15.58%.

For a broader comparison of SAMP+ASP and JMVC8.3.1, Fig. 3 shows the rate-distortion curves of four representative sequences. The horizontal axis is the average bit rate of MVD, and the vertical axis is the average PSNR of the three coded views. SAMP+ASP clearly improves the coding efficiency of MVD over JMVC8.3.1. Fig. 3 also shows that the gain of SAMP+ASP over JMVC8.3.1 is large at lower bit rates but small, or even negative, at higher bit rates (e.g., GT_Fly). The distortion inherent in the sampling filters does not decrease as the bit rate grows, so it becomes more pronounced at high bit rates than at low ones.

Figure 3. The rate-distortion comparison for (a) Newspaper, (b) Balloons, (c) Poznan_Hall2, (d) GT_Fly. In each panel, the vertical axis is Avg. PSNR (dB), the horizontal axis is Avg. Rate (kbps), and the curves are JMVC8.3.1 and SAMP+ASP.

Fig. 4 compares the rate-distortion performance of SAMP+ASP and JMVC8.3.1 in terms of synthesized view quality. The horizontal axis is the total bit rate of MVD, and the vertical axis is the average PSNR of the two synthesized views. The PSNR of each synthesized view is computed against the view synthesized from the original, uncoded MVD data. The decoded half-resolution views are up-sampled before view synthesis. As Fig. 4 shows, SAMP+ASP significantly improves the quality of the synthesized views.

Figure 4. Synthesized view quality comparison for (a) Newspaper, (b) Balloons, (c) Poznan_Hall2, (d) GT_Fly. In each panel, the vertical axis is Avg. PSNR (dB), the horizontal axis is Avg. Rate (kbps), and the curves are JMVC8.3.1 and SAMP+ASP.

IV. CONCLUSIONS
This paper proposes a novel 3DV coding scheme that applies down-/up-sampling to some of the views in the coding system. To fully exploit the correlation among views of different resolutions, the scheme also introduces asymmetrical inter-view prediction. Experimental results show that the proposed scheme significantly improves both the coding efficiency and the quality of synthesized views, especially for low-resolution sequences and at low bit rates.

ACKNOWLEDGMENT
We would like to acknowledge the support provided by the National Natural Science Foundation of China (61001205, 60902081, 60902052), the Programme of Introducing Talents of Discipline to Universities (B08038), the Fundamental Research Funds for the Central Universities (72105457), and the Xidian-Qualcomm Joint Research Funds.

REFERENCES
[1] A. Smolic, K. Mueller, et al., “3-D video and free viewpoint video – technologies, applications and MPEG standards,” in Proc. IEEE Int. Conf. Multimedia Expo, Toronto, ON, 2006, pp. 2161–2164.
[2] C. Fehn, “Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV,” in Proc. SPIE Conf. Stereoscopic Displays and Virtual Reality Systems XI, vol. 5291, CA, U.S.A., Jan. 2004, pp. 93–104.
[3] “Call for proposals on 3D video coding technology,” ISO/IEC JTC1/SC29/WG11, Doc. N12036, 2011.
[4] “Applications and requirements on 3D video coding,” ISO/IEC JTC1/SC29/WG11, Doc. N12035, 2011.
[5] H. Oh and Y.-S. Ho, “H.264-based depth map coding using motion information of corresponding texture video,” Adv. Image Video Technol., vol. 4319, 2006.
[6] K.-J. Oh, S. Yea, A. Vetro, and Y.-S. Ho, “Depth reconstruction filter and down/up sampling for depth coding in 3-D video,” IEEE Signal Processing Letters, vol. 16, no. 9, pp. 747–750, Sept. 2009.
[7] M. O. Wildeboer, T. Yendo, et al., “Color based depth up-sampling for depth compression,” in Picture Coding Symposium (PCS), Nagoya, Dec. 2010, pp. 170–173.
[8] Y. Li and L. Sun, “A novel upsampling scheme for depth map compression in 3DTV system,” in Picture Coding Symposium (PCS), Nagoya, Dec. 2010, pp. 186–189.
[9] K. Klimaszewski, K. Wegner, and M. Domanski, “Influence of views and depth compression onto quality of synthesized views,” ISO/IEC JTC1/SC29/WG11, Doc. M16758, 2009.
[10] M. O. Wildeboer, T. Yendo, et al., “Depth up-sampling for depth coding using view information,” in 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2011.
[11] “Description of exploration experiments in 3D video coding,” ISO/IEC JTC1/SC29/WG11, Doc. N12037, 2011.
[12] Y. Chen, P. Pandit, and S. Yea, “WD 4 reference software for MVC,” ISO/IEC JTC1/SC29/WG11 and ITU-T SG16, Doc. JVT-AD207, 2009.
[13] MPEG-3DV view synthesis reference software, [online]. Available: ftp://ftp.merl.com/pub/avetro/3dv-cfp/software/VSRS_software.zip
[14] “Calculation of average PSNR differences between RD-curves,” ITU-T Q6/SG16, Doc. VCEG-M33, 2001.