Lightfield Compression

Light Field Compression Using Disparity-Compensated Wavelet Decomposition

Chuo-Ling Chang and Xiaoqing Zhu
Email: {chuoling,zhuxq}@stanford.edu

Abstract

This paper proposes a new scheme for light field compression that incorporates disparity compensation into wavelet decomposition based on lifting. Disparity-compensated wavelet decomposition is first applied to exploit the similarity between different views. The resulting coefficient images are encoded using a shape-adaptive discrete wavelet transform followed by a modified SPIHT coder. The resulting bit-stream is scalable in all dimensions as well as in reconstruction quality, and lossless compression can be achieved. In addition, experimental results indicate that the PSNR performance is comparable to that of a traditional DCT-based coder, while the visual quality of the proposed scheme is considerably better.

1 Introduction

Over the years, light field rendering has received growing research interest due to its photo-realistic rendering capability, scene-independent rendering speed, and potential for interactive applications [1]. The amount of data needed for such a technique, however, is very large. A light field is essentially a 2-D array of images capturing an individual scene or object from different viewpoints, and can be interpreted as a 4-D array of pixel values accompanied by camera parameters for each view. A complete light field dataset usually consists of hundreds of images, easily exceeding gigabytes in size. Compression is therefore necessary for any practical light field system. Extensive research has been carried out on light field compression. Aside from high compression ratios, random access, fast decoding and embedded representation are also emphasized. The most popular approaches include:

• Vector quantization, which favors random access and fast rendering but achieves only fairly low compression ratios [1,2].

• Predictive coding, which efficiently exploits the redundancy between different views by disparity compensation but inevitably introduces inter-view dependencies that hamper random access [3,4,5].

• Progressive coding, either applying a 4-D discrete wavelet transform directly to the image array [6] or using a geometry model to pre-warp each image onto a corresponding view-dependent texture map before 4-D wavelet decomposition [7].

This work seeks to combine the strengths of predictive coding and progressive coding to achieve high compression and an embedded representation, as well as reasonable random access and fast decoding capabilities. We propose a new compression scheme that incorporates the disparity compensation operation into the lifting structure of the inter-view wavelet decomposition, followed by image-domain wavelet coding of the coefficient images. Additionally, the coefficient image coding is modified to be shape-adaptive to avoid spending bits on unneeded backgrounds. The overall system structure is explained in the next section, followed by the concept of disparity-compensated wavelet decomposition and the motivation for shape-adaptive coding of the wavelet coefficient images. Experimental results are given and discussed at the end.

2 System Overview

The proposed light field coding system is illustrated in Figure 1. First, inter-view disparity-compensated (DC) lifting is performed to decompose the original light field images into low-band and high-band coefficient images. A 2-D discrete wavelet transform (DWT) is then applied to each of the coefficient images, providing a multi-resolution representation. Finally, a SPIHT coder compresses the wavelet coefficients into a bit-stream [8]. The first two stages essentially carry out the 4-D DWT in a separable manner, and the coefficients are further compressed by SPIHT. In practice, the image-domain DWT and SPIHT are implemented together, as in conventional wavelet-based image compression. The system is also flexible: different wavelet kernels, decomposition levels and disparity-compensation methods can be selected. For coefficient image coding, a shape-adaptive (SA) modification of both the DWT and SPIHT is available as an option, as is arithmetic coding (AC) for the SPIHT coder, although arithmetic coding is preferably not applied, in order to facilitate random access and scalability.

Figure 1. Light field coding system. [Block diagram: light field image array → Inter-View DC-lifting (options: DC method, wavelet, level; input: geometry info, approx. geometry) → Coeff. Image DWT (option: SA) → SPIHT (options: SA, AC; input: segmentation info) → bitstream.]

3 Disparity-Compensated Wavelet Decomposition

3.1 Geometry-Based Disparity Compensation

Although light field data are highly correlated across all four dimensions, direct exploitation of this redundancy is difficult because of the view differences among images. In the literature, an approach analogous to motion compensation in video coding is generally applied and is called disparity compensation. As with motion compensation in video coding, effective disparity compensation is crucial to the efficiency of light field compression. It has also been observed that disparity compensation can be achieved more efficiently with the aid of estimated geometry as global information [9]. Given the camera parameters, an approximate geometry model of the scene can be estimated and represented efficiently [10]. Geometry-based disparity compensation, as illustrated in Figure 2, is used here. It is worth mentioning, however, that the lifting scheme places no constraint on the choice of disparity compensation method. For special cases where the geometry of a scene is hard to estimate, one may resort to the depth-map approach to disparity compensation instead [4].

Figure 2. Disparity compensation using approximate geometry (panels: left view, right view): Corresponding points in the left and right views can be related using the geometry. To predict the left view, the pixel value of the prediction image is sampled from the corresponding pixel in the right view.
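The sampling step described in Figure 2 can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: it assumes rectified views and a known per-pixel horizontal disparity map (`disparity`, a hypothetical input) in place of the projection through the approximate geometry model, and it uses nearest-neighbour sampling.

```python
import numpy as np

def predict_from_reference(reference, disparity):
    """Predict a target view by sampling the reference view.

    reference : 2-D array, the view that pixels are sampled from.
    disparity : 2-D integer array (hypothetical input); disparity[y, x] is the
                horizontal offset of the pixel in `reference` that corresponds
                to target pixel (y, x).
    """
    height, width = reference.shape
    ys, xs = np.mgrid[0:height, 0:width]
    # Clamp so that correspondences falling outside the image reuse the border.
    src_x = np.clip(xs + disparity, 0, width - 1)
    return reference[ys, src_x]

# Toy data: every target pixel corresponds to the reference pixel one step right.
right = np.arange(20, dtype=float).reshape(4, 5)
disparity = np.ones_like(right, dtype=int)
prediction = predict_from_reference(right, disparity)
```

With a real geometry model the offsets would generally be two-dimensional and fractional, so an interpolating sampler would replace the integer lookup.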

3.2 Wavelet Decomposition Using Lifting

Early efforts at 4-D wavelet decomposition of light field data yielded rather unsatisfactory compression ratios. In [6], a 4-D Haar transform is applied directly to the image array. With no disparity compensation, misalignment between views introduces many high-frequency components, which are expensive to code. In [7], disparity compensation is applied by warping each image onto its corresponding view-dependent texture map, and a 4-D Haar wavelet coder is then applied to the set of texture maps. This achieves better compression ratios than [6], but suffers from the non-uniform warping process, which is non-reversible and restricts the wavelet kernel to Haar. In addition, regions not covered by the approximate geometry cannot be warped into the texture map and hence cannot be encoded.

In video coding there has been a similar problem of efficiently combining motion compensation with temporal wavelet decomposition. Recent work proposed a lifting structure for the wavelet decomposition that allows an unrestricted model of motion prediction and compensation within the framework without sacrificing reversibility [11]. For light field compression, the motion compensation module is replaced by disparity compensation. The lifting structure, omitting scaling, is illustrated in Figure 3. $X_0$ and $X_1$ denote the evenly and oddly indexed images, and $\lambda_1(z)$ and $\lambda_2(z)$ are the z-transforms of linear filters. For the Haar wavelet,

$$\lambda_1(z) = -z^{-1}, \qquad \lambda_2(z) = \frac{z}{2}.$$

For the bi-orthogonal 5/3 wavelet,

$$\lambda_1(z) = -\frac{1 + z}{2}, \qquad \lambda_2(z) = \frac{1 + z^{-1}}{4}.$$
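Read as operations on the even and odd sub-sequences, with z taken as a one-sample advance (an interpretation, since the text does not fix the convention), the 5/3 filters amount to a predict step YH[n] = X1[n] - (X0[n] + X0[n+1])/2 followed by an update step YL[n] = X0[n] + (YH[n] + YH[n-1])/4. The sketch below applies them to a 1-D signal without any disparity compensation. Periodic boundary extension via `np.roll` is an assumption made for brevity, and scaling is omitted as in the text.

```python
import numpy as np

def lift_53_forward(x):
    """One level of 5/3 lifting on a 1-D signal of even length (scaling omitted)."""
    x0, x1 = x[0::2].astype(float), x[1::2].astype(float)
    # Predict: lambda_1(z) = -(1 + z)/2 acts on the even samples.
    yh = x1 - (x0 + np.roll(x0, -1)) / 2
    # Update: lambda_2(z) = (1 + z^-1)/4 acts on the high band.
    yl = x0 + (yh + np.roll(yh, 1)) / 4
    return yl, yh

def lift_53_inverse(yl, yh):
    """Undo the lifting steps in reverse order with flipped signs."""
    x0 = yl - (yh + np.roll(yh, 1)) / 4
    x1 = yh + (x0 + np.roll(x0, -1)) / 2
    x = np.empty(x0.size + x1.size)
    x[0::2], x[1::2] = x0, x1
    return x

signal = np.array([3, 5, 4, 8, 9, 7, 6, 2])
low, high = lift_53_forward(signal)
restored = lift_53_inverse(low, high)

# A constant signal is predicted perfectly, so its high band vanishes.
const_low, const_high = lift_53_forward(np.full(8, 7))
```

The inverse reuses exactly the same filter expressions as the forward transform, which is why any operator, linear or not, can be placed inside a lifting step without breaking invertibility.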

Figure 3. Disparity-compensated (DC) lifting. [Block diagrams of the analysis and synthesis stages: inputs $X_0$ and $X_1$, lifting filters $\lambda_1(z)$ and $\lambda_2(z)$ combined with disparity compensation, outputs $Y_L$ and $Y_H$.]

Disparity compensation can be viewed as a nonlinear modification of the filtering. The resulting $Y_L$ and $Y_H$ can be viewed as the average of the original signals and the residual error of the prediction, respectively. Note that perfect reconstruction is ensured because the operations in the lifting steps are identical during analysis and synthesis, only in reversed order and with a sign flip. Let the warping from view 1 to view 0 be denoted as $W_{1\text{-}0}[\cdot]$. Omitting scaling, the disparity-compensated Haar transform is then expressed as:

$$Y_H = X_1 - W_{1\text{-}0}[X_0], \qquad Y_L = X_0 + W_{0\text{-}1}[Y_H/2].$$

Its inverse operation is simply:

$$X_0 = Y_L - W_{0\text{-}1}[Y_H/2], \qquad X_1 = Y_H + W_{1\text{-}0}[X_0].$$

For the bi-orthogonal 5/3 wavelet, the expression is similar, with two warping operations in each lifting step. The two-dimensional decomposition is carried out in a separable fashion, applying the DC-lifting row by row first and then column-wise, or vice versa. The benefit of disparity compensation for inter-view wavelet decomposition is illustrated in Figure 4. With DC, the energy in the high band is significantly reduced, while the low-band component is free of ghosting artifacts.
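The disparity-compensated Haar step and its inverse can be sketched as below. This is an illustration under simplifying assumptions, not the implementation used in this work: the hypothetical `warp` function stands in for geometry-based disparity compensation and is just a horizontal integer shift, with `warp(., shift)` playing the role of W1-0 and `warp(., -shift)` that of W0-1. The second call deliberately uses a wrong disparity to show that reconstruction stays exact regardless of how good the warp is; only the high-band energy suffers.

```python
import numpy as np

def warp(image, shift):
    """Toy stand-in for disparity compensation: a horizontal integer shift."""
    return np.roll(image, shift, axis=1)

def dc_haar_forward(x0, x1, shift):
    yh = x1 - warp(x0, shift)        # predict view 1 from warped view 0
    yl = x0 + warp(yh / 2, -shift)   # update view 0 with the warped residual
    return yl, yh

def dc_haar_inverse(yl, yh, shift):
    x0 = yl - warp(yh / 2, -shift)   # undo the update step
    x1 = yh + warp(x0, shift)        # undo the predict step
    return x0, x1

rng = np.random.default_rng(0)
view0 = rng.integers(0, 256, size=(4, 8)).astype(float)
view1 = np.roll(view0, 2, axis=1)    # view 1 is view 0 displaced by 2 pixels

# Correct disparity: the prediction is perfect and the high band is empty.
yl, yh = dc_haar_forward(view0, view1, shift=2)
rec0, rec1 = dc_haar_inverse(yl, yh, shift=2)

# Wrong disparity: the high band carries energy, yet inversion is still exact.
yl_bad, yh_bad = dc_haar_forward(view0, view1, shift=1)
bad0, bad1 = dc_haar_inverse(yl_bad, yh_bad, shift=1)
```

The inverse only needs to repeat the same warps the encoder used, so the warp itself never has to be invertible.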

4 Coefficient Image Coding

4.1 DWT and SPIHT Coding

After DC-lifting, much of the correlation between the various views is removed, and the resulting coefficients are treated as independent images. They are further decomposed in 2-D by a multi-level decomposition using the popular bi-orthogonal 9/7 wavelet kernel, followed by a SPIHT coder. The SPIHT algorithm is essentially a bit-plane coding method for tree-structured wavelet coefficients [8]. Based on the assumption that most images have a decaying power spectrum, it further exploits the dependency between sub-bands. In addition, it sorts coefficients in order of importance and refines previously coded coefficients bit by bit. The output bit-stream is thus a continuously embedded representation of the original data that supports scalability in all dimensions as well as in reconstruction signal-to-noise ratio (SNR).

Figure 4. (a) Four neighboring images in a light field data set. (b) Coefficient images obtained by directly decomposing the original images with a 1-level 2-D Haar kernel. (c) Coefficient images from disparity-compensated wavelet decomposition.

It would be more natural to use a 4-D SPIHT coder, as in [6] and [7], since the light field data have after all been 4-D wavelet transformed in a separable fashion. Preliminary experiments, however, showed that a 4-D SPIHT coder actually yields results inferior to its 2-D counterpart. The reason is that with a non-dyadic and unbalanced 4-D DWT structure (i.e., different wavelet kernels and decomposition levels for the inter-view and coefficient image transforms), coefficients have varying degrees of energy concentration along different dimensions, and the coding efficiency of 4-D SPIHT is therefore compromised. The straightforward solutions are either to use a more sophisticated wavelet kernel or multiple levels for the inter-view decomposition, or to resort to a simpler kernel for the coefficient image decomposition. However, the former introduces dependencies on multiple views that are unwanted for random access, and the latter sacrifices effective exploitation of the redundancy within images. This issue needs further investigation.
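The embedded property can be illustrated without SPIHT's tree-based set partitioning. The sketch below is not SPIHT: it only keeps the most significant bit-planes of integer coefficients, to show how cutting a bit-plane-ordered stream earlier or later gives a coarser or finer approximation of the same data. The coefficient values are invented for the example.

```python
import numpy as np

def keep_top_bitplanes(coeffs, dropped_planes):
    """Zero the `dropped_planes` least significant bit-planes of each magnitude."""
    magnitude = np.abs(coeffs)
    truncated = (magnitude >> dropped_planes) << dropped_planes
    return np.sign(coeffs) * truncated

coeffs = np.array([93, -41, 12, -7, 3, 0, -1, 2])    # made-up wavelet coefficients
num_planes = int(np.abs(coeffs).max()).bit_length()  # 7 planes for a maximum of 93

# Squared error after decoding 0, 1, ..., num_planes bit-planes.
errors = []
for dropped in range(num_planes, -1, -1):
    approx = keep_top_bitplanes(coeffs, dropped)
    errors.append(int(np.sum((coeffs - approx) ** 2)))
```

The error never increases as more planes are decoded and reaches zero once all planes are received, which is the sense in which a single bit-stream serves every target quality up to lossless.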

4.2 Shape-Adaptive Modification

Many light field datasets look onto a single object from multiple viewpoints, and the background information is not needed for rendering; it is therefore desirable to avoid wasting any bits on the background. With a priori knowledge of the camera parameters and the estimated object geometry, the segmentation information can be coded with very few bits. Specifically, an approximate mask can be generated using the camera parameters and the estimated geometry. Only the deviations of the approximate mask from the exact mask need to be coded, as shown in Figure 5. Once the segmentation masks are available, background information can be omitted entirely in a shape-adaptive coding scheme.

Figure 5. Segmentation mask encoding: (top-left) Original image. (bottom-left) Exact mask. (bottom-right) Approximate mask generated from the estimated geometry. (top-right) Difference between the exact mask and the approximate mask.

The improvement obtained by using shape-adaptive DWT followed by modified SPIHT for video coding has been reported in [12]. The performance gain is rather significant when no arithmetic coding follows the SPIHT coder. Shape-adaptive coding is therefore especially important for light field compression, where arithmetic coding is better switched off for the sake of fast decoding and random access. The visual quality of lossy reconstruction also improves, especially at object borders, since the precise object silhouette is known at the decoder. For light fields with a valid background, the mask can simply be set to the entire image for each view; no generality is therefore lost by this modification.
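The idea of coding only the deviation between the two masks can be sketched as follows. Representing the deviation as a pixel-wise XOR is an assumption made for illustration; the text does not specify how the deviations are represented or entropy coded, and both masks here are invented.

```python
import numpy as np

exact = np.zeros((6, 8), dtype=bool)
exact[1:5, 2:6] = True               # made-up exact silhouette
approx = np.zeros((6, 8), dtype=bool)
approx[1:5, 3:7] = True              # made-up mask rendered from approximate geometry

# Encoder: only the disagreement between the two masks is transmitted.
residual = np.logical_xor(exact, approx)

# Decoder: regenerate `approx` from the geometry and camera parameters,
# then flip the pixels flagged in the residual.
decoded = np.logical_xor(approx, residual)
```

The residual is non-zero only along the part of the silhouette where the approximate geometry is off, so it is far sparser than the mask itself.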

5 Experimental Results

Experiments are conducted using the light field dataset Garfield, consisting of 256 images at a resolution of 192x144 pixels, captured in a hemispherical arrangement of 8 latitudes, each containing 32 images. Only the luminance component is coded. Typical images of the data set can be seen in Figure 4(a). The approximate geometry for disparity compensation is obtained using the method in [10]. It has to be transmitted as side information, but this coding overhead is neglected in the bit-rate calculation. Segmentation masks can be obtained directly from the original images and are encoded at 0.022 bpp using the method described in Section 4.2.
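To put the bit-rates in this section in perspective, the arithmetic below converts bits per pixel into total stream size for this dataset. It assumes 8-bit luminance samples and that bit-rates are averaged over all pixels of all views; neither is stated explicitly in the text.

```python
views_lat, views_per_lat = 8, 32     # 8 latitudes x 32 images = 256 views
height, width = 144, 192
bits_per_sample = 8                  # assumption: 8-bit luminance samples

num_pixels = views_lat * views_per_lat * height * width
raw_bytes = num_pixels * bits_per_sample // 8

def compressed_bytes(bpp):
    """Size in bytes of a stream coded at `bpp` bits per pixel over all views."""
    return num_pixels * bpp / 8

mask_bytes = compressed_bytes(0.022)  # segmentation masks at 0.022 bpp
```

Under these assumptions the raw luminance data occupy 7,077,888 bytes, while the segmentation masks cost roughly 19 KB in total.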

Figure 6. Inter-view disparity-compensated wavelet coefficient images using segmentation information. The original images are shown in Figure 4(a). Coefficient images without using segmentation information are shown in Figure 4(c).

Geometry-based disparity compensation is applied in the 2-D inter-view lifting structure. By using the segmentation mask, the boundary of the object can be compensated more accurately, as shown in Figure 6 compared to Figure 4(c). The resulting coefficient images are further decomposed with the popular bi-orthogonal 9/7 wavelet at 3 levels, followed by SPIHT coding to convert the two-dimensional coefficient pyramid into a bit-stream. The shape-adaptive modification is also implemented and compared to schemes using the traditional rectangular transform. A 1-level Haar inter-view decomposition is used, and the results are shown in Figure 7.

Figure 7. Experimental results for Garfield data set: The solid line-pair represents results with arithmetic coding of the SPIHT output bit-stream; the dashed line-pair represents results without arithmetic coding. Each line-pair contains results using shape-adaptive coding and traditional rectangular coding.

With arithmetic coding, the shape-adaptive (SA) modification gives a gain of about 1.5 dB over coding without SA. Without arithmetic coding, the gap between the SA and non-SA cases is even more pronounced, up to 10 dB. The performance of the proposed scheme is also compared to a DCT-based approach with geometry-based disparity compensation similar to [9]. The rate-distortion curve of the DCT-based coder is plotted in Figure 8 together with those of the wavelet-based coder using 1-level Haar and bi-orthogonal 5/3 kernels without arithmetic coding. The result for the Haar wavelet is inferior to the DCT-based coder except at high bit-rates. The bi-orthogonal 5/3 kernel is more efficient and outperforms the DCT-based coder over most of the coding range.

Figure 8. Experimental results for Garfield data set using the DCT-based light field coder and the proposed wavelet-based coder. Arithmetic coding is not applied in the wavelet-based coder.

The perceptual quality of the reconstruction at different bit-rates is shown in Figure 9. It can be observed that the images from the wavelet coders are free of blocking artifacts and have clearer boundaries than those from the DCT-based coder, especially at low bit-rates. Even with the Haar transform, which has lower PSNR than the DCT coder, the visual quality is actually better.

Figure 9. Reconstructed images at different bit-rates: (a) DCT-based, (b) Wavelet-based (Haar), (c) Wavelet-based (5/3), at 0.065 bpp, 0.105 bpp and 0.156 bpp, together with the original.
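For reference, the PSNR figures quoted above follow the usual definition, sketched below. The optional `mask` argument, which restricts the error to object pixels, and the 8-bit peak value of 255 are assumptions; the text does not state whether the reported PSNR is measured over the whole image or over the object region only.

```python
import numpy as np

def psnr(original, reconstructed, mask=None, peak=255.0):
    """Peak signal-to-noise ratio in dB, optionally restricted to `mask` pixels."""
    diff = original.astype(float) - reconstructed.astype(float)
    if mask is not None:
        diff = diff[mask]
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Toy data: one pixel of a flat image is off by 16 grey levels.
original = np.full((4, 4), 100, dtype=np.uint8)
reconstructed = original.copy()
reconstructed[0, 0] = 116

object_mask = np.ones((4, 4), dtype=bool)
object_mask[0, 0] = False            # pretend the erroneous pixel is background

full_psnr = psnr(original, reconstructed)
object_psnr = psnr(original, reconstructed, mask=object_mask)
```

The two values differ because the masked variant ignores errors outside the object, which is worth keeping in mind when comparing shape-adaptive and rectangular coders.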

6 Conclusions

In this paper we presented a light field compression scheme that uses disparity-compensated lifting for inter-view wavelet decomposition. A shape-adaptive modification of both the image-domain DWT and the SPIHT coding of the coefficients is adopted for higher coding efficiency and better visual quality. Aside from providing a fully scalable representation of the light field data, the proposed scheme also allows lossless compression and reasonable random access. Experimental results showed that it maintains competitive coding efficiency and offers improved reconstruction quality compared to a DCT-based approach.

References

[1] Marc Levoy and Pat Hanrahan, “Light field rendering”, in Computer Graphics (Proceedings SIGGRAPH96), August 1996, pp. 31-42.

[2] Xin Tong and Robert M. Gray, “Interactive view synthesis from compressed light fields”, in Proceedings of the IEEE International Conference on Image Processing ICIP-2001, 2001.

[3] Marcus Magnor and Bernd Girod, “Data compression for light field rendering”, in IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 3, pp. 338-343, April 2000.

[4] Marcus Magnor, Peter Eisert, and Bernd Girod, “Multi-View Image Coding with Depth Maps and 3-D Geometry for Prediction”, in Proceedings SPIE Visual Communications and Image Processing VCIP-2000, November 2000, pp. 199-203.

[5] Prashant Ramanathan, Markus Flierl, and Bernd Girod, “Multi-hypothesis disparity-compensated light field compression”, in Proceedings of the IEEE International Conference on Image Processing ICIP-2001, October 2001.

[6] Marcus Magnor, Andreas Endmann, and Bernd Girod, “Progressive compression and rendering of light fields”, in Proceedings Vision, Modeling and Visualization 2000, November 2000, pp. 199-203.

[7] Marcus Magnor and Bernd Girod, “Model-based coding of multi-viewpoint imagery”, in Proceedings SPIE Visual Communications and Image Processing VCIP-2000, Perth, Australia, June 2000, vol. 1, pp. 14-22.

[8] A. Said and W. Pearlman, “A new, fast and efficient image codec based on set partitioning in hierarchical trees”, in IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, pp. 243-250, June 1996.

[9] Marcus Magnor, Peter Eisert, and Bernd Girod, “Model-aided coding of multi-viewpoint image data”, in Proceedings of the IEEE International Conference on Image Processing ICIP-2000, Vancouver, Canada, September 2000, vol. 2, pp. 919-922.

[10] P. Eisert, E. Steinbach, and B. Girod, “Automatic reconstruction of 3-D stationary objects from multiple uncalibrated camera views”, in IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 2, pp. 261-277, March 2000.

[11] Andrew Secker and David Taubman, “Motion-compensated highly scalable video compression using an adaptive 3D wavelet transform based on lifting”, in Proceedings of the IEEE International Conference on Image Processing ICIP-2001, Thessaloniki, Greece, October 2001.

[12] G. Minami, Z. Xiong, et al., “3-D Wavelet Coding of Video With Arbitrary Regions of Support”, in IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 9, pp. 1063-1068, September 2001.

Appendix

Chuo-Ling Chang:

• Lightfield dataset I/O interface
• Disparity-Compensated Lifting Framework: Truncated Haar, 1-D Haar, 1-D 5/3, 2-D Haar, 2-D 5/3
• Shape-Adaptive DWT
• Segmentation mask coding
• Performance evaluation of the proposed scheme

ZHU Xiaoqing:

• Calculation of coding gain: Truncated Haar, 1-D Haar, 1-D 5/3, 2-D Haar, 2-D 5/3
• Implementation of 2-D SPIHT coder
• Implementation of 4-D SPIHT coder
• Performance evaluation of 2-D vs. 4-D SPIHT coder
• Modification of original SPIHT coder adding shape-adaptive functionality