optimal subband bit allocation for multi-view image coding ... - eurasip

OPTIMAL SUBBAND BIT ALLOCATION FOR MULTI-VIEW IMAGE CODING WITH DISPARITY COMPENSATED WAVELET LIFTING TECHNIQUE

Pongsak Lasang and Wuttipong Kumwilaisak
Communication and Multimedia (CODIA) Laboratory, King Mongkut’s University of Technology Thonburi, Bangkok 10140, Thailand
E-mail: [email protected]

ABSTRACT

This paper presents an optimal subband bit allocation based on a new rate-distortion (R-D) model for multi-view image coding with disparity-compensated wavelet lifting. First, the distortion prediction of the reconstructed multi-view images under the lifting scheme is presented. A new rate-distortion model combining the exponential and power models is then developed. The analyzed prediction error and the rate-distortion model are used in the optimal bit allocation framework, which allocates bits to all subbands with the goal of minimizing the distortion of the reconstructed multi-view images. Low-pass and high-pass subbands are compressed by SPIHT [4] using the optimal bit allocation. We verify the proposed method with several test multi-view images. The results show that bit allocation based on the proposed method is close to exhaustive search in both allocated bits and PSNR. It also outperforms uniform bit allocation over a wide range of target bit rates.

Index Terms: Multi-view coding, wavelet lifting, SPIHT, rate-distortion model, bit allocation

1. INTRODUCTION

In recent years, multi-view image coding has become an active research area because of its multimedia applications, such as 3-D television, free-viewpoint television, and multi-view surveillance. A set of multi-view images is captured by several cameras from different angles. These cameras aim at the same scene to capture depth and other information useful for 3-D object reconstruction. This generates a large volume of data, which makes efficient compression necessary.
Most multi-view image compression algorithms in the literature try to reduce intra-view and inter-view redundancies. In [1], Tong et al. exploited inter-view redundancy and examined disparity-compensated predictive coding for multi-view images. Magnor et al. [2] proposed multi-view image coding based on texture maps and model-aided prediction. In video coding, efficient motion compensation using the lifting technique [3] was proposed to reduce redundancies between video frames when a wavelet transform is used. It was shown that lifting guarantees invertibility at the synthesis side.

To code multi-view images optimally with the lifting technique, the bit rate should be allocated to the different subbands so as to maximize the quality of the reconstructed multi-view images. Without a model, an exhaustive search for the optimal bit allocation may be needed, which makes multi-view image coding complex. Therefore, this paper proposes an optimal subband bit allocation based on a rate-distortion (R-D) model for multi-view image coding with disparity-compensated wavelet lifting. First, the distortion prediction of the reconstructed multi-view images under the lifting scheme is analyzed. A new rate-distortion model combining the exponential and power models is then developed. The objective of this model is to capture the rate-distortion characteristics of multi-view images precisely over a wide range of target bit rates. The analyzed prediction error and the rate-distortion model are then used in the optimal bit allocation framework, which allocates bits to all subbands with the goal of minimizing the distortion of the reconstructed multi-view images.

The remainder of this paper is organized as follows. Section 2 presents the distortion prediction of multi-view images when disparity compensation with wavelet lifting is used. Section 3 describes the proposed optimal bit allocation to the different subbands of the multi-view images based on the new rate-distortion model. Experimental results are shown in Section 4. Concluding remarks are given in Section 5.

2. DISTORTION PREDICTION IN RECONSTRUCTED MULTI-VIEW IMAGE

The lifting scheme is used to construct the discrete wavelet transform (DWT), as investigated in [5]. High-pass (H) and low-pass (L) subband decompositions are obtained with a sequence of predict and update steps from the lifting structure, which is shown in Fig. 1. The analysis side of the lifting scheme decomposes the multi-view images into H and L subbands. The synthesis side reconstructs the multi-view images from the subbands. In the multi-view image coding context, occluded and unconnected pixels in the predict and update steps influence the computation of distortion. Therefore, their effects are taken into account in the distortion prediction. They are expressed in terms of the ratio of connected pixels, which is used in the prediction formulation.

Fig. 1. Structure of 5/3 wavelet lifting scheme. The analysis side (left) and the synthesis side (right).
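
The predict and update steps of Fig. 1 can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the disparity compensation is replaced by the identity, the scaling factors are set to the standard 5/3 values a = 1/2 and b = 1/4, each view is a flat list of pixel values, and a missing neighbour at the border is replaced by the nearest available one. The function names `analysis` and `synthesis` are hypothetical.

```python
# Sketch of 5/3 lifting across a sequence of views (analysis and synthesis).
# Assumptions: no disparity compensation (identity), a = 1/2, b = 1/4,
# at least two views, nearest-neighbour handling at the borders.

A_COEF = 0.5   # predict-step scaling factor a (assumed value)
B_COEF = 0.25  # update-step scaling factor b (assumed value)


def analysis(views):
    """Split views into low-pass (L) and high-pass (H) subbands."""
    even = views[0::2]  # reference views A_k
    odd = views[1::2]   # predicted views B_k
    high = []
    for k, b_view in enumerate(odd):
        right = even[k + 1] if k + 1 < len(even) else even[k]
        # Predict step: H_k = B_k - a * (A_k + A_{k+1})
        high.append([p - A_COEF * (x + y)
                     for p, x, y in zip(b_view, even[k], right)])
    low = []
    for k, a_view in enumerate(even):
        h_cur = high[k] if k < len(high) else high[k - 1]
        h_prev = high[k - 1] if k > 0 else h_cur
        # Update step: L_k = A_k + b * (H_k + H_{k-1})
        low.append([p + B_COEF * (x + y)
                    for p, x, y in zip(a_view, h_cur, h_prev)])
    return low, high


def synthesis(low, high):
    """Invert the update step, then the predict step."""
    even = []
    for k, l_band in enumerate(low):
        h_cur = high[k] if k < len(high) else high[k - 1]
        h_prev = high[k - 1] if k > 0 else h_cur
        even.append([p - B_COEF * (x + y)
                     for p, x, y in zip(l_band, h_cur, h_prev)])
    odd = []
    for k, h_band in enumerate(high):
        right = even[k + 1] if k + 1 < len(even) else even[k]
        odd.append([p + A_COEF * (x + y)
                    for p, x, y in zip(h_band, even[k], right)])
    views = []
    for k in range(len(even)):
        views.append(even[k])
        if k < len(odd):
            views.append(odd[k])
    return views


# Five toy "views" of three pixels each.
views = [[10, 20, 30], [12, 22, 29], [14, 21, 31], [13, 25, 33], [15, 24, 35]]
low, high = analysis(views)
restored = synthesis(low, high)
```

With five views the sketch produces three L subbands and two H subbands, and the synthesis side recovers the input because each lifting step is undone with the same operands.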

Suppose that $A_k$ and $B_k$ are the $k$-th views of the sets of even and odd views, respectively. At the synthesis side, the inverse update and predict steps recover views $A_k$ and $B_k$ from the subbands. For disparity compensation with 5/3 lifting, they can be written as

$$A_k = L_k - b\left[ DC\left(H_k, -\hat{d}_{B_k \to A_k}\right) + DC\left(H_{k-1}, -\hat{d}_{B_{k-1} \to A_k}\right) \right], \tag{1}$$

and

$$B_k = H_k + a\left[ DC\left(A_k, \hat{d}_{B_k \to A_k}\right) + DC\left(A_{k+1}, \hat{d}_{B_k \to A_{k+1}}\right) \right], \tag{2}$$

where $\hat{d}_{n \to m}$ is the disparity vector obtained by using frame $m$ as a reference to compensate frame $n$, and $DC(X, \hat{d}_{n \to m})$ is the disparity compensation function, which compensates $X$ with the disparity vector $\hat{d}_{n \to m}$. $a$ and $b$ are the scaling factors used in the predict and update steps of wavelet lifting. $L_k$ and $H_k$ are the low-pass and high-pass subbands of view $k$. The distortion in the reconstructed views $A_k$ and $B_k$ can be formulated as

$$D_{A_k} = D_{L_k} - b\left(D_{H_k} + D_{H_{k-1}}\right), \tag{3}$$

and

$$\begin{aligned}
D_{B_k} &= D_{H_k} + a\left(D_{A_k} + D_{A_{k+1}}\right) \\
&= D_{H_k} + aD_{L_k} - ab\left(D_{H_k} + D_{H_{k-1}}\right) + aD_{L_{k+1}} - ab\left(D_{H_k} + D_{H_{k+1}}\right) \\
&= (1 - 2ab)D_{H_k} + aD_{L_k} - abD_{H_{k-1}} + aD_{L_{k+1}} - abD_{H_{k+1}}.
\end{aligned} \tag{4}$$

Assume that every pixel in view $B_k$ can be predicted from two adjacent reference views (views $A_k$ and $A_{k+1}$). Let $f$ and $r$ be the ratios of connected pixels in the forward and backward reference views, with $0 \le f \le 1$ and $0 \le r \le 1$. If the distortions of the image views are equally distributed, so that $D_{H_k} = D_{H_{k-1}} = D_{H_{k+1}}$ and $D_{L_k} = D_{L_{k-1}} = D_{L_{k+1}}$, then from (3) and (4) the distortion in the reconstructed images of view $A_k$ and view $B_k$ can be formulated as

$$D_{A_k} = g(f, r)\left(D_{L_k} - 2bD_{H_k}\right), \tag{5}$$

and

$$D_{B_k} = h(f, r)\left[(1 - 4ab)D_{H_k} + 2aD_{L_k}\right], \tag{6}$$

where $g(f, r)$ and $h(f, r)$ are functions of the connected-pixel ratios $f$ and $r$. In this paper, we assume that occluded and unconnected pixels are equal (parallel views). As (5) and (6) show, the distortion in both reconstructed views depends on the distortion in the low-pass (L) and high-pass (H) subbands as well as on the ratio of connected pixels in adjacent views. It can be simplified as a weighted sum of the errors of the low-pass and high-pass subbands. Note that the distortion of the low-pass and high-pass subbands can be caused by the quantization process or by the truncation of wavelet coefficients in each subband.

3. MODEL-BASED OPTIMAL SUBBAND BIT ALLOCATION FOR MULTI-VIEW CODING

3.1. Rate-Distortion modeling

An accurate rate-distortion model plays an important role in multimedia compression and transmission because it is computationally efficient and has low complexity. At high bit rates, the exponential model matches the rate-distortion characteristic well [6]. If we model the source as Laplacian and define the distortion as $D_{e,l}(x, x_0) = |x - x_0|$, where $x$ is the reconstructed multi-view image and $x_0$ is the original multi-view image, the rate-distortion function can be written as [7]

$$R(D_{e,l}) = \ln\frac{\sigma}{D_{e,l}}; \quad 0 < D_{e,l} < \sigma, \tag{7}$$

where $\sigma$ is the standard deviation of the source, $D_{e,l}$ is the distortion from the exponential model when the source is modeled as Laplacian, and $R$ is the bit rate. When we model the source as Gaussian and define the distortion as $D_{e,g}(x, x_0) = (x - x_0)^2$, its rate-distortion function can be formulated as [7]

$$R(D_{e,g}) = \frac{1}{2}\log\frac{\sigma^2}{D_{e,g}}; \quad 0 < D_{e,g} < \sigma^2, \tag{8}$$

where $\sigma^2$ is the source variance. Laplacian and Gaussian source models are widely used for source modeling because they are mathematically tractable [8]. We can write the general form of the exponential model for both Laplacian and Gaussian sources as

$$D_e(R) = \alpha e^{-\beta R}, \tag{9}$$

where $D_e(R)$ is the general form of the exponential model and $\alpha$ and $\beta$ are constants that depend on the source type. At low bit rates, the power model represents the rate-distortion function with high accuracy [6]. This model can be used for both Gaussian and Laplacian sources. The general form of the power model can be written as

$$D_p(R) = \eta R^{-\gamma}, \tag{10}$$

where $\eta$ and $\gamma$ are constants that depend on the source type. However, neither the exponential model nor the power model alone can accurately represent the rate-distortion function over a wide range of bit rates. Therefore, we propose a new rate-distortion model for multi-view image coding with wavelet lifting. It exploits the advantages of both the exponential and power models in order to capture the rate-distortion function precisely over the whole range of bit rates. The proposed rate-distortion model can be written as

$$D_t(R) = w_1 D_e(R) + w_2 D_p(R) = w_1 \alpha e^{-\beta R} + w_2 \eta R^{-\gamma}, \tag{11}$$

where $D_e(R)$ is the exponential distortion component, $D_p(R)$ is the power distortion component, $w_1$ and $w_2$ are the weights of the exponential and power components, with $0 \le w_1 \le 1$, $0 \le w_2 \le 1$, and $w_1 + w_2 \le 1$, and $\alpha$, $\beta$, $\eta$, and $\gamma$ are the parameters characterizing the proposed distortion model. In this paper, we use $w_1 = w_2 = 0.5$, and the parameters $\alpha$, $\beta$, $\eta$, and $\gamma$ are obtained using the least squares method. The proposed model requires additional computation since it combines two models. The complexity comes mainly from computing the model parameters, which is $O(n^2)$.

3.2. Model-based optimal subband bit allocation

The optimal bit allocation can be formulated as an optimization problem that aims to minimize the total distortion in the presence of a rate constraint [9]. The total distortion can be expressed as a weighted sum of the distortions of the L and H subbands. We define

$$\sum_{\forall k} D_{A_k} + \sum_{\forall k} D_{B_k} = \sum_{\forall k} \left(\rho_{L_k} D_{L_k}\right) + \sum_{\forall k} \left(\rho_{H_k} D_{H_k}\right), \tag{12}$$

where $\rho_{L_k}$ and $\rho_{H_k}$ are constants that weight the distortion of the L and H subbands, respectively. We use $\rho_{L_k} = \rho_{H_k} = 1$ in our experiments. With the assumption that the distortion is equally distributed [10], the total distortion can be simplified as

$$\sum_{\forall k} D_{A_k} + \sum_{\forall k} D_{B_k} = D_L \sum_{\forall k} \rho_{L_k} + D_H \sum_{\forall k} \rho_{H_k}. \tag{13}$$
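
Before the combined model (11) is used for each subband, its parameters have to be fitted to measured rate-distortion points. The paper only states that a least squares method is used, so the following is one possible sketch under our own assumptions: the nonlinear parameters β and γ are scanned over a caller-supplied grid, and for each pair the linear parameters α and η are obtained in closed form from the 2x2 normal equations. The function names and the synthetic data are hypothetical.

```python
import math

W1 = W2 = 0.5  # weights w1 and w2 as used in the paper


def model(rate, alpha, beta, eta, gamma):
    """Combined distortion D_t(R) of Eq. (11)."""
    return W1 * alpha * math.exp(-beta * rate) + W2 * eta * rate ** (-gamma)


def fit(rates, dists, betas, gammas):
    """Grid over (beta, gamma); closed-form least squares for (alpha, eta).

    Returns (squared_error, alpha, beta, eta, gamma) for the best grid point.
    """
    best = None
    for beta in betas:
        for gamma in gammas:
            # For fixed beta and gamma the model is linear in alpha and eta.
            u = [W1 * math.exp(-beta * r) for r in rates]
            v = [W2 * r ** (-gamma) for r in rates]
            suu = sum(x * x for x in u)
            svv = sum(x * x for x in v)
            suv = sum(x * y for x, y in zip(u, v))
            sud = sum(x * d for x, d in zip(u, dists))
            svd = sum(x * d for x, d in zip(v, dists))
            det = suu * svv - suv * suv
            if abs(det) < 1e-12:
                continue  # basis functions nearly collinear, skip this pair
            alpha = (sud * svv - svd * suv) / det
            eta = (svd * suu - sud * suv) / det
            err = sum((model(r, alpha, beta, eta, gamma) - d) ** 2
                      for r, d in zip(rates, dists))
            if best is None or err < best[0]:
                best = (err, alpha, beta, eta, gamma)
    return best


# Synthetic R-D points generated from known parameters (illustration only).
rates = [0.1 * i for i in range(1, 13)]
dists = [model(r, 4.0, 3.0, 0.6, 0.8) for r in rates]
result = fit(rates, dists, betas=[1.0, 2.0, 3.0, 4.0, 5.0],
             gammas=[0.4, 0.6, 0.8, 1.0])
```

On real data the grid would have to be finer, or be replaced by an iterative nonlinear least squares solver; the sketch only shows that, once β and γ are fixed, the remaining fit is an ordinary linear least squares problem.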

Using (11) to represent $D_L$ and $D_H$, we obtain

$$D_L = w_{1,L}\,\alpha_L e^{-\beta_L R_L} + w_{2,L}\,\eta_L R_L^{-\gamma_L}, \tag{14}$$

and

$$D_H = w_{1,H}\,\alpha_H e^{-\beta_H R_H} + w_{2,H}\,\eta_H R_H^{-\gamma_H}, \tag{15}$$

where $D_L$ and $R_L$ are the distortion and rate of the L subband, and $D_H$ and $R_H$ are the distortion and rate of the H subband. Let $R_{total}$ be the total rate used to code the multi-view images, $R_{hd,DV}$ the number of bits used for coding the disparity vectors and header information, and $R_{texture}$ the number of bits used to code the texture information. Then

$$R_{texture} = R_{total} - R_{hd,DV}. \tag{16}$$

With the definitions of distortion and rate given above, the problem of allocating bits to the L and H subbands can be formulated as follows.

Problem Formulation: Given a bit rate constraint $R_{texture}$ for coding the multi-view images, find the optimal bit allocation to the L and H subbands such that

$$\min \left( D_L \sum_{\forall k} \rho_{L_k} + D_H \sum_{\forall k} \rho_{H_k} \right), \tag{17}$$

under the constraint

$$R_L \sum_{\forall k} b_{L_k} + R_H \sum_{\forall k} b_{H_k} \le R_{texture}, \tag{18}$$

where $b_{L_k}$ and $b_{H_k}$ are constants giving the ratios of the number of bits in the $L_k$ and $H_k$ subbands. To compute the optimal bit rate allocation to the L and H subbands, we set up a Lagrangian cost function

$$J(\lambda) = \left( D_L \sum_{\forall k} \rho_{L_k} + D_H \sum_{\forall k} \rho_{H_k} \right) + \lambda \left( R_{texture} - R_L \sum_{\forall k} b_{L_k} - R_H \sum_{\forall k} b_{H_k} \right). \tag{19}$$

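To obtain the optimal solution, we differentiate the cost function with respect to $R_L$, $R_H$, and $\lambda$ and set the derivatives to zero. The solution for $R_L$ and $R_H$ is obtained by solving the two equations

$$\frac{\sum_{\forall k} \rho_{L_k}}{\sum_{\forall k} b_{L_k}} \times \frac{\partial D_L}{\partial R_L} = \frac{\sum_{\forall k} \rho_{H_k}}{\sum_{\forall k} b_{H_k}} \times \frac{\partial D_H}{\partial R_H}, \tag{20}$$

and

$$R_{texture} = R_L \sum_{\forall k} b_{L_k} + R_H \sum_{\forall k} b_{H_k}. \tag{21}$$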
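
Equations (20) and (21) have no closed-form solution for the combined model, so they are solved numerically. The sketch below is one way to do this and is not taken from the paper: it substitutes (21) into (20) and finds the root in $R_L$ by bisection, which works because both model components are convex in the rate when the parameters are positive. All parameter values, the sums Σρ = 3 and 2, and the ratios Σb = 3/5 and 2/5 (five views with three L and two H subbands of equal size) are illustrative assumptions.

```python
import math


def d_prime(rate, w1, alpha, beta, w2, eta, gamma):
    """Derivative dD/dR of the combined model in Eq. (11)."""
    return (-w1 * alpha * beta * math.exp(-beta * rate)
            - w2 * eta * gamma * rate ** (-gamma - 1.0))


def allocate(r_texture, par_l, par_h, rho_l, rho_h, b_l, b_h, iters=100):
    """Solve Eqs. (20)-(21) for (R_L, R_H) by bisection on R_L.

    rho_l, rho_h: sums of the distortion weights over k.
    b_l, b_h: sums of the bit ratios over k.
    """
    def r_high(r_low):
        return (r_texture - r_low * b_l) / b_h  # Eq. (21) solved for R_H

    def gap(r_low):
        # Left side minus right side of Eq. (20); increasing in r_low.
        return (rho_l / b_l * d_prime(r_low, *par_l)
                - rho_h / b_h * d_prime(r_high(r_low), *par_h))

    lo, hi = 1e-9, r_texture / b_l - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    r_low = 0.5 * (lo + hi)
    return r_low, r_high(r_low)


# Hypothetical model parameters (w1, alpha, beta, w2, eta, gamma).
PAR_L = (0.5, 40.0, 4.0, 0.5, 3.0, 0.9)
PAR_H = (0.5, 4.0, 5.0, 0.5, 0.4, 0.7)
r_l, r_h = allocate(0.35, PAR_L, PAR_H, rho_l=3.0, rho_h=2.0, b_l=0.6, b_h=0.4)
```

For these illustrative parameters the L subbands receive a much higher rate than the H subbands, which is the qualitative behaviour reported in the experiments.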

The resulting optimal bit allocation is passed to SPIHT [4] to encode the multi-view images.

4. EXPERIMENTAL RESULTS

In this section, we present a sequence of experimental results to verify the proposed multi-view image coding method. We use test multi-view images consisting of 5 views of Tsukuba, Teddy, and Race1. Tsukuba and Teddy are available from http://cat.middlebury.edu/stereo/data.html, and Race1 from ftp://ftp.ne.jp/KDDI/multiview. The disparity compensation is performed at the macroblock level with a block size of 16x16 pixels. The residual error is encoded by the SPIHT codec [4].

First, we verify the proposed rate-distortion model. We assume that the wavelet coefficients have a Laplacian distribution [11]. Fig. 2 (top) compares the proposed rate-distortion model, the exponential model, and the power model with the actual rate-distortion curve of the H subband when Tsukuba is used as the test image. The proposed model fits the actual rate-distortion curve more closely than the exponential and power models. Notice that the rate-distortion curves of H0 and H1 are close to each other, which supports the assumption of equally distributed distortion in Section 2. The results for the L subband, shown in Fig. 2 (bottom), follow the same trend as for the H subband. Fig. 3 shows the reconstructed signal of the H and L subbands with optimal bit allocation (target bit rate: 0.5 bpp, H: 0.14 bpp, and L: 0.74 bpp).
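
The operating point quoted for Fig. 3 can be checked against the rate budget of Eq. (21). The short sketch below assumes, beyond what the paper states, that the 5 views yield 3 L subbands and 2 H subbands of equal size, so that the bit ratios sum to 3/5 and 2/5, and that header and disparity-vector bits are ignored. The function name `total_rate` is hypothetical.

```python
# Rate-budget check for the Fig. 3 operating point.
# Assumption: 3 L subbands and 2 H subbands of equal size (5 views),
# header and disparity-vector bits ignored.
N_LOW, N_HIGH = 3, 2


def total_rate(r_low, r_high):
    """Average rate over all subbands, in bits per pixel."""
    return (N_LOW * r_low + N_HIGH * r_high) / (N_LOW + N_HIGH)


target = total_rate(0.74, 0.14)  # L: 0.74 bpp, H: 0.14 bpp
print(round(target, 6))
```

Under these assumptions the two subband rates average to the stated target of 0.5 bpp, up to floating-point rounding.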

[Figure 2: two panels. Top: "Tsukuba: H subbands", curves Actual H0, Actual H1, Proposed, Exp. Model, Power Model. Bottom: "Tsukuba: L subbands", curves Actual L0, Actual L1, Actual L2, Proposed, Exp. Model, Power Model. Axes: Distortion versus Rate (bpp).]

Fig. 2. Rate-distortion curves of Tsukuba in the H (top) and L (bottom) subbands, comparing the proposed model with the exponential model, the power model, and the actual distortion over a wide range of bits per pixel.

Table 1. Comparison of subband bit allocation at a target bit rate of 0.35 bpp. The three row groups correspond to the three test images (Tsukuba, Teddy, and Race1); the group-to-image order is not recoverable.

Method               PSNR (dB)   R_Lk (bpp)   R_Hk (bpp)
Uniform allocation   28.692      0.35         0.35
Exhaustive search    30.26       0.47667      0.16
Proposed             30.258      0.4645       0.17825

Uniform allocation   30.4645     0.35         0.35
Exhaustive search    31.75       0.57         0.02
Proposed             31.713      0.56641      0.025381

Uniform allocation   35.653      0.35         0.35
Exhaustive search    36.48       0.51         0.11
Proposed             36.4566     0.52787      0.083194

[Figure 4: "Comparison of average PSNR of Tsukuba tested image", curves Proposed (G), Proposed (L), Exhaustive search, Uniform. Axes: PSNR (dB) versus Rate (bpp).]

Fig. 4. PSNR comparison of the Tsukuba test image when using different bit allocation methods. G and L stand for Gaussian and Laplacian distributions of the source images.
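
The two baselines in this comparison, uniform allocation and exhaustive search, can be sketched as follows. In the paper the exhaustive search evaluates the actual codec; here, as a stand-in and purely as an assumption, the distortion is taken from the combined model (11) with hypothetical parameters, and the search steps through $R_L$ on a fixed grid while the rate budget determines $R_H$. The subband counts (3 L, 2 H) and unit weights are also assumptions.

```python
import math

N_LOW, N_HIGH = 3, 2  # assumed number of L and H subbands for 5 views


def distortion(rate, alpha, beta, eta, gamma, w1=0.5, w2=0.5):
    """Combined model of Eq. (11), used here in place of the real codec."""
    return w1 * alpha * math.exp(-beta * rate) + w2 * eta * rate ** (-gamma)


def total_distortion(r_low, r_high, par_l, par_h):
    """Eq. (13) with unit weights rho."""
    return (N_LOW * distortion(r_low, *par_l)
            + N_HIGH * distortion(r_high, *par_h))


def uniform(target):
    """Every subband gets the target rate."""
    return target, target


def exhaustive(target, par_l, par_h, step=0.001):
    """Try every R_L on a grid; R_H follows from the rate budget."""
    n = N_LOW + N_HIGH
    r_low_max = target * n / N_LOW
    best = None
    i = 1
    while i * step < r_low_max:
        r_low = i * step
        r_high = (target * n - N_LOW * r_low) / N_HIGH
        if r_high > 0.0:
            d = total_distortion(r_low, r_high, par_l, par_h)
            if best is None or d < best[0]:
                best = (d, r_low, r_high)
        i += 1
    return best


# Hypothetical model parameters (alpha, beta, eta, gamma).
PAR_L = (40.0, 4.0, 3.0, 0.9)
PAR_H = (4.0, 5.0, 0.4, 0.7)
d_uniform = total_distortion(*uniform(0.35), PAR_L, PAR_H)
d_best, r_l_best, r_h_best = exhaustive(0.35, PAR_L, PAR_H)
```

The grid search needs one distortion evaluation per candidate rate, which with a real codec means one encode and decode pass each; this is the cost that the model-based allocation avoids.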

Fig. 3. The reconstructed signal of the H (left) and L (right) subbands. The H and L subbands are encoded at bit rates of 0.14 bpp and 0.74 bpp, respectively.

Table 1 compares uniform bit allocation to the L and H subbands, exhaustive search for the optimal bit allocation to the L and H subbands, and the proposed method, which uses the model described in the previous sections. The results show that the bit rate allocation obtained with the proposed model is close to that obtained by exhaustive search. Moreover, the proposed method provides better PSNR than uniform bit allocation. A PSNR comparison among the different bit allocation methods is provided in Fig. 4 for the Tsukuba test image. Over a wide range of target bit rates, the proposed bit allocation gives results close to those of the exhaustive search method. It also clearly outperforms uniform bit allocation.

5. CONCLUSION

This paper presented an optimal bit allocation framework based on a new rate-distortion model for multi-view image coding with the disparity-compensated wavelet lifting technique. A new rate-distortion model combining the exponential and power models was proposed. Using the derived distortion and the proposed rate-distortion model, the optimal bit allocation method was described. Experimental results showed that the proposed method provides bit allocations and PSNR close to those of multi-view image coding using exhaustive search. It also outperforms uniform bit allocation in PSNR over a wide range of target bit rates.

6. REFERENCES

[1] X. Tong and R. M. Gray, “Coding of multi-view images for immersive viewing,” IEEE Proc. ICASSP’00, vol. 4, pp. 1879-1882, Jun. 2000.
[2] M. Magnor, P. Ramanathan, and B. Girod, “Multi-view coding for image-based rendering using 3-D scene geometry,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 11, pp. 1092-1106, Nov. 2003.
[3] A. Secker and D. Taubman, “Motion-compensated highly scalable video compression using an adaptive 3D wavelet transform based on lifting,” IEEE Proc. ICIP’01, vol. 2, pp. 1029-1032, Oct. 2001.
[4] A. Said and W. A. Pearlman, “A new, fast, and efficient image codec based on set partitioning in hierarchical trees,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 3, pp. 243-250, June 1996.
[5] W. Sweldens, “The lifting scheme: A construction of second generation wavelets,” SIAM Journal on Mathematical Analysis, vol. 29, no. 2, pp. 511-546, Mar. 1998.
[6] S. Mallat and F. Falzon, “Analysis of low bit rate image transform coding,” IEEE Trans. Signal Processing, vol. 46, pp. 1027-1042, 1998.
[7] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, USA, 1991.
[8] H. M. Hang and J. J. Chen, “Source model for transform video coder and its application: Fundamental theory,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 2, pp. 287-298, Apr. 1997.
[9] Y. Shoham and A. Gersho, “Efficient bit allocation for an arbitrary set of quantizers,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 36, no. 9, pp. 1445-1453, Sep. 1988.
[10] T. Rusert, K. Hanke, and J. Ohm, “Transition filtering and optimization quantization in interframe wavelet video coding,” VCIP, Proc. SPIE, vol. 5150, pp. 682-693, 2003.
[11] F. Bellifemine, A. Capellino, A. Chimienti, R. Picco, and R. Ponti, “Statistical analysis of the 2-D coefficients of the differential signal for images,” Signal Processing, Image Commun., vol. 4, pp. 477-488, 1992.
