RATE-DISTORTION OPTIMIZED VIDEO CODING CONSIDERING FRAMESKIP

Anthony Vetro†, Yao Wang‡ and Huifang Sun†

† MERL - Mitsubishi Electric Research Laboratories, Murray Hill, NJ
‡ Polytechnic University, Department of Electrical Engineering, Brooklyn, NY

ABSTRACT

The general problem of optimized video encoding has received a great deal of attention in recent years. This paper focuses on the optimization of video coding with frameskip. We propose models that estimate the distortion for coded frames as well as non-coded frames. Using these models in conjunction with well-known models that estimate the rate allows us to formulate a rate control problem that trades off spatial and temporal quality. Simulation results indicate moderate improvements for low motion test sequences.

1. INTRODUCTION

The general problem that we consider in this paper is the optimal rate-distortion (R-D) encoding of video sequences. The specific problem that we are interested in can be stated as follows. Given a particular video sequence, should the encoder choose to code more frames with lower spatial quality or fewer frames with higher spatial quality? It should be noted that this trade-off is not a simple binary decision, but rather a decision over a finite set of coding parameters. Obviously, the best set of coding parameters will yield the optimal R-D curve. The two parameters of interest are the number of frames per second (fps) and the quantization parameter (QP). It should be emphasized that the overall distortion of the coded sequence, which is usually measured in PSNR, must also include the distortion of the frames that have been skipped. This is a point that is often overlooked in papers that report data with skipped frames. In the context of this problem, it is necessary to consider the distortion in this way.

The majority of the literature on R-D optimization does not address the temporal aspect [1]-[3]. By and large, it is assumed that the frame rate is fixed. These papers consider optimizations on the QP [1], mode decisions for motion and block coding [2] and frame-type selection [3]. Such algorithms and techniques can be applied to achieve the optimum when the frame rate is fixed and the bit rate can be met with the chosen frame rate. However, the optimum across varying frame rates has not yet been considered and is the main focus of this paper. It should be noted that this trade-off between spatial and temporal quality has been studied to some extent by Martins et al. [4]; however, the trade-off between spatial and temporal quality was achieved with a user-selectable parameter.

The rest of this paper is organized as follows. The next section discusses the models that we use to estimate rate and distortion for coded and non-coded frames. In section 3, the proposed rate control algorithm that considers trade-offs in the spatial and temporal quality is presented. Section 4 provides our simulation results, and in section 5, we provide concluding remarks and directions for future work.

2. R-D MODELS

The purpose of this section is to extend the current notions of rate and distortion to include arbitrary frameskip. The objective here is to supply the rate control algorithm with a generalized quantification of the rate and distortion that includes frameskip.

2.1. Modeling the Rate

According to [5, 6], a quadratic rate-quantizer (R-Q) relationship for a single frame at $t = t_k$ is given by,

$$R(t_k) = S_k \left( \frac{X_{1,k}}{Q_k} + \frac{X_{2,k}}{Q_k^2} \right) \tag{1}$$

where $S_k$ is the encoding complexity, often substituted by the sum or mean of absolute differences of the residual component, $Q_k$ denotes the quantization parameter and $X_{i,k}$ denotes the model parameters that are fitted to the data. Of course, the framework proposed in this paper is not limited to the above model; other models such as the one proposed in [7] may also be used. In any case, given the R-Q relationship for a single frame, the average bit-rate over time, $\bar{R}$, can be expressed as,

$$\bar{R} = \sum_{k=i}^{i+F} R(t_k) = F \cdot \bar{R}(t_k) \tag{2}$$

where $F$ is the average frame rate and $\bar{R}(t_k)$ is the average bit-rate per frame. The parameter that will tie the rate and distortion together is the frameskip parameter, $f_s$, which is defined as,

$$f_s = \frac{F_{src}}{F} \tag{3}$$

where $F_{src}$ is the source frame rate. To be clear, $f_s$ is a parameter that will be used to quantify the distortion due to frame skipping. In turn, this affects the value of $F$ and ultimately ties back to the average bit-rate, $\bar{R}$.

2.2. Modeling the Distortion

Considering now a general formulation for distortion that accounts for skipped frames, we denote the average distortion for coded frames by $\bar{D}_c(Q)$ and the average distortion for skipped frames by $\bar{D}_s(Q, f_s)$, where $Q$ represents the quantization parameter used for coding and $f_s$ represents the amount of frameskip. The coded distortion is dependent on the quantizer, while the temporal distortion depends on both the quantizer and the amount of frameskip. Although the frameskip factor does not directly influence the coded distortion of a particular frame, it does impact this part of the distortion indirectly. For one, the amount of frameskip will influence the residual component, and secondly, it will also have an impact on the quantizer that is chosen. It is important to note that the distortion for skipped frames has a direct dependency on the quantization step size in the coded frames. The reason is that the skipped frames are interpolated from the coded frames, thereby carrying the same spatial quality, in addition to the temporal distortion.

Given the above, we consider the average distortion over the specific time interval $(t_i, t_{i+f_s}]$, which is given by,

$$\bar{D}_{(t_i, t_{i+f_s}]}(Q_{i+f_s}, f_s) = \frac{1}{f_s} \left[ D_c(Q_{i+f_s}) + \sum_{k=i+1}^{i+f_s-1} D_s(Q_i, k) \right] \tag{4}$$

In the above, the distortion over the specified time interval is due to the (spatial) distortion of 1 coded frame at $t = t_{i+f_s}$ plus the (temporal) distortion of $f_s - 1$ skipped frames, which is dependent on the quantizer for the previously coded frame at $t = t_i$.

Coded Frame Distortion: From classic rate-distortion modeling [8], it is well-known that the variance of the quantization error is given by,

$$\sigma_q^2 = a \cdot 2^{-2R} \cdot \sigma_z^2 \tag{5}$$

where $\sigma_z^2$ is the input signal variance, $R$ is the average rate per sample and $a$ is a constant that is dependent on the pdf of the input signal and quantizer characteristics. In the absence of entropy coding, the value of $a$ typically varies between 1.0 and 10, but may take on values less than 1 with entropy coding. We use the above equation to model the spatial distortion for coded frame $i$,

$$D_c(Q_i) = a \cdot 2^{-2R(t_i)} \cdot \sigma_{z_i}^2 \tag{6}$$
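As a rough numerical illustration of eqns. (1), (3) and (6), the following Python sketch evaluates the rate model, inverts it for the quantizer given a bit target, and evaluates the coded-frame distortion. The function names, the sample values and the clamp of the quantizer to the range 1 to 31 are assumptions made for this example; the rate in eqn. (6) is assumed to be expressed per sample, as in eqn. (5).

```python
import math


def frame_bits(S, Q, X1, X2):
    """Bits for one frame from the quadratic R-Q model of eqn. (1)."""
    return S * (X1 / Q + X2 / Q ** 2)


def quantizer_for_target(S, target, X1, X2, q_min=1.0, q_max=31.0):
    """Solve eqn. (1) for Q given a target number of bits.

    The clamp to [q_min, q_max] is an assumption of this sketch.
    """
    c = target / S                      # eqn. (1) becomes c*Q^2 - X1*Q - X2 = 0
    disc = X1 ** 2 + 4.0 * c * X2
    if X2 == 0.0 or disc < 0.0:
        Q = X1 / c                      # fall back to the first-order term only
    else:
        Q = (X1 + math.sqrt(disc)) / (2.0 * c)
    return min(max(Q, q_min), q_max)


def frameskip(F_src, F):
    """Frameskip parameter of eqn. (3)."""
    return F_src / F


def coded_distortion(a, rate_per_sample, var_z):
    """Spatial distortion of a coded frame, eqn. (6)."""
    return a * 2.0 ** (-2.0 * rate_per_sample) * var_z


# Made-up model parameters, for illustration only.
bits = frame_bits(S=1000.0, Q=10.0, X1=2.0, X2=30.0)
Q = quantizer_for_target(S=1000.0, target=bits, X1=2.0, X2=30.0)
```

Inverting the model at the bit count it just produced returns the original quantizer, which is a quick consistency check on the two functions.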

The model in eqn. (6) is valid for a wide array of quantizers and signal characteristics. Such aspects are accounted for in the value of $a$. However, as stated earlier, the amount of frameskip can impact the statistics of the residual. In our experiments, we have found that the average bits/frame increases for larger values of $f_s$, while the variance remains almost the same. This indicates that the variance is not capable of reflecting small differences in the residual that impact the actual relation between rate and distortion. This is caused by the design of the particular coding scheme and is believed to happen because of the presence of high-frequency coefficients. In fact, we believe that it is not only their presence, but also the position of such coefficients. If certain run-lengths are not present in the VLC table, less efficient escape coding techniques must be used. This probably means that $f_s$ affects the pdf of the residual, i.e., the value of $a$, while not changing $\sigma_{z_i}^2$ much. Currently, we ignore any changes in the residual due to frameskip and simply use the model given by eqn. (6) to model the spatial distortion. A fixed $a$ is used, together with $\sigma_{z_i}^2$ computed from the last coded frame.

Non-Coded Frame Distortion: To model the temporal distortion due to skipped frames, we assume, without loss of generality, that the temporal interpolator simply repeats the previously coded frame. Other interpolators that average past and future reference frames, or make predictions based on motion, can still be considered in this framework, but the following derivation may no longer hold. As alluded to earlier, the distortion due to skipped frames can be broken into two parts: one due to the coding of the reference and another due to the interpolation error. In [9], we have proposed the following expression for the temporal distortion,

$$D_s(Q_i, k) = D_c(Q_i) + E\{\Delta^2 z_{i,k}\} \tag{7}$$

where $E\{\Delta^2 z_{i,k}\}$ denotes the expected interpolation error and was derived based on the principle of optical flow,

$$E\{\Delta^2 z_{i,k}\} = \sigma_{x_i}^2 \sigma_{\Delta x_{i,k}}^2 + \sigma_{y_i}^2 \sigma_{\Delta y_{i,k}}^2 \tag{8}$$

In the above, $\sigma_{x_i}^2$ and $\sigma_{y_i}^2$ represent the variances for the x and y spatial gradients in frame $i$, and $\sigma_{\Delta x_{i,k}}^2$ and $\sigma_{\Delta y_{i,k}}^2$ represent the variances for the motion vectors in the x and y direction. The above shows that it is sufficient to model the interpolation error based on the second-order statistics of the motion and spatial gradient. In [9], this model was shown to be accurate for low to moderate motion sequences, which is sufficient since an optimized coder would not need such an accurate model when the motion is high.
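The distortion model of eqns. (4), (7) and (8) can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function names are hypothetical, and the gradient and motion variances are taken as given rather than measured from a sequence.

```python
def interpolation_error(var_gx, var_gy, var_mx, var_my):
    """Expected interpolation error of eqn. (8).

    var_gx, var_gy: variances of the x and y spatial gradients in frame i.
    var_mx, var_my: variances of the x and y motion vectors between i and k.
    """
    return var_gx * var_mx + var_gy * var_my


def skipped_distortion(d_coded_ref, interp_err):
    """Temporal distortion of one skipped frame, eqn. (7)."""
    return d_coded_ref + interp_err


def average_distortion(d_coded_new, d_skipped):
    """Average distortion over the interval (t_i, t_{i+fs}], eqn. (4).

    d_coded_new: D_c of the frame coded at t_{i+fs}.
    d_skipped:   list of D_s(Q_i, k) for the fs - 1 skipped frames.
    """
    fs = len(d_skipped) + 1
    return (d_coded_new + sum(d_skipped)) / fs


# Made-up numbers: one skipped frame (fs = 2) versus no skipping (fs = 1).
err = interpolation_error(var_gx=4.0, var_gy=9.0, var_mx=0.5, var_my=0.25)
d_skip = skipped_distortion(d_coded_ref=10.0, interp_err=err)
d_fs2 = average_distortion(d_coded_new=8.0, d_skipped=[d_skip])
d_fs1 = average_distortion(d_coded_new=12.0, d_skipped=[])
```

With these numbers, skipping one frame gives the lower average distortion only because the coded frame is assumed to gain enough spatial quality (8.0 instead of 12.0) to offset the interpolation error.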

3. RATE CONTROL

3.1. Algorithm Overview

In the previous section, we provided a set of formulas that allow us to estimate the rate for coded frames and the average distortion over a given time interval that accounts for coded and skipped frames. To make use of these formulas, we now consider a rate control algorithm that minimizes the average distortion subject to constraints on the overall bit-rate and buffer occupancy. Stated formally,

$$\arg\min_{[Q_{i+f_s}, f_s]} \bar{D}_{(t_i, t_{i+f_s}]}(Q_{i+f_s}, f_s) \quad \text{s.t.} \quad \begin{cases} \bar{R} \le R \\ B_i + R(t_{i+f_s}) < B_{max} \\ B_i + R(t_{i+f_s}) - f_s \cdot R_{drain} > 0 \end{cases} \tag{9}$$

where $R$ is the target bit-rate, $B_{max}$ is the maximum buffer size in bits, $B_i$ is the current buffer level, also in bits, and $R_{drain}$ is the rate at which the buffer drains per frame.

To solve the above, the following algorithm is proposed. We begin encoding the video sequence by setting $f_s = 1$, which assumes that the full frame-rate is initially used. Let $f_l$ denote the amount of frameskip between the last coded frame and its reference. Then, iterations at each coding instant are as follows:

1. Set $f_s = \max\{1, f_l - \delta\}$, $D_{min} = \infty$.
2. Calculate the target number of bits for the frame, which is mainly dependent on the current value of $f_s$ and $B_i$.
3. Determine the quantizer value, $Q_{i+f_s}$, using eqn. (1).
4. Estimate the distortion using eqn. (4).
5. Check if the quantizer and the rate that it is expected to produce still satisfy the rate and buffer constraints. If not, skip to the last step since the current value of $f_s$ is no longer valid. Otherwise, continue.
6. If the current distortion is less than $D_{min}$, then replace $D_{min}$ with the current distortion and record the encoding parameters.
7. Repeat from step 2 with $f_s = f_s + 1$ while the new $f_s \le \min\{f_l + \delta, f_{max}\}$.

The above steps provide an overview of how the algorithm determines its encoding parameters, $f_s$ and $Q_{i+f_s}$, at a given coding time instant. It should be noted that the parameter $\delta$ is used to limit the change in $f_s$ from one coded frame to another, similar to the usual bounding of the quantization parameter. In the following, further details on the target bit calculation and buffer control are given.

Given a candidate value of $f_s$, the target bit rate for the frame is dependent on this value of $f_s$ and the buffer occupancy, $B_i$. As in [6], an initial target, $T_1$, is determined according to the number of remaining bits, the number of frames left in the segment and the bits spent during the last frame. The only difference with this initial estimate is that the remaining number of frames is divided by the candidate $f_s$. In this way, a proportionately higher number of bits will be assigned when the frameskip is higher. After the initial target has been determined, it is scaled according to,

$$T_2 = T_1 \cdot \frac{\tilde{B} + 2(B_{max} - \tilde{B})}{2\tilde{B} + (B_{max} - \tilde{B})} \tag{10}$$

where the modified buffer fullness, $\tilde{B}$, accounts for the current value of frameskip and is expressed as,

$$\tilde{B} = B_i - (f_s - 1) \cdot R_{drain} \tag{11}$$

This modification to the traditional buffer fullness must be made to simulate the lower occupancy level as a result of frame skipping. Otherwise, the scaling operation in (10) would force the target too low. If the target is too low for higher $f_s$ values, the resulting quantizer would not be able to differentiate itself from the quantizers that were computed at lower $f_s$ values. In this case, it would be difficult for the trade-off between coded and temporal distortion in eqn. (4) to ever be in favor of skipping frames.

3.2. Practical Considerations

The main practical aspect to consider is how the equations for the distortion of non-coded frames are evaluated based on current and past data. For instance, in its current form, eqn. (8) assumes that the motion between $i$, the current time instant, and $k$, a future time instant, is known. However, this would imply that motion estimation is performed for each candidate frame, $k$. Since such computations are not practical, it is reasonable to assume linear motion between frames and approximate the variance of motion vectors by,

$$\sigma_{\Delta x_{i,k}}^2 \approx \sigma_{\Delta x_{i-f_l,i}}^2 \left( \frac{k - i}{f_l} \right)^2 \tag{12}$$
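A small Python sketch of the extrapolation in eqn. (12) is given below. The function name and the sample values are assumptions for illustration; the measured variance is the one between the last coded frame and its reference, which are $f_l$ frames apart.

```python
def predicted_motion_variance(var_last, k, i, f_l):
    """Approximate the motion vector variance between frames i and k, eqn. (12).

    var_last: measured variance between frame i - f_l and frame i.
    Linear motion scales the vectors by (k - i) / f_l, hence the square.
    """
    return var_last * ((k - i) / f_l) ** 2


# Made-up example: current frame i = 10, last frameskip f_l = 2.
predicted = [predicted_motion_variance(0.5, k, 10, 2) for k in (11, 12, 14)]
```

The predicted variance equals the measured one when the candidate distance matches $f_l$ (here k = 12) and grows quadratically with the distance beyond that.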

Similarly, estimates of the distortion for the next frame to be coded (i.e., calculation of eqn. (6)) require knowledge of $a$ and $\sigma_{z_i}^2$, which depend on $f_s$. As mentioned earlier, motion estimation for every candidate frame is not performed; therefore, the actual residuals are not available either. To overcome this practical difficulty, the residual for future frames may also be predicted based on the residual of the current frame at $t = t_i$. However, as discussed earlier, the relationship between $a$, $\sigma_{z_i}^2$ and frameskip is not as obvious as the relation between motion and frameskip. Also, we have observed that changes in the variance for different frameskip are very small. Therefore, we use the residual variance of the current frame at $t = t_i$ for the candidate frames as well. In this way, changes in $D_c$ are only affected by the bit budget for candidate frameskip factors.
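The search described in section 3.1, combined with the approximations of this section, can be sketched as follows. This is a simplified illustration, not the implementation used for the experiments. The `CoderState` container, the function names and the quantizer range 1 to 31 are hypothetical. The initial target $T_1$ is reduced to the remaining bits divided by the remaining frames over $f_s$, which ignores the bits spent on the last frame. The modified buffer fullness is clamped at zero to keep eqn. (10) well defined, and the average rate constraint of eqn. (9) is assumed to be enforced through the target.

```python
import math
from dataclasses import dataclass


@dataclass
class CoderState:
    """Hypothetical container for the statistics used by the search."""
    S: float          # encoding complexity of the current residual
    X1: float         # first-order R-Q model parameter
    X2: float         # second-order R-Q model parameter
    a: float          # distortion model constant of eqn. (6)
    var_z: float      # residual variance of the current frame
    n_samples: int    # samples per frame, to express the rate per sample
    d_ref: float      # D_c(Q_i) of the last coded frame
    var_gx: float     # variance of the x spatial gradient in frame i
    var_gy: float     # variance of the y spatial gradient in frame i
    var_mx: float     # x motion variance between frames i - f_l and i
    var_my: float     # y motion variance between frames i - f_l and i
    B_i: float        # current buffer level in bits
    B_max: float      # maximum buffer size in bits
    R_drain: float    # bits drained from the buffer per frame
    bits_left: float  # remaining bits in the segment
    frames_left: int  # remaining frames in the segment


def frame_bits(st, Q):
    """Eqn. (1)."""
    return st.S * (st.X1 / Q + st.X2 / Q ** 2)


def quantizer_for_bits(st, target, q_min=1.0, q_max=31.0):
    """Invert eqn. (1) for Q; the clamp range is an assumption."""
    c = target / st.S
    disc = st.X1 ** 2 + 4.0 * c * st.X2
    if st.X2 == 0.0 or disc < 0.0:
        Q = st.X1 / c
    else:
        Q = (st.X1 + math.sqrt(disc)) / (2.0 * c)
    return min(max(Q, q_min), q_max)


def target_bits(st, fs):
    """Simplified initial target, scaled by eqns. (10) and (11)."""
    T1 = st.bits_left * fs / st.frames_left
    B_mod = max(0.0, st.B_i - (fs - 1) * st.R_drain)       # eqn. (11), clamped
    return T1 * (B_mod + 2.0 * (st.B_max - B_mod)) / (2.0 * B_mod + (st.B_max - B_mod))


def interval_distortion(st, bits, fs, f_l):
    """Eqn. (4), using eqns. (6), (7), (8) and (12)."""
    d_coded = st.a * 2.0 ** (-2.0 * bits / st.n_samples) * st.var_z
    d_skipped = 0.0
    for step in range(1, fs):                              # k - i = 1 .. fs - 1
        scale = (step / f_l) ** 2                          # eqn. (12)
        d_skipped += st.d_ref + scale * (st.var_gx * st.var_mx + st.var_gy * st.var_my)
    return (d_coded + d_skipped) / fs


def choose_parameters(st, f_l, delta, f_max):
    """Return ((fs, Q), D_min) following steps 1 to 7 of section 3.1."""
    best, d_min = None, math.inf
    fs = max(1, f_l - delta)                               # step 1
    while fs <= min(f_l + delta, f_max):                   # bound of step 7
        Q = quantizer_for_bits(st, target_bits(st, fs))    # steps 2 and 3
        level = st.B_i + frame_bits(st, Q)
        # Step 5 is checked before step 4 here; the outcome is the same.
        if level < st.B_max and level - fs * st.R_drain > 0:
            d = interval_distortion(st, level - st.B_i, fs, f_l)   # step 4
            if d < d_min:                                  # step 6
                best, d_min = (fs, Q), d
        fs += 1                                            # step 7
    return best, d_min
```

With made-up statistics, a static scene (zero motion variance) makes the search prefer skipping, while a scene with large motion variance keeps the full frame rate, which matches the behavior discussed for low and high motion sequences.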

4. SIMULATION RESULTS

To evaluate the performance of the proposed coding method, we consider the Akiyo test sequence. This sequence is encoded at a number of constant bit rates using the standard MPEG-4 rate control algorithm that is implemented as part of the reference software [10]. The bit rates that we consider range from 32 kbps to 256 kbps, and the sequences are encoded at a full frame rate of 30 fps. A comparison of R-D curves is shown in Fig. 1. It is clear from this plot that the proposed method outperforms the reference method. At the lowest bit rates, the difference is almost 1 dB, while at higher bit rates, an improvement of 0.4 dB can still be observed. In the low bit rate simulations, the reference method is forced to skip frames due to buffer constraints, whereas the proposed method skips frames based on the minimum distortion criterion and rate constraints described in the previous sections.

Figure 1: Comparison of R-D curves for Akiyo (PSNR in dB versus bit rate in kbps, for the Reference and Proposed methods).

For sequences with moderate to high motion, similar results between the reference and proposed method can be expected. In these cases, the temporal distortion would always be high, causing the overall distortion for $f_s \neq 1$ to be greater than the overall distortion for $f_s = 1$. To achieve gains in these sequences, especially sequences with motion that is localized, e.g., the news sequence, one should consider an object-based framework. In such a framework, objects can be coded with different temporal resolutions, where the temporal resolution for each object is determined in a manner similar to the methods proposed in this paper. The major challenge is to devise a rate control scheme that maintains a stable buffer and is capable of performing the bit allocation on objects with varying temporal resolution.

5. CONCLUSIONS

In summary, we have proposed a rate control scheme that minimizes the distortion for video coding with frameskip. The method relies on new models for the distortion of coded and non-coded frames, as well as well-known models that define the relationship between rate and quantizer. Simulation results show that noticeable improvements for low motion sequences can be achieved. Finally, we believe that these methods could be better exploited in an object-based coding scheme that codes objects with different temporal rates. This is a promising direction for future work.

6. REFERENCES

[1] H. Sun, W. Kwok, M. Chien, and C.H. John Ju, "MPEG coding performance improvement by jointly optimizing coding mode decision and rate control," IEEE Trans. Circuits Syst. Video Technol., June 1997.
[2] T. Wiegand, M. Lightstone, D. Mukherjee, T.G. Campbell, and S.K. Mitra, "R-D optimized mode selection for very low bit-rate video coding and the emerging H.263 standard," IEEE Trans. Circuits Syst. Video Technol., Apr. 1996.
[3] J. Lee and B.W. Dickinson, "Rate-distortion optimized frame type selection for MPEG encoding," IEEE Trans. Circuits Syst. Video Technol., June 1997.
[4] F.C. Martins, W. Ding, and E. Feig, "Joint control of spatial quantization and temporal sampling for very low bit rate video," Proc. ICASSP, May 1996.
[5] T. Chiang and Y-Q. Zhang, "A new rate control scheme using quadratic rate-distortion modeling," IEEE Trans. Circuits Syst. Video Technol., Feb. 1997.
[6] A. Vetro, H. Sun, and Y. Wang, "MPEG-4 rate control for multiple video objects," IEEE Trans. Circuits Syst. Video Technol., Feb. 1999.
[7] H.M. Hang and J.J. Chen, "Source model for transform video coder and its application - Part I: Fundamental theory," IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 2, pp. 287-298, April 1997.
[8] N.S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice Hall, 1984.
[9] A. Vetro, Y. Wang, and H. Sun, "Estimating distortion of coded and non-coded frames for frameskip-optimized video coding," submitted to Int'l Conf. on Multimedia and Expo, Jan. 2001.
[10] ISO/IEC 14496-5:2000, "Information technology - coding of audio/visual objects," Part 5: Reference Software.
