
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 8, AUGUST 2005

Frame Bit Allocation for the H.264/AVC Video Coder Via Cauchy-Density-Based Rate and Distortion Models

Nejat Kamaci, Student Member, IEEE, Yucel Altunbasak, Senior Member, IEEE, and Russell M. Mersereau, Fellow, IEEE

Abstract—Based on the observation that a Cauchy density is more accurate in estimating the distribution of the ac coefficients than the traditional Laplacian density, rate and distortion models with improved accuracy are developed. The entropy and distortion models for quantized discrete cosine transform coefficients are justified in a frame bit-allocation application for H.264. Extensive analysis with carefully selected anchor video sequences demonstrates a 0.24-dB average peak signal-to-noise ratio (PSNR) improvement over the JM 8.4 rate control algorithm, and a 0.33-dB average PSNR improvement over the TM5-based bit-allocation algorithm that has recently been proposed for H.264 by Li et al. The analysis also demonstrates 20% and 60% reductions in PSNR variation among the encoded pictures when compared to the JM 8.4 rate control algorithm and the TM5-based bit-allocation algorithm, respectively.

Index Terms—Advanced video coding, bit allocation, H.264, rate control, rate and distortion modeling.

I. INTRODUCTION

Over the past few decades, transform-based compression for image and video sources has gained widespread popularity for visual information management, processing, and communications. As a result, several industry standards have been developed, such as JPEG [1] for still image coding, and MPEG [2]–[4] and H.26x [5], [6] for video coding. In all of these image and video coding methods, each image frame is divided into nonoverlapping blocks, and a transformation is applied to each block before quantization and entropy coding. The most common transform used in these methods is the two-dimensional discrete cosine transform (DCT). Knowledge of the DCT coefficients' probability distribution is important in the design and optimization of the quantizer, the entropy coder, and related video processing algorithms. It is particularly important in rate control for video coding, since the problems of bit allocation and quantization scale selection require knowledge of the rate-distortion relation as a function of the encoder parameters and the video source statistics. In the earliest studies, the ac coefficients of the DCT were conjectured to have Gaussian distributions [15], [16]. Later, several other distribution models were reported, including generalized Gaussian and Laplacian distributions [17]–[22].

Manuscript received July 22, 2003; revised March 5, 2004. This paper was recommended by Associate Editor H. Sun. The authors are with the Center for Signal and Image Processing, Georgia Institute of Technology, Atlanta, GA 30332-0250 USA (e-mail: [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/TCSVT.2005.852400

Among these, the Laplacian distribution is probably the most popular, and it is widely used in practice. Our observation is that the actual distributions of the DCT coefficients in image and video applications differ significantly from the Laplacian distribution in most cases. As a result, rate and distortion models based on a Laplacian distribution sometimes fail to estimate the actual rate-distortion-coding parameter relations accurately. Motivated by this observation, we have looked for other analytical density functions that might fit the actual distribution of the DCT coefficients more accurately without a significant complexity increase. We placed two constraints on the distribution model of the DCT coefficients in this search: 1) accuracy and 2) simplicity. Our finding is that the Student t-distribution with one degree of freedom, or equivalently the centralized Cauchy density, is a better choice than the Laplacian density for estimating the actual probability density function (pdf) of the DCT coefficients. Based on this finding, we develop new models that describe the relations between the quantization parameter of a video coder, the output bit rate, and the distortion caused by quantization. We prove our claim that the Cauchy density is more accurate than the Laplacian density by comparing the performance of the Cauchy-distribution-based rate and distortion models with the Laplacian-distribution-based rate and distortion models. We show that the change of the rate and the distortion as a function of the quantization parameter can be estimated more accurately using the Cauchy-distribution-based models. We then use these models to develop a novel solution to the frame bit-allocation problem for rate control of an H.264 video coder, to justify the practicality and effectiveness of our findings. The paper is organized as follows. Section II provides a brief review of H.264 video coding and rate control for video coding. It also reviews related work on DCT coefficient statistics, rate-distortion analysis, and rate-control algorithms for DCT-like transform-based video coders. In Section III, we present our analysis of the DCT coefficients' statistics and develop the new rate and distortion models. In Section IV, we propose a frame bit-allocation algorithm that uses the new rate and distortion models for H.264 rate control. Section V compares the performance of the proposed frame bit-allocation algorithm experimentally with a number of existing algorithms


implemented for the H.264 video coder. Finally, we draw some conclusions in Section VI.

II. BACKGROUND AND PRIOR WORK

A. H.264 Video Coding

We start with a brief overview of the H.264 video coding standard. The H.264 (equivalently, MPEG-4 Part 10) standard is one of the most advanced video coding standards that has been developed. H.264 is designed in two layers: a video coding layer (VCL) and a network adaptation layer (NAL). We are primarily interested in the VCL here. The overall design of the VCL is similar to previous video coding standards [8]. As in H.263 and MPEG-2, the H.264 VCL uses translational block-based motion compensation and transform-based residual coding. It also features scalar quantization with an adjustable quantization step size for output bit-rate control, zigzag scanning, and run-length VLC coding of the quantized transform coefficients. However, there are significant differences in the details. The H.264 standard enables the use of a more flexible and efficient model for motion compensation: multiple reference pictures and different block sizes are supported. The standard also specifies the use of an improved deblocking filter within the motion compensation loop in order to reduce visual artifacts. In addition to advances in traditional coding functions, H.264 also introduces new coding functions. It uses an integer 4 × 4 transform [9], [10]. This 4 × 4 transform reduces blocking and ringing artifacts, and also eliminates the encoder-decoder mismatches in the inverse transform that were present in floating-point DCT implementations. This new transform is a very close approximation to the DCT. H.264 also supports spatial prediction within the frames, which helps to reduce the residual energy of motion compensation. The standard also uses more complex and efficient context-adaptive binary arithmetic coding (CABAC) for entropy coding of the quantized transform coefficients [11]. In common with the earlier standards, the H.264 video coding standard does not explicitly define an encoder-decoder pair. Many functional parts of the encoder and the decoder are left open for optimization. One of these functional parts is the rate-control module that is responsible for controlling the output bit rate of the encoder.

B. Rate Control for Video Coding

The output bit rate and video quality of a video encoder depend on several coding parameters, such as the quantization scale and the coding mode. In particular, choosing a large quantization scale reduces the resulting bit rate, while at the same time reducing the visual quality of the encoded video. In most applications, a predetermined constant output bit rate is desired. These applications are referred to as constant bit-rate (CBR) applications. The output bit rate of an encoder can be controlled by carefully selecting the quantization parameters for each coding block. This task is performed by the rate-control module. The task of the rate-control module is complicated by the fact that the encoded video quality should consistently be kept at the highest


possible quality level for each picture frame and within each frame, avoiding such visual artifacts as blurring, blocking, and jitter. The goal of the rate-control unit is to keep the output bit rate within the constrained limits while achieving maximally uniform video quality. For practical reasons, the rate-control problem is studied as three subproblems: 1) group of pictures (GOP) bit allocation; 2) picture bit allocation; and 3) macroblock quantization parameter selection. GOP bit allocation involves selecting the number of bits to allocate to a GOP, which in the case of CBR rate control simply amounts to assigning a fixed number of bits per GOP. Picture bit allocation involves distributing the GOP budget among the picture frames so as to achieve a maximal, uniform video quality. Although it does not fully represent the visual quality, the peak signal-to-noise ratio (PSNR) is most commonly used to quantify the video quality. Macroblock quantization parameter selection involves tuning the quantization parameter for each macroblock of a frame so that the rate constraints are met and a uniform quality is achieved within the picture. As in the H.264 reference software, this selection may also affect the motion estimation and compensation operations: the H.264 reference software uses an R-D optimized algorithm for motion estimation in which the Lagrangian multiplier of the cost function is selected using an equation that involves the quantization parameter. The optimal solution to the rate-control problem requires explicit knowledge of the video source properties and the effects of the encoder operations. If we knew the rate-distortion characteristics of each coding block as a function of its properties and the coding parameters, we would be able to solve the rate-control problem in an optimal way. Since we do not have this knowledge, we either approach the solution iteratively by making several encoding/decoding passes to meet our targets, or use mathematical models that estimate the rate and quality of the output of the video coder as a function of the encoding parameters. The first choice, a multipass system, is not desirable for most applications, which demand a very fast encoder. This is why many conventional rate-control algorithms use rate and distortion models for their operation.

C. Related Work on the DCT Distribution

Image and video coding systems that use a two-dimensional DCT make several different assumptions about the distributions of the transform coefficients. Pratt [15], and Netravali and Limb [16], conjectured that the dc coefficients should have a Rayleigh distribution and that the other coefficients should be Gaussian, based on the central limit theorem. Later, Reininger and Gibson [17] reported that the dc coefficient was Gaussian and that the ac coefficients were Laplacian. Smooth and Lowe [21] also recommended the Laplacian distribution to model the ac coefficients. Other studies modeled the DCT coefficients using more complex probability density functions. In [19], Muller used a generalized Gaussian function that includes the Gaussian and Laplacian pdfs as special cases. Eude et al. reported that the DCT coefficients could be modeled by a proper linear combination of a number of Laplacian and Gaussian probability density functions [20]. Comparing their model with Laplacian, Gaussian, and Cauchy probability density functions, they claimed that the


DCT coefficients follow neither Cauchy nor Laplacian distributions, but are most accurately modeled as a mixture of Gaussians. Although a generalized Gaussian density can model the statistics of the DCT coefficients more accurately, it is not widely used in practice because it does not lend itself easily to mathematical analysis.

D. Related Work on Rate Control

Based on the assumption that the DCT coefficients have a Gaussian distribution, Tao et al. used the theoretical rate-distortion function for a Gaussian random variable with a squared-error distortion measure to derive a logarithmic rate-quantization model in [24]. A perceptual macroblock classification for quantization parameter modulation was also used. The proposed algorithm was reported to achieve 1.2- and 0.5-dB PSNR improvements over the TM5 algorithm [25] of the MPEG-2 test model for the FLOWER GARDEN and the MOBILE AND CALENDAR sequences, respectively. The computational complexity of the proposed algorithm was reported to be 25% more than that of TM5.

In [34], two formulas are used for the rate and the distortion as functions of the quantization parameter Q. The first, the rate formula, was derived from the entropy of a Laplacian-distributed random variable with variance $\sigma_i^2$. The distortion model is simply
$$D_i = \frac{Q_i^2}{12}$$
which is derived from the quantization error of a uniform random variable at a quantization level of $Q_i$ for the $i$th macroblock. Their rate-control algorithm uses these models to solve for the optimal set of quantization parameters and was adopted for test model TMN8 of H.263 [35]. This algorithm was reported to outperform the previous rate-control algorithms for H.263 video.

In [36], [37], and [38], He et al. presented a rate-distortion (R-D) model based on the fraction of zeros among the quantized DCT coefficients (denoted as $\rho$) and a rate-control method based on these models. It assumes that the DCT coefficients have a Laplacian distribution. Based on the observation that the bit rate and $\rho$ have a linear relationship, He et al. used
$$R(\rho) = \theta\,(1 - \rho)$$
as the rate model and
$$D(\rho) = \sigma^2 e^{-\alpha(1-\rho)}$$
as the distortion model, where $\theta$ and $\alpha$ are model parameters. The resulting rate-control method was reported to perform better than the TMN8 rate-control algorithm both in terms of its buffer regulation performance and the resulting picture quality. An average 0.3-dB PSNR improvement (over TMN8) was reported for various video sequences.

In general, the Gaussian and the Laplacian probability densities are the most commonly used models for rate-control applications because of their simplicity. The more complex models reported in [19] and [20] are not used in practice, despite their potential accuracy. Furthermore, the Laplacian density is reported to be a better fit than the Gaussian in several studies, such as [39], so we will concentrate on the Laplacian density in our analysis.

Fig. 1. Distribution of the ac DCT coefficients—a selected example from a frame of the Akiyo sequence (QCIF format).

III. ANALYSIS OF DCT COEFFICIENTS OF TYPICAL VIDEO SOURCES

Typical visual information sources range from conversational video where the temporal information variation is small, to action movies where both the spatial and temporal variations can be large. Briefly, we can classify video sources according to the amount of spatial and temporal information they carry. Based on its spatial variation, a video source can fall into one of two categories: 1) smooth and 2) textured. Based on its temporal variation, a video source can also be classified into two categories: 1) slow motion and 2) varying content.

The performance of a video coder, in terms of its output bit rate and the encoded video quality, varies with the nature of the video source. Most video coding applications dictate certain conditions on the encoded video sequence, such as a desired quality or a constraint on the output bit rate. Traditional video coders use block-based DCTs for compression.¹ For intra coding, the DCT is applied to the image itself; for nonintra coding, a residual image is obtained by performing a prediction, and the DCT is applied to this residual. In both cases, knowledge of the DCT coefficients' distribution is valuable for optimizing the video coder. Such information is important, for instance, in designing the quantizer and the motion compensation. Fig. 1 shows a typical plot of the histogram of the DCT coefficients for an 8 × 8 block-based DCT of an image, for which the dc coefficients are excluded. The image used here is from the AKIYO sequence.

¹The H.264 coder uses an approximate version of the DCT.
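To make the histogram-gathering step concrete, here is a small Python sketch (our addition, not part of the paper) that collects the ac coefficients of an 8 × 8 floating-point DCT; the block size, normalization, and helper names are illustrative placeholders rather than the transform actually used by the coder.

```python
import numpy as np
from scipy.fft import dct

def ac_coefficient_samples(frame, block=8):
    """Collect the ac DCT coefficients of all non-overlapping blocks of a frame.

    `frame` is a 2-D array of luma samples; the dc term of each block is
    dropped, mirroring the histograms discussed in the text.
    """
    h, w = frame.shape
    samples = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            b = frame[y:y + block, x:x + block].astype(float)
            # separable 2-D DCT-II with orthonormal scaling
            c = dct(dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')
            samples.append(c.ravel()[1:])   # flat index 0 is the dc coefficient
    return np.concatenate(samples)

# Example: an empirical (normalized) histogram such as the one in Fig. 1
# frame = ...  # e.g., the luma plane of a frame (hypothetical input)
# hist, edges = np.histogram(ac_coefficient_samples(frame), bins=201,
#                            range=(-100, 100), density=True)
```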


Fig. 2. Comparison of Laplacian versus Cauchy pdfs—selected intra and nonintra frames from Tempete.

As discussed in Section II, the DCT coefficient distribution is most commonly approximated by a Laplacian pdf with parameter $\lambda$,
$$p(x) = \frac{\lambda}{2}\, e^{-\lambda|x|}. \qquad (1)$$
The Laplacian density has an exponential form, leading to the property that the tail of the density decays very quickly. However, we have observed that the actual DCT coefficient histogram has heavy tails in several cases. Furthermore, we have observed that a zero-mean Cauchy distribution with parameter $\mu$, having the pdf
$$p(x) = \frac{1}{\pi}\,\frac{\mu}{\mu^2 + x^2}, \qquad (2)$$
is a better fit to the signals than the Laplacian density in most cases. The parameter $\mu$ depends on the picture content. This claim contradicts the conclusion reported in [20]. Fig. 2 illustrates the accuracy of the best fit for both the Cauchy and Laplacian pdfs for the DCT coefficient distribution of the image shown in Fig. 3. In the figure, the first plot shows the actual distribution of the coefficients for the intra coding case; the second plot shows the actual distribution for the nonintra coding case. In both cases, the Cauchy pdf is a better fit to the actual distribution than the Laplacian pdf.

Fig. 3. Frame from the Tempete sequence (CIF format).

Obviously, however, one example is not enough to prove our claim. We need to do an extensive comparison with a wide range of typical video sources. Equivalently, we can compare the entropy rate curves derived from both densities with the actual output rate of the video sources. This second approach is more interesting since we ultimately seek better accuracy in modeling the output rate of a video coder. Therefore, in order to justify our claim, first we will develop the entropy functions for the Cauchy and Laplacian sources. Then we will compare the rate-estimation accuracy of the Cauchy-pdf-based entropy function with the Laplacian-pdf-based entropy function by encoding a wide range of video sequences.

A. Cauchy-Based Rate Estimation

It is expected that a more accurate estimate of the ac coefficient distribution will lead to a more accurate estimate of the rate. The entropy as a function of the quantization parameter can be derived for both the Cauchy and Laplacian distributions as follows. Assume that the DCT coefficients are uniformly quantized with a quantization level $Q$. Let $P(iQ)$ be the probability that a coefficient is quantized to $iQ$, where $i = \ldots, -1, 0, 1, \ldots$. Then the entropy of the quantized DCT coefficients is computed as
$$H = -\sum_{i=-\infty}^{\infty} P(iQ)\,\log_2 P(iQ) \qquad (3)$$
where
$$P(iQ) = \int_{(i-1/2)Q}^{(i+1/2)Q} p(x)\, dx.$$
For a Laplacian distribution,
$$P(iQ) = \begin{cases} 1 - e^{-\lambda Q/2}, & i = 0\\ e^{-\lambda|i|Q}\,\sinh(\lambda Q/2), & i \neq 0. \end{cases}$$
Therefore, the entropy function as a function of $Q$ for a Laplacian distribution is
$$H_L(Q) = -\big(1 - e^{-\lambda Q/2}\big)\log_2\big(1 - e^{-\lambda Q/2}\big) - 2\sum_{i=1}^{\infty} e^{-\lambda iQ}\sinh\!\Big(\frac{\lambda Q}{2}\Big)\log_2\!\Big[e^{-\lambda iQ}\sinh\!\Big(\frac{\lambda Q}{2}\Big)\Big]. \qquad (4)$$
For a Cauchy distribution,
$$P(iQ) = \begin{cases} \dfrac{2}{\pi}\arctan\dfrac{Q}{2\mu}, & i = 0\\[2mm] \dfrac{1}{\pi}\left[\arctan\dfrac{(2|i|+1)Q}{2\mu} - \arctan\dfrac{(2|i|-1)Q}{2\mu}\right], & i \neq 0. \end{cases}$$


Therefore, the entropy function as a function of $Q$ for a Cauchy distribution is
$$H_C(Q) = -\frac{2}{\pi}\arctan\!\frac{Q}{2\mu}\,\log_2\!\left(\frac{2}{\pi}\arctan\frac{Q}{2\mu}\right) - \frac{2}{\pi}\sum_{i=1}^{\infty}\left[\arctan\frac{(2i+1)Q}{2\mu} - \arctan\frac{(2i-1)Q}{2\mu}\right]\log_2\!\left(\frac{1}{\pi}\left[\arctan\frac{(2i+1)Q}{2\mu} - \arctan\frac{(2i-1)Q}{2\mu}\right]\right). \qquad (5)$$
These two entropy functions based on the Laplacian and Cauchy pdfs are computable, provided that the density parameters $\lambda$ and $\mu$ are known. The Laplacian parameter $\lambda$ can be computed using its relation to the variance $\sigma^2$ of the ac coefficients,
$$\lambda = \frac{\sqrt{2}}{\sigma}.$$
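As a numerical illustration of (3)–(5), the following sketch (our addition) evaluates the two entropies by truncating the infinite sums; the truncation limits and function names are arbitrary choices, not values from the paper.

```python
import numpy as np

def entropy_laplacian(Q, lam, imax=2000):
    """Entropy (bits/coefficient) of a Laplacian source quantized with step Q,
    using the bin probabilities derived in the text."""
    i = np.arange(1, imax)
    p0 = 1.0 - np.exp(-lam * Q / 2.0)
    pi = np.exp(-lam * i * Q) * np.sinh(lam * Q / 2.0)
    p = np.concatenate(([p0], pi, pi))      # i > 0 and i < 0 contribute equally
    p = p[p > 1e-300]                       # avoid log2(0) from underflow
    return float(-np.sum(p * np.log2(p)))

def entropy_cauchy(Q, mu, imax=20000):
    """Same quantity for a zero-mean Cauchy source with parameter mu.
    The heavy tail converges slowly, so the truncation index is larger."""
    i = np.arange(1, imax)
    p0 = (2.0 / np.pi) * np.arctan(Q / (2.0 * mu))
    pi = (np.arctan((i + 0.5) * Q / mu) - np.arctan((i - 0.5) * Q / mu)) / np.pi
    p = np.concatenate(([p0], pi, pi))
    p = p[p > 1e-300]
    return float(-np.sum(p * np.log2(p)))
```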

The Cauchy parameter $\mu$ can be computed using the histogram of the ac coefficients as follows. Suppose $F_{\mathrm{emp}}(x)$ is the empirical cumulative distribution function (cdf) obtained from the DCT histogram. We claim that this is approximately the cdf of a Cauchy source with parameter $\mu$, denoted $F_{\mu}(x)$ and given as
$$F_{\mu}(x) = \frac{1}{2} + \frac{1}{\pi}\arctan\frac{x}{\mu}.$$
Let
$$\nu = F_{\mathrm{emp}}(x_0) - F_{\mathrm{emp}}(-x_0)$$
for some threshold $x_0$. Then $\mu$ can be calculated as
$$\mu = \frac{x_0}{\tan(\pi\nu/2)}. \qquad (6)$$
A proper selection of $x_0$ would be to set it equal to $\mu$. While $\mu$ is not known in advance, its range has been found experimentally to lie within a fairly narrow interval, so in our experiments we select a fixed $x_0$ inside that range. Therefore, to calculate $\mu$, we only need to calculate the fraction of the DCT coefficients with absolute value less than the preselected threshold $x_0$ and solve for $\mu$ using (6).

B. Rate Experiments

To show the effectiveness of the rate model given in (5), we conducted several experiments with a number of video sequences. We encoded an intra frame followed by a nonintra frame for each sequence at several quantization levels to obtain the actual rate as a function of the quantization level for both intra and nonintra coding. The encoder was configured to use CABAC for entropy coding with rate-distortion optimized mode selection turned on. We calculated the rate estimates using the Cauchy- and Laplacian-based entropy functions and compared these with the actual rates. We used the H.264 reference software version JM 8.4² [40] for our experiments.³ For illustration, Figs. 4 and 5 show the Laplacian-based and Cauchy-based rate estimates in comparison with the actual rate as a function of $Q$ for both intra and nonintra coding. Table I shows the rate estimation accuracy based on both distributions for several frames selected from a wide range of video sequences. In the table, the rate estimation error is calculated as
$$\mathrm{Error} = \frac{|\text{target rate} - \text{actual rate}|}{\text{target rate}}.$$
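A minimal sketch of the estimator in (6) and of the error measure used in Table I might look as follows; the threshold value `x0` shown here is a placeholder of ours, since the paper's experimentally chosen interval is not reproduced above.

```python
import numpy as np

def estimate_cauchy_mu(ac_coeffs, x0=5.0):
    """Estimate the Cauchy parameter mu from the fraction of ac coefficients
    whose magnitude is below a threshold x0, inverting the Cauchy cdf as in (6).
    The threshold value here is a placeholder, not the paper's setting."""
    nu = np.mean(np.abs(ac_coeffs) < x0)     # empirical P(|X| < x0)
    nu = np.clip(nu, 1e-6, 1.0 - 1e-6)       # guard against degenerate cases
    return x0 / np.tan(np.pi * nu / 2.0)     # since P(|X| < x0) = (2/pi) atan(x0/mu)

def rate_estimation_error(target_rate, actual_rate):
    """Relative rate-estimation error as used in Table I."""
    return abs(target_rate - actual_rate) / target_rate
```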

²[Online] Available: http://iphome.hhi.de/suehring/tml
³The H.264 video coder is one of the most advanced existing video coders, and we expect its compression efficiency to be the closest to the theoretical limits among existing video coders, so that the output rate of the encoder will be close to the entropy lower bound.


Fig. 4. Actual rate versus Laplacian and Cauchy based rate estimates for (a) an intraframe and (b) a nonintra frame from Foreman sequence.

Clearly, the Cauchy-based rate estimate is significantly better than the Laplacian-based rate estimate. The Cauchy estimate matches the coder performance very accurately, especially for intra coding.


Fig. 5. Actual rate versus Laplacian and Cauchy based rate estimates for (a) an intra frame and (b) a nonintra frame from the Irene sequence.

TABLE I COMPARISON OF THE RATE ESTIMATION PERFORMANCE OF THE LAPLACIAN- AND CAUCHY-BASED RATE MODELS

C. Simplified Rate Model

Although the entropy of a quantized Cauchy source can be computed using (5), it would be preferable to have a simpler formula. If we plot the entropy function given in (5) for different values of $\mu$, we obtain nearly linear behavior on a log-log scale. Fig. 6 shows log-log plots of the entropy function for five different $\mu$ values. Clearly, there is a nearly linear relation between $\log H$ and $\log Q$, especially for the smaller $\mu$ values. Therefore, we can approximate the entropy function of the quantized Cauchy source given in (5) as
$$H(Q) \approx a\,Q^{-\alpha} \qquad (7)$$
where $a$ and $\alpha$ are parameters that depend on $\mu$. These parameters can be calculated using (5) as follows. First, evaluate (5) for a given $\mu$ at several quantization levels. Then, we have a set of equations and two unknowns, and we can solve for $a$ and $\alpha$ as the least-squares solution.
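For illustration, the following sketch performs the described log-log least-squares fit of $a$ and $\alpha$ against the exact entropy of (5); the sampling grid of quantization levels is an assumption of ours.

```python
import numpy as np

def fit_power_law_rate_model(entropy_fn, mu, q_levels=None):
    """Fit H(Q) ~ a * Q**(-alpha) by linear least squares in the log-log domain.

    entropy_fn(Q, mu) should return bits/coefficient (e.g., the entropy_cauchy
    sketch above); the default sampling grid is an illustrative placeholder.
    """
    if q_levels is None:
        q_levels = np.arange(1.0, 52.0)
    H = np.array([entropy_fn(q, mu) for q in q_levels])
    # log H = log a - alpha * log Q  ->  unknowns [log a, -alpha]
    A = np.column_stack([np.ones_like(q_levels), np.log(q_levels)])
    coef, *_ = np.linalg.lstsq(A, np.log(H), rcond=None)
    a, alpha = np.exp(coef[0]), -coef[1]
    return float(a), float(alpha)
```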


Fig. 6. Log–log plot of the entropy function given in (5) for five different values of μ.

TABLE II APPROXIMATE RATE MODEL PARAMETER VALUES FOR CORRESPONDING CAUCHY PARAMETERS (μ)

Fig. 7. Theoretical versus approximated entropy functions for five different values of the Cauchy distribution parameter μ.

This can be done offline, and Table II shows values of $a$ and $\alpha$ for a possible set of $\mu$ values. To assess the accuracy of this approximation, we plot the entropy function in (5) and the approximate entropy function in (7) for different values of $\mu$. As shown in Fig. 7, the approximation is accurate.

D. Cauchy-Based Distortion Estimation

The distortion due to quantization can also be estimated accurately under the Cauchy pdf assumption. Assume that we have a uniform quantizer with step size $Q$. The distortion caused by quantization is given by
$$D(Q) = \sum_{i=-\infty}^{\infty} \int_{(i-1/2)Q}^{(i+1/2)Q} (x - iQ)^2\, p(x)\, dx.$$
It can be shown that this infinite sum converges and is bounded from above by $Q^2/4$. For a Cauchy source, this expression becomes
$$D(Q) = \sum_{i=-\infty}^{\infty} \int_{(i-1/2)Q}^{(i+1/2)Q} (x - iQ)^2\, \frac{\mu}{\pi(\mu^2 + x^2)}\, dx. \qquad (8)$$
This equation suggests that the distortion depends on $\mu$ in addition to $Q$. Although this expression is highly complex, it can be approximated. A similar analysis to that performed for the rate simplification leads to the approximation
$$D(Q) \approx b\,Q^{\beta} \qquad (9)$$
where $b$ and $\beta$ are parameters that depend on $\mu$. As in the rate modeling case, we will demonstrate this result by comparing the actual distortion with the estimate. Fig. 8 shows the actual distortion function versus the estimated distortion function obtained using the distortion model given in (9).

Fig. 8. Actual distortion versus the Cauchy-based distortion approximation for (a) an intra frame and (b) a nonintra frame from the Mobile and Calendar sequence. The distortion model parameters are: (a) b = 0.20, β = 1.44 and (b) b = 0.28, β = 1.35.
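A corresponding sketch for the distortion side numerically evaluates the quantizer error of (8) and fits the power-law form of (9); the integration limits and sampling grid are placeholders we introduce.

```python
import numpy as np

def quantization_distortion_cauchy(Q, mu, xmax=4000.0, n=800_001):
    """Numerically estimate the mean-squared quantization error of a uniform
    quantizer with step Q applied to a zero-mean Cauchy source (parameter mu).
    The integral is truncated at +/- xmax; xmax and n are placeholders."""
    x = np.linspace(-xmax, xmax, n)
    dx = x[1] - x[0]
    pdf = mu / (np.pi * (mu ** 2 + x ** 2))
    err = x - Q * np.round(x / Q)            # error w.r.t. the nearest level iQ
    return float(np.sum(err ** 2 * pdf) * dx)

def fit_power_law_distortion(mu, q_levels=None):
    """Fit D(Q) ~ b * Q**beta in the log-log domain, mirroring (9)."""
    if q_levels is None:
        q_levels = np.arange(1.0, 52.0)
    D = np.array([quantization_distortion_cauchy(q, mu) for q in q_levels])
    A = np.column_stack([np.ones_like(q_levels), np.log(q_levels)])
    coef, *_ = np.linalg.lstsq(A, np.log(D), rcond=None)
    return float(np.exp(coef[0])), float(coef[1])   # (b, beta)
```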

The approximate rate and distortion equations of (7) and (9) lead to the rate-distortion function
$$D(R) \approx c\,R^{-\gamma} \qquad (10)$$
with proper selection of the values $c$ and $\gamma$; combining (7) and (9) gives $c = b\,a^{\beta/\alpha}$ and $\gamma = \beta/\alpha$.

IV. APPLICATION OF THE PROPOSED RATE-DISTORTION MODELS IN RATE CONTROL: FRAME BIT ALLOCATION FOR H.264

To justify the effectiveness and practicality of the rate model based on the Cauchy distribution, we apply it to the problem of bit allocation for an H.264 video coder. As part of the rate-control problem of a video coder, the rate-control module distributes a given GOP bit budget among the pictures of the GOP. The bit budget required for each picture frame to yield constant-quality video output varies with the picture content and the picture coding type. I-type pictures require more bits than P- and B-type pictures (nonintra pictures) for a given picture quality. For P- and B-type pictures, for the most part, motion-compensated frame differences are encoded.⁴ Because of the motion-compensated prediction, the number of bits needed to encode nonintra pictures depends on how well the previous reference pictures have been encoded.

⁴It is possible to have intra coded macroblocks in P- and B-pictures.


At the picture layer, the bit budget for encoding the transform coefficients should be set so that the overall video quality is maximized. The overall picture bit budget is shared among the encoding of the motion vectors, the quantized DCT coefficients, and the overhead bits. We consider constant-bit-rate scenarios in which a constant number of bits is used within a GOP, with a constant quantization parameter within each frame. Assume that we have a GOP size of $N$ frames. Also assume that we are about to encode the $i$th frame of the GOP, so that the previous $i-1$ frames have already been encoded. Assume that we have a bit budget $B_i$ for the remaining frames of the GOP, with indexes $i, i+1, \ldots, N$. We want to find the bit budget $T_i$ for the $i$th frame so that
$$\sum_{j=i}^{N} T_j = B_i$$
and
$$D_i = D_{i+1} = \cdots = D_N$$
where $D_j$ is the distortion of the $j$th frame. For H.264, selecting the picture quantization parameters with fixed offsets between the I-, P-, and B-picture parameters roughly produces equal qualities for I-, P-, and B-pictures. Also, for simplicity, we assume that the newly developed rate model of (7) is valid for the whole frame. Using the rate model given in (7), the target number of bits $T_i$ for the $i$th frame can then be calculated from the resulting allocation equation (11), which takes one form if frame $i$ is intra and another form otherwise. This requires computation of the parameters of the rate model given in (7) for frames $i, \ldots, N$. These parameters are not known before encoding, and thus they need to be estimated; estimation of these parameters is discussed in Section IV-A.

Equation (11) can be solved iteratively using Newton's method. Let $f(T_i)$ denote the difference between the two sides of (11), and let $T_i^{(k)}$ be the value of $T_i$ at the $k$th step. The following iteration can be used to solve (11):

repeat
$$T_i^{(k+1)} = T_i^{(k)} - \frac{f\big(T_i^{(k)}\big)}{f'\big(T_i^{(k)}\big)}$$
while the change between successive iterates exceeds a small tolerance.

The quantization parameter $Q_i$ for the frame can then be computed using (7),
$$Q_i = \left\lfloor \left(\frac{a_i}{T_i}\right)^{1/\alpha_i} \right\rceil$$
where $\lfloor\cdot\rceil$ denotes rounding to the nearest possible quantization level.
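Because the exact form of (11) is not reproduced above, the following sketch only illustrates the flavor of the allocation: it assumes, for simplicity, a single common quantization step for the remaining frames, solves for it with a Newton iteration on the summed Cauchy rate models, and then inverts (7) for the current frame. The names and the simplification are ours, not the paper's exact procedure.

```python
import numpy as np

def allocate_frame_bits(budget, a, alpha, iters=30, q0=20.0):
    """Split the remaining GOP budget among the frames whose rate-model
    parameters are given in the arrays a, alpha (R_j = a_j * Q**(-alpha_j)).

    Simplifying assumption (ours): all remaining frames share one quantization
    step Q, standing in for the equal-quality constraint; we solve
        f(Q) = sum_j a_j * Q**(-alpha_j) - budget = 0
    by Newton's method and return the target bits and step for the first frame.
    """
    a = np.asarray(a, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    Q = q0
    for _ in range(iters):
        f = np.sum(a * Q ** (-alpha)) - budget
        fp = np.sum(-alpha * a * Q ** (-alpha - 1.0))   # derivative f'(Q)
        step = f / fp
        Q = max(Q - step, 1e-3)                          # keep the step positive
        if abs(step) < 1e-6:
            break
    target_bits = a[0] * Q ** (-alpha[0])
    q_frame = (a[0] / target_bits) ** (1.0 / alpha[0])   # invert (7) for this frame
    # Mapping q_frame to an H.264 QP index (approximately Qstep = 2**((QP-4)/6))
    # and rounding to an allowed level is left to the encoder integration.
    return target_bits, q_frame
```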

A. Parameter Handling and Practical Issues

In this paper, we focus on single-pass implementation scenarios in which we do not have prior information about the statistical properties of the video source. Hence, the model parameters of the approximate rate model of (7) are not known and need to be estimated. We propose the following approach: we maintain separate estimators $(a_I, a_P, a_B)$ and $(\alpha_I, \alpha_P, \alpha_B)$ for each of the picture types, for the parameters $a$ and $\alpha$, respectively.


TABLE III PERFORMANCES OF THREE ALGORITHMS IN TERMS OF OUTPUT RATE, PSNR, AND EXECUTION TIME.

For each picture with index $i$ in a GOP, the parameters are estimated from previously encoded pictures as
$$a_i = \begin{cases} a_I, & \text{if picture } i \text{ is an I-type picture}\\ a_P, & \text{if picture } i \text{ is a P-type picture}\\ a_B, & \text{if picture } i \text{ is a B-type picture} \end{cases} \qquad \alpha_i = \begin{cases} \alpha_I, & \text{if picture } i \text{ is an I-type picture}\\ \alpha_P, & \text{if picture } i \text{ is a P-type picture}\\ \alpha_B, & \text{if picture } i \text{ is a B-type picture.} \end{cases}$$

Now we explain the initialization and update of these estimators. As suggested in [41], we initialize the frame bit-allocation algorithm by choosing a frame quantization parameter for each frame type as follows. Define bpp as the average number of bits targeted per video frame pixel. It is calculated as
$$\mathrm{bpp} = \frac{R}{F \cdot N_{\mathrm{pixel}}}$$
where $R$ is the target bit rate, $F$ is the frame rate, and $N_{\mathrm{pixel}}$ is the number of pixels per frame (e.g., for a 4:2:0 format QCIF sequence). For the given bit rate $R$, the initial quantization parameter for the first intra picture is selected according to bpp, and the initial quantization parameters for the first P- and B-pictures are then set relative to it. Using the first set of pictures, we estimate the model parameters as follows: after encoding the first intra picture, we collect the DCT statistics, calculate $\mu$, record the actual number of output bits, and set $a_I$ and $\alpha_I$ accordingly.


Fig. 9. PSNR versus frame for three rate control algorithms.

Also, after encoding the first P- and B-pictures, we calculate the actual output bits and set $a_P$, $\alpha_P$ and $a_B$, $\alpha_B$ in the same manner.

For the remaining pictures in the video sequence, the model parameters are updated as follows: 1) $\alpha$ is held fixed for each picture type, and 2) the parameter $a$ for the corresponding picture type is updated after every encoded picture using an exponentially weighted average of its previous value and the value implied by the actual number of output bits, where $\gamma$ is a forgetting factor. We use a fixed value of $\gamma$ in our simulations.
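One possible realization of this parameter handling, under our reading that the update is an exponentially weighted average with forgetting factor γ, is sketched below; the class name, the default γ, and the exact blend are assumptions of ours.

```python
class RateModelEstimator:
    """Per-picture-type estimates (a, alpha) of the model R = a * Q**(-alpha)."""

    def __init__(self, init_params, forgetting=0.5):
        # init_params: {'I': (a, alpha), 'P': (a, alpha), 'B': (a, alpha)}
        self.params = dict(init_params)
        self.gamma = forgetting          # forgetting factor (placeholder value)

    def predict_bits(self, ptype, q):
        a, alpha = self.params[ptype]
        return a * q ** (-alpha)

    def update(self, ptype, q, actual_bits):
        """alpha stays fixed; a blends the old estimate with the value implied
        by the frame just encoded (actual_bits = a * q**(-alpha))."""
        a, alpha = self.params[ptype]
        a_measured = actual_bits * q ** alpha
        a_new = self.gamma * a + (1.0 - self.gamma) * a_measured
        self.params[ptype] = (a_new, alpha)
```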

B. Proposed Frame Bit Allocation Algorithm

For a given bit-rate target $R$ and a frame rate $F$, the GOP bit budget is given as
$$R_{\mathrm{GOP}} = \frac{R}{F}\cdot N$$
where $N$ is the GOP size. The bit budget for frame $i$ is calculated as follows.

1) Initialization: Calculate bpp and the initial quantization parameters, and initialize the model parameters as described in the parameter-handling section.
2) Start of a GOP: Set the remaining bit budget $B_1$ from $R_{\mathrm{GOP}}$, where $R_{\mathrm{GOP}}$ is the constant target GOP rate.
3) For $i = 1$ to $N$, repeat:
   a) Solve for the frame target $T_i$ using (11).
   b) Solve for the quantization parameter $Q_i$ using (7).
   c) Encode the frame, and calculate the actual number of output bits.
   d) Update the remaining budget, $B_{i+1} = B_i - (\text{actual output bits})$.
4) Update the model parameters.
5) If the end of the sequence is reached, stop. Else go to step 2.

Fig. 10. Buffer status as a function of time for three rate control algorithms.
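Putting the pieces together, a schematic single-pass loop over the sequence might look as follows; `encode_frame`, the estimator object, and the allocation routine are the hypothetical helpers sketched earlier, and the control flow is a paraphrase of steps 1)–5) rather than the reference implementation.

```python
def encode_sequence(frames, frame_types, target_bitrate, frame_rate,
                    gop_size, estimator, allocate, encode_frame):
    """Schematic single-pass CBR loop following steps 1)-5) above.

    estimator    -- RateModelEstimator-like object (previous sketch)
    allocate     -- allocate_frame_bits-like routine returning (target_bits, q)
    encode_frame -- callable(frame, q) -> actual_bits, supplied by the codec
    """
    gop_budget = target_bitrate / frame_rate * gop_size        # R_GOP = (R/F)*N
    for gop_start in range(0, len(frames), gop_size):          # step 2)
        gop = range(gop_start, min(gop_start + gop_size, len(frames)))
        remaining = gop_budget
        for k, idx in enumerate(gop):                          # step 3)
            types = [frame_types[j] for j in gop[k:]]
            a = [estimator.params[t][0] for t in types]
            alpha = [estimator.params[t][1] for t in types]
            target_bits, q = allocate(remaining, a, alpha)     # steps 3a), 3b)
            actual_bits = encode_frame(frames[idx], q)         # step 3c)
            remaining -= actual_bits                           # step 3d)
            estimator.update(frame_types[idx], q, actual_bits) # step 4)
```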

V. APPLICATION RESULTS


For evaluating its performance, we implemented the proposed frame bit-allocation algorithm on the joint video team (JVT) H.264 reference encoder software. For comparison, we considered two other methods: 1) the rate control algorithm of JM 8.4 and 2) the improved TM5-based frame bit-allocation algorithm proposed for H.264 in [43]. We used two sets of color video sequences. The first set consisted of the CIF (352 × 288 pixel resolution) sequences: IRENE, PARIS, MOBILE AND CALENDAR, and TEMPETE. The second set consisted of the QCIF (176 × 144 pixel resolution) sequences: AKIYO, CAR PHONE, FOREMAN, IRENE, NEWS, COASTGUARD, SILENT, and MOTHER AND DAUGHTER. In the tests, we used an [IPPP...] GOP structure with a GOP size of 12. We used the first 120 frames of each sequence. The H.264 encoder was configured to have two reference frames for inter motion search, quarter-pel motion vector resolution, CABAC for symbol coding, rate-distortion optimized mode decisions, and full-search motion estimation with a search range of 16. The target bit rates were selected so as to achieve an output average PSNR close to 35 dB. The PSNR values are measured on the luminance component only.

Table III summarizes the encoding results. Average PSNR values and the PSNR variation between frames are also shown in the table. The rate control method using the proposed frame bit-allocation algorithm achieves an average 0.33-dB PSNR gain over the one that uses the TM5-based frame bit-allocation algorithm proposed in [43] and an average 0.24-dB PSNR gain over the JM 8.4 rate control method [40]. In addition, the proposed algorithm achieves considerably reduced PSNR variation between frames on average when compared to the other algorithms. The proposed algorithm achieves 20% and 60% reductions in PSNR variation among the encoded pictures when compared to the JM 8.4 rate control algorithm and the TM5-based bit-allocation algorithm, respectively.

Table III also shows the execution times for the three algorithms. The complexity of the proposed algorithm is only marginally worse than that of the TM5-based algorithm and much better than that of JM 8.4. Therefore, the proposed frame bit-allocation algorithm can achieve a significant quality improvement over the TM5-based algorithm and JM 8.4 rate control without any significant complexity increase or rate-output performance degradation.

For further evaluation, Fig. 9 shows the PSNR versus frame plots for each video sequence. The proposed algorithm shows superior performance by achieving consistent video quality throughout the sequence for all video sequences. Fig. 10 shows the buffer occupancy versus time for each video sequence. The proposed algorithm achieves performance similar to the TM5-based algorithm and JM 8.4.

VI. CONCLUSION

In this paper, we presented rate and distortion models developed using a Cauchy-density approximation to the DCT coefficient distribution. The experimental analysis indicates that the Cauchy-pdf-based models are significantly more accurate than the Laplacian-pdf-based models, especially for intra frames. We then developed a frame bit-allocation algorithm for H.264 that uses the developed rate and distortion models, together with parameter handling and initialization methods for single-pass, CBR scenarios. The proposed frame bit-allocation algorithm is capable of achieving an average 0.24-dB PSNR gain compared to the JM 8.4 rate control algorithm and an average 0.33-dB PSNR gain compared to an improved TM5-based frame bit-allocation algorithm proposed for the H.264 video coder. Furthermore, the proposed algorithm helps reduce the PSNR variation among the frames when compared to the two algorithms. Since smoothness in visual quality is desirable for coded video, this is also an important improvement. We attribute the resulting improvements of the proposed frame bit-allocation algorithm to the more accurate modeling and the improved model parameter handling used for frame bit allocation.


The proposed frame bit allocation method can be extended to a complete rate control algorithm by incorporating a macroblock layer quantization adaptation method. REFERENCES [1] G. K. Wallace, “The JPEG still picture compression standard,” Commun. ACM, vol. 34, pp. 30–44, Apr. 1991. [2] D. LeGall, “MPEG: A video compression standard for multimedia application,” Commun. ACM, vol. 34, pp. 46–58, Apr. 1991. [3] The MPEG-2 International Standard, ISO/IEC, Reference Number ISO/IEC 13 818-2, 1996. [4] T. Sikora, “The MPEG-4 video standard verification model,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 1, pp. 19–31, Feb. 1997. [5] Video Coding for Audiovisual Services at p 64 kbits, Mar. 1993. ITU-T, ITU-T Recommendation H.261. [6] Video Coding for Low Bit Rate Communications, ITU-T, ITU-T Recommendation H.263, ver. 1, 1995. [7] Emerging H.26L Standard, 2002. White paper, UB Video Inc. [8] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003. [9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, “Low-complexity transform and quantization in H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 598–603, Jul. 2003. [10] M. Wien, “Variable block-size transforms for H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 604–613, Jul. 2003. [11] D. Marpe, H. Schwarz, and T. Wiegand, “Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 620–636, Jul. 2003. [12] J. R.-Corbera, P.-A. Chou, and S. L. Regunathan, “A generalized hypothetical reference decoder for H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 674–687, Jul. 2003. [13] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, “Rate constrained coder control and comparison of video coding standards,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 688–703, Jul. 2003. [14] M. Flierl and B. Girod, “Generalized B pictures and the draft H.264/AVC video compression standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 587–597, Jul. 2003. [15] W. K. Pratt, Digital Image Processing. New York: Wiley, 1978, ch. 10. [16] A. N. Netravali and J. O. Limb, “Picture coding: A review,” Proc. IEEE, vol. PROC-68, no. 3, pp. 7–12, Mar. 1960. [17] R. C. Reininger and J. D. Gibson, “Distributions of the two-dimensional DCT coefficients for images,” IEEE Trans. Commun., vol. COM-31, no. 6, pp. 835–839, Jun. 1983. [18] E. Y. Lam and J. W. Goodman, “A mathematical analysis of the DCT coefficient distributions for images,” IEEE Trans. Image Process., vol. 9, no. 10, pp. 1661–1666, Oct. 2000. [19] F. Muller, “Distribution shape of two-dimensional DCT coefficients natural images,” Electron. Lett., vol. 29, no. 22, pp. 1935–1936, Oct. 1993. [20] T. Eude, R. Grisel, H. Cherifi, and R. Debrie, “On the distribution of the DCT coefficients,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 5, Apr. 1994, pp. 365–368. [21] S. R. Smooth and R. A. Lowe, “Study of DCT coefficients distributions,” in Proc. SPIE, Jan. 1996, pp. 403–311. [22] M. Barni, F. Bartolini, A. Piva, and F. Rigacci, “Statistical modeling of full frame DCT coefficients,” in Proc. Eur. Signal Processing Conf. EUSIPCO 98, vol. III, Sep. 1998, pp. 1513–1516. [23] W. Ding and B. 
Liu, “Rate control of MPEG video coding and recording by rate-quantization modeling,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 1, pp. 12–20, Feb. 1996. [24] B. Tao, B. W. Dickinson, and H. A. Peterson, “Adaptive model-driven bit allocation for MPEG video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 1, pp. 147–157, Feb. 2000. [25] Coded Representation of Picture and Audio Information-MPEG-2 Test Model 5, ISO-IEC AVC-491, Apr. 1993. [26] L.-J. Lin, A. Ortega, and C.-C. Kuo, “Rate control using spline-interpolated R-D characteristics,” in Proc. Visual Communications and Image Processing, Orlando, Fl, Mar. 1996, pp. 111–122. [27] T. Chiang and Y.-Q. Zhang, “A new rate control scheme using quadratic rate distortion model,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 1, pp. 246–250, Feb. 1997.



[28] B. Tao, H. A. Peterson, and B. W. Dickinson, “A rate-quantization model for MPEG encoders,” in Proc. Int. Conf. Image Processing, vol. II, Santa Barbara, CA, Oct. 1997, pp. 41–44. [29] K. H. Yang, A. Jacquin, and N. S. Jayant, “A normalized rate-distortion model for H.263-compatible codecs and its application to quantizer selection,” in Proc. Int. Conf. Image Processing, vol. II, Santa Barbara, CA, Oct. 1997, pp. 41–44. [30] H. Gish and J. N. Pierce, “Asymptotically efficient quantizing,” IEEE Trans. Inf. Theory, vol. IT-14, no. 4, pp. 676–683, Sep. 1968. [31] T. Berger, Rate Distortion Theory. Englewood Cliffs, NJ: PrenticeHall, 1984. [32] H.-M. Hang and J.-J. Chen, “Source model for transform video coder and its application,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 2, pp. 287–298, Apr. 1997. [33] L. W. Lee and J. F. Wang, “On the error distribution and scene change for the bit rate control of MPEG,” IEEE Trans. Consum. Electron., vol. 39, no. 8, pp. 545–554, Aug. 1993. [34] J. Ribas-Corbera and S. Lei, “Rate control in DCT video coding for low-delay communications,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 1, pp. 172–185, Feb. 1999. [35] Video Codec Test Model, Near-Term, Version 8 (TMN8), Release 0, ITU-T Study Group 16, Video Expert Group, Document Q15-A-59, Jun. 1997. [36] Z. He and S. K. Mitra, “A unified rate-distortion analysis framework for transform coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 12, pp. 1221–1236, Dec. 2001. [37] Z. He, Y. Kim, and S. K. Mitra, “A novel linear source model and a unified rate control algorithm for H.263/MPEG-2/MPEG-4,” presented at the Int. Conf. Acoustics, Speech, and Signal Processing, Salt Lake City, UT, May 2001. [38] Z. He, Y. K. Kim, and S. K. Mitra, “Low delay rate control for DCT video coding via -domain source modeling,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 8, Aug. 2001. [39] E. Y. Lam and J. W. Goodman, “A mathematical analysis of the DCT coefficient distributions for images,” IEEE Trans. Image Process., vol. 9, no. 10, pp. 1661–1666, Oct. 2000. [40] Joint Model Reference Software Version 8.4, JVT of ISO/IEC MPEG and ITU-T VCEG. [41] S. Ma, W. Gao, P. Gao, and Y. Lu, “Rate control for advance video coding (AVC) standard,” in Proc. Int. Symp. Circuits and Systems, ISCAS’03, vol. 2, May 2003, pp. 892–895. [42] S. Ma, W. Gao, F. Wu, and Y. Lu, “Rate control for advance video coding (AVC) standard,” in Proc. Int. Conf. Image Processing, ICIP’03, vol. 3, Sep. 2003, pp. 793–796. [43] Z. G. Li, W. Gao, F. Pan, S. W. Ma, G. N. Feng, K. P. Lim, X. Lin, S. Rahardja, and H. Q. Lu, “Adaptive rate control with HRD consideration,” in Proc. 8th JVT Meeting, Geneva, Switzerland, May 2003, pp. 23–27.

Nejat Kamaci (S’00) received the B.S. and M.S. degrees in electrical and electronics engineering from Bilkent University, Ankara, Turkey, in 1997 and 1999, respectively. He is currently working toward the Ph.D. degree at the Georgia Institute of Technology, Atlanta. Between 1997 and 1999, he also participated in a project for humanitarian landmine detection funded by the Turkish Ministry of Defense. He is the author of several technical papers and he has a pending patent in the area of digital video compression. His research interests include image and video compression, multimedia communications, radar signal processing, and multiple description coding for wireless communications.

Yucel Altunbasak (S’94–M’97–SM’03) received the M.S. and Ph.D. degrees from the University of Rochester, Rochester, NY, in 1993 and 1996, respectively. He is an Assistant Professor with the School of Electrical and Computer Engineering, Georgia Institute of Technology (Georgia Tech), Atlanta. He joined Hewlett-Packard Research Laboratories in July 1996. Meanwhile, he taught at Stanford University, Stanford, CA, and at San Jose State University, San Jose, CA, as a Consulting Assistant Professor. He joined the School of Electrical and Computer Engineering, Georgia Tech, in 1999. He is currently working on industrial- and government-sponsored projects related to multimedia networking, wireless video, video coding, genomics signal processing, and such inverse imaging problems as super-resolution and demosaicking. His research efforts to date resulted in over 100 peer-reviewed publications and 15 patents/patent applications. Some of his inventions have been licensed by the Office of Technology Licensing at Georgia Tech. Dr. Altunbasak is an Associate Editor for the IEEE TRANSACTIONS ON IMAGE PROCESSING, the IEEE TRANSACTIONS ON SIGNAL PROCESSING, Signal Processing: Image Communications, and the Journal of Circuits, Systems and Signal Processing. He served as the lead Guest Editor for the Special Issue on Wireless Video of Signal Processing: Image Communications. He has been elected to the IEEE Signal Processing Society’s IMDSP Technical Committee. He has served as a Co-Chair for “Advanced Signal Processing for Communications” Symposia at ICC’03. He will serve as the Technical Program Chair for ICIP-2006. He also served as a Track Chair at ICME’03, as a Panel Sessions Chair at ITRE’03, as a Session Chair at seven international conferences, and as a Panel Reviewer for government funding agencies. He is a coauthor for a conference paper that received the Best Student Paper Award at ICIP’03. He received the National Science Foundation (NSF) CAREER Award in 2002, and he is a recipient of the 2003 Outstanding Junior Faculty Award at the School of Electrical and Computer Engineering, Georgia Tech.

Russell M. Mersereau (S'69–M'73–SM'78–F'83) received the S.B. and S.M. degrees in 1969 and the Sc.D. degree in 1973 from the Massachusetts Institute of Technology, Cambridge. He joined the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, in 1975. His current research interests are in the development of algorithms for the enhancement, modeling, and coding of computerized images, synthetic aperture radar, and computer vision. In the past, this research has been directed to problems of reconstructing distorted signals from partial information about those signals, computer image processing and coding, the effect of image coders on human perception of images, and applications of digital signal processing methods in speech processing, digital communications, and pattern recognition. He is the coauthor of the text Multidimensional Digital Signal Processing (Englewood Cliffs, NJ: Prentice-Hall, 1984). Dr. Mersereau has served on the Editorial Board of the PROCEEDINGS OF THE IEEE and as Associate Editor for signal processing of the IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING and the IEEE SIGNAL PROCESSING LETTERS. He was the Vice President for Awards and Membership of the IEEE Signal Processing Society and a member of its Executive Board from 1999 to 2001. He is the corecipient of the 1976 Browder J. Thompson Memorial Prize of the IEEE for the best technical paper by an author under the age of 30, a recipient of the 1977 Research Unit Award of the Southeastern Section of the ASEE, and three teaching awards. He received the 1990 Society Award of the Signal Processing Society and an IEEE Millennium Medal in 2000.
