This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSVT.2016.2598672, IEEE Transactions on Circuits and Systems for Video Technology IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. **, NO. **, ** 2016
λ Domain Optimal Bit Allocation Algorithm for High Efficiency Video Coding
Li Li, Bin Li, Member, IEEE, Houqiang Li, Senior Member, IEEE, and Chang Wen Chen, Fellow, IEEE
Abstract—Rate control typically involves two steps: bit allocation and bitrate control. The bit allocation step can be implemented in various fashions depending on how many levels of allocation are desired and whether or not an optimal rate-distortion (R-D) performance is pursued. The bitrate control step simply aims to achieve the target bitrate as precisely as possible. In our recent research, we developed a λ-domain rate control algorithm capable of controlling the bitrate precisely for High Efficiency Video Coding (HEVC). The initial research [1] showed that bitrate control in the λ-domain can be more precise than the conventional schemes. However, the simple bit allocation scheme adopted in this initial research is unable to achieve an optimal R-D performance that reflects the inherent R-D characteristics governed by the video content. In order to achieve an optimal R-D performance, the bit allocation algorithms need to take into account the video content of a given sequence. The key issue in deriving a content-guided optimal bit allocation algorithm is to build a suitable R-D model that characterizes the R-D behavior of the video content. In this research, to complement the R-λ model developed in our initial work [1], a D-λ model is constructed to complete a comprehensive framework of λ-domain R-D analysis. Based on this framework, a suite of optimal bit allocation algorithms is developed. In particular, we design both picture level and Basic Unit level bit allocation algorithms based on the fundamental rate-distortion optimization (RDO) theory to take full advantage of content-guided principles. The proposed algorithms are implemented in the HEVC reference software, and the experimental results demonstrate that they achieve a clear R-D performance improvement with a smaller bitrate control error.
The proposed bit allocation algorithms have already been adopted by the Joint Collaborative Team on Video Coding (JCT-VC) and integrated into the HEVC reference software.
Index Terms—R-D analysis, R-λ model, D-λ model, bit allocation, rate control, HEVC
Manuscript received October 12, 2015; revised April 27, 2016 and July 6, 2016; accepted July 16, 2016. This work was recommended by Associate Editor Yao Wang. This work was supported in part by the 973 Program under Contract 2015CB351803, the National Key Research and Development Plan under Grant No. 2016YFC0801001, and the Natural Science Foundation of China (NSFC) under Contracts 61325009, 61390510, and 61272316. L. Li, H. Li, and C. W. Chen are with the Chinese Academy of Sciences Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei 230027, China. C. W. Chen is also with the State University of New York at Buffalo. B. Li is with Microsoft Asia. Professor Houqiang Li is the corresponding author. (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).
Copyright (c) 2016 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to [email protected].
Fig. 1. A typical R-D curve (axes: R and D; a point on the curve is labeled "Best combination").
I. INTRODUCTION
Rate control typically involves two steps: bit allocation and bitrate control. The bit allocation step can be implemented in
various fashions depending on how many levels of allocation are desired and whether or not an optimal rate-distortion (R-D) performance is pursued. The bitrate control step simply aims to achieve the target bitrate as precisely as possible. In our recent research, we developed a λ-domain rate control algorithm [1] [2] capable of controlling the bitrate precisely for High Efficiency Video Coding (HEVC) [3]. The Lagrange multiplier λ is the negative of the slope of the line tangent to the R-D curve, as shown in Fig. 1 [4]. When encoding a frame, λ determines the optimization target J = D + λR, which is usually called the R-D cost. During the encoding process, the encoder traverses all possible combinations of encoding parameters (mode, motion, and QP) to find the one with the minimum R-D cost. The initial research showed that bitrate control in the λ-domain can be more precise than the conventional schemes. However, the simple bit allocation scheme adopted in our initial research is unable to achieve an optimal R-D performance that reflects the inherent R-D characteristics governed by the video content.
Similar to most bit allocation algorithms, the bit allocation in our initial research is divided into three levels: Group of Pictures (GOP) level, picture level, and Basic Unit (BU) level. The BU is the basic unit of rate control. In HEVC, a BU always consists of one or several consecutive coding units (CUs). The size of a BU determines the granularity of a rate control scheme. It should be noted that the proposed bit allocation algorithm targets the constant bitrate (CBR) case. In our scheme, although the bitrate is not constant from picture to picture, it is constant over a GOP or a predefined sliding window. In the CBR scenario, the GOP level bit allocation simply aims to make the bit stream better adapted to the bandwidth rather than to improve the R-D performance.
1051-8215 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Therefore, in our initial work and also in this paper, the GOP level target bits are determined by the current network and buffer status [5] [6]. For the picture level bit allocation in our initial work, a simple fixed weight bit allocation is used, which assigns a fixed percentage of bits to the different pictures in a GOP regardless of the characteristics of the video sequence. Such a fixed bit allocation approach cannot adapt to the different R-D characteristics governed by the video content and is therefore unable to achieve an optimal R-D performance. For the BU level bit allocation in our initial work, the bits of each BU are assigned according to the mean absolute difference (MAD) between the original values and the predicted values of the collocated BU. The larger the MAD of a BU is, the more bits are assigned to the BU, and vice versa. However, the MAD alone, which only describes the characteristics of distortion, cannot reflect the inherent R-D characteristic of a specified BU. Therefore, the MAD-based BU level bit allocation cannot achieve a good R-D performance either.
In order to achieve an optimal R-D performance, the bit allocation algorithms need to take into account the video content of a given sequence. The key issue in deriving a content-guided optimal bit allocation algorithm is to build a suitable R-D model that properly characterizes the R-D behavior of the video content. One class of bit allocation algorithms, which uses Q (quantization) to model the R-D behavior, is referred to as Q-domain R-D analysis [7] [8]. Another class, which uses ρ (the percentage of zeros among the quantized transform coefficients) to model the R-D behavior, is called ρ-domain R-D analysis [9] [10]. In Q-domain or ρ-domain R-D analysis, it is assumed that Q or ρ is the key factor that determines the bitrate and the corresponding distortion.
However, neither Q nor ρ can determine the non-residue bits, such as those spent on mode and motion. HEVC supports very flexible coding unit (CU) and transform unit (TU) splitting, hence the non-residue bits increase dramatically compared with the previous coding standards. This magnifies the shortcoming that Q and ρ are unable to determine the non-residue bits. Therefore, neither the Q-domain nor the ρ-domain R-D analysis framework is suitable for HEVC.
In our previous studies [1] [2], we have analyzed and confirmed that λ is the fundamental factor that determines the bitrate. We have also proposed an R-λ model to characterize the relationship between R and λ for HEVC. However, the R-λ model can only reflect part of the inherent R-D characteristics since the distortion is not taken into consideration. To design the optimal bit allocation algorithms, the inherent R-D characteristics governed by the video content must be fully utilized. Therefore, in this research, to complement the R-λ model developed in our initial work, a D-λ model that characterizes the relationship between D and λ is constructed to complete a comprehensive framework of λ-domain R-D analysis. Based on this comprehensive λ-domain R-D analysis framework, a suite of optimal bit allocation algorithms has been developed. In particular, we design both picture level and BU level bit allocation algorithms based on the fundamental rate-distortion optimization (RDO) theory to
take full advantage of content-guided principles.
A. Related work
Many studies have addressed picture level bit allocation. The picture level bit allocation algorithms can be classified into two categories: Independent Bit Allocation (IBA) and Dependent Bit Allocation (DBA). As suggested by their names, the IBA methods [11] [12] ignore the influence of the current picture on the subsequent pictures, while the DBA methods take the inter picture dependency into consideration. Because the IBA methods ignore the inter picture dependency, the coding performance gap between the IBA methods and the DBA methods can be quite significant, as verified by Ramchandran et al. [13]. Therefore, most recent schemes focus on the DBA methods. For example, Hu et al. propose a picture level bit allocation algorithm based on a linear Q-domain R-D analysis for H.264/AVC [14]. Wang et al. propose a picture level bit allocation algorithm based on the ρ-domain R-D analysis for HEVC [15]. Pang et al. formulate the picture level bit allocation problem as a convex optimization problem [16]. However, all these methods consider the inter picture dependency under quite simple reference structures. For example, both Hu et al. [14] and Wang et al. [15] assume that the number of reference frames in both reference lists for B frames is 1. Pang et al. further simplify the problem by assuming that the reference frame of the current frame is the immediately preceding frame [16]. Besides, all these schemes consider Q or ρ as the critical factor that determines the bitrate and distortion, which makes them unsuitable for HEVC.
Besides the picture level bit allocation, the BU level bit allocation has also been studied by a number of researchers. Yuan et al. use the statistical characteristics of the current frame to improve the R-Q model accuracy of each BU and propose a BU level bit allocation scheme using the improved R-D model [17]. He et al.
propose to use a ρ-domain R-D model to characterize the R-D relationship and provide a BU level bit allocation algorithm based on the ρ-domain R-D analysis [18]. However, as indicated earlier, the parameters Q and ρ are unsuitable as the critical factor for the analysis of the R-D behavior for HEVC. Besides, there exists a very serious "chicken and egg" dilemma between rate distortion optimization (RDO) and rate control when a Q-domain or ρ-domain rate control algorithm is adopted. In addition, Ferguson et al. take the dependency between different BUs due to intra prediction into consideration to make the R-D model even more accurate [19]. However, as only very few blocks choose the intra prediction mode in inter frames, the dependency between different BUs is very weak for inter frames. Therefore, the scheme proposed in [19] can only improve the R-D performance slightly for inter frames. Such an insignificant dependency is therefore not considered in the BU level bit allocation proposed in this research.
All the works above consider either the picture level or the BU level bit allocation alone. There have also been many studies that combine both picture level and BU level bit allocation into an integrated framework. For instance, Seo et al. propose a new R-Q model considering both Variance Of Difference
(VOD) and MAD to improve the accuracy of the R-Q model. An optimal picture level bit allocation algorithm is proposed based on the Q-domain R-D analysis [20]. However, the BU level target bits are not optimized using the fundamental R-D theory, but rather determined by the estimated VOD and MAD. Tao et al. propose to use a parametric R-D model to facilitate both the picture level and BU level bit allocations [21]. However, the picture level bit allocation is not optimized by considering the inter picture dependency but is simply implemented by assigning more bits to the more important I and P frames and fewer bits to the less important B frames.
Besides the above mentioned Q-domain and ρ-domain bit allocation algorithms, there are also some studies that try to improve the λ-domain rate control algorithm we developed previously. For example, Si et al. use the sum of absolute transformed difference (SATD) instead of the MAD as the measurement for BU level bit allocation [22]. Karczewicz et al. also use the SATD to improve the bitrate accuracy of the λ-domain intra frame rate control [23]. However, to the best of our knowledge, this is the first work that optimizes bit allocation for the λ-domain rate control algorithm.
B. Our Contribution
The contribution of this paper is summarized as follows. First, a comprehensive λ-domain R-D analysis framework has been developed. To complement the R-λ model proposed in [1] [2], a D-λ model is developed in the λ-domain to characterize the relationship between D and λ in HEVC. The R-λ and D-λ models together form a comprehensive λ-domain R-D analysis framework for HEVC. Second, based on this framework, a content-guided optimal picture level bit allocation algorithm is developed according to the fundamental RDO theory.
The optimal picture level bit allocation can be achieved by setting the λ ratio between different pictures inversely proportional to their influence on the whole sequence. The more important a picture is, the more bits are assigned to it, so that it is coded with better quality and the overall R-D performance is optimized. Different from the fixed weight picture level bit allocation algorithm adopted in our previous work, the proposed picture level bit allocation algorithm can accommodate the inherent R-D characteristics of various sequences since the parameters of the R-D model for each picture are content-related. Therefore, the adaptive bit allocation can adjust the percentage of bits between different pictures according to the video content so that more important pictures are always coded with better quality, which leads to better R-D performance. Third, we also develop a content-guided optimal BU level bit allocation algorithm, again according to the fundamental RDO theory. The BU level bit allocation can be achieved by setting an equal λ for the different BUs in the same picture, as they can be regarded as independent of each other in inter pictures. Different from the MAD based BU level bit allocation algorithm adopted in our previous work, the proposed algorithm can adapt to the inherent R-D characteristic of each BU since
the parameters of the R-D model for each BU are content-related, and thus it can lead to better R-D performance. Notice that the proposed picture level and BU level bit allocation algorithms have already been adopted by the Joint Collaborative Team on Video Coding (JCT-VC) [24] and integrated into the HEVC reference software [25].
This paper is organized as follows. Section II derives the D-λ model to complete a comprehensive λ-domain R-D analysis framework. Based on this framework, the proposed content-guided optimal picture level and BU level bit allocation algorithms are described in detail in Sections III and IV. The experimental results are shown in Section V. The benefits brought by the picture level and BU level bit allocation individually, as well as the benefits of combining them, are discussed in detail. Finally, Section VI concludes this paper with a summary.
II. λ-DOMAIN R-D ANALYSIS FRAMEWORK
In this section, we present a comprehensive λ-domain R-D analysis framework which consists of the R-λ model and the D-λ model. As the R-λ model has already been described in detail in [1] [2], we focus on the description and verification of the proposed D-λ model. Before presenting the derivation of the D-λ model, we should first choose a suitable R-D model. Several types of R-D models have been developed to characterize the relationship between R and D, including the Exponential function [26] and the Hyperbolic function [27] [28]. It has been demonstrated in [1] [2] that the Hyperbolic function expressed in Equation (1) is more suitable for HEVC,
$$D(R) = C R^{-K} \tag{1}$$
where C and K are model parameters related to the characteristics of the video content. Therefore, the Hyperbolic function will be adopted to derive the D-λ model in this research. It is well known that λ is the negative of the slope of the line tangent to the R-D curve, thus λ can be expressed as follows,
$$\lambda = -\frac{\partial D}{\partial R} = CK \cdot R^{-K-1} \tag{2}$$
It should be mentioned here that Equation (2) can be expressed in another form,
$$R = \left(\frac{\lambda}{CK}\right)^{-\frac{1}{K+1}} \triangleq \alpha \lambda^{\beta} \tag{3}$$
where α is equal to $\left(\frac{1}{CK}\right)^{-\frac{1}{K+1}}$ and β is equal to $-\frac{1}{K+1}$. It should be emphasized that α and β are model parameters related to the video content. In Section III and Section IV, we will demonstrate that α and β are the key parameters for the bit allocation algorithms to adapt to the video content. Equation (3) is exactly the R-λ model reported in [1] [2], which represents a one-to-one correspondence between R and λ. If we combine Equations (1) and (3), we will have
$$D = C \cdot \left(\frac{\lambda}{CK}\right)^{\frac{K}{K+1}} \triangleq \alpha_1 \lambda^{\beta_1} \tag{4}$$
where $\alpha_1$ is equal to $C^{\frac{1}{K+1}} \cdot K^{-\frac{K}{K+1}}$ and $\beta_1$ is equal to $\frac{K}{K+1}$. Equation (4) presents a one-to-one correspondence between D and λ. To verify the D-λ model as derived, we conduct a number of experiments using the IBBB coding structure (i.e., only the first picture is an I picture, followed by a number of B pictures) with only one reference picture and flat QP in HM-11.0 [25]. Then we fit the curve according to (4), as shown in Fig. 2. It should be mentioned that D is expressed in terms of the MSE (Mean Square Error) of the Luma component and calculated by
$$MSE = \frac{1}{N} \sum_{i} (Rec_i - Org_i)^2 \tag{5}$$
where N is the total number of pixels in a picture, and $Rec_i$ and $Org_i$ are the pixel values in the reconstructed and original pictures, respectively. Considering that I pictures and B pictures have different R-D characteristics, only the MSE of B pictures is taken into account. The sequences used in Fig. 2 are in WVGA format (the resolution is 832 × 480) as specified in [29]. The coefficient of determination $R^2$ is used to measure how well Equation (4) characterizes the relationship between D and λ. From Fig. 2, we can see that the average coefficient of determination is greater than 0.99, which means that the relationship between D and λ in HEVC matches the model in Equation (4) very well. Such a D-λ model, along with the R-λ model proposed in our initial research, forms a comprehensive framework of λ-domain R-D analysis.
For the convenience of the derivation of the content-guided picture level and BU level bit allocation algorithms in the next sections, $\frac{\partial R}{\partial \lambda}$ and $\frac{\partial D}{\partial \lambda}$ are derived in advance,
$$\frac{\partial R}{\partial \lambda} = \left(\frac{1}{CK}\right)^{-\frac{1}{K+1}} \cdot \left(-\frac{1}{K+1}\right) \cdot \lambda^{-\frac{K+2}{K+1}} \tag{6}$$
$$\frac{\partial D}{\partial \lambda} = C \cdot \left(\frac{1}{CK}\right)^{\frac{K}{K+1}} \cdot \frac{K}{K+1} \cdot \lambda^{-\frac{1}{K+1}} \tag{7}$$
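The closed-form relations in Equations (3) and (4) can be checked numerically. The following Python sketch is illustrative only: the function names `rd_model_params` and `fit_power_law` and the sample values C = 50 and K = 1.2 are assumptions made here, not values from the paper or from the HM software. The fit is an ordinary least-squares line in log-log space, which is one common way to fit a power law; the paper does not state which fitting procedure produced Fig. 2, and the coefficient of determination below is computed on the logarithms.

```python
import math

def rd_model_params(C, K):
    """Map the hyperbolic model D = C * R**(-K) to the lambda-domain parameters.

    Returns (alpha, beta, alpha1, beta1) such that
    R = alpha * lam**beta (Equation (3)) and D = alpha1 * lam**beta1 (Equation (4)).
    """
    alpha = (1.0 / (C * K)) ** (-1.0 / (K + 1.0))
    beta = -1.0 / (K + 1.0)
    alpha1 = C ** (1.0 / (K + 1.0)) * K ** (-K / (K + 1.0))
    beta1 = K / (K + 1.0)
    return alpha, beta, alpha1, beta1

def fit_power_law(lams, dists):
    """Least-squares fit of D = a * lam**b in log-log space.

    Returns (a, b, r2), where r2 is the coefficient of determination
    of the straight-line fit to (log lam, log D).
    """
    xs = [math.log(v) for v in lams]
    ys = [math.log(v) for v in dists]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot

# Hypothetical content parameters, chosen only for illustration.
C, K = 50.0, 1.2
alpha, beta, alpha1, beta1 = rd_model_params(C, K)

# Synthetic (lambda, D) samples generated from the hyperbolic model itself.
lams = [10.0, 20.0, 40.0, 80.0, 160.0]
dists = [C * (alpha * lam ** beta) ** (-K) for lam in lams]

a, b, r2 = fit_power_law(lams, dists)
print("fitted alpha1, beta1, R^2:", a, b, r2)
print("closed-form alpha1, beta1:", alpha1, beta1)
```

On real encoder measurements the samples would not lie exactly on the model, so the fitted values and the coefficient of determination would differ from this idealized case.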
III. THE PICTURE LEVEL BIT ALLOCATION
In this section, we will introduce the proposed content-guided picture level bit allocation algorithm based on the comprehensive λ-domain R-D analysis framework. Some of the notations used in the remainder of this paper are summarized in Table I.

TABLE I
NOTATIONS FOR BIT ALLOCATION IN THIS PAPER
Notation | Explanation
$N_P$ | The number of pictures in a sequence
$N_G$ | The number of pictures in a GOP
$R_G$ | The target bits of a GOP
$Rem_{P_i}$ | The target bits of a GOP after encoding picture i − 1
$D_{P_i}$ | The distortion of picture i
$TB_{P_i}$ | The target bits of picture i before encoding the GOP
$TA_{P_i}$ | The target bits of picture i after encoding picture i − 1
$R_{P_i}$ | The target bits of picture i
$Tar_{B_i}$ | The target bits of the BUs from $BU_i$ onward, before encoding the picture
$Rem_{B_i}$ | The target bits of the BUs from $BU_i$ onward, after encoding $BU_{i-1}$
$\lambda_{P_i}$ | The lambda of picture i
$\lambda_{P_L}$ | The lambda of the last picture in the previous GOP
$N_B$ | The number of BUs in a picture
$D_{B_i}$ | The distortion of $BU_i$
$T_{B_i}$ | The target bits of $BU_i$ before encoding the picture
$R_{B_i}$ | The target bits of $BU_i$
$\lambda_{B_i}$ | The lambda of $BU_i$

One important objective of the picture level bit allocation is to minimize the overall distortion between the reconstructed video sequence and the original video sequence under the constraint of the GOP level target bits. Since $\lambda_{P_i}$ is the key factor to determine the bitrate and distortion of a specified picture, the optimal picture level bit allocation can be formulated as selecting an appropriate $\lambda_{P_i}$ for each picture to minimize the total distortion of all the pictures subject to the constraint that the number of bits consumed should be equal to or less than the number of GOP level target bits. That is,
$$\min_{\lambda_{P_1}, \lambda_{P_2}, \cdots, \lambda_{P_{N_G}}} \sum_{i=1}^{N_P} D_{P_i} \quad \text{s.t.} \quad \sum_{i=1}^{N_G} R_{P_i} \le R_G \tag{8}$$
By applying the Lagrangian multiplier method, the constrained optimization problem can be converted into the following unconstrained problem,
$$\min_{\lambda_{P_1}, \lambda_{P_2}, \cdots, \lambda_{P_{N_G}}} \sum_{i=1}^{N_P} D_{P_i} + \lambda \sum_{i=1}^{N_G} R_{P_i} \tag{9}$$
where λ is the Lagrangian multiplier for the optimization problem. The unconstrained problem can be solved by setting the following derivatives to zero,
$$\frac{\partial \sum_{i=1}^{N_P} D_{P_i}}{\partial \lambda_{P_j}} + \lambda \frac{\partial \sum_{i=1}^{N_G} R_{P_i}}{\partial \lambda_{P_j}} = 0, \quad j = 1, 2, ..., N_G \tag{10}$$
As the pictures to be encoded after the current picture may use the current picture as their reference, the quality of the current picture usually has significant influence on the quality of the subsequent pictures. Therefore, there exists the so-called "quality dependency" between the current picture and its subsequent pictures. Such a relationship between the current picture and its subsequent pictures can be expressed as
$$\frac{\partial D_{P_i}}{\partial D_{P_j}} \ge 0, \quad \text{if } i \ge j \tag{11}$$
It should be mentioned that i and j refer to the encoding order rather than the display order. Different from the "quality dependency", the "bitrate dependency" can be very small between different pictures due to rate control, which has been verified by numerous existing bit allocation schemes [14] [15],
$$\frac{\partial R_{P_i}}{\partial R_{P_j}} \approx 0, \quad \text{if } i \ge j \tag{12}$$
Combining Equations (10), (11) and (12), we will have
$$\frac{\partial \sum_{i=j}^{N_P} D_{P_i}}{\partial D_{P_j}} \cdot \frac{\partial D_{P_j}}{\partial \lambda_{P_j}} + \lambda \frac{\partial R_{P_j}}{\partial \lambda_{P_j}} = 0, \quad j = 1, 2, ..., N_G \tag{13}$$
where $\frac{\partial \sum_{i=j}^{N_P} D_{P_i}}{\partial D_{P_j}}$ represents the influence of the current picture on the whole sequence. More specifically, this can be split into the influence on the current picture itself and on its subsequent pictures as follows,
$$\frac{\partial \sum_{i=j}^{N_P} D_{P_i}}{\partial D_{P_j}} = 1 + \frac{\partial \sum_{i=j+1}^{N_P} D_{P_i}}{\partial D_{P_j}} \triangleq 1 + \theta_{P_j} \tag{14}$$
In Equation (14), $\theta_{P_j}$ denotes the influence of the current picture on its subsequent pictures. It should be noted that the larger the influence of the current picture on the subsequent pictures, the larger the value of $1 + \theta_{P_j}$ will be. Dividing Equation (7) by Equation (6) shows that $\frac{\partial D}{\partial \lambda} = -\lambda \frac{\partial R}{\partial \lambda}$ for each picture. Taking this into consideration, Equation (13) can be solved as
$$\lambda_{P_j} = \frac{\lambda}{\frac{\partial \sum_{i=j}^{N_P} D_{P_i}}{\partial D_{P_j}}} = \frac{\lambda}{1 + \theta_{P_j}} \triangleq \omega_{P_j} \lambda, \quad j = 1, 2, ..., N_G \tag{15}$$
Then the R-D cost of each picture can be written as
$$J = D_P + \lambda_P R_P = D_P + \frac{\lambda}{1 + \theta_P} R_P \tag{16}$$
Equation (16) is equivalent to
$$J' = (1 + \theta_P) D_P + \lambda R_P \tag{17}$$
Equation (17) implies that if we use the same optimization target for each picture, the distortion of the more important pictures will be translated into a larger R-D cost. Therefore, we should encode the more important pictures with smaller distortion. To make the physical interpretation of Equation (15) more obvious, we rewrite Equation (15) as
$$\frac{\lambda_{P_i}}{\lambda_{P_j}} = \frac{1 + \theta_{P_j}}{1 + \theta_{P_i}} = \frac{\omega_{P_i}}{\omega_{P_j}}, \quad i, j = 1, 2, ..., N_G \tag{18}$$
Equation (18) indicates that the λ value of the current picture should be inversely proportional to its influence on the whole sequence in order to optimize the picture level bit allocation. If the current picture has a far-reaching influence on the whole sequence, the parameter ω should be quite small for the current picture. Such a small ω will lead to a smaller λ and result in a larger amount of bits assigned to the current picture. When more bits are assigned to more important pictures, an improvement of the overall R-D performance can be achieved.

Fig. 2. The relationship between distortion D (Luma MSE) and the Lagrange multiplier λ. Panels: RaceHorses, BasketballDrill, PartyScene, BQMall; horizontal axis: lambda (0 to 250); vertical axis: MSE. Fitted curves reported in the panels: D = 1.1753λ^0.6856 (R² = 1), D = 1.7324λ^0.5703 (R² = 0.9982), D = 1.5092λ^0.5976 (R² = 0.9998), D = 1.5174λ^0.7664 (R² = 0.9994).

As we have discussed, the target bits at GOP level should be determined according to the bandwidth and the current buffer conditions. The sum of the target bits of all the pictures in a GOP should be equal to the GOP level target bits, i.e.,
$$\sum_{i=1}^{N_G} R_{P_i} = R_G \tag{19}$$
Combining Equations (3), (15) and (19), we can have
$$\sum_{i=1}^{N_G} \alpha_{P_i} \lambda_{P_i}^{\beta_{P_i}} = \sum_{i=1}^{N_G} \alpha_{P_i} (\omega_{P_i} \lambda)^{\beta_{P_i}} = R_G \tag{20}$$
As $\alpha_{P_i}$, $\beta_{P_i}$ and $\omega_{P_i}$ are all known, there is only one unknown parameter λ in Equation (20) (the detailed setting of $\alpha_{P_i}$, $\beta_{P_i}$ and $\omega_{P_i}$ will be introduced later on). It is still difficult to obtain an analytical solution since $\beta_{P_i}$ is always a negative, non-integer number. However, since the left side of Equation (20) is a monotonically decreasing function of λ, the equation can be solved with commonly used numerical methods. In our experiment, we use the Bisection method [30] with up to 20 iterations to solve Equation (20).
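The numerical solution of Equation (20) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference software implementation: the function names `total_bits` and `solve_lambda` are hypothetical, and the per-picture values of α, β and ω in the example are made up for a GOP of four pictures.

```python
def total_bits(lam, alphas, betas, omegas):
    """Left side of Equation (20): sum over pictures of alpha_i * (omega_i * lam) ** beta_i."""
    return sum(a * (w * lam) ** b for a, b, w in zip(alphas, betas, omegas))

def solve_lambda(target_bits, alphas, betas, omegas, lo=0.0, hi=10000.0, iters=20):
    """Bisection search for the lam in [lo, hi] at which total_bits equals target_bits.

    Relies on total_bits being decreasing in lam, which holds when every
    alpha_i and omega_i is positive and every beta_i is negative.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_bits(mid, alphas, betas, omegas) > target_bits:
            lo = mid  # too many bits at mid, so the solution has a larger lam
        else:
            hi = mid  # too few (or exactly enough) bits, so the solution is at or below mid
    return 0.5 * (lo + hi)

# Hypothetical model parameters for a GOP of four pictures (illustration only).
alphas = [20000.0, 8000.0, 12000.0, 8000.0]
betas = [-1.0, -1.2, -1.1, -1.2]
omegas = [1.0, 3.0, 2.0, 3.0]

# Build a target from a known lam so that the recovered value can be compared with it.
true_lam = 57.3
gop_target = total_bits(true_lam, alphas, betas, omegas)
print("recovered lam:", solve_lambda(gop_target, alphas, betas, omegas))
```

Each iteration halves the search interval, so after 20 iterations an initial interval of width 10000 shrinks to 10000 / 2^20, which is below 0.01.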
When λ is in the range of [0, 10000], 20 iterations achieve a precision of 0.01. After the Lagrange multiplier λ is obtained, the $\lambda_{P_i}$ for each picture can be obtained through Equation (15), derived from the λ-domain R-D analysis framework consisting of the R-λ model and the D-λ model. Therefore, an adaptive weighting factor for the bit allocation of each picture in the GOP can be derived using the R-λ model as
$$\Omega_{P_i} = \alpha_{P_i} \lambda_{P_i}^{\beta_{P_i}} = \alpha_{P_i} (\omega_{P_i} \lambda)^{\beta_{P_i}}, \quad i = 1, 2, ..., N_G \tag{21}$$
From Equation (21), we know that the adaptive weighting factor $\Omega_{P_i}$ is determined by four parameters: λ, $\alpha_{P_i}$, $\beta_{P_i}$, and $\omega_{P_i}$. Except for λ, which is the Lagrange multiplier, the parameters $\alpha_{P_i}$, $\beta_{P_i}$, and $\omega_{P_i}$ are all related to the video content. The parameters $\alpha_{P_i}$ and $\beta_{P_i}$ are able to completely characterize the R-D behavior of the current picture. They will be updated in a picture by picture fashion to accommodate the dynamics of the video content. The parameter $\omega_{P_i}$ reflects the influence of the current picture on the subsequent pictures, which is also content-related. Therefore, the proposed picture level bit allocation algorithm takes full advantage of content-guided principles. The content-guided principles make the bits allocated to each picture suitable for the current video content and thus will lead to an enhanced R-D performance. Finally, the target bits for each picture can be computed through the following linear combination,
$$R_{P_i} = \theta \cdot TA_{P_i} + (1 - \theta) \cdot TB_{P_i}, \quad \theta \in [0, 1] \tag{22}$$
where $TA_{P_i}$ is the target bits after encoding picture i − 1 and $TB_{P_i}$ is the target bits before encoding the current GOP. They can be calculated through
$$TA_{P_i} = Rem_{P_i} \cdot \frac{\Omega_{P_i}}{\sum_{j=i}^{N_G} \Omega_{P_j}}, \quad i = 1, 2, ..., N_G \tag{23}$$
$$TB_{P_i} = R_G \cdot \frac{\Omega_{P_i}}{\sum_{j=1}^{N_G} \Omega_{P_j}}, \quad i = 1, 2, ..., N_G \tag{24}$$
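Equations (21) to (24) amount to a weighted split of the GOP budget. The Python sketch below shows one way to write them down; the function names `picture_weights` and `picture_target_bits`, the default θ = 0.5, and all numeric values are assumptions for illustration and are not taken from the paper or the reference software. Pictures are indexed from 0 here, whereas the equations index them from 1.

```python
def picture_weights(lam, alphas, betas, omegas):
    """Equation (21): Omega_i = alpha_i * (omega_i * lam) ** beta_i for each picture in the GOP."""
    return [a * (w * lam) ** b for a, b, w in zip(alphas, betas, omegas)]

def picture_target_bits(i, weights, gop_bits, remaining_bits, theta=0.5):
    """Target bits of picture i (0-based) following Equations (22) to (24).

    gop_bits       -- GOP level target bits (R_G)
    remaining_bits -- bits left in the GOP budget before picture i is encoded (Rem)
    theta          -- blending factor in [0, 1]; 0.5 is an arbitrary illustrative default
    """
    ta = remaining_bits * weights[i] / sum(weights[i:])  # Equation (23)
    tb = gop_bits * weights[i] / sum(weights)            # Equation (24)
    return theta * ta + (1.0 - theta) * tb               # Equation (22)

# Hypothetical weights for a GOP of four pictures and an 8000-bit GOP budget.
weights = [4.0, 1.0, 2.0, 1.0]
gop_bits = 8000.0

# First picture: nothing has been spent yet, so the remaining budget equals the GOP budget.
first = picture_target_bits(0, weights, gop_bits, remaining_bits=8000.0)

# Second picture, assuming the first one actually consumed 4400 bits instead of its target.
second = picture_target_bits(1, weights, gop_bits, remaining_bits=3600.0)
print(first, second)
```

In this sketch, when every earlier picture meets its target exactly, the two estimates TA and TB coincide and the choice of θ has no effect.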
At the beginning of encoding each sequence, the larger $\theta$ is, the easier it is for the current GOP to achieve the GOP level target bits. However, since the picture level target bits may vary considerably when $\theta$ is large, it becomes much more difficult for the R-λ model parameters to converge quickly, and vice versa. After the R-λ model parameters have converged, each picture is able to achieve its own target bits. In this case, the difference between $T_{AP_i}$ and $T_{BP_i}$ is quite small and the value of $\theta$ is no longer important.

In HEVC, there exist two typical encoding structures whose reference relationship is very flexible. One is defined for the Low Delay (LD) case as shown in Fig. 3 (a), while the other is for the Random Access (RA) case as shown in Fig. 3 (b). For the λ-domain rate control algorithm, if hierarchical bit allocation is enabled, pictures may belong to different levels. Taking the LD case as an example, there are 3 different levels as shown in Fig. 3 (a). Intuitively, since picture $f_{4n}$ belongs to level 0, which is the most frequently referenced level, it should be assigned the most bits. Picture $f_{4n+2}$ belongs to level 1, so it should be assigned a moderate number of bits according to its importance. Pictures $f_{4n+1}$ and $f_{4n+3}$ belong to level 2, so they should be assigned the fewest bits. The situation is similar for the RA case, which has 4 levels as shown in Fig. 3 (b).

Fig. 3. The typical encoding structures in HEVC: (a) encoding structure for LD case; (b) encoding structure for RA case.

The R-λ model parameters $\alpha$ and $\beta$ of the current picture are obtained from the updating process of the previous picture in the same level. In the LD case, this means that $\alpha$ and $\beta$ of pictures $f_{4n}$, $f_{4n+1}$, and $f_{4n+2}$ are known once the previous GOP has been encoded, while $\alpha$ and $\beta$ of picture $f_{4n+3}$ only become available after picture $f_{4n+1}$ in the current GOP has been encoded. However, the parameter $\Omega_{P_i}$ is calculated before encoding the current GOP. To resolve this mismatch, $\alpha$ and $\beta$ of picture $f_{4n+3}$ are estimated using those of picture $f_{4n+1}$, as they belong to the same level. In the RA case, the parameters $\alpha$ and $\beta$ of the pictures in different levels are obtained in the same way.

For the setting of $\omega_{P_i}$, the multiple-pass encoding method proposed in [31] provides an optimal way. However, in this research, we set $\omega_{P_i}$, which decides the λ ratio between different pictures, in the same way as the HEVC default configuration in order to reduce the encoder complexity. It has been verified that the HEVC default configuration is beneficial to R-D performance. In the HEVC default setting, pictures belonging to the same level use the same λ. Thus, the same $\omega_{P_i}$ is applied to the pictures in the same level in the proposed picture level bit allocation algorithm. Without loss of generality, we set $\omega_{P_i}$ for the pictures in level 0 as the basis for bit allocation. As shown in Table II, $\omega_{P_i}$ is related to $\lambda_{P_L}$, the λ of the last picture in the previous GOP. Table II (a) shows the setting of $\omega_{P_i}$ for the LD case, while Table II (b) shows the setting for the RA case. For the first GOP, $\lambda_{P_L}$ is unavailable. Therefore, the initial percentage of bits between different pictures is set according to the fixed weight bit allocation reported in [1].
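The settings of Table II can be expressed as a small lookup. The sketch below is only an illustration: the names `TABLE_II` and `omega_weight` are hypothetical, the coefficients are transcribed from Table II, and the handling of the first GOP (where λ of the last picture in the previous GOP is unavailable) is deliberately left out.

```python
import math

# Per structure: (threshold on lambda_PL,
#                 [(a, b) per level, for omega = a * ln(lambda_PL) + b],
#                 [omega per level at or above the threshold]).
TABLE_II = {
    "LD": (120.0,
           [(0.0, 1.0), (0.725, 0.5793), (0.943, 0.7531)],
           [1.0, 4.0, 5.0]),
    "RA": (90.0,
           [(0.0, 1.0), (0.725, 0.7963), (0.943, 1.0352), (2.356, 2.5880)],
           [1.0, 4.0, 5.0, 12.3]),
}


def omega_weight(structure, level, lambda_pl):
    """Return omega_P for a hierarchy level, following Table II."""
    threshold, log_coeffs, saturated = TABLE_II[structure]
    if lambda_pl >= threshold:
        return saturated[level]
    a, b = log_coeffs[level]
    return a * math.log(lambda_pl) + b


for level in range(3):
    print(level, omega_weight("LD", level, 50.0))
```

Level 0 always returns 1.0, reflecting its role as the basis for bit allocation.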
TABLE II
ω_Pi FOR HIERARCHICAL BIT ALLOCATION

(a) LD case
level   ω_Pi (λ_PL < 120)           ω_Pi (λ_PL ≥ 120)
0       1.0                         1.0
1       0.725 ln(λ_PL) + 0.5793     4.0
2       0.943 ln(λ_PL) + 0.7531     5.0

(b) RA case
level   ω_Pi (λ_PL < 90)            ω_Pi (λ_PL ≥ 90)
0       1.0                         1.0
1       0.725 ln(λ_PL) + 0.7963     4.0
2       0.943 ln(λ_PL) + 1.0352     5.0
3       2.356 ln(λ_PL) + 2.5880     12.3

IV. THE BU LEVEL BIT ALLOCATION

In this section, we introduce the proposed content-guided BU level bit allocation algorithm based on the comprehensive λ-domain R-D analysis framework. The BU level bit allocation aims at minimizing the distortion of the current picture under the constraint of the picture level target bits. Similar to the picture level bit allocation, a suitable $\lambda_{B_i}$ value should be selected for each BU, i.e.,

$$\min_{\lambda_{B_1}, \lambda_{B_2}, \cdots, \lambda_{B_{N_B}}} \sum_{i=1}^{N_B} D_{B_i} \quad \text{s.t.} \quad \sum_{i=1}^{N_B} R_{B_i} \leq R_P \tag{25}$$

where $R_P$ is the picture level target bits determined by the picture level bit allocation. Likewise, the above constrained optimization problem can be converted into the following unconstrained optimization problem,

$$\min_{\lambda_{B_1}, \lambda_{B_2}, \cdots, \lambda_{B_{N_B}}} \sum_{i=1}^{N_B} D_{B_i} + \lambda \sum_{i=1}^{N_B} R_{B_i} \tag{26}$$

where $\lambda$ is the Lagrangian multiplier. The unconstrained optimization problem can be solved as

$$\frac{\partial \sum_{i=1}^{N_B} D_{B_i}}{\partial \lambda_{B_j}} + \lambda \frac{\partial \sum_{i=1}^{N_B} R_{B_i}}{\partial \lambda_{B_j}} = 0, \quad j = 1, 2, \ldots, N_B \tag{27}$$

It is well known that inter BUs obtain their prediction from the referenced picture and use no information from the neighboring BUs in the same picture. Intra BUs, on the other hand, obtain their prediction from the neighboring BUs in the same picture. For inter pictures, most of the BUs are encoded in an inter mode, which means that most of the BUs in the same picture are independent of each other. Therefore, if we ignore the small number of BUs that select the intra mode, we can assume that the distortions of different BUs within the same inter picture are independent of each other. That is,

$$\frac{\partial D_{B_i}}{\partial D_{B_j}} = 0, \quad i \neq j \tag{28}$$

With this assumption, Equation (27) can be converted into the following formula,

$$\frac{\partial D_{B_j}}{\partial \lambda_{B_j}} + \lambda \frac{\partial R_{B_j}}{\partial \lambda_{B_j}} = 0, \quad j = 1, 2, \ldots, N_B \tag{29}$$

Substituting $\partial D / \partial \lambda$ expressed in Equation (7) and $\partial R / \partial \lambda$ expressed in Equation (6) into Equation (29), we have

$$\lambda_{B_j} = \lambda, \quad j = 1, 2, \ldots, N_B \tag{30}$$
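The step from Equation (29) to Equation (30) can be sketched as follows. Equations (6) and (7) are not repeated in this section, so the sketch only assumes that they are consistent with the usual definition of the per-BU multiplier as the negative slope of the R-D curve, $\lambda_{B_j} = -\partial D_{B_j} / \partial R_{B_j}$, and that $\partial R_{B_j} / \partial \lambda_{B_j} \neq 0$.

```latex
\begin{align*}
\frac{\partial D_{B_j}}{\partial \lambda_{B_j}}
  &= \frac{\partial D_{B_j}}{\partial R_{B_j}} \cdot
     \frac{\partial R_{B_j}}{\partial \lambda_{B_j}}
   = -\lambda_{B_j} \, \frac{\partial R_{B_j}}{\partial \lambda_{B_j}}, \\
0 &= \frac{\partial D_{B_j}}{\partial \lambda_{B_j}}
     + \lambda \, \frac{\partial R_{B_j}}{\partial \lambda_{B_j}}
   = \left( \lambda - \lambda_{B_j} \right)
     \frac{\partial R_{B_j}}{\partial \lambda_{B_j}}
  \;\Longrightarrow\; \lambda_{B_j} = \lambda .
\end{align*}
```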
This means that the Lagrange multiplier λ should be the same for all BUs in order to achieve the optimal BU level bit allocation. This conclusion is consistent with the common knowledge that rate control with only picture level control, and with BU level control disabled, leads to the best R-D performance. However, rate control without BU level λ adjustment is too coarse to ensure bitrate accuracy, so BU level rate control is still necessary. Although we should not disable BU level rate control, Equation (30) still serves as a guideline: the fluctuation of λ across the BUs of a picture should be limited to a small range.

In the picture level bit allocation, the GOP level target bits are used as the constraint on the sum of the target bits of all the pictures in a GOP. The same principle can be applied to the BU level bit allocation: the target bits of each picture, which become available after the picture level bit allocation, can be used as the constraint on the sum of the target bits of all the BUs in a picture. Similarly, we can derive equations like (19) and (20) and obtain the corresponding solution λ. However, in the picture level bit allocation the complexity of solving Equation (20) is negligible because the number of pictures in a GOP is usually very small, whereas solving a similar equation at the BU level would be rather complex because the number of BUs in a picture can be quite large. For instance, if we set the BU size as a CTU (Coding Tree Unit; the typical CTU size is 64 × 64), the number of BUs in each picture of a 1080p sequence is 510. As a result, we need an alternative approach to solve this problem. One possible approach is to use the picture level $\lambda_P$ in place of the individual λ values in the BU level bit allocation. This is reasonable because we need to limit the fluctuation of the different λ values within a small range.
Hence, an adaptive weighting factor for the bit allocation of each BU in a picture can be derived as follows,

$$\Omega_{B_i} = \alpha_{B_i} \lambda_P^{\beta_{B_i}}, \quad i = 1, 2, \ldots, N_B \tag{31}$$
From Equation (31), the adaptive weighting factor $\Omega_{B_i}$ is determined by three factors: $\lambda_P$, $\alpha_{B_i}$, and $\beta_{B_i}$. Apart from $\lambda_P$, which is the Lagrange multiplier, the parameters $\alpha_{B_i}$ and $\beta_{B_i}$ characterize the R-D behavior of the current BU. They are updated for the collocated BUs in different pictures to accommodate the dynamics of the video content. Therefore, the proposed BU level bit allocation algorithm is able to take full advantage of content-guided principles and is expected to achieve an enhanced R-D performance. Finally, the target bits for each BU before the encoding of the current picture can be calculated by

$$T_{B_i} = R_P \cdot \frac{\Omega_{B_i}}{\sum_{j=1}^{N_B} \Omega_{B_j}}, \quad i = 1, 2, \ldots, N_B \tag{32}$$
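Equations (31) and (32) amount to a weighted split of the picture level target bits. The following Python sketch shows this under stated assumptions: the function name is hypothetical, and the α, β, and λ values are made-up numbers chosen only so that the arithmetic is easy to follow.

```python
def bu_target_bits(alpha, beta, lambda_p, picture_bits):
    """Split picture_bits among BUs following Equations (31) and (32)."""
    if len(alpha) != len(beta):
        raise ValueError("alpha and beta need one entry per BU")
    # Equation (31): Omega_Bi = alpha_Bi * lambda_P ** beta_Bi
    omega = [a * lambda_p ** b for a, b in zip(alpha, beta)]
    total = sum(omega)
    # Equation (32): T_Bi = R_P * Omega_Bi / sum_j Omega_Bj
    return [picture_bits * w / total for w in omega]


# Hypothetical model parameters for a picture with three BUs.
alpha = [2.0, 4.0, 2.0]
beta = [-1.0, -1.0, -0.5]
targets = bu_target_bits(alpha, beta, lambda_p=4.0, picture_bits=1000.0)
print(targets)
```

Because every BU is evaluated at the same picture level multiplier, the split depends only on the per-BU model parameters, and the targets always sum to the picture level budget.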
The target bits for $BU_i$ after encoding $BU_{i-1}$ can be computed through

$$R_{B_i} = T_{B_i} - \frac{Tar_{P_{i-1}} - Rem_{P_{i-1}}}{SW}, \quad i = 1, 2, \ldots, N_B \tag{33}$$

where $SW$ is the size of the sliding window, which drives the actual bitrate of the current BU toward the target bitrate within the range of the sliding window and prevents over-adjustment of the bitrate.

V. EXPERIMENTAL RESULTS

A. Simulation Setup

The proposed picture level and BU level bit allocation algorithms are implemented in HM-11.0. In order to evaluate the R-D performance of the proposed algorithms, the Low Delay Main Profile and Random Access Main Profile configurations specified in [29] are used as the test conditions, and all the sequences in Class A, B, C, D, and E [29] are used as the test sequences. The characteristics of the sequences are summarized in Table III. It should be noted that the sequences in Class B, C, D, and E are used for the LD case, while the sequences in Class A, B, C, and D are tested in the RA case.

TABLE III
CHARACTERISTICS OF TEST SEQUENCES

Class     Number of sequences   Video resolution   Frame rate (fps)
Class A   4                     2560x1600          30 & 60
Class B   5                     1920x1080          24 & 50 & 60
Class C   4                     832x480            30 & 50 & 60
Class D   4                     416x240            30 & 50 & 60
Class E   3                     1280x720           60

The target bitrate of all the bit allocation algorithms discussed below is the bitrate of the HM-11.0 default anchor without rate control. More specifically, we first use the original HM-11.0 without rate control to encode the test sequences according to the HEVC common test conditions, and then round the obtained bitrate to the nearest integer, which is set as the target bitrate of the bit allocation algorithms. The initial QP for the first picture, which is used to generate the HM-11.0 anchor without rate control, is also applied to the bit allocation algorithms. In our experiments, $\theta$ in the proposed picture level bit allocation algorithm is set to 0.1, and the sliding window size $SW$ in the proposed BU level bit allocation algorithm is set to 4.

For the proposed picture level bit allocation algorithm, the fixed weight bit allocation and the HM-11.0 default anchor are used for comparison. When verifying the performance of the picture level bit allocation individually, the original MAD-based BU level bit allocation is used and the BU size is set as a CTU. For the proposed BU level bit allocation algorithm, the original BU level bit allocation algorithm in HM-11.0 and the HM-11.0 default anchor are used to verify the effectiveness of the proposed algorithm. When verifying the performance of the BU level bit allocation individually, the fixed weight picture level bit allocation is used.

B. Results for the proposed picture level bit allocation

As mentioned above, to evaluate the performance of the proposed bit allocation algorithm, we mainly present experimental results on the R-D performance improvement. In addition, results on bitrate accuracy are shown to demonstrate that the proposed algorithm has no adverse effect on the bitrate accuracy. For both LD and RA test conditions, we use 2 sequences as examples (one from Class C and the other from Class D) to show the bitrate accuracy and R-D performance of the proposed picture level bit allocation. The detailed results for the LD and RA cases are shown in Table IV and Table V, respectively. In the tables, the bitrate error is measured per sequence using the following formula,

$$E = \frac{|R_t - R_a|}{R_t} \times 100\% \tag{34}$$

where $R_t$ is the target bitrate and $R_a$ is the actual bitrate. The average and maximum bitrate errors of all the test sequences for both LD and RA cases are shown in Table VI. From Table VI, we can see that the bitrate accuracy of the proposed picture level bit allocation is slightly better than that of the fixed weight bit allocation, in terms of both the average and the maximum bitrate error.

TABLE VI
BITRATE ACCURACY OF ALL THE TEST SEQUENCES FOR THE PICTURE LEVEL BIT ALLOCATION

               Average bitrate error    Maximum bitrate error
               LD        RA             LD        RA
Fixed Weight   0.07%     0.19%          1.00%     2.72%
Proposed       0.05%     0.14%          0.70%     2.42%

Since the output bitrates of different bit allocation methods are not matched exactly, BD-rate [32] is employed in our experiments for a fair R-D performance comparison. The HM-11.0 default anchor without rate control is set as the benchmark, and the fixed weight bit allocation and the adaptive weight bit allocation are each compared with it in terms of BD-rate. The results are summarized in Table VII. It should be noted that a positive value means performance loss, while a negative value means performance improvement. From the results, we can see that the proposed adaptive weight bit allocation brings 1.8% and 1.9% R-D performance improvement on average compared with the fixed weight bit allocation in the LD and RA cases, respectively. Compared with the HM-11.0 default anchor, the proposed adaptive weight bit allocation algorithm suffers only about 1.5% and 3.6% R-D performance loss in the LD and RA cases, respectively. As for the complexity, since rate control is an encoder-only issue, only the encoding time is shown. The Enc. time in the table is the ratio of the encoding time of the proposed algorithm to the encoding time of the HM-11.0 default anchor without rate control. From the table, we can see that the proposed picture level bit allocation algorithm does not noticeably increase the encoding time. To better show that the proposed picture level bit allocation algorithm can adapt to the video content, a typical example of the bits cost per picture for both the fixed weight and the
proposed picture level bit allocation algorithm is shown in Fig. 4. Since there is a scene change around frame 140 in the sequence Kimono, the characteristics of the video content before and after the scene change are totally different. From Fig. 4, we can see that, under the fixed weight bit allocation algorithm, the percentage of bits between different pictures in a GOP is the same before and after the scene change. In contrast, the proposed adaptive weight bit allocation algorithm adjusts the percentage of bits between different pictures in a GOP according to the current video content so as to improve the R-D performance.

To show the essence of the benefits brought by the proposed picture level bit allocation algorithm, the actual bits per picture and the actual PSNR per picture for the sequence BlowingBubbles with the hierarchical-B coding structure at 153 kbps are shown in Fig. 5 and Fig. 6. From Fig. 5, we can clearly observe that both the fixed weight and adaptive weight bit allocation algorithms lead to hierarchical bit allocation for different pictures in a GOP. However, unlike the fixed weight bit allocation algorithm, the percentage of bits between different pictures under the adaptive weight bit allocation algorithm is quite similar to that of the situation without rate control. Besides, from Fig. 6, we can see that the proposed adaptive weight bit allocation algorithm leads to hierarchical picture quality for different pictures in a GOP. The hierarchical picture quality, which gives the key pictures (whose POC is a multiple of 8 and which are referenced the most) the best quality, is beneficial to the overall performance. In contrast, the key pictures cannot even be identified in Fig. 6 under the fixed weight bit allocation algorithm. Therefore, the adaptive weight bit allocation algorithm achieves much better PSNR per picture than the fixed weight bit allocation algorithm under the constraint of the GOP level target bits.

TABLE IV
EXPERIMENTAL RESULTS ON THE PICTURE LEVEL BIT ALLOCATION, LOW DELAY CODING STRUCTURE

                              Fixed weight bit allocation         Proposed bit allocation
Sequence     Target bitrate   Bitrate    Y-PSNR   Bitrate error   Bitrate    Y-PSNR   Bitrate error
PartyScene   8054             8036.14    38.19    0.22%           8037.89    38.29    0.20%
             3447             3447.08    34.49    0.00%           3446.80    34.50    0.01%
             1504             1504.07    30.99    0.00%           1504.02    31.07    0.00%
             644              644.11     27.89    0.02%           644.08     28.00    0.01%
BQSquare     2225             2224.96    38.34    0.00%           2224.93    38.39    0.00%
             772              772.01     34.22    0.00%           772.05     34.41    0.01%
             309              309.00     31.00    0.00%           309.01     31.23    0.00%
             127              127.02     27.96    0.01%           127.04     28.19    0.03%

TABLE V
EXPERIMENTAL RESULTS ON THE PICTURE LEVEL BIT ALLOCATION, RANDOM ACCESS CODING STRUCTURE

                              Fixed weight bit allocation         Proposed bit allocation
Sequence     Target bitrate   Bitrate    Y-PSNR   Bitrate error   Bitrate    Y-PSNR   Bitrate error
PartyScene   6829             6831.08    38.11    0.03%           6830.66    38.16    0.02%
             3112             3112.56    34.60    0.02%           3112.55    34.68    0.02%
             1466             1466.19    31.46    0.01%           1466.22    31.55    0.02%
             692              692.16     28.53    0.02%           692.09     28.63    0.01%
BQSquare     1637             1637.81    37.91    0.05%           1637.49    37.96    0.03%
             637              637.22     34.31    0.04%           637.76     34.55    0.12%
             284              284.08     31.47    0.03%           284.14     31.73    0.05%
             140              140.27     28.73    0.19%           140.12     28.97    0.09%

TABLE VII
THE R-D PERFORMANCE AND ENCODING TIME OF ALL THE TEST SEQUENCES FOR THE PICTURE LEVEL BIT ALLOCATION

            LD                    RA
            Fixed     Proposed    Fixed     Proposed
Class A     –         –           7.1%      4.9%
Class B     5.1%      3.3%        5.8%      5.2%
Class C     0.1%      -0.7%       3.5%      1.5%
Class D     3.2%      0.1%        5.6%      2.2%
Class E     4.6%      3.2%        –         –
overall     3.3%      1.5%        5.5%      3.6%
Enc. time   100%      100%        100%      100%

Fig. 5. The actual bits per picture for the sequence BlowingBubbles (BlowingBubbles @153kbps; bits versus POC; curves: without rate control, fixed weight, adaptive weight).

C. Results for the proposed BU level bit allocation

Typically, a BU consists of one or several consecutive CTUs. The size of a BU determines the granularity of bit allocation: the larger a BU is, the coarser the granularity of the bit allocation, and vice versa. In normal cases, we use one CTU per BU to achieve better bitrate accuracy.
Fig. 4. The actual bits per picture for the sequence Kimono (Kimono @2400kbps, Low Delay; bits versus POC; curves: fixed weight, combined algorithm).

TABLE X
BITRATE ACCURACY OF ALL THE TEST SEQUENCES FOR THE BU LEVEL BIT ALLOCATION

              Average bitrate error    Maximum bitrate error
              LD        RA             LD        RA
Original BU   0.02%     0.17%          0.34%     2.42%
Proposed BU   0.02%     0.17%          0.30%     2.42%

Fig. 6. The PSNR per picture for the sequence BlowingBubbles (BlowingBubbles @153kbps; PSNR versus POC; curves: without rate control, fixed weight, adaptive weight).
However, if we take one CTU as a BU, the proposed BU level bit allocation algorithm will not lead to the best R-D performance, for the following two reasons. First, there will be a large number of BUs in each picture, which implies that on average very few bits are allocated to each BU. Efficient allocation of a small number of bits requires a more accurate R-D model; in other words, even a very small R-D modeling error may result in significant R-D performance loss. Second, the BUs in one picture may have a wide range of R-D characteristics. Some of them may be very active while others may be inactive. This introduces strong singularities into the bit allocation scheme. In [18], a whole picture is partitioned into 3 parts with slow motion, middle motion, and fast motion, respectively, to make the ρ-domain bit allocation algorithm achieve better R-D performance. In this paper, we simply set the BU size as one CTU row to make the R-λ model of each BU more stable and to better reflect the R-D performance improvement brought by the proposed BU level bit allocation algorithm. For a fair comparison, the BU size is set as one CTU row for both the original and the proposed BU level bit allocation algorithms.

For both LD and RA test conditions, we use two sequences as examples (one from Class B and the other from Class C) to show the bitrate accuracy and R-D performance of the proposed BU level bit allocation algorithm. Table VIII and Table IX show the detailed results for the LD and RA cases, respectively. The average and maximum bitrate errors of all the test sequences for both LD and RA cases are shown in Table X. From Table X, we can see that the bit allocation accuracy of the proposed method is almost the same as that of the original method.

The R-D performance of the proposed BU level bit allocation algorithm is shown in Table XI. In Table XI, the HM-11.0 default anchor without rate control is set as the benchmark, while the original and the proposed BU level bit allocation algorithms are used for comparison. It can be seen from Table XI that the proposed BU level bit allocation algorithm achieves better R-D performance than the original BU level bit allocation algorithm for most of the tested classes. An average R-D performance improvement of 0.9% is observed for the LD case, while the average improvement for the RA case is 1.8%. However, we can also see that the Class E sequences suffer about 0.7% performance loss in the LD case. The loss comes from the fact that the target bitrates of all the Class E sequences are very small. As mentioned above, a very small R-D modeling error may lead to unreasonable bit allocation as well as R-D performance loss. With more accurate R-λ model parameters for each BU, an R-D performance improvement could also be achieved for the Class E sequences. Compared with the HM-11.0 anchor, the proposed algorithm suffers only about 1.9% and 3.5% performance loss in the LD and RA cases, respectively. As for the complexity, Table XI shows that the proposed BU level bit allocation algorithm does not noticeably increase the encoding time.
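To make the granularity argument of this subsection concrete, the following Python sketch counts the BUs per picture when a BU is one CTU versus one CTU row. It assumes a 64 × 64 CTU and that partial CTUs at the picture boundary count as full CTUs; the function name is hypothetical and the resolutions are those listed for Class B and Class E in Table III.

```python
import math


def ctu_grid(width, height, ctu_size=64):
    """Return (CTUs per row, number of CTU rows) for one picture."""
    # Boundary CTUs may be only partially covered, hence the ceiling.
    return math.ceil(width / ctu_size), math.ceil(height / ctu_size)


for name, (w, h) in {"1080p": (1920, 1080), "Class E": (1280, 720)}.items():
    cols, rows = ctu_grid(w, h)
    print(f"{name}: {cols * rows} BUs with one CTU per BU, "
          f"{rows} BUs with one CTU row per BU")
```

For a 1080p picture this gives 510 BUs at CTU granularity, matching the figure quoted in Section IV, but only 17 BUs at CTU-row granularity, so each BU receives a correspondingly larger share of the picture level target bits.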
TABLE VIII
EXPERIMENTAL RESULTS ON THE BU LEVEL BIT ALLOCATION, LOW DELAY CODING STRUCTURE

                              Original BU level bit allocation    Proposed BU level bit allocation
Sequence     Target bitrate   Bitrate    Y-PSNR   Bitrate error   Bitrate    Y-PSNR   Bitrate error
Cactus       20028            20028.70   38.55    0.00%           20028.81   38.68    0.00%
             5730             5728.56    36.65    0.03%           5729.11    36.72    0.02%
             2571             2570.42    34.51    0.02%           2570.50    34.54    0.02%
             1268             1267.99    32.17    0.00%           1267.93    32.20    0.01%
RaceHorses   5703             5702.33    39.73    0.01%           5702.79    39.86    0.00%
             2263             2262.50    36.14    0.02%           2262.72    36.25    0.01%
             996              995.97     32.98    0.00%           995.97     33.05    0.00%
             457              456.94     30.05    0.01%           644.00     30.11    0.00%

TABLE IX
EXPERIMENTAL RESULTS ON THE BU LEVEL BIT ALLOCATION, RANDOM ACCESS CODING STRUCTURE

                              Original BU level bit allocation    Proposed BU level bit allocation
Sequence     Target bitrate   Bitrate    Y-PSNR   Bitrate error   Bitrate    Y-PSNR   Bitrate error
Cactus       18419            18421.07   38.44    0.01%           18421.28   38.53    0.01%
             5733             5734.19    36.79    0.02%           5733.51    36.84    0.01%
             2673             2673.22    34.88    0.01%           2673.02    34.91    0.00%
             1371             1370.98    32.69    0.00%           1370.59    32.72    0.03%
RaceHorses   4787             4787.75    38.86    0.02%           4787.64    38.92    0.01%
             2028             2028.56    35.79    0.03%           2028.55    35.85    0.03%
             947              947.12     32.89    0.01%           947.17     32.93    0.02%
             464              464.00     30.10    0.00%           464.02     30.15    0.01%

TABLE XI
THE R-D PERFORMANCE AND ENCODING TIME OF ALL THE TEST SEQUENCES FOR THE BU LEVEL BIT ALLOCATION

            LD                    RA
            Original  Proposed    Original  Proposed
Class A     –         –           6.7%      5.2%
Class B     5.2%      3.1%        6.0%      3.2%
Class C     -0.9%     -1.7%       3.0%      1.5%
Class D     2.4%      1.8%        5.5%      4.2%
Class E     4.3%      5.0%        –         –
overall     2.8%      1.9%        5.3%      3.5%
Enc. time   99%       99%         100%      100%

TABLE XII
BITRATE ACCURACY OF ALL THE TEST SEQUENCES FOR THE COMBINED ALGORITHM

            Average bitrate error    Maximum bitrate error
            LD        RA             LD        RA
Original    0.02%     0.17%          0.34%     2.42%
combined    0.02%     0.15%          0.32%     2.18%

D. Results for the combined picture and BU level bit allocation

In the above subsections, we validated that the proposed picture level and BU level bit allocation algorithms each provide good bitrate accuracy and R-D performance improvement individually. In this subsection, we present results on the bitrate accuracy and R-D performance when the picture level and BU level bit allocation algorithms are combined into an integrated framework. Some example R-D curves for both LD and RA cases are shown in Fig. 7, from which we can see clear R-D performance improvements brought by the combined picture level and BU level bit allocation algorithms in both cases.

Fig. 7. Some example R-D curves of the proposed bit allocation algorithm (panels: BasketballDrive (Low Delay), BQSquare (Low Delay), BasketballDrive (Random Access), BQSquare (Random Access); Y-PSNR (dB) versus bitrate (kbps); curves: HM-11.0 anchor, combined algorithm, fixed weight).

The bitrate accuracy of the combined algorithm is shown in Table XII. It can be seen from Table XII that the proposed combined picture and BU level bit allocation algorithm achieves a smaller bitrate error than the combination of the fixed weight picture level and original BU level bit allocation algorithms. The average R-D performance improvement of the combined algorithm is shown in Table XIII. In Table XIII, the HM-11.0 default anchor without rate control is set as the benchmark, while the fixed weight bit allocation and the combined algorithm are used for comparison. In the LD case, the proposed combined algorithm achieves 2.8% R-D performance improvement compared with the fixed weight bit allocation. Compared with the HM-11.0 anchor, almost no R-D performance loss is observed on average. In the RA case, an average of 3.9% R-D performance improvement compared with the fixed weight bit allocation is achieved, and there is only about 1.4% R-D performance loss compared with the HM-11.0 anchor. The results show that the R-D performance improvements brought by the picture level and BU level bit allocation are independent of each other, which results in a cumulative improvement of the total performance (1.8% and 0.9% individually against 2.8% combined in the LD case, and 1.9% and 1.8% individually against 3.9% combined in the RA case). As for the complexity, Table XIII shows that the proposed combined picture level and BU level bit allocation algorithm does not noticeably increase the encoding time.

TABLE XIII
THE R-D PERFORMANCE AND ENCODING TIME OF ALL THE TEST SEQUENCES FOR THE COMBINED PICTURE AND BU LEVEL BIT ALLOCATION

            LD                    RA
            Fixed     Proposed    Fixed     Proposed
Class A     –         –           6.7%      2.9%
Class B     5.2%      1.0%        6.0%      2.1%
Class C     -0.9%     -2.6%       3.0%      -0.5%
Class D     2.4%      -1.3%       5.5%      0.8%
Class E     4.3%      3.8%        –         –
overall     2.8%      0.0%        5.3%      1.4%
Enc. time   99%       99%         100%      99%

E. Summary of the experimental results

From the results and discussions above, we can conclude that
• The proposed picture level bit allocation algorithm achieves an average of 1.8% and 1.9% R-D performance improvement in the low delay case and the random access case, respectively.
• The proposed BU level bit allocation algorithm achieves an average of 0.9% and 1.8% R-D performance improvement in the low delay case and the random access case, respectively.
• The R-D performance improvements provided by the picture level and BU level bit allocation individually are independent of each other, which results in a cumulative improvement of the total performance.

VI. CONCLUSION
In this paper, we first develop a D-λ model to characterize the relationship between D and λ. Combined with our previously proposed R-λ model, it forms a comprehensive λ-domain R-D analysis framework, which reflects the inherent R-D characteristics governed by the video content. Based on this framework, we propose picture level and Basic Unit level bit allocation algorithms, derived from the fundamental rate-distortion optimization theory, that take full advantage of content-guided principles. The experimental results demonstrate that the proposed bit allocation algorithms achieve clear R-D performance improvements with smaller bitrate control error. The proposed picture level and Basic Unit level bit allocation algorithms have already been adopted by JCT-VC and integrated into the HEVC reference software.
R EFERENCES [1] B. Li, H. Li, L. Li, and J. Zhang, “λ domain rate control algorithm for High Efficiency Video Coding,” Image Processing, IEEE Transactions on, vol. 23, no. 9, pp. 3841–3854, Sept. 2014. [2] B. Li, H. Li, L. Li, and J. Zhang, “Rate control by R-lambda model for HEVC,” JCT-VC document, JCTVC-K0103, Shanghai, CN, Oct. 2012. [3] G. Sullivan, J. Ohm, W.-J. Han, and T. Wiegand, “Overview of the High Efficiency Video Coding (HEVC) standarad,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 22, no. 12, pp. 1649–1668, Dec. 2012. [4] A. Ortega and K. Ramchandran, “Rate-distortion methods for image and video compression,” Signal Processing Magazine, IEEE, vol. 15, no. 6, pp. 23–50, Nov. 1998. [5] K.-P. Lim, G. J. Sullivan, and T. Wiegand, “Text description of joint model reference encoding methods and decoding concealment,” JCTVC document, JCTVC-N046, Hong Kong, CN, Jan. 2005. [6] H. Choi, J. Yoo, J. Nam, D. Sim, and I. Bajic, “Pixel-wise unified rate-quantization model for multi-level rate control,” Selected Topics in Signal Processing, IEEE journal of, vol. 7, no. 6, pp. 1112–1123, Dec. 2013. [7] Y. Liu, Z. G. Li, and Y. C. Soh, “A novel rate control scheme for low delay video communication of H.264/AVC standard,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 17, no. 1, pp. 68–78, Jan. 2007. [8] B. Lee, M. Kim, and T. Nguyen, “A frame-level rate control scheme based on texture and nontexture rate models for High Efficiency Video Coding,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 24, no. 3, pp. 465–479, Mar. 2014.
[9] Z. He, Y. K. Kim, and S. Mitra, “Low-delay rate control for DCT video coding via ρ-domain source modeling,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 11, no. 8, pp. 928–940, Aug. 2001.
[10] M. Liu, Y. Guo, H. Li, and C. W. Chen, “Low-complexity rate control based on ρ-domain model for scalable video coding,” in Image Processing (ICIP), 2010 IEEE International Conference on, pp. 1277–1280, Sept. 2010.
[11] M. Jiang and N. Ling, “Low-delay rate control for real-time H.264/AVC video coding,” Multimedia, IEEE Transactions on, vol. 8, no. 3, pp. 467–477, Jun. 2006.
[12] S. Zhou, J. Li, J. Fei, and Y. Zhang, “Improvement on rate-distortion performance of H.264 rate control in low bit rate,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 17, no. 8, pp. 996–1006, Aug. 2007.
[13] K. Ramchandran, A. Ortega, and M. Vetterli, “Bit allocation for dependent quantization with applications to multiresolution and MPEG video coders,” Image Processing, IEEE Transactions on, vol. 3, no. 5, pp. 533–545, Sept. 1994.
[14] S. Hu, H. Wang, S. Kwong, T. Zhao, and C.-C. Kuo, “Rate control optimization for temporal-layer scalable video coding,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 21, no. 8, pp. 1152–1162, Aug. 2011.
[15] S. Wang, S. Ma, S. Wang, D. Zhao, and W. Gao, “Rate-GOP based rate control for High Efficiency Video Coding,” Selected Topics in Signal Processing, IEEE Journal of, vol. 7, no. 6, pp. 1101–1111, Dec. 2013.
[16] C. Pang, O. Au, F. Zou, J. Dai, X. Zhang, and W. Dai, “An analytic framework for frame-level dependent bit allocation in hybrid video coding,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 23, no. 6, pp. 990–1002, Jun. 2013.
[17] W. Yuan, S. Lin, Y. Zhang, W. Yang, and H. Luo, “Optimum bit allocation and rate control for H.264/AVC,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 16, no. 6, pp. 705–715, Jun. 2006.
[18] Z. He and S. Mitra, “Optimum bit allocation and accurate rate control for video coding via ρ-domain source modeling,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 12, no. 10, pp. 840–849, Oct. 2002.
[19] K. Ferguson and N. Allinson, “Modified steepest-descent for bit allocation in strongly dependent video coding,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 19, no. 7, pp. 1057–1062, Jul. 2009.
[20] C.-W. Seo, J.-W. Kang, J.-K. Han, and T. Nguyen, “Efficient bit allocation and rate control algorithms for hierarchical video coding,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 20, no. 9, pp. 1210–1223, Sept. 2010.
[21] B. Tao, B. Dickinson, and H. Peterson, “Adaptive model-driven bit allocation for MPEG video coding,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 10, no. 1, pp. 147–157, Jan. 2000.
[22] J. Si, S. Ma, and W. Gao, “Efficient bit allocation and CTU level rate control for High Efficiency Video Coding,” in Picture Coding Symposium (PCS), 2013, pp. 89–92, Dec. 2013.
[23] M. Karczewicz and X. Wang, “Intra frame rate control based on SATD,” JCT-VC Document, JCTVC-M0257, Incheon, KR, Apr. 2013.
[24] B. Li, H. Li, and L. Li, “Adaptive bit allocation for R-lambda model rate control in HM,” JCT-VC Document, JCTVC-M0036, Incheon, KR, Apr. 2013.
[25] HM, HEVC test model, [online], Available: https://hevc.hhi.fraunhofer.de/svn/.
[26] G. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” Signal Processing Magazine, IEEE, vol. 15, no. 6, pp. 74–90, Nov. 1998.
[27] N. Kamaci, Y. Altunbasak, and R. Mersereau, “Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 15, no. 8, pp. 994–1006, Aug. 2005.
[28] S. Mallat and F. Falzon, “Analysis of low bit rate image transform coding,” Signal Processing, IEEE Transactions on, vol. 46, no. 4, pp. 1027–1042, Apr. 1998.
[29] F. Bossen, “Common test conditions and software reference configurations,” JCT-VC Document, JCTVC-L1100, Geneva, CH, Jan. 2013.
[30] Bisection method. [online]. Available: http://en.wikipedia.org/wiki/Bisection_method.
[31] H. Li, B. Li, and J. Xu, “Rate-distortion optimized reference picture management for High Efficiency Video Coding,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 22, no. 12, pp. 1844–1857, Dec. 2012.
[32] G. Bjontegaard, “Calculation of average PSNR difference between RD-curves,” VCEG Document, VCEG-M33, Austin, Texas, USA, Apr. 2001.
Li Li received the B.S. degree in electronic engineering from the University of Science and Technology of China (USTC), Hefei, Anhui, China, in 2011. He is currently pursuing the Ph.D. degree with the Department of Electronic Engineering and Information Science, USTC. His research interests include image/video coding and processing.
Bin Li (M’14) received the B.S. and Ph.D. degrees in electronic engineering from the University of Science and Technology of China (USTC), Hefei, Anhui, China, in 2008 and 2013, respectively. He joined Microsoft Research Asia (MSRA), Beijing, China, in 2013, where he is now a Researcher. He has authored or co-authored over 20 papers. He holds over 10 granted or pending U.S. patents in the area of image and video coding. He has more than 30 technical proposals that have been adopted by the Joint Collaborative Team on Video Coding (JCT-VC). His current research interests include video coding, processing, and communication. Dr. Li received the best paper award for the International Conference on Mobile and Ubiquitous Multimedia from the Association for Computing Machinery in 2011. He received the Top 10% Paper Award of the 2014 IEEE International Conference on Image Processing. He has been an active contributor to ISO/MPEG and ITU-T video coding standards (JCT-VC). He is currently the Co-Chair of the Ad Hoc Group of Screen Content Coding extensions software development.
Houqiang Li (S’12) received the B.S., M.Eng., and Ph.D. degrees from the University of Science and Technology of China (USTC), in 1992, 1997, and 2000, respectively, all in electronics engineering. He is currently a Professor with the Department of Electronic Engineering and Information Science, USTC. His current research interests include video coding and communication, multimedia search, and image/video analysis. He has authored or co-authored over 100 papers in journals and conferences. He was a recipient of the best paper award for Visual Communications and Image Processing in 2012, the International Conference on Internet Multimedia Computing and Service in 2012, and the International Conference on Mobile and Ubiquitous Multimedia from ACM in 2011, and a Senior Author of the Best Student Paper of the Fifth International Mobile Multimedia Communications Conference in 2009. He served as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY from 2010 to 2013, and has been on the Editorial Board of the Journal of Multimedia since 2009. He has served on technical/program committees and organizing committees and as a Program Co-Chair and a Track or Session Chair for over ten international conferences.
Chang Wen Chen (F’04) received the B.S. degree from the University of Science and Technology of China in 1983, the M.S.E.E. degree from the University of Southern California in 1986, and the Ph.D. degree from the University of Illinois at Urbana–Champaign in 1992. He is currently an Empire Innovation Professor of Computer Science and Engineering with the University at Buffalo, State University of New York. He was an Allen Henry Endow Chair Professor with the Florida Institute of Technology from 2003 to 2007. He was a faculty member of Electrical and Computer Engineering with the University of Rochester from 1992 to 1996 and the University of Missouri–Columbia from 1996 to 2003. He has been the Editor-in-Chief of the IEEE TRANSACTIONS ON MULTIMEDIA since 2014. He served as the Editor-in-Chief of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY from 2006 to 2009. He has been an Editor for several other major IEEE TRANSACTIONS and JOURNALS, including the PROCEEDINGS OF THE IEEE, the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, and the IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS. He has served as Conference Chair for several major IEEE, ACM, and SPIE conferences related to multimedia, video communications, and signal processing. His research is supported by NSF, DARPA, Air Force, NASA, Whitaker Foundation, Microsoft, Intel, Kodak, Huawei, and Technicolor. He and his students have received eight Best Paper Awards or Best Student Paper Awards over the past two decades. He has also received several research and professional achievement awards, including the Sigma Xi Excellence in Graduate Research Mentoring Award in 2003, the Alexander von Humboldt Research Award in 2010, and the State University of New York at Buffalo Exceptional Scholar–Sustained Achievement Award in 2012. He is an SPIE Fellow.