2012 IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS 2012) November 4-7, 2012

Fast and Efficient Prediction Unit Size Selection for HEVC Intra Prediction

Jian Xiong
School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
Email: [email protected]

Hongliang Li
School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
Email: [email protected]

Abstract—In this paper, a fast learning-based Coding Unit (CU) size selection method is presented for High Efficiency Video Coding (HEVC) intra prediction. Our analysis shows that the non-normalized histogram of oriented gradients (n-HOG) can be used to select the CU size. For each CU size, n-HOGs of training sequences are clustered offline to construct a codebook. The optimum CU size is determined by comparing the n-HOG of the current CU with the codebook. Experimental results show that the proposed fast CU size selection scheme speeds up intra coding significantly with a negligible increase in Bjontegaard Delta rate.

I. INTRODUCTION

Recently, ISO-IEC/MPEG and ITU-T/VCEG formed the Joint Collaborative Team on Video Coding (JCT-VC). The JCT-VC aims to develop the next-generation video coding standard, called High Efficiency Video Coding (HEVC) [1]. HM was standardized by the JCT-VC as the test model of HEVC. Many novel coding tools were proposed to better exploit the high spatial and temporal redundancy present in high-definition video content. Like H.264/AVC, HEVC is a block-based hybrid video coding framework. In HEVC, a frame is divided into Largest Coding Units (LCUs) with a size of 64 × 64, instead of the 16 × 16 macroblocks defined in H.264/AVC. Each LCU can be recursively divided into four Coding Units (CUs), forming a quadtree coding structure. The 8 × 8 CUs are the smallest CUs and have two possible prediction unit (PU) sizes, 8 × 8 and 4 × 4. In the HEVC intra coding process, the optimum coding performance of each LCU is achieved by RDO calculations over all CU sizes in a recursive manner. The support of CU sizes larger than the conventional 16 × 16 macroblock of H.264/AVC is beneficial for homogeneous regions. However, the computational complexity of the encoder is extremely high. A comparison between HEVC and H.264/AVC conducted in [2] shows that HEVC provides greater bit rate savings, but also brings higher encoding complexity.

Many efforts have been made to investigate fast intra prediction algorithms that reduce computational complexity in H.264. Most commonly, the sum of absolute transformed differences (SATD) is used to reduce the number of candidate intra prediction modes for rate-distortion (RD) optimization (RDO) [3].


Edge detection algorithms have recently been developed to select a small subset of intra modes for the RDO calculation. These algorithms determine the dominant edge direction of each block and then select the associated prediction modes [4], [5]. Furthermore, homogeneous characteristics of macroblocks can be checked to skip some RDO calculations [6].

In this work, a learning-based method is proposed to determine the sizes of intra CUs in HEVC. Non-normalized histogram of oriented gradients (n-HOG) descriptors of coding units are clustered to construct a codebook. This codebook can then be used to select the CU size quickly. Experimental results show that the proposed method speeds up HEVC intra coding significantly with negligible loss of quality.

II. ANALYSIS OF CU SELECTION

A. Analysis of Rate-Distortion Optimization

In the HEVC intra prediction process, the problem of determining the optimum splitting strategy for an LCU can be simplified to the problem of judging the best splitting for each CU size. For each CU size, Rate-Distortion (RD) costs are calculated for the unsplit and split cases, respectively. The optimal CU splitting is selected as follows:

\mathrm{intra\ splitting} =
\begin{cases}
\mathrm{unsplit}, & J_{\mathrm{unsplit}} \le J_{\mathrm{split}} \\
\mathrm{split},   & J_{\mathrm{unsplit}} > J_{\mathrm{split}}
\end{cases}
\qquad (1)

where J_{\mathrm{unsplit}} and J_{\mathrm{split}} denote the minimal RD costs of the current CU encoded in the unsplit and split manners, respectively. In (1), these RD costs are calculated based on the sum of squared differences (SSD). To accelerate the coding process, sum of absolute differences (SAD) based costs can be used to approximate the SSD-based costs, i.e., J ≈ SAD. To evaluate the RD costs roughly, we consider a coding unit X predicted with the optimal prediction mode at angle φ from the vertical. As illustrated in Fig. 1(a), x_{i,j} denotes the pixel value at location (i, j), 0 ≤ i, j ≤ N − 1, in X, and it is predicted from x_{0, j + i tan φ}. The corresponding residual element is a_{i,j} = x_{i,j} − x_{0, j + i tan φ}. The RD cost can then be formulated as


J \approx \mathrm{SAD}_{\varphi} = \sum_{i=1}^{N} \sum_{j=1}^{N} \left| x_{i,j} - x_{0,\, j + i\tan\varphi} \right| \qquad (2)
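To make (2) concrete, the following is a small numerical sketch of the angular SAD cost. It is only illustrative: HEVC interpolates fractional reference positions, whereas this sketch rounds j + i tan φ to the nearest integer, and the function name angular_sad and the single top reference row are our own simplifications, not part of the paper.

import numpy as np

def angular_sad(block: np.ndarray, top_ref: np.ndarray, phi_deg: float) -> float:
    """Approximate SAD of predicting an N x N block from its top reference
    row along an angle phi from the vertical, in the spirit of Eq. (2)."""
    n = block.shape[0]
    tan_phi = np.tan(np.deg2rad(phi_deg))
    sad = 0.0
    for i in range(n):
        for j in range(n):
            # reference sample x[0, j + i*tan(phi)], rounded to the nearest
            # integer position and clipped to the available reference row
            k = int(round(j + i * tan_phi))
            k = min(max(k, 0), top_ref.size - 1)
            sad += abs(float(block[i, j]) - float(top_ref[k]))
    return sad

# Toy usage: a block whose rows repeat the reference is predicted perfectly
# at phi = 0 (vertical), and poorly at a mismatched angle.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=16).astype(float)
blk = np.tile(ref[:8], (8, 1))
print(angular_sad(blk, ref, 0.0))     # 0.0
print(angular_sad(blk, ref, 30.0))    # larger cost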


Fig. 1. (a) HEVC intra prediction for prediction angle φ from the vertical direction: the pixel x_{i,j} is predicted from the reference pixel x_{0, j + i tan φ}. (b) The image correlation model: x_{i,j} is predicted from x_{0, j + i tan φ} at angle φ from the vertical. Correlation along the direction at angle θ from the vertical is assumed to be the strongest, with coefficient ρ1; the correlation coefficient along its normal direction is ρ2.

In (2), x_{i,j} and x_{0, j + i tan φ} are assumed to follow a first-order stationary Markov process, that is, x_{0, j + i tan φ} = ρ_{x_{i,j}, x_{0, j + i tan φ}} · x_{i,j} + n(x), where ρ_{x_{i,j}, x_{0, j + i tan φ}} is the correlation coefficient between the source pixel x_{i,j} and the predicted pixel x_{0, j + i tan φ}, and n(x) denotes a zero-mean white noise process. Neglecting the noise, the difference between x_{i,j} and x_{0, j + i tan φ} is proportional to the correlation coefficient:

\left| x_{i,j} - x_{0,\, j + i\tan\varphi} \right| \propto 1 - \rho_{x_{i,j},\, x_{0,\, j + i\tan\varphi}}. \qquad (3)

To simplify the derivation, and without loss of generality, we assume the image correlation model of [7]. In this model, each image pixel is a random variable with zero mean and unit variance. As illustrated in Fig. 1(b), the correlation between pixels x_{i,j} and x_{0, j + i tan φ} is given by

\rho_{x_{i,j},\, x_{0,\, j + i\tan\varphi}} = \rho_1^{d_1} \rho_2^{d_2} = \rho_1^{|i\cos\theta + i\tan\varphi\sin\theta|} \, \rho_2^{|-i\sin\theta + i\tan\varphi\cos\theta|} \qquad (4)

where θ is the angle from the vertical that describes the strongest correlation direction in the source CU, and ρ1 and ρ2 are the correlation coefficients along the strongest correlation direction and its normal direction, respectively, with 0 ≤ ρ2 ≤ ρ1 ≤ 1. It can be proved that the correlation ρ_{x_{i,j}, x_{0, j + i tan φ}} reaches its maximum if and only if φ = θ, in which case it becomes

\rho_{x_{i,j},\, x_{0,\, j + i\tan\varphi}} = \rho_1^{|i\cos\theta + i\tan\theta\sin\theta|} = \rho_1^{|i/\cos\theta|}. \qquad (5)

For i/cos θ ≥ 1, combining (2) and (3), we obtain

J \approx \mathrm{SAD} \approx \sum_{i=1}^{N} \sum_{j=1}^{N} \left| 1 - \rho_1^{|i/\cos\theta|} \right| \propto 1 - \rho_1. \qquad (6)

B. The Relationship Between the Gradients and the Autocorrelation Coefficient

In order to derive the strongest correlation of CUs, we consider a one-dimensional signal Y = {y_0, y_1, ..., y_{N−1}}. The variance-normalized autocorrelation coefficient ρ(k), 0 ≤ ρ ≤ 1, at lag k of the signal Y can be written as

\rho(k) = \frac{1}{R(0)} \sum_{n=0}^{N-1} y_n y_{n-k} \qquad (7)

where R(0) = \sum_{n=0}^{N-1} y_n^2. This equation can be rewritten as

1 - \rho(k) = \frac{G^2}{2R(0)} \qquad (8)

where G denotes the sum of gradients at lag k, G = \left[ \sum_{n=0}^{N-1} (y_n - y_{n-k})^2 \right]^{1/2}.

From equation (8), it is observed that the autocorrelation coefficients of a sequence are related to the sum of its gradients. Extending this to the two-dimensional image, the correlation coefficients of CUs are related to their gradient statistics, which can be computed as non-normalized HOGs. That is, in equation (6), J ∝ 1 − ρ1 is related to the n-HOG of the CU. Denoting the RD cost of a CU with n-HOG h as J_h, CUs with similar n-HOGs are considered to have close costs. That is,

J_{hog_1} \approx J_{hog_2}, \quad hog_1 \in U(hog_2) \qquad (9)

where hog_1 and hog_2 denote two n-HOGs, J_{hog_1} and J_{hog_2} denote their corresponding RD costs, and the operator U(·) denotes the neighborhood of a vector.

We now consider two CUs at depth i, denoted CU_1 and CU_2. Their pyramid n-HOG vectors are v = (h_i, h_{i+1}^0, h_{i+1}^1, h_{i+1}^2, h_{i+1}^3) and \hat{v} = (\hat{h}_i, \hat{h}_{i+1}^0, \hat{h}_{i+1}^1, \hat{h}_{i+1}^2, \hat{h}_{i+1}^3), respectively, where h_i and \hat{h}_i denote the n-HOGs of the two CUs, and h_{i+1}^j and \hat{h}_{i+1}^j denote the n-HOGs of the j-th (j = 0, 1, 2, 3) sub-CUs (or 4 × 4 PUs) when the two CUs are split into quarters, so that J_{split} = \sum_{j=0}^{3} J_{h_{i+1}^j}. From equation (1), the intra splitting of CU_1 can be formulated as

\mathrm{intra\ splitting}_{CU_1} = \mathrm{map}(J_{\mathrm{unsplit}}, J_{\mathrm{split}}) = \mathrm{map}(J_{h_i}, J_{h_{i+1}^0}, J_{h_{i+1}^1}, J_{h_{i+1}^2}, J_{h_{i+1}^3}) = \mathrm{map}(h_i, h_{i+1}^0, h_{i+1}^1, h_{i+1}^2, h_{i+1}^3). \qquad (10)

Thus, the intra splitting of a CU with pyramid n-HOG vector v can be denoted as intra splitting_v. Assume that v and \hat{v} take close values. From equation (9), J_{h_i} ≈ J_{\hat{h}_i} and J_{h_{i+1}^j} ≈ J_{\hat{h}_{i+1}^j}. Hence, CUs with similar pyramid n-HOG vectors are always encoded with the same intra splitting, which can be described as

\mathrm{intra\ splitting}_{v} = \mathrm{intra\ splitting}_{\hat{v}}, \quad v \in U(\hat{v}) \qquad (11)

where intra splitting_v and intra splitting_{\hat{v}} are the intra splittings of the two pyramid n-HOG vectors.
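Relation (8) can be checked numerically in a few lines. The sketch below assumes circular (wrap-around) indexing for the lag-k terms so that the two energy sums coincide, which is the setting in which the identity holds exactly; this indexing convention is our assumption and is not stated in the paper.

import numpy as np

def autocorr(y: np.ndarray, k: int) -> float:
    """Variance-normalized autocorrelation at lag k, Eq. (7), with circular indexing."""
    r0 = np.sum(y * y)
    return float(np.sum(y * np.roll(y, k)) / r0)

def gradient_sum(y: np.ndarray, k: int) -> float:
    """G = [sum_n (y[n] - y[n-k])^2]^(1/2), the 'sum of gradients' at lag k."""
    d = y - np.roll(y, k)
    return float(np.sqrt(np.sum(d * d)))

rng = np.random.default_rng(1)
y = rng.standard_normal(64)
for k in (1, 2, 5):
    lhs = 1.0 - autocorr(y, k)
    rhs = gradient_sum(y, k) ** 2 / (2.0 * np.sum(y * y))
    print(k, lhs, rhs)   # lhs and rhs agree, illustrating Eq. (8)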


III. PROPOSED METHOD

In this section, an n-HOG learning-based method is proposed to quickly select the intra splitting for each CU size in HEVC intra prediction. In the offline learning stage, n-HOGs of CUs together with their split flags (split or unsplit) are clustered to construct a codebook. Each codeword in the codebook is the center of a cluster and is assigned one of three types. The n-HOG of the current CU to be coded is compared with each codeword to find the nearest one, and the type of the nearest codeword is then used to determine the intra splitting of the current CU.

A. Learning Method

Based on the analysis in Section II, CUs with neighboring n-HOGs are almost always encoded in the same manner. Therefore, representative centers can be found by clustering. However, some pyramid n-HOG vectors always lead to close RD costs for both the split and the unsplit manners; we call them unsure n-HOGs. CUs in the neighborhood of unsure n-HOGs are not always encoded with a single splitting. After clustering, these unsure n-HOGs are aggregated into clusters, and the centers of such clusters are obviously not suitable for determining the CU splitting. Therefore, the centers are classified into three types: unsplit, split, and unsure.

Algorithm 1: Algorithm for Learning n-HOG Centers
Input: training data set D.
Initialization: ζ, ρ (ρ > 0.5), number of clusters n.
 1: {C_j}_{j=0}^{n-1} = clustering(D, n), where C_j = {D_unsplit^j, D_split^j, center_j} and center_j is the center of the j-th cluster.
 2: for j = 0 → n − 1 do
 3:   k_1 = num(D_unsplit^j), k_2 = num(D_split^j)
 4:   if k_1 + k_2 ≥ T × ζ then
 5:     if k_1 / (k_1 + k_2) ≥ ρ then
 6:       f_j = unsplit
 7:     else if k_2 / (k_1 + k_2) ≥ ρ then
 8:       f_j = split
 9:     else
10:       f_j = unsure
11:     end if
12:     codebook = addToCodebook(center_j, f_j)
13:   end if
14: end for
15: Output: codebook
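As a complement to Algorithm 1, the following sketch shows one possible offline implementation. It is illustrative only: plain k-means from scikit-learn stands in for the Fuzzy C-Means clustering used in the experiments, and the names build_codebook, zeta, rho, and the 0/1 flag encoding are ours, not taken from the HM software.

import numpy as np
from sklearn.cluster import KMeans

UNSPLIT, SPLIT, UNSURE = 0, 1, 2

def build_codebook(hogs: np.ndarray, flags: np.ndarray,
                   n_clusters: int = 64, zeta: float = 0.01, rho: float = 0.90):
    """Offline learning of n-HOG centers for one CU size (Algorithm 1, illustrative).

    hogs:  T x d array of pyramid n-HOG vectors
    flags: length-T array, 0 = unsplit, 1 = split (the RDO decision of the encoder)
    Returns a list of (center, type) codewords.
    """
    T = len(hogs)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(hogs)
    codebook = []
    for j in range(n_clusters):
        members = flags[km.labels_ == j]
        k1 = int(np.sum(members == 0))        # unsplit samples in cluster j
        k2 = int(np.sum(members == 1))        # split samples in cluster j
        if k1 + k2 < zeta * T:                # too few samples: poor recognition
            continue
        frac_unsplit = k1 / (k1 + k2)
        if frac_unsplit >= rho:
            cw_type = UNSPLIT
        elif 1.0 - frac_unsplit >= rho:
            cw_type = SPLIT
        else:
            cw_type = UNSURE                  # no clear majority
        codebook.append((km.cluster_centers_[j], cw_type))
    return codebook

In practice one codebook would be trained per CU size, as described in the text.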

Fig. 2. Flowchart of the proposed CU splitting selection method.

In the learning stage, for each CU size, n-HOGs computed with the Sobel operators, together with the associated split flags (unsplit, split), are collected from the training sequences to form the training set. The steps of the learning method are listed in Algorithm 1. Consider a set of two-category data {hog_i, f_i}_{i=0}^{T−1}, where hog_i is the n-HOG of a training CU, f_i is the split flag associated with hog_i, and T is the total number of samples in the training set. The details are as follows.

First, a clustering operation is conducted on the training set, so that n-HOG samples with their corresponding splittings are aggregated into clusters. Each cluster C_j includes two subsets, D_unsplit^j and D_split^j, holding the samples whose split flags are unsplit and split, respectively. Second, clusters that aggregate only a small number of samples are discarded because of their poor recognition capability; a threshold ζ is used, and clusters with fewer than ζT samples are ignored. Furthermore, a second threshold ρ is used: if the fraction of the majority splitting in a cluster is larger than ρ, the cluster strongly tends to contain only one splitting and is used to describe that major splitting. If the major splitting is unsplit, the cluster is assigned to the unsplit type; in contrast, if the major splitting is split, the cluster is assigned to the split type. Otherwise, if the fraction of the majority is smaller than ρ, the cluster does not clearly favor either splitting and is assigned to the unsure type. The codebook is constructed from the remaining cluster centers and their types; each codeword is a cluster center.
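The paper does not spell out the exact binning of the n-HOG, so the sketch below is only one plausible reading: Sobel gradients are computed over the CU and their magnitudes are accumulated, without normalization, into a fixed number of orientation bins. The bin count, the valid-region handling, and the pyramid layout are our assumptions.

import numpy as np
from scipy.signal import convolve2d

def n_hog(cu: np.ndarray, n_bins: int = 8) -> np.ndarray:
    """Non-normalized histogram of oriented gradients of a CU.

    Gradients come from 3x3 Sobel operators; each pixel votes its gradient
    magnitude into one of n_bins orientation bins over [0, pi). No
    normalization is applied, so the histogram also carries the overall
    gradient energy of the block.
    """
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    sy = sx.T
    gx = convolve2d(cu.astype(float), sx, mode='valid')
    gy = convolve2d(cu.astype(float), sy, mode='valid')
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)           # orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())        # accumulate magnitudes
    return hist

def pyramid_n_hog(cu: np.ndarray, n_bins: int = 8) -> np.ndarray:
    """Concatenate the n-HOG of the whole CU with those of its four quarters."""
    h = cu.shape[0] // 2
    parts = [cu, cu[:h, :h], cu[:h, h:], cu[h:, :h], cu[h:, h:]]
    return np.concatenate([n_hog(p, n_bins) for p in parts])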

B. Integration into HEVC

Once the codebook is built, the codewords are assigned to three sets: unsplit, split, and unsure. To judge whether a CU should be predicted in the split manner or not, the n-HOG of the current CU is computed and compared with the codebook to find the nearest codeword. If the nearest codeword belongs to the split type, the current CU is encoded in the split manner, and the RDO calculation of the unsplit case is terminated early. Conversely, if the nearest codeword belongs to the unsplit type, the current CU is encoded without splitting, and the RDO calculation of the split case is skipped. However, if the nearest codeword belongs to the unsure type, the RDO process is executed for both cases, so no complexity saving is obtained for this type. Fig. 2 shows the flowchart of the proposed fast CU splitting selection method.
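A minimal sketch of this decision step follows, reusing the (center, type) codebook produced by the learning sketch in Section III-A; the nearest-codeword search by Euclidean distance is our assumption, since the distance measure is not stated in the paper.

import numpy as np

# Same type encoding as in the learning sketch (illustrative).
UNSPLIT, SPLIT, UNSURE = 0, 1, 2

def select_splitting(codebook, hog: np.ndarray) -> int:
    """Return UNSPLIT, SPLIT, or UNSURE for the current CU.

    UNSPLIT -> skip the RDO of the split case.
    SPLIT   -> skip the RDO of the unsplit case.
    UNSURE  -> fall back to full RDO of both cases.
    """
    centers = np.stack([c for c, _ in codebook])
    types = [t for _, t in codebook]
    dists = np.linalg.norm(centers - hog, axis=1)   # nearest codeword, Euclidean distance
    return types[int(np.argmin(dists))]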


IV. EXPERIMENTAL RESULTS

The proposed algorithm was implemented in the HEVC reference software (HM v2.2) to evaluate its performance. The experiments were conducted with the intra low-complexity (LoCo) test configuration stipulated by the common test conditions proposed in [8], with QP set to 22, 27, 32, and 37. Four video sequences (BQTerrace, BQMall, BasketballPass, and Vidyo3) were used as training material; the remaining sequences were used for testing. The codebook was trained by extracting n-HOGs from CUs of sizes 8 × 8, 16 × 16, 32 × 32, and 64 × 64. For each CU size, a set of n-HOGs was trained with the method described in Section III, using Fuzzy C-Means as the clustering method. In these experiments, the parameters ζ and ρ were set to 0.01 and 0.90, respectively, based on our empirical study.

Table I shows the time saving and Bjontegaard Delta (BD) rate results summarized by class. The average time reduction of the proposed fast CU selection method is about 37.87%, and it reaches 51.83% for class E. The time saving for sequences with more complex regions (such as class C and class D) is somewhat lower than the average. This can be attributed to the larger number of small CUs in these complex sequences: 8 × 8 CUs take up a larger share of the total encoding time, so the time saving for complex sequences is smaller than that for homogeneous sequences. On average, a slight BD-rate increase of about 1.25% is incurred by the proposed fast CU selection method. Table II shows the time saving for different QPs. The time saving at higher QPs is larger than at lower QPs, because sequences encoded with higher QPs contain more large CUs than sequences encoded with lower QPs.

V. CONCLUSION

A learning-based fast CU size selection method has been proposed for HEVC intra coding. The implicit relation between the n-HOG and the RD cost is revealed. The n-HOGs of training data are clustered to construct a codebook, and the proposed algorithm compares the n-HOG of the current CU with the codewords to select the optimum intra splitting. The efficiency of the learned models has been evaluated with respect to time saving and coding quality. Experimental results demonstrate that the proposed algorithm significantly reduces the overall encoding time with negligible coding loss.

REFERENCES

[1] K. Ugur et al., "High performance, low complexity video coding and the emerging HEVC standard," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 12, pp. 1688-1697, Dec. 2010.
[2] S. Park, J. Park, and B. Jeon, "Report on the evaluation of HM versus JM," in Proc. 4th JCT-VC Meeting, Jan. 2011, no. JCTVC-D181.
[3] Y. M. Lee, Y. T. Sun, and Y. Lin, "SATD-based intra mode decision for H.264/AVC video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 3, pp. 463-469, Mar. 2010.
[4] H. Li, K. N. Ngan, and Z. Wei, "Fast and efficient method for block edge classification and its application in H.264/AVC video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 6, pp. 756-768, June 2008.
[5] Z. Wei, K. N. Ngan, and H. Li, "An efficient intra-mode selection algorithm for H.264 based on edge classification and rate-distortion estimation," Signal Processing: Image Communication, vol. 23, no. 9, pp. 699-710, Sept. 2008.
[6] Y.-H. Huang, T.-S. Ou, and H. H. Chen, "Fast decision of block size, prediction mode, and intra block for H.264 intra prediction," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 10, pp. 1367-1372, Aug. 2010.
[7] C. Yeo, Y. H. Tan, Z. Li, and S. Rahardja, "Mode-dependent transform for coding directional intra prediction residuals," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 4, pp. 545-554, Apr. 2012.
[8] F. Bossen, "Common test conditions and software reference configurations," in Proc. 2nd JCT-VC Meeting, July 2010, no. JCTVC-B300.

TABLE I
TIME SAVING AND BD-RATE INCREMENT SUMMARIZED BY CLASS

Class     | ∆ Time    | Y BD-rate | U BD-rate | V BD-rate
Class A   | -37.36%   | 1.55%     | 1.77%     | 1.90%
Class B   | -41.28%   | 2.28%     | 2.02%     | 2.27%
Class C   | -32.03%   | 0.41%     | -0.02%    | 0.07%
Class D   | -26.84%   | 0.28%     | 0.32%     | 0.23%
Class E   | -51.83%   | 1.73%     | 0.69%     | 1.53%
Average   | -37.87%   | 1.25%     | 0.90%     | 1.203%

TABLE II
TIME SAVING FOR DIFFERENT QPS

QP      | ∆ Time
QP=37   | -47.47%
QP=32   | -43.03%
QP=27   | -30.13%
QP=22   | -30.85%

ACKNOWLEDGMENT

This work was partially supported by NSFC (No. 60972109 and 61271289) and by the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20110185110002).
