IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 7, JULY 2005


Fast Mode Decision Algorithm for Intraprediction in H.264/AVC Video Coding

Feng Pan, Xiao Lin, Susanto Rahardja, Keng Pang Lim, Z. G. Li, Dajun Wu, and Si Wu

Abstract—The H.264/AVC video coding standard aims to enable significantly improved compression performance compared to all existing video coding standards. To achieve this, a robust rate-distortion optimization (RDO) technique is employed to select the best coding mode and reference frame for each macroblock. As a result, the complexity and computational load increase drastically. This paper presents a fast mode decision algorithm for H.264/AVC intraprediction based on local edge information. Prior to intraprediction, an edge map is created, and a local edge direction histogram is then established for each subblock. Based on the distribution of the edge direction histogram, only a small subset of the intraprediction modes is chosen for the RDO calculation. Experimental results show that the fast intraprediction mode decision scheme increases the speed of intracoding significantly with negligible loss of peak signal-to-noise ratio.

Fig. 1. Variable block size for rate distortion optimization.

Fig. 2. Computation of RDcost.

Index Terms—AVC, H.264, intraprediction, JVT, MPEG, video coding.

I. INTRODUCTION

THE newest international video coding standard is H.264/AVC [1]. It has recently been approved by the ITU-T as Recommendation H.264 and by the ISO/IEC as International Standard 14496-10 (MPEG-4 Part 10) Advanced Video Coding (AVC). The elements common to all video coding standards are present in the current H.264/AVC recommendation: an MB is 16×16 in size; luminance (luma) is represented with higher resolution than chrominance (chroma) through 4:2:0 subsampling; motion compensation and block transforms are followed by scalar quantization and entropy coding; motion vectors are predicted from the median of the motion vectors of neighboring blocks; bidirectional pictures (B-pictures) are supported that may be motion compensated from both temporally previous and subsequent pictures; and a direct mode exists for B-pictures in which both forward and backward motion vectors are derived from the motion vector of a co-sited macroblock (MB) in a reference picture. Some new techniques, such as spatial prediction in intracoding, adaptive-block-size motion compensation, 4×4 integer transformation, multiple reference pictures (up to seven) and context-adaptive binary arithmetic coding (CABAC), are used in this standard. The testing results of H.264/AVC show that it greatly outperforms existing video coding standards in both peak signal-to-noise ratio (PSNR) and visual quality [2].

Manuscript received October 21, 2003; revised May 20, 2004. This paper was recommended by Associate Editor F. Pereira. The authors are with the Institute for Infocomm Research, 119613 Singapore (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/TCSVT.2005.848356

To achieve the highest coding efficiency, H.264/AVC uses a nonnormative technique called Lagrangian rate-distortion optimization (RDO) to decide the coding mode of an MB [3]. Fig. 1 shows the possible MB modes and Fig. 2 shows the RDO process. As can be seen from Fig. 2, in order to choose the best coding mode for an MB, the H.264/AVC encoder calculates the rate-distortion cost (RDcost) of every possible mode and chooses the mode having the minimum value. Therefore, the computational burden of this brute-force searching algorithm is far more demanding than that of any existing video coding algorithm. To reduce the complexity of H.264/AVC, a number of efforts have been made to explore fast algorithms for motion estimation, intramode prediction, and intermode prediction in H.264/AVC video coding [4], [5]. Fast motion estimation is a well-studied topic and is widely applied in existing standards such as MPEG-1/2/4 and H.261/H.263. However, these fast motion estimation algorithms cannot be applied directly to H.264/AVC coding due to its variable-block-size motion estimation. On the other hand, fast intramode decision is a new topic in H.264/AVC coding, and very few previous works exist so far. It is believed that fast intramode decision algorithms are also very



important in reducing the overall complexity of H.264/AVC. We have made two contributions to H.264/AVC related to fast mode decision algorithms, which have been adopted as part of the nonnormative reference model for H.264/AVC [6], [7]. In this paper, we present one of these contributions, a fast intramode decision algorithm for H.264/AVC intraprediction using local edge information. The presented algorithm considerably reduces the amount of calculation needed for intraprediction with negligible loss of coding quality. We have observed that the pixels along the direction of a local edge normally have similar values (this is true for both luma and chroma components), and a good prediction could be achieved if we predict the pixels using those neighboring pixels that lie along the direction of the edge. Therefore, an edge map that represents the local edge orientation and strength is created, and a local edge direction histogram is then established for each subblock. Based on the distribution of the edge direction histogram, only a small number of prediction modes are chosen for the RDO calculation during intraprediction. Experimental results show that the fast mode decision algorithm increases the speed of intracoding significantly with negligible loss of quality.

The rest of the paper is organized as follows. Section II gives an overview of intracoding in H.264/AVC. Sections III and IV present in detail the fast intraprediction algorithm based on the edge direction histogram. Experimental results are presented in Section V and conclusions are given in Section VI.

II. OVERVIEW OF INTRACODING IN H.264/AVC

Intracoding refers to the case where only the spatial redundancies within a video picture are exploited. The resulting picture is referred to as an I-picture. Traditionally, I-pictures are encoded by directly applying the transform to all MBs in the picture.
In previous video coding standards (namely H.263 and MPEG-4), intraprediction was conducted in the transform domain. Intraprediction in H.264/AVC is always conducted in the spatial domain, by referring to neighboring samples of previously coded blocks. The difference between the actual block/MB and its prediction is then coded. With these advanced prediction modes, the performance of intracoding in H.264/AVC is comparable to that of the recent still-image compression standard JPEG-2000 [8]. If an MB is encoded in intramode, a prediction block is formed based on previously coded and reconstructed blocks before deblocking. This prediction block is subtracted from the current block prior to encoding. For the luma samples, the prediction block may be formed for each 4×4 block (denoted as I4MB) or for an entire MB (denoted as I16MB). When using I4MB prediction, each 4×4 block of the luma component utilizes one of nine prediction modes. Besides DC prediction, eight directional prediction modes are specified. When using I16MB prediction, which is well suited for smooth image areas, a uniform prediction is performed for the whole luma component of an MB. Four prediction modes are supported. The chroma samples of an MB are always predicted using a prediction technique similar to that for the luma component in I16MB prediction.

Fig. 3. (a) I4MB prediction coding is conducted for samples a-p of a block using samples A-Q. (b) Eight "prediction directions" for I4MB prediction.

A. I4MB Prediction Modes

The nine prediction modes for each 4×4 luma block are shown in Fig. 3. It can be seen that I4MB prediction is conducted for samples a-p of a block using samples A-Q. There are in total eight "prediction directions" and one DC prediction mode for I4MB prediction, as detailed in the following [1].

• Mode 0: Vertical prediction.
• Mode 1: Horizontal prediction.
• Mode 2: DC prediction.
• Mode 3: Diagonal down-left prediction.
• Mode 4: Diagonal down-right prediction.
• Mode 5: Vertical-right prediction.
• Mode 6: Horizontal-down prediction.
• Mode 7: Vertical-left prediction.
• Mode 8: Horizontal-up prediction.

For example, if we choose Mode 0 (vertical), then pixels a, e, i, and m are predicted from the neighboring pixel A; pixels b, f, j, and n are predicted from pixel B; and so on. If we choose Mode 7 (vertical-left), each pixel is predicted from an interpolation of the neighboring samples above the block along the vertical-left direction. Note that DC is a special prediction mode, where the mean of the left-hand and upper samples (pixels A to D and I to L in Fig. 3) is used to predict the entire block. Normally, DC prediction is useful for blocks with little or no local activity.

B. I16MB Prediction Modes

As an alternative to the I4MB prediction described above, the entire MB may be predicted. This is well suited for smooth image areas, where a uniform prediction is performed for the whole luma component of an MB. Four prediction modes are supported.

• Mode 0 (vertical): extrapolation from the upper samples.
• Mode 1 (horizontal): extrapolation from the left samples.
• Mode 2 (DC): mean of the upper and left-hand samples.
• Mode 3 (plane): plane prediction based on a linear spatial interpolation using the upper and left-hand samples of the MB.
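To make the simplest of these predictors concrete, the following sketch (illustrative only, not JM reference code; neighbor-availability rules and the directional modes 3-8 are omitted) implements vertical, horizontal, and DC prediction for a 4×4 luma block:

```python
# Simplified sketch of the three simplest I4MB predictors. `above` holds the
# upper neighbor samples A-D, `left` holds the left neighbor samples I-L;
# unavailable-neighbor handling is ignored for clarity.

def predict_4x4(mode, above, left):
    if mode == 0:                        # vertical: each row copies A-D
        return [above[:] for _ in range(4)]
    if mode == 1:                        # horizontal: row r is filled with left[r]
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:                        # DC: rounded mean of A-D and I-L
        dc = (sum(above) + sum(left) + 4) // 8
        return [[dc] * 4 for _ in range(4)]
    raise NotImplementedError("directional modes 3-8 omitted in this sketch")

block = predict_4x4(0, above=[10, 20, 30, 40], left=[5, 5, 5, 5])
# every row of `block` is a copy of the upper samples [10, 20, 30, 40]
```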


Fig. 4. Examples of 4×4 edge patterns and their preferred intraprediction directions.

C. 8×8 Chroma Prediction Mode

Each 8×8 chroma component of an MB is predicted from chroma samples above and/or to the left that have previously been encoded and reconstructed. The four chroma prediction modes are very similar to those of the I16MB prediction, except that the order of mode numbers is different: DC (Mode 0), horizontal (Mode 1), vertical (Mode 2), and plane (Mode 3). The same prediction mode is always applied to both chroma blocks.

H.264/AVC uses the RDO technique to achieve the best coding performance. This means that the encoder has to encode the intrablock using all the mode combinations and choose the one that gives the best RDO performance. Since the choice of prediction modes for the chroma components is independent of that for the luma components, for each luma prediction mode there are four different chroma prediction modes. Therefore, the number of mode combinations for the luma and chroma components in an MB is M_C × (M_4 × 16 + M_16) = 4 × (9 × 16 + 4) = 592, where M_C, M_4, and M_16 represent the number of modes for chroma prediction, I4MB prediction, and I16MB prediction, respectively. It means that, for an MB, the encoder has to perform 592 different RDO calculations before the best RDO mode is determined. As a result, the complexity and computational load of the encoder are extremely high.

III. DETERMINING THE PRIMARY EDGE DIRECTION IN THE IMAGE BLOCK

We observed that pixels along the direction of a local edge normally have similar values (this is true for both luma and chroma components). Therefore, a good prediction could be achieved if we predict the pixels using those neighboring pixels that lie along the direction of the edge. Fig. 4 shows a few edge patterns of a 4×4 block and their preferred directional predictions. There are a number of ways to obtain local edge directional information, such as an edge direction histogram based on a simple edge detection algorithm [9], and directional

fields, which are based on local gradients [10], etc. The algorithm described in this paper is based on edge detection due to its low computational complexity. The rest of this section explains in detail the fast intraprediction algorithm using an edge direction histogram based on edge detection.

A. Edge Map

In order to obtain the edge information in the neighborhood of the intrablock to be predicted, the edge map of the video picture is generated using the Sobel edge operators. Each pixel in the video picture is then associated with an element in the edge map, namely the edge vector containing its edge direction and amplitude.

The Sobel operator has two convolution kernels, which respond to the degree of difference in the vertical and horizontal directions. For a pixel p(i, j) in a luma (or chroma) picture, we define D(i, j) = {dx(i, j), dy(i, j)} as the corresponding edge vector:

dx(i, j) = p(i-1, j+1) + 2p(i, j+1) + p(i+1, j+1) - p(i-1, j-1) - 2p(i, j-1) - p(i+1, j-1)
dy(i, j) = p(i+1, j-1) + 2p(i+1, j) + p(i+1, j+1) - p(i-1, j-1) - 2p(i-1, j) - p(i-1, j+1)   (1)

where dx(i, j) and dy(i, j) represent the degree of difference in the vertical and horizontal directions, respectively. Therefore, the amplitude of the edge vector can be roughly estimated by

Amp(D(i, j)) = |dx(i, j)| + |dy(i, j)|   (2)

In fact, the amplitude could be obtained more accurately by using the rooted sum of the squares of dx(i, j) and dy(i, j). The latter is computationally expensive, and thus (2) is used. The direction of the edge (in degrees) is given by the arctangent function

Ang(D(i, j)) = (180/π) · arctan(dy(i, j) / dx(i, j))   (3)
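The edge-vector computation of (1) and (2) can be sketched in a few lines (an illustrative sketch using the standard Sobel kernels, not the JM implementation; border pixels are not handled):

```python
# Sketch of the edge map: for each interior pixel, the two Sobel kernels give
# the difference pair (dx, dy), and the amplitude is approximated by
# |dx| + |dy| as in (2).

def sobel_edge_vector(p, i, j):
    """Edge vector (dx, dy) of pixel (i, j) in the 2-D sample array p."""
    dx = (p[i-1][j+1] + 2*p[i][j+1] + p[i+1][j+1]
          - p[i-1][j-1] - 2*p[i][j-1] - p[i+1][j-1])
    dy = (p[i+1][j-1] + 2*p[i+1][j] + p[i+1][j+1]
          - p[i-1][j-1] - 2*p[i-1][j] - p[i-1][j+1])
    return dx, dy

def amplitude(dx, dy):
    """Approximate amplitude of (2): |dx| + |dy|."""
    return abs(dx) + abs(dy)

# A vertical step edge: columns 0-1 are dark, columns 2-3 are bright.
pic = [[0, 0, 100, 100] for _ in range(4)]
dx, dy = sobel_edge_vector(pic, 1, 1)   # strong response in dx, none in dy
```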


It must be noted that, in the actual implementation of the algorithm, (3) is not necessary. This is due to the fact that H.264/AVC has only a limited number of prediction modes for intracoding. In this paper, a simple thresholding technique is applied to dy(i, j)/dx(i, j) to build up the edge direction histogram.

B. Edge Direction Histogram

In order to decide whether the image block contains an edge, and how strong this edge is, an edge direction histogram is calculated from the edge map of the block by summing up the amplitudes of the edge vectors with similar edge directions in the block.

1) 4×4 Luma Block Edge Direction Histogram: In the case of a 4×4 luma block, there are eight directional prediction modes, as shown in Fig. 3, plus a DC prediction mode. The border between any two adjacent directional prediction modes is the bisectrix of the two corresponding directions. For example, the border of Mode 1 (0°) and Mode 8 (26.6°) is the direction at 13.3°; this is because, for Mode 8, prediction is done at an angle of approximately 26.6° from the horizontal direction. It is important to note that Mode 3 and Mode 8 are adjacent due to the circular symmetry of the prediction modes. The mode of each pixel is determined by its edge direction Ang(D(i, j)).

Therefore, the edge direction histogram of a 4×4 luma block is built by the following algorithm. For each pixel (i, j) in the 4×4 luma block, let H(k), k = 0, 1, 3, 4, ..., 8, be the histogram cell of prediction mode k; the amplitude Amp(D(i, j)) is accumulated into the cell H(k) of the directional mode k within whose bisectrix borders the edge direction Ang(D(i, j)) falls.   (4)

Note that Mode 2 is not included in the above algorithm. This is because Mode 2 will always be chosen as one of the candidate modes. Fig. 5 shows the edge direction histogram of Fig. 4(c). It shows that this block exhibits a strong edge in the vertical-right direction.

Fig. 5. Edge direction histogram of Fig. 4(c).

2) Edge Direction Histogram for 16×16 Luma Block and 8×8 Chroma Block: In the case of 16×16 luma and 8×8 chroma blocks, there are only two directional prediction modes, plus a plane prediction and a DC prediction mode. Therefore, the edge direction histogram for this case is based on three directions, i.e., the horizontal, vertical, and diagonal (plane) directions, as shown in Fig. 6. Note that both the diagonal down-right and diagonal down-left prediction modes are associated with the plane prediction. Though it is not mathematically correct to associate plane prediction with any directional edge, we can for sure associate the vertical and horizontal predictions with their respective directional edges. Therefore, it is fairly reasonable to try plane prediction if it is not obviously a DC prediction.

Fig. 6. Intra 8×8 and 16×16 prediction mode directions.

The edge direction histogram for the 16×16 luma block is constructed in the same way, except that each pixel's amplitude is accumulated into one of only three cells, vertical, horizontal, or plane, according to which of the three direction ranges Ang(D(i, j)) falls into.   (5)

For a similar reason, Mode 2 is missing from the above algorithm. An example of such an edge direction histogram is shown in Fig. 7. Note that for 8×8 chroma blocks a similar equation applies, except that the order of mode numbers is different.

As mentioned above, each cell in the edge direction histogram sums up the amplitudes of those pixels with similar edge directions in the block. Obviously, the histogram cell with the maximum amplitude indicates that there is a strong edge along

this direction in the block, and it is thus considered the preferable prediction direction. The mode whose direction complies with it is chosen as the primary prediction mode. Note that only the cell with the global maximum is chosen as the primary prediction mode, even if the histogram has multiple maxima. Therefore, the above algorithm produces one primary prediction mode each for a 4×4 luma block, a 16×16 luma block, and an 8×8 chroma block.

Fig. 7. Example of 16×16 luma and 8×8 chroma block edge direction histograms.

IV. MODE DECISION FOR INTRAPREDICTION

Based on the primary prediction mode determined above, the fast mode decision algorithm for intraprediction selects a small number of prediction modes as the candidates to be used in the RDO computation. It should be noted that the actual RDO computation in H.264/AVC intracoding is based on the reconstructed images, while the edge direction histogram is calculated from the original lossless images, since the reconstructed image is not available at the time the histogram is calculated; hence the primary prediction mode decided above will not always be the best RDO mode in actual coding. We have thus tried a number of ways of deciding the number of preferred prediction modes, as discussed in the following.

Method 1: The mode with the maximum amplitude in the edge direction histogram is chosen as the candidate prediction mode; if this amplitude is below a predefined threshold, the prediction mode is chosen as DC.

Method 2: This method simply adds DC mode as a candidate mode besides the primary prediction mode. This eliminates the effect that different thresholds result in different performance for different sequences, as is the case with Method 1.

Method 3: In this method, additional information is added on top of Method 2. The window of the histogram computation is enlarged by including the pixels in the left column and upper row of the block of interest. This is motivated by the fact that a block of interest is predicted by the pixels above and/or to the left of the block.

Method 4: During the experiments with Method 2, it was observed that the chosen intraprediction mode is either the primary prediction mode or one of the two neighboring modes (in terms of direction) of the primary prediction mode. Therefore, the two additional candidate prediction modes are determined to be the two neighbors of the primary prediction mode in terms of direction (refer to Fig. 3). Experimental results have shown that Method 4 achieves a good balance between computational time and coding efficiency, and the rest of this section describes the detailed implementation of this algorithm. Nevertheless, the experimental section still presents a comparison among all the methods.
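Assuming the eight directional modes are ordered by their prediction angles into a ring (so that Mode 3 and Mode 8 end up adjacent, as noted earlier), Method 4's candidate list for a 4×4 luma block can be sketched as follows (an illustrative sketch; the ring ordering is inferred from the mode angles, not copied from JM code):

```python
# Sketch of Method 4's candidate selection: the primary mode from the
# edge-direction histogram, its two directional neighbors, and DC (Mode 2).
# The ring lists the directional modes in order of prediction angle
# (0, 26.6, 45, 63.4, 90, 116.6, 135, 153.4 degrees) and wraps around.

DIRECTION_RING = [1, 6, 4, 5, 0, 7, 3, 8]

def i4mb_candidates(primary):
    """Primary directional mode, its two ring neighbors, and DC."""
    i = DIRECTION_RING.index(primary)
    return {primary,
            DIRECTION_RING[(i - 1) % 8],
            DIRECTION_RING[(i + 1) % 8],
            2}

candidates = i4mb_candidates(1)   # primary Mode 1 yields Modes 1, 8, 6, and DC
```

This reproduces the Mode 1 example discussed for I4MB prediction (neighbors Mode 8 and Mode 6), and the wrap-around makes Mode 8 a neighbor of Mode 3.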

A. I4MB Prediction Modes

Experimental results have shown that, in general, the histogram cell with the maximum amplitude is the best candidate for intraprediction (Method 1). In the case that all the cells have similar amplitudes, DC mode is a better choice; thus an amplitude threshold is needed to decide whether the intrablock exhibits a strong edge or is just a flat region. However, it is difficult to predefine a universal threshold that suits different block contexts and different video sequences. Therefore, we always choose DC mode as the second candidate participating in the RDO operation (Method 2). Extensive experiments also show that the chosen intraprediction mode is either the primary prediction mode or one of the two neighboring modes (in terms of direction) of the primary prediction mode. The main cause of this phenomenon is that, in H.264/AVC, RDO is based on the lossy reconstructed intraimages, while the edge direction histogram is calculated from the original lossless images. Therefore, the two additional candidate prediction modes are determined to be the two neighbors of the primary prediction mode in terms of direction. For example, if the primary prediction mode is Mode 1, then the two additional candidate prediction modes will be Mode 8 and Mode 6. Note that Mode 8 and Mode 3 are adjacent modes in terms of direction due to the symmetry of the circle. In summary, for I4MB prediction coding, the histogram cell with the maximum amplitude and its two adjacent cells, plus DC mode, are chosen to take part in the RDO calculation. Therefore, for each 4×4 luma block, we perform RDO calculations for only 4 modes, instead of 9.

B. I16MB Prediction Modes

Based on the same observation above, the primary prediction mode decided by the edge direction histogram is considered as a candidate for the best prediction mode, and DC mode is also chosen as the next candidate.
Therefore, in I16MB prediction coding, we perform RDO calculations for only 2 modes, instead of 4.

C. 8×8 Chroma Prediction Modes

For intrachroma blocks, there are two different edge direction histograms, one from component U and the other from V. The

TABLE I. NUMBER OF CANDIDATE MODES

primary prediction modes from the two components are both considered as candidate modes. As before, DC mode is also used in the RDO calculation. Note that, according to the standard, the same prediction mode is always applied to both chroma blocks. Therefore, if the primary prediction modes from the two components are the same, there are only 2 candidate modes for the RDO calculation; otherwise, there are 3. Thus, for each 8×8 chroma block intracoding, we perform RDO calculations for either 2 or 3 modes, instead of 4.
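The chroma candidate list reduces to a small set union; a sketch (mode numbers follow the chroma convention described earlier, with DC as Mode 0):

```python
# Sketch of the chroma candidate selection: the primary modes of the U and V
# edge-direction histograms plus DC (Mode 0 for chroma). Since the same mode
# must be applied to both chroma blocks, there are 2 candidates when the
# U and V primaries agree and 3 when they differ.

DC_CHROMA = 0

def chroma_candidates(primary_u, primary_v):
    """Candidate chroma modes for RDO: U primary, V primary, and DC."""
    return {primary_u, primary_v, DC_CHROMA}

same = chroma_candidates(1, 1)   # U and V agree: 2 candidates
diff = chroma_candidates(1, 2)   # U and V differ: 3 candidates
```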

D. Algorithm Complexity Analysis

Table I summarizes the number of candidate modes selected for the RDO calculation based on the edge direction histogram. As can be seen from Table I, the encoder with the fast mode decision algorithm needs to perform only 2 × (4 × 16 + 2) = 132 RDO calculations if the two chroma components have the same primary prediction mode. In case the two chroma components have different primary prediction modes (which is very rare), the total number of RDO calculations would be 3 × (4 × 16 + 2) = 198. Thus our fast intraprediction algorithm has reduced the number of RDO mode calculations significantly compared to the 592 modes that are used in the current RDO calculation in H.264/AVC video coding.

E. Early Termination of RDO Calculation

During the intracoding of any prediction mode, the calculation can be terminated if it can be foreseen that the current mode will not be the best prediction mode. By early termination of an RDO calculation that is deemed to be suboptimal, a great timesaving can be achieved. In RDO, the coding cost consists of two parts: rate and distortion. After calculating the cost of the rate, there might be cases in which the cost of the rate alone is already higher than the total coding cost of the best mode among the previous modes. This implies that the current mode cannot be the best mode, since its coding cost will not be the smallest. Therefore, the RDO calculation is terminated and the calculation of the distortion is eliminated.

An MB is encoded by either I4MB prediction or I16MB prediction. In RDO, the selection between these two coding modes is determined by the coding cost of the MB under each coding mode. After I16MB prediction coding, I4MB prediction coding is applied to the sixteen 4×4 blocks in the MB and the costs of these blocks are accumulated. However, if the accumulated cost is already higher than that of I16MB prediction coding before all sixteen 4×4 blocks are encoded, the coding of the remaining 4×4 blocks in the MB is terminated prematurely.

V. EXPERIMENTAL RESULTS

Our proposed algorithm was implemented in JM6.1e provided by JVT. According to the specifications provided in [11], the test conditions are as follows. 1) The MV search range is 32 pels for QCIF and CIF. 2) RD optimization is enabled. 3) The number of reference frames equals 1. 4) CABAC is enabled. 5) MV resolution is 1/4 pel. 6) The GOP structure is IPPPP or IBBPBB. A group of experiments was carried out on the recommended sequences with quantization parameters 28, 32, 36, and 40, as specified by [12]. The averaged PSNR value over luma (Y) and chroma (U, V) is used, based on the equations below:

PSNR = 10 · log10(255² / MSE_avg)   (6)

where the average mean square error (MSE) is given by

MSE_avg = (4 · MSE_Y + MSE_U + MSE_V) / 6   (7)

The comparison results were produced and tabulated based on the difference in coding time (ΔTime), the PSNR difference (ΔPSNR), and the bit-rate difference (ΔBits); the coding time statistics are generated from the JM6.1e encoder. The test platform used is a Pentium IV-2.8 GHz with 512 Mbytes of RAM. In order to evaluate the timesaving of the fast intramode decision algorithm, the following calculation is defined to find the time difference. Let T_JM denote the coding time used by the JM6.1e encoder and T_FI be the time taken by the fast intraprediction algorithm; ΔTime is then defined as

ΔTime = (T_JM − T_FI) / T_JM × 100%   (8)
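The mode-combination counts of the complexity analysis and the timesaving measure of (8) can be checked numerically (a sketch; the function and symbol names are ours):

```python
# Mode combinations per MB: (chroma candidates) x (I4MB candidates per 4x4
# block x 16 blocks + I16MB candidates), as derived in the text.

def combinations(n_chroma, n_i4, n_i16):
    return n_chroma * (n_i4 * 16 + n_i16)

full_rdo  = combinations(4, 9, 4)    # exhaustive RDO: 592 calculations
fast_same = combinations(2, 4, 2)    # U and V share one primary mode: 132
fast_diff = combinations(3, 4, 2)    # U and V primaries differ: 198

def delta_time(t_jm, t_fi):
    """Timesaving of (8), as a percentage of the JM6.1e coding time."""
    return (t_jm - t_fi) / t_jm * 100.0
```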

PSNR and bit-rate differences are calculated according to the numerical averages between the RD curves derived from the JM6.1e encoder and the fast algorithm, respectively. The detailed procedure for calculating these differences can be found in a JVT document authored by Bjontegaard [13], which is recommended by the JVT Test Model Ad Hoc Group [12]. Note that the PSNR and bit-rate differences should be regarded as equivalent, i.e., there is either an increase in PSNR or a decrease in bit rate, not both at the same time.

A. Experiments on IPPPP Sequences

It should be noted that, in H.264/AVC coding, MBs in P-frames may also choose intracoding among the possible coding modes in the RDO operation; thus a great timesaving is expected from the fast intracoding algorithm for this type of sequence. Table II shows the tabulated performance comparison of the proposed algorithm with JM6.1e for the sequences listed in [12]. In this experiment, the total number of frames is 300 for each sequence, and the period of I-frames is 100, i.e., there is one I-frame for every 100 coded frames. Note that in the table positive values mean increments, and negative values mean decrements. The differences in PSNR and bit rate are calculated


TABLE II. RESULTS FOR IPPPP SEQUENCES

Fig. 8. News, ΔPSNR = −0.067 dB, ΔBits = 1.226%.

Fig. 9. Mobile, ΔPSNR = −0.018 dB, ΔBits = 0.451%.

Fig. 10. Timesaving at different intraperiods.

Fig. 11. Timesaving at different sizes of the searching area.

according to [13]. It can be seen that the fast intraprediction algorithm achieves consistent timesaving (25% on average) with negligible losses in PSNR and increments in bit rate. This means that the fast intraalgorithm takes only about 3/4 of the time needed by JM6.1e. Figs. 8 and 9 show the RD curves of the two sequences "News" and "Mobile". Again, these two figures show that the fast intraprediction algorithm has similar RDO performance to that of JM6.1e.

We have noticed that the simple early termination scheme described in Section IV-E contributed about 6% to 8% of the total timesaving, with negligible loss of PSNR. However, at higher quantization values, the increase in bit rate is slightly higher than at lower quantization values. Figs. 10 and 11 show the timesaving at different intraperiods and at different sizes of the searching area during motion estimation. It is noted from these figures that the fast intraalgorithm achieves similar timesaving when the intraperiod changes from 50 to 150 frames. However, the timesaving is reduced significantly when the size of the searching area increases. This is because, in H.264 video coding, the rate-distortion optimization for intercoding mode decision is much more complex than that for intracoding mode decision due to the motion estimation operations, i.e., the time taken to perform the RDO for intercoding is much longer than that for intracoding, and this becomes even more so as the searching area increases. Fig. 12 shows the timesaving using different numbers of reference frames. It can be seen that the timesaving is reduced as the number of reference frames increases. This is similar to the case of Fig. 11, as the increased number of reference frames increases the proportion of intercoding in the overall computational load.

Fig. 12. Timesaving at different numbers of reference frames.

Fig. 13. News, ΔPSNR = −0.294 dB, ΔBits = 3.902%.

TABLE III. RESULTS FOR IIIII SEQUENCES

B. Experiments on All-Intraframe Sequences

In this experiment, a total of 300 frames is used for each sequence, and the period of I-frames is set to 1, i.e., all the frames in the sequence are intracoded. It can be seen from Table III that the fast intraprediction algorithm achieves consistent timesaving (60% on average), which means that the fast intraalgorithm takes only about 40% of the time needed by JM6.1e. The average loss of PSNR is about 0.24 dB or, equivalently, there is a slight increment in bit rate of about 3.7%. Figs. 13 and 14 show the RD curves of the two sequences "News" and "Mobile." Again, these two figures show that the fast intraprediction algorithm has similar RDO performance to that of JM6.1e.

Fig. 14. Mobile, ΔPSNR = −0.255 dB, ΔBits = 3.168%.

TABLE IV. RESULTS FOR IBBPBB SEQUENCES

C. Experiments on IBBPBB Sequences

In this experiment, the picture type is set to IBBPBB, i.e., there are two B-frames between any two I- or P-frames. A total of 300 frames is used for each sequence, and the period of I-frames is set to 100. It can be seen from Table IV that the fast intraprediction algorithm achieves consistent timesaving (10% on average) with negligible losses in PSNR and increments in bit rate. It is noted that the timesaving for this type of sequence is much less than that for the IPPPP format. This is due to the fact that, in H.264/AVC coding, B-frames do not use intracoding, and also that, in B-frame coding, motion estimation takes much longer than in P-frame coding.

Another interesting observation from the table is that QCIF sequences achieve more timesaving than CIF sequences. This is due to the high percentage of boundary MBs in a QCIF sequence; the searching area for those MBs is much smaller than for nonboundary MBs. Figs. 15 and 16 show the RD curves of the two sequences "News" and "Mobile." Again, these two figures show that the fast intraprediction algorithm has similar RDO performance to that of JM6.1e.


Fig. 15. News, ΔPSNR = −0.156 dB, ΔBits = 3.106%.

Fig. 16. Mobile, ΔPSNR = −0.013 dB, ΔBits = 0.379%.

TABLE V. COMPARISON OF DIFFERENT FAST INTRAPREDICTION METHODS

VI. CONCLUSION

This paper presented a fast mode decision algorithm for intraprediction in H.264/AVC video coding. By making use of the edge direction histogram, the number of mode combinations for the luma and chroma blocks in an MB that take part in the RDO calculation has been reduced significantly, from 592 to as low as 132. Other techniques, such as early termination of the RDO mode calculation, are also used to further reduce the computation time. This results in a great reduction of the complexity and computational load of the encoder. Experimental results show that the fast algorithm has a negligible loss of PSNR compared to the original scheme.

REFERENCES

[1] Information Technology—Coding of Audio-Visual Objects—Part 10: Advanced Video Coding, Final Draft International Standard, ISO/IEC FDIS 14 496-10, Dec. 2003. [2] “Report of the formal verification tests on AVC (ISO/IEC 14 496-10 | ITU-T Rec. H.264),”, MPEG2003/N6231, Dec. 2003. [3] G. Sullivan, T. Wiegand, and K.-P. Lim, “Joint model reference encoding methods and decoding concealment methods,” presented at the 9th JVT Meeting (JVT-I049d0), San Diego, CA, Sep. 2003. [4] X. Li and G. Wu, “Fast integer pixel motion estimation,” presented at the 6th JVT Meeting (JVT-F011), Awaji Island, Japan, Dec. 2002. [5] Z. Chen, P. Zhou, and Y. He, “Fast integer pel and fractional pel motion estimation for JVT,” presented at the 6th JVT Meeting (JVT-F017), Awaji Island, Japan, Dec. 2002. [6] F. Pan, X. Lin, S. Rahardja, K. P. Lim, Z. G. Li, G. N. Feng, D. J. Wu, and S. Wu, “Fast mode decision algorithm for JVT intra prediction,” presented at the 7th JVT Meeting (JVT-G013) , Pattaya, Thailand, Mar. 2003. [7] K. P. Lim, S. Wu, D. J. Wu, S. Rahardja, X. Lin, F. Pan, and Z. G. Li, “Fast intermode decision,” presented at the 9th JVT Meeting (JVT-I020) , San Diego, CA, Sep. 2003. [8] D. Marpe, V. George, H. L. Cycon, and K. U. Barthel, “Performance evaluation of motion-JPEG2000 in comparison with H.264/AVC operated in intra coding mode,” in SPIE Conf. Wavelet Applications in Industrial Processing, Oct. 2003, pp. 129–137. [9] A. K. Jain and A. Vailaya, “Image retrieval using color and shape,” Pattern Recognit., vol. 29, pp. 1233–1244, 1996. [10] A. M. Bazen and S. H. Gerez, “Systematic methods for the computation of the directional fields and singular points of fingerprints,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 905–919, Jul. 2002. [11] G. Sullivan, “Recommended simulation common conditions for H.26L coding efficiency experiments on low resolution progressive scan source material,” presented at the 14th VCEG-N81 Meeting, Santa Barbara, CA, Sep. 2001. 
[12] JVT Test Model Ad Hoc Group, “Evaluation sheet for motion estimation,”, Draft version 4, Feb. 19, 2003. [13] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” presented at the 13th VCEG-M33 Meeting, Austin, TX, Apr. 2001.

D. Comparison of Different Fast Intraprediction Methods

As mentioned at the beginning of Section IV, besides the proposed method, we have also tried different ways of deciding the number of preferred prediction modes based on the primary prediction mode. Table V compares these methods. In this experiment, the settings and parameters are the same as in Section V-A, and we present only the results of two sequences, News and Mobile. It can be seen from Table V that all four methods achieve significant timesaving; in terms of RD performance, Method 3 achieves the best results, though it is slightly inferior in timesaving.
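The mode-combination counts quoted in the conclusion (592 reduced to 132) can be reproduced with simple arithmetic from the H.264/AVC intra mode counts: 9 modes per 4x4 luma block (16 blocks per MB), 4 Intra_16x16 luma modes, and 4 chroma modes, with the chroma mode tried against every luma configuration. The reduced candidate counts used below (4 modes per 4x4 block, 2 for 16x16 luma, 2 for chroma) are an assumption chosen to be consistent with the paper's figure of 132:

```python
def intra_mode_combinations(modes_4x4=9, modes_16x16=4, modes_chroma=4):
    """Luma/chroma mode combinations evaluated by RDO for one MB:
    each chroma mode is tried against every luma configuration, i.e.
    modes_chroma * (16 * modes_4x4 + modes_16x16)."""
    return modes_chroma * (16 * modes_4x4 + modes_16x16)

full_rdo = intra_mode_combinations()          # exhaustive search: 592
fast_rdo = intra_mode_combinations(4, 2, 2)   # reduced candidates: 132
```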

Feng Pan (M’00–SM’03) received the B.Sc., M.Sc., and Ph.D. degrees in communication and electronic engineering from Zhejiang University, Hangzhou, China, in 1983, 1986, and 1989, respectively. Since then, he has taught and conducted research at a number of universities in China, the U.K., Ireland, and Singapore. He is now with the Institute for Infocomm Research, Singapore. His research areas are digital image processing, digital signal processing, digital video compression, and digital television broadcasting. He has published numerous technical papers and offered many short courses for industry. Dr. Pan currently serves as the Chapter Chairman of IEEE Consumer Electronics, Singapore.


Xiao Lin (M’99–SM’02) received the Ph.D. degree from the Electronics and Computer Science Department, University of Southampton, Southampton, U.K., in 1993. He worked with the Centre for Signal Processing (CSP) for about five years as a Researcher and Manager on the Multimedia Program. He then worked for DeSOC Technology as a Technical Director, where he contributed to the VoIP solution and speech packet loss concealment for Bluetooth/WCDMA baseband SoC development. He joined the Institute for Infocomm Research, Singapore, in July 2002, where he is now a Research Manager in charge of the multimedia signal processing area.

Susanto Rahardja (M’00–SM’04) received the B.Eng. degree in electrical engineering from the National University of Singapore (NUS), Singapore, the M.Eng. degree in digital communication and microwave circuits, and the Ph.D. degree in the area of logic synthesis and signal processing from the Nanyang Technological University (NTU), Singapore, in 1991, 1993, and 1997, respectively. He joined the Centre for Signal Processing, NTU, as a Research Engineer in 1996 and a Research Fellow in 1997, and served as a Business Development Manager in 1998. In 2001, he joined NTU as an Academic Professor and was appointed the Assistant Director of the Centre for Signal Processing. In 2002, he joined the Agency for Science, Technology, and Research and was appointed as the Program Director to lead the Signal Processing Program. He is the Co-Founder of AMIK Raharja Informatika and STMIK Raharja, an institute of higher learning in Tangerang, Indonesia. He is currently the Director of the Media Division in the Institute for Infocomm Research, Singapore. He has published more than 100 articles in international journals and conference proceedings. He is currently an Associate Professor at the School of Electrical and Electronic Engineering, Nanyang Technological University. His research interests include binary and multiple-valued logic synthesis, digital communication systems, and digital signal processing. Dr. Rahardja was the recipient of the IEE Hartree Premium Award in 2002 and the Tan Kah Kee Young Inventors’ GOLD Award (Open Category) in 2003.

Keng Pang Lim (M’95) received the B.A.Sc. and Ph.D. degrees from the School of Computer Engineering, Nanyang Technological University, Singapore, in 1994 and 2001, respectively. He is an Associate Lead Scientist in Institute for Infocomm Research, Singapore, where he is currently leading a video coding group. His research interests include video coding, computer vision, and number theoretical transform. Dr. Lim was the recipient of the Du Pont Scholarship and Sony Prize Award.

Z. G. Li (M’97–SM’04) received the B.Sc. and M.Eng. degrees from Northeastern University, Shenyang, China, in 1992 and 1995, respectively, and the Ph.D. degree from Nanyang Technological University, Singapore, in 2001. Currently, he is with the Institute for Infocomm Research (I2R), Singapore. He is also an Adjunct Assistant Professor at Nanyang Technological University, Singapore. He has published more than 30 journal papers in the fields of video processing, hybrid systems, chaotic secure communication, and computer networks.

Dajun Wu received the B.S. degree in computer science from Northwest University, Xi’an, China, and the M.Eng. degree in computer engineering from Xi’an Jiaotong University, Xi’an, China, in 1993 and 1998, respectively. From 1998 to 2000, he was a Research Scholar in the School of Computer Engineering, Nanyang Technological University, Singapore. Since 2000, he has been with the Institute for Infocomm Research, Singapore. His research interests include image/video coding and computer vision.

Si Wu received the B.S. and M.Eng. degrees in telecommunication from Xidian University, Xi’an, China. He is currently working as a Senior Technical Officer in the Institute for Infocomm Research, Singapore. His research interests are multimedia communication, networking, and video processing.
