Improving Intra mode coding in H.264/AVC through block oriented transforms

Antoine Robert, Isabelle Amonou
France Telecom R&D, 4, rue du Clos Courtel, 35512 Cesson-Sévigné Cedex
Email: [email protected]

Béatrice Pesquet-Popescu
TSI - ENST Paris, 46, rue Barrault, 75634 Paris Cedex 13

Abstract— In an H.264 video encoder, the important task of intra coding is performed in the residual domain, where a spatial transform is applied to the block data. Although intra prediction is very efficient and is able to capture some orientations of the image, the intra residual may still show regular structures after this spatial prediction. To take this possible regularity into account, especially orientation, we propose a pre-processing stage that selects the best rotation to be applied to the block among a set of predefined rotations. To this end, we use a rate-distortion algorithm. We further propose to calculate the real orientation of the block to decrease the amount of computation. Experimental results show that, compared to the standard intra prediction mode of H.264, the proposed method can gain up to 1 dB.

I. INTRODUCTION

H.264/AVC [1] is the newest international video coding standard of ITU-T (as Recommendation H.264) and ISO/IEC (as International Standard 14496-10, also known as MPEG-4 Part 10, Advanced Video Coding (AVC)). It considerably improves on the previous standards in the field, notably MPEG-2, in terms of compression efficiency, thereby allowing for a broader range of target applications: broadcast video over all kinds of networks (DSL, terrestrial, mobile, etc.), video on demand, streaming services, conversational applications, etc. Although still heavily based on a hybrid predictive coding architecture, some highlighted features of the design enable enhanced coding efficiency: variable block-size motion compensation, quarter-sample-accurate motion compensation, multiple reference picture motion compensation, improved "skipped" and "direct" motion inference, directional spatial prediction for intra coding, in-the-loop deblocking filtering, etc. [2]. The design was also enhanced for improved coding efficiency, including for example a small block-size transform, a hierarchical block transform and context-adaptive arithmetic coding. The resulting encoder performs very well compared to the previous video compression standards. However, there is still some room for improvement, in particular for the intra coding mode. Indeed, we have observed that the blocks after intra prediction still show some orientations (cf. Fig. 2). In this work, we propose to pre-process the block before the H.264 transform is applied: we select the best permutation to be applied to the block among a set of possible permutations corresponding to block orientations.

Experimental results show a significant improvement for intra coding when our pre-processing stage is applied. The remainder of this paper is organized as follows. Section 2 describes the intra prediction mode in an H.264 coder and our pre-processing stage. Section 3 presents our rate-distortion based algorithm and encoding issues. Simulation results are shown in Section 4. Finally, we conclude the paper and give future work directions in Section 5.

II. SPATIAL TRANSFORM

A. Intra prediction mode in H.264/AVC

In contrast to some previous video coding standards (e.g. H.263 and MPEG-4 Visual), intra prediction in H.264/AVC is always conducted in the spatial domain, by referring to neighboring samples of previously coded and decoded blocks that lie to the left of and/or above the block to be predicted.

1) Intra 4x4 mode: When using the Intra 4x4 mode, each 4x4 block is predicted from spatially neighboring samples, as illustrated on the left-hand side of Fig. 1(a). The 16 samples of the 4x4 block, labelled a-p in the figure, are predicted using previously decoded samples in adjacent blocks, labelled A-Q. For each 4x4 block, one of nine prediction modes can be used. In addition to "DC" prediction (where one value is used to predict the entire 4x4 block), eight directional prediction modes are specified in the standard, as illustrated on the right-hand side of Fig. 1(a). These modes are suited to predicting directional structures in a picture, such as edges at various angles. When the samples E-H used by the diagonal-down-left prediction mode are not available, they are replaced by sample D. By way of example, Fig. 1(b) shows five of the nine Intra 4x4 prediction modes.

2) Intra 16x16 mode: When using the Intra 16x16 mode, the whole luma component of a macroblock is predicted. Four prediction modes are supported. Mode 0 (vertical prediction), mode 1 (horizontal prediction) and mode 2 (DC prediction) are specified similarly to the corresponding modes in Intra 4x4 prediction, except that 16 neighbors on each side are used to predict a 16x16 block instead of 4 neighbors on each side for a 4x4 block.
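As an illustration of the Intra 4x4 prediction described in Section II-A, the following minimal sketch implements three of the nine modes (vertical, horizontal and DC) and the residual that is left for the transform. The function names and the `above`/`left` argument layout are assumptions made for this example; the directional modes, sample availability rules and clipping of the standard are not modelled.

```python
def predict_4x4(above, left, mode):
    """Predict a 4x4 block from 4 decoded samples above and 4 to the left.

    Hypothetical helper: only three of the nine Intra 4x4 modes are sketched.
    """
    if mode == "vertical":       # every column repeats the sample above it
        return [list(above) for _ in range(4)]
    if mode == "horizontal":     # every row repeats the sample to its left
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":             # one rounded mean predicts the whole block
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: %r" % mode)


def residual(block, pred):
    """Sample-wise difference between the original block and its prediction."""
    return [[block[r][c] - pred[r][c] for c in range(4)] for r in range(4)]


# A block made of vertical stripes is predicted perfectly by the vertical mode.
above = [10, 20, 30, 40]
left = [11, 12, 13, 14]
block = [[10, 20, 30, 40] for _ in range(4)]

res_vertical = residual(block, predict_4x4(above, left, "vertical"))
res_dc = residual(block, predict_4x4(above, left, "dc"))
```

With these inputs the vertical residual is all zeros, while the DC residual keeps the stripe pattern. This is the kind of structure left in a residual when the chosen mode does not match the block orientation.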

TABLE I
ALL STATES IN THE 4X4 CASE

State   Angle     State   Angle
1       +27°      2       −27°
3       +45°      4       −45°

Fig. 1. Intra 4x4 prediction: panels (a) and (b)

Fig. 3. Circular shifts in the 4x4 case

Fig. 2. Part of Flower (CIF) and its residual after intra prediction

B. The pre-processing stage

After intra prediction, we are left with residual blocks that still show some regular patterns, as shown in Fig. 2. This intra predicted image has been produced by an H.264 coder with 16x16 and 4x4 prediction modes. It corresponds to the best partition of the predicted image after the encoding process, i.e. each block (4x4 or 16x16) of this image exhibits the prediction of its best intra prediction mode. We propose here to define a pre-processing stage based on permutations corresponding to given orientations inside the blocks. In other words, this pre-processing stage performs pseudo-rotations on the blocks, which makes it possible to straighten them towards the horizontal or vertical axis.

1) 4x4 blocks: In this case, we have defined 5 different states that correspond to basic block orientations and the associated permutations. In state 0, nothing needs to be done, because either the blocks are non-oriented (their directions are horizontal or vertical) or they have no suitable direction: they are oriented at an angle that is too far (more than ±3°) from the angles defined to correspond to the basic permutations. The other states correspond to blocks whose direction is close (within ±3°) to one of the angles defined in Tab. I.

In all these states, circular shifts at the pixel level are applied in order to simulate a rotation. These circular shifts avoid the interpolation problem that is inherent to real (matrix-based) rotation schemes. Moreover, this simple pixel rearrangement simulates the corresponding rotation without creating holes in the corners of the blocks. In state 1 (cf. Fig. 3, upper left-hand part), a circular shift is performed on the first two columns, and in state 2 (its opposite) on the last two columns. States 3 and 4 use more complex pixel rearrangements: state 3 corresponds to circular shifts applied to the first and last columns, followed by the same circular shift as in state 1. State 4 is similar to state 3, but the operations are performed on the rows: the first and last rows are shifted before applying the operations of state 2 (cf. Fig. 3, middle part). The figure shows that the direction of the block returns to horizontal or vertical after the rearrangement has been applied. These circular shifts thus simulate a real rotation without its disadvantages.

2) 16x16 blocks: As in the 4x4 case, we define here 17 states of macroblock orientations. State 0, again, covers the blocks that are either non-oriented or whose directions are too far (more than ±3°) from the accepted rotation angles (states). The other states correspond to blocks whose direction is close to one of the angles ±7°, ±14°, ±20°, ±27°, ±32°, ±37°, ±41° and ±45°. Each of these 16 states defines a permutation that is performed by circular shifts, as in the 4x4 case.
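The circular shifts of the 4x4 case can be sketched as follows for states 0, 1 and 2. This is only an illustration under stated assumptions: the text says which columns are shifted in each state, but the shift amount and direction used here (one row, downwards) are assumed, and the names `shift_column`, `STATE_SHIFTS` and `apply_state` are hypothetical. States 3 and 4 are left out because their exact rearrangements are only shown in Fig. 3.

```python
def shift_column(block, col, k):
    """Return a copy of `block` with column `col` circularly shifted down by k rows."""
    n = len(block)
    out = [row[:] for row in block]
    for r in range(n):
        out[(r + k) % n][col] = block[r][col]
    return out


# Hypothetical shift table: state -> list of (column, downward shift).
# Which columns move follows the text; the amount and direction are assumed.
STATE_SHIFTS = {
    0: [],                  # state 0: block left untouched
    1: [(0, 1), (1, 1)],    # state 1: first two columns
    2: [(2, 1), (3, 1)],    # state 2: last two columns
}


def apply_state(block, state, inverse=False):
    """Apply the permutation of `state`, or undo it when `inverse` is True."""
    out = [row[:] for row in block]
    for col, k in STATE_SHIFTS[state]:
        out = shift_column(out, col, -k if inverse else k)
    return out


block = [[4 * r + c for c in range(4)] for r in range(4)]
oriented = apply_state(block, 1)                    # encoder side
restored = apply_state(oriented, 1, inverse=True)   # decoder side
```

Because each state is a pure permutation of the 16 samples, no sample is interpolated or lost, and the decoder can undo it exactly by applying the opposite shifts.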

III. RATE-DISTORTION OPTIMIZATION AND CODING

A. Rate-distortion based selection

H.264 video coding is based on the concept of rate-distortion optimization (RDO) [3], which means that the encoder has to encode the intra blocks using all the mode combinations and choose the one that gives the best performance. The coding cost is based on two variables: rate and distortion. Depending on the best compromise, a macroblock is encoded by either intra 16x16 coding or intra 4x4 coding. In intra 16x16 coding, the four prediction modes are tested. The one that gives the best compromise between rate and distortion is defined as the best intra 16x16 coding mode. Its rate and distortion are also stored. After intra 16x16 coding has been tested and the best solution retained, intra 4x4 coding is applied to the sixteen 4x4 blocks in the macroblock. As in intra 16x16 coding, all nine prediction modes are tested. The best one for each block in the rate-distortion sense is stored with all its associated information. The rates of the 4x4 blocks are accumulated. The distortion is given in each case by the squared error of the macroblock:

$$D = \sum_{m=0}^{15} \sum_{n=0}^{15} \left( i_{MB}(m,n) - \hat{i}_{MB}(m,n) \right)^2 \qquad (1)$$

where $i_{MB}(m,n)$ is the pixel $(m,n)$ of the original macroblock and $\hat{i}_{MB}$ the reconstructed macroblock. After all the tests have been performed, the rates and distortions of the macroblock in both cases (best 16x16 solution versus best 4x4 combination) are compared in order to select the most competitive intra coding.

Our pre-processing stage fits perfectly in this rate-distortion optimization scheme. Indeed, our pre-processing simply adds modes to be tested in each case:
• in intra 16x16 coding, instead of testing each of the four prediction modes once, we test it 17 times, once with each of our 16x16 orientations (cf. II-B.2);
• in intra 4x4 coding, for each 4x4 block of the macroblock, we test all nine prediction modes 5 times, once with each of our five candidate orientations (cf. II-B.1).

This RD selection of the orientation is relatively complex in terms of the number of rate-distortion evaluations: 17 × 4 + 16 × (5 × 9) = 68 + 720 = 788 modes to be tested per macroblock, against 4 + 16 × 9 = 148 for H.264.

B. Coding and decoding stages

1) Macroblock: After the best mode has been selected (the best mode being a combination of block size, permutation and intra prediction), the encoding of the macroblock is left to H.264. Each macroblock is transformed by the AVC 4x4 integer transform before being quantized and entropy-coded using CABAC [4]: the Context-based Adaptive Binary Arithmetic Coding of H.264 achieves good compression performance through (a) selecting probability models for each syntax element according to the element's context, (b) adapting probability estimates based on local statistics and (c) using arithmetic coding. It can be used in Main profile (alternatively, CAVLC [1] [2] can be used in other profiles). Oriented blocks are treated in exactly the same way as non-oriented ones. Because the blocks are straightened towards the horizontal or vertical axis before the transform, the AVC integer transform is more efficient on the re-oriented data, which improves the overall rate-distortion performance.

The decoding is left to H.264 too. The macroblocks are entropy-decoded using CABAC, inverse quantized and inverse transformed. The macroblocks that have been oriented before coding have to be re-oriented after the decoding process using the inverse permutations (cf. II-B). We obviously need to encode this permutation information for all the macroblocks (state 0 must be transmitted too).

2) Permutation information: In intra 16x16 coding, the macroblock type is written in the macroblock header, defining the assigned prediction mode (Intra16x16PredMode). We then add the 16x16 orientation mode that has been selected (which may be 0). In the syntax of H.264, the macroblock type (mb_type syntax element) for intra pictures can take up to 24 values. We decided to re-use the same contexts to encode our 17 possible orientations with CABAC, on the assumption that these contexts describe our orientations well. At the decoding side, the decoder reads the macroblock header. In this header, the macroblock type indicates an intra block and the prediction information defining the intra prediction modes for luma and chroma. The extra orientation information that we have added is then read. The texture information is then decoded and finally re-oriented if the orientation is not zero.

If the mode used is intra 4x4, the intra prediction modes are predicted from neighboring blocks before being encoded. This information is written in the block header, in the prediction syntax element (mb_pred).
Because this predictive coding of the information is very efficient, we decided to use the same type of prediction for our 4x4 orientation modes (cf. Fig. 4) before coding them with CABAC using the same contexts as those defined for the intra prediction modes. The orientation modes of the above (A) and left (B) adjacent blocks are compared if available; otherwise they are set to 0. The higher one defines the most probable orientation mode for the block to be encoded (C). If the orientation of block C equals the most probable orientation mode, only a flag is coded. Otherwise, if it is lower than the most probable mode, we encode the orientation mode itself, and if it is higher, we encode the orientation mode minus 1. To summarize, the luma intra prediction is encoded first, then the permutation information, and then the chroma intra prediction. At the decoding side, for each 4x4 block that is decoded as intra, the intra prediction mode for luma is read, then the 4x4 orientation mode and the intra prediction mode for chroma.

Fig. 4. Adjacent 4x4 intra coded blocks

IV. EXPERIMENTAL RESULTS

The proposed algorithm was implemented in JM10 [5], provided by the JVT. All the experiments were run in Main profile at level 4.0, which permits the use of CABAC, but only on residual intra frames. The sequences are generated by varying the QP for intra slices over all available values (0-51). Each sequence is made up of twenty intra frames. We show here two types of results: first without taking into account the encoding of the permutation information, and then with this information encoded.

The results without coding the permutation information are shown in Fig. 5 for the sequence Flower in CIF format at 15 Hz and in Fig. 6 for the sequence Mobile&Calendar in CIF format at 15 Hz. These figures show that our method improves H.264 coding at all bitrates. Moreover, experiments have shown that all the possibilities of our pre-processing method have been exploited: our test images are composed of 396 macroblocks. For example, at 700 kbits/s and for the first image of the sequence Flower, 228 macroblocks are encoded in intra 4x4 and 168 in intra 16x16. In both cases orientations have been used. The PSNR improvement over H.264 ranges from 0.21 dB for Flower and 0.22 dB for Mobile&Calendar at 700 kbits/s to more than 1 dB at high bitrate (12000 kbits/s) in both cases. Similar results have been obtained with a large number of sequences such as Akiyo, Foreman, Bus and Tempete, as shown in Tab. II. For example, the sequence Tempete shows a gain of 0.20 dB compared to H.264 at 700 kbits/s, and a gain higher than 0.50 dB beyond 9600 kbits/s, or for a QP lower than 9.

Fig. 5. Results for Flower (CIF)

Fig. 6. Results for Mobile&Calendar (CIF)

TABLE II
RESULTS FOR OTHER SEQUENCES

Sequence (CIF)     ΔPSNR (dB) at d = 700 kbits/s     ΔPSNR > +0.50 dB for d > (kbits/s)     QP <
Akiyo              +0.10                             9000                                   10
Bus                +0.17                             9500                                   9
Container          +0.18                             9300                                   9
Football           +0.19                             9800                                   6
Foreman            +0.19                             9400                                   9
Tempete            +0.20                             9600                                   9

Sequence (QCIF)    ΔPSNR (dB) at d = 250 kbits/s     ΔPSNR > +0.50 dB for d > (kbits/s)     QP <
Carphone           +0.13                             1500                                   9
Foreman            +0.20                             1700                                   7

In a second step, we actually encoded the permutation information with the method presented above. The corresponding RD curve for the same sequence Mobile&Calendar is plotted in Fig. 7. This figure shows that the rate needed to encode the permutation information outweighs the gain brought by our method up to a certain (high) rate (here about 12000 kbits/s). For Mobile&Calendar at 700 kbits/s, the permutation information costs 0.70 dB against a gain of 0.22 dB, i.e. a net loss of 0.48 dB relative to H.264. The very simple encoding that we performed on the orientation information (for now simply coded using the same prediction, the same contexts and the same syntax element as the intra prediction modes) is clearly not efficient at the moment and needs further improvement.

Fig. 7. Results for Mobile&Calendar with the permutation information
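For reference, the predictive coding of the 4x4 orientation modes described in Section III-B.2 can be sketched as follows. The function names and the `(flag, remainder)` return convention are assumptions made for this example, and the CABAC binarization and context modelling are not represented: the sketch only shows which symbol values are produced and that the decoder can invert them.

```python
def most_probable_orientation(above, left):
    """Most probable orientation from neighbours A (above) and B (left).

    An unavailable neighbour is passed as None and counted as orientation 0.
    """
    a = 0 if above is None else above
    b = 0 if left is None else left
    return max(a, b)


def encode_orientation(mode, above, left):
    """Return (flag, remainder); flag 1 means 'equal to the most probable mode'."""
    mpm = most_probable_orientation(above, left)
    if mode == mpm:
        return (1, None)
    # Below the most probable mode: send the mode itself; above: send mode - 1.
    return (0, mode if mode < mpm else mode - 1)


def decode_orientation(flag, remainder, above, left):
    """Invert encode_orientation using the same neighbour information."""
    mpm = most_probable_orientation(above, left)
    if flag == 1:
        return mpm
    return remainder if remainder < mpm else remainder + 1


# Example: neighbours carry orientations 3 (above) and 1 (left).
same_as_neighbour = encode_orientation(3, 3, 1)
higher_than_neighbour = encode_orientation(4, 3, 1)
```

With five 4x4 orientation states, the remainder takes one of four values when the flag is 0. The cost of this side information for every block is what the results with the permutation information reflect.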

V. CONCLUSION AND FUTURE WORK

We have introduced in this paper a pre-processing stage based on block orientation that significantly improves H.264 at all bitrates. By taking into account the orientation of the intra macroblocks before encoding, we adapt them to the DCT without modifying it. The oriented blocks or macroblocks are obtained by applying very simple circular shifts at the pixel level, thus avoiding the problems inherent to classical rotation schemes with re-interpolation. The coder then has to encode and send the permutation information using CABAC. The encoding method that we use for the orientation information is very close to the one used for intra prediction. Future work will focus on improving this method. In particular, we will work on reducing the impact of the additional orientation information that needs to be transmitted to the decoder and on speeding up the algorithm. We also intend to extend it to 8x8 blocks (with FRExt only [6]) with the 8x8 integer DCT. We also plan to extend the method to the encoding of chroma components and inter frames (16x8, 8x16, 8x4 and 4x8 block modes).

REFERENCES

[1] Advanced video coding for generic audio-visual services, JVT - ISO/IEC 14496-10 AVC - ITU-T Recommendation H.264. Draft ITU-T Recommendation and Final Draft International Standard, JVT-G050r1, 2003.
[2] T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, July 2003.
[3] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. Sullivan, "Rate-constrained coder control and comparison of video coding standards," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 688–703, July 2003.
[4] D. Marpe, H. Schwarz, and T. Wiegand, "Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 620–636, July 2003.
[5] "Joint model 10," JVT ISO/IEC ITU-T, 2006, http://iphome.hhi.de/suehring/tml/index.htm.
[6] Advanced Video Coding Amendment 1: Fidelity Range Extensions, JVT - ISO/IEC 14496-10 AVC - ITU-T Recommendation H.264 Amendment 1. Draft Text of H.264/AVC Fidelity Range Extensions Amendment, July 2004.
