One Dimensional Transform For H.264 Based Intra Coding - CiteSeerX

4 downloads 319 Views 151KB Size Report
yumi.sohn@samsung.com and wjhan.han@samsung.com. ABSTRACT .... length coding. Fig. 6: Various type of scan order (4 types of zigzag scan from.
ONE DIMENSIONAL TRANSFORM FOR H.264 BASED INTRA CODING Yumi Sohn and Woo Jin Han Digital Media R&D Center, Samsung Electronics, Co. Suwon, Korea [email protected] and [email protected] ABSTRACT Two dimensional (2D) discrete cosine transform (DCT) are widely used as the transform method in most of video and image coding area. In general, 2D separable DCT is effective for smoothed regions of which components are highly correlated. When the block to be transformed has directional edges or a few values, however, 2D transform can yield worse performance than 1D transform. This drawback is taken care of by 1D transform with line prediction this paper proposed. Line prediction between each coefficient ahead of 1D transform can reduce the repetitive residue. If it is applied with an appropriate scan order, therefore, it can bring additional bit saving of transform coefficients in entropy coding. By experiments, it is shown that the proposed framework including 1D transform and line prediction improves the overall rate-distortion (R-D) coding performance up to nearly 8% in comparison with H.264. Index Terms – video coding, directional one dimensional transform, line prediction 1.

INTRODUCTION

In the international video coding standard, H.264/AVC [1], the coding processes for intra pictures are performed by the unit of 4×4 block or 16×16 macroblock. For intra prediction, there are nine 4×4 prediction modes and four 16×16 modes for each luminance block, and four 4×4 modes for each chrominance component. The encoder selects the prediction mode to have minimum difference between the original block and its prediction. Nevertheless, it is hard to predict the original block from only nine types of intra prediction as well as inter prediction using motion vector. Since 2D DCT sometimes leads the intra prediction to remain a large residual, moreover, it cannot be said that the conventional 2D DCT is most suitable for intra picture coding. In general, the transform with a large block size shows the abilities of the energy compaction and the conservation of the quantized signals' detail

better than the transform with a small block size. On the contrary, the transform with a large size extends its coefficients wider than with a small size in partially stationary signal. The selection of the transform type according to the signal characteristics, therefore, can increase the overall coding performance. Every transform-based coding scheme, almost without exception, choose the 2D DCT on non-overlapped image blocks with a square size of N×N. Actually, this conventional N×N DCT is always achieved by a cascade form of two 1D transform: one is along the vertical direction and the other is along the horizontal direction. For improved compression performance, a variable block size transform coding for H.264/AVC is discussed in [3]. In this paper, the concept of variable block size transform indicates the adaptation of the transform block size to the block size used for motion compensation. In case of intra coding, the size of the block transform is adapted to the properties of intra prediction signal. Not the more, all the transform block which proposed in [3] are square size or double size of square. Before that proposal, lots of researches are taken on variable block size coding which based on quad-tree structures. However, there is no theoretical analysis or experimental results about one dimensional transform. In this paper, we exploit both 2D transform and 1D transform at the same time to transform the residual with 1D directional prediction adaptively. This paper is organized as follows. Section II enters into the details of the proposed one dimensional transform with line prediction. The improvements of the overall rate-distortion performance of the proposed scheme for intra picture are discussed in Section III. Finally, the conclusions are drawn in Section IV. 2.

2.1.

ONE DIMENSIONAL TRANSFORM FOR H.264 BASED INTRA CODING One Dimensional Transform

In the conventional directional intra prediction scheme, it is hard to predict accurately when there

exists a directional edge or a few values only in a particular region of a block. The conventional intra prediction process is as follows: One of 9 directional prediction modes for each 4×4 block, which has the smallest distortion, can be usually selected by ratedistortion optimization. The residual is obtained by subtracting the selected prediction from the original block and it is transformed into frequency domain using 2D DCT. If spatial correlation exists, the 2D DCT shows the best performance. When most values of a residual block are zero or the part of a strong directional edge, however, 1D DCT is much better than 2D DCT. For instance, let us consider two types of coefficients block shown in Fig. 1: one has a smooth region and the other has a strong direction. For a smooth region in Fig. 1 (a), 2D DCT shows a good ability of energy compaction. On the contrary, in the latter case, 2D DCT yields more non-zero AC coefficients than the vertical 1D DCT and, therefore, it cannot be said that 2D DCT is always the best transform method (Fig. 1 (b) and (c)). 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10

00 10 00 00 00 10 00 00 00 10 00 00 00 10 00 00

00 10 00 00 00 10 00 00 00 10 00 00 00 10 00 00

Fig. 1: Examples of transformed coefficients. (a) 2D DCT for stationary signal. (b) 2D DCT for partially stationary signal. (c) 1D DCT for partially stationary signal.

The 1D transform in 4×4 blocks can be expressed as depicted in Fig. 2. Besides the typical vertical and horizontal transform, 1×1 transform proposed in [2] can be used as one of the 1D transform. Actually, since 1×1 transform is not a real transform type, nothing is changed at that time. The marginally correlated samples are hardly compressed either in spatial domain or frequency domain. Though line prediction and quantization are used instead of bypassing transform, therefore, the coding efficiency is preserved. In [2], bypassing transform was used with the gradient-based scanning and the perceptual quantization under 1/8pel motion estimation/compensation. Compared to

N×N transform, our proposed design is more flexible with variable sub-partition transform. Four types of transform mode including the existing 2D DCT for each block can be coded by 3 bits variable length coding (VLC). The most probable mode is used like encoding intra prediction mode because the transform mode also has a high spatial correlation with neighboring blocks.

(c) (b) (a) Fig. 2: 4x4 image block in which the 1-D transform is performed. (a) 4x1 transform. (b) 1x4 transform. (c) 1x1 transform.

2.2.

Line Prediction

We proposed the one dimensional transform that is adaptively applied with the conventional 2D transform according to the characteristics of residuals. Since current prediction and transform are not orthogonal, the de-correlation power may be overlapped. When the conventional 2D transform with a square block size is performed, there is no other way to predict line by line because only the decoded area can be used for the prediction. In the processing flow of prediction and transform as depicted in Fig. 3, if applying the conventional block prediction to the 1D transform, there exists spatial redundancy as becoming more distant. (a) Block prediction (using 1D prediction)

2D transform

Line prediction (using 1D prediction)

1D transform

(b)

(c)

Block prediction

Line prediction

Max Spatial distance

Fig. 3: Processing flow of prediction and transform. (a) Current method. (b) Proposed method. (c) comparison between block prediction and line prediction.

By 1D transform, block samples can be transformed and reconstructed line by line. The line prediction raises the efficiency of prediction by reducing spatial redundancy. Fig. 4 shows an example to explain the superiority of the line prediction compared with the conventional intra prediction. It is easily found that it is rather reasonable to apply 1D transform to the residual obtained by the line prediction in the second step. The line prediction performed additionally after intra prediction reduces remaining spatial correlation with respect to the orthogonal directions of 1D transform; e.g., vertical prediction for 4×1 transform and horizontal prediction for 1×4 transform. In case of bypassing transform mode, various types of line prediction can be applied to minimize residuals according to the intra prediction mode.

energy, we designed various scan orders to find the best fit with each transform as depicted in Fig. 6. Without additional information for scan order type, it could be improved as much 6~10% bit saving comparing with just zigzag scan. The scan order type varies with the selected transform type and the characteristics of a sequence, and the zigzag scan is not the most selected one among them. In our experiments, the best combination of 4 transform types and 8 scan order types is selected by rate distortion optimization, where 4 transform types are 2D DCT and other three 1D DCT and 8 scan order types are shown in Fig. 6. The scan mode of each 4×4 block is encoded in the form 3 bits fixed length coding.

(a)

(b)

original

prediction

residual

Fig. 6: Various type of scan order (4 types of zigzag scan from each vertex, horizontal / vertical scan).

3. original

prediction

Fig. 4: A comparison of conventional method and new method. (a) Example of intra prediction (vertical) + 2D transform. (b) Example of line prediction + 1D transform (4 times).

Line prediction can be generalized like Fig. 5. It could be used to not only 1D transform but also 2D transform with other small transform block.

Fig. 5: Various type of line prediction.

2.3.

EXPERIMENTAL RESULTS

residual

Scan Order

After the conventional 2D DCT transform and quantization, most AC values will be zero and we can gather even more consecutive zeros by using zigzag scan. When we use 1D transform, the zigzag scan is not the appropriate method because the location of DC coefficient will be different along the transform direction. Therefore, in order to rearrange the coefficients according to the order of decreasing

In this section, we describe the experiments based on the modified H.264 reference software and ratedistortion curves to prove the effectiveness of our 1D transform scheme. For our experiments, we selected four sequences of the CIF format: “Bus”, “Mobile & Calendar”, “Football”, and “Forman.” All sequences were encoded using four fixed quantization parameters for Intra frame, which are 24, 28, 32, and 36. The transform and scan order of each 4×4 block was selected adaptively with side information. Table 1: Comparison between conventional approach and applying proposed tools one by one (FOREMAN) Improved Bit rate [%]

Proposed tools

QP 24

QP 28

QP 32

QP 36

Avg.

(1)

1.36

1.11

0.63

0.37

0.87

(2)

2.27

1.66

0.97

0.63

1.38

(3)

3.53

2.69

1.93

1.28

2.36

(4)

4.38

3.23

2.19

1.50

2.82

Table 1 is the bit rate reduction of “Forman” sequence comparing four proposed methods with the conventional one, where the four

proposed methods mean (1) conventional + 1×1 transform, (2) (1) + scan order, (3) (2) + 4×1/1×4 transform, and (4) (3) + line prediction. The improvement in bit rate reduction is about 0.8 ~ 2.8% of each step in high bit rate and, therefore, our proposed methods can be said to be efficient. Table 2 shows the reduced bit rates of the H.264 coding method for all test sequences when using the conventional and the proposed method. It can be observed that the proposed algorithm outperforms the conventional one up to nearly 8% at the same PSNR and 2.5% at the average bit rate saving. Next, let us regard these results as the following characteristics of contents, that is, as mentioned above, partially stationary signals are well suited for 1D transform and line prediction. In case of “Football” sequence, the smooth ground area is easy to encode using the conventional transform. On the contrary, since “Mobile” and “Foreman” sequence have lots of edges and directions, the performances for these sequences are higher than the others. Fig. 7 shows the R-D curves of “Mobile” sequence for which the proposed method has the best performance. It is also found that the gains in low bitrates are lower that in high bitrates. This might be due to the symbol bits taking up a small part of the whole in high bitrates. Table 2: Bit rate reduction ratio compared with H.264 Improved Bit rate [%]

Test Sequence

QP 24

QP 28

QP 32

QP 36

Avg.

Bus

2.93

1.86

1.33

0.92

1.76

Mobile

7.51

5.31

3.85

2.87

4.88

Football

0.78

0.32

0.16

0.04

0.33

Foreman

4.38

3.23

2.19

1.50

2.83

Avg.

3.90

2.68

1.88

1.33

2.45

40

PSNR [dB]

38 36 34 32

Conventional

30

Proposed

28 3000

4000

5000

6000

7000

8000

9000

10000

Bitrate [bps]

Fig. 7: R-D curves of the conventional approach and the proposed method (MOBILE).

4.

CONCLUSIONS

In this paper, we proposed the one dimensional

transform framework for intra video coding. To complement the 1D transform, the line prediction is performed in advance to decrease the redundant spatial correlation. We applied the prediction and transform to a coder and proved that our proposed method works better than the traditional one up to nearly 8% in terms of bit rate reduction at the same PSNR in I frame. Scan order selection is one of the key techniques which should be further investigated. 5.

REFERENCES

[1] ITU-T Recommendation H.264 and ISO/IEC 14496-10 (MPEG-4) AVC: “Advanced Video Coding for Generic Audiovisual Services,” Mar. 2005. [2] M. Narroschke, “Extending H.264/AVC by an adaptive coding of the prediction error,” Proc. of Picture Coding Symposium, Beijing, China, Apr. 2006. [3] M. Wien, “Variable Block-Size Transform for H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp. 604-613, Jul. 2003. [4] N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Trans. Comput., vol. 23, pp. 90-93, 1974.

Suggest Documents