A HARDWARE-ORIENTED INTRA PREDICTION SCHEME FOR HIGH DEFINITION AVS ENCODER

Man-Lan Wong, Yi-Lun Lin, and Homer H. Chen

Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan
{r94942048, r94942037}@ntu.edu.tw, [email protected]

ABSTRACT

Audio and Video Coding Standard - Part 2 (AVS-P2) is a video coding standard developed by the AVS Workgroup of China. In this paper, an intra prediction scheme for high definition (HD) AVS encoders is proposed to reduce the local buffer storage and computational complexity in hardware. Simulation results show that our scheme induces very little performance degradation for typical frame structures. It can be easily and efficiently implemented for practical applications.

Index Terms—AVS, H.264/AVC, intra prediction, hardware codec design

1. INTRODUCTION

Intra prediction is a coding technique for reducing the spatial redundancy of neighboring macroblocks (MBs). Most recent video coding standards, such as MPEG-4, H.263+, and H.264, have adopted intra prediction, either in the spatial domain or in the transform domain, as one of the principal coding tools. However, two major problems exist in the hardware architecture design of intra prediction. The first is that intra prediction and mode decision occupy most of the computational resources in the coding of intra frames. Therefore, parallelism is required in the hardware design to accelerate this critical function, which considerably increases hardware (H/W) cost and power consumption. The second problem comes from the fact that the reconstructed pixels of neighboring blocks are necessary for the generation of intra-predicted blocks. This breaks the traditional macroblock pipelining scheme and stalls a block at the prediction and mode decision stage until the previous block is reconstructed. Some proposals have been presented in the literature to overcome these issues [1-2], but most of them tend to add extra H/W components or approximate the mode decision.

In this paper, an intra prediction scheme for the AVS [3] encoder is proposed. The scheme provides an easy-to-implement solution for the hardware design of AVS intra prediction, which significantly reduces the gate count and local memory and avoids intra mode decision. As shown in the following sections, little performance degradation is observed in IPPP structures, which makes the scheme especially suitable for applications with large GOP sizes.

Fig. 1. Block diagram of AVS encoder

The paper is organized as follows. Section 2 gives an overview of AVS-P2. Section 3 describes the intra prediction modes in AVS. Section 4 introduces the proposed intra prediction scheme. Experimental results are presented in Section 5. Section 6 describes an extension of the scheme to H.264/AVC. Finally, conclusions are given in Section 7.

2. OVERVIEW OF AVS-P2

Audio and Video Coding Standard (AVS) is a second-generation source coding standard developed by the AVS Workgroup of China. AVS-P2 defines the video coding standard for applications such as high definition digital video broadcasting and high density storage media. IPTV and satellite broadcasting experiments using the AVS standard have recently been carried out by the Chinese industry. The system block diagram of the AVS video encoder is depicted in Fig. 1. The encoding loop is similar to that of H.264/AVC, but the details are different. AVS provides a hardware-friendly solution that trades off among performance, backward compatibility, and coding complexity. The main coding tools of AVS include the pre-scaled integer transform, intra and inter prediction, content-adaptive 2D VLC (CA-2D-VLC), and in-loop deblocking. Each of them is briefly described in Table 1.

Table 1. Properties of AVS

  Coding tools      Features
  Inter prediction  Inter modes include 16x16, 16x8, 8x16, and 8x8, with at most 2 reference frames
  Intra prediction  8x8 based, with 5 modes for luminance and 4 modes for chrominance
  Transform         8x8 based integer transform with pre-scaling
  Interpolation     Up to quarter-pel precision; 4-tap filter for half and quarter samples
  Entropy coding    CA-2D-VLC
  Deblocking        6-tap filter applied to the boundaries of 8x8 blocks

3. INTRA PREDICTION IN AVS

As described in the previous section, intra prediction in AVS is performed on 8x8 blocks. AVS performs intra prediction in the spatial domain by referring to the reconstructed pixels of neighboring blocks before deblocking, which allows rather sophisticated prediction schemes compared to transform-domain prediction. In AVS, there are five intra modes for the luminance component and four intra modes for the chrominance components, denoted Intra_8x8 and Intra_chroma, respectively. The neighboring left and top macroblocks in the same frame are exploited for intra prediction. Figs. 2 and 3 show the operations of the intra modes for the luminance and chrominance components, respectively. As can be seen in Figs. 2 and 3, the intra prediction modes of AVS are very similar to the corresponding intra modes of H.264/AVC, with small differences: AVS applies a low-pass filter to produce the prediction values of the DC mode and uses pixels from the adjacent left and top 8x8 blocks in the down-left mode to obtain a more accurate prediction.
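Section 4 restricts the encoder mainly to the horizontal and vertical modes, which simply replicate reconstructed neighboring pixels. As a point of reference, the following C sketch shows how an 8x8 prediction block could be generated for these two modes; the function and array names are illustrative, and the filtered DC prediction of AVS is intentionally omitted (only the constant fallback of 128, used when no reference pixels are available, is shown).

```c
/* Minimal sketch (not the normative AVS-P2 process) of 8x8 prediction block
 * generation for the vertical and horizontal intra modes, which copy the
 * reconstructed top row or left column and therefore need no filtering.    */
#include <stdint.h>
#include <string.h>

#define BLK 8

/* top[0..7]: reconstructed pixels above the block; left[0..7]: to its left. */
static void predict_vertical(uint8_t pred[BLK][BLK], const uint8_t top[BLK])
{
    for (int y = 0; y < BLK; y++)
        memcpy(pred[y], top, BLK);          /* every row repeats the top row    */
}

static void predict_horizontal(uint8_t pred[BLK][BLK], const uint8_t left[BLK])
{
    for (int y = 0; y < BLK; y++)
        memset(pred[y], left[y], BLK);      /* row y repeats its left neighbor  */
}

static void predict_dc_no_neighbors(uint8_t pred[BLK][BLK])
{
    memset(pred, 128, BLK * BLK);           /* constant value used when no
                                               reference pixels are available   */
}
```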

Fig. 2. Luma intra prediction modes


4. PROPOSED INTRA PREDICTION SCHEME

4.1. Conventional two-stage pipelining in video codecs

In the hardware design of conventional video coding standards, a two-stage pipeline configuration is usually adopted to increase hardware efficiency and utilization. The first stage is the prediction stage, which includes motion estimation (ME) and intra prediction (IP); ME and IP are responsible for prediction in the temporal and spatial domains, respectively. The second stage is the block stage, which generates the residues and the bit-stream from the prediction of stage one. The reconstructed pixels obtained after the decoding loop in this stage are also fed back to stage one as references for prediction. The functional blocks in stage two include the discrete cosine transform (DCT), quantization (Q), inverse quantization (IQ), inverse discrete cosine transform (IDCT), motion compensation (MC), and entropy coding (EC). Compared to serial operation, two-stage pipelining improves the processing capability of the video encoder and shortens the encoding time. However, due to the long encoding path and the large computational requirement of the AVS encoder, real-time implementation can only be achieved with more efficient pipelining schemes.
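To make the stage overlap concrete, the C sketch below models the two-stage schedule in software: while stage two reconstructs MB t-1, stage one already predicts MB t. The structure and function names are hypothetical placeholders rather than an actual encoder interface, and the sequential loop only emulates what dedicated hardware units execute in parallel.

```c
/* Software model of the conventional two-stage macroblock pipeline; the
 * mb_ctx_t structure and the stage functions are hypothetical placeholders. */
typedef struct { int mb_idx; /* prediction, residual, bitstream state ... */ } mb_ctx_t;

static void prediction_stage(mb_ctx_t *mb) { (void)mb; /* ME and IP             */ }
static void block_stage(mb_ctx_t *mb)      { (void)mb; /* DCT/Q/IQ/IDCT, MC, EC */ }

void encode_frame(int mb_count)
{
    mb_ctx_t slot[2];                         /* two MBs in flight, one per stage */
    for (int t = 0; t <= mb_count; t++) {     /* one extra step drains the pipe   */
        if (t < mb_count) {
            slot[t & 1].mb_idx = t;
            prediction_stage(&slot[t & 1]);   /* stage 1 works on MB t            */
        }
        if (t > 0)
            block_stage(&slot[(t - 1) & 1]);  /* stage 2 works on MB t-1; its
                                                 reconstructed pixels are fed
                                                 back as references for stage 1   */
    }
}
```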

Fig. 3. Chroma intra prediction modes


4.2. Proposed intra prediction scheme for multi-stage pipelining in AVS

4.2.1. Basic ideas
Data dependencies in sequential algorithms are a serious impediment to parallelism; they become the bottleneck in the processing speed and efficiency of hardware units, especially as the number of pipeline stages increases. In video coding standards, one of the most critical components that introduces data dependencies and prevents the hardware from operating in parallel is intra mode prediction and decision. Intra prediction requires the reconstructed pixels of neighboring blocks; therefore, pipelining between neighboring blocks is impossible, and parallel processing is usually necessary for hardware acceleration. Moreover, intra mode decision consumes considerable computation time to select the best candidate mode, which significantly increases power consumption and hardware implementation cost. To overcome these drawbacks, we propose a hardware-oriented intra prediction scheme that greatly reduces the computational complexity and simplifies the design of the intra predictor. In the proposed scheme, a fixed pattern is applied to intra mode decision instead of choosing the best intra mode among all the candidates. The basic pattern is shown in Fig. 4(a), where each block represents an 8x8 luminance block, and DC, H, and V denote the DC, horizontal, and vertical modes, respectively.

Except for the top-left 8x8 block, which allows only the DC mode, all other blocks are coded with either the horizontal or the vertical mode. The horizontal and vertical modes have the advantage that, unlike the other modes, no filtering operation is required to perform the prediction, which makes these two modes easy to implement in hardware. Note that although the top-left block is assigned the DC mode in our proposed scheme, time-consuming filtering operations are still avoided, because the DC mode uses 128 as the prediction value in this case. For the chrominance components, the designed pattern is similar to its luminance counterpart, except that all blocks not belonging to the first MB column are assigned the horizontal mode. In summary, our proposed scheme simplifies the complicated intra prediction process to only two steps: first, cache the reconstructed pixels in local memory, and then fetch them from the buffer as the prediction in the later process.

4.2.2. Analysis of the proposed scheme
Video quality may degrade due to the simplification of intra mode decision in the proposed scheme. However, according to our observations, the performance degradation is negligible for high resolution videos. This is because most blocks in high resolution videos belong to flat regions, where the difference between the various directional predictions decreases, making our proposed scheme especially suitable for applications such as digital video broadcasting (DVB) and high-definition television (HDTV). In addition to the characteristics stated above, our scheme has another desirable property specifically designed for hardware. In the coding of intra blocks, the reconstructed pixels of the MBs above the current MB have to be stored in a buffer as reference pixels for prediction, which requires memory storage proportional to the frame width. The situation is even worse for high resolution videos. For example, a 704 (pixels) x 8 (bits/pixel) buffer is required to encode an I-frame at CIF size, but a 3840 (pixels) x 8 (bits/pixel) buffer is needed for HD1080p videos, both in 4:2:0 format. Such a large local memory burdens an encoder with a large area requirement and high implementation cost.
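As a back-of-the-envelope check of the buffer sizes quoted above, and assuming they count one reconstructed reference row of luma plus the two half-width chroma rows of a 4:2:0 frame (an interpretation, not stated explicitly in the text), the required storage is simply twice the luma width in bytes:

```c
/* Row-buffer size for the conventional approach, under the assumption that
 * one reference row of Y, Cb, and Cr samples (8 bits each) must be kept.   */
#include <stdio.h>

static int row_buffer_bytes(int luma_width)
{
    return luma_width + 2 * (luma_width / 2);   /* Y row + Cb row + Cr row */
}

int main(void)
{
    printf("CIF (352 pixels wide):      %d bytes\n", row_buffer_bytes(352));  /* 704  */
    printf("HD1080p (1920 pixels wide): %d bytes\n", row_buffer_bytes(1920)); /* 3840 */
    return 0;
}
```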

To relieve this burden, we force all the upper 8x8 blocks in each MB to be predicted with the horizontal mode, in which case only the left reconstructed pixels, rather than the upper ones, are needed in the prediction process. In this way, our scheme reduces the local buffer required for storing reference pixels from O(n) to O(1), where n is the frame width of the video source. Finally, the blocks labeled "H/V" in Fig. 4(a) can be assigned either the horizontal or the vertical mode according to different criteria. Our simulation results in the next section show that assigning these blocks the vertical mode leads to better performance than the other choices.
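As a concrete illustration, the sketch below assigns an intra mode to each luma 8x8 block following one plausible reading of the basic pattern in Fig. 4(a): DC for the top-left block of the frame, horizontal for the upper block row of every MB, and vertical for the remaining "H/V" blocks, the assignment that Section 5 shows to perform best. The enum names are illustrative, and the handling of other frame-boundary blocks is simplified.

```c
/* Sketch of the fixed luma mode assignment suggested by Fig. 4(a); the exact
 * treatment of frame-boundary blocks is an assumption, not taken verbatim
 * from the pattern definition.                                              */
typedef enum { MODE_DC, MODE_HORIZONTAL, MODE_VERTICAL } intra_mode_t;

/* blk_x, blk_y: 8x8-block coordinates in the luma frame. Each MB covers a
 * 2x2 group of 8x8 blocks, so an even blk_y is the upper row of an MB.      */
static intra_mode_t fixed_luma_mode(int blk_x, int blk_y)
{
    if (blk_x == 0 && blk_y == 0)
        return MODE_DC;          /* top-left block: DC with constant value 128 */
    if ((blk_y & 1) == 0)
        return MODE_HORIZONTAL;  /* upper block of an MB: needs only the eight
                                    reconstructed pixels to its left           */
    return MODE_VERTICAL;        /* "H/V" block: vertical reuses pixels from
                                    the upper block of the same MB             */
}
```

Because no block assigned the horizontal mode reads pixels from the MB row above, and the vertical-mode blocks reference only the upper blocks of their own MB, the frame-wide row buffer is replaced by a small per-block left-column cache, which is the O(1) storage claimed above.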

Fig. 4. (a) Proposed luma intra prediction scheme (basic pattern); (b)-(d) three patterns designed for simulation, denoted "Pattern 1", "Pattern 2", and "Pattern 3"

5. SIMULATION RESULTS

In the simulation, five patterns are tested and compared to verify the performance of our proposed scheme, three of which are shown in Figs. 4(b)-(d). The local buffer required for storing neighboring reconstructed pixels is 24 bytes for Patterns 1 and 2, and 16 bytes for Pattern 3. The other two patterns selected for comparison are "All mode" and "DC mode". All mode means that the intra mode with the minimum SAD is chosen as the best mode for each 8x8 block, and DC mode represents the case where the DC mode is assigned to all 8x8 blocks. Our simulation is run on a 2.4 GHz Pentium 4 PC with 1.24 GB of DRAM, and the coding parameters for the test sequences are listed in Table 2. "Mobcal" and "Parkrun" were chosen as the test sequences. The rate-distortion (RD) curves of Patterns 1, 2, and 3, along with those of All mode and DC mode, are depicted in Figs. 5 and 6 for the two sequences, respectively. It can be seen that Patterns 1, 2, and 3 induce only a small RD performance drop compared with All mode. Hardware designers can devise different fixed patterns according to their requirements and the tradeoff between performance and complexity. Similar results are obtained for the IBBP structure. Tables 3 and 4 list the total encoding time and the overall computational reduction of the five patterns relative to All mode, for "Mobcal" and "Parkrun" respectively, in intra coded frames.
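For reference, the exhaustive decision performed by the All mode baseline can be sketched as follows: every candidate mode generates a prediction, and the one with the minimum SAD against the source block is kept. The predict callback and constant names are placeholders for whatever prediction functions the encoder provides.

```c
/* Sketch of the exhaustive SAD-based decision used by the "All mode"
 * baseline; predict_fn and NUM_LUMA_MODES stand in for the encoder's own
 * prediction functions and mode count (AVS-P2 defines five luma modes).   */
#include <stdint.h>
#include <limits.h>

#define BLK 8
#define NUM_LUMA_MODES 5

typedef void (*predict_fn)(int mode, uint8_t pred[BLK][BLK]);

static int best_mode_by_sad(const uint8_t src[BLK][BLK], predict_fn predict)
{
    int best_mode = 0, best_sad = INT_MAX;
    uint8_t pred[BLK][BLK];

    for (int mode = 0; mode < NUM_LUMA_MODES; mode++) {
        int sad = 0;
        predict(mode, pred);                          /* candidate prediction  */
        for (int y = 0; y < BLK; y++)
            for (int x = 0; x < BLK; x++) {
                int diff = (int)src[y][x] - (int)pred[y][x];
                sad += diff < 0 ? -diff : diff;       /* accumulate |error|    */
            }
        if (sad < best_sad) { best_sad = sad; best_mode = mode; }
    }
    return best_mode;                                 /* mode with minimum SAD */
}
```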

Table 2. Simulation parameters

  Item                        Specification
  Resolution                  HD720p
  Frame rate                  30 fps
  Number of frames            200
  Enabled inter modes         16x16, 8x8
  Search range                ±64
  Number of reference frames  1
  MV precision                half-pel
  Deblocking                  off

Table 3. Comparison of computational complexity of different schemes for "Mobcal" (200 intra coded frames)

        All mode     DC mode      Pattern 1    Pattern 2    Pattern 3
  QP    sec    %     sec    %     sec    %     sec    %     sec    %
  22    1349   -     675   50.0   686   50.9   687   50.9   687   50.9
  28    1231   -     626   50.9   645   52.4   641   52.1   646   52.5
  32    1169   -     600   51.3   621   53.1   616   52.7   611   52.3
  40    1071   -     567   52.9   588   54.9   583   54.4   578   54.0

Table 4. Comparison of computational complexity of different schemes for "Parkrun" (200 intra coded frames)

        All mode     DC mode      Pattern 1    Pattern 2    Pattern 3
  QP    sec    %     sec    %     sec    %     sec    %     sec    %
  22    1442   -     703   48.8   735   51.0   725   50.3   723   50.1
  28    1337   -     655   49.0   674   50.4   672   50.3   670   50.1
  32    1249   -     632   50.6   645   51.6   643   51.5   641   51.3
  40    1127   -     590   52.4   615   54.6   611   54.2   612   54.3

Fig. 5. The RD performance of the five patterns for "Mobcal"

Fig. 6. The RD performance of the five patterns for "Parkrun"

6. EXTENSION TO H.264/AVC

H.264/AVC is the first video coding standard that adopts intra prediction in the spatial domain; it performs intra prediction on 4x4, 8x8, or 16x16 blocks instead of only on 8x8 blocks as in AVS. The basic idea proposed earlier in this paper can be efficiently applied to H.264/AVC by adopting a fixed intra prediction pattern to save computation and memory. In the intra prediction of H.264/AVC, the large local buffer can be removed by encoding the upper blocks in each MB with the horizontal mode. In 4x4 block-based coding, the upper blocks are those labeled 0, 1, 4, and 5 in Fig. 7(a); in 8x8 block-based coding, they are blocks 0 and 1 in Fig. 7(b). Nine intra modes are defined for 4x4 and 8x8 prediction and four intra modes for 16x16 prediction in H.264/AVC. Because more prediction candidates lead to longer processing time for mode decision, adopting a fixed intra prediction pattern saves computation and power. For demonstration, one possible pattern is simulated on HD videos, and the RD result is shown in Fig. 8. In this pattern, only 16x16-based prediction is used, and all MBs are coded with the horizontal mode except the top-left one (coded with the DC mode) and the other first-column ones (coded with the vertical mode). The performance degradation is negligible in this case; therefore, our prediction scheme can also be applied to H.264/AVC.
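A minimal sketch of this per-MB assignment is given below; the enum names are illustrative and do not correspond to the numeric mode indices of the H.264/AVC specification.

```c
/* Fixed 16x16 intra pattern simulated for H.264/AVC: DC for the top-left MB,
 * vertical for the rest of the first MB column, horizontal elsewhere.       */
typedef enum { I16_DC, I16_HORIZONTAL, I16_VERTICAL } i16_mode_t;

static i16_mode_t fixed_i16_mode(int mb_x, int mb_y)
{
    if (mb_x == 0 && mb_y == 0) return I16_DC;        /* no neighbors available  */
    if (mb_x == 0)              return I16_VERTICAL;  /* first column: top only  */
    return I16_HORIZONTAL;                            /* elsewhere: left only    */
}
```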

Fig. 8. The RD performance of "Harbour" at HD720p for H.264/AVC

7. CONCLUSIONS

An intra prediction scheme for AVS is proposed in this paper. Our low-cost scheme relieves the time constraint on the pipeline scheduling of the AVS encoder and reduces the local memory required for storing reference pixels in the intra prediction process. Experimental results show that the proposed scheme reduces the computational complexity while keeping the RD performance degradation negligible. The extension of our scheme to H.264/AVC is also discussed and simulated.


8. ACKNOWLEDGEMENT


This work was supported in part by AviSonic Co. and the National Science Council of Taiwan under contract NSC 95-2219-E-002-012. We would also like to thank Professor Wen Gao for his generous help and support of our work.


Fig. 7. (a) Scanning order of 4x4 blocks in an MB; (b) scanning order of 8x8 blocks in an MB

9. REFERENCES

[1] Yu-Wen Huang, Bing-Yu Hsieh, Tung-Chien Chen, and Liang-Gee Chen, "Analysis, fast algorithm, and VLSI architecture design for H.264/AVC intra frame coder," IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 3, March 2005.

[2] Tu-Chih Wang, Yu-Wen Huang, Hung-Chi Fang, and Liang-Gee Chen, "Performance analysis of hardware oriented algorithm modifications in H.264," in Proc. of 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), Hong Kong, China, April 2003.

[3] AVS Video Expert Group, Information Technology – Advanced Audio Video Coding Standard Part 2: Video, Audio Video Coding Standard Workgroup of China (AVS), Doc. AVS-N1063, December 2003.
