Fast Mode Decision for Spatial Scalable Video ... - Semantic Scholar

Fast Mode Decision for Spatial Scalable Video Coding

He Li1, Z. G. Li2, Changyun Wen1,* and Lap-Pui Chau1

1 School of EEE, Nanyang Technological University, Singapore 639798

2 Media Division, Institute for Infocomm Research, Singapore 119613

Abstract—Scalable Video Coding (SVC) is an ongoing standard, and the current working draft (WD) is an extension of H.264/AVC. It provides scalability at the bit stream level with good compression efficiency and allows free combinations of spatial, temporal and SNR scalability. In the WD, an exhaustive search technique is employed to select the best coding mode for each macroblock. This technique achieves the highest possible coding efficiency, but it results in high computational complexity. To overcome this, we propose a novel fast mode decision scheme for spatial scalability in SVC. In this scheme, the mode distribution relationship between the base layer and the enhancement layers is employed to reduce the candidate mode set at the enhancement layers. The experimental results show that the proposed scheme provides a significant reduction in computational complexity without any noticeable coding loss.

I. INTRODUCTION

Scalable video coding as proposed in [1] is an extension of H.264/MPEG-4 Advanced Video Coding (H.264/AVC). It allows the extraction of specific subsets of the video bit stream with different transport and presentation properties, e.g. different bit rates and different temporal and spatial characteristics. For spatial scalability in SVC, the base layer contains a reduced-resolution version of each coded frame. The enhancement layers are coded based on predictions formed from the base layer pictures and previously encoded enhancement layer pictures. SVC has introduced a number of advanced features and demonstrated significant achievements in terms of coding efficiency, robustness to a variety of network channels and conditions, and breadth of application [2].

One of the new features is inter-layer prediction between the base layer and the enhancement layers [3][4]. This feature results in many more macroblock (MB) encoding modes than in other standards. The available MB modes in H.264/AVC include two intra modes, INTRA_4×4 and INTRA_16×16, seven inter modes, MODE_16×16, MODE_16×8, MODE_8×16, MODE_8×8, MODE_8×4, MODE_4×8 and MODE_4×4, and MODE_SKIP. MODE_SKIP is used in this paper to represent both the skip mode and the direct mode in B slices. For enhancement layers in SVC, two more modes, base_layer_mode and qpel_refinement_mode, are added [5]. These two modes indicate that the motion and prediction information, including the partitioning, of the corresponding MB of the base layer is used. For qpel_refinement_mode, a quarter-sample motion vector refinement is additionally transmitted and added to the derived motion vectors. Since both modes are directly predicted from the base layer, we use a single mode type, BL_pred, to represent them in this paper. Moreover, BL_pred is also used in this paper to represent an intra macroblock predicted from the base layer.

To achieve the highest coding efficiency, SVC makes use of the computationally intensive Lagrangian rate distortion optimization (RDO) technique to decide the coding mode for each MB. That is to say, all modes are examined and the one with the minimum rate distortion cost is selected as the best mode for each MB. As a result, the SVC scheme achieves optimal coding performance at the expense of tremendously increased computational complexity.

A number of fast mode decision methods have been proposed for H.264/AVC, see for example [6][7][8]. They all speed up the encoding process with negligible PSNR loss and bit rate increase for single layer coding. However, for SVC, the selection of the prediction mode at enhancement layers is more challenging. Motivated by the observation that the mode distribution at the base layer is correlated with that at the enhancement layers, we propose an effective fast mode decision scheme for spatial scalable coding. As a result, the candidate prediction modes for enhancement layers are limited to a small subset and the computational complexity is reduced. Simulation results illustrate that our algorithm can save nearly 60% of the encoding time on average with negligible PSNR loss and bit rate increase for all the layers.

The rest of this paper is organized as follows. In Section II, the mode distribution relationship between the base layer and the enhancement layers is described. In Section III, the entire MB mode decision process is presented. Simulation results are presented in Section IV and concluding remarks are given in Section V.

II. SPATIAL SCALABILITY IN SVC

In SVC, spatial scalable coding of video is considered at multiple resolutions (e.g. QCIF, CIF and 4CIF) with a factor of 2 in horizontal and vertical resolution. An oversampled pyramid representation is used for spatial scalability, where for each spatial resolution a separate refinement of motion and texture information is deployed. The following inter-layer prediction techniques are used in the current SVC scheme [2][3]:

*Corresponding author.

0-7803-9390-2/06/$20.00 ©2006 IEEE


ISCAS 2006

1) Prediction of a macroblock using the up-sampled lower resolution signal.
2) Prediction of motion vectors using the up-sampled lower resolution motion vectors.
3) Prediction of the residual signal using the up-sampled residual signal of the lower resolution layer.

When the base layer represents a layer with half the spatial resolution, the motion vector field, including the MB partitioning, is scaled according to the inter-layer prediction technique as shown in Figure 1. Take the middle figure as an example. If the corresponding base layer MB is coded in MODE_SKIP, MODE_16×16, MODE_16×8 or MODE_8×16, then MODE_16×16 is used for the current MB at the enhancement layer. Therefore, the intra and inter MBs can be predicted using the corresponding signals of previous layers. Moreover, the motion description of each layer can be used to predict the motion description of the following enhancement layers [6]. In our scheme, the base layer MB mode is used to predict the corresponding enhancement layer MB mode.

Figure 1. Up-sampling of Motion Information

By observing a large number of video sequences, the following relationships can be found:

• If the MB mode at the base layer is INTRA_4×4 or INTRA_16×16, it is probable that the corresponding four MBs at its enhancement layer are intra coded. An intra coded MB at the base layer indicates that no good match can be found in the reference frames. Similarly, at enhancement layers, it is also difficult for the up-sampled MB to find a good match in the reference frames. Table I shows the probability that MBs at the enhancement layer are intra coded when the corresponding base layer MB is intra coded, for three typical video sequences: FOOTBALL, FOREMAN and BUS. After exhaustive search, the majority of MBs at enhancement layers are coded in BL_pred mode. This indicates that both the motion and the texture information of the base layer are used to predict those at enhancement layers.

• If the MB mode at the base layer is MODE_16×16, it is probable that the corresponding MBs at its enhancement layer are coded in MODE_16×16 or MODE_SKIP. This tendency is more pronounced when the quantization parameter is large.

TABLE I. PROBABILITY OF INTRA CODED MBS AT EL

Video Sequence    Probability
FOOTBALL          98.32%
FOREMAN           97.17%
BUS               95.66%

III. PROPOSED FAST MODE DECISION SCHEME

Based on the considerations above, our fast mode decision procedure is given as follows:

• If the estimated MB mode at the base layer is INTRA_4×4, the candidate mode subset at its enhancement layer is reduced to BL_pred and INTRA_4×4.

• If the estimated MB mode at the base layer is INTRA_16×16, the candidate mode subset at its enhancement layer is reduced to BL_pred and INTRA_16×16.

• If the estimated MB mode at the base layer is MODE_16×16, the candidate mode subset at its enhancement layer is reduced to BL_pred, MODE_16×16 and MODE_SKIP.

• If the estimated MB mode at the base layer is MODE_16×8, MODE_8×16 or MODE_8×8, the best two modes at the base layer, i.e. those with the minimum and second minimum estimated rate distortion cost, are passed to the enhancement layer to build the candidate mode set. Together with BL_pred and MODE_SKIP, the candidate mode set is reduced to four modes in this case.
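The four rules above can be sketched as a small lookup. This is a minimal illustration in Python, not the JSVM implementation: the function name `candidate_modes`, the ASCII mode strings (with `x` in place of ×) and the fallback to the full mode set for base layer modes that the rules do not mention are all assumptions made for this example.

```python
from typing import FrozenSet, Sequence

# Hypothetical ASCII names for the modes discussed in the paper.
FULL_MODE_SET: FrozenSet[str] = frozenset({
    "INTRA_4x4", "INTRA_16x16",
    "MODE_16x16", "MODE_16x8", "MODE_8x16", "MODE_8x8",
    "MODE_8x4", "MODE_4x8", "MODE_4x4",
    "MODE_SKIP", "BL_pred",
})


def candidate_modes(base_mode: str, best_two: Sequence[str] = ()) -> FrozenSet[str]:
    """Return the enhancement layer candidate set for one base layer MB mode."""
    if base_mode in ("INTRA_4x4", "INTRA_16x16"):
        # Rules 1 and 2: BL_pred plus the same intra mode.
        return frozenset({"BL_pred", base_mode})
    if base_mode == "MODE_16x16":
        # Rule 3: BL_pred, MODE_16x16 and MODE_SKIP.
        return frozenset({"BL_pred", "MODE_16x16", "MODE_SKIP"})
    if base_mode in ("MODE_16x8", "MODE_8x16", "MODE_8x8"):
        # Rule 4: the two best base layer modes plus BL_pred and MODE_SKIP.
        if len(best_two) != 2:
            raise ValueError("the two best base layer modes are required")
        return frozenset({"BL_pred", "MODE_SKIP", *best_two})
    # Assumption: modes not covered by the rules keep the exhaustive search.
    return FULL_MODE_SET


print(sorted(candidate_modes("MODE_16x16")))
print(sorted(candidate_modes("MODE_8x8", ["MODE_8x8", "MODE_16x8"])))
```

In this sketch the fourth rule yields four modes only when the two passed modes differ from BL_pred and MODE_SKIP; otherwise the set is smaller.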

A flowchart of the overall process of the proposed scheme is shown in Figure 2. The MB mode at the base layer is estimated using the exhaustive search technique, as in JSVM 2.0. MODE_BL denotes the best mode selected at the base layer for a given MB. MODE_ELpred1 and MODE_ELpred2 are the best two modes generated at the base layer, and they form part of the candidate mode set for the enhancement layers. These two modes are obtained after the rate distortion optimization at the base layer by comparing the rate distortion costs.
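Selecting the two best base layer modes amounts to ranking the examined modes by rate distortion cost. The sketch below assumes the usual Lagrangian form J = D + λ·R. The function names, the value of λ and the distortion and rate numbers are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Tuple


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def best_two_modes(measurements: Dict[str, Tuple[float, float]], lam: float) -> List[str]:
    """Return the modes with the minimum and second minimum RD cost."""
    costs = {mode: rd_cost(d, r, lam) for mode, (d, r) in measurements.items()}
    # Sort mode names by ascending cost and keep the first two.
    return sorted(costs, key=lambda mode: costs[mode])[:2]


# Made-up (distortion, rate in bits) pairs for one base layer MB.
example = {
    "MODE_16x16": (1200.0, 40.0),
    "MODE_16x8": (1000.0, 60.0),
    "MODE_8x16": (1100.0, 70.0),
    "MODE_8x8": (900.0, 120.0),
}
print(best_two_modes(example, lam=5.0))
```

Since the exhaustive search at the base layer already evaluates these costs, keeping the second best mode requires only a comparison and no additional encoding pass.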


As a result, in our scheme, the candidate mode set is reduced at the enhancement layers and a reduction in complexity is achieved.

IV. SIMULATION RESULTS

The proposed fast mode decision scheme is embedded in the JSVM 2.0 encoder [5]. The test platform is an Intel Pentium IV 1.83 GHz CPU with 256 MB of RAM, running the Windows XP Professional operating system. The test conditions are shown in Table II. In our experiments, three standard test sequences, FOREMAN, BUS and FOOTBALL, have been tested with QP values ranging from 10 to 40. We only consider the two-layer case, and the same QP value is used for both the base layer and the enhancement layer. All the sequences are coded with the IBBBB structure. The GOP size is set to 8. These sequences are all 32 frames long. They represent sequences with large and medium motion. The quantities measured in our experiments are the average time saving, the Y-PSNR and the bit rate for the enhancement layer. We use Time Saving (TS) to indicate the average time saving in the encoding process:

$$TS = \frac{T_{JSVM} - T_{proposed}}{T_{JSVM}} \times 100\%$$

where $T_{JSVM}$ and $T_{proposed}$ are the encoding times of the JSVM 2.0 encoder and of the encoder modified according to the proposed fast mode decision method, respectively.

TABLE II. SIMULATION CONDITION

Sequences                 FOOTBALL, FOREMAN, BUS
Resolution                Base layer: QCIF; enhancement layer: CIF
Frame Rate                15 Hz
Total Number of Frames    32 for each sequence
QP Value                  10, 20, 30 and 40
Coding Options Used       Rate distortion optimization is enabled. MV search range is ±32 pels. Reference frame number is 1. MV resolution is 1/4 pel.
Codec                     JSVM 2.0 encoder

TABLE III. SIMULATION RESULT FOR FOREMAN

        JSVM Scheme                      Proposed Scheme
QP      Bit rate (kbits/s)  PSNR (dB)    Bit rate (kbits/s)  PSNR (dB)    TS (%)
40      101.565             30.9086      100.695             30.7742      61.65
30      314.8425            36.4844      317.6475            36.3905      56.03
20      1048.005            41.6713      1037.535            41.7071      45.47
10      3882.203            48.1106      3882.968            48.0637      41.28
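The Time Saving metric defined in Section IV is a relative difference of two encoding times. A minimal sketch follows; the function name `time_saving` and the timings in the example are hypothetical and are not measurements from the paper.

```python
def time_saving(t_jsvm: float, t_proposed: float) -> float:
    """Time Saving in percent: (T_JSVM - T_proposed) / T_JSVM * 100."""
    if t_jsvm <= 0:
        raise ValueError("the reference encoding time must be positive")
    return (t_jsvm - t_proposed) / t_jsvm * 100.0


# Hypothetical timings in seconds for one sequence at one QP value.
print(f"TS = {time_saving(200.0, 80.0):.2f}%")
```

A positive value means that the modified encoder is faster than the reference encoder, and a negative value means that it is slower.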

Figure 2. Flowchart of Proposed Fast Mode Decision at EL

TABLE IV. SIMULATION RESULT FOR BUS

        JSVM Scheme                      Proposed Scheme
QP      Bit rate (kbits/s)  PSNR (dB)    Bit rate (kbits/s)  PSNR (dB)    TS (%)
40      267.435             27.5573      267.75              27.4185      51.72
30      805.44              33.4307      812.64              33.4244      62.47
20      2360.07             40.2659      2376.495            40.256       40.54
10      6182.745            48.3774      6200.79             48.3099      37.27

TABLE V. SIMULATION RESULTS FOR FOOTBALL

        JSVM Scheme                      Proposed Scheme
QP      Bit rate (kbits/s)  PSNR (dB)    Bit rate (kbits/s)  PSNR (dB)    TS (%)
40      325.425             27.0371      326.8425            26.96        54.67
30      1125.158            33.5802      1129.44             33.5579      46.88
20      3152.648            41.109       3156.458            41.0981      46.41
10      7081.838            48.7973      7098.113            48.7468      49.70

Tables III to V show the coding results of the original JSVM scheme and our proposed scheme. Only the enhancement layer Y-PSNR and bit rate results are shown in the tables. The results show that the proposed method is very effective in reducing the encoding time. The total encoding time is reduced by about 60% on average. Moreover, our scheme achieves consistent time saving over a large bit rate range with negligible loss in PSNR and negligible increase in bit rate.

Figures 3 to 5 present the rate distortion (RD) curves drawn from the data in Tables III to V for the FOREMAN, BUS and FOOTBALL sequences. From these figures, we can see that there is little difference between the two RD curves in each figure. Comparing Figures 3 and 5, the difference between the two RD curves at high bit rates is larger for the FOOTBALL sequence. This is because FOOTBALL is a sequence with fast motion and fine details, so the motion correlation between the base layer and the enhancement layer is lower than for a low motion sequence. Overall, our proposed scheme causes almost no degradation in coding efficiency or picture quality.
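The size of the coding loss can be checked directly from the reported numbers. The sketch below takes the FOREMAN rows from Table III and computes, per QP, the relative bit rate change and the PSNR difference of the proposed scheme with respect to JSVM. The function name `coding_loss` and the tuple layout are assumptions made for this illustration.

```python
from typing import List, Tuple

# (QP, JSVM bit rate, JSVM PSNR, proposed bit rate, proposed PSNR) from Table III.
# Bit rates are in kbits/s and PSNR values are in dB.
FOREMAN_ROWS = [
    (40, 101.565, 30.9086, 100.695, 30.7742),
    (30, 314.8425, 36.4844, 317.6475, 36.3905),
    (20, 1048.005, 41.6713, 1037.535, 41.7071),
    (10, 3882.203, 48.1106, 3882.968, 48.0637),
]


def coding_loss(rows) -> List[Tuple[int, float, float]]:
    """Return (QP, bit rate change in percent, PSNR change in dB) per row."""
    result = []
    for qp, rate_ref, psnr_ref, rate_new, psnr_new in rows:
        rate_change = (rate_new - rate_ref) / rate_ref * 100.0
        psnr_change = psnr_new - psnr_ref
        result.append((qp, rate_change, psnr_change))
    return result


for qp, rate_change, psnr_change in coding_loss(FOREMAN_ROWS):
    print(f"QP {qp}: bit rate {rate_change:+.2f}%, PSNR {psnr_change:+.4f} dB")
```

For these rows the bit rate change stays within about 1% in magnitude and the PSNR change within roughly 0.13 dB.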


Figure 3. RD Curve for FOREMAN

V. CONCLUSION

The current scalable video coding scheme relies on exhaustive search to determine the best mode for each macroblock. This technique has high computational complexity and therefore consumes a lot of processing time. In order to reduce the computational complexity while maintaining high coding efficiency, we have proposed an efficient mode decision scheme at the enhancement layer for spatial scalability. With this scheme, the candidate mode set is reduced at the enhancement layers. Simulation results show that our proposed method provides a significant reduction in computational complexity with only negligible coding loss. In this paper, only the mode types with large partition sizes are considered in the mode reduction method at the enhancement layers. For small partition sizes, such as MODE_8×4, MODE_4×8 and MODE_4×4, further investigation is needed.

Figure 4. RD Curve for BUS

Figure 5. RD Curve for FOOTBALL

REFERENCES

[1] H. Schwarz, T. Hinz, H. Kirchhoffer, D. Marpe, and T. Wiegand, “Technical Description of the HHI Proposal for SVC CE 1,” ISO/IEC JTC1/WG11, Doc. M11244, Palma de Mallorca, Spain, Oct. 2004.

[2] J. Reichel, H. Schwarz, and M. Wien, “Scalable Video Coding – Working Draft 1,” Joint Video Team (JVT), Doc. JVT-N020, Hong Kong, CN, Jan. 2005.

[3] H. Schwarz, D. Marpe, and T. Wiegand, “Inter-layer Prediction of Motion and Residual Data,” ISO/IEC JTC 1/SC 29/WG 11/M11043, USA, July 2004.

[4] Z. G. Li, X. K. Yang, K. P. Lim, X. Lin, S. Rahardja, and F. Pan, “Scalable Video Coding with Grid Motion Estimation and Compensation,” US Provisional Patent Application (I2R Ref: P2004003-ETPL Ref: I2R/P//1899/PCT), filed 23 June 2004.

[5] J. Reichel, H. Schwarz, and M. Wien, “Joint Scalable Video Model (JSVM) 2.0 Reference Encoding Algorithm Description,” ISO/IEC JTC1/SC29/WG11, Doc. N7084, Busan, Korea, April 2005.

[6] F. Pan, X. Lin, R. Susanto, K. P. Lim, Z. G. Li, G. N. Feng, D. J. Wu, and S. Wu, “Fast Mode Decision Algorithm for Intraprediction in H.264/AVC Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 7, July 2005.

[7] Q. Dai, D. Zhu, and R. Ding, “Fast Mode Decision for Inter Prediction in H.264,” International Conference on Image Processing, 2004.

[8] J. Lee and B. Jeon, “Fast Mode Decision for H.264 with Variable Motion Block Sizes,” in Proc. Int. Symp. Computer and Information Sciences (ISCIS) 2003, Nov. 2003, pp. 723-730.
