Y.-H. Kim et al.: Memory-Efficient H.264/AVC CAVLC for Fast Decoding
Memory-Efficient H.264/AVC CAVLC for Fast Decoding
Yong-Hwan Kim, Yoon-Jong Yoo, Jeongho Shin, Byeongho Choi, and Joonki Paik

Abstract: A fast decoding method is presented for memory-efficient context-based adaptive variable length coding (CAVLC). The CAVLC algorithm in H.264/MPEG-4 AVC achieves a significantly higher compression ratio than the variable length coding (VLC) of earlier video coding standards, such as H.263 and MPEG-4 Visual, because it adaptively uses the context of neighboring blocks. CAVLC decoding, however, is much more complicated to implement than earlier VLC decoding because it involves frequent access to unstructured VLC tables, adaptive switching between tables, and five syntax elements. In mobile video terminals such as digital multimedia broadcasting (DMB) and digital video broadcasting – handheld (DVB-H) players in particular, conventional CAVLC decoding algorithms increase power consumption through frequent memory access. This paper presents a new CAVLC decoding method that uses arithmetic operations instead of the conventional table look-up method, which requires many memory accesses. Experimental results show that the proposed algorithm achieves 50% higher speed and 95% less memory access than three conventional CAVLC algorithms: table look-up by sequential search, table look-up by binary search, and Moon's method.¹

Index Terms: CAVLC decoding, arithmetic operation, memory access
¹ This work was supported by the Korean Ministry of Science and Technology under the National Research Laboratory Project and by the Korean Ministry of Information and Communication under the Chung-Ang University HNRC-ITRC program. Yong-Hwan Kim and Byeongho Choi are with the Digital Media Research Center, Korea Electronics Technology Institute (KETI), SungNam, Korea, and with the Department of Image Engineering, Chung-Ang University, Seoul, Korea. Yoon-Jong Yoo, Jeongho Shin, and Joonki Paik are with the Department of Image Engineering, Chung-Ang University.
Contributed Paper
Manuscript received May 21, 2006
0098-3063/06/$20.00 © 2006 IEEE
IEEE Transactions on Consumer Electronics, Vol. 52, No. 3, AUGUST 2006

I. INTRODUCTION
Variable-length coding (VLC) is a fundamental method in image compression for removing statistical redundancy from digital multimedia data. H.264/MPEG-4 AVC provides two advanced entropy coding techniques: context-based adaptive variable length coding (CAVLC) and context-based adaptive binary arithmetic coding (CABAC) [1]. CAVLC, which is mainly used in the baseline profile of H.264/MPEG-4 AVC, requires less computation than CABAC at the cost of an approximately 10% lower compression ratio [2][3]. The baseline profile, which was designed for lightweight applications such as mobile environments, supports only CAVLC. CAVLC uses five syntax elements: coeff_token, sign of TrailingOnes, level, total_zeros, and run_before. level can be decoded with arithmetic operations because it uses structured, Golomb-based VLC codes, and the sign of TrailingOnes can be decoded easily by reading at most 3 bits. coeff_token, total_zeros, and run_before, however, require many memory accesses for table look-up because they use unstructured VLC codes.
In ubiquitous computing environments, frequent memory access is a critical problem for mobile video terminals, such as digital multimedia broadcasting (DMB) players, digital video broadcasting – handheld (DVB-H) players, portable media players (PMP), personal digital assistant (PDA) media players, and mobile phones with video capability, because their memories are small and slow. Such heavy memory access results in high power consumption and operational delay [4][5][6]. Many hardware decoding methods have been proposed to overcome the heavy memory access of simple table look-up-based CAVLC decoding [7][8]. Recently, Moon proposed a software decoding method that uses partial arithmetic operations [9]. The improvement achieved by Moon's method, however, is limited because a large portion of it still depends on table look-up. In this paper we present a new CAVLC decoding method that (i) analyzes the correlation of VLC codes, (ii) classifies the codes into multiple groups, and (iii) decodes each group with arithmetic operations. Replacing the table look-up process, which requires heavy memory access, with arithmetic operations significantly reduces power consumption and increases decoding speed.
This paper is organized as follows. Section II reviews existing CAVLC decoding methods, and Section III presents the proposed decoding method. Section IV compares the proposed fast algorithm with existing methods experimentally, and Section V concludes the paper.

II. EXISTING CAVLC DECODING METHODS
In this section we briefly review existing CAVLC decoding methods: table look-up by sequential search (TLSS), which is implemented in the H.264/MPEG-4 AVC reference software [10]; our in-house table look-up by binary search (TLBS); and Moon's algorithm [9].
A. TLSS Method
The TLSS method is the most basic CAVLC decoding method. It compares the input bits sequentially with the entries of the VLC table until the matching VLC code is found [10]. Fig. 1 shows C pseudo code for coeff_token decoding using the TLSS method. coeff_token jointly represents TotalCoeff (TC) and TrailingOnes (T1s). The ShowBits(n) function returns the next n bits of the bitstream buffer in big-endian order without consuming them, and the SkipBits(n) function skips n bits of the bitstream buffer. As shown in Fig. 1, the sequential table search yields the table indices i and j, which represent TC and T1s, respectively.

VLC numcoeff_tab[3][4][17] = {  /* {len, code} pairs */
  {
    {{1,1}, {6,5}, {8,7}, {9,7}, {10,7}, {11,7}, {13,15}, …},      /* T1s = 0 */
    {…, {15,14}, {15,10}, {15,1}, {16,14}, {16,10}, {16,6}},        /* T1s = 1 */
    {{0,0}, {0,0}, {3,1}, {7,5}, {8,5}, {9,5}, {10,5}, {11,5}, …},  /* T1s = 2 */
    {{0,0}, {0,0}, {0,0}, {5,3}, {6,3}, {7,4}, {8,4}, {9,4}, …},   /* T1s = 3 */
  },
  …,
};
VLC* tab = (VLC*)&(numcoeff_tab[vlcnum][0][0]);
peek_cod = ShowBits(16); /* maximum length of vlc code */
for (j = 0; j < 4; j++) {
  for (i = 0; i < 17; i++) {
    len = tab[i].len;
    if (!len) continue;  /* unused entry */
    if ((peek_cod >> (16 - len)) == tab[i].code) {
      TC = i;   /* table width */
      T1s = j;  /* table height */
      SkipBits(len);
      return;   /* found! */
    }
  }
  tab += 17;
}
Fig. 1. Pseudo code for coeff_token decoding of TLSS method
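All of the figures rely on the ShowBits(n) and SkipBits(n) primitives. A minimal MSB-first (big-endian) bit reader with the same semantics can be sketched as follows. The BitReader type and the show_bits/skip_bits names are hypothetical and are not the reference software's API; reading zeros past the end of the buffer is an assumed convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first (big-endian) bit reader, for illustration only. */
typedef struct {
    const uint8_t *buf;  /* bitstream bytes */
    size_t size;         /* number of bytes in buf */
    size_t pos;          /* current read position, in bits */
} BitReader;

/* Counterpart of ShowBits(n): return the next n bits (1 <= n <= 24), MSB
   first, without advancing the read position. Bytes past the end read as 0. */
static uint32_t show_bits(const BitReader *br, int n)
{
    size_t byte = br->pos >> 3;
    uint32_t w = 0;
    assert(n >= 1 && n <= 24);
    for (size_t i = 0; i < 4; i++) {  /* load 32 bits, big-endian */
        w <<= 8;
        if (byte + i < br->size)
            w |= br->buf[byte + i];
    }
    w <<= (br->pos & 7);              /* drop bits already consumed in the first byte */
    return w >> (32 - n);             /* keep the top n bits */
}

/* Counterpart of SkipBits(n): advance the read position by n bits. */
static void skip_bits(BitReader *br, int n)
{
    assert(n >= 0);
    br->pos += (size_t)n;
}
```

With such a reader, the TLSS loop of Fig. 1 would call show_bits(&br, 16) once per coeff_token and skip_bits(&br, len) after a match. A speed-oriented decoder would more likely keep a cached 32-bit word than reload four bytes on every call.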
TLSS-based decoding is very slow and consumes much power because of heavy memory access and the many conditional statements required by the sequential search.
B. TLBS Method
In the TLBS method, the VLC codes are first arranged in ascending order of their left-aligned 16-bit values, and the matching entry of the VLC table is then found with the binary search algorithm. Fig. 2 shows C pseudo code for coeff_token decoding using the TLBS method. The TLBS algorithm is faster than the TLSS algorithm because any VLC code is found with only six table look-up operations in a table with 64 entries (log2 64 = 6). The TLBS method may, however, be inefficient on some systems because it accesses memory randomly, whereas the TLSS method accesses memory sequentially.
C. Moon's Method
Moon et al. recently proposed a method for reducing the memory access of CAVLC software decoding [9]. The main idea is to avoid heavy memory access by decoding the short, highly probable VLC codes with arithmetic operations. The remaining VLC codes, which have a low probability of occurrence but make up most of the table, are decoded by the conventional TLSS method. Moon's method defines arithmetic equations for some entries of the coeff_token VLC tables (Tables 0, 1, and 2) and for all entries of the run_before VLC tables. Fig. 3 shows C pseudo code for coeff_token decoding, where the underlined parts are lines missing from the original paper. We also inserted the SkipBits() function because the original paper did not indicate how bits are skipped after coeff_token decoding. The variable m is the number of leading zero bits before the first non-zero bit, and the GetM(code, a) function returns m for a code of length a.
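The GetM(code, a) function used by Moon's method (and later by the proposed method) simply counts leading zero bits in an a-bit value. A portable sketch is shown below, together with a variant based on the GCC/Clang builtin __builtin_clz, which compilers typically map to a CLZ instruction where one exists. The names get_m and get_m_clz are hypothetical, and returning a for an all-zero code is an assumed convention.

```c
#include <assert.h>
#include <stdint.h>

/* GetM(code, a): number of leading zero bits in the a-bit value code,
   i.e., the zeros before the first 1 bit. Returns a if code == 0. */
static int get_m(uint32_t code, int a)
{
    int m = 0;
    assert(a >= 1 && a <= 32);
    while (m < a && !((code >> (a - 1 - m)) & 1u))
        m++;
    return m;
}

#if defined(__GNUC__)
/* Same result using the compiler builtin; __builtin_clz is undefined for 0,
   so that case is handled explicitly. Assumes code < 2^a and 32-bit unsigned. */
static int get_m_clz(uint32_t code, int a)
{
    if (code == 0)
        return a;
    return __builtin_clz(code) - (32 - a);
}
#endif
```

The loop version costs up to a iterations, while the builtin version is branch-light, which is the motivation for the FAST_GETM option discussed for Fig. 5.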
VLC2 numcoeff_tab2[3][64] = {
  { /* { left-aligned 16-bit code, len, data1 = T1s, data2 = TC } */
    { 0x0002, 15, 1, 13 }, /* 0000 0000 0000 001 */
    { 0x0004, 16, 0, 16 }, /* 0000 0000 0000 0100 */
    { 0x0005, 16, 2, 16 }, /* 0000 0000 0000 0101 */
    { 0x0006, 16, 1, 16 }, /* 0000 0000 0000 0110 */
    …
    { 0x1800, 5, 3, 3 },   /* 0001 1 */
    { 0x2000, 3, 2, 2 },   /* 001 */
    { 0x4000, 2, 1, 1 },   /* 01 */
    { 0x8000, 1, 0, 0 },   /* 1 */
  },
  …
};
VLC2* pTab = (VLC2*) numcoeff_tab2[vlcnum];
peek_code = ShowBits(16);
min = 0, mid = 0, max = 62;
while ((max - min) > 1) {
  mid = (min + max) >> 1;
  if (peek_code >= pTab[mid].code) min = mid;
  else max = mid;
}
SkipBits(pTab[min].len);
T1s = pTab[min].data1;
TC = pTab[min].data2;
Fig. 2. Pseudo code for coeff_token decoding of TLBS method

if (vlcnum == 0) {
  code4 = ShowBits(4);
  m = GetM(code4, 4);
  if (m < 4) {
    SkipBits(m + 1);
    code = ShowBits(2);
    d = 3 - code;
    T1s = (m + (d-1) * ((m+1)>>2) * (d>>1)) % 4;
    TC = (m + d * ((m+1)>>2) * (d>>1)) % 4;
    if (m == 3) SkipBits((code >= 2) ? 1 : 2);
    return;
  }
} else if (vlcnum == 1) {
  code4 = ShowBits(4);
  code = code4 >> 2;
  if (code) {
    D = 3 - code;
    d = 3 - (code4 & 3);
    w = code >> 1;
    T1s = D + (1-w) * (d>>1);
    TC = T1s + (1-w) * ((d+1) >> 2);
    if ((code4 >> 1) == 3) SkipBits(3);
    else SkipBits(4 - (w3) == 1) {
      w = (code4>>2) % 2;
      d = 3 - (code4 & 3);
      T1s = 3 + (d-3) * w;
      TC = d + ((1-w)
The variable vlcnum is the index of the coeff_token VLC table. As shown in the figure, arithmetic decoding is performed only if certain conditions are satisfied; otherwise, decoding falls back to the conventional method. The weakness of Moon's algorithm is that its results are inconsistent across sequences and QPs, because high-motion sequences and high QPs might produce long VLC codes that are not covered by the arithmetic operations. Section IV confirms this explanation with experiments.
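The binary search at the core of the TLBS method (Fig. 2) can be exercised in isolation. The sketch below uses only a small subset of coeff_token Table 0, namely the codes of at most 6 bits listed in Table I, stored as left-aligned 16-bit values in ascending order; the Vlc2 type, the kTab0Short table, and tlbs_find are hypothetical names. Because the codes are prefix-free, the matching entry is the largest code that is less than or equal to the 16-bit peek value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entry: left-aligned 16-bit code, code length, T1s, TC. */
typedef struct { uint16_t code; uint8_t len, t1s, tc; } Vlc2;

/* Subset of coeff_token Table 0 (codes of at most 6 bits), ascending by code. */
static const Vlc2 kTab0Short[] = {
    { 0x0C00, 6, 3, 4 },  /* 0000 11 */
    { 0x1000, 6, 1, 2 },  /* 0001 00 */
    { 0x1400, 6, 0, 1 },  /* 0001 01 */
    { 0x1800, 5, 3, 3 },  /* 0001 1  */
    { 0x2000, 3, 2, 2 },  /* 001     */
    { 0x4000, 2, 1, 1 },  /* 01      */
    { 0x8000, 1, 0, 0 },  /* 1       */
};

/* Index of the largest entry whose code <= peek (n entries, max exclusive),
   following the loop of Fig. 2. Requires peek >= tab[0].code. */
static int tlbs_find(const Vlc2 *tab, int n, uint16_t peek)
{
    int min = 0, max = n;
    assert(n > 0 && peek >= tab[0].code);
    while (max - min > 1) {
        int mid = (min + max) >> 1;
        if (peek >= tab[mid].code)
            min = mid;
        else
            max = mid;
    }
    return min;
}
```

For example, a peek value of 0x17FF (bits 0001 0111 ...) selects the entry 0001 01, giving [T1s, TC] = [0, 1] and a 6-bit skip. With the full 62-entry table, the same loop needs at most six comparisons.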
Fig. 4 shows C pseudo code for run_before decoding using Moon's method, with some corrections underlined.

ZL = vlcnum + 1; /* ZerosLeft */
code3 = ShowBits(3);
code2 = code3 >> 1;
if (ZL > 6) {
  if (code3 > 0) { RB = 7 - code3; n_skip = 3; }
  else {
    code11 = ShowBits(11);
    m = GetM(code11, 11);
    RB = 4 + m;
    n_skip = m + 1;
  }
} else if (ZL == 6) {
  if (code3 < 2) { RB = code3 + 1; n_skip = 3; }
  else if (code3 >= 6) { RB = 0; n_skip = 2; }
  else if ((code3 == 2) || (code3 == 4)) { RB = code3 + 2; n_skip = 3; }
  else { RB = code3; n_skip = 3; }
} else if((ZL >= 3) && (ZL 1)); n_skip = 2 - (code2>>1); } else { RB = 1 - (code2>>1); n_skip = 1; } }
SkipBits(n_skip);
Fig. 4. Pseudo code for run_before decoding of Moon's method
In Fig. 4, run_before (RB) decoding is performed entirely with arithmetic operations. This algorithm, however, contains so many conditional statements that it may not reduce CAVLC decoding time.

III. FAST, MEMORY-EFFICIENT CAVLC DECODING METHOD
We propose a fast arithmetic decoding algorithm that overcomes the drawbacks of conventional methods, namely high power consumption and slow decoding caused by heavy memory access and many conditional statements. Because they rely on VLC table search, conventional methods also show irregular performance across QPs and sequences. The proposed algorithm defines arithmetic operations for the coeff_token, total_zeros, and run_before syntax elements. The basic idea of the proposed method is to (i) analyze the correlation of unstructured VLC codes, (ii) classify the codes into multiple groups, and (iii) assign an arithmetic decoding equation to each group. Note that the number of groups should be kept small, because the conditional statements needed to distinguish the groups slow down execution through CPU pipeline flushes. The groups are formed according to the following four properties:
[A.1] Bit arrangement of the VLC code, especially the positions of the 1 bits
[A.2] Successive values of VLC codes in descending order
[A.3] The number of leading zero bits before the first non-zero bit, m
[A.4] Special cases that do not belong to any of the above three cases
We present coeff_token, total_zeros, and run_before decoding algorithms in the following subsections.
A. coeff_token decoding
In the H.264/MPEG-4 AVC baseline profile, the coeff_token syntax element has four VLC tables, selected by nC, which is the average number of non-zero transform coefficient levels in the upper and left neighboring 4x4 blocks. nC is equal to -1 for chroma DC blocks. The VLC table of the current block is thus selected adaptively according to the numbers of non-zero transform coefficient levels in neighboring blocks. This adaptivity yields high compression at the cost of frequent memory access and slow decoding.
1) VLC Table 0 (0 ≤ nC < 2)
First, the VLC codes are arranged in descending order, and they are separated into 9-bit and 16-bit sets according to code length and the correlation among codes. The VLC codes are then classified into 8 groups according to the four properties, and finally an arithmetic equation is defined for each group. Tables I and II show the groups, which are distinguished by bold font and gray shading. In the tables, a number inside parentheses is the number of VLC codes omitted for brevity, and x represents either 0 or 1. The Value column gives the range of the 9-bit (Table I) or 16-bit (Table II) window value that identifies each code. T1s and TC represent TrailingOnes and TotalCoeff, respectively.

TABLE I
9-BIT SET AND GROUPING OF PART OF COEFF_TOKEN TABLE 0
Code | Value | Bit count | [T1s, TC]
1 xxxx xxxx | ≥ 256 | 1 | [0, 0]
0 1xxx xxxx | ≥ 128 | 2 | [1, 1]
0 01xx xxxx | ≥ 64 | 3 | [2, 2]
0 0011 xxxx | ≥ 48 | 5 | [3, 3]
0 0010 1xxx | ≥ 40 | 6 | [0, 1]
0 0010 0xxx | ≥ 32 | 6 | [1, 2]
0 0001 1xxx | ≥ 24 | 6 | [3, 4]
0 0001 01xx | ≥ 20 | 7 | [2, 3]
0 0001 00xx | ≥ 16 | 7 | [3, 5]
… (7)
0 0000 0100 | 4 | 9 | [3, 7]

TABLE II
16-BIT SET AND GROUPING OF PART OF COEFF_TOKEN TABLE 0
Code | Value | Bit count | [T1s, TC]
0000 0001 11xx xxxx | ≥ 448 | 10 | [0, 4]
0000 0001 10xx xxxx | ≥ 384 | 10 | [1, 5]
… (5)
0000 0000 100x xxxx | ≥ 128 | 11 | [3, 9]
0000 0000 0111 1xxx | ≥ 120 | 13 | [0, 6]
0000 0000 0111 0xxx | ≥ 112 | 13 | [1, 7]
… (5)
0000 0000 0100 0xxx | ≥ 64 | 13 | [0, 8]
0000 0000 0011 11xx | ≥ 60 | 14 | [0, 9]
0000 0000 0011 10xx | ≥ 56 | 14 | [1, 9]
… (13)
0000 0000 0001 000x | ≥ 16 | 15 | [3, 14]
0000 0000 0000 1111 | 15 | 16 | [0, 13]
0000 0000 0000 1110 | 14 | 16 | [1, 14]
… (8)
0000 0000 0000 0101 | 5 | 16 | [2, 16]
0000 0000 0000 0100 | 4 | 16 | [0, 16]
0000 0000 0000 001x | ≥ 2 | 15 | [1, 13]

Fig. 5 shows the arithmetic decoding equations expressed as C pseudo code. As shown in the figure, division-related arithmetic operations, such as / and %, are completely
excluded to increase operation speed [5]. The C comments indicate the group classification. Usually fewer than four conditional statements are evaluated, since the codes in the first, 9-bit set have a probability of occurrence of 95% or higher. Unlike in Moon's method, the conditional statements have simple, regular forms that depend only on the value of the VLC code. The FAST_GETM macro is an option for systems that have a native instruction for computing m; for example, the ARMv5 architecture has the CLZ (count leading zeros) instruction [5]. This paper, however, does not use the FAST_GETM macro.

code = ShowBits(9);
if (code >= 4) { /* 9-bit representation */
  if (code >= 48) { /* 1, 2, 3, 5 bits */
#ifdef FAST_GETM /* [A.3] */
    m = GetM(code, 9);
    T1s = m;
    TC = m;
    n_skip = m + 1 + ((m+1)>>2);
#else /* [A.1], [A.2] */
    g = (code & 128)>>1;
    f = (code & (192 - g))>>6;
    T1s = (code & 256) ? 0 : (3 - f);
    TC = T1s;
    n_skip = T1s+1 + ((T1s+1)>>2);
#endif
  } else if (code >= 24) { /* [A.2] ; 6 bits */
    f = 5 - (code>>3);
    T1s = f + (f >> 1);
    n_skip = 6;
    TC = T1s + 1;
  } else { /* [A.1], [A.2] ; 7, 8, 9 bits */
    g = code >> 3;
    T1s = 7 - (code >> g);
    n_skip = 9 - g;
    TC = T1s + (3 - g) + ((T1s + 1)>>2);
  }
} else { /* 16-bit representation */
  code16 = ShowBits(16);
  if (code16 >= 128) { /* [A.1], [A.2] ; 10, 11 bits */
    g = code16 >> 8;
    T1s = 7 - (code16 >> (g + 5));
    n_skip = 11 - g;
    TC = T1s + 5 - g + ((T1s + 1)>>2);
  } else if (code16 >= 64) { /* [A.2] ; 13 bits */
    f = (code16 < 72) ? 7 : (code16 >> 3);
    T1s = 3 - (f & 3);
    n_skip = 13;
    TC = T1s + 6 + ((15 - f)>>2) + ((T1s+1)>>2);
  } else if (code16 >= 16) { /* [A.1], [A.2] ; 14, 15 bits */
    g = code16 >> 5;
    f = code16 >> (g+1);
    T1s = 3 - (f & 3);
    n_skip = 15 - g;
    TC = T1s+10 - (g2) + ((4 - T1s)>>2);
  } else if (code16 >= 5) { /* [A.2] ; 16 bits */
    T1s = 3 - (code16 & 3);
    n_skip = 16;
    TC = T1s+12 + ((15 - code16)>>2) + ((code16 & 2)>>1);
  } else { /* [A.1] */
    g = code16 >> 2;
    T1s = 1 - g;
    TC = 13 + 3 * g;
    n_skip = 15 + g;
  }
}
SkipBits(n_skip);
Fig. 5. Pseudo code for coeff_token decoding (Table 0)
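The equations of Fig. 5 can be packaged as a self-contained C function and checked against the rows of Tables I and II. In this sketch, CoeffToken and decode_coeff_token_t0 are hypothetical names; the function takes the next 16 bits (what ShowBits(16) would return) and reports the code length instead of calling SkipBits(). One term in the 14/15-bit branch is garbled in the figure ("(g2)"); the sketch assumes it expands to -(g << 1) + ((15 - f) >> 2), a reconstruction that reproduces the Table II rows listed above but may differ from the authors' exact expression. Invalid bit patterns are not detected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: TrailingOnes, TotalCoeff, code length in bits. */
typedef struct { int t1s, tc, len; } CoeffToken;

/* coeff_token decoding for Table 0 (0 <= nC < 2), following Fig. 5.
   peek16 = next 16 bits of the stream, MSB first. */
static CoeffToken decode_coeff_token_t0(uint32_t peek16)
{
    CoeffToken r;
    int code = (int)(peek16 >> 7);  /* 9-bit window, i.e., ShowBits(9) */
    int f, g;
    assert(peek16 < 65536u);
    if (code >= 4) {                /* 9-bit set */
        if (code >= 48) {           /* 1, 2, 3, 5 bits: [A.1], [A.2] */
            g = (code & 128) >> 1;
            f = (code & (192 - g)) >> 6;
            r.t1s = (code & 256) ? 0 : (3 - f);
            r.tc = r.t1s;
            r.len = r.t1s + 1 + ((r.t1s + 1) >> 2);
        } else if (code >= 24) {    /* 6 bits: [A.2] */
            f = 5 - (code >> 3);
            r.t1s = f + (f >> 1);
            r.tc = r.t1s + 1;
            r.len = 6;
        } else {                    /* 7, 8, 9 bits: [A.1], [A.2] */
            g = code >> 3;
            r.t1s = 7 - (code >> g);
            r.tc = r.t1s + (3 - g) + ((r.t1s + 1) >> 2);
            r.len = 9 - g;
        }
    } else {                        /* 16-bit set */
        int code16 = (int)peek16;
        if (code16 >= 128) {        /* 10, 11 bits */
            g = code16 >> 8;
            r.t1s = 7 - (code16 >> (g + 5));
            r.tc = r.t1s + 5 - g + ((r.t1s + 1) >> 2);
            r.len = 11 - g;
        } else if (code16 >= 64) {  /* 13 bits */
            f = (code16 < 72) ? 7 : (code16 >> 3);
            r.t1s = 3 - (f & 3);
            r.tc = r.t1s + 6 + ((15 - f) >> 2) + ((r.t1s + 1) >> 2);
            r.len = 13;
        } else if (code16 >= 16) {  /* 14, 15 bits */
            g = code16 >> 5;
            f = code16 >> (g + 1);
            r.t1s = 3 - (f & 3);
            /* ASSUMPTION: reconstruction of the garbled "(g2)" term. */
            r.tc = r.t1s + 10 - (g << 1) + ((15 - f) >> 2) + ((4 - r.t1s) >> 2);
            r.len = 15 - g;
        } else if (code16 >= 5) {   /* 16 bits */
            r.t1s = 3 - (code16 & 3);
            r.tc = r.t1s + 12 + ((15 - code16) >> 2) + ((code16 & 2) >> 1);
            r.len = 16;
        } else {                    /* last two codes: [A.1] */
            g = code16 >> 2;
            r.t1s = 1 - g;
            r.tc = 13 + 3 * g;
            r.len = 15 + g;
        }
    }
    return r;
}
```

A caller would pass ShowBits(16), use t1s and tc, and then call SkipBits(len). Only shifts, masks, additions, and comparisons are used, in line with the figure's avoidance of / and %.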
2) VLC Table 1 (2 ≤ nC < 4)
The procedure for VLC Table 1 is similar to that for VLC Table 0. The VLC codes are separated into 9-bit and 14-bit sets and classified into 7 groups. Tables III and IV show these groups. The codes in the first, 9-bit set again have a probability of occurrence over 95%. Fig. 6 shows the arithmetic decoding equations expressed as C pseudo code.
TABLE III
9-BIT SET AND GROUPING OF PART OF COEFF_TOKEN TABLE 1
Code | Value | Bit count | [T1s, TC]
1 1xxx xxxx | ≥ 384 | 2 | [0, 0]
1 0xxx xxxx | ≥ 256 | 2 | [1, 1]
… (2)
0 100x xxxx | ≥ 128 | 4 | [3, 4]
0 0111 xxxx | ≥ 112 | 5 | [1, 2]
0 0110 xxxx | ≥ 96 | 5 | [3, 5]
… (11)
0 0001 00xx | ≥ 16 | 7 | [3, 8]
0 0000 111x | ≥ 14 | 8 | [0, 4]
0 0000 110x | ≥ 12 | 8 | [1, 6]
… (5)
0 0000 0100 | 4 | 9 | [3, 9]

TABLE IV
14-BIT SET AND GROUPING OF PART OF COEFF_TOKEN TABLE 1
Code | Value | Bit count | [T1s, TC]
00 0000 0111 1xxx | ≥ 120 | 11 | [0, 7]
00 0000 0111 0xxx | ≥ 112 | 11 | [1, 8]
… (13)
00 0000 0010 00xx | ≥ 32 | 12 | [0, 11]
00 0000 0001 111x | ≥ 30 | 13 | [0, 12]
00 0000 0001 110x | ≥ 28 | 13 | [1, 12]
… (7)
00 0000 0000 110x | ≥ 12 | 13 | [2, 14]
00 0000 0000 1011 | 11 | 14 | [1, 14]
00 0000 0000 1010 | 10 | 14 | [2, 15]
00 0000 0000 1001 | 9 | 14 | [0, 15]
00 0000 0000 1000 | 8 | 14 | [1, 15]
00 0000 0000 0111 | 7 | 14 | [0, 16]
00 0000 0000 0110 | 6 | 14 | [1, 16]
… (2)
00 0000 0000 001x | ≥ 2 | 13 | [3, 15]
3) VLC Table 2 (4 ≤ nC < 8)
The procedure for VLC Table 2 is similar to that for VLC Table 0. The VLC codes are separated into 7-bit and 10-bit sets and classified into 8 groups. Tables V and VI show these groups.

TABLE V
7-BIT SET AND GROUPING OF PART OF COEFF_TOKEN TABLE 2
Code | Value | Bit count | [T1s, TC]
111 1xxx | ≥ 120 | 4 | [0, 0]
111 0xxx | ≥ 112 | 4 | [1, 1]
… (5)
100 0xxx | ≥ 64 | 4 | [3, 7]
011 11xx | ≥ 60 | 5 | [1, 2]
011 10xx | ≥ 56 | 5 | [2, 3]
… (5)
010 00xx | ≥ 32 | 5 | [1, 5]
001 111x | ≥ 30 | 6 | [0, 1]
001 110x | ≥ 28 | 6 | [1, 6]
… (4)
001 001x | ≥ 18 | 6 | [2, 7]
001 000x | ≥ 16 | 6 | [0, 3]
000 1111 | 15 | 7 | [0, 4]
… (3)
000 1011 | 11 | 7 | [0, 5]
The codes in the first, 7-bit set have a probability of occurrence of 95% or higher.

code = ShowBits(9);
if (code >= 4) { /* 9-bit representation */
  if (code >= 128) { /* [A.1], [A.2] ; 2, 3, 4 bits */
    g = 1 - (code>>8);
    f = 7 - (code >> (6 - g));
    T1s = (g1);
    TC = T1s + ((g+f)>>2);
    n_skip = g + 2 + ((T1s+1)>>2);
  } else if (code >= 16) { /* [A.1], [A.2] ; 5, 6, 7 bits */
    g = (code >> 5);
    h = (g + 1)>>2;
    f = (g + 11)>>2;
    T1s = 3 - ((code>>f) & (3 - h));
    k = (T1s + 1)>>1;
    n_skip = 9 - h - f;
    TC = (3 - g) + (k1);
  } else { /* [A.1], [A.2] ; 8 and 9 bits */
    g = code >> 3;
    f = code >> g;
    k = g + ((code == 5) ? 1 : 0);
    T1s = ((f - g) == 3) ? 0 : (7 - f);
    n_skip = 9 - g;
    TC = T1s + (6 - g) - (k & (f & 1));
  }
} else { /* 14-bit representation */
  code14 = ShowBits(14);
  if (code14 >= 32) { /* [A.1], [A.2] */
    g = code14 >> 6;
    f = (code14 < 36) ? 7 : (code14 >> (2 + g));
    T1s = 3 - (f & 3);
    n_skip = 12 - g;
    TC = T1s+9 - (g2) - ((f&1) & (T1s>>1));
  } else if (code14 >= 12) { /* [A.2] */
    f = (code14 < 14) ? 5 : (code14 >> 1);
    T1s = 3 - (f & 3);
    n_skip = 13;
    TC = 12 + ((15-f)>>2) + ((T1s+1)>>2);
  } else if (code14 >= 8) { /* [A.2] */
    T1s = 12 - code14 - 3 * ((13 - code14)>>2);
    n_skip = 14;
    TC = 17 - ((code14 + 1)>>2);
  } else { /* [A.1], [A.2] ; 14-bit and last 13-bit */
    g = code14 >> 2;
    T1s = (code14 < 4) ? 3 : (7 - code14);
    TC = 15 + g;
    n_skip = 13 + g;
  }
}
SkipBits(n_skip);
Fig. 6. Pseudo code for coeff_token decoding (Table 1)

TABLE VI
10-BIT SET AND GROUPING OF PART OF COEFF_TOKEN TABLE 2
Code | Value | Bit count | [T1s, TC]
00 0101 0xxx | ≥ 80 | 7 | [2, 9]
00 0100 1xxx | ≥ 72 | 7 | [0, 6]
00 0100 0xxx | ≥ 64 | 7 | [0, 7]
00 0011 11xx | ≥ 60 | 8 | [0, 8]
00 0011 10xx | ≥ 56 | 8 | [1, 9]
… (11)
00 0001 010x | ≥ 20 | 9 | [1, 12]
00 0001 001x | ≥ 18 | 9 | [2, 13]
00 0001 000x | ≥ 16 | 9 | [0, 12]
00 0000 111x | ≥ 14 | 9 | [1, 13]
00 0000 1101 | 13 | 10 | [0, 13]
00 0000 1100 | 12 | 10 | [1, 14]
… (9)
00 0000 0010 | 2 | 10 | [3, 16]
00 0000 0001 | 1 | 10 | [0, 16]
Fig. 7 shows arithmetic decoding equations expressed as C pseudo code.
code = ShowBits(7);
if (code >= 11) { /* 7-bit representation */
  if (code >= 64) { /* [A.2] ; 4 bits */
    f = 15 - (code >> 3);
    n_skip = 4;
    T1s = (f > 3) ? 3 : f;
    TC = f;
  } else if (code >= 32) { /* [A.2] ; 5 bits */
    f = 16 - (code>>2);
    T1s = (f < 4) ? f : (1 + (f & 1));
    n_skip = 5;
    TC = (f == 3) ? 8 : (T1s + (f>>1) + ((9 - f)>>3));
  } else if (code >= 18) { /* [A.2] ; 6 bits */
    f = 15 - (code>>1);
    T1s = (f & 3);
    g = (T1s + 1)>>1;
    n_skip = 6;
    TC = (g2)) + (g & 1);
  } else { /* [A.1], [A.2] */
    g = code >> 4;
    T1s = ((15 - code) & 3) * (1-g);
    n_skip = 7 - g;
    TC = (T1s==0) ? (7 - (code>>2)) : (((T1s+7)>>1)> 3; T1s = ((f + 6) >> 4) = 18) { /* [A.1], [A.2] ; 8, 9 bits */
    g = code10 >> 5;
    f = code10 >> (g+1);
    T1s = 3 - (f & 3);
    n_skip = 9 - g;
    TC = T1s + ((5 - g)2);
  } else if (code10 >= 14) { /* [A.1] ; 9 bits */
    T1s = (code10 & 2) >> 1;
    n_skip = 9;
    TC = T1s + 12;
  } else { /* [A.2] ; 10 bits */
    T1s = (13 - code10) & 3;
    TC = 16 - ((code10 - 1)>>2);
    n_skip = 10;
  }
}
SkipBits(n_skip);
Fig. 7. Pseudo code for coeff_token decoding (Table 2)
4) Chroma DC VLC Table (nC == -1)
The VLC codes are expressed as an 8-bit set and classified into four groups. Table VII shows these groups, and Fig. 8 shows the arithmetic decoding equations expressed as C pseudo code.

TABLE VII
8-BIT SET AND GROUPING OF CHROMA DC TABLE
Code | Value | Bit count | [T1s, TC]
1xxx xxxx | ≥ 128 | 1 | [1, 1]
01xx xxxx | ≥ 64 | 2 | [0, 0]
001x xxxx | ≥ 32 | 3 | [2, 2]
0001 11xx | ≥ 28 | 6 | [0, 1]
0001 10xx | ≥ 24 | 6 | [1, 2]
… (2)
0000 11xx | ≥ 12 | 6 | [0, 3]
0000 10xx | ≥ 8 | 6 | [0, 4]
0000 011x | ≥ 6 | 7 | [1, 3]
0000 010x | ≥ 4 | 7 | [2, 3]
0000 0011 | 3 | 8 | [1, 4]
0000 0010 | 2 | 8 | [2, 4]
0000 000x | ≥ 0 | 7 | [3, 4]
code8 = ShowBits(8);
if (code8 >= 32) { /* [A.2] */
  g = code8>>5;
  T1s = (g >> 2) + (((9 - g)>>3) 1) & (3 - T1s));
#endif
} else if (code8 >= 8) { /* [A.2] ; 6-bit */
  g = code8 >> 2;
  f = 7 - g;
  n_skip = 6;
  T1s = ((g - 1)>>2) * (f + (f>>1));
  TC = f + 1 - (((f + 1)>>2)>2;
  T1s = 4 - (code8 >> g);
  TC = 4 - g;
  n_skip = 8 - g;
} else { /* [A.4] */
  T1s = 3;
  TC = 4;
  n_skip = 7;
}
SkipBits(n_skip);
if(ZL >1)+1)) ; n_skip = ZL - (code>>1); } else if(ZL >1)) >= ZL) { /* [A.2] */
    RB = 3 - (code>>1);
    n_skip = 2;
  } else { /* [A.2] */
    f = (ZL - (code & 3));
    RB = (ZL < 6) ? f : ((code < 2) ? (code + 1) : f);
    n_skip = 3;
  }
} else { /* ZL > 6 */
  code11 = ShowBits(11);
  if (code11 >= 256) { /* [A.2] */
    RB = 7 - (code11>>8);  /* top 3 bits of the 11-bit window */
    n_skip = 3;
  } else { /* [A.3] */
    m = GetM(code11, 11);
    RB = 4 + m;
    n_skip = m + 1;
  }
}
SkipBits(n_skip);
Fig. 8. Pseudo code for coeff_token decoding (Chroma DC Table)
Fig. 9. Pseudo code for run_before decoding
B. Decoding for run_before
The run_before syntax element has 7 tables, selected by zerosLeft (ZL). We classify the VLC codes into 5 groups and define an arithmetic decoding equation for each group. Table VIII shows these groups, and Fig. 9 shows the arithmetic decoding equations expressed as C pseudo code. The proposed algorithm has a simpler structure than Moon's method shown in Fig. 4.
if (TC == 1) {
  code = ShowBits(9);
  if (code >= 256) { TZ = 0; n_skip = 1; } /* [A.2] */
  else if (code >= 4) { /* [A.3] */
    m = GetM(code, 9);
    TZ = (m > (7-m)) & 1);
    n_skip = m + 2;
  }
  else { TZ = 16 - code; n_skip = 9; } /* [A.2] */
} else if(TC > 2;
  h = code >> (g + 1);
  TZ = ((code>>g) & (h + 1)) + (h - g);
  n_skip = 3 - g - h;
} else { /* [A.1]; TC == 14 or 15 */
  code = ShowBits(16 - TC);
  g = code >> 1;
  TZ = code & (g + 1);
  n_skip = 16 - TC - g;
}
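TABLE VIII
GROUPING OF RUN_BEFORE TABLE
run_before (RB) | ZL = 1 | ZL = 2 | ZL = 3 | ZL = 4 | ZL = 5 | ZL = 6 | ZL > 6
0 | 1 | 1 | 11 | 11 | 11 | 11 | 111
1 | 0 | 01 | 10 | 10 | 10 | 000 | 110
2 | - | 00 | 01 | 01 | 011 | 001 | 101
3 | - | - | 00 | 001 | 010 | 011 | 100
4 | - | - | - | 000 | 001 | 010 | 011
5 | - | - | - | - | 000 | 101 | 010
6 | - | - | - | - | - | 100 | 001
7 | - | - | - | - | - | - | 0001
8 | - | - | - | - | - | - | 00001
… (5)
14 | - | - | - | - | - | - | 00000000001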
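Because Fig. 9 survives only in part, the following is not the authors' code but one possible arithmetic decoder for the run_before codes of Table VIII, written in the same spirit. It reuses the ZL = 3 to 6 rule visible in Fig. 9 (RB = ZL - (code & 3) for 3-bit codes, with the ZL = 6 exception code + 1 for code < 2), and the [A.3] rule for ZL > 6. RunBefore, lead_zeros, and decode_run_before are hypothetical names; the function takes an 11-bit peek value and returns RB and the code length. Invalid codes (for example, 11 zero bits) are not detected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: run_before value and code length in bits. */
typedef struct { int rb, len; } RunBefore;

/* Leading zero bits in an a-bit value (GetM). */
static int lead_zeros(uint32_t code, int a)
{
    int m = 0;
    while (m < a && !((code >> (a - 1 - m)) & 1u))
        m++;
    return m;
}

/* zl = zerosLeft (>= 1); peek11 = next 11 bits, MSB first (ShowBits(11)). */
static RunBefore decode_run_before(int zl, uint32_t peek11)
{
    RunBefore r;
    int c1 = (int)(peek11 >> 10);  /* first bit */
    int c2 = (int)(peek11 >> 9);   /* first 2 bits */
    int c3 = (int)(peek11 >> 8);   /* first 3 bits */
    assert(zl >= 1 && peek11 < 2048u);
    if (zl == 1) {                 /* 1 -> 0, 0 -> 1 */
        r.rb = 1 - c1;
        r.len = 1;
    } else if (zl == 2) {          /* 1 -> 0, 01 -> 1, 00 -> 2 */
        if (c1) { r.rb = 0; r.len = 1; }
        else    { r.rb = 2 - (c2 & 1); r.len = 2; }
    } else if (zl <= 6) {          /* ZL = 3..6 */
        if (c2 >= zl - 3) {        /* 2-bit codes: [A.2] */
            r.rb = 3 - c2;
            r.len = 2;
        } else {                   /* 3-bit codes: rule seen in Fig. 9 */
            r.rb = (zl == 6 && c3 < 2) ? c3 + 1 : zl - (c3 & 3);
            r.len = 3;
        }
    } else if (c3) {               /* ZL > 6, 3-bit codes: [A.2] */
        r.rb = 7 - c3;
        r.len = 3;
    } else {                       /* ZL > 6, longer codes: [A.3] */
        int m = lead_zeros(peek11, 11);
        r.rb = 4 + m;
        r.len = m + 1;
    }
    return r;
}
```

Each branch is selected by ZL or by a comparison on the leading bits, so no table memory is accessed; whether this beats Fig. 9 on a given CPU would have to be measured.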
C. Decoding for total_zeros
1) total_zeros tables for 4x4 blocks
The total_zeros syntax element has 15 tables, selected by TotalCoeff. We classified these VLC codes into groups as well. We found, however, that arithmetic decoding equations are not well suited to total_zeros, because the very low correlation among its codes would require too many groups. We therefore apply arithmetic decoding to the tables for TotalCoeff 1, 13, 14, and 15, and the TLBS method to the remaining tables. Fig. 10 shows the arithmetic decoding equations and the TLBS method expressed as C pseudo code.
Fig. 10. Pseudo code for total_zeros decoding
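The TotalCoeff = 1 branch of Fig. 10 can be sketched as runnable C. The surviving parts of the figure fix its structure (a 9-bit window; TZ = 0 for a leading 1; an [A.3] branch based on m with n_skip = m + 2; and TZ = 16 - code for the last codes), but the TZ expression in the [A.3] branch is garbled. This sketch assumes TZ = 2m - b, where b is the bit following the first 1, which matches the H.264 total_zeros codes for TotalCoeff = 1 (1, 011, 010, 0011, 0010, ..., 0000 0000 1). TotalZeros and decode_total_zeros_tc1 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: total_zeros value and code length in bits. */
typedef struct { int tz, len; } TotalZeros;

/* total_zeros for 4x4 blocks with TotalCoeff == 1; peek9 = ShowBits(9). */
static TotalZeros decode_total_zeros_tc1(uint32_t peek9)
{
    TotalZeros r;
    assert(peek9 < 512u);
    if (peek9 >= 256) {             /* "1": [A.2] */
        r.tz = 0;
        r.len = 1;
    } else if (peek9 >= 4) {        /* m zeros, a 1, then bit b: [A.3] */
        int m = 0;
        while (!((peek9 >> (8 - m)) & 1u))
            m++;                    /* GetM(peek9, 9); here 1 <= m <= 6 */
        /* ASSUMPTION: reconstructed expression TZ = 2m - b. */
        r.tz = 2 * m - (int)((peek9 >> (7 - m)) & 1u);
        r.len = m + 2;
    } else {                        /* last three 9-bit codes: [A.2] */
        r.tz = 16 - (int)peek9;
        r.len = 9;
    }
    return r;
}
```

For instance, the 9-bit window 011000000 (value 192) has m = 1 and b = 1, so TZ = 1 with a 3-bit code.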
2) total_zeros tables for chroma DC 2x2 blocks
For chroma DC 2x2 blocks, the total_zeros syntax element has 3 tables, selected by TotalCoeff. We classify the VLC codes into two groups and define an arithmetic decoding equation for each group. Table IX shows the two groups, and Fig. 11 shows the arithmetic decoding equations expressed as C pseudo code.

TABLE IX
GROUPING OF TOTAL_ZEROS TABLE FOR CHROMA DC 2X2 BLOCK
total_zeros (TZ) | TotalCoeff (TC) = 1 | TC = 2 | TC = 3
0 | 1 | 1 | 1
1 | 01 | 01 | 0
2 | 001 | 00 | -
3 | 000 | - | -
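Fig. 11 is truncated after its first lines (f = 4 - TC; code = ShowBits(f)). The sketch below is therefore not the authors' code, but one arithmetic form consistent with the two groups of Table IX: if the f-bit window contains a 1, TZ equals the number of leading zeros m and the code is m + 1 bits long; if the window is all zeros, TZ = f and the code is f bits long. TotalZeros and decode_total_zeros_cdc are hypothetical names, and the function takes a 3-bit peek value so that the same call works for every TC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: total_zeros value and code length in bits. */
typedef struct { int tz, len; } TotalZeros;

/* total_zeros for chroma DC 2x2 blocks; tc = TotalCoeff (1..3),
   peek3 = next 3 bits of the stream, MSB first. */
static TotalZeros decode_total_zeros_cdc(int tc, uint32_t peek3)
{
    TotalZeros r;
    int f = 4 - tc;                      /* longest code length, as in Fig. 11 */
    uint32_t code = peek3 >> (3 - f);    /* equivalent of ShowBits(f) */
    int m = 0;
    assert(tc >= 1 && tc <= 3 && peek3 < 8u);
    while (m < f && !((code >> (f - 1 - m)) & 1u))
        m++;                             /* leading zeros: [A.3] */
    r.tz = m;                            /* all-zero window gives m == f */
    r.len = (m < f) ? m + 1 : f;         /* no terminating 1 in the all-zero code */
    return r;
}
```

With TC = 2, for example, the bits 01 give TZ = 1 (2 bits) and 00 gives TZ = 2 (2 bits), matching Table IX.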
In this section we defined optimized arithmetic decoding equations for CAVLC decoding. By minimizing memory access, the proposed arithmetic decoding achieves low power consumption, low memory usage, and fast decoding.
f = 4 - TC;
code = ShowBits(f);
if(code