Multimed Tools Appl DOI 10.1007/s11042-011-0881-3

An efficient joint implementation of three stages for fast computation of color space conversion in image coding/decoding

Xiuhua Ji & Caiming Zhang & Xuefen Zhang

© Springer Science+Business Media, LLC 2011

Abstract This paper proposes an efficient joint implementation algorithm for computing color space conversion, the discrete cosine transform (DCT) and quantization in an image coder/decoder. By combining the three stages, the proposed algorithm considerably reduces the number of operations needed for color space conversion. For 4:4:4 color sampling, it reduces the number of multiplications by 40% and the number of additions by 42% for the conversion from RGB to YCbCr in an image coder, and reduces the number of multiplications by 60% and the number of additions by 42% for the conversion from YCbCr to RGB in an image decoder. Similar results hold for 4:2:2 and 4:1:1 down-sampling. Existing fast methods from the literature can still be used together with the proposed algorithm in implementations of the international image coding standards based on transform coding, such as JPEG, MPEG and H.26X, to raise the image coding/decoding speed.

Keywords Joint implementation . Color space conversion . Quantization . DCT . Transform coding

X. Ji (*) : C. Zhang : X. Zhang
School of Computer Science and Technology, Shandong Provincial Key Laboratory of Digital Media Technology, Shandong Economic University, Jinan 250014, China
e-mail: [email protected]

1 Introduction

Image compression plays an important role in the evolution of technologies such as multimedia computing, digital games, the information superhighway and videophones. Real-time processing of image data requires effective compression methods that achieve good image quality together with a high compression ratio and a low processing time. Common compression methods include predictive coding, transform coding, structure coding and fractal coding. Among these, transform coding is widely used in international coding standards such as JPEG, MPEG and H.26X [5, 22, 23].

Color space conversion, DCT and quantization are important and time-consuming stages in image compression/decompression. To raise the compression speed, fast ways of computing these stages should be studied.

In practice, most image/video acquisition systems only offer RGB video frames. Senders often need to convert from RGB to YIQ, YUV or YCrCb for further data compression, and receivers need to convert the data from YIQ, YUV or YCrCb back to RGB to show the image/video on a screen. Techniques that implement this conversion efficiently are therefore desirable. A survey of the literature shows that relatively few papers address speeding up color space conversion. Several optimization techniques have been reported [2–4, 14, 20, 21, 25, 26], including a look-up table method, a method that substitutes integer multiplications and right shifts for floating-point multiplications, and some novel architectures for efficient implementation of color space conversion on Field Programmable Gate Arrays (FPGA) and VLSI. All of these techniques are based on the color space conversion functions themselves: they do not reduce the number of additions and multiplications required by the conversion, and only optimize the way the additions and multiplications are carried out.

DCT and quantization are critical parts of transform coding. Many fast algorithms for DCT and quantization have been developed for image and video applications [1, 6–13, 15–19, 24]. Among them, the algorithms in [1, 7, 9, 12, 24] combine DCT and quantization and substantially reduce the number of computations. The integer DCT algorithm [8, 10, 19, 24] used in H.264 also combines DCT and quantization.

This paper proposes an efficient joint implementation algorithm that combines the three important stages of color space conversion, discrete cosine transform (DCT) and quantization in an image coder/decoder. The proposed algorithm is suitable for any image compression/decompression scheme that includes transform coding. After adopting it, the existing optimization techniques in [1–4, 6–21, 24–26] can still be applied to reduce the number of computations further. For 4:4:4 color sampling, the algorithm reduces the numbers of multiplications and additions for the conversion from RGB to YCbCr by 40% and 42% respectively in an image coder, and reduces those for the conversion from YCbCr to RGB by 60% and 42% respectively in an image decoder. It can be applied in implementations of international coding standards such as JPEG, MPEG and H.26X to raise the image coding/decoding speed.

The rest of the paper is organized as follows. Section 2 introduces the three conventional stages (color space conversion, DCT and quantization) that are included in international coding standards such as JPEG, MPEG and H.26X. Sections 3 and 4 explain the basic principle of the joint implementation algorithm for computing the three stages in the encoder/decoder system. Section 5 presents experiments demonstrating that the algorithm is simple and efficient. Section 6 concludes the paper.

2 The three conventional stages in the coder/decoder system

The joint implementation algorithm considerably reduces the number of operations by combining the three stages of color space conversion, DCT and quantization. Figure 1 illustrates the conventional substructure of a coder. The three stages are briefly introduced as follows.

Fig. 1 The conventional substructure in an encoder including color space conversion. (Block diagram: the R, G and B images pass through the conversion from RGB to YCbCr; each of the Y, Cb and Cr images then passes through a DCT stage and a quantization stage, with step Q(u,v) for Y and q(u,v) for Cb and Cr, giving the Y, Cb and Cr quantized DCT data.)

1) Color space conversion

The RGB color space is a simple and robust color definition that uses three numerical components to represent a color. It can be thought of as a three-dimensional coordinate system whose axes correspond to the three components: R (red), G (green) and B (blue). RGB is the color space that computer displays use. It corresponds most closely to the behavior of the human eye. The YCbCr color space was developed as part of the Recommendation for a worldwide digital component video standard and is used in television transmission. Here an RGB color is separated into a luminance part (Y) and two chrominance parts (Cb and Cr). According to the ITU-R BT.601 standard, the conversion from RGB to YCbCr is

$$\begin{cases} Y = 0.299\,R + 0.587\,G + 0.114\,B \\ Cb = 0.564\,(B - Y) + 128 \\ Cr = 0.713\,(R - Y) + 128 \end{cases} \tag{1}$$

2) DCT

The DCT is a linear orthonormal transform. In image coding, a source image is usually divided into 8×8 sub-blocks, and the DCT is applied to each sub-block independently. The 2D 8×8 DCT and IDCT can be expressed as

$$F(u,v) = \frac{1}{4} C(u) C(v) \sum_{m=0}^{7} \sum_{n=0}^{7} f(m,n) \cos\frac{(2m+1)\pi u}{16} \cos\frac{(2n+1)\pi v}{16} \tag{2}$$

$$f(m,n) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u) C(v) F(u,v) \cos\frac{(2m+1)\pi u}{16} \cos\frac{(2n+1)\pi v}{16} \tag{3}$$

where m, n, u, v = 0, 1, ⋯, 7, f(m, n) is a pixel value of an 8×8 sub-block of the source image f, F(u, v) is a DCT component of the 8×8 sub-block, and

$$C(s) = \begin{cases} 1/\sqrt{2} & s = 0 \\ 1 & \text{otherwise} \end{cases}$$

Eqs. 2 and 3 can also be expressed in matrix form as

$$F = G\, f\, G^T \tag{4}$$

$$f = G^T F\, G \tag{5}$$

where f is an 8×8 sub-block of the source image, F is the DCT component matrix, and

$$G = \frac{1}{2} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & \cdots & \frac{1}{\sqrt{2}} \\ \cos\frac{\pi}{16} & \cos\frac{3\pi}{16} & \cdots & \cos\frac{15\pi}{16} \\ \vdots & \vdots & & \vdots \\ \cos\frac{7\pi}{16} & \cos\frac{21\pi}{16} & \cdots & \cos\frac{105\pi}{16} \end{bmatrix}$$

3) Quantization

In the quantization process, each DCT component F(u, v) of a sub-block is divided by a corresponding quantization step. Let E(u, v) be the quantized coefficient of F(u, v), that is,

$$E(u,v) = \mathrm{int}\big(F(u,v)/q(u,v) + 0.5\big)$$

where q(u, v) is the quantization step of F(u, v). In practice, the division is usually replaced by a multiplication with a precomputed decimal factor [16]. In Fig. 1, Q(u, v) and q(u, v) are the quantization steps of the luminance signal and the color signals respectively.

Figure 2 illustrates the substructure of the decoder, which mirrors the substructure of the encoder in Fig. 1. It consists of the inverses of the three stages in Fig. 1. According to the ITU-R BT.601 standard, the conversion from YCbCr to RGB in Fig. 2 is

$$\begin{cases} R = Y + 1.402\,(Cr - 128) \\ G = Y - 0.7141\,(Cr - 128) - 0.3441\,(Cb - 128) \\ B = Y + 1.772\,(Cb - 128) \end{cases} \tag{6}$$

RGB is a device-dependent color model. The coefficients in Eqs. 1 and 6 can therefore be adjusted to account for the characteristics of human vision and the non-linearity of CRT displays. In [14], the conversion functions are written as Eqs. 7 and 8:

$$\begin{cases} Y = 0.257\,R' + 0.504\,G' + 0.098\,B' + 16 \\ Cb = 0.576\,(B' - Y) + 128 \\ Cr = 0.730\,(R' - Y) + 128 \end{cases} \tag{7}$$

$$\begin{cases} R' = 1.164\,(Y - 16) + 1.596\,(Cr - 128) \\ G' = 1.164\,(Y - 16) - 0.813\,(Cr - 128) - 0.392\,(Cb - 128) \\ B' = 1.164\,(Y - 16) + 2.017\,(Cb - 128) \end{cases} \tag{8}$$

where R′, G′ and B′ are the gamma-corrected RGB values.
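The matrix form of the DCT (Eqs. 4 and 5) and the quantization rule can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the helper names (`dct2`, `idct2`, `quantize`) are hypothetical, and `int` is assumed to mean rounding down (floor), which matters only for negative coefficients.

```python
import math

N = 8

def dct_matrix():
    """G[u][m] = (1/2) C(u) cos((2m+1) u pi / 16), with C(0) = 1/sqrt(2)."""
    return [[0.5 * (1 / math.sqrt(2) if u == 0 else 1.0)
             * math.cos((2 * m + 1) * u * math.pi / 16)
             for m in range(N)] for u in range(N)]

def matmul(a, b):
    """Plain matrix product of two square N x N lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(a):
    return [list(row) for row in zip(*a)]

G = dct_matrix()
GT = transpose(G)

def dct2(f):
    """Eq. 4: F = G f G^T."""
    return matmul(matmul(G, f), GT)

def idct2(F):
    """Eq. 5: f = G^T F G."""
    return matmul(matmul(GT, F), G)

def quantize(F, q):
    """E(u,v) = int(F(u,v)/q(u,v) + 0.5), with int assumed to be floor."""
    return [[math.floor(F[u][v] / q[u][v] + 0.5) for v in range(N)]
            for u in range(N)]

# Hypothetical 8x8 block; any pixel data would do.
block = [[(7 * m + 13 * n) % 256 for n in range(N)] for m in range(N)]
coeffs = dct2(block)
restored = idct2(coeffs)
round_trip_error = max(abs(restored[m][n] - block[m][n])
                       for m in range(N) for n in range(N))
print("max round-trip error:", round_trip_error)
```

Because G is orthonormal, `idct2(dct2(block))` returns the original block up to floating-point error, and a constant block of value c has a single non-zero coefficient F(0,0) = 8c.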


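Fig. 2 The conventional substructure in the decoder opposite to Fig. 1. (Block diagram: the Y, Cb and Cr quantized DCT data pass through inverse quantization, with step Q(u,v) for Y and q(u,v) for Cb and Cr, and then through the IDCT stages; the resulting Y, Cb and Cr images pass through the conversion from YCbCr to RGB, giving the R, G and B images.)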

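Equations 1, 6, 7 and 8 can be written in the general forms of Eqs. 9 and 10, so that the proposed algorithm can be discussed conveniently in Section 3.

$$\begin{cases} Y = a\,R + b\,G + c\,B + k \\ Cb = d\,(B - Y) + m \\ Cr = e\,(R - Y) + n \end{cases} \tag{9}$$

$$\begin{cases} R = f\,(Y - k) + g\,(Cr - n) \\ G = f\,(Y - k) - h\,(Cr - n) - i\,(Cb - m) \\ B = f\,(Y - k) + j\,(Cb - m) \end{cases} \tag{10}$$

where a, b, c, d, e, f, g, h, i, j, k, m and n are constants. Let

$$\begin{cases} Y_1 = Y - k \\ Cb_1 = \frac{1}{d}\,Cb - (m/d - k) \\ Cr_1 = \frac{1}{e}\,Cr - (n/e - k) \end{cases} \tag{11}$$

We put Eq. 9 into Eq. 11 and then get Eq. 12.

$$\begin{cases} Y_1 = a\,R + b\,G + c\,B \\ Cb_1 = B - Y_1 \\ Cr_1 = R - Y_1 \end{cases} \tag{12}$$

In order to reduce the arithmetic operations of color space conversion, we first use Eq. 12 in the color space conversion stage to get the Y1, Cb1 and Cr1 images, and then send these images into the DCT stage. Let Y′ be an 8×8 sub-block of the Y image, Y′1 be the corresponding 8×8 sub-block of the Y1 (= Y − k) image, and C_k be an 8×8 constant matrix in which every element is equal to k. According to Eqs. 4 and 11, the DCT component matrix $\tilde{Y}'$ of Y′ can be written as

$$\tilde{Y}' = G\,Y'\,G^T = G\,(Y'_1 + C_k)\,G^T = G\,Y'_1\,G^T + G\,C_k\,G^T = \tilde{Y}'_1 + \tilde{C}_k \tag{13}$$

Here $\tilde{C}_k$ is easy to compute, namely

$$\tilde{C}_k = \begin{bmatrix} 8k & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}$$

Let $Cb_d = \frac{1}{d}Cb$ and $Cr_e = \frac{1}{e}Cr$, and let $\widetilde{Cb}'_1$ ($\widetilde{Cr}'_1$) be the DCT component matrix of an 8×8 sub-block $Cb'_1$ ($Cr'_1$) of the Cb1 (Cr1) image. Similarly, we can get the DCT component matrix $\widetilde{Cb}'_d$ of an 8×8 sub-block $Cb'_d$ of the Cb_d image and the DCT component matrix $\widetilde{Cr}'_e$ of an 8×8 sub-block $Cr'_e$ of the Cr_e image as follows:

$$\begin{cases} \widetilde{Cb}'_d = G\,Cb'_d\,G^T = G\,Cb'_1\,G^T + G\,C_{m/d-k}\,G^T = \widetilde{Cb}'_1 + \tilde{C}_{m/d-k} \\ \widetilde{Cr}'_e = G\,Cr'_e\,G^T = G\,Cr'_1\,G^T + G\,C_{n/e-k}\,G^T = \widetilde{Cr}'_1 + \tilde{C}_{n/e-k} \end{cases} \tag{14}$$

where

$$\tilde{C}_{m/d-k} = \begin{bmatrix} 8(m/d - k) & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}, \quad \tilde{C}_{n/e-k} = \begin{bmatrix} 8(n/e - k) & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}$$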
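Eqs. 12 to 14 can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it uses the constants of Eq. 7 for a, b, c, d, e, k, m and n, a made-up 8×8 RGB block, and hypothetical helper names. It compares the DCT of Y, Cb/d and Cr/e obtained from Eq. 9 with the DCT of Y1, Cb1 and Cr1 obtained from Eq. 12 followed by one addition on F(0,0).

```python
import math

N = 8
# DCT matrix of Eq. 4: G[u][m] = (1/2) C(u) cos((2m+1) u pi / 16)
G = [[0.5 * (1 / math.sqrt(2) if u == 0 else 1.0)
      * math.cos((2 * m + 1) * u * math.pi / 16) for m in range(N)]
     for u in range(N)]

def dct2(f):
    """Return F = G f G^T for an 8x8 block f."""
    tmp = [[sum(G[u][m] * f[m][n] for m in range(N)) for n in range(N)]
           for u in range(N)]
    return [[sum(tmp[u][n] * G[v][n] for n in range(N)) for v in range(N)]
            for u in range(N)]

# Constants of Eq. 9, taken from Eq. 7 (NC stands for the paper's n).
A, B, C = 0.257, 0.504, 0.098
D, E = 0.576, 0.730
K, M, NC = 16.0, 128.0, 128.0

def plane(rgb, fn):
    """Apply a per-pixel function to an 8x8 block of (R, G, B) tuples."""
    return [[fn(*px) for px in row] for row in rgb]

def luma1(r, g, b):
    """Y1 of Eq. 12."""
    return A * r + B * g + C * b

def with_dc(F, offset):
    """Add a constant to F(0,0) only: the new stage of Eqs. 13 and 14."""
    out = [row[:] for row in F]
    out[0][0] += offset
    return out

def max_diff(p, q):
    return max(abs(p[u][v] - q[u][v]) for u in range(N) for v in range(N))

# Hypothetical test block; any 8x8 RGB data would do.
rgb = [[((31 * m + 7 * n) % 256, (17 * m + 91 * n) % 256, (5 * m * n + 40) % 256)
        for n in range(N)] for m in range(N)]

# Direct route: Eq. 9, then the DCT of Y, Cb/d and Cr/e.
y = plane(rgb, lambda r, g, b: luma1(r, g, b) + K)
cb_d = plane(rgb, lambda r, g, b: (D * (b - (luma1(r, g, b) + K)) + M) / D)
cr_e = plane(rgb, lambda r, g, b: (E * (r - (luma1(r, g, b) + K)) + NC) / E)
direct = [dct2(y), dct2(cb_d), dct2(cr_e)]

# Joint route: Eq. 12, DCT, then one addition on F(0,0) per block.
y1 = plane(rgb, luma1)
cb1 = plane(rgb, lambda r, g, b: b - luma1(r, g, b))
cr1 = plane(rgb, lambda r, g, b: r - luma1(r, g, b))
joint = [with_dc(dct2(y1), 8 * K),
         with_dc(dct2(cb1), 8 * (M / D - K)),
         with_dc(dct2(cr1), 8 * (NC / E - K))]

errors = [max_diff(p, q) for p, q in zip(direct, joint)]
print("max coefficient differences (Y, Cb/d, Cr/e):", errors)
```

The two routes agree up to floating-point error, which is the content of Eqs. 13 and 14: the constant offsets only affect the (0,0) coefficient.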

8×k, 8×(m/d−k) and 8×(n/e−k) are all constants that can be computed in advance. Therefore, computing each matrix sum in Eqs. 13 and 14 needs only 1 addition per 8×8 matrix.

Figure 3 illustrates the improved substructure of the coder in Fig. 1 according to the above principles. Compared with Fig. 1, the process in Fig. 3 has three adjustments:

1) The color conversion stage computes the conversion from RGB to Y1Cb1Cr1 by using Eq. 12 instead of the conversion from RGB to YCbCr in Fig. 1, and then sends the Y1, Cb1 and Cr1 images into the DCT stage.
2) After the DCT stage, we get the 8×8 DCT component matrices ($\tilde{Y}'_1$, $\widetilde{Cb}'_1$ and $\widetilde{Cr}'_1$) of the Y1, Cb1 and Cr1 images. We add a new stage “F(0,0)+8×……”, which adds the value 8×k (8×(m/d−k) or 8×(n/e−k)) to the (0,0)th element of each 8×8 DCT sub-block. The new stage computes the matrix sums in Eqs. 13 and 14. After the new stage, we get the 8×8 DCT component matrices ($\tilde{Y}'$, $\widetilde{Cb}'_d$ and $\widetilde{Cr}'_e$) of the Y, Cb/d and Cr/e images.
3) The quantization steps are changed to Q(u,v), Qd(u,v) and Qe(u,v) respectively in the quantization stage, where Qd(u,v)=q(u,v)/d and Qe(u,v)=q(u,v)/e. This is valid because the DCT is a linear orthonormal transform: dividing the DCT of Cb/d by q(u,v)/d gives the same result as dividing the DCT of Cb by q(u,v).

After the quantization stage in Fig. 3, we obtain the quantized DCT components of the Y, Cb and Cr images, which are the same as the final data in Fig. 1. Compared with Fig. 1, the DCT stage and the quantization stage in Fig. 3 are unchanged except for the values of the quantization steps, so the arithmetic operations of the two stages in Fig. 3 are the same as in Fig. 1. The color conversion stage in Fig. 3 needs far fewer arithmetic operations than in Fig. 1. In practice, the efficiency of the algorithm depends on the color sampling format:

1) For 4:4:4 color sampling, each pixel in Fig. 1 needs 5 multiplications and 7 additions for the conversion from RGB to YCbCr according to Eq. 9, while each pixel in Fig. 3 needs only 3 multiplications and 4 additions according to Eq. 12. The new stage “F(0,0)+8×…” needs 3 additions per 8×8 sub-image, that is, 3/64 (≈0.05) additions per pixel. As a result, the number of multiplications for color conversion is reduced by 40% and the number of additions by 42% (42%=(7−4.05)/7).
2) For 4:2:2 down-sampling, the number of Cb or Cr samples is half the number of Y samples, so the number of multiplications is reduced by 25% (25%=(2+2)/(4×3+2+2)) and the number of additions by 40% (40%=(4+2+2)/(4×3+2×2+2×2)).
3) For 4:1:1 down-sampling, the result is similar.

Fig. 3 The improved substructure of the coder. (Block diagram: the R, G and B images pass through the conversion from RGB to Y1Cb1Cr1; each of the Y1, Cb1 and Cr1 images passes through a DCT stage, then through the stage F(0,0)+8×k, F(0,0)+8×(m/d−k) or F(0,0)+8×(n/e−k), and then through quantization with step Q(u,v), Qd(u,v) or Qe(u,v), giving the Y, Cb and Cr quantized DCT data.)

3.2 Joint implementation in the decompression process

The joint implementation in the decompression process is similar to that in the compression process. The improved substructure of the decoder is shown in Fig. 4. Compared with Fig. 2, Fig. 4 has three adjustments:

1) The inverse quantization steps are changed to Qf(u,v), Qj(u,v) and Qg(u,v) respectively in the inverse quantization stage, where Qf(u,v)=Q(u,v)×f, Qj(u,v)=q(u,v)×j and Qg(u,v)=q(u,v)×g. After the inverse quantization stage, we get the 2D 8×8 DCT component matrices of the f×Y, j×Cb and g×Cr images.
2) Before the IDCT stage, we add a new stage “F(0,0)−8×……”, which subtracts the value 8×k×f (8×m×j or 8×n×g) from the (0,0)th element of each 8×8 DCT component matrix. 8×k×f, 8×m×j and 8×n×g are all constants. The DCT components of the Y2, Cb2 and Cr2 images are obtained after the new stage, and the Y2, Cb2 and Cr2 images are obtained after the IDCT stage, where

$$\begin{cases} Y_2 = f\,(Y - k) \\ Cb_2 = j\,(Cb - m) \\ Cr_2 = g\,(Cr - n) \end{cases} \tag{15}$$

3) The conversion from Y2Cb2Cr2 to RGB is realized by using Eq. 16.

$$\begin{cases} R = Y_2 + Cr_2 \\ G = Y_2 - h'\,Cr_2 - i'\,Cb_2 \\ B = Y_2 + Cb_2 \end{cases} \tag{16}$$

where h′=h/g and i′=i/j. We put Eq. 15 into Eq. 16 and then get Eq. 10. As a result, the RGB recovered images obtained after the color conversion stage in Fig. 4 are the same as the final data in Fig. 2.
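The operation counts quoted in this section can be reproduced with a few lines of arithmetic. The sketch below is illustrative only. The per-pixel counts for Eq. 9 and Eq. 12 are the ones stated in the text; the counts for Eq. 10 (5 multiplications, 7 additions) and Eq. 16 (2 multiplications, 4 additions) are our own tally of the operators in those equations, and the helper name `reduction` is hypothetical.

```python
def reduction(before, after):
    """Relative saving in percent when a count drops from `before` to `after`."""
    return 100.0 * (before - after) / before

# One DC correction per component per 8x8 block: 3 additions per 64 pixels.
DC_ADDS_PER_PIXEL = 3 / 64

# 4:4:4 coder: Eq. 9 (5 mul, 7 add) versus Eq. 12 (3 mul, 4 add) plus DC stage.
enc_mul = reduction(5, 3)
enc_add = reduction(7, 4 + DC_ADDS_PER_PIXEL)

# 4:4:4 decoder: Eq. 10 (5 mul, 7 add) versus Eq. 16 (2 mul, 4 add) plus DC stage.
dec_mul = reduction(5, 2)
dec_add = reduction(7, 4 + DC_ADDS_PER_PIXEL)

# 4:2:2 coder, counted over four luminance samples with two Cb and two Cr
# samples; the small DC-stage cost is ignored here, as in the text.
mul_422 = reduction(4 * 3 + 2 + 2, 4 * 3)
add_422 = reduction(4 * 3 + 2 * 2 + 2 * 2, 4 * 2 + 2 + 2)

print(f"4:4:4 coder:   {enc_mul:.0f}% fewer multiplications, {enc_add:.0f}% fewer additions")
print(f"4:4:4 decoder: {dec_mul:.0f}% fewer multiplications, {dec_add:.0f}% fewer additions")
print(f"4:2:2 coder:   {mul_422:.0f}% fewer multiplications, {add_422:.0f}% fewer additions")
```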




Fig. 4 The improved substructure of the decoder. (Block diagram: the Y, Cb and Cr quantized DCT data pass through inverse quantization with steps Qf(u,v), Qj(u,v) and Qg(u,v), then through the stages F(0,0)−8×k×f, F(0,0)−8×m×j and F(0,0)−8×n×g, and then through the IDCT stages, giving Y2, Cb2 and Cr2; the conversion from Y2Cb2Cr2 to RGB gives the R, G and B recovered images.)

Fig. 5 The improved intra-frame coding/decoding substructure with the 4×4 integer DCT. (a) The improved substructure of the intra-frame coder with the 4×4 integer DCT: conversion from RGB to Y1Cb1Cr1, integer DCT, the stages F(0,0)+16×k, F(0,0)+16×(m/d−k) and F(0,0)+16×(n/e−k), and quantization with steps QEf(u,v), QdEf(u,v) and QeEf(u,v). (b) The improved substructure of the intra-frame decoder with the 4×4 integer IDCT: inverse quantization with steps QfEt(u,v), QjEt(u,v) and QgEt(u,v), the stages F(0,0)−k×f, F(0,0)−m×j and F(0,0)−n×g, integer IDCT, and conversion from Y2Cb2Cr2 to RGB.

Fig. 6 The improved inter-frame coding/decoding substructure with the 4×4 integer DCT. (a) The improved substructure of the inter-frame coder with the 4×4 integer DCT: conversion from RGB to Y1Cb1Cr1, subtraction of the motion-compensated prediction, integer DCT, quantization with steps QEf(u,v), QdEf(u,v) and QeEf(u,v) giving the Y, Cb and Cr quantized DCT residual data, and a reconstruction loop with inverse quantization (steps QEt(u,v), QdEt(u,v) and QeEt(u,v)), integer IDCT, and motion estimation and compensation. (b) The improved substructure of the inter-frame decoder with the 4×4 integer IDCT: inverse quantization with steps QfEt(u,v), QjEt(u,v) and QgEt(u,v), integer IDCT, motion compensation, and conversion from Y2Cb2Cr2 to RGB giving the R, G and B video.

Compared with Fig. 2, the inverse quantization stage and the IDCT stage in Fig. 4 are unchanged except for the values of the quantization steps. In practice, the efficiency of the algorithm depends on the color sampling format:

1) For 4:4:4 color sampling, each pixel in Fig. 4 needs only 2 multiplications and 4 additions according to Eq. 16. The new stage “F(0,0)−8×…” needs 3/64 (≈0.05) additions per pixel on average. Therefore, the number of multiplications for color conversion is reduced by 60% and the number of additions by 42%.
2) For 4:2:2 or 4:1:1 down-sampling, the results are similar.

Compared with Fig. 1 or Fig. 2, the DCT (IDCT) stage and the quantization (inverse quantization) stage in Fig. 3 or Fig. 4 are unchanged except for the values of the quantization steps, so the existing algorithms [1, 6, 7, 9, 11–13, 15–18] are compatible with the proposed algorithm and can still be used for DCT and quantization in Fig. 3 or for IDCT and inverse quantization in Fig. 4. Note that, when such algorithms are used, the constants added or subtracted in the new stage “F(0,0) + 8×……” or “F(0,0) − 8×…” must be changed accordingly, in addition to the quantization steps. Moreover, the optimization techniques in [2–4, 14, 20, 21, 25, 26] can still be used in the conversion from Y2Cb2Cr2 to RGB or from RGB to Y1Cb1Cr1.
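The decoder-side adjustments of Fig. 4 can be checked against the conventional decoder of Fig. 2 on a single 8×8 block. The sketch below is an assumption-laden illustration, not the authors' implementation: it uses the constants of Eq. 8 for f, g, h, i, j, k, m and n, flat made-up quantization tables, made-up quantized coefficients, and hypothetical function names. In a real decoder the scaled step tables would be computed once in advance rather than per block.

```python
import math

N = 8
G = [[0.5 * (1 / math.sqrt(2) if u == 0 else 1.0)
      * math.cos((2 * m + 1) * u * math.pi / 16) for m in range(N)]
     for u in range(N)]

def idct2(F):
    """Return f = G^T F G (Eq. 5) for an 8x8 coefficient block F."""
    tmp = [[sum(G[u][m] * F[u][v] for u in range(N)) for v in range(N)]
           for m in range(N)]
    return [[sum(tmp[m][v] * G[v][n] for v in range(N)) for n in range(N)]
            for m in range(N)]

# Constants of Eq. 10, taken from Eq. 8 (GF..GJ stand for f, g, h, i, j).
GF, GG, GH, GI, GJ = 1.164, 1.596, 0.813, 0.392, 2.017
K, M, NC = 16.0, 128.0, 128.0

def dequantize(E, steps, dc_shift=0.0):
    """Multiply by the step table, then subtract dc_shift from F(0,0)."""
    F = [[E[u][v] * steps[u][v] for v in range(N)] for u in range(N)]
    F[0][0] -= dc_shift
    return F

def scaled(steps, gain):
    """Fold a conversion gain into a step table, e.g. Qf = Q x f."""
    return [[s * gain for s in row] for row in steps]

def decode_conventional(Ey, Ecb, Ecr, Q, q):
    """Fig. 2: inverse quantization, IDCT, then Eq. 10."""
    y, cb, cr = (idct2(dequantize(E, s)) for E, s in ((Ey, Q), (Ecb, q), (Ecr, q)))
    out = []
    for r in range(N):
        row = []
        for c in range(N):
            yy = GF * (y[r][c] - K)
            row.append((yy + GG * (cr[r][c] - NC),
                        yy - GH * (cr[r][c] - NC) - GI * (cb[r][c] - M),
                        yy + GJ * (cb[r][c] - M)))
        out.append(row)
    return out

def decode_joint(Ey, Ecb, Ecr, Q, q):
    """Fig. 4: scaled steps, DC subtraction, IDCT, then Eq. 16."""
    y2 = idct2(dequantize(Ey, scaled(Q, GF), 8 * K * GF))
    cb2 = idct2(dequantize(Ecb, scaled(q, GJ), 8 * M * GJ))
    cr2 = idct2(dequantize(Ecr, scaled(q, GG), 8 * NC * GG))
    h1, i1 = GH / GG, GI / GJ
    return [[(y2[r][c] + cr2[r][c],
              y2[r][c] - h1 * cr2[r][c] - i1 * cb2[r][c],
              y2[r][c] + cb2[r][c]) for c in range(N)] for r in range(N)]

def sparse_block(entries):
    """Build an 8x8 block of quantized coefficients from a {(u, v): value} dict."""
    E = [[0] * N for _ in range(N)]
    for (u, v), val in entries.items():
        E[u][v] = val
    return E

# Hypothetical step tables and quantized data.
Q = [[16] * N for _ in range(N)]
q = [[18] * N for _ in range(N)]
Ey = sparse_block({(0, 0): 60, (0, 1): 3, (1, 0): -2, (2, 2): 1})
Ecb = sparse_block({(0, 0): 57, (1, 1): 1})
Ecr = sparse_block({(0, 0): 58, (0, 2): -1})

ref = decode_conventional(Ey, Ecb, Ecr, Q, q)
new = decode_joint(Ey, Ecb, Ecr, Q, q)
max_err = max(abs(a - b) for r in range(N) for c in range(N)
              for a, b in zip(ref[r][c], new[r][c]))
print("max RGB difference between Fig. 2 and Fig. 4 routes:", max_err)
```

Both routes give the same RGB values up to floating-point error, while the joint route needs only the two multiplications of Eq. 16 per pixel in the color conversion stage.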

4 The proposed algorithm for video coding/decoding with integer DCT

The transform used in H.264 is a 4×4 or 8×8 integer DCT instead of the 8×8 DCT that MPEG-2 uses. The integer transform is derived from the DCT, but has lower complexity with little performance degradation. Its core involves only addition and shift operations. The 4×4 forward integer DCT is as follows [8, 10, 19, 24]:

$$Y = (C_f\,X\,C_f^T) \otimes E_f \tag{17}$$

where X is the original input matrix, “⊗” denotes element-by-element multiplication of two matrices,

$$C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \quad E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}, \quad a = \frac{1}{2}, \quad b = \sqrt{\frac{2}{5}}$$

The 4×4 inverse integer DCT is given by

$$X' = C_t^T\,(Y \otimes E_t)\,C_t = C_t^T\,F\,C_t \tag{18}$$

where

$$C_t = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1/2 & -1/2 & -1 \\ 1 & -1 & -1 & 1 \\ 1/2 & -1 & 1 & -1/2 \end{bmatrix}, \quad E_t = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}, \quad F = Y \otimes E_t$$
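Eqs. 17 and 18 can be exercised directly. The following sketch is illustrative, with hypothetical helper names and a made-up input block. It checks that the inverse transform recovers the input, and that a constant 4×4 block of value k gives a core output $C_f X C_f^T$ whose only non-zero element is 16×k at position (0,0), which is why the constants in the new stage of the intra-frame coder are multiples of 16 rather than 8.

```python
import math

A = 0.5
B = math.sqrt(2 / 5)

CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
CT = [[1, 1, 1, 1], [1, 0.5, -0.5, -1], [1, -1, -1, 1], [0.5, -1, 1, -0.5]]

# E_f(u,v) = SF[u] * SF[v] and E_t(u,v) = ST[u] * ST[v] reproduce the
# scaling matrices of Eqs. 17 and 18.
SF = [A, B / 2, A, B / 2]
ST = [A, B, A, B]
EF = [[su * sv for sv in SF] for su in SF]
ET = [[su * sv for sv in ST] for su in ST]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(p):
    return [list(r) for r in zip(*p)]

def hadamard(p, q):
    """Element-by-element product, the operation written as ⊗ in Eq. 17."""
    return [[p[i][j] * q[i][j] for j in range(4)] for i in range(4)]

def forward_core(X):
    """C_f X C_f^T: integer additions and shifts in a real codec."""
    return matmul(matmul(CF, X), transpose(CF))

def forward(X):
    """Eq. 17."""
    return hadamard(forward_core(X), EF)

def inverse(Y):
    """Eq. 18."""
    return matmul(matmul(transpose(CT), hadamard(Y, ET)), CT)

# Hypothetical 4x4 input block.
X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
X_back = inverse(forward(X))
round_trip_error = max(abs(X_back[i][j] - X[i][j])
                       for i in range(4) for j in range(4))
print("max round-trip error:", round_trip_error)

# A constant block of value k = 16 only touches the (0,0) core coefficient.
core_const = forward_core([[16] * 4 for _ in range(4)])
print("core output for constant block:", core_const)
```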

If Fig. 1 or Fig. 2 is still used to depict the substructure of an intra-frame encoder or an intra-frame decoder with the 4×4 integer DCT, the DCT step in Fig. 1 stands for the computation of $C_f X C_f^T$ and the IDCT step in Fig. 2 stands for the computation of $C_t^T F C_t$. The scaling operation “⊗E_f” or “⊗E_t” in Eq. 17 or Eq. 18 can be merged into the quantization or inverse quantization process in Fig. 1 or Fig. 2 to reduce the total number of multiplications. The quantization steps are then changed as follows:

$$\begin{aligned} Q_{Ef}(u,v) &= Q(u,v)/E_f(u,v) \\ Q_{dEf}(u,v) &= Q_d(u,v)/E_f(u,v) \\ Q_{eEf}(u,v) &= Q_e(u,v)/E_f(u,v) \\ Q_{fEt}(u,v) &= Q_f(u,v) \times E_t(u,v) \\ Q_{jEt}(u,v) &= Q_j(u,v) \times E_t(u,v) \\ Q_{gEt}(u,v) &= Q_g(u,v) \times E_t(u,v) \end{aligned} \tag{19}$$

where u, v = 0, 1, 2, 3. The proposed algorithm for intra-frame coding/decoding with the 4×4 integer DCT is shown in Fig. 5, where the integer DCT step stands for the computation of $C_f X C_f^T$ and the integer IDCT step for the computation of $C_t^T F C_t$. Similarly, the proposed algorithm for inter-frame coding/decoding with the 4×4 integer DCT is shown in Fig. 6. The stages “F(0,0)+……” and “F(0,0)−…” of the proposed algorithm are no longer needed in Fig. 6 because of motion compensation: the constant offsets cancel when the prediction is subtracted from the current frame. In Fig. 6a, the inverse quantization steps are changed to $Q_{Et}(u,v) = Q(u,v) \times E_t(u,v)$, $Q_{dEt}(u,v) = Q_d(u,v) \times E_t(u,v)$ and $Q_{eEt}(u,v) = Q_e(u,v) \times E_t(u,v)$. As a result, the output of Fig. 6a is the quantized DCT residual data of the video.

Table 1 Comparison of the time consumed in a coder with the AAN fast DCT algorithm

| Color images (512×512) | Time for color conversion (ms) | Time for DCT and quantization (including “F(0,0)+8×…” stage) (ms) | Total time (ms) |
|---|---|---|---|
| Fig. 1 | 1876 | 17451 | 19327 |
| Fig. 3 | 1263 | 17503 | 18766 |
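The folding of the scaling matrix into the quantization steps (Eq. 19) can be sketched as follows. This is a minimal illustration with a hypothetical core output `W`, flat hypothetical step tables and hypothetical helper names. It shows that scaling by E_f and then dividing by Q gives the same ratio as a single division by the folded step Q/E_f, so the per-coefficient multiplication by E_f disappears from the run-time path. The chroma case uses Qd = q/d with d taken from Eq. 7.

```python
import math

A, B = 0.5, math.sqrt(2 / 5)
SF = [A, B / 2, A, B / 2]
EF = [[su * sv for sv in SF] for su in SF]   # E_f of Eq. 17

def fold_forward(steps):
    """First three lines of Eq. 19: divide each step by E_f(u,v)."""
    return [[steps[u][v] / EF[u][v] for v in range(4)] for u in range(4)]

def ratio_two_step(W, steps):
    """Scale by E_f (Eq. 17), then divide by the quantization step."""
    return [[W[u][v] * EF[u][v] / steps[u][v] for v in range(4)]
            for u in range(4)]

def ratio_folded(W, folded_steps):
    """Divide the core output directly by the folded step."""
    return [[W[u][v] / folded_steps[u][v] for v in range(4)] for u in range(4)]

# Hypothetical core output C_f X C_f^T and hypothetical step tables.
W = [[540, -35, 12, 7], [-64, 20, -9, 3], [18, -6, 4, -2], [5, 3, -1, 1]]
Q = [[20.0] * 4 for _ in range(4)]           # luminance steps
d = 0.576                                    # constant d of Eq. 7
Qd = [[24.0 / d] * 4 for _ in range(4)]      # Qd = q/d with q = 24

luma_gap = max(abs(x - y)
               for r1, r2 in zip(ratio_two_step(W, Q), ratio_folded(W, fold_forward(Q)))
               for x, y in zip(r1, r2))
chroma_gap = max(abs(x - y)
                 for r1, r2 in zip(ratio_two_step(W, Qd), ratio_folded(W, fold_forward(Qd)))
                 for x, y in zip(r1, r2))
print("max difference (luma, chroma):", luma_gap, chroma_gap)
```

The folded tables depend only on the quantization steps and on E_f, so they can be computed once per quantization setting.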

5 Experiments

Our experimental platform is a ThinkPad X200 computer with an Intel(R) Core(TM)2 Duo CPU (2.53 GHz) and 3 GB of memory, running Windows XP. To demonstrate the efficiency and rationality of the joint implementation algorithm, we used the Visual Basic 6.0 programming language to implement all the processes of Figs. 1, 2, 3, 4, 5 and 6, processed many 512×512 color images (such as the Baboon, Peppers and Lena standard images) and video clips (such as foreman, carphone and akiyo) with 4:4:4 color sampling, and used the timeGetTime function to measure the time consumed. The proposed algorithm is suitable for any color image, and its applicability is the same for any color image. Eqs. 7 and 8 are used for color conversion between the gamma-corrected RGB values (R′G′B′) and YCbCr (that is, a=0.257, b=0.504, c=0.098, d=0.576, e=0.730, f=1.164, g=1.596, h=0.813, i=0.392, j=2.017, k=16, m=128 and n=128). In computing the DCT/IDCT stages in Figs. 1, 2, 3, 4, 5 and 6, shift operations are carried out as multiplications because the Visual Basic 6.0 programming language has no shift instructions. This does not affect the following comparison between the conventional substructure and the improved substructure.

We carried out three groups of experiments.

In the first group of experiments, the AAN fast DCT algorithm [1] is used for the 2D 8×8 DCT computation in Figs. 1 and 3, and Feig’s fast IDCT algorithm [9] for the 2D 8×8 IDCT computation in Figs. 2 and 4. Note that both are scaled DCT algorithms, so the parameters added or subtracted in the new stage “F(0,0)−8×…” or “F(0,0)+8×…” and the values in the quantization or inverse quantization stage need to be scaled accordingly. This group of experiments confirms that the final data of the process in Fig. 3 (or Fig. 4) are the same as the data in Fig. 1 (or Fig. 2). The times consumed in running the processes in Figs. 1, 2, 3 and 4 twenty times are listed in Tables 1 and 2. The data in the tables are consistent with the theoretical analysis in Section 3. Tables 1 and 2 show that the time spent on color conversion in Fig. 3 or Fig. 4 is reduced considerably compared with that in Fig. 1 or Fig. 2, and that the time spent in the new stage “F(0,0) − 8×…” or “F(0,0) + 8×…” is tiny: relative to the time for the DCT and quantization stages, the new stage takes (17503−17451)/17451 ≈ 0.3% in the coder and (16985−16891)/16891 ≈ 0.56% in the decoder. The total time is reduced in the proposed algorithm.

Table 2 Comparison of the time consumed in a decoder with Feig’s fast IDCT algorithm

| Color images (512×512) | Time for color conversion (ms) | Time for IDCT and inverse quantization (including “F(0,0)−8×…” stage) (ms) | Total time (ms) |
|---|---|---|---|
| Fig. 2 | 3326 | 16891 | 20217 |
| Fig. 4 | 1727 | 16985 | 18708 |

Reference [1] proposes a 1D DCT algorithm, which is applied to the 2D 8×8 DCT along the rows and then the columns. Feig’s fast DCT algorithm [9] is a 2D method based on sparse factorizations of the DCT matrix. Both are scaled DCT algorithms, which combine DCT and quantization to reduce the number of computations. Therefore, the experiments also demonstrate that the existing fast methods [1, 6, 7, 9, 11–13, 15–18] can still be adopted together with the proposed algorithm to raise the speed of image coding/decoding.

In the second group of experiments, Fig. 1 or Fig. 2 is still used to depict the substructure of an encoder or a decoder with the 4×4 integer DCT, where the DCT step in Fig. 1 stands for the computation of $C_f X C_f^T$ and the IDCT step in Fig. 2 stands for the computation of $C_t^T F C_t$. The fast algorithms in [8, 19] are used for the 2D 4×4 integer DCT/IDCT computation in Figs. 1, 2 and 5a, b. The times consumed in computing the three stages twenty times are listed in Tables 3 and 4. The data in the tables show that the proposed algorithm with the integer DCT is also efficient.

Table 3 Comparison of the time consumed in a coder with the 4×4 integer DCT

| Color images (512×512) | Time for color conversion (ms) | Time for DCT and quantization (including “F(0,0)+16×…” stage) (ms) | Total time (ms) |
|---|---|---|---|
| Fig. 1 (the 4×4 integer DCT) | 1849 | 12760 | 14609 |
| Fig. 5(a) | 1248 | 12771 | 14019 |

Table 4 Comparison of the time consumed in a decoder with the 4×4 integer IDCT

| Color images (512×512) | Time for color conversion (ms) | Time for IDCT and inverse quantization (including “F(0,0)−…” stage) (ms) | Total time (ms) |
|---|---|---|---|
| Fig. 2 (the 4×4 integer DCT) | 2816 | 12266 | 15082 |
| Fig. 5(b) | 1846 | 12279 | 14125 |

In the third group of experiments, several standard video clips, such as foreman, carphone and akiyo (176×144, 90 frames), are used to compare the improved substructures in Fig. 5 and Fig. 6 with the conventional substructures of H.264 [5, 23]. For simplicity, all the motion vectors in the motion estimation and compensation stage are taken as zero. The first frames of the video clips are coded/decoded with the improved intra-frame coding/decoding substructures in Fig. 5a/b. Note that, when the quantized DCT data of the first frames are decoded for motion compensation of the second frames, the inverse quantization steps in Fig. 5b should be changed to $Q_{Et}(u,v)$, $Q_{dEt}(u,v)$ and $Q_{eEt}(u,v)$ instead of $Q_{fEt}(u,v)$, $Q_{jEt}(u,v)$ and $Q_{gEt}(u,v)$, because the result of decoding the first frames for motion compensation of the second frames should be Y1Cb1Cr1 data. The times consumed in computing the four stages of color conversion, integer DCT/IDCT, (inverse) quantization and motion compensation in video coding/decoding are listed in Tables 5 and 6. The data in the two tables show that the improved substructure is faster than the original substructure of H.264. This group of experiments supports the efficiency and rationality of the processes in Fig. 5 and Fig. 6 for video coding/decoding.

Table 5 Comparison of the time consumed in a video coder with the 4×4 integer DCT

| Video clips (176×144, 90 frames) | Time for color conversion (ms) | Time for DCT/IDCT, (inverse) quantization, motion estimation and compensation, etc. (ms) | Total time (ms) |
|---|---|---|---|
| The conventional coding substructure of H.264 | 2186 | 26011 | 28197 |
| Fig. 5(a) and Fig. 6(a) | 1761 | 26025 | 27786 |

Table 6 Comparison of the time consumed in a video decoder with the 4×4 integer IDCT

| Video clips (176×144, 90 frames) | Time for color conversion (ms) | Time for IDCT, inverse quantization, motion compensation, etc. (ms) | Total time (ms) |
|---|---|---|---|
| The conventional decoding substructure of H.264 | 2754 | 12136 | 14890 |
| Fig. 5(b) and Fig. 6(b) | 1772 | 12197 | 13969 |
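The relative savings implied by the measured times can be derived from the tables with simple arithmetic. The sketch below copies the color conversion and total times (in ms) from Tables 1 to 6; the dictionary layout and the helper name `saving` are our own choices for illustration.

```python
def saving(before, after):
    """Relative time saving in percent."""
    return 100.0 * (before - after) / before

# (color conversion time, total time) for the conventional and improved
# substructures, copied from Tables 1 to 6.
TIMES = {
    "Table 1 (coder, 8x8 DCT)":           ((1876, 19327), (1263, 18766)),
    "Table 2 (decoder, 8x8 IDCT)":        ((3326, 20217), (1727, 18708)),
    "Table 3 (coder, 4x4 integer DCT)":   ((1849, 14609), (1248, 14019)),
    "Table 4 (decoder, 4x4 integer IDCT)": ((2816, 15082), (1846, 14125)),
    "Table 5 (video coder)":              ((2186, 28197), (1761, 27786)),
    "Table 6 (video decoder)":            ((2754, 14890), (1772, 13969)),
}

for name, (conventional, improved) in TIMES.items():
    conv_saving = saving(conventional[0], improved[0])
    total_saving = saving(conventional[1], improved[1])
    print(f"{name}: color conversion {conv_saving:.1f}% faster, "
          f"total {total_saving:.1f}% faster")
```

Since the DCT and quantization stages dominate the running time in these measurements, the saving in total time is much smaller than the saving in the color conversion stage itself.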

6 Conclusion

This paper proposes a joint implementation algorithm for color space conversion, DCT and quantization. The proposed algorithm is shown to be simple and effective. It reduces the number of computations by combining the three stages. Based on the improved structures proposed in the paper, the existing fast methods for color space conversion, DCT and quantization can still be adopted to raise the speed of image coding/decoding. Therefore, the proposed algorithm can be applied efficiently in implementations of the international image coding standards that use transform coding, such as JPEG, MPEG and H.26X.

Acknowledgment This research is supported by the National Natural Science Foundation of China (Key Program No. 60933008 and No. 61073162) and the Shandong Natural Science Foundation of China (No. ZR2009GL013).

References

1. Arai Y, Agui T, Nakajima M (1988) A Fast DCT-SQ Scheme for Images. Transactions of Institute of Electronics, Information and Communication Engineers 71(11):1095–1097
2. Bartkowiak M (2001) Optimizations of Color Transformation for Real Time Video Decoding. EURASIP Conference on Digital Signal Processing for Multimedia Communications and Services (ECMCS 2001), Budapest
3. Bensaali F, Amira A (2004) Design and Implementation of Efficient Architectures for Color Space Conversion. International Journal on Graphics, Vision and Image Processing 5(1):37–47
4. Bensaali F, Amira A, Bouridane A (2004) An Efficient Architecture for Color Space Conversion Using Distributed Arithmetic. Proceedings of the IEEE International Symposium on Circuits and Systems, Vancouver (Canada), pp II-265-8. doi:10.1109/ISCAS.2004.1329259
5. Bhaskaran V, Konstantinides K (2003) Image and Video Compression Standards: Algorithms and Architectures. Kluwer Academic Publishers, Norwell
6. Chen WH, Smith CH et al (1977) A fast computational algorithm for the discrete cosine transform. IEEE Trans on Communications 25(9):1004–1009. doi:10.1109/TCOM.1977.1093941
7. Docef A, Kossentini F, Khanh NP, Ismaeil IR (2002) The quantized DCT and its application to DCT-based video coding. IEEE Trans Image Process 11(3):177–187. doi:10.1109/83.988952
8. Fan CP (2006) Fast 2-dimensional 4×4 forward integer transform implementation for H.264/AVC. IEEE Trans on Circuits and Systems II: Express Briefs 53(3):174–177
9. Feig E, Winograd S (1992) Fast algorithms for the discrete cosine transform. IEEE Trans on Signal Processing 40(9):2174–2193. doi:10.1109/78.157218
10. Ji XH, Zhang CM, Wang K (2009) A Fast Two-Dimension 4×4 Inverse Integer Transform Algorithm for Real-time H.264 Decoder. International Journal of Innovative Computing Information and Control 5(3):689–696
11. Ji XH, Zhang CM, Wang JY, Boey SH (2009) Fast 2-D 8×8 discrete cosine transform algorithm for image coding. Science in China Series F: Information Sciences 52(2):215–225
12. Khanh NP, Docef A, Kossentini F (1999) Quantized discrete cosine transform: a combination of DCT and scalar quantization. IEEE International Conference on Acoustics, Speech, and Signal Processing 6:3197–3200. doi:10.1109/ICASSP.1999.757521
13. Kusuma ED, Widodo TS (2010) FPGA implementation of pipelined 2D-DCT and quantization architecture for JPEG image compression. 2010 International Symposium on Information Technology (ITSim), Kuala Lumpur, pp 1–6. doi:10.1109/ITSIM.2010.5561411
14. Latha P (2005) Color Space Converter: Y’CrCb to R′G′B′. Xilinx Application Note. XAPP283 (v1.3.1)
15. Lee BG (1984) A new algorithm to compute the discrete cosine transform. IEEE Trans on Acoustic, Speech, Signal Processing 32(12):1243–1245. doi:10.1109/TASSP.1984.1164443
16. Lengwehasatit K, Ortega A (2004) Scalable variable complexity approximate forward DCT. IEEE Trans on Circuits and Systems for Video Technology 14(11):1236–1248. doi:10.1109/TCSVT.2004.835151
17. Loeffler C, Ligtenberg A, Moschytz GS (1989) Practical fast 1D DCT algorithms with 11 multiplications. In Proc IEEE ICASSP 2:988–991. doi:10.1109/ICASSP.1989.266596
18. Liang J (2001) Fast Multiplierless Approximations of the DCT With the Lifting Scheme. IEEE Trans on Signal Processing 49(12):3032–3044. doi:10.1109/78.969511
19. Malvar HS, Hallapuro A, Karczewicz M, Kerofsky L (2003) Low-complexity transform and quantization in H.264/AVC. IEEE Trans on Circuits and Systems for Video Technology 13(7):598–603
20. Mihai S, Stamatis V et al (2002) Y’UV-to-R’G’B’ Color Space Conversion on FPGA-augmented TriMedia-32 Processor. Proceeding of the Workshop on Circuits, Systems and Signal Processing, ISBN 90-73461-33-2, Veldhoven, Netherlands, pp 465–470
21. Mihai S, Stamatis V et al (2003) Color Space Conversion for MPEG decoding on FPGA-augmented TriMedia Processor. IEEE 14th Intl. Conf. on Application-specific Systems, Architectures, and Processors, Hague, Netherlands, pp 250–259. doi:10.1109/ASAP.2003.1212848
22. Mitchell JL, Pennebaker WB et al (1997) MPEG Video Compression Standard. Kluwer Academic Publishers, Norwell
23. Richardson IEG (2003) H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia. Wiley, New York
24. Xiang XX, Wang Y, Xiang Yangxia et al (2010) Efficient Fast Algorithm of DCT for H.264/AVC. 2010 Third International Conference on Intelligent Networks and Intelligent Systems, Shenyang, pp 76–79. doi:10.1109/ICINIS.2010.28
25. Xue YL, Liu K et al (2002) Optimization for a Parallel JPEG Algorithm. Acta Electronica Sinica 32(2):153–155
26. Yang Y, Peng YH, Liu ZG (2007) A Fast Algorithm for YCbCr to RGB Conversion. IEEE Transactions on Consumer Electronics 53(4):1490–1493

Xiuhua Ji received the B.E. and M.S. degrees in electronics from Shandong University in 1985 and 1988, respectively, and the Ph.D. degree in computer science and technology from Shandong University, Jinan, China, in 2009. She is now a professor at Shandong Economic University. Her current research interests include image processing and compression.

Caiming Zhang is a professor and doctoral supervisor in the School of Computer Science and Technology at Shandong University. He is also the dean of, and a professor in, the School of Computer Science and Technology at Shandong Economic University. He received a BS and an ME in computer science from Shandong University in 1982 and 1984, respectively, and a Dr. Eng. degree in computer science from the Tokyo Institute of Technology, Japan, in 1994. From 1997 to 2000, Dr. Zhang held a visiting position at the University of Kentucky, USA. His research interests include CAGD, CG, information visualization and medical image processing.

Xuefen Zhang received the B.E. degree in electronics and the M.S. degree in communication and information systems from Shandong University in 2002 and 2006, respectively, and the Ph.D. degree in communication and information systems from Beijing University of Posts and Telecommunications, Beijing, China, in 2011. She is now a teacher at Shandong Economic University. Her current research interests include signal processing and general communication theory.
