Multi-scale, Perceptual and Vector Quantization Based Video Codec

1 P. Bagheri Zadeh, 2 T. Buggy, and 3 A. Sheikh Akbari
1,2 School of Computing & Mathematical Sciences, Glasgow Caledonian University, Glasgow, UK
Email: [email protected]
3 Department of Electrical & Electronic Engineering, University of Bristol, Bristol, UK
Abstract
This paper presents a novel hybrid multi-scale, perceptual and vector quantization based video coding scheme. In the intra mode of operation, a wavelet transform is applied to the input frame to decorrelate it into a number of subbands. The lowest frequency subband is losslessly coded. The coefficients of the high frequency subbands are pixel quantized using perceptual weights specifically designed for each high frequency subband, and the quantized coefficients are then coded using a quadtree-coding scheme. In the inter mode of operation, a displaced frame difference is generated using overlapped block motion estimation/compensation to exploit the interframe redundancy. A wavelet transform is then applied to the displaced frame difference to decorrelate it into a number of subbands, and the coefficients in the resulting subbands are coded using an adaptive vector quantization scheme. To evaluate the performance of the proposed codec, both the proposed codec and the adaptive subband vector quantization (ASVQ) coding scheme, which has been shown to outperform H.263 at all bitrates, were applied to a number of test sequences. Results indicate that the proposed codec outperforms ASVQ both subjectively and objectively at all bit rates.
1. Introduction

Efficient video compression is vital due to the rapid growth in multimedia technologies. The need for efficient compression has led to the development of standard and non-standard video compression algorithms. Most standard video coding schemes, i.e. H.261, H.263, MPEG-2 and H.264/AVC [1], and many non-standard video codecs use a hybrid video coding architecture. Hybrid video compression techniques work in two modes of operation: intra-frame or inter-frame. In the intra-frame mode of operation the input frame is independently coded, while in the inter-frame mode of operation the input frame is first predicted from the previously encoded frame(s) using a motion estimation/compensation algorithm; a displaced frame difference is then generated and coded. The inter-frames are predicted from encoded intra-frames, hence any loss in the quality of an intra-frame is propagated to the successive predicted frames. The quality of the encoded intra-frames is therefore essential in achieving high quality video at the receiver.

A hybrid wavelet based video coding scheme, which uses an adaptive vector quantization scheme to code the wavelet coefficients in the detail subbands, was presented in [2]. This codec employs perceptual weights in vector selection and bit allocation among the different subbands when coding intra-frames, while it uses only vector quantization when coding inter-frames. Superior performance to that of H.263 at low bitrates was reported in that paper.

In this paper a novel hybrid wavelet based video coding scheme is presented. The proposed codec works in either the intra- or the inter-frame mode of operation. In the intra-frame mode of operation, a wavelet transform decomposes the input frame into its frequency subbands. The lowest frequency subband is losslessly coded. The coefficients of the high frequency subbands are quantized using a uniform quantization factor divided by their perceptual weights, which are specifically calculated for each subband, and the quantized coefficients are then quadtree coded. In the inter-frame mode of operation, overlapped block motion estimation/compensation is used to generate a displaced frame difference (DFD). The wavelet transform is applied to the DFD frame to decorrelate it into its frequency bands. The coefficients in the high frequency subbands are coded using an adaptive vector quantization coding scheme, while the baseband is losslessly coded.
The rest of the paper is organized as follows: Section 2 describes the proposed video coding scheme, Section 3 presents the experimental results and Section 4 concludes the paper.
2. Multi-scale, Perceptual and Vector Quantization Based Video Codec

A block diagram of the multi-scale, perceptual and vector quantization (MPVQ) based video encoder is shown in Figure 1. The encoder has two modes of operation: intra-frame and inter-frame. In the intra-frame mode of operation, the frame is transformed into the frequency domain and the transformed coefficients are pixel quantized and quadtree coded; this mode of operation is discussed in the next section. In the inter-frame mode of operation, a displaced frame difference (DFD frame) is generated from the input frame and its previously encoded frame(s) using overlapped block motion estimation/compensation. The resulting DFD frames are then coded using the adaptive subband vector quantization (ASVQ) coding scheme [2]. This mode of operation is detailed in Section 2.2.

Figure 1. The multi-scale, perceptual and vector quantization (MPVQ) based video encoder.

2.1. Intra-frame mode of operation

A block diagram of the proposed encoder in the intra mode of operation is shown in the shaded (bold) part of Figure 1. In this mode of operation, an intra-frame is input to the encoder. A wavelet transform is used to decorrelate the frame into a number of frequency subbands. The wavelet transform concentrates most of the frame energy into the lowest frequency subband; therefore, to preserve visually important information, the baseband is losslessly coded using a DPCM method. The coefficients in the high frequency subbands are pixel quantized using a uniform quantization factor divided by perceptual weights. The perceptual weights are uniquely generated for the centre of each high frequency subband (detailed in Section 2.1.1). The application of the perceptual weights in the quantization of the wavelet coefficients minimizes the visibility of compression artefacts and improves the visual quality of the reconstructed frames. The output of the quantization process is a significance map and a set of quantized coefficients for each subband. The significance map is a binary map that preserves the locations of the non-zero pixels: a '1' corresponds to a pixel with a non-zero value, which is used in the next stage, and a '0' corresponds to a pixel with a zero value. For each binary map, the numbers of zeros and ones are counted and the map is coded using a quadtree-coding structure for the group with the higher population.
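A minimal sketch of this quantization and significance-map stage is given below. It is only illustrative: the function and variable names are assumptions, the subband coefficients are assumed to be available as NumPy arrays, and the per-subband threshold is taken to be the uniform quantization factor divided by the subband's perceptual weight, as described in Section 2.1.1 below. The quadtree routine only illustrates the recursive structure, not the exact syntax used in the paper.

```python
import numpy as np

def quantize_subband(coeffs, q_factor, perceptual_weight):
    """Pixel-quantize one high frequency subband.

    The effective step size is the uniform quantization factor divided by the
    subband's perceptual weight (Section 2.1.1), so perceptually important
    subbands are quantized more finely.
    """
    step = q_factor / perceptual_weight
    quantized = np.round(coeffs / step).astype(np.int32)
    significance_map = (quantized != 0).astype(np.uint8)  # binary map of non-zero pixels
    nonzero_values = quantized[significance_map == 1]     # coefficients sent with the map
    return significance_map, nonzero_values

def quadtree_code(bitmap):
    """Recursively code a binary map: a uniform block is emitted as a single
    leaf symbol, otherwise a split symbol is emitted and the four quadrants
    are coded in turn (illustrative quadtree structure only)."""
    symbols = []

    def recurse(block):
        if block.size == 0:
            return
        if block.max() == block.min():           # uniform block: one leaf symbol
            symbols.append(('L', int(block.flat[0])))
            return
        symbols.append(('S',))                   # split into four quadrants
        h, w = block.shape
        for rows in (slice(0, h // 2), slice(h // 2, h)):
            for cols in (slice(0, w // 2), slice(w // 2, w)):
                recurse(block[rows, cols])

    recurse(np.asarray(bitmap))
    return symbols
```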
2.1.1 Threshold Generation

The threshold value for each detail subband is generated as a uniform quantization factor divided by the perceptual weight calculated for that subband. In this paper, the perceptual weights were calculated for a QCIF image size and a viewing distance of 40 centimetres using the algorithm presented in [3]. This approach first calculates the spatial frequency at the centre of each high frequency subband and then uses the experimental results presented in [5] to derive the perceptual weight for each high frequency subband.
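The sketch below only illustrates the structure of this computation and is not the weight design of [3] or [5]: the Mannos-Sakrison contrast sensitivity function is used as a stand-in for the experimental sensitivity data of [5], and the physical display width is an assumed value chosen purely for this example.

```python
import numpy as np

# Assumed viewing geometry for this sketch: the paper specifies a QCIF frame
# and a 40 cm viewing distance; the display width below is a guess.
VIEW_DIST_CM = 40.0
DISPLAY_WIDTH_CM = 5.0     # assumed physical width of the displayed QCIF frame
FRAME_WIDTH_PX = 176       # QCIF luminance width

def pixels_per_degree():
    """Number of pixels subtended by one degree of visual angle."""
    cm_per_degree = 2.0 * VIEW_DIST_CM * np.tan(np.radians(0.5))
    return cm_per_degree * FRAME_WIDTH_PX / DISPLAY_WIDTH_CM

def csf(f_cpd):
    """Mannos-Sakrison contrast sensitivity function (a stand-in for the
    experimental sensitivities of [5]); f_cpd is in cycles/degree."""
    return 2.6 * (0.0192 + 0.114 * f_cpd) * np.exp(-(0.114 * f_cpd) ** 1.1)

def subband_weight(level, orientation):
    """Perceptual weight at the centre of one detail subband of a dyadic
    wavelet decomposition (level 1 = finest; orientation is 'LH', 'HL' or 'HH')."""
    high_centre = 0.375 / 2 ** (level - 1)  # centre of the high band, cycles/pixel
    low_centre = 0.125 / 2 ** (level - 1)   # centre of the low band, cycles/pixel
    fx, fy = {'HL': (high_centre, low_centre),
              'LH': (low_centre, high_centre),
              'HH': (high_centre, high_centre)}[orientation]
    f_cpd = np.hypot(fx, fy) * pixels_per_degree()
    return csf(f_cpd)

# Per-subband threshold: the uniform quantization factor divided by the weight,
# e.g. threshold = q_factor / subband_weight(level, orientation)
```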
2.2. Inter-frame mode of operation

A block diagram of the inter-frame mode of operation is shown in the shaded part of Figure 1. In this mode of operation, a frame is input to the codec. The encoder employs an overlapped block motion compensation technique, which is explained in Section 2.2.1, to generate a displaced frame difference (DFD) for the current frame. A 2D wavelet transform is then applied to the DFD frame to decorrelate it into its frequency subbands. The baseband is coded using a lossless DPCM method to preserve visually important information. The coefficients in the high frequency subbands are first divided into vectors, and the most significant vectors according to their energy are retained for the vector quantization stage. To preserve the locations of the selected vectors, a significance binary map for each subband is generated and quadtree coded (a sketch of this selection step is given at the end of Section 2.2.1). The selected vectors are finally vector quantized using the adaptive vector quantization scheme presented in [2].

2.2.1 Overlapped Block Motion Compensation

The application of the wavelet transform along with traditional block matching motion estimation/compensation techniques causes a significant reduction in coding gain and serious blocking artefacts in the decoded frames. This is a result of the block edges, which produce significant amounts of signal energy in the high frequency subbands. Hence, the overlapped block motion compensation (OBMC) technique [2] is employed in the proposed codec to reduce the blocking artefacts and improve the coding gain. Motion estimation is performed by dividing the current frame into blocks of size 2M×2M, overlapping by M pixels. A search strategy is then used to find the best matching block in the previously decoded frame. The best matching block is shifted according to the motion vector and weighted with a smoothing window function, a raised cosine function [2]. All the windowed blocks are then summed to form the prediction of the current frame, from which the DFD is generated and coded in the inter mode. In this implementation, M is equal to 8, resulting in a block size of 16×16 overlapping by 8 pixels. The window function is a 2D raised cosine window function w, as illustrated in Figure 2, and is defined by Equation (1):
w(u, v) = w(u) · w(v),   u, v = 0, 1, ..., 2M − 1                                (1)

where w(k) is a 1D raised cosine function, defined as:

w(k) = (1/2) [1 − cos(π (k + 1/2) / M)],   k = 0, 1, ..., 2M − 1                 (2)

Figure 2. Two-dimensional raised cosine window function.
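A short sketch of this window, following Equations (1) and (2) with M = 8 as used in the paper, is given below (function names are illustrative). It also checks the property that makes the window suitable for overlapped compensation: 1D windows of neighbouring blocks, offset by M pixels, sum to one over the overlap region.

```python
import numpy as np

def raised_cosine_1d(M=8):
    """1D raised cosine window of length 2M, Equation (2)."""
    k = np.arange(2 * M)
    return 0.5 * (1.0 - np.cos(np.pi * (k + 0.5) / M))

def raised_cosine_2d(M=8):
    """Separable 2D window of size 2M x 2M, Equation (1)."""
    w = raised_cosine_1d(M)
    return np.outer(w, w)

if __name__ == "__main__":
    M = 8
    w = raised_cosine_1d(M)
    # Blocks overlap by M pixels, so shifted 1D windows must tile to one.
    assert np.allclose(w[:M] + w[M:], 1.0)
    print(raised_cosine_2d(M).shape)  # (16, 16)
```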
Quarter-pixel motion resolution was used to achieve a significant reduction in the energy of the DFD signal. The motion vectors are then DPCM coded.
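Returning to the vector selection step described in Section 2.2, the sketch below shows one plausible reading of it; the vector size, the fraction of vectors kept and the function names are assumptions made for illustration, and the AVQ codebook design of [2] is not reproduced. The subband is tiled into small vectors, the most energetic vectors are retained, and a binary significance map marking their positions is produced for quadtree coding.

```python
import numpy as np

def select_significant_vectors(subband, vec_shape=(2, 2), keep_fraction=0.25):
    """Tile a detail subband into vectors, keep the highest-energy ones and
    return them with a binary significance map of their positions.

    vec_shape and keep_fraction are illustrative values, not taken from the paper.
    """
    vh, vw = vec_shape
    h, w = subband.shape
    rows, cols = h // vh, w // vw
    # Rearrange the subband into a (rows*cols, vh*vw) array of vectors.
    vectors = (subband[:rows * vh, :cols * vw]
               .reshape(rows, vh, cols, vw)
               .swapaxes(1, 2)
               .reshape(rows * cols, vh * vw))
    energy = np.sum(vectors ** 2, axis=1)
    n_keep = max(1, int(keep_fraction * len(vectors)))
    keep_idx = np.argsort(energy)[::-1][:n_keep]       # most energetic vectors
    significance_map = np.zeros(rows * cols, dtype=np.uint8)
    significance_map[keep_idx] = 1
    return vectors[keep_idx], significance_map.reshape(rows, cols)
```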
3. Experimental results

To evaluate the performance of the proposed codec, six QCIF test sequences (Carphone, Foreman, Miss America, Grandma, Trevor and Claire) were used for the experiments. These sequences were temporally down-sampled by frame skipping to generate sequences at 10 frames per second. The resulting sequences were then coded using the proposed coding scheme and the adaptive subband vector quantization (ASVQ) coding algorithm at different bitrates. The ASVQ coding technique is the reference codec that has been extended in this paper; in [2] it was shown that the ASVQ outperforms H.263. Results were produced for the Y component of the first 150 frames of the test sequences, where both codecs used the following parameters: a) a three-level lifting-based Daubechies 9/7 wavelet transform was used to decompose the intra-frames and the displaced frame differences into 10 frequency subbands; b) each group of pictures consists of one intra-frame and four inter-frames; c) the same number of bits was used to code the counterpart intra-frames in the two schemes; and d) the same number of bits was used to code the counterpart inter-frames in the two schemes. The quality of the decoded frame sequences was measured by their average PSNR. The PSNR of each frame is calculated as 10·log10(255² / mse), where mse is the mean square error of the luminance component of the reconstructed frame.

The average PSNR measurements for the Y component of the intra-frames of the encoded sequences at different bitrates using the MPVQ and ASVQ codecs are shown in Table 1. Table 2 gives the average PSNR measurements for the encoded sequences at different bitrates using the MPVQ and ASVQ codecs. From Table 1, it can be seen that the decoded intra-frames using the MPVQ codec have significantly higher PSNR than the intra-frames coded using the ASVQ codec.
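As a reference for how these PSNR figures were obtained, a minimal sketch of the per-frame and average PSNR computation defined above is given below (function names are illustrative):

```python
import numpy as np

def frame_psnr(original, decoded):
    """PSNR of one 8-bit luminance frame: 10 * log10(255^2 / mse)."""
    diff = original.astype(np.float64) - decoded.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def average_psnr(original_frames, decoded_frames):
    """Average PSNR over a sequence of luminance (Y) frames."""
    return float(np.mean([frame_psnr(o, d)
                          for o, d in zip(original_frames, decoded_frames)]))
```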
Table 1: Average PSNR (dB) of the intra-frames of the test sequences at different bit rates coded by the MPVQ and ASVQ video codecs.

Bitrate          128 kbits/s      256 kbits/s      384 kbits/s      448 kbits/s
Sequence         MPVQ    ASVQ     MPVQ    ASVQ     MPVQ    ASVQ     MPVQ    ASVQ
Claire           48.20   34.50    55.38   35.61    59.29   35.80    62.72   35.90
Carphone         38.37   30.17    49.40   31.73    56.43   32.01    58.88   32.05
Foreman          33.84   29.71    45.62   32.14    50.52   32.45    54.50   32.71
Grandma          39.15   34.29    49.13   37.00    56.41   37.85    58.87   37.97
Miss America     47.70   42.25    54.73   47.39    58.92   47.71    62.86   48.95
Trevor           36.71   32.38    44.41   36.51    55.00   39.01    59.21   39.63
Table 2: Average PSNR (dB) of the first 150 frames of the test sequences at different bit rates coded by the MPVQ and ASVQ video codecs.

Bitrate          128 kbits/s      256 kbits/s      384 kbits/s      448 kbits/s
Sequence         MPVQ    ASVQ     MPVQ    ASVQ     MPVQ    ASVQ     MPVQ    ASVQ
Claire           37.51   34.05    38.94   34.14    39.36   34.24    39.95   34.27
Carphone         30.60   29.18    32.92   29.40    34.34   29.49    34.80   29.39
Foreman          24.83   24.51    27.25   24.72    28.07   24.87    28.83   24.57
Grandma          38.29   36.93    40.72   37.46    42.20   37.72    42.54   37.81
Miss America     36.93   35.98    38.32   36.87    39.07   36.99    40.33   37.07
Trevor           27.56   26.98    28.81   27.31    31.05   27.94    31.84   27.97
Since the encoded intra-frames are used to predict the inter-frames, any change in the quality of the encoded intra-frames directly affects the quality of the encoded inter-frames. This implies that the inter-frames encoded using the MPVQ codec have superior quality to those of the ASVQ. From Table 2, it is evident that the video sequences decoded using the MPVQ codec have higher average PSNR than those coded using the ASVQ. The achieved improvement varies with the bitrate, i.e. up to 3 dB at 192 kbits/s and up to 5 dB at 448 kbits/s. Figures 3a to 3c illustrate the frame-by-frame PSNR measurements of the decoded Carphone, Grandma and Claire test sequences at 128, 256 and 448 kbits/s, respectively, using the MPVQ and ASVQ coding schemes. From these figures, it is clear that the frames decoded using the MPVQ always have higher PSNR than those coded using the ASVQ. However, it is well known that PSNR is an unreliable metric for measuring the visual quality of compressed images [4]. Therefore, to illustrate the true visual quality obtained using the MPVQ and the ASVQ video codecs, the reconstructed intra-frames (frame 76) and the reconstructed inter-frames (frame 79) of Miss America, Claire and Carphone at a bit rate of 192 kbits/s are shown in Figures 4 and 5, respectively. It can be seen from Figures 4 and 5 that the reconstructed frames using the MPVQ codec, both intra- and inter-frames, have much better visual quality, with less blurred edges and better surface detail, than those coded using the ASVQ codec.
Figure 3. Frame-by-frame PSNR comparison of the MPVQ and ASVQ video codecs: (a) Carphone at 128 kbits/s, (b) Grandma at 256 kbits/s, (c) Claire at 448 kbits/s.
4. Conclusion

In this paper a new hybrid multi-scale, perceptual and vector quantization (MPVQ) based video codec was presented. In the intra mode of operation, a wavelet transform was applied to the input frame to decorrelate it into a number of subbands. The lowest frequency subband was losslessly coded. The coefficients in the high frequency subbands were pixel quantized and coded using a quadtree-coding algorithm, with perceptual weights employed to regulate the quantization step for each subband. In the inter mode of operation, a displaced frame difference was first generated using an overlapped block motion estimation/compensation technique. The displaced frame difference was then decorrelated using a 2D wavelet transform, and the resulting high frequency subbands were vector quantized using an adaptive vector quantization scheme. Experimental results were generated using a number of test sequences and show that the MPVQ video codec significantly outperforms the ASVQ video codec.
Figure 4. Decoded frame 76 (intra-frame) of Miss America, Claire and Carphone, using (a) the MPVQ and (b) the ASVQ video codec at 192 kbits/s. PSNR, (a) MPVQ / (b) ASVQ: 50.79 / 45.56 dB, 50.72 / 34.66 dB and 44.77 / 32.70 dB.
Figure 5. Decoded frame 79 (inter-frame) of Miss America, Claire and Carphone, using (a) the MPVQ and (b) the ASVQ video codec at 192 kbits/s. PSNR, (a) MPVQ / (b) ASVQ: 26.23 / 26.05 dB, 32.82 / 31.24 dB and 23.98 / 23.80 dB.

5. References

[1] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, "Video Coding with H.264/AVC: Tools, Performance, and Complexity," IEEE Circuits and Systems Magazine, vol. 4, 2004, pp. 7-28.
[2] S. P. Voukelatos and J. J. Soraghan, "Very Low Bit Rate Color Video Coding Using Adaptive Subband Vector Quantization with Dynamic Bit Allocation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 2, 1997, pp. 424-428.
[3] A. Sheikh Akbari and J. J. Soraghan, "Adaptive Joint Subband Vector Quantization Codec for Handheld Videophone Applications," IEE Electronics Letters, vol. 39, no. 14, July 2003, pp. 1044-1046.
[4] X. Kaia, Y. Jiea, Z. Y. Minb and L. X. Lianga, "HVS-based medical image compression," European Journal of Radiology, vol. 55, 2005, pp. 139-145.
[5] R. E. Van Dyck and S. A. Rajala, "Subband/VQ Coding of Colour Images with Perceptually Optimal Bit Allocation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 4, no. 1, February 1994.