CONTEXT-BASED STATISTICAL MODEL FOR DCT-BASED IMAGE CODER

Xiaohui Xue, Wen Gao
Department of Computer Science, Harbin Institute of Technology, Harbin, Heilongjiang Province, P. R. China, 150001
[email protected], [email protected]

ABSTRACT: The DCT is known as a sub-optimal transform and is widely adopted in current international image compression standards. Nevertheless, there is still considerable potential in DCT-based compression schemes. This paper applies the context method to DCT blocks and designs a context-based statistical model that makes further use of the statistical correlation between DCT coefficients. The implementation is very convenient, and the experimental results show significant gains.
1. INTRODUCTION

The discrete cosine transform (DCT) is the core technique of almost all current image compression standards. However, the mainstream of image compression research has shifted from the DCT to the wavelet transform. In fact, since Shapiro published his now famous embedded zerotree wavelet coder in 1993 [1], a class of improved algorithms based on wavelets and zerotrees has emerged and turned out to be a great breakthrough in image compression technology [2]. Behind this shift lie some misunderstandings about the decorrelating power of the DCT; in fact, DCT coefficients have not yet been encoded efficiently enough.

The DCT is known as a sub-optimal transform because it achieves almost the same decorrelation as the Karhunen-Loeve transform when applied to a first-order stationary Markov process. In practice, however, a stricter view shows that natural images are far from first-order stationary Markov processes; they are non-stationary. In some sense, the most important feature of an image is its edges, which correspond to sharp variations of intensity. The Fourier transform of an edge in space is an edge in the perpendicular direction in the frequency domain. Therefore, a two-dimensional edge-like structure is likely to appear within DCT blocks, which can be interpreted as statistical correlation from the viewpoint of compression. More practically, for natural images the energy of a particular frequency component is statistically related, to some degree, to that of nearby components in the two-dimensional spectrum.

JPEG exploits this correlation by run-length coding the zero coefficients in zigzag scan order [3], which is essentially one-dimensional compression. An important DCT-based coder proposed in [4] applies an embedded zerotree quantizer to the DCT blocks. Another improved JPEG algorithm optimizes the quantization table [5]. Since the zerotree is also a quantization method, these two improvements can be interpreted as more appropriate or optimal quantization techniques, which is quite different from our idea of exploring and capturing as much correlation between DCT coefficients as possible. Moreover, the statistical dependency across the zerotree-like structure in DCT blocks [4] may be weaker than that within a simple neighborhood. The statistical correlation across the tree structure in the wavelet domain is well founded, because self-similarity does exist in the multi-scale transform of an image; observing and exploiting this self-similarity through the zerotree was a great achievement. In the case of the DCT, however, the situation is different: the main contribution of the zerotree structure in DCT blocks is the optimization of rate-distortion performance.

We apply the context method to DCT blocks and put forward a context-based statistical model that makes further use of the statistical correlation between DCT coefficients. The implementation turns out to be very convenient. Experiments on standard test images show that, with the context-based statistical model, our coder consistently outperforms both the baseline and the optimized IJG coders by a large margin.

Section 2 presents the context-based statistical model. Section 3 gives the experimental results.
2. STATISTICAL MODEL BASED ON CONTEXT

2.1 Description of the Statistical Model

A statistical model in image compression can be described briefly as follows. For the current symbol $s_0$, the neighborhood of $s_0$, defined as the vector $(s_1, s_2, \ldots, s_k)$, is an ordered set of the past $k$ symbols; $k$ is called the degree of the model. The context of $s_0$ is defined as a mapping $f(s_1, s_2, \ldots, s_k)$. The goal of the statistical model is to estimate the probability of $s_0$ conditioned on the context,
$$p(s_0 \mid f(s_1, s_2, \ldots, s_k)).$$

For example, if $f(s_1, s_2, \ldots, s_k) = (s_1, s_2, \ldots, s_k)$, i.e. the context mapping leaves the neighborhood of $s_0$ unchanged, the output of the statistical model is the $k$-th-order conditional probability. The cost of the model is $S^k$, where $S$ is the size of the alphabet of the input symbols. The main problem with such statistical models is known as "context dilution" [6][7]. In the case of the DCT this problem can be even more serious, since the $8 \times 8$ DCT of an 8-bit image produces 12-bit coefficients: even a raw model conditioned on only two neighboring coefficients would already require on the order of $(2^{12})^2 \approx 1.7 \times 10^7$ contexts.
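To make the notation concrete, the following is a minimal Python sketch (ours, not the authors' code) of a context-conditioned adaptive model: one frequency table per context value $f(s_1, \ldots, s_k)$, from which $p(s_0 \mid \text{context})$ is estimated and updated as symbols are coded. The class name and the all-ones initialization are illustrative choices; the point is that the number of tables grows with the number of distinct contexts, which is exactly where context dilution bites.

```python
from collections import defaultdict

class ContextModel:
    """Minimal adaptive model: one frequency table per context value.

    With an alphabet of size S and a k-symbol context, the number of
    tables can grow up to S**k -- the "context dilution" problem.
    """

    def __init__(self, alphabet_size):
        self.alphabet_size = alphabet_size
        # One count table per context, started at 1 so that no symbol
        # ever has zero estimated probability.
        self.counts = defaultdict(lambda: [1] * alphabet_size)

    def probability(self, symbol, context):
        """Estimate p(symbol | context) from the symbols seen so far."""
        table = self.counts[context]
        return table[symbol] / sum(table)

    def update(self, symbol, context):
        """Record one more occurrence of `symbol` under `context`."""
        self.counts[context][symbol] += 1
```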
2.2 Algorithm

Let the DCT coefficients in an $8 \times 8$ block be $\{X_i, i = 0, 1, \ldots, 63\}$, ordered from top left to bottom right. For the current coefficient $X_i$, $i = 1, 2, \ldots, 63$, let $A$ and $B$ be the coefficients to the left of and above $X_i$ respectively, as illustrated in Fig. 1.

Fig. 1 Context of DCT: the current coefficient $X_i$ with its left neighbor $A$ and upper neighbor $B$ inside the $8 \times 8$ block ($X_0$ is the DC coefficient).

Fig. 3 Experimental results on $512 \times 512$ Lena: PSNR (dB) versus rate (bits/pixel), roughly 28-39 dB over 0.2-0.8 b/p, for the IJG baseline coder, the optimized IJG coder, and the coder of this paper.
Let the context of $X_i$ be
$$(i, \delta(A), \delta(B)),$$
where
$$\delta(x) = \begin{cases} 1, & x = 0 \\ 0, & x \neq 0. \end{cases}$$
The output of the statistical model is the estimate of the conditional probability
$$p(X_i \mid (i, \delta(A), \delta(B))).$$
Therefore, there are only $63 \times 2 \times 2 = 252$ contexts. For clarity, we omit the description of the treatment of the block boundary and of the DC coefficient $X_0$.
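As a concrete illustration, here is a minimal Python sketch (ours, not the authors' code) that enumerates the 63 AC coefficients of a quantized $8 \times 8$ block together with their contexts $(i, \delta(A), \delta(B))$. The block is assumed to be stored in raster order as an 8 × 8 array; since the paper omits the boundary treatment, a missing left or upper neighbor is simply taken to be zero here, which is our own assumption for illustration.

```python
def dct_block_contexts(block):
    """Yield (coefficient, context) pairs for the 63 AC coefficients of an
    8x8 block of quantized DCT coefficients, following Section 2.2:
    context = (i, delta(A), delta(B)), giving 63 * 2 * 2 = 252 contexts.

    `block` is an 8x8 array indexed as block[row, col] (e.g. a NumPy
    array). The treatment of block boundaries is not specified in the
    paper; here a neighbor falling outside the block is taken as zero.
    """
    def delta(x):
        return 1 if x == 0 else 0

    for i in range(1, 64):                     # X_0, the DC coefficient, is coded separately
        r, c = divmod(i, 8)                    # raster order: top left to bottom right
        x_i = block[r, c]
        a = block[r, c - 1] if c > 0 else 0    # A: coefficient to the left of X_i
        b = block[r - 1, c] if r > 0 else 0    # B: coefficient above X_i
        yield x_i, (i, delta(a), delta(b))
```

Note that $A$ and $B$ precede $X_i$ in raster order, so the decoder can rebuild the same context from already-decoded coefficients and no side information is needed.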
3. EXPERIMENTS

The system framework is a general transform-based one (Fig. 2). The arithmetic coder is the fast coder of [8]. The DCT, including quantization, is the same as in JPEG [3].
Fig. 2 System framework: DCT → Statistical Model → Arithmetic Coding.
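The following sketch ties the three boxes of Fig. 2 together for a single 8 × 8 block. It is only an assumed, simplified rendering of the pipeline: `dct_block_contexts` and `ContextModel` are the sketches given earlier, the level shift, orthonormal 2-D DCT, and uniform quantization stand in for the exact JPEG procedure, the folding of signed coefficients into non-negative symbols is our own illustrative choice, and `coder.encode(symbol, counts)` is a placeholder interface for the arithmetic coder of [8], not the authors' actual API.

```python
import numpy as np
from scipy.fft import dctn

def encode_block(pixels, quant_table, model, coder):
    """Encode one 8x8 block along the path of Fig. 2:
    DCT + quantization -> statistical model -> arithmetic coding.

    `model` follows the ContextModel sketch (its alphabet must be large
    enough to cover the folded coefficient range) and `coder` is any
    object with an encode(symbol, counts) method; both are assumptions.
    """
    coeffs = dctn(pixels.astype(float) - 128.0, norm="ortho")  # JPEG-style level shift + 2-D DCT
    block = np.round(coeffs / quant_table).astype(int)         # uniform quantization
    for coeff, context in dct_block_contexts(block):           # contexts of Section 2.2
        symbol = 2 * coeff if coeff >= 0 else -2 * coeff - 1   # fold signed values to 0, 1, 2, ... (illustrative)
        coder.encode(symbol, model.counts[context])            # code with this context's adaptive statistics
        model.update(symbol, context)                          # let the model adapt after coding
```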
The experimental results on $512 \times 512$ Lena are plotted in Fig. 3. For comparison, we also plot the results of both the baseline and the optimized IJG coders, whose program is publicly available [9]. The figure shows that, with the context-based statistical model, our coder consistently outperforms both IJG coders by a large margin. Moreover, there seems to be no significant difference between the results of this paper and those of [4]. A straightforward implementation of the idea thus substantially improves the performance of the DCT-based coder and places it among the best coders. Given its simplicity, the method is remarkably efficient.

The method of this paper can be generalized further. In terms of texture analysis, the DCT can do more than concentrate energy: the pattern of the probability distribution observed in adjacent DCT blocks can help predict the distribution of the current block.
4. REFERENCES

[1] J. Shapiro, "Embedded Image Coding Using Zerotrees of Wavelet Coefficients", IEEE Trans. on Signal Processing, vol. 41, no. 12, pp. 3445-3462, 1993.
[2] A. Said, W. Pearlman, "A New, Fast, and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees", IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-250, 1996.
[3] W. B. Pennebaker, J. L. Mitchell, JPEG Still Image Data Compression Standard, New York, 1992.
[4] Z. Xiong, O. Guleryuz, and M. T. Orchard, "A DCT-Based Embedded Image Coder", IEEE Signal Processing Letters, vol. 3, no. 11, pp. 289-290, 1996.
[5] S. Wu, A. Gersho, "Rate-Constrained Picture-Adaptive Quantization for JPEG Baseline Coders", Proc. ICASSP'93, vol. 5, pp. 389-392, 1993.
[6] M. Weinberger, J. Rissanen, and R. Arps, "Application of Universal Context Modeling to Lossless Compression of Gray-Scale Images", IEEE Trans. Image Processing, vol. 5, no. 4, pp. 575-586, 1996.
[7] X. Wu, "Lossless Compression of Continuous-Tone Images via Context Selection, Quantization, and Modeling", IEEE Trans. Image Processing, vol. 6, no. 5, pp. 656-664, 1997.
[8] Xiaohui Xue, Wen Gao, "High Performance Arithmetic Coding for Small Alphabets", Proc. IEEE Data Compression Conf., p. 477, 1997.
[9] ftp://nic.funet.fi/pub/graphics/packages/jpeg/jpegsrc.v6.tar.gz