IEEE SIGNAL PROCESSING LETTERS, VOL. 14, NO. 2, FEBRUARY 2007
High-Quality DCT-Based Image Compression Using Partition Schemes

Nikolay N. Ponomarenko, Karen O. Egiazarian, Senior Member, IEEE, Vladimir V. Lukin, and Jaakko T. Astola, Fellow, IEEE
Abstract—This letter presents an advanced discrete cosine transform (DCT)-based image compression method that combines the advantages of several approaches. First, an image is divided into blocks of different sizes by a rate-distortion-based modified horizontal-vertical partition scheme. The statistical redundancy of the quantized DCT coefficients of each image block is then reduced by bit-plane dynamic arithmetic coding with sophisticated context modeling. Finally, post-filtering removes blocking artifacts from the decompressed images. The proposed method provides significantly better compression than JPEG and other DCT-based techniques. Moreover, it outperforms JPEG2000 and other wavelet-based image coders.

Index Terms—Discrete cosine transform (DCT), image compression, partition schemes (PS).
I. INTRODUCTION

STILL image compression finds numerous applications in communications, remote sensing, data archiving, digital cameras, etc. [1]. The standard JPEG [2], based on the discrete cosine transform (DCT) [3], is one of the best known and most widely used lossy image compression techniques. The newer standard, JPEG2000 [4], based on the discrete wavelet transform (DWT) [4], [5], provides higher compression ratios than JPEG for comparable quality of the decompressed images. In this letter, we propose a new DCT-based coder providing compression ratios higher than those of JPEG2000 and most of the latest image compression methods. A short description of the proposed method is given in Section II. Section III is devoted to the rate-distortion optimization of the partition scheme (PS). An effective context modeling for the bit-plane coding of DCT coefficients is described in Section IV. Section V deals with post-filtering for the removal of blocking artifacts from decompressed images. The superiority of the proposed method over JPEG, JPEG2000, SPIHT [6], and several of the latest DCT- and DWT-based methods on a set of test images is shown in Section VI.
II. SHORT DESCRIPTION OF THE PROPOSED METHOD
Fig. 1. Flowchart of proposed method. a) Image compression. b) Image decompression.
In Fig. 1, flowcharts of the proposed method are presented. As can be seen from Fig. 1, the main differences between the proposed method and other coders are the following.
1) A modified horizontal-vertical (MHV) PS [7] is used.
2) A sophisticated bit-plane coding of the quantized DCT coefficients is exploited.
3) An effective de-blocking method [8] is utilized.
The MHV scheme provides appropriate adaptation of the compression together with simplicity of the partition optimization. This, in turn, leads to statistical homogeneity of the quantized DCT coefficients and makes it simple to eliminate their statistical redundancy by bit-plane coding. An additional gain in compression performance is due to the removal of blocking artifacts; in the proposed scheme, de-blocking is performed in the DCT domain, which facilitates software and hardware realizations of the method.

III. RATE-DISTORTION PARTITION SCHEME OPTIMIZATION
A. Motivation of Usage of MHV PS
In DCT-based image compression, similarly to fractal image compression, the proper choice of a PS is a compromise [9]. A simple quad-tree PS [10] allows fast optimization of the PS but provides low quality of the decoded images. A more complex PS may provide much higher quality of the decoded images at the price of a much longer optimization time and a larger amount of memory needed to store the PS data in the compressed image file [7]. In DCT-based compression, there are extra problems related to the necessity of further coding of the DCT coefficients within the PS blocks.
Fig. 2. a) Possible variants of block division in MHV PS optimization. b) Example of the optimized MHV PS for the test image Patterns at a bit rate of 1 bpp.
Fig. 3. Illustration of bit-plane coding.
A more complex PS complicates efficient context modeling for such a coding method. Here we use a simple MHV PS [7], in which any block of an image can be divided horizontally or vertically into two equal parts. In order to provide higher statistical homogeneity of the data, we restrict ourselves to blocks of the following sizes: 64×64, 64×32, 32×64, 32×32, 32×16, 16×32, 16×16, 16×8, 8×16, and 8×8 pixels. Blocks of other sizes, such as 64×16 or 8×64, which are allowed in the standard MHV PS [7], are excluded from our design.

B. Optimization

PS optimization starts from partitioning the image into blocks of size 64×64. For each square block larger than 8×8 (see Fig. 2(a) for the eight types (variants) of block partition), the following procedure is applied recursively. For each partition variant D_i, i = 1, ..., 8, the following operations are performed: DCT, quantization of the DCT coefficients with a given quantization step, coding of the DCT coefficients (see Section IV), decoding of the DCT coefficients, and inverse DCT in the blocks. As a result, the estimates S_i and PSNR_i are obtained, where S_i is the memory needed to store the coded combination of blocks of D_i, and PSNR_i is the PSNR of the decoded D_i. Then, from the eight variants of partitions, the best one is chosen by comparing the obtained pairs (S_i, PSNR_i), i.e., the variant providing the best rate-distortion trade-off is selected. The values S_i and PSNR_i are obtained using the AGU coder [11] with the corresponding quantization steps.
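To make the recursion above concrete, the following Python sketch shows one possible way to organize the MHV partition search. It is only an illustration under stated assumptions, not the authors' implementation: the helper code_block() (which would run the DCT, quantization, bit-plane coding and decoding of one block and return its coded size and squared error) is hypothetical, the eight-variant evaluation of Fig. 2(a) is simplified to a recursive keep/split-horizontally/split-vertically choice, and a plain Lagrangian cost (size + lam * SSE) stands in for the selection rule of the letter, which compares the sizes S_i and values PSNR_i obtained with the AGU coder [11].

# Block sizes permitted by the restricted MHV PS used in the letter.
ALLOWED_SIZES = {(64, 64), (64, 32), (32, 64), (32, 32), (32, 16),
                 (16, 32), (16, 16), (16, 8), (8, 16), (8, 8)}


def optimize_block(image, y, x, h, w, qs, lam, code_block, ps_stream):
    """Choose the partition of the block image[y:y+h, x:x+w], append the chosen
    variant indices to ps_stream (depth-first, pre-order), and return its cost."""
    # Variant 0: keep the block whole and code it as one unit.
    size_bits, sse = code_block(image, y, x, h, w, qs)   # hypothetical helper
    best_cost, best_idx, best_stream = size_bits + lam * sse, 0, []

    # Candidate splits into two equal halves, restricted to the allowed sizes.
    splits = []
    if (h // 2, w) in ALLOWED_SIZES:      # variant 1: horizontal split
        splits.append((1, [(y, x, h // 2, w), (y + h // 2, x, h // 2, w)]))
    if (h, w // 2) in ALLOWED_SIZES:      # variant 2: vertical split
        splits.append((2, [(y, x, h, w // 2), (y, x + w // 2, h, w // 2)]))

    for idx, children in splits:
        child_stream, cost = [], 0.0
        for (cy, cx, ch, cw) in children:
            cost += optimize_block(image, cy, cx, ch, cw, qs, lam,
                                   code_block, child_stream)
        if cost < best_cost:
            best_cost, best_idx, best_stream = cost, idx, child_stream

    # The letter stores a 3-bit index per decision (eight variants per square
    # block); here one small integer per visited block plays the same role.
    ps_stream.append(best_idx)
    ps_stream.extend(best_stream)
    return best_cost


def optimize_partition(image, qs, lam, code_block):
    """Run the optimization over an image tiled into 64x64 blocks
    (image: 2-D array whose dimensions are multiples of 64)."""
    ps_stream = []
    height, width = image.shape
    for y in range(0, height, 64):
        for x in range(0, width, 64):
            optimize_block(image, y, x, 64, 64, qs, lam, code_block, ps_stream)
    return ps_stream

Since the decoder can replay the same depth-first traversal, the few bits per decision in ps_stream are all that must be stored; this is what keeps the PS overhead at the 1%–3% level discussed in the next subsection.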
C. Storage of Optimized PS Data

In the PS optimization, each time the best partition variant is selected, its index is stored into the output stream. This is sufficient to decode the obtained optimized PS. Since in each case we have eight choices of partitions, three bits are enough to code the corresponding index. The PS coded in this manner occupies 1%–3% of the entire size of the coded data, depending
upon the compression ratio (CR). Thus, the use of the MHV PS in DCT-based coding can, in the worst case, increase the coded image size by only 1%–3%. At the same time, the benefit due to the use of the MHV PS depends upon the properties of the compressed image and can reach 40%–50% of the coded image size.

IV. BIT-PLANE CODING OF QUANTIZED DCT COEFFICIENTS

The bit-plane coding in this letter is based on the approach developed in [11]. The only difference is that, contrary to [11], we do not exploit inter-block correlation, since it is difficult to do so for blocks of unequal size. After calculation of the DCT in the blocks and uniform quantization of the obtained coefficients, we have an array of integer values. The signs of these DCT coefficients have a random behavior close to white noise; they are allocated to a separate stream and are not compressed. For each sign of a nonzero DCT coefficient, one bit is passed to the output stream. The obtained array of absolute values of the DCT coefficients is divided into bit-planes (see Fig. 3). Coding begins with the highest bit-plane containing nonzero values and ends with bit-plane 1.

Consider the bit of bit-plane k of the DCT coefficient P(i, j), where i = 1, ..., N, j = 1, ..., M, and N and M denote the numbers of pixels of the coded block in the horizontal and vertical directions, respectively. The following conditions are used for the context modeling of the bits of the bit-planes.
1) "True" if at least one bit among the earlier coded higher bit-planes is equal to 1.
2) "True" if, without taking into account the previously coded higher bit-plane, the bit with these indexes was equal to 1.
3) "True" if, in this or in at least one of the earlier coded higher bit-planes, the bit with these indexes was equal to 1.
4) A further condition of the same kind, defined on the already coded bits of this coefficient (see Fig. 4).
Fig. 4. Context modeling for coding a value P(i, j) of a block of an image.
5) "True" if, for at least one of the neighboring bits, there is a 1 in the higher bit-planes.
6) "True" if at least one among the neighboring and already coded bits of this bit-plane was equal to 1.
7) "True" if there was a 1 in this or the higher bit-planes for already coded bits displaced from the coded bit by two rows or two columns.
8) "True" if there was a 1 in this or the higher bit-planes for already coded bits displaced from the coded bit by three rows or three columns.
9)–12) Four further conditions, defined on the already coded bits, complete the classification of the coded bit (see Fig. 4).

Fig. 4 presents the scheme of bit value classification by checking the aforementioned conditions (M_X denotes the probability model number X). In total, according to this classification, a bit can be referred to one of 11 probability models. For the reduction of the statistical redundancy of the data, the dynamic version of binary arithmetic coding [12] is used, which is the most effective one for the considered case. For each model, after coding the current bit, the counters of 0's and 1's are corrected, and they are used for coding the next bits referred to this model. Accordingly, the statistics for the models are collected at the stage of image compression; no preliminary training and no precomputed frequency tables are used. Before coding each bit-plane, the counters of 0's and 1's of the models are initialized to unity, i.e., the models of each coded plane are independent.

The bits that belong to the lowest bit-plane are referred to a separate model. We propose to avoid coding the bits of this model (they are all considered equal to 0), which is analogous to a "dead zone" in quantization.

Different copies of the models are used for different regions of the image blocks. For the bits of the DC coefficient (i.e., the quantized value of the block mean), a separate copy of the models is used, since the statistical characteristics of this coefficient considerably differ from the statistics of the other DCT coefficients. A separate copy of the models is also used for the first (upper) row of the block DCT coefficients; this is explained by the fact that, for these coefficients, there is only one earlier coded bit, which leads to a considerable difference of the bit distribution between the models. For all other DCT coefficients of a block, the third copy of the models is used. In addition, for each size and shape of the blocks (the maximum number of which is 10 in this case), a separate set of probability models is applied to the coding of the quantized DCT coefficients decomposed into bit-planes.

Let us mention one important point once again. For the variant that uses the "dead zone" (i.e., the lowest bit-plane is not coded), losses of image quality occur not only at the DCT coefficient quantization step but also (though to a much smaller degree) at the step of coding the bit values of the bit-planes. If the "dead zone" is not used, this step does not introduce any additional losses. The use of such bit-plane coding may also provide progressive transmission and decoding possibilities.
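The interaction of bit-plane decomposition, context conditions, and the dynamic probability models can be illustrated by the Python sketch below. It is a deliberately reduced illustration under stated assumptions rather than the coder of the letter: only two context conditions (of the types 1) and 6) above) and two models are shown instead of the full classification of Fig. 4 with its 11 models, the per-region and per-block-shape model copies and the "dead zone" model are omitted, sign bits (sent uncompressed to a separate stream) are not handled, and encoder.encode(bit, p_one) is a hypothetical interface to a binary arithmetic coder such as that of [12].

import numpy as np


class AdaptiveModel:
    """Dynamic probability model: counters of 0's and 1's, updated after
    each coded bit, in the spirit of the dynamic binary arithmetic coding of [12]."""

    def __init__(self):
        self.zeros = 1   # counters start at unity before each bit-plane,
        self.ones = 1    # so the models of different planes are independent

    def p_one(self):
        return self.ones / (self.zeros + self.ones)

    def update(self, bit):
        if bit:
            self.ones += 1
        else:
            self.zeros += 1


def code_block_bitplanes(coeffs, encoder):
    """Encode the magnitudes of the quantized DCT coefficients of one block,
    bit-plane by bit-plane, from the highest nonzero plane down to plane 1."""
    mags = np.abs(np.asarray(coeffs, dtype=np.int64))
    if mags.max() == 0:
        return
    top = int(mags.max()).bit_length()        # index of the highest bit-plane
    for k in range(top, 0, -1):
        plane = (mags >> (k - 1)) & 1         # bits of the current plane k
        higher = mags >> k                    # bits of the already coded planes
        models = [AdaptiveModel(), AdaptiveModel()]   # re-initialized per plane
        for i in range(plane.shape[0]):
            for j in range(plane.shape[1]):
                # Condition of type 1): some higher bit-plane of this
                # coefficient already contained a 1.
                significant = higher[i, j] != 0
                # Condition of type 6): an already coded neighbouring bit of
                # this plane (left or upper in this raster scan) equals 1.
                neighbour = ((j > 0 and plane[i, j - 1] == 1) or
                             (i > 0 and plane[i - 1, j] == 1))
                model = models[1] if (significant or neighbour) else models[0]
                bit = int(plane[i, j])
                encoder.encode(bit, model.p_one())    # hypothetical coder call
                model.update(bit)

The essential point mirrored here is that no frequency tables are transmitted: both the encoder and the decoder start every bit-plane from the same unit counters and update them identically, so the context statistics are learned on the fly during compression.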
V. POST-FILTERING FOR BLOCKING ARTIFACTS REMOVAL

For the removal of blocking artifacts, the DCT-based filter proposed in [8] is used. It operates in a sliding 8×8 window with hard thresholding of the DCT coefficients: for each window position, the DCT coefficients of the 8×8 image block whose upper left corner is at the current coordinates are computed, the coefficients whose magnitudes do not exceed the threshold are set to zero, and a filtered estimate of the block is obtained by the inverse DCT. The output of the filter for each pixel of the image is formed from the filtered estimates of the overlapping windows that cover this pixel. Post-filtering in our case allows increasing the PSNR of the decoded images by 0.5–1 dB, while the decoding time increases by only 30%–40%. Note that a fast integer realization of the 8×8 DCT is used in the post-filter.
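A minimal Python sketch of this kind of sliding-window DCT de-blocking is given below, assuming a grayscale image stored as a 2-D NumPy array. It illustrates the idea of the filter of [8] rather than the letter's fast integer implementation: every 8×8 window is transformed, its coefficients are hard-thresholded, and the estimates of all windows covering a pixel are averaged. The threshold thr is left as a free parameter; the specific value used in the letter is not reproduced here.

import numpy as np

N = 8
# Orthonormal 8-point DCT-II basis matrix.
_k = np.arange(N).reshape(-1, 1)
_n = np.arange(N).reshape(1, -1)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * _n + 1) * _k / (2 * N))
C[0, :] = np.sqrt(1.0 / N)


def deblock(image, thr):
    """Return a de-blocked (float) copy of `image` of the same shape."""
    img = image.astype(np.float64)
    acc = np.zeros_like(img)      # sum of the filtered estimates per pixel
    cnt = np.zeros_like(img)      # number of estimates covering each pixel
    H, W = img.shape
    for y in range(H - N + 1):
        for x in range(W - N + 1):
            block = img[y:y + N, x:x + N]
            d = C @ block @ C.T                 # forward 2-D DCT of the window
            dc = d[0, 0]
            d[np.abs(d) < thr] = 0.0            # hard thresholding
            d[0, 0] = dc                        # keep the block mean untouched (a choice made in this sketch)
            est = C.T @ d @ C                   # inverse 2-D DCT
            acc[y:y + N, x:x + N] += est
            cnt[y:y + N, x:x + N] += 1.0
    return acc / cnt

In the decompression chain of Fig. 1(b), deblock() would be applied to the decoded image as the last step; the letter reports that this post-filtering raises the PSNR of the decoded images by 0.5–1 dB at the cost of a 30%–40% increase in decoding time.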
TABLE I
COMPARATIVE ANALYSIS OF CODING EFFICIENCY
VI. COMPARISON ANALYSIS

The proposed method was compared with JPEG (with the same post-filtering [8], for a fair comparison), JPEG2000 (wavelet 9/7), SPIHT, our previous coder AGU [11] based on image division into equal-size blocks (this coder and the set of test images are accessible for downloading from http://www.cs.tut.fi/sgn/compression/agucoder.htm), one of the best wavelet-based methods [13] (X-W 1999), a LOT-transform coder [14] (GLBT 16×32), and several of the latest DCT- and wavelet-based high-performance methods [15]–[17], using a set of 512×512 grayscale images. Table I shows the PSNR values for different bit rates for the methods used in the performance analysis. The proposed method, AGU-MHV, outperforms its predecessor, AGU, on average by 0.5–0.6 dB, due to a better spatial adaptation. AGU-MHV is inferior to the method X-W 1999 [13] only for the image Lena at one bit rate; in all other cases, the new method outperforms the others. For the image Barbara, AGU-MHV shows the best results, outperforming X-W 1999 by up to 0.8 dB and SPIHT and JPEG2000 by up to 3 dB. For the test image Patterns (a synthetic image, which can be downloaded from http://www.cs.tut.fi/~karen/testim.zip), the gain is up to 8 dB.

We have also compared the performance of AGU-MHV with the recently designed methods described in [15]–[17]. Comparisons were possible only for the test images Lena and Barbara; because of this, the corresponding data for the methods [15]–[17] are not included in Table I. In terms of PSNR for a given CR or bit rate, AGU-MHV outperforms the method in [15] on average by 0.5–1.2 dB, the method in [16] on average by 1.2–2 dB, and the method in [17] by 1.2–3 dB.

VII. CONCLUSION

A new high-quality DCT-based compression method is developed in this letter. For the test image Barbara, at bit rates from 0.25 to 1 bpp, it provides compression ratios better than those of JPEG by up to 2 times and better than those of SPIHT and JPEG2000 by up to 1.7 times. The basic advantage of the new coder is its adaptation to local image features due to the use of the partition scheme. Thus, the maximal benefit of the new method is observed when the image
contains texture regions as well as sharp edges (see the results in Table I for the images "Barbara" and "Patterns").

REFERENCES

[1] D. Salomon, Data Compression: The Complete Reference, 3rd ed. New York: Springer, 2004.
[2] G. K. Wallace, "The JPEG still picture compression standard," Commun. ACM, vol. 34, no. 4, pp. 30–44, 1991.
[3] K. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications. New York: Academic, 1990.
[4] D. Taubman and M. Marcellin, JPEG 2000: Image Compression Fundamentals, Standards and Practice. Norwell, MA: Kluwer, 2002.
[5] P. N. Topiwala, Ed., Wavelet Image and Video Compression. Norwell, MA: Kluwer, 1998.
[6] A. Said and W. A. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 3, pp. 243–250, Jun. 1996.
[7] N. Ponomarenko, V. Lukin, K. Egiazarian, and J. Astola, "Modified horizontal vertical partition scheme for fractal image compression," in Proc. 5th Nordic Signal Processing Symp., Hurtigruten, Norway, 2002.
[8] K. Egiazarian, J. Astola, M. Helsingius, and P. Kuosmanen, "Adaptive denoising and lossy compression of images in transform domain," J. Electron. Imag., vol. 8, pp. 233–245, 1999.
[9] B. Wohlberg and G. Jager, "A review of the fractal image coding literature," IEEE Trans. Image Process., vol. 8, no. 12, pp. 1716–1729, Dec. 1999.
[10] A. E. Jacquin, "Fractal image coding based on a theory of iterated contractive image transformations," in Proc. SPIE: Vis. Commun. Image Process., M. Kunt, Ed., Lausanne, Switzerland, Oct. 1990, vol. 1360, pp. 227–239.
[11] N. N. Ponomarenko, V. V. Lukin, K. O. Egiazarian, and J. T. Astola, "DCT based high quality image compression," in Proc. Scandinavian Conf. Image Analysis, 2005, Lecture Notes in Computer Science, vol. 3540, pp. 1177–1185.
[12] G. Langdon and J. Rissanen, "A simple general binary source code," IEEE Trans. Inf. Theory, vol. IT-28, pp. 800–803, Sep. 1982.
[13] Z. Xiong and X. Wu, "Wavelet image coding using trellis coded space-frequency quantization," IEEE Signal Process. Lett., vol. 6, no. 7, pp. 158–161, Jul. 1999.
[14] T. D. Tran and T. Q. Nguyen, "A lapped transform progressive image coder," in Proc. IEEE Int. Symp. Circuits Systems, 1998, vol. 4, pp. 1–4.
[15] Y. Huang and I. Pollak, "MLC: A novel image coder based on multitree local cosine dictionaries," IEEE Signal Process. Lett., vol. 12, no. 12, pp. 843–846, Dec. 2005.
[16] W. Dai, L. Liu, and T. D. Tran, "Adaptive block-based image coding with pre-/post-filtering," in Proc. Data Compression Conf., 2005, pp. 73–82.
[17] X. S. Hou, G. Z. Liu, and Y. Y. Zou, "Embedded quadtree-based image compression in DCT domain," in Proc. ICASSP, 2003, vol. III, pp. III-277–III-280.