Image compression based on energy clustering and zero-quadtree representation
J.M. Zhong, C.H. Leung and Y.Y. Tang
Abstract: An efficient image compression algorithm based on energy clustering and zero-quadtree representation (ECZQR) in the wavelet transform domain is proposed. In embedded coding, zeros within each subband are encoded in the framework of quadtree representation instead of zerotree representation. To use large rectangular blocks to represent zeros, it first uses morphological dilation to extract the arbitrarily shaped clusters of significant coefficients within each subband. The proposed encoding method results in less distortion in the decoded image than the line-by-line encoding method. Experimental results show that the algorithm is among the most efficient wavelet image compression algorithms.
© IEE, 2000
IEE Proceedings online no. 20000752
DOI: 10.1049/ip-vis:20000752
Paper first received 9th November 1999 and in revised form 10th July 2000
J.M. Zhong and C.H. Leung are with the Department of Electrical & Electronic Engineering, University of Hong Kong, Pokfulam Road, Hong Kong
Y.Y. Tang is with the Department of Computing Studies, Hong Kong Baptist University, Waterloo Road, Kowloon, Hong Kong

1 Introduction
Since J.M. Shapiro [1] proposed the embedded zerotree wavelet (EZW) image compression algorithm, it has been widely used in image compression and video coding. The algorithm has the characteristic property of embedded coding and progressive transmission of image data in decreasing order of its information content. This kind of encoding plays a very important role in real applications, such as image transmission, image database indexing, retrieval and Internet browsing. Recently, many variants of EZW have appeared in the literature, such as set partitioning in hierarchical trees (SPIHT) [2, 3] and significance checking in wavelet trees (SCIWT) [4], with improvements made without introducing additional computational complexity. Instead of using a zerotree, Said and Pearlman used a spatial orientation tree to quantise a large region of zeros across subbands in the same orientation [2, 3]. In the SCIWT algorithm, when a new significant coefficient is found, significance checking is further performed to see if it still has any significant descendants, and the decision is made whether its children need to be encoded or not [4]. The success of these algorithms is mainly attributed to the novel data structure used for the representation and organisation of the wavelet coefficients.
In the wavelet transform domain, in addition to the fact that the distribution of insignificant coefficients across different scales in the same orientation is self-similar [1], the significant coefficients within each subband tend to be clustered [6, 7]. The proposed algorithm takes advantage of this property by first capturing and encoding the clustered significant coefficients within each subband using morphological dilation, followed by encoding the remaining space of the subband, which is mostly zeros, using rectangular blocks. This approach not only tends to optimise the rate-distortion performance, but also makes it possible to use large rectangular blocks to represent zeros. Experimental results show that the proposed algorithm achieves high coding efficiency and fast encoding/decoding.
2 Proposed coding algorithm
2.1 Outline of ECZQR algorithm
By means of a wavelet transform the image is decomposed into a hierarchical subband system. The energy of the image is mostly concentrated in the lowest frequency subband, while the high-frequency subbands contain only a small fraction of the energy. As a result, most of the coefficients have very small magnitudes. Typical images usually have some flat regions, and in the wavelet transform domain the coefficients within each high-frequency subband corresponding to these flat regions have very small magnitudes and are usually quantised as zeros. Instead of using a zerotree to represent a large region of zeros across different scales in the same orientation, the proposed ECZQR algorithm attempts to use rectangular blocks to represent zeros within each subband. On the other hand, coefficients corresponding to the edges or coarse textures in the original image have comparatively large magnitudes and they tend to form clusters within each subband [7, 8]. However, the shapes of these clusters are arbitrary. This can be observed in Fig. 5, in which the distributions of significant coefficients as well as the insignificant coefficients with respect to a threshold for the image of Lena are illustrated. Regions in which most of the quantised coefficients are significant are referred to as ‘significance clusters’ in this paper. To obtain a larger decrease in distortion in the decoded image for each bit encoded, it is more efficient to first extract and encode the arbitrarily shaped significance clusters within each subband using morphological dilation. Afterwards, the remaining space of the subband consists mostly of zeros, and the encoding of such regions using large rectangular blocks is very efficient.
IEE Proc.-Vis. Image Signal Process., Vol. 147, No. 6, December 2000
After the significance clusters are extracted and encoded, the significance status of the subband is tested. If there is no significant coefficient left, the encoding of the subband is finished. Otherwise, a quadtree is used to partition the subband into four subblocks and the significance status is tested for each subblock. This process is recursively applied to each subblock until it is either insignificant (i.e. all zeros) or contains only one coefficient. Moreover, during the first few passes of embedded coding, the thresholds for quantisation are comparatively large, and most of the high-frequency subbands have no significant coefficient. Thus if all the high-frequency subbands are grouped together as a single entity, it is possible to use only one symbol for encoding the whole group in the first few passes of embedded coding. This approach is more efficient than subband-by-subband coding.
To implement these ideas, the significance status of a region as well as the concepts of significant group and insignificant group are defined as follows. Let $c(i, j)$ be the coefficient at position $(i, j)$. A region $R$ is significant with respect to threshold $T$ if

$$\max_{(i,j) \in R} |c(i, j)| \ge T$$

Otherwise, it is regarded as insignificant. The significance status of a region $R$ can be defined as a function of threshold $T$, i.e.

$$S_T(R) = \begin{cases} 1, & \max_{(i,j) \in R} |c(i, j)| \ge T \\ 0, & \text{otherwise} \end{cases}$$
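The significance test and the recursive quadtree partitioning outlined above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the row-major list-of-lists layout, the `>=` comparison and the top-left, top-right, bottom-left, bottom-right visiting order are all assumptions made for illustration, and only significance bits are produced (sign bits and entropy coding are omitted).

```python
def is_significant(coeffs, top, left, height, width, threshold):
    """Return 1 if some |c(i, j)| in the region reaches the threshold, else 0.

    The use of >= (rather than >) is an assumption of this sketch.
    """
    for i in range(top, top + height):
        for j in range(left, left + width):
            if abs(coeffs[i][j]) >= threshold:
                return 1
    return 0


def quadtree_encode(coeffs, top, left, height, width, threshold, bits):
    """Append the significance bits of a region, splitting it while significant.

    Recursion stops when a block is insignificant or holds a single
    coefficient. The subblock visiting order is a hypothetical choice.
    """
    status = is_significant(coeffs, top, left, height, width, threshold)
    bits.append(status)
    if status == 0 or height * width == 1:
        return bits
    h1, w1 = (height + 1) // 2, (width + 1) // 2
    subblocks = (
        (top, left, h1, w1),
        (top, left + w1, h1, width - w1),
        (top + h1, left, height - h1, w1),
        (top + h1, left + w1, height - h1, width - w1),
    )
    for t, l, h, w in subblocks:
        if h > 0 and w > 0:  # skip empty blocks of odd-sized regions
            quadtree_encode(coeffs, t, l, h, w, threshold, bits)
    return bits


# Hypothetical 4 x 4 subband with one scattered significant coefficient.
subband = [[9, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
print(quadtree_encode(subband, 0, 0, 4, 4, 8, []))  # [1, 1, 1, 0, 0, 0, 0, 0, 0]
```

In this example three of the four quadrants are dismissed with a single ‘0’ each, which is the saving that large blocks of zeros are meant to provide.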
The wavelet transform domain is partitioned into two groups: one is the significant group (SG) consisting of significant subbands and the other is the insignificant group (IG) consisting of insignificant subbands. At the beginning, the SG is initialised to consist of only the lowest frequency subband (e.g. the LL subband in Fig. 1), and the IG is initialised to consist of all the high-frequency subbands at all scales, i.e. the rest of the wavelet transform domain. The initialisation of SG and IG is illustrated in Fig. 2. As the embedded coding proceeds with decreasing thresholds, more and more subbands in IG will be moved to SG and at some point the IG will be empty. In each significance identification pass, SG is encoded first, followed by IG. When encoding the subbands in SG, the lowest frequency subband LL is encoded separately using the method described by Taubman et al. [5]. The other high-frequency subbands in SG (if present) are encoded in the order depicted in Fig. 1, and the coding details are explained in the following paragraphs.

Fig. 1 Order of encoding subbands

Fig. 2 Initialisation of SG and IG, assuming that the wavelet transform domain is as shown in Fig. 1. SG consists of the LL subband, shown in grey, and IG consists of all high-frequency subbands

Encoding of the IG proceeds as follows. If it is not empty, its significance status with respect to the current threshold is encoded. If it is insignificant, the encoding of the IG is finished. If it is significant, it is decomposed into two parts: one is the ‘candidate of significant subband’ and the other is the new IG. The candidate of significant subband is taken from the IG by choosing the first subband on the list of subbands ranked from coarse scale to fine scale and, within each scale, in the order HL, LH and HH. An example is illustrated in Fig. 3. The reason for this order of decomposition is that the magnitudes of wavelet coefficients tend to comply with the property of ‘decaying spectrum’: coefficients with larger magnitudes tend to appear in the coarser scale subbands. When the IG becomes significant with respect to a threshold, it is more likely that the subband corresponding to the coarse scale contains significant coefficients. The candidate of significant subband is added to SG and is encoded according to the algorithm to be explained, while the new IG is recursively tested and decomposed (if necessary) as described until the current IG is insignificant or empty.

2.2 Extraction and encoding of clusters of significant coefficients
When encoding the significant coefficients in each of the high-frequency subbands in SG, it is critical that their positional information be encoded efficiently. Encoding their co-ordinates within the subband directly would be very inefficient. However, owing to the clustering property of significant coefficients illustrated in Fig. 5, the new significant coefficients are more likely to appear in the neighbourhoods of those significant coefficients already found in the previous passes. These new significant coefficients can be efficiently located by using morphological dilation. The coefficients within the support of a small structuring element centred at the previously found significant coefficients are encoded. Although some zeros within the support of the structuring element are also encoded for the purpose of determining the relative positions of the new significant coefficients, the encoding of the co-ordinates of the significant coefficients is avoided and some net savings in the number of bits can be achieved.
Fig. 3 Illustration of decomposition of IG. Starting status is assumed to be that shown in Fig. 2
a When IG in Fig. 2 is significant relative to the threshold, the coarsest-scale HL subband is regarded as the candidate of significant subband and is moved from IG to SG
b If the new IG in a is significant with respect to the same or a smaller threshold, it is further decomposed into two parts such that the coarsest-scale LH subband is moved to SG
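The decomposition of IG described in Section 2.1 and illustrated in Fig. 3 can be sketched as a simple loop. The sketch below is only an illustration under stated assumptions: the subband labels such as `HL3`, the three-level example, the `max_magnitude` dictionary and the function name are hypothetical, and the coding of the coefficients inside each candidate subband is left out.

```python
def encode_insignificant_group(ig, sg, max_magnitude, threshold, bits):
    """Encode the IG for one threshold, moving candidate subbands into SG.

    ig: subband labels ordered coarse to fine and, per scale, HL, LH, HH.
    sg: labels already in the significant group (modified in place).
    max_magnitude: hypothetical map from label to largest |coefficient|.
    """
    while ig:
        # One symbol describes the significance status of the whole group.
        group_status = int(any(max_magnitude[name] >= threshold for name in ig))
        bits.append(group_status)
        if group_status == 0:
            break
        # The first subband on the ranked list becomes the candidate.
        candidate = ig.pop(0)
        sg.append(candidate)
        bits.append(int(max_magnitude[candidate] >= threshold))
        # Coding of the candidate's coefficients would happen here (omitted).
    return bits


# Hypothetical three-level example; the paper itself uses six levels.
ig = ["HL3", "LH3", "HH3", "HL2", "LH2", "HH2", "HL1", "LH1", "HH1"]
sg = ["LL3"]
max_magnitude = {name: 1 for name in ig}
max_magnitude["HL3"] = 3
max_magnitude["LH3"] = 10
bits = encode_insignificant_group(ig, sg, max_magnitude, 8, [])
print(bits)  # [1, 0, 1, 1, 0]
print(sg)    # ['LL3', 'HL3', 'LH3']
```

Here the group is significant only because of `LH3`, so `HL3` is moved to SG first and reported as insignificant, after which `LH3` is moved and the remaining group is closed with a single ‘0’.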
Morphological dilation is a useful tool for extracting arbitrarily shaped regions and was first used in image compression by Servetto et al. [6, 7]. An improvement to this method has also been proposed [8]. A detailed description of morphological dilation can be found in these references. The extraction and encoding of the new significant coefficients using morphological dilation consist of the following two steps [8], which are slightly different from those in the morphological representation of wavelet data (MRWD) [6, 7]. The difference is that in our modification, the significant coefficients found in the previous passes are not refined in the course of morphological dilation. The processes of significance identification and significance refinement are separated from each other.
(i) The significant coefficients found in the previous passes are used as seeds and the structuring element is centred at these seeds to search for new significant coefficients within the neighbourhood defined by the structuring element. If a coefficient is insignificant, a ‘0’ is encoded. Otherwise a ‘1’ is encoded and the new significant coefficient is again used as a seed for morphological dilation. This process is recursively repeated until no significant coefficient is found. Any coefficient encoded in this stage is labelled, so that it will not be processed twice.
(ii) The probable positions of significant coefficients in the current subband are predicted from those in the next coarser scale subband with the same orientation. The prediction process is illustrated in Fig. 4. The predicted positions are scanned one by one. If they have been encoded before, they are skipped. If a coefficient is significant with respect to the current threshold, a symbol ‘1’ is encoded and it is used as a seed for morphological dilation to search for further significant coefficients in its neighbourhood. If a coefficient is insignificant, a symbol ‘0’ is encoded and it is labelled, so that it will not be processed again if a neighbouring coefficient is significant and it falls within the support of the structuring element centred at that significant coefficient when morphological dilation is performed.
A 3 × 3 structuring element is used for morphological dilation in the present work. Since the coefficients with high energies within each subband tend to be clustered, most of the significant coefficients should have been captured and encoded after the process of morphological dilation. Only a few scattered significant coefficients may still exist in the remaining space of each subband. As an example, the result of using morphological dilation to extract and encode the significance clusters in all high-frequency subbands for the image ‘Lena’ is illustrated in Fig. 6. Since each subband is encoded region-by-region instead of line-by-line and the symbols encoded in the course of morphological dilation are quite correlated (mostly significant), this encoding method is more efficient. Furthermore, since most of the significant coefficients in each high-frequency subband are encoded before encoding the zeros, the decrease in distortion in the decoded image for each encoded bit tends to be greater than in the case of line-by-line encoding.
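Step (i) above amounts to growing regions from seeds with a 3 × 3 structuring element. The following Python sketch shows one possible way to do this; it is not taken from the paper. The breadth-first visiting order, the `>=` comparison, the function name and the `visited` set used for labelling are assumptions, and sign bits, the prediction of step (ii) and entropy coding are omitted.

```python
from collections import deque


def dilate_and_encode(coeffs, seeds, threshold, visited, bits):
    """Grow significance clusters from seeds using a 3 x 3 neighbourhood.

    coeffs: subband as a list of rows; seeds: (row, col) positions found in
    previous passes; visited: set of positions already encoded (labelled).
    Appends one bit per newly examined coefficient and returns the list of
    newly found significant positions.
    """
    rows, cols = len(coeffs), len(coeffs[0])
    visited.update(seeds)  # seeds are already known, never re-encoded
    queue = deque(seeds)
    found = []
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                i, j = r + dr, c + dc
                if not (0 <= i < rows and 0 <= j < cols):
                    continue
                if (i, j) in visited:
                    continue  # labelled coefficients are not processed twice
                visited.add((i, j))
                if abs(coeffs[i][j]) >= threshold:
                    bits.append(1)
                    found.append((i, j))
                    queue.append((i, j))  # new coefficient becomes a seed
                else:
                    bits.append(0)
    return found


# Hypothetical subband: (1, 1) was found earlier with a larger threshold.
subband = [[0, 0, 0, 0],
           [0, 20, 9, 0],
           [0, 0, 0, 9],
           [0, 0, 0, 0]]
bits, visited = [], set()
print(dilate_and_encode(subband, [(1, 1)], 8, visited, bits))  # [(1, 2), (2, 3)]
print(len(bits))  # 13
```

In this example 13 symbols locate both new coefficients, and the two positions that were never reached, (3, 0) and (3, 1), are left to the quadtree stage.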
Fig. 4 Illustration of significance prediction. When coefficient ‘+’ in the coarser-scale HL subband is significant, its four children and the children’s neighbours within a 3 × 3 neighbourhood are marked in the next finer-scale HL subband. Marked positions: children of ‘+’, and children’s neighbours (*)

Fig. 5 Significant coefficients (white pixels) with respect to a threshold for the ‘Lena’ image

Fig. 6 Result of using morphological dilation to extract significance clusters in each subband for ‘Lena’ image. Grey regions stand for significance clusters, black regions stand for zeros in remaining space, and white pixels stand for scattered significant coefficients in remaining space. Threshold used for significance identification is as for Fig. 5

2.3 Quadtree representation of zeros
As shown in Fig. 6, after the arbitrarily shaped significance clusters are extracted by using morphological dilation, only a few scattered significant coefficients should exist in the remaining space of the subband. As illustrated in Fig. 7, if the extracted regions are replaced by zeros, the remaining space of each subband can be efficiently encoded using rectangular blocks. The quadtree decomposition method is used to locate the scattered significant coefficients. The significance status of the entire subband is first encoded. If the result is 0 (i.e. insignificant), no further encoding is needed. Otherwise, the subband is partitioned into four subblocks as depicted in Fig. 8 and the significance status of each subblock is encoded. This process is recursively applied to each subblock until it is either insignificant or contains only a single coefficient.

Fig. 8 Quadtree partitioning of region R

2.4 Complete algorithm
The embedded ECZQR algorithm consists of a sequence of significance identification and significance refinement passes. These are similar to the sorting and refinement passes in the SPIHT algorithm [3]. The significance identification pass is used to identify the coefficients which are significant with respect to the current threshold but were insignificant in the previous pass. The significance refinement pass is used to refine those coefficients found in the previous passes. The thresholds used in the significance identification passes and significance refinement passes are indexed by the pass number. Biorthogonal wavelets are used for image decomposition because they possess the property of linear phase, which is useful in image coding applications [9]. The 9/7 biorthogonal wavelets are chosen for their high coding performance [11]. When more dyadic
wavelet decomposition levels are used, the number of coefficients in the lowest frequency (LL) subband is smaller and, since the LL band is encoded separately, the coding performance is better. A six-level wavelet decomposition is used in this work. Although the proposed algorithm is implemented using the six-level 9/7 biorthogonal wavelets, the method can be applied to other types of wavelets. The steps of the proposed coding algorithm are as follows:
Step 1. Transform the input image using the wavelet transform.
Step 2. Calculate the set of descending thresholds $\{T_0, T_1, T_2, \ldots\}$, with $T_0$ equal to 0.5 times the largest magnitude among all the wavelet coefficients, and set $T_i = T_{i-1}/2$ for $i = 1, 2, 3, \ldots$ Threshold $T_i$ is to be used in pass number $i$.
Step 3. Initialise the significant group (SG) as the LL subband and the insignificant group (IG) as all the high-frequency subbands. Set pass number $i$ to 0.
Step 4. Perform the significance identification pass.
  4.1 Encode the group of SG.
    4.1.1 Encode the LL subband separately. The scanning order is from top to bottom and left to right.
    4.1.2 Encode each high-frequency subband in SG (if it is present in SG).
      • Use morphological dilation to extract and encode the new significant coefficients with respect to the current threshold $T_i$, as described in Section 2.2.
      • Encode the significance status of the remaining space of the subband to see if there is any significant coefficient left.
      • If it is significant, then do quadtree decomposition to encode the large number of zeros and the small number of significant coefficients as described in Section 2.3.
  4.2 Encode the group of IG.
    4.2.1 If the group of IG is empty, do nothing; otherwise
      • Encode the significance status of IG with respect to the current threshold $T_i$.
      • If it is insignificant, no further encoding is needed; otherwise, decompose the IG into two parts: the candidate of significant subband and the new IG, as described in Section 2.1 and Fig. 3. Move the candidate of significant subband to SG and encode its significance status. If it is significant, use morphological dilation to encode its significant coefficients as in step 4.1.2. If it is insignificant, no further encoding is needed. Encode the new IG (i.e. recursively repeat step 4.2).
Step 5. Perform the significance refinement pass. For each significant coefficient found in the previous significance identification passes, encode a refinement bit with respect to the current threshold $T_i$.
Step 6. Add 1 to the pass number $i$ and go to Step 4.

3 Experimental results
To evaluate the performance of the proposed algorithm the images ‘Lena’, ‘Barbara’ and ‘Goldhill’ are used for experiments. Arithmetic coding is used [10]. The encoded bit stream is saved in a single file for transmission and storage. Table 1 gives the results of the proposed algorithm in comparison with the EZW algorithm and also with the SPIHT algorithm, which is one of the best algorithms reported. For detailed study and consistent comparison the EZW algorithm is implemented using the same filters for wavelet transformation. The results of EZW in Table 1 are from our implementation rather than from the original paper. The proposed algorithm gives better performance than the SPIHT algorithm for the image of ‘Barbara’, and for some other images it gives comparable results. The original images ‘Lena’, ‘Barbara’ and ‘Goldhill’ and their decoded images at 0.25 bits per pixel are shown in Figs. 9 to 14.

Fig. 9 Original ‘Lena’ image
Fig. 10 Decoded ‘Lena’ image (0.25 bits per pixel)
Fig. 11 Original ‘Barbara’ image
Fig. 12 Decoded ‘Barbara’ image (0.25 bits per pixel)
Fig. 13 Original ‘Goldhill’ image
Fig. 14 Decoded ‘Goldhill’ image (0.25 bits per pixel)

Table 1: Performance of proposed ECZQR algorithm compared with EZW and SPIHT algorithms for images ‘Lena’, ‘Barbara’ and ‘Goldhill’

                        PSNR (dB)
Algorithm  Image        1.0 bpp   0.5 bpp   0.4 bpp   0.3 bpp   0.25 bpp
EZW        Lena         39.58     36.70     35.20     34.17     33.74
           Barbara      36.47     30.96     29.15     28.30     27.70
           Goldhill     35.41     32.32     31.48     30.75     30.30
ECZQR      Lena         40.18     37.13     36.12     34.76     34.10
           Barbara      37.38     32.44     31.18     29.09     28.45
           Goldhill     36.25     32.96     32.03     31.10     30.58
SPIHT      Lena         40.41     37.21     36.24     34.95     34.11
           Barbara      N/A       32.10     30.76     29.18     28.13
           Goldhill     36.55     33.13     N/A       N/A       30.56

PSNR = peak signal-to-noise ratio. Results for the EZW algorithm are from the authors’ implementation rather than from the original paper.
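For readers who wish to reproduce PSNR figures of the kind reported in Table 1, the following is a minimal sketch of the usual PSNR computation. The paper does not state the peak value or give any implementation; a peak of 255 (8-bit greyscale images) and the function name `psnr` are assumptions of this sketch.

```python
import math


def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are lists of rows of pixel values. The default peak of 255
    assumes 8-bit images, which the paper does not state explicitly.
    """
    total = 0.0
    count = 0
    for row_o, row_d in zip(original, decoded):
        for a, b in zip(row_o, row_d):
            total += (a - b) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)


# Hypothetical 2 x 2 images differing by one grey level everywhere (MSE = 1).
print(round(psnr([[10, 20], [30, 40]], [[11, 21], [31, 41]]), 2))  # 48.13
```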
4 Conclusions
An efficient image compression algorithm has been presented. It first employs morphological dilation to extract the clusters of coefficients with high energies within each subband and then uses the quadtree partitioning strategy to represent blocks of zeros. Since the clusters of coefficients with high energies in each subband are encoded first, it becomes possible to use large rectangular blocks for the representation of zeros. The proposed ECZQR algorithm is completely embedded and scalable. Moreover, it has a low computational complexity.

5 References
1 SHAPIRO, J.M.: ‘Embedded image coding using zerotrees of wavelet coefficients’, IEEE Trans. Signal Process., 1993, 41, (12), pp. 3445-3462
2 SAID, A., and PEARLMAN, W.A.: ‘Image compression using the spatial orientation tree’. Proceedings of IEEE international symposium on Circuits and systems, 1993, Chicago, IL, pp. 279-282
3 SAID, A., and PEARLMAN, W.A.: ‘A new, fast, and efficient image codec based on set partitioning in hierarchical trees’, IEEE Trans. Circuits Syst. Video Technol., 1996, 6, (3), pp. 243-250
4 ZHONG, J.M., LEUNG, C.H., and TANG, Y.Y.: ‘An improved embedded zerotree wavelet image compression algorithm based on significance checking in wavelet trees’. Proceedings of IEEE international conference on Systems, man, and cybernetics, 1998, San Diego, CA, USA, pp. 4567-4571
5 TAUBMAN, D., and ZAKHOR, A.: ‘Multirate 3-D subband coding of video’, IEEE Trans. Image Process., 1994, 3, (5), pp. 572-588
6 SERVETTO, S.D., RAMCHANDRAN, K., and ORCHARD, M.T.: ‘Wavelet based image coding via morphological prediction of significance’. Proceedings of international conference on Image processing, 1995, Washington, DC, pp. 530-533
7 SERVETTO, S.D., RAMCHANDRAN, K., and ORCHARD, M.T.: ‘Image coding based on a morphological representation of wavelet data’, IEEE Trans. Image Process., 1999, 8, (9), pp. 1161-1174
8 ZHONG, J.M., LEUNG, C.H., and TANG, Y.Y.: ‘Wavelet image coding based on significance extraction using morphological operations’, IEE Proc. Vis. Image Signal Process., 1999, 146, (4), pp. 206-210
9 ANTONINI, M., BARLAUD, M., MATHIEU, P., and DAUBECHIES, I.: ‘Image coding using wavelet transform’, IEEE Trans. Image Process., 1992, 1, (2), pp. 205-221
10 WITTEN, I.H., NEAL, R.M., and CLEARY, J.G.: ‘Arithmetic coding for data compression’, Commun. ACM, 1987, 30, (6), pp. 520-540
11 VILLASENOR, J.D., BELZER, B., and LIAO, J.: ‘Filter evaluation and selection in wavelet image compression’, IEEE Trans. Image Process., 1995, 4, (8), pp. 1053-1060