Fast Wavelet Packet Image Compression

Submitted to:

Data Compression Conference -DCC’98, Snowbird, Utah. March 29 - April 1, 1998

François G. Meyer 1, Amir Averbuch 2, Jan-Olov Strömberg 3, and Ronald R. Coifman 4

1 Department of Diagnostic Radiology and Computer Science, Yale University, New Haven, CT 06520, USA
2 School of Mathematical Sciences, Tel Aviv University, Tel Aviv 69978, Israel
3 Department of Mathematics, Tromsö University, Norway
4 Department of Mathematics, Yale University, New Haven, CT 06520, USA
e-mail: [email protected]

1 Introduction

Current standard algorithms for storage and transmission of digitized images start with a compression method and end with a decompression method that recovers a good replica of the original picture. Unfortunately, any recipe leading to significant compression rates involves some loss of information, and as such has to be matched to the nature of the image and to our tolerance of error. The main defect of block DCT methods (such as JPEG) is due to the limitation put on the size of the blocks and the inability to adapt the patterns to the nature of the picture. Another defect becomes apparent at high compression rates: the blocks used for compression start to be visible in the reconstructed picture. An answer to this problem can be provided by using wavelets, instead of 8x8 DCT blocks, to expand the image. Wavelets yield a multiscale decomposition: low frequency trends occurring at a large scale in the image can be coded as efficiently as local singularities, such as edges. Wavelets with many vanishing moments yield sparse decompositions of piecewise smooth surfaces; therefore they provide a very appropriate tool to compactly code smooth images [1, 9]. Wavelets, however, are ill suited to represent oscillatory patterns. To alleviate this issue, much larger libraries of waveforms, called wavelet packets [4], have been developed. In the two-dimensional case, wavelet packets are patterns that can vary in scale, oscillation, and location. In order to select an orthonormal basis among all the wavelet packets, these patterns are matched to the image, and a selection of best matches that suffice for an efficient reconstruction is made. The selected collection of patterns is called the “best basis” [5].
When coding images that contain a mixture of smooth and textured features, the best basis algorithm must find a compromise between two conflicting goals: describing the large scale smooth regions, and describing the local textures. In [8], the optimal basis is chosen according to a rate-distortion criterion. Very recently some authors have proposed to use wavelet packets to characterize texture and code images [3, 6]. The main contribution of this paper is a new fast wavelet packet compression algorithm which encodes textured images very efficiently. This fast wavelet packet compression technique relies on four stages:

1. Convolution and decimation of the image with factorized, non-separable, very fast filters.
2. Selection of a best basis in the large wavelet packet library.
3. Scanning of the wavelet packet coefficients by increasing frequency. This organization results in sequences of coefficients with a rapid decay.
4. Successive approximation quantization, and entropy coding of the coefficients. This quantization technique encodes the coefficients according to their significance, and generates long sequences of zeros.

This paper is organized as follows. In the next section we provide a general description of the principles of the algorithm. This is followed in Section 3 by a description of the wavelet packet library. In Section 4 we describe a new scheme to order the coefficients of a wavelet packet expansion. The successive approximation quantization scheme, which quantizes the wavelet packet coefficients, is presented in Section 5. Results of experiments are presented in Section 6.

2 General description of the wavelet packet compression algorithm

A block diagram of the algorithm is shown in Fig. 1. The first part is the selection, among a large collection of bases, of the basis that is best adapted to encode the image. The coefficients of the best wavelet packet basis are then quantized using a successive approximation quantization technique. The quantization generates sequences of 0 and 1 that are zero run length coded.

[Figure 1 diagram. Inputs: original image f, collection of libraries of bases, budget. Stages: (1) best basis expansion, (2) coefficients ordering, (3) quantization, (4) run length coding, followed by entropy coding. The description of the best basis (quadtree) is also passed to the entropy coder. Output: coded bit stream.]

Figure 1: Block diagram of the wavelet packet compression algorithm. The compression consists of four parts: (1) the best basis selection, and the calculation of the coefficients of the image using the best basis, (2) the ordering of the coefficients, and (3) the quantization of the stream of coefficients. Finally, in (4) the sequence of 0 and 1 is run length coded. The quadtree that describes the best basis is also entropy coded.

3 The wavelet packet library

The wavelet packet library [4] is composed of functions, or atoms, that have different time-frequency localizations. The atoms provide an overcomplete representation: there is not a unique decomposition of each image over the library. Among all possible decompositions of a given image, one would like to pick the most compact one, which yields a sparse representation. We consider two conjugate quadrature filters, the lowpass filter $\{h_n\}$ and the highpass filter $\{g_n\}$. We have $g_n = (-1)^n h_{1-n}$, and they satisfy

$$2 \sum_{n \in \mathbb{Z}} h_n h_{n+2k} = \delta_{0,k} \quad \text{and} \quad \sum_n h_n = 1 \qquad (3.1)$$

in such a way that we have perfect reconstruction. We consider the discrete signal $f = \{f_n\}$, $n = 0, \ldots, N-1$. The wavelet packet coefficients $w_{n,j,l}$ are defined by the following recursion:

$$w_{2n,j,l} = \sum_k h_{k-2l}\, w_{n,j+1,k} \qquad (3.2)$$

$$w_{2n+1,j,l} = \sum_k g_{k-2l}\, w_{n,j+1,k} \qquad (3.3)$$

$$w_{0,0,k} = f_k \qquad (3.4)$$

The indices are interpreted as follows:

- $j$ is the scaling index: the size of the support of the corresponding wavelet packet is approximately $2^{-j}$,
- $l$ is the localization parameter: the corresponding wavelet packet is roughly located at $l\,2^{-j}$,
- and $n$ is the frequency index: the wavelet packet has approximately $n$ oscillations.

Note that we effectively jump to a coarser resolution, from $j+1$ to $j$, when we increase the “frequency index” $n$ of the wavelet packet. We note that the library organizes itself into a binary tree, as shown in Fig. 2, where the nodes of the tree represent subspaces with different time-frequency localization characteristics.

[Figure 2 diagram. Root: x1 ... x8. Level 1 (H | G): s1 s2 s3 s4 | d1 d2 d3 d4. Level 2: ss1 ss2 | ds1 ds2 | sd1 sd2 | dd1 dd2. Level 3: sss | dss | sds | dds | ssd | dsd | sdd | ddd.]

Figure 2: Wavelet packet tree. At each node of the tree, we apply a convolution and a decimation with the lowpass filter H and the highpass filter G. The prefix “s” stands for the sum, or lowpass filter, and “d” stands for difference, or highpass filter.

The wavelet basis is obtained by iterating the decomposition process on the low frequency bands only, without further decomposing the high frequency component at each level of the tree. The wavelet tree is shown in Fig. 3. Clearly the library provides an overcomplete description of the signal $f$. We need to know how to assemble the elements of the library to obtain an orthogonal basis. Loosely speaking, wavelet packets make it possible to adaptively tile the frequency domain into different bands of arbitrary size. Although the situation is slightly more complex, we can say that if a collection of functions in the library provides a tiling of the time-frequency plane, then this set of functions is an orthonormal basis. If we associate the dyadic frequency interval $[2^j n, 2^j (n+1))$ to the wavelet packet coefficient $w_{n,j,l}$, then we can build orthonormal bases from the binary tree [4]:

Theorem 1 If a subset $E \subset \mathbb{N} \times \mathbb{Z}$ has the property that the union of the intervals $[2^j n, 2^j (n+1))$, $j \in \mathbb{N}$, $0 \le n < 2^j$, $(n, j) \in E$, is a disjoint cover of $[0, 1)$, then the set of wavelet packet coefficients $w_{n,j,l}$, $(n, j) \in E$, are the coefficients of $f$ in an orthonormal basis.

[Figure 3 diagram. Root: x1 ... x8. Level 1 (H | G): s1 s2 s3 s4 | d1 d2 d3 d4. Level 2, from the s branch only: ss1 ss2 | ds1 ds2. Level 3, from the ss branch only: sss | dss.]

Figure 3: Wavelet tree in gray. Only the result of the lowpass filtering is further decomposed with the lowpass filter H and the highpass filter G. The prefix “s” stands for the sum, or lowpass filter, and “d” stands for difference, or highpass filter.
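The recursion (3.2)-(3.4) and the covering condition of Theorem 1 can be illustrated with a minimal Python sketch. Several choices here are assumptions made for illustration and are not taken from the paper: the Haar filter pair is used, the filters are scaled to unit energy (sum of $h_n$ equal to $\sqrt{2}$ rather than 1) so that each split preserves energy, the signal is treated as periodic, and nodes are labelled by their depth `d` below the root and their position `n`, with the node `(d, n)` associated to the band $[n\,2^{-d}, (n+1)\,2^{-d})$. The names `split`, `packet_tree` and `is_disjoint_cover` are hypothetical.

```python
import math
from fractions import Fraction

SQRT2 = math.sqrt(2.0)
# Haar lowpass filter with unit energy (an assumption of this sketch).
H = [1 / SQRT2, 1 / SQRT2]
# Highpass filter from the relation g_n = (-1)^n h_{1-n}.
G = [(-1) ** n * H[1 - n] for n in range(2)]

def split(w, filt):
    """Return [sum_k filt[k - 2l] * w[k]] for l = 0 .. len(w)/2 - 1 (periodic)."""
    size = len(w)
    return [sum(filt[m] * w[(2 * l + m) % size] for m in range(len(filt)))
            for l in range(size // 2)]

def packet_tree(f, depth):
    """Map (d, n) to the coefficients of node n at depth d; (0, 0) is the signal."""
    tree = {(0, 0): list(f)}
    for d in range(depth):
        for n in range(2 ** d):
            parent = tree[(d, n)]
            tree[(d + 1, 2 * n)] = split(parent, H)      # lowpass child
            tree[(d + 1, 2 * n + 1)] = split(parent, G)  # highpass child
    return tree

def is_disjoint_cover(nodes):
    """True if the bands [n 2^-d, (n+1) 2^-d) of the nodes tile [0, 1) exactly."""
    bands = sorted((Fraction(n, 2 ** d), Fraction(n + 1, 2 ** d)) for d, n in nodes)
    edge = Fraction(0)
    for low, high in bands:
        if low != edge:          # gap or overlap
            return False
        edge = high
    return edge == 1

def energy(coeffs):
    return sum(c * c for c in coeffs)

# Example: an 8-sample signal, a wavelet basis and another admissible basis.
f = [4, 2, 5, 5, 1, 0, 3, 7]
tree = packet_tree(f, 3)
wavelet_basis = [(1, 1), (2, 1), (3, 0), (3, 1)]
other_basis = [(1, 1), (2, 0), (3, 2), (3, 3)]
```

With these assumptions, any set of nodes accepted by `is_disjoint_cover` carries the same total energy as `f`, which is the numerical counterpart of the orthonormality stated in Theorem 1.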

3.1 Fast convolution in the wavelet transform: 2-D factorization

We seek a procedure that shortens the 1-D and 2-D convolution time of the biorthogonal filters by reducing the sizes of the filters. We merge the decimated lowpass filter $h$ and highpass filter $g$ into one non-decimated mapping

$$y = (h, g)\, x \qquad (3.5)$$

where the input data $x = (x_i)$ is defined for all integers $i$ and $y = (y_i)$ is defined by:

$$y_i = \begin{cases} \sum_k h_{k-i}\, x_k & \text{for } i \text{ even} \\ \sum_k g_{1+k-i}\, x_k & \text{for } i \text{ odd} \end{cases} \qquad (3.6)$$

In this setting, two finite band filters almost trivially dissolve into a composition of elementary local operators. We want to reduce the length of the filter $h$ by canceling the endpoint filter coefficient. If the sum of the lengths of the filters is $\le 2m+1$, then we need at most $m$ iterated linear operators to reduce both filters $h$ and $g$ to length 1 [7]. This gives a very effective way to compute the decimated lowpass and highpass filtered values. It reduces the number of multiplications and additions by a factor of about 1/2. This method of reducing the filter computations is equivalent to the so-called lifting scheme [10]; see also [11, 12]. In the two-dimensional case, the lowpass and highpass filter combinations in both directions are merged on the grid:

$$y = (h, g)^2\, x \qquad (3.7)$$

We use the notation $(h, g)^2 = (h, g)_y (h, g)_x$, where $(h, g)_x$ is the 1-dimensional mapping $(h, g)$ taken on each row of the grid and $(h, g)_y$ is the 1-dimensional mapping $(h, g)$ taken on each column of the grid. The operators $(h, g)_x$ and $(h, g)_y$ are factorized as in the one-dimensional case. Observe that each of the factors in the x-direction commutes with the factors of the y-direction. This means that

$$y = (h, g)^2\, x = \prod_{l=1}^{m} (h_{a_l}, g_{b_l})^2\, x \qquad (3.8)$$

where $(h_{a_l}, g_{b_l})^2$ is a $2 \times 2$ matrix. The operator

$$y = (h_{a_l}, g_{b_l})^2\, x \qquad (3.9)$$

may be computed explicitly by

$$\begin{aligned} y_{i+1,j+1} &= x_{i+1,j+1} \\ y_{i,j+1} &= x_{i,j+1} + \alpha\,(x_{i-1,j+1} + x_{i+1,j+1}) \\ y_{i+1,j} &= x_{i+1,j} + \alpha\,(x_{i+1,j-1} + x_{i+1,j+1}) \\ y_{i,j} &= x_{i,j} + \alpha\,(x_{i-1,j} + x_{i+1,j} + y_{i,j+1} + y_{i,j-1}) \end{aligned} \qquad (3.10)$$

where $\alpha$ is a parameter of the factorization. Thus, the operator $(h_{a_l}, g_{b_l})^2$ needs an average of 8 additions and 2 multiplications per 4 points of data. The above factorization was incorporated into the biorthogonal filters of lengths 9-7, 7-5 and 5-3, i.e. $m = 4, 3, 2$, respectively.

The main gain in the speedup of the computation is derived from the fact that no 2-D transpose of the image is needed, and from the non-separable $2 \times 2$ convolution. The factorization of biorthogonal two-band filters that implements the 2-D biorthogonal filters of length $(2m+1, 2m-1)$ consumes an average of $2m$ multiplications and $8m$ additions per set of 4 data points. The overall speedup factor gained by applying this wavelet packet transform, which is based on the above 2-D filter factorization, is 4 in comparison with the regular application of the separable 1-D convolution.
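To make the idea of factoring a filter pair into elementary local operators concrete, here is a minimal 1-D Python sketch of the well-known two-step lifting factorization of the 5-3 biorthogonal pair (so $m = 2$). It is not the authors' non-separable 2-D implementation: the periodic boundary handling, the particular normalization of the filters (lowpass taps $-1/8, 1/4, 3/4, 1/4, -1/8$ and highpass taps $-1/2, 1, -1/2$) and the function names are assumptions made for illustration.

```python
def lift_53_forward(x):
    """Two lifting steps of the 5-3 pair on a periodic signal of even length."""
    even, odd = list(x[0::2]), list(x[1::2])
    n = len(even)
    # Step 1 (predict): remove from each odd sample the mean of its even neighbours.
    d = [odd[k] - 0.5 * (even[k] + even[(k + 1) % n]) for k in range(n)]
    # Step 2 (update): add to each even sample a quarter of the neighbouring details.
    # d[k - 1] wraps around to the last detail when k == 0 (periodic extension).
    s = [even[k] + 0.25 * (d[k - 1] + d[k]) for k in range(n)]
    return s, d

def lift_53_inverse(s, d):
    """Undo the two lifting steps in reverse order."""
    n = len(s)
    even = [s[k] - 0.25 * (d[k - 1] + d[k]) for k in range(n)]
    odd = [d[k] + 0.5 * (even[k] + even[(k + 1) % n]) for k in range(n)]
    x = [0.0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x

# Example: lowpass band s and highpass band d of a short signal.
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
s, d = lift_53_forward(x)
restored = lift_53_inverse(s, d)
```

Each pair of input samples costs two multiplications and four additions here, whereas evaluating the 5-tap and 3-tap filters directly takes more operations per pair; the inverse simply replays the same steps with the signs flipped.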

3.2 Best basis algorithm

The best basis selection algorithm was first introduced in [5]. Several variations of the initial algorithm were described later. In [8], the optimal basis is chosen according to a rate-distortion criterion. We note that the results published in [8] correspond to hypothetical compression rates, since the first order entropy was chosen to measure the rate, and actual bit streams were not generated. The “best basis” paradigm permits a rapid search (of order $N \log N$, where $N$ is the number of pixels in the image) among the large collection of orthogonal bases to find the basis which permits the best approximation according to a given cost function (e.g. coding efficiency). A complete basis, called the best basis, which minimizes this criterion is searched for in the binary tree using a “divide and conquer” algorithm: at each node, the cost of the node is compared with the sum of the costs of its two children. If the node's cost is smaller, the node is retained; otherwise, the children are retained instead of the node itself. This process is applied recursively from the bottom to the top of the tree. Before selecting the best basis we denoise the image: we threshold the coefficients in the wavelet packet tree to remove those coefficients whose magnitude is below a given threshold. The threshold is defined as the amplitude of the smallest non-zero coefficient that can be reconstructed after inverse quantization. We then measure the compactness of the bases with the $l^1$ norm. Our cost function is thus defined as follows:

$$\mathrm{cost}(x, \varepsilon) = \sum_{i,\ |x_i| > \varepsilon} |x_i| \qquad (3.11)$$

We have tried several cost measures, including calculating the first order entropy for each node (from the histogram at that node). After denoising, the $l^1$ norm provides a very good measure of the overall budget required to encode the coefficients. Note that we do not use a scalar quantization, and thus we did not compare our approach to the rate-distortion approach of Ramchandran and Vetterli [8]. We note that we can estimate the best basis in a statistical sense, for a large class of images. Once this optimal basis has been estimated, we need not estimate a new best basis for each image; rather, we can use the pre-estimated best basis geometry.
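The bottom-up search with the thresholded $l^1$ cost (3.11) can be sketched in a few lines of Python. This is a 1-D illustration under stated assumptions, not the authors' image coder: it uses a Haar packet tree with unit-energy filters, labels nodes by depth `d` and position `n`, and the names `haar_split`, `packet_tree`, `l1_cost` and `best_basis` are hypothetical. Following the rule stated above, a node is kept only when its cost is strictly smaller than the combined cost of its children.

```python
import math

def haar_split(w):
    """One Haar lowpass/highpass split with decimation (unit-energy filters)."""
    r = 1.0 / math.sqrt(2.0)
    low = [r * (w[2 * l] + w[2 * l + 1]) for l in range(len(w) // 2)]
    high = [r * (w[2 * l] - w[2 * l + 1]) for l in range(len(w) // 2)]
    return low, high

def packet_tree(f, depth):
    """Map (d, n) to the coefficients of node n at depth d; (0, 0) is the signal."""
    tree = {(0, 0): list(f)}
    for d in range(depth):
        for n in range(2 ** d):
            tree[(d + 1, 2 * n)], tree[(d + 1, 2 * n + 1)] = haar_split(tree[(d, n)])
    return tree

def l1_cost(coeffs, eps):
    """Thresholded l1 cost: sum of |x_i| over coefficients with |x_i| > eps."""
    return sum(abs(c) for c in coeffs if abs(c) > eps)

def best_basis(tree, depth, eps):
    """Return (list of retained nodes, total cost) found by the bottom-up search."""
    best = {(depth, n): ([(depth, n)], l1_cost(tree[(depth, n)], eps))
            for n in range(2 ** depth)}
    for d in range(depth - 1, -1, -1):
        for n in range(2 ** d):
            own = l1_cost(tree[(d, n)], eps)
            left_nodes, left_cost = best[(d + 1, 2 * n)]
            right_nodes, right_cost = best[(d + 1, 2 * n + 1)]
            if own < left_cost + right_cost:
                best[(d, n)] = ([(d, n)], own)           # keep the node itself
            else:
                best[(d, n)] = (left_nodes + right_nodes, left_cost + right_cost)
    return best[(0, 0)]

# Example: a single spike is already sparse, a constant signal is not.
spike_nodes, spike_cost = best_basis(packet_tree([8, 0, 0, 0, 0, 0, 0, 0], 3), 3, 0.5)
flat_nodes, flat_cost = best_basis(packet_tree([1] * 8, 3), 3, 0.5)
```

In this toy example the spike is best represented by the untransformed samples, while the constant signal is pushed down to the deepest level, where a single lowpass coefficient carries all of the energy.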


4 Frequency ordering of the coefficients

Once the best basis has been selected according to the cost function, the image is represented by a set of wavelet packet coefficients. For wavelet compression we utilize the correlation across scales [2]. Unlike the wavelet tree, however, a best basis tree has no natural multiscale hierarchy. Therefore, algorithms that exploit zero-trees cannot be easily extended to a best basis tree. Nevertheless we tried to extend this idea, and we proposed two different scanning methods for the wavelet packet coefficients that both result in sequences of coefficients with rapid decay [7]. Both scanning methods exploit the fact that if the image $f$ is smooth, then the amplitude of the coefficients decreases as the frequency of the wavelet packet increases [7]:

Lemma 1 [7] If $f$ is $C^r$ regular, then there exists a positive constant $C$ such that

$$|w_{n,j,l}| \le C\, 2^{-q/2}\, 2^{-r(q+j)} \qquad (4.1)$$

where $q$ is such that $2^q \le n < 2^{q+1}$.

Because of this bound on the coefficients, we organize the wavelet packet coefficients by increasing frequency. The goal is to obtain long sequences of zeroes and short sequences of significant coefficients which represent singularities. We consider the graphical representation of the best wavelet packet basis of an image, as shown in Figure 4.

[Figure 4 diagram: the wavelet packet coefficients laid out as blocks in the frequency plane, with the point (0,0) labeled.]

Figure 4: The wavelet packet subbands are ordered by increasing frequency.

Each block is characterized by two parameters: its position, and its size. The position of each block encodes the frequency of the wavelet packet (we consider here that the decomposition is done in natural order and not in the Paley order). The size of the block encodes the inverse of the size of the support of the wavelet packet. We define an order in the frequency plane as follows: the frequency $(x_2, y_2)$ is greater than the frequency $(x_1, y_1)$ if

$$|x_1| + |y_1| < |x_2| + |y_2|, \quad \text{or} \quad |x_1| + |y_1| = |x_2| + |y_2| \ \text{and} \ |y_1| < |y_2|. \qquad (4.2)$$

Figure 4 shows the zigzag patterns that correspond to ordering all the blocks in the frequency plane. Organizing the wavelet packets consists of traveling in the spatial-frequency domain. First, we order all the blocks using the order defined in (4.2). Then, starting with the lowest frequency block, we scan all the coefficients in that block. When we have exhausted all the coefficients in that block, we go to the next higher frequency block. This scanning scheme results in a sequence of coefficients such that the average amplitude of the coefficients decreases rapidly when we jump from one block of coefficients to another (i.e. when the frequency index, $n$, of the wavelet packet increases).
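The block ordering (4.2) amounts to sorting blocks by the key $(|x| + |y|, |y|)$ and then concatenating their coefficients. A minimal Python sketch follows; the dictionary layout of a block (`"position"`, `"coeffs"`) and the row-by-row scan inside each block are assumptions made for illustration, since the text does not fix the scan order within a block.

```python
def frequency_key(block):
    """Sort key implementing the order (4.2) on the block position (x, y)."""
    x, y = block["position"]
    return (abs(x) + abs(y), abs(y))

def scan(blocks):
    """Order blocks by increasing frequency, then list their coefficients."""
    ordered = sorted(blocks, key=frequency_key)
    sequence = []
    for block in ordered:
        for row in block["coeffs"]:   # row-by-row scan (assumed)
            sequence.extend(row)
    return ordered, sequence

# Example: five blocks given in arbitrary order.
blocks = [
    {"position": (1, 1), "coeffs": [[0.1]]},
    {"position": (0, 0), "coeffs": [[9.0, 8.0], [7.0, 6.0]]},
    {"position": (2, 0), "coeffs": [[0.5]]},
    {"position": (0, 1), "coeffs": [[2.0]]},
    {"position": (1, 0), "coeffs": [[3.0]]},
]
ordered, sequence = scan(blocks)
positions = [b["position"] for b in ordered]
```

Here `positions` comes out as (0,0), (1,0), (0,1), (2,0), (1,1): blocks on the same anti-diagonal are visited by increasing $|y|$, which is the zigzag pattern described above.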

5 The Quantization Loop

Once the wavelet packet coefficients are properly ordered, we quantize them. The scanning and quantization of the wavelet packet coefficients arrange the coefficients according to their importance, in descending order. This enables adaptive transmission: the more wavelet packet coefficients are sent, the better the textures and the fine details are reconstructed. The coefficients are quantized during a number of iterations of what is called the “quantization loop”. This loop generates the quantization code. Whenever the stopping condition is met, the loop is terminated (possibly in the middle of an iteration), and with it the whole process. Thus, we meet the budget requirement exactly. In each iteration of the quantization loop the wavelet packet coefficients are classified into two classes: those coefficients whose magnitude is larger than the threshold are important; the others are unimportant with respect to the current threshold. The initial threshold $T_0$ is set to half of the maximal magnitude of all wavelet packet coefficients. The threshold is divided by two at each iteration, and as a result the group of important coefficients widens from one phase to the next. At each stage the quantization loop generates a map of important coefficients with respect to the thresholds $T_0, T_0/2, T_0/4, \ldots$. For each important coefficient the quantizer emits a 1. A sign bit is also emitted for each coefficient when it becomes important for the first time. After $K$ iterations the magnitude of each coefficient falls into one of the bins

$$\left[0, \frac{T_0}{2^K}\right],\ \left[\frac{T_0}{2^K}, \frac{T_0}{2^{K-1}}\right],\ \ldots,\ \left[\frac{T_0}{2}, T_0\right],\ [T_0, 2T_0]$$

The inverse quantizer reconstructs the coefficients in the bins respectively as:

$$3\frac{T_0}{2^{K+1}},\ 3\frac{T_0}{2^{K}},\ \ldots,\ 3\frac{T_0}{4},\ \frac{3}{2}T_0$$

The list of bits emitted by the quantizer is entropy coded using a zero run length coding technique.
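A simplified Python sketch of such a successive approximation loop is given below. It is one possible reading of the description, not the authors' coder: one map bit is emitted per coefficient in every pass (1 if important with respect to the current threshold, 0 otherwise), the sign bit follows the first 1 of a coefficient, a coefficient that first becomes important at threshold $T$ is decoded at $3T/2$, and coefficients that never become important are decoded as zero. The budget-driven stopping rule and the zero run length coding are left out, and the names `sa_quantize` and `sa_dequantize` are hypothetical.

```python
def sa_quantize(coeffs, num_passes):
    """Emit significance-map bits and sign bits for halving thresholds."""
    t0 = max(abs(c) for c in coeffs) / 2.0     # initial threshold T0
    bits = []
    seen = [False] * len(coeffs)               # already important in an earlier pass?
    for k in range(num_passes):
        threshold = t0 / 2 ** k
        for i, c in enumerate(coeffs):
            important = abs(c) > threshold
            bits.append(1 if important else 0)
            if important and not seen[i]:
                bits.append(0 if c >= 0 else 1)  # sign bit, sent once
                seen[i] = True
    return t0, bits

def sa_dequantize(t0, bits, count, num_passes):
    """Rebuild each coefficient from the pass in which it first became important."""
    first = [None] * count
    sign = [1.0] * count
    pos = 0
    for k in range(num_passes):
        for i in range(count):
            bit = bits[pos]
            pos += 1
            if bit and first[i] is None:
                first[i] = k
                sign[i] = -1.0 if bits[pos] else 1.0
                pos += 1
    # Important first at T0 / 2^k: magnitude decoded as 3 * T0 / 2^(k + 1).
    return [0.0 if first[i] is None else sign[i] * 3.0 * t0 / 2 ** (first[i] + 1)
            for i in range(count)]

# Example: three passes with thresholds 5, 2.5 and 1.25.
coeffs = [10.0, -3.0, 0.4, 6.0, -0.1]
t0, bits = sa_quantize(coeffs, 3)
decoded = sa_dequantize(t0, bits, len(coeffs), 3)
```

Because small coefficients produce zeros in every pass, the emitted bit list contains the long runs of zeros that the run length coder is meant to exploit, especially when the coefficients arrive in the frequency order of Section 4.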

6 Experiments

We have implemented the coder and decoder, and an actual bit stream was created for each experiment. We present the results of the wavelet packet compression algorithm using the following three test images: 512x512 “Lenna”, 512x512 “Barbara”, and 512x512 “Mandrill”. Barbara and Mandrill are difficult to compress because they contain richly textured regions. In order to assess the performance of the algorithm, we have compared the wavelet packet coder to a wavelet coder [2], and to the latest version of JPEG, which usually performs well on textured images at low compression ratios. The performance of the algorithm is summarized in Tables 1, 2, and 3. We work with 8 bit images, and we define the Peak Signal to Noise Ratio (PSNR) of the compressed image as

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{255^2}{\frac{1}{N^2} \sum_{i,j=0}^{N-1} |I(i,j) - I_c(i,j)|^2} \right) \qquad (6.1)$$

where $I$ is the original $N \times N$ image and $I_c$ is the compressed image.
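Definition (6.1) translates directly into code. The sketch below is a plain Python version that takes images as nested lists of pixel values; the function name `psnr`, the `peak` parameter and the convention of returning infinity for identical images are choices made for illustration.

```python
import math

def psnr(original, compressed, peak=255.0):
    """PSNR in dB between two images given as equally sized lists of rows."""
    rows = len(original)
    cols = len(original[0])
    squared_error = 0.0
    for i in range(rows):
        for j in range(cols):
            diff = float(original[i][j]) - float(compressed[i][j])
            squared_error += diff * diff
    mse = squared_error / (rows * cols)    # mean squared error per pixel
    if mse == 0.0:
        return math.inf                    # identical images
    return 10.0 * math.log10(peak * peak / mse)

# Example: every pixel is off by exactly one gray level.
a = [[10, 20], [30, 40]]
b = [[11, 19], [31, 39]]
value = psnr(a, b)
```

With a mean squared error of 1 the result is $10 \log_{10}(255^2)$, about 48.13 dB, which gives a sense of scale for the values reported in the tables.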

As expected, the wavelet packet coder and the wavelet coder perform about equally well on the image Lenna. This image contains mostly smooth surfaces, and is the ideal image for wavelets. However, the wavelet packet coder clearly outperforms the wavelet coder on the image Barbara, both in terms of PSNR and visual quality. The wavelet packet coder also outperforms the wavelet coder on the image Mandrill in terms of visual quality, as is shown below. Fig. 5 (left) shows the image Barbara coded with wavelets at a compression ratio of 32:1. Fig. 5 (right) shows the image Barbara coded with wavelet packets at the same compression ratio of 32:1. Fig. 7 (left) shows the geometry of the best basis. We note that the segmentation is very different from the wavelet basis, reflecting some significant oriented textures in the image. We note that the texture on the pants of Barbara is very well preserved with the wavelet packets. Fig. 6 (left) shows the Mandrill image coded with wavelets at a compression ratio of 57:1. Fig. 6 (right) shows the Mandrill image coded with wavelet packets at the same compression ratio of 57:1. Fig. 7 (right) shows the geometry of the best basis. Again we note that the best basis is very different from the wavelet basis, reflecting some significant oriented textures in the image. Both compressed Mandrill images have similar PSNR; however, in terms of visual quality the wavelet packet coded image is much crisper: the Mandrill still keeps its high frequency features, such as the whiskers and the hair on its face. In the wavelet coded image (left), these high frequency features have been smeared.

Ratio   JPEG    Wavelet   Wavelet-Packets
4       39.54   42.65     41.24
8       36.51   38.13     37.44
12      35.02   36.08     36.10
16      33.97   35.22     34.85
20      33.05   34.75     33.99
24      32.22   33.85     33.46
28      31.68   33.06     32.75
32      30.98   32.57     32.13
39      30.03   32.01     31.33
47      29.19   31.52     30.60
52      28.63   30.89     30.37
60      27.96   30.16     29.77
70      27.12   29.63     29.14

Table 1: Coding results (PSNR in dB) for 8 bpp 512x512 Lenna

Ratio   JPEG    Wavelet   Wavelet-Packets
4       38.91   41.72     40.93
8       33.25   34.38     35.73
12      30.16   32.45     32.93
16      28.25   29.71     31.35
20      27.00   28.94     30.01
24      26.11   28.49     29.27
29      25.44   26.75     28.43
32      25.08   26.07     28.05
41      24.25   25.38     26.89
48      23.80   25.09     26.43
58      23.31   24.75     25.85
73      22.74   23.97     24.99
94      21.87   23.42     24.18

Table 2: Coding results (PSNR in dB) for 8 bpp 512x512 Barbara

Ratio   JPEG    Wavelet   Wavelet-Packets
4       30.25   32.78     32.41
8       26.24   27.83     27.65
12      24.74   25.62     25.96
17      23.68   24.24     24.49
20      23.22   23.88     23.94
25      22.63   23.41     23.29
29      22.29   23.00     22.92
34      21.86   22.40     22.57
42      21.36   21.85     22.10
57      20.73   21.31     21.48
85      19.90   20.86     20.82

Table 3: Coding results (PSNR in dB) for 8 bpp 512x512 Mandrill

6.1 Conclusion

We have implemented the algorithm and studied its capabilities. This method achieves very good image quality even at low bit rates. At these rates traditional coding techniques, such as JPEG, tend to allocate too many bits to the “trends” and have few bits left to represent “singularities”. Traditional methods are based on the windowed Fourier transform, and the localization of these techniques frequently causes artifacts, such as the blocking effect in JPEG. The success of our method comes from correctly exploiting the information about the image as it is represented in the different orthogonal libraries of wavelet packets. Our experiments demonstrate the robustness and efficiency of the algorithm. On textured images, the results achieved by this method are better than those achieved by standard techniques such as JPEG or wavelet coders.

References

[1] A. Averbuch, D. Lazar, and M. Israeli. Image compression using wavelet transform and multiresolution decomposition. IEEE Trans. on Image Processing, Vol. 5, No. 1, pp. 14–15, 1996.
[2] A. Averbuch and R. Nir. Still image compression using coded multiresolution tree. Unpublished manuscript, 1997.
[3] T. Chang and C.C.J. Kuo. Texture analysis and classification with tree-structured wavelet transform. IEEE Trans. on Image Processing, Vol. 2, No. 4, pp. 429–441, 1993.
[4] R.R. Coifman and Y. Meyer. Size properties of wavelet packets. In Ruskai et al., editors, Wavelets and their Applications, pp. 125–150. Jones and Bartlett, 1992.
[5] R.R. Coifman and M.V. Wickerhauser. Entropy-based algorithms for best basis selection. IEEE Trans. on Information Theory, Vol. 38, No. 2, pp. 713–718, March 1992.
[6] J. Li, P.Y. Cheng, and C.C.J. Kuo. An embedded wavelet packet transform technique for texture compression. In Proceedings of Mathematical Imaging: Wavelet Applications in Signal and Image Processing '95, pp. 602–613. SPIE Vol. 2569, 1995.
[7] F.G. Meyer, A.Z. Averbuch, J-O. Strömberg, and R.R. Coifman. Multi-layered image compression. Submitted, 1997.
[8] K. Ramchandran and M. Vetterli. Best wavelet packet bases in a rate-distortion sense. IEEE Trans. on Image Processing, pp. 160–175, April 1993.
[9] J.M. Shapiro. Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans. on Signal Processing, pp. 3445–3462, Dec. 1993.
[10] W. Sweldens. The lifting scheme: A new philosophy in biorthogonal wavelet constructions. In Proceedings of Mathematical Imaging: Wavelet Applications in Signal and Image Processing '95, pp. 68–79. SPIE Vol. 2569, 1995.
[11] P.P. Vaidyanathan. Multirate digital filters, filter banks, polyphase networks, and applications: a tutorial. Proc. of the IEEE, 78:56–93, Jan. 1990.
[12] P.P. Vaidyanathan. Multirate systems and filter banks. Prentice Hall, 1993.


Figure 5: Barbara, compression ratio: 32. Left: Wavelet. Right: wavelet packets

Figure 6: Mandrill, compression ratio: 57. Left: Wavelet. Right: wavelet packets

Figure 7: Best basis geometry. Left: Barbara, compression ratio: 32. Right: Mandrill, compression ratio: 57