International Picture Coding Symposium

Melbourne, Australia 13-15 March 1996

Karhunen-Loeve Based Iterated Function System Encodings

Franklin G. Horowitz (1,3), Don Bone (2), Paul Veldkamp (2)

(1) CSIRO Division of Exploration and Mining, Perth, WA, Australia
(2) CSIRO Division of Information Technology, Canberra, ACT, Australia
(3) Australian Geodynamics Cooperative Research Centre, Perth, WA, Australia

ABSTRACT: Iterated Function Systems (IFS) raster compression techniques achieve their results by identifying self-similarities in the source image. However, not all source images contain exploitable self-similarity. We describe a compression technique that combines a Karhunen-Loeve basis set (for non self-similar aspects of an image) with a block based IFS.

1. INTRODUCTION

Iterated Function Systems (IFS), popularized in Barnsley [1], provide a concise way of coding certain classes of interscale (fractal or self-similar) redundancies. The Block IFS technique of Jacquin [2,3] was the first (published) automated solution to determining an IFS code for any given image. Jacquin's technique partitions an image into (large) domain blocks and (small) range blocks, and then finds best fitting mappings from the domains to the ranges. Currently, most practical IFS encodings are descended from Jacquin's pioneering technique. These descendants all share mappings with the following schema: data from a domain block are “operated” upon (shrunken in support, amplitude scaled, rotated, flipped) and then summed with a “fixed” part to form an approximation to a range block (Figure 1).

Up to now, the encodings have relied upon constants (Jacquin), or tilted planes/low order polynomials (Monro and Dudbridge [4]) as basis functions for the “fixed” parts of range blocks. While the idea clearly has been to use these fixed basis sets for capturing low frequency behaviour in the image, in fact, these expansions are the only mechanism available to an IFS for coding non-self-similar aspects of the images.

[Figure 1 labels: Domain Block; “Operator” (function x-section, support); Intermediate; Sum; “Fixed” Basis.]

2. BACKGROUND

Practical, lossy, raster compression schemes tend to have three distinct stages. The first is a front-end, whose job it is to transform the raster into a short representation. The decoding of this representation must maintain as much visual fidelity to the original raster as possible. It is also desirable for the front-end representation to be well matched to the processing performed by the subsequent stages. The second stage is a quantization of the front-end’s parameters. The task here is to restrict the number of discrete symbols that will be presented to the final lossless stage, while still maintaining the fidelity produced by the first stage as much as possible. Finally, the third stage is a lossless back-end, whose job it is to achieve a file-size as near as possible to the information content of the output of the second stage.

This work explores the use of a truncated Karhunen-Loeve (KL) expansion as an orthonormal range block basis for a block IFS encoding. The KL basis is also known as a Principal Component basis. Briefly, a KL basis set is formed by mapping each range block’s contents into a 1D vector, forming the (real, symmetric) covariance matrix of these vectors, and returning the eigenvectors of the covariance matrix sorted in order of decreasing eigenvalue. The KL expansion has the well-known advantage of being optimal with respect to the energy packing and decorrelation of the coding parameters. Clearly, within the constraints of a given bit budget, a truncated KL expansion best encodes some (not-necessarily self-similar) aspects of an image. The intention is to produce first-stage output that is not only “well matched” to subsequent processing, but also, for much of it, “nearly optimal” for subsequent processing.

We encode the residual, which is typically high-frequency “texture”, via the “operator” or fractal part of an IFS mapping. The intention is to capture textures not represented by the KL expansion with a self-similar fractal approximation.

While we commenced working on these ideas in 1993, we subsequently found that several other workers have independently explored similar avenues: Øien [5], Barthel [6], and Hurd [7].
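The KL basis construction described in this section (flatten each range block to a 1D vector, form the covariance matrix, and sort its eigenvectors by decreasing eigenvalue) can be illustrated with a minimal NumPy sketch. The function names `range_block_vectors` and `kl_basis` are hypothetical, and the random demonstration image merely stands in for real data; this is one possible reading of the text rather than the implementation used for the results.

```python
import numpy as np

def range_block_vectors(image, size):
    """Cut an image into contiguous size x size blocks, one flattened block per row."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    h, w = h - h % size, w - w % size          # assumption: drop any ragged border
    blocks = image[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.swapaxes(1, 2).reshape(-1, size * size)

def kl_basis(vectors, n_terms):
    """Return the n_terms leading eigenvectors (as rows) and their eigenvalues."""
    cov = np.cov(vectors, rowvar=False)        # real, symmetric covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_terms]
    return eigvecs[:, order].T, eigvals[order]

# Demonstration on random data (a stand-in for a real test image).
rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(64, 64))
vectors = range_block_vectors(demo, 8)
basis, energy = kl_basis(vectors, 20)
print(basis.shape)                             # (20, 64)
```

The rows of `basis` are orthonormal, so projecting a block vector `v` is simply `basis @ v`, and the truncated reconstruction is `basis.T @ (basis @ v)`.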

3. DESCRIPTION

In its simplest form (which is a direct extension of the technique proposed by Monro and Dudbridge [4]), our method proceeds as follows. The image is first divided into a set of contiguous range blocks of uniform dimension. The range blocks are normalized to zero mean and unit standard deviation. A KL basis is formed for these range blocks, and each block is projected onto this basis. The expansion for each range block is truncated at a predetermined length and this forms the “fixed” part of the encoding. A predetermined domain block (figure 2) is also then shrunken, scaled, and added to the “fixed” KL expansion. The scale factor is determined by least squares against the original image.

Fig. 1: Schema for Block IFS.

Fig. 2: Monro and Dudbridge style Domain-Range block mappings. (Labels: Fixed Domain Block; Range Blocks share common Domain Block.)

The resulting front-end representation consists of: the basis blocks for the truncated KL expansion; then for each range block: the normalization parameters, a set of KL projection coefficients and the scaling factor from the domain block.

During the decoding phase, the fixed part of the representation is expanded to form an initial image approximation. This is used to determine the respective domain blocks, which are mapped and added to the fixed part of the encoding to form the next image approximation. This process is iterated to a fixed point (the existence of which is guaranteed, since the attractor of the resulting IFS can be viewed as the fixed point of an affine transform of pixel space). The fixed point is then un-normalized to form the decoded image.

As the second stage for the results presented below, we chose to quantize each group of similar parameters via the K-Means clustering algorithm. While the performance of the actual algorithm in quantizing parameters is probably adequate, we selected the number of quantization states in a rather ad hoc fashion. By choosing the number of quantization states in a better fashion, we believe that substantial gains in overall compression could be made, at a relatively modest cost in fidelity.

Finally, as the third, lossless, stage for the results presented below, we simply used the GNU project program gzip(1). We briefly explored compressing with the arithmetic compression code of Carpinelli et al. [8]. However, for the parameters that we tuned, we found that gzip outperformed the arithmetic compression results. We believe that further compression gains could be achieved, albeit perhaps only modest ones, by using an arithmetic compressor that is customised to the symbol stream produced by our quantizer.
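The front-end encoding and the iterative decoding described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind the reported results: the predetermined domain for each range block is taken to be the 2b x 2b parent block that contains it, "shrinking" is taken to be a 2x2 pixel average, the least-squares scale factor is fitted to the residual left by the truncated KL expansion, and the scale is clipped to |s| <= 0.9 so that the sketch's iteration is contractive. All function names are hypothetical, and the image sides are assumed to be multiples of 2b.

```python
import numpy as np

def to_blocks(img, b):
    """Split an (h, w) array into b x b blocks, one flattened block per row."""
    h, w = img.shape
    return img.reshape(h // b, b, w // b, b).swapaxes(1, 2).reshape(-1, b * b)

def from_blocks(vecs, shape, b):
    """Inverse of to_blocks."""
    h, w = shape
    return vecs.reshape(h // b, w // b, b, b).swapaxes(1, 2).reshape(h, w)

def shrunken_domains(img, b):
    """For every range block, return its shrunken domain block (assumed parent block)."""
    h, w = img.shape
    small = img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))   # 2x2 averaging
    doms = to_blocks(small, b)                                    # one row per domain
    rows, cols = np.arange(h // b)[:, None], np.arange(w // b)[None, :]
    parent = (rows // 2) * (w // (2 * b)) + cols // 2             # domain index per range
    return doms[parent.ravel()]

def encode(img, b=8, n_terms=20, s_max=0.9):
    img = np.asarray(img, dtype=float)
    vecs = to_blocks(img, b)
    mean = vecs.mean(axis=1, keepdims=True)
    std = np.maximum(vecs.std(axis=1, keepdims=True), 1e-12)      # guard flat blocks
    norm = (vecs - mean) / std                                    # zero mean, unit std
    _, eigvecs = np.linalg.eigh(np.cov(norm, rowvar=False))
    basis = eigvecs[:, ::-1][:, :n_terms].T                       # leading KL vectors
    coeffs = norm @ basis.T                                       # "fixed" part
    resid = norm - coeffs @ basis
    doms = shrunken_domains(from_blocks(norm, img.shape, b), b)
    energy = np.maximum((doms ** 2).sum(axis=1), 1e-12)
    scale = np.clip((doms * resid).sum(axis=1) / energy, -s_max, s_max)
    return {"b": b, "shape": img.shape, "basis": basis, "mean": mean,
            "std": std, "coeffs": coeffs, "scale": scale}

def decode(code, tol=1e-9, max_iters=200):
    b, shape = code["b"], code["shape"]
    fixed = code["coeffs"] @ code["basis"]        # initial approximation
    x = fixed.copy()
    for _ in range(max_iters):
        doms = shrunken_domains(from_blocks(x, shape, b), b)
        nxt = fixed + code["scale"][:, None] * doms
        converged = np.max(np.abs(nxt - x)) < tol
        x = nxt
        if converged:
            break
    return from_blocks(x * code["std"] + code["mean"], shape, b)  # un-normalize
```

A call such as `decode(encode(image, b=8, n_terms=20))` returns an array of the same shape as `image`. Quantization and lossless coding of the returned parameters are deliberately left out of this sketch.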

4. RESULTS

We present results gathered for the well known test image “lena”; we explicitly cite [9] as the source of our test dataset. Lena is a 512x512 pixel, 256 gray level image. We reproduce the original here as figure 3, in order to visually calibrate the effects of the reproduction technology on our later images.

Fig. 3: The original test image “lena”.

We show results from our technique using both 8x8 and 16x16 pixel Range Blocks. For each size, the first 20 KL basis vectors, ranked (in scanline order) by decreasing energy, and displayed 4 times larger than figure 3, are shown in figure 4 and figure 5 respectively.

Figure 4: Truncated KL basis for lena at 8x8.

Figure 5: Truncated KL basis for lena at 16x16.

In order to compare the visual effects of our quantization, we display the highest fidelity encodings from our test suite. The image was encoded with 8x8 pixel Range Blocks, and 20 terms of the KL expansion were retained. First, we show the image decoded from the front end’s output in figure 6. This image has an RMS error of 3.703 gray levels, and a losslessly compressed, encoded file size of 352,064 bytes. (It is interesting to note that the floating-point output of the front-end is expansive for 8x8 Range Blocks and KL truncations of 13 terms or greater. Nonetheless, after being processed by the second and third stages, the result is modestly compressive.)

Fig. 6: The highest quality, unquantized, decoded image in our test suite.

Figure 7 is the result for the quantized and compressed equivalent of figure 6. This image has an RMS error of 4.121 gray levels, and a compressed file size of 111,879 bytes, yielding a modest compression ratio of 2.34:1 (the original uncompressed image can be stored in 512^2 = 262,144 bytes, of course).

Fig. 7: The highest quality quantized and compressed image in our test suite.

Figure 8 is the highest compression achieved for this test suite that we judge to be visually “acceptable” (barely). The image is based on 16x16 pixel Range Blocks and a KL expansion truncated at 6 terms. This image has an RMS error of 12.062 gray levels and a compression of 19.6:1.

Fig. 8: Lena, at 19.6:1 compression.

Table 1 lists the rate-distortion results for the 8x8 pixel Range Blocks in the test suite.

Table 1: Results for Monro and Dudbridge style Karhunen-Loeve 8x8 Range block IFS encodings.

Num KL   Comp    BPP     RMSE     PSNR
20       2.34    3.414   4.121    35.83
19       2.50    3.200   4.208    35.65
18       2.61    3.063   4.305    35.45
17       2.72    2.941   4.446    35.17
16       2.84    2.815   4.610    34.86
15       2.98    2.681   4.772    34.56
14       3.12    2.567   4.956    34.23
13       3.30    2.427   5.160    33.88
12       3.50    2.287   5.357    33.55
11       3.72    2.151   5.581    33.20
10       3.95    2.023   5.864    32.77
9        4.22    1.897   6.189    32.30
8        4.52    1.769   6.634    31.70
7        4.89    1.636   7.082    31.13
6        5.34    1.498   7.602    30.51
5        5.95    1.344   8.021    30.05
4        6.83    1.171   8.622    29.42
3        7.97    1.004   9.699    28.40
2        9.51    0.842   11.049   27.26
1        11.67   0.686   12.923   25.90
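The Comp, BPP and PSNR columns follow from the file sizes and RMS errors by simple arithmetic, which makes the tabulated values easy to cross-check. The sketch below assumes 8 bits per pixel for the uncompressed image and a peak value of 255 in the PSNR definition; neither is stated explicitly in the text, although both are consistent with a 256 gray level image and with the tabulated numbers. The function names are hypothetical.

```python
import math

def compression_ratio(original_bytes, compressed_bytes):
    """Ratio of uncompressed to compressed size."""
    return original_bytes / compressed_bytes

def bits_per_pixel(ratio, bits_per_sample=8):
    """Bitrate implied by a compression ratio (assumes 8-bit source pixels)."""
    return bits_per_sample / ratio

def psnr_db(rmse, peak=255.0):
    """Peak signal to noise ratio in dB (assumes a peak value of 255)."""
    return 20.0 * math.log10(peak / rmse)

# Figure 7: 512x512 one-byte pixels compressed to 111,879 bytes, RMSE 4.121.
ratio = compression_ratio(512 * 512, 111_879)
print(round(ratio, 2))                   # 2.34
print(round(bits_per_pixel(ratio), 3))   # 3.414
print(round(psnr_db(4.121), 2))          # 35.83
```

These three values reproduce the Num KL = 20 row of Table 1.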

The columns are titled as follows: “Num KL” is the truncation order for the Karhunen-Loeve expansion; “Comp” is the compression ratio; “BPP” is the corresponding bitrate per pixel; “RMSE” is the root mean squared error against the original image (in gray levels); and “PSNR” is the corresponding peak signal to noise ratio in dB. Table 2 lists the results for the 16x16 pixel Range Blocks in the test suite. The columns are labelled in the same fashion as in table 1.

Table 2: Results for Monro and Dudbridge style Karhunen-Loeve 16x16 Range block IFS encodings.

Num KL   Comp    BPP     RMSE     PSNR
20       8.53    0.938   8.206    29.85
19       8.70    0.919   8.355    29.69
18       9.10    0.879   8.502    29.54
17       9.54    0.839   8.696    29.34
16       9.99    0.801   8.884    29.16
15       10.47   0.764   9.078    28.97
14       11.10   0.721   9.305    28.76
13       11.73   0.682   9.147    28.91
12       12.46   0.642   9.715    28.38
11       13.30   0.601   9.995    28.14
10       14.25   0.562   10.348   27.83
9        15.27   0.524   10.642   27.59
8        16.40   0.488   11.116   27.21
7        18.50   0.432   11.562   26.87
6        19.61   0.408   12.062   26.50
5        21.68   0.369   12.656   26.08
4        24.75   0.323   13.384   25.60
3        28.62   0.279   14.825   24.71
2        33.95   0.236   16.45    23.81
1        42.21   0.190   18.141   22.96

Figure 9 plots the rate distortion curves for this test suite, and compares the results with those found for some other techniques. The other techniques are more fully described, and their rate-distortion data tabulated, in the Waterloo Repertoire [9]. Briefly: the curve labelled “TRNA” is an IFS encoder from Fisher [10]; those labelled “FIFA” and “FIFB” are IFS encoders from two different versions of Iterated Systems Inc.’s commercial product “Images Incorporated”; and “JPEG” is derived from the freely-available source code.

Fig. 9: Rate-Distortion comparisons. (Axes: Compression Factor, RMS Error (gray levels). Curves: KL IFS 8x8, KL IFS 16x16, TRNA, FIFA, FIFB, JPEG.)

5. CONCLUSION

Clearly, from examining the rate-distortion curves for our technique, much of the fidelity comes from the KL basis expansion, with relatively minor amounts coming from the “operator” part of the IFS. This is likely to be an artifact of our preselection of Domain Blocks (in the style introduced by Monro and Dudbridge [4]). If we were to relax this restriction, and perform a search for the best matching Domain Block, we expect to find significantly better performance. The cost of this performance is, of course, an increase in the encoding time (because searches are expensive), and an extra parameter to store (the index of the best-fitting Domain Block). This last cost could perhaps be offset by choosing a fixed basis set, such as the DCT, and re-gaining the bits currently allocated to transmitting/storing the KL Basis vectors (a nontrivial cost, as evidenced by the slope of our rate-distortion curves in fig. 9). Clearly there exists a threshold, past which it is more bit-efficient to use a DCT (or something else fixed) rather than a KL expansion. Of course, this raises the obvious possibility of an IFS texture enhancement to JPEG compression: the key question is whether or not such a scheme would yield an improvement in JPEG’s rate-distortion performance.

There are many other variations on this general framework, and we have explored several. In particular, the extension to 3-D (e.g. geophysical or medical rasters) is straightforward. (Indeed, this compression scheme is a spin-off from our work on representing 3D geoscientific fields with IFS’s.)

In this work, we have concentrated, almost exclusively, on the design of the first stage. While we feel that the second (quantization) and third (lossless compression) stage algorithms we use for constructing our rate-distortion curves are reasonable choices, we have no doubt that fine tuning of these algorithms could also yield further (perhaps substantial) gains in performance of this baseline technique.
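To make the fixed-basis alternative raised in the conclusion concrete, the following minimal sketch builds an orthonormal 2D DCT-II basis for b x b range blocks. It could stand in for the truncated KL basis without any basis vectors having to be stored or transmitted, since the decoder can regenerate it. The function name `dct_basis` and the low-frequency-first ordering (by increasing sum of the two frequency indices) are assumptions made for illustration; no such experiment is reported in this paper.

```python
import numpy as np

def dct_basis(b, n_terms):
    """Return n_terms orthonormal 2D DCT-II basis vectors (as rows) for b x b blocks."""
    k = np.arange(b)[:, None]                  # frequency index
    n = np.arange(b)[None, :]                  # sample index
    c = np.sqrt(2.0 / b) * np.cos(np.pi * (2 * n + 1) * k / (2 * b))
    c[0, :] = np.sqrt(1.0 / b)                 # DC row has its own normalization
    # Assumed ordering: low frequencies first, by increasing u + v.
    pairs = sorted(((u, v) for u in range(b) for v in range(b)),
                   key=lambda p: (p[0] + p[1], p[0]))
    return np.array([np.outer(c[u], c[v]).ravel() for u, v in pairs[:n_terms]])

fixed_basis = dct_basis(8, 20)
print(fixed_basis.shape)                       # (20, 64)
```

For comparison, a truncated KL basis of the same size requires n_terms x b^2 numbers to be stored alongside the per-block parameters (20 x 64 for 8x8 blocks, 20 x 256 for 16x16 blocks), which is the overhead the fixed basis would recover.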

6. REFERENCES

[1] M. Barnsley, “Fractals Everywhere”, Academic Press, Boston, 1988.

[2] A. Jacquin, “Fractal Image Coding Based on a Theory of Iterated Contractive Image Transformations”, Proc. SPIE Visual Comm. and Image Proc., 1990, pp. 227-239.

[3] A. Jacquin, “Image Coding Based on a Fractal Theory of Iterated Contractive Image Transformations”, IEEE Transactions on Image Processing, v. 1, 1992, pp. 18-30.

[4] D. M. Monro and F. Dudbridge, “Fractal Block Coding of Images”, Electronics Letters, v. 28, no. 11, 1992, pp. 1053-1054.

[5] Geir E. Øien, “L2-Optimal Attractor Image Coding with Fast Decoder Convergence”, Ph.D. Dissertation, Institutt for Teleteknikk, Norges Tekniske Høgskole, Trondheim, Norway, (Rapport 429304) April 1993, 143 p.

[6] Kai-Uwe Barthel (Technische Universität Berlin, Germany), personal communication during the NATO ASI on Fractal Image Encoding and Analysis, Trondheim, Norway, July, 1995.

[7] Lyman P. Hurd (Iterated Systems, Incorporated, Norcross, Georgia, USA), personal communication during the NATO ASI on Fractal Image Encoding and Analysis, Trondheim, Norway, July, 1995.

[8] Carpinelli, J., Salamonsen, W., Moffat, A., Neal, R., and Witten, I., “Word, Character, and Bit Based Compression Using Arithmetic Coding”, University of Melbourne, ftp://munnari.oz.au/pub/arith_coder, March, 1995.

[9] John Kominek maintains the Waterloo Repertoire, a suite of 32 test images with rate-distortion statistics for various different compression methods; the URL: points to the source of our test image; the URL: points to the home page.

[10] Fisher, Y. (ed.), “Fractal Image Compression: Theory and Application”, Springer Verlag, New York, 1995; home page URL: .

7. ACKNOWLEDGEMENTS

This work was performed under the auspices of the IMPADS project, funded by the Australian Department of Industry, Science, and Technology’s GIRD scheme. We would like to thank Bruce Hobbs, Alison Ord, Kevin Smith, and John O’Callaghan for their efforts in enabling this interdisciplinary collaboration. This publication has been authorized by the Director of the Australian Geodynamics Cooperative Research Centre.