Ph.D. Thesis Dissertation

Optimization and Generalization of Lifting Schemes: Application to Lossless Image Compression

Author: Joel Solé Rojals
Advisor: Prof. Philippe Salembier

Image Processing Group
Department of Signal Theory and Communications
Universitat Politècnica de Catalunya
Barcelona, April 2006

To my parents,


Abstract

This Ph.D. thesis dissertation addresses multi-resolution image decomposition, a key issue in signal processing that in recent years has contributed to the emergence of the JPEG2000 image compression standard. JPEG2000 incorporates many interesting features, mainly due to its discrete wavelet transform stage and to the EBCOT entropy coder. Wavelet analysis performs multi-resolution decompositions that decorrelate the signal and separate its information into useful frequency bands, allowing flexible post-coding. In JPEG2000, the decomposition is computed through the lifting scheme, the so-called second-generation wavelets. This fact has focused the community's interest on this tool, and many works have recently been proposed in which lifting is modified, improved, or included in a complete image coding algorithm.

This dissertation follows that line of research. Lifting is analyzed, proposals are made within the scheme, and their possibilities are explored. Image compression is the main objective, and it is principally assessed by coding the transformed signal with the EBCOT and SPIHT coders. From this starting point, the work diverges along two distinct paths, one linear and one nonlinear. The linear lifting filter construction is based on the idea of quadratic interpolation and the linear restriction underlying the wavelet transform coefficients. The result is a flexible framework that allows the creation of new transforms according to different criteria and that may adapt to the image statistics. The nonlinear part is founded on the adaptive lifting scheme, which is extensively analyzed; as a consequence, a generalization of lifting is proposed. The discrete version of this generalized lifting is developed, leading to filters that achieve good compression results, especially for biomedical and remote sensing images.


Resumen

This thesis addresses multi-resolution decomposition, a key topic in signal processing that in recent years has led to the creation of the outstanding JPEG2000 image compression standard. JPEG2000 incorporates a series of very interesting functionalities, basically due to the discrete wavelet transform and the EBCOT entropy coder. The wavelet transform performs a multi-resolution decomposition that decorrelates the signal, separating its information into a set of frequency bands useful for subsequent coding. In JPEG2000, the decomposition is computed by means of the lifting scheme, also called second-generation wavelets. The integration of the lifting scheme into the standard has focused the interest of many researchers on this tool. Numerous works have recently appeared proposing modifications and improvements of lifting, as well as its inclusion in new image coding algorithms. This doctoral thesis follows this line of research. Lifting is studied, proposals are made within the scheme, and their possibilities are explored. Image compression has been set as the main objective for the creation of new wavelet transforms, which are mostly evaluated by coding the transformed signal with EBCOT or SPIHT. Within this context, the work diverges along two distinct paths, one linear and one nonlinear. The construction of linear lifting filters is based on the idea of quadratic interpolation and the underlying linear restriction of the wavelet coefficients. The result is a flexible framework that allows the creation of transforms under different criteria, adaptable to the image statistics. The nonlinear part is founded on the adaptive lifting scheme, of which an extensive analysis is offered; as a consequence, a generalization of lifting is proposed. Its discrete version is developed, obtaining lifting filters that achieve good results, above all on biomedical and remote sensing images.


Acknowledgments

First, I want to thank Philippe. I could hardly imagine a better thesis advisor. Thanks also to the image group and to the GPS Ph.D. students for all their help, as well as for the shared coffees and trips, and the Thursday football. Finally, I thank everyone outside the university who has put up with me: flatmates and friends, and especially my family and Anna.

Joel
Barcelona, April 2006


Contents

Notation
Acronyms

1 Introduction
  1.1 Motivation
  1.2 Thesis Organization
  1.3 Research Contributions

2 Discrete Wavelet Transform, Lifting, and Image Coding: An Overview
  2.1 Discrete Wavelet Transform
    2.1.1 Multi-resolution Analysis
    2.1.2 Biorthogonal Wavelets
      2.1.2.1 Vanishing Moments
    2.1.3 Discrete Wavelet Transform in Image Compression
  2.2 Lifting Scheme
    2.2.1 Classical Lifting
    2.2.2 Polyphase Characterization of Perfect Reconstruction
    2.2.3 Polyphase Characterization of Lifting Scheme
    2.2.4 Lifting in JPEG2000-LS
    2.2.5 Space-Varying Lifting
    2.2.6 Adaptive Lifting
  2.3 Review of Lifting Algorithms
    2.3.1 Methods for Lifting Design
    2.3.2 Methods for Lifting Optimization
    2.3.3 Lifting in Video Compression
  2.4 Other Adaptive, Nonlinear, and Sparse Decompositions
  2.5 Wavelet-based Image Coders
    2.5.1 Embedded Bit-Stream and Scalability
    2.5.2 SPIHT coder
    2.5.3 EBCOT coder

3 Linear Lifting Schemes: Interpolative and Projection-based Lifting
  3.1 Introduction
    3.1.1 Convex Optimization Theory
  3.2 Quadratic Image Interpolation Methods
    3.2.1 Quadratic Interpolation
    3.2.2 Optimal Quadratic Interpolation
    3.2.3 Alternative Formulations
      3.2.3.1 Signal Bound Constraint
      3.2.3.2 Weighted Objective
      3.2.3.3 Energy Penalizing Objective
      3.2.3.4 Signal Regularizing Objective
      3.2.3.5 l1-norm Objective
  3.3 Projection-based Lifting
    3.3.1 Wavelet Linear Constraint
    3.3.2 Linear Prediction Steps Construction
    3.3.3 Linear Update Steps Construction
      3.3.3.1 First Linear ULS Design
      3.3.3.2 Second Linear ULS Design
      3.3.3.3 Third Linear ULS Design
  3.4 Experiments
    3.4.1 Auto-regressive Image Model
    3.4.2 Interpolation Methods
      3.4.2.1 Interpolation Methods PSNR Performance
    3.4.3 Optimality Analysis
      3.4.3.1 First Prediction Step Study
      3.4.3.2 Update Step Study
      3.4.3.3 Second Prediction Step Study
    3.4.4 Improved Linear Lifting Steps Performance
      3.4.4.1 ULS with AR-1 Signal Test
      3.4.4.2 ULS on a Quincunx Grid with AR Data
      3.4.4.3 Local Adaptive ULS
      3.4.4.4 Image Class Optimal ULS Test
      3.4.4.5 A Refinement for Mammography
      3.4.4.6 Optimal Second PLS Test
  3.5 Chapter Summary and Conclusions

4 From Adaptive to Generalized Lifting
  4.1 Adaptive Lifting Description
  4.2 Adaptive Lifting Analysis
  4.3 Adaptive Lifting Steps Construction
    4.3.1 Median-based Decision Adaptive ULS
      4.3.1.1 Scheme Description
      4.3.1.2 Experiments
    4.3.2 Variance-based Decision Adaptive ULS
      4.3.2.1 Scheme Description
      4.3.2.2 Experiments
  4.4 Generalized Lifting
    4.4.1 Discrete Generalized Lifting

5 Generalized Discrete Lifting Steps Construction
  5.1 Generalized Discrete Prediction Design
    5.1.1 Geometrical Design of the Prediction
      5.1.1.1 Experiments and Results
      5.1.1.2 Extensions
    5.1.2 Optimized Prediction Design
      5.1.2.1 Experiments and Results
    5.1.3 Adaptive Optimized Prediction Design
      5.1.3.1 Experiments and Results
      5.1.3.2 Convergence Issues
  5.2 Generalized Discrete Update Design
    5.2.1 Update Step Objectives
      5.2.1.1 Joint Update-Prediction Design
    5.2.2 Update-Last Design
      5.2.2.1 Entropy Minimization
    5.2.3 Update-First Design
      5.2.3.1 Gradient Minimization
  5.3 Nonlinear Lifting Chapters Summary and Conclusions
  5.A Appendix: Proof of Minimum Energy/Entropy Mappings
  5.B Appendix: Algorithm to Implement 3-D SPIHT

6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work

A Benchmark Images

Bibliography

Notation

Boldface upper-case letters denote matrices, boldface lower-case letters denote column vectors, upper-case italics denote sets, and lower-case italics denote scalars.

R, Z           The sets of real and integer numbers, respectively.
R+, R++        The sets of non-negative real and positive real numbers, respectively.
Z_n            A set of n consecutive integer numbers.
R^{n×m}        The set of n × m matrices with real-valued entries.
X^T            Transpose of the matrix X.
X^{-1}         Inverse of the matrix X.
[X]_{i,j}      (i, j)th component of the matrix X.
I_n            Identity matrix of dimensions n × n (the dimension is not explicitly indicated if it is clear from the context).
e_i            Canonical vector with all elements being zero except the ith one, which is equal to one.
a ≥ b          Elementwise relation a_i ≥ b_i.
|x|            Absolute value of the scalar x.
‖x‖            Euclidean norm of the vector x: ‖x‖ = √(x^T x).
arg            Argument.
max, min       Maximum and minimum.
(·)*           Optimal value.
∩, ∪           Intersection and union.
[a, b], (a, b) Closed interval (a ≤ x ≤ b) and open interval (a < x < b), respectively.
Pr(·)          Probability.
E[·]           Mathematical expectation.
|A|            Cardinality of the set A, i.e., the number of elements in A.
L2(R)          The space of square integrable functions.
l2(Z)          The space of square summable sequences.
∗              Linear convolution.
∝              Equal up to a scaling factor (proportional).
≜              Defined as.
≃              Approximately equal.
≅              Equivalent to.
∇_x f          Gradient of the function f with respect to x.
P(·)           Prediction lifting step.
U(·)           Update lifting step.
⌊·⌉            Rounding to the nearest integer.
Q(·)           Quantization.
exp(·)         Exponential.
log(·)         Natural logarithm.
log_a(·)       Base-a logarithm.
δ[·]           Kronecker delta.

Acronyms

1-D, 2-D, 3-D  One-dimensional, two-dimensional, and three-dimensional, respectively.
AR, AR-m       Auto-regressive and auto-regressive model of mth order, respectively.
bpp            Bits per pixel.
DWT            Discrete Wavelet Transform.
EBCOT          Embedded Block Coding with Optimized Truncation.
EZW            Embedded Zero-tree Wavelet coding.
FB             Filter Bank.
FIR            Finite Impulse Response.
GL             Generalized Lifting.
GLS            Generalized Lifting Step.
IEEE           Institute of Electrical and Electronics Engineers.
IIR            Infinite Impulse Response.
ISO            International Organization for Standardization.
ITU-T          International Telecommunication Union (Standardization Sector).
JPEG           Joint Photographic Experts Group (image standard).
JPEG-LS        JPEG Lossless image standard.
KKT            Karush-Kuhn-Tucker.
LC             Local Characteristics.
LHS            Left-Hand Side.
LMS            Least Mean Square.
LS             Lifting Scheme.
LSB            Least Significant Bit.
LUT            Look-Up Table.
LWT            Lazy Wavelet Transform.
MRA            Multi-Resolution Analysis.
MSB            Most Significant Bit.
MSE            Mean Square Error.
PLS            Prediction Lifting Step.
pdf            Probability density function.
PR             Perfect Reconstruction.
PSNR           Peak Signal-to-Noise Ratio.
ROF            Rank-Order Filter.
RHS            Right-Hand Side.
SNR            Signal-to-Noise Ratio.
SPIHT          Set Partitioning in Hierarchical Trees.
SST            Sea Surface Temperature.
s.t.           Subject to.
ULS            Update Lifting Step.
w.r.t.         With respect to.

Chapter 1

Introduction

1.1 Motivation

Wavelet multi-resolution decomposition of images has shown its efficiency in many image processing areas, and specifically in compression. Transformed coefficients are obtained by expanding a signal on a wavelet basis. The transformed signal is a different representation of the same underlying data. Such a representation is efficient if a relevant part of the original information is concentrated in a relatively small number of coefficients. In this sense, wavelets are near-optimal bases for a wide class of signals with some smoothness, which explains their interest for compression.

A new family of wavelet-based image coders emerged from the original work of J. Shapiro in 1993, which showed a way to profit from the compact two-dimensional wavelet representation of images. Wavelet-based image encoders improved compression performance relative to the previously existing JPEG standard, while offering other attractive features such as a completely embedded bit-stream representation.

However, one of the initial assumptions on which wavelet-based image coders rely is not precisely fulfilled: images do not belong to a class of functions that are optimally represented by a wavelet transform. Real images are inherently non-stationary, and most of them are not smooth. Even a first-approximation image model has to include different regions separated by edges. Moreover, regions are not usually flat or constant; patterns and textures exist, and they lack the smoothness property that would point to wavelets as their optimal bases. In wavelet-based coders it is observed that textures and contours consume the major part of the bit-rate, and coding these singularities is costly. Therefore, a fixed wavelet decomposition is unable to efficiently represent the complexity of a real image.

The wavelet family is broad, and the choice of a wavelet basis is conditioned by the application at hand or the given objective. In coding, some wavelets are more adequate for smooth regions and others behave better near discontinuities. Hence, many researchers have proposed adaptive


schemes that modify the underlying wavelet basis according to local signal characteristics.

Filter banks are the fundamental tool for creating discrete wavelet transforms. They are formed by the analysis and synthesis low- and high-pass filters and the intermediate stages composed of down- and up-sampling. The analysis part of the filter bank outputs several subsignals, each coming from a different filter channel. Initially, the complexity and challenge of adaptivity lay in ensuring the reversibility of the filter bank, so that the original data could be recovered from the subsignals. The design of a whole filter bank with such a property is a difficult task, although several works attained this goal. Later, the lifting scheme proposed by W. Sweldens gave a suitable framework for developing perfect reconstruction time-varying and nonlinear wavelet filters, mainly because the lifting structure itself assures reversibility, so the freedom in the design stage is greatly increased.

Many contributions use or modify the lifting scheme. These works try in different ways to exploit the correlation existing among the decomposition channels. For instance, the local shape of the signal or the statistics in one channel may be considered to obtain a good prediction or interpolation of another channel. This is known as space-varying lifting. Adaptive and nonlinear decompositions take the non-stationarity of images into account, and in this way achieve a sparser description of images than classical wavelets.

Going one step further, the adaptation may be improved if the information given by the same channel to be filtered is considered. Methods exist for searching the best basis for the entire signal or for a class of signals, but this requires book-keeping, and thus additional bit-rate, to attain reversibility. H. Heijmans and G. Piella showed that a point-wise adaptation is possible by using a transform-invariant criterion. In consequence, the analysis filters can be recovered at the decoder side, allowing the correct synthesis filter choice without any book-keeping.

Following the latter line of research, this Ph.D. thesis dissertation sets out a generalization of the lifting scheme in which the additions and subtractions of classical lifting steps are extended to include any kind of operation. The generalization opens a door to new decompositions and design criteria. Lifting prediction and update steps are designed within the new framework, which is essentially devoted to nonlinear processing. The interest of the proposed scheme is demonstrated for lossless applications. There are many applications in which the original image data has to be exactly recovered. For example, in biomedical imaging several legal and regulatory issues work in favor of lossless compression. Similarly, exact coders are required in remote sensing imaging in order to ensure accurate values of physical ground parameters.

The discrete version of the generalized lifting is developed in order to construct prediction and update lifting steps. The scheme performance is mainly assessed by coding the transformed


coefficients by means of wavelet-based image coders, but also by drawing some relevant statistics from the coefficients; for instance, the mean energy or the entropy are informative about the goodness of a transform for compression purposes. A three-dimensional version of the proposed transforms, together with an implemented 3-D extension of a known image entropy coder, permits testing the new schemes on the coding of volumes or video sequences. The 3-D version is also practical for biomedical and remote sensing imaging. For example, magnetic resonance imaging and multi-spectral imaging are applications in which the gathered data is highly correlated in each of the three dimensions, so the extended schemes may attain excellent compression ratios.

This Ph.D. thesis dissertation provides contributions to the study of point-wise adaptive and nonlinear decompositions within the lifting scheme, aiming to fill the gaps remaining in these topics. Moreover, there is room in the linear setting for new ideas in space-varying, signal-dependent, and adaptive lifting. The linear framework is studied and contributions are made: combining adaptive quadratic interpolation and the theory of convex optimization with the filter bank field leads to the improvement of existing linear lifting steps and the construction of new ones. The concrete issues and aspects addressed in this Ph.D. thesis dissertation concerning these general objectives, and their distribution within the text, are detailed in the next section.
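The structural reversibility of lifting invoked above can be made concrete with a small sketch. The following is an illustrative implementation, not code from the dissertation, of the integer LeGall 5/3 lifting pair used in JPEG2000 lossless mode; the boundary handling by index clamping is a simplification of the standard's symmetric extension. Whatever rounding the prediction and update apply, undoing the steps in reverse order with opposite signs recovers the input exactly.

```python
# Illustrative sketch: prediction/update lifting steps with integer rounding.
# Reversibility holds by construction, independently of the rounding inside
# the steps, which is what makes lifting attractive for lossless coding.

def forward_53(x):
    """One level of the integer 5/3 lifting transform (even-length input)."""
    even, odd = x[0::2], x[1::2]
    n = len(odd)
    # Prediction step: detail = odd - P(even)
    d = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # Update step: approximation = even + U(detail)
    a = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    return a, d

def inverse_53(a, d):
    """Invert by undoing the steps in reverse order with opposite signs."""
    n = len(d)
    even = [a[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    odd = [d[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x

x = [81, 80, 83, 90, 101, 101, 99, 80]
a, d = forward_53(x)
assert inverse_53(a, d) == x  # perfect reconstruction despite the rounding
```

The same structure admits arbitrary prediction and update operators, which is precisely the design freedom exploited throughout the dissertation.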

1.2 Thesis Organization

The Ph.D. thesis dissertation is divided into six chapters. This introductory chapter is followed by an overview of the discrete wavelet transform and its surrounding field, which introduces the main concepts employed in the dissertation. The next three chapters present the thesis contributions, and the final chapter concludes the work. Outline of the thesis dissertation:

Chapter 2 presents an overview of the discrete wavelet transform, filter banks (§2.1), and the lifting scheme (§2.2). The usefulness of the wavelet transform in image compression is established. Space-varying lifting and the adaptive lifting strategy, two variants of lifting employed in the dissertation, are introduced in §2.2.5 and §2.2.6, respectively. A state-of-the-art review regarding design and optimization in the lifting scheme is found in §2.3. Wavelet-based image entropy coders are described in §2.5.

Chapter 3 is dedicated to the optimization of linear lifting filters. An adaptive quadratic interpolation method is described in §3.2 which, combined with the theory of convex optimization, allows the construction of interpolative prediction steps. Then, the method is


employed for the improvement of lifting steps and the construction of new ones in §3.3. Experiments and results are described in §3.4.

Chapter 4 starts with the description of the adaptive lifting scheme (§4.1). Then, an extended analysis of this adaptive scheme is provided (§4.2). Section 4.3 proposes two steps within this framework. The analysis also leads to the definition of the generalized lifting and its discrete version in §4.4.

Chapter 5 proposes the construction of concrete discrete generalized lifting steps. A geometrical approach to the design of a prediction step is described in §5.1.1. The rest of section §5.1 is devoted to the optimization of generalized prediction steps, while section §5.2 optimizes generalized update steps. Experimental results, including the 3-D version, are also detailed.

Chapter 6 draws the main conclusions from the Ph.D. thesis dissertation and details possible future lines of research.

1.3 Research Contributions

The main contribution of this Ph.D. thesis dissertation is the analysis and development of adaptive wavelet decompositions within the lifting scheme and the creation of a linear framework for the development of new lifting steps. The research contributions and publications of each chapter are detailed below.

The main results in chapter 3 concern the design of linear lifting steps. A common formulation for quadratic interpolation and lifting design is presented. The connection permits the construction of a variety of new lifting steps, as well as a study of the optimality of known filters according to different criteria. Some of the results appear in the following conference papers:

• [Sol06a] J. Solé and P. Salembier, "A common formulation for interpolation, prediction, and update lifting design", in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 13-16, May 2006.

• [Sol06b] J. Solé and P. Salembier, "Adaptive quadratic image interpolation methods", accepted to Research in AVR, Barcelona, July 2006.

• [Sol06c] J. Solé and P. Salembier, "Adaptive quadratic interpolation methods for lifting steps construction", accepted to the IEEE International Symposium on Signal Processing and Information Technology, August 2006.

Chapter 4 characterizes the adaptive lifting scheme from a new point of view, leading naturally to the proposal of two adaptive lifting constructions and of the generalized lifting scheme and


its discrete version. The description of the generalized scheme in chapter 4 and the part of chapter 5 regarding the geometrical approach to discrete generalized prediction design (§5.1.1) are presented in two papers:

• [Sol04a] J. Solé and P. Salembier, "Adaptive discrete generalized lifting for lossless compression", in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Vol. 3, pp. 57-60, May 2004.

• [Sol04b] J. Solé and P. Salembier, "Discrete generalized lifting for lossless image compression", in Proceedings of Research in AVR, pp. 337-340, February 2004.

Chapter 5 develops the discrete generalized scheme, proposing the optimization of several prediction and update steps within the framework. An optimized generalized prediction and its space-varying version have been published in two conference papers:

• [Sol04c] J. Solé and P. Salembier, "Prediction design for discrete generalized lifting", in Proceedings of Advanced Concepts for Intelligent Vision Systems, pp. 319-324, September 2004.

• [Sol05] J. Solé and P. Salembier, "Adaptive generalized prediction for lifting schemes", in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 205-208, March 2005.


Chapter 2

Discrete Wavelet Transform, Lifting, and Image Coding: An Overview

This second chapter is an overview of the topics required for the development of the Ph.D. thesis dissertation and a description of the state of the art surrounding the lifting scheme. Section 2.1 starts with an introduction to wavelet theory and discrete wavelet transforms; their connection to filter banks and subband coding is established, and the use of the wavelet transform in image compression is then justified. Section 2.2 introduces the lifting scheme and describes its use and advantages in image coding. Adaptive lifting, the foundation stone of the nonlinear transforms proposed in this dissertation, is briefly described in section 2.2.6, ahead of the more detailed analysis in chapter 4. Section 2.3 is a state-of-the-art review of lifting filter design and optimization techniques. Section 2.4 refers to some related nonlinear decompositions. Finally, section 2.5 reviews another fundamental component of this work, the wavelet-based image coders.

2.1 Discrete Wavelet Transform

A main goal of wavelet research [Dau88, Coh92, Mal98, Bur98] is to create sets of expansion functions and transforms that give informative, efficient, and useful descriptions of a function or signal. In applications working on discrete signals, one never has to deal directly with the expansion functions: the discrete wavelet transform (DWT) is obtained simply by passing a discrete signal through a filter bank (figure 2.1). Wavelet theory can be understood and developed using only such digital filters. This is the meeting point between wavelets and subband coding, and the origin of two different nomenclatures for the same concepts. In fact, wavelet transform and subband coding are so closely connected that both terms are often used interchangeably.

Filter banks [Vai92, Vet95] are structures that decompose a signal into subsignals through digital filters, typically at a lower sampling rate. Figure 2.1 shows a two-band filter bank.


It is formed by the analysis filters Hi(z) and the synthesis filters Gi(z), for i = 0, 1. Filters H0(z) and G0(z) are low-pass filters. In an M-band filter bank, Hi(z) and Gi(z) for 0 < i < M − 1 are band-pass filters, and HM−1(z) and GM−1(z) are high-pass filters; for a two-band filter bank, M = 2 and H1(z) and G1(z) are high-pass filters. If the input signal can be recovered without errors from the subsignals, the filter bank is said to be a perfect reconstruction (PR), or reversible, filter bank. To enable PR, the analysis and synthesis filters have to satisfy a set of bilinear constraints; a polyphase characterization of perfect reconstruction is derived in §2.2.2.

Every finite impulse response (FIR) filter bank with an additional linear constraint on the low-pass filter is associated with a wavelet basis. The low-pass synthesis filter G0(z) is associated with the scaling function, and the remaining band-pass synthesis filters (G1(z) in the two-band case) are each associated with the wavelet functions. The analysis low-pass filter H0(z) is associated with the so-called dual scaling function, and the analysis band-pass filters with the dual wavelet functions.

The notion of channel refers to each of the filter bank branches: the branch of the 1-D scaling coefficients (or approximation signal), and each branch of the wavelet coefficients (or detail signals). The concept of band involves the frequency representation, but it is commonly used in image processing to refer to each set of samples that is the output of the same 2-D filter. In 1-D linear processing, both concepts are interchangeable.

The coefficients of the discrete wavelet expansion of a signal may be computed using a tree structure in which the filter bank is applied recursively along the low-pass channel. Every recurrence output is a different resolution level, which is a coarser-scale representation of the original signal. In summary, a DWT is a dyadic tree-structured transform with a multi-resolution structure. An alternative to the filter bank structure for computing the DWT is the lifting scheme (LS). Lifting is more flexible and may be applied to more general problems; it is studied in detail in section 2.2.

2.1.1 Multi-resolution Analysis

Wavelet theory has a firm mathematical foundation in the multi-resolution analysis (MRA) axiomatic approach [Mal89]. This section starts with the definition of the multi-resolution hierarchy of nested sub-spaces, which is then connected to real-valued wavelet basis functions; finally, wavelets are related to the filter bank structure. A multi-resolution analysis on $L^2(\mathbb{R})$ is defined as a set of nested sub-spaces
$$\ldots \subseteq V^{(2)} \subseteq V^{(1)} \subseteq V^{(0)} \subseteq V^{(-1)} \subseteq V^{(-2)} \subseteq \ldots$$

Figure 2.1: One-level two-band perfect reconstruction filter bank.

satisfying a set of five multi-resolution properties:

1. Upward completeness: $\bigcup_{j \in \mathbb{Z}} V^{(j)} = L^2(\mathbb{R})$.

2. Downward completeness: $\bigcap_{j \in \mathbb{Z}} V^{(j)} = \{0\}$.

3. Shift invariance: $f(t) \in V^{(0)} \Leftrightarrow f(t - n) \in V^{(0)}, \; \forall n \in \mathbb{Z}$.

4. Scale invariance: $f(t) \in V^{(0)} \Leftrightarrow f(2^{-j} t) \in V^{(j)}, \; \forall j \in \mathbb{Z}$.

5. Basis existence: there exists $\varphi(t)$ such that the set of functions $\{\varphi(t - n)\}_{n \in \mathbb{Z}}$ is an orthonormal basis of $V^{(0)}$.

The function $\varphi(t)$ is called the scaling function. The set of its integer translates $\{\varphi(t - n)\}_{n \in \mathbb{Z}}$ forms a Riesz basis of $V^{(0)}$. The dilated and normalized scaling function is denoted by
$$\varphi_{j,n}(t) = \sqrt{2^{-j}}\,\varphi(2^{-j} t - n).$$
The dilated set $\{\varphi_{j,n}(t)\}_{n \in \mathbb{Z}}$ is a Riesz basis of $V^{(j)}$ for every $j$. The sub-space $W^{(j)}$ is the orthogonal complement of $V^{(j)}$ in $V^{(j-1)}$, that is,
$$V^{(j-1)} = V^{(j)} \oplus W^{(j)}, \quad \forall j \in \mathbb{Z}.$$
A consequence of the MRA is the existence of the wavelet function $\psi(t)$. The set of the integer translates of the wavelet $\{\psi_{j,n}(t)\}_{n \in \mathbb{Z}}$ forms a Riesz basis for $W^{(j)}$, where
$$\psi_{j,n}(t) = \sqrt{2^{-j}}\,\psi(2^{-j} t - n).$$

Also, the set {ψj,n (t)}n,j∈Z forms an orthonormal wavelet basis for L2 (R).


Since $\varphi(t), \psi(t) \in V^{(-1)}$, they can be expressed as linear combinations of the basis vectors $\{\varphi_{-1,n}(t)\}_{n \in \mathbb{Z}}$ of $V^{(-1)}$, i.e., the scaling and wavelet functions each satisfy the so-called two-scale or refinement relation
$$\varphi(t) = \sqrt{2} \sum_n g_0[n]\,\varphi(2t - n), \qquad (2.1)$$
$$\psi(t) = \sqrt{2} \sum_n g_1[n]\,\varphi(2t - n), \qquad (2.2)$$
$g_0$ and $g_1$ being the synthesis low- and high-pass filters, respectively. Scaling and wavelet functions are related to the coefficients of a discrete filter through equations (2.1) and (2.2). This permits the interpretation of the MRA from a strictly digital processing point of view despite the underlying real-function setting.

Let us illustrate this junction with an example. Assume $f(t) \in V^{(j-1)}$. Then, the function $f(t)$ is completely described by the coefficients of the inner products of $f(t)$ with the scaling basis functions
$$x_{j-1}[n] = \langle f(t), \varphi_{j-1,n}(t) \rangle \qquad (2.3)$$
through the linear decomposition
$$f(t) = \sum_n x_{j-1}[n]\,\varphi_{j-1,n}(t). \qquad (2.4)$$
Wavelets are necessary to describe the decomposition of $f(t)$ at the lower resolution scale $j$, because the detail is not available at scale $j$. The decomposition in the sub-spaces $V^{(j)}$ and $W^{(j)}$ may also be obtained with the corresponding continuous inner products as in (2.3),
$$f(t) = \sum_n x_j[n]\,\varphi_{j,n}(t) + \sum_n y_j[n]\,\psi_{j,n}(t).$$
However, the two-scale relation (2.1) applied to the scaling function in the inner product yields the decomposition working directly on the discrete-domain samples $x[n]$ and $y[n]$:
$$\begin{aligned}
x_j[n] &= \langle f(t), \varphi_{j,n}(t) \rangle \\
&= \int f(t)\,\sqrt{2^{-j}}\,\varphi(2^{-j} t - n)\,dt \\
&= \sum_m g_0[m - 2n] \int f(t)\,\sqrt{2^{-(j-1)}}\,\varphi(2^{-(j-1)} t - m)\,dt \\
&= \sum_m g_0[m - 2n]\,x_{j-1}[m].
\end{aligned}$$
Therefore, the relation between the scaling coefficients of $f(t)$ at two consecutive resolution levels is
$$x_j[n] = \sum_m g_0[m - 2n]\,x_{j-1}[m] = (\bar{g}_0 \ast x_{j-1})[2n].$$

Equivalently, the corresponding relationship for the wavelet coefficients is
$$y_j[n] = \sum_m g_1[m - 2n]\,x_{j-1}[m] = (\bar{g}_1 \ast x_{j-1})[2n],$$
where $\bar{g}_0[n] = g_0[-n]$ and $\bar{g}_1[n] = g_1[-n]$. These relations are precisely the low- and high-pass filtering of the decomposition coefficients at $V^{(j-1)}$. By these means, a continuous decomposition of a function in $L^2(\mathbb{R})$ is related to a subband filtering of an $l^2(\mathbb{Z})$ sequence. Every MRA gives rise to an orthonormal basis and an underlying filter bank that provides a vehicle for implementing the wavelet transform.

For image compression applications, there are two important features that transforms may have: linear phase and finite support. However, there is only one two-band orthonormal subband transform with these two properties, the so-called Haar wavelet. The filters corresponding to the Haar transform have two taps, so this wavelet is hardly useful for many applications due to its short length. Linear phase and finite support are obtained at the same time by means of biorthogonal systems.
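As a concrete check of these relations, the following sketch computes one analysis level directly from $x_j[n] = \sum_m g_0[m-2n]\,x_{j-1}[m]$ with the orthonormal Haar filters; the periodic boundary extension is a simplifying assumption of this sketch, not part of the derivation above.

```python
import math

def analysis_level(x, g0, g1):
    """One DWT analysis level: c[n] = sum_m g[m - 2n] * x[m], i.e. a
    correlation with the filter followed by down-sampling by two.
    Periodic extension handles the boundary (a simplifying assumption)."""
    N = len(x)
    def channel(g):
        return [sum(g[k] * x[(2 * n + k) % N] for k in range(len(g)))
                for n in range(N // 2)]
    return channel(g0), channel(g1)

# Orthonormal Haar filters.
s = 1.0 / math.sqrt(2.0)
g0, g1 = [s, s], [s, -s]
x = [2.0, 2.0, 4.0, 4.0, 6.0, 6.0, 8.0, 8.0]
approx, detail = analysis_level(x, g0, g1)
```

For this piecewise-constant input the detail channel is identically zero and the orthonormal filters preserve the signal energy, a small preview of the energy compaction discussed in §2.1.3.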

2.1.2 Biorthogonal Wavelets

Biorthogonality is obtained by slightly relaxing the fifth MRA property, i.e., the existence of an integer-translate orthonormal basis of $V^{(0)}$. The distinctive characteristic of biorthogonal systems is that the decomposition bases at the analysis and synthesis sides are different. Besides the scaling and wavelet functions, biorthogonal systems have a dual scaling function and a dual wavelet function that also generate a multi-resolution analysis. The dual scaling function is denoted by $\tilde{\varphi}(t)$ and the dual wavelet by $\tilde{\psi}(t)$. Functions $\tilde{\varphi}(t)$ and $\tilde{\psi}(t)$ satisfy the two-scale relations (2.1) and (2.2) with the coefficients $h_0[n]$ and $h_1[n]$, respectively. The dual functions are biorthogonal to the primal functions $\varphi(t)$ and $\psi(t)$ in the sense that
$$\langle \tilde{\varphi}(t), \psi(t - n) \rangle = \langle \tilde{\psi}(t), \varphi(t - n) \rangle = 0,$$
$$\langle \tilde{\varphi}(t), \varphi(t - n) \rangle = \langle \tilde{\psi}(t), \psi(t - n) \rangle = \delta[n].$$
A biorthogonal wavelet system extends an orthogonal one: it is more flexible and generally easier to design. The advantages of a biorthogonal system w.r.t. an orthogonal one are described in the list below.

• Orthogonal system filters must be of the same length, and the length must be even. This restriction is relaxed for biorthogonal systems.


• Biorthogonal wavelets may be symmetric, and thus the filter frequency response may have linear phase. This is the main reason for the inclusion of biorthogonal systems in the JPEG2000 standard and for their widespread use. On the other hand, as mentioned above, there is no two-band orthogonal transform having FIR linear-phase filters with more than 2 non-zero coefficients.

• In a biorthogonal system, the analysis and synthesis filters may be switched and the resulting system is still sound. Therefore, the appropriate arrangement may be chosen for the application at hand. For image compression, it has been observed that using the smoother filter in the reconstruction of the coded image leads to a better visual appearance.

Biorthogonal systems also have some disadvantages:

• Parseval's theorem no longer holds for biorthogonal wavelets. This means that the norm of the coefficients is not the same as the norm of the functions being spanned. Many design efforts have been devoted to making near-orthogonal systems.

• White Gaussian noise remains white after an orthogonal transform, but becomes correlated after a non-orthogonal transform. This should be considered when biorthogonal systems are employed in estimation or detection applications.

2.1.2.1 Vanishing Moments

Vanishing moments are a core concept of wavelet theory. In fact, the number of vanishing moments was a more important factor than spectral considerations in the choice of the wavelet transforms for the JPEG2000 standard. Vanishing the $n$th moment means that, for a polynomial input of degree up to $n$, the filter output is zero. A wavelet has $N$ vanishing moments if
$$\int_{-\infty}^{\infty} t^n\,\psi(t)\,dt = 0, \quad \text{for } 0 \le n < N.$$
The same definition applies for the dual wavelet to have $\tilde{N}$ vanishing moments. $N$ and $\tilde{N}$ are also the multiplicity of the origin as a root of the Fourier transform of the synthesis and analysis high-pass filters, respectively. They are likewise the multiplicities of the regularity factor $(1 + z^{-1})$ in $H_1(z)$ and $G_1(z)$ (the z-transforms of the filters $h_1[n]$ and $g_1[n]$).
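The effect of vanishing moments is easy to observe numerically. The sketch below applies a (2,2)-style prediction step (the 5/3 transform discussed later, here with plain floating-point arithmetic and interior samples only): its $\tilde{N} = 2$ analysis vanishing moments annihilate polynomials of degree 0 and 1, while a quadratic input survives.

```python
def detail_53(x):
    """Detail (high-pass) samples of the (2,2) prediction step,
    y'[n] = x[2n+1] - (x[2n] + x[2n+2]) / 2, interior positions only."""
    return [x[2 * n + 1] - (x[2 * n] + x[2 * n + 2]) / 2
            for n in range((len(x) - 1) // 2)]

ramp = [3.0 + 2.0 * t for t in range(16)]   # degree-1 polynomial: annihilated
quad = [float(t * t) for t in range(16)]    # degree-2 polynomial: survives
d_ramp, d_quad = detail_53(ramp), detail_53(quad)
```

The ramp is annihilated exactly, whereas the quadratic leaves a constant residual of −1, so exactly the polynomials of degree below 2 are killed.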

2.1.3 Discrete Wavelet Transform in Image Compression

For images, filter banks and lifting filters are usually developed for the 1-D case and then extended to the separable 2-D case by a succession of a vertical and a horizontal


1-D filtering. This structure leads to a decomposition with 4 bands per resolution level (figure 2.2). The decomposition may be iterated on the LL band (the vertically and horizontally low-pass filtered band). The bands with high-frequency components (the HL, LH, and HH bands) are not recursively filtered.

One of the advantages of the wavelet transform for data compression is that it tends to compact the input signal energy into a relatively small number of wavelet coefficients. The lowest resolution band coefficients have high energy, while the high-frequency bands represent the majority of the transformed samples. Most high-frequency band coefficients are zero or have low energy. The exceptions are samples lying near strong edges relative to the band orientation. For instance, a vertical edge produces significant wavelet coefficients in the HL band, which is obtained by applying a horizontal high-pass filter; the high-frequency components of the edge are therefore not eliminated. At the same time, the LH band is not affected by such an edge. Equivalent statements are valid for the other bands: in general, horizontal and diagonal edges produce significant coefficients in the LH and HH bands, respectively.

This phenomenon is illustrated in figure 2.3. The 256x256 image crosses¹ (figure 2.3a) is decomposed with the Haar wavelet (figure 2.3b shows the transformed image). The Haar transform analysis filters are $H_0(z) = 2^{-1/2}(1 + z)$ and $H_1(z) = 2^{-1/2}(1 - z^{-1})$. The distribution of the high-energy coefficients in the different bands (the darker and lighter pixels) may be observed in the transformed image.

Therefore, the few high-frequency samples with significant energy in a band are generally clustered together. Besides, there is relevant inter-band statistical dependence. Specifically, high-energy coefficients often cluster around the same location across several scales. These properties are studied in detail in [Liu01].
They are exploited by wavelet-based image coders to achieve excellent compression results. Section 2.5 outlines the strategies followed by these kinds of image coders.
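The band-orientation behavior described above can be reproduced with a minimal separable Haar decomposition. This is a pure-Python sketch with deliberately simple normalization and no boundary handling beyond even-sized inputs; the 8x8 test image and its edge position are illustrative choices.

```python
def haar_2d_level(img):
    """One separable 2-D Haar analysis level on an even-sized image.
    Returns the LL, HL, LH, HH bands (H in first position = horizontal
    high-pass, matching the convention of the text)."""
    s = 2 ** 0.5
    def rows(m):  # pairwise low/high-pass + down-sample along each row
        lo = [[(r[2*i] + r[2*i+1]) / s for i in range(len(r) // 2)] for r in m]
        hi = [[(r[2*i] - r[2*i+1]) / s for i in range(len(r) // 2)] for r in m]
        return lo, hi
    def T(m):  # transpose, to reuse the row filtering for the columns
        return [list(c) for c in zip(*m)]
    L, H = rows(img)                         # horizontal filtering
    LL, LH = (T(b) for b in rows(T(L)))      # vertical filtering of L
    HL, HH = (T(b) for b in rows(T(H)))      # vertical filtering of H
    return LL, HL, LH, HH

# 8x8 test image with a single vertical edge.
img = [[100.0 if c >= 3 else 0.0 for c in range(8)] for r in range(8)]
LL, HL, LH, HH = haar_2d_level(img)
energy = lambda band: sum(v * v for row in band for v in row)
```

The vertical edge shows up only in the HL band; the LH and HH bands stay exactly zero, as argued above.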

2.2 Lifting Scheme

This section introduces the original lifting scheme due to Sweldens, its properties, and its applications (§2.2.1). The polyphase domain analysis in §2.2.2 and §2.2.3 provides the mathematical reason for the structural perfect reconstruction property of the LS and the connection between filter banks and the LS. The section takes a more state-of-the-art review flavor in §2.2.4, which describes the use of wavelets and lifting in the image coding standard JPEG2000, a fact that has prompted the interest in the LS. Space-varying lifting is described in §2.2.5, and §2.2.6 is an introduction to adaptive LS.

¹ Images employed in the experiments throughout this Ph.D. thesis are described in appendix A.

Figure 2.2: On the left, notation for a 3-level 2-D separable wavelet decomposition of an image. Every resolution level has 4 bands: LL, HL, LH, and HH, where L stands for low-pass filtered and H for high-pass filtered. On the right, a decomposition example with the image Lenna.

Figure 2.3: (a) Image “crosses” and (b) its 2-level Haar wavelet transform.

2.2.1 Classical Lifting

The lifting scheme (figure 2.4), formally introduced in [Swe96, Swe97] by W. Sweldens, is a well-known method to create biorthogonal wavelet filters from existing ones. The scheme comprises the following parts:

(a) Input data x0.

(b) Polyphase decomposition (or lazy wavelet transform, LWT) of x0 into two subsignals:
   – An approximation signal x formed by the even samples of x0.
   – A detail signal y formed by the odd samples of x0.

(c) Lifting steps:
   – A prediction P (or dual) lifting step that predicts the detail signal samples using the approximation samples x,
$$y'[n] = y[n] - P(x[n]). \qquad (2.5)$$
   – An update U (or primal) lifting step that updates the approximation signal with the detail samples y',
$$x'[n] = x[n] + U(y'[n]). \qquad (2.6)$$

(d) Output data: the transform coefficients x' and y'.

Possibly, there are scaling factors K1 and K2 at the end of each channel in order to normalize the transform coefficients x' and y', respectively. The inversion of the scheme is straightforward: the same prediction lifting step (PLS) and update lifting step (ULS) are employed, and only the sign of the addition is changed. Finally, the subsignals are merged into the higher-rate signal to recover the original data x0.

The lifting steps improve the properties of the initial lazy wavelet. Alternatively, the input data may be any other wavelet transform (x, y) with some properties to improve. The transform may be a multichannel or M-band decomposition, leading to several detail subsignals y'_i, for i = 1, ..., M − 1. Also, several steps (a prediction followed by an update step, or vice versa) may be concatenated in order to reach the desired properties for the wavelet basis. The prediction and update operators may be a linear combination of x and y, respectively, or any nonlinear operation, since by construction the LS is always reversible. Given a wavelet decomposition that maps the input signal x0 to the output approximation subsignal x' and the output detail subsignal y', a multi-resolution decomposition of x0 is built
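The split/predict/update pipeline and its sign-flipped inversion can be sketched in a few lines. The clamped-boundary P and U below are illustrative 5/3-style choices made for this sketch, not the standard's exact boundary policy; reversibility holds for any P and U by construction.

```python
def lift_forward(x0, P, U):
    """One-level lifting analysis: lazy wavelet split, then the
    prediction (2.5) and update (2.6) steps."""
    x, y = x0[0::2], x0[1::2]                    # lazy wavelet transform
    yp = [y[n] - P(x, n) for n in range(len(y))]
    xp = [x[n] + U(yp, n) for n in range(len(x))]
    return xp, yp

def lift_inverse(xp, yp, P, U):
    """Synthesis: undo the steps in reverse order with opposite signs."""
    x = [xp[n] - U(yp, n) for n in range(len(xp))]
    y = [yp[n] + P(x, n) for n in range(len(yp))]
    x0 = [0.0] * (len(x) + len(y))
    x0[0::2], x0[1::2] = x, y                    # merge the polyphase channels
    return x0

# Illustrative 5/3-style operators; index clamping at the boundaries is a
# simplifying assumption of this sketch.
P = lambda x, n: (x[n] + x[min(n + 1, len(x) - 1)]) / 2
U = lambda y, n: (y[max(n - 1, 0)] + y[n]) / 4

x0 = [7.0, 3.0, 1.0, 8.0, 4.0, 2.0, 9.0, 5.0]
xp, yp = lift_forward(x0, P, U)
rec = lift_inverse(xp, yp, P, U)
```

Reconstruction is exact whatever P and U compute, since each step is undone by subtracting the very same quantity that was added.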

Figure 2.4: Classical lifting scheme.

by the concatenation of lifting decomposition blocks on the approximation subsignal, exactly as the recursive filter bank tree structure does. Subsignals x'' and y'' are obtained by plugging x' into another decomposition block formed by the lifting steps. The process may be repeated on x'', and so on. Thus, the concatenation of K such blocks yields a K-level wavelet decomposition. A 1-D signal decomposed with a 2-band K-level DWT results in the subsignals
$$x_0 \rightarrow (x', y') \rightarrow (x'', y'', y') \rightarrow \ldots \rightarrow (x^{(K)}, y^{(K)}, \ldots, y'', y').$$
The LS is known as the second-generation wavelet because it has many advantages with respect to the classical construction of wavelets based on the Fourier transform. These advantages are itemized in the following list:

1. Inverse existence. Every lifting step is reversible by structure, so the inverse wavelet transform constructed with lifting steps always exists.

2. Critical down-sampling assured. The initial wavelet is modified with existing samples, so no additional information (redundancy) is added.

3. Direct spatial interpretation of the transform. When constructing a new transform, the lifting structure permits us to consider how the output coefficients of a lifting filter affect the channel being filtered in a quite “visual” manner and without any spectral consideration. The reason is that the lifting structure itself performs a biorthogonal wavelet decomposition regardless of the prediction and update filters. The alternative is to first construct the wavelet through Fourier methods and only then observe how it exactly acts on the signal in the spatial (or time) domain.

4. Computational cost reduction. Asymptotically, lifting reduces the computational cost of the standard filter implementation by one half.


5. Memory savings. In-place lifting computation avoids auxiliary memory requirements, since lifting outputs from one channel may be saved directly in the other channel. Such implementation considerations are explained in [Tau02a].

6. FIR decomposition. Daubechies et al. demonstrated in [Dau98] that every wavelet transform with FIR filters can be decomposed into a finite number of lifting steps.

7. Boundary extensions. Lifting significantly reduces the number of special cases in the boundary treatment w.r.t. filter banks. Also, a lifting implementation does not require explicit signal extension at boundaries.

Furthermore, the LS has many applications in the wavelets field:

• Construction of new wavelets.

• Improvement of existing wavelets.

• Wavelet construction on irregular grids. For instance, non-separable lifting on quincunx-sampled images has been developed [Gou00].

• Easy replacement of linear filters by nonlinear/morphological filters (e.g. [Hei00]).

• Design of space-varying and adaptive decompositions (e.g. [Ger00]).

LS flexibility is exploited in video coding applications, as outlined in §2.3.3. Lately, it has found applications such as the coding of multi-view images [Ana05], the coding of 3-D mesh data [Hon05], or even the construction of a multiscale manifold representation from noisy point clouds [Cho05]. Other signal processing applications, such as reversible image rotation and lossless data embedding in images [Kam05], have also appeared. This Ph.D. dissertation is centered on the more “classical” applications of the LS in the context of image coding: the analysis and improvement of existing wavelets (§3.4.3 and §3.3.2, respectively), the construction of new linear transforms (§3.3.3), wavelet construction on irregular grids (§3.4.4), the design of adaptive decompositions (§4.3), and the creation of new nonlinear decompositions in chapter 5.
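The K-level recursion described at the beginning of this subsection (re-applying the lifting block to the approximation channel) can be sketched as follows; the predict/update pair is an illustrative 5/3-style choice, and boundary clamping is a simplifying assumption.

```python
def lift_level(x0):
    """One lifting block (illustrative 5/3-style predict/update with
    clamped boundaries, as a stand-in for any P/U pair)."""
    x, y = x0[0::2], x0[1::2]
    yp = [y[n] - (x[n] + x[min(n + 1, len(x) - 1)]) / 2
          for n in range(len(y))]
    xp = [x[n] + (yp[max(n - 1, 0)] + yp[n]) / 4
          for n in range(len(x))]
    return xp, yp

def dwt(x0, K):
    """K-level DWT: re-apply the block to the approximation channel,
    x0 -> (x', y') -> (x'', y'', y') -> ... Coarsest detail comes first."""
    approx, details = list(x0), []
    for _ in range(K):
        approx, d = lift_level(approx)
        details.append(d)
    return approx, details[::-1]   # (x^(K), y^(K), ..., y'', y')

sig = [float(n % 8) for n in range(32)]
aK, ds = dwt(sig, 3)
```

For a length-32 input and K = 3, the output is a length-4 approximation plus detail signals of lengths 4, 8, and 16, i.e. the critically sampled pyramid of the text.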

2.2.2 Polyphase Characterization of Perfect Reconstruction

The polyphase domain analysis of filter banks permits us to obtain PR conditions on the filters and naturally leads to the lifting scheme as a built-in PR decomposition. The lazy wavelet or polyphase transform separates odd and even samples. The two polyphase components xe [n] = x[2n] and xo [n] = x[2n + 1] are obtained through a delay and two downsamplings (figure 2.5). The z-transform of x[n] can be expressed as the z-transform of the

Figure 2.5: Polyphase decomposition or lazy wavelet transform, followed by the inverse process.

Figure 2.6: Noble multi-rate identities: (top) first Noble identity for down-sampling and (bottom) second Noble identity for up-sampling.

polyphase components:
$$X(z) = X_e(z^2) + z^{-1} X_o(z^2).$$
The filters are also split in the same way (with the convenient delay for the odd samples):
$$H_0(z) = H_{0e}(z^2) + z H_{0o}(z^2), \qquad H_1(z) = H_{1e}(z^2) + z H_{1o}(z^2),$$
$$G_0(z) = G_{0e}(z^2) + z^{-1} G_{0o}(z^2), \qquad G_1(z) = G_{1e}(z^2) + z^{-1} G_{1o}(z^2).$$
Filter $H_e(z^2)$ (resp. $H_o(z^2)$) contains the even (resp. odd) samples of the impulse response of $H(z)$ interpolated with zeros, one zero after each coefficient. Such zero-padded filters allow the interchange of down-sampling and filtering, thus reaching an equivalent structure. This is known as the first Noble multi-rate identity. Figure 2.6 depicts the two Noble identities. The first identity is applied to the even and odd channels. The process is shown in the analysis part of figures 2.7 and 2.8. By these means, down-sampling is performed first and is followed by the filter $H_e(z)$ (resp. $H_o(z)$). The structure in figure 2.8 is computationally more efficient but mathematically equivalent to the structures in figures 2.1 and 2.7. The transform coefficients x' and y' may be expressed as a function of the polyphase components of the input signal and filters:
$$X'(z) = H_{0e}(z) X_e(z) + H_{0o}(z) X_o(z),$$
$$Y'(z) = H_{1e}(z) X_e(z) + H_{1o}(z) X_o(z).$$
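A quick numeric check of the first Noble identity: filtering with H(z) and then down-sampling equals filtering the polyphase components at the low rate and summing them with the appropriate one-sample offset. The filter and signal values below are arbitrary example data.

```python
def conv(a, b):
    """Plain polynomial (full) convolution."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def filter_then_down(h, x):
    """Direct form: filter with H(z), keep every second output sample."""
    return conv(h, x)[0::2]

def polyphase_form(h, x):
    """Polyphase form: split filter and signal first, filter at the low
    rate; the odd branch carries the one-sample (z^-1) offset."""
    ye = conv(h[0::2], x[0::2])
    yo = conv(h[1::2], x[1::2])
    out = [0.0] * max(len(ye), len(yo) + 1)
    for i, v in enumerate(ye):
        out[i] += v
    for i, v in enumerate(yo):
        out[i + 1] += v
    return out

h = [1.0, 2.0, 3.0, 4.0]            # arbitrary example filter
x = [5.0, 1.0, 4.0, 1.0, 3.0, 2.0, 6.0, 0.0]
direct, poly = filter_then_down(h, x), polyphase_form(h, x)
```

Both paths produce identical samples, but the polyphase form only ever multiplies at the down-sampled rate, which is the computational advantage exploited by figure 2.8.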

Figure 2.7: Polyphase structure of the filter bank before applying the first Noble identity.

Figure 2.8: Polyphase characterization of a filter bank.

Using matrix notation, the two previous equations are expressed as
$$\begin{pmatrix} X'(z) \\ Y'(z) \end{pmatrix} = \underbrace{\begin{pmatrix} H_{0e}(z) & H_{0o}(z) \\ H_{1e}(z) & H_{1o}(z) \end{pmatrix}}_{H_p(z)} \begin{pmatrix} X_e(z) \\ X_o(z) \end{pmatrix},$$
where $H_p(z)$ is the analysis polyphase matrix of the filter bank. Filters $H_0(z)$ and $H_1(z)$ are given by
$$\begin{pmatrix} H_0(z) \\ H_1(z) \end{pmatrix} = H_p(z^2) \begin{pmatrix} 1 \\ z \end{pmatrix}.$$
In general, an M-channel filter bank structure is compactly represented by an $M \times M$ component polyphase matrix, in which the element $[H_p(z)]_{i,j}$ is the $j$th polyphase component of the $i$th filter.

The synthesis part of the filter bank may also be described with a polyphase matrix. A reconstructed signal $\hat{X}(z)$ is obtained,
$$\hat{X}(z) = G_0(z) X'(z^2) + G_1(z) Y'(z^2).$$
Expressing the synthesis filters with their polyphase components,
$$\hat{X}(z) = \left( G_{0e}(z^2) + z^{-1} G_{0o}(z^2) \right) X'(z^2) + \left( G_{1e}(z^2) + z^{-1} G_{1o}(z^2) \right) Y'(z^2). \qquad (2.7)$$

The second multi-rate identity (figure 2.6) is applied to (2.7), thus obtaining the equivalent


synthesis structure in figure 2.8. Equation (2.7) written in matrix form is
$$\hat{X}(z) = \begin{pmatrix} 1 & z^{-1} \end{pmatrix} \underbrace{\begin{pmatrix} G_{0e}(z^2) & G_{1e}(z^2) \\ G_{0o}(z^2) & G_{1o}(z^2) \end{pmatrix}}_{G_p(z^2)} \begin{pmatrix} X'(z^2) \\ Y'(z^2) \end{pmatrix},$$
where $G_p(z)$ is the synthesis polyphase matrix of the filter bank. Then, the output signal is related to the input polyphase components through the filter bank polyphase matrices:
$$\hat{X}(z) = \begin{pmatrix} 1 & z^{-1} \end{pmatrix} G_p(z^2) H_p(z^2) \begin{pmatrix} X_e(z^2) \\ X_o(z^2) \end{pmatrix}. \qquad (2.8)$$

PR is attained when the output signal is a delayed and scaled version of the input signal. By inspection of (2.8), it is observed that if the condition
$$G_p(z) H_p(z) = I \qquad (2.9)$$
holds, then the reconstructed signal is $X(z)$, since the structure in figure 2.8 reduces to the LWT of figure 2.5 and thus PR is attained, i.e.,
$$\hat{X}(z) = X_e(z^2) + z^{-1} X_o(z^2) = X(z).$$
The determinants of the polyphase matrices of a PR filter bank are related. The polyphase analysis and synthesis matrices of any two-channel FIR PR filter bank must satisfy
$$\det(H_p(z)) = \alpha z^{-k}, \qquad \det(G_p(z)) = \alpha^{-1} z^{k}, \qquad (2.10)$$
for some arbitrary delay $k \in \mathbb{Z}$ and non-zero $\alpha \in \mathbb{R}$. The proof is straightforward. Since (2.9) must hold, the product of $\det(H_p(z))$ and $\det(G_p(z))$ must be equal to 1. Also, both determinants are finite polynomials in $z$ because the filters are FIR. Consequently, the determinants must be monomials of the form given by (2.10). The determinants are 1 with a proper filter scaling.
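Condition (2.10) can be verified numerically for a concrete filter bank. The sketch below represents Laurent polynomials as exponent-to-coefficient dictionaries and computes det(Hp(z)) for the LeGall 5/3 analysis filters (given later in (2.12)), using the convention H(z) = He(z²) + z·Ho(z²) stated above; the polyphase components were derived by hand for this sketch and are assumptions of it.

```python
def lmul(a, b):
    """Product of Laurent polynomials stored as {exponent: coefficient}."""
    out = {}
    for ea, ca in a.items():
        for eb, cb in b.items():
            out[ea + eb] = out.get(ea + eb, 0.0) + ca * cb
    return {e: c for e, c in out.items() if abs(c) > 1e-12}

def lsub(a, b):
    """Difference of Laurent polynomials, pruning near-zero terms."""
    out = dict(a)
    for e, c in b.items():
        out[e] = out.get(e, 0.0) - c
    return {e: c for e, c in out.items() if abs(c) > 1e-12}

# Hand-derived polyphase components of the LeGall 5/3 analysis filters,
# H0(z) = (-z^-2 + 2z^-1 + 6 + 2z - z^2)/8 and
# H1(z) = z^-1 (-z^-1 + 2 - z)/2, with H(z) = He(z^2) + z Ho(z^2).
H0e = {-1: -1/8, 0: 6/8, 1: -1/8}
H0o = {-1: 1/4, 0: 1/4}
H1e = {-1: -1/2, 0: -1/2}
H1o = {-1: 1.0}

det = lsub(lmul(H0e, H1o), lmul(H0o, H1e))   # det Hp(z) as a Laurent poly
```

The determinant collapses to the single monomial $z^{-1}$, i.e. $\alpha = 1$ and $k = 1$ in (2.10), so the bank is FIR-invertible.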

2.2.3 Polyphase Characterization of Lifting Scheme

Consider a filter bank from figure 2.1 satisfying the PR property. A “new” high-pass filter $H_1^{new}(z)$ is obtained by adding a prediction or dual lifting step after the down-sampling. The new high-pass filter is related to the “old” filter $H_1(z)$ by
$$H_1^{new}(z) = H_1(z) - H_0(z) P(z^2).$$
The high-pass channel is being lifted (improved) with the help of the low-pass channel; the high-pass filter is improved with an appropriate choice of $P$. The polyphase components of the new filter are
$$H_{1e}^{new}(z) = H_{1e}(z) - H_{0e}(z) P(z), \qquad H_{1o}^{new}(z) = H_{1o}(z) - H_{0o}(z) P(z).$$
Therefore, the new and old polyphase matrices are related by
$$H_p^{new}(z) = \begin{pmatrix} 1 & 0 \\ -P(z) & 1 \end{pmatrix} H_p(z).$$
An inverse PLS must be performed in the synthesis part in order to preserve the PR property of the new filter bank. The inverse is trivial, since it is the same matrix with the sign of $P(z)$ changed. The new synthesis polyphase matrix is
$$G_p^{new}(z) = G_p(z) \begin{pmatrix} 1 & 0 \\ P(z) & 1 \end{pmatrix}.$$
PR is preserved simply because
$$G_p^{new}(z) H_p^{new}(z) = G_p(z) \begin{pmatrix} 1 & 0 \\ P(z) & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -P(z) & 1 \end{pmatrix} H_p(z) = G_p(z) H_p(z) = I.$$
A similar procedure is done in order to lift the properties of the low-pass channel: the update or primal lifting step. In the polyphase domain, the ULS is an upper triangular matrix with a positive $U(z)$ at analysis and with a negative sign at synthesis:
$$H_p^{new}(z) = \begin{pmatrix} 1 & U(z) \\ 0 & 1 \end{pmatrix} H_p(z), \qquad G_p^{new}(z) = G_p(z) \begin{pmatrix} 1 & -U(z) \\ 0 & 1 \end{pmatrix}.$$
As said above, several lifting steps may be concatenated. The most frequent choice for the filters to begin the LS is the LWT, i.e., the polyphase decomposition with $H_p(z) = I$ and $G_p(z) = I$. In fact, it was shown in [Dau98] that any FIR wavelet may be decomposed into a finite number of lifting steps, an initial LWT, and a magnitude scaling:
$$H_p(z) = \begin{pmatrix} K_1 & 0 \\ 0 & K_2 \end{pmatrix} \prod_{i=1}^{m} \left\{ \begin{pmatrix} 1 & U_i(z) \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -P_i(z) & 1 \end{pmatrix} \right\},$$
$$G_p(z) = \prod_{i=m}^{1} \left\{ \begin{pmatrix} 1 & 0 \\ P_i(z) & 1 \end{pmatrix} \begin{pmatrix} 1 & -U_i(z) \\ 0 & 1 \end{pmatrix} \right\} \begin{pmatrix} \frac{1}{K_1} & 0 \\ 0 & \frac{1}{K_2} \end{pmatrix}. \qquad (2.11)$$

This decomposition corresponds to the factorization of the polyphase matrix into elementary matrices. A lifting step becomes an elementary matrix, that is, a triangular matrix (lower or upper) with all diagonal entries equal to one. A well-known result in matrix algebra states that any matrix with polynomial entries and determinant one can be factored into such elementary matrices.


The crucial point is that the determinant of a triangular matrix with all diagonal elements equal to one is one, so its inverse always exists, independently of the value or form of the non-diagonal matrix entry. The inverse is simply obtained by changing the sign of the non-diagonal element. Therefore, regardless of the lifting step added to the filter bank, the determinant of the resulting analysis polyphase matrix is not changed, and an inverse step may be added to the synthesis polyphase matrix.

The proof of the existence of the factorization in [Dau98] relies on the Euclidean algorithm, which can be used because the z-transform of an FIR filter is a Laurent polynomial. Concretely, the filter $H(z) = \sum_{i=k_0}^{k_1} h_i z^{-i}$ is a Laurent polynomial in $z$ of order $|H| = k_1 - k_0$, $z^{-k}$ being a polynomial of order 0. The set of all Laurent polynomials with real coefficients has a commutative ring structure. In general, exact division within a ring is not possible; however, division with remainder is possible for Laurent polynomials. Polyphase matrix entries are Laurent polynomials, so they also form a ring structure. If the determinant of such a matrix is a monomial, then the matrix is invertible (2.10). Thus, the Euclidean algorithm can be applied for the decomposition of the polyphase matrix. However, the long division of Laurent polynomials is not necessarily unique, so various decompositions are possible for the same filter bank, i.e., a wavelet transform has several lifting versions.

The selection of the polyphase matrix factorization has practical relevance because the finite-precision representation of the lifting filters and coefficients affects performance. Quantization deviates the lifting implementation from the theoretical transform properties.
Various factorization criteria have been envisaged: the minimum number of lifting steps, the ratio between the lifting coefficients of maximum and minimum magnitude [Ada98], the closeness of the scaling factor K to one [Cal98], or the minimum nonlinear iterated graphic function [Gra02], which measures the difference between the wavelet function and the quantized LS impulse response. The latter criterion seems to perform best.

2.2.4 Lifting in JPEG2000-LS

JPEG2000 [ISO00] is an ISO/ITU-T image compression standard. JPEG2000 is a wavelet-based image coder largely derived from the EBCOT coder (§2.5.3) that achieves excellent lossy and lossless results. It has several interesting features, like the support of different types of scalability (§2.5.1). The wavelet transform in JPEG2000 is computed via the lifting scheme. The JPEG2000 choice for the lossy-to-lossless compression algorithm (JPEG2000-LS) is the DWT known as LeGall 5/3, spline 5/3, or (2,2). The low- and high-pass analysis filters have 5 and 3 taps, respectively. It was introduced by D. Le Gall [Gal88] in the subband coding domain, seeking short symmetric kernels for PR image coding purposes. Cohen, Daubechies, and Feauveau [Coh92] developed families of biorthogonal transforms involving linear phase filters using


Fourier arguments. The shortest biorthogonal scaling and wavelet functions with 2 regularity factors (or vanishing moments) at analysis and synthesis, denoted (2,2), are attained with the filter bank proposed by Le Gall. Indeed, the LeGall 5/3 synthesis scaling function is a linear B-spline, which is the reason for the name spline 5/3. Figure 2.9 shows the scaling and wavelet functions. Sweldens [Swe96] proposed the construction of an entire family of Deslauriers-Dubuc biorthogonal interpolating wavelets via lifting, using 2 steps. The LeGall 5/3 wavelet also belongs to this family. The LeGall 5/3 wavelet analysis low-pass filter $H_0(z)$ and high-pass filter $H_1(z)$ are
$$H_0(z) = \frac{-z^{-2} + 2z^{-1} + 6 + 2z - z^2}{8}, \qquad H_1(z) = z^{-1}\,\frac{-z^{-1} + 2 - z}{2}. \qquad (2.12)$$

For lossless coding, an integer-to-integer transform [Cal98] is preferred. Lifting with a rounding after each step attains this kind of transform straightforwardly. In this way, any FIR filter bank can be implemented as an integer-to-integer transform by placing the rounding operation after each filter and before the addition or subtraction, because of the stated factorization property of FIR filter banks into lifting steps [Dau98]. For instance, the lifting steps
$$P(x[n], x[n+1]) = \left\lfloor \frac{x[n] + x[n+1]}{2} \right\rfloor, \qquad U(y'[n-1], y'[n]) = \left\lfloor \frac{y'[n-1] + y'[n]}{4} \right\rfloor, \qquad (2.13)$$
realize the integer-to-integer transform of the filter bank (2.12). At low bit rates, reversible integer-to-integer transforms and their conventional counterparts often yield results of comparable quality [Ada00]. If the initial wavelet is the LWT, then the low- and high-pass filters are related to the linear prediction and update through
$$H_0(z) = 1 + H_1(z) U(z^2), \qquad H_1(z) = z^{-1} - P(z^2). \qquad (2.14)$$
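The steps (2.13) can be turned into a short reversible implementation. The boundary index clamping below (a symmetric-like extension) is an assumption of this sketch; the normative JPEG2000 boundary rule may differ in detail.

```python
def fwd53(x):
    """Integer-to-integer LeGall 5/3 analysis via the lifting steps
    (2.13), for an even-length integer input."""
    xe, xo = x[0::2], x[1::2]
    d = [xo[n] - (xe[n] + xe[min(n + 1, len(xe) - 1)]) // 2
         for n in range(len(xo))]
    a = [xe[n] + (d[max(n - 1, 0)] + d[min(n, len(d) - 1)]) // 4
         for n in range(len(xe))]
    return a, d

def inv53(a, d):
    """Exact inverse: same floored quantities, opposite signs, reverse
    order, then polyphase merge."""
    xe = [a[n] - (d[max(n - 1, 0)] + d[min(n, len(d) - 1)]) // 4
          for n in range(len(a))]
    xo = [d[n] + (xe[n] + xe[min(n + 1, len(xe) - 1)]) // 2
          for n in range(len(d))]
    x = [0] * (len(xe) + len(xo))
    x[0::2], x[1::2] = xe, xo
    return x

pixels = [5, 3, -7, 2, 100, -41, 0, 8]
a, d = fwd53(pixels)
```

Python's `//` is a floor division, so it matches $\lfloor\cdot\rfloor$ for negative values too; the rounding is undone exactly because the inverse subtracts the identical floored quantity that the analysis added.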

The analysis polyphase matrix of the LeGall 5/3 wavelet is
$$H_p(z) = \begin{pmatrix} K_1 & 0 \\ 0 & K_2 \end{pmatrix} \begin{pmatrix} 1 & \frac{1}{4} z + \frac{1}{4} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -\frac{1}{2} - \frac{1}{2} z^{-1} & 1 \end{pmatrix}.$$

Adopting the convention that the low- and high-pass analysis filters are normalized to have unit gain at ω = 0 and ω = π, respectively, the final scaling factors are K1 = 1 and

Figure 2.9: Synthesis and analysis scaling and wavelet functions for the LeGall 5/3 transform.

$K_2 = -\frac{1}{2}$. The synthesis filters are
$$G_0(z) = \frac{z^{-1} + 2 + z}{2}, \qquad G_1(z) = z\,\frac{-z^{-2} - 2z^{-1} + 6 - 2z - z^2}{8}.$$
Interestingly, the lossless performance is almost independent of whether the normalization is performed or omitted. However, if the scaling factors are omitted, a performance degradation appears in lossy compression because the transform deviates from unitary and thus the information content of the coefficients is not directly related to their magnitude.

JPEG2000 Part 1 supports the LeGall 5/3 wavelet for reversible transformations and the Daubechies 9/7 wavelet [Ant92] as the irreversible transformation for lossy compression purposes. The choice of the LeGall 5/3 wavelet is not casual; it has several interesting mathematical properties, which are analyzed in [Uns03] together with those of the Daubechies 9/7. The LeGall 5/3 wavelet is the shortest symmetric (to avoid boundary artifacts) biorthogonal wavelet with two vanishing moments. In addition, it has been shown that the LeGall 5/3 wavelet has the maximum number of vanishing moments for its support. It may be obtained by factorizing a maximally flat Daubechies or Deslauriers-Dubuc half-band filter [Dau88].


Notice that the order in which the filters are applied (analysis vs. synthesis) is important (figure 2.9): the shortest and most regular basis functions are placed on the synthesis side. This is consistent with the principle of maximizing the approximation power of the representation. Intuitively, the smoothest basis functions should be on the synthesis side in order to minimize perceptual artifacts, since the output is a weighted sum of the synthesis functions. The wavelet transform evaluation work [Ada00] shows that the LeGall 5/3 wavelet fares well considering its very low computational complexity. For images containing a significant amount of high-frequency content, the 5/3 tends to obtain better lossless results than all the other longer transforms considered in that work, often by a remarkable margin. However, there is no single transform that performs consistently better for all images. This motivates the research effort towards signal-adaptive and locally adaptive transforms such as the works reviewed in section 2.3. Since the main part of this Ph.D. thesis work is devoted to lossless compression, the LeGall 5/3 wavelet constitutes the appropriate benchmark for comparisons.

2.2.5 Space-Varying Lifting

Spatial adaptivity is introduced into the lifting structure with the so-called space-varying lifting. This kind of LS chooses a lifting filter at each sample n according to the signal local characteristics (LC), $P(x) = P(x, LC(x))$, $U(y') = U(y', LC(y'))$. In general, there is no need to code side-information to indicate the chosen filter at sample n, since the lifting steps depend on the same samples as in the classical (non-varying) case and so, coder and decoder have the same information available for the filter selection. This is the most significant difference w.r.t. the adaptive lifting of §2.2.6 and the generalized lifting proposed in this Ph.D. thesis in chapter 4. Typically, the operator LC(·) indicates flat zones, edges, or textures. The corresponding filters may vary in many ways. Some simple examples are given in the following list. Further modifications are explained in §2.3.1 and §2.3.2. • Coefficient values may be modified in order to vary the number of vanishing moments [Ser00] or to perform an LMS update of the filter coefficients depending on previous estimation errors [Ger00]. • Filter length [Cla97, Cla03] may be altered to take into account an edge or other structures.

Figure 2.10: Adaptive update lifting step followed by a classical prediction.

The goal is to avoid predicting across such structures, which produces worse results than prediction within a homogeneous region. • Filter type. In [Egg95], the filter is chosen to be linear or morphological in order to obtain a good texture representation.
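The defining property of space-varying lifting, that the decision uses only samples the decoder also has, can be made concrete with a toy predict step. The threshold `T`, the clamped boundary, and the omission of the update step are illustrative simplifications, not taken from any cited scheme.

```python
# Toy space-varying predict step: at each position the predictor is chosen
# from the coarse (even) samples only, so the decoder can repeat the
# choice without any side-information.

T = 8  # illustrative edge threshold

def sv_forward(x):
    s, o = list(x[0::2]), list(x[1::2])
    m = len(s)
    se = lambda i: s[min(i, m - 1)]
    d = []
    for i in range(m):
        a, b = s[i], se(i + 1)
        # large coarse-scale gradient -> likely edge -> short predictor
        pred = a if abs(a - b) > T else (a + b) // 2
        d.append(o[i] - pred)
    return s, d

def sv_inverse(s, d):
    m = len(s)
    se = lambda i: s[min(i, m - 1)]
    x = [0] * (2 * m)
    x[0::2] = s
    for i in range(m):
        a, b = s[i], se(i + 1)
        pred = a if abs(a - b) > T else (a + b) // 2   # identical decision
        x[2 * i + 1] = d[i] + pred
    return x
```

Since the even samples pass through untouched, the inverse recomputes exactly the same per-sample decision and perfect reconstruction is structural.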

2.2.6 Adaptive Lifting

The adaptive LS is a modification of the classical lifting. A simple version was stated in [Pie01b, Pie01a, PP02]; a wider framework and 2-D extensions were then presented in [Hei01, Hei05a, Pie05], and a lossy coding version in [PP03, Pie04, Hei05b]. Figure 2.10 shows an example of an adaptive ULS followed by a fixed prediction. At each sample n, an update operator is chosen according to a decision function D(x[n], y). The crucial point is that D(x[n], y) depends on y, as in the classical and space-varying lifting, but also on the sample being updated. This raises a problem because the decoder does not have the sample x[n] that the coder used to take the decision. The decoder only knows x′[n], which is an updated version of x[n] through an unknown update filter. The challenge is to find a decision function and a set of filters that permit the reproduction of the decision D(x[n], y) at the decoder,

$$D(x[n], y) = D'(x'[n], y), \tag{2.15}$$

thus obtaining a reversible decomposition scheme. This property is known as the decision conservation condition. The range of D may indicate whether there exists an edge at x[n], if D is the l1-norm of the gradient,

$$D: \mathbb{R} \times \mathbb{R}^{k} \to \mathbb{R}^{+}, \qquad (x[n], \mathbf{y}[n]) \mapsto d = \sum_{i} |y_i - x[n]|,$$

or whether x[n] resides in a textured region if a texture detector D is used, like in [Egg95], or


further still, it may indicate other geometrical constraints. Also, the extension to two dimensions allows 2-D structures to be taken into account. Therefore, a filter suited to the signal local characteristics at n (made evident by the function D) is applied at sample n. Typically, long low-pass filters are chosen for smooth regions and short-support filters are selected around edges. A relevant feature of the adaptive scheme is that it does not require any bookkeeping to enable PR, although the filter may vary at each location using non-causal information (i.e., information not available at the decoder). Adaptive lifting is extensively analyzed in chapter 4, which is useful in order to introduce the generalized lifting scheme. Generalized lifting contains all the possible reversible decompositions respecting equation (2.15).
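To see why the decision conservation condition is needed, consider a deliberately *naive* adaptive update whose decision uses the not-yet-updated sample x. The values y = 4 and T = 3 below are arbitrary illustrations; the point is that two different inputs can collide on the same output, so no decoder-side decision D′ can invert the step.

```python
# Counterexample: an adaptive update that violates decision conservation.

def naive_adaptive_update(x, y, T=3):
    # The decision depends on x itself, which the decoder never sees.
    return x + y if abs(x - y) < T else x

y = 4
preimages = {}
for x in range(-20, 21):
    preimages.setdefault(naive_adaptive_update(x, y), []).append(x)

# Outputs reachable from more than one input: the step is not invertible.
collisions = {xp: xs for xp, xs in preimages.items() if len(xs) > 1}
```

Here x′ = 10 is produced both by x = 6 (updated) and x = 10 (left untouched). Adaptive lifting schemes restrict D and the filter set precisely so that such collisions cannot occur.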

2.3 Review of Lifting Algorithms

This section reviews different approaches for the construction of lifting steps. §2.3.1 describes works that aim to design lifting filters, while §2.3.2 reviews lifting optimization criteria and techniques. Finally, §2.3.3 outlines the use of lifting in video compression. Sections 2.3.1 and 2.3.2 describe proposals in the lifting scheme domain. However, LS as part of wavelet theory may profit from and incorporate many interesting ideas from the frameworks of wavelet bases, discrete transforms, and filter bank design and optimization. The breadth of these fields makes an exhaustive state-of-the-art review impossible. Only some hints of such ideas applicable to lifting are detailed below. • Adaptivity. Lifting offers an easy way to perform space-varying adaptive decompositions. However, adaptivity may be introduced in many different ways. For instance, [Don95] explains an idea for adaptive coding found in several works. A set of block transforms is constructed with an image training set and an LMS-like learning algorithm. First, the input image is transformed with all bases. Afterwards, the bases for which the principal components give minimum MSE are selected for coding. Block transforms are updated for the image being coded. Similar systems may be envisaged using LS. • Topology refers to the number of bands and the tree depth of the filter bank for each band. Many works, such as [Ram96], optimize the filter bank topology instead of the filters themselves. It has also been suggested that merging both optimizations (filters and topology at the same time) should lead to better results, but this seems to be a harder task. • Filter optimization. A myriad of works are devoted to optimizing wavelet filters to attain a certain objective. [Del92] represents a standard approach minimizing the detail signal variance, which is a common criterion. The search for optimal filters is usually limited to a filter


subset and/or to a signal model. In this case, the subset is the orthonormal FIR filters of a given length, and the input signal second-order statistics are considered. The particularity of the proposal is that the algorithm is iterative, since it progressively refines the filter by factoring the orthonormal matrices into rotation matrices. • Optimization criterion. Besides filter optimization techniques and algorithms, it is interesting to focus on the optimization criteria. The usual criteria are variance, entropy, and energy minimization. However, some works propose processes to optimize FB with more direct criteria, such as the bit-rate, a rate-distortion function, or the number of bits required to represent a signal window [Sin93]. All these criteria may also be employed in LS. Other criteria suggested for wavelet system design are the coding gain, the filters' frequency selectivity, the number of vanishing moments, and the smoothness of the synthesis functions.
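Two of the usual criteria above can be computed directly on a detail signal. The sketch below is a generic illustration (not tied to any cited work): the first-order entropy serves as a proxy for the bit-rate of a memoryless entropy coder.

```python
# Detail-signal "goodness" criteria: variance and first-order entropy.
import math
from collections import Counter

def variance(coeffs):
    mu = sum(coeffs) / len(coeffs)
    return sum((c - mu) ** 2 for c in coeffs) / len(coeffs)

def first_order_entropy(coeffs):
    """Entropy in bits/sample of the empirical symbol distribution."""
    n = len(coeffs)
    return -sum(k / n * math.log2(k / n) for k in Counter(coeffs).values())
```

A transform whose prediction drives most detail coefficients to zero lowers both figures, which is why they are used as surrogates for compressed size.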

Many other works, besides the above examples, are applicable to LS, since it is composed of filters with a more or less clear objective. The following two sections, §2.3.1 and §2.3.2, review works directly involved in the construction of lifting steps.

2.3.1 Methods for Lifting Design

Several works presented decomposition structures resembling LS before it was formally introduced by Sweldens in [Swe96]. For instance, [Bru92] introduces a ladder network scheme related to lifting. Also, [Egg95] proposes a lifting-type scheme that switches the high-pass analysis and the low-pass synthesis filter between a linear and a morphological filter according to a texture detector. The result is a biorthogonal filter bank capable of filtering each kind of region with a suitable filter: textures with a linear filter, and homogeneous regions and edges with a morphological one. The half-band type decomposition structure guarantees PR irrespective of the decomposition function and indeed, it is maximally decimated. Equivalent structures are used in other nonlinear subband decompositions such as [Que95, Que98]. Also, [Flo94] describes a decomposition that can be seen as a down-sampling followed by a PLS of the finer-scale signal. The prediction is performed by a nonlinear weighted median. The goal is to limit the aliasing artifacts. The output is a decomposition made up of an approximation and several error signals. [Que98] explains a similar idea that works directly on the 2-D image polyphase components. The interpolative prediction is a hybrid median-mean filter [Pit90]. The resulting quantized transform is useful for lossy compression. In the nonlinear decomposition methods proposed before [Flo96], one of the subsignals is always a simple down-sampling of the original signal. This work presents a more general framework composed of nonlinear elementary stages that has linear filter banks and a sort of lifting

Figure 2.11: Two-level lifted wavelet transform with an update-first structure.

as particular cases. The goal is to produce a reversible and scalable approximate signal while profiting from the advantages of nonlinear filtering. [Ham96] presents one of the first nonlinear subband decompositions for image coding using lifting. A median prediction is proposed in order to reduce the ringing artifacts that are typical of linear filters in lossy coding. [Ham96] is extended in the subsequent work [Ham98], which includes a theoretical analysis of a class of M-band nonlinear decompositions with maximal decimation and PR. An interesting particular case consists of a filter bank formed of injective operators followed by lifting steps. [Cla97] is already fully developed in the lifting framework. A set of linear predictors of different lengths is chosen according to a nonlinear selection function based on an edge detector. This avoids making a prediction based on data separated from the point of interest by a discontinuity. Prediction is preceded by a linear update (figure 2.11) in order to reach a stable transform throughout all resolution levels (since coarse-scale coefficients linearly depend on the original signal) and to maintain a coherent space-frequency interpretation of the updated coefficients. [Cla03] extends the previous work and analyzes the transform reversibility, stability, and frequency characteristics. In addition, [Cla03] employs a 2-D non-separable window to make better prediction filter choices. The scheme is inherently devoted to lossy coding. [Cla98] explains the possibility of applying a median predictor and keeping track of the underlying basis for computing a median update, which attains a low-pass approximate signal if the appropriate constraints are considered. The scheme achieves nonlinear processing and multi-scale properties at the same time. The drawback is that side-information for the tracking is required except in the simplest case. Therefore, the interest of the scheme for compression applications


is reduced. In the reversible integer-to-integer setting, [Ada99] establishes a set of criteria to be met by the transform. The criteria are a minimum of two analysis and synthesis vanishing moments, exceeding a certain coding gain threshold, and low-pass and high-pass spectral restrictions. A systematic search for filters respecting these criteria is proposed among the short filters having powers-of-two or dyadic rational coefficients. The algorithm outputs several already known decompositions and other new low-complexity filters. [Tau99] describes a sequence of nonlinear lifting steps that realizes an orientation-adaptive transform that reduces artifacts near edges. The resulting bit-stream is quite scalable because the edge detector is a function with low susceptibility to quantization errors. A similar idea is proposed in [Ger05, Ger06]. The approximation signal gradient is considered for the choice of the filtering direction of the LeGall 5/3 predictor. A detail sample is predicted through the direction with the smaller gradient among the three possible directions, which are the horizontal, the top-left to bottom-right, and the bottom-left to top-right directions in the 1-D row-wise filtering case. Therefore, this is multi-line filtering, because data from neighboring rows may be employed for a row-wise filtering (the same occurs for the column-wise filtering). Multi-line lifting can be made computationally efficient, as shown in [Tau99]. However, there exists the possibility of update leakage, i.e., information from other rows flows to the approximation signal and thus the subsequent ULS deviates from an anti-aliasing filter. To avoid this problem, the strategy is the same as in [Cla97], that is, the ULS is performed before the PLS. An axiomatic framework of wavelet-type multi-resolution signal decompositions and nonlinear wavelets based on morphological operators and lifting is presented in [Hei00].
The work is not devoted to compression, but the framework is used by [Li02] to create a statistical PLS that implicitly profits from local gradient information. [Abh03a] also uses gradient information extracted from detail coefficients to modify the ULS in order to obtain a low-pass channel with adaptive smoothing that preserves edges and singularities. In [Gou00], symmetrical lifting steps are constructed for quincunx-sampled images. The method applies the McClellan transform to 1-D symmetrical filters to obtain 2-D filters that lie on a quincunx grid. [Kov00] provides a method for building wavelets via lifting in any dimension, for any type of lattice, and with any number of vanishing moments. The construction involves an interpolative prediction based on Neville filters and a running-average update. A practical implementation of Neville filters for a quincunx grid appears in [Zee02]. The package includes nonlinear max-min filters. In [Sun04], a separable linear 2-D lifting is directly applied to the image, instead of applying the usual two 1-D filters on rows and then on columns. Its integer version is slightly different


from the usual filters and shows marginal lossless compression improvements with respect to them. The interest resides in the computational load gains. [Jan04] is an original approach that introduces the adaptation into the LS at the signal splitting stage. Coarse-scale samples are used to insert new samples close to edges. As a consequence, the number of coarse and detail samples is signal-dependent. [Wan05] employs a lifting structure to perform a curved wavelet transform on the image domain. It improves the results of the separable transform in JPEG2000 despite the side-information required to encode the curves along which the transform is computed. [Zha04] introduces the orthogonality restriction into the lifting structure itself, instead of the weaker and usual biorthogonality. A class of IIR orthogonal filter banks is constructed by means of IIR all-pass lifting filters. Lei et al. [Lei05] work on the design of 2-channel linear-phase FB. Linear phase becomes a structural property of the LS, in addition to the usual lifting PR property, through a slight structure modification and a specific decomposition of the analysis polyphase matrix of the FIR FB. This permits an unconstrained optimization of the remaining FB free parameters. This review shows different ways to profit from the flexibility given by the LS. The surveyed works employ the LS degrees of freedom to design space-varying, nonlinear, or 2-D non-separable decompositions, among others. The criteria are quite intuitive and logical in order to obtain good coding results, but none of the reviewed works states any objective criterion to optimize. The proposal in §5.1.1 remains within the philosophy of these approaches.
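Among the designs reviewed above, the gradient-directed prediction of [Ger05, Ger06] is easy to make concrete. The toy sketch below follows the idea described earlier (predict each detail sample along the direction with the smallest even-sample gradient among horizontal and the two diagonals); the boundary clamping and the simple 2-tap average per direction are illustrative assumptions, not the cited implementation.

```python
# Gradient-directed row-wise predict step on a 2-D image with an even
# number of columns. The decision uses even columns only, so the decoder
# can repeat it and the step is reversible without side-information.
import numpy as np

DIRS = [(0, 0), (-1, 1), (1, -1)]   # (row offset left, row offset right)

def _pred(even, r, c):
    R, M = even.shape
    at = lambda rr, cc: int(even[min(max(rr, 0), R - 1), min(cc, M - 1)])
    # pick the direction whose two even neighbours differ the least
    best = min(DIRS, key=lambda d: abs(at(r + d[0], c) - at(r + d[1], c + 1)))
    a, b = at(r + best[0], c), at(r + best[1], c + 1)
    return (a + b) // 2

def dir_forward(img):
    even, odd = img[:, 0::2], img[:, 1::2]
    d = np.array([[int(odd[r, c]) - _pred(even, r, c)
                   for c in range(odd.shape[1])]
                  for r in range(odd.shape[0])])
    return even.copy(), d

def dir_inverse(even, d):
    img = np.zeros((even.shape[0], 2 * even.shape[1]), dtype=int)
    img[:, 0::2] = even
    for r in range(d.shape[0]):
        for c in range(d.shape[1]):
            img[r, 2 * c + 1] = d[r, c] + _pred(even, r, c)
    return img
```

Ties in the direction choice resolve deterministically (first candidate wins), so coder and decoder always agree.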

2.3.2 Methods for Lifting Optimization

[Sai96b] uses an initial S-transform refined by a second PLS. Three possibilities are given to compute an optimal second step: minimum variance (with the Yule-Walker equations), minimum entropy (with the Nelder-Mead simplex algorithm), and a frequency-domain design (which reports the best results). [Yoo02] proposes 4-tap prediction and update filters that are reduced to be functions of one parameter each after applying typical constraints: filter symmetry, zero DC gain of the high-pass filter, zero Nyquist gain of the low-pass filter, prediction coefficients summing to one, and running-average conservation. Therefore, the optimization is reduced to tuning one parameter per lifting step. Several known wavelet transforms may be attained according to the parameter values. The optimization criterion is the weighted first-order entropy [Cal98]. Optimal values are found by an exhaustive search within the parameter intervals. [Dee03] starts from a given LS that is improved by means of a prediction of the wavelet coefficients from those already known. The prediction MSE is minimized by considering the projection of the underlying wavelet vector onto the vector subspace spanned by the causal wavelet vectors and taking

Figure 2.12: Analysis stage of the space-varying LMS filter bank.

into account the signal auto-correlation (a first-order auto-regressive model is used). In [Bou01], the input signal is assumed to be a stationary process with a known auto-correlation and spectral energy density. The prediction coefficients minimizing the MSE are computed, obtaining the filter Fourier transform as a function of the input spectral energy density. The optimal linear predictors are enhanced by directional nonlinear post-processing in the quincunx case and by adaptive-length nonlinear post-processing in the row-column sampling case. The ULS is fixed and linear. In [Ger00], the space-varying decomposition scheme of figure 2.12 is proposed. The prediction is a linear filter,

$$\hat{y}[n] = \sum_{k=-N}^{N} w_{n,k}\, x[n-k],$$

and its initial coefficients $w_{0,k}$ are progressively refined with an LMS-type algorithm according to the prediction error,

$$\mathbf{w}[n+1] = \mathbf{w}[n] + \mu\, \frac{\mathbf{x}_N}{\lVert \mathbf{x}_N \rVert^2}\, e[n].$$

The step size µ of the algorithm depends on the range of the input values. This scheme adaptively optimizes the prediction coefficients and is able to deal with the non-stationarity of images. A similar work is [Tra99], which aims to design a data-dependent prediction with the goal of minimizing the detail signal. This work distinguishes two approaches. One uses a local optimization criterion with space-varying prediction filter coefficients. The other approach is global in the sense that the l1-norm of the entire detail signal is minimized through Wiener filter theory. In [BB02], lifting is applied to multi-component images to eliminate intra- and inter-component correlation. First, the samples involved in the filtering are determined. Then, a band- and resolution-level-weighted entropy is minimized. Numerical optimization methods are used


(Nelder-Mead simplex) because the entropy is an implicit function of the decomposition parameters. The Yule-Walker equations solution is taken as the initial prediction filter. This method requires sending side-information. [Ho99] defines several half-band filters. Each resolution level uses a linear combination of them. The combination weights minimize the sum of the prediction errors over all resolution levels. [Abh03b] selects as lifting filters the Lagrange polynomial interpolators of degrees 0, 1, 2, and 3 (which vanish 0, 1, 2, and 4 signal moments, respectively). Wiener-Hopf equations relating the auto-correlation and the filter coefficients are formulated. The solution gives the MSE-minimizing coefficients. At each sample, the chosen interpolation filter is the one that, applied to the Wiener-Hopf equation using the optimal filter coefficients, gives the minimum MSE. In [Kuz98], initial biorthogonal filters are modified by directly optimizing the ULS coefficients. The trick is that a very precise signal model is considered, since the goal is to compress electrocardiography signals. Lagrange multipliers are used to minimize a cost function that leads to compact-support wavelets. Two restrictions are imposed: the update step should be a high-pass filter, and the detail signal should equal zero for the specified model. In [Fah02], for a concrete signal class, the update coefficients are gradually improved until reaching the minimum MSE. In [BB03], the image is partitioned into disjoint fixed blocks that are classified into several regions. Then, a predetermined pair of lifting steps is chosen for each block according to the local statistics. The global entropy of the resulting pyramidal representation is minimized. Side-information indicating the filter choice is required. [Hat04] extends the previous work by using a quadtree segmentation rule that gives flexibility to the input image block-partitioning stage. Detail signal statistics are modeled to minimize the entropy.
Side-information containing the quadtree structure and the prediction coefficients is needed by the decoder. A further extension of the variable-size block-based procedure to the compression of multi-component images appears in [Hat05]. The quadtree partitioning rule takes into account spatial and spectral redundancies simultaneously. [Gou01] optimizes the prediction filter coefficients on a quincunx grid to minimize the detail signal variance. Then, it optimizes the update coefficients to minimize the quadratic error between the original image and the image reconstructed using a zero detail signal. This principle also aims at offering better resolution and quality scalability. The filter coefficients are transmitted to the decoder in order to proceed with the inverse transform. [Gou04] generalizes [Gou01] to any kind of sampling grid. The filters are linear-phase 2-D FIR. In [Li05], the detail signal energy is minimized. It is expressed as a function of the auto-correlation of the image, of the image difference, or of the image second-order difference. The kind of correlation shown by an image imposes the choice of the prediction among the three design criteria. Once the prediction coefficients are determined,


the ULS that maximizes smoothness according to the Sobolev regularity is selected. Some nonlinear enhancements are proposed in order to improve the linear filter performance. In summary, all the works reviewed in this section employ an optimization criterion to design lifting steps and an optimization technique to reach or find such an optimal step. Common criteria are the variance, the (weighted) entropy, and the prediction MSE of the detail signal coefficients. Bit-rate and rate-distortion functions are used more frequently in filter bank design than in LS. Optimization techniques such as the Yule-Walker or Wiener-Hopf equations and the LMS algorithm are widespread. Often, the objective is a differentiable function and a closed-form solution may be derived. Another resource is the heuristic Nelder-Mead method. The decomposition topology is a variable that may also be modified to optimize any criterion. Assumptions on the input data tend to simplify the design and to yield better results than generic designs. Several proposals throughout the Ph.D. dissertation follow this line to design new lifting steps (e.g., the proposals in §3.3, §5.1.2, and §5.2.2).
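The normalized LMS update quoted earlier for [Ger00], $\mathbf{w}[n+1] = \mathbf{w}[n] + \mu\,\mathbf{x}_N e[n]/\lVert\mathbf{x}_N\rVert^2$, can be sketched in a few lines. The causal 4-tap window, µ = 0.5, and the AR(1) test signal below are illustrative assumptions, not the configuration of the cited work.

```python
# NLMS-refined linear prediction on a strongly correlated test signal.
import random

random.seed(1)
x = [0.0]
for _ in range(1499):                      # AR(1) signal, rho = 0.95
    x.append(0.95 * x[-1] + random.gauss(0.0, 1.0))

w, mu, eps = [0.0] * 4, 0.5, 1e-8
sq_err = sq_sig = 0.0
for n in range(4, len(x)):
    window = x[n - 4:n][::-1]              # x_N: most recent sample first
    e = x[n] - sum(wi * xi for wi, xi in zip(w, window))
    norm = sum(xi * xi for xi in window) + eps
    # w[n+1] = w[n] + mu * x_N * e[n] / ||x_N||^2
    w = [wi + mu * xi * e / norm for wi, xi in zip(w, window)]
    if n >= len(x) - 500:                  # measure on the last 500 samples
        sq_err += e * e
        sq_sig += x[n] * x[n]
```

After adaptation, the residual carries far less energy than the signal itself, which is exactly the detail-minimization goal shared by the reviewed methods.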

2.3.3 Lifting in Video Compression

Subband motion-compensated temporal filtering video codecs have recently attracted attention due to their high compression performance, comparable to that of state-of-the-art hybrid codecs based on the predictive feedback paradigm, and due to their scalability features, which provide superior support for embedded, rate-scalable signal representations. Scalable video encoders work without knowledge of the actual decoded quality, resolution, or bit-rate. Scalability (cf. §2.5.1) is a desirable property for interactive multimedia applications, and it is a requirement to accommodate varying network bandwidths and different receiver capabilities. Scalable video coding also provides solutions for network congestion, video server design, and the protection of transmissions in error-prone environments. Initial subband-based video codecs computed the 3-D decomposition directly through the three spatio-temporal dimensions, leading to poor results w.r.t. standard predictive coding. Currently, subband coding exploits the temporal inter-frame redundancy by applying a temporal wavelet transform in the motion direction over the frames of the video sequence, raising performance to very competitive levels. LS is the considered realization for the adaptive motion-compensated temporal DWT. An early attempt to introduce motion compensation into 3-D subband coding is due to Ohm et al. [Ohm94]. Like in most of the founding works in this domain, the temporal filter is limited to the two-tap Haar filter, which is unsatisfactory. In later works, the temporal transform support is extended by rendering the LeGall 5/3 wavelet applicable. The LeGall 5/3 is widespread in motion-compensated video coding for the temporal filtering part.
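A motion-compensated temporal Haar lifting step between two frames can be sketched as follows. This is a deliberately simplified illustration: a single global integer displacement `d` and circular boundaries are assumed, which sidesteps the disconnected and multiple-connected pixel problem that real block-based motion compensation raises.

```python
# Temporal Haar lifting with global integer motion compensation.
import numpy as np

def mctf_haar_forward(f1, f2, d):
    h = f2 - np.roll(f1, d)        # predict f2 from the motion-shifted f1
    l = f1 + np.roll(h, -d) // 2   # update f1 along the inverse motion
    return l, h

def mctf_haar_inverse(l, h, d):
    f1 = l - np.roll(h, -d) // 2   # undo the update
    f2 = h + np.roll(f1, d)        # undo the prediction
    return f1, f2
```

Because the lifting steps are undone in reverse order with identical integer arithmetic, the temporal transform is exactly reversible regardless of how well the displacement matches the true motion; a bad match only inflates the high-pass frame.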


Many works lean on block-based motion compensation, with fixed or variable block sizes, because of its simplicity and the long experience that has been accumulated around this tool. However, this technique leads to disconnected and multiple-connected pixels, which may produce annoying coding artifacts. Disconnected and multiple-connected pixels are those in the reference frame not used for temporal prediction and those used to predict more than one pixel in the current frame, respectively. This situation adds extra difficulty to the subsequent ULS design. In [Ohm94], disconnected pixels are treated differently from connected pixels in order to maintain reversibility in integer-pixel accurate motion compensation. The use of mesh models for motion compensation may be a solution to the problem of processing these pixels. For example, [Tau94] spatially aligns video frames by arbitrary frame warping before applying the 3-D decomposition. In general, this warping is not invertible and therefore cannot achieve perfect reconstruction. One research stream retains block-based motion estimation and pleads in favor of finding a satisfactory way to process the disconnected and multiple-connected pixels in order to provide high-efficiency video codecs. Many works [Sec01, Meh03, Sec03, Gir05, Til05] are devoted to finding motion-compensated subband decompositions. Efforts have also been devoted to the design and optimization of lifting steps for the video coding application. In lossy subband motion-compensated video coding, Girod et al. [Gir05] propose a prediction minimizing the high-band energy and assert that an ULS designed towards the same goal does not significantly contribute to reducing the global bit-rate. On the other hand, the inspection of the inverse transform reveals that the update greatly impacts the distortion of the reconstructed frame sequences.
Therefore, the ULS is optimized in order to minimize the reconstruction distortion, assuming that the PLS and ULS are linear and that the quantization errors introduced in the low- and high-pass frames are uncorrelated random vectors. The same research line is followed in [Til05], but the assumptions are stronger: all coefficients in the detail frames are quantized with the same quantization step, they are independent and identically distributed, and the error is uncorrelated across the different bands. Then, the reconstruction error variance of an approximation pixel is related to the quantization error variances. Thus, the optimal ULS coefficients are derived by minimizing the reconstruction error. When the update filter coefficients are not restricted to sum to one, the resulting solution is the same as that given in [Gir05]. Simple nonlinear ULS are also presented. The different alternatives lead to similar results. The ULS design is a difficult part of the video coding application, since the incorporation of this step contributes to the scalability properties of the bit-stream and to the decrease of the reconstruction error, but it may jeopardize the compression rate and introduce disturbing artifacts in the low-pass temporal frames if the motion estimation fails. The trade-off is partly surmounted by adaptively varying the update weights according to the high-pass temporal frame energy [Meh03] or by the


related idea of changing the high-pass frames that the ULS employs.

2.4 Other Adaptive, Nonlinear, and Sparse Decompositions

There are many references, not directly related to (nonlinear) lifting but more generally to new adaptive, directional, and sparse decompositions, that are worth mentioning since their goal is related to the proposed schemes. In recent years, a multitude of such representations has appeared: wedgelets [Don97], ridgelets [Don98], bandelets [Pen00], curvelets [Sta02], contourlets, armlets, dual trees, etc. They arise from the observation that in many applications the most interesting parts of a signal are its singularities. The consequent approach is to look for wavelets capable of tracking discontinuities in shape. This idea leads to the construction of functions whose support has a shape adapted to the signal regularity. For example, Donoho [Don97] studied the optimal approximation of particular classes of indicator functions with an overcomplete collection of atoms, called wedgelets. This construction is based on a multi-scale organization of the edge data. Another approach due to the same author is the ridgelets [Don98], which are elongated wavelets especially suited for object discontinuities along straight lines. Similarly, in [Pen00] a family of orthogonal wavelets capable of efficiently representing singularities along regular contours is combined with the Daubechies 9/7 wavelet transform. Such decompositions are called bandelets.

2.5 Wavelet-based Image Coders

This section is mainly devoted to explaining wavelet-based image coders and their particular modular structure, but it also describes some other relevant image coders. The concepts of embedded bit-stream and scalability are introduced in §2.5.1. The JPEG2000 standard is largely derived from the EBCOT coder. The widespread SPIHT coder and the EBCOT coder are described in more detail in §2.5.2 and §2.5.3. This Ph.D. thesis aims to obtain "good" transforms for compression. Two image coders are mainly used to compare transforms: SPIHT without the arithmetic encoder, and EBCOT. Statistics drawn from the transformed coefficients, such as the detail coefficients' mean energy, variance, entropy, or even the number of zeros, may also be indicators of a transform's "goodness". Since a principal focus is lossless compression, a fair final comparison benchmark would seem to be JPEG-LS. For the sake of locating the performance of the proposed algorithms in a more global perspective, JPEG-LS compression results are sometimes given, but it is important to notice the loss of functionalities of

Chapter 2. Discrete Wavelet Transform, Lifting, and Image Coding: An Overview


this standard and that the goal is to compare different transforms, not image or entropy coders.

The era of wavelet-based image coders began with the notable breakthrough of embedded zerotree wavelet (EZW) coding by J. Shapiro [Sha93]. The EZW algorithm was able to exploit the multi-resolution properties of the DWT to produce a computationally simple algorithm with outstanding performance and a quality embedded bit-stream. In this context, embedded means that every prefix of the compressed image bit-stream is itself a compressed bit-stream, but at a lower rate. A number of wavelet coding methods have been proposed since the introduction of EZW. A common characteristic of these methods is that they use fundamental ideas found in the EZW algorithm. Most image coders based on MRA have a similar modular structure:

1. Multi-component decomposition (MCT). Color or multi-spectral images have high correlation among their components, which is reduced by an initial inter-component decomposition.

2. Multi-resolution analysis.

3. Quantization of the transformed coefficients. Common examples of quantization employed in practice are scalar uniform, trellis, and vector quantization. Scalar quantization with deadzone provides an embedded quantization. This means that the intervals of lower rate quantizers are partitioned to yield the intervals of higher rate quantizers. Embedded quantization is very useful to obtain embedded bit-streams of the compressed image.

4. Bit-plane encoding. The array of quantized scalar indices coming from the quantization step is seen as a set of binary arrays or bit-planes, the slices of the indices. The first bit-plane consists of the most significant bit (MSB) of each magnitude. The next bit-plane consists of the second MSB plane, and so on until the bit-plane for the least significant bit (LSB). There is also a bit-plane that consists of the sign bit of each index. Wavelet-based coders have a part dedicated to successively coding each of the bit-planes. EZW's and SPIHT's main contribution is their specific way of bit-plane encoding. Bit-plane encoding appears in JPEG2000 in the so-called tier-1 (part 1 of EBCOT).

5. Entropy coding. Context-based arithmetic coding is the common form of entropy coding applied to bit-planes. The sequence of symbols resulting from the bit-plane coding in EZW is losslessly compressed with the classical context-dependent arithmetic coding in [Wit87]. JPEG2000 uses the more sophisticated JBIG2 MQ-coder.

6. Rate-distortion control. SPIHT cuts the compressed bit-stream at a given point in order to achieve the desired rate or distortion. JPEG2000 tier-2 is devoted to the bit-stream organization with the goal of rate-distortion control, among others.
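The sign-magnitude bit-plane decomposition of step 4 can be sketched in a few lines (an illustrative toy, not any standard's implementation; function and variable names are ours):

```python
def to_bitplanes(indices, nbits):
    """Split signed quantization indices into a sign plane and
    nbits magnitude bit-planes, ordered from MSB down to LSB."""
    signs = [1 if q < 0 else 0 for q in indices]
    mags = [abs(q) for q in indices]
    # plane for bit position i holds ((|q| >> i) & 1) for every index
    return signs, [[(m >> i) & 1 for m in mags] for i in range(nbits - 1, -1, -1)]

signs, planes = to_bitplanes([5, -3, 0, 7], nbits=3)
# magnitudes 5, 3, 0, 7 are 101, 011, 000, 111 in binary,
# so the MSB plane is [1, 0, 0, 1]
```

A bit-plane coder then visits `planes[0]`, `planes[1]`, ... in order, which is what makes prefix truncation of the resulting bit-stream meaningful.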

[Figure 2.13 (block diagram): Original Image → Pre-Processing (MCT) → Discrete Wavelet Transform → Uniform Quantizer with Deadzone → Tier-1 Coding (Context-adaptive Bit-plane Encoder and MQ-coder entropy coder) → Tier-2 Coding (Bit-stream Organization) → Compressed Image.]

Figure 2.13: JPEG2000 block diagram.

In general, these points are a good representation of an MRA-based image coder. Blocks 2 to 5 are present in most coders. The presence of block 1 depends on the target application. Block 6 is quite specific and one of the relevant characteristics of the EBCOT coder (§2.5.3). Quantization is inherently a lossy operation, so it is not performed in lossless coding. Figure 2.13 is an example of a wavelet-based image coder block diagram. The figure shows the fundamental building blocks of the JPEG2000 standard. Scalar quantization with deadzone is a function Q with quantization step Δ that maps an input coefficient x to the quantization index q:

$$q = Q(x) = \begin{cases} \operatorname{sign}(x) \left\lfloor \dfrac{|x|}{\Delta} + \tau \right\rfloor, & \text{if } \dfrac{|x|}{\Delta} + \tau > 0, \\ 0, & \text{otherwise.} \end{cases}$$

Parameter τ controls the width of the central deadzone. The most common values are τ = 1/2, which amounts to a uniform quantizer, and τ = 0, which corresponds to a deadzone width of 2Δ. JPEG2000 uses τ = 0. Assuming that the magnitude of q is represented with N bits, q may be written in sign-magnitude form as $q = Q(x) = s\, q_N q_{N-1} \ldots q_1$, where $q_i$ is a magnitude bit and s is the sign bit. Bit-plane i is formed by the $q_i$ bit of all the transformed coefficients. Embedded bit-streams obtained by means of bit-plane coding were used before the appearance of EZW; bit-plane coding was even included as part of the original JPEG standard [Pen92]. Before the appearance of EZW, coding systems followed a fixed scan pattern, e.g., a zig-zag scan order or a raster scan. Bit-plane coding in EZW permits data-dependent scanning in the form of zero-trees, which allows the coding of large numbers of zeros using very few compressed bits. This idea is revisited and generalized in one of the most widespread methods, the Set Partitioning in Hierarchical Trees (SPIHT) algorithm introduced by Said and Pearlman in [Sai96a].
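As a concrete check of the quantizer formula, a minimal scalar deadzone quantizer in plain Python (parameter names are ours):

```python
import math

def deadzone_quantize(x, delta, tau=0.0):
    """q = sign(x) * floor(|x|/delta + tau) if |x|/delta + tau > 0, else 0.
    tau = 1/2 gives a uniform quantizer; tau = 0 (JPEG2000's choice)
    gives a central deadzone of width 2*delta."""
    t = abs(x) / delta + tau
    if t <= 0:
        return 0
    sign = -1 if x < 0 else 1
    return sign * math.floor(t)

# with delta = 1 and tau = 0, every coefficient in (-1, 1) maps to 0
```

Note that with τ = 0 the interval mapped to index 0 is (−Δ, Δ), i.e., twice as wide as the other quantization cells, which is exactly the deadzone described above.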


SPIHT became very popular since, even without an arithmetic encoder, it was able to achieve equal or better performance than EZW. SPIHT is widely known in the entropy coding field and it is a good choice to compare the compression properties of transforms. It exploits the self-similarity of the wavelet transform across scales by using set partitioning. Wavelet coefficients are ordered into sets using a parent-child relationship and their significance at successively finer quantization levels. The produced bit-stream is SNR scalable. A more detailed explanation of SPIHT is given in §2.5.2. Appendix 5.B describes an extended 3-D version elaborated for the purposes of this Ph.D. thesis. Another coding technique based on wavelet decomposition is the Embedded Block Coding with Optimized Truncation [Tau00] (EBCOT for short), which has been chosen as the basis of the JPEG2000 standard [ISO00, Sko01, Use01]. JPEG2000 is the latest ISO/ITU-T standard for still image coding. A thorough treatment of the standard and the surrounding technology is found in [Tau02a]. The standard is based on the discrete wavelet transform, scalar quantization with deadzone, arithmetic coding with context modeling, and post-compression rate allocation (figure 2.13). Section 2.5.3 explains the foundations of EBCOT (and JPEG2000). MRA coders are the best for lossy compression. However, the competition is harder for lossless coding, where other algorithms perform very well. Possibly, the state-of-the-art lossless coder is still CALIC [Wu97, Say00]. There exist specific coders that outperform CALIC in their target application, such as JBIG2 [ISO99b] for binary images, or PNG [W3C96] and PWC [Aus00] for palette and graphic images. However, for a wide range of images CALIC reports the best results, but at the cost of several functionalities found in JPEG2000 and other standards, such as support for scalability and error resilience. JPEG-LS [ISO99a] is the new ISO/ITU-T standard for lossless coding of still images. It is based on the LOCO-I algorithm [Wei00], a simplified CALIC. JPEG-LS uses adaptive prediction, context modeling, and Golomb coding. It reaches a compression efficiency very close to the best reported results.

2.5.1

Embedded Bit-Stream and Scalability

Any truncation of an embedded code yields a decompressed image with a lower quality or resolution than the original. The application of embedded codes to image compression enables functionalities such as remote browsing and interactive retrieval, and allows different receiver capabilities to be accommodated. Embedding arises from scalable coding. Scalability involves the generation of various layers from a single source or image. The lowest layer provides a basic image representation and the subsequent layers successively enhance and refine the representation. A compressed bit-stream is SNR scalable if the representation progressively improves the decoded image quality, i.e., reduces the distortion. The bit-streams produced by EZW and


Figure 2.14: Parent-child relationships in SPIHT.

SPIHT are SNR scalable. A bit-stream is resolution scalable if it allows the bands of each resolution level to be decoded successively, from the lowest to the highest level. An image coder may produce a bit-stream that is both SNR and resolution scalable (e.g., JPEG2000). In this case, the bit-stream is composed of elements that successively improve the quality of the bands in each resolution level. These elements are parts of the bit-stream that belong to both an SNR scalable and a resolution scalable bit-stream. There also exists spatial scalability, in which the image is decoded in a spatially ordered way. JPEG2000 admits component scalability for multi-component images, too. A linear bit-stream only possesses one kind of scalability. Therefore, when two or more types of scalability are supported at the same time by a bit-stream, additional structural information is required in order to identify the location of the elements that permit the decoder to perform the appropriate kind of decompression according to the target application.

2.5.2

SPIHT coder

SPIHT exploits the dependencies existing among the transform coefficients in different bands. Specifically, coding gain arises from representing with few bits the large regions of zero or near-zero coefficients whose locations are related through scales and bands. SPIHT employs a parent-child relation among the coefficients of bands with the same orientation. For example, a coefficient in the LH3 band is the parent of 4 children in the LH2 band. Coefficients in the HL1, LH1, and HH1 bands have no children. Also, one of every four coefficients in the lowest resolution band has no children. Figure 2.14 shows these parent-child relations. The coefficient marked with a star has no children.


SPIHT is described in terms of bit-plane coding of signed indices arising from a deadzone scalar quantization of the transformed coefficients. A coefficient $c_{i,j}$ is quantized to $q = Q(c_{i,j}) = s\, q_N q_{N-1} \ldots q_1$. Each $q_i$ contributes to a bit-plane. The maximum magnitude over all the coefficients determines the number of bit-planes, $N = \lfloor \log_2 \max_{(i,j)} |c_{i,j}| \rfloor + 1$. SPIHT encodes the bit-planes successively from n = N down to n = 1. The assumption that the descendants of small coefficients tend to also be small is put into practice through the concept of significance. The significance of a coefficient determines whether it is “large” or “small”, which is made precise by comparing coefficients to a series of thresholds $T_n$, for $n = 1, \ldots, N$. The initial threshold is $T_1 = 2^{N-1}$. All coefficients with $|c_{i,j}| \geq 2^{N-1}$ are significant w.r.t. $T_1$, which means that they have a 1 at the MSB, i.e., in bit-plane N. The following thresholds are defined by $T_n = 2^{N-n}$.

A zero-tree is the structure defined to signal that the descendants of a root coefficient are all 0 within a bit-plane. SPIHT employs two types of zero-trees. The first consists of a single root coefficient having all descendants 0 within the given bit-plane. The second type is similar, but excludes the four children of the root coefficient. If all descendants of a given root are insignificant, they comprise an insignificant set of type A. Similarly, insignificant sets are of type B if they do not contain the root's children. SPIHT employs three ordered lists which store the significance information of the different sets and coefficients:

• List of significant coefficients (LSC): contains the coordinates of all coefficients that are significant w.r.t. the given threshold.

• List of insignificant sets (LIS): contains the coordinates of the roots of insignificant sets of coefficients of type A and B.

• List of insignificant coefficients (LIC): contains the coordinates of all the root coefficients that are insignificant, but that do not reside within one of the two types of insignificant sets.

Each bit-plane is coded by a significance pass (also called sorting pass), followed by a refinement pass. The refinement pass codes a refinement bit for each coefficient that was significant at the end of the previous bit-plane, following the order given by the LSC. Coefficients that become significant in the scan of the current bit-plane are not refined until the next bit-plane. Every coefficient in every band is initialized to the insignificant state. Then, the bit in the current bit-plane, starting with $q_N$, is checked for every coefficient in the LIC. If it is one, the coefficient is significant; it is coded together with its sign and moved to the LSC. After that, each set in the LIS is examined, in the order of appearance in the list. If all coefficients in a set in the LIS are insignificant, a single zero is coded for the whole set. If there is a significant coefficient, the set is partitioned and the new sets and roots are sent to the corresponding lists. This process


continues until all the LSBs are coded. However, it may be stopped at any point once a desired rate or distortion is reached, since the resulting bit-stream up to the halting point is guaranteed to be a lower rate-distortion code of the image. Only significance bits are arithmetically coded in SPIHT; no gain is obtained by coding the refinement or sign bits. Spatial dependencies among significance bits are exploited by grouping them in 2×2 blocks. Different coding contexts are used depending on which bits are to be coded. The typical gain of SPIHT with arithmetic coding w.r.t. SPIHT without arithmetic coding is about 0.5 dB.
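The significance machinery described above reduces to a threshold test per pass. A small sketch (helper names are ours; we use the convention N = ⌊log₂ max|c|⌋ + 1 magnitude bit-planes, so the first pass tests the MSB plane):

```python
import math

def num_bitplanes(coeffs):
    """Number of magnitude bit-planes needed for the largest coefficient."""
    m = max(abs(c) for c in coeffs)
    return int(math.floor(math.log2(m))) + 1 if m > 0 else 0

def significance(coeffs, n):
    """Significance of each coefficient w.r.t. the pass-n threshold
    T_n = 2**(N - n): True means the coefficient is 'large' at pass n."""
    T = 2 ** (num_bitplanes(coeffs) - n)
    return [abs(c) >= T for c in coeffs]

coeffs = [9, -3, 1, 0]   # N = 4, so T_1 = 8, T_2 = 4, T_3 = 2, T_4 = 1
```

In SPIHT itself this test is never applied exhaustively; the zero-tree sets let a single coded zero stand for the insignificance of a whole subtree.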

2.5.3

EBCOT coder

JPEG2000 supports several functionalities at the same time. Important examples are spatial, resolution, and SNR scalability, enhanced error resilience, random access, the possibility to encode images with arbitrarily shaped regions of interest, and an open architecture. Many of these functionalities are inherent to wavelets and the EBCOT coder. The EBCOT coder version in the JPEG2000 standard [Tau02b] has some differences w.r.t. the original [Tau00]. The former, the JPEG2000 version [Tau02b], is used in this dissertation. Wavelet coefficients become quantized indices through a deadzone quantization. Then, each bit-plane is split into blocks. A typical block size is 64×64. The coding proceeds from the most significant bit-plane to the least significant bit-plane. Blocks are coded independently using three coding passes: the significance propagation, the refinement, and the clean-up passes. The bit-plane coding procedure provides binary symbols and context labels to the MQ-coder, which is a specific context-based arithmetic coder that converts the input symbols into compressed output bits. This is the JPEG2000 Tier 1, which generates a collection of bit-streams: one independent bit-stream is generated for each code-block, each block bit-stream being embedded. Tier 1 is followed by Tier 2, which multiplexes the bit-streams for their inclusion in the image code-stream and efficiently signals the ordering of the coded bit-plane passes. Tier 2 coded data may be easily parsed. Tier 2 also enables SNR, resolution, spatial, ROI, and arbitrary progression and scalability.

Chapter 3

Linear Lifting Schemes: Interpolative and Projection-based Lifting

3.1

Introduction

Despite the amount of research effort dedicated to the optimization of lifting filters (cf. §2.3.2), many works [Li05, Ger05, Hat05] keep appearing that contribute ideas to improve existing lifting schemes with new optimization criteria and algorithms. Certainly, there is room for contributions, especially in space-varying, signal-dependent, and adaptive lifting. Even in the linear setting, there are several ideas that have not been studied enough. This chapter aims to propose, describe, analyze, and experimentally test linear LS. The chapter is divided into two approaches, different but intimately related. The first one (§3.2) is based on adaptive quadratic interpolation. The method follows the line of work established by Muresan and Parks in [Mur04]. The main objective of [Mur04] is distant from the wavelet domain: the goal is the interpolation of images for digital cameras with any rational degree of zooming. Here, the principal idea of that work is taken up again and further developed in such a way that it serves to create a variety of interpolative PLS. The second approach (§3.3) is a projection-based construction of lifting steps, which has some similarities to that of Deever and Hemami [Dee03] in its initial development. The name projection-based refers to the interpretation of the simplest prediction step arising from the approach: the optimal result minimizes the projection error of the wavelet basis vector onto the subspace spanned by the scaling basis vectors. New prediction and update lifting steps are derived using this method. The interpolation-based approach is suited for the construction of space-varying schemes, since it may be locally adaptive. The method searches for a (local) optimal interpolation. Meanwhile, the projection-based approach seems more suited for signal-class adapted lifting construction: the filter is optimized for a certain class of images and then it is employed whenever an image belonging to the class is coded.


The advances and results arising from one scheme may be applied to the other. This is possible because, in spite of the different point of departure of each approach, the underlying mathematics are essentially the same in both cases. However, the information mainly flows one way: from the interpolation to the projection-based approach. Experiments concerning the construction of linear lifting steps employing both approaches are elaborated and described in section 3.4. Finally, a chapter summary and some conclusions are provided in section 3.5.

Chapter Notation

The notation for this chapter differs slightly from the rest of the dissertation. Let l (≅ x0) and h (≅ y0) be the scaling and wavelet coefficients, respectively. The notation stands for low-pass and high-pass, which is licit because the filters developed in this chapter are linear, even though they are not always strictly band-pass filters. This notation is consistent and makes the exposition clearer. The next chapters address nonlinear LS, and the notation introduced in chapter 2 is then retaken. The multi-resolution decomposition is

$$x \to (l, h) = (l^{(1)}, h^{(1)}) \to (l^{(2)}, h^{(2)}, h) \to \cdots \to (l^{(K)}, h^{(K)}, h^{(K-1)}, \ldots, h).$$

(3.1)

The decomposition has intermediate l and h subsignals not present in the previous multi-resolution representation (3.1). These subsignals are the output of each of the L lifting steps required for the wavelet decomposition. Lifting is defined as the algorithm with the following steps. Super-indices that indicate the resolution level are omitted for conciseness.

(a) Lazy wavelet transform of the input data x into two subsignals:
– An approximation or low-pass signal $l_0$ formed by the even samples of x.
– A detail or high-pass signal $h_0$ formed by the odd samples of x.

(b) Lifting steps, i = 1 . . . L:
– Prediction $P_i$ of the detail signal with the $l_{i-1}$ samples, $h_i[n] = h_{i-1}[n] - P_i(l_{i-1}[n])$.
– Update $U_i$ of the approximation signal with the $h_i$ samples, $l_i[n] = l_{i-1}[n] + U_i(h_i[n])$.

(c) Output data: the transform coefficients $l_L$ and $h_L$.


The output signal $l_L = l^{(1)}$ may be further decomposed. Since the filters are linear, a usual representation of the steps is

$$h_i[n] = h_{i-1}[n] - \mathbf{p}_i^T \mathbf{l}_{i-1}, \qquad l_i[n] = l_{i-1}[n] + \mathbf{u}_i^T \mathbf{h}_i,$$

where $\mathbf{l}_i = \mathbf{l}_i[n]$ and $\mathbf{h}_i = \mathbf{h}_i[n]$ are column vectors containing an appropriate subset of the subsignals centered at sample n. This chapter is dedicated to the optimization of $\mathbf{p}_1$, $\mathbf{p}_2$, $\mathbf{u}_1$, and $\mathbf{u}_L$. Indices are omitted for brevity when they are clear from the context.
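To make steps (a)-(c) concrete, the sketch below runs one predict/update pair using the LeGall 5/3 weights as a stand-in for the optimized p and u of this chapter; the boundary handling by index clamping is our own choice, not the thesis's:

```python
def lifting_analysis(x):
    """One lifting level: lazy split, then a predict and an update step."""
    l0, h0 = x[0::2], x[1::2]                    # lazy wavelet transform
    nl, nh = len(l0), len(h0)
    # prediction: h[n] = h0[n] - (l0[n] + l0[n+1]) / 2
    h = [h0[n] - (l0[n] + l0[min(n + 1, nl - 1)]) / 2 for n in range(nh)]
    # update: l[n] = l0[n] + (h[n-1] + h[n]) / 4
    l = [l0[n] + (h[max(n - 1, 0)] + h[min(n, nh - 1)]) / 4 for n in range(nl)]
    return l, h

def lifting_synthesis(l, h):
    """Inverse lifting: undo the update, undo the prediction, interleave."""
    nl, nh = len(l), len(h)
    l0 = [l[n] - (h[max(n - 1, 0)] + h[min(n, nh - 1)]) / 4 for n in range(nl)]
    x = []
    for n in range(nh):
        x += [l0[n], h[n] + (l0[n] + l0[min(n + 1, nl - 1)]) / 2]
    return x + l0[nh:]                           # trailing even sample, if any

l, h = lifting_analysis([0, 1, 2, 3, 4, 5])
# the predictor is exact on a linear ramp, so the interior details vanish
```

The inverse simply reverses the steps with the signs flipped, which is the structural reason any choice of p and u, linear or not, yields a perfectly invertible transform.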

3.1.1

Convex Optimization Theory

The theory of convex optimization is applied in this chapter. It provides a general framework for solving many constrained optimization problems. The key mathematical reference on the subject is [Roc71]. Two excellent references from a practical implementation perspective with engineering applications are [Ber99] and [Boy04]. The main advantage of convex optimization theory is that closed-form solutions can be found for many problems, under some mild conditions, by applying the Karush-Kuhn-Tucker (KKT) conditions [Boy04, p. 243]. When a closed-form solution does not exist or cannot be found, the solution to a convex optimization problem can always be calculated by applying efficient numerical methods. Currently, there exists a wide range of algorithms and public software packages that solve any kind of convex problem in an admissible period of time. The rest of the section is a brief preliminary on convex optimization, which is useful for the understanding of some mathematical developments throughout the chapter.

A set D is a convex set if the line segment between any two points in the set lies in the set. This may be expressed mathematically as

$$\alpha x_1 + (1 - \alpha) x_2 \in D, \qquad \forall x_1, x_2 \in D,\ \forall \alpha \in [0, 1].$$

Similarly, a real-valued function f is a convex function if its domain D is a convex set and the following inequality holds:

$$f(\alpha x_1 + (1 - \alpha) x_2) \le \alpha f(x_1) + (1 - \alpha) f(x_2), \qquad \forall x_1, x_2 \in D,\ \forall \alpha \in [0, 1],$$

which means that the line segment between $(x_1, f(x_1))$ and $(x_2, f(x_2))$ lies above the graph of f. A general expression of a constrained optimization problem is

which means that the line segment between (x1 , f (x1 )) and (x2 , f (x2 )) lies above the graph of f . A general expression of a constrained optimization problem is minimize x

subject to

f0 (x) fi (x) ≤ 0, hi (x) = 0,

1 ≤ i ≤ m, 1 ≤ i ≤ p,

(3.2)

Chapter 3. Linear Lifting Schemes: Interpolative and Projection-based Lifting

46

which consists in finding the infimum of the function $f_0(x)$ (called the objective function) among all x that simultaneously satisfy the conditions $f_i(x) \le 0$, i = 1, . . . , m, and $h_i(x) = 0$, i = 1, . . . , p. The optimization variable is x. The inequality constraints and inequality constraint functions are $f_i(x) \le 0$ and $f_i(x)$, respectively. Finally, the equality constraints and equality constraint functions are $h_i(x) = 0$ and $h_i(x)$, respectively. The set of points for which the objective and all the constraint functions are defined is called the domain of the optimization problem (3.2). The problem (3.2) is a convex optimization problem if the objective function and the inequality constraint functions are convex, and if the equality constraint functions are affine, i.e., $h_i(x) = a_i^T x + b_i$. This definition implies that the domain of the optimization problem is convex. The optimal value $f^\star$ of the problem is defined as

$$f^\star \triangleq \inf \left\{ f_0(x) : f_i(x) \le 0,\ i = 1, \ldots, m,\ h_i(x) = 0,\ i = 1, \ldots, p \right\}.$$

(3.3)

The KKT conditions are the means that convex optimization theory provides to obtain the optimal solution. A previous requirement is to define the Lagrangian function L associated to the problem,

$$L(x; \lambda, \nu) \triangleq f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x), \qquad (3.4)$$

where $\lambda = (\lambda_1 \cdots \lambda_m)^T$ and $\nu = (\nu_1 \cdots \nu_p)^T$ are the vectors of Lagrange multipliers. In the case that all the functions are differentiable, and under some other technical conditions, a set of expressions has to be fulfilled by any optimal solution $x^\star$ and optimal Lagrangian variables $(\lambda^\star, \nu^\star)$. These expressions are the so-called KKT conditions:

$$\nabla f_0(x^\star) + \sum_{i=1}^{m} \lambda_i^\star \nabla f_i(x^\star) + \sum_{i=1}^{p} \nu_i^\star \nabla h_i(x^\star) = 0,$$
$$h_i(x^\star) = 0, \quad i = 1, \ldots, p,$$
$$f_i(x^\star) \le 0, \quad i = 1, \ldots, m,$$
$$\lambda_i^\star \ge 0, \quad i = 1, \ldots, m,$$
$$\lambda_i^\star f_i(x^\star) = 0, \quad i = 1, \ldots, m.$$

Some of the problems that appear in this chapter are formulated as quadratic programs, in which the objective function is convex quadratic and all the equality and inequality constraint functions are affine. KKT conditions are used to solve them, giving a closed-form solution. Some other problems are reduced to linear programs, in which all functions fi , i = 0, . . . , m, and hi , i = 1, . . . , p, are affine and for which convex optimization provides simple and efficient numerical algorithms to obtain a solution.
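As a worked instance of the KKT conditions on such a quadratic program, consider the minimum-norm problem: minimize ‖x‖² subject to Ax = b. Stationarity (2x + Aᵀν = 0) combined with feasibility gives the closed form x* = Aᵀ(AAᵀ)⁻¹b. A pure-Python sketch (all helper names are ours):

```python
def solve(M, v):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(M)]   # augmented matrix
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def min_norm_solution(A, b):
    """KKT solution of: minimize ||x||^2 subject to A x = b,
    i.e. x* = A^T (A A^T)^{-1} b (A given as a list of rows)."""
    m, n = len(A), len(A[0])
    AAt = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(m)]
           for i in range(m)]
    y = solve(AAt, b)                                  # y = (A A^T)^{-1} b
    return [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]

# minimize x1^2 + x2^2 subject to x1 + x2 = 2  ->  x* = (1, 1)
x = min_norm_solution([[1.0, 1.0]], [2.0])
```

The same pattern, a norm induced by a positive definite matrix minimized under affine constraints, is exactly the structure of the interpolation problems in §3.2.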


3.2


Quadratic Image Interpolation Methods

This section presents an interpolation method based on adaptively determining the quadratic signal class from the local image behavior. The interpolation method has the ability to interpolate by any rational factor and to incorporate properties of the image acquisition device or other external constraints into the algorithm itself. In [Mur04], a quadratic signal model is established and then the interpolation is found by means of optimal recovery theory [Mic76, Mur02]. Our study of this method has revealed that the problem can be reformulated as the minimization of a quadratic function with linear equality constraints. This insight provides us with all the resources and flexibility of convex optimization theory in order to solve the problem. Furthermore, the initial problem statement may be modified in many ways for which convex optimization theory still offers solutions. For instance, equality and inequality constraints on the smoothness of the signal or on its lower and upper bounds may be included. This newly found flexibility is employed in section 3.3 to design lifting steps with criteria different from the usual vanishing moments or spectral considerations. However, an interpolative PLS may also be constructed directly: the interpolation values may be used as the prediction of the detail samples. Alternatively, the interpolation value may serve as a reference point of the image's underlying probability density function for an optimized prediction; the prediction may be considered the peak value of a unimodal symmetric distribution of the sample (cf. §5.1.2).

3.2.1

Quadratic Interpolation

The adaptive interpolation method is based on two steps. First, a set to which the signal belongs, or signal model, is determined. Second, the interpolation that best fits the model given the local signal is found. A quadratic signal class K is defined as

$$K = \{ x \in \mathbb{R}^n : x^T Q x \le \epsilon \}.$$

The choice of a quadratic model is practical because it can be determined easily from a set of training data, and also because an appropriate choice of the matrix Q facilitates the derivation of the optimal interpolation values. The training data is taken from the local features of the image. Alternatively, it is taken from images of the same type or from any image model. Assume that a training set of patches $\mathcal{S} = \{x_1, \ldots, x_m\}$ representative of the local data is given for estimating the local quadratic signal class. Then, matrix Q defines an ellipsoid

$$x^T Q x \le \epsilon$$

(3.5)


that must be representative of the training set $\mathcal{S}$ for some constant ε. In other words, Q must be a matrix such that, whenever an image patch y is similar to the vectors in $\mathcal{S}$, then (3.5) holds for y. Let the matrix S be formed by arranging the image patches in $\mathcal{S}$ as columns, $S = (x_1 \ldots x_m)$, and consider the equation relating the image patch y to the training set $\mathcal{S}$ using a column vector c formed of m weights, $Sc = y$. Vector y lies in the span of the columns of S. Therefore, y is similar to the vectors in $\mathcal{S}$ when c has small energy,

$$\|c\|^2 = c^T c = y^T (SS^T)^{-1} y = y^T Q y \le \epsilon,$$

where Q is the pseudo-inverse of the product matrix $SS^T$. In this sense, good interpolators of y for the quadratic class determined by Q are spanned by the weighting vectors c of energy bounded by some ε. The training set has to be determined. One direct approach to selecting the elements in $\mathcal{S}$ is based on the proximity of their locations to the position of the vector being modeled. In this case, patches are generated from the local neighborhood. For example, in figure 3.1 the center patch $x = (x_{(2,2)}\ x_{(2,3)}\ x_{(2,4)}\ x_{(2,5)}\ x_{(3,2)}\ \ldots\ x_{(5,5)})^T$ may be modeled by the quadratic signal class of the set

$$\mathcal{S} = \left\{ \begin{pmatrix} x_{(0,0)} \\ x_{(0,1)} \\ \vdots \\ x_{(3,3)} \end{pmatrix}, \ \ldots, \ \begin{pmatrix} x_{(4,4)} \\ x_{(4,5)} \\ \vdots \\ x_{(7,7)} \end{pmatrix} \right\},$$

where $\mathcal{S}$ is formed by choosing all the possible 4x4 image blocks in the 8x8 region of the figure.
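A toy numerical check of the class membership test ‖c‖² = yᵀQy, using two-dimensional "patches" so the inverse is closed-form (in general the pseudo-inverse of SSᵀ is used; all names here are ours):

```python
def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def class_energy(S, y):
    """y^T Q y with Q = (S S^T)^{-1}: the minimal ||c||^2 over c with Sc = y.
    Small energy -> y is 'similar' to the training patches (columns of S)."""
    SSt = [[sum(S[i][k] * S[j][k] for k in range(len(S[0]))) for j in range(2)]
           for i in range(2)]
    Q = inv2(SSt)
    return sum(y[i] * Q[i][j] * y[j] for i in range(2) for j in range(2))

S = [[1.0, 0.0, 1.0],   # rows of S; the columns are the training patches
     [0.0, 1.0, 1.0]]   # patches: (1,0), (0,1), (1,1)
# (1,1) matches a training patch while (1,-1) does not,
# so class_energy(S, (1,1)) is the smaller of the two
```

Vectors aligned with the training data thus stay inside the ellipsoid for a smaller ε, which is precisely what makes yᵀQy a usable similarity measure.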

3.2.2

Optimal Quadratic Interpolation

Once the high density class K has been determined, the optimal interpolation vector x can be expressed as the solution of a convex optimization problem (3.2). We are looking for the minimum-energy vector c subject to the constraint that x is a linear combination of the selected image patches. This statement can be formulated as

$$\underset{x,\,c}{\text{minimize}} \quad \|c\|^2 \qquad \text{subject to} \quad Sc = x. \qquad (3.6)$$


Figure 3.1: Local high density image used for selecting S to estimate the quadratic class for the center 4x4 patch (dark pixels are part of the decimated image).

The optimal solution of (3.6) is $x^\star = 0$ and $c^\star = 0$. The information coming from the interpolated signal should be included in the formulation to obtain meaningful solutions. Previous knowledge about x is available, since only some of its components have to be interpolated. Typically, if a decimation by two has been performed in both image directions, then one of every four elements of x is already known. Also, it may be known that the original high density signal has been averaged before the decimation. In both cases, a linear constraint on the data is known and it may be added to the formulation (3.6). The linear constraint is denoted by $A^T x = b$. In the first case, the columns of matrix A are formed by vectors $e_i$, with the one located at the position of the known sample; the corresponding position of vector b has the value of the sample. An illustrative example for the second case is the following: assume that a pixel value is the average of four high density neighbors; then there would be a 1/4 at each of their corresponding positions in a column of A. Whatever the linear constraints, they are included in (3.6) to reach the formulation

$$\underset{x,\,c}{\text{minimize}} \quad \|c\|^2 \qquad \text{subject to} \quad Sc = x, \quad A^T x = b. \qquad (3.7)$$

This formulation is mathematically equivalent to that of [Mur04], but the convex form allows an easier interpretation and resolution, as well as the variety of alternative formulations provided in §3.2.3 and the modification for designing lifting steps explained in §3.3. The solution of problem (3.7) is

$$x^\star = SS^T A (A^T SS^T A)^{-1} b, \qquad (3.8)$$


which is the least-square solution with the quadratic norm determined by SST and the linear constraints AT x = b. The vector ˜ A ˜ T A) ˜ −1 b c? = ST A(AT SST A)−1 b = A(

(3.9)

that minimizes the expected energy of c corresponds to an orthogonal projection of 0 onto the subspace $\bar{S}$ spanned by $\tilde{A} = S^T A$.

The matrix $SS^T$ is symmetric and positive definite, which makes the optimization problem convex. In the example given by figure 3.1, the matrix $SS^T$ is

$$SS^T = \begin{pmatrix} \sum_k x_k^2 & \sum_k x_k x_{k+(0,1)} & \cdots & \sum_k x_k x_{k+(4,4)} \\ \sum_k x_k x_{k-(0,1)} & \sum_k x_k^2 & \cdots & \sum_k x_k x_{k+(4,3)} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_k x_k x_{k-(4,4)} & \cdots & \cdots & \sum_k x_k^2 \end{pmatrix} \propto \hat{R},$$

which is proportional to an estimation of the local image auto-correlation. The proportionality factor is harmless because it appears both in $SS^T$ and in the inverse $(A^T SS^T A)^{-1}$ of equation (3.8), so it cancels out.

The formulation is made "global" by interpreting the image signal as a discrete random process and taking the expectation in (3.8). In this case, the quadratic class is determined by the correlation matrix $R = E[SS^T]$. The corresponding solution is

$$x^\star = RA\,(A^T RA)^{-1} b, \tag{3.10}$$

which is the least-squares solution with the norm determined by R and the constraints $A^T x = b$, i.e., the solution of

$$\min_x\ x^T R^{-1} x \quad \text{s.t.}\quad A^T x = b.$$

Note that R is an auto-correlation matrix, which is symmetric and positive definite, so the optimization problem is still convex.

There are many ways to extract an estimation of the correlation matrix from the image data: the biased or unbiased estimators, with or without pre-windowing, auto-regressive parametric models, etc. The image data may be a region segmented from an image, the entire image, a whole image class, etc. The choice of the estimation method and of the signal data depends on the application at hand. Locally adapted and global interpolative predictions may be constructed with this common formulation. Additional knowledge is easily included thanks to its flexibility.
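As an illustration of the global solution (3.10), the following sketch builds an AR-1 correlation matrix and interpolates the unknown samples of a short 1-D signal from its known (even) samples. The sample values, ρ, and the constraint pattern are illustrative assumptions, not taken from the experiments.

```python
import numpy as np

def ar1_correlation(n, rho):
    """Auto-correlation matrix of an AR-1 process: R[i, j] = rho**|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def global_interpolation(R, A, b):
    """Solution (3.10): x* = R A (A^T R A)^{-1} b, i.e., the least-squares
    solution of min x^T R^{-1} x subject to A^T x = b."""
    RA = R @ A
    return RA @ np.linalg.solve(A.T @ RA, b)

n = 8
R = ar1_correlation(n, rho=0.97)        # AR-1 model in the range of table 3.1
A = np.eye(n)[:, [0, 2, 4, 6]]          # known samples at even positions
b = np.array([10.0, 12.0, 11.0, 9.0])   # their values

x_star = global_interpolation(R, A, b)
# Known samples are reproduced exactly; odd samples are interpolated
# according to the AR-1 statistics.
```

The same routine accepts any positive-definite quadratic class, so the local estimate $SS^T$ of (3.8) can be plugged in place of R directly.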


In the next subsection, several alternative formulations that modify the original one in different ways are proposed. Another consideration is that the proposed interpolation methods give solution vectors that can be seen as new data patches, better in some sense than those originally used by the algorithm. Therefore, these solution vectors may also be provided to a subsequent iteration of the algorithm, thus improving the initial results.

3.2.3

Alternative Formulations

The solution of the initial formulation (3.7) gives a good interpolation, which is optimal in the specified sense. However, the problem statement may be further refined by including more available knowledge, from the local data or from the given application. Knowledge is introduced in the formulation by modifying the objective function or by adding new constraints to the existing ones. Several alternative formulations are described in the following.

3.2.3.1

Signal Bound Constraint

The data from an image are expressed with a certain number of bits, say $n_{\text{bits}}$. Then, assume without loss of generality that the value of any component of x is bounded below by 0 and above by $2^{n_{\text{bits}}} - 1$. This additional constraint is included in the problem statement,

$$\min_{x,c}\ \|c\|^2 \quad \text{s.t.}\quad Sc = x,\; A^T x = b,\; 0 \le x \le (2^{n_{\text{bits}}} - 1)\cdot\mathbf{1}, \tag{3.11}$$

where $\mathbf{0}$ ($\mathbf{1}$) is the column vector of the size of x containing all zeros (ones), and the symbol ≤ indicates elementwise inequality. Let us define the set $D = \{x \in \mathbb{R}^n \mid 0 \le x \le (2^{n_{\text{bits}}} - 1)\cdot\mathbf{1}\}$. Notice that (3.11) is a quadratic problem with linear inequality constraints, so it has no closed-form solution. Nevertheless, efficient numerical algorithms and widespread software packages attain the optimal solution very quickly. Moreover, if the closed-form solution (3.10) resides in the bounded domain D, then it is also the optimal solution $x^\star$ of (3.11). If the set of patches and the linear equality constraints have been correctly chosen, then $x^\star$ is almost always in the hypercube D.
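Since (3.11) has no closed form, a numerical solver is needed in the general case. The sketch below poses the bounded problem over the stacked variable z = (x, c) with a generic solver; the patch values and constraints are made-up toy data, and SLSQP is merely one convenient choice among many QP-capable methods.

```python
import numpy as np
from scipy.optimize import minimize

nbits = 8
S = np.array([[10.0,  8.0, 12.0],
              [12.0,  9.0, 11.0],
              [11.0, 10.0, 10.0],
              [ 9.0, 11.0, 10.0]])            # columns are local patches
n, m = S.shape
A = np.eye(n)[:, [0, 2]]                      # constraints: x0 = b0, x2 = b1
b = np.array([10.0, 11.0])

def objective(z):                             # ||c||^2
    c = z[n:]
    return c @ c

constraints = [
    {"type": "eq", "fun": lambda z: S @ z[n:] - z[:n]},  # Sc = x
    {"type": "eq", "fun": lambda z: A.T @ z[:n] - b},    # A^T x = b
]
bounds = [(0.0, 2**nbits - 1)] * n + [(None, None)] * m  # 0 <= x <= 255

res = minimize(objective, np.zeros(n + m), method="SLSQP",
               bounds=bounds, constraints=constraints)
x_star, c_star = res.x[:n], res.x[n:]
```

For this tiny convex problem the solver converges in a few iterations; for image-sized problems a dedicated QP solver would be preferable.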


3.2.3.2


Weighted Objective

Another refinement of (3.7) is to weight vector c in order to give more importance to the local signal patches that are closer to x. Closer patches are supposed to be more alike than farther ones. The formulation is

$$\min_{x,c}\ \|\tilde{W}c\|^2 \quad \text{s.t.}\quad Sc = x,\; A^T x = b,$$

where $\tilde{W}$ is a diagonal matrix whose weighting elements $w_{ii}$ are related to the distance of the corresponding patch (in column i of S) to the patch x. Defining $W = \tilde{W}^T\tilde{W}$, the problem may be reformulated as

$$\min_c\ c^T W c \quad \text{s.t.}\quad A^T S c = b,$$

which is solved using the KKT conditions:

$$\begin{cases} A^T S c - b = 0 \\ 2Wc + S^T A\,\mu = 0 \end{cases} \;\equiv\; \begin{pmatrix} A^T S & 0 \\ 2W & S^T A \end{pmatrix} \begin{pmatrix} c \\ \mu \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix}. \tag{3.12}$$

The matrix on the right-hand side of the equivalence sign in (3.12) is invertible, so it is straightforward to compute the optimal vectors $c^\star$ and $x^\star$:

$$c^\star = W^{-1} S^T A\,(A^T S W^{-1} S^T A)^{-1} b, \tag{3.13}$$
$$x^\star = S W^{-1} S^T A\,(A^T S W^{-1} S^T A)^{-1} b.$$

The solution (3.13) corresponds to the orthogonal projection of 0 onto the subspace spanned by $\tilde{W}^{-1} S^T A$. The initial projection subspace $S^T A$ is modified according to the weight given to each of the patches.

3.2.3.3

Energy Penalizing Objective

A possible modification of (3.7) is to limit the energy of vector x by introducing a penalizing factor in the objective function. The two objectives are merged through a parameter γ that balances their importance. The resulting formulation is

$$\min_{x,c}\ \gamma\|\tilde{W}c\|^2 + (1-\gamma)\|x\|^2 \quad \text{s.t.}\quad Sc = x,\; A^T x = b, \tag{3.14}$$

which is equivalent to

$$\min_{x,c}\ \begin{pmatrix} c^T & x^T \end{pmatrix} \begin{pmatrix} \gamma W & 0 \\ 0 & (1-\gamma)I \end{pmatrix} \begin{pmatrix} c \\ x \end{pmatrix} \quad \text{s.t.}\quad \begin{pmatrix} 0 & A^T \\ S & -I \end{pmatrix} \begin{pmatrix} c \\ x \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix}. \tag{3.15}$$


The variables to minimize are c and x, and all the constraints are linear equalities, so the KKT conditions can be established. If there are l linear constraints due to $A^T x = b$ and m local patches in S, then the resulting linear system derived from the KKT conditions is

$$\begin{pmatrix} 0_{l\times m} & A^T & 0_{l\times l} & 0_{l\times n} \\ S & -I_n & 0_{n\times l} & 0_{n\times n} \\ 2\gamma W & 0_{m\times n} & 0_{m\times l} & S^T \\ 0_{n\times m} & 2(1-\gamma)I_n & A & -I_n \end{pmatrix} \begin{pmatrix} c \\ x \\ \mu \end{pmatrix} = \begin{pmatrix} b \\ 0_{(2n+m)\times 1} \end{pmatrix},$$

where $\mu \in \mathbb{R}^{l+n}$. The system matrix is invertible if the chosen W is invertible. Then, the solution reduces to

$$x^\star = \begin{cases} A(A^T A)^{-1} b, & \gamma = 0,\\ (I - F^{-1})A\big(A^T(I - F^{-1})A\big)^{-1} b, & 0 < \gamma < 1,\\ SW^{-1}S^T A\,(A^T SW^{-1}S^T A)^{-1} b, & \gamma = 1, \end{cases}$$

where F is introduced to make the expression clearer,

$$F = \frac{1-\gamma}{\gamma}\, SW^{-1}S^T + I.$$

Parameter γ balances the weight of each criterion. If γ = 0, then the solution is the least-squares projection onto the linear subspace defined by the constraints $A^T x = b$. On the other hand, the energy of x has no relevance for γ = 1, and the solution reduces to (3.13). Intermediate solutions are obtained for 0 < γ < 1.

3.2.3.4

Signal Regularizing Objective

An interesting refinement is to include a regularization factor in the objective function. Let us define the differential matrix D, which computes the differences between elements of x. Typically, the rows of D are all zeros except for a 1 and a −1 at the positions of neighboring data, i.e., neighboring samples in a 1-D signal or neighboring pixels in an image. Minimizing the energy of the differences vector Dx leads to smooth interpolations. The new problem statement is

$$\min_{x,c}\ \|\tilde{W}c\|^2 + \delta\|Dx\|^2 \quad \text{s.t.}\quad Sc = x,\; A^T x = b.$$

As before, the expression can be rewritten in a more useful way to derive the KKT conditions. The constraints are the same as in (3.15), while the objective function is

$$\begin{pmatrix} c^T & x^T \end{pmatrix} \begin{pmatrix} W & 0 \\ 0 & \delta D^T D \end{pmatrix} \begin{pmatrix} c \\ x \end{pmatrix}.$$

Therefore, the linear equation system

$$\begin{pmatrix} 0_{l\times m} & A^T & 0_{l\times l} & 0_{l\times n} \\ S & -I_n & 0_{n\times l} & 0_{n\times n} \\ 2W & 0_{m\times n} & 0_{m\times l} & S^T \\ 0_{n\times m} & 2\delta D^T D & A & -I_n \end{pmatrix} \begin{pmatrix} c \\ x \\ \mu \end{pmatrix} = \begin{pmatrix} b \\ 0_{(2n+m)\times 1} \end{pmatrix} \tag{3.16}$$


has to be solved. The system has a unique solution if W and $D^T D$ are invertible matrices. W is a weight matrix chosen to be full rank. However, $D^T D$ is singular as defined, because any constant vector belongs to its kernel (D computes differences, so constants are mapped to zero). It may be made full rank by diagonal loading or by adding a constant row to D. The latter option has the advantage of introducing the energy-weighting factor of (3.14) into the formulation: more or less weight is given to the energy criterion depending on the value of the constant row. Whatever the choice, the optimal solution is

$$x^\star = M(I - F^{-1}M)A\big(A^T M(I - F^{-1}M)A\big)^{-1} b, \tag{3.17}$$

where $M = (D^T D)^{-1}$. In general, F is an invertible matrix, defined as

$$F = \delta S W^{-1} S^T + M.$$
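The singularity of $D^T D$ and the constant-row fix can be checked directly. In the sketch below, the first-difference matrix and the weight of the constant row are illustrative choices.

```python
import numpy as np

n = 6
# First-difference matrix D: each row has -1 and 1 at neighboring positions.
D = np.zeros((n - 1, n))
rows = np.arange(n - 1)
D[rows, rows] = -1.0
D[rows, rows + 1] = 1.0

ones = np.ones(n)
# Constant vectors are in the kernel of D, hence of D^T D:
assert np.allclose(D @ ones, 0)
rank_singular = np.linalg.matrix_rank(D.T @ D)      # n - 1: rank deficient

# Appending a constant row makes D^T D full rank and, at the same time,
# penalizes the constant (DC) component of x, playing a role akin to the
# energy term of (3.14).
eps = 0.1
D_full = np.vstack([D, eps * ones])
rank_full = np.linalg.matrix_rank(D_full.T @ D_full)  # n: invertible
```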

3.2.3.5

$\ell_1$-norm Objective

Norms other than the Euclidean may be considered. The $\ell_1$-norm may be used to force the components of a vector to sum up to a constant, by minimizing the absolute value of the difference between the component sum and the constant. For instance, if the elements of vector c are forced to sum up as close as possible to one, then the energy of the solution is assured to be similar to that of the patches, which is certainly a desired property in the case of stationary signals. The problem statement reduces to

$$\min_{x,c}\ |\mathbf{1}^T c - 1| \quad \text{s.t.}\quad A^T x = b,\; Sc = x \qquad \equiv \qquad \min_{x}\ |\mathbf{1}^T S^{\#} x - 1| \quad \text{s.t.}\quad A^T x = b,$$

where $S^{\#} = (S^T S)^{-1}S^T$. The function seems difficult to optimize, but the problem may be posed as the equivalent linear program

$$\min_{x,t}\ t \quad \text{s.t.}\quad \mathbf{1}^T S^{\#} x - 1 \le t,\quad \mathbf{1}^T S^{\#} x - 1 \ge -t,\quad A^T x = b.$$

The problem is thus reduced to a linear program: it has a linear objective function and linear constraints. A linear program is efficiently solved by many numerical methods, for example the simplex method; linear programs are simple problems within convex optimization theory. The use of $\ell_1$-norms may be combined with the previously proposed refinements for the quadratic case, as long as the added objectives are $\ell_1$-norms. The resulting programs are also linear and thus easily solvable.
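The epigraph reformulation above maps directly onto a generic LP solver. The sketch below uses `scipy.optimize.linprog` with the stacked variable (x, t); the patch matrix, constraint pattern, and values are toy assumptions.

```python
import numpy as np
from scipy.optimize import linprog

S = np.array([[10.0,  8.0],
              [12.0,  9.0],
              [11.0, 10.0],
              [ 9.0, 11.0]])                  # columns are local patches
n = S.shape[0]
A = np.eye(n)[:, [0, 2]]                      # constraints: x0 = b0, x2 = b1
b = np.array([10.0, 11.0])

S_pinv = np.linalg.solve(S.T @ S, S.T)        # S# = (S^T S)^{-1} S^T
v = S_pinv.T @ np.ones(S.shape[1])            # so that 1^T S# x = v^T x

# Variable z = (x, t); objective: minimize t.
cost = np.zeros(n + 1)
cost[-1] = 1.0
#  v^T x - 1 <= t   ->   v^T x - t <= 1
#  v^T x - 1 >= -t  ->  -v^T x - t <= -1
A_ub = np.vstack([np.append(v, -1.0), np.append(-v, -1.0)])
b_ub = np.array([1.0, -1.0])
A_eq = np.hstack([A.T, np.zeros((A.shape[1], 1))])
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
              bounds=[(None, None)] * n + [(0.0, None)])
x_star, t_star = res.x[:n], res.x[n]          # t* = |1^T S# x* - 1|
```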


3.3


Projection-based Lifting

This section takes up formulation (3.7) again, adding linear equality constraints due to the wavelet coefficient inner products in order to create lifting steps. The formulation may be used for the construction of locally adapted as well as global interpolative predictions. Remarkably, the same formulation with the new linear equality constraints permits the construction of PLS beyond the first one (§3.3.2) and also of ULS (§3.3.3). Therefore, this common formulation is able to deal with three different problems: interpolation, prediction lifting step design, and update lifting step design. Experiments in §3.4 provide results for the obtained lifting steps.

3.3.1

Wavelet Linear Constraint

The linear constraint on the data may have a different meaning than the ones established in the previous section, where the constraint referred to the specific value of a sample or to the average of several high-density neighbors in order to perform an interpolation. Assume now that the given data is the wavelet decomposition of a signal. Transformed coefficients are the inner products of wavelet (or scaling) basis vectors $w_i$ with the input signal. With this notation, coefficients h[n] and l[n] arise from the products $h[n] = w_{h[n]}^T x$ and $l[n] = w_{l[n]}^T x$, respectively. Then, the linear constraint $A^T x = b$ on the data is constructed in the following way: the columns of matrix A are the wavelet transform basis vectors $w_i$, and the independent term b is formed by the transformed coefficients themselves. Therefore, the formulation

$$\min_{x,c}\ \|c\|^2 \quad \text{s.t.}\quad Sc = x,\; A^T x = b,$$

may also be applied if a set of patches S is available. Indeed, the solution is already known from the previous section; the global formulation solution is $x^\star = RA(A^T RA)^{-1} b$ (3.10).

The linear constraint also allows an alternative way of estimating the signal auto-correlation through the wavelet transform coefficients. Let A be a local wavelet basis and $A^T x = t$, with t a vector containing the transformed coefficients l and h. Then the inverse of A exists; it is the matrix formed by the synthesis wavelet basis vectors. Therefore, the auto-correlation can be expressed as

$$R = E[xx^T] = E[A^{-T} t t^T A^{-1}] = A^{-T} E[t t^T] A^{-1} = A^{-T} R_t A^{-1},$$

with $R_t$ the wavelet transform correlation matrix. Thus, an estimation of $R_t$ may be obtained from the available transform coefficients. This estimation may be used directly in some of the presented solutions, such as (3.10), because of the equality $R_t = A^T R A$; it also avoids computing an inverse matrix twice.
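The identity $R_t = A^T R A$ and its inverse relation can be verified numerically with any invertible local basis. Below, a small orthonormal Haar-like basis and an AR-1 model are used as illustrative assumptions.

```python
import numpy as np

n = 4
idx = np.arange(n)
R = 0.95 ** np.abs(idx[:, None] - idx[None, :])   # AR-1 auto-correlation

# Columns of A: analysis basis vectors of an orthonormal Haar-like transform.
h = np.sqrt(0.5)
A = np.array([[ h,  h,  0,  0],
              [ h, -h,  0,  0],
              [ 0,  0,  h,  h],
              [ 0,  0,  h, -h]])

Rt = A.T @ R @ A                                  # transform-domain correlation
A_inv = np.linalg.inv(A)
R_back = A_inv.T @ Rt @ A_inv                     # R = A^{-T} R_t A^{-1}
# R_back equals R: the auto-correlation may be estimated in either domain.
```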


Now, the goal is to predict (update) a wavelet (scaling) coefficient. These coefficients have already been filtered, so, in a sense, the goal is to improve or refine the lifting steps. First, the prediction case is analyzed in §3.3.2, and afterwards the update case in §3.3.3.

3.3.2

Linear Prediction Steps Construction

A coefficient h[n] is predicted using a set of scaling samples, denoted with some notation abuse by l[n]. For the sake of clarity, index n is sometimes omitted in vectors l[n] and h[n]. The linear constraint independent term vector is $b = l$ and the system matrix is $A = W_l$, where the notation W indicates that the matrix columns are wavelet basis vectors.

Using the established notation, a wavelet coefficient is expressed by $h_1[n] = w_{h_1[n]}^T x$ and it is predicted with a linear filter such that $\hat{h}_1[n] = p_2^T b$. The second prediction step $p_2$ aims at obtaining a value $h_2[n]$,

$$h_2[n] = h_1[n] - \hat{h}_1[n] = h_1[n] - P_2(l_1[n]) = h_1[n] - p_2^T l_1,$$

that improves the properties of the initial detail samples in order to compress them. A key observation is that the coefficients $l_1[n]$ constitute a low-resolution version of the signal that may be interpolated using any of the derivations of §3.2. Therefore, an optimal interpolation $x^\star$ (which is an estimation of x) is used to estimate $h_1[n]$ through the inner product with the known wavelet basis vector $w_{h_1[n]}$. Thus, the estimated coefficient is

$$\hat{h}_1[n] = w_{h_1[n]}^T x^\star. \tag{3.18}$$

The optimal linear prediction filter $p_2$ arises from developing (3.18) using the expression of $x^\star$, if a closed-form solution exists. In the case of (3.10), such a development is simply

$$\hat{h}_1[n] = w_{h_1[n]}^T x^\star = w_{h_1[n]}^T RA\,(A^T RA)^{-1} b = p_2^T b, \tag{3.19}$$

and so the underlying second prediction is

$$p_2^\star = (A^T RA)^{-1} A^T R\, w_{h_1[n]}. \tag{3.20}$$

Interestingly, the optimal prediction filter (3.20), obtained through the optimal interpolation solution (3.7), is the same as the minimum-MSE $p_2$ filter of (3.21), up to the exact choice of matrix R. The MMSE filter minimizes the energy of the second prediction step $h_2[n] = h_1[n] - \hat{h}_1[n]$,

$$p_2^\star = \arg\min_{p_2} f_0(p_2) = \arg\min_{p_2} E\big[(h_1[n] - \hat{h}_1[n])^2\big]. \tag{3.21}$$


The following derivation proves that both optimal prediction filters coincide. First, the objective function is developed,

$$f_0(p_2) = E\big[(h_1[n] - \hat{h}_1[n])^2\big] = E\big[(w_{h_1[n]}^T x - p_2^T b)^2\big] = E\big[w_{h_1[n]}^T x x^T w_{h_1[n]} - 2 w_{h_1[n]}^T x b^T p_2 + p_2^T b b^T p_2\big],$$

and differentiated with respect to $p_2$,

$$\nabla_{p_2} f_0 = E\big[-2 w_{h_1[n]}^T x b^T + 2 p_2^T b b^T\big] = E\big[-2 w_{h_1[n]}^T x x^T A + 2 p_2^T A^T x x^T A\big] = -2 w_{h_1[n]}^T RA + 2 p_2^T A^T RA.$$

Finally, imposing that the derivative equal zero, the optimal prediction filter is found:

$$\nabla_{p_2} f_0(p_2^\star) = 0 \;\Rightarrow\; w_{h_1[n]}^T RA = p_2^{\star T} A^T RA \;\Rightarrow\; p_2^\star = (A^T RA)^{-1} A^T R\, w_{h_1[n]}. \tag{3.22}$$

Once the optimal filter is found, the predicted value is

$$\hat{h}_1[n] = p_2^{\star T} b = w_{h_1[n]}^T RA\,(A^T RA)^{-1} b,$$

which is the wavelet coefficient of the optimal interpolation vector (3.10) on the $w_{h_1[n]}$ basis, i.e., the same expression as equation (3.19). This refinement of the initial PLS produces a coefficient $h_2[n] = h_1[n] - p_2^{\star T} b$ with lower expected energy. As discussed in §2.1.3, energy minimization is a useful criterion for image compression, since wavelet-based image coders like SPIHT and EBCOT owe their performance to the efficient coding of quasi-zero-energy wavelet coefficient sets.

Note that $p_2^\star$ is the filter that minimizes the error of predicting a wavelet basis vector from the other basis vectors under the quadratic norm given by the correlation matrix,

$$p_2^\star = \arg\min_{p_2} \|A p_2 - w_{h_1[n]}\|_R = \arg\min_{p_2}\, (A p_2 - w_{h_1[n]})^T R\, (A p_2 - w_{h_1[n]}).$$

This construction is only a particular case of the approach. Solution (3.10) has been assumed, but the others described in the previous section, which include more available knowledge, may be used to improve results or to construct locally adaptive prediction filters.

Alternatively, a coefficient h[n] may be predicted using a set of scaling samples l[n] plus a set of its causal wavelet coefficients, denoted by $h^c[n]$. Causality is imposed in order to allow synchronization between coder and decoder. Such a technique already appears in [Sai96b], with lossless compression results comparable to transforms with larger support. Including samples of the detail channel in the prediction loop implies that the synthesis filters are IIR and not necessarily linear-phase. In the reversible integer-to-integer case, quantization occurs in many places and quantization errors flow into feedback paths; in consequence, quantization errors may accumulate indefinitely. [Ada00] shows that these types of transforms perform worse in lossy compression, but their results improve as the bit-rate increases. The prediction with feedback is introduced in the interpolative setting by conveniently ordering the matrix and vector components: the linear constraint independent term vector becomes $b^T = (l^T\; h^{cT})$ and the system matrix becomes $A = (W_l\; W_{h^c})$.
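A numerical sketch of the optimal second prediction (3.20)/(3.22): the basis vectors below mimic a 5/3-style lifting (scaling positions as constraints, one detail vector to predict), but the exact vectors, signal length, and ρ are illustrative assumptions. The MMSE property is visible by comparing the prediction-error energy of $p_2^\star$ with that of a perturbed filter.

```python
import numpy as np

n = 5
idx = np.arange(n)
R = 0.97 ** np.abs(idx[:, None] - idx[None, :])      # AR-1 model

# Columns of A: basis vectors of the available scaling coefficients
# (positions 0, 2, 4, taken as simple delta vectors for illustration).
A = np.eye(n)[:, [0, 2, 4]]
# Detail basis vector w_h for position 3: h = x[3] - (x[2] + x[4]) / 2.
w_h = np.array([0.0, 0.0, -0.5, 1.0, -0.5])

p2 = np.linalg.solve(A.T @ R @ A, A.T @ R @ w_h)     # filter (3.20)

def pred_error_energy(p):
    """E[(h - p^T b)^2] with h = w_h^T x, b = A^T x, E[x x^T] = R."""
    d = w_h - A @ p
    return d @ R @ d

# p2 attains the minimum of the quadratic norm ||A p - w_h||_R,
# so any perturbation increases the expected prediction-error energy.
```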

3.3.3

Linear Update Steps Construction

The approach offers considerable design flexibility, and the same type of construction is applied to the ULS. It has been proved that solution (3.10) leads to the solution of problem (3.21). This last expression is now properly modified to derive useful ULS. The new objective functions consider the $\ell_2$-norm of the gradient (in §3.3.3.1 and §3.3.3.2) and the detail signal energy (in §3.3.3.3) in order to obtain linear ULS applicable to a set of images sharing similar statistics.

The gradient as an optimization criterion is hardly found in the literature. It only appears in some works to construct space-varying predictions [Li02, Ger06] and updates [Abh03a], and in the adaptive lifting framework for the construction of ULS (e.g., [Pie01a]). The idea behind the gradient criterion is to obtain a smooth approximation signal that leads to better prediction performance at the subsequent resolution level. The goal is related to the usual running-average-preserving ULS. It should also be pointed out that the objective of the ULS in [Pie01a] or [Abh03a] is the opposite: to preserve salient image structures at the lower resolution level (while low-pass filtering the homogeneous regions). These schemes amount to lower-resolution image representations with more significant information, but this does not necessarily imply better compression.

The objective function in §3.3.3.2 is the $\ell_2$-norm of the gradient of the approximation signal $l_1$ samples, while the objective in §3.3.3.1 is a simplified version that only considers the gradient with respect to the neighbors $l_0$. The consequence of this simplification is twofold: the resulting design is simpler, and it allows the same interpretation relying on the optimal interpolation as in the PLS of (3.19). Thus, the different interpolation types of §3.2 as well as the locally adaptive techniques may be introduced in this first ULS design.
The third objective function (in §3.3.3.3) aims to design a ULS with the same goal as the prediction steps, i.e., energy minimization; in this sense, both lifting steps work in the same direction. Assuming that a coefficient l[n] is updated by means of a set of detail signal samples (denoted by vector h[n]), the linear constraint system matrix is $A = W_h$ and the independent term vector is $b = h$.

3.3.3.1

First Linear ULS Design

In the first design, the objective function is set to be the $\ell_2$-norm of the subtraction between the updated coefficient $l_1[n]$,

$$l_1[n] = l_0[n] + \tilde{l}_1[n] = l_0[n] + u_1^T b,$$

and the set I of neighboring scaling coefficients. Coefficient $l_0[n]$ is updated with the quantity $\tilde{l}_1[n] = u_1^T b$. The goal is to find the optimal $u_1^\star$ such that

$$u_1^\star = \arg\min_{u_1} f_0(u_1),$$

with

$$f_0(u_1) = E\left[\sum_{i\in I} \big(l_0[i] - (l_0[n] + \tilde{l}_1[n])\big)^2\right]. \tag{3.23}$$

The assumption is that this objective function leads to smooth approximate signals that help the prediction to perform better at the next resolution level. The objective function (3.23) is developed,

$$\begin{aligned}
f_0(u_1) &= \sum_{i\in I} E\big[(l_0[i] - (l_0[n] + \tilde{l}_1[n]))^2\big]\\
&= \sum_{i\in I} E\big[(w_{l_0[i]}^T x - w_{l_0[n]}^T x - u_1^T b)^2\big]\\
&= \sum_{i\in I} E\big[w_{l_0[i]}^T x x^T w_{l_0[i]} + w_{l_0[n]}^T x x^T w_{l_0[n]} + u_1^T b b^T u_1\\
&\qquad\quad - 2 w_{l_0[n]}^T x x^T w_{l_0[i]} + 2 w_{l_0[n]}^T x b^T u_1 - 2 w_{l_0[i]}^T x b^T u_1\big],
\end{aligned}$$

and differentiated with respect to $u_1$. Then, the linear constraints are introduced and the definition of the correlation matrix is used,

$$\begin{aligned}
\nabla_{u_1} f_0 &= \sum_{i\in I} E\big[2 u_1^T b b^T + 2 w_{l_0[n]}^T x b^T - 2 w_{l_0[i]}^T x b^T\big]\\
&= 2 \sum_{i\in I} E\big[u_1^T A^T x x^T A + w_{l_0[n]}^T x x^T A - w_{l_0[i]}^T x x^T A\big]\\
&= 2 \sum_{i\in I} \big(u_1^T A^T RA + w_{l_0[n]}^T RA - w_{l_0[i]}^T RA\big),
\end{aligned}$$

and finally the derivative is set equal to zero,

$$\nabla_{u_1} f_0(u_1^\star) = 0 \quad\Leftrightarrow\quad |I|\, u_1^{\star T} A^T RA = -|I|\, w_{l_0[n]}^T RA + \sum_{i\in I} w_{l_0[i]}^T RA.$$


Let us denote the mean of the neighboring approximate-signal basis vectors as

$$w_I = \frac{1}{|I|}\sum_{i\in I} w_{l_0[i]}.$$

Then, the optimal update filter minimizing the local gradient is

$$u_1^\star = (A^T RA)^{-1} A^T R\, (w_I - w_{l_0[n]}), \tag{3.24}$$

and the optimally updated coefficient is

$$l_1[n] = l_0[n] + \tilde{l}_1[n] = l_0[n] + u_1^{\star T} b = l_0[n] + (w_I - w_{l_0[n]})^T RA\,(A^T RA)^{-1} b. \tag{3.25}$$

Interestingly, if $w_I = 0$, then (3.25) computes the minimum $\ell_2$-norm of the gradient of $l_1[n]$ with respect to the zero vector, which is equivalent to minimizing the energy. In this case, the optimal update reduces to the optimal prediction of (3.22). Again, the interpretation relying on the optimal interpolation of x is possible,

$$l_1[n] = l_0[n] + u_1^{\star T} b = l_0[n] + (w_I - w_{l_0[n]})^T x^\star,$$

since it allows the use of additional knowledge for the design. Therefore, interpolation methods that fit the PLS goals are equally useful for the construction of new ULS; a related design is developed in the next section. Note that the proposal is not restricted to the construction of first updates $u_1$: it is also a design for intermediate or final ULS.

3.3.3.2

Second Linear ULS Design

An additional consideration on the set of approximate-signal neighbors I may be included in the previous gradient-minimization design. As each sample in I is also updated, it is interesting to minimize the gradient of $l[n] + \tilde{l}[n]$ with respect to the updated samples $l[i] + \tilde{l}[i]$, $i \in I$, through a still unknown update filter. To this goal, the objective function (3.23) is modified in order to find the optimal update under this criterion. The objective function is

$$f_0(u_1) = E\left[\sum_{i\in I}(l_1[i] - l_1[n])^2\right] = E\left[\sum_{i\in I}\big((l_0[i] + \tilde{l}_1[i]) - (l_0[n] + \tilde{l}_1[n])\big)^2\right].$$

Taking into account that the updated coefficient basis vector is $w_{l_1[i]} = w_{l_0[i]} + A_{l_0[i]} u_1$, with $A_{l_0[i]}$ the constraint matrix relative to the position of sample $l_0[i]$ and $A = A_{l_0[n]}$, the objective function becomes

$$f_0(u_1) = E\left[\sum_{i\in I}\big(w_{l_0[i]}^T x + u_1^T A_{l_0[i]}^T x - w_{l_0[n]}^T x - u_1^T A^T x\big)^2\right],$$


which is expanded to

$$\begin{aligned}
f_0(u_1) = \sum_{i\in I} E\big[& w_{l_0[i]}^T x x^T w_{l_0[i]} + u_1^T A_{l_0[i]}^T x x^T A_{l_0[i]} u_1 + 2 w_{l_0[i]}^T x x^T A_{l_0[i]} u_1\\
&+ w_{l_0[n]}^T x x^T w_{l_0[n]} + u_1^T A^T x x^T A u_1 - 2 w_{l_0[n]}^T x x^T w_{l_0[i]} - 2 w_{l_0[n]}^T x x^T A_{l_0[i]} u_1\\
&+ 2 w_{l_0[n]}^T x x^T A u_1 - 2 w_{l_0[i]}^T x x^T A u_1 - 2 u_1^T A_{l_0[i]}^T x x^T A u_1\big],
\end{aligned}$$

and, introducing the definition of R,

$$\begin{aligned}
f_0(u_1) = \sum_{i\in I} \big[& w_{l_0[i]}^T R w_{l_0[i]} + u_1^T A_{l_0[i]}^T R A_{l_0[i]} u_1 + 2 w_{l_0[i]}^T R A_{l_0[i]} u_1\\
&+ w_{l_0[n]}^T R w_{l_0[n]} + u_1^T A^T R A u_1 - 2 w_{l_0[n]}^T R w_{l_0[i]} - 2 w_{l_0[n]}^T R A_{l_0[i]} u_1\\
&+ 2 w_{l_0[n]}^T R A u_1 - 2 w_{l_0[i]}^T R A u_1 - 2 u_1^T A_{l_0[i]}^T R A u_1\big].
\end{aligned}$$

Differentiating this expression w.r.t. $u_1$ leads to

$$\nabla_{u_1} f_0 = 2 \sum_{i\in I} \big[u_1^T A_{l_0[i]}^T R A_{l_0[i]} + w_{l_0[i]}^T R A_{l_0[i]} + u_1^T A^T RA - w_{l_0[n]}^T R A_{l_0[i]} + w_{l_0[n]}^T RA - w_{l_0[i]}^T RA - 2 u_1^T A_{l_0[i]}^T RA\big].$$

Setting the expression to zero and denoting the means of the different products of basis vectors and matrices as

$$A_I = \frac{1}{|I|}\sum_{i\in I} A_{l_0[i]}, \qquad R_I = \frac{1}{|I|}\sum_{i\in I} A_{l_0[i]}^T R A_{l_0[i]}, \qquad b_I = \frac{1}{|I|}\sum_{i\in I} A_{l_0[i]}^T R\, w_{l_0[i]},$$

the optimal solution is described by

$$u_1^\star = \left[A^T R\,(A - 2A_I) + R_I\right]^{-1}\left[A^T R\,(w_I - w_{l_0[n]}) + A_I^T R\, w_{l_0[n]} - b_I\right]. \tag{3.26}$$

Despite its awkward appearance, this expression is simple to compute in practice, since the only differences w.r.t. (3.24) are the additional terms concerning the means of the neighbors' basis vectors, which are known and fixed if the previous prediction step is a fixed classical PLS.

3.3.3.3

Third Linear ULS Design

A third type of ULS construction is proposed. The PLS is assumed to be the same linear filter through all resolution levels. The objective function is set to be the prediction error energy of the next resolution level. Thus, the prediction filter is employed to determine the basis vectors as well as the subsequent prediction error. The ULS is assumed to be the last of the decomposition. The updated samples $l_L^{(1)}[n]$ are split into even $l_L^{(1)}[2n]$ and odd $l_L^{(1)}[2n+1]$ samples that become the new approximation $l_0^{(2)}[n] = l_L^{(1)}[2n]$ and detail $h_0^{(2)}[n] = l_L^{(1)}[2n+1]$ signals, respectively. For simplicity, L is set to 1 in the following. At the next resolution level, the odd samples are predicted by the even ones, and the ULS design aims at minimizing the energy of this prediction. It is also assumed that the same update filter is used for even and odd samples. Therefore, the objective function is

$$f_0(u_1) = E\big[(l_1[2n+1] - p_1^T\, l_1[2n])^2\big] = E\big[(l_0[2n+1] + \tilde{l}_1[2n+1] - p_1^T(l_0[2n] + \tilde{l}_1[2n]))^2\big].$$

The prediction filter length determines the number of even samples $l_1[2i]$ employed by the prediction. These samples appear in the column vector

$$l_1[2n] = \begin{pmatrix} \vdots \\ l_1[2n] \\ l_1[2n+2] \\ \vdots \end{pmatrix}.$$

With this notation, the objective function results in

$$f_0(u_1) = E\left[\left(w_{l_0[2n+1]}^T x + u_1^T A_{l_0[2n+1]}^T x - p_1^T\left(\begin{pmatrix} \vdots \\ w_{l_0[2n]}^T \\ w_{l_0[2n+2]}^T \\ \vdots \end{pmatrix} x + \begin{pmatrix} \vdots \\ u_1^T A_{l_0[2n]}^T \\ u_1^T A_{l_0[2n+2]}^T \\ \vdots \end{pmatrix} x\right)\right)^2\right].$$





    x +     

.. . T u1 ATl0 [2n] uT1 ATl0 [2n+2] .. .

 2         x  .       

Employing the prediction filter taps $p_1^T = (\,\ldots\; p_{1,i-1}\; p_{1,i}\; p_{1,i+1}\;\ldots)$, the expression is set in summation form,

$$f_0(u_1) = E\left[\left(w_{l_0[2n+1]}^T x + u_1^T A_{l_0[2n+1]}^T x - \sum_i p_{1,i}\, w_{l_0[2(n+i)]}^T x - \sum_i p_{1,i}\, u_1^T A_{l_0[2(n+i)]}^T x\right)^2\right].$$

First, this expression is expanded; then, the definition of R is used; afterwards, the resulting expression is differentiated w.r.t. the vector $u_1$, reaching the expression

$$\begin{aligned}
\nabla_{u_1} f_0(u_1) =\; & 2 u_1^T A_{l_0[2n+1]}^T R A_{l_0[2n+1]} + 2 w_{l_0[2n+1]}^T R A_{l_0[2n+1]}\\
&+ 2 \sum_i \sum_k p_{1,i}\, p_{1,k}\, u_1^T A_{l_0[2(n+i)]}^T R A_{l_0[2(n+k)]} - 2 \sum_i p_{1,i}\, w_{l_0[2(n+i)]}^T R A_{l_0[2n+1]}\\
&+ 2 \sum_i \sum_k p_{1,i}\, p_{1,k}\, w_{l_0[2(n+i)]}^T R A_{l_0[2(n+k)]} - 2 \sum_i p_{1,i}\, w_{l_0[2n+1]}^T R A_{l_0[2(n+i)]}\\
&- 4 \sum_i p_{1,i}\, u_1^T A_{l_0[2(n+i)]}^T R A_{l_0[2n+1]}.
\end{aligned}$$


Finally, it is set equal to zero and the optimal filter is derived. With notation similar to the precedent design,

$$A = A_{l_0[2n+1]}, \qquad w_p = \sum_i p_{1,i}\, w_{l_0[2(n+i)]}, \qquad A_p = \sum_i p_{1,i}\, A_{l_0[2(n+i)]},$$

the optimal update filter is expressed as

$$u_1^\star = \left[A^T R\,(A - 2A_p) + A_p^T R A_p\right]^{-1} (A - A_p)^T R\, (w_p - w_{l_0[2n+1]}). \tag{3.27}$$

The final expression (3.27) is similar to the filter (3.26) obtained in the previous design. However, the optimal filter emerging from this design differs from the previous one, even in the simple case of two taps with prediction $p_1 = (1/2\;\; 1/2)^T$. For larger supports, the difference is more remarkable. These facts are analyzed in the experiments section.

3.4

Experiments

This section describes several experiments that demonstrate practical applications of the developed framework. Some considerations concerning the interpolation methods are reported in §3.4.2. The formulation derived for the linear lifting filters is employed in two ways. First, as a tool to analyze the optimality of existing filters in §3.4.3; the base example is the LeGall 5/3 wavelet, but the same approach is possible for any wavelet filter factorized into lifting steps. Second, in §3.4.4, the formulation is used to enhance LS by improving lifting steps and adding new ones according to image or image-class statistics. The new decompositions are applied to signal coding and image compression.

An estimation or a model of the auto-correlation matrix R is required in the global optimization approaches. Images are assumed to be a first-order (AR-1) or second-order (AR-2) auto-regressive process in most of the experiments. The auto-regressive model is specified in §3.4.1, where the explicit auto-correlation matrix construction is also provided.

3.4.1

Auto-regressive Image Model

An auto-regressive model is a linear model of a discrete process based on the assumption that each value of the process depends only on a weighted sum of the previous values plus noise. Mathematically, an AR-m model for the output x[n] is

$$x[n] = \sum_{i=1}^{m} a_i\, x[n-i] + \eta[n],$$


Image Class   Natural   Synthetic   Texture   Mammo.   SST      MOC
Mean          0.9701    0.9434      0.8569    0.9564   0.9953   0.9936
Std. dev.     0.0710    0.0686      0.1105    0.0479   0.0015   0.0074

Table 3.1: Mean value and standard deviation of the ρ parameter for each image class.

Image Class   Natural   Synthetic   Texture   Mammo.   SST       MOC
a1            1.0688    0.8958      0.6671    0.9374   1.1583    0.7993
a2            -0.0993   0.0676      0.2410    0.0198   -0.1637   0.1949

Table 3.2: Mean value of the AR-2 parameters for each image class.

where $a_i$, $1 \le i \le m$, are the auto-regressive parameters and η is the noise. In the AR-1 case, parameter $a_1$ is usually denoted by ρ and called the auto-correlation coefficient or AR-1 parameter. The auto-regressive parameters may be estimated from the image data through the Yule-Walker equations or the least-squares method, among other techniques. The estimation may be done for a whole image class, for a specific image, or even for a region or line in an image. Furthermore, the AR parameters may be tuned according to the statistics of each filtering direction. The scope of the parameter estimation determines the resulting filter's range of applicability.

In the AR-1 case, matrix R is completely determined by parameter ρ. The mean and standard deviation of ρ for each image class is shown in table 3.1. Appendix A describes the corpus of images employed in the experimental part. Once ρ is obtained, the matrix entries are $[R]_{i,j} = \rho^{|i-j|}$, for $0 \le i,j \le n-1$ and $|\rho| < 1$. The AR-1 parameter is in the range $0.95 < \rho < 1$ for all image classes except textures, for which $\rho \simeq 0.86$.

In the AR-2 case, matrix R is determined by the second-order parameters $a_1$ and $a_2$. The estimated AR-2 parameters for various image classes are found in table 3.2. Matrix R is a Toeplitz matrix, so it is completely specified by its first row. Element $[R]_{1,1}$ is set to 1 and $[R]_{1,2} = \frac{a_1}{1-a_2}$. The recursion

$$[R]_{1,j} = a_1 [R]_{1,j-1} + a_2 [R]_{1,j-2}, \quad j > 2,$$

gives the rest of the row elements.
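The AR-1 and AR-2 correlation matrices described above can be built in a few lines. The class parameters below are the Natural-class values of tables 3.1 and 3.2, and the matrix size is an arbitrary choice.

```python
import numpy as np

def ar1_matrix(n, rho):
    """[R]_{i,j} = rho**|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def ar2_matrix(n, a1, a2):
    """Symmetric Toeplitz R from its first row: r[0] = 1, r[1] = a1/(1 - a2),
    then the recursion r[j] = a1*r[j-1] + a2*r[j-2] (1-based j > 2 in the
    text corresponds to 0-based j >= 2 here)."""
    r = np.empty(n)
    r[0] = 1.0
    r[1] = a1 / (1.0 - a2)
    for j in range(2, n):
        r[j] = a1 * r[j - 1] + a2 * r[j - 2]
    idx = np.arange(n)
    return r[np.abs(idx[:, None] - idx[None, :])]

R1 = ar1_matrix(8, rho=0.9701)             # Natural class, table 3.1
R2 = ar2_matrix(8, a1=1.0688, a2=-0.0993)  # Natural class, table 3.2
# Both matrices are symmetric Toeplitz with unit diagonal.
```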

3.4.2

Interpolation Methods

This part is devoted to a qualitative assessment of the proposed interpolation methods. Three reasons lead to a non-exhaustive experimental setting. First, the power of this type of interpolation is partly known: the approach builds on the work [Mur04], which successfully applies a locally adaptive interpolation equivalent to the optimal solution given by (3.8).


[Figure 3.2: 2-D grid with a 4-band sampling. The L, H1, H2, and H3 samples alternate in a 2×2 pattern.]

Second, since the dissertation addresses the lifting scheme, the emphasis is put on the performance of the new lifting designs in §3.4.3 and §3.4.4. The third reason is a practical one: the proposed quadratic interpolation formulation is very rich and offers many different variants. The number of experiments needed to test all the possible variants is huge, because they should contemplate the points in the list below, which also explains the basic setting for the qualitative assessment.

• The method is able to construct 1-D separable or directly 2-D interpolations. In the latter case, the strategy is to split the input image into four bands, as figure 3.2 shows. Once the samples are partitioned into the so-called L, H1, H2, and H3 bands, the H3 samples are first interpolated using the other three bands, which are included as linear constraints. Then, the H2 samples are interpolated using the L and H1 samples as reference and linear constraints. Finally, the H1 band is interpolated with the L band samples, leading to an approximation signal and three detail signals formed by the three bands of prediction errors. This is a simple way to interpolate while employing much of the available information, and the one used to obtain the results given below. Other strategies may be considered; alternatives are the use of a quincunx grid or the feedback of the prediction error in the interpolation of the H1 and H2 bands.

• As stated, the formulation accepts local and global settings.
  – Global means that the same quadratic class is selected for the whole image. In this case, the image model has to be chosen.
  – For the local adaptive interpolation, the size and support of the local patches have to be selected. In the experiments below, the choices are 4x4 and 8x8, respectively (as in figure 3.1). Furthermore, an initial interpolation is required; different choices exist to this goal, the bi-linear and bi-cubic interpolations being the preferred ones.
Finally, the patches may be extracted from other similar images or images from the same class. • The interpolation method output may be re-introduced in the algorithm as an initial interpolation. The number of iterations may affect the final result and it should be determined.

Chapter 3. Linear Lifting Schemes: Interpolative and Projection-based Lifting


The experiments below do not iterate if nothing else is stated. Usually, one or two iterations improve the initial results, but the performance tends to decrease in the subsequent iterations.

• Six interpolation methods are articulated in section 3.2, each of which may behave differently on each image class.

• In addition, some of the methods are parameter-dependent:
  – The signal regularized and the energy penalizing approaches balance two different objective functions according to a parameter (defined as γ and δ, respectively) that has to be specified.
  – The weighting objective matrix W in (3.13) should be defined by the application or the image at hand. The distance weighting depends on the image type, e.g., a textured image with a repeated pattern requires different weights than a highly non-stationary image.

Clearly, the number of cases is large, but a general trend may be drawn. The interpolation given by (3.8) has a better global behavior than the others; it outperforms the other methods and it reduces the 5/3 wavelet detail signal energy by 5% to 20% for natural, synthetic, and sea surface temperature (SST) images. The results are poorer for the mammography and texture classes. The weighted objective interpolation (3.13) attains results very similar to those of (3.8), being better in some cases. For instance, the interpolation error energy is around 3% smaller for the texture image set. The signal bound constraint (3.11) may be useful for images with a considerable amount of high-frequency content, like the synthetic and SST classes. Some interpolation coefficients outside the bounds appear for these kinds of images, and thus the method rectifies them. However, there is no energy reduction and certainly a computational cost increase w.r.t. (3.8). The signal regularized solution (3.17) performs very well with small values of γ that give a lower weight to the regularizing factor w.r.t. the c vector l2-norm objective.
Interestingly, in the 1-D case and with a difference matrix D relating all the neighboring samples, the objective factor ‖Dx‖² coincides with x^T R^{-1} x, where R is the auto-correlation matrix of an AR-1 process with ρ → 1. Therefore, the signal regularized method may be seen as an interpolation mixing local signal knowledge with an image model. The inclusion of the energy penalizing factor in the formulation did not prove to be useful for the available image set because it damages the final result. Maybe this factor could be considered for highly-varying images, like SAR images, in order to avoid the appearance of extreme values.
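As a sketch of the 4-band partition of figure 3.2, the split may be implemented as follows. The exact band-to-parity assignment is an assumption here, since it depends on the sampling convention of the figure:

```python
def four_band_split(img):
    """Split an image (list of rows) into the four polyphase bands of
    figure 3.2: L, H1, H2, H3, taken here as the even/even, even/odd,
    odd/even, and odd/odd (row/column) samples, respectively."""
    L  = [row[0::2] for row in img[0::2]]
    H1 = [row[1::2] for row in img[0::2]]
    H2 = [row[0::2] for row in img[1::2]]
    H3 = [row[1::2] for row in img[1::2]]
    return L, H1, H2, H3
```

The inverse operation interleaves the four bands back onto the rectangular grid, so the partition is lossless.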


PSNR        Bi-cubic   A - 1 it.   A - 2 it.   B - 1 it.   B - 2 it.   C - 1 it.
Baboon      22.447     22.459      22.395      22.468      22.433      22.522
Barbara     24.647     24.084      23.653      24.253      23.905      24.756
Cheryl      35.299     35.414      35.361      35.433      35.438      33.940
Farm        22.439     22.658      22.736      22.647      22.737      21.416
Girl        34.355     34.538      34.610      34.518      34.586      33.072
Lena        33.973     34.255      34.385      34.226      34.349      32.147
Peppers     32.020     31.724      31.776      31.881      31.839      30.962

Table 3.3: Interpolation PSNR from down-sampled images using the bi-cubic, the basic quadratic interpolation of (3.8) (column denoted by A) and the distance weighted objective (B) with 1 and 2 iterations, and the regularized signal objective (C) with 1 iteration.

PSNR        Bi-cubic   A - 1 it.   A - 2 it.   B - 1 it.   B - 2 it.   C - 1 it.
Baboon      22.356     23.810      23.695      23.717      23.745      23.595
Barbara     24.296     25.653      25.741      25.610      25.753      25.831
Cheryl      32.736     34.161      34.819      34.091      34.759      33.620
Farm        20.539     22.265      22.490      22.176      22.486      21.963
Girl        31.693     33.232      34.034      33.147      33.936      32.762
Lena        30.606     32.107      33.058      32.049      32.960      31.583
Peppers     29.875     31.105      31.573      31.149      31.648      30.775

Table 3.4: Interpolation PSNR from the averaged and down-sampled images using the bi-cubic, the basic quadratic interpolation (A) and the distance weighted objective (B) with 1 and 2 iterations, and the regularized signal objective (C) with 1 iteration.

3.4.2.1 Interpolation Methods PSNR Performance

The interpolation methods are further assessed with the ensuing experiment. The bi-cubic interpolation serves as benchmark and the comparison criterion is the PSNR, defined as

    PSNR = 10 log10( 255^2 / MSE ).

Table 3.3 shows some results concerning the natural images with 512x512 pixels of appendix A. Images are down-sampled by a factor of 2 without an anti-aliasing filter, and then the down-sampled image is re-interpolated using the different methods and numbers of iterations. Their performance tends to be quite similar to that of the bi-cubic interpolation. Certainly, there are many interpolation methods that outperform these ones, but the interest of the proposed methods resides in the inclusion of additional low-pass filtering constraints and in their application to lifting design. The following variation of the experiment resembles the lifting setting more closely. Table 3.4 shows the results for the case in which each pixel is the average of four high-density pixels before the down-sampling. Performance in terms of PSNR is better than that of the bi-cubic interpolation. In addition to the PSNR performance, the resulting interpolated images are less blurry and


sharper around the existing edges if an adequate set of parameters is selected. However, the interpolation goal for this dissertation is the construction of lifting steps. The assumption is made that a good interpolation gives rise to a good PLS as well as a good first-design ULS. Another important aspect is the derivation of closed-form LS, which is a desirable property in most applications due to its significantly lower computational cost. The following sections deal with lifting step performance.
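For reference, the PSNR criterion defined above can be computed as follows (a minimal sketch for 8-bit data, with both images flattened to 1-D sequences):

```python
import math

def psnr(original, reconstructed):
    """PSNR in dB between two 8-bit signals: 10 * log10(255^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return 10.0 * math.log10(255.0 ** 2 / mse)
```

A higher PSNR means a smaller mean squared error; the measure is undefined for identical signals (MSE = 0).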

3.4.3 Optimality Analysis

The LeGall 5/3 wavelet introduced in §2.2.4 is analyzed from the point of view of the optimal lifting steps derived in section 3.3. The LeGall 5/3 low-pass or scaling basis vectors have the form

    wl1[n] = ( . . . 0 −1/8 2/8 6/8 2/8 −1/8 0 . . . )T,

which is equal to the 0 vector except for the locations from 2n − 2 to 2n + 2. Meanwhile, the high-pass or wavelet basis vectors have the form

    wh1[n] = ( . . . 0 0 −1/2 1 −1/2 0 0 . . . )T,

which is the 0 vector except for the positions 2n, 2n + 1, and 2n + 2. The prediction lifting filter is p1 = ( 1/2 1/2 )T and the update lifting filter is u1 = ( 1/4 1/4 )T. In the following, such filters are denoted by pLG and uLG, respectively.
The optimality of uLG is studied in §3.4.3.2 according to the three ULS designs and the AR image model. The ULS are derived for the prediction pLG. For a fair comparison, the proposed ULS employ two neighbors, like the uLG filter. Therefore, in practice the application simply reduces to proposing a coefficient different from 1/4 for the update filter (since it is symmetric). The proposals attain noticeable improvements even in this simple case. Also, several considerations on pLG are formulated in §3.4.3.1. Finally, a second PLS according to expression (3.20) is presented in §3.4.3.3 assuming the initial pLG and uLG. These PLS are optimized for each image class. Some of the resulting basis vectors are given. Results are compared to those of two other known 5/11 transforms. Notice that the type of analysis described in the following may be applied to any existing transform via the lifting scheme.

3.4.3.1 First Prediction Step Study

One of the simplest applications of the stated linear lifting design framework is the prediction of h0[n] with the samples l0[n] and l0[n + 1]. What is the best way of doing such a prediction?


Figure 3.3: Optimal prediction filter as a function of the AR-1 parameter using 2 taps and the design (3.20).

Intuitively, the answer is p1 = pLG, or at most a linear combination of l0[n] and l0[n + 1] with coefficients summing to one. Spectral considerations and vanishing moments also point to the LeGall 5/3 prediction as the best choice. However, the proposed prediction (3.20) gives different answers depending on the auto-correlation matrix R. Basis vectors wl0[n], wh0[n], and wl0[n+1] are composed of zeros except for a 1 at the positions 2n, 2n + 1, and 2n + 2, respectively. These vectors are plugged into (3.20) and the optimal p1 is derived assuming an AR-1 image model. The prediction filter depends on the parameter ρ as figure 3.3 shows. When ρ → 1, the optimal prediction tends to the intuitive pLG. Data is highly correlated, so its projection onto the vectors wl0[n] and wl0[n+1] is informative about the projection onto wh0[n], i.e., about the value of h0[n]. The construction respects symmetry, leading to p1 = pLG. There is no correlation among data when ρ = 0, i.e., when R = I. In this case, there is no information in l0[n] and l0[n + 1] about h0[n], and the expression (3.20) says that any attempt to predict the value h0[n] amounts to an MSE increase: on average, the residual has higher energy. Intermediate coefficient values appear for 0 < ρ < 1. Although these results differ from the usual pLG, the answer given by the proposal is mathematically consistent according to the image model. Therefore, a possible weakness of the approach is found in the image model determined by R. An AR-1 model may be suitable for many image applications, as its wide use in image processing confirms, but certainly not for all. As an example, the predictor arising from the texture image class with ρ ≈ 0.86 is p1 = ( 0.494 0.494 )T, which leads to systematically worse results compared to pLG even though the coefficients are very similar. A more suitable image model seems to be the AR-2.
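The ρ-dependence just described can be reproduced with the classical normal-equations (Wiener) solution for this 2-tap prediction; this standard derivation is consistent with the values reported above, although it is not claimed to be identical to (3.20):

```python
def ar1_two_tap_prediction(rho):
    """Optimal symmetric 2-tap coefficient for predicting x[2n+1] from
    x[2n] and x[2n+2] under an AR-1 model with r[k] = rho**abs(k).
    Normal equations:
        [1       rho**2] [a]   [rho]
        [rho**2  1     ] [a] = [rho]
    whose symmetric solution is a = rho / (1 + rho**2)."""
    return rho / (1.0 + rho ** 2)
```

For ρ → 1 the coefficient tends to the 1/2 taps of pLG, for ρ = 0 it vanishes (no prediction helps), and for the texture-class ρ ≈ 0.86 it gives 0.494, matching the predictor quoted above.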
A signal generated with an AR-2 model resembles an image row or column much more closely than an AR-1 process, insofar as the AR-2 parameters sum approximately to one. Figure 3.4 relates the optimal linear PLS (3.20) with the


Figure 3.4: Level sets of the optimal prediction coefficient minus 0.5 as a function of the AR-2 parameters using 2 taps and the linear PLS design (3.20).

second-order auto-regressive parameters. The figure shows six level sets of the optimal prediction coefficient with respect to the a1 and a2 parameters. The function is the absolute value of the optimal prediction coefficient for the given parameters minus 1/2. Thus, the resulting filter is similar to pLG in the dark areas and different in the light areas. Note that the gray-scale is not uniform w.r.t. the function value. As can be observed from the figure, the optimal PLS based on the AR-2 model is almost equal to pLG for most of the parameter range. Indeed, the prediction coefficients are 1/2 for the set a1 + a2 ≈ 1. The set a1 + a2 = 1 is relevant because an AR-2 model with these parameters preserves the sample mean expected value.
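The same normal-equations reasoning extends to the AR-2 model: the Yule-Walker relations give the normalized correlations r1 = a1/(1 − a2) and r2 = a1 r1 + a2, and the symmetric 2-tap predictor becomes r1/(1 + r2). This sketch is again a standard derivation, not necessarily identical to (3.20), but it reproduces the behavior on the set a1 + a2 ≈ 1:

```python
def ar2_two_tap_prediction(a1, a2):
    """Symmetric 2-tap prediction coefficient for x[2n+1] from x[2n] and
    x[2n+2] under a stationary AR-2 model with parameters (a1, a2).
    Yule-Walker (with r0 = 1): r1 = a1 / (1 - a2), r2 = a1 * r1 + a2."""
    r1 = a1 / (1.0 - a2)
    r2 = a1 * r1 + a2
    # Normal equations [[1, r2], [r2, 1]] a = [r1, r1]  ->  a = r1 / (1 + r2)
    return r1 / (1.0 + r2)
```

With a2 = 0 this reduces to the AR-1 expression ρ/(1 + ρ²), and for stationary parameters approaching the line a1 + a2 = 1 the coefficient approaches 1/2, matching the dark region of figure 3.4.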

3.4.3.2 Update Step Study

Assuming an AR-1 process and the initial prediction pLG, the three linear ULS of §3.3.3 lead to coefficients depending on ρ, as depicted in figure 3.5. The second and third designs lead to similar coefficients over the whole range, while the ULS coefficient arising from the first design is smaller over the whole interval. Asymptotically (ρ → 1), the second ULS design output doubles the coefficients of the first and third ones. The update filter coefficients are considerably below the 1/4 reference for the three designs and the usual ρ found in practice (like the ones in table 3.1). This fact agrees with the common observation that in some cases omitting uLG increases compression performance, the ULS being included in the decomposition process because it improves the multi-resolution properties. The issue of the ULS employment can be approached from the perspective given by the proposed linear ULS designs: the ULS is useful, but the correct choice is an update coefficient considerably smaller than 1/4 (as the three ULS indicate for the usual ρ


Figure 3.5: Update filter as a function of the AR-1 parameter for the three ULS designs. The update is a two-tap symmetric filter, so only one coefficient is depicted. The initial prediction is pLG.

values). The optimal ULS for each of the three designs are also derived assuming a second-order auto-regressive model. For a subset of the AR-2 parameters, the resulting optimal update coefficients coincide with uLG, but not for other possible values. Figure 3.6 highlights this fact. For the three cases, each sub-figure relates the optimal update coefficient according to the given criterion to the AR-2 parameters. Six level sets of the update coefficient are depicted as a function of a1 and a2. From the figure, it is concluded that uLG is far from being optimal in the sense of (3.26), (3.24), and (3.27) for many possible image AR-2 parameters. To provide a practical reference, the three circles in figure 3.6b depict the mean AR-2 parameters of the synthetic, mammography, and SST image classes.

3.4.3.3 Second Prediction Step Study

The LeGall 5/3 wavelet properties may be improved by means of a second PLS. The high-pass filter support is increased to 11 taps if the PLS uses the samples l1[n] = ( l1[n − 1] l1[n] l1[n + 1] l1[n + 2] )T. The inclusion of more approximation signal samples in the second PLS is possible, but it does not assure a performance improvement, since the high-pass filter becomes very lengthy. The 5/11-a transform via lifting in table 3.5 is proposed in [Cal98], with p2 = ( −1/16 1/16 1/16 −1/16 )T. The 5/11-a is a (4,2) transform: it has 4 analysis vanishing moments (and the 2 synthesis


Figure 3.6: Level sets of the optimal update coefficient minus 0.25 as a function of the AR-2 parameters for (a) the first linear ULS design, (b) the second linear ULS design (circles indicate the mean parameters of the synthetic, mammography, and SST image classes), and (c) the third linear ULS design.


Filter Name   Lifting Steps

5/11-a        h1[n] = h0[n] − ( 1/2 l0[n] + 1/2 l0[n+1] )
              l1[n] = l0[n] + ( 1/4 h1[n−1] + 1/4 h1[n] )
              h2[n] = h1[n] − ( −1/16 l1[n−1] + 1/16 l1[n] + 1/16 l1[n+1] − 1/16 l1[n+2] )

5/11-b        h1[n] = h0[n] − ( 1/2 l0[n] + 1/2 l0[n+1] )
              l1[n] = l0[n] + ( 1/4 h1[n−1] + 1/4 h1[n] )
              h2[n] = h1[n] − ( −1/32 l1[n−1] + 1/32 l1[n] + 1/32 l1[n+1] − 1/32 l1[n+2] )

Table 3.5: Lifting steps for the 5/11-a and 5/11-b transforms.

vanishing moments coming from the 5/3 structure). The 5/11-b transform in table 3.5 is proposed in [Ada99] and is designed considering several criteria (cf. §2.3.1). The filter coefficients are one half of those of the 5/11-a, with p2 = ( −1/32 1/32 1/32 −1/32 )T. The 5/11-b is a (2,2) transform, but it attains better results than the 5/11-a for images with high-frequency content.
The linear PLS (3.20) for an AR-1 model is applied to the LeGall 5/3 wavelet to obtain a second prediction and the corresponding 5/11 transform. Figure 3.7 depicts p2 as a function of ρ. As a reference, the horizontal grid lines in the graph depict the values −1/16, −1/32, 1/32, and 1/16 (which are the coefficients of the 5/11-a and 5/11-b second predictions). The graph confirms that values close to 1/16 are suitable for smoother images, and that the coefficients may decrease (in absolute value) when there is more high-frequency content, i.e., when ρ moves away from one. Another counter-intuitive effect due to the chosen image model is that the prediction coefficients do not sum to zero for small ρ values. Figure 3.7 shows this effect through the dashed line that depicts Σi p2,i.
The scheme is applied to some image classes. The optimal second prediction steps p*2 using the ρ of the natural, texture, and SST images are

    p*2,natural = ( −0.05960 0.05966 0.05966 −0.05960 )T,
    p*2,texture = ( −0.05832 0.05852 0.05852 −0.05832 )T,
    p*2,SST     = ( −0.05969 0.05970 0.05970 −0.05969 )T,

which are close to the 5/11-a because their ρ values are close to one. The corresponding underlying wavelet basis vectors are given in table 3.6. Lifting steps have been analyzed by means of the developed framework. The following section employs the optimal steps arising from the framework in order to check their coding performance.
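A minimal sketch of the 5/11 analysis steps of table 3.5, parameterized by the second-prediction magnitude c (c = 1/16 gives the 5/11-a and c = 1/32 the 5/11-b; the border handling here is a simplification that replicates h1[0] in place of h1[−1]):

```python
def lift_5_11(x, c=1.0 / 16):
    """One analysis level of the 5/11 transform: LeGall 5/3 prediction and
    update, then a second PLS with p2 = (-c, c, c, -c)."""
    even, odd = x[0::2], x[1::2]
    n = min(len(even) - 1, len(odd))
    # h1[i] = x[2i+1] - (x[2i] + x[2i+2]) / 2
    h1 = [odd[i] - 0.5 * (even[i] + even[i + 1]) for i in range(n)]
    # l1[i] = x[2i] + (h1[i-1] + h1[i]) / 4, replicating h1[0] at the border
    l1 = [even[i] + 0.25 * (h1[max(i - 1, 0)] + h1[i]) for i in range(n)]
    # Second PLS, applied on interior samples only
    h2 = list(h1)
    for i in range(1, n - 2):
        h2[i] = h1[i] - (-c * l1[i - 1] + c * l1[i] + c * l1[i + 1] - c * l1[i + 2])
    return l1, h2
```

On a cubic polynomial the interior h2 samples (away from the simplified border) vanish for c = 1/16, reflecting the 4 analysis vanishing moments of the 5/11-a.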


Figure 3.7: Second prediction filter p2 as a function of the AR-1 parameter, with p1 = pLG and u1 = uLG. The horizontal grid lines depict the coefficients of the 5/11-a and 5/11-b second prediction filters. The dashed line depicts the sum of the filter coefficients, Σi p2,i.

 n     (a) natural   (b) texture   (c) SST
 0      0.97017       0.97074       0.97015
±1     -0.54474      -0.54387      -0.54477
±2     -0.00001      -0.00005       0
±3      0.05216       0.05106       0.05223
±4      0.01490       0.01458       0.01492
±5     -0.00745      -0.00729      -0.00746

Table 3.6: Underlying basis vectors wh2[n] applying the optimal second prediction step to the LeGall 5/3 wavelet, computed with the ρ for (a) natural, (b) texture, and (c) SST image classes.

3.4.4 Improved Linear Lifting Steps Performance

Although the LS developed in this chapter allow the construction of 2-D non-separable filters, all the following experiments except one are restricted to 1-D separable decompositions. The transforms are integer-to-integer and applied to lossless image compression.

3.4.4.1 ULS with AR-1 Signal Test

In the first experiment, the input signal is synthetic data generated to check the performance of the proposal for the assumed image model. An AR-1 process containing 512 samples is decomposed into three resolution levels using pLG followed by uLG or one of the three designed ULS. These four transforms are compared by computing the gradient l2-norm of l1^(1) and the h1^(2) signal mean energy, which are the second and third ULS objective functions. Figures 3.8 and 3.9 show the

Figure 3.8: Relative gradient of l1^(1) for the optimal ULS w.r.t. LeGall 5/3.

Figure 3.9: Relative energy of h1^(2) for the optimal ULS w.r.t. LeGall 5/3.

mean results for 1000 trials. The relative gradient and energy of the three ULS w.r.t. the LeGall 5/3 wavelet are depicted. The second and third designs are almost equal and outperform the LeGall 5/3 in terms of energy and gradient for all ρ except ρ ≈ 0.27, the value for which the three design coefficients coincide. The first design shows worse performance, in particular for small ρ. However, this design has more flexibility and may incorporate additional knowledge that leads to a better image model. The given results are obtained with an additive white Gaussian noise of standard deviation equal to 50. The relative gradient and energy are weak functions of the AR process noise variance.
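The experiment can be sketched as follows (one analysis level only; the three-level decomposition iterates the same step on the approximation signal, and the border handling is a simplification):

```python
import random

def ar1_signal(rho, length, sigma=50.0, seed=1):
    """Synthetic AR-1 process x[n] = rho * x[n-1] + w[n], w ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sigma)]
    for _ in range(length - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, sigma))
    return x

def legall53_level(x):
    """One analysis level of the LeGall 5/3 transform via lifting (pLG, uLG)."""
    even, odd = x[0::2], x[1::2]
    n = min(len(even) - 1, len(odd))
    h = [odd[i] - 0.5 * (even[i] + even[i + 1]) for i in range(n)]
    l = [even[i] + 0.25 * (h[max(i - 1, 0)] + h[i]) for i in range(n)]
    return l, h
```

For a linear ramp the detail signal vanishes (two analysis vanishing moments), and for a correlated input such as ρ = 0.9 the detail energy is far below the input energy.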

The weighted entropy has also been computed, but it is a function sensitive to the noise variance. However, a qualitative conclusion may be drawn: entropy is higher for the LeGall 5/3 w.r.t. the


Figure 3.10: (a) A rectangular grid divided into two quincunx grids (marked 'o' and 'x'). (b) PLS support on the quincunx grid. The pixel at site 'o' is predicted with pixels at sites '1' (if a 4-tap filter is used) and also with pixels at sites '2' (if a 12-tap filter is employed). Pixels at sites 'x' perform the 4-tap prediction at the next resolution level.

second and third designs for ρ near 1. On the contrary, the LeGall 5/3 attains better entropies for ρ close to 0.

3.4.4.2 ULS on a Quincunx Grid with AR Data

This second experiment contributes a lifting construction example of a non-separable wavelet on a quincunx grid. A 2-D rectangular grid may be down-sampled into two quincunx grids (figure 3.10a), which become the approximation and detail signals. The derivation of ULS on the quincunx lattice within the developed framework is more involved than in the 1-D separable case. The underlying basis vectors are two-dimensional and their support is larger than in the 1-D case. The 2-D basis must be mapped into 1-D vectors to match the linear LS setting. Also, the number of neighboring samples employed for the filtering is greater. These facts complicate the implementation, especially for large filter supports. Therefore, this experiment is restricted to small supports: the prediction of a pixel at site 'o' employs the first ring of neighbors, which are the ones indicated in figure 3.10b with a '1'. The considered prediction filter coefficient is 1/4 for the four neighbors. This filter is known as the second-order Neville filter [Kov00]. The subsequent 4-tap ULS uses the coefficient 1/8, i.e., the update filter is

    unv = ( uN uE uS uW )T = ( 1/8 1/8 1/8 1/8 )T,

following the notation of figure 3.11a. Larger prediction supports may be considered (e.g., [Kov00]). For instance, the sites marked '2' in figure 3.10b may be included in the PLS, as in the fourth-order Neville filter.
The third ULS design minimizes the subsequent prediction error energy. Thus, it is preferred that the ULS is immediately followed by the PLS of the next resolution level instead


Figure 3.11: (a) Support and notation for the 4-tap ULS on the quincunx grid. (b) Typical geometry of a 2-D auto-regressive neighborhood. Four pixels are used for generating the pixel at site 'o'.

of the filtering in the horizontal/vertical direction, as happens in the 1-D separable case. The two-band quincunx grid permits this. The neighbors employed in the next resolution PLS are marked with a cross in figure 3.10b. The linear constraint matrix for each of these four coefficients is plugged into equation (3.27) to obtain the optimal ULS. The filters have the restriction uN = uS and uE = uW due to the directional symmetry of the construction and the auto-correlation matrix structure.
As in the previous experiment, the input signal is synthetic data generated according to an auto-regressive model. Four neighboring pixels are considered for the predictive model, as figure 3.11b shows. The constraint a1 + a2 + a3 + a4 = 1 is imposed. Images with 512x512 pixels are generated. A 3-level quincunx decomposition is computed using the 4-tap prediction with coefficient 1/4 and the optimal ULS. For comparison, images are also decomposed with the second-order Neville filter. The comparison criterion is again the h1^(2) signal mean energy. The decomposition is performed by means of a modified version of the Matlab LISQ toolbox [Zee02].
In contrast with the 1-D case, the non-separable optimal third ULS design does not perform consistently better than the reference filter. The result depends on the AR signal orientation. If the signal has a dominant horizontal feature (a1 ≈ 1 and a2, a3, a4 ≈ 0), the update filter is

    u = ( −0.4645 −0.9435 −0.4645 −0.9435 )T,

which is very different from unv and remarkably reduces the h1^(2) energy, by around 99%. The same behavior appears when the dominant direction is vertical, but with the filter coefficient values interchanged. Results worsen when the dominant direction moves away from the horizontal or vertical. The worst results appear when the dominant direction is 45° (e.g., a1, a3 = 1/2 and a2, a4 = 0). In this case, the third ULS design increases the second-order Neville h1^(2) energy by around 15%. The update filter turns out to be

    u = ( 0.3418 0.3474 0.3418 0.3474 )T,

that is, the design points out a filter with equal-valued coefficients like unv, but the value is significantly higher and it damages the final performance compared to unv. Maybe the approach


Rate (bpp)     5/3 wavelet   AR-1 model
Synthetic      3.832         3.508
SST            3.252         3.123
Mammography    2.349         2.358

Table 3.7: Compression results with JPEG2000 using the standard LeGall 5/3 wavelet and the proposed optimal update with the AR-1 model for the synthetic, mammography, and SST image classes. Results are given in bits per pixel.

should also contemplate a constraint on the filter coefficients or other considerations in order to avoid this drawback. Given this performance disparity, the proposed quincunx ULS may be employed in practice in a space-varying setting that determines the dominant signal direction, triggering the use of the proposed ULS in the favorable cases and the Neville filter otherwise.
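The quincunx machinery described above can be sketched as follows. The grid split of figure 3.10a is implemented under an assumed parity convention ('o' where row + column is even), and the second-order Neville prediction applies to interior 'o' sites only:

```python
def quincunx_split(img):
    """Split a rectangular grid into the two quincunx grids of figure 3.10a:
    'o' sites where row + column is even, 'x' sites where it is odd
    (assumed convention)."""
    o_sites, x_sites = [], []
    for r, row in enumerate(img):
        for c, v in enumerate(row):
            (o_sites if (r + c) % 2 == 0 else x_sites).append(v)
    return o_sites, x_sites

def neville2_predict(img, r, c):
    """Second-order Neville prediction: coefficient 1/4 on the four
    first-ring neighbors (sites '1' in figure 3.10b)."""
    return 0.25 * (img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1])
```

Predicting each 'x' sample from its four 'o' neighbors and keeping the residual yields the detail quincunx band; the 'o' band then becomes the approximation signal of the next level.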

3.4.4.3 Local Adaptive ULS

This experiment is described in [Sol06a]. An AR-2 model is used to determine the local image behavior. A line-wise space-varying update filter is constructed by estimating the AR-2 parameters for each line in the image and using the filter given by (3.26). To assess the performance, the energy of the coarser level detail coefficients h1^(2) is computed. For the set of natural images, the energy is up to 25% smaller for the space-varying optimal update step w.r.t. uLG.

3.4.4.4 Image Class Optimal ULS Test

This fourth experiment (which also appeared in [Sol06a]) derives filters applicable to a more global setting. The AR-1 parameter is estimated for three image classes; therefore, the model is useful for a whole corpus of images instead of being local. Synthetic, mammography, and SST images are used. Each corpus contains 15 images. The correlation matrix is determined by the AR-1 parameter (in table 3.1) and plugged into equation (3.24) in order to obtain an update filter used for all the images in a class. Image compression is performed with a four resolution level decomposition within the JPEG2000 coder environment. Numerical results appear in table 3.7 compared to the LeGall 5/3 wavelet. The compression results of the proposal improve on those of the LeGall 5/3 for the synthetic and SST image classes, but slightly worsen for the mammography class. The latter case is analyzed in the next experiment.


Figure 3.12: A mammography image histogram.

3.4.4.5 A Refinement for Mammography

The results of the optimal ULS (3.24) in the experiment of §3.4.4.4 are worse for the mammography image class w.r.t. the LeGall 5/3 wavelet. The reason may be found in the formation of this kind of image. Clearly, there are two differentiated regions: a homogeneous dark one containing the background and a light, heterogeneous foreground. Figure 3.12 is the histogram of a mammography. There is an accumulation of light pixels between gray levels 100 and 200 due to the foreground. Background pixels are found at the smaller values, typically less than 50. Most background pixels accumulate around the 0 gray level (which is difficult to distinguish in the histogram figure). Background and foreground have distinct auto-correlations and AR parameters. The mean of both AR parameters is not optimal for either of the two regions. A more accurate approach for this class should contemplate an AR model or derive an auto-correlation matrix for each of the regions separately. Thus, an image segmentation is required.
The histogram inspection suggests the following segmentation algorithm. As the gray level almost characterizes the two regions, a binary image is formed by thresholding at the gray level T = 50. The resulting binary image has two differentiated regions with some pixels placed in the wrong part. A morphological opening with a disk of radius 5 as the structuring element corrects the misplaced pixels. Two distinguished and connected regions constitute the final segmentation. Figure 3.13 shows the initial image and the segmentation. This threshold-plus-opening segmentation algorithm is simple and obtains crude results compared to many other techniques, but the output is good enough for the purposes of the experiment. Using the segmented binary image as a mask, the auto-correlation matrix is directly estimated for each region, as well as the AR-2 parameters. Then, the optimal ULS are derived.
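The threshold-plus-opening segmentation may be sketched as follows (a naive pure-Python version; border pixels are treated as background by the erosion):

```python
def disk(radius):
    """Offsets of a disk-shaped structuring element."""
    return [(dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if dr * dr + dc * dc <= radius * radius]

def erode(mask, se):
    rows, cols = len(mask), len(mask[0])
    return [[all(0 <= r + dr < rows and 0 <= c + dc < cols and mask[r + dr][c + dc]
                 for dr, dc in se) for c in range(cols)] for r in range(rows)]

def dilate(mask, se):
    rows, cols = len(mask), len(mask[0])
    return [[any(0 <= r + dr < rows and 0 <= c + dc < cols and mask[r + dr][c + dc]
                 for dr, dc in se) for c in range(cols)] for r in range(rows)]

def segment(img, threshold=50, radius=5):
    """Binary segmentation by thresholding at the given gray level followed
    by a morphological opening (erosion then dilation) with a disk."""
    mask = [[v > threshold for v in row] for row in img]
    se = disk(radius)
    return dilate(erode(mask, se), se)
```

The opening removes foreground specks smaller than the disk while approximately preserving the large foreground region, which is the misplaced-pixel correction described above.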


Figure 3.13: (a) A mammography and (b) the segmentation in background and foreground.

Both auto-correlation matrices lead to similar update coefficients. For instance, the third design coefficient for the background using the AR-2 auto-correlation matrix is 0.12584, and the coefficient using the direct estimation of R is 0.12053. The second and third designs attain very similar coefficients, while the first design coefficient tends to be one half of them, as proved in the analysis of §3.4.3 for the AR-1 case when ρ → 0. With the direct correlation estimation, the second linear ULS background coefficient is 0.13209 and the foreground coefficient is 0.02430. Meanwhile, for the third design the background coefficient is 0.12053 and the foreground coefficient is 0.02456. In view of these results, dyadic coefficients are used for the mammography coding: 1/8 = 0.125 for the background and 1/32 = 0.03125 for the foreground. Therefore, the background and foreground filters are ub = ( 1/8 1/8 )T and uf = ( 1/32 1/32 )T, respectively.
Once the coefficients are determined, the image decomposition does not require any mask. The prediction is followed by a space-varying ULS that depends on the next approximation coefficient value. If this coefficient is greater than the threshold T, the region is considered foreground and the uf filter is employed. Otherwise, the region is considered background and the optimal filter for the background, ub, is used:

    l1[n] = l0[n] + uf^T h1[n],   if l0[n+1] > T,
    l1[n] = l0[n] + ub^T h1[n],   otherwise,

where h1[n] = ( h1[n−1] h1[n] )T.

The decoder has to take into account this coding modification in order to stay synchronized with the coder and to decide the filter according to the same data. Image compression is again performed with a four resolution level decomposition within the JPEG2000 coder environment. The mean rate for the 15 mammography images decreases from 2.358 bpp to 2.336 bpp.
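The space-varying ULS above can be sketched as follows, using the ub, uf, and T values from the text (interior samples only; the decision uses the next approximation sample so the decoder can repeat it):

```python
T = 50                 # gray-level threshold separating the two regions
U_BACKGROUND = 1 / 8   # symmetric two-tap update coefficient, dark region
U_FOREGROUND = 1 / 32  # symmetric two-tap update coefficient, light region

def space_varying_update(l0, h1):
    """Space-varying ULS: the filter is chosen from the next approximation
    sample l0[n+1], a value available at both coder and decoder."""
    l1 = []
    for n in range(1, len(h1)):
        u = U_FOREGROUND if l0[n + 1] > T else U_BACKGROUND
        l1.append(l0[n] + u * (h1[n - 1] + h1[n]))
    return l1
```

Because the decision variable l0[n+1] is not modified by the update itself, the decoder can invert each step with the same test and subtract the same update term.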


Rate (bpp)     Synthetic   SST     Mammography
5/3 wavelet    3.832       3.252   2.349
5/11-a         4.044       3.317   2.349
5/11-b         4.011       3.288   2.348
Opt. P AR-1    4.036       3.312   2.349
Opt. P AR-2    4.027       3.329   2.347

Table 3.8: Compression results with JPEG2000 using the standard LeGall 5/3 wavelet, the 5/11-a, the 5/11-b, and the optimal second prediction according to the AR-1 and AR-2 models for the synthetic, SST, and mammography image classes.

3.4.4.6 Optimal Second PLS Test

The optimal second prediction design is tested using the AR-1 and AR-2 models for each image class. Results are given in table 3.8 for four-resolution-level decompositions. In all cases, the optimized second prediction is a filter with coefficients in between those of the 5/11-a and the 5/11-b. This is due to the image model and the estimated parameters (tables 3.1 and 3.2). Both models attain similar results. The performance tends to be better for the 5/11-b, which would arise in the optimized prediction setting for a smaller ρ than the estimated ones. Furthermore, the 5/3 wavelet performs significantly better than any of them. These results coincide with the findings in [Ada00]: in lossless compression, the 5/3 transform yields better results than the 5/11 transforms for images with a relatively large amount of high-frequency content, often by a considerable gap. The 5/3 wavelet implies ρ ≈ 0.27 in the AR-1 model, which is far from the estimated parameters.

The 5/11 transforms outperform the 5/3 in lossy compression and in lossless compression of natural imagery [Ada00]. This observation is consistent with our experiment using the 512x512 natural images. They are decomposed into 4 resolution levels and compressed with EBCOT. Results appear in table 3.9. The optimized second prediction for the image class (columns "O.P. AR-1" and "O.P. AR-2") yields similar results for both models, almost equal to those of the 5/11-a. The model may also be computed for each image. The rates attained by the AR-1 model are given in the column headed by "AR-1 Im."; they are again similar to those of the 5/11-a transform because 0.9 ≤ ρ < 1.

Further adaptation may be conceived. For instance, the optimal second prediction may be computed for each resolution level in the image; in this case, the mean rate is 4.633 bpp with the AR-1 model. Going further, images may be partitioned and the optimal prediction estimated for each part, as in the works [BB03, Hat04, Hat05] that use a quadtree structure to convey the partition. Book-keeping may be required depending on the estimation procedure. The trend is that performance improves when the model is better matched to the data.


Rate (bpp)   Baboon   Barbara   Cheryl   Farm    Girl    Lena    Peppers   Mean
5/3 wav.     6.109    4.776     2.442    6.426   3.956   4.314   4.625     4.664
5/11-a       6.092    4.691     2.414    6.402   3.938   4.284   4.627     4.635
5/11-b       6.091    4.723     2.425    6.407   3.937   4.285   4.617     4.641
O.P. AR-1    6.093    4.691     2.415    6.403   3.936   4.280   4.626     4.635
O.P. AR-2    6.092    4.685     2.415    6.403   3.937   4.284   4.632     4.635
AR-1 Im.     6.093    4.691     2.415    6.402   3.935   4.281   4.626     4.635

Table 3.9: Compression results with JPEG2000 using the standard LeGall 5/3 wavelet, the 5/11-a, the 5/11-b, and the optimal second prediction according to the AR-1 and AR-2 models for a set of seven natural images of 512x512 pixels.

3.5 Chapter Summary and Conclusions

This chapter develops a linear framework employed to derive new lifting steps. The point of departure is a quadratic interpolation method from which several alternatives are given. The conclusion regarding the proposed methods is that their performance in terms of PSNR is around 1.5 dB better than bi-cubic interpolation when the image being interpolated has been low-pass filtered before the down-sampling. However, the final result depends on matching the interpolation method and its parameters to the image at hand.

In a natural way, the initial interpolation formulation is used for the LS design by adding an extra set of linear equality constraints. This permits the design of PLS minimizing the detail signal energy and the design of ULS with approximation signal gradient criteria. Indeed, the optimal interpolation obtained with any of the preceding methods may be applied to create new PLS and ULS.

The framework is also employed for an optimality analysis of the LeGall 5/3 wavelet according to the established criteria. The main conclusion is that there are image classes for which this commonly used wavelet is not optimal. The compression results within the JPEG2000 environment confirm this observation. Also in this case, a correct choice of the image model and parameters is required to obtain the best results.

Finally, the flexibility of the lifting design framework is demonstrated by the variety of described experiments. Different image models are used to derive lifting steps on the quincunx grid, space-varying ULS, first and second PLS, line-wise adaptive ULS, and the two ULS for the mammography. This chapter concludes the research contribution made in the linear lifting scheme. Chapters 4 and 5 address the lifting scheme design and optimization from a nonlinear perspective.

Chapter 4

From Adaptive to Generalized Lifting

This chapter delves deeper into adaptive lifting, characterizes the perfect reconstruction property, and introduces the generalized lifting scheme. While chapter 3 focuses on linear lifting and the optimization of filters in a linear setting, this chapter introduces and develops schemes that are essentially nonlinear. Section 4.1 is a detailed description of the adaptive lifting scheme, required for the subsequent analysis of the scheme's key characteristics in section 4.2. The analysis permits the construction of lifting steps with new criteria within the adaptive framework in section 4.3. The proposal in §4.3.1 is based on a median decision function, while the adaptive scheme in §4.3.2 relies on a variance-based decision function. Finally, the analysis in §4.2 leads to the generalization of the (adaptive) lifting scheme presented in section 4.4. The derivation of concrete generalized lifting steps and the description of experiments and the obtained results are postponed to chapter 5.

4.1 Adaptive Lifting Description

The adaptive LS is a modification of the classical lifting proposed in [Pie01a]. In the description below, and for simplicity, it is assumed without loss of generality that the adaptive lifting step is an ULS (as in figure 2.10). This is consistent with the description started in §2.2.6. In the adaptive scheme, a lifting filter is chosen at each sample according to a decision function D(x[n], y), which may be a scalar-valued, a vector-valued, or a set-valued function on R^n. The decision D(x[n], y) depends on y, as in the space-varying lifting case, but it also depends on the sample x[n] being modified by the ULS. The decoder knows the coefficient x′[n], which is an updated version of x[n] through an unknown lifting filter. Coder and decoder have


different information to take the same decision. A goal in adaptive lifting design is to find a decision function and a set of filters that allow the coder decision D(x[n], y) to be recovered at the decoder side, i.e.,

$$ D(x[n], \mathbf{y}) = D'(x'[n], \mathbf{y}). \tag{4.1} $$

This is the decision conservation condition. If the decision is recovered, then the decomposition scheme may be reversible. The decision function domain is the sample domain X of the approximation signal times k copies of the detail signal domain Y, since a set of k detail samples in a window around x[n] is employed for the decision:

$$ D : X \times Y^k \to \mathcal{D}, \qquad (x[n], \mathbf{y}[n]) \mapsto d. \tag{4.2} $$

Usually, X and Y are the real numbers R, e.g. (4.3), the integer numbers Z, or a finite set of integers Z_n. The function D(·) maps the input samples to the decision range, which is the positive real numbers or the binary numbers. The result d is used to choose the update filter and the addition operator that merges the update filter output with the sample x[n] (usually through a linear combination). The range of D may indicate whether there exists an edge at x[n] if D is the l1-norm of the gradient,

$$ D : \mathbb{R} \times \mathbb{R}^k \to \mathbb{R}^+, \qquad (x[n], \mathbf{y}[n]) \mapsto \sum_i |y_i - x[n]|, \tag{4.3} $$

or whether x[n] resides in a textured region, or any other geometrical constraint. Depending on the detected local signal characteristics, a lifting filter suited to these characteristics is chosen. A relevant feature of the adaptive scheme is that it does not require any book-keeping to enable PR at the decoder side, even though the filter may vary at each location using non-causal information. In this context, non-causal information refers to information available at the coder to perform the filtering but not available at the decoder at the time of performing the inverse filtering. In [Pie01a], the proposed adaptive ULS employs two detail samples, i.e., k = 2 in expression (4.2). The restriction to k = 2 is also satisfied in the classical 1-D lifting with the LeGall 5/3 filter. The approximation signal sample x′[n] is found through the update coefficients (α_d, β_d, and γ_d) for the given decision,

$$ x'[n] = \alpha_d\, x[n] + \beta_d\, y[n-1] + \gamma_d\, y[n]. \tag{4.4} $$
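A single adaptive update step of the form (4.4) can be sketched as below; the decision and filter tables given as examples are illustrative stand-ins, not the filters of [Pie01a].

```python
def adaptive_update(x_n, y_prev, y_next, filters, decide):
    """Eq. (4.4): x' = alpha_d * x[n] + beta_d * y[n-1] + gamma_d * y[n],
    where the decision d = decide(v, w) is taken on the gradient vector
    v = x[n] - y[n-1], w = y[n] - x[n]."""
    v, w = x_n - y_prev, y_next - x_n
    d = decide(v, w)
    alpha, beta, gamma = filters[d]
    return alpha * x_n + beta * y_prev + gamma * y_next, d

# Illustrative binary decision on the l1-norm of the gradient:
# smooth regions are averaged, sharp ones are left untouched.
filters = {0: (0.5, 0.25, 0.25), 1: (1.0, 0.0, 0.0)}  # kappa_d = 1 for both
decide = lambda v, w: 0 if abs(v) + abs(w) <= 4 else 1
```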


The sum of the filter coefficients is defined as κ_d = α_d + β_d + γ_d. Decision maps are restricted to be based on the gradient vector, noted

$$ \mathbf{v}^T[n] = (\, v[n] \;\; w[n] \,) = (\, x[n] - y[n-1] \;\;\; y[n] - x[n] \,), $$

in the following form: D(x[n], y[n−1], y[n]) = d(v[n], w[n]), where d : R × R → D. Observe that v[n] + w[n] = y[n] − y[n−1] does not depend on x[n]. Therefore, if d(v[n], w[n]) depends only on v[n] + w[n], the scheme reduces to the non-adaptive case. The following auxiliary results are proved in [Pie01a].

Lemma 4.1 [Pie01a, Lem. 5.1] Consider a gradient-based decision map. In order to have perfect reconstruction it is necessary that κ_d be constant on every subset D(c) ⊆ D given by D(c) = {d(v, w) | v + w = c}, where c ∈ R is a constant.

Proof. Assume that for some c ∈ R there exist d_1, d_2 ∈ D(c) such that κ_{d_1} ≠ κ_{d_2}. Also, assume that (v_j, w_j) is such that d(v_j, w_j) = d_j for j = 1, 2. Let the signals x_j, y_j be such that y_j[n−1] = q, x_j[n] = q + v_j, and y_j[n] = q + v_j + w_j = q + c. From (4.4), it is obtained

$$ x'_j[n] = \alpha_{d_j}(q + v_j) + \beta_{d_j} q + \gamma_{d_j}(q + c) = \kappa_{d_j}(q + v_j) - (\beta_{d_j} + \gamma_{d_j})(q + v_j) + \beta_{d_j} q + \gamma_{d_j}(q + c) = \kappa_{d_j} q + \kappa_{d_j} v_j - \beta_{d_j} v_j + \gamma_{d_j} w_j. $$

If q is chosen in such a way that

$$ \kappa_{d_1} q + \kappa_{d_1} v_1 - \beta_{d_1} v_1 + \gamma_{d_1} w_1 = \kappa_{d_2} q + \kappa_{d_2} v_2 - \beta_{d_2} v_2 + \gamma_{d_2} w_2, $$

which is possible since it has been assumed that κ_{d_1} ≠ κ_{d_2}, then x′_1[n] = x′_2[n]. Since y_1[n−1] = y_2[n−1] and y_1[n] = y_2[n], this implies that PR is not possible. □

Definition 4.1 (Injection) Let f be a function defined on a set A and taking values in a set B. Then, f is said to be an injection (or injective map, or embedding) if, whenever f(x) = f(y), it must be the case that x = y. Equivalently, x ≠ y implies f(x) ≠ f(y).


Note that the proof of lemma 4.1 is based on the injectivity of the gradient-based adaptive map. The PR condition on κ_d is established by assuring that whenever x′_1[n] = x′_2[n], it is not possible that x_1[n] ≠ x_2[n] (given that y_1[n−1] = y_2[n−1] and y_1[n] = y_2[n]). In other words, the value x[n] should be derived without ambiguity from the values of x′[n], y[n−1], and y[n]. Assume now that the decision is given by the l1-norm of the gradient, i.e.,

$$ d(v, w) = |v| + |w|. \tag{4.5} $$

In this case, the following lemma holds.

Lemma 4.2 [Pie01a, Lem. 6.1] If the decision is given by (4.5), then it is necessary that κ_d be constant for all d ∈ D in order to have PR.

This result is derived from lemma 4.1, which states that κ_d has to be constant on every subset D(c). If the decision is (4.5), then the subset D(c = 0) is the whole R⁺. In consequence, κ_d must be constant for every decision d ∈ R⁺. Assume in the following κ_d = 1. Sufficient conditions on the filter coefficients α_d, β_d, and γ_d that guarantee PR are found:

Proposition 4.1 [Pie01a, Prop. 7.1] PR is possible with the previous assumptions in each of the following two cases:

1. For α_d > 0 and β_d, γ_d non-increasing w.r.t. d.

2. For α_d < 0 and β_d, γ_d non-decreasing w.r.t. d.

Adaptive (update) lifting has some drawbacks. Firstly, x[n] is weighted with a real number, requiring quantization; thus, the decision recovery becomes in practice more difficult than stated, and in lossy compression this may make PR harder to achieve. Secondly, the described approach imposes severe constraints on the FIR filter coefficients. These are the reasons that impel the analysis and extensions in section 4.2 and the generalization of section 4.4.
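The PR mechanism can be illustrated numerically. The sketch below assumes a binary decision on the l1-norm of the gradient and two illustrative filters with κ_d = 1 that follow case 1 of proposition 4.1 (α_d > 0, β_d and γ_d non-increasing with d); the decoder tries each filter and keeps the only candidate whose recomputed decision is consistent, which is exactly the injectivity argument.

```python
FILTERS = {0: (0.5, 0.25, 0.25),  # smooth region: averaging update
           1: (1.0, 0.0, 0.0)}    # edge: identity; kappa_d = 1 as well

def decide(v, w, T=4.0):
    """Binary decision on the l1-norm of the gradient."""
    return 0 if abs(v) + abs(w) <= T else 1

def forward(x, yp, yn):
    """Adaptive ULS, eq. (4.4), with the decision taken on (x, yp, yn)."""
    a, b, g = FILTERS[decide(x - yp, yn - x)]
    return a * x + b * yp + g * yn

def inverse(xp, yp, yn):
    """Try every filter; injectivity makes exactly one decision consistent."""
    for d, (a, b, g) in FILTERS.items():
        x = (xp - b * yp - g * yn) / a
        if decide(x - yp, yn - x) == d:
            return x
    raise ValueError("no consistent decision: the filter set is not PR")
```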

4.2 Adaptive Lifting Analysis

This section proposes an original analysis of the adaptive lifting. The detailed analysis of the perfect reconstruction condition of lemma 4.2 leads in a natural way to the concept of generalized lifting described in section 4.4. To get an insight into the PR condition, it is useful to study the adaptive lifting, noted a(·), from the perspective of a mapping between sample spaces,

$$ a : X \times Y^k \to X' \times Y^k, $$


with k the number of samples of the filtering channel used for the adaptive lifting step. The adaptive lifting function may vary at each (x, y) according to the decision function. The decision conservation condition (4.1) is visualized in the following diagram:

              a
   X × Y^k ------> X′ × Y^k
      |               |
      D               D′
      v      Id       v
      D  <-------->   D

The diagram also aims to highlight that the decision function should obtain the same result when applied on the domain X × Y^k and on X′ × Y^k, in the sense that a⁻¹(a(x, y, D), D′) = (x, y). For simplicity, in the following it is assumed that the adaptive lifting is a mapping between real spaces, a : R × R^k → R × R^k. For the sake of clarity, denote x_n ≜ x[n], y_n ≜ y[n], and y_{n−1} ≜ y[n−1]. The case of study is the adaptive ULS with the gradient-based decision function (4.5), with x_n ∈ R the sample modified by the two neighbors y_n and y_{n−1} that belong to the detail signal. Therefore, k = 2 and the mapping is between 3-D real spaces,

$$ a : \mathbb{R} \times \mathbb{R}^2 \to \mathbb{R} \times \mathbb{R}^2, \qquad (\, x_n \;\; y_{n-1} \;\; y_n \,) \mapsto (\, x'_n \;\; y_{n-1} \;\; y_n \,). $$

Figure 4.1 depicts a geometrical place with constant gradient; for a visual example of a 3-D mapping, the reader is referred to figure 4.2. The last k components are unaffected by a. The domain of a is the same as the domain of the decision function D. If a is a linear transform (from R³ to R³), then it is completely characterized by the matrix

$$ A_d = \begin{pmatrix} \alpha_d & \beta_d & \gamma_d \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{4.6} $$

Note that (4.6) acts as the identity on the last k components. In this linear case, the input-output relation is ( x′_n  y_{n−1}  y_n )^T = A_d ( x_n  y_{n−1}  y_n )^T. A decision of the form (4.5) implies D : R³ → D = R⁺. Given a decision d ∈ R⁺, its preimage is the geometrical place of R³ constituted by a set of four intersecting hyperplanes forming a rectangular cylinder (figure 4.1). Let such a set be denoted by D⁻¹(d). Every decision d triggers a choice of the lifting filter, which is the same for all points with equal gradient. For the case of study, the decision d and the l1-norm of the gradient c coincide. Let the image of


Figure 4.1: Geometrical place of the points of constant l1 -norm of the gradient.

every subset with constant gradient D⁻¹(c) be analyzed. Expression (4.7) specifies the points ( x_n  y_{n−1}  y_n ) ∈ D⁻¹(c):

$$ D^{-1}(c) \cong \begin{cases} 2x_n - y_{n-1} - y_n = c, & \text{for } x_n \ge y_{n-1}, y_n, \\ y_{n-1} + y_n - 2x_n = c, & \text{for } y_{n-1}, y_n \ge x_n, \\ y_{n-1} - y_n = c, & \text{for } y_{n-1} \ge x_n \ge y_n, \\ y_n - y_{n-1} = c, & \text{for } y_n \ge x_n \ge y_{n-1}. \end{cases} \tag{4.7} $$

Each point ( x_n  y_{n−1}  y_n )^T of D⁻¹(c) is mapped to ( x′_n  y′_{n−1}  y′_n )^T by the transform A_d. The transformed set of points, noted a(D⁻¹(c)), is also formed by four intersecting planes, which are specified by

$$ a(D^{-1}(c)) \cong \begin{cases} 2x'_n - (2\beta+\alpha)y'_{n-1} - (2\gamma+\alpha)y'_n = \alpha c, & \text{for } x'_n \ge y'_{n-1} - \gamma c,\; y'_n - \beta c, \\ (2\beta+\alpha)y'_{n-1} + (2\gamma+\alpha)y'_n - 2x'_n = \alpha c, & \text{for } y'_{n-1} + \gamma c,\; y'_n + \beta c \ge x'_n, \\ y'_{n-1} - y'_n = c, & \text{for } y'_{n-1} - \gamma c \ge x'_n \ge y'_n + \beta c, \\ y'_n - y'_{n-1} = c, & \text{for } y'_n - \beta c \ge x'_n \ge y'_{n-1} + \gamma c. \end{cases} $$

The analysis is clearer if the FIR filter is supposed to be symmetric. Then, both coefficients β and γ are equal, and the expressions of the first two planes become 2x′_n − y′_{n−1} − y′_n = αc and y′_{n−1} + y′_n − 2x′_n = αc, respectively.
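The action of A_d on a constant-gradient set is easy to check numerically. The sketch below reuses the parameters of figure 4.2, (α_d β_d γ_d) = (0.5 0.25 0.25); the two test points are arbitrary choices on D⁻¹(6).

```python
import numpy as np

def Ad(alpha, beta, gamma):
    """Matrix (4.6): only the first component x_n is modified."""
    return np.array([[alpha, beta, gamma],
                     [0.0,   1.0,  0.0 ],
                     [0.0,   0.0,  1.0 ]])

def l1_gradient(p):
    """l1-norm of the gradient, |x_n - y_{n-1}| + |y_n - x_n|."""
    x, yp, yn = p
    return abs(x - yp) + abs(yn - x)

A = Ad(0.5, 0.25, 0.25)

# x_n outside the margins [y_{n-1}, y_n]: the gradient c shrinks to alpha*c.
outside = np.array([5.0, 2.0, 2.0])    # c = 6
# x_n between y_{n-1} and y_n: the point stays on a fixed plane, c is kept.
between = np.array([2.0, 5.0, -1.0])   # c = 6 as well
```

The second point happens to lie on the plane of fixed points (α − 1)x_n + βy_{n−1} + γy_n = 0, so A_d leaves it unchanged.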


The transform system has a plane of fixed points, (α − 1)xn + βyn−1 + γyn = 0, and all the planes parallel to this one, (α − 1)xn + βyn−1 + γyn = c, are moved by Ad to another parallel plane given by the expression (α − 1)xn + βyn−1 + γyn = αc. To sum up, the transform acts on D−1 (c) in the following fashion. If xn is between yn−1 and yn , then the output remains in the same plane, preserving the gradient c. Otherwise, xn is



Figure 4.2: Example of a mapping from R3 to R3 . Map of D−1 (6) with the parameters (αd βd γd ) = (0.5 0.25 0.25).

outside the margins fixed by y_{n−1} and y_n, and the system moves the value x_n closer to the values y_{n−1} and y_n. Figure 4.2 depicts the system behavior. The output set a(D⁻¹(c)) is a rectangular cylinder that coincides in two planes with D⁻¹(c) and in the other two planes with D⁻¹(αc). The preliminary analysis of the transform behavior when applied to a single constant-gradient set serves as a stepping stone to the next step, which is the analysis of the whole transform (i.e., the mapping between 3-D spaces) using coefficients respecting proposition 4.1. If α_d > 0, then β_d and γ_d are non-increasing w.r.t. d, so the greater the gradient is, the more elongated the shape of the transformed rectangular cylinder becomes. Figure 4.3 visualizes this

evolution. If 1 > α_d > 0, then 1 > β_d, γ_d > 0, and with increasing d the transform becomes more similar to the identity. There is a certain d = d_I for which A_d = I_3. For d > d_I, the shape of the transformed rectangular cylinder continues the elongation. The key point is that β_d and γ_d are non-increasing, so the output set shapes for different gradients change in such a way that they never intersect. Therefore, the whole transform is injective, and thus invertible. In [Pie01a], a binary decision function also appears that toggles between two filters according to the l1-norm of the gradient, D(v) ∈ {0, 1}. Injectivity is easily imposed on such a system, since it is a particular case of the previous one. In addition, the challenge of deriving a simple decision recovery function is met: the appropriate inverse filter at each location is straightforward to find. In [Hei01], a binary decision is also employed and the framework is extended to k > 2, considering 2-D structures and several bands for updating a sample. In [Pie01b], various seminorms are combined in the same decision to take into account an increased number of possible 2-D structures. The framework is developed around the concept of seminorm: the decision is a threshold on a seminorm of the gradient vector. Lifting filter


Figure 4.3: In gray, the sets D−1 (c) for c = {1.6, 4, 6}, overlapped by the transformed sets a(D−1 (c)) (in black) using the PR condition αd > 0 and βd , γd non-increasing w.r.t. d. Visually, the injection condition is verified.

coefficients and the threshold for the decision recovery are drawn from seminorm properties. Different seminorms imply different filters and thresholds; the choice depends on the application at hand. An analysis of this extended framework leads to the same conclusion: PR comes from the injectivity of the transform when viewed as a mapping from an R × R^k space to itself. In brief, this analysis establishes that the essential property for the adaptive transform a(·) to attain PR is to be injective. This requirement is demanded of the generalized lifting defined in section 4.4. Note that the proof of lemma 4.1 is based on the injectivity condition, since it shows that two different points in the input space have the same output if the lemma condition is not fulfilled. In practice, the necessary condition in the hypothesis of the lemma is sufficient, because the input and output spaces are a bounded subset of R^{k+1} and so, adaptive lifting with PR may be obtained by finely tuning the variable κ_d and the values of α_d, β_d, and γ_d according to the gradient. The next section develops adaptive ULS within the described framework. The injectivity condition is imposed on both schemes and thus, PR is guaranteed. Their construction substantially differs from previous adaptive ULS based on seminorms and their properties. Experiments are performed to prove the usefulness of the proposals.

4.3 Adaptive Lifting Steps Construction

Two different approaches to the adaptive ULS design are adopted in this section. The first one guarantees PR with a median-based decision function; the scheme is similar to the so-called hybrid filters [Pit90]. The second proposal is fully developed within the mapping-between-spaces point of view acquired in the previous section; its decision function considers the geometrical place in the 3-D space formed by the points with equal variance.

4.3.1 Median-based Decision Adaptive ULS

4.3.1.1 Scheme Description

A set of ULS filters and decision functions that enable PR are proposed in this section. An interesting feature is that both the filter and the decision are based on the same rank-order selection function. The case of study is the median, but proposition 4.2 guarantees PR for any rank-order filter (ROF). This section incorporates a signal and filter notation that permits a clearer exposition and a better comprehension of the proposition below and its proof. The notation includes a "location" index and extended vectors. For the signal vectors, the notation is

$$ \mathbf{y}_j = (\, y_{j,1} \,\ldots\, y_{j,k} \,)^T, \quad \tilde{\mathbf{y}}_j = (\, y_{j,1} \,\ldots\, y_{j,k} \;\; x \,)^T, \quad \text{and} \quad \tilde{\mathbf{y}}'_j = (\, y_{j,1} \,\ldots\, y_{j,k} \;\; x' \,)^T, $$

where the subindex j refers to a window of signal y. For the filters, the notation is

$$ \mathbf{h}_j = (\, h_{j,1} \,\ldots\, h_{j,k} \,)^T \quad \text{and} \quad \tilde{\mathbf{h}}_j = (\, h_{j,1} \,\ldots\, h_{j,k} \;\; h_{j,k+1} \,)^T, $$

where filter j is applied to the j-th signal window. For the ULS case, when a signal window and filter i are chosen, the update filter is linear, with u = h_i, and the adaptive ULS reduces to

$$ x' = h_{i,k+1}\, x + \mathbf{u}^T \mathbf{y}_i = \tilde{\mathbf{h}}_i^T \tilde{\mathbf{y}}_i. $$

The proposal is inspired by the idea of the space-varying step described below, which has similarities with the space-varying prediction proposed in [Cla97] (cf. §2.3.1). Initially, there may be several prediction filters, each one having a different support: causal, anti-causal, both, etc. Then, the predicted value is selected among the results obtained from the different filters. A rank-order selection may be considered: for instance, the median is a reasonable choice, or, depending on the kind of image or image region, the maximum, minimum, or mean of several values are also possible choices. In a different context, the image coders CALIC [Wu97] and LOCO-I [Wei00] rely on a related strategy to perform a prediction. For the ULS case, the same idea is applicable. Assume a rank-order filter that selects as updated value the L-th rank-order value, noted ROF_L. Let ỹ_j be any detail sample window around


x and h̃_j any filter, with the (k+1)-th component h_{j,k+1} = h_{k+1}, ∀j ∈ [1, J]. Consider the values h̃_1^T ỹ_1, …, h̃_J^T ỹ_J as inputs to a rank-order filter, i.e., x′ = ROF_L{h̃_1^T ỹ_1, …, h̃_J^T ỹ_J}. The selected linear filter may be recovered for any rank-order filter employed in the coder when the same filter is used at the decoder, because the rank order of the elements {h̃_1^T ỹ_1, …, h̃_J^T ỹ_J} is the same as the rank order of the elements {h̃_1^T ỹ′_1, …, h̃_J^T ỹ′_J}.

This fact is made evident by noticing that the order is maintained for any pair of linear filter outputs among them. Assume h̃_i^T ỹ_i ≥ h̃_j^T ỹ_j and the output x′ = h̃_l^T ỹ_l for any l ∈ [1, J]; then

$$ \tilde{\mathbf{h}}_i^T \tilde{\mathbf{y}}_i \ge \tilde{\mathbf{h}}_j^T \tilde{\mathbf{y}}_j \;\Rightarrow\; \mathbf{h}_i^T \mathbf{y}_i + h_{k+1}\, x \ge \mathbf{h}_j^T \mathbf{y}_j + h_{k+1}\, x \;\Rightarrow\; \mathbf{h}_i^T \mathbf{y}_i + h_{k+1}\, x' \ge \mathbf{h}_j^T \mathbf{y}_j + h_{k+1}\, x' \;\Rightarrow\; \tilde{\mathbf{h}}_i^T \tilde{\mathbf{y}}'_i \ge \tilde{\mathbf{h}}_j^T \tilde{\mathbf{y}}'_j, $$

so the rank order is preserved. Therefore, a space-varying ULS may be constructed using any ROF and linear filters with the constraint that the coefficient for x be the same for all filters. This scheme is not truly adaptive, since the decision may be recovered without the use of the value x′. A scheme related to this space-varying ULS is constructed based on proposition 4.2, which assures PR. The proposed scheme is truly adaptive in the sense that the value x′ is required for the decision recovery. Indeed, the adaptive ULS output x′ may be the input x, which is not possible in the previous space-varying ULS example if h_{k+1} ≠ 1. The trade-off is that the ROF has only three inputs.

Proposition 4.2 Let ỹ_j, for j ∈ {1, 2}, be any detail sample window around x and h̃_j any filter, with equal (k+1)-th component, h_{1,k+1} = h_{2,k+1} > 0. Consider the products h̃_1^T ỹ_1, h̃_2^T ỹ_2, and x as inputs of a rank-order filter. The output of the ROF is the updated sample, i.e., x′ = ROF{x, h̃_1^T ỹ_1, h̃_2^T ỹ_2}. Then, the decision recovery condition holds for any rank-order filter used at the coder when the same filter is used at the decoder. The index of the rank-order filter output as decision function implies PR.

Proof. First, it is proved that the order of the elements at the coder, {x, h̃_1^T ỹ_1, h̃_2^T ỹ_2}, is maintained at the decoder, {x′, h̃_1^T ỹ′_1, h̃_2^T ỹ′_2}, if the output of the rank-order filter is x′ = x:

1. For h̃_i^T ỹ_i ≥ h̃_j^T ỹ_j, then h̃_i^T ỹ′_i = h̃_i^T ỹ_i ≥ h̃_j^T ỹ_j = h̃_j^T ỹ′_j.

2. For h̃_i^T ỹ_i ≥ x (the same demonstration holds for h̃_i^T ỹ_i ≤ x), then h̃_i^T ỹ′_i = h̃_i^T ỹ_i ≥ x = x′.

Therefore, the order is preserved in both cases for any rank-order filter. Assume now that x′ = h̃_1^T ỹ_1 (the same proof holds for x′ = h̃_2^T ỹ_2). With 3 elements, there are three possible rank-order filters: the minimum, the maximum, and the median. The proposition is proved for


the median and the maximum, while the proof for the minimum is "symmetrical" to that of the maximum.

1. ROF = median and h̃_2^T ỹ_2 ≥ h̃_1^T ỹ_1 ≥ x, so x′ = h̃_1^T ỹ_1, as assumed. Then, it should be proved that h̃_2^T ỹ′_2 ≥ h̃_1^T ỹ′_1 ≥ x′:

• h̃_2^T ỹ_2 ≥ h̃_1^T ỹ_1 ⇔ h_2^T y_2 + h_{k+1} x ≥ h_1^T y_1 + h_{k+1} x ⇔ h_2^T y_2 + h_{k+1} x′ ≥ h_1^T y_1 + h_{k+1} x′ ⇔ h̃_2^T ỹ′_2 ≥ h̃_1^T ỹ′_1.

• h̃_1^T ỹ_1 ≥ x ⇔ h_{k+1} h̃_1^T ỹ_1 ≥ h_{k+1} x, being h_{k+1} > 0. Adding h_1^T y_1 on both sides: h_1^T y_1 + h_{k+1} h̃_1^T ỹ_1 ≥ h_1^T y_1 + h_{k+1} x, which is equivalent to saying h̃_1^T ỹ′_1 ≥ x′.

2. ROF = maximum and h̃_1^T ỹ_1 ≥ h̃_2^T ỹ_2 and h̃_1^T ỹ_1 ≥ x, so x′ = h̃_1^T ỹ_1. Then, it should be proved that h̃_1^T ỹ′_1 ≥ h̃_2^T ỹ′_2 and h̃_1^T ỹ′_1 ≥ x′:

• h̃_1^T ỹ_1 ≥ h̃_2^T ỹ_2 ⇔ h_1^T y_1 + h_{k+1} x ≥ h_2^T y_2 + h_{k+1} x ⇔ h_1^T y_1 + h_{k+1} x′ ≥ h_2^T y_2 + h_{k+1} x′ ⇔ h̃_1^T ỹ′_1 ≥ h̃_2^T ỹ′_2.

• h̃_1^T ỹ_1 ≥ x ⇔ h_{k+1} h̃_1^T ỹ_1 ≥ h_{k+1} x, being h_{k+1} > 0. Adding h_1^T y_1 on both sides: h_1^T y_1 + h_{k+1} h̃_1^T ỹ_1 ≥ h_1^T y_1 + h_{k+1} x, which is equivalent to saying h̃_1^T ỹ′_1 ≥ x′.

□

Assume that a ROF_L is used as ULS. Then, the decision function indicates which linear filter h̃_j for j ∈ {0, 1, 2} has been used to update, with u = h_j. The decision function output is the index of the filter chosen by the ROF_L, noted index(ROF_L). If the decoder employs the same ROF_L, proposition 4.2 guarantees that the index of the filter selected by the ROF_L is the same. The linear filter being known, its inversion is straightforward in order to recover x:

$$ x = \frac{x' - \mathbf{h}_j^T \mathbf{y}_j}{h_{k+1}}. $$

Therefore, the resulting scheme is PR. The next diagram aims to clarify this point.

              ROF
   R × R^k ------> R × R^k
      |               |
  index(ROF)      index(ROF)
      v      Id       v
  {0, 1, 2} <---> {0, 1, 2}
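Proposition 4.2 with the median as rank-order filter translates into a short routine; the function names are illustrative, and ties among the three inputs are assumed not to occur.

```python
import numpy as np

def rof_uls(x, windows, filters, hk1):
    """Sketch of the adaptive ULS with the median as rank-order filter.

    windows[j] and filters[j] hold the detail window y_j and the k-tap
    filter h_j; hk1 > 0 is the shared (k+1)-th coefficient weighting x.
    Returns x' = median{x, h~_1^T y~_1, h~_2^T y~_2}.
    """
    vals = [x] + [float(np.dot(h, y)) + hk1 * x
                  for h, y in zip(filters, windows)]
    return sorted(vals)[len(vals) // 2]

def rof_uls_inverse(xp, windows, filters, hk1):
    """Decoder: recompute the median index on {x', h~_j^T y~'_j} (the rank
    order is preserved, so the same filter index is recovered) and invert."""
    vals = [xp] + [float(np.dot(h, y)) + hk1 * xp
                   for h, y in zip(filters, windows)]
    d = int(np.argsort(vals)[len(vals) // 2])   # recovered decision index
    if d == 0:                                  # the median was x itself
        return xp
    h, y = filters[d - 1], windows[d - 1]
    return (xp - float(np.dot(h, y))) / hk1     # x = (x' - h_j^T y_j) / h_{k+1}
```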

4.3.1.2 Experiments

To assess the usefulness of the ROF-based adaptive ULS, the signal of figure 4.4 is decomposed into 4 resolution levels. This signal has homogeneous, linear, and quadratic regions, representing


a first approximation of an image model. Additive Gaussian noise with different power is added to the signal. The fixed average update filter h̃ = ( 1/3 1/3 1/3 )^T [Pie01a] is employed for comparison. The adaptive ULS is the median of the input sample, the linear filter output with h̃, and the linear filter output with a delayed version of the filter h̃. The ULS is followed by the linear PLS with p = ( 1/2 1/2 )^T. The weighted first-order entropy is measured from the resulting decomposition. The weighted first-order entropy [Cal98] is defined as the entropy of each band weighted according to the number of samples belonging to the band. Figure 4.5 depicts the weighted entropy as a function of the SNR; the graph is the mean of 200 trials. The adaptive scheme consistently improves on the non-adaptive case. This is due to the smoothing effect of the rank-order selection: a filter may cross an edge or any other structure, giving a transformed coefficient value appreciably different from the original one or from the coefficient obtained through a linear filter that does not cross the edge. The median discards these extreme values, providing a smoother approximation signal.

Another comparison is given using filters of different sizes, in the spirit of [Cla97]. The fixed update and the subsequent fixed prediction are the same as in the precedent experiment, while the inputs of the median are the centered linear filters

$$ \tilde{h}_0 = 1, \quad \tilde{\mathbf{h}}_1 = (\, 1/3 \;\; 1/3 \;\; 1/3 \,)^T, \quad \text{and} \quad \tilde{\mathbf{h}}_2 = (\, 1/5 \;\; 1/5 \;\; 1/5 \;\; 1/5 \;\; 1/5 \,)^T. $$

The results are visualized in figure 4.6. They are worse than in the previous case, specially for low SNR, because the h̃_2 filter support is too large, allowing the noise to affect the result and filtering through edges and different kinds of regions. However, the adaptive case remains better than the non-adaptive one once the SNR increases beyond 10 dB.
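The figure of merit used in this experiment can be sketched as follows, assuming integer-valued subband samples for the histogram.

```python
import numpy as np

def weighted_first_order_entropy(bands):
    """Weighted first-order entropy [Cal98]: the first-order entropy of
    each subband, weighted by the fraction of the total samples it holds."""
    total = sum(b.size for b in bands)
    H = 0.0
    for b in bands:
        _, counts = np.unique(b, return_counts=True)  # empirical histogram
        p = counts / b.size
        H += (b.size / total) * float(-np.sum(p * np.log2(p)))
    return H
```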

4.3.2 Variance-based Decision Adaptive ULS

4.3.2.1 Scheme Description

The local variance of the signal is considered in this section in order to construct a new adaptive ULS. Variance is interesting as a decision function in the following sense: if its value is low, then the signal is locally homogeneous, and in this case a low-pass filter as ULS is a reasonable option, since the approximation signal becomes smooth and useful for the subsequent prediction. On the other hand, if the variance is high, then it may be assumed that the signal has a local structure, like a texture or an edge, so applying a low-pass filter would blur the approximation signal, damaging the structure without a clear benefit for the prediction. Instead, if there is no update, the original sample value flows to the following resolution level and so does the structure to which it belongs, yielding a more meaningful lower resolution image.


Figure 4.4: Test signal.

Figure 4.5: Comparison of the non-adaptive and the median adaptive ULS using a delayed linear filter.

Figure 4.6: Comparison of the non-adaptive and the median adaptive ULS using linear filters of different sizes.


Two detail samples are used for the construction of the variance-based adaptive ULS, i.e., k = 2. The ULS depends on an approximation sample x_n and its two detail sample neighbors y_{n−1} and y_n. The mean of these three samples is

$$ m = \frac{x_n + y_{n-1} + y_n}{3}, $$

and the variance at sample n (omitting the division by 3) is

$$ \sigma_n^2 = (x_n - m)^2 + (y_{n-1} - m)^2 + (y_n - m)^2. \tag{4.8} $$

The decision at n depends on σ_n^2. According to the previous discussion, if the local variance exceeds a certain threshold T, σ_n^2 > T, then the updated value equals the original one, x'_n = x_n. If it does not exceed the threshold, σ_n^2 ≤ T, then the updated value is a function x'_n = f(x_n, y_{n-1}, y_n) that decreases the variance, i.e., that smooths the approximation signal. At the decoder side, the variance is checked in order to decide whether a smoothing function was applied at the coder and thus recover the original sample value. If the original value is always recovered, the scheme is PR. In the following, a PR scheme employing the variance as decision function is described. The proposed function f is a mapping designed from geometrical considerations; it maps R × R^2 to itself while, correspondingly, the local variance is mapped from σ_n^2 ∈ R+ to σ_n'^2 ∈ R+.

Developing the expression of the local variance (4.8), one gets

σ_n^2 = (2/3) (x_n^2 + y_{n-1}^2 + y_n^2 − x_n y_{n-1} − x_n y_n − y_{n-1} y_n).   (4.9)

The analysis of section 4.1 for the l1-norm decision case is repeated in a similar way for the local variance decision based on equation (4.9). The locus of the 3-D space points with the same variance is an ellipsoidal cylinder; all these cylinders share a common axis, the line x_n = y_{n-1} = y_n, which is the set of points with zero variance. For each cylinder a decision is made to map it into the output space. Decisions are made with the goal of mapping different cylinders into the output space without intersections, i.e., fulfilling the injectivity condition. Ideally, each cylinder containing points with σ_n^2 ≤ T should be projected to a cylinder with smaller variance σ_n'^2 ≤ σ_n^2. However, the LS imposes an insurmountable constraint that invalidates this kind of projection: only the component x_n may be modified, whereas projecting a whole cylinder onto a smaller one would require all the components to vary. A modification of this variance-based decision is proposed below.


Suppose that the initial variance is σ_n^2 and that it is desired to map any point with this variance to a point with smaller variance σ_n'^2 = s σ_n^2, where s ∈ [0, 1] is a variance reduction factor. The updated values x'_n that attain the variance σ_n'^2 are

x'_n = (y_{n-1} + y_n)/2 ± (1/2) √Δ_c,

where

Δ_c = 3 (2 s σ_n^2 − (y_{n-1} − y_n)^2),

which has no real solution when Δ_c < 0. The reduction factor s determines the existence of a solution: if s ≥ s_min = (y_{n-1} − y_n)^2 / (2 σ_n^2), then Δ_c ≥ 0. The reduction factor fixes the new local variance, but its admissible values are restricted by the relation between the detail samples y_{n-1} and y_n and the variance σ_n^2, through the value of s_min. In order to account for this restriction, the reduction factor s is imposed to be a function v(·) of the minimum feasible reduction factor s_min. It is possible to perfectly reconstruct x_n from x'_n and the detail samples if the function v(·) fulfills the following three conditions:

1. v(·) is defined on the interval [0, 1].
2. ∀x ∈ [0, 1) : v(x) > x.
3. The equation v(x) = kx has no more than one solution for any k.

The first condition is due to the domain of s and s_min. The second one arises from the fact that s cannot be smaller than s_min, since s_min is the minimum reduction factor with a real solution. Finally, the original sample is recovered by solving the equation v(x) = kx, so it is uniquely decoded if the equation has one and only one solution. Two simple functions meet these three conditions, one linear and one quadratic:

1. s = v1(s_min) = (1 − λ) s_min + λ, for λ ∈ [0, 1].
2. s = v2(s_min) = (λ − 1) s_min^2 + 2(1 − λ) s_min + λ, for λ ∈ [0, 1].

With these preliminaries established, an algorithm performing the variance-based adaptive ULS is described in table 4.1. The algorithm inputs are x_n, y_{n-1}, y_n, and the threshold T. The output is the updated coefficient x'_n. The subindex c denotes coding: σ_c^2 is the coding-side local variance. The corresponding decoding algorithm is described in table 4.2. Its inputs are x'_n, y_{n-1}, y_n, and the threshold T. The output is x_{n,d}, which coincides with x_n. The subindex d denotes decoding-side quantities.


Variance-based adaptive ULS coding algorithm:

1. Compute:
   • σ_c^2 = (2/3)(x_n^2 + y_{n-1}^2 + y_n^2 − x_n y_{n-1} − x_n y_n − y_{n-1} y_n)
   • If σ_c^2 ≥ T, then x'_n = x_n. End algorithm.
2. Compute:
   • s_min = (y_{n-1} − y_n)^2 / (2 σ_c^2)
   • α = (y_{n-1} + y_n)/2
3. Obtain the variance reduction factor through s = v(s_min).
4. Compute:
   • Δ_c = 3(2 s σ_c^2 − (y_{n-1} − y_n)^2)
5. Obtain the output:
   • If x_n ≥ α, then x'_n = α + (1/2)√Δ_c.
   • If x_n ≤ α, then x'_n = α − (1/2)√Δ_c.

Table 4.1: Variance-based adaptive ULS coding algorithm.

4.3.2.2 Experiments

The experimental setting of §4.3.1 is repeated for the variance-based adaptive ULS in order to assess its usefulness. The test signal, range of SNR, number of trials, number of resolution levels, etcetera, are the same. The employed function v(·) is the linear one, with λ = 0.2. In this case, solving the equation v(s_{min,d}) = k s_{min,d} in step 2.3 of the decoding algorithm is straightforward:

s_{min,d} = λ / (k + λ − 1).

The threshold is set to T = 20. Results are shown for a broad set of SNR in figure 4.7. To fix a comparison reference, the previous non-adaptive case with h̃ = ( 1/3 1/3 1/3 )^T is also depicted. Results of the variance-based adaptive ULS clearly improve on those of the fixed update. A delicate choice in this scheme is that of the threshold T. Needless to say, the best value is signal-dependent. However, qualitative indications may be given in order to choose a good threshold. T should be large enough to permit the variance reduction in nearly homogeneous regions and small enough to leave structures unaffected by the filtering; thus, T depends on how “homogeneous” the regions and how “salient” the structures are, i.e., it depends on the kind of signal. Values between 3 and 50 work correctly. For the test signal of figure 4.4, the threshold giving the least weighted entropy as a function of the SNR among the set T ∈ [1, 400] is depicted


Variance-based adaptive ULS decoding algorithm:

1. Compute:
   • σ_d^2 = (2/3)(x'_n^2 + y_{n-1}^2 + y_n^2 − x'_n y_{n-1} − x'_n y_n − y_{n-1} y_n)
   • If σ_d^2 ≥ T, then x_{n,d} = x'_n. End algorithm.
2. Recover s_min:
   2.1. Compute:
      • c_1 = (y_{n-1} − y_n)^2 / 2
      • k = σ_d^2 / c_1
   2.2. Deal with limit cases:
      • If σ_d^2 = 0, then x_{n,d} = y_{n-1} = y_n. End algorithm.
      • If σ_d^2 ≠ 0 and c_1 = 0, then s_{min,d} = 0. Go to step 3.
   2.3. Compute s_{min,d}:
      • Solve the equation v(s_{min,d}) = k s_{min,d}.
3. Obtain the variance reduction factor with s = v(s_{min,d}).
4. Recover the variance at the coder:
   • σ_c^2 = σ_d^2 / s
5. Compute Δ_c:
   • Δ_c = 3(2 σ_c^2 − (y_{n-1} − y_n)^2)
6. Obtain the output:
   • Compute α = (y_{n-1} + y_n)/2
   • If x'_n ≥ α, then x_{n,d} = α + (1/2)√Δ_c.
   • If x'_n ≤ α, then x_{n,d} = α − (1/2)√Δ_c.

Table 4.2: Variance-based adaptive ULS decoding algorithm.
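The coding and decoding algorithms of tables 4.1 and 4.2 can be sketched together and checked for perfect reconstruction (a Python illustration over the reals, using the linear v(·) with λ = 0.2; the handling of the zero-variance limit case in floating point is an assumption):

```python
import math

LAM = 0.2  # lambda of the linear v(.)

def v(s):
    return (1 - LAM) * s + LAM

def encode(x, ym1, y0, T):
    """Variance-based adaptive ULS coding step (table 4.1)."""
    var_c = (2.0 / 3.0) * (x*x + ym1*ym1 + y0*y0 - x*ym1 - x*y0 - ym1*y0)
    if var_c >= T or var_c == 0:          # no smoothing applied
        return x
    s_min = (ym1 - y0) ** 2 / (2 * var_c)
    alpha = (ym1 + y0) / 2.0
    s = v(s_min)
    delta = 3 * (2 * s * var_c - (ym1 - y0) ** 2)
    root = 0.5 * math.sqrt(max(delta, 0.0))
    return alpha + root if x >= alpha else alpha - root

def decode(xp, ym1, y0, T):
    """Variance-based adaptive ULS decoding step (table 4.2)."""
    var_d = (2.0 / 3.0) * (xp*xp + ym1*ym1 + y0*y0 - xp*ym1 - xp*y0 - ym1*y0)
    if var_d >= T:
        return xp
    if var_d == 0:                        # limit case: x = y_{n-1} = y_n
        return ym1
    c1 = (ym1 - y0) ** 2 / 2.0
    if c1 == 0:
        s_min = 0.0
    else:
        s_min = LAM / (var_d / c1 + LAM - 1)  # closed-form inverse of v(s)=k*s
    s = v(s_min)
    var_c = var_d / s                     # variance at the coder
    alpha = (ym1 + y0) / 2.0
    delta = 3 * (2 * var_c - (ym1 - y0) ** 2)
    root = 0.5 * math.sqrt(max(delta, 0.0))
    return alpha + root if xp >= alpha else alpha - root
```

Note that the decoder side decision σ_d^2 ≥ T never contradicts the coder side one, since σ_d^2 = s σ_c^2 ≤ σ_c^2.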


in figure 4.8. Specifically, in this toy example it is observed that there are intervals of T for which the weighted entropy does not vary appreciably (cf. figure 4.9). The reason is that no new structures are filtered within such an interval. A gap appears whenever a new edge or structure starts being filtered.

4.4 Generalized Lifting

This section presents a generalization of the lifting scheme (illustrated in figure 4.10) as stated in [Sol04a]. The proposed scheme is similar to the classical lifting except that the sums after the prediction and update steps are embedded in a more general framework, extending the adaptive lifting idea of §4.1. For instance, in the classical lifting the prediction is viewed as a filter that generates a predicted value used to modify y[n] through a subtraction. In the generalized lifting (GL) scheme the prediction is viewed as a function that maps y[n] to y'[n] taking into account values from the approximation signal x. The restriction of modifying y[n] only through a sum has been removed, so the scheme allows more complex, possibly adaptive or nonlinear modifications. The same generalization can be done for the ULS:

y'[n] = P(y[n], x[n]),
x'[n] = U(x[n], y'[n]).

Furthermore, in order to have a reversible scheme, the generalized prediction and update cannot be chosen arbitrarily. The restriction to be imposed on a generalized step to attain reversibility is injectivity, as analyzed in §4.2. The following is a formal definition of a generalized lifting step (GLS). Let A be the set of functions a from R × R^k to itself,

a ∈ A ⇔ a : R × R^k → R × R^k,

such that

(z'_1[n] z'_2[n−n_1] ... z'_2[n−n_k]) = a(z_1[n] z_2[n−n_1] ... z_2[n−n_k]).

Here, samples are denoted by z in order to maintain the same definition for both lifting steps. For the prediction (update) step, it is assumed that z_1[n] = y[n] and z_2[n] = x[n] (respectively, z_1[n] = x[n] and z_2[n] = y[n]). Let A_0 be the subset of A containing all functions that do not modify z_2[n], that is, for which the restriction to R^k is the identity:

A_0 = { a ∈ A | a|_{R^k → R^k} = I_k }.

In the sequel, a GLS is considered a function of A in order to highlight its dependence on k + 1 samples. However, a lifting step can only modify z_1[n], so it is a function belonging to the

Figure 4.7: Comparison of the non-adaptive and the variance-based adaptive ULS. Weighted entropy is depicted as a function of the SNR with the threshold T = 20.

Figure 4.8: Threshold T value minimizing the weighted entropy of the decomposed test signal for the range of SNR from 0 to 25 dB.

Figure 4.9: Weighted entropy for the 4-resolution-level decomposition of the test signal with SNR = 12 dB as a function of the threshold value.

Figure 4.10: Generalized lifting scheme.

subset A_0. At the same time, if a reversible scheme is desired, the GLS should be an injective function of A_0. The same statements apply to the generalized prediction and update steps. As a result, a GLS is defined as an injective function of A_0. The GL has several interesting properties, detailed in the ensuing points:

• Depending on the point of view, the same lifting step can be considered a nonlinear or an adaptive filter. The GL scheme gives a connection between nonlinear and adaptive lifting filters: both are seen as mappings between real spaces.

• The GL gives an insight into the kind of decision function required to reach reversibility. Concretely, mathematically complex conditions for reversibility have been reduced to the simple injectivity condition.

• The GL inherits from the adaptive LS the capacity to expand a signal through one of several wavelet bases and to recover the basis of expansion from the transform coefficients alone, without any book-keeping, which is not possible in a linear decomposition scheme.

• The GL also allows the construction of adaptive non-separable 2-D transforms.

The last two properties in the list above are illustrated with a single example. The 2-D non-separable property comes from the sample re-ordering of figure 4.11. Two sets of 2-D basis vectors are depicted in figure 4.12: the first basis is the canonical one and the second is the separable 2-D Haar. This example constructs a system that selects one of four possible 2-D bases. The scheme uses an adaptive ULS with a thresholded gradient-based decision function (4.5), taken up again from [Pie01b]. The sample filtering dependencies are modified as figure 4.13 shows. Both ULSs employ y0 and y1. They are followed by the LeGall 5/3 prediction. The transformed coefficients are the two approximation samples x'0 and x'1 and the two detail samples y'0 and y'1.
If the l1-norm of the gradient is below a threshold T, then the decision is binarized to d = 0 and the update filter is ( α0 β0 β0 ) = ( 1/3 1/3 1/3 ). Alternatively, if the gradient norm is

Figure 4.11: 1-D signal and the corresponding 2-D signal notation.

Figure 4.12: Two examples of 2-D basis vectors: canonical basis and the non-normalized 2-D Haar basis.

equal to or greater than T, then the decision is d = 1 and the filter is ( α1 β1 β1 ) = ( 1 0 0 ). The selected decision function and filters imply that the underlying decomposition basis switches among four bases according to the decision at n = 0 (the value of d0) and at n = 1 (d1). Figure 4.14 shows the basis vectors depending on d0 and d1. The structure in figure 4.13 is reversible because each step is reversible. The structure itself imposes a constraint on the transformed coefficients that makes it possible to deduce the expansion basis and the original data from them, which is generally not possible. For instance, if the threshold is T = 2, then the coefficients ( x'0 y'0 x'1 y'1 ) = ( 4/3 3 −7/6 −1/6 ) may only arise from the original data ( x0 y0 x1 y1 ) = ( 1 1 2 3 ), the decisions being d0 = 0 and d1 = 1.

4.4.1 Discrete Generalized Lifting

Generalized lifting in its continuous form is hardly useful for compression, because quantization implies a critical trade-off between global injectivity and bit-rate. The discrete generalized lifting is the solution proposed here to the quantization problem. The GL scheme as presented so far assumes that the values taken by x and y are real numbers. In many applications related to compression, the values of x and y are quantized before transmission. In this case, it is the composed mapping Q(a(z_1[n] z_2[n−n_1] ... z_2[n−n_k])), where Q(·) represents the quantization, that should be injective.
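A toy illustration of how quantization breaks injectivity (the prediction and quantizer below are hypothetical illustrative choices, not part of the proposed scheme — simply the smallest case exhibiting a collision):

```python
# Real-valued, injective-in-y prediction step followed by a step-2 quantizer.
def predict(y, x_left, x_right):
    """y -> y - (x_left + x_right) / 2: injective for fixed neighbors."""
    return y - (x_left + x_right) / 2.0

def quantize(v, step=2):
    """Uniform quantizer with the given step."""
    return step * round(v / step)

# Two distinct inputs with identical context collapse onto the same output.
a = quantize(predict(0.0, 0.0, 0.0))
b = quantize(predict(1.0, 0.0, 0.0))
assert a == b  # Q(a(.)) is no longer injective, hence not invertible
```

Working directly on integers, as the discrete generalized lifting does, avoids the quantization stage altogether.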


Figure 4.13: Modified lifting sample dependencies using an adaptive ULS and a fixed PLS.

Figure 4.14: Example set of adaptive basis vectors.


Figure 4.15: Discrete mapping from the Z_255 × Z_255^2 space to itself. The lifting step is reversible if the mapping from each column C_{i,j}, (i, j) ∈ Z_255^2, to itself is bijective.

Several reversible schemes that include quantization may be found. However, most of the resulting decompositions are not suitable for compression, since quantization destroys the injectivity condition. A possible solution is to consider the discrete version of the generalized LS. To this goal, the values taken by x and y and the generalized step outputs x' and y' are assumed to be integers. In this case, no quantization is necessary after a lifting step and the only issue is to design a discrete injective mapping. Although the injectivity condition is still applicable in the discrete case, a bijectivity condition arises in a natural way because the input and output spaces have the same finite size, so an injective mapping is necessarily one-to-one and onto. Consider now the following framework for discrete gray-scale images where each pixel is represented by 8 bits. Without loss of generality, sample values are assumed to range from −128 to 127. Let Z_255 be the set of integers belonging to the interval [−128, 127]. The discrete generalized update and prediction are functions from the Z_255 × Z_255^k space to itself that can only modify the first component. The statements made in the real case are also valid for the discrete case. In particular, reversibility is obtained if the mappings

(z'_1[n] z'_2[n−n_1] ... z'_2[n−n_k]) = a(z_1[n] z_2[n−n_1] ... z_2[n−n_k])

are bijective. For z_2[n−n_1], ..., z_2[n−n_k] fixed, the set of all possible values of z_1[n] describes a column in the Z_255 × Z_255^k space. Let such a column be denoted by C_{i∈Z_255^k}:

C_{i∈Z_255^k} = { z_1[n] : z_2[n−n_1] = i_1, ..., z_2[n−n_k] = i_k }.   (4.10)

As the generalized update and prediction may only modify the component z_1[n], they map each column C_{i∈Z_255^k} to itself. In order to have a reversible scheme, the mapping of Z_255 × Z_255^k to itself should be bijective for all columns. Figure 4.15 illustrates the case k = 2. To simplify the notation, column C_{i∈Z_255^2} is denoted by C_{i,j}. Once the previous definitions have been stated, the critical problem is to design useful GLSs for specific applications or, equivalently, to choose an appropriate mapping among the huge number of possibilities: for instance, the number of bijective mappings of a single column to itself is the factorial of 256. The choice or design is crucial and may depend on many factors, such as compression performance on a given image or class of images, computational cost, or memory requirements. The following chapter proposes and analyzes some discrete generalized prediction (§5.1) and update (§5.2) steps.
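The column-wise bijectivity requirement is easy to exercise on a toy mapping (a sketch; the modular prediction below is an illustrative choice, not one of the designs proposed in the next chapter):

```python
def to_signed(v):
    """Wrap an integer into Z_255 = [-128, 127] (arithmetic modulo 256)."""
    return (v + 128) % 256 - 128

def predict(y, i1, i2):
    """Discrete generalized prediction on column C_{i1,i2}: subtract the
    neighbor average modulo 256. Modular arithmetic keeps every column
    mapping bijective on Z_255."""
    return to_signed(y - (i1 + i2) // 2)

def invert(yp, i1, i2):
    """Inverse column mapping, recovering y exactly."""
    return to_signed(yp + (i1 + i2) // 2)

# Bijectivity on a few columns: every output value appears exactly once,
# and the inverse mapping recovers every input.
for (i1, i2) in [(0, 0), (-128, 127), (50, -3)]:
    outs = {predict(y, i1, i2) for y in range(-128, 128)}
    assert len(outs) == 256
    assert all(invert(predict(y, i1, i2), i1, i2) == y
               for y in range(-128, 128))
```

Because the column mapping is a permutation of a finite set, injectivity and bijectivity coincide, exactly as argued above.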

Chapter 5

Generalized Discrete Lifting Steps Construction

Chapter 4 has provided a theoretical analysis of the adaptive lifting and the presentation of the generalized lifting framework. The objective of chapter 5 is the development of specific GLSs within the new framework and the description of their performance in some image compression applications. The discrete GL (§4.4.1) is further developed to construct nonlinear and adaptive PLSs in §5.1 and ULSs in §5.2. The first proposal (§5.1.1) is a PLS design in the philosophy of the approaches in §2.3.1, which have no direct optimization criterion. The remaining proposals aim to optimize a certain criterion, like the approaches described in §2.3.2. The chapter summary and some conclusions are given in §5.3.

The framework is essentially devoted to nonlinear processing, which implies a fundamental drawback in the context of compression. Filter design for an embedded lossy-to-lossless code becomes difficult, since the frequency-band notion disappears and the inter-band relations are less obvious. Furthermore, the output coefficients may not be a continuous function of the input samples, so quantization errors may be magnified through resolution levels. For these reasons, this work is restricted to lossless compression. This choice is also motivated by the large number of applications in which the original image data should be exactly recovered, such as remote sensing and biomedical imaging.

A point that needs justification is the following. Wavelet-based image coders rely on the frequency-band decomposition interpretation, but frequency is a linear concept invalidated by the proposed nonlinear schemes. In this work, a transform band is assumed to be a subset of samples that are expected to share the same statistics. This implies that the samples coming from the same filtering channel form a band, and these bands are coded as if they were obtained from the usual spatial-frequency wavelet decomposition. The assumption is surely not optimal, but it does not seem to worsen performance significantly.

5.1 Generalized Discrete Prediction Design

This section discusses three approaches for a discrete PLS design. In the case of prediction, a column (4.10) is defined as

C_{i∈Z_255^k} = { y[n] : x[n−n_1] = i_1, ..., x[n−n_k] = i_k }.   (5.1)

The filter design problem amounts to finding a mapping from every column of the Z_255 × Z_255^k space to the transformed column (noted C'_{i∈Z_255^k}),

C'_{i∈Z_255^k} = { y'[n] : x[n−n_1] = i_1, ..., x[n−n_k] = i_k }.   (5.2)

Every column mapping should be bijective for the transform to be reversible, according to the considerations established in §4.4.1. The restriction to k = 2 holds. Therefore, two neighbors are considered for the prediction of the sample in between, as in the classical lifting with the LeGall 5/3 wavelet filter.

5.1.1 Geometrical Design of the Prediction

The design proposed in [Sol04a] is outlined in this section. The approach is quite intuitive and shows the GL flexibility, because the design reduces to manipulating the mapping from a three-dimensional space to itself according to three simple rules depending only on geometrical distances. Every point of the left space of figure 4.15 is mapped (or transformed) to a point of the right space following the three rules, which do not explicitly try to minimize any criterion but are based on intuitive arguments. In the Z_255 × Z_255^2 space, the line l : x[n] = x[n+1] = y[n] plays a special role, because every point p on l should be mapped to the point (0, x[n], x[n+1]) to have a zero detail output when the input signal is constant. Then, the mapping of a point p : (y[n], x[n], x[n+1]) is based on its relative position and its distance w.r.t. the line l. The distance between a point and a line is defined as the minimum of the distances between the point and any point of the line. The squared distance of the point p to the line l is given by

distance(p, l)^2 ∝ y[n]^2 − (x[n] + x[n+1]) y[n] + x[n]^2 + x[n+1]^2 − x[n] x[n+1].   (5.3)

Note that (5.3) is the equation of a parabola with respect to y[n]. The prediction is constructed by reordering the points of a column C_{i,j} according to their distance to l and the following rules, which impose conditions on the filter.


1. Vanishing first moment of the detail signal. This condition and the restriction to k = 2 completely specify a linear filter. In the nonlinear case, this rule only means that the point of every column nearest to l should be mapped to 0. The nearest point y_min is the average of the two neighbors,

y_min = y[n] = (x[n] + x[n+1]) / 2.   (5.4)

The rest of the points are specified by the two other rules, which try to employ in the most effective way the additional degrees of freedom obtained from relaxing the linearity constraint.

2. Continuity. A desired property is that similar inputs give similar outputs. Therefore, the prediction should be a function of y[n] as continuous as possible. This is attained inside the so-called linear zone (figure 5.1), where the values below y_min are mapped to negative integers maintaining their order and, in the same fashion, values over y_min are mapped to positive integers.

3. Logical nonlinear mapping. Beyond the linear zone, values are alternately mapped to the remaining positive and negative integers. This nonlinear zone generally exists in mappings from a finite discrete space to itself. The proposal is logical because in natural images it maps the more probable remaining points to the minimum output values, thus minimizing the output energy. Obviously, other “logical” mappings exist. For instance, it is also interesting to preserve continuity by using the mapping with minimum discontinuities. It is possible to construct a mapping with only one discontinuity, with the trade-off that it does not minimize the output energy for a wide range of images. In practice, the coding results that we have obtained are alike.

The prediction based on the geometrical design (geometrical prediction, for short) is equivalent to the classical LeGall 5/3 wavelet PLS inside the linear zone. This is verified by observing that both mappings are linear, that the output of both mappings is zero when y[n] = y_min (5.4), and that the output varies in the same way as a function of y[n] (cf. the input-output relation in figure 5.1). Outside the linear zone, the geometrical prediction does not correspond to a simple linear filter. However, this mapping offers several advantages:

1. The mapping is easily computed through the distance function, avoiding the use of a look-up table.

2. The resulting detail samples have the typical “high-pass meaning” within the linear zone.

3. Wavelet-type output coefficients, which amount to: (a) the possibility to attain a multi-resolution decomposition, and (b) the employment of usual entropy coders.



Figure 5.1: Distance between the points of a column to the line l and the proposed geometrical mapping for the generalized prediction.
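A concrete realization of the column mapping of figure 5.1 can be sketched as follows (the rounding of y_min and the tie-breaking between equidistant points are assumptions, and the exact linear/nonlinear zone handling may differ slightly from the scheme above, but the mapping below is bijective and order-preserving near y_min):

```python
def geometric_column_map(xl, xr):
    """Bijective mapping of one column {y in [-128, 127]} for fixed
    neighbors (xl, xr): points sorted by distance to the line l are
    assigned outputs 0, 1, -1, 2, -2, ... (nearest point -> 0)."""
    y_min = round((xl + xr) / 2)
    # Distance to l is monotone in |y - y_min| (a parabola in y);
    # ties are broken in favor of the value above y_min.
    inputs = sorted(range(-128, 128), key=lambda y: (abs(y - y_min), y < y_min))
    outputs, mag = [0], 1
    while len(outputs) < 256:   # 0, 1, -1, 2, -2, ..., 127, -127, -128
        if mag <= 127:
            outputs.append(mag)
        outputs.append(-mag)
        mag += 1
    return dict(zip(inputs, outputs))

# Inside the linear zone the map reproduces the LeGall 5/3 detail y - y_min.
m = geometric_column_map(10, 12)          # y_min = 11
assert m[11] == 0 and m[12] == 1 and m[10] == -1
# Bijectivity: every output in [-128, 127] appears exactly once.
assert sorted(m.values()) == list(range(-128, 128))
```

Inverting the mapping at the decoder only requires rebuilding the same dictionary from the (already decoded) neighbors and reading it backwards.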

4. For most images, the geometrical prediction detail signal energy is smaller than with the LeGall 5/3 wavelet, as has been experimentally verified.

A resulting detail sample with the typical high-pass meaning is important if an update filter follows the prediction, because the update can then operate as in the classical lifting. An update is useful for a multi-resolution decomposition, since posterior processing of the approximation signal in the next resolution level performs better when the signal is a low-pass version of the original data than when it is a simple down-sampled version.

5.1.1.1 Experiments and Results

This experiment is presented as in [Sol04a]. The geometrical PLS performance is assessed in a multi-resolution framework. To this end, the scheme is completed with a space-varying update,

x'[n] = { x[n],                              if max(|y'[n−1]|, |y'[n]|) > T,
        { x[n] + ⌊(y'[n−1] + y'[n])/4⌋,      otherwise,

which varies according to the modulus of the detail signal samples. A sample x[n] is updated with the two detail samples y'[n−1] and y'[n]. If the moduli of these detail samples are small then, as a first approximation, they are high-pass coefficients and can be directly used by the ULS as

Rate (bpp)       2 resolution levels          3 resolution levels
Image/Filter    Haar     5/3     G. Pr.      Haar     5/3     G. Pr.
Lenna           6.034    6.207   5.827       5.243    4.882   4.831
Baboon          7.418    7.202   6.784       6.740    6.419   6.446
Barbara         6.828    6.498   6.602       5.787    5.449   5.481
Peppers         6.648    6.475   6.058       5.513    5.265   5.276
Girl            6.235    5.973   5.565       4.896    4.549   4.451
Cameraman       6.478    6.726   6.334       5.323    5.191   5.108
Goldhill        7.059    6.437   6.465       6.192    5.926   5.853
Mean            6.671    6.503   6.234       5.671    5.383   5.349

Table 5.1: Natural images rate for 2 and 3 resolution levels using the Haar wavelet, the LeGall 5/3 wavelet (column headed by “5/3”), and the geometrical prediction (column “G. Pr.”) followed by the SPIHT coder. Results are given in bits per pixel (bpp).

classical updates do. Detail samples with large values mean, also as a first approximation, that y[n] comes from an edge. If a smooth x' is desired, edges should not flow to lower resolution levels and, consequently, no update is performed. Values are considered small or large according to a threshold T, fixed to 12 as the best value after several experiments with natural images. Since values in the ensuing resolution levels may not have the same dynamic range, the discrete generalized prediction is modified to handle an arbitrary range of values. The algorithm is the same, but the range of values has to be sent to the decoder to recover the original data. One resolution level is obtained by first filtering every row and then only the columns of the approximation image. This leads to a three-band decomposition. The method is applied to 7 natural images and compared to two non-adaptive wavelet filters: the Haar and the LeGall 5/3 wavelets. Decompositions are followed by the SPIHT coder. Resulting bit-rates are shown in table 5.1. For two resolution levels and the tested images, the proposed scheme performs around 4.5% better than the LeGall 5/3 wavelet. For three levels, results are only slightly better than LeGall's. This decrease in gain is possibly due to the worse multi-resolution performance of the prediction and update filters, which are not the best choice for obtaining a good approximation signal for further processing. The geometrical PLS based decomposition is also applied to the MRI group of images through the three dimensions. The decomposition has 4 bands per resolution level, instead of the usual 8 bands. The transformed coefficients are coded with SPIHT 3-D, which is detailed in appendix 5.B. Table 5.2 contains the final bit-rates. In contrast with the natural images experiment, the MRI images are better compressed for all resolution levels with the geometrical approach. For this set, the geometrical PLS reduces the detail signal energy w.r.t. its linear counterpart. Meanwhile, the space-varying ULS is effective enough to obtain approximation signals that are good for further decomposition.

Rate (bpp)     2 res. lev.   3 res. lev.   4 res. lev.
LeGall 5/3     4.980         3.798         3.667
Geom. Pred.    4.943         3.731         3.597

Table 5.2: MRI set compressed with SPIHT 3-D using the LeGall 5/3 wavelet and the geometrical prediction.

These results are justified from a statistical point of view in §5.1.2, where the generalized PLS is optimized w.r.t. the image probability density function (pdf). The assumption of an underlying image pdf has proven useful in our practice. There is a typical pdf that leads to an optimized prediction mapping equivalent to the geometrical prediction, thus explaining the reported compression results.

Extensions

The transform support of the proposed decomposition is 3×1 and 5×3 pixels per transformed coefficient for the H1 and HL1 bands (the first detail bands). This support is smaller than that of the 2-D LeGall 5/3 wavelet, which is 3×3, 3×5, and 5×3 for the HH1, HL1, and LH1 bands, respectively. Therefore, the proposed geometrical prediction scheme obtains better compression results using less information (i.e., fewer input samples contribute to each output sample). However, difficulties arise in the generalization of the proposal to larger supports. The mapping between 3-D spaces becomes a less intuitive higher-dimensional mapping. In each case, a geometrical locus that plays a role similar to the line l in the proposed scheme has to be found. For instance, let us analyze the case k = 4, in which four neighbors are used for the generalized prediction. In a similar manner to k = 2, the line

l1 : x[n−1] = x[n] = x[n+1] = x[n+2] = y[n],

formed by the points that have all components equal, may be considered. In this case, the nearest point of every column to l1 is

y_min = y[n] = (x[n−1] + x[n] + x[n+1] + x[n+2]) / 4,

which is again the average of the neighbors, as for k = 2. The mapping with l1 is interesting on a two-dimensional image grid: it is logical to predict a pixel with the mean of its left, right, up, and down neighbors. However, the resulting mapping does not vanish four moments of the detail signal, which is a property attainable with k = 4. To this goal, the appropriate line is

l2 : −x[n−1] = 9x[n] = 9x[n+1] = −x[n+2] = (41/4) y[n].

The line l2 is related to the Lagrange interpolating polynomial of degree three with equidistant points. The mapping arising from l2 vanishes four moments.
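The connection with the degree-three Lagrange interpolator can be checked numerically: the midpoint prediction with the standard equidistant weights (−1, 9, 9, −1)/16 reproduces any cubic polynomial exactly, which is precisely the four-vanishing-moments property (the sampling positions below are an illustrative choice):

```python
# Four equidistant samples at t = -1.5, -0.5, 0.5, 1.5 predict the value
# at t = 0 exactly for any polynomial of degree <= 3.
def lagrange_midpoint(f):
    return (-f(-1.5) + 9 * f(-0.5) + 9 * f(0.5) - f(1.5)) / 16.0

for a, b, c, d in [(1, 0, 0, 0), (0, 1, 0, 0), (2, -3, 1, -5), (-4, 7, 0, 2)]:
    f = lambda t: a * t**3 + b * t**2 + c * t + d
    # The detail sample y - prediction vanishes for every cubic input.
    assert abs(f(0.0) - lagrange_midpoint(f)) < 1e-12
```

Any mapping whose zero-output point coincides with this prediction therefore annihilates constant, linear, quadratic, and cubic signals.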

5.1.2 Optimized Prediction Design

This section addresses the work presented in [Sol04c]. The PLS design is formulated as an optimization problem that depends on the signal probability density function. The resulting lifting step is applied to biomedical images (mammography) and remote sensing images (sea surface temperature) with good results. As stated in §4.4.1, the transform is reversible if every column mapping is bijective. Columns form a partition of the space Z_255 × Z_255^k, so the prediction mappings are independent. Accordingly, every column mapping P_i(·) is designed independently of the others,

y'[n] = P(y[n], x[n]) = ∪_{i ∈ Z_255^k} P(y[n], x[n])|_{x[n]=i} = ∪_{i ∈ Z_255^k} P_i(y[n])|_{x[n]=i}.    (5.5)

Given i ∈ Z_255^k, the transform relates every input value y[n] ∈ Z_255 one-to-one to every output value y'[n] ∈ Z_255. Therefore, output values for each i are related to input values simply through a permutation matrix. A prediction step P is seen as the union of |Z_255^k| permutation matrices, denoted P_i. Consequently, the complexity associated to this formulation grows exponentially with k. In practice, one has to use a low value of k (i.e., a reduced number of context values x[n]) or to take advantage of the similarities that may arise between permutation matrices. State-of-the-art entropy coders benefit from several characteristics of wavelet coefficients (cf. §2.5). Specifically, they tend to perform better when the coefficient energy is minimized. Therefore, a reasonable goal is to design a mapping that minimizes the expected energy of the detail signal. Such an optimal prediction is

P_opt = arg min_P E[y'^2] = ∪_{i ∈ Z_255^k} arg min_{P_i} E[y'^2 | x = i].    (5.6)

The second equality in (5.6) is due to the independence between columns. As a result, the design of the prediction function reduces to the definition of the optimal column mapping P_i(·) (or permutation matrix P_i) for each column:

E[y'^2 | x = i] = Σ_{n=−128}^{127} n^2 Pr(y' = n | x = i)
               = Σ_{n=−128}^{127} n^2 Pr(P_i(y) = n)
               = Σ_{n=−128}^{127} n^2 Pr(y = P_i^{−1}(n)).    (5.7)

Figure 5.2: Optimized prediction design.

Note that Pr(·) stands for the probability function. Expression (5.7) can be reformulated as

E[y'^2 | x = i] = Σ_{n=−128}^{127} P_i^{−1}(n)^2 Pr(y = n | x = i)
               = [P_i^{−1}(−128)^2 … P_i^{−1}(127)^2] [Pr(y = −128 | x = i) … Pr(y = 127 | x = i)]^T,    (5.8)

because P_i(·) is bijective. Expression (5.9) is obtained by introducing the permutation matrix in (5.8):

E[y'^2 | x = i] = [(−128)^2 … (127)^2] P_i [Pr(y = −128 | x = i) … Pr(y = 127 | x = i)]^T.    (5.9)

The energy expectation in (5.9) is minimized when the permutation matrix relates input values of high probability with small-energy output values. Proposition 5.1 in the appendix demonstrates this statement. The permutation matrix optimizing (5.9), which relates input conditional probabilities with output energies, is used in the discrete sample space to relate each input with the corresponding output. Figure 5.2 illustrates the point. Then, assuming that the pdf is known, a column map is created by constructing a vector with the input values sorted by probability in descending order. The first element of this vector, which is the most probable input sample for the given context, is assigned (mapped) to the output value 0 (the minimum-energy output). Next, the output value −1 is assigned to the second element of the vector (corresponding to the input value of second highest probability), 1 is assigned to the third element, −2 to the fourth, and so on. In practice, a PLS is performed by column mappings, which are look-up-tables that reorder input values according to their probabilities. These look-up-tables are a more practical representation of permutation matrices.
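The construction just described can be sketched as follows (illustrative code, not the thesis implementation; the toy alphabet and pmf are invented stand-ins for Z_255 and a real conditional pdf): sort the input values by conditional probability and assign the outputs 0, −1, 1, −2, 2, … in that order.

```python
# Sketch (illustrative): build one column mapping P_i as a look-up-table.
# Inputs are sorted by conditional probability; the most probable input is
# mapped to output 0, the next ones to -1, 1, -2, 2, ... (increasing energy).
def build_column_lut(prob):
    # prob: dict mapping each input value to Pr(y = value | x = i)
    by_prob = sorted(prob, key=lambda v: -prob[v])   # descending probability
    outputs = [0]
    k = 1
    while len(outputs) < len(prob):                  # 0, -1, 1, -2, 2, ...
        outputs.append(-k)
        if len(outputs) < len(prob):
            outputs.append(k)
        k += 1
    return {v: o for v, o in zip(by_prob, outputs)}

# Toy conditional pmf on a tiny alphabet (invented numbers)
pmf = {-2: 0.05, -1: 0.15, 0: 0.1, 1: 0.5, 2: 0.2}
lut = build_column_lut(pmf)
expected_energy = sum(pmf[v] * lut[v] ** 2 for v in pmf)
identity_energy = sum(pmf[v] * v ** 2 for v in pmf)
```

On this toy pmf the sorted assignment yields a lower expected detail energy than the identity mapping, in line with Proposition 5.1.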

5.1.2.1 Experiments and Results

This design strategy is applied to three classes of images: natural images and two classes of specific images, mammography and sea surface temperature images. The last two classes are chosen because their pdf differs significantly from that of natural images. As in the previous section, the restriction to k = 2 holds. The decomposition is followed by an entropy coder, SPIHT or EBCOT.

Experiment 1: Natural Images  The probability distribution function of a sample y[n] conditioned to the value of its two vertical neighbors, x[n] and x[n + 1], is extracted from a set of seven natural images (those of table 5.1). Figure 5.3 partly represents the average pdf: it is a histogram depicting the frequency of occurrence of a sample value as a function of the mean of its two neighbors,

m = (x[n] + x[n + 1]) / 2.

Note that a complete representation would have four dimensions, because the histogram depends on both neighbors and not only on their mean, but this simplified representation allows us to analyze the system behavior. A common pattern is observed for all contexts in this pdf. Concretely, it has a maximum at the mean value m and decreases monotonically and symmetrically on both sides. This structured pdf makes it possible to avoid implementing the PLS through the look-up-table design described previously. Once m is computed, the conditional probability order only depends on the difference dy[n] = y[n] − m. The value of dy[n] is related to the number of input values with higher probability than y[n], which have to be mapped to lower energies than y[n]. Therefore, dy[n] indicates the output value corresponding to y[n]. For testing purposes, the seven natural images are compressed with the 1-D 2-tap optimized prediction and the SPIHT coder. No update step is used. Images are first filtered vertically and then the approximation signal is filtered horizontally, resulting in a three-band decomposition. The same PLS is employed vertically and horizontally for all resolution levels. The optimized prediction performs better than the LeGall 5/3 wavelet for 2 resolution levels and marginally better for 3 resolution levels. However, very similar results are obtained for both decompositions using the EBCOT coder. This fact suggests that, for natural images, the design strategy does not provide a prediction significantly different from LeGall's 5/3 linear case. In order to clarify this point, let us analyze the prediction resulting from the optimization strategy. The prediction mapping that arises from the natural-image pdf has two distinct parts: a linear part and a nonlinear part. Figure 5.4 shows the prediction mapping when the context is x[n] = x[n + 1] = −28. The context value is indicated by a vertical line at −28.
Input values between -128 and 72 are linearly mapped to output values

Figure 5.3: A pdf approximation (logarithmic scale histogram) of y[n] (Y axis) conditioned to the mean value of its two vertical neighbors (X axis) for the set of natural images.

between −100 and 100. This mapping is almost equivalent to the linear combination

y'[n] = y[n] − (x[n] + x[n + 1]) / 2.

The linear part of the mapping is due to the pdf having a maximum at m and decreasing monotonically and symmetrically on both sides. In fact, when the conditional pdf has this shape (figure 5.5), the optimized design coincides with the geometrical design of §5.1.1. For input values above 72, the mapping is highly nonlinear, but this arises from the choice to work with a discrete finite output space, with values between −128 and 127. As a result of this analysis, it can also be deduced that the mapping is the same as the LeGall 5/3 prediction filter for the most probable input values. Therefore, a powerful coder like EBCOT returns practically the same results for both decompositions. On the other hand, there is a potential compression gain for those images belonging to a class with a pdf that significantly differs from that of natural images. The following experiments 2 and 3 illustrate results for two classes of biomedical and remote sensing images with such a pdf.
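The natural-image shortcut can also be sketched in code (an illustrative toy, assuming an idealized symmetric, unimodal, Laplacian-like conditional pdf; the alphabet and spread are invented): ranking inputs by conditional probability is then identical to ranking them by |y − m|, and the optimized column mapping collapses to the closed form y'[n] = y[n] − m, so no look-up-table is needed.

```python
import math

# Sketch (assumption: the conditional pdf is unimodal and symmetric about the
# neighbor mean m, as observed for natural images). Under this assumption the
# probability ordering equals the |y - m| ordering, and the 0, -1, 1, -2, 2, ...
# output assignment reduces to y'[n] = y[n] - m.
m = 2
values = list(range(m - 6, m + 7))                        # symmetric alphabet around m
pmf = {v: math.exp(-abs(v - m) / 3.0) for v in values}    # Laplacian-like shape

rank_by_prob = sorted(values, key=lambda v: -pmf[v])
rank_by_dist = sorted(values, key=lambda v: abs(v - m))

# Assign outputs 0, -1, 1, -2, 2, ... in probability order (as in section 5.1.2)
outputs = [0]
k = 1
while len(outputs) < len(values):
    outputs += [-k, k]
    k += 1
lut = dict(zip(rank_by_prob, outputs))
```

With this idealized shape, every entry of the LUT equals y − m, which is why the structured natural-image pdf makes the explicit table unnecessary.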

Experiment 2: Biomedical Images  The set of the first 11 mammography images from the database is selected for the biomedical experiment. The size of these images is about 1 Mbyte without compression. Six mammographies are used to estimate the pdf for this class of images. The resulting pdf does not exhibit a regular pattern as in the case of natural images. As figure 5.6a shows, the mammography pdf is not as structured as the natural-image pdf. The histogram is neither symmetrical nor monotonically decreasing. Usually, several maxima appear and, in addition, darker values are rather probable for most contexts. Figure 5.7a depicts


Figure 5.4: Example of an optimized prediction mapping (solid line) for natural images and LeGall 5/3 prediction (dash-dot line) for the same context (the vertical dotted line indicates the value of both neighbors).

Figure 5.5: Typical conditional pdf for natural images of a sample given the mean m of its two neighbors.

the mapping when x[n] = x[n + 1] = −88. As may be observed, the mapping of the most probable input values (around −88) is quite nonlinear. The decomposition is performed for the five mammographies not used for the pdf estimation. They are compressed with EBCOT. For comparison, the images are also decomposed with the LeGall 5/3 transform followed by EBCOT (both steps compose lossless JPEG2000). Results (table 5.3) are 5% to 6% better for the generalized prediction at all resolution levels.

Experiment 3: Remote Sensing Images  The last experiment applies the optimized prediction to a set of sea surface temperature (SST) images obtained with the Advanced Very High Resolution Radiometer sensors (AVHRR/2&3) of the National Oceanic and Atmospheric Administration (NOAA) satellite series [U.S]. Image sizes range from 5 to 7 Mbytes. This specific set is devoted to the African northwest coast and forms a huge image corpus, so modeling

Figure 5.6: Pdf (logarithmic scale histogram) of y[n] (Y axis) conditioned to the mean value of its two vertical neighbors (X axis) for the set of training (a) mammography and (b) SST images.

Figure 5.7: Example of an optimized mapping (solid line) for (a) the mammography class and (b) the SST class, and the LeGall 5/3 prediction mapping (diagonal dash-dot straight line) for the same context (vertical dotted lines indicate the neighbors values).

Rate (bpp)      2 resolution levels    3 resolution levels    4 resolution levels
Image/Filter    5/3      Opt. Pr.      5/3      Opt. Pr.      5/3      Opt. Pr.
Mamo7           2.938    2.854         2.763    2.674         2.724    2.633
Mamo8           3.279    3.141         3.073    2.928         3.026    2.879
Mamo9           3.100    3.039         2.897    2.831         2.874    2.782
Mamo10          1.993    1.803         1.785    1.595         1.733    1.546
Mamo11          2.196    1.980         1.959    1.736         1.890    1.672
Mean            2.701    2.563         2.495    2.353         2.444    2.302

Table 5.3: Mammography images lossless compression bit-rate with EBCOT using LeGall 5/3 and the optimized prediction (Opt. Pr.).

Rate (bpp)      2 resolution levels    3 resolution levels    4 resolution levels
Image/Filter    5/3      Opt. Pr.      5/3      Opt. Pr.      5/3      Opt. Pr.
SST AfrNW 4     2.992    2.475         2.996    2.371         2.996    2.343
SST AfrNW 5     2.566    2.113         2.566    2.026         2.566    2.005

Table 5.4: SST images lossless compression bit-rate with EBCOT using LeGall 5/3 and the optimized prediction (Opt. Pr.).

the pdf is worth the effort compared to the gains in compression. Three SST images are used to estimate the pdf. The resulting mapping is stored in memory and the prediction is then performed using look-up-tables. The conditional pdf of this kind of images (figure 5.6b) differs significantly from that of natural images. The lightest and darkest values are highly probable for all contexts. In these circumstances, the LeGall 5/3 prediction performs poorly, implying that there is much to be gained by departing from linear processing. Figure 5.7b shows an example of a mapping. In this case, the context is x[n] = −1 and x[n + 1] = 12. As can be seen, the optimized prediction mappings for these images are quite nonlinear. Two other SST images are compressed by these means, followed by EBCOT. A gain of 20% is obtained compared to lossless JPEG2000 (table 5.4). For the SPIHT coder, gains are even larger in terms of bit savings. For instance, the SST image AfrNW 5 is compressed to 2.89 Mbytes with the optimized prediction, but only to 3.29 Mbytes with the LeGall 5/3 transform.

Remarks  Results obtained by the optimized prediction are promising. A reduced context (k = 2) has been used, but even in this case an 8 Mbyte LUT is required, which may be a drawback for some applications. In addition, a major difficulty appears at this point, since larger supports seem difficult to handle in practice because of the exponential growth of the memory requirements as a function of k. This problem may be tackled by context quantization or by exploiting redundancy if the pdf shows some structure. Notice that only one LUT is used per image class for all resolution levels, whereas statistics change from one resolution level to the next, which is a sampled version of the preceding level. Accordingly, results should improve if a LUT with the specific optimized prediction for each resolution level is used. Even better, a LUT with the optimized prediction may also be constructed and stored for each filtering direction at each resolution level. The tradeoff is between compression performance and memory requirements. This specific drawback is partly solved in the next section, §5.1.3.

5.1.3 Adaptive Optimized Prediction Design

This section explains a modification of the optimized prediction, presented in [Sol05], that avoids the need for prior knowledge of the image pdf and thus also the storage of a LUT for every image class at the coder and decoder sides. Furthermore, it may avoid any LUT storage at all if the application at hand requires it, at the cost of further processing. Indeed, in this approach the LUT may differ from level to level and for each filtering direction, which may result in compression gains w.r.t. one fixed LUT per image class. The drawback is the computational cost of an adaptive pdf estimation. The pdf estimate should be updated at each sample n in a way that permits the coder and the decoder to reach the same results, i.e., a synchronized iterative estimation. Therefore, the prediction is adapted to the image statistics and, moreover, the pdf may be independently estimated for each resolution level and each direction, reaching a finer optimization than a fixed LUT. Non-parametric density estimation methods are suited for this application because they model data without making any assumption about the form of the distribution. Kernel-based methods are a subclass of these methods; they construct the estimate by locating weighted kernel functions at the index positions of the samples. Experiments using different kernel shapes and bandwidths have been carried out, leading to similar results for a wide range of values. The delta function has been chosen as the kernel. It is the simplest kernel and amounts to the computation of the histogram. The delta kernel is the choice because its results are no worse than those of other kernels and it has two interesting properties for our purpose. First, it can be demonstrated that the histogram pdf estimate converges to the optimal pdf that minimizes the detail signal energy for the image at the given resolution level and filtering direction.
Second, in practice, the choice of the delta kernel avoids an explicit pdf estimation that other choices would require: since only one histogram bin is modified at each sample, it is only necessary to re-order that bin in the vector that relates input probabilities with output values. In consequence, the time-consuming pdf re-estimation and the sorting pass over the probabilities for constructing the input-output vectors are avoided. An initial pdf estimate is required when no data is available. Different initial estimates may be considered. For example, an interesting approach is to use the LUT of the image class

Rate (bpp)               JPEG2000    Fixed Pred.    Adap. Pred.    JPEG-LS
SST (3 im.)              2.874       2.326          2.356          2.322
Mammography (5 im.)      2.444       2.302          2.333          2.355
Cmpnd1 - 512 x 768       2.082       —              1.352          1.242
Chart - 1688 x 2347      3.088       —              3.038          2.836

Table 5.5: Bit-rate comparison. Mean values for the SST and mammography classes and for 2 synthetic/compound images using 4 resolution levels.

at hand and then refine the pdf on the fly for the specific image being coded. For the following experiments, the chosen a priori is the pdf corresponding to natural images. At a given sample, the pdf estimate is obtained by adding the a priori (the natural-image pdf) to the histogram of all samples seen up to the current one. The estimated pdf is then used to optimize the prediction for the current sample.
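A minimal sketch of such a synchronized adaptive column mapping follows (hypothetical class and names, not the thesis code; the toy alphabet and sample stream are invented). Each new sample increments one histogram bin, and only that bin is bubbled up in the probability-ordered vector, so no full re-sort or explicit pdf re-estimation is needed; running the same updates at the coder and decoder keeps both sides synchronized.

```python
class AdaptiveColumn:
    """Sketch of one context column of the adaptive optimized PLS."""

    def __init__(self, values):
        # Delta-kernel estimation: the pdf is just a histogram of counts.
        self.count = {v: 0 for v in values}
        # Input values ordered by count, descending: the current LUT order.
        self.order = sorted(self.count, key=lambda v: -self.count[v])
        self.pos = {v: i for i, v in enumerate(self.order)}

    def update(self, y):
        # One sample touches one bin; bubble it toward the front while it
        # outranks its predecessor, instead of re-sorting the whole vector.
        self.count[y] += 1
        i = self.pos[y]
        while i > 0 and self.count[self.order[i - 1]] < self.count[y]:
            other = self.order[i - 1]
            self.order[i - 1], self.order[i] = y, other
            self.pos[y], self.pos[other] = i - 1, i
            i -= 1

    def output(self, y):
        # Rank r maps to the energy-ordered outputs 0, -1, 1, -2, 2, ...
        r = self.pos[y]
        return r // 2 if r % 2 == 0 else -((r + 1) // 2)

# Feed a few samples: the most frequent value ends up mapped to output 0.
col = AdaptiveColumn([0, 1, 2])
for s in (2, 2, 1, 2):
    col.update(s)
```

The a-priori natural-image histogram of the text would simply initialize `count` with nonzero values instead of zeros.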

5.1.3.1 Experiments and Results

For testing purposes, several images are compressed with the proposed 1-D 2-tap adaptive optimized prediction followed by the EBCOT coder. Note that no ULS is used. The image is first filtered vertically and then only the approximation signal is filtered horizontally (resulting in a three-band decomposition), because it is observed that applying the horizontal filter on the detail signal degrades results. The pdf is estimated twice at each resolution level: vertically and horizontally. For comparison, images are coded with lossless JPEG2000 using the LeGall 5/3 filter, and with the fixed prediction of §5.1.2 (assuming the pdf is available for the image class) followed by EBCOT. Table 5.5 shows results for 4 resolution level decompositions. The optimized prediction applied to natural images tends to perform slightly worse than the LeGall 5/3 filter for all resolution levels. The adaptive optimized prediction performs 4.5% better than JPEG2000 for mammographies and 18% better for SST images, that is, only slightly worse than the fixed method but without the drawback of keeping a LUT in memory for every image class. For synthetic images (which cannot be treated as a class of images) the adaptive prediction gives compression rates up to 80% better than LeGall's 5/3. As an example, results for two images from the official JPEG2000 test set (cmpnd1 and chart) are given in table 5.5. These images are composed of text, figures, and natural images (figure 5.8), so they tend to significantly worsen the results of the adaptive prediction with respect to "pure" synthetic images. JPEG-LS bit-rates are also given. These results show that the conditional pdf does not need to be known in advance: for a wide range of images it may be adaptively estimated.

Figure 5.8: A compound and a synthetic image, (a) cmpnd1 and (b) chart image.

Rate (bpp)     LeGall 5/3    Geom. Pr.    Fixed Pr.    Adaptive Pr.
2 res. lev.    4.980         4.943        4.740        4.632
3 res. lev.    3.798         3.731        3.745        3.618
4 res. lev.    3.667         3.597        3.635        3.508

Table 5.6: MRI set compressed with SPIHT 3-D using LeGall 5/3, the geometrical prediction (Geom. Pr.), the prediction optimized for the natural images (Fixed Pr.), and the adaptive prediction.

The adaptive optimized PLS is also applied to the MRI group of images along the three dimensions. The transformed coefficients are coded with 3-D SPIHT. Table 5.6 shows the bit-rates, compared to those obtained with LeGall's 5/3, the geometrical PLS-based decomposition of §5.1.1.1, and the fixed prediction optimized for natural images, which is the point of departure for the adaptive prediction. The fixed prediction behaves better than LeGall 5/3 despite the fact that the chosen pdf is that of natural images, which does not correspond to the MRI pdf. On the contrary, the geometrical PLS-based decomposition attains better results than the fixed optimized PLS for 3 and 4 resolution levels. This means that for the MRI set the space-varying ULS is beneficial for the multi-resolution decomposition. The adaptive prediction starts with the natural-image pdf but successfully captures the underlying MRI pdf, since its final bit-rate is the best. The huge size of this set and the similarity of all its images help to reach finer adaptation.

Remarks  The algorithm can be slightly modified. If the transform is performed backwards, i.e., starting the prediction process at the coarsest approximation band, estimating the pdf, and then computing the coefficients from coarse to fine scales, the coder and decoder can be kept perfectly synchronized. By these means, the finer bands, which are also the largest, are coded with a pdf estimated from coarser resolution levels, which may lead to better results than using a "blind" initial pdf. Experiments show that this gain exists, but it is marginal.

5.1.3.2 Convergence Issues

The following experiments assess the convergence of the adaptive pdf estimation. Prediction mappings are constructed using the pdf estimated at different image points. The initial prediction coincides with the natural-image prediction. Then, the pdf is progressively estimated, so the prediction at the end of the process employs the optimal image pdf. Since the prediction goal is to minimize the energy of the detail coefficients, the mappings constructed at different points are used to decompose the whole image and the resulting detail coefficient energy is computed. Figure 5.9 depicts the evolution of the normalized detail energy as a function of the percentage of the image used to construct the optimized prediction. The energy decreases in almost all cases. The two-slope convergence curve of the Barbara image (appendix A) is due to the evident non-stationarity of this image. In the right half of the image, highly textured regions appear, such as the striped trousers, which belong to contexts never seen before. When the probability conditioned to these contexts is learned, the prediction, which performed poorly in such difficult regions, quickly adapts to the new pattern. Therefore, the detail energy decreases strongly after 50% of the image has been analyzed. Because of the variety of contexts and the small image size, the adaptive algorithm is able to capture the behavior of the Barbara image. However, notice that this knowledge is obtained a posteriori. In practice, the adaptation to patterns that are initially difficult to predict is effective when they appear repeatedly. For the mammography, the curve is smoother because all contexts are quite similar and the pdf does not differ considerably from that of natural images. This difference is more remarkable for the SST image, a fact that causes the strong decrease in detail signal energy, which is also due to the higher number of different contexts (for the land, sea, and cloud regions).
The previous considerations lead to the conclusion that the adaptive algorithm is able to learn the image statistics. It remains to establish whether this learning implies energy minimization. The experiment assessing this is summarized in figures 5.10 and 5.11. It shows the relative detail signal energy obtained by decomposing the image with the adaptive optimized prediction w.r.t. the detail energy obtained with the initial-pdf optimized prediction. The energy is averaged over each column. A 10-tap low-pass filter has been applied to the plot in order to remove an annoying jitter and thus make the global trend easier to observe. The visual inspection of figure 5.10 reveals that the learning is much more effective for the SST image. Meanwhile, the

Figure 5.9: Adaptive prediction convergence for 3 images. The vertical axis is normalized by the energy obtained using the natural-images pdf mapping on the image.

adaptive estimation efficiency is less obvious for the mammography. On average, the detail signal energy is lower. However, for some regions the adaptive algorithm performs worse than the non-adaptive case. This is due to the statistics varying in a way that makes the initial pdf perform better than the adaptively estimated pdf, i.e., the region statistics match the initial pdf assumption better. It may be concluded that the adaptive algorithm attains its best performance for images with slowly varying statistics that differ from the natural-image pdf. Figure 5.11 seems to confirm this conclusion. The adaptive scheme worsens the energy obtained for the natural image Barbara w.r.t. the initial pdf. Meanwhile, the image cmpnd1 attains a considerable energy reduction due to the statistics of the letter region. Note that the relative energy is one for the white regions on both sides of the image: the energies are equal because both filters are the same, since the optimized prediction is that given by the initial pdf. The results for these two images highlight the relation between energy reduction and compression performance. The adaptive scheme reaches very good compression rates for the image cmpnd1, whereas it does not improve results for Barbara.
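The per-column measurement behind figures 5.10 and 5.11 can be sketched as follows (illustrative helper functions with invented names and toy data, not the thesis code): the relative energy of two detail decompositions is computed column by column and then smoothed with a 10-tap moving average, as described in the text.

```python
def relative_column_energy(detail_a, detail_b):
    # Ratio of detail energies, column by column (lists of coefficient lists).
    return [sum(c * c for c in ca) / sum(c * c for c in cb)
            for ca, cb in zip(detail_a, detail_b)]

def moving_average(x, taps=10):
    # Simple 10-tap low-pass used to remove jitter from the energy curve.
    return [sum(x[i:i + taps]) / taps for i in range(len(x) - taps + 1)]

ratio = relative_column_energy([[1.0, 1.0]], [[2.0, 0.0]])   # one toy column
smoothed = moving_average([1.0] * 5 + [0.5] * 15, taps=10)
```

A smoothed value below 1 over a run of columns indicates that the adaptive prediction is reducing the detail energy there relative to the initial-pdf prediction.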

5.2 Generalized Discrete Update Design

This section establishes guidelines for good generalized update designs. Surprisingly, this task is much more difficult than the generalized PLS design, a fact highlighted in §2.3, where most of the reviewed works propose a prediction step rather than an update and, moreover, many of them do not use any update at all. While the goal of a PLS seems quite clear, this is not the case for the update, especially in a nonlinear setting like the discrete GL framework.


Figure 5.10: Detail signal relative energy of the adaptive optimized prediction with respect to the energy using the initial-pdf optimized prediction. Performance for a mammography and an SST image.

Figure 5.11: Detail signal relative energy of the adaptive optimized prediction with respect to the energy using the initial-pdf optimized prediction. Performance for the Barbara and cmpnd1 images.


Section 5.2.1 tries to make the objectives of an ULS evident. At the end of §5.2.1, a joint update-prediction step is proposed in order to show the importance of the multi-resolution properties of the approximation signal. The final sections describe two situations: the design of an ULS when it is the first of the lifting steps (§5.2.3), and when the ULS follows the prediction (§5.2.2).

5.2.1 Update Step Objectives

The most usual lifting structure is the prediction-then-update (or update-last) structure, in which the polyphase decomposition is directly followed by a PLS. First, the even-samples channel is used to extract redundancy from the odd channel. The differences, which are the detail or wavelet coefficients, are left in this odd channel. Details are small, except at significant features (those difficult to predict from neighboring data). Then, the wavelet coefficients are used to update the even channel in order to obtain a coarse-scale version of the input signal that approximates this input as accurately as possible. The ULS can be seen as an anti-aliasing filter after the data splitting, i.e., it has the same objective as the low-pass filter in the classical filter-bank implementation. In these prediction-then-update schemes the function of the ULS is twofold: to obtain an accurate approximation signal for embedded coding and to ensure that this signal is useful for the processing of the next resolution level. A problem arises when the PLS is nonlinear, because it is then unclear how to construct an ULS that preserves the signal for further processing. Frequency localization, a key property for filter design, is lost. In consequence, powerful linear signal processing tools (such as the Fourier or z transforms) are no longer available. The non-existence of their nonlinear equivalents seriously limits the ability to face the challenge. Possibly for this reason, there are no works in compression applications (to the author's knowledge) with an update after a nonlinear prediction, except the proposal in §5.1.1.1. In consequence, a down-sampled version of the original signal (that is, the approximation subsignal without update) seems to have better multi-resolution properties than any output of two nonlinear filter stages. In lossy subband video coding, some authors, e.g., [Luo01], have reported that the ULS degrades rate-distortion performance and should be omitted altogether, leading to a truncated wavelet transform. This is a controversial assertion, since other works [Gir05, Til05] suggest that an accurate design of the ULS produces better rate-distortion curves. To sum up, in the video coding field the appropriate motion compensation for an update step is not obvious at all. Two ways may still be open for the prediction-then-update architecture with nonlinear filters. The first is to assume a frequency interpretation of the signal even though it no longer holds. As a first approximation (as in §5.1.1) it may be useful. However, the assumption forces the design to remain close to the linear restrictions. A second way is to definitively free the scheme from linear ties at the


cost of accepting the reduced set of tools that remain at hand. Section 5.2.2 follows this path, fully interpreting the signal from the statistical point of view. The ULS assumes that the image pdf is known and minimizes the signal entropy, which is a common optimization criterion in compression applications (cf. §2.3.2). In some proposals the update is the first of the steps after the splitting, an update-then-prediction (or update-first) lifting scheme. In this architecture, while the PLS goal remains the same, the ULS purpose may change. The following analyzes the ULS purpose in an update-first structure. Linear filter banks may be reversed, interchanging the analysis and synthesis stages and, with them, the corresponding underlying biorthogonal bases. From this point of view, the linear space is equally partitioned independently of the order of the bases. Therefore, if the linear prediction-then-update structure is reversed, the result is still a wavelet filter bank. For instance, the 5/3 wavelet becomes a 3/5 wavelet that has an update as the first lifting step on the analysis side. Also, there exist families of wavelets with a longer high-pass filter, for example the (1, N) Cohen-Daubechies-Feauveau family [Coh92]. All these examples fit in the update-then-prediction lifting structure. In this situation, the update is a low-pass filter that aims to preserve the signal running average. However, a scaling factor is required in the even channel in order to maintain the expected signal mean. The coefficients after the scaling factor are real-valued. The inclusion of a rounding operator at this point is not possible without losing information, so these schemes are unsuitable for lossless coding applications. They are used in lossy compression, but such a filter bank produces more ringing artifacts in the decoded image at low bit-rates. A second ULS purpose is to preserve singularities in the approximation signal.
Image salient structures that may carry most significant information are preserved at coarser scales. The drawback is precisely that through these structures the prediction is difficult and a singularity preserving ULS makes this difficulty to be found in the approximation signal throughout all the resolution levels, thus possibly damaging the global performance. However, this approach is interesting for embedded coding. Finally, the update-then-prediction structure has an advantage if the ULS is linear and the transform is only iterated on the low-pass channel because then all low-pass coefficients throughout the entire decomposition linearly depend on the original data (as the example of figure 2.11) and so, they are not affected by the prediction step, which may be nonlinear and freely designed. Moreover, the PLS is only based on low-pass coefficients. Once surveyed these objectives, the possibility to include them in the design of an ULS within the discrete generalized lifting framework remains to be analyzed. Some of the reviewed hints are retaken in the following sections for an ULS design. Perhaps the most damaging consequence of the nonlinear processing choice is that the multi-resolution analysis of nested subspaces is

Figure 5.12: Joint update-prediction (UP) mapping between 3-D spaces.

abandoned far behind, so the output signals may not possess, in the classical sense, any multi-resolution property. The importance of this point is illustrated in the joint update-prediction design, in which, despite the very low energy of the detail coefficients, the algorithm can hardly be iterated on the approximation signal.

5.2.1.1 Joint Update-Prediction Design

The goal of this scheme is to permit a joint design of both lifting steps with the same objective: to minimize the detail signal energy. The idea is to extend the GL to let two samples be modified at the same time. In this situation, the column mappings become a plane mapping in which an approximation and a detail sample are transformed at the same time (figure 5.12). This avoids a separate design of the lifting steps when in fact their common objective is to obtain good transforms for compression. Therefore, the set formed by a prediction and the successive update mapping is embedded in one joint update-prediction (UP) mapping.

The construction of the UP step is analogous to that of the optimized prediction and the same knowledge about the image pdf is assumed. The conditional probabilities of the points within a plane, Pr(x[n], y[n] | x[n+1]), are sorted and, according to this order, the points are mapped to the transformed plane. As in the optimized prediction case, the most probable inputs are mapped to the smallest-energy outputs. In this 2-D map, a second criterion is required to distinguish the Z_255 outputs with the same detail y′[n], that is, to choose the value of the approximation sample x′[n]. The choice is to select x′[n] in order to improve the horizontal (vertical) filtering that follows the vertical (horizontal) filtering. Accordingly, the most probable input is mapped to the output value that maximizes the probability of x′[n] conditioned on its horizontal neighbor x′_h[n], the second most probable input is mapped to the second horizontally conditioned output, and so on. Both criteria and mappings are combined in a UP step through the lexicographical order.
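As an illustration of this construction, the following is a minimal sketch of such a per-context LUT builder on a toy alphabet. The joint pmf, the alphabet, and all names are illustrative assumptions, and the secondary criterion on x′[n] is simplified here to a plain magnitude order instead of the horizontally conditioned probability.

```python
from itertools import product

def build_up_lut(joint_pmf, alphabet):
    """Toy joint update-prediction LUT: for each context c = x[n+1], map input
    pairs (x, y), sorted by decreasing Pr(x, y | c), onto output pairs (x', y'),
    sorted by increasing |y'| (then |x'| as a stand-in secondary criterion)."""
    outputs = sorted(product(alphabet, repeat=2), key=lambda o: (abs(o[1]), abs(o[0])))
    lut = {}
    for c in alphabet:
        inputs = sorted(product(alphabet, repeat=2),
                        key=lambda i: -joint_pmf.get((i[0], i[1], c), 0.0))
        lut[c] = dict(zip(inputs, outputs))  # a bijection per context: invertible
    return lut

alphabet = (-1, 0, 1)
pmf = {(0, 0, 0): 0.5, (1, 1, 0): 0.2}     # unspecified entries are treated as 0
lut = build_up_lut(pmf, alphabet)
assert lut[0][(0, 0)] == (0, 0)            # most probable input -> lowest-energy output
assert len(set(lut[0].values())) == 9      # the per-context mapping is bijective
```

Since each context mapping is a bijection, the decoder can invert the LUT and recover (x[n], y[n]) exactly, which is what makes this kind of step usable for lossless coding.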


The scheme is applied to the mammography set. Images are decomposed into two and three resolution levels, iterating the algorithm on the approximation bands. The energy of each band is computed. The energy of the first high band (H1) is 2 to 6 times smaller than with the optimized prediction. Most of the detail coefficients are zero. On the other hand, the LH1 bands have higher energy than in the optimized prediction case. Compression results with EBCOT are slightly worse than JPEG2000. When the algorithm is iterated for another resolution level, the energy of the new detail bands dramatically increases. This is due to the bad multi-resolution properties of the UP step, which only aims at minimizing the detail bands without creating a good signal for further processing. The approximation band entropy is high, and it looks similar to noise. Results with EBCOT for 3 resolution levels get worse with respect to those of JPEG2000. However, notice that this entropy coder is not suited to the kind of approximation signal supplied by the UP transform.

5.2.2 Update-Last Design

The entropy minimization criterion is proposed in this section for the ULS design with the prediction-then-update structure.

5.2.2.1 Entropy Minimization

In general, there is correlation between a signal's entropy and its achievable compression factor. A reasonable design goal for the ULS is to minimize the entropy of the approximation signal, H(x′). The approach is similar to that of the optimized prediction. The optimal generalized ULS according to this criterion is

$$
U_{opt} = \arg\min_U H(x') = \arg\min_U E_{Pr}\!\left[\log\frac{1}{Pr(x')}\right] = \left[\arg\min_{U_i} E_{Pr}\!\left[-\log Pr(x') \mid y' = i\right]\right]_{\forall i \in \mathbb{Z}_{255}^k}. \tag{5.10}
$$

The expectation conditioned on the context value gives the update restricted to that context:

$$
\begin{aligned}
E[-\log Pr(x') \mid y' = i] &= -\sum_{n=-128}^{127} Pr(x' = n \mid y' = i)\,\log Pr(x' = n) \\
&= -\sum_{n=-128}^{127} Pr(U(x, y') = n \mid y' = i)\,\log Pr(x' = n) \\
&= -\sum_{n=-128}^{127} Pr(U_i(x) = n \mid y' = i)\,\log Pr(x' = n).
\end{aligned}
$$


Introducing the permutation matrix, expression (5.11) is obtained:

$$
E[-\log Pr(x') \mid y' = i] = \begin{pmatrix} -\log Pr(x' = -128) & \ldots & -\log Pr(x' = 127) \end{pmatrix} U_i \begin{pmatrix} Pr(x = -128 \mid y' = i) & \ldots & Pr(x = 127 \mid y' = i) \end{pmatrix}^T. \tag{5.11}
$$

Unfortunately, the column updates cannot be independently optimized w.r.t. the entropy (as happens in the prediction energy minimization case). The mappings are coupled because the probability of x′ is a function of all the column updates:

$$
Pr(x') = \sum_l Pr(x' \mid y' = l)\,Pr(y' = l) = \sum_l Pr(U_l(x, y' = l) \mid y' = l)\,Pr(y' = l).
$$

The coupling also implies that the probability vector Pr(x′) affects the update design at the same time that the update determines the probability vector. The argument of the minimum U_i of the expression

$$
E[-\log Pr(x') \mid y' = i] = -\sum_n Pr(U_i(x) = n \mid y' = i)\,\log\Big(\sum_l Pr(U_l(x) = n \mid y' = l)\,Pr(y' = l)\Big)
$$

depends on all the other U_l, ∀l ∈ Z^k_255. Assume for the moment that there is no coupling in the probability vector (i.e., that the vector with the probabilities of x′, denoted r_{x′}, is fixed); then the minimization is straightforward. Since −log(·) is a strictly decreasing function, minimizing (5.11) is the same as minimizing (5.12), and it is minimized as proposition 5.1 indicates. The elements of the LHS vector of (5.12) have a negative sign, so the optimal update permutation matrix multiplies the high-probability elements of the LHS vector with the high-value elements of the RHS vector, just the opposite of the optimized prediction case of section 5.1.2. Therefore, if the probability vector r_{x′} is known, the update design is independent for every column, and the optimal step is realized with the permutation matrices in a similar way to the optimized prediction, according to the expression

$$
-\begin{pmatrix} Pr(x' = -128) & \ldots & Pr(x' = 127) \end{pmatrix} U_i \begin{pmatrix} Pr(x = -128 \mid y' = i) & \ldots & Pr(x = 127 \mid y' = i) \end{pmatrix}^T. \tag{5.12}
$$

In practice, r_{x′} may not be known, since it depends on every U_i. In this case, the design strategy is different: the optimal distribution of probabilities within the r_{x′} vector has to be found. For simplicity, let expression (5.12) be denoted as $r_{x'}^T U_i\, r_{x|y'=i}$. If the vectors r_{x′} and r_{x|y′=i} have the element order shown in (5.12), then the permutation matrix U_i is the one used for the column-i transform. Therefore, the probability vector can be expressed as a function of the permutation matrices for every context,

$$
r_{x'} = \sum_i U_i\, r_{x|y'=i}\, Pr(y' = i).
$$

The optimal distribution of the probabilities r_{x′} that minimizes the entropy is reached when the maximum of every conditional probability is mapped to the same output, the second maximum to the same second output, and so on. In appendix 5.A, proposition 5.2, which derives from lemma 5.2, together with the subsequent corollary and remarks, demonstrates that this mapping is the optimal one. Since in the described case there is no initial vector with ordered values that fixes the order of the output samples, as the vector r^T does in proposition 5.2, the output labels can be arbitrarily chosen. This derives from the fact that the entropy of a random variable is invariant under a bijective mapping. Therefore, the optimized mapping is established up to an assignment of labels. This assignment has to be specified separately. There seem to be several alternatives. For instance, it is interesting to retain the original signal statistics, which is possible if they are previously known. Another possibility is to minimize the approximation signal energy by mapping the most probable values to the lowest energies.
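A numeric sketch of this alignment follows (the pmfs are illustrative and base-2 logarithms are an assumption). Mapping the k-th most probable value of every conditional pmf to the same k-th output cannot increase the entropy of the mixture with respect to the identity (no-update) mapping, since the identity is itself one admissible set of per-context permutations.

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def aligned_pmf(conds, priors):
    """Map the k-th most probable value of every conditional pmf to output k,
    then mix with the context priors: r_x' = sum_i U_i r_{x|y'=i} Pr(y'=i)."""
    mixed = [0.0] * len(conds[0])
    for cond, w in zip(conds, priors):
        for k, q in enumerate(sorted(cond, reverse=True)):
            mixed[k] += w * q
    return mixed

conds = [[0.1, 0.6, 0.2, 0.1],   # Pr(x | y' = 0), illustrative
         [0.5, 0.1, 0.1, 0.3]]   # Pr(x | y' = 1), illustrative
priors = [0.5, 0.5]              # Pr(y' = i)
no_update = [sum(w * c[k] for c, w in zip(conds, priors)) for k in range(4)]
assert entropy(aligned_pmf(conds, priors)) <= entropy(no_update)
```

The aligned mixture concentrates probability mass on the same outputs across contexts, which is exactly the behavior that the proof in appendix 5.A establishes as optimal.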

Experiments and Results This last option has been chosen to perform an experiment with the mammography and the SST image classes. Three pdfs are estimated to construct three LUTs using an image training set: one LUT for the optimized vertical prediction, one for the vertical optimized update minimizing the entropy, and the last one for the optimized horizontal prediction. As in previous experiments, k = 2, i.e., two approximation neighbor samples are employed for each PLS and two detail neighbor samples for the ULS. With these LUTs, a two-level decomposition is computed for the test image set. Results are given in table 5.7. The entropy decreases for the training images, but only sometimes for the test set. The compression rate does not improve w.r.t. the no-update case. This is a surprising result, but it may be explained. Output samples have little relation among them, since their entropy is minimized without any other consideration. The value of one sample gives little information about its neighbors' values, so the following prediction performs poorly. Also, the gains in entropy are quite small because the detail samples do not partition the probability space efficiently: approximation samples are almost independent of the values of the detail samples. Furthermore, because of the low degree of dependence, the update pdf is very image-specific, varying considerably from one image to another even within the same image class, so the usefulness of the estimated pdf is very restricted. These drawbacks are the price to pay for the chosen nonlinear prediction. However, the EBCOT entropy coder is not suited to the kind of signals supplied by the entropy-minimizing ULS transform. EBCOT expects approximation coefficients that are quite different from the ones supplied by the transform. Surely, an entropy coder specifically created for such a transform and its output signal statistics would improve results.

5.2.3 Update-First Design

The use of a geometrical approach similar to the one in §5.1.1 for the construction of an ULS leads to an update-first lifting step equal to the identity because of the restrictions of the problem: the operator is integer-to-integer by definition and it should preserve the range of the discrete input-output space. If a linear part is included in order to attain multi-resolution properties, then the only reasonable ULS respecting such restrictions seems to be the identity operator. Alternatively, the update-first design problem may be seen from the pdf point of view with the goal of minimizing the detail signal energy arising from the subsequent prediction. In this case, for natural images the most probable value already lies at the optimal position in the space for the ensuing prediction, the second most probable value at the second-best position, and so on. In conclusion, the resulting optimal ULS is also the identity. Another criterion is required. This section proposes to employ the knowledge of the pdf to minimize the approximation signal gradient, which is a nonlinear GLS version of the ULS designs proposed in §3.3.3.

              Vertical Approximation Image Entropy
    Image     No Up.   Opt. Up.     Image          No Up.   Opt. Up.
    Mamo1     6.213    6.012       SST AfrNW 1    6.097    5.940
    Mamo2     5.880    5.721       SST AfrNW 2    5.767    5.619
    Mamo3     6.242    6.029       SST AfrNW 3    5.302    5.182
    Mamo4     5.834    5.678
    Mamo5     5.647    5.722       SST AfrNW 4    4.512    4.532
    Mamo6     4.902    4.925       SST AfrNW 5    4.160    4.250
    Mamo7     6.131    6.004
    Mamo8     6.222    6.283
    Mamo9     6.345    6.180
    Mamo10    4.291    4.278

Table 5.7: Mammography and SST images first vertical approximation image entropy. Comparison between the down-sampled image (No Up.) and the entropy-optimized update output (Opt. Up.). Images are divided into the training set at the top and the test set at the bottom.

5.2.3.1 Gradient Minimization

In the entropy-optimized update design of §5.2.2, the multi-resolution image properties are somewhat destroyed in return for a small gain in entropy, which is not enough to obtain compression improvements with EBCOT. While the entropy may decrease, the difference between neighboring samples tends to become more random. This motivates the design of a lifting that preserves multi-resolution properties; in this case, by minimizing the gradient between the samples that are neighbors at the coarser resolution level. In the proposed scheme, the update is the first of the lifting steps and it creates a 2-D approximation image. The update acts on one of every two samples of the coarser resolution level: the approximation image is partitioned into two quincunx grids and one of these two grids is modified by the update. The other remains unmodified in order to retain some of the original image statistics. The update mapping is equivalent to that of §5.2.2 minimizing the entropy. In this case, the labels are chosen to minimize the gradient of the updated sample with respect to its four-connected neighbors at the coarser level (which are not updated). Then, the PLS is performed vertically and horizontally, leading to a 3-band decomposition.
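The quincunx partition used above can be sketched as follows. This is a minimal illustration under an assumed index convention: samples with even i+j form the grid that is left untouched, samples with odd i+j form the grid fed to the gradient-minimizing update, and the four-connected neighbors of any updated sample automatically lie on the kept grid.

```python
def quincunx_split(rows, cols):
    """Partition a rows x cols grid into two quincunx grids by the parity of i+j."""
    keep = [(i, j) for i in range(rows) for j in range(cols) if (i + j) % 2 == 0]
    update = [(i, j) for i in range(rows) for j in range(cols) if (i + j) % 2 == 1]
    return keep, update

def four_neighbors(i, j, rows, cols):
    """Four-connected neighbors of a sample, clipped to the grid boundary."""
    cand = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(p, q) for p, q in cand if 0 <= p < rows and 0 <= q < cols]

keep, update = quincunx_split(4, 4)
assert len(keep) == len(update) == 8
# every neighbor of an updated sample belongs to the unmodified quincunx grid
assert all((p + q) % 2 == 0 for i, j in update for p, q in four_neighbors(i, j, 4, 4))
```

This parity structure is what makes the gradient criterion well posed: the reference values of the minimization are never themselves modified by the update.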

Experiment and Results This approach is tested with the mammography set. Images are decomposed into three resolution levels in order to establish whether the algorithm can be iterated on the approximation signal keeping reasonable results. The approximation image is observed to be smoother than without the use of the update. Conversely, the detail energy is higher than when employing only the optimized prediction. On average, compression results are marginally better than JPEG2000 with 3 resolution levels, a 0.5% improvement.

5.3 Nonlinear Lifting Chapters Summary and Conclusions

Chapters 4 and 5 deal with a nonlinear lifting scheme setting. The point of departure is the analysis of an adaptive lifting that reveals some clues for the further nonlinear LS development. The view of the LS as a mapping between real spaces facilitates the construction of the two adaptive ULS introduced in §4.3. The two approaches differ: the first one is based on a rank-order filter decision function, while in the second one the local variance value triggers the update filter choice. Both approaches show potential for image coding. The adaptive lifting analysis also leads to the generalized lifting scheme formulation. Initially, the continuous version is stated and its basic properties are explained; the effort is then mainly focused on the development of the generalized discrete version. The discrete GL scheme is explored, providing competitive coding results in lossless image compression. However, several drawbacks have to be overcome, especially in the ULS construction. Also, the framework is essentially devoted to nonlinear processing, which implies that embedded lossy-to-lossless coding becomes difficult. Further conclusions and future work are postponed to chapter 6.

5.A Appendix: Proof of Minimum Energy/Entropy Mappings

Lemma 5.1 Given real numbers $s_1, s_2 \in \mathbb{R}$ and $r_1, r_2 \in \mathbb{R}$ such that $s_1 > s_2$ and $r_1 > r_2$, then $s_1 r_1 + s_2 r_2 > s_1 r_2 + s_2 r_1$.

Proof. $s_1 - s_2 > 0$ and $r_1 - r_2 > 0 \Rightarrow (s_1 - s_2)(r_1 - r_2) > 0 \Rightarrow s_1 r_1 + s_2 r_2 > s_1 r_2 + s_2 r_1$. □

Proposition 5.1 Let $s, r \in \mathbb{R}^n$, where the elements of $r = (r_1\, r_2 \ldots r_n)^T$ are sorted $r_1 < r_2 < \ldots < r_n$ and the elements of $s$ are all different. Let $P$ be an $n \times n$ permutation matrix and $P_o$ be the $n \times n$ permutation matrix such that $s_o^T = s^T P_o = (s_{o1}\, s_{o2} \ldots s_{on})$ and $s_{o1} > s_{o2} > \ldots > s_{on}$. Then, $P_o$ is optimal in the sense that $\forall P \neq P_o$, $f^\star = s^T P_o r < s^T P r$. That is, $P_o = \arg\min_P s^T P r$.

Proof. The proof shows that the objective value $f = s^T P r$ is not a minimum for any $P \neq P_o$. Let $s'$ be $s'^T = s^T P = (s'_1\, s'_2 \ldots s'_n)$. By definition, the inequalities $s'_1 > s'_2 > \ldots > s'_n$ only hold for $P_o$, so for any $P \neq P_o$ there exists at least one couple $i, j$ such that $i < j$ and $s'_i < s'_j$. Let $P_{i,j}$ be the $n \times n$ permutation matrix that swaps the $i$-th and $j$-th column vector elements. Then,

$$
f_1 = s^T P r = s'^T r = s'_1 r_1 + \ldots + s'_i r_i + \ldots + s'_j r_j + \ldots + s'_n r_n > s'_1 r_1 + \ldots + s'_j r_i + \ldots + s'_i r_j + \ldots + s'_n r_n = s'^T P_{i,j} r = s^T P P_{i,j} r = s^T P_2 r = f_2,
$$

so $f_1 > f_2$ holds iff $s'_i r_i + s'_j r_j > s'_j r_i + s'_i r_j$, which is verified by applying lemma 5.1 to the values $s'_j > s'_i$ and $r_j > r_i$. Therefore, $P_2$ is a permutation matrix (the product of two other permutation matrices) that reaches a lower objective value than $P$. □
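Proposition 5.1 can also be checked numerically by brute force over all permutations; the following is a small illustrative sketch (the vectors are arbitrary examples).

```python
from itertools import permutations

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

r = [1.0, 2.0, 3.0, 4.0]          # sorted ascending, as in Proposition 5.1
s = [0.5, 3.0, 1.5, 2.0]          # all elements different

# Exhaustive search over all n! orderings of s against the fixed r.
best = min(permutations(s), key=lambda p: dot(p, r))
assert list(best) == sorted(s, reverse=True)   # descending s against ascending r
```

This is the discrete rearrangement-inequality behavior exploited by the optimized prediction: pairing the largest elements of one vector with the smallest of the other minimizes the inner product.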



Corollary 5.1 If two elements of $s$ are equal, there are also two optimal permutation matrices, and they are related through a third permutation matrix which swaps the two equal elements of the row vector.

Proof. Assume that $P_{o1}$ is an optimal permutation matrix, that $s'^T = s^T P_{o1} = (s'_1\, s'_2 \ldots s'_n)$ and that the two equal elements are $s'_i = s'_j$. Then,

$$
f^\star = s^T P_{o1} r = s'^T r = s'_1 r_1 + \ldots + s'_i r_i + s'_j r_j + \ldots + s'_n r_n = s'_1 r_1 + \ldots + s'_j r_i + s'_i r_j + \ldots + s'_n r_n = s'^T P_{i,j} r = s^T P_{o1} P_{i,j} r = s^T P_{o2} r = f_2. \tag{5.13}
$$


Equality (5.13) holds because $s'_i = s'_j$, so $f^\star = f_2$ and $P_{o2}$ is also an optimal matrix. Finally, both optimal matrices are related through $P_{o2} = P_{o1} P_{i,j}$. □



Remarks. The previous results can be extended to any number of couples of equal elements and to any number of equal elements. The results still hold if $s$ is the sorted vector and $r$ the vector to be permuted. The proofs are straightforward. Finally, the same optimal re-arrangement of vector elements by a permutation matrix can be done if neither vector is sorted.

Lemma 5.2 Given non-negative real numbers $s_1, s_2 \in \mathbb{R}^+$ and $r_1, r_2 \in \mathbb{R}^+$ such that $s_1 > s_2$ and $r_1 > r_2$, then

$$
-(s_1 + r_1)\log(s_1 + r_1) - (s_2 + r_2)\log(s_2 + r_2) < -(s_1 + r_2)\log(s_1 + r_2) - (s_2 + r_1)\log(s_2 + r_1).
$$

Proof. Define $\hbar(a, b) = -a \log(a) - b \log(b)$ and $f(\varepsilon) = \hbar(a - \varepsilon, b + \varepsilon)$. Note that $\forall a, b \geq 0$, $\hbar(a, b)$ is continuous and $\hbar(a, b) = \hbar(b, a)$. The derivative of $f$ with respect to $\varepsilon$ is

$$
f'(\varepsilon) = \log\frac{a - \varepsilon}{b + \varepsilon}. \tag{5.14}
$$

Set $a = s_1 + r_1$ and $b = s_2 + r_2$; then the claim of the lemma reads $\hbar(a, b) < \hbar(a - (s_1 - s_2), b + (s_1 - s_2)) = \hbar(a - (r_1 - r_2), b + (r_1 - r_2))$. Let $l_{min} = \min(s_1 - s_2, r_1 - r_2)$. By definition $l_{min} > 0$. Therefore, the demonstration reduces to showing that $f(0) < f(l_{min})$, which occurs if $f$ is an increasing function in the interval between $0$ and $l_{min}$, i.e., if $\forall \varepsilon \in [0, l_{min}]$, $f'(\varepsilon) > 0$. The derivative $f'(\varepsilon)$ (equation 5.14) is positive as long as $a - \varepsilon > b + \varepsilon$.

• For $\varepsilon = 0$: $a = s_1 + r_1 > s_2 + r_2 = b \Rightarrow f'(0) > 0$.

• For $\varepsilon = l_{min}$: If $l_{min} = s_1 - s_2 < r_1 - r_2 \Rightarrow a - l_{min} = s_2 + r_1 > s_1 + r_2 = b + l_{min} \Rightarrow f'(l_{min}) > 0$. If $l_{min} = r_1 - r_2 < s_1 - s_2 \Rightarrow a - l_{min} = s_1 + r_2 > s_2 + r_1 = b + l_{min} \Rightarrow f'(l_{min}) > 0$.

• For $0 < \varepsilon < l_{min}$: $d = l_{min} - \varepsilon > 0$, then $a - \varepsilon > a - \varepsilon - d = a - l_{min} > b + l_{min} > b + l_{min} - d = b + \varepsilon \Rightarrow a - \varepsilon > b + \varepsilon \Rightarrow f'(\varepsilon) > 0$.

As $f$ is a continuous function with strictly positive derivative in the interval $[0, l_{min}]$, then $f(0) < f(l_{min})$ and the proof is completed. □




Lemma 5.2 is extended to n-dimensional inputs in proposition 5.2. Let $v \in \mathbb{R}^n_+$ be an n-dimensional vector $v = (v_1 \ldots v_n)^T$. The function $\hbar(\cdot)$ is defined for such input vectors as $\hbar(v) = -v_1 \log(v_1) - v_2 \log(v_2) - \ldots - v_n \log(v_n)$.

Proposition 5.2 Let $s, r \in \mathbb{R}^n_+$, where the elements of $r = (r_1\, r_2 \ldots r_n)^T$ are sorted $r_1 > r_2 > \ldots > r_n$ and the elements of $s$ are all different. Let $U$ be an $n \times n$ permutation matrix and $U_o$ be the $n \times n$ permutation matrix such that $s_o = U_o s = (s_{o1}\, s_{o2} \ldots s_{on})^T$ and $s_{o1} > s_{o2} > \ldots > s_{on}$. Then, $U_o$ is optimal in the sense that $\forall U \neq U_o$, $f^\star = \hbar(r + U_o s) < \hbar(r + U s)$. That is, $U_o = \arg\min_U \hbar(r + U s)$.

Proof. The proof follows the same lines as proposition 5.1; it is shown that the objective value $f = \hbar(r + U s)$ is not a minimum for any $U \neq U_o$. Let $s'$ be $s' = U s = (s'_1\, s'_2 \ldots s'_n)^T$. By definition, the inequalities $s'_1 > s'_2 > \ldots > s'_n$ only hold for $U_o$. For any $U \neq U_o$ there exists at least one couple $i, j$ such that $i < j$ and $s'_i < s'_j$. Let $U_{i,j}$ be the $n \times n$ permutation matrix that swaps the $i$-th and $j$-th column vector elements. Then,

$$
\begin{aligned}
f_1 = \hbar(r + U s) = \hbar(r + s') &= -(s'_1 + r_1)\log(s'_1 + r_1) - \ldots - (s'_i + r_i)\log(s'_i + r_i) - \ldots - (s'_j + r_j)\log(s'_j + r_j) - \ldots - (s'_n + r_n)\log(s'_n + r_n) \\
&> -(s'_1 + r_1)\log(s'_1 + r_1) - \ldots - (s'_j + r_i)\log(s'_j + r_i) - \ldots - (s'_i + r_j)\log(s'_i + r_j) - \ldots - (s'_n + r_n)\log(s'_n + r_n) \\
&= \hbar(r + U_{i,j} s') = \hbar(r + U_{i,j} U s) = \hbar(r + U_2 s) = f_2,
\end{aligned}
$$

so $f_1 > f_2$ holds if and only if $-(s'_i + r_i)\log(s'_i + r_i) - (s'_j + r_j)\log(s'_j + r_j) > -(s'_i + r_j)\log(s'_i + r_j) - (s'_j + r_i)\log(s'_j + r_i)$, which is verified by applying lemma 5.2 to the values $s'_j > s'_i$ and $r_i > r_j$. Therefore, $U_2$ is a permutation matrix (the product of two other permutation matrices) that reaches a lower objective value than $U$. □
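Proposition 5.2 can be checked numerically as well; the sketch below brute-forces all permutations of an illustrative vector s against a fixed, descending-sorted r (natural logarithm assumed for ℏ).

```python
import math
from itertools import permutations

def hbar(v):
    """hbar(v) = -sum v_i log v_i; zero entries contribute nothing."""
    return -sum(x * math.log(x) for x in v if x > 0)

r = [0.4, 0.3, 0.2, 0.1]               # sorted descending, as in Proposition 5.2
s = [0.05, 0.25, 0.1, 0.2]             # all elements different

# Exhaustive search over all orderings of s; the aligned one should win.
best = min(permutations(s), key=lambda p: hbar([a + b for a, b in zip(r, p)]))
assert list(best) == sorted(s, reverse=True)   # optimum aligns s descending with r
```

Note the contrast with proposition 5.1: the inner product is minimized by the opposite pairing, while ℏ is minimized by the aligned pairing, which concentrates mass and therefore lowers the entropy-like objective.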



Corollary 5.2 If two elements of $s$ are equal, there are also two optimal permutation matrices $U$, and they are related through a third permutation matrix which swaps the two equal elements of the column vector.

Remarks. Also in this case, the previous results can be extended to any number of couples of equal elements and to any number of equal elements. The results still hold if $s$ is the sorted vector and $r$ the vector to be permuted. The proofs are straightforward. Finally, the same optimal re-arrangement of vector elements by a permutation matrix can be done if neither vector is sorted.


If there is more than one vector $s_1, s_2, \ldots, s_m$, then

$$
f = \hbar\!\left(r + U_1 s_1 + \ldots + U_m s_m\right)
$$

is minimized when the permutation matrices $U_i$ with $1 \leq i \leq m$ align the values of each $s_i$ in the same fashion as in the two-vector case (proposition 5.2). If $\sum_j \big(r_j + \sum_i s_{i,j}\big) = 1$, then the resulting column vector $r + U_1 s_1 + \ldots + U_m s_m$ is a probability distribution and $f = \hbar\!\left(r + U_1 s_1 + \ldots + U_m s_m\right)$ is its entropy.



5.B Appendix: Algorithm to Implement 3-D SPIHT

The three-dimensional SPIHT implementation is described in this appendix. Notation follows that of section 2.5. The presented algorithm is essentially the same as the algorithm proposed in [Sai96a], but extended to three dimensions and allowing signed and unsigned input data. The parent-child relation and subband notation are depicted in figure 5.13. Notation includes the third dimension, which is the temporal dimension in the video coding case, the depth dimension in the case of 3-D images, or a component counter in the case of multi-spectral images. Therefore, in the 3-D SPIHT, a coefficient stands for the transform of the voxels of a volume or 3-D image, but also for the transformed pixels coming from a multi-component image, or for the coefficients of a spatio-temporal transform of a video signal. Each coefficient c_{i,j,k} has eight children, except the coefficients in each one of the seven highest-resolution bands and one of every eight coefficients in the lowest band (LLL). The following sets of coordinates help to explain the 3-D SPIHT coder:

• C(i, j, k): set of coordinates of all children of the node (i, j, k).

• D(i, j, k): set of coordinates of all descendants of the node (i, j, k).

• H: set of coordinates of all spatial orientation tree roots, i.e., nodes in the lowest resolution level.

• L(i, j, k) = D(i, j, k) − C(i, j, k): set of the descendants without the children, i.e., the grandchildren, great-grandchildren, etc.

The spatial orientation trees are the partitioning subsets in the sorting algorithm. The set partitioning rules are the following:

1. The initial partition is formed with the sets {(i, j, k)} and D(i, j, k), ∀(i, j, k) ∈ H.

2. If D(i, j, k) is significant, then it is partitioned into L(i, j, k) plus the eight single-element sets with (p, q, r) ∈ C(i, j, k).

3. If L(i, j, k) is significant, then it is partitioned into eight sets D(p, q, r), with (p, q, r) ∈ C(i, j, k).
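The coordinate sets above can be sketched in code. This is a minimal illustration of the generic (non-root) parent-child relation only, assuming dyadic dimensions; the special rules for the LLL-band roots are omitted, and all names are illustrative.

```python
from itertools import product

def children(i, j, k, shape):
    """C(i,j,k) for a non-root node: the eight candidates (2i+di, 2j+dj, 2k+dk),
    empty once the coordinates fall outside the volume (highest-resolution bands)."""
    cand = [(2 * i + di, 2 * j + dj, 2 * k + dk)
            for di, dj, dk in product((0, 1), repeat=3)]
    return [c for c in cand if all(x < s for x, s in zip(c, shape))]

def descendants(i, j, k, shape):
    """D(i,j,k): all descendants, gathered recursively; L(i,j,k) = D - C."""
    out = []
    for c in children(i, j, k, shape):
        out.append(c)
        out.extend(descendants(*c, shape))
    return out

shape = (8, 8, 8)
C = children(1, 1, 1, shape)
D = descendants(1, 1, 1, shape)
assert len(C) == 8                       # eight children per interior node
assert len(D) == 8 + 64                  # children plus their children
assert len(set(D) - set(C)) == 64        # L(i,j,k): grandchildren onward
```

The recursion terminates naturally at the highest-resolution bands, where the doubled coordinates leave the volume and the child list becomes empty.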
The significance information is stored in the three ordered lists: the list of insignificant sets (LIS), the list of insignificant coefficients (LIC), and the list of significant coefficients (LSC). In all the lists, each entry is identified by a coordinate (i, j, k), which represents individual coefficients in the LIC and LSC, and represents either the set D(i, j, k) or L(i, j, k) in the LIS.

Figure 5.13: Parent-child relationships and nomenclature in the 3-D SPIHT.

To differentiate between them, a LIS entry is said to be of type A if it represents D(i, j, k) and of type B if it represents L(i, j, k). The significance of a set T relative to a magnitude given by n is the function

$$
S_n(T) = \begin{cases} 1, & \text{if } \max_{(i,j,k) \in T} |c_{i,j,k}| \geq 2^n, \\ 0, & \text{otherwise.} \end{cases}
$$

If a coefficient in the set T is significant w.r.t. the threshold 2^n, the whole set is significant and the function output is 1. If no coefficient magnitude is significant, the set is insignificant and the function output is 0. If the set T is a single coefficient, notation is simplified: S_n({(i, j, k)}) ≅ S_n(i, j, k). The 3-D encoding algorithm is presented in its entirety in table 5.8.
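The significance function translates directly into code; a minimal sketch follows, with coefficients stored in an illustrative coordinate-to-value dictionary.

```python
def significance(coeffs, T, n):
    """S_n(T): 1 if any coefficient indexed by T has magnitude >= 2**n, else 0."""
    return 1 if max(abs(coeffs[p]) for p in T) >= 2 ** n else 0

# Toy coefficient volume indexed by (i, j, k).
coeffs = {(0, 0, 0): 37, (0, 0, 1): -5, (1, 0, 0): 12}
assert significance(coeffs, coeffs.keys(), 5) == 1   # |37| >= 2**5 = 32
assert significance(coeffs, [(0, 0, 1)], 3) == 0     # |-5| < 2**3 = 8
```

In the encoder, n starts at the largest significant bit-plane and decreases by one per pass, so each coefficient becomes significant exactly once.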


SPIHT 3-D:

0. Initialization: Output n = ⌊log₂ max_{(i,j,k)}(|c_{i,j,k}|)⌋. Set the LSC as an empty list, add the coordinates (i, j, k) ∈ H to the LIC, and only those with descendants to the LIS, as type A entries.

1. Significance Pass:
   1.1. For each entry (i, j, k) in the LIC do
        1.1.1. Output S_n(i, j, k).
        1.1.2. If S_n(i, j, k) = 1, then move (i, j, k) to the LSC.
               • If signed data, then output the sign of c_{i,j,k}.
   1.2. For each entry (i, j, k) in the LIS do
        1.2.1. If the entry is of type A, then
               • Output S_n(D(i, j, k)).
               • If S_n(D(i, j, k)) = 1 then
                 ∗ For each (p, q, r) ∈ C(i, j, k) do
                   ◦ Output S_n(p, q, r).
                   ◦ If S_n(p, q, r) = 1 then add (p, q, r) to the LSC.
                     - If signed data, then output the sign of c_{p,q,r}.
                   ◦ If S_n(p, q, r) = 0 then add (p, q, r) to the end of the LIC.
                 ∗ If L(i, j, k) ≠ ∅, move (i, j, k) to the end of the LIS as an entry of type B, and go to step 1.2.2. Otherwise, remove entry (i, j, k) from the LIS.
        1.2.2. If the entry is of type B, then
               • Output S_n(L(i, j, k)).
               • If S_n(L(i, j, k)) = 1 then
                 ∗ Add each (p, q, r) ∈ C(i, j, k) to the end of the LIS as an entry of type A.
                 ∗ Remove (i, j, k) from the LIS.

2. Refinement Pass: For each entry (i, j, k) in the LSC, except those included in the last sorting pass, output the n-th most significant bit of |c_{i,j,k}|.

3. Quantization-step Update: Decrement n by 1 and go to step 1.

Table 5.8: Description of the 3-D SPIHT encoding algorithm.

Chapter 6

Conclusions and Future Work

This Ph.D. dissertation has considered lifting scheme design for image compression applications. Two design frameworks have been proposed and analyzed, namely the interpolative and projection-based linear lifting setting and the adaptive and generalized nonlinear setting.

6.1 Conclusions

Chapter 3 is devoted to explaining an interpolation framework and its modification for the development of new linear lifting steps, including a variety of space-varying steps. An optimality analysis and results are derived from the proposed designs. The first part of this section reviews chapter 3 and draws the main conclusions.

A quadratic interpolation method is presented. The algorithm is able to interpolate by any factor and to model properties of the data acquisition system. The quadratic model may be determined from the local image data, from an image model, or from a combination of both. The proposed formulation of the quadratic interpolation as a convex optimization problem provides the flexibility to introduce knowledge in different ways and to derive the corresponding optimal solutions. Some of the additional information considered is the number of bits representing the pixels, the weighting factors of the local image patches, the signal energy, or the possible smoothness of the original image. This additional knowledge is incorporated by modifying the objective function or by adding linear equality and inequality constraints. Closed-form solutions are found by means of the KKT conditions. Problems with linear inequality constraints or with l1-norm objective functions have no closed-form solutions. In these cases, the optimal interpolation is obtained by iterative optimization strategies efficiently implemented in many software packages.

The interpolation methods are not the best in the literature when facing a direct interpolation. Their performance is evaluated with the PSNR and tends to be slightly better than bi-cubic interpolation. Nevertheless, if the image is low-pass filtered before the down-sampling, then the proposed methods outperform the bi-cubic one. The provided experiment averages four pixels and then down-samples. The subsequent interpolation is 1.5-2 dB better w.r.t. the bi-cubic. Furthermore, the methods are useful for the lifting step construction through the addition of a set of linear equality constraints reflecting in the formulation the inner products due to the DWT coefficients. This latter variation allows the construction of novel first and second PLS and a ULS. The optimization criteria are the detail signal energy for the PLS and the gradient of the updated sample w.r.t. its non-updated neighbors for the ULS. Note however that any of the preceding interpolation solutions may be applied to obtain new lifting steps. Two other optimization criteria are considered for the ULS. The objective functions are the approximation signal gradient and the coarser-level detail signal energy. Therefore, first prediction, update, and second prediction steps are designed within a common framework.

The formulation has also turned out to be an adequate tool for the optimality analysis of existing wavelet transforms according to the selected criteria and image models. The image model seems to be a sensitive choice. The experiments are performed using auto-regressive models, but others may be considered. Once the model is selected, the parameters for a given image class are estimated, the auto-correlation matrix is derived, and the optimal lifting filter is determined. The study of the first PLS confronts the LS resulting from the AR-1 image model with the LeGall 5/3 prediction. This model is widely used by the image processing community, but when applied to derive an optimal first PLS with 2 samples the result is not the common sample mean, which gives the best image coding results. This leads to also adopting the AR-2 model for the subsequent experiments.
However, the consistency of the approach is confirmed by the fact that AR-1 processes are coded better with the optimal prediction and update steps than with their LeGall 5/3 counterparts. On the other hand, the first PLS derived with the AR-2 model reduces to the two-sample mean for the parameter values that appear in practice.

The study of the second PLS optimality helps to recognize the type of images that should be coded with the 5/11-a filter, with the 5/11-b, with any of the intermediate optimal PLS, or even when the second PLS should be avoided. The choice depends on the estimated first-order auto-regressive parameter.

For some image classes, such as the mammography or the SST, the optimal ULS obtained with the first- and second-order auto-regressive models are found to be far from the usual LeGall 5/3 case. In this case, the difference implies an improvement, which is supported by applying the optimized ULS to image compression within the JPEG2000 environment. The bit-stream size diminishes by 8% for the synthetic and by 4% for the SST images, but it increases for the mammography. This last case is treated separately and overcome by using one ULS specific to the foreground and another for the background. Several experiments and a variety of modifications are provided:


lifting filters on a quincunx grid, space-varying steps, and locally adaptive steps on a line-wise basis. These experiments illustrate the richness of the approach.

Chapters 4 and 5 develop the nonlinear lifting scheme framework. The initial adaptive lifting analysis reveals some important clues for the contributions made in the nonlinear lifting. Considering the LS as a mapping between real spaces leads to the construction of the two adaptive ULS introduced in §4.3 and to the generalized lifting scheme formulation. The discrete GL scheme is explored, providing prediction and update step designs, and competitive coding results are obtained. The rest of the section extends this nonlinear lifting summary and provides the main conclusions.

The adaptive lifting scheme is analyzed in §4.2 from a new point of view: as a mapping between real spaces. The analysis is useful for gaining insight into this kind of transform and unveils some hints for the further development of nonlinear lifting. The new interpretation leads to two novel adaptive ULS designs, one with a median-based and one with a variance-based decision function. Results in terms of the weighted entropy are given for both adaptive ULS, showing their potential in lossy and lossless image compression.

The generalized lifting scheme is proposed. It defines the lifting step as a mapping between spaces. The continuous version is introduced and some of its relevant properties are stated. Among them, the capacity to recover the decomposition basis from the transform coefficients is explained in more detail. However, quantization arises as a problem for such a scheme; in consequence, the discrete version is formulated to overcome it.

Several generalized discrete designs are given. First, the geometrical prediction is introduced. The original three-rule design shows the GL scheme's potential and flexibility. Good image compression results are obtained with the SPIHT coder.
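Since the LeGall 5/3 wavelet serves as the reference throughout these comparisons, a minimal integer-lifting sketch may help fix ideas. It is only a sketch: periodic boundary extension and an even-length signal are assumed here for brevity (JPEG2000 itself uses symmetric extension):

```python
import numpy as np

def legall53_forward(x):
    """One level of the reversible integer LeGall 5/3 lifting transform."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    # Prediction step: each odd sample is predicted by the mean of its
    # two even neighbours; the detail is the (rounded) residual.
    d = odd - np.floor((even + np.roll(even, -1)) / 2).astype(np.int64)
    # Update step: even samples are smoothed with the neighbouring
    # details to form the approximation signal.
    a = even + np.floor((np.roll(d, 1) + d + 2) / 4).astype(np.int64)
    return a, d

def legall53_inverse(a, d):
    """Exact inverse: undo the lifting steps in reverse order."""
    even = a - np.floor((np.roll(d, 1) + d + 2) / 4).astype(np.int64)
    odd = d + np.floor((even + np.roll(even, -1)) / 2).astype(np.int64)
    x = np.empty(2 * even.size, dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x
```

Note that perfect reconstruction holds regardless of the rounding inside each step, which is precisely the structural property that the generalized lifting preserves while replacing the linear filters by arbitrary mappings.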
A 3-D version of the SPIHT coder is developed and employed for the coding of 3-D images. The geometrical prediction outperforms the LeGall 5/3 wavelet for these images with the 3-D SPIHT.

Then, the image pdf is employed to optimize the generalized prediction. Propositions 5.1 and 5.2 are demonstrated; they show that the optimized prediction simultaneously attains the minimal detail-signal energy and the minimal entropy. In the experimental setting, the optimized prediction is derived for the natural, mammography, and SST image classes. The prediction optimized for the natural image class pdf is found to be similar to the LeGall 5/3 prediction, so the coding gain is small. However, the gain is considerable for the last two classes, that is, when the pdf is markedly different from the natural image class pdf.

The main drawback of the optimized prediction is the LUT storage on the coder and decoder sides. This fact impels the creation of the adaptive optimized generalized prediction. Although the adaptation algorithm is simple, it offers good convergence properties, as seen in the experiments in §5.1.3.2. This amounts to compression results only slightly worse than those of the non-adaptive version. Additionally, the adaptive prediction is applicable to images that do not belong to a


class of images with a common pdf. For example, the LeGall 5/3 wavelet is clearly outperformed by the adaptive generalized scheme for the synthetic images.

The possibilities of generalized ULS construction are explored. The need to preserve the signal properties for multi-resolution processing is established. Then, the problem is divided into the update-first and update-last structures, and a design for each structure is proposed. The minimal-entropy mapping is used for the creation of a generalized update-last step; the design requires an accurate label selection. A gradient-minimizing design is proposed for the update-first structure. Results are obtained for both designs: they are acceptable, but in general they are worse than the LeGall 5/3 results for the test set. However, the entropy coders used are specifically devoted to linear wavelet coefficients; better results are expected from an entropy coder that takes into account the characteristics of the coefficients arising from the nonlinear ULS schemes. That said, the conclusion is that the development of the generalized update step remains a widely open problem: some hints have been found, but a large amount of work is clearly still required.
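To make the LUT-based generalized prediction discussed above concrete, the following toy sketch builds a prediction table from empirical statistics. It is only an illustration under simplifying assumptions: it stores the conditional median (which minimizes the expected absolute detail), whereas the dissertation's optimized prediction is derived from the class pdf so as to minimize the detail energy and entropy.

```python
import numpy as np
from collections import defaultdict

def build_prediction_lut(samples, contexts):
    """Toy LUT in the spirit of the discrete generalized lifting: for
    every discrete context, store the conditional median of the sample
    to be predicted. Coder and decoder share the LUT, so the mapping
    d = x - lut[context] is exactly invertible."""
    buckets = defaultdict(list)
    for x, c in zip(samples, contexts):
        buckets[c].append(x)
    return {c: int(np.median(v)) for c, v in buckets.items()}

def predict_details(samples, contexts, lut):
    """Detail signal produced by the LUT-based prediction."""
    return [x - lut[c] for x, c in zip(samples, contexts)]
```

The table plays the role of the stored mapping mentioned above: it must be available (or adaptively rebuilt) on both the coder and decoder sides.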

6.2 Future Work

There exist several lines of future research that can be taken as an extension of the work carried out in this dissertation. On the more theoretical side, several points could be the focus of a continuation of this Ph.D. dissertation:

• Chapter 3 presents an optimality analysis of existing wavelets based on the stated design criteria. The converse would also be interesting, that is, the study of the new transforms in terms of the usual informative mathematical parameters of the wavelet transform: number of vanishing moments, coding gain, Riesz bounds, regularity (Hölder and Sobolev), angle between the analysis and synthesis spaces, etc. This study would give an additional basis of comparison for the proposed linear schemes. Similarly, an analysis of the properties of the nonlinear decompositions is possible. The study may include reversibility in lossy compression, stability, synchronization, artifacts, and frequency-domain characteristics (if appropriate) of the approximation signal. Following this line, the resulting bitstream functionalities are a relevant issue. For instance, the proposed nonlinear schemes attain resolution-embedded bitstreams, but SNR and quality scalability are not obtained. It would be interesting to profit from the previous analysis to construct generalized transforms with SNR-embedded and scalable bitstreams (for lossy compression purposes).

• The linear framework deserves more attention. For instance, the linear transforms may be evaluated in lossy compression. Perhaps the objective functions may be changed in order to design specific transforms for lossy compression. The formulation accepts many


other design modifications: a possible approach is to design different ULS for the even and odd samples, which makes sense because the subsequent function of these samples is also different. Another working topic within the linear framework is the search for, and use of, image models other than the auto-regressive one.

• Generalized lifting may be improved in several ways. Usually, the support of the underlying GL transform is smaller than the support of the 2-D LeGall 5/3 wavelet. This suggests extending the proposal by enlarging the generalized lifting support, possibly improving the current results; a proportional increase in computation and memory requirements should be avoided if possible. Eventually, 2-D transforms may be developed, both nonseparable and 1-D direction-selective. Finally, an important effort should be focused on the update step development. One possibility is to identify appropriate update steps for the predictions proposed in §5.1.2 and §5.1.3. A second way is to study variations on the gradient-minimization generalized ULS, which seems promising.

• The discrete generalized version is a fruitful approach that has demonstrated its richness. However, it may restrict the scope of the generalized lifting, since it is a particular case of the continuous version. The continuous generalized lifting requires further study, which may lead to appropriate mappings for lossy compression applications.

There are also some practical implementation issues that could help verify the potential of this work and that would extend its range of application:

• The entropy coders used in this work are devoted to specific linear wavelet transforms. Better results are expected for the GL scheme by using an entropy coder that takes into account the characteristics of the nonlinear coefficients. The development of a simple JPEG-LS-like entropy coder for the nonlinear transforms would be interesting.
• The study of the nonlinear schemes' interest for video coding would be relevant if the theoretical development of a lossy scheme is successful. Currently, the search for appropriate transforms for video coding is a hot topic.


Appendix A

Benchmark Images

The image sets forming the benchmark for comparison between algorithms in this Ph.D. dissertation come from different corpora. This appendix briefly describes each of the image classes included in the experimental corpus. All images have 8-bit depth. The tables provide the names and sizes of the images. An example image from each class is also given.

• Natural Images: In this work, natural images refers to a generic set of natural-scene images. The 25 images in table A.1 are used.

• Synthetic Images: Different kinds of synthetic images compose this class: two extracted from the JPEG2000 test set, others obtained from Internet databases, and others designed on purpose to test specific features of the algorithms. See table A.2. Image “crosses”, used to illustrate a wavelet decomposition in §2.1.3, is a 256x256 image obtained from http://links.uwaterloo.ca/bragzone.base.html.

• Texture Images: Several textured images have been employed in the experiments. Their names and sizes appear in table A.3.

• Biomedical Images: The biomedical image set consists of three different corpora: mammography, axones, and MRI brain images.

  – Mammography: A set of 15 mammography images is employed. Sizes are shown in table A.4.

  – Axones: The axones class is a set of 9 medical images with a size of 512x512 pixels.

  – Magnetic Resonance Imaging (MRI): This class is a group of 3-D human head images, a slice-by-slice view of the inside of a human head. The set is formed by 96 images of 256x256 pixels.


Image Name     # of rows   # of cols
aerial             256         256
aerial2           2048        2048
aquarium           576         720
baboon             512         512
barbara            512         512
bike              2560        2048
bridge             256         256
cafe              2560        2048
cameraman          256         256
carccett           536         672
cats              2048        3072
cheryl             512         512
farm               512         512
fish               200         500
fruit              256         256
girl               512         512
goldhill           256         256
hawaii1           1391        2097
gray jp2 8         400         700
lena               512         512
mit                256         256
people             256         256
peppers            512         512
phone              384         510
water             1999        1465

Table A.1: Set of natural images.

Image Name     # of rows   # of cols
anemone            471         722
books              318         179
chart             2347        1688
cmpnd1             768         512
cwheel             600         800
docon3 001         288         360
house              600         800
music              111         111
plan1              583         860
stone              133         169
sunset             480         640
synCircle          256         256
syntheticHF        128         128
tdf1               576         720
winaw              465         633

Table A.2: Set of synthetic images.

• Remote Sensing Images: Two different kinds of remote sensing images appear in this work:

  – SST images: Sea Surface Temperature (SST) images are obtained with the Advanced Very High Resolution Radiometer sensors (AVHRR/2&3) of the National Oceanic and Atmospheric Administration (NOAA) satellite series [U.S]. Image sizes range from 5 to 7 Mbytes. This specific set (table A.5) is devoted to the African northwest.

  – MOC images: Images collected by the Mars Global Surveyor (MGS) Mars Orbiter Camera (MOC) narrow-angle imaging system since 20 August 2003. The images are archived in final form with the NASA Planetary Data System (PDS) in [Jet], where they can be found under the names R20-00258p, R20-00329p, R20-00387p, R20-00485p, R20-00502p, R20-00725p, R20-00923p, and R20-01188p, corresponding to MOC-1 to MOC-8. They are listed in table A.6.


Image Name     # of rows   # of cols
bk17               150         150
bk25               150         150
bk36               150         150
bk40               150         150
bk42               150         150
bk44               150         150
bk48               150         150
rd04               256         256
rd09               256         256
rd24               256         256
rd49               256         256
rd57               256         256
rd65               256         256
rd77               256         256
rd87               256         256

Table A.3: Set of texture images.

Image Name     # of rows   # of cols
SST AfrNW 1       3268        2048
SST AfrNW 2       3197        2048
SST AfrNW 3       3035        2048
SST AfrNW 4       2655        2048
SST AfrNW 5       2422        2048
SST AfrNW 6       2705        2048

Table A.5: Set of SST images.

Image Name     # of rows   # of cols
mamo1             1281         896
mamo2             1303         877
mamo3             1331         863
mamo4             1285         857
mamo5             1271         673
mamo6             1265         773
mamo7              727        1297
mamo8              821        1321
mamo9             1265         826
mamo10             831        1327
mamo11            1319         826
mamo12            1309         833
mamo13             799        1323
mamo14            1309         712
mamo15             748        1291

Table A.4: Set of mammography images.

Image Name     # of rows   # of cols
MOC 1             6528         512
MOC 2             5888        1024
MOC 3             6016        1024
MOC 4             3712         672
MOC 5             9216         672
MOC 6             6528        1024
MOC 7             6016        1024
MOC 8             7680         640

Table A.6: Set of MOC images.


Figure A.1: Image example from each image class. (a) natural image (Barbara), (b) synthetic image (chart), and (c) texture image (rd04). Biomedical images: (d) mammography (mamo12), (e) axones, and (f) MRI image. Remote sensing images: (g) SST AfrNW 5 and (h) part of MOC-1 image.

Bibliography

[Abh03a] G. C. K. Abhayaratne, “Discrete wavelet transforms that have an adaptive low pass filter”, Seventh International Symposium on Signal Processing and its Applications, Vol. 2, pp. 487–490, July 2003.

[Abh03b] G. C. K. Abhayaratne, “Spatially adaptive wavelet transforms: An optimal interpolation approach”, Third International Workshop on Spectral Methods and Multirate Signal Processing, pp. 155–162, September 2003.

[Ada98] M. D. Adams, Reversible wavelet transforms and their application to embedded image compression, Master’s Thesis, University of Victoria, Victoria, Canada, 1998.

[Ada99] M. D. Adams and F. Kossentini, “Low-complexity reversible integer-to-integer wavelet transforms for image coding”, Proc. IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing, pp. 177–180, August 1999.

[Ada00] M. D. Adams and F. Kossentini, “Reversible integer-to-integer wavelet transforms for image compression: performance evaluation and analysis”, IEEE Transactions on Image Processing, Vol. 9, no. 6, pp. 1010–1024, June 2000.

[Ana05] N. Anantrasirichai, C. N. Canagarajah, and D. R. Bull, “Multi-view image coding with wavelet lifting and in-band disparity compensation”, Proceedings of International Conference on Image Processing, Vol. 3, pp. 33–36, September 2005.

[Ant92] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, “Image coding using wavelet transform”, IEEE Transactions on Image Processing, Vol. 1, pp. 205–220, April 1992.

[Aus00] P. J. Ausbeck, “The piecewise-constant image model”, Proceedings of the IEEE, Vol. 88, no. 11, pp. 1779–1789, November 2000.

[BB02] A. Benazza-Benyahia, J.-C. Pesquet, and M. Hamdi, “Vector-lifting schemes for lossless coding and progressive archival of multispectral images”, IEEE Transactions on Geoscience and Remote Sensing, Vol. 40, no. 9, pp. 2011–2024, September 2002.

[BB03] A. Benazza-Benyahia, J.-C. Pesquet, and H. Masmoudi, “Block-based adaptive lifting schemes for lossless and progressive image coding”, Third International Workshop on Spectral Methods and Multirate Signal Processing, pp. 207–211, September 2003.

[Ber99] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, Massachusetts, 2nd ed., 1999.

[Bou01] N. V. Boulgouris, D. Tzovaras, and M. G. Strintzis, “Lossless image compression based on optimal prediction, adaptive lifting, and conditional arithmetic coding”, IEEE Transactions on Image Processing, Vol. 10, no. 1, pp. 1–14, January 2001.

[Boy04] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.

[Bru92] F. A. M. Bruekers and A. W. M. Van den Enden, “New networks for perfect inversion and perfect reconstruction”, IEEE Journal on Selected Areas in Communications, Vol. 10, pp. 130–137, 1992.

[Bur98] C. S. Burrus, R. A. Gopinath, and H. Guo, Introduction to Wavelets and Wavelet Transforms, Prentice-Hall, 1998.

[Cal98] A. R. Calderbank, I. Daubechies, W. Sweldens, and B. L. Yeo, “Wavelet transforms that map integers to integers”, Applied and Computational Harmonic Analysis, Vol. 5, no. 3, pp. 332–369, August 1998.

[Cho05] H. Choi and R. G. Baraniuk, “Multiscale manifold representation and modeling”, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Vol. 4, pp. 569–572, March 2005.

[Cla97] R. L. Claypoole, G. Davis, W. Sweldens, and R. G. Baraniuk, “Nonlinear wavelet transforms for image coding”, Proceedings of the 31st Asilomar Conference on Signals, Systems and Computers, Vol. 1, pp. 662–667, November 1997.

[Cla98] R. L. Claypoole, R. G. Baraniuk, and R. D. Nowak, “Lifting construction of nonlinear wavelet transforms”, Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, pp. 49–52, October 1998.

[Cla03] R. L. Claypoole, G. M. Davis, W. Sweldens, and R. G. Baraniuk, “Nonlinear wavelet transforms for image coding via lifting”, IEEE Transactions on Image Processing, Vol. 12, no. 12, pp. 1449–1459, December 2003.

[Coh92] A. Cohen, I. Daubechies, and J.-C. Feauveau, “Biorthogonal bases of compactly supported wavelets”, Comm. Pure Appl. Math., Vol. 45, no. 5, pp. 485–560, 1992.

[Dau88] I. Daubechies, “Orthonormal bases of compactly supported wavelets”, Comm. Pure Appl. Math., Vol. 41, pp. 909–967, November 1988.

[Dau98] I. Daubechies and W. Sweldens, “Factoring wavelet transforms into lifting steps”, Journal of Fourier Analysis and Applications, Vol. 4, pp. 245–267, 1998.

[Dee03] A. T. Deever and S. S. Hemami, “Lossless image compression with projection-based and adaptive reversible integer wavelet transforms”, IEEE Transactions on Image Processing, Vol. 12, no. 5, pp. 489–499, May 2003.

[Del92] P. Delsarte, B. Macq, and D. T. M. Slock, “Signal-adapted multiresolution transform for image coding”, IEEE Trans. Information Theory, Special Issue on Wavelet Transforms and Multiresolution Signal Anal., Vol. 38, pp. 897–903, March 1992.

[Don95] R. D. Dony and S. Haykin, “Optimally adaptive transform coding”, IEEE Transactions on Image Processing, Vol. 4, no. 10, pp. 1358–1370, October 1995.

[Don97] D. L. Donoho, “Wedgelets: nearly minimax estimation of edges”, Tech. rep., Statistics Department, Stanford University, 1997.

[Don98] D. L. Donoho, “Orthonormal ridgelets and linear singularities”, Tech. rep., Statistics Department, Stanford University, 1998.

[Egg95] O. Egger, W. Li, and M. Kunt, “High compression image coding using an adaptive morphological subband decomposition”, Proceedings of the IEEE, Vol. 83, no. 2, pp. 272–287, February 1995.

[Fah02] G. F. Fahmy and S. Panchanathan, “A lifting based system for optimal compression and classification in the JPEG2000 framework”, Proceedings of IEEE International Symposium on Circuits and Systems, Vol. 4, pp. 153–156, May 2002.

[Flo94] D. A. F. Florencio and R. W. Schafer, “A non-expansive pyramidal morphological image coder”, Proceedings of International Conference on Image Processing, Vol. 2, pp. 331–335, November 1994.

[Flo96] D. A. F. Florencio and R. W. Schafer, “Perfect reconstructing nonlinear filter banks”, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Vol. 3, pp. 1814–1817, April 1996.

[Gal88] D. Le Gall and A. Tabatabai, “Subband coding of digital images using symmetric short kernel filters and arithmetic coding techniques”, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 761–764, April 1988.

[Ger00] O. N. Gerek and A. E. Cetin, “Adaptive polyphase subband decomposition structures for image compression”, IEEE Transactions on Image Processing, Vol. 9, no. 10, pp. 1649–1660, October 2000.

[Ger05] O. N. Gerek and A. E. Cetin, “Lossless image compression using an edge adapted lifting predictor”, Proceedings of International Conference on Image Processing, Vol. 2, pp. 730–733, September 2005.

[Ger06] O. N. Gerek and A. E. Cetin, “A 2-D orientation-adaptive prediction filter in lifting structures for image coding”, IEEE Transactions on Image Processing, Vol. 15, no. 1, pp. 106–111, January 2006.

[Gir05] B. Girod and S. Han, “Optimum update for motion-compensated lifting”, IEEE Signal Processing Letters, Vol. 12, no. 2, pp. 150–153, February 2005.

[Gou00] A. Gouze, M. Antonini, and M. Barlaud, “Quincunx lifting scheme for lossy image compression”, Proceedings of International Conference on Image Processing, Vol. 1, pp. 665–668, September 2000.

[Gou01] A. Gouze, M. Antonini, and M. Barlaud, “Optimized lifting scheme for two-dimensional quincunx sampling images”, Proceedings of International Conference on Image Processing, Vol. 2, pp. 253–256, October 2001.

[Gou04] A. Gouze, M. Antonini, M. Barlaud, and B. Macq, “Design of signal-adapted multidimensional lifting scheme for lossy coding”, IEEE Transactions on Image Processing, Vol. 13, no. 12, pp. 1589–1603, December 2004.

[Gra02] M. Grangetto, E. Magli, M. Martina, and G. Olmo, “Optimization and implementation of the integer wavelet transform for image coding”, IEEE Transactions on Image Processing, Vol. 11, no. 6, pp. 596–604, June 2002.

[Ham96] F. J. Hampson and J.-C. Pesquet, “A nonlinear subband decomposition with perfect reconstruction”, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Vol. 3, pp. 1523–1526, April 1996.

[Ham98] F. J. Hampson and J.-C. Pesquet, “M-band nonlinear subband decompositions with perfect reconstruction”, IEEE Transactions on Image Processing, Vol. 7, no. 11, pp. 1547–1560, November 1998.

[Hat04] J. Hattay, A. Benazza-Benyahia, and J.-C. Pesquet, “Adaptive lifting schemes using variable-size block segmentation”, Advanced Concepts for Intelligent Vision Systems, pp. 311–318, September 2004.

[Hat05] J. Hattay, A. Benazza-Benyahia, and J.-C. Pesquet, “Adaptive lifting for multicomponent image coding through quadtree partitioning”, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 213–216, March 2005.

[Hei00] H. J. A. M. Heijmans and J. Goutsias, “Nonlinear multiresolution signal decomposition schemes. Part II: morphological wavelets”, IEEE Transactions on Image Processing, Vol. 9, no. 11, pp. 1897–1913, 2000.

[Hei01] H. J. A. M. Heijmans, B. Pesquet-Popescu, and G. Piella, “Building nonredundant adaptive wavelets by update lifting”, Tech. rep., PNA CWI Amsterdam, 2001.

[Hei05a] H. J. A. M. Heijmans, B. Pesquet-Popescu, and G. Piella, “Building nonredundant adaptive wavelets by update lifting”, Applied and Computational Harmonic Analysis, Vol. 18, pp. 252–281, May 2005.

[Hei05b] H. J. A. M. Heijmans, G. Piella, and B. Pesquet-Popescu, “Adaptive wavelets for image compression using update lifting: Quantisation and error analysis”, International Journal of Wavelets, Multiresolution, and Information Processing, 2005.

[Ho99] W.-J. Ho and W.-T. Chang, “Adaptive predictor based on maximally flat halfband filter in lifting scheme”, IEEE Trans. Signal Processing, Vol. 47, no. 11, pp. 2965–2977, November 1999.

[Hon05] A. Honda, K. Fukuda, and A. Kawanaka, “Permuting and lifting wavelet coding for structured 3-D geometry data with expanded nodes”, Proceedings of International Conference on Image Processing, Vol. 1, pp. 761–764, September 2005.

[ISO99a] ISO/IEC, ISO/IEC 14495-1:1999: Information technology - lossless and near-lossless compression of continuous-tone still images: baseline, December 1999.

[ISO99b] ISO/IEC, ISO/IEC JTC1/SC29/WG1 N1545, JBIG2 Final Draft Int. Std., December 1999.

[ISO00] ISO/IEC, ISO/IEC 15444-1: JPEG 2000 image coding system, 2000.

[Jan04] M. Jansen, “Nonlinear multiscale decompositions by edge-adaptive subsampling”, Advanced Concepts for Intelligent Vision Systems, pp. 297–301, September 2004.

[Jet] Jet Propulsion Laboratory and U.S. Geological Survey. Available: http://pdsimaging.jpl.nasa.gov.

[Kam05] L. Kamstra and H. J. A. M. Heijmans, “Reversible data embedding into images using wavelet techniques and sorting”, IEEE Transactions on Image Processing, Vol. 14, no. 12, pp. 2082–2090, December 2005.

[Kov00] J. Kovacevic and W. Sweldens, “Wavelet families of increasing order in arbitrary dimensions”, IEEE Transactions on Image Processing, Vol. 9, no. 3, pp. 480–496, March 2000.

[Kuz98] K. Kuzume and K. Nijima, “Design of optimal lifting wavelet filters for data compression”, Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, pp. 337–340, October 1998.

[Lei05] Z. Lei, A. Makur, and Z. Ce, “Design of 2-channel linear phase filter bank: a lifting approach”, IEEE Proceedings of the International Symposium on Circuits and Systems, pp. 4301–4304, May 2005.

[Li02] H. Li, G. Liu, Y. Li, and X. Hou, “The construction of a statistical prediction lifting operator and its application”, Proceedings of International Conference on Image Processing, Vol. 1, pp. 353–356, September 2002.

[Li05] H. Li, G. Liu, and Z. Zhang, “Optimization of integer wavelet transforms based on difference correlation structures”, IEEE Transactions on Image Processing, Vol. 14, no. 11, pp. 1831–1847, November 2005.

[Liu01] J. Liu and P. Moulin, “Information-theoretic analysis of interscale and intrascale dependencies between image wavelet coefficients”, IEEE Transactions on Image Processing, Vol. 10, no. 11, pp. 1647–1658, December 2001.

[Luo01] L. Luo, S. Li, Z. Zhuang, and Y.-Q. Zhang, “Motion compensated lifting wavelet and its application in video coding”, Proceedings of the IEEE International Conference on Multimedia Expo, pp. 481–484, August 2001.

[Mal89] S. Mallat, “A theory for multiresolution signal decomposition: the wavelet representation”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, no. 7, pp. 674–693, July 1989.

[Mal98] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, San Diego, California, 1998.

[Meh03] N. Mehrseresht and D. Taubman, “Adaptively weighted update steps in motion compensated lifting based scalable video compression”, Proceedings of International Conference on Image Processing, Vol. 3, pp. 771–774, September 2003.

[Mic76] C. A. Michelli and T. J. Rivlin (Eds.), Optimal Estimation in Approximation Theory, Plenum, New York, 1976.

[Mur02] D. D. Muresan, “Review of optimal recovery”, Tech. rep., Cornell University, 2002. Available: http://dsplab.ece.cornell.edu/papers.

[Mur04] D. D. Muresan and T. W. Parks, “Adaptively quadratic (AQua) image interpolation”, IEEE Transactions on Image Processing, Vol. 13, no. 4, pp. 690–698, May 2004.

[Ohm94] J.-R. Ohm, “Three-dimensional subband coding with motion compensation”, IEEE Transactions on Image Processing, Vol. 3, no. 5, pp. 559–571, September 1994.

[Pen92] W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard, Van Nostrand Reinhold, New York, 1992.

[Pen00] E. Le Pennec and S. Mallat, “Image compression with geometrical wavelets”, Proceedings of International Conference on Image Processing, Vol. 1, pp. 661–664, September 2000.

[Pie01a] G. Piella and H. J. A. M. Heijmans, “Adaptive lifting scheme with perfect reconstruction”, Tech. rep., PNA CWI Amsterdam, 2001.

[Pie01b] G. Piella and H. J. A. M. Heijmans, “An adaptive update lifting scheme with perfect reconstruction”, Proceedings of International Conference on Image Processing, Vol. 2, pp. 190–193, October 2001.

[Pie04] G. Piella, H. J. A. M. Heijmans, and B. Pesquet-Popescu, “Quantization of adaptive wavelets for image compression”, IEEE Proceedings of the International Midwest Symposium on Circuits and Systems, July 2004.

[Pie05] G. Piella, B. Pesquet-Popescu, H. J. A. M. Heijmans, and G. Pau, “Combining seminorms in adaptive lifting schemes and applications to image analysis and compression”, Journal of Mathematical Imaging and Vision, July 2005.

[Pit90] I. Pitas and A. N. Venetsanopoulos, Nonlinear Digital Filters, Kluwer Academic Publishers, 1990.

[PP02] B. Pesquet-Popescu, G. Piella, and H. J. A. M. Heijmans, “Adaptive update lifting with gradient criteria modeling high-order differences”, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 1417–1420, May 2002.

[PP03] B. Pesquet-Popescu, G. C. K. Abhayaratne, H. J. A. M. Heijmans, and G. Piella, “Quantization of adaptive 2D wavelet decompositions”, Proceedings of International Conference on Image Processing, Vol. 3, pp. 209–212, September 2003.

[Que95] R. L. de Queiroz and D. A. F. Florencio, “A nonlinear filter bank for image coding”, Midwest Symposium on Circuits and Systems, pp. 190–193, August 1995.

Bibliography

[Que98]

158

R. L. de Queiroz, D. A. F. Florencio, and R. W. Schafer, “Non-expansive pyramid for image coding using a nonlinear filterbank”, IEEE Transactions on Image Processing, Vol. 7, no 2, pags. 246–252, February 1998.

[Ram96] K. Ramchandran, M. Vetterli, and C. Herley, “Wavelets, subband coding, and best bases”, Proceedings of IEEE , Vol. 84, pags. 541–560, April 1996. [Roc71]

R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, New Jersey, 2nd ed., 1971.

[Sai96a]

A. Said, and W. A. Pearlman, “A new, fast, and efficient image codec based on set partitioning in hierarchical trees”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 6, no 3, pags. 243–250, June 1996.

[Sai96b]

A. Said, and W. A. Pearlman, “An image multiresolution representation for lossless and lossy compression”, IEEE Transactions on Image Processing, Vol. 5, no 9, pags. 1303–1310, September 1996.

[Say00]

K. Sayood, Introduction to Data Compression, Morgan Kauffman Publishers, 2000.

[Sec01]

A. Secker, and D. Taubman, “Motion-compensated highly scalable video compression using and adaptive 3D wavelet transform based on lifting”, Proceedings of International Conference on Image Processing, Vol. 2, pags. 1039–1042, October 2001.

[Sec03]

A. Secker, and D. Taubman, “Lifting-based invertible motion adaptive transform (LIMAT) framework for highly scalable video compression”, IEEE Transactions on Image Processing, Vol. 12, no 12, pags. 1530–1542, December 2003.

[Ser00]

D. Sersic, “Integer to integer mapping wavelet filter bank with adaptive number of zero moments”, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Vol. 1, pags. 480–483, June 2000.

[Sha93]

J. M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients”, IEEE Transactions on Image Processing, Vol. 41, no 12, pags. 3445–3462, December 1993.

[Sin93] D. Sinha, and A. H. Tewfik, “Low bit rate transparent audio compression using adapted wavelets”, IEEE Transactions on Signal Processing, Vol. 41, no. 12, pp. 3463–3479, December 1993.

[Sko01] A. Skodras, C. Christopoulos, and T. Ebrahimi, “The JPEG2000 still image compression standard”, IEEE Signal Processing Magazine, Vol. 18, pp. 36–58, September 2001.

[Sol04a] J. Solé, and P. Salembier, “Adaptive discrete generalized lifting for lossless compression”, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Vol. 3, pp. 57–60, May 2004.

[Sol04b] J. Solé, and P. Salembier, “Discrete generalized lifting for lossless image compression”, Research in AVR Barcelona, pp. 337–340, February 2004.

[Sol04c] J. Solé, and P. Salembier, “Prediction design for discrete generalized lifting”, Proceedings of Advanced Concepts for Intelligent Vision Systems, pp. 319–324, September 2004.

[Sol05] J. Solé, and P. Salembier, “Adaptive generalized prediction for lifting schemes”, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 205–208, March 2005.

[Sol06a] J. Solé, and P. Salembier, “A common formulation for interpolation, prediction, and update lifting design”, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 13–16, May 2006.

[Sol06b] J. Solé, and P. Salembier, “Adaptive quadratic image interpolation methods”, accepted to Research in AVR Barcelona, July 2006.

[Sol06c] J. Solé, and P. Salembier, “Adaptive quadratic interpolation methods for lifting steps construction”, accepted to the IEEE International Symposium on Signal Processing and Information Technology, August 2006.

[Sta02] J.-L. Starck, E. J. Candes, and D. L. Donoho, “The curvelet transform for image denoising”, IEEE Transactions on Image Processing, Vol. 11, no. 6, pp. 670–684, June 2002.

[Sun04] Y.-K. Sun, “A two-dimensional lifting scheme of integer wavelet transform for lossless image compression”, Proceedings of International Conference on Image Processing, Vol. 1, pp. 497–500, October 2004.

[Swe96] W. Sweldens, “The lifting scheme: A custom-design construction of biorthogonal wavelets”, Applied and Computational Harmonic Analysis, Vol. 3, no. 2, pp. 186–200, 1996.

[Swe97] W. Sweldens, “The lifting scheme: A construction of second generation wavelets”, SIAM J. Math. Anal., Vol. 29, no. 2, pp. 511–546, 1997.

[Tau94] D. Taubman, and A. Zakhor, “Multirate 3-D subband coding of video”, IEEE Transactions on Image Processing, Vol. 3, no. 5, pp. 572–588, September 1994.

[Tau99] D. Taubman, “Adaptive, non-separable lifting transforms for image compression”, Proceedings of International Conference on Image Processing, Vol. 3, pp. 772–776, October 1999.

[Tau00] D. Taubman, “High performance scalable image compression with EBCOT”, IEEE Transactions on Image Processing, Vol. 9, no. 7, pp. 1158–1170, July 2000.

[Tau02a] D. Taubman, and M. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice, Kluwer Academic Publishers, 2002.

[Tau02b] D. Taubman, E. Ordentlich, M. Weinberger, and G. Seroussi, “Embedded block coding in JPEG 2000”, Signal Processing: Image Communication, Vol. 17, no. 1, pp. 49–72, January 2002.

[Til05] C. Tillier, B. Pesquet-Popescu, and M. van der Schaar, “Improved update operators for lifting-based motion-compensated temporal filtering”, IEEE Signal Processing Letters, Vol. 12, no. 2, pp. 154–157, February 2005.

[Tra99] W. Trappe, and K. Liu, “Adaptivity in the lifting scheme”, 33rd Conference on Information Sciences and Systems, pp. 950–955, March 1999.

[Uns03] M. Unser, and T. Blu, “Mathematical properties of the JPEG2000 wavelet filters”, IEEE Transactions on Image Processing, Vol. 12, no. 9, pp. 1080–1090, September 2003.

[U.S] U.S. Geological Survey, available: http://edc.usgs.gov/products/satellite/avhrr.html.

[Use01] B. E. Usevitch, “A tutorial on modern lossy wavelet image compression: Foundations of JPEG 2000”, IEEE Signal Processing Magazine, pp. 22–35, September 2001.

[Vai92] P. P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice-Hall, 1992.

[Vet95] M. Vetterli, and J. Kovacevic, Wavelets and Subband Coding, Prentice-Hall, 1995.

[W3C96] W3C, PNG (Portable Network Graphics) Specification, October 1996, available: http://www.w3.org/TR/PNG.

[Wan05] D. Wang, L. Zhang, and A. Vincent, “Improvement of JPEG2000 using curved wavelet transforms”, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 365–368, March 2005.

[Wei00] M. J. Weinberger, G. Seroussi, and G. Sapiro, “The LOCO-I lossless image compression algorithm: principles and standardization into JPEG-LS”, IEEE Transactions on Image Processing, Vol. 9, no. 8, pp. 1309–1324, August 2000.

[Wit87] I. H. Witten, R. M. Neal, and J. G. Cleary, “Arithmetic coding for data compression”, Communications of the ACM, Vol. 30, pp. 520–540, June 1987.

[Wu97] X. Wu, and N. Memon, “Context-based, adaptive, lossless image coding”, IEEE Transactions on Communications, Vol. 45, no. 4, pp. 437–444, April 1997.

[Yoo02] H. Yoo, and J. Jeong, “Signal-dependent wavelet transform and application to lossless image compression”, Electronics Letters, Vol. 38, no. 4, pp. 170–172, February 2002.

[Zee02] P. M. de Zeeuw, “A toolbox for the lifting scheme in quincunx grids (LISQ)”, Tech. Rep., PNA, CWI, Amsterdam, 2002.

[Zha04] X. Zhang, W. Wang, T. Yoshikawa, and Y. Takei, “Design of IIR orthogonal wavelet filter banks using lifting scheme”, Proceedings of International Conference on Image Processing, Vol. 1, pp. 2511–2514, October 2004.