Image Data Compression

Part II Image Data Compression Prof. Ja-Ling Wu

Department of Computer Science and Information Engineering National Taiwan University

Contents
I.   Introduction
II.  Predictive Techniques
III. Transform Domain Coding Techniques
IV.  Image Coding in Visual Telephony
V.   Coding of Two-Tone Images
VI.  References


I. Introduction

Image data compression is concerned with minimizing the number of bits required to represent an image. Applications of data compression lie primarily in the transmission and storage of information. Data compression is also applied in the development of fast algorithms, where the number of operations required to implement an algorithm is reduced by working with the compressed data.


Image data compression techniques

Pixel coding:
• PCM / quantization
• Run-length coding
• Bit-plane coding

Predictive coding:
• Delta modulation
• Line-by-line DPCM
• 2-D DPCM
• Interframe techniques
• Adaptive

Transform coding:
• Zonal coding
• Threshold coding
• Multi-D techniques
• Adaptive

Others:
• Hybrid coding
• Vector quantization


Image data compression methods fall into two common categories:

A. Redundancy coding
   – Redundancy reduction
   – Information lossless
   – Predictive coding: DM, DPCM

B. Entropy coding
   – Entropy reduction
   – Inevitably results in some distortion
   – Transform coding

For digitized data, "distortionless compression" techniques are possible.


Some methods for entropy reduction:
• Subsampling: reduce the sampling rate
• Coarse quantization: reduce the number of quantization levels
• Frame repetition / interlacing: reduce the refresh rate (number of frames per second) of TV signals


II. Predictive Techniques

Basic principle: remove the mutual redundancy between successive pixels and encode only the new information.

DPCM: A sampled sequence u(m) has been coded up to m = n−1. Let ũ(n−1), ũ(n−2), … be the values of the reproduced (decoded) sequence.


At m = n, when u(n) arrives, a quantity ū(n), an estimate of u(n), is predicted from the previously decoded samples ũ(n−1), ũ(n−2), …, i.e.,

$$\bar u(n) = \Phi\big(\tilde u(n-1), \tilde u(n-2), \ldots\big), \qquad \Phi:\ \text{"prediction rule"}$$

Prediction error:

$$e(n) = u(n) - \bar u(n)$$

If ẽ(n) is the quantized value of e(n), then the reproduced value of u(n) is:

$$\tilde u(n) = \bar u(n) + \tilde e(n)$$


DPCM CODEC u n 

 –

en 

Quantizer

e~n 

~ Communication e n  + Channel

u~n 

Predictor with delay

u~n 



u~n  u~n 

+

Predictor with delay

 +

Reconstruction filter/Decoder

Coder


Note:

$$u(n) - \tilde u(n) = \big[\bar u(n) + e(n)\big] - \big[\bar u(n) + \tilde e(n)\big] = e(n) - \tilde e(n) \triangleq q(n)$$

where q(n) is the quantization error in e(n).

Remarks:
1. The pointwise coding error in the input sequence is exactly equal to q(n), the quantization error in e(n).
2. With a reasonable predictor, the mean square value of the differential signal e(n) is much smaller than that of u(n).


Conclusion: for the same mean square quantization error, e(n) requires fewer quantization bits than u(n). The number of bits required for transmission is therefore reduced while the quantization error is kept the same.
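To make the DPCM coding loop concrete, here is a minimal coder/decoder sketch in Python. It is an illustration only: the one-step-delay prediction rule and the uniform quantizer with step 4 are assumptions, not values taken from the slides.

```python
import numpy as np

def dpcm(u, quantize):
    """Minimal DPCM loop: predict from *decoded* samples, quantize the
    prediction error e(n), reconstruct u~(n) = u_bar(n) + e~(n)."""
    u = np.asarray(u, dtype=float)
    e_q = np.zeros_like(u)          # quantized prediction errors (what is transmitted)
    u_rec = np.zeros_like(u)        # reconstructed sequence u~(n)
    u_rec[0] = u[0]                 # assume the first sample is sent exactly
    for n in range(1, len(u)):
        u_bar = u_rec[n - 1]        # prediction rule: u_bar(n) = u~(n-1)
        e = u[n] - u_bar            # prediction error e(n)
        e_q[n] = quantize(e)        # quantized error e~(n)
        u_rec[n] = u_bar + e_q[n]   # u~(n) = u_bar(n) + e~(n)
    return e_q, u_rec

step = 4.0                                      # assumed uniform quantizer step
quantize = lambda e: step * np.round(e / step)

u = [100, 102, 120, 120, 120, 118, 116]
e_q, u_rec = dpcm(u, quantize)
print(u_rec, np.abs(np.array(u) - u_rec))       # coding error = quantization error of e(n)
```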


Feedback Versus Feedforward Prediction
An important aspect of DPCM is that the prediction is based on the output — the quantized samples — rather than on the input — the unquantized samples. This places the predictor in the "feedback loop" around the quantizer, so that the quantization error at a given step is fed back to the quantizer input at the next step. This has a stabilizing effect that prevents DC drift and accumulation of error in the reconstructed signal ũ(n).


If the prediction rule is based on the past inputs, the signal reconstruction error depends on all the past and present quantization errors in the feedforward prediction-error sequence ε(n). In general, the mean square error of feedforward reconstruction is greater than that of DPCM.

[Block diagram — feedforward coding: the input u(n) minus a prediction formed from past *inputs* gives ε(n), which is quantized and entropy coded/decoded; the decoder adds the received errors to its own predictor output to reconstruct ũ(n).]

Example: The sequence 100, 102, 120, 120, 120, 118, 116, … is to be predictively coded using the prediction rule

ū(n) = ũ(n−1)  for DPCM,
ū(n) = u(n−1)  for the feedforward predictive coder.

Assume the 2-bit quantizer sketched below is used (output levels ±1 and ±5, input axis marked at ±2, ±4, ±6), except for the first sample, which is quantized separately by a 7-bit uniform quantizer, giving ũ(0) = u(0) = 100.


 n | u(n) |          DPCM                  |  Feedforward predictive coder
   |      | ū(n)  e(n)  ẽ(n)  ũ(n)  |u−ũ|  | ū(n)  ε(n)  ε̃(n)  ũ(n)  |u−ũ|
 0 | 100  |   –     –     –   100     0    |   –     –     –   100     0
 1 | 102  | 100     2     1   101     1    | 100     2     1   101     1
 2 | 120  | 101    19     5   106    14    | 102    18     5   106    14
 3 | 120  | 106    14     5   111     9    | 120     0    -1   105    15
 4 | 120  | 111     9     5   116     4    | 120     0    -1   104    16
 5 | 118  | 116     2     1   117     1    | 120    -2    -5    99    19
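These numbers can be checked with a short simulation. This is a sketch rather than the slides' exact procedure: the decision thresholds of the 2-bit quantizer are not fully legible in the figure, so the mapping assumed here (±1 for error magnitudes up to 4, ±5 beyond, with 0 mapped to −1) reproduces most but not necessarily all of the table entries.

```python
def quant4(e):
    """Assumed 2-bit quantizer: +-1 for small errors, +-5 for large ones."""
    mag = 5 if abs(e) > 4 else 1
    return mag if e > 0 else -mag

def dpcm(u):
    rec = [u[0]]                            # u~(0) = u(0), sent with 7-bit PCM
    for n in range(1, len(u)):
        rec.append(rec[-1] + quant4(u[n] - rec[-1]))   # predict from decoded past
    return rec

def feedforward(u):
    rec = [u[0]]
    for n in range(1, len(u)):
        rec.append(rec[-1] + quant4(u[n] - u[n - 1]))  # predict from unquantized past
    return rec

u = [100, 102, 120, 120, 120, 118, 116]
for name, coder in (("DPCM", dpcm), ("Feedforward", feedforward)):
    rec = coder(u)
    print(name, rec, [abs(a - b) for a, b in zip(u, rec)])
```

The DPCM reconstruction error stays bounded by the quantizer error, while the feedforward errors accumulate.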

Delta Modulation (DM)

• Predictor: one-step delay
• Quantizer: 1-bit quantizer

$$\bar u(n) = \tilde u(n-1), \qquad e(n) = u(n) - \tilde u(n-1)$$

[Block diagram — DM coder: 1-bit quantization of e(n) with a one-step-delay predictor in the feedback loop; DM decoder: integrator (accumulator of ẽ(n)) followed by the same unit delay.]

u n 

u~n 

Granularity

Slope overload



Primary Limitation of DM :

1)

Slope overload : large jump region Max. slope = (step size)  (sampling freq.) Granularity Noise : almost constant region Instability to channel Noise

2) 3)

Step size effect : Step Size   (i) slope overload  (sampling frequency ) (ii) granular Noise  Information Theory
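Both effects are easy to reproduce in a short simulation (an illustrative sketch; the step size and test signal are arbitrary choices):

```python
import numpy as np

def delta_modulate(u, step):
    """1-bit DM: transmit sgn(e(n)); the decoder integrates +-step."""
    rec = np.zeros(len(u))
    rec[0] = u[0]
    for n in range(1, len(u)):
        bit = 1.0 if u[n] - rec[n - 1] >= 0 else -1.0   # 1-bit quantizer
        rec[n] = rec[n - 1] + bit * step                 # integrator / unit delay
    return rec

# Test signal: a steep ramp (slope > step) followed by a long flat region.
u = np.concatenate([np.linspace(0, 40, 20), np.full(40, 40.0)])
rec = delta_modulate(u, step=1.0)
print("max lag on the ramp (slope overload):", np.max(u - rec))
print("staircase on the flat part (granularity):", rec[-6:])
```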


Adaptive Delta Modulation

[Block diagram: a ±1 quantizer produces E(K+1); an adaptive step-size function (with Δ(K), E(K), and Δ_min stored) scales the step, and an accumulator with unit delay updates X(K).]

$$E_{K+1} = \operatorname{sgn}\big(S_{K+1} - X_K\big)$$

$$\Delta_{K+1} = \begin{cases} \Delta_K\,\big|E_{K+1} + \tfrac12 E_K\big|, & \Delta_K \ge \Delta_{\min} \\ \Delta_{\min}, & \Delta_K < \Delta_{\min} \end{cases}$$

$$X_{K+1} = X_K + \Delta_{K+1}\,E_{K+1}$$

This adaptive approach simultaneously minimizes the effects of both slope overload and granular noise.
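A direct transcription of these update equations (a sketch: the initial step size, Δ_min, and the test signal are assumed values, and the step-size floor is implemented exactly as stated above):

```python
import numpy as np

def adaptive_dm(s, delta0=1.0, delta_min=0.5):
    """Adaptive DM: the step grows by 1.5x when successive bits agree,
    shrinks by 0.5x when they disagree, floored at delta_min."""
    x = np.zeros(len(s))
    x[0] = s[0]
    delta, e_prev = delta0, 1.0
    for k in range(len(s) - 1):
        e = 1.0 if s[k + 1] - x[k] >= 0 else -1.0     # E(K+1) = sgn(S(K+1) - X(K))
        if delta >= delta_min:
            delta = delta * abs(e + 0.5 * e_prev)     # Delta(K+1) = Delta(K)|E(K+1) + E(K)/2|
        else:
            delta = delta_min                         # reset once the step falls below the floor
        x[k + 1] = x[k] + delta * e                   # X(K+1) = X(K) + Delta(K+1) E(K+1)
        e_prev = e
    return x

s = np.concatenate([np.linspace(0, 40, 20), np.full(40, 40.0)])
print(np.round(adaptive_dm(s)[-6:], 2))   # smaller residual oscillation than fixed-step DM
```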


DPCM Design

There are two components to design in a DPCM system:
i.  The predictor
ii. The quantizer

Ideally, the predictor and quantizer would be optimized together using a linear or nonlinear technique. In practice, a suboptimum design approach is adopted:
i.  Linear predictor
ii. Zero-memory quantizer

Remark: for this approach, the number of quantizing levels M must be relatively large (M ≥ 8) to achieve good performance.


Design of the linear predictor

$$\hat S_0 = a_1 S_1 + a_2 S_2 + \cdots + a_n S_n, \qquad e = S_0 - \hat S_0$$

Minimize the mean square prediction error with respect to the coefficients:

$$\frac{\partial}{\partial a_i} E\big[(S_0 - \hat S_0)^2\big] = \frac{\partial}{\partial a_i} E\big[(S_0 - a_1 S_1 - a_2 S_2 - \cdots - a_n S_n)^2\big] = 0$$

$$\Rightarrow\; -2\,E\big[(S_0 - a_1 S_1 - a_2 S_2 - \cdots - a_n S_n)\,S_i\big] = 0, \qquad i = 1, 2, \ldots, n$$

$$\Rightarrow\; E\big[(S_0 - \hat S_0)\,S_i\big] = 0, \qquad i = 1, 2, \ldots, n$$

With $R_{ij} = E[S_i S_j]$, this gives $E[S_0 S_i] = E[\hat S_0 S_i]$, i.e.

$$R_{0i} = a_1 R_{1i} + a_2 R_{2i} + \cdots + a_n R_{ni}, \qquad i = 1, 2, \ldots, n$$

or, in matrix form, $[a_i] = [R_{ij}]^{-1}[R_{0i}]$.


When Ŝ₀ comprises these optimized coefficients a_i, the mean square error signal is:

$$\sigma_e^2 = E\big[(S_0 - \hat S_0)^2\big] = E\big[(S_0 - \hat S_0)S_0\big] - E\big[(S_0 - \hat S_0)\hat S_0\big]$$

But $E\big[(S_0 - \hat S_0)\hat S_0\big] = 0$ (orthogonality principle), so

$$\sigma_e^2 = E\big[S_0^2\big] - E\big[\hat S_0 S_0\big] = R_{00} - \big(a_1 R_{01} + a_2 R_{02} + \cdots + a_n R_{0n}\big)$$

where σ_e² is the variance of the difference signal and R₀₀ is the variance of the original signal.

⇒ The variance of the error signal is less than the variance of the original signal.


Remarks: 1. The complexity of the predictor depends on “n”. 2. “n” depends on the covariance properties of the original signal.
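In practice the coefficients can be obtained by solving the normal equations numerically. Below is a minimal sketch assuming a stationary signal (so that R_ij depends only on |i − j|), with the autocorrelations estimated from sample data; the first-order Markov test signal is only an illustration.

```python
import numpy as np

def design_linear_predictor(s, n):
    """Solve a = R^{-1} r0 for an n-tap linear predictor, with the
    autocorrelations R(k) estimated from the sample signal s."""
    s = np.asarray(s, dtype=float)
    r = np.array([np.mean(s[:len(s) - k] * s[k:]) for k in range(n + 1)])   # R(0)..R(n)
    R = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])     # R_ij (Toeplitz)
    r0 = r[1:]                                                              # R_0i = E[S_0 S_i]
    a = np.linalg.solve(R, r0)
    sigma_e2 = r[0] - a @ r0          # sigma_e^2 = R00 - sum_i a_i R_0i
    return a, sigma_e2

rng = np.random.default_rng(0)
s = np.zeros(10_000)
for k in range(1, len(s)):            # first-order Markov signal, correlation ~0.95
    s[k] = 0.95 * s[k - 1] + rng.standard_normal()

a, sigma_e2 = design_linear_predictor(s, n=2)
print("coefficients a:", np.round(a, 3))
print("error variance vs signal variance:", round(sigma_e2, 3), round(np.var(s), 3))
```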


Design of the DPCM Quantizer

Review of the uniform quantizer. Types of quantization:
1. Zero-memory quantization
2. Block quantization
3. Sequential quantization

[Input–output staircase characteristics: midtread quantizer and midriser quantizer.]

Quantization error: $Q_E = y(X) - X$

Average distortion: $D = \int \big(y(X) - X\big)^2\, p(X)\, dX$

$$\mathrm{SNR} = 10\log_{10}\frac{\sigma^2}{D} \ \text{in dB}$$

where σ² is the variance of the input X.


Uniform quantizer: p(x) is assumed constant within each interval.

[Figure: quantizer characteristic with the lower overload region, the granular region, and the upper overload region marked along the input axis x₁, …, x₆; quantization error of a midtread quantizer.]


[Figure: uniform midtread quantizer with M = 9, output levels y₁, …, y₈ over decision levels x₁, …, x₉.]

The output level y_i lies at the midpoint of its input interval Δ = x_i − x_{i−1}. Assume p(x) is constant over each interval and equal to p(x_i).

• Lower overload region: the interval (x₀, x₁), with x₀ far below x₁
• Granular region: Δ = x_i − x_{i−1}, 2 ≤ i ≤ M−1
• Upper overload region: the interval (x_{M−1}, x_M), with x_M far above x_{M−1}

$$D = \sum_{i=1}^{M}\int_{x_{i-1}}^{x_i}\big(y_i - x\big)^2\, p(x_i)\, dx \;\approx\; \sum_{i=2}^{M-1} p(x_i)\left[\frac{-(y_i - x)^3}{3}\right]_{x_{i-1}}^{x_i}$$

where we assume the contribution of the overload regions is negligible, i.e. p(x₁) = p(x_M) = 0.


Since the quantizer characteristic gives $x_i - y_i = \Delta/2$ and $y_i - x_{i-1} = \Delta/2$,

$$D \approx \frac{1}{12}\sum_{i=2}^{M-1} p(x_i)\,\Delta^3$$

But $\sum_{i=2}^{M-1} p(x_i)\,\Delta = 1$, so

$$D \approx \frac{\Delta^2}{12}$$

Source model: if the pdf is $p(x) = \dfrac{1}{2V}$ for $-V \le x \le V$, the input variance is

$$\sigma^2 = \int x^2\, p(x)\, dx = \int_{-V}^{V} x^2\,\frac{1}{2V}\, dx = \frac{V^2}{3}$$


Then

$$\mathrm{SNR} = 10\log_{10}\frac{\sigma^2}{D} = 10\log_{10}\frac{V^2/3}{\Delta^2/12}$$

But $\Delta = 2V/M$, so

$$\mathrm{SNR} = 10\log_{10} M^2 = 20\log_{10} M$$

If $M = 2^n$ (an n-bit quantizer),

$$\mathrm{SNR} = 20\,n\log_{10}2 \approx 6n \ \text{(in dB)}$$

valid only for the PCM quantizer.


B. DPCM Quantizer

The pdf of the input signal to the DPCM quantizer is not at all uniform: a "good" predictor is expected to produce many near-zero differences between the predicted and actual values, so a typical shape for this distribution is highly peaked around zero (e.g., Laplacian).

[Sketch: peaked pdf p(d) of the prediction error.]

⇒ A non-uniform quantizer is required.


[Block diagram: X → compressor C(·) → uniform quantizer Q → expander C⁻¹(·) → Y.]

$$Y = C^{-1}\big(Q(C(x))\big), \qquad \frac{dC(x)}{dx} = \frac{2\,x_{\max}}{M\,\Delta(x)}$$

Non-uniform quantizer = compressor + uniform quantizer + expander.


[Figure: the compressor characteristic C(x), the uniform quantizer, and the expander C⁻¹(x) over [−x_max, x_max], and the resulting non-uniform quantizer characteristic y(x).]


For this model, the mean-square distortion can be approximated as

$$D = \frac{1}{12}\sum_{i=2}^{M-1} p(x_i)\,\Delta_i^3 \;\approx\; \frac{1}{12 M^2}\int_{L_1}^{L_2}\frac{p(x)}{\big[\lambda(x)\big]^2}\, dx$$

where

$$\lambda(x) = \frac{C'(x)}{L_2 - L_1}, \qquad L_2 - L_1 \ \text{is the quantizer range} \ (= 2x_{\max}),$$

and $C'(x)$ is the slope of the nonlinear (compressor) function:

$$\frac{dC(x)}{dx} = \frac{2\,x_{\max}}{M\,\Delta(x)} \quad\Longleftrightarrow\quad \Delta(x) = \frac{2\,x_{\max}}{M\,C'(x)}$$

Lloyd-Max Quantizer (the most popular):

1. Each interval limit should be midway between the neighbouring output levels:

$$x_i = \frac{y_i + y_{i+1}}{2}$$

2. Each output level should be at the centroid of the input probability density function over the interval for that level:

$$\int_{x_{i-1}}^{x_i} (x - y_i)\, p(x)\, dx = 0$$

Logarithmic Quantizer (log PCM)

μ-law (US, Canada, Japan): compressor slope $\dfrac{dC(x)}{dx} \propto \dfrac{1}{1 + K|x|}$,

$$y(x) = \frac{V\,\log\!\big(1 + \mu|x|/V\big)}{\log\big(1 + \mu\big)}\,\operatorname{sgn}(x)$$

A-law (Europe):

$$y(x) = \begin{cases} \dfrac{A\,|x|}{1 + \log A}\,\operatorname{sgn}(x), & 0 \le \dfrac{|x|}{V} \le \dfrac{1}{A} \\[2ex] \dfrac{V\big(1 + \log(A|x|/V)\big)}{1 + \log A}\,\operatorname{sgn}(x), & \dfrac{1}{A} \le \dfrac{|x|}{V} \le 1 \end{cases}$$
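As an illustration, μ-law companding wrapped around a uniform quantizer (a sketch; μ = 255 as used in telephony, while the 8-bit resolution and the Laplacian test input are arbitrary choices):

```python
import numpy as np

def mu_compress(x, V=1.0, mu=255.0):
    return V * np.log1p(mu * np.abs(x) / V) / np.log1p(mu) * np.sign(x)

def mu_expand(y, V=1.0, mu=255.0):
    return V / mu * np.expm1(np.abs(y) / V * np.log1p(mu)) * np.sign(y)

def companded_quantize(x, n_bits=8, V=1.0, mu=255.0):
    """Non-uniform quantizer = compressor + uniform quantizer + expander."""
    step = 2 * V / 2 ** n_bits
    c = mu_compress(x, V, mu)
    q = np.clip((np.floor(c / step) + 0.5) * step, -V + step / 2, V - step / 2)
    return mu_expand(q, V, mu)

rng = np.random.default_rng(3)
x = np.clip(rng.laplace(0.0, 0.05, 100_000), -1.0, 1.0)   # peaked, small-amplitude input
step = 2.0 / 256
uniform = (np.floor(x / step) + 0.5) * step                # plain 8-bit uniform quantizer
companded = companded_quantize(x, n_bits=8)
for name, y in (("uniform 8-bit", uniform), ("mu-law 8-bit ", companded)):
    snr = 10 * np.log10(np.var(x) / np.mean((y - x) ** 2))
    print(name, round(snr, 1), "dB")
```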


If a Laplacian function is used to model p(e), the input pdf of the DPCM quantizer,

$$p(e) = \frac{1}{\sqrt{2}\,\sigma_e}\exp\!\left(-\frac{\sqrt{2}\,|e|}{\sigma_e}\right)$$

then the variance of the quantization error is

$$\sigma_g^2 = \frac{2}{3M^2}\left[\int_0^{V}\left(\frac{1}{\sqrt{2}\,\sigma_e}\right)^{1/3}\exp\!\left(-\frac{\sqrt{2}\,e}{3\,\sigma_e}\right)de\right]^3 \;\longrightarrow\; \frac{9\,\sigma_e^2}{2M^2} \quad\text{as } V \to \infty$$

⇒ the SNR for the non-uniform quantizer in DPCM becomes

$$\mathrm{SNR} = 10\log_{10}\frac{\sigma^2}{\sigma_g^2} = 10\log_{10}\frac{2M^2\sigma^2}{9\,\sigma_e^2}$$

Since $M = 2^n$,

$$\mathrm{SNR} \approx -6.5 + 6n + 10\log_{10}\frac{\sigma^2}{\sigma_e^2}$$

For the same pdf, PCM gives $\mathrm{SNR} \approx -6.5 + 6n$.

⇒ DPCM improves the SNR by $10\log_{10}(\sigma^2/\sigma_e^2)$.


ADPCM:
i.  Adaptive prediction
ii. Adaptive quantization

DPCM for Image Coding:

Each scan line of the image is coded independently by the DPCM techniques. For a very slowly varying image (ρ = 0.95) and a Laplacian-pdf quantizer, an 8 to 10 dB SNR improvement over PCM can be expected; that is, the SNR of 6-bit PCM can be achieved by 4-bit line-by-line DPCM for ρ = 0.97.

Two-Dimensional DPCM: a 2-D predictor, e.g.

$$\bar u(m,n) = a_1 u(m-1,n) + a_2 u(m,n-1) + a_3 u(m-1,n-1) + a_4 u(m-1,n+1)$$


Ⅲ. Transform Domain Coding Techniques

Transform Coding (Block Quantization):
• A block of data is unitarily transformed so that a large fraction of its total energy is packed into relatively few transform coefficients, which are quantized independently.
• The optimum transform coder is defined as the one that minimizes the mean square distortion of the reproduced data for a given total number of bits: the Karhunen-Loeve Transform (KLT).
• The function of the transformation is to make the original samples uncorrelated so that the subsequent operation of quantization may be done more efficiently.


In transform coding systems the total number of bits available for quantizing a block of transformed samples is fixed, and it is necessary to allocate these bits among the transformed samples in such a way as to minimize the overall quantization distortion.


The KLT:

• U : input vector — an N×1 zero-mean random vector with covariance R
• A : N×N matrix, not necessarily unitary
• V : transformed vector whose components v(k) are mutually uncorrelated
• B : N×N matrix
• U* : reconstructed vector

Problem: find the optimum matrices A and B and the optimum quantizers such that the overall average mean square distortion

$$D = \frac{1}{N}\,E\!\left[\sum_{n=1}^{N}\big(u(n) - u^*(n)\big)^2\right] = \frac{1}{N}\,E\big[(u - u^*)^T (u - u^*)\big]$$

is minimized.


Solution:

1. For an arbitrary quantizer, the optimal reconstruction matrix B is given by B = A⁻¹Γ, where Γ is a diagonal matrix whose elements are

$$r_K = \frac{E\big[\tilde v(k)\, v^*(k)\big]}{E\big[\,|\tilde v(k)|^2\,\big]}$$


2. The Lloyd-Max quantizer for each v(k) minimizes the overall mean square error, giving Γ = I (that is, B = A⁻¹).

3. The optimal decorrelating matrix A is the KL transform of U; that is, the rows of A are the orthonormalized eigenvectors of the autocovariance matrix R. This gives B = A⁻¹ = A*ᵀ.


Simplification: assume there are no quantizers.

Image [u(z)]: N lines, N pixels per line.
u(x, y_j): all the N pixels in the j-th line, j = 1, 2, …, N.
[u(z)] = [u(x, y_1), u(x, y_2), …, u(x, y_N)]: the N²-element vector composed of all the pixels taken in the normal raster-scan sequence.

[V(w)] = [A] [U(z)] : (transformed pixels) = (N²×N² transform matrix) × (image vector).


$$v(w_k) = \sum_{i=1}^{N^2} u(z_i)\, A_{ik}, \qquad k = 1, 2, \ldots, N^2$$

$$[u(z)] = [A]^{-1}[v(w)], \qquad [A]^{-1} = [A]^t$$

$$u(z_i) = \sum_{k=1}^{N^2} v(w_k)\, A_{ik}, \qquad i = 1, 2, \ldots, N^2$$

Target: the coefficients v(w_k) are uncorrelated.

A is a matrix whose columns are the normalized eigenvectors of the covariance matrix of the original pixels.


The covariance matrix of u(z):

$$C_u = E\Big[\big(u(z) - E[u(z)]\big)\big(u(z) - E[u(z)]\big)^t\Big]$$

Assume $E[u(z)] = 0$ and set $u(z_i) = u_i$:

$$C_u = \begin{bmatrix}
E[u_1^2] & E[u_1 u_2] & \cdots & E[u_1 u_{N^2}] \\
E[u_2 u_1] & E[u_2^2] & \cdots & E[u_2 u_{N^2}] \\
\vdots & \vdots & \ddots & \vdots \\
E[u_{N^2} u_1] & E[u_{N^2} u_2] & \cdots & E[u_{N^2}^2]
\end{bmatrix}$$


Let φ denote the eigenvectors of C_u:

$$C_u \varphi = \lambda \varphi \;\Rightarrow\; \det\big[C_u - \lambda I\big] = 0$$

Arrange the λ's in decreasing order, λ₁ ≥ λ₂ ≥ … ≥ λ_{N²}, and substitute each into (C_u − λI)φ = 0 to solve for φ.

When the matrix [A] (whose columns are the φ's) is applied to [u(z)], the covariance of the resulting coefficients v(w_k) is a diagonal matrix with diagonal elements λ₁, λ₂, …, λ_{N²}, so the v(w_k) are uncorrelated. That is,

$$C_v = [A]^T C_u [A] = \begin{bmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_{N^2} \end{bmatrix}$$

⇒ The KLT decorrelates the original input.
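A small numerical check of this property (a sketch; the 4-sample blocks and the first-order Markov model of the pixel data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Correlated data: many blocks of N = 4 samples from a first-order Markov process.
N, blocks = 4, 20_000
u = np.zeros((blocks, N))
u[:, 0] = rng.standard_normal(blocks)
for k in range(1, N):
    u[:, k] = 0.9 * u[:, k - 1] + np.sqrt(1 - 0.9 ** 2) * rng.standard_normal(blocks)

Cu = np.cov(u, rowvar=False)            # sample covariance of the input
lam, A = np.linalg.eigh(Cu)             # columns of A = eigenvectors of Cu
order = np.argsort(lam)[::-1]           # arrange eigenvalues in decreasing order
lam, A = lam[order], A[:, order]

v = u @ A                               # v(w_k) = sum_i u(z_i) A_ik
Cv = np.cov(v, rowvar=False)            # should be ~diagonal with entries lam_k
print("eigenvalues        :", np.round(lam, 3))
print("off-diag energy(Cv):", round(float(np.sum(Cv ** 2) - np.sum(np.diag(Cv) ** 2)), 6))
```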


Remarks:
1. The KLT is input-data dependent; for an N×N image, the eigenvectors of an N²×N² matrix have to be found.
2. Given a block of N samples, the KLT packs the maximum amount of variance into the first k coefficients (compared to any other transform), where k