Part II Image Data Compression Prof. Ja-Ling Wu
Department of Computer Science and Information Engineering National Taiwan University
Contents
I. Introduction
II. Predictive Techniques
III. Transform Domain Coding Techniques
IV. Image Coding in Visual Telephony
V. Coding of Two-Tone Images
VI. References
I. Introduction
Image data compression is concerned with minimizing the number of bits required to represent an image. Applications of data compression lie primarily in the "transmission" and "storage" of information. Data compression is also applied in the development of "fast algorithms", where the number of operations required to implement an algorithm is reduced by working with the compressed data.
Image data compression techniques

Pixel coding:
• PCM / quantization
• Run-length coding
• Bit-plane coding

Predictive coding:
• Delta modulation
• Line-by-line DPCM
• 2-D DPCM
• Interframe techniques
• Adaptive

Transform coding:
• Zonal coding
• Threshold coding
• Multi-dimensional techniques
• Adaptive

Others:
• Hybrid coding
• Vector quantization
Image data compression methods fall into two common categories:
A. Redundancy coding:
– Redundancy reduction
– Information lossless
– Example: predictive coding (DM, DPCM)
B. Entropy coding:
– Entropy reduction
– Inevitably results in some distortion
– Example: transform coding
For digitized data, "distortionless compression" techniques are possible.
Some methods for entropy reduction:
• Subsampling: reduce the sampling rate
• Coarse quantization: reduce the number of quantization levels
• Frame repetition / interlacing: reduce the refresh rate (number of frames per second), e.g., for TV signals
II. Predictive Techniques
Basic principle: remove the mutual redundancy between successive pixels and encode only the new information.
DPCM: Consider a sampled sequence u(m), coded up to m = n-1. Let ũ(n-1), ũ(n-2), ... be the values of the reproduced (decoded) sequence.
At m = n, when u(n) arrives, a quantity ū(n), an estimate of u(n), is predicted from the previously decoded samples ũ(n-1), ũ(n-2), ..., i.e.,
ū(n) = Ψ(ũ(n-1), ũ(n-2), ...),   Ψ: the "prediction rule"
Prediction error: e(n) = u(n) - ū(n)
If ẽ(n) is the quantized value of e(n), then the reproduced value of u(n) is:
ũ(n) = ū(n) + ẽ(n)
[Figure: DPCM codec. Coder: e(n) = u(n) - ū(n) is quantized to ẽ(n) and sent over the communication channel; a predictor with delay forms ū(n) from ũ(n) = ū(n) + ẽ(n). Decoder (reconstruction filter): the received ẽ(n) is added to the output of an identical predictor with delay to give ũ(n).]
Note:
u(n) = ū(n) + e(n)
u(n) - ũ(n) = [ū(n) + e(n)] - [ū(n) + ẽ(n)] = e(n) - ẽ(n) = q(n)
q(n): the quantization error in e(n)
Remarks:
1. The pointwise coding error in the input sequence is exactly equal to q(n), the quantization error in e(n).
2. With a reasonable predictor, the mean square value of the differential signal e(n) is much smaller than that of u(n).
Conclusion: For the same mean square quantization error, e(n) requires fewer quantization bits than u(n). The number of bits required for transmission has been reduced while the quantization error is kept the same.
Feedback Versus Feedforward Prediction
An important aspect of DPCM is that the prediction is based on the output (the quantized samples) rather than the input (the unquantized samples). This places the predictor in a "feedback loop" around the quantizer, so that the quantizer error at a given step is fed back to the quantizer input at the next step. This has a "stabilizing effect" that prevents DC drift and accumulation of error in the reconstructed signal ũ(n).
If the prediction rule is based on the past inputs, the signal reconstruction error depends on all the past and present quantization errors in the feedforward prediction-error sequence ε(n). In general, the MSE of feedforward reconstruction is greater than that of DPCM.
[Figure: feedforward coding. Coder: ε(n) = u(n) minus a prediction formed from past inputs, then quantized and entropy coded. Decoder: the predictor runs on the reconstructed samples ũ(n), so quantization errors accumulate.]
Example: The sequence 100, 102, 120, 120, 120, 118, 116, ... is to be predictively coded using the prediction rule:
ū(n) = ũ(n-1)   for DPCM
ū(n) = u(n-1)   for the feedforward predictive coder.
Assume the 2-bit quantizer shown below is used.
[Figure: 2-bit quantizer characteristic with output levels -5, -1, +1, +5 over the input range -6 to +6.]
The first sample is quantized separately by a 7-bit uniform quantizer, giving ũ(0) = u(0) = 100.
n | u(n) | DPCM: ū(n), e(n), ẽ(n), ũ(n), u(n)-ũ(n) | Feedforward: u(n-1), ε(n), ε̃(n), ũ(n), u(n)-ũ(n)
0 | 100 | —, —, —, 100, 0 | —, —, —, 100, 0
1 | 102 | 100, 2, 1, 101, 1 | 100, 2, 1, 101, 1
2 | 120 | 101, 19, 5, 106, 14 | 102, 18, 5, 106, 14
3 | 120 | 106, 14, 5, 111, 9 | 120, 0, -1, 105, 15
4 | 120 | 111, 9, 5, 116, 4 | 120, 0, -1, 104, 16
5 | 118 | 116, 2, 1, 117, 1 | 120, -2, -5, 99, 19
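The table can be reproduced with a short script. The following Python sketch is illustrative only; the quantizer mapping (e <= -2 gives -5, -2 < e <= 0 gives -1, 0 < e <= 2 gives +1, e > 2 gives +5) is an assumption read off the 2-bit quantizer figure, chosen because it reproduces the tabulated values.

# Sketch: DPCM vs. feedforward predictive coding of the example sequence.
# The 2-bit quantizer mapping below is an assumption inferred from the figure.

def quantize(e):
    """2-bit quantizer with output levels -5, -1, +1, +5 (assumed thresholds)."""
    if e > 2:
        return 5
    if e > 0:
        return 1
    if e > -2:
        return -1
    return -5

u = [100, 102, 120, 120, 120, 118]           # input samples u(0)..u(5)

# DPCM: predict from the previously *decoded* sample (feedback prediction).
rec_dpcm = [u[0]]                            # u~(0) = u(0) = 100 (7-bit PCM)
for n in range(1, len(u)):
    pred = rec_dpcm[n - 1]                   # u-bar(n) = u~(n-1)
    e_hat = quantize(u[n] - pred)            # e~(n)
    rec_dpcm.append(pred + e_hat)            # u~(n) = u-bar(n) + e~(n)

# Feedforward: predict from the previous *unquantized* input sample.
rec_ff = [u[0]]
for n in range(1, len(u)):
    eps_hat = quantize(u[n] - u[n - 1])      # eps~(n); prediction uses u(n-1)
    rec_ff.append(rec_ff[n - 1] + eps_hat)   # decoder only has reconstructed values

for n in range(len(u)):
    print(n, u[n], rec_dpcm[n], u[n] - rec_dpcm[n], rec_ff[n], u[n] - rec_ff[n])

The reconstruction errors stay bounded for DPCM (at most 14 during the transient, then 1) but grow to 19 for the feedforward coder, illustrating the error accumulation discussed above.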
Delta Modulation (DM):
Predictor: one-step delay function
Quantizer: 1-bit quantizer
ū(n) = ũ(n-1)
e(n) = u(n) - ũ(n-1)
[Figure: DM coder and decoder. Coder: e(n) = u(n) - ũ(n-1) is passed through a 1-bit quantizer to give ẽ(n); ũ(n) is formed by adding ẽ(n) to the unit-delayed ũ(n-1). Decoder: an integrator (accumulator with unit delay) sums the received ẽ(n) to form ũ(n).]
[Figure: DM waveform. The staircase reconstruction ũ(n) tracks u(n), showing granularity in nearly constant regions and slope overload where u(n) changes rapidly.]
Primary limitations of DM:
1) Slope overload: occurs in regions with large jumps; the maximum slope that can be tracked is (step size) × (sampling frequency).
2) Granularity noise: occurs in almost constant regions.
3) Instability to channel noise.
Step-size effect: increasing the step size (or the sampling frequency) reduces slope overload but increases granular noise, as the sketch below illustrates.
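A minimal sketch of this trade-off, assuming a 1-bit DM coder with a fixed step size applied to a synthetic ramp-plus-flat test signal (the signal and all names are illustrative, not from the slides):

import numpy as np

def dm_encode_decode(u, step):
    """Fixed-step delta modulation: 1-bit quantizer + accumulator (integrator)."""
    rec = np.zeros_like(u, dtype=float)
    rec[0] = u[0]                                    # assume the first sample is sent directly
    for n in range(1, len(u)):
        bit = 1.0 if u[n] >= rec[n - 1] else -1.0    # sgn(e(n)), the 1-bit code
        rec[n] = rec[n - 1] + step * bit             # staircase reconstruction
    return rec

# Test signal: a steep ramp (slope overload) followed by a constant region (granularity).
u = np.concatenate([np.linspace(0.0, 50.0, 25), np.full(75, 50.0)])

for step in (1.0, 4.0):
    rec = dm_encode_decode(u, step)
    ramp_err = np.abs(u[:25] - rec[:25]).max()       # error in the ramp region (slope overload)
    flat_err = np.abs(u[-20:] - rec[-20:]).max()     # steady-state error in the flat region (granular noise)
    print(f"step={step}: max ramp error={ramp_err:.1f}, max flat error={flat_err:.1f}")

A small step gives a large ramp error but a small steady-state error; a large step does the opposite.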
Adaptive Delta Modulation
[Figure: adaptive delta modulator. The 1-bit quantizer output E(k+1) = ±1 drives an adaptation function that computes the step size Δ(k+1) from Δ(k), E(k+1), E(k), and a stored minimum step Δmin; an accumulator with unit delay forms X(k+1).]
E(k+1) = sgn[S(k+1) - X(k)]
Δ(k+1) = Δ(k)·|E(k+1) + ½E(k)|   if this value ≥ Δmin
Δ(k+1) = Δmin                    otherwise
X(k+1) = X(k) + Δ(k+1)·E(k+1)
This adaptive approach simultaneously minimizes the effects of both slope overload and granular noise.
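A minimal sketch of this adaptation rule (variable names, the initial step size, and the test signal are illustrative assumptions):

import numpy as np

def adm_encode_decode(s, delta_min=0.1, delta0=1.0):
    """Adaptive delta modulation with the step-size rule
       delta(k+1) = delta(k) * |E(k+1) + 0.5*E(k)|, floored at delta_min."""
    x = np.zeros_like(s, dtype=float)                # reconstructed signal X(k)
    x[0] = s[0]
    delta, e_prev = delta0, 1.0
    for k in range(len(s) - 1):
        e = 1.0 if s[k + 1] >= x[k] else -1.0        # E(k+1) = sgn(S(k+1) - X(k))
        delta = max(delta * abs(e + 0.5 * e_prev), delta_min)
        x[k + 1] = x[k] + delta * e                  # X(k+1) = X(k) + delta(k+1)*E(k+1)
        e_prev = e
    return x

# Repeated bits of the same sign (slope overload) grow the step by 1.5x per sample;
# alternating bits (granular region) shrink it by 0.5x, down to delta_min.
t = np.arange(100)
s = 50.0 / (1.0 + np.exp(-(t - 50) / 5.0))           # smooth step from 0 to 50
x = adm_encode_decode(s)
print("max |error| =", np.abs(s - x).max())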
DPCM Design
There are two components to design in a DPCM system:
i. The predictor
ii. The quantizer
Ideally, the predictor and quantizer would be optimized jointly using a linear or nonlinear technique. In practice, a suboptimal design approach is adopted:
i. A linear predictor
ii. A zero-memory quantizer
Remark: For this approach the number of quantization levels, M, must be relatively large (M ≥ 8) to achieve good performance.
Design of the linear predictor
Ŝ0 = a1·S1 + a2·S2 + ... + an·Sn,   e = S0 - Ŝ0
Choose the coefficients ai to minimize the mean square prediction error:
∂/∂ai E[(S0 - Ŝ0)²] = ∂/∂ai E[(S0 - a1S1 - a2S2 - ... - anSn)²] = 0
⇒ E[(S0 - a1S1 - a2S2 - ... - anSn)·Si] = 0,   i = 1, 2, ..., n
⇒ E[(S0 - Ŝ0)·Si] = 0,   i = 1, 2, ..., n
Define Rij = E[Si·Sj]. Then E[S0·Si] = E[Ŝ0·Si], i.e.,
R0i = E[(a1S1 + a2S2 + ... + anSn)·Si] = a1R1i + a2R2i + ... + anRni,   i = 1, 2, ..., n
These n normal equations determine the coefficients; in matrix form,
[a1, a2, ..., an]ᵀ = [Rij]⁻¹·[R01, R02, ..., R0n]ᵀ
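A small sketch of this solution, assuming a first-order Markov (AR(1)) source with correlation ρ so that Rij = ρ^|i-j| (a modeling assumption, not from the slides); the normal equations are solved with numpy:

import numpy as np

def optimal_predictor(rho, n):
    """Solve the normal equations a = R^{-1} r for an AR(1) model R_ij = rho^|i-j|."""
    lags = np.arange(1, n + 1)
    R = rho ** np.abs(np.subtract.outer(lags, lags))   # R_ij = E[S_i S_j] / sigma^2
    r = rho ** lags                                    # R_0i = E[S_0 S_i] / sigma^2
    a = np.linalg.solve(R, r)                          # predictor coefficients a_i
    var_e = 1.0 - a @ r                                # sigma_e^2 / sigma^2 = R00 - sum a_i R_0i (R00 normalized to 1)
    return a, var_e

a, var_e = optimal_predictor(rho=0.95, n=3)
print("a =", np.round(a, 4))                           # for an AR(1) source only a_1 = rho is nonzero
print("error variance / signal variance =", round(var_e, 4))   # = 1 - rho^2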
When Ŝ0 is formed with these optimal coefficients ai, the mean square error is:
σe² = E[(S0 - Ŝ0)²] = E[(S0 - Ŝ0)·S0] - E[(S0 - Ŝ0)·Ŝ0]
But E[(S0 - Ŝ0)·Ŝ0] = 0 (orthogonality principle), so
σe² = E[(S0 - Ŝ0)·S0] = E[S0²] - E[Ŝ0·S0] = R00 - (a1R01 + a2R02 + ... + anR0n)
σe²: the variance of the difference signal
R00: the variance of the original signal
The variance of the error signal is less than the variance of the original signal.
Remarks: 1. The complexity of the predictor depends on “n”. 2. “n” depends on the covariance properties of the original signal.
Design of the DPCM Quantizer
A. Review of the uniform quantizer
Quantization approaches:
1. Zero-memory quantization
2. Block quantization
3. Sequential quantization
[Figure: input-output characteristics of the midtread and midriser quantizers.]
Quantization error: q = y(x) - x
Average distortion: D = ∫ (y(x) - x)² p(x) dx
SNR = 10·log10(σ²/D) in dB, where σ² is the variance of the input x.
Uniform quantizer: p(x) is assumed constant within each quantization interval.
[Figure: uniform quantizer characteristic with lower overload region, granular region (decision levels x1, ..., x6), and upper overload region; quantization error of the midtread quantizer.]
[Figure: uniform midtread quantizer, M = 9 (decision levels x1, ..., x9 and output levels y1, ..., y8 shown).]
The output level yi lies at the midpoint of the input interval Δ = xi - xi-1. Assume p(x) is constant over the interval Δ = xi - xi-1 and equal to p(xi).
Lower overload region: Δ = x1 - x0, with x1 >> x0
Granular region: Δ = xi - xi-1, 2 ≤ i ≤ M-1
Upper overload region: Δ = xM - xM-1, with xM >> xM-1
D = Σ_{i=1}^{M} ∫_{xi-1}^{xi} (yi - x)² p(xi) dx ≈ Σ_{i=2}^{M-1} p(xi)·[-(yi - x)³/3]_{xi-1}^{xi}
where we assume the contribution of the overload regions is negligible, i.e., p(x1) = p(xM) = 0.
Since yi - xi-1 = Δ/2 = xi - yi (from the quantizer characteristic),
D ≈ (1/12)·Σ_{i=2}^{M-1} p(xi)·Δ³
But Σ_{i=2}^{M-1} p(xi)·Δ = 1, so
D ≈ Δ²/12
(Source model) If the pdf is p(x) = 1/(2V) for -V ≤ x ≤ V, the input variance is
σ² = ∫_{-V}^{V} x² p(x) dx = ∫_{-V}^{V} x²·(1/2V) dx = V²/3
Then
SNR = 10·log10(σ²/D) = 10·log10( (V²/3)·(12/Δ²) )
But Δ = 2V/M, so
SNR = 10·log10(M²) = 20·log10 M
If M = 2ⁿ (an n-bit quantizer):
SNR = 20·n·log10 2 ≈ 6n dB (valid only for the PCM quantizer)
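A quick numerical check of the 6 dB-per-bit rule, assuming a uniform input over (-V, V) and an M = 2ⁿ level uniform quantizer (a sketch, not part of the original slides):

import numpy as np

rng = np.random.default_rng(0)
V = 1.0
x = rng.uniform(-V, V, 1_000_000)             # uniform source, variance V^2/3

for n in (4, 6, 8):
    M = 2 ** n
    delta = 2 * V / M                         # step size of the uniform quantizer
    y = (np.floor(x / delta) + 0.5) * delta   # midrise uniform quantizer
    snr = 10 * np.log10(x.var() / np.mean((x - y) ** 2))
    print(f"n = {n}: SNR = {snr:.2f} dB (6n = {6 * n})")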
B. DPCM Quantizer
The pdf of the input signal to the DPCM quantizer is not at all uniform, since a "good" predictor is expected to produce many near-zero differences between the predicted and actual values. A typical shape for this distribution is highly peaked around zero (e.g., Laplacian).
[Figure: pdf p(d) of the prediction error, sharply peaked at zero.]
⇒ A non-uniform quantizer is required.
[Figure: a non-uniform quantizer realized as compressor + uniform quantizer + expander:
x → compressor C → C(x) → uniform quantizer Q → Q(C(x)) → expander C⁻¹ → y = C⁻¹(Q(C(x)))]
The compressor slope satisfies dC(x)/dx = 2·xmax / (M·Δ(x)).
Non-uniform quantizer = compressor + uniform quantizer + expander.
[Figure: characteristics of the compressor C(x), the uniform quantizer, the expander C⁻¹(x), and the resulting non-uniform quantizer, each plotted over (-xmax, xmax).]
For this model, the mean-square distortion can be approximately represented as:
D ≈ (1/12)·Σ_{i=2}^{M-1} p(xi)·Δ³(xi) ≈ (1/(12M²))·∫_{L1}^{L2} [ (L2 - L1)/C′(x) ]² p(x) dx
where
Δ(x) = (L2 - L1) / (M·C′(x)),   L2 - L1 is the quantizer range (= 2·xmax),
C′(x) is the slope of the nonlinear function: dC(x)/dx = 2·xmax / (M·Δ(x)), i.e., Δ(x) = 2·xmax / (M·C′(x)).
Lloyd-Max quantizer: the most popular one.
1. Each interval limit should be midway between the neighboring output levels:
xi = (yi + yi+1) / 2
2. Each output level should be at the centroid of the input probability density function over the interval for that level, that is:
∫_{xi-1}^{xi} (x - yi) p(x) dx = 0
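The two conditions suggest the usual fixed-point (Lloyd) iteration: alternately place the decision levels midway between output levels and move each output level to the centroid of its interval. The following sketch does this for a Laplacian source using samples instead of the analytic pdf (an illustrative assumption, not the slides' derivation):

import numpy as np

def lloyd_max(samples, M, iters=100):
    """Iteratively enforce the two Lloyd-Max conditions on empirical data."""
    y = np.quantile(samples, (np.arange(M) + 0.5) / M)   # initial output levels
    for _ in range(iters):
        x = 0.5 * (y[:-1] + y[1:])                       # condition 1: midpoints
        idx = np.digitize(samples, x)                    # assign samples to intervals
        for i in range(M):                               # condition 2: centroids
            cell = samples[idx == i]
            if cell.size:
                y[i] = cell.mean()
    return x, y

rng = np.random.default_rng(0)
data = rng.laplace(scale=1.0, size=200_000)              # peaked, DPCM-like pdf
x, y = lloyd_max(data, M=4)
print("decision levels:", np.round(x, 3))
print("output levels  :", np.round(y, 3))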
Logarithmic quantizer (log PCM):
μ-law (US, Canada, Japan): the compressor slope is dC(x)/dx ∝ 1/(1 + Kx), giving
y(x) = V·log(1 + μ|x|/V) / log(1 + μ) · sgn(x)
A-law (Europe):
y(x) = A|x| / (1 + log A) · sgn(x),                   0 ≤ |x| ≤ V/A
y(x) = V·[1 + log(A|x|/V)] / (1 + log A) · sgn(x),    V/A ≤ |x| ≤ V
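A small sketch of μ-law companding around a uniform quantizer (the μ = 255, M = 256, and the Laplacian test source are illustrative assumptions, not from the slides):

import numpy as np

def mu_law_quantize(x, V=1.0, mu=255.0, M=256):
    """Compressor -> uniform quantizer -> expander (non-uniform quantizer)."""
    c = V * np.log1p(mu * np.abs(x) / V) / np.log1p(mu) * np.sign(x)         # C(x)
    delta = 2 * V / M
    cq = (np.floor(c / delta) + 0.5) * delta                                 # uniform quantizer
    return V * (np.expm1(np.abs(cq) * np.log1p(mu) / V) / mu) * np.sign(cq)  # C^{-1}(.)

rng = np.random.default_rng(0)
x = np.clip(rng.laplace(scale=0.05, size=500_000), -1.0, 1.0)   # peaked, small-amplitude source
for name in ("uniform", "mu-law"):
    if name == "uniform":
        y = (np.floor(x / (2 / 256)) + 0.5) * (2 / 256)
    else:
        y = mu_law_quantize(x)
    snr = 10 * np.log10(x.var() / np.mean((x - y) ** 2))
    print(f"{name}: SNR = {snr:.1f} dB")

For a peaked source concentrated near zero, the companded quantizer gives a markedly higher SNR than the plain uniform quantizer with the same number of levels.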
If a Laplacian function is used to model p(e), the pdf of the input to the DPCM quantizer:
p(e) = (1/(√2·σe)) · exp(-√2·|e| / σe)
then the variance of the quantization error is:
σg² = (2/(3M²)) · [ ∫_0^V (1/(√2·σe))^(1/3) · exp(-√2·e/(3σe)) de ]³ → 9σe²/(2M²)   as V → ∞
and the SNR for the non-uniform quantizer in DPCM becomes:
SNR = 10·log10(σ²/σg²) = 10·log10( 2M²σ² / (9σe²) )
Since M = 2ⁿ:
SNR ≈ 6n - 6.5 + 10·log10(σ²/σe²)
For the same pdf, PCM gives: SNR ≈ 6n - 6.5
DPCM improves the SNR by 10·log10(σ²/σe²).
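To put a number on this gain, assume each scan line is modeled as a first-order Markov (AR(1)) process with inter-pixel correlation ρ, so that the optimal one-tap predictor gives σe² = σ²(1 - ρ²) (a standard modeling assumption, not derived on these slides):

import math

def dpcm_gain_db(rho):
    """SNR improvement of DPCM over PCM, 10*log10(sigma^2 / sigma_e^2), for an AR(1) source."""
    return 10 * math.log10(1.0 / (1.0 - rho ** 2))

for rho in (0.90, 0.95, 0.97):
    print(f"rho = {rho}: DPCM gain = {dpcm_gain_db(rho):.1f} dB "
          f"(~{dpcm_gain_db(rho) / 6:.1f} bits saved)")

For ρ = 0.95 this gives about 10 dB, consistent with the 8 to 10 dB figure quoted just below, and for ρ = 0.97 the roughly 12 dB gain corresponds to the 2-bit saving (6-bit PCM versus 4-bit DPCM).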
ADPCM:
i. Adaptive prediction
ii. Adaptive quantization

DPCM for image coding:
Each scan line of the image is coded independently by the DPCM technique. For a very slowly time-varying image (ρ = 0.95) and a Laplacian-pdf quantizer, an 8 to 10 dB SNR improvement over PCM can be expected; that is, the SNR of 6-bit PCM can be achieved by 4-bit line-by-line DPCM for ρ = 0.97.

Two-dimensional DPCM: a two-dimensional predictor is used, e.g.,
ū(m, n) = a1·u(m-1, n) + a2·u(m, n-1) + a3·u(m-1, n-1) + a4·u(m-1, n+1)
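A sketch of 2-D DPCM with this four-neighbor predictor; the fixed coefficients, the uniform error quantizer, the boundary value of 128, and the synthetic test image are illustrative assumptions:

import numpy as np

def dpcm2d_encode(img, a=(0.75, 0.75, -0.5, 0.0), q_step=4):
    """2-D DPCM: predict each pixel from decoded N, W, NW, NE neighbors,
       quantize the prediction error with a uniform quantizer of step q_step."""
    h, w = img.shape
    rec = np.zeros((h, w))                   # decoded image (what the receiver sees)
    codes = np.zeros((h, w), dtype=int)      # transmitted quantizer indices
    a1, a2, a3, a4 = a
    for m in range(h):
        for n in range(w):
            north = rec[m - 1, n]     if m > 0 else 128.0
            west  = rec[m, n - 1]     if n > 0 else 128.0
            nw    = rec[m - 1, n - 1] if m > 0 and n > 0 else 128.0
            ne    = rec[m - 1, n + 1] if m > 0 and n < w - 1 else 128.0
            pred = a1 * north + a2 * west + a3 * nw + a4 * ne
            codes[m, n] = int(round((img[m, n] - pred) / q_step))   # quantized e(m,n)
            rec[m, n] = pred + codes[m, n] * q_step                 # u~(m,n)
    return codes, rec

img = np.clip(128 + 30 * np.random.default_rng(0).standard_normal((64, 64)).cumsum(1), 0, 255)
codes, rec = dpcm2d_encode(img)
print("max reconstruction error:", np.abs(img - rec).max())   # bounded by q_step/2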
III. Transform Domain Coding Techniques
Transform coding (block quantization): A block of data is unitarily transformed so that a large fraction of its total energy is packed into relatively few transform coefficients, which are quantized independently.
The optimum transform coder is defined as the one that minimizes the mean square distortion of the reproduced data for a given number of total bits: the Karhunen-Loeve transform (KLT).
The function of the transformation is to decorrelate the original samples so that the subsequent operation of quantization may be done more efficiently.
In transform coding systems the total number of bits available for quantizing a block of transformed samples is fixed, and it is necessary to allocate these bits to the quantized transformed samples in such a way as to “minimize the overall quantization distortion”.
The KLT:
u: input vector, an N×1 random vector with zero mean and covariance R
A: N×N matrix, not necessarily unitary
v: transformed vector; its components v(k) are mutually uncorrelated
B: N×N matrix
u′: reconstructed vector
Problem: Find the optimum matrices A and B and the optimum quantizers such that the overall average mean square distortion
D = (1/N)·Σ_{n=1}^{N} E[(u(n) - u′(n))²] = (1/N)·E[(u - u′)ᵀ(u - u′)]
is minimized.
Solution:
1. For an arbitrary quantizer, the optimal reconstruction matrix B is given by
B = A⁻¹·Γ
where Γ is a diagonal matrix of elements rk defined as
rk = λ̃k / λk,   λ̃k = E[ṽ(k)·v*(k)],   λk = E[|v(k)|²]
2. The Lloyd-Max quantizer for each v(k) minimizes the overall mean square error, giving Γ = I (that is, B = A⁻¹).
3. The optimal decorrelating matrix A is the KL transform of u; that is, the rows of A are the orthonormalized eigenvectors of the autocovariance matrix R. This gives B = A⁻¹ = A*ᵀ.
Simplification: Assume there are no quantizers.
Image [u(z)]: N lines, N pixels per line.
u(x, yj): all the N pixels in the jth line, j = 1, 2, ..., N
[u(z)] = [u(x, y1), u(x, y2), ..., u(x, yN)]: the N²-element vector composed of all the pixels taken in the normal raster-scan sequence.
[v(w)] = [A]·[u(z)]
where [v(w)] is the transformed N²-element vector, [A] is the N²×N² transform matrix, and [u(z)] is the image vector.
v(wk) = Σ_{i=1}^{N²} u(zi)·Aik,   k = 1, 2, ..., N²
[u(z)] = [A]⁻¹·[v(w)], with [A]⁻¹ = [A]ᵀ:
u(zi) = Σ_{k=1}^{N²} v(wk)·Aik,   i = 1, 2, ..., N²
Target: the v(wk) are uncorrelated.
A is a matrix whose columns are the normalized eigenvectors of the covariance matrix of the original pixels.
The covariance matrix of u(z):
Cu = E[ (u(z) - E[u(z)])·(u(z) - E[u(z)])ᵀ ]
Assume E[u(z)] = 0 and write u(zi) = ui. Then
Cu = [ E[u1²]         E[u1·u2]       ...  E[u1·u_{N²}]
       E[u2·u1]       E[u2²]         ...  E[u2·u_{N²}]
       ...
       E[u_{N²}·u1]   E[u_{N²}·u2]   ...  E[u_{N²}²]   ]
Let φ denote the eigenvectors of Cu: Cu·φ = λ·φ, det[Cu - λI] = 0.
Arrange the λ's in decreasing order such that λ1 ≥ λ2 ≥ ... ≥ λ_{N²}, and substitute each λ into (Cu - λI)·φ = 0 to solve for φ.
When the matrix [A] (whose columns are the φ's) is applied to [u(z)], the covariance of the resulting coefficients v(wk) is a diagonal matrix with diagonal elements λ1, λ2, ..., λ_{N²}; the v(wk) are uncorrelated. That is,
Cv = [A]·Cu·[A]ᵀ = diag(λ1, λ2, ..., λ_{N²})
The KLT decorrelates the original input.
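A small sketch of the KLT on 8×8 blocks of a synthetic correlated image; the block size, the AR(1)-style test image, and the use of numpy's eigendecomposition are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated image: each row is a first-order Markov process, rho = 0.95.
rho, h, w = 0.95, 256, 256
img = np.zeros((h, w))
img[:, 0] = rng.standard_normal(h)
for n in range(1, w):
    img[:, n] = rho * img[:, n - 1] + np.sqrt(1 - rho ** 2) * rng.standard_normal(h)

# Collect 8x8 blocks as vectors (so the block dimension is 64) and estimate their covariance Cu.
B = 8
blocks = np.array([img[i:i + B, j:j + B].ravel()
                   for i in range(0, h, B) for j in range(0, w, B)])
blocks -= blocks.mean(axis=0)                   # enforce E[u(z)] = 0
Cu = blocks.T @ blocks / len(blocks)

# KLT basis: eigenvectors of Cu, ordered by decreasing eigenvalue.
lam, A = np.linalg.eigh(Cu)                     # columns of A are eigenvectors
order = np.argsort(lam)[::-1]
lam, A = lam[order], A[:, order]

v = blocks @ A                                  # transform coefficients v(w)
Cv = v.T @ v / len(v)                           # should be ~diag(lam): decorrelated
off_diag = np.abs(Cv - np.diag(np.diag(Cv))).max()
print("largest off-diagonal element of Cv:", round(off_diag, 6))
print("energy in first 8 of 64 coefficients:", round(lam[:8].sum() / lam.sum(), 3))

The off-diagonal elements of Cv vanish (decorrelation) and most of the energy is packed into the first few coefficients, which is exactly what the subsequent quantization exploits.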
Remarks:
1. The KLT is input-data dependent; for an N×N image, the eigenvectors of an N²×N² matrix have to be found.
2. Given a block of N samples, the KLT packs the maximum amount of variance into the first k coefficients (compared to any other transform), where k