Adaptive Transform Coding of Speech Signals by Richard James Pinnell
B. Eng (McGill)
McGill University Montreal
Canada
May 1982
ACKNOWLEDGEMENTS
I would like to thank my thesis supervisor, Dr. P. Kabal, for his valuable encouragement and guidance in both the experimental aspect of this work and in the preparation of this thesis.
TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION

CHAPTER 2 THE THEORY OF TRANSFORM CODING
2.1 BASIS TRANSFORM CODING
2.2 QUANTIZATION STRATEGY
2.3 OPTIMAL BIT ASSIGNMENT
2.4 THE KARHUNEN-LOEVE TRANSFORM
2.5 SUB-OPTIMAL TRANSFORMS
2.6 THE DISCRETE COSINE TRANSFORM

CHAPTER 3 ADAPTIVE TRANSFORM CODING
3.1 LOG-LINEAR SMOOTHING TECHNIQUE
3.2 ALL-POLE MODEL
3.3 HOMOMORPHIC MODEL

CHAPTER 4 CODER EVALUATION
SIMULATION PROCEDURE
THE LPC CODER
Coder Operation
Reducing Transform Complexity
Side Information Interpolation
Side Information Parameter Statistics And Quantization
The Low-Pass Effect
Frame Boundary Discontinuities
Transform Coefficient Statistics
Subjective Effect Of Pre-emphasis And Spectral Shaping
THE HOMOMORPHIC CODER
Coder Operation
Coder Performance

CHAPTER 5 CONCLUSIONS

LIST OF FIGURES

TRANSFORM CODING STRUCTURE
PRE-SCALING EFFECT
SUB-OPTIMAL TRANSFORM PERFORMANCE
BLOCK BOUNDARY DISTORTION
GENERAL STRUCTURE OF ADAPTIVE TRANSFORM CODING
LOG-LINEAR SMOOTHING
LPC PITCH MODEL
HOMOMORPHIC SIDE INFORMATION PROCESSING
LPC ADAPTIVE TRANSFORM CODER
HOMOMORPHIC ADAPTIVE TRANSFORM CODER STRUCTURE
LPC ADAPTIVE TRANSFORM CODER WAVEFORMS
CODER SNR PERFORMANCE
CODER SNR PERFORMANCE
SIDE INFORMATION INTERPOLATION
REFLECTION COEFFICIENT HISTOGRAMS
AVERAGE ENERGY PARAMETER
CODER SNR PERFORMANCE
CODER SNR PERFORMANCE
ANALYSIS FRAME WINDOWING
FRAME BOUNDARY DISCONTINUITY
TRANSFORM COEFFICIENT HISTOGRAM
VISIBLE BIT ASSIGNMENT
TRANSFORM COEFFICIENT HISTOGRAMS
TRANSFORM COEFFICIENT QUANTIZER PERFORMANCE
HOMOMORPHIC ADAPTIVE TRANSFORM CODER WAVEFORMS
CODER SNR PERFORMANCE
CODER SNR PERFORMANCE
ABSTRACT

Frequency domain coding techniques have recently received considerable attention. Prominent among these techniques, adaptive transform coding offers excellent speech quality for low to medium data rates (8-16 kb/sec). Adaptive transform coders divide speech into frequency components by using a suitable transform and transmit these components using pulse code modulation (PCM). Three basic issues in the design of adaptive transform coders are:
(1) Selection of the best transform
(2) Selection of the best quantization strategy
(3) Selection of a spectral parameterization technique

This thesis discusses design considerations with emphasis on finding variants of adaptive transform algorithms amenable to hardware implementation. In this context coder performance using reduced frame lengths is presented. Objective and subjective performance reduction caused by frame boundary discontinuities and low-pass filtering effects are investigated as the two primary sources of perceptual distortion. Results from computer simulations of adaptive transform coders using all-pole and homomorphic spectral fits are presented.
SOMMAIRE

Frequency domain coding techniques have recently received considerable attention. Adaptive transform coding holds a prominent place among them because it provides excellent speech transmission quality at low to medium rates (8-16 kb/sec). Adaptive transform coding systems segment speech into frequency components by means of a suitable transform and transmit these components using pulse code modulation (PCM). Three basic questions are associated with adaptive transform coders:
(1) Selection of the best transform
(2) Selection of the best quantization strategy
(3) Selection of a technique for defining the spectral parameters

This thesis deals with theoretical considerations and emphasizes the determination of variants of adaptive transform algorithms that can be realized in hardware. In this context, coding performance using reduced frame lengths is presented. The reductions in objective and subjective performance resulting from frame boundary discontinuities and low-pass filtering effects, regarded as the principal sources of perceptual distortion, are analysed. Finally, results are examined from two computer simulations of adaptive transform coders using all-pole and homomorphic spectral fits.
CHAPTER 1 INTRODUCTION

The objective of speech coding is to transmit the highest quality speech over the least possible channel capacity while employing the least complex coder. Coder efficiency in channel utilisation is, however, directly linked to coder complexity and cost. Fortunately, advances in LSI (large scale integration) technology are now making available more sophisticated digital signal processing devices at reduced costs. Thus, telephone networks are moving toward digital switching and processing of voice signals. This new technology offers greater system flexibility and considerable cost advantage. Investigations of more complex coding schemes are continuing in the light of these recent LSI technology advances.

Speech coders can be divided into two distinct classes: waveform coders and source coders (vocoders). Waveform coders strive for facsimile reproduction of the signal waveform. By observing the statistics of a signal, the waveform coder can be tailored to the signal, resulting in reduced coding error and a more signal specific coder. Source coders employ a minimal parametric description derived from a hypothesis of speech production. Consequently, these units can be operated at lower transmission rates. Source coders are also more sensitive to speaker variation and background noise than are those of the waveform classification.
In speech coding, transmission rates determine the class of coders which is more effective. Above 5 kb/sec waveform coders offer communication and toll quality speech. Speech quality of waveform coders declines very rapidly below this figure. At lower rates (below 5 kb/sec) source coders can be used to produce synthetic quality speech [1].

Waveform coding can be performed in either the time or frequency domains. Two examples of the latter are subband and adaptive transform coders. Frequency domain coding is accomplished by dividing the input waveform into a number of frequency bands by using a filter bank, or into frequency components by using a block transformation. These frequency components are then quantized and encoded. A replica of the input signal can be re-synthesized by decoding and subsequent filter bank summation or, if a transform was originally used, inverse transformation. Both methods assume the input waveform is quasi-stationary and can be locally modelled by a short time spectrum. Perceptually important components of the short time spectrum must be isolated and transmitted without incurring excessive delay or distortion.

Additional demands are placed on speech coding schemes by the context in which they are used. A likely area of application for speech coders is in telephony. Since a telecommunications carrier has little control over the type of signals the network will support, it is highly desirable that speech coders support a variety of input signals including modem signals. In a military context encryption is made possible by the digital nature of speech coders. Since good speech quality is not essential, maximum speech compression is one of the primary objectives.

The mathematical principles behind transform coding were first formulated by Huang in a paper entitled "Block Quantization of Correlated Gaussian Random Variables" [2]. Huang develops a procedure for efficiently quantizing blocks of correlated Gaussian random variables. A linear transformation first converts the dependent random variables into independent random variables. Then the transformed variables are quantized one-by-one until the bits allocated for the block are exhausted. A second linear transformation constructs (from the quantized values) the best estimate of the original variables in a mean square error sense. Huang develops the best choice for the transform. An approximate expression is derived for the number of bits assigned to each of the quantized variables. Segall [3], in a paper entitled "Bit Allocation and Encoding for Vector Sources", obtained a more precise expression for the allocation of available bits to quantization of the transformed variables.

Zelinski and Noll [4] developed a speech coder based on the principles discussed by Huang and Segall. Their important contribution was an adaptive quantization strategy employing the discrete cosine transform. The adaptation is controlled by a short term spectrum obtained from the transform coefficients prior to quantization. The short term spectrum is then parameterized and sent to the receiver as side information. A second paper by Zelinski and Noll [5] presents refinements to the side information parameterization technique. The paper discusses improvements to the quantization strategy aimed at improving the subjective performance of the coder.
Two papers by Tribolet and Crochiere [6,7] discuss adaptive transform coders which employ all-pole modelling of the short term spectrum. The papers compare sub-band coders and adaptive transform coders in the context of an analysis/synthesis framework. Cox and Crochiere [8], in a paper entitled "Real-Time Simulation of Adaptive Transform Coding", develop a homomorphic model for parameterization of the short term spectrum. Cox claims the technique performs as well as the all-pole model and is easier to implement in a real time coding context. Numerous authors have contributed to the development of transform coding directly and indirectly, but the above papers are the most often quoted as references.

This thesis reviews current transform coding algorithms and is directed towards finding variants of adaptive transform algorithms amenable to hardware implementation. Chapter 2 presents the theory and basic structure of transform coding. The applicability of various transforms and bit assignment strategies is discussed from a theoretical standpoint for their usefulness in coding speech. Chapter 3 considers various adaptive transform coding strategies employing a short term spectrum to adapt the transform coefficient quantizers. Three techniques for parameterizing the short term spectrum for transmission to the receiver are presented. In Chapter 4 the results of listening tests are used to evaluate coded speech generated by computer simulations. These coders use either all-pole or homomorphic modelling of the short term spectrum. Impairments in speech quality and their causes are identified. Techniques to combat these impairments are implemented. The effects of reduced frame sizes and frame boundary discontinuities are discussed on the basis of perceptual observations. Quantization noise shaping and noise energy insertion into low frequency bands are studied. Chapter 5 summarizes the important results and presents conclusions based on results obtained from the simulation of these coders.
CHAPTER 2 THE THEORY OF TRANSFORM CODING

In this section the theory of transform coding is developed and related topics are discussed. The treatment emphasizes important elements including the basic structure of transform coding, selection and justification of the mean square error distortion criterion, and discussion of an optimal transform and bit assignment rule. Sub-optimal transforms are introduced and compared with the discrete cosine transform. The latter is known to be a good choice and resistant to frame discontinuity distortion. The following presentation includes sufficient theory to support the subject matter.
2.1 BASIS TRANSFORM CODING

Mathematical concepts of transform coders are depicted in Figure 2-1. A frame buffer arranges N successive source samples x(n) into the source vector $\mathbf{X}$. The speech is assumed to be bandlimited, with the sampler satisfying the sampling theorem in order to avoid aliasing. A linear transformation is performed on the source vector $\mathbf{X}$ to obtain the transform coefficient vector $\mathbf{Y}$. Such an operation can be represented by the matrix equation 2.1, where $\mathbf{A}$ is unitary.

$$\mathbf{Y} = \mathbf{A}\,\mathbf{X} \tag{2.1}$$

Reconstructed output samples are obtained from the quantized transform vector $\hat{\mathbf{Y}}$ by inverse transformation. The matrix equation representing this operation is

$$\hat{\mathbf{X}} = \mathbf{A}^{-1}\,\hat{\mathbf{Y}} \tag{2.2}$$

Because a unitary transform preserves the Euclidean norm, the overall mean squared distortion of the coding scheme is equal to the total quantization error, i.e.

$$D = \frac{1}{N}\,E\!\left[(\mathbf{X}-\hat{\mathbf{X}})^{T}(\mathbf{X}-\hat{\mathbf{X}})\right] = \frac{1}{N}\,E\!\left[(\mathbf{Y}-\hat{\mathbf{Y}})^{T}(\mathbf{Y}-\hat{\mathbf{Y}})\right] \tag{2.3}$$

Minimization of distortion requires an appropriate quantization strategy and transform. As will be shown later, a necessary condition for minimum coding error is that every transform coefficient suffer the same amount of distortion.
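The coding chain of equations 2.1 and 2.2 can be illustrated with a minimal sketch. It assumes an orthonormal Walsh-Hadamard matrix as a stand-in for the unitary matrix A, a fixed-step uniform quantizer, and a made-up frame of eight samples; none of these choices come from the thesis simulation. The helper names (`hadamard`, `quantize`, `mse`) are hypothetical.

```python
import math

def hadamard(n):
    """Return the n x n orthonormal Walsh-Hadamard matrix (n must be a power of two)."""
    h = [[1.0]]
    while len(h) < n:
        # Sylvester construction: H_2n = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[scale * x for x in row] for row in h]

def matvec(m, x):
    """Matrix-vector product."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def quantize(y, step):
    """Uniform quantizer applied to each coefficient independently (assumed quantizer)."""
    return [step * round(v / step) for v in y]

def mse(u, v):
    """Mean squared difference between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

N = 8
A = hadamard(N)                                   # unitary stand-in for A
x = [0.9, 1.1, 1.4, 1.2, 0.7, 0.2, -0.3, -0.6]    # one frame of N samples (made-up data)

y = matvec(A, x)                      # eq. 2.1: Y = A X
y_hat = quantize(y, step=0.25)        # each coefficient quantized on its own
x_hat = matvec(transpose(A), y_hat)   # eq. 2.2: A^-1 = A^T for a real unitary A

d_signal = mse(x, x_hat)   # distortion measured on the output samples
d_coeff = mse(y, y_hat)    # distortion measured on the transform coefficients
print(d_signal, d_coeff)
```

Under these assumptions the two printed distortions agree to within rounding error, which is the property that lets the coder be designed entirely in the transform domain.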
[FIGURE 2-1: TRANSFORM CODING BASICS. A) General transform coding; B) Basis restricted transform coding. Block diagrams: input x(n), frame buffer, transform A, quantizer, inverse transform A^-1, frame buffer, reconstructed output.]
2.2 QUANTIZATION STRATEGY

Quantization strategy refers to the technique employed to quantize the transform coefficients. Basis restricted transform coding schemes quantize the coefficients $y_i$, $i = 1, 2, 3 \ldots N$, independently. Only basis restricted quantization schemes are considered here. This can be rationalized by considering the source vector $\mathbf{X}$ to be an N dimensional Gaussian random variable with zero mean. A nonsingular matrix $\mathbf{A}$ operates on $\mathbf{X}$ to yield the transform vector $\mathbf{Y}$ of uncorrelated random variables. Since $\mathbf{X}$ is Gaussian, $\mathbf{Y}$ is Gaussian and its components $y_i$ are not only uncorrelated but actually independent. Huang [2] shows that the basis restricted quantization schemes are optimal when the transform coefficients are independent.

The quantization strategy is characterized by quantizer adaptation and bit assignment rules. Overall distortion given in equation 2.3 can be reduced by assigning quantizers with a suitable (generally different) number of levels to each of the N transform coefficients. The distribution of the number of quantization levels for each transform coefficient is known as the bit assignment. The step size adaptation of the quantizers is accomplished by pre-scaling each coefficient by an estimate of its magnitude.

The distribution of the coefficients, in general, depends on the transform and source signal. Analytic calculation of the distribution function is too difficult to yield meaningful results except in special cases. Goldberg and Cosell [9] obtained numerical results for discrete cosine transform (DCT) coefficients showing the coefficient distribution to lie between the Gaussian and Laplace distributions. However, the effect of pre-scaling the transform coefficients is to make the distribution bi-modal about +1 and -1. In the limit of perfect energy estimation, the distribution approaches a pair of impulses at +1 and -1. The pre-scaling effect is illustrated in Figure 2-2 and is valid for any distribution. In any event, the distribution of the scaled transform coefficients is such that an improvement over single variable time domain quantization (PCM) can be expected. This is explained in Section 2.5 by the concept of transform coding gain.

In order to minimize the distortion measure, both the transform and the quantization bit assignment must be optimized. The optimal linear transform will depend on the distortion measure selected. Further, the optimal transform must produce decorrelated coefficients to validate the basis restricted transform coding approach. The transformation resulting from the selection of the mean square error (MSE) distortion measure is an optimal transform with this property. The MSE distortion measure is desirable because it maximizes the signal-to-noise ratio (SNR) for each block. Thus a transform coding scheme whose MSE is minimized on a block-by-block basis maximizes the segmental SNR, where the segments are the analysis frames. Segmental SNR is a better indicator of the perceptual quality of speech than SNR, and this measure correlates well with perceptual speech quality. Nevertheless, selection of the MSE distortion measure may result in the introduction of unacceptable perceptual distortions into the coded speech. Perceptual factors can be taken into account by modifying the bit assignment.
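The distinction between overall SNR and segmental SNR can be made concrete with a short sketch. It assumes non-overlapping frames of equal length and averages the per-frame SNR in dB; the function names and the toy signal are illustrative and are not taken from the thesis simulation. Frames with zero signal or zero noise energy are not handled.

```python
import math

def snr_db(ref, coded):
    """Conventional SNR in dB over the whole sequence."""
    signal = sum(s * s for s in ref)
    noise = sum((s - c) ** 2 for s, c in zip(ref, coded))
    return 10.0 * math.log10(signal / noise)

def segmental_snr_db(ref, coded, frame_len):
    """Average of the per-frame SNR values (in dB) over non-overlapping frames."""
    values = []
    for start in range(0, len(ref) - frame_len + 1, frame_len):
        r = ref[start:start + frame_len]
        c = coded[start:start + frame_len]
        values.append(snr_db(r, c))
    return sum(values) / len(values)

# A loud frame followed by a quiet frame, both with the same absolute coding error.
ref = [1.0, -1.0, 1.0, -1.0, 0.1, -0.1, 0.1, -0.1]
coded = [s + 0.01 for s in ref]

overall = snr_db(ref, coded)
segmental = segmental_snr_db(ref, coded, frame_len=4)
print(round(overall, 2), round(segmental, 2))
```

In this toy case the overall SNR is dominated by the loud frame, while the segmental figure gives the quiet frame equal weight, which is why it tracks perceived quality more closely.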
[FIGURE 2-2: PRE-SCALING EFFECT. A) Transform coefficient distribution; B) Pre-scaled transform coefficient distribution (good estimate); C) Pre-scaled transform coefficient distribution (better estimate). Axis labels: transform coefficients; normalized transform coefficients.]
2.3 OPTIMAL BIT ASSIGNMENT

The transform coefficient $y_i$ with variance $\sigma_i^2$ requires coding with $R_i$ bits/sample if the mean squared distortion $D_i$ is not to be exceeded. $R_i$ is given by:

$$R_i = \delta + \frac{1}{2}\log_2\frac{\sigma_i^2}{D_i} \tag{2.4}$$

The second term is the minimal rate for independent identically distributed Gaussian random variables. The correction factor $\delta$ depends on the type of quantizer and the probability density function (pdf) of the signal. Neglecting the dependence of $\delta$ on $R_i$ and substituting $D_i = 2^{2\delta}\sigma_i^2\,2^{-2R_i}$ from equation 2.4, the optimal number of bits for quantizer $Q_i$ is found by minimizing the average distortion given by

$$D = \frac{1}{N}\sum_{i=1}^{N} D_i$$

with the constraint of a fixed average bit rate, i.e.

$$R = \frac{1}{N}\sum_{i=1}^{N} R_i$$

The expression for the average distortion is minimized subject to the constraint of a fixed bit rate by treating $R_i$ as a continuous variable and using an undetermined multiplier $\beta$:

$$\frac{\partial}{\partial R_i}\left\{\frac{1}{N}\sum_{j=1}^{N} 2^{2\delta}\sigma_j^2\,e^{-2R_j\ln 2} + \beta\sum_{j=1}^{N} R_j\right\} = 0 \tag{2.5}$$

It follows that

$$\sigma_i^2\,e^{-2R_i\ln 2} = \frac{N\beta}{2\ln 2\;2^{2\delta}} = C \quad \text{for } i = 1, 2 \ldots N \tag{2.6}$$

Solving equation 2.6 for $R_i$ and evaluating $C$ using the constraint of a fixed bit rate we obtain:

$$R_i = R + \frac{1}{2}\log_2\frac{\sigma_i^2}{\left[\prod_{j=1}^{N}\sigma_j^2\right]^{1/N}} \tag{2.7}$$

Given an optimal bit assignment, every coefficient suffers the same distortion $2^{2\delta}C$, and the lower bound on distortion is given by

$$D_{tc} = 2^{2\delta}\,2^{-2R}\left[\prod_{i=1}^{N}\sigma_i^2\right]^{1/N}$$

Distortion introduced by a transform coding scheme depends on the distribution of the variances. In particular, the average distortion is determined by the geometric mean of the variances.
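As a worked illustration of equation 2.7, the following sketch computes the continuous bit assignment and the resulting per-coefficient distortion. It assumes the correction factor is zero unless stated and uses a made-up set of variances; the function names are hypothetical.

```python
import math

def optimal_bits(variances, avg_rate):
    """Continuous bit assignment of eq. 2.7: R_i = R + 0.5*log2(var_i / geometric mean)."""
    n = len(variances)
    log_gm = sum(math.log2(v) for v in variances) / n   # log2 of the geometric mean
    return [avg_rate + 0.5 * (math.log2(v) - log_gm) for v in variances]

def distortion_bound(variances, avg_rate, delta=0.0):
    """Lower bound on average distortion: 2^(2*delta) * 2^(-2R) * geometric mean."""
    n = len(variances)
    gm = 2.0 ** (sum(math.log2(v) for v in variances) / n)
    return 2.0 ** (2 * delta) * 2.0 ** (-2 * avg_rate) * gm

variances = [16.0, 4.0, 1.0, 0.25]   # made-up coefficient variances
R = 2.0                               # average rate in bits per coefficient

bits = optimal_bits(variances, R)
per_coeff = [v * 2.0 ** (-2 * b) for v, b in zip(variances, bits)]  # D_i with delta = 0
print(bits, per_coeff, distortion_bound(variances, R))
```

For these values the assignment is 3.5, 2.5, 1.5 and 0.5 bits, the rates average to R, and every coefficient ends up with the same distortion of 0.125, which matches the bound. With a wider spread of variances or a lower R, the same formula produces negative entries.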
The optimal quantization scheme discussed above was originally presented in a paper by Huang and Schultheiss [2]. Fractional and even negative bit assignments can result because $R_i$ is treated as a continuous variable. Segall [3] discusses the optimal bit assignment under the constraint of positive integer bit assignment.

Bits are assigned optimally in the Fortran simulation developed for this thesis. The technique uses k bit quantizers for the transform coefficients. The resulting mean square errors $\varepsilon(k)$ of a k bit quantizer for a unit variance input are tabulated in Max's paper [10]. It follows that the marginal return for the ith transform coefficient can be defined as the reduction in mean square error obtained by assigning it a kth bit:

$$\Delta_i(k) = \sigma_i^2\left[\varepsilon(k-1) - \varepsilon(k)\right]$$

Arranging the $\Delta_i(k)$ in descending order, and assigning bits one-by-one, the global minimum mean square error will be achieved independently of the distribution of the transform coefficients.
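The one-bit-at-a-time assignment can be sketched as a greedy loop driven by the marginal return. This is not the thesis's Fortran routine: the sketch assumes the idealized error model eps(k) = 2^(-2k) in place of Max's tabulated quantizer errors, and the function name and test variances are hypothetical. A priority queue keeps the largest marginal return available at each step.

```python
import heapq

def greedy_bit_assignment(variances, total_bits, eps=lambda k: 2.0 ** (-2 * k)):
    """Assign integer bits one at a time to the coefficient with the largest marginal return.

    eps(k) is the normalized mean square error of a k-bit quantizer; the default
    2^(-2k) is an assumed stand-in for tabulated quantizer errors.
    """
    bits = [0] * len(variances)
    # heapq is a min-heap, so marginal returns are stored negated.
    heap = [(-(v * (eps(0) - eps(1))), i) for i, v in enumerate(variances)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        _, i = heapq.heappop(heap)          # coefficient with the largest return
        bits[i] += 1
        k = bits[i]
        # Return obtained by giving coefficient i one further bit.
        heapq.heappush(heap, (-(variances[i] * (eps(k) - eps(k + 1))), i))
    return bits

variances = [16.0, 4.0, 1.0, 0.25]   # made-up coefficient variances
bits = greedy_bit_assignment(variances, total_bits=6)
total_error = sum(v * 2.0 ** (-2 * b) for v, b in zip(variances, bits))
print(bits, total_error)
```

With six bits available the sketch gives 3, 2, 1 and 0 bits, never producing a fractional or negative entry. The greedy choice is only globally optimal when the marginal returns of each coefficient decrease with k, which holds for the assumed error model.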
2.4 THE KARHUNEN-LOEVE TRANSFORM

Consider a discrete signal of N sampled values. This signal can be represented as a point in an N dimensional space. Each sampled value is then a component of the N vector $\mathbf{X}$ which represents the signal in this space. Next consider a unitary transform (T) operating on the data vector $\mathbf{X}$, resulting in the transform vector $\mathbf{Y}$. The objective in data compression is to select a subset of M components of $\mathbf{Y}$, where M is less than N. The remaining components are discarded and this introduces some distortion. A unitary transform which minimizes the mean square error caused by discarding components is the objective. The Karhunen-Loeve transform (KLT) meets this objective. Some of the important properties of the KLT are described below.

The KLT is a data dependent transformation whose basis vectors are eigenvectors of the autocorrelation matrix of the $\mathbf{X}$ process. This transform diagonalizes the autocorrelation matrix of the transformed vector $\mathbf{Y}$, which means the components are uncorrelated and, by the Gaussian assumption, independent. Each transform coefficient can then be quantized independently without losing performance. It is possible to approximate $\mathbf{X}$ in a lower dimensional space by discarding components. The mean square error resulting from this approximation is the sum of the variances of the discarded transform coefficients. If only the components of $\mathbf{Y}$ with the lowest variances are discarded, the approximation is optimal in a mean square error sense.

Two limitations of the Karhunen-Loeve transform are that it is computationally burdensome and that it requires solutions of eigenvector problems whose solutions may be numerically unstable. In addition, the KLT requires a knowledge of the correlation function at the receiver to perform an inverse transformation. The correlation function is not generally available at the receiver.
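The properties above can be checked numerically on a small example. The sketch below assumes a first order Markov (AR(1)) correlation model with coefficient 0.9 and a block length of 4, and it finds the eigenvectors with a plain cyclic Jacobi iteration written for illustration; the thesis does not specify either choice, and all names are hypothetical.

```python
import math

def jacobi_eigen(matrix, sweeps=50, tol=1e-20):
    """Eigen-decomposition of a real symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, v) where column k of v is the eigenvector for eigenvalues[k].
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):   # rotate columns p and q of a
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):   # rotate rows p and q of a
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):   # accumulate the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

N, RHO = 4, 0.9
R = [[RHO ** abs(i - j) for j in range(N)] for i in range(N)]   # assumed AR(1) model

eigenvalues, V = jacobi_eigen(R)
order = sorted(range(N), key=lambda k: eigenvalues[k], reverse=True)
klt = [[V[n][k] for n in range(N)] for k in order]   # rows are the KLT basis vectors
variances = [eigenvalues[k] for k in order]          # coefficient variances, largest first

M = 2                                     # number of coefficients kept
truncation_error = sum(variances[M:])     # squared error per frame from the discarded ones
print(variances, truncation_error)
```

The printed variances are the eigenvalues of the correlation matrix, and the truncation error is the sum of the smallest N - M of them. The eigen-decomposition step is the part that becomes costly for large N and has to be repeated whenever the correlation estimate changes.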
2.5 SUB-OPTIMAL TRANSFORMS

The practical limitations of the KLT require an investigation of sub-optimal transforms. The discrete Fourier transform (DFT), the discrete cosine transform (DCT) and the Walsh-Hadamard transform (WHT) are all useful sub-optimal transforms. A method to compare the performances of unitary transforms in transform coding applications is highly desirable.

Assuming the transform only affects the probability density function (pdf) of the sampled process slightly, the quantizer parameter $\delta$ is unchanged whether quantizing in the time or transform domain. Hence the dependence of the distortion on $\delta$ is the same in either domain. From equation 2.4, it is clear that a lower bound on the distortion of a PCM coder is given by

$$D_{pcm} = 2^{2\delta}\,2^{-2R}\,\frac{1}{N}\sum_{i=1}^{N}\sigma_i^2 \tag{2.8}$$

where the average of the coefficient variances equals the variance of the time domain samples because the transform is unitary. Defining the transform coding gain over PCM as the increase in SNR over PCM will enable useful comparisons:

$$G_{tc} \triangleq \frac{D_{pcm}}{D_{tc}} = \frac{\frac{1}{N}\sum_{i=1}^{N}\sigma_i^2}{\left[\prod_{i=1}^{N}\sigma_i^2\right]^{1/N}}$$

Here $D_{tc}$ is the transform coding distortion bound of Section 2.3. The transform gain for any unitary transform is then the ratio of the arithmetic and geometric means of the variances of the transform coefficients, as given above. Variances are just the diagonal elements of the covariance matrix in the transform domain. Noll [4] modelled the long term statistics of voiced speech by a stationary tenth order Markov source. These results are shown in Figure 2-3 and illustrate the dependence of transform coding gain on block length N. For larger block lengths, the KLT performance improves. The DCT performs nearly as well. The DFT and DCT converge to KLT performance for a large block length. The discrete slant transform (DST) and WHT show a very poor performance in transform coding applications. The distribution of the variances of the discrete cosine transform coefficients is known to converge asymptotically to the power density spectrum of the process [4]. This property is used in the LPC adaptive bit assignment algorithm.
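The gain comparison can be reproduced in miniature. The sketch below assumes a first order Markov correlation model with coefficient 0.9 and block length 8 rather than the tenth order model attributed to Noll, and an orthonormal DCT-II scaling; it is meant only to show how the arithmetic to geometric mean ratio is evaluated. All function names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix (assumed scaling)."""
    rows = []
    for k in range(n):
        c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([c * math.cos((2 * j + 1) * k * math.pi / (2 * n)) for j in range(n)])
    return rows

def hadamard(n):
    """Orthonormal Walsh-Hadamard matrix, n a power of two."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return [[x / math.sqrt(n) for x in row] for row in h]

def coefficient_variances(a, r):
    """Diagonal of A R A^T: the variances of the transform coefficients."""
    n = len(a)
    out = []
    for row in a:
        rv = [sum(r[i][j] * row[j] for j in range(n)) for i in range(n)]
        out.append(sum(row[i] * rv[i] for i in range(n)))
    return out

def coding_gain(variances):
    """Ratio of the arithmetic mean to the geometric mean of the variances."""
    n = len(variances)
    am = sum(variances) / n
    gm = math.exp(sum(math.log(v) for v in variances) / n)
    return am / gm

N, RHO = 8, 0.9
R = [[RHO ** abs(i - j) for j in range(N)] for i in range(N)]   # assumed AR(1) model
identity = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]

transforms = {"PCM": identity, "WHT": hadamard(N), "DCT": dct_matrix(N)}
gains = {name: coding_gain(coefficient_variances(a, R)) for name, a in transforms.items()}
for name, g in gains.items():
    print(name, round(10 * math.log10(g), 2), "dB")
```

The identity transform (PCM) gives a gain of one by construction, and no unitary transform can exceed the KLT gain, which for this model equals one over the Nth root of the determinant of R.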
2.6 THE DISCRETE COSINE TRANSFORM

For any practical transform coding implementation, the computational savings offered by the DCT and its near optimal performance make it an attractive alternative to the KLT. The DCT and KLT are asymptotically equivalent if the data covariance matrices are Toeplitz, hence it is not surprising that the DCT performs nearly as well as the KLT when the data vectors are large [11]. The rate distortion criteria of the KLT and DCT are also comparable [12]. Formally the DCT of a real M point sequence v(n) can be defined

eq. 2.10 [equation not legible in the source]

The inverse DCT is given by

eq. 2.11 [equation not legible in the source; the surviving fragment is v(n) = a sum starting at k = 0]
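Since the transform pair is not legible here, the following sketch uses the orthonormal DCT-II and its inverse as an assumed definition; the scaling constants in the thesis may differ. The function names `dct` and `idct` are hypothetical and use direct summation rather than a fast algorithm.

```python
import math

def dct(v):
    """Orthonormal DCT-II of a real M-point sequence (assumed scaling)."""
    m = len(v)
    out = []
    for k in range(m):
        c = math.sqrt(1.0 / m) if k == 0 else math.sqrt(2.0 / m)
        out.append(c * sum(v[n] * math.cos((2 * n + 1) * k * math.pi / (2 * m))
                           for n in range(m)))
    return out

def idct(coeffs):
    """Inverse of dct(): sum over k = 0 .. M-1 of the scaled cosine terms."""
    m = len(coeffs)
    out = []
    for n in range(m):
        total = 0.0
        for k in range(m):
            c = math.sqrt(1.0 / m) if k == 0 else math.sqrt(2.0 / m)
            total += c * coeffs[k] * math.cos((2 * n + 1) * k * math.pi / (2 * m))
        out.append(total)
    return out

v = [0.9, 1.1, 1.4, 1.2, 0.7, 0.2, -0.3, -0.6]   # made-up frame
V = dct(v)
v_back = idct(V)
print([round(c, 4) for c in V])
```

With this scaling the transform is unitary, so the round trip reproduces the input and the energy of the coefficients equals the energy of the samples.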
The DCT of a sequence v(n) is closely related to a 2M point DFT of a related sequence u(n). Interpretation of the transform domain, including formant structure and pitch striations, may therefore ... domain. The following analysis follows ...

u(n) = v(n), 0 ...