Adaptive Transform Coding of Speech Signals by Richard James Pinnell

B. Eng (McGill)

McGill University Montreal

Canada

May 1982

ACKNOWLEDGEMENTS

I would like to thank my thesis supervisor, Dr. P. Kabal, for his valuable encouragement and guidance in both the experimental aspect of this work and in the preparation of this thesis.

CHAPTER 1 INTRODUCTION

CHAPTER 2 THE THEORY OF TRANSFORM CODING

2.1 BASIS TRANSFORM CODING

2.2 QUANTIZATION STRATEGY

2.3 OPTIMAL BIT ASSIGNMENT

2.4 THE KARHUNEN-LOEVE TRANSFORM

2.5 SUB-OPTIMAL TRANSFORMS

2.6 THE DISCRETE COSINE TRANSFORM

CHAPTER 3 ADAPTIVE TRANSFORM CODING

3.1 LOG-LINEAR SMOOTHING TECHNIQUE

3.2 ALL-POLE MODEL

3.3 HOMOMORPHIC MODEL

CHAPTER 4 CODER EVALUATION

SIMULATION PROCEDURE

THE LPC CODER

Coder Operation

Reducing Transform Complexity

Side Information Interpolation

Side Information Parameter Statistics And Quantization

The Low-Pass Effect

Frame Boundary Discontinuities

Transform Coefficient Statistics

Subjective Effect Of Pre-emphasis And Spectral Shaping

THE HOMOMORPHIC CODER

Coder Operation

Coder Performance

CHAPTER 5 CONCLUSIONS

LIST OF FIGURES

TRANSFORM CODING STRUCTURE

PRE-SCALING EFFECT

SUB-OPTIMAL TRANSFORM PERFORMANCE

BLOCK BOUNDARY DISTORTION

GENERAL STRUCTURE OF ADAPTIVE TRANSFORM CODING

LOG-LINEAR SMOOTHING

LPC ADAPTIVE TRANSFORM CODER

LPC PITCH MODEL

HOMOMORPHIC SIDE INFORMATION PROCESSING

HOMOMORPHIC ADAPTIVE TRANSFORM CODER STRUCTURE

LPC ADAPTIVE TRANSFORM CODER WAVEFORMS

VISIBLE BIT ASSIGNMENT

CODER SNR PERFORMANCE

CODER SNR PERFORMANCE

SIDE INFORMATION INTERPOLATION

REFLECTION COEFFICIENT HISTOGRAMS

AVERAGE ENERGY PARAMETER

CODER SNR PERFORMANCE

CODER SNR PERFORMANCE

ANALYSIS FRAME WINDOWING

FRAME BOUNDARY DISCONTINUITY

TRANSFORM COEFFICIENT HISTOGRAM

TRANSFORM COEFFICIENT HISTOGRAMS

TRANSFORM COEFFICIENT QUANTIZER PERFORMANCE

HOMOMORPHIC ADAPTIVE TRANSFORM CODER WAVEFORMS

CODER SNR PERFORMANCE

CODER SNR PERFORMANCE

ABSTRACT

Frequency domain coding techniques have recently received considerable attention. Prominent among these techniques, adaptive transform coding offers excellent speech quality for low to medium data rates (8-16 kb/sec). Adaptive transform coders divide speech into frequency components by using a suitable transform and transmit these components using pulse code modulation (PCM). Three basic issues in the design of adaptive transform coders are:

(1) Selection of the best transform

(2) Selection of the best quantization strategy

(3) Selection of a spectral parameterization technique

This thesis discusses design considerations with emphasis on finding variants of adaptive transform algorithms amenable to hardware implementation. In this context coder performance using reduced frame lengths is presented. Objective and subjective performance reduction, caused by frame boundary discontinuities and low-pass filtering effects, are investigated as the primary sources of perceptual distortion. Results from two computer simulations of adaptive transform coders using all-pole and homomorphic spectral fits are presented.

SOMMAIRE

Frequency domain coding techniques have recently received considerable attention. Adaptive transform coding holds a prominent place among them because it provides excellent speech transmission quality at low or medium rates (8-16 kb/sec). Adaptive transform coding systems segment speech into frequency components by means of a suitable transform and transmit these components using pulse code modulation (PCM). Three basic questions are associated with adaptive transform coders:

(1) Selection of the best transform

(2) Selection of the best quantization strategy

(3) Selection of a technique for defining the spectral parameters

This thesis deals with theoretical considerations and emphasizes the identification of variants of adaptive transform algorithms that can be realized in hardware. In this context, coding performance using reduced frame lengths is presented. The reductions in objective and subjective performance resulting from frame boundary discontinuities and low-pass filtering effects, regarded as the main sources of perceptual distortion, are analysed. Finally, the results of two computer simulations of adaptive transform coders using homomorphic and all-pole spectral fits are examined.

CHAPTER 1 INTRODUCTION

The objective of speech coding is to transmit the highest quality speech over the least possible channel capacity while employing the least complex coder. Coder efficiency in channel utilisation is, however, directly linked to coder complexity and cost. Fortunately, advances in LSI (large scale integration) technology are now making available more sophisticated digital signal processing devices at reduced costs. Thus, telephone networks are moving toward digital switching and processing of voice signals. Investigations of more complex coding schemes are continuing in the light of these recent LSI technology advances. This new technology offers greater system flexibility and considerable cost advantage.

Speech coders can be divided into two distinct classes: waveform coders and source coders (vocoders). Waveform coders strive for facsimile reproduction of the signal waveform. By observing the statistics of a signal, the waveform coder can be tailored to the signal, resulting in reduced coding error and a more signal specific coder. Source coders employ a minimal parametric description derived from a hypothesis of speech production. Consequently, these units can be operated at lower transmission rates. Source coders are also more sensitive to speaker variation and background noise than are those of the waveform classification.

In speech coding, transmission rates determine which class of coders is more effective. Above 5 kb/sec waveform coders offer communication and toll quality speech. Speech quality for waveform coders declines very rapidly below this figure. At lower rates (below 5 kb/sec) source coders can be used to produce synthetic quality speech [1].

Waveform coding can be performed in either the time or frequency domains. Two examples of the latter are subband and adaptive transform coders. Frequency domain coding is accomplished by dividing the input waveform into a number of frequency bands by using a filter bank, or into frequency components by using a block transformation. These frequency components are then quantized and encoded. A replica of the input signal can be re-synthesized by decoding the frequency components and subsequent summation if a filter bank was used or, inverse transformation if a transform is used. Both methods assume the input speech is quasi-stationary and can be locally modelled by a short time spectrum. Perceptually important components of the short time spectrum must be isolated and transmitted without incurring excessive delay or distortion.

Additional demands are placed on speech coding schemes by the context in which they are used. A likely area of application for speech coders is in telephony. Since a telecommunications carrier has little control over the type of signals the network will support, it is highly desirable that speech coders support a variety of input signals including modem signals. In a military context encryption is made possible by the digital nature of speech coders. Since good speech quality is not essential, maximum speech compression is one of the primary objectives.

The mathematical principles behind transform coding were first formulated by Huang in a paper entitled "Block Quantization of Correlated Gaussian Random Variables" [2]. Huang develops a procedure for efficiently quantizing blocks of correlated Gaussian random variables. A linear transformation first converts the dependent random variables into independent random variables. Then the transformed variables are quantized one-by-one until the bits allocated for the block are exhausted. A second linear transformation constructs (from the quantized values) the best estimate of the original variables in a mean square error sense. Huang develops the best choice for the transform, and an approximate expression is derived for the number of bits assigned to each of the quantized variables. Segall [3], in a paper entitled "Bit Allocation and Encoding for Vector Sources", obtained a more precise expression for the allocation of available bits to quantization of the transformed variables.

Zelinski and Noll [4] developed an adaptive speech coder based on the principles discussed by Huang and Segall. Their important contribution was an adaptive quantization strategy employing the discrete cosine transform. The adaptation is controlled by a short term spectrum obtained from the transform coefficients prior to quantization. The short term spectrum is then parameterized and sent to the receiver as side information. A second paper by Zelinski and Noll [5] presents refinements to the side information parameterization technique. The paper discusses improvements to the quantization strategy aimed at improving the subjective performance of the coder.

Two papers by Tribolet and Crochiere [6,7] discuss adaptive transform coders which employ all-pole modelling of the short term spectrum. The papers compare sub-band coders and adaptive transform coders in the context of an analysis/synthesis framework. Cox and Crochiere [8], in a paper entitled "Real-Time Simulation of Adaptive Transform Coding", develop a homomorphic model for parameterization of the short term spectrum. Cox claims the technique performs as well as the all-pole model and is easier to implement in a real time coding context. Numerous authors have contributed to the development of transform coding directly and indirectly but the above papers are the most often quoted as references.

This thesis reviews current transform coding theory and is directed towards finding variants of adaptive transform algorithms amenable to hardware implementation. Chapter 2 presents the theory and basic structure of transform coding. The applicability of various transforms and bit assignment strategies is discussed from a theoretical standpoint for their usefulness in coding speech. Chapter 3 considers various adaptive transform coding strategies employing a short term spectrum to adapt the transform coefficient quantizers. Three techniques for parameterizing the short term spectrum for transmission to the receiver are presented. In Chapter 4 the results of listening tests are used to evaluate coded speech generated by computer simulations. These coders use either all-pole or homomorphic modelling of the short term spectrum. Impairments in speech quality and their causes are identified. Techniques to combat these impairments are implemented. The effects of reduced frame sizes and frame boundary discontinuities are discussed. Quantization noise shaping and noise insertion into low energy frequency bands are studied. Chapter 5 summarizes the important results and presents conclusions based on perceptual observations and results obtained from the simulation of these coders.

CHAPTER 2 THE THEORY OF TRANSFORM CODING

In this section the theory of transform coding is developed and related topics are discussed. The treatment emphasizes important elements including the basic structure of transform coding, selection and justification of the mean square error distortion criterion, and discussion of an optimal transform and bit assignment rule. Sub-optimal transforms are introduced and compared with the discrete cosine transform. The latter is known to be a good choice and resistant to frame discontinuity distortion. The following presentation includes sufficient theory to support the subject matter.

2.1

BASIS TRANSFORM CODING

Mathematical concepts of transform coders are depicted in Figure 2-1. A frame buffer arranges N successive source samples x(n) into the source vector X. The speech is assumed to be bandlimited, with the sampler satisfying the sampling theorem in order to avoid aliasing. A linear transformation is performed on the source vector to obtain the transform coefficient vector Y. Such an operation can be represented by the matrix equation 2.1 where A is unitary.

eq. 2.1    $Y = A X$

Reconstructed output samples are obtained from the quantized transform vector $\hat{Y}$ by inverse transformation. The matrix equation representing this operation is

eq. 2.2    $\hat{X} = A^{-1} \hat{Y}$

The overall mean squared distortion of the coding scheme is equal to the total quantization error (equation 2.3). Minimization of distortion requires an appropriate quantization strategy and transform. As will be shown later, a necessary condition for minimum coding error is that every transform coefficient suffer the same amount of distortion.
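The claim that the overall distortion equals the total quantization error can be checked numerically. The sketch below is only an illustration: it assumes an orthonormal DCT as the unitary matrix A and a uniform quantizer with a hypothetical step size of 0.25, neither of which is prescribed at this point in the text.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix, used here only as an example of a unitary A."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matvec(a, x):
    """Matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def quantize(y, step):
    """Uniform quantizer (an assumed stand-in for the coder's quantizers)."""
    return [step * round(v / step) for v in y]

def sq_error(u, v):
    """Sum of squared differences between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

x = [0.9, 0.7, 0.2, -0.4]              # one frame of N = 4 source samples (made up)
A = dct_matrix(len(x))
y = matvec(A, x)                       # eq. 2.1
y_hat = quantize(y, step=0.25)         # quantized transform vector
x_hat = matvec(transpose(A), y_hat)    # eq. 2.2; A^-1 = A^T for a real unitary A

err_transform = sq_error(y, y_hat)     # quantization error in the transform domain
err_output = sq_error(x, x_hat)        # reconstruction error in the time domain
print(err_transform, err_output)
```

Because A is unitary, the two printed error energies agree to within rounding, so the quantizers can be designed entirely in the transform domain.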

[FIGURE 2-1 TRANSFORM CODING BASICS. Panel A: general transform coding. Panel B: basis restricted transform coding. Signal path in each panel: x(n), frame buffer, X, A, quantizer, A^-1, frame buffer, reconstructed x(n).]

2.2

QUANTIZATION STRATEGY

Quantization strategy refers to the technique employed to quantize the transform coefficients. Basis restricted transform coding schemes quantize the coefficients $y_i$, i = 1, 2, 3, ..., N independently. Only basis restricted quantization schemes are considered here. This can be rationalized by considering the source vector X to be an N dimensional Gaussian random variable with zero mean. A nonsingular matrix A operates on X to yield the transform vector Y. Since X is Gaussian, Y is Gaussian; if Y is a vector of uncorrelated random variables, its components $y_i$ are not only uncorrelated but actually independent. Huang [2] shows that the basis restricted quantization schemes are optimal when the transform coefficients are independent.

The quantization strategy is characterized by quantizer adaptation and bit assignment rules. Overall distortion given in equation 2.3 can be reduced by assigning (generally different) quantizers with a suitable number of levels to each of the N transform coefficients. The distribution of the number of quantizer levels is known as the bit assignment. The step size adaptation of the quantizers is accomplished by pre-scaling the coefficients by an estimate of the quantization step size suitable for each transform coefficient.

The distribution of the coefficients, in general, depends on the transform and source signal. Analytic calculation of the distribution function is too difficult to yield meaningful results except in special cases. Goldberg and Cosell [9] obtained numerical results for discrete cosine transform (DCT) coefficients showing the coefficient distribution to lie between the Gaussian and Laplace distributions. However, the effect of pre-scaling the transform coefficients is to make the distribution bi-modal about +1 and -1. In the limit of perfect energy estimation, the distribution approaches a pair of impulses at +1 and -1. The pre-scaling effect is illustrated in Figure 2-2 and is valid for any distribution. In any event, the distribution of the scaled transform coefficients is such that an improvement over single variable time domain quantization (PCM) can be expected. This is explained in Section 2.5 by the concept of transform coding gain.

In order to minimize the distortion measure, both the transform and the bit assignment must be optimized. The optimal linear transform will depend on the distortion measure selected. Further, the optimal transform must produce decorrelated transform coefficients to validate the basis restricted quantization approach. The transformation resulting from the selection of the mean square error (MSE) distortion measure is an optimal transform with this property. Selection of the MSE distortion measure maximizes the signal-to-noise ratio (SNR) for each block. This is desirable because it correlates well with perceptual quality. The MSE distortion measure is minimized on a block-by-block basis. Thus the transform coding scheme maximizes the segmental SNR, where the segments are the analysis frames. Segmental SNR is a better indicator of the perceptual quality of speech than SNR. Nevertheless, selection of the MSE distortion measure may result in the introduction of unacceptable perceptual distortions into the coded speech. Perceptual factors can be taken into account by modifying the bit assignment.
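The difference between SNR and segmental SNR can be made concrete with a small calculation. The sketch below assumes the usual definition of segmental SNR as the average of the per-frame SNRs expressed in dB; the thesis does not spell out its formula at this point, and the signal values are made up for illustration.

```python
import math

def snr_db(signal, coded):
    """Conventional SNR in dB over the whole span of samples."""
    noise = sum((s - c) ** 2 for s, c in zip(signal, coded))
    power = sum(s * s for s in signal)
    return 10.0 * math.log10(power / noise)

def segmental_snr_db(signal, coded, frame_len):
    """Average of the per-frame SNRs in dB (assumed definition)."""
    values = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        values.append(snr_db(signal[start:start + frame_len],
                             coded[start:start + frame_len]))
    return sum(values) / len(values)

# Hypothetical data: a loud frame followed by a quiet frame, same absolute error.
loud = [1.0, -1.0, 1.0, -1.0]
quiet = [0.01, -0.01, 0.01, -0.01]
signal = loud + quiet
coded = [s + 0.005 for s in signal]

overall = snr_db(signal, coded)
segmental = segmental_snr_db(signal, coded, frame_len=4)
print(round(overall, 2), round(segmental, 2))
```

The overall SNR is dominated by the loud frame, while the segmental figure is pulled down by the poorly coded quiet frame. This sensitivity to low-level segments is one reason the segmental measure tracks perceived quality more closely.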

[FIGURE 2-2 PRE-SCALING EFFECT. Panel A: transform coefficient distribution. Panel B: pre-scaled transform coefficient distribution (good estimate). Panel C: pre-scaled transform coefficient distribution (better estimate). Horizontal axes: transform coefficients and normalized transform coefficients.]

2.3

OPTIMAL BIT ASSIGNMENT

The transform coefficient $y_i$ with variance $\sigma_i^2$ requires coding with $R_i$ bits/sample if the mean squared distortion $D_i$ is not to be exceeded. $R_i$ is given by:

eq. 2.4    $R_i = \delta + \frac{1}{2}\log_2\frac{\sigma_i^2}{D_i}$

The second term is the minimal rate for independent identically distributed Gaussian random variables. The correction factor $\delta$ depends on the type of quantizer and the probability density function (pdf) of the signal. Neglecting the dependence of $R_i$ on $\delta$ and substituting for $D_i$, the optimal number of bits for quantizer $Q_i$ is found by minimizing the average distortion given by

$D = \frac{1}{N}\sum_{i=1}^{N} D_i = \frac{1}{N}\sum_{i=1}^{N} 2^{2\delta}\,\sigma_i^2\,2^{-2R_i}$

with the constraint of a fixed average bit rate i.e.

$R = \frac{1}{N}\sum_{i=1}^{N} R_i$

The expression for the average distortion is minimized subject to the constraint of a fixed bit rate by treating $R_i$ as a continuous variable and using an undetermined multiplier $\beta$. One obtains:

eq. 2.5    $\frac{\partial}{\partial R_i}\left\{\frac{1}{N}\sum_{j=1}^{N} 2^{2\delta}\,\sigma_j^2\,e^{-2R_j\ln 2} + \beta\sum_{j=1}^{N} R_j\right\} = 0$

It follows that

eq. 2.6    $2^{2\delta}\,\sigma_i^2\,e^{-2R_i\ln 2} = \frac{N\beta}{2\ln 2} = C$, a constant, for i = 1, 2, ..., N

Solving equation 2.6 for $R_i$ and evaluating C using the constraint of a fixed bit rate we obtain:

eq. 2.7    $R_i = R + \frac{1}{2}\log_2\frac{\sigma_i^2}{\left[\prod_{j=1}^{N}\sigma_j^2\right]^{1/N}}$

Given an optimal bit assignment, the lower bound on distortion is given by

$D_{tc} = 2^{2\delta}\,2^{-2R}\left[\prod_{i=1}^{N}\sigma_i^2\right]^{1/N}$

Distortion introduced by a transform coding scheme depends on the distribution of the variances. In particular, the average distortion is determined by the geometric mean of the variances.

The optimal quantization scheme discussed above was originally presented in a paper by Huang and Schultheiss [2]. Fractional and even negative bit assignments can result because $R_i$ is treated as a continuous variable. Segall [3] discusses the optimal bit assignment under the constraint of positive integer bit assignment.
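A small numerical sketch may help show how eq. 2.7 behaves. The variances below are hypothetical, the correction factor is set to zero for simplicity, and the function names are illustrative rather than taken from the thesis.

```python
import math

def bit_assignment(variances, avg_rate):
    """Eq. 2.7: R_i = R + 0.5 * log2(sigma_i^2 / geometric mean of the variances)."""
    n = len(variances)
    log_gm = sum(math.log2(v) for v in variances) / n
    return [avg_rate + 0.5 * (math.log2(v) - log_gm) for v in variances]

def coefficient_distortions(variances, rates, delta=0.0):
    """D_i = 2^(2*delta) * sigma_i^2 * 2^(-2*R_i) for each coefficient."""
    return [2 ** (2 * delta) * v * 2 ** (-2 * r) for v, r in zip(variances, rates)]

variances = [16.0, 4.0, 1.0, 0.0625]   # hypothetical coefficient variances
avg_rate = 1.0                         # R, in bits per sample

rates = bit_assignment(variances, avg_rate)
optimal = coefficient_distortions(variances, rates)
uniform = coefficient_distortions(variances, [avg_rate] * len(variances))

print(rates)                           # fractional, and negative for the weakest coefficient
print(sum(optimal) / len(optimal))     # average distortion with eq. 2.7
print(sum(uniform) / len(uniform))     # average distortion with equal bits everywhere
```

With these numbers every coefficient ends up with the same distortion, which is the necessary condition quoted in Section 2.1, and the weakest coefficient receives a negative allocation. The latter is the practical problem that motivates an integer-constrained assignment.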

Bits are assigned optimally in the Fortran simulation developed for this thesis. The technique uses k bit quantizers for the transform coefficients. The resulting mean square errors, as a function of the number of bits k, are tabulated in Max's paper [10]. It follows that the marginal return for the ith transform coefficient can be defined as the reduction in mean square error obtained by assigning that coefficient one additional bit. Arranging the marginal returns in descending order, and assigning bits one-by-one, the global minimum mean square error will be achieved independently of the distribution of the transform coefficients.
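The one-by-one assignment can be sketched as a greedy loop. This is not the thesis's Fortran routine: the marginal return is assumed to be the variance times the drop in normalized quantizer error for one more bit, and the table holds approximate, commonly quoted error values for Max quantizers with a unit-variance Gaussian input.

```python
import heapq

# Approximate normalized MSE of a k-bit Max quantizer (Gaussian input), k = 0..5.
MAX_MSE = [1.0, 0.3634, 0.1175, 0.03454, 0.009497, 0.002499]

def greedy_bits(variances, total_bits):
    """Give each bit to the coefficient with the largest marginal return."""
    bits = [0] * len(variances)
    # heapq is a min-heap, so marginal returns are stored with a minus sign.
    heap = [(-v * (MAX_MSE[0] - MAX_MSE[1]), i) for i, v in enumerate(variances)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        if not heap:                   # every quantizer is at the table limit
            break
        _, i = heapq.heappop(heap)
        bits[i] += 1
        k = bits[i]
        if k + 1 < len(MAX_MSE):       # return available from one further bit
            gain = variances[i] * (MAX_MSE[k] - MAX_MSE[k + 1])
            heapq.heappush(heap, (-gain, i))
    return bits

def total_mse(variances, bits):
    """Sum of the coefficient distortions for a given integer assignment."""
    return sum(v * MAX_MSE[b] for v, b in zip(variances, bits))

variances = [16.0, 4.0, 1.0, 0.0625]   # hypothetical coefficient variances
bits = greedy_bits(variances, total_bits=4)
print(bits, total_mse(variances, bits))
```

The greedy choice reaches the global minimum here because the tabulated errors show diminishing returns: each extra bit reduces the error by less than the previous one. All assignments are non-negative integers by construction.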

2.4

THE KARHUNEN-LOEVE TRANSFORM

Consider a discrete signal of N sampled values. This signal can be represented as a point in an N dimensional space. Each sampled value is then a component of the N vector X which represents the signal in this space. Next consider a unitary transform (T) operating on the data vector X resulting in the transform vector Y. The objective in data compression is to select a subset of M components of Y where M is less than N. The remaining components are discarded and this introduces some distortion. A unitary transform which minimizes the mean square error caused by discarding components is the objective. Some of the important properties of the KLT are described below.

The KLT is a data dependent transformation whose basis vectors are eigenvectors of the autocorrelation matrix of the X process. This transform diagonalizes the autocorrelation matrix of the transformed vector Y, which means the components are uncorrelated and, by the Gaussian assumption, independent. Each transform coefficient can then be quantized independently without losing performance. It is possible to approximate X in a lower dimensional space by discarding transform coefficients. The mean square error resulting from this approximation is the sum of the variances of the discarded components. If only the components of Y with the lowest variances are discarded, the approximation is optimal in a mean square error sense.

Two limitations of the Karhunen-Loeve transform are that it is computationally burdensome and requires solutions of eigenvector problems whose solutions may be numerically unstable. More precisely, the KLT requires a knowledge of the correlation function at the receiver to perform an inverse transformation. The correlation function is not generally available at the receiver.
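The diagonalizing property can be seen in the smallest possible case. The sketch below assumes a two-sample frame with unit variance and a hypothetical correlation of 0.9 between the samples; for such a matrix the eigenvectors are known in closed form, so no eigenvalue routine is needed.

```python
import math

rho = 0.9                                  # assumed correlation between the two samples
R = [[1.0, rho], [rho, 1.0]]               # autocorrelation matrix of X (unit variance)

# For this 2 x 2 matrix the eigenvectors are the sum and difference directions.
s = 1.0 / math.sqrt(2.0)
A = [[s, s], [s, -s]]                      # rows are the KLT basis vectors

def matmul(p, q):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(col) for col in zip(*m)]

Ry = matmul(matmul(A, R), transpose(A))    # autocorrelation matrix of Y = A X
variances = [Ry[0][0], Ry[1][1]]           # 1 + rho and 1 - rho
discard_error = min(variances)             # MSE from dropping the weakest component
print(Ry, discard_error)
```

The off-diagonal terms of the transformed matrix vanish, and dropping the low-variance component costs a mean square error equal to that variance (1 - rho), compared with 1.0 for dropping either original sample.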

2.5

SUB-OPTIMAL TRANSFORMS

The practical limitations of the KLT require an investigation of sub-optimal transforms. The discrete Fourier transform (DFT), the discrete cosine transform (DCT) and the Walsh-Hadamard transform (WHT) are all useful sub-optimal transforms. A method to compare the performances of unitary transforms in transform coding applications is highly desirable.

Assuming the transform only affects the probability density function (pdf) of the sampled process slightly, the quantizer parameter $\delta$ is unchanged whether quantizing in the time or transform domain. Hence the dependence of the distortion on $\delta$ is the same in either domain. From equation 2.4, it is clear that a lower bound on the distortion of PCM is given by

eq. 2.8    $D_{pcm} = 2^{2\delta}\,2^{-2R}\,\frac{1}{N}\sum_{i=1}^{N}\sigma_i^2$

Defining the transform coding gain over PCM as the increase in SNR over PCM will enable useful comparisons.

$G_{tc} \triangleq \frac{D_{pcm}}{D_{tc}}$

The transform gain for any unitary transform is then the ratio of the arithmetic and geometric means of the variances of the transform coefficients as given above. Variances are just the diagonal elements of the co-variance matrix in the transform domain. Noll [4] modelled the long term statistics of voiced speech by a stationary tenth order Markov source. These results are shown in Figure 2-3 and illustrate the transform coding gain dependence on block length N. For larger block lengths, the KLT performance improves. The DCT performs nearly as well. The discrete slant transform (DST) and WHT show a very poor performance in transform coding applications. The DFT and DCT converge to KLT performance for a large block length. The distribution of the variances of the discrete cosine transform coefficients is known to converge asymptotically to the power density spectrum of the process [4]. This property is used in the LPC adaptive bit assignment algorithm.
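The coding gain can be evaluated directly from a covariance matrix. The sketch below departs from the thesis in one labelled respect: it assumes a first-order Markov source with correlation 0.9 instead of the tenth order speech model, because that case has a simple closed-form KLT gain to compare against. The block length of 8 is also just an example.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are basis vectors)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def transform_variances(a, r):
    """Diagonal of A R A^T: the variances of the transform coefficients."""
    n = len(r)
    return [sum(a[k][i] * r[i][j] * a[k][j] for i in range(n) for j in range(n))
            for k in range(n)]

def coding_gain(variances):
    """Ratio of the arithmetic mean to the geometric mean of the variances."""
    n = len(variances)
    arithmetic = sum(variances) / n
    geometric = math.exp(sum(math.log(v) for v in variances) / n)
    return arithmetic / geometric

n, rho = 8, 0.9                        # assumed block length and correlation
R = [[rho ** abs(i - j) for j in range(n)] for i in range(n)]
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

gain_pcm = coding_gain(transform_variances(identity, R))      # no transform
gain_dct = coding_gain(transform_variances(dct_matrix(n), R))
gain_klt = (1.0 - rho * rho) ** (-(n - 1) / n)   # closed form for this source model
print(gain_pcm, gain_dct, gain_klt)
```

With no transform the gain is exactly one. The DCT gain falls between that and the KLT figure, which acts as the upper limit for any unitary transform on this source.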

2.6

THE DISCRETE COSINE TRANSFORM

For any practical transform coding implementation, the computational savings offered by the DCT, and its near optimal performance, make it an attractive alternative to the KLT. The DCT and KLT are asymptotically equivalent if the data covariance matrices are Toeplitz; hence it is not surprising that the DCT performs nearly as well as the KLT when the data vectors are large [11]. The rate distortion criteria of the KLT and DCT are also comparable [12].

Formally the DCT of a real M point sequence v(k) can be defined

eq. 2.10

The i n v e r s e DCT i s g i v e n by

eq.

2.11

v(n)

=

k=O

The DCT of a sequence v(n) is closely related to a 2M point DFT of a related sequence u(n).
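The relation between the DCT and a 2M point DFT can be illustrated numerically. The sketch below makes two assumptions that are not confirmed by the text: the DCT is taken as the unnormalized cosine sum, and u(n) is formed by padding v(n) with M zeros, which is one common construction. The DFT is written out directly instead of using an FFT, to keep the example self-contained.

```python
import cmath
import math

def dct_direct(v):
    """Unnormalized cosine sum: sum over n of v(n) * cos(pi * (2n + 1) * k / (2M))."""
    m = len(v)
    return [sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * m))
                for n in range(m)) for k in range(m)]

def dft(u):
    """Direct DFT of a sequence."""
    size = len(u)
    return [sum(u[n] * cmath.exp(-2j * math.pi * n * k / size) for n in range(size))
            for k in range(size)]

def dct_via_dft(v):
    """Same cosine sum obtained from a 2M point DFT of the padded sequence."""
    m = len(v)
    u = list(v) + [0.0] * m            # assumed: u(n) = v(n) followed by M zeros
    big_u = dft(u)
    # A half-sample phase shift turns the DFT kernel into the DCT cosine.
    return [(cmath.exp(-1j * math.pi * k / (2 * m)) * big_u[k]).real
            for k in range(m)]

v = [0.9, 0.7, 0.2, -0.4, -0.6, -0.1]  # made-up frame, M = 6
direct = dct_direct(v)
from_dft = dct_via_dft(v)
print(max(abs(a - b) for a, b in zip(direct, from_dft)))
```

The two results agree to within rounding. In practice this route matters because the 2M point DFT can be computed with a fast algorithm.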

Interpretation

i n c l u d i n g formant

structure

transform

may

domain.

domain

and

therefore

pitch

of

striations

The f o l l o w i n g a n a l y s i s f o l l o w s

u(n) = v(n)

0