Adaptive Transform Coding of Speech Signals by Richard James Pinnell

B. Eng (McGill)

McGill University Montreal

Canada

May 1982

ACKNOWLEDGEMENTS

I would like to thank my thesis supervisor, Dr. P. Kabal, for his valuable encouragement and guidance in both the experimental aspect of this work and in the preparation of this thesis.

CHAPTER 1 INTRODUCTION

CHAPTER 2 THE THEORY OF TRANSFORM CODING

2.1 BASIS TRANSFORM CODING
2.2 QUANTIZATION STRATEGY
2.3 OPTIMAL BIT ASSIGNMENT
2.4 THE KARHUNEN-LOEVE TRANSFORM
2.5 SUB-OPTIMAL TRANSFORMS
2.6 THE DISCRETE COSINE TRANSFORM

CHAPTER 3 ADAPTIVE TRANSFORM CODING

3.1 LOG-LINEAR SMOOTHING TECHNIQUE
3.2 ALL-POLE MODEL
3.3 HOMOMORPHIC MODEL

CHAPTER 4 CODER EVALUATION

SIMULATION PROCEDURE
THE LPC CODER
    Coder Operation
    Reducing Transform Complexity
    Side Information Interpolation
    Side Information Parameter Statistics And Quantization
    The Low-Pass Effect
    Frame Boundary Discontinuities
    Transform Coefficient Statistics
    Subjective Effect Of Pre-emphasis And Spectral Shaping
THE HOMOMORPHIC CODER
    Coder Operation
    Coder Performance

CHAPTER 5 CONCLUSIONS

LIST OF FIGURES

TRANSFORM CODING STRUCTURE
PRE-SCALING EFFECT
SUB-OPTIMAL TRANSFORM PERFORMANCE
BLOCK BOUNDARY DISTORTION
GENERAL STRUCTURE OF ADAPTIVE TRANSFORM CODING
LOG-LINEAR SMOOTHING
LPC ADAPTIVE TRANSFORM CODER
LPC PITCH MODEL
HOMOMORPHIC SIDE INFORMATION PROCESSING
HOMOMORPHIC ADAPTIVE TRANSFORM CODER STRUCTURE
LPC ADAPTIVE TRANSFORM CODER WAVEFORMS
VISIBLE BIT ASSIGNMENT
CODER SNR PERFORMANCE
CODER SNR PERFORMANCE
SIDE INFORMATION INTERPOLATION
REFLECTION COEFFICIENT HISTOGRAMS
AVERAGE ENERGY PARAMETER
CODER SNR PERFORMANCE
CODER SNR PERFORMANCE
ANALYSIS FRAME WINDOWING
FRAME BOUNDARY DISCONTINUITY
TRANSFORM COEFFICIENT HISTOGRAM
TRANSFORM COEFFICIENT HISTOGRAMS
TRANSFORM COEFFICIENT QUANTIZER PERFORMANCE
HOMOMORPHIC ADAPTIVE TRANSFORM CODER WAVEFORMS
CODER SNR PERFORMANCE
CODER SNR PERFORMANCE

ABSTRACT

Frequency domain coding techniques have recently received considerable attention. Prominent among these techniques, adaptive transform coding offers excellent speech quality for low to medium data rates (8-16 kb/sec). Adaptive transform coders divide speech into frequency components by using a suitable transform and transmit these components using pulse code modulation (PCM). Three basic issues in the design of adaptive transform coders are:

(1) Selection of the best transform
(2) Selection of the best quantization strategy
(3) Selection of a spectral parameterization technique

This thesis discusses design considerations with emphasis on finding variants of adaptive transform algorithms amenable to hardware implementation. In this context coder performance using reduced frame lengths is presented. Objective and subjective performance reduction, caused by frame boundary discontinuities and low-pass filtering effects, are investigated as the two primary sources of perceptual distortion. Results from computer simulations of adaptive transform coders using all-pole and homomorphic spectral fits are presented.

SOMMAIRE

Frequency domain coding techniques have recently received considerable attention. Adaptive transform coding holds a prominent place among them because it provides excellent speech transmission quality at low to medium rates (8-16 kb/sec). Adaptive transform coding systems segment speech into frequency components by means of a suitable transform and transmit these components using pulse code modulation (PCM). Three fundamental questions are associated with adaptive transform coders:

(1) Selection of the best transform
(2) Selection of the best quantization strategy
(3) Selection of a technique for defining the spectral parameters

This thesis deals with design considerations and emphasizes finding variants of adaptive transform algorithms that can be realized in hardware. In this context, coder performance using reduced frame lengths is presented. The reductions in objective and subjective performance resulting from frame boundary discontinuities and low-pass filtering effects, regarded as the principal sources of perceptual distortion, are analysed. Finally, the results of two computer simulations of adaptive transform coders, using homomorphic and all-pole spectral fits, are examined.

CHAPTER 1 INTRODUCTION

The objective of speech coding is to transmit the highest quality speech over the least possible channel capacity while employing the least complex coder. Coder efficiency in channel utilisation is, however, directly linked to coder complexity and cost. Fortunately, advances in LSI (large scale integration) technology are now making available more sophisticated digital signal processing devices at reduced costs. Thus, telephone networks are moving toward digital switching and processing of voice signals. Investigations of more complex coding schemes are continuing in the light of these recent LSI technology advances. This new technology offers greater system flexibility and considerable cost advantage.

Speech coders can be divided into two distinct classes: waveform coders and source coders (vocoders). Waveform coders strive for facsimile reproduction of the signal waveform. By observing the statistics of a signal, the waveform coder can be tailored to the signal, resulting in reduced coding error and a more signal specific coder.

Source coders employ a minimal parametric description derived from a hypothesis of speech production. Consequently, these units can be operated at lower transmission rates. Source coders are also more sensitive to speaker variation and background noise than are those of the waveform classification.

In speech coding, transmission rates determine the class of coders which is more effective. Above 5 kb/sec waveform coders offer communication and toll quality speech. Speech quality of waveform coders declines very rapidly below this figure. At lower rates (below 5 kb/sec) source coders can be used to produce synthetic quality speech [1].

Waveform coding can be performed in either time or frequency domains. Two examples of the latter are subband and adaptive transform coders. Frequency domain coding is accomplished by dividing the input waveform into a number of frequency bands by using a filter bank, or into frequency components by using a block transformation. These frequency components are then quantized and encoded. A replica of the input signal can be re-synthesized by decoding the frequency components, followed by summation for a filter bank or by inverse transformation if a transform was originally used. Both methods assume the input waveform is quasi-stationary and can be locally modelled by a short time spectrum. Perceptually important components of the short time spectrum must be isolated and transmitted without incurring excessive delay or distortion.

Additional demands are placed on speech coding schemes by the context in which they are used. A likely area of application for speech coders is in telephony. Since a telecommunications carrier has little control over the type of signals the network will support, it is highly desirable that speech coders support a variety of input signals including modem signals. In a military context encryption is made possible by the digital nature of speech coders. Since good speech quality is not essential, maximum speech compression is one of the primary objectives.

The mathematical principles behind transform coding were first formulated by Huang in a paper entitled "Block Quantization of Correlated Gaussian Random Variables" [2]. Huang develops a procedure for efficiently quantizing blocks of correlated Gaussian random variables. A linear transformation first converts the dependent random variables into independent random variables. Then the transformed variables are quantized one-by-one until the number of bits allocated for the block is exhausted. A second linear transformation constructs (from the quantized values) the best estimate of the original variables in a mean square error sense. Huang develops the best choice for the transform, and an approximate expression is derived for the number of bits assigned to each of the quantized variables. Segall [3], in a paper entitled "Bit Allocation and Encoding for Vector Sources", obtained a more precise expression for the allocation of available bits to quantization of the transformed variables.

Zelinski and Noll [4] developed an adaptive transform speech coder based on the principles discussed by Huang and Segall. Their important contribution was an adaptive quantization strategy employing the discrete cosine transform. The adaptation is controlled by a short term spectrum obtained from the transform coefficients prior to quantization. The short term spectrum is then parameterized and sent to the receiver as side information. A second paper by Zelinski and Noll [5] presents refinements to the side information parameterization technique. The paper discusses improvements to the quantization strategy aimed at improving the subjective performance of the coder.

Two papers by Tribolet and Crochiere [6,7] discuss adaptive transform coders which employ all-pole modelling of the short term spectrum. The papers compare sub-band coders and adaptive transform coders in the context of an analysis/synthesis framework.

Cox and Crochiere [8], in a paper entitled "Real-Time Simulation of Adaptive Transform Coding", develop a homomorphic model for parameterization of the short term spectrum. Cox claims the technique performs as well as the all-pole model and is easier to implement in a real time coding context.

Numerous authors have contributed to the development of transform coding directly and indirectly, but the above papers are the most often quoted as references.

This thesis reviews current transform coding theory and is directed towards finding variants of adaptive transform algorithms amenable to hardware implementation. Chapter 2 presents the theory and basic structure of transform coding. The applicability of various transforms and bit assignment strategies is discussed from a theoretical standpoint for their usefulness in coding speech. Chapter 3 considers various adaptive transform coding strategies employing a short term spectrum to adapt the transform coefficient quantizers. Three techniques for parameterizing the short term spectrum for transmission to the receiver are presented. In Chapter 4 the results of listening tests are used to evaluate coded speech generated by computer simulations. These coders use either all-pole or homomorphic modelling of the short term spectrum. Impairments in speech quality and their causes are identified. Techniques to combat these impairments are implemented. The effect of reduced frame sizes on frame boundary discontinuities is discussed. Quantization noise shaping and noise insertion into low energy frequency bands are studied. Chapter 5 summarizes the important results and presents conclusions based on perceptual observations and results obtained from the simulation of these coders.

CHAPTER 2 THE THEORY OF TRANSFORM CODING

In this section the theory of transform coding is developed and related topics are discussed. The treatment emphasizes important elements including the basic structure of transform coding, selection and justification of the mean square error distortion criteria, and discussion of an optimal transform and bit assignment rule. Sub-optimal transforms are introduced and compared with the discrete cosine transform. The latter is known to be a good choice and resistant to frame discontinuity distortion. The following presentation includes sufficient theory to support the subject matter.

2.1 BASIS TRANSFORM CODING

Mathematical concepts of transform coders are depicted in Figure 2-1. A frame buffer arranges N successive source samples x(n) into the source vector $X$. The speech is assumed to be bandlimited with the sampler satisfying the sampling theorem in order to avoid aliasing. A linear transformation is performed on the source vector $X$ to obtain the transform coefficient vector $Y$. Such an operation can be represented by the matrix equation 2.1 where $A$ is unitary.

eq. 2.1    $Y = A X$

Reconstructed output samples are obtained from the quantized transform vector $\hat{Y}$ by inverse transformation. The matrix equation representing this operation is

eq. 2.2    $\hat{X} = A^{-1} \hat{Y}$

Because $A$ is unitary, the overall mean squared distortion of the coding scheme is equal to the total quantization error i.e.

eq. 2.3    $D = E\{\lVert X - \hat{X} \rVert^2\} = E\{\lVert Y - \hat{Y} \rVert^2\}$

Minimization of distortion requires an appropriate quantization strategy and transform. As will be shown later, a necessary condition for minimum coding error is that every transform coefficient suffer the same amount of distortion.

FIGURE 2-1 TRANSFORM CODING BASICS: A) GENERAL TRANSFORM CODING, B) BASIS RESTRICTED TRANSFORM CODING
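The structure of Section 2.1 (frame buffer, unitary transform, quantizer, inverse transform) can be sketched numerically. The following is only an illustration under stated assumptions: an orthonormal DCT matrix stands in for the unitary matrix A, a fixed-step uniform quantizer stands in for the quantizer, and random samples stand in for speech. The names `dct_matrix` and `code_frame` are hypothetical. It checks that the reconstruction error equals the quantization error in the transform domain.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix; stands in for the unitary matrix A (an assumption)."""
    k = np.arange(n).reshape(-1, 1)   # coefficient index (rows)
    m = np.arange(n).reshape(1, -1)   # sample index (columns)
    a = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    a[0, :] /= np.sqrt(2.0)           # scale the DC row so that A A^T = I
    return a


def code_frame(x, a, step):
    """Transform one frame, quantize each coefficient uniformly, and invert."""
    y = a @ x                          # forward transform, Y = A X
    y_hat = step * np.round(y / step)  # hypothetical fixed-step quantizer
    x_hat = a.T @ y_hat                # inverse of a real unitary A is its transpose
    return y, y_hat, x_hat


rng = np.random.default_rng(0)
N = 16
A = dct_matrix(N)
x = rng.standard_normal(N)             # stand-in for N speech samples
y, y_hat, x_hat = code_frame(x, A, step=0.25)

err_time = np.sum((x - x_hat) ** 2)        # reconstruction error
err_transform = np.sum((y - y_hat) ** 2)   # quantization error
print(err_time, err_transform)
```

The two printed values agree to rounding error for any unitary A, which is why the quantizer design can be carried out entirely in the transform domain.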

2.2 QUANTIZATION STRATEGY

Quantization strategy refers to the technique employed to quantize the transform coefficients. Basis restricted transform coding schemes quantize the coefficients $y_i$, i = 1,2,3...N independently. Only the basis restricted quantization schemes are considered here. This can be rationalized by considering the source vector $X$ to be an N dimensional Gaussian random variable with zero mean. A nonsingular matrix $A$ operates on $X$ to yield the transform vector $Y$ of uncorrelated random variables. Since $X$ is Gaussian, $Y$ is Gaussian and its components $y_i$ are not only uncorrelated but actually independent. Huang [2] shows that the basis restricted quantization schemes are optimal when the transform coefficients are independent.

The quantization strategy is characterized by adaptation and bit assignment rules. Overall distortion given in equation 2.3 can be reduced by assigning quantizers with (generally different) number of levels to each of the N transform coefficients. The distribution of the number of quantizer levels is known as the bit assignment. The step size adaptation of the quantizers is accomplished by pre-scaling the coefficients by an estimate of a suitable quantization step size for each transform coefficient.

The distribution of the coefficients, in general, depends on the transform and source signal. Analytic calculation of the distribution function is too difficult to yield meaningful results except in special cases. Goldberg and Cosell [9] obtained numerical results for discrete cosine transform (DCT) coefficients showing the coefficient distribution to lie between the Gaussian and Laplace distribution. However, the effect of pre-scaling the transform coefficients is to make the distribution bi-modal about +1 and -1. In the limit of perfect energy estimation, the distribution approaches a pair of impulses at +1 and -1. The pre-scaling effect is illustrated in Figure 2-2 and is valid for any distribution. In any event, the distribution of the scaled transform coefficients is such that an improvement over single variable time domain quantization (PCM) can be expected. This is explained in Section 2.5 by the concept of transform coding gain.

In order to minimize the distortion measure, both the transform and the quantization bit assignment must be optimized. The optimal linear transform will depend on the distortion measure selected. Further, the optimal transformation resulting from the selection of the distortion measure must produce decorrelated transform coefficients to validate the basis restricted approach. Minimization of the mean square error (MSE) distortion measure results in an optimal transform with this property. The MSE distortion measure is desirable because it correlates well with perceptual speech quality. This measure is minimized on a block-by-block basis, so selection of the MSE measure maximizes the signal-to-noise ratio (SNR) for each transform block. Thus the transform coding scheme maximizes the segmental SNR, where the segments are the analysis frames. Segmental SNR is a better indicator of the perceptual quality of speech than SNR. Nevertheless, selection of the MSE distortion measure may result in introduction of unacceptable perceptual distortions into the coded speech. Perceptual factors can be taken into account by modifying the bit assignment.
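The distinction between SNR and segmental SNR drawn above can be illustrated with a short sketch. It assumes segmental SNR is the mean of the per-frame SNR values in dB, which is the usual definition but is not spelled out in the text. The test signal (a loud frame followed by a quiet frame, with the same noise level in both) and the function names are purely illustrative.

```python
import numpy as np


def snr_db(x, x_hat):
    """Conventional SNR in dB over the whole signal."""
    noise = np.sum((x - x_hat) ** 2)
    return 10.0 * np.log10(np.sum(x ** 2) / noise)


def segmental_snr_db(x, x_hat, frame_len):
    """Mean of the per-frame SNR values in dB (assumed definition)."""
    n_frames = len(x) // frame_len
    vals = []
    for i in range(n_frames):
        s = slice(i * frame_len, (i + 1) * frame_len)
        vals.append(snr_db(x[s], x_hat[s]))
    return float(np.mean(vals))


# Illustrative signal: one loud frame and one quiet frame of a sinusoid.
n = np.arange(160)
tone = np.sin(2 * np.pi * n / 20)
x = np.concatenate([10.0 * tone, 0.1 * tone])
noise = 0.05 * (-1.0) ** np.arange(len(x))   # same noise level in both frames
x_hat = x + noise

overall = snr_db(x, x_hat)
seg = segmental_snr_db(x, x_hat, frame_len=160)
print(overall, seg)
```

For this signal the overall SNR is about 40 dB while the segmental SNR is about 23 dB. The loud frame dominates the overall figure and hides the poor quality of the quiet frame, which the segmental measure exposes.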

FIGURE 2-2 PRE-SCALING EFFECT: A) TRANSFORM COEFFICIENT DISTRIBUTION, B) PRE-SCALED TRANSFORM COEFFICIENT DISTRIBUTION (good estimate), C) PRE-SCALED TRANSFORM COEFFICIENT DISTRIBUTION (better estimate)

2.3 OPTIMAL BIT ASSIGNMENT

The transform coefficient $y_i$ with variance $\sigma_i^2$ requires $R_i$ bits/sample if the mean squared distortion $D_i$ is not to be exceeded. $R_i$ is given by:

eq. 2.4    $R_i = \delta + \frac{1}{2}\log_2\frac{\sigma_i^2}{D_i}$

The second term is the minimal rate for coding independent identically distributed Gaussian random variables. The correction factor $\delta$ depends on the type of quantizer and the probability density function (pdf) of the signal. Neglecting the dependence of $R_i$ on $\delta$ and substituting for $D_i$, the optimal number of bits for quantizer $Q_i$ is found by minimizing the average distortion given by

$D = \frac{1}{N}\sum_{i=1}^{N} D_i$

with the constraint of a fixed average bit rate i.e.

$R = \frac{1}{N}\sum_{i=1}^{N} R_i$

The expression for the average distortion is minimized subject to the constraint of a fixed bit rate by treating $R_i$ as a continuous variable and using an undetermined multiplier $\beta$. One obtains:

eq. 2.5    $\frac{\partial}{\partial R_i}\left\{\frac{1}{N}\sum_{i=1}^{N}\sigma_i^2 e^{-2R_i\ln 2} + \beta\sum_{i=1}^{N} R_i\right\} = 0$

It follows that

eq. 2.6    $\sigma_i^2 e^{-2R_i\ln 2} = \frac{N\beta}{2\ln 2} = \mathrm{Const}$    for i = 1,2...N

Solving equation 2.6 for $R_i$ and evaluating the constant using the constraint of a fixed bit rate we obtain:

eq. 2.7    $R_i = R + \frac{1}{2}\log_2\frac{\sigma_i^2}{\left[\prod_{j=1}^{N}\sigma_j^2\right]^{1/N}}$

Given an optimal bit assignment, the lower bound on distortion follows from equations 2.4 and 2.7 and is given by

$D_{tc} = 2^{2\delta}\,2^{-2R}\left[\prod_{i=1}^{N}\sigma_i^2\right]^{1/N}$

Distortion introduced by a transform coding scheme depends on the distribution of the variances. In particular, the average distortion is determined by the geometric mean of the variances.
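The closed-form bit assignment can be checked numerically. The sketch below assumes the standard form of the rule, $R_i = R + \frac{1}{2}\log_2(\sigma_i^2 / \text{geometric mean of the variances})$, and neglects the correction factor $\delta$, so each coefficient distortion is taken as $\sigma_i^2 2^{-2R_i}$. The variances and the function name `optimal_bits` are illustrative only.

```python
import numpy as np


def optimal_bits(variances, avg_rate):
    """Continuous-valued bit assignment: avg_rate + 0.5*log2(variance / geometric mean)."""
    v = np.asarray(variances, dtype=float)
    log2_gm = np.mean(np.log2(v))          # log2 of the geometric mean
    return avg_rate + 0.5 * (np.log2(v) - log2_gm)


variances = np.array([16.0, 4.0, 1.0, 0.25])   # illustrative coefficient variances
R = 2.0                                        # average bits per coefficient
bits = optimal_bits(variances, R)

# With delta neglected, each coefficient's distortion is sigma_i^2 * 2^(-2 R_i).
dist = variances * 2.0 ** (-2.0 * bits)
geometric_mean = np.exp(np.mean(np.log(variances)))

print(bits)   # [3.5 2.5 1.5 0.5]
print(dist)   # every entry equals geometric_mean * 2^(-2R) = 0.125
```

Every coefficient ends up with the same distortion, which is the necessary condition for minimum coding error stated in Section 2.1, and that common value is set by the geometric mean of the variances.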

The optimal quantization scheme discussed above was originally presented in a paper by Huang and Schultheiss [2]. Fractional and even negative bit assignments can result because $R_i$ is treated as a continuous variable. Segall [3] discusses the optimal bit assignment under the constraint of positive integer bit assignment.

Bits are assigned optimally in the Fortran simulation developed for this thesis. The technique uses k bit quantizers for the transform coefficients. The resulting mean square errors $\varepsilon(k)$ are tabulated in Max's paper [10]. It follows that the marginal return for the ith transform coefficient can be defined as the reduction in its mean square error obtained by assigning it one additional bit. Arranging the marginal returns in descending order, and assigning bits one-by-one, the global minimum mean square error will be achieved independently of the distribution of the transform coefficients.
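The one-by-one assignment described above can be sketched as a greedy procedure. This is not the thesis's Fortran routine. It assumes the marginal return of giving coefficient i its k-th bit is $\sigma_i^2[\varepsilon(k-1) - \varepsilon(k)]$, where $\varepsilon(k)$ is the mean square error of a k bit quantizer for a unit-variance Gaussian input. The table values below are the commonly quoted figures for Max quantizers and should be checked against [10] before use. The function name `greedy_bits` is hypothetical.

```python
import heapq

# Assumed mean square error of a k bit Max quantizer, unit-variance Gaussian
# input, for k = 0..4 (k = 0 means the coefficient is not transmitted).
MAX_MSE = [1.0, 0.3634, 0.1175, 0.03454, 0.009497]


def greedy_bits(variances, total_bits):
    """Assign bits one at a time to the coefficient with the largest marginal return."""
    bits = [0] * len(variances)
    # Max-heap (via negated keys) of the return for each coefficient's next bit.
    heap = [(-(v * (MAX_MSE[0] - MAX_MSE[1])), i) for i, v in enumerate(variances)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        if not heap:                 # every coefficient is at the table limit
            break
        _, i = heapq.heappop(heap)
        bits[i] += 1
        k = bits[i]
        if k + 1 < len(MAX_MSE):     # another bit is still available for i
            gain = variances[i] * (MAX_MSE[k] - MAX_MSE[k + 1])
            heapq.heappush(heap, (-gain, i))
    return bits


variances = [16.0, 4.0, 1.0, 0.25]   # illustrative coefficient variances
assignment = greedy_bits(variances, total_bits=8)
print(assignment)                     # [4, 3, 1, 0]
```

Unlike the continuous solution, the result is always a non-negative integer assignment, and low variance coefficients may receive no bits at all.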

2.4 THE KARHUNEN-LOEVE TRANSFORM

Consider a discrete signal of N sampled values. This signal can be represented as a point in an N dimensional space. Each sampled value is then a component of the N vector $X$ which represents the signal in this space. Next consider a unitary transform (T) operating on the data vector $X$ resulting in the transform vector $Y$. The objective in data compression is to select a subset of M components of $Y$ where M is less than N. The remaining components are discarded and this introduces some distortion. A unitary transform which minimizes the mean square error caused by discarding components is the objective. The Karhunen-Loeve transform (KLT) meets this objective. Some of the important properties of the KLT are described below.

The KLT is a data dependent transformation whose basis vectors are eigenvectors of the autocorrelation matrix of the $X$ process. This transform diagonalizes the autocorrelation matrix of the transformed vector $Y$, which means the components are uncorrelated and, by the Gaussian assumption, independent. Each transform coefficient can then be quantized independently without losing performance. It is possible to approximate $X$ in a lower dimensional space by discarding transform coefficients. The mean square error resulting from this approximation is the sum of the variances of the discarded components. If only the components of $Y$ with the lowest variances are discarded, the approximation is optimal in a mean square error sense.

Two limitations of the Karhunen-Loeve transform are that it is computationally burdensome and requires solutions of eigenvector problems whose solutions may be numerically unstable. More precisely, the KLT requires a knowledge of the correlation function at the receiver to perform an inverse transformation. The correlation function is not generally available at the receiver.
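The two KLT properties described above, decorrelation and the truncation error, can be verified on a small example. The sketch assumes a first-order Markov autocorrelation matrix with entries $\rho^{|i-j|}$ purely for illustration. It is not the speech model used in the thesis, and the function name `klt_basis` is hypothetical.

```python
import numpy as np


def klt_basis(r):
    """Rows are eigenvectors of the autocorrelation matrix r, sorted by decreasing variance."""
    eigvals, eigvecs = np.linalg.eigh(r)    # r is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order].T


# Illustrative autocorrelation matrix (assumption): R[i, j] = rho^|i - j|.
N, rho = 8, 0.9
idx = np.arange(N)
R = rho ** np.abs(idx[:, None] - idx[None, :])

variances, T = klt_basis(R)

# Property 1: the transformed autocorrelation matrix T R T^T is diagonal.
R_y = T @ R @ T.T

# Property 2: keeping the M highest-variance components gives a mean square
# error equal to the sum of the discarded variances.
M = 3
P = T[:M].T @ T[:M]                          # projector onto the kept basis vectors
I = np.eye(N)
truncation_mse = np.trace((I - P) @ R @ (I - P).T)
discarded = variances[M:].sum()
print(truncation_mse, discarded)
```

The eigen-decomposition must be recomputed whenever the signal statistics change, which is the computational burden noted in the text.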

2.5 SUB-OPTIMAL TRANSFORMS

The practical limitations of the KLT require an investigation of sub-optimal transforms. The discrete Fourier transform (DFT), the discrete cosine transform (DCT) and the Walsh-Hadamard transform (WHT) are all useful sub-optimal transforms. A method to compare the performances of unitary transforms in transform coding applications is highly desirable.

Assuming the transform only affects the probability density function (pdf) of the sampled process slightly, the quantizer parameter $\delta$ is unchanged whether quantizing in the time or transform domain. Hence the dependence of the distortion on $\delta$ is the same in either domain. From equation 2.4, it is clear that a lower bound on the distortion of a PCM is given by

eq. 2.8    $D_{pcm} = 2^{2\delta}\,2^{-2R}\,\frac{1}{N}\sum_{i=1}^{N}\sigma_i^2$

Defining the transform coding gain over PCM as the increase in SNR over PCM will enable useful comparisons.

$G_{tc} \triangleq \frac{D_{pcm}}{D_{tc}} = \frac{\frac{1}{N}\sum_{i=1}^{N}\sigma_i^2}{\left[\prod_{i=1}^{N}\sigma_i^2\right]^{1/N}}$

The transform gain for any unitary transform is then the ratio of the arithmetic and geometric means of the variances of the transform coefficients as given above. Variances are just the diagonal elements of the covariance matrix in the transform domain. Noll [4] modelled the long term statistics of voiced speech by a stationary tenth order Markov source. These results are shown in Figure 2-3 and illustrate the transform coding gain dependence on block length N. For larger block lengths, the KLT performance improves. The DCT performs nearly as well. The DFT and DCT converge to KLT performance for a large block length. The discrete slant transform (DST) and WHT show a very poor performance in transform coding applications. The distribution of the variances of the discrete cosine transform coefficients is known to converge asymptotically to the power density spectrum of the process [4]. This property is used in the LPC adaptive bit assignment algorithm.
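The transform coding gain, taken as the ratio of the arithmetic mean to the geometric mean of the coefficient variances, is straightforward to evaluate for a given covariance matrix. The sketch below compares the identity (PCM), the DCT and the KLT. It assumes a first-order Markov source with correlation 0.9 rather than the tenth order model of [4], so the numbers are illustrative only. The helper names are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are basis vectors)."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    a = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    a[0, :] /= np.sqrt(2.0)
    return a


def coding_gain(transform, r):
    """Arithmetic mean over geometric mean of the transform-domain variances."""
    var = np.diag(transform @ r @ transform.T)   # diagonal of the covariance
    return np.mean(var) / np.exp(np.mean(np.log(var)))


# Illustrative source (assumption): first-order Markov, correlation 0.9.
N, rho = 16, 0.9
idx = np.arange(N)
R = rho ** np.abs(idx[:, None] - idx[None, :])

klt = np.linalg.eigh(R)[1].T                     # rows are eigenvectors of R
gain_pcm = coding_gain(np.eye(N), R)             # no transform: gain of 1
gain_dct = coding_gain(dct_matrix(N), R)
gain_klt = coding_gain(klt, R)

for name, g in [("PCM", gain_pcm), ("DCT", gain_dct), ("KLT", gain_klt)]:
    print(name, 10.0 * np.log10(g), "dB")
```

For this source the DCT gain falls only slightly short of the KLT gain, in line with the observation that the DCT performs nearly as well as the optimal transform.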

2.6 THE DISCRETE COSINE TRANSFORM

For any practical transform coding implementation, the computational savings offered by the DCT and its near optimal performance make it an attractive alternative to the KLT. The DCT and KLT are asymptotically equivalent if the data covariance matrices are Toeplitz, hence it is not surprising that the DCT performs nearly as well as the KLT when the data vectors are large [11]. The rate distortion criteria of the KLT and DCT are also comparable [12].

Formally the DCT of a real M point sequence v(n) can be defined

eq. 2.10

The inverse DCT is given by

eq. 2.11    v(n) =

The DCT of a sequence v(n) is closely related to a 2M point DFT of a related sequence u(n). Interpretation of the transform domain, including formant structure and pitch striations, may therefore ... domain. The following analysis follows ...

u(n) = v(n)    0 ...
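The relation between the DCT and a 2M point DFT can be checked numerically. The sketch rests on two assumptions that may differ from the thesis: the DCT is taken as the unnormalized cosine sum $\sum_n v(n)\cos[(2n+1)\pi k/2M]$, so the scaling of equation 2.10 may differ, and u(n) is taken to be v(n) followed by M zeros. Under those assumptions the DCT is the real part of the phase-shifted DFT of u(n).

```python
import numpy as np


def dct_direct(v):
    """Unnormalized DCT-II: sum over n of v(n) cos((2n+1) pi k / 2M) (assumed form)."""
    M = len(v)
    n = np.arange(M)
    k = n.reshape(-1, 1)
    return (np.cos(np.pi * (2 * n + 1) * k / (2 * M)) * v).sum(axis=1)


def dct_via_dft(v):
    """Same DCT from a 2M point DFT of u(n), assumed to be v(n) padded with M zeros."""
    M = len(v)
    u = np.concatenate([v, np.zeros(M)])
    U = np.fft.fft(u)[:M]                      # first M bins of the 2M point DFT
    k = np.arange(M)
    return np.real(np.exp(-1j * np.pi * k / (2 * M)) * U)


rng = np.random.default_rng(1)
v = rng.standard_normal(32)                    # stand-in for one analysis frame
print(np.max(np.abs(dct_direct(v) - dct_via_dft(v))))
```

The two routes agree to rounding error, so a fast Fourier transform routine can be used to compute the DCT.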
