statistical normalisation of phase-based feature

0 downloads 0 Views 2MB Size Report
ROBUST SPEECH RECOGNITION. Erfan Loweimi, Jon Barker and Thomas Hain. Speech and Hearing Research Group (SPandH), University of Sheffield, ...
STATISTICAL NORMALISATION OF PHASE-BASED FEATURE REPRESENTATION FOR ROBUST SPEECH RECOGNITION Erfan Loweimi, Jon Barker and Thomas Hain Speech and Hearing Research Group (SPandH), University of Sheffield, Sheffield, UK {e.loweimi1, j.p.barker, t.hain}@sheffield.ac.uk

Distribution Evolution Along MFCC Pipeline

** This is paradoxical … – Signal and its information are recoverable from the phase through phase-only signal reconstruction – One-to-one relationship between phase and magnitude spectra necessitate both carry the same amount of information

– using all Aurora2 training data ( > 1.4 M frames) – with suboptimal assumption that dimensions are independent (for mathematical convenience) 1.2

1e 5

6

PDF of Magnitude Spectrum

Normalised Count

1.0

4

0.6

3

0.4

2

0.2

1 0.5

1.0

1.5

2.5

3.0

3.5

0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0

LogFBE(3)

PDF of LogFBE

0.4

LogFBE(9)

0.3

LogFBE(15)

0.2

LogFBE(21)

0.1 0.0 0

2

4

6

8

10

12

14

3.5

FBE(9) FBE(21)

0.5

1.0

1.5

2.0

C(1)

PDF of MFCC

C(5) C(10)

8

6

∆ C(5)

6

∆∆ C(5)

2.0

∆ C(8)

5

∆∆ C(8)

1.5

∆ C(10)

2.5

4

0.8

Gauss. Lapla.

0.6 0.4 0.2

3

2

1

0

1

2

3

0.0

4

3

2

1

0

1

– Skewness almost zero – Gaussian < Kurtosis < Laplacian

Statistical Normalisation

4

2

0

2

Principle Equation ...

4

PDF of Delta-Delta

∆∆ C(10)

3

1.0

2

0.5

1

0.0 1.5

1.0

0.5

0.0

0.5

1.0

0

1.5

0.6

0.4

0.2

0.0

0.2

0.4

0.6

Distribution Evolution of Phase-Based Feature

Normalised Count

0.45 0.40 0.35 0.30 0.25 0.20 0.15 0.10 0.05 0.00

B

Non-linearity

D

DCT

Normalised Count

GDF

Temporal Derivative

Post-processing

Recognition Results

E

0.6

arg[X MinPh (10)]

0.5

arg[XVT (10)]

arg[X MinPh (35)]

0.4

arg[XVT (35)]

arg[X MinPh (55)] arg[XMinPh(110)]

2

0

2

arg[XVT (110)]

0.2

1 4

arg[XVT (55)]

0.3

2

0.1

4

6

0.0

4

3

2

1

0

1

2

3

1.2

GDVT (10)

3.0

GDVT (35)

2.5

1.0

3

0.5 0.2

0.0

0.2

0.4

(20)

0.4

B

0.2

A

0.6

(17)

0.6

GDVT (110)

1.5

(13)

0.8

GDVT (55)

2.0

(9)

1.0

0.0

0.8

1.2

1.5

1.0

0.5

0.0

0.5

1.0

1.5

2.0

1.0

(9)

1.0 Normalised Count

A

FilterBank

3.5

C

(13)

0.8 0.6

C(1)

0.8

D

C(5)

(17)

0.6

C(8)

(20)

0.4

C(10)

0.4

0.2

0.2 0.0

1.5

1.0

0.5

0.0

0.5

1.0

0.0

1.5

5

2

1

0

1

14

∆ C(1)

12

∆∆ C(1)

∆ C(5)

10

∆∆ C(5)

3

∆ C(8)

8

∆∆ C(8)

2

∆ C(10)

6

∆∆ C(10)

4 Normalised Count

Extract

Frame Blocking And Windowing

Pre-emphasis

0.0

4

1

E1

0

0.4

0.2

0.0

0.2

0.4

E2

2 0.6

0

0.2

0.1

0.0

0.1

0.2

Distribution of Principle (Wrapped) Phase (ARG) N o rm a lis e d C o u n t

ARG[X(45)]

ARG[X(55)]

ARG[X(80)]

ARG[X(100)]

0.166 0.164 0.162 0.160 0.158 0.156

N o rm a lis e d C o u n t

0.154

3

2

1

0

1

2

3

0.40 0.35

ARG[X(ω)]

0.30

Gauss. Lapla.

0.25 0.20 0.15 0.10 0.05 0.00

3

2

1

0

1

2

3

- Uniform distribution for phase spectrum is – paradoxical ! – artefact of wrapping !

N o r m a lis e d C o u n t

Distribution of Wrapped Magnitude Spectrum 1.2

1e

5

1.0 0.8 0.6 0.4 0.2

N o r m a lis e d C o u n t

0.0 0

1

2

3

4

5 1e5

0.40 0.35

wrapped

0.30

Gauss. Lapla.

0.25 0.20 0.15 0.10 0.05 0.00

www.PosterPresentations.com

Gauss. Lapla.

C(8)

∆∆ C(1)

4.0

TEMPLATE DESIGN © 2008

C3

1e6

7

PDF of Delta

** The proposed approach returns up to 18.6% relative WER reduction compared with previous reported results [Table 1-4]

3

1.0

FBE(15)

∆ C(1)

3.0

C

2

C2

8

** Based on statistical behaviour of the phase-based features in clean condition, 3 normalisation schemes are applied to alleviate the effect of noise

1

1.2

In Clean Condition …

FBE(3)

PDF of Filter Bank Energies (FBE)

0 0.0

4.0 1e5

0.5

** We show that phase spectrum, contrary to the general belief, has a bell-shaped distribution

Source-Filter Separation in the Phase Domain

2.0

1.4 1.2 1.0 0.8 0.6 0.4 0.2 0.0

1e 6

5

0.8

0.0 0.0

Normalised Count

** Uniform distribution implicitly means that phase spectrum has maximum level of entropy and literally structureless/informationless

Distributions (Histograms) are computed

Normalised Count

** Phase Spectrum is generally assumed to have a Uniform distribution

Distribution of Phase-based Features N o rm a lis e d C o u n t

Abstract

3

2

1

0

1

2

3

– Baseline: BMFGDVT [Interspeech 2015] – Normalisations are applied on both Train and Test data

2

3

Suggest Documents