Structured Support Vector Machines for Noise Robust Continuous ...

4 downloads 36 Views 1MB Size Report
... Speech Recognition. S.-X. Austin Zhang and Mark Gales. 31 August 2011. Cambridge University Engineering Department. Interspeech 2011 Florence ...
Structured Support Vector Machines for Noise Robust Continuous Speech Recognition S.-X. Austin Zhang and Mark Gales 31 August 2011

Cambridge University Engineering Department

Interspeech 2011 Florence

Structured SVMs for Noise Robust CSR

SVMs for Continuous Speech Recognition • SVMs generate boundaries between classes – For isolated digit recognition, no problem (10 classes) sequence kernels: map to fixed length – Continuous speech? too many classes! (e.g., 6 digits ⇒ 106 classes) • Simplest approach:

FOUR

ONE

SEVEN

ONE

ONE

ONE

ZERO SIL

ZERO SIL

ZERO SIL

1. Using HMM-based Segmentation 2. For each segment, isolated classification

Problem: Restricted to one fixed segmentation • Alternatively, incorporate structures into SVM classes – Structured SVM ! Cambridge University Engineering Department

Interspeech 2011 Florence

1

Structured SVMs for Noise Robust CSR

Structured SVMs for CSR • Model whole utterance – observations O, word sequence w – joint features φ(O, w). How to build? generative model to extract features ⇒ model compensation l

s s

e

d

o

m

i

t

v

e

r

a

e n

e g

” E

N

O

λ



"

! #! ! ! #!

+

"

λ





O

T

W

: .

7

0

.

6

.

0

5

0

.

4

0

.

3

0

2

.

0

.

1

0

0

















E

N

O

T

O

W

:

W

E

N

O

r

e

e

T

h

• Target: Learn discriminative parameters α Decoding by match score: arg max αTφ(O, w) w

Cambridge University Engineering Department

Interspeech 2011 Florence

2

Structured SVMs for Noise Robust CSR

Structured SVM Training • Training α, so that match score of correct reference wref ≥ all competing w



“ 0

0

0

0

0

, ”



1

e

l

m

S

a

p 1

,





3

2

1

n

i

t

e

o

m

c

l

t

o

a

g

p

1 )

(

1 )

(

f

r

e





9

9

9

, ”



0

0

0

,

e

n

l

m

S

a





p 1

0

0

,





6

5

4

n

i

t

e

c

o

m

l

t

a

o

g

p

n

n )

(

)

(

f

r

e







9

9

9

,

– Hard-margin, “+1” means “0-1 loss” Cambridge University Engineering Department

Interspeech 2011 Florence

3

Structured SVMs for Noise Robust CSR

Structured SVM Training • To generalize the training – Replace “0-1 loss” as L(wref, w) – Introduce slack variable ξi for linearly nonseparable cases





3

3

linear

2

1

α

2

1





• Unconstrained form min Fssvm(α)

  (ξi is the part in · +) convex

}| {i z }| { z n h n o X 1 (i) (i) ||α||22 + C − αTφ(O(i), wref) + max L(w, wref) + αTφ(O(i), w) w6=wref 2 + r=1 e

n

o

e

n

o

,

Cambridge University Engineering Department

Interspeech 2011 Florence

4

Structured SVMs for Noise Robust CSR

Structured SVM ≡ Large Margin Log Linear Model • Example of log-linear models  1 T P (w|O; α) = exp α φ(O, w) Z • Margin: minimum distance between correct wref and competing w    P (wref|O; α) min log w6=wref P (w|O; α) • Regularised, large margin criterion ≡ Structured SVM n h n oi X 1 (i) (i) ||α||22 +C −αTφ(O(i), wref)+ max L(w, wref) + αTφ(O(i), w) w6=wref 2 + r=1

Cambridge University Engineering Department

Interspeech 2011 Florence

5

Structured SVMs for Noise Robust CSR

Best Segmentation • Features depend on the segmentation θ

+

• For efficiency, consider one “best” segmentation – How to define “best”? – Previously, φ(O, w) = φ(O, w; θhmm) Problem: Best segmentation in HMM may not the best in SSVM – What we want: max αTφ(O, w; θ) θ

Cambridge University Engineering Department

Interspeech 2011 Florence

6

Structured SVMs for Noise Robust CSR

Can we optimize segmentation θ in Structured SVMs?

• How to learn α and θ jointly in training? • How to find w and θ in decoding?

Cambridge University Engineering Department

Interspeech 2011 Florence

7

Structured SVMs for Noise Robust CSR

Overall Training with Optimal Segmentation • Previously, no variable θ, minα (Convex Optimization) convex

linear }| {i z }| { z h n o n (i) T (i) 1 ||α||2 +C P −αT φ(O(i) , w(i) ) + max L(w, w ) + α φ(O , w) 2 ref ref 2 w6=wref

i=1

+

• Now, Optimizing θ with α (Nonconvex Optimization) concave

convex

z

}| }| { n o{ i hz (i) (i) −max αTφ(O(i), wref, θ) + max L(w, wref) + αTφ(O(i), w, θ) w6=wref ,θ

θ

+

e

n

o o

e

n

o

,

Cambridge University Engineering Department

Interspeech 2011 Florence

8

Structured SVMs for Noise Robust CSR

Overall Training with Optimal Segmentation • Previously, no variable θ, minα (Convex Optimization) convex

linear }| {i z }| { z h n o n (i) T (i) 1 ||α||2 +C P −αT φ(O(i) , w(i) ) + max L(w, w ) + α φ(O , w) 2 ref ref 2 w6=wref

i=1

+

• Now, Optimizing θ with α (Nonconvex Optimization) linear

convex

z

}| }| { hz n o{ i (i) (i) L(w, wref) + αTφ(O(i), w, θ) −αTφ(O(i), wref, θref) + max w6=wref ,θ

Cambridge University Engineering Department

Interspeech 2011 Florence

+

9

Structured SVMs for Noise Robust CSR

Overall Training with Optimal Segmentation • Algorithm (EM-style, guaranteed to converge): (i)

(i)

θref ← arg max αTφ(O(i), wref, θ), ∀i

➀ Find reference segmentation:

θ

(i)

➁ Original SSVM training:

α ← min Fssvm(α|θref)

• Illustration of Algorithm (Concave-Convex Procedure): n

i

n

m

n

r

r

i

F

n

t

o

t

a

e

e

s

e

c

e

e

e

d

g

f

1 e

a

c

v

o

C

n

x

v

n

e

o

C

r

i

L

n

a

e

2 i

n

i

n

r t

a

S

S

a

O

g

g

Interspeech 2011 Florence

M

V

l

i

n

i

r

Cambridge University Engineering Department

10

Structured SVMs for Noise Robust CSR

Decoding with Optimal Segmentation • Decoding with optimal segmentation θ arg max αTφ(O, w, θ) w,θ

• In general, very difficult. – For our features, efficient search algorithm exists T

Interspeech 2011 Florence

T

T

Cambridge University Engineering Department

11

Structured SVMs for Noise Robust CSR

Decoding with Optimal Segmentation • Search optimal position of “black nodes”, and the label between them.

 • Viterbi-style search, ψ(t) = max ψ(tst) + tst ,w

w

α

w

α

“zero P”

k=“one”

(w)

αk HMMk



– Related to Factorial HMM inference Cambridge University Engineering Department

Interspeech 2011 Florence

12

Structured SVMs for Noise Robust CSR

Implementation • Overall framework:

2GXMK3GXMOT:XGOTOTM t

d e

a

d

U

p

i

i

)

)

(

(

g

f

r

e

n

1

i

t

n

a

r

e

n

e

G

i

g

m

n

i

o

t

a

t

e

n

e

m

s

e

c

e

n

r

e

e

h

r

h

r

a

c

e

c

a

r

e

S

r

t

a

o

r

e

m

u

N

g

f

m

i

)

(

a

e

i

t

c

t

L

a

r g o r P c

t

a

d e

d

U

i

p 2 t a r

i

t

n

a

r

e

n

e

G

d

g a

u

t

r

o

i

a

n

m

n

o

e

D

i

n

t

o

t

a

e

n

e

m

s

d

n

a

i

n

t

e

o

m

c

S

g

g

p

Q

e

i

t

c

t

L

a

:XGOTOTM6NGYK *KIUJOTM6NGYK

^

XMOT:XGOTOTM

i

t

n

a

r

e

n

e

G

g ^

t

r

o

i

a

n

m

n

o

e

D

i

n

t

o

t

a

e

n

s

e

m

d

a

n

t

s

e

b

r

c

h

S

a

e

g

i

t

e

t

c

L

a

d

i

l

z

e

l

l

e

r

a

:

a

P

2

– Loop

is parallelized training of original structured SVM

Cambridge University Engineering Department

Interspeech 2011 Florence

13

Structured SVMs for Noise Robust CSR

AURORA 2 Results • Noise corrupted continuous digit task. SNRs from 0dB–20dB. Model HMM

Training Decoding Avg. all — — 9.5 θhmm θhmm 7.8 Structured θhmm opt θ 7.6 SVM opt θ opt θ 7.4

• Structured SVMs with optimal segmentation: – Gain over VTS-compensated HMMs: 22% – Gain over Multi-Class SVMs: 10% – Gain over original Structured SVMs: 4% Cambridge University Engineering Department

Interspeech 2011 Florence

14

Structured SVMs for Noise Robust CSR

Conclusion • Examined Structured SVMs for noise robust speech recognition – equivalent to Large Margin Log Linear Models • Enabled optimising segmentation in Structured SVMs training – use concave-convex procedure • Enabled optimising segmentation in Structured SVMs decoding – proposed Viterbi-style efficient search • Good Performance

Cambridge University Engineering Department

Interspeech 2011 Florence

15