Structured Support Vector Machines for Noise Robust Continuous ...

Structured Support Vector Machines for Noise Robust Continuous Speech Recognition S.-X. Austin Zhang and Mark Gales 31 August 2011

Cambridge University Engineering Department

Interspeech 2011 Florence

Structured SVMs for Noise Robust CSR

SVMs for Continuous Speech Recognition • SVMs generate boundaries between classes – For isolated digit recognition, no problem (10 classes) sequence kernels: map to fixed length – Continuous speech? too many classes! (e.g., 6 digits ⇒ 106 classes) • Simplest approach:

FOUR

ONE

SEVEN

ONE

ONE

ONE

ZERO SIL

ZERO SIL

ZERO SIL

1. Using HMM-based Segmentation 2. For each segment, isolated classification

Problem: Restricted to one fixed segmentation • Alternatively, incorporate structures into SVM classes – Structured SVM ! Cambridge University Engineering Department


1


Structured SVMs for CSR • Model whole utterance – observations O, word sequence w – joint features φ(O, w). How to build? generative model to extract features ⇒ model compensation l

s s

e

d

o

m

i

t

v

e

r

a

e n

e g

” E

N

O

λ

“

"

! #! ! ! #!

+

"

λ

”

“

O

T

W

: .

7

0

.

6

.

0

5

0

.

4

0

.

3

0

2

.

0

.

1

0

0

”

“

”

“

”

“

”

“

E

N

O

T

O

W

:

W

E

N

O

r

e

e

T

h

• Target: Learn discriminative parameters α Decoding by match score: arg max αTφ(O, w) w



2


Structured SVM Training • Training α, so that match score of correct reference wref ≥ all competing w

”

“ 0

0

0

0

0

, ”

“

1

e

l

m

S

a

p 1

,

”

“

3

2

1

n

i

t

e

o

m

c

l

t

o

a

g

p

1 )

(

1 )

(

f

r

e

”

“

9

9

9

, ”

“

0

0

0

,

e

n

l

m

S

a

”

“

p 1

0

0

,

”

“

6

5

4

n

i

t

e

c

o

m

l

t

a

o

g

p

n

n )

(

)

(

f

r

e

”

“

“

9

9

9

,

– Hard-margin, “+1” means “0-1 loss” Cambridge University Engineering Department


3


Structured SVM Training • To generalize the training – Replace “0-1 loss” as L(wref, w) – Introduce slack variable ξi for linearly nonseparable cases

”

“

3

3

linear

2

1

α

2

1

”

“

• Unconstrained form min Fssvm(α)

(ξi is the part in · +) convex

}| {i z }| { z n h n o X 1 (i) (i) ||α||22 + C − αTφ(O(i), wref) + max L(w, wref) + αTφ(O(i), w) w6=wref 2 + r=1 e

n

o

e

n

o

,



4


Structured SVM ≡ Large Margin Log Linear Model • Example of log-linear models 1 T P (w|O; α) = exp α φ(O, w) Z • Margin: minimum distance between correct wref and competing w P (wref|O; α) min log w6=wref P (w|O; α) • Regularised, large margin criterion ≡ Structured SVM n h n oi X 1 (i) (i) ||α||22 +C −αTφ(O(i), wref)+ max L(w, wref) + αTφ(O(i), w) w6=wref 2 + r=1



5


Best Segmentation • Features depend on the segmentation θ

+

• For efficiency, consider one “best” segmentation – How to define “best”? – Previously, φ(O, w) = φ(O, w; θhmm) Problem: Best segmentation in HMM may not the best in SSVM – What we want: max αTφ(O, w; θ) θ



6


Can we optimize segmentation θ in Structured SVMs?

• How to learn α and θ jointly in training? • How to find w and θ in decoding?



7


Overall Training with Optimal Segmentation • Previously, no variable θ, minα (Convex Optimization) convex

linear }| {i z }| { z h n o n (i) T (i) 1 ||α||2 +C P −αT φ(O(i) , w(i) ) + max L(w, w ) + α φ(O , w) 2 ref ref 2 w6=wref

i=1

+

• Now, Optimizing θ with α (Nonconvex Optimization) concave

convex

z

}| }| { n o{ i hz (i) (i) −max αTφ(O(i), wref, θ) + max L(w, wref) + αTφ(O(i), w, θ) w6=wref ,θ

θ

+

e

n

o o

e

n

o

,



8


Overall Training with Optimal Segmentation • Previously, no variable θ, minα (Convex Optimization) convex

linear }| {i z }| { z h n o n (i) T (i) 1 ||α||2 +C P −αT φ(O(i) , w(i) ) + max L(w, w ) + α φ(O , w) 2 ref ref 2 w6=wref

i=1

+

• Now, Optimizing θ with α (Nonconvex Optimization) linear

convex

z

}| }| { hz n o{ i (i) (i) L(w, wref) + αTφ(O(i), w, θ) −αTφ(O(i), wref, θref) + max w6=wref ,θ



+

9


Overall Training with Optimal Segmentation • Algorithm (EM-style, guaranteed to converge): (i)

(i)

θref ← arg max αTφ(O(i), wref, θ), ∀i

➀ Find reference segmentation:

θ

(i)

➁ Original SSVM training:

α ← min Fssvm(α|θref)

• Illustration of Algorithm (Concave-Convex Procedure): n

i

n

m

n

r

r

i

F

n

t

o

t

a

e

e

s

e

c

e

e

e

d

g

f

1 e

a

c

v

o

C

n

x

v

n

e

o

C

r

i

L

n

a

e

2 i

n

i

n

r t

a

S

S

a

O

g

g


M

V

l

i

n

i

r


10


Decoding with Optimal Segmentation • Decoding with optimal segmentation θ arg max αTφ(O, w, θ) w,θ

• In general, very difficult. – For our features, efficient search algorithm exists T


T

T


11


Decoding with Optimal Segmentation • Search optimal position of “black nodes”, and the label between them.

• Viterbi-style search, ψ(t) = max ψ(tst) + tst ,w

w

α

w

α

“zero P”

k=“one”

(w)

αk HMMk

– Related to Factorial HMM inference Cambridge University Engineering Department


12


Implementation • Overall framework:

2GXMK3GXMOT:XGOTOTM t

d e

a

d

U

p

i

i

)

)

(

(

g

f

r

e

n

1

i

t

n

a

r

e

n

e

G

i

g

m

n

i

o

t

a

t

e

n

e

m

s

e

c

e

n

r

e

e

h

r

h

r

a

c

e

c

a

r

e

S

r

t

a

o

r

e

m

u

N

g

f

m

i

)

(

a

e

i

t

c

t

L

a

r g o r P c

t

a

d e

d

U

i

p 2 t a r

i

t

n

a

r

e

n

e

G

d

g a

u

t

r

o

i

a

n

m

n

o

e

D

i

n

t

o

t

a

e

n

e

m

s

d

n

a

i

n

t

e

o

m

c

S

g

g

p

Q

e

i

t

c

t

L

a

:XGOTOTM6NGYK *KIUJOTM6NGYK

^

XMOT:XGOTOTM

i

t

n

a

r

e

n

e

G

g ^

t

r

o

i

a

n

m

n

o

e

D

i

n

t

o

t

a

e

n

s

e

m

d

a

n

t

s

e

b

r

c

h

S

a

e

g

i

t

e

t

c

L

a

d

i

l

z

e

l

l

e

r

a

:

a

P

2

– Loop

is parallelized training of original structured SVM



13


AURORA 2 Results • Noise corrupted continuous digit task. SNRs from 0dB–20dB. Model HMM

Training Decoding Avg. all — — 9.5 θhmm θhmm 7.8 Structured θhmm opt θ 7.6 SVM opt θ opt θ 7.4

• Structured SVMs with optimal segmentation: – Gain over VTS-compensated HMMs: 22% – Gain over Multi-Class SVMs: 10% – Gain over original Structured SVMs: 4% Cambridge University Engineering Department


14


Conclusion • Examined Structured SVMs for noise robust speech recognition – equivalent to Large Margin Log Linear Models • Enabled optimising segmentation in Structured SVMs training – use concave-convex procedure • Enabled optimising segmentation in Structured SVMs decoding – proposed Viterbi-style efficient search • Good Performance



15

Structured Support Vector Machines for Noise Robust Continuous ...

Structured Support Vector Machines for Noise Robust Continuous ...

Suggest Documents

Extending Noise Robust Structured Support Vector Machines to ...

Robust ASR using Support Vector Machines - Core

Robust Anomaly Detection Using Support Vector Machines

Support Vector Machines for Robust Channel Estimation ... - CiteSeerX

Robust Support Vector Machines for Anomaly Detection in Computer ...

Robust Support Vector Machines for Speaker Verification Task - arXiv

Support Vector Machines for Robust Channel Estimation in OFDM

Support Vector Machines - CiteSeerX

Support Vector Machines

Support Vector Machines (SVM)

Compressed Support Vector Machines

Support Vector Machines

Predicting structured objects with support vector machines - Yisong Yue

Using support vector machines and acoustic noise ... - Springer Link

On the Noise Model of Support Vector Machines Regression - CiteSeerX

Cost-sensitive Support Vector Machines

Support Vector Machines - R Project

Support Vector Machines without Tears

Support vector fuzzy regression machines

pdf test - Support Vector Machines

Support Vector Machines - Google Sites

Engineering multilevel support vector machines

Parallel Semiparametric Support Vector Machines

Cryptographically Private Support Vector Machines