A Novel and Computationally Efficient Approach to ... - Semantic Scholar

9 downloads 0 Views 1MB Size Report
Decision Thresholds in an Adaptive. Speaker Verification System. Nikki Mirghafori. Joint work with. Matthieu Hebert at Nuance. Friday the 13th, February 2004 ...
A Novel and Computationally Efficient Approach to Setting Decision Thresholds in an Adaptive Speaker Verification System Nikki Mirghafori Joint work with Matthieu Hebert at Nuance Friday the 13th, February 2004

Outline • • •

Beginning Middle End

2

Outline • • • • • •

Motivation Approach Data Analysis Calibration Experimental Results Conclusions ICASSP 2004: Mirghafori, N. and Hebert, M. "Parameterization of the Score Threshold for a Text-Dependent Adaptive Speaker Verification System", Montreal, Canada.

3

Context: Speaker Verification Field Application Please enter your You’re accepted “5551234 ” account number by the system Speaker Model

Compare

Authenticate Verification Authenticate Verification Voice Engine Voice Engine

Score Threshold

An example of a verification session 4

Accept Reject

Problems with Score Thresholds •



Overall fixed system thresholds are difficult to set in advance If combined with online adaptation, thresholds must be dynamic and adaptive

5

Problem I – Setting Overall Threshold Setting overall thresholds before-hand is challenging • •



High convenience? High security? Difference between lab and field conditions can cause differences in thresholds Not all speakers are created equal

User False Reject Rate (FR)



X

High Security Threshold = 1.5 Threshold = 0.8

High Convenience

X

Imposter False Accept Rate (FA) 6

Problem II – Effects of Online Adaptation Effects of Online Adaptation on EER and FA (at fixed threshold)

6

EER FA

4 2

Condition

7

Iter8

Iter7

Iter6

Iter5

Iter4

Iter3

Iter2

Iter1

0 Baseline

Error Rate

8

Problem II – Increase in Impostor Scores

Num Trials (Normalized)

True Speaker and Imposter Distributions Pre and Post Adaptation

PreAdapt:TS PreAdapt: Imp PostAdapt: TS PostAdapt: Imp

-7

-2

3

Verification Scores

8

8

Problem -- II •







Speaker adaptation successfully reduces EER in verification [Heck & Mirghafori, ICSLP ‘00] However, both true speaker and, to a lesser extent, impostor scores may increase Increase in impostor false accept rates can cause speaker model corruption Similar score increase observed in speech recognition for confidence scores [Sankar & Kannan, ICASSP ‘02]. Score transformation used as a solution.

9

Motivation 1.

2.

Eliminate the challenge of setting overall application thresholds in advance Counter the verification score shifts resulting from online adaptation

10

Previous Work in Setting Thresholds in Speaker Verification q

q

q

Mirghafori, N. and Heck, L.P., "An Adaptive Speaker Verification System With Speaker Dependent A Priori Decision Thresholds", Proc. International Conference on Spoken Language Processing, Denver, Colorado, 2002. A.C. Surendran and C.-H Lee. “A priori threshold selection in fixed vocabulary speaker verification system.” ICSLP 2000 J. Lindberg, J.W. Koolwaaij, H.-P. Hutter, D. Genoud, M. Blomberg, J.-B. Pierrot, and F. Bimbot. “Normalization and selection of speech segments for speaker recognition scoring.” RLA2C 1998.

11

Our Approach

SDT = F( target_FA, number_of_training_frames)

Threshold

FA=0.5% FA=1.0% FA=2.0% FA=5.0%

Number of training frames in the speaker model

12

Test Setup • • • •

Japanese Digits 6,477 speaker models Enrollment: 8-digit string (3x) Held out test set: •



• •

8-digit string (1x)

Adaptation: •



Enrollment

8-digit string (1x)

Gather all test scores Divide into bins w.r.t. to num frames Calculate threshold at target FA

Adaptation (1 utt only)

Testing (Adaptation turned off)

13

Speaker Verification System T E-Carb “Bob” on Electret

Electret

TE-Cell

Cellular

Carbon

Claimant MAP Adaptation

Impostor

T E-Carb TE-Cell Electret

MAP Adaptation

Cellular

Carbon HD+GD Impostor Models

Root (UBM) 14

Root Model - HI+GI GMM

The trend looks good: Threshold = (Target FA, Num Frames)

Japanese Digits 2.5

1.5 1 0.5

Num Frames in Speaker Model 15

2300

2100

1900

1700

1500

1300

1100

900

700

0 500

Threshold

2

0.1% FA 0.2 0.3 0.4 0.5 0.8 1 3 5 7

“It’s a Bug!” Or is it?

Num Frames in Speaker Model 16

00 17

00 15

13 00

11 00

90 0

0 70

0 50

30

0

4.5 4 3.5 3 2.5 2 1.5 1 0.5 0 10 0

Threshold

Canadian French Text

0.1% FA 0.2 0.3 0.4 0.5 0.8 1 3 5 7

The trend is confirmed… UK English Text 3

2 1.5 1 0.5

Num Frames in Speaker Model 17

1300

1200

1100

1000

900

800

700

600

500

400

300

200

0 100

Threshold

2.5

0.1% FA 0.2 0.3 0.4 0.5 0.8 1 3 5 7

What’s the Missing Parameter? • •

• •

• •

Password length! Japanese digits DB uniform, whereas the two others are not Contain both short and long passwords In general: verification performance lower for shorter passwords Hence, need higher threshold Can we parameterize? •

Threshold = G(Target FA, Num. Frames, Password Length)

18

Calibration • • • •

Japanese text AND digit Both short and long passwords Trained and tested 10K speaker models Divided password length region between 0 and 700 into 8 bins

19

Password Length distribution

Bins: between 120K to 196K attempts each

Password Length 20

Num. Training Frames in Speaker Model For each password length Short passwords •Divide data each bin into 10 sub-bins according to the number of frames in speaker model •Calculated Average Password Length (APL) and Average Num Frames (ANF)

Long passwords

Num. Training Frames in Speaker Model 21

Coverage of 2D space

Ave. Password Length

Num. Training Frames vs. Password Len.

Questionable region: at the edge of distrib. - it was excluded

Ave. Num. Training Frames in Speaker Model 22

Result of fitting TargetFA = 0.1% TargetFA=0.1%

• Fit a second degree polynomial with three parameters (10 free coefficients)

Thresh.

APL

ANF/ APL

Fit looks good! 23

Result of fitting TargetFA = 5% TargetFA=5% Thresh.

ANF/ APL

APL

Fit looks good! 24

Actual FA rates for FA goal 0.2-1.5% 0

0.5

EA Dig1

1

1.5

2

0.53

EA Dig2

Goal Region

0.68

EA Dig3

0.42

EA Dig4

0.4

Test set

EA Txt1

2.16

EA Txt2

1.97

EUK Dig

1.13

EUK Txt FR Dig

1.2 0.11

FR Txt SP Dig

2.5

0.41 0.17

SP Txt

1.08 25

Effects of Online Adapatation on FA 3.50%

3.00%

FA Rate

2.50%

2.00%

FCDT BASE

1.50%

1.00%

0.50%

0.00% Preadapt

Iter0

Iter3

Iter6

Iter9

Iter12

Iter15

Iterations of Online Adaptation 26

Iter18

Iter21

Iter24

In Conclusion •



Setting a priori thresholds is a challenging and important problem for field applications Threshold can be parameterized as function of • • •



Target FA Number of training frames in speaker model Password length

Threshold parameters can be calibrated on one database and applied to others successfully • • •

Thresholds achieved within target FA range FA rate kept constant during online adaptation Overall performance (EER) does not degrade

27

Suggest Documents