Decision Thresholds in an Adaptive. Speaker Verification System. Nikki Mirghafori. Joint work with. Matthieu Hebert at Nuance. Friday the 13th, February 2004 ...
A Novel and Computationally Efficient Approach to Setting Decision Thresholds in an Adaptive Speaker Verification System Nikki Mirghafori Joint work with Matthieu Hebert at Nuance Friday the 13th, February 2004
Outline • • •
Beginning Middle End
2
Outline • • • • • •
Motivation Approach Data Analysis Calibration Experimental Results Conclusions ICASSP 2004: Mirghafori, N. and Hebert, M. "Parameterization of the Score Threshold for a Text-Dependent Adaptive Speaker Verification System", Montreal, Canada.
3
Context: Speaker Verification Field Application Please enter your You’re accepted “5551234 ” account number by the system Speaker Model
Compare
Authenticate Verification Authenticate Verification Voice Engine Voice Engine
Score Threshold
An example of a verification session 4
Accept Reject
Problems with Score Thresholds •
•
Overall fixed system thresholds are difficult to set in advance If combined with online adaptation, thresholds must be dynamic and adaptive
5
Problem I – Setting Overall Threshold Setting overall thresholds before-hand is challenging • •
•
High convenience? High security? Difference between lab and field conditions can cause differences in thresholds Not all speakers are created equal
User False Reject Rate (FR)
•
X
High Security Threshold = 1.5 Threshold = 0.8
High Convenience
X
Imposter False Accept Rate (FA) 6
Problem II – Effects of Online Adaptation Effects of Online Adaptation on EER and FA (at fixed threshold)
6
EER FA
4 2
Condition
7
Iter8
Iter7
Iter6
Iter5
Iter4
Iter3
Iter2
Iter1
0 Baseline
Error Rate
8
Problem II – Increase in Impostor Scores
Num Trials (Normalized)
True Speaker and Imposter Distributions Pre and Post Adaptation
PreAdapt:TS PreAdapt: Imp PostAdapt: TS PostAdapt: Imp
-7
-2
3
Verification Scores
8
8
Problem -- II •
•
•
•
Speaker adaptation successfully reduces EER in verification [Heck & Mirghafori, ICSLP ‘00] However, both true speaker and, to a lesser extent, impostor scores may increase Increase in impostor false accept rates can cause speaker model corruption Similar score increase observed in speech recognition for confidence scores [Sankar & Kannan, ICASSP ‘02]. Score transformation used as a solution.
9
Motivation 1.
2.
Eliminate the challenge of setting overall application thresholds in advance Counter the verification score shifts resulting from online adaptation
10
Previous Work in Setting Thresholds in Speaker Verification q
q
q
Mirghafori, N. and Heck, L.P., "An Adaptive Speaker Verification System With Speaker Dependent A Priori Decision Thresholds", Proc. International Conference on Spoken Language Processing, Denver, Colorado, 2002. A.C. Surendran and C.-H Lee. “A priori threshold selection in fixed vocabulary speaker verification system.” ICSLP 2000 J. Lindberg, J.W. Koolwaaij, H.-P. Hutter, D. Genoud, M. Blomberg, J.-B. Pierrot, and F. Bimbot. “Normalization and selection of speech segments for speaker recognition scoring.” RLA2C 1998.
11
Our Approach
SDT = F( target_FA, number_of_training_frames)
Threshold
FA=0.5% FA=1.0% FA=2.0% FA=5.0%
Number of training frames in the speaker model
12
Test Setup • • • •
Japanese Digits 6,477 speaker models Enrollment: 8-digit string (3x) Held out test set: •
•
• •
8-digit string (1x)
Adaptation: •
•
Enrollment
8-digit string (1x)
Gather all test scores Divide into bins w.r.t. to num frames Calculate threshold at target FA
Adaptation (1 utt only)
Testing (Adaptation turned off)
13
Speaker Verification System T E-Carb “Bob” on Electret
Electret
TE-Cell
Cellular
Carbon
Claimant MAP Adaptation
Impostor
T E-Carb TE-Cell Electret
MAP Adaptation
Cellular
Carbon HD+GD Impostor Models
Root (UBM) 14
Root Model - HI+GI GMM
The trend looks good: Threshold = (Target FA, Num Frames)
Japanese Digits 2.5
1.5 1 0.5
Num Frames in Speaker Model 15
2300
2100
1900
1700
1500
1300
1100
900
700
0 500
Threshold
2
0.1% FA 0.2 0.3 0.4 0.5 0.8 1 3 5 7
“It’s a Bug!” Or is it?
Num Frames in Speaker Model 16
00 17
00 15
13 00
11 00
90 0
0 70
0 50
30
0
4.5 4 3.5 3 2.5 2 1.5 1 0.5 0 10 0
Threshold
Canadian French Text
0.1% FA 0.2 0.3 0.4 0.5 0.8 1 3 5 7
The trend is confirmed… UK English Text 3
2 1.5 1 0.5
Num Frames in Speaker Model 17
1300
1200
1100
1000
900
800
700
600
500
400
300
200
0 100
Threshold
2.5
0.1% FA 0.2 0.3 0.4 0.5 0.8 1 3 5 7
What’s the Missing Parameter? • •
• •
• •
Password length! Japanese digits DB uniform, whereas the two others are not Contain both short and long passwords In general: verification performance lower for shorter passwords Hence, need higher threshold Can we parameterize? •
Threshold = G(Target FA, Num. Frames, Password Length)
18
Calibration • • • •
Japanese text AND digit Both short and long passwords Trained and tested 10K speaker models Divided password length region between 0 and 700 into 8 bins
19
Password Length distribution
Bins: between 120K to 196K attempts each
Password Length 20
Num. Training Frames in Speaker Model For each password length Short passwords •Divide data each bin into 10 sub-bins according to the number of frames in speaker model •Calculated Average Password Length (APL) and Average Num Frames (ANF)
Long passwords
Num. Training Frames in Speaker Model 21
Coverage of 2D space
Ave. Password Length
Num. Training Frames vs. Password Len.
Questionable region: at the edge of distrib. - it was excluded
Ave. Num. Training Frames in Speaker Model 22
Result of fitting TargetFA = 0.1% TargetFA=0.1%
• Fit a second degree polynomial with three parameters (10 free coefficients)
Thresh.
APL
ANF/ APL
Fit looks good! 23
Result of fitting TargetFA = 5% TargetFA=5% Thresh.
ANF/ APL
APL
Fit looks good! 24
Actual FA rates for FA goal 0.2-1.5% 0
0.5
EA Dig1
1
1.5
2
0.53
EA Dig2
Goal Region
0.68
EA Dig3
0.42
EA Dig4
0.4
Test set
EA Txt1
2.16
EA Txt2
1.97
EUK Dig
1.13
EUK Txt FR Dig
1.2 0.11
FR Txt SP Dig
2.5
0.41 0.17
SP Txt
1.08 25
Effects of Online Adapatation on FA 3.50%
3.00%
FA Rate
2.50%
2.00%
FCDT BASE
1.50%
1.00%
0.50%
0.00% Preadapt
Iter0
Iter3
Iter6
Iter9
Iter12
Iter15
Iterations of Online Adaptation 26
Iter18
Iter21
Iter24
In Conclusion •
•
Setting a priori thresholds is a challenging and important problem for field applications Threshold can be parameterized as function of • • •
•
Target FA Number of training frames in speaker model Password length
Threshold parameters can be calibrated on one database and applied to others successfully • • •
Thresholds achieved within target FA range FA rate kept constant during online adaptation Overall performance (EER) does not degrade
27