1 Speaker line-up calibration of the i-vector based speaker recognition system for forensic application. M. I. Mandasari, D. van Leeuwen and M. McLaren. Centre for Language and Speech Technology, Radboud University Nijmegen, The Netherlands. The International Association of Forensic Phonetics and Acoustics 2011 Annual Conference, 24-28 July 2011, Vienna, Austria
2 Outline • Why Likelihood Ratio (LR) calibration? • LR calibration methods ▫ Linear calibration ▫ Line-up calibration (2011) • I-vector based automatic speaker recognition system for forensic application • Experiment and results
3 Likelihood Ratio (LR) • In forensic evidence reporting ▫ Scores are represented as an LR: $\mathrm{LR} = P(E \mid H_p) / P(E \mid H_d)$, where $E$ is the trace, $H_p$ the prosecution hypothesis and $H_d$ the defense hypothesis ▫ Used by the fact finder to compute posterior odds: posterior odds = LR × prior odds
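As a small illustration of the odds update on this slide, the sketch below multiplies prior odds by an LR and converts the result to a probability. The function names and the numbers are illustrative assumptions, not values from the talk.

```python
def posterior_odds(lr: float, prior_odds: float) -> float:
    """Bayes' rule in odds form: posterior odds = LR * prior odds."""
    return lr * prior_odds


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of a hypothesis into a probability."""
    return odds / (1.0 + odds)


# Illustrative numbers only: an LR of 100 applied to prior odds of 1:1000.
odds = posterior_odds(100.0, 1.0 / 1000.0)
print(odds, odds_to_probability(odds))
```

The system reports only the LR; the prior odds, and therefore the posterior, remain with the fact finder.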
4 Why is LR calibration important? A study from Rodriguez et al. (2007): “LR calculated from the uncalibrated system was often misleading, while the calibrated system produced more reliable LR” (Diagram: automatic speaker recognition system → calibration → well-calibrated system → LR good for forensics)
5 LR calibration methods • Linear calibration: 2007 [ref. 7] • Line-up calibration: 2011 [ref. 6]
6 Linear calibration • Scores → linear transformation → LR • Calibration: ▫ Optimize the linear transformation ▫ using a set of development scores ▫ to minimize … The C_llr provides an estimate of calibration error over all priors. • Miscalibration cost: ▫ A low miscalibration cost indicates that the system produces more reliable LRs.
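A minimal sketch of the idea on this slide, under stated assumptions: the log LR is modelled as `a * score + b`, and `a` and `b` are fitted on development scores by plain gradient descent on C_llr. The function names, step size, iteration count and toy scores are all hypothetical; toolkits such as FoCal [ref. 2] use their own optimizers.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def cllr(tar_llrs, non_llrs):
    """C_llr in bits, from natural-log LRs of target and non-target trials."""
    c_tar = sum(softplus(-l) for l in tar_llrs) / len(tar_llrs)
    c_non = sum(softplus(l) for l in non_llrs) / len(non_llrs)
    return 0.5 * (c_tar + c_non) / math.log(2.0)


def fit_linear_calibration(tar_scores, non_scores, steps=2000, rate=0.1):
    """Fit log LR = a * score + b by gradient descent on C_llr.

    The constant factor 1 / ln(2) is dropped from the gradient; it only
    rescales the step size.
    """
    a, b = 1.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s in tar_scores:
            g = -sigmoid(-(a * s + b)) / len(tar_scores)
            grad_a += g * s
            grad_b += g
        for s in non_scores:
            g = sigmoid(a * s + b) / len(non_scores)
            grad_a += g * s
            grad_b += g
        a -= rate * 0.5 * grad_a
        b -= rate * 0.5 * grad_b
    return a, b


# Toy development scores (made up for illustration).
tar = [1.2, 2.5, 0.4, 3.0, 1.8]
non = [-1.0, 0.6, -0.3, -2.0, 0.1]
a, b = fit_linear_calibration(tar, non)
print(cllr(tar, non), cllr([a * s + b for s in tar], [a * s + b for s in non]))
```

A system that always outputs LR = 1 has C_llr = 1 bit, which makes a convenient sanity check for the cost function.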
7 Line-up LR calibration method • Motivated by the witness line-up scenario in forensic tasks. (Diagram: a witness, the suspect and the foils.)
8 Line-up LR calibration method Each speaker's score is “lined up” with all foil speakers → the rank within the line-up set is determined → the calibrated LR value is computed
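The ranking step alone can be sketched as follows. This is only an illustration: the function name, the tie-handling rule and the example scores are assumptions, and the mapping from rank to a calibrated LR is defined in [ref. 6] and is not reproduced here.

```python
def lineup_rank(suspect_score: float, foil_scores) -> int:
    """Rank of the suspect within the line-up; 1 means the highest score.

    Assumption: a foil outranks the suspect only with a strictly higher score.
    """
    return 1 + sum(1 for f in foil_scores if f > suspect_score)


# Made-up scores of one suspect and three foils against the same trace.
print(lineup_rank(0.9, [0.1, 0.5, 0.95]))
```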
9 I-vector based speaker recognition • An i-vector is a speech representation in a low-dimensional total variability space [Dehak et al., 2009]. • Processing chain (from the diagram): Speech A and Speech B → i-vector in the total variability space (400D) → Linear Discriminant Analysis (LDA) projection (200D) → Within Class Covariance Normalization (WCCN) → w_A, w_B → cosine kernel scoring → scores → LR calibration → LRs
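The cosine kernel scoring step compares the two processed i-vectors w_A and w_B by the cosine of the angle between them. A minimal sketch, assuming the vectors have already been through the LDA and WCCN stages (neither is shown here), with a hypothetical function name and toy 3-dimensional vectors:

```python
import math


def cosine_score(w_a, w_b) -> float:
    """Cosine similarity between two i-vectors of equal dimension."""
    if len(w_a) != len(w_b):
        raise ValueError("i-vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(w_a, w_b))
    norm_a = math.sqrt(sum(x * x for x in w_a))
    norm_b = math.sqrt(sum(y * y for y in w_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("i-vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Toy vectors; real i-vectors in this system are 200-dimensional after LDA.
print(cosine_score([0.2, -0.5, 1.0], [0.1, -0.4, 0.9]))
```

The score lies between -1 and 1 and does not depend on the order of the two arguments.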
10 I-vector system for forensics [ref. 4] • The i-vector speaker recognition system … ▫ performs well in classification and calibration, and ▫ offers a good separation of target and non-target scores • The symmetrical behavior of the i-vector system is of particular interest in forensic evidence reporting, where long speech samples can be collected from a suspected speaker in an interview scenario while the trace may be of uncontrolled duration.
11 i-vector classification performance Symmetrical!
12 Experiment setup • i-vector based automatic speaker recognition • Dataset: ▫ NIST SRE 2010 (halved into two datasets with disjoint speakers) ▫ Durations of 5, 10, 20 and 40 s, and full utterances • Linear vs. line-up calibration method • Performance measures ▫ Classification: EER (Equal Error Rate) ▫ Calibration: miscalibration cost
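For reference, the EER is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping a threshold over the observed scores; the function name and the toy scores are assumptions, and evaluation toolkits typically interpolate on the ROC convex hull instead.

```python
def equal_error_rate(tar_scores, non_scores) -> float:
    """Approximate EER: sweep thresholds, pick the point where FAR and FRR are closest."""
    best_gap, eer = None, None
    for t in sorted(set(tar_scores) | set(non_scores)):
        # A trial is accepted when its score is at or above the threshold.
        frr = sum(1 for s in tar_scores if s < t) / len(tar_scores)
        far = sum(1 for s in non_scores if s >= t) / len(non_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, 0.5 * (far + frr)
    return eer


# Made-up target and non-target scores.
print(equal_error_rate([1.2, 2.5, 0.4, 3.0], [-1.0, 0.6, -0.3, -2.0]))
```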
13 Classification Performance Male Female
14 Classification Performance Male Female
15 Classification Performance • The symmetrical behavior is preserved with line-up calibration, • EER with line-up calibration is generally better than with linear calibration, and • the EER improvement is greater for short durations. • To conclude… ▫ Line-up calibration generally gives better classification performance than linear calibration.
16 Calibration Performance Male Female
17 Calibration Performance Male Female
18 Calibration Performance • For both male and female speakers, the miscalibration cost of the linear calibration method is generally better than that of the line-up calibration method; however, • the difference in calibration performance, measured by C_llr, is small (not more than 0.01) • To conclude ▫ The line-up calibration method does not calibrate better than the linear method, but the gap is small.
19 Our Findings
Performance                    Gender   Linear vs. line-up calibration
Classification (EER, %)        Male     0.3822
                               Female   0.3496
Calibration (Miscalibration)   Male     0.0052
                               Female   0.0104
▫ EER with line-up calibration is better, which suggests that this calibration method acts more like score normalization* in the system.
20 References
1. Butcher, A.R. (2002). Forensic Phonetics: Issues in speaker identification evidence. Proceedings of the Inaugural International Conference of the Institute of Forensic Studies, Italy, pp. 3-5.
2. Brümmer, N. (2006). Focal II: Toolkit for calibration of multi-class recognition scores, software available at http://www.dsp.sun.ac.za/~nbrummer/focal/index.htm.
3. Dehak, N., Dehak, R., Glass, J., Reynolds, D. and Kenny, P. (2010). Cosine similarity scoring without score normalization techniques. Proceedings of Odyssey.
4. Mandasari, M. I., McLaren, M. and van Leeuwen, D. (2011). Evaluation of i-vector Speaker Recognition Systems for Forensic Application. Submitted to the 12th Annual Conference of the International Speech Communication Association, Florence, Italy.
5. Rodriguez, J. G. and Ramos, D. (2007). Forensic automatic speaker classification in the “coming paradigm shift”. Speaker Classification, pp. 205-217. Springer.
6. van Leeuwen, D. and Brümmer, N. (2011). A speaker line-up for the likelihood ratio. Submitted to the 12th Annual Conference of the International Speech Communication Association, Florence, Italy.
7. van Leeuwen, D. and Brümmer, N. (2007). An introduction to application-independent evaluation of speaker recognition systems. Speaker Classification, pp. 330-353. Springer.
21 Vienna, 25 July 2011