Speaker Verification using i-Vectors
CAST-Förderpreis IT-Sicherheit 2014
Andreas Nautsch
Hochschule Darmstadt, atip GmbH, CASED, da/sec Security Research Group
Darmstadt, 20.11.2014
Andreas Nautsch, Speaker Verification using i-Vectors / Darmstadt, 20.11.2014
Outline
◮ Motivation with research questions
◮ Speaker verification and i-vectors
◮ Research and development
◮ Conclusion and future perspectives
Motivation
Biometric IT-security & forensic applications
◮ Authentication and recognition by voice
◮ Advantages over knowledge-/token-based approaches:
  ◮ Cannot be forgotten
  ◮ Cannot be incorporated
◮ Application fields and scenarios, e.g.:
  ◮ Mobile device authentication: random PINs, short duration
  ◮ Call-center user validation: free speech, variable duration
  ◮ Suspect tracking: various contents & signal qualities
[Figure: verification pipeline. Voice reference and voice probe each pass through feature extraction; the comparison yields a score, which leads to accept or reject.]
Motivation
Assessment of Speaker Recognition
◮ Known technology: modeling acoustic features
  ✦ Very accurate due to detailed modeling
  ✦ Fast processing in short-duration scenarios
  ✪ High computational effort in text-independent scenarios
◮ State of the art: identity vector (i-vector) features, 2011
  ✦ Fully text- & language-independent
  ✦ Fast computation & scoring, independent of duration
  ✪ Unknown behavior in commercial voice biometric scenarios
Motivation
Voice biometrics: speech duration & sample completeness
◮ Sound unit (phoneme) distribution by duration
[Figure: phoneme distribution by duration, from [T. Hasan et al., 2013]]
◮ Text-independent case: content varies from sample to sample
◮ Long duration ⇒ stable distribution
◮ Short duration ⇒ insufficient data
Motivation
Baseline speaker recognition approach
◮ Hidden Markov Models (HMMs)
◮ State-based model
◮ States can represent articulation phases, phonemes, ...
[Figure: three-state HMM with states 0, 1, 2 and transition probabilities p00, p11, p22, p10, p12]
◮ Detailed, but the optimal path is computationally expensive to find
[Figure: state trellis over time (states 0, 1, 2) for the speaker model and the impostor model]
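The optimal-path search mentioned on this slide is commonly done with the Viterbi algorithm. The following is a minimal sketch of that generic algorithm, not the implementation used in the presented system. The function name `viterbi`, the three-state left-to-right topology, and all probabilities are hypothetical values chosen for illustration.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state path and its log-probability.

    log_init:  (S,)   log initial state probabilities
    log_trans: (S, S) log_trans[i, j] = log p(next state j | current state i)
    log_emit:  (T, S) log-likelihood of frame t under state s
    """
    T, S = log_emit.shape
    delta = log_init + log_emit[0]            # best log-score ending in each state
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + log_trans     # cand[i, j]: come from i, move to j
        backptr[t] = np.argmax(cand, axis=0)  # best predecessor for each state j
        delta = cand[backptr[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(delta))]            # best final state
    for t in range(T - 1, 0, -1):             # follow back-pointers to the start
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(np.max(delta))

# Hypothetical 3-state left-to-right model and 6 frames of toy emissions.
with np.errstate(divide="ignore"):            # log(0) = -inf marks forbidden moves
    log_init = np.log(np.array([1.0, 0.0, 0.0]))
    log_trans = np.log(np.array([[0.5, 0.5, 0.0],
                                 [0.0, 0.5, 0.5],
                                 [0.0, 0.0, 1.0]]))
favored = [0, 0, 1, 1, 2, 2]                  # state each frame fits best
emit = np.full((6, 3), 0.05)
emit[np.arange(6), favored] = 0.9
path, log_score = viterbi(log_init, log_trans, np.log(emit))
```

Each frame costs on the order of S² operations per model, and a verification decision needs this search for both the speaker model and the impostor model, which is one way to read the computational effort noted above.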
Motivation
Research questions
1. Can the i-vector approach be extended to short-duration scenarios with practically usable performance?
2. Do i-vector systems provide information that HMM systems lack?
3. Can duration-dependent performance mismatches be compensated?
Speaker Verification
Speech processing
1. Raw speech signal as air pressure changes
2. Frequency analysis: spectral representation
3. Short-time acoustic features, e.g. Mel-Frequency Cepstral Coefficients (MFCCs)
[Figure: three panels over time: air pressure, frequency, MFCCs]
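The three processing steps can be sketched with NumPy alone. This is a simplified illustration under stated assumptions, not the front end of the presented system: the sampling rate, frame length, hop size, FFT size, filter count, and number of coefficients are hypothetical defaults, and refinements such as pre-emphasis and liftering are omitted.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):         # rising edge of the triangle
            fbank[i, k] = (k - left) / (center - left)
        for k in range(center, right):        # falling edge of the triangle
            fbank[i, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # 1. Raw signal -> overlapping, Hamming-windowed frames
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Frequency analysis: power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 3. Mel filterbank energies -> log -> unnormalized DCT-II
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_e = np.log(energies + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * np.arange(n_ceps)[:, None])
    return log_e @ dct.T                      # shape: (n_frames, n_ceps)

# One second of a synthetic 440 Hz tone plus a little noise as a stand-in for speech.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
signal = np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.standard_normal(16000)
feats = mfcc(signal)
```

With these settings, one second of audio yields one 13-dimensional feature vector per 10 ms hop.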
Speaker Verification
Statistical model: i-vector extraction
4. Gaussian modeling
5. Speaker sub-spaces from the Universal Background Model (UBM)
6. Identity vectors (i-vectors) as characteristic offset
[Figure: UBM, Speaker A, and Speaker B in the MFCC-1 / MFCC-2 plane; mapping by the total variability matrix*]
* trained iteratively on development data to optimize the model fit: UBM offset ↦ i-vectors
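The "characteristic offset" idea is usually written as the total variability model from the i-vector literature. The symbols below are the conventional ones and do not appear on the slide: $M$ is the utterance-dependent GMM mean supervector, $m$ the UBM mean supervector, $T$ the low-rank total variability matrix, and $w$ the latent vector whose posterior mean is taken as the i-vector.

```latex
M = m + T\,w, \qquad w \sim \mathcal{N}(0, I)
```

In this reading, the number of columns of $T$ is the i-vector dimension, and the iterations mentioned in the footnote refine $T$ on development data.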
Speaker Verification
How to imagine i-vectors?
[Figure: speaker separation]
◮ Relevant parameters
  ◮ UBM size: detail of the acoustic space
  ◮ # iterations: adaptation depth of total variability
  ◮ # characteristic factors: i-vector dimension
Research & Development
Extending the baseline, exploring new technologies
◮ Research on new technologies
  ◮ Experimental evaluations on i-vectors
  ◮ Commercial and academic scenarios
  ◮ Participation in an international research evaluation
◮ Developing more robust approaches
  ◮ Extending state-of-the-art i-vector score normalization
◮ Implementation of a speaker verification framework in Matlab according to ISO/IEC IS 19795-1, Information technology – Biometric performance testing and reporting – Part 1: Principles and framework
  ⇒ reproducible research
Research & Development
Commercial scenario: experimental set-up
◮ Aiming at research questions
  1. i-vectors in short-duration scenarios?
  2. New information to HMMs by i-vectors?
◮ Short but fixed-duration scenario
  ◮ In-house database: 3–5 German digits
  ◮ Text-independent: random sequences
  ◮ 362 / 56 / 300 subjects (development, calibration, evaluation)
  ◮ 30–34 reference / 2 probe samples
  ≈ 200,000 comparisons
Research & Development
Examining i-vector parameters
[Figure: EER (in %) as a function of iterations (2, 5, 10) and factors (200, 300, 400, 600); one panel each for UBM sizes 128, 256, 512, and 1024]
Equal Error Rate (EER): % impostor match = % genuine non-match → lower error = better
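The EER definition above can be made concrete with a small threshold sweep. This is a minimal sketch with hypothetical toy scores; the function name `equal_error_rate` is an illustrative choice, and on finite data the two error rates rarely coincide exactly, so the sketch reports the point where they are closest.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping thresholds over all observed scores."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    # A comparison is accepted when its score is >= the threshold.
    fnmr = np.array([(genuine < t).mean() for t in thresholds])    # genuine rejected
    fmr = np.array([(impostor >= t).mean() for t in thresholds])   # impostor accepted
    i = np.argmin(np.abs(fmr - fnmr))        # operating point where both rates meet
    return (fmr[i] + fnmr[i]) / 2.0, thresholds[i]

# Hypothetical toy scores: higher means "more likely the same speaker".
genuine_scores = [2.0, 3.0, 4.0, 5.0]
impostor_scores = [0.0, 1.0, 2.5, 1.5]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
```

For these toy scores, one of four genuine and one of four impostor comparisons fall on the wrong side of the threshold 2.5, so both error rates are 25 %.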
Research & Development
Performance analysis
[Figure: Detection Error Tradeoff diagram, False Non-Match Rate (FNMR in %) over False Match Rate (FMR in %), both axes from 0.1 to 40, for i-vector-128, i-vector-256, and i-vector-512; marker at 30 FNMs]
Research & Development
Information analysis: cross-entropy of system fusion
[Figure: normalized entropy (0 to 1) over Bayesian thresholds η (−10 to 10); one panel each for HMM, i-vector-256, and HMM+i-vector-256]
Optimizing for minimum entropy (η ≈ 4.6 ≙ odds 1:100)
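The correspondence between η ≈ 4.6 and odds of 1:100 follows from the Bayes decision rule for log-likelihood-ratio scores. The sketch below assumes that the odds refer to target versus non-target prior odds, that both error types cost the same, and that scores are calibrated log-likelihood ratios; the function names are hypothetical.

```python
import math

def bayes_threshold(p_target, cost_fnm=1.0, cost_fm=1.0):
    """Log-likelihood-ratio threshold that minimizes the expected cost."""
    return math.log((cost_fm * (1.0 - p_target)) / (cost_fnm * p_target))

def accept(llr, eta):
    """Accept the identity claim when the evidence outweighs the prior odds."""
    return llr >= eta

# Prior odds of 1:100 (target : non-target) correspond to p_target = 1/101.
eta = bayes_threshold(p_target=1.0 / 101.0)
```

Under these assumptions `eta` equals ln(100), which rounds to 4.6: a probe is accepted only if its score makes the target hypothesis at least 100 times more likely than the alternative.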
Research & Development
Performance gains by fusion
[Figure: Detection Error Tradeoff diagram, FNMR (in %) over FMR (in %), both axes from 0.1 to 40, for HMM, i-vector-256, and HMM+i-vector-256; marker at 30 FNMs]
Research & Development
Academic scenario: experimental set-up
◮ Aiming at research question
  3. Compensation of duration mismatches on i-vectors?
◮ Variable-duration scenario
  ◮ Data from the 2013–2014 NIST i-vector Machine Learning Challenge
  ◮ Text-independent, multi-lingual scenario
  ◮ 4,781 / 1,306 subjects (development, evaluation)
  ◮ 5 reference samples / 9,634 probe i-vectors
  ≈ 12,000,000 comparisons
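The order of magnitude of the comparison count can be checked with a line of arithmetic. This assumes, as an interpretation not spelled out on the slide, that each of the 1,306 evaluation subjects is represented by one model and is compared against every one of the 9,634 probe i-vectors.

```python
# Assumption: one model per evaluation subject, compared with every probe i-vector.
n_subject_models = 1306
n_probe_ivectors = 9634
n_comparisons = n_subject_models * n_probe_ivectors
```

Under that assumption the product is 12,582,004, in line with the roughly twelve million comparisons quoted.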
Research & Development
Analysis of a variable-duration scenario
◮ 10 offline 5-fold cross-validations
◮ NIST baseline system: 2048-component UBM, 60 MFCCs, 600-dim i-vectors
◮ Adaptive Symmetric (AS) score normalization
$$ S' = \frac{1}{2} \left( \frac{S - \mu_{\text{reference}}}{\sigma_{\text{reference}}} + \frac{S - \mu_{\text{probe}}}{\sigma_{\text{probe}}} \right) $$
with each mean and standard deviation taken from the top-100 scores of comparisons to the dev-set
[Figure: EER (in %) by duration (5s, 10s, 20s, 40s, full) for Baseline and AS-norm, together with the overall results "Baseline, all" and "AS-norm, all"]
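The adaptive symmetric normalization on this slide averages two z-normalized versions of the raw score, one using cohort statistics of the reference and one using those of the probe. The sketch below is a minimal illustration, not the evaluated implementation: the function name `as_norm` is hypothetical, "top-100" is read as the 100 highest cohort scores, and the toy cohorts use `top_n=3` to keep the numbers checkable by hand.

```python
import numpy as np

def as_norm(score, ref_cohort_scores, probe_cohort_scores, top_n=100):
    """Adaptive symmetric normalization of one raw comparison score.

    ref_cohort_scores:   scores of the reference against the development set
    probe_cohort_scores: scores of the probe against the development set
    """
    def top_stats(cohort):
        # Keep only the top_n highest scores (assumed meaning of "top-100").
        top = np.sort(np.asarray(cohort, dtype=float))[-top_n:]
        return top.mean(), top.std()

    mu_ref, sigma_ref = top_stats(ref_cohort_scores)
    mu_probe, sigma_probe = top_stats(probe_cohort_scores)
    return 0.5 * ((score - mu_ref) / sigma_ref + (score - mu_probe) / sigma_probe)

# Hypothetical toy cohorts: the top-3 means are 3.0 (reference) and 5.0 (probe).
ref_cohort = [0.0, 1.0, 2.0, 3.0, 4.0]
probe_cohort = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
normalized_mid = as_norm(4.0, ref_cohort, probe_cohort, top_n=3)
normalized_high = as_norm(5.0, ref_cohort, probe_cohort, top_n=3)
```

A raw score of 4.0 sits exactly between the two cohort means and normalizes to zero, while 5.0 normalizes to a positive value. Because the statistics adapt to each reference and probe, scores from samples of different durations are shifted onto a more comparable scale.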