Iterative Bayesian and MMSE-based noise compensation techniques for speaker recognition in the i-vector space
Waad Ben Kheder, Driss Matrouf, Moez Ajili, Jean-François Bonastre
LIA laboratory, University of Avignon
Odyssey, 2016
Outline
1 Introduction
  State of the art speaker recognition systems
  Dealing with noise in speaker recognition systems
2 I-vector denoising using I-MAP and the Kabsch algorithm
  Motivation
  The I-MAP denoising procedure
  The Kabsch algorithm
3 Experimental protocol and results
  Experimental protocol
  Results
State of the art speaker recognition systems
Structure of a speaker recognition system
[Figure: structure of a speaker recognition system]
Dealing with noise in speaker recognition systems
Many techniques can be used to deal with noise:
• Speech enhancement techniques
• Feature compensation (VTS, SPLICE, ...)
• Model compensation (PMC, ...)
• Noise-robust scoring (multi-style training)
• DNN-based techniques (robust feature extraction, robust statistics computation, ...)
Motivation
• Cleaning i-vectors estimated over noisy data (noisy i-vectors).
• Using a clean front-end (the same i-vector extraction procedure for all noises).
• Using a clean back-end (the same scoring procedure for all noises).
The I-MAP denoising procedure
I-MAP model
The I-MAP procedure is based on the relationship:
$$N = Y - X \quad (1)$$
where $X$ and $Y$ are two random variables representing clean and noisy i-vectors, respectively, and $N$ represents the noise.
Hypothesis
Full-covariance Gaussian distributions are used for:
• Clean i-vectors: $X \sim \mathcal{N}(X; \mu_X, \Sigma_X)$
• Noise in the i-vector space: $N \sim \mathcal{N}(N; \mu_N, \Sigma_N)$
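The slides do not show how the two Gaussians are estimated. The sketch below assumes (hypothetically) that paired clean/noisy training i-vectors are available, with row i of both matrices extracted from the same utterance in its clean and noisy versions. Noise samples then follow Eq. (1) as $N = Y - X$, and both distributions are fitted with a sample mean and a full covariance matrix. The function name `estimate_imap_gaussians` is illustrative only.

```python
import numpy as np

def estimate_imap_gaussians(X_clean, Y_noisy):
    """Fit the full-covariance Gaussians of the I-MAP model.

    Assumption (hypothetical): row i of X_clean and row i of Y_noisy are
    i-vectors of the same utterance, clean and noisy respectively.
    """
    X_clean = np.asarray(X_clean, dtype=float)
    Y_noisy = np.asarray(Y_noisy, dtype=float)
    if X_clean.shape != Y_noisy.shape:
        raise ValueError("clean and noisy i-vectors must be paired row by row")
    # Noise samples in the i-vector space, following N = Y - X (Eq. 1).
    N = Y_noisy - X_clean
    # Sample means and full covariance matrices (one i-vector per row).
    mu_X = X_clean.mean(axis=0)
    Sigma_X = np.cov(X_clean, rowvar=False)
    mu_N = N.mean(axis=0)
    Sigma_N = np.cov(N, rowvar=False)
    return mu_X, Sigma_X, mu_N, Sigma_N
```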
The I-MAP denoising procedure
Solution
Using the MAP criterion, the cleaned-up version $\hat{X}_0$ of a noisy i-vector $Y_0$ can be written as:
$$\hat{X}_0 = \left(\Sigma_N^{-1} + \Sigma_X^{-1}\right)^{-1} \left(\Sigma_N^{-1}(Y_0 - \mu_N) + \Sigma_X^{-1}\mu_X\right) \quad (2)$$
with:
• Clean i-vectors: $X \sim \mathcal{N}(X; \mu_X, \Sigma_X)$
• Noise in the i-vector space: $N \sim \mathcal{N}(N; \mu_N, \Sigma_N)$
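Eq. (2) can be sketched directly in NumPy. This is an illustrative implementation, not the authors' code; `imap_denoise` is a hypothetical name. It processes one noisy i-vector per row and solves a linear system with $\Sigma_N^{-1} + \Sigma_X^{-1}$ rather than forming that inverse explicitly.

```python
import numpy as np

def imap_denoise(Y, mu_X, Sigma_X, mu_N, Sigma_N):
    """MAP estimate of clean i-vectors from noisy ones, following Eq. (2).

    Y: (n, M) noisy i-vectors, one per row (a single (M,) vector also works).
    """
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    mu_X = np.asarray(mu_X, dtype=float)
    mu_N = np.asarray(mu_N, dtype=float)
    P_X = np.linalg.inv(Sigma_X)  # clean-i-vector precision, Sigma_X^{-1}
    P_N = np.linalg.inv(Sigma_N)  # noise precision, Sigma_N^{-1}
    # Right-hand side of Eq. (2) for every row: P_N (y - mu_N) + P_X mu_X.
    rhs = (Y - mu_N) @ P_N.T + P_X @ mu_X
    # Solve (P_N + P_X) x = rhs for all rows at once.
    return np.linalg.solve(P_N + P_X, rhs.T).T
```

For instance, in one dimension with $\mu_X = \mu_N = 0$ and $\Sigma_X = \Sigma_N = 1$, a noisy value $y = 2$ is mapped to $\hat{x} = (1 + 1)^{-1} \cdot 2 = 1$: the prior and the noise model are equally confident, so the estimate lands halfway.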
The I-MAP denoising procedure
Implementation
[Figure: I-MAP implementation]
The I-MAP denoising procedure
How to improve I-MAP?
Problem: I-MAP cannot be applied iteratively to noisy test data, because the Gaussianity hypothesis is no longer guaranteed for the residual noise left after a first pass.
Solution: We propose to complement this technique with another MMSE-based approach that uses the Kabsch algorithm.
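The results report "I-MAP + Kabsch" after 1 and 2 iterations, but the slides do not spell out the loop. The sketch below shows one plausible arrangement and should be read as an assumption, not the authors' procedure: in each iteration, the Gaussian noise model is re-fitted on the current training residual, I-MAP is applied, and a Kabsch transform (the algorithm detailed in the following slides) is then fitted between the clean training i-vectors and their I-MAP outputs and applied to both training and test data. All function names are hypothetical.

```python
import numpy as np

def imap(Y, mu_X, Sigma_X, mu_N, Sigma_N):
    # MAP estimate of Eq. (2), one i-vector per row.
    P_X, P_N = np.linalg.inv(Sigma_X), np.linalg.inv(Sigma_N)
    rhs = (Y - mu_N) @ P_N.T + P_X @ mu_X
    return np.linalg.solve(P_N + P_X, rhs.T).T

def kabsch_fit(X, Y):
    # Rotation R and centroids such that R (y - y_bar) + x_bar approximates x.
    x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)
    A = (Y - y_bar).T @ (X - x_bar)
    V, _, Wt = np.linalg.svd(A)
    D = np.eye(A.shape[0])
    D[-1, -1] = np.sign(np.linalg.det(Wt.T @ V.T))
    return Wt.T @ D @ V.T, x_bar, y_bar

def kabsch_apply(T, R, x_bar, y_bar):
    # Center on the noisy centroid, rotate, move to the clean centroid.
    return (T - y_bar) @ R.T + x_bar

def imap_kabsch(X_train, Y_train, Y_test, n_iter=2):
    """HYPOTHETICAL iteration order; returns (denoised test, denoised train)."""
    mu_X = X_train.mean(axis=0)
    Sigma_X = np.cov(X_train, rowvar=False)
    Y_tr, Y_te = Y_train.copy(), Y_test.copy()
    for _ in range(n_iter):
        # 1) Re-fit the Gaussian noise model on the current residual, run I-MAP.
        resid = Y_tr - X_train
        mu_N, Sigma_N = resid.mean(axis=0), np.cov(resid, rowvar=False)
        Y_tr = imap(Y_tr, mu_X, Sigma_X, mu_N, Sigma_N)
        Y_te = imap(Y_te, mu_X, Sigma_X, mu_N, Sigma_N)
        # 2) Align the I-MAP outputs with the clean training i-vectors (Kabsch).
        R, x_bar, y_bar = kabsch_fit(X_train, Y_tr)
        Y_tr = kabsch_apply(Y_tr, R, x_bar, y_bar)
        Y_te = kabsch_apply(Y_te, R, x_bar, y_bar)
    return Y_te, Y_tr
```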
The Kabsch algorithm
Goal
The Kabsch algorithm finds the optimal translation vector and rotation matrix, in the least-squares sense, between two paired sets of points $\{x_i\}_{i=1..N}$ and $\{y_i\}_{i=1..N}$.
Example
[Figure: example]
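The Kabsch algorithm
Formulation
Given two sets of paired points $\{x_i\}_{i=1..N}$ and $\{y_i\}_{i=1..N}$ in $\mathbb{R}^M$, represented as $N \times M$ matrices $P_X$ and $P_Y$ whose $i$-th rows are $x_i^T$ and $y_i^T$:
$$P_X = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,M} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N,1} & x_{N,2} & \cdots & x_{N,M} \end{pmatrix}, \qquad P_Y = \begin{pmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,M} \\ y_{2,1} & y_{2,2} & \cdots & y_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ y_{N,1} & y_{N,2} & \cdots & y_{N,M} \end{pmatrix}$$
The orthogonal Procrustes problem aims at finding the orthogonal matrix $R$ that best maps $P_Y$ onto $P_X$ (i.e., $R\,y_i \approx x_i$ for every pair):
$$R = \arg\min_{R} \left\| P_Y R^T - P_X \right\|_F \quad (3)$$
where $R^T R = I_M$ and $\|\cdot\|_F$ denotes the Frobenius norm.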
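Why an SVD solves (3): the standard argument, sketched here rather than taken from the slides, writes the objective with the points as rows and uses $R^T R = I_M$ together with the cyclic property of the trace:
$$\left\| P_Y R^T - P_X \right\|_F^2 = \mathrm{tr}\left(R P_Y^T P_Y R^T\right) - 2\,\mathrm{tr}\left(R P_Y^T P_X\right) + \mathrm{tr}\left(P_X^T P_X\right) = \|P_Y\|_F^2 + \|P_X\|_F^2 - 2\,\mathrm{tr}(R A), \qquad A = P_Y^T P_X$$
Minimizing the norm is therefore equivalent to maximizing $\mathrm{tr}(RA)$. With the SVD $A = V S W^T$, $S = \mathrm{diag}(s_1, \dots, s_M)$, $s_1 \ge \dots \ge s_M \ge 0$:
$$\mathrm{tr}(R A) = \mathrm{tr}\left(W^T R V S\right) = \sum_{k=1}^{M} \left(W^T R V\right)_{kk} s_k \le \sum_{k=1}^{M} s_k$$
because $W^T R V$ is orthogonal, so its diagonal entries are at most 1. Equality holds for $W^T R V = I_M$, i.e. $R = W V^T$. If $\det(W V^T) = -1$, this optimum is a reflection; the best proper rotation flips the sign of the term with the smallest singular value $s_M$, giving $R = W\,\mathrm{diag}(1, \dots, 1, d)\,V^T$ with $d = \mathrm{sign}(\det(W V^T))$.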
The Kabsch algorithm
Step 1: Translation of the two sets of points
1 Compute the centroids (mean of the rows) of the clean and noisy sets of i-vectors:
  • $\bar{P}_X = \mathrm{centroid}(P_X)$
  • $\bar{P}_Y = \mathrm{centroid}(P_Y)$
2 Center all points of $P_X$ and $P_Y$ around the origin of the coordinate system:
  • $\tilde{P}_{X_i} = P_{X_i} - \bar{P}_X$ for each row $P_{X_i}$ of $P_X$.
  • $\tilde{P}_{Y_i} = P_{Y_i} - \bar{P}_Y$ for each row $P_{Y_i}$ of $P_Y$.
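Step 1 amounts to subtracting the row mean. A minimal NumPy sketch, with one i-vector per row; the helper name `center` is hypothetical.

```python
import numpy as np

def center(P):
    """Return the centroid of the rows of P and the centered matrix."""
    P = np.asarray(P, dtype=float)
    centroid = P.mean(axis=0)       # \bar{P}: mean over all rows (i-vectors)
    return centroid, P - centroid   # \tilde{P}: every row minus the centroid
```

Applied to both sets, e.g. `x_bar, PX_tilde = center(P_X)` and `y_bar, PY_tilde = center(P_Y)`, it yields the centroids reused in Step 3 and the centered matrices used in Step 2.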
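The Kabsch algorithm
Step 2: Estimation of the rotation matrix
1 Estimate a covariance matrix: $A = \tilde{P}_Y^T \tilde{P}_X$
2 Compute the SVD of $A$: $A = V S W^T$
3 Compute $d = \mathrm{sign}(\det(W V^T))$, which equals $-1$ when $W V^T$ is a reflection rather than a proper rotation
4 Estimate the rotation matrix $R$ as:
$$R = W \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & 1 & 0 \\ 0 & \cdots & 0 & d \end{pmatrix} V^T \quad (4)$$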
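Step 2 can be sketched as follows; `kabsch_rotation` is a hypothetical helper that takes already centered matrices (one i-vector per row) and returns $R$ such that $R\,\tilde{y} \approx \tilde{x}$. `np.linalg.svd` returns $A = U\,\mathrm{diag}(S)\,V_h$ with singular values in descending order, so $U$ plays the role of $V$ in (4), $V_h^T$ plays the role of $W$, and the last diagonal entry of the correction matrix is paired with the smallest singular value.

```python
import numpy as np

def kabsch_rotation(PX_tilde, PY_tilde):
    """Rotation R (M x M) such that R @ y_tilde approximates x_tilde."""
    A = PY_tilde.T @ PX_tilde               # Step 2.1: M x M covariance matrix
    V, S, Wt = np.linalg.svd(A)             # Step 2.2: A = V diag(S) W^T
    W = Wt.T
    d = np.sign(np.linalg.det(W @ V.T))     # Step 2.3: -1 flags a reflection
    D = np.eye(A.shape[0])
    D[-1, -1] = d                           # diag(1, ..., 1, d)
    return W @ D @ V.T                      # Step 2.4: Eq. (4)
```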
The Kabsch algorithm
Step 3: Application of the rotation to test data
Given a set of noisy test i-vectors $\{t_i\}_{i=1..N_t}$:
1 Center the test i-vectors: $\tilde{t}_i = t_i - \bar{P}_Y$ for all $i = 1..N_t$.
2 Rotate the test i-vectors: $\hat{t}_i = R\,\tilde{t}_i + \bar{P}_X$ for all $i = 1..N_t$.
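Putting the three steps together: `kabsch_fit` (Steps 1 and 2) estimates the centroids and the rotation from paired clean/noisy training i-vectors, and `kabsch_transform` (Step 3) applies them to noisy test i-vectors. Both names are hypothetical, and i-vectors are stored as rows, so $R\,\tilde{t}_i$ becomes `T_tilde @ R.T`.

```python
import numpy as np

def kabsch_fit(P_X, P_Y):
    """Steps 1 and 2: centroids and rotation mapping noisy rows P_Y onto clean rows P_X."""
    P_X = np.asarray(P_X, dtype=float)
    P_Y = np.asarray(P_Y, dtype=float)
    x_bar, y_bar = P_X.mean(axis=0), P_Y.mean(axis=0)
    A = (P_Y - y_bar).T @ (P_X - x_bar)
    V, _, Wt = np.linalg.svd(A)
    D = np.eye(A.shape[0])
    D[-1, -1] = np.sign(np.linalg.det(Wt.T @ V.T))
    return Wt.T @ D @ V.T, x_bar, y_bar

def kabsch_transform(T, R, x_bar, y_bar):
    """Step 3 on noisy test i-vectors stored as the rows of T."""
    T_tilde = np.asarray(T, dtype=float) - y_bar  # t~_i = t_i - centroid(P_Y)
    return T_tilde @ R.T + x_bar                  # t^_i = R t~_i + centroid(P_X)
```

When the noisy training i-vectors are an exact rotation plus translation of the clean ones, this pipeline recovers the clean test i-vectors exactly; with real noise it gives the least-squares rigid alignment.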
Experimental protocol
Data
• Train: NIST SRE 2004, 2005, 2006, Switchboard.
• Test: NIST SRE 2008 (det7 condition: all trials involve only English-language telephone speech in both training and test).
Speaker recognition system
• 512-component gender-dependent GMM-UBM.
• Total variability matrix T of rank 400 (400-dimensional i-vectors).
• Two-covariance scoring.
Recognition performance using the Kabsch algorithm
Recognition performance on male data in different test conditions, using clean enrollment and noisy test data (EER, %):

| Noise | SNR | Baseline | Kabsch | I-MAP | I-MAP + Kabsch (1 iteration) | I-MAP + Kabsch (2 iterations) |
|---|---|---|---|---|---|---|
| Air-cooling noise | 0dB | 26.85 | 17.18 | 13.21 | 8.86 | 7.24 |
| Air-cooling noise | 5dB | 15.21 | 10.34 | 7.25 | 4.71 | 3.89 |
| Air-cooling noise | 10dB | 9.51 | 5.70 | 4.85 | 2.94 | 2.55 |
| Air-cooling noise | 15dB | 5.41 | 3.40 | 2.85 | 1.82 | 1.63 |
| Car-driving noise | 0dB | 25.54 | 15.83 | 12.05 | 7.91 | 6.37 |
| Car-driving noise | 5dB | 14.54 | 9.30 | 6.65 | 3.63 | 3.04 |
| Car-driving noise | 10dB | 8.32 | 5.15 | 3.78 | 1.99 | 1.82 |
| Car-driving noise | 15dB | 4.82 | 3.22 | 2.36 | 1.79 | 1.65 |

- I-MAP: 40% to 60% relative EER improvement.
- Kabsch: up to 45% relative EER improvement.
- I-MAP + Kabsch: up to 85% relative EER improvement.
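The relative EER improvement quoted in these slides is the EER reduction divided by the baseline EER. A small sketch (the helper name is hypothetical) applied to one row of the male table, air-cooling noise at 0dB:

```python
def relative_eer_improvement(eer_baseline, eer_system):
    """Relative EER reduction with respect to the baseline, as a fraction."""
    return (eer_baseline - eer_system) / eer_baseline

# Male data, air-cooling noise, 0dB (EER in %, from the male table).
row = {
    "Baseline": 26.85,
    "Kabsch": 17.18,
    "I-MAP": 13.21,
    "I-MAP + Kabsch (1 iteration)": 8.86,
    "I-MAP + Kabsch (2 iterations)": 7.24,
}
for name, eer in row.items():
    gain = relative_eer_improvement(row["Baseline"], eer)
    print(f"{name}: {100 * gain:.1f}%")
```

For this row it prints 36.0% for Kabsch, 50.8% for I-MAP, and 67.0% and 73.0% for I-MAP + Kabsch after 1 and 2 iterations.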
Recognition performance using the Kabsch algorithm
Recognition performance on female data in different test conditions, using clean enrollment and noisy test data (EER, %):

| Noise | SNR | Baseline | Kabsch | I-MAP | I-MAP + Kabsch (1 iteration) | I-MAP + Kabsch (2 iterations) |
|---|---|---|---|---|---|---|
| Air-cooling noise | 0dB | 27.19 | 16.95 | 13.53 | 10.80 | 9.49 |
| Air-cooling noise | 5dB | 16.77 | 10.45 | 8.34 | 6.66 | 5.85 |
| Air-cooling noise | 10dB | 9.01 | 5.61 | 4.48 | 3.58 | 3.14 |
| Air-cooling noise | 15dB | 6.42 | 4.00 | 3.19 | 2.75 | 2.70 |
| Car-driving noise | 0dB | 24.82 | 15.47 | 12.35 | 9.86 | 8.66 |
| Car-driving noise | 5dB | 14.90 | 9.28 | 7.41 | 5.92 | 5.20 |
| Car-driving noise | 10dB | 8.65 | 5.39 | 4.30 | 3.43 | 3.02 |
| Car-driving noise | 15dB | 5.89 | 3.67 | 3.12 | 2.95 | 2.74 |

- I-MAP: 40% to 60% relative EER improvement.
- Kabsch: up to 45% relative EER improvement.
- I-MAP + Kabsch: up to 85% relative EER improvement.
Recognition performance using the Kabsch algorithm
Performance comparison in a heterogeneous setup for male and female data (EER, %):

| System | Male | Female |
|---|---|---|
| Baseline | 29.65 | 31.02 |
| Kabsch | 18.78 | 19.95 |
| I-MAP | 16.27 | 17.46 |
| I-MAP + Kabsch (1 iter.) | 8.67 | 10.62 |
| I-MAP + Kabsch (2 iter.) | 7.39 | 9.28 |
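As a worked example of the relative EER improvement (EER reduction divided by the baseline EER), the heterogeneous-setup figures for I-MAP + Kabsch after 2 iterations give:
$$\text{Male: } \frac{29.65 - 7.39}{29.65} = \frac{22.26}{29.65} \approx 75.1\%, \qquad \text{Female: } \frac{31.02 - 9.28}{31.02} = \frac{21.74}{31.02} \approx 70.1\%$$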
Summary
• Using I-MAP yields a 40% to 60% relative EER improvement over the baseline system.
• Using the Kabsch algorithm yields up to 45% relative EER improvement over the baseline system.
• Combining the two algorithms iteratively achieves better results with the same training data, reaching up to 85% relative EER improvement.
References
Waad Ben Kheder et al., "Additive noise compensation in the i-vector space for speaker recognition," 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2015.
Wolfgang Kabsch, "A solution for the best rotation to relate two sets of vectors," Acta Crystallographica 32:922.