Rapid Computation of I-vector
Longting Xu 1,2, Kong Aik Lee 1, Haizhou Li 1 and Zhen Yang 2
1 Institute for Infocomm Research (I2R), Singapore
2 Nanjing University of Posts and Telecommunications, China
Introduction
• Compression process: an i-vector is a fixed-length, low-dimensional representation of a variable-length speech utterance [Dehak et al., 2011].
• i-vector = speaker + recording device + transmission channel + acoustic environment
• MAP estimate: the posterior mean of the latent variable $x$ in a multi-Gaussian factor analysis model (i.e., the total variability model).
• The alignment of frames to Gaussians can be accomplished using a GMM [Kenny et al., 2008] or senone posteriors [Lei et al., 2014].

Odyssey 2016, Bilbao, Spain
Introduction (cont'd)
• I-vector extraction is a posterior inference process:
  – Prior: $x \sim \mathcal{N}(0, I)$
  – Observations: $o_1, o_2, \ldots, o_T$
  – Posterior: $x \mid o_1, \ldots, o_T \sim \mathcal{N}\big(L^{-1} T^\top \Sigma^{-1} F,\; L^{-1}\big)$, where $L = I + T^\top \Sigma^{-1} N T$
• We use pre-whitened statistics [Matejka et al., 2011] in this work, so that
  $L = I + T^\top N T$ and $\hat{x} = L^{-1} T^\top F$
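As a concrete illustration, the posterior-mean computation with pre-whitened statistics can be sketched in NumPy. The dimensions and variable names below are toy choices of mine, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
C, F, M = 8, 5, 4                      # mixtures, feature dim, i-vector dim (toy sizes)
T = rng.standard_normal((C * F, M))    # total variability matrix (pre-whitened)
N_c = rng.uniform(1.0, 5.0, size=C)    # zero-order (occupancy) statistics per mixture
F_stat = rng.standard_normal(C * F)    # centered, pre-whitened first-order statistics

# Each mixture's count occupies an F-dim block on the supervector diagonal
N_diag = np.repeat(N_c, F)

# Posterior precision L = I + T' N T, and posterior mean x = L^{-1} T' F
L = np.eye(M) + T.T @ (N_diag[:, None] * T)
x = np.linalg.solve(L, T.T @ F_stat)
print(x.shape)  # (4,)
```

Forming $T^\top N T$ for each utterance is the expensive step that the rest of the talk works to avoid.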
Objective
• Objective: to reduce the computational complexity of i-vector extraction while keeping the memory requirement low, with only slight degradation in performance.
• Why?
  – Of particular interest for i-vector extraction on hand-held devices and in large-scale cloud-based applications.
  – The number of senone posteriors is approaching 10k and beyond [Sadjadi et al., 2016].
  – The T matrix is trained offline and one-off; its computational load is not generally seen as a bottleneck.
Problem statement
• The main computational load of i-vector extraction lies in computing the posterior covariance as part of the posterior mean estimation.
• Existing solutions simplify the posterior covariance estimation:
  – Eigen decomposition of the posterior covariance [Glembek et al., 2011]
  – Fixed occupancy count [Aronowitz et al., 2012]
  – Factorized subspace [Cumani et al., 2014]
  – Sparse coding [Xu et al., 2015]
Problem statement (cont'd)
• Proposed solution: estimate the posterior mean directly, without the need to evaluate the posterior covariance, by
  – using an informative prior, and
  – a uniform occupancy assumption.
I-vector extraction using INFORMATIVE PRIOR
Posterior inference with informative prior
• Conventional i-vector extraction assumes a standard Gaussian prior on the latent variable: $x \sim \mathcal{N}(0, I)$
• Consider a more general case where the prior on $x$ has mean $\mu_p$ and covariance $\Sigma_p$: $x \sim \mathcal{N}(\mu_p, \Sigma_p)$
• I-vector extraction with an informative prior:
  $\hat{x} = L^{-1}\big(T^\top F + \Sigma_p^{-1} \mu_p\big)$, where $L = \Sigma_p^{-1} + T^\top N T$
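A minimal NumPy sketch of the general informative-prior posterior mean (toy dimensions and variable names of my own choosing). Setting $\mu_p = 0$ and $\Sigma_p = I$ recovers the conventional extractor, while $\Sigma_p = (T^\top T)^{-1}$ gives the prior used on the next slide:

```python
import numpy as np

rng = np.random.default_rng(1)
CF, M = 40, 4
T = rng.standard_normal((CF, M))        # total variability matrix
N_diag = rng.uniform(1.0, 5.0, CF)      # supervector-expanded occupancies
F_stat = rng.standard_normal(CF)        # pre-whitened first-order statistics

def posterior_mean(mu_p, Sigma_p):
    """MAP i-vector under a Gaussian prior N(mu_p, Sigma_p)."""
    P = np.linalg.inv(Sigma_p)                 # prior precision
    L = P + T.T @ (N_diag[:, None] * T)        # posterior precision
    return np.linalg.solve(L, T.T @ F_stat + P @ mu_p)

x_standard = posterior_mean(np.zeros(M), np.eye(M))              # standard Gaussian prior
x_inform = posterior_mean(np.zeros(M), np.linalg.inv(T.T @ T))   # subspace-orthonormalizing prior
```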
Subspace-orthonormalizing prior
• In this work, we consider the following informative prior: $x \sim \mathcal{N}(0, \Sigma_p)$ with $\Sigma_p = (T^\top T)^{-1}$
• I-vector extraction:
  $\hat{x} = L^{-1} T^\top F$, where $L = T^\top T + T^\top N T$
  $\hat{x} = (T^\top T + T^\top N T)^{-1} T^\top F$
      $= \big[T^\top T\,\big(I + (T^\top T)^{-1} T^\top N T\big)\big]^{-1} T^\top F$
      $= \big[I + (T^\top T)^{-1} T^\top N T\big]^{-1} (T^\top T)^{-1} T^\top F$
Subspace-orthonormalizing prior (cont'd)
• Using the matrix inversion identity $(I + AB)^{-1} A = A (I + BA)^{-1}$, the i-vector with the subspace-orthonormalizing prior becomes
  $\hat{x} = \big[I + (T^\top T)^{-1} T^\top N T\big]^{-1} (T^\top T)^{-1} T^\top F = (T^\top T)^{-1} T^\top \big[I + N\, T (T^\top T)^{-1} T^\top\big]^{-1} F$
• Here $P = T (T^\top T)^{-1} T^\top = U_1 U_1^\top$ is the projection matrix onto the column space of $T$, where $U_1$ has orthonormal columns.
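The push-through step can be checked numerically. The sketch below (arbitrary toy sizes of mine) verifies that the form inverting an $M \times M$ matrix and the form inverting a $CF \times CF$ matrix yield the same i-vector:

```python
import numpy as np

rng = np.random.default_rng(2)
CF, M = 30, 4
T = rng.standard_normal((CF, M))
N = np.diag(rng.uniform(1.0, 5.0, CF))   # diagonal occupancy matrix
F_stat = rng.standard_normal(CF)

TtT_inv = np.linalg.inv(T.T @ T)

# Left-hand side: invert an M x M matrix
lhs = np.linalg.solve(np.eye(M) + TtT_inv @ T.T @ N @ T, TtT_inv @ T.T @ F_stat)

# Right-hand side: invert a CF x CF matrix, using the projection P = T (T'T)^{-1} T'
P = T @ TtT_inv @ T.T                    # orthogonal projection onto col(T)
rhs = TtT_inv @ T.T @ np.linalg.solve(np.eye(CF) + N @ P, F_stat)
```

Moving the inverse to the $CF \times CF$ side looks more expensive at first, but it is precisely the form that the SVD argument on the following slides reduces to a cheap diagonal inverse.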
I-vector extraction with RAPID COMPUTATION
Solving the matrix inversion
• Singular value decomposition of $T$: $T = U S V^\top$, with $U = [U_1, U_2]$
• It follows that $U_1$ spans the same subspace as the columns of $T$, and $U_1 \perp U_2$.
• Using the above, we solve the following matrix inversion:
  $\big[I + N\, T (T^\top T)^{-1} T^\top\big]^{-1} = \big[I + N U_1 U_1^\top\big]^{-1} = \big[I + N (I - U_2 U_2^\top)\big]^{-1} = \big[(I + N) - N U_2 U_2^\top\big]^{-1}$
Solving the matrix inversion (cont'd)
• Let $A = (I + N)^{-1}$, so that
  $\big[(I + N) - N U_2 U_2^\top\big]^{-1} = \big(A^{-1} - N U_2 U_2^\top\big)^{-1}$
• Using the matrix inversion lemma:
  $\big(A^{-1} - N U_2 U_2^\top\big)^{-1} = A + A N U_2 \big(I - U_2^\top A N U_2\big)^{-1} U_2^\top A$
• Using again the matrix inversion identity:
  $\big[I + N\, T (T^\top T)^{-1} T^\top\big]^{-1} = A + A N U_2 U_2^\top \big(I - A N U_2 U_2^\top\big)^{-1} A$
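The identity chain above can be verified numerically; the toy-sized sketch below takes $U_2$ from the full SVD of a random $T$ and checks both the matrix-inversion-lemma form and the final push-through form against a direct inverse:

```python
import numpy as np

rng = np.random.default_rng(3)
CF, M = 20, 4
T = rng.standard_normal((CF, M))
N = np.diag(rng.uniform(1.0, 5.0, CF))   # diagonal occupancy matrix

U, _, _ = np.linalg.svd(T)               # full SVD: U = [U1, U2]
U2 = U[:, M:]                            # orthogonal complement of col(T)

A = np.linalg.inv(np.eye(CF) + N)        # A = (I + N)^{-1}
target = np.linalg.inv(np.eye(CF) + N @ T @ np.linalg.inv(T.T @ T) @ T.T)

# Matrix inversion lemma form
lemma = A + A @ N @ U2 @ np.linalg.inv(np.eye(CF - M) - U2.T @ A @ N @ U2) @ U2.T @ A

# Push-through form from the last line of the derivation
push = A + A @ N @ U2 @ U2.T @ np.linalg.solve(np.eye(CF) - A @ N @ U2 @ U2.T, A)
```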
Rapid computation of i-vector
• Uniform occupancy assumption: $A = (I + N)^{-1} \approx (1 + \nu)^{-1} I$, i.e., $N \approx \nu I$ for $\nu \geq 0$
• Or equivalently, every mixture receives roughly the same occupancy: $N_c \approx \nu = \frac{1}{C} \sum_c N_c$
• The matrix inversion can then be simplified: under this assumption, the second term $A N U_2 U_2^\top \big(I - A N U_2 U_2^\top\big)^{-1} A$ becomes proportional to $U_2 U_2^\top$.
• Since $T \perp U_2$, the second term diminishes after premultiplication by $(T^\top T)^{-1} T^\top$:
  $\hat{x} = (T^\top T)^{-1} T^\top \big[I + N\, T (T^\top T)^{-1} T^\top\big]^{-1} F \approx (T^\top T)^{-1} T^\top (I + N)^{-1} F$
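When the occupancies are exactly uniform ($N = \nu I$), the fast formula coincides with the exact one; in practice the occupancies are only approximately uniform and the formula becomes an approximation. A sketch under assumed toy sizes:

```python
import numpy as np

rng = np.random.default_rng(4)
CF, M = 30, 4
T = rng.standard_normal((CF, M))
F_stat = rng.standard_normal(CF)
nu = 3.7
N = nu * np.eye(CF)                      # exactly uniform occupancy

# Exact: x = (T'T + T'NT)^{-1} T'F
x_exact = np.linalg.solve(T.T @ T + T.T @ N @ T, T.T @ F_stat)

# Fast: x = (T'T)^{-1} T' (I + N)^{-1} F  -- no utterance-dependent matrix inversion
W = np.linalg.solve(T.T @ T, T.T)        # (T'T)^{-1} T', computable offline
x_fast = W @ (F_stat / (1.0 + nu))       # (I + nu I)^{-1} is a scalar division
```

The design point is that $W$ depends only on $T$, so the per-utterance cost collapses to a diagonal scaling and one matrix-vector product.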
Computational complexity and memory cost

| Method | Complexity | Memory cost | Time ratio |
|---|---|---|---|
| Baseline (slow) | O(CFM² + M³) | O(CFM) | 106.44 |
| Baseline (fast) | O(CFM + CM² + M³) | O(CFM + CM²) | 11.99 |
| Proposed (exact) | O(CFM + CM² + M³) | O(CFM + CM²) | 12.65 |
| Proposed (fast) | O(CFM) | O(CFM) | 1 |

Baseline (slow): $\hat{x} = \big(I + \sum_c N_c T_c^\top T_c\big)^{-1} T^\top F$
Baseline (fast): $\hat{x} = \big(I + \sum_c N_c A_c\big)^{-1} T^\top F$, with $A_c = T_c^\top T_c$ pre-computed
Proposed (exact): $\hat{x} = \big[\sum_c (N_c + 1)\, T_c^\top T_c\big]^{-1} T^\top F$
Proposed (fast): $\hat{x} = (T^\top T)^{-1} T^\top (I + N)^{-1} F$
Posterior covariance
• Determined by the zero-order statistics and the $T$ matrix.
• Might be desired for uncertainty propagation.
• Computed using the same derivation procedure as for the posterior mean.
• The computational complexity is O(CM²), assuming that the matrices $T_c^\top T_c$ are pre-computed.
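With the subspace-orthonormalizing prior, the posterior precision is $L = \sum_c (1 + N_c)\, T_c^\top T_c$, so it can be accumulated from pre-computed $M \times M$ Gram matrices in O(CM²) per utterance. A toy sketch (dimensions and names are my own):

```python
import numpy as np

rng = np.random.default_rng(5)
C, F, M = 8, 5, 4
T = rng.standard_normal((C * F, M))      # total variability matrix, C blocks of F rows
N_c = rng.uniform(1.0, 5.0, C)           # zero-order statistics per mixture

# Offline: pre-compute the per-mixture Gram matrices T_c' T_c
T_blocks = T.reshape(C, F, M)
grams = np.einsum('cfi,cfj->cij', T_blocks, T_blocks)   # C matrices of size M x M

# Per utterance: O(C M^2) weighted accumulation, then a single M x M inversion
L = np.einsum('c,cij->ij', 1.0 + N_c, grams)
post_cov = np.linalg.inv(L)              # posterior covariance of the i-vector
```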
Using informative prior in EM update of T
Rapid computation of i-vector: EXPERIMENT
Experimental setup
• NIST SRE'10 extended core task, common conditions (CCs) 1 to 9
• UBM
  – Gender-dependent with C = 512 mixtures
  – 57-dimensional MFCCs
  – Trained on SWB and SRE'04, '05, '06
• T matrix
  – M = 400
  – Trained on the same dataset as the UBM
• PLDA
  – LDA to 300 dimensions, followed by length normalization
  – 200 speaker factors
  – Full residual covariance for channel modeling
SRE'10 core-extended (female)
• Introducing an informative prior does not seem to degrade the performance.
• For the tel-tel CC5, the relative degradation is 10.04% in EER and 4.54% in minDCF.
• Across all 9 CCs, the relative degradation ranges from 10.04% to 16.11% in EER and from 0.67% to 20.40% in minDCF.
[Figure: EER and minDCF10 across the common conditions]
SRE'10 core-extended (female)
• We trained the T matrix assuming a subspace-orthonormalizing prior.
• Compared to the results using a T matrix trained with the standard Gaussian prior, slightly better results could be observed.
[Figure: EER and minDCF10 comparison]
Conclusion
• We introduced the following for rapid computation of i-vectors:
  – A subspace-orthonormalizing prior
  – A uniform occupancy assumption
• The computational speed-up is attained by avoiding the need to compute the posterior covariance in order to compute the posterior mean.
• The proposed method speeds up i-vector extraction by a factor of 12 compared to the fast baseline (and a factor of 106 compared to the slow baseline), with marginal degradation in recognition accuracy.
Thanks for Your Attention! Questions?