1. Rapid Computation of I-vector
Longting Xu 1,2, Kong Aik Lee 1, Haizhou Li 1 and Zhen Yang 2
1 Institute for Infocomm Research (I2R), Singapore
2 Nanjing University of Posts and Telecommunications, China

2. Introduction
• Compression process: an i-vector is a fixed-length, low-dimensional representation of a variable-length speech utterance [Dehak et al., 2011].
• i-vector = speaker + recording device + transmission channel + acoustic environment
• MAP estimate: the posterior mean of the latent variable x in a multi-Gaussian factor analysis model (i.e., the total variability model).
• The alignment of frames to Gaussians can be accomplished using a GMM [Kenny et al., 2008] or senone posteriors [Lei et al., 2014].

3. Introduction (cont'd)
• I-vector extraction is a posterior inference process:
  Prior: x ~ N(0, I)    Observations: o_1, o_2, ..., o_T
  Posterior mean: ⟨x⟩ = L⁻¹ Tᵀ Σ⁻¹ F, with L = I + Tᵀ Σ⁻¹ N T
• We use pre-whitened statistics [Matejka et al., 2011] in this work, so that
  ⟨x⟩ = (I + Tᵀ N T)⁻¹ Tᵀ F
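
A minimal numerical sketch of this extractor (illustrative only, not the authors' code); the dimensions and random statistics are made up for the example, and the statistics are assumed to be already pre-whitened:

```python
import numpy as np

# i-vector as the posterior mean of x given pre-whitened Baum-Welch
# statistics: <x> = (I + T' N T)^{-1} T' F.
rng = np.random.default_rng(0)
C, F, M = 8, 5, 4                       # mixtures, feature dim, i-vector dim

T = rng.standard_normal((C * F, M))     # total variability matrix
n = rng.uniform(0.5, 20.0, size=C)      # zero-order stats (occupancy counts)
N = np.kron(np.diag(n), np.eye(F))      # CF x CF block-diagonal N
Fstat = rng.standard_normal(C * F)      # pre-whitened first-order stats

L = np.eye(M) + T.T @ N @ T             # posterior precision
ivector = np.linalg.solve(L, T.T @ Fstat)
print(ivector)
```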

4. Objective
• Objective: to reduce the computational complexity of i-vector extraction while keeping the memory requirement low, with only slight degradation in performance.
• Why?
  – Of particular interest for i-vector extraction on hand-held devices and in large-scale cloud-based applications.
  – The number of senone posteriors is approaching 10k and beyond [Sadjadi et al., 2016].
  – The T matrix is trained offline and only once, so its computational load is not generally seen as a bottleneck.

5. Problem statement
• The main computational load of i-vector extraction lies in computing the posterior covariance as part of the posterior mean estimation.
• Existing solutions simplify the posterior covariance estimation:
  – Eigen-decomposition of the posterior covariance [Glembek et al., 2011]
  – Fixed occupancy count [Aronowitz et al., 2012]
  – Factorized subspace [Cumani et al., 2014]
  – Sparse coding [Xu et al., 2015]

6. Problem statement (cont'd)
• Proposed solution: estimate the posterior mean directly, without the need to evaluate the posterior covariance, using
  – an informative prior, and
  – a uniform occupancy assumption.

7. I-vector extraction using INFORMATIVE PRIOR

8. Posterior inference with informative prior
• Conventional i-vector extraction assumes a standard Gaussian prior on the latent variable x: x ~ N(0, I).
• Consider the more general case where the prior on x has mean μ_p and covariance Σ_p: x ~ N(μ_p, Σ_p).
• I-vector extraction with the informative prior:
  ⟨x⟩ = L⁻¹ (Tᵀ F + Σ_p⁻¹ μ_p), with L = Σ_p⁻¹ + Tᵀ N T
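
The slide's closing formula is truncated in the source; the line above is the standard linear-Gaussian posterior update, which is consistent with the L = Σ_p⁻¹ + Tᵀ N T form used on the next slide. A small sketch (illustrative sizes and random statistics):

```python
import numpy as np

def posterior_mean(T, N, Fstat, mu_p, Sigma_p):
    """Posterior mean of x under the prior N(mu_p, Sigma_p), given
    pre-whitened statistics: L = Sigma_p^{-1} + T' N T,
    <x> = L^{-1} (T' F + Sigma_p^{-1} mu_p)."""
    Sigma_p_inv = np.linalg.inv(Sigma_p)
    L = Sigma_p_inv + T.T @ N @ T
    return np.linalg.solve(L, T.T @ Fstat + Sigma_p_inv @ mu_p)

# With mu_p = 0 and Sigma_p = I this reduces to the conventional extractor.
rng = np.random.default_rng(0)
CF, M = 40, 4
T = rng.standard_normal((CF, M))
N = np.diag(rng.uniform(0.5, 20.0, CF))
Fstat = rng.standard_normal(CF)
x = posterior_mean(T, N, Fstat, np.zeros(M), np.eye(M))
assert np.allclose(x, np.linalg.solve(np.eye(M) + T.T @ N @ T, T.T @ Fstat))
```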

9. Subspace-orthonormalizing prior
• In this work, we consider the following informative prior:
  x ~ N(0, Σ_p), with Σ_p = (Tᵀ T)⁻¹
• I-vector extraction:
  ⟨x⟩ = L⁻¹ Tᵀ F, with L = Σ_p⁻¹ + Tᵀ N T
      = (Tᵀ T + Tᵀ N T)⁻¹ Tᵀ F
      = [Tᵀ (I + N) T]⁻¹ Tᵀ F

10. Subspace-orthonormalizing prior (cont'd)
• Using the matrix inversion identity, i-vector extraction with the subspace-orthonormalizing prior becomes
  ⟨x⟩ = [Tᵀ (I + N) T]⁻¹ Tᵀ F = (Tᵀ T)⁻¹ Tᵀ [I + N T(Tᵀ T)⁻¹Tᵀ]⁻¹ F
• where P = T (Tᵀ T)⁻¹ Tᵀ = U₁U₁ᵀ is a projection matrix with orthonormal columns U₁.
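
The identity above can be checked numerically; the following sketch (illustrative dimensions, random statistics) verifies that the two expressions for ⟨x⟩ agree:

```python
import numpy as np

# Check: (T'T + T'NT)^{-1} T'F == (T'T)^{-1} T' (I + N P)^{-1} F,
# where P = T (T'T)^{-1} T' is the orthogonal projector onto span(T).
rng = np.random.default_rng(1)
CF, M = 40, 4
T = rng.standard_normal((CF, M))
N = np.diag(rng.uniform(0.5, 20.0, CF))
Fstat = rng.standard_normal(CF)

lhs = np.linalg.solve(T.T @ T + T.T @ N @ T, T.T @ Fstat)
P = T @ np.linalg.solve(T.T @ T, T.T)
rhs = np.linalg.solve(T.T @ T, T.T @ np.linalg.solve(np.eye(CF) + N @ P, Fstat))
assert np.allclose(lhs, rhs)
```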

11. I-vector extraction with RAPID COMPUTATION

12. Solving the matrix inversion
• Singular value decomposition of T: T = U S Vᵀ = [U₁ U₂] S Vᵀ
• It follows that U₁ spans the same subspace as T, and U₁ ⊥ U₂.
• Using the above, we solve the following matrix inversion:
  [I + N T(Tᵀ T)⁻¹Tᵀ]⁻¹ = (I + N U₁U₁ᵀ)⁻¹
                        = [I + N (I − U₂U₂ᵀ)]⁻¹
                        = (I + N − N U₂U₂ᵀ)⁻¹
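
A quick numerical confirmation of this step (illustrative sizes): since U₁U₁ᵀ + U₂U₂ᵀ = I, the projector onto span(T) equals U₁U₁ᵀ, and the two inverses coincide:

```python
import numpy as np

rng = np.random.default_rng(2)
CF, M = 40, 4
T = rng.standard_normal((CF, M))
N = np.diag(rng.uniform(0.5, 20.0, CF))

U = np.linalg.svd(T, full_matrices=True)[0]
U1, U2 = U[:, :M], U[:, M:]

P = T @ np.linalg.solve(T.T @ T, T.T)   # projector onto span(T)
assert np.allclose(P, U1 @ U1.T)        # U1 spans the same subspace as T
lhs = np.linalg.inv(np.eye(CF) + N @ P)
rhs = np.linalg.inv(np.eye(CF) + N - N @ U2 @ U2.T)
assert np.allclose(lhs, rhs)
```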

13. Solving the matrix inversion (cont'd)
• Let A = (I + N), so that
  [I + N T(Tᵀ T)⁻¹Tᵀ]⁻¹ = (I + N − N U₂U₂ᵀ)⁻¹ = (A − N U₂U₂ᵀ)⁻¹
• Using the matrix inversion lemma:
  (A − N U₂U₂ᵀ)⁻¹ = A⁻¹ + A⁻¹ N U₂ (I − U₂ᵀ A⁻¹ N U₂)⁻¹ U₂ᵀ A⁻¹
• Using again the matrix inversion identity:
  [I + N T(Tᵀ T)⁻¹Tᵀ]⁻¹ = A⁻¹ + A⁻¹ N U₂U₂ᵀ (I − A⁻¹ N U₂U₂ᵀ)⁻¹ A⁻¹
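
The matrix-inversion-lemma step can likewise be verified numerically (same illustrative setup; note that A = I + N is diagonal, so A⁻¹ is cheap):

```python
import numpy as np

rng = np.random.default_rng(3)
CF, M = 40, 4
T = rng.standard_normal((CF, M))
N = np.diag(rng.uniform(0.5, 20.0, CF))
A = np.eye(CF) + N

U2 = np.linalg.svd(T, full_matrices=True)[0][:, M:]
Ainv = np.linalg.inv(A)

lhs = np.linalg.inv(A - N @ U2 @ U2.T)
core = np.linalg.inv(np.eye(CF - M) - U2.T @ Ainv @ N @ U2)
rhs = Ainv + Ainv @ N @ U2 @ core @ U2.T @ Ainv
assert np.allclose(lhs, rhs)
```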

14. Rapid computation of i-vector
• Uniform occupancy assumption:
  A⁻¹ N = (I + N)⁻¹ N ≈ α I, for 0 ≤ α ≤ 1
• Or equivalently, α_c = N_c / (1 + N_c) is approximately constant across mixtures c.
• The matrix inversion can then be simplified as
  [I + N T(Tᵀ T)⁻¹Tᵀ]⁻¹ ≈ A⁻¹ + α U₂U₂ᵀ (I − A⁻¹ N U₂U₂ᵀ)⁻¹ A⁻¹
• Since T ⊥ U₂, the second term diminishes when left-multiplied by Tᵀ:
  ⟨x⟩ = (Tᵀ T)⁻¹ Tᵀ [I + N T(Tᵀ T)⁻¹Tᵀ]⁻¹ F ≈ (Tᵀ T)⁻¹ Tᵀ (I + N)⁻¹ F
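
A sketch comparing the exact and fast extractors (illustrative sizes; in practice (TᵀT)⁻¹Tᵀ would be precomputed offline). When the occupancy counts are exactly uniform, the fast formula matches the exact one; with non-uniform counts it is an approximation:

```python
import numpy as np

rng = np.random.default_rng(4)
C, F, M = 8, 5, 4
T = rng.standard_normal((C * F, M))
n = np.full(C, 10.0)                    # uniform occupancy counts
N = np.kron(np.diag(n), np.eye(F))
Fstat = rng.standard_normal(C * F)

# Exact (subspace-orthonormalizing prior)
exact = np.linalg.solve(T.T @ T + T.T @ N @ T, T.T @ Fstat)
# Fast: <x> = (T'T)^{-1} T' (I + N)^{-1} F -- (I + N) is diagonal
fast = np.linalg.solve(T.T @ T, T.T @ (Fstat / (1.0 + np.diag(N))))
assert np.allclose(exact, fast)         # exact when N_c is constant
```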

15. Computational complexity and memory cost

  Method            Complexity           Memory cost     Time ratio
  Baseline (slow)   O(CFM² + M³)         O(CFM)          106.44
  Baseline (fast)   O(CFM + CM² + M³)    O(CFM + CM²)    11.99
  Proposed (exact)  O(CFM + CM² + M³)    O(CFM + CM²)    12.65
  Proposed (fast)   O(CFM)               O(CFM)          1

  Baseline (slow):  ⟨x⟩ = (I + Σ_c N_c T_cᵀT_c)⁻¹ Tᵀ F
  Baseline (fast):  ⟨x⟩ = (I + Σ_c N_c A_c)⁻¹ Tᵀ F, with A_c = T_cᵀT_c precomputed
  Proposed (exact): ⟨x⟩ = [Σ_c (N_c + 1) T_cᵀT_c]⁻¹ Tᵀ F
  Proposed (fast):  ⟨x⟩ = (Tᵀ T)⁻¹ Tᵀ (I + N)⁻¹ F
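
To make the slow/fast baseline distinction concrete, here is an illustrative sketch (not the authors' code): the fast baseline precomputes A_c = T_cᵀT_c once, which is the O(CM²) memory term, so assembling the posterior precision at run time avoids rebuilding T_cᵀT_c per utterance:

```python
import numpy as np

rng = np.random.default_rng(5)
C, F, M = 8, 5, 4
T = rng.standard_normal((C * F, M))
Tc = T.reshape(C, F, M)                      # per-mixture sub-matrices T_c

A = np.einsum('cfm,cfn->cmn', Tc, Tc)        # offline: A_c = T_c' T_c

n = rng.uniform(0.5, 20.0, C)                # per-utterance zero-order stats
Fstat = rng.standard_normal(C * F)

L_fast = np.eye(M) + np.einsum('c,cmn->mn', n, A)
L_slow = np.eye(M) + sum(n[c] * Tc[c].T @ Tc[c] for c in range(C))
assert np.allclose(L_fast, L_slow)
ivector = np.linalg.solve(L_fast, T.T @ Fstat)
```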

16. Posterior covariance
• Determined by the zero-order statistics and the T matrix.
• Might be desired for uncertainty propagation.
• Derived using the same procedure as for the posterior mean.
• The computational complexity is O(CM²), assuming that the matrices T_cᵀT_c are pre-computed.
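
A brief sketch of that cost (illustrative sizes): with the A_c = T_cᵀT_c matrices precomputed, assembling the posterior precision is a weighted sum over C terms of size M × M, i.e., O(CM²) per utterance:

```python
import numpy as np

rng = np.random.default_rng(6)
C, F, M = 8, 5, 4
Tc = rng.standard_normal((C, F, M))
A = np.einsum('cfm,cfn->cmn', Tc, Tc)        # precomputed offline

n = rng.uniform(0.5, 20.0, C)                # zero-order statistics
L = np.eye(M) + np.einsum('c,cmn->mn', n, A) # O(CM^2) per utterance
post_cov = np.linalg.inv(L)                  # for uncertainty propagation
```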

17. Using informative prior in EM update of T

18. Rapid computation of i-vector: EXPERIMENTS

19. Experimental setup
• NIST SRE'10 extended core task, common conditions (CC) 1 to 9
• UBM
  – Gender-dependent with C = 512 mixtures
  – 57-dimensional MFCCs
  – Trained on Switchboard (SWB) and SRE'04, '05, '06
• T matrix
  – M = 400
  – Trained using the same dataset as the UBM
• PLDA
  – LDA to 300 dimensions, followed by length normalization
  – 200 speaker factors
  – Full residual covariance for channel modeling

20. SRE'10 core-extended (female)
• Introducing an informative prior does not seem to degrade the performance.
• For the tel-tel condition CC 5, the relative degradation is 10.04% in EER and 4.54% in min DCF.
• Across all nine CCs, the relative degradation ranges from 10.04% to 16.11% in EER and from 0.67% to 20.40% in min DCF.
[Bar charts: EER and MinDCF10 across the nine common conditions]

21. SRE'10 core-extended (female)
• We trained the T matrix assuming a subspace-orthonormalizing prior.
• Compared to the results using a T matrix trained with the standard Gaussian prior, slightly better results are observed.
[Bar charts: EER and MinDCF10]

22. Conclusion
• We introduced the following for rapid computation of i-vectors:
  – a subspace-orthonormalizing prior
  – a uniform occupancy assumption
• The computational speed-up is attained by avoiding the need to compute the posterior covariance in order to obtain the posterior mean.
• The proposed method speeds up i-vector extraction by a factor of 12 compared to the fast baseline (and a factor of 106 compared to the slow baseline), with marginal degradation in recognition accuracy.

23. Thanks for your attention! Questions?
