

Constrained discriminative speaker verification specific to normalized i-vectors
P.M. Bousquet, J.F. Bonastre, LIA, University of Avignon
Odyssey 2016, June 21, 2016



  2. Discriminative approach for i-vectors: state of the art (SoA)
     - Normalization: centering and scaling by the within-class covariance matrix $W$, then length normalization.
     - Gaussian-PLDA modelling, with parameters $(\mu, \Phi, \Lambda)$, yielding the LLR score.
     - Discriminative classifier: logistic-regression-based (SoA), trained either on the score coefficients or on the PLDA parameters $(\mu, \Phi, \Lambda)$.

  3. Discriminative approach for i-vectors: proposed
     - Normalization: centering and scaling by the within-class covariance matrix $W$, length normalization, then an additional normalization procedure intended to constrain the discriminative training.
     - Gaussian-PLDA modelling, with parameters $(\mu, \Phi, \Lambda)$, yielding the LLR score.
     - Constrained discriminative classifier (limited order of coefficients to optimize): either logistic-regression-based (SoA), trained on the score coefficients or on the PLDA parameters $(\mu, \Phi, \Lambda)$, or a new approach, the orthonormal discriminative classifier.

  4. Gaussian-PLDA model
     A $d$-dimensional i-vector $w$ can be decomposed as follows:
     $$w = \mu + \Phi y_s + \varepsilon \qquad (1)$$
     - $\Phi y_s$ and $\varepsilon$ are assumed to be statistically independent, and $\varepsilon$ follows a centered Gaussian distribution with full covariance matrix $\Lambda$.
     - The speaker term $\Phi y_s$ can be full rank, with $y_s$ a $d$-vector (two-covariance model), or constrained to lie in the $r$-dimensional range of the $d \times r$ matrix $\Phi$ (eigenvoice subspace).
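The generative model (1) can be illustrated by sampling synthetic i-vectors. This is a minimal numpy sketch. It assumes the usual standard-normal prior $y_s \sim N(0, I_r)$ on the speaker factor, which the slide does not state. The function name `sample_plda` is hypothetical.

```python
import numpy as np

def sample_plda(mu, Phi, Lam, n_speakers, n_per_speaker, rng):
    """Draw i-vectors following eq. (1): w = mu + Phi @ y_s + eps.

    Assumption: y_s ~ N(0, I_r), shared by all sessions of speaker s;
    eps ~ N(0, Lam), drawn independently for every session.
    """
    d, r = Phi.shape
    L = np.linalg.cholesky(Lam)                 # Lam = L @ L.T
    ws, labels = [], []
    for s in range(n_speakers):
        y_s = rng.standard_normal(r)            # speaker factor
        eps = rng.standard_normal((n_per_speaker, d)) @ L.T  # rows ~ N(0, Lam)
        ws.append(mu + Phi @ y_s + eps)         # broadcast over the sessions
        labels += [s] * n_per_speaker
    return np.vstack(ws), np.array(labels)

rng = np.random.default_rng(0)
d, r = 5, 2
Phi = rng.standard_normal((d, r))               # eigenvoice matrix (d x r)
Lam = 0.5 * np.eye(d)                           # nuisance covariance
X, lab = sample_plda(np.zeros(d), Phi, Lam, 2000, 4, rng)
```

On such data the total covariance approaches $\Phi\Phi^t + \Lambda$, and the within-speaker covariance approaches $\Lambda$.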

  5. Gaussian-PLDA scoring
     Closed-form solution of the LLR score, a second-degree polynomial function of the components of $w_i$ and $w_j$:
     $$s_{i,j} = \log \frac{P(w_i, w_j \mid H_{tar})}{P(w_i, w_j \mid H_{non})} = w_i^t P w_j + \frac{1}{2}\left(w_i^t Q w_i + w_j^t Q w_j\right) - \mu^t (P+Q)(w_i + w_j) + \mu^t (P+Q)\mu + \frac{1}{2}\log|A_t| - \log|A_n| \qquad (2)$$
     where
     $$P = \Lambda^{-1}\Phi\left(2\Phi^t\Lambda^{-1}\Phi + I_r\right)^{-1}\Phi^t\Lambda^{-1}, \qquad Q = P - \Lambda^{-1}\Phi\left(\Phi^t\Lambda^{-1}\Phi + I_r\right)^{-1}\Phi^t\Lambda^{-1},$$
     $$A_t = \left(2\Phi^t\Lambda^{-1}\Phi + I_r\right)^{-1}, \qquad A_n = \left(\Phi^t\Lambda^{-1}\Phi + I_r\right)^{-1} \qquad (3)$$
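Equations (2) and (3) can be computed directly with numpy and checked against the log-ratio of the two joint Gaussian densities (same-speaker vs. different-speaker hypotheses). This is a sketch under the standard-normal prior on $y_s$ implied by the $I_r$ terms in (3). The names `plda_score_params` and `plda_llr` are hypothetical.

```python
import numpy as np

def plda_score_params(Phi, Lam):
    """Score matrices P, Q (eq. 3) and the constant 1/2 log|A_t| - log|A_n|."""
    r = Phi.shape[1]
    Li = np.linalg.inv(Lam)                   # Lambda^-1
    M = Phi.T @ Li @ Phi                      # Phi^t Lambda^-1 Phi  (r x r)
    I_r = np.eye(r)
    A_t = np.linalg.inv(2 * M + I_r)
    A_n = np.linalg.inv(M + I_r)
    P = Li @ Phi @ A_t @ Phi.T @ Li
    Q = P - Li @ Phi @ A_n @ Phi.T @ Li
    const = 0.5 * np.linalg.slogdet(A_t)[1] - np.linalg.slogdet(A_n)[1]
    return P, Q, const

def plda_llr(wi, wj, mu, P, Q, const):
    """Closed-form LLR score s_ij of eq. (2)."""
    PQ = P + Q
    return (wi @ P @ wj
            + 0.5 * (wi @ Q @ wi + wj @ Q @ wj)
            - mu @ PQ @ (wi + wj)
            + mu @ PQ @ mu
            + const)
```

Under $H_{tar}$ the pair $(w_i, w_j)$ has cross-covariance $\Phi\Phi^t$. Under $H_{non}$ it has none. The quadratic form in (2) is half the difference of the two inverse joint covariances.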

  6. Discriminative classifiers for speaker recognition
     SoA: based on logistic regression. Given the datasets of target and non-target trials $\chi_{tar}$, $\chi_{non}$, with cardinalities $N_{tar}$, $N_{non}$ respectively, the probability of correctly classifying all training trials, normalized by class size (total cross entropy), is:
     $$TCE = \left(\prod_{t \in \chi_{tar}} P(H_{tar} \mid t)\right)^{1/N_{tar}} \left(\prod_{t \in \chi_{non}} P(H_{non} \mid t)\right)^{1/N_{non}} \qquad (4)$$
     Goal: maximizing the (log-)TCE by gradient-based optimization (gradient ascent, or equivalently gradient descent on $-\log TCE$) with respect to some coefficients:
     - the PLDA LLR score coefficients (i.e. those of the score matrices $P$ and $Q$): the LLR score can be written as a dot product $\varphi_{i,j} \cdot \omega$ between an expanded vector of the trial $\varphi_{i,j}$ and a vector $\omega$ initialized with the PLDA parameters [Burget et al., 2011];
     - the PLDA parameters $(\mu, \Phi, \Lambda)$ [Borgström and McCree, 2013].
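The dot-product formulation and the log-TCE objective of (4) can be sketched in numpy as follows. The layout of the expanded vector $\varphi_{i,j}$ is hypothetical: it is simply chosen so that $\varphi_{i,j} \cdot \omega$ reproduces eq. (2) term by term, and it is not taken from [Burget et al., 2011]. The sketch also assumes $P(H_{tar} \mid t) = \mathrm{sigmoid}(\varphi_t \cdot \omega)$ with no prior offset. The helper names are illustrative.

```python
import numpy as np

def expand_trial(wi, wj):
    """Expanded trial vector phi_ij (hypothetical layout matching eq. (2))."""
    return np.concatenate([
        np.outer(wi, wj).ravel(),                             # pairs with vec(P)
        0.5 * (np.outer(wi, wi) + np.outer(wj, wj)).ravel(),  # pairs with vec(Q)
        wi + wj,                                              # pairs with -(P+Q) mu
        [1.0],                                                # pairs with the offset
    ])

def omega_from_plda(mu, P, Q, const):
    """Initial weight vector omega built from PLDA score parameters (P, Q symmetric)."""
    PQ = P + Q
    return np.concatenate([P.ravel(), Q.ravel(), -PQ @ mu, [mu @ PQ @ mu + const]])

def log_tce(omega, F_tar, F_non):
    """log TCE of eq. (4), assuming P(H_tar | t) = sigmoid(phi_t . omega)."""
    s_tar, s_non = F_tar @ omega, F_non @ omega
    # log sigmoid(s) = -log(1 + exp(-s)); logaddexp keeps this numerically stable
    return -np.mean(np.logaddexp(0.0, -s_tar)) - np.mean(np.logaddexp(0.0, s_non))

def log_tce_grad(omega, F_tar, F_non):
    """Gradient of log_tce with respect to omega."""
    s_tar, s_non = F_tar @ omega, F_non @ omega
    g_tar = 1.0 / (1.0 + np.exp(s_tar))     # d/ds log sigmoid(s)  = sigmoid(-s)
    g_non = -1.0 / (1.0 + np.exp(-s_non))   # d/ds log sigmoid(-s) = -sigmoid(s)
    return F_tar.T @ g_tar / len(s_tar) + F_non.T @ g_non / len(s_non)

def train(omega0, F_tar, F_non, lr=0.1, n_iter=200):
    """Plain gradient ascent on log-TCE (rows of F_* are expanded trial vectors)."""
    omega = omega0.astype(float).copy()
    for _ in range(n_iter):
        omega += lr * log_tce_grad(omega, F_tar, F_non)
    return omega
```

Because $\omega$ has $2d^2 + d + 1$ entries, this unconstrained version trains $O(d^2)$ coefficients. That is the regime the next slides try to avoid.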

  7. Discriminative classifiers for speaker recognition: difficulties to overcome
     Discriminative training (DT) can suffer from various limitations:
     - data insufficiency;
     - over-fitting on development data;
     - respecting the metaparameter conditions: definiteness, positivity/negativity of the PLDA LLR-score covariance matrices, etc.

  8. Discriminative classifiers for speaker recognition: difficulties to overcome (continued)
     Constrained DT: training only a small number of parameters, of order $O(d)$ or even $O(1)$ instead of $O(d^2)$. Some solutions [Rohdin et al., 2016; Borgström and McCree, 2013]:
     - a single coefficient optimized for each dimension of the i-vector, or even for each of the four kinds of features that make up the score;
     - only the mean vector $\mu$ and the eigenvalues of the PLDA matrices $\Phi\Phi^t$ and $\Lambda$ are trained by DT, or even only their scaling factors;
     - metaparameter conditions: working with the singular value decomposition of $P$ and $Q$, or flooring of parameters.
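To make the "four kinds of features" option concrete, the score (2) can be split into its cross, quadratic, linear and offset parts, with one scaling factor trained per part ($O(1)$ parameters). This is a hypothetical illustration of the idea. It is not the exact procedure of the cited works. It assumes $P(H_{tar} \mid t) = \mathrm{sigmoid}(f_t \cdot \alpha)$ with no prior offset. The function names are illustrative.

```python
import numpy as np

def four_kind_features(wi, wj, mu, P, Q):
    """Values of the four kinds of terms that make up the score of eq. (2)."""
    PQ = P + Q
    return np.array([
        wi @ P @ wj,                         # cross term
        0.5 * (wi @ Q @ wi + wj @ Q @ wj),   # quadratic terms
        -mu @ PQ @ (wi + wj),                # linear term
        1.0,                                 # offset
    ])

def train_scalings(F_tar, F_non, alpha0, lr=0.05, n_iter=500):
    """Gradient ascent on log-TCE over 4 scaling factors only (O(1) parameters).

    Assumes P(H_tar | t) = sigmoid(f_t . alpha), with no prior offset.
    """
    alpha = alpha0.astype(float).copy()
    for _ in range(n_iter):
        p_tar = 1.0 / (1.0 + np.exp(-(F_tar @ alpha)))   # P(H_tar | t), target trials
        p_non = 1.0 / (1.0 + np.exp(F_non @ alpha))      # P(H_non | t), non-target trials
        grad = (F_tar.T @ (1.0 - p_tar)) / len(F_tar) - (F_non.T @ (1.0 - p_non)) / len(F_non)
        alpha += lr * grad
    return alpha
```

Starting from $\alpha = (1, 1, 1, \mu^t(P+Q)\mu + \frac{1}{2}\log|A_t| - \log|A_n|)$ reproduces the generative PLDA score. Training then only rescales its four components.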

  9. Discriminative classifiers for speaker recognition: difficulties to overcome (continued)
     DT struggles to improve speaker detection when the i-vectors have first been normalized, although this normalization has proven to achieve the best performance in speaker verification.

  10. Normalization step
     Normalization stage of the proposed pipeline: centering and scaling by the within-class covariance matrix $W$, length normalization, then the additional normalization procedure intended to constrain the discriminative training.

  11. Normalization step
     Centering and scaling by the within-class covariance matrix $W$, followed by length normalization ⇒ $W$ is almost exactly isotropic, i.e. $W \approx \sigma I$, $\sigma > 0$.
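The two normalization steps can be sketched as follows. First, center and scale with $W^{-1/2}$, which makes the within-class covariance of the training set exactly the identity. Then project onto the unit sphere. The sketch assumes that $W$ is pooled over speakers and normalized by the total number of i-vectors, and that $W^{-1/2}$ is the symmetric inverse square root. Function names are hypothetical.

```python
import numpy as np

def within_class_cov(X, labels):
    """Pooled within-class covariance W of the rows of X (normalized by N, assumed)."""
    classes, idx = np.unique(labels, return_inverse=True)
    means = np.array([X[idx == k].mean(axis=0) for k in range(len(classes))])
    R = X - means[idx]                      # deviations from each speaker's mean
    return R.T @ R / len(X)

def inv_sqrtm(S):
    """S^(-1/2) for a symmetric positive-definite matrix S."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def center_and_scale(X_train, labels, X):
    """w <- W^(-1/2) (w - m), with m and W estimated on the training set."""
    m = X_train.mean(axis=0)
    T = inv_sqrtm(within_class_cov(X_train, labels))
    return (X - m) @ T                      # T is symmetric: same as applying T to each row

def length_norm(Y):
    """Project every i-vector onto the unit sphere."""
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)
```

Length normalization no longer leaves $W$ exactly equal to the identity. The slide's claim that it stays almost isotropic is an empirical observation on real i-vectors.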

  12. Normalization step
     Centering and scaling by the within-class covariance matrix $W$, then length normalization.
     Proposed: an additional normalization step, which does not modify distances between i-vectors: rotation by the eigenvector basis of the between-class covariance matrix $B$ of the training dataset:
     $$B = P \Delta P^t \ \text{(SVD)}, \qquad w \leftarrow P^t w$$
     Here $P$ is the orthogonal matrix of eigenvectors of $B$ and $\Delta$ the diagonal matrix of its eigenvalues (not to be confused with the score matrix $P$ of eq. (2)).
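The B-rotation can be sketched in a few lines. The sketch estimates $B$, takes its eigenvectors with `numpy.linalg.eigh` (for a symmetric positive semi-definite matrix this coincides with the SVD), sorts them by decreasing eigenvalue, and rotates. The checks confirm that distances are preserved and that $B$ becomes diagonal. Weighting every i-vector equally in $B$ is an assumption, and the function names are hypothetical.

```python
import numpy as np

def between_class_cov(X, labels):
    """Between-class covariance B (each i-vector weighted equally, assumed)."""
    classes, idx = np.unique(labels, return_inverse=True)
    means = np.array([X[idx == k].mean(axis=0) for k in range(len(classes))])
    D = means[idx] - X.mean(axis=0)          # each i-vector replaced by its speaker mean
    return D.T @ D / len(X)

def b_rotation(X_train, labels):
    """Orthogonal eigenvector basis of B (the matrix P of B = P Delta P^t),
    with columns sorted by decreasing eigenvalue."""
    vals, vecs = np.linalg.eigh(between_class_cov(X_train, labels))
    return vecs[:, np.argsort(vals)[::-1]]

# For i-vectors stored as rows, w <- P^t w becomes X @ P_B.
```

Sorting by decreasing eigenvalue means the first coordinates carry the most between-speaker variability.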

  13. Normalization step
     After the rotation by the eigenvector basis of $B$:
     - $B$ is diagonal;
     - $W$ remains almost exactly isotropic (and therefore diagonal), since the $B$-eigenvector basis is orthogonal: $P^t (\sigma I) P = \sigma I$.
     Assumptions: the PLDA matrices $\Phi\Phi^t$ and $\Lambda$ become almost diagonal, and $\Lambda$ even almost isotropic (as a consequence, the score matrices $P$ and $Q$ are almost diagonal).

  14. Normalization step
     Moreover, since $W \approx \sigma I$, we have $W^{-1}B \approx \sigma^{-1}B$, which has the same eigenvectors as $B$ ⇒ the LDA solution can be identified as the subspace spanned by the first $r$ eigenvectors of $B$ (sorted by decreasing eigenvalue). The first $r$ components of the rotated training i-vectors are therefore approximately their projection onto the LDA $r$-subspace.
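This identification can be checked numerically. When $W = \sigma I$, the leading eigenvectors of $W^{-1}B$ (classical LDA directions) coincide with those of $B$ up to sign. This is a small sketch with a synthetic positive-definite $B$. The helper `lda_basis` is hypothetical.

```python
import numpy as np

def lda_basis(W, B, r):
    """Top-r eigenvectors of W^{-1} B (classical LDA directions), as unit columns."""
    vals, vecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(vals.real)[::-1][:r]
    V = vecs[:, order].real
    return V / np.linalg.norm(V, axis=0)

rng = np.random.default_rng(6)
d, r, sigma = 6, 3, 0.7
A = rng.standard_normal((d, d))
B = A @ A.T                                 # synthetic between-class covariance
V = lda_basis(sigma * np.eye(d), B, r)      # LDA with an isotropic W
vals, vecs = np.linalg.eigh(B)
U_r = vecs[:, np.argsort(vals)[::-1][:r]]   # first r eigenvectors of B
```

With a non-isotropic $W$, `V` and `U_r` would generally differ. That is why this identification relies on the previous normalization steps.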

  15. Normalization step
     The score can be rewritten as this sum of $O(r)$ terms:
     $$s_{i,j} = \sum_{k=1}^{r} \left[ p_k w_{i,k} w_{j,k} + \frac{1}{2} q_k \left(w_{i,k}^2 + w_{j,k}^2\right) - (p_k + q_k)\mu_k (w_{i,k} + w_{j,k}) \right] + res_{i,j} \qquad (5)$$
     where $p_k$ and $q_k$ are the $k$-th diagonal entries of $P$ and $Q$, $r$ is the rank of the PLDA eigenvoice subspace, and $res_{i,j}$ sums all the diagonal terms beyond the $r$-th dimension, all the off-diagonal terms and the offsets. Thus, we assume that most of the variability of the LLR score is contained in the first $r$ terms of the sum above (the residual term is negligible).
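Decomposition (5) can be sketched by computing the $r$ per-dimension terms from the diagonals of $P$ and $Q$ and taking everything else in (2) as the residual. The sketch assumes the i-vectors are already B-rotated, so that the first $r$ coordinates are the leading ones. Here `const` stands for $\frac{1}{2}\log|A_t| - \log|A_n|$. The function names are hypothetical.

```python
import numpy as np

def full_llr(wi, wj, mu, P, Q, const):
    """LLR score of eq. (2); const stands for 1/2 log|A_t| - log|A_n|."""
    PQ = P + Q
    return (wi @ P @ wj + 0.5 * (wi @ Q @ wi + wj @ Q @ wj)
            - mu @ PQ @ (wi + wj) + mu @ PQ @ mu + const)

def diag_terms(wi, wj, mu, P, Q, r):
    """The r per-dimension terms of the sum in eq. (5)."""
    p, q = np.diag(P)[:r], np.diag(Q)[:r]
    a, b, m = wi[:r], wj[:r], mu[:r]
    return p * a * b + 0.5 * q * (a**2 + b**2) - (p + q) * m * (a + b)

def residual(wi, wj, mu, P, Q, const, r):
    """res_ij: everything in eq. (2) not captured by the first r diagonal terms."""
    return full_llr(wi, wj, mu, P, Q, const) - diag_terms(wi, wj, mu, P, Q, r).sum()
```

Computing `residual` over a set of trials and comparing its variance with that of the full score gives the residual-variance ratio reported in the table of the next slide.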

  16. Normalization step
     Table: Analysis of PLDA parameters before and after the B-rotation additional normalization procedure.

     | Measure | Before, male | Before, female | After, male | After, female |
     |---|---|---|---|---|
     | Diagonality of PLDA eigenvoice subspace $\Phi\Phi^t$ | 0.23 | 0.15 | 0.95 | 0.97 |
     | Diagonality of PLDA score matrix $P$ | 0.48 | 0.25 | 0.98 | 0.96 |
     | Diagonality of PLDA score matrix $Q$ | 0.41 | 0.23 | 0.96 | 0.97 |
     | Isotropy of PLDA nuisance variability $\Lambda$ | 0.98 | 0.96 | 0.99 | 0.97 |
     | Residual variance | 0.29 | 0.42 | 0.004 | 0.004 |

     - Diagonality of the symmetric matrix $\Phi\Phi^t$: $\dfrac{\mathrm{Tr}\left(\mathrm{diag}(\Phi\Phi^t)^2\right)}{\mathrm{Tr}\left((\Phi\Phi^t)^2\right)} \in [0, 1]$
     - Isotropy of $\Lambda$: $\dfrac{d \, m_\Lambda^2}{\mathrm{Tr}(\Lambda^2)} \in [0, 1]$, where $m_\Lambda$ denotes the mean value of the diagonal of $\Lambda$
     - Variance of the residual term: $\dfrac{\mathrm{var}(res)}{\mathrm{var}(score)} \in [0, 1]$
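The three measures defined under the table translate directly into numpy. This sketch uses hypothetical function names. For a symmetric matrix $S$, $\mathrm{Tr}(S^2)$ is the squared Frobenius norm, so diagonality equals 1 exactly when $S$ is diagonal, and isotropy equals 1 exactly when $\Lambda$ is a multiple of the identity.

```python
import numpy as np

def diagonality(S):
    """Tr(diag(S)^2) / Tr(S^2) for a symmetric matrix S."""
    return np.sum(np.diag(S) ** 2) / np.trace(S @ S)

def isotropy(L):
    """d * m^2 / Tr(L^2), where m is the mean of the diagonal of L."""
    d = L.shape[0]
    m = np.mean(np.diag(L))
    return d * m ** 2 / np.trace(L @ L)

def residual_variance_ratio(res, score):
    """var(res) / var(score) over a set of trials."""
    return np.var(res) / np.var(score)
```

For example, the matrix with rows (2, 1) and (1, 2) has diagonality $8/10 = 0.8$.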
