Bayesian and Discriminative Speaker Adaptation
Chih-Hsien Huang
Supervisor: Prof. Jen-Tzung Chien
National Cheng Kung University
Outline
• INTRODUCTION
• LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION
• KEYPOINTS OF THIS TALK (CONTRIBUTIONS OF DISSERTATION)
• BAYESIAN DURATION ADAPTATION
• DISCRIMINATIVE LINEAR REGRESSION ADAPTATION
• EXPERIMENTS
• CONCLUSION AND FUTURE WORKS
INTRODUCTION
Why is Speech Recognition Important?
• Speech communication is one of the basic and essential capabilities of human beings.
• Speech is the only way to exchange information without any tools.
• Speech control is natural on mobile devices.
• Automatic speech recognition is important for broadcast news transcription.
• High-performance automatic speech recognition and summarization is desirable.
LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION
Elements of Speech Recognition
• The state-of-the-art speech recognizer is based on hidden Markov models (HMMs).
• Parameter estimation is performed through the EM algorithm.
• The decoding rule follows the MAP criterion.
• The goal of the speech recognizer is to minimize the classification error.
Bayesian Decision Theory
• Bayes rule
$$P(W \mid X) = \frac{P(X \mid W)\, P(W)}{P(X)}$$
• MAP decoding criterion
$$\hat{W} = \arg\max_{W} P(X \mid W)\, P(W)$$
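To make the MAP decoding rule concrete, here is a minimal sketch (not from the original slides) that scores a few hypothetical word sequences in the log domain; `acoustic_loglik` and `lm_logprob` are made-up stand-ins for the HMM acoustic score log P(X|W) and the n-gram language-model score log P(W):

```python
# Hypothetical per-hypothesis scores: log P(X|W) from the acoustic
# model (HMMs) and log P(W) from the n-gram language model.
acoustic_loglik = {"w1": -120.4, "w2": -118.9, "w3": -125.1}
lm_logprob = {"w1": -8.2, "w2": -11.5, "w3": -6.7}

def map_decode(hypotheses):
    """Return the hypothesis maximizing log P(X|W) + log P(W).

    P(X) is constant over hypotheses and is dropped, exactly as in
    the MAP decoding criterion on the slide.
    """
    return max(hypotheses, key=lambda w: acoustic_loglik[w] + lm_logprob[w])

print(map_decode(["w1", "w2", "w3"]))  # -> "w1" under these toy scores
```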
Hidden Markov Models
• Left-to-Right HMM (figure: three emitting states with self-loop probabilities $a_{11}, a_{22}, a_{33}$, forward transitions $a_{12}, a_{23}$, and output densities $b_1, b_2, b_3$)
• Parameters of HMM $\lambda$:
  • Initial probabilities $\pi = \{\pi_i\}$
  • Transition probabilities $A = \{a_{ij}\}$
  • Output probabilities $B = \{b_i(\cdot)\}$
• Mixture of Gaussians
$$b_j(x) = \sum_{m=1}^{M} c_{jm}\, N(x; \mu_{jm}, \Sigma_{jm})$$
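As an illustration (not from the slides), a Gaussian-mixture output density with diagonal covariances can be evaluated as follows; the mixture weights, means, and variances here are made-up placeholders:

```python
import numpy as np

def gmm_logpdf(x, weights, means, variances):
    """Log of b_j(x) = sum_m c_jm N(x; mu_jm, Sigma_jm), diagonal Sigma."""
    x = np.asarray(x, dtype=float)
    log_comps = []
    for c, mu, var in zip(weights, means, variances):
        # log N(x; mu, diag(var)) for one mixture component
        ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        log_comps.append(np.log(c) + ll)
    # log-sum-exp over components for numerical stability
    m = max(log_comps)
    return m + np.log(sum(np.exp(l - m) for l in log_comps))

# Toy 2-component mixture over 2-dimensional features
weights = [0.6, 0.4]
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
variances = [np.array([1.0, 1.0]), np.array([0.5, 2.0])]
print(gmm_logpdf([1.0, 0.5], weights, means, variances))
```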
Large Vocabulary Continuous Speech Recognition
(figure: the speech signal passes through feature extraction to produce feature vectors; the recognizer combines hidden Markov models, an n-gram language model, and a lexicon tree to produce the recognition results)
Lexicon
• Linear structure: each word stores its full subsyllable sequence independently.
• Tree structure: words sharing initial subsyllables share a common prefix path.
(figure: Mandarin examples such as 辦公, 成功, 成功大學, and 成長, where 成功 and 成功大學 branch from the same tree node)
Search Algorithm
• Dynamic programming over the states of each subsyllable HMM.
• Transitions within the k-th subsyllable (states $1 \le j \le J(k)$):
$$Q(t, k, j) = b_{k,j}(x_t) + \max_{j-1 \le j' \le j} Q(t-1, k, j')$$
• Transitions across subsyllables (entering subsyllable k from the best predecessor k'):
$$Q(t, k, 0) = \max_{1 \le k' \le K} \max\big\{\, Q(t-1, k', J(k')),\; Q(t-1, k, 0) \,\big\}$$
(figure: trellis of states over observation times 1..T for the k-th and k'-th subsyllables)
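A minimal log-domain sketch of the within-subsyllable recursion (an illustration, not the thesis implementation); `log_b[j][t]` stands for the emission log-likelihood $\log b_{k,j}(x_t)$, and transition log-probabilities are omitted for brevity:

```python
import numpy as np

def viterbi_left_to_right(log_b):
    """Within-model DP: Q(t,j) = log_b[j,t] + max(Q(t-1,j), Q(t-1,j-1)).

    log_b: (J, T) array of emission log-likelihoods for a left-to-right
    model allowing only self-loops and single forward transitions.
    """
    J, T = log_b.shape
    Q = np.full((T, J), -np.inf)
    Q[0, 0] = log_b[0, 0]                 # must start in the first state
    for t in range(1, T):
        for j in range(J):
            stay = Q[t - 1, j]
            advance = Q[t - 1, j - 1] if j > 0 else -np.inf
            Q[t, j] = log_b[j, t] + max(stay, advance)
    return Q[T - 1, J - 1]                # best path ending in the last state

# Toy example: 3 states, 6 frames of made-up emission scores
rng = np.random.default_rng(0)
print(viterbi_left_to_right(rng.normal(-2.0, 1.0, size=(3, 6))))
```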
Tree-Copy Search Concept
• DP score Q(word history, arc, state): one copy of the lexicon tree is maintained per word history, giving V tree copies.
(figure: tree copies conditioned on predecessor words such as 我, 在, 開始, 從此, with bigram probabilities P(·|w) applied at word ends; language model look-ahead and acoustic look-ahead guide pruning)
(figure: word-level search network over time t, where hypotheses built from words A, B, C with optional inter-word silences are scored jointly by the acoustic model and the language model)
Search Algorithm
Proceed from left to right over time t.
• Acoustic level: process states of lexical trees
  • Initialization: $Q_v(t-1, s=0) = H(v; t-1)$, $B_v(t-1, s=0) = t-1$
  • Time alignment: $Q_v(t, s) = \max_{s'}\{\, p(x_t, s \mid s')\, Q_v(t-1, s') \,\}$
  • Propagate back pointers $B_v(t, s)$.
  • Prune unlikely hypotheses.
• Word pair level: process word ends
  • For each pair $(w; t)$:
$$H(w; t) = \max_{v}\{\, p(w \mid v)\, Q_v(t, S_w) \,\}, \qquad v_0(w; t) = \arg\max_{v}\{\, p(w \mid v)\, Q_v(t, S_w) \,\}$$
  • Store the best predecessor $v_0 = v_0(w; t)$ and the best boundary $\tau = B_{v_0}(t, S_w)$.
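The word-pair recombination step can be illustrated with a small sketch (an illustration with made-up bigram scores, not the original code): for each word w ending at the current frame, only the best predecessor v is kept, together with a back pointer:

```python
import math

def recombine_word_ends(words, log_bigram, Q_end):
    """H(w;t) = max_v [ log p(w|v) + Q_v(t, S_w) ], with the argmax kept
    as a back pointer so the word sequence can be traced back later.

    log_bigram[(v, w)]: bigram log-probability log p(w|v)
    Q_end[(v, w)]: best acoustic DP score of w's final state in the
                   tree copy of predecessor v at the current frame
    """
    H, back = {}, {}
    for w in words:
        scored = [(log_bigram[(v, w)] + Q_end[(v, w)], v)
                  for v in words if (v, w) in Q_end]
        H[w], back[w] = max(scored)
    return H, back

words = ["a", "b"]
log_bigram = {(v, w): math.log(0.5) for v in words for w in words}
Q_end = {("a", "a"): -40.0, ("b", "a"): -38.5,
         ("a", "b"): -41.2, ("b", "b"): -39.9}
H, back = recombine_word_ends(words, log_bigram, Q_end)
print(H["a"], back["a"])  # best score and best predecessor for word "a"
```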
Mismatch Problem
• Many mismatch sources exist between training and test data in real applications.
• The most popular technique is to conduct speaker/environment adaptation:
  • Maximum a posteriori (MAP)
  • Speaker clustering
  • Linear regression
(figure: in training, speaker-independent acoustic models are built from a speech database; a MISMATCH separates training and testing conditions, so adaptation data are used to produce adapted models for testing)
Keypoints of This Talk
• Bayesian Duration Adaptation
  • Parametric duration modeling: Gaussian, Poisson, and gamma distributions
  • Joint sequential learning of acoustic model and duration model
  • QB estimates of Gaussian and Poisson duration models were formulated.
  • The reproducible prior/posterior property was exploited.
• Aggregate a Posteriori Linear Regression (AAPLR)
  • Robustness: considers the prior information of the regression matrix; the relation between AAPLR and MAPLR was illustrated.
  • Discriminative adaptation: the AAP criterion can be represented in the form of minimum error rate.
  • Rapid adaptation: AAPLR has a closed-form solution, making it superior to traditional discriminative adaptation (MCELR).
BAYESIAN DURATION ADAPTATION
Background Knowledge
• Speaking rate is one of the mismatch sources between training and testing.
• In the standard HMM, state duration is represented by the transition probability.
• Non-parametric approaches
  • Ferguson explicitly modeled the duration.
  • Too many parameters.
• Parametric approaches
  • Russell and Moore applied the Poisson distribution.
  • Levinson applied the gamma distribution.
Parametric Duration Modeling
• The HMM parameter set is extended with state duration:
  • Initial state probability
  • Transition probability
  • Observation density
  • Duration density
• Maximum likelihood criterion
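The slide's symbols did not survive extraction; in standard explicit-duration HMM notation the extended parameter set and the ML criterion can be written as

$$\lambda = \{\pi, A, B, D\}, \qquad D = \{p_j(d)\}, \qquad \lambda_{\mathrm{ML}} = \arg\max_{\lambda} P(X \mid \lambda),$$

where $p_j(d)$ denotes the probability of occupying state $j$ for $d$ consecutive frames.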
(figure: relative frequency (%) of state duration length, comparing the empirical distribution with geometric, Gaussian, Poisson, and gamma distributions)
Parametric Duration Models
• Duration models and their prior distributions
  • Gaussian distribution with Gaussian prior
  • Poisson distribution with gamma prior
  • Gamma distribution with Gaussian prior
• Estimation criteria
  • ML estimation
  • MAP estimation
  • QB estimation
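As a concrete illustration (not from the slides), the ML and moment estimates of the three duration models can be computed from observed state durations as follows:

```python
import numpy as np

durations = np.array([2, 3, 3, 4, 4, 4, 5, 5, 6, 8])  # toy durations (frames)

# Gaussian duration: ML estimates are the sample mean and variance.
mu_hat = durations.mean()
var_hat = durations.var()

# Poisson duration: the ML estimate of the rate is the sample mean.
lambda_hat = durations.mean()

# Gamma duration: no closed-form ML solution exists; method-of-moments
# estimates (shape nu, scale theta) are a common starting point.
nu_hat = mu_hat ** 2 / var_hat
theta_hat = var_hat / mu_hat

print(f"Gaussian: mu={mu_hat:.2f}, var={var_hat:.2f}")
print(f"Poisson:  lambda={lambda_hat:.2f}")
print(f"Gamma:    shape={nu_hat:.2f}, scale={theta_hat:.2f}")
```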
ML Parameter Estimation
• Auxiliary Q-function
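The Q-function itself was lost in extraction; for an HMM extended with duration densities it takes the standard EM form

$$Q(\lambda \mid \lambda^{(r)}) = E\!\left[\log P(X, S \mid \lambda) \,\middle|\, X, \lambda^{(r)}\right],$$

where the complete-data log-likelihood separates into initial, transition, observation, and duration terms, so each parameter group can be maximized independently in the M-step.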
ML Estimation for Different Duration Parameters
• Gaussian duration parameters
• Poisson duration parameters
• Gamma duration parameters
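The estimates themselves were lost in extraction; writing $\gamma_j(d)$ for the expected count of state $j$ lasting $d$ frames, the standard ML solutions are

$$\hat{\mu}_j = \frac{\sum_d \gamma_j(d)\, d}{\sum_d \gamma_j(d)}, \qquad \hat{\sigma}_j^2 = \frac{\sum_d \gamma_j(d)\,(d - \hat{\mu}_j)^2}{\sum_d \gamma_j(d)} \;\; \text{(Gaussian)}, \qquad \hat{\lambda}_j = \hat{\mu}_j \;\; \text{(Poisson)},$$

while the gamma shape parameter admits no closed form and is found numerically, as noted on the MAP slide below.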
Bayesian Learning of Duration Models
• MAP batch learning
• Risk function
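The risk function itself was lost in extraction; in the usual Bayesian formulation, MAP estimation of the duration parameters $\eta$ minimizes the expected loss under the posterior, which for a 0-1 loss reduces to maximizing

$$\eta_{\mathrm{MAP}} = \arg\max_{\eta}\; p(X \mid \eta)\, g(\eta),$$

where $g(\eta)$ is the prior density paired with each duration model on the earlier slide.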
QB Sequential Learning
• Risk function
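Again the formula was lost in extraction; the quasi-Bayes (QB) recursion processes adaptation data incrementally, using the posterior after the n-th batch as the prior for batch n+1:

$$g(\eta \mid X^{(1)}, \ldots, X^{(n)}) \;\propto\; p(X^{(n)} \mid \eta)\; g(\eta \mid X^{(1)}, \ldots, X^{(n-1)}).$$

With conjugate priors the posterior stays in a fixed family, which is the reproducible prior/posterior property highlighted among the keypoints.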
MAP Estimation for Gamma Duration Parameters
• Gamma duration with Gaussian prior
• M-step: update for the parameter η
• For the parameter ν:
  • No closed-form solution exists.
  • Newton's algorithm can be applied.
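As an illustration of the Newton step (a sketch based on the ML form of the gamma shape equation; the MAP version in the dissertation adds prior terms but iterates the same way):

```python
import numpy as np
from scipy.special import digamma, polygamma

def gamma_shape_newton(durations, nu0=1.0, iters=20):
    """Solve log(nu) - digamma(nu) = log(mean(d)) - mean(log(d))
    for the gamma shape nu by Newton's method."""
    d = np.asarray(durations, dtype=float)
    c = np.log(d.mean()) - np.log(d).mean()
    nu = nu0
    for _ in range(iters):
        f = np.log(nu) - digamma(nu) - c        # function whose root we seek
        fprime = 1.0 / nu - polygamma(1, nu)    # its derivative
        nu -= f / fprime                        # Newton update
    return nu

print(gamma_shape_newton([2, 3, 3, 4, 4, 4, 5, 5, 6, 8]))
```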
QB Estimation for Gaussian Duration Parameters
• Gaussian duration with Gaussian prior
• QB estimate is obtained in closed form.
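The update was lost in extraction; assuming a conjugate Gaussian prior $N(\mu \mid m, \sigma^2/\tau)$ on the duration mean (a hypothetical parameterization, not necessarily the dissertation's), with sufficient statistics $\bar{d}$ and $n$ from the current adaptation batch, the posterior mode has the familiar interpolated form

$$\hat{\mu} = \frac{\tau m + n \bar{d}}{\tau + n},$$

and the updated hyperparameters $(m', \tau') = \big((\tau m + n \bar{d})/(\tau + n),\; \tau + n\big)$ serve as the prior for the next batch.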
QB Estimation for Poisson Duration Parameters
• Poisson duration with gamma prior
• E-step
Updating Hyperparameters
• Gamma hyperparameters
• Poisson parameters
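The update equations were lost in extraction; for the conjugate Poisson-gamma pair, the standard recursion over an adaptation batch of $n$ durations $d_1, \ldots, d_n$ is

$$\alpha' = \alpha + \sum_{i=1}^{n} d_i, \qquad \beta' = \beta + n,$$

so the QB point estimate of the Poisson rate is the posterior mode $\hat{\lambda} = (\alpha' - 1)/\beta'$, and $(\alpha', \beta')$ carry over as the prior for the next batch.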
DISCRIMINATIVE LINEAR REGRESSION ADAPTATION
Estimation Criteria
• Distribution estimation and discriminative training are two categories of HMM parameter estimation approaches.
• Distribution estimation
  • Maximum likelihood criterion
  • Maximum a posteriori criterion
• Discriminative training
  • Minimum classification error (MCE) criterion
  • Maximum mutual information (MMI) criterion
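For reference, the standard forms of the two discriminative objectives (not reproduced from the slides): MMI maximizes

$$F_{\mathrm{MMI}}(\lambda) = \sum_{r} \log \frac{p(X_r \mid \lambda_{W_r})\, P(W_r)}{\sum_{W} p(X_r \mid \lambda_{W})\, P(W)},$$

while MCE minimizes a smoothed error count obtained by passing a misclassification measure (the correct-class discriminant minus a soft maximum over competing classes) through a sigmoid loss.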