Minimum Bayes-Risk Methods in Automatic Speech Recognition
Vaibhava Goel – IBM
William Byrne – Johns Hopkins University
Pattern Recognition in Speech and Language Processing – Chapter 2
Outline • Minimum Bayes-Risk Classification Framework – Likelihood Ratio Based Hypothesis Testing – Maximum A-Posteriori Probability Classification – Previous Studies of Application Sensitive ASR • Practical MBR Procedures for ASR – Summation over Hidden State Sequences – MBR Recognition with N-best Lists – MBR Recognition with Lattices
Outline • Segmental MBR Procedures – Segmental Voting – ROVER – e-ROVER • Experimental Results – Parameter Tuning within the MBR Classification Rule – Utterance Level MBR Word and Keyword Recognition – ROVER and e-ROVER for Multilingual ASR • Summary
• Minimum Bayes-Risk Classification Framework – Likelihood Ratio Based Hypothesis Testing – Maximum A-Posteriori Probability Classification – Previous Studies of Application Sensitive ASR • Practical MBR Procedures for ASR – Summation over Hidden State Sequences – MBR Recognition with N-best Lists – MBR Recognition with Lattices
Minimum Bayes-Risk Classification Framework
• Definitions:
  – $A$ : acoustic observation sequence
  – $W$ : word string
  – $\mathcal{W}_h^A$ : the hypothesis space of the observation $A$
  – $\delta : A \rightarrow \mathcal{W}_h^A$ : ASR classifier
  – $l(W, W')$ : loss function, where $W'$ is a mistranscription of $W$
  – $P(W, A)$ : true distribution of speech and language
Minimum Bayes-Risk Classification Framework
How to measure classifier performance? Using the Bayes risk:
$$E_{P(W,A)}\big[\, l(W, \delta(A)) \,\big] = \sum_A \sum_W l(W, \delta(A))\, P(W, A) \qquad (2.1)$$
$\delta$ is chosen to minimize the Bayes risk:
$$\delta(A) = \arg\min_{W' \in \mathcal{W}_h^A} \sum_W l(W, W')\, P(W \mid A) \qquad (2.2)$$
(Ideally we would like $\delta(A) = \arg\min_{W' \in \mathcal{W}_h^A} l(W_c, W')$, where $W_c$ is the correct transcription of $A$, but $W_c$ is not known.)
Let $\mathcal{W}_e^A$ be the subset of word strings with nonzero posterior: $\mathcal{W}_e^A = \{\, W \mid P(W \mid A) > 0 \,\}$.
Equation 2.2 can then be rewritten as
$$\delta(A) = \arg\min_{W' \in \mathcal{W}_h^A} \sum_{W \in \mathcal{W}_e^A} l(W, W')\, P(W \mid A) \qquad (2.4)$$
Letting $S(W') = \sum_{W \in \mathcal{W}_e^A} l(W, W')\, P(W \mid A)$, we have $\delta(A) = \arg\min_{W' \in \mathcal{W}_h^A} S(W')$.
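A minimal sketch of the classification rule in Equation 2.4, assuming the hypothesis space, evidence distribution, and loss function are given as small in-memory Python structures (all names and the toy values below are illustrative, not from the chapter):

```python
# Sketch of the MBR decision rule in Equation 2.4.
def mbr_decode(hypothesis_space, evidence, loss):
    """Return argmin_{W'} sum_W loss(W, W') * P(W | A).

    hypothesis_space : iterable of candidate word strings W'
    evidence         : dict mapping word string W -> posterior P(W | A)
    loss             : function loss(W, W_prime) -> non-negative cost
    """
    def expected_loss(w_prime):
        return sum(p * loss(w, w_prime) for w, p in evidence.items())
    return min(hypothesis_space, key=expected_loss)

# Toy usage: three competing transcriptions with illustrative posteriors.
evidence = {"a b c": 0.5, "a b d": 0.3, "a c c": 0.2}
zero_one = lambda w, w_prime: 0.0 if w == w_prime else 1.0
print(mbr_decode(evidence.keys(), evidence, zero_one))  # -> "a b c"
```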
Minimum Bayes-Risk Classification Framework
Since the observations in $\mathcal{W}_e^A$ serve as the evidence used by the MBR classifier, $\mathcal{W}_e^A$ is referred to as the evidence space, and $P(W \mid A)$ is referred to as the evidence distribution.
How to define the loss function? Two ways:
  – loss function $l(X, Y) = l_{LRT}$ ⇒ classifier $\delta(A)$ = likelihood ratio based hypothesis testing (LRT)
  – loss function $l(X, Y) = l_{0/1}$ ⇒ classifier $\delta(A)$ = maximum a-posteriori (MAP) classification
Likelihood Ratio Based Hypothesis Testing
If $\mathcal{W}_e = \{H_n, H_a\}$ and $\mathcal{W}_h = \{H_n, H_a\}$, define
$$l_{LRT}(X, Y) = \begin{cases} 0 & \text{if } X = H_n,\; Y = H_n \\ t_1 & \text{if } X = H_a,\; Y = H_n \\ t_2 & \text{if } X = H_n,\; Y = H_a \\ 0 & \text{if } X = H_a,\; Y = H_a \end{cases}$$
Then
$$\delta(A) = \arg\min_{W' \in \mathcal{W}_h^A} \sum_{W \in \mathcal{W}_e^A} l(W, W')\, P(W \mid A)
= \arg\min_{W' \in \{H_n, H_a\}} \big[\, l(H_n, W')\, P(H_n \mid A) + l(H_a, W')\, P(H_a \mid A) \,\big]$$
$$= \arg\min \big\{\, l(H_n, H_n) P(H_n \mid A) + l(H_a, H_n) P(H_a \mid A),\;\; l(H_n, H_a) P(H_n \mid A) + l(H_a, H_a) P(H_a \mid A) \,\big\}$$
(the first entry is the cost of choosing $H_n$, the second that of choosing $H_a$)
$$= \arg\min \big\{\, t_1 P(H_a \mid A),\; t_2 P(H_n \mid A) \,\big\}
= \arg\min \big\{\, t_1 P(A \mid H_a) P(H_a),\; t_2 P(A \mid H_n) P(H_n) \,\big\}$$
$$= \begin{cases} H_n & \text{if } t_2\, P(A \mid H_n)\, P(H_n) > t_1\, P(A \mid H_a)\, P(H_a) \\ H_a & \text{otherwise} \end{cases}
\;=\; \begin{cases} H_n & \text{if } \dfrac{P(A \mid H_n)}{P(A \mid H_a)} > \dfrac{t_1\, P(H_a)}{t_2\, P(H_n)} \\ H_a & \text{otherwise} \end{cases}$$
Likelihood Ratio Based Hypothesis Testing
$$\delta_{LRT}(A) = \begin{cases} H_n & \text{if } \dfrac{P(A \mid H_n)}{P(A \mid H_a)} > t \\ H_a & \text{otherwise} \end{cases} \qquad (2.6)$$
The threshold $t$ is set in an application-specific manner; it determines the balance between false rejection and false acceptance.
$H_n$ : null class
$H_a$ : alternative class
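A minimal sketch of the decision rule in Equation 2.6, assuming the two class-conditional likelihoods and the threshold $t$ are supplied by the caller; the numeric values are illustrative only:

```python
# Sketch of the likelihood ratio test of Equation 2.6.
def lrt_classify(lik_null, lik_alt, t):
    """Return 'H_n' if P(A|H_n)/P(A|H_a) exceeds the threshold t, else 'H_a'."""
    return "H_n" if lik_null / lik_alt > t else "H_a"

# Raising t makes the test less willing to accept the null class,
# trading false acceptances for false rejections.
print(lrt_classify(lik_null=0.02, lik_alt=0.005, t=3.0))  # -> 'H_n'
```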
Maximum A-Posteriori Probability Classification
Define
$$l_{0/1}(W, W') = \begin{cases} 1 & \text{if } W' \neq W \\ 0 & \text{otherwise} \end{cases}$$
Then
$$\delta(A) = \arg\min_{W' \in \mathcal{W}_h^A} \sum_{W \in \mathcal{W}_e^A} l(W, W')\, P(W \mid A)
= \arg\min_{W' \in \mathcal{W}_h^A} \sum_{W \neq W'} P(W \mid A)$$
$$= \arg\min_{W' \in \mathcal{W}_h^A} \big( 1 - P(W' \mid A) \big)
= \arg\max_{W' \in \mathcal{W}_h^A} P(W' \mid A)$$
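A short sketch illustrating that under the 0/1 loss the MBR rule reduces to MAP selection; the posterior values are toy numbers chosen only for illustration:

```python
# Under the 0/1 loss, the expected loss of W' is 1 - P(W'|A), so minimizing
# it is the same as maximizing the posterior.
posteriors = {"a b c": 0.5, "a b d": 0.3, "a c c": 0.2}

def expected_zero_one_loss(w_prime):
    return sum(p for w, p in posteriors.items() if w != w_prime)  # = 1 - P(w'|A)

mbr_choice = min(posteriors, key=expected_zero_one_loss)
map_choice = max(posteriors, key=posteriors.get)
assert mbr_choice == map_choice  # both pick "a b c"
```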
Previous Studies of Application Sensitive ASR
• The use of risk minimization in automatic speech recognition has not been extensive.
• Early investigations into minimum Bayes-risk training criteria for speech recognizers were performed by Nadas.
• However, our focus in this chapter is on minimum-risk classification rather than estimation.
Previous Studies of Application Sensitive ASR
• Stolcke et al. proposed an approximation to a minimum Bayes-risk classifier for generating minimum word error rate hypotheses from recognition N-best lists.
• Other researchers have proposed posterior probability and confidence based hypothesis selection strategies for word error rate reduction.
• Minimum Bayes-Risk Classification Framework – Likelihood Ratio Based Hypothesis Testing – Maximum A-Posteriori Probability Classification – Previous Studies of Application Sensitive ASR • Practical MBR Procedures for ASR – Summation over Hidden State Sequences – MBR Recognition with N-best Lists – MBR Recognition with Lattices
Practical MBR Procedures for ASR
• Why is the MBR classifier difficult to implement?
  – The evidence and hypothesis spaces in Equation 2.4 tend to be quite large.
  – The problem of large spaces is worsened by the fact that an ASR recognizer often has to process many consecutive utterances.
  – While there are efficient dynamic programming techniques for the MAP recognizer, such methods are not yet available for an MBR recognizer under an arbitrary loss function.
Practical MBR Procedures for ASR
• How to implement it?
  – Two implementations:
    • N-best list rescoring procedure (a minimal rescoring sketch follows below)
    • Search over a recognition lattice
  – Segment long acoustic data into sentence or phrase length utterances.
  – Restrict the evidence and hypothesis spaces to manageable sets of word strings.
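A hedged sketch of N-best list MBR rescoring under a word error (Levenshtein) loss, assuming each N-best entry carries a log score that can be normalized into an evidence distribution over the list; the hypotheses and scores below are invented for illustration:

```python
# Sketch of N-best MBR rescoring: the N-best list serves both as the
# hypothesis space and, after normalization, as the evidence distribution.
import math

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance by dynamic programming."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)]

def mbr_rescore(nbest):
    """nbest: list of (word_string, log_score). Returns the MBR hypothesis."""
    # Convert log scores to a normalized evidence distribution over the list.
    max_log = max(s for _, s in nbest)
    weights = [math.exp(s - max_log) for _, s in nbest]
    total = sum(weights)
    posteriors = [w / total for w in weights]

    def expected_wer_loss(w_prime):
        return sum(p * edit_distance(w.split(), w_prime.split())
                   for (w, _), p in zip(nbest, posteriors))

    return min((w for w, _ in nbest), key=expected_wer_loss)

nbest = [("the cat sat", -10.2), ("the cat sad", -10.5), ("a cat sat", -11.0)]
print(mbr_rescore(nbest))  # -> "the cat sat"
```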
Summation over Hidden State Sequences
• A computational issue associated with the use of HMMs in the evidence distribution will be addressed.
• How to obtain the true distribution?
$$P(W \mid A) = \frac{P(W)\, P(A \mid W)}{P(A)} \qquad (2.12)$$
Here $P(W)$ is approximated using a language model, usually a Markov chain based N-gram model. $P(A \mid W)$ is usually approximated using an HMM called the acoustic model.
Let $S$ be the set of all the states in the acoustic HMM $P(A \mid W)$, and let $\chi$ denote the set of all possible state sequences that could generate $A$. The probability $P(A \mid W)$ is computed as
$$P(A \mid W) = \sum_{X \in \chi} P(A, X \mid W) = \sum_{X \in \chi} P(X \mid W)\, P(A \mid X, W) \qquad (2.13)$$
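A minimal sketch of the summation in Equation 2.13 computed with the forward algorithm for a small discrete-output HMM, so that $P(A \mid W)$ is obtained without enumerating every state sequence; the HMM parameters here are toy values, not from a real acoustic model:

```python
# Forward algorithm: P(A|W) = sum_X P(X|W) P(A|X,W) computed recursively.
import numpy as np

def forward_likelihood(init, trans, emit, observations):
    """P(A|W) for a discrete-output HMM.

    init  : (S,)   initial state probabilities
    trans : (S, S) transition probabilities trans[i, j] = P(j | i)
    emit  : (S, V) emission probabilities  emit[s, o] = P(o | s)
    observations : sequence of symbol indices (standing in for acoustic frames A)
    """
    alpha = init * emit[:, observations[0]]       # alpha_1(s)
    for o in observations[1:]:
        alpha = (alpha @ trans) * emit[:, o]      # sum over previous states
    return alpha.sum()                            # sum over final states

init = np.array([1.0, 0.0])
trans = np.array([[0.7, 0.3], [0.0, 1.0]])
emit = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward_likelihood(init, trans, emit, [0, 0, 1]))
```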
Summation over Hidden State Sequences
The summation over all possible hidden state sequences is too expensive. A computationally feasible alternative is to modify Equation 2.4 as
$$\delta(A) = \arg\min_{W' \in \mathcal{W}_h^A} \sum_{W \in \mathcal{W}_e^A} l(W, W')\, P(W \mid A)$$
$$= \arg\min_{W' \in \mathcal{W}_h^A} \sum_{W \in \mathcal{W}_e^A} l(W, W')\, \frac{\sum_{X \in \chi} P(W)\, P(X \mid W)\, P(A \mid X, W)}{P(A)}$$
$$= \arg\min_{W' \in \mathcal{W}_h^A} \sum_{W \in \mathcal{W}_e^A} \sum_{X \in \chi} l(W, W')\, P(W, X, A)$$
$$\approx \arg\min_{(W', X') \in \mathcal{W}_h^A \times \chi^A} \sum_{(W, X) \in \mathcal{W}_e^A \times \chi^A} l\big((W, X), (W', X')\big)\, P(W, X, A)$$
where $\chi^A$ is a sparse sampling of the most likely state sequences in $\chi$.
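One possible realization of the sparse sampling $\chi^A$, sketched here as keeping only the single most likely (Viterbi) state sequence per word string; a real system might instead use the paths encoded in a lattice or N-best list. The parameters are toy values:

```python
# Viterbi decoding: the single most likely state sequence and its joint
# probability P(A, X|W), usable as a one-path sparse sampling of chi.
import numpy as np

def viterbi_path(init, trans, emit, observations):
    """Return (best state sequence, its joint probability P(A, X|W))."""
    S, T = len(init), len(observations)
    delta = np.zeros((T, S))
    back = np.zeros((T, S), dtype=int)
    delta[0] = init * emit[:, observations[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * trans        # scores[i, j]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * emit[:, observations[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                     # backtrace
        path.append(int(back[t][path[-1]]))
    return path[::-1], float(delta[-1].max())

init = np.array([1.0, 0.0])
trans = np.array([[0.7, 0.3], [0.0, 1.0]])
emit = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi_path(init, trans, emit, [0, 0, 1]))
```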
Summation over Hidden State Sequences
For convenience we write $W$ rather than $(W, X)$, $\mathcal{W}_h^A$ rather than $\mathcal{W}_h^A \times \chi^A$, and $\mathcal{W}_e^A$ rather than $\mathcal{W}_e^A \times \chi^A$, so that
$$\delta(A) = \arg\min_{W' \in \mathcal{W}_h^A} \sum_{W \in \mathcal{W}_e^A} l(W, W')\, P(W, A) \qquad (2.15)$$