An Integrated Framework for Margin-based Sequential Discriminative Training over Lattices using differenced Maximum Mutual Information (dMMI)
Erik McDermott - Google Inc.
September 14, 2012
Overview
◮ Error-weighted training using explicit models of error (MPE/MWE/sMBR, etc.)
◮ Shifting of the loss function: “margin” (MCE, MPE, bMMI)
◮ Make the shift proportional to error.
◮ bMMI (Povey et al. 2008): implicit error model; just use an error-proportional shift.
◮ Extension of “point” use of margin to an integral over a margin interval → proposal of “differenced MMI” (dMMI)
◮ dMMI: margin- and error-dependent loss smoothing/integration
◮ Unifies margin-modified MMI and MPE
◮ More general than MPE, yet allows a simpler implementation using a difference of standard Forward-Backward statistics
◮ Bayesian view & further generalization
Integrated system optimization
Non-uniform error for discriminative training
Minimum Phone Error (Povey 2002); Decision boundaries
MPE as multi-dimensional sigmoid
MPE derivative - String picture
Modified Forward-Backward for MPE over lattices
MPE derivative - Arc picture
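The modified Forward-Backward recursion referenced above can be sketched on a toy lattice. The code below is an illustrative reconstruction, not the talk's implementation: alongside the standard forward/backward scores it propagates partial average accuracies, so that each arc's MPE statistic, occupancy times (accuracy of paths through the arc minus the lattice average), falls out in one pass. The toy lattice, variable names, and probability-domain arithmetic are all assumptions for clarity.

```python
from collections import defaultdict

# Toy lattice (assumed for illustration): nodes 0..3 in topological order.
# Each arc = (src, dst, score, acc), where `score` is the arc's (already
# normalized) probability contribution and `acc` its local accuracy
# (e.g. phone accuracy against the reference).
arcs = [
    (0, 1, 0.6, 1.0), (0, 1, 0.4, 0.0),
    (1, 2, 0.7, 1.0), (1, 2, 0.3, 0.5),
    (2, 3, 1.0, 1.0),
]
start, final = 0, 3
nodes = list(range(4))

# Standard forward/backward plus accuracy recursions:
# aacc[n] = expected partial-path accuracy given arrival at node n,
# bacc[n] = expected remaining accuracy given departure from node n.
alpha = defaultdict(float, {start: 1.0})
aacc = defaultdict(float)
for n in nodes[1:]:
    inc = [a for a in arcs if a[1] == n]
    alpha[n] = sum(alpha[s] * sc for s, d, sc, ac in inc)
    aacc[n] = sum(alpha[s] * sc * (aacc[s] + ac) for s, d, sc, ac in inc) / alpha[n]

beta = defaultdict(float, {final: 1.0})
bacc = defaultdict(float)
for n in reversed(nodes[:-1]):
    out = [a for a in arcs if a[0] == n]
    beta[n] = sum(sc * beta[d] for s, d, sc, ac in out)
    bacc[n] = sum(sc * beta[d] * (bacc[d] + ac) for s, d, sc, ac in out) / beta[n]

Z = alpha[final]        # total lattice probability
c_avg = aacc[final]     # lattice-average accuracy

# MPE statistic per arc: occupancy gamma times (mean accuracy of paths
# through the arc minus the lattice average) -- the quantity that drives
# the MPE gradient in the arc picture.
mpe_stats = []
for s, d, sc, ac in arcs:
    gamma = alpha[s] * sc * beta[d] / Z
    c_arc = aacc[s] + ac + bacc[d]
    mpe_stats.append(gamma * (c_arc - c_avg))

print(c_avg, mpe_stats)
```

On this toy lattice the statistics match a brute-force enumeration over all four paths, which is a handy sanity check for any real (log-domain, FST-based) implementation.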
New approaches based on margin
◮ Intuition: improve generalization by making the training problem “harder”.
◮ “Large-margin MCE” (Yu et al., 2007)
◮ Extension of McDermott & Katagiri (2004)’s Parzen-window analysis of MCE → iteratively increase the MCE sigmoid bias term
◮ Applicable to implicit error models:
◮ “Large-margin HMMs” (Sha & Saul, 2007): insertion of fine-grained error (e.g. edit distance) into the margin term
◮ “Boosted MMI” (Povey et al., Saon & Povey, 2008)
◮ Heigold’s unified theory (2008): bring margin to standard MMI/MPE/MCE approaches
Linking ASR and Machine Learning
Modifying MPE/MMI with margin term
“Boost” likelihoods (Povey & Saon (2008), Heigold (2008)):
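The “boosting” of likelihoods referenced here is, in the form of Povey et al. (2008), a per-hypothesis scaling of the denominator by an error-dependent factor. A hedged sketch of the boosted-MMI objective (with acoustic scale $\kappa$, boost/margin parameter $\sigma$, raw error $E(W, W_r)$ of hypothesis $W$ against reference $W_r$, and model parameters $\Lambda$; the exact notation on the original slide may differ):

```latex
F_{\mathrm{bMMI}}(\Lambda; \sigma)
  = \sum_r \log
    \frac{p_\Lambda(X_r \mid W_r)^{\kappa}\, P(W_r)}
         {\sum_{W} p_\Lambda(X_r \mid W)^{\kappa}\, P(W)\,
          e^{\sigma E(W, W_r)}}
```

Povey et al. write the boost as $e^{-b\,A(W, W_r)}$ with an accuracy $A$; since error is (up to a constant) negated accuracy, the sign of the exponent flips when stated in terms of $E$, as here.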
Margin-modified MPE
Effect of margin on MPE loss
Margin-modified MMI (Povey & Saon, 2008)
Effect of margin on MMI loss
◮ 2300h Arabic Broadcast News (GALE)
◮ 2000h English conversational telephone speech (CTS)
Margin-modified MPE & MMI summary
“Boost” likelihoods (Povey & Saon (2008), Heigold (2008)):
dMMI: the “integrated” framework
Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training
McDermott & Nakamura, Interspeech 2009
◮ Mathematical link between margin-modified MPE and MMI
◮ Proposal of “dMMI”
MPE is the derivative of modified MMI!
dMMI definition
Using previous result & Fundamental Theorem of Calculus:
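Given the previous slide's result that margin-modified MPE is the derivative of margin-modified (boosted) MMI with respect to the margin $\sigma$, the Fundamental Theorem of Calculus yields dMMI as a difference of two bMMI functionals. A sketch of the form proposed in McDermott & Nakamura (2009), with notation assumed:

```latex
F_{\mathrm{dMMI}}^{\sigma_1, \sigma_2}(\Lambda)
  = \frac{1}{\sigma_2 - \sigma_1}
    \int_{\sigma_1}^{\sigma_2} F_{\mathrm{MPE}}(\Lambda; \sigma)\, d\sigma
  = \frac{F_{\mathrm{bMMI}}(\Lambda; \sigma_2) - F_{\mathrm{bMMI}}(\Lambda; \sigma_1)}
         {\sigma_2 - \sigma_1}
```

The loss thus averages the MPE-style objective over the margin interval $[\sigma_1, \sigma_2]$, yet needs only two standard boosted Forward-Backward computations rather than MPE's modified recursion.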
dMMI in practice
Just use the “reverse-boosted” denominator lattice as the numerator lattice:
Approximating MPE
As the margin interval is reduced, dMMI converges to MPE.
This property must hold for any correct implementation of bMMI and MPE!
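The convergence claim can be checked numerically in a stripped-down scalar setting: a single utterance with one competing hypothesis, so the boosted loss has a closed form and its exact derivative is the MPE-style expected-error term. All names and constants here are illustrative assumptions, not the lattice-level objective.

```python
import math

# Margin-boosted MMI loss for one utterance with a single competitor:
# F(sigma) = -log p(reference | boosted scores).
p_num, p_den, E = 0.7, 0.3, 2.0   # reference score, competitor score, raw error


def F(sigma):
    return math.log(1.0 + (p_den / p_num) * math.exp(sigma * E))


# MPE-style loss = dF/dsigma = E * (boosted posterior of the competitor).
def mpe(sigma):
    boosted = p_den * math.exp(sigma * E)
    return E * boosted / (p_num + boosted)


# dMMI over [sigma - h, sigma + h] is a difference of two bMMI values;
# as the margin interval shrinks, it converges to MPE at sigma.
sigma = 0.5
for h in (1.0, 0.1, 0.01):
    dmmi = (F(sigma + h) - F(sigma - h)) / (2.0 * h)
    print(h, dmmi, mpe(sigma))
```

A wide interval gives a smoothed, integrated version of MPE; a narrow one reproduces MPE itself, which is exactly the consistency check the slide proposes for bMMI/MPE implementations.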
Integrated view of discriminative training
Leveraging approximated, shifted hinge functions
Gradient-based optimization using dMMI
dMMI as integral over margin prior
dMMI as building block for modeling general margin priors
Numerical approximation of arbitrary margin priors
◮ E.g. prior p(σ) = c exp(−c|σ|), used for Minimum Relative Entropy Discrimination (Jebara, 2004)
◮ Here: use the prior in the context of standard HMM-based discriminative training
◮ Approximate the prior using a sum of step functions (cf. Lebesgue integration)
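The step-function construction can be illustrated numerically: below, the exponential margin prior from the slide is approximated by a sum of indicator (step) functions on a grid, so that an integral against the prior becomes a weighted sum of interval terms, each in principle realizable as one dMMI building block over that margin interval. The constants, grid, and helper names are illustrative assumptions.

```python
import numpy as np

# Margin prior from the slide: p(sigma) = c * exp(-c * |sigma|).
c = 2.0


def prior(s):
    return c * np.exp(-c * np.abs(s))


# Approximate the prior on [lo, hi] by n step functions (piecewise
# constant on equal-width bins, cf. Lebesgue-style integration). Each
# bin corresponds to one margin interval, weighted by the bin height.
lo, hi, n = -3.0, 3.0, 600
edges = np.linspace(lo, hi, n + 1)
mids = 0.5 * (edges[:-1] + edges[1:])
heights = prior(mids)
width = (hi - lo) / n


def step_prior(x):
    # Height of the bin containing x (clipped to the grid).
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n - 1)
    return heights[idx]


# Pointwise error and total mass of the step approximation.
xs = np.linspace(lo, hi, 10001)
max_err = float(np.max(np.abs(prior(xs) - step_prior(xs))))
mass = float(np.sum(heights) * width)  # ~ integral of p(sigma) over [lo, hi]
print(max_err, mass)
```

Refining the grid drives both the pointwise error and the mass error to zero, which is the sense in which a bank of dMMI intervals can emulate an arbitrary margin prior.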
Building margin prior using dMMI
Summary
◮ MPE explicitly models non-uniform error, e.g. phone or word error, including insertions, deletions & substitutions
◮ Margin-based “Boosted MMI” (bMMI):
◮ a super-cheap approach for incorporating non-uniform error into the loss function;
◮ however, the objective is still (modified) Mutual Information, not an explicit model of error.
◮ “Differenced MMI” (dMMI) is a similarly cheap alternative that:
◮ is explicitly linked to error;
◮ generalizes MPE;
◮ possibly offers better performance (Delcroix et al., ICASSP 2012; Kubo et al., Interspeech 2012);
◮ can be further generalized to define arbitrary margin priors for lattice-based discriminative training.