Review: Acoustic Modeling



  1. Lecture 6: Language Modeling / Pronunciation Modeling. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen, IBM T.J. Watson Research Center, Yorktown Heights, New York, USA, {picheny,bhuvana,stanchen}@us.ibm.com, 15 October 2012.
     Review: Acoustic Modeling. x: the observations, a sequence of ~40-dimensional feature vectors. ω: the word sequence. The HMM/GMM framework lets us model P(x | ω), i.e. how likely the feature vectors are given the word sequence.
     Part I: The Fundamental Equation of ASR. The homophone pair I HATE TO WAIT / EYE HATE TWO WEIGHT sounds identical, so acoustics alone cannot separate them; that is the job of language modeling:
         ω* = arg max_ω P(x | ω)   ⇒   ω* = arg max_ω P(ω | x) = arg max_ω P(ω) P(x | ω)
     What's new? The language model P(ω), describing the frequency of each word sequence ω.
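The decision rule above can be made concrete with a tiny rescoring sketch. This is only an illustration, not the decoder from the lectures: the two candidate word sequences are from the slides, but the acoustic log-likelihoods (and the small nonzero LM value for the second hypothesis) are invented so the arithmetic works out, and everything is done in log space to avoid underflow.

```python
import math

# Hypothetical n-best list: (word sequence, acoustic log-likelihood log P(x | w),
# LM log-probability log P(w)).  The acoustic scores are made up; the LM value for
# I HATE TO WAIT is the bigram probability computed later in the deck (3.05e-11),
# and 1e-14 stands in for the near-zero mass the LM gives the homophone sequence.
candidates = [
    ("I HATE TO WAIT",      -610.0, math.log(3.05e-11)),
    ("EYE HATE TWO WEIGHT", -608.0, math.log(1e-14)),  # slightly better acoustically
]

def best_hypothesis(nbest):
    """arg max_w P(w) P(x | w), evaluated as a sum of log-probabilities."""
    return max(nbest, key=lambda cand: cand[1] + cand[2])

print(best_hypothesis(candidates)[0])  # the LM term tips the choice to I HATE TO WAIT
```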

  2. Language Modeling: Goals. Describe which word sequences are likely, e.g. BRITNEY SPEARS vs. BRIT KNEE SPEARS. Analogy: a multiple-choice test. The LM restricts the choices given to the acoustic model; the fewer choices, the better you do.
     What Type of Model? We want a probability distribution over sequences of symbols: a (hidden) Markov model! Hidden or non-hidden? For a hidden model it is too hard to come up with the topology, so use a non-hidden one.
     Where Are We? Outline: 1. N-Gram Models; 2. Technical Details; 3. Smoothing; 4. Discussion.
     What's an n-Gram Model? A Markov model of order n - 1: to predict the next word, we only need to remember the last n - 1 words.
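As a small illustration of "only remember the last n - 1 words", the sketch below pads a sentence with start tokens and pulls out the history used to predict each position. The "<s>"/"</s>" token names are arbitrary stand-ins for the begin/end-of-sentence symbols introduced on later slides.

```python
def ngram_context(words, i, n):
    """Return the n-1 words preceding position i, padding with "<s>" on the left."""
    history = ["<s>"] * (n - 1) + words
    return tuple(history[i : i + n - 1])

sentence = "I HATE TO WAIT".split()
# Trigram model (n = 3): predict each word, and the end token, from the previous two.
for i, w in enumerate(sentence + ["</s>"]):
    print(ngram_context(sentence, i, 3), "->", w)
# ('<s>', '<s>') -> I, ('<s>', 'I') -> HATE, ('I', 'HATE') -> TO, ...
```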

  3. What's a Markov Model? Decompose the probability of a sequence into a product of conditional probabilities, then limit the memory; e.g. a trigram model is a Markov model of order 2, so we remember only the last 2 words:
         P(w_1 ... w_L) = ∏_{i=1}^{L} P(w_i | w_1 ... w_{i-1}) = ∏_{i=1}^{L} P(w_i | w_{i-2} w_{i-1})   (Markov assumption)
         P(I HATE TO WAIT) = P(I) P(HATE | I) P(TO | I HATE) P(WAIT | HATE TO)
     Sentence Begins and Ends. Pad on the left with beginning-of-sentence tokens, e.g. w_{-1} = w_0 = ⊲, so we always condition on two words to the left, even at the start. Predict an end-of-sentence token ⊳ at the end, so we get a true probability, i.e. Σ_ω P(ω) = 1:
         P(w_1 ... w_L) = ∏_{i=1}^{L+1} P(w_i | w_{i-2} w_{i-1})
         P(I HATE TO WAIT) = P(I | ⊲ ⊲) × P(HATE | ⊲ I) × P(TO | I HATE) × P(WAIT | HATE TO) × P(⊳ | TO WAIT)
     How to Set Probabilities? For each history w_{i-2} w_{i-1}, P(w_i | w_{i-2} w_{i-1}) is a multinomial distribution, and maximum likelihood estimation for multinomials is just count and normalize:
         P_MLE(w_i | w_{i-2} w_{i-1}) = c(w_{i-2} w_{i-1} w_i) / Σ_w c(w_{i-2} w_{i-1} w) = c(w_{i-2} w_{i-1} w_i) / c(w_{i-2} w_{i-1})
     Example: Maximum Likelihood Estimation. 23M words of Wall Street Journal text, e.g.:
         FEDERAL HOME LOAN MORTGAGE CORPORATION –DASH ONE .POINT FIVE BILLION DOLLARS OF REALESTATE MORTGAGE -HYPHEN INVESTMENT CONDUIT SECURITIES OFFERED BY MERRILL LYNCH &AMPERSAND COMPANY
         NONCOMPETITIVE TENDERS MUST BE RECEIVED BY NOON EASTERN TIME THURSDAY AT THE TREASURY OR AT FEDERAL RESERVE BANKS OR BRANCHES
         ...
         P(TO | I HATE) = c(I HATE TO) / c(I HATE) = 17/45 = 0.378
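Count-and-normalize trigram estimation fits in a few lines. This is a toy sketch rather than the course's actual training code; "<s>"/"</s>" stand in for the ⊲/⊳ padding tokens, and the two example sentences are invented to keep the demo tiny.

```python
from collections import defaultdict

def train_trigram_mle(sentences):
    """Count and normalize:
    P_MLE(w_i | w_{i-2} w_{i-1}) = c(w_{i-2} w_{i-1} w_i) / c(w_{i-2} w_{i-1})."""
    tri = defaultdict(int)    # c(w_{i-2} w_{i-1} w_i)
    hist = defaultdict(int)   # c(w_{i-2} w_{i-1})
    for sent in sentences:
        words = ["<s>", "<s>"] + sent.split() + ["</s>"]
        for i in range(2, len(words)):
            tri[tuple(words[i - 2:i + 1])] += 1
            hist[tuple(words[i - 2:i])] += 1
    def prob(w, history):
        return tri[history + (w,)] / hist[history] if hist[history] else 0.0
    return prob

p = train_trigram_mle(["I HATE TO WAIT", "I HATE TO LOSE"])
print(p("TO", ("I", "HATE")))    # c(I HATE TO) / c(I HATE) = 2/2 = 1.0 in this toy corpus
```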

  4. Example: Bigram Model. P(I HATE TO WAIT) = ??? P(EYE HATE TWO WEIGHT) = ??? Step 1: collect all bigram counts and unigram history counts (rows are histories, columns are the next word; ∗ is the history total):

                    EYE      I   HATE     TO    TWO   WAIT  WEIGHT      ⊳       ∗
         ⊲            3   3234      5   4064   1339      8      22      0  892669
         EYE          0      0      0     26      1      0       0     52     735
         I            0      0     45      2      1      1       0      8   21891
         HATE         0      0      0     40      0      0       0      9     246
         TO           8      6     19     21   5341    324       4    221  510508
         TWO          0      5      0   1617    652      0       0   4213  132914
         WAIT         0      0      0     71      2      0       0     35     882
         WEIGHT       0      0      0     38      0      0       0     45     643

     Example: Bigram Model (cont'd). Multiply the conditional probabilities:
         P(I HATE TO WAIT) = P(I | ⊲) P(HATE | I) P(TO | HATE) P(WAIT | TO) P(⊳ | WAIT)
                           = 3234/892669 × 45/21891 × 40/246 × 324/510508 × 35/882 = 3.05 × 10^-11
         P(EYE HATE TWO WEIGHT) = P(EYE | ⊲) P(HATE | EYE) P(TWO | HATE) P(WEIGHT | TWO) P(⊳ | WEIGHT)
                                = 3/892669 × 0/735 × 0/246 × 0/132914 × 45/643 = 0
     Recap: N-Gram Models. A simple formalism, yet effective: it discriminates between wheat and chaff, it is easy to train (count and normalize), and it generalizes, assigning nonzero probabilities to sentences not seen in the training data, e.g. I HATE TO WAIT.
     Where Are We? Outline: 1. N-Gram Models; 2. Technical Details; 3. Smoothing; 4. Discussion.
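The arithmetic on these two slides can be reproduced directly from the table. A minimal sketch, including only the counts needed for the two sentences and using "<s>"/"</s>" in place of ⊲/⊳:

```python
# Bigram counts c(prev, cur) taken from the slide's table, plus the history totals.
counts = {
    ("<s>", "I"): 3234, ("I", "HATE"): 45, ("HATE", "TO"): 40,
    ("TO", "WAIT"): 324, ("WAIT", "</s>"): 35,
    ("<s>", "EYE"): 3, ("EYE", "HATE"): 0, ("HATE", "TWO"): 0,
    ("TWO", "WEIGHT"): 0, ("WEIGHT", "</s>"): 45,
}
totals = {"<s>": 892669, "I": 21891, "HATE": 246, "TO": 510508,
          "WAIT": 882, "EYE": 735, "TWO": 132914, "WEIGHT": 643}

def bigram_prob(sentence):
    """P(w_1 ... w_L) = prod_i c(w_{i-1} w_i) / c(w_{i-1}), with <s>/</s> padding."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= counts.get((prev, cur), 0) / totals[prev]
    return p

print(bigram_prob("I HATE TO WAIT"))        # ~3.05e-11
print(bigram_prob("EYE HATE TWO WEIGHT"))   # 0.0, since c(EYE HATE) = 0
```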

  5. LM's and Training and Decoding. Decoding without LM's: build a word HMM encoding the allowable word sequences, then replace each word with its HMM. [Diagram: a word graph with arcs ONE, TWO, THREE, ..., and the same graph with each arc replaced by HMMone, HMMtwo, HMMthree, ...]
     LM's and Training and Decoding (cont'd). Point: an n-gram model is a (hidden) Markov model, so it too can be expressed as a word HMM. Replace each word with its HMM and leave in the language model probabilities. [Diagram: arcs ONE/P(ONE), TWO/P(TWO), THREE/P(THREE), ... become HMMone/P(ONE), HMMtwo/P(TWO), HMMthree/P(THREE), ...] Lots more details in lectures 7 and 8. How do LM's impact acoustic model training?
     One Puny Prob versus Many? Not a fair fight: a single language model probability per word is pitted against the many per-frame acoustic likelihoods. [Diagram: a digit-loop word graph over one, two, three, four, five, six, seven, eight, nine, zero.]
     The Acoustic Model Weight. Solution: an acoustic model weight α,
         ω* = arg max_ω P(ω) P(x | ω)^α
     α is usually somewhere between 0.05 and 0.1, and it is important to tune it for each LM and AM. Theoretically inelegant, but empirical performance trumps theory any day of the week. Is it the LM weight or the AM weight?
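A toy sketch of how the weight changes the decision. The two scores below are hypothetical numbers chosen so that the unweighted acoustic term would win the argument; only the form P(ω) P(x | ω)^α and the suggested range for α come from the slide.

```python
import math

def combined_score(lm_logprob, am_loglik, alpha=0.1):
    """log of P(w) * P(x | w)^alpha, i.e. lm_logprob + alpha * am_loglik."""
    return lm_logprob + alpha * am_loglik

# Hypothetical hypotheses: (LM log-prob, acoustic log-likelihood).
hyps = {
    "I HATE TO WAIT":      (math.log(3.05e-11), -610.0),
    "EYE HATE TWO WEIGHT": (math.log(1.0e-14),  -600.0),  # acoustically preferred
}

for alpha in (1.0, 0.1):
    best = max(hyps, key=lambda h: combined_score(*hyps[h], alpha=alpha))
    print(alpha, best)
# alpha = 1.0: the many acoustic frame scores dominate -> EYE HATE TWO WEIGHT
# alpha = 0.1: the LM gets a fair say                  -> I HATE TO WAIT
```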

  6. Real World Toy Example. Test set: continuous digit strings. Unigram language model: P(ω) = ∏_{i=1}^{L+1} P(w_i). [Bar chart: WER (0-15) with AM weight = 1 vs. AM weight = 0.1.]
     What is This Word Error Rate Thing? The most popular evaluation measure for ASR systems. Divide the total number of errors in the test set by the total number of words:
         WER ≡ Σ_{utts u} (# errors in u) / Σ_{utts u} (# words in reference for u)
     What is the "number of errors" in an utterance? The minimum number of word insertions, deletions, and substitutions needed to transform the reference into the hypothesis.
     Example: Word Error Rate. What is the WER?
         reference:  THE DOG IS HERE NOW
         hypothesis: THE UH BOG IS NOW
     Can WER be above 100%? What algorithm computes WER? How many ways are there to transform the reference into the hypothesis?
     Evaluating Language Models. Best way: plug the LM into an ASR system and measure WER. But you need an ASR system, it is expensive to compute (especially in the old days), and the results depend on the acoustic model. Is there something cheaper that predicts WER well?
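The "minimum edits" count is the standard word-level Levenshtein distance, computed with dynamic programming. A small sketch that answers the slide's example (one insertion, one substitution, one deletion: 3 errors over 5 reference words, i.e. 60% WER):

```python
def word_errors(ref, hyp):
    """Minimum insertions, deletions, and substitutions turning ref into hyp
    (Levenshtein distance over words)."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # delete all remaining reference words
    for j in range(len(h) + 1):
        d[0][j] = j                      # insert all remaining hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)]

ref = "THE DOG IS HERE NOW"
hyp = "THE UH BOG IS NOW"
errors = word_errors(ref, hyp)
print(errors, errors / len(ref.split()))   # 3 errors / 5 reference words = 0.6
```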

  7. Perplexity. Basic idea: the test-set likelihood, normalized so it is easy to interpret. Take the (geometric) average probability p_avg assigned to each word in the test data,
         p_avg = ( ∏_{i=1}^{L+1} P(w_i | w_{i-2} w_{i-1}) )^{1/(L+1)}
     and invert it: PP = 1 / p_avg. Interpretation: given the history, how many possible next words are there (for the acoustic model to choose from)? e.g. a uniform unigram LM over V words gives PP = V.
     Example: Perplexity. Using the bigram model from before,
         P(I HATE TO WAIT) = P(I | ⊲) P(HATE | I) P(TO | HATE) P(WAIT | TO) P(⊳ | WAIT)
                           = 3234/892669 × 45/21891 × 40/246 × 324/510508 × 35/882 = 3.05 × 10^-11
         p_avg = ( ∏_{i=1}^{L+1} P(w_i | w_{i-1}) )^{1/(L+1)} = (3.05 × 10^-11)^{1/5} = 0.00789
         PP = 1 / p_avg = 126.8
     Perplexity: Example Values.
         type       | domain    | training data | case+punct | PP
         human^1    | biography |               | √          | 142
         machine^2  | Brown     | 600MW         |            | 790
         ASR^3      | WSJ       | 23MW          |            | 120
     ^1 Jefferson the Virginian; Shannon game (Shannon, 1951). ^2 Trigram model (Brown et al., 1992). ^3 Trigram model; 20kw vocabulary.
     Does Perplexity Predict Word-Error Rate? Not across different LM types (e.g. word n-gram model vs. class n-gram model), but it is OK within an LM type (e.g. varying the training set, model order, or pruning). It also varies highly across domains and languages. Why?
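Perplexity is easy to compute from a list of per-word conditional probabilities. A minimal sketch using the five probabilities from the slide's bigram example:

```python
import math

def perplexity(word_probs):
    """PP = 1 / geometric mean of the per-word probabilities
    (the end-of-sentence prediction counts as a word)."""
    logp = sum(math.log(p) for p in word_probs)
    return math.exp(-logp / len(word_probs))

# The five conditional probabilities from the bigram example above.
probs = [3234/892669, 45/21891, 40/246, 324/510508, 35/882]
print(perplexity(probs))   # ~126.8
```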

  8. Perplexity and Word-Error Rate. [Plot: WER (roughly 20-35%) against log PP (roughly 4.5-6.5).]
     Recap. The LM describes the allowable word sequences and is used to build the decoding graph. The AM weight is needed for the LM to have its full effect. It is best to evaluate LM's using WER, but perplexity can be informative. Can you think of any problems with word error rate? What do we really care about in applications?
     Where Are We? Outline: 1. N-Gram Models; 2. Technical Details; 3. Smoothing; 4. Discussion.
     An Experiment. Take 50M words of WSJ, shuffle the sentences, and split them in two.
     "Training" set: 25M words, e.g.
         NONCOMPETITIVE TENDERS MUST BE RECEIVED BY NOON EASTERN TIME THURSDAY AT THE TREASURY OR AT FEDERAL RESERVE BANKS OR BRANCHES .PERIOD
         NOT EVERYONE AGREED WITH THAT STRATEGY .PERIOD
         ...
     "Test" set: 25M words, e.g.
         NATIONAL PICTURE &AMPERSAND FRAME –DASH INITIAL TWO MILLION ,COMMA TWO HUNDRED FIFTY THOUSAND SHARES ,COMMA VIA WILLIAM BLAIR .PERIOD
         THERE WILL EVEN BE AN EIGHTEEN -HYPHEN HOLE GOLF COURSE .PERIOD
         ...
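The setup for this experiment is straightforward to sketch. The file name and random seed below are illustrative only; the lecture does not specify how the shuffling was done.

```python
import random

def shuffle_split(sentences, seed=0):
    """Shuffle sentences and split the corpus into two equal halves,
    as in the slide's WSJ train/test experiment."""
    sents = list(sentences)
    random.Random(seed).shuffle(sents)
    half = len(sents) // 2
    return sents[:half], sents[half:]

# Hypothetical usage on a one-sentence-per-line corpus file:
# with open("wsj.txt") as f:
#     train, test = shuffle_split(f.read().splitlines())
```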
