Probabilistic Pronunciation + N-gram Models CMSC 35100 Natural Language Processing April 15, 2003
The ASR Pronunciation Problem • Given a series of phones, what is the most probable word? – Simplification: Assume the phone sequence and word boundaries are known • Approach: Noisy channel model – The surface form is an instance of a lexical form that has passed through a noisy communication channel – Model the channel to remove the noise and recover the original
Bayesian Model • Pr(w|O) = Pr(O|w)Pr(w)/Pr(O) • Goal: Most probable word – Observations held constant – Find w to maximize Pr(O|w)*Pr(w) • Where do we get the likelihoods Pr(O|w)? – Probabilistic rules (Labov) • Add probabilities to pronunciation variation rules – Count over a large corpus of surface forms wrt the lexicon • Where do we get Pr(w)? – Similarly: count over words in a large corpus
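A minimal Python sketch of the decoding rule above: pick the w maximizing Pr(O|w)*Pr(w). The likelihood and prior tables here are hypothetical stand-ins for values that would be estimated from probabilistic pronunciation rules and corpus counts.

# Noisy channel decoding: choose the word w maximizing Pr(O|w) * Pr(w).
# All probability values below are made up for illustration.
likelihood = {   # Pr(O|w): probability of the observed phone string given the word
    ("dh", "ax"): {"the": 0.45, "that": 0.02},
    ("ax", "b", "aw", "t"): {"about": 0.12},
}
prior = {"the": 0.07, "that": 0.006, "about": 0.0003}   # Pr(w) from corpus counts

def decode(observed_phones):
    """Return the most probable word for a phone sequence: argmax_w Pr(O|w) * Pr(w)."""
    candidates = likelihood.get(tuple(observed_phones), {})
    if not candidates:
        return None
    return max(candidates, key=lambda w: candidates[w] * prior[w])

print(decode(["dh", "ax"]))   # -> 'the'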
Weighted Automata • Associate a weight (probability) with each arc – Determine weights by decision tree compilation or counting from a large corpus • [Figure: weighted pronunciation automaton with per-arc probabilities over phones (ax/ix/ae, b, aw, t/dx), computed from the Switchboard corpus]
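One simple way to hold such a weighted automaton in code is a dictionary of arcs with probabilities; the sketch below uses made-up states and weights (not the Switchboard values from the figure) and multiplies arc weights along a phone path.

# A weighted automaton as {state: [(next_state, phone, probability), ...]}.
# Arc probabilities out of each state sum to 1; all values are illustrative.
automaton = {
    "start": [("s1", "ax", 0.68), ("s1", "ix", 0.20), ("s1", "ae", 0.12)],
    "s1":    [("s2", "b", 1.0)],
    "s2":    [("s3", "aw", 1.0)],
    "s3":    [("end", "t", 0.85), ("end", "dx", 0.15)],
}

def path_probability(phones):
    """Multiply arc weights along a phone path from start to end (0.0 if no such path)."""
    prob, state = 1.0, "start"
    for phone in phones:
        arc = next(((nxt, p) for nxt, ph, p in automaton.get(state, []) if ph == phone), None)
        if arc is None:
            return 0.0
        state, prob = arc[0], prob * arc[1]
    return prob if state == "end" else 0.0

print(path_probability(["ax", "b", "aw", "t"]))   # 0.68 * 1.0 * 1.0 * 0.85 = 0.578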
Forward Computation • For a weighted automaton and a phoneme sequence, what is its likelihood? – Automaton: Tuple • Set of states Q: q0,…,qn • Set of transition probabilities between states a_ij – Where a_ij is the probability of transitioning from state i to state j • Special start & end states – Input: Observation sequence O = o1,o2,…,ok – Computed as: • forward[t,j] = P(o1,o2,…,ot, q_t=j | λ) = Σ_i forward[t-1,i] * a_ij * b_j(o_t) – Sums over all paths into q_t=j
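A runnable Python sketch of the forward recursion above, assuming hypothetical transition probabilities a[i][j] and observation likelihoods b[j][o]; the start and end states are non-emitting.

# Forward algorithm: total likelihood of an observation sequence, summing over
# all state paths.  All probabilities are hypothetical.
states = ["start", "s1", "s2", "end"]
a = {  # a[i][j]: probability of transitioning from state i to state j
    "start": {"s1": 1.0},
    "s1": {"s1": 0.3, "s2": 0.7},
    "s2": {"end": 1.0},
    "end": {},
}
b = {  # b[j][o]: probability of observing o in state j
    "start": {}, "end": {},
    "s1": {"ax": 0.6, "ix": 0.4},
    "s2": {"t": 0.9, "dx": 0.1},
}

def forward(observations):
    """forward[t][j] = P(o1..ot, q_t = j) = sum_i forward[t-1][i] * a[i][j] * b[j](o_t)."""
    f = [{s: 0.0 for s in states}]
    f[0]["start"] = 1.0
    for t, o in enumerate(observations, start=1):
        f.append({j: sum(f[t - 1][i] * a[i].get(j, 0.0) for i in states) * b[j].get(o, 0.0)
                  for j in states})
    # Likelihood = probability mass that reaches the end state after the last observation.
    return sum(f[-1][i] * a[i].get("end", 0.0) for i in states)

print(forward(["ax", "t"]))   # 1.0*0.6 * 0.7*0.9 = 0.378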
Viterbi Decoding • Given an observation sequence O and a weighted automaton, what is the most likely state sequence? – Use to identify words by merging multiple word pronunciation automata in parallel – Comparable to forward • Replace sum with max • Dynamic programming approach – Store the max through a given state/time pair
Viterbi Algorithm
function VITERBI(observations of length T, state-graph) returns best-path
  num-states ← NUM-OF-STATES(state-graph)
  Create a path probability matrix viterbi[num-states+2, T+2]
  viterbi[0,0] ← 1.0
  for each time step t from 0 to T do
    for each state s from 0 to num-states do
      for each transition s′ from s in state-graph do
        new-score ← viterbi[s,t] * a[s,s′] * b_s′(o_t)
        if (viterbi[s′,t+1] = 0) || (viterbi[s′,t+1] < new-score) then
          viterbi[s′,t+1] ← new-score
          back-pointer[s′,t+1] ← s
  Backtrace from the highest-probability state in the final column of viterbi[] and return the path
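A runnable Python version of the pseudocode above, kept as a sketch: it assumes transition and observation tables shaped like those in the forward sketch earlier, replaces the sum with a max, and recovers the best path from back-pointers.

def viterbi(observations, states, a, b, start="start", end="end"):
    """Most likely state sequence for the observations (forward recursion with max)."""
    V = [{s: 0.0 for s in states}]
    back = [{}]
    V[0][start] = 1.0
    for t, o in enumerate(observations, start=1):
        V.append({s: 0.0 for s in states})
        back.append({})
        for s in states:                     # state occupied at time t-1
            for s2, p in a[s].items():       # each transition s -> s2
                new_score = V[t - 1][s] * p * b[s2].get(o, 0.0)
                if new_score > V[t][s2]:     # keep only the best path into s2
                    V[t][s2] = new_score
                    back[t][s2] = s
    # Backtrace from the best final state that can still reach the end state.
    best = max(states, key=lambda s: V[-1][s] * a[s].get(end, 0.0))
    path = [best]
    for t in range(len(observations), 1, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# With the states/a/b tables from the forward sketch above:
# viterbi(["ax", "t"], states, a, b)  ->  ['s1', 's2']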
Segmentation • Breaking sequence into chunks – Sentence segmentation • Break long sequences into sentences – Word segmentation • Break character/phonetic sequences into words – Chinese: typically written w/o whitespace » Pronunciation affected by units – Language acquisition: » How does a child learn language from stream of phones?
Models of Segmentation • Many: – Rule-based, heuristic longest match • Probabilistic: – Each word associated with its probability – Find the sequence with the highest probability • Typically computed as log probs & summed – Implementation: Weighted FST cascade • Each word = chars + probability • Self-loop on the dictionary • Compose the input with dict* • Compute the most likely path
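Below is a minimal dynamic-programming sketch of probabilistic word segmentation; it computes the same kind of most-likely split as the FST cascade described above, but directly in Python. The tiny dictionary of word probabilities is hypothetical.

import math

# Hypothetical word probabilities (in practice estimated from corpus counts).
word_prob = {"the": 0.07, "cat": 0.001, "a": 0.02, "at": 0.005, "theca": 0.00001}

def segment(text):
    """Best segmentation by maximizing the sum of log word probabilities."""
    n = len(text)
    best = [(-math.inf, None)] * (n + 1)   # best[i] = (best log-prob of text[:i], split point)
    best[0] = (0.0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - 10), i):          # cap candidate word length at 10 chars
            word = text[j:i]
            if word in word_prob and best[j][0] > -math.inf:
                score = best[j][0] + math.log(word_prob[word])
                if score > best[i][0]:
                    best[i] = (score, j)
    # Follow the split points backwards to recover the word sequence.
    words, i = [], n
    while i > 0:
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return list(reversed(words))

print(segment("thecat"))   # -> ['the', 'cat']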
N-grams • Perspective: – Some sequences (words/chars) are more likely than others – Given a sequence, we can guess the most likely next item • Used in – Speech recognition – Spelling correction – Augmentative communication – Other NL applications
Corpus Counts • Estimate probabilities by counts in large collections of text/speech • Issues: – Wordforms (surface) vs lemma (root) – Case? Punctuation? Disfluency? – Type (distinct words) vs Token (total)
Basic N-grams • Most trivial: 1/#tokens: too simple! • Standard unigram: frequency – # word occurrences / total corpus size • E.g. the = 0.07; rabbit = 0.00001 – Too simple: no context! • Conditional probabilities of word sequences: P(w_1^n) = P(w_1) P(w_2|w_1) P(w_3|w_1^2) … P(w_n|w_1^{n-1}) = ∏_{k=1}^{n} P(w_k | w_1^{k-1})
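A small sketch of the unigram estimate (and the type/token counts it rests on), over a made-up toy corpus with naive whitespace tokenization.

from collections import Counter

tokens = "the rabbit saw the cat and the cat saw the rabbit".split()
counts = Counter(tokens)

print(len(tokens))   # tokens (total running words): 11
print(len(counts))   # types (distinct wordforms): 5

def unigram(word):
    """P(word) estimated as count(word) / total number of tokens."""
    return counts[word] / len(tokens)

print(unigram("the"))      # 4/11 ≈ 0.36 on this toy corpus
print(unigram("rabbit"))   # 2/11 ≈ 0.18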
Markov Assumptions • Exact computation requires too much data • Approximate probability given all prior words – Assume a finite history – Bigram: Probability of word given 1 previous • First-order Markov – Trigram: Probability of word given 2 previous • N-gram approximation: P(w_n | w_1^{n-1}) ≈ P(w_n | w_{n-N+1}^{n-1}) • Bigram sequence: P(w_1^n) ≈ ∏_{k=1}^{n} P(w_k | w_{k-1})
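A sketch of scoring a word sequence under the bigram (first-order Markov) approximation, using a hypothetical table of bigram probabilities and a "<s>" start symbol.

import math

# Hypothetical bigram probabilities P(w_k | w_{k-1}); "<s>" marks the sentence start.
bigram_prob = {
    ("<s>", "the"): 0.25, ("the", "rabbit"): 0.04,
    ("rabbit", "ran"): 0.10, ("ran", "away"): 0.15,
}

def sentence_logprob(words):
    """log P(w_1..w_n) ≈ sum_k log P(w_k | w_{k-1})  (bigram Markov assumption)."""
    history, logp = "<s>", 0.0
    for w in words:
        logp += math.log(bigram_prob.get((history, w), 1e-10))  # tiny floor for unseen pairs
        history = w
    return logp

print(sentence_logprob(["the", "rabbit", "ran", "away"]))   # sum of the four log probs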
Issues • Relative frequency – Typically compute the count of the sequence and divide by the count of the prefix: P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1}) • Corpus sensitivity – Shakespeare vs Wall Street Journal – N-grams estimated from one look very unnatural applied to the other • N-grams – Unigrams capture little; bigrams capture collocations; trigrams approach phrases
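A minimal sketch of relative-frequency (maximum-likelihood) bigram estimation, P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1}), over a toy corpus.

from collections import Counter

corpus = "the cat saw the rabbit and the rabbit saw the cat".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_mle(prev, word):
    """P(word | prev) = C(prev word) / C(prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_mle("the", "cat"))      # C(the cat) / C(the) = 2/4 = 0.5
print(bigram_mle("the", "rabbit"))   # 2/4 = 0.5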