Hidden Markov Model

p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N-1}) p(w_N | z_N)
                                   = ∏_i p(w_i | z_i) p(z_i | z_{i-1})

Goal: maximize the (log-)likelihood.

In practice we don't actually observe these z values; we just see the words w.
- If we did observe z, estimating the probability parameters would be easy… but we don't! :(
- If we knew the probability parameters, we could estimate z and evaluate the likelihood… but we don't! :(
Hidden Markov Model Terminology

p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N-1}) p(w_N | z_N)
                                   = ∏_i p(w_i | z_i) p(z_i | z_{i-1})

p(z_i | z_{i-1}): transition probabilities/parameters
p(w_i | z_i): emission probabilities/parameters

Each z_i can take the value of one of K latent states.
The transition and emission distributions do not change across positions.

Q: How many different probability values are there with K states and V vocab items?
A: VK emission values and K^2 transition values.
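To make the factorization concrete, here is a minimal sketch (not from the slides) that multiplies out the joint probability of one tagged sequence; the dictionary-based parameters `trans` and `emit` are hypothetical placeholders for the transition and emission tables.

```python
# Minimal sketch: joint probability of one tagged sequence under an HMM.
# trans[(prev, z)] = p(z | prev) and emit[(z, w)] = p(w | z) are assumed to be
# plain dictionaries; "start" plays the role of z_0.

def joint_prob(tags, words, trans, emit):
    prob = 1.0
    prev = "start"                    # z_0: the initial/BOS state
    for z, w in zip(tags, words):
        prob *= trans[(prev, z)]      # transition p(z_i | z_{i-1})
        prob *= emit[(z, w)]          # emission   p(w_i | z_i)
        prev = z
    return prob
```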
Hidden Markov Model Representation

p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N-1}) p(w_N | z_N)
                                   = ∏_i p(w_i | z_i) p(z_i | z_{i-1})

Represent the probabilities and independence assumptions in a graph (a graphical model; see 478/678):

z_1 → z_2 → z_3 → z_4 → …     transitions p(z_2 | z_1), p(z_3 | z_2), p(z_4 | z_3), …
 ↓     ↓     ↓     ↓
w_1   w_2   w_3   w_4          emissions p(w_1 | z_1), p(w_2 | z_2), p(w_3 | z_3), p(w_4 | z_4)

p(z_1 | z_0) is the initial starting distribution ("BOS").

Each z_i can take the value of one of K latent states; the transition and emission distributions do not change.
Example: 2-State Hidden Markov Model as a Lattice

[Lattice diagram: at each position i, z_i is either N or V. Each state emits the observed word w_i (emissions p(w_i | N) and p(w_i | V)), the first states are reached via p(N | start) and p(V | start), and every state at position i connects to both states at position i+1 via the transitions p(N | N), p(V | N), p(N | V), p(V | V).]
Comparison of Joint Probabilities

Unigram language model:
p(w_1, w_2, …, w_N) = p(w_1) p(w_2) ⋯ p(w_N) = ∏_i p(w_i)

Unigram class-based language model ("K coins"):
p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1) p(w_1 | z_1) ⋯ p(z_N) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i)

Hidden Markov model:
p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N-1}) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i | z_{i-1})
Estimating Parameters from Observed Data

Suppose we observe two fully tagged sequences (states and words), as shown in the lattices:
  (N, w_1), (V, w_2), (V, w_3), (N, w_4)
  (N, w_1), (V, w_2), (N, w_3), (N, w_4)

Transition counts              Transition MLE
        N   V   end                    N     V     end
start   2   0   0              start   1     0     0
N       1   2   2              N       .2    .4    .4
V       2   1   0              V       2/3   1/3   0

Emission counts                Emission MLE
    w_1  w_2  w_3  w_4             w_1   w_2   w_3   w_4
N   2    0    1    2           N   .4    0     .2    .4
V   0    2    1    0           V   0     2/3   1/3   0

(The end state's emission is not shown.) Smooth these values if needed.
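A small sketch of how these MLE tables can be computed by counting and normalizing; the dictionary layout and helper name are assumptions layered on the slide, and the example data mirrors the two tagged sequences above.

```python
from collections import Counter

def hmm_mle(tagged_sentences):
    """Count transitions (including start/end) and emissions, then normalize each row."""
    trans, emit = Counter(), Counter()
    for sent in tagged_sentences:
        prev = "start"
        for tag, word in sent:
            trans[(prev, tag)] += 1
            emit[(tag, word)] += 1
            prev = tag
        trans[(prev, "end")] += 1          # transition into the end state
    trans_tot, emit_tot = Counter(), Counter()
    for (prev, _), c in trans.items():
        trans_tot[prev] += c
    for (tag, _), c in emit.items():
        emit_tot[tag] += c
    p_trans = {k: c / trans_tot[k[0]] for k, c in trans.items()}
    p_emit = {k: c / emit_tot[k[0]] for k, c in emit.items()}
    return p_trans, p_emit

data = [[("N", "w1"), ("V", "w2"), ("V", "w3"), ("N", "w4")],
        [("N", "w1"), ("V", "w2"), ("N", "w3"), ("N", "w4")]]
p_trans, p_emit = hmm_mle(data)
# e.g. p_trans[("start", "N")] == 1.0, p_trans[("N", "V")] == 0.4, p_emit[("V", "w2")] == 2/3
```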
Outline
- HMM Motivation (Part of Speech) and Brief Definition
- What is Part of Speech?
- HMM Detailed Definition
- HMM Tasks
Hidden Markov Model Tasks

p(z_1, w_1, …, z_N, w_N) = ∏_i p(w_i | z_i) p(z_i | z_{i-1})    (emission and transition probabilities/parameters)

1. Calculate the (log-)likelihood of an observed sequence w_1, …, w_N.
2. Calculate the most likely sequence of states (for an observed sequence).
3. Learn the emission and transition parameters.
HMM Likelihood Task

Marginalize over the joint likelihoods of all latent sequences:

p(w_1, w_2, …, w_N) = Σ_{z_1, …, z_N} p(z_1, w_1, z_2, w_2, …, z_N, w_N)

Q: In a K-state HMM, for a length-N observation sequence, how many summands (different latent sequences) are there?
A: K^N

Goal: find a way to compute this exponential sum efficiently (in polynomial time).

As in language modeling, you need to model when to stop generating; this ending state is generally not included in "K."
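Before the efficient solution, here is a brute-force sketch of exactly this sum, enumerating all K^N latent sequences; the (prev, state)/(state, word) dictionary keys are an assumed layout, and the transition into the end state is included as described above.

```python
from itertools import product

def likelihood_brute_force(words, states, p_trans, p_emit):
    """Sum the joint probability over all K^N latent sequences (exponential in N!)."""
    total = 0.0
    for zs in product(states, repeat=len(words)):      # all K^N assignments
        p, prev = 1.0, "start"
        for z, w in zip(zs, words):
            p *= p_trans.get((prev, z), 0.0) * p_emit.get((z, w), 0.0)
            prev = z
        p *= p_trans.get((prev, "end"), 0.0)            # stop generating
        total += p
    return total
```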
2 (3)-State HMM Likelihood

Using the 2-state (N/V) lattice, plus an implicit end state:

Q: What are the latent sequences here (EOS excluded)?
A: All 2^4 = 16 assignments of N/V to the four positions:
(N, w_1), (N, w_2), (N, w_3), (N, w_4)        (N, w_1), (V, w_2), (N, w_3), (N, w_4)
(V, w_1), (N, w_2), (N, w_3), (N, w_4)        (N, w_1), (N, w_2), (N, w_3), (V, w_4)
(N, w_1), (V, w_2), (N, w_3), (V, w_4)        (V, w_1), (N, w_2), (N, w_3), (V, w_4)
(N, w_1), (N, w_2), (V, w_3), (N, w_4)        (N, w_1), (V, w_2), (V, w_3), (N, w_4)
(N, w_1), (N, w_2), (V, w_3), (V, w_4)        (N, w_1), (V, w_2), (V, w_3), (V, w_4)
… (six more)
2 (3)-State HMM Likelihood

Example parameters:

Transitions p(next | current):
             N     V     end
  start      .7    .2    .1
  N          .15   .8    .05
  V          .6    .35   .05

Emissions p(word | state):
             w_1   w_2   w_3   w_4   #
  N          .7    .2    .05   .05   0
  V          .2    .6    .1    .1    0
  end        0     0     0     0     1

Q: What's the probability of (N, w_1), (V, w_2), (V, w_3), (N, w_4)?
A: (.7*.7) * (.8*.6) * (.35*.1) * (.6*.05) ≈ 0.000247

Q: What's the probability of (N, w_1), (V, w_2), (V, w_3), (N, w_4) with the ending included (the end state emits the unique ending symbol "#")?
A: (.7*.7) * (.8*.6) * (.35*.1) * (.6*.05) * (.05*1) ≈ 0.00001235
2 (3)-State HMM Likelihood

Q: What's the probability of (N, w_1), (V, w_2), (N, w_3), (N, w_4)?
A: (.7*.7) * (.8*.6) * (.6*.05) * (.15*.05) ≈ 0.0000529

Q: What's the probability of (N, w_1), (V, w_2), (N, w_3), (N, w_4) with the ending (unique symbol "#")?
A: (.7*.7) * (.8*.6) * (.6*.05) * (.15*.05) * (.05*1) ≈ 0.000002646
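A quick way to check these worked products is to put the example tables into dictionaries and redo the arithmetic; this snippet is just a sanity check under that assumed layout, not code from the slides.

```python
# Example transition/emission tables from the slides, as dictionaries.
p_trans = {("start", "N"): .7, ("start", "V"): .2, ("start", "end"): .1,
           ("N", "N"): .15, ("N", "V"): .8, ("N", "end"): .05,
           ("V", "N"): .6, ("V", "V"): .35, ("V", "end"): .05}
p_emit = {("N", "w1"): .7, ("N", "w2"): .2, ("N", "w3"): .05, ("N", "w4"): .05,
          ("V", "w1"): .2, ("V", "w2"): .6, ("V", "w3"): .1, ("V", "w4"): .1}

def path_prob(tags, words, with_end=True):
    """Joint probability of one tagged path, optionally including the end transition."""
    p, prev = 1.0, "start"
    for z, w in zip(tags, words):
        p *= p_trans[(prev, z)] * p_emit[(z, w)]
        prev = z
    return p * p_trans[(prev, "end")] if with_end else p

words = ["w1", "w2", "w3", "w4"]
print(path_prob(["N", "V", "V", "N"], words))   # ≈ 1.235e-05
print(path_prob(["N", "V", "N", "N"], words))   # ≈ 2.646e-06
```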
2 (3)-State HMM Likelihood

Compare the two paths just computed, (N, V, V, N) and (N, V, N, N): up until timestep 2, all the computation was the same. Let's reuse what computations we can.

Solution: pass information "forward" in the graph, e.g., from timestep 2 to 3…
Issue: these are only two of the 16 paths through the trellis.
Solution: … and marginalize (sum) out all information from previous timesteps (0 & 1).
Reusing Computation

Consider three states A, B, C. Let's first consider "any shared path ending with B" (A→B, B→B, or C→B), and assume any necessary information has been properly computed and stored along those paths: α(i-1, A), α(i-1, B), α(i-1, C).

Marginalize across the previous hidden state values:

α(i, B) = Σ_s α(i-1, s) * p(B | s) * p(obs at i | B)

Computing α at time i-1 will correctly incorporate paths through time i-2: we correctly obey the Markov property.
Forward Probability

α(i, s) is the total probability of all paths:
1. that start from the beginning,
2. that end (currently) in state s at step i,
3. and that emit the observation obs at i.

α(i, s) = Σ_{s'} α(i-1, s') * p(s | s') * p(obs at i | s)

In words: what's the total probability up until now (α(i-1, s'))? what are the immediate ways to get into state s (p(s | s'))? and how likely is it to get into state s this way and emit the observation?
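Putting the pieces together (initialization, recursion, and the end-state termination the deck uses later), the forward computation can be written as:

```latex
\begin{aligned}
\alpha(0, \text{start}) &= 1, \qquad \alpha(0, s) = 0 \ \text{for } s \neq \text{start} \\
\alpha(i, s) &= \Big[ \sum_{s'} \alpha(i-1, s')\, p(s \mid s') \Big]\, p(w_i \mid s), \qquad i = 1, \dots, N \\
p(w_1, \dots, w_N) &= \sum_{s} \alpha(N, s)\, p(\text{end} \mid s) \;=\; \alpha(N{+}1, \text{end})
\end{aligned}
```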
2 (3)-State HMM Likelihood with Forward Probabilities

Using the transition and emission tables from before:

α[1, N] = (.7*.7) = .49
α[1, V] = (.2*.2) = .04
α[2, N] = α[1, N]*(.15*.2) + α[1, V]*(.6*.2) = .0195
α[2, V] = α[1, N]*(.8*.6) + α[1, V]*(.35*.6) = .2436
α[3, N] = α[2, N]*(.15*.05) + α[2, V]*(.6*.05)
α[3, V] = α[2, N]*(.8*.1) + α[2, V]*(.35*.1)
…

Use dynamic programming to build the α table left-to-right.
Forward Algorithm

α: a 2D table, (N+2) × K*
  N+2: number of observations (+2 for the BOS & EOS symbols)
  K*: number of states (including the start/end states)
Use dynamic programming to build the α table left-to-right.
Forward Algorithm

α = double[N+2][K*]
α[0][*] = 0.0
α[0][START] = 1.0

for (i = 1; i ≤ N+1; ++i) {
  for (state = 0; state < K*; ++state) {
    p_obs = p_emission(obs_i | state)
    for (old = 0; old < K*; ++old) {
      p_move = p_transition(state | old)
      α[i][state] += α[i-1][old] * p_obs * p_move
    }
  }
}

(We still need to learn the emission and transition probabilities; use EM if the states are not observed.)

Q: What do we return? (How do we return the likelihood of the sequence?)
A: α[N+1][end]
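For reference, here is a runnable sketch of the same algorithm; the dictionary-keyed parameters and the explicit "#" EOS symbol are implementation assumptions layered on the pseudocode, and the tables are the running example from earlier slides.

```python
# Forward algorithm, mirroring the pseudocode above. STATES is K* (including
# start and end); "#" is the EOS symbol, emitted only by the end state.

STATES = ["start", "N", "V", "end"]
p_trans = {("start", "N"): .7, ("start", "V"): .2, ("start", "end"): .1,
           ("N", "N"): .15, ("N", "V"): .8, ("N", "end"): .05,
           ("V", "N"): .6, ("V", "V"): .35, ("V", "end"): .05}
p_emit = {("N", "w1"): .7, ("N", "w2"): .2, ("N", "w3"): .05, ("N", "w4"): .05,
          ("V", "w1"): .2, ("V", "w2"): .6, ("V", "w3"): .1, ("V", "w4"): .1,
          ("end", "#"): 1.0}

def forward_likelihood(words):
    obs = words + ["#"]                       # append EOS so the loop runs to N+1
    alpha = [{s: 0.0 for s in STATES} for _ in range(len(obs) + 1)]
    alpha[0]["start"] = 1.0
    for i, w in enumerate(obs, start=1):
        for state in STATES:
            p_obs = p_emit.get((state, w), 0.0)
            for old in STATES:
                p_move = p_trans.get((old, state), 0.0)
                alpha[i][state] += alpha[i - 1][old] * p_obs * p_move
    return alpha[len(obs)]["end"]             # alpha[N+1][end]

# Intermediate alpha values match the worked slide (e.g. alpha[1][N] = .49,
# alpha[2][V] = .2436); this prints the total sequence likelihood.
print(forward_likelihood(["w1", "w2", "w3", "w4"]))
```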
Interactive HMM Example https://goo.gl/rbHEoc (Jason Eisner, 2002) Original: http://www.cs.jhu.edu/~jason/465/PowerPoint/lect24-hmm.xls