Automatic Speech Recognition (CS753), Lecture 5: Hidden Markov Models (Part I)


  1. Automatic Speech Recognition (CS753), Lecture 5: Hidden Markov Models (Part I). Instructor: Preethi Jyothi


  2. OpenFst Cheat Sheet

  3. Quick Intro to OpenFst (www.openfst.org). The label "0" is reserved for epsilon.
   Text FST (A.txt), one arc per line giving source state, destination state, input label, output label; a line with a single state marks it as final:
     0 1 an a
     1 2 <eps> n
     0 2 a a
     2
   Input alphabet (in.txt), mapping each input symbol to an integer id:
     <eps> 0
     an 1
     a 2
   Output alphabet (out.txt):
     <eps> 0
     a 1
     n 2

  4. Quick Intro to OpenFst (www.openfst.org). Weighted version of the same FST: each arc line carries an additional weight, and the final state carries a final weight (here state 2 with final weight 0.1):
     0 1 an a 0.5
     1 2 <eps> n 1.0
     0 2 a a 0.5
     2 0.1

  5. Compiling & Printing FSTs. The text FSTs need to be "compiled" into binary objects before further use with OpenFst utilities. Command used to compile:
     • fstcompile --isymbols=in.txt --osymbols=out.txt A.txt A.fst
   Get back the text FST by running the print command on the binary file:
     • fstprint --isymbols=in.txt --osymbols=out.txt A.fst A.txt

  6. Drawing FSTs. Small FSTs can be visualized easily using the draw tool:
     fstdraw --isymbols=in.txt --osymbols=out.txt A.fst | dot -Tpdf > A.pdf
   (The resulting figure shows the three-state FST from above: 0 → 1 labeled an:a, 1 → 2 labeled <eps>:n, 0 → 2 labeled a:a.)

  7. Fairly large FST!

  8. Hidden Markov Models (HMMs). Following slides contain figures/material from "Hidden Markov Models", Chapter 9, "Speech and Language Processing", D. Jurafsky and J. H. Martin, 2016. (https://web.stanford.edu/~jurafsky/slp3/9.pdf)

  9. Markov Chains
   Q = q_1 q_2 ... q_N : a set of N states
   A = a_01 a_02 ... a_n1 ... a_nn : a transition probability matrix A, each a_ij representing the probability of moving from state i to state j, s.t. Σ_{j=1}^{n} a_ij = 1 for all i
   π = π_1, π_2, ..., π_N : an initial probability distribution over states. π_i is the probability that the Markov chain will start in state i. Some states j may have π_j = 0, meaning that they cannot be initial states. Also, Σ_{i=1}^{n} π_i = 1
   q_0, q_F : a special start state and end (final) state that are not associated with observations
   QA = {q_x, q_y, ...} : a set QA ⊂ Q of legal accepting states
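   As a quick illustration of these components, here is a minimal Python sketch that samples a state sequence from a Markov chain given π and A; the state names and probability values in the example call are made up for illustration and are not part of the lecture:

    import random

    def sample_markov_chain(pi, A, length):
        # pi[s] is the probability of starting in state s; A[s][t] is the
        # probability of moving from state s to state t. Each row of A,
        # and pi itself, must sum to 1.
        def draw(dist):
            r, total = random.random(), 0.0
            for state, p in dist.items():
                total += p
                if r < total:
                    return state
            return state  # guard against floating-point round-off

        seq = [draw(pi)]
        for _ in range(length - 1):
            seq.append(draw(A[seq[-1]]))
        return seq

    # Hypothetical two-state chain, just to show the calling convention.
    print(sample_markov_chain({"Rain": 0.5, "Sun": 0.5},
                              {"Rain": {"Rain": 0.7, "Sun": 0.3},
                               "Sun":  {"Rain": 0.2, "Sun": 0.8}},
                              5))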

  10. Hidden Markov Model
   Q = q_1 q_2 ... q_N : a set of N states
   A = a_11 a_12 ... a_n1 ... a_nn : a transition probability matrix A, each a_ij representing the probability of moving from state i to state j, s.t. Σ_{j=1}^{n} a_ij = 1 for all i
   O = o_1 o_2 ... o_T : a sequence of T observations, each one drawn from a vocabulary V = v_1, v_2, ..., v_V
   B = b_i(o_t) : a sequence of observation likelihoods, also called emission probabilities, each expressing the probability of an observation o_t being generated from a state i
   q_0, q_F : a special start state and end (final) state that are not associated with observations, together with transition probabilities a_01 a_02 ... a_0n out of the start state and a_1F a_2F ... a_nF into the end state

  11. HMM Assumptions
   (Figure: two-state weather HMM with non-emitting start (state 0) and end (state 3) states. Transitions: P(HOT|start) = .8, P(COLD|start) = .2, P(HOT|HOT) = .6, P(COLD|HOT) = .3, P(end|HOT) = .1, P(HOT|COLD) = .4, P(COLD|COLD) = .5, P(end|COLD) = .1. Emission probabilities: P(1|HOT) = .2, P(2|HOT) = .4, P(3|HOT) = .4; P(1|COLD) = .5, P(2|COLD) = .4, P(3|COLD) = .1.)
   Markov Assumption: P(q_i | q_1 ... q_{i-1}) = P(q_i | q_{i-1})
   Output Independence: P(o_i | q_1 ... q_i, ..., q_T, o_1, ..., o_i, ..., o_T) = P(o_i | q_i)
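   To make the later algorithm slides concrete, the weather HMM in this figure can be written down as plain Python dictionaries; a minimal sketch (the variable names states, A and B are mine, the probabilities are taken from the figure):

    # The two-state HOT/COLD weather HMM from the figure, as plain dictionaries.
    states = ["HOT", "COLD"]

    # Transition probabilities a_ij, including the non-emitting start/end states.
    A = {
        "start": {"HOT": 0.8, "COLD": 0.2},
        "HOT":   {"HOT": 0.6, "COLD": 0.3, "end": 0.1},
        "COLD":  {"HOT": 0.4, "COLD": 0.5, "end": 0.1},
    }

    # Emission probabilities b_i(o): probability of emitting observation o (1, 2 or 3) in state i.
    B = {
        "HOT":  {1: 0.2, 2: 0.4, 3: 0.4},
        "COLD": {1: 0.5, 2: 0.4, 3: 0.1},
    }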

  12. Three problems for HMMs
   Problem 1 (Likelihood): Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).
   Problem 2 (Decoding): Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.
   Problem 3 (Learning): Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B.
   Computing Likelihood: Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).

  13. Forward Trellis
   α_t(j) = P(o_1, o_2 ... o_t, q_t = j | λ)
   α_t(j) = Σ_{i=1}^{N} α_{t-1}(i) a_ij b_j(o_t)
   (Figure: forward trellis for the weather HMM on the observation sequence 3 1 3, with state 1 = COLD and state 2 = HOT. α_1(1) = P(C|start)·P(3|C) = .2 × .1 = .02, α_1(2) = P(H|start)·P(3|H) = .8 × .4 = .32; α_2(1) = .32 × .15 + .02 × .25 = .053, α_2(2) = .32 × .12 + .02 × .08 = .040, where e.g. .12 = P(H|H)·P(1|H) = .6 × .2.)

  14. Forward Algorithm
   1. Initialization: α_1(j) = a_0j b_j(o_1),  1 ≤ j ≤ N
   2. Recursion (since states 0 and F are non-emitting): α_t(j) = Σ_{i=1}^{N} α_{t-1}(i) a_ij b_j(o_t);  1 ≤ j ≤ N, 1 < t ≤ T
   3. Termination: P(O | λ) = α_T(q_F) = Σ_{i=1}^{N} α_T(i) a_iF
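   A direct transcription of these three steps into Python, reusing the states, A and B dictionaries sketched after the HMM-assumptions slide (a sketch, not the lecture's code):

    def forward(observations, states, A, B):
        # 1. Initialization: alpha_1(j) = a_0j * b_j(o_1)
        alpha = [{j: A["start"][j] * B[j][observations[0]] for j in states}]

        # 2. Recursion: alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
        for o in observations[1:]:
            prev = alpha[-1]
            alpha.append({j: sum(prev[i] * A[i][j] for i in states) * B[j][o]
                          for j in states})

        # 3. Termination: P(O | lambda) = sum_i alpha_T(i) * a_iF
        return sum(alpha[-1][i] * A[i]["end"] for i in states)

    # Likelihood of the observation sequence 3 1 3 under the weather HMM;
    # the intermediate alpha values match the trellis slide (.32/.02, then .040/.053).
    print(forward([3, 1, 3], states, A, B))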

  15. Visualizing the forward recursion
   α_t(j) = Σ_i α_{t-1}(i) a_ij b_j(o_t)
   (Figure: a column of states q_1 ... q_N at time t-1, each carrying its forward probability α_{t-1}(i), feeding state q_j at time t through the transition probabilities a_1j ... a_Nj, multiplied by the emission probability b_j(o_t).)

  16. Three problems for HMMs
   Problem 1 (Likelihood): Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).
   Problem 2 (Decoding): Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.
   Problem 3 (Learning): Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B.
   Decoding: Given as input an HMM λ = (A, B) and a sequence of observations O = o_1, o_2, ..., o_T, find the most probable sequence of states Q = q_1 q_2 q_3 ... q_T.

  17. Viterbi Trellis
   v_t(j) = max_{q_0, q_1, ..., q_{t-1}} P(q_0, q_1 ... q_{t-1}, o_1, o_2 ... o_t, q_t = j | λ)
   v_t(j) = max_{i=1}^{N} v_{t-1}(i) a_ij b_j(o_t)
   (Figure: Viterbi trellis for the weather HMM on the observation sequence 3 1 3, with state 1 = COLD and state 2 = HOT. v_1(1) = .02, v_1(2) = .32; v_2(1) = max(.32 × .15, .02 × .25) = .048, v_2(2) = max(.32 × .12, .02 × .08) = .038.)

  18. Viterbi recursion
   1. Initialization: v_1(j) = a_0j b_j(o_1),  bt_1(j) = 0,  1 ≤ j ≤ N
   2. Recursion (recall that states 0 and q_F are non-emitting):
      v_t(j) = max_{i=1}^{N} v_{t-1}(i) a_ij b_j(o_t);  1 ≤ j ≤ N, 1 < t ≤ T
      bt_t(j) = argmax_{i=1}^{N} v_{t-1}(i) a_ij b_j(o_t);  1 ≤ j ≤ N, 1 < t ≤ T
   3. Termination:
      The best score: P* = v_T(q_F) = max_{i=1}^{N} v_T(i) · a_iF
      The start of backtrace: q_T* = bt_T(q_F) = argmax_{i=1}^{N} v_T(i) · a_iF
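   The same recursion with max and argmax in place of the sum, plus the backtrace; again a Python sketch over the states, A and B dictionaries defined earlier, not the lecture's own code:

    def viterbi(observations, states, A, B):
        # 1. Initialization: v_1(j) = a_0j * b_j(o_1), backpointer to the start state.
        v = [{j: A["start"][j] * B[j][observations[0]] for j in states}]
        backptr = [{j: "start" for j in states}]

        # 2. Recursion: v_t(j) = max_i v_{t-1}(i) * a_ij * b_j(o_t),
        #    remembering which predecessor i achieved the max.
        for o in observations[1:]:
            prev, v_t, bt_t = v[-1], {}, {}
            for j in states:
                best_i = max(states, key=lambda i: prev[i] * A[i][j])
                bt_t[j] = best_i
                v_t[j] = prev[best_i] * A[best_i][j] * B[j][o]
            v.append(v_t)
            backptr.append(bt_t)

        # 3. Termination: pick the best final state, then follow the backpointers.
        best_last = max(states, key=lambda i: v[-1][i] * A[i]["end"])
        best_prob = v[-1][best_last] * A[best_last]["end"]
        path = [best_last]
        for bt in reversed(backptr[1:]):
            path.append(bt[path[-1]])
        path.reverse()
        return best_prob, path

    # Most probable weather sequence for the observations 3 1 3.
    print(viterbi([3, 1, 3], states, A, B))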

  19. Viterbi backtrace
   (Figure: the Viterbi trellis from the previous slide (observations 3 1 3), with the backpointers followed from the best final state back through the trellis to recover the most probable state sequence.)
