
Automatic Speech Recognition (CS753)



  1. Automatic Speech Recognition (CS753)
 Lecture 4: WFST algorithms contd. + WFSTs in ASR
 Instructor: Preethi Jyothi
 August 3, 2017

  2. Quiz-1 Postmortem
 Common mistakes:
 • Missing insertion/deletion in E.fst
 • Forgot to mark final states/self-loops
 • Output vocabulary for T.fst has to be complete words ("bad", "bead", etc.) rather than letters
 [Bar chart: correct vs. incorrect answers for (a) E.fst and (b) T.fst; horizontal axis from 0 to 50]

  3. Project Proposal
 • Start brainstorming!
 • In case of doubt, discuss potential ideas with me during my office hours (Thur, 5:00 pm to 6:30 pm)
 • Once decided, you will have to fill out a form specifying:
   • Title of the project
   • Names/roll numbers of all project members
   • A 300-400 word abstract of the proposed project
 • Due by 11:59 pm on Aug 14th

  4. Composition: Recap
 • If T1 transduces x to z, and T2 transduces z to y, then T1 ○ T2 transduces x to y
 • Note: output alphabet of T1 ⊆ input alphabet of T2
 • E.g. if T1 removes punctuation symbols from a string, and T2 changes uppercase letters to lowercase letters, then T1 ○ T2 brings about both changes
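
The punctuation/lowercase example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dict-based FST representation and the names `compose` and `transduce` are hypothetical (not OpenFst API), the machines are unweighted, and T2 is assumed to have no epsilon input labels, so no epsilon filter is needed.

```python
EPS = ""  # the empty string stands in for epsilon (an assumption of this sketch)

def compose(t1, t2):
    """Product construction for T1 o T2; assumes T2 has no epsilon inputs."""
    start = (t1["start"], t2["start"])
    arcs, finals, stack, seen = {}, set(), [start], {start}
    while stack:
        state = stack.pop()
        q1, q2 = state
        arcs[state] = []
        if q1 in t1["finals"] and q2 in t2["finals"]:
            finals.add(state)
        for i1, o1, n1 in t1["arcs"].get(q1, []):
            if o1 == EPS:
                # T1 emits nothing, so T2 stays in its current state.
                matches = [(EPS, q2)]
            else:
                # T1's output symbol must match T2's input symbol.
                matches = [(o2, n2) for i2, o2, n2 in t2["arcs"].get(q2, []) if i2 == o1]
            for o2, n2 in matches:
                nxt = (n1, n2)
                arcs[state].append((i1, o2, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}

def transduce(t, text):
    """Output string for `text`, or None if rejected (input-deterministic FST assumed)."""
    state, out = t["start"], []
    for ch in text:
        for i, o, n in t["arcs"].get(state, []):
            if i == ch:
                out.append(o)
                state = n
                break
        else:
            return None  # no arc accepts this symbol
    return "".join(out) if state in t["finals"] else None

letters = "abAB"
# T1: copy letters, delete punctuation. T2: lowercase letters.
t1 = {"start": 0, "finals": {0},
      "arcs": {0: [(c, c, 0) for c in letters] + [(p, EPS, 0) for p in ",!"]}}
t2 = {"start": 0, "finals": {0},
      "arcs": {0: [(c, c.lower(), 0) for c in letters]}}
t12 = compose(t1, t2)
print(transduce(t12, "A,b!B"))
```

Every output symbol of T1 (a letter) is an input symbol of T2, which is the alphabet condition noted on the slide.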

  5. Determinization: Recap
 • A (W)FST is deterministic if:
   • Unique start state
   • No two transitions from a state share the same input label
   • No epsilon input labels
 • Not all WFSAs can be determinized

  6. Determinization: Weighted FSA
 • Some weighted FSAs are not determinizable! [M97]
 [Figure: a four-state weighted FSA (states 0 to 3) in which one family of strings containing an n-fold repeated symbol has weight n and another has weight 2n]
 • After seeing the repeated symbol n times, an FSA can't remember n
 [M97] M. Mohri. Finite-State Transducers in Language and Speech Processing. Computational Linguistics, 23(2), 1997

  7. Determinization: Recap
 • A (W)FST is deterministic if:
   • Unique start state
   • No two transitions from a state share the same input label
   • No epsilon input labels
 • Not all WFSAs can be determinized
 • Guaranteed to yield a deterministic WFSA under some technical conditions characterising the automata (e.g. twins property)
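
The three conditions for a deterministic (W)FST translate directly into a check. The following is a minimal sketch; the tuple layout `(src, ilabel, olabel, weight, dst)` and the function name `is_deterministic` are assumptions made for illustration, not part of any toolkit.

```python
EPS = "<eps>"

def is_deterministic(start_states, arcs):
    """arcs: iterable of (src, ilabel, olabel, weight, dst) tuples."""
    # Condition 1: a unique start state.
    if len(set(start_states)) != 1:
        return False
    seen = set()
    for src, ilabel, _olabel, _weight, _dst in arcs:
        # Condition 3: no epsilon input labels.
        if ilabel == EPS:
            return False
        # Condition 2: no two arcs leaving a state share an input label.
        if (src, ilabel) in seen:
            return False
        seen.add((src, ilabel))
    return True

# Two arcs leaving state 0 both read "a": not deterministic.
ambiguous = [(0, "a", "x", 1.0, 1), (0, "a", "y", 2.0, 2)]
# Distinct input labels per state, no epsilons: deterministic.
clean = [(0, "a", "x", 1.0, 1), (0, "b", "y", 2.0, 2), (1, "a", "x", 0.5, 2)]
# An epsilon input label on the second arc: not deterministic.
with_eps = [(0, "an", "a", 0.5, 1), (1, EPS, "n", 1.0, 2)]

print(is_deterministic([0], ambiguous), is_deterministic([0], clean),
      is_deterministic([0], with_eps))
```

Passing this check says nothing about whether a non-deterministic machine can be determinized; that depends on conditions such as the twins property.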

  8. Minimization
 • Minimization: find an equivalent deterministic FSA with the least number of states (and transitions)
 • Unweighted FSAs have a unique minimal FSA [Aho74]
 • Obtained by identifying and merging equivalent states
 [Figure: a four-state FSA (states 0, 1, 2, 3) and its minimized form, in which states 1 and 2 are merged into state 12]
 [Aho74] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis of Computer Algorithms. Addison Wesley, 1974.
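
One way to identify equivalent states is partition refinement. The sketch below uses a simple Moore-style refinement rather than the faster algorithm from [Aho74]; the function name `minimize`, the `delta` dictionary, and the small example automaton are all hypothetical and chosen only for illustration.

```python
def minimize(states, alphabet, delta, finals):
    """Moore-style partition refinement for a deterministic FSA.

    delta maps (state, symbol) -> next state; a missing entry means
    there is no such transition. Returns {state: equivalence-class id}.
    """
    # Start by separating final from non-final states.
    block = {q: int(q in finals) for q in states}
    while True:
        # A state's signature: its own block plus the block reached on each symbol.
        sig = {q: (block[q],) + tuple(block.get(delta.get((q, a))) for a in alphabet)
               for q in states}
        ids = {}
        new_block = {q: ids.setdefault(sig[q], len(ids)) for q in states}
        # Each round can only split blocks, so an unchanged count means stable.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block

# Hypothetical 4-state FSA: 0 -a-> 1, 0 -b-> 2, 1 -a-> 3, 2 -a-> 3, final state 3.
states = [0, 1, 2, 3]
delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 3, (2, "a"): 3}
classes = minimize(states, "ab", delta, finals={3})
print(classes)
```

States that end up in the same class can be merged into a single state; unreachable states are not removed by this sketch.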

  9. Minimization: Weighted FSA
 • Two states are equivalent only if, for every input string, the outcome (the weight assigned to the string, if accepted) starting from the two states is the same
 • Redistribute weights before identifying equivalent states
 [Figure: a four-state weighted FSA (states 0, 1, 2, 3) and a three-state version in which states 1 and 2 are merged into state 12]

  10. Minimization: Weighted FSA
 • Reweighting OK as long as resulting WFSA is equivalent
 • Can reweight using a "potential function" on states
 • "Weight pushing": reweighting using a potential function that optimally moves weights towards the start state
 [Figure: a WFSA before and after reweighting with a potential function on its states]
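
Reweighting with a potential function can be sketched as follows, assuming the tropical semiring (weights add along a path, and the best path has the minimum weight) and assuming every state can reach a final state. The potential V(q) is taken to be the shortest distance from q to a final state; the function names and the example automaton are hypothetical.

```python
def shortest_to_final(states, arcs, final_weight):
    """V(q): least weight from q to a final state (Bellman-Ford style relaxation).
    Assumes there are no negative-weight cycles."""
    inf = float("inf")
    V = {q: final_weight.get(q, inf) for q in states}
    for _ in range(len(states)):
        for src, _label, w, dst in arcs:
            V[src] = min(V[src], w + V[dst])
    return V

def push_weights(start, states, arcs, final_weight):
    """Reweight each arc as w + V(dst) - V(src); total path weights are unchanged."""
    V = shortest_to_final(states, arcs, final_weight)
    new_arcs = [(s, a, w + V[d] - V[s], d) for s, a, w, d in arcs]
    new_final = {q: w - V[q] for q, w in final_weight.items()}
    # V(start) becomes the initial weight, so the weight sits at the start.
    return V[start], new_arcs, new_final

# Hypothetical WFSA: arcs are (src, label, weight, dst); state 3 is final with weight 0.
states = [0, 1, 2, 3]
arcs = [(0, "a", 1, 1), (0, "b", 3, 2), (1, "c", 4, 3), (2, "c", 2, 3)]
init, pushed, new_final = push_weights(0, states, arcs, {3: 0})
print(init, pushed, new_final)
```

Before pushing, states 1 and 2 look different because their outgoing arcs carry weights 4 and 2. After pushing, both have the same outgoing arc, so treating each label/weight pair as a single label lets unweighted minimization merge them.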

  11. Minimization: Weighted FSA
 • After weight-pushing, can simply apply unweighted FSA minimization (treating label/weight as label)
 • Guaranteed to yield a minimal WFSA (under some technical conditions required for weight-pushing)
 [Figure: the weight-pushed WFSA (states 0, 1, 2, 3) and its minimized form (states 0, 12, 3)]

  12. Toolkits to work with finite-state machines
 • AT&T FSM Library (no longer supported): http://www3.cs.stonybrook.edu/~algorith/implement/fsm/implement.shtml
 • RWTH FSA Toolkit: https://www-i6.informatik.rwth-aachen.de/~kanthak/fsa.html
 • Carmel: https://www.isi.edu/licensed-sw/carmel/
 • MIT FST Toolkit: http://people.csail.mit.edu/ilh/fst/
 • OpenFst Toolkit (actively supported): http://www.openfst.org/twiki/bin/view/FST/WebHome

  13. Brief Introduction to OpenFst

  14. Quick Intro to OpenFst (www.openfst.org)
 Note: the "0" label is reserved for epsilon.
 FST in text form (A.txt):
   0 1 an a
   1 2 <eps> n
   0 2 a a
   2
 Input alphabet (in.txt):
   <eps> 0
   an 1
   a 2
 Output alphabet (out.txt):
   <eps> 0
   a 1
   n 2
 [Figure: a three-state FST (states 0, 1, 2) corresponding to A.txt]

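  15. Quick Intro to OpenFst (www.openfst.org)
 Weighted FST in text form (the last column of each arc line is the arc weight; the final state 2 has weight 0.1):
   0 1 an a 0.5
   1 2 <eps> n 1.0
   0 2 a a 0.5
   2 0.1
 [Figure: the same three-state FST with weighted arcs and final state 2/0.1]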
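
To make the text format concrete, here is a small Python sketch that reads such a file and lists the accepting paths with their total weights. It is not OpenFst code: the parser and its names are hypothetical, weights are assumed to add along a path, the start state is taken to be the source state of the first line, and the final-state line `2 0.1` is reconstructed from the `2/0.1` label on the slide.

```python
def parse_text_fst(text):
    """Arc lines: 'src dst ilabel olabel [weight]'; final lines: 'state [weight]'.
    A missing weight is taken as 0.0."""
    arcs, finals = [], {}
    for line in text.strip().splitlines():
        f = line.split()
        if len(f) >= 4:
            weight = float(f[4]) if len(f) == 5 else 0.0
            arcs.append((int(f[0]), int(f[1]), f[2], f[3], weight))
        elif f:
            finals[int(f[0])] = float(f[1]) if len(f) == 2 else 0.0
    return arcs, finals

def paths(arcs, finals, state, prefix=((), (), 0.0)):
    """Yield (input labels, output labels, total weight) for each accepting path.
    Assumes the FST is acyclic."""
    ins, outs, w = prefix
    if state in finals:
        yield ins, outs, w + finals[state]
    for src, dst, i, o, aw in arcs:
        if src == state:
            yield from paths(arcs, finals, dst, (ins + (i,), outs + (o,), w + aw))

A_TXT = """
0 1 an a 0.5
1 2 <eps> n 1.0
0 2 a a 0.5
2 0.1
"""
arcs, finals = parse_text_fst(A_TXT)
start = arcs[0][0]  # source state of the first line
result = list(paths(arcs, finals, start))
for ins, outs, weight in result:
    print(ins, outs, round(weight, 3))
```

The path through state 1 reads `an <eps>` and writes `a n`; the direct path reads `a` and writes `a`.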

  16. Compiling & Printing FSTs
 The text FSTs need to be "compiled" into binary objects before further use with OpenFst utilities
 • Command used to compile:
   fstcompile --isymbols=in.txt --osymbols=out.txt A.txt A.fst
 • Get back the text FST using a print command with the binary file:
   fstprint --isymbols=in.txt --osymbols=out.txt A.fst A.txt

  17. Drawing FSTs
 Small FSTs can be visualized easily using the draw tool:
   fstdraw --isymbols=in.txt --osymbols=out.txt A.fst | dot -Tpdf > A.pdf
 [Figure: the drawn FST, with states 0, 1, 2 and arcs an:a, <eps>:n and a:a]

  18. FSTs can get very large!

  19. WFSTs applied to ASR

  20. WFST-based ASR System
 Acoustic Indices → [Acoustic Models] → Triphones → [Context Transducer] → Monophones → [Pronunciation Model] → Words → [Language Model] → Word Sequence

  21. WFST-based ASR System: Acoustic Models (H)
 • One 3-state HMM for each triphone (a/a_b, b/a_b, ..., x/y_z)
 • FST Union + Closure gives the resulting FST H
 [Figure: the HMM for triphone a/a_b as an FST whose first arc is labelled f0:a/a_b and whose remaining arcs carry labels f1:ε to f6:ε]

  22. WFST-based ASR System: Context Transducer (C)
 • C⁻¹: arc labels are "monophone : phone / left-context_right-context"
 [Figure: context-dependency transducer over the phones x and y, with states such as (ε,*), (x,y), (y,x), (x,x), (y,y), (x,ε), (y,ε) and arcs such as x:x/ε_y, y:y/x_x, x:x/y_ε]
 Figure reproduced from "Weighted Finite State Transducers in Speech Recognition", Mohri et al., 2002
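
The "phone / left-context_right-context" labelling can be illustrated without building the transducer itself. The helper below is a hypothetical sketch that only produces the context-dependent labels for a phone sequence, using ε for the missing context at either end; it is not the construction of C from the paper.

```python
EPS = "ε"

def to_triphones(phones):
    """Label each phone as 'phone/left_right', with ε at the sequence boundaries."""
    padded = [EPS] + list(phones) + [EPS]
    return [f"{p}/{left}_{right}"
            for left, p, right in zip(padded, padded[1:], padded[2:])]

labels = to_triphones(["x", "y", "x"])
print(labels)
```

For the sequence x y x this gives x/ε_y, y/x_x and x/y_ε, which have the same shape as the output side of the arc labels in the figure.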

  23. WFST-based ASR System: Pronunciation Model (L)
 [Figure: pronunciation lexicon as a WFST over states 0 to 6. For "data": d:data/1, then ey:ε/0.5 or ae:ε/0.5, then t:ε/0.3 or dx:ε/0.7, then ax:ε/1. For "dew": d:dew/1, then uw:ε/1]
 Figure reproduced from "Weighted Finite State Transducers in Speech Recognition", Mohri et al., 2002

  24. WFST-based ASR System: Language Model (G)
 [Figure: word-level weighted acceptor starting at state 0, with arcs labelled the, birds/0.404, animals/1.789, boy/1.789, are/0.693, were/0.693, is, walking]
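
The arc weights in G look like negative natural-log probabilities; this is an assumption, though 0.693 ≈ -ln 0.5 is consistent with it. Under that assumption, adding weights along a path corresponds to multiplying probabilities, as the short sketch below shows. The two-arc path used here is hypothetical.

```python
import math

def neg_log(p):
    """Convert a probability to a negative natural-log weight."""
    return -math.log(p)

def path_probability(weights):
    """Weights add along a path, which multiplies the underlying probabilities."""
    return math.exp(-sum(weights))

half = round(neg_log(0.5), 3)  # compare with the 0.693 on the "are"/"were" arcs
# Hypothetical path using two of the weights shown in the figure.
p = path_probability([0.404, 0.693])
print(half, round(p, 3))
```

Working with negative logs keeps long paths numerically stable and turns "most probable path" into "shortest path".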
