Pre-midsem Revision Lecture 11 CS 753 Instructor: Preethi Jyothi
Tied-state Triphone Models
State Tying: Observation probabilities are shared across triphone states that generate acoustically similar data. (Figure: triphone HMMs for b/a/k, p/a/k and b/a/g, shown first with no sharing and then with acoustically similar states tied across the three models.)
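To see why sharing matters, here is a rough back-of-the-envelope parameter count (the phone inventory size, feature dimension and number of tied states below are illustrative assumptions, not figures from the lecture): fully untied triphone states need on the order of fifteen million Gaussian parameters, while a few thousand tied states bring that down by well over an order of magnitude.

```python
# Rough parameter count with vs. without state tying.
# Assumed (illustrative) numbers: ~40 phones, 3 emitting states per triphone,
# 39-dim features, single diagonal-covariance Gaussians, ~5000 tied states.
n_phones = 40
n_triphones = n_phones ** 3            # 64,000 logical triphones
states_per_hmm = 3
dim = 39
params_per_gaussian = 2 * dim          # mean + diagonal covariance

untied = n_triphones * states_per_hmm * params_per_gaussian
tied = 5000 * params_per_gaussian

print(f"untied: {n_triphones * states_per_hmm:,} states, {untied:,} parameters")
print(f"tied:   5,000 states, {tied:,} parameters")
```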
Tied state HMMs
Four main steps in building a tied-state HMM system:
1. Create and train 3-state monophone HMMs with single-Gaussian observation probability densities.
2. Clone these monophone distributions to initialise a set of untied triphone models. Train them using Baum-Welch estimation. The transition matrix remains common across all triphones of each phone.
3. For all triphones derived from the same monophone, cluster states whose parameters should be tied together.
4. Increase the number of mixture components in each tied state and re-estimate the models using Baum-Welch.
Image from: Young et al., “Tree-based state tying for high accuracy acoustic modeling”, ACL-HLT, 1994
Tied state HMMs: Step 2 Clone these monophone distributions to initialise a set of untied triphone models Image from: Young et al., “Tree-based state tying for high accuracy acoustic modeling”, ACL-HLT, 1994
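Below is a minimal sketch of this cloning step, using assumed data structures rather than any real toolkit API: the single-Gaussian output parameters of the base monophone are deep-copied into each triphone (so they can diverge during subsequent Baum-Welch training), while a single transition-matrix object is shared by all triphones of the same phone.

```python
import copy
import numpy as np

class HMM:
    def __init__(self, means, variances, trans):
        self.means = means          # per-state mean vectors
        self.variances = variances  # per-state diagonal variances
        self.trans = trans          # transition matrix (shared across triphones)

def clone_triphones(monophones, triphone_names):
    """monophones: dict phone -> HMM; triphone_names: strings like 'b-aa+g'."""
    triphones = {}
    for name in triphone_names:
        center = name.split("-")[1].split("+")[0]
        proto = monophones[center]
        # Output distributions are copied so they can be re-estimated per
        # triphone; the transition matrix object stays shared per base phone.
        triphones[name] = HMM(copy.deepcopy(proto.means),
                              copy.deepcopy(proto.variances),
                              proto.trans)
    return triphones

mono = {"aa": HMM([np.zeros(3)] * 3, [np.ones(3)] * 3, np.eye(3))}
tri = clone_triphones(mono, ["b-aa+g", "p-aa+k"])
assert tri["b-aa+g"].trans is tri["p-aa+k"].trans   # shared transitions
```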
Tied state HMMs: Step 3 Use decision trees to determine which states should be tied together Image from: Young et al., “Tree-based state tying for high accuracy acoustic modeling”, ACL-HLT, 1994
Example: Phonetic Decision Tree (DT). One tree is constructed for each state of each monophone to cluster all the corresponding triphone states. (Figure: DT for the center state ow2 of [ow]. The head node uses all training data tagged with *-ow2+*, e.g. aa2/ow2/f2, aa2/ow2/s2, aa2/ow2/d2, h2/ow2/p2, aa2/ow2/n2, aa2/ow2/g2, …)
Training data for DT nodes
• Align each training instance x = (x_1, …, x_T), where x_t ∈ ℝ^d, with a set of triphone HMMs.
• Use the Viterbi algorithm to find the best HMM triphone state sequence corresponding to each x.
• Tag each x_t with the ID of the current phone along with its left context and right context. (Figure: frames aligned against the triphones sil-b+aa, b-aa+g, aa-g+sil; a frame x_t aligned with the second state of the 3-state HMM for the triphone b-aa+g is tagged with the ID b_2-aa_2+g_2.)
• Training data corresponding to state j of phone p: gather all x_t's that are tagged with ID *-p_j+*.
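As a concrete sketch of the last step, the frames can simply be bucketed by centre phone and state index; the per-frame tag format used here (“b-aa+g/2”, meaning state 2 of triphone b-aa+g) is an assumption made for illustration, not the lecture's notation.

```python
from collections import defaultdict

def root_node_data(tagged_frames):
    """tagged_frames: list of (feature_vector, tag) pairs, where a tag like
    "b-aa+g/2" (assumed format) means state 2 of the triphone b-aa+g."""
    buckets = defaultdict(list)          # (center_phone, state) -> frames
    for x_t, tag in tagged_frames:
        triphone, state = tag.split("/")
        left, rest = triphone.split("-")
        center, right = rest.split("+")
        # All frames tagged *-p+* for state j feed the tree for (p, j).
        buckets[(center, int(state))].append((x_t, left, right))
    return buckets

frames = [([0.1, 0.2], "b-aa+g/2"), ([0.3, 0.1], "sil-b+aa/2"),
          ([0.0, 0.4], "p-aa+k/2")]
data = root_node_data(frames)
print(sorted(data))   # keys: [('aa', 2), ('b', 2)]
```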
Example: Phonetic Decision Tree (DT). One tree is constructed for each state of each monophone (ow1, ow2, ow3 for [ow]) to cluster all the corresponding triphone states. (Figure: DT for the center state ow2 of [ow]. The head node uses all training data tagged as *-ow2+*, e.g. aa2/ow2/f2, aa2/ow2/s2, aa2/ow2/d2, h2/ow2/p2, aa2/ow2/n2, aa2/ow2/g2, … The root question is “Is left ctxt a vowel?”; its children ask “Is right ctxt a nasal?” and “Is right ctxt a fricative?”, with a further question “Is right ctxt a glide?” one level down. The resulting leaves A–E group acoustically similar triphone states, e.g. {aa2/ow2/n2, aa2/ow2/m2}, {aa2/ow2/d2, aa2/ow2/g2}, {aa2/ow2/f2, aa2/ow2/s2}, {h2/ow2/l2, b2/ow2/r2}, {h2/ow2/p2, b2/ow2/k2}.)
How do we build these phone DTs?
1. What questions are used? Linguistically-inspired binary questions: “Does the left or right phone come from a broad class of phones such as vowels, stops, etc.?”, “Is the left or right phone [k] or [m]?”
2. What is the training data for each phone state p_j (the root node of the DT)? All speech frames that align with the j-th state of every triphone HMM that has p as the middle phone.
3. What criterion is used at each node to find the best question to split the data on? Find the question which partitions the states in the parent node so as to give the maximum increase in log likelihood.
Likelihood of a cluster of states
• If a cluster of HMM states S = {s_1, s_2, …, s_M} consists of M states and a total of K acoustic observation vectors {x_1, x_2, …, x_K} are associated with S, then the log likelihood associated with S is:
L(S) = \sum_{i=1}^{K} \sum_{s \in S} \log \Pr(x_i; \mu_S, \Sigma_S) \, \gamma_s(x_i)
• For a question q that splits S into S_yes and S_no, compute the following quantity:
\Delta_q = L(S^q_{yes}) + L(S^q_{no}) - L(S)
• Go through all questions, find Δ_q for each question q, and choose the question for which Δ_q is the largest.
• Terminate when the final Δ_q is below a threshold, or the data associated with a split falls below a threshold.
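A small sketch of this greedy criterion, under simplifying assumptions: diagonal-covariance Gaussians are refit to each candidate split, hard Viterbi counts are used (so γ_s(x_i) is 1 for every frame assigned to the cluster), and the question set and data are toy values invented for illustration.

```python
import numpy as np

def log_likelihood(frames):
    """L(S): log-likelihood of the frames under one diagonal Gaussian fit to them."""
    X = np.asarray(frames)
    mu, var = X.mean(axis=0), X.var(axis=0) + 1e-6
    ll = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)
    return ll.sum()

def best_question(frames, contexts, questions):
    """frames[i] is a feature vector tagged with contexts[i] = (left_phone, right_phone)."""
    base = log_likelihood(frames)
    best = None
    for name, test in questions.items():
        yes = [f for f, c in zip(frames, contexts) if test(c)]
        no  = [f for f, c in zip(frames, contexts) if not test(c)]
        if not yes or not no:
            continue
        delta = log_likelihood(yes) + log_likelihood(no) - base
        if best is None or delta > best[1]:
            best = (name, delta)
    return best

rng = np.random.default_rng(0)
frames = list(rng.normal(0, 1, (10, 3))) + list(rng.normal(3, 1, (10, 3)))
contexts = [("aa", "n")] * 10 + [("b", "k")] * 10
questions = {"left ctxt is a vowel":  lambda c: c[0] in {"aa", "ae", "ih", "ow"},
             "right ctxt is a nasal": lambda c: c[1] in {"n", "m", "ng"}}
print(best_question(frames, contexts, questions))
```

In the full procedure the same Δ_q computation is applied recursively at every new node, and splitting stops when the best Δ_q or the amount of data reaching a child falls below its threshold.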
WFSTs for ASR
WFST-based ASR System
(Figure: pipeline of models and transducers: Acoustic Models (H) → Context Transducer (C) → Pronunciation Model (L) → Language Model (G), mapping acoustic indices → triphones → monophones → words → word sequence.)
WFST-based ASR System: H (Acoustic Models)
(Figure: one 3-state HMM transducer per triphone a-a+b, a-b+b, …, y-x+z, with arcs labelled f_i : ε and the first arc carrying the triphone symbol, e.g. f_0 : a-a+b. Taking the union of these FSTs followed by closure gives the resulting H.)
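A sketch of what H contains, written as plain Python arc lists rather than a real FST toolkit (the state numbering and label names are illustrative): one small left-to-right HMM transducer per triphone whose entry arc outputs the triphone symbol and whose remaining arcs output ε; H is the closure of the union of these machines.

```python
# Arcs are tuples (src, dst, input_label, output_label); labels f0, f1, ...
# index distinct triphone HMM states, as in the figure above.

def hmm_fst(triphone, state_ids):
    """Left-to-right 3-state HMM for one triphone: the entry arc outputs the
    triphone symbol, every other arc outputs epsilon."""
    f0, f1, f2 = state_ids
    eps = "<eps>"
    arcs = [(0, 1, f0, triphone),   # first frame of state 1 emits the triphone label
            (1, 1, f0, eps),        # state-1 self-loop
            (1, 2, f1, eps),
            (2, 2, f1, eps),
            (2, 3, f2, eps),
            (3, 3, f2, eps)]
    return {"start": 0, "finals": {3}, "arcs": arcs}

# H = closure(union of all per-triphone FSTs): it maps sequences of HMM-state
# labels f_i to sequences of triphone symbols.
H_parts = {t: hmm_fst(t, (f"f{3*i}", f"f{3*i+1}", f"f{3*i+2}"))
           for i, t in enumerate(["a-a+b", "a-b+b"])}
```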
WFST-based ASR System: C (Context Transducer)
(Figure: fragment of C mapping triphones to monophones. States remember the surrounding phone pair (labels such as ab, bc, ca, cx), with arcs like a-b+c : a from state ab to state bc, b-c+x : b from bc to cx, and b-c+a : b from bc to ca, plus ε-labelled arcs from the initial states.)
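Conceptually, C realises (the inverse of) the expansion sketched below with states that remember the last two phones; this toy function, assuming simple sil padding at the utterance edges, shows the mapping C encodes between monophone strings and triphone strings.

```python
def to_triphones(phones):
    """Expand a monophone string into its triphone string (sil-padded)."""
    padded = ["sil"] + list(phones) + ["sil"]
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]

print(to_triphones(["b", "aa", "g"]))
# ['sil-b+aa', 'b-aa+g', 'aa-g+sil']  (matches the alignment example earlier)
```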
WFST-based ASR System: L (Pronunciation Model)
(Figure: pronunciation transducer for the words “data” and “dew”: d:data/1, then ey:ε/0.5 or ae:ε/0.5, then t:ε/0.3 or dx:ε/0.7, then ax:ε/1; and d:dew/1 followed by uw:ε/1.)
Figure reproduced from “Weighted Finite State Transducers in Speech Recognition”, Mohri et al., 2002
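For reference, the arcs of this lexicon fragment can be written out directly; the plain tuple representation below is assumed for illustration, and whether the weights are probabilities or -log costs depends on the semiring, which the figure does not restate.

```python
# Lexicon arcs as tuples (src, dst, in_phone, out_word, weight),
# copied from the figure above.
L_arcs = [
    (0, 1, "d",  "data",  1.0),
    (1, 2, "ey", "<eps>", 0.5), (1, 2, "ae", "<eps>", 0.5),
    (2, 3, "t",  "<eps>", 0.3), (2, 3, "dx", "<eps>", 0.7),
    (3, 4, "ax", "<eps>", 1.0),        # "data" = d {ey|ae} {t|dx} ax
    (0, 5, "d",  "dew",   1.0),
    (5, 6, "uw", "<eps>", 1.0),        # "dew" = d uw
]
final_states = {4, 6}
```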
WFST-based ASR System: G (Language Model)
(Figure: word-level grammar accepting sentences such as “the birds/animals are/were walking” and “the boy is walking”, with weights like birds/0.404, animals/1.789, boy/1.789, are/0.693, were/0.693.)
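Reading the figure as a weighted acceptor over words (the weights shown are -log probabilities; the state numbering below is a guess), the cost of a sentence is the sum of arc weights along its accepting path, as in this small sketch.

```python
import math

# Grammar arcs as tuples (src, dst, word, weight), weights in -log space.
G_arcs = [
    (0, 1, "the",     0.0),
    (1, 2, "birds",   0.404),   # ~ -ln(2/3)
    (1, 2, "animals", 1.789),   # ~ -ln(1/6)
    (1, 3, "boy",     1.789),
    (2, 4, "are",     0.693),   # ~ -ln(1/2)
    (2, 4, "were",    0.693),
    (3, 4, "is",      0.0),
    (4, 5, "walking", 0.0),
]

def sentence_cost(words, arcs, start=0, final=5):
    """Total cost (-log probability) of a word sequence accepted by G."""
    state, cost = start, 0.0
    for w in words:
        matches = [(dst, wt) for (src, dst, lab, wt) in arcs
                   if src == state and lab == w]
        if not matches:
            return math.inf
        state, cost = matches[0][0], cost + matches[0][1]
    return cost if state == final else math.inf

print(sentence_cost(["the", "birds", "are", "walking"], G_arcs))  # ~1.097
```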
Decoding
Carefully construct a decoding graph D from H, C, L and G using optimization algorithms:
D = min(det(H ∘ det(C ∘ det(L ∘ G))))
Given a test utterance O, how do I decode it? Assuming ample compute, first construct the following machine X from O: for every frame O_t there is one arc per label f_i, where each f_i maps to a distinct triphone HMM state, and the weight of the f_i arc is -log(b_j(O_t)) if f_i maps to HMM state j. (Figure: a table with one column per frame and one row per label f_0, f_1, …, f_500, …, f_1000, e.g. f_0 : 19.12, f_1 : 12.33, …)
“Weighted Finite State Transducers in Speech Recognition”, Mohri et al., Computer Speech & Language, 2002
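A minimal sketch of building X, assuming a T×N matrix of per-frame log output probabilities for the N distinct tied triphone HMM states; decoding then reduces to finding the lowest-cost path through the composition X ∘ D.

```python
import numpy as np

def build_X(frame_log_likes):
    """frame_log_likes: (T, N) array of log b_i(O_t) for N tied HMM states.
    Returns a linear transducer with one "time" state per frame and, between
    consecutive time states, one arc per label f_i weighted by -log b_i(O_t)."""
    T, N = frame_log_likes.shape
    arcs = []
    for t in range(T):
        for i in range(N):
            # input = output = f_i; cost = -log b_i(O_t)
            arcs.append((t, t + 1, f"f{i}", f"f{i}", -frame_log_likes[t, i]))
    return {"start": 0, "finals": {T}, "arcs": arcs}

log_likes = np.log(np.full((4, 3), 1.0 / 3))   # toy: 4 frames, 3 tied states
X = build_X(log_likes)
# In practice X is never fully materialised: the decoder composes X with D
# on the fly and runs Viterbi beam search over the composed machine.
```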