Lecture 16: Weighted Finite State Transducers (WFST)
Mark Hasegawa-Johnson
ECE 417: Multimedia Signal Processing, Fall 2020
All content CC-SA 4.0 unless otherwise specified.
Outline
1. Review: WFSA
2. Semirings
3. How to Handle HMMs: The Weighted Finite State Transducer
4. Composition
5. Doing Useful Stuff: The Epsilon Transition
6. Summary
Weighted Finite State Acceptors

[Figure: a WFSA over states 0-6 with weighted word labels on its edges, e.g. The/0.3, A/0.2, This/0.2, dog/1, dog/0.3, cat/0.7, is/1, very/0.2, cute/0.4, hungry/0.4.]

An FSA specifies a set of strings. A string is in the set if it corresponds to a valid path from start to end, and not otherwise. A WFSA also specifies a probability mass function over the set.
Every Markov Model is a WFSA

[Figure: a three-state Markov model drawn as a WFSA; the edge from state i to state j is labeled i/a_ij.]

A Markov Model (but not an HMM!) may be interpreted as a WFSA: just assign a label to each edge. The label might just be the state number, or it might be something more useful.
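As a small illustration, here is a hypothetical sketch (the function name markov_to_wfsa and the edge tuple format are assumptions, not from the lecture) that converts a transition matrix A = [a_ij] into a list of WFSA edges, labeling each edge with its source state number as in the figure:

```python
def markov_to_wfsa(A):
    """Convert a Markov model transition matrix into WFSA edges.

    A[i][j] = a_ij, the probability of moving from state i+1 to state j+1.
    Each edge is (predecessor, next, label, weight); the label is just the
    predecessor state number, as in the lecture's figure.
    """
    edges = []
    for i, row in enumerate(A, start=1):
        for j, a_ij in enumerate(row, start=1):
            if a_ij > 0:
                edges.append((i, j, i, a_ij))
    return edges

# Example: a 3-state Markov model
A = [[0.8, 0.1, 0.1],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
print(markov_to_wfsa(A))
```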
Best-Path Algorithm for a WFSA

Given: An input string, S = [s_1, ..., s_T]. For example, the string "A dog is very very hungry" has T = 6 words. Edges, e, each have a predecessor state p[e] ∈ Q, a next state n[e] ∈ Q, a weight w[e] ∈ R, and a label ℓ[e] ∈ Σ.

Initialize:
    δ_0(i) = 1̄ if i is the initial state, 0̄ otherwise

Iterate:
    δ_t(j) = best over {e : n[e] = j, ℓ[e] = s_t} of δ_{t−1}(p[e]) ⊗ w[e]
    ψ_t(j) = argbest over {e : n[e] = j, ℓ[e] = s_t} of δ_{t−1}(p[e]) ⊗ w[e]

Backtrace:
    e*_t = ψ_t(q*_t),   q*_t = p[e*_{t+1}]
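For concreteness, here is a minimal Python sketch of this recursion in the tropical semiring (weights are negative log probabilities, ⊗ is +, and "best" is min). The Edge namedtuple, the function name best_path, and the representation of the input string as a list of labels are illustrative assumptions, not from the lecture.

```python
from collections import namedtuple

# Edge fields: p = predecessor state, n = next state, label, w = weight.
# Weights are negative log probabilities (tropical semiring: otimes is +, "best" is min).
Edge = namedtuple("Edge", ["p", "n", "label", "w"])

def best_path(edges, initial_state, final_states, s):
    """Return (best_cost, best_state_sequence) for the input string s.

    Assumes at least one complete path through the WFSA accepts s.
    """
    INF = float("inf")
    states = {e.p for e in edges} | {e.n for e in edges} | {initial_state}
    # Initialize: delta_0(i) = 1-bar (cost 0) at the initial state, 0-bar (inf) elsewhere.
    delta = {q: (0.0 if q == initial_state else INF) for q in states}
    psi = []  # psi[t][j] = best edge entering state j while reading s[t]
    for symbol in s:
        new_delta = {q: INF for q in states}
        back = {}
        for e in edges:
            if e.label != symbol:
                continue
            cost = delta[e.p] + e.w      # delta_{t-1}(p[e]) otimes w[e]
            if cost < new_delta[e.n]:    # "best" = min in the tropical semiring
                new_delta[e.n] = cost
                back[e.n] = e
        delta = new_delta
        psi.append(back)
    # Terminate at the best-scoring final state, then backtrace.
    q = min(final_states, key=lambda f: delta[f])
    best_cost = delta[q]
    path = [q]
    for back in reversed(psi):
        q = back[q].p                    # q*_t = p[e*_{t+1}]
        path.append(q)
    return best_cost, list(reversed(path))
```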
Determinization

A WFSA is said to be deterministic if, for any given pair (predecessor state p[e], label ℓ[e]), there is at most one such edge. For example, this WFSA is not deterministic:

[Figure: the WFSA from the earlier slide; state 0 has two outgoing edges that share the label A (A/0.2 and A/0.3), so it is not deterministic.]
How to Determinize a WFSA

The only general algorithm for determinizing a WFSA is the following exponential-time algorithm. For every state in A, for every set of edges e_1, ..., e_K that all have the same label:
- Create a new edge, e, with weight w[e] = w[e_1] ⊕ ··· ⊕ w[e_K].
- Create a brand new successor state n[e].
- For every edge leaving any of the original successor states n[e_k], 1 ≤ k ≤ K, whose label is unique: copy it to n[e], and ⊗ its weight by w[e_k]/w[e].
- For every set of edges leaving the states n[e_k] that all have the same label: recurse!
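For concreteness, here is one common way to implement the same idea: Mohri-style weighted subset construction in the tropical semiring. This is a sketch under stated assumptions, not a verbatim transcription of the recursion above; the function name determinize_tropical and the edge tuple format are illustrative, each subset state carries a residual weight per original state, and the input is assumed determinizable.

```python
from collections import defaultdict

def determinize_tropical(edges, initial_state):
    """Weighted subset construction in the tropical semiring (oplus = min, otimes = +).

    edges: list of (p, label, w, n) tuples with negative-log weights.
    Each state of the result is a frozenset of (original_state, residual_weight) pairs.
    Assumes the input WFSA is determinizable (e.g. acyclic); may not terminate otherwise.
    """
    out_edges = defaultdict(list)
    for p, label, w, n in edges:
        out_edges[p].append((label, w, n))

    start = frozenset({(initial_state, 0.0)})
    result_edges = []
    agenda, seen = [start], {start}
    while agenda:
        current = agenda.pop()
        # Group the outgoing edges of all member states by label.
        by_label = defaultdict(list)
        for q, residual in current:
            for label, w, n in out_edges[q]:
                by_label[label].append((residual + w, n))
        for label, arcs in by_label.items():
            # oplus (= min) over all same-label edges gives the new edge's weight.
            new_w = min(w for w, _ in arcs)
            # Each destination keeps whatever weight is left over as a residual.
            residuals = {}
            for w, n in arcs:
                residuals[n] = min(w - new_w, residuals.get(n, float("inf")))
            dest = frozenset(residuals.items())
            result_edges.append((current, label, new_w, dest))
            if dest not in seen:
                seen.add(dest)
                agenda.append(dest)
    return start, result_edges
```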
Semirings

A semiring is a set of numbers over which it's possible to define operators ⊗ and ⊕, and identity elements 1̄ and 0̄.
- The Probability Semiring is the set of non-negative real numbers R+, with ⊗ = ·, ⊕ = +, 1̄ = 1, and 0̄ = 0.
- The Log Semiring is the extended reals R ∪ {∞}, with ⊗ = +, a ⊕ b = −log(e^{−a} + e^{−b}), 1̄ = 0, and 0̄ = ∞.
- The Tropical Semiring is just the log semiring, but with ⊕ = min. In other words, instead of adding the probabilities of two paths, we choose the best path: a ⊕ b = min(a, b).

Mohri et al. (2001) formalize it like this: a semiring is K = (K, ⊕, ⊗, 0̄, 1̄), where K is a set of numbers.
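Here is a minimal code sketch of these three semirings, assuming a small container class (the name Semiring and its fields are illustrative, not from any particular WFST library):

```python
import math
from dataclasses import dataclass
from typing import Callable

# A semiring is (K, oplus, otimes, 0-bar, 1-bar).
@dataclass
class Semiring:
    oplus: Callable[[float, float], float]
    otimes: Callable[[float, float], float]
    zero: float   # identity for oplus ("0-bar")
    one: float    # identity for otimes ("1-bar")

# Probability semiring: weights are probabilities.
probability = Semiring(oplus=lambda a, b: a + b,
                       otimes=lambda a, b: a * b,
                       zero=0.0, one=1.0)

# Log semiring: weights are negative log probabilities.
def _logadd(a, b):
    # a oplus b = -log(exp(-a) + exp(-b)), guarded against infinities
    if math.isinf(a): return b
    if math.isinf(b): return a
    return min(a, b) - math.log1p(math.exp(-abs(a - b)))

log = Semiring(oplus=_logadd, otimes=lambda a, b: a + b,
               zero=math.inf, one=0.0)

# Tropical semiring: like the log semiring, but oplus keeps only the best path.
tropical = Semiring(oplus=min, otimes=lambda a, b: a + b,
                    zero=math.inf, one=0.0)

# Example: two paths with probabilities 0.5 and 0.25
assert probability.oplus(0.5, 0.25) == 0.75
assert abs(log.oplus(-math.log(0.5), -math.log(0.25)) - (-math.log(0.75))) < 1e-9
assert tropical.oplus(-math.log(0.5), -math.log(0.25)) == -math.log(0.5)
```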
Weighted Finite State Transducers

[Figure: a WFST that maps English word strings to French word strings, with edges such as The:Le/0.3, A:Un/0.2, This:Ce/0.2, dog:chien/1, cat:chat/0.7, is:est/0.5, is:a/0.5, very:très/0.2, cute:mignon/0.8, hungry:faim/0.8.]

A (Weighted) Finite State Transducer (WFST) is a (W)FSA with two labels on every edge:
- An input label, i ∈ Σ, and
- An output label, o ∈ Ω.
What it's for

An FST specifies a mapping between two sets of strings. The input set is I ⊂ Σ*, where Σ* is the set of all strings containing zero or more letters from the alphabet Σ. The output set is O ⊂ Ω*. For every input string i = [i_1, ..., i_T] ∈ I, the FST specifies one or more possible translations o = [o_1, ..., o_T] ∈ O.

A WFST also specifies a probability mass function over the translations. The example on the previous slide was normalized to compute a joint pmf p(i, o), but other WFSTs might be normalized to compute a conditional pmf p(o | i), or something else.
Normalizing for Conditional Probability

Here is a WFST whose weights are normalized to compute p(o | i):

[Figure: the same English-to-French WFST, but with weights normalized per input word, e.g. The:Le/1, A:Un/1, dog:chien/1, is:est/0.5, is:a/0.5, cat:chat/0.9, cat:félin/0.1, hungry:faim/1.]
Normalizing for Conditional Probability

Normalizing for conditional probability allows us to separately represent the two parts of a hidden Markov model:
1. The transition probabilities, a_ij, are the weights on a WFSA.
2. The observation probabilities, b_j(x_t), are the weights on a WFST.
WFSA: Symbols on the edges are called PDFIDs

It is no longer useful to say that "the labels on the edges are the state numbers." Instead, let's call them pdfids.

[Figure: the three-state Markov model WFSA again; the edge from state i to state j is labeled i/a_ij, and the label i is now read as a pdfid.]
Observation Probabilities as Conditional Edge Weights

Now we can create a new WFST whose output symbols are pdfids j, whose input symbols are observations x_t, and whose weights are the observation probabilities, b_j(x_t).

[Figure: a linear chain of states 0-4; at each time t there are parallel edges labeled x_t : j with weight b_j(x_t), for j = 1, 2, 3.]
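To make the construction concrete, here is a minimal sketch that generates the edges of such a transducer from a T x N table of observation likelihoods. The function name make_observation_fst, the matrix B, and the string placeholder "x_t" standing in for the observation vector are illustrative assumptions, not from the lecture.

```python
def make_observation_fst(B):
    """Build the edges of the observation WFST.

    B[t][j] is the likelihood b_j(x_t) of observation x_t under pdfid j+1.
    Returns edges as (prev_state, next_state, input_label, output_label, weight);
    the input label "x_t" stands in for the observation vector at time t.
    """
    edges = []
    for t, row in enumerate(B):
        for j, likelihood in enumerate(row, start=1):
            edges.append((t, t + 1, f"x_{t+1}", j, likelihood))
    return edges

# Example: T = 2 frames, 3 pdfids
B = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3]]
for edge in make_observation_fst(B):
    print(edge)
```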
Hooray! We've almost re-created the HMM!

So far we have:
- You can create a WFSA whose weights are the transition probabilities.
- You can create a WFST whose weights are the observation probabilities.

Here are the problems:
1. How can we combine them?
2. Even if we could combine them, can this do anything that an HMM couldn't already do?
Composition

The main reason to use WFSTs is an operator called "composition." Suppose you have:
1. A WFST, R, that translates strings a ∈ A into strings b ∈ B with joint probability p(a, b).
2. Another WFST, S, that translates strings b ∈ B into strings c ∈ C with conditional probability p(c | b).

The operation T = R ∘ S gives you a WFST, T, that translates strings a ∈ A into strings c ∈ C with joint probability

    p(a, c) = Σ_{b ∈ B} p(a, b) p(c | b)
The WFST Composition Algorithm

1. Initialize: The initial state of T is a pair, i_T = (i_R, i_S), encoding the initial states of both R and S.
2. Iterate: While there is any state q_T = (q_R, q_S) with an edge pair (e_R = a:b, e_S = b:c) that has not yet been copied into T:
   - Create a new edge e_T with next state n[e_T] = (n[e_R], n[e_S]) and labels i[e_T] : o[e_T] = i[e_R] : o[e_S] = a : c.
   - If an edge with the same n[e_T], i[e_T], and o[e_T] already exists, then update its weight: w[e_T] = w[e_T] ⊕ (w[e_R] ⊗ w[e_S]).
   - If not, create a new edge with w[e_T] = w[e_R] ⊗ w[e_S].
3. Terminate: A state q_T = (q_R, q_S) is a final state if both q_R and q_S are final states.
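Here is a minimal Python sketch of this algorithm for epsilon-free WFSTs (epsilon transitions are the topic of the next section). The dictionary-based WFST representation, the function name compose, and the semiring argument (e.g. the tropical or log semiring objects sketched earlier) are illustrative assumptions, not the lecture's notation.

```python
from collections import deque

def compose(R, S, semiring):
    """Compose two epsilon-free WFSTs.

    Each WFST is a dict with keys:
      'initial': initial state, 'finals': set of final states,
      'edges': dict mapping state -> list of (in_label, out_label, weight, next_state).
    `semiring` provides oplus and otimes (e.g., the tropical or log semiring).
    """
    initial = (R['initial'], S['initial'])
    edges = {}                    # (qR, qS) -> {(in, out, next): weight}
    agenda = deque([initial])
    visited = {initial}
    while agenda:
        qR, qS = agenda.popleft()
        arcs = edges.setdefault((qR, qS), {})
        for a, b, wR, nR in R['edges'].get(qR, []):
            for b2, c, wS, nS in S['edges'].get(qS, []):
                if b != b2:
                    continue      # output of R must match input of S
                key = (a, c, (nR, nS))
                w = semiring.otimes(wR, wS)
                # oplus-accumulate if an identical edge already exists
                arcs[key] = semiring.oplus(arcs[key], w) if key in arcs else w
                if (nR, nS) not in visited:
                    visited.add((nR, nS))
                    agenda.append((nR, nS))
    finals = {(qR, qS) for (qR, qS) in visited
              if qR in R['finals'] and qS in S['finals']}
    return {'initial': initial, 'finals': finals,
            'edges': {q: [(a, c, w, n) for (a, c, n), w in arcs.items()]
                      for q, arcs in edges.items()}}
```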