Phonology and speech applications with weighted automata
Natural Language Processing, LING/CSCI 5832
Mans Hulden, Dept. of Linguistics
mans.hulden@colorado.edu
Feb 19, 2014
Overview
(1) Recap unweighted finite automata and transducers
(2) Extend to probabilistic weighted automata/transducers
(3) See how these can be used in natural language applications, plus a brief look at speech applications
RE: anatomy of an FSA

Regular expression: L = a b* c

Formal definition:
Q = {0, 1, 2} (set of states)
Σ = {a, b, c} (alphabet)
q0 = 0 (initial state)
F = {2} (set of final states)
δ(0, a) = 1, δ(1, b) = 1, δ(1, c) = 2 (transition function)

[Figure: graph representation: state 0 goes to 1 on a, state 1 loops on b, state 1 goes to final state 2 on c]

An FSA defines a set of strings.
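As a concrete illustration (not on the original slide), the δ given above can be encoded directly as a dictionary and used for membership testing; a minimal Python sketch:

# The FSA for L = a b* c, encoded from the formal definition above.
delta = {(0, 'a'): 1, (1, 'b'): 1, (1, 'c'): 2}
finals = {2}

def accepts(s, q=0):
    """Follow delta from the initial state; accept if we end in a final state."""
    for ch in s:
        if (q, ch) not in delta:
            return False  # no transition: the string is rejected
        q = delta[(q, ch)]
    return q in finals

assert accepts("abbc") and accepts("ac") and not accepts("ab")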
RE: anatomy of an FST

Formal definition:
Q = {0, 1, 2, 3} (set of states)
Σ = {a, b, c, d} (alphabet)
q0 = 0 (initial state)
F = {0, 1, 2} (set of final states)
δ (transition function)

[Figure: graph representation, with transitions labeled by input:output pairs such as <a:b>]

An FST defines a string-to-string mapping.
RE: composition

[Figure: a cascade of composed transducers mapping, step by step,
NEG+possible+ity+NOUN+PLURAL → in+possible+ity+s → im+possible+ity+s → im+possibility+s → impossibilities]
Orthographic vs. phonetic representation

[Figure: the same cascade, now ending in a grapheme-to-phoneme (G2P) transducer:
NEG+possible+ity+NOUN+PLURAL → ... → impossibilities → [ɪmpɑsəbɪlətis]]
Noisy channel models

source word → NOISY CHANNEL → noisy word → DECODER → guess at original word

A general framework for thinking about spell checking, speech recognition, and other problems that involve decoding in probabilistic models. This is a problem similar to morphology 'decoding'.
Example: spell checking

source word → NOISY CHANNEL → noisy word → DECODER → guess at original word

Problem formulation: given an observation O, find the most probable vocabulary word w:

ŵ = argmax_{w ∈ V} P(w | O)
Noisy channel models

ŵ = argmax_{w ∈ V} P(w | O)

We can decompose P(w | O) into other probabilities with Bayes' Rule:

P(x | y) = P(y | x) P(x) / P(y)
Noisy channel models

We can see this by substituting Bayes' Rule into the argmax:

ŵ = argmax_{w ∈ V} P(O | w) P(w) / P(O)

The probabilities on the right are easier to estimate, and P(O) is the same for every candidate w, so it can be dropped.
Noisy channel models

ŵ = argmax_{w ∈ V} P(O | w) P(w) / P(O) = argmax_{w ∈ V} P(O | w) P(w)

To summarize, the most probable word w given some observation O is

ŵ = argmax_{w ∈ V} P(O | w) P(w)

where P(O | w) is the likelihood (the error model) and P(w) is the prior (the language model). In the following sections we will show how to compute these.
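To make the argmax concrete, here is a minimal sketch of noisy-channel decoding in Python; the toy vocabulary and probabilities are invented for illustration and are not from the slides:

import math

# Toy language model P(w) and channel model P(O | w); invented numbers.
prior = {"impossibilities": 0.002, "impossibility": 0.005, "possibilities": 0.004}
likelihood = {  # P(observed spelling | intended word), e.g. from an edit-distance error model
    ("impssblity", "impossibilities"): 1e-5,
    ("impssblity", "impossibility"): 1e-6,
    ("impssblity", "possibilities"): 1e-8,
}

def decode(observation, vocabulary):
    """Return argmax_w P(O | w) P(w), working in log space for stability."""
    def score(w):
        return math.log(likelihood.get((observation, w), 1e-30)) + math.log(prior[w])
    return max(vocabulary, key=score)

print(decode("impssblity", prior.keys()))  # -> "impossibilities"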
Decoding

[Figure: the morphological cascade from before, framed as a noisy channel. The analysis NEG+possible+ity+NOUN+PLURAL corresponds to the word impossibilities, which passes through the NOISY CHANNEL and surfaces as the noisy word impssblity]
Decoding

NEG+possible+ity+NOUN+PLURAL / impossibility

[Diagram: Morphology/phonology applies non-probabilistic changes to the word; the NOISY CHANNEL applies probabilistic changes (errors), yielding the noisy word. Decoding runs the pipeline in reverse: impssblity → impossibilities]
Decoding/speech processing

[Diagram: the same pipeline. Morphology/phonology (non-probabilistic changes) → word → NOISY CHANNEL (probabilistic changes) → noisy word. Recovering NEG+possible+ity+NOUN+PLURAL / impossibilities from the noisy signal is the decoding problem]
Probabilistic automata

Intuition:
- define probability distributions over strings
- symbols have transition probabilities
- states have final/halting probabilities
- probabilities are multiplied along paths
- probabilities are summed over parallel paths
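A small sketch of this intuition in Python; the example automaton (arcs and halting probabilities) is invented for illustration:

# arcs[state][symbol] -> list of (next_state, probability); invented example.
arcs = {
    0: {"a": [(1, 0.5), (2, 0.5)]},
    1: {"b": [(1, 0.3)]},
    2: {"b": [(2, 0.6)]},
}
halt = {0: 0.0, 1: 0.7, 2: 0.4}  # final/halting probability of each state

def string_probability(s, state=0, p=1.0):
    """Multiply probabilities along each path; sum over parallel paths."""
    if not s:
        return p * halt[state]
    total = 0.0
    for nxt, q in arcs.get(state, {}).get(s[0], []):
        total += string_probability(s[1:], nxt, p * q)
    return total

print(string_probability("ab"))  # 0.5*0.3*0.7 + 0.5*0.6*0.4 = 0.225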
Probabilistic automata

Intuition: [Figure: an example probabilistic automaton]
Aside: HMMs and prob. automata

HMMs and probabilistic automata are equivalent (though automata may be more compact).

[Figure: a two-state HMM with per-state emission distributions (e.g. a 0.3 / b 0.7, a 0.8 / b 0.2) and transition probabilities, converted into an equivalent probabilistic automaton. Each arc probability is a product of an emission and a transition probability, e.g. a/0.27 = 0.3 × 0.9]
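One direction of the equivalence can be sketched as a direct construction. Judging from the figure's numbers (e.g. 0.3 × 0.9 = 0.27), each automaton arc's probability is the source state's emission probability times the transition probability; the HMM parameters below are illustrative:

# Convert an HMM (per-state emissions + transitions) into a probabilistic
# automaton: one arc per (state, symbol, next state) triple.
emit = {1: {"a": 0.3, "b": 0.7}, 2: {"a": 0.8, "b": 0.2}}
trans = {1: {1: 0.1, 2: 0.9}, 2: {1: 0.3, 2: 0.7}}

arcs = [
    (s, x, t, emit[s][x] * trans[s][t])
    for s in emit for x in emit[s] for t in trans[s]
]
for arc in arcs:
    print(arc)  # e.g. (1, 'a', 2, ≈0.27): emit a with prob 0.3, move with prob 0.9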
Probabilistic automata: from probabilistic to weighted

As always, we would prefer to use (negative) log probabilities, since this makes calculations easier:

-log(0.16) ≈ 1.8326
-log(0.84) ≈ 0.1744
-log(1) = 0
-log(0) = ∞

Since more probable events are now numerically smaller, we call these values weights.
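A quick sketch of the probability-to-weight mapping, using the slide's numbers: multiplication of probabilities becomes addition of weights, and addition of probabilities becomes a log-sum-exp over weights:

import math

def weight(p):
    """Negative natural log; smaller weight = more probable."""
    return -math.log(p) if p > 0 else math.inf

print(weight(0.16))  # ≈ 1.8326
print(weight(0.84))  # ≈ 0.1744

# Multiplying probabilities = adding weights:
assert math.isclose(weight(0.16 * 0.84), weight(0.16) + weight(0.84))

# Adding probabilities = log-sum-exp over weights (the log semiring's ⊕):
def log_add(x, y):
    return -math.log(math.exp(-x) + math.exp(-y))

assert math.isclose(weight(0.16 + 0.84), log_add(weight(0.16), weight(0.84)))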
Semirings

A semiring (𝕂, ⊕, ⊗, 0̄, 1̄) = a ring that may lack negation.
• Sum ⊕: to compute the weight of a sequence (sum of the weights of the paths labeled with that sequence).
• Product ⊗: to compute the weight of a path (product of the weights of constituent transitions).

Semiring      Set                ⊕       ⊗    0̄     1̄
Boolean       {0, 1}             ∨       ∧    0     1
Probability   ℝ₊                 +       ×    0     1
Log           ℝ ∪ {−∞, +∞}       ⊕_log   +    +∞    0
Tropical      ℝ ∪ {−∞, +∞}       min     +    +∞    0
String        Σ* ∪ {∞}           ∧       ·    ∞     ε

⊕_log is defined by x ⊕_log y = −log(e^(−x) + e^(−y)), and ∧ is longest common prefix. The string semiring is a left semiring.

0̄ and 1̄ are the identity elements of ⊕ and ⊗ respectively: s ⊕ 0̄ = s and s ⊗ 1̄ = s. An additional constraint is that 0̄ annihilates: s ⊗ 0̄ = 0̄.
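As an illustration (this representation is not from the slides), each semiring in the table can be packaged as a (⊕, ⊗, 0̄, 1̄) tuple:

import math
from typing import Callable, NamedTuple

class Semiring(NamedTuple):
    plus: Callable   # ⊕: combine parallel paths
    times: Callable  # ⊗: combine transitions along a path
    zero: object     # 0̄: identity for ⊕
    one: object      # 1̄: identity for ⊗

boolean = Semiring(lambda x, y: x or y, lambda x, y: x and y, False, True)
probability = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0)
tropical = Semiring(min, lambda x, y: x + y, math.inf, 0.0)
log_semiring = Semiring(lambda x, y: -math.log(math.exp(-x) + math.exp(-y)),
                        lambda x, y: x + y, math.inf, 0.0)

print(tropical.plus(3, 5), tropical.times(3, 5))  # 3 8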
Semirings: example

[Figure: a weighted automaton A with two paths labeled ab, with arc weights 1, 1 and 2, 3, each path ending in a final state with weight 2]

Probability semiring (ℝ₊, +, ×, 0, 1): [[A]](ab) = 1 × 1 × 2 + 2 × 3 × 2 = 14
Tropical semiring (ℝ₊ ∪ {∞}, min, +, ∞, 0): [[A]](ab) = min(1 + 1 + 2, 3 + 2 + 2) = 4
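Using the two path weight-sequences from the figure, the ⊕-over-paths / ⊗-along-paths computation reproduces both results; a minimal self-contained sketch:

from functools import reduce

# The two accepting paths for "ab", as lists of weights (arcs + final weight).
paths = [[1, 1, 2], [2, 3, 2]]

def weight_of(paths, plus, times, one):
    """⊕ over paths of the ⊗-product of each path's weights."""
    return reduce(plus, (reduce(times, path, one) for path in paths))

print(weight_of(paths, lambda x, y: x + y, lambda x, y: x * y, 1))  # probability: 14
print(weight_of(paths, min, lambda x, y: x + y, 0))                 # tropical: 4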
Formal definition

A weighted automaton A = (Σ, Q, I, F, E, λ, ρ) over a semiring 𝕂:
• finite alphabet Σ,
• finite set of states Q, set of initial states I ⊆ Q, set of final states F ⊆ Q,
• finite set of transitions E ⊆ Q × (Σ ∪ {ε}) × 𝕂 × Q,
• initial weight function λ: I → 𝕂, final weight function ρ: F → 𝕂.

Function associated with A: for each string x,

[[A]](x) = ⊕ over accepting paths π labeled with x of λ(origin(π)) ⊗ w(π) ⊗ ρ(destination(π)),

where w(π) is the ⊗-product of the weights of the transitions along π.
Weighted transducers

Intuition: [Figure: an example weighted transducer]
Weighted transducers: semirings

[Figure: a weighted transducer T with two paths mapping ab to r, via arcs a:ε/1, b:r/2 and a:r/3, b:ε/2, each path ending in a final state with weight 2; a further arc c:s/1]

Probability semiring (ℝ₊, +, ×, 0, 1): [[T]](ab, r) = 1 × 2 × 2 + 3 × 2 × 2 = 16
Tropical semiring (ℝ₊ ∪ {∞}, min, +, ∞, 0): [[T]](ab, r) = min(1 + 2 + 2, 3 + 2 + 2) = 5
Weighted transducers: formal definition

• finite alphabets Σ (input) and Δ (output),
• finite set of states Q,
• transition function δ: Q × (Σ ∪ {ε}) → 2^Q,
• output function σ: Q × (Σ ∪ {ε}) × Q → Δ*,
• set of initial states I, set of final states F.

The transducer defines a relation between strings, i.e. a mapping [[T]]: Σ* → 2^(Δ*); in the weighted case each input/output pair is additionally assigned a weight in 𝕂.
Operations on weighted automata
Union: example

[Figure: two weighted automata and their union. The union is formed by adding a new initial state with ε/0 arcs to the initial states of the two original machines, which are otherwise kept unchanged]
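A sketch of that construction, using an arc-list representation invented for this example; weight 0 on the new ε arcs is the ⊗-identity of the tropical semiring:

# Automata are (initial_state, final_states, arcs); arcs are (src, label, weight, dst).
def union(a1, a2, new_initial="u0"):
    """Union of two weighted automata: a fresh initial state with ε/0 arcs
    into each original machine (assumes the two state sets are disjoint)."""
    (i1, f1, arcs1), (i2, f2, arcs2) = a1, a2
    arcs = [(new_initial, "ε", 0, i1), (new_initial, "ε", 0, i2)]
    arcs += arcs1 + arcs2
    return (new_initial, f1 | f2, arcs)

a1 = (0, {2}, [(0, "a", 3, 1), (1, "b", 2, 2)])
a2 = ("q0", {"q1"}, [("q0", "c", 1, "q1")])
print(union(a1, a2))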
Composition

[Diagram: T maps x to y; U maps y to z; the composition T ∘ U maps x directly to z]
Composition

[Diagram: T maps x to y; U maps y to z; T ∘ U maps x to z]

Weights combine multiplicatively: the weight of the composed mapping behaves like p(y | x) · p(z | y).
Composition: example

A:     0 --a:a/3--> 1 --b:ε/1--> 2 --c:ε/4--> 3 --d:d/2--> 4
B:     0 --a:d/5--> 1 --ε:e/7--> 2 --d:a/6--> 3
A ∘ B: (0,0) --a:d/15--> (1,1) --b:e/7--> (2,2) --c:ε/4--> (3,2) --d:a/12--> (4,3)
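A minimal sketch of composition for the ε-free arcs of this example, over the probability semiring (weights multiply: 3 × 5 = 15, 2 × 6 = 12); correctly composing the ε arcs (b:ε meeting ε:e) requires the ε-filter construction, which is omitted here:

# Weighted transducer arcs: (src, input, output, weight, dst).
def compose(arcs_a, arcs_b):
    """ε-free weighted composition: match A's output label against B's
    input label, pair the states, and multiply the weights."""
    return [
        ((p, r), x, z, w1 * w2, (q, s))
        for (p, x, y, w1, q) in arcs_a
        for (r, y2, z, w2, s) in arcs_b
        if y == y2
    ]

A = [(0, "a", "a", 3, 1), (3, "d", "d", 2, 4)]  # the ε-free arcs of A above
B = [(0, "a", "d", 5, 1), (2, "d", "a", 6, 3)]
for arc in compose(A, B):
    print(arc)  # ((0, 0), 'a', 'd', 15, (1, 1)) and ((3, 2), 'd', 'a', 12, (4, 3))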