String Pairs 1

s1 = #breaking#        s2 = #broke#

Pr(s1, s2) = 1/Z F(s1, s2)

F(s1, s2) sums over all alignments of s1 and s2:

F(s1, s2) =   exp Σ_i θ_i f_i( #breaking#  / #brεokeεε#  )
            + exp Σ_i θ_i f_i( #breaking#  / #broεkeεε#  )
            + exp Σ_i θ_i f_i( #breaεking# / #brεεokeεε# )
            + exp Σ_i θ_i f_i( #breakεing# / #broεkeεεε# )
            + ...

The features f_i of an alignment are extracted from local windows, e.g. for the window eak / oεk of the alignment #breakεing# / #broεkeεεε#:
• full window:                 eak / oεk
• vowels and consonants:       VVC / VεC
• target-language side only:   oεk
• "collapsed" (ε removed):     ok
• edit operations:             subst, ident, del (and ins)
Also add versions of these features that are backed off to bigrams (?ak / ?εk, ?VC / ?εC, ?k, ...)!
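To make the scoring concrete, here is a minimal brute-force Python sketch (not the talk's actual finite-state implementation): it enumerates every alignment explicitly and scores each with a tiny, assumed feature set and toy weights. Boundary markers and the richer window features are omitted.

```python
import math

EPS = "ε"

def alignments(s1, s2):
    """Enumerate all monotonic character alignments of s1 and s2.
    Each alignment is a list of (x, y) pairs where x or y may be EPS."""
    if not s1 and not s2:
        yield []
        return
    if s1:                                   # delete s1[0]:  (x, ε)
        for rest in alignments(s1[1:], s2):
            yield [(s1[0], EPS)] + rest
    if s2:                                   # insert s2[0]:  (ε, y)
        for rest in alignments(s1, s2[1:]):
            yield [(EPS, s2[0])] + rest
    if s1 and s2:                            # substitute / copy:  (x, y)
        for rest in alignments(s1[1:], s2[1:]):
            yield [(s1[0], s2[0])] + rest

def features(alignment):
    """Tiny illustrative feature set: one count per aligned character pair,
    plus backed-off edit-operation features (ident / subst / del / ins)."""
    feats = {}
    for x, y in alignment:
        if x == EPS:   op = "ins"
        elif y == EPS: op = "del"
        elif x == y:   op = "ident"
        else:          op = "subst"
        for name in (f"pair:{x}:{y}", op):
            feats[name] = feats.get(name, 0) + 1
    return feats

def F(s1, s2, theta):
    """Unnormalized score: sum over all alignments of exp(theta · f)."""
    total = 0.0
    for a in alignments(s1, s2):
        score = sum(theta.get(name, 0.0) * v for name, v in features(a).items())
        total += math.exp(score)
    return total

theta = {"ident": 2.0, "subst": -1.0, "del": -2.0, "ins": -2.0}   # toy weights
print(F("breaking", "broke", theta))
```

Even for this short pair there are over ten thousand alignments, which is why the model computes the same sum with finite-state machinery rather than by enumeration.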
String Pairs 1
• To compute such feature-based scores for two string variables S1 and S2, we construct a weighted finite-state transducer F.
• F can assign a score to any string pair (s1, s2):  Pr(s1, s2) = 1/Z F(s1, s2)
Background: Finite-state machines
• What is a finite-state acceptor (FSA)? An automaton with a finite number of states and arcs. It can be used to assign a score to any string.
• What is a finite-state transducer (FST)? The same as an FSA, but it assigns a score to any string pair (e.g. evaluating how well the two strings go together).
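As an assumed, toy illustration of these definitions, the following Python sketch implements a weighted finite-state acceptor that sums the weights of all accepting paths for a string; a transducer is the same idea with input:output symbol pairs on its arcs.

```python
from collections import defaultdict

class WFSA:
    """Minimal weighted finite-state acceptor (real toolkits such as OpenFst
    generalize the + and * below to arbitrary semirings)."""
    def __init__(self, start, finals):
        self.start = start
        self.finals = set(finals)           # accepting states
        self.arcs = defaultdict(list)       # state -> [(symbol, weight, next_state)]

    def add_arc(self, src, symbol, weight, dst):
        self.arcs[src].append((symbol, weight, dst))

    def score(self, string):
        """Sum of path weights over all paths that accept `string`."""
        # forward[q] = total weight of reaching state q after the prefix read so far
        forward = {self.start: 1.0}
        for ch in string:
            nxt = defaultdict(float)
            for q, w in forward.items():
                for symbol, arc_w, dst in self.arcs[q]:
                    if symbol == ch:
                        nxt[dst] += w * arc_w
            forward = nxt
        return sum(w for q, w in forward.items() if q in self.finals)

# toy machine that accepts "ab" along a single path of weight 0.5 * 2.0 = 1.0
fsa = WFSA(start=0, finals=[2])
fsa.add_arc(0, "a", 0.5, 1)
fsa.add_arc(1, "b", 2.0, 2)
print(fsa.score("ab"))    # -> 1.0
```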
String Pairs 1
• A finite-state machine is a specific kind of grammar that describes and scores one or more strings.
• Closure properties under many useful operations (we will use composition, intersection, projection).
• Useful for many tasks in natural language processing.
String Pairs 1
S1 = b r e c h e n
S2 = b r a c h t
• A finite-state transducer F assigns a score to the pair (S1, S2).
• Its arcs have weights, determined by their features.
• F computes the score by looking at all alignments, i.e. by summing over all paths in the finite-state transducer, e.g. F(brechen, bracht) = 13.26.
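A hedged sketch of how such a sum over all paths can be computed efficiently: for a simple one-state edit transducer it reduces to a forward-style dynamic program over the edit lattice. The arc weights below come from the same toy feature weights as the earlier brute-force sketch (they will not reproduce the 13.26 on the slide, which depends on the learned weights).

```python
import math

def pair_score(s1, s2, theta):
    """Sum over all alignments of exp(theta · f), in O(|s1|·|s2|) time."""
    def w(x, y):
        """Arc weight = exp(theta · features of this single aligned pair)."""
        if x is None:
            return math.exp(theta.get("ins", 0.0))
        if y is None:
            return math.exp(theta.get("del", 0.0))
        op = "ident" if x == y else "subst"
        return math.exp(theta.get(op, 0.0) + theta.get(f"pair:{x}:{y}", 0.0))

    n, m = len(s1), len(s2)
    # alpha[i][j] = total weight of all alignments of s1[:i] with s2[:j]
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                alpha[i][j] += alpha[i-1][j] * w(s1[i-1], None)        # deletion
            if j > 0:
                alpha[i][j] += alpha[i][j-1] * w(None, s2[j-1])        # insertion
            if i > 0 and j > 0:
                alpha[i][j] += alpha[i-1][j-1] * w(s1[i-1], s2[j-1])   # subst / copy
    return alpha[n][m]

theta = {"ident": 2.0, "subst": -1.0, "del": -2.0, "ins": -2.0}        # toy weights
print(pair_score("brechen", "bracht", theta))
```

For this one-state toy model the dynamic program returns exactly the same value as the brute-force enumeration above, just without listing the alignments.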
String Pairs 1
• The alignment between the string pair is a latent variable.
• We add more latent variables to the model:
  • Change regions
  • Conjugation classes
For details, see my thesis and Dreyer, Smith & Eisner, 2008.
String Pairs 1
Results: Inflection (German verbs). [Chart: accuracy on the mappings 13SIA-13SKE, 2PIE-13PKE, 2PKE-z, and rP-pA, comparing Moses (baseline), FST, and FST (+latent) (this talk).]
See my thesis, and Dreyer, Smith & Eisner, 2008.
String Pairs 1
Results: Lemmatization. [Chart: accuracy on Basque, English, Irish, and Tagalog, comparing Wicentowski (2002) with this talk.]
See my thesis, and Dreyer, Smith & Eisner, 2008.
String Pairs 1
Results: Transliteration competition, NEWS 2009. [Chart: accuracy on English-to-Russian for several participating systems: 61.3, 60.5, 60.0, 54.5 (this system: basic features).]
Conclusions / Contributions 1
• Presented a novel, well-defined probability model over string pairs (or single strings).
• General enough to model many string-to-string problems in NLP (and neighboring disciplines).
• Achieved high-scoring results on different tasks (inflection, lemmatization, transliteration) in multiple languages (German, Basque, English, Irish, Tagalog, Russian).
Conclusions / Contributions 1
• Linguistic properties and soft constraints can be expressed and learned (prefer certain vowel/consonant sequences, prefer identities, ...).
• Arbitrary-length output is handled elegantly (eliminates the need to limit structure insertion).
• Much information does not need to be annotated; it is inferred as hidden variables (alignments, conjugation classes, regions).
Overview
1. String pairs
2. Multiple strings (paradigms)
3. Text and paradigms
Multiple Strings 2
• We’ve seen how to model 2 strings, using feature-based finite-state machines.
• But we have bigger goals ...
Example applications 2
Inflectional paradigms: given some observed forms in a paradigm, predict the unknown forms (?); the predicted forms reinforce each other.
Example applications 2
Transliteration (using phonology): e.g. "ice cream" → アイスクリーム, linking English orthography, English phonology, Japanese phonology, and Japanese orthography.
Example applications 2
Spelling correction: link the misspelling ("egg sample") to the correct spelling ("example") via the pronunciation.
Example applications 2
... and all other tasks where word forms and representations interact:
• Cognate modeling
• Multiple-string alignment
• System combination
Multiple Strings 2
• Let’s build a general probability model over multiple strings.
• It extends the string-pair model we saw in the last part.
• We will later be able to use it to learn how to inflect verbs.
Model. Factor graph examples 2

Two string variables:
Factor graph: S1 — F1 — S2
Pr(s1, s2) = 1/Z × F1(s1, s2)
• S1, S2: random variables, each ranges over any string.
• F1: potential function, can score any string pair.

Four string variables, connected by factors F1 ... F6:
Pr(s1, s2, s3, s4) = 1/Z × F1(s1, s2) × F2(s1, s3) × F3(s1, s4) × F4(s2, s3) × F5(s3, s4) × F6(s2, s4)

Each potential function F is computed by a finite-state transducer.

A formal description of such a model ...
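As a toy continuation of the earlier sketches (reusing the hypothetical pair_score and theta), the factorization on this slide can be written directly as a product of pairwise factor scores; the edge list below mirrors the six factors of the four-variable example, and the German forms are just sample values.

```python
# Toy factor-graph score: the unnormalized probability of an assignment
# (s1..s4) is the product of the pairwise factor scores; dividing by Z
# (a sum over all possible string assignments) would give Pr(s1, s2, s3, s4).
EDGES = [("s1", "s2"), ("s1", "s3"), ("s1", "s4"),
         ("s2", "s3"), ("s3", "s4"), ("s2", "s4")]          # F1 .. F6

def unnormalized_prob(assignment, theta):
    p = 1.0
    for a, b in EDGES:                      # one pairwise factor per edge
        p *= pair_score(assignment[a], assignment[b], theta)
    return p

forms = {"s1": "brechen", "s2": "bricht", "s3": "brach", "s4": "gebrochen"}
print(unnormalized_prob(forms, theta))
```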
Model. Summary 2
• It is formally an undirected graphical model (a.k.a. Markov Random Field, MRF),
• in which the variables are string-valued, and the factors (potential functions) are finite-state transducers.
Dreyer & Eisner, 2009
Model. Less formal description 2
To model multiple strings and their various interactions, I ...
• use many finite-state transducers,
• have each of them look at a different string pair,
• plug them together into a big network,
• and coordinate them to predict all strings jointly (also: train the transducers jointly).
Model. Comparison with k-tape FSM 2
• Could we model k strings with a single k-tape finite-state machine, e.g. one that jointly reads the aligned tapes
    b r e ε c h e n ε
    b r ε a c h ε ε t
    b r ε a c h e n ε
    b r ε a c h ε ε ε
  (brechen, bracht, brachen, brach)?
• Such a machine needs >26^k arcs: intractable! (cf. multiple-sequence alignment)
• The factored model is also more powerful:
  ☺ it can encode swaps and other useful models
  ☹ it can also encode undecidable models
Inference. Overview 2
Factor graph: S1 ... S4 with factors F1 ... F6 (as above).
• Run Belief Propagation (BP).
• BP is a message-passing algorithm, a generalization of forward-backward.
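The following Python sketch, again continuing the earlier toy code (pair_score, theta, EDGES), only shows the shape of the BP message updates under a strong simplifying assumption: each string variable is restricted to a small finite candidate set, so messages are plain dictionaries. In the actual model the domains are all strings and the messages are themselves weighted finite-state acceptors (Dreyer & Eisner, 2009).

```python
import math   # math.prod requires Python 3.8+

def normalize(d):
    """Rescale a message so its entries sum to 1 (keeps loopy BP numerically sane)."""
    z = sum(d.values())
    return {k: v / z for k, v in d.items()} if z > 0 else d

def bp(candidates, edges, theta, iters=10):
    """Loopy BP where each variable ranges over a finite candidate set of strings."""
    msg = {}                                # var->factor and factor->var messages
    for e in edges:
        for v in e:
            msg[(v, e)] = {s: 1.0 for s in candidates[v]}
            msg[(e, v)] = {s: 1.0 for s in candidates[v]}

    for _ in range(iters):
        # variable -> factor: product of the messages from all *other* factors
        for v in candidates:
            for e in edges:
                if v not in e:
                    continue
                msg[(v, e)] = normalize({
                    s: math.prod(msg[(e2, v)][s] for e2 in edges
                                 if v in e2 and e2 != e)
                    for s in candidates[v]})
        # factor -> variable: sum the other variable out through the pairwise factor
        for e in edges:
            a, b = e
            msg[(e, a)] = normalize({
                sa: sum(pair_score(sa, sb, theta) * msg[(b, e)][sb]
                        for sb in candidates[b])
                for sa in candidates[a]})
            msg[(e, b)] = normalize({
                sb: sum(pair_score(sa, sb, theta) * msg[(a, e)][sa]
                        for sa in candidates[a])
                for sb in candidates[b]})

    # beliefs: product of all incoming factor messages, then normalized
    return {v: normalize({s: math.prod(msg[(e, v)][s] for e in edges if v in e)
                          for s in candidates[v]})
            for v in candidates}

# Hypothetical candidate sets, just to exercise the message passing:
candidates = {"s1": ["brechen"], "s2": ["bricht", "brecht"],
              "s3": ["brach", "broch"], "s4": ["gebrochen", "gebrechen"]}
beliefs = bp(candidates, EDGES, theta)
print(max(beliefs["s3"], key=beliefs["s3"].get))
```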