  1. Natural Language Processing (CSE 490U): Generation: Translation & Summarization. Noah Smith, © 2017 University of Washington, nasmith@cs.washington.edu. March 6–8, 2017. 1 / 68

  2. No office hours Thursday. 2 / 68

  3. [Diagram: analysis maps NL to R; generation maps R back to NL.] 3 / 68

  4. Natural Language Generation
     The classical view: R is a meaning representation language.
     ◮ Often very specific to the domain.
     ◮ For a breakdown of the problem space and a survey, see Reiter and Dale (1997).
     Today: considerable emphasis on text-to-text generation, i.e., transformations:
     ◮ Translating a sentence in one language into another language
     ◮ Summarizing a long piece of text by a shorter one
     ◮ Paraphrase generation (Barzilay and Lee, 2003; Quirk et al., 2004)
     4 / 68

  5. Machine Translation 5 / 68

  6. Warren Weaver to Norbert Wiener, 1947 One naturally wonders if the problem of translation could be conceivably treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’ 6 / 68

  7. Evaluation
     Intuition: good translations are fluent in the target language and faithful to the original meaning.
     Bleu score (Papineni et al., 2002):
     ◮ Compare to a human-generated reference translation
     ◮ Or, better: multiple references
     ◮ Weighted average of n-gram precision (across different n)
     There are some alternatives; most papers that use them report Bleu, too.
     7 / 68
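The "weighted average of n-gram precision" idea on the slide above can be sketched as follows. This is a simplified single-sentence Bleu with clipped (modified) n-gram precision and a brevity penalty; the function names are mine, and real implementations (e.g., corpus-level Bleu) add smoothing and aggregate counts across sentences.

```python
from collections import Counter
import math

def ngram_counts(tokens, n):
    """Count the n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Geometric mean of modified n-gram precisions against one or more
    references, scaled by a brevity penalty."""
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, c in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], c)
        clipped = sum(min(c, max_ref[gram]) for gram, c in cand.items())
        total = max(sum(cand.values()), 1)
        log_prec_sum += math.log(max(clipped, 1e-9) / total)
    # Brevity penalty: punish candidates shorter than the closest reference.
    ref_len = min((len(r) for r in references),
                  key=lambda L: abs(L - len(candidate)))
    bp = 1.0 if len(candidate) >= ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(log_prec_sum / max_n)
```

A candidate identical to its reference scores 1.0; degenerate repetition is punished by clipping, which is why plain precision alone is not used.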

  8.–11. Noisy Channel Models Review
     A pattern for modeling a pair of random variables, X and Y:
         source → Y → channel → X
     ◮ Y is the plaintext, the true message, the missing information, the output
     ◮ X is the ciphertext, the garbled message, the observable evidence, the input
     ◮ Decoding: select y given X = x.
         y* = argmax_y p(y | x)
            = argmax_y p(x | y) · p(y) / p(x)
            = argmax_y p(x | y) · p(y)
         where p(x | y) is the channel model and p(y) is the source model.
     8–11 / 68
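The decoding rule y* = argmax_y p(x | y) · p(y) can be sketched over an explicit candidate list. A real MT decoder searches an enormous hypothesis space rather than enumerating; the scoring functions and toy tables here are stand-ins of my own.

```python
import math

def decode(x, candidates, channel_logp, source_logp):
    """Noisy-channel decoding: return argmax_y log p(x | y) + log p(y)."""
    return max(candidates, key=lambda y: channel_logp(x, y) + source_logp(y))

# Toy example: two candidate "plaintexts" for an observed garbled x.
source = {"hello": math.log(0.6), "hullo": math.log(0.4)}
channel = {("helo", "hello"): math.log(0.9), ("helo", "hullo"): math.log(0.5)}

best = decode("helo", ["hello", "hullo"],
              lambda x, y: channel[(x, y)],
              lambda y: source[y])
# best is "hello": 0.9 · 0.6 beats 0.5 · 0.4
```

Working in log space avoids underflow when the two model scores are products of many small probabilities.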

  12. Bitext/Parallel Text
     Let f and e be two sequences in V† (French) and V̄† (English), respectively.
     We're going to define p(F | e), the probability over French translations of English sentence e.
     In a noisy channel machine translation system, we could use this together with a source/language model p(e) to "decode" f into an English translation.
     Where does the data to estimate this come from?
     12 / 68

  13. IBM Model 1 (Brown et al., 1993)
     Let ℓ and m be the (known) lengths of e and f.
     Latent variable a = ⟨a_1, …, a_m⟩, each a_i ranging over {0, …, ℓ} (positions in e).
     ◮ a_4 = 3 means that f_4 is "aligned" to e_3.
     ◮ a_6 = 0 means that f_6 is "aligned" to a special null symbol, e_0.
         p(f | e, m) = Σ_{a_1=0}^{ℓ} Σ_{a_2=0}^{ℓ} ··· Σ_{a_m=0}^{ℓ} p(f, a | e, m)
                     = Σ_{a ∈ {0,…,ℓ}^m} p(f, a | e, m)
         p(f, a | e, m) = Π_{i=1}^{m} p(a_i | i, ℓ, m) · p(f_i | e_{a_i})
                        = Π_{i=1}^{m} 1/(ℓ+1) · θ_{f_i | e_{a_i}}
     13 / 68
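The joint probability p(f, a | e, m) defined above is a product of one uniform alignment term and one translation parameter per position of f. A minimal sketch, assuming θ is stored as a nested dict and that e is represented with the null symbol prepended at index 0 (both representation choices are mine):

```python
def model1_joint(f, e, a, theta):
    """p(f, a | e, m) under IBM Model 1: each alignment position a_i is
    uniform over ℓ+1 choices (including null at e[0]), multiplied by the
    translation parameter θ_{f_i | e_{a_i}}."""
    ell = len(e) - 1            # ℓ: length of e, excluding the null symbol
    p = 1.0
    for f_i, a_i in zip(f, a):
        p *= (1.0 / (ell + 1)) * theta[e[a_i]][f_i]
    return p

# Toy parameters (hypothetical values, for illustration only).
theta = {"the": {"das": 0.5}, "house": {"Haus": 0.8}}
e = ["<null>", "the", "house"]
p = model1_joint(["das", "Haus"], e, [1, 2], theta)
# p = 1/(2+1) · θ_{das|the} · 1/(2+1) · θ_{Haus|house}
```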

  14.–20. Example: f is German
     e: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     f: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     Building up a = ⟨4, 5, 6, 8, 7, ?, …⟩ one position at a time (ℓ = 17):
         p(f, a | e, m) = 1/(17+1) · θ_{Noahs | Noah's}
                        · 1/(17+1) · θ_{Arche | ark}
                        · 1/(17+1) · θ_{war | was}
                        · 1/(17+1) · θ_{nicht | not}
                        · 1/(17+1) · θ_{voller | filled}
                        · 1/(17+1) · θ_{Produktionsfaktoren | ?}
                        · …
     Problem: this alignment isn't possible with IBM Model 1! Each f_i is aligned to at most one e_{a_i} — so the compound Produktionsfaktoren cannot be aligned to both "production" and "factors".
     14–20 / 68

  21.–27. Example: f is English
     e: Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
     f: Mr President , Noah's ark was filled not with production factors , but with living creatures .
     Building up a = ⟨0, 0, 0, 1, 2, 3, 5, 4, …⟩ one position at a time (ℓ = 10):
         p(f, a | e, m) = 1/(10+1) · θ_{Mr | null}
                        · 1/(10+1) · θ_{President | null}
                        · 1/(10+1) · θ_{, | null}
                        · 1/(10+1) · θ_{Noah's | Noahs}
                        · 1/(10+1) · θ_{ark | Arche}
                        · 1/(10+1) · θ_{was | war}
                        · 1/(10+1) · θ_{filled | voller}
                        · 1/(10+1) · θ_{not | nicht}
                        · …
     21–27 / 68
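The examples above spell out one alignment at a time, but in Model 1 the sum over all (ℓ+1)^m alignments factorizes position by position, so the marginal is cheap: p(f | e, m) = Π_{i=1}^{m} 1/(ℓ+1) · Σ_{j=0}^{ℓ} θ_{f_i | e_j}. A minimal sketch, under my assumed representation (θ as a nested dict, e[0] the null symbol):

```python
def model1_marginal(f, e, theta):
    """p(f | e, m): since alignment positions are independent, the sum
    over all (ℓ+1)^m alignments factorizes into one sum per position."""
    ell = len(e) - 1            # e[0] is the null symbol
    p = 1.0
    for f_i in f:
        p *= sum(theta[e_j].get(f_i, 0.0) for e_j in e) / (ell + 1)
    return p

# Toy parameters (hypothetical values, for illustration only).
theta = {"<null>": {"das": 0.1}, "the": {"das": 0.5}}
p = model1_marginal(["das"], ["<null>", "the"], theta)
# p = (θ_{das|null} + θ_{das|the}) / (1+1)
```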

  28. How to Estimate Translation Distributions? This is a problem of incomplete data: at training time, we see e and f, but not a. 28 / 68
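The standard answer to the incomplete-data problem is expectation-maximization: posit the alignments, compute their posteriors under the current θ, and re-estimate θ from the expected counts. A minimal sketch of one EM iteration for Model 1 (representation conventions, such as the nested-dict θ and the null symbol at e[0], are mine):

```python
from collections import defaultdict

def em_step(bitext, theta):
    """One EM iteration for the θ parameters of IBM Model 1.
    bitext: list of (f, e) sentence pairs, each e carrying a null symbol.
    E-step: posterior p(a_i = j | f, e) ∝ θ_{f_i | e_j}; the uniform
    1/(ℓ+1) alignment prior cancels in the normalization.
    M-step: renormalize the expected counts per English word."""
    counts = defaultdict(lambda: defaultdict(float))
    for f, e in bitext:
        for f_i in f:
            z = sum(theta[e_j][f_i] for e_j in e)
            for e_j in e:
                counts[e_j][f_i] += theta[e_j][f_i] / z
    return {e_j: {f_i: c / sum(row.values()) for f_i, c in row.items()}
            for e_j, row in counts.items()}

# Toy bitext: after one step from a uniform start, co-occurrence already
# pulls probability mass toward θ_{das | the}.
bitext = [(["das", "Haus"], ["<null>", "the", "house"]),
          (["das", "Buch"], ["<null>", "the", "book"])]
vocab_f = {"das", "Haus", "Buch"}
theta0 = {e_j: {f_i: 1 / 3 for f_i in vocab_f}
          for e_j in ("<null>", "the", "house", "book")}
theta1 = em_step(bitext, theta0)
```

Iterating this step is guaranteed to improve the likelihood of the observed bitext, and for Model 1 the likelihood has no local optima other than the global one (up to ties).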
