MaxForce: Max-Violation Perceptron and Forced Decoding for Scalable MT Training

Heng Yu (Chinese Acad. of Sciences), Liang Huang (CUNY), Kai Zhao (CUNY), Haitao Mi (IBM T.)

[Figure: a derivation lattice for "Bush held talks with Sharon" over source positions 0 to 6]


Forced Decoding
• serves both as data selection (favoring more literal sentence pairs) and as a source of oracle derivations
• example: source "Bushi yu Shalong juxing le huitan", reference "Bush held talks with Sharon"
[Figure: the forced-decoding search space. Partial hypotheses whose output departs from the reference (e.g., "... meeting", "... talk", "... Shalong") are discarded; the surviving paths over coverage vectors (source positions 0 to 6) form the gold derivation lattice, and each path through it is one gold derivation.]
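A minimal sketch of forced decoding for a phrase-based model, assuming a toy phrase table (`PHRASES` and `forced_decode` are hypothetical, not the authors' code). A hypothesis is a coverage bitmask, the end of the last translated source phrase, and the number of reference words produced so far; it is only extended by a phrase whose target continues the reference, so every complete path is a gold derivation.

```python
from typing import List, Optional, Tuple

Phrase = Tuple[int, int, Tuple[str, ...]]  # (start, end, target words) for source span [start, end)

# Hypothetical toy phrase table for: Bushi(0) yu(1) Shalong(2) juxing(3) le(4) huitan(5)
PHRASES: List[Phrase] = [
    (0, 1, ("Bush",)),
    (1, 2, ("with",)),
    (2, 3, ("Sharon",)),
    (1, 3, ("with", "Sharon")),
    (3, 6, ("held", "talks")),
    (3, 6, ("held", "a", "meeting")),  # never matches the reference
    (3, 4, ("held",)),
    (4, 6, ("talks",)),
    (5, 6, ("talks",)),
]


def forced_decode(src_len: int, ref: List[str], phrases: List[Phrase],
                  distortion_limit: Optional[int] = None) -> List[List[Phrase]]:
    """Return every derivation that covers each source word exactly once and
    whose output is exactly the reference (i.e., the gold derivations)."""
    full = (1 << src_len) - 1
    golds: List[List[Phrase]] = []

    def extend(cov: int, last_end: int, k: int, path: List[Phrase]) -> None:
        if cov == full:
            if k == len(ref):
                golds.append(list(path))
            return
        for i, j, tgt in phrases:
            mask = ((1 << j) - 1) ^ ((1 << i) - 1)  # bits i .. j-1
            if cov & mask:
                continue  # overlaps source words already translated
            if distortion_limit is not None and abs(i - last_end) > distortion_limit:
                continue  # reordering jump exceeds the distortion limit
            if tuple(ref[k:k + len(tgt)]) != tgt:
                continue  # output must stay a prefix of the reference
            path.append((i, j, tgt))
            extend(cov | mask, j, k + len(tgt), path)
            path.pop()

    extend(0, 0, 0, [])
    return golds


ref = "Bush held talks with Sharon".split()
print(len(forced_decode(6, ref, PHRASES)))                      # 4
print(len(forced_decode(6, ref, PHRASES, distortion_limit=4)))  # 0 (unreachable)
```

In this toy table every gold derivation must jump back 5 positions (from after "huitan" to "yu"), so any distortion limit below 5 makes the pair unreachable. Enumerating paths is exponential; a practical decoder would share states in a lattice instead.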

Unreachable Sentences and Prefix
• the distortion limit makes some sentence pairs unreachable (Hiero would do better here)
• but we can still use the reachable prefix pairs of unreachable sentence pairs
[Figure: a Chinese-English sentence pair whose reference, "U.N. sent 50 observers to monitor the 1st election since Bolivia restored democracy", requires long reordering jumps relative to the Chinese source (annotated distances: 5, 3, 3, 11, 4).]
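A minimal sketch of extracting a reachable prefix pair from an unreachable sentence pair. It assumes (our simplification, not necessarily the paper's definition) that a prefix pair is any reference prefix produced by some partial derivation, paired with the source positions that derivation covered; it does not check, for example, that the remaining source words are unaligned to the prefix. `longest_reachable_prefix` and the toy `PHRASES` are hypothetical.

```python
from typing import List, Optional, Tuple

Phrase = Tuple[int, int, Tuple[str, ...]]  # (start, end, target words) for source span [start, end)


def longest_reachable_prefix(ref: List[str], phrases: List[Phrase],
                             distortion_limit: Optional[int] = None) -> Tuple[int, Tuple[int, ...]]:
    """Return (k, covered): the longest prefix ref[:k] some partial derivation
    produces, plus the source positions one such derivation covers."""
    best: Tuple[int, Tuple[int, ...]] = (0, ())
    seen = set()
    stack = [(0, 0, 0)]  # (coverage bitmask, end of last phrase, reference words produced)
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        cov, last_end, k = state
        if k > best[0]:
            covered = tuple(p for p in range(cov.bit_length()) if (cov >> p) & 1)
            best = (k, covered)
        for i, j, tgt in phrases:
            mask = ((1 << j) - 1) ^ ((1 << i) - 1)  # bits i .. j-1
            if cov & mask:
                continue  # overlaps words already translated
            if distortion_limit is not None and abs(i - last_end) > distortion_limit:
                continue  # jump too long
            if tuple(ref[k:k + len(tgt)]) != tgt:
                continue  # must keep producing the reference
            stack.append((cov | mask, j, k + len(tgt)))
    return best


# Hypothetical toy phrase table for: Bushi(0) yu(1) Shalong(2) juxing(3) le(4) huitan(5)
PHRASES: List[Phrase] = [
    (0, 1, ("Bush",)), (1, 3, ("with", "Sharon")),
    (3, 6, ("held", "talks")), (3, 4, ("held",)), (4, 6, ("talks",)),
]
ref = "Bush held talks with Sharon".split()
print(longest_reachable_prefix(ref, PHRASES, distortion_limit=4))  # (3, (0, 3, 4, 5))
print(longest_reachable_prefix(ref, PHRASES))                      # (5, (0, 1, 2, 3, 4, 5))
```

With a distortion limit of 4 the full pair is unreachable, yet the prefix pair ("Bushi ... juxing le huitan", "Bush held talks") can still serve as training data.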

Sentence/Word Reachability Ratio
• how many sentence pairs pass forced decoding?
• the ratio drops dramatically as sentences get longer
• prefixes boost coverage
[Figure: ratio of complete coverage (0% to 100%) vs. sentence length (10 to 70). First plot: curves for distortion-unlimited and distortion limits 6, 4, 2 and 0. Second plot: curves for dist-6, dist-4, dist-2 and dist-0.]
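A small sketch of how sentence-level and word-level reachability ratios could be tallied per length bucket. It assumes the word-level ratio counts reference words that fall inside reachable prefixes; `reachability_ratios` is hypothetical and the input numbers are invented for illustration, not the results plotted on the slide.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reachability_ratios(pairs: List[Tuple[int, int]], bucket: int = 10) -> Dict[int, Tuple[float, float]]:
    """pairs: (reference length n, longest reachable reference prefix k) per sentence pair.
    Returns {bucket start: (sentence-level ratio, word-level ratio)}."""
    sent = defaultdict(lambda: [0, 0])  # [fully reachable pairs, all pairs]
    word = defaultdict(lambda: [0, 0])  # [words inside reachable prefixes, all words]
    for n, k in pairs:
        b = (n // bucket) * bucket
        sent[b][0] += int(k == n)
        sent[b][1] += 1
        word[b][0] += k
        word[b][1] += n
    return {b: (sent[b][0] / sent[b][1], word[b][0] / word[b][1]) for b in sorted(sent)}


# Invented toy numbers, for illustration only.
toy = [(5, 5), (8, 8), (9, 3), (25, 25), (22, 10), (28, 14)]
for start, (s, w) in reachability_ratios(toy).items():
    print(f"length {start}-{start + 9}: sentences {s:.0%}, words {w:.0%}")
```

In this toy data the word-level ratio exceeds the sentence-level ratio in both buckets, because prefixes of unreachable pairs still contribute words.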

Number of Gold Derivations
• exponential in sentence length (measured on fully reachable sentence pairs)
• these derivations are the "latent variables" in learning
[Figure: average number of gold derivations (0 to 100,000) vs. sentence length (5 to 50), for dist-6, dist-4, dist-2 and dist-0.]
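A sketch of why the number of gold derivations explodes while counting them stays cheap: memoizing on the decoder state turns path enumeration into dynamic programming. The toy phrase table `toy_phrases` (every word and every adjacent word pair has a matching phrase) is a hypothetical stand-in, and `count_gold_derivations` is not the authors' code.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

Phrase = Tuple[int, int, Tuple[str, ...]]  # (start, end, target words) for source span [start, end)


def count_gold_derivations(src_len: int, ref: Tuple[str, ...], phrases: List[Phrase],
                           distortion_limit: Optional[int] = None) -> int:
    """Count reference-producing derivations by memoizing on the decoder state
    (coverage, end of last phrase, reference words produced) instead of enumerating paths."""
    full = (1 << src_len) - 1

    @lru_cache(maxsize=None)
    def count(cov: int, last_end: int, k: int) -> int:
        if cov == full:
            return int(k == len(ref))
        total = 0
        for i, j, tgt in phrases:
            mask = ((1 << j) - 1) ^ ((1 << i) - 1)  # bits i .. j-1
            if cov & mask or ref[k:k + len(tgt)] != tgt:
                continue  # overlap, or output would leave the reference
            if distortion_limit is not None and abs(i - last_end) > distortion_limit:
                continue
            total += count(cov | mask, j, k + len(tgt))
        return total

    return count(0, 0, 0)


def toy_phrases(n: int) -> List[Phrase]:
    """Hypothetical table: word i translates as e<i>, and each adjacent pair as a two-word phrase."""
    singles = [(i, i + 1, (f"e{i}",)) for i in range(n)]
    pairs = [(i, i + 2, (f"e{i}", f"e{i + 1}")) for i in range(n - 1)]
    return singles + pairs


for n in (5, 10, 20, 30):
    ref = tuple(f"e{i}" for i in range(n))
    print(n, count_gold_derivations(n, ref, toy_phrases(n)))
```

The counts follow the Fibonacci numbers (8, 89, 10946 and 1346269 for n = 5, 10, 20, 30), i.e. they grow exponentially with length even in this monotone toy setting.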

Outline
• Background: Phrase-based Translation (Koehn, 2004)
• Forced Decoding
• Violation-Fixing Perceptron for MT Training
• Update strategy
• Feature design
• Experiments

Structured Perceptron (Collins, 2002)
[Figure: the perceptron loop. Input x goes through inference with weights w to produce z; if y ≠ z, update the weights. Binary classification (y = +1 or -1) has a constant number of classes and exact inference. Structured classification, e.g. x = "那 人 咬 了 狗" with y = "the man bit the dog", has an exponential number of classes, and for MT the inference is inexact.]
• challenges in applying the perceptron to MT
  • the inference (decoding) is vastly inexact (beam search)
  • the standard perceptron is known not to work for MT
  • intuition: the learner should fix the search error first
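A minimal sketch of the structured perceptron loop: predict z with exact inference, and if y ≠ z, move the weights toward Φ(x, y) and away from Φ(x, z). The feature function `phi` (target bigrams only, ignoring x) and the two-candidate output space `CANDIDATES` are hypothetical stand-ins; enumerating all candidates is what makes inference exact here, which beam-search decoding in MT cannot do.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Feat = Tuple[str, str]


def phi(x: str, y: str) -> Counter:
    """Toy features (an assumption): target bigrams of y; x is ignored here."""
    words = ["<s>"] + y.split() + ["</s>"]
    return Counter(zip(words, words[1:]))


def score(w: Dict[Feat, float], feats: Counter) -> float:
    return sum(w.get(f, 0.0) * v for f, v in feats.items())


def perceptron_epoch(w: Dict[Feat, float], data: List[Tuple[str, str]],
                     candidates: Callable[[str], List[str]]) -> int:
    """One pass of the perceptron; returns the number of mistakes (updates)."""
    mistakes = 0
    for x, y in data:
        z = max(candidates(x), key=lambda c: score(w, phi(x, c)))  # exact inference
        if z != y:
            mistakes += 1
            for f, v in phi(x, y).items():
                w[f] = w.get(f, 0.0) + v  # reward the correct output
            for f, v in phi(x, z).items():
                w[f] = w.get(f, 0.0) - v  # penalize the wrong prediction
    return mistakes


CANDIDATES = {"那 人 咬 了 狗": ["the dog bit the man", "the man bit the dog"]}
data = [("那 人 咬 了 狗", "the man bit the dog")]
w: Dict[Feat, float] = {}
print(perceptron_epoch(w, data, CANDIDATES.get))  # 1
print(perceptron_epoch(w, data, CANDIDATES.get))  # 0
```

With all weights at zero, `max` returns the first (wrong) candidate; after one update the bigrams ("man", "bit") and ("dog", "</s>") are rewarded and the correct output wins.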

Search Error: Gold Derivations Pruned
[Figure: the gold derivation lattice for "Bush held talks with Sharon" (source positions 0 to 6) beside real decoding with beam search. As the beam expands, the gold derivations are pruned; annotation: "should fix search errors here!"]
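A toy sketch of detecting the search error illustrated above: run beam search and report the first step at which the correct prefix is no longer in the beam. Everything here is a hypothetical simplification (`find_search_error`, the word-appending expansion, and a model score equal to the sum of per-word weights), not the decoder from the slides.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Hyp = Tuple[str, ...]


def find_search_error(vocab: Sequence[str], w: Dict[str, float], reference: Sequence[str],
                      beam_size: int) -> Tuple[Optional[int], List[Hyp]]:
    """Return (step, beam) at the first step whose beam lost the correct prefix,
    or (None, final beam) if the correct prefix survives to the end."""
    beam: List[Hyp] = [()]
    for step in range(1, len(reference) + 1):
        # extend every hypothesis by one word it has not used yet
        cands = [h + (v,) for h in beam for v in vocab if v not in h]
        # toy model score: sum of per-word weights (the sort is stable, so ties keep order)
        cands.sort(key=lambda h: sum(w.get(t, 0.0) for t in h), reverse=True)
        beam = cands[:beam_size]
        if tuple(reference[:step]) not in beam:
            return step, beam  # the correct prefix was pruned: a search error
    return None, beam


vocab = ["Bush", "held", "talks", "with", "Sharon", "meeting"]
w = {"Bush": 2.0, "meeting": 1.5, "held": 1.0, "with": 1.0, "Sharon": 1.0, "talks": 0.5}
ref = "Bush held talks with Sharon".split()
print(find_search_error(vocab, w, ref, beam_size=2))
# (2, [('Bush', 'meeting'), ('meeting', 'Bush')])
```

With a beam of 2, "Bush held" (score 3.0) is pruned at step 2 by two hypotheses scoring 3.5; with a beam large enough to keep everything, no search error occurs.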

Fixing Search Error 1: Early Update
• early update (Collins & Roark, 2004): update when the correct sequence falls off the beam
  • up to this point, the incorrect prefix must score higher
  • that is a "violation", which we want to fix
• the standard perceptron does not guarantee a violation
  • with pruning, the correct sequence might score higher at the end!
  • this is called an "invalid" update because it doesn't fix the search error
[Figure: the model's beam over time. Early update happens where the correct sequence falls off the beam (is pruned); the incorrect prefix scores higher up to that point, so a violation is guaranteed. The standard update at the end comes with no guarantee.]
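A minimal sketch of early update, assuming the beams were recorded from a beam search that used the same weights and that each correct prefix was an expansion candidate of the previous one (so a pruned correct prefix cannot outscore the best in the beam). The unigram-plus-bigram `phi` and the function `early_update` are hypothetical illustrations.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Hyp = Tuple[str, ...]


def phi(prefix: Hyp) -> Counter:
    """Toy features (an assumption): unigrams and bigrams of the English prefix."""
    feats = Counter(prefix)
    feats.update(zip(prefix, prefix[1:]))
    return feats


def score(w: Dict, prefix: Hyp) -> float:
    return sum(w.get(f, 0.0) * v for f, v in phi(prefix).items())


def early_update(w: Dict, beams: Sequence[List[Hyp]], reference: Sequence[str]) -> Optional[int]:
    """beams[t] is the beam after step t + 1. Update at the first step whose beam
    lost the correct prefix and return that step (None if it never falls off)."""
    for t, beam in enumerate(beams):
        gold = tuple(reference[:t + 1])
        if gold in beam:
            continue
        # The correct prefix was pruned, so it scores no higher than the best
        # in the beam: the update fixes a genuine violation.
        best = max(beam, key=lambda h: score(w, h))
        for f, v in phi(gold).items():
            w[f] = w.get(f, 0.0) + v
        for f, v in phi(best).items():
            w[f] = w.get(f, 0.0) - v
        return t + 1  # stop: only the prefixes up to this step are used
    return None


w = {"Bush": 2.0, "meeting": 1.5, "held": 1.0}
beams = [[("Bush",), ("meeting",)], [("Bush", "meeting"), ("meeting", "Bush")]]
print(early_update(w, beams, "Bush held talks".split()))  # 2
print(w["held"], w["meeting"])                             # 2.0 0.5
```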

Early Update w/ Latent Variable
• the gold-standard derivations are not annotated
• we treat any reference-producing derivation as good
[Figure: the gold derivation lattice for "Bush held talks with Sharon" (source positions 0 to 6) beside the model's beam. Update early, at the point where all correct derivations fall off the beam: the incorrect prefix scores higher up to this point, so a violation is guaranteed. Then stop decoding.]
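A sketch of early update with a latent gold derivation: decoding stops at the first step where no prefix from the gold derivation lattice survives in the beam, and the update uses the highest-scoring correct prefix at that step as d+ and the best prefix in the beam as d−. Derivations are tuples of hypothetical rule ids with one indicator feature per rule; `latent_early_update` is an illustration, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set, Tuple

Deriv = Tuple[str, ...]  # a derivation prefix as a sequence of (hypothetical) rule ids


def phi(d: Deriv) -> Counter:
    return Counter(d)  # toy features: one indicator per rule id


def score(w: Dict[str, float], d: Deriv) -> float:
    return sum(w.get(f, 0.0) * v for f, v in phi(d).items())


def latent_early_update(w: Dict[str, float], beams: Sequence[List[Deriv]],
                        gold_lattice: Sequence[Set[Deriv]]) -> Optional[int]:
    """beams[t]: beam at step t; gold_lattice[t]: all correct (reference-producing) prefixes at step t."""
    for t, (beam, golds) in enumerate(zip(beams, gold_lattice)):
        if any(d in golds for d in beam):
            continue  # some correct derivation survives: keep decoding
        d_plus = max(golds, key=lambda d: score(w, d))  # best correct prefix (latent choice)
        d_minus = max(beam, key=lambda d: score(w, d))  # best prefix in the beam
        for f, v in phi(d_plus).items():
            w[f] = w.get(f, 0.0) + v
        for f, v in phi(d_minus).items():
            w[f] = w.get(f, 0.0) - v
        return t + 1  # update made; stop decoding this sentence
    return None


w = {"r_bush": 1.0, "r_held": 1.0, "r_held_talks": 0.5, "r_meeting": 2.0, "r_with": 1.8}
beams = [[("r_bush",), ("r_yu",)], [("r_bush", "r_meeting"), ("r_bush", "r_with")]]
gold = [{("r_bush",)}, {("r_bush", "r_held_talks"), ("r_bush", "r_held")}]
print(latent_early_update(w, beams, gold))  # 2
```

Here both correct two-rule prefixes are missing from the beam at step 2, so d+ = ("r_bush", "r_held") (score 2.0) is rewarded against d− = ("r_bush", "r_meeting") (score 3.0).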

Fixing Search Error 2: Max-Violation
[Figure: beam search over steps i = 1 to |x|, marking the best in the beam (d−_i), the worst in the beam, and the correct prefix (d+_i). Update points are shown for early, max-violation (at step i*, on d+_i* vs. d−_i*), local, and standard (at |x|, on d+_|x| vs. d−_|x|) updates; the standard update is invalid.]
• early update works but learns slowly because its updates are partial
• max-violation: update on the prefix where the violation is maximal
  • the "worst mistake" in the search space
• we call these methods "violation-fixing perceptrons" (Huang et al., 2012)
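A minimal sketch of the max-violation update: given, for each step i, the best correct prefix d+_i and the best prefix in the beam d−_i, pick i* = argmax_i [w·Φ(d−_i) − w·Φ(d+_i)] and update on that pair. Treating only strictly positive margins as violations, the rule-id features and the function name are simplifying assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Deriv = Tuple[str, ...]  # prefix as a sequence of (hypothetical) rule ids


def score(w: Dict[str, float], d: Deriv) -> float:
    return sum(w.get(r, 0.0) for r in d)  # toy: one indicator feature per rule id


def max_violation_update(w: Dict[str, float], d_plus: List[Deriv], d_minus: List[Deriv]) -> Optional[int]:
    """d_plus[i]: best correct prefix at step i; d_minus[i]: best prefix in the beam at step i.
    Returns the (0-based) step i* that was updated, or None if there is no violation."""
    violations = [score(w, m) - score(w, p) for p, m in zip(d_plus, d_minus)]
    i_star = max(range(len(violations)), key=violations.__getitem__)
    if violations[i_star] <= 0:
        return None  # the correct prefix is never outscored
    for f, v in Counter(d_plus[i_star]).items():
        w[f] = w.get(f, 0.0) + v
    for f, v in Counter(d_minus[i_star]).items():
        w[f] = w.get(f, 0.0) - v
    return i_star


w = {"a": 1.0, "b": 0.0, "x": 2.0, "z": -4.0}
d_plus = [("a",), ("a", "b"), ("a", "b", "c")]
d_minus = [("a",), ("a", "x"), ("a", "x", "z")]
print(max_violation_update(w, d_plus, d_minus))  # 1
```

The violations here are [0, 2, -2]: the full-sequence (standard) update at the last step would be invalid, while max-violation updates at the middle step, where the wrong prefix leads by the largest margin.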

Early Update vs. Max-Violation
[Figure: the update-strategy diagram next to an animation of beam search over source positions 0 to 6, marking where early update and max-violation update.]

Latent-Variable Perceptron
[Figure: where each strategy updates along the correct sequence. Standard: on the full derivations, an invalid update. Early: where the correct sequence falls off the beam. Max-violation: at the biggest violation between the best in the beam and the correct sequence. Latest: the last valid update.]
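A small sketch comparing where each strategy would update, given per-step violations and the step where the correct prefixes fall off the beam. It treats a step as a valid update point when the best in the beam strictly outscores the best correct prefix (a simplification); `update_positions` and the input numbers are hypothetical.

```python
from typing import Dict, List, Optional


def update_positions(violations: List[float], fall_off: Optional[int]) -> Dict[str, object]:
    """violations[i] = score(best in beam at step i) - score(best correct prefix at step i).
    fall_off: first step where every correct prefix has left the beam (None if never)."""
    valid = [i for i, v in enumerate(violations) if v > 0]
    last = len(violations) - 1
    return {
        "early": fall_off,
        "max-violation": max(valid, key=lambda i: violations[i]) if valid else None,
        "latest": valid[-1] if valid else None,  # last step that is still a violation
        "standard": last,                        # full-sequence update
        "standard is valid": violations[last] > 0,
    }


print(update_positions([-1.0, 0.5, 2.0, 1.0, -0.5], fall_off=1))
# {'early': 1, 'max-violation': 2, 'latest': 3, 'standard': 4, 'standard is valid': False}
```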

Roadmap of the techniques
• structured perceptron (Collins, 2002)
• perceptron w/ inexact search (Collins & Roark, 2004; Huang et al., 2012)
• latent-variable perceptron (Zettlemoyer and Collins, 2005; Sun et al., 2009)
• latent-variable perceptron w/ inexact search (Yu et al., 2013), combining the two above
• applications: Hiero, syntactic parsing, semantic parsing, transliteration

Feature Design
• Dense features:
  • standard phrase-based features (Koehn, 2004)
• Sparse features:
  • rule-identification features (a unique ID for each rule)
  • word-edges features: lexicalized local translation context within a rule
  • non-local features: dependency between consecutive rules
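A sketch of how the rule-identification and non-local sparse features might be extracted from a derivation. The string templates (`RuleId=...`, `RulePair=...`) and `sparse_features` are hypothetical; the slide only names the feature types.

```python
from collections import Counter
from typing import List, Tuple

Rule = Tuple[str, str]  # (source phrase, target phrase)


def sparse_features(derivation: List[Rule]) -> Counter:
    """Hypothetical templates: one id per rule (local) and one per consecutive rule pair (non-local)."""
    feats: Counter = Counter()
    for src, tgt in derivation:
        feats[f"RuleId={src}=>{tgt}"] += 1
    for (s1, t1), (s2, t2) in zip(derivation, derivation[1:]):
        feats[f"RulePair={s1}=>{t1}||{s2}=>{t2}"] += 1
    return feats


d = [("Bushi", "Bush"), ("juxing le huitan", "held talks"), ("yu Shalong", "with Sharon")]
for name, value in sparse_features(d).items():
    print(name, value)
```

Because the pair feature looks at the previous rule, a decoder scoring it must keep the last rule in its state; that is what makes such features non-local in this sketch, unlike the rule-id features, which depend on one rule alone.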

WordEdges Features (local)
[Figure: source "<s> 布什 与 沙龙 举行 了 会谈 </s>" (Bushi yu Shalong juxing le huitan) with rules r1 and r2 producing "<s> Bush held a few talks"]
• the first and last Chinese words in the rule
• the first and last English words in the rule
• the two Chinese words surrounding the rule
Combo features (examples):
• 100010=沙龙|held
• 010001=举行|talks
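A sketch of WordEdges combo features, assuming the six bits of a mask follow the slot order (left neighbor, first Chinese word, last Chinese word, right neighbor, first English word, last English word). That order is our guess, chosen because it reproduces both combo examples on the slide; the helpers and the choice of one- and two-slot combos are hypothetical.

```python
from itertools import combinations
from typing import List


def word_edges(src: List[str], i: int, j: int, tgt: List[str]) -> List[str]:
    """The rule covers src[i:j] (src padded with <s> ... </s>) and outputs tgt.
    Assumed slot order: left neighbor, first zh, last zh, right neighbor, first en, last en."""
    return [src[i - 1], src[i], src[j - 1], src[j], tgt[0], tgt[-1]]


def combo(edges: List[str], mask: str) -> str:
    """Join the slots whose bit is 1, prefixed by the mask, e.g. 100010=沙龙|held."""
    return mask + "=" + "|".join(e for e, bit in zip(edges, mask) if bit == "1")


def all_combos(edges: List[str], max_bits: int = 2) -> List[str]:
    """Every combo feature built from 1 up to max_bits slots."""
    feats = []
    for n in range(1, max_bits + 1):
        for chosen in combinations(range(len(edges)), n):
            mask = "".join("1" if p in chosen else "0" for p in range(len(edges)))
            feats.append(combo(edges, mask))
    return feats


src = ["<s>", "布什", "与", "沙龙", "举行", "了", "会谈", "</s>"]
edges = word_edges(src, 4, 7, ["held", "a", "few", "talks"])  # rule over 举行 了 会谈
print(combo(edges, "100010"))  # 100010=沙龙|held
print(combo(edges, "010001"))  # 010001=举行|talks
print(len(all_combos(edges)))  # 21
```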
