Translation as Weighted Deduction


  1. Translation as Weighted Deduction. Adam Lopez, University of Edinburgh.

  2. Moses (Koehn et al., ACL 2007) and Hiero (Chiang, CL 2007).

  5. Moses (Koehn et al., ACL 2007): 30.7. Hiero (Chiang, CL 2007): 32.6. Scores from Lopez, Coling 2008.

  6. Moses (Koehn et al., ACL 2007): phrase-based, 15 features, stack decoding, 30.7. Hiero (Chiang, CL 2007): hierarchical phrase-based, 5 features, cube pruning, 32.6. Scores from Lopez, Coling 2008.

  7. Moses (Koehn et al., ACL 2007): rules: phrase-based; parameters: 15 features; search: stack decoding; score: 30.7. Hiero (Chiang, CL 2007): rules: hierarchical phrase-based; parameters: 5 features; search: cube pruning; score: 32.6. Scores from Lopez, Coling 2008.

  8. Rules: phrase-based, hierarchical phrase-based. Parameters: 15 features, 5 features. Search: stack decoding, cube pruning.

  9. Rules: phrase-based, hierarchical phrase-based. Parameters: 15 features, 5 features (shown twice). Search: stack decoding, cube pruning (shown twice).

  10. Rules: phrase-based, hierarchical phrase-based, synchronous TAG. Parameters: 15 features, 5 features (shown twice). Search: stack decoding, cube pruning (shown twice).

  11. This talk is not about how to improve your BLEU score by 1.9.

  12. This talk is about building and analyzing translation models and algorithms in a modular way.

  13. Rules: phrase-based. Parameters: 15 features. Search: stack decoding.

  14. Rules: phrase-based (deductive logic). Parameters: 15 features. Search: stack decoding.

  15. Rules: phrase-based (deductive logic). Parameters: 15 features (semiring). Search: stack decoding.

  16. Rules: phrase-based (deductive logic). Parameters: 15 features (semiring). Search: stack decoding ((hyper)graph search algorithms).

  17. 北 风 呼 啸

  18. 北 风 呼 啸. Translation options: 北/north, 北/northerly; 风/wind, 风/winds; 呼啸/whistles, 呼啸/strong. • word-to-word translation • no reordering

  19. 北 风 呼 啸. Translation options: 北/north, 北/northerly; 风/wind, 风/winds; 呼啸/whistles, 呼啸/strong. All translations: north wind whistles; northerly wind whistles; north wind strong; northerly winds whistles; north winds whistles; northerly winds strong; north winds strong; northerly wind strong. Notice: complexity is $O(2^L)$ for sentence length $L$.
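
The exhaustive enumeration on this slide can be sketched in a few lines of Python. This is only an illustration: the names `OPTIONS` and `enumerate_translations` are hypothetical, and it assumes 呼啸 is treated as a single source word, so that $L = 3$ and each word has two options.

```python
from itertools import product

# Translation options per source word, copied from the slide.
# Assumption: 呼啸 counts as one source word, so L = 3.
OPTIONS = [
    ("北", ["north", "northerly"]),
    ("风", ["wind", "winds"]),
    ("呼啸", ["whistles", "strong"]),
]

def enumerate_translations(options):
    """Return every monotone word-to-word translation as a string."""
    choices = [targets for _source, targets in options]
    # One choice per source word, no reordering: the Cartesian product.
    return [" ".join(words) for words in product(*choices)]

translations = enumerate_translations(OPTIONS)
print(len(translations))  # two options per word, three words
```

With two options per word the list doubles with every added source word, which is the $O(2^L)$ growth the slide points out.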

  20. 北 风 呼 啸. Weighted translation options: 北/north (.8), 北/northerly (.2); 风/wind (.6), 风/winds (.4); 呼啸/whistles (.7), 呼啸/strong (.3). Lattice: north | northerly, then wind | winds, then whistles | strong. Complexity is $O(L)$ for sentence length $L$.

  21. Lattice: north | northerly, then wind | winds, then whistles | strong.

  22. Lattice nodes [0], [1], [2], [3]. Edges: north | northerly from [0] to [1]; wind | winds from [1] to [2]; whistles | strong from [2] to [3].

  23. Rule instance on the lattice [0], [1], [2], [3]: $\dfrac{[1] \quad R(\text{风}/\text{wind})}{[2]}$

  24. Generalizing the instance $\dfrac{[1] \quad R(\text{风}/\text{wind})}{[2]}$: antecedent item $[i]$.

  25. Generalizing the instance $\dfrac{[1] \quad R(\text{风}/\text{wind})}{[2]}$: antecedents $[i]$ and $R(f_{i+1}/e_j)$.

  26. Generalizing the instance $\dfrac{[1] \quad R(\text{风}/\text{wind})}{[2]}$: $\dfrac{[i] \quad R(f_{i+1}/e_j)}{[i+1]}$

  27. Rule: $\dfrac{[i] \quad R(f_{i+1}/e_j)}{[i+1]}$, with instance $\dfrac{[1] \quad R(\text{风}/\text{wind})}{[2]}$ on the lattice [0], [1], [2], [3]. $i$ ranges over sentence length. Determine complexity from inspection (McAllester, Proc. Static Analysis 1999).
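
A minimal sketch of how the rule $[i],\ R(f_{i+1}/e_j) \vdash [i+1]$ might be executed, assuming Viterbi (max, times) weights and the rule weights shown on the earlier slide. The names `RULES` and `viterbi_monotone` are hypothetical; this is not taken from any particular decoder.

```python
# Weighted rules R(f_{i+1}/e), keyed by source position i (0-based).
# Weights are the ones shown on the slide.
RULES = {
    0: [("north", 0.8), ("northerly", 0.2)],
    1: [("wind", 0.6), ("winds", 0.4)],
    2: [("whistles", 0.7), ("strong", 0.3)],
}

def viterbi_monotone(rules, length):
    """Apply [i], R(f_{i+1}/e) |- [i+1] for i = 0..length-1.

    Keeps only the best-scoring derivation of each item [i].
    """
    best = {0: (1.0, [])}  # axiom: item [0], weight 1, empty output
    for i in range(length):
        weight, words = best[i]
        # Every rule instance proposes one derivation of item [i+1].
        candidates = [(weight * w, words + [e]) for e, w in rules[i]]
        best[i + 1] = max(candidates, key=lambda c: c[0])
    return best[length]

score, words = viterbi_monotone(RULES, 3)
print(score, " ".join(words))
```

The single loop over `i` mirrors the remark that complexity can be read off the rule: the only free variable ranging over the sentence is $i$, so the number of items is linear in sentence length.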

  28. Rule: $\dfrac{[i] \quad R(f_{i+1}/e_j)}{[i+1]}$. Semirings: Viterbi: $\langle [0,1], \max, \times \rangle$; Boolean: $\langle \{\top, \bot\}, \cup, \cap \rangle$; sum: $\langle [0,1], +, \times \rangle$. Reverse (outside) values. Compute many quantities on the same graph (Goodman, CL 1999).
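
To illustrate "many quantities on the same graph", here is a hedged sketch that runs one traversal of the monotone lattice under three semirings. The `Semiring` class, `goal_value`, and `lift` are hypothetical names; the Boolean semiring is written with Python's `or`/`and`, which is an assumption about how the slide's $\cup$ and $\cap$ are meant.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # combines alternative derivations
    times: Callable[[Any, Any], Any]  # combines steps of one derivation
    zero: Any                         # identity for plus
    one: Any                          # identity for times

VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0)
SUM = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)

# Rule weights from the earlier slide, keyed by source position i.
RULES = {
    0: [("north", 0.8), ("northerly", 0.2)],
    1: [("wind", 0.6), ("winds", 0.4)],
    2: [("whistles", 0.7), ("strong", 0.3)],
}

def goal_value(rules, length, semiring, lift=lambda w: w):
    """Value of the goal item [length] under the given semiring."""
    value = {0: semiring.one}  # axiom [0]
    for i in range(length):
        total = semiring.zero
        for _e, w in rules[i]:
            step = semiring.times(value[i], lift(w))
            total = semiring.plus(total, step)
        value[i + 1] = total
    return value[length]

best = goal_value(RULES, 3, VITERBI)
mass = goal_value(RULES, 3, SUM)
reachable = goal_value(RULES, 3, BOOLEAN, lift=lambda w: w > 0)
print(best, mass, reachable)
```

Only the `Semiring` argument changes between the three calls; the traversal of the deduction graph is shared.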

  29. Rule: $\dfrac{[i] \quad R(f_{i+1}/e_j)}{[i+1]}$. Further semirings: expectation semiring (Eisner 2002); approximation semiring (Gimpel & Smith 2009). Compute many quantities on the same graph (Goodman, CL 1999).
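
As a small illustration of the expectation semiring, the sketch below computes the expected number of times the word "winds" appears in a translation of the example sentence. Values are pairs $(p, r)$ with $(p_1, r_1) \oplus (p_2, r_2) = (p_1 + p_2,\ r_1 + r_2)$ and $(p_1, r_1) \otimes (p_2, r_2) = (p_1 p_2,\ p_1 r_2 + p_2 r_1)$. The function names and the choice of feature are assumptions made for the example.

```python
def e_plus(a, b):
    """Expectation-semiring addition on (p, r) pairs."""
    return (a[0] + b[0], a[1] + b[1])

def e_times(a, b):
    """Expectation-semiring multiplication on (p, r) pairs."""
    return (a[0] * b[0], a[0] * b[1] + b[0] * a[1])

# Rule weights from the earlier slide, keyed by source position i.
RULES = {
    0: [("north", 0.8), ("northerly", 0.2)],
    1: [("wind", 0.6), ("winds", 0.4)],
    2: [("whistles", 0.7), ("strong", 0.3)],
}

def expected_count(rules, length, feature):
    """Expected value of an additive feature over all derivations."""
    value = (1.0, 0.0)  # multiplicative identity, value of axiom [0]
    for i in range(length):
        total = (0.0, 0.0)  # additive identity
        for e, w in rules[i]:
            lifted = (w, w * feature(e))  # rule weight paired with w * f
            total = e_plus(total, e_times(value, lifted))
        value = total
    z, r = value
    return r / z  # normalize by total mass

winds_count = expected_count(RULES, 3, lambda e: 1.0 if e == "winds" else 0.0)
print(winds_count)
```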

  30. Basic Idea • Supply a logic and a semiring, get a complete algorithm. • Does it work for most translation models?

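  31. $\dfrac{[i'', V] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i', V \vee 0^i 1^{i'-i} 0^{I-i'}]} \quad V \wedge 0^i 1^{i'-i} 0^{I-i'} = 0^I,\ |i - i''| \le d$

  32. The same rule, annotated: $\dfrac{[i'', V] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i', V \vee 0^i 1^{i'-i} 0^{I-i'}]} \quad V \wedge 0^i 1^{i'-i} 0^{I-i'} = 0^I,\ |i - i''| \le d$. Antecedent item $[i'', V]$: previous last position translated and previous coverage vector. $R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})$: phrase pair. Side condition $|i - i''| \le d$: distortion limit. Side condition $V \wedge 0^i 1^{i'-i} 0^{I-i'} = 0^I$: only translate previously untranslated words. Consequent item: last position translated and coverage vector.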
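
One way to read this rule operationally is as a function from an antecedent item to a consequent item that returns nothing when a side condition fails. The sketch below assumes the coverage vector $V$ is stored as a Python integer bitmask in which bit $k$ stands for source word $f_{k+1}$; `phrase_mask` and `extend` are hypothetical names, and the phrase pair's target side and weight are omitted.

```python
def phrase_mask(i, i_prime):
    """Bitmask for source words f_{i+1}..f_{i'} (bits i..i'-1 set).

    Plays the role of 0^i 1^(i'-i) 0^(I-i') in the rule.
    """
    return ((1 << (i_prime - i)) - 1) << i

def extend(item, i, i_prime, d):
    """Apply the phrase-based rule to item = (last_position, coverage).

    Returns the consequent item, or None if a side condition fails.
    """
    last, coverage = item
    mask = phrase_mask(i, i_prime)
    if coverage & mask:       # V AND mask must be all zeros
        return None
    if abs(i - last) > d:     # distortion limit |i - i''| <= d
        return None
    return (i_prime, coverage | mask)  # [i', V OR mask]

start = (0, 0)                     # nothing translated yet
jumped = extend(start, 2, 3, d=2)  # translate f_3 first
done = extend(jumped, 0, 2, d=3)   # then go back for f_1 f_2
print(jumped, done)
```

For a three-word sentence, `done` has coverage `0b111`, which corresponds to the goal condition that every source word is translated exactly once.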

  33. Phrase-based Models

  34. Phrase-based Models. Max distortion d (see, e.g., Moore & Quirk 2007). Window length d (Moses; Hoang & Koehn, pc). First d uncovered (see, e.g., Tillman & Ney 2003, Zens & Ney 2004).

  35. Phrase-based Models.
      Max distortion d (see, e.g., Moore & Quirk 2007):
      $\dfrac{[i'', V] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i', V \vee 0^i 1^{i'-i} 0^{I-i'}]} \quad V \wedge 0^i 1^{i'-i} 0^{I-i'} = 0^I,\ |i - i''| \le d$
      Window length d (Moses; Hoang & Koehn, pc):
      $\dfrac{[i, C] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i', C \ll i' - i]} \quad C \wedge 1^{i'-i} 0^{d-i'+i} = 0^d,\ i' - i \le d$
      $\dfrac{[i, C] \quad R(f_{i'} \ldots f_{i''} / e_j \ldots e_{j'})}{[i, C \vee 0^{i'-i} 1^{i''-i'} 0^{d-i''+i}]} \quad C \wedge 0^{i'-i} 1^{i''-i'} 0^{d-i''+i} = 0^d,\ i'' - i \le d$
      First d uncovered (see, e.g., Tillman & Ney 2003, Zens & Ney 2004):
      $\dfrac{[i, U] \quad R(f_{i'} \ldots f_{i''} / e_j \ldots e_{j'})}{[i'', U - [i', i''] \vee [i'', i'' + d - |U - [i', i'']|]]} \quad i' > i,\ f_{i+1} \in U$
      $\dfrac{[i, U] \quad R(f_{i'} \ldots f_{i''} / e_j \ldots e_{j'})}{[i, U - [i', i''] \vee [\max(U \vee i) + 1, \max(U \vee i) + 1 + d - |U - [i', i'']|]]} \quad i' < i,\ [f_{i'}, f_{i''}] \subset U$

  38. Phrase-based Models, $d = 3$. Max distortion d (see, e.g., Moore & Quirk 2007). Window length d (Moses; Hoang & Koehn, pc). First d uncovered (see, e.g., Tillman & Ney 2003, Zens & Ney 2004).

  41. Phrase-based Models, $d = 3$. Max distortion d: $O(n^3 d^2)$ (see, e.g., Moore & Quirk 2007). Window length d: $O(n d^2 2^d)$ (Moses; Hoang & Koehn, pc). First d uncovered: $O\left(nd\binom{n}{d+1}\right)$ (see, e.g., Tillman & Ney 2003, Zens & Ney 2004).

  42. Phrase-based Models. These models are not the same. • Each can generate translations that the other cannot (regardless of $d$). • Different complexities. • Reported results will be impossible to replicate with your (different) strategy.

  43. Good News • Most translation models are a few lines of deductive logic. • Computation of any semiring for free. • You might conclude: give a logic and a semiring, get a complete algorithm.

  45. Result • Given: a logic and a semiring. • Get: a complete algorithm.

  46. Bad News • Our models use non-local features. • We need approximate search algorithms (and we need to be able to tweak them).

  47. Non-local features

  49. Non-local features: $\dfrac{[e_q, \ldots, e_{q+n-2}] \quad R(e_q, \ldots, e_{q+n-1})}{[e_{q+1}, \ldots, e_{q+n-1}]}$

  54. Non-local features. Minimal logic: $\dfrac{[i] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i']}$

  55. Non-local features. Minimal logic: $\dfrac{[i] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i']}$. Complete logic: $\dfrac{[i, e_{j-n+1}, \ldots, e_{j-1}] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'}) \quad R(e_{j-n+1}, \ldots, e_j) \ldots R(e_{j'-n+1} \ldots e_{j'})}{[i', e_{j'-n+2} \ldots e_{j'}]}$
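
The difference between the minimal and the complete logic can be sketched as a state update: the complete item carries the last $n-1$ target words, and applying a phrase pair triggers one $n$-gram rule per new target word. The function `extend_with_lm` and the `<s>` start-padding are assumptions made for this illustration; weights are omitted.

```python
def extend_with_lm(item, i_prime, target_words, n):
    """Apply the complete-logic rule for monotone phrase-based translation.

    item is (i, context), where context holds the previous n-1 target words.
    Returns the consequent item and the n-grams that must be scored.
    """
    _i, context = item
    history = list(context) + list(target_words)
    # One n-gram per newly produced target word e_j .. e_j'.
    ngrams = [tuple(history[k:k + n]) for k in range(len(history) - n + 1)]
    # The new item remembers only the last n-1 target words.
    new_context = tuple(history[-(n - 1):]) if n > 1 else ()
    return (i_prime, new_context), ngrams

# Assumption: the initial context is padded with start symbols.
start = (0, ("<s>", "<s>"))
item, scored = extend_with_lm(start, 2, ["north", "wind"], n=3)
print(item, scored)
```

Because the context is part of the item, two derivations that cover the same source words but end in different target words are no longer merged, which is why the complete logic has many more items than the minimal one.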
