Translation as Weighted Deduction
Adam Lopez
University of Edinburgh

Moses (Koehn et al., ACL 2007) vs. Hiero (Chiang, CL 2007)
• rules: phrase-based vs. hierarchical phrase-based
• parameters: 15 features vs. 5 features
• search: stack decoding vs. cube pruning
• results: 30.7 vs. 32.6 (Lopez, Coling 2008)

• rules: phrase-based; hierarchical phrase-based; synchronous TAG
• parameters: 15 features; 5 features; 5 features
• search: stack decoding; cube pruning; cube pruning

This talk is not about: how to improve your BLEU score by 1.9.

This talk is about: building and analyzing translation models and algorithms in a modular way.

• rules (phrase-based): deductive logic
• parameters (15 features): semiring
• search (stack decoding): (hyper)graph search algorithms

北 风 呼 啸
• word-to-word translation
• no reordering
Translation options: 北/north, 北/northerly; 风/wind, 风/winds; 呼啸/whistles, 呼啸/strong

Candidate translations: north wind whistles; northerly wind whistles; north wind strong; northerly winds whistles; north winds whistles; northerly winds strong; north winds strong; northerly wind strong
Notice: complexity is $O(2^L)$ for sentence length $L$.

Weighted translation options: 北/north (.8), 北/northerly (.2); 风/wind (.6), 风/winds (.4); 呼啸/whistles (.7), 呼啸/strong (.3)
Graph edge labels: north, northerly; wind, winds; whistle, strong
Complexity is $O(L)$ for sentence length $L$.

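The contrast between enumerating every translation and scoring the packed graph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `OPTIONS` list is a hypothetical encoding of the slide's weighted options, and the helper names are made up for this example.

```python
from itertools import product

# Hypothetical encoding of the slide's weighted options, one list per source unit.
OPTIONS = [
    [("north", 0.8), ("northerly", 0.2)],   # 北
    [("wind", 0.6), ("winds", 0.4)],        # 风
    [("whistles", 0.7), ("strong", 0.3)],   # 呼啸
]

def enumerate_all(options):
    """Brute force: build every candidate translation (2^L of them here)."""
    candidates = []
    for combo in product(*options):
        score = 1.0
        for _word, weight in combo:
            score *= weight
        candidates.append((score, [word for word, _ in combo]))
    return candidates

def best_translation(options):
    """One pass over positions: without reordering, the best choice at each
    position is independent of the others, so the work is linear in L."""
    score, words = 1.0, []
    for choices in options:
        word, weight = max(choices, key=lambda c: c[1])
        score *= weight
        words.append(word)
    return score, words

score, words = best_translation(OPTIONS)
print(" ".join(words), round(score, 3))
```

Both functions agree on the best translation; only the amount of work differs.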
Graph nodes: [0] [1] [2] [3]; edge labels: north, northerly; wind, winds; whistle, strong

Example step: from [1] and R(风/wind), deduce [2].

General rule:
$$\frac{[i] \quad R(f_{i+1}/e_j)}{[i+1]}$$
$i$ ranges over sentence length.

Determine complexity from inspection (McAllester, Proc. Static Analysis 1999).

$$\frac{[i] \quad R(f_{i+1}/e_j)}{[i+1]}$$
Compute many quantities on same graph (Goodman, CL 1999):
• Viterbi: $\langle [0,1], \max, \times \rangle$
• Boolean: $\langle \{\top, \bot\}, \cup, \cap \rangle$
• sum: $\langle [0,1], +, \times \rangle$
• Reverse (outside) values
• Expectation semiring (Eisner 2002)
• Approximation semiring (Gimpel & Smith 2009)

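To make "many quantities on the same graph" concrete, here is a minimal Python sketch that evaluates the monotone rule under three interchangeable semirings. The `Semiring` tuple, the `lift` field (which maps a rule weight into the semiring's value set) and the `OPTIONS` encoding are assumptions introduced for this example, not part of the talk.

```python
from collections import namedtuple

# plus/times are the semiring operations, zero/one their identities;
# lift maps a raw rule weight into the semiring (an assumption of this sketch).
Semiring = namedtuple("Semiring", "plus times zero one lift")

VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0, lambda w: w)
SUM = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0, lambda w: w)
BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b,
                   False, True, lambda w: w > 0)

# Hypothetical encoding of the weighted word-to-word options.
OPTIONS = [
    [("north", 0.8), ("northerly", 0.2)],
    [("wind", 0.6), ("winds", 0.4)],
    [("whistles", 0.7), ("strong", 0.3)],
]

def goal_value(options, sr):
    """Value of the final item [L], applying [i], R(f_{i+1}/e_j) => [i+1]."""
    value = sr.one                      # axiom [0]
    for choices in options:             # item [i] -> item [i+1]
        total = sr.zero
        for _word, weight in choices:   # one rule instance per option
            total = sr.plus(total, sr.times(value, sr.lift(weight)))
        value = total
    return value

viterbi = goal_value(OPTIONS, VITERBI)   # weight of the best translation
total = goal_value(OPTIONS, SUM)         # total weight of all translations
derivable = goal_value(OPTIONS, BOOLEAN) # is any translation derivable?
```

The traversal code never changes; only the semiring passed in does.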
Basic Idea
• Supply a logic and a semiring, get a complete algorithm.
• Does it work for most translation models?

$$\frac{[i'', V] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i', V \vee 0^{i} 1^{i'-i} 0^{I-i'}]} \qquad V \wedge 0^{i} 1^{i'-i} 0^{I-i'} = 0^{I},\; |i - i''| \le d$$
• $[i'', V]$: previous last position translated, previous coverage vector
• $R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})$: phrase pair
• $|i - i''| \le d$: distortion limit
• $V \wedge 0^{i} 1^{i'-i} 0^{I-i'} = 0^{I}$: only translate previously untranslated words
• $[i', V \vee 0^{i} 1^{i'-i} 0^{I-i'}]$: last position translated, coverage vector

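One way to read this rule operationally is as a function from an item and a phrase span to a new item, with the two side conditions as guards. The sketch below is an assumption-laden illustration: it stores the coverage vector as an integer bitmask, ignores the target side and all weights, and the names `span_mask`, `extend` and `reachable_items` are hypothetical.

```python
def span_mask(i, i_end):
    """Bitmask with bits i..i_end-1 set, standing in for 0^i 1^(i'-i) 0^(I-i')."""
    return ((1 << (i_end - i)) - 1) << i

def extend(item, span, d):
    """Apply the rule once; return the consequent item, or None if a side
    condition fails. item = (last position translated, coverage bitmask)."""
    last, coverage = item
    i, i_end = span                 # phrase covers source words i+1 .. i_end
    mask = span_mask(i, i_end)
    if coverage & mask:             # V AND mask must be all zeros
        return None
    if abs(i - last) > d:           # distortion limit |i - i''| <= d
        return None
    return (i_end, coverage | mask)

def reachable_items(spans, d):
    """Forward-chain from the axiom (0, empty coverage) to all derivable items."""
    start = (0, 0)
    agenda, seen = [start], {start}
    while agenda:
        item = agenda.pop()
        for span in spans:
            new_item = extend(item, span, d)
            if new_item is not None and new_item not in seen:
                seen.add(new_item)
                agenda.append(new_item)
    return seen

# Three single-word spans over a three-word sentence.
SPANS = [(0, 1), (1, 2), (2, 3)]
monotone_items = reachable_items(SPANS, d=0)
```

With `d=0` only the monotone derivation survives; raising `d` lets more coverage vectors become reachable, which is where the exponential growth in items comes from.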
Phrase-based Models

Max distortion d (see, e.g. Moore & Quirk 2007)
$$\frac{[i'', V] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i', V \vee 0^{i} 1^{i'-i} 0^{I-i'}]} \qquad V \wedge 0^{i} 1^{i'-i} 0^{I-i'} = 0^{I},\; |i - i''| \le d$$

Window length d (Moses; Hoang & Koehn, pc)
$$\frac{[i, C] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i', C \ll i' - i]} \qquad C \wedge 1^{i'-i} 0^{d-i'+i} = 0^{d},\; i' - i \le d$$
$$\frac{[i, C] \quad R(f_{i'} \ldots f_{i''} / e_j \ldots e_{j'})}{[i, C \vee 0^{i'-i} 1^{i''-i'} 0^{d-i''+i}]} \qquad C \wedge 0^{i'-i} 1^{i''-i'} 0^{d-i''+i} = 0^{d},\; i'' - i \le d$$

First d uncovered (see, e.g. Tillman & Ney 2003, Zens & Ney 2004)
$$\frac{[i, U] \quad R(f_{i'} \ldots f_{i''} / e_j \ldots e_{j'})}{[i'', U - [i', i''] \vee [i'', i'' + d - |U - [i', i'']|]]} \qquad i' > i,\; f_{i+1} \in U$$
$$\frac{[i, U] \quad R(f_{i'} \ldots f_{i''} / e_j \ldots e_{j'})}{[i, U - [i', i''] \vee [\max(U \vee i) + 1, \max(U \vee i) + 1 + d - |U - [i', i'']|]]} \qquad i' < i,\; [f_{i'}, f_{i''}] \subset U$$

Phrase-based Models, d = 3
• Max distortion d: $O(n^3 d^2)$ (see, e.g. Moore & Quirk 2007)
• Window length d: $O(n d^2 2^d)$ (Moses; Hoang & Koehn, pc)
• First d uncovered: $O\left(n d \binom{n}{d+1}\right)$ (see, e.g. Tillman & Ney 2003, Zens & Ney 2004)

Phrase-based Models
These models are not the same.
• Each can generate translations that the other cannot (regardless of d).
• Different complexities.
• Reported results will be impossible to replicate with your (different) strategy.

Good News
• Most translation models are a few lines of deductive logic.
• Computation of any semiring for free.
• You might conclude: give a logic and a semiring, get a complete algorithm.

Result
• Given:
  • A logic
  • A semiring
• Get: a complete algorithm

Bad News
• Our models use non-local features.
• We need approximate search algorithms (and we need to be able to tweak them).

Non-local features
$$\frac{[e_q, \ldots, e_{q+n-2}] \quad R(e_q, \ldots, e_{q+n-1})}{[e_{q+1}, \ldots, e_{q+n-1}]}$$

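The n-gram rule can be read as a sliding window over the target words: each item holds the last n−1 words, and each rule instance consumes one n-gram. A minimal sketch follows, assuming n ≥ 2; the `BIGRAMS` weights are invented toy numbers and `lm_score` is a hypothetical helper, not something from the talk.

```python
def lm_score(words, ngram_weight, n):
    """Chain the rule [e_q..e_{q+n-2}], R(e_q..e_{q+n-1}) => [e_{q+1}..e_{q+n-1}]
    across a word sequence, multiplying the n-gram weights (assumes n >= 2)."""
    item = tuple(words[:n - 1])            # axiom: the first n-1 words
    score = 1.0
    trace = [item]
    for q in range(len(words) - n + 1):
        ngram = tuple(words[q:q + n])      # antecedent item plus one new word
        score *= ngram_weight(ngram)
        item = ngram[1:]                   # consequent: shift the window by one
        trace.append(item)
    return score, trace

# Toy bigram weights, made up for illustration only.
BIGRAMS = {("north", "wind"): 0.5, ("wind", "whistles"): 0.4}

def toy_weight(ngram):
    return BIGRAMS.get(ngram, 0.01)

lm, items = lm_score(["north", "wind", "whistles"], toy_weight, n=2)
```

The weight of an n-gram depends on words chosen by earlier rule applications, which is what makes the feature non-local with respect to the translation logic.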
minimal logic
$$\frac{[i] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'})}{[i']}$$

complete logic
$$\frac{[i, e_{j-n+1}, \ldots, e_{j-1}] \quad R(f_{i+1} \ldots f_{i'} / e_j \ldots e_{j'}) \quad R(e_{j-n+1}, \ldots, e_j) \ldots R(e_{j'-n+1} \ldots e_{j'})}{[i', e_{j'-n+2} \ldots e_{j'}]}$$

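The difference between the two logics shows up in how many distinct items they create. The sketch below is a rough illustration, assuming n ≥ 2, monotone phrases and a constant toy n-gram weight; the phrase tuples, the `<s>` start symbol and both function names are hypothetical.

```python
def extend_minimal(item, phrase):
    """Minimal logic: the item is just the source position [i]."""
    start, end, _target, weight = phrase
    if start != item:
        return None
    return end, weight

def extend_complete(item, phrase, ngram_weight, n):
    """Complete logic: the item also carries the last n-1 target words, and
    every n-gram completed by the phrase contributes a weight (assumes n >= 2)."""
    position, context = item
    start, end, target, weight = phrase
    if start != position:
        return None
    history = list(context)
    for word in target:
        ngram = tuple(history[-(n - 1):] + [word])
        weight *= ngram_weight(ngram)
        history.append(word)
    return (end, tuple(history[-(n - 1):])), weight

# Hypothetical phrases: (source start, source end, target words, weight).
PHRASES = [(0, 1, ("north",), 0.8), (0, 1, ("northerly",), 0.2)]

def flat(ngram):
    return 0.5   # constant toy n-gram weight, for illustration only

minimal = {extend_minimal(0, p)[0] for p in PHRASES}
complete = {extend_complete((0, ("<s>",)), p, flat, 2)[0] for p in PHRASES}
```

Here `minimal` contains a single item while `complete` contains one item per distinct target context, so the state space of the complete logic grows with the target vocabulary.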