Forced Decoding
• used both for data selection (it keeps the more literal sentence pairs) and for finding oracle derivations
• example: Bushi yu Shalong juxing le huitan → Bush held talks with Sharon
[Figure: gold derivation lattice over coverage vectors, target positions 0 to 6. Hypotheses that diverge from the reference (e.g. "... meeting", "... talk", "... Shalong") are pruned; one gold derivation is highlighted.]
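Forced decoding can be sketched as a search that only keeps hypotheses whose output still matches the reference. The Python below is a minimal sketch, not the authors' decoder. It assumes a hypothetical toy phrase table `TABLE` keyed by romanized source phrases, measures distortion as |start of the next source phrase − end of the previous one|, and does no beam pruning. It counts the derivations that produce exactly the reference; a count of 0 means the pair is unreachable.

```python
from functools import lru_cache

# Hypothetical toy phrase table for the slide's example (romanized source side).
TABLE = {
    ("Bushi",): {("Bush",)},
    ("yu",): {("with",)},
    ("Shalong",): {("Sharon",)},
    ("yu", "Shalong"): {("with", "Sharon")},
    ("juxing",): {("held",)},
    ("le", "huitan"): {("talks",)},
    ("juxing", "le", "huitan"): {("held", "talks")},
}


def count_gold_derivations(src, ref, table, distortion_limit=None):
    """Count phrase-based derivations that turn src into exactly ref.

    0 means forced decoding fails (the pair is unreachable).
    distortion_limit=None allows unlimited reordering.
    """
    n, m = len(src), len(ref)
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def count(coverage, last_end, j):
        # coverage: bitmask of source words already translated
        # last_end: exclusive end index of the previous source phrase
        # j: number of reference words produced so far
        if coverage == full:
            return 1 if j == m else 0
        total = 0
        for i in range(n):
            if (coverage >> i) & 1:
                continue
            if distortion_limit is not None and abs(i - last_end) > distortion_limit:
                continue
            for k in range(i + 1, n + 1):
                if (coverage >> (k - 1)) & 1:
                    break  # the span would overlap an already translated word
                for tgt in table.get(tuple(src[i:k]), ()):
                    # keep only hypotheses that continue to match the reference
                    if tuple(ref[j:j + len(tgt)]) == tgt:
                        span = ((1 << k) - 1) ^ ((1 << i) - 1)
                        total += count(coverage | span, k, j + len(tgt))
        return total

    return count(0, 0, 0)


src = "Bushi yu Shalong juxing le huitan".split()
ref = "Bush held talks with Sharon".split()
print(count_gold_derivations(src, ref, TABLE))                      # 4
print(count_gold_derivations(src, ref, TABLE, distortion_limit=4))  # 0
```

On this toy table the pair has 4 gold derivations with unlimited reordering but none under a distortion limit of 4, because translating "yu Shalong" after "juxing le huitan" requires a jump of 5.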
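Unreachable Sentences and Prefix
• the distortion limit makes some sentence pairs unreachable (hiero would be better here)
• but we can still use the reachable prefix pairs of unreachable pairs
[Figure: an unreachable pair. Chinese words (order not recoverable from the extraction): 玻利维亚, 观察员, 联合国, 大选, 派遣, 监督, 50 名, 民主, 以来, 首次, 全国, 恢复, 政治. Reference: "U.N. sent 50 observers to monitor the 1st election since Bolivia restored democracy", with numeric labels 5, 3, 3, 4 on the alignment (likely reordering distances).]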
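One plausible reading of "reachable prefix pairs": a reference prefix y[:j] is reachable if some partial derivation, covering any subset of the source words, emits exactly y[:j] within the distortion limit. The sketch below assumes a hypothetical toy phrase table `TABLE` for the slide's running example and measures distortion as |start of the next source phrase − end of the previous one|; the function name is illustrative.

```python
# Hypothetical toy phrase table for the slide's running example (romanized source).
TABLE = {
    ("Bushi",): {("Bush",)},
    ("yu",): {("with",)},
    ("Shalong",): {("Sharon",)},
    ("yu", "Shalong"): {("with", "Sharon")},
    ("juxing",): {("held",)},
    ("le", "huitan"): {("talks",)},
    ("juxing", "le", "huitan"): {("held", "talks")},
}


def longest_reachable_prefix(src, ref, table, distortion_limit):
    """Largest j such that some partial derivation emits exactly ref[:j]."""
    n = len(src)
    best = 0
    seen = set()
    stack = [(0, 0, 0)]  # (coverage bitmask, exclusive end of last phrase, words emitted)
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        coverage, last_end, j = state
        best = max(best, j)
        for i in range(n):
            if (coverage >> i) & 1 or abs(i - last_end) > distortion_limit:
                continue
            for k in range(i + 1, n + 1):
                if (coverage >> (k - 1)) & 1:
                    break  # the span would overlap a translated word
                for tgt in table.get(tuple(src[i:k]), ()):
                    if tuple(ref[j:j + len(tgt)]) == tgt:
                        span = ((1 << k) - 1) ^ ((1 << i) - 1)
                        stack.append((coverage | span, k, j + len(tgt)))
    return best


src = "Bushi yu Shalong juxing le huitan".split()
ref = "Bush held talks with Sharon".split()
j = longest_reachable_prefix(src, ref, TABLE, distortion_limit=4)
print(j, ref[:j])  # 3 ['Bush', 'held', 'talks']
```

Under a distortion limit of 4 the full pair is unreachable, yet the prefix pair ending at "Bush held talks" remains usable. Note that j equal to the reference length does not by itself prove the whole source was covered.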
Sentence/Word Reachability Ratio
• how many sentence pairs pass forced decoding?
• the ratio drops dramatically as sentences get longer
• prefixes boost coverage
[Figure: two panels plotting the ratio of complete coverage (0% to 100%) against sentence length (10 to 70). Left legend: Distortion-unlimit, Distortion-limit 6, 4, 2, 0. Right legend: dist-6, dist-4, dist-2, dist-0.]
Number of Gold Derivations
• exponential in sentence length (on fully reachable sentences)
• these are the “latent variables” in learning
[Figure: average number of gold derivations (0 to 100,000) against sentence length (5 to 50) for dist-6, dist-4, dist-2, dist-0.]
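A simple count shows why the number of gold derivations explodes. Under a simplifying assumption that is not from the slides (monotone order, and every source span up to a maximum phrase length has exactly one translation consistent with the reference), the gold derivations are just the segmentations of the sentence, which a linear-time recurrence counts:

```python
def count_segmentations(n, max_phrase_len):
    """Ways to split n words into contiguous phrases of at most max_phrase_len words."""
    ways = [1] + [0] * n  # ways[j] = number of segmentations of the first j words
    for j in range(1, n + 1):
        for length in range(1, min(max_phrase_len, j) + 1):
            ways[j] += ways[j - length]
    return ways[n]


for n in (5, 10, 20):
    print(n, count_segmentations(n, n), count_segmentations(n, 3))
```

With unlimited phrase length this gives 2^(n−1) derivations for n words (16 for n = 5); even capping phrases at 3 words yields 274 for n = 10 and 121,415 for n = 20. Allowing reordering can only add derivations.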
Outline
• Background: Phrase-based Translation (Koehn, 2004)
• Forced Decoding
• Violation-Fixing Perceptron for MT Training
  • Update strategy
  • Feature design
• Experiments
Structured Perceptron (Collins 02)
• binary classification: x → exact inference → z; update weights w if y ≠ z, with y = +1 or −1 (constant # of classes)
• structured classification: x = 那 人 咬 了 狗, y = the man bit the dog; exponential # of classes, so inference becomes inexact
• challenges in applying perceptron to MT
  • the inference (decoding) is vastly inexact (beam search)
  • we know standard perceptron doesn't work for MT
  • intuition: the learner should fix the search error first
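The update rule on this slide fits in a few lines. This is a minimal sketch assuming a hypothetical feature map `phi` (source/target word pairs plus target bigrams) and exact inference by brute force over a small candidate list; real MT decoding cannot enumerate its exponentially many outputs, which is exactly the challenge listed above.

```python
from collections import Counter


def phi(x, y):
    """Hypothetical feature map: (source word, target word) pairs plus target bigrams."""
    feats = Counter((s, t) for s in x for t in y)
    feats.update(zip(y, y[1:]))
    return feats


def score(w, x, y):
    return sum(w[f] * v for f, v in phi(x, y).items())


def train(data, candidates, epochs=5):
    """Structured perceptron: exact argmax inference, then update if z != y."""
    w = Counter()
    for _ in range(epochs):
        for x, y in data:
            z = max(candidates(x), key=lambda c: score(w, x, c))  # exact inference
            if z != y:
                w.update(phi(x, y))    # reward the gold features
                w.subtract(phi(x, z))  # penalize the predicted features
    return w


x = tuple("那 人 咬 了 狗".split())
y = tuple("the man bit the dog".split())
wrong = tuple("the dog bit the man".split())
w = train([(x, y)], lambda _: [wrong, y])
print(score(w, x, y), score(w, x, wrong))  # 1 -1
```

Starting from zero weights, the scrambled candidate wins the first tie; the single update then rewards the bigram (man, bit) and penalizes (dog, bit), so the reference is preferred from the second epoch on.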
Search Error: Gold Derivations Pruned
[Figure: top, the gold derivation lattice for "Bush held talks with Sharon" (coverage vectors, positions 0 to 6); bottom, real decoding with beam search, in which the gold derivations are pruned from the beam. Annotation: "should fix search errors here!"]
Fixing Search Error 1: Early Update
• early update (Collins & Roark, 2004): update when the correct sequence falls off the beam
  • up to this point the incorrect prefix scores higher than the correct one
  • that's a "violation" which we want to fix
• standard perceptron does not guarantee a violation
  • with pruning, the correct sequence might score higher at the end!
  • this is called an "invalid" update because it doesn't fix the search error
[Figure: model score against decoding step. Early update fires where the correct sequence falls off the beam (pruned); a violation is guaranteed there, since the incorrect prefix scores higher up to this point. The standard update at the end of the sequence comes with no guarantee.]
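Early update can be sketched on a toy incremental model. All of the following are hypothetical simplifications: outputs are built left to right with one token per source word (a monotone, tagging-like stand-in for phrase-based decoding), features are emission and bigram counts, and the beam keeps the top `beam_size` prefixes. When the gold prefix is pruned, the update uses the prefixes at that step and decoding stops.

```python
from collections import Counter


def phi(x, prefix):
    """Hypothetical local features of an output prefix: emission and bigram counts."""
    feats = Counter()
    prev = "<s>"
    for xi, yi in zip(x, prefix):
        feats[("emit", xi, yi)] += 1
        feats[("bigram", prev, yi)] += 1
        prev = yi
    return feats


def score(w, x, prefix):
    return sum(w[f] * v for f, v in phi(x, prefix).items())


def early_update_step(w, x, y, vocab, beam_size):
    """Decode x with beam search and apply an early update; True if w changed.

    Assumes every gold token is in vocab, so the gold prefix is always a candidate
    while its predecessor survives.
    """
    y = tuple(y)
    beam = [()]
    for i in range(len(x)):
        candidates = [p + (t,) for p in beam for t in vocab]
        candidates.sort(key=lambda p: score(w, x, p), reverse=True)
        beam = candidates[:beam_size]
        gold = y[: i + 1]
        if gold not in beam:
            # The gold prefix was pruned, so the best prefix in the beam scores at
            # least as high: update on these prefixes and stop decoding.
            w.update(phi(x, gold))
            w.subtract(phi(x, beam[0]))
            return True
    if beam[0] != y:
        # Gold survived to the end but is not the top hypothesis: full update.
        w.update(phi(x, y))
        w.subtract(phi(x, beam[0]))
        return True
    return False


w = Counter()
x, y, vocab = ("a", "b", "c"), ("A", "B", "C"), ("A", "B", "C")
print([early_update_step(w, x, y, vocab, beam_size=1) for _ in range(3)])
```

With beam size 1, the first pass stops at the second word and the second pass at the third; the third pass decodes y without any update, so the call prints [True, True, False].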
Early Update w/ Latent Variable
• the gold-standard derivations are not annotated
• we treat any reference-producing derivation as good
[Figure: left, the gold derivation lattice for "Bush held talks with Sharon" (positions 0 to 6); right, model score against decoding step. Update early, at the point where all correct derivations fall off the beam (violation guaranteed: the incorrect prefix scores higher up to this point), then stop decoding.]
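With latent derivations, "correct" means any partial derivation whose output so far is a prefix of the reference. A minimal sketch, assuming the decoder beams and the forced-decoding beams of correct derivations are available per step as lists of hypothetical `Hyp` records with precomputed scores and feature counts. In a real decoder this check would run inside the search loop, so that decoding can actually stop at the update step.

```python
from collections import Counter
from typing import NamedTuple


class Hyp(NamedTuple):
    """Hypothetical partial-derivation record: model score, output so far, features."""
    score: float
    output: tuple
    feats: Counter


def is_correct(hyp, ref):
    # Latent-variable view: any derivation whose output is a prefix of the reference is good.
    return tuple(ref[: len(hyp.output)]) == hyp.output


def latent_early_update(w, beams, gold_beams, ref):
    """beams[i]: decoder beam at step i; gold_beams[i]: correct derivations at step i.

    Updates w at the first step where every correct derivation has fallen off
    the decoder beam and returns that step, or None if there is no such step.
    """
    for step, (beam, gold) in enumerate(zip(beams, gold_beams)):
        if gold and not any(is_correct(h, ref) for h in beam):
            plus = max(gold, key=lambda h: h.score)   # best correct prefix
            minus = max(beam, key=lambda h: h.score)  # best prefix in the beam
            w.update(plus.feats)
            w.subtract(minus.feats)
            return step  # stop decoding here
    return None


ref = ("Bush", "held", "talks", "with", "Sharon")
beams = [
    [Hyp(1.0, ("Bush",), Counter(r1=1))],
    [Hyp(2.0, ("Bush", "talk"), Counter(r1=1, bad=1))],
]
gold_beams = [
    [Hyp(1.0, ("Bush",), Counter(r1=1))],
    [Hyp(1.5, ("Bush", "held"), Counter(r1=1, good=1))],
]
w = Counter()
print(latent_early_update(w, beams, gold_beams, ref), dict(w))
```

Because the positive side is the best of all correct prefixes rather than one fixed gold derivation, the learner is free to pick whichever derivation of the reference the model currently likes best.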
Fixing Search Error 2: Max-Violation
[Figure: model score against decoding step i (up to |x|), showing the best and worst hypotheses in the beam, the best-in-beam prefix d⁻ᵢ and the correct prefix d⁺ᵢ. Update points are marked for early, max-violation (at i*), local, and standard (at |x|, where the update is invalid).]
• early update works but learns slowly due to partial updates
• max-violation: use the prefix where the violation is maximum
  • the "worst mistake" in the search space
• we call these methods "violation-fixing perceptrons" (Huang et al., 2012)
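Given per-step scores, choosing the update step is simple. Writing d⁻ᵢ for the best prefix in the beam and d⁺ᵢ for the best correct prefix at step i, max-violation picks i* = argmaxᵢ w·[Φ(x, d⁻ᵢ) − Φ(x, d⁺ᵢ)], while early update picks the first step where the correct prefix falls below the worst hypothesis in the beam. This sketch assumes those scores are already computed and the beam is full; treating ties as "no violation" is a simplification of this sketch.

```python
def early_step(plus_scores, beam_worst_scores):
    """First step where the correct prefix scores below the worst hypothesis in a full beam."""
    for i, (plus, worst) in enumerate(zip(plus_scores, beam_worst_scores)):
        if plus < worst:
            return i
    return None


def max_violation_step(plus_scores, minus_scores):
    """Step i* maximizing score(d-_i) - score(d+_i); None if no step is a violation."""
    i_star = max(range(len(plus_scores)), key=lambda i: minus_scores[i] - plus_scores[i])
    return i_star if minus_scores[i_star] > plus_scores[i_star] else None


plus = [1.0, 1.5, 2.0, 2.2]   # w . Phi(x, d+_i): best correct prefix
minus = [1.0, 2.0, 3.5, 3.0]  # w . Phi(x, d-_i): best prefix in the beam
worst = [0.5, 1.6, 3.0, 2.9]  # worst prefix in the beam
print(early_step(plus, worst), max_violation_step(plus, minus))  # 1 2
```

Here early update fires at step 1 with a violation of 0.5, whereas max-violation waits for step 2, where the violation is 1.5; the larger, longer update is why max-violation tends to learn faster.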
Early Update vs. Max-Violation
[Figure: the beam-search lattice over positions 0 to 6, marking where early update and max-violation each perform their update.]
Latent-Variable Perceptron
[Figure: model score against decoding step for the best and worst hypotheses in the beam and the correct sequence, which falls off the beam. Marked update points: standard/full (invalid update!), early (where the correct sequence falls off the beam), max-violation (biggest violation), and latest (last valid update).]
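The two remaining strategies in this figure can be checked from per-step scores as well. A sketch assuming precomputed scores of the correct prefix d⁺ᵢ and the best-in-beam prefix d⁻ᵢ at each step (function names are hypothetical): "latest" updates at the last step that is still a violation, and the standard full update is only valid if the final best-in-beam hypothesis outscores the full correct derivation.

```python
def latest_step(plus_scores, minus_scores):
    """Last step where the best prefix in the beam still outscores the correct prefix."""
    for i in reversed(range(len(plus_scores))):
        if minus_scores[i] > plus_scores[i]:
            return i
    return None


def standard_update_is_valid(plus_scores, minus_scores):
    """The full update at the final step fixes a search error only if it is a violation."""
    return minus_scores[-1] > plus_scores[-1]


plus = [1.0, 1.5, 2.0, 4.0]   # correct prefix d+_i
minus = [1.0, 2.0, 3.5, 3.0]  # best prefix in the beam d-_i
print(latest_step(plus, minus), standard_update_is_valid(plus, minus))  # 2 False
```

In this example the correct sequence outscores the beam at the end (4.0 vs. 3.0), so the standard update is invalid, exactly the situation in the figure, while the latest valid update sits at step 2.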
Roadmap of the techniques
• structured perceptron (Collins, 2002)
  • perceptron w/ inexact search (Collins & Roark, 2004; Huang et al., 2012)
  • latent-variable perceptron (Zettlemoyer and Collins, 2005; Sun et al., 2009)
    • latent-variable perceptron w/ inexact search (Yu et al., 2013)
      • hiero, syntactic parsing, semantic parsing, transliteration
Feature Design
• Dense features:
  • standard phrase-based features (Koehn, 2004)
• Sparse features:
  • rule-identification features (a unique id for each rule)
  • word-edges features
    • lexicalized local translation context within a rule
  • non-local features
    • dependency between consecutive rules
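As an illustration of the sparse templates, the sketch below emits rule-identification features and one simple instantiation of the non-local template: a feature for each pair of consecutive rules. The string formats and the pairing scheme are assumptions for illustration, not the paper's exact feature definitions; rules are given as hypothetical (source phrase, target phrase) pairs in decoding order.

```python
from collections import Counter


def sparse_features(derivation):
    """derivation: rules as (source phrase, target phrase) pairs, in decoding order.

    Emits hypothetical feature strings: one rule-id feature per rule (local) and
    one feature per pair of consecutive rules (non-local).
    """
    feats = Counter()
    prev = "<s>"
    for src, tgt in derivation:
        rule_id = f"{src}|||{tgt}"
        feats["RuleId=" + rule_id] += 1
        feats["RulePair=" + prev + "=>" + rule_id] += 1
        prev = rule_id
    return feats


derivation = [("Bushi", "Bush"), ("juxing le huitan", "held talks"), ("yu Shalong", "with Sharon")]
for name, value in sparse_features(derivation).items():
    print(name, value)
```

The rule-pair feature cannot be scored from a single rule in isolation, which is what makes it non-local: the decoder has to remember the previous rule in its state.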
WordEdges Features (local)
Example: <s> 布什 与 沙龙 举行 了 会谈 </s> → <s> Bush held a few talks, built from rules r1 (布什 → Bush) and r2 (举行 了 会谈 → held a few talks)
• the first and last Chinese words in the rule
• the first and last English words in the rule
• the two Chinese words surrounding the rule
Combo features (for r2):
• 100010=沙龙|held
• 010001=举行|talks
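The atomic WordEdges features and their pairwise combinations can be sketched as follows. Assumptions: the order of the six atoms (left neighbor, first and last Chinese word, right neighbor, first and last English word) was chosen because it reproduces the slide's two masks; combinations are pairs, as in both examples; and the source is romanized as on the Forced Decoding slide. Function names are hypothetical.

```python
from itertools import combinations


def word_edges_atoms(src, tgt_phrase, i, k):
    """Six atomic WordEdges features of a rule covering src[i:k] and emitting tgt_phrase.

    src must include the boundary markers <s> and </s>. The atom order is an
    assumption that reproduces the slide's masks 100010 and 010001.
    """
    return [
        src[i - 1],      # Chinese word just left of the rule
        src[i],          # first Chinese word in the rule
        src[k - 1],      # last Chinese word in the rule
        src[k],          # Chinese word just right of the rule
        tgt_phrase[0],   # first English word in the rule
        tgt_phrase[-1],  # last English word in the rule
    ]


def combo_features(atoms, order=2):
    """Conjoin every `order` atoms; the bit mask records which atoms are used."""
    feats = []
    for chosen in combinations(range(len(atoms)), order):
        mask = "".join("1" if j in chosen else "0" for j in range(len(atoms)))
        feats.append(mask + "=" + "|".join(atoms[j] for j in chosen))
    return feats


src = ["<s>", "Bushi", "yu", "Shalong", "juxing", "le", "huitan", "</s>"]
atoms = word_edges_atoms(src, ["held", "a", "few", "talks"], i=4, k=7)  # rule r2
combos = combo_features(atoms)
print(atoms)
print([f for f in combos if f in ("100010=Shalong|held", "010001=juxing|talks")])
```

Six atoms yield 15 pairwise combos per rule; the bit-mask prefix keeps conjunctions of different atom positions distinct even when the words coincide.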