1. ITG for Joint Phrasal Translation Modeling
Colin Cherry (University of Alberta), Dekang Lin (Google Inc.)
April 26, 2007

2. The Gist
• Joint phrasal translation models (JPTMs) learn a bilingual phrase table using EM
• Phrasal ITG: use synchronous parsing to replace hill climbing and sampling with dynamic programming
• Do the resulting phrase tables improve translation?

3. Outline
• Phrasal translation models
• We build on:
  – Phrase extraction, JPTM, ITG
• Phrasal ITG
  – Helpful constraints
• Results
• Summary and future work

4. Phrasal Translation Model
• Ultimately interested in a bilingual phrase table, which lists and scores possible phrasal translations:

  English                 | French                      | P(e|f) | P(f|e)
  ethical food            | alimentation éthique        | 0.95   | 0.16
  ethical foreign policy  | politique étrangère morale  | 0.23   | 0.01
  ethical foundations     | fondements éthiques         | 0.10   | 0.03
  …
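As a concrete picture of what the table stores, here is a minimal Python sketch; the phrase pairs and scores come from the example above, while the dictionary layout and the lookup helper are illustrative assumptions, not the authors' implementation:

```python
# Phrase-table entries keyed by (English, French) phrase pair, scored in both directions.
phrase_table = {
    ("ethical food", "alimentation éthique"): {"p_e_given_f": 0.95, "p_f_given_e": 0.16},
    ("ethical foreign policy", "politique étrangère morale"): {"p_e_given_f": 0.23, "p_f_given_e": 0.01},
    ("ethical foundations", "fondements éthiques"): {"p_e_given_f": 0.10, "p_f_given_e": 0.03},
}

def translation_options(french_phrase):
    """English candidates for a French phrase, best P(e|f) first, as a decoder would use them."""
    options = [(e, s) for (e, f), s in phrase_table.items() if f == french_phrase]
    return sorted(options, key=lambda item: item[1]["p_e_given_f"], reverse=True)

print(translation_options("alimentation éthique"))
```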

5. Surface Heuristic
(Figure: word alignment grid for "he likes red cars" / "il aime les voitures rouges")
• Alignments provided by a GIZA++ combination
• Surface heuristic:
  – Count each consistent phrase as occurring once
  – Aggregate counts over all sentence pairs
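The consistency test behind "count each consistent phrase as occurring once" can be sketched as follows. This is a generic rendering of the standard extraction heuristic, not the authors' code; the link format (a set of (English index, French index) pairs), the length limit, and the example alignment are assumptions:

```python
from collections import Counter

def consistent(e_span, f_span, links):
    """A phrase pair is consistent if no alignment link crosses the bispan
    boundary and at least one link falls inside it."""
    (i1, i2), (j1, j2) = e_span, f_span            # half-open word-index ranges
    has_inside = any(i1 <= i < i2 and j1 <= j < j2 for i, j in links)
    no_crossing = all((i1 <= i < i2) == (j1 <= j < j2) for i, j in links)
    return has_inside and no_crossing

def extract_phrases(e_words, f_words, links, max_len=7):
    """Surface heuristic: count each consistent phrase pair once per sentence pair."""
    counts = Counter()
    for i1 in range(len(e_words)):
        for i2 in range(i1 + 1, min(i1 + max_len, len(e_words)) + 1):
            for j1 in range(len(f_words)):
                for j2 in range(j1 + 1, min(j1 + max_len, len(f_words)) + 1):
                    if consistent((i1, i2), (j1, j2), links):
                        counts[(" ".join(e_words[i1:i2]),
                                " ".join(f_words[j1:j2]))] += 1
    return counts

# The slide's example, with an assumed word alignment
# (he-il, likes-aime, red-rouges, cars-voitures; "les" unaligned):
e = "he likes red cars".split()
f = "il aime les voitures rouges".split()
links = {(0, 0), (1, 1), (2, 4), (3, 3)}
table = extract_phrases(e, f, links)   # aggregate with += over a whole corpus
```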

9. Joint Phrasal Model (JPTM)
• Introduced by Marcu and Wong (2002)
• Trained with EM, like the IBM models
• The sentence pair is built simultaneously:
  – Generate a bag of bilingual phrase pairs
  – Permute the phrases to form e and f
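A toy rendering of this generative story, with everything numeric assumed: the joint phrase-pair distribution, the geometric stopping probability, and the uniform permutation standing in for Marcu and Wong's distortion model:

```python
import random

# Toy joint distribution over bilingual phrase pairs (numbers are illustrative).
joint = {
    ("he", "il"): 0.3,
    ("likes", "aime"): 0.3,
    ("red cars", "les voitures rouges"): 0.4,
}
p_stop = 0.4   # assumed geometric length model: chance of closing the bag after each draw

def generate_sentence_pair():
    """Draw a bag of phrase pairs, then order each side to produce (e, f).
    The real model scores orderings with a distortion model; a uniform
    random permutation stands in for that here."""
    pairs = []
    phrases, weights = zip(*joint.items())
    while True:
        pairs.append(random.choices(phrases, weights=weights, k=1)[0])
        if random.random() < p_stop:
            break
    e_side = random.sample(pairs, len(pairs))   # permute for the English side
    f_side = random.sample(pairs, len(pairs))   # permute for the French side
    return (" ".join(ep for ep, _ in e_side),
            " ".join(fp for _, fp in f_side))

print(generate_sentence_pair())
```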

10. Joint Phrasal Model
(Figure: phrasal alignment of "he likes red cars" / "il aime les voitures rouges")
• Must reason over an exponential number of phrasal alignments
• The space is huge, so in practice training samples around a high-probability point

12. Joint Phrasal Model
(Figure: word alignment for "he likes red cars" / "il aime les voitures rouges")
• Birch et al. (2006): constrained JPTM
• Explore only phrasal alignments consistent with a high-precision word alignment

13. Inversion Transduction Grammar
• Introduced by Wu (1997)
• Transduction (lexical) rules, e.g. C → red / rouge
• Inversion rules, which combine two constituents:
  – A → [A C]   straight (same order in both languages)
  – B → <A C>   inverted (order flipped on the target side)
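A minimal sketch of how the two non-lexical rule types reorder the target side; the tuple encoding of derivations is an assumption made only for illustration:

```python
def read_off(node):
    """Return the (English, French) word sequences produced by an ITG derivation.
    Leaves are lexical rules like C -> red / rouge; internal nodes are either
    straight ([A C]: children in the same order on both sides) or inverted
    (<A C>: children swapped on the French side)."""
    kind = node[0]
    if kind == "lex":                          # ("lex", english_word, french_word)
        return [node[1]], [node[2]]
    _, left, right = node                      # ("straight" | "inverted", left, right)
    e_left, f_left = read_off(left)
    e_right, f_right = read_off(right)
    if kind == "straight":
        return e_left + e_right, f_left + f_right
    return e_left + e_right, f_right + f_left  # inverted: flip on the French side

tree = ("inverted", ("lex", "red", "rouge"), ("lex", "cars", "voitures"))
print(read_off(tree))    # (['red', 'cars'], ['voitures', 'rouge'])
```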

14. ITG Parse
(Figure: an ITG parse aligning "he likes red cars" with "il aime les voitures rouges")

15. Phrasal ITG
(Figure: the phrase pair "calm down" / "calmez vous" produced as a single lexical entry)
• Any phrase pair can be produced by the lexicon
• Choose between straight, inverted and, now, phrasal

16. Training Phrasal ITG
• All phrase pairs share probability mass as a joint model
• Can be trained unsupervised with inside-outside
• No more expensive than binary bracketing: phrases were already being explored as constituents
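A compact, unpruned sketch of the inside pass for such a model. The rule probabilities and the toy joint phrase distribution are assumptions; the point is that each bispan is scored once as a phrase pair and once through every straight and inverted split, which is the dynamic program whose cost is discussed on the next slide:

```python
from collections import defaultdict

# Assumed toy parameters: a rule-choice distribution plus a joint phrase distribution t.
p_straight, p_inverted, p_phrasal = 0.4, 0.1, 0.5
t = defaultdict(float, {
    ("he", "il"): 0.3, ("likes", "aime"): 0.3,
    ("red cars", "les voitures rouges"): 0.2, ("red", "rouges"): 0.1,
    ("cars", "voitures"): 0.1,
})

def inside(e, f):
    """beta[i1, i2, j1, j2] = total probability of jointly generating e[i1:i2] and f[j1:j2]."""
    beta = defaultdict(float)
    # Visit bispans smallest-first so sub-constituents are ready when needed.
    for e_len in range(1, len(e) + 1):
        for f_len in range(1, len(f) + 1):
            for i1 in range(len(e) - e_len + 1):
                i2 = i1 + e_len
                for j1 in range(len(f) - f_len + 1):
                    j2 = j1 + f_len
                    # Phrasal rule: emit the whole bispan as one phrase pair.
                    total = p_phrasal * t[(" ".join(e[i1:i2]), " ".join(f[j1:j2]))]
                    # Binary rules: split both spans; straight keeps order, inverted swaps f.
                    for k in range(i1 + 1, i2):
                        for m in range(j1 + 1, j2):
                            total += p_straight * beta[(i1, k, j1, m)] * beta[(k, i2, m, j2)]
                            total += p_inverted * beta[(i1, k, m, j2)] * beta[(k, i2, j1, m)]
                    beta[(i1, i2, j1, j2)] = total
    return beta[(0, len(e), 0, len(f))]

print(inside("he likes red cars".split(), "il aime les voitures rouges".split()))
```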

17. The hope
• By moving to exact expectation:
  – Create more accurate statistics
  – Find a larger variety of phrase pairs

18. The problem: still slow, O(n^6)
• ITG algorithms can be pruned:
  – O(n^4) potential constituents are considered
  – O(n^2) time is spent considering all ways to build each constituent
• Fixed-link pruning: eliminate constituents that are not consistent with a given word alignment
  – Skip them and treat them as having probability 0
• One link can potentially rule out 50% of constituents
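The fixed-link test itself is tiny: a constituent survives only if no link of the given word alignment crosses its boundary. A sketch, assuming links are (English index, French index) pairs:

```python
def crosses_fixed_link(i1, i2, j1, j2, links):
    """True if any fixed link has exactly one endpoint inside the bispan
    e[i1:i2] / f[j1:j2]; such constituents are skipped and treated as
    having probability 0."""
    return any((i1 <= i < i2) != (j1 <= j < j2) for i, j in links)

# Inside the O(n^4) constituent loop of the parser, for example:
#   if crosses_fixed_link(i1, i2, j1, j2, fixed_links):
#       continue    # pruned: never built, never split
```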

19. Fixed Link Speed-up
• Used GIZA++ intersection alignments
• Inside-outside on the first 100 sentences of the corpus
• Compared to tic-tac-toe pruning (Zhang & Gildea 2005)

  Pruning      | Time (sec)
  No prune     | 415
  Tic-tac-toe  | 37
  Fixed-link   | 5

20. What about the ITG constraint?
(Figure: crossing alignment between "Mr. Burtone 12 are acceptable to the commission fully or in part" and "M. Burtone 12 sont acceptables en tout ou partie pour la commission")
• ITG can't handle this alignment because of its discontinuous constituents
• Check the fixed links used for pruning:
  – If they are non-ITG, drop the sentence pair from the training set
• In our French-English Europarl set, this reduces the data by less than 1%
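One standard way to test ITG reachability, assuming the alignment has been reduced to a permutation of target positions: repeatedly merge adjacent items whose target positions form a contiguous block, and accept if everything collapses to a single block. This shift-reduce style check is a common construction, not necessarily the authors' procedure:

```python
def itg_parseable(perm):
    """Check whether a 1-to-1 alignment, written as a permutation of target
    positions, can be built from binary straight/inverted merges."""
    stack = []                                   # (lo, hi) target intervals, inclusive
    for p in perm:
        stack.append((p, p))
        # Merge while the top two intervals are adjacent on the target side:
        # straight if the upper continues the lower, inverted if it precedes it.
        while len(stack) >= 2:
            lo_b, hi_b = stack[-1]
            lo_a, hi_a = stack[-2]
            if hi_a + 1 == lo_b or hi_b + 1 == lo_a:
                stack[-2:] = [(min(lo_a, lo_b), max(hi_a, hi_b))]
            else:
                break
    return len(stack) == 1

print(itg_parseable([0, 2, 1, 3]))   # True: a straight/inverted derivation exists
print(itg_parseable([1, 3, 0, 2]))   # False: the "inside-out" pattern ITG cannot build
```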

21. Experiments
• Conditionalize the joint tables to P(e|f) and P(f|e)
• French-English Europarl set
  – 25-word sentence-length limit, 400k sentence pairs
• SMT Workshop baseline MT system
  – Pharaoh, MERT training on 500 tuning pairs
• Unnormalized IBM Model 1 features included for all systems
• Compared to:
  – JPTM constrained with the GIZA++ intersection
  – Surface-heuristic extraction with GIZA++ grow-diag-final (GDF)
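The conditionalization in the first bullet is a direct normalization of the joint table; a sketch, assuming the table is a dictionary of joint counts or probabilities:

```python
from collections import defaultdict

def conditionalize(joint_counts):
    """Turn joint phrase-pair counts c(e, f) into P(e|f) and P(f|e) tables."""
    e_totals, f_totals = defaultdict(float), defaultdict(float)
    for (e, f), c in joint_counts.items():
        e_totals[e] += c
        f_totals[f] += c
    p_e_given_f = {(e, f): c / f_totals[f] for (e, f), c in joint_counts.items()}
    p_f_given_e = {(e, f): c / e_totals[e] for (e, f), c in joint_counts.items()}
    return p_e_given_f, p_f_given_e
```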

22. Results: BLEU Scores
(Bar chart comparing C-JPTM, Phrasal ITG, and Surface; BLEU axis from 28.0 to 31.0)

23. Results: Table Size (in millions of entries)
(Bar chart comparing C-JPTM, Phrasal ITG, and Surface; axis from 0 to 12 million entries)

24. Summary
• A phrasal ITG that learns phrases from bitext, similar to the JPTM
• Complete expectations do matter
  – Other JPTMs could benefit from improving their search and sampling methods
• A new ITG pruning technique
  – 80 times faster inside-outside

25. Future: Eliminate Frequency Limits
• Any joint model must be constrained to use phrases that occur with a minimum frequency
  – Otherwise, making each whole sentence a single phrase is the maximum-likelihood solution
(Figure: "he likes red cars" / "il aime les voitures rouges" linked as one sentence-length phrase pair)
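Applied before training, the limit is just a filter on candidate phrase pairs; a sketch, with the threshold of 5 taken from the ">= 5" systems on the next slide and everything else assumed:

```python
from collections import Counter

def frequent_pairs(candidate_counts, min_count=5):
    """Keep only phrase pairs seen at least min_count times, so the degenerate
    'whole sentence as one phrase' solution is not available to the model."""
    return Counter({pair: c for pair, c in candidate_counts.items() if c >= min_count})
```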

26. Future: Eliminate Frequency Limits
• Must constrain any joint model to use phrases that occur with a minimum frequency
  – Otherwise the whole sentence as one phrase is the maximum-likelihood solution
(Bar chart: BLEU axis from 28.0 to 31.0; bars for C-JPTM >=5, Phrasal ITG >=5, Surface >=5, and Surface)

27. Future: Eliminate Frequency Limits
• Must constrain any joint model to use phrases that occur with a minimum frequency
  – Otherwise the whole sentence as one phrase is the maximum-likelihood solution
• Apply Bayesian methods (priors) to replace these limits (Goldwater et al. 2006)
(Same bar chart as the previous slide)

28. This isn't the whole story…
• We explored the same model as a phrasal aligner
• It needs additional constraints to work:
  – Fixed links help select phrases that are non-compositional
• The resulting alignments work well with the surface heuristic
• Details in the paper!

29. Questions? Comments? Suggestions?
Support provided by:
• Alberta Ingenuity Fund
• Alberta Informatics Circle of Research Excellence

30. Along the way…
• Adapt consistency constraints from heuristic phrase extraction for ITG parsing
• Deal with the ITG constraint in large data

