11,001 New Features for Statistical Machine Translation

  1. 11,001 New Features for Statistical Machine Translation (David Chiang, Kevin Knight, Wei Wang)

  2.–5. MOTIVATION [Figure, built up over four slides: a synchronous parse pairing "Maria no dió una bofetada a la bruja verde" with "Maria did not slap the green witch", with English POS and phrase labels (NNP, VBD, DT, JJ, NN; NP, VP, S) over the target side]

  6. MOTIVATION
     • Minimum error rate training (MERT) works for <30 features
     • Margin Infused Relaxed Algorithm (MIRA):
       • online large-margin discriminative training
       • scales better to large feature sets
       • enables freer exploration of features

  7. RESULTS (GALE 2008 Chinese-English data)

     System   Training   Features   BLEU
     Hiero    MERT             11   36.1
     Hiero    MIRA         10,990   37.6
     Syntax   MERT             25   39.5
     Syntax   MIRA            283   40.6

  8. OVERVIEW
     • Training
     • Features
     • Experiments

  9. Training

  10. MIRA
      • Crammer and Singer, 2003
      • Applied to statistical MT by Watanabe et al., 2007
      • Chiang, Marton, and Resnik, 2008: use more of the forest; parallelize training
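As a rough illustration of the kind of update MIRA performs, here is a minimal 1-best sketch in Python. The function name, the dict-based feature vectors, and the single-hypothesis simplification are mine for illustration; the approach above works over the translation forest and updates against multiple hypotheses.

```python
def mira_update(weights, feats_oracle, feats_hyp, loss, C=0.01):
    """One clipped MIRA step toward the oracle, away from the hypothesis.

    weights, feats_oracle, feats_hyp: sparse feature vectors as dicts.
    loss: BLEU-based loss of the hypothesis relative to the oracle.
    C: clip on the step size (regularization constant).
    """
    # delta = Phi(oracle) - Phi(hypothesis)
    keys = set(feats_oracle) | set(feats_hyp)
    delta = {k: feats_oracle.get(k, 0.0) - feats_hyp.get(k, 0.0) for k in keys}

    # current margin between oracle and hypothesis under the model
    margin = sum(weights.get(k, 0.0) * v for k, v in delta.items())
    norm_sq = sum(v * v for v in delta.values())
    if norm_sq == 0.0:
        return weights  # identical feature vectors: nothing to learn

    # step size: large enough to make the margin exceed the loss, clipped at C
    tau = min(C, max(0.0, (loss - margin) / norm_sq))
    for k, v in delta.items():
        weights[k] = weights.get(k, 0.0) + tau * v
    return weights
```

After the update, the oracle's features score higher relative to the hypothesis's, but never by a step larger than C.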

  11.–12. MERT [Plot: BLEU against model score]

  13.–14. MIRA [Plot: BLEU against model score, annotated with the loss and the margin]

  15. FOREST-BASED TRAINING [Plot: BLEU (0.05–0.5) against model score (−46 to −34)]

  16. PARALLEL TRAINING
      • Run n MIRA learners in parallel
      • Share information among learners (each learner repeatedly decodes, updates, and broadcasts)
      • Hiero: n = 20; Syntax: n = 73
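A toy simulation of the decode/update/broadcast loop above. The exact sharing scheme here (peers averaging in a sender's broadcast weights) is my assumption for illustration, not necessarily the scheme used in the talk.

```python
def broadcast_average(learners, sender, update):
    """Apply one learner's update, then fold its weights into its peers.

    learners: list of sparse weight dicts, one per parallel learner.
    sender: index of the learner that just finished a MIRA update.
    update: sparse dict of weight deltas from that update.
    """
    # the sender applies its own update locally
    for k, v in update.items():
        learners[sender][k] = learners[sender].get(k, 0.0) + v

    # every other learner averages the broadcast weights into its own
    for i, w in enumerate(learners):
        if i == sender:
            continue
        keys = set(w) | set(learners[sender])
        for k in keys:
            w[k] = 0.5 * (w.get(k, 0.0) + learners[sender].get(k, 0.0))
    return learners
```

In a real system each learner would run in its own process and the broadcast would go over the network; this single-process version only shows the data flow.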

  17. Features

  18. DISCOUNT FEATURES
      [Example rule: PP → 晚上 NP₁ 左右 ↔ around NP₁ p.m. ("around NP₁ in the evening"), seen once in training, so it fires count=1]
      • Low counts are often overestimates
      • Introduce a count=1 feature that fires on 1-count rules, etc.
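A discount feature is just a binary indicator on a rule's training count. The sketch below uses buckets for counts 1, 2, and 3; the slide confirms count=1 and implies further buckets ("etc."), so the exact bucket set is my guess.

```python
def discount_features(rule_count, buckets=(1, 2, 3)):
    """One binary indicator per low-count bucket.

    A rule seen exactly b times in training fires count=b, letting the
    model learn to distrust probabilities estimated from few examples.
    """
    return {f"count={b}": 1.0 if rule_count == b else 0.0 for b in buckets}
```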

  19. TARGET SYNTAX FEATURES
      [Figure: two target-side VP trees for "UN inspectors ... expelled by ..."; inserting the function word "were" ("were expelled by") fires the insert-were feature]

  20. TARGET SYNTAX FEATURES
      [Figure: target parses for outputs such as "thinking of ... art for the generation" and "published his autobiography ... the best-selling book", in which an unlikely nonterminal rewrite fires the bad-rewrite feature]

  21. TARGET SYNTAX FEATURES
      [Figure: two S trees for "Yoon said ..."; the derivation that introduces a comma node (S → S , S vs. S → NP VP ,) fires the node=, feature]

  22. TARGET SYNTAX FEATURES
      [Figure: two translations of 第一 个 站 出 来 ("the first to stand up"): "the first to stand up" vs. "the first leg from stand up"; the root label of a rule's target side fires root=IN or root=VP]

  23. SOURCE CONTEXT FEATURES (Marton & Resnik 2008; Chiang et al. 2008)
      [Example: 这 是 一 个 值得 关注 和 研究 的 新 动向 。 ("this is a new trend that merits attention and study"); the output "this is a new trends in the study" crosses a source VP bracket, firing cross-VP]
      • Use an external parser to infer source-side syntax
      • Rewards and penalties for matching/crossing brackets

  24. SOURCE CONTEXT FEATURES (Marton & Resnik 2008; Chiang et al. 2008)
      [Example, same source sentence: the output "this is a meriting attention and study" respects the source VP bracket, firing match-VP]
      • Use an external parser to infer source-side syntax
      • Rewards and penalties for matching/crossing brackets
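The match/cross bracket distinction can be checked span-by-span. The sketch below is a minimal version under my assumptions about the representation: constituents as labeled half-open spans, and "crossing" meaning the rule's span straddles one boundary of a constituent.

```python
def span_feature(span, constituents):
    """Return a match-X or cross-X feature name for a translation span.

    span: (i, j) half-open span of source words covered by a rule.
    constituents: list of (label, (ci, cj)) from the source parse.
    """
    i, j = span
    for label, (ci, cj) in constituents:
        if (ci, cj) == (i, j):
            return f"match-{label}"          # span is exactly a constituent
        if (ci < i < cj < j) or (i < ci < j < cj):
            return f"cross-{label}"          # span straddles one boundary
    return None                              # nested or disjoint: no feature
```

Note that a span properly nested inside a constituent fires nothing; only exact matches are rewarded and boundary violations penalized.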

  25. SOURCE CONTEXT FEATURES (Chiang et al. 2008)
      [Example: 挪威 恢复 在 斯里兰卡 的 和平 斡旋 ("Norway restores peace mediation in Sri Lanka"); compare the outputs "to restore peace in Sri Lanka , the Norwegian mediation" and "Norway restoring peace mediation in Sri Lanka"]

  26. SOURCE CONTEXT FEATURES
      • Word context features: similar to Watanabe et al. 2007 and work on WSD in MT (Chan et al. 2007; Carpuat & Wu 2007)
      • Relate a word's translation e with its left or right neighbor on the source side: (f_{i−1}, f_i, e) and (f_i, f_{i+1}, e), for just the 100 most frequent types
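Extracting the two context features above is straightforward; this sketch assumes a word-aligned pair and applies the frequent-type restriction to the neighbor word (my reading of the slide; the restriction could equally apply to f_i itself).

```python
def context_features(src, i, e, frequent):
    """Word context features for source word src[i] translating to e.

    src: list of source-side tokens.
    i: index of the source word being translated.
    e: its target-side translation.
    frequent: the set of most frequent source types (the slide uses 100).
    """
    feats = []
    if i > 0 and src[i - 1] in frequent:
        feats.append(("L", src[i - 1], src[i], e))   # (f_{i-1}, f_i, e)
    if i + 1 < len(src) and src[i + 1] in frequent:
        feats.append(("R", src[i], src[i + 1], e))   # (f_i, f_{i+1}, e)
    return feats
```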

  27. SOURCE CONTEXT FEATURES
      [Example: 他 说 , 由于 没 有 配音 , 他 不得不 ("he said that because there was no voice, he had to ..."); translating the comma as "," (output "since there is no voice , he said , he had to") fires f_i=, & f_{i−1}=说 & e=, while translating it as "that" (output "he said that because of the lack of voice , he had to") fires f_i=, & f_{i−1}=说 & e=that]

  28. Experiments

  29. TRAINING DATA (GALE 2008 Chinese-English data)

                       Hiero   Syntax
      Parallel data     260M      65M
      Language model      2G       1G
      MERT/MIRA          58k      58k
      Test               57k      57k

  30. RESULTS (HIERO), Chinese-English

      Training   Features                                #   BLEU
      MERT       baseline                               11   36.1
      MIRA       +source-side syntax, +distortion       56   36.9
      MIRA       +discount                              61   37.3
      MIRA       +word context                      10,990   37.6

  31. RESULTS (SYNTAX), Chinese-English

      Training   Features                                #   BLEU
      MERT       baseline                               25   39.5
      MIRA       baseline                               25   39.8
      MIRA       +rule overlap                         132   39.9
      MIRA       +node count                           136   40.0
      MIRA       +discount, +bad rewrite, +insertion   283   40.6

  32. CONCLUSIONS
      • Using underutilized information for new features:
        • source context is computationally efficient
        • target syntax provides a rich structure
      • MIRA works well on new features, systems, and languages
