Experiments in English↔Japanese Tree-to-String Machine Translation
Graham Neubig, Nara Institute of Science and Technology, 10/20/2012


  1. Experiments in English↔Japanese Tree-to-String Machine Translation

  2. Introduction/Motivation

  3. Translation Models
      [Figure: the sentence pair "he visited the white house" / "彼 は ホワイト ハウス を 訪問 した" represented on each side as a string, a phrase-structure tree (English tags PRP VBD DT NNP NNP; Japanese tags N P N N P N V), and a dependency tree.]

  4. Recent Usage in English↔Japanese
      ● Phrase-based translation [Koehn+ 03] is still popular
        English: he visited the white house
        Japanese: 彼 は ホワイト ハウス を 訪問 した
      ● Moses used in 25 papers at NLP2012
      ● Also, hierarchical phrase-based translation [Chiang 07] ([Feng+ 11] is one of the few examples)

  5. Recent Usage in English↔Japanese
      ● Pre-ordering [Xia+ 04] is another popular technique
        Source dependencies (subj, obj, det, adj): he visited the white house
        Pre-ordering (subj v obj → subj obj v): he the white house visited
        Translation: 彼 は ホワイト ハウス を 訪問 した
      ● First used for Japanese by [Komachi+ 06]?
      ● Used by Google [Xu+ 09], NTT [Isozaki+ 11], others [Nguyen+ 08, Neubig+ 12]

  6. Recent Usage in English↔Japanese
      ● Dependency-to-dependency used by Kyoto U [Nakazawa+ 06] and rule-based systems
        [Figure: the English dependency tree of "he visited the white house" (nsubj, dobj, det) mapped to the Japanese dependency tree of "彼 は ホワイト ハウス を 訪問 した"; example rule: X1 visited X2 → X1 X2 訪問 した]

  7. Recent Usage in English↔Japanese
      ● String-to-tree models [Yamada+ 01] used by NTT in the NTCIR task [Sudoh+ 11]

  8. Recent Usage in English↔Japanese
      [Figure: string, phrase-structure tree, and dependency representations of "he visited the white house" / "彼 は ホワイト ハウス を 訪問 した", marking where each approach operates: (H)PBMT (string to string), S2T (string to tree), pre-ordering, and D2D (dependency to dependency).]

  9. What about Tree-driven Models?!
      [Figure: string, phrase-structure tree, and dependency representations of the same sentence pair, now marking T2S (phrase-structure tree to string) and D2S (dependency tree to string).]

  10. Tree-to-String Models [Liu+ 06]
      [Figure: Japanese→English example. The source tree for "友達 と ご飯 を 食べ た" (N 0, P 1, N 2, P 3, V 4, SUF 5; PP 0-1, PP 2-3, VP 4-5, VP 2-5, VP 0-5) is translated with rules such as VP → "x1 with x0" and VP → "x1 x0", with the subtrees translated as "a friend", "a meal", and "ate", yielding "ate a meal with a friend".]

  11. Dependency-to-String Models [Quirk+ 05]
      [Figure: the dependency tree of "he visited the white house" (nsubj, dobj, det) translated with the rule X1 visited X2 → X1 X2 訪問 した, yielding "彼 は ホワイト ハウス を 訪問 した".]

  12. T2S/D2S vs Phrase Based
      ● + Better reordering through use of syntactic structure
      ● + Very fast! (especially compared to HPBMT)
      ● + Better lexical choice because long-range context is considered (especially D2S)
      ● - Requires a parser
      ● - Sensitive to parse errors

  13. T2S/D2S vs Pre-ordering
      ● + T2S/D2S jointly search for reordering and translation
      ● + T2S/D2S can easily handle lexicalized reordering
        (VP (PP X が) 高い) → "X is high"
        (VP (PP X が) 好き) → "likes X"
      ● - Pre-ordering can find translation rules that overlap constituent boundaries

  14. T2S vs. D2S
      ● T2S: Can handle de-lexicalized rules = more general?
        (S X1:NP (VP X2:VBD X3:NP)) → X1 X3 X2 (SVO → SOV)
      ● D2S: Dependent words are close → good for lexical choice?
        "run" with its dobj: "run a program" vs. "run a marathon"

  15. Experiments and Summary

  16. Question: How well do modern statistical tree-to-string methods work for English↔Japanese translation?

  17. Previous Research
      ● Three examples for En→Ja?
        ● [Quirk+ 06] Uses dependency treelet translation and shows improvement over PBMT
        ● [Wu+ 10] Uses HPSG input and shows improvement over Joshua (HPBMT)
        ● [DeNero+ 11] Shows forest-to-string does slightly better than syntactic pre-ordering in terms of BLEU
      ● One example for Ja→En?
        ● [Menezes+ 05] Uses dependency treelet translation, with no direct comparison to other methods

  18. Experimental Setup
      ● System: In-house forest-to-string decoder “travatar”
        ● Forest-to-string translation [Mi+ 08] with tree transducers
        ● Alignment: GIZA++, rule extraction: GHKM, tuning: MERT
      ● Data: Kyoto Free Translation Task (KFTT [Neubig 11]), ~350k sentences of Wikipedia data for training
      ● Baselines: Moses PBMT, PBMT + pre-ordering [Neubig+ 12]
      ● Evaluation: BLEU, RIBES, Acceptability (0-5)

  19. Tree-to-String Settings (Explained in Detail Later)
      ● Language analysis:
        ● En parser: Stanford, Berkeley, Egret (tree, forest)
        ● Ja: Juman+KNP, MeCab+Cabocha, KyTea+EDA
      ● Composed rules: 1, 2, 3, 4
      ● Non-terminals: 1, 2, 3
      ● Binarization: Left, Right
      ● Null attachment: Top, Exhaustive (1, 2)
      ● Tuning: BLEU, RIBES, (BLEU+RIBES)/2

  20. Summary (En-Ja)
      [Figure: bar charts of BLEU, RIBES, and Acceptability for PBMT, PBMT+Pre, T2S, and F2S.]

  21. Summary (Ja-En)
      [Figure: bar charts of BLEU, RIBES, and Acceptability for PBMT, PBMT+Pre, and T2S.]

  22. En-Ja F2S vs. PBMT+Pre
      Input: Department of Sociology in Faculty of Letters opened .
      PBMT+Pre: 開業 年 文学 部 社会 学科 。
      F2S: 文学 部 社会 学 科 を 開設 。
      Properly interprets noun phrase + verb

  23. En-Ja F2S vs. PBMT+Pre
      Input: Afterwards it was reconstructed but its influence declined .
      PBMT+Pre: その 後 衰退 し た が 、 その 影響 を 受け て 再建 さ れ た もの で あ る 。
      F2S: その 後 再建 さ れ て い た が 、 影響 力 は 衰え た 。
      Properly reconstructs relationship between two verb phrases

  24. En-Ja F2S vs. PBMT+Pre
      Input: Introduction of KANSAI THRU PASS Miyako Card
      PBMT+Pre: スルッと kansai 都 カード の 導入
      F2S: 伝来 スルッと KANSAI 都 カード
      Parsing error: (NP (NP Introduction) (PP of KANSAI THRU PASS) (NP Miyako) (NP Card))

  25. Ja-En T2S vs. PBMT+Pre
      Input: 史実 に は 直接 の 関係 は な い 。
      PBMT+Pre: in the historical fact is not directly related to it .
      T2S: is not directly related to the historical facts .
      Properly translates “… に は 関係 が” as “related to”

  26. Ja-En T2S vs. PBMT+Pre
      Input: 九条 道家 は 嫡男 ・ 九条 教実 に 先立 た れ 、 次男 ・ 二 条 良実 は 事実 上 の 勘当 状態 に あ っ た 。
      PBMT+Pre: michiie kujo was his eldest son and heir , norizane kujo , and his second son , yoshizane nijo was disinherited .
      T2S: michiie kujo to his legitimate son kujo norizane died before him , and the second son , nijo yoshizane was virtually disowned .
      Much better division between clauses
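
Slide 5 shows pre-ordering: the English source is rearranged from subj v obj into subj obj v before phrase-based translation. The sketch below shows one simple heuristic of this kind, assuming a dependency parse is already available: every head is emitted after all of its dependents (head-final order). The `Node` class and `head_final` function are hypothetical names for illustration; the pre-ordering systems cited on slide 5 use richer hand-written or learned rules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    word: str
    deps: List["Node"] = field(default_factory=list)  # dependents, in source order


def head_final(node: Node) -> List[str]:
    """Hypothetical head-final pre-ordering: dependents first (order kept), then the head."""
    out: List[str] = []
    for dep in node.deps:
        out.extend(head_final(dep))
    out.append(node.word)
    return out


# "he visited the white house": visited -> {he, house}, house -> {the, white}
house = Node("house", [Node("the"), Node("white")])
sent = Node("visited", [Node("he"), house])
print(" ".join(head_final(sent)))  # he the white house visited
```

On this sentence the heuristic reproduces the slide's pre-ordered string; a practical system would also need exceptions (for example for coordination) that this sketch ignores.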
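
The tree-to-string example on slide 10 can be sketched as a toy recursive translator: a rule pairs a source tree fragment, whose variable leaves x0, x1, ... are numbered left to right, with a target template. The rule set below is hypothetical and was written only to reproduce the slide's output "ate a meal with a friend"; the sketch also takes the first matching rule, whereas a real decoder searches over competing derivations with model scores.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Tree:
    label: str
    children: list = field(default_factory=list)  # Tree or Var items; empty for words


@dataclass
class Var:
    label: str  # category a rule variable must match, e.g. "PP"


@dataclass
class Rule:
    lhs: Tree                   # source-side tree fragment containing Var leaves
    rhs: List[Union[str, int]]  # target words; an int k stands for variable xk


def T(label, *children):
    return Tree(label, list(children))


def match(pat, node, binds):
    """Match a fragment against a subtree, appending variable bindings left to right."""
    if isinstance(pat, Var):
        if pat.label != node.label:
            return False
        binds.append(node)
        return True
    if pat.label != node.label or len(pat.children) != len(node.children):
        return False
    return all(match(p, n, binds) for p, n in zip(pat.children, node.children))


def translate(node, rules):
    """Apply the first matching rule, then translate each bound variable recursively."""
    for rule in rules:
        binds = []
        if match(rule.lhs, node, binds):
            out = []
            for item in rule.rhs:
                if isinstance(item, int):
                    out.extend(translate(binds[item], rules))
                else:
                    out.append(item)
            return out
    raise ValueError(f"no rule matches node {node.label}")


# Source tree for 友達 と ご飯 を 食べ た (spans as on slide 10).
src = T("VP",
        T("PP", T("N", T("友達")), T("P", T("と"))),
        T("VP",
          T("PP", T("N", T("ご飯")), T("P", T("を"))),
          T("VP", T("V", T("食べ")), T("SUF", T("た")))))

# Hypothetical rules, tried in order.
rules = [
    Rule(T("VP", T("PP", Var("N"), T("P", T("と"))), Var("VP")), [1, "with", 0]),
    Rule(T("VP", Var("PP"), Var("VP")), [1, 0]),
    Rule(T("PP", Var("N"), T("P", T("を"))), [0]),
    Rule(T("N", T("友達")), ["a", "friend"]),
    Rule(T("N", T("ご飯")), ["a", "meal"]),
    Rule(T("VP", T("V", T("食べ")), T("SUF", T("た"))), ["ate"]),
]

print(" ".join(translate(src, rules)))  # ate a meal with a friend
```

The reordering lives entirely in the VP rules (`[1, 0]` and `[1, "with", 0]` put the verb phrase before its argument), which is the kind of syntax-driven reordering slide 12 credits to T2S.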
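
The dependency-to-string rule on slide 11, X1 visited X2 → X1 X2 訪問 した, can be sketched as a small recursive translator over a dependency tree. This is a toy under stated assumptions: the hypothetical `TREELET_RULES` table is keyed by a head word plus the relations of its dependents, the hypothetical `PHRASE_RULES` table holds fully lexicalized subtrees, and placing the particles は and を inside those lexical entries is a simplification chosen for illustration, not taken from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Dep:
    pos: int   # position of the word in the source sentence
    word: str
    deps: List[Tuple[str, "Dep"]] = field(default_factory=list)  # (relation, dependent)


def surface(node: Dep) -> str:
    """Source words of the whole subtree, in sentence order."""
    def collect(n: Dep) -> List[Tuple[int, str]]:
        return [(n.pos, n.word)] + [pw for _, d in n.deps for pw in collect(d)]
    return " ".join(w for _, w in sorted(collect(node)))


# Hypothetical fully lexicalized subtree translations.
PHRASE_RULES: Dict[str, List[str]] = {
    "he": ["彼", "は"],
    "the white house": ["ホワイト", "ハウス", "を"],
}
# Hypothetical treelet rules: (head, dependent relations) -> template; 0 = X1, 1 = X2.
TREELET_RULES: Dict[Tuple[str, Tuple[str, ...]], List[Union[str, int]]] = {
    ("visited", ("nsubj", "dobj")): [0, 1, "訪問", "した"],
}


def translate(node: Dep) -> List[str]:
    phrase = surface(node)
    if phrase in PHRASE_RULES:
        return list(PHRASE_RULES[phrase])
    key = (node.word, tuple(rel for rel, _ in node.deps))
    if key not in TREELET_RULES:
        raise KeyError(f"no rule for {key}")
    out: List[str] = []
    for item in TREELET_RULES[key]:
        if isinstance(item, int):
            out.extend(translate(node.deps[item][1]))  # fill the X slot from a dependent
        else:
            out.append(item)
    return out


house = Dep(4, "house", [("det", Dep(2, "the")), ("amod", Dep(3, "white"))])
root = Dep(1, "visited", [("nsubj", Dep(0, "he")), ("dobj", house)])
print(" ".join(translate(root)))  # 彼 は ホワイト ハウス を 訪問 した
```

Because the rule is keyed on the head word together with its dependents, the verb and its object are considered as one unit regardless of how far apart they are in the string, which is the property slide 14 suggests helps lexical choice in D2S.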
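
Slide 19 lists left and right binarization among the tree-to-string settings. The sketch below shows what the two options do to a node with more than two children, assuming intermediate nodes are labeled with a trailing apostrophe (a hypothetical naming convention, not necessarily the one used by travatar).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    label: str
    children: List["Tree"] = field(default_factory=list)  # empty for words

    def __str__(self) -> str:
        if not self.children:
            return self.label
        return "(" + self.label + " " + " ".join(str(c) for c in self.children) + ")"


def T(label: str, *children: Tree) -> Tree:
    return Tree(label, list(children))


def binarize(t: Tree, direction: str = "right") -> Tree:
    """Recursively split every node with more than two children into binary nodes."""
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    kids = [binarize(c, direction) for c in t.children]
    if len(kids) <= 2:
        return Tree(t.label, kids)
    bar = t.label + "'"  # hypothetical label for the new intermediate nodes
    if direction == "right":
        # (X a b c d) -> (X a (X' b (X' c d)))
        node = Tree(bar, kids[-2:])
        for k in reversed(kids[1:-2]):
            node = Tree(bar, [k, node])
        return Tree(t.label, [kids[0], node])
    # (X a b c d) -> (X (X' (X' a b) c) d)
    node = Tree(bar, kids[:2])
    for k in kids[2:-1]:
        node = Tree(bar, [node, k])
    return Tree(t.label, [node, kids[-1]])


phrase = T("NP", T("DT", T("the")), T("NNP", T("white")), T("NNP", T("house")))
print(binarize(phrase, "right"))  # (NP (DT the) (NP' (NNP white) (NNP house)))
print(binarize(phrase, "left"))   # (NP (NP' (DT the) (NNP white)) (NNP house))
```

The usual motivation is that binarization exposes sub-spans of a flat constituent as tree nodes, so rule extraction can use them as units: here "white house" becomes a node only under right binarization, and "the white" only under left binarization.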
