Inferring Translation Candidates for Multilingual Dictionary Generation with Multi-Way Neural Machine Translation (PowerPoint presentation transcript)


  1. Inferring Translation Candidates for Multilingual Dictionary Generation with Multi-Way Neural Machine Translation. Mihael Arcan, Daniel Torregrosa*, Sina Ahmadi* and John P. McCrae. This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289, co-funded by the European Regional Development Fund, and the European Union’s Horizon 2020 research and innovation programme under grant agreement No 731015, ELEXIS - European Lexical Infrastructure.

  2. Introduction • Neural machine translation • Results • Dictionary data • Conclusion

  3. Motivation
  • Knowledge bases are useful for many applications, but available in few languages
  • The creation and curation of knowledge bases is expensive
  • Hence, few or no knowledge bases in most languages
  • Can we use machine translation to translate knowledge?

  4. Overview
  • Multi-way neural machine translation without the targeted direction
  • Continuous training with a small curated dictionary
  • Discovery of new bilingual dictionary entries

  5. Targeted languages: Portuguese (PT), Galician (GL), Romanian (RO), Spanish (ES), Italian (IT), Catalan (CA), Basque (EU), French (FR), English (EN) and Esperanto (EO)

  6. Introduction • Neural machine translation • Results • Dictionary data • Conclusion

  7. Machine translation before 2014
  • Rule-based machine translation
    • Humans write rules
    • Highly customisable
    • High maintenance cost
  • Phrase-based statistical machine translation
    • Learns from a parallel corpus
    • Less control over the translations

  8. Word embeddings
  • Fixed-size numerical representation for words
  • From one-hot space (one dimension per different word) to embedding space
  • The embedding vector represents the contexts where the word appears
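To make the one-hot vs. embedding contrast concrete, here is a minimal sketch; the toy vocabulary, the embedding dimension and the random initialisation are illustrative assumptions, not values from the talk.

```python
# One-hot vectors vs. dense embeddings (toy example).
import numpy as np

vocab = ["low", "lower", "big", "bigger"]
word_to_id = {w: i for i, w in enumerate(vocab)}

def one_hot(word: str) -> np.ndarray:
    """One dimension per word: 1 at the word's index, 0 elsewhere."""
    v = np.zeros(len(vocab))
    v[word_to_id[word]] = 1.0
    return v

# Embedding table: each row is a fixed-size dense vector for one word.
# In a real model these rows are learned from the contexts the word
# appears in; here they are random placeholders.
embedding_dim = 4
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), embedding_dim))

def embed(word: str) -> np.ndarray:
    return embeddings[word_to_id[word]]

print(one_hot("low"))  # [1. 0. 0. 0.] -- sparse, vocabulary-sized
print(embed("low"))    # four dense values -- fixed size, context-driven
```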

  9. Long short-term memory
  [LSTM cell diagram: input and output, memory cell c_t, and input, forget and output gates (σ), connected through multiplicative (×) units. Based on tex.stackexchange.com/questions/332747/how-to-draw-a-diagram-of-long-short-term-memory]
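For reference, the standard LSTM cell equations behind this kind of diagram; the notation (input x_t, output h_t, memory cell c_t, elementwise product ⊙) is the textbook one, assumed here rather than taken from the slide.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate memory} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{memory update} \\
h_t &= o_t \odot \tanh(c_t) && \text{output}
\end{aligned}
```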

  10. Bi-directional LSTM
  [Diagram: a forward LSTM chain and a backward LSTM chain run over the inputs x_2 … x_5, and their hidden states are combined into h_2 … h_5. Based on github.com/PetarV-/TikZ]

  11–19. Neural machine translation
  [NMT architecture diagram, built up step by step across these animation slides.]

  20. Subword units
  • The one-hot vocabulary space has to be limited due to performance issues
  • This generates a lot of out-of-vocabulary entries
  • To minimize the effect, we use subword units instead of words

  21. Byte pair encoding
  • BPE is a compression technique
  • It starts with all the different characters in the corpus
  • The most frequent character combination is selected as a BPE operation
  • This is repeated until the desired number of BPE operations is reached
  • The final size of the vocabulary is the number of BPE operations + the alphabet
  (A minimal sketch of the merge loop follows the example below.)

  22–26. Byte pair encoding example
  Corpus: low lower big bigger
  Split into characters: l o w _ l o w e r _ b i g _ b i g g e r
  First merge (l + o): lo w _ lo w e r _ b i g _ b i g g e r
  Second merge (b + i): lo w _ lo w e r _ bi g _ bi g g e r
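A compact sketch of the merge loop described on slide 21, run on this toy corpus; the function names are mine, and ties between equally frequent pairs are broken arbitrarily, so the exact merge order can differ from the build shown above.

```python
# Learn BPE merge operations from word frequencies.
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of the pair with its concatenation."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# The toy corpus: each word starts as a sequence of characters.
words = {tuple("low"): 1, tuple("lower"): 1, tuple("big"): 1, tuple("bigger"): 1}

n_merges = 4  # the number of BPE operations is the main hyperparameter
for step in range(n_merges):
    pairs = get_pair_counts(words)
    best = max(pairs, key=pairs.get)
    words = merge_pair(words, best)
    print(f"merge {step + 1}: {best}")
```

With enough merges, frequent words end up as single symbols while rare words stay decomposed into subword units.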

  27. Byte pair encoding II
  Conjugation of Spanish beber ('to drink'):
  Present: bebo, bebes, bebe, bebemos, bebéis, beben
  Preterit: bebí, bebiste, bebió, bebimos, bebisteis, bebieron
  Imperfect: bebía, bebías, bebía, bebíamos, bebíais, bebían
  Conditional: bebería, beberías, bebería, beberíamos, beberíais, beberían
  Future: beberé, beberás, beberá, beberemos, beberéis, beberán

  28. Byte pair encoding II
  The same forms after BPE segmentation share the stem beb:
  Present: beb o, beb es, beb e, beb emos, beb éis, beb en
  Preterit: beb í, beb iste, beb ió, beb imos, beb isteis, beb ieron
  Imperfect: beb ía, beb ías, beb ía, beb íamos, beb íais, beb ían
  Conditional: beb ería, beb erías, beb ería, beb eríamos, beb eríais, beb erían
  Future: beb eré, beb erás, beb erá, beb eremos, beb eréis, beb erán

  29. Multi-way model
  • The model receives corpora in several different languages, both for source and target sentences
  • Each input sentence is annotated with the source language and the requested target language
  • In our case, Spanish-English, French-Romanian and Italian-Portuguese (see the sketch below)
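A minimal sketch of what such annotation can look like; the <xx>/<2xx> tag format follows the common multilingual-NMT convention (Johnson et al.) and is an assumption, not necessarily the exact scheme used in the talk.

```python
# Prefix each source sentence with its language and the requested
# target language, so one model can serve many directions.
def annotate(src_sentence: str, src_lang: str, tgt_lang: str) -> str:
    return f"<{src_lang}> <2{tgt_lang}> {src_sentence}"

# The three training directions mentioned on the slide (toy sentences):
training_pairs = [
    ("es", "en", "bebo agua", "I drink water"),
    ("fr", "ro", "je bois de l'eau", "beau apă"),
    ("it", "pt", "bevo acqua", "bebo água"),
]

for src_lang, tgt_lang, src, tgt in training_pairs:
    print(annotate(src, src_lang, tgt_lang), "=>", tgt)

# At inference time the same tags request an *unseen* direction,
# e.g. Spanish to French, which the model was never trained on:
print(annotate("bebo agua", "es", "fr"))
```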

  30. Continuous training
  • After training, the network is seldom able to produce text in a requested language direction other than the trained ones
  • For example, if requested to translate Spanish to French, it will generate English
  • We therefore continue the training with a small corpus of sentences in the targeted direction

  31. Dictionary data
  We used three different dictionaries to continue training the system:
  • The Spanish to French Apertium dictionary (paper)
  • Spanish-French, Spanish-Portuguese and French-Portuguese dictionaries generated from Apertium data (task)
    • By following a cycle-based approach
    • By following a path-based approach

  32. Part of speech
  • The NMT models were trained without part-of-speech (POS) data
  • To assign POS, we use monolingual dictionaries automatically extracted from Wiktionary
  • If
    • the source word is in the source-language dictionary; and
    • the target word is in the target-language dictionary; and
    • they have one or more POS tags in common,
  • then generate one entry per shared POS (see the sketch below)
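A sketch of this filter under assumed data structures (plain dicts mapping words to POS-tag sets); the names and toy entries are mine.

```python
# Keep a translation pair once per POS tag that both monolingual
# Wiktionary dictionaries agree on.
source_pos = {"fuente": {"noun"}}          # Spanish monolingual dictionary
target_pos = {"source": {"noun", "verb"}}  # English monolingual dictionary

def pos_entries(src_word: str, tgt_word: str):
    """Yield one dictionary entry per POS tag shared by both words."""
    shared = source_pos.get(src_word, set()) & target_pos.get(tgt_word, set())
    for pos in sorted(shared):
        yield (src_word, tgt_word, pos)

print(list(pos_entries("fuente", "source")))
# [('fuente', 'source', 'noun')] -- 'verb' is dropped because the
# Spanish side does not list it; pairs with no shared POS yield nothing.
```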

  33. Introduction • Neural machine translation • Results • Dictionary data • Conclusion

  34. Evaluation
  • We used a dictionary automatically extracted from Wiktionary as the gold standard
  • For those systems that produce confidence scores, we calculate the precision and recall for all possible thresholds (see the sketch below)
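A sketch of this threshold sweep with invented toy data; the candidate triples, the gold set and the variable names are illustrative only.

```python
# Sweep every possible confidence threshold and score the kept
# candidates against the gold-standard dictionary.
candidates = [  # (source word, target word, model confidence)
    ("fuente", "source", 0.9),
    ("fuente", "fountain", 0.7),
    ("muelle", "source", 0.4),
]
gold = {("fuente", "source"), ("fuente", "fountain"), ("muelle", "spring")}

for threshold in sorted({c for _, _, c in candidates}, reverse=True):
    kept = [(s, t) for s, t, c in candidates if c >= threshold]
    correct = sum(1 for pair in kept if pair in gold)
    precision = correct / len(kept)
    recall = correct / len(gold)
    print(f"threshold {threshold:.1f}: "
          f"precision {precision:.2f}, recall {recall:.2f}")
```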

  35. Results (paper)
  [Plots of precision vs. number of correct entries, for Spanish → French and French → Spanish. Curves compare Apertium, NMT+Apertium 1 and NMT+Apertium 10.]

  36. Introduction • Neural machine translation • Results • Dictionary data • Conclusion

  37. Graph-based approaches
  Basic idea: retrieve translations based on the graph of languages.
  Two definitions:
  • Language graph refers to the Apertium dictionary graph
  • Translation graph refers to a graph where vertices represent words and edges represent the translations into other languages

  38. Cycle-based approach
  [Graph over EN:antique, EN:ancient, ES:antiguo, EU:zahar, FR:antique and EO:antikva. Apertium translations (black lines) in English (EN), French (FR), Basque (EU) and Esperanto (EO), and discovered possible translations (gray lines) and synonyms (red lines).]

  39. Path-based approach
  Traverse all simple paths using pivot-oriented inference.
  [Translation graph over English (en), Basque (eu), Spanish (es), French (fr), Esperanto (eo), Catalan (ca) and Portuguese (pt), linking words such as source, fuente, fonte, font, fonto, iturri, origen, origem, spring, muelle, malguki, primavera, primavero, printemps, printempo, udaberri and brollador.]
  (Task) Weight translations w.r.t. frequency and path length (see the sketch below)
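A sketch of pivot-oriented path inference over a toy translation graph, using networkx; the edges, the cutoff and the 1/length scoring are simplified stand-ins for the frequency- and path-length-based weighting mentioned on the slide.

```python
# Propose translation candidates from simple paths through pivot languages.
from collections import defaultdict
import networkx as nx

G = nx.Graph()  # nodes are (language, word) pairs; edges are translations
G.add_edges_from([
    (("es", "fuente"), ("en", "source")),
    (("en", "source"), ("fr", "source")),
    (("es", "fuente"), ("eu", "iturri")),
    (("eu", "iturri"), ("fr", "source")),
    (("es", "fuente"), ("ca", "font")),
    (("ca", "font"), ("pt", "fonte")),
])

def candidates(src, tgt_lang, cutoff=4):
    """Score target-language words by the simple paths reaching them."""
    scores = defaultdict(float)
    targets = [n for n in G if n[0] == tgt_lang]
    for tgt in targets:
        for path in nx.all_simple_paths(G, src, tgt, cutoff=cutoff):
            scores[tgt[1]] += 1.0 / len(path)  # shorter paths weigh more
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(candidates(("es", "fuente"), "fr"))
# Two independent pivot paths (via EN and via EU) reinforce the same
# French candidate 'source'.
```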

  40. Results (task, Wiktionary reference)
  [Plots of precision vs. number of correct entries for six directions: English → French, French → English, English → Portuguese, Portuguese → English, Portuguese → French and French → Portuguese. Curves compare the Cycle, Path, NMT-Cycle and NMT-Path systems.]

  41. Introduction • Neural machine translation • Results • Dictionary data • Conclusion

  42. Conclusion
  We use neural machine translation together with
  • existing bilingual knowledge (paper), and
  • discovered bilingual knowledge (task)
  to generate new dictionaries.
