Automated Translation: How Does It Work?


  1. Automated Translation: How Does It Work?
     Stelios Piperidis (ELRC, ILSP/Athena RC) and Simon Krek (Jožef Stefan Institute)
     ELRC Training Workshop in Slovenia, 08.12.2015

  2. Machine Translation Agenda:
     • Why MT: Volume, Quality and Cost?
     • Why is MT hard?
     • MT + Human Translators = Quality
     • How does modern statistical MT work?
     • It's all about Data!
     • And the right kind of Data!

  3. Machine Translation, Quality and Cost?
     • Europe = Multilinguality
     • 24 official languages, 24+2 CEF languages
     • So much to translate!
     • Translation costs!?
     • Can MT help?
     • What about the Quality?
     Image: https://en.wikipedia.org/wiki/ENIAC#/media/File:Eniac.jpg License: public domain

  4. Why is MT Hard?
     • Human languages are:
       – Elegant
       – Efficient
       – Flexible
       – Complex
     • One word/sentence may mean many things
     • Many ways of saying the same thing
     • Meaning depends on context
     • Literal and figurative language (metaphor)
     • Language and culture (different ways of conceptualising the same thing)
     • Word order
     • Morphology
     • …
     Image: http://workingtropes.lmc.gatech.edu/wiki/index.php/File:Man-vs-machine.jpg License: CC BY-NC-SA 3.0

  5. Language and Translation is Complex
     • Language/translation is complex
     • We cannot compute it exactly
     • We tried: rule-based MT and LT …
     • What do we do?
     • Machine Learning
       – Learns from data → data is all important
       – Approximate solution → not perfect, needs help
         • Human professional translators
         • Post-editing
     • Automated Translation ≠ Automatic

  6. How does Modern MT Work?
     • No maths today
     • Instead:
     • The story of Statistical MT in pictures …
     • It's all about Data …

  7. How does Modern MT Work?
     Statistical MT learns from data. Two kinds of data:
     • Human translations
     • Text in the target language
     • The more data the better!
     • Also: the right kind of data!

  8. What can/do we Learn from Data?
     • Which sentences translate as which: sentence alignment
     • Which words translate as which: word alignment + translation probabilities
     • What is good target language like: language model

  9. Sentence Alignment

  10. Word Alignment:

  11. Word Alignment:

  12. Learning to Translate Words:
     • Word alignment model knows a lot about Chinese soups
     • Doesn’t know much else …
     • Only knows what it has seen in the training data
     • Like people …
     • A common theme …
     • Given word-aligned translation data, can we learn a translation dictionary?
     • Yes, really easy …
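One way to picture "learning a translation dictionary" from word-aligned data is simple relative-frequency counting: for each source word, count how often it is aligned to each target word and divide by the total. The sketch below is a minimal illustration, not the method used by any particular system. The aligned pairs are invented for the example, and the function name `estimate_translation_probs` is hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Hypothetical word-aligned (source, target) pairs, invented for illustration.
aligned_pairs = [
    ("the", "le"), ("the", "le"), ("the", "le"),
    ("the", "la"), ("the", "la"),
    ("girl", "fille"),
]

def estimate_translation_probs(pairs):
    """Relative-frequency estimate of P(target word | source word)."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1
    table = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        table[src] = {tgt: Fraction(c, total) for tgt, c in tgt_counts.items()}
    return table

table = estimate_translation_probs(aligned_pairs)
for src, options in table.items():
    for tgt, prob in options.items():
        print(f"{src} -> {tgt}: {prob}")
```

Like the word alignment model itself, this dictionary only knows the words it has seen: a source word absent from the training pairs has no entry at all.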

  13. Statistical Machine Translation

  14. Statistical Machine Translation

  15. Statistical Machine Translation

  16. Statistical Machine Translation

  17. Statistical Machine Translation

     Source:        I     talk      to    the   girl

     Candidate 1:   J’    parlent   au    le    fille
                    2/3   2/3       2/3   3/5   1/1

     Candidate 2:   Je    parle     à     la    fille
                    1/3   1/3       1/3   2/5   1/1

     How to choose?
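To see why "How to choose?" is a real question, the fractions on this slide can be multiplied out. The sketch below assumes each fraction is the translation probability of the word it sits under (an interpretation of the flattened slide layout, not something the slide states) and that a candidate's score is the product of its word probabilities.

```python
from fractions import Fraction
from math import prod

# Per-word translation probabilities as read off the slide (assumed alignment:
# one fraction per target word, in order).
candidates = {
    "J' parlent au le fille": [Fraction(2, 3), Fraction(2, 3), Fraction(2, 3),
                               Fraction(3, 5), Fraction(1, 1)],
    "Je parle à la fille": [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3),
                            Fraction(2, 5), Fraction(1, 1)],
}

# Score each candidate by the product of its word translation probabilities.
scores = {sentence: prod(probs) for sentence, probs in candidates.items()}
best = max(scores, key=scores.get)

for sentence, score in scores.items():
    print(f"{sentence}: {score} (about {float(score):.4f})")
print("Highest translation-model score:", best)
```

Under these assumptions the ungrammatical candidate scores 8/45 and the correct one only 2/135, so word translation probabilities alone pick the wrong sentence. Something else has to judge whether the output is good French.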

  18. Statistical Machine Translation
     The Language Model:
     • What is good target language?
     • Which words can follow which words and which can’t … the grammar
     • Learnt from the data …
     • Je parle is good …
     • J’ parlent is bad …
     • la fille is good …
     • le fille is bad …
     • Je parle à la fille >> J’ parlent à le fille
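A language model of this kind can be sketched as a bigram model: count which word follows which in target-language text, then score a sentence by how familiar its word pairs are. The example below is a toy under stated assumptions. The four-sentence French corpus is invented, add-one smoothing is one simple choice among many, and the names `train_bigram_lm` and `log_prob` are hypothetical.

```python
import math
from collections import Counter

# Tiny invented target-language corpus (lowercased, whitespace-tokenised).
corpus = [
    "je parle à la fille",
    "je parle à la femme",
    "la fille parle",
    "le garçon parle à la fille",
]

def train_bigram_lm(sentences):
    """Count context words and bigrams, with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab

def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a sentence."""
    contexts, bigrams, vocab = model
    v = len(vocab) + 1  # one extra slot for unseen words
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
        for a, b in zip(tokens, tokens[1:])
    )

model = train_bigram_lm(corpus)
good = log_prob("je parle à la fille", model)
bad = log_prob("j' parlent à le fille", model)
print(f"good: {good:.2f}  bad: {bad:.2f}")
```

The well-formed sentence gets the higher score because pairs such as "je parle" and "la fille" were seen in the corpus, while "à le" and "le fille" were not.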

  19. Statistical Machine Translation

  20. How does Modern MT Work?
     • No maths today
     • Instead:
     • The story of Statistical MT in pictures …
     • It's all about Data …

  21. Phrase-Based SMT
     • So far: translating single words
     • Loses context, such as agreement (le fille …) etc.
     • To some extent “repaired” by the language model
     • A better model:
     • Not just translations of single words
     • But also phrase translations:
       – the girl : la fille
       – to the girl : à la fille
       – I talk : Je parle
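The effect of phrase translations can be sketched with a phrase table and a greedy longest-match lookup. This is a deliberate simplification: real phrase-based decoders such as Moses search over many segmentations and combine several model scores, whereas this toy picks the longest known phrase at each position. The table holds the three phrase pairs from the slide, and the function name `translate_greedy` is hypothetical.

```python
# Phrase pairs taken from the slide; keys are tuples of source words.
phrase_table = {
    ("the", "girl"): "la fille",
    ("to", "the", "girl"): "à la fille",
    ("I", "talk"): "Je parle",
}

def translate_greedy(sentence, table):
    """Left to right, translate the longest source phrase found in the table.

    Words with no matching phrase are copied through unchanged.
    """
    tokens = sentence.split()
    max_len = max(len(phrase) for phrase in table)
    output, i = [], 0
    while i < len(tokens):
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + length])
            if phrase in table:
                output.append(table[phrase])
                i += length
                break
        else:
            output.append(tokens[i])  # unknown word: pass through
            i += 1
    return " ".join(output)

result = translate_greedy("I talk to the girl", phrase_table)
print(result)
```

Because "to the girl" is translated as one unit, the agreement between "la" and "fille" comes along with the phrase and does not have to be repaired afterwards.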

  22. Statistical Machine Translation

  23. Phrase Based - Statistical Machine Translation

  24. Phrase Based - Statistical Machine Translation

  25. Phrase Based - Statistical Machine Translation
     • Much better than word-based SMT!
     • Standard technology: Google, Microsoft, Baidu, Global Localisation & Translation Industry
     • Moses Open Source PB-SMT
     • Most widely used SMT system
     • Research funded by EC
     • Used by EC DGT’s MT@EC

  26. Machine Translation and Data
     • Statistical Machine Translation is all about data
     • SMT learns how to translate from data
     • Data
       – Translations (bilingual data)
       – Monolingual data (target language text)
       – Dictionaries, terminology, ontologies, named entities
     • Like people, SMT is good at what it has learned

  27. Machine Translation and Data

  28. Machine Translation and Data

  29. CEF.AT and Data
     • CEF.AT needs the right kind of data
     • National governments, public administration, public services, NGOs
     • CEF provides services for multilingual engagement with national citizens, EU citizens and other customers of public administration

  30. ELRC
     • Help us make CEF.AT a success
       – Services for Europe’s citizens
       – Services for you
       – Support multilinguality
     • Help us find the right kind of data
     • Supporting our language is supporting Europe and vice versa
