Post-editing Effort for English to Arabic Hybrid Machine Translation

An Empirical Study: Post-editing Effort for English to Arabic Hybrid Machine Translation
Hassan Sajjad, Francisco Guzman, Stephan Vogel
Qatar Computing Research Institute, HBKU

Introduction
• Old Arabic documents
• Translation of metadata from English to Arabic

Traditional Translation Process
[Diagram: British Library, Translation Company, TM, Translators]

Problem
• Various small documents
• Little overlap at sentence/segment level
• Few translation memory matches
  – A lot needs to be translated from scratch
• Inefficient in both time and cost

Solution: Hybrid Machine Translation
• Translation Memory (TM): high precision, readily available translations
• Customized MT (CMT): 100% recall
• Hybrid MT: combines the benefits of both!

Hybrid MT System
• Translation Memory (TM)
  – First pass: use strict matching to translate known words and phrases
• Customized Machine Translation (CMT)
  – Second pass: translate the remaining text using a machine translation system
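The two passes can be sketched as follows. This is a minimal illustration, not the system from the talk: the greedy longest-match lookup, the `max_phrase_len` limit, and the names `hybrid_translate` and `cmt_translate` are all assumptions, and the CMT system is stood in for by an arbitrary callable.

```python
from typing import Callable, Dict, List, Tuple


def hybrid_translate(
    source: str,
    tm: Dict[str, str],
    cmt_translate: Callable[[str], str],
    max_phrase_len: int = 5,
) -> List[Tuple[str, str]]:
    """Return (origin, translation) pairs, where origin is "TM" or "CMT".

    `tm` maps source phrases to stored translations (hypothetical format).
    `cmt_translate` is a placeholder for a real MT system.
    """
    tokens = source.split()
    output: List[Tuple[str, str]] = []
    pending: List[str] = []  # tokens without a TM match, left for the CMT pass

    def flush_pending() -> None:
        # Second pass: send the unmatched span to the MT system.
        if pending:
            output.append(("CMT", cmt_translate(" ".join(pending))))
            pending.clear()

    i = 0
    while i < len(tokens):
        match_len = 0
        # First pass: strict (exact) lookup, trying the longest phrase first.
        for n in range(min(max_phrase_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]) in tm:
                match_len = n
                break
        if match_len:
            flush_pending()
            output.append(("TM", tm[" ".join(tokens[i:i + match_len])]))
            i += match_len
        else:
            pending.append(tokens[i])
            i += 1
    flush_pending()
    return output


# Toy data: the bracketed strings stand in for Arabic translations.
toy_tm = {"British Library": "<TM:British Library>", "letter": "<TM:letter>"}


def toy_cmt(text: str) -> str:
    return f"<CMT:{text}>"


result = hybrid_translate("letter from the British Library archive", toy_tm, toy_cmt)
for origin, translation in result:
    print(origin, translation)
```

Tagging each output span with its origin is a design choice made here for illustration; it would let a post-editor see which parts came from the trusted TM and which from the MT system.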

Aiming higher: Post-editing for Quality
[Diagram: TM, CMT, Hybrid MT, Post Editors]
• High quality
• High consistency
• Cost and time effective

Customized Machine Translation (CMT)
• A statistical machine translation system
  – Trained specifically on the domain of the text that needs to be translated
• General practice
  – Use Moses
  – Train on the translation memory data
  – Follow the recipe of a competition-grade system to ensure high quality

English to Arabic CMT
• The best competition-grade pipeline involves
  – Arabic (de-)tokenization
    • Splitting morphologically rich words into smaller segments and vice versa
    • +1.5 BLEU points improvement
  – Arabic (de-)normalization
    • Mapping different forms of a letter to one form and vice versa
    • +0.5 BLEU point improvement
This ensures high quality but does not guarantee less frustration for post-editors

Why?
CMT translation output requires:
• De-tokenization and de-normalization
• De-normalization introduces character-level errors
  – Frustrating for the post-editor to correct
  – Time inefficient
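To illustrate why de-normalization is error-prone, here is a minimal sketch assuming a typical normalization set (alef variants mapped to bare alef, alef maksura mapped to yeh). The slides do not list the exact mappings used, so `NORMALIZATION_MAP` and both function names are illustrative. Because the mapping is many-to-one, the reverse step has to guess which original letter was intended.

```python
from typing import List

# Assumed normalization set; the actual mappings used in the talk are not given.
NORMALIZATION_MAP = {
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda above -> bare alef
    "\u0649": "\u064A",  # alef maksura -> yeh
}

_TABLE = str.maketrans(NORMALIZATION_MAP)


def normalize(text: str) -> str:
    """Map each variant letter to its single normalized form."""
    return text.translate(_TABLE)


def denormalization_candidates(ch: str) -> List[str]:
    """List every letter that normalizes to `ch`, including `ch` itself."""
    variants = sorted(src for src, dst in NORMALIZATION_MAP.items() if dst == ch)
    return [ch] + variants


# Two different words (hamza above vs. hamza below, followed by noon)
# collapse to the same string after normalization.
word_a = "\u0623\u0646"
word_b = "\u0625\u0646"
print(normalize(word_a) == normalize(word_b))

# A de-normalizer seeing a bare alef has four letters to choose from.
print(len(denormalization_candidates("\u0627")))
```

Any wrong guess among those candidates is a single-character error that a post-editor has to spot and fix by hand, which is the frustration the slide describes.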

Recommended Practices for English-Arabic CMT
• Don’t normalize, but
• Always tokenize
  – Improves coverage of words
  – Better translations

Let’s Talk about BL Case Numbers!
We compare:
• Translation Memory (TM) only
• Hybrid MT (TM + CMT)
Also:
• Translator
• Hybrid MT + Post-editing (PE)
Looking at:
• Effectiveness
• Quality
• Consistency

Data
• 1000 documents
  – 90k parallel sentences/segments
  – 953 documents for training
    • 489k tokens
  – Rest for tuning and testing

Effectiveness of TM
[Figure: coverage of exact-match and fuzzy-match TM over words and segments; values shown: 7%, 84%, 13.5%, 50%]
More than 85% of words still need to be translated!
* Based on an assessment over X documents
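One way such segment-level and word-level coverage figures might be computed for exact matches is sketched below. The function name, the whitespace word count, and the toy segments are assumptions for illustration; fuzzy matching is not covered.

```python
from typing import Iterable, List, Tuple


def tm_exact_coverage(segments: List[str], tm_sources: Iterable[str]) -> Tuple[float, float]:
    """Return (segment coverage, word coverage) of exact TM matches.

    A segment counts as covered only if it appears verbatim in the TM.
    Words are counted by whitespace splitting (a simplification).
    """
    tm_set = set(tm_sources)
    matched = [seg for seg in segments if seg in tm_set]
    total_words = sum(len(seg.split()) for seg in segments)
    matched_words = sum(len(seg.split()) for seg in matched)
    segment_coverage = len(matched) / len(segments) if segments else 0.0
    word_coverage = matched_words / total_words if total_words else 0.0
    return segment_coverage, word_coverage


# Toy data: short segments repeat and match; the long one does not.
toy_segments = [
    "Letter",
    "Dated 1905",
    "Report on the trade of the Persian Gulf for the year",
]
toy_tm_sources = ["Letter", "Dated 1905"]

seg_cov, word_cov = tm_exact_coverage(toy_segments, toy_tm_sources)
print(f"segments covered: {seg_cov:.1%}, words covered: {word_cov:.1%}")
```

The toy data shows how the two measures can diverge: when the matching segments are short, a sizeable share of segments can be covered while most of the words are still untranslated.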

Effectiveness of CMT
100% of segments and 99.9% of words translated!

Effectiveness of Hybrid MT
• High precision
  – TM exact matches
• High recall
  – CMT to produce high quality translations

Assessing Quality
• BLEU
  – Compare output to ‘reference’ translation

            Strict   Partial
TM           7.07    21.01
TM + CMT    54.60    48.54

The BLEU score of CMT alone is 53.90

Assessing Quality
• TER: Translation Error Rate
  – How much effort is needed to get a perfect translation?
  – Compare to ‘reference’ translation
[Bar chart: percentage of effort required (0% to 100%) for Hybrid MT and TM]
Hybrid MT can improve beyond that!
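The idea behind TER can be sketched as the number of word-level edits needed to turn the system output into the reference, divided by the reference length. The sketch below is a simplification: full TER also allows block shifts, which are omitted here, so this version behaves like word error rate and can overestimate effort relative to real TER. Function names are illustrative.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Levenshtein distance over words (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(
                prev[j] + 1,         # delete a hypothesis word
                curr[j - 1] + 1,     # insert a reference word
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def simple_ter(hypothesis: str, reference: str) -> float:
    """Edits per reference word; shifts are NOT modelled (assumption)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


# One substituted word out of four reference words.
print(simple_ter("a b c d", "a x c d"))
```

A score of 0 means the output already equals the reference; higher scores mean a larger share of the reference has to be retyped or corrected.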

Assessing Quality
• TER vs. post-editing effort
  – Similar effort estimation using post-editing of Hybrid MT
[Bar chart: percentage of effort required (0% to 100%) for PE on Hybrid MT, Hybrid MT, and TM]
* PE is based on an assessment over 4 documents, using a junior translator

Consistency of Hybrid MT
• We compared Hybrid MT versus a junior translator
• We measured consistency with reference translations
[Bar chart: overlap with reference translation (0% to 70%) for Hybrid MT and Translator]
Hybrid MT is more consistent with reference translations
* Based on an assessment over 4 documents

Speedup of Hybrid MT
• We compared Hybrid MT versus a junior translator
[Bar chart: time taken to translate in minutes (0 to 120) for Translator and Hybrid MT + PE]
Hybrid MT + PE is 30% more efficient
* Based on an assessment over 4 documents

Conclusion
• Hybrid MT
  – High precision and high recall
• Hybrid MT plus post-editing
  – Efficient in terms of both time and cost
  – Improves consistency
• Customized MT for English-Arabic
  – Don’t normalize but always tokenize

References
• Ahmed Abdelali, Kareem Darwish, Nadir Durrani, and Hamdy Mubarak. Farasa: A Fast and Furious Segmenter for Arabic. In NAACL-2016, San Diego, US.
• Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. Moses: Open source toolkit for statistical machine translation. In ACL-2007, Prague, Czech Republic.
• Hassan Sajjad, Francisco Guzman, Preslav Nakov, Ahmed Abdelali, Kenton Murray, Fahad Al Obaidli, and Stephan Vogel. QCRI at IWSLT 2013: Experiments in Arabic-English and English-Arabic Spoken Language Translation. In IWSLT-2013, Heidelberg, Germany.

Thank you
