Findings of the 2016 Conference on Machine Translation
WMT 2016 @ ACL, Berlin, Germany, August 11–12
Organizers: Ondřej Bojar (Charles University in Prague), Christian Buck (University of Edinburgh), Rajen Chatterjee (FBK), Christian Federmann (MSR), Liane Guillou (University of Edinburgh), Barry Haddow (University of Edinburgh), Matthias Huck (University of Edinburgh), Antonio Jimeno Yepes (IBM Research Australia), Varvara Logacheva (University of Sheffield), Aurélie Névéol (LIMSI, CNRS), Mariana Neves (Hasso-Plattner Institute), Pavel Pecina (Charles University in Prague), Martin Popel (Charles University in Prague), Philipp Koehn (University of Edinburgh / Johns Hopkins University), Christof Monz (University of Amsterdam), Matteo Negri (FBK), Matt Post (Johns Hopkins University), Carolina Scarton (University of Sheffield), Lucia Specia (University of Sheffield), Karin Verspoor (University of Melbourne), Jörg Tiedemann (University of Helsinki), Marco Turchi (FBK)
News Translation Task
Overview
Languages: Français, čeština, English, Deutsch, română (NEW), русский, suomi, Türkçe (NEW)
Funding
• European Union’s Horizon 2020 program
• Yandex (Russian–English and Turkish–English test sets)
• University of Helsinki (Finnish–English test set)
Participation
• 102 entries from 24 institutions
• +4 anonymized commercial, online, and rule-based systems
Human Evaluation
Human Evaluation
• We wish to identify the best systems for each task
  – Automatic metrics are useful for development, but must be grounded in human evaluation of system output
• How to compute it?
  – Adequacy / fluency, sentence ranking (RR), constituent ranking, constituent OK, sentence comprehension
  – Direct Assessment (DA)
Metric / Year          '06 '07 '08 '09 '10 '11 '12 '13 '14 '15 '16
Adequacy / Fluency      ●   ●
Sentence Ranking            ●   ●   ●   ●   ●   ●   ●   ●   ●   ●
Constituent Ranking         ●   ●
Constituent OK                  ●
Sentence Comprehension              ●   ●
Direct Assessment                                               ●
Sentence Ranking
Ranking five outputs, e.g. C > A > B > D > E, implies:
  A > {B, D, E}
  B > {D, E}
  C > {A, B, D, E}
  D > {E}
= 10 pairwise rankings
https://github.com/cfedermann/Appraise/
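To make the expansion concrete, here is a minimal Python sketch (our illustration, not the Appraise implementation) that turns one 5-way ranking into its implied pairwise judgments:

```python
# Expand a single 5-way ranking into its implied pairwise judgments.
# Ties are ignored in this sketch; real WMT rankings also allow tied ranks.
from itertools import combinations

def pairwise_judgments(ranking):
    """ranking: system IDs ordered best first, e.g. ['C', 'A', 'B', 'D', 'E'].
    combinations() preserves input order, so the first element of each pair
    is always the higher-ranked (winning) system."""
    return list(combinations(ranking, 2))

# C > A > B > D > E yields 4 + 3 + 2 + 1 = 10 pairwise rankings.
print(pairwise_judgments(['C', 'A', 'B', 'D', 'E']))
```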
More Judgments • Innovation: rank distinct outputs instead of systems • Then, distribute rankings across systems:
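The distribution step can be sketched as follows; the data shapes and names here are our assumptions for illustration, not the WMT tooling:

```python
# Distribute judgments made on distinct outputs across all systems that
# produced those outputs. output_to_systems maps each distinct output
# string to the list of systems that generated it (illustrative structure).
def distribute(output_pairs, output_to_systems):
    """Expand (winning_output, losing_output) pairs into system-level pairs."""
    system_pairs = []
    for win_out, lose_out in output_pairs:
        for win_sys in output_to_systems[win_out]:
            for lose_sys in output_to_systems[lose_out]:
                system_pairs.append((win_sys, lose_sys))
    return system_pairs

# One judged pair of distinct outputs, the winner shared by two systems,
# expands into two system-level pairwise rankings.
print(distribute([('out1', 'out2')],
                 {'out1': ['sysA', 'sysB'], 'out2': ['sysC']}))
```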
Data collected
• 150 trusted annotators, 939 person-hours
• Pairwise judgments (thousands):
  Year   Pairs   Expanded
  2014    328       –
  2015    290      252
  2016    324      245
statmt.org/wmt16/results.html
Clustering
• Rank systems using TrueSkill (Herbrich et al., 2006; Sakaguchi et al., 2014)
• Cluster (Koehn, 2012); see the sketch after this list
  – Aggregate each system’s rank over 1,000 bootstrap-resampled folds
  – Throw out the top and bottom 25 ranks, collect the remaining ranges
  – Group systems by non-overlapping ranges
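A compact sketch of that procedure, assuming a helper trueskill_ranks(sample) that runs TrueSkill inference over a sample of pairwise judgments and returns a {system: rank} dict; the helper and all names are assumptions, not the WMT code:

```python
import random

def cluster_systems(judgments, systems, trueskill_ranks, folds=1000):
    """Group systems into clusters whose bootstrap rank ranges do not overlap."""
    ranks = {s: [] for s in systems}
    for _ in range(folds):
        # Bootstrap fold: resample the pairwise judgments with replacement.
        sample = random.choices(judgments, k=len(judgments))
        for system, rank in trueskill_ranks(sample).items():
            ranks[system].append(rank)
    # Throw out the top and bottom 25 ranks; keep the remaining range.
    ranges = {}
    for system, rs in ranks.items():
        rs.sort()
        ranges[system] = (rs[25], rs[-26])
    # Sweep systems by best rank; start a new cluster whenever the next
    # system's range no longer overlaps the current cluster's range.
    clusters, current, current_max = [], [], -1
    for system, (lo, hi) in sorted(ranges.items(), key=lambda kv: kv[1]):
        if current and lo > current_max:
            clusters.append(current)
            current, current_max = [], -1
        current.append(system)
        current_max = max(current_max, hi)
    if current:
        clusters.append(current)
    return clusters
```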
Manual evaluation summary
• ~4.1k rankings / task (~3k last year)
• Total judgments: 542k (328k last year)
• Data: statmt.org/wmt16/results.html
[Chart: pairwise judgments per system vs. number of systems in task, 2015 and 2016]
Czech–English
cluster  constrained                not constrained
  1      uedin-nmt
  2      jhu-pbmt
  3                                 online-B
  4      PJATK, TT-*
  5                                 online-A
  6      cu-mergetrees
English–Czech
cluster  constrained                not constrained
  1      uedin-nmt
  2      nyu-montreal
  3      jhu-pbmt
  4      cu-chimera, cu-tamchyna
  5      uedin-cu-syntax            online-B
  6      TT-*
  7                                 online-A
  8      cu-tectomt
  9      tt-usaar-hmm-mert
  10     cu-mergetrees
  11     tt-usaar-hmm-mira
  12     tt-usaar-harm
Russian–English
cluster  constrained                      not constrained
  1      amu-uedin, NRC, uedin-nmt        online-G, online-B
  2      AFRL-MITLL-phr                   online-A
  3      AFRL-MITLL-cntr                  PROMT-rule
  4                                       online-F
English–Russian
cluster  constrained                                    not constrained
  1                                                     promt-rule
  2      amu-uedin, uedin-nmt                           online-B, online-G
  3      NYU-montreal, jhu-pbmt, limsi, AFRL-MITLL-phr
  4                                                     online-A
  5      AFRL-MITLL-verb
  6                                                     online-F
German–English
cluster  constrained                              not constrained
  1      uedin-nmt
  2      uedin-syntax, kit, uedin-pbmt, jhu-pbmt  online-B, online-A
  3      jhu-syntax                               online-G
  4                                               online-F
English–German
cluster  constrained                 not constrained
  1      uedin-nmt
  2      metamind
  3      uedin-syntax
  4      nyu-montreal
  5      kit-limsi, cambridge, kit   online-B, online-A, promt-rule
  6      jhu-syntax, jhu-pbmt
  7      uedin-pbmt                  online-F, online-G
Romanian–English
cluster  constrained                    not constrained
  1      uedin-nmt                      online-B
  2      uedin-pbmt
  3      uedin-syntax, jhu-pbmt, limsi  online-A
English–Romanian
cluster  constrained                                    not constrained
  1      uedin-nmt, qt21-himl-comb
  2      kit, uedin-pbmt, uedin-lmu-hiero, rwth-comb,   online-B
         limsi, lmu-cuni, jhu-pbmt, usfd-rescoring
  3                                                     online-A
Finnish–English
cluster  constrained                 not constrained
  1      uedin-pbmt                  online-G, online-B, uh-opus
  2                                  PROMT-smt
  3      uh-factored, uedin-syntax
  4                                  online-A
  5      jhu-pbmt
English–Finnish
cluster  constrained                                     not constrained
  1      abumatran-nmt, abumatran-cmb                    online-G, online-B, uh-opus
  2      abumatran-pb, nyu-montreal                      online-A
  3      jhu-pbmt, uh-factored, aalto, jhu-hltcoe, uut
Turkish–English
cluster  constrained                   not constrained
  1                                    online-B, online-G, online-A
  2      tbtk-syscomb, usda            PROMT-smt
  3      jhu-syntax, jhu-pbmt, parFDA
English–Turkish
cluster  constrained                   not constrained
  1                                    online-G, online-B
  2                                    online-A
  3      ysda
  4      jhu-hltcoe, tbtk-morph, cmu
  5      jhu-pbmt, parFDA
Trends
• UEdin-NMT
  – 4 language pairs: uncontested winner
  – 3 language pairs: tied for first
  – 1 language pair: tied for second (behind a rule-based system!)
• English–Russian: the rule-based system (PROMT-rule) is the winner by a wide margin
Comparison with BLEU
[Scatter plot: TrueSkill mean (y-axis, −1.4 to 0.8) vs. BLEU score (x-axis, 0 to 0.3); promt-rule and uedin-nmt are labelled]
Data
• statmt.org/wmt16/results.html
  – Source and reference data, system outputs
  – Manual evaluation results (raw XML, CSV files with pairwise rankings):
      srclang,trglang,id,judge,sys1,sys1rank,sys2,sys2rank,group
      deu,eng,348,judge13,jhu-syntax,3,online-B,5,190
• github.com/cfedermann/wmt16
  – Code used to compute rankings, clusters, annotator agreement
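As an illustration, a few lines of Python suffice to extract winners from the pairwise-ranking CSV above; the file name is ours, and the column names follow the header shown on the slide:

```python
import csv

# Illustrative file name; the real files live at statmt.org/wmt16/results.html.
with open('wmt16-pairwise-rankings.csv') as f:
    for row in csv.DictReader(f):
        r1, r2 = int(row['sys1rank']), int(row['sys2rank'])
        if r1 == r2:
            continue                                       # tie: no preference
        # A lower rank number means the judge preferred that output.
        winner = row['sys1'] if r1 < r2 else row['sys2']
        loser  = row['sys2'] if r1 < r2 else row['sys1']
```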
Direct Assessment