Combining Crowd and AI to scale professional-quality translation João Graça CTO
The Internet, 1997 80% English
The Internet, 2017 30% English 20% Chinese 8% 6% Spanish 5% Japanese 4% Portuguese 3% German Arabic
“All translation firms together are able to translate far less than 1% of relevant content produced every day” CSA – MT Is Unavoidable to Keep Up with Content Volumes
Is Machine Translation already here?
Everyone agrees that NMT is here to stay and much better than SMT
* Marcin Junczys-Dowmunt, Tomasz Dwojak, Hieu Hoang. Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions
Unbabel Experiments with Customer Service Tickets
[Chart: BLEU for Moses (generic), Neural MT (generic), Moses (adapted), Neural MT (adapted). Values shown: 49,9; 43,5; 35,7; 29,6]
Is NMT really enough? * A Neural Network for Machine Translation, at Production Scale. Google Blog
Quality per Job [Chart: MQM (0% to 100%) by job (0 to 30); legend: Good, Not sure, Bad]
Will AI solve translation? [Chart labels: jobs, MQM 95, quality, time, machine only]
Will AI solve translation? [Chart labels: jobs, MQM 95, human effort, time]
Unbabel Pipeline
Job → Machine Translation → Quality Estimation
High Q.E. → Customer
Low Q.E. → Community Translators → Q.E. Re-Eval
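The routing decision in this pipeline can be sketched in a few lines. This is a minimal illustration only: the slides do not give the score scale or the cut-off, so the `[0, 1]` range, the `HIGH_QE_THRESHOLD` value and the names `Job` and `route_job` are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical cut-off: the slides do not state the real threshold.
HIGH_QE_THRESHOLD = 0.8


@dataclass
class Job:
    source: str
    translation: str   # machine translation output
    qe_score: float    # assumed to lie in [0, 1], higher is better


def route_job(job: Job, threshold: float = HIGH_QE_THRESHOLD) -> str:
    """Return the next stage for a machine-translated job."""
    if job.qe_score >= threshold:
        return "customer"    # high Q.E.: deliver directly
    return "community"       # low Q.E.: send to community translators


print(route_job(Job("Hello", "Olá", qe_score=0.93)))
print(route_job(Job("Hello", "Hey lá", qe_score=0.41)))
```

After community editing, the same function could be applied again to the re-estimated score, which is one way to read the "Q.E. Re-Eval" box.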
Data Generation Engine [Diagram: Job, Machine Translation, Quality Estimation, Community Translators, Quality Estimation, Customer]
Data Generation Engine
Before: Initial text, submitted text. NO DATA POINTS in between.
After: Initial text, submitted text, and all changes in between (mouse clicks, key presses, timestamps).
Keystroke Analysis
Raw data:
At 18:03:30, in nugget 3: mouseClick. Cursor at 16. Selected: 0
At 18:03:31, in nugget 3: Pressed Backspace. Cursor at 16. Selected: 0
At 18:03:31, in nugget 3: Pressed Backspace. Cursor at 15. Selected: 0
At 18:03:31, in nugget 3: Pressed Backspace
At 18:03:35, in nugget 3: Pressed Shift. Cursor at 25. Selected: 0
At 18:03:35, in nugget 3: Pressed s. Cursor at 25. Selected: 0
At 18:03:35, in nugget 3: Pressed i. Cursor at 26. Selected: 0
At 18:03:35, in nugget 3: Pressed e
Processed information:
Initial text: “Espero que esto es útil”
• Deleted word “ es”
• Inserted word “sea”
Submitted text: “Espero que esto sea útil”
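One way to turn such a raw log back into text is to replay the events against the initial string. The sketch below assumes a simplified, hypothetical event format (`click`, `backspace`, `char`) and a hand-written event list; it is not the log shown on the slide, and `Event` and `replay` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    kind: str                     # "click", "backspace" or "char" (assumed format)
    cursor: Optional[int] = None  # new cursor position, for clicks
    char: Optional[str] = None    # typed character, for "char" events


def replay(initial: str, events: List[Event]) -> str:
    """Apply editing events to the initial text and return the result."""
    text, cursor = initial, len(initial)
    for ev in events:
        if ev.kind == "click":
            cursor = ev.cursor
        elif ev.kind == "backspace" and cursor > 0:
            # Remove the character just before the cursor.
            text = text[:cursor - 1] + text[cursor:]
            cursor -= 1
        elif ev.kind == "char":
            # Insert the typed character at the cursor.
            text = text[:cursor] + ev.char + text[cursor:]
            cursor += 1
    return text


# Hypothetical log: click after "es", delete it, type "sea".
events = [Event("click", cursor=18), Event("backspace"), Event("backspace")]
events += [Event("char", char=c) for c in "sea"]
print(replay("Espero que esto es útil", events))  # Espero que esto sea útil
```

Replaying the log and checking that the result equals the submitted text is a cheap consistency check before any further processing.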
Unbabel Pipeline: MACHINE, QE, COMMUNITY
Job → Machine Translation → Quality Estimation
High Q.E. → Customer
Low Q.E. → Crowd → Q.E. Re-Eval
Machine Translation Pipeline [Diagram components: Job, Translation Memory, MT Router, Customer MT, Customer APE, Result]
Machine Translation Models Phrase-based MT Neural MT
Customer Adaptation: Customer Support Tickets
[Chart: BLEU, axis 30 to 60. Bar labels appear to be Google NMT, NMT, 1K, 5K, 10K, 20K, 25K. Values shown: 51,7; 48,8; 47,2; 46,9; 43,2; 43,1; 34,1]
Quality Estimation
Word-Level QE: Which words are translated correctly/incorrectly?
Sentence-Level QE: How good is the entire translation?
Quality Estimation: Word-level QE example
Hey lá , eu sou pesaroso sobre aquele !
[Each of the 9 tokens carries an OK or BAD tag: 4 OK, 5 BAD]
QE Training [Diagram labels: Unbabel Ticket (source, MT, final); Bad translation; Good translation]
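The slide does not say how training labels are produced from the source, MT and final text. A common approach, assumed here, is to align the MT output with the final post-edited text and tag every MT word that survives as OK and every other word as BAD. The sketch uses Python's `difflib` for the alignment; `word_tags` is an illustrative name.

```python
import difflib


def word_tags(mt: str, post_edit: str):
    """Tag each MT word OK if it is kept in the post-edit, else BAD."""
    mt_words, pe_words = mt.split(), post_edit.split()
    tags = ["BAD"] * len(mt_words)
    matcher = difflib.SequenceMatcher(a=mt_words, b=pe_words, autojunk=False)
    # Matching blocks are runs of words shared by both sequences, in order.
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            tags[i] = "OK"
    return list(zip(mt_words, tags))


for word, tag in word_tags("Espero que esto es útil", "Espero que esto sea útil"):
    print(word, tag)
```

Shared-task data is usually labeled with an edit-distance aligner rather than `difflib`, so the tags from this sketch may differ from official ones on harder sentences.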
QE Word Level (F1_MULT)
Ugent System: 41,1
LinearQE: 45,2
NeuralQE: 47,3
StackedQE (WMT 16 WL QE Winner): 50,3
APE-QE: 55,7
FullStackedQE: 57,5
Andre F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramon Astudillo, Chris Hokamp, Roman Grundkiewicz. “Pushing the Limits of Translation Quality Estimation.” TACL 2017 (To Appear)
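For reference, F1_MULT as used in the WMT16 word-level QE task is the product of the F1 score on the OK class and the F1 score on the BAD class. The sketch below computes it for flat lists of gold and predicted tags; the function names and the toy data are illustrative, and the chart values appear to be this metric scaled by 100.

```python
def f1(gold, pred, label):
    """F1 score for a single label over parallel tag lists."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_mult(gold, pred):
    """Product of the per-class F1 scores for OK and BAD."""
    return f1(gold, pred, "OK") * f1(gold, pred, "BAD")


gold = ["OK", "OK", "BAD", "BAD"]
pred = ["OK", "BAD", "BAD", "BAD"]
print(round(f1_mult(gold, pred), 4))
```

Because the two scores are multiplied, a system that tags every word OK (or every word BAD) scores zero, which is why this metric is preferred over plain accuracy on imbalanced data.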
QE Sentence Level (Pearson Correlation)
YANDEX: 52,5
StackedQE: 54,9
APE-QE: 61,3
FullStackedQE: 65,6
Andre F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramon Astudillo, Chris Hokamp, Roman Grundkiewicz. “Pushing the Limits of Translation Quality Estimation.” TACL 2017 (To Appear)
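Sentence-level systems are compared here by Pearson correlation between predicted and reference quality scores. A minimal sketch of the computation follows; the sample scores are made up for illustration, and the chart values appear to be correlations multiplied by 100.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predicted vs. reference scores for five sentences.
predicted = [0.10, 0.35, 0.20, 0.60, 0.05]
reference = [0.15, 0.30, 0.25, 0.70, 0.00]
print(round(pearson(predicted, reference), 3))
```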
QE in the Pipeline
Job → Machine Translation → Quality Estimation
High Q.E. → Customer
Low Q.E. → Community Translators → Q.E. Re-Eval
Document-Level QE: how good is the entire document?
Human QE: Can we evaluate post-edit output?
Interesting numbers coming soon
Professional Translation
Goals: Quality, Speed, Cost
Pillars:
• Editors Pool
• Initial Text (MT)
• Editor Assignment
• Interfaces
• Quality Evaluation
Unbabel Community 50 000 Users
Distributed Pipeline Quality Estimation
Editors Pool
Expert (Evaluators, Annotators): Specialisation layers will grow with time
Paid Content: The best editors have access to paid content
Training Content: Editors get ratings for the tasks
Testing Phase: Editors are tested when they sign up
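The lower three layers of this pool can be read as a simple promotion rule. The sketch below is purely illustrative: the slides give no numeric criteria, so the rating scale, `paid_min_avg` and `min_tasks` are hypothetical, and the Expert layer is left out because the slides do not say how editors reach it.

```python
from statistics import mean


def tier_for(passed_test: bool, ratings, paid_min_avg=4.0, min_tasks=20) -> str:
    """Return the pool layer for an editor (hypothetical thresholds)."""
    if not passed_test:
        return "testing"    # editors are tested when they sign up
    if len(ratings) < min_tasks or mean(ratings) < paid_min_avg:
        return "training"   # editors get ratings for training tasks
    return "paid"           # the best editors get access to paid content


print(tier_for(False, []))
print(tier_for(True, [4.5] * 5))
print(tier_for(True, [4.5] * 25))
```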
How Editors are Evaluated
Editors Profiling
Editor Assignment
Editors pull tasks.
Queue | Priority | SLA | Tasks/time
G | 1000 | 6 H | 2 m
G | 1100 | 30 m | 6 m
G | 1000 | 2 D | 10 m
G | 1000 | 6 D | 12 m
R | 1100 | 20 m | 18 m
R | 1100 | 40 m | 45 m
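A pull-based assignment like this one can be modelled with a priority queue. The ordering rule below (higher priority first, then tighter SLA) is an assumption, since the slides do not state how tasks are ranked, and `TaskPool` is an illustrative name. The six tasks mirror the queue, priority and SLA values on the slide, with SLAs converted to minutes.

```python
import heapq


class TaskPool:
    """Tasks ordered by priority (high first), then SLA (short first)."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker that keeps insertion order stable

    def add(self, queue: str, priority: int, sla_minutes: int) -> None:
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, sla_minutes, self._counter, queue))
        self._counter += 1

    def pull(self):
        """Return the next task for an editor, or None if the pool is empty."""
        if not self._heap:
            return None
        neg_priority, sla_minutes, _, queue = heapq.heappop(self._heap)
        return {"queue": queue, "priority": -neg_priority, "sla_minutes": sla_minutes}


pool = TaskPool()
for queue, priority, sla in [("G", 1000, 360), ("G", 1100, 30), ("G", 1000, 2880),
                             ("G", 1000, 8640), ("R", 1100, 20), ("R", 1100, 40)]:
    pool.add(queue, priority, sla)

print(pool.pull())  # the 1100-priority task with the 20-minute SLA
```

A real scheduler would probably also filter by queue and by editor skills before pulling, which this sketch leaves out.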