Combining Crowd and AI to scale professional-quality translation João Graça CTO

The Internet, 1997: 80% English

The Internet, 2017: 30% English, 20% Chinese, 8% Spanish, 6% Japanese, 5% Portuguese, 4% German, 3% Arabic

“All translation firms together are able to translate far less than 1% of relevant content produced everyday” CSA – MT Is Unavoidable to Keep Up with Content Volumes

Is Machine Translation already here?
Everyone agrees that NMT is here to stay and much better than SMT.
* Marcin Junczys-Dowmunt, Tomasz Dwojak, Hieu Hoang. “Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions.”

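Unbabel Experiments with Customer Service Tickets
[Bar chart, BLEU. Systems: Moses (generic), Neural MT (generic), Moses (adapted), Neural MT (adapted). Values shown: 49,9; 43,5; 35,7; 29,6. The pairing of values to systems is not preserved in this transcript.]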

Is NMT really enough? * A Neural Network for Machine Translation, at Production Scale. Google Blog

Quality per Job
[Chart: MQM (0% to 100%) per job (jobs 0 to 30); jobs marked Good, Not sure, or Bad]

Will AI solve translation?
[Chart labels: Jobs, MQM 95, Quality, Time, Machine only]

Will AI solve translation?
[Chart labels: Jobs, MQM 95, Human effort, Time]

Quality per Job
[Chart: MQM (0% to 100%) per job (jobs 0 to 30); jobs marked Good, Not sure, or Bad]

Unbabel Pipeline
[Diagram: Job → Machine Translation → Quality Estimation (Q.E.). High Q.E.: delivered to the Customer. Low Q.E.: sent to Community Translators, then Re-Eval by Quality Estimation.]
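
The routing step in this pipeline can be sketched as follows. This is a minimal sketch, not Unbabel's implementation: it assumes the quality estimate is a single score in [0, 1] and that "High Q.E." means the score clears a threshold. The function name `route_job`, the 0.8 threshold, the cap on re-evaluation rounds, and the three callables are all hypothetical.

```python
from typing import Callable, Tuple


def route_job(
    source: str,
    translate: Callable[[str], str],
    estimate_quality: Callable[[str, str], float],
    post_edit: Callable[[str, str], str],
    threshold: float = 0.8,  # hypothetical cut-off between high and low Q.E.
    max_rounds: int = 3,     # hypothetical cap on community re-evaluation rounds
) -> Tuple[str, str]:
    """Return (translation, path); path records who produced the final text."""
    translation = translate(source)
    path = "machine"
    for _ in range(max_rounds):
        if estimate_quality(source, translation) >= threshold:
            break  # high Q.E.: deliver to the customer as is
        # low Q.E.: send to community translators, then re-evaluate
        translation = post_edit(source, translation)
        path = "community"
    # After max_rounds the latest text is returned even if Q.E. is still low.
    return translation, path


# Toy stand-ins for the three components (assumptions for illustration only).
def toy_mt(source: str) -> str:
    return "draft"


def toy_qe(source: str, translation: str) -> float:
    return 0.9 if translation == "edited" else 0.1


def toy_editor(source: str, translation: str) -> str:
    return "edited"


print(route_job("Hello", toy_mt, toy_qe, toy_editor))
```

In this toy run the machine draft scores low, goes through one round of post-editing, and passes re-evaluation.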

Data Generation Engine
[Diagram components: Job, Machine Translation, Quality Estimation, Community Translators, Quality Estimation, Customer]

Data Generation Engine
Before: Initial text, Submitted text. NO DATA POINTS in between.
After: Initial text, Submitted text. All changes in between: Mouse clicks, Key presses, Timestamps.

Keystroke Analysis

Raw data
• At 18:03:30, in nugget 3: mouseClick. Cursor at 16. Selected: 0
• At 18:03:31, in nugget 3: Pressed Backspace. Cursor at 16. Selected: 0
• At 18:03:31, in nugget 3: Pressed Backspace. Cursor at 15. Selected: 0
• At 18:03:31, in nugget 3: Pressed Backspace
• At 18:03:35, in nugget 3: Pressed Shift. Cursor at 25. Selected: 0
• At 18:03:35, in nugget 3: Pressed s. Cursor at 25. Selected: 0
• At 18:03:35, in nugget 3: Pressed i. Cursor at 26. Selected: 0
• At 18:03:35, in nugget 3: Pressed e

Processed information
Initial text: “Espero que esto es útil”
• Deleted word “ es”
• Inserted word “sea”
Submitted text: “Espero que esto sea útil”
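
The "processed information" on this slide (deleted and inserted words) can be approximated from the initial and submitted text alone. The sketch below is a simplification under that assumption: it diffs the two strings at word level with Python's standard `difflib` rather than replaying the keystroke log, so it recovers what changed but not when or how. The function name `word_edits` is hypothetical.

```python
from difflib import SequenceMatcher


def word_edits(initial: str, submitted: str):
    """Return (deleted_words, inserted_words) between two versions of a text."""
    before, after = initial.split(), submitted.split()
    deleted, inserted = [], []
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" counts as a deletion plus an insertion
        if tag in ("delete", "replace"):
            deleted.extend(before[i1:i2])
        if tag in ("insert", "replace"):
            inserted.extend(after[j1:j2])
    return deleted, inserted


print(word_edits("Espero que esto es útil", "Espero que esto sea útil"))
# → (['es'], ['sea'])
```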

Unbabel Pipeline
[Diagram: the pipeline from Job to Customer in three stages labeled MACHINE, QE and COMMUNITY: Machine Translation, Quality Estimation (High Q.E. / Low Q.E., Re-Eval), Crowd]

Machine Translation Pipeline
[Diagram components: Job, Translation Memory, MT Router, Customer MT, Customer APE, Result]

Machine Translation Models Phrase-based MT Neural MT

Customer Adaptation: Customer Support Tickets
[Bar chart, BLEU (axis 30 to 60). Values shown: 51,7; 48,8; 47,2; 46,9; 43,2; 43,1; 34,1. Bar labels are not recoverable from this transcript.]

Quality Estimation
Word-Level QE: Which words are translated correctly/incorrectly?
Sentence-Level QE: How good is the entire translation?

Quality Estimation
Word-level QE example: “Hey lá , eu sou pesaroso sobre aquele !”
[Each of the 9 tokens is tagged OK or BAD; the slide shows 5 BAD tags and 4 OK tags]

QE Training
[Diagram labels: Ticket source, MT, Unbabel final; Bad translation, Good translation]

QE Word Level
F1_MULT (axis 0 to 60)
Ugent System: 41,1
LinearQE: 45,2
NeuralQE: 47,3
StackedQE: 50,3 (WMT 16 WL QE Winner)
APE-QE: 55,7
FullStackedQE: 57,5
Andre F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramon Astudillo, Chris Hokamp, Roman Grundkiewicz. “Pushing the Limits of Translation Quality Estimation.” TACL 2017 (To Appear)
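
F1_MULT, the metric on this chart, is the product of the F1 score on the OK class and the F1 score on the BAD class, so a system that labels everything OK scores zero. A minimal sketch of the computation follows; the function names and the example tag sequences are made up for illustration, and the slide appears to report the value multiplied by 100.

```python
def f1_for(label, gold, pred):
    """F1 score of one class, treating `label` as the positive class."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_mult(gold, pred):
    """Product of the per-class F1 scores for OK and BAD."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return f1_for("OK", gold, pred) * f1_for("BAD", gold, pred)


# Hypothetical tags for a four-token sentence.
gold = ["OK", "OK", "BAD", "BAD"]
pred = ["OK", "BAD", "BAD", "BAD"]
print(round(f1_mult(gold, pred), 4))
```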

QE Sentence Level
Pearson Correlation (axis 0 to 70)
YANDEX: 52,5
StackedQE: 54,9
APE-QE: 61,3
FullStackedQE: 65,6
Andre F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramon Astudillo, Chris Hokamp, Roman Grundkiewicz. “Pushing the Limits of Translation Quality Estimation.” TACL 2017 (To Appear)
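
Sentence-level systems on this chart are compared by Pearson correlation between predicted and reference sentence scores. A minimal sketch of that computation is below; the slide's numbers are presumably the coefficient multiplied by 100, and the function name and sample values are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Hypothetical predicted vs. reference quality scores for four sentences.
predicted = [0.10, 0.35, 0.60, 0.80]
reference = [0.15, 0.30, 0.70, 0.75]
print(round(pearson(predicted, reference), 3))
```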

QE in the Pipeline
[Diagram: Job → Machine Translation → Quality Estimation (Q.E.). High Q.E.: Customer. Low Q.E.: Community Translators, then Re-Eval.]
Document-Level QE: how good is the entire document?
Human QE: can we evaluate post-edit output?
Interesting numbers coming soon

Professional Translation
Goals: Quality, Speed, Cost
Pillars:
• Editors Pool
• Initial Text (MT)
• Editor Assignment
• Interfaces
• Quality Evaluation

Unbabel Community: 50 000 Users

Distributed Pipeline Quality Estimation

Editors Pool (layers, top to bottom)
Expert, Evaluators, Annotators: Specialisation layers will grow with time
Paid Content: The best editors have access to paid content
Training Content: Editors get ratings for the tasks
Testing Phase: Editors are tested when they sign up

How Editors are Evaluated

Editors Profiling

Editor Assignment
Editors: Pull

Queue  Priority  SLA   Tasks/time
G      1000      6 H   2 m
G      1100      30 m  6 m
G      1000      2 D   10 m
G      1000      6 D   12 m
R      1100      20 m  18 m
R      1100      40 m  45 m
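
One way to read this slide is as a pull-based priority queue: editors ask for work and receive the most urgent open task. The sketch below illustrates that reading only. It assumes that a higher priority number is served first, that ties go to the shorter SLA, and that H, D and m mean hours, days and minutes; the slide states none of these rules, and the names `build_queue` and `pull` are hypothetical.

```python
import heapq

# (queue, priority, SLA in minutes, task time in minutes), taken from the slide
TASKS = [
    ("G", 1000, 6 * 60, 2),
    ("G", 1100, 30, 6),
    ("G", 1000, 2 * 24 * 60, 10),
    ("G", 1000, 6 * 24 * 60, 12),
    ("R", 1100, 20, 18),
    ("R", 1100, 40, 45),
]


def build_queue(tasks):
    """Heap ordered by priority (high first), then SLA (short first)."""
    heap = [
        (-priority, sla, index, (queue, priority, sla, minutes))
        for index, (queue, priority, sla, minutes) in enumerate(tasks)
    ]
    heapq.heapify(heap)
    return heap


def pull(heap):
    """Called when an editor asks for work; returns the next task or None."""
    if not heap:
        return None
    return heapq.heappop(heap)[-1]


queue = build_queue(TASKS)
print(pull(queue))  # → ('R', 1100, 20, 18)
print(pull(queue))  # → ('G', 1100, 30, 6)
```

The `index` element in each heap entry only breaks ties between tasks with equal priority and SLA, so the task tuples themselves are never compared.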
