Building universal understanding: Combining Crowd and AI to scale professional-quality translation. João Graça, CTO. Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 41
The internet, 1997: 80% English
The internet, 2017: 30% English, 20% Chinese, 8% Spanish, 6% Japanese, 5% Portuguese, 4% German, 3% Arabic
Language barriers = trade barriers
“Everyone speaks English” costs the UK £48B (3.5% of UK GDP) every year.
Just 12% of EU retailers sell online to other EU countries.
Just 15% of EU consumers buy online from other EU countries.
Available Solutions: Lack of fast, affordable translation with human quality
Machine Translation: affordable, fast, quality not good enough
Professional Translation: expensive, slow
“All translation firms together are able to translate far less than 1% of relevant content produced every day.” CSA, MT Is Unavoidable to Keep Up with Content Volumes
Will AI solve translation? [Figure: chart labeled JOBS, QUALITY, MQM 95, MACHINE ONLY, TIME]
Will AI solve translation? [Figure: chart labeled JOBS, MQM 95, HUMAN EFFORT, TIME]
Quality per Job [Figure: MQM (0% to 100%) per job (0 to 30); legend: Good, Not sure, Bad]
Unbabel Pipeline [Diagram: Original customer request, Machine Translation, Quality Estimation, Community Translators, Q.E. Re-Eval, Translated customer request; branches labeled High Q.E. and Low Q.E.]
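The routing idea in this pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Unbabel's implementation: the threshold value, the cap on editing rounds, and the names `route_job`, `machine_translate`, `estimate_quality`, and `post_edit` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

QE_THRESHOLD = 0.9  # hypothetical cut-off between "high" and "low" Q.E.


@dataclass
class JobResult:
    text: str
    qe_score: float
    steps: List[str] = field(default_factory=list)


def route_job(
    source: str,
    machine_translate: Callable[[str], str],
    estimate_quality: Callable[[str, str], float],
    post_edit: Callable[[str, str], str],
    max_rounds: int = 2,
) -> JobResult:
    """Translate, estimate quality, and send low-Q.E. output to human editors."""
    text = machine_translate(source)
    score = estimate_quality(source, text)
    steps = ["mt"]
    rounds = 0
    # Low Q.E.: hand the text to community translators, then re-evaluate.
    while score < QE_THRESHOLD and rounds < max_rounds:
        text = post_edit(source, text)
        score = estimate_quality(source, text)
        steps.append("post_edit")
        rounds += 1
    # High Q.E. (or editing budget exhausted): deliver to the customer.
    return JobResult(text=text, qe_score=score, steps=steps)


if __name__ == "__main__":
    # Toy stand-ins for the real components (all assumed for the demo).
    mt = lambda s: "Espero que esto es útil"
    qe = lambda s, t: 0.95 if "sea" in t else 0.4
    editor = lambda s, t: t.replace(" es ", " sea ")
    result = route_job("I hope this is useful", mt, qe, editor)
    print(result.steps, result.text)
```

Passing the three components in as callables keeps the sketch independent of any particular MT engine, QE model, or editing workflow.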
Machine Translation Pipeline [Diagram: Job, Translation Memory, MT Router, Customer MT, Customer APE, Result]
Customer Adaptation [Figure: bar chart of MQM for customer support tickets; categories: SMT, NMT, Customized, Professional; data labels: 50,0, 65,0, 80,0, 94,0; MQM axis from 30 to 100]
Quality Estimation
Word-Level QE: Which words are translated correctly/incorrectly?
Sentence-Level QE: How good is the entire translation?
Quality Estimation: Word-level QE example
Hey lá , eu sou pesaroso sobre aquele !
Each token is tagged OK or BAD (five BAD tags and four OK tags in this example).
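To make the word-level output concrete, here is a small Python sketch that represents it as (token, tag) pairs and derives two simple summaries. The tokens come from the slide, but the placement of the tags below is hypothetical, chosen only to match the five BAD and four OK counts. The helpers `ok_ratio` and `bad_spans` are illustrative names, and the OK ratio is only a crude proxy, not a sentence-level QE model.

```python
from typing import List, Tuple

OK, BAD = "OK", "BAD"


def ok_ratio(tagged: List[Tuple[str, str]]) -> float:
    """Fraction of tokens tagged OK; a crude sentence-level proxy."""
    if not tagged:
        return 0.0
    return sum(1 for _, tag in tagged if tag == OK) / len(tagged)


def bad_spans(tagged: List[Tuple[str, str]]) -> List[str]:
    """Group consecutive BAD tokens so an editor can be pointed at them."""
    spans, current = [], []
    for token, tag in tagged:
        if tag == BAD:
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Tokens from the slide; the tag placement is HYPOTHETICAL (5 BAD, 4 OK).
tokens = ["Hey", "lá", ",", "eu", "sou", "pesaroso", "sobre", "aquele", "!"]
tags = [BAD, BAD, OK, OK, BAD, BAD, BAD, OK, OK]
tagged = list(zip(tokens, tags))

print(round(ok_ratio(tagged), 2))
print(bad_spans(tagged))
```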
QE Training [Diagram: Ticket source; MT (bad translation); Unbabel final (good translation)]
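The slide does not say how training labels are produced from the MT output and the final text. One common approach, assumed here, is to align the two word sequences and mark MT words that survive into the final text as OK and the rest as BAD. The sketch below uses Python's standard `difflib.SequenceMatcher` for the alignment; the function name `tag_mt_words` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def tag_mt_words(mt: str, final: str) -> List[Tuple[str, str]]:
    """Tag each MT word OK if it is kept in the final text, else BAD."""
    mt_tokens = mt.split()
    final_tokens = final.split()
    tags = ["BAD"] * len(mt_tokens)
    matcher = SequenceMatcher(a=mt_tokens, b=final_tokens, autojunk=False)
    # Every matching block is a run of words shared by both sequences.
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            tags[i] = "OK"
    return list(zip(mt_tokens, tags))


for word, tag in tag_mt_words("Espero que esto es útil", "Espero que esto sea útil"):
    print(word, tag)
```

This simplification tags only the words present in the MT output; words that the editor had to insert are not represented.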
QE in the Pipeline [Diagram: Job, Customer, Machine Translation, Quality Estimation, Community Translators, Q.E. Re-Eval; branches labeled High Q.E. and Low Q.E.]
Document-Level QE: How good is the entire document?
Human QE: Can we evaluate post-edit output?
Data Generation Engine [Diagram: Customer (twice), Job, Machine Translation, Quality Estimation (twice), Community Translators]
Data Generation Engine
Before: initial text, no data points, submitted text
After: initial text, data points (mouse clicks, key presses, timestamps), submitted text
Keystroke Analysis
Raw data (all events in nugget 3, Selected: 0):
18:03:30 mouseClick, cursor at 16
18:03:31 Pressed Backspace, cursor at 16
18:03:31 Pressed Backspace, cursor at 15
18:03:31 Pressed Backspace, cursor at 14
18:03:35 Pressed Shift, cursor at 25
18:03:35 Pressed s, cursor at 25
18:03:35 Pressed i, cursor at 26
18:03:35 Pressed e, cursor at 27
Processed information:
Initial text: “Espero que esto es útil”
Deleted word “ es”
Inserted word “sea”
Submitted text: “Espero que esto sea útil”
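Turning raw events into processed information amounts to replaying them against the initial text. The Python sketch below shows that idea with a deliberately simplified event model. The `Event` type, the three event kinds, and the cursor positions in the demo are assumptions for illustration; they do not reproduce the slide's exact log.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Event:
    kind: str         # "click", "backspace", or "char"
    value: str = ""   # typed character, used by "char" events
    cursor: int = 0   # target position, used by "click" events


def replay(initial: str, events: Iterable[Event]) -> str:
    """Apply editing events to the initial text and return the result."""
    text, cursor = initial, 0
    for ev in events:
        if ev.kind == "click":
            # Clamp the click position to the current text bounds.
            cursor = max(0, min(ev.cursor, len(text)))
        elif ev.kind == "backspace":
            if cursor > 0:
                text = text[:cursor - 1] + text[cursor:]
                cursor -= 1
        elif ev.kind == "char":
            text = text[:cursor] + ev.value + text[cursor:]
            cursor += len(ev.value)
    return text


# Illustrative events: click right after "es", delete it, type "sea".
events = [
    Event("click", cursor=18),
    Event("backspace"),
    Event("backspace"),
    Event("char", "s"),
    Event("char", "e"),
    Event("char", "a"),
]
print(replay("Espero que esto es útil", events))
```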
Professional translation: Cost, Speed, Quality
Unbabel pillars:
• Editors Pool
• Initial Text (MT)
• Editor Assignment
• Custom Editing Interfaces
• Constant Quality Evaluation
Unbabel Community: 50,000 users
Editors Pool
1. Testing Phase: first tests right after signup
2. Training Content: editors get rated with training tasks
3. Paid Work: only the best-rated editors have access to customer tasks
4. Expert Editor, Annotators, Evaluators: more specialization layers will be created
Evaluation Tool: Document-Level Human QE
Deep Annotations
Error Analysis
QE for Annotation: Pre-fill with word-level QE
Editors Profiling
Editor Assignment [Diagram: editors pull tasks from a queue; column labels: Topics, Priority, SLA, Tasks/time, Rating, Native, Topics; editor ratings shown: 4.2, 3.8, 4.3, 4.8]
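A pull-based assignment of this kind can be sketched as follows. The slide only shows column labels, so the selection rule is an assumption: an editor pulls the highest-priority task on one of their topics, ties are broken by the shortest SLA, and a minimum rating is required. The names `Task`, `Editor`, `pull_task`, and `min_rating` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Task:
    task_id: str
    topic: str
    priority: int      # assumed: larger means more urgent
    sla_minutes: int   # assumed: minutes left before the deadline


@dataclass
class Editor:
    name: str
    rating: float
    topics: Set[str]


def pull_task(editor: Editor, queue: List[Task], min_rating: float = 4.0) -> Optional[Task]:
    """Remove and return the most urgent task this editor may work on."""
    if editor.rating < min_rating:
        return None
    candidates = [t for t in queue if t.topic in editor.topics]
    if not candidates:
        return None
    # Highest priority first; among equals, the tightest SLA first.
    best = min(candidates, key=lambda t: (-t.priority, t.sla_minutes))
    queue.remove(best)
    return best


queue = [
    Task("a", "G", 1000, 10),
    Task("b", "G", 1100, 6),
    Task("c", "R", 1100, 18),
]
editor = Editor("editor-1", rating=4.3, topics={"G"})
print(pull_task(editor, queue))
```

A linear scan keeps the sketch short; a production queue would more likely index tasks by topic.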
Editor Assignment: Smart distribution vs. regular distribution [Figure: old rating 3.8, improved rating 4.6]
Post-Editing Interfaces
QE on Interfaces
Post-Editing Interfaces
Time Spent on Job [Figure: delivery-time bar with segments MT, Translator 1, Translator 2, and two WAITING periods]
Time Spent on Job: Mobile [Figure: delivery-time bar with segments MT, Translator 1, Translator 2, and two WAITING periods; annotated -20%]