Combining Crowd and AI to scale professional-quality translation


  1. Building universal understanding. Combining Crowd and AI to scale professional-quality translation. João Graça, CTO. Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing, Boston, March 21, 2018 | Page 41

  2. The internet, 1997: 80% English

  3. The internet, 2017: 30% English, 20% Chinese, 8% Spanish, 6% Japanese, 5% Portuguese, 4% German, 3% Arabic

  4. Language barriers = trade barriers. “Everyone speaks English” costs the UK £48B (3.5% of UK GDP) every year. Just 12% of EU retailers sell online to other EU countries. Just 15% of EU consumers buy online from other EU countries.

  5. Available Solutions: lack of fast, affordable translation with human quality. Machine Translation: affordable, fast, quality not good enough. Professional Translation: expensive, slow.

  6. “All translation firms together are able to translate far less than 1% of relevant content produced everyday” (CSA, “MT Is Unavoidable to Keep Up with Content Volumes”)

  7. Will AI solve translation? [Chart. Labels: Jobs, Quality, Time, MQM 95, Machine only]

  8. Will AI solve translation? [Chart. Labels: Jobs, Time, MQM 95, Human effort]

  9. Quality per Job [Chart: MQM (0% to 100%) per job (0 to 30). Legend: Good, Not sure, Bad]

  10. Unbabel Pipeline [Diagram: Original customer request → Machine Translation → Quality Estimation. High Q.E.: Translated customer request. Low Q.E.: Community Translators → Q.E. Re-Eval]
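
The routing on slide 10 can be read as a threshold check on the quality-estimation score, repeated after each human pass. The following is a minimal sketch under stated assumptions, not Unbabel's implementation: the name `route_job`, the 0-to-1 score scale, the 0.8 threshold, and the `max_rounds` cap are all hypothetical, and `estimate_quality` and `post_edit` are placeholders for the QE model and the community translators.

```python
from typing import Callable, Tuple


def route_job(
    source: str,
    machine_translation: str,
    estimate_quality: Callable[[str, str], float],  # hypothetical QE model: (source, translation) -> score in [0, 1]
    post_edit: Callable[[str, str], str],           # hypothetical community post-editing step
    threshold: float = 0.8,                         # assumed cut-off between "high" and "low" Q.E.
    max_rounds: int = 3,                            # assumed cap on human post-editing passes
) -> Tuple[str, int]:
    """Return the delivered translation and the number of human passes used."""
    translation = machine_translation
    for rounds in range(max_rounds + 1):
        # High Q.E.: deliver the current text to the customer.
        if estimate_quality(source, translation) >= threshold:
            return translation, rounds
        # Low Q.E. and the cap is reached: stop looping and deliver as-is.
        if rounds == max_rounds:
            break
        # Low Q.E.: send to a community translator, then re-evaluate.
        translation = post_edit(source, translation)
    return translation, max_rounds
```

With a stub scorer that rates the raw MT at 0.4 and the edited text at 0.9, the job is delivered after one human pass; a job whose MT already scores above the threshold is delivered with zero passes.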

  11. Machine Translation Pipeline [Diagram. Components: Job, Translation Memory, MT Router, Customer MT, Customer APE, Result]

  12. Customer Adaptation: Customer Support Tickets [Bar chart: MQM, axis from 30 to 100. Categories: SMT, NMT, Customized, Professional. Bar labels: 94.0, 80.0, 65.0, 50.0]

  13. Quality Estimation. Word-Level QE: which words are translated correctly/incorrectly? Sentence-Level QE: how good is the entire translation?

  14. Quality Estimation: word-level QE example. “Hey lá , eu sou pesaroso sobre aquele !” [Each token carries an OK or BAD tag: five BAD, four OK]

  15. QE Training [Diagram. Labels: Ticket source, MT (bad translation), Unbabel, final (good translation)]
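
Slide 15 pairs the MT output with the final delivered text as training material for QE. One common way to turn such a pair into word-level labels, which the slide does not confirm is the method used here, is to align the two token sequences and mark every MT token that survives unchanged as OK and the rest as BAD. The sketch below uses Python's standard `difflib` for the alignment; the function name `tag_mt_words` and whitespace tokenization are assumptions, and the example pair is borrowed from the keystroke slide later in the deck.

```python
from difflib import SequenceMatcher
from typing import List


def tag_mt_words(mt: str, final: str) -> List[str]:
    """Tag each MT token OK if it is aligned to an identical token in the final text, else BAD."""
    mt_tokens, final_tokens = mt.split(), final.split()
    tags = ["BAD"] * len(mt_tokens)
    matcher = SequenceMatcher(a=mt_tokens, b=final_tokens, autojunk=False)
    # Each matching block is a run of tokens that is identical in both sequences.
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            tags[i] = "OK"
    return tags


print(tag_mt_words("Espero que esto es útil", "Espero que esto sea útil"))
# → ['OK', 'OK', 'OK', 'BAD', 'OK']
```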

  16. QE in the Pipeline [Diagram: Customer Job → Machine Translation → Quality Estimation. High Q.E.: Customer. Low Q.E.: Community Translators → Q.E. Re-Eval]. Document-Level QE: how good is the entire document? Human QE: can we evaluate post-edit output?

  17. Data Generation Engine [Diagram. Components: Customer Job, Machine Translation, Quality Estimation (Q.E.), Community Translators, Quality Estimation (Q.E.), Customer]

  18. Data Generation Engine. Before: initial text and submitted text, with no data points in between. After: initial text and submitted text, with data points: mouse clicks, key presses, timestamps.

  19. Keystroke Analysis
     Raw data (every event: in nugget 3, Selected: 0):
     At 18:03:30: mouseClick, cursor at 16
     At 18:03:31: Pressed Backspace, cursor at 16
     At 18:03:31: Pressed Backspace, cursor at 15
     At 18:03:31: Pressed Backspace, cursor at 14
     At 18:03:35: Pressed Shift, cursor at 25
     At 18:03:35: Pressed s, cursor at 25
     At 18:03:35: Pressed i, cursor at 26
     At 18:03:35: Pressed e, cursor at 27
     Processed information:
     Initial text: “Espero que esto es útil”
     • Deleted word “ es”
     • Inserted word “sea”
     Submitted text: “Espero que esto sea útil”
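
The processed view on slide 19 reduces an editing session to word-level operations. The slide does not say how the keystroke log is condensed, so the sketch below swaps in a simpler technique: it compares only the initial and submitted text with Python's standard `difflib` and reports deleted and inserted words. The function name `word_edits` and whitespace tokenization are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def word_edits(initial: str, submitted: str) -> List[Tuple[str, str]]:
    """List (operation, word) pairs that turn `initial` into `submitted`."""
    before, after = initial.split(), submitted.split()
    edits: List[Tuple[str, str]] = []
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" opcode is reported as deletions followed by insertions.
        if tag in ("delete", "replace"):
            edits += [("deleted", word) for word in before[i1:i2]]
        if tag in ("insert", "replace"):
            edits += [("inserted", word) for word in after[j1:j2]]
    return edits


print(word_edits("Espero que esto es útil", "Espero que esto sea útil"))
# → [('deleted', 'es'), ('inserted', 'sea')]
```

A text-only diff like this loses what the raw log keeps, such as timing and cursor movement, which is the extra signal the data points on the previous slide are meant to capture.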

  20. Professional translation: Cost, Speed, Quality. Unbabel pillars:
     • Editors Pool
     • Initial Text (MT)
     • Editor Assignment
     • Custom Editing Interfaces
     • Constant Quality Evaluation

  21. Unbabel Community: 50,000 users

  22. Editors Pool
     1 Testing Phase: first tests right after signup
     2 Training Content: editors get rated with training tasks
     3 Paid Work: only the best-rated editors have access to customer tasks
     4 Expert (Editor, Annotators, Evaluators): more specialization layers will be created

  23. Evaluation Tool: Document-Level Human QE

  24. Deep Annotations

  25. Error Analysis

  26. QE for Annotation: pre-fill with word-level QE

  27. Editors Profiling

  28. Editor Assignment [Diagram: a task queue (columns: Topics, Priority, SLA, Tasks/time) from which editors (columns: Rating, Native, Topics) pull tasks]

  29. Editor Assignment. Regular distribution: old rating 3.8. Smart distribution: improved rating 4.6.

  30. Post-Editing Interfaces

  31. QE on Interfaces

  32. Post-Editing Interfaces

  33. Time Spent on Job [Timeline chart. Segments: MT, Waiting, Translator 1, Waiting, Translator 2. Axis: delivery time]

  34. Time Spent on Job: Mobile [Timeline chart. Segments: MT, Waiting, Translator 1, Waiting, Translator 2. Axis: delivery time. Annotation: -20%]
