
Combining Crowd and AI to scale professional-quality translation



  1. Combining Crowd and AI to scale professional-quality translation João Graça CTO

  2. The Internet, 1997: 80% English

  3. The Internet, 2017: 30% English, 20% Chinese, 8%, 6% Spanish, 5% Japanese, 4% Portuguese, 3% German, Arabic

  4. “All translation firms together are able to translate far less than 1% of relevant content produced everyday” CSA – MT Is Unavoidable to Keep Up with Content Volumes

  5. Is Machine Translation already here? Everyone agrees that NMT is here to stay and much better than SMT. * Marcin Junczys-Dowmunt, Tomasz Dwojak, Hieu Hoang. “Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions.”

  6. Unbabel Experiments with Customer Service Tickets (BLEU chart). Systems: Moses (generic), Neural MT (generic), Moses (adapted), Neural MT (adapted). Values shown: 49.9, 43.5, 35.7, 29.6.

  7. Is NMT really enough? * A Neural Network for Machine Translation, at Production Scale. Google Blog

  8. Quality per Job (chart: MQM, 0% to 100%, against Job, 0 to 30; legend: Good, Not sure, Bad)

  9. Will AI solve translation? (chart labels: Jobs, MQM 95, Quality, Time, Machine only)

  10. Will AI solve translation? (chart labels: Jobs, MQM 95, Human effort, Time)

  11. Quality per Job (chart: MQM, 0% to 100%, against Job, 0 to 30; legend: Good, Not sure, Bad)

  12. Unbabel Pipeline

  13. Unbabel Pipeline Job

  14. Unbabel Pipeline Job Machine Translation

  15. Unbabel Pipeline Job Machine Translation

  16. Unbabel Pipeline: Job -> Machine Translation -> Quality Estimation (Q.E.)

  17. Unbabel Pipeline: Job -> Machine Translation -> Quality Estimation (Q.E.); high Q.E. -> Customer

  18. Unbabel Pipeline: Job -> Machine Translation -> Quality Estimation (Q.E.); high Q.E. -> Customer; low Q.E. -> Community Translators -> Q.E. Re-Eval
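The routing on this slide can be sketched as a small loop. This is a minimal illustration only, not Unbabel's implementation: the `Job` class, the `translate`, `estimate_quality` and `post_edit` callables, the 0.8 threshold and the round limit are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    source: str
    translation: str = ""
    route: str = ""  # "machine_only" or "community" (hypothetical labels)


def run_pipeline(
    job: Job,
    translate: Callable[[str], str],
    estimate_quality: Callable[[str, str], float],
    post_edit: Callable[[str, str], str],
    threshold: float = 0.8,  # assumed cut-off between "high" and "low" Q.E.
    max_rounds: int = 3,     # assumed limit on human post-editing rounds
) -> Job:
    """Translate, estimate quality, and send low-Q.E. output to human editors."""
    job.translation = translate(job.source)
    for _ in range(max_rounds):
        if estimate_quality(job.source, job.translation) >= threshold:
            # High Q.E.: deliver. The route records whether a human was involved.
            job.route = job.route or "machine_only"
            return job
        # Low Q.E.: a community translator edits, then Q.E. is re-evaluated.
        job.translation = post_edit(job.source, job.translation)
        job.route = "community"
    return job
```

Passing the three stages as callables keeps the sketch independent of any particular MT or QE system.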

  19. Data Generation Engine. Components shown: Job, Machine Translation, Quality Estimation (Q.E., twice), Community Translators, Customer (twice)

  20. Data Generation Engine. Before: initial text and submitted text, with no data points in between. After: initial text, all changes in between (mouse clicks, key presses, timestamps), and submitted text.

  21. Keystroke Analysis. Raw data (all events in nugget 3, Selected: 0): At 18:03:30: mouseClick, cursor at 16. At 18:03:31: pressed Backspace, cursor at 16. At 18:03:31: pressed Backspace, cursor at 15. At 18:03:31: pressed Backspace. At 18:03:35: pressed Shift, cursor at 25. At 18:03:35: pressed s, cursor at 25. At 18:03:35: pressed i, cursor at 26. At 18:03:35: pressed e. Processed information: initial text “Espero que esto es útil”; deleted word “ es”; inserted word “sea”; submitted text “Espero que esto sea útil”.
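One way to get from a raw event log to processed information like this is to replay the events on the initial text and then diff the result at word level. The sketch below assumes a simplified, hypothetical event format of `(kind, value)` tuples. It is not the logging format used in the talk, and `difflib` stands in for whatever alignment the real system uses.

```python
from difflib import SequenceMatcher


def replay(initial, events):
    """Apply click / backspace / type events to a text buffer."""
    text, cursor = initial, len(initial)
    for kind, value in events:
        if kind == "click":  # value is the new cursor offset
            cursor = value
        elif kind == "backspace" and cursor > 0:
            text = text[:cursor - 1] + text[cursor:]
            cursor -= 1
        elif kind == "type":  # value is the typed string
            text = text[:cursor] + value + text[cursor:]
            cursor += len(value)
    return text


def word_edits(before, after):
    """Summarise the change as lists of deleted and inserted words."""
    a, b = before.split(), after.split()
    deleted, inserted = [], []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted += a[i1:i2]
        if tag in ("insert", "replace"):
            inserted += b[j1:j2]
    return deleted, inserted


initial = "Espero que esto es útil"
# Hypothetical events: click after "es", delete it, type "sea".
events = [("click", 18), ("backspace", None), ("backspace", None), ("type", "sea")]
submitted = replay(initial, events)
print(submitted)                       # Espero que esto sea útil
print(word_edits(initial, submitted))  # (['es'], ['sea'])
```

The replay step is what the extra data points make possible: with only the initial and submitted text, the diff still works, but timing and intermediate edits are lost.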

  22. Unbabel Pipeline, in three stages (Machine, QE, Community): Job -> Machine Translation -> Quality Estimation (Q.E.); high Q.E. -> Customer; low Q.E. -> Crowd -> Q.E. Re-Eval

  23. Machine Translation Pipeline. Components shown: Job, Translation Memory, MT Router, Customer MT, Customer APE, Result

  24. Machine Translation Models Phrase-based MT Neural MT

  25. Customer Adaptation, Customer Support Tickets (BLEU bar chart, axis 30 to 60). Values shown: 51.7, 48.8, 47.2, 46.9, 43.2, 43.1, 34.1.

  26. Quality Estimation. Word-Level QE: Which words are translated correctly/incorrectly? Sentence-Level QE: How good is the entire translation?

  27. Quality Estimation. Word-level QE example: “Hey lá , eu sou pesaroso sobre aquele !”, with each token tagged OK or BAD.

  28. QE Training. Unbabel ticket: source, MT, final. Outcomes: bad translation, good translation.
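Word-level training labels can be derived by aligning the MT output with the final post-edited text and tagging each MT token that survives as OK and the rest as BAD. The sketch below is an assumption about how such labels might be produced, not the procedure used in the talk. It uses Python's `difflib` as a simple stand-in for an edit-distance aligner, and the function name `word_labels` is hypothetical.

```python
from difflib import SequenceMatcher


def word_labels(mt, post_edit):
    """Tag each MT token OK if it is kept in the post-edit, else BAD."""
    mt_tok, pe_tok = mt.split(), post_edit.split()
    labels = ["BAD"] * len(mt_tok)
    matcher = SequenceMatcher(a=mt_tok, b=pe_tok, autojunk=False)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            labels[i] = "OK"
    return list(zip(mt_tok, labels))


# MT output and post-edit taken from the keystroke example earlier in the deck.
for token, label in word_labels("Espero que esto es útil",
                                "Espero que esto sea útil"):
    print(token, label)
```

Only “es” is tagged BAD here, since it is the one token the editor replaced.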

  29. QE Word Level

  30. QE Word Level, F1_MULT (axis 0 to 60). Systems: Ugent System, LinearQE, NeuralQE, StackedQE, APE-QE, FullStackedQE.

  31. QE Word Level, F1_MULT: Ugent System 41.1.

  32. QE Word Level, F1_MULT: Ugent System 41.1; LinearQE 45.2.

  33. QE Word Level, F1_MULT: Ugent System 41.1; LinearQE 45.2; NeuralQE 47.3.

  34. QE Word Level, F1_MULT: Ugent System 41.1; LinearQE 45.2; NeuralQE 47.3; StackedQE 50.3 (WMT 16 WL QE Winner).

  35. QE Word Level, F1_MULT: Ugent System 41.1; LinearQE 45.2; NeuralQE 47.3; StackedQE 50.3 (WMT 16 WL QE Winner); APE-QE 55.7.

  36. QE Word Level, F1_MULT: Ugent System 41.1; LinearQE 45.2; NeuralQE 47.3; StackedQE 50.3 (WMT 16 WL QE Winner); APE-QE 55.7; FullStackedQE 57.5. Andre F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramon Astudillo, Chris Hokamp, Roman Grundkiewicz. “Pushing the Limits of Translation Quality Estimation.” TACL 2017 (To Appear)
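F1_MULT, the word-level metric used in the WMT quality estimation shared task, is the product of the F1 score on the OK class and the F1 score on the BAD class. A minimal sketch follows. The function names are my own, and the slide values are assumed to be this product scaled to a 0 to 100 range.

```python
def f1(gold, pred, cls):
    """F1 score for one class over parallel lists of gold and predicted tags."""
    tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
    fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
    fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_mult(gold, pred):
    """Product of the per-class F1 scores for the OK and BAD tags."""
    return f1(gold, pred, "OK") * f1(gold, pred, "BAD")


gold = ["OK", "OK", "BAD", "BAD"]
pred = ["OK", "BAD", "BAD", "BAD"]
print(round(100 * f1_mult(gold, pred), 1))  # 53.3
```

Because the two scores are multiplied, a system that tags every word OK scores zero, which is why the metric suits data where BAD tags are rare.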

  37. QE Sentence Level

  38. QE Sentence Level, Pearson Correlation (axis 0 to 70). Systems: YANDEX, StackedQE, APE-QE, FullStackedQE.

  39. QE Sentence Level, Pearson Correlation: YANDEX 52.5.

  40. QE Sentence Level, Pearson Correlation: YANDEX 52.5; StackedQE 54.9.

  41. QE Sentence Level, Pearson Correlation: YANDEX 52.5; StackedQE 54.9; APE-QE 61.3.

  42. QE Sentence Level, Pearson Correlation: YANDEX 52.5; StackedQE 54.9; APE-QE 61.3; FullStackedQE 65.6. Andre F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramon Astudillo, Chris Hokamp, Roman Grundkiewicz. “Pushing the Limits of Translation Quality Estimation.” TACL 2017 (To Appear)
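Sentence-level QE is scored here by the Pearson correlation between predicted and reference quality scores. The sketch below computes it from first principles. It assumes the scores are per-sentence numbers such as HTER, and that the slide reports the coefficient multiplied by 100. The sample values are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either list is constant.
    return cov / sqrt(var_x * var_y)


predicted = [0.10, 0.35, 0.20, 0.60]  # hypothetical model scores
reference = [0.05, 0.40, 0.25, 0.55]  # hypothetical gold scores
print(pearson(predicted, reference))
```

A value near 1 means the predicted scores rank and scale sentences much like the reference scores do.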

  43. QE in the Pipeline

  44. QE in the Pipeline: Job -> Machine Translation -> Quality Estimation (Q.E.); high Q.E. -> Customer

  45. QE in the Pipeline: Job -> Machine Translation -> Quality Estimation (Q.E.); high Q.E. -> Customer. Document-Level QE: how good is the entire document?

  46. QE in the Pipeline: Job -> Machine Translation -> Quality Estimation (Q.E.); high Q.E. -> Customer; low Q.E. -> Community Translators -> Q.E. Re-Eval. Document-Level QE: how good is the entire document?

  47. QE in the Pipeline: Job -> Machine Translation -> Quality Estimation (Q.E.); high Q.E. -> Customer; low Q.E. -> Community Translators -> Q.E. Re-Eval. Document-Level QE: how good is the entire document? Human QE: can we evaluate post-edit output? Interesting numbers coming soon.

  48. Professional Translation. Goals: Quality, Speed, Cost.

  49. Professional Translation. Goals: Quality, Speed, Cost. Pillars.

  50. Professional Translation. Goals: Quality, Speed, Cost. Pillars.

  51. Professional Translation. Goals: Quality, Speed, Cost. Pillars: Editors Pool.

  52. Professional Translation. Goals: Quality, Speed, Cost. Pillars: Editors Pool; Initial Text (MT).

  53. Professional Translation. Goals: Quality, Speed, Cost. Pillars: Editors Pool; Initial Text (MT); Editor Assignment.

  54. Professional Translation. Goals: Quality, Speed, Cost. Pillars: Editors Pool; Initial Text (MT); Editor Assignment; Interfaces.

  55. Professional Translation. Goals: Quality, Speed, Cost. Pillars: Editors Pool; Initial Text (MT); Editor Assignment; Interfaces; Quality Evaluation.

  56. Unbabel Community

  57. Unbabel Community 50 000 Users

  58. Distributed Pipeline Quality Estimation

  59. Editors Pool. Testing Phase: editors are tested when they sign up. Training Content: editors get ratings for the tasks. Paid Content: the best editors have access to paid content. Expert: specialisation layers will grow with time.

  60. Editors Pool. Testing Phase: editors are tested when they sign up. Training Content: editors get ratings for the tasks. Paid Content: the best editors have access to paid content. Expert: specialisation layers will grow with time. Added role: Evaluators.

  61. Editors Pool. Testing Phase: editors are tested when they sign up. Training Content: editors get ratings for the tasks. Paid Content: the best editors have access to paid content. Expert: specialisation layers will grow with time. Added roles: Evaluators, Annotators.

  62. How Editors are Evaluated

  63. Editors Profiling

  64. Editors Profiling

  65. Editor Assignment

  66. Editor Assignment. Tasks/time: 2 m, 6 m, 10 m, 12 m, 18 m, 45 m.

  67. Editor Assignment. Tasks/time: 2 m, 6 m, 10 m, 12 m, 18 m, 45 m. Editors.

  68. Editor Assignment. Tasks/time: 2 m, 6 m, 10 m, 12 m, 18 m, 45 m. Editors (Pull).

  69. Editor Assignment. Editors (Pull). Tasks, as SLA and Tasks/time: 6 H, 2 m; 30 m, 6 m; 2 D, 10 m; 6 D, 12 m; 20 m, 18 m; 40 m, 45 m.

  70. Editor Assignment. Editors (Pull). Tasks, as Priority, SLA and Tasks/time: 1000, 6 H, 2 m; 1100, 30 m, 6 m; 1000, 2 D, 10 m; 1000, 6 D, 12 m; 1100, 20 m, 18 m; 1100, 40 m, 45 m.

  71. Editor Assignment. Editors (Pull). Tasks, as Queue, Priority, SLA and Tasks/time: G, 1000, 6 H, 2 m; G, 1100, 30 m, 6 m; G, 1000, 2 D, 10 m; G, 1000, 6 D, 12 m; R, 1100, 20 m, 18 m; R, 1100, 40 m, 45 m.
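A pull-based assignment like the one on this slide can be sketched with a priority queue from which editors take the next task. The ordering rule here (higher Priority first, then the shorter SLA) is an assumption for illustration; the slide does not state how Priority, SLA and Queue are combined, and the Queue column is ignored. `TaskQueue` and its methods are hypothetical names.

```python
import heapq
from itertools import count


class TaskQueue:
    """Tasks wait in a heap; an editor pulls the most urgent one."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that keeps insertion order

    def push(self, name, priority, sla_minutes):
        # heapq is a min-heap, so priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, sla_minutes, next(self._seq), name))

    def pull(self):
        """Return the next task name, or None if nothing is waiting."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[-1]


queue = TaskQueue()
# Priority and SLA values from the slide, with SLAs converted to minutes.
queue.push("task-1", 1000, 6 * 60)
queue.push("task-2", 1100, 30)
queue.push("task-3", 1000, 2 * 24 * 60)
queue.push("task-4", 1100, 20)
print(queue.pull())  # task-4
```

With this rule, the tasks with minute-scale SLAs, which carry priority 1100 on the slide, are pulled before the hour- and day-scale ones.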
