Machine Translation Proposal


  1. Machine Translation Proposal

  2. Pilot Objective:
     ● Cost: PEMT (post-edited machine translation) 27% savings over human translation
     ● Efficiency: PEMT 25% faster than human translation
     ● Quality: an acceptable score under 30 according to the harmonized TAUS Dynamic Quality Framework (DQF) and Multidimensional Quality Metrics (MQM)

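  3. Error types and severity weights:

     | Category    | Error Type                        | Minor | Major | Critical |
     |-------------|-----------------------------------|-------|-------|----------|
     | Accuracy    | omission                          | 1     | 2     | 3        |
     | Accuracy    | mistranslation                    | 1     | 2     | 3        |
     | Accuracy    | untranslated                      | 1     | 2     | 3        |
     | Terminology | inconsistent with termbase        | 1     | 2     | 3        |
     | Terminology | inconsistent use of terminology   | 1     | 2     | 3        |
     | Fluency     | grammar                           | 1     | 2     | 3        |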
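
The severity scale on the error-type slide (minor 1, major 2, critical 3) suggests a simple way to turn annotated errors into a single quality score. The sketch below is one possible reading, assuming the score is the plain sum of severity weights over the evaluated sample; the slides do not say whether the pilot normalized by word count. The function name `quality_score` and the sample annotations are hypothetical and are not data from the pilot.

```python
# Severity weights from the error table: every error type uses the same 1/2/3 scale.
SEVERITY_WEIGHTS = {"minor": 1, "major": 2, "critical": 3}

# Pilot objective from the slides: an acceptable score is under 30.
ACCEPTABLE_BELOW = 30


def quality_score(errors):
    """Sum severity weights over a list of (error_type, severity) pairs.

    Assumption: the score is the plain sum of penalty points for the sample.
    """
    return sum(SEVERITY_WEIGHTS[severity] for _error_type, severity in errors)


# Hypothetical annotations for one sample, not data from the pilot.
sample_errors = [
    ("mistranslation", "major"),
    ("untranslated", "critical"),
    ("grammar", "minor"),
    ("inconsistent with termbase", "minor"),
]

score = quality_score(sample_errors)
verdict = "acceptable" if score < ACCEPTABLE_BELOW else "not acceptable"
print(score, verdict)
```

Keeping the weights in one dictionary makes it easy to try a different scale if the real scoring sheet weights critical errors more heavily.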

  4. Pilot Project Processes & Problems

  5. Step 1: File Preparation
     ❖ Convert PDF to DOCX
     ❖ Delete unnecessary content
     ❖ Delete extra spaces
     ❖ Result: cleaner files that are easier to align

  6. Step 2: Testing and Training
     ❖ Round A: TMX only
     ❖ Round B: related PDFs added
     ❖ Result: BLEU score increased
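
For readers unfamiliar with BLEU, the sketch below shows the idea behind the metric: clipped n-gram precision between the MT output and a reference translation, combined with a brevity penalty. It is a simplified, unsmoothed, single-reference, sentence-level version written only for illustration. The scores reported in the pilot came from the MT system itself, which presumably computes BLEU at corpus level, so these numbers are not comparable. The function names and the example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU (0-100) against a single reference."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity_penalty = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical example sentences, not taken from the pilot data.
reference = "the cat sat on a mat"
print(bleu("the cat sat on the mat", reference))
print(bleu(reference, reference))
```

A higher score means more n-gram overlap with the reference, which is why adding in-domain training data can raise BLEU.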

  7. Step 3: First Round of Human Evaluation
     ❖ 2 post-editors
     ❖ A sample of 1000 words extracted from one of the systems
     ❖ Analyzed the output and assigned a quality score first
     ❖ Average PE time: 53 minutes

  8. Step 4: Tuning
     ❖ Failed to train the system
     ❖ Put them into the training data instead: it works :)

  9. Step 5: Adding a dictionary
     ❖ 512-page IMF glossary
     ❖ Converted from PDF to DOCX
     ❖ Cleaned up formats and terms
     ❖ Added it to the dictionary data
     ❖ Trained two systems

  10. Problem: Time-consuming glossary clean-up

  11. Problem: Failed to add the glossary to the dictionary data

  12. Problem: Lower BLEU score after adding the glossary

  13. Step 6: Final Round of Human Evaluation
      ❖ 2 post-editors
      ❖ A sample of 1000 words extracted from the system with the highest BLEU score (44.03)
      ❖ Analyzed the output and assigned a quality score first
      ❖ Average PE time: 0.68 hours (40.8 minutes)

  14. Problem: Mistranslated and untranslated MT output due to incomplete manual cleanup

  15. Pilot Project Results

  16. 85% Efficiency

  17. 71% Cost
      HT:
      ❖ Translation: $0.12/word
      ❖ Editing: $0.05/word
      PEMT:
      ❖ Post-editing: $0.05/word
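
The 71% figure can be reproduced from the per-word rates on the cost slide. The sketch below assumes the comparison is human translation plus editing against post-editing alone, with no separate charge for the MT engine. That reading is an inference that happens to match the reported number; the slide does not state the formula. The function name `savings_percent` is hypothetical.

```python
# Per-word rates from the cost slide (USD).
HT_TRANSLATION = 0.12
HT_EDITING = 0.05
PEMT_POST_EDITING = 0.05


def savings_percent(baseline_cost, new_cost):
    """Percentage saved when moving from baseline_cost to new_cost."""
    return (baseline_cost - new_cost) / baseline_cost * 100


# Assumption: the HT baseline is translation plus editing; MT engine cost is ignored.
ht_total = HT_TRANSLATION + HT_EDITING
pemt_savings = savings_percent(ht_total, PEMT_POST_EDITING)
print(f"HT: ${ht_total:.2f}/word, PEMT: ${PEMT_POST_EDITING:.2f}/word, savings: {pemt_savings:.0f}%")
```

Under that assumption the savings come to about 70.6%, which rounds to the 71% on the slide and is well above the 27% pilot target.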

  18. 31.5 Quality

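  19. Quality: comparison of two rounds of human evaluation

      | Metric                 | First round | Final round |
      |------------------------|-------------|-------------|
      | QA error score         | 49          | 31.5        |
      | PE time for 1000 words | 53 min      | 40.8 min    |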
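
The change between the two evaluation rounds can be expressed as a relative reduction. The sketch below uses only the figures reported on the slides (QA error score 49 and 31.5; PE time 53 minutes and 0.68 hours). The helper name `percent_reduction` is hypothetical, and this is simple arithmetic on the reported values rather than the method the team used for its efficiency figure.

```python
def percent_reduction(before, after):
    """Relative drop from before to after, as a percentage."""
    return (before - after) / before * 100


# Figures reported on the slides for the first and final evaluation rounds.
first_round = {"qa_error_score": 49, "pe_minutes": 53}
final_round = {"qa_error_score": 31.5, "pe_minutes": 0.68 * 60}  # 0.68 hours in minutes

for metric in first_round:
    drop = percent_reduction(first_round[metric], final_round[metric])
    print(f"{metric}: {first_round[metric]} -> {final_round[metric]:.1f} ({drop:.1f}% lower)")

# Pilot objective: an acceptable quality score is under 30.
meets_quality_target = final_round["qa_error_score"] < 30
print("quality target met:", meets_quality_target)
```

On these numbers the error score fell by roughly 36% and post-editing time by roughly 23%, although the final score of 31.5 is still slightly above the target of under 30.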

  20. Lessons Learned

  21. PDF formatting cleanup

  22. When in doubt, check the system
      ➔ The system IS objective
      ➔ TMX IS better than PDF

  23. Content relevance is key

  24. ➔ “Terminology” and “Fluency” performance improved
      ➔ Further data needs to be collected to assess “Accuracy”

  25. Thank you
