Five Shades of Noise: Analyzing Machine Translation Errors in User-Generated Text Marlies van der Wees, Arianna Bisazza, Christof Monz
Statistical Machine Translation
News sentence: 印度金融中心孟买亦受到波及。 (mumbai, india's financial center, was also affected.)
SMT output: india's financial center mumbai also affected. 😁
Statistical Machine Translation
SMS sentence: 你路上慢点 (be careful on your way / take your time)
SMT output: you are on the road to slow points 😪
SMT for user-generated text is often bad
✦ Reference: and if i go out, i will stop by your place
   SMT output: and if i went.
✦ Reference: i could not bring it to you
   SMT output: into its enemies.
✦ Reference: i've never seen a pig there
   SMT output: i am seen pig there.
✦ Reference: you're too delighted to be homesick
   SMT output: anytime you
Towards improving SMT quality for UG
✤ To target specific error types, we need to know why mistakes are made:
• in UG versus formal text
   ✦ contrast UG with newswire
• in different types of UG
   ✦ five shades of noise: weblogs, comments, speech (CTS), SMS, and chat messages
• in different language pairs
   ✦ Arabic-English & Chinese-English
Analyzing SMT errors in UG text
✤ What translation choices were made by the SMT system?
✤ What translation choices could have been made by the SMT system?
✤ Why did the SMT system make the choices that it made?
Word Alignment Driven Evaluation: approach *
✤ For each word alignment link in the test set (e.g. 你 / your) that is translated wrongly, determine:
✦ source phrase not in phrase table: SEEN error
✦ target phrase not in phrase table: SENSE error
✦ source and target phrases both in table, but other translation preferred: SCORE error
✤ Example phrase table (source phrase, target phrase, probability):
✦ 路上, on the road, 0.4
✦ 路上, on the way, 0.3
✦ 路上, on your way, 0.2
✦ 点, dot, 0.1
✦ 点, point, 0.4
* Approach adopted from Irvine et al., Measuring Machine Translation Errors in New Domains, 2013
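The three error categories on this slide can be sketched as a small decision procedure. This is only an illustration under simplifying assumptions, not the authors' or Irvine et al.'s implementation: the function name `classify_link`, the dictionary-based phrase table, and the source-side keys of `toy_table` are all hypothetical, and the real analysis works on word alignments inside a full SMT system.

```python
from typing import Dict

# Hypothetical toy phrase table: source phrase -> {target phrase: probability}.
# The target phrases and probabilities follow the slide; the source keys are assumed.
PhraseTable = Dict[str, Dict[str, float]]

toy_table: PhraseTable = {
    "路上": {"on the road": 0.4, "on the way": 0.3, "on your way": 0.2},
    "点": {"dot": 0.1, "point": 0.4},
}


def classify_link(source: str, reference: str, output: str,
                  table: PhraseTable) -> str:
    """Label one aligned source phrase as CORRECT, SEEN, SENSE, or SCORE."""
    if output == reference:
        return "CORRECT"
    if source not in table:
        # The source phrase is not in the phrase table at all.
        return "SEEN"
    if reference not in table[source]:
        # The source phrase is known, but the reference translation is not an option.
        return "SENSE"
    # Both sides are in the table, but another translation was preferred.
    return "SCORE"


if __name__ == "__main__":
    print(classify_link("路上", "on your way", "on the road", toy_table))
    print(classify_link("点", "time", "point", toy_table))
    print(classify_link("慢", "slow", "slowly", toy_table))
```

In this toy setup, a SCORE error means the reference translation was available but lost to a higher-scoring alternative, while SEEN and SENSE errors both point to missing model coverage.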
Word Alignment Driven Evaluation: results
[Figure: Word-level error statistics for Arabic-English benchmarks. Bar chart of the relative frequency of the categories Correct, Seen, Sense, and Score for each test set: News 1 and News 2 (news); Weblogs, Comments, CTS, Chat, and SMS (UG).]
Word Alignment Driven Evaluation: findings
✤ SMT errors for UG text differ
• from SMT errors for news
   ✦ many SEEN and SENSE errors for UG
• between different types of UG
   ✦ SMS and chat messages are most affected
• between different language pairs
   ✦ differences in Chinese-English are more subtle than in Arabic-English
Analyzing SMT errors in UG: what we learned
✤ Common errors in UG are due to:
✦ misspellings or Arabic dialectal forms
✦ formal lexical choices
✦ idioms translated word by word
✦ dropped pronouns in Chinese
✤ UG suffers from low model coverage
✦ generate new translation candidates
✦ normalize existing translation candidates
More Error Analysis?
✤ Visit the poster for:
✦ Model coverage analysis
✦ Arabic-English versus Chinese-English results
✦ Qualitative examples
✤ Read the paper for:
✦ Phrase-length analysis
✦ Detailed explanation and discussions

Poster: Five Shades of Noise: Analyzing Machine Translation Errors in User-Generated Text
Marlies van der Wees, Arianna Bisazza, Christof Monz
Informatics Institute, University of Amsterdam

Motivation
✤ Statistical machine translation (SMT) of user-generated (UG) text
✦ input SMS message: 你路上慢点 (= be careful on your way / take your time)
✦ output translation: you are on the road to slow points
✤ Lower translation quality for UG than for news

Five Shades of Noise
✦ Two language pairs: Arabic-English & Chinese-English
✦ Five UG sets: weblogs, comments, speech, SMS, chat
✦ Two news sets: different sources, to contrast with UG

Understanding SMT errors in UG text
✤ Why does SMT make the errors that it makes on UG?
✦ low model coverage?
✦ poor scoring of translation options?
✤ What errors are observed for various types of UG?

Quantitative Analysis: SMT Model Coverage
✤ Approach: for each phrase pair in the test set (e.g. … / take your time), determine:
✦ source phrase covered in the SMT models
✦ target phrase covered in the SMT models
✦ phrase pair covered in the SMT models
✦ all computed for various phrase lengths
✤ Findings
✦ coverage of source phrases and phrase pairs is lower for UG than for news
✦ coverage of target phrases is more balanced among test sets
✦ coverage dramatically decreases for longer phrases
✦ SMS and chat suffer most from low coverage

Qualitative Analysis: Word Alignment Driven Evaluation *
✤ Legend
✦ Correct
✦ SEEN error: unknown source
✦ SENSE error: unknown target
✦ SCORE error: suboptimal scoring
✤ Chinese-English example
   Input: 上 网 了 , 你 路上 慢 点
   Ref: i 'm online . take your time
   Output: on the internet , and you are on the road to slow points
✦ missing pronoun not inferred by SMT system
✦ idiom translated in small chunks, losing its meaning as a phrase
✤ Arabic-English example
   Input: qAlt E$An AlEyAl mtzEl$
   Ref: so the kids do not feel upset
   Output: said because of the sons
✦ lexical choices that are too formal, not reflecting colloquial language
✦ out-of-vocabulary (OOV) due to dialect or misspellings
* Irvine et al., Measuring Machine Translation Errors in New Domains, 2013

Conclusions
✤ SMT errors for UG text differ
✦ from SMT errors for news
✦ between different types of UG
✦ between different language pairs
✤ Promising solutions include
✦ improving scoring for news
✦ increasing phrase pair coverage for UG
✦ increasing source phrase coverage for SMS & chat

This research was funded in part by the Netherlands Organization for Scientific Research (NWO) under project number 639.022.213.
ACL 2015 Workshop on Noisy User-generated Text (WNUT), Beijing, China
m.e.vanderwees@uva.nl
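The model coverage analysis described on the poster (source phrase, target phrase, and phrase pair coverage, computed per phrase length) can be sketched roughly as follows. This is a minimal illustration under assumptions that are not in the source: the function name `coverage_by_length` is hypothetical, the SMT models are reduced to a plain set of phrase pairs, and phrase length is measured in whitespace-separated source tokens.

```python
from typing import Dict, Iterable, Set, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)


def coverage_by_length(test_pairs: Iterable[PhrasePair],
                       model_pairs: Set[PhrasePair]) -> Dict[int, Dict[str, float]]:
    """Return source, target, and pair coverage per source phrase length."""
    model_sources = {src for src, _ in model_pairs}
    model_targets = {tgt for _, tgt in model_pairs}

    counts: Dict[int, Dict[str, int]] = {}
    for src, tgt in test_pairs:
        length = len(src.split())  # assumption: length in source tokens
        c = counts.setdefault(
            length, {"total": 0, "source": 0, "target": 0, "pair": 0})
        c["total"] += 1
        c["source"] += src in model_sources
        c["target"] += tgt in model_targets
        c["pair"] += (src, tgt) in model_pairs

    # Turn raw counts into fractions of test phrase pairs that are covered.
    return {
        length: {key: c[key] / c["total"] for key in ("source", "target", "pair")}
        for length, c in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    model = {("a b", "x y"), ("a", "x"), ("c", "z")}
    test = [("a", "x"), ("a", "z"), ("d", "x"), ("a b", "x y"), ("c d", "q")]
    for length, stats in sorted(coverage_by_length(test, model).items()):
        print(length, stats)
```

Reporting the three coverage figures separately makes it possible to see whether a test set mainly lacks source phrases, target phrases, or only the specific pairing of the two.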