Machine Translation – Classical and Statistical Approaches
Session 10: MT Evaluation & Wrap-Up

Jonas Kuhn
Universität des Saarlandes, Saarbrücken
The University of Texas at Austin
jonask@coli.uni-sb.de

DGfS/CL Fall School 2005, Ruhr-Universität Bochum, September 19-30, 2005

Week 2: Overview
• Data-driven, statistical approaches to MT
• The noisy channel model [Brown et al. 1990, Knight 1999]
• Language modeling
• Translation modeling
• Word alignment [Koehn et al. 2003]
• Phrase alignment
• Decoding [Koehn 1994]
• Lab exercise: building a phrase-based statistical MT system from parallel texts taken from the Internet
• Evaluation methods
• Other uses of word alignments [Yarowsky et al. 2001]

Jonas Kuhn: MT 2

Today’s session
• Lab exercise:
  – Running the phrase-based decoder Pharaoh
• MT Evaluation
• Other uses of word alignments
• Wrap-Up
  – Final projects
  – Certificates of participation

Running the decoder
• Sample data taken from
  – http://www.statmt.org/wpt05/mt-shared-task/
  – Large French-English phrase table (trained from Europarl)
  – Language model for English
  – Test sentences in French (along with model solution)
• A script is provided for filtering out the relevant part of the translation table for a set of test sentences
  – run-filtered-pharaoh.perl filtered100.fr pharaoh pharaoh.fr.ini test100.fr.lowercase "-monotone" > test100.fr.out.monotone
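The filtering step mentioned on the "Running the decoder" slide can be illustrated with a minimal Python sketch. This is not the run-filtered-pharaoh.perl script itself, only the underlying idea: keep a phrase-table entry only if its source side occurs as a word sequence in some test sentence. The sketch assumes one entry per line in the form `source ||| target ||| score`; the function names, the `max_len` limit, and the toy scores are hypothetical.

```python
def source_ngrams(sentences, max_len=7):
    """Collect every word n-gram (up to max_len words) found in the test sentences."""
    grams = set()
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words)):
            for j in range(i + 1, min(i + max_len, len(words)) + 1):
                grams.add(" ".join(words[i:j]))
    return grams


def filter_phrase_table(table_lines, sentences, max_len=7):
    """Keep only entries whose source phrase occurs in the test sentences.

    Assumed entry format (an assumption of this sketch):
    'source ||| target ||| score'
    """
    needed = source_ngrams(sentences, max_len)
    kept = []
    for line in table_lines:
        source = line.split("|||")[0].strip()
        if source in needed:
            kept.append(line)
    return kept


# Toy phrase table with made-up scores, for illustration only.
table = [
    "nous savons ||| we know ||| 0.6",
    "très bien ||| very well ||| 0.7",
    "le chat ||| the cat ||| 0.8",
]
test_sentences = ["nous savons très bien"]

filtered = filter_phrase_table(table, test_sentences)
for entry in filtered:
    print(entry)
```

Filtering of this kind matters because a phrase table trained on Europarl is large, while only a small fraction of its entries can ever apply to a given set of 100 test sentences.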
Translation results
• Original:
  Nous savons très bien que les Traités actuels ne suffisent pas et qu' il sera nécessaire à l' avenir de développer une structure plus efficace et différente pour l' Union, une structure plus constitutionnelle qui indique clairement quelles sont les compétences des États membres et quelles sont les compétences de l' Union.
• Reference translation:
  We know all too well that the present Treaties are inadequate and that the Union will need a better and different structure in future, a more constitutional structure which clearly distinguishes the powers of the Member States and those of the Union .

Translation with Pharaoh decoder
• Original:
  Nous savons très bien que les Traités actuels ne suffisent pas et qu' il sera nécessaire à l' avenir de développer une structure plus efficace et différente pour l' Union, une structure plus constitutionnelle qui indique clairement quelles sont les compétences des États membres et quelles sont les compétences de l' Union.
• Pharaoh translation:
  we know very well that the current treaties are not enough and that it will be necessary in the future to develop a structure which is more effective and different for the union , a structure more constitutional which makes it clear what are the powers of the member states , and what are the powers of the union .

MT Evaluation
• Manual:
  – SSER (subjective sentence error rate)
  – Correct/Incorrect
  – Error categorization
• Testing in an application that uses MT as one sub-component
  – Question answering from foreign language documents
• Automatic:
  – WER (word error rate)
  – BLEU (Bilingual Evaluation Understudy)
Slides from Kevin Knight

Commercial system (online version)
• Original:
  Nous savons très bien que les Traités actuels ne suffisent pas et qu' il sera nécessaire à l' avenir de développer une structure plus efficace et différente pour l' Union, une structure plus constitutionnelle qui indique clairement quelles sont les compétences des États membres et quelles sont les compétences de l' Union.
• Systran translation: (http://www.systransoft.com/)
  We know very well that the current Treaties are not enough and that it will be necessary in the future to develop a more effective and different structure for the Union, a more constitutional structure which indicates clearly which are competences of the Member States and which are competences of the Union.
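The WER metric listed under "Automatic" on the MT Evaluation slide can be sketched as follows. The sketch assumes the usual definition (word-level edit distance between hypothesis and reference, divided by the reference length), which the slides do not spell out; the function name and the example sentences are illustrative only, and case normalization and tokenization are left to the caller.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "the powers of the union"
print(word_error_rate(reference, "the competences of the union"))  # one substitution
print(word_error_rate(reference, "the union"))                     # three words missing
```

Because WER compares against a single reference word by word, a translation that is acceptable but worded differently, such as "competences" for "powers", is still counted as an error.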
BLEU Evaluation Metric (Papineni et al, ACL-2002)

Reference (human) translation:
  The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport .

Machine translation:
  The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance.

• N-gram precision (score is between 0 & 1)
  – What percentage of machine n-grams can be found in the reference translation?
  – An n-gram is a sequence of n words
  – Not allowed to use same portion of reference translation twice (can’t cheat by typing out “the the the the the”)
• Brevity penalty
  – Can’t just type out single word “the” (precision 1.0!)

*** Amazingly hard to “game” the system (i.e., find a way to change machine output so that BLEU goes up, but quality doesn’t)

BLEU4 formula (counts n-grams up to length 4)

  exp (1.0 * log p1 +
       0.5 * log p2 +
       0.25 * log p3 +
       0.125 * log p4 -
       max(words-in-reference / words-in-machine - 1, 0))

  p1 = 1-gram precision
  p2 = 2-gram precision
  p3 = 3-gram precision
  p4 = 4-gram precision

Slides from Kevin Knight

Multiple Reference Translations

Reference translation 1:
  The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport .

Reference translation 2:
  Guam International Airport and its offices are maintaining a high state of alert after receiving an e-mail that was from a person claiming to be the wealthy Saudi Arabian businessman Bin Laden and that threatened to launch a biological and chemical attack on the airport and other public places .

Reference translation 3:
  The US International Airport of Guam and its office has received an email from a self-claimed Arabian millionaire named Laden , which threatens to launch a biochemical attack on such public places as airport . Guam authority has been on alert .

Reference translation 4:
  US Guam International Airport and its office received an email from Mr. Bin Laden and other rich businessman from Saudi Arabia . They said there would be biochemistry air raid to Guam Airport and other public places . Guam needs to be in high precaution about this matter .

Machine translation:
  The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance.

Slides from Kevin Knight

BLEU Tends to Predict Human Judgments

[Figure: NIST Score (variant of BLEU) plotted against Human Judgments, both axes running from -2.5 to 2.5. Adequacy: R² = 88.0%; Fluency: R² = 90.2%.]

slide from G. Doddington (NIST)
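The BLEU4 formula from the slides can be turned into a short Python sketch. It follows the slide's weights (1.0, 0.5, 0.25, 0.125) and its brevity term literally, which differs from the uniform weighting commonly used elsewhere, and it handles a single reference only. The function names are hypothetical, and returning 0 when some n-gram precision is zero is an assumption of this sketch, since the logarithm is undefined there.

```python
import math
from collections import Counter


def ngram_counts(words, n):
    """Count the n-grams (as tuples of words) in a list of words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def clipped_precision(machine, reference, n):
    """Share of machine n-grams found in the reference.

    Each reference n-gram can be matched only as often as it occurs,
    so repeating "the the the" does not help.
    """
    mt = ngram_counts(machine, n)
    ref = ngram_counts(reference, n)
    total = sum(mt.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in mt.items())
    return matched / total


def bleu4_slide(machine_text, reference_text, weights=(1.0, 0.5, 0.25, 0.125)):
    """BLEU4 as written on the slide, for one reference translation."""
    machine = machine_text.split()
    reference = reference_text.split()
    if not machine:
        return 0.0
    log_sum = 0.0
    for n, weight in enumerate(weights, start=1):
        p = clipped_precision(machine, reference, n)
        if p == 0.0:
            return 0.0  # assumption: treat an undefined log as score 0
        log_sum += weight * math.log(p)
    # Brevity term from the slide: penalize output shorter than the reference.
    brevity = max(len(reference) / len(machine) - 1, 0)
    return math.exp(log_sum - brevity)


reference = "the cat sat on the mat"
print(bleu4_slide("the cat sat on the mat", reference))  # identical output
print(bleu4_slide("the cat sat on", reference))          # correct but too short
print(bleu4_slide("the the the the the", reference))     # the "cheating" case
```

The second call shows the brevity penalty at work: every n-gram of the short output appears in the reference, so all four precisions are 1, and only the length term lowers the score. Supporting multiple references would require choosing how to clip counts and which reference length to use, which the slides leave open.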