Machine Translation – Classical and Statistical Approaches
Session 6: Statistical MT – Intro (1)
Jonas Kuhn
Universität des Saarlandes, Saarbrücken
The University of Texas at Austin
jonask@coli.uni-sb.de
DGfS/CL Fall School 2005, Ruhr-Universität Bochum, September 19-30, 2005

Week 2: Overview
• Data-driven, statistical approaches to MT
• The noisy channel model [Brown et al. 1990, Knight 1999]
• Language modeling
• Translation modeling
• Word alignment
• Phrase alignment [Koehn et al. 2003]
• Decoding [Koehn 1994]
• Lab exercise: building a phrase-based statistical MT system from parallel texts taken from the Internet
• Evaluation methods
• Other uses of word alignments [Yarowsky et al. 2001]

Sessions 6/7: Statistical MT – Intro
• Acknowledgements: Some slides are borrowed from Kevin Knight, University of Southern California, from Colin Cherry, Alberta (see http://www.cs.ualberta.ca/~colinc) and from Leila Kosseim (http://www.cs.concordia.ca/~kosseim/)
• “Translation without understanding”
• Very brief introduction to probabilities
• The noisy channel model for translation
• Language modeling
• Translation modeling
• Decoding

Translation without understanding?
• Translation is easy for (bilingual) people
• Process:
  • Read the text in French
  • Understand it
  • Write it down in English

Translation without understanding?
• Translation is easy for (bilingual) people
• Process:
  • Read the text in French
  • Understand it
  • Write it down in English
• Hard for computers
  • The human process is invisible, intangible

One approach: Rule-based MT
• Compare week 1
• Problems:
  • Building a broad-coverage system is an enormous engineering challenge
  • Adding new languages/text domains is very costly
  • Many disambiguation decisions cannot be made without world knowledge/contextual knowledge

Data-Driven Machine Translation
• Man, this is so boring.
• Hmm, every time he sees “banco”, he either types “bank” or “bench” … but if he sees “banco de…”, he always types “bank”, never “bench”…
• [Figure label: Translated documents]
Slide from Kevin Knight

Alternative Approach: Statistical MT
• Go back to Warren Weaver’s idea of using statistical techniques: find the most probable translation of a given sentence
• We want to translate from French to English
• Task: given a French sentence, what is the most probable English translation?
• Notation: Find $E^* = \arg\max_E P(E \mid F)$
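
The “banco” observation on the Data-Driven Machine Translation slide amounts to counting which translation is chosen in which context. The following is a minimal sketch of that idea, assuming a handful of invented observations. The data, the context words other than “de”, and the helper names `translation_counts` and `relative_frequencies` are hypothetical and not taken from the slides.

```python
from collections import Counter

# Invented toy observations: (source word, following source word, English word typed).
# Illustrative only; these are not taken from a real parallel corpus.
observations = [
    ("banco", "de", "bank"),
    ("banco", "de", "bank"),
    ("banco", "del", "bench"),
    ("banco", "grande", "bench"),
    ("banco", "de", "bank"),
    ("banco", "central", "bank"),
]

def translation_counts(obs, word, next_word=None):
    """Count English translations of `word`, optionally only before `next_word`."""
    counts = Counter()
    for src, nxt, tgt in obs:
        if src == word and (next_word is None or nxt == next_word):
            counts[tgt] += 1
    return counts

def relative_frequencies(counts):
    """Turn raw counts into relative frequencies (a simple probability estimate)."""
    total = sum(counts.values())
    return {tgt: c / total for tgt, c in counts.items()}

# Without context, "banco" is ambiguous; before "de", only "bank" was observed.
no_context = relative_frequencies(translation_counts(observations, "banco"))
with_de = relative_frequencies(translation_counts(observations, "banco", "de"))
print(no_context)
print(with_de)
```

With these invented counts, “banco” alone is split between “bank” and “bench”, while “banco de” always maps to “bank”. This is the kind of regularity a statistical system tries to pick up from translated documents.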

Centauri/Arcturan [Knight, 1997]
Exercise: translate this to Arcturan: farok crrrok hihok yorok clok kantok ok-yurp
1a. ok-voon ororok sprok .
1b. at-voon bichat dat .
2a. ok-drubel ok-voon anok plok sprok .
2b. at-drubel at-voon pippat rrat dat .
3a. erok sprok izok hihok ghirok .
3b. totat dat arrat vat hilat .
4a. ok-voon anok drok brok jok .
4b. at-voon krat pippat sat lat .
5a. wiwok farok izok stok .
5b. totat jjat quat cat .
6a. lalok sprok izok jok stok .
6b. wat dat krat quat cat .
7a. lalok farok ororok lalok sprok izok enemok .
7b. wat jjat bichat wat dat vat eneat .
8a. lalok brok anok plok nok .
8b. iat lat pippat rrat nnat .
9a. wiwok nok izok kantok ok-yurp .
9b. totat nnat quat oloat at-yurp .
10a. lalok mok nok yorok ghirok clok .
10b. wat nnat gat mat bat hilat .
11a. lalok nok crrrok hihok yorok zanzanok .
11b. wat nnat arrat mat zanzanat .
12a. lalok rarok nok izok hihok mok .
12b. wat nnat forat arrat vat gat .

Recent Progress in Statistical MT
2002:
insistent Wednesday may recurred her trips to Libya tomorrow for flying
Cairo 6-4 ( AFP ) - an official announced today in the Egyptian lines company for flying Tuesday is a company " insistent for flying " may resumed a consideration of a day Wednesday tomorrow her trips to Libya of Security Council decision trace international the imposed ban comment .
And said the official " the institution sent a speech to Ministry of Foreign Affairs of lifting on Libya air , a situation her receiving replying are so a trip will pull to Libya a morning Wednesday " .
2003:
Egyptair Has Tomorrow to Resume Its Flights to Libya
Cairo 4-6 (AFP) - said an official at the Egyptian Aviation Company today that the company egyptair may resume as of tomorrow, Wednesday its flights to Libya after the International Security Council resolution to the suspension of the embargo imposed on Libya.
" The official said that the company had sent a letter to the Ministry of Foreign Affairs, information on the lifting of the air embargo on Libya, where it had received a response, the first take off a trip to Libya on Wednesday morning ".
Slide from C. Wayne, DARPA

Very brief intro to probabilities
• Using “common sense”, we are pretty good at dealing with the likelihood of (random) events
• Probability functions assign a value between 0 and 1 to the occurrence of a particular outcome of a random event
  • Example: rolling a die: P(a given face) = 1/6 ≈ 0.1667
• We need some terminology and notation

Pop star example
• Assume you are a photo reporter and want to take an exclusive picture of an international pop star who’s on tour in Germany
• There are rumors that certain concerts will get cancelled
• You want to guess what route the pop star will take through Germany
• Each route has a certain probability
• Wait at a location along the route with the highest probability to take the picture
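
Returning to the Centauri/Arcturan exercise above: one way to start on it by hand is to look for Arcturan words that show up in every sentence pair containing a given Centauri word. The sketch below automates only that step, using the twelve sentence pairs from the slide. The function name `candidates` and the intersection heuristic are assumptions made for illustration. This is a deliberate simplification and not a statistical alignment model, since it ignores word order, frequencies, and words with no counterpart.

```python
# Sentence pairs 1-12 from the Centauri/Arcturan slide: (Centauri, Arcturan).
pairs = [
    ("ok-voon ororok sprok .", "at-voon bichat dat ."),
    ("ok-drubel ok-voon anok plok sprok .", "at-drubel at-voon pippat rrat dat ."),
    ("erok sprok izok hihok ghirok .", "totat dat arrat vat hilat ."),
    ("ok-voon anok drok brok jok .", "at-voon krat pippat sat lat ."),
    ("wiwok farok izok stok .", "totat jjat quat cat ."),
    ("lalok sprok izok jok stok .", "wat dat krat quat cat ."),
    ("lalok farok ororok lalok sprok izok enemok .", "wat jjat bichat wat dat vat eneat ."),
    ("lalok brok anok plok nok .", "iat lat pippat rrat nnat ."),
    ("wiwok nok izok kantok ok-yurp .", "totat nnat quat oloat at-yurp ."),
    ("lalok mok nok yorok ghirok clok .", "wat nnat gat mat bat hilat ."),
    ("lalok nok crrrok hihok yorok zanzanok .", "wat nnat arrat mat zanzanat ."),
    ("lalok rarok nok izok hihok mok .", "wat nnat forat arrat vat gat ."),
]

def candidates(word, corpus):
    """Arcturan words occurring in every pair whose Centauri side contains `word`."""
    result = None
    for centauri, arcturan in corpus:
        if word in centauri.split():
            targets = set(arcturan.split()) - {"."}  # ignore the sentence-final period
            result = targets if result is None else result & targets
    return result if result is not None else set()

# Candidate translations for each word of the exercise sentence.
for w in "farok crrrok hihok yorok clok kantok ok-yurp".split():
    print(w, sorted(candidates(w, pairs)))
```

For words seen in several pairs the candidate set shrinks to a single word (for example, farok leaves only jjat and hihok leaves only arrat). Words seen in just one pair, such as crrrok, stay ambiguous and need further reasoning, for instance ruling out Arcturan words already explained by other Centauri words.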

Probabilities
• Simple probability (Prior probability) P(A)
  • You call up the tour manager and ask whether the concert in Berlin will be cancelled or not
  • “With 60% probability the concert will take place”
    P(CiB) = 0.6
    P(nCiB) = 0.4
• Conditional probability (Posterior probability) P(A|B)
  • If the pop star has a concert in Berlin, how likely is it that she will visit the Reichstagsgebäude?
  • One out of four pop stars who give a concert in Berlin also visit the Reichstagsgebäude
  • Only 10% of the pop stars who don’t give a concert in Berlin visit the Reichstagsgebäude
    P(Rtg | CiB) = 0.25
    P(Rtg | nCiB) = 0.1

Calculations with probabilities
• How likely is it that the pop star will show up at the Reichstag [What is P(Rtg)]?
• All we have are conditional probabilities for the pop star visiting the Reichstag, so we have to consider both options for the precondition
    P(CiB) = 0.6
    P(nCiB) = 0.4
    P(Rtg | CiB) = 0.25
    P(Rtg | nCiB) = 0.1
• Joint probability P(A,B)
  • P(Rtg, CiB) = P(CiB) × P(Rtg | CiB) = 0.6 × 0.25 = 0.15
  • P(Rtg, nCiB) = P(nCiB) × P(Rtg | nCiB) = 0.4 × 0.1 = 0.04
• Since CiB and nCiB are mutually exclusive and together cover all possibilities, we get:
    P(Rtg) = P(Rtg, CiB) + P(Rtg, nCiB) = 0.15 + 0.04 = 0.19
• What’s the use of an exact value like this?
  • Comparison with alternative options, e.g., P(FRA_Airport) = 0.25

Calculations with probabilities
• We just exploited the fact that joint probabilities [i.e., P(A,B)] can be calculated by multiplying the prior probability for one event with the conditional probability for the other event, given the first event
• This is called the “chain rule”
• We can go either way (because P(A,B) = P(B,A)):
    P(A,B) = P(A) × P(B|A) or
    P(A,B) = P(B) × P(A|B)
• So: P(B) × P(A|B) = P(A) × P(B|A)
• Divide both sides of the equation by P(B):
    $$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}$$

Bayes’ Law
    $$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}$$
• This is called Bayes’ Law
• Importance: Often, training [i.e., statistical parameter estimation from a sample of random experiments] for one of the two conditional probabilities can be done much more reliably than for the other one
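
The pop star calculation can be replayed in a few lines. This sketch uses only the four numbers given on the slides and the variable names are made up for illustration. The last step, P(CiB | Rtg), is not asked on the slides. It is added here as an assumed follow-up question to show Bayes’ Law applied to the same numbers.

```python
# Numbers from the pop star example.
p_cib = 0.6                # P(CiB): the Berlin concert takes place
p_ncib = 1.0 - p_cib       # P(nCiB): the Berlin concert is cancelled
p_rtg_given_cib = 0.25     # P(Rtg | CiB)
p_rtg_given_ncib = 0.1     # P(Rtg | nCiB)

# Chain rule: joint probability = prior x conditional.
p_rtg_and_cib = p_cib * p_rtg_given_cib       # P(Rtg, CiB)
p_rtg_and_ncib = p_ncib * p_rtg_given_ncib    # P(Rtg, nCiB)

# CiB and nCiB are the only two cases, so the joint probabilities sum to P(Rtg).
p_rtg = p_rtg_and_cib + p_rtg_and_ncib

# Bayes' Law, turned around: given that she is seen at the Reichstag,
# how likely is it that the Berlin concert takes place?
p_cib_given_rtg = p_rtg_given_cib * p_cib / p_rtg

print(round(p_rtg_and_cib, 2), round(p_rtg_and_ncib, 2))  # 0.15 0.04
print(round(p_rtg, 2))                                    # 0.19
print(round(p_cib_given_rtg, 3))                          # 0.789
```

The values are rounded for printing because floating-point arithmetic does not represent 0.04 or 0.19 exactly.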

Bayes’ Law
    $$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}$$
• When we are only looking for the most likely outcome A* for an event, given a fixed event B, the denominator doesn’t play a role, because P(B) is the same for every candidate A:
    $$A^* = \arg\max_A P(A \mid B) = \arg\max_A \frac{P(B \mid A) \times P(A)}{P(B)} = \arg\max_A P(B \mid A) \times P(A)$$

Crime Scene Analogy
• B is a crime scene. A is a person who may have committed the crime
• P(A|B) - look at the scene - who did it?
• P(A) - who had a motive? (Profiler)
• P(B|A) - could they have done it? (transportation, access to weapons, alibi)
• Some people might have great motives, but no means - you need both!

Back to translation
• We want to translate from French to English
• Task: given a French sentence, what is the most probable English translation?
• Notation: Find $E^* = \arg\max_E P(E \mid F)$
• With Bayes’ law we can search for the E that maximizes P(F|E) × P(E)
• Find the English string E for which the product of
  • P(E) [language model probability] times
  • P(F|E) [translation model probability E → F]
  is maximal

Why Bayes rule at all?
• Why not model P(E|F) directly?
• P(F|E) × P(E) decomposition allows us to be sloppy
  • P(E) worries about good English: Fluency
  • P(F|E) worries about French that matches English: Faithfulness
• The two can be trained independently
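
To make the fluency/faithfulness split concrete, here is a minimal sketch of picking E* from a tiny, fixed list of English candidates for one French sentence. Everything in it is an assumption for illustration: the candidate strings, the probability values, and the names `candidates` and `score` are invented and were not estimated from data. A real decoder has to search a vastly larger space of possible translations.

```python
# Invented toy values for a single French sentence F (think "la maison est petite").
# Each English candidate E maps to (P(E), P(F|E)):
#   P(E)   = language model probability  (fluency)
#   P(F|E) = translation model probability (faithfulness)
candidates = {
    "the house is small": (0.020, 0.30),   # fluent and faithful
    "small the is house": (0.0001, 0.30),  # faithful words, bad English
    "the home is little": (0.005, 0.25),   # acceptable, but less likely English
    "the house is blue": (0.020, 0.001),   # fluent, but does not match F
}

def score(e):
    """Noisy channel score P(F|E) x P(E) for candidate e."""
    p_e, p_f_given_e = candidates[e]
    return p_f_given_e * p_e

# E* = arg max_E P(F|E) x P(E)
best = max(candidates, key=score)

# Dividing every score by the same constant P(F) cannot change the winner.
# Here P(F) is approximated by summing over this small candidate set only.
p_f = sum(score(e) for e in candidates)
best_with_denominator = max(candidates, key=lambda e: score(e) / p_f)

print(best)
print(best == best_with_denominator)
```

With these invented numbers the fluent and faithful candidate wins. The scrambled candidate is penalized by P(E), and the mistranslation is penalized by P(F|E), which is the division of labour described on the “Why Bayes rule at all?” slide.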