
Machine Translation
Philipp Koehn, 28 April 2020


1. Inserting Words

● Words may be added during translation
  – the English word "just" has no equivalent in German
  – we still need to map it to something: the special NULL token

      NULL (0)   das (1)   Haus (2)   ist (3)   klein (4)
      the (1)    house (2) is (3)     just (4)  small (5)

      a : { 1 → 1, 2 → 2, 3 → 3, 4 → 0, 5 → 4 }
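As a concrete illustration (my own, not from the slides), such an alignment can be stored as a plain mapping from English positions to German positions, with 0 reserved for NULL; all names below are hypothetical:

```python
# Word alignment for "das Haus ist klein" -> "the house is just small".
# Position 0 on the German side is the special NULL token.
german = ["NULL", "das", "Haus", "ist", "klein"]   # index 0 = NULL
english = ["the", "house", "is", "just", "small"]  # positions 1..5

# a(j) = i : English word at position j is generated by German word at position i
alignment = {1: 1, 2: 2, 3: 3, 4: 0, 5: 4}

for j, i in alignment.items():
    print(f"{english[j - 1]:>6} <- {german[i]}")
# "just" maps to NULL because nothing in the German sentence produces it.
```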

2. IBM Model 1

● Generative model: break up the translation process into smaller steps
  – IBM Model 1 uses only lexical translation
● Translation probability
  – for a foreign sentence f = (f_1, ..., f_{l_f}) of length l_f
  – to an English sentence e = (e_1, ..., e_{l_e}) of length l_e
  – with an alignment of each English word e_j to a foreign word f_i according to the alignment function a : j → i

      p(e, a | f) = ε / (l_f + 1)^{l_e} × ∏_{j=1}^{l_e} t(e_j | f_{a(j)})

  – the parameter ε is a normalization constant

3. Example

das Haus ist klein — lexical translation probabilities t(e | f):

  das           | Haus             | ist            | klein
  the     0.7   | house      0.8   | is       0.8   | small   0.4
  that    0.15  | building   0.16  | 's       0.16  | little  0.4
  which   0.075 | home       0.02  | exists   0.02  | short   0.1
  who     0.05  | household  0.015 | has      0.015 | minor   0.06
  this    0.025 | shell      0.005 | are      0.005 | petty   0.04

      p(e, a | f) = ε / (4 + 1)^4 × t(the | das) × t(house | Haus) × t(is | ist) × t(small | klein)
                  = ε / 5^4 × 0.7 × 0.8 × 0.8 × 0.4
                  ≈ 0.00029 ε
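A small sketch (not from the slides) of the Model 1 probability as a function, plugging in the example's translation probabilities; variable names are illustrative:

```python
def model1_prob(e_words, f_words, alignment, t, epsilon=1.0):
    """IBM Model 1: p(e, a | f) = eps / (l_f + 1)^l_e * prod_j t(e_j | f_a(j)).

    f_words includes the NULL token at position 0; alignment[j] gives the
    foreign position for English position j (1-based)."""
    l_f = len(f_words) - 1          # exclude NULL from the foreign length
    l_e = len(e_words)
    p = epsilon / (l_f + 1) ** l_e
    for j, e in enumerate(e_words, start=1):
        p *= t[(e, f_words[alignment[j]])]
    return p

t = {("the", "das"): 0.7, ("house", "Haus"): 0.8,
     ("is", "ist"): 0.8, ("small", "klein"): 0.4}
f = ["NULL", "das", "Haus", "ist", "klein"]
e = ["the", "house", "is", "small"]
a = {1: 1, 2: 2, 3: 3, 4: 4}

print(model1_prob(e, f, a, t))   # ~0.000287, i.e. about 0.00029 * epsilon
```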

4. EM Algorithm

5. Learning Lexical Translation Models

● We would like to estimate the lexical translation probabilities t(e | f) from a parallel corpus
● ... but we do not have the alignments
● Chicken-and-egg problem
  – if we had the alignments, we could estimate the parameters of our generative model
  – if we had the parameters, we could estimate the alignments

6. EM Algorithm

● Incomplete data
  – if we had complete data, we could estimate the model
  – if we had the model, we could fill in the gaps in the data
● Expectation Maximization (EM) in a nutshell (see the sketch after this list)
  1. initialize the model parameters (e.g. uniformly)
  2. assign probabilities to the missing data
  3. estimate the model parameters from the completed data
  4. iterate steps 2–3 until convergence
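A compact sketch of this loop for IBM Model 1 (my own simplification of the usual textbook pseudocode; the NULL token and the ε constant are left out):

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """EM training of IBM Model 1 lexical translation probabilities t(e | f).

    corpus: list of (english_words, foreign_words) sentence pairs.
    Returns a dict t[(e, f)]."""
    e_vocab = {e for e_sent, _ in corpus for e in e_sent}
    # 1. initialize t(e | f) uniformly
    t = defaultdict(lambda: 1.0 / len(e_vocab))

    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(e, f)
        total = defaultdict(float)   # expected counts c(f)
        # 2. E-step: assign probabilities to the hidden alignments
        for e_sent, f_sent in corpus:
            for e in e_sent:
                z = sum(t[(e, f)] for f in f_sent)   # normalize over alignment choices
                for f in f_sent:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        # 3. M-step: re-estimate t(e | f) from the expected counts
        t = defaultdict(float,
                        {(e, f): count[(e, f)] / total[f] for (e, f) in count})
    return t
```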

7. EM Algorithm

      ... la maison ... la maison bleu ... la fleur ...
      ... the house ... the blue house ... the flower ...

● Initial step: all alignments equally likely
● Model learns that, e.g., la is often aligned with the

8. EM Algorithm

      ... la maison ... la maison bleu ... la fleur ...
      ... the house ... the blue house ... the flower ...

● After one iteration
● Alignments, e.g., between la and the are more likely

9. EM Algorithm

      ... la maison ... la maison bleu ... la fleur ...
      ... the house ... the blue house ... the flower ...

● After another iteration
● It becomes apparent that alignments, e.g., between fleur and flower are more likely (pigeonhole principle)

10. EM Algorithm

      ... la maison ... la maison bleu ... la fleur ...
      ... the house ... the blue house ... the flower ...

● Convergence
● Inherent hidden structure revealed by EM

11. EM Algorithm

      ... la maison ... la maison bleu ... la fleur ...
      ... the house ... the blue house ... the flower ...

      p(la | the)       = 0.453
      p(le | the)       = 0.334
      p(maison | house) = 0.876
      p(bleu | blue)    = 0.563
      ...

● Parameter estimation from the aligned corpus
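Running the train_model1 sketch from above on this three-sentence toy corpus shows the same effect qualitatively. The exact values depend on the number of iterations and on the corpus, so they will not match the slide's numbers (which also list the opposite direction, p(f | e), rather than t(e | f)):

```python
corpus = [
    ("the house".split(),      "la maison".split()),
    ("the blue house".split(), "la maison bleu".split()),
    ("the flower".split(),     "la fleur".split()),
]
t = train_model1(corpus, iterations=20)
for e, f in [("the", "la"), ("house", "maison"), ("blue", "bleu"), ("flower", "fleur")]:
    print(f"t({e} | {f}) = {t[(e, f)]:.3f}")
# Probability mass concentrates on the intuitive pairs, e.g. t(flower | fleur)
# ends up much larger than t(the | fleur) -- the pigeonhole effect from the slides.
```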

12. IBM Model 1 and EM

● The EM algorithm consists of two steps
● Expectation step: apply the model to the data
  – parts of the model are hidden (here: the alignments)
  – using the model, assign probabilities to possible values
● Maximization step: estimate the model from the data
  – take the assigned values as fact
  – collect counts (weighted by probabilities)
  – estimate the model from the counts
● Iterate these steps until convergence

13. Phrase-Based Models

14. Phrase-Based Model

● The foreign input is segmented into phrases
● Each phrase is translated into English
● Phrases are reordered

15. Phrase Translation Table

● Main knowledge source: a table of phrase translations and their probabilities
● Example: phrase translations for natuerlich

      Translation        Probability φ(ē | f̄)
      of course          0.5
      naturally          0.3
      of course ,        0.15
      , of course ,      0.05
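Inside a decoder this table is essentially a mapping from source phrases to scored translation options; a minimal illustration (hypothetical encoding, probabilities copied from the slide):

```python
# Phrase table entries phi(e | f) for the German word "natuerlich".
phrase_table = {
    ("natuerlich",): [
        ("of course", 0.5),
        ("naturally", 0.3),
        ("of course ,", 0.15),
        (", of course ,", 0.05),
    ],
}

def translation_options(phrase, top_k=20):
    """Look up the candidate translations of a source phrase, best first."""
    return sorted(phrase_table.get(phrase, []), key=lambda x: -x[1])[:top_k]

print(translation_options(("natuerlich",)))
```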

16. Real Example

● Phrase translations for den Vorschlag learned from the Europarl corpus:

      English             φ(ē | f̄)      English             φ(ē | f̄)
      the proposal        0.6227        the suggestions     0.0114
      's proposal         0.1068        the proposed        0.0114
      a proposal          0.0341        the motion          0.0091
      the idea            0.0250        the idea of         0.0091
      this proposal       0.0227        the proposal ,      0.0068
      proposal            0.0205        its proposal        0.0068
      of the proposal     0.0159        it                  0.0068
      the proposals       0.0159        ...                 ...

  – lexical variation (proposal vs suggestions)
  – morphological variation (proposal vs proposals)
  – included function words (the, a, ...)
  – noise (it)

17. Decoding

18. Decoding

● We have a mathematical model of translation p(e | f)
● Task of decoding: find the translation e_best with the highest probability

      e_best = argmax_e p(e | f)

● Two types of error
  – the most probable translation is bad → fix the model
  – the search does not find the most probable translation → fix the search
● Decoding is evaluated by search error, not by the quality of the translations (although the two are often correlated)

19. Translation Process

● Task: translate this sentence from German into English

      er geht ja nicht nach hause

20. Translation Process

● Task: translate this sentence from German into English

      er geht ja nicht nach hause
      er
      he

● Pick a phrase in the input, translate it

21. Translation Process

● Task: translate this sentence from German into English

      er geht ja nicht nach hause
      er           ja nicht
      he           does not

● Pick a phrase in the input, translate it
  – it is allowed to pick words out of sequence (reordering)
  – phrases may have multiple words: many-to-many translation

22. Translation Process

● Task: translate this sentence from German into English

      er geht ja nicht nach hause
      er geht ja nicht
      he does not go

● Pick a phrase in the input, translate it

23. Translation Process

● Task: translate this sentence from German into English

      er geht ja nicht nach hause
      er geht ja nicht nach hause
      he does not go home

● Pick a phrase in the input, translate it

24. Decoding Process

25. Translation Options

      er geht ja nicht nach hause

[Figure: grid of candidate phrase translations for each span of the input, e.g. er → he / it, geht → goes / is, ja nicht → does not / is not, nach hause → home / at home, ...]

● Many translation options to choose from
  – in the Europarl phrase table: 2727 matching phrase pairs for this sentence
  – by pruning to the top 20 per phrase, 202 translation options remain

26. Translation Options

      er geht ja nicht nach hause

[Figure: the same grid of candidate phrase translations per input span]

● The machine translation decoder does not know the right answer
  – picking the right translation options
  – arranging them in the right order
→ Search problem, solved by heuristic beam search

27. Decoding: Precompute Translation Options

      er geht ja nicht nach hause

● Consult the phrase translation table for all input phrases

28. Decoding: Start with Initial Hypothesis

      er geht ja nicht nach hause

● Initial hypothesis: no input words covered, no output produced

29. Decoding: Hypothesis Expansion

      er geht ja nicht nach hause

● Pick any translation option, create a new hypothesis

30. Decoding: Hypothesis Expansion

      er geht ja nicht nach hause

● Create hypotheses for all other translation options (he, are, it in the figure)

31. Decoding: Hypothesis Expansion

      er geht ja nicht nach hause

[Figure: search graph with partial hypotheses such as he, are, it, yes, goes, does not, go, home, to]

● Also create hypotheses from the created partial hypotheses

32. Decoding: Find Best Path

      er geht ja nicht nach hause

[Figure: the same search graph, with the best path highlighted]

● Backtrack from the highest-scoring complete hypothesis

33. Recombination

● Two hypothesis paths lead to two matching hypotheses
  – same number of foreign words translated
  – same English words in the output
  – different scores

[Figure: two different paths both produce the partial output "it is"]

● The worse hypothesis is dropped

34. Stacks

[Figure: hypothesis stacks organized by the number of translated words (no word / one word / two words / three words translated), holding hypotheses such as it, he, are, yes, goes, does not]

● Hypothesis expansion in a stack decoder (see the sketch below)
  – a translation option is applied to a hypothesis
  – the new hypothesis is dropped into a stack further down
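A toy sketch of such a stack decoder (my own construction, not the lecture's code): phrase-table scores only, no language model or distortion cost, histogram pruning, no recombination, and made-up probabilities:

```python
import math
from collections import namedtuple

Hypothesis = namedtuple("Hypothesis", "covered phrase logprob back")

def backtrack(hyp):
    """Reconstruct the output by following back pointers from a complete hypothesis."""
    words = []
    while hyp.back is not None:
        words.append(hyp.phrase)
        hyp = hyp.back
    return " ".join(reversed(words))

def stack_decode(src, phrase_table, max_phrase_len=3, stack_size=10):
    n = len(src)
    stacks = [[] for _ in range(n + 1)]          # stacks[k]: k source words covered
    stacks[0].append(Hypothesis(frozenset(), "", 0.0, None))

    for k in range(n):                           # expand stacks in order of coverage
        stacks[k].sort(key=lambda h: -h.logprob)
        for hyp in stacks[k][:stack_size]:       # histogram pruning
            for i in range(n):                   # pick any uncovered span [i, j)
                for j in range(i + 1, min(i + max_phrase_len, n) + 1):
                    if any(p in hyp.covered for p in range(i, j)):
                        continue
                    for e_phrase, prob in phrase_table.get(tuple(src[i:j]), []):
                        new = Hypothesis(hyp.covered | set(range(i, j)),
                                         e_phrase,
                                         hyp.logprob + math.log(prob),
                                         hyp)
                        stacks[len(new.covered)].append(new)

    best = max(stacks[n], key=lambda h: h.logprob)   # assumes full coverage is reachable
    return backtrack(best), best.logprob

phrase_table = {                                 # toy scores, for illustration only
    ("er",): [("he", 0.7), ("it", 0.2)],
    ("geht",): [("goes", 0.4), ("is", 0.3)],
    ("ja", "nicht"): [("does not", 0.6)],
    ("geht", "ja", "nicht"): [("does not go", 0.5)],
    ("ja",): [("yes", 0.5)],
    ("nicht",): [("not", 0.6)],
    ("nach", "hause"): [("home", 0.8), ("to home", 0.1)],
}
print(stack_decode("er geht ja nicht nach hause".split(), phrase_table))
# -> ('he does not go home', ...)
```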

35. Syntax-Based Models

36. Phrase Structure Grammar

[Figure: phrase structure tree with nodes S, VP-A, PP, NP-A and POS tags PRP, MD, VB, VBG, RP, TO, DT, NNS over the sentence "I shall be passing on to you some comments"]

● Phrase structure grammar tree for an English sentence (as produced by Collins' parser)

37. Synchronous Phrase Structure Grammar

● English rule:   NP → DET JJ NN
● French rule:    NP → DET NN JJ
● Synchronous rule (indices indicate alignment):

      NP → DET₁ NN₂ JJ₃ ∣ DET₁ JJ₃ NN₂

38. Synchronous Grammar Rules

● Nonterminal rules
      NP → DET₁ NN₂ JJ₃ ∣ DET₁ JJ₃ NN₂
● Terminal rules
      N  → maison ∣ house
      NP → la maison bleue ∣ the blue house
● Mixed rules
      NP → la maison JJ₁ ∣ the JJ₁ house
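One way to hold such rules in a program is as pairs of right-hand sides that share indices for the nonterminals; this is only an illustrative encoding of the idea, not a particular toolkit's format:

```python
# Each rule: (lhs, french_rhs, english_rhs).  Nonterminals are (symbol, index)
# tuples, so that e.g. JJ with index 1 on both sides is the same constituent;
# terminals are plain strings.
rules = [
    ("NP", [("DET", 1), ("NN", 2), ("JJ", 3)], [("DET", 1), ("JJ", 3), ("NN", 2)]),
    ("N",  ["maison"],                          ["house"]),
    ("NP", ["la", "maison", "bleue"],           ["the", "blue", "house"]),
    ("NP", ["la", "maison", ("JJ", 1)],         ["the", ("JJ", 1), "house"]),
]

def english_side(sub_translations, rule):
    """Assemble the English side of a rule, filling each indexed nonterminal
    with the already-translated sub-constituent for that index."""
    out = []
    for item in rule[2]:
        out.append(sub_translations[item[1]] if isinstance(item, tuple) else item)
    return " ".join(out)

# Mixed rule: "la maison JJ1" with JJ1 already translated as "blue"
print(english_side({1: "blue"}, rules[3]))   # -> "the blue house"
```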

39. Syntax Decoding

      Sie    will   eine  Tasse  Kaffee  trinken
      PPER   VAFIN  ART   NN     NN      VVINF
[Figure: parse tree with NP over "eine Tasse Kaffee", VP and S above]

● German input sentence with tree

40. Syntax Decoding

      ➊ PRO → she   (over "Sie")

● Purely lexical rule: filling a span with a translation (a constituent in the chart)

41. Syntax Decoding

      ➋ NN → coffee   (over "Kaffee")

● Purely lexical rule: filling a span with a translation (a constituent in the chart)

42. Syntax Decoding

      ➌ VB → drink   (over "trinken")

● Purely lexical rule: filling a span with a translation (a constituent in the chart)

43. Syntax Decoding

      ➍ NP → "a cup of" + NN ➋   ⇒   a cup of coffee   (over "eine Tasse Kaffee")

● Complex rule: matching underlying constituent spans, and covering words

44. Syntax Decoding

      ➎ VP → "wants to" + VB ➌ + NP ➍   ⇒   wants to drink a cup of coffee   (over "will ... trinken")

● Complex rule with reordering

45. Syntax Decoding

      ➏ S → PRO ➊ + VP ➎   ⇒   she wants to drink a cup of coffee   (covering the whole sentence)

46. Neural Language Models

47. N-Gram Backoff Language Model

● Previously, we approximated
      p(W) = p(w_1, w_2, ..., w_n)
● ... by applying the chain rule
      p(W) = ∏_i p(w_i | w_1, ..., w_{i-1})
● ... and limiting the history (Markov order)
      p(w_i | w_1, ..., w_{i-1}) ≃ p(w_i | w_{i-4}, w_{i-3}, w_{i-2}, w_{i-1})
● Each p(w_i | w_{i-4}, w_{i-3}, w_{i-2}, w_{i-1}) may not have enough statistics to estimate
  → we back off to p(w_i | w_{i-3}, w_{i-2}, w_{i-1}), p(w_i | w_{i-2}, w_{i-1}), etc., all the way to p(w_i)
  – the exact details of backing off get complicated ("interpolated Kneser-Ney")
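A bare-bones illustration of the idea (simple linear interpolation over raw counts rather than Kneser-Ney, which additionally needs discounting and continuation counts):

```python
from collections import Counter

def train_counts(tokens, max_order=3):
    """Count all n-grams up to max_order from a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def interp_prob(word, history, counts, weights=(0.6, 0.3, 0.1)):
    """p(word | history) by interpolating trigram, bigram and unigram estimates."""
    total_tokens = sum(c for ng, c in counts.items() if len(ng) == 1)
    p = 0.0
    for k, w in zip((2, 1, 0), weights):       # length of history actually used
        ctx = tuple(history[-k:]) if k else ()
        denom = counts[ctx] if k else total_tokens
        if denom:
            p += w * counts[ctx + (word,)] / denom
    return p

tokens = "the house is small the house is big".split()
counts = train_counts(tokens)
print(interp_prob("is", ["the", "house"], counts))
```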

48. First Sketch

[Figure: feed-forward network predicting Word 5 from Words 1–4 through a hidden layer]

49. Representing Words

● Words are represented with a one-hot vector, e.g.,
  – dog = (0,0,0,0,1,0,0,0,0,...)
  – cat = (0,0,0,0,0,0,0,1,0,...)
  – eat = (0,1,0,0,0,0,0,0,0,...)
● That's a large vector!

50. Second Sketch

[Figure: network predicting Word 5 from Words 1–4 through a hidden layer]

51. Add a Hidden Layer

[Figure: Words 1–4 are first mapped through a shared matrix C into embeddings, which feed the hidden layer that predicts Word 5]

● Map each word first into a lower-dimensional real-valued space
● Shared weight matrix C

52. Details (Bengio et al., 2003)

● Add direct connections from the embedding layer to the output layer
● Activation functions
  – input → embedding: none
  – embedding → hidden: tanh
  – hidden → output: softmax
● Training
  – loop through the entire corpus
  – update the weights based on the difference between the predicted probabilities and the 1-hot vector of the output word
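A minimal numpy sketch of the forward pass of such a network (embedding lookup, tanh hidden layer, softmax output; the direct embedding-to-output connections and the training loop are omitted, and all sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, h, context = 1000, 30, 50, 4        # vocab size, embedding dim, hidden dim, history

C = rng.normal(0, 0.1, (V, d))            # shared embedding matrix
W_h = rng.normal(0, 0.1, (context * d, h))
W_o = rng.normal(0, 0.1, (h, V))

def predict(history_ids):
    """p(next word | 4 previous word ids) for a feed-forward neural LM."""
    x = np.concatenate([C[i] for i in history_ids])   # lookup = one-hot vector times C
    hidden = np.tanh(x @ W_h)
    logits = hidden @ W_o
    exp = np.exp(logits - logits.max())               # softmax over the vocabulary
    return exp / exp.sum()

p = predict([12, 7, 431, 3])
print(p.shape, p.sum())                               # (1000,) 1.0
```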

53. Word Embeddings

[Figure: a word's one-hot vector multiplied by C gives its word embedding]

● By-product: embedding of each word into a continuous space
● Similar contexts → similar embeddings
● Recall: distributional semantics

54. Word Embeddings

[Figure: visualization of learned word embeddings]

55. Word Embeddings

[Figure: another visualization of learned word embeddings]

56. Are Word Embeddings Magic?

● Morphosyntactic regularities (Mikolov et al., 2013)
  – adjectives: base form vs. comparative, e.g., good / better
  – nouns: singular vs. plural, e.g., year / years
  – verbs: present tense vs. past tense, e.g., see / saw
● Semantic regularities
  – clothing is to shirt as dish is to bowl
  – evaluated on human judgment data of semantic similarities
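These regularities are usually probed with vector arithmetic over the embeddings (Mikolov et al., 2013): answer "a is to b as c is to ?" with the nearest neighbour of b - a + c. A schematic helper (it needs real trained embeddings to give meaningful answers):

```python
import numpy as np

def analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' by nearest cosine neighbour of b - a + c."""
    target = embeddings[b] - embeddings[a] + embeddings[c]
    best, best_sim = None, -1.0
    for word, vec in embeddings.items():
        if word in (a, b, c):
            continue
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# With good trained embeddings one would hope for:
# analogy("good", "better", "small", emb) == "smaller"
```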

57. Recurrent Neural Networks

58. Recurrent Neural Networks

[Figure: Word 1 → embedding E → hidden layer H → predicted Word 2; H also receives a layer whose nodes are all set to 1]

● Start: predict the second word from the first
● Mystery layer with nodes all with value 1

59. Recurrent Neural Networks

[Figure: the hidden layer values are copied forward: H from the first step feeds, together with Word 2, the prediction of Word 3]

60. Recurrent Neural Networks

[Figure: the unrolling continues: each step copies the previous hidden state H and combines it with the current word to predict the next word (Word 3 → Word 4, ...)]

61. Training

[Figure: the first unrolled step, Word 1 → E → H → Word 2]

● Process the first training example
● Update the weights with back-propagation
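A numpy sketch of the recurrence these figures unroll (one Elman-style step per word; training would backpropagate through these steps, which is left out here, and all dimensions are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, h = 1000, 30, 50                     # vocab, embedding, hidden sizes

E = rng.normal(0, 0.1, (V, d))             # embedding matrix
W_in = rng.normal(0, 0.1, (d, h))          # input -> hidden
W_rec = rng.normal(0, 0.1, (h, h))         # previous hidden -> hidden ("copy values" arc)
W_out = rng.normal(0, 0.1, (h, V))         # hidden -> output

def step(word_id, h_prev):
    """One RNN step: consume a word, update the hidden state, predict the next word."""
    hidden = np.tanh(E[word_id] @ W_in + h_prev @ W_rec)
    logits = hidden @ W_out
    probs = np.exp(logits - logits.max())
    return hidden, probs / probs.sum()

h_state = np.ones(h)                       # the "mystery layer with nodes all set to 1"
for w in [5, 42, 7]:                       # word ids of the sentence so far
    h_state, p_next = step(w, h_state)     # p_next: distribution over the next word
print(p_next.argmax())
```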

62. Neural Translation Model

63. Feed Forward Neural Language Model

[Figure: the feed-forward architecture from before: Words 1–4 mapped through C, a hidden layer, predicting Word 5]

64. Recurrent Neural Language Model

      Given word:      <s>
      Predicted word:  the

● Predict the first word of a sentence
● Embedding and hidden state: same as before, just drawn top-down

65. Recurrent Neural Language Model

      Given words:      <s> the
      Predicted words:  the house

● Predict the second word of a sentence
● Re-use the hidden state from the first word prediction

66. Recurrent Neural Language Model

      Given words:      <s> the house
      Predicted words:  the house is

● Predict the third word of a sentence
● ... and so on

67. Recurrent Neural Language Model

      Given words:      <s> the house is big .
      Predicted words:  the house is big . </s>

68. Recurrent Neural Translation Model

● We predicted the words of a sentence
● Why not also predict their translations?
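Taken literally, this leads to a sequence-to-sequence setup: run one recurrent network over the source sentence and let its final hidden state initialize a second recurrent network that predicts the translation word by word. A schematic sketch of that idea (my extrapolation of the slide, not code from the lecture; all sizes and word ids are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
V_src, V_tgt, d, h = 1000, 1200, 30, 50

def make_rnn(vocab):
    return {"E": rng.normal(0, 0.1, (vocab, d)),
            "W_in": rng.normal(0, 0.1, (d, h)),
            "W_rec": rng.normal(0, 0.1, (h, h)),
            "W_out": rng.normal(0, 0.1, (h, vocab))}

def rnn_step(p, word_id, h_prev):
    hidden = np.tanh(p["E"][word_id] @ p["W_in"] + h_prev @ p["W_rec"])
    logits = hidden @ p["W_out"]
    probs = np.exp(logits - logits.max())
    return hidden, probs / probs.sum()

encoder, decoder = make_rnn(V_src), make_rnn(V_tgt)

# Encode the source sentence into a single hidden state ...
h_state = np.ones(h)
for w in [4, 17, 256]:                         # source word ids
    h_state, _ = rnn_step(encoder, w, h_state)

# ... then decode greedily, feeding back each predicted target word.
word, translation = 0, []                      # assume id 0 is the <s> symbol
for _ in range(5):
    h_state, p = rnn_step(decoder, word, h_state)
    word = int(p.argmax())
    translation.append(word)
print(translation)
```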
