
Chapter 11: Tree-Based Models, Statistical Machine Translation (PowerPoint presentation)

Tree-Based Models
 • Traditional statistical models operate on sequences of words
 • Many translation problems can best be explained by pointing to syntax
   – reordering, e.g., verb ...


  1. Insertion Rule
     [Figure: English parse tree (S, VP, VP, VP, PP, NP over PRP MD VB VBG RP TO PRP DT NNS) for "I shall be passing on to you some comments", word-aligned to the German "Ich werde Ihnen die entsprechenden Anmerkungen aushändigen"]
     Extracted rule: PP → X | to PRP

  2. Non-Lexical Rule
     [Figure: the same word-aligned tree pair]
     Extracted rule: NP → X1 X2 | DT1 NNS2

  3. Lexical Rule with Syntactic Context
     [Figure: the same word-aligned tree pair]
     Extracted rule: VP → X1 X2 aushändigen | passing on PP1 NP2

  4. Lexical Rule with Syntactic Context
     [Figure: the same word-aligned tree pair]
     Extracted rule: VP → werde X | shall be VP (ignoring internal structure)

  5. Non-Lexical Rule
     [Figure: the same word-aligned tree pair]
     Extracted rule: S → X1 X2 | PRP1 VP2
     Done. Note: one rule per alignable constituent.
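Which constituents are "alignable" can be checked mechanically. The sketch below assumes the usual consistency criterion: an English span is alignable if it has at least one alignment link and no German word inside its projected German span is linked to an English word outside it. The alignment links are a hypothetical reading of the slide figure (in particular, "be" is assumed to link to "werde" and "to" is left unaligned), and `german_span` is an illustrative name, not an API from the source.

```python
# English and German sides of the example; positions are 0-based.
ENGLISH = "I shall be passing on to you some comments".split()
GERMAN = "Ich werde Ihnen die entsprechenden Anmerkungen aushändigen".split()

# ASSUMPTION: hypothetical alignment links (english_index, german_index).
ALIGNMENT = {(0, 0), (1, 1), (2, 1), (3, 6), (4, 6), (6, 2), (7, 3), (8, 4), (8, 5)}

# Constituents of the English tree as (label, first word, last word), inclusive.
CONSTITUENTS = [
    ("S", 0, 8), ("PRP", 0, 0), ("VP", 1, 8), ("MD", 1, 1), ("VP", 2, 8),
    ("VB", 2, 2), ("VP", 3, 8), ("VBG", 3, 3), ("RP", 4, 4), ("PP", 5, 6),
    ("TO", 5, 5), ("PRP", 6, 6), ("NP", 7, 8), ("DT", 7, 7), ("NNS", 8, 8),
]


def german_span(e_start, e_end, alignment):
    """Return the inclusive German span aligned to English words e_start..e_end,
    or None if that English span is not alignable."""
    linked = [f for e, f in alignment if e_start <= e <= e_end]
    if not linked:  # unaligned words alone cannot anchor a rule
        return None
    f_start, f_end = min(linked), max(linked)
    for e, f in alignment:
        # a German word inside the projected span may not link outside the English span
        if f_start <= f <= f_end and not e_start <= e <= e_end:
            return None
    return f_start, f_end


for label, s, e in CONSTITUENTS:
    span = german_span(s, e, ALIGNMENT)
    if span is not None:
        print(label, " ".join(ENGLISH[s:e + 1]), "<->", " ".join(GERMAN[span[0]:span[1] + 1]))
```

Under these assumptions VBG ("passing") and RP ("on") are not alignable on their own, because both link to "aushändigen"; this is why "passing on" appears as words inside the VP rule of slide 3 rather than as its own rule.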

  6. Unaligned Source Words
     [Figure: the same word-aligned tree pair]
     Attach unaligned words to neighboring words or to higher nodes → additional rules

  7. Too Few Phrasal Rules?
     • Lexical rules will be 1-to-1 mappings (unless the word alignment requires otherwise)
     • But: phrasal rules are very beneficial in phrase-based models
     • Solutions
       – combine rules that contain a maximum number of symbols (as in hierarchical models, recall "Option 1")
       – compose minimal rules to cover a maximum number of non-leaf nodes

  8. Composed Rules
     • Current rules
       – NP → X1 X2 | DT1 NNS2
       – DT → die | some
       – NNS → entsprechenden Anmerkungen | comments
     • Composed rule
       – NP → die entsprechenden Anmerkungen | some comments (1 non-leaf node: NP)

  9. Composed Rules
     • Minimal rule: VP → X1 X2 aushändigen | passing on PP1 NP2 (3 non-leaf nodes: VP, PP, NP)
     • Composed rule: VP → Ihnen X1 aushändigen | passing on to you NP1 (3 non-leaf nodes: VP, PP and NP)
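Composition substitutes one rule into a linked non-terminal slot of another, on both language sides at once. The sketch below is a hypothetical representation (slots are 1-based integers shared by both sides, `labels` gives each slot's constituent label, and `compose` is an illustrative name); after substitution, the remaining slots are renumbered in English order.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Token = Union[str, int]  # str = word, int = linked non-terminal slot (1-based)


@dataclass(frozen=True)
class Rule:
    lhs: str
    src: Tuple[Token, ...]   # German side
    tgt: Tuple[Token, ...]   # English side
    labels: Tuple[str, ...]  # labels[i - 1] is the constituent label of slot i


def compose(parent: Rule, slot: int, child: Rule) -> Rule:
    """Substitute `child` into non-terminal `slot` of `parent` on both sides."""
    if parent.labels[slot - 1] != child.lhs:
        raise ValueError("child LHS must match the label of the slot it fills")

    def expand(parent_side, child_side):
        out = []
        for tok in parent_side:
            if tok == slot:
                # the child's own slots get temporary keys so they cannot clash with the parent's
                out.extend(("c", t) if isinstance(t, int) else t for t in child_side)
            elif isinstance(tok, int):
                out.append(("p", tok))
            else:
                out.append(tok)
        return out

    src, tgt = expand(parent.src, child.src), expand(parent.tgt, child.tgt)
    # renumber the remaining slots 1..k in English (target-side) order
    order = [t for t in tgt if isinstance(t, tuple)]
    new_index = {key: i + 1 for i, key in enumerate(order)}
    labels = tuple(parent.labels[k - 1] if who == "p" else child.labels[k - 1] for who, k in order)

    def renumber(side):
        return tuple(new_index[t] if isinstance(t, tuple) else t for t in side)

    return Rule(parent.lhs, renumber(src), renumber(tgt), labels)


# Minimal rules from the slides, in this hypothetical encoding
VP_RULE = Rule("VP", (1, 2, "aushändigen"), ("passing", "on", 1, 2), ("PP", "NP"))
PP_RULE = Rule("PP", (1,), ("to", 1), ("PRP",))
PRP_RULE = Rule("PRP", ("Ihnen",), ("you",), ())

composed = compose(compose(VP_RULE, 1, PP_RULE), 1, PRP_RULE)
print(composed)
```

Composing the VP rule with the PP and PRP rules yields VP → Ihnen X1 aushändigen | passing on to you NP1, the composed rule on the slide.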

  10. Relaxing Tree Constraints
     • Impossible rule: ? → werde | shall be, since "shall be" (MD VB) is not a single constituent
     • Create a new non-terminal label: MD+VB
     ⇒ New rule: MD+VB → werde | shall be (internal structure: MD VB)

  11. Zollmann–Venugopal Relaxation
     • If a span consists of two constituents, join them: X+Y
     • If a span consists of three constituents, join them: X+Y+Z
     • If a span covers constituents with the same parent X and includes
       – every child but the first child Y, label it X\Y
       – every child but the last child Y, label it X/Y
     • For all other cases, label it FAIL
     ⇒ More rules can be extracted, but the number of non-terminals blows up
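The labeling scheme can be sketched as a function from a tree and a span to a label. Assumptions: trees are nested `(label, child, ...)` tuples with words as strings, spans are end-exclusive word positions, and the cases are tried in the order of the bullets above (the precedence used in Zollmann and Venugopal's own system may differ). `samt_label` is a hypothetical name.

```python
def constituents(tree):
    """Return (start, end, label, kids) for every node; end is exclusive and
    kids lists (start, end, label) of the node's children."""
    nodes = []

    def walk(node, start):
        if isinstance(node, str):  # a word covers one position
            return start + 1
        label, *kids = node
        pos, kid_spans = start, []
        for kid in kids:
            end = walk(kid, pos)
            kid_spans.append((pos, end, kid if isinstance(kid, str) else kid[0]))
            pos = end
        nodes.append((start, pos, label, kid_spans))
        return pos

    walk(tree, 0)
    return nodes


def samt_label(tree, start, end):
    nodes = constituents(tree)
    label_of = {}
    for s, e, label, _ in nodes:  # children are listed before parents,
        label_of[(s, e)] = label  # so the topmost label of a unary chain wins
    if (start, end) in label_of:  # an ordinary constituent
        return label_of[(start, end)]
    for m in range(start + 1, end):  # two constituents: X+Y
        if (start, m) in label_of and (m, end) in label_of:
            return label_of[(start, m)] + "+" + label_of[(m, end)]
    for m1 in range(start + 1, end):  # three constituents: X+Y+Z
        for m2 in range(m1 + 1, end):
            parts = [(start, m1), (m1, m2), (m2, end)]
            if all(p in label_of for p in parts):
                return "+".join(label_of[p] for p in parts)
    for s, e, label, kids in nodes:  # same parent X, missing its first or last child Y
        if len(kids) >= 2 and (kids[0][1], e) == (start, end):
            return label + "\\" + kids[0][2]
        if len(kids) >= 2 and (s, kids[-1][0]) == (start, end):
            return label + "/" + kids[-1][2]
    return "FAIL"


SENT = ("S", ("PRP", "I"),
        ("VP", ("MD", "shall"),
         ("VP", ("VB", "be"),
          ("VP", ("VBG", "passing"), ("RP", "on"),
           ("PP", ("TO", "to"), ("PRP", "you")),
           ("NP", ("DT", "some"), ("NNS", "comments"))))))

print(samt_label(SENT, 1, 3))  # the span "shall be"
```

On the earlier example, the span "shall be" receives the label MD+VB from the previous slide.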

  12. Special Problem: Flat Structures
     • Flat structures severely limit rule extraction:
       (NP (DT the) (NNP Israeli) (NNP Prime) (NNP Minister) (NNP Sharon))
     • Can only extract rules for individual words or the entire phrase

  13. Relaxation by Tree Binarization
     (NP (DT the) (NP (NNP Israeli) (NP (NNP Prime) (NP (NNP Minister) (NNP Sharon)))))
     More rules can be extracted
     Left-binarization or right-binarization?
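Both binarization directions can be sketched in a few lines. Assumptions: trees are nested `(label, child, ...)` tuples with words as strings, and, as in the slide's example, each newly introduced node reuses the parent's label.

```python
def right_binarize(tree):
    """Split every node with more than two children into a right-branching chain."""
    if isinstance(tree, str):
        return tree
    label, *kids = tree
    kids = [right_binarize(k) for k in kids]
    while len(kids) > 2:
        kids = kids[:-2] + [(label, kids[-2], kids[-1])]  # group the last two children
    return (label, *kids)


def left_binarize(tree):
    """Split every node with more than two children into a left-branching chain."""
    if isinstance(tree, str):
        return tree
    label, *kids = tree
    kids = [left_binarize(k) for k in kids]
    while len(kids) > 2:
        kids = [(label, kids[0], kids[1])] + kids[2:]  # group the first two children
    return (label, *kids)


FLAT = ("NP", ("DT", "the"), ("NNP", "Israeli"), ("NNP", "Prime"),
        ("NNP", "Minister"), ("NNP", "Sharon"))

print(right_binarize(FLAT))
print(left_binarize(FLAT))
```

Right-binarization reproduces the tree on the slide and makes spans such as "Prime Minister Sharon" into constituents; left-binarization instead creates constituents such as "the Israeli Prime". Which choice extracts more useful rules depends on the language pair, which is the open question on the slide.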

  14. Scoring Translation Rules
     • Extract all rules from the corpus
     • Score based on counts
       – joint rule probability: $p(\mathit{lhs}, \mathit{rhs}_f, \mathit{rhs}_e)$
       – rule application probability: $p(\mathit{rhs}_f, \mathit{rhs}_e \mid \mathit{lhs})$
       – direct translation probability: $p(\mathit{rhs}_e \mid \mathit{rhs}_f, \mathit{lhs})$
       – noisy channel translation probability: $p(\mathit{rhs}_f \mid \mathit{rhs}_e, \mathit{lhs})$
       – lexical translation probability: $\prod_{e_i \in \mathit{rhs}_e} p(e_i \mid \mathit{rhs}_f, a)$
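"Score based on counts" can be read as relative-frequency estimation over the extracted rule instances. The sketch below assumes exactly that and covers the first four probabilities; the lexical translation probability is omitted because it also needs the word alignment $a$ inside each rule. The rule instances and counts are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical extracted rule instances (lhs, rhs_f, rhs_e), one entry per extraction.
EXTRACTED = [
    ("NP", "X1 X2", "DT1 NNS2"),
    ("NP", "X1 X2", "DT1 NNS2"),
    ("NP", "die X1", "the NNS1"),
    ("NP", "der X1", "the NNS1"),
    ("PP", "X1", "to PRP1"),
]


def score_rules(instances):
    """Relative-frequency estimates of four of the slide's rule probabilities."""
    joint = Counter(instances)
    by_lhs = Counter(lhs for lhs, _, _ in instances)
    by_lhs_f = Counter((lhs, f) for lhs, f, _ in instances)
    by_lhs_e = Counter((lhs, e) for lhs, _, e in instances)
    total = len(instances)
    return {
        (lhs, f, e): {
            "joint": Fraction(c, total),                       # p(lhs, rhs_f, rhs_e)
            "application": Fraction(c, by_lhs[lhs]),           # p(rhs_f, rhs_e | lhs)
            "direct": Fraction(c, by_lhs_f[(lhs, f)]),         # p(rhs_e | rhs_f, lhs)
            "noisy_channel": Fraction(c, by_lhs_e[(lhs, e)]),  # p(rhs_f | rhs_e, lhs)
        }
        for (lhs, f, e), c in joint.items()
    }


for rule, probs in score_rules(EXTRACTED).items():
    print(rule, {name: str(p) for name, p in probs.items()})
```

`Fraction` keeps the estimates exact for inspection; a real system would store log-probabilities as feature values.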

  15. Syntactic Decoding
     Inspired by monolingual syntactic chart parsing: during decoding of the source sentence, a chart with translations for the $O(n^2)$ spans has to be filled
     [Figure: German input "Sie will eine Tasse Kaffee trinken" (PPER VAFIN ART NN NN VVINF) with its parse tree (NP, VP, S)]

  16. Syntax Decoding
     [Figure: German input sentence "Sie will eine Tasse Kaffee trinken" with its parse tree]
     German input sentence with tree

  17. Syntax Decoding
     [Figure: step ➊ PRO: she, over "Sie"]
     Purely lexical rule: filling a span with a translation (a constituent in the chart)

  18. Syntax Decoding
     [Figure: step ➋ NN: coffee, over "Kaffee"]
     Purely lexical rule: filling a span with a translation (a constituent in the chart)

  19. Syntax Decoding
     [Figure: step ➌ VB: drink, over "trinken"]
     Purely lexical rule: filling a span with a translation (a constituent in the chart)

  20. Syntax Decoding
     [Figure: step ➍ NP → NP(DET a, NN cup) PP(IN of, NN), built over "eine Tasse Kaffee" on top of NN: coffee]
     Complex rule: matching underlying constituent spans, and covering words

  21. Syntax Decoding
     [Figure: step ➎ VP → VBZ(wants) VP(TO(to) VB NP), built over "will eine Tasse Kaffee trinken" on top of NP ➍ and VB ➌]
     Complex rule with reordering

  22. Syntax Decoding
     [Figure: step ➏ S → PRO VP, covering the whole sentence by combining PRO ➊ and VP ➎]

  23. Bottom-Up Decoding
     • For each span, a stack of (partial) translations is maintained
     • Bottom-up: a higher stack is filled once the underlying stacks are complete

  24. Naive Algorithm
     Input: foreign sentence f = f_1, ..., f_{l_f}, with syntax tree
     Output: English translation e
     for all spans [start, end] (bottom up) do
       for all sequences s of hypotheses and words in span [start, end] do
         for all rules r do
           if rule r applies to chart sequence s then
             create new hypothesis c
             add hypothesis c to chart
           end if
         end for
       end for
     end for
     return English translation e from best hypothesis in span [0, l_f]
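The naive algorithm can be run on the walk-through example as a toy. In this sketch the rules are hypothetical ones written to mirror steps ➊ to ➏, every rule cost is made up (1.0), and there is no language model, so keeping only the cheapest hypothesis per (span, label) is safe. `matches` enumerates the ways a rule's source side (words and non-terminals) can cover a span, which is the rule-by-rule checking the following slides call unworkable at scale.

```python
from typing import NamedTuple, Tuple, Union


class NT(NamedTuple):
    """Source-side non-terminal: matched by a chart hypothesis with this label."""
    label: str


class Rule(NamedTuple):
    lhs: str                          # output constituent label
    src: Tuple[Union[str, NT], ...]   # German words and non-terminals
    tgt: Tuple[Union[str, int], ...]  # English words; int i = output of the i-th NT in src
    cost: float


class Hyp(NamedTuple):
    cost: float
    words: Tuple[str, ...]


def matches(src, start, end, words, chart):
    """Yield one list of child hypotheses (one per NT) for each way src covers words[start:end]."""
    if not src:
        if start == end:
            yield []
        return
    head, rest = src[0], src[1:]
    if isinstance(head, str):  # a word must equal the next input word
        if start < end and words[start] == head:
            yield from matches(rest, start + 1, end, words, chart)
        return
    for mid in range(start + 1, end + 1):  # an NT needs a hypothesis over [start, mid)
        hyp = chart.get((start, mid, head.label))
        if hyp is not None:
            for tail in matches(rest, mid, end, words, chart):
                yield [hyp] + tail


def decode(words, rules):
    n = len(words)
    chart = {}  # (start, end, label) -> cheapest hypothesis
    for length in range(1, n + 1):  # bottom up: every shorter span is already complete
        for start in range(n - length + 1):
            end = start + length
            for rule in rules:  # naive: try every rule on every span
                for children in list(matches(rule.src, start, end, words, chart)):
                    out = []
                    for tok in rule.tgt:
                        out.extend(children[tok].words if isinstance(tok, int) else [tok])
                    hyp = Hyp(rule.cost + sum(c.cost for c in children), tuple(out))
                    key = (start, end, rule.lhs)
                    # No language model here, so same span + same label can be recombined.
                    if key not in chart or hyp.cost < chart[key].cost:
                        chart[key] = hyp
    return chart.get((0, n, "S"))


RULES = [  # hypothetical rules mirroring steps 1-6 of the walk-through; costs are made up
    Rule("PRO", ("Sie",), ("she",), 1.0),
    Rule("NN", ("Kaffee",), ("coffee",), 1.0),
    Rule("VB", ("trinken",), ("drink",), 1.0),
    Rule("NP", ("eine", "Tasse", NT("NN")), ("a", "cup", "of", 0), 1.0),
    Rule("VP", ("will", NT("NP"), NT("VB")), ("wants", "to", 1, 0), 1.0),
    Rule("S", (NT("PRO"), NT("VP")), (0, 1), 1.0),
]

best = decode("Sie will eine Tasse Kaffee trinken".split(), RULES)
print(" ".join(best.words), best.cost)
```

The VP rule's target side `("wants", "to", 1, 0)` places the VB translation before the NP translation, which is the reordering of step ➎.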

  25. Chart Organization
     [Figure: chart over "Sie will eine Tasse Kaffee trinken" (PPER VAFIN ART NN NN VVINF; NP, VP, S)]
     • The chart consists of cells that cover contiguous spans over the input sentence
     • Each cell contains a set of hypotheses¹
     • Hypothesis = translation of a span with a target-side constituent
     ¹ In the book, they are called chart entries.

  26. Dynamic Programming
     Applying a rule creates a new hypothesis
     [Figure: input "eine Tasse Kaffee trinken" (ART NN NN VVINF) with hypotheses NP+P: a cup of (over "eine Tasse") and NP: coffee (over "Kaffee")]
     Apply rule NP → NP+P Kaffee ; NP → NP+P coffee, creating NP: a cup of coffee

  27. Dynamic Programming
     Another hypothesis: apply rule NP → eine Tasse NP ; NP → a cup of NP to NP: coffee, creating a second NP: a cup of coffee
     [Figure: the same input and hypotheses, now with two NP: a cup of coffee hypotheses over "eine Tasse Kaffee"]
     Both hypotheses are indistinguishable in future search → they can be recombined

  28. Recombinable States
     Recombinable?
     • NP: a cup of coffee
     • NP: a cup of coffee
     • NP: a mug of coffee

  29. Recombinable States
     Recombinable?
     • NP: a cup of coffee
     • NP: a cup of coffee
     • NP: a mug of coffee
     Yes, iff at most a 2-gram language model is used

  30. Recombinability
     Hypotheses have to match in
     • the span of input words covered
     • the output constituent label
     • the first n−1 output words
       – not properly scored, since they lack context
     • the last n−1 output words
       – still affect the scoring of subsequently added words, just like in phrase-based decoding
     (n is the order of the n-gram language model)
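The four conditions amount to a recombination key: two hypotheses can be merged exactly when their keys are equal. A minimal sketch; `recombination_key` is a hypothetical name, spans are end-exclusive, and the span (2, 5) is an illustrative position for "eine Tasse Kaffee".

```python
def recombination_key(start, end, label, words, n):
    """State that must match for two hypotheses to be recombined under an n-gram LM."""
    k = n - 1
    first = tuple(words[:k])                   # still lack left context
    last = tuple(words[-k:]) if k > 0 else ()  # context for words added later
    return (start, end, label, first, last)


cup = "a cup of coffee".split()
mug = "a mug of coffee".split()
print(recombination_key(2, 5, "NP", cup, 2) == recombination_key(2, 5, "NP", mug, 2))
print(recombination_key(2, 5, "NP", cup, 3) == recombination_key(2, 5, "NP", mug, 3))
```

With a bigram model both keys are (2, 5, "NP", ("a",), ("coffee",)), so the hypotheses recombine; with a trigram model the first words ("a", "cup") and ("a", "mug") differ, which matches the answer on slide 29.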

  31. Language Model Contexts
     When merging hypotheses, internal language model contexts are absorbed
     [Figure: an S hypothesis "the foreign ... ... in Frankfurt" (internal words: minister of Germany met with Condoleezza Rice) is built from an NP "the foreign ... ... of Germany" (internal: minister) and a VP "met with ... ... in Frankfurt" (internal: Condoleezza Rice); the last words of the NP are the relevant history for the VP's un-scored initial words]
     New scores when merging: $p_{LM}(\text{met} \mid \text{of Germany})$, $p_{LM}(\text{with} \mid \text{Germany met})$
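Which n-grams become scorable when two hypotheses are joined follows from their stored boundary words alone. The sketch assumes each hypothesis keeps its first and last n−1 output words and that the left hypothesis has at least n−1 words; `newly_scored` is a hypothetical name.

```python
def newly_scored(left_last, right_first, n):
    """Return (word, history) pairs that become scorable when a hypothesis ending in
    `left_last` is joined with one starting with `right_first` (both n-1 words)."""
    context = list(left_last) + list(right_first)
    pairs = []
    for i in range(len(left_last), len(context)):
        # the previously un-scored word now has a full (n-1)-word history
        history = tuple(context[max(0, i - (n - 1)):i])
        pairs.append((context[i], history))
    return pairs


for word, history in newly_scored(("of", "Germany"), ("met", "with"), 3):
    print(f"p_LM({word} | {' '.join(history)})")
```

For the trigram example on the slide this yields exactly p_LM(met | of Germany) and p_LM(with | Germany met).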

  32. Stack Pruning
     • The number of hypotheses in each chart cell explodes
       ⇒ need to discard bad hypotheses, e.g., keep the 100 best only
     • Different stacks for different output constituent labels?
     • Cost estimates
       – translation model cost known
       – language model cost for internal words known → estimates for initial words
       – outside cost estimate? (how useful will an NP covering input words 3–5 be later on?)
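Histogram pruning of one chart cell can be sketched as follows, assuming costs are negative log-probabilities (lower is better) and each hypothesis carries its translation model cost plus a language model cost or estimate. `prune_cell` and the `per_label` switch are hypothetical names; the switch illustrates the "different stacks per label" option, and no outside cost estimate is used.

```python
import heapq
from collections import defaultdict
from typing import NamedTuple


class Hyp(NamedTuple):
    label: str          # output constituent label
    words: str          # English output
    tm_cost: float      # translation model cost (known)
    lm_estimate: float  # LM cost of internal words plus an estimate for the initial words


def prune_cell(hyps, beam=100, per_label=False):
    """Keep the `beam` cheapest hypotheses of one chart cell."""
    def cost(h):
        return h.tm_cost + h.lm_estimate

    if not per_label:
        return heapq.nsmallest(beam, hyps, key=cost)
    stacks = defaultdict(list)  # one stack per output constituent label
    for h in hyps:
        stacks[h.label].append(h)
    return [h for label in sorted(stacks) for h in heapq.nsmallest(beam, stacks[label], key=cost)]


CELL = [  # hypothetical hypotheses for one span
    Hyp("NP", "a cup of coffee", 2.0, 3.0),
    Hyp("NP", "a mug of coffee", 2.5, 3.5),
    Hyp("NN", "cup of coffee", 4.0, 4.0),
]
print(prune_cell(CELL, beam=1))
print(prune_cell(CELL, beam=1, per_label=True))
```

A single stack lets the cheaper NP hypotheses push out the NN hypothesis entirely; per-label stacks keep the best hypothesis of each label, which later rules expecting an NN may need.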

  33. Naive Algorithm: Blow-ups
     • Many sub-span sequences: "for all sequences s of hypotheses and words in span [start, end]"
     • Many rules: "for all rules r"
     • Checking if a rule applies is not trivial: "rule r applies to chart sequence s"
     ⇒ Unworkable

  34. Solution
     • Prefix tree data structure for rules
     • Dotted rules
     • Cube pruning

  35. Storing Rules
     • First concern: do they apply to a span?
       → have to match available hypotheses and input words
     • Example rule: NP → X1 des X2 | NP1 of the NN2
     • Check for applicability
       – is there an initial sub-span with a hypothesis with constituent label NP?
       – is it followed by a sub-span over the word des?
       – is it followed by a final sub-span with a hypothesis with label NN?
     • Sequence of relevant information: NP • des • NN • NP1 of the NN2

  36. Rule Applicability Check
     Trying to cover a span of six words with the given rule NP • des • NN → NP: NP of the NN
     [Figure: input "das Haus des Architekten Frank Gehry"]

  37. Rule Applicability Check
     First: check for hypotheses with output constituent label NP
     [Figure: rule NP • des • NN → NP: NP of the NN over the input "das Haus des Architekten Frank Gehry"]

  38. Rule Applicability Check
     Found an NP hypothesis in a cell, matched the first symbol of the rule
     [Figure: the same rule and input; NP marked over "das Haus"]

  39. Rule Applicability Check
     Matched the word des, matched the second symbol of the rule
     [Figure: the same rule and input; NP over "das Haus", followed by "des"]

  40. Rule Applicability Check
     Found an NN hypothesis in a cell, matched the last symbol of the rule
     [Figure: the same rule and input; NP over "das Haus", NN over "Architekten Frank Gehry"]

  41. Rule Applicability Check
     Matched the entire rule → apply it to create an NP hypothesis
     [Figure: the same rule and input; new NP over the whole span]

  42. Rule Applicability Check
     Look up output words to create the new hypothesis (note: there may be many matching underlying NP and NN hypotheses)
     [Figure: NP: the house (over "das Haus") and NN: architect Frank Gehry (over "Architekten Frank Gehry") yield NP: the house of the architect Frank Gehry]
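Once a rule is known to apply, its output words are built from every combination of matching child hypotheses, so the number of new hypotheses is the product of the children's counts. A minimal sketch with a hypothetical `apply_rule` helper; the target side is a template whose integers refer to the rule's non-terminals in order.

```python
from itertools import product


def apply_rule(target, children):
    """Instantiate a rule's target side for every combination of child hypotheses.

    target:   English template; int i refers to the i-th non-terminal
    children: one list of candidate output strings per non-terminal
    """
    results = []
    for combo in product(*children):  # every NP choice with every NN choice
        out = [combo[t] if isinstance(t, int) else t for t in target]
        results.append(" ".join(out))
    return results


# NP • des • NN → NP: NP of the NN, with hypothetical underlying hypotheses
print(apply_rule([0, "of", "the", 1], [["the house", "that house"], ["architect Frank Gehry"]]))
```

With two NP hypotheses and one NN hypothesis this gives two new NP hypotheses; with realistic cell sizes the product grows quickly, which is the kind of blow-up cube pruning is meant to limit.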

  43. Checking Rules vs. Finding Rules
     • What we showed:
       – given a rule
       – check if and how it can be applied
     • But there are too many rules (millions) to check them all
     • Instead:
       – given the underlying chart cells and input words
       – find which rules apply

  44. Prefix Tree for Rules
     [Figure: prefix tree over rule source sides; paths such as NP → DET → NN, NP → des → NN, DET → NN and das → Haus lead to nodes that store the translations of the rules ending there]
     Highlighted rules:
       – NP → NP1 DET2 NN3 | NP1 IN2 NN3
       – NP → NP1 | NP1
       – NP → NP1 des NN2 | NP1 of the NN2
       – NP → NP1 des NN2 | NP2 NP1
       – NP → DET1 NN2 | DET1 NN2
       – NP → das Haus | the house
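A prefix tree (trie) keyed by source-side symbols lets rules that share a prefix share nodes, and stores translations only at the node where a rule's source side ends. A minimal sketch using some of the highlighted rules; `TrieNode`, `insert` and `lookup` are hypothetical names.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # next source symbol (word or label) -> TrieNode
        self.translations = []  # rules whose source side ends at this node


def insert(root, source, translation):
    node = root
    for symbol in source:
        node = node.children.setdefault(symbol, TrieNode())
    node.translations.append(translation)


def lookup(root, symbols):
    """Follow a symbol sequence from the root; return the translations stored there."""
    node = root
    for symbol in symbols:
        node = node.children.get(symbol)
        if node is None:
            return []
    return node.translations


RULES = [  # (source side, "LHS: target side"), from the highlighted rules
    (("NP",), "NP: NP1"),
    (("NP", "des", "NN"), "NP: NP1 of the NN2"),
    (("DET", "NN"), "NP: DET1 NN2"),
    (("das", "Haus"), "NP: the house"),
]

root = TrieNode()
for source, translation in RULES:
    insert(root, source, translation)

print(lookup(root, ("NP", "des", "NN")))
```

The rules NP → NP1 and NP → NP1 des NN2 share the node for NP; the intermediate node for "NP des" exists but stores no translation, so looking it up returns an empty list.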

  45. Dotted Rules: Key Insight
     • If we can apply a rule like p → A B C | x to a span
     • then we could have applied a rule like q → A B | y to a sub-span with the same starting word
     ⇒ We can re-use rule lookup by storing A B • (dotted rule)

  46. Finding Applicable Rules in Prefix Tree
     [Figure: input "das Haus des Architekten Frank Gehry"]

  47. Covering the First Cell
     [Figure: input "das Haus des Architekten Frank Gehry"; first cell over "das"]

  48. Looking up Rules in the Prefix Tree
     [Figure: prefix tree root ● with child das ❶]

  49. Taking Note of the Dotted Rule
     [Figure: dotted rule das ❶ recorded for the cell over "das"]

  50. Checking if Dotted Rule has Translations
     [Figure: node das ❶ holds the translations DET: the and DET: that]

  51. Applying the Translation Rules
     [Figure: the cell over "das" now holds the hypotheses DET: the and DET: that]

  52. Looking up Constituent Label in Prefix Tree
     [Figure: root ● has child DET ❷]

  53. Add to Span's List of Dotted Rules
     [Figure: dotted rules for the cell over "das": das ❶, DET ❷]

  54. Moving on to the Next Cell
     [Figure: next cell over "Haus"]

  55. Looking up Rules in the Prefix Tree
     [Figure: root ● has child Haus ❸]

  56. Taking Note of the Dotted Rule
     [Figure: dotted rule ❸ recorded for the cell over "Haus"]

  57. Checking if Dotted Rule has Translations
     [Figure: node Haus ❸ holds the translations NN: house and NP: house]

  58. Applying the Translation Rules
     [Figure: the cell over "Haus" now holds the hypotheses NN: house and NP: house]

  59. Looking up Constituent Label in Prefix Tree
     [Figure: root ● has children NN ❹ and NP ❺]

  60. Add to Span's List of Dotted Rules
     [Figure: dotted rules for the cell over "Haus": ❸, NN ❹, NP ❺]

  61. More of the Same
     [Figure: the remaining one-word cells: "des" holds IN: of and DET: the (dotted rules des •, DET ❷); "Architekten" holds NP: architect and NN: architect (Architekten •, NN ❹, NP ❺); "Frank" holds NNP: Frank (Frank •, NNP •); "Gehry" holds NNP: Gehry (Gehry •, NNP •)]

  62. Moving on to the Next Cell
     [Figure: all one-word cells filled with hypotheses and dotted rules]

  63. Covering a Longer Span
     • Cannot consume multiple words at once
     • All rules are extensions of existing dotted rules
     • Here: only extensions of the span over das are possible
     [Figure: chart with all one-word cells filled]

  64. Extensions of Span over das
     [Figure: dotted rules das ❶ and DET ❷ are each tried with the symbols of the next cell: NN, NP, Haus?]

  65. Looking up Rules in the Prefix Tree
     [Figure: das ❶ has children Haus ❻ and NN ❼; DET ❷ has children Haus ❽ and NN ❾]

  66. Taking Note of the Dotted Rule
     [Figure: dotted rules for the cell over "das Haus": das Haus ❻, das NN ❼, DET Haus ❽, DET NN ❾]

  67. Checking if Dotted Rules have Translations
     [Figure: ❻ NP: the house; ❼ NP: the NN; ❽ NP: DET house; ❾ NP: DET NN]

  68. Applying the Translation Rules
     [Figure: the cell over "das Haus" now holds NP: the house and NP: that house]

  69. Looking up Constituent Label in Prefix Tree
     [Figure: root ● has child NP ❺]

  70. Add to Span's List of Dotted Rules
     [Figure: dotted rules for the cell over "das Haus": ❻, ❼, ❽, ❾ and NP ❺]
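The walk-through can be condensed into a chart-filling loop over a prefix tree. This is a simplified sketch under several assumptions: the rules are a hypothetical subset of those on the slides, the input is shortened to "das Haus des Architekten", hypotheses are stored as (label, target template) pairs without filling in child translations or scores, and unary rules are not re-applied after a label is looked up. For each span it (1) extends the dotted rules of shorter prefix spans by one symbol (a single word or a hypothesis label), (2) turns dotted rules with translations into hypotheses, and (3) looks up the new labels from the root.

```python
from collections import defaultdict


class TrieNode:
    """Prefix-tree node; edges are source-side symbols (words or constituent labels)."""

    def __init__(self):
        self.children = {}
        self.translations = []  # (lhs, target template) of rules ending here


def build_trie(rules):
    root = TrieNode()
    for lhs, src, tgt in rules:
        node = root
        for sym in src.split():
            node = node.children.setdefault(sym, TrieNode())
        node.translations.append((lhs, tgt))
    return root


def fill_chart(words, rules):
    root = build_trie(rules)
    n = len(words)
    dotted = defaultdict(dict)  # (start, end) -> {consumed symbols: trie node}
    cells = defaultdict(set)    # (start, end) -> {(label, target template)}
    for length in range(1, n + 1):  # bottom up: all sub-spans are finished first
        for start in range(n - length + 1):
            end = start + length
            # 1. extend dotted rules over [start, mid) by ONE symbol covering [mid, end)
            for mid in range(start, end):
                prefixes = {(): root} if mid == start else dotted[(start, mid)]
                symbols = {words[mid]} if end - mid == 1 else set()  # a single word
                if mid > start:
                    symbols |= {label for label, _ in cells[(mid, end)]}  # or a label
                for path, node in list(prefixes.items()):
                    for sym in symbols:
                        if sym in node.children:
                            dotted[(start, end)][path + (sym,)] = node.children[sym]
            # 2. dotted rules that have translations become hypotheses of this cell
            for node in list(dotted[(start, end)].values()):
                cells[(start, end)].update(node.translations)
            # 3. look up the new hypotheses' labels from the root of the prefix tree
            for label, _ in list(cells[(start, end)]):
                if label in root.children:
                    dotted[(start, end)][(label,)] = root.children[label]
    return dotted, cells


RULES = [  # (lhs, source side, target template): hypothetical subset of the slides' rules
    ("DET", "das", "the"), ("DET", "das", "that"),
    ("NN", "Haus", "house"), ("NP", "Haus", "house"),
    ("NP", "das Haus", "the house"), ("NP", "das NN", "the NN"),
    ("NP", "DET Haus", "DET house"), ("NP", "DET NN", "DET NN"),
    ("IN", "des", "of"),
    ("NN", "Architekten", "architect"), ("NP", "Architekten", "architect"),
    ("NP", "NP des NN", "NP of the NN"),
]

dotted, cells = fill_chart("das Haus des Architekten".split(), RULES)
print(sorted(dotted[(0, 2)]))
print(sorted(cells[(0, 2)]))
```

The span over "das Haus" ends up with the dotted rules of slides 66 and 70 and the four NP translations of slide 67; the full four-word span is then reached only by extending NP • des with NN, so NP: NP of the NN is found without checking any rule that cannot apply.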
