Chapter 11: Tree-Based Models

Rule Extraction Example
English (with syntax tree): I shall be passing on to you some comments
  (S (PRP I) (VP (MD shall) (VP (VB be) (VP (VBG passing) (RP on) (PP (TO to) (PRP you)) (NP (DT some) (NNS comments))))))
German: Ich werde Ihnen die entsprechenden Anmerkungen aushändigen
Extracted rules (one rule per alignable constituent):
• Insertion rule: pp → x | to prp
• Non-lexical rule: np → x₁ x₂ | dt₁ nns₂
• Lexical rule with syntactic context: vp → x₁ x₂ aushändigen | passing on pp₁ np₂
• Lexical rule with syntactic context: vp → werde x | shall be vp (ignoring internal structure)
• Non-lexical rule: s → x₁ x₂ | prp₁ vp₂
Unaligned Source Words
• Attach to neighboring words or higher nodes → additional rules
Too Few Phrasal Rules?
• Lexical rules will be 1-to-1 mappings (unless word alignment requires otherwise)
• But: phrasal rules very beneficial in phrase-based models
• Solutions
  – combine rules that contain a maximum number of symbols (as in hierarchical models, recall: "Option 1")
  – compose minimal rules to cover a maximum number of non-leaf nodes
Composed Rules
• Current rules
  – x₁ x₂ = np(dt₁ nns₂)
  – die = dt(some)
  – entsprechenden Anmerkungen = nns(comments)
• Composed rule
  – die entsprechenden Anmerkungen = np(dt(some) nns(comments))
  (1 non-leaf node: np)
Composed Rules
• Minimal rule: x₁ x₂ aushändigen = vp(prp(passing) prp(on) pp₁ np₂)
  3 non-leaf nodes: vp, pp, np
• Composed rule: Ihnen x₁ aushändigen = vp(prp(passing) prp(on) pp(to(to) prp(you)) np₁)
  3 non-leaf nodes: vp, pp and np
Relaxing Tree Constraints
• Impossible rule: werde = x(md(shall) vb(be))
  (no single constituent covers exactly "shall be", so x has no label)
• Create new non-terminal label: md+vb
⇒ New rule: werde = md+vb(md(shall) vb(be))
Zollmann Venugopal Relaxation
• If span consists of two constituents, join them: x+y
• If span consists of three constituents, join them: x+y+z
• If span covers constituents with the same parent x and includes
  – every but the first child y, label as x\y
  – every but the last child y, label as x/y
• For all other cases, label as fail
⇒ More rules can be extracted, but number of non-terminals blows up
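The labeling scheme above can be sketched in a few lines. This is only an illustration under stated assumptions: trees are nested tuples, the helper names `index_tree` and `label_span` are made up for this sketch, the rules are tried in the order the slide lists them (after first checking whether the span is already a constituent), and the slash labels are only tried for parents with more than two children.

```python
def index_tree(tree, start=0, spans=None, families=None):
    """Collect constituent spans and, per node, the spans of its children.

    A tree is (label, child, ...); a child that is a str is a word.
    Returns (end_position, spans, families).
    """
    if spans is None:
        spans, families = {}, []
    label, pos, kids = tree[0], start, []
    for child in tree[1:]:
        if isinstance(child, str):
            pos += 1                       # a word covers one position
        else:
            end, _, _ = index_tree(child, pos, spans, families)
            kids.append((pos, end, child[0]))
            pos = end
    spans.setdefault((start, pos), label)  # on unary chains keep the lower label
    if kids:
        families.append((label, kids))
    return pos, spans, families


def label_span(start, end, spans, families):
    """Label the span [start, end), trying the slide's rules in order."""
    if (start, end) in spans:                     # already a constituent
        return spans[(start, end)]
    for m in range(start + 1, end):               # two constituents: x+y
        if (start, m) in spans and (m, end) in spans:
            return f"{spans[(start, m)]}+{spans[(m, end)]}"
    for m1 in range(start + 1, end):              # three constituents: x+y+z
        for m2 in range(m1 + 1, end):
            parts = [(start, m1), (m1, m2), (m2, end)]
            if all(p in spans for p in parts):
                return "+".join(spans[p] for p in parts)
    for parent, kids in families:                 # x\y and x/y
        if len(kids) > 2:
            if (start, end) == (kids[1][0], kids[-1][1]):
                return f"{parent}\\{kids[0][2]}"  # every child but the first
            if (start, end) == (kids[0][0], kids[-2][1]):
                return f"{parent}/{kids[-1][2]}"  # every child but the last
    return "FAIL"


tree = ("S", ("PRP", "I"),
        ("VP", ("MD", "shall"),
         ("VP", ("VB", "be"),
          ("VP", ("VBG", "passing"), ("RP", "on"),
           ("PP", ("TO", "to"), ("PRP", "you")),
           ("NP", ("DT", "some"), ("NNS", "comments"))))))
_, spans, families = index_tree(tree)
print(label_span(1, 3, spans, families))  # span over "shall be": MD+VB
```

Positions are 0-based and end-exclusive, so `(1, 3)` covers "shall be", the span that needed the new label md+vb on the earlier slide.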
Special Problem: Flat Structures
• Flat structures severely limit rule extraction
  (np (dt the) (nnp Israeli) (nnp Prime) (nnp Minister) (nnp Sharon))
• Can only extract rules for individual words or entire phrase
Relaxation by Tree Binarization
  (np (dt the) (np (nnp Israeli) (np (nnp Prime) (np (nnp Minister) (nnp Sharon)))))
• More rules can be extracted
• Left-binarization or right-binarization?
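A minimal sketch of both binarization directions, assuming trees are nested tuples of the form (label, child, ...) with words as plain strings. The function names are illustrative, and reusing the parent label for the new intermediate nodes follows the binarized tree on this slide; other labeling schemes are possible.

```python
def right_binarize(tree):
    """Fold the two rightmost children together until at most two remain."""
    if isinstance(tree, str):
        return tree
    label = tree[0]
    children = [right_binarize(c) for c in tree[1:]]
    while len(children) > 2:
        children = children[:-2] + [(label, children[-2], children[-1])]
    return (label, *children)


def left_binarize(tree):
    """Fold the two leftmost children together until at most two remain."""
    if isinstance(tree, str):
        return tree
    label = tree[0]
    children = [left_binarize(c) for c in tree[1:]]
    while len(children) > 2:
        children = [(label, children[0], children[1])] + children[2:]
    return (label, *children)


flat = ("NP", ("DT", "the"), ("NNP", "Israeli"), ("NNP", "Prime"),
        ("NNP", "Minister"), ("NNP", "Sharon"))
print(right_binarize(flat))
print(left_binarize(flat))
```

Right-binarization creates constituents such as "Minister Sharon" and "Prime Minister Sharon", while left-binarization creates "the Israeli" and "the Israeli Prime"; which spans become extractable depends on that choice.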
Scoring Translation Rules
• Extract all rules from corpus
• Score based on counts
  – joint rule probability: $p(\text{lhs}, \text{rhs}_f, \text{rhs}_e)$
  – rule application probability: $p(\text{rhs}_f, \text{rhs}_e \mid \text{lhs})$
  – direct translation probability: $p(\text{rhs}_e \mid \text{rhs}_f, \text{lhs})$
  – noisy channel translation probability: $p(\text{rhs}_f \mid \text{rhs}_e, \text{lhs})$
  – lexical translation probability: $\prod_{e_i \in \text{rhs}_e} p(e_i \mid \text{rhs}_f, a)$
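The count-based scores can be estimated by relative frequency. The sketch below assumes each extracted rule is a (lhs, rhs_f, rhs_e) tuple and that no smoothing is applied; the function name and the toy rule list are made up for illustration. The lexical translation probability is left out because it needs word alignments and a lexical table.

```python
from collections import Counter


def score_rules(extracted):
    """Relative-frequency estimates for rules given as (lhs, rhs_f, rhs_e)."""
    total = len(extracted)
    joint = Counter(extracted)
    by_lhs = Counter(lhs for lhs, _, _ in extracted)
    by_lhs_f = Counter((lhs, f) for lhs, f, _ in extracted)
    by_lhs_e = Counter((lhs, e) for lhs, _, e in extracted)
    scores = {}
    for (lhs, f, e), count in joint.items():
        scores[(lhs, f, e)] = {
            "joint": count / total,                       # p(lhs, rhs_f, rhs_e)
            "application": count / by_lhs[lhs],           # p(rhs_f, rhs_e | lhs)
            "direct": count / by_lhs_f[(lhs, f)],         # p(rhs_e | rhs_f, lhs)
            "noisy_channel": count / by_lhs_e[(lhs, e)],  # p(rhs_f | rhs_e, lhs)
        }
    return scores


# Toy counts, invented for this example only.
rules = ([("NP", "das Haus", "the house")] * 3
         + [("NP", "das Haus", "the building")]
         + [("DET", "das", "the")])
for rule, s in score_rules(rules).items():
    print(rule, s)
```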
Syntactic Decoding
Inspired by monolingual syntactic chart parsing: during decoding of the source sentence, a chart with translations for the O(n²) spans has to be filled
Input: Sie will eine Tasse Kaffee trinken
Tags: PPER VAFIN ART NN NN VVINF (constituents: NP, VP, S)
Syntax Decoding
German input sentence with tree: Sie will eine Tasse Kaffee trinken (PPER VAFIN ART NN NN VVINF; NP, VP, S)
[Figure sequence: the chart is filled bottom-up in six steps]
➊ Purely lexical rule: filling a span with a translation (a constituent in the chart): PRO she
➋ Purely lexical rule: NN coffee
➌ Purely lexical rule: VB drink
➍ Complex rule: matching underlying constituent spans, and covering words: NP → NP(DET a, NN cup) PP(IN of, NN ➋)
➎ Complex rule with reordering: VP → VBZ wants, VP(TO to, VB ➌, NP ➍)
➏ S → PRO ➊ VP ➎
Bottom-Up Decoding • For each span, a stack of (partial) translations is maintained • Bottom-up: a higher stack is filled, once underlying stacks are complete Chapter 11: Tree-Based Models 52
Naive Algorithm
Input: Foreign sentence f = f₁, ..., f_{l_f}, with syntax tree
Output: English translation e
for all spans [start,end] (bottom up) do
  for all sequences s of hypotheses and words in span [start,end] do
    for all rules r do
      if rule r applies to chart sequence s then
        create new hypothesis c
        add hypothesis c to chart
      end if
    end for
  end for
end for
return English translation e from best hypothesis in span [0, l_f]
Chart Organization Sie will eine Tasse Kaffee trinken PPER VAFIN ART NN NN VVINF NP VP S • Chart consists of cells that cover contiguous spans over the input sentence • Each cell contains a set of hypotheses 1 • Hypothesis = translation of span with target-side constituent 1 In the book, they are called chart entries. Chapter 11: Tree-Based Models 54
Dynamic Programming Applying rule creates new hypothesis NP: a cup of coffee apply rule: NP → NP Kaffee ; NP → NP+P coffee NP+P: a cup of NP: coffee eine Tasse Kaffee trinken ART NN NN VVINF Chapter 11: Tree-Based Models 55
Dynamic Programming
Another hypothesis: NP: a cup of coffee
apply rule: NP → eine Tasse NP ; NP → a cup of NP
Chart: NP+P: a cup of, NP: coffee, NP: a cup of coffee (from the previous rule)
Input: eine Tasse Kaffee trinken (ART NN NN VVINF)
Both hypotheses are indistinguishable in future search → can be recombined
Recombinable States
Recombinable?
  NP: a cup of coffee
  NP: a cup of coffee
  NP: a mug of coffee
Yes, iff max. 2-gram language model is used
Recombinability
Hypotheses have to match in
• span of input words covered
• output constituent label
• first n–1 output words: not properly scored, since they lack context
• last n–1 output words: still affect scoring of subsequently added words, just like in phrase-based decoding
(n is the order of the n-gram language model)
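The four matching conditions translate directly into a recombination key. The sketch below assumes hypotheses are plain dicts with the hypothetical fields `span`, `label`, `words` and `score` (higher is better); it is an illustration of the idea, not a decoder's actual data structure.

```python
def recombination_key(hyp, n):
    """Everything future search can still see of a hypothesis (n = LM order)."""
    k = n - 1
    words = hyp["words"]
    first = tuple(words[:k])                          # not yet properly scored
    last = tuple(words[len(words) - k:]) if k else () # context for later words
    return (hyp["span"], hyp["label"], first, last)


def recombine(hypotheses, n):
    """Keep only the best-scoring hypothesis for each recombination key."""
    best = {}
    for hyp in hypotheses:
        key = recombination_key(hyp, n)
        if key not in best or hyp["score"] > best[key]["score"]:
            best[key] = hyp
    return list(best.values())


cell = [
    {"span": (0, 3), "label": "NP", "words": "a cup of coffee".split(), "score": -1.0},
    {"span": (0, 3), "label": "NP", "words": "a cup of coffee".split(), "score": -1.5},
    {"span": (0, 3), "label": "NP", "words": "a mug of coffee".split(), "score": -2.0},
]
print(len(recombine(cell, n=2)))  # bigram LM: all three share one key
print(len(recombine(cell, n=3)))  # trigram LM: "a cup" and "a mug" differ
```

With a bigram model only "a" and "coffee" are visible from outside, so the cup and mug hypotheses collapse into one; with a trigram model they stay apart.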
Language Model Contexts When merging hypotheses, internal language model contexts are absorbed S (minister of Germany met with Condoleezza Rice) the foreign ... ... in Frankfurt NP VP (minister) (Condoleezza Rice) the foreign ... ... of Germany met with ... ... in Frankfurt relevant history un-scored words p LM (met | of Germany) p LM (with | Germany met) Chapter 11: Tree-Based Models 60
Stack Pruning
• Number of hypotheses in each chart cell explodes
⇒ need to discard bad hypotheses, e.g., keep 100 best only
• Different stacks for different output constituent labels?
• Cost estimates
  – translation model cost known
  – language model cost for internal words known → estimates for initial words
  – outside cost estimate? (how useful will an NP covering input words 3–5 be later on?)
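A minimal sketch of the "keep 100 best only" idea, with the per-label stacks from the second bullet as an option. It assumes hypotheses are dicts with hypothetical `label` and `score` fields, where the score already includes whatever cost estimates are available and higher is better.

```python
import heapq
from collections import defaultdict


def prune_cell(hypotheses, max_size=100, per_label=False):
    """Keep the max_size best hypotheses, per cell or per constituent label."""
    if not per_label:
        return heapq.nlargest(max_size, hypotheses, key=lambda h: h["score"])
    stacks = defaultdict(list)
    for hyp in hypotheses:
        stacks[hyp["label"]].append(hyp)   # one stack per output label
    kept = []
    for stack in stacks.values():
        kept.extend(heapq.nlargest(max_size, stack, key=lambda h: h["score"]))
    return kept


cell = [
    {"label": "NP", "score": -1.0},
    {"label": "NP", "score": -3.0},
    {"label": "NN", "score": -5.0},
]
print(prune_cell(cell, max_size=2))
print(prune_cell(cell, max_size=1, per_label=True))
```

With a single stack of size 2 the NN hypothesis is dropped; with one stack per label it survives, which is the trade-off the second bullet asks about.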
Naive Algorithm: Blow-ups
• Many subspan sequences: "for all sequences s of hypotheses and words in span [start,end]"
• Many rules: "for all rules r"
• Checking if a rule applies not trivial: "rule r applies to chart sequence s"
⇒ Unworkable
Solution • Prefix tree data structure for rules • Dotted rules • Cube pruning Chapter 11: Tree-Based Models 63
Storing Rules
• First concern: do they apply to span? → have to match available hypotheses and input words
• Example rule: np → x₁ des x₂ | np₁ of the nn₂
• Check for applicability
  – is there an initial sub-span with a hypothesis with constituent label np?
  – is it followed by a sub-span over the word des?
  – is it followed by a final sub-span with a hypothesis with label nn?
• Sequence of relevant information: np • des • nn • np₁ of the nn₂
Rule Applicability Check
Trying to cover a span of six words with given rule
  NP • des • NN → NP: NP of the NN
  das Haus des Architekten Frank Gehry
• First: check for hypotheses with output constituent label np
• Found np hypothesis in cell (das Haus), matched first symbol of rule
• Matched word des, matched second symbol of rule
• Found a nn hypothesis in cell (Architekten Frank Gehry), matched last symbol of rule
• Matched entire rule → apply to create a np hypothesis
• Look up output words to create new hypothesis (note: there may be many matching underlying np and nn hypotheses)
  NP: the house + NN: architect Frank Gehry → NP: the house of the architect Frank Gehry
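The check walked through above can be written as a small recursive matcher. This is a sketch under assumptions: the chart is reduced to a dict from (start, end) spans to the set of constituent labels with at least one hypothesis, rule symbols are tagged as ("nt", label) or ("word", token), and the function name `match_rule` is made up for illustration.

```python
def match_rule(symbols, start, end, words, chart):
    """Return every way to lay the rule's source symbols over words[start:end].

    Each match is a list of (start, end) sub-spans, one per symbol.
    """
    if not symbols:
        return [[]] if start == end else []
    kind, value = symbols[0]
    matches = []
    if kind == "word":
        # A word symbol must match the next input word exactly.
        if start < end and words[start] == value:
            for rest in match_rule(symbols[1:], start + 1, end, words, chart):
                matches.append([(start, start + 1)] + rest)
    else:
        # A non-terminal needs a hypothesis with that label in some sub-span.
        for mid in range(start + 1, end + 1):
            if value in chart.get((start, mid), ()):
                for rest in match_rule(symbols[1:], mid, end, words, chart):
                    matches.append([(start, mid)] + rest)
    return matches


words = "das Haus des Architekten Frank Gehry".split()
chart = {(0, 1): {"DET"}, (1, 2): {"NN", "NP"}, (0, 2): {"NP"}, (3, 6): {"NN"}}
rule = [("nt", "NP"), ("word", "des"), ("nt", "NN")]
print(match_rule(rule, 0, 6, words, chart))
```

Running this check for every rule is exactly the cost that makes the naive algorithm unworkable, since there are millions of rules.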
Checking Rules vs. Finding Rules • What we showed: – given a rule – check if and how it can be applied • But there are too many rules (millions) to check them all • Instead: – given the underlying chart cells and input words – find which rules apply Chapter 11: Tree-Based Models 72
Prefix Tree for Rules
[Figure: prefix tree over source-side symbols (words such as das, Haus, des, um and labels such as NP, DET, NN, PP, VP); nodes hold the target sides of the rules that end there]
Highlighted Rules
• np → np₁ det₂ nn₃ | np₁ in₂ nn₃
• np → np₁ | np₁
• np → np₁ des nn₂ | np₁ of the nn₂
• np → np₁ des nn₂ | np₂ np₁
• np → det₁ nn₂ | det₁ nn₂
• np → das Haus | the house
Dotted Rules: Key Insight • If we can apply a rule like p → A B C | x to a span • Then we could have applied a rule like q → A B | y to a sub-span with the same starting word ⇒ We can re-use rule lookup by storing A B • (dotted rule) Chapter 11: Tree-Based Models 74
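A minimal sketch of how a prefix tree and dotted rules fit together: a dotted rule is simply a pointer to a node in the tree, and extending it by one word or one constituent label is a single dictionary lookup. The class and function names are illustrative, and words and labels share one symbol namespace here purely for brevity.

```python
class TrieNode:
    """One node of the rule prefix tree; a dotted rule points at such a node."""

    def __init__(self):
        self.children = {}      # next source symbol -> TrieNode
        self.translations = []  # (lhs label, target side) of rules ending here


def add_rule(root, source_symbols, lhs, target):
    """Store a rule under the path spelled by its source-side symbols."""
    node = root
    for symbol in source_symbols:
        node = node.children.setdefault(symbol, TrieNode())
    node.translations.append((lhs, target))


def extend(dotted, symbol):
    """Move the dot over one more symbol; None if no rule continues that way."""
    return dotted.children.get(symbol)


root = TrieNode()
add_rule(root, ["das"], "DET", "the")
add_rule(root, ["das"], "DET", "that")
add_rule(root, ["das", "Haus"], "NP", "the house")
add_rule(root, ["DET", "NN"], "NP", "DET1 NN2")
add_rule(root, ["NP", "des", "NN"], "NP", "NP1 of the NN2")

dot = extend(root, "das")     # dotted rule "das •" for the first cell
print(dot.translations)
dot = extend(dot, "Haus")     # reuse the stored dotted rule for "das Haus"
print(dot.translations)
```

Because the dotted rule for "das" is kept with its span, covering "das Haus" only needs one more lookup instead of a fresh search over all rules.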
Finding Applicable Rules in Prefix Tree
Input: das Haus des Architekten Frank Gehry
[Figure sequence: the chart and the prefix tree are built up step by step; circled numbers name prefix tree nodes]

Covering the First Cell (das)
• Looking up rules in the prefix tree: das ❶
• Taking note of the dotted rule: das ❶
• Checking if dotted rule has translations: DET: the, DET: that
• Applying the translation rules → hypotheses DET: that, DET: the
• Looking up constituent label in prefix tree: DET ❷
• Add to span's list of dotted rules: DET ❷, das ❶

Moving on to the Next Cell (Haus)
• Looking up rules in the prefix tree: Haus ❸
• Taking note of the dotted rule: Haus ❸
• Checking if dotted rule has translations: NN: house, NP: house
• Applying the translation rules → hypotheses NP: house, NN: house
• Looking up constituent label in prefix tree: NN ❹, NP ❺
• Add to span's list of dotted rules: NN ❹, NP ❺, Haus ❸

More of the Same (remaining one-word cells)
• des: IN: of, DET: the
• Architekten: NP: architect, NN: architect
• Frank: NNP: Frank
• Gehry: NNP: Gehry

Covering a Longer Span (das Haus)
• Cannot consume multiple words at once
• All rules are extensions of existing dotted rules
• Here: only extensions of span over das possible
• Extensions of span over das: can das ❶ or DET ❷ be followed by NN, NP, or Haus?
• Looking up rules in the prefix tree: das Haus ❻, das NN ❼, DET Haus ❽, DET NN ❾
• Taking note of the dotted rules: DET NN ❾, DET Haus ❽, das NN ❼, das Haus ❻
• Checking if dotted rules have translations: ❻ NP: the house, ❼ NP: the NN, ❽ NP: DET house, ❾ NP: DET NN
• Applying the translation rules → hypotheses NP: that house, NP: the house
• Looking up constituent label in prefix tree: NP ❺
• Add to span's list of dotted rules: NP ❺