transforming projective bilexical dependency grammars

  1. Transforming Projective Bilexical Dependency Grammars into efficiently-parsable CFGs with Unfold-Fold
     Mark Johnson, Microsoft Research / Brown University
     ACL 2007

  2. Motivation and summary
     ◮ What’s the relationship between CKY parsing and the Eisner/Satta $O(n^3)$ PBDG parsing algorithm? (cf. McAllester 1999)
        ◮ split-head encoding, collecting left and right dependents separately
        ◮ unfold-fold transform reorganizes the grammar for efficient CKY parsing
     ◮ The approach generalizes to 2nd-order dependencies
        ◮ predict argument given governor and sibling (McDonald 2006)
        ◮ predict argument given governor and governor’s governor
     ◮ In principle, any CFG parsing or estimation algorithm can be used for PBDGs
        ◮ transformed grammars are typically too large to enumerate
        ◮ my CKY implementations transform the grammar on the fly

  3. Outline
     ◮ Projective Bilexical Dependency Grammars
     ◮ Simple split-head encoding
     ◮ $O(n^3)$ split-head CFGs via Unfold-Fold
     ◮ Transformations capturing 2nd-order dependencies
     ◮ Conclusion

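  4. Projective Bilexical Dependency Grammars
     ◮ Projective Bilexical Dependency Grammar (PBDG): $0 \curvearrowright \text{gave}$, $\text{Sandy} \curvearrowleft \text{gave}$, $\text{gave} \curvearrowright \text{dog}$, $\text{the} \curvearrowleft \text{dog}$, $\text{gave} \curvearrowright \text{bone}$, $\text{a} \curvearrowleft \text{bone}$ (each arc points from head to dependent)
     ◮ A dependency parse generated by the PBDG:
       [Figure: dependency arcs drawn over "0 Sandy gave the dog a bone"]
     ◮ Weights can be attached to dependencies (and preserved in CFG transforms)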
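To make "projective" concrete, here is a minimal sketch that stores the example parse as (head, dependent) position pairs and checks that no two arcs cross. The names `WORDS`, `PARSE` and `is_projective` are illustrative assumptions, not part of the talk; position 0 stands for the root marker.

```python
# Hypothetical representation: each dependency is a (head, dependent) pair of
# word positions, with position 0 reserved for the root marker "0".
WORDS = ["0", "Sandy", "gave", "the", "dog", "a", "bone"]
PARSE = {(0, 2), (2, 1), (2, 4), (4, 3), (2, 6), (6, 5)}


def is_projective(arcs):
    """Return True if no two arcs cross when drawn above the sentence."""
    spans = [tuple(sorted(arc)) for arc in arcs]
    for (a, b) in spans:
        for (c, d) in spans:
            # Two arcs cross when exactly one endpoint of the second
            # lies strictly inside the first.
            if a < c < b < d:
                return False
    return True
```

Under this definition the example parse is projective, while a pair of arcs such as (1, 3) and (2, 4) is not.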

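  5. A naive encoding of PBDGs as CFGs
       $S \to X_u$ where $0 \curvearrowright u$
       $X_u \to u$
       $X_u \to X_v \; X_u$ where $v \curvearrowleft u$ ($v$ is a left dependent of $u$)
       $X_u \to X_u \; X_v$ where $u \curvearrowright v$ ($v$ is a right dependent of $u$)
     [Figure: naive-CFG parse tree for "Sandy gave the dog a bone", rooted in $S \to X_{\text{gave}}$]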
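The four schemata above can be instantiated mechanically. The following is a rough sketch, assuming dependencies are given as sets of word pairs; the function name `naive_cfg`, the `X_word` naming scheme and the data layout are hypothetical choices for illustration.

```python
# Example PBDG from the previous slide (layout is an assumption of this sketch).
VOCAB = ["Sandy", "gave", "the", "dog", "a", "bone"]
ROOTS = {"gave"}                                            # 0 -> u
LEFT = {("Sandy", "gave"), ("the", "dog"), ("a", "bone")}   # (dependent v, head u)
RIGHT = {("gave", "dog"), ("gave", "bone")}                 # (head u, dependent v)


def naive_cfg(vocab, roots, left, right):
    """Instantiate the naive schemata as (lhs, rhs-tuple) productions."""
    rules = []
    for u in sorted(roots):
        rules.append(("S", (f"X_{u}",)))                    # S -> X_u
    for u in vocab:
        rules.append((f"X_{u}", (u,)))                      # X_u -> u
    for v, u in sorted(left):
        rules.append((f"X_{u}", (f"X_{v}", f"X_{u}")))      # X_u -> X_v X_u
    for u, v in sorted(right):
        rules.append((f"X_{u}", (f"X_{u}", f"X_{v}")))      # X_u -> X_u X_v
    return rules


RULES = naive_cfg(VOCAB, ROOTS, LEFT, RIGHT)
```

For the six-word example this yields one root rule, six lexical rules, three left-dependent rules and two right-dependent rules.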

  6. Spurious ambiguity in naive encoding
     ◮ Naive encoding allows dependencies on different sides of head to be freely reordered ⇒ Spurious ambiguity in CFG parses (not present in PBDG parses)
     [Figure: two different naive-CFG parse trees, both rooted in $S \to X_{\text{gave}}$, for the same dependency parse of "Sandy gave the dog a bone"]
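The spurious ambiguity can be checked by counting derivations. This is a small sketch, assuming every word type occurs at most once in the sentence (true of the example); `count_naive_parses` and the data layout are hypothetical names, and the memoized recursion stands in for a CKY chart.

```python
from functools import lru_cache

SENT = ["Sandy", "gave", "the", "dog", "a", "bone"]
ROOTS = {"gave"}                                            # 0 -> u
LEFT = {("Sandy", "gave"), ("the", "dog"), ("a", "bone")}   # (dependent v, head u)
RIGHT = {("gave", "dog"), ("gave", "bone")}                 # (head u, dependent v)


def count_naive_parses(sent, roots, left, right):
    """Count derivations of sent under the naive CFG encoding."""
    n = len(sent)
    words = set(sent)

    @lru_cache(maxsize=None)
    def x(u, i, j):
        """Number of derivations of X_u spanning sent[i:j]."""
        total = 1 if (j - i == 1 and sent[i] == u) else 0   # X_u -> u
        for k in range(i + 1, j):
            for v in words:
                if (v, u) in left:                          # X_u -> X_v X_u
                    total += x(v, i, k) * x(u, k, j)
                if (u, v) in right:                         # X_u -> X_u X_v
                    total += x(u, i, k) * x(v, k, j)
        return total

    return sum(x(u, 0, n) for u in roots)                   # S -> X_u


N_PARSES = count_naive_parses(SENT, ROOTS, LEFT, RIGHT)
```

For the example sentence the count is 3, although there is only one dependency parse: "Sandy" can attach to "gave" before, between, or after the two right dependents.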

  7. Parsing naive CFG encoding takes $O(n^5)$ time
     ◮ A production schema such as $X_u \to X_u \; X_v$ has 5 variables, and so can match input in $O(n^5)$ different ways
     [Figure: $X_u$ built from $X_u$ and $X_v$, with string positions $i$, $u$, $j$, $v$, $k$ marked from left to right]

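  9. Simple split-head encoding
     ◮ Replace input word $u$ with a left variant $u_\ell$ and a right variant $u_r$ (can be avoided in practice with fancy book-keeping)
         Sandy gave the dog a bone
         ⇓
         $\text{Sandy}_\ell$ $\text{Sandy}_r$ $\text{gave}_\ell$ $\text{gave}_r$ $\text{the}_\ell$ $\text{the}_r$ $\text{dog}_\ell$ $\text{dog}_r$ $\text{a}_\ell$ $\text{a}_r$ $\text{bone}_\ell$ $\text{bone}_r$
     ◮ PCFG separately collects left dependencies and right dependencies
         $S \to X_u$ where $0 \curvearrowright u$
         $X_u \to L_u \; {}_uR$ where $u \in \Sigma$
         $L_u \to u_\ell$
         $L_u \to X_v \; L_u$ where $v \curvearrowleft u$
         ${}_uR \to u_r$
         ${}_uR \to {}_uR \; X_v$ where $u \curvearrowright v$
     [Figure: upper part of the split-head parse tree, $S \to X_{\text{gave}} \to L_{\text{gave}} \; {}_{\text{gave}}R$, with $X_{\text{Sandy}}$ attached on the left and $X_{\text{dog}}$, $X_{\text{bone}}$ attached on the right]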
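As a sanity check on the claim that collecting left and right dependents separately removes the spurious ambiguity, the sketch below splits each word into two tokens and counts derivations under the six schemata above. The function names and the `(word, "l")` / `(word, "r")` token encoding are assumptions made for this illustration, and each word type is assumed to occur once.

```python
from functools import lru_cache

SENT = ["Sandy", "gave", "the", "dog", "a", "bone"]
ROOTS = {"gave"}                                            # 0 -> u
LEFT = {("Sandy", "gave"), ("the", "dog"), ("a", "bone")}   # (dependent v, head u)
RIGHT = {("gave", "dog"), ("gave", "bone")}                 # (head u, dependent v)


def split_heads(sent):
    """Replace each word u by a left variant (u, 'l') and a right variant (u, 'r')."""
    return [(w, side) for w in sent for side in ("l", "r")]


def count_split_head_parses(sent, roots, left, right):
    """Count derivations of the split-head string under the simple split-head CFG."""
    toks = split_heads(sent)
    n = len(toks)
    words = set(sent)

    @lru_cache(maxsize=None)
    def L(u, i, j):
        total = 1 if (j - i == 1 and toks[i] == (u, "l")) else 0   # L_u -> u_l
        for k in range(i + 1, j):
            for v in words:
                if (v, u) in left:                                 # L_u -> X_v L_u
                    total += X(v, i, k) * L(u, k, j)
        return total

    @lru_cache(maxsize=None)
    def R(u, i, j):
        total = 1 if (j - i == 1 and toks[i] == (u, "r")) else 0   # uR -> u_r
        for k in range(i + 1, j):
            for v in words:
                if (u, v) in right:                                # uR -> uR X_v
                    total += R(u, i, k) * X(v, k, j)
        return total

    @lru_cache(maxsize=None)
    def X(u, i, j):
        return sum(L(u, i, k) * R(u, k, j) for k in range(i + 1, j))  # X_u -> L_u uR

    return sum(X(u, 0, n) for u in roots)                          # S -> X_u


N_SPLIT_PARSES = count_split_head_parses(SENT, ROOTS, LEFT, RIGHT)
```

On the example this gives exactly one derivation for the single dependency parse.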

  10. Simple split-head CFG parse
      [Figure: complete simple split-head CFG parse tree, rooted in $S \to X_{\text{gave}} \to L_{\text{gave}} \; {}_{\text{gave}}R$, over the string $\text{Sandy}_\ell$ $\text{Sandy}_r$ $\text{gave}_\ell$ $\text{gave}_r$ $\text{the}_\ell$ $\text{the}_r$ $\text{dog}_\ell$ $\text{dog}_r$ $\text{a}_\ell$ $\text{a}_r$ $\text{bone}_\ell$ $\text{bone}_r$]

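  11. $L_u$ and ${}_uR$ heads are phrase-peripheral ⇒ $O(n^4)$
      ◮ Heads of $L_u$ and ${}_uR$ are always at right (left) edge
          $S \to X_u$ where $0 \curvearrowright u$
          $X_u \to L_u \; {}_uR$ where $u \in \Sigma$
          $L_u \to u_\ell$
          $L_u \to X_v \; L_u$ where $v \curvearrowleft u$
          ${}_uR \to u_r$
          ${}_uR \to {}_uR \; X_v$ where $u \curvearrowright v$
        [Figure: $X_u \to L_u \; {}_uR$ with left dependents $X_{v_1}$, $X_{v_2}$ under $L_u$, right dependents $X_{v_3}$, $X_{v_4}$ under ${}_uR$, and $u_\ell$ $u_r$ at the centre]
      ◮ $X_u \to L_u \; {}_uR$ take $O(n^3)$
      ◮ ${}_uR \to {}_uR \; X_v$ take $O(n^4)$
        [Figure: ${}_uR$ built from ${}_uR$ and $X_v$, with string positions $i = u$, $j$, $v$, $k$ marked from left to right]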

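  13. The Unfold-Fold transform
      ◮ Unfold-fold originally proposed for transforming recursive programs; used here to transform CFGs into new CFGs
      ◮ Unfolding a nonterminal replaces it with its expansion
          $\{\, A \to \alpha B \gamma,\;\; B \to \beta_1,\;\; B \to \beta_2,\;\; \ldots \,\} \;\Rightarrow\; \{\, A \to \alpha \beta_1 \gamma,\;\; A \to \alpha \beta_2 \gamma,\;\; B \to \beta_1,\;\; B \to \beta_2,\;\; \ldots \,\}$
      ◮ Folding is the inverse of unfolding (replace RHS with nonterminal)
          $\{\, A \to \alpha \beta \gamma,\;\; B \to \beta,\;\; \ldots \,\} \;\Rightarrow\; \{\, A \to \alpha B \gamma,\;\; B \to \beta,\;\; \ldots \,\}$
      ◮ Transformed grammar generates same language (Sato 1992)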
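The two operations are simple enough to sketch on a toy grammar representation. The code below assumes productions are stored as `(lhs, rhs_tuple)` pairs; the helper names `unfold` and `fold` and their signatures are hypothetical, and the fold is deliberately conservative (it only fires when the folded nonterminal has exactly one production).

```python
def unfold(rules, rule, pos):
    """Replace the nonterminal at rule's RHS position pos by each of its expansions."""
    lhs, rhs = rule
    target = rhs[pos]
    expansions = [r for (l, r) in rules if l == target]
    out = [r for r in rules if r != rule]            # drop the unfolded rule
    out += [(lhs, rhs[:pos] + beta + rhs[pos + 1:]) for beta in expansions]
    return out


def fold(rules, rule, start, end, nonterminal):
    """Replace rhs[start:end] of rule by nonterminal, whose only rule must be that span."""
    lhs, rhs = rule
    beta = rhs[start:end]
    if lhs == nonterminal or [r for (l, r) in rules if l == nonterminal] != [beta]:
        raise ValueError("fold not applicable")
    folded = (lhs, rhs[:start] + (nonterminal,) + rhs[end:])
    return [folded if r == rule else r for r in rules]


# Toy grammars mirroring the two schemata on the slide.
G1 = [("A", ("a", "B", "c")), ("B", ("b1",)), ("B", ("b2",))]
UNFOLDED = unfold(G1, ("A", ("a", "B", "c")), 1)

G2 = [("A", ("a", "b", "c")), ("B", ("b",))]
FOLDED = fold(G2, ("A", ("a", "b", "c")), 1, 2, "B")
```

Here `UNFOLDED` contains `A -> a b1 c` and `A -> a b2 c` in place of `A -> a B c`, and `FOLDED` contains `A -> a B c` in place of `A -> a b c`.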

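  14. Unfold-fold converts $O(n^4)$ to $O(n^3)$ grammar
      ◮ Unfold $X_v$ responsible for $O(n^4)$ parse time
          $\{\, L_u \to u_\ell,\;\; L_u \to X_v \; L_u,\;\; X_v \to L_v \; {}_vR \,\} \;\Rightarrow\; \{\, L_u \to u_\ell,\;\; L_u \to L_v \; {}_vR \; L_u \,\}$
      ◮ Introduce new non-terminals ${}_xM_y$ (doesn’t change language)
          ${}_xM_y \to {}_xR \; L_y$
      ◮ Fold two children of $L_u$ into ${}_xM_y$
          $\{\, L_u \to u_\ell,\;\; L_u \to L_v \; {}_vR \; L_u,\;\; {}_xM_y \to {}_xR \; L_y \,\} \;\Rightarrow\; \{\, L_u \to u_\ell,\;\; L_u \to L_v \; {}_vM_u,\;\; {}_xM_y \to {}_xR \; L_y \,\}$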
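The slide works through the left-dependent rule only. The following sketch spells out the mirror-image steps for the right-dependent rule, assuming it is treated symmetrically, and lists the grammar that would result once $X_u$ is also unfolded in the root rule. This is a reconstruction under that assumption rather than a copy of a slide.

```latex
\begin{align*}
{}_uR \to {}_uR \; X_v, \quad X_v \to L_v \; {}_vR
  \;&\Rightarrow\; {}_uR \to {}_uR \; L_v \; {}_vR
  && \text{(unfold } X_v \text{)} \\
{}_uR \to {}_uR \; L_v \; {}_vR, \quad {}_xM_y \to {}_xR \; L_y
  \;&\Rightarrow\; {}_uR \to {}_uM_v \; {}_vR
  && \text{(fold } {}_uR \; L_v \text{ into } {}_uM_v \text{)}
\end{align*}

\begin{align*}
S &\to L_u \; {}_uR && \text{where } 0 \curvearrowright u \\
L_u &\to u_\ell \\
L_u &\to L_v \; {}_vM_u && \text{where } v \curvearrowleft u \\
{}_uR &\to u_r \\
{}_uR &\to {}_uM_v \; {}_vR && \text{where } u \curvearrowright v \\
{}_xM_y &\to {}_xR \; L_y && \text{where } x, y \in \Sigma
\end{align*}
```

Every binary schema now mentions at most three string positions once the heads are pinned to the edges of their constituents.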

  15. Transformed grammar collects left and right dependencies separately
      [Figure: a tree in which $X_v$ and $X_{v'}$ constituents attach to $L_u$ and ${}_uR$ (left) ⇒ the transformed tree, which uses ${}_vM_u$ and ${}_uM_{v'}$ constituents instead (right)]
      ◮ $X_v$ constituents (which cause $O(n^4)$ parse time) no longer used
      ◮ Head annotations now all phrase-peripheral ⇒ $O(n^3)$ parse time
      ◮ Dependencies can be recovered from parse tree
      ◮ Basically same as Eisner and Satta $O(n^3)$ algorithm
         ◮ explains why Inside-Outside sanity check fails for Eisner/Satta
         ◮ two copies of each terminal ⇒ each terminal’s Outside probability is double the Inside sentence probability
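To illustrate how the transformed schemata lead to a cubic-time dynamic program, here is a rough Viterbi sketch over word positions (the "fancy book-keeping" variant, without explicit $u_\ell$ / $u_r$ tokens). It is not the author's C implementation: the function `parse`, the `weight(head, dependent)` callback and the additive scoring are assumptions of this sketch, and carrying arc tuples around instead of backpointers costs an extra factor that a real implementation would avoid.

```python
from functools import lru_cache

NEG_INF = float("-inf")


def parse(n, weight):
    """Best projective parse of words 1..n; weight(h, d) scores arc h -> d, h = 0 is root.

    Returns (score, sorted list of (head, dependent) arcs).
    """
    def better(a, b):
        return a if a[0] >= b[0] else b

    @lru_cache(maxsize=None)
    def L(u, i):
        """Best L_u covering words i..u (u plus its left dependents)."""
        best = (0.0, ()) if i == u else (NEG_INF, ())
        for v in range(i, u):                       # L_u -> L_v vM_u
            (s1, a1), (s2, a2) = L(v, i), M(v, u)
            best = better(best, (s1 + s2 + weight(u, v), a1 + a2 + ((u, v),)))
        return best

    @lru_cache(maxsize=None)
    def R(u, j):
        """Best uR covering words u..j (u plus its right dependents)."""
        best = (0.0, ()) if j == u else (NEG_INF, ())
        for v in range(u + 1, j + 1):               # uR -> uM_v vR
            (s1, a1), (s2, a2) = M(u, v), R(v, j)
            best = better(best, (s1 + s2 + weight(u, v), a1 + a2 + ((u, v),)))
        return best

    @lru_cache(maxsize=None)
    def M(x, y):
        """Best xM_y: xR ending at some k, followed by L_y starting at k + 1."""
        best = (NEG_INF, ())
        for k in range(x, y):                       # xM_y -> xR L_y
            (s1, a1), (s2, a2) = R(x, k), L(y, k + 1)
            best = better(best, (s1 + s2, a1 + a2))
        return best

    best = (NEG_INF, ())
    for u in range(1, n + 1):                       # S -> L_u uR
        (s1, a1), (s2, a2) = L(u, 1), R(u, n)
        best = better(best, (s1 + s2 + weight(0, u), a1 + a2 + ((0, u),)))
    return best[0], sorted(best[1])


# Toy weights: 1.0 for the arcs of the example parse, 0.0 for all others.
GOLD = {(0, 2), (2, 1), (2, 4), (4, 3), (2, 6), (6, 5)}
SCORE, ARCS = parse(6, lambda h, d: 1.0 if (h, d) in GOLD else 0.0)
```

Each of the three item types is indexed by two positions and filled with a single loop over a third, which is where the cubic bound comes from. With the toy weights, the recovered arcs are exactly the example parse of "Sandy gave the dog a bone".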

  16. Parse using $O(n^3)$ transformed split-head grammar
      [Figure: parse tree under the transformed split-head grammar, rooted in $S \to L_{\text{gave}} \; {}_{\text{gave}}R$, over the split-head string for "Sandy gave the dog a bone", shown above the dependency parse of "0 Sandy gave the dog a bone"]

  17. Parsing time of CFG encodings of same PBDG

      CFG schemata                            Sentences parsed / second
      Naive $O(n^5)$ CFG                      45.4
      $O(n^4)$ simple split-head CFG          406.2
      $O(n^3)$ transformed split-head CFG     3580.0

      ◮ Weighted PBDG; all pairs of heads have some dependency weight
      ◮ Dependency weights precomputed before parsing begins
      ◮ Timing results on a 3.6GHz Pentium 4 machine parsing section 24 of the PTB
      ◮ CKY parsers with grammars hard-coded in C (no rule lookup)
      ◮ Dependency accuracy of Viterbi parses = 0.8918 for all grammars
      ◮ Feature extraction is much slower than even naive CFG

  19. Predict argument based on governor and sibling
      [Figure: parse tree of the split-head string for "Sandy gave the dog a bone" under the grammar that predicts each argument from its governor and sibling]
      ◮ Very similar to second-order algorithm given by McDonald (2006)

  20. Predict argument based on governor and governor’s governor
      [Figure: parse tree of the split-head string for "Sandy gave the dog a bone" under the grammar that predicts each argument from its governor and its governor’s governor]
      ◮ Because left and right dependencies are assembled separately, only captures 2nd-order dependencies where one dependency is leftward and other is rightward

  22. Conclusion and future work
      ◮ Presented a reduction from PBDGs to $O(n^3)$ parsable CFGs
         ◮ split-head CFG representation of PBDGs
         ◮ Unfold-fold transform
      ◮ CKY algorithm on resulting CFG simulates Eisner/Satta algorithm on original PBDG
      ◮ Makes CFG techniques applicable to PBDGs
         ◮ max marginal parsing (Goodman 1996) and other CFG parsing and estimation algorithms
      ◮ Can capture different dependencies, yielding different PDG models
         ◮ 2nd-order “horizontal” dependencies (McDonald 2006)
         ◮ what other combinations of dependencies can we capture? (if we permit $O(n^4)$ parse time?)
         ◮ do any of these improve parsing accuracy?
