Synchronous Context-Free Grammars and Optimal Linear Parsing Strategies
Daniel Gildea (University of Rochester), Giorgio Satta (Università di Padova)
Synchronous CFG
Context-free grammar: X → A B
Synchronous context-free grammar (SCFG):
$X \to A^{1}\, B^{2}\, C^{3}\, D^{4},\ \ C^{3}\, A^{1}\, D^{4}\, B^{2}$
C → Powell, 鲍威尔
Synchronous CFG
• Synchronous parsing: find tree from two strings
  – used to learn grammar from parallel text
• This talk: parsing strategies for long rules
• Results also apply to translation with n-gram language model
Context-Free Grammar
A → B C
[Figure: constituent A built from B and C]
Binary SCFG
$A \to B^{1}\, C^{2},\ \ C^{2}\, B^{1}$
[Figure: constituent A built from B and C in both strings]
SCFG with 4 nonterminals
$A \to B^{1}\, C^{2}\, D^{3}\, E^{4},\ \ C^{2}\, E^{4}\, B^{1}\, D^{3}$
[Figure: constituent A built from B, C, D and E in both strings]
Fan-Out
Number of spans in a nonterminal.
CFG: fan-out 1
SCFG: fan-out 2
$\phi(G) = \max_{N \in G} \phi(N)$ (Rambow & Satta, 1999)
Rank
Number of nonterminals on the right-hand side of a rule.
CFG: rank 2
SCFG: rank r
$\rho(G) = \max_{P \in G} \rho(P)$
Parsing Strategies
Reduce rank: the rule A → B C D E is replaced by
X → B C
Y → X D
A → Y E
[Figure: A built from B, C, D and E by way of the intermediate constituents X and Y]
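The rank reduction on this slide can be sketched in a few lines of Python. This is only an illustration of the left-to-right idea, not the authors' implementation: the function name `linear_binarize`, the tuple-based rule representation, and the way fresh nonterminal names are supplied are all assumptions made for the example.

```python
def linear_binarize(lhs, rhs, fresh_names):
    """Left-to-right binarization of a CFG rule lhs -> rhs[0] rhs[1] ...

    `fresh_names` supplies names for the intermediate nonterminals
    (hypothetical helper argument; len(rhs) - 2 names are needed).
    Returns a list of (head, (left_child, right_child)) rules.
    """
    if len(rhs) <= 2:
        return [(lhs, tuple(rhs))]
    names = iter(fresh_names)
    rules = []
    left = rhs[0]
    for i, right in enumerate(rhs[1:], start=1):
        is_last = i == len(rhs) - 1
        # The final combination produces the original left-hand side.
        head = lhs if is_last else next(names)
        rules.append((head, (left, right)))
        left = head
    return rules


rules = linear_binarize("A", ["B", "C", "D", "E"], ["X", "Y"])
for head, children in rules:
    print(head, "->", " ".join(children))
```

Each new rule has rank 2, so the rank of the grammar drops from 4 to 2 while the language is unchanged.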
Parsing Strategies
Reduce rank, may increase fan-out
[Figure: intermediate constituent X built from B and C, covering non-adjacent spans]
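To make the fan-out increase concrete, the sketch below counts spans for every pair of nonterminals in the four-nonterminal rule whose first side is B C D E and whose second side is C E B D. The encoding (nonterminals numbered by their position on side 1, and a list giving the side-2 order) and the names `count_spans` and `fan_out` are assumptions chosen for the illustration.

```python
from itertools import combinations


def count_spans(positions):
    """Count maximal runs of consecutive integers in `positions`."""
    ordered = sorted(positions)
    return sum(
        1 for i, p in enumerate(ordered) if i == 0 or p != ordered[i - 1] + 1
    )


def fan_out(subset, side2_order):
    """Fan-out of a partial constituent covering `subset`.

    Nonterminals are numbered 0..r-1 by their position on side 1;
    side2_order[j] is the nonterminal found at position j on side 2.
    """
    side1_positions = set(subset)
    side2_positions = {
        j for j, nt in enumerate(side2_order) if nt in side1_positions
    }
    return count_spans(side1_positions) + count_spans(side2_positions)


# B=0, C=1, D=2, E=3; side 2 reads C E B D.
SIDE2_ORDER = [1, 3, 0, 2]
pair_fan_outs = {
    pair: fan_out(pair, SIDE2_ORDER) for pair in combinations(range(4), 2)
}
print(pair_fan_outs)
```

Under this encoding every pair of nonterminals has fan-out 3, while each single nonterminal and the full rule have fan-out 2, so any first binary combination for this rule creates a constituent with more spans than the original nonterminals.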
Rule Length in Synchronous CFG
• Binary grammar (ITG): parsing is $O(n^{6})$ (Wu, 1997)
  – Works in real MT (Zhang et al. 2006)
• Many rules cannot be binarized without increasing fan-out (Aho and Ullman, 1972)
• Fan-out affects space and time complexity
Parsing Complexity
Space complexity: $O(n^{2\phi(A)})$
Time complexity: $O(n^{\phi(A)+\phi(B)+\phi(C)})$
Fan-out 1 (CFG): $O(n^{2})$ space, $O(n^{3})$ time
Fan-out 2 (SCFG): $O(n^{4})$ space, $O(n^{6})$ time
(Seki et al. 1991)
SCFG Parsing Strategies
Naïve strategy: $O(n^{2r+2})$ time
Best strategy: $\Omega(n^{cr})$ for some c (Gildea and Štefankovič 2007)
[Figure: intermediate constituent X built from B and C within a rule with nonterminals B, C, D, E]
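As a rough sketch of how a linear strategy (adding one nonterminal at a time) can be scored, the code below applies the per-step time exponent $\phi(A)+\phi(B)+\phi(C)$ and the space exponent $2\phi$ from the Parsing Complexity slide to the four-nonterminal rule with sides B C D E and C E B D. The encoding and all function names are assumptions made for this illustration, and the brute-force search over orders is only practical for short rules.

```python
from itertools import permutations


def count_spans(positions):
    """Count maximal runs of consecutive integers in `positions`."""
    ordered = sorted(positions)
    return sum(
        1 for i, p in enumerate(ordered) if i == 0 or p != ordered[i - 1] + 1
    )


def fan_out(subset, side2_order):
    """Spans on side 1 plus spans on side 2 for the nonterminals in `subset`."""
    side2_positions = {j for j, nt in enumerate(side2_order) if nt in subset}
    return count_spans(subset) + count_spans(side2_positions)


def linear_strategy_cost(order, side2_order):
    """Return (time_exponent, space_exponent) when nonterminals are added in `order`."""
    covered = {order[0]}
    max_fan_out = fan_out(covered, side2_order)
    time_exponent = 0
    for nt in order[1:]:
        left = fan_out(covered, side2_order)
        right = fan_out({nt}, side2_order)
        covered = covered | {nt}
        result = fan_out(covered, side2_order)
        # One binary combination costs phi(result) + phi(left) + phi(right).
        time_exponent = max(time_exponent, left + right + result)
        max_fan_out = max(max_fan_out, result)
    return time_exponent, 2 * max_fan_out


def naive_exponent(rank):
    """Exponent 2r + 2 of the naive strategy."""
    return 2 * rank + 2


# B=0, C=1, D=2, E=3; side 2 reads C E B D.
SIDE2_ORDER = [1, 3, 0, 2]
best_time = min(
    linear_strategy_cost(list(order), SIDE2_ORDER)[0]
    for order in permutations(range(4))
)
print(linear_strategy_cost([0, 1, 2, 3], SIDE2_ORDER), best_time, naive_exponent(4))
```

For this small rule, the left-to-right order yields a time exponent of 8 and a space exponent of 6, and no other linear order does better on time, compared with the exponent 10 of the naive strategy.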
This Talk
• Finding optimal space complexity is NP-complete
• Finding optimal time complexity ⇒ better algorithms for treewidth
Example Rule
[Figure: rule with left-hand side A and eight right-hand-side nonterminals B, numbered 1 to 8]
Optimal Parsing Strategy
[Figure: parsing strategy tree over the eight nonterminals B, with nodes labelled n1 to n7]
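Carving Width
[Figure: graph G on vertices 1, 2, 3, 4 and a tree layout of G with the vertices at its leaves]
Carving width: the maximum number of edges of G routed through any one edge of the tree layout, minimized over all tree layouts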
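The width of one particular tree layout can be computed directly from the definition. The sketch below assumes a layout written as nested tuples with graph vertices at the leaves (so vertices themselves must not be tuples) and a multigraph given as a list of vertex pairs; the 4-cycle used as input is a hypothetical example, since the edges of the graph G in the figure are not recoverable here.

```python
def leaves(tree):
    """Set of graph vertices at the leaves of a nested-tuple layout."""
    if not isinstance(tree, tuple):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def layout_width(tree, edges):
    """Maximum number of graph edges routed through any edge of the layout.

    Every subtree below the root corresponds to one edge of the layout;
    removing it separates that subtree's leaves from all other vertices.
    """
    def cut(leaf_set):
        return sum(1 for u, v in edges if (u in leaf_set) != (v in leaf_set))

    width = 0
    stack = list(tree) if isinstance(tree, tuple) else []
    while stack:
        node = stack.pop()
        width = max(width, cut(leaves(node)))
        if isinstance(node, tuple):
            stack.extend(node)
    return width


# Hypothetical input: a 4-cycle on vertices 1, 2, 3, 4.
cycle_edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(layout_width(((1, 2), (3, 4)), cycle_edges))
print(layout_width(((1, 3), (2, 4)), cycle_edges))
```

The carving width of the graph is the smallest value of `layout_width` over all layouts; this sketch only evaluates a given layout and does not search for the best one.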
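Cyclic Permutation Multigraph
$A \to B^{1}\, B^{2}\, B^{3}\, B^{4}\, B^{5}\, B^{6}\, B^{7}\, B^{8},\ \ B^{5}\, B^{7}\, B^{3}\, B^{1}\, B^{8}\, B^{6}\, B^{2}\, B^{4}$
[Figure: multigraph with one vertex for A and one for each of the eight nonterminals B, numbered 1 to 8]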
Carving Width = Space Complexity
[Figure: tree layout of the multigraph for the example rule, with A and the eight nonterminals B at the leaves and internal nodes labelled n1 to n7]
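One way to see why cut sizes in such a multigraph track space complexity is sketched below. It rests on an assumption that the slides do not spell out: that the multigraph consists of two cycles through A, one following the order of the nonterminals on each side of the rule. The function names and the integer encoding of the nonterminals are likewise hypothetical. Under that assumption, the number of edges leaving a set of nonterminals is twice its number of spans, matching the space exponent $2\phi$.

```python
def permutation_multigraph(side2_order, lhs="A"):
    """Edge list of two cycles through `lhs`, one per side of the rule.

    Assumed construction: side 1 visits nonterminals 1..r in order,
    side 2 visits them in `side2_order`; both cycles close through `lhs`.
    """
    rank = len(side2_order)
    side1_cycle = [lhs] + list(range(1, rank + 1)) + [lhs]
    side2_cycle = [lhs] + list(side2_order) + [lhs]
    edges = []
    for cycle in (side1_cycle, side2_cycle):
        edges.extend(zip(cycle, cycle[1:]))
    return edges


def cut_size(subset, edges):
    """Number of edges with exactly one endpoint in `subset`."""
    return sum(1 for u, v in edges if (u in subset) != (v in subset))


# Second side of the eight-nonterminal example rule: 5 7 3 1 8 6 2 4.
SIDE2_ORDER = [5, 7, 3, 1, 8, 6, 2, 4]
edges = permutation_multigraph(SIDE2_ORDER)

# Nonterminals 1 and 2 are adjacent on side 1 but separated on side 2:
# one span plus two spans, so three spans and six boundary edges.
print(cut_size({1, 2}, edges))
```

A single nonterminal has four boundary edges under this construction, consistent with fan-out 2 for an SCFG nonterminal.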
Our Reduction
• Carving width instance: (G, k)
• Construct permutation multigraph G′, integer k′
• Carving width of G ⇔ carving width of G′ ⇔ optimal parsing for SCFG
Our Construction
[Figure: graph G on vertices 1, 2, 3, 4 with its tree layout, and the constructed multigraph with components X1 to X4 and G1 to G4]
Space Complexity
Theorem 1: Finding the parsing strategy with optimal space complexity for an SCFG rule is NP-complete.
Treewidth
[Figure: a graph on vertices A to S and a tree decomposition with bags ABC, BCD, CDE, DEF, EFG, FGH, GHI, HIJ, IJK, JKL, KLM, GHN, HNO, NOP, OPQ, PQR, QRS]
Dependency Graph
$S \to A^{1}\, B^{2}\, C^{3}\, D^{4},\ \ B^{2}\, D^{4}\, A^{1}\, C^{3}$
A → B C D E
[Figure: dependency graphs over the endpoint variables x0 to x4 and y0 to y4]
Treewidth = Time Complexity
A → B C D E is parsed as
X → B C
Y → X D
A → Y E
[Figure: dependency graph on x0, x1, x2, x3, x4 and a tree decomposition with bags {x0, x1, x2}, {x0, x2, x3}, {x0, x3, x4}]
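The three bags {x0, x1, x2}, {x0, x2, x3} and {x0, x3, x4} can be checked against the definition of a tree decomposition with a short sketch. The dependency graph used here is an assumption: one vertex per endpoint x0 to x4 and one edge per nonterminal linking its two endpoints, which gives a 5-cycle for A → B C D E. The function names and the labelling of bags by X, Y and A are also hypothetical, and the check takes the tree edges on trust rather than verifying that they form a tree.

```python
def decomposition_width(bags):
    """Width of a tree decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1


def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check vertex coverage, edge coverage and the connectedness condition."""
    # Every graph vertex appears in some bag.
    if not set(vertices) <= set().union(*bags.values()):
        return False
    # Every graph edge is contained in some bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    # For each vertex, the bags containing it are connected in the tree.
    for vertex in vertices:
        nodes = {name for name, bag in bags.items() if vertex in bag}
        start = next(iter(nodes))
        seen, frontier = {start}, [start]
        while frontier:
            current = frontier.pop()
            for a, b in tree_edges:
                for src, dst in ((a, b), (b, a)):
                    if src == current and dst in nodes and dst not in seen:
                        seen.add(dst)
                        frontier.append(dst)
        if seen != nodes:
            return False
    return True


vertices = ["x0", "x1", "x2", "x3", "x4"]
# Assumed dependency graph: B, C, D, E and A each link their two endpoints.
graph_edges = [("x0", "x1"), ("x1", "x2"), ("x2", "x3"), ("x3", "x4"), ("x0", "x4")]
bags = {
    "X": {"x0", "x1", "x2"},  # X -> B C
    "Y": {"x0", "x2", "x3"},  # Y -> X D
    "A": {"x0", "x3", "x4"},  # A -> Y E
}
tree_edges = [("X", "Y"), ("Y", "A")]
print(is_tree_decomposition(vertices, graph_edges, bags, tree_edges))
print(decomposition_width(bags))
```

Each bag holds the three endpoints touched by one binary rule, so the decomposition has width 2; if the time exponent is taken to be the width plus one, this agrees with the $O(n^{3})$ time for binary CFG rules.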
Our Reduction
• Treewidth instance: (G, k)
• Construct dependency graph G′, integer k′
• Approximation of treewidth of G ⇔ treewidth of G′ ⇔ optimal time complexity for SCFG
Dependency Graph Construction
Approximation Algorithm for Treewidth
$\mathrm{SOL} < 8\,\Delta(G)\,(\mathrm{OPT} + 1)$
SOL: solution using SCFG parsing strategy
OPT: optimal treewidth of input graph G
$\Delta(G)$: degree of G (maximum number of edges touching one vertex)
Time Complexity
Theorem 2: Finding the parsing strategy with optimal time complexity for an SCFG rule implies a $\Delta(G)$-factor approximation algorithm for treewidth.
Time Complexity
Theorem 3: If finding the parsing strategy with optimal time complexity for an SCFG rule is NP-complete, then treewidth for graphs of degree 6 is NP-complete.
Conclusion
• Finding the parsing strategy with the best space complexity is NP-hard.
• A polynomial-time algorithm for finding the parsing strategy with the best time complexity implies better approximation algorithms for treewidth.
• NP-hardness for time complexity implies NP-hardness for treewidth of graphs of degree six.