Optimal Parsing Strategies for Linear Context-Free Rewriting Systems

Daniel Gildea
Computer Science Department, University of Rochester

Overview
• Factorization lowers the rank of LCFRS rules
• Binarization minimizes parsing complexity
• Minimizing fan-out does not minimize parsing complexity

Linear Context-Free Rewriting Systems
LCFRS generalizes CFG, TAG, CCG, SCFG, and STAG. Productions $p \in P$ take the form
$p : A \to g(B_1, B_2, \ldots, B_r)$
where $A, B_1, \ldots, B_r \in V_N$, and $g$ is a linear, non-erasing function
$g(\langle x_{1,1}, \ldots, x_{1,\varphi(B_1)} \rangle, \ldots, \langle x_{r,1}, \ldots, x_{r,\varphi(B_r)} \rangle) = \langle t_1, \ldots, t_{\varphi(A)} \rangle$
i.e., each variable $x_{i,j}$ occurs exactly once in $t_1, \ldots, t_{\varphi(A)}$ (Vijay-Shanker et al., ACL 1987).
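To make the two conditions on $g$ concrete, here is a minimal Python sketch. It assumes a hypothetical encoding (not from the slides) in which $g$ is a list of output components $t_1, \ldots, t_{\varphi(A)}$, each a list of terminal strings and pairs `(i, j)` standing for $x_{i,j}$. The helper `check_function` is illustrative only; it reports whether $g$ is linear (no variable used twice) and non-erasing (every variable used).

```python
from collections import Counter
from typing import List, Tuple, Union

Var = Tuple[int, int]        # (i, j) stands for x_{i,j}, both 1-based
Item = Union[str, Var]       # a terminal string or a variable
Function = List[List[Item]]  # components t_1 .. t_phi(A) (hypothetical encoding)


def check_function(g: Function, child_fan_outs: List[int]) -> Tuple[bool, bool]:
    """Return (linear, non_erasing) for g, given phi(B_1), ..., phi(B_r)."""
    used = Counter(item for comp in g for item in comp if not isinstance(item, str))
    expected = {(i, j)
                for i, phi in enumerate(child_fan_outs, start=1)
                for j in range(1, phi + 1)}
    # linear: each variable occurs at most once, and only declared variables occur
    linear = all(n == 1 for n in used.values()) and set(used) <= expected
    # non-erasing: every declared variable occurs somewhere in the output
    non_erasing = expected <= set(used)
    return linear, non_erasing


# CFG concatenation g(<x_B>, <x_C>) = <x_B x_C>
cfg = [[(1, 1), (2, 1)]]
print(check_function(cfg, [1, 1]))              # (True, True)
# copying x_{1,1} is not linear; dropping x_{2,1} is erasing
print(check_function([[(1, 1), (1, 1)]], [1]))  # (False, True)
print(check_function([[(1, 1)]], [1, 1]))       # (True, False)
```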

Context-Free Grammar
$A \to B\,C$ with $g(\langle x_B \rangle, \langle x_C \rangle) = \langle x_B\,x_C \rangle$
(Figure: spans of A, B, C.)

Tree-Adjoining Grammar
(Figure: span diagram over nonterminals A, B, C.)

Inversion Transduction Grammar
(Figure: two span diagrams over nonterminals A, B, C.)

Synchronous Context-Free Grammar (SCFG)
(Figure: span diagram over nonterminals A, B, C, D, E.)

Fan-Out
Number of spans in a nonterminal.
• CFG: fan-out 1
• TAG: fan-out 2
• ITG: fan-out 2
• SCFG: fan-out 2
$\varphi(G) = \max_{N \in G} \varphi(N)$ (Rambow & Satta, 1999)

Rank
Number of nonterminals on the right-hand side of a rule.
• CFG: rank 2
• TAG: rank 2
• ITG: rank 2
• SCFG: rank r
$\rho(G) = \max_{p \in G} \rho(p)$
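Both grammar-level quantities are maxima over the grammar, which a short Python sketch makes explicit. It assumes a hypothetical rule encoding `(lhs, rhs_nonterminals, k)` where `k` is the number of output components of $g$, i.e. $\varphi(\text{lhs})$; the function names are illustrative, not from the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (lhs, rhs nonterminals, number of components of g),
# where the number of components of g is phi(lhs).
Rule = Tuple[str, List[str], int]


def fan_out_table(rules: List[Rule]) -> Dict[str, int]:
    """Map each left-hand-side nonterminal N to its fan-out phi(N)."""
    phi: Dict[str, int] = {}
    for lhs, _, k in rules:
        # every production of N must yield the same number of spans
        if phi.setdefault(lhs, k) != k:
            raise ValueError(f"inconsistent fan-out for {lhs}")
    return phi


def grammar_fan_out(rules: List[Rule]) -> int:
    """phi(G): the largest fan-out of any nonterminal."""
    return max(fan_out_table(rules).values())


def grammar_rank(rules: List[Rule]) -> int:
    """rho(G): the largest number of right-hand-side nonterminals in any rule."""
    return max(len(rhs) for _, rhs, _ in rules)


# An SCFG rule of rank 4 written as an LCFRS: every nonterminal covers one
# source-side and one target-side string, so each has fan-out 2.
scfg = [("A", ["B", "C", "D", "E"], 2)] + [(n, [], 2) for n in "BCDE"]
print(grammar_rank(scfg), grammar_fan_out(scfg))  # 4 2
```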

Factorization
Reduces rank. A → B C D E becomes:
• X → B C
• Y → X D
• A → Y E
(Figure: span diagrams for A, B, C, D, E before and after introducing X and Y.)
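For a CFG-style rule, the factorization above is just left-to-right grouping. The Python sketch below reproduces it; `left_binarize` and the caller-supplied fresh names (`X`, `Y`) are hypothetical. It only handles the monotone CFG case: for LCFRS rules, which subsets get grouped affects fan-out, which is the concern of the next slide.

```python
from typing import List, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (lhs, rhs)


def left_binarize(lhs: str, rhs: List[str], fresh: List[str]) -> List[Rule]:
    """Factor lhs -> rhs into rank-2 rules, combining symbols left to right.

    `fresh` supplies one new nonterminal name per intermediate rule
    (len(rhs) - 2 names); the names are hypothetical placeholders.
    """
    if len(rhs) <= 2:
        return [(lhs, tuple(rhs))]
    if len(fresh) < len(rhs) - 2:
        raise ValueError("need len(rhs) - 2 fresh nonterminals")
    rules: List[Rule] = []
    current = rhs[0]
    for name, sym in zip(fresh, rhs[1:-1]):
        rules.append((name, (current, sym)))  # e.g. X -> B C, then Y -> X D
        current = name
    rules.append((lhs, (current, rhs[-1])))   # finally A -> Y E
    return rules


for lhs, rhs in left_binarize("A", ["B", "C", "D", "E"], ["X", "Y"]):
    print(lhs, "->", " ".join(rhs))
# X -> B C
# Y -> X D
# A -> Y E
```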

Factorization
Reduces rank, may increase fan-out.
(Figure: rule over A, B, C, D, E with new nonterminal X.)
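Why fan-out can grow: the new nonterminal needs one span for every maximal run of its children inside each component of the parent. The Python sketch below counts those runs. It assumes rules without terminals and a hypothetical encoding that lists, per output component of A, the children whose variables appear there. The example is a made-up SCFG rule A → B C D E whose target side is reordered as C E B D.

```python
from typing import FrozenSet, List


def fan_out(components: List[List[str]], subset: FrozenSet[str]) -> int:
    """Fan-out of a nonterminal that covers `subset` of a rule's children.

    `components` lists, for each output component t_1 .. t_phi(A) of the
    parent, the children whose variables occur there, in order (terminals
    are assumed absent). Each maximal run of children from `subset` inside
    one component becomes one span of the new nonterminal.
    """
    runs = 0
    for comp in components:
        inside = False          # runs never continue across components of A
        for child in comp:
            if child in subset:
                if not inside:
                    runs += 1   # a new span starts here
                inside = True
            else:
                inside = False
    return runs


# Hypothetical SCFG rule A -> B C D E with target order C E B D.
rule = [["B", "C", "D", "E"], ["C", "E", "B", "D"]]
print(fan_out(rule, frozenset("B")))     # 2: each child has fan-out 2
print(fan_out(rule, frozenset("BC")))    # 3: factoring X = {B, C} raises fan-out
print(fan_out(rule, frozenset("BCDE")))  # 2: all children together give phi(A)
```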

Factorization Algorithms
• SCFG → rank 2 (Zhang et al., NAACL 2006)
• SCFG → minimum rank in $O(n)$ (Zhang & Gildea, SSST 2007)
• LCFRS with fan-out 2 → rank 2, fan-out 2 in $O(n)$ (Sagot & Satta, ACL 2010)
• LCFRS → rank 2, minimum fan-out in $O(n^{\varphi})$ (Gomez-Rodriguez et al., NAACL 2009)
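The first entry can be illustrated with a shift-reduce pass over the rule's permutation. This is a minimal Python sketch in the spirit of that linear-time approach, not the authors' code. It assumes a hypothetical encoding where `perm[i]` is the 0-based target position of the i-th source-side nonterminal, and it reduces the top two stack items whenever their target ranges are adjacent (straight or inverted order).

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # contiguous range of target positions (lo, hi)


def shift_reduce_binarize(perm: List[int]) -> Optional[List[Span]]:
    """Binarize an SCFG rule given the target position of each source child.

    Returns the target ranges of the new nonterminals in the order they
    are created, or None if no binarization exists.
    """
    stack: List[Span] = []
    merges: List[Span] = []
    for p in perm:                        # shift the next source child
        stack.append((p, p))
        while len(stack) >= 2:            # reduce while the top two are adjacent
            lo1, hi1 = stack[-2]
            lo2, hi2 = stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:  # straight or inverted
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
                merges.append(stack[-1])
            else:
                break
    return merges if len(stack) == 1 else None


print(shift_reduce_binarize([0, 1, 2, 3]))  # [(0, 1), (0, 2), (0, 3)]
print(shift_reduce_binarize([1, 0, 3, 2]))  # [(0, 1), (2, 3), (0, 3)]
print(shift_reduce_binarize([2, 0, 3, 1]))  # None: cannot be binarized
```

Each child is shifted once and each reduction removes one stack item, so the pass is linear in the rule length.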

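Parsing Complexity
(Figure: a fan-out-1 rule over A, B, C parsed in $O(n^3)$; a fan-out-2 rule over A, B, C parsed in $O(n^6)$.)
For $p : A \to g(B_1, \ldots, B_r)$, parsing takes $O(n^{c(p)})$, where
$c(p) = \varphi(A) + \sum_{i=1}^{r} \varphi(B_i)$ (Seki et al., 1991)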
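Substituting the fan-outs of the two pictured binary rules into $c(p)$ recovers the exponents shown. One way to read the count, assuming rules without terminals, is that every span endpoint is shared by exactly two spans (a child with the parent, or two adjacent children), so the $2c(p)$ endpoints reduce to $c(p)$ free string positions.

```latex
\begin{aligned}
\text{fan-out 1, } A \to B\,C:&\quad c(p) = \varphi(A) + \varphi(B) + \varphi(C) = 1 + 1 + 1 = 3 \;\Rightarrow\; O(n^{3})\\
\text{fan-out 2, } A \to B\,C:&\quad c(p) = 2 + 2 + 2 = 6 \;\Rightarrow\; O(n^{6})
\end{aligned}
```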

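Parsing Complexity
$c(p) = \varphi(A) + \sum_{i=1}^{r} \varphi(B_i)$
$c(G) = \max_{p \in G} c(p)$
$c(G) \le (\rho(G) + 1)\,\varphi(G)$, since $\varphi(A) \le \varphi(G)$ and each of the at most $\rho(G)$ terms $\varphi(B_i)$ is at most $\varphi(G)$.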
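The three formulas translate directly into code. This Python sketch computes $c(p)$, $c(G)$, and the bound $(\rho(G)+1)\,\varphi(G)$ for a small made-up grammar; the encoding (a fan-out table `phi` plus `(lhs, rhs)` pairs) and the function names are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Production = Tuple[str, List[str]]  # (A, [B_1, ..., B_r])


def rule_complexity(p: Production, phi: Dict[str, int]) -> int:
    """c(p) = phi(A) + sum_i phi(B_i)."""
    lhs, rhs = p
    return phi[lhs] + sum(phi[b] for b in rhs)


def grammar_complexity(prods: List[Production], phi: Dict[str, int]) -> int:
    """c(G) = max over productions of c(p)."""
    return max(rule_complexity(p, phi) for p in prods)


def complexity_bound(prods: List[Production], phi: Dict[str, int]) -> int:
    """(rho(G) + 1) * phi(G), assuming phi lists exactly the nonterminals of G."""
    rank = max(len(rhs) for _, rhs in prods)
    return (rank + 1) * max(phi.values())


# Hypothetical grammar: A and B have fan-out 2, S and C fan-out 1.
phi = {"S": 1, "A": 2, "B": 2, "C": 1}
prods = [("S", ["A"]), ("A", ["B", "C"]), ("B", []), ("C", [])]
print(grammar_complexity(prods, phi), complexity_bound(prods, phi))  # 5 6
```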

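Factorization
Never increases parsing complexity.
(Figure: rule over A, B, C, D, E with new nonterminal X.)
Binarization minimizes parsing complexity: optimal parsing complexity can always be reached with rank 2.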
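A worked check on one example, as a Python sketch. It uses a hypothetical SCFG rule A → B C D E with target order C E B D, encodes each (original or new) nonterminal as the set of original children it covers, and counts spans as maximal runs of those children in each component of A (assuming no terminals). Factoring out X = {B, C} gives rules of complexity 7 and 9, and binarizing further gives at most 8, all at or below the unfactored 10.

```python
from typing import FrozenSet, List


def fan_out(components: List[List[str]], subset: FrozenSet[str]) -> int:
    """Number of maximal runs of `subset` children within each component of A."""
    runs = 0
    for comp in components:
        inside = False
        for child in comp:
            if child in subset and not inside:
                runs += 1       # a new span of the covering nonterminal starts
            inside = child in subset
    return runs


def complexity(components: List[List[str]], parent: FrozenSet[str],
               children: List[FrozenSet[str]]) -> int:
    """c = phi(parent) + sum of phi(child); each argument is a set of original children."""
    return fan_out(components, parent) + sum(fan_out(components, c) for c in children)


rule = [["B", "C", "D", "E"], ["C", "E", "B", "D"]]  # hypothetical reordering
B, C, D, E = (frozenset(x) for x in "BCDE")
X, Y, A = B | C, B | C | D, B | C | D | E

print(complexity(rule, A, [B, C, D, E]))                              # 10
print([complexity(rule, X, [B, C]), complexity(rule, A, [X, D, E])])  # [7, 9]
print([complexity(rule, X, [B, C]), complexity(rule, Y, [X, D]),
       complexity(rule, A, [Y, E])])                                  # [7, 8, 7]
```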

Among binarizations, minimizing fan-out and minimizing parsing complexity are INCONSISTENT: a binarization with minimum parsing complexity need not have minimum fan-out.

Example: parsing complexity 14 with fan-out 6, whereas the minimum fan-out among binarizations is 5.

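Dependency Treebank Experiments
(Figure: dependency tree for "A hearing is scheduled on the issue today", with arc labels nmod, sbj, root, vc, pp, nmod, np, tmp.)
• nmod → $g_1$, $g_1 = \langle \text{A} \rangle$
• sbj → $g_2(\text{nmod}, \text{pp})$, $g_2(\langle x_{1,1} \rangle, \langle x_{2,1} \rangle) = \langle x_{1,1}\ \text{hearing},\ x_{2,1} \rangle$
• root → $g_3(\text{sbj}, \text{vc})$, $g_3(\langle x_{1,1}, x_{1,2} \rangle, \langle x_{2,1}, x_{2,2} \rangle) = \langle x_{1,1}\ \text{is}\ x_{2,1}\ x_{1,2}\ x_{2,2} \rangle$
• vc → $g_4(\text{tmp})$, $g_4(\langle x_{1,1} \rangle) = \langle \text{scheduled},\ x_{1,1} \rangle$
• pp → $g_5(\text{np})$, $g_5(\langle x_{1,1} \rangle) = \langle \text{on}\ x_{1,1} \rangle$
• nmod → $g_6$, $g_6 = \langle \text{the} \rangle$
• np → $g_7(\text{nmod})$, $g_7(\langle x_{1,1} \rangle) = \langle x_{1,1}\ \text{issue} \rangle$
• tmp → $g_8$, $g_8 = \langle \text{today} \rangle$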
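Evaluating these rules bottom-up rebuilds the sentence and shows where the non-projectivity lives: sbj has fan-out 2 (its spans are "A hearing" and "on the issue"), and so does vc. This Python sketch uses a hypothetical encoding of each $g_k$ as a list of components holding words and `(i, j)` variable pairs. As in the tree, pp takes the np constituent "the issue" as its argument. The helper `evaluate` is illustrative only.

```python
from typing import List, Tuple, Union

Var = Tuple[int, int]        # (i, j) stands for x_{i,j}
Item = Union[str, Var]       # a word or a variable
Function = List[List[Item]]  # one list per output component (hypothetical encoding)


def evaluate(g: Function, args: List[Tuple[str, ...]]) -> Tuple[str, ...]:
    """Apply g to the children's span tuples; args[i-1][j-1] is x_{i,j}."""
    return tuple(
        " ".join(item if isinstance(item, str) else args[item[0] - 1][item[1] - 1]
                 for item in comp)
        for comp in g
    )


g1 = [["A"]]
g2 = [[(1, 1), "hearing"], [(2, 1)]]
g3 = [[(1, 1), "is", (2, 1), (1, 2), (2, 2)]]
g4 = [["scheduled"], [(1, 1)]]
g5 = [["on", (1, 1)]]
g6 = [["the"]]
g7 = [[(1, 1), "issue"]]
g8 = [["today"]]

np_ = evaluate(g7, [evaluate(g6, [])])       # ('the issue',)
pp = evaluate(g5, [np_])                     # ('on the issue',)
sbj = evaluate(g2, [evaluate(g1, []), pp])   # ('A hearing', 'on the issue')
vc = evaluate(g4, [evaluate(g8, [])])        # ('scheduled', 'today')
root = evaluate(g3, [sbj, vc])
print(root)  # ('A hearing is scheduled on the issue today',)
```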

Dependency Treebank Experiments
Kuhlmann and Nivre (ACL 2006) define “mildly non-projective dependency structures”. Gomez-Rodriguez et al. (ACL 2009) define “mildly ill-nested dependency structures”, parsed in $O(n^{3k+4})$ for gap degree at most $k$.

Treebank Parsing Complexity
Counts by parsing complexity for the Arabic, Czech, Danish, Dutch, German, Portuguese, and Swedish treebanks (in that column order); rows with fewer than seven counts give only the non-empty cells.
20: 1
18: 1
16: 1
15: 1
13: 1
12: 2 3
11: 1 1 1
10: 2 6 16 3
9: 7 4 1
8: 4 7 129 65 10
7: 3 12 89 30 18
6: 178 11 362 1811 492 59
5: 48 1132 93 411 1848 172 201
4: 250 18269 1026 6678 18124 2643 1736
3: 10942 265202 18306 39362 154948 41075 41245


Conclusion
• Parsing complexity ≠ fan-out
• Highest parsing complexity in the treebanks: 20

Space Complexity
• Space complexity = $O(n^{2\varphi(G)})$
• Factorization never improves space complexity, since the original nonterminals, and hence $\varphi(G)$, remain in the factored grammar.

    function MINIMAL-BINARIZATION(p, ≺)
        workingSet ← ∅
        agenda ← priorityQueue(≺)
        for i from 1 to ρ(p) do
            workingSet ← workingSet ∪ {B_i}
            agenda ← agenda ∪ {B_i}
        while agenda ≠ ∅ do
            p′ ← pop minimum from agenda
            if nonterms(p′) = {B_1, ..., B_ρ(p)} then
                return p′
            for p_1 ∈ workingSet with nonterms(p_1) ∩ nonterms(p′) = ∅ do
                p_2 ← newProd(p′, p_1)
                p′_2 ← the element of workingSet with nonterms(p′_2) = nonterms(p_2), if any
                if no such p′_2 exists or p_2 ≺ p′_2 then
                    workingSet ← (workingSet \ {p′_2}) ∪ {p_2}
                    push(agenda, p_2)
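A runnable Python sketch of this agenda-based search, specialized under stated assumptions: the rule is given as lists of children per output component of A (a hypothetical encoding, no terminals), `newProd` becomes the union of two disjoint child sets, the dictionary `best` plays the role of workingSet, and the ordering ≺ is instantiated, as an assumption, to compare the worst binary-rule complexity $\varphi(X) + \varphi(\text{left}) + \varphi(\text{right})$ in a subtree. For the made-up reordered SCFG rule below, it finds complexity 8, below the unfactored rule's $\varphi(A) + \sum_i \varphi(B_i) = 10$.

```python
import heapq
from itertools import count
from typing import Any, Dict, FrozenSet, List, Optional, Tuple

Tree = Any  # a child label, or a pair (left, right) of trees


def fan_out(components: List[List[str]], subset: FrozenSet[str]) -> int:
    """Number of maximal runs of `subset` children within the parent's components."""
    runs = 0
    for comp in components:
        inside = False
        for child in comp:
            if child in subset and not inside:
                runs += 1
            inside = child in subset
    return runs


def minimal_binarization(components: List[List[str]]) -> Optional[Tuple[int, int, Tree]]:
    """Return (complexity, fan-out, tree) for a binarization of minimum complexity."""
    children = sorted({c for comp in components for c in comp})
    full = frozenset(children)
    best: Dict[FrozenSet[str], Tuple[int, int, Tree]] = {}  # the working set
    agenda: List[Tuple[int, int, FrozenSet[str]]] = []
    tie = count()  # unique tie-breaker so the heap never compares frozensets
    for b in children:
        s = frozenset([b])
        best[s] = (0, fan_out(components, s), b)
        heapq.heappush(agenda, (0, next(tie), s))
    while agenda:
        cost, _, s = heapq.heappop(agenda)
        if best[s][0] != cost:
            continue  # stale entry: a cheaper production for s was found later
        if s == full:
            return best[s]
        _, phi, tree = best[s]
        for s1, (cost1, phi1, tree1) in list(best.items()):
            if s & s1:
                continue  # only combine disjoint sets of nonterminals
            s2 = s | s1
            phi2 = fan_out(components, s2)
            # worst rule in the new subtree, including X -> s s1 itself
            cost2 = max(cost, cost1, phi2 + phi + phi1)
            if s2 not in best or cost2 < best[s2][0]:
                best[s2] = (cost2, phi2, (tree, tree1))
                heapq.heappush(agenda, (cost2, next(tie), s2))
    return None


rule = [["B", "C", "D", "E"], ["C", "E", "B", "D"]]  # hypothetical reordering
print(minimal_binarization(rule)[:2])  # (8, 2)
```

Because a combined cost is never smaller than the costs it is built from, items leave the agenda in non-decreasing cost order, so the first time the full set is popped its complexity is minimal.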
