Some Experiments on Indicators of Parsing Complexity for Lexicalized - PowerPoint PPT Presentation

Some Experiments on Indicators of Parsing Complexity for Lexicalized Grammars
Anoop Sarkar, Fei Xia and Aravind Joshi
Dept. of Computer and Information Sciences, University of Pennsylvania
{anoop,fxia,joshi}@linc.cis.upenn.edu

Lexicalized Tree Adjoining Grammars
[Figure: elementary trees anchored by Ms., Haag, plays and Elianti, built from NP, NNP, S, VP and VBZ nodes, with argument sites marked]
These trees can be combined to parse the sentence "Ms. Haag plays Elianti."
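The combination step uses two LTAG operations. Substitution fills an NP↓ argument site. Adjunction splices an auxiliary tree in at an internal node. Below is a minimal Python sketch of both.

It rests on several assumptions:
- The tree encoding is a toy one; the `Node` class and all helper names are hypothetical.
- "Ms." is taken to anchor an auxiliary tree NP → NNP NP* that adjoins at the subject NP. The trees in the figure may analyse it differently.
- Feature structures are ignored.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set on preterminals, e.g. NNP -> "Haag"
    subst: bool = False         # open substitution site, written NP↓
    foot: bool = False          # foot node of an auxiliary tree, written NP*


def words(n: Node) -> List[str]:
    """Left-to-right yield (the anchored words) of a tree."""
    if n.word is not None:
        return [n.word]
    return [w for c in n.children for w in words(c)]


def substitute(tree: Node, initial: Node) -> bool:
    """Fill the leftmost open site whose label matches the root of `initial`."""
    for i, c in enumerate(tree.children):
        if c.subst and c.label == initial.label:
            tree.children[i] = copy.deepcopy(initial)
            return True
        if substitute(c, initial):
            return True
    return False


def find_foot(n: Node) -> Optional[Node]:
    if n.foot:
        return n
    for c in n.children:
        f = find_foot(c)
        if f is not None:
            return f
    return None


def adjoin(tree: Node, path: List[int], aux: Node) -> Node:
    """Adjoin `aux` at the node reached by following child indices `path`.
    The subtree rooted there moves to the foot node of `aux`."""
    aux = copy.deepcopy(aux)
    parent, target = None, tree
    for i in path:
        parent, target = target, target.children[i]
    assert target.label == aux.label, "auxiliary root must match the site"
    foot = find_foot(aux)
    foot.children, foot.word, foot.foot = target.children, target.word, False
    if parent is None:
        return aux  # adjunction at the root yields a new root
    parent.children[path[-1]] = aux
    return tree


# Hypothetical elementary trees for "Ms. Haag plays Elianti".
haag = Node("NP", [Node("NNP", word="Haag")])
elianti = Node("NP", [Node("NNP", word="Elianti")])
plays = Node("S", [Node("NP", subst=True),
                   Node("VP", [Node("VBZ", word="plays"), Node("NP", subst=True)])])
ms = Node("NP", [Node("NNP", word="Ms."), Node("NP", foot=True)])

derived = copy.deepcopy(plays)
substitute(derived, haag)           # subject NP↓
substitute(derived, elianti)        # object NP↓
derived = adjoin(derived, [0], ms)  # "Ms." adjoins at the subject NP
print(" ".join(words(derived)))     # Ms. Haag plays Elianti
```

Substitution fills the leftmost matching open site, so the first call fills the subject and the second fills the object. Adjunction takes an explicit path because any matching internal node is a possible adjunction site.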

Important Properties of LTAG wrt Parsing
• Predicate-argument structure is represented in each elementary tree.
• Adjunction instead of feature unification.
• No recursive feature structures: feature structures (FSs) are bounded.

Important Properties of LTAG wrt Parsing
• Transformational relations for the same predicate-argument structure are precomputed.
• Each predicate selects a family of elementary trees.
• These properties lead to different sources of parsing-efficiency issues.

Parsing Efficiency
• Parsing accuracy: evaluated in previous work.
• Parsing efficiency: observed time complexity for producing all parses.
• The usual notion: compare different parsing algorithms wrt time, space, number of edges, ...
• This paper: explore parsing efficiency from a viewpoint that is independent of a particular parsing algorithm.

Parsing Efficiency
• Not an alternative to comparison of parsing algorithms.
• An exploration of parsing efficiency from the perspective of a fully lexicalized grammar.
• Focus on sources of parsing complexity that are part of the input to the parsing algorithm.

Parsing Efficiency
• We explore two issues: syntactic lexical ambiguity and clausal complexity.
• The contention: for LTAGs these issues are relevant across all parsing algorithms.

Experiment: The Parser
• Implementation of a head-corner chart-based parser.
• It is bi-directional, in the style of van Noord.
• Produces a derivation forest as output.
• Written in ANSI C; a version is available at ftp://ftp.cis.upenn.edu/xtag/pub/lem

Experiment: Input Grammar
• Treebank Grammar: extracted from Sections 02–21 of the WSJ Penn Treebank
• 6789 tree templates, 123039 lexicalized trees
• Number of word types in the lexicon is 44215
• Average number of trees per word is 2.78
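These lexicon statistics can be computed directly from a map from each word to its templates. The sketch below makes two assumptions that the slide does not state:
- A lexicalized tree is a (word type, template) pair.
- The average is taken over word types.

The slide's figures are consistent with this reading, since 123039 / 44215 ≈ 2.78. The toy template names are hypothetical.

```python
from typing import Dict, Set


def lexicon_stats(lexicon: Dict[str, Set[str]]) -> Dict[str, float]:
    """`lexicon` maps each word type to the set of tree templates it anchors.
    Assumption: one lexicalized tree = one (word type, template) pair."""
    n_types = len(lexicon)
    n_lex_trees = sum(len(templates) for templates in lexicon.values())
    all_templates = set().union(*lexicon.values())
    return {
        "word_types": n_types,
        "tree_templates": len(all_templates),
        "lexicalized_trees": n_lex_trees,
        "avg_trees_per_word": n_lex_trees / n_types if n_types else 0.0,
    }


# Toy lexicon with hypothetical template names.
toy = {
    "plays": {"t_transitive", "t_intransitive", "t_noun"},
    "Haag": {"t_proper_noun"},
    "Elianti": {"t_proper_noun"},
}
print(lexicon_stats(toy))

# Consistency check against the slide's figures.
print(round(123039 / 44215, 2))  # 2.78
```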

[Figure: Number of trees selected by the words, grouped by word frequency. x-axis: word frequency; y-axis: number of trees selected]

Treebank Grammar and XTAG English Grammar
• Compared the Treebank Grammar (TG) with the XTAG grammar, which has 1004 tree templates, 53 tree families and 1.8 million lexicalized trees.
• 82.1% of template tokens in the Treebank grammar match a corresponding template in the XTAG grammar.
• 14.0% are covered by the XTAG grammar, but the templates look different because of different linguistic analyses.

Treebank Grammar and XTAG English Grammar
• 1.1% of template tokens in the Treebank grammar are due to annotation errors.
• The remaining 2.8% are not currently covered by the XTAG grammar.
• A total of 96.1% of the structures in the Treebank grammar match up with structures in the XTAG grammar.
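The four categories appear to partition the template tokens of the Treebank grammar. The 96.1% total is the sum of the first two:

```latex
\begin{aligned}
82.1\% + 14.0\% &= 96.1\% && \text{(exact match or covered under a different analysis)}\\
96.1\% + 1.1\% + 2.8\% &= 100.0\% && \text{(adding annotation errors and uncovered templates)}
\end{aligned}
```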

Experiment: Test Corpus
• Input: a set of 2250 sentences
• Each sentence was 21 words or fewer
• Average sentence length: 12.3 words
• Number of tokens: 27715
• Output: shared forest of parses
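As a quick check, the reported average length follows from the token and sentence counts (27715 / 2250 ≈ 12.3). The helper below is a hypothetical sketch of how such a test set could be summarized. It assumes sentences are pre-tokenized lists and that the 21-word limit is applied as a simple length filter; the slides do not say how the 2250 sentences were selected.

```python
from typing import Dict, List


def corpus_summary(sentences: List[List[str]], max_len: int = 21) -> Dict[str, float]:
    """Keep sentences of at most `max_len` tokens and report size statistics."""
    kept = [s for s in sentences if len(s) <= max_len]
    n_tokens = sum(len(s) for s in kept)
    return {
        "sentences": len(kept),
        "tokens": n_tokens,
        "avg_length": n_tokens / len(kept) if kept else 0.0,
    }


# Reported figures: 27715 tokens over 2250 sentences.
print(round(27715 / 2250, 1))  # 12.3
```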

[Figure: Number of derivations per sentence. x-axis: sentence length; y-axis: log(number of derivations)]
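The parser outputs a shared (packed) forest rather than a list of parses. The number of derivations can therefore be counted by dynamic programming over the forest, without enumerating them.

The sketch assumes a hypothetical, acyclic AND-OR encoding in which each item maps to alternative tuples of child items. This is not the parser's actual data structure. The chain example shows how a linear number of forest items can encode exponentially many derivations, one reason such counts are plotted on a log scale.

```python
from typing import Dict, List, Tuple

# Hypothetical packed-forest encoding: each item maps to its alternative
# analyses (OR), and each analysis is a tuple of child items (AND).
# Items absent from the dict are leaves with exactly one derivation.
Forest = Dict[str, List[Tuple[str, ...]]]


def count_derivations(forest: Forest, root: str) -> int:
    """Count derivations without enumerating them (assumes an acyclic forest)."""
    memo: Dict[str, int] = {}

    def count(item: str) -> int:
        if item not in memo:
            total = 0
            for analysis in forest.get(item, [()]):
                product = 1
                for child in analysis:
                    product *= count(child)
                total += product
            memo[item] = total
        return memo[item]

    return count(root)


# A PP-attachment ambiguity: the PP attaches to the VP or to the object NP.
pp_forest: Forest = {
    "S": [("NP_subj", "VP")],
    "VP": [("V", "NP_obj", "PP"), ("V", "NP_obj+PP")],
    "NP_obj+PP": [("NP_obj", "PP")],
}
print(count_derivations(pp_forest, "S"))  # 2

# k independent binary ambiguities: about 3k items, 2**k derivations.
k = 40
chain: Forest = {f"X{i}": [(f"A{i}", f"X{i+1}"), (f"B{i}", f"X{i+1}")] for i in range(k)}
print(count_derivations(chain, "X0") == 2 ** k)  # True
```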

[Figure: Parsing times per sentence. x-axis: sentence length; y-axis: log(time) in seconds]
Coefficient of determination R² = 0.65
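The slide does not state which regression model the R² refers to. A plausible reading is an ordinary least-squares line fitted to log(time) against sentence length; the later slide would fit against the number of trees selected instead. The sketch computes R² for that model under this assumption.

R² is unchanged by the choice of logarithm base, since changing the base only rescales y by a constant.

```python
import math
from typing import Sequence


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Coefficient of determination of the least-squares line y = a + b*x.
    Requires at least two distinct x values and non-constant y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Toy data: parse time doubling with each extra word is a straight line
# in log space, so R² is 1 (up to floating-point rounding).
lengths = [1, 2, 3, 4]
times = [1.0, 2.0, 4.0, 8.0]
print(r_squared(lengths, [math.log(t) for t in times]))
```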

[Figure: Median parsing times per sentence. x-axis: sentence length; y-axis: median time (seconds)]
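Medians per sentence length are less sensitive than means to the few very slow sentences visible in the log-time plot. A minimal sketch, assuming the measurements are available as (sentence length, seconds) pairs:

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def median_time_by_length(runs: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """`runs` holds (sentence_length, parse_time_seconds) pairs."""
    by_length: Dict[int, List[float]] = defaultdict(list)
    for length, seconds in runs:
        by_length[length].append(seconds)
    return {length: median(ts) for length, ts in sorted(by_length.items())}


# Toy data: one slow outlier barely moves the median for length 5.
runs = [(5, 1.0), (5, 3.0), (5, 900.0), (6, 2.0), (6, 4.0)]
print(median_time_by_length(runs))  # {5: 3.0, 6: 3.0}
```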

Hypothesis
• There is a large variability in parse times.
• The typical increase in parse time with sentence length is not observed.
• Can a sentence predict its own parsing time?
• Hypothesis: the number of lexicalized trees selected by each sentence predicts its parsing time.
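Under this hypothesis the predictor is simple to compute: sum, over the tokens of the sentence, the number of lexicalized trees each word selects. The lexicon and template names below are hypothetical. Unknown words are counted as selecting no trees, which a real parser would need to handle separately.

```python
from typing import Dict, List, Set


def trees_selected(sentence: List[str], lexicon: Dict[str, Set[str]]) -> int:
    """Total lexicalized trees selected by a sentence: the sum, over tokens,
    of the number of trees each word anchors. Unknown words count as 0 here."""
    return sum(len(lexicon.get(word, ())) for word in sentence)


# Hypothetical lexicon; real entries would come from the extracted grammar.
lexicon = {
    "Ms.": {"t_title"},
    "Haag": {"t_proper_noun"},
    "plays": {"t_transitive", "t_intransitive", "t_noun"},
    "Elianti": {"t_proper_noun"},
}
print(trees_selected("Ms. Haag plays Elianti".split(), lexicon))  # 6
```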

[Figure: The impact of syntactic lexical ambiguity on parsing times. x-axis: total number of trees selected by a sentence; y-axis: log(time taken) in seconds]
R² = 0.82 (previously 0.65)

Hypothesis
• To test the hypothesis further we did the following tests:
  – Check the time taken when an oracle gives us the single correct tree for each word.
  – Check the time taken when parsing is based on the output of an n-best SuperTagger.
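Both conditions amount to shrinking each word's candidate set before parsing. A minimal sketch, assuming a hypothetical SuperTagger output format of (tree, score) pairs per word; the slides do not describe the SuperTagger's interface or scoring.

```python
from typing import List, Tuple

# Hypothetical SuperTagger output: for each word, candidate (tree, score) pairs.
Candidates = List[List[Tuple[str, float]]]


def n_best_filter(tagged: Candidates, n: int) -> List[List[str]]:
    """Keep only the n highest-scoring trees for each word before parsing."""
    return [
        [tree for tree, _ in sorted(cands, key=lambda c: c[1], reverse=True)[:n]]
        for cands in tagged
    ]


def oracle_filter(gold_trees: List[str]) -> List[List[str]]:
    """Oracle condition: each word gets exactly its single correct tree."""
    return [[tree] for tree in gold_trees]


tagged = [[("t_a", 0.1), ("t_b", 0.7), ("t_c", 0.2)], [("t_d", 0.9)]]
print(n_best_filter(tagged, 2))       # [['t_b', 't_c'], ['t_d']]
print(oracle_filter(["t_b", "t_d"]))  # [['t_b'], ['t_d']]
```

With n = 60, each word keeps at most 60 candidate trees, while the oracle keeps exactly one.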

[Figure: Parse times when the parser gets the correct tree for each word in the sentence. x-axis: sentence length; y-axis: log(time taken in secs)]
Total time = 31.2 secs vs. 548K secs (original)

[Figure: Time taken by the parser after n-best SuperTagging (n = 60). x-axis: sentence length; y-axis: log(time in secs)]
Total time = 21K secs vs. 548K secs (original)
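Take the rounded totals at face value and assume they cover comparable sets of sentences. The reductions relative to the original run are then roughly:

```latex
\frac{548\,000\ \mathrm{s}}{31.2\ \mathrm{s}} \approx 1.76 \times 10^{4}
\qquad
\frac{548\,000\ \mathrm{s}}{21\,000\ \mathrm{s}} \approx 26
```

That is, about four orders of magnitude with the oracle and a factor of about 26 with n-best SuperTagging.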

Clausal Complexity
• The complexity of syntactic and semantic processing is related to the number of predicate-argument structures being computed for a given sentence.
• This notion of complexity can be measured using the number of clauses in the sentence.
• Does the number of clauses grow proportionally with sentence length?
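One way to obtain clause counts from the Penn Treebank is to count clause-level constituents in each bracketed tree. The sketch below hypothetically treats a clause as any node labeled S, SINV or SQ, after stripping function tags and indices. The exact criterion used for the plots is not given on the slides.

```python
from typing import List, Union

Tree = Union[str, List["Tree"]]

# Assumption: these Penn Treebank labels mark a clause. SBAR and SBARQ are
# excluded so a clause such as (SBAR (IN that) (S ...)) is counted once, via S.
CLAUSE_LABELS = {"S", "SINV", "SQ"}


def parse_brackets(text: str) -> Tree:
    """Read one Penn Treebank-style bracketed tree into nested lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Tree:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token
        node: List[Tree] = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # consume ")"
        return node

    return read()


def count_clauses(tree: Tree) -> int:
    if isinstance(tree, str):
        return 0
    if tree and isinstance(tree[0], str):
        # Strip function tags and indices, e.g. "S-TPC-1" -> "S".
        base = tree[0].split("-")[0].split("=")[0]
        own, children = (1 if base in CLAUSE_LABELS else 0), tree[1:]
    else:
        # Unlabeled wrapper bracket, as in "( (S ...) )".
        own, children = 0, tree
    return own + sum(count_clauses(child) for child in children)


simple = "( (S (NP-SBJ (NNP Ms.) (NNP Haag)) (VP (VBZ plays) (NP (NNP Elianti))) (. .)) )"
embedded = "(S (NP (PRP I)) (VP (VBD said) (SBAR (IN that) (S (NP (PRP he)) (VP (VBD left))))))"
print(count_clauses(parse_brackets(simple)), count_clauses(parse_brackets(embedded)))  # 1 2
```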

[Figure: Average number of clauses plotted against sentence length. x-axis: sentence length; y-axis: average number of clauses]
99.1% of sentences in the Penn Treebank contain 6 or fewer clauses.

[Figure: Standard deviation of clause number plotted against sentence length. x-axis: sentence length; y-axis: standard deviation of clause number]
The deviation increases for sentences longer than 50 words.
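The per-length statistics in the two plots, and the share of sentences with at most six clauses, can be computed from (sentence length, clause count) pairs. Whether the slides use the population or the sample standard deviation is not stated, so this minimal sketch assumes the population version (`pstdev`).

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def clause_stats_by_length(data: Sequence[Tuple[int, int]]) -> Dict[int, Tuple[float, float]]:
    """`data` holds (sentence_length, clause_count) pairs. Returns, per length,
    the mean clause count and its population standard deviation."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for length, clauses in data:
        groups[length].append(clauses)
    return {length: (mean(cs), pstdev(cs)) for length, cs in sorted(groups.items())}


def share_with_at_most(data: Sequence[Tuple[int, int]], k: int = 6) -> float:
    """Fraction of sentences with at most k clauses."""
    return sum(clauses <= k for _, clauses in data) / len(data)


data = [(10, 1), (10, 3), (20, 2), (20, 2), (20, 8)]
print(clause_stats_by_length(data))
print(share_with_at_most(data))  # 0.8
```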

[Figure: Variation in parse time against sentence length while identifying the number of clauses. Axes: sentence length; number of clauses; log(time taken in secs)]

[Figure: Variation in parse time against number of trees. Axes: number of trees selected; number of clauses; log(time taken in secs)]
The parser spends most of its time attaching modifiers.

Conclusions
• We explored two issues that affect parsing efficiency for LTAGs: syntactic lexical ambiguity and clausal complexity.
  – Parsing time for LTAGs is largely determined by the number of trees selected by a sentence.
  – The number of clauses does not grow proportionally with sentence length.
• Current work: incorporate these factors to improve parsing efficiency for LTAGs.
