Effective Self-Training for Parsing


  1. Effective Self-Training for Parsing. David McClosky (dmcc@cs.brown.edu), Brown Laboratory for Linguistic Information Processing (BLLIP). Joint work with Eugene Charniak and Mark Johnson. NAACL 2006.

  2. Parsing

  3. Parsing. “I need a sentence with ambiguity.”

  4. Parsing. “I need a sentence with ambiguity.” Parse tree: (S (NP (PRP I)) (VP (VBP need) (NP (NP (DT a) (NN sentence)) (PP (IN with) (NP (NN ambiguity))))) (. .))

  5. Parsing. s is a sentence, π is a parse tree. parse(s) = argmax over {π : yield(π) = s} of p(π | s)
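A minimal sketch of this definition in Python, not the actual parser: trees are represented here as nested (label, children...) tuples, and prob stands in for whatever model supplies p(π | s). A real parser searches the space of trees rather than enumerating candidates.

```python
def leaves(tree):
    """Return the terminal words of a tree given as nested (label, children...) tuples."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

def parse(sentence, candidates, prob):
    """parse(s) = argmax over trees whose yield equals s of p(tree | s)."""
    viable = [t for t in candidates if leaves(t) == list(sentence)]
    return max(viable, key=lambda t: prob(t, sentence))
```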

  6. Flow Chart

  7. Flow Chart

  8. n-best parsing. For “I need a sentence with ambiguity.”, two candidate parses:
      π1 (PP attached to the noun phrase): (S (NP (PRP I)) (VP (VBP need) (NP (NP a sentence) (PP with ambiguity))) (. .)), p(π1) = 7.25 × 10^-20
      π2 (PP attached to the verb phrase): (S (NP (PRP I)) (VP (VBP need) (NP a sentence) (PP with ambiguity)) (. .)), p(π2) = 7.05 × 10^-21

  9. Reranking Parsers. The best parse is not always ranked first, but the correct parse is often in the top 50. Rerankers rescore the parses from the n-best parser using more complex (not necessarily context-free) features. An oracle reranker on the Charniak parser’s 50-best lists can achieve over 95% f-score.
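A schematic sketch of the two ideas on this slide, with hypothetical scoring functions: a reranker picks the highest-scoring candidate from the n-best list, while an oracle picks the candidate closest to the gold tree and so upper-bounds what any reranker could achieve.

```python
# Sketch only: reranker_score and f_score stand in for the real MaxEnt reranker
# and the PARSEVAL f-score; nbest is the list of candidate trees for one sentence.

def rerank(nbest, reranker_score):
    """Return the candidate the reranker scores highest (may differ from the parser's 1-best)."""
    return max(nbest, key=reranker_score)

def oracle(nbest, gold_tree, f_score):
    """Return the candidate closest to the gold parse; its f-score bounds any reranker."""
    return max(nbest, key=lambda tree: f_score(tree, gold_tree))
```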

  10. Flow Chart

  11. Our reranking parser. Parser and reranker as described in Charniak and Johnson (ACL 2005), with new reranking features: a lexicalized context-free generative parser plus a maximum-entropy discriminative reranker. The new reranking features improve the reranking parser’s performance by 0.3% on section 23 over ACL 2005.

  12. Unlabeled data. Question: can we improve the reranking parser with cheap unlabeled data?

  13. Unlabeled data. Question: can we improve the reranking parser with cheap unlabeled data? Possible approaches: self-training; co-training; clustering n-grams and using the clusters as general classes of n-grams; improving the vocabulary, the n-gram language model, etc.

  14. Self-training. (1) Train a model from labeled data (train the reranking parser on WSJ). (2) Use the model to annotate unlabeled data (parse NANC with the model). (3) Combine the annotated data with the labeled training data (merge the WSJ training data with the parsed NANC data). (4) Train a new model from the combined data (train the reranking parser on WSJ + NANC data). Optional: repeat with the new model on more unlabeled data.
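A minimal sketch of this loop, assuming hypothetical train_parser and annotate functions standing in for training the reranking parser and parsing raw NANC text; the sketch follows the generic steps on the slide rather than the exact training pipeline.

```python
# Sketch of the self-training procedure: grow the training set with
# automatically parsed sentences and retrain.

def self_train(wsj_treebank, unlabeled_batches, train_parser, annotate, rounds=1):
    """Iteratively add automatically parsed sentences to the training data."""
    model = train_parser(wsj_treebank)                  # 1. train on labeled WSJ
    training_data = list(wsj_treebank)
    for batch in unlabeled_batches[:rounds]:
        auto_parsed = [annotate(model, sent) for sent in batch]   # 2. parse NANC
        training_data = training_data + auto_parsed               # 3. merge
        model = train_parser(training_data)                       # 4. retrain
    return model
```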

  15. Flow Chart

  16. Previous work. Parsing: Charniak (1997), confirmed by Steedman et al. (2003): insignificant improvement from self-training. Part-of-speech tagging: Clark et al. (2003): minor improvement or damage, depending on the amount of training data. Parser adaptation: Bacchiani et al. (2006): self-training on news data helps when training on the Brown corpus and parsing WSJ.

  17. Experiments (overview). How should we annotate the data (parser or reranking parser)? How much unlabeled data should we label? How should we combine the automatically annotated data with the true labeled data?

  18. Annotating unlabeled data. Parser (not reranking parser) f-scores on all sentences in section 22, broken down by which annotator labeled the added NANC sentences:
      Sentences added    Parser as annotator    Reranking parser as annotator
      0 (baseline)       90.3                   90.3
      50k                90.1                   90.7
      500k               90.0                   90.9
      1,000k             90.0                   90.8
      1,500k             90.0                   90.8
      2,000k                                    91.0

  19. Annotating unlabeled data. Reranking parser f-scores for all sentences, by WSJ section:
      Sentences added    Section 1    Section 22    Section 24
      0 (baseline)       91.8         92.1          90.5
      50k                91.8         92.4          90.8
      500k               92.0         92.4          90.9
      1,000k             92.1         92.2          91.3
      2,000k             92.2         92.0          91.3

  20. Weighting WSJ data. Wall Street Journal data is more reliable than the self-trained data, so we multiply each event in the Wall Street Journal data by a constant to give it a higher relative weight: events = c × events_WSJ + events_NANC. Increasing the WSJ weight tends to improve f-scores. Based on development data, our best model is WSJ × 5 + 1,750k sentences from NANC.
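A minimal sketch of the count combination, assuming the model's statistics reduce to simple event counts (the real parser's event space is more elaborate than a flat counter):

```python
from collections import Counter

def combine_counts(wsj_events, nanc_events, c=5):
    """Weight each WSJ event by c before adding NANC events: events = c * WSJ + NANC."""
    combined = Counter()
    for event, count in wsj_events.items():
        combined[event] += c * count
    for event, count in nanc_events.items():
        combined[event] += count
    return combined

# Example: an event seen 3 times in WSJ and 10 times in NANC counts as 5*3 + 10 = 25.
```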

  21. Evaluation on the test section. f-scores from all sentences in WSJ section 23:
      Model                          f (parser)    f (reranker)
      Charniak and Johnson (2005)    –             91.0
      Current baseline               89.7          91.3
      Self-trained                   91.0          92.1

  22. The Story So Far... Retraining the parser on its own output doesn’t help. Retraining the parser on the reranker’s output helps. Retraining the reranker on the reranker’s output doesn’t help.

  23. Analysis: global changes. Oracle f-scores increase, so the self-trained parser has greater potential:
      Model                1-best    10-best    50-best
      Baseline             89.0      94.0       95.9
      WSJ × 1 + 250k       89.8      94.6       96.2
      WSJ × 5 + 1,750k     90.4      94.8       96.4
    The average of log2(Pr(1-best) / Pr(50th-best)) increases from 12.0 (baseline parser) to 14.1 (self-trained parser).
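A small sketch of that statistic, assuming each sentence's 50-best list is given as parser probabilities sorted from most to least probable (a hypothetical input format):

```python
import math

def avg_log2_ratio(nbest_lists):
    """Average of log2(Pr(1-best) / Pr(50th-best)) over sentences.

    nbest_lists: one list of parse probabilities per sentence, sorted descending.
    """
    ratios = [math.log2(probs[0] / probs[-1]) for probs in nbest_lists if probs[-1] > 0]
    return sum(ratios) / len(ratios)

# Larger values mean the parser concentrates more probability mass on its top parse.
```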

  24. Sentence-level analysis. Four plots counting sentences that parse better, unchanged, or worse after self-training, broken down by sentence length, number of unknown words, number of CCs (coordinating conjunctions), and number of INs (prepositions).

  25. Effect of sentence length. Plot of the number of sentences (smoothed) improved, unchanged, or worsened by self-training, as a function of sentence length.

  26. The Goldilocks Effect™. The same sentence-length plot: the improvement is concentrated on sentences of intermediate length, neither the shortest nor the longest.

  27. . . . and . . . Plot of the number of sentences improved, unchanged, or worsened by self-training, as a function of the number of CCs (coordinating conjunctions).

  28. Ongoing work: parser adaptation (McClosky, Charniak, and Johnson, ACL 2006); sentence selection; clustering local trees; other ways of combining data.

  29. Conclusions. Self-training can improve on state-of-the-art parsing for the Wall Street Journal. Reranking parsers can self-train their first-stage parser. More analysis is needed to understand why reranking is necessary. The self-trained reranking parser is available from ftp://ftp.cs.brown.edu/pub/nlparser

  30. Acknowledgements. This work was supported by NSF grants LIS9720368 and IIS0095940, and by DARPA GALE contract HR0011-06-2-0001. Thanks to Michael Collins, Brian Roark, James Henderson, Miles Osborne, and the BLLIP team for their comments. Questions?
