Effective Self-Training for Parsing


  1. Effective Self-Training for Parsing. David McClosky (dmcc@cs.brown.edu), Brown Laboratory for Linguistic Information Processing (BLLIP). Joint work with Eugene Charniak and Mark Johnson. NAACL 2006.

  2. Parsing

  3. Parsing. “I need a sentence with ambiguity.”

  4. Parsing. “I need a sentence with ambiguity.” Parse tree: (S (NP (PRP I)) (VP (VBP need) (NP (NP (DT a) (NN sentence)) (PP (IN with) (NP (NN ambiguity))))) (. .))

  5. Parsing. s is a sentence, π is a parse tree. parse(s) = argmax over {π : yield(π) = s} of p(π | s)
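A minimal sketch of this definition in Python, not the actual parser: trees are represented here as nested (label, children...) tuples, and prob stands in for whatever model supplies p(π | s). A real parser searches the space of trees rather than enumerating candidates.

```python
def leaves(tree):
    """Return the terminal words of a tree given as nested (label, children...) tuples."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

def parse(sentence, candidates, prob):
    """parse(s) = argmax over trees whose yield equals s of p(tree | s)."""
    viable = [t for t in candidates if leaves(t) == list(sentence)]
    return max(viable, key=lambda t: prob(t, sentence))
```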

  6. Flow Chart

  7. Flow Chart

  8. n-best parsing. For “I need a sentence with ambiguity.”, two candidate parses:
      π1 (PP attached to the noun phrase): (S (NP (PRP I)) (VP (VBP need) (NP (NP a sentence) (PP with ambiguity))) (. .)), p(π1) = 7.25 × 10^-20
      π2 (PP attached to the verb phrase): (S (NP (PRP I)) (VP (VBP need) (NP a sentence) (PP with ambiguity)) (. .)), p(π2) = 7.05 × 10^-21

  9. Reranking Parsers. The best parse is not always ranked first, but the correct parse is often in the top 50. Rerankers rescore the parses from the n-best parser using more complex (not necessarily context-free) features. An oracle reranker on the Charniak parser’s 50-best lists can achieve over 95% f-score.
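A schematic sketch of the two ideas on this slide, with hypothetical scoring functions: a reranker picks the highest-scoring candidate from the n-best list, while an oracle picks the candidate closest to the gold tree and so upper-bounds what any reranker could achieve.

```python
# Sketch only: reranker_score and f_score stand in for the real MaxEnt reranker
# and the PARSEVAL f-score; nbest is the list of candidate trees for one sentence.

def rerank(nbest, reranker_score):
    """Return the candidate the reranker scores highest (may differ from the parser's 1-best)."""
    return max(nbest, key=reranker_score)

def oracle(nbest, gold_tree, f_score):
    """Return the candidate closest to the gold parse; its f-score bounds any reranker."""
    return max(nbest, key=lambda tree: f_score(tree, gold_tree))
```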

  10. Flow Chart

  11. Our reranking parser. Parser and reranker as described in Charniak and Johnson (ACL 2005), with new reranking features: a lexicalized context-free generative parser plus a maximum-entropy discriminative reranker. The new reranking features improve the reranking parser’s performance by 0.3% on section 23 over ACL 2005.

  12. Unlabeled data. Question: can we improve the reranking parser with cheap unlabeled data?

  13. Unlabeled data. Question: can we improve the reranking parser with cheap unlabeled data? Possible approaches: self-training; co-training; clustering n-grams and using the clusters as general classes of n-grams; improving the vocabulary, the n-gram language model, etc.

  14. Self-training. (1) Train a model from labeled data (train the reranking parser on WSJ). (2) Use the model to annotate unlabeled data (parse NANC with the model). (3) Combine the annotated data with the labeled training data (merge the WSJ training data with the parsed NANC data). (4) Train a new model from the combined data (train the reranking parser on WSJ + NANC data). Optional: repeat with the new model on more unlabeled data.
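A minimal sketch of this loop, assuming hypothetical train_parser and annotate functions standing in for training the reranking parser and parsing raw NANC text; the sketch follows the generic steps on the slide rather than the exact training pipeline.

```python
# Sketch of the self-training procedure: grow the training set with
# automatically parsed sentences and retrain.

def self_train(wsj_treebank, unlabeled_batches, train_parser, annotate, rounds=1):
    """Iteratively add automatically parsed sentences to the training data."""
    model = train_parser(wsj_treebank)                  # 1. train on labeled WSJ
    training_data = list(wsj_treebank)
    for batch in unlabeled_batches[:rounds]:
        auto_parsed = [annotate(model, sent) for sent in batch]   # 2. parse NANC
        training_data = training_data + auto_parsed               # 3. merge
        model = train_parser(training_data)                       # 4. retrain
    return model
```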

  15. Flow Chart

  16. Previous work. Parsing: Charniak (1997), confirmed by Steedman et al. (2003): insignificant improvement from self-training. Part-of-speech tagging: Clark et al. (2003): minor improvement or damage, depending on the amount of training data. Parser adaptation: Bacchiani et al. (2006): self-training on news data helps when training on the Brown corpus and parsing WSJ.

  17. Experiments (overview). How should we annotate the data (parser or reranking parser)? How much unlabeled data should we label? How should we combine the automatically annotated data with the true labeled data?

  18. Annotating unlabeled data. Parser (not reranking parser) f-scores on all sentences in section 22, broken down by which annotator labeled the added NANC sentences:
      Sentences added    Parser as annotator    Reranking parser as annotator
      0 (baseline)       90.3                   90.3
      50k                90.1                   90.7
      500k               90.0                   90.9
      1,000k             90.0                   90.8
      1,500k             90.0                   90.8
      2,000k                                    91.0

  19. Annotating unlabeled data. Reranking parser f-scores for all sentences, by WSJ section:
      Sentences added    Section 1    Section 22    Section 24
      0 (baseline)       91.8         92.1          90.5
      50k                91.8         92.4          90.8
      500k               92.0         92.4          90.9
      1,000k             92.1         92.2          91.3
      2,000k             92.2         92.0          91.3

  20. Weighting WSJ data. Wall Street Journal data is more reliable than the self-trained data, so we multiply each event in the Wall Street Journal data by a constant to give it a higher relative weight: events = c × events_WSJ + events_NANC. Increasing the WSJ weight tends to improve f-scores. Based on development data, our best model is WSJ × 5 + 1,750k sentences from NANC.
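A minimal sketch of the count combination, assuming the model's statistics reduce to simple event counts (the real parser's event space is more elaborate than a flat counter):

```python
from collections import Counter

def combine_counts(wsj_events, nanc_events, c=5):
    """Weight each WSJ event by c before adding NANC events: events = c * WSJ + NANC."""
    combined = Counter()
    for event, count in wsj_events.items():
        combined[event] += c * count
    for event, count in nanc_events.items():
        combined[event] += count
    return combined

# Example: an event seen 3 times in WSJ and 10 times in NANC counts as 5*3 + 10 = 25.
```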

  21. Evaluation on the test section. f-scores from all sentences in WSJ section 23:
      Model                          f (parser)    f (reranker)
      Charniak and Johnson (2005)    –             91.0
      Current baseline               89.7          91.3
      Self-trained                   91.0          92.1

  22. The Story So Far... Retraining the parser on its own output doesn’t help. Retraining the parser on the reranker’s output helps. Retraining the reranker on the reranker’s output doesn’t help.

  23. Analysis: global changes. Oracle f-scores increase, so the self-trained parser has greater potential:
      Model                1-best    10-best    50-best
      Baseline             89.0      94.0       95.9
      WSJ × 1 + 250k       89.8      94.6       96.2
      WSJ × 5 + 1,750k     90.4      94.8       96.4
    The average of log2(Pr(1-best) / Pr(50th-best)) increases from 12.0 (baseline parser) to 14.1 (self-trained parser).
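A small sketch of that statistic, assuming each sentence's 50-best list is given as parser probabilities sorted from most to least probable (a hypothetical input format):

```python
import math

def avg_log2_ratio(nbest_lists):
    """Average of log2(Pr(1-best) / Pr(50th-best)) over sentences.

    nbest_lists: one list of parse probabilities per sentence, sorted descending.
    """
    ratios = [math.log2(probs[0] / probs[-1]) for probs in nbest_lists if probs[-1] > 0]
    return sum(ratios) / len(ratios)

# Larger values mean the parser concentrates more probability mass on its top parse.
```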

  24. Sentence-level analysis. Four plots counting sentences that parse better, unchanged, or worse after self-training, broken down by sentence length, number of unknown words, number of CCs (coordinating conjunctions), and number of INs (prepositions).

  25. Effect of sentence length. Plot of the number of sentences (smoothed) improved, unchanged, or worsened by self-training, as a function of sentence length.

  26. The Goldilocks Effect™. The same sentence-length plot: the improvement is concentrated on sentences of intermediate length, neither the shortest nor the longest.

  27. . . . and . . . Plot of the number of sentences improved, unchanged, or worsened by self-training, as a function of the number of CCs (coordinating conjunctions).

  28. Ongoing work: parser adaptation (McClosky, Charniak, and Johnson, ACL 2006); sentence selection; clustering local trees; other ways of combining data.

  29. Conclusions. Self-training can improve on state-of-the-art parsing for the Wall Street Journal. Reranking parsers can self-train their first-stage parser. More analysis is needed to understand why reranking is necessary. The self-trained reranking parser is available from ftp://ftp.cs.brown.edu/pub/nlparser

  30. Acknowledgements. This work was supported by NSF grants LIS9720368 and IIS0095940, and by DARPA GALE contract HR0011-06-2-0001. Thanks to Michael Collins, Brian Roark, James Henderson, Miles Osborne, and the BLLIP team for their comments. Questions?
