

  1. [Slide 1 of 12] Title
     Bootstrapping Dependency Grammars from Sentence Fragments via Austere Models
     Valentin I. Spitkovsky, with Daniel Jurafsky (Stanford University) and Hiyan Alshawi (Google Inc.)
     ICGI (2012-09-07)

  2. [Slide 2 of 12] Introduction: Unsupervised Learning
     Why do unsupervised learning? One practical reason:
       ◮ got lots of potentially useful data!
       ◮ but more than would be feasible to annotate...
     Yet grammar inducers use less data than supervised parsers:
       ◮ most systems train on WSJ10 (or, more recently, WSJ15)
       ◮ WSJ10 has approximately 50K tokens (5% of WSJ's 1M) in just 7K sentences (WSJ15's 16K sentences cover 160K tokens)
     Long sentences are hard; shorter inputs can be easier:
       ◮ better chances of guessing larger fractions of correct trees
       ◮ preference for more local structures (Smith and Eisner, 2006)
       ◮ faster training, etc. (a rich history going back to Elman, 1993)
     ... could we "start small" and use more data? (a length-filter sketch follows this slide)
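The WSJ10 and WSJ15 subsets mentioned above are simply length-filtered slices of the treebank: sentences with at most 10 (or 15) tokens, punctuation usually excluded. A minimal sketch of such a filter, assuming sentences come as lists of (word, POS-tag) pairs and a Penn-Treebank-style punctuation tag set (the exact conventions vary across papers; this is not the authors' code):

    # Hedged sketch (not from the paper): keep sentences with at most
    # `max_len` non-punctuation tokens, in the spirit of WSJ10 / WSJ15.
    PUNCT_TAGS = {".", ",", ":", "``", "''", "-LRB-", "-RRB-"}  # assumed tag set

    def length_filtered(corpus, max_len=10):
        """Yield sentences whose non-punctuation token count is <= max_len."""
        for sentence in corpus:
            content = [w for w, tag in sentence if tag not in PUNCT_TAGS]
            if len(content) <= max_len:
                yield sentence

    # Toy usage: only the first sentence survives a WSJ10-style cut.
    corpus = [
        [("Stocks", "NNS"), ("fell", "VBD"), (".", ".")],
        [("word", "NN")] * 12,
    ]
    print(len(list(length_filtered(corpus))))  # -> 1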

  3. [Slide 3 of 12] Introduction: Previous Work
     How have long inputs been handled previously? Very carefully...
       ◮ Viterbi training (tolerates the models' bad independence assumptions)
       ◮ + punctuation-induced constraints (partial bracketing: Pereira and Schabes, 1992)
       ◮ = punctuation-constrained Viterbi training (a constraint-checking sketch follows this slide)
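One way to picture punctuation-constrained Viterbi training: run ordinary hard-EM (Viterbi) training, but restrict the parser to dependency trees that respect brackets induced by punctuation. The check below is a simplified, hedged stand-in for such a constraint (each bracketed span may attach to the rest of the sentence through exactly one word), not necessarily the exact condition used in the work cited above:

    # Hedged sketch: is a candidate dependency tree compatible with
    # punctuation-induced brackets?  Simplified constraint for illustration:
    # each bracketed span must hang off the rest of the tree by a single arc.
    #
    # heads[i] is the index of token i's head, or None for the root.
    # brackets is a list of (start, end) token spans, end exclusive.

    def respects_brackets(heads, brackets):
        for start, end in brackets:
            external = sum(
                1
                for i in range(start, end)
                if heads[i] is None or not (start <= heads[i] < end)
            )
            if external != 1:   # zero or multiple escaping arcs violate the bracket
                return False
        return True

    # Toy usage: tokens 1-3 form a bracketed fragment whose internal head is token 1.
    heads = [None, 0, 1, 1, 0]                 # token 0 is the sentence root
    print(respects_brackets(heads, [(1, 4)]))  # -> True

In a real trainer a constraint like this would be folded into the chart parser, pruning items during Viterbi decoding rather than filtering complete trees.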

  4. [Slide 4 of 12] Introduction: Example
     Punctuation (Spitkovsky et al., 2011)
     [ SBAR Although it probably has reduced the level of expenditures for some purchasers ] ,
     (a fragment-splitting sketch follows this slide)
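The comma after the bracketed SBAR is exactly the sort of cue being exploited: punctuation splits a sentence into inter-punctuation fragments that tend to align with phrase boundaries. A small sketch of that splitting step, under an assumed tokenization and punctuation list (not the paper's code):

    # Hedged sketch: split a tokenized sentence at punctuation into
    # inter-punctuation fragments. Tokenization and the punctuation set
    # are assumptions made for illustration.
    PUNCT = {",", ".", ";", ":", "!", "?", "``", "''", "(", ")", "--"}

    def fragments(tokens):
        """Return maximal runs of non-punctuation tokens."""
        out, run = [], []
        for tok in tokens:
            if tok in PUNCT:
                if run:
                    out.append(run)
                    run = []
            else:
                run.append(tok)
        if run:
            out.append(run)
        return out

    tokens = ("Although it probably has reduced the level of expenditures "
              "for some purchasers ,").split()
    print(fragments(tokens))
    # -> [['Although', 'it', 'probably', 'has', 'reduced', 'the', 'level',
    #      'of', 'expenditures', 'for', 'some', 'purchasers']]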
