Weakly-Supervised Bayesian Learning of a CCG Supertagger (Dan Garrette, Chris Dyer, Jason Baldridge, Noah A. Smith)


  1. Weakly-Supervised Bayesian Learning of a CCG Supertagger Dan Garrette, Chris Dyer, Jason Baldridge, Noah A. Smith

  2. Type-Level Supervision

  3. Type-Level Supervision • Unannotated text • Incomplete tag dictionary: word ↦ {tags}
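As a rough illustration of this setup (not from the talk; the words, tags, and names below are made up for the example), the supervision amounts to raw sentences plus a partial word-to-tags map. A minimal Python sketch:

# Type-level supervision: no labeled sentences, only raw text plus an
# incomplete tag dictionary from word types to candidate supertags.
# All entries below are illustrative placeholders.

raw_corpus = [
    ["the", "lazy", "dogs", "wander"],
    ["opponents", "buy", "arguments"],
]

tag_dictionary = {
    "the":    {"np/n"},
    "lazy":   {"n/n"},
    "dogs":   {"n", "np"},
    "wander": {"s\\np"},
    "buy":    {"(s\\np)/np"},
}

def candidate_tags(word, all_tags):
    # Words missing from the incomplete dictionary may take any known tag.
    return tag_dictionary.get(word, all_tags)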

  4. Type-Level Supervision Used for POS tagging for 20+ years [Kupiec, 1992] [Merialdo, 1994]

  5. Type-Level Supervision Good POS tagger performance even with low supervision [Das & Petrov 2011] [Garrette & Baldridge 2013] [Garrette et al. 2013]

  6. Combinatory Categorial Grammar (CCG)

  7. CCG Every word token is associated with a category; categories combine to form the categories of constituents [Steedman, 2000] [Steedman and Baldridge, 2011]

  8. CCG Example derivation for "the dog": the := np/n, dog := n; np/n n ⇒ np

  9. CCG Example derivation for "dogs sleep": dogs := np, sleep := s\np; np s\np ⇒ s

  10. POS vs. Supertags For "the dog sleeps": POS parse [S [NP the/DT dog/NN] [VP sleeps/VBZ]] vs. CCG supertags the := np/n, dog := n, sleeps := s\np (combining to np and then s)

  11. Supertagging Type-supervised learning for supertagging is much more difficult than for POS tagging: the Penn Treebank POS tag set has 48 tags, while CCGBank has 1,239 supertags

  12. CCG The grammar formalism itself can be used to guide learning

  13. CCG Supertagging

  14. CCG Supertagging • Sequence tagging problem, like POS-tagging • Building block for grammatical parsing

  15. Supertagging “almost parsing” [Bangalore and Joshi 1999]

  16. Why Supertagging? Supertags for "the lazy dog sleeps": the := np/n, lazy := n/n, dog := n, sleeps := s\np

  17. Why Supertagging? Those supertags already determine the parse: n/n n ⇒ n, np/n n ⇒ np, np s\np ⇒ s

  18–20. CCG Supertagging [figures: selecting a supertag for each word of "the lazy dog sleeps" from its candidate tags, arriving at np/n n/n n s\np]

  21. CCG Supertagging the := np/n, lazy := ?, dog := n: which supertag should "lazy" receive?

  22. Principle #1: Prefer Connections In "the lazy dog" (the := np/n, dog := n), candidate tags for "lazy" that cannot connect with their neighbors (e.g. np) are dispreferred; favor tags that combine with the adjacent categories.
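A minimal Python sketch of the connectivity idea, assuming plain forward/backward application only (the model in the talk may allow further ways for categories to combine); parse_cat, can_combine, and _matching are hypothetical helper names introduced here:

# "Prefer Connections": a candidate supertag is more plausible when it can
# combine with its neighbors. Categories are parsed into nested tuples
# (slash, result, argument); atoms stay as strings.

def _matching(s, i):
    # Index of the ')' matching the '(' at position i, or -1.
    depth = 0
    for j in range(i, len(s)):
        if s[j] == "(":
            depth += 1
        elif s[j] == ")":
            depth -= 1
            if depth == 0:
                return j
    return -1

def parse_cat(s):
    # Parse a supertag string such as '(s\\np)/np'.
    while s.startswith("(") and _matching(s, 0) == len(s) - 1:
        s = s[1:-1]                       # drop parens spanning the whole string
    depth, split = 0, None
    for i, c in enumerate(s):
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        elif c in "/\\" and depth == 0:
            split = i                     # rightmost top-level slash is the head
    if split is None:
        return s                          # atomic category, e.g. 'np'
    return (s[split], parse_cat(s[:split]), parse_cat(s[split + 1:]))

def can_combine(left, right):
    # Forward application X/Y Y -> X, or backward application Y X\Y -> X.
    fwd = isinstance(left, tuple) and left[0] == "/" and left[2] == right
    bwd = isinstance(right, tuple) and right[0] == "\\" and right[2] == left
    return fwd or bwd

# n/n ("lazy") connects with the following n ("dog"); a tag like s\np would not.
assert can_combine(parse_cat("n/n"), parse_cat("n"))
assert not can_combine(parse_cat("s\\np"), parse_cat("n"))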

  23. Supertags vs. POS For "the dog sleeps", the supertags (np/n, n, s\np) encode universal, intrinsic grammar properties, whereas with POS tags (DT, NN, VBZ) all relationships between tags must be learned

  24. Principle #2: Prefer Simplicity In "the lazy dog" (the := np/n, dog := n), a complex candidate for "lazy" such as (np\(np/n))/n can also connect, but the simpler tag should be preferred.

  25. Prefer Simplicity buy := (s[b]\np)/np appears 342 times in CCGbank, e.g. "Opponents don't buy such arguments." buy := (((s[b]\np)/pp)/pp)/np appears once: "Tele-Communications agreed to buy half of Showtime Networks from Viacom for $225 million."

  26. Weighted Tag Grammar
      a ∈ {s, np, n, …}:   p_atom(a) × p_term
      A → B / B:   (1 − p_term) × p_fwd × p_mod
      A → B / C:   (1 − p_term) × p_fwd × (1 − p_mod)
      A → B \ B:   (1 − p_term) × (1 − p_fwd) × p_mod
      A → B \ C:   (1 − p_term) × (1 − p_fwd) × (1 − p_mod)
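A sketch of how such a weighted grammar can assign a prior to any category recursively. The parameter values are invented for illustration, and the exact placement of the complement factors is my reading of the slide rather than the talk's definition; categories use the same (slash, result, argument) tuple encoding as the sketch above.

# Recursive category prior from the weighted tag grammar: atomic and
# modifier-like categories receive most of the mass, long categories little.
# Parameter values are illustrative, not from the talk.

P_TERM = 0.7                                   # probability of stopping at an atom
P_FWD  = 0.5                                   # forward vs. backward slash
P_MOD  = 0.6                                   # modifier (B/B, B\B) vs. B/C
P_ATOM = {"s": 0.3, "np": 0.3, "n": 0.3, "pp": 0.1}

def category_prior(cat):
    if isinstance(cat, str):                   # atomic: a in {s, np, n, ...}
        return P_TERM * P_ATOM.get(cat, 1e-6)
    slash, result, arg = cat
    p = 1.0 - P_TERM
    p *= P_FWD if slash == "/" else 1.0 - P_FWD
    if result == arg:                          # modifier, e.g. n/n
        p *= P_MOD * category_prior(result)
    else:
        p *= (1.0 - P_MOD) * category_prior(result) * category_prior(arg)
    return p

# Prefer Simplicity: the ordinary transitive category for "buy" gets far more
# prior mass than the rare four-argument one from slide 25.
simple = ("/", ("\\", "s", "np"), "np")                               # (s\np)/np
rare   = ("/", ("/", ("/", ("\\", "s", "np"), "pp"), "pp"), "np")     # (((s\np)/pp)/pp)/np
assert category_prior(simple) > category_prior(rare)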

  27. CCG Supertagging Candidate supertags over "the lazy dog": the := np/n, dog := n; candidates for "lazy" include n/n, np, and (np\(np/n))/n

  28. HMM Transition Prior P(t → u) = λ · P(u) + (1 − λ) · P_conn(t → u), where the first term rewards simple categories ("simple is good") and the second rewards categories that can combine with the previous one ("connecting is good")
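A sketch of that mixture, reusing the parse_cat, can_combine, and category_prior helpers from the earlier sketches; the normalization choices and the value of λ here are illustrative assumptions, not the talk's exact definitions.

# The transition prior mixes the two principles: with weight lam, prefer
# simple next categories (the category prior); with weight 1 - lam, prefer
# next categories that can connect with the previous one.

def transition_prior(prev_cat, next_cat, candidates, lam=0.5):
    # "simple is good": category prior over u, renormalized over the candidates
    z = sum(category_prior(c) for c in candidates)
    p_simple = category_prior(next_cat) / z
    # "connecting is good": uniform over candidates that combine with prev_cat
    connecting = [c for c in candidates if can_combine(prev_cat, c)]
    p_connect = 1.0 / len(connecting) if next_cat in connecting else 0.0
    return lam * p_simple + (1.0 - lam) * p_connect

# Example: after "the" (np/n), the simple and connecting n wins out.
cands = [parse_cat(c) for c in ["n", "np", "s\\np", "(np\\(np/n))/n"]]
best = max(cands, key=lambda u: transition_prior(parse_cat("np/n"), u, cands))
assert best == "n"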

  29. Type-Supervised Learning Inputs: an unlabeled corpus and a tag dictionary (the same supervision as for type-supervised POS tagging), plus the universal properties of the CCG formalism

  30. Training

  31. Posterior Inference Forward-Filter Backward-Sample (FFBS) [Carter and Kohn, 1996]
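For readers unfamiliar with FFBS, here is a compact, generic sketch for a discrete HMM. It is illustrative only: the talk's sampler operates over the Bayesian model's supertag HMM, with each word's tags restricted by the tag dictionary, and all inputs below are stand-ins.

import random

def ffbs(words, tags, init, trans, emit):
    # Forward-Filter Backward-Sample: draw one tag sequence from the exact
    # posterior p(tags | words) of an HMM. init[t], trans[t][u], emit[t][w]
    # are probabilities supplied by the caller.
    n = len(words)
    # forward filtering: alpha[i][t] = p(words[0..i], tag_i = t)
    alpha = [{} for _ in range(n)]
    for t in tags:
        alpha[0][t] = init[t] * emit[t].get(words[0], 0.0)
    for i in range(1, n):
        for u in tags:
            alpha[i][u] = emit[u].get(words[i], 0.0) * sum(
                alpha[i - 1][t] * trans[t][u] for t in tags)

    def draw(weights):
        # sample a key proportionally to its weight
        r, acc = random.random() * sum(weights.values()), 0.0
        for t, w in weights.items():
            acc += w
            if r <= acc:
                return t
        return t

    # backward sampling: tag_n ~ alpha[n]; tag_i ~ alpha[i][t] * trans[t][tag_{i+1}]
    sample = [None] * n
    sample[-1] = draw(alpha[-1])
    for i in range(n - 2, -1, -1):
        sample[i] = draw({t: alpha[i][t] * trans[t][sample[i + 1]] for t in tags})
    return sample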

  32. Posterior Inference [figure: the inputs are unlabeled data (sentences such as "the lazy dogs wander") and a tag dictionary mapping words to candidate supertags such as np, n/n, np/n, (s\np)/np, s\np, …]

  33–37. Posterior Inference [figures: the priors and each word's candidate supertags from the tag dictionary feed the HMM, from which supertag sequences for the example sentence are sampled]

  38. Experiments

  39. Baldridge 2008 (baseline): uses universal properties of CCG to initialize EM • Simpler definition of category complexity • No corpus-specific information

  40. English Supertagging [bar chart: supertagging accuracy vs. tag dictionary pruning cutoff (0.1, 0.01, 0.001, none) for Baldridge '08 vs. ours]

  41. Chinese Supertagging [bar chart: supertagging accuracy vs. tag dictionary pruning cutoff (0.1, 0.01, 0.001, none) for Baldridge '08 vs. ours]

  42. Italian Supertagging [bar chart: supertagging accuracy vs. tag dictionary pruning cutoff (0.1, 0.01, 0.001, none) for Baldridge '08 vs. ours]

  43. Code Available GitHub repository linked from my website

  44. Conclusion Combining annotation exploitation with universal grammatical knowledge yields good models from weak supervision
