Weakly-Supervised Bayesian Learning of a CCG Supertagger Dan Garrette, Chris Dyer, Jason Baldridge, Noah A. Smith
Type-Level Supervision
Type-Level Supervision • Unannotated text • Incomplete tag dictionary: word ↦ {tags}
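To make this form of supervision concrete, a tag dictionary is just a partial map from word types to their allowed tags. A minimal Python sketch (all entries and the fallback tag set are made up for illustration; this is not data from the paper):

```python
# Toy incomplete tag dictionary: word type -> set of allowed CCG supertags.
# Entries and the fallback tag set are hypothetical, for illustration only.
tag_dictionary = {
    "the":    {"np/n"},
    "lazy":   {"n/n"},
    "dog":    {"n", "np"},
    "sleeps": {"s\\np"},
}

full_tag_set = {"np/n", "n/n", "n", "np", "s\\np", "(s\\np)/np"}

def allowed_tags(word):
    """Candidate supertags for a token: its dictionary entry if present, else all tags."""
    return tag_dictionary.get(word, full_tag_set)

print(allowed_tags("dog"))     # constrained by the dictionary
print(allowed_tags("wander"))  # unknown word: falls back to the full tag set
```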
Type-Level Supervision Used for POS tagging for 20+ years [Kupiec, 1992] [Merialdo, 1994]
Type-Level Supervision Good POS tagger performance even with low supervision [Das & Petrov 2011] [Garrette & Baldridge 2013] [Garrette et al. 2013]
Combinatory Categorial Grammar (CCG)
CCG   Every word token is associated with a category; categories combine to form the categories of constituents. [Steedman, 2000] [Steedman and Baldridge, 2011]
CCG   [Derivation: the:np/n  dog:n  ⇒  the dog:np]
CCG   [Derivation: dogs:np  sleep:s\np  ⇒  dogs sleep:s]
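To make the combination step concrete, here is a minimal Python sketch that reproduces the two derivations above via forward and backward application. The Atom/Slash classes and the combine() helper are hypothetical names for illustration, not the authors' implementation:

```python
# Minimal sketch of CCG categories and function application (hypothetical names).

class Atom:
    """An atomic category such as np, n, or s."""
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return isinstance(other, Atom) and self.name == other.name
    def __repr__(self):
        return self.name

class Slash:
    """A complex category: result/arg (forward) or result\\arg (backward)."""
    def __init__(self, result, direction, arg):
        self.result, self.direction, self.arg = result, direction, arg
    def __eq__(self, other):
        return (isinstance(other, Slash) and self.result == other.result
                and self.direction == other.direction and self.arg == other.arg)
    def __repr__(self):
        return f"({self.result}{self.direction}{self.arg})"

def combine(left, right):
    """Forward application (X/Y Y => X) and backward application (Y X\\Y => X)."""
    if isinstance(left, Slash) and left.direction == "/" and left.arg == right:
        return left.result
    if isinstance(right, Slash) and right.direction == "\\" and right.arg == left:
        return right.result
    return None  # the two categories do not combine

np_, n, s = Atom("np"), Atom("n"), Atom("s")
print(combine(Slash(np_, "/", n), n))     # the:np/n + dog:n      => np
print(combine(np_, Slash(s, "\\", np_)))  # dogs:np  + sleep:s\np => s
```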
POS vs. Supertags   [POS parse: [S [NP the/DT dog/NN] [VP sleeps/VBZ]]   vs.   CCG derivation: the:np/n  dog:n  sleeps:s\np  ⇒  s]
Supertagging   Type-supervised learning for supertagging is much more difficult than for POS:   Penn Treebank POS: 48 tags   vs.   CCGBank supertags: 1,239 tags
CCG The grammar formalism itself can be used to guide learning
CCG Supertagging
CCG Supertagging • Sequence tagging problem, like POS-tagging • Building block for grammatical parsing
Supertagging “almost parsing” [Bangalore and Joshi 1999]
Why Supertagging?   [the:np/n  lazy:n/n  dog:n  sleeps:s\np]
Why Supertagging?   [Full derivation: the:np/n  lazy:n/n  dog:n  sleeps:s\np  ⇒  s   (n/n n ⇒ n,  np/n n ⇒ np,  np s\np ⇒ s)]
CCG Supertagging   [Animation: the derivation for “the lazy dog sleeps” is stripped down to just its supertag row: the:np/n  lazy:n/n  dog:n  sleeps:s\np]
CCG Supertagging   the:np/n   lazy: ?   dog:n   (which category should “lazy” receive?)
Principle #1: Prefer Connections   [the:np/n   lazy:np   dog:n  ✗ ✗  (np does not combine with either neighbor)]
Supertags vs. POS   [CCG: the:np/n  dog:n  sleeps:s\np  ⇒  s  (universal, intrinsic grammar properties)   vs.   POS: the/DT  dog/NN  sleeps/VBZ  (all relationships must be learned)]
Principle #2: Prefer Simplicity   [the:np/n   lazy:(np\(np/n))/n   dog:n  (connects, but needlessly complex)]
Prefer Simplicity   buy := (s_b\np)/np appears 342 times in CCGbank, e.g. “Opponents don't buy such arguments.”   buy := (((s_b\np)/pp)/pp)/np appears once: “Tele-Communications agreed to buy half of Showtime Networks from Viacom for $225 million.”
Weighted Tag Grammar
  a ∈ {s, np, n, …}    p_atom(a) × p_term
  A → B / B            (1 − p_term) × p_fwd × p_mod
  A → B / C            (1 − p_term) × p_fwd × (1 − p_mod)
  A → B \ B            (1 − p_term) × (1 − p_fwd) × p_mod
  A → B \ C            (1 − p_term) × (1 − p_fwd) × (1 − p_mod)
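A minimal sketch of a category prior implied by such a weighted tag grammar. Applying the factorization recursively to sub-categories is an assumption of this sketch, categories are encoded as nested tuples, and all parameter values are made up for illustration:

```python
# Sketch of a category prior implied by the weighted tag grammar above.
# An atomic category is a string; a complex category is a (result, slash, arg) tuple.
# All parameter values are illustrative, not the ones used in the paper.

p_term = 0.7                                # stop at an atomic category
p_fwd  = 0.5                                # forward vs. backward slash
p_mod  = 0.8                                # modifier (B/B, B\B) vs. non-modifier
p_atom = {"s": 0.3, "np": 0.4, "n": 0.3}    # distribution over atoms

def cat_prior(cat):
    """Recursive weight: simpler categories receive higher probability."""
    if isinstance(cat, str):                # atomic category a
        return p_atom[cat] * p_term
    result, slash, arg = cat
    p = 1.0 - p_term
    p *= p_fwd if slash == "/" else (1.0 - p_fwd)
    if result == arg:                       # modifier category B/B or B\B
        p *= p_mod * cat_prior(result)
    else:                                   # non-modifier B/C or B\C
        p *= (1.0 - p_mod) * cat_prior(result) * cat_prior(arg)
    return p

print(cat_prior("n"))                                          # n
print(cat_prior(("n", "/", "n")))                              # n/n
print(cat_prior((("np", "\\", ("np", "/", "n")), "/", "n")))   # (np\(np/n))/n, much smaller
```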
CCG Supertagging   the:np/n   lazy: {np,  (np\(np/n))/n,  n/n}   dog:n   (n/n both connects and is simple)
HMM Transition Prior   P(t → u) = λ · P_grammar(u) + (1 − λ) · P_connect(t → u)   [λ term: simple is good;  (1 − λ) term: connecting is good]
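A sketch of this mixed transition prior under simplifying assumptions: the simplicity term is a fixed toy distribution standing in for the weighted tag grammar, “connecting” is restricted to forward/backward application over flat category strings, and λ, the tag set, and all numbers are illustrative:

```python
# Sketch of the grammar-informed transition prior
#   P(t -> u) = lam * P_grammar(u) + (1 - lam) * P_connect(t -> u)
# Categories are plain strings; combinability is limited to application for brevity.

lam = 0.5
tag_set = ["np/n", "n/n", "n", "np", "s\\np"]

# Toy stand-in for the weighted-tag-grammar prior over tag_set, already normalized.
p_grammar = {"np/n": 0.3, "n/n": 0.3, "n": 0.2, "np": 0.15, "s\\np": 0.05}

def combines(t, u):
    """True if t u can combine by forward or backward application (flat categories only)."""
    if "/" in t and t.rsplit("/", 1)[1] == u:    # t = X/u
        return True
    if "\\" in u and u.rsplit("\\", 1)[1] == t:  # u = X\t
        return True
    return False

def transition_prior(t):
    """Prior over the next tag u given the current tag t."""
    connecting = [u for u in tag_set if combines(t, u)]
    if not connecting:                           # fallback (an assumption): grammar term only
        return dict(p_grammar)
    prior = {}
    for u in tag_set:
        p_conn = 1.0 / len(connecting) if u in connecting else 0.0
        prior[u] = lam * p_grammar[u] + (1.0 - lam) * p_conn
    return prior

print(transition_prior("np/n"))   # mass shifts toward tags that np/n can combine with
```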
Type-Supervised Learning   • Unlabeled corpus + tag dictionary (same as for POS tagging)   • Universal properties of the CCG formalism
Training
Posterior Inference   Forward-Filter Backward-Sample (FFBS) [Carter and Kohn, 1996]
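For reference, a generic numpy sketch of the FFBS step (the standard Carter and Kohn algorithm, not the authors' code). In the full model this sampling step would sit inside a larger Bayesian inference loop with the grammar-informed priors, and emissions would be constrained by the tag dictionary; here the parameters are random toy values:

```python
import numpy as np

def ffbs(init, trans, emit, obs, rng):
    """Forward-Filter Backward-Sample: draw one tag sequence from the HMM
    posterior p(tags | obs).
      init:  (K,)   initial tag probabilities
      trans: (K, K) trans[i, j] = p(next tag j | current tag i)
      emit:  (K, V) emit[i, w]  = p(word w | tag i)
      obs:   list of word ids
    """
    T, K = len(obs), len(init)
    alpha = np.zeros((T, K))
    # Forward filtering, normalizing each step for numerical stability.
    alpha[0] = init * emit[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
        alpha[t] /= alpha[t].sum()
    # Backward sampling: p(z_t | z_{t+1}, obs_{1:t}) is proportional to
    # alpha[t] * trans[:, z_{t+1}].
    tags = np.zeros(T, dtype=int)
    tags[-1] = rng.choice(K, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        w = alpha[t] * trans[:, tags[t + 1]]
        tags[t] = rng.choice(K, p=w / w.sum())
    return tags

# Toy example with random parameters (3 tags, 4 word types).
rng = np.random.default_rng(0)
trans = rng.dirichlet(np.ones(3), size=3)
emit  = rng.dirichlet(np.ones(4), size=3)
init  = np.ones(3) / 3
print(ffbs(init, trans, emit, obs=[0, 2, 1, 3], rng=rng))
```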
Posterior Inference   [Diagram: unlabeled sentences (“the lazy dogs wander”, …) plus a tag dictionary mapping word types to candidate supertags (np/n, n/n, n, np, (s\np)/np, s\np, …)]
Posterior Inference   [Animation: an HMM with grammar-informed priors; FFBS samples supertag sequences for the unlabeled sentences from their tag-dictionary candidates]
Experiments
Baldridge 2008 Use universal properties of CCG to initialize EM • Simpler definition of category complexity • No corpus-specific information
English Supertagging   [Bar chart: tagging accuracy (%) for Baldridge '08 vs. Ours at tag dictionary pruning cutoffs 0.1, 0.01, 0.001, and none; reported accuracies range from 41 to 80]
Chinese Supertagging   [Bar chart: tagging accuracy (%) for Baldridge '08 vs. Ours at tag dictionary pruning cutoffs 0.1, 0.01, 0.001, and none; reported accuracies range from 28 to 69]
Italian Supertagging   [Bar chart: tagging accuracy (%) for Baldridge '08 vs. Ours at tag dictionary pruning cutoffs 0.1, 0.01, 0.001, and none; reported accuracies range from 32 to 54]
Code Available GitHub repository linked from my website
Conclusion   Combining the exploitation of available annotations with universal grammatical knowledge yields good models from weak supervision.