A Supertag-Context Model for Weakly-Supervised CCG Parser Learning
Dan Garrette (U. Washington), Chris Dyer (CMU), Jason Baldridge (UT-Austin), Noah A. Smith (CMU)
Contributions 1. A new generative model for learning CCG parsers from weak supervision 2. A way to select Bayesian priors that capture properties of CCG 3. A Bayesian inference procedure to learn the parameters of our model
Type-Level Supervision • Unannotated text • Incomplete tag dictionary: word → {tags}
Type-Level Supervision [Figure: the example sentence "the lazy dogs wander" shown with candidate supertags from the tag dictionary, e.g. the: np/n; lazy: n/n; dogs: n, np; wander: np, (s\np)/np, s\np, ...; the correct derivation over these supertags is unknown]
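As a concrete illustration of the tag-dictionary input (the words and category sets below are hypothetical, not the paper's data), a minimal sketch:

```python
# A minimal sketch of type-level supervision: an incomplete tag dictionary
# mapping word types to the CCG categories they may take. Entries are
# illustrative only, not taken from the paper's data.
tag_dictionary = {
    "the":    {"np/n"},
    "lazy":   {"n/n"},
    "dogs":   {"n", "np"},
    "wander": {"np", r"(s\np)/np", r"s\np"},
}

all_categories = set().union(*tag_dictionary.values())

def candidate_tags(word):
    """Known words are restricted to their dictionary entries;
    unknown words may take any category seen in the dictionary."""
    return tag_dictionary.get(word, all_categories)
```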
PCFG: Local Decisions [Figure: a binary tree with root A expanding to B and C, B expanding to D E, and C expanding to F G; a PCFG scores each expansion locally, as P(D E | B) and P(F G | C), conditioning only on the parent label]
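To make the "local decisions" point concrete, here is a minimal sketch of PCFG tree scoring with made-up rule probabilities (not the paper's parameters):

```python
# Toy PCFG scoring: each node's expansion is scored locally, conditioned
# only on the parent label. Probabilities below are invented for illustration.
rule_prob = {
    ("A", ("B", "C")): 1.0,
    ("B", ("D", "E")): 0.6,
    ("C", ("F", "G")): 0.4,
}

def pcfg_tree_prob(label, children):
    """children is a list of (label, children) pairs; leaves have children == []."""
    if not children:            # terminal: emission probs would be handled elsewhere
        return 1.0
    p = rule_prob.get((label, tuple(c[0] for c in children)), 0.0)
    for child_label, grandchildren in children:
        p *= pcfg_tree_prob(child_label, grandchildren)
    return p

tree = ("A", [("B", [("D", []), ("E", [])]),
              ("C", [("F", []), ("G", [])])])
print(pcfg_tree_prob(*tree))    # = 1.0 * 0.6 * 0.4
```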
A New Generative Model [Figure: the same tree, but each constituent also generates its context supertags; the expansion of B is scored as P(D E | B) × P(F | B) × P(<S> | B), i.e. the rule probability times the probability of B's right-context and left-context supertags, with <S> and <E> marking the sentence boundaries] (This makes inference tricky… we'll come back to that)
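A rough sketch of the extra factors this model adds: besides the rule probability, each constituent also generates its adjacent context supertags (the probability tables below are hypothetical placeholders, not the paper's implementation):

```python
# Sketch: score of one constituent under the supertag-context model.
# p_rule, p_left_ctx, p_right_ctx are placeholder probability tables.
def constituent_score(label, child_labels, left_tag, right_tag,
                      p_rule, p_left_ctx, p_right_ctx):
    """left_tag / right_tag are the supertags just outside this constituent's
    span; '<S>' and '<E>' are used at the sentence boundaries."""
    score = p_rule[(label, child_labels)]      # P(children | label), as in a PCFG
    score *= p_left_ctx[(left_tag, label)]     # P(t_left | label)
    score *= p_right_ctx[(right_tag, label)]   # P(t_right | label)
    return score
```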
Why CCG? • The grammar formalism itself can be used to guide learning • Given any two categories, we always know whether they are combinable. • We can extract a priori context preferences, before we even look at the data • Adjacent categories tend to be combinable.
Why CCG? [Figure: for "buy the book", a CFG parse (S → NP VP, with VB, DT, NN) is contrasted with a CCG analysis (s/np, np/n, n); with a CFG, all relationships between labels must be learned, whereas CCG category relationships are universal, intrinsic properties of the grammar]
CCG Parsing [Figure: CCG derivations of "the lazy dog sleeps" over the supertags np/n, n/n, n, s\np, built with forward application (FA) and forward composition (FC)]
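As a sketch of the two combinators named in the figure, assuming a simple tuple representation of slash categories (not the paper's category code):

```python
# Minimal CCG categories: atoms are strings; complex categories are
# (result, slash, argument) triples, e.g. ("np", "/", "n") for np/n.
def forward_apply(left, right):
    """FA: X/Y  Y  =>  X"""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None

def forward_compose(left, right):
    """FC: X/Y  Y/Z  =>  X/Z"""
    if (isinstance(left, tuple) and left[1] == "/" and
            isinstance(right, tuple) and right[1] == "/" and
            left[2] == right[0]):
        return (left[0], "/", right[2])
    return None

np_n = ("np", "/", "n")     # np/n
n_n  = ("n", "/", "n")      # n/n
print(forward_apply(np_n, "n"))      # 'np'
print(forward_compose(np_n, n_n))    # ('np', '/', 'n'), i.e. np/n
```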
Supertag Context [Figure: in "the lazy dog sleeps" (supertags np/n, n/n, n, s\np), each constituent's supertag context is the pair of supertags immediately to the left and right of its span, e.g. the supertags of the words just outside the span "lazy dog"]
Constituent Context • Klein & Manning showed the value of modeling context with the Constituent Context Model (CCM) [Klein & Manning 2002]
Constituent Context: "substitutability" [Figure: spans appearing in the same context, DT ( ... ) VBZ, such as "lazy dog", "dog", and "big lazy dog", are substitutable for one another and behave like a noun] [Klein & Manning 2002]
Supertag Context [Figure: a bracketed constituent in "the lazy dog sleeps" shown with the supertags adjacent to its span] • We know the constituent label • We know if it's a fitting context, even before looking at the data
This Paper 1. A new generative model for learning CCG parsers from weak supervision 2. A way to select Bayesian priors that capture properties of CCG 3. A Bayesian inference procedure to learn the parameters of our model
Supertag-Context Parsing: Standard PCFG [Figure: a CKY chart over positions 0-4 with cells such as A_04, A_03, A_13 above supertags t_1..t_4 and words w_1..w_4]; parameters: P(A_root) and P(A → A_left A_right or w_i)
Supertag-Context Parsing: With Context [Figure: the same chart with <s> and <e> markers at the sentence boundaries]; additional parameters: P(A → t_left) and P(A → t_right), generating the supertags just outside A's span
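A small sketch of how the context factors attach to chart cells during parsing: the supertags adjacent to a span are fixed by the span's endpoints, so the two extra factors can be multiplied into a cell's score (the table names and indexing scheme are assumptions for illustration):

```python
# supertags: a 0-indexed list of the n supertags for words 1..n.
# With chart positions 0..n, a span (i, j) covers words i+1..j, so its left
# context is the supertag of word i and its right context that of word j+1.
def context_factor(label, i, j, supertags, n, p_left_ctx, p_right_ctx):
    left_tag  = supertags[i - 1] if i > 0 else "<s>"
    right_tag = supertags[j]     if j < n else "<e>"
    return p_left_ctx[(left_tag, label)] * p_right_ctx[(right_tag, label)]
```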
Prior on Categories [Figure: two derivations of "the lazy dog", one using simple categories (np/n, n/n, n) and one using complex categories such as (np\(np/n))/n and np\(np/n); the category prior prefers the simpler analysis] [Garrette, Dyer, Baldridge, and Smith, 2015]
Supertag-Context Prior: P_L-prior(t_left | A) ∝ 10^5 if t_left can combine with A, 1 otherwise
Supertag-Context Prior: P_R-prior(t_right | A) ∝ 10^5 if A can combine with t_right, 1 otherwise [Figure: a constituent A in "the lazy dog sleeps" with its context supertags t_left and t_right]
This Paper 1. A new generative model for learning CCG parsers from weak supervision 2. A way to select Bayesian priors that capture properties of CCG 3. A Bayesian inference procedure to learn the parameters of our model
Type-Level Supervision [Figure: recap of the setting: "the lazy dogs wander" with its tag-dictionary supertags, the derivation still unknown]
Type-Supervised Learning [Figure: the learner's three inputs: an unlabeled corpus, a tag dictionary, and universal properties of the CCG formalism]
Posterior Inference • A Bayesian inference procedure will make use of our linguistically informed priors • But we can't sample trees the way we would for a PCFG • We can't compute the inside chart exactly, even with dynamic programming
Sampling via Metropolis-Hastings Idea: • Sample a tree from an efficient proposal distribution (the PCFG parameters; Johnson et al. 2007) • Accept according to the full distribution (the context parameters)
Posterior Inference [Figure: the priors (which prefer combinable connections) and the model define weights over the candidate supertags for "the lazy dogs wander"; an inside chart is computed from these weights and a tree is sampled from it]
Metropolis-Hastings [Figure: a newly proposed tree is compared with the existing tree under the priors (which prefer combinable connections) and the model, and is accepted or rejected before inference continues]
Metropolis-Hastings • Sample a tree based only on the PCFG parameters • Accept based only on the context parameters • If the new tree is worse than the old one, it is less likely to be accepted
Experimental Results
Experimental Question • When supervision is incomplete, does modeling context, and biasing toward combining contexts, help learn better parsing models?
English Results [Bar chart: parsing accuracy (0-75 scale) for the "no context" vs. "+context combinability" models as the size of the corpus from which the tag dictionary is drawn varies over 250k, 200k, 150k, 100k, 50k, and 25k tokens; the +context model generally scores higher]
Experimental Results [Bar chart: parsing accuracy for the "no context" vs. "+context combinability" models on English, Italian, and Chinese, each with a tag dictionary drawn from a 25k-token corpus; the +context model scores higher for each language]
Conclusion: Under weak supervision, we can use universal grammatical knowledge about context to find trees with better global structure.
Deficiency • The generative story has a "throw away" step when the context-generated supertags don't match those in the tree • We sample only over the space of valid trees (conditioning on well-formed structures) • This is a benefit of the Bayesian formulation • See Smith 2011
Metropolis-Hastings: let y be the current tree and y′ the new tree. Define P_context(y) = P_full(y) / P_pcfg(y) and P_context(y′) = P_full(y′) / P_pcfg(y′). Draw z ~ uniform(0,1) and accept if z <= [P_full(y′) / P_pcfg(y′)] / [P_full(y) / P_pcfg(y)] = P_context(y′) / P_context(y).
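Written out as code, the accept/reject step might look like the following sketch, where `p_full`, `p_pcfg`, and `propose_tree` are placeholder functions (the real proposal samples a tree from the PCFG inside chart):

```python
import random

def mh_step(current_tree, p_full, p_pcfg, propose_tree):
    """One Metropolis-Hastings step: propose from the PCFG, accept according
    to the context part of the model, P_context(y) = P_full(y) / P_pcfg(y)."""
    new_tree = propose_tree()                     # y' ~ P_pcfg
    ratio = (p_full(new_tree) / p_pcfg(new_tree)) / \
            (p_full(current_tree) / p_pcfg(current_tree))
    if random.random() <= ratio:                  # accept with prob min(1, ratio)
        return new_tree
    return current_tree
```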