Cooperative Learning of Disjoint Syntax and Semantics
Serhii Havrylov, Germán Kruszewski, Armand Joulin
Is using linguistic structures (e.g. syntactic trees) for sentence modelling useful?
● Yes, it is! Let's create more treebanks!
● No! Annotations are expensive to make. Parse trees are just a linguists' social construct. Just stack more layers and you will be fine!
Recursive neural network
[Figure: a recursive neural network composes word vectors bottom-up along a parse tree; the root vector feeds a classifier, here predicting the label "neutral".]
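A minimal sketch of the idea behind the figure: a recursive network applies one shared composition function bottom-up along a (here fixed) binary parse tree. The weights `W`, the `compose` function, and the tree encoding are illustrative assumptions, not the exact model from the talk.

```python
import numpy as np

d = 8
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(d, 2 * d))  # shared composition weights
b = np.zeros(d)

def compose(left, right):
    # Merge two child vectors into one parent vector.
    return np.tanh(W @ np.concatenate([left, right]) + b)

def encode(tree, embeddings):
    # A tree is either a word (str) or a pair (left_subtree, right_subtree).
    if isinstance(tree, str):
        return embeddings[tree]
    left, right = tree
    return compose(encode(left, embeddings), encode(right, embeddings))

words = ["the", "movie", "was", "fine"]
emb = {w: rng.normal(size=d) for w in words}
root = encode((("the", "movie"), ("was", "fine")), emb)  # fixed parse
print(root.shape)  # (8,) — sentence vector fed to, e.g., a sentiment classifier
```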
Latent tree learning
● RL-SPINN: Yogatama et al., 2016
● Soft-CYK: Maillard et al., 2017
● Gumbel Tree-LSTM: Choi et al., 2018

Recent work has shown that:
● Trees do not resemble any semantic or syntactic formalism (Williams et al., 2018).
● Parsing strategies are not consistent across random restarts (Williams et al., 2018).
● These models fail to learn a simple context-free grammar (Nangia & Bowman, 2018).
ListOps (Nangia & Bowman, 2018)
● [MIN 1 [MAX [MIN 9 [MAX 1 0 ] 2 9 [MED 8 4 3 ] ] [MIN 7 5 ] 6 9 3 ] ] → 1
● [MAX 1 4 0 9 ] → 9
● [MAX 7 1 [MAX 6 8 1 7 ] [MIN 2 6 ] 3 ] → 8
A sequence classification task: given the expression as a flat token sequence, predict its value (a digit from 0 to 9). The bracketing defines the ground-truth parse.
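For concreteness, a small reference evaluator for ListOps prefix expressions. This is a re-implementation for illustration, not the official generator from Nangia & Bowman (2018); the tokenization and the even-length behaviour of MED are assumptions.

```python
def eval_listops(tokens):
    """Evaluate a tokenized ListOps expression, consuming tokens left to right."""
    ops = {
        "[MIN": min,
        "[MAX": max,
        "[MED": lambda xs: sorted(xs)[len(xs) // 2],  # upper median if even length
        "[SM": lambda xs: sum(xs) % 10,               # sum modulo 10
    }
    tok = tokens.pop(0)
    if tok in ops:
        args = []
        while tokens[0] != "]":
            args.append(eval_listops(tokens))  # recurse into nested expressions
        tokens.pop(0)  # consume the closing "]"
        return ops[tok](args)
    return int(tok)

expr = "[MAX 7 1 [MAX 6 8 1 7 ] [MIN 2 6 ] 3 ]"
print(eval_listops(expr.split()))  # -> 8
```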
Tree-LSTM parser (Choi et al., 2018)
[Figure: step-by-step animation of the parser. At each step, every pair of adjacent constituents is scored; one pair is selected and merged by a Tree-LSTM composition function, until a single root node remains.]
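A hedged sketch of this bottom-up parsing loop: score every adjacent pair, pick (or sample) a pair, merge it, and repeat. The query vector `score_query`, the `compose` layer, and the plain softmax sampling are illustrative stand-ins; Choi et al. (2018) select merges with a straight-through Gumbel-softmax, while this work samples them and trains the parser with REINFORCE.

```python
import torch
import torch.nn.functional as F

d = 16
score_query = torch.randn(d, requires_grad=True)  # scores candidate merges
merge_layer = torch.nn.Linear(2 * d, d)           # stand-in for a Tree-LSTM cell

def compose(left, right):
    return torch.tanh(merge_layer(torch.cat([left, right])))

def parse(leaves, sample=True):
    nodes, log_prob = list(leaves), 0.0
    while len(nodes) > 1:
        # Score each adjacent pair of constituents; higher score = merge earlier.
        scores = torch.stack([compose(l, r) @ score_query
                              for l, r in zip(nodes, nodes[1:])])
        probs = F.softmax(scores, dim=0)
        i = torch.multinomial(probs, 1).item() if sample else probs.argmax().item()
        log_prob = log_prob + torch.log(probs[i])  # accumulated for REINFORCE
        nodes[i:i + 2] = [compose(nodes[i], nodes[i + 1])]
    return nodes[0], log_prob  # root embedding, log-probability of the tree

root, logp = parse([torch.randn(d) for _ in range(5)])
```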
Separation of syntax and semantics
● Parser (parameters φ)
● Compositional Function (parameters θ)
Parsing as an RL problem
The Parser's merge decisions are a sequence of actions; the downstream task performance of the Compositional Function provides the reward.
Optimization challenges
● The size of the search space is the Catalan number Cₙ₋₁ = (2(n−1))! / (n! (n−1)!) for a sentence of n words. For a sentence with 20 words, there are 1,767,263,190 possible trees.
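The count is easy to verify: the number of distinct binary trees over n leaves is the Catalan number C(n−1).

```python
from math import comb

def num_binary_trees(n):
    """Number of binary trees over a sentence of n words: Catalan(n - 1)."""
    return comb(2 * (n - 1), n - 1) // n

print(num_binary_trees(20))  # 1767263190
```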
Optimization challenges
● Syntax and semantics have to be learned simultaneously: the model has to infer from examples that [MIN 0 1] = 0.
● The environment is non-stationary (i.e. the same sequence of actions can receive different rewards).
Optimization challenges
● Typically, the compositional function θ is learned faster than the parser φ.
● This fast coadaptation limits the exploration of the search space to parsing strategies similar to those found at the beginning of training.
Optimization challenges
● High variance in the estimate of the parser's gradient ∇φ has to be addressed.
● The learning paces of the parser φ and the compositional function θ have to be levelled off.
Variance reduction
[Figure: REINFORCE with a baseline, illustrated with a "Is this a carrot?" reward example — each new reward is compared against the moving average of recent rewards; see the sketch below.]
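A minimal sketch of the moving-average baseline: subtracting the running mean of recent rewards from the new reward reduces the variance of the REINFORCE gradient without biasing it. The class and function names are illustrative; `log_prob` would come from the parser sketch above.

```python
class MovingAverageBaseline:
    """Exponential moving average of recent rewards, used as the REINFORCE baseline."""
    def __init__(self, momentum=0.99):
        self.momentum, self.value = momentum, 0.0

    def update(self, reward):
        self.value = self.momentum * self.value + (1 - self.momentum) * reward
        return self.value

baseline = MovingAverageBaseline()

def policy_loss(log_prob, reward):
    advantage = reward - baseline.value  # centred reward
    baseline.update(reward)
    return -advantage * log_prob         # minimizing this ascends E[reward]

# Usage with the parser sketch: loss = policy_loss(logp, reward=1.0)
```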
Variance reduction
A single moving-average baseline treats very different inputs alike:
● [MIN 1 [MAX [MIN 9 [MIN 1 0 ] 2 [MED 8 4 3 ] ] [MAX 7 5 ] 6 9 ] ]
● [MAX 1 0 ]
Instead, use the input-dependent self-critical training (SCT) baseline of Rennie et al. (2017), sketched below.
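A sketch of the self-critical baseline, building on the `parse` function from the parser sketch above: the baseline for each example is the reward of the greedy (argmax) parse of the same input. Examples the greedy policy already solves get zero advantage and contribute no gradient, so learning focuses on the "hard" datapoints. `reward_fn` is a hypothetical stand-in for the downstream reward (e.g., 1 if the classifier built on the root embedding is correct, 0 otherwise).

```python
import torch

def sct_loss(leaves, reward_fn):
    root_s, log_prob = parse(leaves, sample=True)  # sampled tree
    with torch.no_grad():
        root_g, _ = parse(leaves, sample=False)    # greedy tree = per-input baseline
        advantage = reward_fn(root_s) - reward_fn(root_g)
    return -advantage * log_prob  # zero for "easy" inputs the greedy parse solves
```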
Synchronizing syntax and semantics learning
The parser's (syntax) updates are controlled with Proximal Policy Optimization (PPO) of Schulman et al. (2017), so that they keep pace with the learning of the compositional function (semantics).
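A sketch of PPO's clipped surrogate objective as used here: per batch, sample trees once, then take K small, trust-region-like parser updates instead of one large REINFORCE step. `old_log_prob` is the log-probability of the sampled tree under the parameters that produced it; `new_log_prob` is the log-probability of the *same* tree recomputed after each update.

```python
import torch

def ppo_loss(new_log_prob, old_log_prob, advantage, eps=0.2):
    ratio = torch.exp(new_log_prob - old_log_prob)   # pi_new(tree) / pi_old(tree)
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)   # keep the policy change small
    return -torch.min(ratio * advantage, clipped * advantage)

# Per batch: sample trees once with the current parser, store old_log_prob
# (detached), then take K gradient steps on ppo_loss.
```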
Optimization challenges
● High variance in the estimate of the parser's gradient ∇φ is addressed by using the self-critical training (SCT) baseline of Rennie et al. (2017).
● The learning paces of the parser φ and the compositional function θ are levelled off by controlling the parser's updates using Proximal Policy Optimization (PPO) of Schulman et al. (2017).
ListOps results
[Table: test accuracy on ListOps for the baseline models and ours.]
Extrapolation
Sentiment Analysis (SST-2)
Natural language inference (MultiNLI)
Time and Space complexities

Method                                 Time complexity    Space complexity
RL-SPINN (Yogatama et al., 2016)       O(nd²)             O(nd²)
Soft-CYK (Maillard et al., 2017)       O(n³d + n²d²)      O(n³d)
Gumbel Tree-LSTM (Choi et al., 2018)   O(n²d + nd²)       O(n²d)
Ours                                   O(Knd²)            O(nd²)

n – sentence length; d – tree-LSTM dimensionality; K – number of updates in PPO
Conclusions
● The separation between syntax and semantics allows coordinating the optimisation schemes for each module.
● Self-critical training mitigates the credit assignment problem by distinguishing between "hard"- and "easy"-to-solve datapoints.
● The model can recover a simple context-free grammar of mathematical expressions.
● The model performs competitively on several real natural language tasks.

github.com/facebookresearch/latent-treelstm