Policy Shaping and Generalized Update Equations for Semantic Parsing from Denotations



  1. Policy Shaping and Generalized Update Equations for Semantic Parsing from Denotations

  2. Semantic Parsing with Execution: Text → Semantic Parsing → Meaning Representation → Execution (in the Environment) → Denotation (Answer)

  3. Semantic Parsing with Execution
     Question: “What nation scored the most points?”
     Semantic parsing ⇒ Select Nation Where Points is Max
     Execution against the environment (the table below) ⇒ “England”
       Index | Name               | Nation  | Points | Games | Pts/game
       1     | Karen Andrew       | England | 44     | 5     | 8.8
       2     | Daniella Waterman  | England | 40     | 5     | 8
       3     | Christelle Le Duff | France  | 33     | 5     | 6.6
       4     | Charlotte Barras   | England | 30     | 5     | 6
       5     | Naomi Thomas       | Wales   | 25     | 5     | 5

  4. Indirect Supervision
     • No gold programs during training
     Question: “What nation scored the most points?”
     Semantic parsing ⇒ Select Nation Where Points is Max
     Execution against the environment (the same table as above) ⇒ “England”

  5. Learning
     ● Neural model
       ○ x: “What nation scored the most points?”
       ○ y: Select Nation Where Index is Minimum
       ○ neural models ⇒ score(x, y): encode x, encode y, and produce a score
     ● Argmax procedure
       ○ Beam search: argmax_y score(x, y)
     ● Indirect supervision
       ○ Find approximate gold meaning representations
       ○ Reinforcement learning algorithms
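
A minimal toy sketch of this setup: a bag-of-words stand-in for the encoders and an exhaustive argmax over a tiny candidate set. The encoder, candidate programs, and vocabulary below are illustrative assumptions, not the paper's neural architecture, which learns encoders for x and y and approximates the argmax with beam search over partial programs.

```python
# Toy stand-in for "score(x, y): encode x, encode y, and produce a score".
def encode(tokens, vocab):
    # toy "encoder": a bag-of-words count vector over a shared vocabulary
    return [tokens.count(w) for w in vocab]

def score(x_tokens, y_tokens, vocab):
    # toy score(x, y): dot product of the two encodings; the real model uses
    # learned encoders and a learned scoring layer
    ex, ey = encode(x_tokens, vocab), encode(y_tokens, vocab)
    return sum(a * b for a, b in zip(ex, ey))

x = "what nation scored the most points".split()
candidates = [
    "Select Nation Where Points is Maximum".split(),
    "Select Nation Where Index is Minimum".split(),
    "Select Name Where Points = 44".split(),
]
vocab = sorted({w.lower() for toks in [x] + candidates for w in toks})

# exhaustive argmax over a fixed candidate set; in practice programs are built
# token by token, so the argmax is approximated with beam search
best = max(candidates, key=lambda y: score(x, [w.lower() for w in y], vocab))
print(" ".join(best))  # -> Select Nation Where Points is Maximum
```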

  6. Semantic Parsing with Indirect Supervision
     For training we only have:
     • Question: “What nation scored the most points?”
     • Answer: “England”
     together with the environment table (as above), not the program.

  7. Search for Training
     • A correct program should execute to the gold answer.
     • In general, there are several spurious programs that execute to the gold answer but are semantically incorrect.

  8. Search for Training: Spurious Programs
     • Search during training. Goal: find the semantically correct parse!
     • Question: “What nation scored the most points?”
       Select Nation Where Points = 44          ⇒ “England”
       Select Nation Where Index is Minimum     ⇒ “England”
       Select Nation Where Pts/game is Maximum  ⇒ “England”
       Select Nation Where Points is Maximum    ⇒ “England”
     • All of the programs above generate the right answer, but only the last one is semantically correct.

  9. Update Step
     • In general, there are several methods to update the model.
     • Examples: maximum marginal likelihood, reinforcement learning, margin methods.

  10. Contributions
     ● (1) Policy shaping for handling spurious programs; (2) a generalized update equation that generalizes common update strategies and allows novel updates.
     ● (1) and (2) seem independent, but they interact with each other!
     ● 5% absolute improvement over the state of the art on the SQA dataset.

  11. Learning from Indirect Supervision
     ● Question x, Table t, Answer z, Parameters θ
     1. [Search for Training] With x, t, z, use beam search to find a suitable set K = {y’}
     2. [Update] Update θ according to K = {y’}

  12. Spurious Programs
     ● Question x, Table t, Answer z, Parameters θ
     1. [Search for Training] With x, t, z, use beam search to find a suitable set {y’}
     • If the model selects a spurious program for the update, it increases the chance of selecting spurious programs in the future.

  13. Policy Shaping [Griffith et al., NIPS 2013]
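
A hedged sketch of the idea, following Griffith et al.'s formulation of combining a learned policy with external feedback: the policy used for search is the (renormalized) product of the model policy p_θ and a critique policy p_c that encodes prior knowledge about which programs look semantically plausible. The exact normalization in the paper may differ.

```latex
% Shaped policy (hedged form): product of model policy and critique policy
p_s(y \mid x, t) \;\propto\; p_\theta(y \mid x, t)\; p_c(y \mid x, t)
```

Beam search is then run under p_s instead of p_θ, which biases the beam away from spurious programs.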

  14. Search with Shaped Policy
     ● Question x, Table t, Answer z, Parameters θ
     1. [Search for Training] With x, t, z, use beam search under the shaped policy to find a suitable set {y’}

  15. Critique Policy
     1. Surface-form match: features triggered for constants in the program that match a token in the question.
     2. Lexical pair score: features triggered between program keywords and question tokens (e.g., Maximum and “most”).

  16. Critique Policy Features
     Question: “What nation scored the most points?”
     Candidate programs:
       Select Nation Where Points = 44
       Select Nation Where Index is Minimum
       Select Nation Where Pts/game is Maximum
       Select Nation Where Points is Maximum
       Select Nation Where Name = Karen Andrew
     Feature annotations on the slide: lexical pair match (e.g., Maximum ↔ “most”) and surface-form match.
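
A toy illustration of the two feature families; the lexicon, feature names, and example below are hypothetical placeholders rather than the paper's actual feature set. A log-linear critique policy p_c would then score programs from such features.

```python
# Hypothetical sketch of surface-form match and lexical-pair features.
LEXICAL_PAIRS = {("Maximum", "most"), ("Minimum", "least")}  # made-up lexicon

def critique_features(question, program_tokens):
    q_tokens = set(question.lower().rstrip("?").split())
    feats = {}
    for tok in program_tokens:
        # surface-form match: a constant/column in the program also appears in the question
        if tok.lower() in q_tokens:
            feats[("surface", tok)] = feats.get(("surface", tok), 0) + 1
        # lexical pair: a program keyword paired with a related question word
        for kw, qw in LEXICAL_PAIRS:
            if tok == kw and qw in q_tokens:
                feats[("lex", kw, qw)] = feats.get(("lex", kw, qw), 0) + 1
    return feats

q = "What nation scored the most points?"
y = ["Select", "Nation", "Where", "Points", "is", "Maximum"]
print(critique_features(q, y))
# fires surface-form features for Nation and Points, and a lexical-pair feature (Maximum, most)
```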

  17. Learning Pipeline Revisited
     1. [Search for Training] With x, t, z, use beam search to find a suitable set K = {y’}
        ● Use policy shaping to find a “better” K  ⇐ shaping acts here
     2. [Update] Update θ according to K = {y’}
        ● What is the better objective function J(θ)?

  18. Objective Functions Look Different!
     ● Maximum marginal likelihood (MML)
     ● Reinforcement learning (RL)
     ● Maximum margin reward (MMR): contrasts the maximum-reward program with the most violated program, generated by reward-augmented inference
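
For reference, hedged reconstructions of the three objectives (the slide's formulas are figures). Here K is the search output, p_θ(y | x, t) the model's program distribution, φ_θ(x, y) the program score, R(y) the reward for executing to the gold answer, y* the maximum-reward program, and ȳ the most violated program; the paper's exact normalizations may differ.

```latex
% Maximum marginal likelihood: log-probability of producing any reward-earning program
J_{\mathrm{MML}}(\theta) = \log \sum_{y \in K} p_\theta(y \mid x, t)\, R(y)

% Reinforcement learning: expected reward under the model policy
J_{\mathrm{RL}}(\theta) = \sum_{y \in K} p_\theta(y \mid x, t)\, R(y)

% Maximum margin reward: hinge loss between the reference program y^* and the most
% violated program \bar{y} found by reward-augmented inference, with margin \delta
J_{\mathrm{MMR}}(\theta) = -\max\bigl(0,\; \phi_\theta(x, \bar{y}) - \phi_\theta(x, y^*) + \delta(y^*, \bar{y})\bigr)
```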

  19. Update Rules Are Similar
     ● Maximum marginal likelihood (MML)
     ● Reinforcement learning (RL)
     ● Maximum margin reward (MMR)
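
Hedged reconstructions of the corresponding updates, written so the common shape is visible: each one moves θ along reward-weighted score gradients.

```latex
% MML: credit spread over reward-earning programs in proportion to model probability
\Delta_{\mathrm{MML}} = \sum_{y \in K} \frac{p_\theta(y \mid x, t)\, R(y)}{\sum_{y' \in K} p_\theta(y' \mid x, t)\, R(y')}\; \nabla_\theta \log p_\theta(y \mid x, t)

% RL: policy-gradient (REINFORCE-style) update
\Delta_{\mathrm{RL}} = \sum_{y \in K} p_\theta(y \mid x, t)\, R(y)\; \nabla_\theta \log p_\theta(y \mid x, t)

% MMR: move toward the reference program, away from the most violated one
\Delta_{\mathrm{MMR}} = \nabla_\theta \phi_\theta(x, y^*) - \nabla_\theta \phi_\theta(x, \bar{y})
```

For a softmax policy, ∇_θ log p_θ(y | x, t) = ∇_θ φ_θ(x, y) − Σ_{y′} p_θ(y′ | x, t) ∇_θ φ_θ(x, y′), so all three updates are differences of score gradients, which is what the generalized update on the next slide captures.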

  20. Generalized Update Equation
     2. [Update] Update θ according to K = {y’}
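
A hedged reconstruction of the generalized update (the slide presents it as a figure): each update is characterized by an intensity w(y), which sets how much credit each program in K receives, and a competing distribution q(y′), which sets what the scores are contrasted against.

```latex
% Generalized update equation (hedged reconstruction)
\Delta(\theta) = \sum_{y \in K} w(y)\, \Bigl[\, \nabla_\theta \phi_\theta(x, y)
                \;-\; \sum_{y' \in K} q(y')\, \nabla_\theta \phi_\theta(x, y') \,\Bigr]
```

MML, RL, MMR, and MAVER then differ only in their choice of w and q (see the comparison on slide 23).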

  21. Improvement over Margin Approaches
     ● MMR
     ● MAVER
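
MMR's update (above) contrasts the reference program with the single most violated program. A hedged reading of MAVER ("maximum margin average violation reward") replaces that single program with an average over the set V of violating programs:

```latex
% MAVER (hedged reconstruction): contrast y^* with the average violating program
\Delta_{\mathrm{MAVER}} = \nabla_\theta \phi_\theta(x, y^*) - \frac{1}{|V|} \sum_{\bar{y} \in V} \nabla_\theta \phi_\theta(x, \bar{y})
```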

  22. Results on SQA: Answer Accuracy (%)
     • Policy shaping helps improve performance.
     • With policy shaping, the choice of update matters even more.
     • Achieves a new state of the art on SQA (previous best: 44.7%).

  23. Comparing Updates
     ● MMR and MAVER are more “aggressive” than MML:
       ○ MMR and MAVER update toward a single program
       ○ MML updates toward all programs that can generate the correct answer
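
In terms of the generalized update, a hedged reconstruction of the two instantiations being compared (the slide's formulas are figures):

```latex
% MML: intensity spread over every reward-earning program, in proportion to model
% probability; the competing distribution is the model distribution itself
w_{\mathrm{MML}}(y) \propto p_\theta(y \mid x, t)\, R(y), \qquad
q_{\mathrm{MML}}(y') = p_\theta(y' \mid x, t)

% MMR: all intensity on the maximum-reward program y^*; all competing mass on the
% most violated program \bar{y}
w_{\mathrm{MMR}}(y) = \mathbf{1}[y = y^*], \qquad
q_{\mathrm{MMR}}(y') = \mathbf{1}[y' = \bar{y}]
```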

  24. Conclusion
     ● Discussed problems with the search and update steps in semantic parsing from denotations.
     ● Introduced policy shaping for biasing the search away from spurious programs.
     ● Introduced a generalized update equation that generalizes common update strategies and allows novel updates.
     ● Policy shaping allows more aggressive updates!

  25. BACKUP

  26. Generalized Update as an Analysis Tool
     ● MMR and MAVER are more “aggressive” than MML:
       ○ MMR and MAVER pick only one program
       ○ MML gives credit to every y whose denotation matches z
       ○ MMR and MAVER benefit more from shaping

  27. Learning from Indirect Supervision
     ● Question x, Table t, Answer z, Parameters θ
     1. [Search for Training] With x, t, z, use beam search to find a suitable set {y’}
        ● Search during training. Goal: find a semantically correct y’
     2. [Update] Update θ according to {y’}
        ● There are many different ways to update θ

  28. Shaping and Update
     Better search ⇒ more aggressive update
     1. [Search for Training] With x, t, z, use beam search to find a suitable set K = {y’}
        ● Use policy shaping to find a “better” K  ⇐ shaping acts here directly
     2. [Update] Update θ according to K = {y’}
        ● What is the better objective function J(θ)?  ⇐ shaping acts here indirectly

  29. Novel Learning Algorithm
       Intensity                           | Competing distribution             | Dev performance w/o shaping
       Maximum marginal likelihood (MML)   | Maximum marginal likelihood (MML)  | 32.4
       Maximum margin reward (MMR)         | Maximum margin reward (MMR)        | 40.7
       Maximum margin reward (MMR)         | Maximum marginal likelihood (MML)  | 41.9
     • Mixing MMR’s intensity with MML’s competing distribution gives an update that outperforms MMR.

  30. Novel Learning Algorithms

  31. Learning Method #1 – Maximum Marginal Likelihood (MML)

  32. Learning Method #2 – Reinforcement Learning (RL)

  33. Learning Method #3 – Maximum Margin Reward (MMR)

  34. Learning Method #4 – Maximum Margin Average Violation Reward (MAVER)
