Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback
Carolin Lawrence, Stefan Riezler
Heidelberg University, Institute for Computational Linguistics
July 17, 2018
Situation Overview

◮ Situation: deployed system (e.g. QA, MT, ...)
◮ Goal: improve the system using human feedback
◮ Plan: create a log D_log of user-system interactions & improve the system offline (safety)

Here: improve a neural semantic parser
Contrast to Previous Approaches

[Diagram: previous approaches predict several parses y_1, ..., y_s for a question x, retrieve answers a_1, ..., a_s from the database, and compare them against a gold answer to obtain rewards r_1, ..., r_s for training the parser. Required data: a gold answer for each of the 1...n questions.]
Our Approach

[Diagram: the deployed parser predicts a single parse y for question x, the database returns answer a, the user gives feedback r, and the triple (x, y, r) is logged to train an improved parser offline. Required data: only the 1...n questions.]
Our Approach

[Diagrams repeated: previous approach (gold answers, rewards r_1, ..., r_s) vs. our approach (logged triples (x, y, r) from user feedback).]

◮ No supervision: given an input, the gold output is unknown
◮ Bandit: feedback is given for only one system output
◮ Bias: the log D is biased towards the decisions of the deployed system

Solution: counterfactual / off-policy reinforcement learning
Task
A natural language interface to OpenStreetMap

◮ OpenStreetMap (OSM): geographical database
◮ NLmaps v2: extension of the previous corpus, now totalling 28,609 question-parse pairs
A natural language interface to OpenStreetMap

◮ example question: "How many hotels are there in Paris?" Answer: 951
◮ correctness of answers is difficult to judge → judge parses by making them human-understandable
◮ feedback collection setup:
  1. automatically convert a parse to a set of statements
  2. humans judge the statements
Example: Feedback Formula

query(around(center(area(keyval('name','Paris')),
                     nwr(keyval('name','Place de la République'))),
              search(nwr(keyval('amenity','parking'))),
              maxdist(WALKING_DIST)),
      qtype(findkey('name')))
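A minimal sketch (not the authors' feedback form) of how the keyval pairs of such a parse could be turned into statements a human can judge; the regular expression, the label mapping, and the keyval_statements helper are illustrative assumptions.

```python
import re

# Hypothetical mapping from OSM keys to human-readable labels;
# the labels in the actual feedback form may differ.
LABELS = {"name": "Name", "amenity": "Type of place"}

def keyval_statements(parse: str) -> list[str]:
    """Extract keyval('key','value') pairs from a parse string and
    render each one as a short statement for human judgment."""
    pairs = re.findall(r"keyval\('([^']+)','([^']+)'\)", parse)
    return [f"{LABELS.get(key, key)}: {value}" for key, value in pairs]

parse = ("query(around(center(area(keyval('name','Paris')),"
         "nwr(keyval('name','Place de la République'))),"
         "search(nwr(keyval('amenity','parking'))),"
         "maxdist(WALKING_DIST)),qtype(findkey('name')))")

for statement in keyval_statements(parse):
    print(statement)  # e.g. "Name: Paris", "Type of place: parking"
```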
Objectives
Counterfactual Learning

Resources
◮ collected log D_log = {(x_t, y_t, δ_t)}_{t=1}^{n} with
  ◮ x_t: input
  ◮ y_t: most likely output of the deployed system π_0
  ◮ δ_t ∈ [−1, 0]: loss (i.e. negative reward) received from the user

Deterministic Propensity Matching (DPM)
◮ minimize the expected risk for a target policy π_w:

  $\hat{R}_{\text{DPM}}(\pi_w) = \frac{1}{n} \sum_{t=1}^{n} \delta_t \, \pi_w(y_t \mid x_t)$

◮ improve π_w using (stochastic) gradient descent
◮ high variance → use a multiplicative control variate
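A minimal PyTorch sketch of the DPM objective, under the assumption that the model exposes the sequence probabilities π_w(y_t | x_t) of the logged outputs as log-probabilities; the policy_log_probs call in the usage comment is hypothetical.

```python
import torch

def dpm_loss(log_probs: torch.Tensor, losses: torch.Tensor) -> torch.Tensor:
    """Empirical DPM risk: (1/n) * sum_t delta_t * pi_w(y_t | x_t).
    `log_probs` holds log pi_w(y_t | x_t) for the logged outputs,
    `losses` holds the user losses delta_t in [-1, 0]."""
    return (losses * log_probs.exp()).mean()

# Hypothetical usage with a seq2seq model that scores the logged parses:
#   loss = dpm_loss(policy_log_probs(batch), batch_losses)
#   loss.backward(); optimizer.step()
```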
Multiplicative Control Variate

◮ for random variables X and Y, with Ȳ the expectation of Y:

  $\mathbb{E}[X] = \mathbb{E}\!\left[\frac{X}{Y}\right] \cdot \bar{Y}$

  → the RHS has lower variance if Y positively correlates with X

DPM with Reweighting (DPM+R)

  $\hat{R}_{\text{DPM+R}}(\pi_w) = \frac{\frac{1}{n} \sum_{t=1}^{n} \delta_t \, \pi_w(y_t \mid x_t)}{\frac{1}{n} \sum_{t=1}^{n} \pi_w(y_t \mid x_t)}$

  (the denominator is the reweight sum R)

◮ reduces variance but introduces a bias of order O(1/n) that decreases as n increases → n should be as large as possible
◮ Problem: in stochastic minibatch learning, n is too small
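Continuing the same sketch, the reweighted DPM+R estimator only changes the normalization; this version assumes the whole log fits into one tensor, which is exactly what breaks down with minibatches and motivates the OSL variant on the next slide.

```python
import torch

def dpm_r_loss(log_probs: torch.Tensor, losses: torch.Tensor) -> torch.Tensor:
    """Self-normalized DPM+R risk:
    (1/n sum_t delta_t * pi_w(y_t|x_t)) / (1/n sum_t pi_w(y_t|x_t)).
    The denominator is the reweight sum R over the full log."""
    probs = log_probs.exp()
    return (losses * probs).mean() / probs.mean()
```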
One-Step Late (OSL) Reweighting

Perform gradient descent updates & reweighting asynchronously:
◮ evaluate the reweight sum R on the entire log of size n using parameters w′
◮ update using minibatches of size m, m ≪ n
◮ periodically update R
→ retains all desirable properties

DPM+OSL

  $\hat{R}_{\text{DPM+OSL}}(\pi_w) = \frac{\frac{1}{m} \sum_{t=1}^{m} \delta_t \, \pi_w(y_t \mid x_t)}{\frac{1}{n} \sum_{t=1}^{n} \pi_{w'}(y_t \mid x_t)}$
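A rough sketch of how the one-step-late reweighting could be wired into a training loop; `policy` is a stand-in for a model returning log π_w(y | x) for a batch of logged pairs, and the recomputation schedule is an assumption.

```python
import torch

def compute_reweight_sum(policy, full_log) -> torch.Tensor:
    """Re-evaluate R = (1/n) * sum_t pi_{w'}(y_t | x_t) over the entire
    log with the current (soon one-step-late) parameters, no gradients."""
    xs, ys, _ = full_log
    with torch.no_grad():
        return policy(xs, ys).exp().mean()

def dpm_osl_step(policy, optimizer, minibatch, reweight_sum_R):
    """One DPM+OSL update: the numerator uses a minibatch of size m with
    the current parameters w, the denominator stays fixed to the
    one-step-late reweight sum R."""
    xs, ys, losses = minibatch
    probs = policy(xs, ys).exp()
    loss = (losses * probs).mean() / reweight_sum_R
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Hypothetical schedule: recompute R every k minibatches.
#   for step, minibatch in enumerate(minibatches):
#       if step % k == 0:
#           R = compute_reweight_sum(policy, full_log)
#       dpm_osl_step(policy, optimizer, minibatch, R)
```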
Token-Level Feedback

DPM+T

  $\hat{R}_{\text{DPM+T}}(\pi_w) = \frac{1}{n} \sum_{t=1}^{n} \sum_{j=1}^{|y|} \delta_j \, \pi_w(y_j \mid x_t)$

DPM+T+OSL

  $\hat{R}_{\text{DPM+T+OSL}}(\pi_w) = \frac{\frac{1}{m} \sum_{t=1}^{m} \left( \sum_{j=1}^{|y|} \delta_j \, \pi_w(y_j \mid x_t) \right)}{\frac{1}{n} \sum_{t=1}^{n} \pi_{w'}(y_t \mid x_t)}$
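A sketch of the token-level numerator, assuming the log stores one loss δ_j per output token and that sequences are padded into a batch with a mask; both layout choices are assumptions for illustration.

```python
import torch

def dpm_t_loss(token_log_probs: torch.Tensor,
               token_losses: torch.Tensor,
               mask: torch.Tensor) -> torch.Tensor:
    """Token-level DPM+T risk: each token y_j of a logged parse carries
    its own loss delta_j, which weights the per-token probability
    pi_w(y_j | x_t) before summing over the sequence.
    All tensors are (batch, max_len); `mask` zeroes out padding."""
    token_probs = token_log_probs.exp()
    per_example = (token_losses * token_probs * mask).sum(dim=1)
    return per_example.mean()
```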
Experiments
Experimental Setup

◮ sequence-to-sequence neural network (Nematus)
◮ deployed system: pre-trained on 2k question-parse pairs
◮ feedback collection:
  1. humans judged 1k system outputs
     ◮ average time to judge a parse: 16.4s
     ◮ most parses (> 70%) judged in < 10s
  2. simulated feedback for 23k system outputs
     ◮ token-wise comparison to the gold parse
◮ bandit-to-supervised conversion (B2S): all instances in the log with reward 1 are used as supervised training data
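Two small sketches of the data preparation described above; the log layout as (question, parse, reward) triples and the 0/−1 token loss scheme are assumptions for illustration, not the authors' exact implementation.

```python
def b2s_filter(log):
    """Bandit-to-supervised (B2S) conversion: keep only the logged
    (question, parse) pairs that received reward 1 and reuse them as
    regular supervised training data."""
    return [(x, y) for x, y, r in log if r == 1]

def simulated_token_feedback(pred_tokens, gold_tokens):
    """Simulated token-level feedback via token-wise comparison to the
    gold parse: loss 0 for a matching token, -1 for a mismatch."""
    return [0 if p == g else -1 for p, g in zip(pred_tokens, gold_tokens)]
```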
Experimental Results

[Bar chart: F1 score improvements over the deployed baseline for B2S vs. DPM+T+OSL. Human Feedback (1k): +0.34 (B2S) vs. +0.99 (DPM+T+OSL). Large-Scale Simulated Feedback (23k): +5.77 (B2S) vs. +6.96 (DPM+T+OSL).]
Take Away

Counterfactual Learning
◮ safely improve a system by collecting interaction logs
◮ applicable to any task if the underlying model is differentiable
◮ DPM+OSL: new objective for stochastic minibatch learning

Improving a Semantic Parser
◮ collect feedback by making parses human-understandable
◮ judging a parse is often easier & faster than formulating a parse or answer

NLmaps v2
◮ large question-parse corpus for QA in the geographical domain

Future Work
◮ integrate the feedback form in the online NL interface to OSM