Task-Oriented Query Reformulation with Reinforcement Learning Authors: Rodrigo Nogueira and Kyunghyun Cho Slides: Chris Benson
Motivation Query: “uiuc natural language processing class” Search Engine
Motivation Query: “uiuc class ai language words computer science” Search Engine
Motivation
● Using inexact or overly long queries in search engines tends to result in poor document retrieval
● Vocabulary Mismatch Problem
● Iterative Searching
Idea: Automatic Query Reformulation
Query: “uiuc class ai language words computer science” → Reformulator → Query: “uiuc natural language processing class” → Search Engine
Model as a Reinforcement Learning Problem
● Hard to create annotated data for queries
○ What is the “correct” query?
○ Successful queries are not unique
● Learn directly from a reward based on relevant-document retrieval
● Train to use the search engine as a black box
Automatic Query Reformulation (pipeline diagram)
Original query q_0 → Reformulator → reformulated query q_t → Search Engine → retrieved documents D_t → Scorer compares D_t against the relevant documents D* → reward fed back to the Reformulator
Reinforcement Learning: Policy Algorithms
● Directly learn a policy of how to act
● The policy (π) gives the probability of taking an action (a) in a given state (s) using parameters theta (θ): π_θ(a|s) = P(a|s, θ)
● Find the policy that maximizes reward by finding the best parameters θ
● Learn a policy instead of a value function
○ Q-learning, by contrast, learns a value function
Policy Gradient Algorithms
● J(θ) = expected reward for policy π_θ with parameters θ
● Goal: maximize J(θ)
● Update policy parameters θ using gradient ascent
○ Follow the gradient with respect to θ (∇_θ): θ := θ + α ∇_θ J(θ)
● REINFORCE (Monte Carlo Policy Gradient): θ_{t+1} = θ_t + α r_t ∇_θ log π_θ(a_t | s_t), where r_t is the reward at step t
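As a rough illustration (not the authors' implementation), one REINFORCE update in PyTorch could look like the sketch below; the state vector, the small linear policy network, and the reward value are hypothetical placeholders.

```python
import torch

# Hypothetical policy network: maps a state vector to action logits.
policy_net = torch.nn.Linear(16, 4)
optimizer = torch.optim.SGD(policy_net.parameters(), lr=1e-2)

state = torch.randn(16)                        # placeholder state s_t
probs = torch.softmax(policy_net(state), -1)   # pi_theta(a | s_t)
dist = torch.distributions.Categorical(probs)
action = dist.sample()                         # a_t sampled from pi_theta(. | s_t)

reward = 1.0  # placeholder r_t returned by the environment (e.g., the search engine)

# Gradient ascent on r_t * log pi_theta(a_t | s_t): minimize the negative, so the
# optimizer step implements theta <- theta + alpha * r_t * grad log pi_theta(a_t | s_t).
loss = -reward * dist.log_prob(action)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```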
REINFORCE with Baseline
● Monte Carlo policy gradient algorithms suffer from high variance
○ Problem: if r_t is always positive, the probabilities of the sampled actions just keep going up
● Rather than update when a reward is positive or negative, update when a reward is better or worse than expected
● Baseline:
○ A value subtracted from the reward to reduce variance
○ Estimate the reward v_t for state s_t using a value function
● Update: θ_{t+1} = θ_t + α (r_t − v_t) ∇_θ log π_θ(a_t | s_t), where (r_t − v_t) is the reward minus the baseline
Reformulator: Inputs and Outputs
● Inputs:
○ Original query: q_0 = (w_1, …, w_n)
○ Documents retrieved with q_0: D_0
○ Candidate term: t_i
○ Context terms: (t_{i−k}, …, t_{i+k}), the terms around the candidate term, which give information on how the word is used
● Outputs:
○ Probability of using the candidate term in the new query (Policy): P(t_i | q_0)
○ Estimated reward value (Baseline): R̂
REINFORCE
● Stochastic objective function for the policy: the REINFORCE loss with the baseline subtracted from the reward
● Value network trained to minimize the error between the estimated reward and the actual reward
● Both objectives are minimized using stochastic gradient descent (a sketch follows below)
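A minimal sketch, under stated assumptions, of how the two objectives on this slide could be combined and minimized with stochastic gradient descent: a sigmoid policy over candidate terms and a scalar baseline head share the same features (as on the model slide), the policy loss is the baseline-subtracted REINFORCE term, and the value loss is a squared error. All networks, shapes, and the reward value are hypothetical placeholders, and the paper's exact objective may include weighting or regularization terms not shown here.

```python
import torch

# Hypothetical heads; on the model slide both outputs come from a shared representation.
policy_head = torch.nn.Linear(32, 1)   # scores one candidate term
value_head = torch.nn.Linear(32, 1)    # predicts the reward (baseline R_hat)
params = list(policy_head.parameters()) + list(value_head.parameters())
optimizer = torch.optim.SGD(params, lr=1e-3)

features = torch.randn(10, 32)   # placeholder features for 10 candidate terms
reward = torch.tensor(0.7)       # placeholder Recall@K for the sampled query

probs = torch.sigmoid(policy_head(features)).squeeze(-1)   # P(t_i | q_0)
picked = torch.bernoulli(probs).detach()                    # sampled term choices
log_probs = torch.where(picked > 0, probs, 1 - probs).log().sum()

baseline = value_head(features.mean(0)).squeeze()           # R_hat, estimated reward

# Policy loss: -(R - R_hat) * sum_i log P(choice_i | q_0); the baseline is treated
# as a constant here. Value loss: squared error between R_hat and R.
policy_loss = -(reward - baseline.detach()) * log_probs
value_loss = (baseline - reward).pow(2)

loss = policy_loss + value_loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```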
Reward
R = Recall@K = |D_K ∩ D*| / |D*|, where D_K are the top-K retrieved documents and D* are the relevant documents
R@40 is used as the reward when training the reinforcement learning models
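A minimal sketch of this reward, assuming retrieved results and relevant documents are given as lists of hashable document IDs.

```python
def recall_at_k(retrieved, relevant, k=40):
    """Recall@K: fraction of the relevant documents that appear in the top-K results."""
    top_k = set(retrieved[:k])
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(top_k & relevant) / len(relevant)

# Example: 2 of the 3 relevant documents appear in the top-40 results -> 0.666...
print(recall_at_k(["d1", "d7", "d2"] + ["x"] * 37, ["d1", "d2", "d9"]))
```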
Reformulator: Model
● Use word2vec to convert the input terms to vector representations
● Use a CNN followed by max-pooling, or an RNN, to create fixed-length outputs
● Concatenate the outputs from the original query and the candidate terms
● Generate the policy and reward (baseline) outputs (see the sketch below)
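A rough, hedged sketch of the architecture described above, written in PyTorch (the released implementation linked in the references uses Theano and differs in detail): embed tokens, encode the original query and a candidate term's context window with a CNN plus max-pooling, concatenate the two vectors, and emit the policy probability and the estimated reward. The vocabulary size, dimensions, and the single shared encoder are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Reformulator(nn.Module):
    """Sketch of the slide's architecture: embed -> CNN + max-pool -> concat -> two heads."""
    def __init__(self, vocab_size, emb_dim=300, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # stand-in for word2vec vectors
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)
        self.policy_head = nn.Linear(2 * hidden, 1)      # P(use term t_i in the new query)
        self.value_head = nn.Linear(2 * hidden, 1)       # estimated reward (baseline)

    def encode(self, token_ids):
        # token_ids: (batch, seq_len) -> fixed-length vector via CNN + max-pooling
        x = self.embed(token_ids).transpose(1, 2)        # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))                     # (batch, hidden, seq_len)
        return x.max(dim=2).values                       # (batch, hidden)

    def forward(self, query_ids, context_ids):
        joint = torch.cat([self.encode(query_ids), self.encode(context_ids)], dim=1)
        term_prob = torch.sigmoid(self.policy_head(joint)).squeeze(-1)
        est_reward = self.value_head(joint).squeeze(-1)
        return term_prob, est_reward

# Usage with random token ids: one original query and one candidate-term context window.
model = Reformulator(vocab_size=10000)
prob, value = model(torch.randint(0, 10000, (1, 8)), torch.randint(0, 10000, (1, 5)))
```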
Reinforcement Learning Extensions
● Sequential model of term addition
○ Produces shorter queries
● Oracle to estimate an upper bound on performance for RL methods (see the sketch below)
○ Split the validation or test data into N smaller subsets
○ Train an RL agent on each subset until it overfits that subset
○ Average the rewards achieved by the agents on their respective subsets
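A rough sketch of the oracle estimate described above; make_agent, train_until_overfit, and evaluate are hypothetical helpers standing in for the real training and evaluation code.

```python
def oracle_upper_bound(examples, n_subsets, make_agent, train_until_overfit, evaluate):
    """Estimate an upper bound on RL performance by overfitting one agent per subset."""
    size = max(1, len(examples) // n_subsets)
    subsets = [examples[i:i + size] for i in range(0, len(examples), size)]
    rewards = []
    for subset in subsets:
        agent = make_agent()
        train_until_overfit(agent, subset)       # train on this subset until it overfits
        rewards.append(evaluate(agent, subset))  # reward (e.g., Recall@40) on that subset
    return sum(rewards) / len(rewards)
```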
Baseline Method: Supervised Learning
● Assume terms independently affect query results
● Train a binary classifier to predict whether adding a term to a given query will increase recall
● Add terms whose predicted probability of increasing performance exceeds a threshold (see the sketch below)
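A hedged sketch of this supervised baseline, using scikit-learn's LogisticRegression as a stand-in classifier; the feature vectors and recall-improvement labels are random placeholders for real (query, candidate term) training data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# X: feature vectors for (query, candidate term) pairs; y: 1 if adding the term
# increased recall for that query, else 0. Random data stands in for real features.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 16)), rng.integers(0, 2, size=200)

clf = LogisticRegression().fit(X, y)

def select_terms(candidate_features, threshold=0.5):
    """Keep candidate terms predicted to improve recall above the threshold."""
    scores = clf.predict_proba(candidate_features)[:, 1]
    return np.flatnonzero(scores > threshold)

print(select_terms(rng.normal(size=(10, 16))))  # indices of terms to add to the query
```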
Experiments: Datasets
○ TREC Complex Answer Retrieval (TREC-CAR)
■ Query: Wikipedia article title plus subsection title
■ Relevant documents: paragraphs in that subsection
○ Jeopardy
■ Query: a Jeopardy question
■ Relevant document: the Wikipedia article whose title is the answer
○ Microsoft Academic (MSA)
■ Query: paper title
■ Relevant documents: papers cited by the original paper
Results (result tables from the paper; not reproduced here)
Conclusions
● RL methods work best overall
○ RL-RNN achieves the highest scores
○ RL-RNN-SEQ produces shorter queries and is faster
● There is a large gap between the best RL method and the RL-Oracle
○ This shows there is significant room for improvement for RL methods
Questions?
References
● Rodrigo Nogueira and Kyunghyun Cho. 2017. Task-Oriented Query Reformulation with Reinforcement Learning. In Proceedings of EMNLP.
● Ronald J. Williams. 1992. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning.
● Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.
● Query Reformulator GitHub: https://github.com/nyu-dl/QueryReformulator
● Slides on the paper by the authors: https://github.com/nyu-dl/QueryReformulator/blob/master/Slides.pdf