Learning Action Representations for Reinforcement Learning
Yash Chandak, Georgios Theocharous, James Kostas, Scott Jordan, Philip Thomas
Reinforcement Learning
Problem Statement
Thousands of possible actions!
● Personalized tutoring systems
● Advertisement/marketing
● Medical treatment - drug prescription
● Portfolio management
● Video/Songs recommendation
● …
● Option selection
Key Insights
- Actions are not independent discrete quantities.
- There is a low-dimensional structure underlying their behavior pattern.
- This structure can be learned independently of the reward.
- Instead of raw actions, the agent can act in this space of behavior, and feedback can be generalized to similar actions.
Proposed Method
Algorithm
(a) Supervised learning of action representations.
(b) Learning the internal policy with policy gradients.
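As a rough illustration of the two stages (not the authors' exact architecture; the module names, sizes, and the simple dot-product mapping below are assumptions), the components might be set up along these lines:

```python
# Sketch of the components used in the two-stage algorithm (illustrative only).
import torch
import torch.nn as nn

n_actions, state_dim, emb_dim = 4096, 8, 2       # assumed sizes

# One learnable representation e per discrete action.
action_emb = nn.Embedding(n_actions, emb_dim)

# g(s, s'): predicts the representation of the action that caused the transition s -> s'.
g = nn.Sequential(nn.Linear(2 * state_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))

# Internal policy: maps a state to (the mean of) an action representation.
internal_policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))

def f(e):
    """P(a | e): map a representation to a distribution over the discrete actions."""
    logits = e @ action_emb.weight.t()           # similarity to every action's embedding
    return torch.softmax(logits, dim=-1)

# Stage (a): fit action_emb and g from observed (s, a, s') tuples; no reward needed.
# Stage (b): improve internal_policy with any policy-gradient method, holding f fixed.
```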
Results
Real-world Applications at Adobe
- Photoshop: Actions = 1843 tools
- HelpX: Actions = 1498 tutorials
Poster #112 Today
Results (Action representations)
[Figure: Maze domain with 2^12 actions. Left: actual behavior of the 2^12 actions. Right: learned representations of the 2^12 actions.]
Policy decomposition
Case 1: Action representations are known
- The internal policy acts in the space of action representations.
- Any existing policy gradient algorithm can be used to improve its local performance, independent of the mapping function (see the sketch below).
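A minimal REINFORCE-style sketch of this case, assuming the mapping from representations to actions is given and the internal policy is a Gaussian over the representation space; the environment interface, network sizes, and hyperparameters are placeholders, not the paper's exact setup:

```python
# Case 1 sketch: policy gradient on the internal policy over action representations.
# The mapping f (representation -> action) is fixed; only the internal policy is updated.
import torch
import torch.nn as nn

state_dim, emb_dim, n_actions = 8, 2, 4096

internal_policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))
log_std = torch.zeros(emb_dim, requires_grad=True)           # Gaussian exploration in e-space
action_emb = torch.randn(n_actions, emb_dim)                  # assumed given/learned, held fixed

optimizer = torch.optim.Adam(list(internal_policy.parameters()) + [log_std], lr=1e-3)

def act(state):
    """Sample a representation e ~ pi_i(.|s), then map it to a discrete action."""
    mean = internal_policy(state)
    dist = torch.distributions.Normal(mean, log_std.exp())
    e = dist.sample()
    log_prob = dist.log_prob(e).sum()                         # gradient flows only through pi_i
    action = torch.argmax(e @ action_emb.t())                 # deterministic f: nearest embedding
    return action.item(), log_prob

def reinforce_update(log_probs, returns):
    """Any policy-gradient estimator works here; plain REINFORCE shown for brevity."""
    loss = -(torch.stack(log_probs) * torch.as_tensor(returns)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```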
Case 2: Learning action representations
- P(a|e), required to map a representation to an action, can be learned by satisfying the earlier assumption.
- We parameterize P(a|e) and P(e|s,s') with learnable functions f and g, respectively.
- Observed transition tuples are samples from the required distribution.
- Parameters can be learned by stochastically minimizing the KL divergence (sketched below).
- The procedure is independent of the reward.
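A hedged sketch of this supervised stage, assuming f is parameterized by an action-embedding table and g by a small network over (s, s'); maximizing the likelihood of the observed action under the composed model is a cross-entropy (equivalently, KL-minimization) objective. The deterministic g and the dot-product f below are illustrative simplifications, not the paper's exact estimator:

```python
# Case 2 sketch: learn action representations from (s, a, s') tuples alone (no reward).
import torch
import torch.nn as nn

n_actions, state_dim, emb_dim = 4096, 8, 2

action_emb = nn.Embedding(n_actions, emb_dim)     # parameterizes f: P(a | e)
g = nn.Sequential(nn.Linear(2 * state_dim, 64),   # parameterizes g: e = g(s, s')
                  nn.ReLU(), nn.Linear(64, emb_dim))

optimizer = torch.optim.Adam(list(action_emb.parameters()) + list(g.parameters()), lr=1e-3)

def representation_update(states, next_states, actions):
    """One stochastic step: make the action that caused s -> s' likely under f(g(s, s'))."""
    e_hat = g(torch.cat([states, next_states], dim=-1))      # predicted representation
    logits = e_hat @ action_emb.weight.t()                    # scores for P(a | e_hat)
    loss = nn.functional.cross_entropy(logits, actions)       # KL/cross-entropy to observed data
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage: sample a minibatch of observed transitions and call representation_update.
# states, next_states: FloatTensor [B, state_dim]; actions: LongTensor [B].
```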
Experiments
Toy Maze:
- Agent in continuous state space with n actuators: 2^n actions (exponentially large action space).
- Long horizon and single goal reward.
Adobe Datasets:
- N-gram based multi-time-step user behavior model from passive data.
- Rewards defined using a surrogate objective.
- Photoshop tool recommendation (1843 tools).
- HelpX tutorial recommendation (1498 tutorials).
Advantages
- Exploits structure in the space of actions.
- Quick generalization of feedback to similar actions.
- Fewer parameters are updated using high-variance policy gradients.
- Drop-in extension for existing policy gradient algorithms.