SLIDE 1

FeUdal Networks for Hierarchical Reinforcement Learning

Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu (DeepMind). The 34th International Conference on Machine Learning (ICML 2017)

SLIDE 2
  • Brief review of FeUdal Networks
  • Structures
  • Detailed Features
  • More on FeUdal Networks for HRL
  • Training
  • Experiment results
SLIDE 3

Feudal RL (1993)

Reward Hiding:

  • Managers reward sub-managers for satisfying their commands, not through the external reward
  • Managers have absolute control

Information Hiding:

  • Levels observe the world at different resolutions
  • Managers don’t know what happens at other levels of the hierarchy

[Figure: a stack of agents in a hierarchy; rewards flow between levels and from the environment, actions flow to the environment]

Dayan, Peter and Hinton, Geoffrey E., “Feudal Reinforcement Learning”, NIPS, 1993.

SLIDE 4

FeUdal Networks (2017)

Manager

  • Sets directional goals for the worker

  • Rewarded by environment
  • Does not directly act in environment

Worker

  • Higher temporal resolution
  • Reward for achieving manager’s goals
  • Produces primitive actions in environment

[Figure: the manager receives rewards from the environment and sends goals and rewards to the worker; the worker takes actions in the environment]

SLIDE 5

FeUdal Network:

[Figure: overall FeUdal Network architecture]

SLIDE 6

FeUdal Network: Details

Shared Dense Embedding

  • Embedding of input state
  • Used by both worker and manager to produce the goal and the action

  • CNN

○ 16 8x8 filters
○ 32 4x4 filters
○ 256-unit fully connected layer
○ ReLU activations
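
As a concrete reading of this slide, here is a minimal PyTorch sketch of the shared perceptual embedding. The strides (4 and 2) and the 84x84 input resolution are assumptions in the style of standard Atari agents, not stated on the slide:

```python
import torch
import torch.nn as nn

class PerceptionNet(nn.Module):
    """Shared embedding: pixels -> 256-d state used by both manager and worker."""

    def __init__(self, in_channels=3, d=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=8, stride=4), nn.ReLU(),  # 16 8x8 filters
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),           # 32 4x4 filters
        )
        # assumed 84x84 input -> 20x20 after conv1 -> 9x9 after conv2
        self.fc = nn.Sequential(nn.Linear(32 * 9 * 9, d), nn.ReLU())         # 256 fully connected

    def forward(self, x):  # x: (batch, channels, 84, 84)
        return self.fc(self.conv(x).flatten(start_dim=1))

z = PerceptionNet()(torch.zeros(1, 3, 84, 84))  # z.shape == (1, 256)
```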

SLIDE 7

FeUdal Network: Details

Manager: Goal embedding

  • Lower temporal resolution: goals are summed over the last 10 time steps (goals vary smoothly)

  • Uses dilated LSTM
  • Goal is set in a low-dimensional latent space, not in the environment’s observation space

  • Trained using transition policy gradient
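
Following the paper, the emitted goal is normalized to unit length before being handed to the worker; a minimal NumPy sketch of this slide’s goal mechanics, where h_t (the dilated LSTM output at step t) and recent_goals are hypothetical names:

```python
import numpy as np

def manager_goal(h_t, eps=1e-8):
    """The manager emits a unit-length *direction* in latent space."""
    return h_t / (np.linalg.norm(h_t) + eps)

def pooled_goal(recent_goals):
    """The worker conditions on the sum of the manager's last 10 goals,
    so the goal signal varies smoothly from step to step."""
    return np.sum(recent_goals, axis=0)
```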


SLIDE 8

FeUdal Network: Details

Worker: Action Embedding

  • Standard LSTM on the shared embedding

  • Embedding U matrix:

○ Rows: actions [a]
○ Columns: embedding dimension [k]


SLIDE 9

FeUdal Network: Details

Worker: Goal embedding

  • Compress the manager’s goal to dimension k using a linear transformation φ

  • Same dim as action embedding
  • Linear transformation with no bias

○ Can’t produce a 0 vector
○ Can’t ignore the manager’s input, so the manager’s goal will influence the final policy


SLIDE 10

FeUdal Network: Details

Worker: Action

  • Product of the action embedding matrix (U) with the goal embedding (w)
  • Produces a distribution over actions
  • Action = softmax(U*w)
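
Putting Slides 8-10 together, a minimal NumPy sketch of the worker’s head; lstm_out, recent_goals, and W_phi (the φ projection matrix) are hypothetical names, with shapes taken from the slides:

```python
import numpy as np

def worker_policy(lstm_out, recent_goals, W_phi, num_actions, k):
    """pi = softmax(U w): action embedding U combined with goal embedding w."""
    U = lstm_out.reshape(num_actions, k)      # rows: actions [a], columns: dim [k]
    w = W_phi @ np.sum(recent_goals, axis=0)  # linear, no bias: w can't be forced
                                              # to 0, so the goal always matters
    logits = U @ w                            # one logit per action
    e = np.exp(logits - logits.max())         # numerically stable softmax
    return e / e.sum()
```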


SLIDE 11

FeUdal Network: Features

Directional goals: the manager specifies a direction in latent space rather than an absolute target state

SLIDE 12

FeUdal Network: Features

SLIDE 13

FeUdal Network: Features

  • Intrinsic reward, based on cosine similarity:

d_cos(α, β) = αᵀβ / (‖α‖ ‖β‖)
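
A minimal NumPy sketch of this similarity and the intrinsic reward built from it (the averaged form appears on Slide 15); states and goals are assumed to be time-indexed lists of latent vectors:

```python
import numpy as np

def d_cos(alpha, beta, eps=1e-8):
    """Cosine similarity between two latent vectors."""
    return float(alpha @ beta / (np.linalg.norm(alpha) * np.linalg.norm(beta) + eps))

def intrinsic_reward(states, goals, t, c=10):
    """r^I_t = (1/c) * sum_i d_cos(s_t - s_{t-i}, g_{t-i}): did the state move
    in the directions the manager asked for over the last c steps?"""
    return sum(d_cos(states[t] - states[t - i], goals[t - i])
               for i in range(1, c + 1)) / c
```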

SLIDE 14

Training Manager: Transition Policy Gradient

Actor-critic, with the value function coming from the manager’s internal critic:

∇g_t = A^M_t ∇_θ d_cos(s_{t+c} − s_t, g_t(θ)), where A^M_t = R_t − V^M_t(x_t; θ)
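
A loss-style sketch of this update with hypothetical names; in real training only g_t carries gradients, while the latent states s_t and s_{t+c} are treated as constants:

```python
import numpy as np

def d_cos(alpha, beta, eps=1e-8):
    return float(alpha @ beta / (np.linalg.norm(alpha) * np.linalg.norm(beta) + eps))

def manager_loss(s_t, s_tc, g_t, R_t, V_t):
    """Transition policy gradient: push g_t toward the direction the latent
    state actually moved over c steps, weighted by the advantage A^M_t."""
    A_M = R_t - V_t                       # advantage from the internal critic
    return -A_M * d_cos(s_tc - s_t, g_t)  # minimizing this ascends the objective
```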

SLIDE 15

Training Worker: Weighted reward

Not reward-hiding!

  • Use a weighted sum of the intrinsic reward and the environment reward
  • The intrinsic reward is based on whether the worker moves in the direction the manager set

Actor-critic:

∇π_t = A^D_t ∇_θ log π(a_t | x_t; θ)

A^D_t = R_t + α R^I_t − V^D_t(x_t; θ)

r^I_t = (1/c) Σ_{i=1}^{c} d_cos(s_t − s_{t−i}, g_{t−i})

(R^I_t is the return computed from the intrinsic rewards r^I_t; α weights it against the environment return R_t.)
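
The worker’s advantage then reads as a one-liner; a sketch with hypothetical scalar inputs, where alpha is the intrinsic-reward weight from the slide:

```python
def worker_advantage(R_t, RI_t, V_t, alpha):
    """A^D_t = R_t + alpha * R^I_t - V^D_t(x_t): the advantage that scales the
    worker's policy-gradient term grad_theta log pi(a_t | x_t; theta)."""
    return R_t + alpha * RI_t - V_t
```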

SLIDE 16

More Details: Dilated LSTM

  • Better able to preserve memories over long periods
  • Output is summed over previous 10 steps
  • Specific type of Dilated RNN

[Figure: Dilated RNN architecture from Chang et al., 2017]
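
A minimal PyTorch sketch of such a dilated LSTM, assuming one shared cell with c = 10 rotating state groups and an unbatched rollout:

```python
import torch
import torch.nn as nn

class DilatedLSTM(nn.Module):
    """One shared LSTM cell whose state is split into c groups; at step t only
    group t % c is updated, and the output is summed over the last c steps."""

    def __init__(self, input_size, hidden_size, c=10):
        super().__init__()
        self.c = c
        self.cell = nn.LSTMCell(input_size, hidden_size)

    def forward(self, xs):  # xs: (T, input_size), batch size 1 for simplicity
        hidden = [xs.new_zeros(1, self.cell.hidden_size) for _ in range(self.c)]
        state = [xs.new_zeros(1, self.cell.hidden_size) for _ in range(self.c)]
        outs = []
        for t in range(xs.shape[0]):
            i = t % self.c  # only one state group ticks per step
            hidden[i], state[i] = self.cell(xs[t:t + 1], (hidden[i], state[i]))
            outs.append(hidden[i].squeeze(0))
        # each step's output is the sum of the most recent (up to) c outputs
        return torch.stack([torch.stack(outs[max(0, t - self.c + 1):t + 1]).sum(0)
                            for t in range(len(outs))])
```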

SLIDE 17

Results: Atari

  • Outperforms the LSTM baseline, especially on games with delayed rewards

SLIDE 18

Results: Compared with Option-critic

  • Option-critic architecture: the only other end-to-end trainable system with sub-policies at the time
  • FuN scores similarly on Seaquest, doubles Option-critic’s score on Ms. Pacman, more than triples it on Zaxxon, and improves on it by more than 20x on Asterix

SLIDE 19

Sub-policies inspection: Water Maze

  • Circular space with an invisible goal; the agent must find the goal
  • At the start of the next episode the agent is placed at a random location and must find the goal again
  • The two panels on the left show individual episodes; the panel on the right visualizes the learned sub-policies
  • The agent learns meaningful sub-goals
SLIDE 20

Ablations: Temporal Resolution

  • Removing the dilations from the LSTM, or running the manager at the full temporal resolution, performs significantly worse

SLIDE 21

Ablations: Intrinsic Reward

  • The plot on the right uses only the intrinsic reward
  • The environment reward is not necessary for good performance
SLIDE 22

Summary

  • Directional rather than absolute goals are useful
  • Dilated LSTM is crucial for high performance
  • Improves long-term credit assignment over baselines
  • The manager’s goals elicit meaningful low-level behaviors from the worker
SLIDE 23

Thanks for listening!