FeUdal Networks for Hierarchical Reinforcement Learning Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu Topic: Hierarchical RL Presenter: Théophile Gaudin
Why Hierarchical RL? • RL is hard • Sparse rewards • Long time horizons https://www.retrogames.cz/play_124-Atari2600.php?language=EN • More “human-like” approach to decision making
Human-like decision making When we type on a computer keyboard, we just think about the words we want to write. We don’t think about each of our fingers and muscles individually. We make hierarchical abstractions. Could this work for RL too?
Feudalism? Governance system in Europe between the 9th and 15th centuries Top-down “management” https://en.wikipedia.org/wiki/Feudalism
Feudal Reinforcement Learning (Dayan & Hinton ’93) • Only the top Manager sees the environment reward • Managers reward and set goals for the level below • Managers are not aware of what happens at other levels
FeUdal Networks Manager • Lower temporal resolution • Sets directional goals • Rewarded by env. Worker • Higher temporal resolution • Rewarded by the Manager • Produces actions in env. No gradients are propagated between the Manager and the Worker
Directional vs Absolute Goals An absolute goal would be to reach a particular state Ex: you have an address to reach A directional goal would be to go towards a particular state Ex: you have a direction to follow
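To make the contrast concrete, here is a minimal sketch (my own illustration, not from the paper; the state vectors and function names are assumptions) of how the two kinds of goals could be scored:

```python
import numpy as np

def absolute_goal_reward(state, target_state):
    # Absolute goal: reward is highest when the agent is exactly at the target state.
    return -float(np.linalg.norm(state - target_state))

def directional_goal_reward(prev_state, state, goal_direction):
    # Directional goal: reward the *direction* of movement in (latent) state space,
    # via cosine similarity with the goal vector, regardless of distance travelled.
    delta = state - prev_state
    denom = np.linalg.norm(delta) * np.linalg.norm(goal_direction) + 1e-8
    return float(np.dot(delta, goal_direction) / denom)
```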
Model Architecture Details
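The architecture figure itself is not reproduced here. Below is a rough sketch of the FuN forward pass for one time step, assuming PyTorch; the dimensions (d, k), the linear perceptual module, the horizon c = 10, and the plain LSTMCell standing in for the Manager's dilated LSTM are all simplifications/assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuNSketch(nn.Module):
    """Rough sketch of the FeUdal Networks forward pass (one time step)."""
    def __init__(self, obs_dim, n_actions, d=256, k=16):
        super().__init__()
        self.f_percept = nn.Linear(obs_dim, d)           # shared perception z_t
        self.f_mspace = nn.Linear(d, d)                  # Manager's latent state s_t
        self.manager_rnn = nn.LSTMCell(d, d)             # stand-in for the dilated LSTM
        self.worker_rnn = nn.LSTMCell(d, n_actions * k)  # produces U_t
        self.phi = nn.Linear(d, k, bias=False)           # goal embedding, no bias
        self.n_actions, self.k = n_actions, k

    def forward(self, x, manager_state, worker_state, goal_history):
        z = F.relu(self.f_percept(x))
        s = F.relu(self.f_mspace(z))

        # Manager: emit a *directional* goal (unit vector in latent space).
        h_m, c_m = self.manager_rnn(s, manager_state)
        g = F.normalize(h_m, dim=-1)
        goal_history.append(g.detach())                  # no gradient Manager -> Worker

        # Worker: goals over the last c steps are pooled and embedded.
        w = self.phi(torch.stack(goal_history[-10:]).sum(0))   # c = 10 assumed
        h_w, c_w = self.worker_rnn(z, worker_state)
        U = h_w.view(-1, self.n_actions, self.k)
        policy = F.softmax(torch.einsum('bak,bk->ba', U, w), dim=-1)
        return policy, g, (h_m, c_m), (h_w, c_w)
```

Note the g.detach(): as the previous slide says, no gradients flow from the Worker into the Manager's goals; the Manager gets its own training signal (next slide).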
How to train this model? • Could use TD-learning, but then g_t would not have any semantic meaning • Instead: an approximate transition policy gradient trains the Manager, and a standard policy gradient trains the Worker • g_t is a direction in the latent space
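Concretely, the Manager's update pushes g_t towards the direction actually travelled in latent space over the next c steps, weighted by the Manager's advantage, while the Worker receives an intrinsic reward for following recent goals. A minimal sketch of the two terms (tensor shapes, names, and c = 10 are my assumptions):

```python
import torch
import torch.nn.functional as F

def manager_goal_loss(s, g, manager_advantage, c=10):
    """Approximate transition policy gradient for the Manager.

    s: latent states s_t, shape (T, d); g: goals g_t, shape (T, d);
    manager_advantage: A^M_t = R_t - V^M(x_t), shape (T,).
    """
    delta_s = (s[c:] - s[:-c]).detach()        # direction actually travelled over c steps
    cos = F.cosine_similarity(delta_s, g[:-c], dim=-1)
    return -(manager_advantage[:-c].detach() * cos).mean()

def worker_intrinsic_reward(s, g, t, c=10):
    """Intrinsic reward r^I_t = (1/c) * sum_i d_cos(s_t - s_{t-i}, g_{t-i})."""
    if t == 0:
        return torch.tensor(0.0)
    sims = [F.cosine_similarity(s[t] - s[t - i], g[t - i], dim=0)
            for i in range(1, min(c, t) + 1)]
    return torch.stack(sims).sum() / c
```

The Worker is then trained with an ordinary advantage actor-critic update on the environment reward plus the (weighted) intrinsic reward.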
Manager RNN: Dilated LSTM ● Preserves memories over longer periods ● Outputs are summed over c steps ● Performs better (figure: “standard” RNN vs dilated RNN)
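The idea: keep r separate groups of LSTM state and, at step t, update only group t mod r, so each group runs at 1/r of the temporal resolution while the Manager still emits an output every step. A minimal sketch, assuming PyTorch; r = 10 and summing over the groups (rather than pooling exactly the last c outputs as in the paper) are my simplifications:

```python
import torch
import torch.nn as nn

class DilatedLSTM(nn.Module):
    """r independent LSTM state groups; only group t % r is updated at step t."""
    def __init__(self, input_size, hidden_size, r=10):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)  # weights shared across groups
        self.r = r
        self.hidden_size = hidden_size

    def init_state(self, batch_size):
        return ([torch.zeros(batch_size, self.hidden_size) for _ in range(self.r)],
                [torch.zeros(batch_size, self.hidden_size) for _ in range(self.r)])

    def forward(self, x, t, state):
        h_list, c_list = state
        idx = t % self.r                                    # this step's group
        h_new, c_new = self.cell(x, (h_list[idx], c_list[idx]))
        h_list = list(h_list); c_list = list(c_list)
        h_list[idx], c_list[idx] = h_new, c_new
        out = torch.stack(h_list).sum(dim=0)                # pool over all r groups
        return out, (h_list, c_list)
```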
Results on Atari games
Sub-policies inspection
Sub-policies inspection
Is the Dilated LSTM important?
Influence of 𝝱
Transfer Learning ● They changed the number of action repeats
Did it solve Montezuma’s Revenge?
Summary of the results • Using directional goals works well • Better long-term credit assignment • Better transfer learning • The Manager’s goals correspond to different sub-policies • The dilated LSTM is essential for good performance • Meticulous ablation studies: proving their points with evidence (rather than just claiming SOTA)
FeUdal Networks vs the Options Framework ● Only one Worker vs many options ○ Memory efficient ○ Cheaper computationally ● Meaningful goals producing different sub-policies ● Works with a “standard” MDP (no semi-MDP formulation needed)
Contributions (recap) • Differentiable model that implements Feudal RL • Approximate transition policy gradient for training the Manager • Directional goals instead of absolute • Dilated LSTM
Has this method inspired others? https://sites.google.com/stanford.edu/iris/ Learning Latent Plans from Play https://learning-from-play.github.io/
Open challenges • Montezuma’s Revenge remains a challenge • Maybe use a deeper hierarchy and different time scales? • Transfer learning from one environment to another?