Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning
Tianren Tang, Tian Tan, Shangqi Guo, Xiaolin Hu, Feng Chen
Background
• Goal-conditioned HRL
  • The high-level policy suffers from non-stationarity: from the MARL perspective, each agent's policy is influenced by the other agents' changing policies
• Another perspective
  • The action space of the high-level policy is usually too large, so its action (the subgoal for the low-level policy) is often unreachable
  • Intuitively, this calls for action-space reduction or action elimination
  • Drawbacks:
    • no comparable prior work shows how to perform the space reduction
    • reduction or elimination may cause sub-optimality
Intuition
• Restrict the subgoal space to the k-step adjacent region of the current state
Theoretical Analysis
• Shortest transition time $d_{st}(s_1, s_2)$: the minimal expected number of steps needed to reach $s_2$ from $s_1$; the minimum is attained by an optimal goal-reaching policy $\pi^*$
• $\varphi^{-1}: \mathcal{G} \to \mathcal{S}$ is the mapping from a goal $g$ back to its corresponding state $s$
Theoretical Analysis
• The k-step adjacent region of a state $s$ is defined as $\mathcal{G}_A(s, k) = \{ g \in \mathcal{G} \mid d_{st}(s, \varphi^{-1}(g)) \le k \}$ (illustrated in the sketch below)
• Theorem 1: for any goal $g$, there is always a surrogate goal $g' \in \mathcal{G}_A(s, k)$ such that $\pi^*(a^* \mid s, g') = \pi^*(a^* \mid s, g)$
• Theorem 2: for this $g' \in \mathcal{G}_A(s, k)$, $Q^*(s, g') = Q^*(s, g)$
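To make the k-step adjacent region and the shortest transition distance concrete, here is a minimal Python sketch that computes the region by breadth-first search in a small discrete MDP with known one-step transitions; the `transitions` dictionary and the helper name `k_step_adjacent_region` are illustrative assumptions, not constructs from the paper.

```python
from collections import deque

def k_step_adjacent_region(transitions, s0, k):
    """Collect all states whose shortest transition distance from s0 is <= k.

    `transitions` maps each state to the set of states reachable in one step,
    so BFS depth from s0 equals d_st(s0, s) in this toy setting."""
    region = {s0}
    visited = {s0}
    frontier = deque([(s0, 0)])
    while frontier:
        s, d = frontier.popleft()
        if d == k:
            continue  # do not expand beyond k steps
        for s_next in transitions[s]:
            if s_next not in visited:
                visited.add(s_next)
                region.add(s_next)
                frontier.append((s_next, d + 1))
    return region

# Tiny chain MDP 0 <-> 1 <-> 2 <-> 3: from state 1, two steps reach every state.
transitions = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(k_step_adjacent_region(transitions, s0=1, k=2))  # {0, 1, 2, 3}
```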
Theoretical Optimization
• Original optimization objective: find the optimal state trajectory $\tau^* = (s_0, \ldots, s_{Tk})$ and subgoal sequence $\sigma^* = (g_0, g_k, \ldots, g_{(T-1)k})$ that maximize the expected return, where $k$ is the high-level action interval
• Relaxation: by Theorems 1 and 2, restricting every subgoal to the k-step adjacent region of the state where it is issued preserves optimality, turning the problem into the constrained objective sketched below
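A possible LaTeX rendering of the relaxed, adjacency-constrained objective, reconstructed from the definitions on the previous slides; the exact indexing and notation are assumptions and may differ from the paper:

```latex
% Adjacency-constrained relaxation (sketch): maximize the expected return over
% subgoal sequences while keeping each subgoal inside the k-step adjacent
% region of the state where it is issued.
\begin{aligned}
\max_{\sigma = (g_0, g_k, \ldots, g_{(T-1)k})} \quad
  & \mathbb{E}\!\left[\sum_{t=0}^{Tk-1} r_t\right] \\
\text{s.t.} \quad
  & d_{st}\!\left(s_{ik}, \varphi^{-1}(g_{ik})\right) \le k,
  \qquad i = 0, 1, \ldots, T-1 .
\end{aligned}
```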
HRL with Adjacency Constraint
• Adjacency matrix approximation: the k-step adjacency structure is approximated with a learned adjacency network
• Contrastive loss: the network is trained on adjacent and non-adjacent state pairs sampled from trajectories (see the sketch below)
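A minimal sketch of a contrastive-style loss for training the adjacency network, assuming binary adjacency labels are extracted from sampled trajectories; the names `psi`, `eps_k`, and `delta`, and the exact margin form, are assumptions rather than the paper's precise formulation.

```python
import torch
import torch.nn.functional as F

def adjacency_loss(psi, s_i, s_j, labels, eps_k=1.0, delta=0.2):
    """Contrastive-style loss for an adjacency network psi (sketch).

    labels[i] = 1 if the pair (s_i[i], s_j[i]) was observed to be k-step
    adjacent in sampled trajectories, 0 otherwise. Adjacent pairs are pulled
    within eps_k in embedding space; non-adjacent pairs are pushed beyond
    eps_k + delta."""
    dist = torch.norm(psi(s_i) - psi(s_j), dim=-1)
    pos = labels * F.relu(dist - eps_k)                  # adjacent pairs too far apart
    neg = (1.0 - labels) * F.relu(eps_k + delta - dist)  # non-adjacent pairs too close
    return (pos + neg).mean()
```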
Final Optimization Objective
• With a learned adjacency network, the adjacency constraint becomes a differentiable penalty added to the high-level objective (see the sketch below)
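A minimal sketch of how the learned adjacency network could convert the constraint into a penalty on the high-level objective; it assumes (HIRO-style) that subgoals live in the same space the network embeds, and the penalty weight `eta` is a hypothetical hyperparameter.

```python
import torch

def high_level_loss(actor_loss, psi, s_t, g_t, eps_k=1.0, eta=20.0):
    """High-level objective augmented with an adjacency penalty (sketch).

    The penalty is zero whenever the proposed subgoal g_t is predicted to lie
    inside the k-step adjacent region of s_t, and grows linearly with the
    embedding distance otherwise."""
    dist = torch.norm(psi(s_t) - psi(g_t), dim=-1)
    adjacency_penalty = torch.relu(dist - eps_k).mean()
    return actor_loss + eta * adjacency_penalty
```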
Algorithm
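A high-level sketch of the training loop these slides imply; every interface here (`env`, `high_policy`, `low_policy`, `psi`) is hypothetical, and the snippet only illustrates how the high-level, low-level, and adjacency-network updates interleave, not the paper's exact algorithm.

```python
def train(env, high_policy, low_policy, psi, k, num_steps):
    """Interleave subgoal generation, low-level control, and adjacency learning (sketch)."""
    trajectory = []
    s = env.reset()
    for t in range(num_steps):
        if t % k == 0:
            g = high_policy.sample_subgoal(s)   # shaped toward the k-step adjacent region
        a = low_policy.sample_action(s, g)
        s_next, r, done = env.step(a)
        trajectory.append((s, g, a, r, s_next))
        low_policy.update(trajectory)           # intrinsic reward for approaching g
        high_policy.update(trajectory, psi)     # environment reward + adjacency penalty
        psi.update_from_pairs(trajectory, k)    # contrastive loss on state pairs
        s = env.reset() if done else s_next
    return high_policy, low_policy, psi
```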
Experimental Environments
• Discrete & continuous tasks
• Results
Ablation Study
• Differences:
  • HRAC-O: HRAC with the perfect adjacency matrix obtained from the environment
  • NegReward: relabel the reward as a negative penalty and bound the critic function
Visualization
Summary
• Although the intuition is simple, the paper is solid overall.
Recommendations
More Recommendations