Subgoals in Hierarchical Reinforcement Learning Tianren Tang Tian - PowerPoint PPT Presentation


  1. Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning • Tianren Tang, Tian Tan, Shangqi Guo, Xiaolin Hu, Feng Chen

  2. Background • Goal-Conditioned HRL • The high-level policy suffers from non-stationarity • From a MARL perspective, each agent's policy is influenced by the other agents • Another Perspective • The action space of the high-level policy is usually too large, so its actions, which serve as subgoals for the low-level policy, are often unreachable • Intuitive remedy: action space reduction or action elimination • Drawbacks: • No similar prior work shows how to perform the space reduction • Reduction or elimination may lead to a suboptimal policy

  3. Intuition • Restrict the subgoal space to a k-step adjacent region

  4. Theoretical Analysis • Shortest Transition Time • For the optimal policy $\pi^*$ • where $\varphi^{-1}: G \to S$ is a mapping from goal to state s

  5. Theoretical Analysis • The k-step adjacent region of s is defined: • Theorem 1: • There is always a surrogate goal $g' \in G_A$ such that $\pi^*(a^* \mid s, g') = \pi^*(a^* \mid s, g)$ • Theorem 2: • For $g' \in G_A$, $Q^*(s, g') = Q^*(s, g)$
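To make the k-step adjacent region concrete, here is a minimal sketch for a small deterministic grid world, where the shortest transition time reduces to a breadth-first-search distance. The grid layout, the `neighbors` helper, and the choice to treat goals as states (an identity goal mapping) are assumptions for illustration and do not come from the slides.

```python
from collections import deque

# Hypothetical 4x4 grid world: '#' is a wall, '.' is free space.
GRID = [
    "....",
    ".##.",
    "....",
    "....",
]

def neighbors(state):
    """Yield states reachable in one step (4-connected moves, assumed dynamics)."""
    r, c = state
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] != "#":
            yield (nr, nc)

def k_step_adjacent_region(start, k):
    """Return {state: shortest transition time} for every state within k steps of start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if dist[s] == k:
            continue  # states at distance k are kept but not expanded further
        for n in neighbors(s):
            if n not in dist:
                dist[n] = dist[s] + 1
                queue.append(n)
    return dist

region = k_step_adjacent_region((0, 0), k=2)
print(sorted(region))
```

With k = 2 the region around the top-left corner contains only the cells reachable within two moves, so a high-level policy restricted to it can never propose a subgoal on the far side of the wall.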

  6. Theoretical Optimizations • Original optimization objective, where $\tau^* = (s_0, \ldots, s_{Tk})$ and $\rho^* = (g_0, \ldots, g_{(T-1)k})$ • Relax the above equations:

  7. HRL with Adjacency Constraint • Adjacency Matrix approximation • Contrastive Loss
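The slide names a contrastive loss without giving its form. The following is a minimal sketch of one common hinge-style formulation, assuming that k-step adjacent pairs should be embedded within a distance `eps_k` and all other pairs at least `eps_k + delta` apart. The function name, both hyperparameters, and the toy embeddings are hypothetical and may differ from what the paper actually uses.

```python
import math

def contrastive_adjacency_loss(pairs, eps_k=1.0, delta=0.2):
    """Mean hinge-style contrastive loss over (emb_i, emb_j, adjacent) triples.

    adjacent=True  -> penalize embedding distance above eps_k (pull together).
    adjacent=False -> penalize embedding distance below eps_k + delta (push apart).
    eps_k and delta are assumed hyperparameters, not values from the slides.
    """
    total = 0.0
    for emb_i, emb_j, adjacent in pairs:
        d = math.dist(emb_i, emb_j)  # Euclidean distance between embeddings
        if adjacent:
            total += max(d - eps_k, 0.0)
        else:
            total += max(eps_k + delta - d, 0.0)
    return total / len(pairs)

# Toy batch: one adjacent pair embedded too far apart,
# one non-adjacent pair embedded too close together.
pairs = [
    ((0.0, 0.0), (3.0, 4.0), True),
    ((0.0, 0.0), (0.5, 0.0), False),
]
loss = contrastive_adjacency_loss(pairs)
print(loss)
```

Both pairs in the toy batch violate their constraint, so each contributes a positive term. A pair that already satisfies its constraint contributes zero.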

  8. Final Optimization Objective • With a learned adjacency network
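One way to read "with a learned adjacency network" is that the hard adjacency constraint becomes a differentiable penalty added to the high-level loss. The sketch below assumes that reading. The names `adjacency_penalty` and `high_level_loss`, the weight `eta`, and the threshold `eps_k` are hypothetical, and the embeddings stand in for the outputs of the learned network.

```python
import math

def adjacency_penalty(state_emb, subgoal_emb, eps_k=1.0):
    """Hinge penalty: zero while the subgoal embedding stays within eps_k of the state."""
    return max(math.dist(state_emb, subgoal_emb) - eps_k, 0.0)

def high_level_loss(task_loss, state_emb, subgoal_emb, eta=10.0, eps_k=1.0):
    """Task loss plus a weighted adjacency penalty (eta is an assumed balancing weight)."""
    return task_loss + eta * adjacency_penalty(state_emb, subgoal_emb, eps_k)

# A nearby subgoal incurs no penalty; a distant one is penalized heavily.
near = high_level_loss(0.5, (0.0, 0.0), (0.3, 0.4))
far = high_level_loss(0.5, (0.0, 0.0), (3.0, 4.0))
print(near, far)
```

Because the penalty vanishes inside the adjacent region, it leaves the task objective untouched for reachable subgoals and only steers the high-level policy away from distant ones.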

  9. Algorithm

  10. Experiment Environment • Discrete & Continuous • Result

  11. Ablation Study • Difference: • HRAC-O: HRAC with a perfect adjacency matrix from the environment • NegReward: relabels the reward as negative and bounds the critic function

  12. Visualization

  13. Summary • Although the intuition is simple, the paper is solid overall.
