Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning
Tianren Tang, Tian Tan, Shangqi Guo, Xiaolin Hu, Feng Chen
Background
• Goal-conditioned HRL
  • The high-level policy suffers from non-stationarity: from the MARL perspective, each agent's policy is influenced by the other agents' changing policies
• Another perspective
  • The action space of the high-level policy is usually too large, so its action (the subgoal for the low-level policy) is often unreachable
  • Intuitively, this calls for action-space reduction or action elimination
  • Drawbacks:
    • no comparable prior work shows how to perform the space reduction
    • reduction or elimination may cause sub-optimality
Intuition
• Restrict the subgoal space to the k-step adjacent region of the current state
Theoretical Analysis
• Shortest transition time $d_{st}(s_1, s_2)$: the minimal expected number of steps needed to reach $s_2$ from $s_1$; the minimum is attained by an optimal goal-reaching policy $\pi^*$
• $\varphi^{-1}: \mathcal{G} \to \mathcal{S}$ is the mapping from a goal $g$ back to its corresponding state $s$
Theoretical Analysis
• The k-step adjacent region of a state $s$ is defined as $\mathcal{G}_A(s, k) = \{ g \in \mathcal{G} \mid d_{st}(s, \varphi^{-1}(g)) \le k \}$ (illustrated in the sketch below)
• Theorem 1: for any goal $g$, there is always a surrogate goal $g' \in \mathcal{G}_A(s, k)$ such that $\pi^*(a^* \mid s, g') = \pi^*(a^* \mid s, g)$
• Theorem 2: for this $g' \in \mathcal{G}_A(s, k)$, $Q^*(s, g') = Q^*(s, g)$
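To make the k-step adjacent region and the shortest transition distance concrete, here is a minimal Python sketch that computes the region by breadth-first search in a small discrete MDP with known one-step transitions; the `transitions` dictionary and the helper name `k_step_adjacent_region` are illustrative assumptions, not constructs from the paper.

```python
from collections import deque

def k_step_adjacent_region(transitions, s0, k):
    """Collect all states whose shortest transition distance from s0 is <= k.

    `transitions` maps each state to the set of states reachable in one step,
    so BFS depth from s0 equals d_st(s0, s) in this toy setting."""
    region = {s0}
    visited = {s0}
    frontier = deque([(s0, 0)])
    while frontier:
        s, d = frontier.popleft()
        if d == k:
            continue  # do not expand beyond k steps
        for s_next in transitions[s]:
            if s_next not in visited:
                visited.add(s_next)
                region.add(s_next)
                frontier.append((s_next, d + 1))
    return region

# Tiny chain MDP 0 <-> 1 <-> 2 <-> 3: from state 1, two steps reach every state.
transitions = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(k_step_adjacent_region(transitions, s0=1, k=2))  # {0, 1, 2, 3}
```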
Theoretical Optimization
• Original optimization objective: find the optimal state trajectory $\tau^* = (s_0, \ldots, s_{Tk})$ and subgoal sequence $\sigma^* = (g_0, g_k, \ldots, g_{(T-1)k})$ that maximize the expected return, where $k$ is the high-level action interval
• Relaxation: by Theorems 1 and 2, restricting every subgoal to the k-step adjacent region of the state where it is issued preserves optimality, turning the problem into the constrained objective sketched below
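A possible LaTeX rendering of the relaxed, adjacency-constrained objective, reconstructed from the definitions on the previous slides; the exact indexing and notation are assumptions and may differ from the paper:

```latex
% Adjacency-constrained relaxation (sketch): maximize the expected return over
% subgoal sequences while keeping each subgoal inside the k-step adjacent
% region of the state where it is issued.
\begin{aligned}
\max_{\sigma = (g_0, g_k, \ldots, g_{(T-1)k})} \quad
  & \mathbb{E}\!\left[\sum_{t=0}^{Tk-1} r_t\right] \\
\text{s.t.} \quad
  & d_{st}\!\left(s_{ik}, \varphi^{-1}(g_{ik})\right) \le k,
  \qquad i = 0, 1, \ldots, T-1 .
\end{aligned}
```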
HRL with Adjacency Constraint
• Adjacency matrix approximation: the k-step adjacency structure is approximated with a learned adjacency network
• Contrastive loss: the network is trained on adjacent and non-adjacent state pairs sampled from trajectories (see the sketch below)
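A minimal sketch of a contrastive-style loss for training the adjacency network, assuming binary adjacency labels are extracted from sampled trajectories; the names `psi`, `eps_k`, and `delta`, and the exact margin form, are assumptions rather than the paper's precise formulation.

```python
import torch
import torch.nn.functional as F

def adjacency_loss(psi, s_i, s_j, labels, eps_k=1.0, delta=0.2):
    """Contrastive-style loss for an adjacency network psi (sketch).

    labels[i] = 1 if the pair (s_i[i], s_j[i]) was observed to be k-step
    adjacent in sampled trajectories, 0 otherwise. Adjacent pairs are pulled
    within eps_k in embedding space; non-adjacent pairs are pushed beyond
    eps_k + delta."""
    dist = torch.norm(psi(s_i) - psi(s_j), dim=-1)
    pos = labels * F.relu(dist - eps_k)                  # adjacent pairs too far apart
    neg = (1.0 - labels) * F.relu(eps_k + delta - dist)  # non-adjacent pairs too close
    return (pos + neg).mean()
```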
Final Optimization Objective
• With a learned adjacency network, the adjacency constraint becomes a differentiable penalty added to the high-level objective (see the sketch below)
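A minimal sketch of how the learned adjacency network could convert the constraint into a penalty on the high-level objective; it assumes (HIRO-style) that subgoals live in the same space the network embeds, and the penalty weight `eta` is a hypothetical hyperparameter.

```python
import torch

def high_level_loss(actor_loss, psi, s_t, g_t, eps_k=1.0, eta=20.0):
    """High-level objective augmented with an adjacency penalty (sketch).

    The penalty is zero whenever the proposed subgoal g_t is predicted to lie
    inside the k-step adjacent region of s_t, and grows linearly with the
    embedding distance otherwise."""
    dist = torch.norm(psi(s_t) - psi(g_t), dim=-1)
    adjacency_penalty = torch.relu(dist - eps_k).mean()
    return actor_loss + eta * adjacency_penalty
```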
Algorithm
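A high-level sketch of the training loop these slides imply; every interface here (`env`, `high_policy`, `low_policy`, `psi`) is hypothetical, and the snippet only illustrates how the high-level, low-level, and adjacency-network updates interleave, not the paper's exact algorithm.

```python
def train(env, high_policy, low_policy, psi, k, num_steps):
    """Interleave subgoal generation, low-level control, and adjacency learning (sketch)."""
    trajectory = []
    s = env.reset()
    for t in range(num_steps):
        if t % k == 0:
            g = high_policy.sample_subgoal(s)   # shaped toward the k-step adjacent region
        a = low_policy.sample_action(s, g)
        s_next, r, done = env.step(a)
        trajectory.append((s, g, a, r, s_next))
        low_policy.update(trajectory)           # intrinsic reward for approaching g
        high_policy.update(trajectory, psi)     # environment reward + adjacency penalty
        psi.update_from_pairs(trajectory, k)    # contrastive loss on state pairs
        s = env.reset() if done else s_next
    return high_policy, low_policy, psi
```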
Experimental Environments
• Discrete & continuous tasks
• Results
Ablation Study
• Differences:
  • HRAC-O: HRAC with the perfect adjacency matrix obtained from the environment
  • NegReward: relabel the reward as a negative penalty and bound the critic function
Visualization
Summary
• Although the intuition is simple, the paper is solid overall.
Recommendations
More Recommendations