

  1. Unsupervised Subgoal Discovery Method for Learning Hierarchical Representations. Jacob Rafati, Ph.D., Electrical Engineering and Computer Science (EECS), Computational Cognitive Neuroscience Laboratory (CCNL), http://rafati.net. Co-authored with David C. Noelle, Professor and Chair of Cognitive and Information Sciences, Founding Faculty of EECS & CSE, Director of CCNL, University of California, Merced. Workshop on Structure and Priors in Reinforcement Learning (SPiRL 2019), 7th International Conference on Learning Representations (ICLR 2019).

  2. Reinforcement Learning. Reinforcement learning (RL) is learning how to map situations (states) to an agent's decisions (actions) to maximize future rewards (return) by interacting with an unknown environment. Experience (s, a, r, s') as data. Sutton and Barto (2017). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, USA, 2nd edition.
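The experience tuple (s, a, r, s') is the basic unit of data in RL. As a minimal, hedged illustration (not the authors' code), the tabular Q-learning update below consumes one such tuple; the state indices and hyperparameters are hypothetical.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, n_actions, alpha=0.1, gamma=0.99):
    """One temporal-difference update driven by a single (s, a, r, s') experience."""
    best_next = max(Q[(s_next, a2)] for a2 in range(n_actions))
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)          # tabular action-value estimates, initialized to zero
experience = (0, 1, 0.0, 2)     # hypothetical (s, a, r, s') sample
q_learning_update(Q, *experience, n_actions=4)
```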

  3. Generalization. A parameterized value function q(s, a_i; w): a function approximator with weights w takes the state s as input and outputs the expectation of return (e.g., game scores) for each action a_i. [Diagram: state s → function approximator (weights w) → q(s, a_i; w) for each action.]
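As a minimal sketch of the parameterized value function (assuming a simple linear approximator rather than the deep network used in practice), the class below maps a state feature vector s and weights w to one return estimate q(s, a_i; w) per action:

```python
import numpy as np

class LinearQ:
    """Linear function approximator for q(s, a_i; w): one output per action."""
    def __init__(self, state_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(n_actions, state_dim))  # weights w
        self.b = np.zeros(n_actions)

    def __call__(self, s):
        return self.W @ s + self.b   # estimated return (e.g., game score) per action

q = LinearQ(state_dim=8, n_actions=4)
print(q(np.ones(8)))                 # q(s, a_i; w) for each action a_i
```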

  4. Success in easy tasks, failure in more complex tasks. Mnih et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.

  5. Learning Representations in Model-Free HRL
     • Temporal Abstraction: learning to operate over different levels of temporal abstraction, and learning a meta-policy to choose a proper subgoal.
     • Intrinsic Motivation Learning: efficiently exploring the state space while learning reusable subpolicies (skills) through intrinsic motivation learning. The intrinsic critic sends intrinsic rewards based on attaining subgoals (a minimal sketch follows this slide).
     • Automatic Subgoal Discovery: automatic subgoal discovery in large-scale tasks with sparse, delayed feedback within a model-free HRL framework.
     • A Unified Approach: integration of temporal abstraction, intrinsic motivation learning, and subgoal discovery into one unified algorithm for learning hierarchical representations in model-free HRL.
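As referenced above, here is a minimal sketch of an intrinsic critic (hypothetical names and a simple distance test, not the paper's implementation): the controller receives an intrinsic reward when the current state attains the subgoal chosen by the meta-controller.

```python
import numpy as np

def intrinsic_critic(state, subgoal, tol=1e-3):
    """Return (intrinsic reward, attained flag) for a vector-valued state and subgoal."""
    attained = bool(np.linalg.norm(np.asarray(state) - np.asarray(subgoal)) < tol)
    return (1.0 if attained else 0.0), attained
```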

  6. Meta-controller/Controller Framework. Kulkarni et al. (2016). Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. NeurIPS.
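A schematic sketch of the meta-controller/controller loop described by Kulkarni et al. (2016): the meta-controller selects a subgoal g, the controller acts under intrinsic rewards until g is attained (or the episode ends), and the meta-controller is credited with the extrinsic reward accumulated while pursuing g. The env, meta_controller, controller, and intrinsic_critic interfaces are hypothetical placeholders.

```python
def run_episode(env, meta_controller, controller, intrinsic_critic, max_steps=1000):
    s = env.reset()
    for _ in range(max_steps):
        g = meta_controller.select_subgoal(s)                # temporal abstraction
        s0, extrinsic_return = s, 0.0
        done = attained = False
        while not (done or attained):
            a = controller.select_action(s, g)
            s_next, r, done = env.step(a)
            r_int, attained = intrinsic_critic(s_next, g)
            controller.store((s, g, a, r_int, s_next))       # learn subpolicies (skills)
            extrinsic_return += r
            s = s_next
        meta_controller.store((s0, g, extrinsic_return, s))  # learn the meta-policy
        if done:
            break
```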

  7. Unsupervised Subgoal Discovery
     Properties of a useful subgoal:
     • It is close to a rewarding state.
     • It represents a set of states, at least some of which tend to lie along a state-transition path to a rewarding state.
     Hypothesis: we can use unsupervised learning methods to find useful subgoals based on a memory of the agent's experiences (rewards and visited states). Candidate subgoals: centroids of K-means clusters (e.g., rooms), outliers (e.g., key, box), and the boundary between two clusters (e.g., a doorway).

  8. Unsupervised Subgoal Discovery: anomaly detection and K-means clustering. [Figure: clusterings of visited states for K = 4, 6, and 8.]
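A minimal sketch of the two unsupervised subgoal-discovery steps named on these slides, assuming states are stored as feature vectors in an experience memory (scikit-learn's KMeans stands in for the clustering step; the reward threshold for anomalies is a hypothetical choice):

```python
import numpy as np
from sklearn.cluster import KMeans

def discover_subgoals(memory, K=4, reward_threshold=0.0):
    """memory: list of (s, a, r, s_next) tuples with s, s_next as feature vectors."""
    states = np.array([s for (s, a, r, s_next) in memory])
    # 1) Centroids of K-means clusters over visited states (e.g., rooms).
    kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(states)
    centroids = list(kmeans.cluster_centers_)
    # 2) Anomaly detection: unusually rewarding transitions mark outlier states
    #    (e.g., picking up the key, reaching the box).
    anomalies = [np.asarray(s_next) for (s, a, r, s_next) in memory
                 if r > reward_threshold]
    return centroids + anomalies
```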

  9. Unified Model-Free HRL. [Architecture diagram: the agent stores experiences (s_t, a_t, r_{t+1}, s_{t+1}) in an experience memory D; a subgoal discovery module produces the subgoal set G; the meta-controller selects a subgoal g for the controller, which takes actions a_t in the environment and receives states s_{t+1} and rewards r_{t+1}.]
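A high-level sketch of the unified loop in the diagram above (hypothetical helper names, not the released code): experiences flow into the memory D, unsupervised subgoal discovery periodically refreshes the subgoal set G, and the meta-controller/controller pair keeps learning against the current G.

```python
def unified_hrl(env, agent, discover_subgoals, episodes=1000, discovery_every=100):
    D, G = [], []                                         # experience memory and subgoal set
    for episode in range(episodes):
        trajectory = agent.run_episode(env, subgoals=G)   # meta-controller + controller
        D.extend(trajectory)                              # store (s, a, r, s') experiences
        if (episode + 1) % discovery_every == 0:
            G = discover_subgoals(D)                      # unsupervised subgoal discovery on D
        agent.learn(D, G)                                 # update controller and meta-controller
    return agent, G
```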

  10. Results — 4-Rooms Task. [Four plots versus training episodes: (a) success in reaching subgoals (%), comparing intrinsic motivation within unified HRL, intrinsic motivation with random subgoal selection, and regular RL; (b) state-space coverage rate (%) for K = 4, 6, and 8 versus a random walk; (c) success in solving the task (%) and (d) episode return, for the unified model-free HRL method with K = 4, 6, and 8 versus regular RL.]

  11. Montezuma's Revenge. [Figure: initial subgoals found by unsupervised subgoal discovery over random-walk experiences (our method), visualized with edge detection and bounding boxes. Two plots versus training steps: average return over 1000 episodes and success in reaching subgoals (%), comparing the unified model-free HRL method with the DeepMind DQN algorithm (Mnih et al., 2015).]

  12. Neural Correlates of Unsupervised Subgoal Discovery
     • Temporal abstraction in HRL might map onto regions within the dorsolateral and orbital prefrontal cortex (PFC).
     • More recent discoveries reveal a potential role for medial temporal lobe structures, including the hippocampus, in planning and spatial navigation, utilizing a hierarchical representation of space.
     • There is evidence that the hippocampus serves both model-based and model-free HRL, with both flexibility and computational efficiency.
     • Place cells in the dorsal hippocampus represent small regions of space, while those in the ventral hippocampus represent larger regions.
     Strange et al. (2014). Functional organization of the hippocampal longitudinal axis. Nature Reviews Neuroscience, 15(10):655–669.
     Chalmers et al. (2016). Computational properties of the hippocampus increase the efficiency of goal-directed foraging through hierarchical reinforcement learning. Frontiers in Computational Neuroscience, 10.
     Botvinick et al. (2009). Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition, 113(3).
     Botvinick, M. and Weinstein, A. (2014). Model-based hierarchical reinforcement learning and human action control. Philosophical Transactions of the Royal Society B: Biological Sciences, 369.

  13. Conclusions
     • We proposed and demonstrated a novel model-free method for subgoal discovery using unsupervised learning over a small memory of the agent's experiences (trajectories).
     • When combined with an intrinsic motivation learning mechanism, this method learns subgoals and skills together, based on experiences in the environment.
     • Intrinsic motivation learning provides an efficient exploration scheme in tasks with sparse rewards, which leads to successful subgoal discovery.
     • We offered a unified approach for learning hierarchical representations in a model-free HRL framework. This method is scalable to larger problems.

  14. Publications
     • Jacob Rafati and David C. Noelle (2019). Unsupervised Subgoal Discovery Method for Learning Hierarchical Representations. In 7th International Conference on Learning Representations (ICLR 2019), Workshop on "Structure & Priors in Reinforcement Learning", New Orleans, LA, USA.
     • Jacob Rafati and David C. Noelle (2019). Unsupervised Methods for Subgoal Discovery During Intrinsic Motivation in Model-Free Hierarchical Reinforcement Learning. In 33rd AAAI Conference on Artificial Intelligence (AAAI-19), Workshop on Knowledge Extraction from Games, Honolulu, Hawaii, USA.
     • Jacob Rafati and David C. Noelle (2019). Learning Representations in Model-Free Hierarchical Reinforcement Learning. In 33rd AAAI Conference on Artificial Intelligence (AAAI-19), Honolulu, Hawaii.
     • Jacob Rafati and David C. Noelle (2019). Learning Representations in Model-Free Hierarchical Reinforcement Learning. arXiv e-print (arXiv:1810.10096).

  15. Questions and Feedback. For paper, code, and slides: http://rafati.net. Email: yrafati@gmail.com
