Hierarchical Reinforcement Learning and Human Behavior



  1. Hierarchical Reinforcement Learning and Human Behavior Matthew Botvinick Princeton Neuroscience Institute and Department of Psychology Princeton University

  2. [Figure; labels: a, a, a, *, v, v, v]

  3. Knutson et al., NeuroReport, 2001; Schultz et al., Science, 1997

  4. Matsumoto & Hikosaka, Nature, 2007; Gehring & Willoughby, Science, 2002

  5. From Gläscher, Daw, Dayan & O’Doherty, 2010; from Niv, Joel & Dayan, TICS, 2006 (artwork by B. Balleine)

  6. 0.8%

  7. The Curse of Dimensionality

  8. 15.6% The Blessing of Abstraction

  9. [Figure: state sequence s(t=1), s(t=2), s(t=3), s(t=4), s(t=5), s(t=6)] Botvinick, Niv & Barto, Cognition, 2009

  10. [Figure: grid with cells labeled W, S, P, and G] After Sutton, Precup & Singh, 1999; Botvinick, Niv & Barto, Cognition, 2009

  11. [Figure: state sequence s(t=1), s(t=2), s(t=3), s(t=4), s(t=5), s(t=6)] Botvinick, Niv & Barto, Cognition, 2009

  12. Botvinick & Weinstein, Trans. Royal Society, 2014

  13. Hamilton & Grafton, J Neurosci, 2006; Humphreys & Forde, Cog. Neuropsych., 2001

  14. [Figure, panels A to D, callout 1: actor-critic architecture with actor π(s), critic V(s), reward R(s), prediction error δ, state (s), action, and environment, mapped onto DLS, VS, DA, and HT+; panels B and D add DLPFC and OFC, with components subscripted o] Botvinick, Niv & Barto, Cognition, 2009

  15. From Curtis & D’Esposito, TICS, 2003

  16. White & Wise, Exp Br Res, 1999

  17. Miller & Cohen, Ann. Rev. Neurosci, 2001

  18. From Badre, TICS, 2008

  19. [Figure, panels A to D, callout 2: actor-critic architecture with actor π(s), critic V(s), reward R(s), prediction error δ, state (s), action, and environment, mapped onto DLS, VS, DA, and HT+; panels B and D add DLPFC and OFC, with components subscripted o] Botvinick, Niv & Barto, Cognition, 2009

  20. O’Reilly & Frank, Neural Computation, 2006

  21. O’Reilly & Frank, Neural Computation, 2006; Bonini et al., J. Neurosci., 2011

  22. [Figure, panels A to D, callout 3: actor-critic architecture with actor π(s), critic V(s), reward R(s), prediction error δ, state (s), action, and environment, mapped onto DLS, VS, DA, and HT+; panels B and D add DLPFC and OFC, with components subscripted o] Botvinick, Niv & Barto, Cognition, 2009

  23. Schoenbaum et al., J Neurosci, 1999

  24. [Figure, panels A to D, callout 4: actor-critic architecture with actor π(s), critic V(s), reward R(s), prediction error δ, state (s), action, and environment, mapped onto DLS, VS, DA, and HT+; panels B and D add DLPFC and OFC, with components subscripted o] Botvinick, Niv & Barto, Cognition, 2009

  25. [Figure: state sequence s(t=1), s(t=2), s(t=3), s(t=4), s(t=5), s(t=6)]

  26. Carlos Diuk

  27. Carlos Diuk. Diuk et al., J Neurosci, 2013

  28. [Figure: state sequence s(t=1), s(t=2), s(t=3), s(t=4), s(t=5), s(t=6), annotated “RPE” and “PPE”]

  29. Jose Fernandes, Alec Solway. Ribas-Fernandes et al., Neuron, 2011

  30. Jose Fernandes, Alec Solway. [Figure: panels “Standard RL” and “Hierarchical RL”; RPE and PPE traces on a scale from -1 to 1, plotted against timestep, with events labeled A to E] Ribas-Fernandes et al., Neuron, 2011

  31. Jose Fernandes, Alec Solway. From Yeung et al., 2005; Ribas-Fernandes et al., Neuron, 2011

  32. Jose Fernandes, Alec Solway. Ribas-Fernandes et al., Neuron, 2011

  33. [Figure, panel A: log solution time (1 to 4) against episode (1 to 200); labels: Search Time, Model Evidence] Botvinick, Niv & Barto, Cognition, 2009

  34. The Burden of Abstraction

  35. 1. What should be learned? 2. Do people learn it? 3. How?

  36. Alec Solway, Carlos Diuk. Solway et al., PLoS Comp. Biol., 2014

  37. Alec Solway, Carlos Diuk. Solway et al., PLoS Comp. Biol., 2014

  38. Alec Solway, Carlos Diuk. Solway et al., PLoS Comp. Biol., 2014

  39. Alec Solway, Carlos Diuk. $\Pr(\text{data} \mid \text{model}) =$ Solway et al., PLoS Comp. Biol., 2014

  40. $\Pr(\text{data} \mid \text{model}) = \sum_{\theta \in \Theta} \Pr(\text{data} \mid \text{model}, \theta)\,\Pr(\theta \mid \text{model})$ (Model Evidence). Labels: Codelength, Search Time. Solway et al., PLoS Comp. Biol., 2014

  41. Solway et al., PLoS Comp. Biol., 2014

  42. Zachary’s karate club; Santa Fe Institute collaborations; Lusseau’s bottlenose dolphins. Fortunato, Physics Reports, 2010

  43. Simsek, Wolfe & Barto, 2005

  44. 1. What should be learned? 2. Do people learn it? 3. How?

  45. Carlos Diuk, Debbie Yee. Solway et al., PLoS Comp. Biol., 2014

  46. Carlos Diuk, Debbie Yee. Solway et al., PLoS Comp. Biol., 2014

  47. Carlos Diuk, Debbie Yee. Solway et al., PLoS Comp. Biol., 2014

  48. Carlos Diuk, Debbie Yee. [Figure, panel B: axis ticks from 2200 to 2900; labels: G, S, Reject] Solway et al., PLoS Comp. Biol., 2014

  49. 1. What should be learned? 2. Do people learn it? 3. How?

  50. $\Pr(\text{data} \mid \text{model}) = \sum_{\theta \in \Theta} \Pr(\text{data} \mid \text{model}, \theta)\,\Pr(\theta \mid \text{model})$

  51. Anna Schapiro. Schapiro et al., Nature Neurosci, 2013

  52. [Figure: 3 × 3 correlation matrix with rows (1.00, 0.66, -0.36), (0.66, 1.00, -0.36), (-0.36, -0.36, 1.00)] Schapiro et al., Nature Neurosci, 2013

  53. Schapiro et al., Nature Neurosci, 2013

  54. Schapiro et al., Nature Neurosci, 2013

  55. [Figure; axis label: Time] Schapiro et al., Nature Neurosci, 2013

  56. [Figure: probability of parse (0 to 0.4) for cluster transitions versus other transitions; panel labels: Experiment 1, Experiment 2, All trials, Hamiltonian paths] Schapiro et al., Nature Neurosci, 2013

  57. [Figure; label: + HC] Schapiro et al., Nature Neurosci, 2013

  58. Carlos Diuk. [Figure: Successor Representation, correlation scale 0.0 to 1.0; pattern correlation, scale 0.18 to 0.38; label: + HC] Diuk et al., in prep.

  59. [Figure; labels: Current Stimulus, Next Stimulus] Schapiro et al., 2013; Rogers & McClelland, 2003

  60. Labels: Codelength; Search Time; Model Evidence, $\Pr(\text{data} \mid \text{model})$. Solway et al., PLoS Comp. Biol., 2014

  61. cf. Dayan, 1993

  62. Rosvall & Bergstrom, PNAS, 2008

  63. Mahadevan & Maggioni, 2005

  64. Stachenfeld, Botvinick & Gershman, NIPS, 2014

  65. Olshausen & Field, Nature, 1996

  66. Botvinick & Plaut, Psych Review, 2004

  67. Conclusions
      • The scaling problem in RL
      • Hierarchy can help
      • Model-free versus model-based HRL
      • HRL in the brain
      • The need for good representations
      • Task decomposition, bottlenecks, community detection
      • Prospective coding and structure discovery
      • Hierarchy as compression
      [Figure label: Codelength]

  68. Collaborators / Lab Contributors: Carlos Diuk (Facebook); Jose Ribas-Fernandes (U. Victoria); Andy Barto (UMass); Anna Schapiro; Yael Niv (Princeton); Alec Solway (V. Tech / UCL); Tim Rogers (Wisconsin); Kim Stachenfeld; Nick Turk-Browne (Princeton); Ari Weinstein; Debbie Yee (Wash. U.)
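
The actor-critic diagrams on slides 14, 19, 22 and 24 label an actor π(s), a critic V(s), a reward R(s) and a prediction error δ. The transcript does not preserve how these pieces interact, so the following is only a minimal sketch of a standard tabular actor-critic with a temporal-difference error, not the model from the talk. The corridor task, the learning rates `alpha` and `beta`, the discount `gamma`, and the function names are all assumptions made for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_actor_critic(n_states=5, episodes=500, max_steps=100,
                     alpha=0.1, beta=0.1, gamma=0.95, seed=0):
    """Tabular actor-critic on a hypothetical corridor task (an assumption).

    States are 0..n_states-1, actions are 0 (left) and 1 (right), and the
    only reward is R = 1 on entering the last state, which ends the episode.
    """
    rng = random.Random(seed)
    V = [0.0] * n_states                            # critic: V(s)
    prefs = [[0.0, 0.0] for _ in range(n_states)]   # actor: preferences behind pi(s)
    goal = n_states - 1

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            pi = softmax(prefs[s])
            a = 0 if rng.random() < pi[0] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            done = s_next == goal
            r = 1.0 if done else 0.0

            # Prediction error: delta = R + gamma * V(s') - V(s),
            # with V(s') taken as 0 at the terminal state.
            target = r + (0.0 if done else gamma * V[s_next])
            delta = target - V[s]

            V[s] += alpha * delta          # critic update
            prefs[s][a] += beta * delta    # actor update for the chosen action

            if done:
                break
            s = s_next
    return V, prefs


if __name__ == "__main__":
    values, preferences = run_actor_critic()
    for state, (v, p) in enumerate(zip(values, preferences)):
        print(state, round(v, 3), [round(x, 3) for x in softmax(p)])
```

The same δ drives both updates: the critic moves V(s) toward the observed target, and the actor strengthens or weakens the action it just took. In this sketch the actor and critic are flat; the option-level components shown in panels B and D of the diagrams are not modeled.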
