Prefrontal cortex as a meta-reinforcement learning system (Matthew Botvinick, PowerPoint presentation)

  1. Prefrontal cortex as a meta-reinforcement learning system. Matthew Botvinick, DeepMind, London UK; Gatsby Computational Neuroscience Unit, UCL

  2. Mnih et al., Nature (2015)

  3. Mnih et al., Nature (2015)

  4. Yamins & DiCarlo, 2016

  5. Schultz et al., Science (1997)

  6. Jaderberg et al., 2016

  7. Jaderberg et al., 2016

  8. Mante et al., Nature, 2013; Song et al., eLife, 2017

  9. Lake et al., Behavioral and Brain Sciences (2017)

  10. “Learning to learn”. Harlow, Psychological Review, 1949

  11. “Learning to learn” [Figure: training episodes.] Harlow, Psychological Review, 1949

  12. Mnih et al., Nature (2015)

  13. Jaderberg et al., 2016

  14. Jaderberg et al., 2016

  15. https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/

  16. [Diagram: recurrent network (PFC) with inputs o_t, a_{t-1}, r_{t-1} and outputs a_t, v_t; δ (DA).] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016); Duan et al., arXiv (2016)

  17. [Figure: example bandit episodes; arm reward probabilities 0.7, 0.4, 0.6, 0.9, 0.3, 0.1, 0.8, 0.7.] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)

  18. [Diagram: recurrent network (PFC) with inputs o_t, a_{t-1}, r_{t-1} and outputs a_t, v_t; δ (DA).] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)

  19. [Figure: left/right choices across trials in example episodes; cumulative regret by trial (1 to 100) compared with Gittins indices, Thompson sampling and UCB.] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)

  20. [Diagram: recurrent network (PFC) with inputs o_t, a_{t-1}, r_{t-1} and outputs a_t, v_t; δ (DA).] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)

  21. [Figure: example bandit episodes; arm reward probabilities 0.7, 0.3, 0.6, 0.4, 0.3, 0.7, 0.8, 0.2.] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)

  22. [Figure: choices by episode and trial; cumulative regret by trial (1 to 100) compared with Gittins indices, Thompson sampling and UCB.] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)

  23. [Figure: training episodes.] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)

  24. [Diagram: recurrent network (PFC) with inputs o_t, a_{t-1}, r_{t-1} and outputs a_t, v_t; δ (DA).] Volkmann et al., Nature Reviews Neurology, 2010

  25. [Figure: log2(C_R / C_L) plotted against log2(R_R / R_L), both axes from -4 to 4.] Tsutsui et al., Nature Comms, 2016; Wang et al., Nature Neuroscience (2018)

  26. [Diagram: recurrent network (PFC) with inputs o_t, a_{t-1}, r_{t-1} and outputs a_t, v_t; δ (DA).] Wang et al., Nature Neuroscience (2018)

  27. [Figure: correlation and proportion (0 to 0.6) for a_{t-1}, r_{t-1}, a_{t-1} × r_{t-1} and v_t.] Tsutsui et al., Nature Comms, 2016; Wang et al., Nature Neuroscience (2018)

  28. [Diagram: recurrent network (PFC) with inputs o_t, a_{t-1}, r_{t-1} and outputs a_t, v_t; δ (DA).] Wang et al., Nature Neuroscience (2018)

  29. [Figure: choices by episode and trial; cumulative regret by trial (1 to 100) compared with Gittins indices, Thompson sampling and UCB.]

  30. [Figure, panels A and B: feedback, action, reward probability, learning rate and inferred/decoded volatility over steps 0 to 200.] Behrens et al., Nature Neuroscience, 2007; Wang et al., Nature Neuroscience (2018)

  31. Behrens et al., Nature Neuroscience, 2007; Wang et al., Nature Neuroscience (2018)

  32. [Diagram: recurrent network (PFC) with inputs o_t, a_{t-1}, r_{t-1} and outputs a_t, v_t; δ (DA).] Volkmann et al., Nature Reviews Neurology, 2010

  33. Reversal. Bromberg-Martin et al., J Neurophys, 2010; Wang et al., Nature Neuroscience (2018)

  34. [Diagram: recurrent network (PFC) with inputs o_t, a_{t-1}, r_{t-1} and outputs a_t, v_t; δ (DA); conditions "Left rewarded" and "Right rewarded".] Wang et al., Nature Neuroscience (2018)

  35. Model-based RL (from model-free RL). [Figure: meta-RL RPE plotted against model-based RPE at the stage 2 reward, both axes from -1 to 1; r² = 0.89.] Wang et al., Nature Neuroscience (2018); Miller, Botvinick & Brody, Nat. Neuro., 2017; Daw et al., Neuron, 2011

  36. Optogenetic manipulation of dopamine. Panels: DA blocked upon food reward from large/risky option; DA blocked upon food reward from small/certain option; DA triggered upon food omission from large/risky option. Stopper et al., Neuron, 2014; Wang et al., arXiv, 2018

  37. Mnih et al., Nature (2015)

  38. Current / future work
      • Richer environments / abstractions (Espeholt et al., arXiv, 2018)
      • Architectural biases (e.g., Raposo et al., NIPS, 2017)
      • Complementary forms of meta-learning (e.g., Fernando et al., under review)
      • Episodic reinstatement (Ritter et al., in press)

  39. Neuroscience and AI: A virtuous circle

  40. Collaborators: Jane Wang, Adam Santoro, Zeb Kurth-Nelson, Tim Lillicrap, Dharshan Kumaran, David Barrett, Chris Summerfield, Dhruva Tirumala, Hubert Soyer, Remi Munos, Joel Leibo, Charles Blundell, Sam Ritter, Demis Hassabis. DeepMind, London UK; Gatsby Computational Neuroscience Unit, UCL
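
Slide 5 cites Schultz et al. (1997) for dopamine neurons signalling a reward prediction error, the quantity labelled δ (DA) in the network diagrams. The following is a minimal temporal-difference (TD(0)) sketch, not the authors' model, under assumptions chosen for illustration: a hypothetical trial of `STEPS` time steps from cue to reward, a cue whose onset is itself unpredictable (so the pre-cue value is fixed at 0), and arbitrary `ALPHA` and `GAMMA`. Early in learning the error appears at the reward; after learning it has moved to cue onset.

```python
# TD(0) sketch of a reward prediction error (RPE) moving from reward to cue.
# Assumptions (hypothetical): a cue at step 0, reward 1.0 at step STEPS - 1,
# and an unpredictable cue onset, so the value before the cue is fixed at 0.

STEPS = 5     # time steps from cue onset to reward (assumption)
ALPHA = 0.2   # learning rate (assumption)
GAMMA = 1.0   # no discounting within a trial (assumption)


def run_trial(values):
    """Run one trial, updating values in place; return TD errors (cue onset first)."""
    # Entering the cue state from a pre-cue baseline whose value is 0.
    deltas = [GAMMA * values[0] - 0.0]
    for t in range(STEPS):
        reward = 1.0 if t == STEPS - 1 else 0.0
        next_value = values[t + 1] if t + 1 < STEPS else 0.0  # trial ends after reward
        delta = reward + GAMMA * next_value - values[t]        # r + gamma * V(s') - V(s)
        values[t] += ALPHA * delta
        deltas.append(delta)
    return deltas


values = [0.0] * STEPS
first = run_trial(values)       # before learning: the error appears at reward
for _ in range(500):
    last = run_trial(values)    # after learning: the error appears at cue onset
print("first trial:", [round(d, 2) for d in first])
print("trial 501:  ", [round(d, 2) for d in last])
```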
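
Slides 10 and 11 refer to Harlow's "learning to learn": after many two-object discrimination problems, monkeys came to solve each new problem almost perfectly from its second trial. The sketch below does not model how that strategy is learned; it only simulates the end-state strategy, under hypothetical assumptions: 6-trial problems, two objects whose left/right positions are reshuffled every trial, and a win-stay/lose-shift rule applied to object identity rather than position.

```python
import random

TRIALS_PER_EPISODE = 6   # problem length (assumption)


def run_episode(rng):
    """One Harlow-style problem: two novel objects, one of them always rewarded.

    Returns a list of booleans, one per trial: was the chosen object rewarded?
    """
    objects = ("A", "B")
    rewarded = rng.choice(objects)
    choice = None
    outcomes = []
    for _ in range(TRIALS_PER_EPISODE):
        left, right = rng.sample(objects, 2)     # positions reshuffle every trial
        if choice is None:
            choice = rng.choice((left, right))   # first trial: no information yet
        correct = choice == rewarded
        outcomes.append(correct)
        # Win-stay / lose-shift on the object, wherever it appears next trial.
        if not correct:
            choice = "B" if choice == "A" else "A"
    return outcomes


rng = random.Random(0)
episodes = [run_episode(rng) for _ in range(2000)]
accuracy = [sum(ep[t] for ep in episodes) / len(episodes) for t in range(TRIALS_PER_EPISODE)]
print([round(a, 2) for a in accuracy])
```

Trial 1 sits near chance and every later trial is correct, which is the signature Harlow reported once a learning set had formed.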
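
The diagram on slide 16 shows the meta-RL network (PFC) receiving the current observation o_t together with its own previous action a_{t-1} and reward r_{t-1}, and emitting an action a_t and a value estimate v_t. A minimal sketch of one forward step, assuming a plain tanh recurrent network as a simplified stand-in for the recurrent architecture, hypothetical sizes, and random untrained weights; the slow, δ-driven training of the weights is omitted.

```python
import math
import random

N_ACTIONS = 2   # e.g. left/right in a two-armed bandit (assumption)
N_OBS = 1       # hypothetical observation size
N_HIDDEN = 8    # hypothetical hidden size


def make_input(obs, prev_action, prev_reward):
    """Concatenate o_t, one-hot a_{t-1} and r_{t-1} into one input vector."""
    one_hot = [1.0 if i == prev_action else 0.0 for i in range(N_ACTIONS)]
    return list(obs) + one_hot + [float(prev_reward)]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def step(params, h, x):
    """One recurrent step: new hidden state, policy pi(a_t), value v_t."""
    pre = [a + b for a, b in zip(matvec(params["W_in"], x), matvec(params["W_rec"], h))]
    h_new = [math.tanh(z) for z in pre]
    logits = matvec(params["W_pi"], h_new)
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]   # numerically stable softmax
    total = sum(exps)
    policy = [e / total for e in exps]
    value = matvec(params["W_v"], h_new)[0]
    return h_new, policy, value


rng = random.Random(0)
n_in = N_OBS + N_ACTIONS + 1
params = {
    "W_in": rand_matrix(N_HIDDEN, n_in, rng),
    "W_rec": rand_matrix(N_HIDDEN, N_HIDDEN, rng),
    "W_pi": rand_matrix(N_ACTIONS, N_HIDDEN, rng),
    "W_v": rand_matrix(1, N_HIDDEN, rng),
}
h = [0.0] * N_HIDDEN
x = make_input([0.0], prev_action=1, prev_reward=1.0)
h, policy, value = step(params, h, x)
print([round(p, 3) for p in policy], round(value, 3))
```

Feeding a_{t-1} and r_{t-1} back in is what lets the hidden state carry a running summary of the episode, so the activity dynamics, not only the weights, can adapt behavior within an episode.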
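
Slide 19 compares cumulative regret against standard bandit baselines: Gittins indices, Thompson sampling and UCB. The sketch below computes expected cumulative regret (the running sum of the gap between the best arm's reward probability and the chosen arm's) for UCB1 and for Thompson sampling with a Beta(1, 1) prior. It assumes two-armed Bernoulli bandits with arm probabilities drawn independently and uniformly from [0, 1], 100 trials per episode, and 500 episodes; Gittins indices are left out because computing them needs considerably more machinery.

```python
import math
import random


def ucb1(counts, sums, t):
    """UCB1: try each arm once, then pick the arm maximizing mean + sqrt(2 ln t / n)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(range(len(counts)),
               key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]))


def thompson(counts, sums, rng):
    """Thompson sampling with an independent Beta(1, 1) prior on each arm."""
    draws = [rng.betavariate(1 + sums[a], 1 + counts[a] - sums[a]) for a in range(len(counts))]
    return max(range(len(counts)), key=lambda a: draws[a])


def cumulative_regret(choose, probs, n_trials, rng):
    """Expected cumulative regret: running sum of max(probs) - probs[chosen arm]."""
    counts, sums = [0] * len(probs), [0] * len(probs)
    best, total, curve = max(probs), 0.0, []
    for t in range(1, n_trials + 1):
        arm = choose(counts, sums, t)
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
        total += best - probs[arm]
        curve.append(total)
    return curve


rng = random.Random(0)
N_EPISODES, N_TRIALS = 500, 100
avg = {"UCB": [0.0] * N_TRIALS, "Thompson": [0.0] * N_TRIALS}
for _ in range(N_EPISODES):
    probs = [rng.random(), rng.random()]   # independent arms (assumption)
    curves = {
        "UCB": cumulative_regret(ucb1, probs, N_TRIALS, rng),
        "Thompson": cumulative_regret(lambda c, s, t: thompson(c, s, rng), probs, N_TRIALS, rng),
    }
    for name, curve in curves.items():
        for i, value in enumerate(curve):
            avg[name][i] += value / N_EPISODES

print({name: round(curve[-1], 2) for name, curve in avg.items()})
```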
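
Read as consecutive pairs, the example probabilities on slide 21 sum to 1 (0.7/0.3, 0.6/0.4, 0.3/0.7, 0.8/0.2), which suggests a structured bandit in which one arm pays with probability p and the other with 1 - p. In such a task every outcome is informative about both arms. A minimal sketch, assuming that structure and a uniform prior over p on a grid: a reward from the right arm and a non-reward from the left arm produce the same posterior.

```python
GRID = [i / 100 for i in range(101)]   # candidate values of p, uniform prior (assumption)


def posterior(history):
    """Posterior over p when the left arm pays with probability p and the right with 1 - p.

    history: list of (arm, reward) pairs, arm in {"L", "R"}, reward in {0, 1}.
    """
    weights = []
    for p in GRID:
        w = 1.0
        for arm, reward in history:
            q = p if arm == "L" else 1.0 - p      # this arm's reward probability
            w *= q if reward else 1.0 - q         # Bernoulli likelihood of the outcome
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


def mean(post):
    return sum(p * w for p, w in zip(GRID, post))


print(round(mean(posterior([("R", 1)])), 2))   # 0.33: right-arm win lowers the estimate of p
print(round(mean(posterior([("L", 0)])), 2))   # 0.33: a left-arm loss carries the same information
```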
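
Slide 25 plots log2(C_R / C_L) against log2(R_R / R_L), read here (an interpretation of the axis labels) as log ratios of right-to-left choices and rewards. Such plots are commonly summarized with the generalized matching law, log(C_R / C_L) = s · log(R_R / R_L) + log b, where s = 1 is strict matching and s < 1 is undermatching. A minimal least-squares fit of s and log2 b; the counts below are hypothetical.

```python
import math


def matching_fit(choices, rewards):
    """Fit log2(C_R/C_L) = s * log2(R_R/R_L) + log2(b) by ordinary least squares.

    choices, rewards: lists of (right_count, left_count), one pair per block.
    Returns (s, log2_b).
    """
    xs = [math.log2(rr / rl) for rr, rl in rewards]
    ys = [math.log2(cr / cl) for cr, cl in choices]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


rewards = [(10, 40), (20, 20), (40, 10), (30, 15)]   # hypothetical (right, left) reward counts
choices = [(15, 35), (24, 26), (33, 17), (30, 20)]   # hypothetical (right, left) choice counts
s, log2_b = matching_fit(choices, rewards)
print(round(s, 3), round(log2_b, 3))
```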
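
Slide 30 cites Behrens et al. (2007), whose point is that the appropriate learning rate depends on how volatile the reward probabilities are. A minimal sketch with a fixed-rate delta rule, v <- v + alpha * (r - v), under hypothetical schedules: in a stable environment (p = 0.75) a small learning rate tracks p better, while in a volatile one (p flipping between 0.8 and 0.2 every 20 trials) a larger learning rate does. Here the learning rate is fixed per run rather than inferred.

```python
import random


def delta_rule_mse(alpha, probs, rng):
    """Track a reward probability with v <- v + alpha * (r - v); return mean squared error of v vs p."""
    v, err = 0.5, 0.0
    for p in probs:
        r = 1.0 if rng.random() < p else 0.0   # Bernoulli feedback
        err += (v - p) ** 2                     # error of the estimate held before this outcome
        v += alpha * (r - v)
    return err / len(probs)


N = 4000
stable = [0.75] * N                                                  # assumption
volatile = [0.8 if (t // 20) % 2 == 0 else 0.2 for t in range(N)]     # flips every 20 trials (assumption)

rng = random.Random(1)
results = {
    (name, alpha): delta_rule_mse(alpha, probs, rng)
    for name, probs in (("stable", stable), ("volatile", volatile))
    for alpha in (0.05, 0.3)
}
for key, mse in results.items():
    print(key, round(mse, 4))
```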
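
Slide 35 ("Model-based RL (from model-free RL)") cites the two-step task of Daw et al. (2011), in which each first-stage action leads to a "common" second-stage state with probability 0.7 and to the other with 0.3. A model-based learner values first-stage actions by combining these transition probabilities with the best second-stage values. The sketch shows only that valuation step; the second-stage values and state and action names are hypothetical, and it does not reproduce the RPE analysis plotted on the slide.

```python
# Two-step task transition structure (Daw et al., 2011): common 0.7, rare 0.3.
TRANSITIONS = {
    "a1": {"A": 0.7, "B": 0.3},
    "a2": {"A": 0.3, "B": 0.7},
}


def model_based_q(q_stage2):
    """Q_MB(a) = sum over s' of P(s' | a) * max over a' of Q2(s', a')."""
    return {
        action: sum(p * max(q_stage2[state].values()) for state, p in dist.items())
        for action, dist in TRANSITIONS.items()
    }


# Hypothetical second-stage action values (names and numbers are illustrative).
q2 = {"A": {"left": 0.8, "right": 0.4}, "B": {"left": 0.2, "right": 0.3}}
q_mb = model_based_q(q2)
print({a: round(q, 2) for a, q in q_mb.items()})   # {'a1': 0.65, 'a2': 0.45}
```

Because Q_MB is recomputed from the transition model, a change in a second-stage value immediately changes both first-stage values, without either first-stage action having been tried again.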
