Prefrontal cortex as a Meta-reinforcement learning system

Matthew Botvinick
DeepMind, London UK; Gatsby Computational Neuroscience Unit, UCL

Mnih et al, Nature (2015)

Yamins & DiCarlo, 2016

Schultz et al, Science (1997)

Jaderberg et al., 2016

Mante et al., Nature, 2013; Song et al., eLife, 2017

Lake et al, BBS (2017)

“Learning to learn” Harlow, Psychological Review, 1949

“Learning to learn” [Figure; x-axis: Training episodes] Harlow, Psychological Review, 1949

Jaderberg et al., 2016

https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/

[Figure: agent schematic. Inputs o_t, a_{t-1}, r_{t-1}; network (PFC); outputs a_t, v_t; δ (DA).] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016); Duan et al., arXiv (2016)
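
The schematic above labels the agent's inputs (o_t, a_{t-1}, r_{t-1}) and outputs (a_t, v_t). As a rough illustration of that input/output wiring only, here is a minimal sketch of one forward step. It assumes a plain tanh recurrent layer with random, untrained weights; the function names, layer sizes, and weight scales are hypothetical and are not taken from the slides, and no training by δ is shown.

```python
import numpy as np

def make_params(n_obs, n_actions, n_hidden, seed=0):
    """Random, untrained weights for a toy recurrent agent (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_in = n_obs + n_actions + 1  # o_t, one-hot a_{t-1}, scalar r_{t-1}
    return {
        "W_in": rng.normal(0.0, 0.1, (n_hidden, n_in)),
        "W_rec": rng.normal(0.0, 0.1, (n_hidden, n_hidden)),
        "W_pi": rng.normal(0.0, 0.1, (n_actions, n_hidden)),
        "w_v": rng.normal(0.0, 0.1, n_hidden),
    }

def step(params, h, obs, prev_action, prev_reward):
    """One time step: (o_t, a_{t-1}, r_{t-1}) in, (policy over a_t, v_t) out."""
    n_actions = params["W_pi"].shape[0]
    a_onehot = np.eye(n_actions)[prev_action]
    x = np.concatenate([obs, a_onehot, [prev_reward]])
    # Assumption: simple tanh recurrence standing in for the recurrent network.
    h_new = np.tanh(params["W_in"] @ x + params["W_rec"] @ h)
    logits = params["W_pi"] @ h_new
    policy = np.exp(logits - logits.max())  # numerically stable softmax
    policy /= policy.sum()
    value = float(params["w_v"] @ h_new)    # scalar value estimate v_t
    return h_new, policy, value
```

Because the previous action and reward are fed back in as inputs, the hidden state h can carry information about the reward history within an episode, which is the property the schematic highlights.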

[Figure: probabilities shown: 0.7, 0.4, 0.6, 0.9, 0.3, 0.1, 0.8, 0.7.] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)

[Figure: agent schematic. Inputs o_t, a_{t-1}, r_{t-1}; network (PFC); outputs a_t, v_t; δ (DA).] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)

[Figure: Left/Right choices by Trial across Episodes 1 to 4; Cumulative regret by Trial (1 to 100) compared with Gittins indices, Thompson sampling, and UCB.] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)
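
The comparison above is stated in terms of cumulative regret against standard bandit algorithms. For readers unfamiliar with the baselines, the following is a minimal sketch of two of them, Thompson sampling and UCB, on a Bernoulli bandit. It assumes a Beta(1, 1) prior for Thompson sampling and the UCB1 exploration bonus; the function names are hypothetical, the regret tracked is the expected (pseudo-)regret, and Gittins indices are omitted because they are considerably more involved to compute.

```python
import numpy as np

def run_bandit(probs, choose, n_trials=100, seed=0):
    """Play a Bernoulli bandit and return cumulative expected regret per trial."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    wins = np.zeros(len(probs))    # rewarded pulls per arm
    pulls = np.zeros(len(probs))   # total pulls per arm
    regret = np.zeros(n_trials)
    total = 0.0
    for t in range(n_trials):
        arm = choose(wins, pulls, rng)
        reward = float(rng.random() < probs[arm])
        wins[arm] += reward
        pulls[arm] += 1
        total += probs.max() - probs[arm]  # gap to the best arm
        regret[t] = total
    return regret

def thompson(wins, pulls, rng):
    """Sample each arm's success rate from its Beta posterior; pick the max."""
    samples = rng.beta(1 + wins, 1 + pulls - wins)  # assumed Beta(1, 1) prior
    return int(np.argmax(samples))

def ucb1(wins, pulls, rng):
    """UCB1: empirical mean plus an exploration bonus; try each arm once first."""
    if np.any(pulls == 0):
        return int(np.argmin(pulls))
    bonus = np.sqrt(2.0 * np.log(pulls.sum()) / pulls)
    return int(np.argmax(wins / pulls + bonus))
```

Calling `run_bandit([0.7, 0.3], thompson)` or `run_bandit([0.7, 0.3], ucb1)` returns a length-100 regret curve of the kind plotted against Trial in the figure.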

[Figure: agent schematic. Inputs o_t, a_{t-1}, r_{t-1}; network (PFC); outputs a_t, v_t; δ (DA).] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)

[Figure: probabilities shown: 0.7, 0.3, 0.6, 0.4, 0.3, 0.7, 0.8, 0.2.] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)

[Figure: choices by Trial across Episodes 1 to 4; Cumulative regret by Trial (1 to 100) compared with Gittins indices, Thompson sampling, and UCB.] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)

[Figure; x-axis: Training episodes] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)

[Figure: agent schematic. Inputs o_t, a_{t-1}, r_{t-1}; network (PFC); outputs a_t, v_t; δ (DA).] Volkmann et al., Nature Reviews Neurology, 2010

[Figure: two panels plotting log2(C_R/C_L) against log2(R_R/R_L), both axes from -4 to 4. Panels: Tsutsui et al., Nature Comms, 2016; Wang et al., Nature Neuroscience (2018).]

[Figure: agent schematic. Inputs o_t, a_{t-1}, r_{t-1}; network (PFC); outputs a_t, v_t; δ (DA).] Wang et al., Nature Neuroscience (2018)

[Figure: two panels, Correlation and Proportion (0.1 to 0.6), for the variables a_{t-1}, r_{t-1}, a_{t-1} x r_{t-1}, and v_t. Panels: Tsutsui et al., Nature Comms, 2016; Wang et al., Nature Neuroscience (2018).]

[Figure: agent schematic. Inputs o_t, a_{t-1}, r_{t-1}; network (PFC); outputs a_t, v_t; δ (DA).] Wang et al., Nature Neuroscience (2018)

[Figure: choices by Trial across Episodes 1 to 4; Cumulative regret by Trial (1 to 100) compared with Gittins indices, Thompson sampling, and UCB.]

[Figure: panels A and B, each plotted over Step (0 to 200) on a 0 to 1 scale. Legend: feedback, action, reward probability, learning rate, inferred/decoded volatility.] Behrens et al., Nature Neuroscience, 2007; Wang et al., Nature Neuroscience (2018)
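
The figure above plots a learning rate alongside reward probability and volatility. To make concrete what a learning rate does, here is a minimal sketch of a fixed-learning-rate delta rule. This is an illustrative assumption only: it is neither the Bayesian learner of Behrens et al. nor the trained network, and the function name and default starting value are hypothetical.

```python
import numpy as np

def delta_rule(rewards, alpha, v0=0.5):
    """Track a reward estimate with v <- v + alpha * (r - v)."""
    v = v0
    estimates = np.empty(len(rewards))
    for t, r in enumerate(rewards):
        v += alpha * (r - v)   # step toward the latest outcome, scaled by alpha
        estimates[t] = v
    return estimates
```

A larger `alpha` weights recent outcomes more heavily, which suits an environment whose reward probabilities change often; a smaller `alpha` averages over a longer history, which suits a stable one.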

Behrens et al., Nature Neuroscience, 2007 Wang et al., Nature Neuroscience (2018)

[Figure: agent schematic. Inputs o_t, a_{t-1}, r_{t-1}; network (PFC); outputs a_t, v_t; δ (DA).] Volkmann et al., Nature Reviews Neurology, 2010

REVERSAL. Bromberg-Martin et al., J Neurophys, 2010; Wang et al., Nature Neuroscience (2018)

[Figure: agent schematic. Inputs o_t, a_{t-1}, r_{t-1}; network (PFC); outputs a_t, v_t; δ (DA). Conditions: Left rewarded, Right rewarded.] Wang et al., Nature Neuroscience (2018)

Model-based RL (from model-free RL)
[Figure: Meta-RL RPE plotted against Model-based RPE (both axes -1 to 1), r² = 0.89; Stage 2 Reward.] Wang et al., Nature Neuroscience (2018); Miller, Botvinick & Brody, Nat. Neuro., 2017; Daw et al., Neuron, 2011
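
For reference, the reward prediction error (RPE) written δ throughout these slides is commonly defined in temporal-difference form as below. The discount factor γ and the time-indexing convention are assumptions here, since the slides do not state them, and the model-based RPE on the figure's x-axis is computed differently.

```latex
\[
  \delta_t = r_t + \gamma\, v_{t+1} - v_t
\]
```

Here r_t is the reward received, v_t is the value estimate at step t, and a positive δ_t means the outcome was better than predicted.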

Optogenetic manipulation of dopamine
[Figure panels: DA blocked upon food reward from large/risky option; DA blocked upon food reward from small/certain option; DA triggered upon food omission from large/risky option.]
Stopper et al., Neuron, 2014; Wang et al., arXiv, 2018

Current / Future Work
• Richer environments / abstractions (Espeholt et al., arXiv, 2018)
• Architectural biases (e.g., Raposo et al., NIPS, 2017)
• Complementary forms of meta-learning (e.g., Fernando et al., under review)
• Episodic reinstatement (Ritter et al., in press)

Neuroscience and AI: A virtuous circle

Collaborators: Jane Wang, Adam Santoro, Zeb Kurth-Nelson, Tim Lillicrap, Dharshan Kumaran, David Barrett, Chris Summerfield, Dhruva Tirumala, Hubert Soyer, Remi Munos, Joel Leibo, Charles Blundell, Sam Ritter, Demis Hassabis
DeepMind, London UK; Gatsby Computational Neuroscience Unit, UCL
