Prefrontal cortex as a meta-reinforcement learning system. Matthew Botvinick. DeepMind, London UK; Gatsby Computational Neuroscience Unit, UCL
Mnih et al, Nature (2015)
Yamins & DiCarlo, 2016
Schultz et al, Science (1997)
Jaderberg et al., 2016
Mante et al., Nature, 2013; Song et al., eLife, 2017
Lake et al, BBS (2017)
“Learning to learn” Harlow, Psychological Review, 1949
“Learning to learn” [Figure: axis label: training episodes.] Harlow, Psychological Review, 1949
Mnih et al, Nature (2015)
Jaderberg et al., 2016
https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/
[Figure: network schematic. Inputs: o_t, a_{t-1}, r_{t-1}; network: PFC; outputs: a_t, v_t; learning signal: δ (DA).] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016); Duan et al., arXiv (2016)
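The input/output wiring in this schematic can be illustrated with a minimal sketch. It assumes a plain tanh recurrent layer with random, untrained weights; the cited work may use a different recurrent unit, and the training by the δ signal is not shown. The names `make_agent` and `step`, the layer sizes, and the weight scale are all hypothetical choices for illustration.

```python
import math
import random


def make_agent(n_obs, n_actions, n_hidden, seed=0):
    """Build random (untrained) weights for a toy recurrent agent (hypothetical helper)."""
    rng = random.Random(seed)
    n_in = n_obs + n_actions + 1  # o_t, one-hot a_{t-1}, scalar r_{t-1}

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": mat(n_hidden, n_in),
        "W_rec": mat(n_hidden, n_hidden),
        "W_pi": mat(n_actions, n_hidden),
        "W_v": mat(1, n_hidden),
        "n_actions": n_actions,
    }


def matvec(W, x):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def step(agent, h, obs, prev_action, prev_reward):
    """One time step: (o_t, a_{t-1}, r_{t-1}) and hidden state in; policy and v_t out."""
    one_hot = [1.0 if i == prev_action else 0.0 for i in range(agent["n_actions"])]
    x = list(obs) + one_hot + [float(prev_reward)]
    pre = [a + b for a, b in zip(matvec(agent["W_in"], x), matvec(agent["W_rec"], h))]
    h_new = [math.tanh(p) for p in pre]
    # Softmax over action logits gives the distribution from which a_t is drawn.
    logits = matvec(agent["W_pi"], h_new)
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    policy = [e / z for e in exps]
    value = matvec(agent["W_v"], h_new)[0]  # scalar estimate v_t
    return h_new, policy, value
```

Because the previous action and reward are fed back as inputs, the hidden state can carry information about the reward history within an episode, which is the property the schematic highlights.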
[Figure: example bandit reward probabilities: 0.7, 0.4, 0.6, 0.9, 0.3, 0.1, 0.8, 0.7.] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)
[Figure: network schematic. Inputs: o_t, a_{t-1}, r_{t-1}; network: PFC; outputs: a_t, v_t; learning signal: δ (DA).] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)
[Figure: left/right choices by trial across episodes 1 to 4; cumulative regret over trials 1 to 100, compared with Gittins indices, Thompson sampling, and UCB.] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)
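For readers unfamiliar with the comparison baselines, the following sketch computes cumulative regret for Thompson sampling and UCB on a Bernoulli bandit. It is an illustration under stated assumptions, not the procedure used to produce the figure: it uses the standard UCB1 rule and Beta(1, 1) priors for Thompson sampling, measures expected (pseudo-)regret, and omits Gittins indices. The function names and the example probabilities are hypothetical.

```python
import math
import random


def run_bandit(policy, probs, n_trials, rng):
    """Play one episode; return the cumulative expected regret after each trial."""
    counts = [0] * len(probs)  # pulls per arm
    wins = [0] * len(probs)    # rewards per arm
    best = max(probs)
    regret, curve = 0.0, []
    for t in range(1, n_trials + 1):
        arm = policy(counts, wins, t, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        wins[arm] += reward
        regret += best - probs[arm]  # expected loss from not picking the best arm
        curve.append(regret)
    return curve


def ucb1(counts, wins, t, rng):
    """UCB1: try each arm once, then maximize mean plus exploration bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        wins[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return scores.index(max(scores))


def thompson(counts, wins, t, rng):
    """Thompson sampling with a Beta(1, 1) prior on each arm's reward probability."""
    samples = [
        rng.betavariate(1 + wins[a], 1 + counts[a] - wins[a])
        for a in range(len(counts))
    ]
    return samples.index(max(samples))
```

A call such as `run_bandit(thompson, [0.7, 0.4], 100, random.Random(0))` returns a 100-element regret curve for one episode; averaging such curves over many episodes gives the kind of trace a learned agent could be compared against.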
[Figure: network schematic. Inputs: o_t, a_{t-1}, r_{t-1}; network: PFC; outputs: a_t, v_t; learning signal: δ (DA).] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)
[Figure: example bandit reward probabilities, in pairs that sum to 1: (0.7, 0.3), (0.6, 0.4), (0.3, 0.7), (0.8, 0.2).] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)
[Figure: choices by trial across episodes 1 to 4; cumulative regret over trials 1 to 100, compared with Gittins indices, Thompson sampling, and UCB.] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)
[Figure: axis label: training episodes.] Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016)
[Figure: network schematic. Inputs: o_t, a_{t-1}, r_{t-1}; network: PFC; outputs: a_t, v_t; learning signal: δ (DA).] Volkmann et al., Nature Reviews Neurology, 2010
[Figure: two panels plotting log2(C_R / C_L) against log2(R_R / R_L), both axes from -4 to 4; one panel from Tsutsui et al., Nature Comms, 2016, the other from Wang et al., Nature Neuroscience (2018).]
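The two axis quantities can be computed from a session's trial record as sketched below. This assumes that C counts choices of the right (R) and left (L) options and that R sums the rewards obtained from each; that reading of the axis labels, the function name `log2_ratios`, and the "L"/"R" encoding are assumptions made for illustration.

```python
import math


def log2_ratios(choices, rewards):
    """Return (log2(C_R / C_L), log2(R_R / R_L)) for one block of trials.

    choices: sequence of "L" or "R"; rewards: reward obtained on each trial.
    """
    c_r = sum(1 for c in choices if c == "R")
    c_l = sum(1 for c in choices if c == "L")
    r_r = sum(r for c, r in zip(choices, rewards) if c == "R")
    r_l = sum(r for c, r in zip(choices, rewards) if c == "L")
    if min(c_r, c_l, r_r, r_l) <= 0:
        # A log ratio is undefined when either side has no choices or no reward.
        raise ValueError("each option needs at least one choice and some reward")
    return math.log2(c_r / c_l), math.log2(r_r / r_l)
```

Each block of trials contributes one point to a plot of this kind; points falling on the diagonal indicate that the choice ratio equals the reward ratio.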
[Figure: network schematic. Inputs: o_t, a_{t-1}, r_{t-1}; network: PFC; outputs: a_t, v_t; learning signal: δ (DA).] Wang et al., Nature Neuroscience (2018)
[Figure: two panels with y-axes "Correlation" and "Proportion" (range 0.1 to 0.6), each over the variables a_{t-1}, r_{t-1}, a_{t-1} x r_{t-1}, and v_t.] Tsutsui et al., Nature Comms, 2016; Wang et al., Nature Neuroscience (2018)
[Figure: network schematic. Inputs: o_t, a_{t-1}, r_{t-1}; network: PFC; outputs: a_t, v_t; learning signal: δ (DA).] Wang et al., Nature Neuroscience (2018)
[Figure: choices by trial across episodes 1 to 4; cumulative regret over trials 1 to 100, compared with Gittins indices, Thompson sampling, and UCB.]
[Figure: panels A and B. Traces over steps 0 to 200 (y-axis 0 to 1): feedback, action, reward probability, learning rate, inferred/decoded volatility.] Behrens et al., Nature Neuroscience, 2007; Wang et al., Nature Neuroscience (2018)
Behrens et al., Nature Neuroscience, 2007; Wang et al., Nature Neuroscience (2018)
[Figure: network schematic. Inputs: o_t, a_{t-1}, r_{t-1}; network: PFC; outputs: a_t, v_t; learning signal: δ (DA).] Volkmann et al., Nature Reviews Neurology, 2010
[Figure: reversal.] Bromberg-Martin et al., J Neurophys, 2010; Wang et al., Nature Neuroscience (2018)
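[Figure: network schematic with two conditions, "Left rewarded" and "Right rewarded". Inputs: o_t, a_{t-1}, r_{t-1}; network: PFC; outputs: a_t, v_t; learning signal: δ (DA).] Wang et al., Nature Neuroscience (2018)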
Model-based RL (from model-free RL). [Figure: meta-RL RPE plotted against model-based RPE, both axes from -1 to 1, r^2 = 0.89; additional labels: Stage 2, Reward.] Wang et al., Nature Neuroscience (2018); Miller, Botvinick & Brody, Nat. Neuro., 2017; Daw et al., Neuron, 2011
Optogenetic manipulation of dopamine
• DA blocked upon food reward from large/risky option
• DA blocked upon food reward from small/certain option
• DA triggered upon food omission from large/risky option
Stopper et al., Neuron, 2014; Wang et al., arXiv, 2018
Mnih et al, Nature (2015)
Current / Future Work
• Richer environments / abstractions (Espeholt et al., arXiv, 2018)
• Architectural biases (e.g., Raposo et al., NIPS, 2017)
• Complementary forms of meta-learning (e.g., Fernando et al., under review)
• Episodic reinstatement (Ritter et al., in press)
Neuroscience and AI: A virtuous circle
Collaborators: Jane Wang, Adam Santoro, Zeb Kurth-Nelson, Tim Lillicrap, Dharshan Kumaran, David Barrett, Chris Summerfield, Dhruva Tirumala, Hubert Soyer, Remi Munos, Joel Leibo, Charles Blundell, Sam Ritter, Demis Hassabis. DeepMind, London UK; Gatsby Computational Neuroscience Unit, UCL