Prefrontal cortex as a meta-reinforcement learning system


  1. Prefrontal cortex as a meta-reinforcement learning system (Wang et al.) - CS330 Student Presentation

  2. Motivation
  ● Computational neuro: AI <> neurobio feedback loop
    ○ Convolutions and the eye, SNNs and learning rules, etc.
  ● Meta-learning to inform biological systems
    ○ Canonical model of reward-based learning (sketched in code below)
      ■ Dopamine 'stamps in' associations between situations, actions, and rewards by modulating the strength of synaptic connections between neurons.
    ○ Recent findings have placed this standard model under strain.
      ■ Neural activity in PFC appears to reflect a set of operations that together constitute a self-contained RL algorithm.
  ● New model of reward-based learning: proposes that insights from meta-RL explain these recent findings
    ○ 6 simulations tie experimental neuroscience data to matched meta-RL outputs
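The canonical model above is, in essence, temporal-difference learning: phasic dopamine is read as the reward prediction error that scales value (synaptic-weight) updates. A minimal sketch (names and parameters are illustrative, not from the paper):

```python
import numpy as np

def td_update(V, s, s_next, r, alpha=0.1, gamma=0.9):
    """One temporal-difference step: delta plays the role of phasic dopamine,
    'stamping in' value via a synaptic-weight-like update."""
    delta = r + gamma * V[s_next] - V[s]   # reward prediction error (RPE)
    V[s] += alpha * delta                  # value update gated by the RPE
    return delta

V = np.zeros(5)                            # values for a toy 5-state world
print(td_update(V, s=2, s_next=3, r=1.0))  # positive RPE on a surprising reward
```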

  3. Modeling Assumptions
  ● System architecture (see the sketch after this slide)
    ○ PFC (together with basal ganglia and thalamic nuclei) modeled as an RNN
    ○ Inputs: perceptual data with accompanying information about prior actions and rewards
    ○ Outputs: triggers for actions, estimates of state value
  ● Learning
    ○ DA: RL system for synaptic learning (meta-train)
      ■ Modified to provide the RPE, in place of reward, as input to the network
    ○ PFC: RL system for activity-based representations (meta-test)
  ● Task environment
    ○ RL takes place over a series of interrelated tasks
    ○ Necessitating ongoing inference and behavioral adjustment
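A minimal sketch of this architecture under the stated assumptions, in PyTorch (class names and the hidden size are ours): a recurrent core standing in for PFC receives the observation together with the previous action and reward, and emits action logits plus a value estimate. Meta-training adjusts these weights (the slow DA loop); at meta-test, fast within-episode learning lives entirely in the hidden state.

```python
import torch
import torch.nn as nn

class MetaRLCore(nn.Module):
    """PFC-as-RNN sketch: input = [observation, one-hot prev action, prev reward]."""
    def __init__(self, obs_dim, n_actions, hidden=48):
        super().__init__()
        self.cell = nn.LSTMCell(obs_dim + n_actions + 1, hidden)
        self.policy = nn.Linear(hidden, n_actions)   # action triggers
        self.value = nn.Linear(hidden, 1)            # state-value estimate

    def forward(self, obs, prev_action, prev_reward, state):
        x = torch.cat([obs, prev_action, prev_reward], dim=-1)
        h, c = self.cell(x, state)
        return self.policy(h), self.value(h), (h, c)

core = MetaRLCore(obs_dim=4, n_actions=2)
h = c = torch.zeros(1, 48)
logits, v, state = core(torch.zeros(1, 4), torch.zeros(1, 2),
                        torch.zeros(1, 1), (h, c))
```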

  4. Model Performance - Two-Armed Bandit Task
  Exploration -> exploitation within an episode. Arm payoff probabilities: 0.25 / 0.75 (top), 0.6 / 0.4 (bottom). (Toy environment sketch below.)

  5. Model Performance - Two-Armed Bandit Task
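A toy version of the task distribution behind these figures, assuming independent Bernoulli arms whose payoff probabilities are fixed within an episode and resampled across episodes; with weights frozen, the trained network must explore early and exploit once it has inferred the richer arm:

```python
import random

class TwoArmedBandit:
    """Each episode fixes the arm probabilities; rewards are Bernoulli draws."""
    def __init__(self, probs=(0.25, 0.75)):
        self.probs = probs

    def pull(self, arm):
        return 1.0 if random.random() < self.probs[arm] else 0.0

env = TwoArmedBandit(probs=(0.6, 0.4))
rewards = [env.pull(arm=0) for _ in range(100)]
print(sum(rewards) / len(rewards))  # ~0.6 for the richer arm
```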

  6. Simulation 1

  7. Simulation 1

  8. Simulation 1

  9. Simulation 2
  ● Meta-learning of the learning rate (toy illustration below)
    ○ Treated as a two-armed bandit task
    ○ Stable periods vs. volatile periods (in the pay-off probabilities)
  ● Different environment structures lead to different learned learning rules
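A toy illustration (not the paper's model) of why this matters: a simple Rescorla-Wagner tracker does best with a low learning rate when payoffs are stable and a higher one when they are volatile, so a meta-learner exposed to both regimes has something real to adapt:

```python
import random

def track_error(p_seq, alpha, runs=50):
    """Mean squared error of a Rescorla-Wagner tracker, v += alpha*(r - v),
    against the true (possibly drifting) reward probability."""
    total = 0.0
    for _ in range(runs):
        v = 0.5
        for p in p_seq:
            r = 1.0 if random.random() < p else 0.0
            total += (v - p) ** 2
            v += alpha * (r - v)
    return total / (runs * len(p_seq))

random.seed(0)
stable   = [0.75] * 400                         # fixed payoff probability
volatile = [0.8 if (t // 25) % 2 == 0 else 0.2  # flips every 25 trials
            for t in range(400)]
for name, seq in [("stable", stable), ("volatile", volatile)]:
    best = min((0.05, 0.2, 0.5), key=lambda a: track_error(seq, a))
    print(name, "best alpha:", best)  # low alpha when stable, higher when volatile
```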

  10. Simulation 3
  ● A visual target appeared to the left or right of a display
  ● Left and right targets yielded juice rewards, and sometimes the roles reversed
    ○ Whenever the rewards reversed, the dopamine response to the other target changed as well, suggesting that the hippocampus encodes abstract latent-state representations (toy sketch below)
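One way to make the latent-state reading concrete is a toy Bayesian observer over "which target is currently rich" (our construction, not the paper's network): a few reward omissions flip the belief, and with it the inferred values of both targets at once:

```python
def update_belief(b, arm, r, p_hi=0.9, p_lo=0.1, hazard=0.05):
    """b = P(latent state: 'arm 0 is the rich arm'). Bernoulli observation,
    then a small per-trial probability that the roles reverse."""
    lik0 = p_hi if arm == 0 else p_lo           # P(r=1 | state 0, arm pulled)
    lik1 = p_lo if arm == 0 else p_hi           # P(r=1 | state 1, arm pulled)
    l0, l1 = (lik0, lik1) if r else (1 - lik0, 1 - lik1)
    b = b * l0 / (b * l0 + (1 - b) * l1)        # Bayes update
    return b * (1 - hazard) + (1 - b) * hazard  # allow for a reversal

b = 0.95                               # confident arm 0 is rich
for _ in range(3):
    b = update_belief(b, arm=0, r=0)   # repeated reward omissions
    print(round(b, 3))
# As b falls, the inferred values of BOTH arms flip together:
# EV(arm0) = b*p_hi + (1-b)*p_lo,  EV(arm1) = b*p_lo + (1-b)*p_hi
```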

  11. Simulation 4 - Two-Step Task
  The standard probe for model-based vs. model-free behavior (sketch below).
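A minimal sketch of the two-step task (Daw et al.); the 0.7/0.3 common/rare transition split is the usual convention, which we assume here:

```python
import random

def two_step_trial(action, p_reward):
    """One Daw-style two-step trial. The stage-1 action leads to its
    'common' second-stage state with prob 0.7, to the other with prob 0.3;
    second-stage reward probabilities (p_reward) drift slowly in the real task."""
    common = random.random() < 0.7
    state2 = action if common else 1 - action
    reward = 1.0 if random.random() < p_reward[state2] else 0.0
    return state2, common, reward

state2, common, r = two_step_trial(action=0, p_reward=[0.8, 0.3])
# Model-based signature: after a rewarded RARE transition, switch the stage-1
# action, since the reward is credited to the state more commonly reached from
# the other action. Model-free agents simply repeat rewarded actions.
```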

  12. Simulation 5

  13. Simulation 6 - Experimental Setup
  Based on 'Overriding phasic dopamine signals redirects action selection during risk/reward decision making' (Neuron): a probabilistic risk/reward task in rodents with optogenetic stimulation.
  ● Choice: a 'safe' arm that always offered a small reward (rS = 1) or a 'risky' arm that offered a large reward (rL = 4) with p = 0.125
  ● 5 forced pulls each of the safe and risky arms (in randomized pairs), followed by 20 free pulls (sketched below)
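A hypothetical sketch of this session structure (function names and the greedy chooser are ours). Note the expected values: safe = 1.0 versus risky = 4 x 0.125 = 0.5, so an unstimulated learner should settle on the safe arm:

```python
import random

def pull(arm):
    """arm 0 = safe (always rS = 1); arm 1 = risky (rL = 4 with p = 0.125)."""
    return 1.0 if arm == 0 else (4.0 if random.random() < 0.125 else 0.0)

def session(choose):
    """5 forced pulls of each arm (shuffled, approximating the randomized
    pairs), followed by 20 free pulls driven by the supplied policy."""
    history = []
    forced = [0, 1] * 5
    random.shuffle(forced)
    for arm in forced:
        history.append((arm, pull(arm)))
    for _ in range(20):
        arm = choose(history)
        history.append((arm, pull(arm)))
    return history

def greedy(history):
    """Pick the arm with the higher empirical mean payoff so far."""
    samples = [[r for a, r in history if a == arm] for arm in (0, 1)]
    evs = [sum(s) / len(s) if s else 0.0 for s in samples]
    return 0 if evs[0] >= evs[1] else 1

print(session(greedy)[-5:])  # last free pulls: mostly the safe arm
```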

  14. Simulation 6 - Results
  ● Optogenetic stimulation is simulated by manipulating the value of the reward prediction error fed into the actor (schematic below)
  ● The model reproduces the observed choice behavior across a range of payoff parameters and dopamine interventions
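Schematically, the manipulation enters as an offset on the RPE before it drives the actor and critic updates; this stand-alone actor-critic step is our simplification of the paper's recurrent model:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def actor_critic_step(prefs, v, arm, r, stim=0.0, alpha=0.1):
    """One bandit actor-critic update. `stim` offsets the RPE, standing in
    for optogenetic excitation (stim > 0) or inhibition (stim < 0) of DA."""
    delta = (r - v) + stim           # manipulated reward prediction error
    v += alpha * delta               # critic update
    prefs[arm] += alpha * delta      # actor update driven by the same RPE
    return prefs, v

prefs, v = np.zeros(2), 0.0
prefs, v = actor_critic_step(prefs, v, arm=1, r=0.0, stim=1.0)
print(softmax(prefs))  # choice bias shifts toward the risky arm despite no reward
```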

  15. Extensions + Criticisms
  - Analyses in the paper are mostly intuition-based ('these charts match up'); stronger correlational evidence would be preferable
  - Comparisons rest on observations and end results, with little to say about the physical/inner mechanisms of PFC/DA
  - Results are compared only against high-level, aggregated behaviors
  - Little exploration of, or variation on, the reference architecture used

  16. Overall Conclusions
  ● The simulations compare meta-RL model behavior against RL findings from human and animal experiments
  ● They suggest roles for various brain structures and neuromodulators in producing model-based learning
  ● Findings from neuroscience/psychology can be combined with existing AI algorithms to help explain learning
