reinforcement learning in humans and animals
Nathaniel Daw, NYU (Neuroscience; Psychology; Neuroeconomics)
Cognition-Centric Roundtable, Stevens, May 13, 2011
collaborators
NYU: Aaron Bornstein, Sara Constantino, Nick Gustafson, Jian Li, Seth Madlon-Kay, Dylan Simon, Bijan Pesaran
Columbia: Daphna Shohamy, Elliott Wimmer
UCL: Peter Dayan, Ben Seymour, Ray Dolan
Berkeley: Bianca Wittmann
U Chicago: Jeff Beeler, Xiaoji Zhuang
Princeton: Yael Niv, Sam Gershman
Trinity: John O'Doherty
Tel Aviv: Tom Schonberg, Daphna Joel
Montreal: Aaron Courville
CMU: David Touretzky
Austin: Ross Otto
funding: NIMH, NIDA, NARSAD, McKnight Endowment, HFSP
question
longstanding question in psychology: what information is learned from reward?
– law of effect (Thorndike): learn to repeat reinforced actions • dopamine
– cognitive maps (Tolman): learn a "map" of task structure; evaluate new actions online • even rats can do this
new leverage on this problem
draw on computer science and economics for methods and frameworks
1. new computational & neural tools
   – examine learning via trial-by-trial adjustments in behavior and neural signals
2. new computational theories
   – algorithmic view
   – dopamine associated with "model-free" RL
   – "model-based" RL as an account for cognitive maps (Daw, Niv & Dayan 2005, 2006)
learned decision making in humans
"bandit" tasks: repeated choices among options whose reward probabilities drift over trials
[figure: reward probability (0 to 0.5) for each option across ~300 trials]
Daw et al. 2006; Schonberg et al. 2007, 2010; Wittmann et al. 2008; Gershman et al. 2009; Gläscher et al. 2010; Li & Daw 2011
trial-by-trial analysis
experience (past choices & outcomes: trials t-1, t-2, t-3, t-4, …)
→ model (RL algorithm: prediction errors, values, etc.; probabilistic choice rule)
→ predicted choice probabilities
compare against observed choice behavior: which model & parameters make the observed choices most likely?
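The fitting pipeline on this slide can be sketched as follows. This is a minimal illustrative version, assuming a simple Q-learning model with a softmax choice rule; the function names and parameter bounds are not from the cited papers.

```python
# Minimal sketch of trial-by-trial model fitting: find the RL model
# parameters that make the observed choices most likely.
# The Q-learning + softmax model here is an illustrative assumption.
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, choices, rewards, n_arms=2):
    """Negative log-likelihood of observed choices under
    Q-learning with a softmax choice rule."""
    alpha, beta = params            # learning rate, inverse temperature
    q = np.zeros(n_arms)            # initial action values
    nll = 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * q) / np.sum(np.exp(beta * q))  # softmax
        nll -= np.log(p[c])         # likelihood of the observed choice
        q[c] += alpha * (r - q[c])  # prediction-error update
    return nll

def fit(choices, rewards):
    """Maximum-likelihood parameter estimate (illustrative bounds)."""
    res = minimize(neg_log_likelihood, x0=[0.5, 1.0],
                   args=(choices, rewards),
                   bounds=[(1e-3, 1.0), (1e-3, 20.0)])
    return res.x  # estimated (alpha, beta)
```

Competing models (e.g. model-free vs. model-based learners) can then be compared by their fitted likelihoods on the same choice data.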
E[V(a)] = Σ_o P(o | a) V(o)
"model-based" vs. "model-free"
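The two routes to value contrasted on this slide can be sketched as code. The outcome probabilities and values below are made-up illustrative numbers, not data from the talk.

```python
# Two routes to value (illustrative numbers throughout).
import numpy as np

# model-based: evaluate an action by averaging outcome values under a
# learned model of the task, E[V(a)] = sum_o P(o|a) V(o)
P_o_given_a = {"left":  {"win": 0.7, "lose": 0.3},
               "right": {"win": 0.3, "lose": 0.7}}
V_o = {"win": 1.0, "lose": 0.0}

def model_based_value(a):
    return sum(p * V_o[o] for o, p in P_o_given_a[a].items())

# model-free: a cached value updated directly from received reward,
# with no reference to the task's transition structure
q = {"left": 0.0, "right": 0.0}
def model_free_update(a, r, alpha=0.1):
    q[a] += alpha * (r - q[a])

print(model_based_value("left"))   # 0.7
```

The model-based value changes immediately if `P_o_given_a` or `V_o` changes; the cached model-free value only changes through further reinforced experience, which is the computational core of the dissociation.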
rat version (Balleine, Daw & O'Doherty, 2009)
[figure: lever presses per minute after outcome devaluation, valued vs. devalued, following moderate vs. extensive training (Holland, 2004)]
two behavioral modes:
– devaluation-sensitive ("goal directed")
– devaluation-insensitive ("habitual")
neurally dissociable with lesions (Dickinson, Balleine, Killcross)
dual systems view
task
two-stage choice: each top-stage option leads to one of two bottom stages with probability 70% (common) or 30% (rare); bottom-stage options pay off with probabilities (e.g. 26%, 57%, 41%, 28%), all slowly changing
(Daw, Gershman, Seymour, et al. Neuron 2011)
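The task's generative structure can be sketched as a short simulation. The drift parameters and probability bounds below are illustrative assumptions, not the published task settings.

```python
# Illustrative simulation of the two-stage task: a 70%/30%
# common/rare transition at the top stage, and slowly drifting reward
# probabilities at the bottom stage.
import numpy as np

rng = np.random.default_rng(0)
reward_p = np.array([0.26, 0.57, 0.41, 0.28])  # 2 options x 2 bottom states

def step(top_choice):
    """One trial: top-stage choice -> bottom stage -> reward."""
    common = rng.random() < 0.7
    bottom_state = top_choice if common else 1 - top_choice
    bottom_choice = rng.integers(2)            # random bottom-stage pick
    p = reward_p[2 * bottom_state + bottom_choice]
    reward = int(rng.random() < p)
    return bottom_state, reward, common

def drift(p, sd=0.025):
    """Slow Gaussian random walk on the reward probabilities
    (drift sd and clipping bounds are assumptions)."""
    return np.clip(p + rng.normal(0, sd, size=p.shape), 0.2, 0.8)
```

Because rewards arrive through a probabilistic transition, the task lets the analysis separate learners who credit the chosen action directly from learners who credit it through the transition model.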
question does choice behavior respect sequential structure?
idea
how does bottom-stage feedback affect top-stage choices?
example: a rare (30%) transition at the top level, followed by a win
• which top-stage action is now favored?
predictions
direct reinforcement: ignores transition structure
model-based planning: respects transition structure
data
17 subjects × 201 trials each
reward: p < 1e-8; reward × rare: p < 5e-5 (mixed-effects logit)
results reject pure reinforcement; they suggest a mixture of planning and reinforcement processes
(Daw, Gershman, Seymour, et al. Neuron 2011)
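The signature analysis behind these results can be sketched as a simple descriptive table (the paper itself used a mixed-effects logit across subjects; this numpy-only version and its argument names are illustrative).

```python
# Stay probability split by previous reward x previous transition type.
# Direct reinforcement predicts stay depends on reward regardless of
# transition; model-based planning predicts the reward effect reverses
# after rare transitions (a reward x rare interaction).
import numpy as np

def stay_table(choices, rewards, rare):
    """Mean probability of repeating the top-stage choice in each
    (rewarded, rare-transition) cell of the previous trial."""
    choices, rewards, rare = map(np.asarray, (choices, rewards, rare))
    stay = (choices[1:] == choices[:-1]).astype(float)
    r = rewards[:-1].astype(bool)      # previous trial rewarded?
    q = rare[:-1].astype(bool)         # previous transition rare?
    table = {}
    for rv in (False, True):
        for qv in (False, True):
            m = (r == rv) & (q == qv)
            table[(rv, qv)] = stay[m].mean() if m.any() else np.nan
    return table
```

A purely model-free learner produces a table that varies only along the reward axis; a model-based learner's table shows the interaction pattern.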
dual task
single task vs. dual task
dual × reward: p < 5e-7; dual × reward × rare: p < .05
(Otto, Gershman, Markman)
neural analysis
behavior incorporates model knowledge: not just TD
want to ask the same question neurally: can we dissociate multiple neural systems underlying behavior?
• in particular, can we show that subcortical systems are "dumb" (purely model-free)?
dopamine & RL (Schultz et al. 1997) (Daw et al. 2011)
fMRI analysis
hypothesis: striatal "error" signals are solely reinforcement driven
1. generate candidate error signals assuming TD
2. an additional regressor captures how this signal would change for errors relative to values computed by planning
net signal = TD error + β · (change due to forward planning); estimate β
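The regressor logic on this slide can be sketched as a schematic least-squares fit. This is only the core arithmetic; the actual fMRI analysis involves convolution with a hemodynamic response, nuisance regressors, etc., and the function and variable names here are illustrative.

```python
# Schematic version of the net-signal model:
#   signal = intercept + TD error + beta * (planning error - TD error)
# beta = 0 means a purely reinforcement-driven (TD) signal;
# beta = 1 means an error signal fully informed by forward planning.
import numpy as np

def estimate_beta(signal, td_error, planning_error):
    """Least-squares estimate of the weight on the planning correction."""
    delta = planning_error - td_error        # change due to planning
    X = np.column_stack([np.ones_like(signal), td_error, delta])
    coefs, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return coefs[2]                          # beta
```

The per-subject β estimated this way is what the later slides compare against the behavioral estimate of model usage.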
fMRI analysis
net signal = TD error + β · (change due to forward planning)
contrary to theories: even striatal error signals incorporate knowledge of task structure
(Daw, Gershman, Seymour, et al. Neuron 2011) (p < .05, cluster corrected)
variation across subjects
subjects differ in degree of model usage
net signal = TD error + β · (change due to planning)
compare behavioral & neural estimates (p < .05, SVC)
average signal
R NAcc, start of trial:
• interaction not significant
• but the size of the interaction covaries with behavioral model usage (p = .02)
thoughts
can distinguish multiple learned representations in humans
• neurally more intertwined than expected
related areas:
• self control (drugs, dieting, savings, etc.)
• learning in multiplayer interactions (games)
  – equilibrium vs. equilibration
  – do we learn about actions or about opponents?
p-beauty contest
• fast equilibration with repeated play, yet most subjects are never reinforced
(Singaporean undergrads; Ho et al. 1998)
RPS (rock-paper-scissors)
• do subjects learn by reinforcement?
• best respond to reinforcement?
• best respond to that?
(Hampton et al., 2008)
conclusions
1. use of computational models to quantify phenomena & distinctions for neural study
2. can leverage this to distinguish different sorts of learning, trial-by-trial; beginning to map neural substrates
3. implications for self control, economic interactions