Neurobiological Foundations of Reward and Risk


  1. Neurobiological Foundations of Reward and Risk ... and corresponding risk prediction errors
     Peter Bossaerts
     Contents
     1. Reward Encoding And The Dopaminergic System
     2. Reward Prediction Errors And TD (Temporal Difference) Models
     3. Alternative Approaches: 3.1. Pharmacological; 3.2. Direct Dopamine Measurement
     4. Risk: Variance
     5. Risk Prediction Errors
     6. The Norepinephrine Story
     7. Skewness
     8. Correlation
     9. Processing Time

  2. 1. Reward Encoding And The Dopaminergic System
     ★ Go back to Class 2:
       • Single-Unit Recording Of Dopamine Neurons
       • fMRI Analysis Of Reward (And Risk)
     ★ Remark: fMRI focuses on projection areas of dopamine neurons
     2. Reward Prediction Errors And TD (Temporal Difference) Models
     Dopamine neurons do NOT signal expected rewards but reward prediction errors! (Expected rewards/values are encoded in ventro-medial prefrontal cortex, among others.)

  3. Prediction Error Learning
     Simple “Rescorla-Wagner” learning rule
     Prediction: $F_{t+1} = F_t + \alpha \eta_t$
     Notice the relation between math and emotions (DOPAMINE!): +: Elation; −: Disappointment
     TD Models: Learning (To Do) The Right Thing Through Reinforcement (Prediction Errors)
     Can learn to assign value (of discounted future rewards) to complex signals
     Derives from dynamic programming...
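
     The Rescorla-Wagner rule above can be sketched in a few lines. This is only an illustration: it assumes the prediction error is the received reward minus the current forecast (the slide does not spell out the definition of the error term), and the function name, learning rate, and constant reward stream are hypothetical.

```python
def rescorla_wagner(rewards, alpha=0.1, f0=0.0):
    """Return the forecasts F_0..F_T and the prediction errors eta_0..eta_{T-1}.

    Assumption (not stated on the slide): eta_t = reward_t - F_t.
    """
    forecasts = [f0]
    errors = []
    for reward in rewards:
        eta = reward - forecasts[-1]                   # prediction error
        errors.append(eta)
        forecasts.append(forecasts[-1] + alpha * eta)  # F_{t+1} = F_t + alpha * eta_t
    return forecasts, errors


# Hypothetical reward stream: a constant reward of 1 on every trial.
forecasts, errors = rescorla_wagner([1.0] * 50, alpha=0.2)
print(forecasts[-1], errors[-1])
```

     With a constant reward the forecast climbs toward 1 and the prediction error shrinks toward 0; an unexpected omission of the reward would then produce a negative error, the "disappointment" case.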

  4. Dynamic Programming
     Value function V
     States S (transiting to S’)
     Actions to be taken... while learning value function (converges IF RIGHT EXPLORATION; see Watkins-Dayan 1992)
     Dorsal vs Ventral Striatum
     Top: Ventral striatum (A: Pavlovian; B: Instrumental conditioning)
     Bottom: Caudate (instrumental)
     (From O’Doherty)
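
     As a rough illustration of learning a value function while choosing actions (the Dynamic Programming slide above), here is a minimal tabular Q-learning sketch, Q-learning being the algorithm analyzed by Watkins and Dayan (1992). The chain environment, the reward of 1 at the last state, and all parameter values are hypothetical choices made for this example.

```python
import random


def q_learning(n_states=4, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical chain of states 0..n_states-1.

    Actions: 0 = step left, 1 = step right. Entering the last state pays 1
    and ends the episode; every other transition pays 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy exploration: occasionally try a random action.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Prediction error: reward plus discounted future value minus current value.
            # The goal state's values are never updated, so they stay at 0.
            error = reward + gamma * max(q[next_state]) - q[state][action]
            q[state][action] += alpha * error
            state = next_state
    return q


q = q_learning()
print([round(max(row), 3) for row in q])
```

     The exploration step is what the convergence remark refers to: without occasionally trying the non-greedy action, some state-action values would never be updated.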

  5. 3.1. Pharmacological Evidence
     L-Dopa (green): dopamine agonist
     Haloperidol (red): dopamine antagonist
     Placebo: grey
     ... in a two-armed bandit task (From Pessiglione)
     (Curious Difference In Loss Learning)
     Insula “Manipulation” Affects Loss Learning
     • Same task, but now with insula lesion patients (Unpublished, Pessiglione)
     Insula Activation In Loss Avoidance Task Predicts Success In (Loss) Bandit Problem 1 Month Later
     [Figure, panels a and b: avoidance learning (% correct) against insula peak voxel percentage signal change, younger and older adults; r = .45, p < .05, p_rep = .897; z = 4.71, p < .000005, p_rep = .999] (Samanez-Larkin ea, Psych Reviews)
     Opponent Process Theory
     • For best control, let two opponent forces balance each other (thumb + index)
     • (a) Reward prediction errors in striatum, GAIN and (2nd row) LOSS conditions
     • (b) Punishment prediction errors in insula, LOSS trials only
     • Notice: opponent process in loss trials
     • (Not unlike pain avoidance/relief: Seymour ea 2005)
     Fig. 1. Task structure for a representative trial.

  6. Important Remark
     The idea that gains and losses are to be valued separately (as in Prospect Theory or Disappointment Aversion) squares well with the neurobiological foundation.
     Of course, it is not clear (yet) where the brain sets the reference point! (What is a loss?)
     3.2. Direct Dopamine Measurement
     Day ea, Nat Neuro 2007: dopamine release in Nucleus Accumbens of rats is correlated with reward predictive cues
     Notice learning effect!

  7. 4. Risk: Variance
     See Class 2 slides.
     • VSt (ventral striatum): correlates with expected reward
     • ACC (anterior cingulate cortex): correlates with (objective) risk
     • IFG (inferior frontal gyrus): correlates with (subjective) risk
     (Christopoulos ea, J Neuroscience 2009)
     (ACC and IFG have opposite effects: opponent theory in biology)
     Using brain signals to predict choice
     • Can one use signals to predict choice? Yes!
     • fMRI brain signals predict risk taking
     Getting At Causality...
     Disrupting process using transcranial magnetic stimulation
     • Disrupting inferior frontal gyrus leads to reduced risk aversion (Knoch ea, J Neuroscience 2006)

  8. 5. Risk Prediction Errors
     Risk Prediction Error = SIZE of reward prediction error minus EXPECTED SIZE (variance)
     (Driving term of a GARCH process)
     6. The Norepinephrine Story
     See pupil dilation slide, Class 2...
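
     The risk prediction error definition above can be written out as a small sketch. It assumes that "size" means the squared reward prediction error, which is what makes variance the expected size; the function names and the GARCH(1,1) parameter values are hypothetical. The second function only shows why this term can be called the driving term: the standard GARCH(1,1) recursion can be rearranged so that the squared surprise minus the expected variance is the innovation.

```python
def risk_prediction_error(outcome, forecast, expected_variance):
    """Squared reward prediction error minus its expectation (the variance).

    Assumption: "size" of the reward prediction error is its square.
    """
    reward_pe = outcome - forecast
    return reward_pe ** 2 - expected_variance


def garch_step(expected_variance, risk_pe, omega=0.0, alpha=0.1, beta=0.9):
    """One GARCH(1,1) variance update, written in terms of the risk prediction error.

    Standard form:  var' = omega + alpha * eps**2 + beta * var
    Rearranged:     var' = omega + (alpha + beta) * var + alpha * (eps**2 - var)
    """
    return omega + (alpha + beta) * expected_variance + alpha * risk_pe


# Hypothetical numbers: outcome 2, forecast 0, expected variance 1.
risk_pe = risk_prediction_error(outcome=2.0, forecast=0.0, expected_variance=1.0)
print(risk_pe, garch_step(1.0, risk_pe))
```

     A positive risk prediction error (a bigger surprise than expected) pushes the variance forecast up; a negative one pulls it down.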

  9. Risk Prediction Errors (Sophisticated - and only 1 step ahead)
     Example: Bet = “Second card lower”; Card 1 = 3; Card 2 = 2
     • t0: Prediction of Forecast = 0
     • t1: Forecast based on card 1 = -5/9
     • t2: Outcome = +1
     • (t0, t1): (Forecast) Risk ~0.6
     • (t1, t2): (Outcome-Forecast) Risk ~0.7
     Relation With Tracking “Changes” In The Environment? … And Norepinephrine
     • Enhancing NE levels induces rats to abandon old “hypotheses” and find the newly optimal paths in a navigation task
     • (Or are they more sensitive to signals of UNEXPECTED UNCERTAINTY, i.e., that something changed? Yu-Dayan, Neuron 2005)
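
     Returning to the card example (bet “second card lower”, card 1 = 3, card 2 = 2), the forecast and the outcome risk can be checked with exact fractions. The sketch assumes a deck of cards numbered 1 to 10, drawn without replacement, with a payoff of +1 for a winning bet and -1 for a losing one; none of these is stated on the slide, although they do reproduce its forecast of -5/9. The function name is hypothetical.

```python
from fractions import Fraction


def card_game(card1, outcome, bet_lower=True, deck_size=10):
    """Forecast, risk, and prediction errors after seeing the first card.

    Assumptions: cards 1..deck_size, no replacement, payoff +1 (win) or -1 (loss).
    """
    remaining = deck_size - 1
    winning_cards = (card1 - 1) if bet_lower else (deck_size - card1)
    p_win = Fraction(winning_cards, remaining)
    forecast = p_win - (1 - p_win)       # expected payoff given card 1
    risk = 1 - forecast ** 2             # variance of a +/-1 payoff with that mean
    reward_pe = outcome - forecast       # reward prediction error at t2
    risk_pe = reward_pe ** 2 - risk      # risk prediction error at t2
    return forecast, risk, reward_pe, risk_pe


forecast, risk, reward_pe, risk_pe = card_game(card1=3, outcome=1)
print(forecast, risk, float(risk), reward_pe, risk_pe)
```

     Under these assumptions the forecast is -5/9 and the (t1, t2) risk is 56/81, about 0.69, in line with the ~0.7 on the slide. Winning against the odds gives a reward prediction error of 14/9 and a large positive risk prediction error of 140/81.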

  10. 7. Higher Moments: Skewness
      • Skewness = one-sided outliers
      • Positive & negative skewness: anterior insula (again)
      • Positive skewness also ventral striatum (see expected reward)!
      Wu ea, PLoS ONE (2011)
      Variance-Skewness
      See MEG results, Class 2
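
      To make "one-sided outliers" concrete, the sketch below computes mean, variance, and standardized skewness (third central moment divided by variance to the power 3/2) for two mirror-image gambles. The gambles and the function name are hypothetical examples, not stimuli from the cited study.

```python
def moments(outcomes, probs):
    """Mean, variance, and standardized skewness of a discrete gamble."""
    mean = sum(p * x for x, p in zip(outcomes, probs))
    variance = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))
    third = sum(p * (x - mean) ** 3 for x, p in zip(outcomes, probs))
    return mean, variance, third / variance ** 1.5


# Hypothetical gambles with the same variance but opposite skewness.
long_shot = moments([0.0, 10.0], [0.9, 0.1])   # rare large gain: positive skew
rare_loss = moments([0.0, -10.0], [0.9, 0.1])  # rare large loss: negative skew
print(long_shot, rare_loss)
```

      Both gambles have the same variance, so variance alone cannot tell them apart; the sign of the skewness does.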

  11. 8. Correlation
      The Task
      Results... (Wunderlich, Symmonds, ea Neuron 2011)

  12. 9. Processing Time
      Computing “value” takes time because all components have to be evaluated (reward, risk, ...) and then put together...
      What happens if we constrain subjects to make a decision within 1s, 3s (and 5s)?
      Results
      1s: Prob(gamble)
      • Increases in Expected Reward
      • Decreases in Variance
      • Decreases in Skew (Risk Loving for Losses)
      • Insensitive to PRICE
      5s: Prob(gamble)
      • Increases in Expected Reward
      • Insensitive to Variance
      • Decreases in Skew (more!)
      • Increases in PRICE
