
Inter-individual variability in human feedback learning - Stefano Palminteri (presentation)



  1. Financial Education and Investor Behavior Conference, Rio de Janeiro - 7/12/2015. Inter-individual variability in human feedback learning. Stefano Palminteri, PhD - Institute of Cognitive Science (UCL, London); Laboratoire de Neurosciences Cognitives (ENS, Paris) - stefano.palminteri@gmail.com

  2. Reinforcement learning (I). Learning by trial and error to select actions that maximize the occurrence of pleasant events (rewards) and minimize the occurrence of unpleasant events (punishments). An evolutionarily pervasive psychological process, found along the evolutionary tree from A. mellifera and C. elegans to M. musculus and H. sapiens. Thorndike, Skinner, Sutton, Barto, etc.

  3. Reinforcement learning (II). Learning is driven by prediction errors and choices are made by comparing action values. Q-learning or Rescorla-Wagner (RW) model:
  Policy: P(A)_t = 1 / (1 + exp((V(B)_t - V(A)_t) / β))
  Prediction error: PE_t = R - V(A)_t
  Learning rule: V(A)_t+1 = V(A)_t + α PE_t
  [Figure: choice probability P(A)_t as a function of the value difference V(A)_t - V(B)_t; prediction V(A)_t and prediction error PE_t plotted across trials.]
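The update and decision rules on this slide can be written down compactly. Below is a minimal Python sketch (not from the presentation; function names, parameter values and the illustrative reward probabilities are assumptions) of the Rescorla-Wagner update and the softmax policy defined above.

```python
import numpy as np

def softmax_p_a(v_a, v_b, beta):
    """P(A) = 1 / (1 + exp((V(B) - V(A)) / beta)); beta is the choice temperature."""
    return 1.0 / (1.0 + np.exp((v_b - v_a) / beta))

def rw_update(v, reward, alpha):
    """One Rescorla-Wagner step: move the value toward the outcome by alpha * PE."""
    pe = reward - v  # prediction error
    return v + alpha * pe, pe

# Illustrative run (made-up contingencies): option A pays 1 with probability 0.75, B with 0.25.
rng = np.random.default_rng(0)
v = {"A": 0.0, "B": 0.0}
alpha, beta = 0.3, 0.2
for t in range(20):
    p_a = softmax_p_a(v["A"], v["B"], beta)
    choice = "A" if rng.random() < p_a else "B"
    reward = float(rng.random() < (0.75 if choice == "A" else 0.25))
    v[choice], _ = rw_update(v[choice], reward, alpha)
```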

  4. Reinforcement learning (III). Two fundamental dimensions: positive vs. negative prediction errors, and exploitation vs. exploration.
  Decision rule: P(A)_t = 1 / (1 + exp((V(B)_t - V(A)_t) / β)) - (1) exploit previous knowledge vs. (2) explore new options.
  Prediction error: PE_t = R - V(A)_t - (1) positive prediction errors vs. (2) negative prediction errors.
  Learning rule: V(A)_t+1 = V(A)_t + α PE_t

  5. The framework (I). Reinforcement learning processes have been shown to operate at different levels of human behavior, from "high level" (e.g., economic decisions) to "low level" (e.g., motor learning). The general idea: can "low level" reinforcement learning biases explain "high level" behavioral biases? Erev, Camerer, Schultz, etc.

  6. The framework (II). [Diagram: the agent faces a context (s_1, ..., s_j) with options (a_1, ..., a_i); option values V(s_j, a_i) are updated from obtained outcomes (learning biases!), and a selection step maps them onto choice probabilities P(s_j, a_i) (decision biases); the chosen action (a) is emitted to the environment.] (1) Learning from direct experience ("factual").

  7. Optimism bias (I). Today's special question: the good news/bad news effect. "It is the peculiar and perpetual error of the human understanding to be more moved and excited by affirmatives than negatives; whereas it ought properly to hold itself indifferently disposed towards both alike" (p. 36), Francis Bacon (1620). Beliefs are revised as a function of new information - Belief(t) + Information(t) → Belief(t+1) - which can be better than expected (PE > 0) or worse than expected (PE < 0). Insensitivity to negative errors can generate an inflated likelihood for desired events and a reduced likelihood for undesired events. Sharot et al.

  8. Optimism bias (II). Current hypothesis: asymmetric learning from positive and negative prediction errors as an atomic computational mechanism that generates and sustains optimistic beliefs (low level → high level). Current questions: 1) Is this learning asymmetry specific to abstract beliefs, or does it also apply to rewards? 2) Does this learning asymmetry depend on the stimuli having prior desirability, or does it also hold for neutral stimuli? 3) Is this learning asymmetry specific to fictive (simulated) experience, or does it also hold for actual outcomes?

  9. First study. Experimental task, contingencies and dependent variables: [Task figure: conditions with symmetric option values (=) and asymmetric option values (> / <); dependent variables: learning performance, "conservatism" and motor bias.] Data from Palminteri et al., J Neurosci, 2009; Worbe et al., Arch Gen Psychiatry, 2011.

  10. First study (N=50). The stimuli have no prior desirability, and the outcomes are not hypothetical but real. [Figure: correct choice rate across 25 trials (conditions 2 & 3) and preferred choice rate across 25 trials (conditions 1 & 4).]

  11. Formalism and predictions.
  Rescorla-Wagner model (RW): P(A)_t = 1 / (1 + exp((V(B)_t - V(A)_t) / β)); PE_t = R - V(A)_t; V(A)_t+1 = V(A)_t + α PE_t.
  Asymmetric Rescorla-Wagner model (RW±): same decision rule and prediction error, but two learning rates - V(A)_t+1 = V(A)_t + α+ PE_t when PE_t > 0, and V(A)_t+1 = V(A)_t + α- PE_t when PE_t < 0.
  Possible results concerning the learning rates: α+ = α- (standard RL), α+ > α- (optimistic RL), α+ < α- (pessimistic RL).
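To make the three predicted regimes concrete, here is a short Python sketch of the asymmetric update (an illustration with assumed names and parameter values, not the study's code).

```python
def asymmetric_rw_update(v, reward, alpha_pos, alpha_neg):
    """RW update with distinct learning rates for good and bad surprises.

    alpha_pos == alpha_neg -> standard RL
    alpha_pos >  alpha_neg -> optimistic RL (good news weighted more)
    alpha_pos <  alpha_neg -> pessimistic RL (bad news weighted more)
    """
    pe = reward - v
    alpha = alpha_pos if pe > 0 else alpha_neg
    return v + alpha * pe

# Example: an optimistic learner inflates the value of a rarely rewarding option.
v_optimist, v_realist = 0.0, 0.0
outcomes = [1, 0, 0, 0, 1, 0, 0, 0]  # option rewarded 25% of the time
for r in outcomes:
    v_optimist = asymmetric_rw_update(v_optimist, r, alpha_pos=0.4, alpha_neg=0.1)
    v_realist = asymmetric_rw_update(v_realist, r, alpha_pos=0.25, alpha_neg=0.25)
# v_optimist ends up well above v_realist and above the true 0.25 expected value.
```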

  12. The computational and behavioral results. Model comparison (Bayes: which model?) and parameters; parameters and behaviour (Popper: why?). → Signs of optimistic reinforcement learning. → Optimism enforces.
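The slides do not show the fitting procedure itself; the snippet below is a generic sketch (an assumption about standard practice, not taken from the study) of how such a model comparison is typically done: fit each candidate model to each subject by maximum likelihood, then compare penalized fits, for example with BIC.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, choices, rewards):
    """NLL of the RW± model on one subject's data (choices coded 0/1, rewards in [0, 1])."""
    beta, alpha_pos, alpha_neg = params
    v = np.zeros(2)
    nll = 0.0
    for c, r in zip(choices, rewards):
        p_c = 1.0 / (1.0 + np.exp((v[1 - c] - v[c]) / beta))  # softmax prob. of the made choice
        nll -= np.log(p_c + 1e-12)
        pe = r - v[c]
        v[c] += (alpha_pos if pe > 0 else alpha_neg) * pe
    return nll

def bic(nll, n_params, n_trials):
    """Bayesian information criterion; lower values indicate a better penalized fit."""
    return 2.0 * nll + n_params * np.log(n_trials)

# Hypothetical usage, given per-subject arrays `choices` and `rewards`:
# fit = minimize(neg_log_likelihood, x0=[0.2, 0.3, 0.3], args=(choices, rewards),
#                bounds=[(0.01, 10.0), (0.0, 1.0), (0.0, 1.0)])
# print(bic(fit.fun, n_params=3, n_trials=len(choices)))
```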

  13. A microscopic analysis of optimistic and realistic behavior. [Figure: preferred choice rate and value difference (preferred - not preferred) over trials for typical realistic (RW) and typical optimistic (RW±) subjects.] A computational explanation for developing a "preferred option" even in poorly rewarding environments.

  14. The robustness of the result. Three alternative explanations were tested:
  - Minimum outcome: optimistic RL is expressed because not winning is not that bad, and would disappear with actual monetary punishments.
  - Learning phase: optimistic RL is an artifact arising from subjects "deciding" which is the actual best option after the first outcomes.
  - Contingencies: optimistic RL is an artifact arising from subjects "giving up" in symmetrical low-reward conditions.

  15. Testing the inflexibility of optimistic subjects. Limitations of the first studies: the task included only stable environments, so subjects did not incur big losses by behaving optimistically. Hypothesis concerning the learning rates (α_C+ vs. α_C-): standard vs. optimistic. Competing predictions: either contingency reversal suppresses the asymmetry, or the asymmetry is robust across contingency types.

  16. The second study (N=20). The task includes a reversal learning condition (which should promote flexibility). First set: learning is driven by positive prediction errors. Second set: learning is driven by negative prediction errors.

  17. The computational and behavioral results

  18. The reversal learning curves. [Figure: one learning profile is slower but flexible; the other is quicker but inflexible.] → Optimistic learning is confirmed even when there are losses. → Optimistic learning is confirmed even when it is maladaptive.

  19. Interim conclusions (I) and new questions. So far: • We demonstrated that, even in a simple task involving abstract, neutral items and direct reinforcement, subjects preferentially update their reward expectations following positive, rather than negative, prediction errors. • This is true even when this behavior is disadvantageous (reversal learning). • However, this tendency was quite variable across subjects. New questions: 1) Is optimistic reinforcement learning associated with inter-individual differences in optimistic personality traits? 2) Is this inter-individual variability associated with specific neuroanatomical and functional brain signatures? 3) Is this computational bias influenced by the individual's socioeconomic environment?

  20. The link with optimistic personality traits. External validity (I): relation to psychometric measures of "optimism" (Life Orientation Test - Revised, LOT-R). [Figure: model-based and behavioural correlations.]

  21. The neural bases of optimistic RL. External validity (II): relation to brain signatures (neurocomputational phenotypes). [Figure: neuroanatomical (VBM) and neurophysiological (fMRI) results; model-based and behavioural correlations with the update and policy stages.]

  22. The effect of environmental harshness (preliminary). External validity (III): sensitivity to socioeconomic status. [Figure: bias amplitude across different life trajectories.]

  23. Extending the framework. [Diagram: as in slide 6, the agent updates option values V(s_j, a_i) from the context (s_1, ..., s_j) and options (a_1, ..., a_i) and maps them onto choice probabilities P(s_j, a_i), but now two outcome streams feed the update (learning biases!): (1) obtained outcomes and (2) forgone outcomes.] (1) Learning from direct experience ("factual"); (2) learning from simulated experience ("counterfactual").

  24. The third study (N=20). Remaining questions (among others): Is counterfactual learning also biased? Is this reinforcement learning bias a valuation bias or a confirmation bias? New design: the task includes counterfactual feedback - the obtained outcome of the chosen option (R_C) and the forgone outcome of the unchosen option (R_U).

  25. Formalism and predictions.
  Factual learning (as before): PE_C,t = R_C - V(C)_t; V(C)_t+1 = V(C)_t + α_C+ PE_C,t when PE_C,t > 0, and V(C)_t+1 = V(C)_t + α_C- PE_C,t when PE_C,t < 0.
  Counterfactual learning (new): PE_U,t = R_U - V(U)_t; V(U)_t+1 = V(U)_t + α_U+ PE_U,t when PE_U,t > 0, and V(U)_t+1 = V(U)_t + α_U- PE_U,t when PE_U,t < 0.
  We already know that factual learning is biased (α_C+ > α_C-). Predictions for the counterfactual learning rates (contrasting optimistic vs. realistic and allocentric vs. egocentric accounts): α_U+ = α_U- means counterfactual feedback processing is unbiased; α_U+ > α_U- means the bias is choice independent (a valuation bias); α_U+ < α_U- means the bias is choice dependent (a confirmation bias).
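To make the four-learning-rate scheme concrete, here is a small Python sketch (again an illustration with assumed names and parameter values, not the study's code) of one trial's update when both the obtained and the forgone outcome are shown.

```python
def full_feedback_update(v_c, v_u, r_c, r_u,
                         alpha_c_pos, alpha_c_neg, alpha_u_pos, alpha_u_neg):
    """Update chosen (C) and unchosen (U) option values from factual and counterfactual feedback.

    Unbiased counterfactual learning:        alpha_u_pos == alpha_u_neg.
    Valuation bias (choice independent):     alpha_c_pos > alpha_c_neg and alpha_u_pos > alpha_u_neg.
    Confirmation bias (choice dependent):    alpha_c_pos > alpha_c_neg and alpha_u_pos < alpha_u_neg.
    """
    pe_c = r_c - v_c  # factual prediction error
    pe_u = r_u - v_u  # counterfactual prediction error
    v_c += (alpha_c_pos if pe_c > 0 else alpha_c_neg) * pe_c
    v_u += (alpha_u_pos if pe_u > 0 else alpha_u_neg) * pe_u
    return v_c, v_u

# Example: a confirmation-bias learner discounts good news about the option it did not choose.
v_chosen, v_unchosen = full_feedback_update(v_c=0.5, v_u=0.5, r_c=0.0, r_u=1.0,
                                            alpha_c_pos=0.4, alpha_c_neg=0.1,
                                            alpha_u_pos=0.1, alpha_u_neg=0.4)
```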

  26. Results - Counterfactual learning is also biased - The counterfactual learning bias is choice oriented (as a confirmation bias would be)

  27. Final Conclusions. → The good news/bad news (optimism) effect may be the consequence of a low-level reinforcement learning bias. → The factual learning bias extends to counterfactual learning in the form of a confirmation bias. → This computational bias is highly variable in the population, and this variability has external validity in terms of neural bases, personality traits and environmental influences. Real life vs. abstract cues; fictive vs. real experience; reward omission vs. punishment reception; stable vs. volatile environment.
