reinforcement learning in psychology and neuroscience


  1. Reinforcement Learning in Psychology and Neuroscience with thanks to Elliot Ludvig Princeton University

  2. Psychology has identified two primitive kinds of learning • Classical Conditioning • Operant Conditioning (a.k.a. Instrumental learning) • Computational theory: ❖ Classical = Prediction - What is going to happen? ❖ Operant = Control - What to do to maximize reward?

  3. Classical Conditioning

  4. Pavlov • Russian physiologist • Interested in how learning happened in the brain • Conditional and Unconditional Stimuli

  5. Rescorla-Wagner Model (1972) • Computational model of conditioning ❖ Widely cited and used • Learning as violation of expectations ❖ TD learning as extension of RW
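
To make "learning as violation of expectations" concrete, here is a minimal sketch of a Rescorla-Wagner style update. It assumes a single learning rate `alpha` (standing in for the model's separate salience parameters) and an asymptote `lam` of 1 on reinforced trials and 0 otherwise; the function name, cue names, and trial format are illustrative choices, not taken from the slides.

```python
def rescorla_wagner(trials, alpha=0.1, lam=1.0):
    """Learn associative strengths from (cues_present, reinforced) trials.

    alpha: assumed single learning rate (hypothetical simplification).
    lam:   assumed asymptote of learning on reinforced trials.
    """
    V = {}  # associative strength per cue
    for cues, reinforced in trials:
        target = lam if reinforced else 0.0
        # Expectation is the summed strength of all cues present on the trial.
        prediction = sum(V.get(c, 0.0) for c in cues)
        # The "violation of expectation": outcome minus prediction.
        error = target - prediction
        # Every cue present shares the same error signal.
        for c in cues:
            V[c] = V.get(c, 0.0) + alpha * error
    return V


# Simple acquisition: a tone is always followed by the outcome.
acquisition = rescorla_wagner([(["tone"], True)] * 200)

# Pretrain the tone, then pair tone + light with the same outcome.
blocking = rescorla_wagner(
    [(["tone"], True)] * 200 + [(["tone", "light"], True)] * 200
)
```

In the second run the light gains almost no strength, because the pretrained tone already predicts the outcome and leaves no error to learn from.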

  6. Operant Learning • Operant conditioning is all about choice, in 3 main ways: ❖ Which response to make? ❖ How much to respond? ❖ When to respond?

  7. Thorndike’s Puzzle Box

  8. Operant Chambers

  9. Complex Cognition

  10. Marr’s 3 Levels of Analysis • Computational ❖ What function is being fulfilled? • Algorithmic ❖ How is it accomplished? • Implementational ❖ What physical substrate is involved?

  11. The Basic TD Model • Learn to predict the discounted sum of upcoming reward through TD with linear function approximation: $V_t = \mathbf{w}_t^\top \mathbf{x}_t = \sum_{i=1}^{n} w_t(i)\, x_t(i)$ • The TD error is calculated as: $\delta_t = r_{t+1} + \gamma V_{t+1} - V_t$.
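
The linear value estimate and TD error on this slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the step size `alpha`, the weight update `w += alpha * delta * x`, the one-hot features, and the two-state episode are all illustrative choices; the slide itself only gives the value and TD error formulas.

```python
def value(w, x):
    """Linear value estimate: V = sum_i w(i) * x(i)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def td0_update(w, x, r_next, x_next, alpha=0.1, gamma=0.9):
    """One TD(0) step. Returns (new_weights, td_error)."""
    # TD error: delta = r_{t+1} + gamma * V_{t+1} - V_t
    delta = r_next + gamma * value(w, x_next) - value(w, x)
    # Assumed update rule: move each weight along its feature, scaled by delta.
    new_w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
    return new_w, delta


# Hypothetical episode with one-hot features:
# state A -> state B (reward 0) -> end of episode (reward 1).
A, B, END = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]

w = [0.0, 0.0]
for _ in range(500):
    w, delta = td0_update(w, A, 0.0, B)
    w, delta = td0_update(w, B, 1.0, END)
```

With these settings the weights should approach the discounted returns: about 1 for state B and about `gamma` times that for state A.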

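  12. TD(λ) algorithm/model/neuron [Diagram: states or features $x_i$ are weighted by $w_i$ and summed to give the value of the state or action, $\sum_i w_i \cdot x_i$; the reward and the value give the TD error $\delta$; each weight changes in proportion to the TD error times its eligibility trace $e_i$, $\dot{w}_i \sim \delta \cdot e_i$, with trace parameter λ]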
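
The weight change on this slide, proportional to the TD error times an eligibility trace, can be sketched in discrete time as follows. The accumulating-trace form `e = gamma * lam * e + x`, the trace reset at the start of each episode, the step size, and the episode format are assumptions made for illustration; the slide only shows the proportionality.

```python
def td_lambda(episodes, n_features, alpha=0.1, gamma=0.9, lam=0.8):
    """Linear TD(lambda) with accumulating eligibility traces (a sketch).

    Each episode is a list of (x, r_next, x_next) transitions, where x and
    x_next are feature vectors of length n_features.
    """
    w = [0.0] * n_features
    for episode in episodes:
        e = [0.0] * n_features  # assumed: traces reset at episode start
        for x, r_next, x_next in episode:
            v = sum(wi * xi for wi, xi in zip(w, x))
            v_next = sum(wi * xi for wi, xi in zip(w, x_next))
            delta = r_next + gamma * v_next - v  # TD error
            # Decay old traces, then mark the currently active features.
            e = [gamma * lam * ei + xi for ei, xi in zip(e, x)]
            # Weight change proportional to TD error times eligibility trace.
            w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]
    return w


# Hypothetical episode: state A -> state B (reward 0) -> end (reward 1).
A, B, END = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]
episode = [(A, 0.0, B), (B, 1.0, END)]

w_one = td_lambda([episode], n_features=2)         # after a single episode
w_many = td_lambda([episode] * 500, n_features=2)  # after many episodes
```

After a single episode the weight for state A is already positive, because A's trace is still active when the reward arrives; with `lam=0` it would stay at zero until state B had acquired value.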

  13. Brain reward systems • What signal does this neuron carry? [Figure: honeybee brain, VUM neuron; Hammer, Menzel]

  14. Dopamine • Small-molecule neurotransmitter ❖ Diffuse projections from the midbrain throughout the brain (from Pinel (2000), p. 364) • Key idea: phasic change in baseline dopamine responding = reward prediction error

  15. Dopamine neurons signal the TD error • Error/change in prediction of reward • Wolfram Schultz et al.

  16. Representation-independent predictions of TD errors [Figure: reward, value, and TD error traces in three cases: Reward Unexpected; Reward Expected (after a cue); Reward Absent] • TD error: $\delta_t = r_{t+1} + \gamma V_{t+1} - V_t$
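
The three cases on this slide can be reproduced with a small simulation. This is a sketch under assumptions the slide does not specify: one feature per time step since cue onset (a tapped-delay-line representation), no features before the cue so that the cue itself is unpredicted, `gamma = 1`, and hypothetical trial timings. Entry `t` of the returned list is the TD error for the transition from step `t` to step `t + 1`.

```python
def run_trial(w, rewarded=True, learn=True, cue_t=3, reward_t=7,
              n_steps=10, alpha=0.2, gamma=1.0):
    """Run one cue-reward trial and return the TD error at each transition.

    w holds one weight per time step since cue onset and is updated in
    place when learn is True. All timings are illustrative assumptions.
    """
    def features(t):
        # One-hot code for "time since cue onset"; all zeros otherwise.
        x = [0.0] * len(w)
        if cue_t <= t < cue_t + len(w):
            x[t - cue_t] = 1.0
        return x

    deltas = []
    for t in range(n_steps - 1):
        x, x_next = features(t), features(t + 1)
        r_next = 1.0 if (rewarded and t + 1 == reward_t) else 0.0
        v = sum(wi * xi for wi, xi in zip(w, x))
        v_next = sum(wi * xi for wi, xi in zip(w, x_next))
        delta = r_next + gamma * v_next - v
        if learn:
            for i, xi in enumerate(x):
                w[i] += alpha * delta * xi
        deltas.append(delta)
    return deltas


w = [0.0] * 4  # weights for steps 3..6, i.e. cue onset up to the reward

unexpected = run_trial(w, learn=False)               # before any learning
for _ in range(200):
    run_trial(w)                                     # cue-reward pairings
expected = run_trial(w, learn=False)                 # reward now predicted
omitted = run_trial(w, rewarded=False, learn=False)  # predicted reward withheld
```

Before learning, the only nonzero TD error is at the reward. After learning, it moves to the cue and vanishes at the reward, and withholding the reward produces a negative TD error at the time the reward was expected.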

  17. The theory that Dopamine = TD error is one of the most important interactions ever between artificial intelligence and neuroscience
