
10703 Deep Reinforcement Learning: Reinforcement Learning in Humans and Animals


  1. 10703 Deep Reinforcement Learning: Reinforcement Learning in Humans and Animals. Tom Mitchell, October 29, 2018. Reading: Barto & Sutton, Chapter 15. Tom Mitchell, October 2018

  2. Outline • RL in primates • RL in humans • Error signals and predictive coding

  3. Reward-based learning in primates

  4. Dopamine As Reward Signal [Schultz et al., Science, 1997]

  5. Dopamine As Reward Signal [Schultz et al., Science, 1997]

  6. Dopamine As Reward Signal [Schultz et al., Science, 1997] $\text{error} = r_t + \gamma V(s_{t+1}) - V(s_t)$
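The temporal difference error on slide 6 can be illustrated with a minimal sketch. It assumes a toy trial in which a cue is followed by a short chain of states and a single reward at the end, and in which the pre-cue baseline has value 0 because the cue arrives unpredictably. The function names, state layout, and parameter values (`gamma`, `alpha`, five states, 500 trials) are illustrative assumptions, not taken from the lecture or from Schultz et al.

```python
def run_trial(V, gamma=0.95, alpha=0.1, reward=1.0):
    """Run one cue -> reward trial with TD(0), updating V in place.

    V[t] is the predicted value of the t-th state after cue onset.
    Returns the TD errors: index 0 is cue onset, index len(V) is reward time.
    """
    n = len(V)
    deltas = [0.0] * (n + 1)
    # Cue onset: transition from a baseline state whose value is pinned at 0
    # (assumption: the cue cannot be predicted), with no reward.
    deltas[0] = 0.0 + gamma * V[0] - 0.0
    for t in range(n):
        r = reward if t == n - 1 else 0.0        # reward only on the last step
        v_next = V[t + 1] if t < n - 1 else 0.0  # trial ends after the reward
        delta = r + gamma * v_next - V[t]        # error = r_t + gamma*V(s_{t+1}) - V(s_t)
        V[t] += alpha * delta                    # TD(0) value update
        deltas[t + 1] = delta
    return deltas


def simulate(n_states=5, n_trials=500, gamma=0.95, alpha=0.1):
    """Return TD errors for the first trial, a late trial, and an omitted reward."""
    V = [0.0] * n_states
    first = run_trial(V, gamma, alpha)
    last = first
    for _ in range(n_trials):
        last = run_trial(V, gamma, alpha)
    # Probe trial on a copy of V: the expected reward is withheld.
    omitted = run_trial(list(V), gamma, alpha, reward=0.0)
    return first, last, omitted


if __name__ == "__main__":
    first, last, omitted = simulate()
    for name, d in [("first", first), ("trained", last), ("omitted", omitted)]:
        print(name, [round(x, 2) for x in d])
```

In this toy setting the error sits at the reward time on the first trial, moves to cue onset after training, and turns negative at the expected reward time when the reward is withheld. This is the qualitative pattern usually attributed to the dopamine recordings cited on these slides.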

  7. Reward-based learning in humans

  8. RL Models for Human Learning [Seymour et al., Nature, 2004]

  9. [Seymour et al., Nature, 2004]

  10. One Theory of RL in the Brain, from [Nieuwenhuis et al.] • Basal ganglia monitor events and predict future rewards • When the prediction is revised upward (downward), activity of midbrain dopaminergic neurons increases (decreases), influencing the anterior cingulate cortex (ACC) • This dopamine-based activation somehow results in revising the reward prediction function, possibly through direct influence on the basal ganglia and via prefrontal cortex

  13. Neuron Level Learning Mechanisms • Hebbian learning – fire together → wire together • Spike Timing Dependent Plasticity (STDP) – if incoming neuron fires before outgoing, then strengthen connection – if incoming neuron fires after outgoing, then weaken connection • Reward modulated STDP – less understood – in some neurons, it appears STDP occurs only if neuromodulator (e.g., dopamine) activity follows firing within up to 10 sec
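The three rules on slide 13 can be sketched as weight-update functions. This is a minimal illustration under stated assumptions: the exponential STDP window, the amplitudes and time constants, and the exponentially decaying eligibility term are common modelling choices introduced here for illustration and are not given in the lecture. Only the sign conventions and the 10 second limit come from the slide.

```python
import math


def hebbian_dw(pre, post, lr=0.01):
    """Hebbian rule: weight grows when pre and post activity coincide."""
    return lr * pre * post


def stdp_dw(dt_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP, with dt_ms = t_post - t_pre in milliseconds.

    Incoming (pre) neuron fires first (dt_ms > 0): strengthen.
    Incoming neuron fires after outgoing (dt_ms < 0): weaken.
    Amplitudes and time constants are assumed values.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0


def reward_modulated_dw(dt_ms, dopamine_delay_s, dopamine=1.0,
                        tau_elig_s=3.0, window_s=10.0):
    """Reward-modulated STDP: the STDP change is applied only if dopamine arrives.

    dopamine_delay_s is the time from the spike pairing to the dopamine signal.
    window_s reflects the "up to 10 sec" limit on the slide; the exponential
    decay of eligibility (tau_elig_s) is an assumption.
    """
    if dopamine_delay_s < 0 or dopamine_delay_s > window_s:
        return 0.0
    eligibility = stdp_dw(dt_ms) * math.exp(-dopamine_delay_s / tau_elig_s)
    return dopamine * eligibility


if __name__ == "__main__":
    print("pre 10 ms before post:", stdp_dw(10.0))
    print("pre 10 ms after post :", stdp_dw(-10.0))
    print("dopamine after 1 s   :", reward_modulated_dw(10.0, 1.0))
    print("dopamine after 15 s  :", reward_modulated_dw(10.0, 15.0))
```

With these choices, a pairing contributes nothing unless a dopamine signal follows within the window, and an earlier dopamine signal yields a larger change than a later one.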

  15. Summary: Temporal Difference ML Model Predicts Dopaminergic Neuron Activity during Learning • Evidence now of neural reward signals from – Direct neural recordings in monkeys – fMRI in humans (1 mm spatial resolution) – EEG in humans (1-10 msec temporal resolution) • Dopaminergic responses encode Temporal Difference error • Some differences, and efforts to refine the model – How/where is the value function encoded in the brain? – Study timing (e.g., basal ganglia learns faster than PFC?) – Role of prior knowledge, rehearsal of experience, multi-task learning?

  16. Predictive Coding

  17. [Rao & Ballard, Nature, 1999]

  18. [Rao & Ballard, 1999]
