Reinforcement Learning in Psychology and Neuroscience
(with thanks to Elliot Ludvig, University of Warwick)
Bidirectional Influences
[Figure: bidirectional arrows linking Psychology, Artificial Intelligence / Reinforcement Learning, Control Theory, and Neuroscience]
Any information processing system can be understood at multiple “levels” (David Marr, 1982):
• The Computational Theory Level – What is being computed? Why are these the right things to compute?
• Representation and Algorithm Level – How are these things computed?
• Implementation Level – How is this implemented physically?
Goals for today’s lecture. To learn:
• That psychology recognizes two fundamental learning processes, analogous to our prediction and control
• That all the ideas in this course are also important in completely different fields: psychology and neuroscience
• That the details of the TD(λ) algorithm match key features of biological learning
Psychology has identified two primitive kinds of learning:
• Classical Conditioning
• Operant Conditioning (a.k.a. instrumental learning)
Computational theory:
❖ Classical = Prediction – What is going to happen?
❖ Operant = Control – What to do to maximize reward?
Classical Conditioning
Classical Conditioning as Prediction Learning
• Classical conditioning is the process of learning to predict the world around you
❖ Classical conditioning concerns (typically) the subset of these predictions to which there is a hard-wired response
Pavlov (1901)
• Russian physiologist
• Interested in how learning happened in the brain
• Conditional and Unconditional Stimuli
Is it really predictions?
Maybe Contiguity?
• Foundational principle of classical associationism (back to Aristotle)
❖ Contiguity = co-occurrence
❖ Sufficient for association?
Contiguity Problems
• Unnecessary:
❖ Conditioned Taste Aversion
• Insufficient:
❖ Blocking
❖ Contingency Experiments
Blocking
• Phase 1: The light comes to cause salivation.
• Phase 2 (light + sound): Will the sound come to cause salivation?
• No. Learning about the sound in Phase 2 does not occur because it is blocked by the association formed in Phase 1.
Rescorla-Wagner Model (1972)
• Computational model of conditioning
❖ Widely cited and used
• Learning as violation of expectations
❖ As in linear supervised learning (LMS, p2)
❖ TD learning is a real-time extension of this same idea (see the sketch below)
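A minimal sketch of the Rescorla-Wagner update, assuming illustrative stimulus names, learning rate, and trial counts (none of these come from the lecture). All present stimuli share one prediction, so each is updated by the same violation-of-expectation error, and that shared error is exactly what produces blocking:

```python
def rw_update(w, present, r, alpha=0.1):
    """One Rescorla-Wagner trial: error-driven update of associative strengths."""
    prediction = sum(w[s] for s in present)  # joint prediction of all present stimuli
    error = r - prediction                   # violation of expectation
    for s in present:
        w[s] += alpha * error
    return error

w = {"light": 0.0, "sound": 0.0}

# Phase 1: light alone is paired with reward -> w["light"] approaches 1.
for _ in range(100):
    rw_update(w, ["light"], r=1.0)

# Phase 2: light + sound are paired with the same reward. The light already
# predicts it, so the error is ~0 and the sound acquires almost nothing.
for _ in range(100):
    rw_update(w, ["light", "sound"], r=1.0)

print(w)  # w["light"] ~ 1.0, w["sound"] ~ 0.0: blocking
```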
Operant Learning
• The natural learning process directly analogous to reinforcement learning
• Control! What response to make when?
Thorndike’s Puzzle Box (1910)
Law of Effect • “Of several responses made to the same situation, those which are accompanied by or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur...” - Thorndike (1911), p. 244
Operant Chambers
Complex Cognition
Any information processing system can be understood at multiple “levels” (David Marr, 1982):
• The Computational Theory Level – What is being computed? Why are these the right things to compute?
• Representation and Algorithm Level – How are these things computed?
• Implementation Level – How is this implemented physically?
The Basic TD Model
• Learn to predict the discounted sum of upcoming reward through TD with linear function approximation
• The TD error is calculated as:
$$\delta_t \doteq R_{t+1} + \gamma \hat{v}(S_{t+1}, \boldsymbol{\theta}) - \hat{v}(S_t, \boldsymbol{\theta})$$
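In code, with linear function approximation the value is a dot product, $\hat{v}(s, \boldsymbol{\theta}) = \boldsymbol{\theta}^\top \mathbf{x}(s)$, so the TD error is one line. A small sketch (the function name and the default discount are illustrative assumptions):

```python
import numpy as np

def td_error(theta, x_t, x_next, r_next, gamma=0.9):
    """delta_t = R_{t+1} + gamma * v(S_{t+1}) - v(S_t), with v(s) = theta . x(s)."""
    return r_next + gamma * (theta @ x_next) - (theta @ x_t)
```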
TD(λ) algorithm/model/neuron
[Figure: states supply features $x_i$; the value of the state (or action) is the linear combination $\sum_i w_i \cdot x_i$; reward and value feed the TD error $\delta$; each weight changes as $\Delta w_i \sim \delta \cdot e_i$, where $e_i$ is an eligibility trace that decays with $\lambda$.]
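A minimal TD(λ) learner built from the diagram’s components: features x_i, weights w_i, eligibility traces e_i, and TD error δ. This is a sketch under assumptions (accumulating traces, and an invented class name and hyperparameter defaults), not the lecture’s exact formulation:

```python
import numpy as np

class TDLambda:
    """Linear TD(lambda) with accumulating eligibility traces (illustrative sketch)."""

    def __init__(self, n_features, alpha=0.1, gamma=0.9, lam=0.9):
        self.w = np.zeros(n_features)  # weights w_i
        self.e = np.zeros(n_features)  # eligibility traces e_i
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def value(self, x):
        return self.w @ x              # value of state: sum_i w_i * x_i

    def step(self, x_t, r_next, x_next):
        """One transition: compute the TD error, then update traces and weights."""
        delta = r_next + self.gamma * self.value(x_next) - self.value(x_t)
        self.e = self.gamma * self.lam * self.e + x_t  # traces decay, then accumulate
        self.w += self.alpha * delta * self.e          # Delta w_i ~ delta * e_i
        return delta
```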
Brain reward systems
• What signal does this neuron carry?
[Figure: the VUM neuron in the honeybee brain (Hammer & Menzel)]
Dopamine
• Small-molecule neurotransmitter
❖ Diffuse projections from the midbrain throughout the brain
Key idea: dopamine responding = TD error
What does Dopamine Do?
• Hedonic Impact
• Motivation
• Motor Activity
• Attention
• Novelty
• Learning
TD Error = Dopamine
[Figure: the TD error calculation, current reward plus new prediction minus old prediction, identified with the phasic dopamine response. Schultz et al. (1997); Montague et al. (1996)]
Dopamine neurons signal the error/change in prediction of reward (Wolfram Schultz et al.)
[Figure: simulated value and TD error traces in three cases. Reward Unexpected: the TD error spikes at the reward. Reward Expected: the TD error spikes at the predictive cue and is flat at the reward. Reward Absent: the TD error dips below baseline at the time the reward was expected.]
$$\delta_t = R_{t+1} + \gamma \hat{v}_{t+1} - \hat{v}_t$$
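These three signatures fall out of TD learning itself. The sketch below reproduces them with tabular TD(0) on a cue-then-reward trial; the state layout, delay length, learning rate, and trial count are all illustrative assumptions, not the lecture’s simulation:

```python
import numpy as np

# Tabular TD(0) on a cue -> delay -> reward trial (all numbers illustrative).
# State 0 is the unpredictive pre-cue state (value pinned at 0);
# states 1..K are "time since cue onset"; reward arrives on leaving state K.
K = 5
gamma, alpha = 1.0, 0.2
v = np.zeros(K + 2)  # v[0] (pre-cue) and v[K+1] (terminal) stay 0

def run_trial(reward_present=True, learn=True):
    """Return the TD error at each transition of one trial."""
    deltas = np.zeros(K + 1)
    for t in range(K + 1):  # transition from state t to state t + 1
        r = 1.0 if (reward_present and t == K) else 0.0
        deltas[t] = r + gamma * v[t + 1] - v[t]
        if learn and t >= 1:  # only cue states are predictive, hence learnable
            v[t] += alpha * deltas[t]
    return deltas

print("unexpected reward:", run_trial(learn=False))  # spike at the reward
for _ in range(500):
    run_trial()                                      # pair cue with reward
print("reward expected:  ", run_trial(learn=False))  # spike moves to the cue
print("reward absent:    ", run_trial(reward_present=False, learn=False))  # dip at reward time
```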
The theory that Dopamine = TD error is the most important interaction ever between AI and neuroscience
Goals for today’s lecture. To learn:
• That psychology recognizes two fundamental learning processes, analogous to our prediction and control
• That all the ideas in this course are also important in completely different fields: psychology and neuroscience
• That the details of the TD(λ) algorithm match key features of biological learning
What have you learned about in this course (without buzzwords)?
• “Decision-making over time to achieve a long-term goal”
– includes learning and planning
– makes plain why value functions are so important
– makes plain why so many fields care about these algorithms: AI, control theory, psychology and neuroscience, operations research, economics
– all of these involve decisions, goals, and time...
• the essence of... mind? intelligence? Intelligent systems.
Bidirectional Influences
[Figure: bidirectional arrows linking Psychology, Artificial Intelligence / Reinforcement Learning, Control Theory, and Neuroscience]