Clustering and generalization of abstract structures in reinforcement learning
Michael J. Frank
Laboratory for Neural Computation and Cognition, Brown University
Reinforcement learning in neural nets and AI (Mnih et al, 2015, Nature)
But nets fail to transfer learned knowledge: A3C (Asynchronous Advantage Actor-Critic) trained on Breakout, tested on Offset Paddle Breakout (Kansky et al, 2017; see also Witty et al 2018, arXiv)
What can we learn from limitations of models and humans? Trade-offs
• Limited WM capacity (curse of dimensionality)
• Multi-tasking (shared representations enhance learning & generalization)
• Robustness to task contingencies (OpAL vs RL)
• (Catastrophic) interference in episodic memory
• Unsupervised (Hebbian) vs supervised
• Motor learning – hierarchical structure
• In defense of “small” problems:
  - need to understand key elements
  - link to neural data / experiments
  - Theory of Everything before Theory of Anything!
Why does motor learning develop so slowly in humans?
• Standard story: infants are born early because of a large head and a small birth canal (the ‘fourth trimester’)
• But 3-month-old infants are still pretty incompetent (from babycenter.com): ‘you no longer need to support his head. When he’s on his stomach he can lift his head and chest. He can open and close his hands…’
• Hypothesis: the human brain is wired to discover latent generalizable structure, which is initially inefficient (see Werchan et al 2016!)
Humans learn contextualized rule structures [images: driving rules in the UK; driving rules in Montreal; and…]
A key structure: task-sets (TS) [diagram: a context cue (Cue 1) selects a mapping from stimuli to actions]
Task-sets (TS) [diagram: context C1 maps S1 → A1, S2 → A2, S3 → A3]
Task-sets (TS) [diagram: each of contexts C1 to C6 has its own stimulus-action mapping, S_ik → A_ik for context C_k]
Abstracting task-set rules [diagram: contexts C1 to C6 linked to a latent task-set space containing TS1 (S_i1 → A_i1) and TS2 (S_i2 → A_i2)]
Collins & Frank 2013
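Popularity prior on task-set rules [diagram: contexts C1 to C6 clustered onto TS1 (S_i1 → A_i1) and TS2 (S_i2 → A_i2); a new context C7]
CRP prior on the TS for a new context C_new (here C7), where N(TS_i | C*) is the number of previously seen contexts C* assigned to TS_i:
$$P_0(\mathrm{TS} = \mathrm{TS}_j \mid C_{\text{new}}) = \frac{N(\mathrm{TS}_j \mid C^*)}{\alpha + \sum_i N(\mathrm{TS}_i \mid C^*)}$$
$$P_0(\mathrm{TS} = \text{new} \mid C_{\text{new}}) = \frac{\alpha}{\alpha + \sum_i N(\mathrm{TS}_i \mid C^*)}$$
Collins & Frank 2013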
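A minimal Python sketch of this popularity (CRP) prior, assuming previously seen contexts are stored as a dictionary from context to TS label. The function name `crp_prior` and the data format are hypothetical illustrations, not code from Collins & Frank 2013. It returns one probability per existing TS plus one for creating a new TS.

```python
from collections import Counter


def crp_prior(assignments, alpha):
    """CRP ("popularity") prior over task-sets for a new context.

    assignments: hypothetical dict mapping each previously seen context C*
                 to the label of the task-set (TS) it was clustered onto.
    alpha: concentration parameter (> 0).

    P0(TS_j | C_new) = N(TS_j | C*) / (alpha + sum_i N(TS_i | C*))
    P0(new  | C_new) = alpha        / (alpha + sum_i N(TS_i | C*))
    """
    counts = Counter(assignments.values())  # N(TS_i | C*): contexts per TS
    denom = alpha + sum(counts.values())
    prior = {ts: n / denom for ts, n in counts.items()}
    prior["new"] = alpha / denom            # probability of creating a new TS
    return prior


# Hypothetical example: two contexts on TS1, one on TS2, alpha = 1.
example = {"C1": "TS1", "C2": "TS1", "C3": "TS2"}
print(crp_prior(example, alpha=1.0))  # {'TS1': 0.5, 'TS2': 0.25, 'new': 0.25}
```

Raising α shifts probability toward creating a new TS for each context (partitioning); α near 0 favors reusing the most popular task-sets (clustering).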
Ability to create new task-set rules [diagram: latent task-set space of unknown size; contexts C1 to C6 on TS1 (S_i1 → A_i1) and TS2 (S_i2 → A_i2); a new context C7 can create TS_new (S_i → A_i)]
Collins & Frank 2013
Linking the algorithmic model and the neural network model [figure: C-TS model alongside the neural network model, with BG loops selecting TS_i and A_i, modulated by DA]
Both models are approximations of the same process: building the TS space.
Collins & Frank, Psych Review, 2013
Clustering vs partitioning task space in frontostriatal circuits via RL [figure: old TS vs new TS; generalization & transfer via RL; fitted clustering prior vs C-PFC sparseness]
Model mimicry: the C-TS prior and the hierarchical neural net are approximations of the same structure-building process.
Collins & Frank 2013; 2016; Frank & Badre, 2012
Vector reward prediction errors: “actor-specific” computations [figure: “Mixture of Experts” in a hierarchical task vs a flat task]
• DA signals are tailored to the computations of the underlying FC-BG circuit
  - “Mixture of Experts” (Frank & Badre 2012; fMRI: Badre & Frank 2012; Collins & Frank 2013 …)
  - Vector RPEs
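To make the contrast between a scalar RPE and a vector ("actor-specific") RPE concrete, here is a toy mixture-of-experts update in Python with NumPy. It is a generic sketch under stated assumptions, not the Frank & Badre (2012) model: each expert keeps its own reward prediction and learns from its own PE, while gating weights shift toward experts whose predictions matched the outcome. The function name `moe_step`, the Gaussian-shaped fit term, and the parameters `lr` and `beta` are illustrative choices.

```python
import numpy as np


def moe_step(values, weights, reward, lr=0.1, beta=5.0):
    """One trial of a toy mixture-of-experts update with a vector RPE.

    values:  (K,) each expert's predicted reward for the chosen action
    weights: (K,) gating weights over experts, summing to 1
    Returns (new_values, new_weights, scalar_rpe, vector_rpe).
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)

    scalar_rpe = reward - weights @ values   # a single PE for the gated mixture
    vector_rpe = reward - values             # one PE per expert ("actor-specific")

    new_values = values + lr * vector_rpe    # each expert learns from its own PE

    # Gating: reweight experts by how well each one predicted this outcome.
    fit = np.exp(-beta * vector_rpe ** 2)
    new_weights = weights * fit
    new_weights /= new_weights.sum()
    return new_values, new_weights, scalar_rpe, vector_rpe


# Hypothetical trial: expert 0 predicted the reward well, expert 1 did not.
v, w, d, dv = moe_step([0.9, 0.1], [0.5, 0.5], reward=1.0)
print(dv, w)  # expert 0 now carries more gating weight
```

A scalar RPE (here 0.5) would send the same teaching signal to both experts; the vector RPE lets each expert be corrected by how wrong it was.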
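Appending to latent task structures: extrapolating beyond the identity mapping
Design (rows: contexts; cells: correct action for each stimulus):
        S1  S2 | S3  S4
  C0    A1  A2 | A1  A4
  C1    A1  A2 | A1  A4
  C2    A3  A4 | A3  A2
  C3    (not yet presented)
  C4    (not yet presented)
Initial phase: S1, S2; transfer phase 1: S3, S4.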
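A small Python sketch of why clustering pays off in this design, assuming task-sets are stored as shared dictionaries that contexts point to. The names `context_to_ts`, `task_sets`, `learn`, and `policy` are hypothetical. When S3 → A1 is learned in C0 during transfer phase 1, it is appended to TS1, so C1, which shares TS1, gets it without further learning; C2 (on TS2) does not.

```python
# Hypothetical sketch: contexts point to shared task-sets (clustered structure).
context_to_ts = {"C0": "TS1", "C1": "TS1", "C2": "TS2"}
task_sets = {
    "TS1": {"S1": "A1", "S2": "A2"},   # initial-phase mapping for C0 and C1
    "TS2": {"S1": "A3", "S2": "A4"},   # initial-phase mapping for C2
}


def learn(context, stimulus, action):
    """Append a newly learned stimulus-action pair to the context's task-set."""
    task_sets[context_to_ts[context]][stimulus] = action


def policy(context, stimulus):
    """Return the action for this stimulus, or None if not yet learned."""
    return task_sets[context_to_ts[context]].get(stimulus)


learn("C0", "S3", "A1")        # learned only in C0 (transfer phase 1)
print(policy("C1", "S3"))      # A1: C1 shares TS1, so the new rule transfers
print(policy("C2", "S3"))      # None: C2 uses TS2, which has no S3 entry yet
```

A flat learner that kept a separate table per context would have to learn S3 → A1 again in C1.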
[Figure: contexts C0, C1, C2 presented with frequencies 1/4, 1/4, 1/2; C0 and C1 share TS1, C2 uses TS2; new stimuli (?) map onto actions A1 to A4. Proportion correct vs trial # per input pattern in the initial phase and phase 2, for subjects (N=34) and model: C0, C1 vs C2 (*)]
Can subjects generalize learned rules to new contexts?
        S1  S2 | S3  S4
  C0    TS1    | TS1
  C1    TS1    | TS1
  C2    TS2    | TS2
  C3    TS old (transfer phase 2)
  C4    TS new (transfer phase 2)
Initial phase: S1, S2; transfer phase 1: S3, S4; transfer phase 2: new contexts C3 and C4.
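As a worked example, assume new context C3 receives a Chinese restaurant process (popularity) prior in which each existing TS is weighted by the number of contexts already assigned to it and a new TS by the concentration parameter α. With C0 and C1 on TS1 and C2 on TS2 after the initial phase (α is left free; α = 1 is only an illustrative value):

```latex
\begin{aligned}
P_0(\mathrm{TS}_1 \mid C_3) &= \frac{2}{\alpha + 3}, \qquad
P_0(\mathrm{TS}_2 \mid C_3) = \frac{1}{\alpha + 3}, \qquad
P_0(\text{new} \mid C_3) = \frac{\alpha}{\alpha + 3} \\
\alpha = 1:\quad &\tfrac{1}{2}, \qquad \tfrac{1}{4}, \qquad \tfrac{1}{4}
\end{aligned}
```

Because this prior counts contexts rather than trials, TS1 is favored for a new context even if TS1 and TS2 were experienced on equally many trials.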
[Figure: new contexts C3 and C4 alongside C0 to C2, with TS1, TS2, and a new TS3 (actions A1 to A4). Proportion correct vs trial # per input pattern for subjects (N = 34) and model: C3 (TS old) vs C4 (TS new) (*)]
Prediction error: PE = reward - expectation
[Diagram: contexts C0, C1, C2 with TS1 and TS2; new stimuli (?) mapped to actions A1 to A4; the standard PE on feedback vs the structure-learning PE]
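The PE definition can be illustrated with a standard delta-rule (Rescorla-Wagner style) learner in Python. The slide does not specify the learning rule, so the update, the learning rate `lr`, and the function name `delta_rule` are assumptions for illustration.

```python
def delta_rule(rewards, lr=0.2, expectation=0.0):
    """Update an expectation trial by trial and record each PE.

    PE = reward - expectation; the expectation then moves a fraction lr
    of the way toward the observed reward.
    """
    pes, expectations = [], []
    for reward in rewards:
        pe = reward - expectation      # prediction error on this trial
        expectation += lr * pe         # move the expectation toward the outcome
        pes.append(pe)
        expectations.append(expectation)
    return pes, expectations


pes, exps = delta_rule([1, 1, 0])
print(pes)  # large positive PEs early, a negative PE when reward is omitted
```

With `lr = 0.2` the PEs are 1.0, 0.8, and about -0.36: the PE shrinks as the reward becomes expected and turns negative when an expected reward is withheld.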
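Prediction error (PE) in EEG signal
For each subject, the EEG at each electrode and time point (relative to feedback) is regressed on trial-wise regressors,
$$\mathrm{EEG}(\text{trial}) \sim \beta_0 + \beta_{PE}\,\mathrm{PE}(\text{trial}) + \beta_{Str}\,\mathrm{StructurePE}(\text{trial})$$
giving maps β_PE(electrodes, time) and β_Str(electrodes, time).
Collins & Frank (2016), Cognition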
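A minimal per-subject sketch of this regression in Python with NumPy, fitting all electrode-time points at once. Ordinary least squares, the array layout, and the function name `eeg_regression` are assumptions; the slide does not specify the estimation method or any preprocessing (filtering, baseline correction, regressor scaling), which is omitted here.

```python
import numpy as np


def eeg_regression(eeg, pe, struct_pe):
    """Fit EEG(trial) ~ b0 + b_PE*PE(trial) + b_Str*StructurePE(trial)
    separately at every electrode and time point for one subject.

    eeg: array (n_trials, n_electrodes, n_times)
    pe, struct_pe: arrays (n_trials,) of trial-wise regressors
    Returns betas of shape (3, n_electrodes, n_times): [b0, b_PE, b_Str].
    """
    n_trials, n_elec, n_times = eeg.shape
    X = np.column_stack([np.ones(n_trials), pe, struct_pe])  # design matrix
    Y = eeg.reshape(n_trials, -1)              # one column per (electrode, time)
    betas, *_ = np.linalg.lstsq(X, Y, rcond=None)  # OLS for all columns at once
    return betas.reshape(3, n_elec, n_times)


# Hypothetical usage with synthetic data (no real EEG):
rng = np.random.default_rng(0)
pe = rng.normal(size=100)
struct_pe = rng.normal(size=100)
eeg = rng.normal(size=(100, 64, 200))
betas = eeg_regression(eeg, pe, struct_pe)
print(betas.shape)  # (3, 64, 200)
```

`betas[1]` and `betas[2]` correspond to the β_PE(electrodes, time) and β_Str(electrodes, time) maps; including both regressors in one model is what makes β_Str a unique effect of the structure PE over and above the standard PE.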
Prediction error (PE) and structure PE in EEG signal
$$\mathrm{EEG}(\text{trial}) \sim \beta_0 + \beta_{PE}\,\mathrm{PE}(\text{trial}) + \beta_{Str}\,\mathrm{StructurePE}(\text{trial})$$
[Figure: PE effect (average β_PE) and unique effect of structure-learning PE vs time from feedback (ms) in ROI1 and ROI2; significance markers ** / * / ns]
Collins & Frank (2016) Cognition
Structure PE signal predicts transfer.
[Figure: unique effect of structure-learning PE (ROI1+2) related to behavior; P(Correct) vs iteration # for new contexts C3 (TS old) and C4 (TS new); new-context prior; % choices of TS1, TS2, and other actions]
Collins & Frank, Cognition, accepted
Structure learning:
• It affords transfer [figure: proportion correct vs trial # per input pattern, C0, C1 vs C2 (*)]
• It depends on clustering priors
• It informs neural representations of reward predictions [figure: unique effect of structure-learning PE (** / * / ns)]
Neural model & EEG: TS switch effects
• No early clustering benefit: early structure learning is costly
• Structure learning affords transfer of new information within learned clusters
• Structure learning affords transfer of known rules to new contexts, with a popularity clustering prior
• Neural signatures of hierarchical prediction errors predict structure learning/transfer: Badre & Frank 2012; Collins et al 2014, 2016
[Diagram: transfer design with contexts C0 to C4 and stimuli S1 to S4; C0, C1 → TS1; C2 → TS2; C3 → TS old; C4 → TS new; initial phase, transfer phase 1, transfer phase 2]
Do we build structure a priori? [figure: new context, old TS vs new TS (N = 33; *)]
Significant whole-group positive transfer
Werchan et al, 2016, JNeurosci
Share: physical movements (mappings from sounds to notes)
Share: chord progression, rhythm, etc. (desired sound/song)
Need compositionality: reuse flute mappings to play a song usually played on guitar [image: piccolo]
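A toy Python sketch of the compositional idea, assuming a song (a sequence of desired notes) and an instrument's note-to-movement mapping are learned as separate structures. All names and labels below are hypothetical placeholders. Reusing the flute mapping for a guitar song is then just composing the two pieces.

```python
# Hypothetical sketch: a song (sequence of desired notes) and an instrument
# mapping (note -> physical movement) are stored separately, so any song can
# be composed with any instrument mapping, including a pairing never practiced.
song = ["C", "E", "G", "E"]           # toy melody "usually played on guitar"

guitar = {"C": "guitar:C", "E": "guitar:E", "G": "guitar:G"}
flute = {"C": "flute:C", "E": "flute:E", "G": "flute:G"}


def play(song, instrument):
    """Compose a song with an instrument: map each note to a movement."""
    return [instrument[note] for note in song]


print(play(song, flute))  # ['flute:C', 'flute:E', 'flute:G', 'flute:E']
```

A learner that only clustered whole (song, instrument) pairings would have no stored policy for this new combination.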