Michael J. Frank Laboratory for Neural Computation and Cognition - PowerPoint PPT Presentation

Clustering and generalization of abstract structures in reinforcement learning Michael J. Frank Laboratory for Neural Computation and Cognition Brown University Reinforcement learning in neural nets and AI Mnih et al, 2015 , Nature But nets


  1. Clustering and generalization of abstract structures in reinforcement learning 
 Michael J. Frank Laboratory for Neural Computation and Cognition Brown University

  2. Reinforcement learning in neural nets and AI Mnih et al, 2015, Nature

  3. But nets show failure to transfer learned knowledge Breakout trained on Offset Paddle Breakout Asynchronous Advantage Actor-Critic (A3C) Kansky et al, 2017 See also Witty et al 2018 arXiv

  4. What can we learn from limitations of models and humans? Trade-offs • Limited WM capacity (curse of dimensionality) • Multi-tasking (shared representations enhance learning & generalization) • Robustness to task contingencies (OpAL vs RL) • (catastrophic) interference in episodic memory • Unsupervised (Hebbian) vs supervised • Motor learning – hierarchical structure • In defense of “small” problems: - need to understand key elements - link to neural data / experiments - Theory of Everything before Theory of Anything!

  5. Why does motor learning develop so slowly in humans? • Standard story: infants born early due to large head, small birth canal • ‘Fourth trimester’

  6. Why does motor learning develop so slowly in humans? • Standard story: infants born early due to large head, small birth canal • ‘Fourth trimester’ • But 3 month old infants are still pretty incompetent (from babycenter.com):

  7. Why does motor learning develop so slowly in humans? • Standard story: infants born early due to large head, small birth canal • ‘Fourth trimester’ • But 3 month old infants are still pretty incompetent (from babycenter.com): ‘you no longer need to support his head. When he’s on his stomach he can lift his head and chest. He can open and close his hands..’

  8. Why does motor learning develop so slowly in humans? • Standard story: infants born early due to large head, small birth canal • ‘Fourth trimester’ • But 3 month old infants are still pretty incompetent (from babycenter.com): ‘you no longer need to support his head. When he’s on his stomach he can lift his head and chest. He can open and close his hands..’ • Hypothesis: human brain is wired to discover latent generalizable structure, which is initially inefficient – see Werchan et al 2016!

  9. Why does motor learning develop so slowly in humans? • Standard story: infants born early due to large head, small birth canal • ‘Fourth trimester’ • But 3 month old infants are still pretty incompetent (from babycenter.com): ‘you no longer need to support his head. When he’s on his stomach he can lift his head and chest. He can open and close his hands..’ • Hypothesis: human brain is wired to discover latent generalizable structure, which is initially inefficient

  10. Why does motor learning develop so slowly in humans? • Standard story: infants born early due to large head, small birth canal • ‘Fourth trimester’ • But 3 month old infants are still pretty incompetent (from babycenter.com): ‘you no longer need to support his head. When he’s on his stomach he can lift his head and chest. He can open and close his hands..’ • Hypothesis: human brain is wired to discover latent generalizable structure, which is initially inefficient

  11. Humans learn contextualized 
 rule structures Driving rules Driving rules UK Montreal and…

  12. A key structure: Task-sets (TS) Cue 1 stimuli actions

  13. Task-sets (TS) C 1 S1 A1 S2 A2 S3 A3

  14. Task-sets (TS) C 1 S i1 A i1 C 2 S i2 A i2 C 3 S i3 A i3 C 4 S i4 A i4 C 5 S i5 A i5 C 6 S i6 A i6

  15. Abstracting Task-set rules Latent task-set space C 1 C 2 S i1 A i1 TS 1 C 3 C 4 C 5 S i2 A i2 TS 2 C 6 Collins & Frank 2013

  16. Popularity Prior on Task-set rules C 1 C 2 S i1 A i1 TS 1 C 3 C 4 C 5 S i2 A i2 TS 2 C 6 C 7 ? CRP Prior on TS in a new context: $P_0(\mathrm{TS} = \mathrm{TS}_j \mid C_{new}) = N(\mathrm{TS}_j \mid C^*) / [\alpha + \sum_i N(\mathrm{TS}_i \mid C^*)]$ and $P_0(\mathrm{TS} = \mathrm{new} \mid C_{new}) = \alpha / [\alpha + \sum_i N(\mathrm{TS}_i \mid C^*)]$ Collins & Frank 2013
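
The CRP prior on this slide gives each existing task-set a probability proportional to the number of contexts already assigned to it, and gives a brand-new task-set a probability proportional to α. A minimal Python sketch of that calculation follows. The function name `crp_prior`, the dictionary layout, and the example counts are assumptions made for illustration; they are not taken from Collins & Frank 2013.

```python
def crp_prior(counts, alpha):
    """Prior over task-sets for a new context under a Chinese Restaurant Process.

    counts: dict mapping a task-set label to N(TS_i | C*), the number of
            contexts already clustered on that task-set (hypothetical layout).
    alpha:  concentration parameter; larger values favor creating a new task-set.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Shared denominator: alpha + sum_i N(TS_i | C*)
    total = alpha + sum(counts.values())
    # Existing task-sets: N(TS_j | C*) / total
    prior = {ts: n / total for ts, n in counts.items()}
    # New task-set: alpha / total
    prior["new"] = alpha / total
    return prior


# Illustrative counts only: TS1 is used by 2 contexts, TS2 by 3.
prior = crp_prior({"TS1": 2, "TS2": 3}, alpha=1.0)
```

With these made-up counts the more popular task-set (TS2) gets the largest prior, which is the "popularity" effect named in the slide title, while the `"new"` entry keeps the task-set space open-ended.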

  17. Ability to create new Task-set rules Latent task-set space: Unknown size C 1 C 2 S i1 A i1 TS 1 C 3 C 4 C 5 S i2 A i2 TS 2 C 6 C 7 S i A i TS new Collins & Frank 2013

  18. Linking algorithmic model and neural network model CTS-model Neural Network-model Ai TSi BG BG DA Both models are approximations of the same process: TS space building Collins & Frank, Psych Review, 2013

  19. Clustering vs partitioning task space in frontostriatal circuits via RL Old TS New TS generalization & transfer RL Collins & Frank 2013; 2016; Frank & Badre, 2012

  20. Clustering vs partitioning task space in frontostriatal circuits via RL Old TS New TS generalization & transfer RL Fitted clustering Model mimicry: prior C-TS and hierarchical neural net are approximations of same structure building process C-PFC sparseness Collins & Frank 2013; 2016; Frank & Badre, 2012

  21. Vector reward prediction errors: 
 “actor-specific” computations “Mixture of Experts” Hierarchical task MIXTURE Flat task • DA signals are tailored to computations of underlying FC-BG circuit - “Mixture of Experts” ( Frank & Badre 2012; fMRI: Badre & Frank 2012; Collins & Frank 2013 … ) - Vector RPEs

  22. Appending to latent task structures: beyond the identity mapping.. S1 S2 C0 A1 A2 Initial C1 A1 A2 Phase C2 A3 A4 C3 C4

  23. Appending to latent task structures: extrapolating beyond the identity mapping S1 S2 S3 S4 C0 A1 A2 A1 A4 Transfer Initial C1 A1 A2 A1 A4 Phase 1 Phase C2 A3 A4 A3 A2 C3 C4

  24. [Figure: contexts C0, C1, C2 (1/4, 1/4, 1/2) mapped to TS1 and TS2 over actions A1 to A4. Panels: Initial phase and Phase 2, Subjects (N=34) and Model. Axes: Proportion Correct vs. Trial# per input pattern. Curves: C0, C1 vs. C2.]

  25. [Figure: Model (H init) and Subjects (N=34), Initial phase and Phase 2. Axes: Proportion Correct vs. Trial# per input pattern. Curves: C0, C1 vs. C2.]

  26. Can subjects generalize learned rules to new contexts? C3 C0 C1 C2 C3 C4 TS3 TS1 TS2

  27. Can subjects generalize learned rules to new contexts? S1 S2 S3 S4 C0 TS1 TS1 Transfer Initial C1 TS1 TS1 Phase 1 Phase C2 TS2 TS2 C3 TS old Transfer C4 TS new Phase 2

  28. [Figure: contexts C0 to C4 mapped to TS1, TS2, TS3 over actions A1 to A4. Panels: Subjects (N = 34) and Model. Axes: Proportion Correct vs. Trial# per input pattern. Curves: C3: TS old vs. C4: TS new.]

  29. C0 C1 C2 Prediction error: TS1 TS2 PE = reward - expectation ? ? A 1 A 2 A 3 A 4 Correct Correct PE Correct Correct PE Structure learning PE
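
The slide defines the prediction error as PE = reward - expectation. A minimal sketch of how such an error could drive a simple delta-rule update is given below. The update rule, the `learning_rate` parameter, and the starting expectation are assumptions for illustration; the slide states only the PE definition itself.

```python
def prediction_error(reward, expectation):
    """PE = reward - expectation, as defined on the slide."""
    return reward - expectation


def update_expectation(expectation, reward, learning_rate=0.5):
    """One delta-rule step (assumed here, not stated on the slide).

    Returns the updated expectation and the prediction error that produced it.
    """
    pe = prediction_error(reward, expectation)
    return expectation + learning_rate * pe, pe


# Illustrative values: chance-level expectation with 4 actions, then a reward of 1.
new_expectation, pe = update_expectation(expectation=0.25, reward=1.0)
```

A positive PE (the outcome was better than expected) raises the expectation; a negative PE lowers it.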

  30. Prediction error (PE) in EEG signal β PE (electrodes, time) For each subject: β Str (electrodes, time) Time from FB trial number ~ β 0 + β PE + β Str Collins & Frank (2016), Cognition

  31. Prediction error (PE) in EEG signal 
 Structure PE in EEG signal EEG(trial) ~ β 0 + β PE PE(trial) + β Str StructurePE(trial) PE effect average β PE ROI1 Time from feedback (ms) ROI2 ** * ns ** * Structure learning PE Unique effect of Collins & Frank (2016) Cognition ROI1 ROI2
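
The regression on this slide, EEG(trial) ~ β0 + βPE PE(trial) + βStr StructurePE(trial), can be read as an ordinary least-squares fit run separately for each subject, electrode, and time point. The sketch below fits one such regression in plain Python. The function names, the normal-equations solver, and the synthetic data are all assumptions for illustration; the actual analysis pipeline in Collins & Frank (2016) is not described on the slide.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_trial_regression(eeg, pe, structure_pe):
    """Least-squares [b0, b_pe, b_str] for one electrode and time point."""
    rows = [[1.0, p, s] for p, s in zip(pe, structure_pe)]
    k = 3
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, eeg)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free trials (made up; not data from the study).
pe = [0.75, -0.25, 0.5, -0.5, 0.25, 0.0]
structure_pe = [0.1, 0.6, -0.2, 0.3, -0.4, 0.5]
eeg = [1.0 + 2.0 * p - 3.0 * s for p, s in zip(pe, structure_pe)]
betas = fit_trial_regression(eeg, pe, structure_pe)
```

Because both regressors enter the same model, βStr reflects only the variance that the structure-learning PE explains beyond the standard PE, which is one way to read the "unique effect" label on the slide.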

  32. Structure PE signal predicts transfer. [Figure: P(Correct) vs. Iteration # for C3-TS old and C4-TS new; New context Prior, % Choices of TS1, TS2, and other actions; unique effect of Structure learning PE, ROI1+2.] Collins & Frank, Cognition, accepted

  33. Structure learning • It affords transfer • It depends on clustering priors • It informs neural representations of reward predictions [Figures: Proportion Correct vs. Trial# per input pattern for C0, C1 vs. C2; PE effect and unique effect of Structure learning PE.]

  34. Neural model & EEG: TS switch effects

  35. Neural model & EEG: TS switch effects

  36. • No early clustering benefit - early structure learning is costly • Structure learning affords transfer of new information within learned clusters • Neural signatures of hierarchical prediction errors predict structure learning/transfer: Badre & Frank 2012; Collins et al 2014, 2016 • Structure learning affords transfer of known rules to new contexts, with popularity clustering prior [Diagram: S1 S2 S3 S4; Initial Phase: C0 TS1 TS1, C1 TS1 TS1, C2 TS2 TS2; Transfer Phase 1; Transfer Phase 2: C3 TS old, C4 TS new]

  37. Do we build structure a priori? N = 33 New Context: * Old TS New TS Significant whole group positive transfer Werchan et al, 2016, JNeurosci

  38. Share: Physical Movements (mappings from sounds to notes) Share: Chord progression, rhythm, etc (desired sound/song)

  39. Need compositionality: reuse flute mappings to play a song usually played on guitar Piccolo
