  1. Complementary Learning Systems in Natural and Artificial Intelligence James L. McClelland Department of Psychology & Center for Mind, Brain and Computation Stanford University

  2. Tom’s questions for me • What sort of NN architectures could serve an automated programmer in constructing a program? • How do you imagine different memory systems working in a human programmer?

  3. Outline for the session • Complementary learning systems – The basic theory – Rapid schema consistent learning – Comparison of the two learning systems • Deep learning and complementary learning systems – Rehearsal buffer in the DQN – Memory based parameter adaptation • Revisiting Tom’s prompt and a response

  4. Your knowledge is in your connections! • An experience is a pattern of activation over neurons in one or more brain regions. • The trace left in memory is the set of adjustments to the strengths of the connections. – Each experience leaves such a trace, but the traces are not separable or distinct. – Rather, they are superimposed in the same set of connection weights. • Recall involves the recreation of a pattern of activation, using a part or associate of it as a cue. • The reinstatement depends on the knowledge in the connection weights, which in general will reflect influences of many different experiences. • Thus, memory is always a constructive process, dependent on contributions from many different experiences.
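
The superposition idea on this slide can be made concrete with a tiny autoassociative network in the Hopfield style. This is a minimal illustrative sketch, not a model from the talk: several patterns are stored as Hebbian adjustments to one shared weight matrix, and cued recall reinstates a whole pattern from a corrupted part of it.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100                                        # neurons
    patterns = rng.choice([-1, 1], size=(3, n))    # three "experiences"

    # Each experience adjusts the SAME weight matrix: traces superimpose.
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p) / n
    np.fill_diagonal(W, 0)

    # Recall: cue with a degraded copy of pattern 0 and let the net settle.
    cue = patterns[0].copy()
    cue[: n // 2] = rng.choice([-1, 1], size=n // 2)   # corrupt half of it
    state = cue
    for _ in range(10):
        state = np.sign(W @ state)
        state[state == 0] = 1

    print("overlap with stored pattern:", (state @ patterns[0]) / n)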

  5. Effects of Hippocampal Lesions • Intact performance on tests of intelligence, general knowledge, language, and other acquired skills • Dramatic deficits in formation of some types of new memories: – Explicit memories for episodes and events – Paired-associate learning – Arbitrary new factual information • Spared priming and skill acquisition • Temporally graded retrograde amnesia: the lesion impairs recent memories while leaving remote memories intact (Note: H.M.’s lesion was bilateral)

  6. Key Points • We learn about the general pattern of experiences, not just specific things • Gradual learning in the cortex builds implicit semantic and procedural knowledge that forms much of the basis of our cognitive abilities • The Hippocampal system complements the cortex by allowing us to learn specific things without interference with existing structured knowledge • In general these systems must be thought of as working together rather than being alternative sources of information.

  7. Effect of Prior Association on Paired-Associate Learning in Control and Amnesic Populations [Figure: data from Cutting (1978), Expt. 1: percent correct for Control and Amnesic groups, with base rates marked, across categories of ease of association from Very Easy to Very Hard]

  8. Kwok & McClelland Model of Semantic and Episodic Memory • Model includes a slow-learning cortical system and a fast-learning hippocampal system (see the sketch after this slide) • Cortex contains units representing both the content and the context of an experience • Semantic memory is gradually built up through repeated presentations of the same content in different contexts • Formation of a new episodic memory depends on the hippocampus and the relevant cortical areas, including context – Loss of the hippocampus would prevent the initial rapid binding of content and context • Episodic memories benefit from prior cortical learning when they involve meaningful materials [Diagram: Hippocampus sitting above a neocortical network with Relation, Cue, Context, and Target pools]
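
One way to see why the fast/slow division matters is to compare two linear associators that differ only in learning rate. This sketch uses assumptions of my own (random vectors, a normalized delta-rule update); it is not the KM model itself. The fast, hippocampus-like learner binds a cue to a target in a single trial, while the slow, cortex-like learner needs many repetitions.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 50
    cue = rng.standard_normal(d)
    target = rng.standard_normal(d)

    def residual_after(lr, n_presentations):
        # One-layer linear associator trained with a normalized delta rule.
        W = np.zeros((d, d))
        for _ in range(n_presentations):
            err = target - W @ cue
            W += lr * np.outer(err, cue) / (cue @ cue)
        return np.linalg.norm(target - W @ cue)

    print("fast learner, 1 trial   :", residual_after(lr=1.0, n_presentations=1))
    print("slow learner, 1 trial   :", residual_after(lr=0.05, n_presentations=1))
    print("slow learner, 100 trials:", residual_after(lr=0.05, n_presentations=100))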

  9. Simulation Results from the KM Model [Figure: model results plotted against Cutting (1978), Expt. 1: percent correct for Control (Model), Amnesic (Model), Control (Expt), and Amnesic (Expt) across categories of ease of association; base rates are 0 in the model]

  10. Emergence of Meaning in Learned Distributed Representations through Gradual Interleaved Learning • Distributed representations (what ML calls embeddings) that capture aspects of meaning emerge through a gradual learning process • The progression of learning and the representations formed capture many aspects of cognitive development: – Progressive differentiation – Sensitivity to coherent covariation across contexts – Reorganization of conceptual knowledge

  11. The Rumelhart Model

  12. The Training Data: All propositions true of items at the bottom level of the tree, e.g.: Robin can {grow, move, fly}
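
For concreteness, here is one way such propositions could be encoded as (item, relation) -> attributes training examples for a feedforward network. The vocabulary below is a small illustrative subset of my own, not the full Rumelhart training set.

    import numpy as np

    items = ["robin", "canary", "salmon", "oak"]
    relations = ["isa", "can", "has"]
    attributes = ["living thing", "animal", "bird", "tree",
                  "grow", "move", "fly", "swim", "wings", "gills"]

    # Each proposition lists the attributes true of an item under a relation.
    propositions = {
        ("robin", "isa"): ["living thing", "animal", "bird"],
        ("robin", "can"): ["grow", "move", "fly"],
        ("robin", "has"): ["wings"],
        ("canary", "can"): ["grow", "move", "fly"],
        ("salmon", "can"): ["grow", "move", "swim"],
        ("salmon", "has"): ["gills"],
        ("oak", "isa"): ["living thing", "tree"],
    }

    def one_hot(name, vocab):
        v = np.zeros(len(vocab))
        v[vocab.index(name)] = 1.0
        return v

    # Training example: (item input, relation input) -> attribute targets.
    examples = [
        (one_hot(i, items), one_hot(r, relations),
         sum(one_hot(a, attributes) for a in attrs))
        for (i, r), attrs in propositions.items()
    ]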

  13. [Figure: the model’s learned representations at three points in experience: Early, Later, and Later Still]

  14. What happens in this system if we try to learn something new, such as a penguin?

  15. Learning Something New • Used network already trained with eight items and their properties. • Added one new input unit fully connected to the representation layer • Trained the network with the following pairs of items: – penguin-isa living thing-animal-bird – penguin-can grow-move-swim
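
In implementation terms, "adding one new input unit fully connected to the representation layer" just appends a column of small random weights to the trained input-to-representation matrix. A minimal sketch with illustrative sizes:

    import numpy as np

    rng = np.random.default_rng(2)
    n_items, n_rep = 8, 16

    # Stand-in for the item-to-representation weights of the trained network.
    W_item_rep = rng.standard_normal((n_rep, n_items)) * 0.5

    # One new input unit (the penguin), fully connected, small random weights.
    new_column = rng.standard_normal((n_rep, 1)) * 0.01
    W_item_rep = np.hstack([W_item_rep, new_column])

    print(W_item_rep.shape)   # (16, 9): eight old items plus the new one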

  16. Rapid Learning Leads to Catastrophic Interference

  17. A Complementary Learning System in the Medial Temporal Lobes [Diagram: a neocortical network with pools for name, action, motion, color, valence, and form converging on the temporal pole, alongside the Medial Temporal Lobe system]

  18. Avoiding Catastrophic Interference with Interleaved Learning
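
The contrast between slides 16 and 18 can be reproduced with a linear network and the delta (LMS) rule. The data and parameters here are my own illustration: training on the new item alone wrecks performance on the old corpus, while interleaving the new item with replayed old items leaves it largely intact.

    import numpy as np

    rng = np.random.default_rng(3)
    d_in, d_out, n_old = 20, 20, 8
    X_old = rng.standard_normal((n_old, d_in))     # the old corpus
    Y_old = rng.standard_normal((n_old, d_out))
    x_new = rng.standard_normal(d_in)              # the new item
    y_new = rng.standard_normal(d_out)

    def lms_step(W, x, y, lr=0.5):
        # Normalized delta-rule update toward target y for input x.
        return W + lr * np.outer(y - W @ x, x) / (x @ x)

    def old_item_error(W):
        return float(np.mean((Y_old - X_old @ W.T) ** 2))

    # Start from weights that already fit the old corpus.
    W0 = np.linalg.lstsq(X_old, Y_old, rcond=None)[0].T

    # Focused: rapid learning of the new item only.
    W = W0.copy()
    for _ in range(200):
        W = lms_step(W, x_new, y_new)
    print("focused     -> old-item error:", old_item_error(W))

    # Interleaved: the new item mixed with replayed old items.
    W = W0.copy()
    for _ in range(200):
        i = rng.integers(n_old)
        W = lms_step(W, X_old[i], Y_old[i])
        W = lms_step(W, x_new, y_new)
    print("interleaved -> old-item error:", old_item_error(W))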

  19. Initial Storage in the Hippocampus Followed by Repeated Replay Leads to the Consolidation of New Learning in Neocortex, Avoiding Catastrophic Interference [Diagram: the same cortical network as in slide 17 (name, action, motion, color, valence, form, temporal pole) together with the Medial Temporal Lobe system]

  20. Rapid Consolidation of Schema-Consistent Information (Richard Morris)

  21. Tse et al. (Science, 2007, 2011): during training, 2 wells were uncovered on each trial

  22. Schemata and Schema-Consistent Information • What is a ‘schema’? – An organized structure into which existing knowledge is arranged • What is schema-consistent information? – Information that can be added to a schema without disturbing it • What about a penguin? – Partially consistent – Partially inconsistent • In contrast, consider – a trout – a cardinal

  23. New Simulations • Initial training with eight items and their properties, as before • Added one new input unit fully connected to the representation layer, also as before • Trained the network on one of the following pairs of items: – penguin-isa & penguin-can – trout-isa & trout-can – cardinal-isa & cardinal-can

  24. New Learning of Consistent and Partially Inconsistent Information [Figure: two panels, Learning and Interference]

  25. Connection Weight Changes after Simulated NPA, OPA, and NM Analogs (Tse et al., 2011)

  26. How Does It Work?

  27. How Does It Work?

  28. Comparison of the two learning systems

  29. Dense vs. Sparse Coding • Pattern separation: – Sparse random conjunctive coding (see the sketch after this slide)
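
A minimal sketch of pattern separation by sparse random conjunctive coding, with sizes and a k-winners-take-all rule chosen purely for illustration: two inputs that are nearly identical receive sparse codes that overlap far less than the inputs do.

    import numpy as np

    rng = np.random.default_rng(4)
    n_in, n_code, k = 200, 2000, 40     # expansion plus enforced sparsity

    P = rng.standard_normal((n_code, n_in))   # random conjunctive projection

    def sparse_code(x):
        # Project, then keep only the k most active units (k-winners-take-all).
        h = P @ x
        code = np.zeros_like(h)
        code[np.argsort(h)[-k:]] = 1.0
        return code

    # Two dense inputs that agree on 90% of their elements.
    a = rng.standard_normal(n_in)
    b = a.copy()
    idx = rng.choice(n_in, size=n_in // 10, replace=False)
    b[idx] = rng.standard_normal(len(idx))

    ca, cb = sparse_code(a), sparse_code(b)
    print("input similarity:", float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    print("code overlap    :", float(ca @ cb) / k)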

  30. Similarity-Based Representations in Cortex

  31. In more detail… • Input from neocortex comes into EC; EC projects to DG, CA3, and CA1 • Drastic pattern separation occurs in DG • Downsampling in CA3 assigns an arbitrary code • An invertible, somewhat sparsified representation arises in CA1 • Few-shot learning in DG, CA3, and CA3->CA1 allows reconstruction of the EC pattern from partial input (see the sketch after this slide) • Other connections shown in black are part of the slow-learning neocortical network • Recurrence operates within CA3, through the hippocampal circuit shown, and through the outer loop that also involves the rest of the neocortex
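
A toy version of the reconstruction step in the last bullets, under strong simplifying assumptions of my own: one stored pattern, a single random projection standing in for the EC-to-DG/CA3 pathway, and one Hebbian step standing in for the learning that maps the CA3 code back to EC via CA1.

    import numpy as np

    rng = np.random.default_rng(5)
    n_ec, n_ca3, k = 100, 1000, 25

    P = rng.standard_normal((n_ca3, n_ec))    # EC -> CA3 (via DG), fixed here

    def ca3_code(ec):
        h = P @ ec
        c = np.zeros(n_ca3)
        c[np.argsort(h)[-k:]] = 1.0           # sparse conjunctive code
        return c

    # One-shot Hebbian binding of the CA3 code to the full EC pattern.
    ec_pattern = rng.choice([-1.0, 1.0], size=n_ec)
    W_back = np.outer(ec_pattern, ca3_code(ec_pattern))

    # Reconstruct the full EC pattern from a partial cue (half zeroed out).
    partial = ec_pattern.copy()
    partial[: n_ec // 2] = 0.0
    recon = np.sign(W_back @ ca3_code(partial))
    print("fraction of EC pattern recovered:", float(np.mean(recon == ec_pattern)))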

  32. Two Modes of Generalization • Parametric vs. item-based • As long as the embeddings are already known, both modes can support generalization • The hippocampus can do so without requiring interleaved learning • Adapting the embeddings may be relatively hard [Diagram: Hippocampus above a neocortical network with Relation, Cue, Context, and Target pools]

  33. How might the hippocampus support inference and generalization? • ‘Inference’: finding missing links in the transitive inference task

  34. Complementary Learning Systems in AI • DQN (and its rehearsal buffer; see the sketch after this slide) • MbPA (memory-based parameter adaptation)
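
The rehearsal buffer from the outline is, in code, just a bounded store of transitions sampled uniformly at random, which is interleaved replay in exactly the CLS sense. A minimal sketch (not DeepMind's implementation):

    import random
    from collections import deque

    class ReplayBuffer:
        # Bounded FIFO store of (state, action, reward, next_state, done).
        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)

        def add(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            # Uniform sampling interleaves old and recent experience.
            return random.sample(self.buffer, batch_size)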

  35. Tom’s questions for me • What sort of NN architectures could serve an automated programmer in constructing a program? • How do you imagine different memory systems working in a human programmer? • My version of the question: – What additional forms of memory do intelligent agents need?

  36. Working Memory • Is there a special working memory system in the brain? • Or do we learn connection weights that sustain information in an active state in memory? • RNNs and LSTMs provide forms of working memory • What is exciting about these models is that they learn what to retain – We learn to retain the information that will be useful later (see the sketch after this slide)
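
A small PyTorch sketch of "learning what to retain"; the task and sizes are my own illustration. The LSTM must report, at the end of each sequence, the value that appeared at the single flagged time step, so its gates have to learn to latch that value and hold it until the end.

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=2, hidden_size=16, batch_first=True)
    readout = nn.Linear(16, 1)
    opt = torch.optim.Adam(
        list(lstm.parameters()) + list(readout.parameters()), lr=1e-2)

    for step in range(500):
        # Each input step carries [value, flag]; exactly one step is flagged.
        x = torch.rand(32, 10, 2)
        x[:, :, 1] = 0.0
        t = torch.randint(0, 10, (32,))
        x[torch.arange(32), t, 1] = 1.0
        target = x[torch.arange(32), t, 0]    # the value to retain

        out, _ = lstm(x)                      # (batch, time, hidden)
        pred = readout(out[:, -1]).squeeze(-1)
        loss = ((pred - target) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()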

  37. The Differentiable Neural Computer

  38. Learning what to store – in two senses

  39. Memory-Augmented Neural Networks • Santoro et al. (2016): one-shot learning with MANNs
