Speculations on possible brain substrates of symbolic processing and structured I/O from memory
Adam Marblestone
CS 379c, Stanford, 2019
(slides based on pre-DeepMind work, much of it with Ken Hayworth)
A tentative high-level template for AI cognitive architectures, based on some interpretations of modern neuroscience (such as it is)
But it raises more questions than it answers...
Psychological inspirations for knowledge representation in AI cognitive architectures... and assumptions to question
Working memory: reverberating activity? qualitatively similar to ongoing activity in an LSTM?
-- but, in cortex? cortico-thalamic loops? unstructured versus pre-structured? variables/slots?
-- gating / routing of access to/from working memory
Episodic memory: rapid plasticity in hippocampus, supports pattern completion, linked to diverse cortical representations
-- many open questions… temporal, spatial, predictive and other relational organizing principles?
-- how is it consolidated into semantic memory or other cortically-encoded knowledge?
-- free association, chunking, hierarchical contexts...
-- how are memory recall, offline replay, and prospective planning linked with RL?
-- interplay of feature-based generalization and sparse, arbitrary pattern-separated codes?
-- ...
Semantic memory: knowledge-graph-like representations in cortical association areas?
-- distinct from episodic memory? distinct from “unstructured” cortical weights?
-- is this a distinct architecture, or something that emerges from the other systems?
Procedural memory: cortico-striatal synapses governing basal-ganglia action selection? selectable cortical programs?
Other: how is the information encoded (e.g., based on which loss functions) before entering any of the above systems? Are VAE-like “latent vectors” able to capture enough structure when trained with the right loss functions (e.g., see MERLIN’s predictive losses)? Or does one need something more like “capsules” or other architectural features?
Neural Turing Machine: originally framed as an extension to LSTM “working memory”
NTM arguably addresses long-standing complaints about the lack of symbolic “variable binding” in NNs (e.g., from Gary Marcus)
Can we forge tighter links with neuroscience to constrain architectural choices for working + episodic memory analogs, symbolic structures, dynamic routing, and training procedures in ANNs?
Neural attractors/assemblies/ensembles (cf. Hopfield…)
http://fourier.eng.hmc.edu/e161/lectures/figures/energylandscape.gif
https://github.com/adammarblestone/AssociativeMemories
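For concreteness, here is a minimal Hopfield-style associative memory sketch: Hebbian storage of binary patterns and attractor recall by pattern completion. This is a generic illustration, not code from the linked repository; all names and sizes are made up.

```python
# Minimal Hopfield-style associative memory: Hebbian storage of binary
# (+1/-1) patterns, then recall by settling into the nearest attractor.
import numpy as np

rng = np.random.default_rng(0)
n_units, n_patterns = 100, 5

# Store random patterns with the standard Hebbian outer-product rule.
patterns = rng.choice([-1.0, 1.0], size=(n_patterns, n_units))
W = (patterns.T @ patterns) / n_units
np.fill_diagonal(W, 0.0)  # no self-connections

def settle(state, steps=20):
    """Asynchronous updates descend the energy landscape to a point attractor."""
    state = state.copy()
    for _ in range(steps):
        for i in rng.permutation(n_units):
            state[i] = 1.0 if W[i] @ state >= 0 else -1.0
    return state

# Recall: corrupt a stored pattern, then let the network pattern-complete it.
probe = patterns[0].copy()
flip = rng.choice(n_units, size=20, replace=False)
probe[flip] *= -1
recalled = settle(probe)
print("overlap with stored pattern:", float(recalled @ patterns[0]) / n_units)
```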
Information represented via assemblies/attractors
Sequences of point attractors in the hippocampus?
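One generic way to sketch a sequence of point attractors (a toy illustration, not a claim about the specific hippocampal mechanism) is to combine symmetric auto-associative weights, which make each pattern a fixed point, with an asymmetric hetero-associative term that kicks the network from one stored attractor to the next. Everything below is illustrative.

```python
# Toy sequence of point attractors: settle in one attractor, then a
# hetero-associative "kick" moves the state to the next pattern in the sequence.
import numpy as np

rng = np.random.default_rng(1)
n, T = 200, 4
seq = rng.choice([-1.0, 1.0], size=(T, n))            # the stored sequence of patterns

W_auto = (seq.T @ seq) / n                             # symmetric term: each pattern is a fixed point
W_hetero = (np.roll(seq, -1, axis=0).T @ seq) / n      # asymmetric term: maps pattern t to pattern t+1

def sgn(v):
    return np.where(v >= 0, 1.0, -1.0)

def settle(x, steps=5):
    for _ in range(steps):
        x = sgn(W_auto @ x)
    return x

# Start near the first pattern (with ~10% of units flipped), then step through the sequence.
state = settle(seq[0] * rng.choice([1.0, -1.0], size=n, p=[0.9, 0.1]))
for t in range(T):
    print("overlap with pattern", t, "=", float(state @ seq[t]) / n)
    state = settle(sgn(W_hetero @ state))              # kick to the next attractor, then settle
```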
The attractors may be in cortico-thalamo-cortical loops
Thalamic Latches and Working Memory Buffers (cf. McFarland & Haber; Murray Sherman)
Assumption: information necessary to select an assembly passes through the thalamus between cortical buffers
Gated communication using thalamic relay of attractors
Idea: thalamic relay + attractor implementation of a “dynamically partitionable auto-associative neural network” (Hayworth 2012)
•Global attractors/assemblies/ensembles shared across source > thalamic relay > destination buffers
•Gating the thalamic relay off allows “partitioning” of the buffers
•Gating the thalamic relay on allows information to be “copied” from a source buffer to a destination buffer, forcing the destination buffer to occupy an attractor globally shared with that of the source
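A toy sketch of this gating idea, assuming simple Hopfield-style buffers and a scalar relay gate (illustrative only, not the Hayworth 2012 implementation): with the relay closed the two buffers hold different globally shared attractors; opening the relay forces the destination buffer into the source's attractor.

```python
# "Dynamically partitionable" pair of auto-associative buffers linked by a
# gateable relay. Both buffers share the same global attractors (symbols);
# opening the relay copies the source buffer's attractor into the destination.
import numpy as np

rng = np.random.default_rng(2)
n, n_symbols = 150, 6
symbols = rng.choice([-1.0, 1.0], size=(n_symbols, n))   # globally shared assemblies
W = (symbols.T @ symbols) / n                             # same recurrent weights in both buffers

def sgn(v):
    return np.where(v >= 0, 1.0, -1.0)

def step(src, dst, relay_on, relay_gain=2.0):
    """One update of both buffers; the relay injects src into dst when open."""
    new_src = sgn(W @ src)
    new_dst = sgn(W @ dst + (relay_gain * src if relay_on else 0.0))
    return new_src, new_dst

# Relay off: buffers are "partitioned" and hold different attractors.
src, dst = symbols[0].copy(), symbols[3].copy()
for _ in range(5):
    src, dst = step(src, dst, relay_on=False)
print("before copy, dst matches symbol 3:", bool(float(dst @ symbols[3]) / n > 0.9))

# Relay on: the destination is pulled into the source's attractor ("copy").
for _ in range(10):
    src, dst = step(src, dst, relay_on=True)
print("after copy, dst matches symbol 0:", bool(float(dst @ symbols[0]) / n > 0.9))
```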
Cortico-thalamic latched memory buffer
Assembly/attractor/ensemble shared across connected cortical and thalamic areas…
“Copy and paste” of symbols using partitionable attractors (Hayworth and Marblestone 2018)
Sequence of gating operations for copy-and-paste of assemblies (cf. symbolic variable binding):
-- During training / symbol allocation... (Hayworth 2009)
-- Later, executing a routing operation... (Hayworth 2009)
“Latch” and “relay” control via basal ganglia discrete outputs
Discrete inhibitory/disinhibitory control over target thalamic areas/relays/latches?
-Evolutionarily ancient (homologies to the simplest vertebrate brains, e.g., zebrafish)
-Does RL
-BG and superior colliculus may also contain innate control structures that could drive “training routines” / “internal curricula” / “bootstrap cost functions”... (Lisman 2015)
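If the BG is treated as a discrete gate controller learned by reinforcement, a minimal (and entirely hypothetical) sketch is a tabular Q-learner choosing which relay to disinhibit given a cue. The cue/gate task and reward below are invented purely for illustration.

```python
# Illustrative "BG as discrete gate controller" sketch: a tabular Q-learner
# picks which of several relays/latches to disinhibit given a cue, and is
# rewarded when it opens the relay that matches the cue.
import numpy as np

rng = np.random.default_rng(3)
n_cues, n_gates = 4, 4
Q = np.zeros((n_cues, n_gates))      # value of opening gate a, given cue s
alpha, eps = 0.1, 0.1

for trial in range(2000):
    cue = rng.integers(n_cues)
    # epsilon-greedy selection of which relay to open (disinhibit)
    gate = rng.integers(n_gates) if rng.random() < eps else int(np.argmax(Q[cue]))
    reward = 1.0 if gate == cue else 0.0              # "routing was useful"
    Q[cue, gate] += alpha * (reward - Q[cue, gate])   # one-step bandit-style update

print("learned gate per cue:", np.argmax(Q, axis=1))  # ideally [0, 1, 2, 3]
```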
Clamping in target patterns for “contrastive” learning (Hayworth and Marblestone 2018)
Explicit basal-ganglia-directed control over the learning of invariances (not just unsupervised “slow feature” finding)?
Example:
-Basal ganglia recognizes the boundaries of an “episode” with a given object (BG learns this policy via reinforcement learning?)
-BG “clamps” target patterns into a thalamo-cortical target buffer
-BG trains the upstream sensory hierarchy to map varying input to the clamped target
-Target pattern may be retrieved from memory on a subsequent episode?
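A minimal sketch of the clamping idea under strong simplifying assumptions: a single linear map stands in for the sensory hierarchy, and a delta rule trains it to send varying views of one object to the pattern clamped in the target buffer during that episode. All shapes and constants are illustrative.

```python
# Clamp-a-target training sketch: within an episode, many views of one object
# are mapped onto a single clamped target pattern via a delta rule.
import numpy as np

rng = np.random.default_rng(4)
n_in, n_out, n_objects = 50, 30, 3

prototypes = rng.normal(size=(n_objects, n_in))            # one "object" per prototype
targets = rng.choice([-1.0, 1.0], size=(n_objects, n_out)) # patterns clamped by BG
W = np.zeros((n_out, n_in))                                # stand-in for the sensory hierarchy
lr = 0.01

for episode in range(300):
    obj = rng.integers(n_objects)          # BG detects episode boundary, picks the clamp
    clamp = targets[obj]
    for _ in range(10):                    # varying views of the same object
        view = prototypes[obj] + 0.3 * rng.normal(size=n_in)
        out = W @ view
        W += lr * np.outer(clamp - out, view)   # delta rule toward the clamped target

# After training, different views of the same object map to (nearly) the same target.
test = prototypes[1] + 0.3 * rng.normal(size=n_in)
pred = np.sign(W @ test)
print("agreement with clamped target:", float(pred @ targets[1]) / n_out)
```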
Structured I/O from an associative memory (Hayworth and Marblestone 2018)
-- Unstructured associative code
-- Structured representation across multiple buffers
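One toy way to illustrate structured I/O from an associative memory (an invented example, not the paper's model): the memory stores the concatenation of several role-specific buffers as a single attractor, so cueing with the contents of one buffer pattern-completes the structured representation across all of them.

```python
# Structured I/O sketch: an associative memory over the concatenation of
# several role-specific buffers; cueing one buffer retrieves the whole bound pattern.
import numpy as np

rng = np.random.default_rng(6)
n_buf, n_buffers, n_items = 60, 3, 4          # e.g., hypothetical "agent"/"action"/"object" buffers
n = n_buf * n_buffers

items = rng.choice([-1.0, 1.0], size=(n_items, n))   # each item = a bound triple of buffer states
W = (items.T @ items) / n
np.fill_diagonal(W, 0.0)

def sgn(v):
    return np.where(v >= 0, 1.0, -1.0)

# Cue: correct content in buffer 0 only; the other buffers start as noise.
cue = rng.choice([-1.0, 1.0], size=n)
cue[:n_buf] = items[2][:n_buf]
state = cue
for _ in range(10):
    state = sgn(W @ state)                    # pattern-complete the remaining buffers

print("recovered full structured pattern:", bool(float(state @ items[2]) / n > 0.9))
```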
A crude, very partial, and speculative “integrative picture” (Hayworth and Marblestone 2018)
Returning to the current situation re: integrated memory-based RL architectures in AI
Basically “soft attention” over a set of memory “slots”, with cosine-similarity-based content lookup…
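A minimal sketch of that read operation (NTM/DNC-style content-based addressing): cosine similarity between a query key and each slot, sharpened by a softmax, then a weighted sum over slots. Shapes and the sharpness parameter beta are illustrative.

```python
# Soft-attention memory read: cosine similarity + softmax over slots.
import numpy as np

def content_read(memory, key, beta=5.0):
    """memory: (n_slots, slot_dim); key: (slot_dim,). Returns read vector and weights."""
    mem_norm = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
    key_norm = key / (np.linalg.norm(key) + 1e-8)
    sim = mem_norm @ key_norm                   # cosine similarity per slot
    w = np.exp(beta * sim)
    w /= w.sum()                                # softmax attention weights
    return w @ memory, w

rng = np.random.default_rng(5)
memory = rng.normal(size=(8, 16))               # 8 slots of dimension 16
key = memory[2] + 0.1 * rng.normal(size=16)     # noisy query for slot 2
read, weights = content_read(memory, key)
print("most attended slot:", int(np.argmax(weights)))   # expected: 2
```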
What about structured routing / potential thalamus analogs?