  1. Adam Marblestone, Stanford CS379C (Tom Dean), 2017

  2. Machine learning and neuroscience speak different languages today…
     Neuro: circuits, representations, computational motifs, “the neural code”
     ML: gradient-based optimization, supervised learning, augmenting neural nets with external memories

  3. Machine learning and neuroscience speak different languages today…
     Neuro: circuits, representations, computational motifs, “the neural code”
     ML: gradient-based optimization, supervised learning, augmenting neural nets with external memories
     Key message: these are not as far apart as we think. Modern ML, suitably modified, may provide a partial framework for theoretical neuro.

  4. “Atoms of computation” framework (outdated). Apparently-uniform six-layered neocortical sheet: common communication interface, not common algorithm?

  5. “Atoms of computation” framework (outdated). Biological specializations ↔ different circuits ↔ different computations.

  6. What about this objection? “The big, big lesson from neural networks is that there exist computational systems (artificial neural networks) for which function only weakly relates to structure ... A neural network needs a cost function and an optimization procedure to be fully described; and an optimized neural network's computation is more predictable from this cost function than from the dynamics or connectivity of the neurons themselves.” (Greg Wayne, DeepMind, in response to the Atoms of Neural Computation paper)

  7. Three hypotheses for linking neuroscience and ML
     1) Existence of cost functions: the brain optimizes cost functions (~ as powerfully as backprop)
     2) Diversity of cost functions: the cost functions are diverse, area-specific, and systematically regulated in space and time (not a single “end-to-end” training procedure)
     3) Embedding within a structured architecture: optimization occurs within a specialized architecture containing pre-structured systems (e.g., memory systems, routing systems) that support efficient optimization

  8. Three hypotheses for linking neuroscience and ML
     1) Existence of cost functions: the brain optimizes cost functions (~ as powerfully as backprop). This is not just the trivial claim that “neural dynamics can be described in terms of cost function(s)”; the brain actually has machinery to perform optimization.
     2) Diversity of cost functions: the cost functions are diverse, area-specific, and systematically regulated in space and time (not a single “end-to-end” training procedure)
     3) Embedding within a structured architecture: optimization occurs within a specialized architecture containing pre-structured systems (e.g., memory systems, routing systems) that support efficient optimization

  9. Three hypotheses for linking neuroscience and ML
     1) Existence of cost functions: the brain optimizes cost functions (~ at least as powerfully as backprop)
     [Figure: a relatively unstructured network vs. the trained, relatively unstructured network]

  10. 1) Existence of cost functions: ways to perform optimization in a neural network
     Back-propagation: efficient, exact gradient computation by propagating errors through multiple layers
     Node perturbation: slow, high-variance gradient computation (serial or parallel)
     Weight perturbation: slow, high-variance gradient computation (serial or parallel)
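
As a concrete illustration of the gap between these methods, here is a minimal numpy sketch (mine, not from the slides): it compares the exact gradient of a toy one-layer cost, which is what back-propagation computes analytically, with a weight-perturbation estimate that needs thousands of noisy cost evaluations to point in roughly the same direction. The toy cost, sizes, and names are all illustrative assumptions.

```python
# Sketch (not from the slides): compare an exact gradient, as backprop would
# compute it, with a weight-perturbation estimate on a tiny linear model.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5,))          # input
y = rng.normal(size=(3,))          # target
W = rng.normal(size=(3, 5)) * 0.1  # weights of a single linear layer

def loss(W):
    """Squared error of a one-layer linear model (stand-in cost function)."""
    e = W @ x - y
    return 0.5 * np.sum(e ** 2)

# Exact gradient (what error back-propagation computes analytically):
exact_grad = np.outer(W @ x - y, x)

# Weight perturbation: probe the cost with small random weight changes and
# average (dL / sigma) * perturbation over many trials; slow, high variance.
sigma, trials = 1e-3, 2000
est = np.zeros_like(W)
base = loss(W)
for _ in range(trials):
    dW = rng.normal(size=W.shape)
    est += (loss(W + sigma * dW) - base) / sigma * dW
est /= trials

cos = np.sum(est * exact_grad) / (np.linalg.norm(est) * np.linalg.norm(exact_grad))
print("cosine similarity between estimates:", round(float(cos), 3))
```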

  11. 1) Existence of cost functions: Back-propagation is much more efficient and precise, but computational neuroscience has mostly rejected it, focusing instead on local synaptic plasticity rules or, occasionally, on weight or node perturbation.

  12. 1) Existence of cost functions:

  13. 1) Existence of cost functions: Do you really need information to flow “backwards along the axon”? Or, more generally, is the “weight transport” problem a genuine one?

  14. 1) Existence of cost functions: In standard back-prop, transpose(W) x e gets fed back into the hidden units; alternatively, B x e gets fed back, where B is a fixed random feedback matrix.

  15. 1) Existence of cost functions: [Figure: normal back-prop vs. fixed random feedback weights]
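
The fixed-random-feedback idea on these two slides is known as feedback alignment (Lillicrap et al.). Below is a minimal numpy sketch under assumed toy data and hyperparameters: the forward pass and weight updates are standard gradient descent, but the output error is fed back through a fixed random matrix B instead of W2.T, sidestepping weight transport.

```python
# Sketch: training a tiny two-layer network with feedback alignment; the
# backward pass uses a fixed random matrix B instead of W2.T.
# Toy data and all hyperparameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))              # toy inputs
Y = np.tanh(X @ rng.normal(size=(10, 3)))   # toy targets

W1 = rng.normal(size=(10, 20)) * 0.1
W2 = rng.normal(size=(20, 3)) * 0.1
B  = rng.normal(size=(3, 20)) * 0.1         # fixed random feedback weights

lr = 0.05
for step in range(500):
    h = np.tanh(X @ W1)                     # hidden activations
    y_hat = h @ W2                          # linear readout
    e = y_hat - Y                           # output error

    # Backprop would feed back e @ W2.T; feedback alignment feeds back e @ B.
    delta_h = (e @ B) * (1.0 - h ** 2)      # tanh derivative

    W2 -= lr * h.T @ e / len(X)
    W1 -= lr * X.T @ delta_h / len(X)

    if step % 100 == 0:
        print(step, "mse:", round(float(np.mean(e ** 2)), 4))
```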

  16. 1) Existence of cost functions: Even spiking, recurrent networks may be trainable using similar ideas.

  17. 1) Existence of cost functions: Use multiple dendritic compartments to store both “activations” and “errors”: soma voltage ~ activation, dendritic voltage ~ error derivative.
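
A minimal sketch of this compartmental picture, with hypothetical variable names: the somatic voltage carries the feedforward activation, an apical "dendritic" variable carries a top-down error signal, and the weight update is purely local, combining presynaptic input with the postsynaptic dendritic term.

```python
# Sketch: one layer of model neurons with two compartments. The somatic
# voltage carries the feedforward activation; an apical dendritic variable
# carries a top-down error signal (stand-in values here); the weight update
# is local: presynaptic input times a postsynaptic dendritic error term.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(10,))            # presynaptic (bottom-up) input
W = rng.normal(size=(20, 10)) * 0.1   # feedforward weights

soma = W @ x                          # somatic compartment ~ activation
rate = np.tanh(soma)                  # firing rate

# Apical compartment ~ error derivative, delivered by feedback connections
# (random stand-in values, since no full network is simulated here).
apical = rng.normal(size=(20,)) * 0.1

# Local plasticity: dendritic error, somatic gain (tanh'), presynaptic input.
lr = 0.1
W -= lr * np.outer(apical * (1.0 - rate ** 2), x)
```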

  18. 1) Existence of cost functions: Or use temporal properties of the neuron to encode the signal: firing rate ~ activation, d(firing rate)/dt ~ error derivative. See also similar claims by Hinton.
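
A minimal sketch of the temporal-coding variant, under assumed first-order dynamics: if the firing rate relaxes toward a nudged target value, the temporal derivative of the rate is proportional to the instantaneous error, so d(rate)/dt itself can serve as the error signal that drives learning.

```python
# Sketch: a unit whose rate relaxes toward a "nudged" target value.
# Under these (assumed) dynamics the temporal derivative of the firing
# rate is proportional to the error, so d(rate)/dt can stand in for the
# error signal.
import numpy as np

rate, target = 0.2, 0.8   # current firing rate and nudged target rate
tau, dt = 50.0, 1.0       # time constant (ms) and time step

rates = [rate]
for _ in range(200):
    drate_dt = (target - rate) / tau      # error shows up as d(rate)/dt
    rate += dt * drate_dt
    rates.append(rate)

# At any time step, tau * d(rate)/dt recovers the instantaneous error:
print("error estimate:", tau * (rates[1] - rates[0]) / dt)
print("true initial error:", target - rates[0])
```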

  19. 1) Existence of cost functions: But isn’t gradient descent only compatible with “supervised” learning?

  20. 1) Existence of cost functions: But isn’t gradient descent only compatible with “supervised” learning? No! Lots of unsupervised learning paradigms operate via gradient descent…

  21. 1) Existence of cost functions: But isn’t gradient descent only compatible with “supervised” learning? No! Lots of unsupervised learning paradigms operate via gradient descent… e.g., the classic auto-encoder.
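
For instance, a toy linear auto-encoder trained by plain gradient descent on a reconstruction cost (the data and layer sizes below are illustrative): the "labels" are just the inputs themselves, so no external supervision is involved.

```python
# Sketch: a tiny linear autoencoder trained by plain gradient descent on a
# reconstruction (unsupervised) cost; no external labels required.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 8))  # correlated toy data

W_enc = rng.normal(size=(8, 3)) * 0.1   # encoder: 8 -> 3
W_dec = rng.normal(size=(3, 8)) * 0.1   # decoder: 3 -> 8

lr = 0.01
for step in range(1000):
    Z = X @ W_enc                       # code
    X_hat = Z @ W_dec                   # reconstruction
    E = X_hat - X                       # reconstruction error
    # Gradients of 0.5 * mean squared reconstruction error:
    gW_dec = Z.T @ E / len(X)
    gW_enc = X.T @ (E @ W_dec.T) / len(X)
    W_dec -= lr * gW_dec
    W_enc -= lr * gW_enc
    if step % 250 == 0:
        print(step, "recon mse:", round(float(np.mean(E ** 2)), 4))
```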

  22. 1) Existence of cost functions: But isn’t gradient descent only compatible with “supervised” learning? No! Lots of unsupervised learning paradigms operate via gradient descent… e.g., filling in (predicting missing parts of the input).

  23. 1) Existence of cost functions: But isn’t gradient descent only compatible with “supervised” learning? No! Lots of unsupervised learning paradigms operate via gradient descent… e.g., prediction of the next frame of a movie.

  24. 1) Existence of cost functions: But isn’t gradient descent only compatible with “supervised” learning? No! Lots of unsupervised learning paradigms operate via gradient descent… e.g., prediction of the next frame of a movie.
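
A sketch of next-frame prediction as a gradient-descent objective, on an assumed toy "movie" (a drifting noisy sinusoid): the future frames themselves supply the training targets, so the cost is computed entirely from the data stream.

```python
# Sketch: next-frame prediction as an unsupervised cost. A linear model is
# fit by gradient descent to predict frame t+1 of a toy 1-D "movie" (a
# drifting sinusoid) from frame t; the data itself supplies the targets.
import numpy as np

rng = np.random.default_rng(4)
T, D = 300, 16
phase = np.linspace(0, 8 * np.pi, T)[:, None]
space = np.linspace(0, 2 * np.pi, D)[None, :]
movie = np.sin(space + phase) + 0.05 * rng.normal(size=(T, D))  # frames

X, Y = movie[:-1], movie[1:]          # inputs: frame t, targets: frame t+1
W = np.zeros((D, D))

lr = 0.05
for step in range(2000):
    E = X @ W - Y                     # prediction error
    W -= lr * X.T @ E / len(X)        # gradient of mean squared error
print("final prediction mse:", round(float(np.mean((X @ W - Y) ** 2)), 4))
```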

  25. 1) Existence of cost functions: But isn’t gradient descent only compatible with “supervised” learning? No! Lots of unsupervised learning paradigms operate via gradient descent… e.g., the generative adversarial network.
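
A toy 1-D GAN sketch with hand-derived gradients, to make the point that both the generator and the discriminator are trained by gradient descent on differentiable costs. The affine generator, logistic discriminator, data distribution, and hyperparameters are all illustrative assumptions, and toy GANs like this can oscillate rather than converge cleanly.

```python
# Sketch: a toy 1-D GAN with manual gradients, showing that both players are
# trained by gradient descent. Generator: affine; discriminator: logistic.
import numpy as np

rng = np.random.default_rng(5)
sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

w, c = 0.1, 0.0       # discriminator D(x) = sigmoid(w*x + c)
a, b = 1.0, 0.0       # generator G(z) = a*z + b, z ~ N(0, 1)
mu_data, lr, batch = 3.0, 0.05, 64

for step in range(4000):
    x_real = mu_data + rng.normal(size=batch)
    z = rng.normal(size=batch)
    x_fake = a * z + b

    # Discriminator step: minimize -log D(real) - log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    gw = np.mean((d_real - 1) * x_real) + np.mean(d_fake * x_fake)
    gc = np.mean(d_real - 1) + np.mean(d_fake)
    w, c = w - lr * gw, c - lr * gc

    # Generator step (non-saturating): minimize -log D(fake).
    d_fake = sigmoid(w * x_fake + c)
    ga = np.mean((d_fake - 1) * w * z)
    gb = np.mean((d_fake - 1) * w)
    a, b = a - lr * ga, b - lr * gb

print("generated mean ~", round(float(b), 2), "(data mean:", mu_data, ")")
```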

  26. 1) Existence of cost functions: Signatures of error signals being computed in the visual hierarchy?!

  27. 1) Existence of cost functions: Take-away: the brain could efficiently compute approximate gradients of its multi-layer weight matrices by propagating credit through multiple layers of neurons, and diverse potential mechanisms are available for doing so. Such a core capability for error-driven learning could underpin diverse supervised and unsupervised learning paradigms.

  28. 1) Existence of cost functions: Key research questions: Does the brain actually do this? Can this be used to explain features of the cortical architecture, e.g., dendritic computation in pyramidal neurons?

  29. Three hypotheses for linking neuroscience and ML
     2) Biological fine-structure of cost functions: the cost functions are diverse, area-specific, and systematically regulated in space and time
     [Figure, panels A-C: a cortical area with its inputs, trained either by an external label and error, or by an internally-generated cost function (with other inputs to the cost function) and error]

  30. 2) Biological fine-structure of cost functions: the cost functions are diverse, area-specific, and systematically regulated in space and time. Global “value functions” vs. multiple local internal cost functions: these diagrams (Randal O’Reilly) describe a global “value function” for “end-to-end” training of the entire brain, but that isn’t the whole story!

  31. Internally-generated bootstrap cost functions: against “end-to-end” training. A simple optical-flow calculation provides an internally generated “bootstrap” training signal for hand recognition. Optical flow bootstraps hand recognition; hands + faces bootstrap gaze-direction recognition; gaze direction (and more) bootstraps more complex social cognition.

  32. Internally-generated bootstrap cost functions: against “end-to-end” training. Generalizations of this idea could be a key architectural principle for how the biological brain generates and uses internal training signals (a form of “weak label”).
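
A sketch of the weak-label idea under made-up data: a crude motion-energy signal (frame differencing, standing in for optical flow) generates labels internally, and a small classifier is then trained on those labels by gradient descent. The data, features, and threshold are all illustrative assumptions, not the actual pipeline from the slide.

```python
# Sketch: an internally generated "bootstrap" cost. A crude motion-energy
# signal (frame differencing, standing in for optical flow) produces weak
# labels, and a small classifier is then trained on those labels by
# gradient descent. Data, features, and thresholds are all illustrative.
import numpy as np

rng = np.random.default_rng(6)
T, D = 400, 32
frames = rng.normal(size=(T, D)) * 0.1
moving = rng.random(T - 1) < 0.5                  # which frames contain "movement"
frames[1:][moving] += rng.normal(size=(int(moving.sum()), D))  # add change there

# Internally computed teaching signal: motion energy from frame differencing.
motion_energy = np.mean((frames[1:] - frames[:-1]) ** 2, axis=1)
weak_label = (motion_energy > np.median(motion_energy)).astype(float)

# Train a logistic classifier on per-frame energy features, supervised only
# by the internally generated weak label (no external annotation involved).
X = frames[1:] ** 2
w, bias, lr = np.zeros(D), 0.0, 0.1
for step in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + bias)))
    g = p - weak_label                            # gradient of cross-entropy
    w -= lr * X.T @ g / len(X)
    bias -= lr * np.mean(g)

p = 1.0 / (1.0 + np.exp(-(X @ w + bias)))
print("agreement with weak labels:",
      round(float(np.mean((p > 0.5) == (weak_label > 0.5))), 3))
```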

  33. But how are internal cost functions represented and delivered? Normal backprop needs a full vectorial target pattern to train towards; with reinforcement, the problems of credit assignment are even worse.
     [Figure, panels A-C: a cortical area with its inputs, trained either by an external label and error, or by an internally-generated cost function (delivery mechanism marked “?”), error, and other inputs to the cost function]

  34. But how are internal cost functions represented and delivered? Normal backprop needs a full vectorial target pattern to train towards; with reinforcement, the problems of credit assignment are even worse.
     Possibility: the brain may re-purpose deep reinforcement learning to optimize diverse internal cost functions, which are computed internally and delivered as scalars.
     [Figure, panels A-C: as on the previous slide]
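
A minimal sketch of that possibility, with a made-up internal cost: a stochastic unit is trained with a REINFORCE-style (node-perturbation) update that uses only the scalar value of the cost, never its gradient or a vectorial target.

```python
# Sketch: optimizing an internally generated cost delivered only as a scalar.
# A stochastic unit is trained with a REINFORCE-style update: it never sees
# the gradient of the cost, only its scalar value. Cost and network are
# illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(7)
D = 8
w = np.zeros(D)                         # policy parameters
target = float(rng.normal())            # hidden quantity defining the internal cost
x = rng.normal(size=D)                  # fixed input for simplicity

def internal_cost(output):
    """Scalar cost computed 'internally'; only its value is broadcast."""
    return float((output - target) ** 2)

lr, sigma, baseline = 0.01, 0.5, 0.0
for step in range(3000):
    noise = sigma * rng.normal()
    output = w @ x + noise              # stochastic output (exploration noise)
    cost = internal_cost(output)
    # REINFORCE / node-perturbation update: scalar cost times exploration noise.
    baseline = 0.99 * baseline + 0.01 * cost        # running baseline cuts variance
    w -= lr * (cost - baseline) * noise / sigma**2 * x
    if step % 1000 == 0:
        print(step, "cost:", round(cost, 4))

# Residual cost of about sigma**2 remains, due to the exploration noise itself.
print("final noiseless cost:", round(internal_cost(w @ x), 4))
```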

  35. Ways of making deep RL efficient

  36. Ways of making deep RL efficient: are these “biologically plausible”?

  37. A complex molecular and cellular basis for reinforcement-based training in primary visual cortex. Reinforcement in striatum: VTA dopaminergic projections. Reinforcement in cortex: basal forebrain cholinergic projections, with a glial intermediate (i.e., glia, not neurons)!

  38. A diversity of reinforcement-like signals? Classic work by Eve Marder in the crab stomatogastric ganglion
