Learning Hierarchical Information Flow with Recurrent Neural Modules

  1. Learning Hierarchical Information Flow with Recurrent Neural Modules. Danijar Hafner¹, Alex Irpan¹, James Davidson¹, Nicolas Heess². ¹Google Brain, ²DeepMind. NIPS 2017, poster #3374.

  2. 1. Contribution A brain-inspired modular sequence model that outperforms stacked GRUs. Connectivity is learned rather than manually defined as a layer structure for the task. The model learns skip connections and feedback loops and discovers novel connectivity patterns.

  5. 2. Motivation The neocortex is often described as a hierarchy, but there are many side connections and feedback loops between areas such as V1, V2, V3, V4, MT, MST, FST, IT, and TEO (figure adapted from Gross et al., 1993, Inferior temporal cortex as a pattern recognition device). Areas communicate both directly and indirectly via the thalamus; we focus on the latter here. Modules communicating via a routing center include hierarchy as a special case.

  6. 2. Motivation (Illustration slide; image credit: user udaix, Shutterstock.)

  7. 2. Motivation (Figure 6 from Oh et al., A mesoscale connectome of the mouse brain, Nature, 2014.)

  8. 3. Method: ThalNet Multiple recurrent modules share their features via a routing center. (Diagram: modules A, B, C, and D connected through a central routing center, with task input and task output.)

  9. 3. Method: ThalNet The center concatenates the module features and lets the modules read from it at the next time step. (Diagram: the model unrolled over three time steps, with inputs x1, x2, x3, modules A, B, C, the center, and outputs y1, y2, y3.)
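
To make this step concrete, here is a minimal NumPy sketch of a ThalNet-style cell under simplifying assumptions: four modules, a plain tanh recurrence instead of the paper's GRU modules, fixed random reading matrices instead of learned reading weights, and the first module receiving the task input while the last module emits the output. All names and sizes are illustrative, not the paper's implementation.

```python
# Minimal sketch of one ThalNet-style step: modules communicate only through a
# shared center vector that concatenates all module features.
import numpy as np

rng = np.random.default_rng(0)
n_modules, feat_size, ctx_size, in_size, out_size = 4, 16, 8, 10, 5
center_size = n_modules * feat_size  # the center concatenates all module features

# Per-module parameters: a reading matrix (center -> context) and recurrent weights.
read = [rng.normal(0, 0.1, (center_size, ctx_size)) for _ in range(n_modules)]
w_in = [rng.normal(0, 0.1, (ctx_size + (in_size if i == 0 else 0), feat_size))
        for i in range(n_modules)]
w_rec = [rng.normal(0, 0.1, (feat_size, feat_size)) for _ in range(n_modules)]
w_out = rng.normal(0, 0.1, (feat_size, out_size))  # output read off the last module


def step(x, center, states):
    """One time step: each module reads a context vector from the previous
    center, updates its state, and the new center is the concatenation of
    all module features."""
    new_states = []
    for i in range(n_modules):
        context = center @ read[i]  # read from the routing center
        inp = np.concatenate([context, x]) if i == 0 else context  # first module also sees the task input
        h = np.tanh(inp @ w_in[i] + states[i] @ w_rec[i])
        new_states.append(h)
    new_center = np.concatenate(new_states)  # features shared via the center
    y = new_states[-1] @ w_out  # task output from the last module
    return y, new_center, new_states


# Unroll over a short input sequence.
center = np.zeros(center_size)
states = [np.zeros(feat_size) for _ in range(n_modules)]
for x in rng.normal(size=(3, in_size)):
    y, center, states = step(x, center, states)
```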

  12. 3. Method: ThalNet Reading mechanisms can be static or dynamic, allowing the read locations to change over time. (Diagram: a module's context vector is computed as the product of the center vector and its reading weights.)

  13. 3. Method: ThalNet Reading mechanisms: Linear reading can be unstable to train and its reading weights are less interpretable. Weight normalization gives static reading at the same locations and works well in practice. Fast softmax uses dynamic weights based on the current RNN state but needs many parameters (features × center × context). Fast Gaussian is dynamic with fewer parameters, but unstable to train.
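
Since weight-normalized static reading is the variant reported to work best in practice, here is a small sketch of how such a read might be parameterized. The direction-times-scale parameterization (in the style of Salimans and Kingma's weight normalization) and all shapes are assumptions for illustration, not the paper's exact implementation.

```python
# Static weight-normalized read: the same normalized weights are applied at
# every time step, so read locations do not change over time.
import numpy as np

rng = np.random.default_rng(1)
center_size, ctx_size = 64, 8

v = rng.normal(size=(center_size, ctx_size))  # unnormalized direction parameters
g = np.ones(ctx_size)                         # learned per-output scale


def read_weightnorm(center):
    """Reads a context vector for one module from the center."""
    w = g * v / np.linalg.norm(v, axis=0, keepdims=True)  # column-wise normalization
    return center @ w


context = read_weightnorm(rng.normal(size=center_size))
```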

  17. 4. Findings (Diagrams: learned connectivity among modules A, B, C, and D, shown unrolled over several time steps.)

  20. 4. Findings (Diagrams: learned connectivity among modules A, B, C, and D from input x to output y, with skip connections and feedback connections annotated.)

  22. 4. Findings ThalNet learns hierarchical information flow, skip connections, and long feedback loops. Hierarchical connections are known from feed-forward neural networks, and skip connections are known from ResNet architectures. Long feedback loops could be beneficial for recurrent machine learning models. Similar connectivity is learned for the same task. Static weight-normalized reading is fast and performs well; fast reading mechanisms can be explored further in the future.

  25. 5. Performance ThalNet outperforms stacked GRUs in test performance on several sequential tasks.

  29. 6. Conclusion A brain-inspired modular sequence model that outperforms stacked GRUs. Modularity and the reading bottleneck regularize the model and improve generalization. Other recurrent models might benefit from the long feedback loops learned by ThalNet. ThalNet provides a framework for multi-task learning and online architecture search. Project page: https://danijar.com/thalnet Contact: mail@danijar.com

  30. Bonus: more reading masks

  31. Reading mechanisms: fully connected tanh layer. Almost no connection pattern is visible. Similar performance on MNIST, slightly worse on text8 (fewer parameters).
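
For completeness, a fully connected tanh read is just a dense layer over the whole center vector, which is consistent with no sparse connection pattern emerging. A tiny sketch with illustrative shapes:

```python
# Dense tanh read over the entire center; every center position contributes.
import numpy as np

rng = np.random.default_rng(4)
center_size, ctx_size = 64, 8

w = rng.normal(0, 0.1, (center_size, ctx_size))
b = np.zeros(ctx_size)


def read_tanh(center):
    """Fully connected read with a tanh nonlinearity."""
    return np.tanh(center @ w + b)


context = read_tanh(rng.normal(size=center_size))
```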

  32. Reading mechanisms: fast softmax weights. Selection is based on a softmax mask computed as a function of the module features. Computing the fast weights as activations requires too many parameters!
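
For intuition, the following sketch shows what a fast softmax read could look like: the read weights are produced at every step from the module's current features and normalized with a softmax over center positions. The three-way tensor w_fast (features × center × context) is what makes this variant expensive, as noted above; names and shapes are assumptions.

```python
# Dynamic "fast softmax" read: weights depend on the module's current features.
import numpy as np

rng = np.random.default_rng(2)
feat_size, center_size, ctx_size = 16, 64, 8

w_fast = rng.normal(0, 0.1, (feat_size, center_size, ctx_size))  # features x center x context


def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def read_fast_softmax(center, features):
    """Computes a softmax mask over center positions from the features."""
    logits = np.einsum('f,fcj->cj', features, w_fast)  # (center, context) logits
    mask = softmax(logits, axis=0)                     # normalize over center positions
    return center @ mask                               # (context,) read vector


context = read_fast_softmax(rng.normal(size=center_size), rng.normal(size=feat_size))
```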

  33. Reading mechanisms: fast softmax weights. (Diagram: learned connectivity among modules 0-3 from input x to output y.)

  34. Reading mechanisms: softmax weights. Hierarchical information flow with feedback cycles and skip connections emerges. Slightly worse performance than the linear mapping.

  35. Reading mechanisms: softmax weights. (Diagrams: learned connectivity among modules 0-3 from input x to output y, with skip connections and feedback weights annotated.)

  36. Reading mechanisms: Gaussian kernel. Very few parameters, so we can afford fast weights again. We experimented with a soft kernel and a sampled version; I couldn't make this work well, tips appreciated.
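
As a rough illustration of the Gaussian-kernel variant, the sketch below lets the module features produce a location and a width for a Gaussian bump over center positions, which needs far fewer parameters than the fast softmax tensor. The specific parameterization (sigmoid-squashed means, exponentiated widths) is an assumption for illustration, not the paper's implementation.

```python
# Soft Gaussian read: each context element attends to center positions with a
# Gaussian bump whose location and width depend on the module features.
import numpy as np

rng = np.random.default_rng(3)
feat_size, center_size, ctx_size = 16, 64, 8

w_mu = rng.normal(0, 0.1, (feat_size, ctx_size))     # predicts kernel centers
w_sigma = rng.normal(0, 0.1, (feat_size, ctx_size))  # predicts kernel widths
positions = np.arange(center_size)[:, None]          # (center, 1) grid of positions


def read_gaussian(center, features):
    """Reads a context vector through a normalized Gaussian mask."""
    mu = center_size / (1.0 + np.exp(-(features @ w_mu)))  # bump centers in (0, center_size)
    sigma = np.exp(features @ w_sigma) + 1.0                # bump widths > 1
    mask = np.exp(-0.5 * ((positions - mu) / sigma) ** 2)   # (center, context) kernel
    mask /= mask.sum(axis=0, keepdims=True)                 # normalize over center positions
    return center @ mask


context = read_gaussian(rng.normal(size=center_size), rng.normal(size=feat_size))
```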

  37. Reading mechanisms: softmax-weight reading on MNIST and text8 (module configurations 4 x FF10-GRU10 and 4 x FF32-GRU32). The model forms similar connection patterns on the same task. Read boundaries are clearer on text8 (the larger task) than on MNIST.
