
#82 Adaptive Neural Trees
Ryutaro Tanno, Kai Arulkumaran, Daniel C. Alexander, Antonio Criminisi, Aditya Nori


  1. #82 Adaptive Neural Trees. Ryutaro Tanno, Kai Arulkumaran, Daniel C. Alexander, Antonio Criminisi, Aditya Nori

  2. Two Paradigms of Machine Learning. Deep Neural Networks: 『hierarchical representation of data』. Decision Trees: 『hierarchical clustering of data』. [Figures: ImageNet classifiers with CNNs (Zeiler and Fergus, ECCV 2014), with trainable low-level features (oriented edges & colours), mid-level features (textures & patterns) and high-level features (object parts); super-resolution of dMR brain images with a DT (Alexander et al., NeuroImage 2017), with a classifier hierarchy over water, white matter and grey matter.]

  3. Two Paradigms of Machine Learning. Deep Neural Networks: 『hierarchical representation of data』. Decision Trees: 『hierarchical clustering of data』.

  4. Two Paradigms of Machine Learning. Deep Neural Networks: 『hierarchical representation of data』 (+ learn features of data; + scalable learning with stochastic optimisation; - architectures are hand-designed; - heavy-weight inference, engaging every parameter of the model for each input). Decision Trees: 『hierarchical clustering of data』.

  5. Two Paradigms of Machine Learning.
     Deep Neural Networks 『hierarchical representation of data』: + learn features of data; + scalable learning with stochastic optimisation; - architectures are hand-designed; - heavy-weight inference, engaging every parameter of the model for each input.
     Decision Trees 『hierarchical clustering of data』: - operate on hand-designed features; - limited expressivity with simple splitting functions; + architectures are learned from data; + lightweight inference, activating only a fraction of the model per input.

  6. Joining the Paradigms: Adaptive Neural Trees. 『hierarchical representation of data』 + 『hierarchical clustering of data』: + learn features of data; + scalable learning with stochastic optimisation; + architectures are learned from data; + lightweight inference, activating only a fraction of the model per input. ANTs unify the two paradigms and generalise previous work.


  8. What are ANTs? • ANTs consist of two key design choices:

  9. What are ANTs? • ANTs consist of two key design choices: (1) DTs that use NNs for every path transformation and routing decision. [Figure: an input x passing through NN modules along the tree.]
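Design (1) can be made concrete with a minimal NumPy sketch (illustrative only, not the authors' implementation; the tiny linear modules, names and shapes are all assumptions): each edge applies a learned transformer, each internal node a router that emits the probability of going left, and each leaf a solver; full inference marginalises the leaf predictions over all root-to-leaf paths.

```python
import numpy as np

def transformer(x, W):
    # "edge" module: linear map + ReLU (a stand-in for a conv block)
    return np.maximum(0.0, W @ x)

def router(x, w):
    # internal node: probability of routing x to the left child
    return 1.0 / (1.0 + np.exp(-w @ x))

def solver(x, V):
    # leaf predictor: softmax over class scores
    z = V @ x
    e = np.exp(z - z.max())
    return e / e.sum()

def ant_predict(x, node):
    """Multi-path inference: marginalise over all root-to-leaf paths."""
    x = transformer(x, node["W"])
    if "solver" in node:                       # leaf node
        return solver(x, node["solver"])
    p = router(x, node["router"])              # internal node
    return p * ant_predict(x, node["left"]) + (1 - p) * ant_predict(x, node["right"])

rng = np.random.default_rng(0)
d, k = 4, 3                                    # feature dim, number of classes
leaf = lambda: {"W": rng.normal(size=(d, d)), "solver": rng.normal(size=(k, d))}
tree = {"W": rng.normal(size=(d, d)), "router": rng.normal(size=d),
        "left": leaf(), "right": leaf()}
probs = ant_predict(rng.normal(size=d), tree)  # convex mixture of leaf softmaxes
print(probs.sum())
```

Because every leaf emits a softmax and the routers mix them convexly, the output is itself a valid distribution over classes.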

  10. What are ANTs? • ANTs consist of two key design choices: (1) DTs that use NNs for every path transformation and routing decision. (2) DT-like architecture growth using SGD. [Figure: a target node is either (a) split into two children or (b) deepened with an extra module.]
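The growth step in design (2) can be written as a short greedy loop. This is a hypothetical sketch, not the paper's code: the node dictionaries and `local_sgd_and_eval` (which in a real ANT would train only the newly added modules with SGD and report held-out loss) are illustrative stand-ins.

```python
import random

def split(leaf):
    # option (a): replace the leaf with a router and two fresh children
    return {"type": "internal", "left": {"type": "leaf"}, "right": {"type": "leaf"}}

def deepen(leaf):
    # option (b): keep the leaf but prepend an extra transformer module
    return {"type": "leaf", "extra_transformer": True}

def local_sgd_and_eval(candidate):
    # placeholder for "train the new modules with SGD, return validation loss"
    return random.random()

def grow(leaf):
    """Try keep / split / deepen at a target leaf; retain the best by validation loss."""
    candidates = {"keep": leaf, "split": split(leaf), "deepen": deepen(leaf)}
    losses = {name: local_sgd_and_eval(c) for name, c in candidates.items()}
    best = min(losses, key=losses.get)
    return candidates[best], best

random.seed(0)
node, choice = grow({"type": "leaf"})
print(choice)
```

Repeating this at every candidate leaf until no option improves the validation loss yields the tree architecture, so the structure is discovered from data rather than fixed in advance.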

  11. What are ANTs? • ANTs consist of two key design choices: (1) DTs that use NNs for every path transformation and routing decision. (2) DT-like architecture growth using SGD.

  12. Conditional Computation: multi-path vs. single-path inference. • Single-path inference enables efficient inference without compromising accuracy. [Charts: errors (MNIST %, CIFAR-10 %, SARCOS mse) and number of parameters for ANT variants 1-3 under multi-path and single-path inference; model size drops under single-path inference.]
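Single-path inference can be sketched in NumPy as well (illustrative assumptions throughout: the tiny linear modules, shapes and the parameter counting are mine, not the paper's): at every router, follow only the more probable child, so each input touches only the parameters on one root-to-leaf path.

```python
import numpy as np

def single_path_predict(x, node, params_used=0):
    """Follow only the argmax branch at every router, counting parameters touched."""
    x = np.maximum(0.0, node["W"] @ x)         # edge transformer
    params_used += node["W"].size
    if "solver" in node:                       # leaf: linear class scores
        params_used += node["solver"].size
        return node["solver"] @ x, params_used
    params_used += node["router"].size
    p = 1.0 / (1.0 + np.exp(-node["router"] @ x))   # P(go left)
    child = node["left"] if p >= 0.5 else node["right"]
    return single_path_predict(x, child, params_used)

rng = np.random.default_rng(1)
d, k = 4, 3
leaf = lambda: {"W": rng.normal(size=(d, d)), "solver": rng.normal(size=(k, d))}
tree = {"W": rng.normal(size=(d, d)), "router": rng.normal(size=d),
        "left": leaf(), "right": leaf()}
scores, used = single_path_predict(rng.normal(size=d), tree)
total = sum(a.size for n in [tree, tree["left"], tree["right"]]
            for a in n.values() if hasattr(a, "size"))
print(used, "of", total, "parameters touched")
```

Even in this toy one-split tree, the unvisited leaf's parameters are never engaged; in a deeper tree the touched fraction shrinks further, which is the "model size drops" effect on the slide.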

  13. Adaptive Model Complexity • ANTs can tune the architecture to the availability of training data. Models are trained on subsets of size 50, 250, 500, 2.5k, 5k, 25k, 45k examples.

  14. Unsupervised Hierarchical Clustering Please come & see me at poster #82 for details!
