#82 Adaptive Neural Trees

Ryutaro Tanno, Kai Arulkumaran, Daniel C. Alexander, Antonio Criminisi, Aditya Nori
Two Paradigms of Machine Learning

Deep Neural Networks 『 hierarchical representation of data 』 · Decision Trees 『 hierarchical clustering of data 』

[Figures: ImageNet classifiers with CNNs [Zeiler and Fergus, ECCV 2014], learning low-level features (oriented edges & colours), mid-level features (textures & patterns), and high-level features (object parts) before a trainable classifier; super-resolution of dMR brain images with a DT [Alexander et al., NeuroImage 2017], with splits separating water, grey matter, and white matter.]
Two Paradigms of Machine Learning

Deep Neural Networks 『 hierarchical representation of data 』
+ learn features of data
+ scalable learning with stochastic optimisation
- architectures are hand-designed
- heavy-weight inference, engaging every parameter of the model for each input

Decision Trees 『 hierarchical clustering of data 』
+ architectures are learned from data
+ lightweight inference, activating only a fraction of the model per input
- operate on hand-designed features
- limited expressivity with simple splitting functions
Joining the Paradigms: Adaptive Neural Trees

『 hierarchical representation of data 』 + 『 hierarchical clustering of data 』
+ learn features of data
+ scalable learning with stochastic optimisation
+ architectures are learned from data
+ lightweight inference, activating only a fraction of the model per input

ANTs unify the two paradigms and generalise previous work.
What are ANTs?

• ANTs consist of two key designs:
(1) DTs that use NNs for the feature transformation along every path and for every routing decision on an input x (see the node sketch below).
(2) DT-like architecture growth using SGD: at a target node, the tree either (a) splits or (b) deepens (see the growth sketch below).
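Design (1) can be pictured as a tree whose edges and nodes carry small NN modules. Below is a minimal PyTorch sketch of an internal node (a feature-learning "transformer" plus a routing NN) and a leaf "solver"; the class names, module choices, and shapes are illustrative assumptions, not the authors' reference implementation.

```python
import torch.nn as nn

class ANTNode(nn.Module):
    """Internal node: a transformer NN on the incoming edge plus a router NN."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Transformer: learns features along the edge
        # (hierarchical representation of the data).
        self.transformer = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Router: maps the transformed features to P(go left)
        # (hierarchical clustering of the data).
        self.router = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(out_ch, 1), nn.Sigmoid(),
        )
        self.left = None   # child: another ANTNode or an ANTLeaf
        self.right = None

    def forward(self, x):
        h = self.transformer(x)
        return h, self.router(h)  # features and left-routing probability


class ANTLeaf(nn.Module):
    """Leaf: a solver NN that turns features into a prediction."""

    def __init__(self, in_ch, n_classes):
        super().__init__()
        self.solver = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, n_classes), nn.Softmax(dim=-1),
        )
        self.left = self.right = None  # leaves have no children

    def forward(self, x):
        return self.solver(x)
```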
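Design (2) grows this tree during training. Below is a hedged sketch of one growth step, following the slide's split/deepen choice: build a "split" and a "deepen" candidate at a target leaf, optimise only the newly added modules with SGD, and keep whichever variant (including leaving the leaf as is) does best on validation data. The helpers `try_split`, `try_deepen`, `train_new_modules`, and `val_loss` are hypothetical, not a released API.

```python
def grow_leaf(tree, leaf, train_loader, val_loader):
    """One growth step at `leaf`: keep it, split it, or deepen it.

    try_split / try_deepen / train_new_modules / val_loss are
    hypothetical helpers used only for illustration.
    """
    candidates = {
        "keep": tree,
        "split": try_split(tree, leaf),    # add a router and two child solvers
        "deepen": try_deepen(tree, leaf),  # add a transformer and one solver
    }
    for name, cand in candidates.items():
        if name != "keep":
            # Local optimisation: SGD on the newly added modules only,
            # with the rest of the tree frozen.
            train_new_modules(cand, train_loader)
    # Keep whichever variant generalises best.
    return min(candidates.values(), key=lambda t: val_loss(t, val_loader))
```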
Conditional Computation

• Single-path inference enables efficient inference without compromising accuracy, relative to multi-path inference.

[Bar charts: errors (% on MNIST and CIFAR-10, mse on SARCOS) and number of parameters for ANT 1–3 under multi-path vs. single-path inference; errors are essentially unchanged while the number of parameters engaged per input drops ("Model size drops!").]
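In code, the two inference modes differ only in how the routing probabilities are used. A hedged sketch building on the `ANTNode`/`ANTLeaf` classes above (the recursion mirrors the slide's description, but the functions themselves are assumptions):

```python
def multi_path(node, x):
    """Soft inference: mix all leaf predictions, weighted by path probability."""
    if node.left is None:   # leaf solver
        return node(x)
    h, p_left = node(x)     # transformed features, P(go left), shape (N, 1)
    return (p_left * multi_path(node.left, h)
            + (1 - p_left) * multi_path(node.right, h))


def single_path(node, x):
    """Hard inference: greedily follow the most probable child at each router."""
    if node.left is None:
        return node(x)
    h, p_left = node(x)
    # Only one root-to-leaf path runs, so only a fraction of the model's
    # parameters is activated per input. Batch-wise routing is used here
    # for simplicity; per-example routing would index into the batch.
    child = node.left if p_left.mean().item() > 0.5 else node.right
    return single_path(child, h)
```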
Adaptive Model Complexity

• ANTs can tune the architecture to the availability of training data: models are trained on subsets of size 50, 250, 500, 2.5k, 5k, 25k, and 45k examples.
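A hedged sketch of this experimental setup: grow and train a fresh ANT on nested subsets of the training set. Only the subset sizes come from the slide; the dataset choice (CIFAR-10) and everything else are illustrative assumptions.

```python
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

full = datasets.CIFAR10("data", train=True, download=True,
                        transform=transforms.ToTensor())
for n in (50, 250, 500, 2_500, 5_000, 25_000, 45_000):
    idx = torch.randperm(len(full))[:n].tolist()
    subset = Subset(full, idx)
    print(f"grow and train a fresh ANT on {len(subset)} examples")
    # The grown architecture should end up smaller for small n and
    # larger for large n, adapting model complexity to data size.
```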
Unsupervised Hierarchical Clustering

Please come & see me at poster #82 for details!