  1. Structural Priors in Deep Neural Networks YANI IOANNOU, MAR. 12TH 2018

  2. About Me o Yani Ioannou (yu-an-nu) o Ph.D. Student, University of Cambridge o Dept. of Engineering, Machine Intelligence Lab o Prof. Roberto Cipolla, Dr. Antonio Criminisi (MSR) o Research scientist at Wayve o Self-driving car start-up in Cambridge o Have lived in 4 countries (Canada, UK, Cyprus and Japan)

  3. Research Background o M.Sc. Computing, Queen’s University o Prof. Michael Greenspan o 3D Computer Vision o Segmentation and recognition in massive unorganized point clouds of urban environments o “Difference of Normals” multi-scale operator (Published at 3DIMPVT)

  4. Research Background o Ph.D. Engineering, University of Cambridge (2014 - 2018) o Prof. Roberto Cipolla, Dr. Antonio Criminisi (Microsoft Research) o Microsoft PhD Scholarship, 9-month internship at Microsoft Research [Slide images dated c. 1496 and c. 2012]

  5. Ph.D. – Collaborative Work
  o Segmentation of brain tumour tissues with CNNs. D. Zikic, Y. Ioannou, M. Brown, A. Criminisi (MICCAI-BRATS 2014)
    o One of the first papers using deep learning for volumetric/medical imagery
  o Using CNNs for Malaria Diagnosis (Intellectual Ventures/Gates Foundation)
    o Designed a CNN for the classification of malaria parasites in blood smears
  o Measuring Neural Net Robustness with Constraints. O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. Nori, A. Criminisi (NIPS 2016)
    o Found that not all adversarial images can be used to improve network robustness
  o Refining Architectures of Deep Convolutional Neural Networks. S. Shankar, D. Robertson, Y. Ioannou, A. Criminisi, R. Cipolla (CVPR 2016)
    o Proposed a method for adapting neural network architectures to new datasets

  6. Ph.D. – First Author
  o Thesis: “Structural Priors in Deep Neural Networks”
  o Training CNNs with Low-Rank Filters for Efficient Image Classification. Yani Ioannou, Duncan Robertson, Jamie Shotton, Roberto Cipolla, Antonio Criminisi (ICLR 2016)
  o Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups. Yani Ioannou, Duncan Robertson, Roberto Cipolla, Antonio Criminisi (CVPR 2017)
  o Decision Forests, Convolutional Networks and the Models In-Between. Y. Ioannou, D. Robertson, D. Zikic, P. Kontschieder, J. Shotton, M. Brown, A. Criminisi (Microsoft Research Tech. Report, 2015)

  7. Motivation o Deep Neural Networks are massive! o AlexNet 1 (2012) o 61 million parameters o 724 million FLOPs o Most of the compute is in the conv. layers 1 Krizhevsky, Sutskever, and Hinton, “ImageNet Classification with Deep Convolutional Neural Networks” 2 He, Zhang, Ren, and Sun, “Deep Residual Learning for Image Recognition”

  8. Motivation o Deep Neural Networks are massive! o AlexNet 1 (2012) o 61 million parameters o 724 million FLOPs o 96% of the parameters are in the F.C. layers! 1 Krizhevsky, Sutskever, and Hinton, “ImageNet Classification with Deep Convolutional Neural Networks” 2 He, Zhang, Ren, and Sun, “Deep Residual Learning for Image Recognition”

  9. Motivation o Deep Neural Networks are massive! o AlexNet 1 (2012) o 61 million parameters o 7.24×10^8 FLOPs o ResNet-200 2 (2015) o 62.5 million parameters o 5.65×10^12 FLOPs o 2-3 weeks of training on 8 GPUs 1 Krizhevsky, Sutskever, and Hinton, “ImageNet Classification with Deep Convolutional Neural Networks” 2 He, Zhang, Ren, and Sun, “Deep Residual Learning for Image Recognition”
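For context, these parameter and FLOP figures follow directly from the layer shapes. Below is a minimal Python sketch (mine, not from the talk; the AlexNet conv1 shape used in the example is an assumption) of how the counts for a single convolutional layer are typically computed:

```python
# Illustrative sketch: counting parameters and multiply-accumulates (MACs)
# for one convolutional layer. Conventions differ on whether 1 MAC counts as
# 1 or 2 FLOPs; the counts here are MACs.

def conv_params(c_in, c_out, kh, kw):
    # one weight per input channel per kernel position per filter, plus biases
    return c_in * kh * kw * c_out + c_out

def conv_macs(c_in, c_out, kh, kw, h_out, w_out):
    # one multiply-accumulate per weight, at every output position
    return c_in * kh * kw * c_out * h_out * w_out

# Example (assumed shapes): AlexNet's first conv layer, 96 filters of 11x11x3
# producing a 55x55 output map.
print(conv_params(3, 96, 11, 11))        # 34,944 parameters
print(conv_macs(3, 96, 11, 11, 55, 55))  # ~105 million MACs
```

Summing such per-layer counts over the whole network gives totals like those quoted on the slides, and makes concrete why the fully connected layers dominate the parameter count while the convolutional layers dominate the compute.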

  10. Motivation o Until very recently, state-of-the-art DNNs for ImageNet were only getting more computationally complex o Each generation increased in depth and width o Is it necessary to increase complexity to improve generalization? [Plot: top-5 error (6%–18%) vs. log10(multiply-accumulate operations) for alexnet, vgg-11/13/16/19, googlenet, resnet-50, msra-a/b/c and pre-resnet-200, spanning roughly 10x–144x differences in compute, with crop & mirror vs. extra augmentation]

  11. Over-parameterization of DNNs o There are many proposed methods for improving the test-time efficiency of DNNs, showing that trained DNNs are over-parameterized: o Compression o Pruning o Reduced Representation

  12. Structural Prior Incorporating our prior knowledge of the problem and its representation into the connective structure of a neural network o Optimization of neural networks needs to learn what weights not to use o This is usually achieved with regularization o Can we instead use our prior knowledge of the problem and its representation to structure networks closer to the specialized components suited to learning from images? o Structural Priors ⊂ Network Architecture ◦ architecture is a more general term, e.g., number of layers, activation functions, pooling, etc.

  13. Regularization o Regularization does help training, but is not a substitute for good structural priors o MacKay (1991): regularization is not enough to make an over-parameterized network generalize as well as a network with a more appropriate parameterization o We liken regularization to a weak structural prior o Used where our only prior knowledge is that our network is greatly over-parameterized

  14. Rethinking Regularization o “Understanding deep learning requires rethinking generalization”, Zhang et al., 2016 o “Deep neural networks easily fit random labels.” o Identifies types of “regularization”: o “Explicit regularization” – e.g. weight decay, dropout and data augmentation o “Implicit regularization” – e.g. early stopping, batch normalization o “Network architecture” o Explicit regularization has little effect on fitting random labels, while implicit regularization and network architecture do o Highlights the importance of network architecture, and by extension structural priors, for good generalization

  15. Convolutional Neural Networks Prior Knowledge for Natural Images: o Local correlations are very important o -> Convolutional filters o We don’t need to learn a different filter for each pixel o -> Shared weights

  16. Convolutional Neural Networks: Structural Prior for Natural Images [Diagram: an H × W × c1 input image/feature map is convolved with c2 filters of size h1 × w1 × c1 (the parameters), followed by a ReLU, producing an H × W × c2 feature map]

  17. Convolutional Neural Networks: Structural Prior for Natural Images [Diagram comparing connection structures on a zero-padded 3 × 4 pixel input image: (a) a fully connected layer, where every output pixel has its own weight to every input pixel (no kernel); (b) a convolutional layer with a 3 × 3 square kernel, where each output pixel connects only to a local 3 × 3 neighbourhood and the same kernel weights are shared across the 4 × 3 output feature map]
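As a minimal PyTorch sketch of this structural prior (illustrative only; the layer sizes follow the slide's 3 × 4 pixel example), local connectivity and weight sharing make the parameter count of a convolutional layer tiny, and independent of image size, compared with a fully connected layer:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 4, 3)  # a single-channel 4x3 "image", as in the slide

# Fully connected: every output pixel gets its own weight to every input pixel.
fc = nn.Linear(4 * 3, 4 * 3)

# Convolutional: each output pixel sees only a 3x3 neighbourhood, and the
# same 3x3 kernel is reused at every position (shared weights).
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)

print(sum(p.numel() for p in fc.parameters()))    # 156 (144 weights + 12 biases)
print(sum(p.numel() for p in conv.parameters()))  # 10  (9 weights + 1 bias)
```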

  18. Ph.D. Thesis Outline My thesis is based on three novel contributions, which have explored separate aspects of structural priors in DNNs: I. Spatial Connectivity II. Inter-Filter Connectivity III. Conditional Connectivity

  19. Spatial Connectivity

  20. Spatial Connectivity Prior Knowledge: o Many of the filters learned in CNNs appear to be representing vertical/horizontal edges/relationships o Many others appear to be representable by combinations of low-rank filters o Previous work had shown that full-rank filters could be replaced with low-rank approximations, e.g. Jaderberg et al. (2014) Does every filter need to be square in a CNN?

  21. Approximated Low-Rank Filters [Diagram: the full h × w filters of a convolutional layer are approximated by a sequence of d vertical (h × 1) filters followed by horizontal (1 × w) filters] Jaderberg, Max, Andrea Vedaldi, and Andrew Zisserman (2014), “Speeding up Convolutional Neural Networks with Low Rank Expansions”.

  22. CNN with Low-Dimensional Embedding [Diagram: c2 filters of size h1 × w1 × c1 with ReLU, followed by c3 filters of size 1 × 1 × c2 with ReLU] Typical sub-architecture found in Network-in-Network, ResNet/Inception
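A minimal PyTorch sketch of this sub-architecture (channel sizes are assumptions chosen for illustration): a spatial convolution into a small number of channels, followed by a 1 × 1 convolution that mixes them back up, as in Network-in-Network / ResNet / Inception-style blocks:

```python
import torch.nn as nn

c1, c2, c3 = 64, 16, 64  # c2 << c1, c3: the low-dimensional embedding

block = nn.Sequential(
    nn.Conv2d(c1, c2, kernel_size=3, padding=1),  # c2 filters of 3x3xc1
    nn.ReLU(inplace=True),
    nn.Conv2d(c2, c3, kernel_size=1),             # c3 filters of 1x1xc2
    nn.ReLU(inplace=True),
)
```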

  23. Proposed: Low-Rank Basis [Diagram: the c2 filters applied to the H × W × c1 input are a mix of horizontal and vertical low-rank filters, followed by ReLU and c3 filters of size 1 × 1 × c2] Same total number of filters on each layer as the original network, but 50% are 1x3 and 50% are 3x1

  24. Proposed Structural Prior: Low-Rank + Full Basis [Diagram: as in the previous slide, but the c2 filters are a mix of 1 × 3, 3 × 1 and full 3 × 3 filters] 25% of the total filters are full 3x3
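A minimal PyTorch sketch of the idea behind these two slides (my own reconstruction, not the thesis code; the channel counts and the module name are assumptions): the layer keeps the same total number of filters, but splits them between 1 × 3, 3 × 1 and, optionally, full 3 × 3 shapes, concatenating their outputs:

```python
import torch
import torch.nn as nn

class LowRankBasisConv(nn.Module):
    def __init__(self, c_in, c_out, full_frac=0.0):
        super().__init__()
        n_full = int(round(c_out * full_frac))            # e.g. 25% full 3x3 filters
        n_low = c_out - n_full                            # remainder split into 1x3 / 3x1
        self.h = nn.Conv2d(c_in, n_low // 2, (1, 3), padding=(0, 1))            # 1x3 filters
        self.v = nn.Conv2d(c_in, n_low - n_low // 2, (3, 1), padding=(1, 0))    # 3x1 filters
        self.f = nn.Conv2d(c_in, n_full, 3, padding=1) if n_full > 0 else None  # full 3x3

    def forward(self, x):
        outs = [self.h(x), self.v(x)]
        if self.f is not None:
            outs.append(self.f(x))
        return torch.relu(torch.cat(outs, dim=1))

layer = LowRankBasisConv(64, 64, full_frac=0.25)  # low-rank + full mix
y = layer(torch.randn(1, 64, 32, 32))             # -> shape (1, 64, 32, 32)
```

With full_frac=0.0 this corresponds to the low-rank basis of slide 23 (50% 1x3, 50% 3x1); with full_frac=0.25 to the low-rank + full basis of slide 24.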

  25. Inception [Diagram: filters of sizes 1 × 1, 3 × 3, 5 × 5 and 7 × 7 (each over c1 input channels) applied in parallel to the H × W × c1 input, with fewer filters at the larger sizes] Learning a filter-size basis: learning many small filters (1x1, 3x3), and fewer of the larger (5x5, 7x7)
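For comparison, a minimal PyTorch sketch of an Inception-style filter-size basis (the channel splits are assumptions for illustration, not GoogLeNet's actual numbers): many small filters and progressively fewer large ones, applied in parallel and concatenated:

```python
import torch
import torch.nn as nn

class FilterSizeBasis(nn.Module):
    def __init__(self, c_in):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, 64, kernel_size=1)             # many 1x1 filters
        self.b3 = nn.Conv2d(c_in, 32, kernel_size=3, padding=1)  # fewer 3x3
        self.b5 = nn.Conv2d(c_in, 16, kernel_size=5, padding=2)  # fewer still 5x5
        self.b7 = nn.Conv2d(c_in, 8, kernel_size=7, padding=3)   # fewest 7x7

    def forward(self, x):
        # parallel branches concatenated along the channel dimension
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.b7(x)], dim=1)

y = FilterSizeBasis(64)(torch.randn(1, 64, 32, 32))  # -> shape (1, 120, 32, 32)
```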

  26. ImageNet Results o gmp: vgg-11 w/ global max pooling o gmp-lr-2x: 60% less computation o gmp-lr-join-wfull: 16% less computation, 1% pt. lower error

  27. Low-Rank Basis: Structural Prior for CNNs o VGG-11 on ILSVRC: 21% fewer parameters, 41% less computation (low-rank only) o or: 1% pt higher accuracy, 16% less computation (low/full-rank mix) [Diagram: the low-rank + full basis layer from the previous slides]
