Organising Deep Networks
Edouard Oyallon — advisor: Stéphane Mallat
Following the works of Laurent Sifre, Joan Bruna, …
Collaborators: Eugene Belilovsky, Sergey Zagoruyko, Jörn Jacobsen, …
High-Dimensional Classification
• Data: $(x_i, y_i) \in \mathbb{R}^{224^2} \times \{1, \dots, 1000\}$, $i \leq 10^6$. Goal: estimate $\hat y(x)$.
• Estimation problem: use the training set to predict the labels of new images.
• Example datasets: Caltech 101, etc.
[Figure: training images labelled "Rhino" and "Not a rhino", used to recognise "Rhinos" in new images.]
Fighting the Curse of Dimensionality
• Objective: build a representation $\Phi x$ of $x$ such that a simple (say Euclidean) classifier can estimate the label $\hat y$: $\Phi : \mathbb{R}^D \to \mathbb{R}^d$ with $d \ll D$.
• Designing $\Phi$ consists of building an approximation of a low-dimensional space which is regular with respect to the class: $\|\Phi x - \Phi x'\| \ll 1 \Rightarrow \hat y(x') = \hat y(x)$.
• Necessary dimensionality and variance reduction.
• Completely solved by the deep blackbox…
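A minimal sketch of the "simple Euclidean classifier on $\Phi x$" idea: a nearest-centroid classifier acting on features produced by any representation. The function names and the feature map are illustrative assumptions, not part of the original slides.

```python
import numpy as np

# Nearest-centroid (Euclidean) classifier operating on a representation Phi(x).
# "features" is assumed to be an (N, d) array of Phi(x_i); any feature map works.

def nearest_centroid_fit(features, labels):
    """Compute one centroid per class in the representation space."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(features, classes, centroids):
    """Assign each point to the class whose centroid is closest in Euclidean norm."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]
```

If $\Phi$ is regular with respect to the class in the sense above, such a trivially simple classifier is enough.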
Solving It: the Deep Network
• A deep network is a cascade of linear operators $W_j$ (learned from labelled data) and non-linear operators $\rho_j$:
  $x_{j+1} = \rho_j W_j x_j$, starting from $x_0 = x$, up to $x_J = \Phi x$, followed by a classifier.
Ref.: ImageNet Classification with Deep Convolutional Neural Networks, A. Krizhevsky et al.
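A minimal sketch of this cascade in PyTorch, assuming convolutional $W_j$ and ReLU non-linearities; the depth, widths and pooling are illustrative choices, not the architecture of any specific paper.

```python
import torch
import torch.nn as nn

# Cascade x_{j+1} = rho_j(W_j x_j), ending with a linear classifier on x_J = Phi(x).
class DeepCascade(nn.Module):
    def __init__(self, depth=5, width=64, n_classes=1000):
        super().__init__()
        layers = [nn.Conv2d(3, width, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 1):
            layers += [nn.Conv2d(width, width, 3, padding=1, stride=2), nn.ReLU()]
        self.features = nn.Sequential(*layers)          # computes x_J = Phi(x)
        self.classifier = nn.Linear(width, n_classes)   # final linear classifier

    def forward(self, x):
        phi = self.features(x).mean(dim=(2, 3))         # global average pooling
        return self.classifier(phi)
```

All the $W_j$ above are learned end-to-end from labelled data, which is precisely what the following slides question.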
Why Mathematics About Deep Learning Matters
• Pure black box. Few mathematical results are available; many rely on a "manifold hypothesis" (e.g. stability to diffeomorphisms).
• No stability results. "Small" variations of the input can have a large impact on the output — and this does happen.
  Ref.: Intriguing properties of neural networks, C. Szegedy et al.
• No generalisation result. Rademacher complexity cannot explain the generalisation properties.
  Ref.: Understanding deep learning requires rethinking generalization, C. Zhang et al.
• Shall we learn each layer from scratch? (geometric priors?) The deep cascade makes features hard to interpret.
  Ref.: Deep Roto-Translation Scattering for Object Classification, EO and S. Mallat
Organisation Is Key
• Consider a questionnaire problem: people answer 0 or 1 to a set of questions. What does organising this data mean?
  Ref.: Harmonic Analysis of Digital Data Bases, Coifman R. et al.
• One can organise the questions, organise the answers, or both; in general, existing works tackle only one of these aspects.
• When both are organised, neighbours become meaningful: local metrics appear.
Structuring the Input with the Scattering Transform
• The Scattering Transform $S_J$ is a deep local descriptor of image neighbourhoods of size $2^J$.
  Ref.: Group Invariant Scattering, S. Mallat
• It is a representation built via geometry, with limited learning (~SIFT).
  Ref.: Invariant Scattering Convolution Networks, J. Bruna and S. Mallat
• Successfully used in several applications where all variabilities are known:
  - Digits: small deformations + translation
  - Textures: rotation + scale
  Ref.: Rotation, Scaling and Deformation Invariant Scattering for Texture Discrimination, L. Sifre and S. Mallat
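A minimal usage sketch, assuming the kymatio package is available; it is not referenced in the slides, and the exact output layout may differ across versions.

```python
import torch
from kymatio.torch import Scattering2D  # assumption: kymatio is installed

# 2D scattering transform S_J on 32x32 images with J = 2, i.e. averaging over
# neighbourhoods of size 2^J = 4 pixels. No parameter here is learned.
scattering = Scattering2D(J=2, shape=(32, 32))
x = torch.randn(8, 3, 32, 32)   # a batch of 8 RGB images
s = scattering(x)               # scattering coefficients, spatially downsampled by 2^J
print(s.shape)                  # roughly (8, 3, K, 8, 8) for some number of scattering channels K
```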
Scattering on ImageNet: Geometry in CNNs
Ref.: Scaling the Scattering Transform: Deep Hybrid Networks, EO, E. Belilovsky, S. Zagoruyko
• Cascading a modern CNN (ResNet) on top of $S_J x$ leads to almost state-of-the-art results on ImageNet 2012 (~1.2M training images, 1000 classes):

  Model        | Accuracy | Depth | #params
  AlexNet      | 80.1     | 9     | 61M
  ResNet       | 88.8     | 18    | 11.7M
  Scat+ResNet  | 88.6     | 10    | 12.8M

• Demonstrates no loss of information, with fewer layers. What does learning add?
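A minimal sketch of such a hybrid network: a fixed scattering front-end followed by a small learned CNN. The channel counts, depth and pooling are illustrative assumptions, not the exact Scat+ResNet configuration of the table above.

```python
import torch
import torch.nn as nn
from kymatio.torch import Scattering2D  # assumed available, as in the previous sketch

class ScatteringHybrid(nn.Module):
    def __init__(self, J=2, shape=(224, 224), n_classes=1000):
        super().__init__()
        self.scattering = Scattering2D(J=J, shape=shape)   # fixed, not learned
        # 3 colour channels x 81 scattering paths for J=2 with the default orientations
        in_channels = 3 * 81
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(256, n_classes)                 # learned classifier

    def forward(self, x):
        s = self.scattering(x)        # (B, 3, K, H/2^J, W/2^J)
        s = s.flatten(1, 2)           # merge colour and scattering channels
        return self.fc(self.cnn(s).flatten(1))
```

Only the CNN on top of the scattering coefficients is trained, which is the sense in which geometry replaces part of the learning.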
Benchmarking Scattering + Small Data
Ref.: Scaling the Scattering Transform: Deep Hybrid Networks, EO, E. Belilovsky, S. Zagoruyko
• Adding a geometric prior regularises the CNN input in the limited-samples regime, without reducing the number of parameters.
• State-of-the-art results on STL-10 and CIFAR-10:
  - STL-10: 5k training images, 8k testing images, 10 classes, plus 100k unlabelled images (not used!)
  - CIFAR-10: 10 classes
[Figure: accuracy vs. number of CIFAR-10 training samples (100, 500, 1000, 50000) for CNN and Scattering+CNN; STL-10 accuracy: Scattering+CNN 76, Deep 70, Unsupervised 75.]
• Geometry helps.
Necessary Mechanism: Separation and Contraction
Ref.: Understanding Deep Convolutional Networks, S. Mallat
• In high dimension, typical distances are huge, so an appropriate representation must contract the space:
  $\|\Phi x - \Phi x'\| \leq \|x - x'\|$
• While avoiding a collapse of the different classes:
  $\exists\, \epsilon > 0,\ y(x) \neq y(x') \Rightarrow \|\Phi x - \Phi x'\| \geq \epsilon$
[Figure: $\Phi$ contracts the boundary of the training set while keeping distinct classes at distance $\epsilon$ from the classification boundary.]
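A toy sketch for checking these two properties empirically on labelled data; "phi_x" stands for any user-supplied representation $\Phi x$ and is an assumption of this example.

```python
import numpy as np

def contraction_ratio(x, phi_x):
    """Max of ||Phi(x)-Phi(x')|| / ||x-x'|| over pairs; a value <= 1 indicates contraction."""
    i, j = np.triu_indices(len(x), k=1)
    num = np.linalg.norm(phi_x[i] - phi_x[j], axis=1)
    den = np.linalg.norm(x[i] - x[j], axis=1)
    return (num / np.maximum(den, 1e-12)).max()

def separation_margin(phi_x, y):
    """Smallest representation distance between points of different classes (the epsilon above)."""
    i, j = np.triu_indices(len(phi_x), k=1)
    diff = y[i] != y[j]
    return np.linalg.norm(phi_x[i][diff] - phi_x[j][diff], axis=1).min()
```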
Complexity Measure
Ref.: Building a Regular Decision Boundary with Deep Networks, EO
• Measuring the complexity of the classification boundary (estimating the local dimensionality is hard).
• A deep network contracts the space progressively, layer after layer.
[Figure: per-layer classification accuracy and number of boundary points as a function of depth — the boundary becomes simpler with depth, which explains the improvement.]
• What variabilities are reduced?
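An illustrative proxy for this layer-by-layer analysis (not the boundary-point measure of the reference above): train a linear probe on each layer's features and watch accuracy improve with depth. The attribute `model.features` is assumed to be an `nn.Sequential`, as in the earlier cascade sketch.

```python
import torch
from sklearn.linear_model import LogisticRegression

def layerwise_probe_accuracy(model, x, y):
    """Fit a linear classifier on the features of each successive layer."""
    accs = []
    h = x
    with torch.no_grad():
        for layer in model.features:
            h = layer(h)
            feats = h.flatten(1).cpu().numpy()
            clf = LogisticRegression(max_iter=1000).fit(feats, y)
            accs.append(clf.score(feats, y))   # training accuracy as a crude proxy
    return accs
```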
Identifying the Variabilities?
• Several works showed that a deep network exhibits some covariance:
  Ref.: Understanding Deep Features with Computer-Generated Imagery, M. Aubry, B. Russell
• Manifold of faces at a certain depth (e.g. good interpolations):
  Ref.: Unsupervised Representation Learning with Deep Convolutional GANs, Radford, Metz & Chintala
• It is hard to enumerate them…
Flattening the Variability
Ref.: Multiscale Hierarchical Convolutional Networks, J. Jacobsen, EO, S. Mallat, A.W.M. Smeulders
• Defining an order on the neurons of each layer: organised vs. not organised layers.
[Figure: organised vs. not organised layers, and the resulting number of parameters.]
Conclusion
• Stability, generalisation results and interpretability are important aspects…
• Check the website of the DATA team: http://www.di.ens.fr/data/
• Check my webpage for software and papers: http://www.di.ens.fr/~oyallon/
Jörn Jacobsen, Eugene Belilovsky, Sergey Zagoruyko, Stéphane Mallat
Thank you!