Organising Deep Networks
Edouard Oyallon — advisor: Stéphane Mallat
Following the works of Laurent Sifre, Joan Bruna, …
Collaborators: Eugene Belilovsky, Sergey Zagoruyko, Jörn Jacobsen, …
High-Dimensional Classification
• Data: $(x_i, y_i) \in \mathbb{R}^{224^2} \times \{1, \dots, 1000\}$, $i \leq 10^6$. Goal: estimate $\hat y(x)$.
• Estimation problem: use the training set to predict the labels of new images.
• Example datasets: Caltech 101, etc.
[Figure: training images labelled "Rhino" and "Not a rhino", used to recognise "Rhinos" in new images.]
Fighting the Curse of Dimensionality
• Objective: build a representation $\Phi x$ of $x$ such that a simple (say Euclidean) classifier can estimate the label $\hat y$: $\Phi : \mathbb{R}^D \to \mathbb{R}^d$ with $d \ll D$.
• Designing $\Phi$ consists of building an approximation of a low-dimensional space which is regular with respect to the class: $\|\Phi x - \Phi x'\| \ll 1 \Rightarrow \hat y(x') = \hat y(x)$.
• Necessary dimensionality and variance reduction.
• Completely solved by the deep blackbox…
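A minimal sketch of the "simple Euclidean classifier on $\Phi x$" idea: a nearest-centroid classifier acting on features produced by any representation. The function names and the feature map are illustrative assumptions, not part of the original slides.

```python
import numpy as np

# Nearest-centroid (Euclidean) classifier operating on a representation Phi(x).
# "features" is assumed to be an (N, d) array of Phi(x_i); any feature map works.

def nearest_centroid_fit(features, labels):
    """Compute one centroid per class in the representation space."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(features, classes, centroids):
    """Assign each point to the class whose centroid is closest in Euclidean norm."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]
```

If $\Phi$ is regular with respect to the class in the sense above, such a trivially simple classifier is enough.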
Solving It: the Deep Network
• A deep network is a cascade of linear operators $W_j$ (learned from labelled data) and non-linear operators $\rho_j$:
  $x_{j+1} = \rho_j W_j x_j$, starting from $x_0 = x$, up to $x_J = \Phi x$, followed by a classifier.
Ref.: ImageNet Classification with Deep Convolutional Neural Networks, A. Krizhevsky et al.
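A minimal sketch of this cascade in PyTorch, assuming convolutional $W_j$ and ReLU non-linearities; the depth, widths and pooling are illustrative choices, not the architecture of any specific paper.

```python
import torch
import torch.nn as nn

# Cascade x_{j+1} = rho_j(W_j x_j), ending with a linear classifier on x_J = Phi(x).
class DeepCascade(nn.Module):
    def __init__(self, depth=5, width=64, n_classes=1000):
        super().__init__()
        layers = [nn.Conv2d(3, width, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 1):
            layers += [nn.Conv2d(width, width, 3, padding=1, stride=2), nn.ReLU()]
        self.features = nn.Sequential(*layers)          # computes x_J = Phi(x)
        self.classifier = nn.Linear(width, n_classes)   # final linear classifier

    def forward(self, x):
        phi = self.features(x).mean(dim=(2, 3))         # global average pooling
        return self.classifier(phi)
```

All the $W_j$ above are learned end-to-end from labelled data, which is precisely what the following slides question.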
Why Mathematics About Deep Learning Matters
• Pure black box. Few mathematical results are available; many rely on a "manifold hypothesis" (e.g. stability to diffeomorphisms).
• No stability results. "Small" variations of the input can have a large impact on the output — and this does happen.
  Ref.: Intriguing properties of neural networks, C. Szegedy et al.
• No generalisation result. Rademacher complexity cannot explain the generalisation properties.
  Ref.: Understanding deep learning requires rethinking generalization, C. Zhang et al.
• Shall we learn each layer from scratch? (geometric priors?) The deep cascade makes features hard to interpret.
  Ref.: Deep Roto-Translation Scattering for Object Classification, EO and S. Mallat
Organisation Is Key
• Consider a questionnaire problem: people answer 0 or 1 to a set of questions. What does organising this data mean?
  Ref.: Harmonic Analysis of Digital Data Bases, Coifman R. et al.
• One can organise the questions, organise the answers, or both; in general, existing works tackle only one of these aspects.
• When both are organised, neighbours become meaningful: local metrics appear.
Structuring the Input with the Scattering Transform
• The Scattering Transform $S_J$ is a deep local descriptor of image neighbourhoods of size $2^J$.
  Ref.: Group Invariant Scattering, S. Mallat
• It is a representation built via geometry, with limited learning (~SIFT).
  Ref.: Invariant Scattering Convolution Networks, J. Bruna and S. Mallat
• Successfully used in several applications where all variabilities are known:
  - Digits: small deformations + translation
  - Textures: rotation + scale
  Ref.: Rotation, Scaling and Deformation Invariant Scattering for Texture Discrimination, L. Sifre and S. Mallat
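A minimal usage sketch, assuming the kymatio package is available; it is not referenced in the slides, and the exact output layout may differ across versions.

```python
import torch
from kymatio.torch import Scattering2D  # assumption: kymatio is installed

# 2D scattering transform S_J on 32x32 images with J = 2, i.e. averaging over
# neighbourhoods of size 2^J = 4 pixels. No parameter here is learned.
scattering = Scattering2D(J=2, shape=(32, 32))
x = torch.randn(8, 3, 32, 32)   # a batch of 8 RGB images
s = scattering(x)               # scattering coefficients, spatially downsampled by 2^J
print(s.shape)                  # roughly (8, 3, K, 8, 8) for some number of scattering channels K
```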
Scattering on ImageNet: Geometry in CNNs
Ref.: Scaling the Scattering Transform: Deep Hybrid Networks, EO, E. Belilovsky, S. Zagoruyko
• Cascading a modern CNN (ResNet) on top of $S_J x$ leads to almost state-of-the-art results on ImageNet 2012 (~1.2M training images, 1000 classes):

  Model        | Accuracy | Depth | #params
  AlexNet      | 80.1     | 9     | 61M
  ResNet       | 88.8     | 18    | 11.7M
  Scat+ResNet  | 88.6     | 10    | 12.8M

• Demonstrates no loss of information, with fewer layers. What does learning add?
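A minimal sketch of such a hybrid network: a fixed scattering front-end followed by a small learned CNN. The channel counts, depth and pooling are illustrative assumptions, not the exact Scat+ResNet configuration of the table above.

```python
import torch
import torch.nn as nn
from kymatio.torch import Scattering2D  # assumed available, as in the previous sketch

class ScatteringHybrid(nn.Module):
    def __init__(self, J=2, shape=(224, 224), n_classes=1000):
        super().__init__()
        self.scattering = Scattering2D(J=J, shape=shape)   # fixed, not learned
        # 3 colour channels x 81 scattering paths for J=2 with the default orientations
        in_channels = 3 * 81
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(256, n_classes)                 # learned classifier

    def forward(self, x):
        s = self.scattering(x)        # (B, 3, K, H/2^J, W/2^J)
        s = s.flatten(1, 2)           # merge colour and scattering channels
        return self.fc(self.cnn(s).flatten(1))
```

Only the CNN on top of the scattering coefficients is trained, which is the sense in which geometry replaces part of the learning.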
Benchmarking Scattering + Small Data
Ref.: Scaling the Scattering Transform: Deep Hybrid Networks, EO, E. Belilovsky, S. Zagoruyko
• Adding a geometric prior regularises the CNN input in the limited-samples regime, without reducing the number of parameters.
• State-of-the-art results on STL-10 and CIFAR-10:
  - STL-10: 5k training images, 8k testing images, 10 classes, plus 100k unlabelled images (not used!)
  - CIFAR-10: 10 classes
[Figure: accuracy vs. number of CIFAR-10 training samples (100, 500, 1000, 50000) for CNN and Scattering+CNN; STL-10 accuracy: Scattering+CNN 76, Deep 70, Unsupervised 75.]
• Geometry helps.
Necessary Mechanism: Separation and Contraction
Ref.: Understanding Deep Convolutional Networks, S. Mallat
• In high dimension, typical distances are huge, so an appropriate representation must contract the space:
  $\|\Phi x - \Phi x'\| \leq \|x - x'\|$
• While avoiding a collapse of the different classes:
  $\exists\, \epsilon > 0,\ y(x) \neq y(x') \Rightarrow \|\Phi x - \Phi x'\| \geq \epsilon$
[Figure: $\Phi$ contracts the boundary of the training set while keeping distinct classes at distance $\epsilon$ from the classification boundary.]
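A toy sketch for checking these two properties empirically on labelled data; "phi_x" stands for any user-supplied representation $\Phi x$ and is an assumption of this example.

```python
import numpy as np

def contraction_ratio(x, phi_x):
    """Max of ||Phi(x)-Phi(x')|| / ||x-x'|| over pairs; a value <= 1 indicates contraction."""
    i, j = np.triu_indices(len(x), k=1)
    num = np.linalg.norm(phi_x[i] - phi_x[j], axis=1)
    den = np.linalg.norm(x[i] - x[j], axis=1)
    return (num / np.maximum(den, 1e-12)).max()

def separation_margin(phi_x, y):
    """Smallest representation distance between points of different classes (the epsilon above)."""
    i, j = np.triu_indices(len(phi_x), k=1)
    diff = y[i] != y[j]
    return np.linalg.norm(phi_x[i][diff] - phi_x[j][diff], axis=1).min()
```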
Complexity Measure
Ref.: Building a Regular Decision Boundary with Deep Networks, EO
• Measuring the complexity of the classification boundary (estimating the local dimensionality is hard).
• A deep network contracts the space progressively, layer after layer.
[Figure: per-layer classification accuracy and number of boundary points as a function of depth — the boundary becomes simpler with depth, which explains the improvement.]
• What variabilities are reduced?
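An illustrative proxy for this layer-by-layer analysis (not the boundary-point measure of the reference above): train a linear probe on each layer's features and watch accuracy improve with depth. The attribute `model.features` is assumed to be an `nn.Sequential`, as in the earlier cascade sketch.

```python
import torch
from sklearn.linear_model import LogisticRegression

def layerwise_probe_accuracy(model, x, y):
    """Fit a linear classifier on the features of each successive layer."""
    accs = []
    h = x
    with torch.no_grad():
        for layer in model.features:
            h = layer(h)
            feats = h.flatten(1).cpu().numpy()
            clf = LogisticRegression(max_iter=1000).fit(feats, y)
            accs.append(clf.score(feats, y))   # training accuracy as a crude proxy
    return accs
```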
Identifying the Variabilities?
• Several works showed that a deep network exhibits some covariance:
  Ref.: Understanding Deep Features with Computer-Generated Imagery, M. Aubry, B. Russell
• Manifold of faces at a certain depth (e.g. good interpolations):
  Ref.: Unsupervised Representation Learning with Deep Convolutional GANs, Radford, Metz & Chintala
• It is hard to enumerate them…
Flattening the Variability
Ref.: Multiscale Hierarchical Convolutional Networks, J. Jacobsen, EO, S. Mallat, A.W.M. Smeulders
• Defining an order on the neurons of each layer: organised vs. not organised layers.
[Figure: organised vs. not organised layers, and the resulting number of parameters.]
Conclusion
• Stability, generalisation results and interpretability are important aspects…
• Check the website of the DATA team: http://www.di.ens.fr/data/
• Check my webpage for software and papers: http://www.di.ens.fr/~oyallon/
Jörn Jacobsen, Eugene Belilovsky, Sergey Zagoruyko, Stéphane Mallat
Thank you!