DATA 1 Organizing Deep Networks
Edouard Oyallon. Advisor: Stéphane Mallat.
Following the works of Laurent Sifre, Joan Bruna, …
Collaborators: Eugene Belilovsky, Sergey Zagoruyko, Bogdan Cirstea, Jörn Jacobsen, …
DATA 2 Classification of signals
• Let (X, Y) ∈ R^n × Y be random variables, n > 0.
• Problem: estimate ŷ such that ŷ = arg inf_{ỹ} E(|ỹ(X) − Y|).
• We are given a training set (x_i, y_i) ∈ R^n × Y to build ŷ.
• Say one can write ŷ = Classifier(Φx), the Classifier being built from (Φx_i, y_i).
• 3 ways to build Φ: supervised (from (x_i, y_i)_i), unsupervised (from (x_i)_i), or predefined (geometric priors).
(figure: a 2D toy example, n = 2, with a linear classifier w separating the two classes of Y)
A minimal sketch of this setup is given below.
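To make the setup concrete, here is a minimal sketch (not from the slides, with placeholder choices) of ŷ = Classifier(Φx): a representation Φ, here simply the raw signal standing in for a predefined map, and a simple classifier trained on the pairs (Φx_i, y_i).

```python
# Minimal sketch of y_hat = Classifier(Phi x) on a toy training set (x_i, y_i).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def phi(x):
    # Placeholder "predefined" representation: here simply the raw signal.
    # In practice Phi could be supervised (learned end-to-end), unsupervised,
    # or predefined from geometric priors.
    return x.reshape(len(x), -1)

# Toy training set (x_i, y_i) in R^n x {0, 1}
n = 64
x_train = rng.normal(size=(500, n))
y_train = (x_train.mean(axis=1) > 0).astype(int)

clf = LogisticRegression().fit(phi(x_train), y_train)   # Classifier built on (Phi x_i, y_i)
y_hat = clf.predict(phi(rng.normal(size=(5, n))))        # estimate labels of new samples
```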
DATA 3 High-dimensional classification
(x_i, y_i) ∈ R^{224²} × {1, …, 1000}, i < 10^6 → ŷ(x)?
• Estimation problem: use the training set to predict labels ("rhino" vs. not a "rhino").
• Datasets: Caltech 101, etc.
(figure: sample training images of rhinos and non-rhinos)
DATA 4 High-dimensional variabilities
• Claim: in R^n, n ≫ 1, the variance is huge.
  Ex.: if X ∼ N(0, I_n), then E(X) = 0 and ∃ C > 0, ∀ n, P(‖X‖ ≥ t) ≤ 2 e^{−t²/(Cn)}.
• Claim: small deformations (not parametric) can have huge effects.
  Ex.: for x ∈ L²(R^n), define L_τ x(u) = x(u − τ(u)) with τ ∈ C^∞. Taking τ(u) = ε and C ⊂ R² a thin set, ‖1_C − L_τ 1_C‖² = 2‖1_C‖².
• The variance is high, and the bias is difficult to estimate. There are also few available samples… How to handle that?
(figure: two nearly identical images x, y with ‖x − y‖² = 2)
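Both claims can be checked numerically. The sketch below (assuming only numpy) draws Gaussian vectors to see ‖X‖ grow like √n, and shifts a thin indicator image by one pixel to obtain a squared distance of 2‖1_C‖².

```python
# Numerical check of the two claims above.
import numpy as np

rng = np.random.default_rng(0)

# (i) For X ~ N(0, I_n), ||X|| is of order sqrt(n): typical distances blow up with n.
for n in (10, 100, 10000):
    X = rng.normal(size=(500, n))
    print(n, np.mean(np.linalg.norm(X, axis=1)) / np.sqrt(n))  # concentrates near 1

# (ii) Indicator of a thin set C in R^2, shifted by a single pixel (tau(u) = eps).
img = np.zeros((128, 128))
img[:, 64] = 1.0                      # 1_C : a thin vertical line
shifted = np.roll(img, 1, axis=1)     # L_tau 1_C
d2 = np.sum((img - shifted) ** 2)
print(d2, 2 * np.sum(img ** 2))       # squared distance equals 2 ||1_C||^2 : supports are disjoint
```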
DATA 5 Image variabilities
• Geometric variability: groups acting on images (translation, rotation, scaling); small deformations L_τ x(u) = x(u − τ(u)), τ ∈ C^∞ (warp I − τ); other sources: luminosity, occlusion.
• Class variability: intraclass variability (not informative) and extraclass variability.
• High variance: how to reduce it?
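As an illustration of the geometric variability, the following sketch (assuming scipy is available; the helper name `warp` is ours) implements the warping operator L_τ x(u) = x(u − τ(u)) for a small, smooth displacement field τ.

```python
# Sketch of the warping operator L_tau x(u) = x(u - tau(u)) on a 2D image.
import numpy as np
from scipy.ndimage import map_coordinates

def warp(x, tau_u, tau_v):
    """Apply L_tau to a 2D image x, where (tau_u, tau_v) is a smooth displacement field."""
    u, v = np.meshgrid(np.arange(x.shape[0]), np.arange(x.shape[1]), indexing="ij")
    coords = np.stack([u - tau_u, v - tau_v])          # sample x at u - tau(u)
    return map_coordinates(x, coords, order=1, mode="nearest")

x = np.zeros((64, 64)); x[24:40, 24:40] = 1.0          # a toy image
u, v = np.meshgrid(np.arange(64), np.arange(64), indexing="ij")
tau_u = 2.0 * np.sin(2 * np.pi * v / 64)               # small, smooth, non-parametric deformation
tau_v = np.zeros_like(tau_u)
x_warped = warp(x, tau_u, tau_v)
```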
DATA 6 Fighting the curse of dimensionality
• Objective: build a representation Φx of x such that a simple (say Euclidean) classifier can estimate the label ŷ.
(diagram: x ↦ Φx ∈ R^D ↦ w(Φx) ∈ R^d, with D ≫ d)
• Designing Φ consists of building an approximation of a low-dimensional space which is regular with respect to the class: ‖Φx − Φx'‖ ≪ 1 ⇒ ŷ(x) = ŷ(x').
• Necessary dimensionality reduction.
DATA 7
(figure: "Translation" and "Rotation" panels, pairs x, y of translated/rotated copies with ‖x − y‖² = 2)
• Averaging is the key to get invariants.
• Averaging makes the Euclidean distance meaningful in high dimension.
DATA 8 An example: invariance to translation
• Translation operator: L_a x(u) = x(u − a).
• In many cases, one wishes to be globally invariant to translation; a simple way is to perform an averaging: Ax = ∫ L_a x da = ∫ x(u) du. It's the 0 frequency! AL_a = A.
• Even if the averaging A can be localized, it keeps only the low-frequency structures: the invariance brings a loss of information!
• Bias issue! How do we recover the missing information?
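A quick numerical illustration of the two points above (a sketch, nothing more): the global average is invariant to circular shifts, and very different images can share the same average, so the invariant loses information.

```python
# Sketch: the global average A x (the 0 frequency) is translation invariant but not informative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 32))

Ax = x.mean()                                        # A x
Ax_shift = np.roll(x, (5, -3), axis=(0, 1)).mean()   # A L_a x
print(np.isclose(Ax, Ax_shift))                      # True: A L_a = A

y = rng.normal(size=(32, 32))
y += Ax - y.mean()                                   # a very different image with the same average
print(np.isclose(y.mean(), Ax), np.linalg.norm(x - y) > 0)   # the invariant loses information
```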
DATA 9 Necessary mechanism: separation and contraction
• In high dimension, typical distances are huge, thus an appropriate representation Φ must contract the space: ‖Φx − Φx'‖ ≤ ‖x − x'‖.
• While avoiding collapsing the different classes: ∃ ε > 0, y(x) ≠ y(x') ⇒ ‖Φx − Φx'‖ ≥ ε.
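These two properties can be probed empirically for any candidate representation. The sketch below (phi is a placeholder argument, not a specific method) computes the worst-case contraction ratio and the smallest between-class distance ε on a batch of labelled samples.

```python
# Sketch: empirical check of contraction and separation for a representation Phi.
import numpy as np

def contraction_and_margin(phi, x, y):
    z = phi(x)
    dx = np.linalg.norm(x[:, None] - x[None, :], axis=-1)   # ||x - x'||
    dz = np.linalg.norm(z[:, None] - z[None, :], axis=-1)   # ||Phi x - Phi x'||
    off = ~np.eye(len(x), dtype=bool)
    contraction = np.max(dz[off] / dx[off])                 # want <= 1 (the space is contracted)
    diff_class = off & (y[:, None] != y[None, :])
    margin = dz[diff_class].min()                           # want >= eps > 0 (classes do not collapse)
    return contraction, margin

x = np.random.default_rng(0).normal(size=(100, 20))
y = (x[:, 0] > 0).astype(int)
print(contraction_and_margin(lambda v: v / 10.0, x, y))     # a trivially contracting Phi
```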
DATA 10 Deep learning: technical breakthrough
• Deep learning has made it possible to solve a large number of tasks that were considered extremely challenging for a computer.
• The technique is generic, and its success implies that it reduces those sources of variability.
• The previous properties (invariance, separation, contraction) hold for deep learning.
• How, why?
DATA 11
• x_{j+1} = ρ W_j x_j, with W_j a linear operator and ρ a non-linear operator.
• Cascade: x_0 → ρW_0 → x_1 → ρW_1 → x_2 → … → ρW_{J−1} → x_J = Φx → classifier.
• In channel form: x_{j+1}(u, λ) = ρ( Σ_{λ̃} x_j(·, λ̃) ⋆ w_{j,λ,λ̃}(u) ); the kernels w_{j,λ,λ̃} are learned.
Ref.: ImageNet Classification with Deep Convolutional Neural Networks, A. Krizhevsky et al.
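A minimal PyTorch sketch of this cascade (an assumed toy architecture, not the network of the reference): each layer applies a learned convolution W_j followed by a pointwise non-linearity ρ, and the final representation x_J = Φx feeds a linear classifier.

```python
# Sketch of the cascade x_{j+1} = rho(W_j x_j) followed by a linear classifier.
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, depth=4, width=32, n_classes=10):
        super().__init__()
        layers, c_in = [], 3
        for _ in range(depth):                         # x_{j+1} = rho(W_j x_j)
            layers += [nn.Conv2d(c_in, width, 3, padding=1), nn.ReLU()]
            c_in = width
        self.features = nn.Sequential(*layers)         # Phi
        self.classifier = nn.Linear(width, n_classes)  # classifier on top of x_J

    def forward(self, x):
        x = self.features(x)                           # x_J = Phi x
        x = x.mean(dim=(2, 3))                         # spatial averaging before the classifier
        return self.classifier(x)

logits = SmallConvNet()(torch.randn(8, 3, 32, 32))
```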
DATA 12 Why mathematics about deep learning is important
• Pure black box. Few mathematical results are available; many rely on a "manifold hypothesis", which is clearly wrong (e.g., stability to diffeomorphisms). Ref.: Deep Roto-Translation Scattering for Object Classification, E. Oyallon and S. Mallat.
• No stability results. "Small" variations of the input can have a large impact on the output, and this happens. Ref.: Intriguing properties of neural networks, C. Szegedy et al.
• No generalization result. Rademacher complexity cannot explain the generalization properties. Ref.: Understanding deep learning requires rethinking generalization, C. Zhang et al.
• Shall we learn each layer from scratch (geometric priors)? The deep cascade makes the features hard to interpret.
DATA 13 Organization is a key
• Consider a questionnaire problem: people answer 0 or 1 to some questions. What does structuration mean? Ref.: Harmonic Analysis of Digital Data Bases, R. Coifman et al.
• One can organize the questions, organize the answers, or both; in general, structuration works tackle only one of the two aspects.
(figure: answers × questions matrices reordered along rows, columns, or both)
• With both organizations, neighbours become meaningful: local metrics.
DATA 14 Organization permits creation of invariance
• Once (all) the sources of regularity are captured, interpolating new points is possible (in statistical terms: the generalization property!).
• In the previous case, one can build a discriminative and invariant representation: for example, Haar wavelets on graphs. Ref.: Harmonic Analysis of Digital Data Bases, R. Coifman et al.
(figure: a Haar wavelet, with values +, +, −, 0, 0, supported on the organized questions × answers matrix)
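A toy sketch of the idea (simplified: a plain 1D Haar transform on reordered questions rather than true wavelets on a graph): once similar questions are made neighbours, the Haar average captures the invariant part and the detail coefficients capture the discriminative part.

```python
# Toy sketch: organize the questions, then take a 1D Haar transform of each answer vector.
import numpy as np

def haar_1d(v):
    """One full 1D Haar transform of a vector whose length is a power of two."""
    v = v.astype(float).copy()
    out, n = [], len(v)
    while n > 1:
        avg = (v[0:n:2] + v[1:n:2]) / 2.0
        diff = (v[0:n:2] - v[1:n:2]) / 2.0
        out.append(diff)          # detail (discriminative) coefficients
        v[: n // 2] = avg
        n //= 2
    out.append(v[:1])             # global average (invariant part)
    return np.concatenate(out[::-1])

answers = np.random.default_rng(0).integers(0, 2, size=(5, 16))   # 5 people, 16 questions (0/1)
order = np.argsort(answers.mean(axis=0))   # crude "organization" of the questions
coeffs = np.array([haar_1d(a[order]) for a in answers])
```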
DATA 15 Organising the CNN representation: local support vectors
Ref.: Building a Regular Decision Boundary with Deep Networks, E. Oyallon
• Consider a CNN of depth J. The local dimension is intractable!
• Local support vectors of order k at depth j: representations at depth j that are well classified by a k-NN but not by an l-NN for l < k (illustrated: 0-LSV, 2-LSV, 4-LSV, k-LSV with k > 6).
• They give a measure of the separation-contraction via the nested sets Γ_j^k, built from the counts card{ l ≤ k+1 : y(x_j^{(l)}) ≠ y(x_j) }, where x_j^{(l)} denotes the l-th nearest neighbour of x_j at depth j; the k-LSVs are the points of Γ_j^k ∖ Γ_j^{k+1}.
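The sketch below computes, under one possible reading of the definition above, the LSV order of each training representation at a given depth: the smallest k such that a k-NN vote over the other points predicts its label.

```python
# Sketch (assumed reading of the definition): order of each local support vector at depth j.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def lsv_order(z, y, k_max=15):
    """z: (N, d) representations at depth j, y: (N,) integer labels. Returns the LSV order per point."""
    nn = NearestNeighbors(n_neighbors=k_max + 1).fit(z)
    _, idx = nn.kneighbors(z)                 # idx[:, 0] is the point itself
    orders = np.full(len(z), k_max + 1)
    for i in range(len(z)):
        for k in range(1, k_max + 1):
            votes = y[idx[i, 1:k + 1]]        # labels of the k nearest neighbours
            if np.bincount(votes).argmax() == y[i]:
                orders[i] = k                 # well classified by a k-NN, not by any smaller one
                break
    return orders
```

Counting how many points have a given order at each depth j yields the complexity measure of the next slide.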
DATA 16 Complexity measure
(plot: number of k-local support vectors at different depths)
• A slow decay to the stationary regime indicates high complexity (separation).
• A small amount indicates contraction.
DATA 17 An organisation of the representation
• There is a progressive localisation, which explains why a 1-NN (or a Gaussian SVM) works better with depth: linear metrics are more meaningful in low dimension.
• How does the representation get localized? Necessary variability reduction.
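One way to quantify this progressive localisation (a sketch with hypothetical helper names; the per-depth representations are assumed to be precomputed) is to score a nearest-neighbour classifier on the representations x_j extracted at each depth j and watch the accuracy increase.

```python
# Sketch: 1-NN accuracy as a function of depth, on precomputed representations.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def knn_accuracy_per_depth(representations, y, k=1):
    """representations: list of (N, d_j) arrays, one per depth j (assumed precomputed)."""
    scores = []
    for z in representations:
        clf = KNeighborsClassifier(n_neighbors=k)
        scores.append(cross_val_score(clf, z, y, cv=5).mean())   # proxy for leave-one-out
    return scores
```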
DATA 18 Identifying the variabilities?
• Several works showed that a deep net exhibits some covariance. Ref.: Understanding deep features with computer-generated imagery, M. Aubry and B. Russell.
• Manifold of faces at a certain depth (figure).
• Can we use these? Ref.: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, A. Radford, L. Metz and S. Chintala.