Learning Music, Images and Physics with Deep Neural Networks


  1. Learning Music, Images and Physics with Deep Neural Networks. Joan Bruna, Matthew Hirn, Stéphane Mallat, Vincent Lostanlen, Edouard Oyallon, Nicolas Poilvert, Laurent Sifre, Irène Waldspurger. École Normale Supérieure, www.di.ens.fr/data

  2. High Dimensional Learning • High-dimensional $x = (x(1), \dots, x(d)) \in \mathbb{R}^d$. • Classification: estimate a class label $f(x)$ given $n$ sample values $\{x_i,\ y_i = f(x_i)\}_{i \le n}$. Image classification: $d = 10^6$. Huge variability inside classes. [Figure: example images from the classes Anchor, Joshua Tree, Beaver, Lotus, Water Lily.]

  3. High Dimensional Learning • High-dimensional $x = (x(1), \dots, x(d)) \in \mathbb{R}^d$. • Classification: estimate a class label $f(x)$ given $n$ sample values $\{x_i,\ y_i = f(x_i)\}_{i \le n}$. Audio: instrument recognition. Huge variability inside classes.

  4. High Dimensional Learning • High-dimensional $x = (x(1), \dots, x(d)) \in \mathbb{R}^d$. • Regression: approximate a functional $f(x)$ given $n$ sample values $\{x_i,\ y_i = f(x_i) \in \mathbb{R}\}_{i \le n}$. Physics: energy $f(x)$ of a state vector $x$. Examples: astronomy, quantum chemistry.

  5. Curse of Dimensionality • $f(x)$ can be approximated from examples $\{x_i, f(x_i)\}_i$ by local interpolation if $f$ is regular and there are close examples. • Need $\epsilon^{-d}$ points to cover $[0,1]^d$ at a Euclidean distance $\epsilon$ $\Rightarrow$ $\|x - x_i\|$ is always large. Huge variability inside classes.
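
The $\epsilon^{-d}$ count on this slide can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name `points_to_cover` and the chosen values of $\epsilon$ and $d$ are assumptions, not part of the slides.

```python
def points_to_cover(eps: float, d: int) -> float:
    """Order of magnitude of the number of samples needed to cover [0, 1]^d
    at resolution eps: (1 / eps) ** d, i.e. eps ** (-d)."""
    return (1.0 / eps) ** d


# With eps = 0.1 the count grows from 10 points in 1-D to 1e100 in 100-D.
for d in (1, 2, 10, 100):
    print(f"d = {d:3d}: about {points_to_cover(0.1, d):.3g} points")
```

Even for moderate $d$ the count exceeds any realistic training set, which is why local interpolation fails for $d = 10^6$.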

  6. Learning by Euclidean Embedding • Data: $x \in \mathbb{R}^d$, where $\|x - x'\|$ is non-informative. • Representation: $\Phi x \in \mathcal{H}$, with $\Phi x$ Gaussian and separated, followed by a linear classifier on $\|\Phi x - \Phi x'\|$. • "Similarity" metric: $\Delta(x, x')$. • Equivalent Euclidean metric: $C_1 \|\Phi x - \Phi x'\| \le \Delta(x, x') \le C_2 \|\Phi x - \Phi x'\|$. How to define $\Phi$?

  7. Deep Convolution Networks • The revival of an old (1950) idea: Y. LeCun, G. Hinton. • Architecture: $x \to L_1$ (linear convolution) $\to \rho$ (non-linear scalar "neuron", $\rho(u) = |u|$) $\to L_2$ (linear convolution) $\to \rho \to \dots \to \Phi(x) \to$ linear classification. • Optimize the $L_k$ with support constraints: over $10^9$ parameters. • Exceptional results for images, speech, bio-data classification. Products by Facebook, IBM, Google, Microsoft, Yahoo... Why does it work so well?
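
The cascade of linear convolutions $L_k$ and the pointwise non-linearity $\rho(u) = |u|$ can be sketched as follows. This is a simplified illustration, not the networks cited on the slide: the filters are random rather than learned, each channel is filtered separately (no summation across channels), and the names `conv_layer` and `network` are hypothetical.

```python
import numpy as np


def conv_layer(u, filters):
    """Linear operator L_k: circular convolution of every channel of u
    (shape (channels, n)) with every small-support filter."""
    n = u.shape[-1]
    u_hat = np.fft.fft(u, axis=-1)
    out = []
    for h in filters:
        h_hat = np.fft.fft(h, n)            # zero-pad the filter to length n
        out.append(np.fft.ifft(u_hat * h_hat, axis=-1))
    return np.concatenate(out, axis=0)


def network(x, layers):
    """Alternate linear convolutions and the modulus rho(u) = |u|."""
    u = x[np.newaxis, :]                    # one input channel
    for filters in layers:
        u = np.abs(conv_layer(u, filters))  # non-linear scalar "neuron"
    return u


rng = np.random.default_rng(0)
x = rng.standard_normal(64)
# Assumption: two layers with 3 and 2 random filters of support 5.
layers = [rng.standard_normal((3, 5)), rng.standard_normal((2, 5))]
features = network(x, layers)
print(features.shape)
```

The support constraint mentioned on the slide corresponds here to the short filter length; in a trained network the filter values would be optimized rather than drawn at random.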

  8. Overview • Deep multiscale networks: invariant and stable metrics on groups • Image classification • Models of audio and image textures: information theory • Learning physics: quantum chemistry energy regression

  9. Image Metrics • Low-dimensional "geometric shapes" $x(u)$, $x'(u)$ (classic mechanics). • Deformation metric (Grenander). Diffeomorphism action: $D_\tau x(u) = x(u - \tau(u))$. • $\Delta(x, x') \sim \min_\tau \|D_\tau x - x'\| + \|\nabla \tau\|_\infty \|x\|$, where the first term is invariant to translations and $\|\nabla \tau\|_\infty$ is the diffeomorphism amplitude.

  10. Image Metrics • High-dimensional textures $X(u)$: ergodic stationary processes (example: 2D turbulence). Highly non-Gaussian processes. • A Euclidean metric is a maximum likelihood on Gaussian models. • Can we find $\Phi$ so that $\Phi(X)$ is nearly Gaussian, without losing information?

  11. Euclidean Metric Embedding • Stability to additive perturbations: $\|\Phi x - \Phi x'\| \le C \|x - x'\|$. • Invariance to translations: $x_c(u) = x(u - c) \Rightarrow \Phi(x_c) = \Phi(x)$. • Stability to deformations: $x_\tau(u) = x(u - \tau(u)) \Rightarrow \|\Phi x - \Phi x_\tau\| \le C \|\nabla \tau\|_\infty \|x\|$. Failure of Fourier and classic invariants.
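
One way to see the "failure of Fourier" is to test the Fourier modulus $\Phi x = |\hat{x}|$ against the conditions above. The sketch below assumes a sampled periodic signal and a 1% dilation, $\tau(t) = -0.01\,t$; the signal, its length and the variable names are illustrative choices, not taken from the slides.

```python
import numpy as np

n = 1024
t = np.arange(n)


def fourier_modulus(x):
    """Candidate representation Phi x = |x_hat|."""
    return np.abs(np.fft.fft(x))


x = np.cos(2 * np.pi * 200 * t / n)          # high-frequency sinusoid
x_shift = np.roll(x, 37)                     # translation x(t - c)
x_dilated = np.cos(2 * np.pi * 202 * t / n)  # x((1 + s) t) with s = 0.01

# Translation invariance holds (up to rounding errors).
shift_err = np.linalg.norm(fourier_modulus(x) - fourier_modulus(x_shift))

# Deformation stability fails: the distance does not scale with s = 0.01.
rel_dist = (np.linalg.norm(fourier_modulus(x) - fourier_modulus(x_dilated))
            / np.linalg.norm(fourier_modulus(x)))

print(shift_err, rel_dist)
```

The spectral peaks of the original and dilated signals fall in different frequency bins, so the relative distance stays of order one however small the dilation is at that frequency.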

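  12. Wavelet Transform • Dilated wavelets: $\psi_\lambda(t) = 2^{-j/Q}\, \psi(2^{-j/Q} t)$ with $\lambda = 2^{-j/Q}$. • Q-constant band-pass filters $\hat\psi_\lambda$. [Figure: wavelets $\psi_\lambda(t)$, $\psi_{\lambda'}(t)$ and the spectra $|\hat\phi(\omega)|^2$, $|\hat\psi_\lambda(\omega)|^2$, $|\hat\psi_{\lambda'}(\omega)|^2$ along the frequency axis $\omega$.] • Wavelet transform: $Wx = \begin{pmatrix} x \star \phi_{2^J}(t) \\ x \star \psi_\lambda(t) \end{pmatrix}_{\lambda \le 2^J}$, where $x \star \phi_{2^J}$ is the average and $x \star \psi_\lambda$ carries the higher frequencies. • Preserves norm: $\|Wx\|^2 = \|x\|^2$.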
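
The norm-preservation property can be checked numerically on a discrete filter bank. The sketch below is an assumption-laden stand-in for the wavelets of the slides: it uses cosine bumps on a log-frequency axis (not the wavelets of the talk), defines the low-pass $\hat\phi$ as the complement that makes the frame tight for real signals, and the names `filter_bank` and `wavelet_transform` are hypothetical.

```python
import numpy as np


def filter_bank(n, J, Q=1):
    """Fourier transforms of J*Q analytic band-pass filters (centers
    0.5 * 2**(-k/Q)) and of a real low-pass filter, built so that
    |phi_hat(w)|^2 + 1/2 sum_k (|psi_k(w)|^2 + |psi_k(-w)|^2) = 1."""
    omega = np.fft.fftfreq(n)                # cycles per sample, in [-0.5, 0.5)
    pos = omega > 0
    psi_hats = np.zeros((J * Q, n))
    for k in range(J * Q):
        xi = 0.5 * 2.0 ** (-k / Q)           # center frequency
        v = Q * np.log2(omega[pos] / xi)     # offset in units of 1/Q octave
        psi_hats[k, pos] = np.sqrt(2.0) * np.cos(0.5 * np.pi * np.clip(v, -1.0, 1.0))
    flipped = np.roll(psi_hats[:, ::-1], 1, axis=1)   # psi_hat(-omega)
    energy = 0.5 * (psi_hats ** 2 + flipped ** 2).sum(axis=0)
    # The low-pass takes the remaining energy (including the lone Nyquist bin).
    phi_hat = np.sqrt(np.clip(1.0 - energy, 0.0, None))
    return phi_hat, psi_hats


def wavelet_transform(x, phi_hat, psi_hats):
    """Wx = (x * phi, (x * psi_lambda)_lambda), computed with the FFT."""
    x_hat = np.fft.fft(x)
    low = np.fft.ifft(x_hat * phi_hat).real                       # average
    high = np.fft.ifft(x_hat[np.newaxis, :] * psi_hats, axis=-1)  # band-pass
    return low, high


rng = np.random.default_rng(0)
x = rng.standard_normal(512)
phi_hat, psi_hats = filter_bank(len(x), J=6, Q=2)
low, high = wavelet_transform(x, phi_hat, psi_hats)
energy_out = np.sum(low ** 2) + np.sum(np.abs(high) ** 2)
energy_in = np.sum(x ** 2)
print(energy_in, energy_out)
```

Because the squared filter responses sum to one at every frequency, Parseval's identity gives $\|Wx\|^2 = \|x\|^2$ for real $x$; the parameter `Q` sets the number of filters per octave.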

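  13. Scale Separation with Wavelets • Complex wavelet: $\psi(t) = g(t) \exp(i \xi \cdot t)$, $t = (t_1, t_2)$. • Rotated and dilated: $\psi_\lambda(t) = 2^{-j}\, \psi(2^{-j} r_\theta t)$ with $\lambda = (2^j, \theta)$. [Figure: real parts and imaginary parts of the wavelets, and $|\hat\psi_\lambda(\omega)|^2$ in the frequency plane $(\omega_1, \omega_2)$.] • Wavelet transform: $Wx = \begin{pmatrix} x \star \phi_{2^J}(t) \\ x \star \psi_\lambda(t) \end{pmatrix}_{\lambda \le 2^J}$, where $x \star \phi_{2^J}$ is the average and $x \star \psi_\lambda$ carries the higher frequencies. • Preserves norm: $\|Wx\|^2 = \|x\|^2$.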

  14. Fast Wavelet Transform [Figure: wavelet coefficients $|x \star \psi_{2^1, \theta}|$ computed by $|W_1|$; scale axis from $2^0$ to $2^J$.]

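  15. Wavelet Transform [Figure: $|W_1|$ computes $|x \star \psi_{2^1, \theta}|$, $|x \star \psi_{2^2, \theta}|$, $|x \star \psi_{2^3, \theta}|$, ... at scales $2^1, 2^2, 2^3, \dots, 2^J$; depth corresponds to scale.] $x \star \phi_{2^J}$: locally invariant by translation. How to make everything invariant to translation?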

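  16. Wavelet Translation Invariance • First wavelet transform of $x(t)$: $W_1 x = \begin{pmatrix} x \star \phi_{2^J} \\ x \star \psi_{\lambda_1} \end{pmatrix}_{\lambda_1}$, with modulus $|W_1| x = \begin{pmatrix} x \star \phi_{2^J} \\ |x \star \psi_{\lambda_1}| \end{pmatrix}_{\lambda_1}$. • $x \star \phi_{2^J}(t)$: local translation invariance; full translation invariance for $2^J = \infty$. • Complex coefficients: $x \star \psi_{\lambda_1}(t) = x \star \psi^a_{\lambda_1}(t) + i\, x \star \psi^b_{\lambda_1}(t)$. • Modulus improves invariance: $|x \star \psi_{\lambda_1}(t)| = \sqrt{|x \star \psi^a_{\lambda_1}(t)|^2 + |x \star \psi^b_{\lambda_1}(t)|^2}$, but it is covariant. Averaged: $|x \star \psi_{\lambda_1}| \star \phi_{2^J}(t)$. • Second wavelet transform modulus: $|W_2|\, |x \star \psi_{\lambda_1}| = \begin{pmatrix} |x \star \psi_{\lambda_1}| \star \phi_{2^J}(t) \\ ||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}(t)| \end{pmatrix}_{\lambda_2}$.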

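  17. Scattering Transform [Figure: $x \to |W_1| \to$ the average $x \star \phi_{2^J}$ and the wavelet modulus coefficients $|x \star \psi_{\lambda_1}(t)|$, $|x \star \psi_{\lambda_1'}(t)|$, $|x \star \psi_{\lambda_1''}(t)|$, $|x \star \psi_{\lambda_1'''}(t)|$.]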

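  18. Scattering Transform [Figure: $x \to |W_1| \to \big(x \star \phi_{2^J},\ |x \star \psi_{\lambda_1}|\big)$; then $|x \star \psi_{\lambda_1}| \to |W_2| \to \big(|x \star \psi_{\lambda_1}| \star \phi_{2^J},\ ||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}(t)|\big)$.]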

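  19. Scattering Neural Network [Figure: $x \to |W_1| \to \big(x \star \phi_{2^J},\ |x \star \psi_{\lambda_1}|\big)$; $|x \star \psi_{\lambda_1}| \to |W_2| \to \big(|x \star \psi_{\lambda_1}| \star \phi_{2^J},\ ||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}|\big)$; $||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \to |W_3| \to \big(||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \phi_{2^J},\ |||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \psi_{\lambda_3}|\big)$.]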

  20. Scattering Properties $S_J x = \begin{pmatrix} x \star \phi_{2^J} \\ |x \star \psi_{\lambda_1}| \star \phi_{2^J} \\ ||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \phi_{2^J} \\ |||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \psi_{\lambda_3}| \star \phi_{2^J} \\ \vdots \end{pmatrix}_{\lambda_1, \lambda_2, \lambda_3, \dots} = \dots\, |W_3|\, |W_2|\, |W_1|\, x$. $W_k$ is unitary $\Rightarrow$ $|W_k|$ is contractive. Theorem: For appropriate wavelets, a scattering • is contractive: $\|S_J x - S_J y\| \le \|x - y\|$ ($L^2$ stability); • preserves norms: $\|S_J x\| = \|x\|$; • is translation invariant and stable to deformations: if $x_\tau(u) = x(u - \tau(u))$ then $\lim_{J \to \infty} \|S_J x_\tau - S_J x\| \le C \|\nabla \tau\|_\infty \|x\|$.
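
The cascade $|W_1|, |W_2|$ followed by averaging, and the contraction property of the theorem, can be illustrated on a 1-D signal. This is a minimal sketch under stated assumptions: the filters are cosine bumps on a log-frequency axis chosen so that each $W_k$ preserves the norm of real signals (they are not the wavelets of the talk), the cascade stops at order 2, and the names `dyadic_filters`, `conv` and `scattering` are hypothetical.

```python
import numpy as np


def dyadic_filters(n, J):
    """Fourier transforms of a low-pass phi and J analytic band-pass psi_j
    forming a tight frame for real signals (one filter per octave)."""
    omega = np.fft.fftfreq(n)
    pos = omega > 0
    psi = np.zeros((J, n))
    for j in range(J):
        v = np.log2(omega[pos] / (0.5 * 2.0 ** (-j)))
        psi[j, pos] = np.sqrt(2.0) * np.cos(0.5 * np.pi * np.clip(v, -1.0, 1.0))
    flipped = np.roll(psi[:, ::-1], 1, axis=1)        # psi_hat(-omega)
    energy = 0.5 * (psi ** 2 + flipped ** 2).sum(axis=0)
    phi = np.sqrt(np.clip(1.0 - energy, 0.0, None))
    return phi, psi


def conv(x, h_hat):
    """Circular convolution with a filter given by its Fourier transform."""
    return np.fft.ifft(np.fft.fft(x) * h_hat)


def scattering(x, phi, psi):
    """Orders 0, 1 and 2 of S_J x, one row per path (lambda_1, lambda_2)."""
    rows = [conv(x, phi).real]                         # x * phi
    for psi1 in psi:
        u1 = np.abs(conv(x, psi1))                     # |x * psi_l1|
        rows.append(conv(u1, phi).real)                # |x * psi_l1| * phi
        for psi2 in psi:
            u2 = np.abs(conv(u1, psi2))                # ||x * psi_l1| * psi_l2|
            rows.append(conv(u2, phi).real)
    return np.array(rows)


rng = np.random.default_rng(1)
x = rng.standard_normal(256)
y = rng.standard_normal(256)
phi, psi = dyadic_filters(256, J=4)
Sx, Sy = scattering(x, phi, psi), scattering(y, phi, psi)

print(np.linalg.norm(Sx - Sy), "<=", np.linalg.norm(x - y))
```

With a finite averaging scale the output is still covariant to circular shifts, so `scattering(np.roll(x, 5), phi, psi)` equals `np.roll(Sx, 5, axis=1)`; averaging each row over all of time (`Sx.mean(axis=1)`) plays the role of the limit $2^J = \infty$ and is fully shift invariant.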

  21. Digit Classification: MNIST (Joan Bruna) • Pipeline: $x \to S_J x \to$ linear classifier $\to y = f(x)$. • Classification errors for a training size of 50000: Conv. Net. 0.5% (LeCun et al.), Scattering 0.4%.

  22. Classification of Textures (J. Bruna) • CUReT database: 61 classes. • Pipeline: texture $x \to$ scattering moments $S_J x$ (with $2^J$ = image size) $\to$ linear classifier $\to y = f(x)$. • Classification errors with 46 training samples per class: Fourier spectrum 1%, histogram features 1%, scattering 0.2%.

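  23. Scattering Moments of Processes • The scattering transform of a stationary process $X(t)$: $S_J X = \begin{pmatrix} X \\ |X \star \psi_{\lambda_1}| \\ ||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \\ |||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \psi_{\lambda_3}| \\ \vdots \end{pmatrix}_{\lambda_1, \lambda_2, \lambda_3, \dots} \star\, \phi_{2^J}$, which is Gaussian for $2^J$ large. • If $X$ is ergodic, then as $J \to \infty$, $S_J X$ converges to $E(SX) = \begin{pmatrix} E(X) \\ E(|X \star \psi_{\lambda_1}|) \\ E(||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}|) \\ E(|||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \psi_{\lambda_3}|) \\ \vdots \end{pmatrix}_{\lambda_1, \lambda_2, \lambda_3, \dots}$.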

  24. Representation of Random Processes $E(SX) = \begin{pmatrix} E(X) = E(U_0 X) \\ E(|X \star \psi_{\lambda_1}|) = E(U_1 X) \\ E(||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}|) = E(U_2 X) \\ E(|||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \psi_{\lambda_3}|) = E(U_3 X) \\ \vdots \end{pmatrix}_{\lambda_1, \lambda_2, \lambda_3, \dots}$. Theorem (Boltzmann): The distribution $p(x)$ which satisfies $\int_{\mathbb{R}^N} U_m x\ p(x)\, dx = E(U_m X)$ with a maximum entropy $H_{\max} = -\int p(x) \log p(x)\, dx$ is $p(x) = \frac{1}{Z} \exp\Big( \sum_{m=1}^{\infty} \lambda_m \cdot U_m x \Big)$. $H_{\max} \ge H(X)$ (entropy of $X$). Little loss of information: $H_{\max} \approx H(X)$.
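
The exponential form in the theorem follows from a standard Lagrange-multiplier argument, sketched here for convenience. The multiplier $\mu$ for the normalization constraint is notation introduced for this sketch and does not appear on the slide.

```latex
\begin{aligned}
\mathcal{L}(p) &= -\int p(x)\log p(x)\,dx
  + \sum_{m} \lambda_m \cdot \Big(\int U_m x\; p(x)\,dx - E(U_m X)\Big)
  + \mu\Big(\int p(x)\,dx - 1\Big),\\[4pt]
\frac{\partial \mathcal{L}}{\partial p(x)} &= -\log p(x) - 1
  + \sum_{m} \lambda_m \cdot U_m x + \mu = 0,\\[4pt]
p(x) &= e^{\mu - 1}\exp\Big(\sum_{m} \lambda_m \cdot U_m x\Big)
  = \frac{1}{Z}\exp\Big(\sum_{m} \lambda_m \cdot U_m x\Big),
\qquad Z = \int \exp\Big(\sum_{m} \lambda_m \cdot U_m x\Big)\,dx .
\end{aligned}
```

Since the distribution of $X$ itself satisfies the moment constraints, its entropy cannot exceed the maximum, which gives $H_{\max} \ge H(X)$.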

  25. Ergodic Texture Reconstructions (Joan Bruna) [Figure: original textures, including 2D turbulence; Gaussian process model with the same second order moments; second order Gaussian scattering.] Second order Gaussian scattering: $O(\log N^2)$ moments $E(|x \star \psi_{\lambda_1}|)$, $E(||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}|)$.

  26. Representation of Audio Textures (Joan Bruna) [Figure: time-frequency images (axes $t$ and $\omega$) of three sounds, Applause, Paper and Cocktail Party, in three columns: Original, Gaussian in time, Gaussian in scattering.]

  27. Failures: Harmonic Sounds (V. Lostanlen) Need to express frequency channel interactions: time-frequency image. [Figure: examples for Bird, Speech, Cello.]

  28. Harmonic Spiral (V. Lostanlen) Need to capture frequency variability and structures. [Figure: frequencies $\lambda$ arranged on a spiral, indexed by octave $j$ and angle $\theta$.] • Alignment of harmonics in two main groups. More regular variations along $(\theta, j)$ than $\lambda$.

  29. Rotation and Scaling Invariance (Laurent Sifre) • UIUC database: 25 classes. • Scattering classification errors with 20 training samples: translation scattering 20%.

  30. Extension to Rigid Movements (Laurent Sifre) Need to capture the variability of spatial directions. • Group of rigid displacements: translations and rotations. • Action on wavelet coefficients: a rotation and translation of the image, $x(u) \to x(r_\alpha(u - c))$, becomes after $|W_1|$ a rotation and translation in $u$ together with an angle translation in $\theta$: $x_j(u, \theta) = |x \star \psi_{2^j, \theta}(u)| \to x_j(r_\alpha(u - c), \theta - \alpha)$. Both $x(u)$ and $x(r_\alpha(u - c))$ have the same average $\int x(u)\, du$.

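  31. Extension to Rigid Movements (Laurent Sifre) • To build invariants: second wavelet transform on $L^2(G)$: convolutions of $x_j(u, \theta)$ with wavelets $\psi_{\lambda_2}(u, \theta)$: $x_j \circledast \psi_{\lambda_2}(u, \theta) = \int_{\mathbb{R}^2} \int_0^{2\pi} x_j(v, \alpha)\, \psi_{\lambda_2}(u - v, \theta - \alpha)\, dv\, d\alpha$. • Scattering on rigid movements: $x(u) \to |W_1|$ (wavelets on translations) $\to x_j(u, \theta) \to |W_2|$ (wavelets on rigid movements) $\to |x_j \circledast \psi_{\lambda_2}(v, \theta)| \to |W_3|$ (wavelets on rigid movements). The invariants output at each layer are $\int x(u)\, du$, $\int x_j(u, \theta)\, du\, d\theta$ and $\int |x_j \circledast \psi_{\lambda_2}(v, \theta)|\, du\, d\theta$.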
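
The convolution over $(u, \theta)$ written on this slide can be discretized as a circular convolution on a grid. The sketch below follows the slide's formula literally and makes several assumptions: the spatial domain is periodic, the arrays are random placeholders rather than actual wavelet coefficients, and only shifts of $u$ and of the angle $\theta$ are tested (an exact rotation $r_\alpha$ of the sampling grid is not implemented). The names `rigid_conv` and `invariant` are hypothetical.

```python
import numpy as np


def rigid_conv(x_j, psi):
    """Circular convolution over (u1, u2, theta) of two arrays of identical
    shape, computed with a 3-D FFT."""
    return np.fft.ifftn(np.fft.fftn(x_j) * np.fft.fftn(psi))


def invariant(x_j, psi):
    """Discrete counterpart of the integral of |x_j conv psi| over u and theta."""
    return np.abs(rigid_conv(x_j, psi)).sum()


rng = np.random.default_rng(2)
# Placeholder data: 8 x 8 spatial grid and 4 orientations.
x_j = rng.standard_normal((8, 8, 4))
psi = rng.standard_normal((8, 8, 4)) + 1j * rng.standard_normal((8, 8, 4))

# Shift the position by (2, 3) pixels and the angle by one orientation step.
x_moved = np.roll(x_j, shift=(2, 3, 1), axis=(0, 1, 2))

print(invariant(x_j, psi), invariant(x_moved, psi))
```

The convolution itself is covariant to these shifts, and summing the modulus over all positions and angles removes the remaining dependence, which is the mechanism the slide uses to build invariants.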
