Harmonic Analysis of Deep Convolutional Networks
Yuan Yao, HKUST
Based on talks by Mallat, Bolcskei, etc.
Acknowledgement: a follow-up course at HKUST: https://deeplearning-math.github.io/
High Dimensional Natural Image Classification
• High-dimensional data $x = (x(1), \ldots, x(d)) \in \mathbb{R}^d$.
• Classification: estimate a class label $f(x)$ given $n$ sample values $\{x_i, y_i = f(x_i)\}_{i \le n}$.
• Image classification: $d = 10^6$. Huge variability inside classes (figure: example classes Anchor, Joshua Tree, Beaver, Lotus Water Lily). Goal: find invariants.
Curse of Dimensionality
• Analysis in high dimension: $x \in \mathbb{R}^d$ with $d \ge 10^6$.
• Points are far away in high dimension $d$:
  - 10 points cover $[0,1]$ at a distance $10^{-1}$;
  - 100 points are needed for $[0,1]^2$;
  - $10^d$ points are needed for $[0,1]^d$: impossible if $d \ge 20$.
• Points are concentrated in the $2^d$ corners: $\lim_{d \to \infty} \dfrac{\mathrm{volume}(\text{sphere of radius } r)}{\mathrm{volume}([0,r]^d)} = 0$.
⇒ Euclidean metrics are not appropriate on raw data.
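A minimal numerical sketch (assuming numpy and scipy are available) of the volume-concentration fact above: the ratio between the volume of a ball of radius $r$ and the cube $[0,r]^d$ tends to 0 as $d$ grows.

```python
import numpy as np
from scipy.special import gammaln

def log_ball_to_cube_ratio(d):
    # volume(ball of radius r) = pi^(d/2) r^d / Gamma(d/2 + 1)
    # volume([0, r]^d)         = r^d
    # The r^d factors cancel, so the ratio depends only on d.
    return 0.5 * d * np.log(np.pi) - gammaln(d / 2 + 1)

for d in [2, 5, 10, 20, 50, 100]:
    print(d, np.exp(log_ball_to_cube_ratio(d)))  # tends to 0 very fast as d grows
```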
A Blessing from the Physical World? Multiscale "Compositional" Sparsity
• Variables $x(u)$ indexed by a low-dimensional $u$: time/space... pixels in images, particles in physics, words in text...
• Multiscale interactions of $d$ variables: from $d^2$ pairwise interactions to $O(\log^2 d)$ multiscale interactions.
• Multiscale analysis: wavelets on groups of symmetries; hierarchical architecture.
Learning as an Approximation
• To estimate $f(x)$ from a sampling $\{x_i, y_i = f(x_i)\}_{i \le M}$ we must build an $M$-parameter approximation $f_M$ of $f$.
• Precise sparse approximation requires some "regularity".
• For binary classification: $f(x) = 1$ if $x \in \Omega$ and $f(x) = -1$ if $x \notin \Omega$, so $f(x) = \mathrm{sign}(\tilde f(x))$ where $\tilde f$ is potentially regular.
• What type of regularity? How to compute $f_M$?
1 Hidden Layer Neural Networks
• One-hidden-layer neural network: $f_M(x) = \sum_{n=1}^{M} \alpha_n \, \rho(w_n \cdot x + b_n)$, with $w_n \cdot x = \sum_k w_{k,n} x_k$ and $\rho$ a non-linear scalar "neuron".
• Input $x \in \mathbb{R}^d$, $M$ hidden units; the weights $\{w_{k,n}\}_{k,n}$ and $\{\alpha_n\}_n$ are learned: non-linear approximation.
• Fourier series: $\rho(u) = e^{iu}$ gives $f_M(x) = \sum_{n=1}^{M} \alpha_n e^{i w_n \cdot x}$.
• For nearly all $\rho$: essentially the same approximation results.
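A minimal sketch (assuming numpy) of the one-hidden-layer model $f_M(x) = \sum_n \alpha_n \rho(w_n \cdot x + b_n)$ with $\rho(u) = \max(u,0)$; the parameters below are random placeholders, whereas in practice they are learned.

```python
import numpy as np

def f_M(x, W, b, alpha, rho=lambda u: np.maximum(u, 0)):
    # W: (M, d) hidden weights, b: (M,) biases, alpha: (M,) output weights
    return alpha @ rho(W @ x + b)

rng = np.random.default_rng(0)
d, M = 10, 50
x = rng.standard_normal(d)
W = rng.standard_normal((M, d))
b = rng.standard_normal(M)
alpha = rng.standard_normal(M)
print(f_M(x, W, b, alpha))  # a scalar prediction
```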
Piecewise Linear Approximation
• Piecewise linear approximation with $\rho(u) = \max(u, 0)$: $\tilde f(x) = \sum_n a_n \, \rho(x - n\epsilon)$ (figure: $f$ and its samples on the grid $n\epsilon$).
• If $f$ is Lipschitz, $|f(x) - f(x')| \le C |x - x'|$, then $|f(x) - \tilde f(x)| \le C\epsilon$.
• Need $M = \epsilon^{-1}$ points to cover $[0,1]$ at a distance $\epsilon$
  ⇒ $\|f - f_M\| \le C M^{-1}$.
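A minimal sketch (assuming numpy) of this construction on $[0,1]$: the coefficients $a_n$ chosen below (changes of slope at the knots) are one simple interpolating choice, used purely for illustration.

```python
import numpy as np

f = np.sin                        # target function; f(0) = 0, so no constant term is needed
eps = 0.05
knots = np.arange(0.0, 1.0 + eps, eps)
vals = f(knots)

slopes = np.diff(vals) / eps                          # slope of f on each interval
a = np.concatenate(([slopes[0]], np.diff(slopes)))    # a_n = change of slope at knot n

def f_tilde(x):
    # f_tilde(x) = sum_n a_n * max(x - n*eps, 0)
    return np.sum(a * np.maximum(x - knots[:-1], 0.0))

xs = np.linspace(0, 1, 1000)
err = max(abs(f_tilde(x) - f(x)) for x in xs)
print(err)   # bounded by C * eps, as stated on the slide
```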
Linear Ridge Approximation
• Piecewise linear ridge approximation for $x \in [0,1]^d$, with $\rho(u) = \max(u,0)$: $\tilde f(x) = \sum_n a_n \, \rho(w_n \cdot x - n\epsilon)$.
• If $f$ is Lipschitz, $|f(x) - f(x')| \le C \|x - x'\|$, sampling at a distance $\epsilon$ gives $|f(x) - \tilde f(x)| \le C\epsilon$.
• Need $M = \epsilon^{-d}$ points to cover $[0,1]^d$ at a distance $\epsilon$
  ⇒ $\|f - f_M\| \le C M^{-1/d}$: curse of dimensionality!
Approximation with Regularity
• What prior condition makes learning possible?
• Approximation of regular functions in $C^s[0,1]^d$: $|f(x) - p_u(x)| \le C |x - u|^s$ for all $x, u$, with $p_u(x)$ a polynomial (figure: local polynomial $p_u$ approximating $f$ near $u$).
• If $|x - u| \le \epsilon^{1/s}$ then $|f(x) - p_u(x)| \le C\epsilon$.
• Need $M = \epsilon^{-d/s}$ points to cover $[0,1]^d$ at a distance $\epsilon^{1/s}$
  ⇒ $\|f - f_M\| \le C M^{-s/d}$.
• Cannot do better in $C^s[0,1]^d$; not good because $s \ll d$: failure of classical approximation theory.
Kernel Learning
• Change of variable $\Phi(x) = \{\phi_k(x)\}_{k \le d'}$ to nearly linearize $f(x)$, which is approximated by a 1D projection: $\tilde f(x) = \langle \Phi(x), w \rangle = \sum_k w_k \phi_k(x)$.
• Data: $x \in \mathbb{R}^d \mapsto \Phi(x) \in \mathbb{R}^{d'}$, followed by a linear classifier $w$.
• Metric: $\|x - x'\|$ becomes $\|\Phi(x) - \Phi(x')\|$.
• How and when is it possible to find such a $\Phi$? What "regularity" of $f$ is needed?
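A minimal sketch (assuming numpy) of this pipeline: map $x$ to $\Phi(x)$, then fit a linear model on the features. The random Fourier feature map below is one concrete illustrative choice of $\Phi$, not the one assumed on the slide, and the target $f$ is a synthetic placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_prime, n = 5, 200, 500
W = rng.standard_normal((d_prime, d))        # random frequencies
b = rng.uniform(0, 2 * np.pi, d_prime)       # random phases

def Phi(X):
    return np.cos(X @ W.T + b)               # Phi(x) in R^{d'}

X = rng.standard_normal((n, d))
y = np.sign(np.sin(X[:, 0]) + X[:, 1])       # illustrative target f(x)

# Linear classifier in feature space: f~(x) = <Phi(x), w>
w, *_ = np.linalg.lstsq(Phi(X), y, rcond=None)
pred = np.sign(Phi(X) @ w)
print((pred == y).mean())                    # training accuracy
```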
Spirit of Fisher's Linear Discriminant Analysis: Reduction of Dimensionality
• Discriminative change of variable $\Phi(x)$: $\Phi(x) \ne \Phi(x')$ if $f(x) \ne f(x')$ ⇒ $\exists \tilde f$ with $f(x) = \tilde f(\Phi(x))$.
• If $\tilde f$ is Lipschitz, $|\tilde f(z) - \tilde f(z')| \le C \|z - z'\|$ with $z = \Phi(x)$, then $|f(x) - f(x')| \le C \|\Phi(x) - \Phi(x')\|$.
• Discriminative: $\|\Phi(x) - \Phi(x')\| \ge C^{-1} |f(x) - f(x')|$.
• For $x \in \Omega$, if $\Phi(\Omega)$ is bounded and of low dimension $d'$, then $\|f - f_M\| \le C M^{-1/d'}$.
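A minimal sketch (assuming numpy) of Fisher's linear discriminant, the classical example of a discriminative, dimension-reducing change of variable $\Phi(x) = w \cdot x$; the two-class data below are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.standard_normal((100, 5))                               # class 0
X1 = rng.standard_normal((100, 5)) + np.array([2.0, 1.0, 0, 0, 0])  # class 1, shifted mean

m0, m1 = X0.mean(0), X1.mean(0)
Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)   # within-class scatter
w = np.linalg.solve(Sw, m1 - m0)                           # Fisher direction

Phi = lambda x: x @ w                                      # 1D change of variable
# The two classes are well separated after the projection:
print(Phi(X0).mean(), Phi(X1).mean())
```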
Deep Convolutional Networks
• The revival of neural networks (Y. LeCun): a cascade of layers, each a linear convolution $L_j$ followed by a non-linear scalar "neuron" $\rho(u) = \max(u, 0)$:
  $x \to L_1 \to \rho \to L_2 \to \rho \to \cdots \to \Phi(x) \to$ linear classification, $y = \tilde f(\Phi(x))$.
• The cascade computes hierarchical invariants and a linearization.
• Optimize the $L_j$ with architecture constraints: over $10^9$ parameters.
• Exceptional results for images, speech, language, bio-data... Why does it work so well? A difficult problem.
Deep Convolutional Networks
• Layers $x(u) \to x_1(u, k_1) \to x_2(u, k_2) \to \cdots \to x_J(u, k_J) \to$ classification, with $x_j = \rho L_j x_{j-1}$.
• $L_j$ is a linear combination of convolutions and subsampling, with a sum across channels:
  $x_j(u, k_j) = \rho\Big( \sum_k x_{j-1}(\cdot, k) \star h_{k_j, k}(u) \Big)$.
• $\rho$ is contractive: $|\rho(u) - \rho(u')| \le |u - u'|$, e.g. $\rho(u) = \max(u, 0)$ or $\rho(u) = |u|$.
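A minimal sketch (assuming numpy and scipy) of one layer of the formula above: convolve each input channel with a filter, sum across channels, then apply the contractive non-linearity $\rho(u) = \max(u,0)$. The filter values are random placeholders; in a trained network they are learned.

```python
import numpy as np
from scipy.signal import convolve2d

def conv_layer(x_prev, h, rho=lambda u: np.maximum(u, 0)):
    # x_prev: (K_in, H, W) input channels; h: (K_out, K_in, s, s) filters
    K_out = h.shape[0]
    out = []
    for k_j in range(K_out):
        acc = sum(convolve2d(x_prev[k], h[k_j, k], mode="same")
                  for k in range(x_prev.shape[0]))   # sum across channels
        out.append(rho(acc))                          # pointwise non-linearity
    return np.stack(out)                              # (K_out, H, W)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 32, 32))          # e.g. an RGB image
h1 = rng.standard_normal((8, 3, 5, 5)) * 0.1   # 8 output channels
x1 = conv_layer(x0, h1)
print(x1.shape)                                # (8, 32, 32)
```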
Many Questions
• Why convolutions? Translation covariance.
• Why no overfitting? Contractions, dimension reduction.
• Why a hierarchical cascade?
• Why introduce non-linearities?
• How and what to linearise?
• What are the roles of the multiple channels in each layer?
Linear Dimension Reduction
• Classes $\Omega_1, \Omega_2, \Omega_3, \ldots$ are level sets of $f(x)$: $\Omega_t = \{x : f(x) = t\}$.
• If level sets (classes) are parallel to a linear space, then variables are eliminated by linear projections $\Phi(x)$: invariants.
Linearise for Dimensionality Reduction
• If the level sets $\Omega_t = \{x : f(x) = t\}$ are not parallel to a linear space:
  - linearise them with a change of variable $\Phi(x)$;
  - then reduce dimension with linear projections.
• Difficult because the $\Omega_t$ are high-dimensional, irregular, and known only from few samples.
Level Set Geometry: Symmetries
• Curse of dimensionality ⇒ not local but global geometry. Level sets (classes) are characterised by their global symmetries.
• A symmetry is an operator $g$ which preserves level sets: $f(g.x) = f(x)$ for all $x$ (a global property).
• If $g_1$ and $g_2$ are symmetries, then $g_1.g_2$ is also a symmetry: $f(g_1.g_2.x) = f(g_2.x) = f(x)$.
Groups of Symmetries
• $G = \{\text{all symmetries}\}$ is a group (unknown):
  - closure: $g.g' \in G$ for all $(g, g') \in G^2$;
  - inverse: $g^{-1} \in G$ for all $g \in G$;
  - associativity: $(g.g').g'' = g.(g'.g'')$.
• If commutative, $g.g' = g'.g$: Abelian group.
• A group has dimension $n$ if it has $n$ generators: $g = g_1^{p_1} g_2^{p_2} \cdots g_n^{p_n}$.
• Lie group: infinitely small generators (Lie algebra).
Translation and Deformations
• Digit classification (classes $\Omega_3$, $\Omega_5$): $x'(u) = x(u - \tau(u))$.
  - Globally invariant to the translation group: a small group.
  - Locally invariant to small diffeomorphisms: a huge group.
• (Video of Philipp Scott Johnson.)
Rotation and Scaling Variability
• Rotation and deformations. Group: $SO(2) \times \mathrm{Diff}(SO(2))$.
• Scaling and deformations. Group: $\mathbb{R} \times \mathrm{Diff}(\mathbb{R})$.
Linearize Symmetries
• A change of variable $\Phi(x)$ must linearize the orbits $\{g.x\}_{g \in G}$ (figure: the orbits $g_1.x, \ldots, g_1^p.x$ and $g_1.x', \ldots, g_1^p.x'$ are curved in the input space and become straight lines $\Phi(x), \ldots, \Phi(g_1^p.x)$ after the change of variable).
• Lipschitz: $\forall x, g: \ \|\Phi(x) - \Phi(g.x)\| \le C \|g\|$.
Translation and Deformations
• Digit classification: $x'(u) = x(u - \tau(u))$.
  - Globally invariant to the translation group.
  - Locally invariant to small diffeomorphisms ⇒ linearize small diffeomorphisms: Lipschitz regular.
• (Video of Philipp Scott Johnson.)
Translations and Deformations
• Invariance to translations: $g.x(u) = x(u - c)$ ⇒ $\Phi(g.x) = \Phi(x)$.
• Small diffeomorphisms: $g.x(u) = x(u - \tau(u))$, with metric $\|g\| = \|\nabla \tau\|_\infty$ (maximum scaling).
• Linearisation by Lipschitz continuity: $\|\Phi(x) - \Phi(g.x)\| \le C \|\nabla \tau\|_\infty$.
• Discriminative change of variable: $\|\Phi(x) - \Phi(x')\| \ge C^{-1} |f(x) - f(x')|$.
Fourier Deformation Instability
• Fourier transform: $\hat x(\omega) = \int x(t)\, e^{-i\omega t}\, dt$.
• Translation $x_c(t) = x(t - c)$ ⇒ $\hat x_c(\omega) = e^{-ic\omega}\, \hat x(\omega)$, so the modulus is invariant to translations: $\Phi(x) = |\hat x| = |\hat x_c|$.
• Instabilities to small deformations $x_\tau(t) = x(t - \tau(t))$: $\big|\, |\hat x_\tau(\omega)| - |\hat x(\omega)| \,\big|$ is big at high frequencies (figure: $|\hat x(\omega)|$ and $|\hat x_\tau(\omega)|$ for the dilation $\tau(t) = \epsilon t$).
• Hence $\big\| |\hat x| - |\hat x_\tau| \big\| \gg \|\nabla \tau\|_\infty \, \|x\|$.
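A minimal sketch (assuming numpy) illustrating this slide: the Fourier modulus is invariant to translations but unstable to a small dilation $\tau(t) = \epsilon t$ of a high-frequency signal. The signal and parameters are illustrative placeholders.

```python
import numpy as np

N = 4096
t = np.arange(N) / N
# High-frequency wave packet
x = np.cos(2 * np.pi * 400 * t) * np.exp(-((t - 0.5) ** 2) / 0.01)

def fmod(sig):
    return np.abs(np.fft.fft(sig))

# Translation: the Fourier modulus barely changes.
x_shift = np.roll(x, 37)
print(np.linalg.norm(fmod(x) - fmod(x_shift)) / np.linalg.norm(fmod(x)))   # ~ 0

# Small dilation x_tau(t) = x((1 - eps) t): large change at high frequencies.
eps = 0.05
x_dil = np.interp((1 - eps) * t, t, x)
print(np.linalg.norm(fmod(x) - fmod(x_dil)) / np.linalg.norm(fmod(x)))     # of order 1, not small
```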
Wavelet Transform
• Complex wavelet: $\psi(t) = \psi^a(t) + i\,\psi^b(t)$, dilated: $\psi_\lambda(t) = 2^{-j}\psi(2^{-j}t)$ with $\lambda = 2^{-j}$ (figure: the supports of $|\hat\psi_\lambda(\omega)|^2$, $|\hat\psi_{\lambda'}(\omega)|^2$ and $|\hat\phi(\omega)|^2$ cover the frequency axis).
• Convolution: $x \star \psi_\lambda(t) = \int x(u)\, \psi_\lambda(t - u)\, du$.
• Wavelet transform: $Wx = \big( x \star \phi(t),\ x \star \psi_\lambda(t) \big)_{t, \lambda}$.
• Unitary: $\|Wx\|^2 = \|x\|^2$.
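A minimal sketch (assuming numpy) of a complex wavelet filter bank applied in the Fourier domain: dilated band-pass wavelets $\psi_\lambda$ plus a low-pass $\phi$, applied by pointwise multiplication of FFTs. The Gaussian-shaped filters below are illustrative stand-ins for an actual Morlet design and are only approximately unitary.

```python
import numpy as np

N, J = 1024, 5
omega = np.fft.fftfreq(N) * 2 * np.pi                 # frequency grid in rad/sample

def psi_hat(omega, j, xi=3.0, sigma=0.8):
    # band-pass filter centred at xi * 2^{-j} (analytic: positive frequencies only)
    return np.exp(-((omega - xi * 2.0 ** (-j)) ** 2) / (2 * (sigma * 2.0 ** (-j)) ** 2))

filters = [psi_hat(omega, j) for j in range(J)]
filters.append(np.exp(-(omega ** 2) / (2 * (0.4 * 2.0 ** (-J)) ** 2)))   # low-pass phi

rng = np.random.default_rng(0)
x = rng.standard_normal(N)
x_hat = np.fft.fft(x)

# Wx = { x * phi, x * psi_lambda }: one complex coefficient signal per filter
Wx = [np.fft.ifft(x_hat * h) for h in filters]
energy = sum(np.sum(np.abs(c) ** 2) for c in Wx)
# Equals ||x||^2 only for an exactly unitary (Littlewood-Paley) design;
# the illustrative filters here only approximate that normalisation.
print(energy / np.sum(np.abs(x) ** 2))
```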