Faculty 03: Center for Mathematics / Computer Sciences, Industrial Mathematics
University of Bremen

On the Invertibility of ReLU Networks
Inverse Problems and Machine Learning, Caltech

Jens Behrmann
joint work with: Sören Dittmer, Pascal Fernsel, Peter Maass

February 9, 2018

Outline: Motivation · Uniqueness · Stability
Motivation: Inverting a Network

Reconstruct the input x from its features:

    z* ≈ F(x),   F : R^d → R^D   (MLP or CNN),

where x* ∈ R^d is the input and z* ∈ R^D are the features, z* = F(x*) [1].

[Figure: reconstruction example]

Further applications:
- Inverse problems with learned forward operators
- Theoretical understanding
- ...

[1] Mahendran et al. 2015: Understanding deep image representations by inverting them
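The reconstruction task above can be posed as minimizing ‖F(x) − z*‖² over x. A minimal sketch, using a hypothetical random two-layer ReLU MLP as a stand-in for a trained network (all sizes, weights, and step sizes here are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical small ReLU MLP F(x) = W2 @ relu(W1 @ x), weights scaled by 1/sqrt(fan_in)
W1 = rng.standard_normal((64, 16)) / 4.0
W2 = rng.standard_normal((128, 64)) / 8.0

def F(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

def invert(z_star, steps=3000, lr=2e-2, seed=1):
    """Reduce the feature residual ||F(x) - z*|| by plain gradient descent."""
    x = np.random.default_rng(seed).standard_normal(16) * 0.1  # small random init
    for _ in range(steps):
        h = W1 @ x
        r = W2 @ np.maximum(h, 0.0) - z_star        # residual in feature space
        grad = W1.T @ ((W2.T @ r) * (h > 0))        # backprop through the ReLU mask
        x -= lr * grad
    return x
```

Note that the recovered x need not equal x* unless F is injective on the relevant region; that uniqueness question is exactly what the following slides address.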
Main Questions

1. How is information lost during propagation?
   → Pre-images of ReLU layers
2. Is the inverse mapping stable or unstable?
   → Singular values of the linearization

Related work:
- Invertibility under assumptions of random weights [2], [3]
- Injectivity and stability of ReLU and pooling [4]

[2] Giryes et al. 2016: DNN with Random Gaussian Weights: A Universal Classification Strategy?
[3] Arora et al. 2015: Why are deep nets reversible: a simple theory, with implications for training
[4] Bruna et al. 2014: Signal Recovery from Pooling Representations
Injectivity, Pre-images, Activation Functions

Combinatorial conditions for injectivity under ReLU [5].

Definition (Retrieval, singleton pre-images)
Let A ∈ R^{m×n} and b ∈ R^m. Then (A, b) does retrieval under ReLU for x ∈ R^n if the pre-image of ReLU(Ax + b) is a singleton.

Remark:
- Other activation functions such as ELU, leaky ReLU, and tanh are injective.
- cReLU is injective if A is a frame [6].

[5] Bruna et al. 2014: Signal Recovery from Pooling Representations
[6] Shang et al. 2016: Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units
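The cReLU remark can be made concrete: since cReLU(x) = (ReLU(Wx), ReLU(−Wx)) keeps both half-rectifications, Wx = z₊ − z₋ is recoverable exactly, and a frame W (full column rank) then determines x. A sketch with hypothetical random weights:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((20, 8))   # a frame: full column rank almost surely (hypothetical)

def crelu(x):
    """Concatenated ReLU: keep positive and negative parts separately."""
    pre = W @ x
    return np.maximum(pre, 0.0), np.maximum(-pre, 0.0)

def invert_crelu(z_pos, z_neg):
    # Wx = z_pos - z_neg is recovered exactly; solve the overdetermined system.
    return np.linalg.pinv(W) @ (z_pos - z_neg)
```

In contrast, a plain ReLU layer discards the magnitudes of the negative pre-activations, which is precisely where pre-images can become non-singleton.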
Equality and Inequality Systems

For y = ReLU(Ax + b), the pre-image of y is described by

    A|_{y>0} x + b|_{y>0} = y|_{y>0},
    A|_{y=0} x + b|_{y=0} ≤ 0.

Consider the two cases N(A|_{y>0}) = {0} and N(A|_{y>0}) ≠ {0}. Decomposing x along the null space gives

    A|_{y=0} (P_{N(A|_{y>0})^⊥} x + P_{N(A|_{y>0})} x) + b|_{y=0} ≤ 0,

which can be rewritten as a pure inequality system of the form

    Ax + b ≤ 0.
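A sketch of how these two systems arise from one ReLU layer (matrix sizes hypothetical); the case distinction on N(A|_{y>0}) reduces to a rank check on the active rows:

```python
import numpy as np

def preimage_system(A, b, y):
    """Split the pre-image conditions of y = ReLU(Ax + b) into the
    equality system (active rows, y > 0) and the inequality system (y = 0)."""
    act = y > 0
    A_eq, b_eq = A[act], y[act] - b[act]     # A|_{y>0} x = y|_{y>0} - b|_{y>0}
    A_in, b_in = A[~act], b[~act]            # A|_{y=0} x + b|_{y=0} <= 0
    # N(A|_{y>0}) = {0} iff the active rows have full column rank
    trivial_nullspace = np.linalg.matrix_rank(A_eq) == A.shape[1]
    return A_eq, b_eq, A_in, b_in, trivial_nullspace
```

When the null space is trivial, the equality system already pins down x; otherwise the inequalities on the null-space component decide whether the pre-image is a singleton.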
Definition (Omnidirectional)
A ∈ R^{m×n} is called omnidirectional if ∃! x : Ax ≤ 0.

Corollary
The following statements are equivalent:
1. A ∈ R^{m×n} is omnidirectional.
2. Every open linear halfspace contains a row of A.
3. Ax ≤ 0 implies x = 0, where x ∈ R^n.
Definition (Omnidirectional for a point)
A ∈ R^{m×n} is called omnidirectional if ∃! x : Ax ≤ 0.
A ∈ R^{m×n} and b ∈ R^m are called omnidirectional for the point p ∈ R^n if b = −Ap and A is omnidirectional.

[Figure: halfspace constraints intersecting only at the point p]
Theorem (Unique solutions of inequality systems)
Let Ax + b ≤ 0 have a solution x_0. This solution is unique if and only if there exists an index set I of rows such that (A|_I, b|_I) is omnidirectional for x_0.

Is this realistic?
Pre-Image: Finite or Infinite?

Theorem (Convex hull)
A ∈ R^{m×n} is omnidirectional iff 0 ∈ Conv(A)°, where Conv(A)° is the interior of the convex hull spanned by the rows of A.
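The convex-hull criterion can be checked numerically: 0 ∈ Conv(A)° iff the linear program max{t : Aᵀλ = 0, Σ_j λ_j = 1, λ_j ≥ t} has a strictly positive optimum. A sketch using scipy (assumed available); this is one possible LP formulation, not necessarily the one used in the talk:

```python
import numpy as np
from scipy.optimize import linprog

def is_omnidirectional(A, tol=1e-9):
    """Check whether 0 lies in the interior of the convex hull of the rows of A,
    i.e. whether A is omnidirectional."""
    m, n = A.shape
    # variables: lam_1, ..., lam_m, t; objective: maximize t
    c = np.zeros(m + 1)
    c[-1] = -1.0
    A_eq = np.zeros((n + 1, m + 1))
    A_eq[:n, :m] = A.T                               # sum_j lam_j a_j = 0
    A_eq[n, :m] = 1.0                                # sum_j lam_j = 1
    b_eq = np.zeros(n + 1)
    b_eq[n] = 1.0
    A_ub = np.hstack([-np.eye(m), np.ones((m, 1))])  # t - lam_j <= 0
    b_ub = np.zeros(m)
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.status == 0 and -res.fun > tol        # optimum t* > 0 => interior
```

For example, the rows (1,0), (0,1), (−1,−1) form a triangle with 0 strictly inside, while (1,0), (0,1) alone do not surround the origin.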
Singleton / Finite / Infinite

Setup: 2-layer MLP on MNIST with (3500, 784) neurons.
1. Count the number of positive outputs (more than 784 ⇒ singleton pre-image)
2. Project onto the null space of the equality system
3. Check for omnidirectionality via linear programming (convex hull as side condition)

[Histogram: number of positive outputs (300–1000, threshold marked at 784) vs. counts of singleton and infinite pre-images]
Stability: Locally Linear

Theorem (Linear functions on convex polytopes [7])
The input space R^d of a ReLU network F is partitioned into convex polytopes P_F such that, for each P ∈ P_F,

    F(x) = A_P x + b_P   for all x ∈ P.   (1)

[7] Raghu et al. 2017: On the Expressive Power of Deep Neural Networks
Stability: Simplifications

Assume: x ∈ P is known (for the reconstruction of x from an output z* of the network F).

Analyze: the stability of the linearization via its singular values σ_min, σ_max. For x, x′ ∈ P ∩ N(A_P)^⊥,

    σ_min ‖x − x′‖_2 ≤ ‖A_P (x − x′)‖_2 ≤ σ_max ‖x − x′‖_2.

[Figure: polytope partition of the input space; source: Raghu et al. 2017: On the Expressive Power of Deep Neural Networks]
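A quick numerical check of these bounds for a toy rank-deficient linearization (sizes hypothetical); note the restriction of x − x′ to N(A_P)^⊥, i.e. the row space:

```python
import numpy as np

rng = np.random.default_rng(2)
A_P = rng.standard_normal((10, 6))
A_P[:, 4:] = 0.0                       # force a nontrivial null space (rank <= 4)

U, s, Vt = np.linalg.svd(A_P)
nz = s > 1e-10                         # nonzero singular values
s_min, s_max = s[nz].min(), s[nz].max()

# difference x - x' restricted to the row space N(A_P)^perp
d = Vt[nz].T @ rng.standard_normal(int(nz.sum()))
lhs = s_min * np.linalg.norm(d)
mid = np.linalg.norm(A_P @ d)
rhs = s_max * np.linalg.norm(d)
```

Without the restriction to N(A_P)^⊥ the lower bound fails: any component of x − x′ in the null space is annihilated by A_P.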
Stability: ReLU as a Diagonal Matrix

The linearization A_P of a network with L layers can be written as [8]

    A_P = A_L D_{I_{L−1}} A_{L−1} ··· D_{I_1} A_1,   where (D_I)_{ii} = 1 if i ∉ I, 0 if i ∈ I.

→ ReLU acts by removing (zeroing) rows.

[8] Wang et al. 2016: Analysis of deep neural networks with extended data jacobian matrix
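This factorization can be verified directly for a toy two-layer net (all sizes and weights hypothetical): the ReLU activation pattern at x gives D, and A_P = A₂ D A₁ reproduces F exactly on the polytope containing x:

```python
import numpy as np

rng = np.random.default_rng(3)
A1, b1 = rng.standard_normal((12, 5)), rng.standard_normal(12)
A2, b2 = rng.standard_normal((4, 12)), rng.standard_normal(4)

def F(x):
    return A2 @ np.maximum(A1 @ x + b1, 0.0) + b2

x = rng.standard_normal(5)
D = np.diag((A1 @ x + b1 > 0).astype(float))   # ReLU as a 0/1 diagonal matrix
A_P = A2 @ D @ A1                               # rows of A1 at inactive units are removed
b_P = A2 @ D @ b1 + b2
```

The singular values of A_P (via np.linalg.svd) then quantify the local stability of the previous slide.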
Lemma (Removal of weakly correlated rows)
Let A ∈ R^{m×n} with rows a_j and I ⊆ [m]. For a fixed k ∈ I let a_k ∈ N(D_I A)^⊥. Moreover, let

    |⟨a_j, a_k⟩| ≤ c ‖a_k‖_2 / √M   for all j ∉ I,

where M = m − |I| and c > 0 is a constant. Then for the singular values σ_l ≠ 0 of D_I A:

    0 < σ_K = min{σ_l : σ_l ≠ 0} ≤ c.
Numerical Experiments

- Convolutional networks (CNNs) fit the theoretical framework
- Linearization via backpropagation w.r.t. the input
- Full SVD for different layers / samples (nonlinear!)

Small CNN on CIFAR10:

    Type          kernel size   stride   # feature maps   # output units
    Conv layer    (3,3)         (1,1)    32               -
    Conv layer    (3,3)         (2,2)    64               -
    Conv layer    (3,3)         (1,1)    64               -
    Conv layer    (3,3)         (1,1)    32               -
    Conv layer    (3,3)         (1,1)    32               -
    Conv layer    (3,3)         (2,2)    64               -
    Dense layer   -             -        -                512
    Dense layer   -             -        -                10
Effect of ReLU

[Figure: singular value spectra (log scale, 10^−4 to 10^2) vs. singular value index (0–3500) for layers 3, 4, 9, and 10]
Decay over Layers

[Figure: singular value spectra (log scale, 10^−4 to 10^2) vs. singular value index (0–3500) for layers 1–6]