A Mathematical Theory of Dimensionality Reduction
Abbas Kazemipour
Druckmann Lab Meeting, October 26, 2018
Introduction: Decoding and Dimensionality Reduction
Autoencoders perform well in most applications, but not always:
§ They don't generalize easily
§ They are hard to understand theoretically
[Figure: autoencoder architecture, with an encoder mapping observations to a low-dimensional code and a decoder mapping the code back]
Introduction: Decoding and Dimensionality Reduction
Our focus today: the decoder side
§ Observed data: $x_t \in \mathbb{R}^n$, $t = 1, 2, \dots, T$
§ Common latents $z_t \in \mathbb{R}^k$ generate the data in an unknown nonlinear fashion:
$$x_{it} = f_i(z_t) + \epsilon_{it}$$
[Figure: decoder mapping latents $z_t$ to observations $x_t$]
Introduction: Role of Dynamics
Dynamics play an important role in solving the inverse problem
§ The inverse problem is still ill-posed: latents can only be identified up to an isomorphism
$$z_t = g(z_{t-1}), \qquad x_{it} = f_i(z_t) + \epsilon_{it}$$
[Figure: decoder with latent dynamics]
Koopman Theory Resolves this Ambiguity
Generalizes eigenfunctions/eigenvalues to nonlinear dynamics: $z_t = g(z_{t-1})$
§ Koopman operator: linear, infinite-dimensional
$$(\mathcal{K}\varphi)(z) = \varphi(g(z))$$
› Linearizes the dynamics
§ Eigenfunctions/eigenvalues:
$$(\mathcal{K}\varphi_\lambda)(z) = \varphi_\lambda(g(z)) = \lambda\,\varphi_\lambda(z)$$
§ Koopman mode decomposition of the observations:
$$x_t = \sum_{j=1}^{\infty} c_j \lambda_j \varphi_j(x_{t-1})$$
Goal: find $\varphi$ that interacts nicely with $g$
Polynomials are Eigenfunctions of Linear Dynamics!
Linear dynamical system: $z_t = A z_{t-1}$
§ Let $(\lambda_i, v_i)$ be the (left) eigenpairs of $A$. Then monomials in the eigencoordinates are again eigenfunctions:
$$\varphi(z) = (v_1^\top z)^{k_1}(v_2^\top z)^{k_2}\cdots(v_d^\top z)^{k_d}, \qquad \varphi(Az) = \lambda_1^{k_1}\lambda_2^{k_2}\cdots\lambda_d^{k_d}\,\varphi(z)$$
§ Nonnegative powers give posynomials; use complex $\lambda_i$'s for conjugate pairs
§ Periodic combinations (ReLU, etc.) share the same $\lambda$
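The eigenfunction property above is easy to verify numerically. The following sketch (with a hypothetical 2×2 system) checks that a product of powers of left-eigenvector coordinates is a Koopman eigenfunction of the linear map, with eigenvalue equal to the corresponding product of powers of the eigenvalues:

```python
import numpy as np

# Hypothetical 2x2 linear system z_t = A z_{t-1} with real eigenvalues.
A = np.array([[0.9, 0.2],
              [0.0, 0.5]])

# Left eigenpairs of A: v^T A = lam * v^T  (i.e., eigenvectors of A.T).
lams, V = np.linalg.eig(A.T)
v1, v2 = V[:, 0], V[:, 1]
l1, l2 = lams[0], lams[1]

# phi(z) = (v1 . z)^2 * (v2 . z)^3 should satisfy
# phi(A z) = l1^2 * l2^3 * phi(z) for every z.
def phi(z):
    return (v1 @ z) ** 2 * (v2 @ z) ** 3

rng = np.random.default_rng(0)
z = rng.standard_normal(2)
assert np.isclose(phi(A @ z), (l1 ** 2) * (l2 ** 3) * phi(z))
```

Note that left eigenvectors are needed: $\varphi(Az) = \lambda\varphi(z)$ for $\varphi(z) = v^\top z$ requires $v^\top A = \lambda v^\top$.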
Polynomials are Eigenfunctions of Linear Dynamics!
$z_t = A z_{t-1}$
§ Good news: the polynomial eigenfunctions also form a basis!
§ Nonlinear dimensionality reduction for dynamical systems ≡ low-rank harmonic analysis
Polynomial Principal Component Analysis (Poly-PCA)
§ Replace deterministic linear dynamics with an AR model: $z_t = B z_{t-1} + w_t$
§ Model observations as polynomials of degree ≤ $d$ in the latents:
$$x_{it} = \langle A_i, z_t^{\otimes d}\rangle + \epsilon_{it}$$
$A_i$: symmetric tensor of polynomial coefficients; $z_t$: latents augmented with a constant 1
$$\underset{A_i,\,\{z_t\}}{\text{minimize}} \;\sum_{i,t}\left(x_{it} - \langle A_i, z_t^{\otimes d}\rangle\right)^2 + \lambda \sum_t \|z_t - B z_{t-1}\|^2$$
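As a minimal sketch of this objective (not the authors' implementation), the following toy example fits a degree-2 Poly-PCA model with a 1-dimensional latent by jointly minimizing the reconstruction and AR-dynamics penalties; all sizes, the transition matrix `B`, and the use of L-BFGS are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Toy Poly-PCA sketch: n channels, T time points, 1 latent dim,
# degree-2 polynomial readout, AR(1) penalty on the latents.
rng = np.random.default_rng(1)
n, T, lam = 6, 40, 0.1
B = 0.95                                     # assumed AR(1) transition

# Simulate ground truth: z_t = B z_{t-1} + w_t, x_it = a_i z_t + b_i z_t^2.
z_true = np.zeros(T)
for t in range(1, T):
    z_true[t] = B * z_true[t - 1] + 0.3 * rng.standard_normal()
a, b = rng.standard_normal(n), rng.standard_normal(n)
X = a[:, None] * z_true[None, :] + b[:, None] * (z_true ** 2)[None, :]

def loss(params):
    # Unpack latents and polynomial coefficients from one flat vector.
    z = params[:T]
    ca, cb = params[T:T + n], params[T + n:]
    pred = ca[:, None] * z[None, :] + cb[:, None] * (z ** 2)[None, :]
    fit = np.sum((X - pred) ** 2)            # reconstruction term
    dyn = np.sum((z[1:] - B * z[:-1]) ** 2)  # ||z_t - B z_{t-1}||^2
    return fit + lam * dyn

x0 = 0.1 * rng.standard_normal(T + 2 * n)
res = minimize(loss, x0, method="L-BFGS-B")
print("final Poly-PCA loss:", res.fun)
```

A real implementation would alternate between the coefficients (a linear least-squares problem for fixed latents) and the latents, as in ALS.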
Example: Van der Pol Oscillator with Quadratic Measurements
§ A 2-dimensional nonlinear oscillator:
$$\dot{z}_1 = z_2, \qquad \dot{z}_2 = \mu(1 - z_1^2)\,z_2 - z_1$$
§ Quadratic measurements:
$$x_{it} = z_t^\top C_i z_t + \epsilon_{it}$$
§ For known rank-1 $C_i$, this is ≡ phase retrieval
[Figure: latent trajectories of the oscillator over 10 s]
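This example is straightforward to reproduce. The sketch below (parameter values and channel count are illustrative) integrates the Van der Pol system with forward Euler and forms noisy quadratic measurements:

```python
import numpy as np

# Simulate z1' = z2, z2' = mu*(1 - z1^2)*z2 - z1 with forward Euler,
# then form quadratic measurements x_it = z_t^T C_i z_t + noise.
mu, dt, T = 3.0, 0.01, 1000
z = np.zeros((T, 2))
z[0] = [0.5, 0.0]
for t in range(T - 1):
    z1, z2 = z[t]
    z[t + 1] = z[t] + dt * np.array([z2, mu * (1 - z1 ** 2) * z2 - z1])

rng = np.random.default_rng(0)
n = 5                                    # number of measurement channels
C = rng.standard_normal((n, 2, 2))
C = (C + C.transpose(0, 2, 1)) / 2       # symmetric quadratic forms
X = np.einsum("td,ide,te->ti", z, C, z)  # x_it = z_t^T C_i z_t
X += 0.01 * rng.standard_normal(X.shape)
```

With rank-1 $C_i = c_i c_i^\top$, each channel becomes $(c_i^\top z_t)^2$, which is exactly the phase-retrieval measurement model.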
Why does linear dimensionality reduction fail?
§ Nonlinearity changes topology
[Figure: PCA, ICA, LLE, and t-SNE all fail to recover the latent trajectories over 10 s]
Poly-PCA Recovers True Latents
[Figure: recovered latents match the ground truth up to a non-singular linear map; both trajectories loop around the origin over 10 s]
Axioms of Dimensionality Reduction
§ Nonsingular linear transformations of the latents should also be a solution
§ Nonsingular and stable linear transformations of the measurements should result in the same latents
› Gives stability and robustness to outliers
§ Stable reconstruction is possible if
› the map $f$ in $x_t = f(z_t)$ is inverse-Lipschitz: far-away latents do not map to close observations
Poly-PCA is compatible with these axioms!
Some Poly-PCA Theory
§ Poly-PCA ≡ constrained PCA
› ALS for unconstrained PCA has no local minima; local minima appear from the polynomial constraints
§ Experimental observation: Poly-PCA has few local minima (compared to the number allowed by Bézout's theorem)
[Figure: least-squares error surface with the unique unconstrained LS minimum; constrained local minima and maxima appear where the Poly-PCA feasible manifold meets the level sets]
§ This also gives a good intuitive initialization
Some Poly-PCA Theory
§ The linear ambiguity can be handled by small penalization:
$$\underset{A_i,\,\{z_t\}}{\text{minimize}} \;\sum_{i,t}\left(x_{it} - \langle A_i, z_t^{\otimes d}\rangle\right)^2 + \lambda_1 \sum_t \|z_t\|^2 + \lambda_2 \sum_i \|A_i\|^2$$
[Figure: a convex objective has one global minimum; the unpenalized nonconvex objective has a manifold of equivalent local minima (all minima of Poly-PCA), and after small regularization a single global minimum remains]
Some Poly-PCA Theory
§ Local minima are unique up to a linear transformation, i.e. for $A_i$ and $z_t$ in general position and $n > \binom{k+d}{d}$:
$$\langle A_i, z_t^{\otimes d}\rangle = \langle B_i, y_t^{\otimes d}\rangle \;\Rightarrow\; z_t = S\,y_t \text{ for some nonsingular } S$$
› A generalization of linear preserver theory
§ Conjecture: the minimum required number of samples is $T > \binom{k+d}{d} + 1$
Equivalence to a 1-layer Decoder with Polynomial Activation
§ Decompose each coefficient tensor into rank-1 terms:
$$x_{it} = \langle A_i, z_t^{\otimes d}\rangle = \sum_{j=1}^{p} c_{ij}\,(w_j^\top z_t)^d \;\Leftarrow\; A_i = \sum_{j=1}^{p} c_{ij}\, w_j^{\otimes d}$$
§ Such networks are not easy to train (Mondelli, Montanari, 2018)
› Better to train directly on the tensors $A_i$
§ Universal approximation theorem ≡ Taylor approximation theorem
[Figure: one-layer decoder with p polynomial-activation neurons mapping latents to observations]
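The equivalence is a direct consequence of the rank-1 decomposition. A small numerical check for degree $d = 2$ (all sizes are illustrative):

```python
import numpy as np

# Check the decoder equivalence for degree d = 2: if the coefficient
# tensor decomposes as A = sum_j c_j * w_j w_j^T, then <A, z (x) z>
# equals sum_j c_j * (w_j . z)^2 -- a one-layer network with
# activation s -> s^2.
rng = np.random.default_rng(0)
k, p = 3, 4                              # latent dim, number of "neurons"
W = rng.standard_normal((p, k))          # hypothetical first-layer weights
c = rng.standard_normal(p)               # output weights

A = np.einsum("j,jd,je->de", c, W, W)    # A = sum_j c_j w_j w_j^T
z = rng.standard_normal(k)

tensor_form = z @ A @ z                  # <A, z (x) z>
network_form = c @ (W @ z) ** 2          # sum_j c_j (w_j . z)^2
assert np.isclose(tensor_form, network_form)
```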
Poly-PCA Initialization
§ Strategy 1: Use PCA
› Beats PCA
› Needs a larger Lipschitz constant
§ Strategy 2: Data embedding + PCA
› Use Takens' embedding theorem: embed delayed copies $x_t(1),\, x_{t+\tau}(1),\, x_{t+2\tau}(1)$ of one channel, then apply PCA
[Figure: ground-truth latents vs. the delay-embedded trajectory]
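Strategy 2 can be sketched in a few lines. The signal, delay $\tau$, and embedding dimension below are illustrative choices, not values from the talk:

```python
import numpy as np

# Delay-embedding initialization: stack time-shifted copies of one
# observed channel (Takens' theorem) and run PCA on the embedding.
x = np.sin(0.1 * np.arange(500)) \
    + 0.01 * np.random.default_rng(0).standard_normal(500)
tau, m = 5, 3                            # delay and embedding dimension
E = np.column_stack(
    [x[i * tau: len(x) - (m - 1 - i) * tau] for i in range(m)]
)

# PCA via SVD of the centered embedding matrix; the leading
# components serve as the initial latent trajectory z_t.
E = E - E.mean(axis=0)
U, S, Vt = np.linalg.svd(E, full_matrices=False)
z_init = U[:, :2] * S[:2]                # first two principal components
```

The embedding preserves the topology of the underlying attractor, which is exactly what a good Poly-PCA initialization needs.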