On the Statistical Rate of Nonlinear Recovery in Generative Models with Heavy-tailed Data



  1. On the Statistical Rate of Nonlinear Recovery in Generative Models with Heavy-tailed Data. Xiaohan Wei, Zhuoran Yang, and Zhaoran Wang. University of Southern California, Princeton University, and Northwestern University. June 12th, 2019.


  2. Generative Model vs Sparsity in Signal Recovery
     Classical sparsity: the structure of the signal depends on the choice of basis.
     Generative model: explicit parametrization of a low-dimensional signal manifold.
     Previous works: [Bora et al. 2017], [Hand et al. 2018], [Mardani et al. 2017].


  3. Nonlinear Recovery via Generative Models
     Given: a generative model G : R^k → R^d and a measurement matrix X ∈ R^{m×d}.
     Goal: recover G(θ*) up to scaling from the nonlinear observations y = f(XG(θ*)).
     Challenges:
     1. High-dimensional recovery: k ≪ d, m ≪ d.
     2. Non-Gaussian X and unknown nonlinearity f.
     3. Observations y can be heavy-tailed.
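To make this setup concrete, here is a minimal numpy sketch of the observation model. The one-layer ReLU generator, the Student-t measurement rows, the cubic link f(u) = u³, and all dimensions are illustrative assumptions (in the talk, f is unknown):

    import numpy as np

    rng = np.random.default_rng(0)
    k, d, m = 10, 500, 200  # latent dim, signal dim, number of measurements

    # One-layer ReLU generator with zero bias: G(theta) = relu(W theta).
    W = rng.normal(size=(d, k)) / np.sqrt(k)
    G = lambda theta: np.maximum(W @ theta, 0.0)

    theta_star = rng.normal(size=k)
    signal = G(theta_star)  # G(theta*) in R^d, the target up to scaling

    # Non-Gaussian measurement rows (Student-t) and a nonlinear link;
    # the cubic is a stand-in satisfying E f'(<X_i, G(theta*)>) > 0.
    nu = 30
    X = rng.standard_t(df=nu, size=(m, d))
    f = lambda u: u ** 3
    y = f(X @ signal)  # heavy-tailed observations in R^m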


  4. Our Method: Stein + Adaptive Thresholding
     Suppose the rows of X := [X_1, ..., X_m]^T ∈ R^{m×d} have density p : R^d → R. Define the (row-wise) score transformation
         S_p(X) := [S_p(X_1), ..., S_p(X_m)]^T = [−∇ log p(X_1), ..., −∇ log p(X_m)]^T.
     (First-order) Stein's identity: when E[f′(⟨X_i, G(θ*)⟩)] > 0,
         E[S_p(X)^T y] ∝ G(θ*).
     (Second-order) Stein's identity: when E[f′′(⟨X_i, G(θ*)⟩)] > 0, for a constant δ,
         E[S_p(X)^T diag(y) S_p(X)] ∝ G(θ*) G(θ*)^T + δ · I_{d×d}.
     Adaptive thresholding: suppose ‖y_i‖_{L_q} < ∞ for some q > 4, and set τ_m ∝ m^{2/q}; then
         ỹ_i = sign(y_i) · (|y_i| ∧ τ_m),   i ∈ {1, 2, ..., m}.
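Continuing the sketch above: for i.i.d. Student-t entries the score transform acts coordinatewise, since log p(x) decomposes over coordinates. Only the rate τ_m ∝ m^{2/q} is specified, so the proportionality constant below is a heuristic choice (a robust scale of y):

    # Score transform for i.i.d. Student-t(nu) entries:
    # -d/dx log p0(x) = (nu + 1) x / (nu + x^2), applied entrywise.
    def score_t(X, nu=30):
        return (nu + 1) * X / (nu + X ** 2)

    S = score_t(X, nu)

    # Adaptive thresholding at tau_m ∝ m^(2/q) for some q > 4; the
    # constant is set heuristically via the median absolute observation.
    q = 5.0
    tau_m = np.median(np.abs(y)) * m ** (2.0 / q)
    y_trunc = np.sign(y) * np.minimum(np.abs(y), tau_m)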


  5. Our Method: Stein + Adaptive Thresholding
     Least-squares estimator:
         θ̂ ∈ argmin_{θ ∈ R^k} ‖ G(θ) − (1/m) S_p(X)^T ỹ ‖_2^2.
     Main performance theorem:
     Theorem (Wei, Yang and Wang, 2019). For any accuracy level ε ∈ (0, 1], suppose (1) E[f′(⟨X_i, G(θ*)⟩)] > 0, (2) the generative model G is a ReLU network with zero bias, and (3) the number of measurements satisfies m ∝ k ε^{−2} log d. Then, with high probability,
         ‖ G(θ̂)/‖G(θ̂)‖_2 − G(θ*)/‖G(θ*)‖_2 ‖_2 ≤ ε.
     Similar results hold for more general Lipschitz generators G.
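A minimal way to approximate this estimator is plain gradient descent on the latent code. The by-hand gradient below is specific to the one-layer ReLU generator of the sketch, and the step size and iteration count are hypothetical:

    # Least-squares estimator: minimize || G(theta) - (1/m) S^T y_trunc ||_2^2.
    # For G(theta) = relu(W theta), the gradient in theta is
    # 2 W^T [ (G(theta) - v) * 1{W theta > 0} ].
    v = S.T @ y_trunc / m

    theta = rng.normal(size=k)
    lr = 2e-3  # hypothetical step size
    for _ in range(2000):
        u = W @ theta
        resid = np.maximum(u, 0.0) - v
        theta -= lr * 2.0 * (W.T @ (resid * (u > 0)))

    # Recovery is only up to scaling, so compare normalized directions.
    g_hat, g_star = G(theta), signal
    cos = g_hat @ g_star / (np.linalg.norm(g_hat) * np.linalg.norm(g_star) + 1e-12)
    print(f"direction alignment: {cos:.3f}")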


  6. Our Method: Stein + Adaptive Thresholding
     PCA-type estimator:
         θ̂ ∈ argmax_{‖G(θ)‖_2 = 1} G(θ)^T S_p(X)^T diag(ỹ) S_p(X) G(θ).
     Main performance theorem:
     Theorem (Wei, Yang and Wang, 2019). For any accuracy level ε ∈ (0, 1], suppose (1) E[f′′(⟨X_i, G(θ*)⟩)] > 0, (2) the generative model G is a ReLU network with zero bias, and (3) the number of measurements satisfies m ∝ k ε^{−2} log d. Then, with high probability,
         ‖ G(θ̂) − G(θ*)/‖G(θ*)‖_2 ‖_2 ≤ ε.
     Similar results hold for more general Lipschitz generators G.
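One way to sketch this estimator is Rayleigh-quotient ascent over the latent code, which matches the unit-norm program up to the positive homogeneity of the ReLU generator. The cubic link above has E f′′ = 0, so observations are regenerated with an even link f(u) = u² (an illustrative choice satisfying the second-order condition); the step size is again hypothetical:

    # PCA-type estimator: maximize the Rayleigh quotient
    #   G(theta)^T M G(theta) / ||G(theta)||_2^2,  M = (1/m) S^T diag(y_trunc) S.
    y2 = (X @ signal) ** 2  # even link: f''(u) = 2 > 0
    tau2 = np.median(np.abs(y2)) * m ** (2.0 / q)
    y2_trunc = np.sign(y2) * np.minimum(np.abs(y2), tau2)
    M = (S * y2_trunc[:, None]).T @ S / m  # = S^T diag(y2_trunc) S / m
    M /= np.linalg.norm(M, 2)  # rescale spectrum; the maximizer is unchanged

    theta = rng.normal(size=k)
    lr = 0.1  # hypothetical step size
    for _ in range(1000):
        u = W @ theta
        g = np.maximum(u, 0.0)
        n2 = g @ g + 1e-12
        r = g @ (M @ g) / n2
        grad_g = 2.0 * (M @ g - r * g) / n2  # Rayleigh-quotient gradient in g
        theta += lr * (W.T @ (grad_g * (u > 0)))
        theta /= np.linalg.norm(theta) + 1e-12  # objective is scale-invariant

    g_hat = G(theta)
    g_hat /= np.linalg.norm(g_hat) + 1e-12
    print("alignment:", g_hat @ (signal / np.linalg.norm(signal)))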

  7. Thank you! Poster 198, Pacific Ballroom, 6:30–9:00 pm.
