Generalization Error of Generalized Linear Models in High Dimensions
Melika Emami¹, Mojtaba Sahraee-Ardakan¹,², Parthe Pandit¹,², Sundeep Rangan³, Alyson K. Fletcher¹,²
¹ ECE, UCLA   ² STAT, UCLA   ³ ECE, NYU
ICML 2020
Overview
• Generalization error: performance on new data
• Fundamental question in modern systems:
  – low generalization error despite over-parameterization [BHMM19]
• This work: an exact calculation of the generalization error for GLMs
  – high-dimensional regime
  – double descent phenomenon
Overview
• Generalized linear models (GLMs): $y = \phi_{\rm out}(\langle x, w^0 \rangle, d)$
  [diagram: inputs $x_i$ weighted by $w^0_0, \dots, w^0_{p-1}$, summed ($\Sigma$), passed through $\phi_{\rm out}(\cdot)$ with noise $d$ to produce $y$; a data-generation sketch follows below]
• Regularized ERM: $\hat{w} = \operatorname*{argmin}_w F_{\rm out}(y, Xw) + F_{\rm in}(w)$
• Generalization error: $\mathbb{E}\, f_{\rm ts}(y_{\rm ts}, \hat{y}_{\rm ts})$   (1)
  – test sample: $(x_{\rm ts}, y_{\rm ts})$
  – $y_{\rm ts} = \phi_{\rm out}(\langle x_{\rm ts}, w^0 \rangle, d_{\rm ts})$,  $\hat{y}_{\rm ts} = \phi(\langle x_{\rm ts}, \hat{w} \rangle)$
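A minimal sketch of the GLM data model above, assuming Gaussian covariates; the link $\phi_{\rm out}$, the noise scale sigma_d, and the function name are illustrative placeholders, not the paper's code:

```python
# Hypothetical sketch: sample (X, y) from the GLM y = phi_out(<x, w0>, d).
import numpy as np

def generate_glm_data(n, p, w0, phi_out, sigma_d=0.5, rng=None):
    rng = np.random.default_rng(rng)
    X = rng.standard_normal((n, p)) / np.sqrt(p)   # iid Gaussian covariates
    d = sigma_d * rng.standard_normal(n)           # per-sample output noise
    y = phi_out(X @ w0, d)                         # componentwise link function
    return X, y

# e.g. a noisy linear link:
# X, y = generate_glm_data(100, 50, np.ones(50), lambda z, d: z + d)
```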
Overview
• Prior work:
  – understanding generalization in deep neural nets [BMM18, BHX19, BLLT19, NLB+18, ZBH+16, AS17]
  – linear models [MRSY19, DKT19, MM19, HMRT19, GAK20]
  – GLMs with uncorrelated features [BKM+19]
• Our contribution:
  – a procedure for exactly characterizing the generalization error (1)
  – general test metrics, training losses, regularizers, and link functions
  – correlated covariates
  – train–test distributional mismatch
  – both the over-parameterized and under-parameterized regimes
Outline
• Main Result
  – Scalar Equivalent System
  – Main Theorem
• Examples
  – Linear Regression
  – Logistic Regression
  – Non-linear Regression
• Proof Technique
  – Multi-layer VAMP
• Future Directions
Scalar Equivalent System
[diagram: the true vector system $w^0 \to X \to \phi_{\rm out}(\cdot) \to y \to$ Est $\to \hat{w}$ (high-dimensional: hard to analyze) is equivalent to a scalar system $W^0 + Q \to$ Denoiser $\to \hat{W}$ with $Q \sim \mathcal{N}(0, \tau)$ (scalar: easy to analyze)]
$\hat{w} = \operatorname*{argmin}_w F_{\rm out}(y, Xw) + F_{\rm in}(w)$   (2)
• Key tool: the approximate message passing (AMP) framework [DMM09, BM11, RSF19, FRS18, PSAR+20]
  – used as a constructive proof technique
  – performance of the estimates is tracked by deterministic recursive equations: the state evolution (SE)
Main Result
[diagram: the same true vector system and scalar equivalent system as on the previous slide]
Theorem (Generalization error of GLMs)
(a) Under some regularity conditions on $f_{\rm ts}, \phi, \phi_{\rm out}$, the above convergence is rigorous:
  $\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} f(w^0_i, \hat{w}_i) = \mathbb{E}\, f(W^0, \hat{W})$ a.s.,
where $\hat{W} = \mathrm{prox}_{f_{\rm in}/\gamma}(W^0 + Q)$ and $Q \sim \mathcal{N}(0, \tau)$ is independent of $W^0$.
(b) Generalization error:
  $\mathcal{E}_{\rm ts} = \mathbb{E}\, f_{\rm ts}\!\left(\phi_{\rm out}(Z_{\rm ts}, D), \phi(\hat{Z}_{\rm ts})\right), \quad (Z_{\rm ts}, \hat{Z}_{\rm ts}) \sim \mathcal{N}(\mathbf{0}_2, M)$,
where $\tau$, $\gamma$, and $M$ are computed from the SE equations, and $D \perp (Z_{\rm ts}, \hat{Z}_{\rm ts})$.
A Monte Carlo sketch of evaluating both parts follows below.
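The theorem reduces the generalization error to low-dimensional expectations. A minimal Monte Carlo sketch of evaluating them, assuming the SE outputs (tau, gamma, M) are already in hand; the ridge penalty $f_{\rm in}(w) = \frac{\lambda}{2} w^2$ and the Gaussian prior on $W^0$ are illustrative choices, and the SE recursion itself is not implemented here:

```python
import numpy as np

def prox_ridge(v, lam, gamma):
    # prox_{f_in/gamma}(v) for f_in(w) = (lam/2) w^2:
    # argmin_w (lam/2) w^2 + (gamma/2)(w - v)^2 = gamma * v / (lam + gamma)
    return gamma * v / (lam + gamma)

def scalar_system_samples(tau, gamma, lam, n_mc=100_000, rng=None):
    """Part (a): joint samples of (W0, W_hat) from the scalar equivalent system."""
    rng = np.random.default_rng(rng)
    w0 = rng.standard_normal(n_mc)                  # illustrative prior on W0
    q = np.sqrt(tau) * rng.standard_normal(n_mc)    # Q ~ N(0, tau), independent of W0
    return w0, prox_ridge(w0 + q, lam, gamma)

def generalization_error(M, sigma_d, f_ts, phi_out, phi, n_mc=100_000, rng=None):
    """Part (b): E f_ts(phi_out(Z_ts, D), phi(Z_hat_ts)), (Z_ts, Z_hat_ts) ~ N(0, M)."""
    rng = np.random.default_rng(rng)
    z = rng.multivariate_normal(np.zeros(2), M, size=n_mc)
    d = sigma_d * rng.standard_normal(n_mc)         # D independent of (Z_ts, Z_hat_ts)
    return np.mean(f_ts(phi_out(z[:, 0], d), phi(z[:, 1])))
```

Averaging any per-component function $f$ over the samples from part (a) then matches the almost-sure limit in the theorem.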
Example Setting
• Train–test distributional mismatch:
  – $x_{\rm train} \sim \mathcal{N}(\mathbf{0}, \Sigma_{\rm tr})$, $x_{\rm test} \sim \mathcal{N}(\mathbf{0}, \Sigma_{\rm ts})$; $\Sigma_{\rm tr}$ and $\Sigma_{\rm ts}$ commute
  – i.i.d. log-normal eigenvalues:
    $\begin{bmatrix} \log(s^2_{\rm tr}) \\ \log(s^2_{\rm ts}) \end{bmatrix}_i \overset{\rm i.i.d.}{\sim} \mathcal{N}\!\left(\mathbf{0}, \sigma \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}\right) \quad \forall i$
• Three cases (a sampling sketch follows below):
  (i) uncorrelated features ($\sigma = 0$): $\Sigma_{\rm tr} = \Sigma_{\rm ts} = I$
  (ii) correlated features ($\sigma > 0$, $\rho = 1$): $\Sigma_{\rm tr} = \Sigma_{\rm ts} \neq I$
  (iii) mismatched features ($\sigma > 0$, $\rho < 1$): $\Sigma_{\rm tr} \neq \Sigma_{\rm ts}$
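A minimal sketch of sampling the three cases, assuming a shared random eigenbasis (which is what makes $\Sigma_{\rm tr}$ and $\Sigma_{\rm ts}$ commute); the function name and sizes are illustrative:

```python
import numpy as np

def make_covariances(p, sigma, rho, rng=None):
    rng = np.random.default_rng(rng)
    # Joint log-normal eigenvalues with covariance sigma * [[1, rho], [rho, 1]].
    cov = sigma * np.array([[1.0, rho], [rho, 1.0]])
    logs = rng.multivariate_normal(np.zeros(2), cov, size=p)
    s2_tr, s2_ts = np.exp(logs[:, 0]), np.exp(logs[:, 1])
    # Shared orthogonal eigenbasis => commuting covariance matrices.
    V, _ = np.linalg.qr(rng.standard_normal((p, p)))
    return V @ np.diag(s2_tr) @ V.T, V @ np.diag(s2_ts) @ V.T

# (i) sigma=0: both covariances equal I; (ii) rho=1: equal but != I;
# (iii) rho<1: train and test spectra are mismatched.
Sigma_tr, Sigma_ts = make_covariances(p=100, sigma=1.0, rho=0.5)
```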
Example: Linear Regression
• Under-regularized linear regression:
  – $\phi_{\rm out}(p, d) = p + d$, with $d \sim \mathcal{N}(0, \sigma_d^2)$
  – MSE output loss
  – exhibits the double descent phenomenon (recovers the result of [HMRT19]); an empirical sketch follows below
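The double descent curve in this setting can be reproduced empirically with minimum-norm least squares. A rough simulation sketch (not the paper's SE computation), with sizes and noise level chosen arbitrarily:

```python
import numpy as np

def test_mse(n, p, sigma_d=0.5, n_ts=2000, rng=None):
    rng = np.random.default_rng(rng)
    w0 = rng.standard_normal(p) / np.sqrt(p)
    X = rng.standard_normal((n, p))
    y = X @ w0 + sigma_d * rng.standard_normal(n)
    w_hat = np.linalg.pinv(X) @ y                 # minimum-l2-norm least squares
    X_ts = rng.standard_normal((n_ts, p))
    y_ts = X_ts @ w0 + sigma_d * rng.standard_normal(n_ts)
    return np.mean((y_ts - X_ts @ w_hat) ** 2)

# Sweeping p past n traces the double descent curve, peaking near p = n:
errs = [test_mse(n=200, p=p) for p in range(20, 601, 20)]
```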
Example: Logistic Regression
• Logistic regression:
  – logistic output: $P(y = 1) = 1/(1 + e^{-p})$
  – binary cross-entropy loss with $\ell_2$ regularization (a fitting sketch follows below)
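A minimal sketch of the $\ell_2$-regularized logistic ERM in this example, fit by plain gradient descent; the step size, penalty, and iteration count are illustrative:

```python
import numpy as np

def fit_logistic(X, y, lam=0.1, lr=0.1, n_iter=2000):
    """Minimize mean binary cross-entropy + (lam/2)||w||^2; y has entries in {0, 1}."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ w))    # P(y = 1 | x)
        grad = X.T @ (prob - y) / n + lam * w  # BCE gradient plus ridge term
        w -= lr * grad
    return w
```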
Example: Non-linear Regression
• Non-linear regression:
  – $\phi_{\rm out}(p, d) = \tanh(p) + d$, with $d \sim \mathcal{N}(0, \sigma_d^2)$
  – $f_{\rm out}(y, p) = \frac{1}{2\sigma_d^2}\,(y - \tanh(p))^2$ (a fitting sketch follows below)
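A minimal sketch of minimizing this non-linear output loss by gradient descent; the added ridge regularizer and all hyperparameters are illustrative assumptions:

```python
import numpy as np

def fit_tanh_regression(X, y, sigma_d=0.5, lam=0.1, lr=0.05, n_iter=3000):
    """Minimize mean (y - tanh(Xw))^2 / (2 sigma_d^2) + (lam/2)||w||^2."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        t = np.tanh(X @ w)
        # Chain rule: d f_out / dz = (tanh(z) - y) * (1 - tanh(z)^2) / sigma_d^2.
        grad = X.T @ ((t - y) * (1.0 - t**2)) / (n * sigma_d**2) + lam * w
        w -= lr * grad
    return w
```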
Proof Technique: Multi-Layer Representation
[diagram: $z^0_0 = w^0 \to \Sigma_{\rm tr}^{1/2} \to U \to \phi_{\rm out}(\cdot) \to z^0_3 = y$]
• Represent the mapping $w^0 \mapsto y$ as a multi-layer network: $y = \phi_{\rm out}(Xw^0, d)$
• Decompose the Gaussian training data $X$ with covariance $\Sigma_{\rm tr}$:
  $X = U\,\Sigma_{\rm tr}^{1/2}$, with $U$ i.i.d. Gaussian
• Use the SVD of $U$ and the eigendecomposition of $\Sigma_{\rm tr}$:
  $\Sigma_{\rm tr} = \frac{1}{p}\, V_0^T \mathrm{diag}(s^2_{\rm tr})\, V_0, \qquad U = V_2\, S_{\rm mp}\, V_1$
• $V_0$, $V_1$, $V_2$: Haar-distributed
• $S_{\rm mp}$: singular values of $U$, converging in distribution to the Marchenko–Pastur law
(a numerical sketch of this decomposition follows below)
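A minimal numerical sketch of this decomposition; the eigenvalue spectrum and the $1/\sqrt{n}$ scaling of $U$ are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 200

# Correlated design X = U Sigma_tr^{1/2}, with U iid Gaussian.
s2_tr = np.exp(rng.standard_normal(p))             # some eigenvalue spectrum
V0, _ = np.linalg.qr(rng.standard_normal((p, p)))  # Haar-distributed eigenbasis
Sigma_half = V0.T @ np.diag(np.sqrt(s2_tr)) @ V0
U = rng.standard_normal((n, p)) / np.sqrt(n)
X = U @ Sigma_half

# SVD of U: the left/right factors are Haar-distributed, and the empirical
# distribution of the singular values approaches Marchenko-Pastur as n, p grow.
V2, s_mp, V1 = np.linalg.svd(U, full_matrices=False)
```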
Proof Technique: Multi-Layer VAMP
[diagram: the multi-layer chain $z^0_0 = w^0 \to V_0 \to S_{\rm tr} \to V_1 \to S_{\rm mp} \to V_2 \to p^0_2 = Xw^0 \to \phi_{\rm out}(\cdot) \to z^0_3 = y$, with intermediate signals $p^0_0, z^0_1, p^0_1, z^0_2$]
• An algorithm for solving inference problems in deep neural networks
• Similar in structure to the ADMM algorithm for optimization
• Statistical guarantees:
  – exact asymptotic joint distribution of $(W^0, \hat{W})$ and the other hidden signals
Proof Technique: Generalization Error
[diagram: the same multi-layer chain, now with the test spectrum $S_{\rm ts}$ in place of $S_{\rm tr}$, carrying both the true signals $z^0_0 = w^0, \dots, p^0_2 = Xw^0$ and the estimated signals $\hat{w}, \hat{p}_0, \hat{z}_1, \hat{p}_1, \hat{z}_2, \hat{p}_2$]
• ML-VAMP ⇒ joint distribution of $(W^0, \hat{W})$ (part (a) of the theorem)
• Given test data: $x_{\rm ts}^T = u^T \mathrm{diag}(s_{\rm ts})\, V_0$
• Find the joint distribution of $(P^0_2, \hat{P}_2)$ for the test data (part (b) of the theorem):
  $(P^0_2, \hat{P}_2) \sim \mathcal{N}(\mathbf{0}_2, M)$
• Obtain the generalization error:
  $\mathcal{E}_{\rm ts} = \mathbb{E}\, f_{\rm ts}\!\left(\phi_{\rm out}(P^0_2, D), \phi(\hat{P}_2)\right)$
(an empirical sketch of the test-point pair follows below)
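A minimal sketch of forming the test-point pair empirically, assuming $V_0$, the test spectrum s_ts, and the pair (w0, w_hat) are given; the $1/\sqrt{p}$ scaling of $u$ is an illustrative assumption:

```python
import numpy as np

def test_pair_samples(V0, s_ts, w0, w_hat, n_ts=1000, rng=None):
    """Empirical samples of (P2_0, P2_hat) = (<x_ts, w0>, <x_ts, w_hat>),
    which the theorem says are asymptotically jointly Gaussian N(0, M)."""
    rng = np.random.default_rng(rng)
    p = V0.shape[0]
    U = rng.standard_normal((n_ts, p)) / np.sqrt(p)  # rows play the role of u^T
    X_ts = U @ np.diag(s_ts) @ V0                    # x_ts^T = u^T diag(s_ts) V0
    return X_ts @ w0, X_ts @ w_hat
```

Comparing the empirical covariance of these pairs against the SE-predicted $M$ gives a direct numerical check of part (b).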
Future Directions
• Generalize the results to:
  – non-Gaussian covariates
  – multitask GLMs, using multi-layer matrix-valued VAMP
  – deeper models such as two-layer neural networks
  – non-asymptotic regimes
• Use the results to obtain:
  – generalization errors in reproducing kernel Hilbert spaces, such as the NTK space
References
[AS17] Madhu S. Advani and Andrew M. Saxe. High-dimensional dynamics of generalization error in neural networks. arXiv preprint arXiv:1710.03667, 2017.
[BHMM19] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proc. National Academy of Sciences, 116(32):15849–15854, 2019.
[BHX19] Mikhail Belkin, Daniel Hsu, and Ji Xu. Two models of double descent for weak features. arXiv preprint arXiv:1903.07571, 2019.
[BKM+19] Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, and Lenka Zdeborová. Optimal errors and phase transitions in high-dimensional generalized linear models. Proc. National Academy of Sciences, 116(12):5451–5460, March 2019.
[BLLT19] Peter L. Bartlett, Philip M. Long, Gábor Lugosi, and Alexander Tsigler. Benign overfitting in linear regression. arXiv preprint arXiv:1906.11300, 2019.
[BM11] M. Bayati and A. Montanari. The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. Inform. Theory, 57(2):764–785, February 2011.
[BMM18] Mikhail Belkin, Siyuan Ma, and Soumik Mandal. To understand deep learning we need to understand kernel learning. arXiv preprint arXiv:1802.01396, 2018.
[DKT19] Zeyu Deng, Abla Kammoun, and Christos Thrampoulidis. A model of double descent for high-dimensional binary linear classification. arXiv preprint arXiv:1911.05822, 2019.
[DMM09] David L. Donoho, Arian Maleki, and Andrea Montanari. Message-passing algorithms for compressed sensing. Proc. National Academy of Sciences, 106(45):18914–18919, 2009.
[FRS18] Alyson K. Fletcher, Sundeep Rangan, and P. Schniter. Inference in deep networks in high dimensions. Proc. IEEE Int. Symp. Information Theory, 2018.
[GAK20] Cédric Gerbelot, Alia Abbara, and Florent Krzakala. Asymptotic errors for convex penalized linear regression beyond Gaussian matrices. arXiv preprint arXiv:2002.04372, 2020.
[HMRT19] Trevor Hastie, Andrea Montanari, Saharon Rosset, and Ryan J. Tibshirani. Surprises in high-dimensional ridgeless least squares interpolation. arXiv preprint arXiv:1903.08560, 2019.
[MM19] Song Mei and Andrea Montanari. The generalization error of random features regression: Precise asymptotics and double descent curve. arXiv preprint arXiv:1908.05355, 2019.