  1. Empirical Study of the Benefits of Overparameterization in Learning Latent Variable Models. Rares-Darius Buhai¹, Yoni Halpern², Yoon Kim³, Andrej Risteski⁴, David Sontag¹ (¹MIT, ²Google, ³Harvard, ⁴CMU).

  2. Overparameterization = training a larger model than necessary. In supervised learning it gives easier optimization, often without sacrificing generalization. → Practice: [Zhang et al., 2016] commonly used neural networks are so large that they can fit randomized labels. → Theory: [Allen-Zhu et al., 2018; Allen-Zhu et al., 2019] overparameterized neural networks provably learn and generalize for certain classes of functions.

  3. Overparameterization in unsupervised learning. Task: learning latent variable models. Contribution: an empirical study of the benefits of overparameterization in learning latent variable models.

  4. Latent variable models. We know the model family $p(x, h; \theta)$, where $x$ is observed and $h$ is unobserved. Task: learn $\theta$ from samples of $x$. Maximum likelihood, $\max_\theta \sum_n \log \sum_h p(x^{(n)}, h; \theta)$, is typically intractable. Instead, iterative algorithms (e.g., EM, variational learning) alternate between inference of the unobserved $h$ given the observed $x$ and a parameter update (typically a gradient step).
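
To make the iterative pattern concrete, here is a minimal sketch (not the authors' code) of EM on a toy two-component Bernoulli mixture; all data and parameter values below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two ground-truth Bernoulli "prototypes" over 10 observed bits.
true_p = np.array([[0.9] * 5 + [0.1] * 5,
                   [0.1] * 5 + [0.9] * 5])
z = rng.integers(0, 2, size=500)                        # unobserved component
X = (rng.random((500, 10)) < true_p[z]).astype(float)   # observed samples

# Initialize parameters of a 2-component Bernoulli mixture.
pi = np.full(2, 0.5)                 # mixing weights
p = rng.uniform(0.3, 0.7, (2, 10))   # per-component Bernoulli means

for _ in range(50):
    # E-step (inference): posterior over the unobserved component per sample.
    log_lik = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T + np.log(pi)
    log_lik -= log_lik.max(axis=1, keepdims=True)
    resp = np.exp(log_lik)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step (parameter update): maximize expected complete-data log-likelihood.
    pi = resp.mean(axis=0)
    p = (resp.T @ X) / resp.sum(axis=0)[:, None]
    p = np.clip(p, 1e-3, 1 - 1e-3)

# Should approach the two ground-truth prototypes (up to permutation).
print(np.round(p, 2))
```

The same alternation between an inference step and a parameter update underlies the variational learning used in the experiments.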

  5. Our setting. A ground truth latent variable model with latent variables and observed variables (synthetic setting). Task: learn a model from samples, either non-overparameterized (as many latent variables as the ground truth) or overparameterized (more latent variables than the ground truth).

  6. Our question. A ground truth latent variable is recovered if there exists a learned latent variable with the same parameters. How does overparameterization affect the recovery of ground truth latent variables?
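
The slide does not spell out the matching procedure, so the following is only a plausible sketch of how recovery might be checked: greedily match each ground truth latent variable to the closest unused learned latent variable by parameter distance, with an arbitrary tolerance (the function, data, and threshold are hypothetical).

```python
import numpy as np

def recovered(true_params, learned_params, tol=0.1):
    """Greedily match each ground-truth latent variable to the closest unused
    learned latent variable; count a recovery when the parameter distance is
    below `tol` (threshold chosen purely for illustration)."""
    used = set()
    hits = 0
    for t in true_params:                       # one row per latent variable
        dists = np.linalg.norm(learned_params - t, axis=1)
        for j in np.argsort(dists):
            if j not in used:
                if dists[j] < tol:
                    used.add(j)
                    hits += 1
                break
    return hits

true_params = np.array([[0.9, 0.1, 0.1], [0.1, 0.9, 0.1]])
learned_params = np.array([[0.88, 0.12, 0.10],   # close to true latent 1
                           [0.50, 0.50, 0.50],   # redundant extra latent
                           [0.10, 0.92, 0.08]])  # close to true latent 2
print(recovered(true_params, learned_params))    # -> 2
```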

  7. Our finding. With overparameterization, the learned model recovers the ground truth latent variables more often than without overparameterization. The unmatched learned latent variables are typically redundant. Demonstrated through extensive experiments with: ● noisy-OR network models ● sparse coding models ● neural PCFG models.

  8. Noisy-OR networks. Binary latent variables generate binary observed variables through noisy-OR gates. Example: image model. (Slide figure: a bipartite graph of latent variables over observed variables, with a sample 0/1 assignment.)
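
As a hedged illustration of the generative model (not the paper's code), here is a sketch of sampling from a small noisy-OR network with made-up priors, failure probabilities, and a noise/leak term.

```python
import numpy as np

rng = np.random.default_rng(0)

n_latent, n_obs = 3, 8
prior = np.array([0.3, 0.5, 0.2])                   # P(h_i = 1)
fail = rng.uniform(0.05, 0.6, (n_latent, n_obs))    # failure probabilities f_ij
noise = 0.01                                        # leak: observation turns on by itself

def sample(n_samples):
    h = (rng.random((n_samples, n_latent)) < prior).astype(float)
    # Noisy-OR observation model: P(x_j = 0 | h) = (1 - noise) * prod_i f_ij^{h_i}
    p_off = (1 - noise) * np.exp(h @ np.log(fail))
    x = (rng.random((n_samples, n_obs)) >= p_off).astype(float)
    return h, x

h, x = sample(5)
print(h)   # latent variables (unobserved during learning)
print(x)   # observed variables
```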

  9. Noisy-OR networks. Train using variational learning, with a noisy-OR network $p(x, h; \theta)$ and a recognition network $q(h \mid x; \phi)$ (in our experiments: logistic regression and independent Bernoulli). Maximize the evidence lower bound (ELBO), alternating between gradient steps w.r.t. $\theta$ and $\phi$.
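
For reference, the objective being maximized is the standard ELBO, which lower-bounds the log-likelihood for any recognition network:

```latex
\mathrm{ELBO}(\theta, \phi)
  = \mathbb{E}_{q(h \mid x;\, \phi)}\!\left[ \log p(x, h;\, \theta) - \log q(h \mid x;\, \phi) \right]
  \le \log p(x;\, \theta).
```

Variational learning alternates gradient steps on this quantity with respect to $\theta$ (the noisy-OR parameters) and $\phi$ (the recognition network parameters).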

  10. Noisy-OR networks: recovery. Image model. (Slide plots: number of recovered true latent variables and percentage of runs with full recovery, each as a function of the number of latent variables of the learned model.)

  11. Noisy-OR networks: recovery. The harm of extreme overparameterization is minor. Similar trends hold for held-out log-likelihood.

  12. Noisy-OR networks: unmatched latent variables. Unmatched learned latent variables are either effectively discarded (low prior or high failure probabilities) or duplicates of matched ones. Simple filtering step to recover the ground truth: ● eliminate latent variables with a low prior or high failure probabilities ● eliminate latent variables that are duplicates.
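
A minimal sketch of such a filtering step; the thresholds and parameter values are illustrative, not the ones used in the paper.

```python
import numpy as np

def filter_latents(prior, fail, prior_min=0.01, fail_max=0.95, dup_tol=0.1):
    """Drop latent variables with a very low prior or near-certain failure
    probabilities, then drop near-duplicates of an already-kept latent.
    Thresholds here are illustrative only."""
    keep = []
    for i in range(len(prior)):
        if prior[i] < prior_min or np.all(fail[i] > fail_max):
            continue  # effectively unused latent variable
        if any(np.linalg.norm(fail[i] - fail[j]) < dup_tol for j in keep):
            continue  # duplicate of a kept latent variable
        keep.append(i)
    return keep

prior = np.array([0.30, 0.004, 0.28, 0.50])
fail = np.array([[0.10, 0.90, 0.20],
                 [0.50, 0.50, 0.50],
                 [0.12, 0.88, 0.22],    # near-duplicate of latent 0
                 [0.99, 0.99, 0.99]])   # fails on everything
print(filter_latents(prior, fail))      # -> [0]
```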

  13. Noisy-OR networks: algorithm variations. Overparameterization remains beneficial under variations of the algorithm: ● batch size: 20 → 1000 ● recognition network: logistic regression → independent Bernoulli. This suggests the benefits are general when learning latent variable models with iterative algorithms.

  14. Noisy-OR networks: explanation. Hypothesis: with overparameterization, more latent variables are initialized close to ground truth latent variables, so the benefit comes from a "warm start". Actual finding: latent variables do not converge quickly to ground truth latent variables. In the beginning, many are undecided; throughout training, there are contentions.

  15. Noisy-OR networks: optimization stability. (Slide plots: state of the latent variables after 1/9, 2/9, and 3/9 of the first epoch; two learned latent variables contend for the same ground truth latent variable.) In the beginning, many latent variables are undecided.

  16. Noisy-OR networks: optimization stability. (Slide plots: state of the latent variables after 10, 20, and 30 epochs; two learned latent variables contend for the same ground truth latent variable.) Throughout training, latent variables often contend.

  17. Sparse coding and neural PCFG.
  Sparse coding: linear model; synthetic experiments; training with a linear alternating minimization algorithm. → Overparameterization gives better recovery. → Simple filtering step.
  Neural PCFG: nonlinear model; semi-synthetic experiments; training with EM and a neural network parameterization. → Overparameterization gives better recovery (similarity between parse trees).
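
As an illustration of the sparse coding setup (not the paper's exact algorithm), here is a minimal alternating-minimization sketch on synthetic data: ISTA-style updates for the sparse codes and least-squares updates for the dictionary, with an overparameterized number of atoms. All sizes and hyperparameters below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ground truth: Y = A_true @ Z_true with sparse codes Z_true.
d, k_true, n = 20, 5, 1000
A_true = rng.normal(size=(d, k_true))
A_true /= np.linalg.norm(A_true, axis=0)
Z_true = rng.normal(size=(k_true, n)) * (rng.random((k_true, n)) < 0.2)
Y = A_true @ Z_true

k = 8   # overparameterized: more dictionary atoms than the ground truth
A = rng.normal(size=(d, k))
A /= np.linalg.norm(A, axis=0)
lam = 0.05

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

for _ in range(50):
    # Code step: a few ISTA iterations on 0.5*||Y - A Z||^2 + lam*||Z||_1.
    Z = np.zeros((k, n))
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(30):
        Z = soft(Z - step * A.T @ (A @ Z - Y), step * lam)
    # Dictionary step: least squares in A, then renormalize columns.
    A = Y @ np.linalg.pinv(Z)
    A /= np.linalg.norm(A, axis=0) + 1e-12

# Best cosine similarity between each ground-truth atom and any learned atom.
print(np.round(np.max(np.abs(A_true.T @ A), axis=1), 2))
```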

  18. Discussion. Why is any of this surprising? Typically, smaller models are more likely to be identifiable. However, our experiments show that larger models often make optimization easier and have an inductive bias toward ground truth recovery.

  19. Application. For practice: it is helpful to overparameterize. For theory: an interesting phenomenon that may provide insights into learning and optimization.

  20. Future work. Study larger and more complex models, e.g., commonly used deep generative models: ● understand model identifiability ● define overparameterization ● define ground truth recovery and design filtering steps.

  21. Thank you! Our code is available at https://github.com/clinicalml/overparam.
