RECSM Summer School: Machine Learning for Social Sciences
Session 1.4: Ridge Regression
Reto Wüest
Department of Political Science and International Relations, University of Geneva
Shrinkage Methods
Shrinkage Methods

• Shrinkage methods shrink the coefficient estimates of a regression model towards 0.
• This leads to a decrease in variance at the cost of an increase in bias.
• If the decrease in variance dominates the increase in bias, this leads to a decrease in the test error (see the decomposition sketched below).
• The two best-known methods for shrinking regression coefficients are ridge regression and the lasso.
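The trade-off can be made explicit with the standard decomposition of the expected test error at a fixed point x_0 (a textbook result added here for reference; it does not appear on the original slide):

```latex
% Expected test MSE at x_0: reducible error (variance + squared bias)
% plus the irreducible noise variance.
\mathbb{E}\!\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
  = \operatorname{Var}\!\left(\hat{f}(x_0)\right)
  + \left[\operatorname{Bias}\!\left(\hat{f}(x_0)\right)\right]^2
  + \operatorname{Var}(\varepsilon)
```

Shrinkage trades the first term against the second: if the drop in variance outweighs the rise in squared bias, the expected test error falls.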
Shrinkage Methods
Ridge Regression
Ridge Regression

• When we fit a model by least squares, the coefficient estimates \hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p are the values that minimize

  RSS = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2.   (1.4.1)

• In ridge regression, the coefficient estimates are the values that minimize

  \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 + \underbrace{\lambda \sum_{j=1}^{p} \beta_j^2}_{\text{shrinkage penalty}} = RSS + \lambda \sum_{j=1}^{p} \beta_j^2,   (1.4.2)

  where \lambda \geq 0 is a tuning parameter.
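As a concrete companion to (1.4.2) (not part of the original slides), the sketch below fits a ridge regression with scikit-learn's Ridge estimator, which minimizes this penalized sum of squares; the argument alpha plays the role of \lambda, and X and y are made-up placeholder data.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Made-up data standing in for real predictors X and response y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 3.0]) + rng.normal(size=100)

# alpha corresponds to the tuning parameter lambda in (1.4.2);
# the intercept is estimated but not penalized.
model = Ridge(alpha=1.0, fit_intercept=True)
model.fit(X, y)

# The fitted coefficients minimize RSS + lambda * (sum of squared betas).
print(model.intercept_, model.coef_)
```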
Ridge Regression

• Tuning parameter \lambda controls the relative impact of the two terms on the coefficient estimates:
  • If \lambda = 0, then the ridge regression estimates are identical to the least squares estimates.
  • As \lambda \to \infty, the ridge regression estimates will approach 0.
• Note that the shrinkage penalty is applied to \beta_1, \ldots, \beta_p, but not to the intercept \beta_0, which is a measure of the mean value of the response variable when x_{i1} = x_{i2} = \ldots = x_{ip} = 0.
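A short numerical sketch of this behaviour (my own illustration, with made-up data): the estimates coincide with the least squares solution at \lambda = 0 and shrink towards 0 as \lambda grows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.8]) + rng.normal(size=100)

print("least squares:", LinearRegression().fit(X, y).coef_)

# Estimates equal the least squares solution at lambda = 0
# and shrink towards 0 as lambda (alpha) grows.
for lam in [0.0, 1.0, 10.0, 1000.0]:
    print(f"lambda = {lam:>7}:", Ridge(alpha=lam).fit(X, y).coef_)
```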
Ridge Regression

• Least squares estimates are scale equivariant: multiplying predictor X_j by a constant c leads to a scaling of the least squares estimate by a factor of 1/c (i.e., \hat{\beta}_j X_j remains the same).
• Ridge regression estimates can change substantially when a predictor is multiplied by a constant, due to the sum of squared coefficients term in the objective function.
• Therefore, the predictors should be standardized as follows before applying ridge regression,

  \tilde{x}_{ij} = \frac{x_{ij}}{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}},   (1.4.3)

  so that they are all on the same scale (see the sketch below).
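A minimal sketch of the scaling in (1.4.3) (my own illustration; the function name is made up): each predictor is divided by its standard deviation computed with the 1/n convention, so all predictors end up on the same scale before ridge is applied.

```python
import numpy as np

def standardize_predictors(X):
    """Scale each column of X by its standard deviation, as in (1.4.3)."""
    X = np.asarray(X, dtype=float)
    # ddof=0 uses the 1/n convention from the slide.
    sd = X.std(axis=0, ddof=0)
    return X / sd

# Example: columns on very different scales become comparable.
X = np.column_stack([np.arange(10), 1000 * np.arange(10)])
X_tilde = standardize_predictors(X)
print(X_tilde.std(axis=0))  # both columns now have unit standard deviation
```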
Shrinkage Methods
Why Does Ridge Regression Improve Over Least Squares?
Why Does Ridge Regression Improve Over Least Squares?

• As \lambda increases, the flexibility of ridge regression decreases, leading to increased bias but decreased variance.
• Simulated data containing n = 50 observations and p = 45 predictors (test MSE is a function of variance and squared bias):

[Figure: Squared bias (black), variance (green), and test MSE (purple) for the ridge regression predictions on a simulated data set, plotted against \lambda. Source: James et al. 2013, 218.]
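The data-generating process behind the figure is not given on the slide; the sketch below is an assumed stand-in (n = 50, p = 45, arbitrary true coefficients, unit noise) that traces test MSE across a grid of \lambda values, mirroring the purple curve.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
n, p = 50, 45
beta_true = rng.normal(size=p)          # assumed true coefficients

X_train = rng.normal(size=(n, p))
y_train = X_train @ beta_true + rng.normal(size=n)
X_test = rng.normal(size=(5000, p))
y_test = X_test @ beta_true + rng.normal(size=5000)

# Test MSE across a grid of lambda values (as on the figure's x-axis).
for lam in np.logspace(-1, 3, 9):
    ridge = Ridge(alpha=lam).fit(X_train, y_train)
    mse = np.mean((y_test - ridge.predict(X_test)) ** 2)
    print(f"lambda = {lam:10.2f}   test MSE = {mse:8.2f}")
```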
Why Does Ridge Regression Improve Over Least Squares?

• When the relationship between the response and the predictors is close to linear, the least squares estimates have low bias but may have high variance.
• In particular, when the number of predictors p is almost as large as the number of observations n (as in the above simulated data), the least squares estimates are extremely variable.
• Hence, ridge regression works best in situations where the least squares estimates have high variance.
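To illustrate the variability claim (an assumed simulation, not from the slides): with p = 45 and n = 50, the least squares estimate of a single coefficient typically varies far more across repeated training samples than its ridge counterpart.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(7)
n, p = 50, 45                      # p almost as large as n
beta_true = rng.normal(size=p)     # assumed true coefficients

ols_first, ridge_first = [], []
for _ in range(200):               # repeated draws of training data
    X = rng.normal(size=(n, p))
    y = X @ beta_true + rng.normal(size=n)
    ols_first.append(LinearRegression().fit(X, y).coef_[0])
    ridge_first.append(Ridge(alpha=10.0).fit(X, y).coef_[0])

# Sampling variability of the estimate of the first coefficient:
# least squares is typically far more variable than ridge when p is close to n.
print("OLS   sd:", np.std(ols_first))
print("Ridge sd:", np.std(ridge_first))
```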