Linear Models for Statistical Learning, Regression

David Dalpiaz
STAT 430, Fall 2017
Announcements

• Homework 01 due today.
• Homework 02 released later today. (Hopefully.)
Statistical Learning

• Supervised Learning
  • Regression
  • Classification
• Unsupervised Learning
Regression Setup

$Y = f(x_1, x_2, x_3, \ldots, x_p) + \epsilon$

numeric response = signal + noise

• Want to learn the signal
• Want to be very careful not to "learn the noise"
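To make the setup concrete, here is a minimal simulation sketch in R; the signal f and the noise level are invented for illustration, and the data frame sim_data is reused in later sketches:

    # simulate n observations from y = f(x) + epsilon, with a made-up signal f
    set.seed(42)
    n = 100
    x = runif(n, min = 0, max = 10)
    f = function(x) 1 + 2 * x - 0.1 * x ^ 2  # the (usually unknown) signal
    y = f(x) + rnorm(n, mean = 0, sd = 1)    # epsilon ~ N(0, 1) noise
    sim_data = data.frame(x, y)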
Using a Linear Model

Setup:

$Y = f(x_1, x_2, x_3, \ldots, x_p) + \epsilon$

Assume:

$f(x_1, x_2, x_3, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$
The Linear Model

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon, \quad \epsilon \sim N(0, \sigma^2)$

$Y \mid X \sim N(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p, \ \sigma^2)$

There are a total of $p + 2$ parameters in this model:

• The $p + 1$ $\beta$ parameters, or coefficients, control the signal
• The $\sigma^2$ controls the noise
Fitting a Linear Model

This is a parametric model, meaning that to fit the model, we need to estimate the parameters.

For the sake of making predictions, we only need to estimate the $\beta$ parameters, since

$\hat{y}(x_1, x_2, x_3, \ldots, x_p) = \hat{f}(x_1, x_2, x_3, \ldots, x_p) = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p$

Using either least squares or maximum likelihood, this becomes the same optimization problem:

$\underset{\beta_0, \beta_1, \ldots, \beta_p}{\text{argmin}} \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}) \right)^2$
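In R, lm() carries out this least squares fit. A quick sketch using the simulated sim_data from earlier:

    # lm() performs the least squares fit; the formula specifies the model
    fit = lm(y ~ x, data = sim_data)
    coef(fit)                                  # the estimated beta parameters
    predict(fit, newdata = data.frame(x = 5))  # y-hat at x = 5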
Estimating $\sigma^2$

While it is not needed to make predictions, to fully estimate the model, we would also need to estimate $\sigma^2$.

$s_e^2 = \frac{1}{n - (p + 1)} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$  (Least Squares)

$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$  (MLE)

Both are estimates of $\sigma^2$. What is the difference?
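Both estimates are easy to compute from the fit above; note that sigma(fit) in R returns $s_e$, the least squares version:

    e = resid(fit)               # residuals, y_i - y-hat_i
    n = nrow(sim_data)
    p = length(coef(fit)) - 1    # number of predictors

    s_e_2     = sum(e ^ 2) / (n - (p + 1))  # least squares, divides by n - (p + 1)
    sig_2_mle = sum(e ^ 2) / n              # MLE, divides by n

    c(s_e_2, sig_2_mle, sigma(fit) ^ 2)     # sigma(fit)^2 agrees with s_e_2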
Model "Size"

Consider two models:

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \epsilon$

Which is bigger?
Model Complexity

In general, we are interested in the complexity, or flexibility, of a model.

For nested linear models, the more parameters a model has, the bigger, and thus the more complex, it is.

Models that are more complex will be more wiggly.
Pictures of Complexity

Go to ISL Slides
Test-Train Split

We've already discussed the Test-Train Split and RMSE:

$\text{RMSE}_{\text{Train}} = \text{RMSE}(\hat{f}, \text{Train Data}) = \sqrt{\frac{1}{n_{Tr}} \sum_{i \in \text{Train}} \left( y_i - \hat{f}(x_i) \right)^2}$

$\text{RMSE}_{\text{Test}} = \text{RMSE}(\hat{f}, \text{Test Data}) = \sqrt{\frac{1}{n_{Te}} \sum_{i \in \text{Test}} \left( y_i - \hat{f}(x_i) \right)^2}$
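A minimal sketch of both formulas in R, again reusing sim_data; the 50-50 split proportion is an arbitrary choice for illustration:

    # RMSE helper used for both splits
    rmse = function(actual, predicted) {
      sqrt(mean((actual - predicted) ^ 2))
    }

    # random 50-50 split of sim_data into train and test sets
    set.seed(42)
    trn_idx  = sample(nrow(sim_data), size = trunc(0.5 * nrow(sim_data)))
    trn_data = sim_data[trn_idx, ]
    tst_data = sim_data[-trn_idx, ]

    fit = lm(y ~ x, data = trn_data)
    rmse(actual = trn_data$y, predicted = predict(fit, trn_data))  # train RMSE
    rmse(actual = tst_data$y, predicted = predict(fit, tst_data))  # test RMSE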
Overfitting

• Overfitting occurs when a model is too complex (too flexible) for the data
• Underfitting occurs when a model is not complex enough (too inflexible) for the data
Train RMSE

[Figure: Prediction Error vs Model Complexity. Error (RMSE) on the y-axis, Complexity (Parameters) on the x-axis.]
(Expected) Test RMSE

[Figure: Prediction Error vs Model Complexity. Error (RMSE) on the y-axis, Complexity (Parameters) on the x-axis, showing both the Train and (Expected) Test curves.]
The "Best" Model

• Pick the model with the lowest Test RMSE
• Compared to this. . .
  • More complex models with higher Test RMSE are overfitting
  • Less complex models with higher Test RMSE are underfitting
• This is only a "guess" of the "best" model based on available information
• In practice, Test RMSE might not be such a nice curve
  • This is due to the randomness of the split
  • You could get lucky, or unlucky
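One way to act on this in R, sketched with the earlier split: fit polynomials of increasing degree and pick the degree with the lowest Test RMSE. The candidate degrees 1 through 9 are an arbitrary illustrative choice, not the lab's exact code:

    # fit polynomials of increasing degree, track test RMSE for each
    degrees  = 1:9
    tst_rmse = sapply(degrees, function(d) {
      fit = lm(y ~ poly(x, degree = d), data = trn_data)
      rmse(actual = tst_data$y, predicted = predict(fit, tst_data))
    })
    degrees[which.min(tst_rmse)]  # the "guess" at the best complexity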
Explanation vs Prediction

• Sometimes we check model assumptions directly
• When predicting, we make assumptions and check them indirectly
  • If we assume a correct (or close to correct) form of the model, the Test RMSE will be low
If Time. . .

• rmarkdown Tables
• Using code from the Internet
• Back to Test-Train Split Lab
  • What would be a good Test RMSE?
  • Overfitting: n vs p
  • Randomness of Split
  • Pseudo RNG