Linear Regression
David M. Blei
COS424, Princeton University
April 10, 2008
Regression

• We have studied classification, the problem of automatically categorizing data into a set of discrete classes.
  • E.g., based on its words, is an email spam or ham?
• Regression is the problem of predicting a real-valued variable from input data.
Linear regression

[Scatter plot: input on the horizontal axis, response on the vertical axis.]

Data are a set of inputs and outputs, D = {(x_n, y_n)}_{n=1}^N.
Linear regression

[The same scatter plot of input vs. response.]

The goal is to predict y from x using a linear function.
Examples

[Scatter plot: input vs. response.]

• Given today's weather, how much will it rain tomorrow?
• Given today's market, what will be the price of a stock tomorrow?
• Given her emails, how long will a user stay on a page?
• Others?
Linear regression

[Scatter plot with a fitted line f(x) = β_0 + β x; one data point labeled (x_n, y_n).]
Multiple inputs

• Usually, we have a vector of inputs, each representing a different feature of the data that might be predictive of the response:

    x = (x_1, x_2, ..., x_p)

• The response is assumed to be a linear function of the input:

    f(x) = β_0 + Σ_{i=1}^p x_i β_i

• Here, β^T x = 0 is a hyperplane.
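The linear function above is just an intercept plus a dot product. A minimal sketch with made-up coefficients and inputs (all values here are hypothetical, chosen only to illustrate the formula):

```python
import numpy as np

# Hypothetical example: p = 3 input features.
beta0 = 0.5                        # intercept beta_0
beta = np.array([1.0, -2.0, 0.3])  # one coefficient per feature
x = np.array([0.2, 0.1, 1.5])      # a single input vector

# f(x) = beta_0 + sum_i x_i * beta_i
f_x = beta0 + x @ beta
```

With these numbers, f(x) = 0.5 + 0.2 − 0.2 + 0.45 = 0.95.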
Multiple inputs

[3-D scatter plot: response Y against inputs X_1 and X_2.]
Flexibility of linear regression

• This set-up is less limiting than you might imagine.
• Inputs can be:
  • Any features of the data
  • Transformations of the original features, e.g., x_2 = log x_1 or x_2 = √x_1
  • A basis expansion, e.g., x_2 = x_1^2 and x_3 = x_1^3
  • Indicators of qualitative inputs, e.g., category
  • Interactions between inputs, e.g., x_1 = x_2 x_3
• Its simplicity and flexibility make linear regression one of the most important and widely used statistical prediction techniques.
Polynomial regression example

[Scatter plot: input vs. response; a few points rise steeply above the rest.]
Linear regression

[The same data with a fitted line f(x) = β_0 + β x.]
Polynomial regression

[The same data with a fitted cubic f(x) = β_0 + β_1 x + β_2 x^2 + β_3 x^3.]
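A cubic fit like the one above is still linear regression: the design matrix just has columns 1, x, x^2, x^3. A sketch on simulated data (the true coefficients and noise level here are assumed, not from the slides):

```python
import numpy as np

# Simulated data from an assumed cubic: y = 1 - 0.5 x + 2 x^3 + noise
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 100)
y = 1.0 - 0.5 * x + 2.0 * x**3 + 0.1 * rng.standard_normal(x.size)

# Design matrix with a basis expansion; the model is linear in beta.
X = np.column_stack([np.ones_like(x), x, x**2, x**3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # [b0, b1, b2, b3]
```

The least-squares estimates should land close to the generating coefficients (b0 ≈ 1, b3 ≈ 2).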
Fitting a regression

[Scatter plot: x vs. y.]

• Given data D = {(x_n, y_n)}_{n=1}^N, find the coefficient β that can predict y_new from x_new.
• Simplifications:
  • 0-intercept, i.e., β_0 = 0
  • One input, i.e., p = 1
• How should we proceed?
Residual sum of squares

[Scatter plot with the fitted line; the residual |y_n − β x_n| is the vertical distance from a point to the line.]

A reasonable approach is to minimize the sum of squared distances between each prediction β x_n and the truth y_n:

    RSS(β) = (1/2) Σ_{n=1}^N (y_n − β x_n)^2
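The RSS objective is a one-line computation. A minimal sketch, using assumed toy data in which y is exactly 2x so the residuals vanish at β = 2:

```python
import numpy as np

def rss(beta, x, y):
    """RSS(beta) = 1/2 * sum_n (y_n - beta * x_n)^2"""
    return 0.5 * np.sum((y - beta * x) ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # exactly y = 2x, so RSS(2) = 0
```

At β = 1 the residuals are (1, 2, 3), giving RSS = (1 + 4 + 9)/2 = 7.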
RSS for two inputs

[3-D scatter plot: response Y against inputs X_1 and X_2, with residuals measured to a fitted plane.]
Optimizing β

The objective function is

    RSS(β) = (1/2) Σ_{n=1}^N (y_n − β x_n)^2

The derivative is

    d/dβ RSS(β) = −Σ_{n=1}^N (y_n − β x_n) x_n

The optimal value is

    β̂ = (Σ_{n=1}^N y_n x_n) / (Σ_n x_n^2)
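The closed-form estimate can be computed and checked against the first-order condition directly. A sketch on assumed toy data (points scattered around the line y = x):

```python
import numpy as np

# Assumed toy data, roughly y = x with small deviations.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([-1.9, -1.1, 0.2, 0.9, 2.1])

# beta_hat = sum_n y_n x_n / sum_n x_n^2
beta_hat = np.sum(y * x) / np.sum(x ** 2)

# At the optimum the derivative -sum_n (y_n - beta x_n) x_n is zero.
grad_at_opt = -np.sum((y - beta_hat * x) * x)
```

Here Σ y_n x_n = 10 and Σ x_n^2 = 10, so β̂ = 1, and the derivative at β̂ is zero as the setting of the derivative to zero requires.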
The optimal β

[Scatter plot of x vs. y with the fitted line.]

• The optimal value is

    β̂ = (Σ_{n=1}^N y_n x_n) / (Σ_n x_n^2)

• Positive terms pull the slope up.
• Negative terms pull the slope down.