Linear Regression
David M. Blei
COS424, Princeton University
April 10, 2008
Regression

• We have studied classification, the problem of automatically categorizing data into a set of discrete classes.
  • E.g., based on its words, is an email spam or ham?
• Regression is the problem of predicting a real-valued variable from input data.
Linear regression

[Scatter plot: input on the horizontal axis, response on the vertical axis.]

Data are a set of inputs and outputs, D = {(x_n, y_n)}_{n=1}^N.
Linear regression

[The same scatter plot of input vs. response.]

The goal is to predict y from x using a linear function.
Examples

[Scatter plot: input vs. response.]

• Given today's weather, how much will it rain tomorrow?
• Given today's market, what will be the price of a stock tomorrow?
• Given her emails, how long will a user stay on a page?
• Others?
Linear regression

[Scatter plot with a fitted line f(x) = β_0 + β x; one data point labeled (x_n, y_n).]
Multiple inputs

• Usually, we have a vector of inputs, each representing a different feature of the data that might be predictive of the response:

    x = (x_1, x_2, ..., x_p)

• The response is assumed to be a linear function of the input:

    f(x) = β_0 + Σ_{i=1}^p x_i β_i

• Here, β^T x = 0 is a hyperplane.
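The linear function above is just an intercept plus a dot product. A minimal sketch with made-up coefficients and inputs (all values here are hypothetical, chosen only to illustrate the formula):

```python
import numpy as np

# Hypothetical example: p = 3 input features.
beta0 = 0.5                        # intercept beta_0
beta = np.array([1.0, -2.0, 0.3])  # one coefficient per feature
x = np.array([0.2, 0.1, 1.5])      # a single input vector

# f(x) = beta_0 + sum_i x_i * beta_i
f_x = beta0 + x @ beta
```

With these numbers, f(x) = 0.5 + 0.2 − 0.2 + 0.45 = 0.95.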
Multiple inputs

[3-D scatter plot: response Y against inputs X_1 and X_2.]
Flexibility of linear regression

• This set-up is less limiting than you might imagine.
• Inputs can be:
  • Any features of the data
  • Transformations of the original features, e.g., x_2 = log x_1 or x_2 = √x_1
  • A basis expansion, e.g., x_2 = x_1^2 and x_3 = x_1^3
  • Indicators of qualitative inputs, e.g., category
  • Interactions between inputs, e.g., x_1 = x_2 x_3
• Its simplicity and flexibility make linear regression one of the most important and widely used statistical prediction techniques.
Polynomial regression example

[Scatter plot: input vs. response; a few points rise steeply above the rest.]
Linear regression

[The same data with a fitted line f(x) = β_0 + β x.]
Polynomial regression

[The same data with a fitted cubic f(x) = β_0 + β_1 x + β_2 x^2 + β_3 x^3.]
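A cubic fit like the one above is still linear regression: the design matrix just has columns 1, x, x^2, x^3. A sketch on simulated data (the true coefficients and noise level here are assumed, not from the slides):

```python
import numpy as np

# Simulated data from an assumed cubic: y = 1 - 0.5 x + 2 x^3 + noise
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 100)
y = 1.0 - 0.5 * x + 2.0 * x**3 + 0.1 * rng.standard_normal(x.size)

# Design matrix with a basis expansion; the model is linear in beta.
X = np.column_stack([np.ones_like(x), x, x**2, x**3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # [b0, b1, b2, b3]
```

The least-squares estimates should land close to the generating coefficients (b0 ≈ 1, b3 ≈ 2).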
Fitting a regression

[Scatter plot: x vs. y.]

• Given data D = {(x_n, y_n)}_{n=1}^N, find the coefficient β that can predict y_new from x_new.
• Simplifications:
  • 0-intercept, i.e., β_0 = 0
  • One input, i.e., p = 1
• How should we proceed?
Residual sum of squares

[Scatter plot with the fitted line; the residual |y_n − β x_n| is the vertical distance from a point to the line.]

A reasonable approach is to minimize the sum of squared distances between each prediction β x_n and the truth y_n:

    RSS(β) = (1/2) Σ_{n=1}^N (y_n − β x_n)^2
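The RSS objective is a one-line computation. A minimal sketch, using assumed toy data in which y is exactly 2x so the residuals vanish at β = 2:

```python
import numpy as np

def rss(beta, x, y):
    """RSS(beta) = 1/2 * sum_n (y_n - beta * x_n)^2"""
    return 0.5 * np.sum((y - beta * x) ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # exactly y = 2x, so RSS(2) = 0
```

At β = 1 the residuals are (1, 2, 3), giving RSS = (1 + 4 + 9)/2 = 7.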
RSS for two inputs

[3-D scatter plot: response Y against inputs X_1 and X_2, with residuals measured to a fitted plane.]
Optimizing β

The objective function is

    RSS(β) = (1/2) Σ_{n=1}^N (y_n − β x_n)^2

The derivative is

    d/dβ RSS(β) = −Σ_{n=1}^N (y_n − β x_n) x_n

The optimal value is

    β̂ = (Σ_{n=1}^N y_n x_n) / (Σ_n x_n^2)
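The closed-form estimate can be computed and checked against the first-order condition directly. A sketch on assumed toy data (points scattered around the line y = x):

```python
import numpy as np

# Assumed toy data, roughly y = x with small deviations.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([-1.9, -1.1, 0.2, 0.9, 2.1])

# beta_hat = sum_n y_n x_n / sum_n x_n^2
beta_hat = np.sum(y * x) / np.sum(x ** 2)

# At the optimum the derivative -sum_n (y_n - beta x_n) x_n is zero.
grad_at_opt = -np.sum((y - beta_hat * x) * x)
```

Here Σ y_n x_n = 10 and Σ x_n^2 = 10, so β̂ = 1, and the derivative at β̂ is zero as the setting of the derivative to zero requires.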
The optimal β

[Scatter plot of x vs. y with the fitted line.]

• The optimal value is

    β̂ = (Σ_{n=1}^N y_n x_n) / (Σ_n x_n^2)

• Positive terms pull the slope up.
• Negative terms pull the slope down.