Bus 701: Advanced Statistics
Harald Schmidbauer
© Harald Schmidbauer & Angi Rösch, 2007
13.1 Simple Linear Regression: Goals

Once again, given are points (x_i, y_i) from a bivariate metric variable (X, Y). How can we establish a functional relationship between X and Y? Most importantly:
• Which straight line is "good"? And what does "good" mean?
• How can the parameters of a "good" line be computed?
Why would we want to fit a line to a cloud of points?
• In order to quantify the relationship between X and Y, using a simple model.
• In order to forecast Y for a given value of X.
13.2 The Regression Line

Finding a "good" line...

[Figure: a scatterplot of points in the (x, y) plane.]

...and how can we find a "good" line? A criterion is needed!
A very simple scatterplot.
• observed points: (x_i, y_i)
• points on the line: (x_i, ŷ_i)

[Figure: three observed points (x_1, y_1), (x_2, y_2), (x_3, y_3) and a fitted line with the corresponding points (x_i, ŷ_i) on it.]
Definition. Define ŷ_i = a + b x_i and e_i = y_i − ŷ_i. The regression line of Y with respect to X is the line y = a + bx with parameters a and b such that

    Q(a, b) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2

attains its minimum. The parameter b thus obtained is called the regression coefficient. This way of finding a and b is called the method of least squares.
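A brief sketch of the minimization step, which the slide does not spell out: setting the partial derivatives of Q with respect to a and b equal to zero gives the so-called normal equations,

    \frac{\partial Q}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0
    \quad\Longrightarrow\quad \sum y_i = n a + b \sum x_i,

    \frac{\partial Q}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0
    \quad\Longrightarrow\quad \sum x_i y_i = a \sum x_i + b \sum x_i^2.

Solving these two linear equations for a and b yields the formulas given a few slides below.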
Regression: some first comments.
• "Good" means: the sum of squared distances, parallel to the y-axis, is minimized.
• This procedure is asymmetric!
• It conforms to the idea: given X, what is Y?
• X: "independent variable", Y: "dependent variable"
Regression is asymmetric. The regression lines...
• ...of Y w.r.t. X and
• ...of X w.r.t. Y
are usually different.

[Figure: a scatterplot with two different fitted lines, one for each regression direction.]
Y w.r.t. X, or rather X w.r.t. Y? Example:

    X = body-height of a person;  Y = body-weight of a person

Here, a regression of Y w.r.t. X looks quite natural, while a regression of X w.r.t. Y would be strange.
Y w.r.t. X, or rather X w.r.t. Y? Example: Consider the change in percent of price indices, relative to the corresponding month of the previous year:

    X = change of the housing price index;  Y = change of the clothing price index

Here, neither of the regressions (Y w.r.t. X nor X w.r.t. Y) looks very meaningful, because it is neither convincing to say that X influences (or even causes) Y, nor vice versa. In this example, a symmetric procedure is more appropriate than regression.
Computing the regression line. Minimizing Q leads to the following equations for the slope b and the intercept a:

    b = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2}
      = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
      = \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)},

    a = \bar{y} - b \bar{x}.
Example: (This is a toy example...)

     i   x_i   y_i   x_i^2   y_i^2   x_i y_i   ŷ_i     e_i
     1     5    15      25     225        75   13.9    1.1
     2    10     8     100      64        80   11.3   -3.3
     3    15    12     225     144       180    8.7    3.3
     4    20     5     400      25       100    6.1   -1.1
     Σ    50    40     750     458       435   40.0    0.0

Then,

    b = \frac{4 \cdot 435 - 50 \cdot 40}{4 \cdot 750 - 50^2} = -0.52,
    \qquad
    a = \frac{40}{4} - (-0.52) \cdot \frac{50}{4} = 16.5.

The regression line is: y = 16.5 − 0.52 x. Using this regression line, the ŷ_i and the e_i can be computed. We observe: the mean of the ŷ_i equals ȳ, and ē = 0. (This is always the case.)
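A minimal Python sketch (my own illustration, not part of the original slides) that implements the formulas from the previous slide and reproduces the toy-example numbers; the function name least_squares_line is my own choice.

    import numpy as np

    def least_squares_line(x, y):
        """Intercept a and slope b of the regression line of y w.r.t. x."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        a = y.mean() - b * x.mean()
        return a, b

    x = [5, 10, 15, 20]
    y = [15, 8, 12, 5]
    a, b = least_squares_line(x, y)
    print(a, b)                      # 16.5 and -0.52 (up to rounding)

    y_hat = a + b * np.asarray(x)    # fitted values: 13.9, 11.3, 8.7, 6.1
    e = np.asarray(y) - y_hat        # residuals: 1.1, -3.3, 3.3, -1.1
    print(e.sum())                   # approximately 0 (residuals always sum to zero)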
A plot of the toy example.

[Figure: scatterplot of the four points (5, 15), (10, 8), (15, 12), (20, 5) together with the regression line y = 16.5 − 0.52 x; x runs from 0 to 25, y from 0 to 20.]
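A short matplotlib sketch (my own addition) that draws a plot like this for the toy data:

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.array([5, 10, 15, 20])
    y = np.array([15, 8, 12, 5])
    grid = np.linspace(0, 25, 100)

    plt.scatter(x, y)                    # the four observed points
    plt.plot(grid, 16.5 - 0.52 * grid)   # the fitted regression line
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()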
13.3 Explanatory Power of the Model

Next, we look at the explanatory power of the regression model.

[Figure: the simple three-point scatterplot again, showing the observed y_i and the fitted ŷ_i on the line.]
The explanatory power of the regression model... We observe:
• There is (in general) less variability in the ŷ_i than in the y_i! That is, the regression line cannot explain the entire variability in the observed y_i.
• The regression could provide a complete explanation if all points (x_i, y_i) were on the regression line.
Decomposition of variance.

    \sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2

    SST = SSR + SSE

Here,
SST: total sum of squares
SSR: regression sum of squares
SSE: error sum of squares
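A brief sketch of why this identity holds, which the slide does not spell out: write y_i − ȳ = (ŷ_i − ȳ) + (y_i − ŷ_i) and expand the square,

    \sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2
        + 2 \sum (\hat{y}_i - \bar{y})(y_i - \hat{y}_i).

For the least-squares line the residuals e_i = y_i − ŷ_i satisfy \sum e_i = 0 and \sum x_i e_i = 0 (the normal equations), hence \sum (\hat{y}_i - \bar{y}) e_i = \sum (a + b x_i - \bar{y}) e_i = 0, so the cross term vanishes.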
The coefficient of determination. It is defined as SSR/SST.
• The coefficient of determination is the share of variability in the data which is explained by the regression.
• It holds that SSR/SST = r² = cor²(X, Y).
• r² = 100% if and only if all observed points are on the regression line.
• r² = 0% if and only if X and Y are uncorrelated.
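A small Python sketch (my own illustration, again using the toy data from above) that computes the decomposition and checks that SSR/SST equals the squared correlation coefficient:

    import numpy as np

    x = np.array([5.0, 10.0, 15.0, 20.0])
    y = np.array([15.0, 8.0, 12.0, 5.0])

    # fitted values from the least-squares line
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    y_hat = a + b * x

    sst = np.sum((y - y.mean()) ** 2)       # total sum of squares: 58.0
    ssr = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares: 33.8
    sse = np.sum((y - y_hat) ** 2)          # error sum of squares: 24.2
    r2 = ssr / sst                          # about 0.58

    r = np.corrcoef(x, y)[0, 1]             # sample correlation of X and Y
    print(np.isclose(r2, r ** 2))           # True: r2 = cor^2(X, Y)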
Example: Overseas Shipholding Group, Inc. ("OSG"), is a marine transportation company whose stock is listed at the New York Stock Exchange (NYSE). Let monthly returns in percent be defined as

    osg.ret = return on OSG stock (black in the figure below);
    nyse.ret = return on the NYSE Composite Index (red)

[Figure: time series of monthly returns on OSG and on the NYSE Composite, 2001 to 2005, ranging roughly between −20% and +20%.]
Scatterplot and regression results.
• regression line: osg.ret = 1.50 + 1.47 · nyse.ret
• coefficient of determination: r² = 29%

[Figure: scatterplot of OSG returns against NYSE Composite returns with the fitted regression line.]
An interpretation of our results. Why are there fluctuations in OSG stock price?
• It is not by pure chance that OSG stock price fluctuates.
• It is because the market index NYSE Composite fluctuates!
• Is this the only reason? No, but fluctuations in NYSE Composite explain about 29% of the variability in OSG stock price.
• So what might be other reasons? This is not investigated here... (a guess: import/export quantities, decisions of the CEO, condition of competitors, ...)
13.4 A Stochastic SLR Model

SLR in descriptive and inductive statistics.
• So far, we have seen SLR from a purely descriptive point of view. (There were no probabilities, no stochastic models.)
• Advantage of this approach: simplicity.
• Disadvantage: we obtain no insight into the mechanism which created the data; for this purpose, we need a stochastic model and the methods of inductive statistics!
A stochastic simple linear regression model.

    Y_i = \alpha + \beta x_i + \epsilon_i,   i = 1, ..., n

• The random variable Y_i represents the observation belonging to x_i.
• α and β are unknown parameters (to be estimated).
• x_i is the observation of the independent variable X.
• ε_i is a random variable; it contains everything not accounted for in the equation y = α + βx.
Assumptions about ε. We shall assume that the ε_i in

    Y_i = \alpha + \beta x_i + \epsilon_i,   i = 1, ..., n

are a sequence of independent and identically distributed random variables:

    \epsilon_i \sim N(0, \sigma_\epsilon^2)   (iid)

The "normality assumption" is very strong.
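To make the model concrete, here is a small simulation sketch in Python (my own illustration, with arbitrarily chosen true values α = 2, β = 0.5 and σ_ε = 1): it generates data from the stochastic SLR model with iid normal errors and re-estimates the parameters by least squares.

    import numpy as np

    rng = np.random.default_rng(0)

    n = 200
    alpha, beta, sigma_eps = 2.0, 0.5, 1.0    # true parameters (chosen for illustration)

    x = np.linspace(0, 10, n)                 # fixed values of the independent variable
    eps = rng.normal(0.0, sigma_eps, size=n)  # iid N(0, sigma_eps^2) errors
    y = alpha + beta * x + eps                # observations generated by the model

    # least-squares estimates of alpha and beta
    b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a_hat = y.mean() - b_hat * x.mean()
    print(a_hat, b_hat)                       # should come out close to 2 and 0.5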