ST 430/514 Introduction to Regression Analysis/Statistics for Management and the Social Sciences II

Introduction to Regression Analysis: Modeling a Response

A regression model describes how a dependent variable (or response) $Y$ is affected, on average, by one or more independent variables (or factors, or covariates) $x_1, x_2, \ldots, x_k$.

Example: bleaching cotton
$Y$ = measured whiteness of a cotton swatch
$x_1$ = temperature of the bleaching bath
$x_2$ = time spent in the bath
The average value of $Y$, written $E(Y)$, depends on $x_1, x_2, \ldots, x_k$, so it is a function of them:

$E(Y) = f(x_1, x_2, \ldots, x_k) = f(\mathbf{x})$.

We may know the general form of $f(\mathbf{x})$, but it may contain constants $\beta_0, \beta_1, \ldots, \beta_p$ whose values are unknown. So, more completely,

$E(Y) = f(x_1, x_2, \ldots, x_k; \beta_0, \beta_1, \ldots, \beta_p) = f(\mathbf{x}, \boldsymbol{\beta})$.

This equation is a regression model.
In any given measurement, $Y$ will differ from $E(Y)$. The difference

$\epsilon = Y - E(Y)$

is called the random error. Because $E(Y)$ is a fixed constant, its expected value is itself, so

$E(\epsilon) = E(Y) - E(Y) = 0$.

We can then write the regression model as

$Y = E(Y) + \epsilon = f(\mathbf{x}, \boldsymbol{\beta}) + \epsilon$.
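To make the decomposition $Y = f(\mathbf{x}, \boldsymbol{\beta}) + \epsilon$ concrete, here is a minimal Python simulation sketch. It assumes a linear form for $f$, made-up values for $\boldsymbol{\beta}$, temperature and time (none of these come from real bleaching data), and normal errors with mean zero. Averaging many simulated measurements at one fixed setting should land close to $E(Y)$, because the errors average out to zero.

```python
import numpy as np

def mean_response(x1, x2, beta):
    """Hypothetical linear form f(x, beta) = beta0 + beta1*x1 + beta2*x2."""
    b0, b1, b2 = beta
    return b0 + b1 * x1 + b2 * x2

rng = np.random.default_rng(seed=0)
beta = (50.0, 0.3, 0.5)   # assumed parameter values, for illustration only
x1, x2 = 60.0, 20.0       # one fixed (hypothetical) temperature and time

ey = mean_response(x1, x2, beta)   # E(Y) = 50 + 0.3*60 + 0.5*20 = 78
eps = rng.normal(loc=0.0, scale=2.0, size=10_000)  # assumed errors, E(eps) = 0
y = ey + eps                       # each measurement: Y = f(x, beta) + eps

print("E(Y) at this setting:", ey)
print("average of simulated Y minus E(Y):", y.mean() - ey)  # near zero
```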
Example: bleaching cotton
Bleaching is a chemical reaction in which colored impurities are oxidized either to colorless products or to soluble products that are washed out. If we knew all the reactions, their rates at various temperatures, and the solubility of the products, we could use a process-based model to predict the expected whiteness, $E(Y)$. In practice, we don't have all the details, so instead we use an empirical model.
The simplest empirical model is a linear function:

$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.

A quadratic model, which allows for curvature and for interaction between temperature and time, gives a better approximation:

$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$.

If $\beta_4 < 0$, $\beta_5 < 0$, and $\beta_3^2 < 4 \beta_4 \beta_5$, this function has a maximum, which gives the optimum combination of temperature and time.
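The maximum condition for the quadratic model can be checked numerically. Setting both partial derivatives to zero gives the linear system $\begin{pmatrix} 2\beta_4 & \beta_3 \\ \beta_3 & 2\beta_5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = -\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}$, and the three conditions above are exactly what make that matrix (the Hessian) negative definite, so the stationary point is a maximum. The Python sketch below assumes NumPy and uses hypothetical coefficient values chosen only so the optimum lands at a round point.

```python
import numpy as np

def quadratic_optimum(beta):
    """Stationary point of
    E(Y) = b0 + b1*x1 + b2*x2 + b3*x1*x2 + b4*x1**2 + b5*x2**2.

    Returns (x_opt, is_max): the point where both partial derivatives vanish,
    and whether b4 < 0, b5 < 0 and b3**2 < 4*b4*b5 (i.e. it is a maximum).
    """
    b0, b1, b2, b3, b4, b5 = beta
    hessian = np.array([[2 * b4, b3],
                        [b3, 2 * b5]])
    # Gradient = 0  <=>  hessian @ [x1, x2] = -[b1, b2]
    x_opt = np.linalg.solve(hessian, -np.array([b1, b2]))
    is_max = b4 < 0 and b5 < 0 and b3**2 < 4 * b4 * b5
    return x_opt, is_max

beta = (10.0, 3.0, 2.0, 1.0, -1.0, -2.0)   # hypothetical coefficients
x_opt, is_max = quadratic_optimum(beta)
print("optimum (x1, x2):", x_opt, "maximum:", is_max)
```

With these coefficients the optimum is at $x_1 = 2$, $x_2 = 1$ (in whatever units temperature and time are coded), and since $\beta_3^2 = 1 < 8 = 4\beta_4\beta_5$ it is a maximum.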
Origin of “Regression”
Francis Galton studied the inheritability of physical characteristics such as height. Consider the deviation of an individual's height from the average height for that gender. Suppose that the height deviation $Y$ of a son is, on average, linearly related to the average height deviation $x$ of his parents:

$E(Y) = \beta_0 + \beta_1 x$.
The intercept $\beta_0$ measures the overall increase in height between generations, which is interesting but not related to inheritability. If $\beta_1 = 1$, the son inherits the full characteristic of his parents; if $\beta_1 = 0$, there is no inheritability. Galton observed $\beta_1 \approx 2/3$: because the slope lies between 0 and 1, sons of unusually tall or unusually short parents are, on average, closer to the average than their parents were. He described this as a regression to the mean. (OED: from Latin regressus, from regredi 'go back, return', from re- 'back' + gradi 'to walk'.)
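A tiny worked illustration of regression to the mean, as a Python sketch. It uses Galton's approximate slope $\beta_1 = 2/3$ and assumes, for simplicity, $\beta_0 = 0$ (no overall change in height between generations); the parental deviations are arbitrary example values in inches. Exact fractions keep the arithmetic readable.

```python
from fractions import Fraction

BETA0 = Fraction(0)      # assumption: no overall height change between generations
BETA1 = Fraction(2, 3)   # Galton's approximate slope

def expected_son_deviation(parent_deviation):
    """E(Y) = beta0 + beta1 * x, with x the parents' average deviation (inches)."""
    return BETA0 + BETA1 * parent_deviation

# Hypothetical parental deviations, from well below to well above average
for x in (-6, -3, 0, 3, 6):
    print(x, expected_son_deviation(x))
```

Sons of parents 6 inches above average are expected to be only 4 inches above average, and sons of parents 6 inches below average only 4 inches below: every expected deviation is pulled toward zero.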
See Francis Galton, “Regression towards mediocrity in hereditary stature”, The Journal of the Anthropological Institute of Great Britain and Ireland, Vol. 15, pages 246–263 (or Wikipedia!).

The term “regression” has since been used for any such analysis, whether it involves one or several independent variables and linear or nonlinear relationships, mostly with no connection to inheritability.
Estimation
In a regression context, we sample from many populations, one for each combination of values of the independent variables. For example, in bleaching cotton, we could test many cotton swatches at each combination of temperature and time; each measured whiteness is drawn from the population for that combination. The constants $\beta_0, \beta_1, \ldots, \beta_p$ are parameters of that whole collection of populations.
We need to make inferences about them in the form of point estimates, interval estimates, and hypothesis tests. We shall get point estimates using the method of least squares, which chooses the values of $\beta_0, \beta_1, \ldots, \beta_p$ that minimize the sum of squared differences between the observed responses and $f(\mathbf{x}, \boldsymbol{\beta})$. For the other inferences, we need to know the distribution of the errors $\epsilon$, and we shall assume that they are normally distributed.
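A minimal least squares sketch in Python with NumPy, for the linear bleaching model $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$. Everything about the data is an assumption made for illustration: the three temperatures and three times, the four swatches tested at each combination (echoing the idea of sampling from one population per setting), the "true" $\boldsymbol{\beta}$, and the normal error standard deviation. `np.linalg.lstsq` returns the $\hat{\boldsymbol{\beta}}$ that minimizes $\sum_i (Y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2})^2$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical design: every combination of three temperatures and three
# times, with four swatches tested at each combination (values made up).
temps = np.array([40.0, 60.0, 80.0])
times = np.array([10.0, 20.0, 30.0])
grid_t, grid_s = np.meshgrid(temps, times)
x1 = np.repeat(grid_t.ravel(), 4)   # temperature for each of the 36 swatches
x2 = np.repeat(grid_s.ravel(), 4)   # time for each of the 36 swatches

beta_true = np.array([50.0, 0.3, 0.5])             # assumed "true" parameters
X = np.column_stack([np.ones_like(x1), x1, x2])    # design matrix: 1, x1, x2
y = X @ beta_true + rng.normal(0.0, 2.0, size=x1.size)  # normal errors, mean 0

# Least squares: beta_hat minimizes the sum of squared residuals ||y - X beta||^2
beta_hat, rss, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print("least squares estimates:", beta_hat)

# The residuals are orthogonal to every column of X (the normal equations)
print("X^T (y - X beta_hat):", X.T @ (y - X @ beta_hat))
```

The slope estimates land near the assumed true values, while the intercept is estimated less precisely because the design points are far from $x_1 = x_2 = 0$.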
Observational and Experimental Data
In some investigations, the independent variables $x_1, x_2, \ldots, x_k$ can be controlled, that is, held at desired values: for example, time and temperature in the bleaching problem. The resulting data are called experimental.
In other cases, the independent variables cannot be controlled, and their values are simply observed: for example, Galton's heights of parents and sons. The resulting data are called observational.

Observational data show how the value of the response is associated with the values of the independent variables, but generally cannot reveal cause and effect.

George Box: “To find out what happens to a system when you interfere with it, you have to interfere with it (not just passively observe it).”
Random Thoughts About Statistical Models
A model is a simplified representation of reality.

George Box: “All models are wrong, but some are useful.”

John Tukey: “An approximate answer to the right question is worth a good deal more than the exact answer to an approximate problem.”

Albert Einstein: “For every complex question there is a simple and wrong solution.”