  1. STAT 350: Introduction
     ◮ Instructor: Richard A. Lockhart
     ◮ e-mail: lockhart ‘at’ sfu.ca
     ◮ Office: TLX 10549
     ◮ Phone: (778) 782-3264
     ◮ Web Site: http://www.stat.sfu.ca/~lockhart
     Richard Lockhart STAT 350: Introduction

  2. ◮ Text: Applied Linear Statistical Models by Kutner, Nachtsheim, Neter, Li (5th ed.).
     ◮ Coverage: Chapters 1 through 11 and selected material from Chapters 15 to 22; coverage of individual chapters will not be complete.
     ◮ Course structure: 3 hours of lectures per week (2 on Monday and 1 on Wednesday).
     ◮ Course structure: regular assignments (roughly one every two weeks).
     ◮ Course structure: two midterms and a final exam.
     ◮ Grading: Assignments 15%, Midterms 35%, Final 50%.

  3. ◮ Computing requirements: You will be required to do statistical computing in SAS, JMP or another statistical language.
     ◮ Computing requirements: I will hold tutorials in the PC computing lab in week 2 (and possibly week 3) to show you a bit of SAS.
     ◮ Reading: I will be assuming you have some familiarity with the material in Part I of the text: Chapters 1-4 and the basics of matrices as in Chapter 5, sections 1 through 7. I don’t assume you have covered every topic there, however.
     ◮ Reading: I also assume you are familiar with the material in Appendix A except possibly sections 5 and 9. Please let me know now if this is wrong!

  4. Things you have seen before
     ◮ Inference: estimation, hypothesis tests, P-values, confidence intervals.
     ◮ Simple linear regression: least squares, inference.
     ◮ Maximum likelihood estimation.
     ◮ Basic probability: distributions, densities, expected values.
     ◮ Experimental designs: randomization, treatment vs control, blinding, confounding, observational studies.

  5. Subject of this course:
     ◮ Values $Y_1, \ldots, Y_n$ of a “response” or “dependent” variable are measured under different “conditions”.
     ◮ Goal: understand the influence of the conditions on the response.
     ◮ Role of statistics: the response is subject to random fluctuation or error.

  6. Where do the data come from?
     ◮ Designed experiment: ‘conditions’ controlled by the experimenter.
     ◮ Survey data: Y and the ‘conditions’ are each measured on a sample from a population.
     In the latter case we consider the conditional behaviour of Y given the ‘conditions’.

  7. Basic Statistical Model
     Additive errors: $Y = \mu + \epsilon$.
     Assume $E(\epsilon) = 0$ (or define $\mu = E(Y)$ and deduce that $E(\epsilon) = 0$).
     For a sample of size $n$:
     $$Y_i = \mu_i + \epsilon_i, \qquad E(\epsilon_i) = 0, \qquad i = 1, \ldots, n.$$
     Goal now: relate $\mu_i$ to the “conditions” for measurement $i$.
     Each “condition” is summarized by the values of “covariates”:
     $x_{ij}$ = value of the $j$th covariate for the $i$th response.
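A minimal simulation sketch of the additive-error model in Python. The normal errors, the standard deviation of 2, the single covariate and the means $\mu_i = 5 + 0.5 x_i$ are all illustrative assumptions; the model itself only requires $E(\epsilon_i) = 0$. Because $\mu_i$ is taken to be $E(Y_i)$, the recovered errors $Y_i - \mu_i$ should average close to 0.

```python
import random
import statistics

def simulate_additive(mu, sigma=2.0, seed=350):
    """Draw Y_i = mu_i + eps_i, assuming eps_i ~ N(0, sigma^2) purely for illustration."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [m + rng.gauss(0.0, sigma) for m in mu]

# Hypothetical covariate x_i and means mu_i = 5 + 0.5 * x_i (made up for this sketch).
x = [i % 10 for i in range(20000)]
mu = [5.0 + 0.5 * xi for xi in x]
y = simulate_additive(mu)

# Recover eps_i = Y_i - mu_i; their average estimates E(eps) = 0.
errors = [yi - mi for yi, mi in zip(y, mu)]
print(round(statistics.fmean(errors), 3))
```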

  8. Linear Models
     Often we assume
     $$\mu_i = x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p$$
     where $\beta_1, \ldots, \beta_p$ are parameters (usually unknown). Key points:
     ◮ $\mu$ is a linear function of $\beta = (\beta_1, \ldots, \beta_p)^T$.
     ◮ This makes it a linear model.
     ◮ The $x_{ij}$ are known.
     A useful alternative description: $\partial \mu_i / \partial \beta_j$ ($= x_{ij}$) is known.
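The “known derivative” description can be checked numerically. This Python sketch (the covariate row and the $\beta$ values are hypothetical) approximates $\partial \mu_i / \partial \beta_j$ by central finite differences and recovers $x_{ij}$ whichever $\beta$ is plugged in, which is exactly what makes the model linear.

```python
def mean_linear(x_row, beta):
    """mu_i = x_i1*beta_1 + ... + x_ip*beta_p for a single observation i."""
    return sum(x_ij * b_j for x_ij, b_j in zip(x_row, beta))

def gradient(f, beta, h=1e-6):
    """Central finite-difference approximation to (df/dbeta_1, ..., df/dbeta_p)."""
    grads = []
    for j in range(len(beta)):
        up, down = list(beta), list(beta)
        up[j] += h
        down[j] -= h
        grads.append((f(up) - f(down)) / (2 * h))
    return grads

x_row = [1.0, 2.5, 6.25]  # hypothetical known covariates x_i1, x_i2, x_i3
for beta in ([0.0, 0.0, 0.0], [3.0, -1.0, 0.2], [100.0, 7.0, -4.0]):
    g = gradient(lambda b: mean_linear(x_row, b), beta)
    print([round(gj, 6) for gj in g])  # approximately x_row every time
```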

  9. Example: Thermoluminescence Dating (TL)
     ◮ Used to determine the age of a piece of pottery or a sand dune.
     ◮ The piece of pottery is ground up and split into small samples.
     ◮ The samples are irradiated with different amounts of gamma radiation and then heated in an oven.
     ◮ At temperatures around 300 °C they glow with a blue light called thermoluminescence.
     ◮ The amount of light given off, Y, depends on the dose D of radiation given (and also on the amount of radiation, from cosmic rays or from trace isotopes in the ground, to which the pot or sand was exposed while buried).

  10. Several models are in use:
     1. a straight-line model, $Y_i = \beta_1 + \beta_2 D_i + \epsilon_i$;
     2. a quadratic model, $Y_i = \beta_1 + \beta_2 D_i + \beta_3 D_i^2 + \epsilon_i$;
     3. a cubic model, $Y_i = \beta_1 + \beta_2 D_i + \beta_3 D_i^2 + \beta_4 D_i^3 + \epsilon_i$;
     4. a saturating exponential model, $Y_i = \beta_1 [1 - \exp\{-(\beta_2 D_i + \beta_3)\}] + \epsilon_i$.
     The first three are linear models while the fourth is not. In the first three cases the mean $\mu_i$ can be differentiated with respect to any $\beta_j$ and the result is a known (measured) constant.

  11. E.g., in the second model $(x_{i,1}, x_{i,2}, x_{i,3}) = (1, D_i, D_i^2)$. For the last model the derivatives depend on unknown parameters; for example,
     $$\frac{\partial \mu_i}{\partial \beta_1} = 1 - \exp\{-(\beta_2 D_i + \beta_3)\},$$
     which is not known since it involves $\beta_2$ and $\beta_3$.
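A small Python sketch of the contrast between the polynomial models and the saturating exponential model. For a polynomial model the row $(x_{i,1}, x_{i,2}, \ldots)$ is computed from the dose alone; for the saturating exponential model, $\partial \mu_i / \partial \beta_1$ changes when $\beta_2$ or $\beta_3$ changes. The function names and parameter values are hypothetical.

```python
import math

def polynomial_row(dose, degree):
    """Covariate row (1, D_i, D_i**2, ..., D_i**degree) for a polynomial model in the dose."""
    return [dose ** k for k in range(degree + 1)]

def dmu_dbeta1_saturating(dose, beta2, beta3):
    """d mu_i / d beta_1 for mu_i = beta_1 * [1 - exp{-(beta_2 * D_i + beta_3)}]."""
    return 1.0 - math.exp(-(beta2 * dose + beta3))

print(polynomial_row(2.0, 2))  # quadratic model → [1.0, 2.0, 4.0]

# Same dose, two hypothetical (beta_2, beta_3) pairs: the "covariate" is not fixed,
# so the saturating exponential model is not linear in beta.
print(dmu_dbeta1_saturating(1000.0, 1e-3, 0.1))
print(dmu_dbeta1_saturating(1000.0, 2e-3, 0.1))
```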

  12. Here is a plot of the data with the least squares line drawn in.
     [Figure: Plot of Data, Count versus Dose, with the least squares line]

  13. The same plot with the least squares fit of the quadratic model.
     [Figure: Plot of Data, Count versus Dose, showing the Linear Fit and the Quadratic Fit]

  14. ◮ The fits are virtually indistinguishable.
     ◮ But it is important to test the hypothesis that $\beta_3 = 0$. Why?
     ◮ Consider the use to which these models are put.
     ◮ The intercept term $\beta_1$ is the amount of TL if you don’t add any radiation.
     ◮ That is, $\beta_1$ is the TL due to the exposure to cosmic rays and so on while buried.
     ◮ The total exposure while buried is equivalent to some dose $D_{eq}$ of added radiation called the “equivalent dose”, equivalent in the sense that $\beta_1 = \beta_2 D_{eq}$ if a straight-line model is appropriate.
     ◮ Measure the equivalent dose by finding the value of $D$ which would produce a predicted TL equal to 0; for the straight line this is $D = -\beta_1/\beta_2 = -D_{eq}$.
     ◮ That is, extrapolate to negative doses until the fit crosses the x axis.
     ◮ Warning: extrapolation requires scientific theory.

  15. The linear and quadratic fits cross the x axis (y = 0) at different places:
     [Figure: Plot of Data, Count versus Dose with the dose axis extended to negative values, showing the Linear Fit and the Quadratic Fit]
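A Python sketch of the equivalent-dose calculation, assuming fitted coefficients are already in hand. The coefficients below are hypothetical, not estimates from the TL data shown here. For the straight line the crossing is at $D = -\beta_1/\beta_2$; for the quadratic we assume the relevant crossing is the root closest to $D = 0$, a choice the scientific context has to justify.

```python
import math

def linear_crossing(b1, b2):
    """Dose at which the fitted line b1 + b2 * D equals 0, i.e. D = -b1 / b2."""
    return -b1 / b2

def quadratic_crossing(b1, b2, b3):
    """Root of b1 + b2 * D + b3 * D**2 = 0 closest to D = 0 (an assumed choice of root)."""
    if b3 == 0:
        return linear_crossing(b1, b2)
    disc = b2 * b2 - 4.0 * b3 * b1
    if disc < 0:
        raise ValueError("the quadratic fit never crosses the dose axis")
    root = math.sqrt(disc)
    candidates = [(-b2 + root) / (2.0 * b3), (-b2 - root) / (2.0 * b3)]
    return min(candidates, key=abs)

# Hypothetical fitted coefficients (NOT estimates from the TL data on these slides).
b1, b2, b3 = 20000.0, 8.0, -0.0005

d_lin = linear_crossing(b1, b2)          # negative dose where the line crosses TL = 0
d_quad = quadratic_crossing(b1, b2, b3)  # negative dose where the quadratic crosses TL = 0
print("equivalent dose, straight line:", -d_lin)  # D_eq = b1 / b2
print("equivalent dose, quadratic:", -d_quad)
```

With these made-up numbers the two crossings differ, which is why the test of $\beta_3 = 0$ matters for dating.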

  16. Fit linear (and non-linear) models by least squares. Examine residual plots to judge whether or not the model assumptions are adequate:
     [Figure: Plot of Residual versus Dose]

  17. The plot shows clear signs of heteroscedasticity (unequal variances). Look at Q-Q plots of the residuals to judge normality.
     [Figure: normal Q-Q plot, resid(linear.fit) versus Quantiles of Standard Normal]

  18. The plot is not straight, so the assumption of normally distributed errors is in doubt. The problem is probably irrelevant in view of the heteroscedasticity, however.
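A Python sketch of the diagnostics on slides 16 to 18: fit the straight-line model by least squares, compute residuals to plot against dose, and pair the ordered residuals with standard normal quantiles for a Q-Q plot. The dose/count values are invented for illustration (they are not the TL data), and the plotting positions $(i - 0.5)/n$ are one common convention; statistical packages differ slightly.

```python
from statistics import NormalDist

def fit_line(dose, y):
    """Least squares estimates (b1, b2) for Y_i = beta_1 + beta_2 * D_i + eps_i."""
    n = len(dose)
    d_bar = sum(dose) / n
    y_bar = sum(y) / n
    sxy = sum((d - d_bar) * (yi - y_bar) for d, yi in zip(dose, y))
    sxx = sum((d - d_bar) ** 2 for d in dose)
    b2 = sxy / sxx
    return y_bar - b2 * d_bar, b2

def residuals(dose, y, b1, b2):
    """Observed minus fitted values, e_i = Y_i - (b1 + b2 * D_i)."""
    return [yi - (b1 + b2 * d) for d, yi in zip(dose, y)]

def normal_qq_points(resid):
    """(standard normal quantile, ordered residual) pairs for a normal Q-Q plot."""
    n = len(resid)
    z = NormalDist()  # mean 0, standard deviation 1
    return [(z.inv_cdf((i - 0.5) / n), r) for i, r in enumerate(sorted(resid), start=1)]

# Hypothetical dose/count pairs invented for illustration (NOT the TL data on these slides).
dose = [0, 500, 1000, 1500, 2000, 2500, 3000]
count = [26100, 29800, 33900, 37400, 41800, 44900, 49600]

b1, b2 = fit_line(dose, count)
resid = residuals(dose, count, b1, b2)
for d, e in zip(dose, resid):
    print(f"dose {d:5d}  residual {e:8.1f}")  # look for trends or a fan shape
for q, e in normal_qq_points(resid):
    print(f"normal quantile {q:6.2f}  residual {e:8.1f}")  # roughly straight suggests normality
```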

  19. Matrix form of a linear model
     Stack $Y_i$, $\mu_i$ and $\epsilon_i$ into vectors:
     $$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}, \qquad \epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

  20. Define
     $$\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix}, \qquad X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,p} \\ x_{2,1} & \cdots & x_{2,p} \\ \vdots & & \vdots \\ x_{n,1} & \cdots & x_{n,p} \end{bmatrix}_{n \times p}$$
     Note
     $$X\beta = \begin{bmatrix} x_{1,1}\beta_1 + \cdots + x_{1,p}\beta_p \\ \vdots \\ x_{n,1}\beta_1 + \cdots + x_{n,p}\beta_p \end{bmatrix} = \mu$$
     so $\mu = X\beta$.

  21. Finally, $Y = X\beta + \epsilon$ is our original set of $n$ model equations written in vector-matrix form.
     Assumptions so far: $E(\epsilon_i) = 0$, $Y = \mu + \epsilon$, $\mu = X\beta$.
     Still to come: independence, homoscedasticity, normality.
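A dependency-free Python sketch of $\mu = X\beta$ and $Y = X\beta + \epsilon$, using a quadratic-in-dose design like the TL example. The doses, $\beta$ and the error values are hypothetical; in practice a numerical library would do the matrix algebra.

```python
def mat_vec(X, beta):
    """Compute X beta for X given as a list of n rows, each of length p."""
    p = len(beta)
    if any(len(row) != p for row in X):
        raise ValueError("each row of X must have p = len(beta) entries")
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

# Quadratic TL-style model: row i is (1, D_i, D_i**2); doses and beta are hypothetical.
doses = [0.0, 1.0, 2.0, 3.0]
X = [[1.0, d, d * d] for d in doses]  # n x p with n = 4, p = 3
beta = [10.0, 2.0, 0.5]
mu = mat_vec(X, beta)
print(mu)  # → [10.0, 12.5, 16.0, 20.5]

eps = [0.3, -0.1, 0.0, -0.2]              # hypothetical error realizations
y = [m + e for m, e in zip(mu, eps)]      # Y = X beta + eps, one entry per model equation
```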

  22. Examples: please take the point that this is a very large class of models.
     ◮ One sample problem.
     ◮ Two sample problem.
     ◮ Simple linear regression.
     ◮ Polynomial models: “polynomial regression”.
     ◮ Analysis of Covariance: fitting two straight lines.
     ◮ Weighing designs (a simple example, mostly for illustration).
     ◮ One way layout (ANOVA). Example has data $Y_{ij}$ being …
     Next: details of these as linear models.

  23. One Sample Problem
     ◮ $Y_1, \ldots, Y_n$ measured under “identical” conditions.
     ◮ So $\mu_1 = \cdots = \mu_n = \beta_1$, say.
     ◮ $X = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}_{n \times 1}$
     ◮ $\beta = [\beta_1]_{1 \times 1}$ (so $p = 1$).
     ◮ $Y = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \beta + \epsilon$.
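Least squares for the one-sample model, as a Python sketch with hypothetical data. With $X$ a column of ones, the normal equations $X^T X \beta = X^T Y$ reduce to $n\beta_1 = \sum_i Y_i$, so the estimate is the sample mean $\bar{Y}$.

```python
def one_sample_ls(y):
    """Least squares estimate of beta_1 in Y = X beta + eps, X an n x 1 column of ones."""
    X = [[1.0] for _ in y]                            # n x 1 design matrix
    xtx = sum(row[0] * row[0] for row in X)           # X^T X = n
    xty = sum(row[0] * yi for row, yi in zip(X, y))   # X^T Y = sum of the Y_i
    return xty / xtx                                  # solve n * beta_1 = sum(Y_i)

y = [4.0, 6.0, 5.0, 9.0]  # hypothetical observations
print(one_sample_ls(y))   # → 6.0
```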

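  24. Two sample problem
     For $n = r + s$:
     $$\mu_1 = \cdots = \mu_r = \beta_1, \qquad \mu_{r+1} = \cdots = \mu_{r+s} = \beta_2.$$
     For $i \le r$: $Y_i = \beta_1 + \epsilon_i$, $E(Y_i) = \beta_1$.
     For $r < i \le r + s$: $Y_i = \beta_2 + \epsilon_i$, $E(Y_i) = \beta_2$.
     In matrix form (the first $r$ rows of $X$ are $(1, 0)$ and the last $s$ rows are $(0, 1)$):
     $$Y = \begin{bmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + \epsilon$$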
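The same idea for the two-sample model, again as a Python sketch with hypothetical data. Here $X^T X$ is diagonal with entries $r$ and $s$, and $X^T Y$ holds the two group totals, so least squares estimates $\beta_1$ and $\beta_2$ by the two sample means.

```python
def two_sample_design(r, s):
    """n x 2 design matrix: r rows (1, 0) followed by s rows (0, 1)."""
    return [[1.0, 0.0] for _ in range(r)] + [[0.0, 1.0] for _ in range(s)]

def two_sample_ls(y, r):
    """Least squares (beta_1, beta_2) when the first r observations form sample 1."""
    s = len(y) - r
    if r < 1 or s < 1:
        raise ValueError("each sample needs at least one observation")
    X = two_sample_design(r, s)
    # X^T X: the off-diagonal entry is 0 because no row has a 1 in both columns.
    xtx_diag = [sum(row[j] * row[j] for row in X) for j in range(2)]     # [r, s]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(2)]  # group totals
    return xty[0] / xtx_diag[0], xty[1] / xtx_diag[1]

y = [3.0, 5.0, 4.0, 10.0, 12.0]  # hypothetical data: r = 3, s = 2
print(two_sample_ls(y, r=3))     # → (4.0, 11.0)
```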
