Nonnormal Responses



  1. ST 516 Experimental Statistics for Engineers II
     Other Topics: Nonnormal Responses

     We have usually assumed that experimental data are at least approximately normally distributed, with at least approximately constant variance. When either assumption is violated, we can try transforming the response to remove the violation, or use another model for the response distribution.

  2. Box-Cox approach
     The power transformations $y^* = y^{\lambda}$ are useful. Box and Cox developed a systematic approach to finding a good λ, based on

     $$ y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda\, \dot{y}^{\lambda - 1}}, & \lambda \neq 0, \\[1ex] \dot{y} \ln y, & \lambda = 0, \end{cases} $$

     where

     $$ \dot{y} = \exp\left( \frac{1}{n} \sum_{i=1}^{n} \ln y_i \right) $$

     is the geometric mean response.
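A minimal R sketch of the scaled transformation above, written directly from the formula; boxcox_transform is a hypothetical helper name, not something from the course scripts. Dividing by $\dot{y}^{\lambda-1}$ keeps the transformed responses on a comparable scale, which is what makes SSE values comparable across different λ.

```r
# Scaled Box-Cox transformation y^(lambda); requires strictly positive y.
# boxcox_transform is a hypothetical helper written for illustration.
boxcox_transform <- function(y, lambda) {
  stopifnot(all(y > 0))
  gm <- exp(mean(log(y)))            # geometric mean response, y-dot
  if (abs(lambda) < 1e-8) {
    gm * log(y)                      # lambda = 0 case
  } else {
    (y^lambda - 1) / (lambda * gm^(lambda - 1))
  }
}

y <- c(1, 2, 4, 8)                   # illustrative values only
boxcox_transform(y, 1)               # lambda = 1 gives y - 1: 0 1 3 7
boxcox_transform(y, 0)               # lambda = 0 gives gm * log(y)
```

As λ approaches 0 the first branch converges to the second, so the two cases join continuously.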

  3. Procedure
     Fit the model for various values of λ, and graph SSE(λ) against λ. The λ with the lowest SSE, $\lambda_{\text{opt}}$, is the best choice. All λ with $\mathrm{SSE}(\lambda) \le SS^*$ comprise an approximate 100(1 − α)% confidence interval for λ, where

     $$ SS^* = \mathrm{SSE}(\lambda_{\text{opt}}) \left( 1 + \frac{t^2_{\alpha/2,\, \mathrm{df}_E}}{\mathrm{df}_E} \right). $$

     Example: Peak discharge data (peak-discharge.txt); R script: peak-discharge-box-cox.R.
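The procedure can be sketched in base R as below. This is an illustration under stated assumptions, not the course script peak-discharge-box-cox.R: boxcox_profile is a hypothetical name, the grid of λ values is arbitrary, and the demonstration uses R's built-in trees data because the peak discharge data are not reproduced here. MASS::boxcox offers a related tool that plots the profile log-likelihood instead of SSE; its interval uses a chi-square cutoff, so it can differ slightly from the SS* rule.

```r
# Profile SSE(lambda) over a grid and return the set {lambda : SSE(lambda) <= SS*},
# with SS* = SSE(lambda_opt) * (1 + t^2_{alpha/2, dfE} / dfE).
# boxcox_profile is a hypothetical helper written for illustration.
boxcox_profile <- function(y, rhs, data, lambdas = seq(-2, 2, by = 0.01), alpha = 0.05) {
  gm <- exp(mean(log(y)))                          # geometric mean response
  ytrans <- function(l) {
    if (abs(l) < 1e-8) gm * log(y) else (y^l - 1) / (l * gm^(l - 1))
  }
  X <- model.matrix(rhs, data)                     # same model for every lambda
  dfE <- nrow(X) - ncol(X)                         # error df (assumes full rank)
  sse <- vapply(lambdas, function(l) sum(lm.fit(X, ytrans(l))$residuals^2), numeric(1))
  lambda_opt <- lambdas[which.min(sse)]
  ss_star <- min(sse) * (1 + qt(1 - alpha / 2, dfE)^2 / dfE)
  ok <- lambdas[sse <= ss_star]
  list(lambdas = lambdas, sse = sse, lambda_opt = lambda_opt, ss_star = ss_star,
       ci = range(ok))                             # assumes the set is one interval
}

# Illustration on R's built-in 'trees' data, NOT the peak discharge data
res <- boxcox_profile(trees$Volume, ~ Girth + Height, trees)
res$lambda_opt
res$ci
plot(res$lambdas, res$sse, type = "l", xlab = "lambda", ylab = "SSE(lambda)")
abline(h = res$ss_star, lty = 2)                   # lambdas below this line form the CI
```

Using one design matrix for all λ ensures that only the response changes along the grid, matching "fit the model for various λ".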

  4. Generalized Linear Model
     Sometimes a better approach is to use a different statistical model. For example, for count data, assume that Y has the Poisson distribution. Replace the linear model

     $$ E(Y) = \mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k = \mathbf{x}'\boldsymbol{\beta} $$

     by

     $$ E(Y) = \mu = g^{-1}(\mathbf{x}'\boldsymbol{\beta}) \iff g(\mu) = \mathbf{x}'\boldsymbol{\beta} $$

     for some nonlinear link function g(·).
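In R, a family object bundles the link g and its inverse, so the relation $g(\mu) = \mathbf{x}'\boldsymbol{\beta}$ can be checked directly. A small sketch, with illustrative (made-up) values of the linear predictor:

```r
# A family object carries the link g (linkfun) and its inverse g^{-1} (linkinv).
fam <- poisson()                 # Poisson with its default log link
fam$link                         # "log"
eta <- c(-1, 0, 2)               # illustrative values of the linear predictor x'beta
mu  <- fam$linkinv(eta)          # E(Y) = g^{-1}(x'beta) = exp(x'beta)
fam$linkfun(mu)                  # g(mu) recovers x'beta: -1 0 2
```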

  5. If the distribution is in the exponential family and the link function is chosen to match it (the canonical link), estimation by maximum likelihood is relatively easy. In general, the variance of Y also depends on µ; examples from the exponential family:

     Distribution          g(µ)              V(µ)
     Normal (σ² = 1)       µ                 1
     Poisson               log µ             µ
     Gamma                 1/µ               µ²
     Inverse Gaussian      1/µ²              µ³
     Binomial              log[µ/(1 − µ)]    µ(1 − µ)
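The table can be cross-checked against the default links and variance functions of R's built-in family objects. A short sketch, evaluating each at one arbitrary value µ = 0.4 (chosen only because it is valid for all five families):

```r
# Default links and variance functions of R's family objects, evaluated at one mu.
fams <- list(Normal = gaussian(), Poisson = poisson(), Gamma = Gamma(),
             "Inverse Gaussian" = inverse.gaussian(), Binomial = binomial())
mu <- 0.4                        # valid for all five families (binomial needs 0 < mu < 1)
links <- vapply(fams, function(f) f$link, character(1))
g_mu  <- vapply(fams, function(f) f$linkfun(mu), numeric(1))
V_mu  <- vapply(fams, function(f) f$variance(mu), numeric(1))
data.frame(link = links, g_mu = g_mu, V_mu = V_mu)
```

Each row should agree with the corresponding row of the table: for example, Gamma gives g(0.4) = 2.5 and V(0.4) = 0.16.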

  6. Other combinations of distribution, g(·), and V(·) may also be used, but are not supported by standard software. The binomial case is widely used:

     $$ P(Y = 1) = \frac{e^{\mathbf{x}'\boldsymbol{\beta}}}{1 + e^{\mathbf{x}'\boldsymbol{\beta}}} = \frac{1}{1 + e^{-\mathbf{x}'\boldsymbol{\beta}}}. $$

     Example: Coupon redemption. Y is the number of customers out of 1000 who redeem the coupon; three factors were used in a 2³ factorial design.
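The two forms of P(Y = 1) are the same function of x′β, and it is the inverse of the logit link used by glm() with family = "binomial". A quick numerical check over a few illustrative values of x′β:

```r
eta <- c(-3, -1.5, 0, 1.5, 3)            # illustrative values of x'beta
p_a <- exp(eta) / (1 + exp(eta))         # first form on the slide
p_b <- 1 / (1 + exp(-eta))               # second form
p_c <- plogis(eta)                       # base R logistic CDF
p_d <- binomial()$linkinv(eta)           # inverse of the logit link used by glm()
```

The second form is usually preferred numerically, because exp(x′β) can overflow for large positive x′β.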

  7. R commands
     Generalized linear models are fitted using glm():

         summary(glm(cbind(Redeemed, Customers - Redeemed) ~ A * B + A * C + B * C,
                     data = coupon, family = "binomial"))

     Output:

         Call:
         glm(formula = cbind(Redeemed, Customers - Redeemed) ~ A * B + A * C +
             B * C, family = "binomial", data = coupon)

         Deviance Residuals:
               1        2        3        4        5        6        7        8
          0.4723  -0.4307  -0.4228   0.3949  -0.4572   0.4166   0.4238  -0.3987
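With a binomial family, glm() accepts a two-column response of successes and failures, which is what cbind(Redeemed, Customers - Redeemed) supplies. The sketch below builds the model matrix for the full model on a 2³ design; it assumes the factors are coded −1/+1 in standard order, which is an assumption since the coupon data file is not reproduced here. It shows that the model has 7 parameters, leaving 8 − 7 = 1 residual degree of freedom, consistent with the residual deviance reported on 1 degree of freedom.

```r
# Hypothetical 2^3 design in standard order with -1/+1 coding (assumed, not
# read from the coupon data set).
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
X_full <- model.matrix(~ A * B + A * C + B * C, data = design)
colnames(X_full)                  # intercept, A, B, C, A:B, A:C, B:C
nrow(design) - ncol(X_full)       # residual df: 8 - 7 = 1
crossprod(X_full)                 # 8 * identity: columns are orthogonal
```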

  8. Output, continued

         Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
         (Intercept) -1.011545   0.025515 -39.645  < 2e-16 ***
         A            0.169208   0.025509   6.633 3.28e-11 ***
         B            0.169622   0.025515   6.648 2.97e-11 ***
         C            0.023317   0.025510   0.914    0.361
         A:B         -0.006285   0.025512  -0.246    0.805
         A:C         -0.002773   0.025432  -0.109    0.913
         B:C         -0.041020   0.025434  -1.613    0.107
         ---
         Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

         (Dispersion parameter for binomial family taken to be 1)

             Null deviance: 93.0238  on 7  degrees of freedom
         Residual deviance:  1.4645  on 1  degrees of freedom
         AIC: 72.286

         Number of Fisher Scoring iterations: 3
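Each z value in the table is Estimate / Std. Error, referred to a standard normal distribution. The sketch below recomputes a few of them from the printed numbers and also treats the residual deviance (1.4645 on 1 df) as a lack-of-fit statistic. The lack-of-fit check relies on the chi-square approximation for the deviance, which is an assumption, though a plausible one with 1000 customers per run.

```r
# Estimates and standard errors copied from the printed full-model output
est <- c(A = 0.169208, B = 0.169622, C = 0.023317, "B:C" = -0.041020)
se  <- c(A = 0.025509, B = 0.025515, C = 0.025510, "B:C" = 0.025434)
z <- est / se                          # Wald z statistics
p <- 2 * pnorm(-abs(z))                # two-sided normal p-values
round(cbind(z = z, p = p), 4)
# Residual deviance as a lack-of-fit check: 1.4645 on 1 df
gof_p <- pchisq(1.4645, df = 1, lower.tail = FALSE)
gof_p
```

A large lack-of-fit p-value (about 0.23 here) gives no evidence that the model with all two-factor interactions fits poorly.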

  9. Reduced model
     The analyst decides to fit a reduced model including A, B, and BC (and, to keep it hierarchical, C):

         summary(glm(cbind(Redeemed, Customers - Redeemed) ~ A + B * C,
                     data = coupon, family = "binomial"))

     Output:

         Call:
         glm(formula = cbind(Redeemed, Customers - Redeemed) ~ A + B * C,
             family = "binomial", data = coupon)

         Deviance Residuals:
               1        2        3        4        5        6        7        8
          0.3402  -0.3114  -0.3783   0.3531  -0.5142   0.4692   0.5509  -0.5171
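In an R formula, B * C expands to B + C + B:C. This is why the reduced model automatically stays hierarchical by including C. A quick way to confirm the expansion, and the resulting 8 − 5 = 3 residual degrees of freedom for 8 runs:

```r
# Expand the reduced-model formula to see which terms it contains.
tl <- attr(terms(~ A + B * C), "term.labels")
tl                                # "A" "B" "C" "B:C"
n_runs <- 8                       # 2^3 factorial
n_runs - (1 + length(tl))         # residual df: 8 - 5 = 3
```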

  10. Output, continued

         Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
         (Intercept) -1.01142    0.02551 -39.652  < 2e-16 ***
         A            0.16868    0.02542   6.635 3.25e-11 ***
         B            0.16912    0.02543   6.650 2.94e-11 ***
         C            0.02308    0.02543   0.908    0.364
         B:C         -0.04097    0.02543  -1.611    0.107
         ---
         Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

         (Dispersion parameter for binomial family taken to be 1)

             Null deviance: 93.0238  on 7  degrees of freedom
         Residual deviance:  1.5360  on 3  degrees of freedom
         AIC: 68.358

         Number of Fisher Scoring iterations: 3
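The two fits can be compared with a likelihood-ratio (deviance) test. The increase in residual deviance from dropping A:B and A:C is referred to a chi-square distribution with 3 − 1 = 2 df. The sketch below works only from the printed deviances. With the fitted objects in hand, anova(reduced_fit, full_fit, test = "Chisq") should perform the same test; reduced_fit and full_fit are hypothetical names for the two glm() results.

```r
# Deviances and residual df copied from the two printed summaries
dev_full <- 1.4645; df_full <- 1   # A * B + A * C + B * C
dev_red  <- 1.5360; df_red  <- 3   # A + B * C
delta_dev <- dev_red - dev_full    # increase in deviance from dropping A:B and A:C
delta_df  <- df_red - df_full
p_value <- pchisq(delta_dev, df = delta_df, lower.tail = FALSE)
round(c(delta_dev = delta_dev, delta_df = delta_df, p_value = p_value), 4)
# AIC = -2 logLik + 2k, so the drop in AIC should equal 2 * delta_df - delta_dev
72.286 - 68.358
2 * delta_df - delta_dev
```

A p-value near 0.96 indicates the dropped interactions add essentially nothing. This agrees with the lower AIC of the reduced model (68.358 versus 72.286).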
