Chi-squared (χ²) (1.10.5) and F-tests (9.5.2) for the variance



  1. Chi-squared (χ²) (1.10.5) and F-tests (9.5.2) for the variance of a normal distribution; χ² tests for goodness of fit and independence (3.5.4–3.5.5)
  Prof. Tesler, Math 283, Fall 2016
  χ² and F tests — Prof. Tesler — Math 283 / Fall 2016 — slide 1 / 41

  2. Tests of means vs. tests of variances
  Data x_1, …, x_n with sample mean x̄ and sample variance s_X²; data y_1, …, y_m with sample mean ȳ and sample variance s_Y².

  Tests for the mean:
  One-sample test: H0: μ = μ0 vs. H1: μ ≠ μ0. Test statistic z = (x̄ − μ0)/(σ/√n), or t = (x̄ − μ0)/(s/√n) with df = n − 1.
  Two-sample test: H0: μ_X = μ_Y vs. H1: μ_X ≠ μ_Y. Test statistic z = (x̄ − ȳ)/√(σ_X²/n + σ_Y²/m), or t = (x̄ − ȳ)/(s_p √(1/n + 1/m)) with df = n + m − 2.

  Tests for the variance:
  One-sample test: H0: σ² = σ0² vs. H1: σ² ≠ σ0². "Chi-squared" test statistic χ² = (n − 1)s²/σ0² with df = n − 1.
  Two-sample test: H0: σ_X² = σ_Y² vs. H1: σ_X² ≠ σ_Y². Test statistic F = s_Y²/s_X² (with m − 1 and n − 1 degrees of freedom).

  3. Application: the fine print in the z- and t-tests
  One-sample z-test, H0: μ = μ0 vs. H1: μ ≠ μ0. This assumes that you know the value of σ², say σ² = σ0². A χ² test could be used to verify that the data is consistent with H0: σ² = σ0² instead of H1: σ² ≠ σ0².
  Two-sample z-test, H0: μ_X = μ_Y vs. H1: μ_X ≠ μ_Y. This assumes that you know the values of σ_X² and σ_Y². Separate χ² tests for σ_X² and σ_Y² could be performed to verify consistency with the assumed values.
  Two-sample t-test, H0: μ_X = μ_Y vs. H1: μ_X ≠ μ_Y. This assumes σ_X² = σ_Y² (but doesn't assume that this common value is known to you). An F-test could be used to verify that the data is consistent with H0: σ_X² = σ_Y² instead of H1: σ_X² ≠ σ_Y². If the variances are unequal, Welch's t-test can be used instead of the regular two-sample t-test (Ewens & Grant pp. 127–128).
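A quick sketch of the pooled vs. Welch choice in Python (an illustration with made-up data, not from the slides; the slides themselves use Matlab/R):

```python
import numpy as np
from scipy import stats

# Two samples with visibly different spreads (hypothetical data).
rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=5, size=20)    # smaller variance
y = rng.normal(loc=50, scale=20, size=35)   # larger variance

# Pooled two-sample t-test assumes equal variances;
# Welch's t-test (equal_var=False) does not.
t_pooled = stats.ttest_ind(x, y, equal_var=True)
t_welch = stats.ttest_ind(x, y, equal_var=False)
```

With unequal variances and unequal sample sizes, the two tests use different degrees of freedom and generally give different P-values; Welch's version is the safer default when the variance assumption is in doubt.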

  4. The χ² ("chi-squared") distribution
  Used for confidence intervals and hypothesis tests on the unknown parameter σ² of a normal distribution, based on the test statistic s² (sample variance). It has the same "degrees of freedom" as the t distribution. Point these out on the graphs:
  The chi-squared distribution with k degrees of freedom has
  Range: [0, ∞)
  Mean: μ = k
  Mode: χ² = k − 2 for k ≥ 2 (the pdf is maximized at χ² = k − 2); for k ≤ 2 the pdf is maximized at χ² = 0
  Median: ≈ k(1 − 2/(9k))³. Between k − 2/3 and k; asymptotically decreases to k − 2/3 as k → ∞
  Variance: σ² = 2k
  PDF: x^(k/2 − 1) e^(−x/2) / (2^(k/2) Γ(k/2)), a Gamma distribution with shape r = k/2 and rate λ = 1/2
  Unlike z and t, the pdf for χ² is NOT symmetric.
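A quick numerical check of these properties (my own addition, using scipy rather than the slides' Matlab/R):

```python
from scipy.stats import chi2

# Check the stated mean, variance, and median location for k = 5.
k = 5
mean = chi2.mean(k)       # mean = k
var = chi2.var(k)         # variance = 2k
median = chi2.median(k)   # = chi2inv(.5, k) = qchisq(.5, k)
```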

  5. [Plots of the χ² pdf for k = 1, 2, 3, 8 degrees of freedom]
  The graphs for 1 and 2 degrees of freedom are decreasing:
  k = 1: mean at μ = 1, mode at χ² = 0, median at χ² = chi2inv(.5,1) = qchisq(.5,1) = 0.4549
  k = 2: mean at μ = 2, mode at χ² = 0, median at χ² = chi2inv(.5,2) = qchisq(.5,2) = 1.3863
  The rest are "hump" shaped and skewed to the right:
  k = 3: mean at μ = 3, mode at χ² = 1, median at χ² = chi2inv(.5,3) = qchisq(.5,3) = 2.3660
  k = 8: mean at μ = 8, mode at χ² = 6, median at χ² = chi2inv(.5,8) = qchisq(.5,8) = 7.3441

  6. χ² ("chi-squared") distribution — cutoffs
  [Plots for df = 5, α = 0.05: a left-sided critical region and a two-sided acceptance region, with cutoffs χ²(0.025, 5) = 0.8312116 and χ²(0.975, 5) = 12.83250]
  Define χ²(α, df) as the number where the cdf (area to the left of it) is α: P(χ²_df ≤ χ²(α, df)) = α.
  This is different notation than z_α and t(α, df) (area α on the right), since the pdf isn't symmetric.
  Matlab / R:
  χ²(0.025, 5) = chi2inv(.025,5) = qchisq(.025,5) = 0.8312
  χ²(0.975, 5) = chi2inv(.975,5) = qchisq(.975,5) = 12.8325
  chi2cdf(0.8312,5) = pchisq(0.8312,5) = 0.025
  chi2cdf(12.8325,5) = pchisq(12.8325,5) = 0.975
  chi2pdf(0.8312,5) = dchisq(0.8312,5) = 0.0665
  chi2pdf(12.8325,5) = dchisq(12.8325,5) = 0.0100
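The same cutoff computations in Python (my addition; scipy's `ppf` plays the role of chi2inv/qchisq, and `cdf` the role of chi2cdf/pchisq):

```python
from scipy.stats import chi2

df = 5
lower = chi2.ppf(0.025, df)   # the chi^2(0.025, 5) cutoff, ~0.8312
upper = chi2.ppf(0.975, df)   # the chi^2(0.975, 5) cutoff, ~12.8325
area = chi2.cdf(lower, df)    # round-trip: recovers the left-tail area 0.025
```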

  7. Two-sided cutoff
  [Plot: two-sided acceptance region for df = 5, α = 0.05, between χ²(0.025, 5) = 0.8312116 and χ²(0.975, 5) = 12.83250]
  The mean, median, and mode are different, so it may not be obvious which values of χ² are "more consistent" with the null H0: σ² = 10000 vs. the alternative σ² ≠ 10000. Closer to the median of χ² is "more consistent" with H0.
  For two-sided hypothesis tests or confidence intervals with α = 5%, we still put 95% of the area in the middle and 2.5% at each end, but the pdf is not symmetric, so the lower and upper cutoffs are determined separately instead of being ± each other.

  8. Two-sided hypothesis test for the variance
  Test H0: σ² = 10000 vs. H1: σ² ≠ 10000 at significance level α = .05. (In general, replace 10000 by σ0²; here, σ0 = 100.)
  Decision procedure:
  1. Get a sample x_1, …, x_n: 650, 510, 470, 570, 410, 370, with n = 6.
  2. Calculate m = (x_1 + ··· + x_n)/n and s² = (1/(n − 1)) Σ_{i=1}^n (x_i − m)²: m = 496.67, s² = 10666.67, s = 103.28.
  3. Calculate the test statistic χ² = (n − 1)s²/σ0² = Σ_{i=1}^n (x_i − m)²/σ0²: χ² = (6 − 1)(10666.67)/10000 = 5.33.
  4. Accept H0 if χ² is between χ²(α/2, n − 1) and χ²(1 − α/2, n − 1); reject H0 otherwise. Here χ²(.025, 5) = .8312 and χ²(.975, 5) = 12.8325. Since χ² = 5.33 is between these, we accept H0. (Or: there is insufficient evidence to reject σ² = 10000.)

  9. Doing the same test with a P-value
  [Plot: χ² pdf with 5 d.f., median = 4.35. The observed χ² = 5.33 sits at the 62.31st percentile; its counterpart on the other side of the median, χ² = 3.50, sits at the 37.69th percentile. The middle 24.61% of the area (between 3.50 and 5.33) supports H0 better; the two outer regions of 37.69% each support H1 better.]
  P(χ²_5 ≤ 5.33) = 0.6231 is the area left of 5.33 for χ² with 5 d.f. (Matlab: chi2cdf(5.33,5); R: pchisq(5.33,5)).
  Values at least as extreme as this are those at the 62.31st percentile or higher, OR at the 37.69th percentile or lower, so P = (1 − .6231) + .3769 = 2(.3769) = 0.7539.
  P > α (0.75 > 0.05), so accept H0.
  To turn a one-sided P-value p_1 into a two-sided P-value, use P = 2 min(p_1, 1 − p_1).
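The two-sided P-value computation in Python (my addition, mirroring the slide's pchisq call):

```python
from scipy.stats import chi2

# One-sided area left of the observed statistic, then fold to two sides.
p1 = chi2.cdf(5.33, 5)               # ~0.6231
p_two_sided = 2 * min(p1, 1 - p1)    # ~0.7539, so accept H0 at alpha = 0.05
```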

  10. Two-sided 95% confidence interval for the variance
  Continue with the data 650, 510, 470, 570, 410, 370, which has n = 6, m = 496.67, s² = 10666.67, s = 103.28.
  Get bounds on σ² in terms of s² from the two-sided test:
  0.95 = P(χ²(.025, 5) < χ² < χ²(.975, 5))
       = P(0.8312 < χ² < 12.8325)
       = P(0.8312 < (6 − 1)S²/σ² < 12.8325)
       = P((6 − 1)S²/0.8312 > σ² > (6 − 1)S²/12.8325)
  A two-sided 95% confidence interval for the variance σ² is
  ((6 − 1)S²/12.8325, (6 − 1)S²/0.8312) = (4156.11, 64164.26)
  A two-sided 95% confidence interval for σ is
  (√((6 − 1)S²/12.8325), √((6 − 1)S²/0.8312)) = (64.47, 253.31)
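The confidence-interval computation in Python (my addition; exact quantiles from scipy rather than the slide's rounded cutoffs, so the upper bound differs slightly in the last digits):

```python
import math
import statistics
from scipy.stats import chi2

data = [650, 510, 470, 570, 410, 370]
n = len(data)
s_sq = statistics.variance(data)

# Divide (n-1)*s^2 by the upper cutoff for the lower bound, and vice versa.
lo_var = (n - 1) * s_sq / chi2.ppf(0.975, n - 1)   # ~4156
hi_var = (n - 1) * s_sq / chi2.ppf(0.025, n - 1)   # ~64164
lo_sd, hi_sd = math.sqrt(lo_var), math.sqrt(hi_var)  # CI for sigma itself
```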

  11. Properties of the chi-squared distribution
  1. Definition: Let Z_1, …, Z_k be independent standard normal variables, and let χ²_k = Z_1² + ··· + Z_k². The pdf of the random variable χ²_k is the "chi-squared distribution with k degrees of freedom."
  2. Pooling property: If U and V are independent χ² random variables with q and r degrees of freedom respectively, then U + V is a χ² random variable with q + r degrees of freedom.
  3. Sample variance: Pick X_1, …, X_n from a normal distribution N(μ, σ²). It turns out that
  Σ_{i=1}^n (X_i − X̄)²/σ² = (n − 1)S²/σ² = SS/σ²
  has a χ² distribution with df = n − 1, so we test on χ² = (n − 1)s²/σ0².
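A Monte Carlo check of the definition (my addition, not from the slides): the sum of squares of k independent standard normals should have mean ≈ k and variance ≈ 2k.

```python
import random
import statistics

# Simulate chi-squared(k) as Z_1^2 + ... + Z_k^2 with standard normals.
random.seed(0)
k, trials = 5, 20000
samples = [sum(random.gauss(0, 1) ** 2 for _ in range(k))
           for _ in range(trials)]
sim_mean = statistics.fmean(samples)     # should be near k = 5
sim_var = statistics.variance(samples)   # should be near 2k = 10
```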
