Unit 4: Inference for numerical variables Lecture 3: ANOVA Statistics 101 Thomas Leininger June 10, 2013
Outline
1. Announcements
2. ANOVA
   Aldrin in the Wolf River
   ANOVA and the F test
   ANOVA output, deconstructed
   Checking conditions
3. Multiple comparisons & Type 1 error rate
Announcements
Proposals are due tomorrow and will be returned to you by Wednesday. You MUST complete the proposal process.
A few things to watch out for:
- Data is plural, data set is singular.
- Avoid using population data; if you have population data, you might consider taking a random sample.
- Exploratory analysis: should include some summary statistics and some graphics AND interpretations.
- If using existing data, find out how your data were collected, and discuss the sampling method as well as any possible biases.
- Scope of inference: generalizability & causality.
Aldrin in the Wolf River
The Wolf River in Tennessee flows past an abandoned site once used by the pesticide industry for dumping wastes, including chlordane (pesticide), aldrin, and dieldrin (both insecticides).
These highly toxic organic compounds can cause various cancers and birth defects.
The standard method to test whether these substances are present in a river is to take samples at six-tenths depth.
But since these compounds are denser than water and their molecules tend to stick to particles of sediment, they are more likely to be found in higher concentrations near the bottom than near mid-depth.
Data
Aldrin concentration (nanograms per liter) at three levels of depth.

      aldrin   depth
 1    3.80     bottom
 2    4.80     bottom
...
10    8.80     bottom
11    3.20     middepth
12    3.80     middepth
...
20    6.60     middepth
21    3.10     surface
22    3.60     surface
...
30    5.20     surface
Exploratory analysis
Aldrin concentration (nanograms per liter) at three levels of depth.
[Figure: aldrin concentration by depth (bottom, middepth, surface), concentration axis from 3 to 9]

            n     mean    sd
bottom      10    6.04    1.58
middepth    10    5.05    1.10
surface     10    4.20    0.66
overall     30    5.10    1.37
Research question
Is there a difference between the mean aldrin concentrations among the three levels?
To compare means of 2 groups we use a Z or a T statistic.
To compare means of 3+ groups we use a new test called ANOVA and a new statistic called F.
Recap: 2-sample CIs and HTs

            n     mean    sd
bottom      10    6.04    1.58
middepth    10    5.05    1.10
surface     10    4.20    0.66
overall     30    5.10    1.37

HT: $T_{df} = \frac{(\bar{x}_1 - \bar{x}_2) - \text{null value}}{SE}$, where $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $df = \min(n_1 - 1, n_2 - 1)$
CI: $(\bar{x}_1 - \bar{x}_2) \pm t^\star_{df} \times SE$

Application exercise: Perform a HT and construct a CI for each difference.
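One way to work the application exercise is sketched below in Python, using only the summary statistics from the table. The 95% confidence level, the null value of 0, and the helper name `two_sample` are assumptions made for illustration; the critical value 2.262 is the t-table value for df = 9.

```python
from math import sqrt

# Summary statistics copied from the slide: (n, mean, sd) per depth.
groups = {
    "bottom": (10, 6.04, 1.58),
    "middepth": (10, 5.05, 1.10),
    "surface": (10, 4.20, 0.66),
}

# Assumption: 95% confidence level. Every group has n = 10, so df = 9 for
# each pair and the t-table critical value is t*_9 = 2.262.
T_STAR_DF9 = 2.262


def two_sample(a, b, null_value=0.0, t_star=T_STAR_DF9):
    """Return (T, df, SE, CI) for the difference in means of groups a and b."""
    n1, m1, s1 = groups[a]
    n2, m2, s2 = groups[b]
    se = sqrt(s1**2 / n1 + s2**2 / n2)      # unpooled standard error
    df = min(n1 - 1, n2 - 1)                # conservative df from the recap
    diff = m1 - m2
    t = (diff - null_value) / se
    ci = (diff - t_star * se, diff + t_star * se)
    return t, df, se, ci


pairs = [("bottom", "middepth"), ("bottom", "surface"), ("middepth", "surface")]
results = {pair: two_sample(*pair) for pair in pairs}

for (a, b), (t, df, se, ci) in results.items():
    print(f"{a} - {b}: T_{df} = {t:.2f}, SE = {se:.3f}, "
          f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Running three separate tests like this is exactly the situation that motivates a single overall test for all groups at once.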
ANOVA
ANOVA is used to assess whether the mean of the outcome variable is different for different levels of a categorical variable.
$H_0$: The mean outcome is the same across all categories, $\mu_1 = \mu_2 = \cdots = \mu_k$, where $\mu_i$ represents the mean of the outcome for observations in category $i$.
$H_A$: At least one mean is different from the others.
Conditions
1. The observations should be independent within and between groups.
   - If the data are a simple random sample, this condition is satisfied.
   - Carefully consider whether the between-group data are independent (e.g. no pairing).
   - Always important, but sometimes difficult to check.
2. The observations within each group should be nearly normal.
   - Especially important when the sample sizes are small.
   - How do we check for normality?
3. The variability across the groups should be about equal.
   - Especially important when the sample sizes differ between groups.
   - How can we check this condition?
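For the third condition, one rough numeric check is to compare the group standard deviations reported in the exploratory analysis. The sketch below does that in Python. The cutoff of 2 is a common rule of thumb and an assumption here, not a threshold stated in this lecture; side-by-side plots of the groups are another way to look at the same thing.

```python
# Group standard deviations from the exploratory analysis table.
sds = {"bottom": 1.58, "middepth": 1.10, "surface": 0.66}

# Ratio of the largest to the smallest group standard deviation.
ratio = max(sds.values()) / min(sds.values())

# Assumption: a rule-of-thumb cutoff of 2, not a value from the lecture.
RULE_OF_THUMB = 2.0

print(f"largest sd / smallest sd = {ratio:.2f}")
if ratio <= RULE_OF_THUMB:
    print("variability looks roughly equal across groups")
else:
    print("variability differs noticeably; interpret ANOVA with some care")
```

Since all three groups here have the same sample size, a moderate difference in spread is less of a concern than it would be with unequal group sizes.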
z/t test vs. ANOVA - Purpose

z/t test: Compare means from two groups to see whether they are so far apart that the observed difference cannot reasonably be attributed to sampling variability.
  $H_0: \mu_1 = \mu_2$
  $H_A: \mu_1 \neq \mu_2$, $H_A: \mu_1 < \mu_2$, or $H_A: \mu_1 > \mu_2$

ANOVA: Compare the means from two or more groups to see whether they are so far apart that the observed differences cannot all reasonably be attributed to sampling variability.
  $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
  $H_A$: At least one mean is different
z/t test vs. ANOVA - Method

z/t test: Compute a test statistic (a ratio).
  $z/t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{SE(\bar{x}_1 - \bar{x}_2)}$

ANOVA: Compute a test statistic (a ratio).
  $F = \frac{\text{variability bet. groups}}{\text{variability w/in groups}}$

Large test statistics lead to small p-values.
If the p-value is small enough, $H_0$ is rejected, and we conclude that the population means are not equal.
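To make the F ratio concrete, here is a minimal Python sketch that computes it for the aldrin data from the summary statistics alone. It assumes the standard one-way ANOVA decomposition, which this slide does not spell out: between-group variability is the mean square between groups (MSG) and within-group variability is the mean square error (MSE). Because the inputs are rounded summaries, the result may differ slightly from software output on the raw data.

```python
# Summary statistics from the exploratory analysis: (n, mean, sd) per depth.
groups = {
    "bottom": (10, 6.04, 1.58),
    "middepth": (10, 5.05, 1.10),
    "surface": (10, 4.20, 0.66),
}

k = len(groups)                                   # number of groups
n_total = sum(n for n, _, _ in groups.values())   # total sample size

# Grand mean as the sample-size-weighted average of the group means.
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-group sum of squares: spread of group means around the grand mean.
ssg = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())

# Within-group sum of squares, recovered from each group's sd.
sse = sum((n - 1) * s**2 for n, _, s in groups.values())

df_group = k - 1
df_error = n_total - k

msg = ssg / df_group      # variability between groups
mse = sse / df_error      # variability within groups
f_stat = msg / mse

print(f"MSG = {msg:.3f} on {df_group} df")
print(f"MSE = {mse:.3f} on {df_error} df")
print(f"F = {f_stat:.2f}")
```

The p-value is the upper-tail area of the F distribution with `df_group` and `df_error` degrees of freedom, which can be read from a table or from statistical software.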
z/t test vs. ANOVA
With only two groups, the t-test and ANOVA are equivalent, but only if we use a pooled variance in the denominator of the test statistic.
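The equivalence can be checked numerically: for two groups, the ANOVA F statistic equals the square of the pooled t statistic. The Python sketch below uses the bottom and surface summary statistics as an illustration. The choice of these two groups is arbitrary, and the pooled-variance and sum-of-squares formulas are the standard ones, assumed here rather than taken from the slide.

```python
from math import sqrt

# Summary statistics for two of the depth groups: n, mean, sd.
n1, m1, s1 = 10, 6.04, 1.58   # bottom
n2, m2, s2 = 10, 4.20, 0.66   # surface

# Pooled variance: within-group variances weighted by their degrees of freedom.
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

# Pooled two-sample t statistic for H0: mu1 = mu2.
t_pooled = (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))

# One-way ANOVA on the same two groups.
grand_mean = (n1 * m1 + n2 * m2) / (n1 + n2)
msg = (n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2) / (2 - 1)
mse = sp2   # with two groups, MSE is exactly the pooled variance
f_stat = msg / mse

print(f"t (pooled) = {t_pooled:.4f}")
print(f"t squared  = {t_pooled**2:.4f}")
print(f"F          = {f_stat:.4f}")
```

The two statistics also share the same p-value, since a squared t variable with df degrees of freedom follows an F distribution with 1 and df degrees of freedom.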