ACMS 20340 Statistics for Life Sciences Chapter 18: Comparing Two Means
Daily Activity and Obesity Researchers at the Mayo Clinic investigate the link between obesity and the energy spent on daily activities. They choose 20 healthy volunteers and monitor their activities for 10 days. The volunteers are deliberately chosen so that 10 are lean and 10 are mildly obese. Within each group, however, the individuals are selected at random.
Warning! We CANNOT proceed as we did last time. Last time, we discussed the special case of analyzing differences in a matched pairs design. This experiment does not pair off the subjects in any way. So how do we approach situations like this?
Two-Sample Problems In two-sample problems, we compare two separate populations. Our goal is either to compare the responses to two treatments or simply to compare the two populations. Unlike before, we are not using a single sample to draw conclusions about its own unknown population.
Conditions for Inference Comparing Two Population Means ◮ We have two SRSs from two distinct populations. ◮ The samples are independent, meaning that one sample has no influence on the other. (Matching would violate independence.) ◮ Both populations are Normally distributed. In practice, it is sufficient for the distributions to have similar shapes and for the data to have no strong outliers.
Comparing Two Populations The notation we use for the populations is as follows:

Population | Population mean | Population s.d.
1 | $\mu_1$ | $\sigma_1$
2 | $\mu_2$ | $\sigma_2$

All four of these parameters are unknown. When comparing two populations, we focus on the difference between the two means, $\mu_1 - \mu_2$.
Comparing Two Populations As with the one-sample t procedures, we estimate the parameters using our sample statistics.

Population | Sample size | Sample mean | Sample s.d.
1 | $n_1$ | $\bar{x}_1$ | $s_1$
2 | $n_2$ | $\bar{x}_2$ | $s_2$

NOTE: The sizes of the two samples may be different.
Two-Sample t Procedures Since we are focusing on the difference between the populations, the statistic we are concerned with is the difference in sample means, $\bar{x}_1 - \bar{x}_2$. The sampling distributions of $\bar{x}_1$ and $\bar{x}_2$ have standard deviations $\sigma_1/\sqrt{n_1}$ and $\sigma_2/\sqrt{n_2}$, respectively. Because we now work with two samples together, our one-sample formulas have to change considerably.
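Two-Sample t Procedures Because the samples are independent, the variances of $\bar{x}_1$ and $\bar{x}_2$ add, so the standard deviation of the sampling distribution of the difference $\bar{x}_1 - \bar{x}_2$ is
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$
Because we do not know either population standard deviation, we instead use the standard error
$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$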
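To check the standard error against a hand calculation, here is a minimal Python sketch. The helper name `two_sample_se` is hypothetical (written for this example, not a library function); it simply evaluates $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ for the daily-activity summary statistics used later in this chapter.

```python
import math

def two_sample_se(s1: float, n1: int, s2: float, n2: int) -> float:
    """Standard error of xbar1 - xbar2, with s1 and s2 standing in for sigma1 and sigma2."""
    # The variances of the two sample means add because the samples are independent
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Lean (group 1) vs. mildly obese (group 2) daily-activity summaries
se = two_sample_se(107.121, 10, 67.498, 10)
print(round(se, 2))  # 40.04
```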
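Degrees of Freedom Since the samples may be different sizes, we need a new way of choosing our degrees of freedom:
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
This calculation rarely yields a whole number, so you must round down in order to use the t table, Table C. Rounding down is the conservative choice: fewer degrees of freedom give a slightly larger critical value $t^*$.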
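The degrees-of-freedom formula (commonly called the Welch-Satterthwaite approximation) is tedious by hand, so a short Python sketch can serve as a check. The function name `welch_df` is hypothetical; the example uses the Alzheimer's summary statistics that appear later in this chapter, then rounds down with `math.floor` as required for Table C.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for the two-sample t procedures (formula above)."""
    v1 = s1**2 / n1  # s1^2 / n1
    v2 = s2**2 / n2  # s2^2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

df = welch_df(6.21, 10, 8.14, 14)    # Alzheimer's patients vs. controls
print(round(df, 3), math.floor(df))  # 21.856 21
```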
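Confidence Intervals and Hypothesis Tests A level C confidence interval for $\mu_1 - \mu_2$ is given by
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$
where $t^*$ is the critical value with area C between $-t^*$ and $t^*$ under the t distribution with the degrees of freedom above. To test the hypothesis $H_0: \mu_1 = \mu_2$ (which is equivalent to $H_0: \mu_1 - \mu_2 = 0$), we calculate the two-sample t statistic
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}.$$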
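Table C lists only selected confidence levels and whole-number degrees of freedom. As an illustrative alternative, the sketch below finds $t^*$ numerically (Simpson's rule on the t density, then bisection) and builds the level C interval. The helpers `t_pdf`, `t_cdf`, `t_critical` and `two_sample_ci` are hypothetical names written for this example, not a library API; in practice statistical software (for instance `scipy.stats.t.ppf`) would supply $t^*$ directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) = 0.5 + integral of the density from 0 to x (composite Simpson's rule, even steps)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(conf, df):
    """t* with area conf between -t* and t*, found by bisection on t_cdf."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def two_sample_ci(x1, s1, n1, x2, s2, n2, conf):
    """Level-conf interval for mu1 - mu2, rounding df down as when using Table C."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    margin = t_critical(conf, math.floor(df)) * math.sqrt(v1 + v2)
    diff = x1 - x2
    return diff - margin, diff + margin

# Daily-activity example: lean (group 1) vs. mildly obese (group 2), 90% confidence
lo, hi = two_sample_ci(525.751, 107.121, 10, 373.269, 67.498, 10, 0.90)
print(round(lo, 2), round(hi, 2))  # 82.29 222.67
```

Passing the rounded-down df keeps the result comparable with Table C; software that uses the fractional df would give a slightly narrower interval.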
Hypothesis Tests Usually the null hypothesis is one of no difference, i.e., $\mu_1 - \mu_2 = 0$. In this case the two-sample t statistic simplifies to
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}.$$
Find the critical values $t^*$ and P-values the same way as before.
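A minimal Python sketch of the test statistic, assuming summary statistics are available. The function name `two_sample_t` and its `delta0` parameter (the hypothesized value of $\mu_1 - \mu_2$, zero for "no difference") are hypothetical. The example uses the Alzheimer's data from later in this chapter.

```python
import math

def two_sample_t(x1, s1, n1, x2, s2, n2, delta0=0.0):
    """Two-sample t statistic for H0: mu1 - mu2 = delta0 (delta0 = 0 means no difference)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((x1 - x2) - delta0) / se

# Alzheimer's patients (group 1) vs. controls (group 2)
t = two_sample_t(85.9, 6.21, 10, 83.7, 8.14, 14)
print(round(t, 2))  # 0.75
```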
Daily Activity and Obesity Recall: 10 lean subjects and 10 mildly obese subjects are monitored for the amount of time (in minutes per day) spent standing or walking.

Group | Condition | n | $\bar{x}$ | s
1 | Lean | 10 | 525.751 | 107.121
2 | Obese | 10 | 373.269 | 67.498

Find a 90% confidence interval for the difference in average daily minutes spent walking or standing.
Daily Activity and Obesity First we must find the degrees of freedom:
$$df = \frac{\left(\frac{107.121^2}{10} + \frac{67.498^2}{10}\right)^2}{\frac{1}{9}\left(\frac{107.121^2}{10}\right)^2 + \frac{1}{9}\left(\frac{67.498^2}{10}\right)^2} = 15.174$$
Rounding down to 15 degrees of freedom, Table C gives the critical value $t^* = 1.753$ for a confidence level of 0.90.
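Daily Activity and Obesity The 90% confidence interval for the difference in population means, $\mu_1 - \mu_2$, is
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
Plugging in the values with $t^* = 1.753$ gives
$$(525.751 - 373.269) \pm 1.753\sqrt{\frac{107.121^2}{10} + \frac{67.498^2}{10}} = 152.482 \pm 70.19,$$
which simplifies to [82.29, 222.67]. We are 90% confident that lean people spend, on average, between about 82 and 223 more minutes per day standing or walking than mildly obese people.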
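The whole calculation for this example fits in a few lines of Python. This sketch hard-codes $t^* = 1.753$ from Table C (df = 15, 90% confidence) rather than computing it, so it mirrors the hand method step by step.

```python
import math

# Summary statistics: minutes per day spent standing or walking
x1, s1, n1 = 525.751, 107.121, 10  # lean
x2, s2, n2 = 373.269, 67.498, 10   # mildly obese

v1, v2 = s1**2 / n1, s2**2 / n2
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
df_table = math.floor(df)  # round down to use Table C
t_star = 1.753             # Table C value for df = 15, 90% confidence (hard-coded)

diff = x1 - x2
margin = t_star * math.sqrt(v1 + v2)
ci = (round(diff - margin, 2), round(diff + margin, 2))
print(df_table, ci)  # 15 (82.29, 222.67)
```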
Studying Alzheimer’s Disease An observational study of Alzheimer’s disease (AD) obtained data from 10 AD patients exhibiting moderate dementia and selected a group of 14 individuals without AD to act as a control group. For the study to be credible, the populations must be similar. We’ll perform a hypothesis test to determine if there is any difference in age between the two groups.
Studying Alzheimer’s Disease The null hypothesis is one of no difference between the populations: $H_0: \mu_1 = \mu_2$ (that is, $\mu_1 - \mu_2 = 0$). The alternative hypothesis is two-sided because we do not have a direction in mind: $H_a: \mu_1 \neq \mu_2$ (that is, $\mu_1 - \mu_2 \neq 0$).
Studying Alzheimer’s Disease The summary statistics of the two samples are as follows:

Group | Condition | n | $\bar{x}$ | s
1 | Alzheimer’s | 10 | 85.9 | 6.21
2 | Control | 14 | 83.7 | 8.14
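Studying Alzheimer’s Disease The two-sample t statistic is
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{85.9 - 83.7}{\sqrt{\frac{6.21^2}{10} + \frac{8.14^2}{14}}} = 0.75.$$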
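Studying Alzheimer’s Disease The degrees of freedom (df) are given by
$$df = \frac{\left(\frac{6.21^2}{10} + \frac{8.14^2}{14}\right)^2}{\frac{1}{9}\left(\frac{6.21^2}{10}\right)^2 + \frac{1}{13}\left(\frac{8.14^2}{14}\right)^2} = 21.856$$
Rounding down to 21 degrees of freedom, we use Table C to compare t = 0.75 with the two critical values of the t(21) distribution that bracket it: 0.686 (upper-tail probability 0.25) and 0.859 (upper-tail probability 0.20). Because the alternative is two-sided, we double these tail probabilities, so the P-value is between 0.40 and 0.50.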
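Table C only brackets the P-value. As an illustrative sketch, the two-sided P-value can be approximated numerically by integrating the t density with Simpson's rule. The helpers `t_pdf` and `two_sided_p` are hypothetical names for this example; software would normally report the P-value directly, often using the fractional df instead of rounding down.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """2 * P(T > |t|) = 2 * (0.5 - integral of the density from 0 to |t|), via Simpson's rule."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * (0.5 - total * h / 3)

x1, s1, n1 = 85.9, 6.21, 10  # Alzheimer's patients
x2, s2, n2 = 83.7, 8.14, 14  # controls
v1, v2 = s1**2 / n1, s2**2 / n2
t = (x1 - x2) / math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
p = two_sided_p(t, math.floor(df))  # use the Table C row, df = 21

print(round(t, 2), math.floor(df))  # 0.75 21
print(0.40 < p < 0.50)              # True
```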
Studying Alzheimer’s Disease We fail to reject $H_0$. There is no significant evidence of an age difference between the two groups, even at the relatively large significance level $\alpha = 0.10$.
Quiz To study the effect of the spectrum of light on the growth of plants, researchers assigned tobacco seedlings at random to two groups of 8 plants each. The plants were grown in a greenhouse under identical conditions except for lighting. The control group was grown under natural light, the experimental group under blue light. What is the experimental design? A completely randomized design.
Stem growth in millimeters:
Control: 4.3, 4.2, 3.9, 4.1, 4.1, 4.2, 3.8, 4.1
Experimental: 3.1, 2.9, 3.2, 3.2, 2.7, 2.9, 3.0, 3.1
Find a 95% confidence interval for the difference in mean stem growth.
Quiz, continued Summary statistics for the stem growth data (millimeters):

Group | n | $\bar{x}$ | s
Control | 8 | 4.09 | 0.164
Experimental | 8 | 3.01 | 0.173

Can we use our two-sample method to compute the confidence interval? ◮ Do we have independent random samples? ◮ Is each sample approximately Normal?
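The summary table can be reproduced from the raw stem-growth data with Python's standard `statistics` module, whose `stdev` uses the $n - 1$ divisor, matching the sample standard deviation $s$. The list names `control` and `blue` are just labels chosen for this sketch.

```python
from statistics import mean, stdev

control = [4.3, 4.2, 3.9, 4.1, 4.1, 4.2, 3.8, 4.1]  # natural light
blue = [3.1, 2.9, 3.2, 3.2, 2.7, 2.9, 3.0, 3.1]     # blue light (experimental)

for name, data in (("Control", control), ("Experimental", blue)):
    # stdev divides by n - 1, as the sample standard deviation s does
    print(name, len(data), round(mean(data), 2), round(stdev(data), 3))
# Control 8 4.09 0.164
# Experimental 8 3.01 0.173
```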
Quiz, continued With $n_1 = n_2 = 8$, $s_1 = 0.164$ (control) and $s_2 = 0.173$ (experimental):
$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 0.084$$
$$df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\left(s_1^2/n_1\right)^2/(n_1 - 1) + \left(s_2^2/n_2\right)^2/(n_2 - 1)} = 13.96$$
Round down to get df = 13.
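A short Python sketch that computes the standard error and degrees of freedom straight from the raw data, as a check on the hand calculation. It uses only the standard `math` and `statistics` modules; the variable names are illustrative.

```python
import math
from statistics import stdev

control = [4.3, 4.2, 3.9, 4.1, 4.1, 4.2, 3.8, 4.1]  # natural light
blue = [3.1, 2.9, 3.2, 3.2, 2.7, 2.9, 3.0, 3.1]     # blue light (experimental)

n1, n2 = len(control), len(blue)
v1 = stdev(control) ** 2 / n1  # s1^2 / n1
v2 = stdev(blue) ** 2 / n2     # s2^2 / n2

se = math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
print(round(se, 3), round(df, 2), math.floor(df))  # 0.084 13.96 13
```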
Quiz, continued With $\bar{x}_1 - \bar{x}_2 = 4.09 - 3.01 = 1.08$, $SE(\bar{x}_1 - \bar{x}_2) = 0.084$ and df = 13, Table C gives the 95% critical value $t^* = 2.160$. Calculate:
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \, SE(\bar{x}_1 - \bar{x}_2) = 1.08 \pm (2.160)(0.084) = [0.90, 1.26]$$
The entire interval lies above zero, so we are 95% confident that mean stem growth under natural light (control) exceeds mean stem growth under blue light (experimental) by between 0.90 and 1.26 mm.
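A sketch comparing the interval built from the rounded values above with one that carries full precision from the raw data. It hard-codes $t^* = 2.160$ from Table C (df = 13, 95%). Keeping unrounded intermediate values moves the lower endpoint from 0.90 to 0.89; either way the interval stays above zero.

```python
import math
from statistics import mean, stdev

control = [4.3, 4.2, 3.9, 4.1, 4.1, 4.2, 3.8, 4.1]  # natural light
blue = [3.1, 2.9, 3.2, 3.2, 2.7, 2.9, 3.0, 3.1]     # blue light (experimental)

t_star = 2.160  # Table C: df = 13, 95% confidence (hard-coded, not computed)

# Interval from the rounded values used in the hand calculation
rounded = (1.08 - t_star * 0.084, 1.08 + t_star * 0.084)

# Interval carrying full precision throughout
diff = mean(control) - mean(blue)
se = math.sqrt(stdev(control) ** 2 / len(control) + stdev(blue) ** 2 / len(blue))
unrounded = (diff - t_star * se, diff + t_star * se)

print([round(v, 2) for v in rounded])    # [0.9, 1.26]
print([round(v, 2) for v in unrounded])  # [0.89, 1.26]
```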
General Comments on Two-Sample Tests ◮ Two-sample t procedures are robust against non-Normal data, as long as there are no strong outliers. ◮ When possible, use samples of equal size. ◮ When the two samples are equal in size and the two populations have distributions with similar shapes, probabilities from the t table are fairly accurate for a broad range of distributions, even with sample sizes as small as $n_1 = n_2 = 5$. ◮ Avoid inference about the population standard deviations beyond calculating $s_1$ and $s_2$. Procedures for comparing standard deviations (such as the F test) work only when the populations are Normal and are very sensitive to departures from Normality, so it is usually best to seek expert advice in this case.