ACMS 20340 Statistics for Life Sciences Chapter 15: Inference in Practice
Inference in Practice Recall the simple conditions for inference about a population mean: ◮ Known population standard deviation σ ◮ Sample obtained randomly ◮ Normally distributed population Let's consider the last two conditions...
The source of our data matters Inference via confidence intervals and hypothesis tests depends on the sample being random (since, for instance, we treat a sample statistic as a random variable). If our data don't come from a random sample or a randomized experiment, we have a greater chance of drawing the wrong conclusion.
Some pitfalls Given a sample, we need to be cautious: ◮ It may be hard to tell if it is an SRS. ◮ Nonresponse or dropouts from an experiment. ◮ Confidence intervals and hypothesis testing may not work if our sample is given by a method more complicated than an SRS. ◮ We have to deal with voluntary response surveys, uncontrolled experiments, biased samples, etc.
The assumption of Normality The underlying distribution of the population is less of an issue: many statistical procedures are based on the Normality of the sampling distribution, and the Central Limit Theorem justifies this assumption when the sample is reasonably large. One worry: inference procedures based on sampling distributions can be influenced by outliers.
To cut a long story short... We should always plot our data to check whether it is roughly Normal before making any inferences. If there are outliers, or the population is strongly non-Normal, there are alternative methods that do not require Normality and are not sensitive to outliers.
z procedures Confidence intervals and hypothesis tests for a population mean are sometimes referred to as z procedures, since both start with the one-sample z statistic and both use the standard Normal distribution. Let's briefly consider the behavior of these z procedures in practice.
Confidence Intervals The ideal situation: High confidence and a small margin of error. High confidence means our method almost always gives the correct answers. A small margin of error means we’ve pinned down the parameter to a high degree of precision.
How to get a small margin of error? 1. A smaller critical value z* (which means a lower confidence level). 2. A smaller standard deviation σ (which means there is less variation in the population). 3. A larger sample size n (which allows for more precise estimates).
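A quick numerical sketch of these three effects, using the margin-of-error formula m = z*σ/√n. The values of σ, n, and the confidence levels below are made up for illustration; none of them come from the slides.

```python
from scipy.stats import norm

def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error of a z confidence interval: m = z* * sigma / sqrt(n)."""
    z_star = norm.ppf(1 - (1 - confidence) / 2)  # critical value z*
    return z_star * sigma / n ** 0.5

# Illustrative numbers only
print(margin_of_error(sigma=0.6, n=100))                   # baseline at 95% confidence
print(margin_of_error(sigma=0.6, n=100, confidence=0.90))  # lower confidence -> smaller z* -> smaller m
print(margin_of_error(sigma=0.3, n=100))                   # smaller sigma -> smaller m
print(margin_of_error(sigma=0.6, n=400))                   # larger n -> smaller m
```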
Several important points The margin of error only accounts for sampling error (variation due to repeated sampling, as captured by the sampling distribution). It does not account for more serious difficulties such as undercoverage and nonresponse.
In sum. . . The margin of error in a confidence interval ignores everything except the sample-to-sample variation due to choosing the sample randomly.
Significance Tests We use a test of significance to describe the degree of evidence provided by a sample against the null hypothesis. More precisely, the p-value gives the degree of evidence. How small a p-value is convincing evidence against a null hypothesis?
How small a p-value? The answer depends on two circumstances: 1. How plausible is the null hypothesis? (If H0 is widely accepted, then strong evidence, and thus a small p, is needed.) 2. What are the consequences of rejecting the null hypothesis? (Would it require an expensive change?)
Significance and alternative hypotheses The p-value for a one-sided test is half the p-value for the two-sided test of the same null hypothesis based on the same data. The evidence against the null hypothesis is stronger when the alternative hypothesis is one-sided, because the one-sided test uses the data plus information about the direction of possible deviations from H0.
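A minimal sketch of this relationship; the observed value z = 2.1 is an arbitrary number chosen for illustration.

```python
from scipy.stats import norm

z = 2.1                                   # arbitrary observed z statistic, for illustration only
p_one_sided = 1 - norm.cdf(z)             # P(Z >= z), for the alternative mu > mu_0
p_two_sided = 2 * (1 - norm.cdf(abs(z)))  # P(|Z| >= |z|), for the alternative mu != mu_0
print(p_one_sided, p_two_sided)           # the one-sided p-value is half the two-sided one
```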
More on significance 1 Sample size affects statistical significance: because large random samples have small chance variation, very small population effects can be highly significant if the sample is large. z = (x̄ − µ0) / (σ/√n) = (size of the observed effect) / (size of chance variation)
More on significance 2 Because small random samples have a lot of chance variation, even large population effects can fail to be significant if the sample is small. Statistical significance does not tell us whether an effect is large enough to be important. In other words, statistical significance is not the same thing as practical significance.
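A hedged illustration of both points, using made-up numbers (an observed effect x̄ − µ0 = 0.2 and σ = 1, neither taken from the slides): the same observed effect is far from significant when n = 10 but highly significant when n = 1000.

```python
from math import sqrt
from scipy.stats import norm

sigma = 1.0    # assumed population standard deviation (illustrative)
effect = 0.2   # assumed observed effect x_bar - mu_0 (illustrative)

for n in (10, 100, 1000):
    z = effect / (sigma / sqrt(n))   # z = (x_bar - mu_0) / (sigma / sqrt(n))
    p = 2 * (1 - norm.cdf(abs(z)))   # two-sided p-value
    print(f"n = {n:4d}: z = {z:5.2f}, p = {p:.2g}")
# p falls from about 0.53 (n = 10) to about 0.046 (n = 100) to about 3e-10 (n = 1000)
```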
Planning studies The practical questions we can ask when planning a study are: ◮ How large should our sample size be to ensure a small margin of error in confidence intervals? ◮ How large should our sample size be in performing tests of significance?
High confidence + small margin of error We can have both a high level of confidence and a small margin of error as long as our sample is large enough. Since the margin of error is m = z*σ/√n, to achieve a desired margin of error m we need a sample of size n = (z*σ/m)².
A familiar example In the previous chapter, we considered mean body temperature. The population standard deviation is σ = 0.6°F. We want to estimate the mean body temperature µ for healthy adults to within ±0.05°F with 95% confidence.
The solution The desired margin of error is m = 0.05°F. For 95% confidence, we have z* = 1.96. Then n = (z*σ/m)² = (1.96 × 0.6 / 0.05)² = 553.2, so we round up and take a sample of n = 554.
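The same calculation as a short code sketch: scipy's norm.ppf supplies the critical value and math.ceil handles the rounding up.

```python
from math import ceil
from scipy.stats import norm

sigma = 0.6               # population standard deviation, in degrees F
m = 0.05                  # desired margin of error
z_star = norm.ppf(0.975)  # critical value for 95% confidence, about 1.96

n = (z_star * sigma / m) ** 2
print(round(n, 1), ceil(n))  # about 553.2, so take n = 554 subjects
```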
Sample size in significance tests How large a sample should we take? Worry: if our sample is too small, large effects in the population might fail to give statistically significant results.
Three Questions We must answer the following to decide how large a sample to take: Significance level: How much protection do we want against getting a significant result from our sample when there really is no effect in the population? Effect size: How large an effect in the population is important in practice? Power: How confident do we want to be that our study will detect an effect of the size we think is important?
Power Suppose that we determine an effect size that is important in practice. The probability that our test successfully detects an effect of the specified size is the power of the test. The higher the power of a test, the more sensitive it is to deviations from the null hypothesis.
An illustration (I) Suppose we are performing a hypothesis test with the following null and alternative hypotheses: H0: µ = 0 versus Ha: µ > 0. Suppose further that an effect µ > 0.8 has practical importance for us.
An illustration (II) We want to ensure that our test will reject the null hypothesis if an effect of this size (µ > 0.8) really is present. We can't be 100% certain that this will happen. The power of our test is the probability that we will reject the null hypothesis when this effect really does occur.
Two probabilities (I) We can assess the performance of a test by giving two probabilities: 1. The significance level α . 2. The power for an alternative that we want to detect.
Two probabilities (II) The significance level of a test is the probability of making the wrong decision when the null hypothesis is true. The power against a specific alternative is the probability of making the right decision when that alternative is true.
Type I and Type II Errors If we reject H0 when in fact H0 is true, this is a Type I error. If we fail to reject H0 when in fact Ha is true, this is a Type II error.
The probability of error The significance level α of any fixed-level test is the probability of a Type I error. The probability of a Type II error is denoted β. The power of a test against any alternative is 1 − β.
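A hedged sketch of computing the power of the one-sided z test from the earlier illustration (H0: µ = 0 vs. Ha: µ > 0, detecting an effect of size 0.8). The values σ = 1, n = 25, and α = 0.05 are assumptions chosen for illustration; they are not given in the slides.

```python
from math import sqrt
from scipy.stats import norm

mu_0, mu_alt = 0.0, 0.8          # null value and the specific alternative we want to detect
sigma, n, alpha = 1.0, 25, 0.05  # assumed values, for illustration only

se = sigma / sqrt(n)                          # standard deviation of x_bar
x_crit = mu_0 + norm.ppf(1 - alpha) * se      # reject H0 when x_bar exceeds this cutoff
power = 1 - norm.cdf((x_crit - mu_alt) / se)  # P(reject H0 | mu = mu_alt)
beta = 1 - power                              # probability of a Type II error
print(f"power = {power:.3f}, beta = {beta:.3f}")
```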
A helpful diagram