ACMS 20340 Statistics for Life Sciences Chapter 14: Introduction to Inference

Sampling Distributions

For a population distributed as $N(\mu, \sigma)$, the statistic $\bar{x}$ calculated from a sample of size $n$ has the distribution $N(\mu, \sigma/\sqrt{n})$. We would like to use $\bar{x}$ to estimate $\mu$. Unfortunately, while $\bar{x}$ is likely to be close to $\mu$, the two are unlikely to be exactly equal. To make the problem manageable, we will estimate an interval that contains $\mu$ rather than its exact value.
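A quick simulation can make the claim about $\bar{x}$ concrete. The sketch below is only an illustration under assumed values: the population mean 50, $\sigma = 10$, and $n = 25$ are hypothetical, and the helper name `sample_means` is ours. It draws many samples from $N(\mu, \sigma)$ and checks that their means center on $\mu$ with a spread close to $\sigma/\sqrt{n}$.

```python
import random
import statistics

def sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma); return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

# Hypothetical population values, chosen only for illustration.
mu, sigma, n = 50.0, 10.0, 25
means = sample_means(mu, sigma, n, reps=5000)

print(round(statistics.fmean(means), 2))   # should be close to mu = 50
print(round(statistics.stdev(means), 2))   # should be close to sigma / sqrt(n) = 2.0
```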

Inference Assumptions

We will make the following (possibly unrealistic) assumptions:
◮ The population is normally distributed as $N(\mu, \sigma)$.
◮ We do not know $\mu$, but we do know $\sigma$.
◮ We have a random sample of size $n$.
Later we will see how to handle the common case where we do not know $\sigma$.

To what extent can we determine $\mu$?

Since the population is distributed as $N(\mu, \sigma)$, we know $\bar{x}$ has the distribution $N(\mu, \sigma/\sqrt{n})$. For example, heights of 8-year-old boys are normally distributed with $\sigma = 10$. The population also has a mean $\mu$, but we do not know it, so the population distribution is $N(\mu, 10)$. The mean $\bar{x}$ of a sample of size 217 is then distributed as $N(\mu, 0.7)$, because
$$\frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{217}} \approx \frac{10}{14.73} \approx 0.6788 \approx 0.7.$$
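The arithmetic for the standard deviation of $\bar{x}$ can be checked in a few lines of Python; `sigma` and `n` are the values from the heights example.

```python
import math

sigma = 10   # population standard deviation of heights (from the example)
n = 217      # sample size

se = sigma / math.sqrt(n)   # standard deviation of the sampling distribution of x-bar
print(f"sqrt(n) = {math.sqrt(n):.2f}")   # sqrt(n) = 14.73
print(f"sigma/sqrt(n) = {se:.4f}")       # sigma/sqrt(n) = 0.6788
```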

Using the normal tables, we can calculate the probability that $\bar{x}$ is within 1.4 of $\mu$:
$$P(\mu - 1.4 < \bar{x} < \mu + 1.4) = P\left(\frac{\mu - 1.4 - \mu}{0.7} < Z < \frac{\mu + 1.4 - \mu}{0.7}\right) = P(-2 < Z < 2) = 0.954.$$
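The table lookup can also be reproduced with Python's standard library, as a sketch assuming Python 3.8 or later for `statistics.NormalDist`. It uses the rounded standard deviation 0.7 from the example; software gives 0.9545, which a table rounds to 0.954.

```python
from statistics import NormalDist

Z = NormalDist()    # standard Normal distribution N(0, 1)
sd_xbar = 0.7       # rounded standard deviation of x-bar from the example
margin = 1.4

# Standardize the endpoints: (mu ± 1.4 - mu) / 0.7 = ±2
z = margin / sd_xbar
prob = Z.cdf(z) - Z.cdf(-z)
print(f"z = {z:.1f}, P(-z < Z < z) = {prob:.4f}")   # z = 2.0, P(-z < Z < z) = 0.9545
```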

Thus, the probability that $\bar{x}$ is within 1.4 of $\mu$ is about 0.95. In other words, for about 95% of all samples, the distance between $\bar{x}$ and $\mu$ is at most 1.4. So if we estimate that $\mu$ lies in the interval $[\bar{x} - 1.4, \bar{x} + 1.4]$, we will be right for about 95% of the samples we could take.

Confidence Intervals

We say the interval $[\bar{x} - 1.4, \bar{x} + 1.4]$ is a 95% confidence interval for $\mu$, because 95% of the intervals constructed this way, over repeated samples, contain $\mu$. The 95% is the confidence level. In general we write the interval as $\bar{x} \pm 1.4$.

Of course, we could ask for different confidence levels. Other common choices are 90%, 97%, 98%, and 99%. A 100% confidence interval would be the whole real line $(-\infty, \infty)$, which is not useful at all. So we must allow for the possibility of being wrong.

The interval $\bar{x} \pm 1.4$ is not 100% reliable. The exact interval we get depends on the sample we draw. All the intervals have length 2.8, but their centers vary from sample to sample. Saying we are 95% confident means that the method produces an interval containing $\mu$ for 95% of samples; for the other 5% of samples, the interval misses $\mu$.
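To see the "95% of the time" idea in action, here is a minimal simulation sketch. It draws $\bar{x}$ directly from $N(\mu, \sigma/\sqrt{n})$, which is valid under the normality assumption, and counts how often $\bar{x} \pm 1.4$ covers $\mu$. The true mean $\mu = 120$ is a hypothetical value, and `coverage` is an illustrative helper name. Because the exact standard deviation is 0.6788 rather than the rounded 0.7, the simulated coverage should come out slightly above 95% (roughly 96%).

```python
import math
import random

def coverage(mu, sigma, n, margin, reps=20_000, seed=1):
    """Fraction of simulated intervals xbar ± margin that contain mu."""
    rng = random.Random(seed)
    sd_xbar = sigma / math.sqrt(n)        # standard deviation of x-bar
    hits = 0
    for _ in range(reps):
        xbar = rng.gauss(mu, sd_xbar)     # one sample mean drawn from N(mu, sigma/sqrt(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

# Hypothetical true mean mu = 120; sigma = 10 and n = 217 follow the heights example.
rate = coverage(mu=120, sigma=10, n=217, margin=1.4)
print(f"{rate:.3f} of the intervals contain mu")
```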

For any given sample we construct one interval. We know only the long-run probability that our method gives a good interval. Without further information, we do not know whether the interval from our particular sample is one of the 95% that contain $\mu$ or one of the 5% that do not.

Summing Up: The Main Idea

The sampling distribution of $\bar{x}$ tells us how close to $\mu$ the sample mean $\bar{x}$ is likely to be. A confidence interval turns that information around to say how close to $\bar{x}$ the unknown population mean $\mu$ is likely to be.

General Method to Construct a Confidence Interval

We estimate the parameter $\mu$ of a normal population $N(\mu, \sigma)$ from $\bar{x}$ by constructing a level $C$ confidence interval. The interval has the form
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}},$$
where the term $z^* \sigma/\sqrt{n}$ is the margin of error. The number $z^*$ is called the critical value and depends only on $C$.
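The general recipe translates directly into a short function. This is a sketch that assumes $\sigma$ is known, as in this chapter; the function name `z_confidence_interval` and the sample mean of 120 are hypothetical. The critical value comes from `statistics.NormalDist().inv_cdf`.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, C):
    """Level-C interval xbar ± z* sigma/sqrt(n), assuming sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - C) / 2)   # critical value for level C
    margin = z_star * sigma / math.sqrt(n)            # margin of error
    return xbar - margin, xbar + margin

# Hypothetical sample mean of 120 with the heights example's sigma = 10 and n = 217.
low, high = z_confidence_interval(xbar=120, sigma=10, n=217, C=0.95)
print(f"({low:.2f}, {high:.2f})")
```

Raising $C$ increases $z^*$ and widens the interval; increasing $n$ shrinks the margin of error in proportion to $1/\sqrt{n}$.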

Confidence Levels

Common $z^*$ values are:

Confidence level C    z*
90%                   1.645
95%                   1.960
99%                   2.576

For any confidence level $C$, the critical value $z^*$ is the number for which
$$P(Z < -z^*) = \frac{1 - C}{2}.$$
We can find this using a table lookup.
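Instead of a table, the defining equation $P(Z < -z^*) = (1 - C)/2$ can be solved with the inverse Normal CDF. A sketch using Python's `statistics.NormalDist` (the helper name `critical_value` is ours) reproduces the table above:

```python
from statistics import NormalDist

def critical_value(C):
    """z* such that P(Z < -z*) = (1 - C) / 2, i.e. central area C under N(0, 1)."""
    return -NormalDist().inv_cdf((1 - C) / 2)

for C in (0.90, 0.95, 0.99):
    print(f"{C:.0%}: z* = {critical_value(C):.3f}")
# 90%: z* = 1.645
# 95%: z* = 1.960
# 99%: z* = 2.576
```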

Critical Value in Tables

Alternatively, common values of $z^*$ are listed in Table C in the textbook.

Assumptions

Remember the assumptions we made at the beginning:
◮ The population is normal with distribution $N(\mu, \sigma)$.
◮ We know the value of $\sigma$, but do not know $\mu$.
◮ We have an SRS.
How much can we relax these assumptions?
◮ We always need an SRS; otherwise $\bar{x}$ is not a random variable.
◮ This method requires us to know $\sigma$. (Estimating $\sigma$ by the sample standard deviation $s$ raises technical problems.)
◮ We needed the population to be normal only to ensure that the sampling distribution of $\bar{x}$ is normal. In practice we can relax this, especially when the sample size is large: the central limit theorem then says the sampling distribution is approximately normal.
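The central limit theorem claim can be explored by simulation. In this sketch the population is an Exponential distribution with mean 1, chosen by us only because it is strongly right-skewed; the helper names are hypothetical. A skewness near 0 indicates a symmetric, Normal-looking distribution, and the skewness of $\bar{x}$ should shrink as $n$ grows (for this particular population, theory gives $2/\sqrt{n}$).

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def skewness_of_means(n, reps=10_000, seed=2):
    """Skewness of x-bar for SRSs of size n from a right-skewed Exponential(1) population."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    return skewness(means)

for n in (2, 10, 50):
    print(f"n = {n:2d}: skewness of x-bar = {skewness_of_means(n):.2f}")
```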

A Story About Basketball

Charlie claims that he makes free throws at an 80% clip. To test his claim, we ask Charlie to take 20 shots. Unfortunately, Charlie makes only 8 of the 20. We respond, “Someone who makes 80% of his shots would almost never make only 8 out of 20!” The basis for our response: if Charlie’s claim were true and we repeated the sample of 20 shots many times, he would almost never make just 8 out of 20 shots.
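How rare is 8 out of 20 for an 80% shooter? If we treat the 20 shots as independent, each made with probability 0.8 (a binomial model, which is our assumption here rather than part of the story), the chance of making 8 or fewer is about 1 in 10,000. A sketch using `math.comb` (Python 3.8 or later):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# If Charlie's claim (p = 0.8) were true, how often would he make 8 or fewer of 20 shots?
prob = binom_cdf(8, 20, 0.8)
print(f"P(X <= 8) = {prob:.6f}")
```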

The basic idea of significance tests

An outcome that would rarely happen if a claim were true is good evidence that the claim is NOT true. As with confidence intervals, we ask what would happen if we repeated the sample or experiment many times. For now, we will assume that we have a perfect SRS from an exactly Normal population with standard deviation $\sigma$ known to us.

Phosphorus in the blood

Levels of inorganic phosphorus in the blood of adults are Normally distributed with mean $\mu = 1.2$ mmol/L and standard deviation $\sigma = 0.1$ mmol/L. Does the inorganic phosphorus blood level decrease with age? A retrospective chart review of 12 men and women between the ages of 75 and 79 yields (in mmol/L):

1.26  1.00  1.19  1.39  1.10  1.29
1.00  0.87  1.03  1.00  1.23  1.18

The sample mean is $\bar{x} = 1.128$ mmol/L.
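The sample mean can be verified directly from the 12 recorded values:

```python
import statistics

# Inorganic phosphorus (mmol/L) for 12 adults aged 75 to 79, from the chart review.
levels = [1.26, 1.00, 1.19, 1.39, 1.10, 1.29, 1.00, 0.87, 1.03, 1.00, 1.23, 1.18]

xbar = statistics.fmean(levels)
print(len(levels), round(xbar, 3))   # 12 1.128
```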

The Question

Do these data provide good evidence that, on average, inorganic phosphorus levels among adults aged 75 to 79 are lower than in the whole adult population? To answer this question, we proceed as follows:
◮ We want evidence that the mean blood level of inorganic phosphorus in adults aged 75 to 79 is less than 1.2 mmol/L.
◮ Thus the claim we test is that the mean for people aged 75 to 79 is 1.2 mmol/L.

Answering the Question (I)

If the claim that the population mean $\mu$ for adults aged 75 to 79 is 1.2 mmol/L were true, then the sampling distribution of $\bar{x}$ from 12 individuals aged 75 to 79 would be Normal with mean $\mu_{\bar{x}} = 1.2$ and standard deviation
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.1}{\sqrt{12}} = 0.0289.$$
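The standard deviation of $\bar{x}$ under the null claim follows from the same $\sigma/\sqrt{n}$ formula used for confidence intervals:

```python
import math

mu0 = 1.2      # claimed (null) mean, mmol/L
sigma = 0.1    # known population standard deviation, mmol/L
n = 12         # number of individuals in the chart review

sd_xbar = sigma / math.sqrt(n)   # standard deviation of the sampling distribution of x-bar
print(f"x-bar ~ N({mu0}, {sd_xbar:.4f})")   # x-bar ~ N(1.2, 0.0289)
```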

Answering the Question (II)

There are two general outcomes to consider:
1. The sample mean is close to the population mean. This outcome could easily occur by chance when the population mean is $\mu = 1.2$.
2. The sample mean is far from the population mean. This outcome is unlikely to occur by chance when the population mean is $\mu = 1.2$.

Answering the Question (III)

In our case, the sample mean $\bar{x} = 1.128$ mmol/L is far from the population mean $\mu = 1.2$: it lies 0.072 mmol/L, or about 2.5 standard deviations $\sigma_{\bar{x}} = 0.0289$, below it. An observed value this small would rarely occur just by chance if the true $\mu$ were equal to 1.2 mmol/L.
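"Would rarely occur just by chance" is a statement about repeated sampling, so it can be checked by simulation. The sketch below assumes the null claim is true ($\mu = 1.2$, $\sigma = 0.1$ mmol/L), draws many SRSs of size 12, and reports the fraction whose mean is 1.128 or smaller; the helper name `fraction_as_small` is ours. Expect a fraction well under 1%.

```python
import random
import statistics

def fraction_as_small(observed, mu, sigma, n, reps=50_000, seed=3):
    """Simulate repeated SRSs of size n from N(mu, sigma) and return the
    fraction whose sample mean is <= observed."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar <= observed:
            count += 1
    return count / reps

# Suppose the null claim is true: mu = 1.2, sigma = 0.1 mmol/L, n = 12.
frac = fraction_as_small(1.128, mu=1.2, sigma=0.1, n=12)
print(frac)
```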

Null and Alternative Hypotheses

The claim tested by a statistical test is called the null hypothesis.
◮ The test is designed to determine the strength of the evidence against the null hypothesis.
◮ Usually the null hypothesis is a statement of “no effect” or “no difference.”
The claim about the population that we are trying to find evidence for is called the alternative hypothesis.

One-sided vs. two-sided alternative hypotheses

The alternative hypothesis is one-sided if it states that a parameter is larger than, or that it is smaller than, the null hypothesis value. The alternative hypothesis is two-sided if it states that the parameter is merely different from the null value.

Hypothesis Notation

Null hypothesis: $H_0$
Alternative hypothesis: $H_a$
Remember that these are always hypotheses about some population parameter, not about some particular outcome.

Back to the phosphorus example

Null hypothesis: $H_0: \mu = 1.2$ (“No difference from the adult mean of 1.2 mmol/L.”)
Alternative hypothesis (one-sided): $H_a: \mu < 1.2$ (“Their mean is lower than 1.2 mmol/L.”)

Aspirin labels

On an aspirin label, we find the following: “Active Ingredient: Aspirin 325 mg.” There will be slight variation in the amount of aspirin per tablet, but this is fine as long as the production has mean $\mu = 325$ mg. Let’s test the accuracy of the statement on the label:
$H_0: \mu = 325$ mg
$H_a: \mu \neq 325$ mg
Note that this is a two-sided alternative hypothesis. Why do we use a two-sided $H_a$ rather than a one-sided $H_a$?

One last point on hypotheses

Hypotheses should express the expectations or suspicions we have before we see the data. We shouldn’t first look at the data and then frame hypotheses to fit what the data show.

The P-value of a test

Starting with a null hypothesis, we consider the strength of the evidence against this hypothesis. The number that measures the strength of the evidence against a null hypothesis is called a P-value.
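A P-value is the probability, computed assuming $H_0$ is true, of a result at least as extreme as the one observed, where "extreme" is measured in the direction(s) given by $H_a$. The function below is a sketch of that computation for the setting of this chapter (Normal population, known $\sigma$); the function name and the `alternative` labels are our own. For the phosphorus example it gives a one-sided P-value of roughly 0.006. For a two-sided alternative, such as the aspirin test, both tails count, which doubles the one-tail area.

```python
import math
from statistics import NormalDist

def z_test_p_value(xbar, mu0, sigma, n, alternative):
    """P-value of the one-sample z test with sigma known.

    alternative: "less" (Ha: mu < mu0), "greater" (Ha: mu > mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))   # standardized sample mean
    Z = NormalDist()
    if alternative == "less":
        return Z.cdf(z)
    if alternative == "greater":
        return 1 - Z.cdf(z)
    if alternative == "two-sided":
        return 2 * Z.cdf(-abs(z))
    raise ValueError(f"unknown alternative: {alternative!r}")

# Phosphorus example: H0: mu = 1.2 versus Ha: mu < 1.2
p_less = z_test_p_value(1.128, 1.2, 0.1, 12, "less")
print(f"one-sided P-value: {p_less:.4f}")
```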
