Inference for Proportions
Marc H. Mehlman
marcmehlman@yahoo.com
University of New Haven

Based on Rare Event Rule: “rare events happen – but not to me”.
Table of Contents
1 Inference for a Single Proportion
2 Comparing Two Proportions
Inference for a Single Proportion
Let $X_1, \cdots, X_n$ be a random sample from $\mathrm{BIN}(1, p)$. Then $X = \sum_{j=1}^{n} X_j \sim \mathrm{BIN}(n, p)$.

Definition
The sample proportion is
\[ \hat p \stackrel{\text{def}}{=} \frac{X}{n} = \bar X. \]
The standard error of $\hat p$ is
\[ SE_{\hat p} \stackrel{\text{def}}{=} \sqrt{\frac{\hat p (1 - \hat p)}{n}}. \]

By the CLT, $\bar X$ is approximately $N\left(p, \sqrt{p(1-p)/n}\right)$ for big $n$, and also $\hat p$ is approximately $p$ for big $n$. Thus for big $n$, $\bar X$ is approximately $N\left(p, \sqrt{\hat p(1-\hat p)/n}\right)$; that is, the unknown standard deviation can be replaced by $SE_{\hat p}$.

Theorem (Large Sample Confidence Interval for p)
The margin of error is
\[ m = z^\star \, SE_{\hat p} = z^\star \sqrt{\frac{\hat p (1 - \hat p)}{n}}, \]
where $z^\star$ is the standard Normal critical value for the chosen confidence level, and the confidence interval is $\hat p \pm m$. Use this interval for confidence levels of 90% or more and when the numbers of successes and failures are both at least 15.
Example
We compute a 90% confidence interval for the population proportion of arthritis patients who suffer some "adverse symptoms."

What is the sample proportion $\hat p$?
\[ \hat p = \frac{23}{440} \approx 0.052 \]

For a 90% confidence level, $z^\star = 1.645$.

Confidence level C   0.50   0.60   0.70   0.80   0.90   0.95   0.96
z*                   0.674  0.841  1.036  1.282  1.645  1.960  2.054

Using the large sample method:
\[ m = z^\star \sqrt{\hat p (1 - \hat p)/n} = 1.645 \sqrt{0.052(1 - 0.052)/440} = 1.645 \times 0.0106 \approx 0.017 \]
90% CI for $p$: $\hat p \pm m = 0.052 \pm 0.017$.

With 90% confidence, between 3.5% and 6.9% of arthritis patients taking this pain medication experience some adverse symptoms.
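The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch rather than part of the original slides: the function name `large_sample_ci` is hypothetical, and the critical value 1.645 is taken from the table above instead of being computed.

```python
from math import sqrt

def large_sample_ci(successes, n, z_star):
    """Large-sample interval p_hat ± m; returns (p_hat, m, low, high)."""
    p_hat = successes / n                 # sample proportion X / n
    se = sqrt(p_hat * (1 - p_hat) / n)    # standard error of p_hat
    m = z_star * se                       # margin of error
    return p_hat, m, p_hat - m, p_hat + m

# Arthritis example: 23 of 440 patients, 90% confidence (z* = 1.645 from the table)
p_hat, m, low, high = large_sample_ci(23, 440, 1.645)
print(f"p_hat = {p_hat:.3f}, m = {m:.3f}, CI = ({low:.3f}, {high:.3f})")
```

Because the code carries unrounded values, the upper endpoint comes out near 0.070 rather than the 0.069 obtained by adding the rounded figures 0.052 and 0.017.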
“Plus four” confidence interval for p
The “plus four” method gives reasonably accurate confidence intervals. We act as if we had four additional observations, two successes and two failures. Thus, the new sample size is $n + 4$ and the count of successes is $X + 2$.

The “plus four” estimate of $p$ is
\[ \tilde p = \frac{\text{count of successes} + 2}{\text{count of all observations} + 4} = \frac{X + 2}{n + 4}. \]
An approximate level $C$ confidence interval is $\tilde p \pm m$, with
\[ m = z^\star \, SE_{\tilde p} = z^\star \sqrt{\frac{\tilde p (1 - \tilde p)}{n + 4}}. \]
Use this method when $C$ is at least 90% and the sample size is at least 10.
Example
We want a 90% CI for the population proportion of arthritis patients who suffer some “adverse symptoms.”

What is the value of the “plus four” estimate of $p$?
\[ \tilde p = \frac{23 + 2}{440 + 4} = \frac{25}{444} \approx 0.056 \]

An approximate 90% confidence interval for $p$ using the “plus four” method is:
\[ m = z^\star \sqrt{\tilde p (1 - \tilde p)/(n + 4)} = 1.645 \sqrt{0.056(1 - 0.056)/444} = 1.645 \times 0.011 \approx 0.018 \]
90% CI for $p$: $\tilde p \pm m = 0.056 \pm 0.018$.

With 90% confidence, between 3.8% and 7.4% of the population of arthritis patients taking this pain medication experience some adverse symptoms.

Confidence level C   0.50   0.60   0.70   0.80   0.90   0.95   0.96   0.98   0.99   0.995  0.998  0.999
z*                   0.674  0.841  1.036  1.282  1.645  1.960  2.054  2.326  2.576  2.807  3.091  3.291
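The “plus four” calculation differs from the large-sample one only in the adjusted counts, which a short Python sketch makes explicit. The function name `plus_four_ci` is an illustrative assumption, and z* = 1.645 is read from the table rather than computed.

```python
from math import sqrt

def plus_four_ci(successes, n, z_star):
    """'Plus four' interval p_tilde ± m; returns (p_tilde, m, low, high)."""
    p_tilde = (successes + 2) / (n + 4)            # add 2 successes and 2 failures
    se = sqrt(p_tilde * (1 - p_tilde) / (n + 4))   # standard error uses n + 4
    m = z_star * se                                # margin of error
    return p_tilde, m, p_tilde - m, p_tilde + m

# Arthritis example: 23 of 440 patients, 90% confidence (z* = 1.645)
p_tilde, m, low, high = plus_four_ci(23, 440, 1.645)
print(f"p_tilde = {p_tilde:.3f}, m = {m:.3f}, CI = ({low:.3f}, {high:.3f})")
```

With a sample this large the adjustment moves the estimate only slightly, from about 0.052 to about 0.056.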
Theorem (Sample Size)
Given a desired margin of error, $m$, one should choose the following sample size, $n$, to obtain the confidence interval $\hat p \pm m$ (or $\tilde p \pm m$) of $p$:
\[ n = \begin{cases} \left(\dfrac{z^\star}{m}\right)^2 p^\star (1 - p^\star) & \text{when } p^\star \text{ is an educated guess of what } p \text{ is,} \\[2ex] \left(\dfrac{z^\star}{2m}\right)^2 & \text{with no educated guess of } p. \end{cases} \]
Note:
1 round up $n$ to ensure it is a positive integer.
2 the closer one's educated guess, $p^\star$, of $p$ is to 1/2, the larger the resulting $n$ and so the safer one is.
3 $n = \dfrac{(z^\star)^2}{4m^2}$ (i.e., when $p^\star = 1/2$) is the most conservative estimate of $n$.
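Both cases of the theorem follow from the margin of error formula. The derivation below is a short sketch added for clarity, not part of the original slides: it replaces $\hat p$ by the guess $p^\star$, solves for $n$, and then bounds $p^\star(1 - p^\star)$.

```latex
% Solve the margin of error formula for n, with p^\star in place of \hat p
m = z^\star \sqrt{\frac{p^\star (1 - p^\star)}{n}}
\quad\Longleftrightarrow\quad
n = \left(\frac{z^\star}{m}\right)^2 p^\star (1 - p^\star)

% The product p^\star (1 - p^\star) is largest at p^\star = 1/2
p^\star (1 - p^\star) = \frac{1}{4} - \left(p^\star - \frac{1}{2}\right)^2 \le \frac{1}{4}
\quad\Longrightarrow\quad
n \le \frac{(z^\star)^2}{4 m^2} = \left(\frac{z^\star}{2m}\right)^2
```

This is why using $p^\star = 1/2$ never underestimates the required sample size.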
Example
What sample size would we need in order to achieve a margin of error of no more than 0.01 (1 percentage point) with a 90% confidence level?

We could use 0.5 for our guessed $p^\star$. However, since the drug has been approved for sale over the counter, we can safely assume that no more than 10% of patients should suffer “adverse symptoms” (a better guess than 50%).

For a 90% confidence level, $z^\star = 1.645$.
\[ n = \left(\frac{z^\star}{m}\right)^2 p^\star (1 - p^\star) = \left(\frac{1.645}{0.01}\right)^2 (0.1)(0.9) \approx 2435.4 \]

To obtain a margin of error of no more than 0.01 we need a sample size $n$ of at least 2436 arthritis patients.
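The sample size rule, including the round-up step, can be sketched in Python as follows. The function name `sample_size` and its `p_guess` parameter are hypothetical; z* = 1.645 is the tabulated 90% value. Evaluated with z* = 1.645 and m = 0.01, the formula gives roughly 2435.4 before rounding up.

```python
from math import ceil

def sample_size(m, z_star, p_guess=None):
    """Smallest integer n meeting margin of error m.

    With no educated guess for p, fall back to the conservative
    choice p* = 1/2, i.e. n = (z* / (2m))^2.
    """
    if p_guess is None:
        return ceil((z_star / (2 * m)) ** 2)
    return ceil((z_star / m) ** 2 * p_guess * (1 - p_guess))

n_guess = sample_size(0.01, 1.645, p_guess=0.1)   # educated guess p* = 0.1
n_conservative = sample_size(0.01, 1.645)         # no guess, p* = 1/2
print(n_guess, n_conservative)
```

The conservative choice requires almost three times as many patients as the educated guess of 10%, which is the practical cost of having no prior information about $p$.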
Theorem (Large Sample z-Test for a Population Proportion)
Let $X_1, \cdots, X_n$ be a random sample where $X_j \sim \mathrm{BIN}(1, p)$, with $p$ unknown, and let $H_0 : p = p_0$ where $n p_0 \ge 10$ and $n (1 - p_0) \ge 10$. Then
\[ z = \frac{\hat p - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}} \]
is a test statistic for $H_0$, and under $H_0$ it is approximately $N(0, 1)$.
Example
A potato-chip producer has just received a truckload of potatoes from its main supplier. If the producer determines that more than 8% of the potatoes in the shipment have blemishes, the truck will be sent away to get another load from the supplier. A supervisor selects a random sample of 500 potatoes from the truck. An inspection reveals that 47 of the potatoes have blemishes. Carry out a significance test at the α = 0.10 significance level. What should the producer conclude?

We want to perform a test at the α = 0.10 significance level of
\[ H_0 : p = 0.08 \qquad H_a : p > 0.08 \]
where $p$ is the actual proportion of potatoes in this shipment with blemishes.

If conditions are met, we should do a one-sample z test for the population proportion $p$.

Random: The supervisor took a random sample of 500 potatoes from the shipment.

Normal: Assuming $H_0 : p = 0.08$ is true, the expected numbers of blemished and unblemished potatoes are $n p_0 = 500(0.08) = 40$ and $n (1 - p_0) = 500(0.92) = 460$, respectively. Because both of these values are at least 10, we should be safe doing Normal calculations.
Example (continued)
The sample proportion of blemished potatoes is $\hat p = 47/500 = 0.094$.

Test statistic:
\[ z = \frac{\hat p - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}} = \frac{0.094 - 0.08}{\sqrt{\dfrac{0.08(0.92)}{500}}} = 1.15 \]

P-value: The desired P-value is
\[ P(z \ge 1.15) = 1 - 0.8749 = 0.1251. \]

Since our P-value, 0.1251, is greater than the chosen significance level of α = 0.10, we fail to reject $H_0$. There is not sufficient evidence to conclude that the shipment contains more than 8% blemished potatoes. The producer will use this truckload of potatoes to make potato chips.
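The test can be reproduced with Python's standard library. This is an illustrative sketch under stated assumptions: the function name `one_prop_z_test` is hypothetical, the test is the one-sided (upper-tail) version used in this example, and `statistics.NormalDist` stands in for the Normal table lookup.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(successes, n, p0):
    """Upper-tail large-sample z test of H0: p = p0 against Ha: p > p0."""
    # Normal condition, checked under H0 as in the example
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("expected successes and failures must both be at least 10")
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard deviation uses p0, not p_hat
    p_value = 1 - NormalDist().cdf(z)            # P(Z >= z) for a standard Normal Z
    return z, p_value

# Potato example: 47 blemished out of 500, H0: p = 0.08, alpha = 0.10
z, p_value = one_prop_z_test(47, 500, 0.08)
reject_h0 = p_value < 0.10
print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The P-value computed this way is slightly smaller than the slide's 0.1251 because the code does not round z to 1.15 before looking up the Normal probability; the conclusion, failing to reject $H_0$, is the same.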