Unit 5: Inference for categorical data
2. Comparing two proportions
Sta 101 - Spring 2015
Duke University, Department of Statistical Science
Dr. Windle
March 19, 2015
Slides posted at http://bitly.com/windle2

Announcements
▶ Review materials will be posted over the weekend.
▶ PA 5 will open Monday morning (3/23 at 12:01am) and close Tuesday night (3/24 at 11:59pm).
▶ The Project 1 due date is pushed back to Monday morning, 3/30.

CLT also describes the distribution of $\hat{p}_1 - \hat{p}_2$
▶ Sampling distribution:
$$\hat{p}_1 - \hat{p}_2 \sim N\left(\text{mean} = p_1 - p_2,\; SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}\right)$$
Conditions:
▶ Independence: random sample/assignment + 10% rule, within each group and between the two groups
▶ Success-Failure: at least 10 expected successes and failures for each group
When we do not know or assume anything about $p_1$ and $p_2$ (e.g. for a confidence interval), plug $\hat{p}_1$ and $\hat{p}_2$ into the SE, and check:
▶ Success-Failure: at least 10 observed successes and failures for each group

For theoretical HT where $H_0: p_1 = p_2$, pool!
For an independent-groups hypothesis test with $H_0: p_1 = p_2$ (call the common value $p$):
$$\hat{p}_1 - \hat{p}_2 \sim N\left(\text{mean} = 0,\; SE = \sqrt{\frac{p(1-p)}{n_1} + \frac{p(1-p)}{n_2}}\right)$$
▶ Best guess of $p$:
$$\hat{p}_{pool} = \frac{\text{total successes}}{\text{total sample size}} = \frac{suc_1 + suc_2}{n_1 + n_2}$$
▶ Best guess of SE:
$$SE_{pool} = \sqrt{\frac{\hat{p}_{pool}(1-\hat{p}_{pool})}{n_1} + \frac{\hat{p}_{pool}(1-\hat{p}_{pool})}{n_2}}$$
▶ Success-Failure: at least 10 "expected" successes and failures for each group (since we do not know $p$, use $\hat{p}_{pool}$).
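As a rough illustration of the formulas above, here is a minimal base-R sketch. The helper names `se_unpooled()` and `se_pooled()` are hypothetical (not from the course materials), and the counts (30/50 and 20/60 successes) are borrowed from the clicker question on the next slide. It computes the unpooled SE (used when nothing is assumed about $p_1$ and $p_2$, as for a Z interval), the pooled SE and Z statistic for $H_0: p_1 = p_2$, and both versions of the S-F check.

```r
# Hypothetical helpers sketching the two-proportion SE formulas (not course code)

# SE when nothing is assumed about p1 and p2: plug in the sample proportions
se_unpooled <- function(suc1, n1, suc2, n2) {
  p1_hat <- suc1 / n1
  p2_hat <- suc2 / n2
  sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
}

# SE under H0: p1 = p2, with the pooled proportion as the best guess of p
se_pooled <- function(suc1, n1, suc2, n2) {
  p_pool <- (suc1 + suc2) / (n1 + n2)
  sqrt(p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2)
}

# Example counts from the clicker question: 30/50 and 20/60 successes
suc1 <- 30; n1 <- 50
suc2 <- 20; n2 <- 60
p_pool <- (suc1 + suc2) / (n1 + n2)  # 50 / 110

# S-F for an interval: observed successes and failures in each group
sf_ci_ok <- all(c(suc1, n1 - suc1, suc2, n2 - suc2) >= 10)
# S-F for a test of H0: p1 = p2: expected counts under H0, using p_pool
sf_ht_ok <- all(c(n1 * p_pool, n1 * (1 - p_pool),
                  n2 * p_pool, n2 * (1 - p_pool)) >= 10)

point_est <- suc1 / n1 - suc2 / n2

# Two-sided Z test of H0: p1 - p2 = 0, using the pooled SE
z <- (point_est - 0) / se_pooled(suc1, n1, suc2, n2)
p_value <- 2 * pnorm(-abs(z))

# 95% Z interval for p1 - p2, using the unpooled SE
ci_95 <- point_est + c(-1, 1) * qnorm(0.975) * se_unpooled(suc1, n1, suc2, n2)

cat("pooled proportion:", round(p_pool, 4), "\n")
cat("S-F met (CI, HT):", sf_ci_ok, sf_ht_ok, "\n")
cat("z =", round(z, 3), " p-value =", signif(p_value, 3), "\n")
cat("95% CI:", round(ci_95, 3), "\n")
```

With these counts every observed and expected count is at least 10, so the theoretical Z methods apply; the pooled SE works out to exactly $\sqrt{1/110} \approx 0.0953$.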
When S-F fails, simulate!
▶ If the S-F condition is met, we can do theoretical inference: Z test, Z interval.
▶ If the S-F condition is not met, we must use simulation-based methods: randomization test, bootstrap interval.

Clicker question
Suppose that in group 1, 30 out of 50 observations are successes, and in group 2, 20 out of 60 observations are successes. What is the pooled proportion?
(a) 30/50
(b) 20/60
(c) 30/50 + 20/60
(d) (30 + 20) / (50 + 60)
(e) (30/50 + 20/60) / 2

Malaria vaccine
Time magazine reported in Aug 2013: "Healthy adults immunized with an experimental malaria vaccine, called PfSPZ may be completely protected from infection, according to government researchers." The vaccine contains weakened forms of the live parasite, Plasmodium falciparum, which is responsible for causing malaria. In a randomized trial, none of the six patients who received the vaccine developed malaria, while five of the six who were not vaccinated became infected. Do these data provide convincing evidence of a difference in the rate of malaria infection?

                          Outcome
    Group         Malaria   No malaria   Total
    Vaccine             0            6       6
    No vaccine          5            1       6
    Total               5            7      12

Source: http://healthland.time.com/2013/08/09/malaria-vaccine-shows-strongest-protection-yet-against-parasite/

$H_0: p_T = p_C$
$H_A: p_T \neq p_C$
where $p_T$ and $p_C$ are the malaria infection rates in the treatment (vaccine) and control (no vaccine) groups.
Conditions:
1. Independence: patients are randomly assigned to treatment groups.
2. Success-failure: ?
Clicker question
Assuming that the null hypothesis ($H_0: p_T = p_C$) is true, which of the following is the pooled proportion of patients with malaria in the two groups?
(a) 6/12 = 0.5
(b) 5/12 = 0.417
(c) 0/5 = 0
(d) 6/7 = 0.857
(e) 7/12 = 0.583

Clicker question
Assuming that the null hypothesis ($H_0: p_T = p_C$) is true, how many patients would we expect to get infected with malaria in the vaccine group?
(a) 0.417 × 12 = 5
(b) 0.417 × 6 = 2.5
(c) 0.417 × 5 = 2.085
(d) 0.5 × 6 = 3
(e) 0.583 × 12 = 7

Simulation scheme
1. Use 12 index cards, where each card represents an experimental unit.
2. Mark 5 of the cards as "malaria" and the remaining 7 as "no malaria".
3. Shuffle the cards and split them into two groups of size 6, for vaccine and no vaccine.
4. Calculate the difference between the proportions of "malaria" in the vaccine and no vaccine decks, and record this number.
5. Repeat steps (3) and (4) many times to build a randomization distribution of differences in simulated proportions.

Simulate in R

    # Download the data set and read it into a data frame
    download.file("https://stat.duke.edu/~mc301/data/vacc_malaria.csv",
                  destfile = "vacc_malaria.csv")
    vacc_malaria <- read.csv("vacc_malaria.csv")

    # inference() is not part of base R; this assumes the course-provided
    # function has already been loaded
    inference(vacc_malaria$outcome, vacc_malaria$group, success = "malaria",
              est = "proportion", type = "ht", null = 0,
              alternative = "twosided", method = "simulation", seed = 1028)

Output:

    Response variable: categorical, Explanatory variable: categorical
    Difference between two proportions -- success: malaria
    Summary statistics:
                x
    y            no vaccine vaccine Sum
      malaria             5       0   5
      no malaria          1       6   7
      Sum                 6       6  12
    Observed difference between proportions (no vaccine-vaccine) = 0.8333

    H0: p_no vaccine - p_vaccine = 0
    HA: p_no vaccine - p_vaccine != 0
    p-value = 0.0152

[Figure: plot of vacc_malaria$outcome by vacc_malaria$group, and a histogram titled "Randomization distribution" of the simulated differences, with the observed difference (0.8333) marked.]
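The card-shuffling scheme above can also be sketched in base R without the course's `inference()` function. This is a minimal sketch under stated assumptions: the data are rebuilt by hand from the 2×2 table (6 vaccine and 6 no-vaccine patients), the helper `diff_prop()` is hypothetical, and because it draws a different random-number stream, its simulated p-value need not match the slide's 0.0152 exactly, only approximately. It also shows the S-F check under $H_0$ that motivates simulating.

```r
# Rebuild the malaria data from the 2x2 table (one entry per patient)
group   <- rep(c("vaccine", "no vaccine"), each = 6)
outcome <- c(rep("no malaria", 6),            # vaccine: 0 malaria, 6 no malaria
             rep("malaria", 5), "no malaria") # no vaccine: 5 malaria, 1 no malaria

# Hypothetical helper: proportion with malaria, no vaccine minus vaccine
diff_prop <- function(outcome, group) {
  mean(outcome[group == "no vaccine"] == "malaria") -
    mean(outcome[group == "vaccine"] == "malaria")
}

obs_diff <- diff_prop(outcome, group)  # 5/6 - 0/6

# S-F check under H0: expected counts per group of 6, using p_pool = 5/12
p_pool   <- mean(outcome == "malaria")
expected <- c(success = 6 * p_pool, failure = 6 * (1 - p_pool))
print(expected)  # fewer than 10 expected successes, so S-F fails: simulate

# Steps 3-5: shuffle the 12 "cards" (outcomes), deal them back into the
# same two groups of 6, and record the difference in proportions
set.seed(1028)
n_sim <- 10000
sim_diffs <- replicate(n_sim, diff_prop(sample(outcome), group))

# Two-sided p-value: share of simulated differences at least as extreme
# as the observed one (small tolerance guards against floating-point ties)
p_value <- mean(abs(sim_diffs) >= abs(obs_diff) - 1e-9)

# Exact permutation p-value for comparison: all 5 malaria cards land in one group
p_exact <- 2 * choose(6, 5) * choose(6, 0) / choose(12, 5)

cat("observed difference:", round(obs_diff, 4), "\n")
cat("simulated p-value:", p_value, " exact:", round(p_exact, 4), "\n")
```

Shuffling `outcome` while keeping `group` fixed plays the role of steps 3 and 4: it randomly deals the 5 "malaria" and 7 "no malaria" cards into two groups of 6.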
Summary of main ideas
1. CLT also describes the distribution of $\hat{p}_1 - \hat{p}_2$.
2. For theoretical HT where $H_0: p_1 = p_2$, pool!
3. When S-F fails, simulate!

Application exercise: App Ex 5.2
See course website for details.