STAT 113: Sampling, Randomization and Confounding, by Colin Reimer Dawson

  1. STAT 113: Sampling, Randomization and Confounding
     Colin Reimer Dawson, Oberlin College, August 31 and Sept 5, 2017
     Outline: Warmup; Sampling; Confounding Variables and Simpson’s Paradox; Observational vs. Experimental Designs

  2. Cases and Variables: Warmup
     The handout has a graph showing some information about monuments depicting Confederate leaders and soldiers involved in the U.S. Civil War in the 1860s. (Source: Southern Poverty Law Center)
     1. Identify what the cases are (what does one dot represent?).
     2. Identify all of the variables that the graph depicts (what do we know about each dot?) and how their values are conveyed.
     3. Sketch the first few rows of the data table that might have been used to create this graph.
     4. What features of the graph stand out?
     5. Does this graph help resolve the following question: “To what extent do Confederate monuments represent southern heritage, honoring those who died, and to what extent were they meant to intimidate Black citizens in the south?”

  3. Sampling: Summarizing the Gettysburg Address

  4. Sampling and Inference: The “Big Picture”

  5. Population, Samples, and Inference
     Population: All potential cases that we are interested in saying something about.
     Sample: The set of cases we actually have data for (a subset of the population).
     Statistical Inference: Using sample data to obtain information about the population.
     For inference to be effective, samples ought to be representative of the population.

  6. Simple Random Sampling
     To guard against sampling bias, we typically want to collect a random sample. Versions:
     • Simple Random Sampling ← (our focus)
     • Stratified Sampling
     • Cluster Sampling
     • Systematic Sampling
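As a minimal sketch of simple random sampling, assume a hypothetical sampling frame of 3,000 student ID numbers. The frame, the sample size of 50, the seed, and the function name `simple_random_sample` are all illustrative assumptions, not taken from the slides. Every subset of 50 students is equally likely to be drawn.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct cases from the frame; every subset of size n is equally likely."""
    rng = random.Random(seed)      # a seeded generator makes the draw reproducible
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical sampling frame (an assumption): ID numbers 1..3000.
student_ids = list(range(1, 3001))
sample = simple_random_sample(student_ids, n=50, seed=113)
print(sorted(sample)[:5])          # peek at a few of the sampled IDs
```

Sampling from a complete list of the population is what distinguishes this from the convenience samples discussed later: no student's chance of selection depends on where they happen to spend time.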

  7. Feasibility of Random Sampling
     It is often not feasible to get a truly random sample. Options:
     • Reduce the scope of your population, and limit generalization accordingly
     • Collect a non-random sample while avoiding as many sources of bias as possible

  8. Not all Non-Random Samples are Created Equal
     You want to estimate the average hours per week that Oberlin students spend studying. None of the following is random; which would you go with?
     (a) Go to Mudd and ask people there
     (b) Email every student and use all responses
     (c) Require all students in a statistics class to respond
     (d) Go to the gym and ask everyone going in
     (e) Go to The Local and ask everyone

  9. Not all Non-Random Samples are Created Equal
     None of the above are representative in every way, but some are more obviously non-representative for the variable we care about.

  10. “Don’t Ask Don’t Tell”
      2010 CBS/NYT polls (when DADT was being reconsidered): “Do you favor or oppose [homosexuals / gay men and lesbians] serving openly in the military?”

      Wording                   Favor   Oppose
      “homosexuals”             44%     42%
      “gay men and lesbians”    58%     28%

  11. Non-Sampling Bias
      Other sources of bias not due to the sampling procedure:
      • Question wording
      • Non-response bias
      • Context

  12. Confounding Variables
      Does a packed arena help or hurt the home team in basketball?

  13. Race and the Death Penalty
      Data from 1981 Florida Homicide Convictions
      [Figure: Proportions of Death Sentences out of Total Convictions (DP / Convictions), by Defendant’s Race. Source: Agresti (2002)]

  14. Race and the Death Penalty
      [Figure: Proportions of Death Sentences out of Total Convictions (Prop. DP), by both Victim’s and Defendant’s Race. Source: Agresti (2002)]
      Black defendants are sentenced to death at higher rates with both white and black victims. But combined, white defendants have a (slightly) higher rate. Why the seeming contradiction?

  15. Race and the Death Penalty
      [Figure, two panels: Proportion DP by Victim’s Race; Proportion by Def. Race, split by Vic. Race]
      The DP was applied much more often for white victims, and most homicides involved same-race individuals. For white defendants, the tendency to be involved in cases with white victims was a slightly bigger disadvantage than being white was an advantage.

  16. Confounding Variables
      A confounding variable has a relationship with both the explanatory and response variables, making it difficult or impossible to interpret the relationship between the two.
      • If we view defendant’s race as the explanatory variable and sentencing outcome as the response, then victim’s race is a confounding variable.
      • When the confound is ignored, we get a spurious association.

  17. Simpson’s Paradox
      When controlling for the confound results in a reversal of the direction of association, this is an instance of Simpson’s Paradox.
      • Ignoring victim’s race flips the direction of the association between defendant’s race and sentencing outcome.
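The reversal can be checked with simple arithmetic. The sketch below uses hypothetical counts chosen only to reproduce the qualitative pattern described above; they are not the figures from Agresti (2002), and the helper names `within` and `overall` are illustrative.

```python
# HYPOTHETICAL counts (an assumption, not the Agresti data).
# Key: (victim_race, defendant_race) -> (death_sentences, convictions)
counts = {
    ("white", "white"): (20, 150),
    ("white", "black"): (10, 60),
    ("black", "white"): (0, 10),
    ("black", "black"): (5, 100),
}

def rate(pairs):
    """Pooled death-sentence rate over a list of (deaths, convictions) pairs."""
    deaths = sum(d for d, _ in pairs)
    total = sum(n for _, n in pairs)
    return deaths / total

def within(victim, defendant):
    """Rate for one defendant race, controlling for victim's race."""
    return rate([counts[(victim, defendant)]])

def overall(defendant):
    """Rate for one defendant race, ignoring victim's race."""
    return rate([v for (_, d), v in counts.items() if d == defendant])

for victim in ("white", "black"):
    print(victim, "victims:",
          "black def.", round(within(victim, "black"), 3),
          "white def.", round(within(victim, "white"), 3))
print("combined:",
      "black def.", round(overall("black"), 3),
      "white def.", round(overall("white"), 3))
```

Within each victim group the rate is higher for black defendants, yet the combined rate is higher for white defendants, because most white defendants' cases fall in the white-victim group, where death sentences are far more common.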

  18. Example: Cursive Handwriting and SAT Scores (Handout)

  19. Observational vs. Experimental Designs
      In an experimental design, the researchers control which cases are assigned to which levels of the explanatory variable(s). Otherwise (if cases have “naturally occurring” values of the explanatory variable), the study is observational.
      A “gold standard” procedure for an experiment is random assignment of cases to levels of the explanatory variable(s). This ensures that there is no systematic relationship between the explanatory variable and any would-be confounding variables.
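Random assignment can be sketched in a few lines. This is a minimal illustration, assuming 20 hypothetical cases and two levels of the explanatory variable; the group labels, case names, seed, and the function name `randomly_assign` are assumptions, not part of the slides.

```python
import random

def randomly_assign(cases, groups=("treatment", "control"), seed=None):
    """Shuffle the cases, then deal them out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(cases)         # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, case in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(case)
    return assignment

# Hypothetical cases (an assumption): 20 labeled units.
cases = [f"case_{i}" for i in range(1, 21)]
assignment = randomly_assign(cases, seed=2017)
for group, members in assignment.items():
    print(group, len(members), members[:3])
```

Because chance alone decides which group a case lands in, no characteristic of the cases can be systematically related to the group they receive.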
