RANDOMIZATION
PMAP 8521: Program Evaluation for Public Service
October 14, 2019

Fill out your reading report on iCollege!
PLAN FOR TODAY
- The magic of randomization
- The “Gold” Standard
- Running and analyzing RCTs
THREATS TO VALIDITY
- Internal validity
  - Omitted variable bias
  - Trends
  - Study calibration
  - Contamination
- External validity
- Construct validity
- Statistical conclusion validity
INTERNAL VALIDITY
- Omitted variable bias: attrition, selection
- Trends: maturation, secular trends, seasonality, testing, regression (to the mean)
- Study calibration: measurement error, time frame of study
- Contamination: Hawthorne effects, John Henry effects, spillovers, intervening events
THE MAGIC OF RANDOMIZATION
WHY RANDOMIZE?
Fundamental problem of causal inference:
$\delta_i = Y_i^1 - Y_i^0$
Individual-level effects are impossible to observe, because each person is observed either with treatment or without it, never both.
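A minimal Python sketch of why $\delta_i$ is unobservable: in a simulation we can invent both potential outcomes for each (hypothetical) person, but the "observed" column only ever reveals one of them. The outcome distributions and the effect size of about 5 are made-up assumptions for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the illustration is reproducible

# Hypothetical population where we invent BOTH potential outcomes for each
# person, something real data never gives us.
people = []
for i in range(5):
    y0 = rng.gauss(50, 10)            # outcome without the program (Y^0)
    y1 = y0 + rng.gauss(5, 3)         # outcome with the program (Y^1)
    treated = rng.random() < 0.5      # in reality, each person is in only one state
    people.append({"id": i, "y0": y0, "y1": y1, "treated": treated})

for p in people:
    observed = p["y1"] if p["treated"] else p["y0"]
    delta_i = p["y1"] - p["y0"]       # computable only because we simulated both
    print(f"person {p['id']}: treated={p['treated']}, observed Y={observed:.1f}, "
          f"delta_i={delta_i:.1f} (never observable in real data)")
```

With real data only the `observed` value exists, so `delta_i` cannot be computed for anyone.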
Average treatment effect:
$\text{ATE} = E(Y^1 - Y^0) = E(Y^1) - E(Y^0)$
Estimated from data as the difference in group means:
$\delta = (\bar{Y} \mid P = 1) - (\bar{Y} \mid P = 0)$
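A small simulation can check that, under random assignment, the difference in means $\delta$ lands close to the true ATE. This is a sketch with made-up numbers (a true effect of 5, normally distributed outcomes); the function name `simulate_ate` is hypothetical.

```python
import random
import statistics

def simulate_ate(n, true_effect=5.0, seed=42):
    """Hypothetical simulation: return (true ATE, difference-in-means estimate)."""
    rng = random.Random(seed)
    y0 = [rng.gauss(50, 10) for _ in range(n)]
    y1 = [y + true_effect + rng.gauss(0, 3) for y in y0]
    # True ATE = E(Y^1 - Y^0): only computable because we simulated both outcomes
    ate = statistics.mean(a - b for a, b in zip(y1, y0))
    # Random assignment: P = 1 with probability 0.5, independent of outcomes
    p = [rng.random() < 0.5 for _ in range(n)]
    treated = [y1[i] for i in range(n) if p[i]]       # we see Y^1 for P = 1
    control = [y0[i] for i in range(n) if not p[i]]   # we see Y^0 for P = 0
    delta = statistics.mean(treated) - statistics.mean(control)
    return ate, delta

ate, delta = simulate_ate(10_000)
print(f"true ATE = {ate:.2f}, difference in means = {delta:.2f}")
```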
The difference in means $\delta$ only estimates the ATE if the subgroups that did and didn’t receive treatment look the same.
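To see what goes wrong when the groups don't look the same, this sketch compares self-selection with random assignment. It assumes, purely for illustration, that people whose outcome would be high anyway are far more likely to opt in (80% vs. 20%) and that the program has a constant effect of 5.

```python
import random
import statistics

def selection_bias_demo(n=50_000, true_effect=5.0, seed=7):
    """Hypothetical example: people with higher baseline outcomes opt in more often."""
    rng = random.Random(seed)
    y0 = [rng.gauss(50, 10) for _ in range(n)]
    y1 = [y + true_effect for y in y0]  # constant effect, for simplicity
    # Self-selection: those who would do well anyway are likelier to enroll
    chose = [rng.random() < (0.8 if y > 50 else 0.2) for y in y0]
    naive = (statistics.mean(y1[i] for i in range(n) if chose[i])
             - statistics.mean(y0[i] for i in range(n) if not chose[i]))
    # Random assignment ignores y0 entirely
    assigned = [rng.random() < 0.5 for _ in range(n)]
    randomized = (statistics.mean(y1[i] for i in range(n) if assigned[i])
                  - statistics.mean(y0[i] for i in range(n) if not assigned[i]))
    return naive, randomized

naive, randomized = selection_bias_demo()
print(f"self-selected: {naive:.2f}, randomized: {randomized:.2f}, truth: 5.00")
```

The self-selected comparison mixes the program effect with pre-existing differences between the groups; the randomized comparison does not.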
With big enough samples, the magic of randomization makes the treatment and control groups comparable on average, in both observed and unobserved characteristics.
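A Python sketch of the "big enough numbers" idea: split hypothetical people into two random halves and measure how far apart the groups are on a made-up baseline covariate (age). The gap should shrink as the sample grows. Comparing baseline means like this is also a simple way to check balance in a real experiment.

```python
import random
import statistics

def baseline_gap(n, seed=0):
    """Randomly split n hypothetical people in half; return the gap in mean age."""
    rng = random.Random(seed)
    ages = [rng.uniform(18, 80) for _ in range(n)]   # made-up baseline covariate
    ids = list(range(n))
    rng.shuffle(ids)
    treated = set(ids[: n // 2])                     # exactly half go to treatment
    t = [ages[i] for i in range(n) if i in treated]
    c = [ages[i] for i in range(n) if i not in treated]
    return abs(statistics.mean(t) - statistics.mean(c))

for n in (10, 100, 1_000, 10_000):
    # average over 20 different random splits to smooth out luck
    avg_gap = statistics.mean(baseline_gap(n, seed=s) for s in range(20))
    print(f"n = {n:>6,}: average age gap between groups = {avg_gap:.2f} years")
```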
R example
How big of a sample?
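One common back-of-the-envelope answer is a power calculation. The sketch below assumes a two-arm RCT with equal group sizes, a continuous outcome, a two-sided test, and the normal approximation $n = 2(z_{1-\alpha/2} + z_{1-\beta})^2 / d^2$ per group, where $d$ is the expected effect in standard-deviation units. The function `n_per_group` is hypothetical; t-based calculators (such as R's `power.t.test`) give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-arm RCT (normal approximation).

    effect_size: expected difference in means divided by the outcome's SD.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

for d in (0.8, 0.5, 0.2, 0.1):
    print(f"effect size {d}: about {n_per_group(d)} people per group")
```

Halving the expected effect roughly quadruples the required sample, which is why small effects need very large experiments.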
THE “GOLD” STANDARD
TYPES OF RESEARCH
Experimental studies vs. observational studies
Which is better?
- Medicine
- Epidemiology
- Social science
DAGs in RCTs?
RCTs are great! But they’re impractical to run all the time.
“Gold standard” implies that all causal inferences will be valid if you do the experiment right.
We don’t care whether studies are experimental or not; we care whether our causal inferences are valid.
RCTs are a helpful baseline/rubric for other methods.
Moving to Opportunity
RCTS & VALIDITY
Randomization fixes a ton of internal validity issues:
- Selection: treatment and control groups are comparable; people don’t self-select
- Trends: maturation, secular trends, seasonality, and regression to the mean all generally average out
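Regression to the mean is a good example of a trend that randomization cancels out. This hypothetical sketch enrolls only people who scored low on a noisy pretest in a program with zero true effect. Their scores rise anyway, so a naive pre/post comparison looks like a gain; a randomly assigned control group regresses by the same amount. All distributions and the cutoff of 40 are made-up assumptions.

```python
import random
import statistics

def regression_to_mean_demo(n=40_000, seed=11):
    """Hypothetical program with ZERO effect, enrolling low scorers on a noisy pretest."""
    rng = random.Random(seed)
    ability = [rng.gauss(50, 10) for _ in range(n)]
    pre = [a + rng.gauss(0, 10) for a in ability]     # noisy pretest
    post = [a + rng.gauss(0, 10) for a in ability]    # noisy posttest, no program effect
    eligible = [i for i in range(n) if pre[i] < 40]   # only low pretest scorers enroll
    rng.shuffle(eligible)
    half = len(eligible) // 2
    treat, control = eligible[:half], eligible[half:]
    # Naive pre/post change in the treated group picks up regression to the mean
    pre_post = statistics.mean(post[i] - pre[i] for i in treat)
    # Treatment vs. randomized control: both groups regress equally, so it cancels
    rct = statistics.mean(post[i] for i in treat) - statistics.mean(post[i] for i in control)
    return pre_post, rct

pre_post, rct = regression_to_mean_demo()
print(f"naive pre/post 'effect': {pre_post:.2f}, RCT estimate: {rct:.2f}, truth: 0")
```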
RCTs don’t fix attrition!
- Worst threat to internal validity in RCTs
- If attrition is correlated with treatment, that’s bad: the people who remain in each group are no longer comparable
- People might drop out because of the treatment itself, or because of which group (treatment or control) they were assigned to
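A hypothetical sketch of how attrition correlated with treatment biases an RCT. Assignment is truly random, but treated participants who are doing poorly (outcome below 50) quit half the time, while control-group dropout is unrelated to outcomes. The dropout rates and the true effect of 5 are made-up assumptions.

```python
import random
import statistics

def attrition_demo(n=50_000, true_effect=5.0, seed=3):
    """Hypothetical example: low-outcome treated participants drop out more often."""
    rng = random.Random(seed)
    ids = list(range(n))
    rng.shuffle(ids)
    treated = set(ids[: n // 2])           # complete random assignment
    y = {}
    for i in range(n):
        y0 = rng.gauss(50, 10)
        y[i] = y0 + true_effect if i in treated else y0
    full = (statistics.mean(y[i] for i in treated)
            - statistics.mean(y[i] for i in range(n) if i not in treated))

    # Differential attrition: treated people doing poorly often quit;
    # everyone else stays with the same 90% probability.
    def stays(i):
        if i in treated and y[i] < 50:
            return rng.random() < 0.5
        return rng.random() < 0.9

    kept = [i for i in range(n) if stays(i)]
    completers = (statistics.mean(y[i] for i in kept if i in treated)
                  - statistics.mean(y[i] for i in kept if i not in treated))
    return full, completers

full, completers = attrition_demo()
print(f"full sample: {full:.2f}, completers only: {completers:.2f}, truth: 5.00")
```

Analyzing only the completers overstates the effect, even though the original assignment was random.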
ADDRESSING ATTRITION
- Recruit as effectively as possible
  - You don’t just want weird/WEIRD (Western, educated, industrialized, rich, democratic) participants
- Get people on board
  - Get participants invested in the experiment
- Collect as much baseline information as possible
- Check whether attrition is random (i.e., unrelated to treatment assignment)
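One simple way to check whether attrition differs by arm is a two-proportion z-test on dropout rates. This is a sketch of that one check, not the only possible approach; the function `attrition_check` and the counts below are hypothetical.

```python
import math
from statistics import NormalDist

def attrition_check(dropped_t, n_t, dropped_c, n_c):
    """Two-proportion z-test: does the dropout rate differ between arms?"""
    p_t, p_c = dropped_t / n_t, dropped_c / n_c
    pooled = (dropped_t + dropped_c) / (n_t + n_c)   # dropout rate if arms are equal
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided p-value
    return p_t, p_c, z, p_value

# Made-up counts for illustration only
p_t, p_c, z, p = attrition_check(dropped_t=60, n_t=500, dropped_c=35, n_c=500)
print(f"dropout: treatment {p_t:.1%}, control {p_c:.1%}, z = {z:.2f}, p = {p:.3f}")
```

A non-significant difference does not prove attrition is random; it is also worth comparing the baseline characteristics of people who dropped out in each arm.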
RCTS & VALIDITY
Randomization failures
- Check baseline (pre-treatment) data to confirm the groups are balanced
Noncompliance
- Some people assigned to treatment won’t take it; some people assigned to control will take it
- Intent-to-treat (ITT) vs. treatment-on-the-treated (TOT)
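A minimal sketch of ITT and one common noncompliance adjustment, the Bloom/Wald estimator, on tiny made-up data. ITT compares people by what they were assigned; the Wald estimator divides ITT by how much assignment changed actual take-up. It assumes assignment affects outcomes only through take-up and that nobody does the opposite of their assignment. With one-sided noncompliance (no one in control gets treated) it estimates the effect on the treated; with two-sided noncompliance it is usually read as the effect on compliers. The function name `itt_and_wald` is hypothetical.

```python
import statistics

def itt_and_wald(assigned, took, y):
    """assigned/took are 0/1 lists; y is the outcome list (all hypothetical data)."""
    def mean_where(values, cond):
        return statistics.mean(v for v, c in zip(values, cond) if c)

    not_assigned = [1 - a for a in assigned]
    # ITT: compare by *assignment*, regardless of what people actually did
    itt = mean_where(y, assigned) - mean_where(y, not_assigned)
    # First stage: how much assignment changed actual take-up
    takeup_gap = mean_where(took, assigned) - mean_where(took, not_assigned)
    # Bloom/Wald estimator: rescale ITT by the take-up gap
    return itt, takeup_gap, itt / takeup_gap

assigned = [1, 1, 1, 1, 0, 0, 0, 0]
took     = [1, 1, 1, 0, 0, 0, 0, 1]   # one no-show in treatment, one crossover in control
y        = [8, 9, 7, 4, 5, 4, 6, 8]
itt, gap, wald = itt_and_wald(assigned, took, y)
print(f"ITT = {itt}, take-up gap = {gap}, Wald estimate = {wald}")  # 1.25, 0.5, 2.5
```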
OTHER LIMITATIONS
- RCTs don’t magically fix construct validity or statistical conclusion validity
- RCTs definitely don’t magically fix external validity
WHEN TO RANDOMLY ASSIGN
- Demand for treatment exceeds supply
- Treatment will be phased in over time
- Treatment is in equipoise (genuine uncertainty about which option is better)
- Local culture is open to randomization
- When you’re a nondemocratic monopolist
- When people won’t know (and it’s ethical!)
- When lotteries are going to happen anyway
WHEN NOT TO RANDOMLY ASSIGN
- When you need immediate results
- When it’s unethical or illegal
- When it’s something that happened in the past
- When it involves universal ongoing phenomena
RUNNING & ANALYZING RCTS
RANDOM ASSIGNMENT
- Coins
- Dice
- Unbiased lottery
- Random numbers + threshold
- Atmospheric noise (random.org)
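The "random numbers + threshold" method can be sketched in a few lines of Python: draw a uniform number between 0 and 1 for each participant and assign treatment if it falls below the threshold. The participant IDs, the seed, and the function name `assign_by_threshold` are hypothetical. Python's `random` module is a seeded pseudorandom generator, not atmospheric noise, but a fixed seed has the advantage that the assignment can be reproduced and audited.

```python
import random

def assign_by_threshold(participant_ids, threshold=0.5, seed=2019):
    """Give each participant a uniform random number; below the threshold -> treatment."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    return {pid: ("treatment" if rng.random() < threshold else "control")
            for pid in participant_ids}

assignments = assign_by_threshold([f"P{i:03d}" for i in range(1, 11)])
for pid, group in assignments.items():
    print(pid, group)
```

This is simple (coin-flip style) randomization, so group sizes can come out unequal by chance; shuffling the list and assigning the first half to treatment guarantees equal-sized groups instead.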
R example
RCT with Qualtrics
Recommend
More recommend