Generalizing experimental study results to target populations
Elizabeth Stuart
Johns Hopkins Bloomberg School of Public Health
Departments of Mental Health, Biostatistics, and Health Policy and Management
estuart@jhu.edu
www.biostat.jhsph.edu/~estuart
Funding thanks to NSF DRL-1335843, IES R305D150003
February 26, 2016
Outline
1 Introduction, context, and framework
2 The setting and overview of approaches
3 Reweighting approaches
4 Conclusions
1 Introduction, context, and framework
Making research results relevant: A range of policy or practice questions
- A given district or school may go on to the What Works Clearinghouse to see whether a new reading intervention is "evidence-based" and helpful for them
- The state of Maryland may be deciding whether to recommend the new program for all schools or districts in the state
  - Or for all "struggling" schools?
- Medicare may be deciding whether or not to approve payment for a new treatment for back pain
- Should a broad public health media campaign be started around not switching car seats to forward facing until a child is 12 months old?
From individual to population effects
- All of these reflect a "population" average treatment effect
  - e.g., across individuals in a population, does this intervention work "on average"?
  - This population could be fairly narrow or quite broad
- There may actually be underlying treatment effect heterogeneity
  - e.g., stronger effects for some individuals
  - Lots of interest in tailoring treatments for individuals; not my focus today
- But the policy questions that motivate today's talk call for an overall average effect
- At this point, relatively little attention has been paid to how well results from a given study might carry over to a relevant target population
- This talk discusses recent work trying to get people to start thinking about these issues, while taking advantage of recent advances in study quality and data
How much do we need to worry about external validity?
- Lots of evidence that the people or groups who participate in trials differ from general populations
  - This will cause bias if the factors that differ also moderate treatment effects
- Districts that participate in rigorous educational evaluations are much larger than typical districts in the US (Stuart et al., under review)
- People who participate in trials of drug abuse treatment have higher education levels than those in drug abuse treatment nationwide (Susukida et al., in press)
- Increasing worries about lack of minority representation in clinical trials
- And these differences can lead to external validity bias (Bell et al., in press)
2 The setting and overview of approaches
The setting
- Assume we have one randomized trial, already conducted
- And also covariate data on some target population of interest (we do not have treatment values or outcomes in the population)
- The question: How can we use these data to estimate the effects of the intervention in the target population?
- Note: Focused on assessing and enhancing external validity with respect to the characteristics of trial and population subjects
  - Lots of other threats to external validity as well: scale-up problems, implementation, different settings, ... (see Cook, 2014)
Analysis approaches for estimating population effects
- Meta-analysis: Useful when multiple studies are available, but does not necessarily give population estimates
- Cross-design synthesis: Explicitly combines experimental and non-experimental effect estimates (Pressler & Kaizar, 2013)
- Model-based approaches: Model the outcome in the trial, then use that model to predict outcomes in the population (e.g., BART; Kern et al., 2016)
- Post-stratification: Estimate separate effects within strata, then combine them using population proportions
- Reweighting: Like a smoothed version of post-stratification (Cole & Stuart, 2010; O'Muircheartaigh & Hedges, 2014)
- (Of course, design options exist too, e.g., aiming to enroll representative or "balanced" samples (Royall))
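The post-stratification idea can be sketched in a few lines: estimate the effect within each stratum in the trial, then average those effects using the target population's stratum proportions rather than the trial's. All numbers below are hypothetical, chosen only to show how a sample estimate and a population estimate can diverge when a moderator is distributed differently.

```python
# Minimal post-stratification sketch (hypothetical numbers).
# Stratum-specific treatment effects estimated within the trial:
trial_effects = {"age<40": 0.10, "age>=40": 0.30}

# Share of each stratum in the trial sample vs. the target population
trial_shares = {"age<40": 0.2, "age>=40": 0.8}
pop_shares = {"age<40": 0.6, "age>=40": 0.4}

# Sample-average effect: combine using the trial's own proportions
sample_ate = sum(trial_effects[k] * trial_shares[k] for k in trial_effects)

# Population-average effect: combine using the population's proportions
population_ate = sum(trial_effects[k] * pop_shares[k] for k in trial_effects)

print(sample_ate)      # effect averaged over the trial sample
print(population_ate)  # effect averaged over the target population
```

With these numbers the trial-sample average is 0.26 while the population average is 0.18: the trial over-represents the stratum with the larger effect, so the unweighted trial result overstates the population effect.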
3 Reweighting approaches
Case study: The ACTG trial
- Examined highly active antiretroviral therapy (HAART) for HIV compared to standard combination therapy
- 577 US HIV+ adults randomized to treatment, 579 to control
- 33/577 and 63/579 reached the endpoint (AIDS or death) during the 52-week follow-up
- Intent-to-treat analysis: hazard ratio of 0.51 (95% CI: 0.33, 0.77)
(Cole & Stuart, 2010)
The target population
- We don't necessarily care only about the people in the trial: what would the effects of the treatment be if implemented nationwide?
- US estimates of the number of people infected with HIV in 2006 (CDC, 2008)
  - HIV incidence was estimated using a statistical approach with adjustment for testing frequency, and extrapolated to the US
  - Gives the joint distribution of sex, race, and age group for the newly infected individuals
Inverse probability of selection weighting
- Weight the trial subjects up to the population
- Each subject in the trial receives weight w_i = 1 / P(S_i = 1 | X) (the inverse of their probability of being in the trial)
- Use those weights when calculating means or running regressions
- Related to inverse probability of treatment weighting, and to Horvitz-Thompson estimation in surveys
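The steps above can be sketched end to end: fit a selection model P(S = 1 | X) on the combined trial-plus-population data, invert the fitted probabilities for the trial subjects, and use the resulting weights in a weighted mean. This is a minimal sketch with hypothetical data and a single covariate (age); a hand-rolled gradient-ascent logistic fit stands in for whatever selection model one would actually use.

```python
import math

def fit_logistic(x, s, lr=0.1, iters=5000):
    """Fit P(S=1|x) = sigmoid(a + b*x) by simple gradient ascent.
    Assumes x has been standardized so the step size is stable."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(iters):
        ga = gb = 0.0
        for xi, si in zip(x, s):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            ga += si - p
            gb += (si - p) * xi
        a += lr * ga / n
        b += lr * gb / n
    return a, b

# Hypothetical data: trial members tend to be older than the population
trial_ages = [45, 50, 52, 48, 60]
pop_ages = [25, 30, 35, 55, 28, 40, 33]

ages = trial_ages + pop_ages
s = [1] * len(trial_ages) + [0] * len(pop_ages)  # S=1 marks trial membership

# Standardize age before fitting
mu = sum(ages) / len(ages)
sd = math.sqrt(sum((v - mu) ** 2 for v in ages) / len(ages))

a, b = fit_logistic([(v - mu) / sd for v in ages], s)

def prob_in_trial(age):
    """Estimated P(S=1 | age) from the fitted model."""
    return 1.0 / (1.0 + math.exp(-(a + b * (age - mu) / sd)))

# Each trial subject gets weight w_i = 1 / P(S_i = 1 | X_i)
weights = [1.0 / prob_in_trial(v) for v in trial_ages]

# Use the weights when calculating means: a reweighted trial outcome mean
outcomes = [1, 0, 1, 1, 0]  # hypothetical binary endpoints, trial subjects only
weighted_mean = sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)
```

The same weights would be passed to a weighted regression (or, as in the ACTG analysis, a weighted survival model) instead of a weighted mean; the selection-model step is unchanged.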
Standard assumptions
- The experiment was randomized
- "Sample ignorability for treatment effects": selection into the trial is independent of impacts given the observed covariates
  - For the same value of the observed covariates, impacts are the same across trial and population
  - No unmeasured variables related to both selection into the trial and treatment effects
  - (Sensitivity analysis for this: Nguyen et al., under review)
- "Overlap": all individuals in the population had a non-zero probability of participating in the trial
- Analogous to strong ignorability/unconfoundedness of treatment assignment in non-experimental studies
- (If the outcome under control is observed in the population, a slightly different assumption can be used)
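The overlap assumption can at least be probed empirically: after fitting the selection model, flag population members whose estimated probability of trial participation is essentially zero. A crude sketch, with hypothetical fitted probabilities and an assumed practical cutoff (the formal requirement is strictly non-zero probability):

```python
# Hypothetical estimated P(S=1 | X) for members of the target population
est_probs_population = [0.15, 0.08, 0.30, 0.001, 0.22]

# Practical cutoff for "essentially no chance of being in the trial"
threshold = 0.01

violations = [p for p in est_probs_population if p < threshold]
print(f"{len(violations)} of {len(est_probs_population)} population members "
      f"fall below the overlap threshold")
```

Population members below the cutoff have no adequate counterparts in the trial, so any estimate for them rests on extrapolation rather than data.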
Effect heterogeneity and predictors of participation
- People in the trial were more likely to be:
  - Older (not 13-29)
  - Male
  - White or Hispanic
- Those characteristics also moderate effects in the trial:
  - Detrimental effects for young people
  - Largest effects for those 30-39
  - Larger effects for males than for females
  - Larger effects for Blacks than for Whites or Hispanics
Estimated population effects

                          Hazard ratio    95% CI
  Crude trial results     0.51            0.33, 0.77
  Age weighted            0.68            0.39, 1.17
  Sex weighted            0.53            0.34, 0.82
  Race weighted           0.46            0.29, 0.72
  Age-sex-race weighted   0.57            0.33, 1.00

- CIs are wider for the weighted results
- Effects are generally somewhat attenuated, except when weighting only by race
Placebo checks
- Can also use the weighting as a diagnostic
- The weighted control-group mean should match the population outcome mean if the control conditions are the same (a "placebo check")
- In the HAART case, if we had mortality information in the population, we could check whether the weighted mortality rate among the control group matched the population mortality rate (assuming no treatment in the population)
- If the placebo check fails, that may indicate unobserved differences between the groups
(Hartman et al., 2013; Stuart et al., 2011)
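The placebo check itself is a one-line comparison once the weights are in hand. A minimal sketch with hypothetical outcomes, weights, and population event rate:

```python
# Hypothetical control-group data from the trial
control_outcomes = [0, 1, 0, 0, 1, 0]              # 1 = event (e.g., AIDS/death)
control_weights = [2.0, 1.5, 3.0, 1.0, 2.5, 2.0]   # inverse selection weights

# Selection-weighted control-group event rate
weighted_control_mean = (
    sum(w * y for w, y in zip(control_weights, control_outcomes))
    / sum(control_weights)
)

# Hypothetical observed event rate in the (untreated) target population
population_mean = 0.35

# A large gap suggests unobserved trial/population differences remain
gap = abs(weighted_control_mean - population_mean)
print(f"weighted control mean = {weighted_control_mean:.3f}, gap = {gap:.3f}")
```

In practice the gap would be judged against its sampling uncertainty (e.g., via a confidence interval or equivalence test) rather than against an arbitrary cutoff.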
4 Conclusions
- Everyone wants to assume that study results generalize, but very few statistical methods exist
- At this point, lots of "hand waving" and qualitative statements
- Need more statistical methods to quantify and improve external validity, for both study design and study analysis