On the Causal Interpretation of Race in Regressions Adjusting for Confounding and Mediating Variables
Tyler J. VanderWeele
Departments of Epidemiology and Biostatistics
Harvard T.H. Chan School of Public Health
Regressions with Race
Researchers often fit regression models of some outcome (Y) on race (R) and other covariates (X).
(1) What are the different interpretations of the race coefficient?
(2) Under what assumptions do the different interpretations hold?
(3) Does it matter what is included in the covariates X? Does it matter if some covariates X are affected by race?
(4) How might any of this be useful in trying to reduce disparities?
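The model display itself does not appear in the slide text. One form consistent with the later references to the race coefficient β_1 would be the linear model below. The intercept \beta_0 and the covariate coefficient vector \beta_2 are notation assumed here for illustration, not taken from the slides.

```latex
E[Y \mid R = r, X = x] = \beta_0 + \beta_1 r + \beta_2^{\top} x
```

Under this reading, the questions above all concern what \beta_1 means for different choices of X.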
Related Literature and Motivation
The current presentation is based on a paper (VanderWeele and Robinson, 2014) and work in progress (with John Jackson). Related literature includes:
(1) Countless studies in sociology, social epidemiology, social policy, etc.
(2) The position in the statistics causal inference literature that race is an immutable characteristic not allowing counterfactuals (Holland, 1986; Greiner and Rubin, 2011)
(3) The exchange in AJE between Kaufman and Cooper (1999, 2000) and Krieger and Davey Smith (2000) on the meaning of adjusting for SES
(4) The discussion by Berkman (2004) on reframing counterfactual questions in social epidemiology in terms of interventions not involving race
(5) Other recent work on the topic from a counterfactual perspective (Blank et al., 2004; Marcellesi, 2013; Sen and Wasow, 2015)
(6) Oaxaca-Blinder decomposition in economics (Blinder, 1973; Oaxaca, 1973) and mediation analysis (Pearl, 2001; Imai et al., 2010; VanderWeele, 2015)
Other Motivation
Analyses by Neal and Johnson (1996) and Fryer (2011), using NLSY data, examine black-white racial inequalities in income, unemployment, incarceration, and self-reported health. Adjustment for a standardized test measure of educational achievement (AFQT) at ages 15-18 eliminates:
- 72% of the gap in wages
- 75% of the gap in unemployment
- 69% of the gap in incarceration rates
- 100%+ of the gap in self-reported physical health
How do we interpret this? Is this the right analysis? What do we learn from it?
Definitions of Race
Results presented are purely formal.
For any given definition of race, the results relate (i) the output of an analysis (under that definition) to (ii) an interpretation (with respect to that same definition).
The results are applicable irrespective of how race is defined, e.g. whether race is defined by
- self-report
- genealogy, ancestry, genetic analysis, etc.
The results are essentially agnostic to how race is defined.
The interpretations the results provide are always relative to the definition of race used in the analysis.
Associational Interpretation
We might simply interpret the race coefficient, β_1, in an associational or predictive manner.
Interpretation: For persons with the same value of covariates X (e.g. age; or age and childhood SES; or age, childhood SES, and educational attainment) but who differ in race (e.g. black versus white), what is the expected difference in outcomes?
Assumptions: The regression model is correctly specified.
Covariates: The interpretation is relative to the covariates X, but it is the same type of predictive/associational interpretation irrespective of what the covariates are.
Associational Interpretation
We might simply interpret the race coefficient, β_1, in an associational or predictive manner.
Use: The interpretation is straightforward (descriptive), but it is not causal, and it is not clear how we would use it to reduce disparities.
If we find an association with race, we do not know whether it reflects:
- Discrimination
- Physical or genetic characteristics
- Unequal educational or economic opportunities
- A common cause of race and the outcome
- A common cause of the outcome and of a covariate X that is affected by race
Equalizing SES Distributions
Suppose instead we consider the race coefficient with and without control for individual and neighborhood SES in childhood.
[Causal diagram: nodes H, R, SES_0, NSES_0, Y]
- Y: outcome
- R: race variable
- SES_0: SES in childhood
- NSES_0: neighborhood SES in childhood
- H: complex historical process giving rise to associations
Equalizing SES Distributions
We fit the model twice: once with X empty; once with X = (SES_0, NSES_0).
Interpretation:
- When X is empty, the race coefficient is just the average difference in outcomes comparing black and white individuals.
- When X = (SES_0, NSES_0), the race coefficient is the inequality that would remain if the distributions of individual and neighborhood SES during childhood in the black population had been set equal to those of the white population.
- The difference between the two is how much of the inequality we could eliminate by equalizing the SES distributions.
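The two-fit comparison can be sketched on simulated data. This is only an illustration under stated assumptions: the data-generating coefficients, the single SES measure `ses0`, and the helper `race_coefficient` are all hypothetical, and the model is a plain linear regression without interactions.

```python
import numpy as np


def race_coefficient(y, r, covariates=()):
    """Return the OLS coefficient on the race indicator r.

    covariates is a sequence of 1-D arrays to adjust for (empty = no adjustment).
    """
    design = np.column_stack([np.ones_like(y), r, *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # column 0 is the intercept, column 1 is race


# Hypothetical simulated data (assumption, not from the slides):
# childhood SES is lower on average when r = 1, and SES raises the outcome.
rng = np.random.default_rng(0)
n = 5000
r = rng.integers(0, 2, size=n).astype(float)
ses0 = -1.0 * r + rng.normal(size=n)
y = 2.0 * ses0 - 0.5 * r + rng.normal(size=n)

b_unadjusted = race_coefficient(y, r)        # X empty: overall inequality
b_adjusted = race_coefficient(y, r, [ses0])  # X = (SES_0): remaining inequality
share_eliminated = (b_unadjusted - b_adjusted) / b_unadjusted

print(f"unadjusted race coefficient: {b_unadjusted:.2f}")
print(f"adjusted race coefficient:   {b_adjusted:.2f}")
print(f"share of inequality eliminated: {share_eliminated:.0%}")
```

By construction the true overall gap in this simulation is -2.5 and the gap remaining after equalizing `ses0` is -0.5, so the estimated share eliminated should land near 80%.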
Equalizing SES Distributions
We fit the model twice: once with X empty; once with X = (SES_0, NSES_0).
Covariates: If we include a covariate in X that is one of the components of race (e.g. childhood individual or neighborhood SES), then the race coefficient only picks up the remaining components.
Equalizing SES Distributions
Let $E[Y \mid R=1]$ denote the average outcome for black individuals.
Let $E[Y \mid R=0]$ denote the average outcome for white individuals.
Let $Y_x$ be the outcome that would have been observed if the SES variable(s) had been set to $x$.
Let $G_0$ be a random draw of the SES variable(s) from their distribution in the white population.
$E[Y \mid R=1] - E[Y_{G_0} \mid R=1]$ is the portion of the inequality eliminated if we equalized white-black SES distributions.
$E[Y_{G_0} \mid R=1] - E[Y \mid R=0]$ is the inequality that would remain if we equalized white-black SES distributions.
Under assumptions described below:
$E[Y_{G_0} \mid R=1] = \sum_x E[Y \mid R=1, x] \, P(x \mid R=0)$
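The standardization formula can be computed directly when the SES variable is discrete. The sketch below is a minimal nonparametric version on a tiny made-up data set. The record layout `(r, x, y)`, the helper names, and the numbers are all hypothetical.

```python
from collections import defaultdict


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def standardized_mean(records, target_race=1, reference_race=0):
    """Compute sum_x E[Y | R=target, x] * P(x | R=reference) from (r, x, y) records.

    Raises ZeroDivisionError if some x seen in the reference group never
    occurs in the target group (no overlap, so the stratum mean is undefined).
    """
    y_by_x = defaultdict(list)   # outcomes in the target group, by SES level
    x_counts = defaultdict(int)  # SES distribution in the reference group
    n_reference = 0
    for r, x, y in records:
        if r == target_race:
            y_by_x[x].append(y)
        if r == reference_race:
            x_counts[x] += 1
            n_reference += 1
    return sum(mean(y_by_x[x]) * count / n_reference
               for x, count in x_counts.items())


# Hypothetical records (r, x, y): r = 1 black, r = 0 white; x is a binary SES level.
records = [
    (1, 0, 1.0), (1, 0, 2.0), (1, 0, 3.0), (1, 1, 6.0),
    (0, 0, 3.0), (0, 1, 7.0), (0, 1, 8.0), (0, 1, 9.0),
]

mean_black = mean(y for r, _, y in records if r == 1)  # E[Y | R=1]
mean_white = mean(y for r, _, y in records if r == 0)  # E[Y | R=0]
counterfactual = standardized_mean(records)            # E[Y_G0 | R=1]

eliminated = mean_black - counterfactual  # portion of inequality eliminated
remaining = counterfactual - mean_white   # inequality that would remain

print(eliminated, remaining, mean_black - mean_white)  # -2.0 -1.75 -3.75
```

The eliminated and remaining portions add up to the observed inequality, which is the decomposition the slide describes.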
Equalizing SES Distributions
Note: In the interpretation here we are not talking about the effect of race, but really about the effects of SES: how much the racial inequality would be reduced by intervening on early childhood SES.
[Causal diagram: nodes H, R, SES_0, NSES_0, Age, Y]
Assumptions: The effects of childhood individual and neighborhood SES on the outcome Y should be unconfounded conditional on the race variable and other covariates (e.g. we may want to control for C = age), i.e. $Y_x \perp\!\!\!\perp X \mid (R, C)$.
Caveat: In any given study we will only have specific measures of childhood individual and neighborhood SES (so we see how the inequality would be reduced if we equalized our actual SES measures).
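A short derivation shows where the unconfoundedness assumption enters the identification of $E[Y_{G_0} \mid R=1]$. It is a sketch that drops the covariates C for brevity, and it additionally relies on the standard consistency and overlap conditions, which are supplied here as assumptions rather than taken from the slides.

```latex
\begin{aligned}
E[Y_{G_0} \mid R=1]
  &= \sum_x E[Y_x \mid R=1] \, P(x \mid R=0)
     && \text{($G_0$ is drawn independently of the potential outcomes)} \\
  &= \sum_x E[Y_x \mid R=1, X=x] \, P(x \mid R=0)
     && \text{(unconfoundedness: $Y_x \perp\!\!\!\perp X \mid R$)} \\
  &= \sum_x E[Y \mid R=1, X=x] \, P(x \mid R=0)
     && \text{(consistency: $Y = Y_x$ when $X = x$)}
\end{aligned}
```

The last line involves only observed quantities, provided every SES level x that occurs in the white population also occurs in the black population.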
Equalizing SES Distributions
Covariates: What if we control for covariates in X that may themselves be affected by race, e.g. years of education?
[Causal diagram: nodes H, R, SES_0, NSES_0, SES_1, Y]
Equalizing SES Distributions
We could again fit the model twice: once with X = (SES_0, NSES_0); once with X = (SES_0, NSES_0, SES_1).
Interpretation:
- When X = (SES_0, NSES_0), the race coefficient is the racial inequality for those with the same childhood individual and neighborhood SES.
- When X = (SES_0, NSES_0, SES_1), the race coefficient is the inequality, for those with a given (SES_0, NSES_0), that would remain if the distribution of adult SES_1 in the black population had been set equal to that of the white population.
- The difference between the two is how much of the racial inequality we could eliminate (for those of the same childhood SES) by equalizing the adult SES_1 distributions.
Equalizing SES Distributions
We could again fit the model twice: once with X = (SES_0, NSES_0); once with X = (SES_0, NSES_0, SES_1).
Assumptions: The assumption required is that the effects of the adult measure M = SES_1 on the outcome Y are unconfounded conditional on race, covariates C, and X = childhood individual and neighborhood SES, i.e. $Y_m \perp\!\!\!\perp M \mid (R, C, X)$.
It is thus important to control for childhood SES (otherwise childhood SES confounds the effect of adult SES, and adult SES then picks up childhood SES effects too).
Note again that we are not interpreting the race coefficient itself causally. Our causal intervention is on adult SES_1.
Equalizing SES Distributions
We could again fit the model twice: once with X = (SES_0, NSES_0); once with X = (SES_0, NSES_0, SES_1).
Method: Adding adult SES_1 is somewhat analogous to mediation.
Analytic approaches in the causal mediation analysis literature (Pearl, 2001; Imai et al., 2010; Valeri and VanderWeele, 2013; VanderWeele, 2015) in fact empirically coincide with the analytic approach described here.
Assumptions differ, but the analytic methods are the same.
Equalizing SES Distributions
We could again fit the model twice: once with X = (SES_0, NSES_0); once with X = (SES_0, NSES_0, SES_1).
Method: Adding adult SES_1 is somewhat analogous to mediation.
This also allows us to use more complex models, and we can do this non-parametrically.
E.g. we could include an interaction (cf. VanderWeele and Vansteelandt, 2009; Imai et al., 2010; VanderWeele, 2015) between race and adult SES_1.
We can still estimate "mediated/reduced" and "direct effect/remainder" inequality measures:
Reduction: $\sum_m E[Y \mid R=1, x, m, c] \, \{P(m \mid R=1, x, c) - P(m \mid R=0, x, c)\}$
Remainder: $\sum_m \{E[Y \mid R=1, x, m, c] - E[Y \mid R=0, x, m, c]\} \, P(m \mid R=0, x, c)$
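The reduction and remainder measures can be evaluated within a single (x, c) stratum once the stratum-specific outcome means and adult-SES distributions have been estimated. The sketch below takes the reduction as the sum over m of E[Y | R=1, x, m, c] weighted by the difference between the black and white distributions of m. The dictionaries `mean_y` and `p_m` hold made-up values, chosen so that the race gap differs by level of m (a race-by-SES interaction), and all function names are hypothetical.

```python
def reduction(mean_y, p_m):
    """sum_m E[Y | R=1, m] * {P(m | R=1) - P(m | R=0)} within one (x, c) stratum."""
    return sum(mean_y[(1, m)] * (p_m[1][m] - p_m[0][m]) for m in p_m[0])


def remainder(mean_y, p_m):
    """sum_m {E[Y | R=1, m] - E[Y | R=0, m]} * P(m | R=0) within one (x, c) stratum."""
    return sum((mean_y[(1, m)] - mean_y[(0, m)]) * p_m[0][m] for m in p_m[0])


def total_inequality(mean_y, p_m):
    """E[Y | R=1] - E[Y | R=0] within the stratum, by averaging over m."""
    black = sum(mean_y[(1, m)] * p_m[1][m] for m in p_m[1])
    white = sum(mean_y[(0, m)] * p_m[0][m] for m in p_m[0])
    return black - white


# Hypothetical estimates for one (x, c) stratum (assumed values, not real data).
# mean_y[(r, m)] = E[Y | R=r, x, m, c]; p_m[r][m] = P(m | R=r, x, c).
mean_y = {(1, 0): 10.0, (1, 1): 14.0, (0, 0): 11.0, (0, 1): 18.0}
p_m = {1: {0: 0.75, 1: 0.25}, 0: {0: 0.5, 1: 0.5}}

red = reduction(mean_y, p_m)
rem = remainder(mean_y, p_m)
tot = total_inequality(mean_y, p_m)

print(red, rem, tot)  # -1.0 -2.5 -3.5
```

The two measures sum to the stratum-specific inequality even though the gap between groups depends on m, which is why the decomposition does not require a no-interaction model.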
Equalizing SES Distributions
We could again fit the model twice: once with X = (SES_0, NSES_0); once with X = (SES_0, NSES_0, SES_1).
Caveat: We will only have specific measures of adult SES. The interpretation is thus how the inequality would be reduced if we equalized our actual SES measure(s).
Use: In fact, this caveat may actually be useful in informing how to reduce racial inequalities. We might compare several different SES measures, or measures of education, to see which might most reduce racial inequalities.