A unification of mediation and interaction: a four-way decomposition
Tyler J. VanderWeele
Departments of Epidemiology and Biostatistics
Harvard School of Public Health
Plan of Presentation
(1) Questions of Mediation and Interaction
(2) A Unification of Mediation and Interaction
(3) Regression Approaches and Ratio Scales
(4) Application to Genetic Epidemiology
(5) Relation to Prior Decompositions
(6) Concluding Remarks
Mediation
In some research contexts we might be interested in the extent to which the effect of some exposure A on some outcome Y is mediated by an intermediate variable M, and to what extent it is direct.
[Diagram: A → M → Y]
Stated another way, we are interested in the direct and indirect effects of the exposure.
In other research contexts we may be interested in whether A and M interact in their effects, and how much of their effects are due to interaction.
Mediation
In some cases, we may be interested in both mediation and interaction.
In 2008, GWAS found variants on 15q25.1 associated with lung cancer (Thorgeirsson et al., 2008; Hung et al., 2008; Amos et al., 2008).
These same variants were known to be associated with smoking (average cigarettes per day) (Saccone et al., 2007; Spitz et al., 2008).
The variants also increased vulnerability to the harmful effect of smoking, a gene-environment interaction; e.g. carriers of the variant allele extract more nicotine and toxins from each cigarette (Le Marchand, 2008).
The causal inference literature has developed methods that can assess mediation in the presence of interaction to get direct and indirect effects.
In this example from genetic epidemiology, most of the effect seemed "direct" (94%) with respect to cigarettes per day (VanderWeele et al., 2012).
But this does not clarify the role of interaction itself.
Notation
Let Y denote some outcome of interest for each individual.
Let A denote some exposure or treatment of interest for each individual.
Let M denote some post-treatment intermediate(s) for each individual (potentially on the pathway between A and Y).
Let C denote a set of covariates for each individual.
Let Y_a be the counterfactual outcome (or potential outcome) Y for each individual when intervening to set A to a.
Let M_a be the counterfactual outcome M for each individual when intervening to set A to a.
Let Y_am be the counterfactual outcome Y for each individual when intervening to set A to a and M to m.
A Unification of Mediation and Interaction
We can in fact decompose a total effect, TE = Y_1 - Y_0, into four components (VanderWeele, 2014) under the "composition" assumption that Y_a = Y_{aM_a}:
(1) A controlled direct effect (CDE): the effect of A in the absence of M
(2) A reference interaction (INT_ref): the interaction that operates only if the mediator is present in the absence of exposure
(3) A mediated interaction (INT_med): the interaction that operates only if the exposure changes the mediator
(4) A pure indirect effect (PIE): the effect of the mediator in the absence of the exposure times the effect of the exposure on the mediator
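For binary A and M, the four components listed above can be written out explicitly at the individual level. The display below is a sketch reconstructed from those verbal definitions (taking m = 0 as the reference level for the controlled direct effect); it follows from the composition assumption, since for binary M we have Y_{aM_a} = Y_{a0} + (Y_{a1} - Y_{a0}) M_a.

```latex
\begin{align*}
Y_1 - Y_0
  &= \underbrace{(Y_{10} - Y_{00})}_{\text{CDE}}
   + \underbrace{(Y_{11} - Y_{10} - Y_{01} + Y_{00})\, M_0}_{\text{INT}_{\text{ref}}} \\
  &\quad
   + \underbrace{(Y_{11} - Y_{10} - Y_{01} + Y_{00})\,(M_1 - M_0)}_{\text{INT}_{\text{med}}}
   + \underbrace{(Y_{01} - Y_{00})\,(M_1 - M_0)}_{\text{PIE}}
\end{align*}
```

Expanding the right-hand side gives (Y_{10} - Y_{00}) + (Y_{11} - Y_{10}) M_1 - (Y_{01} - Y_{00}) M_0, which is Y_{1M_1} - Y_{0M_0}.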
A Unification of Mediation and Interaction
We can summarize the four components as:
(1) CDE: Neither mediation nor interaction
(2) INT_ref: Interaction but not mediation
(3) INT_med: Both mediation and interaction
(4) PIE: Mediation but not interaction
A Unification of Mediation and Interaction
We cannot identify these effects for an individual but, under certain confounding assumptions (next slides), we can identify them on average for a population.
If so, letting p_am = P(Y=1|A=a,M=m), each averaged component can be expressed in terms of the p_am and the distribution of the mediator given exposure.
We could then calculate the proportion of the total effect due to each of the components.
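For binary A and M, and suppressing the covariates C, the averaged components can be sketched as follows. This is a reconstruction from the component definitions and the notation p_am rather than a copy of the slide's display; with covariates, each quantity would be conditional on C.

```latex
\begin{align*}
E[\text{CDE}] &= p_{10} - p_{00} \\
E[\text{INT}_{\text{ref}}] &= (p_{11} - p_{10} - p_{01} + p_{00})\, P(M=1 \mid A=0) \\
E[\text{INT}_{\text{med}}] &= (p_{11} - p_{10} - p_{01} + p_{00})\,\{P(M=1 \mid A=1) - P(M=1 \mid A=0)\} \\
E[\text{PIE}] &= (p_{01} - p_{00})\,\{P(M=1 \mid A=1) - P(M=1 \mid A=0)\}
\end{align*}
```

The proportion due to any one component is that component divided by the sum of all four, which equals E[Y_1 - Y_0].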
A Unification of Mediation and Interaction
The four components are E[CDE], E[INT_ref], E[INT_med] and E[PIE].
We could add E[INT_ref] and E[INT_med] for the overall proportion due to interaction: (E[INT_ref] + E[INT_med]) / E[TE].
We could add E[PIE] and E[INT_med] for the overall proportion due to mediation: (E[PIE] + E[INT_med]) / E[TE].
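The proportions above can be computed directly once the p_am and the mediator probabilities are available. The following is a minimal sketch for binary A and M without covariates; the function name `four_way_binary` and all numerical inputs are hypothetical and chosen only for illustration, not taken from the lung cancer data.

```python
def four_way_binary(p, pm0, pm1):
    """Averaged four-way decomposition for binary exposure and mediator.

    p   : dict mapping (a, m) to P(Y=1 | A=a, M=m)
    pm0 : P(M=1 | A=0)
    pm1 : P(M=1 | A=1)
    """
    # Additive interaction contrast between A and M.
    interaction = p[1, 1] - p[1, 0] - p[0, 1] + p[0, 0]
    return {
        "CDE": p[1, 0] - p[0, 0],
        "INTref": interaction * pm0,
        "INTmed": interaction * (pm1 - pm0),
        "PIE": (p[0, 1] - p[0, 0]) * (pm1 - pm0),
    }


# Hypothetical inputs (illustration only).
p = {(0, 0): 0.10, (0, 1): 0.20, (1, 0): 0.15, (1, 1): 0.40}
pm0, pm1 = 0.30, 0.50

components = four_way_binary(p, pm0, pm1)
total = sum(components.values())

# Proportion of the total effect due to each component.
proportions = {name: value / total for name, value in components.items()}
prop_interaction = proportions["INTref"] + proportions["INTmed"]
prop_mediation = proportions["PIE"] + proportions["INTmed"]

# Check: the components sum to P(Y=1 | A=1) - P(Y=1 | A=0),
# obtained by averaging p_am over the mediator distribution.
total_check = (p[1, 0] * (1 - pm1) + p[1, 1] * pm1) - (
    p[0, 0] * (1 - pm0) + p[0, 1] * pm0
)

for name, share in proportions.items():
    print(f"{name}: {share:.3f}")
print(f"due to interaction: {prop_interaction:.3f}")
print(f"due to mediation:   {prop_mediation:.3f}")
```

Note that the proportions due to interaction and to mediation overlap in INT_med, so they need not sum to one.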
Identification
The confounding assumptions are the same as those generally used in the causal inference literature to identify direct and indirect effects:
(1) There are no unmeasured exposure-outcome confounders given C
(2) There are no unmeasured mediator-outcome confounders given (C,A)
(3) There are no unmeasured exposure-mediator confounders given C
(4) None of the mediator-outcome confounders are affected by exposure
For controlled direct effects, only assumptions (1) and (2) are needed.
Note that (1) and (3) are guaranteed when treatment is randomized.
[Figure: causal diagram with exposure A, mediator M, outcome Y and confounders C1, C2, C3]
Identification
More formally, in counterfactual notation, these assumptions are:
(1) Y_am ⊥ A | C
(2) Y_am ⊥ M | C,A
(3) M_a ⊥ A | C
(4) Y_am ⊥ M_a* | C
For controlled direct effects, only assumptions (1) and (2) are needed.
Note that (1) and (3) are guaranteed when treatment is randomized.
[Figure: causal diagram with exposure A, mediator M, outcome Y and confounders C1, C2, C3]
Regression Approach
Under the confounding assumptions we can estimate each of the four components in a straightforward way using regression models for Y and M.
Under these models, if our confounding assumptions hold, the effects for a change in the exposure from reference level a* to level a are given by closed-form expressions in the regression coefficients.
Similar results hold if one or both of A or M are binary.
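As an illustration of the regression approach, the sketch below assumes linear models of the form used in VanderWeele (2014) for continuous Y and M: E[Y|a,m,c] = θ0 + θ1 a + θ2 m + θ3 a m + θ4'c and E[M|a,c] = β0 + β1 a + β2'c. These model forms are an assumption here, since the slide's own display is not reproduced in the text. The function name `four_way_linear` and the coefficient values are hypothetical; in practice the coefficients would come from fitted models.

```python
def four_way_linear(theta1, theta2, theta3, beta0, beta1, beta2c,
                    a, a_star, m_star):
    """Four-way decomposition under the assumed linear models.

    theta1, theta2, theta3 : coefficients of a, m and a*m in the outcome model
    beta0, beta1           : intercept and coefficient of a in the mediator model
    beta2c                 : the covariate contribution beta2'c at the chosen c
    a, a_star              : exposure level and reference exposure level
    m_star                 : mediator level at which the CDE is evaluated
    """
    d = a - a_star
    # Expected mediator at the reference exposure level.
    mean_m_ref = beta0 + beta1 * a_star + beta2c
    return {
        "CDE": (theta1 + theta3 * m_star) * d,
        "INTref": theta3 * (mean_m_ref - m_star) * d,
        "INTmed": theta3 * beta1 * d * d,
        "PIE": (theta2 * beta1 + theta3 * beta1 * a_star) * d,
    }


# Hypothetical coefficients (illustration only).
theta1, theta2, theta3 = 0.5, 0.3, 0.2
beta0, beta1, beta2c = 1.0, 0.4, 0.0
a, a_star, m_star = 1.0, 0.0, 0.0

effects = four_way_linear(theta1, theta2, theta3, beta0, beta1, beta2c,
                          a, a_star, m_star)
total_effect = sum(effects.values())

# Check against the total effect computed directly from the two models:
# E[Y_a] - E[Y_a*], with E[M_a] = beta0 + beta1 * a + beta2c.
mean_m_a = beta0 + beta1 * a + beta2c
mean_m_ref = beta0 + beta1 * a_star + beta2c
total_check = (theta1 * (a - a_star)
               + theta2 * (mean_m_a - mean_m_ref)
               + theta3 * (a * mean_m_a - a_star * mean_m_ref))

for name, value in effects.items():
    print(f"{name}: {value:.3f} ({value / total_effect:.1%} of total)")
```

Standard errors are not covered by this sketch; they would typically be obtained by the delta method or bootstrapping.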
Relation to Mediation Decompositions
Our basic four-way decomposition was: TE = CDE + INT_ref + INT_med + PIE
If we combine the CDE and INT_ref we obtain what is sometimes called the "natural/pure direct effect".
If we combine the PIE and INT_med we obtain what is sometimes called the "natural/total indirect effect" (Robins and Greenland, 1992; Pearl, 2001).
PDE = Pure direct effect (natural direct effect) = CDE + INT_ref
TIE = Total indirect effect (natural indirect effect) = PIE + INT_med
These are also sometimes called natural direct and indirect effects.
This is the decomposition of Robins and Greenland (1992) and Pearl (2001).
This is essentially the decomposition used in epidemiology and the social sciences when interaction is absent.
Relation to Prior Decompositions
VanderWeele and Tchetgen Tchetgen (2014) also showed that the total effect could be divided into the CDE, the PIE and a portion attributable to interaction.
The four-way decomposition unites all the others.
The relations among these decompositions can be summarized in a figure.
Ratio Scale
A similar four-way decomposition also holds on a ratio scale, where RR_am = p_am / p_00 and where κ = p_00 / p_{a=0} is a scaling factor.
If we divide each component by their sum, then κ drops out.
We can estimate the components using logistic regression (with SAS code).
We can also proceed with case-control data under a rare outcome assumption.
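The claim that κ drops out of the proportions can be checked numerically. The sketch below assumes binary A and M without covariates, reads p_{a=0} as P(Y=1|A=0), and writes the ratio-scale components as κ times the relative-risk analogues of the difference-scale components, so that they sum to the excess relative risk RR - 1. Those forms, the function name `four_way_ratio` and the input values are assumptions made for illustration.

```python
def four_way_ratio(p, pm0, pm1):
    """Ratio-scale four-way decomposition of the excess relative risk.

    p   : dict mapping (a, m) to P(Y=1 | A=a, M=m)
    pm0 : P(M=1 | A=0)
    pm1 : P(M=1 | A=1)
    Returns (components, kappa).
    """
    rr = {key: value / p[0, 0] for key, value in p.items()}  # RR_am = p_am / p_00
    p_a0 = p[0, 0] * (1 - pm0) + p[0, 1] * pm0               # P(Y=1 | A=0)
    kappa = p[0, 0] / p_a0                                   # scaling factor
    interaction = rr[1, 1] - rr[1, 0] - rr[0, 1] + 1
    components = {
        "CDE": kappa * (rr[1, 0] - 1),
        "INTref": kappa * interaction * pm0,
        "INTmed": kappa * interaction * (pm1 - pm0),
        "PIE": kappa * (rr[0, 1] - 1) * (pm1 - pm0),
    }
    return components, kappa


# Hypothetical inputs (illustration only).
p = {(0, 0): 0.10, (0, 1): 0.20, (1, 0): 0.15, (1, 1): 0.40}
pm0, pm1 = 0.30, 0.50

comp, kappa = four_way_ratio(p, pm0, pm1)

# Total relative risk comparing A=1 with A=0.
p_a1 = p[1, 0] * (1 - pm1) + p[1, 1] * pm1
p_a0 = p[0, 0] * (1 - pm0) + p[0, 1] * pm0
rr_total = p_a1 / p_a0

# Proportions with kappa included, and with kappa divided out first.
total = sum(comp.values())
proportions = {name: value / total for name, value in comp.items()}
unscaled = {name: value / kappa for name, value in comp.items()}
unscaled_total = sum(unscaled.values())
proportions_no_kappa = {n: v / unscaled_total for n, v in unscaled.items()}

for name in comp:
    print(f"{name}: {proportions[name]:.3f}")
```

Because κ multiplies every component, it cancels from each proportion, so only the relative risks RR_am and the mediator probabilities are needed.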
Genetic Epidemiology
In 2008, GWAS found variants on 15q25.1 associated with lung cancer (Thorgeirsson et al., 2008; Hung et al., 2008; Amos et al., 2008).
These same variants were known to be associated with smoking (average cigarettes per day) (Saccone et al., 2007; Spitz et al., 2008).
The variants also increased vulnerability to the harmful effect of smoking, a gene-environment interaction; e.g. carriers of the variant allele extract more nicotine and toxins from each cigarette (Le Marchand, 2008).
When methods for direct and indirect effects were employed, most of the effect seemed "direct" with respect to cigarettes per day (VanderWeele et al., 2012).
But this did not fully capture the role of interaction; there was evidence for such interaction (Li et al., 2010; Truong et al., 2010; VanderWeele et al., 2012).
Now we will examine what proportion of the effect is due (i) to just mediation, (ii) to just interaction, (iii) to both and (iv) to neither.
Genetic Epidemiology
The study sample, consisting of 1836 cases and 1452 controls, is from a case-control study (cf. Miller et al., 2002) assessing the molecular epidemiology of lung cancer, which began in 1992 at the Massachusetts General Hospital (MGH).
Eligible cases included any person over the age of 18 years with a diagnosis of primary lung cancer that was further confirmed by an MGH lung pathologist.
The controls were recruited from among the friends or spouses of cancer patients or the friends or spouses of other surgery patients in the same hospital.
Potential controls who carried a previous diagnosis of any cancer (other than non-melanoma skin cancer) were excluded from participation.
Genetic Epidemiology
Sample characteristics of cases and controls
_________________________________________________________________
                              Cases (N=1836)    Controls (N=1452)
_________________________________________________________________
Average Cigarettes per Day    25.42             13.97
Smoking Duration              38.50             18.93
Age                           64.86             58.58
College Education             31.3%             33.5%
Sex
  Male                        50.1%             56.1%
  Female                      49.9%             43.9%
rs8034191 C alleles
  0                           33.8%             43.3%
  1                           48.5%             43.7%
  2                           17.7%             13.0%
_________________________________________________________________
Assumptions About Confounding
To use our approach with the genetic variants we need to assume no unmeasured confounding for the (1) exposure-outcome, (2) mediator-outcome, and (3) exposure-mediator relationships.
Assumptions (1) and (3) are probably plausible for the exposure (the genetic variant), subject to no population stratification (the analysis was restricted to Caucasians).
(2) No unmeasured confounding may be less plausible for the smoking-lung cancer association (e.g. SES / neighborhood). We consider sensitivity analysis later.
(4) Smoking duration may affect cigarettes/day and lung cancer and may be affected by the variant (though there is not much evidence for this), and results are similar when duration is omitted.
[Figure: causal diagram with nodes C, A, M, Y and U]