Why causality?

PRNI 2017, 21 June 2017, Sebastian Weichwald, Max Planck Institute


  1. PRNI 2017, 21 June 2017. Sebastian Weichwald, Max Planck Institute for Intelligent Systems, Max Planck ETH Center for Learning Systems. sweichwald.de/prni2017, neural.engineering
     Why causality?

  2. To paraphrase an old joke, there are two types of statisticians: those who do causal inference and those who lie about it. (L Wasserman, Journal of the American Statistical Association, 1999)
     (FH Messerli, Chocolate Consumption, Cognitive Function, and Nobel Laureates. New England Journal of Medicine, 2012)

  3. Why causality? Goal of scientific theories!
     A scientific theory should
     ▸ Explain already observed data
     ▸ Predict future observations
       ○ of a passively observed system
       ○ of a system that is actively intervened upon
     We want to predict the effect of interventions!
     Why causality? Goal of neuroimaging studies!
     [Figure: amygdala, hippocampus, explicit memory]
     Hippocampal activity in this study was correlated with amygdala activity, supporting the view that the amygdala enhances explicit memory by modulating activity in the hippocampus. (Anonymous Authors, Trends in Cognitive Sciences, 2001)

  4. Common causal frameworks
     ▸ Potential Outcomes Framework
     ▸ Granger Causality
     ▸ Dynamic Causal Modelling
     ▸ Causal Bayesian Networks and Structural Equation Models

  5. Potential Outcomes Framework
     Ingredients:
     ▸ Population $U$ of units $u \in U$, e.g. a patient group
     ▸ Treatment variable $S \colon U \to \{t, c\}$, e.g. assignment to treatment/control
     ▸ Potential outcomes $Y \colon U \times \{t, c\} \to \mathbb{R}$, e.g. survival times $Y_t(u)$ and $Y_c(u)$ of patient $u$
     (PW Holland, Statistics and Causal Inference. Journal of the American Statistical Association, 1986)

  6. Potential Outcomes Framework
     Fundamental problem of causal inference: For each unit $u$ we get to observe either $Y_t(u)$ or $Y_c(u)$, never both, and hence the treatment effect $Y_t(u) - Y_c(u)$ cannot be computed.
     Possible remedy assumptions:
     ▸ Unit homogeneity: $Y_t(u_1) = Y_t(u_2)$ and $Y_c(u_1) = Y_c(u_2)$
     ▸ Causal transience: can measure $Y_t(u)$ and $Y_c(u)$ sequentially
     “Statistical solution”: Average Treatment Effect $E[Y_t] - E[Y_c]$
     ▸ Can observe $E[Y_t \mid S = t]$ and $E[Y_c \mid S = c]$,
     ▸ which, when randomly assigning treatments, i.e. $(Y_t, Y_c) \perp\!\!\!\perp S$,
     ▸ are equal to $E[Y_t]$ and $E[Y_c]$.
     (PW Holland, Statistics and Causal Inference. Journal of the American Statistical Association, 1986)
     Potential Outcomes Framework
     [Figure: coffee → cancer?]
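     The "statistical solution" above can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the outcome distributions, the true effect of 1.5, and the function name `simulate_randomized_trial` are hypothetical and not taken from the talk. Both potential outcomes are generated for every unit, but only one of them is "observed", depending on a randomly assigned treatment.

```python
import numpy as np

def simulate_randomized_trial(n=100_000, seed=0):
    """Compare the true ATE with the difference of observed group means."""
    rng = np.random.default_rng(seed)
    # Hypothetical potential outcomes Y_c(u) and Y_t(u) for every unit u.
    y_c = rng.normal(loc=10.0, scale=2.0, size=n)
    y_t = y_c + 1.5 + rng.normal(scale=1.0, size=n)
    # Random assignment: S is independent of (Y_t, Y_c) by construction.
    treated = rng.random(n) < 0.5
    # Fundamental problem: only one potential outcome per unit is observed.
    y_obs = np.where(treated, y_t, y_c)
    true_ate = float(np.mean(y_t - y_c))  # needs both outcomes, unobservable in practice
    est_ate = float(y_obs[treated].mean() - y_obs[~treated].mean())
    return true_ate, est_ate

true_ate, est_ate = simulate_randomized_trial()
print(f"true ATE: {true_ate:.3f}, estimate from observed means: {est_ate:.3f}")
```

     Under random assignment the two numbers should agree up to sampling noise, although no individual treatment effect is ever observed.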

  7. Potential Outcomes Framework
     ▸ Split population $U$ into
       ○ ‘consumed little’: $S(u) = \square$
       ○ ‘consumed lots’: $S(u) = \blacksquare$
     ▸ Observe whether they suffer from cancer or not, $Y \in \{0, 1\}$
     ▸ Assume older units have higher cumulative coffee consumption as well as an increased risk of cancer
     [Figure: age influences both coffee and cancer]

  8. Potential Outcomes Framework
     ▸ Assume older units have higher cumulative coffee consumption as well as an increased risk of cancer
       ○ $(Y_{\square}, Y_{\blacksquare}) \not\perp\!\!\!\perp S$
       ○ $E[Y_{\square} \mid S = \square] < E[Y_{\square}]$
     $\Rightarrow$ The observed difference $E[Y_{\blacksquare} \mid S = \blacksquare] - E[Y_{\square} \mid S = \square]$ systematically overestimates the effect $E[Y_{\blacksquare}] - E[Y_{\square}]$ of cumulative coffee consumption on cancer
     Common causal frameworks
     ▸ Potential Outcomes Framework
       may work under certain (untestable) assumptions
     ▸ Granger Causality
     ▸ Dynamic Causal Modelling
     ▸ Causal Bayesian Networks and Structural Equation Models
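     The overestimation in the coffee example can be reproduced in a toy simulation. The sketch below assumes, purely for illustration, that coffee has no effect on cancer at all and that age alone drives both variables. All probabilities and the name `simulate_coffee_confounding` are hypothetical.

```python
import numpy as np

def simulate_coffee_confounding(n=200_000, seed=0):
    """Age drives both coffee consumption and cancer; coffee itself does nothing."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(20, 80, size=n)
    age01 = (age - 20) / 60  # age rescaled to [0, 1]
    # Older units are more likely to fall into the 'consumed lots' group.
    lots = rng.random(n) < 0.1 + 0.8 * age01
    # Cancer risk depends on age only, so both potential outcomes coincide
    # and the true average treatment effect is exactly zero.
    cancer = rng.random(n) < 0.01 + 0.2 * age01
    return {
        "rate_overall": float(cancer.mean()),        # estimates E[Y_little]
        "rate_little": float(cancer[~lots].mean()),  # E[Y_little | S = little]
        "rate_lots": float(cancer[lots].mean()),     # E[Y_lots | S = lots]
    }

rates = simulate_coffee_confounding()
naive_effect = rates["rate_lots"] - rates["rate_little"]
print(rates, "naive effect:", round(naive_effect, 4))
```

     In this setup the naive difference of observed cancer rates comes out clearly positive even though the true effect is zero, and the cancer rate among low consumers is below the population rate, matching the inequality on the slide.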

  9. Granger Causality
     Simplified Definition (not by C Granger): One stochastic process $X$ is causal to a second process $Y$ if the autoregressive predictability of the second process at a given time point is improved by including measurements from the past of the first, i.e. if
     $\mathrm{PredAcc}[Y_t \mid Y_{<t}] < \mathrm{PredAcc}[Y_t \mid Y_{<t}, X_{<t}]$
     (CWJ Granger, Investigating Causal Relations by Econometric Models and Cross-spectral Methods. Econometrica, 1969)

  10. Granger Causality
      [Figure: time series graph over $X_{t+1}, \dots, X_{t+4}$, $Z_{t+1}, \dots, Z_{t+4}$ and $Y_{t+1}, \dots, Y_{t+4}$]
      $\mathrm{PredAcc}[Y_t \mid Y_{<t}] < \mathrm{PredAcc}[Y_t \mid Y_{<t}, X_{<t}]$
      Granger causality erroneously infers causal influence from $X$ to $Y$!
      (J Peters et al. Causal discovery on time series using restricted structural equation models. NIPS, 2013)
      Granger Causality
      Granger’s Definition (by C Granger): One stochastic process $X$ is causal to a second process $Y$ if the predictability of the second process at a given time point is worsened by removing past measurements of the first from the universe’s past $\mathcal{U}_{<t}$, i.e. if
      $\mathrm{PredAcc}[Y_t \mid \mathcal{U}_{<t}] > \mathrm{PredAcc}[Y_t \mid \mathcal{U}_{<t} \setminus X_{<t}]$
      (CWJ Granger, Investigating Causal Relations by Econometric Models and Cross-spectral Methods. Econometrica, 1969)
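      The difference between the simplified criterion and Granger's definition can be sketched numerically. The system below is an assumption made for illustration, not the one from the cited figure: a hidden process Z drives X with lag 1 and Y with lag 2, and X has no influence on Y. Prediction accuracy is approximated by the in-sample mean squared error of a linear least-squares fit (lower error means higher PredAcc). The function names are hypothetical.

```python
import numpy as np

def residual_mse(target, predictors):
    """In-sample mean squared error of a least-squares linear fit with intercept."""
    design = np.column_stack([np.ones(len(target)), *predictors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.mean((target - design @ coef) ** 2))

def simulate_hidden_confounder(n=20_000, seed=0):
    """Z_t is white noise; X_t = Z_{t-1} + noise; Y_t = Z_{t-2} + noise."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = np.zeros(n)
    y = np.zeros(n)
    x[1:] = z[:-1] + 0.5 * rng.normal(size=n - 1)
    y[2:] = z[:-2] + 0.5 * rng.normal(size=n - 2)
    return x, y, z

x, y, z = simulate_hidden_confounder()
y_now = y[3:]                # Y_t
y_past = y[2:-1]             # Y_{t-1}
x_past = x[2:-1]             # X_{t-1}
z_past = [z[2:-1], z[1:-2]]  # Z_{t-1}, Z_{t-2}

# Simplified criterion: only the pasts of Y and X are considered.
mse_y_only = residual_mse(y_now, [y_past])
mse_y_and_x = residual_mse(y_now, [y_past, x_past])
# Granger's definition: compare against a "universe" that also contains Z.
mse_universe = residual_mse(y_now, [y_past, x_past, *z_past])
mse_universe_without_x = residual_mse(y_now, [y_past, *z_past])

print(f"Y past only: {mse_y_only:.3f}, Y and X past: {mse_y_and_x:.3f}")
print(f"universe: {mse_universe:.3f}, universe without X: {mse_universe_without_x:.3f}")
```

      In this toy system the past of X improves the prediction of Y when Z is ignored, while removing X from a universe that contains Z changes almost nothing. The linear fit with a single lag is a simplification; a real analysis would select lags and evaluate on held-out data.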

  11. Granger Causality
      [Figure: time series graph over $X_{t+1}, \dots, X_{t+4}$ and $Y_{t+1}, \dots, Y_{t+4}$]
      $\mathrm{PredAcc}[Y_t \mid \mathcal{U}_{<t}] = \mathrm{PredAcc}[Y_t \mid \mathcal{U}_{<t} \setminus X_{<t}]$, where $\mathcal{U}_{<t}$ denotes the universe’s past
      Granger causality fails to predict the effects of interventions!
      (N Ay and D Polani, Information flows in causal networks. Advances in Complex Systems, 2008)
      Common causal frameworks
      ▸ Potential Outcomes Framework
        may work under certain (untestable) assumptions
      ▸ Granger Causality
        problems with confounding
        may fail to predict effects of interventions
      ▸ Dynamic Causal Modelling
      ▸ Causal Bayesian Networks and Structural Equation Models

  12. Dynamic Causal Modelling
      Causality in DCM is used in a control theory sense and means that, under the model, activity in one brain area causes dynamics in another, and that these dynamics cause the observations. (Friston, PLOS Biology, 2009)
      Inference procedure:
      ▸ Observe
      ▸ Define models $M = \{M_1, \dots, M_N\}$
      ▸ Fit models to observed data
      ▸ Best fitting model $\hat{M}$ wins
      (KJ Friston et al., Dynamic Causal Modelling. NeuroImage, 2003)

  13. Dynamic Causal Modelling
      [Figures] (KJ Friston et al., Dynamic Causal Modelling. NeuroImage, 2003)

  14. Dynamic Causal Modelling
      [Figure] (KJ Friston et al., Dynamic Causal Modelling. NeuroImage, 2003)

  15. Dynamic Causal Modelling
      Is $\hat{M}$ guaranteed to reflect the true connectivities?
      [Figure: number of models vs. model fit]
      $\Rightarrow$ Similar model fit does not translate into similar connectivities!
      (Lohmann et al., Critical comments on dynamic causal modelling. NeuroImage, 2012)
      Common causal frameworks
      ▸ Potential Outcomes Framework
        may work under certain (untestable) assumptions
      ▸ Granger Causality
        problems with confounding
        may fail to predict effects of interventions
      ▸ Dynamic Causal Modelling
        unclear how it predicts interventional setting
        inference procedure provably correct?
      ▸ Causal Bayesian Networks and Structural Equation Models

  16. Causal Bayesian Networks and Structural Equation Models
      Structural Equation Models
      A Structural Equation Model (SEM) $M_X = (S_X, I_X, P_{E_X})$ with
      ▸ structural equations $S_X$;
      ▸ a set of interventions $I_X$;
      ▸ exogenous variables distributed according to $P_{E_X}$
      induces a distribution $P^{i}_X$ over the $X$ variables for each intervention $i \in I_X$.
      (J Pearl, Causality: Models, reasoning, and inference, 2000; P Spirtes et al., Causation, Prediction, and Search, 2001)
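      To make the triple concrete, the sketch below samples from a small two-variable SEM ($X_1 = E_1$, $X_2 = X_1 + E_2$ with independent standard normal noise) under the empty intervention and under two do-interventions. The function `sample_sem` and its dictionary-based encoding of interventions are hypothetical choices for illustration, not an API from the cited works.

```python
import numpy as np

def sample_sem(n, intervention=None, seed=0):
    """Sample (X1, X2) from X1 = E1, X2 = X1 + E2 with E ~ N(0, I).

    `intervention` maps a variable name ("X1" or "X2") to the value it is
    set to; an intervened variable ignores its structural equation.
    """
    rng = np.random.default_rng(seed)
    intervention = intervention or {}
    e1, e2 = rng.normal(size=(2, n))
    x1 = np.full(n, float(intervention["X1"])) if "X1" in intervention else e1
    x2 = np.full(n, float(intervention["X2"])) if "X2" in intervention else x1 + e2
    return x1, x2

for name, do in [("observational", None),
                 ("do(X1 = 5)", {"X1": 5}),
                 ("do(X2 = 3)", {"X2": 3})]:
    x1, x2 = sample_sem(100_000, do)
    print(f"{name}: X1 mean {x1.mean():.2f} var {x1.var():.2f}, "
          f"X2 mean {x2.mean():.2f} var {x2.var():.2f}")
```

      Intervening on $X_1$ shifts the distribution of $X_2$, whereas intervening on $X_2$ leaves $X_1$ unchanged, which reflects the direction of the structural equations.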

  17. Structural Equation Models: Example
      $M_X = (S_X, I_X, P_{E_X})$
      ▸ $S_X = \{ X_1 = E_1, \; X_2 = X_1 + E_2 \}$
      ▸ $I_X = \{ \emptyset, \; do(X_1 = 5), \; do(X_2 = 3) \}$
      ▸ $E \sim \mathcal{N}(0, I)$
      Induced distributions:
      ▸ observational, $P^{\emptyset}$: $X_1 \sim \mathcal{N}(0, 1)$, $X_2 \sim \mathcal{N}(0, 2)$
      ▸ intervention on $X_1$, $P^{do(X_1 = 5)}$: $X_1 \equiv 5$, $X_2 \sim \mathcal{N}(5, 1)$
      ▸ intervention on $X_2$, $P^{do(X_2 = 3)}$: $X_1 \sim \mathcal{N}(0, 1)$, $X_2 \equiv 3$
      (J Pearl, Causality: Models, reasoning, and inference, 2000; P Spirtes et al., Causation, Prediction, and Search, 2001)
      Causal Bayesian Networks
      Definition of Cause and Effect: $X \to Y \;:\Longleftrightarrow\; P^{do(X = x)}_Y \neq P^{\emptyset}_Y$ for some $x$
      Causal Markov Condition: d-separation $\Rightarrow$ independence
      Faithfulness: d-separation $\Leftarrow$ independence
      ▸ chain $X \to Y \to Z$: $X \not\perp\!\!\!\perp Z$, $X \perp\!\!\!\perp Z \mid Y$
      ▸ fork $X \leftarrow Y \to Z$: $X \not\perp\!\!\!\perp Z$, $X \perp\!\!\!\perp Z \mid Y$
      ▸ collider $X \to Y \leftarrow Z$: $X \perp\!\!\!\perp Z$, $X \not\perp\!\!\!\perp Z \mid Y$
      (J Pearl, Causality: Models, reasoning, and inference, 2000; P Spirtes et al., Causation, Prediction, and Search, 2001)
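      The (in)dependence patterns of chain, fork and collider can be checked on simulated data. The sketch below assumes linear structural equations with unit coefficients and standard normal noise, a choice made only for illustration; in that Gaussian setting zero correlation and zero partial correlation stand in for independence and conditional independence. The helper names are hypothetical.

```python
import numpy as np

def simulate_structure(structure, n=50_000, seed=0):
    """Draw (X, Y, Z) from a linear Gaussian chain, fork or collider."""
    rng = np.random.default_rng(seed)
    ex, ey, ez = rng.normal(size=(3, n))
    if structure == "chain":       # X -> Y -> Z
        x = ex
        y = x + ey
        z = y + ez
    elif structure == "fork":      # X <- Y -> Z
        y = ey
        x = y + ex
        z = y + ez
    elif structure == "collider":  # X -> Y <- Z
        x = ex
        z = ez
        y = x + z + ey
    else:
        raise ValueError(f"unknown structure: {structure}")
    return x, y, z

def residual(a, b):
    """Residual of a after linear regression on b (with intercept)."""
    slope = np.cov(a, b)[0, 1] / np.var(b, ddof=1)
    return a - a.mean() - slope * (b - b.mean())

def correlations(structure):
    """Return corr(X, Z) and the partial correlation of X and Z given Y."""
    x, y, z = simulate_structure(structure)
    marginal = float(np.corrcoef(x, z)[0, 1])
    partial = float(np.corrcoef(residual(x, y), residual(z, y))[0, 1])
    return marginal, partial

for structure in ("chain", "fork", "collider"):
    marginal, partial = correlations(structure)
    print(f"{structure}: corr(X, Z) = {marginal:.3f}, corr(X, Z | Y) = {partial:.3f}")
```

      For chain and fork the marginal correlation is clearly non-zero and vanishes after conditioning on Y; for the collider it is the other way round.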
