
Estimation of causal direction in the presence of latent confounders and linear non-Gaussian SEMs



  1. Estimation of causal direction in the presence of latent confounders and linear non-Gaussian SEMs
     Shohei Shimizu (Osaka University, Japan), with Kenneth Bollen (University of North Carolina, Chapel Hill, USA)
     Causal Modeling and Machine Learning, Beijing, China, June 2014

  2. Abstract
     • Estimation of the causal direction between two observed variables in the presence of latent confounders
     • A key challenge in causal discovery
     • We propose a non-Gaussian method
     • It does not require specifying the number of latent confounders
     • Experiments on artificial and sociology data

  3. Background

  4. Motivation
     • Causality is a main interest in many empirical sciences
     • Many recent methods estimate causal directions (with no temporal information)
       – Linear non-Gaussian model (Dodge & Rousson, 2001; Shimizu et al., 2006)
       – Nonlinear model (Hoyer et al., 2009; Zhang & Hyvarinen, 2009; Peters et al., 2011)
     • Example from epidemiology (Rosenstrom et al., 2012): sleep problems → depression mood, or sleep problems ← depression mood? Which is dominant?
     • Another important challenge: latent confounders

  5. Structural equation modeling (SEM) (Bollen, 1989; Pearl, 2000, 2009)
     • A framework for describing causal relations
     • An example (of linear cases):
       $$x_2 := f(x_1, e_2) = b_{21} x_1 + e_2$$
       – The value of $x_2$ is determined by the values of $x_1$ and the error/exogenous variable $e_2$ through the linear function $f$
     • Generally speaking, if the value of $x_1$ is changed and that of $x_2$ also changes, then $x_1$ causes $x_2$
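The causal reading of the linear SEM can be illustrated with a small simulation. This is a minimal sketch, not code from the talk: the coefficient value 0.8, the uniform error distribution, the seed, and the helper names `simulate` and `mean` are all assumptions chosen for illustration. Setting $x_1$ by intervention shifts the mean of $x_2$ by roughly $b_{21}$ times the change in $x_1$.

```python
import random

def simulate(n, b21=0.8, do_x1=None, seed=0):
    """Draw n samples from x1 := e1, x2 := b21 * x1 + e2.

    b21, the uniform errors and the seed are illustrative assumptions.
    If do_x1 is given, x1 is set to that value by intervention.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        e1 = rng.uniform(-1.0, 1.0)
        e2 = rng.uniform(-1.0, 1.0)
        x1 = e1 if do_x1 is None else do_x1
        x2 = b21 * x1 + e2
        samples.append((x1, x2))
    return samples

def mean(values):
    return sum(values) / len(values)

baseline = simulate(10000)
intervened = simulate(10000, do_x1=2.0)

# Change in the mean of x2 caused by setting x1 to 2.0
shift = mean([x2 for _, x2 in intervened]) - mean([x2 for _, x2 in baseline])
print(round(shift, 2))
```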

  6. Major challenges
     1. Estimation of causal direction when temporal information is not available: $x_1 \to x_2$ or $x_1 \leftarrow x_2$?
     2. Coping with latent confounders: $x_1 \to x_2$ or $x_1 \leftarrow x_2$ when a latent confounder $f_1$ affects both $x_1$ and $x_2$?

  7. Non-Gaussian approach: LiNGAM (Linear Non-Gaussian Acyclic Model) (Shimizu et al., 2006)
     • Acyclic SEMs with different directions are distinguishable (Dodge & Rousson, 2001; Shimizu et al., 2006)
       – Model 1 ($x_1 \to x_2$): $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$
       – Model 2 ($x_1 \leftarrow x_2$): $x_1 = b_{12} x_2 + e_1$, $x_2 = e_2$
       where $e_1$ and $e_2$ are error/exogenous variables
     • Fundamental assumptions:
       – $e_1$ and $e_2$ are non-Gaussian
       – $e_1$ and $e_2$ are independent (no latent confounders)

  8. Different directions give different data distributions
     • Model 1: $x_1 = e_1$, $x_2 = 0.8 x_1 + e_2$
     • Model 2: $x_1 = 0.8 x_2 + e_1$, $x_2 = e_2$
     • $E(e_1) = E(e_2) = 0$, $\mathrm{var}(x_1) = \mathrm{var}(x_2) = 1$
     [Figure: scatter plots of $x_2$ against $x_1$ under Model 1 and Model 2, for Gaussian and non-Gaussian (uniform) errors]
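The difference between the Gaussian and uniform cases can be checked numerically. The sketch below generates data from Model 1 with the slide's coefficient 0.8 and unit variances, fits a regression in each direction, and measures how strongly the residual depends on the regressor. The dependence score (absolute correlation between the squared regressor and the squared residual) is a crude stand-in chosen for this illustration, not the estimation method of LiNGAM; the function names, sample size, and seed are likewise assumptions.

```python
import math
import random

def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def corr(u, v):
    u, v = center(u), center(v)
    num = sum(a * b for a, b in zip(u, v))
    return num / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def dependence(regressor, target):
    """|corr(regressor^2, residual^2)| after an OLS fit of target on regressor.

    A crude dependence score (an assumption of this sketch).
    """
    x, y = center(regressor), center(target)
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    resid = [b - slope * a for a, b in zip(x, y)]
    return abs(corr([a * a for a in x], [r * r for r in resid]))

def generate_model1(n, gaussian, seed=0):
    """Model 1: x1 = e1, x2 = 0.8 * x1 + e2, with var(x1) = var(x2) = 1."""
    rng = random.Random(seed)
    half_width = math.sqrt(3.0)  # uniform(-sqrt(3), sqrt(3)) has variance 1
    x1, x2 = [], []
    for _ in range(n):
        if gaussian:
            e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.6)
        else:
            e1 = rng.uniform(-half_width, half_width)
            e2 = 0.6 * rng.uniform(-half_width, half_width)
        x1.append(e1)
        x2.append(0.8 * e1 + e2)
    return x1, x2

for gaussian in (True, False):
    x1, x2 = generate_model1(5000, gaussian)
    label = "Gaussian" if gaussian else "uniform"
    # First score: fit in the true direction; second: fit in the reverse direction
    print(label, round(dependence(x1, x2), 3), round(dependence(x2, x1), 3))
```

With Gaussian errors both scores stay near zero, so the two directions cannot be told apart this way; with uniform errors only the fit in the true direction leaves a residual that looks independent of the regressor.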

  9. LiNGAM with latent confounders (Hoyer, Shimizu & Kerminen, 2008)
     • Extension to incorporate non-Gaussian latent confounders $f_q$:
       $$x_i = \mu_i + \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$$
       where, without loss of generality, the $f_q$ ($q = 1, \ldots, Q$) are independent
     • For two observed variables with $x_1 \to x_2$:
       $$x_1 = \mu_1 + \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1$$
       $$x_2 = \mu_2 + \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$$
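To see why latent confounders break the LiNGAM independence assumption, the two-variable model can be simulated directly. This is a minimal sketch under assumed settings: two Laplace-distributed confounders with all loadings equal to 1, Laplace errors, $b_{21} = 0.5$, and zero intercepts; none of these values come from the slides, and `generate_confounded` is a hypothetical helper name. If the confounders are ignored, the composite error terms of $x_1$ and $x_2$ share the confounder sum and are therefore correlated.

```python
import math
import random

def laplace(rng):
    """Standard Laplace draw: the difference of two Exp(1) variables."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)

def generate_confounded(n, lam1, lam2, b21, mu=(0.0, 0.0), seed=0):
    """Sample (x1, x2) from the two-variable LiNGAM with latent confounders."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        f = [laplace(rng) for _ in lam1]  # independent latent confounders
        e1, e2 = laplace(rng), laplace(rng)
        x1 = mu[0] + sum(l * fq for l, fq in zip(lam1, f)) + e1
        x2 = mu[1] + sum(l * fq for l, fq in zip(lam2, f)) + b21 * x1 + e2
        rows.append((x1, x2))
    return rows

def corr(u, v):
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu_u) ** 2 for a in u) * sum((b - mu_v) ** 2 for b in v))
    return num / den

b21 = 0.5
rows = generate_confounded(20000, lam1=[1.0, 1.0], lam2=[1.0, 1.0], b21=b21)

# Composite errors when the confounders are ignored (intercepts are zero here)
u1 = [x1 for x1, _ in rows]
u2 = [x2 - b21 * x1 for x1, x2 in rows]
error_corr = corr(u1, u2)
print(round(error_corr, 2))
```

Under these settings each composite error has variance 6 and their covariance is 4, so the correlation should be close to 2/3 rather than 0.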

  10. Previous estimation approaches
     • Explicitly model latent confounders and compare two models with opposite directions of causation
       – Maximum likelihood principle (Hoyer et al., 2008)
       – Bayesian model selection (Henao & Winther, 2011)
       – Laplace / finite mixture of Gaussians for $p(e_i)$
     • These approaches require specifying the number of latent confounders, which is difficult in general
     [Figure: the two candidate models, $x_1 \to x_2$ and $x_1 \leftarrow x_2$, each with latent confounders $f_1, \ldots, f_Q$]

  11. Our proposal
     Reference: Shimizu and Bollen (2014), Journal of Machine Learning Research, in press

  12. Key idea (1/2)
     • Another look at the LiNGAM with latent confounders, for the $m$-th observation:
       $$x_2^{(m)} = \mu_2 + \underbrace{\sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}}_{\tilde{\mu}_2^{(m)}} + b_{21} x_1^{(m)} + e_2^{(m)}$$
     • Observations are generated from the LiNGAM model with possibly different intercepts $\mu_2 + \tilde{\mu}_2^{(m)}$
     [Figure: one two-variable LiNGAM graph per observation, each with its own intercept $\mu_2 + \tilde{\mu}_2^{(m)}$ and the shared coefficient $b_{21}$]

  13. Key idea (2/2)
     • Include the sums of latent confounders as the observation-specific intercepts; for the $m$-th observation:
       $$x_2^{(m)} = \mu_2 + \underbrace{\sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}}_{\tilde{\mu}_2^{(m)}} + b_{21} x_1^{(m)} + e_2^{(m)}$$
       where $\tilde{\mu}_2^{(m)}$ is the observation-specific intercept
     • Latent confounders are not explicitly modeled
     • It is necessary neither to specify the number of latent confounders $Q$ nor to estimate the coefficients $\lambda_{2q}$
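The effect of absorbing the confounder sum into an observation-specific intercept can be illustrated with an oracle experiment. In this sketch the intercepts are computed from the simulated confounders, which is only possible in a simulation; in practice they are unknown, which is why the method places a prior on them. The settings (two Laplace confounders with unit loadings, Laplace errors, $b_{21} = 0.5$, zero common intercepts) and the helper `ols_slope` are assumptions for illustration.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

rng = random.Random(0)
b21, lam1, lam2 = 0.5, [1.0, 1.0], [1.0, 1.0]

def laplace():
    # Difference of two Exp(1) draws is a standard Laplace draw
    return rng.expovariate(1.0) - rng.expovariate(1.0)

x1, x2, intercept2 = [], [], []
for _ in range(20000):
    f = [laplace() for _ in lam1]
    e1, e2 = laplace(), laplace()
    mu1_m = sum(l * fq for l, fq in zip(lam1, f))  # observation-specific intercept of x1
    mu2_m = sum(l * fq for l, fq in zip(lam2, f))  # observation-specific intercept of x2
    x1.append(mu1_m + e1)
    x2.append(mu2_m + b21 * x1[-1] + e2)
    intercept2.append(mu2_m)

# Ignoring the confounders biases the slope; removing the oracle intercepts does not
naive = ols_slope(x1, x2)
oracle = ols_slope(x1, [b - m for b, m in zip(x2, intercept2)])
print(round(naive, 2), round(oracle, 2))
```

Once the intercept is subtracted, the remaining error $e_2$ is independent of $x_1$, so the fitted slope should land near the assumed $b_{21}$, while the naive slope is pulled away from it by the shared confounders.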

  14. Our approach
     • Compare these two LiNGAM models with opposite directions:
       – Model 3 ($x_1 \to x_2$):
         $$x_1^{(m)} = \mu_1 + \tilde{\mu}_1^{(m)} + e_1^{(m)}$$
         $$x_2^{(m)} = \mu_2 + \tilde{\mu}_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$$
       – Model 4 ($x_1 \leftarrow x_2$):
         $$x_1^{(m)} = \mu_1 + \tilde{\mu}_1^{(m)} + b_{12} x_2^{(m)} + e_1^{(m)}$$
         $$x_2^{(m)} = \mu_2 + \tilde{\mu}_2^{(m)} + e_2^{(m)}$$
     • Many additional parameters $\tilde{\mu}_i^{(m)}$ ($i = 1, 2$; $m = 1, \ldots, n$)
     • Prior for the observation-specific intercepts $\tilde{\mu}_i^{(m)}$
     • Other parameters get low-informative priors: Gaussian with large sd
     • Bayesian model selection (marginal likelihoods)
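The two candidate models differ only in which variable receives the regression term, which a short function can make concrete. The sketch below evaluates the joint log-density of the data under either direction for given parameter values, assuming Laplace errors with a common scale. It is only the likelihood term: the model comparison described above uses marginal likelihoods, which additionally integrate this quantity against the priors, and that integration is not shown. The function names and the `"1->2"` / `"2->1"` direction labels are hypothetical.

```python
import math

def laplace_logpdf(e, scale=1.0):
    """Log-density of a zero-mean Laplace distribution."""
    return -math.log(2.0 * scale) - abs(e) / scale

def log_likelihood(x1, x2, direction, b, mu, mu_obs, scale=1.0):
    """Joint log-density of the data for fixed parameters.

    direction "1->2" uses x2 = mu_2 + mu~_2 + b * x1 + e2;
    direction "2->1" uses x1 = mu_1 + mu~_1 + b * x2 + e1.
    mu = (mu_1, mu_2); mu_obs[m] = (mu~_1^(m), mu~_2^(m)).
    """
    total = 0.0
    for a, c, (m1, m2) in zip(x1, x2, mu_obs):
        if direction == "1->2":
            e1 = a - mu[0] - m1
            e2 = c - mu[1] - m2 - b * a
        elif direction == "2->1":
            e1 = a - mu[0] - m1 - b * c
            e2 = c - mu[1] - m2
        else:
            raise ValueError("direction must be '1->2' or '2->1'")
        # The map from (e1, e2) to (x1, x2) has unit Jacobian in both models
        total += laplace_logpdf(e1, scale) + laplace_logpdf(e2, scale)
    return total

# One observation, zero intercepts: each direction with a coefficient that
# zeroes one of its error terms
ll_forward = log_likelihood([1.0], [2.0], "1->2", 2.0, (0.0, 0.0), [(0.0, 0.0)])
ll_backward = log_likelihood([1.0], [2.0], "2->1", 0.5, (0.0, 0.0), [(0.0, 0.0)])
print(ll_forward, ll_backward)
```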

  15. Prior for the observation-specific intercepts
       $$\tilde{\mu}_1^{(m)} = \sum_{q=1}^{Q} \lambda_{1q} f_q^{(m)}, \qquad \tilde{\mu}_2^{(m)} = \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$$
     • Motivation: central limit theorem
       – Sums of independent variables tend to be more Gaussian
     • Approximate the density by a bell-shaped distribution:
       $$\left[\tilde{\mu}_1^{(m)}, \tilde{\mu}_2^{(m)}\right]^T \sim t\text{-distribution with sd } \tau_1, \tau_2, \text{ correlation } \sigma_{12}, \text{ and DOF } \nu$$
     • Select the hyper-parameter values that maximize the marginal likelihood (empirical Bayes):
       – $\tau_l \in \{0, 0.2\,\mathrm{sd}(x_l), \ldots, 1.0\,\mathrm{sd}(x_l)\}$, $\sigma_{12} \in \{0, 0.1, \ldots, 0.9\}$
       – DOF $\nu$ fixed to be 6 in the experiments below
     • Small $\tau_l$ means similar intercepts
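The prior and its candidate hyper-parameters can be sketched in a few lines. The code below enumerates a hyper-parameter grid and draws intercept pairs from a bivariate t-distribution rescaled to given standard deviations and correlation. Several points are assumptions of this sketch rather than statements from the slides: the sd grid is taken to be equally spaced in steps of 0.2, the two sd values are varied independently, and the sampler uses the standard construction of a t variable as a Gaussian divided by the square root of a scaled chi-square variable. The names `hyperparameter_grid` and `sample_intercepts` are hypothetical.

```python
import itertools
import math
import random

def hyperparameter_grid(sd_x1, sd_x2):
    """Candidate (sd_1, sd_2, correlation) triples; equal spacing is assumed."""
    sd_fracs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
    correlations = [k / 10 for k in range(10)]  # 0, 0.1, ..., 0.9
    return [(f1 * sd_x1, f2 * sd_x2, s)
            for f1, f2, s in itertools.product(sd_fracs, sd_fracs, correlations)]

def sample_intercepts(n, sd1, sd2, correlation, dof=6, seed=0):
    """Draw n intercept pairs from a bivariate t with the given sd and correlation."""
    rng = random.Random(seed)
    # A t variable with unit scale has variance dof / (dof - 2); undo that factor
    shrink = math.sqrt((dof - 2) / dof)
    draws = []
    for _ in range(n):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1 = g1
        z2 = correlation * g1 + math.sqrt(1.0 - correlation ** 2) * g2
        # sqrt(chi-square(dof) / dof); chi-square(dof) is Gamma(dof / 2, scale 2)
        w = math.sqrt(rng.gammavariate(dof / 2.0, 2.0) / dof)
        draws.append((sd1 * shrink * z1 / w, sd2 * shrink * z2 / w))
    return draws

grid = hyperparameter_grid(sd_x1=2.0, sd_x2=3.0)
draws = sample_intercepts(50000, sd1=1.0, sd2=2.0, correlation=0.5)
print(len(grid), draws[0])
```

In an empirical Bayes loop, each grid entry would be scored by the marginal likelihood of the data and the best-scoring entry kept; that scoring step is not implemented here.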

  16. Experiments on artificial data

  17. Experimental results (100 obs.)
     • Data generated from LiNGAM with latent confounders
     • Various non-Gaussian distributions
       – Laplace, uniform, asymmetric distributions, etc.
     • Our method uses Laplace for $p(e_i)$
     [Figure: bar charts of the numbers of successful discoveries (100 rep.), one panel for N. latent confounders = 1 and one for N. latent confounders = 6, comparing our method, Hoyer (1, 4 conf.), and Henao (1, 4, 10 conf.)]

  18. Experiment on sociology data

  19. Sociology data
     • Source: General Social Survey, 1972-2006 (n = 1380)
       – Non-farm background, ages 35-44, white, male, in the labor force, no missing data for any of the covariates
     • Status attainment model (Duncan et al., 1972)
     [Figure: path diagram of the status attainment model, including $x_2$: Son’s Income]

  20. Evaluation of our method using the sociology data
     • Known (temporal) orderings of 15 pairs:
       – Father’s Education → Son’s Education
       – …
       – Father’s Education → Son’s Income
       – …
       – Son’s Occupation → Son’s Income

  21. Conclusions

  22. Conclusions
     • Estimation of causal direction in the presence of latent confounders is a major challenge in causal discovery
     • Our proposal: fit a linear non-Gaussian SEM with possibly different intercepts to data
     • Future work
       – Test other informative priors for observation-specific intercepts
       – Implement a wider variety of error/prior distributions (e.g., learn the DOF of the t-distribution)
       – Develop extensions using nonlinear/cyclic models (Hoyer et al., 2009; Zhang & Hyvarinen, 2009; Lacerda et al., 2008) instead of LiNGAM
