Linear Non-Gaussian Acyclic Model for Causal Discovery
  1. Linear Non-Gaussian Acyclic Model for Causal Discovery
Aapo Hyvärinen, Dept of Computer Science, University of Helsinki, Finland
with Patrik Hoyer, Shohei Shimizu, Kun Zhang, Steve M. Smith
Outline: Abstract, Structural equation models, Applications in brain imaging, Extensions, Conclusion

  2. Abstract
◮ Estimating causal direction is a fundamental problem in science
◮ Bayesian networks or structural equation models (SEM) are ill-defined for Gaussian data
◮ For non-Gaussian data, the SEM is identifiable (Shimizu et al., JMLR, 2006)
◮ The theory is closely related to independent component analysis (ICA)
◮ A simple approach is possible based on likelihood ratios of variable pairs (Hyvärinen and Smith, JMLR, 2013)

  3. Practical models for causal discovery
◮ Model connections between the measured variables: which variable causes which?
◮ “Discovery” means a data-driven approach
◮ “Correlation does not equal causation”: but we can go beyond correlation
◮ Two fundamental approaches:
  ◮ If we have time series and the time resolution of the measurements is fast enough, we may be able to use autoregressive modelling (Granger causality)
  ◮ Otherwise, use structural equation models (the approach here)

  4. Structural equation models
◮ How does an externally imposed change in one variable affect the others?
◮ Assume the influences are linear and all variables are observable:
    x_i = ∑_{j≠i} b_ij x_j + e_i   for all i
◮ Difficult to estimate; not simple regression
◮ Classic methods fail: the model is not identifiable
◮ Becomes identifiable if the data are non-Gaussian (Shimizu et al., JMLR, 2006)
[Figure: example causal graph over variables x1, ..., x7 with signed linear edge weights]
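A minimal sketch (not from the slides) of how data can be generated from such a model: the coefficient matrix B below is made up, and the disturbances are taken to be Laplace-distributed as one example of a non-Gaussian choice.

import numpy as np

rng = np.random.default_rng(0)
n_samples, n_vars = 10_000, 3

# Strictly lower-triangular B encodes an acyclic structure:
# x2 depends on x1, and x3 depends on x1 and x2 (coefficients are illustrative).
B = np.array([[ 0.0, 0.0, 0.0],
              [ 0.8, 0.0, 0.0],
              [-0.5, 0.3, 0.0]])

# Non-Gaussian disturbances e_i, one row per observation
E = rng.laplace(size=(n_samples, n_vars))

# Solve x = B x + e, i.e. x = (I - B)^{-1} e, for every row of E
X = E @ np.linalg.inv(np.eye(n_vars) - B).T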

  5. Starting point: Two variables
◮ Consider two random variables, x and y, both standardized (zero mean, unit variance)
◮ Goal: distinguish between two statistical models:
    y = ρ x + d   (x → y)   (1)
    x = ρ y + e   (y → x)   (2)
  where the disturbances d, e are independent of x, y.
◮ If the variables are Gaussian, the situation is completely symmetric:
  ◮ The variance explained is the same for both models
  ◮ The likelihood is the same for both models (a simple function of ρ)
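A small numerical sketch of this symmetry, assuming jointly Gaussian standardized data (the sample size and ρ below are arbitrary): regressing in either direction leaves a residual of variance 1 − ρ², so the two Gaussian likelihoods coincide.

import numpy as np

rng = np.random.default_rng(1)
n, rho = 100_000, 0.6

x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)  # x and y both ~ N(0, 1)

d = y - rho * x   # residual of the model x -> y
e = x - rho * y   # residual of the model y -> x
print(np.var(d), np.var(e))  # both close to 1 - rho**2: the directions are indistinguishable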

  6. Non-Gaussianity comes to the rescue
◮ Real-life signals are often non-Gaussian
[Figure: three example signals and their histograms, all clearly non-Gaussian]

  7. Assumption of non-Gaussianity
◮ We assume that in each model the regressor or the residual, or both, are non-Gaussian:
    y = ρ x + d   (x → y)   (3)
    x = ρ y + e   (y → x)   (4)
  where the disturbances d, e are independent of x, y.
◮ Non-Gaussianity breaks the symmetry between x and y (Dodge and Rousson, 2001; Shimizu et al., 2006).
◮ We can then just compare the likelihoods of the two models.

  8. Illustration of symmetry-breaking
[Figure: scatterplots of x versus y for non-Gaussian and for Gaussian data, under both causal directions; only in the non-Gaussian case do the two directions look different]

  9. Intuitive idea behind non-Gaussianity
◮ Central limit theorem: sums of independent variables tend to be more Gaussian
◮ Assume (just on this slide!) that the residuals are Gaussian
◮ For y = ρ x + d, y must then be more Gaussian than x
◮ So causality must go from the less Gaussian variable to the more Gaussian one
◮ We could measure non-Gaussianity with classical measures, e.g. kurtosis or skewness, and just look at the difference of the kurtoses of x and y

  10. Intuitive idea behind non-Gaussianity (continued)
◮ This is a simple illustration with its flaws:
  ◮ The method fails for non-Gaussian residuals
  ◮ Kurtosis and skewness are not good measures of non-Gaussianity in terms of classical statistical criteria (asymptotic variance, robustness)
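A rough sketch of the kurtosis heuristic only, not of the likelihood-based method developed next; the helper name naive_direction is hypothetical, and SciPy's kurtosis (excess kurtosis by default) serves as the non-Gaussianity measure.

from scipy.stats import kurtosis

def naive_direction(x, y):
    # Larger |excess kurtosis| is taken to mean "less Gaussian"; the arrow
    # points from the less Gaussian variable to the more Gaussian one.
    kx, ky = abs(kurtosis(x)), abs(kurtosis(y))
    return "x -> y" if kx > ky else "y -> x"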

  11. Likelihood ratio and non-Gaussianity
◮ Principled approach (Hyvärinen and Smith, JMLR, 2013)
◮ Ratio of the probabilities that the data comes from the two models
◮ Asymptotic limit of the log-likelihood ratio:
    lim (1/T) log [ L(x → y) / L(y → x) ] = − H(x) − H(d/σ_d) + H(y) + H(e/σ_e)
  with H the differential entropy, and residuals d = y − ρ x, e = x − ρ y with variances σ_d², σ_e².
◮ Entropy is maximized by the Gaussian distribution
◮ The log-likelihood ratio is thus
    nongaussianity(x) + nongaussianity(residual of x → y) − nongaussianity(y) − nongaussianity(residual of y → x)
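A sketch of this asymptotic log-likelihood ratio for standardized data, using SciPy's differential_entropy estimator (available in recent SciPy releases) for H; the function name loglik_ratio is hypothetical, and a positive return value favours the model x → y.

import numpy as np
from scipy.stats import differential_entropy as H

def loglik_ratio(x, y):
    # x and y are assumed standardized (zero mean, unit variance)
    rho = np.mean(x * y)
    d = y - rho * x   # residual of the model x -> y
    e = x - rho * y   # residual of the model y -> x
    return -H(x) - H(d / d.std()) + H(y) + H(e / e.std())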

  12. Likelihood ratios and independence
◮ We can equally interpret the likelihood ratio in terms of independence
◮ We had the asymptotic limit of the log-likelihood ratio as
    (1/T) log [ L(x → y) / L(y → x) ] = − H(x) − H(d/σ_d) + H(y) + H(e/σ_e)   (5)
  with H the differential entropy, and residuals d = y − ρ x, e = x − ρ y with variances σ_d², σ_e².
◮ Mutual information I(u, v) = H(u) + H(v) − H(u, v) measures statistical dependence
◮ The log-likelihood ratio can be manipulated to give
    I(y, e) − I(x, d)   (6)
  since the terms involving the joint entropies H(x, d) and H(y, e) cancel.

  13. Even simpler approximation of likelihood ratios
◮ We can make a first-order approximation to obtain
    (1/T) log [ L(x → y) / L(y → x) ] ≈ (ρ/T) ∑_t [ − x_t g(y_t) + g(x_t) y_t ]
  where typically g(u) = − tanh(u) and ρ is the correlation coefficient.
◮ Choosing between the models is reduced to considering the sign of a nonlinear correlation
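A sketch of this first-order approximation with g(u) = −tanh(u) as suggested on the slide; pairwise_measure is a hypothetical name, and the sign of the returned value chooses the direction (positive favours x → y, negative favours y → x).

import numpy as np

def pairwise_measure(x, y):
    # standardize, as assumed throughout the slides
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    rho = np.mean(x * y)
    # (rho / T) * sum_t [ -x_t * g(y_t) + g(x_t) * y_t ]  with  g(u) = -tanh(u)
    return rho * np.mean(x * np.tanh(y) - np.tanh(x) * y)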
