Linear Non-Gaussian Acyclic Model for Causal Discovery
Aapo Hyvärinen
Dept of Computer Science, University of Helsinki, Finland
with Patrik Hoyer, Shohei Shimizu, Kun Zhang, Steve M. Smith
Outline: Abstract · Structural equation models · Applications in brain imaging · Extensions · Conclusion

Abstract
◮ Estimating causal direction is a fundamental problem in science
◮ Bayesian networks or structural equation models (SEM) are ill-defined for Gaussian data
◮ For non-Gaussian data, SEM is identifiable (Shimizu et al., JMLR, 2006)
◮ The theory is closely related to independent component analysis (ICA)
◮ A simple approach is possible, based on likelihood ratios of variable pairs (Hyvärinen and Smith, JMLR, 2013)

Structural equation models (section outline): Introduction · Definition of two-variable model · Assumption of non-Gaussianity · Likelihood ratio · General definition of LiNGAM · Estimation of LiNGAM by ICA

Practical models for causal discovery
◮ Model connections between the measured variables: which variable causes which?
◮ "Discovery" means a data-driven approach
◮ "Correlation does not equal causation", but we can go beyond correlation
◮ Two fundamental approaches:
  ◮ If we have time series and the time resolution of the measurements is fast enough, we may be able to use autoregressive modelling (Granger causality)
  ◮ Otherwise, use structural equation models (the topic here)

Structural equation models
◮ How does an externally imposed change in one variable affect the others?
◮ Assume influences are linear, and all variables observable:
$$x_i = \sum_{j \neq i} b_{ij} x_j + e_i \quad \text{for all } i$$
◮ Difficult to estimate; not simple regression
◮ Classic methods fail: the model is not identifiable
◮ Becomes identifiable if the data are non-Gaussian (Shimizu et al., JMLR, 2006)
[Figure: example causal graph over the variables x1 to x7 with weighted edges]

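As a rough illustration of the model on this slide, the following sketch draws samples from a linear acyclic SEM. The three-variable graph, its coefficients, the uniform disturbances and the function name `simulate_linear_sem` are all assumptions made for the example; they are not taken from the figure.

```python
import random

def simulate_linear_sem(b, n_samples, seed=0):
    """Draw samples from x_i = sum_j b[i][j] * x_j + e_i.

    Assumes the variables are indexed in a causal order, so that b is
    strictly lower triangular (acyclic graph).
    """
    rng = random.Random(seed)
    n_vars = len(b)
    samples = []
    for _ in range(n_samples):
        x = [0.0] * n_vars
        for i in range(n_vars):
            # Uniform disturbance: one simple non-Gaussian choice (assumption).
            e_i = rng.uniform(-1.0, 1.0)
            # Only earlier variables (j < i) can influence x_i.
            x[i] = sum(b[i][j] * x[j] for j in range(i)) + e_i
        samples.append(x)
    return samples

# Hypothetical graph: x0 -> x1 -> x2 and x0 -> x2.
B = [[0.0, 0.0, 0.0],
     [0.8, 0.0, 0.0],
     [-0.5, 0.4, 0.0]]
data = simulate_linear_sem(B, n_samples=1000)
```

Generating in causal order avoids a matrix inverse; estimating `B` from `data` alone is the hard part that the rest of the talk addresses.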
Starting point: Two variables
◮ Consider two random variables, x and y, both standardized (zero mean, unit variance)
◮ Goal: distinguish between two statistical models:
$$y = \rho x + d \quad (x \to y) \qquad (1)$$
$$x = \rho y + e \quad (y \to x) \qquad (2)$$
where the disturbance d is independent of x in (1), and e is independent of y in (2).
◮ If the variables are Gaussian, the two models are completely symmetric:
  ◮ Variance explained is the same for both models
  ◮ Likelihood is the same for both models (a simple function of ρ)

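The symmetry in the Gaussian case can be checked numerically. The sketch below fits both models with Gaussian densities for the regressor and the residual and compares the results; the simulated data, the coefficient 0.6 and the helper names are assumptions for illustration only.

```python
import math
import random

def standardize(v):
    """Rescale a sample to zero mean and unit variance."""
    m = sum(v) / len(v)
    s = math.sqrt(sum((a - m) ** 2 for a in v) / len(v))
    return [(a - m) / s for a in v]

def gaussian_loglik(cause, effect):
    """Gaussian log-likelihood of effect = rho * cause + residual.

    Both inputs are assumed standardized. Returns the log-likelihood
    and the residual variance.
    """
    T = len(cause)
    rho = sum(c * e for c, e in zip(cause, effect)) / T
    resid = [e - rho * c for c, e in zip(cause, effect)]
    var_r = sum(r * r for r in resid) / T
    ll_cause = -0.5 * sum(c * c for c in cause) - 0.5 * T * math.log(2 * math.pi)
    ll_resid = (-0.5 * sum(r * r for r in resid) / var_r
                - 0.5 * T * math.log(2 * math.pi * var_r))
    return ll_cause + ll_resid, var_r

rng = random.Random(0)
x_raw = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y_raw = [0.6 * a + rng.gauss(0.0, 1.0) for a in x_raw]
x, y = standardize(x_raw), standardize(y_raw)

ll_xy, var_d = gaussian_loglik(x, y)  # model (1): x -> y
ll_yx, var_e = gaussian_loglik(y, x)  # model (2): y -> x
```

Both residual variances equal one minus the squared sample correlation, and the two Gaussian log-likelihoods agree up to rounding, so a Gaussian likelihood cannot tell the directions apart.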
Non-Gaussianity comes to the rescue
Real-life signals are often non-Gaussian
[Figure: three example signals and their histograms]

Assumption of non-Gaussianity
◮ We assume that in each model, the regressor or the residual or both are non-Gaussian:
$$y = \rho x + d \quad (x \to y) \qquad (3)$$
$$x = \rho y + e \quad (y \to x) \qquad (4)$$
where the disturbance d is independent of x in (3), and e is independent of y in (4).
◮ Non-Gaussianity breaks the symmetry between x and y (Dodge and Rousson, 2001; Shimizu et al., 2006).
◮ We can simply compare the likelihoods of the two models.

Illustration of symmetry-breaking
[Figure: panels with axes x and y for the non-Gaussian and the Gaussian case]

Intuitive idea behind non-Gaussianity
◮ Central limit theorem: sums of independent variables tend to be more Gaussian
◮ Assume (just on this slide!) that the residuals are Gaussian
◮ For y = ρx + d, y must be more Gaussian than x
◮ So, causality must run from the less Gaussian variable to the more Gaussian one
◮ We could measure non-Gaussianity with classical measures, e.g. kurtosis or skewness, and just look at the difference of the kurtoses of x and y.
◮ This is a simple illustration, with its flaws:
  ◮ The method fails for non-Gaussian residuals
  ◮ Kurtosis and skewness are not good measures of non-Gaussianity in terms of classical statistical criteria (asymptotic variance, robustness)

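The kurtosis heuristic from this slide can be sketched in a few lines. The example assumes exactly the setting of the slide (Gaussian residual, non-Gaussian cause); the Laplace-distributed cause, the coefficient 0.7 and the function names are illustrative assumptions.

```python
import random

def excess_kurtosis(v):
    """Sample excess kurtosis (zero for a Gaussian)."""
    n = len(v)
    mean = sum(v) / n
    var = sum((a - mean) ** 2 for a in v) / n
    fourth = sum((a - mean) ** 4 for a in v) / n
    return fourth / var ** 2 - 3.0

def direction_by_kurtosis(x, y):
    """Guess that the less Gaussian variable (larger |kurtosis|) is the cause."""
    if abs(excess_kurtosis(x)) > abs(excess_kurtosis(y)):
        return "x -> y"
    return "y -> x"

rng = random.Random(1)

def laplace():
    # Laplace variable: exponential magnitude with a random sign.
    return rng.expovariate(1.0) * rng.choice((-1.0, 1.0))

x = [laplace() for _ in range(20000)]
y = [0.7 * a + rng.gauss(0.0, 1.0) for a in x]  # Gaussian residual
guess = direction_by_kurtosis(x, y)
```

As the slide warns, this only illustrates the idea: with a non-Gaussian residual the comparison can point the wrong way, and sample kurtosis is sensitive to outliers.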
Likelihood ratio and non-Gaussianity
◮ Principled approach (Hyvärinen and Smith, JMLR, 2013)
◮ Ratio of the probabilities that the data come from the two models
◮ Asymptotic limit of the log-likelihood ratio (normalized by the sample size T):
$$\lim_{T \to \infty} \frac{1}{T} \log \frac{L(x \to y)}{L(y \to x)} = -H(x) - H(d/\sigma_d) + H(y) + H(e/\sigma_e)$$
with H the differential entropy, and residuals d = y − ρx, e = x − ρy with variances σ_d², σ_e².
◮ Entropy is maximized by the Gaussian distribution (for a given variance)
◮ The log-likelihood ratio is thus
nongaussianity(x) + nongaussianity(residual x → y) − nongaussianity(y) − nongaussianity(residual y → x)

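One way to spell out the last step, assuming negentropy is used as the measure of non-Gaussianity. The symbols J and ν are introduced here for illustration: ν is a standardized Gaussian variable and J(u) = H(ν) − H(u) ≥ 0 for any u of unit variance. Since x, y, d/σ_d and e/σ_e all have unit variance, the Gaussian entropy H(ν) is added twice and subtracted twice, and cancels:

```latex
\begin{align*}
-H(x) - H(d/\sigma_d) + H(y) + H(e/\sigma_e)
  &= \bigl[H(\nu) - H(x)\bigr] + \bigl[H(\nu) - H(d/\sigma_d)\bigr] \\
  &\quad - \bigl[H(\nu) - H(y)\bigr] - \bigl[H(\nu) - H(e/\sigma_e)\bigr] \\
  &= J(x) + J(d/\sigma_d) - J(y) - J(e/\sigma_e)
\end{align*}
```

The ratio is therefore positive, favouring x → y, when x and its residual d are jointly more non-Gaussian than y and its residual e.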
Likelihood ratios and independence
◮ We can equally interpret the likelihood ratio in terms of independence
◮ We had the asymptotic limit of the log-likelihood ratio (normalized by the sample size T) as
$$\frac{1}{T} \log \frac{L(x \to y)}{L(y \to x)} \to -H(x) - H(d/\sigma_d) + H(y) + H(e/\sigma_e) \qquad (5)$$
with H the differential entropy, and residuals d = y − ρx, e = x − ρy with variances σ_d², σ_e².
◮ Mutual information I(u, v) = H(u) + H(v) − H(u, v) measures statistical dependence
◮ The log-likelihood ratio can be manipulated to give
$$I(y, e) - I(x, d) \qquad (6)$$
since the terms related to H(x, d) and H(y, e) cancel.

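The manipulation can be sketched as follows, assuming x and y are standardized and ρ is their correlation coefficient, so that both residual variances equal 1 − ρ². The symbol R for the limit of the normalized log-likelihood ratio is introduced here only for brevity.

```latex
\begin{align*}
R &= -H(x) - H(d/\sigma_d) + H(y) + H(e/\sigma_e) \\
  &= -H(x) - H(d) + H(y) + H(e)
     && \text{since } H(u/\sigma) = H(u) - \log\sigma,\ \sigma_d = \sigma_e \\
H(x, d) &= H(x, y) = H(y, e)
     && \text{both linear maps have } |\det| = 1 \\
I(y, e) - I(x, d)
  &= H(y) + H(e) - H(y, e) - H(x) - H(d) + H(x, d) \\
  &= H(y) + H(e) - H(x) - H(d) = R
\end{align*}
```

So the model whose regressor and residual are less dependent is preferred: R is positive when x and d are closer to independent than y and e.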
Even simpler approximation of likelihood ratios
◮ We can make first-order approximations to obtain:
$$\frac{1}{T} \log \frac{L(x \to y)}{L(y \to x)} \approx \frac{\rho}{T} \sum_t \bigl[ -x_t \, g(y_t) + g(x_t) \, y_t \bigr]$$
where typically g(u) = −tanh(u), ρ is the correlation coefficient, and T is the number of samples.
◮ Choosing between the models is reduced to considering the sign of a nonlinear correlation

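The nonlinear correlation above is short enough to write out directly. The sketch below uses g(u) = −tanh(u) as on the slide; the simulated data (Laplace-distributed cause and residual, coefficient 0.5) and the function names are assumptions for illustration, and the tanh choice is generally suited to sparse, heavy-tailed variables.

```python
import math
import random

def standardize(v):
    """Rescale a sample to zero mean and unit variance."""
    m = sum(v) / len(v)
    s = math.sqrt(sum((a - m) ** 2 for a in v) / len(v))
    return [(a - m) / s for a in v]

def pairwise_score(x, y):
    """Approximate normalized log-likelihood ratio of x -> y versus y -> x.

    Implements rho * mean(-x*g(y) + g(x)*y) with g(u) = -tanh(u).
    A positive value favours x -> y, a negative one y -> x.
    """
    x, y = standardize(x), standardize(y)
    T = len(x)
    rho = sum(a * b for a, b in zip(x, y)) / T
    nonlinear = sum(a * math.tanh(b) - math.tanh(a) * b for a, b in zip(x, y)) / T
    return rho * nonlinear

rng = random.Random(2)

def laplace():
    # Laplace variable: exponential magnitude with a random sign.
    return rng.expovariate(1.0) * rng.choice((-1.0, 1.0))

x = [laplace() for _ in range(20000)]
y = [0.5 * a + laplace() for a in x]  # true direction: x -> y

score = pairwise_score(x, y)
direction = "x -> y" if score > 0 else "y -> x"
```

The score is antisymmetric by construction, so swapping the arguments only flips its sign and the decision does not depend on which variable is listed first.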