Linear Non-Gaussian Acyclic Model for Causal Discovery
  1. Linear Non-Gaussian Acyclic Model for Causal Discovery
Aapo Hyvärinen, Dept of Computer Science, University of Helsinki, Finland
with Patrik Hoyer, Shohei Shimizu, Kun Zhang, Steve M. Smith
Outline: Abstract, Structural equation models, Applications in brain imaging, Extensions, Conclusion

  2. Abstract
◮ Estimating causal direction is a fundamental problem in science
◮ Bayesian networks or structural equation models (SEM) are ill-defined for Gaussian data
◮ For non-Gaussian data, the SEM is identifiable (Shimizu et al., JMLR, 2006)
◮ The theory is closely related to independent component analysis (ICA)
◮ A simple approach is possible based on likelihood ratios of variable pairs (Hyvärinen and Smith, JMLR, 2013)

  3. Practical models for causal discovery
◮ Model connections between the measured variables: which variable causes which?
◮ “Discovery” means a data-driven approach
◮ “Correlation does not equal causation”: but we can go beyond correlation
◮ Two fundamental approaches:
  ◮ If we have time series and the time resolution of the measurements is fast enough, we may be able to use autoregressive modelling (Granger causality)
  ◮ Otherwise, use structural equation models (the approach here)

  4. Structural equation models
◮ How does an externally imposed change in one variable affect the others?
◮ Assume the influences are linear and all variables are observable:
    x_i = ∑_{j≠i} b_ij x_j + e_i   for all i
◮ Difficult to estimate; not simple regression
◮ Classic methods fail: the model is not identifiable
◮ Becomes identifiable if the data are non-Gaussian (Shimizu et al., JMLR, 2006)
[Figure: example causal graph over variables x1, ..., x7 with signed linear edge weights]
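A minimal sketch (not from the slides) of how data can be generated from such a model: the coefficient matrix B below is made up, and the disturbances are taken to be Laplace-distributed as one example of a non-Gaussian choice.

import numpy as np

rng = np.random.default_rng(0)
n_samples, n_vars = 10_000, 3

# Strictly lower-triangular B encodes an acyclic structure:
# x2 depends on x1, and x3 depends on x1 and x2 (coefficients are illustrative).
B = np.array([[ 0.0, 0.0, 0.0],
              [ 0.8, 0.0, 0.0],
              [-0.5, 0.3, 0.0]])

# Non-Gaussian disturbances e_i, one row per observation
E = rng.laplace(size=(n_samples, n_vars))

# Solve x = B x + e, i.e. x = (I - B)^{-1} e, for every row of E
X = E @ np.linalg.inv(np.eye(n_vars) - B).T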

  5. Starting point: Two variables
◮ Consider two random variables, x and y, both standardized (zero mean, unit variance)
◮ Goal: distinguish between two statistical models:
    y = ρ x + d   (x → y)   (1)
    x = ρ y + e   (y → x)   (2)
  where the disturbances d, e are independent of x, y.
◮ If the variables are Gaussian, the situation is completely symmetric:
  ◮ The variance explained is the same for both models
  ◮ The likelihood is the same for both models (a simple function of ρ)
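A small numerical sketch of this symmetry, assuming jointly Gaussian standardized data (the sample size and ρ below are arbitrary): regressing in either direction leaves a residual of variance 1 − ρ², so the two Gaussian likelihoods coincide.

import numpy as np

rng = np.random.default_rng(1)
n, rho = 100_000, 0.6

x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)  # x and y both ~ N(0, 1)

d = y - rho * x   # residual of the model x -> y
e = x - rho * y   # residual of the model y -> x
print(np.var(d), np.var(e))  # both close to 1 - rho**2: the directions are indistinguishable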

  6. Non-Gaussianity comes to the rescue
◮ Real-life signals are often non-Gaussian
[Figure: three example signals and their histograms, all clearly non-Gaussian]

  7. Assumption of non-Gaussianity
◮ We assume that in each model the regressor or the residual, or both, are non-Gaussian:
    y = ρ x + d   (x → y)   (3)
    x = ρ y + e   (y → x)   (4)
  where the disturbances d, e are independent of x, y.
◮ Non-Gaussianity breaks the symmetry between x and y (Dodge and Rousson, 2001; Shimizu et al., 2006).
◮ We can then just compare the likelihoods of the two models.

  8. Illustration of symmetry-breaking
[Figure: scatterplots of x versus y for non-Gaussian and for Gaussian data, under both causal directions; only in the non-Gaussian case do the two directions look different]

  9. Intuitive idea behind non-Gaussianity
◮ Central limit theorem: sums of independent variables tend to be more Gaussian
◮ Assume (just on this slide!) that the residuals are Gaussian
◮ For y = ρ x + d, y must then be more Gaussian than x
◮ So causality must go from the less Gaussian variable to the more Gaussian one
◮ We could measure non-Gaussianity with classical measures, e.g. kurtosis or skewness, and just look at the difference of the kurtoses of x and y

  10. Intuitive idea behind non-Gaussianity (continued)
◮ This is a simple illustration with its flaws:
  ◮ The method fails for non-Gaussian residuals
  ◮ Kurtosis and skewness are not good measures of non-Gaussianity in terms of classical statistical criteria (asymptotic variance, robustness)
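A rough sketch of the kurtosis heuristic only, not of the likelihood-based method developed next; the helper name naive_direction is hypothetical, and SciPy's kurtosis (excess kurtosis by default) serves as the non-Gaussianity measure.

from scipy.stats import kurtosis

def naive_direction(x, y):
    # Larger |excess kurtosis| is taken to mean "less Gaussian"; the arrow
    # points from the less Gaussian variable to the more Gaussian one.
    kx, ky = abs(kurtosis(x)), abs(kurtosis(y))
    return "x -> y" if kx > ky else "y -> x"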

  11. Likelihood ratio and non-Gaussianity
◮ Principled approach (Hyvärinen and Smith, JMLR, 2013)
◮ Ratio of the probabilities that the data comes from the two models
◮ Asymptotic limit of the log-likelihood ratio:
    lim (1/T) log [ L(x → y) / L(y → x) ] = − H(x) − H(d/σ_d) + H(y) + H(e/σ_e)
  with H the differential entropy, and residuals d = y − ρ x, e = x − ρ y with variances σ_d², σ_e².
◮ Entropy is maximized by the Gaussian distribution
◮ The log-likelihood ratio is thus
    nongaussianity(x) + nongaussianity(residual of x → y) − nongaussianity(y) − nongaussianity(residual of y → x)
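A sketch of this asymptotic log-likelihood ratio for standardized data, using SciPy's differential_entropy estimator (available in recent SciPy releases) for H; the function name loglik_ratio is hypothetical, and a positive return value favours the model x → y.

import numpy as np
from scipy.stats import differential_entropy as H

def loglik_ratio(x, y):
    # x and y are assumed standardized (zero mean, unit variance)
    rho = np.mean(x * y)
    d = y - rho * x   # residual of the model x -> y
    e = x - rho * y   # residual of the model y -> x
    return -H(x) - H(d / d.std()) + H(y) + H(e / e.std())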

  12. Likelihood ratios and independence
◮ We can equally interpret the likelihood ratio in terms of independence
◮ We had the asymptotic limit of the log-likelihood ratio as
    (1/T) log [ L(x → y) / L(y → x) ] = − H(x) − H(d/σ_d) + H(y) + H(e/σ_e)   (5)
  with H the differential entropy, and residuals d = y − ρ x, e = x − ρ y with variances σ_d², σ_e².
◮ Mutual information I(u, v) = H(u) + H(v) − H(u, v) measures statistical dependence
◮ The log-likelihood ratio can be manipulated to give
    I(y, e) − I(x, d)   (6)
  since the terms involving the joint entropies H(x, d) and H(y, e) cancel.

  13. Even simpler approximation of likelihood ratios
◮ We can make a first-order approximation to obtain
    (1/T) log [ L(x → y) / L(y → x) ] ≈ (ρ/T) ∑_t [ − x_t g(y_t) + g(x_t) y_t ]
  where typically g(u) = − tanh(u) and ρ is the correlation coefficient.
◮ Choosing between the models is reduced to considering the sign of a nonlinear correlation
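A sketch of this first-order approximation with g(u) = −tanh(u) as suggested on the slide; pairwise_measure is a hypothetical name, and the sign of the returned value chooses the direction (positive favours x → y, negative favours y → x).

import numpy as np

def pairwise_measure(x, y):
    # standardize, as assumed throughout the slides
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    rho = np.mean(x * y)
    # (rho / T) * sum_t [ -x_t * g(y_t) + g(x_t) * y_t ]  with  g(u) = -tanh(u)
    return rho * np.mean(x * np.tanh(y) - np.tanh(x) * y)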
